
On the Use of Entropy as a Measure of Dependence of Two Events

Valentin Vankov Iliev

Institute of Mathematics and Informatics
Bulgarian Academy of Sciences
Sofia, Bulgaria
viliev@math.bas.bg

Abstract

We define the degree of dependence of two events A and B in a probability space by using the Boltzmann-Shannon entropy function of an appropriate probability distribution produced by these events and depending on one parameter (the probability of the intersection of A and B) varying within a closed interval I.

The entropy function attains its global maximum when the events A and B are independent. The important particular case of a discrete uniform probability space motivates this definition in the following way: the entropy function has a minimum at the left endpoint of I exactly when one of the events and the complement of the other are connected by the relation of inclusion (maximal negative dependence), and it has a minimum at the right endpoint of I exactly when one of these events is included in the other (maximal positive dependence).

Moreover, the deviation of the entropy from its maximum is equal to the average information that one of the binary trials A ∪ Aᶜ and B ∪ Bᶜ carries about the other. As a consequence, the degree of dependence of A and B can be expressed in terms of information theory and is invariant with respect to the choice of the unit of information.

Using this formalism, we describe completely the screening tests and their reliability, measure the efficacy of a vaccination, the impact of some events from the financial markets on other events, etc.

A link is available for downloading an Excel program which calculates the degree of dependence of two events in a sample space with equally likely outcomes.

Keywords: entropy; average information; degree of dependence; probability space; probability distribution; experiment in a sample space; linear system; affine isomorphism; classification space.

1. Introduction

In this paper we study the set of ordered pairs (A, B) of events in a probability space in order to define a measure of dependence (the power of relations) between A and B. This is done by means of the Boltzmann-Shannon entropy of a variable probability distribution that arises naturally from the pair (A, B). This approach is radically different from the standard ones, where most measures of dependence are linear functions of the probability of the intersection A ∩ B; see Section 6. For a detailed study of these measures refer to [1].

The ordered pairs (A, B), in the form of two-attribute tables, were used meticulously by G. Udny Yule in his memoir [9] in order to "...classify the objects or individuals observed into two classes only". He presents various examples and defines different indices to study the degree of association (dependence) of the corresponding two events. Any such ordered pair of events is said to be, as we term it, a Yule's pair of events. G. Udny Yule himself noted that W. R. Macdonell in [4] used a two-attribute table as a tool to study "...the degree of effectiveness of vaccination in small-pox".

The paper is organized as follows. In Section 4 we parameterize the members of an equivalence class consisting of Yule's pairs with fixed probabilities α and β (that is, Yule's pairs of type (α, β)). We use the fact that the probability distribution produced by the probabilities of the results of the experiment corresponding to a Yule's pair (cf. [3, I, §5]) is the solution of a linear system with one free variable θ (see (3)). The system of inequalities that restricts the components of this solution is equivalent to θ varying within a closed interval I(α, β) ⊂ [0, 1]. Thus, we naturally introduce (α, β, θ)-equivalence classes of Yule's pairs, whose members are said to be Yule's pairs of type (α, β, θ). Note that for any such pair, θ is the probability of the intersection of its components.

When we vary (α, β) ∈ [0, 1]², the segment {α} × {β} × I(α, β) sweeps out a tetrahedron T3 in ℝ³, so the (α, β, θ)-equivalence classes of Yule's pairs are represented by some points in T3 which, in turn, form the so-called dotted tetrahedron T3(·).

On the other hand, the affine isomorphism (4), which maps ℝ³ onto the hyperplane H in ℝ⁴ containing the solutions of the linear system (3), transforms the tetrahedron T3 onto the 3-simplex Δ3 ⊂ H. Moreover, the dotted tetrahedron T3(·) is mapped onto the dotted 3-simplex Δ3(·) ⊂ H, the latter classifying the probability distributions produced by all Yule's pairs. For the precise statements see Theorem 1 and Figure 1.

Given a Yule's pair of type (α, β, θ), the Boltzmann-Shannon entropy Eα,β(θ) of its distribution is a continuous function of θ ∈ I(α, β), and its behaviour is described in Theorem 2 from Section 5. In particular, we show that Eα,β(θ) attains its global maximum at the only point θ0 = αβ for which the components of all Yule's pairs of type (α, β, θ0) (if any) are independent. The special case of a sample space with equally likely outcomes illustrates the fact that the maximum of dependence occurs at the endpoints of the interval I(α, β). More precisely, at the left endpoint we have A ⊂ Bᶜ or Bᶜ ⊂ A, and at the right endpoint A ⊂ B or B ⊂ A.

Finally, Eα,β(θ) = Eβ,α(θ), and this common entropy function strictly increases to the left of θ0 and strictly decreases to the right.

All of this motivates the use of the entropy function Eα,β(θ) as a measure of dependence of two events A and B with Pr(A) = α and Pr(B) = β: negative dependence to the left of θ0 and positive dependence to the right. By modifying Eα,β(θ) appropriately by linear functions, we obtain a strictly increasing continuous function eα,β which maps the range of θ onto the interval [−1, 1] and serves (and is termed) as the degree of dependence of the events A and B.

From the link provided in Remark 1 one can download a simple Excel program which calculates the degree of dependence of two events in a discrete uniform probability space with given cardinalities and given cardinality of their intersection.

It turns out that the expression for the entropy function Eα,β is a particular case of what Shannon called in [6, Part I, Section 6] the entropy of the joint event. More precisely, this is the total amount of information contained in the results of the experiment J from (1). On the other hand, J is the joint experiment of two binary trials: A = A ∪ Aᶜ and B = B ∪ Bᶜ. Theorem 3 shows that the mutual information I(A, B) of the experiments A and B is equal to the deviation of the entropy Eα,β(θ) from its maximum Eα,β(αβ). In accord with the expression (7), which represents the function eα,β(θ) as a ratio of amounts of information, the degree of dependence of two events is invariant with respect to a change of the unit of information (bits, nats, etc.).

When a Yule's pair models a screening test, the probability F−(θ) of a false negative and the probability F+(θ) of a false positive decrease from statistically significant values near the left endpoint of the range of θ to statistically insignificant ones in a neighbourhood of the right endpoint. Moreover, on the complement of any such neighbourhood the product F−(θ)F+(θ) is bounded below by a positive constant. In other words, a kind of uncertainty principle holds; see Subsection 5.4.

In Subsection 5.5 we show that the degree of dependence of pairs of events can be used as a measure of the effectiveness of a vaccine for a particular disease. As an example we estimate the efficacy of the vaccine for small-pox tested during the epidemic at Sheffield in 1887-88, with the statistical data taken from [9].

In Section 6 we give several examples of other measures of dependence, which are evaluated using Sheffield's sample.

2. Definitions and Notation

Let (Ω, A, Pr) be a probability space with set of outcomes Ω, σ-algebra A, and probability function Pr. In this paper we use only the structure of a Boolean algebra on A. We introduce the following notation:

• R is the range of the probability function Pr: A → ℝ;

• [(α, β)] is the fiber of the surjective map A² → R², (A, B) ↦ (Pr(A), Pr(B)), over (α, β) ∈ R²;

• θ(A,B) = Pr(A ∩ B) for (A, B) ∈ A²;

• [(α, β, θ)] is the fiber of the map [(α, β)] → R, (A, B) ↦ θ(A,B), with image R(α, β) ⊂ R, over any θ ∈ R(α, β).

We note that the fibers [(α, β)] for (α, β) ∈ R² form a partition of A², and the fibers [(α, β, θ)] for θ ∈ R(α, β) form a partition of [(α, β)].

As usual, the events ∅ and Ω are called trivial. The members of the equivalence class [(α, β)] (resp., the equivalence class [(α, β, θ)]) are said to be Yule's pairs of type (α, β) (resp., Yule's pairs of type (α, β, θ)).

3. Methods

In this paper we use fundamentals of:

• Linear algebra,

• Affine geometry,

• Information Theory.

4. Classification of Yule's Pairs

4.1. The Probability Distribution of a Yule's Pair

Any ordered pair (A, B) ∈ A² produces an experiment

J = (A ∩ B) ∪ (A ∩ Bᶜ) ∪ (Aᶜ ∩ B) ∪ (Aᶜ ∩ Bᶜ) (1)

(cf. [3, I, §5]) and the probabilities of its results:

ξ1^(A,B) = Pr(A ∩ B), ξ2^(A,B) = Pr(A ∩ Bᶜ),
ξ3^(A,B) = Pr(Aᶜ ∩ B), ξ4^(A,B) = Pr(Aᶜ ∩ Bᶜ).

For any (A, B) ∈ [(α, β)], the probability distribution

(ξ1, ξ2, ξ3, ξ4) = (ξ1^(A,B), ξ2^(A,B), ξ3^(A,B), ξ4^(A,B)) (2)

satisfies the linear system

ξ1 + ξ2 = α
ξ3 + ξ4 = 1 − α
ξ1 + ξ3 = β
ξ2 + ξ4 = 1 − β. (3)

Let H be the affine hyperplane in ℝ⁴ with the equation ξ1 + ξ2 + ξ3 + ξ4 = 1. The solutions of (3) depend on one parameter, say θ = ξ1, and form a straight line ℓα,β in H with parametric representation

ℓα,β: ξ1 = θ, ξ2 = α − θ, ξ3 = β − θ, ξ4 = 1 − α − β + θ.

The map

ι: ℝ³ → H, (α, β, θ) ↦ (θ, α − θ, β − θ, 1 − α − β + θ) (4)

is an affine isomorphism with inverse affine isomorphism

χ: H → ℝ³, ξ ↦ (ξ1 + ξ2, ξ1 + ξ3, ξ1).

The trace of the 4-dimensional cube {ξ ∈ ℝ⁴ | 0 ≤ ξk ≤ 1, k = 1, 2, 3, 4} on the hyperplane H is the 3-dimensional simplex (that is, the tetrahedron) Δ3, defined in H by the inequalities

ξ1 ≥ 0, ξ2 ≥ 0, ξ3 ≥ 0, ξ1 + ξ2 + ξ3 ≤ 1.

The inverse image T3 = ι⁻¹(Δ3) via the affine isomorphism ι is the tetrahedron in ℝ³ defined by the system of inequalities

θ ≤ α, θ ≤ β, θ ≥ α + β − 1, θ ≥ 0. (5)

In other words, this is the tetrahedron with vertices O(0, 0, 0), M(1, 0, 0), N(0, 1, 0), P(1, 1, 1); see Figure 1.

For any fixed (α, β) ∈ ℝ² we set Λα,β = {α} × {β} × ℝ and C(α, β) = Λα,β ∩ T3, so that C(α, β) = {α} × {β} × I(α, β) with I(α, β) ⊂ ℝ. The system (5) yields that I(α, β) equals the closed interval [ℓ(α, β), r(α, β)], where ℓ(α, β) = max(0, α + β − 1) and r(α, β) = min(α, β). We have αβ ∈ I(α, β), and we denote by I°(α, β) the interior of the interval I(α, β). We obtain immediately:
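As a quick illustration, here is a minimal Python sketch of the parameterization (4) and of the endpoints of I(α, β) forced by (5); the function names and the numeric inputs are ours, chosen only for the example:

```python
def xi(alpha, beta, theta):
    # Parametric solution (4) of the linear system (3).
    return (theta, alpha - theta, beta - theta, 1 - alpha - beta + theta)

def interval(alpha, beta):
    # I(alpha, beta) = [max(0, alpha + beta - 1), min(alpha, beta)], from (5).
    return max(0.0, alpha + beta - 1), min(alpha, beta)

lo, hi = interval(0.6, 0.7)                      # I(0.6, 0.7) = [0.3, 0.6]
assert lo <= 0.6 * 0.7 <= hi                     # alpha*beta always lies in I(alpha, beta)
assert all(x >= 0 for x in xi(0.6, 0.7, 0.42))   # a valid distribution for theta in I
```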

Lemma 1. Let (α, β) ∈ [0, 1]². The following three statements are equivalent:

(i) One has (α, β) ∈ (0, 1)².

(ii) One has αβ ∈ I°(α, β).

(iii) One has I°(α, β) ≠ ∅.

(iv) Under the above conditions, one has ξk(θ) > 0 for all θ ∈ I°(α, β) and all k = 1, 2, 3, 4.

(v) Conversely, if there exists θ ∈ I(α, β) such that ξk(θ) > 0 for all k = 1, 2, 3, 4, then (i)-(iii) hold.

We have R(α, β) ⊂ I(α, β) and define the dotted interval I(·)(α, β) = R(α, β). The dotted segment C(·)(α, β) = {α} × {β} × I(·)(α, β), (α, β) ∈ R², is the locus of all triples of probabilities (α, β, θ(A,B)), where (A, B) ∈ [(α, β)].

For any (α, β) ∈ ℝ² we set D(α, β) = ι(C(α, β)). Since ι(Λα,β) = ℓα,β, we obtain that D(α, β) = ℓα,β ∩ Δ3.

Let Ik(α, β) = [lk(α, β), rk(α, β)] be the corresponding range of the real variable ξk(θ) for k = 1, 2, 3, 4, so that I1(α, β) = I(α, β). The line segment D(α, β) in H has endpoints

(l1(α, β), r2(α, β), r3(α, β), l4(α, β)), (r1(α, β), l2(α, β), l3(α, β), r4(α, β)),

since ξ2(θ) and ξ3(θ) decrease while ξ1(θ) and ξ4(θ) increase with θ. Since ι is also a homeomorphism, we have D(α, β) = ℓα,β ∩ Δ3.

The line segment D(α, β) contains the dotted segment D(·)(α, β), which is the locus of all probability distributions (2) for which (A, B) ∈ [(α, β)].

Finally, we note that T3 = ∪(α,β)∈[0,1]² C(α, β) and Δ3 = ∪(α,β)∈[0,1]² D(α, β), and that the unions T3(·) = ∪(α,β)∈R² C(·)(α, β) and Δ3(·) = ∪(α,β)∈R² D(·)(α, β) are the corresponding dotted tetrahedrons. The above considerations and Figure 1 yield the following theorem and its corollary.

Theorem 1. (i) The affine isomorphism ι: ℝ³ → H from (4) is a strictly increasing transformation of any line segment C(α, β) (resp., dotted line segment C(·)(α, β)) onto the line segment D(α, β) (resp., onto the dotted line segment D(·)(α, β)).

(ii) ι maps the tetrahedron T3 (resp., the dotted tetrahedron T3(·)) onto the tetrahedron Δ3 (resp., onto the dotted tetrahedron Δ3(·)).

(iii) The dotted tetrahedron T3(·) is the classification space of all equivalence classes [(α, β, θ)] of Yule's pairs.

(iv) The dotted tetrahedron Δ3(·) is the classification space of all probability distributions (2) produced by Yule's pairs.

Corollary 1. Below, XYZ denotes the face of the tetrahedron T3 with vertices X, Y, Z (see Figure 1).

(i) One has ξ1(θ) = 0 if and only if (α, β, θ) ∈ MON.

(ii) One has ξ2(θ) = 0 if and only if (α, β, θ) ∈ NOP.

(iii) One has ξ3(θ) = 0 if and only if (α, β, θ) ∈ MOP.

(iv) One has ξ4(θ) = 0 if and only if (α, β, θ) ∈ MNP.

5. Entropy and Dependence of Yule's Pairs

5.1. Entropy of a Yule's Pair

Let us suppose that (α, β) ∈ (0, 1)². Then Lemma 1 implies I°(α, β) ≠ ∅ and ξk(θ) > 0 for θ ∈ I°(α, β) and for all k = 1, 2, 3, 4. Therefore the Boltzmann-Shannon entropy of the probability distribution (ξ1(θ), ξ2(θ), ξ3(θ), ξ4(θ)) is defined (cf. [6], [7]):

Eα,β(θ) = − Σ_{k=1}^{4} ξk(θ) ln(ξk(θ)), θ ∈ I°(α, β).
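The following small Python sketch (our illustration, not the author's Excel program) evaluates Eα,β(θ) on a grid and locates its maximum numerically; by Theorem 2 below, the maximum should sit at θ = αβ:

```python
from math import log

def xi(alpha, beta, theta):
    # Distribution (2) parameterized as in (4).
    return (theta, alpha - theta, beta - theta, 1 - alpha - beta + theta)

def entropy(alpha, beta, theta):
    # E_{alpha,beta}(theta); skipping zero terms realizes the convention
    # x*ln(x) -> 0 as x -> 0, i.e. the continuous extension of Theorem 2, (iii).
    return -sum(x * log(x) for x in xi(alpha, beta, theta) if x > 0)

alpha, beta = 0.6, 0.7                           # arbitrary example values
lo, hi = max(0.0, alpha + beta - 1), min(alpha, beta)
grid = [lo + k * (hi - lo) / 1000 for k in range(1001)]
best = max(grid, key=lambda t: entropy(alpha, beta, t))
print(best, alpha * beta)                        # both are (approximately) 0.42
```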

Theorem 2. Let (α, β) ∈ (0, 1)².

(i) For any θ ∈ I°(α, β) one has

E′α,β(θ) = ln( ξ2(θ)ξ3(θ) / (ξ1(θ)ξ4(θ)) ).

(ii) The function Eα,β(θ) strictly increases on the interval (ℓ(α, β), αβ] and strictly decreases on the interval [αβ, r(α, β)), having a global maximum at θ = αβ.

(iii) The function Eα,β(θ) can be extended uniquely to a continuous function on the closed interval I(α, β), which strictly increases on the interval [ℓ(α, β), αβ] and strictly decreases on the interval [αβ, r(α, β)].

(iv) One has I(α, β) = I(β, α) and Eα,β = Eβ,α.

Proof. (i) We have

E′α,β(θ) = − Σ_{k=1}^{4} ξ′k(θ) ln(ξk(θ)) − Σ_{k=1}^{4} ξ′k(θ) = ln( ξ2(θ)ξ3(θ) / (ξ1(θ)ξ4(θ)) ),

since ξ′1(θ) = ξ′4(θ) = 1 and ξ′2(θ) = ξ′3(θ) = −1, so that the second sum vanishes.

(ii) The equation E′α,β(θ) = 0 (resp., the inequality E′α,β(θ) > 0) is equivalent to θ = αβ (resp., θ < αβ).

(iii) According to Corollary 1, one or two of the functions ξk(θ) vanish at each endpoint of the interval I(α, β), and all functions ξk(θ) are strictly positive on the interior I°(α, β). For a fixed endpoint a of I(α, β) the limit lim_{θ→a} Eα,β(θ) exists, and we extend Eα,β(θ) as a continuous function at the point a.

(iv) We have I(α, β) = I(β, α), and the transposition of the events A and B yields the transposition of the functions ξ2(θ) and ξ3(θ).

The continuous function Eα,β(θ), θ ∈ I(α, β), is said to be the entropy function of Yule's pairs of type (α, β), and its value at θ = θ(A,B) is called the entropy of the Yule's pair (A, B) of type (α, β). Theorem 2 immediately implies:

Corollary 2. Let (α, β) ∈ (0, 1)². The following three statements are equivalent:

(i) One has θ0 = αβ.

(ii) If a Yule's pair (A, B) of type (α, β) satisfies the equality ξ1^(A,B) = θ0, then the events A and B are independent.

(iii) The entropy function Eα,β(θ) of Yule's pairs of type (α, β) attains its global maximum at the point θ0.

5.2. Degree of Dependence of Pairs of Events

We "normalize" the entropy function by composing the functions Ex,p (9) and 2Ea,p (afi) — Exfi (9) on the intervals of their increase by the appropriate linear functions and obtain for any pair (a, fi) € (0,1)2 a continuous function ea,fi: I(a, fi) ^ [—1,1]. In accord with Theorem 2, (iv), we

have eaft = epA. The value of the function eaft at 9 E I (a, ft), 9 =

9(A,B),

is said to be degree of

dependence of the events A and B with a = Pr( A), ft = Pr(B).

The function eaft strictly increases on the interval I (a, ft) from —1 to 1 and attains value 0 at the point aft. The events A and B are said to be negatively dependent if 9(A,B) < aft and positively dependent if 9(A,B) > aft. When 9(A,B) = aft the events A and B are independent (the entropy is maximal). In a small neighbourhood of the left endpoint l(a, ft) of the interval I (a, ft) the dependence is negatively strong, with "maximum" 1 = | — 1| at 9 = i(a, ft) (if attainable). In a small neighbourhood of the right endpoint r(a, ft) of the interval I (a, ft) the dependence is positively strong, with maximum 1 at 9 = r(a, ft) (if attainable). In both cases, the entropy is minimal at the endpoints l(a, ft) and r(a, ft) of the corresponding semi-intervals. Note that in a small neighbourhood of the point 9 = aft the events A and B are "almost independent" (the entropy is close to its maximum).
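The normalization just described can be sketched in a few lines of Python; this is our reading of the construction (anticipating formula (7) of Theorem 3 below), not a transcription of the Excel program from Remark 1:

```python
from math import log

def entropy(alpha, beta, theta):
    parts = (theta, alpha - theta, beta - theta, 1 - alpha - beta + theta)
    return -sum(x * log(x) for x in parts if x > 0)

def degree(alpha, beta, theta):
    # e_{alpha,beta}: I(alpha, beta) -> [-1, 1], a linear rescaling of the
    # entropy deviation on each side of the maximum at theta = alpha*beta.
    lo, hi = max(0.0, alpha + beta - 1), min(alpha, beta)
    e0 = entropy(alpha, beta, alpha * beta)
    if theta <= alpha * beta:
        return -(e0 - entropy(alpha, beta, theta)) / (e0 - entropy(alpha, beta, lo))
    return (e0 - entropy(alpha, beta, theta)) / (e0 - entropy(alpha, beta, hi))

alpha, beta = 0.6, 0.7               # arbitrary example values
print(degree(alpha, beta, 0.30))     # -1.0: maximal negative dependence
print(degree(alpha, beta, 0.42))     # ~0.0: independence (theta = alpha*beta)
print(degree(alpha, beta, 0.60))     #  1.0: maximal positive dependence
```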

Remark 1. One can find below the link to a simple Excel program which calculates the degree of dependence of two events in a sample space with equally likely outcomes: http://www.math.bas.bg/algebra/valentiniliev/

5.3. A Glance at Information Theory

The experiment J from (1) is the joint experiment (see [6, Part I, Section 6]) of two simpler binary trials: A = A ∪ Aᶜ and B = B ∪ Bᶜ with Pr(A) = α, Pr(B) = β. The average quantity of information in one of the experiments A and B relative to the other (see [2, §1]) is defined in this particular case by the formula

I(A, B) = ξ1 ln( ξ1/(αβ) ) + ξ2 ln( ξ2/(α(1 − β)) ) + ξ3 ln( ξ3/((1 − α)β) ) + ξ4 ln( ξ4/((1 − α)(1 − β)) ). (6)

The definition of the degree function eα,β(θ) and (6) immediately yield the following:

Theorem 3. (i) One has

I(A, B)(θ) = Eα,β(αβ) − Eα,β(θ)

for all θ ∈ I(α, β).

(ii) One has

eα,β(θ) = −(Eα,β(αβ) − Eα,β(θ)) / (Eα,β(αβ) − Eα,β(ℓ(α, β))) if ℓ(α, β) ≤ θ ≤ αβ,
eα,β(θ) = (Eα,β(αβ) − Eα,β(θ)) / (Eα,β(αβ) − Eα,β(r(α, β))) if αβ ≤ θ ≤ r(α, β). (7)
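Theorem 3, (i) can be checked numerically; the sketch below (our illustration, with arbitrary values α = 0.6, β = 0.7, θ = 0.5) computes the mutual information by formula (6) and compares it with the entropy deviation:

```python
from math import log

alpha, beta, theta = 0.6, 0.7, 0.5
x1, x2, x3, x4 = theta, alpha - theta, beta - theta, 1 - alpha - beta + theta

# Mutual information computed directly by formula (6).
mi = (x1 * log(x1 / (alpha * beta))
      + x2 * log(x2 / (alpha * (1 - beta)))
      + x3 * log(x3 / ((1 - alpha) * beta))
      + x4 * log(x4 / ((1 - alpha) * (1 - beta))))

# Entropy deviation of Theorem 3, (i).
def entropy(t):
    parts = (t, alpha - t, beta - t, 1 - alpha - beta + t)
    return -sum(x * log(x) for x in parts if x > 0)

print(mi, entropy(alpha * beta) - entropy(theta))   # the two values agree
```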

Remark 2. Since

Eα,β(αβ) − Eα,β(ℓ(α, β)) = max{ I(A, B)(τ) : ℓ(α, β) ≤ τ ≤ αβ },
Eα,β(αβ) − Eα,β(r(α, β)) = max{ I(A, B)(τ) : αβ ≤ τ ≤ r(α, β) },

we can write the equality (7) in the form

eα,β(θ) = −I(A, B)(θ) / max{ I(A, B)(τ) : ℓ(α, β) ≤ τ ≤ αβ } if ℓ(α, β) ≤ θ ≤ αβ,
eα,β(θ) = I(A, B)(θ) / max{ I(A, B)(τ) : αβ ≤ τ ≤ r(α, β) } if αβ ≤ θ ≤ r(α, β).

Part (ii) of the above theorem implies:

Corollary 3. The degree of dependence of two events does not depend on the choice of unit of information.

The graphs of Eα,β and eα,β for some particular (α, β) ∈ (0, 1)² are presented in Figures 2 and 3.

5.4. Application: Description of a Screening Test

According to the Merriam-Webster Dictionary, a screening test is "...a preliminary or abridged test intended to eliminate the less probable members of an experimental series". In other words, some of the equally likely outcomes of a sample space Ω possess a property and in this way form an event A. On the other hand, there exists an event B consisting of all outcomes which appear to have this property after the test is conducted. Thus, we obtain a Yule's pair of events in the sample space Ω. The test does not always work perfectly: sometimes it is negative under the condition that the property is present (a false negative), and sometimes it is positive under the condition that the property is absent (a false positive). Let us suppose that all members of the population are tested and that Pr(A) = α, Pr(B) = β, where (α, β) ∈ (0, 1)². The Yule's pair (A, B) produces the experiment (1) and, in turn, the probability distribution (2) consisting of its results. In the notation introduced in Subsection 4.1, the probability F− of a false negative result is

F−(θ) = ξ2(θ)/α = (α − θ)/α,

and the probability F+ of a false positive result is

F+(θ) = ξ3(θ)/(1 − α) = (β − θ)/(1 − α),

where θ = θ(A,B). The product F−(θ)F+(θ) is a quadratic function of θ which strictly decreases on the interval I(α, β) and assumes the value 0 at its right endpoint r(α, β). In particular, on the complement of any open neighbourhood of r(α, β) in the interval I(α, β) there exists a positive constant K such that F−(θ)F+(θ) ≥ K at every point θ of this complement. In other words, F−(θ) and F+(θ) cannot both be made as small as we want (a kind of uncertainty principle). The conditional probabilities F−(θ) and F+(θ) are statistically acceptable in a small neighbourhood of the point r(α, β), at least one of them being 0 at this point. Their reliability decreases when θ approaches the left endpoint ℓ(α, β) of I(α, β); when θ = ℓ(α, β), at least one of F−(θ) and F+(θ) is equal to 1. In terms of the degree of dependence eα,β(θ), θ = θ(A,B), of the events A and B, this behaviour can be described in the following way: when eα,β(θ) is close to −1 the test is not reliable, and its effectiveness increases as eα,β(θ) approaches 1. In a small neighbourhood of 1 the test is statistically acceptable.

5.5. Application: Effectiveness of Vaccination

Let us consider a population whose members have a particular disease for which a vaccine has been developed. Let A be the set of all those who have recovered and let B be the set of vaccinated members. Then Aᶜ is the set of all fatal outcomes and Bᶜ is the set of unvaccinated members. If α = Pr(A), β = Pr(B), then the degree of dependence eα,β(θ), θ = θ(A,B), of the events A and B measures the effectiveness of the corresponding vaccine. More precisely, when eα,β(θ) is close to −1 the vaccine is counterproductive; its effectiveness increases with eα,β(θ), being negative when eα,β(θ) < 0 and positive when eα,β(θ) > 0. In case eα,β(θ) = 0 the vaccination does not influence the recovery, and the vaccine is highly effective when eα,β(θ) is close to 1.

In his memoir [9, Section I, Table I], G. Udny Yule presents a table used by W. R. Macdonell in [4] in order to show "...the recoveries and deaths amongst vaccinated and unvaccinated patients during the small-pox epidemic at Sheffield in 1887-88", see Table 1.

We have α = 0.88262811, β = 0.899213268, θ = 0.840102063, and eα,β(θ) = 0.268810618. Therefore the results of this vaccination are faintly positive (the recovery is not only due to vaccination!).

Yule's pair (A, B), considered as a screening test, has a statistically acceptable false negative probability Pr(Bᶜ|A) ≈ 0.0482 (the probability that a member is unvaccinated under the condition that he/she recovered). On the other hand, its false positive probability Pr(B|Aᶜ) ≈ 0.4964 (the probability that a member of the population was vaccinated under the condition that he/she died) is not statistically insignificant. Equivalently: the matter of life and death depended on the result of tossing an almost fair coin!
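These figures can be reproduced from the raw counts of Table 1; the following Python sketch (ours, in the spirit of the Excel program from Remark 1 for a discrete uniform space) recovers α, β, θ, the degree of dependence, and the false negative probability:

```python
from math import log

def entropy(alpha, beta, theta):
    parts = (theta, alpha - theta, beta - theta, 1 - alpha - beta + theta)
    return -sum(x * log(x) for x in parts if x > 0)

def degree(alpha, beta, theta):
    lo, hi = max(0.0, alpha + beta - 1), min(alpha, beta)
    e0 = entropy(alpha, beta, alpha * beta)
    if theta <= alpha * beta:
        return -(e0 - entropy(alpha, beta, theta)) / (e0 - entropy(alpha, beta, lo))
    return (e0 - entropy(alpha, beta, theta)) / (e0 - entropy(alpha, beta, hi))

# Table 1 (Sheffield, 1887-88): A = recovered, B = vaccinated, n = 4703.
n = 4703
alpha, beta, theta = 4151 / n, 4229 / n, 3951 / n
print(degree(alpha, beta, theta))    # ~0.2688, the value quoted above
print((alpha - theta) / alpha)       # false negative Pr(Bc|A), ~0.0482
```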

6. Other Measures of Dependence

In this section we assume that Ω is a sample space with equally likely outcomes.

6.1. Yule's Q

The difference δ = Pr(A ∩ B) − Pr(A) Pr(B) = θ − αβ (the deviation from independence) is called the copula between A and B. It is cited by G. Udny Yule in [9, Section I, no 5]. He notes there that the relation δ = ξ1(θ)ξ4(θ) − ξ2(θ)ξ3(θ) is due to Karl Pearson (one of his teachers).

In [9, Section I, no 9], G. Udny Yule introduces his measure of association, first given in [8]:

Q = (ξ1(θ)ξ4(θ) − ξ2(θ)ξ3(θ)) / (ξ1(θ)ξ4(θ) + ξ2(θ)ξ3(θ)) = (θ − αβ) / (2θ² − (2α + 2β − 1)θ + αβ).

It has the necessary properties: (a) Q = 0 if and only if A and B are independent; (b) Q = 1 if and only if A ⊂ B or B ⊂ A; (c) Q = −1 if and only if A ⊂ Bᶜ or Bᶜ ⊂ A. Finally, −1 ≤ Q ≤ 1. In the case of Sheffield's epidemic we have Q = 0.902299648. We define the function

Qα,β(θ) = (θ − αβ) / (2θ² − (2α + 2β − 1)θ + αβ), θ ∈ I(α, β),

which produces Yule's Q (see Figure 7).
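For comparison with the entropy-based degree, a one-function Python sketch (ours) evaluates Qα,β on Sheffield's sample:

```python
def yule_q(alpha, beta, theta):
    # Q_{alpha,beta}(theta) = (theta - alpha*beta) /
    #     (2*theta^2 - (2*alpha + 2*beta - 1)*theta + alpha*beta)
    return (theta - alpha * beta) / (
        2 * theta ** 2 - (2 * alpha + 2 * beta - 1) * theta + alpha * beta)

n = 4703
print(yule_q(4151 / n, 4229 / n, 3951 / n))   # ~0.9023, as quoted above
```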

Remark 3. There are infinitely many functions of the form h(θ) = f(θ)/g(θ) with the properties (a), (b), (c) and −1 ≤ h(θ) ≤ 1, defined on the interval I(α, β), (α, β) ∈ (0, 1)². For example, there exist infinitely many pairs of cubic polynomials f(θ) and g(θ) which work.

6.2. Obreshkoff's Measures of Dependence

The properties of the copula δ are also discussed by N. Obreshkoff in his textbook [5, Chapter 3, §6] and in [3]. In particular, the relation Pr(B|A) = Pr(B) + δ/Pr(A) shows that "...the probability of one of these events increases under the condition that the other comes true in case δ > 0 and decreases in case δ < 0". Moreover, −δ = Pr(A ∩ Bᶜ) − Pr(A) Pr(Bᶜ). The number

ρ(B; A) = Pr(B|A) − Pr(B|Aᶜ) = (θ − αβ) / (α(1 − α)) = δ/(α(1 − α))

is called the coefficient of regression of B with respect to A. It measures the influence of A on B. We have −1 ≤ ρ(B; A) ≤ 1.

It has the following properties: (a) ρ(B; A) = 0 if and only if A and B are independent; (b) ρ(B; A) = 1 if and only if A = B; (c) ρ(B; A) = −1 if and only if Aᶜ = B.

In the above example of the small-pox epidemic at Sheffield we have ρ(B; A) = 0.44819565. We define the functions

ρα,β(θ) = (θ − αβ)/(α(1 − α)), ρβ,α(θ) = (θ − αβ)/(β(1 − β)), θ ∈ I(α, β),

which produce the corresponding coefficients of regression (see Figures 4 and 5).

The numbers ρ(B; A) and ρ(A; B) have the same sign and, in general, are not equal. Their geometric mean

R(A, B) = ±√(ρ(B; A)ρ(A; B)) = (θ − αβ) / √(αβ(1 − α)(1 − β)),

where ± is chosen to be the common sign of ρ(B; A) and ρ(A; B), is said to be the coefficient of correlation between A and B. This coefficient has the above properties (a)-(c). In the case of Sheffield's epidemic we have R(A, B) = 0.4791876. We define the function

Rα,β(θ) = (θ − αβ) / √(αβ(1 − α)(1 − β)), θ ∈ I(α, β),

which produces the corresponding coefficient of correlation (see Figure 6).
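The regression and correlation coefficients are immediate to compute; this sketch (ours) reproduces the Sheffield values quoted in this section:

```python
from math import sqrt

n = 4703
alpha, beta, theta = 4151 / n, 4229 / n, 3951 / n
delta = theta - alpha * beta   # the copula

rho_b_on_a = delta / (alpha * (1 - alpha))   # rho(B; A), ~0.4482
rho_a_on_b = delta / (beta * (1 - beta))     # rho(A; B)
corr = delta / sqrt(alpha * beta * (1 - alpha) * (1 - beta))   # R(A, B), ~0.4792

print(rho_b_on_a, rho_a_on_b, corr)
```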

7. Conclusions

This paper presents an original approach to the problem of measuring the degree of dependence of two events A and B in a probability space. It uses the only reliable way of evaluating the power of relations between these events, borrowed from statistical physics and information theory: the utilization of Boltzmann-Shannon entropy. More precisely, we start with the joint experiment assembled from the two binary trials A ∪ Aᶜ and B ∪ Bᶜ. The four probabilities of the results of this experiment constitute a variable probability distribution and satisfy a simple linear system whose general solution (ξ1, ξ2, ξ3, ξ4) depends on one parameter θ (the probability of the intersection A ∩ B). Note that due to the natural constraints on ξk, k = 1, 2, 3, 4, θ varies throughout a closed interval I(α, β), where α = Pr(A) and β = Pr(B). We modify the entropy function of the distribution (ξ1(θ), ξ2(θ), ξ3(θ), ξ4(θ)) in a natural way and obtain the degree of dependence function eα,β: I(α, β) → [−1, 1]. By definition, if θ = Pr(A ∩ B), then eα,β(θ) measures the intensity of the relations between A and B.

Our degree of dependence is still within its probation period. In its defence it can be said that it evaluates the mutual information exchanged between the random objects A and B and, moreover, does not depend on the choice of the unit of information. It also plausibly reflects the behaviour of a screening test or the impact of a vaccination on the survival of a person. The function eα,β(θ) can also be used for measuring the effectiveness of a drug or medical treatment, the association of adverse events with the use of some particular drug, the association of certain events with stock market prices, etc.

8. Figures and Tables

        B      Bᶜ     Total
A       3951   200    4151
Aᶜ      278    274    552
Total   4229   474    4703

Table 1: Recoveries (A) and deaths (Aᶜ) amongst vaccinated (B) and unvaccinated (Bᶜ) patients during the small-pox epidemic at Sheffield in 1887-88 (data from [9])

In all graphs below we use Sheffield's sample data.

Figure 2: Graph of the entropy function Eα,β(θ)

Figure 3: Graph of the degree function eα,β(θ)

Figure 4: Comparison of the graphs of eα,β(θ) and ρβ,α(θ)

Figure 5: Comparison of the graphs of eα,β(θ) and ρα,β(θ)

Figure 6: Comparison of the graphs of eα,β(θ) and Rα,β(θ)

Figure 7: Comparison of the graphs of eα,β(θ) and Qα,β(θ)

Acknowledgements

It is a pleasure for me to cordially thank Boyan Dimitrov, Kettering University, MI, USA, whose notes and suggestions were invaluable in making this paper more readable.

I would also like to express my sincere thanks to Dimitar Guelev for sending me references to the works of A. N. Kolmogorov on information theory, to Hristo Iliev for his mastery in drawing the graphs in this paper, and to the administration of the Institute of Mathematics and Informatics at the Bulgarian Academy of Sciences for creating perfect and safe working conditions.


This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

9. Declaration of Conflicting Interests

The Author declares that there is no conflict of interest.

References

[1] Dimitrov B. (2010). Some Obreshkov Measures of Dependence and Their Use. Comptes rendus de l'Academie bulgare des Sciences, 63:5-18.

[2] Gelfand I. M., Kolmogorov A. N., Yaglom A. M. (1993). Amount of Information and Entropy for Continuous Distributions. In: Selected Works of A. N. Kolmogorov, Volume III: Information Theory and the Theory of Algorithms (Mathematics and Its Applications), pages 33-56, Springer Science+Business Media, Dordrecht.

[3] Kolmogorov A. N. (1956). Foundations of the Theory of Probability, Chelsea Publishing Company, New York.

[4] Macdonell W. R. (1902). On the influence of previous vaccination in cases of small-pox. Biometrika, 1:375-383.

[5] Obreshkoff N. (1963). Theory of Probability, Nauka i Izkustvo, Sofia (in Bulgarian).

[6] Shannon C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27:379-423.

[7] Shannon C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27:623-656.

[8] Yule G. Udny (1900). On the Association of Attributes in Statistics. Phil. Trans. Roy. Soc., A 194:257-319.

[9] Yule G. Udny (1912). On the Methods of Measuring Association Between Two Attributes. Journal of the Royal Statistical Society, 75:579-652.
