
CC BY

On the Degree of Mutual Dependence of Three Events

Valentin Vankov Iliev

Institute of Mathematics and Informatics Bulgarian Academy of Sciences Sofia, Bulgaria viliev@math.bas.bg

"...one of the most important problems in the philosophy of natural sciences is ... to make precise premises which would make it possible to regard any given real events as independent."

A. N. Kolmogorov,

Foundations of the Theory of Probability

Abstract

We define the degree of mutual dependence of three events in a probability space by using the Boltzmann-Shannon entropy function of an appropriate variable distribution produced by these events and depending on four parameters varying, in general, within a polytope. It turns out that the entropy function attains its absolute maximum exactly when the three events are mutually independent and its absolute minimum at some vertices of the polytope where the events are "maximally" dependent. By composing the entropy function with an appropriate linear function we obtain a continuous "degree of mutual dependence" function with the same domain and the interval [0,1] as a target. It attains value 0 when the events are mutually independent (the entropy is maximal) and value 1 when they are "maximally" dependent (the entropy is minimal). A link is available for downloading Java code which evaluates the degree of mutual dependence of three events in the classical case of a sample space with equally likely outcomes.

Keywords: entropy; average information; degree of dependence; probability space; probability distribution; experiment in a sample space; linear system; affine isomorphism; classification space.

1. Introduction

In our papers [6] and [7] we introduce and study a measure of dependence of two events in a probability space, based on the fundamental notion of Boltzmann-Shannon entropy. The present work is written as a natural conceptual continuation of the above papers for the case of three events $A_1$, $A_2$, $A_3$. By analogy, we consider the joint experiment $J_3$ of the corresponding three binary trials, whose probability distribution gives rise to the entropy function that, in turn, measures the mutual dependence of these events.

In accord with [6, 4.1], any one of the three pairs of events $A_i$, $A_j$, $1 \le i < j \le 3$, produces a joint experiment $J_{ij}$ whose probability distribution satisfies the linear system (3). Since the partition $J_3$ of the sample space is finer than each partition $J_{ij}$, its probability distribution $(\xi_1,\ldots,\xi_8)$ satisfies the linear system (5). After fixing the probabilities $\alpha = (\alpha_1,\alpha_2,\alpha_3)$ of the components of Yule's triple $A = (A_1,A_2,A_3)$, the general solution of the last system depends on four parameters $\theta = (\theta_0,\ldots,\theta_3)$ chosen among the $\xi_k$'s. Taking into account that the $\xi_k$'s are probabilities, we obtain that $\theta$ varies within a subset $I_7(\alpha)$ of $\mathbb{R}^4$, which is described in Theorem 1. In case $\alpha \in (0,1)^3$ the set $I_7(\alpha)$ is a polytope, see [2, Ch. 12]. Since the system of linear inequalities (9) which defines the polytope $I_7(\alpha)$ is minimal (Lemma 2), we can apply the machinery from the previous citation in order to use the corresponding properties of this polytope.

The 7-tuples $(\alpha,\theta)$ vary within a polytope $T_7 \subset \mathbb{R}^7$ which is the inverse image of the 7-dimensional simplex $\Delta_7$ via the affine isomorphism (7). The projection $p(\alpha,\theta) = \alpha$ produces the fibre bundle $(T_7, p, [0,1]^3)$ with fibre $p^{-1}(\alpha) = C_7(\alpha)$, where $C_7(\alpha) = \{\alpha\} \times I_7(\alpha)$; for the definition see [5, Part I, 2, 1.1]. This fibre bundle is used for classification of all equivalence classes of Yule's triples with given $\alpha$ and $\theta$, cf. [6, Theorem 1]. An isomorphic fibre bundle can be used for classification of all probability distributions produced by the above equivalence classes of Yule's triples. The general patterns of these two fibre bundles are described in terms of very elementary algebraic geometry at the end of Subsection 4.2, where the classification Theorem 2 is also formulated.

Corollary 1, (ii), yields that $0 < \xi_k(\theta) < 1$, $k = 1,\ldots,8$, if and only if $\theta \in \mathring{I}_7(\alpha)$. In particular, the interior $\mathring{I}_7(\alpha)$ is the natural domain of the entropy function $E_\alpha(\theta)$ of the probability distribution $(\xi_k(\theta))_{k=1}^{8}$, defined in (11).

In Lemma 4 we prove that $E_\alpha(\theta)$ is a strictly concave function that can be extended in a unique way to a continuous function on the polytope $I_7(\alpha)$. Moreover, its continuous extension $\widehat{E}_\alpha$ is also a strictly concave function. In Corollary 2 we show that all permutations of the members of Yule's triple $A = (A_1,A_2,A_3)$ have the same entropy.

Subsection 5.2 is devoted to finding the set of critical points of the entropy function $E_\alpha(\theta)$. It turns out that this set is not empty: the special point $\theta^{(\alpha)} \in \mathring{I}_7(\alpha)$ defined by the formulae (10) is critical, see Lemma 6.

Since the Hessian of $E_\alpha(\theta)$ is a negative definite quadratic form everywhere in its domain $\mathring{I}_7(\alpha)$, we obtain that the set of local maxima of the entropy function $E_\alpha(\theta)$ coincides with the set of its critical points, see Lemma 7.

In accord with the Weierstrass theorem, the extended entropy function $\widehat{E}_\alpha(\theta)$ attains an absolute maximum and an absolute minimum on its compact domain $I_7(\alpha)$. Theorems 3 and 4 make this statement more precise. The former asserts that $\widehat{E}_\alpha(\theta)$ has a unique absolute maximum at the point $\theta^{(\alpha)}$. The latter uses the structure of the frontier of the polytope $I_7(\alpha)$, described, for example, in [2, Chapter 12, 12.1], and shows that $\widehat{E}_\alpha(\theta)$ attains its absolute minimum only at some of its vertices. We note here an analogy with the simplex method.

Subsection 6.1 contains two statements that motivate the use of the extended entropy function $\widehat{E}_\alpha(\theta)$ for measuring the power of mutual relations among three events. In Lemma 8 we show that the components of a Yule's triple are mutually independent if and only if the corresponding $\theta$ coincides with $\theta^{(\alpha)}$. In other words, we observe mutual independence exactly when $\widehat{E}_\alpha(\theta)$ attains its absolute maximum, which conforms with our intuition. In the case of a sample space with equally likely outcomes, Lemma 9 establishes the set-theoretic relations among the components of a Yule's triple when the corresponding $\theta$ lies on any one of the 3-faces of the polytope $I_7(\alpha)$. Intuitively, the "maximally" tight fit is observed at the vertices, some of which are points of absolute minimum of $\widehat{E}_\alpha(\theta)$.

Let $A = (A_1,A_2,A_3)$ be a Yule's triple with $\alpha = (\alpha_1,\alpha_2,\alpha_3)$, $\alpha_1 = \Pr(A_1)$, $\alpha_2 = \Pr(A_2)$, $\alpha_3 = \Pr(A_3)$. In the final Subsection 6.2 we compose the extended entropy function $\widehat{E}_\alpha(\theta)$ with a linear function and define a function $e_\alpha\colon I_7(\alpha) \to [0,1]$, whose value at any $\theta \in I_7(\alpha)$ corresponding to $A$ is said to be the degree of dependence of the events $A_1$, $A_2$, $A_3$. Note that $e_\alpha(\theta^{(\alpha)}) = 0$ (the events $A_1$, $A_2$, $A_3$ are mutually independent) and $e_\alpha(\theta') = 1$ for any vertex $\theta'$ where $\widehat{E}_\alpha(\theta)$ attains its absolute minimum (the events $A_1$, $A_2$, $A_3$ are maximally dependent).

2. Definitions and Notation

Let $(\Omega, \mathcal{A}, \Pr)$ be a probability space with set of outcomes $\Omega$, $\sigma$-algebra $\mathcal{A}$, and probability function $\Pr$. In this paper we use only the structure of Boolean algebra on $\mathcal{A}$.

We introduce the following notation:

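Given events $A_1, A_2, A_3$ from $\mathcal{A}$, we set $A = (A_1,A_2,A_3) \in \mathcal{A}^3$;

$R$ is the range of the probability function $\Pr\colon \mathcal{A} \to \mathbb{R}$;

Given $\alpha_1, \alpha_2, \alpha_3 \in \mathbb{R}$, we set $\alpha = (\alpha_1,\alpha_2,\alpha_3)$;

Given $\theta_0, \theta_1, \theta_2, \theta_3 \in \mathbb{R}$, we set $\theta = (\theta_0,\theta_1,\theta_2,\theta_3)$;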

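$I(\alpha_i,\alpha_j) = [\max(0,\alpha_i+\alpha_j-1),\ \min(\alpha_i,\alpha_j)]$, $1 \le i < j \le 3$, see [6, 4.1];

$\tilde{I}(\alpha_i,\alpha_j) = [\max(0,\alpha_i-\alpha_j),\ \min(\alpha_i,1-\alpha_j)]$, $1 \le i < j \le 3$;

$[(\alpha)]$ is the fiber of the surjective map

$$\mathcal{A}^3 \to R^3, \quad (A_1,A_2,A_3) \mapsto (\Pr(A_1),\Pr(A_2),\Pr(A_3)),$$

over $\alpha \in R^3$;

$[(\alpha_i,\alpha_j)]$ is the fiber of the surjective map

$$\mathcal{A}^2 \to R^2, \quad (A_i,A_j) \mapsto (\Pr(A_i),\Pr(A_j)),$$

over $(\alpha_i,\alpha_j) \in R^2$, $1 \le i < j \le 3$;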

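$\theta_0^{(A)} = \Pr(A_1 \cap A_2 \cap A_3)$, $\theta_1^{(A)} = \Pr(A_1^c \cap A_2 \cap A_3)$, $\theta_2^{(A)} = \Pr(A_1 \cap A_2^c \cap A_3)$, $\theta_3^{(A)} = \Pr(A_1 \cap A_2 \cap A_3^c)$, $A \in \mathcal{A}^3$;

$\theta^{(A)} = (\theta_0^{(A)}, \theta_1^{(A)}, \theta_2^{(A)}, \theta_3^{(A)})$;

$[(\alpha,\theta)]$ is the fiber of the map $[(\alpha)] \to \mathbb{R}^4$, $A \mapsto \theta^{(A)}$, over any $\theta \in \mathbb{R}^4$, and $R(\alpha)$ is its range.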

We note that the fibers $[(\alpha)]$ for $\alpha \in R^3$ form a partition of $\mathcal{A}^3$ and the fibers $[(\alpha,\theta)]$ for $\theta \in R(\alpha)$ form a partition of $[(\alpha)]$.

The members of the fiber $[(\alpha)]$ are said to be Yule's triples of type $(\alpha)$. The members of the fiber $[(\alpha,\theta)]$ are called Yule's triples of type $(\alpha,\theta)$.

3. Methods

In this paper we are using fundamentals of:

• Linear algebra,

• Affine geometry,

• Polytope theory,

• Fibre bundles,

• Real algebraic geometry.

4. Classification of Yule's Triples and Their Probability Distributions

4.1. The Probability Distribution of a Yule's Triple

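Any ordered triple $A = (A_1,A_2,A_3) \in \mathcal{A}^3$ produces three experiments of the form

$$J_{ij} = (A_i \cap A_j) \cup (A_i \cap A_j^c) \cup (A_i^c \cap A_j) \cup (A_i^c \cap A_j^c), \quad 1 \le i < j \le 3,$$

and the experiment

$$\begin{aligned}
J_3 ={}& (A_1 \cap A_2 \cap A_3) \cup (A_1 \cap A_2 \cap A_3^c) \cup (A_1 \cap A_2^c \cap A_3) \cup (A_1 \cap A_2^c \cap A_3^c) \cup {}\\
& (A_1^c \cap A_2 \cap A_3) \cup (A_1^c \cap A_2 \cap A_3^c) \cup (A_1^c \cap A_2^c \cap A_3) \cup (A_1^c \cap A_2^c \cap A_3^c)
\end{aligned}$$

(cf. [8, I, §5]). We introduce the following notation: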

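$$\xi_1^{(A_i,A_j)} = \Pr(A_i \cap A_j), \quad \xi_2^{(A_i,A_j)} = \Pr(A_i \cap A_j^c), \quad \xi_3^{(A_i,A_j)} = \Pr(A_i^c \cap A_j), \quad \xi_4^{(A_i,A_j)} = \Pr(A_i^c \cap A_j^c), \quad 1 \le i < j \le 3.$$

Moreover, we set

$$\begin{aligned}
\xi_1^{(A)} &= \Pr(A_1 \cap A_2^c \cap A_3^c), & \xi_2^{(A)} &= \Pr(A_1^c \cap A_2 \cap A_3^c),\\
\xi_3^{(A)} &= \Pr(A_1^c \cap A_2^c \cap A_3), & \xi_4^{(A)} &= \Pr(A_1^c \cap A_2^c \cap A_3^c),\\
\xi_5^{(A)} &= \Pr(A_1 \cap A_2 \cap A_3), & \xi_6^{(A)} &= \Pr(A_1^c \cap A_2 \cap A_3),\\
\xi_7^{(A)} &= \Pr(A_1 \cap A_2^c \cap A_3), & \xi_8^{(A)} &= \Pr(A_1 \cap A_2 \cap A_3^c).
\end{aligned} \tag{1}$$

The above probabilities satisfy the following identities:

$$\begin{aligned}
\xi_5^{(A)} + \xi_8^{(A)} &= \xi_1^{(A_1,A_2)}, & \xi_1^{(A)} + \xi_7^{(A)} &= \xi_2^{(A_1,A_2)},\\
\xi_2^{(A)} + \xi_6^{(A)} &= \xi_3^{(A_1,A_2)}, & \xi_3^{(A)} + \xi_4^{(A)} &= \xi_4^{(A_1,A_2)},\\
\xi_5^{(A)} + \xi_7^{(A)} &= \xi_1^{(A_1,A_3)}, & \xi_1^{(A)} + \xi_8^{(A)} &= \xi_2^{(A_1,A_3)},\\
\xi_3^{(A)} + \xi_6^{(A)} &= \xi_3^{(A_1,A_3)}, & \xi_2^{(A)} + \xi_4^{(A)} &= \xi_4^{(A_1,A_3)},\\
\xi_5^{(A)} + \xi_6^{(A)} &= \xi_1^{(A_2,A_3)}, & \xi_2^{(A)} + \xi_8^{(A)} &= \xi_2^{(A_2,A_3)},\\
\xi_3^{(A)} + \xi_7^{(A)} &= \xi_3^{(A_2,A_3)}, & \xi_1^{(A)} + \xi_4^{(A)} &= \xi_4^{(A_2,A_3)}.
\end{aligned} \tag{2}$$

For any $1 \le i < j \le 3$ and any $(A_i,A_j) \in [(\alpha_i,\alpha_j)]$, the probability distribution

$$(\xi_1^{(i,j)}, \xi_2^{(i,j)}, \xi_3^{(i,j)}, \xi_4^{(i,j)}) = (\xi_1^{(A_i,A_j)}, \xi_2^{(A_i,A_j)}, \xi_3^{(A_i,A_j)}, \xi_4^{(A_i,A_j)})$$

satisfies the linear system

$$\begin{aligned}
\xi_1^{(i,j)} + \xi_2^{(i,j)} &= \alpha_i\\
\xi_3^{(i,j)} + \xi_4^{(i,j)} &= 1 - \alpha_i\\
\xi_1^{(i,j)} + \xi_3^{(i,j)} &= \alpha_j\\
\xi_2^{(i,j)} + \xi_4^{(i,j)} &= 1 - \alpha_j.
\end{aligned} \tag{3}$$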

The identities (2) and the linear systems (3) yield that for any ordered triple $A \in [(\alpha)]$, the probability distribution

$$(\xi_1, \xi_2, \xi_3, \xi_4, \xi_5, \xi_6, \xi_7, \xi_8) = (\xi_1^{(A)}, \xi_2^{(A)}, \xi_3^{(A)}, \xi_4^{(A)}, \xi_5^{(A)}, \xi_6^{(A)}, \xi_7^{(A)}, \xi_8^{(A)}) \tag{4}$$

satisfies the linear system

$$\begin{aligned}
\xi_1 + \xi_5 + \xi_7 + \xi_8 &= \alpha_1\\
\xi_2 + \xi_3 + \xi_4 + \xi_6 &= 1 - \alpha_1\\
\xi_2 + \xi_5 + \xi_6 + \xi_8 &= \alpha_2\\
\xi_1 + \xi_3 + \xi_4 + \xi_7 &= 1 - \alpha_2\\
\xi_3 + \xi_5 + \xi_6 + \xi_7 &= \alpha_3\\
\xi_1 + \xi_2 + \xi_4 + \xi_8 &= 1 - \alpha_3.
\end{aligned} \tag{5}$$

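Let us denote for short $\xi = (\xi_1, \xi_2, \xi_3, \xi_4, \xi_5, \xi_6, \xi_7, \xi_8)$ and let $H_7$ be the affine hyperplane in $\mathbb{R}^8$ with equation $\xi_1 + \xi_2 + \xi_3 + \xi_4 + \xi_5 + \xi_6 + \xi_7 + \xi_8 = 1$. For any $\alpha \in \mathbb{R}^3$ the solutions of (5) depend on four parameters, say $\theta_0 = \xi_5$, $\theta_1 = \xi_6$, $\theta_2 = \xi_7$, $\theta_3 = \xi_8$, and form a 4-dimensional affine space $\ell_\alpha$ in $H_7$ with parametric representation

$$\begin{aligned}
\xi_1 &= \alpha_1 - \theta_0 - \theta_2 - \theta_3\\
\xi_2 &= \alpha_2 - \theta_0 - \theta_1 - \theta_3\\
\xi_3 &= \alpha_3 - \theta_0 - \theta_1 - \theta_2\\
\xi_4 &= 1 - \alpha_1 - \alpha_2 - \alpha_3 + 2\theta_0 + \theta_1 + \theta_2 + \theta_3\\
\xi_5 &= \theta_0\\
\xi_6 &= \theta_1\\
\xi_7 &= \theta_2\\
\xi_8 &= \theta_3.
\end{aligned} \tag{6}$$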

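The map

$$\iota_7\colon \mathbb{R}^7 \to H_7, \quad (\alpha,\theta) \mapsto \xi, \tag{7}$$

defined by formulae (6) is an affine isomorphism with inverse affine isomorphism

$$\chi_7\colon H_7 \to \mathbb{R}^7, \quad \xi \mapsto (\xi_1 + \xi_5 + \xi_7 + \xi_8,\ \xi_2 + \xi_5 + \xi_6 + \xi_8,\ \xi_3 + \xi_5 + \xi_6 + \xi_7,\ \xi_5,\ \xi_6,\ \xi_7,\ \xi_8). \tag{8}$$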
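The maps (7) and (8) are easy to check numerically. The following Python sketch implements the parametric representation (6) and the inverse (8) with exact rational arithmetic and confirms that they undo each other at one sample point. The function names `iota7` and `chi7` and the sample values are illustrative choices, not taken from the paper.

```python
from fractions import Fraction as F

def iota7(alpha, theta):
    """Parametric representation (6): (alpha, theta) -> xi = (xi_1, ..., xi_8)."""
    a1, a2, a3 = alpha
    t0, t1, t2, t3 = theta
    return (
        a1 - t0 - t2 - t3,
        a2 - t0 - t1 - t3,
        a3 - t0 - t1 - t2,
        1 - a1 - a2 - a3 + 2 * t0 + t1 + t2 + t3,
        t0, t1, t2, t3,
    )

def chi7(xi):
    """Inverse map (8): xi -> (alpha, theta)."""
    x1, x2, x3, x4, x5, x6, x7, x8 = xi
    alpha = (x1 + x5 + x7 + x8, x2 + x5 + x6 + x8, x3 + x5 + x6 + x7)
    return alpha, (x5, x6, x7, x8)

# Arbitrary sample point (an assumption made only for this check).
alpha = (F(1, 2), F(1, 3), F(1, 4))
theta = (F(1, 30), F(1, 20), F(1, 10), F(1, 15))
xi = iota7(alpha, theta)
```

Because both maps are affine, agreement at sample points is only a sanity check; the identity $\chi_7 \circ \iota_7 = \mathrm{id}$ follows directly from substituting (6) into (8).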

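The symmetric group $S_3$ acts on $\mathbb{R}^7$ by the rule $\sigma(\alpha,\theta) = (\sigma\alpha, \sigma\theta)$, where $\sigma\alpha = (\alpha_{\sigma^{-1}(1)}, \alpha_{\sigma^{-1}(2)}, \alpha_{\sigma^{-1}(3)})$ and $\sigma\theta = (\theta_0, \theta_{\sigma^{-1}(1)}, \theta_{\sigma^{-1}(2)}, \theta_{\sigma^{-1}(3)})$, $\sigma \in S_3$. When necessary, we write $\sigma_\alpha$ and $\sigma_\theta$ in order to distinguish the actions of $\sigma$ on $\alpha$'s and $\theta$'s, respectively.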

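On the other hand, we transport the action of $S_3$ on the set $\{6,7,8\}$ via the bijection $1 \mapsto 6$, $2 \mapsto 7$, $3 \mapsto 8$ and define an action of $S_3$ on the hyperplane $H_7$ by the formula

$$\sigma\xi = (\xi_{\sigma^{-1}(1)}, \xi_{\sigma^{-1}(2)}, \xi_{\sigma^{-1}(3)}, \xi_4, \xi_5, \xi_{\sigma^{-1}(6)}, \xi_{\sigma^{-1}(7)}, \xi_{\sigma^{-1}(8)}).$$

Lemma 1. The affine isomorphism $\iota_7$ is also an isomorphism of $S_3$-sets: $\iota_7(\sigma(\alpha,\theta)) = \sigma\iota_7(\alpha,\theta)$.

Proof. We check the statement for a set of generators of $S_3$. For $\sigma = (12)$ we have

$$\xi_1((12)(\alpha,\theta)) = \xi_2(\alpha,\theta), \quad \xi_2((12)(\alpha,\theta)) = \xi_1(\alpha,\theta), \quad \xi_6((12)(\alpha,\theta)) = \xi_7(\alpha,\theta), \quad \xi_7((12)(\alpha,\theta)) = \xi_6(\alpha,\theta).$$

For $\sigma = (23)$ we have

$$\xi_2((23)(\alpha,\theta)) = \xi_3(\alpha,\theta), \quad \xi_3((23)(\alpha,\theta)) = \xi_2(\alpha,\theta), \quad \xi_7((23)(\alpha,\theta)) = \xi_8(\alpha,\theta), \quad \xi_8((23)(\alpha,\theta)) = \xi_7(\alpha,\theta).$$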
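The equivariance stated in Lemma 1 can also be checked by brute force over all six permutations. The sketch below is only an illustration under stated assumptions: the helper names (`act_on_pair`, `act_on_xi`) and the sample point are hypothetical, and each permutation is represented by the 0-based list of values $\sigma^{-1}(1), \sigma^{-1}(2), \sigma^{-1}(3)$.

```python
from fractions import Fraction
from itertools import permutations

def iota7(alpha, theta):
    """Parametric representation (6)."""
    a1, a2, a3 = alpha
    t0, t1, t2, t3 = theta
    return (a1 - t0 - t2 - t3, a2 - t0 - t1 - t3, a3 - t0 - t1 - t2,
            1 - a1 - a2 - a3 + 2 * t0 + t1 + t2 + t3, t0, t1, t2, t3)

def act_on_pair(inv, alpha, theta):
    """sigma(alpha, theta): permute alpha and theta_1..theta_3, keep theta_0."""
    return (tuple(alpha[inv[i]] for i in range(3)),
            (theta[0],) + tuple(theta[1 + inv[i]] for i in range(3)))

def act_on_xi(inv, xi):
    """sigma(xi): permute xi_1..xi_3 and xi_6..xi_8 alike, keep xi_4 and xi_5."""
    return (tuple(xi[inv[i]] for i in range(3)) + (xi[3], xi[4])
            + tuple(xi[5 + inv[i]] for i in range(3)))

# Arbitrary sample point (an assumption made only for this check).
alpha = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 4))
theta = (Fraction(1, 30), Fraction(1, 20), Fraction(1, 10), Fraction(1, 15))

equivariant = all(
    iota7(*act_on_pair(inv, alpha, theta)) == act_on_xi(inv, iota7(alpha, theta))
    for inv in permutations(range(3))
)
```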

4.2. The Geometric Classification

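After fixing the coordinates $\alpha_1$, $\alpha_2$, and $\alpha_3$, the isomorphism $\iota_7$ from (7) maps the 4-dimensional affine space $Z_\alpha = \{\alpha\} \times \mathbb{R}^4$ onto the 4-dimensional affine space $\ell_\alpha$ in $H_7$. We denote by $\iota_7^{(\alpha)}$ the (affine) restriction of $\iota_7$ on $Z_\alpha$, so $\iota_7^{(\alpha)}\colon Z_\alpha \to \ell_\alpha$.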

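The trace of the 8-dimensional cube $\{\xi \in \mathbb{R}^8 \mid 0 \le \xi_k \le 1,\ k = 1,\ldots,8\}$ onto the hyperplane $H_7$ is the 7-dimensional simplex $\Delta_7$ defined in $H_7$ by the inequalities $\xi_1 \ge 0, \ldots, \xi_8 \ge 0$. The inverse image $T_7 = \iota_7^{-1}(\Delta_7)$ via the affine isomorphism $\iota_7$ is the convex polyhedron in $\mathbb{R}^7$ with non-empty interior, defined by the system of inequalities

$$T_7\colon \quad \begin{aligned}
\theta_0 + \theta_2 + \theta_3 &\le \alpha_1\\
\theta_0 + \theta_1 + \theta_3 &\le \alpha_2\\
\theta_0 + \theta_1 + \theta_2 &\le \alpha_3\\
2\theta_0 + \theta_1 + \theta_2 + \theta_3 &\ge \alpha_1 + \alpha_2 + \alpha_3 - 1\\
\theta_0 \ge 0, \quad \theta_1 \ge 0, \quad \theta_2 &\ge 0, \quad \theta_3 \ge 0.
\end{aligned} \tag{9}$$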

The form (8) of the inverse isomorphism $\chi_7$ yields that $T_7 \subset [0,1]^7$. In particular, $T_7$ is a polytope. Note that we are using the terminology about polytopes introduced in [2, Ch. 12].

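For any $\alpha \in \mathbb{R}^3$ we set $C_7(\alpha) = Z_\alpha \cap T_7$, so $C_7(\alpha) = \{\alpha\} \times I_7(\alpha)$, where $I_7(\alpha) \subset \mathbb{R}^4$ and $\mathbb{R}^4$ is furnished with coordinates $\theta$. The subset $I_7(\alpha)$ is defined in $\mathbb{R}^4$ via the system (9) with fixed $\alpha$. Hence $I_7(\alpha)$ is a convex bounded polyhedron in $\mathbb{R}^4$. We also set $D_7(\alpha) = \iota_7(C_7(\alpha))$. Since $\iota_7(Z_\alpha) = \ell_\alpha$, we obtain that $D_7(\alpha) = \ell_\alpha \cap \Delta_7$.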
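Membership of a point $\theta$ in the polyhedron cut out by (9) for fixed $\alpha$ amounts to eight sign checks. A minimal Python sketch follows; the function name `in_I7` and the `strict` flag (which tests the interior instead) are illustrative assumptions, and the sample points are chosen here only for demonstration.

```python
from fractions import Fraction

def in_I7(alpha, theta, strict=False):
    """Check the eight inequalities (9) for fixed alpha.

    With strict=True all inequalities must hold strictly,
    which tests membership in the interior.
    """
    a1, a2, a3 = alpha
    t0, t1, t2, t3 = theta
    # Each slack is one of xi_1, ..., xi_8 from the parametric representation (6).
    slacks = [
        a1 - (t0 + t2 + t3),
        a2 - (t0 + t1 + t3),
        a3 - (t0 + t1 + t2),
        2 * t0 + t1 + t2 + t3 - (a1 + a2 + a3 - 1),
        t0, t1, t2, t3,
    ]
    if strict:
        return all(s > 0 for s in slacks)
    return all(s >= 0 for s in slacks)

half, quarter, eighth = Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)
alpha = (half, half, half)
interior_point = (eighth, eighth, eighth, eighth)   # product point a1*a2*a3, ...
boundary_point = (0, quarter, quarter, quarter)     # theta_0 = 0 lies on a face
outside_point = (half, half, 0, 0)                  # violates the second inequality
```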

We consider $T_7$, $Z_\alpha \simeq \mathbb{R}^4$, $C_7(\alpha)$, $I_7(\alpha)$, $\ell_\alpha$, $\Delta_7$, and $D_7(\alpha)$ as topological subspaces of the corresponding ambient linear spaces, with topology induced by their standard topology. Moreover, for each subset $A$ of a topological space $X$ we denote by $\mathring{A}$ its interior with respect to $X$. We note that $\mathring{A}$ is the largest open set contained in $A$, see [3, § 1, no. 6].

Lemma 2. The minimal number of half-spaces in $\mathbb{R}^4$ whose intersection is the polyhedron $I_7(\alpha)$ is 8.

Proof. We cannot omit any one of the inequalities in (9) formed by the free variables $\xi_5 = \theta_0$, $\xi_6 = \theta_1$, $\xi_7 = \theta_2$, and $\xi_8 = \theta_3$. It turns out that the general solution of the linear system (5) can also be written in terms of the free variables $\xi_1$, $\xi_2$, $\xi_3$, and $\xi_4$. In particular, none of the inequalities $\xi_1 \ge 0$, $\xi_2 \ge 0$, $\xi_3 \ge 0$, and $\xi_4 \ge 0$ that define the polytope $T_7$ can be omitted either.

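We define the point $\theta^{(\alpha)} \in \mathbb{R}^4$ by the formulae

$$\theta_0^{(\alpha)} = \alpha_1\alpha_2\alpha_3, \quad \theta_1^{(\alpha)} = (1-\alpha_1)\alpha_2\alpha_3, \quad \theta_2^{(\alpha)} = \alpha_1(1-\alpha_2)\alpha_3, \quad \theta_3^{(\alpha)} = \alpha_1\alpha_2(1-\alpha_3). \tag{10}$$

Lemma 3. If $\alpha \in [0,1]^3$, then $\theta^{(\alpha)} \in I_7(\alpha)$ and the following three statements are equivalent:

(i) One has $\alpha \in (0,1)^3$.

(ii) One has $\theta^{(\alpha)} \in \mathring{I}_7(\alpha)$.

(iii) One has $\mathring{I}_7(\alpha) \ne \emptyset$.

Proof. For $\theta = \theta^{(\alpha)}$ the equalities

$$\begin{aligned}
\theta_0 + \theta_2 + \theta_3 - \alpha_1 &= -\alpha_1(1-\alpha_2)(1-\alpha_3),\\
\theta_0 + \theta_1 + \theta_3 - \alpha_2 &= -\alpha_2(1-\alpha_1)(1-\alpha_3),\\
\theta_0 + \theta_1 + \theta_2 - \alpha_3 &= -\alpha_3(1-\alpha_1)(1-\alpha_2),\\
2\theta_0 + \theta_1 + \theta_2 + \theta_3 - \alpha_1 - \alpha_2 - \alpha_3 + 1 &= (1-\alpha_1)(1-\alpha_2)(1-\alpha_3)
\end{aligned}$$

yield that the system (9) is satisfied if $\alpha \in [0,1]^3$. If, in addition, $\alpha \in (0,1)^3$, then (9) holds with strict inequalities. Thus, the implication (i) $\Rightarrow$ (ii) is also proved.

(ii) $\Rightarrow$ (iii) This is trivial.

(iii) $\Rightarrow$ (i) Let $\theta \in \mathring{I}_7(\alpha)$. Then $\xi_k(\theta) > 0$, $k = 1,\ldots,8$, their sum is 1, and they satisfy the linear system (5). Therefore $\alpha \in (0,1)^3$.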

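Theorem 1. (i) One has

$$I_7(\alpha) = \begin{cases}
\{(0,0,0,0)\} & \text{if at least two of the } \alpha_i\text{'s are } 0,\\
\{0\} \times I(\alpha_2,\alpha_3) \times \{0\} \times \{0\} & \text{if } \alpha_1 = 0,\ \alpha_2 > 0,\ \alpha_3 > 0,\\
\{0\} \times \{0\} \times I(\alpha_1,\alpha_3) \times \{0\} & \text{if } \alpha_2 = 0,\ \alpha_1 > 0,\ \alpha_3 > 0,\\
\{0\} \times \{0\} \times \{0\} \times I(\alpha_1,\alpha_2) & \text{if } \alpha_3 = 0,\ \alpha_1 > 0,\ \alpha_2 > 0,\\
\{\alpha_3\} \times \{0\} \times \{0\} \times \{1-\alpha_3\} & \text{if } \alpha_1 = 1,\ \alpha_2 = 1,\ \alpha_3 > 0,\\
\{\alpha_2\} \times \{0\} \times \{1-\alpha_2\} \times \{0\} & \text{if } \alpha_1 = 1,\ \alpha_3 = 1,\ \alpha_2 > 0,\\
\{\alpha_1\} \times \{1-\alpha_1\} \times \{0\} \times \{0\} & \text{if } \alpha_2 = 1,\ \alpha_3 = 1,\ \alpha_1 > 0,\\
\{(\alpha_2-\theta_3,\ 0,\ \alpha_3-\alpha_2+\theta_3,\ \theta_3) \mid \theta_3 \in \tilde{I}(\alpha_2,\alpha_3)\} & \text{if } \alpha_1 = 1,\ \alpha_2 > 0,\ \alpha_3 > 0,\\
\{(\alpha_3-\theta_1,\ \theta_1,\ 0,\ \alpha_1-\alpha_3+\theta_1) \mid \theta_1 \in \tilde{I}(\alpha_3,\alpha_1)\} & \text{if } \alpha_2 = 1,\ \alpha_1 > 0,\ \alpha_3 > 0,\\
\{(\alpha_1-\theta_2,\ \alpha_2-\alpha_1+\theta_2,\ \theta_2,\ 0) \mid \theta_2 \in \tilde{I}(\alpha_1,\alpha_2)\} & \text{if } \alpha_3 = 1,\ \alpha_1 > 0,\ \alpha_2 > 0,
\end{cases}$$

and $I_7(\alpha)$ is a polytope in $\mathbb{R}^4$ if $\alpha \in (0,1)^3$.

(ii) One has $\iota_7(\mathring{C}_7(\alpha)) = \mathring{D}_7(\alpha)$, the interiors being with respect to the affine spaces $Z_\alpha$ and $\ell_\alpha$, respectively.

Proof. (i) The systems (5) and (9) imply the equalities. In case $\alpha \in (0,1)^3$, Lemma 3 yields that the bounded convex polyhedron $I_7(\alpha)$ in $\mathbb{R}^4$ has non-empty interior. In other words, it is a polytope.

(ii) It is enough to note that the (affine) restriction $\iota_7^{(\alpha)}\colon Z_\alpha \to \ell_\alpha$ is, in particular, a homeomorphism.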

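Corollary 1. Let $\alpha \in \mathbb{R}^3$.

(i) The system of constraint conditions $0 \le \xi_k(\theta) \le 1$, $k = 1,\ldots,8$, on the solutions (6) of the linear system (5) is equivalent to the property $\theta \in I_7(\alpha)$.

(ii) One has $0 < \xi_k(\theta) < 1$, $k = 1,\ldots,8$, if and only if $\theta \in \mathring{I}_7(\alpha)$.

Proof. The equalities $C_7(\alpha) = Z_\alpha \cap T_7$ and $D_7(\alpha) = \ell_\alpha \cap \Delta_7$ imply part (i). We have $\mathring{C}_7(\alpha) = Z_\alpha \cap \mathring{T}_7$ and $\mathring{D}_7(\alpha) = \ell_\alpha \cap \mathring{\Delta}_7$, where the interiors $\mathring{T}_7$ and $\mathring{\Delta}_7$ are with respect to the affine spaces $\mathbb{R}^7$ and $H_7$, respectively. Now, Theorem 1, (ii), yields part (ii).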

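We have $R(\alpha) \subset I_7(\alpha)$ and define $I_7^{(\cdot)}(\alpha) = R(\alpha)$. The dotted polytope $C_7^{(\cdot)}(\alpha) = \{\alpha\} \times I_7^{(\cdot)}(\alpha)$, $\alpha \in R^3$, is the locus of all 7-tuples of probabilities $(\alpha, \theta^{(A)})$, where $A \in [(\alpha)]$.

By plugging $\theta^{(\alpha)}$ in the formulae (6), we obtain the point $\xi^{(\alpha)} \in H_7$ with coordinates

$$\begin{aligned}
\xi_1^{(\alpha)} &= \alpha_1(1-\alpha_2)(1-\alpha_3), & \xi_2^{(\alpha)} &= (1-\alpha_1)\alpha_2(1-\alpha_3),\\
\xi_3^{(\alpha)} &= (1-\alpha_1)(1-\alpha_2)\alpha_3, & \xi_4^{(\alpha)} &= (1-\alpha_1)(1-\alpha_2)(1-\alpha_3),\\
\xi_5^{(\alpha)} &= \alpha_1\alpha_2\alpha_3, & \xi_6^{(\alpha)} &= (1-\alpha_1)\alpha_2\alpha_3,\\
\xi_7^{(\alpha)} &= \alpha_1(1-\alpha_2)\alpha_3, & \xi_8^{(\alpha)} &= \alpha_1\alpha_2(1-\alpha_3).
\end{aligned}$$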

Let $U_3$ be the rational 3-dimensional algebraic manifold defined in $\mathbb{R}^7$ by the equations (10). In other words, $U_3$ is the locus of the points in $\mathbb{R}^7$ of the form $(\alpha, \theta^{(\alpha)})$, $\alpha \in \mathbb{R}^3$. Let us denote $W_3 = \iota_7(U_3)$, so $W_3$ is the locus of the points $\xi^{(\alpha)}$, $\alpha \in \mathbb{R}^3$, in $H_7$. Then $\chi_7(W_3) = U_3$, $W_3$ is an algebraic subvariety of $H_7$, and the restrictions of $\iota_7$ and $\chi_7$ on $U_3$ and $W_3$, respectively, form a pair of mutually inverse isomorphisms of 3-dimensional rational algebraic manifolds. Moreover, $W_3 \cap \ell_\alpha = \{\xi^{(\alpha)}\}$ for any $\alpha \in \mathbb{R}^3$. Let us denote $\kappa_3 = \iota_7 \circ \delta_3$, where $\delta_3$ is the isomorphism of algebraic manifolds $\mathbb{R}^3 \to U_3$, $\alpha \mapsto (\alpha, \theta^{(\alpha)})$. Therefore, $\kappa_3\colon \mathbb{R}^3 \to W_3$ is also an isomorphism of algebraic manifolds.

We have the product vector bundle with total space $\mathbb{R}^7$, base $\mathbb{R}^3$, projection $(\alpha,\theta) \mapsto \alpha$, and fibre $Z_\alpha$. Now, we transport the structure of fibre bundle by means of the pair of isomorphisms $(\iota_7, \kappa_3)$ to $H_7$ and $W_3$, thus obtaining a structure of vector bundle with total space $H_7$, base $W_3$, and projection $\pi\colon H_7 \to W_3$ with $\pi^{-1}(\xi^{(\alpha)}) = \ell_\alpha$. Via restriction we obtain a fibre bundle with total space $T_7$, base $[0,1]^3$, projection $(\alpha,\theta) \mapsto \alpha$, and fibre $C_7(\alpha)$, as well as a fibre bundle with total space $\Delta_7$ and base $w_3 = \kappa_3([0,1]^3)$. Combining the equality $\iota_7(C_7(\alpha)) = D_7(\alpha)$, Lemma 3, and Theorem 1, (ii), we obtain that if $\alpha \in [0,1]^3$ (respectively, $\alpha \in (0,1)^3$), then $\xi^{(\alpha)} \in D_7(\alpha)$ (respectively, $\xi^{(\alpha)} \in \mathring{D}_7(\alpha)$). Thus, $w_3 \cap D_7(\alpha) = \{\xi^{(\alpha)}\}$ and the projection $\pi\colon \Delta_7 \to w_3$ has fibres $\pi^{-1}(\xi^{(\alpha)}) = D_7(\alpha)$. Moreover, the restriction of the pair $(\iota_7, \kappa_3)$ is an isomorphism of fibre bundles.

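For the sake of transparency, we note that $T_7 = \bigcup_{\alpha \in [0,1]^3} C_7(\alpha)$, $\Delta_7 = \bigcup_{\alpha \in [0,1]^3} D_7(\alpha)$. The unions $T_7^{(\cdot)} = \bigcup_{\alpha \in R^3} C_7^{(\cdot)}(\alpha)$, $\Delta_7^{(\cdot)} = \bigcup_{\alpha \in R^3} D_7^{(\cdot)}(\alpha)$ are the corresponding dotted polytopes.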

The above considerations yield the following classification theorem:

Theorem 2. (i) The affine isomorphism $\iota_7\colon \mathbb{R}^7 \to H_7$ transforms any polytope $C_7(\alpha)$ (resp., dotted polytope $C_7^{(\cdot)}(\alpha)$) onto the polytope $D_7(\alpha)$ (resp., onto the dotted polytope $D_7^{(\cdot)}(\alpha)$).

(ii) The dotted polytope $C_7^{(\cdot)}(\alpha)$ is the classification space of all Yule's triples of type $[(\alpha,\theta)]$. The dotted polytope $D_7^{(\cdot)}(\alpha)$ is the classification space of all probability distributions (1) produced by Yule's triples of type $[(\alpha,\theta)]$.

(iii) $\iota_7$ maps the polytope $T_7$ (resp., dotted polytope $T_7^{(\cdot)}$) onto the polytope $\Delta_7$ (resp., onto the dotted polytope $\Delta_7^{(\cdot)}$).

(iv) The dotted polytope $T_7^{(\cdot)}$ is the classification space of all Yule's triples. The dotted polytope $\Delta_7^{(\cdot)}$ is the classification space of all probability distributions produced by Yule's triples.

5. Entropy and Dependence of Yule's Triples

In this section we suppose $\alpha \in (0,1)^3$, that is (Lemma 3), $\mathring{I}_7(\alpha) \ne \emptyset$.

5.1. The Entropy Function

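The function $E\colon \mathring{\Delta}_7 \to \mathbb{R}$, $E(\xi) = -\sum_{k=1}^{8} \xi_k \ln \xi_k$, is strictly concave since the open simplex $\mathring{\Delta}_7$ is convex and all of its "entropy" summands $E^{(k)}(\xi) = -\xi_k \ln \xi_k$ are strictly concave. Let us fix $\alpha \in (0,1)^3$ and let

$$E_\alpha(\theta) = \sum_{k=1}^{8} E_\alpha^{(k)}(\theta), \quad E_\alpha^{(k)}(\theta) = -\xi_k(\theta) \ln \xi_k(\theta), \tag{11}$$

be the composition of $E$ with the affine isomorphism $\iota_7^{(\alpha)}$: $E_\alpha(\theta) = E(\iota_7^{(\alpha)}(\theta))$. In accord with Corollary 1, (ii), the entropy function (11) of the experiment $J_3$ has $\mathring{I}_7(\alpha)$ as a natural domain:

$$E_\alpha\colon \mathring{I}_7(\alpha) \to \mathbb{R}.$$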
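As a concrete illustration of (11), the following Python sketch evaluates the entropy for given $\alpha$ and $\theta$. It is a minimal sketch under stated assumptions: the names `xi_of` and `entropy` are hypothetical, and zero atoms are skipped, which corresponds to the usual convention $0 \ln 0 = 0$ and therefore also gives values on the boundary of the polytope.

```python
import math

def xi_of(alpha, theta):
    """Parametric representation (6): the eight atom probabilities."""
    a1, a2, a3 = alpha
    t0, t1, t2, t3 = theta
    return (a1 - t0 - t2 - t3, a2 - t0 - t1 - t3, a3 - t0 - t1 - t2,
            1 - a1 - a2 - a3 + 2 * t0 + t1 + t2 + t3, t0, t1, t2, t3)

def entropy(alpha, theta):
    """E_alpha(theta) = -sum_k xi_k ln xi_k, skipping atoms with xi_k = 0."""
    xi = xi_of(alpha, theta)
    if any(x < 0 for x in xi):
        raise ValueError("theta lies outside the polytope for this alpha")
    return -sum(x * math.log(x) for x in xi if x > 0)

alpha = (0.5, 0.5, 0.5)
at_product_point = entropy(alpha, (0.125, 0.125, 0.125, 0.125))  # all atoms 1/8
on_boundary = entropy(alpha, (0.0, 0.25, 0.25, 0.25))            # four atoms 1/4
```

With eight equal atoms the value is $\ln 8 = 3\ln 2$, and with four equal atoms it is $\ln 4 = 2\ln 2$.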

Lemma 4. (i) The entropy function $E_\alpha$ is a strictly concave function.

(ii) The entropy function $E_\alpha$ can be extended to a continuous function on $I_7(\alpha)$ and this extension $\widehat{E}_\alpha$ is unique.

(iii) The continuous extension $\widehat{E}_\alpha$ of $E_\alpha$ on $I_7(\alpha)$ is also a strictly concave function.

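Proof. Note that the polytope $I_7(\alpha)$ and its interior $\mathring{I}_7(\alpha)$ are bounded convex sets.

(i) The function $E_\alpha$ is the composition of the affine map $\iota_7^{(\alpha)}$ followed by the strictly concave function $E(\xi)$.

(ii) We apply [3, § 8, no. 5, Theorem 1].

(iii) The point $\theta^{(0)}$ belongs to the frontier of the polytope $I_7(\alpha)$ if and only if $\xi_k(\theta^{(0)}) = 0$ for indices $k$ from some set $K$ and $\xi_k(\theta^{(0)}) > 0$ for the rest of the indices, where $k = 1,\ldots,8$. Moreover, for any $k \in K$ we have $E_\alpha^{(k)}(\theta) \to 0$ when $\theta \to \theta^{(0)}$, $\theta \in \mathring{I}_7(\alpha)$. In other words, $\widehat{E}_\alpha^{(k)}(\theta^{(0)}) = 0$.

A boundary transition yields that $\widehat{E}_\alpha$ is a concave function. Moreover, since there are indices $k \notin K$, the function $\widehat{E}_\alpha$ is strictly concave. Indeed, let $\theta^{(1)} \in \mathring{I}_7(\alpha)$ and $\lambda \in (0,1)$. In accord with [2, Ch. 11, Lemma 11.2.4], we have $(1-\lambda)\theta^{(0)} + \lambda\theta^{(1)} \in \mathring{I}_7(\alpha)$, hence it suffices to show that

$$\widehat{E}_\alpha^{(k)}((1-\lambda)\theta^{(0)} + \lambda\theta^{(1)}) = E_\alpha^{(k)}((1-\lambda)\theta^{(0)} + \lambda\theta^{(1)}) \ge (1-\lambda)\widehat{E}_\alpha^{(k)}(\theta^{(0)}) + \lambda E_\alpha^{(k)}(\theta^{(1)})$$

for any $k = 1,\ldots,8$.

In case $k \notin K$ we have $\widehat{E}_\alpha^{(k)}(\theta^{(0)}) = E_\alpha^{(k)}(\theta^{(0)})$ and we are done. Now, let $k \in K$ and let $\theta \to \theta^{(0)}$, $\theta \in \mathring{I}_7(\alpha)$. We obtain

$$\widehat{E}_\alpha^{(k)}((1-\lambda)\theta^{(0)} + \lambda\theta^{(1)}) = \lim_{\theta \to \theta^{(0)}} E_\alpha^{(k)}((1-\lambda)\theta + \lambda\theta^{(1)}) \ge (1-\lambda) \lim_{\theta \to \theta^{(0)}} E_\alpha^{(k)}(\theta) + \lambda E_\alpha^{(k)}(\theta^{(1)}) = (1-\lambda)\widehat{E}_\alpha^{(k)}(\theta^{(0)}) + \lambda \widehat{E}_\alpha^{(k)}(\theta^{(1)}).$$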

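The symmetric group $S_3$ acts on the entropy functions $E_\alpha(\theta)$ by the rule $\sigma E_\alpha(\theta) = E_\alpha(\sigma^{-1}\theta)$, $\sigma \in S_3$.

Lemma 5. If $\sigma \in S_3$, then $E_{\sigma\alpha}(\theta) = \sigma E_\alpha(\theta)$ and $\mathring{I}_7(\sigma\alpha) = \sigma_\theta \mathring{I}_7(\alpha)$.

Proof. According to Lemma 1, we have $\sigma^{-1} E_{\sigma\alpha}(\theta) = E_{\sigma\alpha}(\sigma\theta) = E(\iota_7^{(\sigma\alpha)}(\sigma\theta)) = E(\iota_7(\sigma\alpha, \sigma\theta)) = E(\sigma\iota_7(\alpha,\theta)) = E(\iota_7(\alpha,\theta)) = E_\alpha(\theta)$. Finally, the domain of $\sigma E_\alpha(\theta)$ is the set $\sigma_\theta \mathring{I}_7(\alpha)$ and we obtain $\mathring{I}_7(\sigma\alpha) = \sigma_\theta \mathring{I}_7(\alpha)$.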

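Corollary 2. Let $\sigma \in S_3$.

(i) One has $\widehat{E}_{\sigma\alpha}(\sigma\theta) = \widehat{E}_\alpha(\theta)$.

(ii) All permutations of the members of Yule's triple $A = (A_1,A_2,A_3)$ have the same entropy: if $A \in [(\alpha)]$, then $\sigma A \in [(\sigma\alpha)]$ and $\widehat{E}_{\sigma\alpha}(\theta^{(\sigma A)}) = \widehat{E}_\alpha(\theta^{(A)})$.

Proof. (i) Let $\theta^{(0)}$ be a point from the frontier of the polytope $I_7(\alpha)$. Then $\sigma\theta^{(0)}$ is a point from the frontier of the polytope $I_7(\sigma\alpha)$ with interior $\sigma_\theta \mathring{I}_7(\alpha)$. We have $\theta \to \theta^{(0)}$, $\theta \in \mathring{I}_7(\alpha)$, if and only if $\sigma\theta \to \sigma\theta^{(0)}$, $\sigma\theta \in \sigma\mathring{I}_7(\alpha)$. The equality from Lemma 5 can be written in the form $E_{\sigma\alpha}(\sigma\theta) = E_\alpha(\theta)$ and a boundary transition yields the result.

(ii) Implied by part (i).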

5.2. The Entropy Function and its Critical Points

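For any $\theta \in \mathring{I}_7(\alpha)$ we obtain

$$\begin{aligned}
\frac{\partial E_\alpha}{\partial \theta_0}(\theta) &= \ln \frac{\xi_1(\theta)\xi_2(\theta)\xi_3(\theta)}{\xi_4(\theta)^2\,\xi_5(\theta)}, & \frac{\partial E_\alpha}{\partial \theta_1}(\theta) &= \ln \frac{\xi_2(\theta)\xi_3(\theta)}{\xi_4(\theta)\xi_6(\theta)},\\
\frac{\partial E_\alpha}{\partial \theta_2}(\theta) &= \ln \frac{\xi_1(\theta)\xi_3(\theta)}{\xi_4(\theta)\xi_7(\theta)}, & \frac{\partial E_\alpha}{\partial \theta_3}(\theta) &= \ln \frac{\xi_1(\theta)\xi_2(\theta)}{\xi_4(\theta)\xi_8(\theta)}.
\end{aligned}$$

Thus, the set of critical points of the function $E_\alpha(\theta)$ is the intersection of the interior $\mathring{I}_7(\alpha) \subset \mathbb{R}^4$ and the algebraic variety in $\mathbb{R}^4$ with equations

$$\begin{aligned}
\xi_1(\theta)\xi_2(\theta)\xi_3(\theta) - \xi_4(\theta)^2\,\xi_5(\theta) &= 0, & \xi_2(\theta)\xi_3(\theta) - \xi_4(\theta)\xi_6(\theta) &= 0,\\
\xi_1(\theta)\xi_3(\theta) - \xi_4(\theta)\xi_7(\theta) &= 0, & \xi_1(\theta)\xi_2(\theta) - \xi_4(\theta)\xi_8(\theta) &= 0.
\end{aligned}$$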
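For the reader's convenience, here is a short derivation of these partial derivatives, added as a reading aid and using only the parametric representation (6) and the definition (11). Since the $\xi_k$ sum to 1, the constant terms that arise from differentiating $\xi_k \ln \xi_k$ cancel.

```latex
\frac{\partial E_\alpha}{\partial\theta_j}
  = -\sum_{k=1}^{8}\frac{\partial\xi_k}{\partial\theta_j}\,(\ln\xi_k+1)
  = -\sum_{k=1}^{8}\frac{\partial\xi_k}{\partial\theta_j}\,\ln\xi_k,
\qquad\text{because}\quad \sum_{k=1}^{8}\frac{\partial\xi_k}{\partial\theta_j}=0 .

% From (6): (d xi_k / d theta_0)_k = (-1,-1,-1,2,1,0,0,0), hence
\frac{\partial E_\alpha}{\partial\theta_0}
  = \ln\xi_1+\ln\xi_2+\ln\xi_3-2\ln\xi_4-\ln\xi_5
  = \ln\frac{\xi_1\xi_2\xi_3}{\xi_4^{2}\,\xi_5},

% From (6): (d xi_k / d theta_1)_k = (0,-1,-1,1,0,1,0,0), hence
\frac{\partial E_\alpha}{\partial\theta_1}
  = \ln\xi_2+\ln\xi_3-\ln\xi_4-\ln\xi_6
  = \ln\frac{\xi_2\xi_3}{\xi_4\,\xi_6}.
```

The derivatives with respect to $\theta_2$ and $\theta_3$ follow in the same way from the columns $(-1,0,-1,1,0,0,1,0)$ and $(-1,-1,0,1,0,0,0,1)$.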

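Lemma 6. (i) The point $\theta^{(\alpha)}$ is a critical point of the entropy function $E_\alpha$.

(ii) One has

$$E_\alpha(\theta^{(\alpha)}) = -\ln\left(\alpha_1^{\alpha_1}\alpha_2^{\alpha_2}\alpha_3^{\alpha_3}(1-\alpha_1)^{1-\alpha_1}(1-\alpha_2)^{1-\alpha_2}(1-\alpha_3)^{1-\alpha_3}\right).$$

Proof. (i) We have

$$\begin{aligned}
\xi_1^{(\alpha)}\xi_2^{(\alpha)}\xi_3^{(\alpha)} - \bigl(\xi_4^{(\alpha)}\bigr)^2\xi_5^{(\alpha)} ={}& \alpha_1(1-\alpha_2)(1-\alpha_3)(1-\alpha_1)\alpha_2(1-\alpha_3)(1-\alpha_1)(1-\alpha_2)\alpha_3\\
& - (1-\alpha_1)^2(1-\alpha_2)^2(1-\alpha_3)^2\alpha_1\alpha_2\alpha_3 = 0,\\
\xi_2^{(\alpha)}\xi_3^{(\alpha)} - \xi_4^{(\alpha)}\xi_6^{(\alpha)} ={}& (1-\alpha_1)\alpha_2(1-\alpha_3)(1-\alpha_1)(1-\alpha_2)\alpha_3\\
& - (1-\alpha_1)(1-\alpha_2)(1-\alpha_3)(1-\alpha_1)\alpha_2\alpha_3 = 0,\\
\xi_1^{(\alpha)}\xi_3^{(\alpha)} - \xi_4^{(\alpha)}\xi_7^{(\alpha)} ={}& \alpha_1(1-\alpha_2)(1-\alpha_3)(1-\alpha_1)(1-\alpha_2)\alpha_3\\
& - (1-\alpha_1)(1-\alpha_2)(1-\alpha_3)\alpha_1(1-\alpha_2)\alpha_3 = 0,\\
\xi_1^{(\alpha)}\xi_2^{(\alpha)} - \xi_4^{(\alpha)}\xi_8^{(\alpha)} ={}& \alpha_1(1-\alpha_2)(1-\alpha_3)(1-\alpha_1)\alpha_2(1-\alpha_3)\\
& - (1-\alpha_1)(1-\alpha_2)(1-\alpha_3)\alpha_1\alpha_2(1-\alpha_3) = 0.
\end{aligned}$$

(ii) We have

$$\begin{aligned}
-E_\alpha(\theta^{(\alpha)}) ={}& -E(\xi^{(\alpha)}) = \sum_{k=1}^{8} \xi_k^{(\alpha)} \ln \xi_k^{(\alpha)}\\
={}& \xi_1^{(\alpha)} \ln(\alpha_1(1-\alpha_2)(1-\alpha_3)) + \xi_2^{(\alpha)} \ln((1-\alpha_1)\alpha_2(1-\alpha_3))\\
& + \xi_3^{(\alpha)} \ln((1-\alpha_1)(1-\alpha_2)\alpha_3) + \xi_4^{(\alpha)} \ln((1-\alpha_1)(1-\alpha_2)(1-\alpha_3))\\
& + \xi_5^{(\alpha)} \ln(\alpha_1\alpha_2\alpha_3) + \xi_6^{(\alpha)} \ln((1-\alpha_1)\alpha_2\alpha_3)\\
& + \xi_7^{(\alpha)} \ln(\alpha_1(1-\alpha_2)\alpha_3) + \xi_8^{(\alpha)} \ln(\alpha_1\alpha_2(1-\alpha_3))\\
={}& (\xi_1^{(\alpha)} + \xi_5^{(\alpha)} + \xi_7^{(\alpha)} + \xi_8^{(\alpha)}) \ln \alpha_1 + (\xi_2^{(\alpha)} + \xi_5^{(\alpha)} + \xi_6^{(\alpha)} + \xi_8^{(\alpha)}) \ln \alpha_2\\
& + (\xi_3^{(\alpha)} + \xi_5^{(\alpha)} + \xi_6^{(\alpha)} + \xi_7^{(\alpha)}) \ln \alpha_3 + (\xi_2^{(\alpha)} + \xi_3^{(\alpha)} + \xi_4^{(\alpha)} + \xi_6^{(\alpha)}) \ln(1-\alpha_1)\\
& + (\xi_1^{(\alpha)} + \xi_3^{(\alpha)} + \xi_4^{(\alpha)} + \xi_7^{(\alpha)}) \ln(1-\alpha_2) + (\xi_1^{(\alpha)} + \xi_2^{(\alpha)} + \xi_4^{(\alpha)} + \xi_8^{(\alpha)}) \ln(1-\alpha_3)\\
={}& \ln\left(\alpha_1^{\alpha_1}\alpha_2^{\alpha_2}\alpha_3^{\alpha_3}(1-\alpha_1)^{1-\alpha_1}(1-\alpha_2)^{1-\alpha_2}(1-\alpha_3)^{1-\alpha_3}\right),
\end{aligned}$$

where the last equality uses the linear system (5).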
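Both parts of Lemma 6 can be spot-checked numerically. The Python sketch below is illustrative only: the helper names and the sample triple are assumptions. It evaluates the entropy at the point defined by (10), compares it with the closed form of part (ii), and evaluates the four critical-point polynomials of Subsection 5.2.

```python
import math

def independent_point(alpha):
    """The point theta^(alpha) defined by the formulae (10)."""
    a1, a2, a3 = alpha
    return (a1 * a2 * a3, (1 - a1) * a2 * a3, a1 * (1 - a2) * a3, a1 * a2 * (1 - a3))

def xi_of(alpha, theta):
    """Parametric representation (6)."""
    a1, a2, a3 = alpha
    t0, t1, t2, t3 = theta
    return (a1 - t0 - t2 - t3, a2 - t0 - t1 - t3, a3 - t0 - t1 - t2,
            1 - a1 - a2 - a3 + 2 * t0 + t1 + t2 + t3, t0, t1, t2, t3)

def entropy(xi):
    return -sum(x * math.log(x) for x in xi if x > 0)

def closed_form(alpha):
    """Right-hand side of Lemma 6 (ii): a sum of three binary entropies."""
    return -sum(a * math.log(a) + (1 - a) * math.log(1 - a) for a in alpha)

alpha = (0.3, 0.6, 0.2)  # arbitrary sample triple in (0,1)^3
xi = xi_of(alpha, independent_point(alpha))
lhs, rhs = entropy(xi), closed_form(alpha)

x1, x2, x3, x4, x5, x6, x7, x8 = xi
residuals = (x1 * x2 * x3 - x4 ** 2 * x5, x2 * x3 - x4 * x6,
             x1 * x3 - x4 * x7, x1 * x2 - x4 * x8)
```

The closed form shows that at the independence point the joint entropy is the sum of the entropies of the three binary trials.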

5.3. The Entropy Function and its Second Derivative

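Given $k$, $k = 1,\ldots,8$, the Hessian of the function $E_\alpha^{(k)}(\theta)$, $\theta \in \mathring{I}_7(\alpha)$, is the $4 \times 4$ symmetric matrix $H^{(k)}(\theta) = (H_{ij}^{(k)}(\theta))$, where $H_{ij}^{(k)}(\theta) = -\frac{1}{\xi_k(\theta)} \frac{\partial \xi_k}{\partial \theta_i} \frac{\partial \xi_k}{\partial \theta_j}$. Then the Hessian $H(\theta)$ of the entropy function $E_\alpha(\theta)$ is the $4 \times 4$ symmetric matrix $H(\theta) = \sum_{k=1}^{8} H^{(k)}(\theta)$. In accord with [4, Ch. 3, 3.1.4], since the functions $E_\alpha^{(k)}(\theta)$ are strictly concave, the corresponding quadratic forms $\tau^{T} H^{(k)}(\theta)\tau$ are negative semi-definite: $\tau^{T} H^{(k)}(\theta)\tau \le 0$ for all $\tau \in \mathbb{R}^4$. In particular, the quadratic form $\tau^{T} H(\theta)\tau = \sum_{k=1}^{8} \tau^{T} H^{(k)}(\theta)\tau$ is negative semi-definite. Moreover, since $\tau^{T} H^{(5)}(\theta)\tau = -\frac{1}{\xi_5(\theta)}\tau_1^2$, $\tau^{T} H^{(6)}(\theta)\tau = -\frac{1}{\xi_6(\theta)}\tau_2^2$, $\tau^{T} H^{(7)}(\theta)\tau = -\frac{1}{\xi_7(\theta)}\tau_3^2$, and $\tau^{T} H^{(8)}(\theta)\tau = -\frac{1}{\xi_8(\theta)}\tau_4^2$, the quadratic form $\tau^{T} H(\theta)\tau$ is negative definite for any $\theta \in \mathring{I}_7(\alpha)$ and we obtain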

Lemma 7. The set of local maxima of the entropy function $E_\alpha(\theta)$ coincides with the set of its critical points.
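The negative definiteness behind Lemma 7 can be written out in one line. The display below is a reading aid derived from (6) and (11), not a formula quoted from the paper; it pairs $\tau = (\tau_1, \tau_2, \tau_3, \tau_4)$ with the coordinates $(\theta_0, \theta_1, \theta_2, \theta_3)$ and uses that every $\xi_k$ is affine in $\theta$.

```latex
\tau^{T}H(\theta)\,\tau
  = -\sum_{k=1}^{8}\frac{1}{\xi_k(\theta)}
      \Bigl(\sum_{i=0}^{3}\frac{\partial\xi_k}{\partial\theta_i}\,\tau_{i+1}\Bigr)^{2}
  \le -\frac{\tau_1^{2}}{\xi_5(\theta)}-\frac{\tau_2^{2}}{\xi_6(\theta)}
      -\frac{\tau_3^{2}}{\xi_7(\theta)}-\frac{\tau_4^{2}}{\xi_8(\theta)}
  < 0 \quad\text{for } \tau\neq 0 .
```

The inequality drops the four non-positive summands with $k = 1,\ldots,4$; the remaining summands are diagonal because $\xi_5, \ldots, \xi_8$ are the coordinates $\theta_0, \ldots, \theta_3$ themselves.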

The compactness of the polytope $I_7(\alpha)$ yields that the extended entropy function $\widehat{E}_\alpha(\theta)$ attains its absolute maximum and absolute minimum.

Theorem 3. The extended entropy function $\widehat{E}_\alpha(\theta)$ has a unique absolute maximum attained at the point $\theta^{(\alpha)}$ from (10).

Proof. Lemma 6 and Lemma 7 yield that the entropy function $E_\alpha(\theta)$ and, therefore, also the extended entropy function $\widehat{E}_\alpha(\theta)$, has a local maximum at the point $\theta^{(\alpha)}$. In accord with Lemma 4 and Lemma 10, $\widehat{E}_\alpha(\theta)$ has a unique absolute maximum at $\theta^{(\alpha)}$.

Theorem 4. If the extended entropy function $\widehat{E}_\alpha(\theta)$ attains an absolute minimum at some point from the polytope $I_7(\alpha)$, then this point is a vertex of $I_7(\alpha)$.

Proof. Lemma 2 allows us to use [2, Theorem 12.1.5, 12.1.8, Proposition 12.1.9] and we conclude that since the restriction of $\widehat{E}_\alpha(\theta)$ on an $i$-face, $i = 1,2,3$, of the polytope $I_7(\alpha)$ is also a strictly concave function, we can apply Lemma 11 at most four times.

The continuous extension $\widehat{E}_\alpha(\theta)$, $\theta \in I_7(\alpha)$, of the entropy function $E_\alpha(\theta)$, $\theta \in \mathring{I}_7(\alpha)$, is said to be the extended entropy function of Yule's triples of type $[(\alpha)]$.

6. Degree of Mutual Dependence of a Triple of Events

6.1. Two Motivation Statements

Lemma 8. The three components of the Yule's triple $A = (A_1,A_2,A_3)$ are mutually independent if and only if $\theta^{(A)} = \theta^{(\alpha)}$.

Proof. In accord with [8, I, §5, (4)], the events $A_1$, $A_2$, $A_3$ are mutually independent if and only if $\Pr(A_i \cap A_j) = \Pr(A_i)\Pr(A_j)$, $1 \le i < j \le 3$, and $\Pr(A_1 \cap A_2 \cap A_3) = \Pr(A_1)\Pr(A_2)\Pr(A_3)$. Using (2), we write these conditions in the form

$$\begin{aligned}
\theta_0 + \theta_1 &= \alpha_2\alpha_3\\
\theta_0 + \theta_2 &= \alpha_1\alpha_3\\
\theta_0 + \theta_3 &= \alpha_1\alpha_2\\
\theta_0 &= \alpha_1\alpha_2\alpha_3.
\end{aligned}$$

The point $\theta^{(\alpha)}$ from (10) is the unique solution of this system.

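Now, we suppose, in addition, that $(\Omega, \mathcal{A}, \Pr)$ is a discrete uniform probability space. The faces of the polytope $I_7(\alpha) \subset \mathbb{R}^4$ are parts of the hyperplanes with equations $\xi_k(\theta) = 0$, $k = 1,\ldots,8$. According to (1), the following equivalences hold:

Lemma 9. Let $A = (A_1,A_2,A_3)$ be a Yule's triple of events. One has:

$$\begin{aligned}
\xi_1(\theta^{(A)}) = 0 &\iff A_1 \subset A_2 \cup A_3, & \xi_2(\theta^{(A)}) = 0 &\iff A_2 \subset A_1 \cup A_3,\\
\xi_3(\theta^{(A)}) = 0 &\iff A_3 \subset A_1 \cup A_2, & \xi_4(\theta^{(A)}) = 0 &\iff A_1^c \subset A_2 \cup A_3,\\
\xi_5(\theta^{(A)}) = 0 &\iff A_1 \cap A_2 \subset A_3^c, & \xi_6(\theta^{(A)}) = 0 &\iff A_2 \cap A_3 \subset A_1,\\
\xi_7(\theta^{(A)}) = 0 &\iff A_1 \cap A_3 \subset A_2, & \xi_8(\theta^{(A)}) = 0 &\iff A_1 \cap A_2 \subset A_3.
\end{aligned}$$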
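In a sample space with equally likely outcomes, the eight atom probabilities can be obtained by counting, and the equivalences of Lemma 9 can then be read off directly. The Python sketch below is an illustration only: the function name `atoms`, the ordering table `PATTERNS`, and the choice of sample data (the four-outcome space of Bernstein's classical example) are assumptions made here.

```python
from fractions import Fraction

# Membership patterns (in A1, in A2, in A3) of the atoms xi_1, ..., xi_8,
# listed in the order implied by the linear system (5).
PATTERNS = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0),
            (1, 1, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

def atoms(omega, a1, a2, a3):
    """Return (xi_1, ..., xi_8) for events a1, a2, a3 in a uniform space omega."""
    counts = dict.fromkeys(PATTERNS, 0)
    for w in omega:
        counts[(int(w in a1), int(w in a2), int(w in a3))] += 1
    return tuple(Fraction(counts[p], len(omega)) for p in PATTERNS)

omega = {"112", "121", "211", "222"}
A1, A2, A3 = {"112", "121"}, {"112", "211"}, {"121", "211"}
xi = atoms(omega, A1, A2, A3)

# Two of the equivalences of Lemma 9, evaluated on this triple.
xi1_vanishes = xi[0] == 0
a1_covered = A1 <= (A2 | A3)
xi5_vanishes = xi[4] == 0
a1a2_avoids_a3 = (A1 & A2) <= (omega - A3)
```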

6.2. Definition of Degree of Mutual Dependence

The value of the extended entropy function $\widehat{E}_\alpha(\theta)$ of Yule's triples of type $[(\alpha)]$ at $\theta = \theta^{(A)}$ is called the entropy of Yule's triple $A = (A_1,A_2,A_3)$ of type $[(\alpha)]$. In accord with Corollary 2, the entropy does not depend on the order of the components of $A$. This fact, together with the opposites described in Lemmas 8 and 9, motivates the use of the extended entropy function $\widehat{E}_\alpha(\theta)$ as a measure of strength of mutual dependence of three events $A_1$, $A_2$, $A_3$.

Let us denote by $M$ the absolute maximum $\widehat{E}_\alpha(\theta^{(\alpha)})$ and let $m$ be the absolute minimum of $\widehat{E}_\alpha(\theta)$, attained at some vertex of the polytope $I_7(\alpha)$, see Theorems 3 and 4. The former also yields that $m < M$.

Following [6, 5.2], for any $\theta \in I_7(\alpha)$ we define

$$e_\alpha\colon I_7(\alpha) \to [0,1], \quad e_\alpha(\theta) = \frac{M - \widehat{E}_\alpha(\theta)}{M - m}.$$

The value of the function $e_\alpha$ at $\theta \in I_7(\alpha)$, $\theta = \theta^{(A)}$, $A = (A_1,A_2,A_3)$, is said to be the degree of mutual dependence of the events $A_1$, $A_2$, $A_3$, with $\alpha_1 = \Pr(A_1)$, $\alpha_2 = \Pr(A_2)$, $\alpha_3 = \Pr(A_3)$. Intuitively, $e_\alpha(\theta^{(A)})$ measures the strength of the mutual relations among the events $A_1$, $A_2$, $A_3$. The above definition of $e_\alpha$ yields

Corollary 3. The degree of mutual dependence of three events does not depend on the choice of base of logarithms in the extended entropy function.
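The definition can be turned into a small computation. The Python sketch below is not the Java program mentioned in the paper; it is an independent illustration under stated assumptions. It takes the degree to be $(M - \widehat{E}_\alpha(\theta))/(M - m)$, finds $m$ by brute force over all candidate vertices (every choice of four of the eight hyperplanes $\xi_k = 0$, solved exactly with rational arithmetic and kept if feasible), and uses hypothetical function names throughout. The probabilities in `alpha` are expected to be `Fraction` values strictly between 0 and 1.

```python
import math
from fractions import Fraction
from itertools import combinations

def constraint_rows(alpha):
    """Rows (c0, c1, c2, c3, b) with xi_k(theta) = c . theta + b, read off from (6)."""
    a1, a2, a3 = alpha
    return [(-1, 0, -1, -1, a1), (-1, -1, 0, -1, a2), (-1, -1, -1, 0, a3),
            (2, 1, 1, 1, 1 - a1 - a2 - a3),
            (1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0), (0, 0, 0, 1, 0)]

def solve4(rows):
    """Solve c . theta = -b for four rows by Gauss-Jordan elimination; None if singular."""
    m = [[Fraction(c) for c in r[:4]] + [Fraction(-r[4])] for r in rows]
    for col in range(4):
        piv = next((i for i in range(col, 4) if m[i][col] != 0), None)
        if piv is None:
            return None
        m[col], m[piv] = m[piv], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for i in range(4):
            if i != col and m[i][col] != 0:
                f = m[i][col]
                m[i] = [vi - f * vc for vi, vc in zip(m[i], m[col])]
    return tuple(m[i][4] for i in range(4))

def xi_at(rows, theta):
    return tuple(sum(c * t for c, t in zip(r[:4], theta)) + r[4] for r in rows)

def vertices(rows):
    """Distinct feasible intersection points of four hyperplanes xi_k = 0."""
    found = set()
    for quad in combinations(range(8), 4):
        theta = solve4([rows[k] for k in quad])
        if theta is not None and all(x >= 0 for x in xi_at(rows, theta)):
            found.add(theta)
    return found

def entropy(xi):
    return -sum(float(x) * math.log(float(x)) for x in xi if x > 0)

def degree(alpha, theta):
    """(M - E(theta)) / (M - m), with m the minimum over the vertices."""
    rows = constraint_rows(alpha)
    a1, a2, a3 = alpha
    theta_ind = (a1 * a2 * a3, (1 - a1) * a2 * a3, a1 * (1 - a2) * a3, a1 * a2 * (1 - a3))
    big_m = entropy(xi_at(rows, theta_ind))
    small_m = min(entropy(xi_at(rows, v)) for v in vertices(rows))
    return (big_m - entropy(xi_at(rows, theta))) / (big_m - small_m)

half, quarter = Fraction(1, 2), Fraction(1, 4)
alpha = (half, half, half)
pairwise_independent = degree(alpha, (Fraction(0), quarter, quarter, quarter))
fully_independent = degree(alpha, (Fraction(1, 8),) * 4)
```

Searching the vertices only is justified by Theorem 4. Because the value is a ratio of entropy differences, replacing `math.log` by a logarithm to another base leaves it unchanged, in line with Corollary 3.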

Example 5. In case $\alpha = (\tfrac{1}{10}, \tfrac{1}{5}, \tfrac{3}{10})$ the polytope $I_7(\alpha)$ has 12 vertices

$$v_{1,2,3,8},\ v_{1,2,5,8},\ v_{1,3,5,8},\ v_{2,3,5,8},\ v_{1,2,3,5},\ v_{1,2,5,7},\ v_{1,2,7,8},\ v_{1,5,6,7},\ v_{1,5,6,8},\ v_{1,6,7,8},\ v_{2,5,7,8},\ v_{5,6,7,8}.$$

Here by $v_{k_1,k_2,k_3,k_4}$ we denote the vertex which is the intersection point of the hyperplanes with equations $\xi_{k_1} = 0$, $\xi_{k_2} = 0$, $\xi_{k_3} = 0$, and $\xi_{k_4} = 0$. At the first four vertices the extended entropy function attains its absolute minimum (approximately equal to 0.8018185525433372). Equivalently, we have

$$e_\alpha(v_{1,2,3,8}) = e_\alpha(v_{1,2,5,8}) = e_\alpha(v_{1,3,5,8}) = e_\alpha(v_{2,3,5,8}) = 1.$$

On the other hand, let, for example, the vertex $v_{1,3,5,8}$ belong to the dotted polytope $I_7^{(\cdot)}(\alpha)$, that is, let $\theta^{(A)} = v_{1,3,5,8}$, where $A = (A_1,A_2,A_3)$ is a Yule's triple.

Moreover, let us assume that $(\Omega, \mathcal{A}, \Pr)$ is a sample space with equally likely outcomes. In accord with Lemma 9, we can conclude that the system of set-theoretic relations

$$A_1 \subset A_2 \cup A_3, \quad A_3 \subset A_1 \cup A_2, \quad A_1 \cap A_2 \subset A_3^c, \quad A_1 \cap A_2 \subset A_3,$$

or equivalently, the system of relations $A_3 \subset A_1 \cup A_2$, $A_1 \subset A_3 \cap A_2^c$, is one of the most powerful under the condition $\alpha = (\tfrac{1}{10}, \tfrac{1}{5}, \tfrac{3}{10})$.

On the other hand, $v_{1,3,5,8}$ is again a vertex in case $\alpha = (\tfrac{\ast}{5}, \tfrac{\ast}{10}, \ast)$ (asterisks mark entries that are illegible in the source), but now the above system of relations is not the most powerful one: $e_\alpha(v_{1,3,5,8}) < 1$.
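The minimum quoted in Example 5 can be reproduced by hand at the vertex $v_{1,3,5,8}$. The Python sketch below assumes $\alpha = (1/10, 1/5, 3/10)$ for the first triple of the example, a reading chosen because it reproduces the quoted value; the variable names are illustrative. Setting $\xi_1 = \xi_3 = \xi_5 = \xi_8 = 0$ in (6) gives $\theta_0 = \theta_3 = 0$, $\theta_2 = \alpha_1$, and $\theta_1 = \alpha_3 - \alpha_1$.

```python
import math
from fractions import Fraction

def xi_of(alpha, theta):
    """Parametric representation (6)."""
    a1, a2, a3 = alpha
    t0, t1, t2, t3 = theta
    return (a1 - t0 - t2 - t3, a2 - t0 - t1 - t3, a3 - t0 - t1 - t2,
            1 - a1 - a2 - a3 + 2 * t0 + t1 + t2 + t3, t0, t1, t2, t3)

# Assumed reading of the first triple of Example 5.
alpha = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10))
a1, a2, a3 = alpha

# Vertex v_{1,3,5,8}: xi_5 = xi_8 = 0 give theta_0 = theta_3 = 0,
# xi_1 = 0 gives theta_2 = a1, and xi_3 = 0 gives theta_1 = a3 - a1.
vertex = (Fraction(0), a3 - a1, a1, Fraction(0))
xi = xi_of(alpha, vertex)

entropy_at_vertex = -sum(float(x) * math.log(float(x)) for x in xi if x > 0)
```

For this $\alpha$ the atom $\xi_2$ vanishes at the vertex as well, so five of the eight hyperplanes pass through the same point; this is why several of the listed index sets describe one and the same point of the polytope.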

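Example 6. [9, Section 3, 3.2], (Bernstein 1928) Let us consider a sample space with four equally likely outcomes 112, 121, 211, 222. The events $A_1 = \{112, 121\}$, $A_2 = \{112, 211\}$, $A_3 = \{121, 211\}$ are pairwise independent but not mutually independent because $A_1 \cap A_2 \cap A_3 = \emptyset$. Below we evaluate their degree of mutual dependence. We set $A = (A_1, A_2, A_3)$ and note that $\alpha = (\frac{1}{2}, \frac{1}{2}, \frac{1}{2})$. Using (1), we obtain $\xi_1^{(A)} = \xi_2^{(A)} = \xi_3^{(A)} = \xi_5^{(A)} = 0$, $\xi_4^{(A)} = \xi_6^{(A)} = \xi_7^{(A)} = \xi_8^{(A)} = \frac{1}{4}$. Therefore $E_\alpha(\theta^{(A)}) = -4 \cdot \frac{1}{4} \ln \frac{1}{4} = 2 \ln 2$. On the other hand, the polytope $I_7(\alpha)$ has 50 vertices and the extended entropy function $E_\alpha(\theta)$ attains its absolute minimum $m = \ln 2$ at 48 of them. Since the absolute maximum, attained when the events are mutually independent, is $M = 3 \ln 2$, we have $e_\alpha(\theta^{(A)}) = \frac{M - E_\alpha(\theta^{(A)})}{M - m} = \frac{1}{2}$.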
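The arithmetic of Example 6 can be checked numerically. The sketch below is only an illustration and is not the Java program mentioned in Remark 1. The helper names `atom_probabilities`, `entropy` and `degree` are hypothetical. The normalisation $(M - E)/(M - m)$ is assumed from the values in the example. $M$ is computed as the entropy of three mutually independent events with the same probabilities, and $m = \ln 2$ is taken from the text rather than computed.

```python
from fractions import Fraction
from itertools import product
from math import log, isclose

def atom_probabilities(omega, events):
    """Probabilities of the 8 atoms generated by three events
    in a sample space with equally likely outcomes."""
    counts = {signs: 0 for signs in product((True, False), repeat=3)}
    for w in omega:
        counts[tuple(w in e for e in events)] += 1
    return [Fraction(c, len(omega)) for c in counts.values()]

def entropy(probs):
    """Boltzmann-Shannon entropy (natural logarithm), with 0 * ln 0 = 0."""
    return -sum(float(p) * log(float(p)) for p in probs if p > 0)

def degree(E, M, m):
    """Assumed normalisation: 0 at maximal entropy, 1 at minimal entropy."""
    return (M - E) / (M - m)

# Bernstein's example: four equally likely outcomes.
omega = ["112", "121", "211", "222"]
A1, A2, A3 = {"112", "121"}, {"112", "211"}, {"121", "211"}

atoms = atom_probabilities(omega, (A1, A2, A3))
E = entropy(atoms)

# Entropy at mutual independence: sum of the three binary entropies.
alphas = [Fraction(len(A), len(omega)) for A in (A1, A2, A3)]
M = sum(entropy([a, 1 - a]) for a in alphas)

m = log(2)  # absolute minimum quoted in Example 6, not computed here
e = degree(E, M, m)
print(E, M, e)
```

Under these assumptions the script reproduces $E = 2 \ln 2$, $M = 3 \ln 2$ and a degree of mutual dependence equal to $\frac{1}{2}$, up to floating-point rounding.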

Remark 1. One can find below the link to a Java program which calculates the degree of mutual dependence of three events in a sample space with equally likely outcomes: http://www.math.bas.bg/algebra/valentiniliev/

7. Conclusions

This paper completes the trilogy that begins with [6] and [7]. It presents an original approach to the problem of measuring the magnitude of dependence of several events in a probability space, which rests upon the Boltzmann-Shannon entropy of the probability distributions produced by these events. The first two parts are devoted to the fundamental case of two events where, for a given level of entropy intensity, one can discern negative from positive dependence, thus defining a direction. Moreover, the function of dependence of two events is closely related to the information exchanged between the two binary trials generated by these events.

The case of three events is studied here and this examination shows, in particular, that the general case of a finite number of events differs only in technical difficulties.


A. Appendix

A.1. Folklore Results about Extrema of a Concave Function

Our source of definitions and results about convex sets is [1, Ch. 11].

Let $C \subset \mathbb{R}^n$. Recall that the function $f\colon C \to \mathbb{R}$ is said to be concave (respectively, strictly concave) if $C$ is a convex set and for any two different points $c_1, c_2 \in C$ and any $\lambda \in (0,1)$ one has $f((1-\lambda)c_1 + \lambda c_2) \geq (1-\lambda)f(c_1) + \lambda f(c_2)$ (respectively, $f((1-\lambda)c_1 + \lambda c_2) > (1-\lambda)f(c_1) + \lambda f(c_2)$).
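As a concrete instance of the definition, the Boltzmann-Shannon entropy is strictly concave on the simplex of probability distributions, which is a standard fact. The sketch below only spot-checks the strict inequality on one pair of points and is not a proof. The points `c1`, `c2` and the helper names are arbitrary choices made for illustration.

```python
from math import log

def entropy(p):
    """Boltzmann-Shannon entropy (natural logarithm), with 0 * ln 0 = 0."""
    return -sum(x * log(x) for x in p if x > 0)

def mix(c1, c2, lam):
    """Convex combination (1 - lam) * c1 + lam * c2, taken coordinatewise."""
    return [(1 - lam) * a + lam * b for a, b in zip(c1, c2)]

# Two different probability distributions on three points (arbitrary choice).
c1 = [0.1, 0.2, 0.7]
c2 = [0.6, 0.3, 0.1]

# f((1 - lam) c1 + lam c2) - ((1 - lam) f(c1) + lam f(c2)) for lam = 0.1, ..., 0.9
gaps = []
for k in range(1, 10):
    lam = k / 10
    lhs = entropy(mix(c1, c2, lam))
    rhs = (1 - lam) * entropy(c1) + lam * entropy(c2)
    gaps.append(lhs - rhs)

print(min(gaps) > 0)
```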

Lemma 10. (i) Any local maximum point of a concave function is an absolute one.

(ii) There exists at most one local maximum point of a strictly concave function.

(iii) There exists at most one absolute maximum point of a strictly concave function.

Proof. Let $f\colon C \to \mathbb{R}$ be a concave function.

(i) Let $c_0 \in C$ be a point at which $f$ attains a local maximum and let $U \subset C$ be a neighbourhood of $c_0$ such that $f(c) \leq f(c_0)$ for all $c \in U$. Let us suppose that there exists a point $c_1 \in C$ such that $f(c_1) > f(c_0)$. Then $f((1-\lambda)c_0 + \lambda c_1) \geq (1-\lambda)f(c_0) + \lambda f(c_1) > f(c_0)$ for all $\lambda \in (0,1)$. If $\lambda$ is sufficiently close to 0, then $(1-\lambda)c_0 + \lambda c_1 \in U$ and hence $f((1-\lambda)c_0 + \lambda c_1) \leq f(c_0)$, which is a contradiction.

(ii) Let, in addition, $f$ be strictly concave and let $c_1, c_2 \in C$ be two different points at which $f$ attains a local maximum. In accord with part (i), we have $f(c_1) = f(c_2)$ and then $f((1-\lambda)c_1 + \lambda c_2) > (1-\lambda)f(c_1) + \lambda f(c_2) = f(c_1)$ for all $\lambda \in (0,1)$. Since $f$ attains an absolute maximum at $c_1$, this is a contradiction.

Part (ii) implies part (iii).

Lemma 11. Let $f\colon C \to \mathbb{R}$ be a strictly concave function and let $c_0 \in C$ be a point for which there exists an open line segment $W_{c_0}$ such that $c_0 \in W_{c_0} \subset C$. Then $f$ does not attain an absolute minimum at $c_0$.

Proof. Let us suppose that $f$ attains an absolute minimum at $c_0$ and let the points $c_1, c_2 \in W_{c_0}$, $c_1 \neq c_2$, be such that $c_0 = (1-\lambda)c_1 + \lambda c_2$ for some $\lambda \in (0,1)$. Then $f(c_1) \geq f(c_0)$, $f(c_2) \geq f(c_0)$, and $f(c_0) = f((1-\lambda)c_1 + \lambda c_2) > (1-\lambda)f(c_1) + \lambda f(c_2) \geq (1-\lambda)f(c_0) + \lambda f(c_0) = f(c_0)$, which is a contradiction.
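A one-dimensional illustration of Lemma 10 (iii) and Lemma 11 is the binary entropy $h(p) = -p \ln p - (1-p) \ln(1-p)$ on the segment $[0,1]$, which is strictly concave. The sketch below evaluates it on a grid. The grid size and the names used are arbitrary choices, and a grid search only illustrates the statements without proving them.

```python
from math import log

def binary_entropy(p):
    """Entropy of the distribution (p, 1 - p), with 0 * ln 0 = 0."""
    return -sum(q * log(q) for q in (p, 1.0 - p) if q > 0)

# Grid on [0, 1]; every interior grid point lies on an open segment inside [0, 1].
grid = [k / 100 for k in range(101)]
values = [binary_entropy(p) for p in grid]

# Grid points where the minimal and the maximal value are attained.
argmin = [p for p, v in zip(grid, values) if v == min(values)]
argmax = [p for p, v in zip(grid, values) if v == max(values)]
print(argmin, argmax)
```

On this grid the maximum is attained at the single point $p = \frac{1}{2}$, while the minimum is attained only at the endpoints $0$ and $1$, the two points of the segment that do not lie on an open line segment contained in $[0,1]$.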

Acknowledgements

It is a pleasure for me to cordially thank Dimitar Guelev for making an experimental implementation in Java of the evaluation of the degree of dependence of three events from a discrete uniform probability space. His numerical examples were invaluable for my work. I would also like to thank the administration of the Institute of Mathematics and Informatics at the Bulgarian Academy of Sciences for creating perfect and safe working conditions.

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Declaration of Conflicting Interests

The Author declares that there is no conflict of interest.

References

[1] Berger M. (1987) Geometry I, Springer-Verlag Berlin Heidelberg.

[2] Berger M. (1987) Geometry II, Springer-Verlag Berlin Heidelberg.

[3] Bourbaki N. (1966) General Topology, Hermann, Paris.

[4] Boyd S, Vandenberghe L. (2004) Convex Optimization, Cambridge University Press.

[5] Husemoller D. (1966) Fibre Bundles, McGraw-Hill.

[6] Iliev V. V. (2021) On the Use of Entropy as a Measure of Dependence of Two Events. Reliability: Theory & Applications 16:237-248.

[7] Iliev V. V. (2022) On the Use of Entropy as a Measure of Dependence of Two Events. Part 2. Reliability: Theory & Applications 17:441-446.

[8] Kolmogorov A. N. (1956) Foundations of the Theory of Probability, Chelsea Publishing Company, New York.

[9] Stoyanov J. M. (2013) Counterexamples in Probability, Dover Publications, Inc., Mineola, New York.
