
EXACT AND CONDITIONAL BOUNDS FOR GENERALIZED CUMULATIVE ENTROPY

Alexey V. Lebedev

Lomonosov Moscow State University, Russia
avlebed@yandex.ru

Abstract

The differential entropy is a natural analog, for absolutely continuous distributions (with density), of the Shannon entropy for discrete distributions. In modern studies, many other kinds of entropy have been introduced and analyzed, including various cumulative entropies, which are based not on the density but on the (cumulative) distribution function of a random variable. Such characteristics can be used, for example, in computer vision, reliability theory, risk analysis, etc. We consider some generalizations of cumulative entropy, for a wide class of entropy generators. We use the methods of probability theory, calculus of variations and the Cauchy-Bunyakovsky-Schwarz inequality. In the class of centered and normalized random variables, exact and conditional bounds are found as well as the distributions on which they are attained. By conditional bounds we understand bounds for one generalized cumulative entropy given the value of another entropy (in the class of random variables with zero mean and unit variance). This problem is analogous to the previously posed and partly solved problem on conditional bounds for expectations of sample maxima when we know the expected maximum of a sample of another size or the expected maxima of two smaller samples.

Keywords: cumulative entropy, exact bounds, conditional bounds, calculus of variations

1. Introduction

The differential entropy is a natural analog, for absolutely continuous distributions (with density), of the Shannon entropy for discrete distributions [14, 6]. For a random variable X with probability density function p(x), it is given by

H(X) = -\int_{-\infty}^{+\infty} p(x)\,\ln p(x)\,dx.

For a given variance σ², the differential entropy attains its maximum on Gaussian distributions N(μ, σ²) [14, §20]; then

H(X) = \frac{1}{2} + \ln\bigl(\sigma\sqrt{2\pi}\bigr).

In modern studies, many other kinds of entropy have been introduced and analyzed, including various cumulative entropies, which are based not on the density but on the (cumulative) distribution function. Such characteristics can be used, for example, in computer vision [13], reliability theory and risk analysis [4, 5], etc. Even medical applications have been noted [1].

In [13], for nonnegative random variables there was introduced the cumulative residual entropy (CRE)

E(X) = -\int_0^{\infty} \bar F(x)\,\ln \bar F(x)\,dx,

where \bar F(x) = 1 - F(x), F being the (cumulative) distribution function (CDF) of a random variable X, and in [4] there was introduced the cumulative entropy (CE)

CE(X) = -\int_0^{\infty} F(x)\,\ln F(x)\,dx,

which was afterwards also called the direct cumulative entropy (in contrast to the residual one). In such expressions it is assumed that 0 ln 0 = 0.

It is clear that these functionals can be extended from nonnegative to arbitrary random variables by taking integrals over the entire axis:

E(X) = -\int_{-\infty}^{+\infty} \bar F(x)\,\ln \bar F(x)\,dx, \qquad CE(X) = -\int_{-\infty}^{+\infty} F(x)\,\ln F(x)\,dx.   (1)

In the general case, the integrals may both converge or diverge. For these cumulative entropies, there is the symmetry

E(X) = CE(-X).   (2)

Note that cumulative entropies (as well as the differential entropy) are traditionally written as numerical characteristics of a random variable X, though they actually depend on its distribution function F only.
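As a purely illustrative addition (not part of the original derivations), the sketch below evaluates the integrals in (1) numerically for a standard exponential random variable and checks the symmetry (2); for this example, direct integration gives E(X) = 1 and CE(X) = π²/6 − 1 ≈ 0.645. The helper names and truncation limits are arbitrary choices.

```python
import numpy as np
from scipy.integrate import quad

def cre(cdf, lo, hi):
    """E(X) = -integral of (1 - F) ln(1 - F) over [lo, hi] (numerical sketch)."""
    def f(x):
        s = 1.0 - cdf(x)
        return -s * np.log(s) if 0.0 < s < 1.0 else 0.0
    return quad(f, lo, hi, limit=200)[0]

def ce(cdf, lo, hi):
    """CE(X) = -integral of F ln F over [lo, hi] (numerical sketch)."""
    def f(x):
        F = cdf(x)
        return -F * np.log(F) if 0.0 < F < 1.0 else 0.0
    return quad(f, lo, hi, limit=200)[0]

F_exp = lambda x: 1.0 - np.exp(-x) if x >= 0 else 0.0   # X ~ Exp(1)
F_neg = lambda x: np.exp(x) if x < 0 else 1.0           # CDF of -X

print(cre(F_exp, 0, 60), ce(F_exp, 0, 60))    # ~1.0 and ~0.645 (= pi^2/6 - 1)
print(ce(F_neg, -60, 0), cre(F_neg, -60, 0))  # ~1.0 and ~0.645: the symmetry (2)
```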

In [3], representations for E(X) and CE(X) through moments of order statistics (using the power series expansion of the logarithm) have been obtained and upper bounds on these entropies were constructed assuming that X has mean μ and variance σ² (taking into account classical estimates for order statistics [7, 8]).

Namely, there were obtained the inequality [3, Theorem 1]

E(X) \le \sum_{n=1}^{+\infty} \frac{\sigma}{(n+1)\sqrt{2n+1}} \approx 1.21\,\sigma,   (3)

which is also valid for CE(X) due to symmetry (2), and the inequality [3, Theorem 3]

E(X) + CE(X) \le \sum_{n=1}^{+\infty} \frac{\sigma\sqrt{2}}{n\sqrt{n+1}} \approx 3.09\,\sigma.   (4)
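The numerical constants quoted in (3) and (4) are easy to recompute; a minimal sketch (the truncation point and the crude integral estimate of the tail are implementation choices):

```python
import numpy as np

n = np.arange(1, 1_000_001, dtype=float)

# Series from (3): sum 1/((n+1) sqrt(2n+1)); the tail behaves like (1/sqrt(2)) n^(-3/2).
s1 = np.sum(1.0 / ((n + 1.0) * np.sqrt(2.0 * n + 1.0)))
s1 += np.sqrt(2.0 / n[-1])          # integral estimate of the tail beyond the truncation

# Series from (4): sum sqrt(2)/(n sqrt(n+1)); the tail behaves like sqrt(2) n^(-3/2).
s2 = np.sum(np.sqrt(2.0) / (n * np.sqrt(n + 1.0)))
s2 += 2.0 * np.sqrt(2.0 / n[-1])    # integral estimate of the tail

print(round(s1, 3), round(s2, 3))   # about 1.21 and 3.09
```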

Also, various classes of generalized cumulative entropies have been considered [9, 10]. In particular, in [9] there were introduced the cumulative residual STM (Sharma-Taneja-Mittal) entropy

SR_{\alpha,\beta}(X) = \frac{1}{\beta - \alpha} \int_0^{\infty} \bigl( \bar F^{\alpha}(x) - \bar F^{\beta}(x) \bigr)\, dx, \qquad \alpha, \beta > 0,\ \alpha \ne \beta,

and the cumulative STM entropy

SP_{\alpha,\beta}(X) = \frac{1}{\beta - \alpha} \int_0^{\infty} \bigl( F^{\alpha}(x) - F^{\beta}(x) \bigr)\, dx, \qquad \alpha, \beta > 0,\ \alpha \ne \beta.

Clearly, they can also be extended from nonnegative to arbitrary random variables:

SR_{\alpha,\beta}(X) = \frac{1}{\beta - \alpha} \int_{-\infty}^{+\infty} \bigl( \bar F^{\alpha}(x) - \bar F^{\beta}(x) \bigr)\, dx, \qquad SP_{\alpha,\beta}(X) = \frac{1}{\beta - \alpha} \int_{-\infty}^{+\infty} \bigl( F^{\alpha}(x) - F^{\beta}(x) \bigr)\, dx, \qquad \alpha, \beta > 0,\ \alpha \ne \beta.

In [10], for a broad class of generalized cumulative entropies, optimal distributions (with given means and variances) that maximize these entropies (i.e., give their exact upper limits) have been obtained by methods of calculus of variations; however, the corresponding maximum values of the entropies have not been derived. If they are derived, for example, for E(X), CE(X), and E(X) + CE(X), it turns out that these bounds are stronger than (3) and (4).

We will consider for simplicity the class of distributions with zero mean and unit variance. Following [10], one can easily deduce that the maximum value of E(X) is 1 and the maximum is attained at the shifted exponential distribution with the CDF

F(x) = 1 - e^{-(x+1)}, \qquad x \ge -1;   (5)

the maximum value of CE(X) is the same, and it is attained at the distribution with the CDF

F(x) = e^{x-1}, \qquad x \le 1;   (6)

and the maximum value of E(X) + CE(X) is π/√3 ≈ 1.81, the maximum being attained at the logistic distribution with the CDF

F(x) = \frac{e^{\pi x/\sqrt{3}}}{e^{\pi x/\sqrt{3}} + 1}.   (7)
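As a numerical sanity check (added here for illustration), one can verify that the logistic CDF (7) has zero mean, unit variance, and E(X) + CE(X) = π/√3, and that the shifted exponential distribution (5) gives E(X) = 1; the integration limits are arbitrary truncations.

```python
import numpy as np
from scipy.integrate import quad

def entropies(cdf, lo, hi):
    """Return (E(X), CE(X)) by numerical integration of the definitions (1)."""
    def e_int(x):
        s = 1.0 - cdf(x)
        return -s * np.log(s) if 0.0 < s < 1.0 else 0.0
    def ce_int(x):
        F = cdf(x)
        return -F * np.log(F) if 0.0 < F < 1.0 else 0.0
    return quad(e_int, lo, hi, limit=300)[0], quad(ce_int, lo, hi, limit=300)[0]

def moments(cdf, lo, hi):
    """Mean and variance via E X = int_0^inf (1-F) dx - int_-inf^0 F dx and the analogous formula for E X^2."""
    m = quad(lambda x: 1.0 - cdf(x), 0, hi, limit=300)[0] - quad(cdf, lo, 0, limit=300)[0]
    m2 = quad(lambda x: 2.0 * x * (1.0 - cdf(x)), 0, hi, limit=300)[0]
    m2 += quad(lambda x: -2.0 * x * cdf(x), lo, 0, limit=300)[0]
    return m, m2 - m * m

# Logistic CDF (7): mean 0, variance 1, E + CE = pi/sqrt(3) ~ 1.813.
F_log = lambda x: 1.0 / (1.0 + np.exp(-np.pi * x / np.sqrt(3.0)))
E, CE = entropies(F_log, -40, 40)
print(moments(F_log, -40, 40), E + CE, np.pi / np.sqrt(3.0))

# Shifted exponential (5): E(X) attains its maximum value 1.
F_sh = lambda x: 1.0 - np.exp(-(x + 1.0)) if x >= -1.0 else 0.0
print(entropies(F_sh, -1.0, 60)[0])   # -> 1.0
```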

Next we formulate a simple statement that allows us to obtain an upper bound on the generalized cumulative entropy without deriving the corresponding optimal distribution; we will demonstrate it by the example of the cumulative residual STM entropy.

Then we solve a new problem about the range in which one generalized cumulative entropy of a random variable can lie provided that another entropy of this random variable is known (for random variables with zero mean and unit variance). Besides the general theorem, we analyze in detail the case of the relationship between the entropies E(X) and CE(X).

This problem is analogous to the previously posed and partly solved problem on conditional bounds for expectations of sample maxima when we know the expected maximum of a sample of another size [11] or the expected maxima of two smaller samples [12]. In this case, the corresponding characteristics are also expressed as integral functionals of the distribution function.

From the point of view of calculus of variations, the arising problems belong to the class of isoperimetric problems and are solved by the method of Lagrange multipliers (Euler-Lagrange equations).

2. Main Results

Consider the class CN of centered and normalized random variables, i.e.,

CN = {X : EX = 0, VarX = 1}.

It is clear that for all the above-mentioned entropies, in order to establish bounds, it suffices to consider random variables in this class. Indeed, let a random variable X have mean μ and variance σ²; then it admits a representation X = μ + σX_0 with X_0 ∈ CN, and it follows from definition (1) that E(X) = σE(X_0), and so on.

Introduce a notation for the generalized inverse distribution function (also called the quantile function)

x(u) = \inf\{x : F(x) \ge u\}, \qquad u \in [0, 1],

where F is the CDF of the random variable X. Then

X = x(U),

where U is uniformly distributed on [0, 1], and the condition X ∈ CN is equivalent to the following constraints on x(u):

EX = \int_0^1 x(u)\, du = 0, \qquad \mathrm{Var}\,X = \int_0^1 x^2(u)\, du = 1,

where the function x(u), u ∈ [0, 1], is nondecreasing and right continuous.
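For illustration (an addition, assuming nothing beyond the definitions above), these constraints can be checked directly in terms of the quantile function; for example, for the shifted exponential distribution (5) one has x(u) = −ln(1 − u) − 1:

```python
import numpy as np
from scipy.integrate import quad

# Quantile function of the shifted exponential distribution (5).
x = lambda u: -np.log1p(-u) - 1.0

mean = quad(x, 0.0, 1.0, limit=200)[0]                        # int_0^1 x(u) du    -> 0
second = quad(lambda u: x(u) ** 2, 0.0, 1.0, limit=200)[0]    # int_0^1 x(u)^2 du  -> 1
print(mean, second)
```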

We will consider functions g (entropy generators) satisfying the following conditions:

(*) g(u) is a nonnegative continuous concave function on [0, 1] which is piecewise smooth on (0, 1), with g(0) = g(1) = 0, and such that

G = \int_0^1 (g'(u))^2\, du < \infty.

Introduce generalized cumulative entropies represented by the integral (if it converges)

E_g(X) = \int_{-\infty}^{+\infty} g(\bar F(x))\, dx,   (8)

where F is the CDF of the random variable X and \bar F = 1 - F.

Using integration by parts and the change of variables u = F(x), we can obtain the following representations:

E_g(X) = -\int_{-\infty}^{+\infty} x\, dg(\bar F(x)) = \int_0^1 x(u)\, g'(1-u)\, du = \int_0^1 g(1-u)\, dx(u),

which gives a particular case of the generalized cumulative Φ-entropy

CE_{\Phi}(F) = \int_0^1 \Phi(u)\, dx(u)

introduced in [10], with the only difference that in [10] it was not required that Φ(0) = Φ(1) = 0 (though it was actually the case in all examples considered there).

Definition (8) also implies E_g(X) = σE_g(X_0), X_0 = (X - μ)/σ, σ > 0.

Proposition 1. Let g satisfy condition (*); then

\max_{X \in CN} E_g(X) = \sqrt{G}.

The proposition follows from the fact that according to [10, Theorem 1] this maximum is attained at the distribution with the inverse CDF

x(u) = \frac{g'(1-u)}{\sqrt{G}}, \qquad u \in [0, 1].
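A small numerical illustration of Proposition 1 (added here): for the generator g(u) = −u ln u of E(X) one gets G = 1, the optimal quantile function g′(1 − u)/√G = −(ln(1 − u) + 1) is exactly the quantile function of the shifted exponential distribution (5), and the attained value equals √G = 1.

```python
import numpy as np
from scipy.integrate import quad

g_prime = lambda u: -(np.log(u) + 1.0)           # g(u) = -u ln u  =>  g'(u) = -(ln u + 1)

G = quad(lambda u: g_prime(u) ** 2, 0.0, 1.0, limit=200)[0]
x_opt = lambda u: g_prime(1.0 - u) / np.sqrt(G)  # optimal quantile function of Proposition 1

# E_g(X) in the representation E_g(X) = int_0^1 x(u) g'(1-u) du, at the optimum:
E_opt = quad(lambda u: x_opt(u) * g_prime(1.0 - u), 0.0, 1.0, limit=200)[0]
print(G, E_opt, np.sqrt(G))   # G -> 1, and E_opt equals sqrt(G)
```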

Corollary 1. Let 1/2 < min{α, β} < 1, α ≠ β; then


\max_{X \in CN} SR_{\alpha,\beta}(X) = \sqrt{\frac{2\alpha\beta - \alpha - \beta + 1}{(2\alpha - 1)(2\beta - 1)(\alpha + \beta - 1)}},   (9)

and the maximum is attained at the distribution with the inverse CDF

x(u) = \frac{\alpha(1-u)^{\alpha-1} - \beta(1-u)^{\beta-1}}{(\beta - \alpha)\sqrt{G}}, \qquad u \in [0, 1].   (10)

In this case an optimal distribution F is not found explicitly, but it can be obtained, for example, for α = 1 or β = 1, when all expressions become simpler (this was done in [10]).

Note that for min{α, β} > 1 the concavity condition for g is violated, and for 0 < min{α, β} < 1/2 the entropy SP_{α,β}(X) may take infinitely large values on X ∈ CN (when the corresponding integrals diverge).
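A quick check of (9) (illustrative; the pair (α, β) below is an arbitrary choice within the corollary's range): compare the closed form with G obtained by direct numerical integration of (g′(u))².

```python
import numpy as np
from scipy.integrate import quad

alpha, beta = 0.75, 0.9   # any 1/2 < alpha < beta with alpha < 1

g_prime = lambda u: (alpha * u ** (alpha - 1) - beta * u ** (beta - 1)) / (beta - alpha)
G_num = quad(lambda u: g_prime(u) ** 2, 0.0, 1.0, limit=300)[0]

G_formula = (2 * alpha * beta - alpha - beta + 1) / \
            ((2 * alpha - 1) * (2 * beta - 1) * (alpha + beta - 1))

print(np.sqrt(G_num), np.sqrt(G_formula))   # both equal the right-hand side of (9)
```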

Clearly, analogous statements hold as well for SP_{α,β}, since SP_{α,β}(X) = SR_{α,β}(-X).

Theorem 1. Assume that g1 and g2 satisfy condition (*), the integrals

G_{ij} = \int_0^1 g_i'(u)\, g_j'(u)\, du, \qquad 1 \le i, j \le 2,

are introduced, and it is known that Eg2(X) = t. Then for all X ∈ CN we have

E_{g_1}(X) \le \frac{1}{G_{22}} \left( G_{12}\, t + \sqrt{(G_{11} G_{22} - G_{12}^2)(G_{22} - t^2)} \right),   (11)

and this bound is tight if the function

\tilde x(u) = \lambda_1 g_1'(1-u) + \lambda_2 g_2'(1-u),

where

\lambda_1 = \sqrt{\frac{G_{22} - t^2}{G_{11} G_{22} - G_{12}^2}}, \qquad \lambda_2 = \frac{t - G_{12}\lambda_1}{G_{22}},   (12)

is nondecreasing on (0, 1); then x̃(u) defines the distribution on which the bound is attained.

Note that by Proposition 1 we have Eg2(X) ≤ √G22, so the radicand is always nonnegative. The functions g1'(1 − u) and g2'(1 − u) are nondecreasing, and λ1 ≥ 0; however, the nondecreasing condition for x̃(u) can be violated when λ2 < 0.

For the sequel, it would be convenient to introduce the notation for the constant

p = \frac{\pi^2}{6} - 1 \approx 0.645.

Corollary 2. For all X ∈ CN we have

E(X) \le p\, CE(X) + \sqrt{(1 - p^2)(1 - CE^2(X))},   (13)

and this bound is tight if CE(X) > p.

By symmetry (2) of the entropies, we also have

CE(X) \le p\, E(X) + \sqrt{(1 - p^2)(1 - E^2(X))},

and this bound is tight if E(X) > p. By inverting the inequality, we can also obtain a lower bound

E(X) \ge p\, CE(X) - \sqrt{(1 - p^2)(1 - CE^2(X))}

in the range CE(X) ≥ √(1 − p²) ≈ 0.764 where this bound is nonnegative (but we cannot claim that it is tight). Similarly, a lower bound for CE(X) can be found.
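The tight case of (13) can be verified numerically (an added illustration; the value t = 0.8 is arbitrary, any p ≤ t < 1 would do): build the candidate quantile function of Theorem 1 for g1(u) = −u ln u, g2(u) = −(1 − u) ln(1 − u) and check that it is centered and normalized, has CE(X) = t, and attains the right-hand side of (13).

```python
import numpy as np
from scipy.integrate import quad

p = np.pi ** 2 / 6.0 - 1.0
t = 0.8                                    # any p <= t < 1

lam1 = np.sqrt((1.0 - t ** 2) / (1.0 - p ** 2))
lam2 = t - p * lam1                        # lambda_2 >= 0 exactly when t >= p

# Candidate quantile function from Theorem 1 with g1(u) = -u ln u, g2(u) = -(1-u) ln(1-u):
x = lambda u: lam1 * (-(np.log1p(-u) + 1.0)) + lam2 * (np.log(u) + 1.0)

mean = quad(x, 0, 1, limit=300)[0]
var = quad(lambda u: x(u) ** 2, 0, 1, limit=300)[0]
E = quad(lambda u: x(u) * (-(np.log1p(-u) + 1.0)), 0, 1, limit=300)[0]   # E(X)
CE = quad(lambda u: x(u) * (np.log(u) + 1.0), 0, 1, limit=300)[0]        # CE(X)

bound = p * t + np.sqrt((1.0 - p ** 2) * (1.0 - t ** 2))
print(mean, var)       # -> 0, 1
print(CE, E, bound)    # CE -> t, and E attains the bound (13)
```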

The question of what the upper bound is when x̃(u) is not nondecreasing remains open. In this case we deal with a problem not of the calculus of variations but of optimal control (with an additional condition x'(u) ≥ 0), which is much more complicated. One can also apply an approach to establishing (not tight) bounds using special families of distributions, as was done in [11]. This approach is exploited in the proof of the following theorem.

Theorem 2. For any 0 < t < p we have

\max_{X \in CN,\ CE(X) = t} E(X) \ge \sqrt{\frac{1-a}{1+a}}\, \bigl(1 - \ln(1-a)\bigr),

where a is the unique solution on (0, 1) of the equation¹

\frac{a(1 - \ln a) + \mathrm{Li}_2(1 - a) - 1}{\sqrt{1 - a^2}} = t.
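For a given t ∈ (0, p), the equation of Theorem 2 can be solved numerically and the lower bound evaluated; a sketch (the dilogarithm is implemented by the series definition of Li₂ from the footnote, and the root is found by bracketing, both being implementation choices):

```python
import numpy as np
from scipy.optimize import brentq

def li2(z, n_terms=500_000):
    """Dilogarithm Li_2(z) = sum_{n>=1} z^n / n^2 for 0 <= z <= 1 (plain series)."""
    n = np.arange(1, n_terms + 1, dtype=float)
    return float(np.sum(z ** n / n ** 2))

def ce_family(a):
    """CE(X_a) for the mixture family used in the proof of Theorem 2."""
    return (a * (1.0 - np.log(a)) + li2(1.0 - a) - 1.0) / np.sqrt(1.0 - a ** 2)

def lower_bound(t):
    a = brentq(lambda a: ce_family(a) - t, 1e-9, 1.0 - 1e-9)   # CE(X_a) decreases in a
    return np.sqrt((1.0 - a) / (1.0 + a)) * (1.0 - np.log(1.0 - a))

p = np.pi ** 2 / 6.0 - 1.0
for t in (0.1, 0.3, 0.5, 0.9 * p):
    print(t, lower_bound(t))
```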

By symmetry (2), an analogous estimate holds for CE(X) given E(X), whence one can obtain a lower estimate for the maximum of E(X) given CE(X).

Figure 1 represents plots of the obtained bounds for the entropies E(X) and CE(X). In bold, we highlight the interval where the bound (13) is tight; the dotted line shows the bound of Theorem 2. Points of the bound marked by the triangle, star, and circle correspond to the distributions (5), (6), and (7). In the ranges CE(X) < p and E(X) < p, true bounds lie somewhere in between the solid and dotted lines. Establishing them deserves further investigation.

¹Here, \mathrm{Li}_m(z) = \sum_{n=1}^{\infty} z^n / n^m is the polylogarithm of order m.

Figure 1: Plots of the bounds for the entropies E (X) and CE (X).

3. Proofs

Proof of Corollary 1. Let, for definiteness, α < β; then 1/2 < α < 1. Put g(u) = (u^α - u^β)/(β - α); then

g'(u) = \frac{\alpha u^{\alpha-1} - \beta u^{\beta-1}}{\beta - \alpha}, \qquad g''(u) = \frac{\alpha(\alpha - 1) u^{\alpha-2} - \beta(\beta - 1) u^{\beta-2}}{\beta - \alpha} < 0, \quad u \in (0, 1).

We obtain

G = \int_0^1 \left( \frac{\alpha u^{\alpha-1} - \beta u^{\beta-1}}{\beta - \alpha} \right)^2 du = \frac{1}{(\beta - \alpha)^2} \int_0^1 \bigl( \alpha^2 u^{2\alpha-2} - 2\alpha\beta u^{\alpha+\beta-2} + \beta^2 u^{2\beta-2} \bigr)\, du

= \frac{1}{(\beta - \alpha)^2} \left[ \frac{\alpha^2}{2\alpha - 1} - \frac{2\alpha\beta}{\alpha + \beta - 1} + \frac{\beta^2}{2\beta - 1} \right] = \frac{2\alpha\beta - \alpha - \beta + 1}{(2\alpha - 1)(2\beta - 1)(\alpha + \beta - 1)},

and equations (9) and (10) follow.

Proof of Theorem 1. By considering the Lagrangian

L = \int_0^1 \left( \lambda_1 x(u)\, g_1'(1-u) + \lambda_2 x(u)\, g_2'(1-u) + \lambda_3 x(u) + \lambda_4 x^2(u) \right) du,

we obtain the Euler-Lagrange equation

\lambda_1 g_1'(1-u) + \lambda_2 g_2'(1-u) + \lambda_3 + 2\lambda_4 x(u) = 0,

where we may without loss of generality take λ4 = -1/2. Thus, we will seek a function

\tilde x(u) = \lambda_1 g_1'(1-u) + \lambda_2 g_2'(1-u) + \lambda_3

satisfying the conditions

\int_0^1 \tilde x(u)\, du = 0, \qquad \int_0^1 \tilde x^2(u)\, du = 1, \qquad \int_0^1 \tilde x(u)\, g_2'(1-u)\, du = t.   (14)

The first condition, taking into account that g_i(0) = g_i(1) = 0, i = 1, 2, gives λ3 = 0; the second and third yield a system of equations

G_{11}\lambda_1^2 + 2 G_{12}\lambda_1\lambda_2 + G_{22}\lambda_2^2 = 1, \qquad G_{12}\lambda_1 + G_{22}\lambda_2 = t;

by solving this system with respect to λ1 and λ2, we obtain (12).

Next, for any function x(u) corresponding to X ∈ CN, by the Cauchy-Bunyakovsky-Schwarz inequality we obtain

\int_0^1 \tilde x(u)\, x(u)\, du = \lambda_1 E_{g_1}(X) + \lambda_2 t \le \left( \int_0^1 \tilde x^2(u)\, du \right)^{1/2} \left( \int_0^1 x^2(u)\, du \right)^{1/2} = 1,   (15)

whence

E_{g_1}(X) \le \frac{1 - \lambda_2 t}{\lambda_1} = \frac{G_{22} - (t - \lambda_1 G_{12})\, t}{\lambda_1 G_{22}} = \frac{\lambda_1 G_{12}\, t + G_{22} - t^2}{\lambda_1 G_{22}} = \frac{1}{G_{22}} \left( G_{12}\, t + \sqrt{(G_{11} G_{22} - G_{12}^2)(G_{22} - t^2)} \right).

If x̃(u) is nondecreasing and thus corresponds to some distribution, then with x(u) = x̃(u) inequality (15) turns into equality, and the bound is attained.

Proof of Corollary 2. We apply Theorem 1 in the case of g1(u) = -u ln u and g2(u) = -(1 - u) ln(1 - u); then, as we have already obtained, G11 = G22 = 1, and we find

G_{12} = -\int_0^1 (\ln u + 1)(\ln(1-u) + 1)\, du = p;

plugging this into (11), we obtain (13). In this case we have

\tilde x(u) = \lambda_1 \bigl( -(\ln(1-u) + 1) \bigr) + \lambda_2 (\ln u + 1),

where

\lambda_1 = \sqrt{\frac{1 - t^2}{1 - p^2}}, \qquad \lambda_2 = t - p\lambda_1.

A necessary and sufficient condition for x̃(u) to be nondecreasing on (0, 1) is λ2 ≥ 0, which happens to be equivalent to the inequality t ≥ p.

Proof of Theorem 2. Consider a family of random variables X_a^0, a ∈ [0, 1), whose distribution is a mixture of zero (with probability a) and the standard exponential distribution (with probability 1 - a). Then the inverse CDFs take the form


x_a^0(u) = \begin{cases} 0, & 0 \le u \le a; \\ -\ln\dfrac{1-u}{1-a}, & a < u \le 1. \end{cases}

We have

E X_a^0 = 1 - a, \qquad E (X_a^0)^2 = 2(1 - a), \qquad \mathrm{Var}\, X_a^0 = 2(1 - a) - (1 - a)^2 = 1 - a^2.

Put

X_a = \frac{X_a^0 - E X_a^0}{\sqrt{\mathrm{Var}\, X_a^0}}.

Then X_a ∈ CN, a ∈ [0, 1); X_0 has distribution (5); and X_a → 0 as a → 1 - 0.

Compute the corresponding entropies for 0 < a < 1:

E(X_a) = \frac{E(X_a^0)}{\sqrt{1 - a^2}} = \frac{1}{\sqrt{1 - a^2}} \int_a^1 \ln\frac{1-u}{1-a}\, \bigl(\ln(1-u) + 1\bigr)\, du = \frac{(1-a)\bigl(1 - \ln(1-a)\bigr)}{\sqrt{1 - a^2}} = \sqrt{\frac{1-a}{1+a}}\, \bigl(1 - \ln(1-a)\bigr),

CE(X_a) = \frac{CE(X_a^0)}{\sqrt{1 - a^2}} = -\frac{1}{\sqrt{1 - a^2}} \int_a^1 \ln\frac{1-u}{1-a}\, (\ln u + 1)\, du = \frac{a(1 - \ln a) + \mathrm{Li}_2(1 - a) - 1}{\sqrt{1 - a^2}},

and CE(X_a) strictly decreases in the interval 0 < a < 1.

Thus, from the values of the entropies on the family X_a, a ∈ (0, 1), we can obtain the estimate of Theorem 2.
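As a cross-check of the closed-form expressions for E(X_a) and CE(X_a) above (an added illustration; a = 0.5 is an arbitrary choice), one can integrate the definitions (1) directly for the standardized mixture X_a:

```python
import numpy as np
from scipy.integrate import quad

a = 0.5
mu, sigma = 1.0 - a, np.sqrt(1.0 - a ** 2)

def cdf(x):
    """CDF of X_a = (X_a^0 - mu)/sigma, where X_a^0 is the zero/exponential mixture."""
    y = sigma * x + mu
    return 0.0 if y < 0 else 1.0 - (1.0 - a) * np.exp(-y)

x0 = -mu / sigma   # location of the atom; the support is [x0, +infinity)

E_num = quad(lambda x: -(1 - cdf(x)) * np.log(1 - cdf(x)) if cdf(x) < 1 else 0.0,
             x0, 60, limit=300)[0]
CE_num = quad(lambda x: -cdf(x) * np.log(cdf(x)) if cdf(x) > 0 else 0.0,
              x0, 60, limit=300)[0]

def li2(z, n_terms=500_000):
    n = np.arange(1, n_terms + 1, dtype=float)
    return float(np.sum(z ** n / n ** 2))

E_formula = np.sqrt((1 - a) / (1 + a)) * (1 - np.log(1 - a))
CE_formula = (a * (1 - np.log(a)) + li2(1 - a) - 1) / np.sqrt(1 - a ** 2)

print(E_num, E_formula)     # the two values should agree (about 0.978 for a = 0.5)
print(CE_num, CE_formula)   # the two values should agree (about 0.495 for a = 0.5)
```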

References

[1] Ahmadini, A. A. H., Hassan, A. S., Zaky, A. N., Alshqaq, S. S. (2020) Bayesian inference of dynamic cumulative residual entropy from Pareto II distribution with application to COVID-19. AIMS Mathematics. 6(3):2196-2216.

[2] Balakrishnan, N., Bendre, S. M. (1993) Improved bounds for expectations of linear functions of order statistics. Statistics. 24(2):161-165.

[3] Balakrishnan, N., Buono, F., Longobardi, M. (2022) On cumulative entropies in terms of moments of order statistics. Methodol. Comput. Appl. Probab. 24:345-359.

[4] Di Crescenzo, A. and Longobardi, M. (2009) On cumulative entropies. J. Stat. Plann. Inference. 139:4072-4087.

[5] Di Crescenzo, A. and Longobardi, M. (2013) Stochastic comparisons of cumulative entropies. In: Li, H., Li, X. (eds) Stochastic Orders in Reliability and Risk. Lecture Notes in Statistics, vol. 208. Springer, New York, NY. 167-182.

[6] Gelfand, I. M., Kolmogorov, A. N., Yaglom, A. M. (1993) Amount of information and entropy for continuous distributions. In: Shiryayev, A. N. (eds) Selected Works of A. N. Kolmogorov. Mathematics and Its Applications, vol. 27. Springer, Dordrecht. 33-56.

[7] Gumbel, E. J. (1954) The maxima of the mean largest value and of the range. Ann. Math. Stat. 25(1):76-84.

[8] Hartley, H. O. and David, H. A. (1954) Universal bounds for mean range and extreme observation. Ann. Math. Stat. 25(1):85-99.

[9] Kattumannil, S. K., Sreedevi, E. P., Balakrishnan, N. (2022) A generalized measure of cumulative residual entropy. Entropy. 24, Art. 444.

[10] Klein, I. and Doll, M. (2020) (Generalized) maximum cumulative direct, residual, and paired Φ entropy approach. Entropy. 22, Art. 91.

[11] Ivanov, D. V. (2019) Conditional bounds of expected maxima of random variables and their reachability. Systems and Means of Informatics. 29(1):140-163. (in Russian)

[12] Ivanov, D. V. (2023) On the bounds for the expected maxima of random samples with known expected maxima of two samples of smaller size. Theory Probab. Appl. 68(1):2-15.

[13] Rao, M., Chen, Y., Vemuri, B., Wang, F. (2004) Cumulative residual entropy: A new measure of information. IEEE Trans. Inf. Theory. 50:1220-1228.

[14] Shannon, C. E. (1948) A mathematical theory of communication. Bell. Syst. Tech. J. 27:379-423.
