A NEW GENERALIZATION OF TWO PARAMETRIC DIVERGENCE MEASURE AND ITS APPLICATIONS
Fayaz Ahmed
Department of Statistics, University of Kashmir, Srinagar, India
fayazahmed4095@gmail.com

Mirza Abdul Khalique Baig
Department of Statistics, University of Kashmir, Srinagar, India
baigmak@gmail.com
Abstract
In this communication, we propose a two-parametric generalized divergence measure. Well-known divergence measures available in the literature arise as particular cases of the proposed measure. We also investigate its monotonic behaviour and characterization results. We apply the proposed measure to some life-time distributions and observe that the deviation is reduced. Finally, we compare the mortality rates of two countries based on COVID-19 data sets.
Keywords: characterization result, Kullback-Leibler divergence measure, Havrda-Charvát divergence, monotonic behavior, probability distribution, Rényi divergence.
1. Introduction
Information measures play an important role in information theory and other applied sciences. Shannon [12] pioneered the concept of an information (uncertainty) measure: he proposed a way to quantify the uncertainty associated with a probability distribution and established it as a central part of information theory, which today has many applications in various disciplines. Suppose X is a continuous non-negative random variable; then the Shannon [12] entropy is defined as
$$H_S(X)=-\int_0^{\infty} f(x)\log f(x)\,dx, \tag{1}$$

where f is the probability density function of X. Furthermore, it can be written as $H_S(X)=E[-\log f(X)]$; that is, $H_S(X)$ is the expected value of $-\log f(X)$.
The significance of adequate distance measures between probability distributions stems from their role in statistical inference, and they have extensive applications alongside entropy. The most prominent divergence used in information theory is relative entropy, also known as the Kullback-Leibler [6] divergence (KL divergence). It is widely used in contingency tables, ANOVA, statistical inference, etc.
If f(x) and g(x) are the probability density functions of the continuous random variables X and Y, respectively, then the [6] divergence is given by
$$D_{KL}(f\|g)=\int_0^{\infty} f(x)\log\frac{f(x)}{g(x)}\,dx. \tag{2}$$

Furthermore, it can be written as

$$D_{KL}(f\|g)=E\left[\log\frac{f(X)}{g(X)}\right].$$
Remarks
1. If g(x) = 1, then (2) reduces (up to sign) to the Shannon entropy [12].
2. If g(x) = f(x), then the [6] divergence reduces to zero.
In this direction, the generalization of the [6] divergence of order α was proposed by Rényi [11] and is defined as

$$D_{R}^{\alpha}(f\|g)=\frac{1}{\alpha-1}\log\int_0^{\infty} f(x)^{\alpha}g(x)^{1-\alpha}\,dx,\qquad \alpha\neq 1,\ \alpha>0. \tag{3}$$
Remarks
1. As α → 1, (3) reduces to the [6] divergence.
Several researchers have developed various generalizations of the [6] divergence in different ways; in this direction, Havrda and Charvát [5] proposed a generalization of the [6] divergence measure of order α, defined as

$$D_{HC}^{\alpha}(f\|g)=\frac{1}{\alpha-1}\left[\int_0^{\infty} f(x)^{\alpha}g(x)^{1-\alpha}\,dx-1\right],\qquad \alpha\neq 1,\ \alpha>0. \tag{4}$$
Remarks
1. As α → 1, (4) reduces to the [6] divergence.
Our aim is to develop a new two-parametric divergence that reduces the deviations. We apply the proposed measure to life-time distributions with different density functions, and we also obtain some characterization results. Our proposed new two-parametric measure is defined as

$$D_{\alpha,\beta}(f\|g)=\frac{1}{\beta(\alpha-\beta)}\left[\int_0^{\infty} f(x)^{\alpha-\beta+1}g(x)^{\beta-\alpha}\,dx-1\right],\qquad \alpha\neq\beta,\ \beta<\alpha+1,\ \alpha,\beta>0. \tag{5}$$
Remarks
1. When f(x) = g(x), the divergence becomes zero.
2. If β = 1, (5) reduces to the [5] divergence of order α.
3. If β = 1 and α → 1, it converges to the [6] divergence.
4. If g(x) = 1, (5) reduces to an entropy measure of the [12] type.
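As a quick numerical illustration of (5) and the remarks above, the following minimal sketch (assuming SciPy is available; the densities and parameter values are illustrative, not taken from the paper) evaluates the proposed measure and its Havrda-Charvát special case:

```python
# Numerical sketch of the proposed measure D_{alpha,beta}(f||g) in (5),
# assuming SciPy; densities and parameter values are illustrative.
from scipy.integrate import quad

def d_alpha_beta(f, g, alpha, beta, lo=0.0, hi=1.0):
    """(5): [ integral of f^(alpha-beta+1) g^(beta-alpha) - 1 ] / (beta (alpha-beta))."""
    integrand = lambda x: f(x) ** (alpha - beta + 1) * g(x) ** (beta - alpha)
    integral, _ = quad(integrand, lo, hi)
    return (integral - 1.0) / (beta * (alpha - beta))

def d_havrda_charvat(f, g, alpha, lo=0.0, hi=1.0):
    """(4): [ integral of f^alpha g^(1-alpha) - 1 ] / (alpha - 1)."""
    integrand = lambda x: f(x) ** alpha * g(x) ** (1.0 - alpha)
    integral, _ = quad(integrand, lo, hi)
    return (integral - 1.0) / (alpha - 1.0)

f = lambda x: 2.0 * x          # f(x) = 2x on (0, 1)
g = lambda x: 2.0 * (1.0 - x)  # g(x) = 2(1 - x) on (0, 1)

# Remark 2: beta = 1 recovers the Havrda-Charvat divergence of order alpha.
print(d_alpha_beta(f, g, alpha=0.5, beta=1.0))
print(d_havrda_charvat(f, g, alpha=0.5))       # same value as the line above
# Remark 1: the divergence vanishes when f = g.
print(d_alpha_beta(f, f, alpha=0.5, beta=0.2))  # ~0
```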
1.1. Comparison between the known measure and the new proposed measure
Example 1.2. Assume X and Y are two non-negative random variables with probability density functions f(x) = 2x, 0 < x < 1, and g(x) = 2(1 - x), 0 < x < 1. Table 1 compares the known measure with the proposed measure for these densities. From Table 1 we conclude that the divergence is reduced in the proposed divergence measure as compared to the known divergence measure; that is, introducing a second parameter into the known measure reduces the distance. In this sense the proposed measure is an alternative to the known divergence measure. Figure 1 demonstrates this.
Table 1: Comparison between the known measure and the new proposed measure.
$\alpha$ | $\beta$ | $D_{HC}(f\|g)$ | $D_{\alpha,\beta}(f\|g)$
0.1 | 1 | 0.094 | 0.076
0.2 | 1 | 0.18 | 0.11
0.3 | 1 | 0.26 | 0.12
0.4 | 1 | 0.34 | 0.12
0.5 | 1 | 0.42 | 0.10
0.6 | 1 | 0.51 | 0.08
0.7 | 1 | 0.61 | 0.05
0.8 | 1 | 0.72 | 0.02
0.9 | 1 | 0.85 | 0.008
Figure 1: Divergence between the known measure and the new proposed measure
Theorem 1. Assume X and Y are two non-negative random variables with probability density functions f(x) and g(x), and let $\alpha\neq\beta$, $\beta<\alpha+1$, $\alpha,\beta>0$. Then

$$D_{\alpha,\beta}(f\|g)\ \ge\ 0, \tag{6}$$

with equality if and only if f(x) = g(x).

Proof. Set $s=\alpha-\beta+1$; the conditions $\beta<\alpha+1$ and $\alpha,\beta>0$ give $s>0$. Writing the integral in (5) as an expectation with respect to g,

$$\int_0^{\infty} f(x)^{\alpha-\beta+1}g(x)^{\beta-\alpha}\,dx=\int_0^{\infty} g(x)\left(\frac{f(x)}{g(x)}\right)^{s}dx=E_g\!\left[\left(\frac{f}{g}\right)^{s}\right]. \tag{7}$$

If $\alpha>\beta$, then $s>1$ and $t\mapsto t^{s}$ is convex, so by Jensen's inequality

$$E_g\!\left[\left(\frac{f}{g}\right)^{s}\right]\ \ge\ \left(E_g\!\left[\frac{f}{g}\right]\right)^{s}=\left(\int_0^{\infty} f(x)\,dx\right)^{s}=1; \tag{8}$$

since $\beta(\alpha-\beta)>0$, it follows from (5) that $D_{\alpha,\beta}(f\|g)\ge 0$. If $\alpha<\beta$, then $0<s<1$, $t\mapsto t^{s}$ is concave, the inequality in (8) is reversed, and $\beta(\alpha-\beta)<0$, so again $D_{\alpha,\beta}(f\|g)\ge 0$. In either case, equality holds if and only if $f/g$ is constant almost everywhere, that is, f(x) = g(x). ∎
Definition 1.1 (Log-sum inequality). If f(x) and g(x) are non-negative integrable functions on a set X of finite measure, then

$$\int_X f(x)\log\frac{f(x)}{g(x)}\,dx\ \ge\ \left(\int_X f(x)\,dx\right)\log\frac{\int_X f(x)\,dx}{\int_X g(x)\,dx}.$$
Theorem 2. Assume X and Y are two non-negative random variables with probability density functions f(x) and g(x), respectively, and let $\alpha\neq\beta$, $\beta<\alpha+1$, $\alpha,\beta>0$. Then

$$\log\left[\beta(\alpha-\beta)D_{\alpha,\beta}(f\|g)+1\right]\ \ge\ (\alpha-\beta)\,D_{KL}(f\|g). \tag{12}$$

Proof. Applying the log-sum inequality with $f(x)$ and $f(x)^{\alpha-\beta+1}g(x)^{\beta-\alpha}$, and using $\int_0^{\infty} f(x)\,dx=1$, we have

$$\int_0^{\infty} f(x)\log\frac{f(x)}{f(x)^{\alpha-\beta+1}g(x)^{\beta-\alpha}}\,dx\ \ge\ -\log\int_0^{\infty} f(x)^{\alpha-\beta+1}g(x)^{\beta-\alpha}\,dx. \tag{13}$$

By (5), the right-hand side equals

$$-\log\left[\beta(\alpha-\beta)D_{\alpha,\beta}(f\|g)+1\right]. \tag{14}$$

Expanding the logarithm on the left-hand side,

$$\int_0^{\infty} f(x)\log\frac{f(x)}{f(x)^{\alpha-\beta+1}g(x)^{\beta-\alpha}}\,dx = (\alpha-\beta)H(X)+(\beta-\alpha)I(X,Y), \tag{15}$$

where $H(X)=-\int_0^{\infty} f(x)\log f(x)\,dx$ is the Shannon entropy and $I(X,Y)=-\int_0^{\infty} f(x)\log g(x)\,dx$ is the inaccuracy measure of [7]. Combining (13)-(15),

$$-\log\left[\beta(\alpha-\beta)D_{\alpha,\beta}(f\|g)+1\right]\ \le\ (\alpha-\beta)H(X)+(\beta-\alpha)I(X,Y), \tag{16}$$

and since $D_{KL}(f\|g)=I(X,Y)-H(X)$, rearranging gives (12). Hence, we get the desired result. ∎
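The bound (12) can be checked numerically. The following minimal sketch (assuming NumPy and SciPy; the densities of Example 1.2 and the parameter choices are illustrative) verifies it for several pairs $(\alpha,\beta)$:

```python
# Numerical check of the bound (12) in Theorem 2 for f(x) = 2x, g(x) = 2(1-x);
# a sketch assuming NumPy/SciPy, with illustrative parameter values.
import numpy as np
from scipy.integrate import quad

f = lambda x: 2.0 * x
g = lambda x: 2.0 * (1.0 - x)

def d_alpha_beta(alpha, beta):
    # Proposed measure (5).
    integral, _ = quad(lambda x: f(x) ** (alpha - beta + 1) * g(x) ** (beta - alpha), 0.0, 1.0)
    return (integral - 1.0) / (beta * (alpha - beta))

def d_kl():
    # D_KL(f||g); the endpoint singularities are integrable.
    integral, _ = quad(lambda x: f(x) * np.log(f(x) / g(x)), 0.0, 1.0)
    return integral

for alpha, beta in [(0.5, 0.2), (1.2, 0.8), (2.0, 1.5)]:
    lhs = np.log(beta * (alpha - beta) * d_alpha_beta(alpha, beta) + 1.0)
    rhs = (alpha - beta) * d_kl()
    assert lhs >= rhs  # log[beta(alpha-beta) D + 1] >= (alpha-beta) D_KL
    print(f"alpha={alpha}, beta={beta}: {lhs:.4f} >= {rhs:.4f}")
```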
The rest of the paper is organized as follows. We propose the weighted generalized divergence measure (WGDM) in Section 2, study its monotonic properties in Section 3, and obtain the divergence for different life-time distributions in Section 4. Section 5 presents an application, and we draw conclusions in the final section.
2. Weighted generalized divergence measure (WGDM)
In this section, we propose the weighted generalized divergence measure. The [12] entropy and the [6] divergence give equal importance to all values of the random variable, which may cause problems in practical situations. To overcome this problem, [2] first introduced a measure known as weighted entropy, defined as

$$H_S^{w}(X)=-\int_0^{\infty} x f(x)\log f(x)\,dx. \tag{17}$$

Remarks
1. If x = 1, then (17) reduces to the [12] entropy.
The weight function is represented by the factor x, which gives more weight to larger values of the random variable; such a measure is known as shift-dependent. Many researchers have proposed various weighted measures; see [13], [8] and [9]. In the recent past, based on the concept of weighted entropy, [14] attached a weight to the [6] divergence, defined as
$$D_{KL}^{w}(f\|g)=\int_0^{\infty} x f(x)\log\frac{f(x)}{g(x)}\,dx. \tag{18}$$

Remarks
1. If x = 1, then (18) reduces to the [6] divergence. Furthermore, it can be written as

$$D_{KL}^{w}(f\|g)=E\left[X\log\frac{f(X)}{g(X)}\right].$$
Definition. Similar to (18) and based on (5), the weighted proposed measure is defined as

$$D_{\alpha,\beta}^{w}(f\|g)=\frac{1}{\beta(\alpha-\beta)}\left[\int_0^{\infty} x^{\alpha} f(x)^{\alpha-\beta+1}g(x)^{\beta-\alpha}\,dx-1\right],\qquad \alpha\neq\beta,\ \beta<\alpha+1,\ \alpha,\beta>0. \tag{19}$$

Remarks
1. If the weight $x^{\alpha}$ is replaced by 1, then (19) reduces to (5).
To show the importance of the values of the random variable in the new two-parametric generalized divergence measure, we consider the following example.
Example 2.1. Suppose X and Y are two non-negative continuous random variables with density functions as follows:
1. $f_1(x)=1$, $0<x<1$, and $g_1(x)=nx^{n-1}$, $0<x<1$;
2. $f_2(x)=1$, $0<x<1$, and $g_2(x)=n(1-x)^{n-1}$, $0<x<1$.
Then the weighted generalized divergence measure characterizes the distribution function uniquely.
Using (5), after simplification, we get

$$D_{1(\alpha,\beta)}(f\|g)=\frac{1}{\beta(\alpha-\beta)}\left[\frac{n^{\beta-\alpha}}{(\beta-\alpha)(n-1)+1}-1\right]=D_{2(\alpha,\beta)}(f\|g). \tag{20}$$

Again, using (19), after simplification, we get

$$D_{1(\alpha,\beta)}^{w}(f\|g)=\frac{1}{\beta(\alpha-\beta)}\left[\frac{n^{\beta-\alpha}}{\alpha+(\beta-\alpha)(n-1)+1}-1\right] \tag{21}$$

and

$$D_{2(\alpha,\beta)}^{w}(f\|g)=\frac{1}{\beta(\alpha-\beta)}\left[\frac{n^{\beta-\alpha}\,\Gamma(\alpha+1)\Gamma(t-s)}{\Gamma(\alpha+t-s+1)}-1\right], \tag{22}$$

where $t=n(\beta-\alpha)+1$, $s=\beta-\alpha$, and

$$B(u,v)=\int_0^{1}x^{u-1}(1-x)^{v-1}\,dx=\int_0^{\infty}\frac{x^{u-1}}{(1+x)^{u+v}}\,dx=\frac{\Gamma(u)\Gamma(v)}{\Gamma(u+v)}$$

is the complete beta function. We can see from the preceding example that, without the weight, our proposed measure takes the same value for both pairs of densities, but with the weight the values differ; we therefore conclude that the weighted measure uniquely determines the distribution. For different values of $\alpha$, $\beta$ and $n$, $D_{1(\alpha,\beta)}(f\|g)=D_{2(\alpha,\beta)}(f\|g)$, but for the weighted proposed divergence measure $D_{1(\alpha,\beta)}^{w}(f\|g)\neq D_{2(\alpha,\beta)}^{w}(f\|g)$.
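The characterization can be checked numerically. The following minimal sketch (assuming SciPy; the values $n=3$, $\alpha=1.4$, $\beta=1.0$ are illustrative choices satisfying $\alpha\neq\beta$ and $\beta<\alpha+1$) evaluates (5) and (19) for the two pairs of densities:

```python
# Numerical sketch of Example 2.1 (assumes SciPy); n, alpha, beta are
# illustrative choices, not values used in the paper.
from scipy.integrate import quad

def d(f, g, alpha, beta, weighted=False):
    # Proposed measure (5); with weighted=True, the weighted measure (19).
    w = (lambda x: x ** alpha) if weighted else (lambda x: 1.0)
    integrand = lambda x: w(x) * f(x) ** (alpha - beta + 1) * g(x) ** (beta - alpha)
    integral, _ = quad(integrand, 0.0, 1.0)
    return (integral - 1.0) / (beta * (alpha - beta))

n, alpha, beta = 3.0, 1.4, 1.0
f = lambda x: 1.0                         # f1 = f2: uniform density on (0, 1)
g1 = lambda x: n * x ** (n - 1)           # g1(x) = n x^(n-1)
g2 = lambda x: n * (1.0 - x) ** (n - 1)   # g2(x) = n (1-x)^(n-1)

print(d(f, g1, alpha, beta), d(f, g2, alpha, beta))              # equal, as in (20)
print(d(f, g1, alpha, beta, True), d(f, g2, alpha, beta, True))  # differ, as in (21)-(22)
```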
Theorem 3. If X and Y are two non-negative continuous random variables with probability density functions f(x) and g(x), then for $\alpha\neq\beta$, $\beta<\alpha+1$, $\alpha,\beta>0$,

$$\log\left[\beta(\alpha-\beta)D_{\alpha,\beta}^{w}(f\|g)+1\right]\ \ge\ (\alpha-\beta)\,D_{KL}(f\|g)+\alpha\int_0^{\infty}f(x)\log x\,dx. \tag{23}$$

Proof. By the log-sum inequality, using $\int_0^{\infty} f(x)\,dx=1$ and (19),

$$\int_0^{\infty} f(x)\log\frac{f(x)}{x^{\alpha}f(x)^{\alpha-\beta+1}g(x)^{\beta-\alpha}}\,dx\ \ge\ -\log\int_0^{\infty} x^{\alpha}f(x)^{\alpha-\beta+1}g(x)^{\beta-\alpha}\,dx = -\log\left[\beta(\alpha-\beta)D_{\alpha,\beta}^{w}(f\|g)+1\right]. \tag{24}$$

Expanding the logarithm on the left-hand side,

$$\int_0^{\infty} f(x)\log\frac{f(x)}{x^{\alpha}f(x)^{\alpha-\beta+1}g(x)^{\beta-\alpha}}\,dx = (\alpha-\beta)H(X)+(\beta-\alpha)I(X,Y)-\alpha\int_0^{\infty}f(x)\log x\,dx, \tag{25}$$

where $H(X)$ is the Shannon entropy and $I(X,Y)$ is the inaccuracy measure of [7]. Combining (24) and (25) and using $D_{KL}(f\|g)=I(X,Y)-H(X)$, we obtain (23). ∎
Theorem 4. Let X and Y be two non-negative random variables with weighted generalized divergence (WGD) $D_{\alpha,\beta}^{w}(f\|g)$, and let $\alpha>\beta$, $\beta<\alpha+1$, $\alpha,\beta>0$. Then

$$\frac{1}{\beta(\alpha-\beta)}\log\left[\int_0^{\infty} x^{\alpha}f(x)^{\alpha-\beta+1}g(x)^{\beta-\alpha}\,dx\right]\ \le\ D_{\alpha,\beta}^{w}(f\|g). \tag{30}$$

Proof. For any positive number t, we have $\log t\le t-1$. Taking $t=\int_0^{\infty} x^{\alpha}f(x)^{\alpha-\beta+1}g(x)^{\beta-\alpha}\,dx$ and using (19) gives

$$\log\int_0^{\infty} x^{\alpha}f(x)^{\alpha-\beta+1}g(x)^{\beta-\alpha}\,dx\ \le\ \int_0^{\infty} x^{\alpha}f(x)^{\alpha-\beta+1}g(x)^{\beta-\alpha}\,dx-1 = \beta(\alpha-\beta)\,D_{\alpha,\beta}^{w}(f\|g). \tag{31}$$

Dividing both sides by $\beta(\alpha-\beta)>0$, we obtain the result. ∎
3. Monotonic properties

Definition 3.1. The monotonicity of a function gives insight into how it behaves: a function is said to be monotonically increasing if its values only increase as the argument increases, and monotonically decreasing if its values only decrease. In this section, we demonstrate the monotonic properties of the proposed divergence measure through the following numerical example.
Example 2.2. Assume X and Y are two non-negative random variables with probability density functions as follows:
1. $f_1(x)=2x$, $0<x<1$, and $g_1(x)=2(1-x)$, $0<x<1$;
2. $f_2(x)=x/2$, $0<x<2$, and $g_2(x)=(2-x)/2$, $0<x<2$.
Using (5), after simplification, we get

$$D_{1(\alpha,\beta)}(f\|g)=\frac{1}{\beta(\alpha-\beta)}\left[\Gamma(\alpha-\beta+2)\Gamma(\beta-\alpha+1)-1\right]=D_{2(\alpha,\beta)}(f\|g). \tag{34}$$

Again, using (19), after simplification, we get

$$D_{1(\alpha,\beta)}^{w}(f\|g)=\frac{1}{\beta(\alpha-\beta)}\left[\frac{2\,\Gamma(2\alpha-\beta+2)\Gamma(\beta-\alpha+1)}{\Gamma(\alpha+3)}-1\right] \tag{35}$$

and

$$D_{2(\alpha,\beta)}^{w}(f\|g)=\frac{1}{\beta(\alpha-\beta)}\left[\frac{2^{\alpha+1}\,\Gamma(2\alpha-\beta+2)\Gamma(\beta-\alpha+1)}{\Gamma(\alpha+3)}-1\right], \tag{36}$$

where $B(u,v)=\int_0^{1}x^{u-1}(1-x)^{v-1}\,dx=\Gamma(u)\Gamma(v)/\Gamma(u+v)$ is the complete beta function. From the graphs (a), (b) and (c) in Figure 2 we observe that, for different values of $\alpha$ and $\beta$, the measures $D_{1(\alpha,\beta)}(f\|g)$, $D_{2(\alpha,\beta)}(f\|g)$, $D_{1(\alpha,\beta)}^{w}(f\|g)$ and $D_{2(\alpha,\beta)}^{w}(f\|g)$ all exhibit increasing behaviour.

Figure 2: Monotonic behavior of the proposed weighted and non-weighted measures
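The increasing behaviour can be reproduced numerically. The following sketch (assuming SciPy; the value $\beta=0.5$ and the grid of $\alpha$ values are illustrative) evaluates (5) for the first pair of densities:

```python
# Sketch of the monotonic behaviour in Example 2.2 (assumes SciPy);
# beta and the grid of alpha values are illustrative choices.
from scipy.integrate import quad

def d(alpha, beta):
    # Proposed measure (5) for f1(x) = 2x, g1(x) = 2(1-x) on (0, 1).
    integrand = lambda x: (2.0 * x) ** (alpha - beta + 1) * (2.0 * (1.0 - x)) ** (beta - alpha)
    integral, _ = quad(integrand, 0.0, 1.0)
    return (integral - 1.0) / (beta * (alpha - beta))

beta = 0.5
for alpha in [0.7, 0.9, 1.1, 1.3]:  # keep alpha < beta + 1 so the integral converges
    print(f"alpha={alpha:.1f}: D={d(alpha, beta):.4f}")  # values increase with alpha
```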
4. Divergence measures for some well-known life-time distributions

In this section, we obtain the divergence measures for some life-time distributions using the new proposed divergence measure, where

$$U=\frac{1}{\beta(\alpha-\beta)},\qquad p=\beta-\alpha,\qquad w=\theta p+1,$$

and $B(u,v)=\int_0^{1}x^{u-1}(1-x)^{v-1}\,dx=\int_0^{\infty}\frac{x^{u-1}}{(1+x)^{u+v}}\,dx=\frac{\Gamma(u)\Gamma(v)}{\Gamma(u+v)}$ is the complete beta function.
Table 2: Proposed divergence measure for some life-time distributions.

Distribution | f(x) | g(x) | Range | $D_{\alpha,\beta}(f\|g)$
Uniform | $1/m$ | $\theta(m-x)^{\theta-1}/m^{\theta}$ | $0<x<m$ | $U[\theta^{p}/(p(\theta-1)+1)-1]$
Exponential | $ne^{-nx}$ | $n\theta e^{-n\theta x}$ | $x>0;\ n,\theta>0$ | $U[\theta^{p}/(\alpha-\beta+w)-1]$
Finite range | $r(1-x)^{r-1}$ | $r\theta(1-x)^{r\theta-1}$ | $0<x<1;\ r,\theta>0$ | $U[\theta^{p}/(\alpha-\beta+w)-1]$
Beta | $sx^{s-1}$ | $s\theta x^{s\theta-1}$ | $0<x<1;\ s,\theta>0$ | $U[\theta^{p}/(\alpha-\beta+w)-1]$
Power | $bx^{b-1}/c^{b}$ | $b\theta x^{b\theta-1}/c^{b\theta}$ | $0<x<c;\ b,\theta>0$ | $U[\theta^{p}/(\alpha-\beta+w)-1]$
Table 3: Proposed weighted divergence measure for some life-time distributions.

Distribution | f(x) | g(x) | Range | $D_{\alpha,\beta}^{w}(f\|g)$
Uniform | $1/m$ | $\theta(m-x)^{\theta-1}/m^{\theta}$ | $0<x<m$ | $U[\theta^{p}m^{\alpha}\Gamma(\alpha+1)\Gamma(w-p)/\Gamma(\alpha+w-p+1)-1]$
Exponential | $ne^{-nx}$ | $n\theta e^{-n\theta x}$ | $x>0;\ n,\theta>0$ | $U[\theta^{p}\Gamma(\alpha+1)/(n^{\alpha}(\alpha-\beta+w)^{\alpha+1})-1]$
Finite range | $r(1-x)^{r-1}$ | $r\theta(1-x)^{r\theta-1}$ | $0<x<1;\ r,\theta>0$ | $U[r\theta^{p}\Gamma(\alpha+1)\Gamma(r(\alpha-\beta+w))/\Gamma(\alpha+1+r(\alpha-\beta+w))-1]$
Beta | $sx^{s-1}$ | $s\theta x^{s\theta-1}$ | $0<x<1;\ s,\theta>0$ | $U[s\theta^{p}/(s(\alpha-\beta+w)+\alpha)-1]$
Power | $bx^{b-1}/c^{b}$ | $b\theta x^{b\theta-1}/c^{b\theta}$ | $0<x<c;\ b,\theta>0$ | $U[b\theta^{p}c^{\alpha}/(\alpha+b(\alpha-\beta+w))-1]$
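As a consistency check, the exponential rows of Tables 2 and 3 can be compared against direct numerical integration. The following sketch (assuming SciPy; the values of $n$, $\theta$, $\alpha$, $\beta$ are illustrative) does so:

```python
# Cross-check of the exponential rows of Tables 2-3 (a sketch assuming SciPy;
# the parameter values n, theta, alpha, beta are illustrative).
import math
from scipy.integrate import quad

n, theta = 1.5, 2.0
alpha, beta = 1.2, 0.8
p = beta - alpha
w = theta * p + 1.0
U = 1.0 / (beta * (alpha - beta))

f = lambda x: n * math.exp(-n * x)
g = lambda x: n * theta * math.exp(-n * theta * x)

# Unweighted: closed form U[theta^p / (alpha - beta + w) - 1] vs numerical integral.
num, _ = quad(lambda x: f(x) ** (alpha - beta + 1) * g(x) ** (beta - alpha), 0.0, math.inf)
closed = theta ** p / (alpha - beta + w)
print(U * (num - 1.0), U * (closed - 1.0))  # should agree

# Weighted: closed form U[theta^p Gamma(alpha+1) / (n^alpha (alpha-beta+w)^(alpha+1)) - 1].
num_w, _ = quad(lambda x: x ** alpha * f(x) ** (alpha - beta + 1) * g(x) ** (beta - alpha),
                0.0, math.inf)
closed_w = theta ** p * math.gamma(alpha + 1) / (n ** alpha * (alpha - beta + w) ** (alpha + 1))
print(U * (num_w - 1.0), U * (closed_w - 1.0))  # should agree
```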
5. Application
Concerning the applicability of the newly proposed divergence measure, we analyzed two sets of actual data based on COVID-19, published by Almongy et al. [1]. The first data set was recorded over 108 days in Mexico, from March 4 to July 20, 2020, and represents the mortality rate; we consider only 30 of the 108 observations, selected using a random number table.

Dataset-1:
1.041, 2.988, 5.242, 7.903, 6.327, 7.840, 7.267, 6.370, 2.926, 5.985, 7.854, 3.233, 7.151, 4.292, 2.326, 3.298, 5.459, 3.440, 3.215, 4.661, 3.499, 3.395, 2.070, 2.506, 3.029, 3.359, 3.778, 3.219, 4.120, 8.551.

The second data set was recorded over 30 days in the Netherlands, from March 31 to April 30, 2020, and also shows mortality rates.

Dataset-2:
1.273, 6.027, 10.656, 12.274, 1.974, 4.960, 5.555, 7.584, 3.883, 4.462, 4.235, 5.307, 7.968, 13.211, 3.611, 3.647, 6.940, 7.498, 5.928, 7.099, 2.254, 5.431, 10.289, 10.832, 4.097, 5.048, 1.416, 2.857, 3.461, 14.918.

Both data sets can be fitted by an exponential distribution, with parameters $\theta_1$ and $\theta_2$, respectively. We used the maximum likelihood method for the unknown parameter estimation; the estimated values are $\hat{\theta}_1=0.220$ and $\hat{\theta}_2=0.1624$, with different standard errors. The estimated values of the weighted proposed divergence measure are $D_{\alpha,\beta}^{w}(f\|g)=1.543$ and $D_{\alpha,\beta}^{w}(f\|g)=0.024$. Our analysis demonstrates that Mexico has a higher mortality rate than the Netherlands.
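The estimation step can be sketched as follows (assuming NumPy and SciPy; the choice $\alpha=1.2$, $\beta=0.8$ is illustrative and not taken from the paper):

```python
# Sketch of the application step (assumes NumPy/SciPy): exponential MLE for the
# two mortality-rate samples, then the weighted divergence (19) between the fits.
import numpy as np
from scipy.integrate import quad

# Dataset-1 (Mexico) and Dataset-2 (Netherlands), mortality rates from the text.
mexico = np.array([
    1.041, 2.988, 5.242, 7.903, 6.327, 7.840, 7.267, 6.370, 2.926, 5.985,
    7.854, 3.233, 7.151, 4.292, 2.326, 3.298, 5.459, 3.440, 3.215, 4.661,
    3.499, 3.395, 2.070, 2.506, 3.029, 3.359, 3.778, 3.219, 4.120, 8.551])
netherlands = np.array([
    1.273, 6.027, 10.656, 12.274, 1.974, 4.960, 5.555, 7.584, 3.883, 4.462,
    4.235, 5.307, 7.968, 13.211, 3.611, 3.647, 6.940, 7.498, 5.928, 7.099,
    2.254, 5.431, 10.289, 10.832, 4.097, 5.048, 1.416, 2.857, 3.461, 14.918])

# The MLE of the exponential rate is the reciprocal of the sample mean.
theta1 = 1.0 / mexico.mean()       # ~0.220
theta2 = 1.0 / netherlands.mean()  # ~0.1624

def d_w(rate_f, rate_g, alpha, beta):
    """Weighted divergence (19) between two fitted exponential densities."""
    f = lambda x: rate_f * np.exp(-rate_f * x)
    g = lambda x: rate_g * np.exp(-rate_g * x)
    integrand = lambda x: x ** alpha * f(x) ** (alpha - beta + 1) * g(x) ** (beta - alpha)
    integral, _ = quad(integrand, 0.0, np.inf)
    return (integral - 1.0) / (beta * (alpha - beta))

print(theta1, theta2)
print(d_w(theta1, theta2, alpha=1.2, beta=0.8))
```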
6. Conclusions
In this communication, we proposed a new two-parametric weighted generalized divergence measure of order α and type β. The characterization result is justified by a numerical example showing that the weighted measure uniquely determines the distribution function, and we also studied the monotonic behaviour of the proposed divergence measure. Finally, we derived expressions for some life-time distributions and compared the mortality rates of two different countries.
References
[1] Almongy, H. M., Almetwally, E. M., Aljohani, H. M., Alghamdi, A. S., and Hafez, E. H. (2021). A new extended Rayleigh distribution with applications of COVID-19 data. Results in Physics, 23:104012.
[2] Belis, M. and Guiasu, S. (1968). A quantitative-qualitative measure of information in cybernetic systems (Corresp.). IEEE Transactions on Information Theory, 14(4):593-594.
[3] Di Crescenzo, A. and Longobardi, M. (2007). On weighted residual and past entropies. arXiv preprint math/0703489.
[4] Ebrahimi, N. and Kirmani, S. N. A. (1996). A characterisation of the proportional hazards model through a measure of discrimination between two residual life distributions. Biometrika, 83(1):233-235.
[5] Havrda, J. and Charvat, F. (1967). Quantification method of classification processes. Concept of structural α-entropy. Kybernetika, 3(1):30-35.
[6] Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22(1):79-86.
[7] Kerridge, D. F. (1961). Inaccuracy and inference. Journal of the Royal Statistical Society, Series B (Methodological), 184-194.
[8] Mirali, M. and Fakoor, V. (2017). On weighted cumulative residual entropy. Communications in Statistics - Theory and Methods, 46(6):2857-2869.
[9] Moharana, R. and Kayal, S. (2019). On Weighted Kullback-Leibler Divergence for Doubly Truncated Random Variables. REVSTAT-Statistical Journal, 17(3):297-320.
[10] Gupta, R. D. and Nanda, A. (2002). α- and β-entropies and relative entropies of distributions. Journal of Statistical Theory and Applications, 1(30):177-190.
[11] Renyi, A. (1961). On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1:547-561.
[12] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27:379-423.
[13] Suhov, Y. and Salimeh, S. Y. (2015). Entropy-power inequality for weighted entropy. arXiv preprint arXiv:1502.02188.
[14] Yasaei Sekeh, S. and Mohtashami Borzadaran, G. R. (2013). On Kullback-Leibler Dynamic Information. Available at SSRN 2344078.