A NEW TWO PARAMETRIC GENERALIZED DIVERGENCE MEASURE AND ITS RESIDUAL
Fayaz Ahmed
Department of Statistics, University of Kashmir, Srinagar, India
[email protected]
Mirza Abdul Khalique Baig
Department of Statistics, University of Kashmir, Srinagar, India [email protected]
Abstract
In this research article, we propose a new two-parametric divergence measure and develop its weighted version. We study its properties and particular cases with examples, and we obtain results and bounds for the new two-parametric weighted generalized divergence measure. A numerical example illustrates how the measure determines the distribution function, and some inequalities for the proposed divergence measure are studied. Known divergence measures arise as particular cases of our proposed measure. Using the proportional hazard rate model (PHRM), the proposed measure uniquely characterizes the distribution function. Its residual form is also investigated.
Keywords: characterization result, divergence measure, distribution function, proportional hazard rate model, residual function.
1. Introduction
The idea of information measures plays an important role in the field of information theory and other applied sciences. The concept of an information (uncertainty) measure was first given by Shannon [10]. He suggested a method for quantifying the uncertainty inherent in a probability distribution and established it as a main component of information theory, which today has numerous applications across many fields.
Suppose X is a continuous non-negative random variable; then Shannon's [10] entropy is defined as
H_S(X) = -\int_0^{\infty} f(x)\log f(x)\, dx    (1)
where f represents the density function of X. In addition, it can be written as
H_S(X) = E[-\log f(X)], that is, H_S(X) equals the expected value of -\log f(X).
The discrepancy between two density functions is quantified by divergence measures, which are closely related to Shannon entropy. The most common divergence applied in information theory is the Kullback-Leibler divergence [4], also known as relative entropy or the Kullback-Leibler information divergence measure (KL divergence). It is widely used in parameter estimation, contingency tables and ANOVA.
Suppose f(x) and g(x) are the probability density functions of continuous random variables X and Y, respectively; then the KL divergence [4] is given by
D_{KL}(f\|g) = \int_0^{\infty} f(x)\log\frac{f(x)}{g(x)}\, dx    (2)
Furthermore, it can be written as
D_{KL}(f\|g) = E\left[\log\frac{f(X)}{g(X)}\right]
Remarks
1. If g(x) = 1, then it reduces to Shannon's entropy [10], up to sign.
2. If g(x) = f(x), then the Kullback-Leibler divergence [4] reduces to zero, as illustrated numerically below.
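As a quick numerical illustration (not part of the original paper), the following sketch evaluates (1) and (2) for two exponential densities; the exponential rates, the truncation point of the integrals, and the use of scipy quadrature are choices made here only for illustration.

# A minimal sketch (assumptions: Exp(1) and Exp(2) densities; the integrals
# are truncated at x = 100, where the integrands are negligible).
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x)              # density of X ~ Exp(1)
g = lambda x: 2.0 * np.exp(-2.0 * x)  # density of Y ~ Exp(2)

H_S, _ = quad(lambda x: -f(x) * np.log(f(x)), 0.0, 100.0)            # eq. (1)
D_KL, _ = quad(lambda x: f(x) * np.log(f(x) / g(x)), 0.0, 100.0)     # eq. (2)
D_KL_self, _ = quad(lambda x: f(x) * np.log(f(x) / f(x)), 0.0, 100.0)

print(H_S)        # 1 - log(1) = 1 for Exp(1)
print(D_KL)       # log(1/2) + 2 - 1 = 1 - log 2, approximately 0.307
print(D_KL_self)  # approximately 0, illustrating Remark 2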
In this regard, Renyi [9] presented the generalization of the Kullback-Leibler divergence [4] of order β, which is defined as
D_R^{\beta}(f\|g) = \frac{1}{\beta - 1}\log\int_0^{\infty} f(x)^{\beta} g(x)^{1-\beta}\, dx, \quad \beta \neq 1,\ \beta > 0    (3)
Remarks
1. If β → 1, then it reduces to the Kullback-Leibler divergence [4].
Many researchers have developed distinct generalizations of the Kullback-Leibler divergence [4] in distinct manners. Gupta and Nanda [8] introduced a generalization of the KL divergence measure of order β, defined as follows.
D_G^{\beta}(f\|g) = \frac{1}{\beta - 1}\log\int_0^{\infty}\left(\frac{f(x)}{g(x)}\right)^{\beta - 1} f(x)\, dx, \quad \beta \neq 1,\ \beta > 0    (4)
Remarks
1. If β → 1, then it reduces to the Kullback-Leibler divergence [4].
Our goal is to develop a new generalization of the Kullback-Leibler divergence [4] that is two-parametric in nature.
Our proposed measure is defined as follows
D_{\alpha,\beta}(f\|g) = \frac{1}{\beta - \alpha}\log\int_0^{\infty}\left(\frac{f(x)}{g(x)}\right)^{\beta - \alpha} f(x)^{\alpha}\, dx, \quad \beta \neq \alpha,\ \alpha \geq 1,\ \alpha, \beta > 0    (5)
Additionally, it can be expressed as
D_{\alpha,\beta}(X\|Y) = \frac{1}{\beta - \alpha}\log\int_0^{\infty} (f(x))^{\beta} (g(x))^{\alpha - \beta}\, dx    (6)
Remarks
1. If f(x) = g(x) and α = 1, then the divergence becomes zero.
2. If α = 1, then (5) reduces to the Gupta and Nanda [8] measure of order β.
3. If α = 1 and β → 1, it converges to the Kullback-Leibler divergence [4].
4. If g(x) = 1, α = 1 and β → 1, then (5) reduces to Shannon entropy [10], up to sign.
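As a numerical illustration of (6) and the remarks above, the sketch below (not from the paper; the exponential densities, the parameter values, and the truncation point of the integrals are arbitrary choices) evaluates the proposed measure and its Kullback-Leibler limit.

# A minimal sketch (assumptions: Exp(1) and Exp(2) densities; integrals are
# truncated at x = 100, where the integrands are negligible).
import numpy as np
from scipy.integrate import quad

def D_alpha_beta(f, g, alpha, beta, upper=100.0):
    """Proposed two-parametric divergence, eq. (6)."""
    integral, _ = quad(lambda x: f(x)**beta * g(x)**(alpha - beta), 0.0, upper)
    return np.log(integral) / (beta - alpha)

f = lambda x: np.exp(-x)              # Exp(1) density
g = lambda x: 2.0 * np.exp(-2.0 * x)  # Exp(2) density

print(D_alpha_beta(f, g, alpha=1.2, beta=2.0))  # a generic (alpha, beta) value
print(D_alpha_beta(f, f, alpha=1.0, beta=2.0))  # zero when f = g and alpha = 1
# Remark 3: with alpha = 1 and beta close to 1, the value approaches the
# Kullback-Leibler divergence (2).
print(D_alpha_beta(f, g, alpha=1.0, beta=1.001))
kl, _ = quad(lambda x: f(x) * np.log(f(x) / g(x)), 0.0, 100.0)
print(kl)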
The rest of the article is organized as follows. In Section 2 we study the weighted generalized divergence measure, and in Section 3 the weighted generalized residual divergence measure. In Section 4 we derive some characterization results using the proportional hazard rate model. Finally, in Section 5 we study some properties and bounds of the new proposed divergence measure.
2. Weighted Generalized Divergence Measure
We propose the weighted generalized divergence measure in this section. Shannon's measure and the Kullback-Leibler divergence [4] both give identical weight to all values of the random variable, which can be problematic in real-world scenarios. To solve that problem, weighted entropy was first developed by Belis and Guiasu [1]. The weighted entropy is defined as
H_S^{w}(X) = -\int_0^{\infty} x f(x)\log f(x)\, dx    (7)
Remarks
1. If x = 1, then it becomes Shannon entropy [10].
The factor x, which gives more weight to higher values of the random variable, serves as the weight function; such a measure is called shift-dependent. Suhov and Yasaei Sekeh [11], Mirali and Fakoor [6], Moharana and Kayal [7], and many other researchers have presented numerous weighted measures.
Based on the idea of weighted entropy, Yasaei Sekeh and Mohtashami Borzadaran [12] recently gave a weighted version of the Kullback-Leibler divergence [4], defined as
D_{KL}^{w}(f\|g) = \int_0^{\infty} x f(x)\log\frac{f(x)}{g(x)}\, dx    (8)
Remarks
1. If x = 1, then it reduces to the Kullback-Leibler divergence [4].
Additionally, it can be written as

D_{KL}^{w}(f\|g) = E\left[X\log\frac{f(X)}{g(X)}\right]
Definition 2.1. In analogy with (8), and based on (6), the weighted version of the proposed measure is defined as
D_{\alpha,\beta}^{w}(X\|Y) = \frac{1}{\beta - \alpha}\log\int_0^{\infty} x\,(f(x))^{\beta} (g(x))^{\alpha - \beta}\, dx    (9)
Remarks
1. If x = 1, then it reduces to (6).
The following example shows how the generalized divergence measure and its weighted form differ from one another.
Example 2.1. Suppose X and Y are two non-negative continuous random variables with density functions as follows:
1. f_1(x) = 1, 0 < x < 1 and g_1(x) = 2x, 0 < x < 1
2. f_2(x) = 1, 0 < x < 1 and g_2(x) = 2(1 - x), 0 < x < 1
We show that the weighted generalized divergence measure distinguishes these two pairs, whereas the unweighted measure does not.
Using (5), after simplification, we get

D_{1(\alpha,\beta)}(f\|g) = D_{2(\alpha,\beta)}(f\|g) = \frac{1}{\beta - \alpha}\log\left(\frac{2^{\alpha-\beta}}{\alpha - \beta + 1}\right)    (10)

Again using (9), after simplification, we get

D_{1(\alpha,\beta)}^{w}(f\|g) = \frac{1}{\beta - \alpha}\log\left(\frac{2^{\alpha-\beta}}{\alpha - \beta + 2}\right)    (11)

and

D_{2(\alpha,\beta)}^{w}(f\|g) = \frac{1}{\beta - \alpha}\log\left(\frac{2^{\alpha-\beta}\,\Gamma(\alpha - \beta + 1)}{\Gamma(\alpha - \beta + 3)}\right) = \frac{1}{\beta - \alpha}\log\left(2^{\alpha-\beta}\, B(2, \alpha - \beta + 1)\right)    (12)

where B(u, v) = \int_0^1 x^{u-1}(1-x)^{v-1}\, dx = \int_0^{\infty} \frac{x^{u-1}}{(1+x)^{u+v}}\, dx = \frac{\Gamma(u)\Gamma(v)}{\Gamma(u+v)} denotes the complete beta function.
We can see from the above example that our proposed measure takes the same value for both pairs without the weight but different values with the weight; hence we draw the conclusion that the weighted measure uniquely determines the distribution.
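The closed forms (10)-(12) can be checked numerically; the following sketch (an illustration added here, not part of the paper, with arbitrarily chosen values of α and β) compares the quadrature values with the expressions above.

# A minimal numerical check of Example 2.1 (sketch): the unweighted measure
# (6) coincides for the two pairs, while the weighted measure (9) separates them.
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

alpha, beta = 3.0, 1.5
s = alpha - beta

f  = lambda x: 1.0
g1 = lambda x: 2.0 * x
g2 = lambda x: 2.0 * (1.0 - x)

def D(fd, gd, weighted):
    w = (lambda x: x) if weighted else (lambda x: 1.0)
    integral, _ = quad(lambda x: w(x) * fd(x)**beta * gd(x)**(alpha - beta), 0.0, 1.0)
    return np.log(integral) / (beta - alpha)

# Unweighted: both pairs give (1/(beta - alpha)) * log(2^s / (s + 1)), eq. (10)
print(D(f, g1, False), D(f, g2, False), np.log(2**s / (s + 1)) / (beta - alpha))
# Weighted: eqs. (11) and (12) give different values
print(D(f, g1, True), np.log(2**s / (s + 2)) / (beta - alpha))
print(D(f, g2, True), np.log(2**s * gamma(s + 1) / gamma(s + 3)) / (beta - alpha))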
3. Weighted Generalized Residual Divergence Measure
In this section, we discuss the generalized residual divergence measure and its weighted version. Shannon's measure and the Kullback-Leibler divergence measure are inapplicable to a component that has already survived for some units of time. To overcome this problem, the authors of [3] introduced a residual measure of entropy. This measure of uncertainty for a random variable's remaining lifetime X_t = [X - t | X > t] is defined as
H_S(X; t) = -\int_t^{\infty} \frac{f(x)}{\bar F(t)}\log\frac{f(x)}{\bar F(t)}\, dx    (13)
Remarks
1. If t = 0, then it reduces to Shannon entropy [10].
The weighted form of (13) was given in [2] and is defined as
H_S^{w}(X; t) = -\int_t^{\infty} x\,\frac{f(x)}{\bar F(t)}\log\frac{f(x)}{\bar F(t)}\, dx    (14)
Definition 3.1. In accordance with (13) and based on (14), the residual divergence measure is defined as
D_{KL}(X\|Y; t) = \int_t^{\infty} \frac{f(x)}{\bar F(t)}\log\left(\frac{f(x)\,\bar G(t)}{g(x)\,\bar F(t)}\right) dx    (15)
Remarks
1. If t = 0, then it reduces to the Kullback-Leibler divergence [4].
The weighted form of the residual divergence measure is defined as
D_{KL}^{w}(X\|Y; t) = \int_t^{\infty} x\,\frac{f(x)}{\bar F(t)}\log\left(\frac{f(x)\,\bar G(t)}{g(x)\,\bar F(t)}\right) dx    (16)
Remarks
1. If x = 1, then it reduces to (15).
Proceeding in this manner, the generalized residual divergence measure of Gupta and Nanda [8] is defined as
D_G^{\beta}(X\|Y; t) = \frac{1}{\beta - 1}\log\int_t^{\infty}\left(\frac{f(x)/\bar F(t)}{g(x)/\bar G(t)}\right)^{\beta - 1}\frac{f(x)}{\bar F(t)}\, dx    (17)
It can also be written as

D_G^{\beta}(X\|Y; t) = \frac{1}{\beta - 1}\log\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{1 - \beta} dx    (18)
In the same way, the residual form of our proposed measure (6), analogous to (18), is defined as

D_{\alpha,\beta}(X\|Y; t) = \frac{1}{\beta - \alpha}\log\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha - \beta} dx    (19)
Definition 3.2. Let X and Y be two non-negative random variables. The proposed weighted residual divergence measure is defined as
D_{\alpha,\beta}^{w}(X\|Y; t) = \frac{1}{\beta - \alpha}\log\int_t^{\infty} x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha - \beta} dx    (20)
Remarks
1. If x = 1, then (20) reduces to (19); if in addition α = 1, it reduces to (18).
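As an illustrative sketch (the exponential lifetimes, the parameter values, and the truncation point are assumptions made here, not taken from the paper), (20) can be evaluated numerically:

# A minimal sketch evaluating the weighted generalized residual divergence
# measure (20) by quadrature for X ~ Exp(1), Y ~ Exp(2).
import numpy as np
from scipy.integrate import quad

lam, mu = 1.0, 2.0
f    = lambda x: lam * np.exp(-lam * x)
Fbar = lambda x: np.exp(-lam * x)
g    = lambda x: mu * np.exp(-mu * x)
Gbar = lambda x: np.exp(-mu * x)

def D_w_res(alpha, beta, t, upper=100.0):
    """Weighted generalized residual divergence, eq. (20)."""
    integrand = lambda x: x * (f(x) / Fbar(t))**beta * (g(x) / Gbar(t))**(alpha - beta)
    integral, _ = quad(integrand, t, upper)
    return np.log(integral) / (beta - alpha)

for t in (0.0, 0.5, 1.0):
    print(t, D_w_res(alpha=1.2, beta=2.0, t=t))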
4. Characterization result
Proportional hazard rate model (PHRM)
Definition 4.1. Assume that X and Y are two non-negative random variables with survival functions \bar F(x) and \bar G(x), respectively. Then X and Y satisfy the proportional hazard rate model if

\bar G(x) = \left[\bar F(x)\right]^{\theta} \quad (\theta > 0), \qquad \text{or equivalently} \qquad \lambda_G(x) = \theta\,\lambda_F(x)
This mathematical relationship is crucial in the development of various statistical models. Many statistical disciplines, including medicine, reliability, economics, and survival analysis, employ this approach.
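For instance (a standard illustration with an exponential baseline assumed here, not taken from the paper): if \bar F(x) = e^{-\lambda x}, then under the PHRM with constant \theta,

\bar G(x) = \left[\bar F(x)\right]^{\theta} = e^{-\theta\lambda x}, \qquad \lambda_G(x) = \frac{g(x)}{\bar G(x)} = \theta\lambda = \theta\,\lambda_F(x),

so Y is again exponentially distributed, with hazard rate scaled by \theta.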
Theorem 1. For any t > 0, if the random variables X and Y satisfy the proportional hazard rate model with proportionality constant θ > 0, then

D_{\alpha,\beta}^{w}(X\|Y; t) = \frac{1}{\beta - \alpha}\log\left( t\, e^{(\beta - \alpha) D_{\alpha,\beta}(X\|Y; t)} + \int_t^{\infty}\left(\frac{\bar F(v)}{\bar F(t)}\right)^{\theta(\alpha - \beta) + \beta} e^{(\beta - \alpha) D_{\alpha,\beta}(X\|Y; v)}\, dv \right)    (21)
Proof. Rewriting (20), and using x = \int_0^t dv + \int_t^x dv, we obtain

\int_t^{\infty} x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha - \beta} dx = \int_t^{\infty}\left(\int_0^x dv\right)\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha - \beta} dx    (22)

= \int_t^{\infty}\left(\int_0^t dv + \int_t^x dv\right)\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha - \beta} dx    (23)

= t\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha - \beta} dx + \int_{v=t}^{\infty}\left(\int_{x=v}^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha - \beta} dx\right) dv    (24)

From (19),

\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha - \beta} dx = e^{(\beta - \alpha) D_{\alpha,\beta}(X\|Y; t)}    (25)

and, applying (19) at the point v,

\int_v^{\infty} (f(x))^{\beta} (g(x))^{\alpha - \beta}\, dx = (\bar F(v))^{\beta} (\bar G(v))^{\alpha - \beta}\, e^{(\beta - \alpha) D_{\alpha,\beta}(X\|Y; v)}    (26)

Using the proportional hazard rate model in (26), we have

\int_v^{\infty} (f(x))^{\beta} (g(x))^{\alpha - \beta}\, dx = (\bar F(v))^{\theta(\alpha - \beta) + \beta}\, e^{(\beta - \alpha) D_{\alpha,\beta}(X\|Y; v)}    (27)

Dividing (27) by (\bar F(t))^{\beta}(\bar G(t))^{\alpha - \beta} = (\bar F(t))^{\theta(\alpha - \beta) + \beta} and substituting, together with (25), into (24) and (20), we obtain the desired result. ■
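The identity (21) can also be checked numerically. The sketch below (an illustration with an assumed exponential baseline and arbitrarily chosen parameters, not part of the paper) compares the two sides for a few values of t.

# A minimal numerical check of Theorem 1 (sketch): exponential baseline F,
# PHRM constant theta, and quadrature truncated at x = 100.
import numpy as np
from scipy.integrate import quad

lam, theta = 1.0, 2.0          # baseline hazard rate and PHRM constant
alpha, beta = 1.2, 2.0
UP = 100.0                     # truncation point; the tails are negligible here

f    = lambda x: lam * np.exp(-lam * x)
Fbar = lambda x: np.exp(-lam * x)
Gbar = lambda x: Fbar(x)**theta                  # PHRM: Gbar = Fbar^theta
g    = lambda x: theta * lam * Fbar(x)**theta    # g = theta * lambda_F * Gbar

def D_res(t):    # unweighted residual divergence, eq. (19)
    I, _ = quad(lambda x: (f(x)/Fbar(t))**beta * (g(x)/Gbar(t))**(alpha - beta), t, UP)
    return np.log(I) / (beta - alpha)

def D_w_res(t):  # weighted residual divergence, eq. (20)
    I, _ = quad(lambda x: x * (f(x)/Fbar(t))**beta * (g(x)/Gbar(t))**(alpha - beta), t, UP)
    return np.log(I) / (beta - alpha)

def rhs(t):      # right-hand side of eq. (21)
    first = t * np.exp((beta - alpha) * D_res(t))
    tail, _ = quad(lambda v: (Fbar(v)/Fbar(t))**(theta*(alpha - beta) + beta)
                             * np.exp((beta - alpha) * D_res(v)), t, UP)
    return np.log(first + tail) / (beta - alpha)

for t in (0.5, 1.0, 2.0):
    print(t, D_w_res(t), rhs(t))   # the two columns should agree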
Theorem 2. If the proportional hazard rate model is satisfied by the two random variables X and Y with proportionality constant θ > 0, and D_{\alpha,\beta}^{w}(X\|Y; t) is increasing for all t > 0, then D_{\alpha,\beta}^{w}(X\|Y; t) uniquely determines the survival function \bar F(t).
Proof. Rewriting (20) as

e^{(\beta - \alpha) D_{\alpha,\beta}^{w}(X\|Y; t)} = \int_t^{\infty} x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha - \beta} dx    (28)

and differentiating (28) with respect to t, we have

\frac{d}{dt}\, e^{(\beta - \alpha) D_{\alpha,\beta}^{w}(X\|Y; t)} = -t\,(\lambda_F(t))^{\beta}(\lambda_G(t))^{\alpha - \beta} + \int_t^{\infty} x (f(x))^{\beta} (g(x))^{\alpha - \beta} dx\; \frac{d}{dt}\left[(\bar F(t))^{-\beta}(\bar G(t))^{-(\alpha - \beta)}\right]    (29)

which gives

\frac{d}{dt}\, e^{(\beta - \alpha) D_{\alpha,\beta}^{w}(X\|Y; t)} = -t\,(\lambda_F(t))^{\beta}(\lambda_G(t))^{\alpha - \beta} + \left[\beta\lambda_F(t) + (\alpha - \beta)\lambda_G(t)\right] e^{(\beta - \alpha) D_{\alpha,\beta}^{w}(X\|Y; t)}    (30)

By using the PHRM, \lambda_G(t) = \theta\lambda_F(t), this becomes

\frac{d}{dt}\, e^{(\beta - \alpha) D_{\alpha,\beta}^{w}(X\|Y; t)} = -t\,\theta^{\alpha - \beta}(\lambda_F(t))^{\alpha} + \left[\theta(\alpha - \beta) + \beta\right]\lambda_F(t)\, e^{(\beta - \alpha) D_{\alpha,\beta}^{w}(X\|Y; t)}    (31)

If we fix t > 0, then \lambda_F(t) is a solution x_t of the equation z(x_t) = 0, where

z(x_t) = t\,\theta^{\alpha - \beta} x_t^{\alpha} - \left[\theta(\alpha - \beta) + \beta\right] x_t\, e^{(\beta - \alpha) D_{\alpha,\beta}^{w}(X\|Y; t)} + \frac{d}{dt}\, e^{(\beta - \alpha) D_{\alpha,\beta}^{w}(X\|Y; t)}

Since D_{\alpha,\beta}^{w}(X\|Y; t) is assumed to be increasing in t (with 0 < β < α), we have z(0) = \frac{d}{dt}\, e^{(\beta - \alpha) D_{\alpha,\beta}^{w}(X\|Y; t)} \leq 0, while z(x_t) \to \infty as x_t \to \infty. Now, differentiating with respect to x_t, we have

\frac{\partial z(x_t)}{\partial x_t} = \alpha t\,\theta^{\alpha - \beta} x_t^{\alpha - 1} - \left[\theta(\alpha - \beta) + \beta\right] e^{(\beta - \alpha) D_{\alpha,\beta}^{w}(X\|Y; t)}    (32)

and setting \frac{\partial z(x_t)}{\partial x_t} = 0 gives (for α > 1) the unique stationary point

x_t^{*} = \left(\frac{\left[\theta(\alpha - \beta) + \beta\right] e^{(\beta - \alpha) D_{\alpha,\beta}^{w}(X\|Y; t)}}{\alpha t\,\theta^{\alpha - \beta}}\right)^{\frac{1}{\alpha - 1}}    (33)

Hence z decreases on (0, x_t^{*}) and increases on (x_t^{*}, \infty), so z(x_t) = 0 has a unique positive solution x_t = \lambda_F(t). Consequently D_{\alpha,\beta}^{w}(X\|Y; t) determines the hazard rate \lambda_F(t), and therefore characterizes the survival function \bar F uniquely. ■
5. Properties and Bounds
Log-sum inequality
Definition 5.1 (Log-sum inequality). Let f and g be non-negative integrable functions on a common domain. Then

\int f(u)\log\frac{f(u)}{g(u)}\, du \;\geq\; \left(\int f(u)\, du\right)\log\frac{\int f(u)\, du}{\int g(u)\, du}
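A quick numerical illustration of the inequality (a sketch with arbitrarily chosen positive functions, not from the paper):

# A minimal sketch illustrating the log-sum inequality on (0, 1) with two
# arbitrarily chosen positive integrable functions.
import numpy as np
from scipy.integrate import quad

f = lambda u: np.exp(-u)
g = lambda u: 2.0 * u + 0.1

lhs, _ = quad(lambda u: f(u) * np.log(f(u) / g(u)), 0.0, 1.0)
F, _ = quad(f, 0.0, 1.0)
G, _ = quad(g, 0.0, 1.0)
rhs = F * np.log(F / G)

print(lhs, rhs, lhs >= rhs)   # the left-hand side dominates the right-hand side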
Definition 5.2. If D_{\alpha,\beta}^{w}(X\|Y; t) is increasing (decreasing) in t, then the survival function \bar F is said to have increasing (decreasing) weighted generalized residual divergence of order β and type α, written IWGRDM (DWGRDM). That is, \bar F is IWGRDM (DWGRDM) according as D_{\alpha,\beta}^{w}(X\|Y; t) is increasing (decreasing) in t.
Theorem 3. Let X and Y be two non-negative random variables representing the life spans of two system components, with density functions f(x) and g(x) and survival functions \bar F(x) and \bar G(x), respectively. Then, for t > 0 and 0 < β < α with α ≠ 1, the following lower bound holds:

(\beta - \alpha)\, D_{\alpha,\beta}^{w}(X\|Y; t) \;\geq\; \log\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta} dx \;+\; \frac{\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\log x\, dx + (\alpha - \beta)\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\log\frac{g(x)}{\bar G(t)}\, dx}{\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta} dx}    (34)

Proof. With the aid of the log-sum inequality, we have
\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\log\frac{\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}}{x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta}}\, dx \;\geq\; \left(\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta} dx\right)\log\frac{\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta} dx}{\int_t^{\infty} x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta} dx}    (35)

For the right-hand side, by (20),

\log\int_t^{\infty} x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta} dx = (\beta - \alpha)\, D_{\alpha,\beta}^{w}(X\|Y; t)    (36)

so the right-hand side of (35) equals

\left(\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta} dx\right)\left[\log\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta} dx - (\beta - \alpha)\, D_{\alpha,\beta}^{w}(X\|Y; t)\right]    (37)

For the left-hand side, expanding the logarithm,

\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\log\frac{\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}}{x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta}}\, dx = -\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\log x\, dx - (\alpha - \beta)\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\log\frac{g(x)}{\bar G(t)}\, dx    (38)

since the terms \beta\int_t^{\infty}\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\log\frac{f(x)}{\bar F(t)}\, dx cancel. Using (36)-(38) in (35) and rearranging, we derive the result. ■
Theorem 4. Let X and Y be random variables with support (0, k], probability density functions f(x) and g(x), and survival functions \bar F(x) and \bar G(x), respectively. Then, for 0 < β < α and t > 0, the following upper bound is valid:

(\beta - \alpha)\, D_{\alpha,\beta}^{w}(X\|Y; t) \;\leq\; \frac{\int_t^{k} x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta}\log\left[x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta}\right] dx}{\int_t^{k} x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta} dx} \;+\; \log(k - t)    (39)
Proof. Applying the log-sum inequality to the functions x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta} and 1 on (t, k], we obtain

\int_t^{k} x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta}\log\left[x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta}\right] dx \;\geq\; \left(\int_t^{k} x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta} dx\right)\log\frac{\int_t^{k} x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta} dx}{k - t}    (40)

By (20), the logarithm on the right-hand side equals

(\beta - \alpha)\, D_{\alpha,\beta}^{w}(X\|Y; t) - \log(k - t)    (41)

After simplifying, we obtain the desired result. ■
Theorem 5. Suppose X and Y are two random variables with weighted generalized residual divergence measure (WGRDM) D_{\alpha,\beta}^{w}(X\|Y; t). Then, for 0 < β < α, we get

(\beta - \alpha)\, D_{\alpha,\beta}^{w}(X\|Y; t) \;\leq\; \int_t^{\infty} x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta} dx \;-\; 1    (42)
Proof. With the support of the inequality \log y \leq y - 1, we have from (20)

(\beta - \alpha)\, D_{\alpha,\beta}^{w}(X\|Y; t) = \log\int_t^{\infty} x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta} dx    (43)

\leq \int_t^{\infty} x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta} dx - 1    (44)

After simplification, we obtain the result. ■
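The bound of Theorem 5 can be checked numerically; the sketch below (an illustration with assumed exponential lifetimes and arbitrary parameters, not part of the paper) compares the two sides of (42).

# A minimal numerical check of Theorem 5 (sketch): X ~ Exp(1), Y ~ Exp(2),
# quadrature truncated at x = 100 where the integrand is negligible.
import numpy as np
from scipy.integrate import quad

lam, mu = 1.0, 2.0
alpha, beta, t = 1.2, 2.0, 0.5

f    = lambda x: lam * np.exp(-lam * x)
Fbar = lambda x: np.exp(-lam * x)
g    = lambda x: mu * np.exp(-mu * x)
Gbar = lambda x: np.exp(-mu * x)

integrand = lambda x: x * (f(x)/Fbar(t))**beta * (g(x)/Gbar(t))**(alpha - beta)
I, _ = quad(integrand, t, 100.0)

lhs = np.log(I)              # equals (beta - alpha) * D^w_{alpha,beta}(X||Y; t)
rhs = I - 1.0                # right-hand side of (42)
print(lhs, rhs, lhs <= rhs)  # log y <= y - 1 guarantees the inequality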
Theorem 6. If the hazard rate \lambda_F(t) is decreasing in t, then

(\beta - \alpha)\, D_{\alpha,\beta}^{w}(X\|Y; t) \;\leq\; \log\int_t^{\infty} x\left(\frac{\bar F(x)}{\bar F(t)}\right)^{\beta}(\lambda_F(t))^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta} dx    (45)

= \beta\log\lambda_F(t) + \log\int_t^{\infty} x\left(\frac{\bar F(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta} dx    (46)
Proof. Equation (20) can be rewritten, using f(x) = \lambda_F(x)\bar F(x), as

D_{\alpha,\beta}^{w}(X\|Y; t) = \frac{1}{\beta - \alpha}\log\int_t^{\infty} x\left(\frac{f(x)}{\bar F(t)}\right)^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta} dx    (47)

= \frac{1}{\beta - \alpha}\log\int_t^{\infty} x\left(\frac{\bar F(x)}{\bar F(t)}\right)^{\beta}(\lambda_F(x))^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta} dx    (48)

Since \lambda_F is decreasing, \lambda_F(x) \leq \lambda_F(t) for x \geq t, and therefore

(\beta - \alpha)\, D_{\alpha,\beta}^{w}(X\|Y; t) \;\leq\; \log\int_t^{\infty} x\left(\frac{\bar F(x)}{\bar F(t)}\right)^{\beta}(\lambda_F(t))^{\beta}\left(\frac{g(x)}{\bar G(t)}\right)^{\alpha-\beta} dx    (49)

After simplifying, we obtain the desired result. ■
6. Conclusion
In this communication, we proposed a new two-parametric weighted generalized divergence measure of order β and type α. The characterization result is justified by a numerical example showing that the weighted measure uniquely determines the distribution, and we also studied its residual form. Finally, we obtained bounds for the new proposed divergence measure and studied its properties.
References
[1] Belis, M. and Guiasu, S. (1968). A quantitative-qualitative measure of information in cybernetic systems (Corresp.). IEEE Transactions on Information Theory, 14(4):593-594.
[2] Di Crescenzo, A. and Longobardi, M. (2007). On weighted residual and past entropies. arXiv preprint math/0703489.
[3] Ebrahimi, N. and Kirmani, S. N. A. (1996). A characterisation of the proportional hazards model through a measure of discrimination between two residual life distributions. Biometrika, 83(1):233-235.
[4] Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22(1):79-86.
[5] Kerridge, D. F. (1961). Inaccuracy and inference. Journal of the Royal Statistical Society, Series B (Methodological), 184-194.
[6] Mirali, M. and Fakoor, V. (2017). On weighted cumulative residual entropy. Communications in Statistics - Theory and Methods, 46(6):2857-2869.
[7] Moharana, R. and Kayal, S. (2019). On Weighted Kullback-Leibler Divergence for Doubly Truncated Random Variables. REVSTAT-Statistical Journal, 17(3):297-320.
[8] Gupta, R. D. and Nanda, A. (2002). α- and β-entropies and relative entropies of distributions. Journal of Statistical Theory and Applications, 1(3):177-190.
[9] Renyi, A. (1961). On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1:547-561.
[10] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27:379-423.
[11] Suhov, Y. and Salimeh, S. Y. (2015). Entropy-power inequality for weighted entropy. arXiv preprint arXiv:1502.02188.
[12] Yasaei Sekeh, S. and Mohtashami Borzadaran, G. R. (2013). On Kullback-Leibler Dynamic Information. Available at SSRN 2344078.