THE NEGATIVE BINOMIAL-AKASH DISTRIBUTION AND ITS APPLICATIONS
Rajitha.C.S and Ashly Regi
Department of Mathematics, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India. rajitha.sugun@gmail.com,cs_rajitha@cb.amrita.edu
Abstract
A new two-parameter negative binomial mixture distribution, named the negative binomial-Akash distribution, is introduced in this paper. The proposed distribution is obtained by compounding the negative binomial distribution with the Akash distribution. Some of its characteristics are derived, including the factorial moments, mean, variance and index of dispersion, and the behaviour of the mean, variance and index of dispersion is discussed. The parameters of the proposed distribution are estimated by the method of maximum likelihood. The distribution can be used to model overdispersed count data, and its usefulness is illustrated with two real count data sets.
Keywords: Akash distribution, AIC, BIC, count data, method of maximum likelihood, mixture distribution.
1. Introduction
Count data modeling plays a vital role in the statistical literature. The Poisson distribution (PD) is usually used to analyze count data; it is a discrete probability distribution for the number of occurrences of an event in a given period of time. A main feature of the PD is equidispersion, which means that the variance equals the mean. In practice, however, observed count data often show overdispersion, with a variance larger than the mean (or underdispersion, with a variance smaller than the mean), and overdispersion is common in many fields. In such cases the PD is inadequate and an extension of it is needed. The negative binomial distribution (NBD) has therefore been used to model overdispersed count data. The NBD is in fact a Poisson mixture distribution in which the Poisson parameter is itself a random variable following a gamma distribution. It can also be described as the distribution of the number of failures in a sequence of Bernoulli trials before a predetermined number of successes occurs. Applications of the NBD can be found in bio-statistics, accident statistics, actuarial science and economics. Although the NBD allows for overdispersion, it does not work well when the count data show an excessive number of zeros. As a result, many studies have sought new distributions that fit overdispersed count data better. Empirical studies show that mixed distributions, such as Poisson mixtures and NB mixtures, can fit count data better than the traditional count distributions, and several studies show that mixing the PD and the NBD with lifetime distributions, such as the exponential distribution, gives a better fit to overdispersed count data with an excessive number of zeros. Studies also show that the Lindley distribution (LD) can provide a better model than the exponential distribution.
The LD was introduced by Lindley [5]. Ghitany et al. [3] conducted a detailed study of its mathematical properties, parameter estimation and applications, and showed that the LD can fit better than the exponential distribution. However, there are many cases in which the LD is not adequate. Shanker [14] therefore introduced a new one-parameter lifetime distribution, called the Akash distribution, which is more flexible than the LD for modeling lifetime data. He derived some important statistical properties of the distribution and illustrated its usefulness and applicability with two real-life data sets. In addition to these models, many other mixture distributions have been proposed and studied in the literature.
Sankaran [9] proposed a Poisson mixture of the PD with the LD, named the Poisson-Lindley distribution (PLD). Later, Mishra [13] obtained a two-parameter PLD by mixing the PD with the two-parameter Lindley distribution (TP-LD) proposed by Shanker and Mishra [10]. Shanker and Tekie [12] introduced a new quasi PLD by mixing the PD with a new quasi Lindley distribution (QLD) [11], and Zamani et al. [17] introduced a new mixed PD called the Poisson-weighted exponential distribution. The literature also indicates that NB mixtures can fit count data better than Poisson mixtures. Zamani and Ismail [16] proposed an NB mixture obtained by combining the NBD with the LD, which performed better than the PD and the NBD on count data. Lord and Geedipally [6] then analyzed crash data containing an excess number of zeros with the NB-Lindley distribution (NB-LD) and found that it performs better than the PD and the NBD. A mixture named the NB-TP-LD was proposed by Denthet et al. [2]. A three-parameter mixed NBD called the NB-Erlang distribution was later introduced and applied to two real count data sets [4]. Saengthong and Bodhisuwan [8] introduced the four-parameter NB-Crack distribution, and in [7] they estimated its parameters by the MLE method and the method of moments, illustrating both methods with an application to accident data. The NB-generalized exponential distribution was introduced by Aryuyuen and Bodhisuwan [1]. More recently, Yamrubboon et al. [15] proposed the NB-Sushila distribution, which contains the NB-LD as a special case.
In this article we present a new mixed NBD, obtained by compounding the NBD with parameters $s$ and $\theta = e^{-\lambda}$ with the Akash distribution with parameter $a$ for $\lambda$. We derive various properties of the negative binomial-Akash distribution, including the factorial moments (FM), the mean and the second-order moment. The parameters of the NB-Akash distribution are estimated by the method of moments and the maximum likelihood (ML) method. We compare the performance of the PD, the NBD, the NB-LD and the NB-Akash distribution on two real data sets in terms of the chi-square goodness-of-fit test, log-likelihood, p-value, AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion). Section 2.1 reviews the NBD, Section 2.2 reviews the Akash distribution, and Section 2.3 introduces the proposed distribution, called the negative binomial-Akash (NB-Akash) distribution, and derives its probability mass function. Section 3 discusses distributional characteristics such as the FM, the mean and the second-order moment, together with the behaviour of the mean, variance and index of dispersion (ID). Parameter estimation is reported in Section 4, the application of the NB-Akash distribution is discussed in Section 5, and Section 6 concludes.
2. Proposed Model
2.1. Negative Binomial Distribution
The NBD is used to model count data that are overdispersed, that is, whose variance exceeds the mean. A discrete random variable $Z$ is said to follow the NBD with parameters $s$ and $\theta$ if its probability mass function (pmf) is

$$P(Z=z)=\binom{z+s-1}{z}\theta^{s}(1-\theta)^{z},\qquad z=0,1,2,\ldots,\; s>0,\; 0<\theta<1. \tag{1}$$

Here $\theta$ is the probability of success in each Bernoulli trial and, for integer $s$, $Z$ is the number of failures observed before the $s$-th success. The FM of order $m$, the mean and the second-order moment of the NBD are

$$\mu_{[m]}(Z)=\frac{\Gamma(s+m)}{\Gamma(s)}\left(\frac{1-\theta}{\theta}\right)^{m},\qquad m=0,1,2,\ldots$$

$$E(Z)=\frac{s(1-\theta)}{\theta},\qquad E(Z^{2})=\frac{s(1-\theta)\left[1+s(1-\theta)\right]}{\theta^{2}}.$$

For a random sample $z_1,\ldots,z_n$, the likelihood function (LF) of the NBD is

$$L(s,\theta)=\prod_{j=1}^{n}\binom{s+z_j-1}{z_j}\theta^{s}(1-\theta)^{z_j},$$

and the log-likelihood (LL) function is

$$l(s,\theta)=\sum_{j=1}^{n}\log\binom{s+z_j-1}{z_j}+ns\log\theta+\sum_{j=1}^{n}z_j\log(1-\theta). \tag{2}$$
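As a minimal sketch (Python standard library only; the function names are illustrative and not from the paper), the NBD pmf (1) and the log-likelihood (2) can be evaluated on the log scale, writing the binomial coefficient as $\Gamma(z+s)/(\Gamma(s)\,z!)$ so that any real $s>0$ is allowed:

```python
import math


def nb_logpmf(z: int, s: float, theta: float) -> float:
    """log P(Z = z) for Z ~ NB(s, theta), Eq. (1); s > 0 need not be an integer."""
    log_binom = math.lgamma(z + s) - math.lgamma(s) - math.lgamma(z + 1)
    return log_binom + s * math.log(theta) + z * math.log1p(-theta)


def nb_loglik(sample: list[int], s: float, theta: float) -> float:
    """Log-likelihood l(s, theta) of Eq. (2) for an i.i.d. sample z_1, ..., z_n."""
    n = len(sample)
    log_binoms = sum(math.lgamma(z + s) - math.lgamma(s) - math.lgamma(z + 1)
                     for z in sample)
    return log_binoms + n * s * math.log(theta) + sum(sample) * math.log1p(-theta)


if __name__ == "__main__":
    s, theta = 2.5, 0.4
    probs = [math.exp(nb_logpmf(z, s, theta)) for z in range(400)]
    print(sum(probs))                               # close to 1
    print(sum(z * p for z, p in enumerate(probs)))  # close to s(1 - theta)/theta = 3.75
```

Working on the log scale keeps the generalized binomial coefficient from overflowing when $z$ is large.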
2.2. Akash Distribution
The Akash distribution was proposed by Shanker [14] as a modification of the Lindley distribution. The probability density function (pdf) of the one-parameter Akash distribution with parameter $a$ is

$$f(z;a)=\frac{a^{3}}{a^{2}+2}\left(1+z^{2}\right)e^{-az},\qquad z>0,\; a>0. \tag{3}$$

The mean, variance and moment generating function (MGF) of the Akash distribution are

$$E(Z)=\frac{a^{2}+6}{a(a^{2}+2)},\qquad V(Z)=\frac{a^{4}+16a^{2}+12}{a^{2}(a^{2}+2)^{2}},$$

$$M_{Z}(t)=\frac{a^{3}}{a^{2}+2}\left[\frac{1}{a-t}+\frac{2}{(a-t)^{3}}\right],\qquad t<a. \tag{4}$$
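The closed forms above can be checked numerically. The following Python sketch (standard library only; the helper names and the plain composite Simpson rule are illustrative choices, not from the paper) integrates the density (3) and compares the results with the stated mean, variance and MGF (4):

```python
import math


def akash_pdf(x: float, a: float) -> float:
    """Akash density, Eq. (3)."""
    return a ** 3 / (a ** 2 + 2) * (1 + x * x) * math.exp(-a * x)


def akash_mean(a: float) -> float:
    return (a ** 2 + 6) / (a * (a ** 2 + 2))


def akash_var(a: float) -> float:
    return (a ** 4 + 16 * a ** 2 + 12) / (a ** 2 * (a ** 2 + 2) ** 2)


def akash_mgf(t: float, a: float) -> float:
    """MGF, Eq. (4); it exists only for t < a."""
    if t >= a:
        raise ValueError("the Akash MGF requires t < a")
    return a ** 3 / (a ** 2 + 2) * (1 / (a - t) + 2 / (a - t) ** 3)


def simpson(f, lo: float, hi: float, n: int = 20000) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


if __name__ == "__main__":
    a, upper = 1.5, 60.0  # the tail beyond x = 60 is negligible for a = 1.5
    print(simpson(lambda x: akash_pdf(x, a), 0, upper))  # close to 1
    print(simpson(lambda x: x * akash_pdf(x, a), 0, upper), akash_mean(a))
```

The truncation point and the number of subintervals are assumptions tuned for moderate $a$; for $a$ close to 0 the upper limit would have to grow accordingly.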
2.3. Construction of NB-Akash Distribution
Definition 2.1. Let $Z\mid\lambda\sim NB(s,\theta=e^{-\lambda})$ with $s>0$, where $\lambda$ follows the Akash distribution with parameter $a$, $\lambda\sim Akash(a)$. Then the random variable $Z$ is said to follow the NB-Akash distribution with parameters $s$ and $a$, written $Z\sim\text{NB-Akash}(s,a)$.
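Definition 2.1 also gives a direct way to simulate the NB-Akash distribution. The sketch below (Python standard library; all function names are hypothetical) draws $\lambda$ from the Akash distribution through its representation as a mixture of an exponential distribution with rate $a$ (weight $a^{2}/(a^{2}+2)$) and a gamma distribution with shape 3 and rate $a$ (weight $2/(a^{2}+2)$), which follows directly from (3). It then draws $Z\mid\lambda$ from the NBD through the standard gamma-Poisson representation, $Z\mid\lambda\sim\text{Poisson}(G)$ with $G\sim\text{Gamma}(s,\ \text{scale}\ (1-\theta)/\theta)$ and $\theta=e^{-\lambda}$:

```python
import math
import random


def akash_sample(a: float, rng: random.Random) -> float:
    """Draw lambda ~ Akash(a) from its exponential/gamma mixture form."""
    if rng.random() < a * a / (a * a + 2):
        return rng.expovariate(a)            # Exp(rate a)
    return rng.gammavariate(3.0, 1.0 / a)    # Gamma(shape 3, scale 1/a)


def poisson_sample(mu: float, rng: random.Random) -> int:
    """Number of arrivals of a rate-1 Poisson process in [0, mu]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mu:
        count += 1
        t += rng.expovariate(1.0)
    return count


def nb_akash_sample(s: float, a: float, rng: random.Random) -> int:
    """One draw of Z following Definition 2.1."""
    lam = akash_sample(a, rng)
    scale = math.expm1(lam)                  # (1 - theta)/theta with theta = exp(-lam)
    if scale <= 0.0:                         # lam == 0 means theta = 1, so Z = 0
        return 0
    return poisson_sample(rng.gammavariate(s, scale), rng)


if __name__ == "__main__":
    rng = random.Random(2022)
    draws = [nb_akash_sample(3.0, 6.0, rng) for _ in range(40000)]
    print(sum(draws) / len(draws))  # compare with E(Z) = 0.6834 for s = 3, a = 6 (Table 1)
```

The arrival-counting Poisson sampler is simple but its cost grows with the mean, so it is only suitable when the draws of $G$ are moderate, as they are for the parameter values used in this paper.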
Theorem 1. Let $Z\sim\text{NB-Akash}(s,a)$ be as in Definition 2.1. Then the pmf of $Z$ is

$$P(z)=\binom{z+s-1}{z}\frac{a^{3}}{a^{2}+2}\sum_{k=0}^{z}\binom{z}{k}(-1)^{k}\left[\frac{1}{a+s+k}+\frac{2}{(a+s+k)^{3}}\right],\qquad z=0,1,2,\ldots,\; s,a>0. \tag{5}$$
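Proof. Since $Z\mid\lambda\sim NB(s,\theta=e^{-\lambda})$ and $\lambda\sim Akash(a)$, the pmf of $Z$ is obtained as

$$p(z)=\int_{0}^{\infty}p(z\mid\lambda)\,f(\lambda;a)\,d\lambda, \tag{6}$$

where, expanding $(1-e^{-\lambda})^{z}$ binomially,

$$p(z\mid\lambda)=\binom{z+s-1}{z}e^{-\lambda s}\left(1-e^{-\lambda}\right)^{z}=\binom{z+s-1}{z}\sum_{k=0}^{z}\binom{z}{k}(-1)^{k}e^{-\lambda(s+k)}, \tag{7}$$

and $f(\lambda;a)$ is the pdf (3) of the Akash distribution. Substituting (7) in (6) gives

$$p(z)=\binom{z+s-1}{z}\sum_{k=0}^{z}\binom{z}{k}(-1)^{k}\int_{0}^{\infty}e^{-\lambda(s+k)}f(\lambda;a)\,d\lambda=\binom{z+s-1}{z}\sum_{k=0}^{z}\binom{z}{k}(-1)^{k}M_{\lambda}\bigl(-(s+k)\bigr)$$

$$=\binom{z+s-1}{z}\frac{a^{3}}{a^{2}+2}\sum_{k=0}^{z}\binom{z}{k}(-1)^{k}\left[\frac{1}{a+s+k}+\frac{2}{(a+s+k)^{3}}\right],$$

using the MGF (4) at $t=-(s+k)<a$. ■

Figure 1: The pmf of the NB-Akash distribution for various values of the parameters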
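Evaluating (5) in floating point is delicate because the alternating sum cancels heavily as $z$ grows. A minimal Python sketch (the function name is illustrative) avoids this by summing the alternating series exactly with `fractions.Fraction`, whose constructor converts a float to its exact rational value, and converting to `float` only at the end:

```python
import math
from fractions import Fraction


def nb_akash_pmf(z: int, s: float, a: float) -> float:
    """P(Z = z) for Z ~ NB-Akash(s, a), Eq. (5)."""
    if z < 0 or s <= 0 or a <= 0:
        raise ValueError("need z >= 0, s > 0 and a > 0")
    x0 = Fraction(a) + Fraction(s)     # exact rational copies of the float inputs
    alt_sum = Fraction(0)
    for k in range(z + 1):
        x = x0 + k
        alt_sum += math.comb(z, k) * (-1) ** k * (1 / x + 2 / x ** 3)
    # generalized binomial coefficient C(z+s-1, z) = Gamma(z+s) / (Gamma(s) z!)
    coef = math.exp(math.lgamma(z + s) - math.lgamma(s) - math.lgamma(z + 1))
    return coef * a ** 3 / (a ** 2 + 2) * float(alt_sum)


if __name__ == "__main__":
    s, a = 3.0, 8.0
    probs = [nb_akash_pmf(z, s, a) for z in range(61)]
    print(sum(probs))                               # close to 1
    print(sum(z * p for z, p in enumerate(probs)))  # close to E(Z) = 0.4604 (Table 1)
```

Exact arithmetic costs more as $z$ grows, but for the counts met in Section 5 (at most about 10) it is negligible.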
3. Some Distributional Characteristics
This section discusses the FM, mean, variance and index of dispersion of the NB-Akash distribution.
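Theorem 2. If $Z\sim\text{NB-Akash}(s,a)$, then the FM of order $m$ of $Z$, which exists for $a>m$, is

$$\mu_{[m]}(Z)=\frac{\Gamma(s+m)}{\Gamma(s)}\frac{a^{3}}{a^{2}+2}\sum_{k=0}^{m}\binom{m}{k}(-1)^{k}\left[\frac{1}{a-m+k}+\frac{2}{(a-m+k)^{3}}\right]. \tag{8}$$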
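Proof. If $Z\mid\lambda\sim NB(s,\theta=e^{-\lambda})$ and $\lambda\sim Akash(a)$, the FM of order $m$ of $Z$ can be obtained as

$$\mu_{[m]}(Z)=E_{\lambda}\left[\mu_{[m]}(Z\mid\lambda)\right].$$

The FM of order $m$ of the NB mixture with $\theta=e^{-\lambda}$ is

$$\mu_{[m]}(Z)=E_{\lambda}\left[\frac{\Gamma(s+m)}{\Gamma(s)}\frac{(1-e^{-\lambda})^{m}}{e^{-\lambda m}}\right]=\frac{\Gamma(s+m)}{\Gamma(s)}E_{\lambda}\left[(e^{\lambda}-1)^{m}\right].$$

Using the binomial expansion of $(e^{\lambda}-1)^{m}$ we can write

$$\mu_{[m]}(Z)=\frac{\Gamma(s+m)}{\Gamma(s)}\sum_{k=0}^{m}\binom{m}{k}(-1)^{k}E\left(e^{\lambda(m-k)}\right)=\frac{\Gamma(s+m)}{\Gamma(s)}\sum_{k=0}^{m}\binom{m}{k}(-1)^{k}M_{\lambda}(m-k)$$

$$=\frac{\Gamma(s+m)}{\Gamma(s)}\frac{a^{3}}{a^{2}+2}\sum_{k=0}^{m}\binom{m}{k}(-1)^{k}\left[\frac{1}{a-(m-k)}+\frac{2}{(a-(m-k))^{3}}\right],$$

where each $M_{\lambda}(m-k)$ is finite because $m-k\le m<a$. ■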
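The key step of the proof, $\mu_{[m]}(Z)=\frac{\Gamma(s+m)}{\Gamma(s)}E_{\lambda}[(e^{\lambda}-1)^{m}]$, can be checked against the closed form (8) by numerical integration over the Akash density. A Python sketch (standard library; the helper names, the truncation point and the Simpson rule are illustrative assumptions):

```python
import math


def fm_closed_form(m: int, s: float, a: float) -> float:
    """m-th factorial moment of NB-Akash(s, a), Eq. (8); needs a > m."""
    if a <= m:
        raise ValueError("the m-th factorial moment exists only for a > m")
    total = 0.0
    for k in range(m + 1):
        d = a - (m - k)
        total += math.comb(m, k) * (-1) ** k * (1 / d + 2 / d ** 3)
    return math.exp(math.lgamma(s + m) - math.lgamma(s)) * a ** 3 / (a ** 2 + 2) * total


def fm_by_quadrature(m: int, s: float, a: float,
                     upper: float = 60.0, n: int = 60000) -> float:
    """Gamma(s+m)/Gamma(s) * E[(e^lambda - 1)^m] with lambda ~ Akash(a), by Simpson's rule."""
    def integrand(x: float) -> float:
        akash_pdf = a ** 3 / (a ** 2 + 2) * (1 + x * x) * math.exp(-a * x)
        return math.expm1(x) ** m * akash_pdf

    h = upper / n
    acc = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        acc += (4 if i % 2 else 2) * integrand(i * h)
    return math.exp(math.lgamma(s + m) - math.lgamma(s)) * acc * h / 3


if __name__ == "__main__":
    for m in (1, 2, 3):
        print(m, fm_closed_form(m, 3.0, 5.0), fm_by_quadrature(m, 3.0, 5.0))
```

For $m=1$, $s=3$, $a=5$ both routes give the mean 0.90625, which rounds to the entry 0.9063 in Table 1.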
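Setting $m=1$ and $m=2$ in (8), and using $E(Z^{2})=\mu_{[2]}(Z)+\mu_{[1]}(Z)$, the mean, second-order moment and variance are

$$E(Z)=s\left[\frac{a^{3}(a^{2}-2a+3)}{(a-1)^{3}(a^{2}+2)}-1\right],\qquad a>1, \tag{9}$$

$$E(Z^{2})=s(s+1)\frac{a^{3}(a^{2}-4a+6)}{(a-2)^{3}(a^{2}+2)}-(2s^{2}+s)\frac{a^{3}(a^{2}-2a+3)}{(a-1)^{3}(a^{2}+2)}+s^{2},\qquad a>2, \tag{10}$$

$$V(Z)=s(s+1)\frac{a^{3}(a^{2}-4a+6)}{(a-2)^{3}(a^{2}+2)}-s\,\frac{a^{3}(a^{2}-2a+3)}{(a-1)^{3}(a^{2}+2)}-s^{2}\left[\frac{a^{3}(a^{2}-2a+3)}{(a-1)^{3}(a^{2}+2)}\right]^{2},\qquad a>2. \tag{11}$$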
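The index of dispersion (ID) is defined as the ratio of the variance to the mean, $D=V(Z)/E(Z)$. From (9) and (11),

$$D=\frac{(s+1)\dfrac{a^{3}(a^{2}-4a+6)}{(a-2)^{3}(a^{2}+2)}-\dfrac{a^{3}(a^{2}-2a+3)}{(a-1)^{3}(a^{2}+2)}-s\left[\dfrac{a^{3}(a^{2}-2a+3)}{(a-1)^{3}(a^{2}+2)}\right]^{2}}{\dfrac{a^{3}(a^{2}-2a+3)}{(a-1)^{3}(a^{2}+2)}-1}. \tag{12}$$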
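Equations (9), (11) and (12) are straightforward to code. In the Python sketch below (function names are illustrative), the two ratios are $M_{\lambda}(1)$ and $M_{\lambda}(2)$ from (4), so that $E(Z)=s[M_{\lambda}(1)-1]$; it reproduces entries of Table 1, for example the mean 2.5227, variance 52.340 and ID 20.748 at $s=a=3$:

```python
def _ratio1(a: float) -> float:
    """a^3 (a^2 - 2a + 3) / ((a - 1)^3 (a^2 + 2)), i.e. the Akash MGF at t = 1."""
    return a ** 3 * (a ** 2 - 2 * a + 3) / ((a - 1) ** 3 * (a ** 2 + 2))


def _ratio2(a: float) -> float:
    """a^3 (a^2 - 4a + 6) / ((a - 2)^3 (a^2 + 2)), i.e. the Akash MGF at t = 2."""
    return a ** 3 * (a ** 2 - 4 * a + 6) / ((a - 2) ** 3 * (a ** 2 + 2))


def nb_akash_mean(s: float, a: float) -> float:
    """Eq. (9); finite only for a > 1."""
    if a <= 1:
        raise ValueError("E(Z) exists only for a > 1")
    return s * (_ratio1(a) - 1)


def nb_akash_var(s: float, a: float) -> float:
    """Eq. (11); finite only for a > 2."""
    if a <= 2:
        raise ValueError("V(Z) exists only for a > 2")
    r1 = _ratio1(a)
    return s * (s + 1) * _ratio2(a) - s * r1 - (s * r1) ** 2


def nb_akash_id(s: float, a: float) -> float:
    """Index of dispersion D = V(Z)/E(Z), Eq. (12)."""
    return nb_akash_var(s, a) / nb_akash_mean(s, a)


if __name__ == "__main__":
    # rows s = 3..8 and columns a = 3..8, as in Table 1
    for s in range(3, 9):
        print(s, [round(nb_akash_id(s, a), 3) for a in range(3, 9)])
```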
Table 1: Mean, variance and ID of the NB-Akash distribution for various parameter values (rows: s; columns: a)
Mean
s \ a   3   4   5   6   7   8
3   2.5227   1.3457   0.9063   0.6834   0.5496   0.4604
4   3.3636   1.7942   1.2083   0.9112   0.7328   0.6138
5   4.2045   2.2428   1.5104   1.1389   0.9159   0.7673
6   5.0455   2.6914   1.8125   1.3667   1.0991   0.9208
7   5.8864   3.14   2.1146   1.5945   1.2823   1.0742
8   6.7273   3.5885   2.4167   1.8223   1.4655   1.2277
Variance
s \ a   3   4   5   6   7   8
3   52.340   8.7693   3.4687   1.9336   1.2835   0.9425
4   85.685   13.965   5.3878   2.943   1.9224   1.3938
5   126.98   20.299   7.688   4.1349   2.6668   1.9137
6   176.22   27.7689   10.370   5.5091   3.5168   2.5022
7   233.41   36.3754   13.433   7.0659   4.4724   3.1593
8   298.56   46.1187   16.878   8.805   5.5335   3.8850
Index of Dispersion
s \ a   3   4   5   6   7   8
3   20.748   6.517   3.828   2.83   2.335   2.047
4   25.474   7.734   4.459   3.23   2.624   2.27
5   30.2   9.051   5.09   3.63   2.912   2.494
6   34.928   10.318   5.722   4.031   3.2   2.718
7   39.654   11.585   6.353   4.431   3.488   2.941
8   44.381   12.852   6.984   4.832   3.776   3.165
Table 1 summarizes the behaviour of the mean, variance and ID of the NB-Akash distribution for selected parameter values.
Figure 2 shows the behaviour of the mean, variance and ID for different values of the parameters. Since the ID exceeds 1 in every case shown, the distribution is suitable for modeling overdispersed count data.
Figure 2: Behaviour of the mean, variance and ID for various values of the parameters
4. Estimation of Parameters
The parameters of the NB-Akash distribution are estimated by the method of moments and by the method of maximum likelihood.
4.1. Method of Moments
The NB-Akash distribution has two parameters, which can be estimated from the first two moments about zero. In the method of moments, $s$ and $a$ are obtained by equating the sample and population moments:

$$m_{1}=s\left[\frac{a^{3}(a^{2}-2a+3)}{(a-1)^{3}(a^{2}+2)}-1\right], \tag{13}$$

$$m_{2}=s(s+1)\frac{a^{3}(a^{2}-4a+6)}{(a-2)^{3}(a^{2}+2)}-(2s^{2}+s)\frac{a^{3}(a^{2}-2a+3)}{(a-1)^{3}(a^{2}+2)}+s^{2}, \tag{14}$$

where $m_{1}=\frac{1}{n}\sum_{j=1}^{n}z_j$ and $m_{2}=\frac{1}{n}\sum_{j=1}^{n}z_j^{2}$ are the first two sample moments about zero. Solving (13) and (14) simultaneously gives the moment estimates of $s$ and $a$; since (14) involves the second moment, this requires $a>2$.
Rajitha.C.S and Ashly Regi RT&A No 2 (68) THE NEGATIVE BINOMIAL-AKASH DISTRIBUTION_Volume 17, June 2022
4.2. Method of Maximum Likelihood
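The LF of $\text{NB-Akash}(s,a)$ for a random sample $z_1,\ldots,z_n$ is

$$L(s,a)=\prod_{j=1}^{n}\binom{s+z_j-1}{z_j}\frac{a^{3}}{a^{2}+2}\sum_{k=0}^{z_j}\binom{z_j}{k}(-1)^{k}\left[\frac{1}{a+s+k}+\frac{2}{(a+s+k)^{3}}\right].$$

Hence the LL function is

$$l(s,a)=3n\log a-n\log(a^{2}+2)+\sum_{j=1}^{n}\left[\log\Gamma(s+z_j)-\log z_j!-\log\Gamma(s)\right]+\sum_{j=1}^{n}\log\sum_{k=0}^{z_j}\binom{z_j}{k}(-1)^{k}\left[\frac{1}{a+s+k}+\frac{2}{(a+s+k)^{3}}\right].$$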
The ML estimates are obtained by differentiating the LL partially with respect to $s$ and $a$:

$$\frac{\partial l}{\partial s}=\sum_{j=1}^{n}\left[\psi(s+z_j)-\psi(s)\right]-\sum_{j=1}^{n}\frac{\sum_{k=0}^{z_j}\binom{z_j}{k}(-1)^{k}\left[\frac{1}{(a+s+k)^{2}}+\frac{6}{(a+s+k)^{4}}\right]}{\sum_{k=0}^{z_j}\binom{z_j}{k}(-1)^{k}\left[\frac{1}{a+s+k}+\frac{2}{(a+s+k)^{3}}\right]}, \tag{15}$$

where $\psi(x)=\Gamma'(x)/\Gamma(x)$ is the digamma function, and

$$\frac{\partial l}{\partial a}=\frac{3n}{a}-\frac{2na}{a^{2}+2}-\sum_{j=1}^{n}\frac{\sum_{k=0}^{z_j}\binom{z_j}{k}(-1)^{k}\left[\frac{1}{(a+s+k)^{2}}+\frac{6}{(a+s+k)^{4}}\right]}{\sum_{k=0}^{z_j}\binom{z_j}{k}(-1)^{k}\left[\frac{1}{a+s+k}+\frac{2}{(a+s+k)^{3}}\right]}. \tag{16}$$

The ML estimates are obtained by setting (15) and (16) equal to zero. Since these equations have no closed-form solution, they are solved numerically using the Newton-Raphson method.
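The sketch below (Python standard library; it is not the authors' R code, and all names are illustrative) applies Newton-Raphson to (15) and (16) for grouped data, using the observed frequencies of Example 5.1 from Table 2. Two choices are assumptions of this sketch: the Hessian is approximated by central differences of the analytic score, and each Newton step is halved until the log-likelihood does not decrease. It also uses the identity $\psi(s+z)-\psi(s)=\sum_{i=0}^{z-1}1/(s+i)$, valid for integer $z\ge 0$, so no digamma routine is needed, and it evaluates the alternating sums exactly with `Fraction`:

```python
import math
from fractions import Fraction

# Observed frequencies of Example 5.1 (Table 2): {number of claims: number of drivers}
FREQ = {0: 7840, 1: 1317, 2: 239, 3: 42, 4: 14, 5: 4, 6: 4, 7: 1}


def _sums(z: int, s: float, a: float) -> tuple[float, float]:
    """Return S and T/S, where x = a + s + k,
    S = sum_k C(z,k)(-1)^k [1/x + 2/x^3] and T = sum_k C(z,k)(-1)^k [1/x^2 + 6/x^4]."""
    x0 = Fraction(a) + Fraction(s)
    S = T = Fraction(0)
    for k in range(z + 1):
        c = math.comb(z, k) * (-1) ** k
        x = x0 + k
        S += c * (1 / x + 2 / x ** 3)
        T += c * (1 / x ** 2 + 6 / x ** 4)
    return float(S), float(T / S)


def loglik(s: float, a: float, freq: dict = FREQ) -> float:
    """LL of Section 4.2 for grouped data {z: frequency}."""
    n = sum(freq.values())
    ll = n * (3 * math.log(a) - math.log(a * a + 2))
    for z, f in freq.items():
        S, _ = _sums(z, s, a)
        ll += f * (math.lgamma(s + z) - math.lgamma(s) - math.lgamma(z + 1) + math.log(S))
    return ll


def score(s: float, a: float, freq: dict = FREQ) -> tuple[float, float]:
    """Eqs. (15)-(16) for grouped data."""
    n = sum(freq.values())
    d_s, d_a = 0.0, n * (3 / a - 2 * a / (a * a + 2))
    for z, f in freq.items():
        _, ratio = _sums(z, s, a)
        d_s += f * (sum(1 / (s + i) for i in range(z)) - ratio)
        d_a -= f * ratio
    return d_s, d_a


def newton_raphson(s: float, a: float, iters: int = 50,
                   tol: float = 1e-8) -> tuple[float, float, float]:
    """Newton-Raphson with a central-difference Hessian and step halving."""
    ll = loglik(s, a)
    for _ in range(iters):
        g_s, g_a = score(s, a)
        if max(abs(g_s), abs(g_a)) < tol:
            break
        hs, ha = 1e-5 * s, 1e-5 * a
        sp, sm = score(s + hs, a), score(s - hs, a)
        ap, am = score(s, a + ha), score(s, a - ha)
        h11 = (sp[0] - sm[0]) / (2 * hs)
        h22 = (ap[1] - am[1]) / (2 * ha)
        h12 = 0.5 * ((sp[1] - sm[1]) / (2 * hs) + (ap[0] - am[0]) / (2 * ha))
        det = h11 * h22 - h12 * h12
        step_s = -(h22 * g_s - h12 * g_a) / det    # Newton step -H^{-1} g
        step_a = -(h11 * g_a - h12 * g_s) / det
        t = 1.0
        while t > 1e-10:                             # halve until LL does not drop
            s_new, a_new = s + t * step_s, a + t * step_a
            if s_new > 0 and a_new > 0:
                ll_new = loglik(s_new, a_new)
                if ll_new >= ll - 1e-9:
                    break
            t /= 2
        else:
            break                                    # no acceptable step: stop
        s, a, ll = s_new, a_new, ll_new
    return s, a, ll


if __name__ == "__main__":
    # start from the NB-Akash estimates reported in Table 2
    print(newton_raphson(4.7477, 23.2))
```

Newton-Raphson is sensitive to starting values; starting from the moment estimates of Section 4.1 would be a natural alternative when no published estimates are available.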
5. Results and Discussion
This section illustrates the application and usefulness of the NB-Akash distribution. The proposed distribution is compared with the PD, the NBD and the NB-LD on two real data sets. The distributions used for comparison are:
(a) Poisson distribution (PD): The pmf of the Poisson distribution for the random variable $Z$ is

$$P(Z=z)=\frac{e^{-\lambda}\lambda^{z}}{z!},\qquad z=0,1,2,\ldots,\; \lambda>0. \tag{17}$$
(b) Negative binomial distribution (NBD): If $Z$ follows the negative binomial distribution with parameters $s$ and $\theta$, then

$$P(Z=z)=\left[\text{probability of } s-1 \text{ successes in } z+s-1 \text{ trials}\right]\times\left[\text{probability of a success on trial } z+s\right]$$

$$=\binom{z+s-1}{s-1}\theta^{s-1}(1-\theta)^{(z+s-1)-(s-1)}\times\theta=\binom{z+s-1}{z}\theta^{s}(1-\theta)^{z},\qquad z=0,1,2,\ldots,\; s>0,\; 0<\theta<1. \tag{18}$$

(c) Negative binomial-Lindley distribution (NB-LD): The pmf of the NB-LD can be written as
$$P(Z=z)=\binom{s+z-1}{z}\frac{\theta^{2}}{\theta+1}\sum_{j=0}^{z}\binom{z}{j}(-1)^{j}\frac{\theta+s+j+1}{(\theta+s+j)^{2}},\qquad z=0,1,2,\ldots,\; s,\theta>0. \tag{19}$$
(d) Negative binomial-Akash distribution (NB-Akash): The pmf of the NB-Akash distribution is

$$P(Z=z)=\binom{z+s-1}{z}\frac{a^{3}}{a^{2}+2}\sum_{k=0}^{z}\binom{z}{k}(-1)^{k}\left[\frac{1}{a+s+k}+\frac{2}{(a+s+k)^{3}}\right],\qquad z=0,1,2,\ldots,\; s,a>0. \tag{20}$$
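Expected frequencies in Tables 2 and 3 are $n\,P(Z=z)$ under each fitted model. As a sketch (Python; the function names are illustrative), evaluating the NB-LD pmf (19) at the NB-LD estimates reported in Table 2 ($s=4.63$, $\theta=23.55$, $n=9461$) gives about 7853.6 and 1287.4 for $z=0$ and $z=1$, in line with that table. The alternating sum is again summed exactly with `Fraction`:

```python
import math
from fractions import Fraction


def nb_lindley_pmf(z: int, s: float, theta: float) -> float:
    """P(Z = z) for the NB-LD, Eq. (19)."""
    x0 = Fraction(theta) + Fraction(s)
    alt_sum = Fraction(0)
    for j in range(z + 1):
        x = x0 + j                                  # x = theta + s + j
        alt_sum += math.comb(z, j) * (-1) ** j * (x + 1) / x ** 2
    coef = math.exp(math.lgamma(s + z) - math.lgamma(s) - math.lgamma(z + 1))
    return coef * theta ** 2 / (theta + 1) * float(alt_sum)


def expected_frequencies(pmf, n: int, zmax: int, *params) -> list[float]:
    """n * P(Z = z) for z = 0, ..., zmax under a fitted pmf."""
    return [n * pmf(z, *params) for z in range(zmax + 1)]


if __name__ == "__main__":
    fitted = expected_frequencies(nb_lindley_pmf, 9461, 3, 4.63, 23.55)
    print([round(e, 1) for e in fitted])
```

The same helper works with any pmf that takes `(z, *params)`, for example an implementation of (20) for the NB-Akash column.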
Example 5.1. The data for this example are taken from [16] and record the number of accidents for each of 9,461 motor insurance policies. The data set is overdispersed, since its variance is greater than its mean (sample mean ≈ 0.214, sample variance ≈ 0.289). The proposed distribution is compared with the Poisson, negative binomial and negative binomial-Lindley distributions. Parameter estimation and the goodness-of-fit analysis are carried out in R. For model comparison, the chi-square statistic, p-value, LL, AIC and BIC are used. By these measures, Table 2 shows that the NB-Akash distribution performs better than the PD, NBD and NB-LD, although its advantage over the NB-LD is small.
Table 2: Observed and expected frequencies of Example 5.1 (the last four columns give the expected frequencies under each fitted distribution)
No. of claims   No. of drivers   Poisson   NB   NB-LD   NB-Akash
0   7840   7638.3   7843.3   7853.6   7852.1
1   1317   1634.6   1290.2   1287.4   1288.4
2   239   174.9   257.7   247.6   247.9
3   42   12.5   54.5   54.2   54.3
4   14   0.7   11.8   13.2   13.2
5   4   0   2.6   3.5   3.5
6   4   0   0.6   1   1
7   1   0   0.2   0.3   0.3
8   0   0   0.1   0.2   0.3
Estimated parameters   λ = 0.214   s = 0.7, θ = 0.765   s = 4.63, θ = 23.55   s = 4.7477, a = 23.2
Degrees of freedom   2   3   4   4
Chi-square   293.8   8.66   6.997   6.79
p-value   < 0.01   0.01   0.072   0.079
LL   -5490.78   -5348.00   -5344.7   -5344.678
AIC   10983.56   10700   10693.4   10693.36
BIC   10982.95   10699.22   10692.98   10692.94
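The comparison criteria can be computed as follows. This Python sketch (function names are illustrative) uses the standard definitions $\mathrm{AIC}=-2\,\mathrm{LL}+2k$ and $\mathrm{BIC}=-2\,\mathrm{LL}+k\log n$, with $k$ the number of estimated parameters and $n$ the sample size, together with a Pearson chi-square statistic. The rule for pooling sparse cells (merging tail cells until the last expected count is at least 5) is an assumption, since the text does not state which cells were merged. Applied to the NB-Akash column of Table 2 it pools the counts 5 and above into one cell and gives $X^{2}\approx 6.79$ over six cells, consistent with the table; on the NB-Akash column of Table 3 it gives about 8.07 over seven cells.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion for a model with k estimated parameters."""
    return -2.0 * loglik + 2 * k


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion for sample size n."""
    return -2.0 * loglik + k * math.log(n)


def pearson_chi2(observed, expected, min_expected: float = 5.0) -> tuple[float, int]:
    """Pearson X^2 after merging tail cells until the last expected count is
    at least min_expected. Returns (statistic, number of cells used)."""
    obs, ex = list(observed), list(expected)
    while len(ex) > 1 and ex[-1] < min_expected:
        last_obs, last_ex = obs.pop(), ex.pop()
        obs[-1] += last_obs
        ex[-1] += last_ex
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, ex))
    return stat, len(ex)


if __name__ == "__main__":
    observed = [7840, 1317, 239, 42, 14, 4, 4, 1, 0]                  # Table 2
    nb_akash = [7852.1, 1288.4, 247.9, 54.3, 13.2, 3.5, 1, 0.3, 0.3]  # NB-Akash column
    print(pearson_chi2(observed, nb_akash))
    print(aic(-5344.678, 2), bic(-5344.678, 2, 9461))
```

Because the BIC penalty $k\log n$ exceeds the AIC penalty $2k$ whenever $n>e^{2}\approx 7.4$, the BIC computed this way is larger than the AIC for both data sets.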
Example 5.2. The data are taken from [6] and consist of single-vehicle fatal roadway-departure crashes on horizontal curves of rural two-lane highways between 2003 and 2008. The parameters are estimated by the ML method, and the Poisson, negative binomial, negative binomial-Lindley and NB-Akash distributions are fitted to the data. Their performance is compared in terms of the chi-square goodness-of-fit test, p-value, LL, AIC and BIC. Table 3 shows that the NB-Akash distribution performs better than the PD, NBD and NB-LD.
Table 3: Observed and expected frequencies of Example 5.2 (the last four columns give the expected frequencies under each fitted distribution)
No. of crashes   Observed frequency   Poisson   NB   NB-LD   NB-Akash
0   29087   28471.6   29204.8   29133.6   29099.6
1   2952   3918   2706   2855.5   2906.1
2   464   269.6   567.4   503.1   498
3   108   12.4   141.1   120.9   116.2
4   40   0.4   37.8   35.9   33.4
5   9   0   10.6   13.1   11.2
6   5   0   3   3.3   4.2
7   2   0   0.9   3.3   1.7
8   3   0   0.3   0   0.8
9   1   0   0.1   0   0.4
10+   1   0   0   3.3   0.4
Estimated parameters   λ = 0.138   s = 0.138, θ = 0.2584   s = 1.018, θ = 9.212   s = 1.1881, a = 10.364
Degrees of freedom   2   3   4   4
Chi-square   2297.31   57.47   11.68   8.0666
p-value   < 0.01   < 0.01   0.02   0.089
LL   -14208.1   -13557.7   -13529.8   -13528.43
AIC   28418.2   27119.4   27063.6   27060.86
BIC   28417.59   27118.98   27063.49   27060.75
6. Conclusion
In this paper a new mixed NB distribution, named the negative binomial-Akash distribution, is proposed by mixing the negative binomial distribution with the Akash distribution. Some of its important characteristics, such as the FM, mean, variance and ID, are studied. The ML method is used to estimate the parameters of the NB-Akash distribution, and its utility is illustrated using two real data sets. The results indicate that the NB-Akash distribution provides a better fit than the PD, NBD and NB-LD for these data sets.
References
[1] Aryuyuen, S. and Bodhisuwan, W. (2013). The negative binomial-generalized exponential (NB-GE) distribution. Applied Mathematical Sciences, 7:1093-1105.
[2] Denthet, S., Thongteeraparp, A. and Bodhisuwan, W. (2016). Mixed distribution of negative binomial and two-parameter Lindley distributions. 104-107.
[3] Ghitany, M., Atieh, B. and Nadarajah, S. (2008). Lindley distribution and its application. Mathematics and Computers in Simulation, 78:493-506.
[4] Kongrod, S., Bodhisuwan, W. and Payakkapong, P. (2014). The negative binomial-Erlang distribution with applications. International Journal of Pure and Applied Mathematics, 92:389-401.
[5] Lindley, D.V. (1958). Fiducial Distributions and Bayes' Theorem. Journal of the Royal Statistical Society, Series B (Methodological), 20:102-107.
[6] Lord, D. and Geedipally, S. (2011). The negative binomial-Lindley distribution as a tool for analyzing crash data characterized by a large amount of zeros. Accident Analysis and Prevention, 43:1738-1742.
[7] Saengthong, P. and Bodhisuwan, W. (2013). Parameter estimation for negative binomial-Crack distribution and its application. Journal of Science and Technology, 33:125-130.
[8] Saengthong, P. and Bodhisuwan, W. (2013). Negative binomial-crack (NB-CR) distribution. International Journal of Pure and Applied Mathematics, 84:213-230.
[9] Sankaran, M. (1970). The discrete Poisson-Lindley distribution. Biometrics, 26:145-149.
[10] Shanker, R. and Mishra, A. (2013). A two-parameter Lindley distribution. Statistics in Transition new series, 1:45-56.
[11] Shanker, R. and Amanuel, G.H. (2013). A New Quasi Lindley Distribution. International Journal of Statistics and Systems, 8:143-156.
[12] Shanker, R. and Tekie, A.L. (2014). A new quasi Poisson-Lindley distribution. International Journal of Statistics and Systems, 9:87-94.
[13] Mishra, A. (2014). A two-parameter Poisson-Lindley distribution. International Journal of Statistics and Systems, 9:79-85.
[14] Shanker, R. (2015). Akash Distribution and Its Applications. International Journal of Probability and Statistics, 4:65-75.
[15] Yamrubboon, D., Bodhisuwan, W., Saothayanun, L. and Sharma, S. (2017). The Negative Binomial-Sushila Distribution with Application in Count Data Analysis. Thailand Statistician, 15:69-77.
[16] Zamani, H. and Ismail, N. (2010). Negative Binomial-Lindley Distribution and Its Application. Journal of Mathematics and Statistics, 6:4-9.
[17] Zamani, H., Ismail, N. and Faroughi, P. (2014). Poisson-weighted exponential: Univariate version and regression model with applications. Journal of Mathematics and Statistics, 10:148-154.