A NEW EXTENSION OF KUMARASWAMY DISTRIBUTION FOR IMPROVED DATA MODELING: PROPERTIES AND APPLICATIONS
Mahvish Jan1, S.P.Ahmad2
1,2 Department of Statistics, University of Kashmir, Srinagar, India [email protected], [email protected]
Abstract
In this manuscript, we introduce a new extension of the Kumaraswamy distribution, the SMP Kumaraswamy (SMPK) distribution, constructed with the SMP technique. The SMPK distribution offers greater flexibility than some well-known extensions of the Kumaraswamy distribution. We give a comprehensive account of its statistical properties and estimate its parameters with a classical estimation method. A simulation study is then carried out to assess the behavior of the estimators through their biases and mean squared errors. Finally, we fit the model to two real-life data sets and find, using goodness-of-fit measures, that it outperforms the competing models.
Keywords: Entropy, SMP transformation, Kumaraswamy distribution, Order statistics, Maximum likelihood estimation.
1. Introduction
Probability models play a decisive role in data analysis, so researchers continue to develop new models for handling data sets in many different domains. New applications and phenomena arise steadily in real-data studies, which makes the continued construction of probability distributions necessary. Although many traditional distributions are available, new distributions are needed to overcome their inadequacies and to solve problems more effectively and precisely. Several techniques for generalizing distributions have been introduced that increase the adaptability of traditional distributions by adding extra parameters.
Suppose the random variable X has the Kumaraswamy distribution with parameters β and λ. Then its probability density function (PDF) and cumulative distribution function (CDF) are respectively given by

$$g(x; \beta, \lambda) = \beta \lambda x^{\lambda-1} (1 - x^{\lambda})^{\beta-1}; \quad 0 < x < 1,\ \beta > 0,\ \lambda > 0 \tag{1}$$

$$G(x; \beta, \lambda) = 1 - (1 - x^{\lambda})^{\beta}; \quad 0 < x < 1,\ \beta > 0,\ \lambda > 0 \tag{2}$$

A two-parameter Kumaraswamy distribution for modeling hydrological data was introduced by [9]. Several new families of probability distributions based on the Kumaraswamy distribution have since been introduced for modeling such data. For example, [3] introduced the Kumaraswamy Weibull distribution with application to failure data. A generalization of the Kumaraswamy distribution, referred to as the Exponentiated Kumaraswamy distribution, was proposed by [10], who derived some of its statistical properties together with its log-transform. A new distribution based on the quadratic rank transmutation map was developed by [7] and named the Transmuted Kumaraswamy distribution. The DUS-Kumaraswamy distribution, which has the same domain as the Kumaraswamy distribution, was introduced by [6]. The Log-Kumaraswamy distribution, a new continuous probability density function for a non-negative random variable that serves as an alternative to some bounded-domain distributions, was introduced by [4]. The Kumaraswamy-Gull Alpha Power Rayleigh distribution was proposed by [8]. A generalization of the Exponentiated Kumaraswamy distribution, referred to as the Transmuted Exponentiated Kumaraswamy distribution, was put forward by [5]. The Generalized Inverted Kumaraswamy-Rayleigh distribution was proposed by [11], and the cubic transmuted Log-Logistic distribution by [15]. An innovative technique for generating probability distributions, named the SMP technique, was proposed by [16]. The CDF and PDF of the SMP distribution are respectively given as
$$G_{SMP}(x) = \begin{cases} \dfrac{e^{\log\alpha\,\bar{F}(x)} - \alpha}{1 - \alpha}; & \alpha \neq 1,\ \alpha > 0 \\ F(x); & \alpha = 1 \end{cases} \tag{3}$$

$$g_{SMP}(x) = \begin{cases} \dfrac{\log\alpha\, f(x)\, e^{\log\alpha\,\bar{F}(x)}}{\alpha - 1}; & \alpha \neq 1,\ \alpha > 0 \\ f(x); & \alpha = 1 \end{cases} \tag{4}$$

where $\bar{F}(x) = 1 - F(x)$ and, for $x \in \mathbb{R}$, F(x) is the CDF and f(x) is the PDF of the distribution to be extended.
$G_{SMP}(x)$ is a valid CDF. It satisfies the following properties:
1. $G_{SMP}(-\infty) = 0$; $G_{SMP}(\infty) = 1$.
2. $G_{SMP}(x)$ is a monotonically increasing function of x.
3. $G_{SMP}(x)$ is right continuous.
4. $0 \le G_{SMP}(x) \le 1$.
In the present manuscript, we investigate a novel extension of the Kumaraswamy distribution obtained with the SMP method. The proposed distribution is named the SMP Kumaraswamy (SMPK) distribution. The primary rationale for considering the SMPK distribution may be summarized as follows:
• The extension incorporates an additional parameter to capture more complex data patterns.
• This work includes theoretical derivations, properties of the new distribution, and comparisons with existing models.
• The proposed model offers greater flexibility and provides a better fit than the competing models considered.
• The proposed model offers more flexible shapes of hazard and density plots.
• The proposed model can be used to model various datasets.
• The model parameters are estimated with classical estimation methods.
• Applications of the extended distribution to real-world data are demonstrated to highlight its practical utility.
The rest of the paper is organized as follows. Section 2 introduces the SMP Kumaraswamy distribution. Sections 3 and 4 present the reliability analysis and statistical properties, while Section 5 focuses on estimating the unknown parameters of the proposed model by maximum likelihood. Sections 6 and 7 present the simulation study and the real-life applications. Section 8 concludes the study.
Mahvish Jan, S. P. Ahmad
A NEW EXTENSION OF KUMARASWAMY DISTRIBUTION FOR IMPROVED DATA MODELING: PROPERTIES AND APPLICATIONS. RT&A, No 4(80), Volume 19, December 2024
2. SMPK Distribution
The CDF and PDF of the proposed SMPK distribution are respectively given by
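$$G_{SMPK}(x; \alpha, \beta, \lambda) = \begin{cases} \dfrac{e^{\log\alpha\,(1 - x^{\lambda})^{\beta}} - \alpha}{1 - \alpha}; & \alpha \neq 1,\ \alpha > 0 \\ 1 - (1 - x^{\lambda})^{\beta}; & \alpha = 1 \end{cases} \tag{5}$$

$$g_{SMPK}(x; \alpha, \beta, \lambda) = \begin{cases} \dfrac{\lambda\beta \log\alpha\; x^{\lambda-1} (1 - x^{\lambda})^{\beta-1} e^{\log\alpha\,(1 - x^{\lambda})^{\beta}}}{\alpha - 1}; & \alpha \neq 1,\ \alpha > 0 \\ \lambda\beta x^{\lambda-1} (1 - x^{\lambda})^{\beta-1}; & \alpha = 1 \end{cases} \tag{6}$$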
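As a quick numerical illustration of Eqs. (5) and (6), the following minimal Python sketch evaluates the SMPK CDF and PDF. The function names `smpk_cdf` and `smpk_pdf` are hypothetical and not part of the paper, and the parameter values in the usage note are chosen only for illustration.

```python
import math

def smpk_cdf(x, alpha, beta, lam):
    """CDF of Eq. (5) for 0 <= x <= 1, alpha > 0, beta > 0, lam > 0."""
    tail = (1.0 - x**lam) ** beta            # Kumaraswamy survival, 1 - G(x)
    if math.isclose(alpha, 1.0):             # alpha = 1: plain Kumaraswamy CDF
        return 1.0 - tail
    # e^{log(alpha) * tail} is the same number as alpha**tail
    return (alpha**tail - alpha) / (1.0 - alpha)

def smpk_pdf(x, alpha, beta, lam):
    """PDF of Eq. (6) for 0 < x < 1."""
    kum = lam * beta * x**(lam - 1.0) * (1.0 - x**lam) ** (beta - 1.0)
    if math.isclose(alpha, 1.0):             # alpha = 1: plain Kumaraswamy PDF
        return kum
    tail = (1.0 - x**lam) ** beta
    return math.log(alpha) * kum * alpha**tail / (alpha - 1.0)
```

For example, `smpk_cdf(0.0, 1.5, 2.5, 3.0)` and `smpk_cdf(1.0, 1.5, 2.5, 3.0)` return the boundary values 0 and 1, and a central finite difference of `smpk_cdf` agrees with `smpk_pdf` at interior points.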
The density plots of the SMPK distribution for different parameter combinations are presented in Figure 1. The plots show that the density is unimodal and, depending on the parameter values, can be symmetric or negatively skewed.
Figure 1: Plots of the PDF of SMPK distribution.
Figure 2: Plots of the CDF of SMPK distribution (legend: α = 1.5, β = 2.5, λ = 3; α = 1.6, β = 2.6, λ = 3; α = 0.5, β = 2.6, λ = 3; horizontal axis: x).
3. Reliability Analysis of the SMP Kumaraswamy Distribution
This section focuses on the reliability analysis of the SMPK distribution.
3.1. Survival Function
The survival function for the SMPK distribution is given as
$$R(x; \alpha, \beta, \lambda) = 1 - G(x; \alpha, \beta, \lambda) = \frac{1 - e^{\log\alpha\,(1 - x^{\lambda})^{\beta}}}{1 - \alpha}; \quad \alpha \neq 1$$
3.2. Hazard Rate
The expression for the hazard rate of the SMPK distribution is obtained as
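$$h(x; \alpha, \beta, \lambda) = \frac{g(x; \alpha, \beta, \lambda)}{R(x; \alpha, \beta, \lambda)} = \frac{\lambda\beta \log\alpha\; x^{\lambda-1} (1 - x^{\lambda})^{\beta-1} e^{\log\alpha\,(1 - x^{\lambda})^{\beta}}}{e^{\log\alpha\,(1 - x^{\lambda})^{\beta}} - 1}; \quad \alpha \neq 1$$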
Figure 3 shows that the hazard rate takes varying shapes, including constant, decreasing, increasing and J-shaped, for different parameter values. Accordingly, the proposed model can be used to model datasets with such failure rates.
Figure 3: Plots of the hazard rate of SMPK distribution (horizontal axis: x).
3.3. Reverse Hazard Function
The reverse hazard rate is defined as the ratio of the probability density function and the
corresponding distribution function. It is given as
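$$h_r(x; \alpha, \beta, \lambda) = \frac{g(x; \alpha, \beta, \lambda)}{G(x; \alpha, \beta, \lambda)} = \frac{\lambda\beta \log\alpha\; x^{\lambda-1} (1 - x^{\lambda})^{\beta-1} e^{\log\alpha\,(1 - x^{\lambda})^{\beta}}}{\alpha - e^{\log\alpha\,(1 - x^{\lambda})^{\beta}}}; \quad \alpha \neq 1$$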
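The three reliability functions of this section can be sketched in a few lines of Python, assuming α ≠ 1 and 0 < x < 1. The function names are hypothetical; the closed forms are those given for R, h and h_r above.

```python
import math

def _parts(x, alpha, beta, lam):
    """Return the Kumaraswamy density and the factor e^{log(alpha) (1 - x^lam)^beta}."""
    kum = lam * beta * x**(lam - 1.0) * (1.0 - x**lam) ** (beta - 1.0)
    expo = math.exp(math.log(alpha) * (1.0 - x**lam) ** beta)
    return kum, expo

def survival(x, alpha, beta, lam):
    """Survival function R(x) for alpha != 1."""
    _, expo = _parts(x, alpha, beta, lam)
    return (1.0 - expo) / (1.0 - alpha)

def hazard(x, alpha, beta, lam):
    """Hazard rate h(x) = g(x) / R(x) in closed form."""
    kum, expo = _parts(x, alpha, beta, lam)
    return math.log(alpha) * kum * expo / (expo - 1.0)

def reverse_hazard(x, alpha, beta, lam):
    """Reverse hazard rate h_r(x) = g(x) / G(x) in closed form."""
    kum, expo = _parts(x, alpha, beta, lam)
    return math.log(alpha) * kum * expo / (alpha - expo)
```

A useful consistency check is that `hazard(x, ...) * survival(x, ...)` and `reverse_hazard(x, ...) * (1 - survival(x, ...))` both recover the same density value.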
4. Statistical Properties of the SMPK Distribution
In this section, some important statistical properties of the SMPK distribution are presented.
4.1. Quantile function
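Theorem 1: If X ~ SMPK(α, β, λ), then the quantile function of X is given as

$$x_u = \left\{ 1 - \left( \frac{\log[u(1 - \alpha) + \alpha]}{\log\alpha} \right)^{1/\beta} \right\}^{1/\lambda} \tag{7}$$

where U is a uniform random variable, 0 < u < 1.
Proof: Let G(x; α, β, λ) = u. Then

$$\frac{e^{\log\alpha\,(1 - x^{\lambda})^{\beta}} - \alpha}{1 - \alpha} = u$$

Solving for x gives

$$x = \left\{ 1 - \left( \frac{\log[u(1 - \alpha) + \alpha]}{\log\alpha} \right)^{1/\beta} \right\}^{1/\lambda}$$

Remark:
The pth quantile is given by

$$x_p = \left\{ 1 - \left( \frac{\log[p(1 - \alpha) + \alpha]}{\log\alpha} \right)^{1/\beta} \right\}^{1/\lambda}$$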
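Eq. (7) gives a direct way to draw SMPK variates by inverse-transform sampling. The sketch below is a minimal illustration for α ≠ 1; the names `smpk_quantile`, `smpk_cdf` and `smpk_sample` are hypothetical, and the clamp on `w` is an added numerical safeguard rather than part of the paper's formula.

```python
import math
import random

def smpk_quantile(u, alpha, beta, lam):
    """Quantile function of Eq. (7) for 0 <= u < 1 and alpha != 1."""
    w = math.log(u * (1.0 - alpha) + alpha) / math.log(alpha)
    w = min(1.0, max(0.0, w))                # guard against rounding outside [0, 1]
    return (1.0 - w ** (1.0 / beta)) ** (1.0 / lam)

def smpk_cdf(x, alpha, beta, lam):
    """CDF of Eq. (5) for alpha != 1, used here only to verify the inverse."""
    tail = (1.0 - x**lam) ** beta
    return (alpha**tail - alpha) / (1.0 - alpha)

def smpk_sample(n, alpha, beta, lam, rng):
    """Draw n SMPK variates by feeding uniform draws through the quantile function."""
    return [smpk_quantile(rng.random(), alpha, beta, lam) for _ in range(n)]

# Usage: the median, and a small reproducible sample
median = smpk_quantile(0.5, 1.5, 0.3, 0.4)
draws = smpk_sample(5, 1.5, 0.3, 0.4, random.Random(2024))
```

Applying `smpk_cdf` to `smpk_quantile(u, ...)` should return `u` up to floating-point error, which is a convenient check on the inversion.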
4.2. Moments
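The rth moment about the origin of a random variable X having the SMPK distribution is obtained as

$$\mu_r' = \int_0^1 x^r\, \frac{e^{\log\alpha\,(1 - x^{\lambda})^{\beta}} \log\alpha\; \lambda\beta\, x^{\lambda-1} (1 - x^{\lambda})^{\beta-1}}{\alpha - 1}\, dx$$

Using the expansion

$$e^x = \sum_{j=0}^{\infty} \frac{x^j}{j!}$$

we get

$$\mu_r' = \frac{\lambda\beta}{\alpha - 1} \sum_{j=0}^{\infty} \frac{(\log\alpha)^{j+1}}{j!} \int_0^1 x^r x^{\lambda-1} (1 - x^{\lambda})^{\beta(j+1)-1}\, dx$$

$$\mu_r' = \frac{\beta}{\alpha - 1} \sum_{j=0}^{\infty} \frac{(\log\alpha)^{j+1}}{j!}\, B\!\left( \frac{r}{\lambda} + 1,\ \beta(j+1) \right) \tag{8}$$

where $B\!\left[\left(\frac{r}{\lambda} + 1\right), \beta(j+1)\right]$ represents the beta function. Substituting r = 1, 2, 3, 4 gives the first four moments about the origin of the SMPK distribution.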
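The series in Eq. (8) can be checked against direct numerical integration. The following sketch truncates the series after a fixed number of terms (an assumption; 60 terms is ample for moderate values of log α) and compares it with a midpoint-rule integral of x^r g(x). All function names are hypothetical, and α ≠ 1 is assumed.

```python
import math

def beta_fn(a, b):
    """Complete beta function B(a, b) computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def smpk_moment_series(r, alpha, beta, lam, terms=60):
    """Eq. (8), truncated after `terms` terms; requires alpha != 1."""
    log_a = math.log(alpha)
    total = 0.0
    for j in range(terms):
        total += (log_a ** (j + 1) / math.factorial(j)
                  * beta_fn(r / lam + 1.0, beta * (j + 1)))
    return beta / (alpha - 1.0) * total

def smpk_pdf(x, alpha, beta, lam):
    """PDF of Eq. (6) for alpha != 1."""
    kum = lam * beta * x**(lam - 1.0) * (1.0 - x**lam) ** (beta - 1.0)
    return math.log(alpha) * kum * alpha ** ((1.0 - x**lam) ** beta) / (alpha - 1.0)

def smpk_moment_numeric(r, alpha, beta, lam, n=20000):
    """Midpoint-rule approximation of the integral of x^r g(x) over (0, 1)."""
    h = 1.0 / n
    return h * sum(((i + 0.5) * h) ** r * smpk_pdf((i + 0.5) * h, alpha, beta, lam)
                   for i in range(n))
```

Setting r = 0 in `smpk_moment_series` returns 1 up to rounding, which confirms that the series form of the density integrates to one.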
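Lemma 1. Suppose a random variable X ~ SMPK(α, β, λ) with PDF given in Eq. (6), and let $I_r(t) = \int_0^t x^r g_{SMPK}(x; \alpha, \beta, \lambda)\, dx$ denote the rth incomplete moment. Then we have

$$I_r(t) = \frac{\beta}{\alpha - 1} \sum_{j=0}^{\infty} \frac{(\log\alpha)^{j+1}}{j!}\, B\!\left( t^{\lambda};\ \frac{r}{\lambda} + 1,\ \beta(j+1) \right) \tag{9}$$

Proof:

$$I_r(t) = \int_0^t x^r g_{SMPK}(x; \alpha, \beta, \lambda)\, dx$$

$$I_r(t) = \int_0^t x^r\, \frac{e^{\log\alpha\,(1 - x^{\lambda})^{\beta}} \log\alpha\; \lambda\beta\, x^{\lambda-1} (1 - x^{\lambda})^{\beta-1}}{\alpha - 1}\, dx$$

$$I_r(t) = \frac{\beta}{\alpha - 1} \sum_{j=0}^{\infty} \frac{(\log\alpha)^{j+1}}{j!}\, B\!\left( t^{\lambda};\ \frac{r}{\lambda} + 1,\ \beta(j+1) \right)$$

where $B[z; a, b] = \int_0^z x^{a-1} (1 - x)^{b-1}\, dx$ is the incomplete beta function. Setting r = 1 in Eq. (9) yields the first incomplete moment

$$I_1(t) = \frac{\beta}{\alpha - 1} \sum_{j=0}^{\infty} \frac{(\log\alpha)^{j+1}}{j!}\, B\!\left( t^{\lambda};\ \frac{1}{\lambda} + 1,\ \beta(j+1) \right)$$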
4.3. Mean Residual life
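Mean residual life is the expected remaining life given that a component has survived up to time t and is given by

$$\mu(t) = \frac{1}{R(t)} \left( E(X) - \int_0^t x\, g_{SMPK}(x; \alpha, \beta, \lambda)\, dx \right) - t$$

where

$$E(X) = \frac{\beta}{\alpha - 1} \sum_{j=0}^{\infty} \frac{(\log\alpha)^{j+1}}{j!}\, B\!\left( \frac{1}{\lambda} + 1,\ \beta(j+1) \right)$$

$$\int_0^t x\, g(x; \alpha, \beta, \lambda)\, dx = \frac{\beta}{\alpha - 1} \sum_{j=0}^{\infty} \frac{(\log\alpha)^{j+1}}{j!}\, B\!\left( t^{\lambda};\ \frac{1}{\lambda} + 1,\ \beta(j+1) \right)$$

Hence

$$\mu(t) = \frac{1 - \alpha}{1 - e^{\log\alpha\,(1 - t^{\lambda})^{\beta}}} \cdot \frac{\beta}{\alpha - 1} \sum_{j=0}^{\infty} \frac{(\log\alpha)^{j+1}}{j!} \left[ B\!\left( \frac{1}{\lambda} + 1,\ \beta(j+1) \right) - B\!\left( t^{\lambda};\ \frac{1}{\lambda} + 1,\ \beta(j+1) \right) \right] - t$$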
4.4. Mean Waiting Time
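Mean waiting time is the time elapsed since the failure of an item given that the item has failed in [0, t] and is given by

$$\bar{\mu}(t) = t - \frac{1}{G(t)} \int_0^t x\, g_{SMPK}(x; \alpha, \beta, \lambda)\, dx$$

$$\bar{\mu}(t) = t - \frac{\beta}{\alpha - e^{\log\alpha\,(1 - t^{\lambda})^{\beta}}} \sum_{j=0}^{\infty} \frac{(\log\alpha)^{j+1}}{j!}\, B\!\left( t^{\lambda};\ \frac{1}{\lambda} + 1,\ \beta(j+1) \right)$$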
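The mean residual life and mean waiting time can also be evaluated straight from their defining integrals, which avoids the incomplete beta series. The sketch below does this with a midpoint rule, assuming α ≠ 1 and 0 < t < 1. The function names are hypothetical and numerical integration is swapped in for the closed-form series, so this is an illustration rather than the paper's computation.

```python
import math

def smpk_cdf(x, alpha, beta, lam):
    """CDF of Eq. (5) for alpha != 1."""
    return (alpha ** ((1.0 - x**lam) ** beta) - alpha) / (1.0 - alpha)

def smpk_pdf(x, alpha, beta, lam):
    """PDF of Eq. (6) for alpha != 1."""
    kum = lam * beta * x**(lam - 1.0) * (1.0 - x**lam) ** (beta - 1.0)
    return math.log(alpha) * kum * alpha ** ((1.0 - x**lam) ** beta) / (alpha - 1.0)

def midpoint(f, lo, hi, n=20000):
    """Composite midpoint rule for the integral of f over (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def mean_residual_life(t, alpha, beta, lam):
    """(E(X) - integral_0^t x g(x) dx) / R(t) - t."""
    xg = lambda x: x * smpk_pdf(x, alpha, beta, lam)
    mean = midpoint(xg, 0.0, 1.0)
    partial = midpoint(xg, 0.0, t)
    return (mean - partial) / (1.0 - smpk_cdf(t, alpha, beta, lam)) - t

def mean_waiting_time(t, alpha, beta, lam):
    """t - (integral_0^t x g(x) dx) / G(t)."""
    partial = midpoint(lambda x: x * smpk_pdf(x, alpha, beta, lam), 0.0, t)
    return t - partial / smpk_cdf(t, alpha, beta, lam)
```

Integration by parts gives equivalent forms, the integral of R(x) over (t, 1) divided by R(t) for the mean residual life and the integral of G(x) over (0, t) divided by G(t) for the mean waiting time, which serve as an independent check.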
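4.5. Rényi Entropy
The Rényi entropy was introduced by [17] in 1960 and is expressed as

$$I_s(x) = \frac{1}{1 - s} \log \int_{-\infty}^{\infty} g^s(x)\, dx; \quad s > 0,\ s \neq 1$$

$$I_s(x) = \frac{1}{1 - s} \log \left[ \frac{1}{(\alpha - 1)^s} \sum_{j=0}^{\infty} \frac{s^j (\log\alpha)^{j+s}}{j!}\, \beta^s \lambda^{s-1}\, B\!\left( \left(1 - \frac{1}{\lambda}\right)(s - 1) + 1,\ \beta(j + s) - s + 1 \right) \right]$$

Remark: Shannon entropy is obtained as the limiting case of the Rényi entropy when s → 1.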
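The series form of the Rényi entropy can be verified numerically. The sketch below compares the truncated series with a midpoint-rule evaluation of the integral of g^s. It assumes α > 1 so that (log α)^(j+s) is real for any s, and it assumes β(j + s) − s + 1 > 0 so that every beta function is defined. Function names and the truncation at 80 terms are illustrative choices.

```python
import math

def beta_fn(a, b):
    """Complete beta function B(a, b) computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def renyi_series(s, alpha, beta, lam, terms=80):
    """Renyi entropy from the series expression; assumes alpha > 1, s > 0, s != 1."""
    log_a = math.log(alpha)
    first = (1.0 - 1.0 / lam) * (s - 1.0) + 1.0       # first beta argument
    total = 0.0
    for j in range(terms):
        total += (s**j * log_a ** (j + s) / math.factorial(j)
                  * beta_fn(first, beta * (j + s) - s + 1.0))
    integral = beta**s * lam ** (s - 1.0) / (alpha - 1.0) ** s * total
    return math.log(integral) / (1.0 - s)

def smpk_pdf(x, alpha, beta, lam):
    """PDF of Eq. (6) for alpha != 1."""
    kum = lam * beta * x**(lam - 1.0) * (1.0 - x**lam) ** (beta - 1.0)
    return math.log(alpha) * kum * alpha ** ((1.0 - x**lam) ** beta) / (alpha - 1.0)

def renyi_numeric(s, alpha, beta, lam, n=20000):
    """Renyi entropy with the integral of g^s over (0, 1) done by the midpoint rule."""
    h = 1.0 / n
    integral = h * sum(smpk_pdf((i + 0.5) * h, alpha, beta, lam) ** s for i in range(n))
    return math.log(integral) / (1.0 - s)
```

For instance, with α = 1.5, β = 2.5, λ = 3 the two functions agree closely for s = 2 and s = 3.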
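4.6. Havrda & Charvát Entropy
The Havrda & Charvát entropy of a random variable X is defined by

$$I_s(x) = \frac{1}{1 - s} \left[ \int_0^{\infty} g^s(x)\, dx - 1 \right]$$

where s > 0, s ≠ 1. For the SMPK distribution this gives

$$I_s(x) = \frac{1}{1 - s} \left[ \frac{1}{(\alpha - 1)^s} \sum_{j=0}^{\infty} \frac{s^j (\log\alpha)^{j+s}}{j!}\, \beta^s \lambda^{s-1}\, B\!\left( \left(1 - \frac{1}{\lambda}\right)(s - 1) + 1,\ \beta(j + s) - s + 1 \right) - 1 \right]$$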
4.7. Moment Generating Function
The moment generating function (MGF) encodes the moments of a distribution. The following theorem provides the MGF for the SMPK distribution.
Theorem 2: Let X follow the SMPK distribution. Then the moment generating function $M_X(t)$ is

$$M_X(t) = \frac{\beta}{\alpha - 1} \sum_{r=0}^{\infty} \sum_{j=0}^{\infty} \frac{t^r (\log\alpha)^{j+1}}{j!\, r!}\, B\!\left( \frac{r}{\lambda} + 1,\ \beta(j+1) \right) \tag{10}$$

Proof: The moment generating function of the SMPK distribution can be obtained using the relation

$$M_X(t) = \sum_{r=0}^{\infty} \frac{t^r}{r!}\, \mu_r' \tag{11}$$

Using Eq. (8) in Eq. (11) and after the necessary calculations, we get

$$M_X(t) = \frac{\beta}{\alpha - 1} \sum_{r=0}^{\infty} \sum_{j=0}^{\infty} \frac{t^r (\log\alpha)^{j+1}}{j!\, r!}\, B\!\left( \frac{r}{\lambda} + 1,\ \beta(j+1) \right)$$
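4.8. Order Statistics
Theorem 3: The PDF of the tth order statistic of the SMPK distribution is given by

$$g_{(t:n)}(x) = \frac{n!}{(t-1)!\,(n-t)!} \cdot \frac{e^{\log\alpha\,(1 - x^{\lambda})^{\beta}} \log\alpha\; \lambda\beta\, x^{\lambda-1} (1 - x^{\lambda})^{\beta-1}}{\alpha - 1} \sum_{i=0}^{n-t} (-1)^i \binom{n-t}{i} \left[ \frac{e^{\log\alpha\,(1 - x^{\lambda})^{\beta}} - \alpha}{1 - \alpha} \right]^{i+t-1}$$

Proof: Let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be the order statistics of a random sample drawn from the SMPK distribution. Then the PDF of the tth order statistic is given by

$$g_{(t:n)}(x) = \frac{n!}{(t-1)!\,(n-t)!} \left[ G(x; \alpha, \beta, \lambda) \right]^{t-1} \left[ 1 - G(x; \alpha, \beta, \lambda) \right]^{n-t} g(x; \alpha, \beta, \lambda) \tag{12}$$

Before incorporating Eq. (5) and Eq. (6) in Eq. (12), we use the binomial expansion of $[1 - G(x; \alpha, \beta, \lambda)]^{n-t}$:

$$\left[ 1 - G(x; \alpha, \beta, \lambda) \right]^{n-t} = \sum_{i=0}^{n-t} (-1)^i \binom{n-t}{i} \left[ G(x; \alpha, \beta, \lambda) \right]^{i}$$

Thus, we obtain

$$g_{(t:n)}(x) = \frac{n!}{(t-1)!\,(n-t)!}\, g(x; \alpha, \beta, \lambda) \sum_{i=0}^{n-t} (-1)^i \binom{n-t}{i} \left[ G(x; \alpha, \beta, \lambda) \right]^{i+t-1}$$

$$g_{(t:n)}(x) = \frac{n!}{(t-1)!\,(n-t)!} \cdot \frac{e^{\log\alpha\,(1 - x^{\lambda})^{\beta}} \log\alpha\; \lambda\beta\, x^{\lambda-1} (1 - x^{\lambda})^{\beta-1}}{\alpha - 1} \sum_{i=0}^{n-t} (-1)^i \binom{n-t}{i} \left[ \frac{e^{\log\alpha\,(1 - x^{\lambda})^{\beta}} - \alpha}{1 - \alpha} \right]^{i+t-1} \tag{13}$$

The PDFs of the minimum order statistic $x_{(1)}$ and the maximum order statistic $x_{(n)}$ of the SMPK distribution are obtained by setting t = 1 and t = n, respectively, in Eq. (13).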
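To illustrate that the expanded form in Eq. (13) is just a rewriting of Eq. (12), the following sketch implements both and compares them numerically. It assumes α ≠ 1 and 0 < x < 1; the function names are hypothetical.

```python
import math

def smpk_cdf(x, alpha, beta, lam):
    """CDF of Eq. (5) for alpha != 1."""
    return (alpha ** ((1.0 - x**lam) ** beta) - alpha) / (1.0 - alpha)

def smpk_pdf(x, alpha, beta, lam):
    """PDF of Eq. (6) for alpha != 1."""
    kum = lam * beta * x**(lam - 1.0) * (1.0 - x**lam) ** (beta - 1.0)
    return math.log(alpha) * kum * alpha ** ((1.0 - x**lam) ** beta) / (alpha - 1.0)

def order_stat_pdf_eq12(x, t, n, alpha, beta, lam):
    """Density of the t-th order statistic in the compact form of Eq. (12)."""
    c = math.factorial(n) / (math.factorial(t - 1) * math.factorial(n - t))
    G = smpk_cdf(x, alpha, beta, lam)
    return c * G ** (t - 1) * (1.0 - G) ** (n - t) * smpk_pdf(x, alpha, beta, lam)

def order_stat_pdf_eq13(x, t, n, alpha, beta, lam):
    """Same density with [1 - G]^(n-t) expanded binomially, as in Eq. (13)."""
    c = math.factorial(n) / (math.factorial(t - 1) * math.factorial(n - t))
    G = smpk_cdf(x, alpha, beta, lam)
    series = sum((-1) ** i * math.comb(n - t, i) * G ** (i + t - 1)
                 for i in range(n - t + 1))
    return c * smpk_pdf(x, alpha, beta, lam) * series
```

Setting `t=1` gives the density of the sample minimum and `t=n` that of the sample maximum.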
5. Estimation of Parameters
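Let $x_1, x_2, x_3, \ldots, x_n$ be a random sample of n observations drawn from the SMPK distribution with unknown parameters α, β, λ. The likelihood and log-likelihood functions are respectively given as

$$L(x; \alpha, \beta, \lambda) = \frac{e^{\log\alpha \sum_{i=1}^{n} (1 - x_i^{\lambda})^{\beta}} (\log\alpha)^n \lambda^n \beta^n \prod_{i=1}^{n} x_i^{\lambda-1} (1 - x_i^{\lambda})^{\beta-1}}{(\alpha - 1)^n}$$

$$l = n \log(\log\alpha\; \lambda\beta) + \log\alpha \sum_{i=1}^{n} (1 - x_i^{\lambda})^{\beta} + \sum_{i=1}^{n} \left[ (\lambda - 1) \log x_i + (\beta - 1) \log(1 - x_i^{\lambda}) \right] - n \log(\alpha - 1)$$

Differentiating l with respect to each parameter and equating to zero gives the likelihood equations

$$\frac{\partial l}{\partial \alpha} = \frac{n}{\alpha \log\alpha} - \frac{n}{\alpha - 1} + \frac{1}{\alpha} \sum_{i=1}^{n} (1 - x_i^{\lambda})^{\beta} = 0$$

$$\frac{\partial l}{\partial \beta} = \frac{n}{\beta} + \sum_{i=1}^{n} \log(1 - x_i^{\lambda}) \left[ 1 + \log\alpha\, (1 - x_i^{\lambda})^{\beta} \right] = 0$$

$$\frac{\partial l}{\partial \lambda} = \frac{n}{\lambda} + \sum_{i=1}^{n} \log x_i \left[ 1 - \beta \log\alpha\, (1 - x_i^{\lambda})^{\beta-1} x_i^{\lambda} - \frac{(\beta - 1)\, x_i^{\lambda}}{1 - x_i^{\lambda}} \right] = 0$$

Since the above equations are non-linear and have no closed-form solution, we solve them numerically with the Newton-Raphson method in R software to estimate the parameters.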
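The paper solves the likelihood equations in R; as a language-neutral illustration, the Python sketch below codes the log-likelihood and its three partial derivatives, which are the ingredients any Newton-type optimizer would need. The function names are hypothetical, and the six data values in the usage note are made up purely to exercise the code. The terms n log(log α) − n log(α − 1) are combined into one logarithm of a ratio so that the function is also defined for α < 1.

```python
import math

def smpk_loglik(params, data):
    """Log-likelihood of the SMPK model for alpha != 1 and data in (0, 1)."""
    alpha, beta, lam = params
    n = len(data)
    # n*log(log(alpha)) - n*log(alpha - 1), written as the log of a positive ratio
    ll = n * math.log(math.log(alpha) / (alpha - 1.0)) + n * math.log(lam * beta)
    for x in data:
        u = 1.0 - x**lam
        ll += (math.log(alpha) * u**beta
               + (lam - 1.0) * math.log(x) + (beta - 1.0) * math.log(u))
    return ll

def smpk_score(params, data):
    """Partial derivatives of the log-likelihood with respect to alpha, beta, lam."""
    alpha, beta, lam = params
    n = len(data)
    log_a = math.log(alpha)
    d_alpha = n / (alpha * log_a) - n / (alpha - 1.0)
    d_beta = n / beta
    d_lam = n / lam
    for x in data:
        u = 1.0 - x**lam
        d_alpha += u**beta / alpha
        d_beta += math.log(u) * (1.0 + log_a * u**beta)
        d_lam += math.log(x) * (1.0 - beta * log_a * u ** (beta - 1.0) * x**lam
                                - (beta - 1.0) * x**lam / u)
    return d_alpha, d_beta, d_lam

# Usage with made-up observations (illustrative only, not from the paper)
toy = [0.12, 0.35, 0.48, 0.60, 0.77, 0.90]
gradient = smpk_score((1.5, 2.5, 3.0), toy)
```

The maximum likelihood estimates are the parameter values at which all three components returned by `smpk_score` vanish; comparing the score with finite differences of `smpk_loglik` is a simple way to check the derivatives.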
6. Simulation Study
To assess the performance of the maximum likelihood estimators of the SMPK parameters, we conducted a simulation study. Two sets of true parameter values were considered: (α, β, λ) = (0.5, 0.20, 0.25) and (1.5, 0.3, 0.4). Using R software, samples of sizes n = 20, 50, 125 and 500 were generated from the quantile function in Eq. (7), and each scenario was replicated 1000 times. For each parameter combination, we computed the MLEs along with their bias and mean squared error (MSE). The results are summarized in Table 1.
Table 1: MLE, Bias, and MSE for the parameters
n     True α  True β  True λ   MLE α   MLE β   MLE λ   Bias α  Bias β  Bias λ   MSE α    MSE β   MSE λ
20    0.5     0.20    0.25     1.552   0.217   0.411   1.337   0.058   0.241    28.782   0.006   0.357
50    0.5     0.20    0.25     0.893   0.201   0.295   0.599   0.037   0.106    2.219    0.003   0.025
125   0.5     0.20    0.25     0.585   0.200   0.260   0.234   0.026   0.059    0.161    0.003   0.008
500   0.5     0.20    0.25     0.514   0.202   0.249   0.085   0.016   0.024    0.026    0.004   0.002
20    1.5     0.3     0.4      1.000   0.240   0.292   0.500   0.060   0.108    3.781    0.016   0.042
50    1.5     0.3     0.4      1.000   0.248   0.333   0.500   0.052   0.067    1.459    0.008   0.018
125   1.5     0.3     0.4      1.080   0.259   0.378   0.428   0.041   0.022    0.865    0.004   0.008
500   1.5     0.3     0.4      1.489   0.295   0.378   0.010   0.005   0.004    0.356    0.001   0.002
Table 1 shows that the MLEs move toward the true parameter values as the sample size increases; at n = 500 the estimates are close to the true values in both settings. The bias of every parameter decreases or stays level as n grows, and the MSE decreases in general, indicating improved accuracy and precision of the estimators. The one exception is the MSE of β in the first setting, which rises slightly from 0.003 to 0.004 at n = 500.
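The design of such a Monte Carlo study can be sketched compactly. To stay short and dependency-free, the Python sketch below makes a deliberate simplification that is not the paper's procedure: β and λ are held at their true values and only α is estimated, by bisection on the score in θ = log α (which is monotone in θ). The paper instead estimates all three parameters in R. Function names, the seed and the replication count are hypothetical, so the numbers it produces are not comparable with Table 1.

```python
import math
import random

def smpk_quantile(u, alpha, beta, lam):
    """Quantile function of Eq. (7) for alpha != 1."""
    w = math.log(u * (1.0 - alpha) + alpha) / math.log(alpha)
    w = min(1.0, max(0.0, w))                 # guard against rounding outside [0, 1]
    return (1.0 - w ** (1.0 / beta)) ** (1.0 / lam)

def alpha_mle(sample, beta, lam):
    """MLE of alpha with beta and lam known, by bisection on theta = log(alpha)."""
    n = len(sample)
    s = sum(max(0.0, 1.0 - x**lam) ** beta for x in sample)
    def score(theta):                         # derivative of the log-likelihood in theta
        if abs(theta) < 1e-8:
            return s - n / 2.0                # limiting value as theta -> 0
        return n / theta - n / (1.0 - math.exp(-theta)) + s
    lo, hi = -50.0, 50.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def simulate(n, reps, alpha, beta, lam, seed=0):
    """Return (bias, MSE) of the alpha estimate over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [smpk_quantile(rng.random(), alpha, beta, lam) for _ in range(n)]
        estimates.append(alpha_mle(sample, beta, lam))
    bias = sum(estimates) / reps - alpha
    mse = sum((e - alpha) ** 2 for e in estimates) / reps
    return bias, mse
```

Calling `simulate(20, 200, 1.5, 0.3, 0.4)` and `simulate(500, 200, 1.5, 0.3, 0.4)` illustrates the same qualitative pattern discussed above, with the MSE shrinking as the sample size grows.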
7. Real Life Applications
In this section, we demonstrate the practical applicability of the SMPK distribution with two real-life data sets. The potential of the proposed model is assessed by comparing its performance with that of several other models, namely the Transmuted Kumaraswamy distribution (TKD) [7], the Kumaraswamy Inverse Exponential distribution (KIED) [14], the Weighted Kumaraswamy distribution (WKD) [1] and the Kumaraswamy distribution (KUMD) [9]. The comparison uses the goodness-of-fit criteria -2ll, Akaike Information Criterion (AIC), Akaike Information Criterion Corrected (AICC), Hannan-Quinn Information Criterion (HQIC), the Kolmogorov-Smirnov (K-S) statistic and its P-value. The distribution with the lowest values of -2ll, AIC, AICC, HQIC and K-S and the highest P-value is considered the best fit.
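For reference, the information criteria can be computed from -2ll, the number of parameters k and the sample size n. The sketch below uses the standard textbook formulas, which is an assumption because the paper does not state the formulas it used; the function name is hypothetical.

```python
import math

def information_criteria(neg2ll, k, n):
    """AIC, AICC and HQIC from -2 log-likelihood, k parameters and n observations."""
    aic = neg2ll + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    hqic = neg2ll + 2 * k * math.log(math.log(n))   # standard Hannan-Quinn form
    return {"AIC": aic, "AICC": aicc, "HQIC": hqic}

# SMPK fit to the snowfall data: -2ll = -81.891 (Table 2), k = 3, n = 30
snowfall = information_criteria(-81.891, 3, 30)
```

With these inputs the AIC and AICC agree with the SMPK row of Table 2 to within rounding. The HQIC column of the tables may follow a different convention, so the standard Hannan-Quinn value computed here need not match it.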
Application 1: Snowfall data
The first data set consists of 30 observations of daily snowfall amounts, measured in inches of water, recorded in the vicinity of Climax and reported by [12].
Application 2: Milk Production data
The second data set gives the proportion of total milk production in the first birth of 107 SINDI cows studied by [13]. The data have previously been analyzed by [2].
Table 2: Estimates, -2ll, AIC, AICC, HQIC, K-S statistic, and P-value for Dataset 1.
Model   α        β        λ        θ       c        -2ll      AIC       AICC      HQIC       K-S     P-value
SMPK    21.014   4.641    1.0648   -       -        -81.891   -75.891   -74.969   -156.437   0.081   0.991
TKD     0.945    -        0.614    5.819   -        -80.794   -74.799   -73.871   -154.244   0.602   0.073
KIED    0.093    0.952    0.294    -       -        -79.227   -73.227   -72.304   -151.110   0.188   0.240
WKD     -        7.810    1.001    -       0.8561   -79.118   -73.118   -72.196   -150.891   0.172   0.002
KUMD    0.861    6.8361   -        -       -        -79.595   -75.595   -75.151   -154.294   0.121   0.774
Table 3: Estimates, -2ll, AIC, AICC, HQIC, K-S statistic, and P-value for Dataset 2.
Model   α       β       λ        θ       c       -2ll      AIC       AICC      HQIC       K-S      P-value
SMPK    0.066   3.471   1.439    -       -       -56.842   -50.842   -50.609   -104.434   0.047    0.969
TKD     1.823   -       -0.561   3.436   -       -54.097   -48.098   -47.865   -98.945    0.060    0.836
KIED    0.574   2.256   0.826    -       -       75.946    81.946    82.179    161.143    0.261    0.080
WKD     -       3.931   3.048    -       0.001   -52.816   -46.816   -46.582   -96.380    62.586   0.002
KUMD    2.195   3.436   -        -       -       -50.789   -46.789   -46.674   -95.411    0.076    0.562
Tables 2 and 3 show that the SMPK distribution attains the smallest values of -2ll, AIC, HQIC and K-S and the largest P-value for both data sets; its AICC is also the smallest for Dataset 2, while for Dataset 1 the two-parameter Kumaraswamy model has a slightly smaller AICC. Overall, the SMPK distribution fits the real data better than the competing models. The plots of the fitted models are shown in Figures 4 and 5. These plots also demonstrate that the SMPK distribution offers a close fit to both data sets. Additionally, Q-Q plots for the two data sets are given in Figures 6 and 7. Furthermore, we extract the shape of the hazard function from the observed data using the total time on test (TTT) plots given in Figures 8 and 9. The TTT plots indicate that the data sets exhibit decreasing and increasing hazard rates.
Figure 4: Fitted density plots for dataset 1
Figure 5: Fitted density plots for dataset 2
Figure 6: Q-Q Plot for dataset 1 (horizontal axis: Theoretical Quantiles)
Figure 7: Q-Q Plot for dataset 2 (horizontal axis: Theoretical Quantiles)
Figure 8: TTT Plot for data set 1 (horizontal axis: i/n)
Figure 9: TTT Plot for data set 2 (horizontal axis: i/n)
8. Conclusion
This study introduces the SMPK distribution as a new extension of the Kumaraswamy distribution, using the SMP approach. We have examined several statistical characteristics of the proposed model, including the survival function, hazard rate function, reverse hazard function, moments, quantile function, mean residual life, mean waiting time, Rényi entropy, Havrda & Charvát entropy, moment generating function and order statistics. The parameters are estimated by the method of maximum likelihood, and a simulation study is performed to evaluate the performance of the maximum likelihood estimator (MLE). The model's performance is evaluated using goodness-of-fit statistics. The density of the proposed distribution is unimodal and, depending on the parameter values, symmetric or negatively skewed. Additionally, it displays constant, decreasing, increasing and J-shaped failure rates across various parameter values. Accordingly, the proposed distribution can be used to model datasets with similar failure rates. For practical applicability, the proposed distribution is applied to two real-life datasets, and the results suggest that the SMPK model provides a better fit than the competing models.
References
[1] M. Abd El-Monsef and S. Ghoneim. The weighted kumaraswamy distribution. Information, 18(8):3289-3300, 2015.
[2] A. Bhat, S. P. Ahmad, E. M. Almetwally, N. Yehia, N. Alsadat, and A. H. Tolba. The odd lindley power rayleigh distribution: properties, classical and bayesian estimation with applications. Scientific African, 20:e01736, 2023.
[3] G. M. Cordeiro, E. M. Ortega, and S. Nadarajah. The kumaraswamy weibull distribution with application to failure data. Journal of the Franklin institute, 347(8):1399-1429, 2010.
[4] A. I. Ishaq, A. A. Suleiman, H. Daud, N. S. S. Singh, M. Othman, R. Sokkalingam, P. Wiratchotisatian, A. G. Usman, and S. I. Abba. Log-kumaraswamy distribution: its features and applications. Frontiers in Applied Mathematics and Statistics, 9:1258961, 2023.
[5] J. Joseph and M. Ravindran. Transmuted exponentiated kumaraswamy distribution. Reliability: Theory & Applications, 18(1 (72)):539-552, 2023.
[6] K. Karakaya, I. Kinaci, C. Kuş, and Y. Akdogan. On the dus-kumaraswamy distribution. Istatistik Journal of The Turkish Statistical Association, 13(1):29-38, 2021.
[7] M. S. Khan, R. King, and I. L. Hudson. Transmuted kumaraswamy distribution. Statistics in Transition new series, 2(17):183-210, 2016.
[8] M. Kpangay, L. O. Odongo, and G. O. Orwa. The kumaraswamy-gull alpha power rayleigh distribution: properties and application to hiv/aids data. International Journal of Scientific Research and Engineering Development, 6(1):431-442, 2023.
[9] P. Kumaraswamy. A generalized probability density function for double-bounded random processes. Journal of hydrology, 46(1-2):79-88, 1980.
[10] A. J. Lemonte, W. Barreto-Souza, and G. M. Cordeiro. The exponentiated kumaraswamy distribution and its log-transform. Brazilian Journal of Probability and Statistics, 27(1):31-53, 2013.
[11] A. S. Malik and S. P. Ahmad. Generalized inverted kumaraswamy-rayleigh distribution: Properties and application. Journal of Modern Applied Statistical Methods, 23, 2024.
[12] P. W. Mielke Jr, L. O. Grant, and C. F. Chappell. An independent replication of the climax wintertime orographic cloud seeding experiment. Journal of Applied Meteorology (1962-1982), pages 1198-1212, 1971.
[13] G. Moutinho Cordeiro and R. dos Santos Brito. The beta power distribution. Brazilian Journal of Probability and Statistics, pages 88-112, 2012.
[14] P. Oguntunde, O. Babatunde, and A. Ogunmola. Theoretical analysis of the kumaraswamy-inverse exponential distribution. International Journal of Statistics and Applications, 4(2):113-116, 2014.
[15] M. M. Rahman, J. A. Darwish, S. H. Shahbaz, G. Hamedani, and M. Q. Shahbaz. A new cubic transmuted log-logistic distribution: Properties, applications, and characterizations. Advances and Applications in Statistics, 91(3):335-361, 2024.
[16] S. U. Rasool, M. A. Lone, and S. P. Ahmad. An innovative technique for generating probability distributions: A study on lomax distribution with applications in medical and engineering fields. Annals of Data Science, pages 1-17, 2024.
[17] A. Rényi. On measures of entropy and information. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, volume 1: contributions to the theory of statistics, volume 4, pages 547-562. University of California Press, 1961.