THE CONTINUOUS BERNOULLI-GENERATED FAMILY OF DISTRIBUTIONS: THEORY AND APPLICATIONS
Ngozi O. Ubaka¹, Friday Ewere²
¹Department of Statistics, Federal University of Oyo-Ekiti, Ekiti State, Nigeria. ²Department of Statistics, University of Benin, Benin City, Edo State, Nigeria.
Abstract
The continuous Bernoulli distribution is a one-parameter probability distribution that is useful in machine learning. A handful of studies have generalized the continuous Bernoulli distribution. In this paper, we introduce a wider extension of the continuous Bernoulli distribution by using its distribution function as a generator. We refer to the proposed family as the continuous Bernoulli-generated family of distributions. Basic statistical properties of the proposed family, such as the density and cumulative distribution functions, survival and hazard rate functions, quantile function, moments, moment generating function, and Rényi entropy, are derived. The method of maximum likelihood is employed to estimate the unknown parameters of the family, and the asymptotic behaviour of the parameter estimates is investigated via a Monte Carlo simulation study. Two data sets, the waiting times (in minutes) of 100 bank customers and the tensile strength (in GPa) of 69 carbon fibers, form the basis for real-life data fitting. Results from fitting the two data sets, compared with some existing non-nested models, favor the continuous Bernoulli Weibull distribution over the other competing distributions.
Keywords: Continuous Bernoulli Distribution; Moments; Quantile; Monte Carlo Simulation Study
1. INTRODUCTION
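The cumulative distribution function (cdf) of the one-parameter continuous Bernoulli distribution has been defined by [13] as
$$F(x, \lambda) = \begin{cases} \dfrac{\lambda^{x}(1-\lambda)^{1-x} + \lambda - 1}{2\lambda - 1}, & \lambda \neq \tfrac{1}{2}, \; 0 < x < 1, \\ x, & \lambda = \tfrac{1}{2}, \end{cases} \tag{1}$$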
Ngozi O. Ubaka and Friday Ewere
THE CONTINUOUS BERNOULLI-GENERATED FAMILY OF RT&A' No3 (74)
DISTRIBUTIONS Volume 18, September 2023
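with the probability density function (pdf) associated with (1) obtained as
$$f(x, \lambda) = \begin{cases} C_{\lambda}\,\lambda^{x}(1-\lambda)^{1-x}, & \lambda \neq \tfrac{1}{2}, \; 0 < x < 1, \\ 1, & \lambda = \tfrac{1}{2}, \end{cases} \tag{2}$$
where the normalizing constant $C_{\lambda}$ is defined as
$$C_{\lambda} = \frac{2\tanh^{-1}(1-2\lambda)}{1-2\lambda}, \quad \lambda \neq \tfrac{1}{2}, \tag{3}$$
and $2\tanh^{-1}(1-2\lambda) = \ln(1-\lambda) - \ln(\lambda)$, using the relation $\tanh^{-1}(x) = \tfrac{1}{2}\ln\left(\tfrac{1+x}{1-x}\right)$.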
We denote a random variable X following the continuous Bernoulli distribution as $X \sim CB(\lambda)$. The continuous Bernoulli distribution has a particular application in machine learning, namely in modeling the pixel intensities of natural images in deep learning and computer vision, mostly in the development of variational autoencoders. Like the one-parameter Topp-Leone and power distributions, the $CB(\lambda)$ distribution is a one-parameter distribution with support on the unit interval.
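Equations (1) to (3) can be checked numerically. The following is a minimal sketch in plain Python, not code from the paper; the function names and the Simpson-rule helper are illustrative assumptions, and the value 2 used for $C_{\lambda}$ at $\lambda = 1/2$ is the limit of (3), which reproduces the unit density in (2).

```python
import math

def cb_norm_const(lam):
    """Normalizing constant C_lambda of equation (3); its limit at lam = 1/2 is 2."""
    if abs(lam - 0.5) < 1e-9:
        return 2.0
    return 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)

def cb_pdf(x, lam):
    """Density (2) on 0 < x < 1."""
    return cb_norm_const(lam) * lam ** x * (1.0 - lam) ** (1.0 - x)

def cb_cdf(x, lam):
    """Distribution function (1) on 0 < x < 1."""
    if abs(lam - 0.5) < 1e-9:
        return x
    return (lam ** x * (1.0 - lam) ** (1.0 - x) + lam - 1.0) / (2.0 * lam - 1.0)

def simpson(func, a, b, n=1000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

# The density should integrate to 1 and agree with the cdf for any 0 < lam < 1.
for lam in (0.2, 0.5, 0.8):
    area = simpson(lambda x: cb_pdf(x, lam), 0.0, 1.0)
    print(lam, round(area, 6), round(cb_cdf(0.3, lam), 6))
```

Integrating the density over part of the interval and comparing with `cb_cdf` gives a quick consistency check between (1) and (2).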
In the statistical analysis of lifetime data, bounded distributions have found a wide variety of applications in fields such as engineering, actuarial science, economics and the biological sciences, particularly when the data are recorded as rates, percentages or proportions. For many years, the beta and Kumaraswamy distributions were the leading bounded distributions for fitting [0,1]-valued data sets, until several methodologies for developing unit-interval distributions emerged. Notable among these distributions are the log-Lindley distribution proposed by [10], the unit-logistic distribution developed by [14], the log-Xgamma distribution introduced by [2], the Marshall-Olkin Topp-Leone distribution developed by [17], the unit-Burr XII distribution studied by [11], the Marshall-Olkin extended unit-Gompertz distribution studied by [15], the transmuted Marshall-Olkin extended Topp-Leone distribution introduced by [18], and the Kumaraswamy unit-Gompertz distribution proposed by [1]. The power continuous Bernoulli distribution due to [3] and the transmuted continuous Bernoulli distribution due to [4], apparently the only extensions of the classical continuous Bernoulli distribution, also belong to this list. The goal of this paper is to develop a novel family of distributions based on the continuous Bernoulli distribution, which is expected to yield more tractable and flexible lifetime distributions for analyzing real data sets.
The rest of the paper is organized in the following sections. Section 2 is devoted to model formulation. Section 3 provides some sub-models from the proposed family of distributions. General mathematical treatments for the proposed family of distributions, the parameter estimation as well as the investigation of the asymptotic behaviour of the parameter estimates of the model via a Monte Carlo simulation are discussed in Section 4. Section 5 provides the applicability of the proposed family of distributions in real-life data fitting. Section 6 concludes the paper.
2. MODEL FORMULATION
Suppose a random variable T follows a known probability distribution with pdf f(t). [20] adopted the beta-generated technique developed by [6] to introduce the Topp-Leone-generated family of distributions with cdf defined by
$$F(x, \alpha, \xi) = 2\alpha\int_{0}^{G(x,\xi)} (1-t)\,[t(2-t)]^{\alpha-1}\,dt = G(x,\xi)^{\alpha}\,(2 - G(x,\xi))^{\alpha}, \quad 0 < t < 1, \; \alpha > 0, \tag{4}$$
and the associated pdf obtained as
$$f(x, \alpha, \xi) = 2\alpha\, g(x,\xi)\, G(x,\xi)^{\alpha-1}\,(1 - G(x,\xi))\,(2 - G(x,\xi))^{\alpha-1}. \tag{5}$$
As an alternative to the technique in (4), [5] introduced the so-called type II Topp-Leone generated (TIITL-G) family of distributions, based on the methodology of [19], who introduced an alternative gamma-generator to the one reported in [22]. The cdf and pdf of the TIITL-G family are, respectively, defined by
$$F(x, \alpha, \xi) = 1 - 2\alpha\int_{0}^{1-G(x,\xi)} t^{\alpha-1}(1-t)(2-t)^{\alpha-1}\,dt = 1 - \left(1 - G^{2}(x,\xi)\right)^{\alpha}, \quad 0 < t < 1, \; \alpha > 0, \tag{6}$$
and
$$f(x, \alpha, \xi) = 2\alpha\, g(x,\xi)\, G(x,\xi)\left(1 - G^{2}(x,\xi)\right)^{\alpha-1}. \tag{7}$$
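Motivated by the simplicity of the technique in (6) and using the $CB(\lambda)$ distribution defined in (1)-(3) as the generator, we develop a novel class of distributions with the cdf defined, for t in the support of the baseline cdf $G(t,\xi)$, by
$$F(t, \lambda, \xi) = \begin{cases} \dfrac{\lambda^{1-G(t,\xi)}(1-\lambda)^{G(t,\xi)} - \lambda}{1 - 2\lambda}, & \lambda \neq \tfrac{1}{2}, \\ G(t,\xi), & \lambda = \tfrac{1}{2}. \end{cases} \tag{8}$$
The pdf corresponding to (8) is obtained as
$$f(t, \lambda, \xi) = \begin{cases} C_{\lambda}\, g(t,\xi)\,\lambda^{1-G(t,\xi)}(1-\lambda)^{G(t,\xi)}, & \lambda \neq \tfrac{1}{2}, \\ g(t,\xi), & \lambda = \tfrac{1}{2}. \end{cases} \tag{9}$$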
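The steps between the generator in (1) and the family in (8) and (9) can be written out explicitly. The following is a sketch for $\lambda \neq 1/2$, assuming that the technique in (6) amounts to replacing the generator cdf $F_{CB}$ by $1 - F_{CB}(1 - G)$, which is how (6) arises from the Topp-Leone cdf; the symbol $F_{CB}$ for the cdf in (1) is introduced here only for illustration.

```latex
\begin{aligned}
F(t,\lambda,\xi) &= 1 - F_{CB}\bigl(1 - G(t,\xi),\lambda\bigr)
 = 1 - \frac{\lambda^{1-G(t,\xi)}(1-\lambda)^{G(t,\xi)} + \lambda - 1}{2\lambda - 1} \\
&= \frac{\lambda^{1-G(t,\xi)}(1-\lambda)^{G(t,\xi)} - \lambda}{1 - 2\lambda}, \\
f(t,\lambda,\xi) &= \frac{d}{dt}F(t,\lambda,\xi)
 = \frac{\ln(1-\lambda) - \ln(\lambda)}{1 - 2\lambda}\, g(t,\xi)\,
   \lambda^{1-G(t,\xi)}(1-\lambda)^{G(t,\xi)} \\
&= C_{\lambda}\, g(t,\xi)\,\lambda^{1-G(t,\xi)}(1-\lambda)^{G(t,\xi)} .
\end{aligned}
```

The last step uses $2\tanh^{-1}(1-2\lambda) = \ln(1-\lambda) - \ln(\lambda)$ together with the definition of $C_{\lambda}$ in (3).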
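A random variable T having the cdf and pdf defined in (8) and (9), respectively, is said to follow the continuous Bernoulli-generated ($CB(\lambda,\xi)$-G) family of distributions.
The survival and hazard rate functions of the $CB(\lambda,\xi)$-G family of distributions are defined in (10) and (11), respectively, as
$$S(t, \lambda, \xi) = \begin{cases} \dfrac{\lambda^{1-G(t,\xi)}(1-\lambda)^{G(t,\xi)} + \lambda - 1}{2\lambda - 1}, & \lambda \neq \tfrac{1}{2}, \\ 1 - G(t,\xi), & \lambda = \tfrac{1}{2}, \end{cases} \tag{10}$$
and
$$h(t, \lambda, \xi) = \begin{cases} \dfrac{C_{\lambda}^{*}\, g(t,\xi)\,\lambda^{1-G(t,\xi)}(1-\lambda)^{G(t,\xi)}}{\lambda^{1-G(t,\xi)}(1-\lambda)^{G(t,\xi)} + \lambda - 1}, & \lambda \neq \tfrac{1}{2}, \; C_{\lambda}^{*} = (2\lambda - 1)C_{\lambda}, \\ \dfrac{g(t,\xi)}{1 - G(t,\xi)}, & \lambda = \tfrac{1}{2}. \end{cases} \tag{11}$$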
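The four functions in (8) to (11) depend on t only through the baseline values $G(t,\xi)$ and $g(t,\xi)$, so they can be coded once and reused for any baseline. The sketch below is illustrative and not taken from the paper: the function names are assumptions, and the arguments `G` and `g` are the baseline cdf and pdf already evaluated at t.

```python
import math

def c_lambda(lam):
    """Normalizing constant (3); equals 2 in the limit lam -> 1/2."""
    if abs(lam - 0.5) < 1e-9:
        return 2.0
    return 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)

def cbg_cdf(G, lam):
    """Equation (8), written in terms of the baseline cdf value G = G(t, xi)."""
    if abs(lam - 0.5) < 1e-9:
        return G
    return (lam ** (1.0 - G) * (1.0 - lam) ** G - lam) / (1.0 - 2.0 * lam)

def cbg_pdf(G, g, lam):
    """Equation (9); at lam = 1/2 the product reduces to the baseline pdf g."""
    return c_lambda(lam) * g * lam ** (1.0 - G) * (1.0 - lam) ** G

def cbg_survival(G, lam):
    """Equation (10), computed as 1 - F."""
    return 1.0 - cbg_cdf(G, lam)

def cbg_hazard(G, g, lam):
    """Equation (11), computed as f / S."""
    return cbg_pdf(G, g, lam) / cbg_survival(G, lam)

# Example with a unit exponential baseline (an assumption made for illustration).
t = 0.7
G, g = 1.0 - math.exp(-t), math.exp(-t)
print(cbg_cdf(G, 0.3), cbg_pdf(G, g, 0.3), cbg_hazard(G, g, 0.3))
```

Computing the survival and hazard functions from the cdf and pdf, rather than coding (10) and (11) separately, keeps the four functions consistent by construction.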
Furthermore, the quantile function of the $CB(\lambda,\xi)$-G family of distributions is obtained as
$$Q_{T}(u) = G^{-1}\left(\frac{\ln[(1-2\lambda)u + \lambda] - \ln[\lambda]}{2\tanh^{-1}(1-2\lambda)}\right), \quad \lambda \neq \tfrac{1}{2}, \; 0 < u < 1. \tag{12}$$
Substituting u = 0.5 in (12), the median of the $CB(\lambda,\xi)$-G family of distributions is obtained as
$$Q_{T}(0.5) = G^{-1}\left(-\frac{\ln[2] + \ln[\lambda]}{2\tanh^{-1}(1-2\lambda)}\right). \tag{13}$$
The utility of (12) is in generating random numbers from the $CB(\lambda,\xi)$-G family of distributions, where u is generated from the uniform distribution on the interval 0 < u < 1.
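Random number generation through (12) can be sketched as follows for a Weibull baseline with cdf $1 - e^{-\beta t^{\alpha}}$. This is an illustrative sketch rather than the authors' code: the function names, the seed handling and the small guard against the argument of $G^{-1}$ rounding to exactly 1 are assumptions.

```python
import math
import random

def cbw_cdf(t, lam, alpha, beta):
    """Family cdf (8) with Weibull baseline G(t) = 1 - exp(-beta * t**alpha)."""
    G = 1.0 - math.exp(-beta * t ** alpha)
    if abs(lam - 0.5) < 1e-9:
        return G
    return (lam ** (1.0 - G) * (1.0 - lam) ** G - lam) / (1.0 - 2.0 * lam)

def cbw_quantile(u, lam, alpha, beta):
    """Equation (12) with the Weibull inverse cdf (-ln(1 - p) / beta)**(1 / alpha)."""
    if abs(lam - 0.5) < 1e-9:
        p = u
    else:
        p = (math.log((1.0 - 2.0 * lam) * u + lam) - math.log(lam)) / (
            2.0 * math.atanh(1.0 - 2.0 * lam))
    p = min(p, 1.0 - 1e-15)  # guard against log(0) from rounding
    return (-math.log(1.0 - p) / beta) ** (1.0 / alpha)

def cbw_sample(n, lam, alpha, beta, seed=0):
    """Inverse-transform sampling: apply the quantile function to uniform draws."""
    rng = random.Random(seed)
    return [cbw_quantile(rng.random(), lam, alpha, beta) for _ in range(n)]

sample = cbw_sample(20000, lam=0.4, alpha=3.0, beta=0.5)
print(sum(sample) / len(sample))       # sample mean
print(cbw_quantile(0.5, 0.4, 3.0, 0.5))  # median, equation (13)
```

Applying `cbw_cdf` to a value returned by `cbw_quantile` should recover the original u, which is a convenient check that (8) and (12) are inverses of each other.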
3. SUB-MODELS OF THE $CB(\lambda,\xi)$-G FAMILY OF DISTRIBUTIONS
This section is concerned with the formulation of tractable models from the $CB(\lambda,\xi)$-G family of distributions, using the Weibull, Topp-Leone, Kumaraswamy and Burr XII distributions as the baseline distribution in (8).
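3.1 The continuous Bernoulli Weibull $CBW(\lambda, \alpha, \beta)$ distribution
Let T be a random variable following the Weibull distribution with cdf $G(t,\alpha,\beta) = 1 - e^{-\beta t^{\alpha}}$ and pdf $g(t,\alpha,\beta) = \alpha\beta t^{\alpha-1}e^{-\beta t^{\alpha}}$, $t > 0$, $\alpha, \beta > 0$. We define the cdf and pdf of the $CBW(\lambda, \alpha, \beta)$ distribution, respectively, as follows:
$$F(t, \lambda, \alpha, \beta) = \begin{cases} \dfrac{\lambda^{e^{-\beta t^{\alpha}}}(1-\lambda)^{1-e^{-\beta t^{\alpha}}} - \lambda}{1 - 2\lambda}, & \lambda \neq \tfrac{1}{2}, \; \alpha, \beta > 0, \; t > 0, \\ 1 - e^{-\beta t^{\alpha}}, & \lambda = \tfrac{1}{2}, \; \alpha, \beta > 0, \end{cases} \tag{14}$$
and
$$f(t, \lambda, \alpha, \beta) = \begin{cases} C_{\lambda}^{W}\, t^{\alpha-1}e^{-\beta t^{\alpha}}\,\lambda^{e^{-\beta t^{\alpha}}}(1-\lambda)^{1-e^{-\beta t^{\alpha}}}, & \lambda \neq \tfrac{1}{2}, \; \alpha, \beta > 0, \; t > 0, \; C_{\lambda}^{W} = \alpha\beta C_{\lambda}, \\ \alpha\beta t^{\alpha-1}e^{-\beta t^{\alpha}}, & \lambda = \tfrac{1}{2}, \; \alpha, \beta > 0. \end{cases} \tag{15}$$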
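3.2 The continuous Bernoulli Topp-Leone $CBTL(\lambda, \alpha)$ distribution
The one-parameter Topp-Leone distribution is defined by the density function
$$g(t,\alpha) = 2\alpha(1-t)[t(2-t)]^{\alpha-1}, \quad \alpha \neq 1, \; \alpha > 0, \; 0 < t < 1, \tag{16}$$
and the associated cdf is given by
$$G(t,\alpha) = [t(2-t)]^{\alpha}, \quad \alpha \neq 1, \; \alpha > 0, \; 0 < t < 1. \tag{17}$$
By inserting the pdf and cdf in (16) and (17) into (8) and (9), we define the cdf and pdf of the $CBTL(\lambda, \alpha)$ distribution, respectively, as
$$F(t, \lambda, \alpha) = \begin{cases} \dfrac{\lambda^{1-[t(2-t)]^{\alpha}}(1-\lambda)^{[t(2-t)]^{\alpha}} - \lambda}{1 - 2\lambda}, & \lambda \neq \tfrac{1}{2}, \; \alpha > 0, \; 0 < t < 1, \\ [t(2-t)]^{\alpha}, & \lambda = \tfrac{1}{2}, \; \alpha > 0, \end{cases} \tag{18}$$
and
$$f(t, \lambda, \alpha) = \begin{cases} C_{\lambda}^{TL}\,(1-t)[t(2-t)]^{\alpha-1}\,\lambda^{1-[t(2-t)]^{\alpha}}(1-\lambda)^{[t(2-t)]^{\alpha}}, & \lambda \neq \tfrac{1}{2}, \; \alpha > 0, \; 0 < t < 1, \; C_{\lambda}^{TL} = 2\alpha C_{\lambda}, \\ 2\alpha(1-t)[t(2-t)]^{\alpha-1}, & \lambda = \tfrac{1}{2}, \; \alpha > 0. \end{cases} \tag{19}$$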
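3.3 The continuous Bernoulli Kumaraswamy $CBK(\lambda, \alpha, \beta)$ distribution
The Kumaraswamy distribution developed by [12] is a bounded distribution with two shape parameters, having the cdf $G(t) = 1 - (1 - t^{\alpha})^{\beta}$ and pdf $g(t) = \alpha\beta t^{\alpha-1}(1 - t^{\alpha})^{\beta-1}$, $\alpha, \beta > 0$. With this information, the cdf and pdf of the $CBK(\lambda, \alpha, \beta)$ distribution are defined, respectively, as
$$F(t, \lambda, \alpha, \beta) = \begin{cases} \dfrac{\lambda^{(1-t^{\alpha})^{\beta}}(1-\lambda)^{1-(1-t^{\alpha})^{\beta}} - \lambda}{1 - 2\lambda}, & \lambda \neq \tfrac{1}{2}, \; \alpha, \beta > 0, \; 0 < t < 1, \\ 1 - (1 - t^{\alpha})^{\beta}, & \lambda = \tfrac{1}{2}, \; \alpha, \beta > 0, \end{cases} \tag{20}$$
and
$$f(t, \lambda, \alpha, \beta) = \begin{cases} C_{\lambda}^{K}\, t^{\alpha-1}(1 - t^{\alpha})^{\beta-1}\,\lambda^{(1-t^{\alpha})^{\beta}}(1-\lambda)^{1-(1-t^{\alpha})^{\beta}}, & \lambda \neq \tfrac{1}{2}, \; \alpha, \beta > 0, \; 0 < t < 1, \; C_{\lambda}^{K} = \alpha\beta C_{\lambda}, \\ \alpha\beta t^{\alpha-1}(1 - t^{\alpha})^{\beta-1}, & \lambda = \tfrac{1}{2}, \; \alpha, \beta > 0. \end{cases} \tag{21}$$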
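3.4 The continuous Bernoulli Burr XII $CBBXII(\lambda, \alpha, \beta)$ distribution
A random variable T is said to follow the two-parameter Burr XII distribution if the density function of T is defined by
$$g(t, \alpha, \beta) = \alpha\beta t^{\alpha-1}(1 + t^{\alpha})^{-(\beta+1)}, \quad \alpha, \beta > 0, \; t > 0, \tag{22}$$
and the corresponding cdf is given by
$$G(t, \alpha, \beta) = 1 - (1 + t^{\alpha})^{-\beta}, \quad \alpha, \beta > 0, \; t > 0. \tag{23}$$
By inserting (22) and (23) into (8) and (9), we define the cdf and pdf of the $CBBXII(\lambda, \alpha, \beta)$ distribution, respectively, as follows:
$$F(t, \lambda, \alpha, \beta) = \begin{cases} \dfrac{\lambda^{(1+t^{\alpha})^{-\beta}}(1-\lambda)^{1-(1+t^{\alpha})^{-\beta}} - \lambda}{1 - 2\lambda}, & \lambda \neq \tfrac{1}{2}, \; \alpha, \beta > 0, \; t > 0, \\ 1 - (1 + t^{\alpha})^{-\beta}, & \lambda = \tfrac{1}{2}, \; \alpha, \beta > 0, \end{cases} \tag{24}$$
and
$$f(t, \lambda, \alpha, \beta) = \begin{cases} C_{\lambda}^{BX}\, t^{\alpha-1}(1 + t^{\alpha})^{-(\beta+1)}\,\lambda^{(1+t^{\alpha})^{-\beta}}(1-\lambda)^{1-(1+t^{\alpha})^{-\beta}}, & \lambda \neq \tfrac{1}{2}, \; \alpha, \beta > 0, \; t > 0, \; C_{\lambda}^{BX} = \alpha\beta C_{\lambda}, \\ \alpha\beta t^{\alpha-1}(1 + t^{\alpha})^{-(\beta+1)}, & \lambda = \tfrac{1}{2}, \; \alpha, \beta > 0. \end{cases} \tag{25}$$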
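All four sub-models in this section come from plugging a baseline (cdf, pdf) pair into (8) and (9). The sketch below illustrates this with a small lookup table of baselines; the dictionary `BASELINES`, the function names and the short parameter names `a` and `b` (standing for $\alpha$ and $\beta$) are assumptions made for illustration and are not part of the paper.

```python
import math

def c_lambda(lam):
    """Normalizing constant (3); equals 2 in the limit lam -> 1/2."""
    if abs(lam - 0.5) < 1e-9:
        return 2.0
    return 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)

# Baseline (cdf, pdf) pairs from Section 3.
BASELINES = {
    "Weibull": (lambda t, a, b: 1.0 - math.exp(-b * t ** a),
                lambda t, a, b: a * b * t ** (a - 1.0) * math.exp(-b * t ** a)),
    "Topp-Leone": (lambda t, a: (t * (2.0 - t)) ** a,
                   lambda t, a: 2.0 * a * (1.0 - t) * (t * (2.0 - t)) ** (a - 1.0)),
    "Kumaraswamy": (lambda t, a, b: 1.0 - (1.0 - t ** a) ** b,
                    lambda t, a, b: a * b * t ** (a - 1.0) * (1.0 - t ** a) ** (b - 1.0)),
    "Burr XII": (lambda t, a, b: 1.0 - (1.0 + t ** a) ** (-b),
                 lambda t, a, b: a * b * t ** (a - 1.0) * (1.0 + t ** a) ** (-(b + 1.0))),
}

def cbg_cdf(t, lam, name, *xi):
    """Equation (8) for the named baseline with parameters xi."""
    G = BASELINES[name][0](t, *xi)
    if abs(lam - 0.5) < 1e-9:
        return G
    return (lam ** (1.0 - G) * (1.0 - lam) ** G - lam) / (1.0 - 2.0 * lam)

def cbg_pdf(t, lam, name, *xi):
    """Equation (9) for the named baseline with parameters xi."""
    G_fn, g_fn = BASELINES[name]
    G, g = G_fn(t, *xi), g_fn(t, *xi)
    return c_lambda(lam) * g * lam ** (1.0 - G) * (1.0 - lam) ** G

# Evaluate each sub-model at t = 0.4 with arbitrary illustrative parameters.
for name, xi in [("Weibull", (2.0, 1.5)), ("Topp-Leone", (2.0,)),
                 ("Kumaraswamy", (2.0, 3.0)), ("Burr XII", (2.0, 3.0))]:
    print(name, cbg_cdf(0.4, 0.3, name, *xi), cbg_pdf(0.4, 0.3, name, *xi))
```

A finite-difference derivative of `cbg_cdf` should match `cbg_pdf` for every baseline, which checks each pdf in (15), (19), (21) and (25) against its cdf.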
4. MATHEMATICAL PROPERTIES OF THE $CB(\lambda,\xi)$-G FAMILY OF DISTRIBUTIONS
In this section, the mathematical properties of the $CB(\lambda,\xi)$-G family of distributions, such as the rth non-central moments, moment generating function (mgf) and Rényi entropy, are discussed. The method of maximum likelihood estimation is employed to estimate the model parameters, and the asymptotic behaviour of the parameter estimates is investigated through a Monte Carlo simulation study.
4.1 The rth non-central moments
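Let T be a random variable having the density function of the $CB(\lambda,\xi)$-G family of distributions; then the rth non-central moment of T is defined by
$$E[T^{r}] = \mu'_{r} = \int_{-\infty}^{\infty} t^{r} f(t,\lambda,\xi)\,dt = C_{\lambda}\int_{-\infty}^{\infty} t^{r} g(t,\xi)\,\lambda^{1-G(t,\xi)}(1-\lambda)^{G(t,\xi)}\,dt, \quad r = 1,2,3,4,\ldots \tag{26}$$
Evaluating (26) yields the following results:
$$E[T^{r}] = C_{\lambda}\int_{-\infty}^{\infty} t^{r} g(t,\xi)\exp\bigl((1 - G(t,\xi))\ln(\lambda) + G(t,\xi)\ln(1-\lambda)\bigr)\,dt$$
$$= \lambda C_{\lambda}\int_{-\infty}^{\infty} t^{r} g(t,\xi)\exp\bigl(G(t,\xi)[\ln(1-\lambda) - \ln(\lambda)]\bigr)\,dt$$
$$= \lambda C_{\lambda}\int_{-\infty}^{\infty} t^{r} g(t,\xi)\exp\bigl(G(t,\xi)[2\tanh^{-1}(1-2\lambda)]\bigr)\,dt. \tag{27}$$
Applying the Maclaurin series expansion of the exponential function,
$$e^{G(t,\xi)[2\tanh^{-1}(1-2\lambda)]} = \sum_{n=0}^{\infty}\frac{[2\tanh^{-1}(1-2\lambda)]^{n}}{n!}[G(t,\xi)]^{n},$$
so that (27) now becomes
$$E[T^{r}] = \lambda C_{\lambda}\sum_{n=0}^{\infty}\frac{[2\tanh^{-1}(1-2\lambda)]^{n}}{n!}\int_{-\infty}^{\infty} t^{r} g(t,\xi)[G(t,\xi)]^{n}\,dt$$
$$= \lambda C_{\lambda}\sum_{n=0}^{\infty}\frac{[2\tanh^{-1}(1-2\lambda)]^{n}}{n!\,(n+1)}\int_{-\infty}^{\infty} t^{r} h_{n+1}(t,\xi)\,dt = \lambda C_{\lambda}\sum_{n=0}^{\infty}\frac{[2\tanh^{-1}(1-2\lambda)]^{n}}{n!\,(n+1)}E[Y_{n+1}^{r}], \tag{28}$$
where $h_{n+1}(t,\xi) = (n+1)\,g(t,\xi)[G(t,\xi)]^{n}$ and $E[Y_{n+1}^{r}]$ are, respectively, the density function and rth non-central moment of the exp-G family of distributions with power parameter (n + 1). Thus, we can express the rth non-central moments of the $CB(\lambda,\xi)$-G family of distributions as a linear combination of the rth non-central moments of the exp-G family of distributions with power parameter (n + 1).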
For the purpose of numerical computation, we consider the two-parameter Weibull distribution as the baseline distribution. Hence, we compute the first four raw moments, variance, and measures of skewness (S) and kurtosis (K) of the continuous Bernoulli Weibull $CBW(\lambda, \alpha, \beta)$ distribution in Table 1.
Table 1: The moments of the $CBW(\lambda, \alpha, \beta)$ distribution for selected values of the parameters
$\lambda$  $\beta$  $\alpha$  $\mu'_1$  $\mu'_2$  $\mu'_3$  $\mu'_4$  Variance  S  K
0.4  0.5  3  1.1721  1.5417  2.2068  3.3774  0.1679   0.0905  2.7315
0.4  0.5  5  1.0822  1.2283  1.4485  1.7636  0.0571  -0.3259  2.9750
0.4  0.5  7  1.0524  1.1367  1.2549  1.4119  0.0292  -0.5465  3.4994
0.4  3.0  3  0.6450  0.4669  0.3678  0.3098  0.0509   0.0889  2.7397
0.4  3.0  5  0.7563  0.5999  0.4944  0.4206  0.0279  -0.3265  2.8830
0.4  3.0  7  0.8147  0.6812  0.5822  0.5072  0.0175  -0.5310  3.6310
0.8  0.5  3  0.9696  1.0912  1.3758  1.9007  0.1511   0.4223  2.9993
0.8  0.5  5  0.9616  0.9824  1.0551  1.1827  0.0577  -0.0428  2.9123
0.8  0.5  7  0.9659  0.9640  0.9896  1.0415  0.0310  -0.2721  3.2045
0.8  3.0  3  0.5336  0.3305  0.2293  0.1743  0.0458   0.4181  2.9976
0.8  3.0  5  0.6720  0.4798  0.3601  0.2821  0.0282  -0.0523  3.0015
0.8  3.0  7  0.7478  0.5778  0.4592  0.3741  0.0186  -0.2719  3.0701
Table 1 shows that the CBW distribution can be left-skewed or right-skewed, and platykurtic or leptokurtic, depending on the parameter values. These properties are useful in modeling data with varied shapes, including heavy-tailed data.
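The entries of Table 1 can be approximated by direct numerical integration of $t^{r} f(t)$ with the CBW density in (15). The sketch below is an illustration, not the authors' procedure: it uses a composite Simpson rule on a truncated interval, it assumes $\alpha \geq 1$ so that the density is finite at t = 0, and it takes skewness and kurtosis to be the usual standardized third and fourth central moments.

```python
import math

def c_lambda(lam):
    """Normalizing constant (3); equals 2 in the limit lam -> 1/2."""
    if abs(lam - 0.5) < 1e-9:
        return 2.0
    return 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)

def cbw_pdf(t, lam, alpha, beta):
    """CBW density (15); s is the Weibull survival value 1 - G(t)."""
    s = math.exp(-beta * t ** alpha)
    g = alpha * beta * t ** (alpha - 1.0) * s
    return c_lambda(lam) * g * lam ** s * (1.0 - lam) ** (1.0 - s)

def raw_moment(r, lam, alpha, beta, upper=10.0, n=20000):
    """Composite Simpson approximation of E[T^r] on [0, upper]; n must be even."""
    h = upper / n
    f = lambda t: t ** r * cbw_pdf(t, lam, alpha, beta)
    total = f(0.0) + f(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def summary(lam, alpha, beta):
    """Raw moments, variance, skewness and kurtosis from the first four moments."""
    m1, m2, m3, m4 = (raw_moment(r, lam, alpha, beta) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return m1, m2, m3, m4, var, mu3 / var ** 1.5, mu4 / var ** 2

# First row of Table 1: lambda = 0.4, beta = 0.5, alpha = 3.
print([round(v, 4) for v in summary(0.4, 3.0, 0.5)])
```

The truncation point `upper` is adequate for the parameter values in Table 1 because the Weibull tail decays quickly there; heavier tails would need a larger value.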
4.2 The moment generating function
The moment generating function (mgf) of a random variable T with density function f(t) is defined by
$$M_{T}(q) = E[e^{qT}] = \int_{-\infty}^{\infty} e^{qt} f(t)\,dt. \tag{29}$$
Using (29) together with the expansion in (28), we define the mgf of the $CB(\lambda,\xi)$-G family of distributions as
$$M_{T}(q) = \lambda C_{\lambda}\sum_{n=0}^{\infty}\sum_{p=0}^{\infty}\frac{[2\tanh^{-1}(1-2\lambda)]^{n}\, q^{p}}{n!\,(n+1)\,p!}E[Y_{n+1}^{p}], \tag{30}$$
since $e^{qt} = \sum_{p=0}^{\infty}\dfrac{(qt)^{p}}{p!}$.
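4.3 The Rényi entropy
The entropy of a random variable T measures the degree of randomness associated with T. The Rényi entropy of T is defined by [18] as
$$\tau_{R}(r) = \frac{1}{1-r}\log\int_{-\infty}^{\infty} f^{r}(t)\,dt, \quad r > 0, \; r \neq 1. \tag{31}$$
By substituting (9) into (31), we define the Rényi entropy of a random variable T following the $CB(\lambda,\xi)$-G family of distributions as follows:
$$\tau_{R}(r) = \frac{1}{1-r}\log\left\{(C_{\lambda})^{r}\int_{-\infty}^{\infty} g^{r}(t,\xi)\,\lambda^{r(1-G(t,\xi))}(1-\lambda)^{rG(t,\xi)}\,dt\right\}$$
$$= \frac{1}{1-r}\log\left\{(C_{\lambda})^{r}\lambda^{r}\int_{-\infty}^{\infty} g^{r}(t,\xi)\exp\bigl(rG(t,\xi)[\ln(1-\lambda) - \ln(\lambda)]\bigr)\,dt\right\}$$
$$= \frac{1}{1-r}\log\left\{(C_{\lambda})^{r}\lambda^{r}\int_{-\infty}^{\infty} g^{r}(t,\xi)\exp\bigl(rG(t,\xi)[2\tanh^{-1}(1-2\lambda)]\bigr)\,dt\right\}. \tag{32}$$
Again, applying the Maclaurin series expansion of the exponential function,
$$e^{rG(t,\xi)[2\tanh^{-1}(1-2\lambda)]} = \sum_{n=0}^{\infty}\frac{r^{n}[2\tanh^{-1}(1-2\lambda)]^{n}}{n!}[G(t,\xi)]^{n},$$
so that (32) now becomes
$$\tau_{R}(r) = \frac{1}{1-r}\log\left\{(C_{\lambda})^{r}\lambda^{r}\sum_{n=0}^{\infty}\frac{r^{n}[2\tanh^{-1}(1-2\lambda)]^{n}}{n!}\int_{-\infty}^{\infty} g^{r}(t,\xi)[G(t,\xi)]^{n}\,dt\right\}. \tag{33}$$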
Two major properties of the Rényi entropy of a random variable T were identified by [9]. These are:
(i) The Rényi entropy of T can assume a negative value;
(ii) For any $r_1 < r_2$, $\tau_{R}(r_2) \leq \tau_{R}(r_1)$, and equality holds if and only if T is a uniform random variable.
Again, we compute the Rényi entropy of the $CBW(\lambda, \alpha, \beta)$ distribution for selected values of the parameters, as shown in Table 2.
Table 2: Numerical computation of the Rényi entropy $\tau_{R}(r_i)$ of the $CBW(\lambda, \alpha, \beta)$ distribution ($\lambda = 0.8$)
i  $r_i$  ($\alpha = 0.9$, $\beta = 0.5$)  ($\alpha = 0.9$, $\beta = 3.0$)  ($\alpha = 1.5$, $\beta = 3.0$)  ($\alpha = 1.5$, $\beta = 0.5$)
1  0.1  3.5600   1.5691   0.8868  2.0813
2  0.3  2.4724   0.4815   0.3213  1.5158
3  0.5  1.9849  -0.0060   0.0923  1.2869
4  0.7  1.6766  -0.3142  -0.0433  1.1513
5  0.9  1.4573  -0.5336  -0.1356  1.0589
6  2    0.8522  -1.1387  -0.3746  0.8199
7  4    0.4451  -1.5458  -0.5180  0.6765
8  6    0.2343  -1.7565  -0.5793  0.6152
9  8    0.0647  -1.9262  -0.6147  0.5799
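The results in Table 2 are consistent with the aforementioned properties of the Rényi entropy given by [9]: the entropy takes negative values for some parameter combinations and decreases as r increases.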
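The definition in (31) can be evaluated numerically for the CBW density without the series in (33). The sketch below is illustrative only: it uses a midpoint rule on a truncated interval with the natural logarithm, and the default truncation point and grid size are assumptions suited to $\alpha = 1.5$ and $r \geq 0.5$; smaller r or $\alpha$ would need a wider interval.

```python
import math

def c_lambda(lam):
    """Normalizing constant (3); equals 2 in the limit lam -> 1/2."""
    if abs(lam - 0.5) < 1e-9:
        return 2.0
    return 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)

def cbw_pdf(t, lam, alpha, beta):
    """CBW density (15); s is the Weibull survival value 1 - G(t)."""
    s = math.exp(-beta * t ** alpha)
    g = alpha * beta * t ** (alpha - 1.0) * s
    return c_lambda(lam) * g * lam ** s * (1.0 - lam) ** (1.0 - s)

def renyi_entropy(r, lam, alpha, beta, upper=60.0, n=120000):
    """Midpoint-rule approximation of (31): log(integral of f^r) / (1 - r)."""
    h = upper / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h  # midpoints avoid evaluating the density at t = 0
        total += cbw_pdf(t, lam, alpha, beta) ** r
    return math.log(total * h) / (1.0 - r)

# Row i = 6 of Table 2 (r = 2, lambda = 0.8) for the two columns with alpha = 1.5.
print(round(renyi_entropy(2.0, 0.8, 1.5, 3.0), 4))
print(round(renyi_entropy(2.0, 0.8, 1.5, 0.5), 4))
```

Because $\beta$ acts as a scale parameter of the Weibull baseline, two columns of Table 2 with the same $\alpha$ differ by the constant $\ln(\beta_1/\beta_2)/\alpha$, which gives a second way to check the numbers.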
4.4 Parameter estimation
4.4.1 Maximum likelihood estimation
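The maximum likelihood estimation method is employed to estimate the parameters of the $CB(\lambda,\xi)$-G family of distributions. Suppose $(t_1, t_2, \ldots, t_n)$ is a random sample of size n from the $CB(\lambda,\xi)$-G family of distributions; then the likelihood function is obtained as
$$L(\varphi) = \prod_{i=1}^{n}\left[C_{\lambda}\, g(t_i,\xi)\,\lambda^{1-G(t_i,\xi)}(1-\lambda)^{G(t_i,\xi)}\right], \quad \varphi = (\lambda,\xi)^{T}. \tag{34}$$
By taking the natural logarithm of both sides of (34), the log-likelihood function is obtained as
$$\ell(t,\varphi) = n\ln(C_{\lambda}) + \sum_{i=1}^{n}\ln[g(t_i,\xi)] + \ln(\lambda)\sum_{i=1}^{n}\bigl(1 - G(t_i,\xi)\bigr) + \ln(1-\lambda)\sum_{i=1}^{n}G(t_i,\xi). \tag{35}$$
The maximum likelihood estimate, say $\hat{\varphi} = (\hat{\lambda},\hat{\xi})$, is obtained by differentiating the log-likelihood function in (35) with respect to the parameters and equating the resulting expressions to zero, as shown below:
$$\frac{\partial\ell(t,\varphi)}{\partial\lambda} = \frac{n}{C_{\lambda}}\frac{\partial C_{\lambda}}{\partial\lambda} + \frac{1}{\lambda}\sum_{i=1}^{n}\bigl(1 - G(t_i,\xi)\bigr) - \frac{1}{1-\lambda}\sum_{i=1}^{n}G(t_i,\xi) = 0.$$
When the first term, which comes from the normalizing constant, is neglected, further simplification yields
$$\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n}\bigl(1 - G(t_i,\xi)\bigr),$$
and, for each element of $\xi$,
$$\frac{\partial\ell(t,\varphi)}{\partial\xi_j} = \sum_{i=1}^{n}\frac{g'_{j}(t_i,\xi)}{g(t_i,\xi)} + [\ln(1-\lambda) - \ln(\lambda)]\sum_{i=1}^{n}G'_{j}(t_i,\xi) = 0,$$
where $g'_{j}(t_i,\xi) = \partial g(t_i,\xi)/\partial\xi_j$, $G'_{j}(t_i,\xi) = \partial G(t_i,\xi)/\partial\xi_j$, and $\xi_j$ is the jth element of the parameter vector $\xi$.
These expressions show that, under this simplification, $\lambda$ can be solved analytically for a given $\xi$, whereas the parameter(s) $\xi_j$, and $\lambda$ itself when the derivative of $C_{\lambda}$ is retained, require numerical optimization with a software program such as R.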
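A numerical treatment of the log-likelihood (35) can be sketched as follows for the Weibull baseline. This is an illustration under stated assumptions and not the authors' fitting routine: the function names are hypothetical, the data are only the first ten tensile-strength observations from Section 5, and $\alpha$ and $\beta$ are simply held at the CBW values reported in Table 5 while $\lambda$ is found by a golden-section search.

```python
import math

def c_lambda(lam):
    """Normalizing constant (3); equals 2 in the limit lam -> 1/2."""
    if abs(lam - 0.5) < 1e-9:
        return 2.0
    return 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)

def weibull_loglik(data, alpha, beta):
    """Log-likelihood of the baseline Weibull pdf."""
    return sum(math.log(alpha * beta) + (alpha - 1.0) * math.log(t) - beta * t ** alpha
               for t in data)

def cbw_loglik(data, lam, alpha, beta):
    """Log-likelihood (35) with the Weibull baseline; s = 1 - G(t)."""
    total = len(data) * math.log(c_lambda(lam)) + weibull_loglik(data, alpha, beta)
    for t in data:
        s = math.exp(-beta * t ** alpha)
        total += s * math.log(lam) + (1.0 - s) * math.log(1.0 - lam)
    return total

def profile_lambda(data, alpha, beta, lo=1e-3, hi=1.0 - 1e-3, tol=1e-8):
    """Golden-section search for the lambda maximizing (35) with alpha, beta fixed."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = cbw_loglik(data, c, alpha, beta), cbw_loglik(data, d, alpha, beta)
    while b - a > tol:
        if fc > fd:                      # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = cbw_loglik(data, c, alpha, beta)
        else:                            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = cbw_loglik(data, d, alpha, beta)
    return 0.5 * (a + b)

# Illustrative subset: first ten observations of Data set 2.
data = [1.312, 1.314, 1.479, 1.552, 1.700, 1.803, 1.861, 1.865, 1.944, 1.958]
alpha, beta = 2.7806, 0.1778
lam_hat = profile_lambda(data, alpha, beta)
print(lam_hat, cbw_loglik(data, lam_hat, alpha, beta))
```

The one-dimensional search is safe here because, for fixed $\xi$, the log-likelihood is concave in $\ln((1-\lambda)/\lambda)$ and therefore has a single maximum in $\lambda$. A full fit would maximize over $\lambda$, $\alpha$ and $\beta$ jointly with a general-purpose optimizer.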
4.4.2 Simulation study
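In this subsection, we investigate the asymptotic behaviour of the parameter estimates of the $CBW(\lambda, \alpha, \beta)$ distribution. Random samples of size n = 15, 25, 50, 75 and 100 are generated from the $CBW(\lambda, \alpha, \beta)$ distribution at arbitrarily chosen, fixed values of the parameters. The Monte Carlo simulation is repeated N = 1000 times and the following quantities are computed:
i) bias $= \dfrac{1}{N}\sum_{i=1}^{N}(\hat{\varphi}_i - \varphi)$;
ii) root mean square error (RMSE) $= \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}(\hat{\varphi}_i - \varphi)^{2}}$;
iii) coverage probability (CP) of the 95% confidence interval of the estimates $\hat{\varphi}_i$, given by
$$CP = \frac{1}{N}\sum_{i=1}^{N} I\left(\hat{\varphi}_i - 1.96\sqrt{\mathrm{var}(\hat{\varphi}_i)} < \varphi < \hat{\varphi}_i + 1.96\sqrt{\mathrm{var}(\hat{\varphi}_i)}\right),$$
where $I(\cdot)$ is an indicator function and $\sqrt{\mathrm{var}(\hat{\varphi}_i)}$ is the standard error of the estimate $\hat{\varphi}_i$.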
Table 3: Simulation results for bias, RMSE and CP of the parameter estimates of the $CBW(\lambda, \alpha, \beta)$ distribution
n  Bias($\alpha$)  Bias($\beta$)  Bias($\lambda$)  RMSE($\alpha$)  RMSE($\beta$)  RMSE($\lambda$)  CP($\alpha$)  CP($\beta$)  CP($\lambda$)
Parameters: $\alpha = 0.3$, $\beta = 0.6$, $\lambda = 0.8$
15    0.0042  0.3614  -0.2437  0.0752  0.5992  0.3598  0.986  0.988  0.908
25   -0.0215  0.3395  -0.2522  0.0623  0.5566  0.3558  0.958  0.972  0.888
50   -0.0578  0.3019  -0.2781  0.0527  0.4956  0.3441  0.948  0.970  0.864
75   -0.0704  0.2741  -0.2996  0.0477  0.4877  0.3253  0.938  0.940  0.878
100  -0.0961  0.2210  -0.3323  0.0421  0.4231  0.2926  0.958  0.964  0.910
Parameters: $\alpha = 0.5$, $\beta = 0.3$, $\lambda = 0.6$
15    0.0324  0.1972  -0.1020  0.1422  0.2625  0.2880  0.978  0.958  0.918
25    0.0093  0.1887  -0.1074  0.1057  0.2472  0.2808  0.988  0.986  0.890
50   -0.0154  0.1628  -0.1158  0.0832  0.2404  0.2749  0.964  0.978  0.876
75   -0.0184  0.1017  -0.1356  0.0828  0.2361  0.2741  0.942  0.966  0.872
100  -0.0209  0.0772  -0.1648  0.0724  0.2227  0.2578  0.944  0.952  0.878
Parameters: $\alpha = 0.9$, $\beta = 3.0$, $\lambda = 0.4$
15    0.1085  0.3271   0.0496  0.3043  1.2171  0.2746  0.956  0.998  0.914
25    0.0599  0.1131   0.0401  0.2177  0.9368  0.2645  0.964  0.990  0.904
50    0.0174  0.1082   0.0192  0.1920  0.8197  0.2632  0.926  0.956  0.852
75    0.0026  0.0824   0.0186  0.1619  0.7159  0.2586  0.914  0.942  0.824
100  -0.0043  0.0531   0.0079  0.1615  0.6676  0.2499  0.904  0.940  0.814
Parameters: $\alpha = 0.9$, $\beta = 0.6$, $\lambda = 0.4$
15    0.0932  0.0618   0.0485  0.2758  0.3681  0.2862  0.978  0.940  0.910
25    0.0439  0.0527   0.0468  0.2190  0.3658  0.2812  0.966  0.938  0.858
50    0.0266  0.0523   0.0293  0.1871  0.3382  0.2702  0.938  0.928  0.818
75    0.0082  0.0470   0.0256  0.1551  0.3023  0.2532  0.950  0.938  0.844
100   0.0073  0.0452   0.0180  0.1524  0.3007  0.2521  0.922  0.912  0.828
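From Table 3, we observe that the root mean square errors of the parameter estimates decrease as the sample size n increases in every parameter setting. The biases generally shrink toward zero in the last two settings, whereas in the first two settings the bias of $\hat{\lambda}$ grows in magnitude as n increases. The coverage probabilities for $\alpha$ and $\beta$ lie between about 0.90 and 1.00, while those for $\lambda$ range from about 0.81 to 0.92, below the nominal 95% level.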
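The three summary quantities (bias, RMSE and coverage probability) can be computed as in the following reduced sketch. It is not a reproduction of the experiment behind Table 3: to keep the example short, only $\lambda$ is estimated and the baseline parameters are treated as known, so the data enter only through the baseline cdf values $G(t_i)$; the standard error comes from a numerical second derivative of the log-likelihood; and the replication count, seed and function names are assumptions.

```python
import math
import random

def c_lambda(lam):
    """Normalizing constant (3); equals 2 in the limit lam -> 1/2."""
    if abs(lam - 0.5) < 1e-9:
        return 2.0
    return 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)

def loglik_lambda(sum_G, n, lam):
    """Terms of (35) that depend on lambda, given the sum of the G(t_i)."""
    return (n * math.log(c_lambda(lam)) + (n - sum_G) * math.log(lam)
            + sum_G * math.log(1.0 - lam))

def fit_lambda(sum_G, n, lo=1e-3, hi=1.0 - 1e-3, tol=1e-7):
    """Golden-section search for the maximizing lambda."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = loglik_lambda(sum_G, n, c), loglik_lambda(sum_G, n, d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = loglik_lambda(sum_G, n, c)
        else:
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = loglik_lambda(sum_G, n, d)
    return 0.5 * (a + b)

def draw_sum_G(n, lam, rng):
    """Sum of n baseline cdf values G(T), drawn via the argument of G^{-1} in (12)."""
    k = 2.0 * math.atanh(1.0 - 2.0 * lam)  # requires lam != 1/2
    return sum((math.log((1.0 - 2.0 * lam) * rng.random() + lam) - math.log(lam)) / k
               for _ in range(n))

def simulate(n, lam, reps=300, seed=1, h=1e-4):
    """Return (bias, RMSE, coverage of the 95% Wald interval) for lambda-hat."""
    rng = random.Random(seed)
    errors, covered = [], 0
    for _ in range(reps):
        s = draw_sum_G(n, lam, rng)
        est = fit_lambda(s, n)
        # Observed information from a central second difference.
        info = -(loglik_lambda(s, n, est + h) - 2.0 * loglik_lambda(s, n, est)
                 + loglik_lambda(s, n, est - h)) / h ** 2
        if info > 0.0:
            se = 1.0 / math.sqrt(info)
            covered += est - 1.96 * se < lam < est + 1.96 * se
        errors.append(est - lam)
    bias = sum(errors) / reps
    rmse = math.sqrt(sum(e * e for e in errors) / reps)
    return bias, rmse, covered / reps

for n in (25, 200):
    print(n, simulate(n, 0.6))
```

In this reduced setting the RMSE should fall as n grows; extending the sketch to all three CBW parameters would require a multivariate optimizer and the full observed information matrix.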
5. REAL-LIFE DATA FITTINGS
The applicability of the proposed family of distributions is investigated in this section. To achieve this, two data sets, the waiting times (in minutes) of 100 bank customers and the tensile strength (in GPa) of 69 carbon fibers, are employed for data fitting. Some well-known non-nested models, namely the Kumaraswamy Weibull ($KW(\lambda, \alpha, \beta)$), Kumaraswamy inverse Weibull ($KIW(\lambda, \alpha, \beta)$), Topp-Leone inverse Weibull ($TLIW(\lambda, \alpha, \beta)$), transmuted Weibull ($TW(\lambda, \alpha, \beta)$) and two-parameter Weibull distributions, are fitted to the two data sets alongside the proposed continuous Bernoulli Weibull ($CBW(\lambda, \alpha, \beta)$) distribution. The data sets for the analysis are given below.
Data set 1: The first data set represents the waiting time (in minutes) of 100 bank customers reported in [16]. The data set was first used by [8] to illustrate the flexibility of the Lindley distribution over the exponential distribution in data fitting. The data are given as follows: 0.8, 0.8, 1.3, 1.5, 1.8, 1.9, 1.9, 2.1, 2.6, 2.7, 2.9, 3.1, 3.2, 3.3, 3.5, 3.6, 4.0, 4.1, 4.2, 4.2, 4.3, 4.3, 4.4, 4.4, 4.6, 4.7, 4.7, 4.8, 4.9, 4.9, 5.0, 5.3, 5.5, 5.7, 5.7, 6.1, 6.2, 6.2, 6.2, 6.3, 6.7, 6.9, 7.1, 7.1, 7.1, 7.1, 7.4, 7.6, 7.7, 8.0, 8.2, 8.6, 8.6, 8.6, 8.8, 8.8, 8.9, 8.9, 9.5, 9.6, 9.7, 9.8, 10.7, 10.9, 11.0, 11.0, 11.1, 11.2, 11.2, 11.5, 11.9, 12.4, 12.5, 12.9, 13.0, 13.1, 13.3, 13.6, 13.7, 13.9, 14.1, 15.4, 15.4, 17.3, 17.3, 18.1, 18.2, 18.4, 18.9, 19.0, 19.9, 20.6, 21.3, 21.4, 21.9, 23.0, 27.0, 31.6, 33.1, 38.5.
Data set 2: The second data set comprises the tensile strength (in GPa) of 69 carbon fibers tested under tension at a gauge length of 20 mm, reported in [21]. This data set was also employed by [7] to demonstrate the applicability of the power Lindley distribution. The data are as follows: 1.312, 1.314, 1.479, 1.552, 1.700, 1.803, 1.861, 1.865, 1.944, 1.958, 1.966, 1.997, 2.006, 2.021, 2.027, 2.055, 2.063, 2.098, 2.14, 2.179, 2.224, 2.240, 2.253, 2.270, 2.272, 2.274, 2.301, 2.301, 2.359, 2.382, 2.382, 2.426, 2.434, 2.435, 2.478, 2.490, 2.511, 2.514, 2.535, 2.554, 2.566, 2.57, 2.586, 2.629, 2.633, 2.642, 2.648, 2.684, 2.697, 2.726, 2.770, 2.773, 2.800, 2.809, 2.818, 2.821, 2.848, 2.88, 2.954, 3.012, 3.067, 3.084, 3.090, 3.096, 3.128, 3.233, 3.433, 3.585, 3.585.
Some widely used model selection criteria, namely the maximized log-likelihood (LL) and the Akaike Information Criterion (AIC), and some goodness-of-fit test statistics, namely the Kolmogorov-Smirnov (K-S), Cramér-von Mises (W*) and Anderson-Darling (A*) statistics with their corresponding p-values, are used to assess the appropriate model for analyzing the two data sets. Tables 4 and 5 present the summary statistics for the fit of the distributions to the two data sets, respectively.
Table 4: Summary statistics for the waiting time data set (p-values in parentheses)
Model    $\alpha$  $\beta$  $\lambda$  LL  AIC  K-S  W*  A*
CBW      1.7229   0.0071   0.9356  -317.3098  640.6196  0.0423 (0.994)   0.0248 (0.9904)  0.1682 (0.9968)
KW       1.3727   0.2015   1.3379  -317.6755  641.3510  0.0508 (0.9587)  0.0414 (0.9263)  0.2578 (0.9660)
KIW      2.6384   1.1424  -1.5224  -332.9531  671.9062  0.1099 (0.1785)  0.4051 (0.0698)  2.6255 (0.0427)
TLIW     0.5235  12.5524   0.9569  -327.1056  641.2112  0.0891 (0.4044)  0.2449 (0.1951)  1.6727 (0.1402)
TW       1.5692   0.0157   0.6181  -317.8896  641.7791  0.0481 (0.9746)  0.0384 (0.9420)  0.2599 (0.9648)
Weibull  1.4584   0.0305   -       -318.7307  641.4614  0.0577 (0.8929)  0.0609 (0.8095)  0.4051 (0.8433)
Table 5: Summary statistics for the tensile strength data set (p-values in parentheses)
Model    $\alpha$  $\beta$  $\lambda$  LL  AIC  K-S  W*  A*
CBW      2.7806   0.1778   0.0026  -49.0740  104.1481  0.0400 (0.9999)  0.0142 (0.9998)  0.1210 (0.9998)
KW       3.9464   0.1690  -0.1312  -49.9210  105.8421  0.0675 (0.9112)  0.0581 (0.8276)  0.3901 (0.8580)
KIW      4.2588   2.8719  -3.7556  -56.2704  118.5408  0.1061 (0.4193)  0.1995 (0.2688)  1.3439 (0.2185)
TLIW     0.5468  34.889    3.4115  -58.0304  122.0608  0.1176 (0.2960)  0.2617 (0.1741)  1.7344 (0.1294)
TW       5.9303   0.0021   0.6363  -49.1325  104.2650  0.0433 (0.9995)  0.0191 (0.9979)  0.1714 (0.9963)
Weibull  5.5045   0.0046   -       -49.5961  104.1923  0.0560 (0.9819)  0.0343 (0.9611)  0.2739 (0.9563)
From Tables 4 and 5, based on the usual criteria for comparing models, the continuous Bernoulli Weibull $CBW(\lambda, \alpha, \beta)$ distribution has the largest maximized log-likelihood value, the smallest AIC, K-S, W* and A* values, and the highest corresponding p-values. It therefore outperforms the competitor distributions in analyzing the two data sets and is the most appropriate model for fitting them.
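Two of the criteria used in Tables 4 and 5 are simple to compute once a model has been fitted. The sketch below is illustrative: it assumes the standard definition AIC = 2k - 2LL, where k is the number of estimated parameters, and the usual one-sample Kolmogorov-Smirnov statistic; the function names are hypothetical, and p-values are not computed.

```python
def aic(loglik, k):
    """Akaike Information Criterion for a model with k estimated parameters."""
    return 2.0 * k - 2.0 * loglik

def ks_statistic(data, cdf):
    """One-sample Kolmogorov-Smirnov distance between the data and a fitted cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        F = cdf(x)
        # Compare the fitted cdf with the empirical cdf just after and just before x.
        d = max(d, i / n - F, F - (i - 1) / n)
    return d

# AIC of the CBW fit to the waiting time data (LL from Table 4, three parameters).
print(round(aic(-317.3098, 3), 4))

# K-S distance of three points from the uniform cdf on (0, 1).
print(ks_statistic([0.25, 0.5, 0.75], lambda x: x))
```

With the log-likelihood values in Table 4, this definition reproduces the reported AIC of the CBW model; `ks_statistic` can be applied to either data set by passing the fitted cdf of any of the candidate models.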
6. CONCLUSION
In this paper, we have developed a new class of probability distributions based on the continuous Bernoulli distribution. The proposed family is called the continuous Bernoulli-generated family of distributions. Some basic properties of the proposed family, such as the density and cumulative distribution functions, survival and hazard rate functions, quantile function, moments, moment generating function, and Rényi entropy, were derived. The method of maximum likelihood was employed to estimate the unknown parameters of the family, and the asymptotic behaviour of the parameter estimates was investigated via a Monte Carlo simulation study. Two real-life data sets, the waiting times (in minutes) of 100 bank customers and the tensile strength (in GPa) of 69 carbon fibers, were employed to illustrate the applicability of the proposed family. Existing non-nested models, namely the Kumaraswamy Weibull, Kumaraswamy inverse Weibull, Topp-Leone inverse Weibull, transmuted Weibull and two-parameter Weibull distributions, were fitted to the two data sets alongside the proposed continuous Bernoulli Weibull distribution. Results from the fits, compared using model selection criteria and goodness-of-fit test statistics, favored the continuous Bernoulli Weibull distribution over the other competing distributions.
References
[1] Akata, I. U., Opone, F. C. and Osagiede, F.E.U. (2023). The Kumaraswamy Unit-Gompertz Distribution and its Application to Lifetime Dataset. Earthline Journal of Mathematical Sciences, 11(1): 1-22.
[2] Altun, E. (2018). The log-xgamma distribution with inference and application. Journal de la Société de Statistique de Paris, 159(3): 40-55.
[3] Chesneau, C. and Opone, F. C. (2022). The power continuous Bernoulli distribution: Theory and Applications. Reliability: Theory & Applications, 17(4): 232-248.
[4] Chesneau, C., Opone, F., and Ubaka, N. (2022). Theory and applications of the transmuted continuous Bernoulli distribution. Earthline Journal of Mathematical Sciences. 10(2): 385-407.
[5] Elgarhy, M., Nasir, M. A., Jamal, F. and Ozel, G. (2018). The type II Topp-Leone generated family of distributions: Properties and applications. Journal of Statistics and Management Systems, 21: 1529-1551.
[6] Eugene, N., Lee, C. and Famoye, F. (2002). The beta-normal distribution and its applications. Communications in Statistics-Theory and Methods. 31: 497-512.
[7] Ghitany, M., Al-Mutairi, D., Balakrishnan, N. and Al-Enezi, I. (2013). Power Lindley distribution and associated inference. Computational Statistics and Data Analysis. 64: 20-33.
[8] Ghitany, M., Atieh, B. and Nadarajah, S. (2008). Lindley distribution and its applications. Mathematics and Computers in Simulation. 78: 493-506.
[9] Golshani, L. and Pasha, E. (2010). Renyi entropy rate for Gaussian processes. Information Sciences, 180: 1486-1491.
[10] Gomez-Déniz, E., Sordo, M. A. and Calderin-Ojeda, E. (2014). The Log-Lindley distribution as an alternative to the beta regression model with applications in insurance. Insurance: Mathematics and Economics, 54(1): 49-57.
[11] Korkmaz, M. and Chesneau, C. (2021). On the unit Burr-XII distribution with the quantile regression modeling and applications. Computational and Applied Mathematics, 40(1): 1-26.
[12] Kumaraswamy, P. (1980). A Generalized Probability Density Function for Doubly Bounded Random Process. Journal of Hydrology. 46: 79-88.
[13] Loaiza-Ganem, G. and Cunningham, J. P. (2019). The continuous Bernoulli: fixing a pervasive error in variational autoencoders. In Advances in Neural Information Processing Systems, 13266-13276.
[14] Menezes, A. F. B., Mazucheli, J. and Dey, S. (2018). The unit-logistic distribution: different methods of estimation. Pesquisa Operacional, 38(3): 555-578.
[15] Opone, F. C., Akata, I. U. and Altun, E. (2022). The Marshall-Olkin Extended Unit-Gompertz Distribution: Its Properties, Regression Model and Applications. Statistica, 82(2): 97-118.
[16] Opone, F. C. and Ekhosuehi, N. (2018). Methods of Estimating the Parameters of the Quasi Lindley Distribution. Statistica, 78(2): 183-193.
[17] Opone, F. C. and Iwerumor, B. N. (2021). A New Marshall-Olkin Extended Family of Distributions with Bounded Support. Gazi University Journal of Science, 34(3): 899-914.
[18] Opone, F. C. and Osemwenkhae, J. E. (2022). The transmuted Marshall-Olkin extended Topp-Leone Distribution. Earthline Journal of Mathematical Sciences, 9(2): 179-199.
[18] Rényi, A. (1961). On measures of entropy and information. Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley, 547-561.
[19] Ristic, M. M., & Balakrishnan, N. (2012). The gamma-exponentiated exponential distribution. Journal of Statistical Computation and Simulation, 82 (6): 1191-1206.
[20] Sangsanit Y. and Bodhisuwan, W. (2016). The Topp-Leone generator of distributions: properties and inferences. Songklanakarin Journal of Science and Technology, 38: 537-548.
[21] Tuoyo, D. O., Opone, F. C. and Ekhosuehi, N. (2021). The Topp-Leone Weibull distribution: its properties and application. Earthline Journal of Mathematical Sciences, 7(2): 381-401.
[22] Zografos, K. and Balakrishnan, N. (2009). On Families of Beta-G and Generalized Gamma-generated Distribution and Associate Inference. Statistical Methodology, 6: 344-362.