Scientific article on the topic 'THE CONTINUOUS BERNOULLI-GENERATED FAMILY OF DISTRIBUTIONS: THEORY AND APPLICATIONS' (specialty: "Earth sciences and related environmental sciences")

CC BY

THE CONTINUOUS BERNOULLI-GENERATED FAMILY OF DISTRIBUTIONS: THEORY AND APPLICATIONS

Ngozi O. Ubaka¹, Friday Ewere²

¹Department of Statistics, Federal University of Oyo-Ekiti, Ekiti State, Nigeria.
²Department of Statistics, University of Benin, Benin City, Edo State, Nigeria.

[email protected] [email protected]

Abstract

The continuous Bernoulli distribution is a one-parameter probability distribution that is useful in machine learning. A handful of studies have generalized the continuous Bernoulli distribution. In this paper, we introduce a wider extension of the continuous Bernoulli distribution by using its distribution function as a generator. We refer to the proposed family as the continuous Bernoulli-generated family of distributions. Basic statistical properties of the proposed family, such as the density and cumulative distribution functions, survival and hazard rate functions, quantile function, moments, moment generating function, and Rényi entropy, are derived. The method of maximum likelihood is employed to estimate the unknown parameters of the family, and the asymptotic behaviour of the parameter estimates is investigated via a Monte Carlo simulation study. Two data sets, the waiting time (in minutes) of 100 bank customers and the tensile strength (in GPa) of 69 carbon fibers, form the basis for the real-life data fittings. The results of fitting the two data sets, compared with some existing non-nested models, favor the continuous Bernoulli Weibull distribution over the competing distributions.

Keywords: Continuous Bernoulli Distribution; Moments; Quantile; Monte Carlo Simulation Study

1. INTRODUCTION

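The cumulative distribution function (cdf) of the one-parameter continuous Bernoulli distribution has been defined by [13] as

$$
F(x, \lambda) =
\begin{cases}
\dfrac{\lambda^{x}(1-\lambda)^{1-x} + \lambda - 1}{2\lambda - 1}, & \lambda \neq \tfrac{1}{2},\ 0 < x < 1, \\[2mm]
x, & \lambda = \tfrac{1}{2},
\end{cases}
\qquad (1)
$$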

Ngozi O. Ubaka and Friday Ewere

THE CONTINUOUS BERNOULLI-GENERATED FAMILY OF RT&A' No3 (74)

DISTRIBUTIONS Volume 18, September 2023

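with the probability density function (pdf) associated with (1) obtained as

$$
f(x, \lambda) =
\begin{cases}
C_{\lambda}\,\lambda^{x}(1-\lambda)^{1-x}, & \lambda \neq \tfrac{1}{2},\ 0 < x < 1, \\[2mm]
1, & \lambda = \tfrac{1}{2},
\end{cases}
\qquad (2)
$$

where the normalizing constant $C_{\lambda}$ is defined as

$$
C_{\lambda} = \frac{2\tanh^{-1}(1-2\lambda)}{1-2\lambda}, \quad \lambda \neq \tfrac{1}{2},
\qquad (3)
$$

and $2\tanh^{-1}(1-2\lambda) = \ln(1-\lambda) - \ln(\lambda)$, using the relation $\tanh^{-1}(x) = \tfrac{1}{2}\ln\left(\dfrac{1+x}{1-x}\right)$.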
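As a quick numerical companion to (1)-(3), the following minimal Python sketch evaluates the continuous Bernoulli cdf, pdf and normalizing constant. The function names are illustrative and not part of the paper, and the value 2 returned at λ = 1/2 is an assumption based on the limit of (3) as λ approaches 1/2, which is consistent with the pdf being equal to 1 in that case.

```python
import math

def cb_norm_const(lam: float) -> float:
    """Normalizing constant C_lambda of equation (3).

    Assumption: at lam = 1/2 we return the limiting value 2,
    which makes the pdf (2) equal to 1 there.
    """
    if abs(lam - 0.5) < 1e-9:
        return 2.0
    return 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)

def cb_pdf(x: float, lam: float) -> float:
    """Continuous Bernoulli pdf, equation (2), for 0 < x < 1."""
    return cb_norm_const(lam) * lam ** x * (1.0 - lam) ** (1.0 - x)

def cb_cdf(x: float, lam: float) -> float:
    """Continuous Bernoulli cdf, equation (1), for 0 < x < 1."""
    if abs(lam - 0.5) < 1e-9:
        return x
    return (lam ** x * (1.0 - lam) ** (1.0 - x) + lam - 1.0) / (2.0 * lam - 1.0)
```

A central finite difference of `cb_cdf` can be compared with `cb_pdf` to confirm that (2) is the derivative of (1).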

We denote a random variable X following the continuous Bernoulli distribution as $X \sim CB(\lambda)$. The continuous Bernoulli distribution has a special application in machine learning, particularly in simulating the pixel intensities of natural images in deep learning and computer vision, mostly in the development of variational autoencoders. Like the one-parameter Topp-Leone and power distributions, the $CB(\lambda)$ distribution is a one-parameter distribution with support on the unit interval.

In the statistical analysis of lifetime data, bounded distributions have found a wide variety of applications in engineering, actuarial science, economics, the biological sciences and other fields, particularly when the data are recorded as rates, percentages or proportions. For many years, the beta and Kumaraswamy distributions were the leading bounded distributions for fitting [0,1]-valued data sets, until several methodologies for developing unit-interval distributions emerged. Notable among the resulting distributions are the log-Lindley distribution proposed by [10], the unit-logistic distribution developed by [14], the log-xgamma distribution introduced by [2], the Marshall-Olkin Topp-Leone distribution developed by [17], the unit-Burr XII distribution studied by [11], the Marshall-Olkin extended unit-Gompertz distribution studied by [15], the transmuted Marshall-Olkin extended Topp-Leone distribution introduced by [18], and the Kumaraswamy unit-Gompertz distribution proposed by [1]. The power continuous Bernoulli distribution due to [3] and the transmuted continuous Bernoulli distribution due to [4], apparently the only extensions of the classical continuous Bernoulli distribution, also belong to this list. The goal of this paper is to develop a novel family of distributions based on the continuous Bernoulli distribution, which is expected to yield more tractable and flexible lifetime distributions for analyzing real data sets.

The rest of the paper is organized as follows. Section 2 is devoted to model formulation. Section 3 provides some sub-models of the proposed family of distributions. General mathematical properties of the proposed family, parameter estimation, and the investigation of the asymptotic behaviour of the parameter estimates via a Monte Carlo simulation are discussed in Section 4. Section 5 demonstrates the applicability of the proposed family in real-life data fitting. Section 6 concludes the paper.

2. MODEL FORMULATION

Suppose a random variable T follows a known probability distribution with pdf f(t). [20] adopted the beta-generated technique developed by [6] to introduce the Topp-Leone-generated family of distributions with cdf defined by

$$
F(x, \alpha, \xi) = 2\alpha \int_{0}^{G(x,\xi)} (1-t)\,[t(2-t)]^{\alpha-1}\,dt
= G(x,\xi)^{\alpha}\,\bigl(2 - G(x,\xi)\bigr)^{\alpha}, \quad 0 < t < 1,\ \alpha > 0,
\qquad (4)
$$

and the associated pdf obtained as

$$
f(x, \alpha, \xi) = 2\alpha\, g(x,\xi)\, G(x,\xi)^{\alpha-1}\,\bigl(1 - G(x,\xi)\bigr)\,\bigl(2 - G(x,\xi)\bigr)^{\alpha-1}.
\qquad (5)
$$

As an alternative to the technique in (4), [5] introduced the so-called type II Topp-Leone generated (TIITL-G) family of distributions based on the methodology of [19], who introduced an alternative gamma-generator reported in [22]. The cdf and pdf of the TIITL-G family are, respectively, defined by

$$
F(x, \alpha, \xi) = 1 - 2\alpha \int_{0}^{1-G(x,\xi)} t^{\alpha-1}(1-t)(2-t)^{\alpha-1}\,dt
= 1 - \bigl[1 - G^{2}(x,\xi)\bigr]^{\alpha}, \quad 0 < t < 1,\ \alpha > 0,
\qquad (6)
$$

and

$$
f(x, \alpha, \xi) = 2\alpha\, g(x,\xi)\, G(x,\xi)\,\bigl[1 - G^{2}(x,\xi)\bigr]^{\alpha-1}.
\qquad (7)
$$

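Motivated by the simplicity of the technique in (6) and using the $CB(\lambda)$ distribution defined in (1) as the generator, we develop a novel class of distributions with the cdf defined by

$$
F(t, \lambda, \xi) =
\begin{cases}
\dfrac{\lambda^{1-G(t,\xi)}\,(1-\lambda)^{G(t,\xi)} - \lambda}{1 - 2\lambda}, & \lambda \neq \tfrac{1}{2},\ 0 < t < 1, \\[2mm]
G(t,\xi), & \lambda = \tfrac{1}{2}.
\end{cases}
\qquad (8)
$$

The pdf corresponding to (8) is obtained as

$$
f(t, \lambda, \xi) =
\begin{cases}
C_{\lambda}\, g(t,\xi)\,\lambda^{1-G(t,\xi)}\,(1-\lambda)^{G(t,\xi)}, & \lambda \neq \tfrac{1}{2},\ 0 < t < 1, \\[2mm]
g(t,\xi), & \lambda = \tfrac{1}{2}.
\end{cases}
\qquad (9)
$$

A random variable T having the cdf and pdf defined in (8) and (9), respectively, is said to follow the continuous Bernoulli-generated ($CB(\lambda, \xi)$-G) family of distributions.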
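The construction in (8) and (9) only needs the baseline cdf value G and pdf value g at a point, so it can be sketched generically. The Python below is a minimal illustration under that reading; the function names and the Weibull baseline used for checking are illustrative choices, not code from the paper.

```python
import math

def cbg_cdf(G: float, lam: float) -> float:
    """CB-G cdf of equation (8), given the baseline cdf value G = G(t, xi)."""
    if abs(lam - 0.5) < 1e-9:
        return G
    return (lam ** (1.0 - G) * (1.0 - lam) ** G - lam) / (1.0 - 2.0 * lam)

def cbg_pdf(G: float, g: float, lam: float) -> float:
    """CB-G pdf of equation (9), given baseline cdf value G and pdf value g."""
    if abs(lam - 0.5) < 1e-9:
        return g
    c_lam = 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)
    return c_lam * g * lam ** (1.0 - G) * (1.0 - lam) ** G

# Illustrative baseline: Weibull with shape alpha and rate-type parameter beta.
def weibull_cdf(t: float, alpha: float, beta: float) -> float:
    return 1.0 - math.exp(-beta * t ** alpha)

def weibull_pdf(t: float, alpha: float, beta: float) -> float:
    return alpha * beta * t ** (alpha - 1.0) * math.exp(-beta * t ** alpha)
```

With G = 0 and G = 1 the cdf returns 0 and 1, and a finite difference of `cbg_cdf(weibull_cdf(t, ...), lam)` in t can be compared with `cbg_pdf` to check that (9) is the derivative of (8).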

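The survival and hazard rate functions of the $CB(\lambda, \xi)$-G family of distributions are defined in (10) and (11), respectively, as

$$
S(t, \lambda, \xi) =
\begin{cases}
\dfrac{\lambda^{1-G(t,\xi)}\,(1-\lambda)^{G(t,\xi)} + \lambda - 1}{2\lambda - 1}, & \lambda \neq \tfrac{1}{2},\ 0 < t < 1, \\[2mm]
1 - G(t,\xi), & \lambda = \tfrac{1}{2},
\end{cases}
\qquad (10)
$$

and

$$
h(t, \lambda, \xi) =
\begin{cases}
\dfrac{C^{*}_{\lambda}\, g(t,\xi)\,\lambda^{1-G(t,\xi)}\,(1-\lambda)^{G(t,\xi)}}{\lambda^{1-G(t,\xi)}\,(1-\lambda)^{G(t,\xi)} + \lambda - 1}, & \lambda \neq \tfrac{1}{2},\ C^{*}_{\lambda} = (2\lambda - 1)\,C_{\lambda}, \\[3mm]
\dfrac{g(t,\xi)}{1 - G(t,\xi)}, & \lambda = \tfrac{1}{2}.
\end{cases}
\qquad (11)
$$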

Furthermore, the quantile function of the $CB(\lambda, \xi)$-G family of distributions is obtained as

$$
Q_{T}(u) = G^{-1}\!\left( \frac{\ln[(1-2\lambda)u + \lambda] - \ln[\lambda]}{2\tanh^{-1}(1-2\lambda)} \right), \quad 0 < u < 1.
\qquad (12)
$$

Substituting u = 0.5 in (12), so that $(1-2\lambda)u + \lambda = \tfrac{1}{2}$, the median of the $CB(\lambda, \xi)$-G family of distributions is obtained as

$$
Q_{T}(0.5) = G^{-1}\!\left( -\frac{\ln[2] + \ln[\lambda]}{2\tanh^{-1}(1-2\lambda)} \right).
\qquad (13)
$$

The utility of (12) is in generating random numbers from the $CB(\lambda, \xi)$-G family of distributions, where u is generated from the uniform distribution on 0 < u < 1.
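The inverse-transform idea behind (12) can be sketched in Python as follows. This is a minimal illustration that assumes a Weibull baseline with cdf $1 - e^{-\beta t^{\alpha}}$; the function names, the clamping of the inner level to [0, 1), and the seed argument are illustrative assumptions, not part of the paper.

```python
import math
import random

def cbg_quantile_level(u: float, lam: float) -> float:
    """Argument of G^{-1} in equation (12): the baseline probability level."""
    if abs(lam - 0.5) < 1e-9:
        return u
    num = math.log((1.0 - 2.0 * lam) * u + lam) - math.log(lam)
    return num / (2.0 * math.atanh(1.0 - 2.0 * lam))

def cbw_quantile(u: float, lam: float, alpha: float, beta: float) -> float:
    """Quantile of the CB-Weibull model: Weibull quantile at the level from (12)."""
    # Clamp to guard against tiny floating-point overshoots (illustrative choice).
    p = min(max(cbg_quantile_level(u, lam), 0.0), 1.0 - 1e-15)
    return (-math.log(1.0 - p) / beta) ** (1.0 / alpha)

def cbw_cdf(t: float, lam: float, alpha: float, beta: float) -> float:
    """CB-Weibull cdf, i.e. equation (8) with the Weibull baseline."""
    G = 1.0 - math.exp(-beta * t ** alpha)
    if abs(lam - 0.5) < 1e-9:
        return G
    return (lam ** (1.0 - G) * (1.0 - lam) ** G - lam) / (1.0 - 2.0 * lam)

def cbw_sample(n: int, lam: float, alpha: float, beta: float, seed: int = 0) -> list:
    """Draw n values by applying the quantile function to uniform variates."""
    rng = random.Random(seed)
    return [cbw_quantile(rng.random(), lam, alpha, beta) for _ in range(n)]
```

Applying `cbw_cdf` to `cbw_quantile(u, ...)` should return u up to rounding, which is a convenient sanity check before using the sampler in a simulation.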

3. SUB-MODELS OF THE CB(λ, ξ)-G FAMILY OF DISTRIBUTIONS

This section is concerned with the formulation of tractable models from the $CB(\lambda, \xi)$-G family of distributions, using the Weibull, Topp-Leone, Kumaraswamy and Burr XII distributions as the baseline distribution in (8).

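3.1 The continuous Bernoulli Weibull CBW(λ, α, β) distribution

Let T be a random variable following the Weibull distribution with cdf $G(t, \alpha, \beta) = 1 - e^{-\beta t^{\alpha}}$ and pdf $g(t, \alpha, \beta) = \alpha\beta t^{\alpha-1} e^{-\beta t^{\alpha}}$, $t > 0$, $\alpha, \beta > 0$. We define the cdf and pdf of the CBW(λ, α, β) distribution, respectively, as follows:

$$
F(t, \lambda, \alpha, \beta) =
\begin{cases}
\dfrac{\lambda^{e^{-\beta t^{\alpha}}}\,(1-\lambda)^{1-e^{-\beta t^{\alpha}}} - \lambda}{1 - 2\lambda}, & \lambda \neq \tfrac{1}{2},\ \alpha, \beta > 0,\ t > 0, \\[2mm]
1 - e^{-\beta t^{\alpha}}, & \lambda = \tfrac{1}{2},\ \alpha, \beta > 0,
\end{cases}
\qquad (14)
$$

and

$$
f(t, \lambda, \alpha, \beta) =
\begin{cases}
C^{W}_{\lambda}\, t^{\alpha-1} e^{-\beta t^{\alpha}}\,\lambda^{e^{-\beta t^{\alpha}}}\,(1-\lambda)^{1-e^{-\beta t^{\alpha}}}, & \lambda \neq \tfrac{1}{2},\ \alpha, \beta > 0,\ t > 0,\ C^{W}_{\lambda} = \alpha\beta C_{\lambda}, \\[2mm]
\alpha\beta t^{\alpha-1} e^{-\beta t^{\alpha}}, & \lambda = \tfrac{1}{2},\ \alpha, \beta > 0.
\end{cases}
\qquad (15)
$$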

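3.2 The continuous Bernoulli Topp-Leone CBTL(λ, α) distribution

The one-parameter Topp-Leone distribution is defined by the density function

$$
g(t, \alpha) = 2\alpha (1-t)\,[t(2-t)]^{\alpha-1}, \quad \alpha \neq 1,\ \alpha > 0,\ 0 < t < 1,
\qquad (16)
$$

and the associated cdf is given by

$$
G(t, \alpha) = [t(2-t)]^{\alpha}, \quad \alpha \neq 1,\ \alpha > 0,\ 0 < t < 1.
\qquad (17)
$$

By inserting the pdf and cdf in (16) and (17) into (8) and (9), we define the cdf and pdf of the CBTL(λ, α) distribution, respectively, as

$$
F(t, \lambda, \alpha) =
\begin{cases}
\dfrac{\lambda^{1-[t(2-t)]^{\alpha}}\,(1-\lambda)^{[t(2-t)]^{\alpha}} - \lambda}{1 - 2\lambda}, & \lambda \neq \tfrac{1}{2},\ \alpha > 0,\ 0 < t < 1, \\[2mm]
[t(2-t)]^{\alpha}, & \lambda = \tfrac{1}{2},\ \alpha > 0,
\end{cases}
\qquad (18)
$$

and

$$
f(t, \lambda, \alpha) =
\begin{cases}
C^{TL}_{\lambda}\,(1-t)\,[t(2-t)]^{\alpha-1}\,\lambda^{1-[t(2-t)]^{\alpha}}\,(1-\lambda)^{[t(2-t)]^{\alpha}}, & \lambda \neq \tfrac{1}{2},\ \alpha > 0,\ 0 < t < 1,\ C^{TL}_{\lambda} = 2\alpha C_{\lambda}, \\[2mm]
2\alpha (1-t)\,[t(2-t)]^{\alpha-1}, & \lambda = \tfrac{1}{2},\ \alpha > 0.
\end{cases}
\qquad (19)
$$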

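3.3 The continuous Bernoulli Kumaraswamy CBK(λ, α, θ) distribution

The Kumaraswamy distribution developed by [12] is a bounded distribution with two shape parameters, having the cdf $G(t) = 1 - (1 - t^{\alpha})^{\theta}$ and pdf $g(t) = \alpha\theta t^{\alpha-1}(1 - t^{\alpha})^{\theta-1}$, $\alpha, \theta > 0$. With this information, the cdf and pdf of the CBK(λ, α, θ) distribution are defined, respectively, as

$$
F(t, \lambda, \alpha, \theta) =
\begin{cases}
\dfrac{\lambda^{(1-t^{\alpha})^{\theta}}\,(1-\lambda)^{1-(1-t^{\alpha})^{\theta}} - \lambda}{1 - 2\lambda}, & \lambda \neq \tfrac{1}{2},\ \alpha, \theta > 0,\ 0 < t < 1, \\[2mm]
1 - (1 - t^{\alpha})^{\theta}, & \lambda = \tfrac{1}{2},\ \alpha, \theta > 0,
\end{cases}
\qquad (20)
$$

and

$$
f(t, \lambda, \alpha, \theta) =
\begin{cases}
C^{K}_{\lambda}\, t^{\alpha-1}(1 - t^{\alpha})^{\theta-1}\,\lambda^{(1-t^{\alpha})^{\theta}}\,(1-\lambda)^{1-(1-t^{\alpha})^{\theta}}, & \lambda \neq \tfrac{1}{2},\ \alpha, \theta > 0,\ 0 < t < 1,\ C^{K}_{\lambda} = \alpha\theta C_{\lambda}, \\[2mm]
\alpha\theta t^{\alpha-1}(1 - t^{\alpha})^{\theta-1}, & \lambda = \tfrac{1}{2},\ \alpha, \theta > 0.
\end{cases}
\qquad (21)
$$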

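3.4 The continuous Bernoulli Burr XII CBBXII(λ, α, θ) distribution

A random variable T is said to follow the two-parameter Burr XII distribution if the density function of T is defined by

$$
g(t, \alpha, \theta) = \alpha\theta t^{\alpha-1}(1 + t^{\alpha})^{-(\theta+1)}, \quad \alpha, \theta > 0,\ t > 0,
\qquad (22)
$$

and the corresponding cdf is given by

$$
G(t, \alpha, \theta) = 1 - (1 + t^{\alpha})^{-\theta}, \quad \alpha, \theta > 0,\ t > 0.
\qquad (23)
$$

By inserting (22) and (23) into (8) and (9), we define the cdf and pdf of the CBBXII(λ, α, θ) distribution, respectively, as follows:

$$
F(t, \lambda, \alpha, \theta) =
\begin{cases}
\dfrac{\lambda^{(1+t^{\alpha})^{-\theta}}\,(1-\lambda)^{1-(1+t^{\alpha})^{-\theta}} - \lambda}{1 - 2\lambda}, & \lambda \neq \tfrac{1}{2},\ \alpha, \theta > 0,\ t > 0, \\[2mm]
1 - (1 + t^{\alpha})^{-\theta}, & \lambda = \tfrac{1}{2},\ \alpha, \theta > 0,
\end{cases}
\qquad (24)
$$

and

$$
f(t, \lambda, \alpha, \theta) =
\begin{cases}
C^{BXII}_{\lambda}\, t^{\alpha-1}(1 + t^{\alpha})^{-(\theta+1)}\,\lambda^{(1+t^{\alpha})^{-\theta}}\,(1-\lambda)^{1-(1+t^{\alpha})^{-\theta}}, & \lambda \neq \tfrac{1}{2},\ \alpha, \theta > 0,\ t > 0,\ C^{BXII}_{\lambda} = \alpha\theta C_{\lambda}, \\[2mm]
\alpha\theta t^{\alpha-1}(1 + t^{\alpha})^{-(\theta+1)}, & \lambda = \tfrac{1}{2},\ \alpha, \theta > 0.
\end{cases}
\qquad (25)
$$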

4. MATHEMATICAL PROPERTIES OF THE CB(λ, ξ)-G FAMILY OF DISTRIBUTIONS

In this section, the mathematical properties of the $CB(\lambda, \xi)$-G family of distributions, such as the rth non-central moments, moment generating function (mgf) and Rényi entropy, are discussed. The method of maximum likelihood estimation is employed to estimate the model parameters, and the asymptotic behaviour of the parameter estimates is investigated through a Monte Carlo simulation study.

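4.1 The rth non-central moments

Let T be a random variable having the density function of the $CB(\lambda, \xi)$-G family of distributions. Then the rth non-central moment of T is defined by

$$
E[T^{r}] = \mu'_{r} = \int_{-\infty}^{\infty} t^{r} f(t, \lambda, \xi)\,dt
= C_{\lambda} \int_{-\infty}^{\infty} t^{r} g(t,\xi)\,\lambda^{1-G(t,\xi)}\,(1-\lambda)^{G(t,\xi)}\,dt, \quad r = 1, 2, 3, 4, \ldots
\qquad (26)
$$

Evaluating (26) yields the following results:

$$
\begin{aligned}
E[T^{r}] &= C_{\lambda} \int_{-\infty}^{\infty} t^{r} g(t,\xi) \exp\bigl( (1 - G(t,\xi)) \ln(\lambda) + G(t,\xi) \ln(1-\lambda) \bigr)\,dt \\
&= \lambda C_{\lambda} \int_{-\infty}^{\infty} t^{r} g(t,\xi) \exp\bigl( G(t,\xi)\,[\ln(1-\lambda) - \ln(\lambda)] \bigr)\,dt \\
&= \lambda C_{\lambda} \int_{-\infty}^{\infty} t^{r} g(t,\xi) \exp\bigl( G(t,\xi)\,[2\tanh^{-1}(1-2\lambda)] \bigr)\,dt.
\end{aligned}
\qquad (27)
$$

Applying the Maclaurin series expansion of the exponential function,

$$
e^{G(t,\xi)\,[2\tanh^{-1}(1-2\lambda)]} = \sum_{n=0}^{\infty} \frac{[2\tanh^{-1}(1-2\lambda)]^{n}}{n!}\,[G(t,\xi)]^{n},
$$

so that (27) now becomes

$$
\begin{aligned}
E[T^{r}] &= \lambda C_{\lambda} \sum_{n=0}^{\infty} \frac{[2\tanh^{-1}(1-2\lambda)]^{n}}{n!} \int_{-\infty}^{\infty} t^{r} g(t,\xi)\,[G(t,\xi)]^{n}\,dt \\
&= \lambda C_{\lambda} \sum_{n=0}^{\infty} \frac{[2\tanh^{-1}(1-2\lambda)]^{n}}{n!\,(n+1)} \int_{-\infty}^{\infty} t^{r} h_{n+1}(t,\xi)\,dt \\
&= \lambda C_{\lambda} \sum_{n=0}^{\infty} \frac{[2\tanh^{-1}(1-2\lambda)]^{n}}{n!\,(n+1)}\,E\bigl[Y^{r}_{n+1}\bigr],
\end{aligned}
\qquad (28)
$$

where $h_{n+1}(t,\xi) = (n+1)\,g(t,\xi)\,[G(t,\xi)]^{n}$ and $E[Y^{r}_{n+1}]$ are, respectively, the density function and rth non-central moment of the exp-G family of distributions with power parameter (n + 1). Thus, we can express the rth non-central moments of the $CB(\lambda, \xi)$-G family of distributions as a linear combination of the rth non-central moments of the exp-G family of distributions with power parameter (n + 1).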
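The series representation (28) can be checked numerically against direct integration of (26). The sketch below does this for a Weibull baseline using a hand-written composite Simpson rule; the truncation at 25 terms, the integration limit and all function names are illustrative assumptions rather than the authors' procedure.

```python
import math

def simpson(f, a, b, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def weibull_cdf(t, alpha, beta):
    return 1.0 - math.exp(-beta * t ** alpha)

def weibull_pdf(t, alpha, beta):
    return alpha * beta * t ** (alpha - 1.0) * math.exp(-beta * t ** alpha)

def direct_moment(r, lam, alpha, beta, upper=6.0):
    """E[T^r] by integrating t^r times the pdf (9) directly, as in (26)."""
    c = 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)
    def integrand(t):
        G = weibull_cdf(t, alpha, beta)
        return t ** r * c * weibull_pdf(t, alpha, beta) * lam ** (1.0 - G) * (1.0 - lam) ** G
    return simpson(integrand, 0.0, upper)

def series_moment(r, lam, alpha, beta, terms=25, upper=6.0):
    """E[T^r] from the truncated series (28) of exp-G moments."""
    a = 2.0 * math.atanh(1.0 - 2.0 * lam)
    c = a / (1.0 - 2.0 * lam)
    total = 0.0
    for n in range(terms):
        # r-th moment of the exp-G density h_{n+1}(t) = (n+1) g(t) G(t)^n
        exp_g_moment = simpson(
            lambda t: t ** r * (n + 1) * weibull_pdf(t, alpha, beta)
            * weibull_cdf(t, alpha, beta) ** n,
            0.0, upper)
        total += a ** n / (math.factorial(n) * (n + 1)) * exp_g_moment
    return lam * c * total
```

For example, `direct_moment(1, 0.4, 3.0, 0.5)` and `series_moment(1, 0.4, 3.0, 0.5)` should agree closely, and the shape parameter is kept above 1 here so the integrand stays bounded at the origin.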

For the purpose of numerical computation, we consider the two-parameter Weibull distribution as the baseline distribution. Hence, we compute the first four raw moments, variance, and measures of skewness (S) and kurtosis (K) of the continuous Bernoulli Weibull CBW(λ, α, θ) distribution in Table 1.

Table 1: The Moments of the CBW(λ, α, θ) distribution for selected values of the Parameters

| λ | θ | α | $\mu'_1$ | $\mu'_2$ | $\mu'_3$ | $\mu'_4$ | Variance | S | K |
|---|---|---|---|---|---|---|---|---|---|
| 0.4 | 0.5 | 3 | 1.1721 | 1.5417 | 2.2068 | 3.3774 | 0.1679 | 0.0905 | 2.7315 |
| 0.4 | 0.5 | 5 | 1.0822 | 1.2283 | 1.4485 | 1.7636 | 0.0571 | -0.3259 | 2.9750 |
| 0.4 | 0.5 | 7 | 1.0524 | 1.1367 | 1.2549 | 1.4119 | 0.0292 | -0.5465 | 3.4994 |
| 0.4 | 3.0 | 3 | 0.6450 | 0.4669 | 0.3678 | 0.3098 | 0.0509 | 0.0889 | 2.7397 |
| 0.4 | 3.0 | 5 | 0.7563 | 0.5999 | 0.4944 | 0.4206 | 0.0279 | -0.3265 | 2.8830 |
| 0.4 | 3.0 | 7 | 0.8147 | 0.6812 | 0.5822 | 0.5072 | 0.0175 | -0.5310 | 3.6310 |
| 0.8 | 0.5 | 3 | 0.9696 | 1.0912 | 1.3758 | 1.9007 | 0.1511 | 0.4223 | 2.9993 |
| 0.8 | 0.5 | 5 | 0.9616 | 0.9824 | 1.0551 | 1.1827 | 0.0577 | -0.0428 | 2.9123 |
| 0.8 | 0.5 | 7 | 0.9659 | 0.9640 | 0.9896 | 1.0415 | 0.0310 | -0.2721 | 3.2045 |
| 0.8 | 3.0 | 3 | 0.5336 | 0.3305 | 0.2293 | 0.1743 | 0.0458 | 0.4181 | 2.9976 |
| 0.8 | 3.0 | 5 | 0.6720 | 0.4798 | 0.3601 | 0.2821 | 0.0282 | -0.0523 | 3.0015 |
| 0.8 | 3.0 | 7 | 0.7478 | 0.5778 | 0.4592 | 0.3741 | 0.0186 | -0.2719 | 3.0701 |

Table 1 shows that the CBW distribution can be left-skewed or right-skewed, and platykurtic or leptokurtic, properties that are essential in modeling heavy-tailed distributions.
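The variance, skewness and kurtosis columns of Table 1 follow from the four raw moments through the usual central-moment identities. The short Python sketch below applies those identities; the function name is illustrative, and the moment-ratio definitions of skewness and kurtosis are an assumption that appears consistent with the tabulated values.

```python
def summary_from_raw_moments(m1: float, m2: float, m3: float, m4: float):
    """Return (variance, skewness, kurtosis) from the first four raw moments."""
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3                         # third central moment
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4    # fourth central moment
    skewness = mu3 / var ** 1.5
    kurtosis = mu4 / var ** 2
    return var, skewness, kurtosis

# First row of Table 1 (lambda = 0.4, theta = 0.5, alpha = 3).
var, skew, kurt = summary_from_raw_moments(1.1721, 1.5417, 2.2068, 3.3774)
```

Because the raw moments in the table are rounded to four decimals, the recomputed values match the tabulated ones only up to rounding.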

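4.2 The moment generating function

The moment generating function (mgf) of a random variable T with density function f(t) is defined by

$$
M_{T}(q) = E\bigl[e^{qT}\bigr] = \int_{-\infty}^{\infty} e^{qt} f(t)\,dt.
\qquad (29)
$$

Using (29) and an approach similar to that of Section 4.1, we define the mgf of the $CB(\lambda, \xi)$-G family of distributions as

$$
M_{T}(q) = \lambda C_{\lambda} \sum_{n=0}^{\infty} \sum_{p=0}^{\infty} \frac{[2\tanh^{-1}(1-2\lambda)]^{n}\,q^{p}}{n!\,(n+1)\,p!}\,E\bigl[Y^{p}_{n+1}\bigr],
\qquad (30)
$$

since $e^{qt} = \sum_{p=0}^{\infty} \dfrac{(qt)^{p}}{p!}$, where $E[Y^{p}_{n+1}]$ is the pth non-central moment of the exp-G family of distributions with power parameter (n + 1).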

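4.3 The Rényi entropy

The entropy of a random variable T measures the degree of randomness associated with T. The Rényi entropy of T is defined by [18] as

$$
\tau_{R}(r) = \frac{1}{1-r} \log \int_{-\infty}^{\infty} f^{r}(t)\,dt, \quad r > 0,\ r \neq 1.
\qquad (31)
$$

By substituting (9) into (31), we define the Rényi entropy of a random variable T following the $CB(\lambda, \xi)$-G family of distributions as follows:

$$
\begin{aligned}
\tau_{R}(r) &= \frac{1}{1-r} \log \left[ (C_{\lambda})^{r} \int_{-\infty}^{\infty} g^{r}(t,\xi)\,\lambda^{r(1-G(t,\xi))}\,(1-\lambda)^{rG(t,\xi)}\,dt \right] \\
&= \frac{1}{1-r} \log \left[ (C_{\lambda})^{r} \lambda^{r} \int_{-\infty}^{\infty} g^{r}(t,\xi) \exp\bigl( rG(t,\xi)\,[\ln(1-\lambda) - \ln(\lambda)] \bigr)\,dt \right] \\
&= \frac{1}{1-r} \log \left[ (C_{\lambda})^{r} \lambda^{r} \int_{-\infty}^{\infty} g^{r}(t,\xi) \exp\bigl( rG(t,\xi)\,[2\tanh^{-1}(1-2\lambda)] \bigr)\,dt \right].
\end{aligned}
\qquad (32)
$$

Again, applying the Maclaurin series expansion of the exponential function,

$$
e^{rG(t,\xi)\,[2\tanh^{-1}(1-2\lambda)]} = \sum_{n=0}^{\infty} \frac{r^{n}\,[2\tanh^{-1}(1-2\lambda)]^{n}}{n!}\,[G(t,\xi)]^{n},
$$

so that (32) now becomes

$$
\tau_{R}(r) = \frac{1}{1-r} \log \left[ (C_{\lambda})^{r} \lambda^{r} \sum_{n=0}^{\infty} \frac{r^{n}\,[2\tanh^{-1}(1-2\lambda)]^{n}}{n!} \int_{-\infty}^{\infty} g^{r}(t,\xi)\,[G(t,\xi)]^{n}\,dt \right].
\qquad (33)
$$

Two major properties of the Rényi entropy of a random variable T were identified by [9]. These are:

(i) The Rényi entropy of T can assume a negative value;

(ii) For any $r_{1} < r_{2}$, $\tau_{R}(r_{2}) \leq \tau_{R}(r_{1})$, and equality holds if and only if T is a uniform random variable.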
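Both properties can be explored numerically by evaluating (31) with the pdf (9) for a Weibull baseline. The following Python sketch is only an illustration: it assumes the natural logarithm in (31), uses a simple composite Simpson rule on a truncated range, and the function names and default integration settings are hypothetical.

```python
import math

def simpson(f, a, b, m=20000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def cbw_pdf(t, lam, alpha, beta):
    """CB-Weibull pdf: equation (9) with baseline cdf 1 - exp(-beta * t**alpha)."""
    G = 1.0 - math.exp(-beta * t ** alpha)
    g = alpha * beta * t ** (alpha - 1.0) * math.exp(-beta * t ** alpha)
    c = 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)
    return c * g * lam ** (1.0 - G) * (1.0 - lam) ** G

def renyi_entropy(r, lam, alpha, beta, upper=10.0):
    """Renyi entropy (31) of order r (r > 0, r != 1), natural logarithm assumed."""
    integral = simpson(lambda t: cbw_pdf(t, lam, alpha, beta) ** r, 0.0, upper)
    return math.log(integral) / (1.0 - r)

# Entropy for increasing orders r at lambda = 0.8, alpha = 1.5, beta = 3.0.
values = [renyi_entropy(r, 0.8, 1.5, 3.0) for r in (0.5, 0.9, 2.0, 4.0)]
```

The list `values` should be decreasing in r, in line with property (ii); the upper limit of integration would need to be enlarged for very small r, where the integrand decays slowly.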

Again, we compute the Rényi entropy of the CBW(λ, α, θ) distribution for selected values of the parameters as shown in Table 2.

Table 2: Numerical computation of the Rényi entropy of the CBW(λ, α, θ) distribution (λ = 0.8)

| i | $r_i$ | α = 0.9, θ = 0.5 | α = 0.9, θ = 3.0 | α = 1.5, θ = 3.0 | α = 1.5, θ = 0.5 |
|---|---|---|---|---|---|
| 1 | 0.1 | 3.5600 | 1.5691 | 0.8868 | 2.0813 |
| 2 | 0.3 | 2.4724 | 0.4815 | 0.3213 | 1.5158 |
| 3 | 0.5 | 1.9849 | -0.0060 | 0.0923 | 1.2869 |
| 4 | 0.7 | 1.6766 | -0.3142 | -0.0433 | 1.1513 |
| 5 | 0.9 | 1.4573 | -0.5336 | -0.1356 | 1.0589 |
| 6 | 2 | 0.8522 | -1.1387 | -0.3746 | 0.8199 |
| 7 | 4 | 0.4451 | -1.5458 | -0.5180 | 0.6765 |
| 8 | 6 | 0.2343 | -1.7565 | -0.5793 | 0.6152 |
| 9 | 8 | 0.0647 | -1.9262 | -0.6147 | 0.5799 |

The results in Table 2 are consistent with the aforementioned properties of the Rényi entropy as suggested by [9].

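4.4 Parameter estimation

4.4.1 Maximum likelihood estimation

The maximum likelihood estimation method is employed to estimate the parameters of the $CB(\lambda, \xi)$-G family of distributions. Suppose $(t_{1}, t_{2}, \ldots, t_{n})$ is a random sample of size n from the $CB(\lambda, \xi)$-G family of distributions; then the likelihood function is obtained as

$$
L(t, \varphi) = \prod_{i=1}^{n} \left[ C_{\lambda}\, g(t_{i},\xi)\,\lambda^{1-G(t_{i},\xi)}\,(1-\lambda)^{G(t_{i},\xi)} \right], \quad \varphi = (\lambda, \xi)^{T}.
\qquad (34)
$$

By taking the natural logarithm of both sides of (34), the log-likelihood function is obtained as

$$
\ell(t, \varphi) = n \ln C_{\lambda} + \sum_{i=1}^{n} \ln[g(t_{i},\xi)] + \ln(\lambda) \sum_{i=1}^{n} [1 - G(t_{i},\xi)] + \ln(1-\lambda) \sum_{i=1}^{n} G(t_{i},\xi).
\qquad (35)
$$

The maximum likelihood estimate, say $\hat{\varphi} = (\hat{\lambda}, \hat{\xi})$, is obtained by differentiating the log-likelihood function in (35) with respect to the parameters and equating the resulting expressions to zero:

$$
\frac{\partial \ell(t, \varphi)}{\partial \lambda} = n\,\frac{\partial \ln C_{\lambda}}{\partial \lambda} + \frac{1}{\lambda} \sum_{i=1}^{n} [1 - G(t_{i},\xi)] - \frac{1}{1-\lambda} \sum_{i=1}^{n} G(t_{i},\xi) = 0,
$$

$$
\frac{\partial \ell(t, \varphi)}{\partial \xi_{j}} = \sum_{i=1}^{n} \frac{g'(t_{i},\xi)}{g(t_{i},\xi)} + [\ln(1-\lambda) - \ln(\lambda)] \sum_{i=1}^{n} G'(t_{i},\xi) = 0,
$$

where $g'(t_{i},\xi) = \dfrac{\partial g(t_{i},\xi)}{\partial \xi_{j}}$, $G'(t_{i},\xi) = \dfrac{\partial G(t_{i},\xi)}{\partial \xi_{j}}$, and $\xi_{j}$ is the jth element of the parameter vector ξ.

If the contribution of $\ln C_{\lambda}$ to the first equation is neglected, further simplification yields

$$
\hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} \bigl(1 - G(t_{i},\xi)\bigr).
$$

Under that simplification the parameter λ can be solved analytically in terms of ξ. In general, because $C_{\lambda}$ depends on λ, both λ and the parameter(s) $\xi_{j}$ may require numerical solution using software such as R.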
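For readers who want to experiment with the estimation step, the following Python sketch codes the log-likelihood (35) for a Weibull baseline together with its derivative in λ. It is a minimal illustration under stated assumptions: the derivative of $\ln C_{\lambda}$ is worked out here from (3) rather than quoted from the paper, the function names are hypothetical, and the ten data values are simply the first ten tensile-strength observations of Data set 2, used only as a small example.

```python
import math

def weibull_cdf(t, alpha, beta):
    return 1.0 - math.exp(-beta * t ** alpha)

def weibull_pdf(t, alpha, beta):
    return alpha * beta * t ** (alpha - 1.0) * math.exp(-beta * t ** alpha)

def cbw_loglik(data, lam, alpha, beta):
    """Log-likelihood (35) with a Weibull baseline, for lam != 1/2."""
    n = len(data)
    c = 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)
    G = [weibull_cdf(t, alpha, beta) for t in data]
    return (n * math.log(c)
            + sum(math.log(weibull_pdf(t, alpha, beta)) for t in data)
            + math.log(lam) * sum(1.0 - Gi for Gi in G)
            + math.log(1.0 - lam) * sum(G))

def score_lambda(data, lam, alpha, beta):
    """Derivative of (35) in lam, including the term from ln C_lambda."""
    n = len(data)
    a = math.log(1.0 - lam) - math.log(lam)          # equals 2*atanh(1 - 2*lam)
    dlog_c = -1.0 / (lam * (1.0 - lam) * a) + 2.0 / (1.0 - 2.0 * lam)
    G = [weibull_cdf(t, alpha, beta) for t in data]
    return n * dlog_c + sum(1.0 - Gi for Gi in G) / lam - sum(G) / (1.0 - lam)

# First ten observations of Data set 2, used here only as a toy sample.
sample = [1.312, 1.314, 1.479, 1.552, 1.700, 1.803, 1.861, 1.865, 1.944, 1.958]
```

A root of `score_lambda` in λ (for fixed α and β) can then be located with any one-dimensional root finder, and a full fit would maximize `cbw_loglik` over all three parameters with a numerical optimizer.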

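4.4.2 Simulation study

In this subsection, we investigate the asymptotic behaviour of the parameter estimates of the CBW(λ, α, θ) distribution. Random samples of size n = (15, 25, 50, 75, 100) are generated from the CBW(λ, α, θ) distribution at arbitrarily chosen fixed values of the parameters. The Monte Carlo simulation is repeated N = 1000 times and the following quantities are computed:

i) $\text{bias} = \dfrac{1}{N} \sum_{i=1}^{N} (\hat{\varphi}_{i} - \varphi)$;

ii) root mean square error $\text{(RMSE)} = \sqrt{\dfrac{1}{N} \sum_{i=1}^{N} (\hat{\varphi}_{i} - \varphi)^{2}}$;

iii) coverage probability (CP) of the 95% confidence interval of the estimates $\hat{\varphi}$, given by

$$
CP = \frac{1}{N} \sum_{i=1}^{N} I\left( \hat{\varphi}_{i} - 1.96\sqrt{\operatorname{var}(\hat{\varphi}_{i})} < \varphi < \hat{\varphi}_{i} + 1.96\sqrt{\operatorname{var}(\hat{\varphi}_{i})} \right),
$$

where $I(\cdot)$ is an indicator function and $\sqrt{\operatorname{var}(\hat{\varphi}_{i})}$ is the standard error of the estimate $\hat{\varphi}_{i}$.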
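The three summary quantities are straightforward to compute once the N replicate estimates and their standard errors are available. The Python sketch below shows one way to do so; the function names are illustrative, and the multiplier 1.96 is the usual normal quantile for a 95% interval, stated here as an assumption about the interval construction.

```python
import math

def bias(estimates, true_value):
    """Average deviation of the replicate estimates from the true value."""
    return sum(e - true_value for e in estimates) / len(estimates)

def rmse(estimates, true_value):
    """Root mean square error of the replicate estimates."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))

def coverage(estimates, std_errors, true_value, z=1.96):
    """Share of replicates whose interval estimate +/- z * se contains the true value."""
    hits = sum(1 for e, se in zip(estimates, std_errors)
               if e - z * se < true_value < e + z * se)
    return hits / len(estimates)

# Toy replicate results (hypothetical numbers, for illustration only).
est = [1.0, 2.0, 3.0]
ses = [0.1, 0.1, 1.0]
summary = (bias(est, 2.0), rmse(est, 2.0), coverage(est, ses, 2.0))
```

In a full study these functions would be applied separately to the replicate estimates of each parameter at each sample size.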

Table 3: Simulation results for bias, RMSE and CP of parameter estimates of the CBW(λ, α, β) distribution

| Parameters | n | Bias (α) | Bias (β) | Bias (λ) | RMSE (α) | RMSE (β) | RMSE (λ) | CP (α) | CP (β) | CP (λ) |
|---|---|---|---|---|---|---|---|---|---|---|
| α = 0.3, β = 0.6, λ = 0.8 | 15 | 0.0042 | 0.3614 | -0.2437 | 0.0752 | 0.5992 | 0.3598 | 0.986 | 0.988 | 0.908 |
| | 25 | -0.0215 | 0.3395 | -0.2522 | 0.0623 | 0.5566 | 0.3558 | 0.958 | 0.972 | 0.888 |
| | 50 | -0.0578 | 0.3019 | -0.2781 | 0.0527 | 0.4956 | 0.3441 | 0.948 | 0.970 | 0.864 |
| | 75 | -0.0704 | 0.2741 | -0.2996 | 0.0477 | 0.4877 | 0.3253 | 0.938 | 0.940 | 0.878 |
| | 100 | -0.0961 | 0.2210 | -0.3323 | 0.0421 | 0.4231 | 0.2926 | 0.958 | 0.964 | 0.910 |
| α = 0.5, β = 0.3, λ = 0.6 | 15 | 0.0324 | 0.1972 | -0.1020 | 0.1422 | 0.2625 | 0.2880 | 0.978 | 0.958 | 0.918 |
| | 25 | 0.0093 | 0.1887 | -0.1074 | 0.1057 | 0.2472 | 0.2808 | 0.988 | 0.986 | 0.890 |
| | 50 | -0.0154 | 0.1628 | -0.1158 | 0.0832 | 0.2404 | 0.2749 | 0.964 | 0.978 | 0.876 |
| | 75 | -0.0184 | 0.1017 | -0.1356 | 0.0828 | 0.2361 | 0.2741 | 0.942 | 0.966 | 0.872 |
| | 100 | -0.0209 | 0.0772 | -0.1648 | 0.0724 | 0.2227 | 0.2578 | 0.944 | 0.952 | 0.878 |
| α = 0.9, β = 3.0, λ = 0.4 | 15 | 0.1085 | 0.3271 | 0.0496 | 0.3043 | 1.2171 | 0.2746 | 0.956 | 0.998 | 0.914 |
| | 25 | 0.0599 | 0.1131 | 0.0401 | 0.2177 | 0.9368 | 0.2645 | 0.964 | 0.990 | 0.904 |
| | 50 | 0.0174 | 0.1082 | 0.0192 | 0.1920 | 0.8197 | 0.2632 | 0.926 | 0.956 | 0.852 |
| | 75 | 0.0026 | 0.0824 | 0.0186 | 0.1619 | 0.7159 | 0.2586 | 0.914 | 0.942 | 0.824 |
| | 100 | -0.0043 | 0.0531 | 0.0079 | 0.1615 | 0.6676 | 0.2499 | 0.904 | 0.940 | 0.814 |
| α = 0.9, β = 0.6, λ = 0.4 | 15 | 0.0932 | 0.0618 | 0.0485 | 0.2758 | 0.3681 | 0.2862 | 0.978 | 0.940 | 0.910 |
| | 25 | 0.0439 | 0.0527 | 0.0468 | 0.2190 | 0.3658 | 0.2812 | 0.966 | 0.938 | 0.858 |
| | 50 | 0.0266 | 0.0523 | 0.0293 | 0.1871 | 0.3382 | 0.2702 | 0.938 | 0.928 | 0.818 |
| | 75 | 0.0082 | 0.0470 | 0.0256 | 0.1551 | 0.3023 | 0.2532 | 0.950 | 0.938 | 0.844 |
| | 100 | 0.0073 | 0.0452 | 0.0180 | 0.1524 | 0.3007 | 0.2521 | 0.922 | 0.912 | 0.828 |

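From Table 3, we observe that the root mean square errors of all parameter estimates decrease as the sample size n increases. The bias shrinks with n for β in every setting and for all three parameters in the last two settings, whereas the bias of λ in the first two settings, and of α in the first setting, grows in magnitude. At n = 100 the coverage probabilities for α and β lie between 0.90 and 0.96, close to the nominal 95% level, while those for λ lie between 0.81 and 0.91.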

5. REAL-LIFE DATA FITTINGS

The applicability of the proposed family of distributions is investigated in this section. To achieve this, two data sets, the waiting time (in minutes) of 100 bank customers and the tensile strength (in GPa) of 69 carbon fibers, are employed for data fitting. Some well-known non-nested models, namely the Kumaraswamy Weibull (KW(λ, α, θ)), Kumaraswamy inverse Weibull (KIW(λ, α, θ)), Topp-Leone inverse Weibull (TLIW(λ, α, θ)), transmuted Weibull (TW(λ, α, θ)) and the two-parameter Weibull distributions, are employed alongside the proposed continuous Bernoulli Weibull (CBW(λ, α, θ)) distribution to fit the two data sets. The data sets for the analysis are given below.

Data set 1: The first data set represents the waiting time (in minutes) of 100 bank customers reported in [16]. The data set was first used by [8] to illustrate the flexibility of the Lindley distribution over the exponential distribution in data fittings. The data are given as follows: 0.8, 0.8, 1.3, 1.5, 1.8, 1.9, 1.9, 2.1, 2.6, 2.7, 2.9, 3.1, 3.2, 3.3, 3.5, 3.6, 4.0, 4.1, 4.2, 4.2, 4.3, 4.3, 4.4, 4.4, 4.6, 4.7, 4.7, 4.8, 4.9, 4.9, 5.0, 5.3, 5.5, 5.7, 5.7, 6.1, 6.2, 6.2, 6.2, 6.3, 6.7, 6.9, 7.1, 7.1, 7.1, 7.1, 7.4, 7.6, 7.7, 8.0, 8.2, 8.6, 8.6, 8.6, 8.8, 8.8, 8.9, 8.9, 9.5, 9.6, 9.7, 9.8, 10.7, 10.9, 11.0, 11.0, 11.1, 11.2, 11.2, 11.5, 11.9, 12.4, 12.5, 12.9, 13.0, 13.1, 13.3, 13.6, 13.7, 13.9, 14.1, 15.4, 15.4, 17.3, 17.3, 18.1, 18.2, 18.4, 18.9, 19.0, 19.9, 20.6, 21.3, 21.4, 21.9, 23.0, 27.0, 31.6, 33.1, 38.5.

Data set 2: The second data set comprises the tensile strength, measured in GPa, of 69 carbon fibers tested under tension at a gauge length of 20 mm, reported in [21]. This data set was also employed by [7] to demonstrate the applicability of the power Lindley distribution. The data are represented as follows: 1.312, 1.314, 1.479, 1.552, 1.700, 1.803, 1.861, 1.865, 1.944, 1.958, 1.966, 1.997, 2.006, 2.021, 2.027, 2.055, 2.063, 2.098, 2.14, 2.179, 2.224, 2.240, 2.253, 2.270, 2.272, 2.274, 2.301, 2.301, 2.359, 2.382, 2.382, 2.426, 2.434, 2.435, 2.478, 2.490, 2.511, 2.514, 2.535, 2.554, 2.566, 2.57, 2.586, 2.629, 2.633, 2.642, 2.648, 2.684, 2.697, 2.726, 2.770, 2.773, 2.800, 2.809, 2.818, 2.821, 2.848, 2.88, 2.954, 3.012, 3.067, 3.084, 3.090, 3.096, 3.128, 3.233, 3.433, 3.585, 3.585.

Some popularly used model selection criteria, such as the maximized log-likelihood (LL) and the Akaike Information Criterion (AIC), together with goodness-of-fit test statistics such as the Kolmogorov-Smirnov (K-S), Cramér-von Mises (W*) and Anderson-Darling (A*) statistics with their corresponding p-values, are considered to assess the appropriate model for analyzing the two data sets. Tables 4 and 5 present the summary statistics for the fit of the distributions for the two data sets, respectively.
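Two of these criteria are simple enough to sketch directly. The Python below computes the AIC from a maximized log-likelihood and the number of estimated parameters, and the Kolmogorov-Smirnov distance between a sample and a fitted cdf. The function names are illustrative, and the p-values reported in the tables would require additional distributional results that are not reproduced here.

```python
def aic(loglik: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 * (maximized log-likelihood)."""
    return 2.0 * n_params - 2.0 * loglik

def ks_statistic(data, cdf) -> float:
    """Kolmogorov-Smirnov distance between the empirical cdf and a fitted cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # Compare the fitted cdf with the empirical cdf just after and just before x.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d

# AIC of the CBW fit to the waiting time data: LL = -317.3098 with 3 parameters.
cbw_aic_waiting = aic(-317.3098, 3)
```

Smaller AIC and K-S values indicate a better fit, which is the sense in which the models are ranked in the tables that follow.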

Table 4: Summary statistics for the waiting time data set

| Models | Estimates | LL | AIC | K-S (p-value) | W* (p-value) | A* (p-value) |
|---|---|---|---|---|---|---|
| CBW | α = 1.7229, θ = 0.0071, λ = 0.9356 | -317.3098 | 640.6196 | 0.0423 (0.994) | 0.0248 (0.9904) | 0.1682 (0.9968) |
| KW | α = 1.3727, θ = 0.2015, λ = 1.3379 | -317.6755 | 641.3510 | 0.0508 (0.9587) | 0.0414 (0.9263) | 0.2578 (0.9660) |
| KIW | α = 2.6384, θ = 1.1424, λ = -1.5224 | -332.9531 | 671.9062 | 0.1099 (0.1785) | 0.4051 (0.0698) | 2.6255 (0.0427) |
| TLIW | α = 0.5235, θ = 12.5524, λ = 0.9569 | -327.1056 | 641.2112 | 0.0891 (0.4044) | 0.2449 (0.1951) | 1.6727 (0.1402) |
| TW | α = 1.5692, θ = 0.0157, λ = 0.6181 | -317.8896 | 641.7791 | 0.0481 (0.9746) | 0.0384 (0.9420) | 0.2599 (0.9648) |
| Weibull | α = 1.4584, θ = 0.0305 | -318.7307 | 641.4614 | 0.0577 (0.8929) | 0.0609 (0.8095) | 0.4051 (0.8433) |

Table 5: Summary statistics for tensile strength data set

| Models | Estimates | LL | AIC | K-S (p-value) | W* (p-value) | A* (p-value) |
|---|---|---|---|---|---|---|
| CBW | α = 2.7806, β = 0.1778, λ = 0.0026 | -49.0740 | 104.1481 | 0.0400 (0.9999) | 0.0142 (0.9998) | 0.1210 (0.9998) |
| KW | α = 3.9464, β = 0.1690, λ = -0.1312 | -49.9210 | 105.8421 | 0.0675 (0.9112) | 0.0581 (0.8276) | 0.3901 (0.8580) |
| KIW | α = 4.2588, β = 2.8719, λ = -3.7556 | -56.2704 | 118.5408 | 0.1061 (0.4193) | 0.1995 (0.2688) | 1.3439 (0.2185) |
| TLIW | α = 0.5468, β = 34.889, λ = 3.4115 | -58.0304 | 122.0608 | 0.1176 (0.2960) | 0.2617 (0.1741) | 1.7344 (0.1294) |
| TW | α = 5.9303, β = 0.0021, λ = 0.6363 | -49.1325 | 104.2650 | 0.0433 (0.9995) | 0.0191 (0.9979) | 0.1714 (0.9963) |
| Weibull | α = 5.5045, β = 0.0046 | -49.5961 | 104.1923 | 0.0560 (0.9819) | 0.0343 (0.9611) | 0.2739 (0.9563) |

From Tables 4 and 5, based on the usual criteria for comparing models, the continuous Bernoulli Weibull CBW(λ, α, θ) distribution has the largest maximized log-likelihood, the smallest AIC, K-S, W* and A* values, and the highest corresponding p-values. It therefore outperforms the competitor distributions in analyzing the two data sets and is the most appropriate model for fitting them.

6. CONCLUSION

In this paper, we have developed a new class of probability distributions based on the continuous Bernoulli distribution. The proposed family is called the continuous Bernoulli-generated family of distributions. Some basic properties of the proposed family, such as the density and cumulative distribution functions, survival and hazard rate functions, quantile function, moments, moment generating function, and Rényi entropy, were derived. The method of maximum likelihood was employed to estimate the unknown parameters of the family, and the asymptotic behaviour of the parameter estimates was investigated via a Monte Carlo simulation study. Two real-life data sets, the waiting time (in minutes) of 100 bank customers and the tensile strength (in GPa) of 69 carbon fibers, were employed to illustrate the applicability of the proposed family. Existing non-nested models, namely the Kumaraswamy Weibull, Kumaraswamy inverse Weibull, Topp-Leone inverse Weibull, transmuted Weibull and the two-parameter Weibull distributions, were employed alongside the proposed continuous Bernoulli Weibull distribution to fit the two data sets. The results, compared using model selection criteria and goodness-of-fit test statistics, favor the continuous Bernoulli Weibull distribution over the competing distributions.

References

[1] Akata, I. U., Opone, F. C. and Osagiede, F.E.U. (2023). The Kumaraswamy Unit-Gompertz Distribution and its Application to Lifetime Dataset. Earthline Journal of Mathematical Sciences, 11(1): 1-22.

[2] Altun, E. (2018). The log-xgamma distribution with inference and application. Journal de la Société de Statistique de Paris, 159(3): 40-55.

[3] Chesneau, C. and Opone, F. C. (2022). The power continuous Bernoulli distribution: Theory and Applications. Reliability: Theory & Application, 17(4): 232-248.

[4] Chesneau, C., Opone, F., and Ubaka, N. (2022). Theory and applications of the transmuted continuous Bernoulli distribution. Earthline Journal of Mathematical Sciences. 10(2): 385-407.

[5] Elgarhy, M., Nasir, M. A., Jamal, F. and Ozel, G. (2018). The type II Topp-Leone generated family of distributions: Properties and applications. Journal of Statistics and Management Systems, 21: 1529-1551.

[6] Eugene, N., Lee, C. and Famoye, F. (2002). The beta-normal distribution and its applications. Communications in Statistics-Theory and Methods. 31: 497-512.

[7] Ghitany, M., Al-Mutairi, D., Balakrishnan, N. and Al-Enezi, I. (2013). Power Lindley distribution and associated inference. Computational Statistics and Data Analysis. 64: 20-33.

[8] Ghitany, M., Atieh, B. and Nadarajah, S. (2008). Lindley distribution and its applications. Mathematics and Computers in Simulation. 78: 493-506.

[9] Golshani, L. and Pasha, E. (2010). Renyi entropy rate for Gaussian processes. Information Sciences, 180: 1486-1491.

[10] Gomez-Déniz, E., Sordo, M. A. and Calderin-Ojeda, E. (2014). The Log-Lindley distribution as an alternative to the beta regression model with applications in insurance. Insurance: Mathematics and Economics, 54(1): 49-57.

[11] Korkmaz, M. and Chesneau, C. (2021). On the unit Burr-XII distribution with the quantile regression modeling and applications. Computational and Applied Mathematics, 40(1): 1-26.

[12] Kumaraswamy, P. (1980). A Generalized Probability Density Function for Doubly Bounded Random Process. Journal of Hydrology. 46: 79-88.

[13] Loaiza-Ganem, G. and Cunningham, J. P. (2019). The continuous Bernoulli: fixing a pervasive error in variational autoencoders. In Advances in Neural Information Processing Systems, 13266-13276.

[14] Menezes, A. F. B., Mazucheli, J. and Dey, S. (2018). The unit-logistic distribution: different methods of estimation. Pesquisa Operacional, 38(3): 555-578.

[15] Opone, F. C., Akata, I. U. and Altun, E. (2022). The Marshall-Olkin Extended Unit-Gompertz Distribution: Its Properties, Regression Model and Applications. Statistica, 82(2): 97-118.

[16] Opone, F. C. and Ekhosuehi, N. (2018). Methods of Estimating the Parameters of the Quasi Lindley Distribution. Statistica, 78(2): 183-193.

[17] Opone, F. C. and Iwerumor, B. N. (2021). A New Marshall-Olkin Extended Family of Distributions with Bounded Support. Gazi University Journal of Science, 34(3): 899-914.

[18] Opone, F. C. and Osemwenkhae, J. E. (2022). The transmuted Marshall-Olkin extended Topp-Leone Distribution. Earthline Journal of Mathematical Sciences, 9(2): 179-199.

[18] Rényi, A. (1961). On measure of entropy and information. Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability 1, University of California Press, Berkeley, Vol. 4, 1 January 1961, 547-561

[19] Ristic, M. M., & Balakrishnan, N. (2012). The gamma-exponentiated exponential distribution. Journal of Statistical Computation and Simulation, 82 (6): 1191-1206.

[20] Sangsanit Y. and Bodhisuwan, W. (2016). The Topp-Leone generator of distributions: properties and inferences. Songklanakarin Journal of Science and Technology, 38: 537-548.

[21] Tuoyo, D. O., Opone, F. C. and Ekhosuehi, N. (2021). The Topp-Leone Weibull distribution: its properties and application. Earthline Journal of Mathematical Sciences, 7(2): 381-401.

[22] Zografos, K. and Balakrishnan, N. (2009). On Families of Beta-G and Generalized Gamma-generated Distribution and Associate Inference. Statistical Methodology, 6: 344-362.
