CC BY
A New Generalization of Exponential Distribution for Modelling reliability Data

V. Jilesh, Aifoona Ahammed P.M

Department of Statistics, Government Arts and Science College, Kozhikode. Kerala, India.

jileshstat@gmail.com aifoonaahammed@gmail.com

Abstract

In this paper, a new generalization of the exponential distribution is proposed. Different properties, important reliability measures and special cases of this distribution are investigated. Unknown parameters are estimated using the maximum likelihood method of estimation. A simulation study is carried out to assess the accuracy of the maximum likelihood estimates. Two real data sets are successfully modelled with the proposed distribution.

Keywords: Exponential distribution, Reliability, Maximum likelihood estimation.

1. Introduction

Researchers have recently shown growing interest in developing new probability distributions and generalizations of existing families of distributions. Such generalizations allow more realistic modelling of complex datasets. The new distributions are often more flexible than the baseline distribution and are therefore better suited to reliability studies and related fields. For instance, see Eugene et al. [4], Bourguignon et al. [2], Cordeiro and Castro [3], Marshall and Olkin [9], Zografos and Balakrishnan [15], Silva et al. [14], Jayakumar and Mathew [7], Nadarajah and Kotz [10], Nadarajah and Kotz [12], Nadarajah and Gupta [11]. One important family, which is the basis of many other well-studied probability distributions, is the exponential distribution. Its main limitation in reliability modelling is its constant hazard rate, and several generalizations of the exponential distribution have been proposed to overcome this problem. Important examples are the gamma, Weibull and Rayleigh distributions and the generalized exponential distribution of Gupta and Kundu [5]. In some datasets, however, the frequency of observations drops suddenly after a specific data point. The available generalizations are not appropriate for such datasets, which call for a probability density function with a break (cutting) point. Here we introduce a new distribution to model such data sets. The family of distributions under study is derived using the exponential distribution and the two-sided power distribution of René Van Dorp and Kotz [13]. The probability density function of the two-sided power distribution is given by

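$$
g(x) =
\begin{cases}
a\left(\dfrac{x}{\theta}\right)^{a-1}, & \text{if } 0 < x < \theta \\[2ex]
a\left(\dfrac{1-x}{1-\theta}\right)^{a-1}, & \text{if } \theta < x < 1,
\end{cases}
\qquad a > 0,\ 0 < \theta < 1. \tag{1}
$$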

This paper is organized as follows. In Section 2, we propose the new distribution and discuss its basic properties. Reliability properties are studied in Section 3. The estimation of the unknown parameters of the proposed distribution by the maximum likelihood method is discussed in Section 4. A simulation study to check the estimation procedure is conducted in Section 5. In Section 6, we successfully model fatigue failure data and an aircraft failure time dataset using the proposed distribution.

2. Definition and Properties

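A random variable X is said to follow a generalized exponential distribution, denoted by G(a, θ), if its probability density function (pdf) is of the form

$$
f(x) =
\begin{cases}
a e^{-x}\left(\dfrac{1-e^{-x}}{\theta}\right)^{a-1}, & \text{if } 0 < x < -\ln(1-\theta) \\[2ex]
a e^{-x}\left(\dfrac{e^{-x}}{1-\theta}\right)^{a-1}, & \text{if } -\ln(1-\theta) < x < \infty,
\end{cases}
\tag{2}
$$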

where 0 < θ < 1 and a > 0; a need not be an integer. When a = 1, the above probability density function is that of an exponential random variable with unit mean. The above probability density function can be derived by mixing the two-sided power distribution and the exponential distribution. The shapes of the probability density curves given by (2) for different values of the parameters are shown in Figure (1). Note that for a > 1 the density curve first increases, reaches a maximum at x = log(a) provided log(a) < -ln(1 - θ) (otherwise the maximum is at the cutting point), and then decreases; there is a cutting point at -ln(1 - θ). For 0 < a < 1, the curve is always decreasing, with a cutting point at -ln(1 - θ). The value of the parameter θ does not affect the monotone behaviour of the curve. For a > 1, the mode of the distribution is therefore log(a) whenever log(a) < -ln(1 - θ). To derive the cumulative distribution function (CDF) of the proposed distribution, we have to consider two cases.

Figure 1: Density plot of G(a, θ) for different values of θ

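Case 1: 0 < t < -ln(1 - θ)

Then

$$
F(t) = \int_0^t a e^{-x}\left(\frac{1-e^{-x}}{\theta}\right)^{a-1} dx
= \frac{a}{\theta^{a-1}} \int_0^t e^{-x}\left(1-e^{-x}\right)^{a-1} dx .
$$

Substituting u for 1 - e^{-x}, we get

$$
F(t) = \frac{a}{\theta^{a-1}} \int_0^{1-e^{-t}} u^{a-1}\, du
= \theta\left(\frac{1-e^{-t}}{\theta}\right)^{a}.
$$

Case 2: -ln(1 - θ) < t < ∞

$$
F(t) = \int_0^{-\ln(1-\theta)} a e^{-x}\left(\frac{1-e^{-x}}{\theta}\right)^{a-1} dx
+ \int_{-\ln(1-\theta)}^{t} a e^{-x}\left(\frac{e^{-x}}{1-\theta}\right)^{a-1} dx
= 1 - (1-\theta)\frac{e^{-at}}{(1-\theta)^{a}}.
$$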

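Thus, the cumulative distribution function is given by

$$
F(x) =
\begin{cases}
\theta\left(\dfrac{1-e^{-x}}{\theta}\right)^{a}, & \text{if } 0 < x < -\ln(1-\theta) \\[2ex]
1 - (1-\theta)\left(\dfrac{e^{-x}}{1-\theta}\right)^{a}, & \text{if } -\ln(1-\theta) < x < \infty.
\end{cases}
\tag{3}
$$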
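As a quick sanity check on the two-branch forms in (2) and (3), the following minimal Python sketch codes both functions and compares the CDF with a numerical integral of the density. The function names `pdf`, `cdf` and `midpoint_integral`, the midpoint rule, and the parameter values a = 2, θ = 0.5 are illustrative assumptions and do not come from the paper.

```python
import math

def pdf(x, a, theta):
    """Density (2) of G(a, theta); returns 0 for x <= 0."""
    if x <= 0.0:
        return 0.0
    cut = -math.log(1.0 - theta)  # cutting point -ln(1 - theta)
    if x < cut:
        return a * math.exp(-x) * ((1.0 - math.exp(-x)) / theta) ** (a - 1.0)
    return a * math.exp(-x) * (math.exp(-x) / (1.0 - theta)) ** (a - 1.0)

def cdf(x, a, theta):
    """Distribution function (3) of G(a, theta); returns 0 for x <= 0."""
    if x <= 0.0:
        return 0.0
    cut = -math.log(1.0 - theta)
    if x < cut:
        return theta * ((1.0 - math.exp(-x)) / theta) ** a
    return 1.0 - (1.0 - theta) * (math.exp(-x) / (1.0 - theta)) ** a

def midpoint_integral(f, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi] (illustrative helper)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Compare the closed-form CDF with the integrated density on both branches.
a, theta = 2.0, 0.5  # assumed example values
for t in (0.3, 1.5):
    numeric = midpoint_integral(lambda x: pdf(x, a, theta), 0.0, t)
    print(t, cdf(t, a, theta), numeric)
```

The two printed columns should agree closely on either side of the cutting point, which is a cheap way to catch a sign or exponent slip when transcribing the formulas.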

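Similarly, the quantile function can be derived by inverting the distribution function. Thus, we obtain

$$
Q(u) = F^{-1}(u) =
\begin{cases}
-\ln\left[1 - \theta\left(\dfrac{u}{\theta}\right)^{1/a}\right], & 0 < u \le \theta \\[2ex]
-\dfrac{1}{a}\left[\ln(1-u) + (a-1)\ln(1-\theta)\right], & \theta < u < 1.
\end{cases}
\tag{4}
$$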
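The quantile function (4) lends itself to inverse-transform sampling. The sketch below is one possible implementation under stated assumptions: the names `quantile` and `sample`, the seed, and the small clamp `eps` that keeps the uniform draw strictly inside (0, 1) are illustrative choices, not part of the paper.

```python
import math
import random

def quantile(u, a, theta):
    """Quantile function (4) of G(a, theta) for 0 < u < 1."""
    if u <= theta:
        return -math.log(1.0 - theta * (u / theta) ** (1.0 / a))
    return -(math.log(1.0 - u) + (a - 1.0) * math.log(1.0 - theta)) / a

def sample(n, a, theta, rng):
    """Draw n values by applying the quantile function to uniform draws."""
    eps = 1e-12  # assumption: clamp u away from 0 and 1 to avoid log(0)
    return [quantile(min(max(rng.random(), eps), 1.0 - eps), a, theta)
            for _ in range(n)]

rng = random.Random(1)  # arbitrary seed for reproducibility
xs = sample(10000, a=2.0, theta=0.5, rng=rng)

# By (3), F(-ln(1 - theta)) = theta, so roughly a fraction theta of the
# draws should fall at or below the cutting point.
cut = -math.log(1.0 - 0.5)
share_below = sum(x <= cut for x in xs) / len(xs)
print(round(share_below, 2))
```

Checking the share of draws below the cutting point against θ is a simple way to confirm that the two branches of the quantile function are joined correctly.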

A random sample from the G(a, θ) distribution can be simulated using the above equation, where u is a realization of a uniform random variable U ~ U(0, 1). The moment generating function of a random variable X with the G(a, θ) distribution is given by

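$$
M_X(t) = \int_0^{-\ln(1-\theta)} a e^{tx} e^{-x}\left(\frac{1-e^{-x}}{\theta}\right)^{a-1} dx
+ \int_{-\ln(1-\theta)}^{\infty} a e^{tx} e^{-x}\left(\frac{e^{-x}}{1-\theta}\right)^{a-1} dx \tag{5}
$$

$$
= \frac{a}{\theta^{a-1}} \int_0^{-\ln(1-\theta)} e^{-x(1-t)}\left(1-e^{-x}\right)^{a-1} dx
+ \frac{a}{(1-\theta)^{a-1}} \int_{-\ln(1-\theta)}^{\infty} e^{-x(a-t)}\, dx
$$

$$
= \frac{a}{\theta^{a-1}}\, B(\theta, a, 1-t) + \frac{a\,(1-\theta)^{a-t}}{(1-\theta)^{a-1}(a-t)}, \qquad t < a,
$$

where

$$
B(\theta, m, n) = \int_0^{\theta} u^{m-1}(1-u)^{n-1}\, du
$$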

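is the incomplete beta function (the unnormalized distribution function of the beta distribution of the first kind). The kth order raw moment of the G(a, θ) distribution can be written as

$$
E(X^k) = \int_0^{\infty} x^k f(x)\, dx \tag{6}
$$

$$
= \int_0^{-\ln(1-\theta)} a x^k e^{-x}\left(\frac{1-e^{-x}}{\theta}\right)^{a-1} dx
+ \int_{-\ln(1-\theta)}^{\infty} a x^k e^{-x}\left(\frac{e^{-x}}{1-\theta}\right)^{a-1} dx .
$$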

However, it is difficult to obtain a closed-form expression for the above integral. Hence, we have calculated the moments numerically. Table (1) gives the mean and variance of G(a, θ) for different values of the parameters a and θ.
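The paper does not say which numerical method was used for the moments. One simple option, sketched below, uses the identity E(X^k) = ∫₀¹ Q(u)^k du with the quantile function (4) and a midpoint rule. The names `raw_moment` and `mean_and_variance` and the grid size are assumptions made for illustration, and the accuracy is only moderate because Q(u) has a logarithmic singularity at u = 1.

```python
import math

def quantile(u, a, theta):
    """Quantile function (4) of G(a, theta) for 0 < u < 1."""
    if u <= theta:
        return -math.log(1.0 - theta * (u / theta) ** (1.0 / a))
    return -(math.log(1.0 - u) + (a - 1.0) * math.log(1.0 - theta)) / a

def raw_moment(k, a, theta, n=100000):
    """Approximate E(X^k) as the midpoint-rule integral of Q(u)^k over (0, 1)."""
    h = 1.0 / n
    return h * sum(quantile((i + 0.5) * h, a, theta) ** k for i in range(n))

def mean_and_variance(a, theta):
    """Mean and variance from the first two raw moments."""
    m1 = raw_moment(1, a, theta)
    m2 = raw_moment(2, a, theta)
    return m1, m2 - m1 * m1

# Table 1 reports E(X) = 0.8068 and V(X) = 0.2892 for a = 2, theta = 0.5.
m, v = mean_and_variance(2.0, 0.5)
print(round(m, 3), round(v, 3))
```

Integrating over u rather than x avoids the infinite upper limit in (6); a finer grid or an adaptive rule would be needed for tighter accuracy.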

Entropy is a measure of variation or uncertainty. The Rényi entropy of a random variable with probability density function f(·) is defined as

$$
h(x) = \frac{1}{1-\gamma} \log \int_0^{\infty} f^{\gamma}(x)\, dx, \qquad \gamma > 0,\ \gamma \ne 1.
$$

Table 1: Mean and variance of G(a, θ) for different values of a and θ

| a | E(X), θ = 0.3 | V(X), θ = 0.3 | E(X), θ = 0.5 | V(X), θ = 0.5 | E(X), θ = 0.8 | V(X), θ = 0.8 |
|---|---|---|---|---|---|---|
| 0.5 | 1.6828 | 3.8627 | 1.4467 | 3.5725 | 1.0269 | 2.0617 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1.5 | 0.7772 | 0.4552 | 0.8669 | 0.4825 | 1.0480 | 0.6123 |
| 2 | 0.6677 | 0.2606 | 0.8068 | 0.2892 | 1.0976 | 0.4406 |
| 2.5 | 0.6029 | 0.1691 | 0.7740 | 0.1946 | 1.1410 | 0.3429 |
| 3 | 0.5603 | 0.1185 | 0.7539 | 0.1408 | 1.1780 | 0.2794 |
| 3.5 | 0.5301 | 0.0879 | 0.7406 | 0.1071 | 1.2096 | 0.2346 |
| 4 | 0.5076 | 0.0678 | 0.7313 | 0.0844 | 1.2368 | 0.2012 |
| 4.5 | 0.4902 | 0.0539 | 0.7245 | 0.0685 | 1.2604 | 0.1755 |
| 5 | 0.4764 | 0.0439 | 0.7194 | 0.0567 | 1.2812 | 0.1546 |

The Shannon entropy of a random variable X is defined by E[-log f(X)].

We first derive the Rényi entropy of the G(a, θ) distribution. We have

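$$
\int_0^{\infty} f^{\gamma}(x)\, dx
= \left(\frac{a}{\theta^{a-1}}\right)^{\gamma} \int_0^{-\ln(1-\theta)} e^{-\gamma x}\left(1-e^{-x}\right)^{\gamma(a-1)} dx
+ \left(\frac{a}{(1-\theta)^{a-1}}\right)^{\gamma} \int_{-\ln(1-\theta)}^{\infty} e^{-a\gamma x}\, dx \tag{7}
$$

$$
= \left(\frac{a}{\theta^{a-1}}\right)^{\gamma} \int_0^{-\ln(1-\theta)} e^{-\gamma x}\left(1-e^{-x}\right)^{\gamma(a-1)} dx
+ \frac{a^{\gamma-1}(1-\theta)^{\gamma}}{\gamma}.
$$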

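However, the above expression is difficult to write in an explicit form. Considering the Shannon entropy, we have

$$
H(X) = E(-\log f(X)) \tag{8}
$$

$$
= -\int_0^{\infty} \log f(x)\, f(x)\, dx
$$

$$
= -\int_0^{-\ln(1-\theta)} \log f(x)\, f(x)\, dx - \int_{-\ln(1-\theta)}^{\infty} \log f(x)\, f(x)\, dx
$$

$$
= E(X) - \log(a) + \frac{a-1}{a},
$$

where the last equality follows by inserting (2) and evaluating the two integrals. For a = 1 this reduces to H(X) = 1, the entropy of the unit exponential distribution.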

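Order statistics refer to the ranked values of a sample from a distribution. Let X1, X2, ..., Xk be k independent and identically distributed random variables, each with the cumulative distribution function F(x) given in equation (3). We denote by X(r) the rth order statistic, r = 1, 2, ..., k. Then f_r(x), the probability density function of X(r) for the G(a, θ) distribution, is given by

$$
f_r(x) = \frac{1}{B(r, k-r+1)}\, F^{r-1}(x)\,\bigl(1-F(x)\bigr)^{k-r} f(x) \tag{9}
$$

$$
= \frac{k!}{(r-1)!\,(k-r)!}
\begin{cases}
\dfrac{a e^{-x}\left(1-e^{-x}\right)^{ar-1}\left[\theta^{a-1}-\left(1-e^{-x}\right)^{a}\right]^{k-r}}{\theta^{(a-1)k}}, & 0 < x < -\ln(1-\theta) \\[3ex]
\dfrac{a e^{-ax(k-r+1)}\left[(1-\theta)^{a-1}-e^{-ax}\right]^{r-1}}{(1-\theta)^{(a-1)k}}, & -\ln(1-\theta) < x < \infty.
\end{cases}
$$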

3. Reliability measures of the G(a, θ) distribution

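From equation (3), the reliability function is

$$
R(t) = 1 - F(t) \tag{10}
$$

$$
=
\begin{cases}
1 - \theta\left(\dfrac{1-e^{-t}}{\theta}\right)^{a}, & \text{if } 0 < t < -\ln(1-\theta) \\[2ex]
1 - \left[1 - (1-\theta)\left(\dfrac{e^{-t}}{1-\theta}\right)^{a}\right], & \text{if } -\ln(1-\theta) < t < \infty
\end{cases}
$$

$$
=
\begin{cases}
1 - \dfrac{\left(1-e^{-t}\right)^{a}}{\theta^{a-1}}, & 0 < t < -\ln(1-\theta) \\[2ex]
\dfrac{e^{-at}}{(1-\theta)^{a-1}}, & -\ln(1-\theta) < t < \infty.
\end{cases}
$$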

To assess the applicability of the introduced distribution in reliability studies, we need to familiarize ourselves with how the shapes of the reliability and hazard curves change with the parameter values. For the G(a, θ) distribution, the hazard function is given by

$$
h(x) = \frac{f(x)}{R(x)} \tag{11}
$$

$$
=
\begin{cases}
\dfrac{a e^{-x}\left(1-e^{-x}\right)^{a-1}}{\theta^{a-1}-\left(1-e^{-x}\right)^{a}}, & 0 < x < -\ln(1-\theta) \\[2ex]
a, & -\ln(1-\theta) < x < \infty.
\end{cases}
$$

The plot of the hazard function is given in Figure (2). For any fixed θ, the G(a, θ) distribution has an increasing hazard function for a > 1 and a decreasing hazard function for a < 1. For a = 1 the hazard function equals 1, independent of x. These results are not difficult to prove: they follow from the fact that the G(a, θ) distribution has a log-concave density for a > 1 and a log-convex density for a < 1. The hazard function of the G(a, θ) distribution thus behaves in the same way as the hazard function of the Weibull distribution.

Figure 2: Plot of hazard function for different values of parameters a and θ
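The shape statements about the hazard can be explored numerically. The following is a minimal sketch of the reliability function (10) and the hazard function (11); the names `reliability` and `hazard` and the parameter values are illustrative assumptions. Beyond the cutting point, f(x)/R(x) simplifies to the constant a, which the code returns directly.

```python
import math

def reliability(x, a, theta):
    """Reliability function R(x) = 1 - F(x) of G(a, theta), for x >= 0."""
    cut = -math.log(1.0 - theta)  # cutting point -ln(1 - theta)
    if x < cut:
        return 1.0 - (1.0 - math.exp(-x)) ** a / theta ** (a - 1.0)
    return math.exp(-a * x) / (1.0 - theta) ** (a - 1.0)

def hazard(x, a, theta):
    """Hazard function h(x) = f(x) / R(x) of G(a, theta), for x > 0."""
    cut = -math.log(1.0 - theta)
    if x < cut:
        u = 1.0 - math.exp(-x)
        return a * math.exp(-x) * u ** (a - 1.0) / (theta ** (a - 1.0) - u ** a)
    return a  # constant hazard beyond the cutting point

# Hazard values for a > 1 and a < 1 at a few points (theta = 0.5 assumed).
for x in (0.2, 0.5, 1.0, 2.0):
    print(x, round(hazard(x, 2.0, 0.5), 4), round(hazard(x, 0.5, 0.5), 4))
```

For a = 2 the printed values rise towards 2 and for a = 0.5 they fall towards 0.5, after which the hazard stays flat, so the monotonicity holds in the non-strict sense on the second branch.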

Reversed hazard function: The reversed hazard has been used for the analysis of right-truncated and left-censored data and is applicable in areas such as forensic science. The reversed hazard rate of a random lifetime is defined as the ratio of its probability density function to its distribution function. The reversed hazard function of the G(a, θ) distribution is given by

$$
r(x; a, \theta) =
\begin{cases}
\dfrac{a}{e^{x}-1}, & 0 < x < -\ln(1-\theta) \\[2ex]
\dfrac{a}{e^{ax}(1-\theta)^{a-1}-1}, & -\ln(1-\theta) < x < \infty.
\end{cases}
\tag{12}
$$

It is observed that, for all values of a and θ, the reversed hazard function is a decreasing function of x. Moreover, no non-negative random variable has an increasing reversed hazard rate. Other distributions showing the same behaviour of the reversed hazard function include the Weibull and lognormal distributions.

The elasticity of a distribution expresses the change that the distribution function undergoes in response to a variation in the random variable. Elasticity is one of the most important concepts in economic theory, where it measures how sensitive an output variable is to a change in an input variable. This classical concept of the elasticity of an economic function can be extended to the cumulative distribution function of a random variable. The elasticity function e(x) is defined as

$$
e(x) = \frac{d \ln F(x)}{d \ln x} = \frac{F'(x)/F(x)}{1/|x|} = \frac{|x|\, f(x)}{F(x)}, \qquad F(x) > 0. \tag{13}
$$

In the case of the G(a, θ) distribution, the elasticity function is given by

$$
e(x) = |x|\, r(x; a, \theta) \tag{14}
$$

$$
= |x|
\begin{cases}
\dfrac{a}{e^{x}-1}, & 0 < x < -\ln(1-\theta) \\[2ex]
\dfrac{a}{e^{ax}(1-\theta)^{a-1}-1}, & -\ln(1-\theta) < x < \infty,
\end{cases}
$$

which shows the close relationship that exists between the reversed hazard function and elasticity.
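The link between the two quantities can be made concrete with a short sketch of the reversed hazard (12) and the elasticity (14). The function names `reversed_hazard` and `elasticity` and the example parameters are assumptions for illustration; `math.expm1` is used for e^x - 1 to keep accuracy near x = 0.

```python
import math

def reversed_hazard(x, a, theta):
    """Reversed hazard r(x; a, theta) = f(x) / F(x) of G(a, theta), for x > 0."""
    cut = -math.log(1.0 - theta)  # cutting point -ln(1 - theta)
    if x < cut:
        return a / math.expm1(x)  # a / (e^x - 1)
    return a / (math.exp(a * x) * (1.0 - theta) ** (a - 1.0) - 1.0)

def elasticity(x, a, theta):
    """Elasticity e(x) = |x| * r(x; a, theta)."""
    return abs(x) * reversed_hazard(x, a, theta)

# Reversed hazard and elasticity at a few points (a = 2, theta = 0.5 assumed).
for x in (0.2, 0.5, 1.0, 2.0):
    print(x, round(reversed_hazard(x, 2.0, 0.5), 4), round(elasticity(x, 2.0, 0.5), 4))
```

The printed reversed hazard values decrease in x, in line with the behaviour described for this distribution, and the elasticity tends to a as x approaches 0.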

The mean residual life function (MRLF) is the average remaining lifetime of a component or an individual that has survived up to time t. For a continuous, non-negative random variable X representing the lifetime of a component or an individual, the residual life random variable at age t, denoted by X_t = (X - t | X > t), is simply the remaining lifetime beyond that age. The MRLF, denoted by μ(t), is defined as

$$
\mu(t) = E(X - t \mid X > t) = \frac{1}{\bar{F}(t)} \int_t^{\infty} \bar{F}(x)\, dx = \frac{1}{R(t)} \int_t^{\infty} R(x)\, dx . \tag{15}
$$

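Then, for the G(a, θ) distribution and 0 < t < -ln(1 - θ), we have

$$
\mu(t) = \frac{\theta^{a-1}}{\theta^{a-1} - \left(1-e^{-t}\right)^{a}}
\left[ \int_t^{-\ln(1-\theta)} \left(1 - \frac{\left(1-e^{-x}\right)^{a}}{\theta^{a-1}}\right) dx
+ \int_{-\ln(1-\theta)}^{\infty} \frac{e^{-ax}}{(1-\theta)^{a-1}}\, dx \right].
$$

Substituting u for 1 - e^{-x} in the first integral, we get

$$
\mu(t) = \frac{\theta^{a-1}}{\theta^{a-1} - \left(1-e^{-t}\right)^{a}}
\left[ -\ln(1-\theta) - t - \frac{1}{\theta^{a-1}} \int_{1-e^{-t}}^{\theta} u^{a}(1-u)^{-1}\, du + \frac{1-\theta}{a} \right]. \tag{16}
$$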

But, the above expression does not have an explicit form and thus we need to calculate it numerically.
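One way to carry out this numerical calculation is sketched below, working directly from the definition μ(t) = (1/R(t)) ∫ R(x) dx over (t, ∞). The function names, the midpoint rule and the grid size are illustrative assumptions. The sketch uses the fact that R(x) is proportional to e^{-ax} beyond the cutting point, so that part of the integral is available in closed form and μ(t) = 1/a there.

```python
import math

def reliability(x, a, theta):
    """Reliability function R(x) = 1 - F(x) of G(a, theta), for x >= 0."""
    cut = -math.log(1.0 - theta)
    if x < cut:
        return 1.0 - (1.0 - math.exp(-x)) ** a / theta ** (a - 1.0)
    return math.exp(-a * x) / (1.0 - theta) ** (a - 1.0)

def mean_residual_life(t, a, theta, n=20000):
    """mu(t) = (1 / R(t)) * integral of R(x) over (t, infinity)."""
    cut = -math.log(1.0 - theta)
    if t >= cut:
        # R(x) is proportional to exp(-a x) here, so the integral equals R(t) / a.
        return 1.0 / a
    # Midpoint rule on (t, cut) for the first branch of R.
    h = (cut - t) / n
    head = h * sum(reliability(t + (i + 0.5) * h, a, theta) for i in range(n))
    tail = (1.0 - theta) / a  # closed-form integral of R over (cut, infinity)
    return (head + tail) / reliability(t, a, theta)

# mu(0) equals E(X); Table 1 reports 0.8068 for a = 2, theta = 0.5.
for t in (0.0, 0.3, 0.6, 1.0):
    print(t, round(mean_residual_life(t, 2.0, 0.5), 4))
```

Since μ(0) equals the mean, the first printed value can be compared with the corresponding entry of Table 1 as a consistency check.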


4. Estimation of Parameters; Maximum Likelihood Estimation

The derivation of the maximum likelihood estimation procedure for the G(a, θ) distribution is quite instructive. For a sample X = (X1, X2, ..., Xs), let the order statistics be X(1) < X(2) < ... < X(s). By definition, the likelihood function for X is

$$
L(X; \theta, a) = a^{s} \prod_{i=1}^{s} e^{-X_{(i)}}
\left[ \frac{\prod_{i=1}^{r}\left(1-e^{-X_{(i)}}\right) \prod_{i=r+1}^{s} e^{-X_{(i)}}}{\theta^{r}(1-\theta)^{s-r}} \right]^{a-1} \tag{17}
$$

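where X(r) < -ln(1 - θ) < X(r+1), with X(0) = 0. Then the maximum likelihood estimators of the parameters are given by

$$
\hat{\theta} = 1 - e^{-X_{(\hat{r})}} \tag{18}
$$

$$
\hat{a} = -\frac{s}{\log M(\hat{r})} \tag{19}
$$

where r̂ = arg max M(r) over r ∈ {1, 2, ..., s} and

$$
M(r) = \prod_{i=1}^{r-1} \frac{1-e^{-X_{(i)}}}{1-e^{-X_{(r)}}} \prod_{i=r+1}^{s} \frac{e^{-X_{(i)}}}{e^{-X_{(r)}}}. \tag{20}
$$

To maximize the likelihood (17), we set

$$
\max_{a>0,\ 0<\theta<1} L(X; \theta, a) = \max_{a>0} \left\{ a^{s} \prod_{i=1}^{s} e^{-X_{(i)}}\, M^{a-1} \right\} \tag{21}
$$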

where M is given by

$$
M = \max_{0<\theta<1} \frac{\prod_{i=1}^{r}\left(1-e^{-X_{(i)}}\right) \prod_{i=r+1}^{s} e^{-X_{(i)}}}{\theta^{r}(1-\theta)^{s-r}} \tag{22}
$$

and, as above, X(r) < -ln(1 - θ) < X(r+1), with X(0) = 0.

Using the properties of the Pitman family, X(r) would be the estimate of δ = -ln(1 - θ). Then, by inverting, we have

$$
\hat{\theta} = 1 - e^{-X_{(r)}}.
$$

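Now,

$$
\log \left\{ a^{s} \prod_{i=1}^{s} e^{-X_{(i)}}\, M^{a-1} \right\} = s \log(a) - \sum_{i=1}^{s} X_{(i)} + (a-1) \log(M) \tag{23}
$$

and

$$
\frac{\partial}{\partial a} \log \left\{ a^{s} \prod_{i=1}^{s} e^{-X_{(i)}}\, M^{a-1} \right\} = \frac{s}{a} + \log(M). \tag{24}
$$

Equating to 0 yields

$$
\hat{a} = -\frac{s}{\log(M)}.
$$

From equation (24), it follows that

$$
\frac{\partial}{\partial a} \log \left\{ a^{s} \prod_{i=1}^{s} e^{-X_{(i)}}\, M^{a-1} \right\} > 0 \iff a < -\frac{s}{\log(M)}. \tag{25}
$$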

Hence, â corresponds to a global maximum of both (23) and (21). Note that for i ≤ r it follows that 0 < (1 - e^{-X(i)})/θ ≤ 1, and for i > r it follows that 0 < e^{-X(i)}/(1 - θ) < 1. Hence 0 < M < 1 and thus â > 0. Using equation (22), we may write M = max H(r) over r ∈ {0, ..., s}, where

$$
H(r) = \max_{X_{(r)} \le -\ln(1-\theta) \le X_{(r+1)}} \frac{\prod_{i=1}^{r}\left(1-e^{-X_{(i)}}\right) \prod_{i=r+1}^{s} e^{-X_{(i)}}}{\theta^{r}(1-\theta)^{s-r}}. \tag{26}
$$

We shall discuss three cases: r ∈ {1, 2, ..., s - 1}, r = 0 and r = s.

Case 1: r ∈ {1, 2, ..., s - 1}: Here, X(r) ≤ -ln(1 - θ) ≤ X(r+1). From (26),

$$
H(r) = \max_{r' \in \{r,\, r+1\}} \prod_{i=1}^{r'-1} \frac{1-e^{-X_{(i)}}}{1-e^{-X_{(r')}}} \prod_{i=r'+1}^{s} \frac{e^{-X_{(i)}}}{e^{-X_{(r')}}}. \tag{27}
$$

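Case 2: r = 0: Here, 0 < -ln(1 - θ) ≤ X(1). From (26) it follows that in this case

$$
H(0) = \max_{0 < -\ln(1-\theta) \le X_{(1)}} \prod_{i=1}^{s} \frac{e^{-X_{(i)}}}{1-\theta}. \tag{28}
$$

Hence,

$$
H(0) = \prod_{i=1}^{s} \frac{e^{-X_{(i)}}}{e^{-X_{(1)}}} = \prod_{i=2}^{s} \frac{e^{-X_{(i)}}}{e^{-X_{(1)}}}. \tag{29}
$$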

Case 3: r = s: Here, X(s) ≤ -ln(1 - θ) < ∞. From (26) it follows that in this case

$$
H(s) = \max_{X_{(s)} \le -\ln(1-\theta) < \infty} \prod_{i=1}^{s} \frac{1-e^{-X_{(i)}}}{\theta}. \tag{30}
$$

Hence

$$
H(s) = \prod_{i=1}^{s} \frac{1-e^{-X_{(i)}}}{1-e^{-X_{(s)}}} = \prod_{i=1}^{s-1} \frac{1-e^{-X_{(i)}}}{1-e^{-X_{(s)}}}. \tag{31}
$$

From (27), (29) and (31) we obtain that M = max M(r) over r ∈ {1, ..., s}, where

$$
M(r) = \prod_{i=1}^{r-1} \frac{1-e^{-X_{(i)}}}{1-e^{-X_{(r)}}} \prod_{i=r+1}^{s} \frac{e^{-X_{(i)}}}{e^{-X_{(r)}}}. \tag{32}
$$

Note that M is attained at -ln(1 - θ) = X(r̂), where r̂ = arg max M(r) over r ∈ {1, 2, ..., s}.

The estimates given in (18) and (19) are quite intuitive. In particular, the estimator of the parameter θ is expressed in terms of a specific order statistic. Note that the approach for determining the maximum likelihood estimate of θ for the G(a, θ) distribution is similar to the approach used for a triangular distribution and for the STSP(θ, n) distribution (see René Van Dorp and Kotz [13]). To compute the estimate in practice, we use a different, matrix-based method.

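Consider the matrix A = [a_{i,r}], where

$$
a_{i,r} =
\begin{cases}
\dfrac{1-e^{-X_{(i)}}}{1-e^{-X_{(r)}}}, & \text{if } i < r \\[2ex]
\dfrac{e^{-X_{(i)}}}{e^{-X_{(r)}}}, & \text{if } i > r.
\end{cases}
\tag{33}
$$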

Then A is a real matrix with unit diagonal entries. The product of the elements in the rth column of A equals the value of M(r) given by equation (32). We identify the maximum value of M(r); the corresponding rth order statistic X(r) is taken as the estimate of δ = -ln(1 - θ), and by inverting we get the maximum likelihood estimate of the parameter θ. The maximum likelihood estimate of the second parameter a can then be evaluated using equation (19), where s denotes the total number of observations.
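The column-product procedure can be sketched in a few lines of Python. This is an illustrative reading of the method rather than the authors' code: the function name `mle_g` is hypothetical, the sample values are made up for demonstration, and logarithms of the column products are summed instead of multiplying the entries directly, which is an added precaution against numerical underflow for large s.

```python
import math

def mle_g(sample):
    """Estimate (theta, a) of G(a, theta) from the column products M(r) of A.

    Returns (theta_hat, a_hat, r_hat) with r_hat the 1-based index of the
    selected order statistic. Assumes at least two distinct positive values.
    """
    x = sorted(sample)  # order statistics X(1) <= ... <= X(s)
    s = len(x)
    best_r, best_log_m = 0, -math.inf
    for r in range(s):  # r is a 0-based column index
        log_m = 0.0     # log of the product of column r; the diagonal entry is 1
        for i in range(s):
            if i < r:
                # log of (1 - e^{-X(i)}) / (1 - e^{-X(r)})
                log_m += math.log(math.expm1(-x[i]) / math.expm1(-x[r]))
            elif i > r:
                # log of e^{-X(i)} / e^{-X(r)}
                log_m += x[r] - x[i]
        if log_m > best_log_m:
            best_r, best_log_m = r, log_m
    theta_hat = -math.expm1(-x[best_r])  # 1 - e^{-X(r_hat)}, as in (18)
    a_hat = -s / best_log_m              # as in (19)
    return theta_hat, a_hat, best_r + 1

# Hypothetical data, for illustration only.
data = [0.12, 0.35, 0.41, 0.58, 0.66, 0.71, 0.93, 1.40]
print(mle_g(data))
```

The double loop mirrors the matrix A entry by entry and costs O(s²) operations, which is adequate for data sets of the size considered here.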

5. Simulation

To verify the estimation procedure, we carried out a simulation study. We generated samples of different sizes using the quantile function of the proposed distribution and estimated the parameters with the procedure discussed above. This was repeated 500 times, and the mean of the estimates was taken as the estimate of each parameter. Table (2) gives the estimates of the parameters; the values in brackets are the mean squared errors. Note that, as the sample size increases, the estimates get closer to the true values of the parameters.

Table 2: Estimated values of the parameters and mean squared error.

| Sample size | θ = 0.3 | θ = 0.5 | θ = 0.9 | a = 1.5 | a = 2 | a = 3 |
|---|---|---|---|---|---|---|
| 100 | 0.3107 (0.0135) | 0.5036 (0.0031) | 0.8985 (0.0003) | 1.5469 (0.0229) | 2.0444 (0.0416) | 3.0467 (0.0873) |
| 150 | 0.3081 (0.0085) | 0.5012 (0.0022) | 0.8980 (0.0003) | 1.5354 (0.0162) | 2.0326 (0.0285) | 3.0422 (0.0636) |
| 200 | 0.3072 (0.0056) | 0.5007 (0.0014) | 0.8982 (0.0001) | 1.5186 (0.0104) | 2.0161 (0.0196) | 3.0323 (0.0442) |
| 500 | 0.3018 (0.0021) | 0.5004 (0.0005) | 0.8998 (5.213 × 10^-5) | 1.5093 (0.0048) | 2.009 (0.0091) | 3.0228 (0.0190) |
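A scaled-down version of such a simulation study can be sketched as follows. Everything here is an illustrative assumption rather than the authors' setup: the function names, the seed, the use of 200 replications instead of 500, and the single parameter pair a = 2, θ = 0.5. The estimator computes every log M(r) with running sums, which gives the same maximizer as the column products of the matrix A.

```python
import math
import random

def quantile(u, a, theta):
    """Quantile function of G(a, theta) for 0 < u < 1."""
    if u <= theta:
        return -math.log(1.0 - theta * (u / theta) ** (1.0 / a))
    return -(math.log(1.0 - u) + (a - 1.0) * math.log(1.0 - theta)) / a

def fit(sample):
    """Return (theta_hat, a_hat) by maximizing log M(r) over the order statistics."""
    x = sorted(sample)
    s = len(x)
    log_u = [math.log(-math.expm1(-xi)) for xi in x]  # log(1 - e^{-X(i)})
    total_x = sum(x)
    best_log_m, best_r = -math.inf, 0
    head_log_u = 0.0  # sum of log_u[i] for i < r
    head_x = 0.0      # sum of x[i] for i <= r
    for r in range(s):
        head_x += x[r]
        tail_x = total_x - head_x  # sum of x[i] for i > r
        log_m = head_log_u - r * log_u[r] - tail_x + (s - r - 1) * x[r]
        if log_m > best_log_m:
            best_log_m, best_r = log_m, r
        head_log_u += log_u[r]
    return -math.expm1(-x[best_r]), -s / best_log_m

def simulate(n, a, theta, reps=200, seed=0):
    """Mean and mean squared error of the estimates over repeated samples."""
    rng = random.Random(seed)
    eps = 1e-12  # keep uniform draws strictly inside (0, 1)
    th_hats, a_hats = [], []
    for _ in range(reps):
        xs = [quantile(min(max(rng.random(), eps), 1.0 - eps), a, theta)
              for _ in range(n)]
        th, ah = fit(xs)
        th_hats.append(th)
        a_hats.append(ah)
    mean_th = sum(th_hats) / reps
    mean_a = sum(a_hats) / reps
    mse_th = sum((t - theta) ** 2 for t in th_hats) / reps
    mse_a = sum((v - a) ** 2 for v in a_hats) / reps
    return mean_th, mse_th, mean_a, mse_a

result = simulate(n=100, a=2.0, theta=0.5)
print([round(v, 4) for v in result])
```

The output can be set against the n = 100 row of Table 2, keeping in mind that a different seed and replication count will not reproduce the tabulated figures exactly.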

6. Data Analysis

In this section, we provide an application of the G(a, θ) distribution by modelling two real data sets.

Data set I: The first data set represents the fatigue fracture life of Kevlar 373/epoxy specimens subjected to constant pressure at the 90% stress level until all had failed, so we have complete data with the exact times of failure. The same data set was used in the literature by Alizadeh et al. [1]. The parameters are estimated using the maximum likelihood method discussed in the previous section. First, we form the matrix A with entries defined as in (33). Then we compute the product of the matrix elements in the rth column, which gives the values of M(r) in equation (32). The maximum value of M(r) is identified, the corresponding rth order statistic is taken as the maximum likelihood estimate of δ = -ln(1 - θ), and by inverting we obtain the estimate of the parameter θ. Similarly, the maximum likelihood estimate of a is obtained using equation (19). Here, we obtained the estimate of θ as 0.9998879 and the estimate of a as 2.4003. To check the goodness of fit of the proposed distribution, we performed the Kolmogorov-Smirnov test; the corresponding p-value is 0.07278, so the G(a, θ) model is not rejected at the 5% significance level. The histogram of the data together with the fitted probability density curve is given in Figure (3).

Data set II: The second real data set represents the failure times of 84 aircraft windshields. These data are taken from Ijaz et al. [6]. The estimation was carried out as for the previous data set. The value of M(r) and the corresponding order statistic are noted, and we obtain the estimates of θ and a. The parameters for data set II are estimated as θ = 0.9961512 and a = 7.827007. The p-value for the Kolmogorov-Smirnov test is 0.1958. The histogram of data set II with the fitted probability density curve is given in Figure (4).
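For readers who want to reproduce this kind of goodness-of-fit check, the Kolmogorov-Smirnov statistic against a fitted G(a, θ) distribution function can be computed as in the sketch below. The function names and the sample are hypothetical (the data sets themselves are not listed here), and only the statistic D is computed; turning D into a p-value additionally requires the Kolmogorov distribution, which is omitted.

```python
import math

def cdf(x, a, theta):
    """Distribution function of G(a, theta); returns 0 for x <= 0."""
    if x <= 0.0:
        return 0.0
    cut = -math.log(1.0 - theta)  # cutting point -ln(1 - theta)
    if x < cut:
        return theta * ((1.0 - math.exp(-x)) / theta) ** a
    return 1.0 - (1.0 - theta) * (math.exp(-x) / (1.0 - theta)) ** a

def ks_statistic(sample, a, theta):
    """Largest gap between the empirical and the fitted distribution function."""
    x = sorted(sample)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = cdf(xi, a, theta)
        # The empirical CDF jumps from (i - 1)/n to i/n at xi.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical sample and parameter values, for illustration only.
data = [0.12, 0.35, 0.41, 0.58, 0.66, 0.71, 0.93, 1.40]
print(round(ks_statistic(data, a=2.0, theta=0.5), 4))
```

A small D indicates close agreement between the empirical and the fitted distribution functions, which is what the plots in panel (b) of the figures display graphically.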

Figure 3: (a) Histogram and fitted G(a, θ) probability density function for dataset I. (b) Theoretical and fitted distribution function for dataset I.

Figure 4: (a) Histogram and fitted G(a, θ) probability density function for dataset II. (b) Theoretical and fitted distribution function for dataset II.

References

[1] Alizadeh, M., Merovci, F. and Hamedani, G. G. (2000). Generalized Transmuted Family of Distributions: Properties and Applications. Hacettepe University Bulletin of Natural Sciences and Engineering Series B: Mathematics and Statistics, 46: 645-668.

[2] Bourguignon, M., Silva, R. B. and Cordeiro, G. M. (2014). The Weibull-G family of probability distributions. Journal of Data Science, 12: 53-68.

[3] Cordeiro, G. M. and Castro, M. (2011). A new family of generalized distributions. Journal of Statistical Computation and Simulation, 81: 883-898.

[4] Eugene, N., Lee, C. and Famoye, F. (2002). Beta-normal distribution and its applications. Communications in Statistics - Theory and Methods, 31: 497-512.

[5] Gupta, R. D. and Kundu, D. (1999). Generalized Exponential Distributions. Australian and New Zealand Journal of Statistics, 41(2): 173-188.

[6] Ijaz, M., Asim, S. M. and Alamgir. (2019). Lomax exponential distribution with an application to real-life data. PLOS ONE, 14(12): 1-16.

[7] Jayakumar, K. and Mathew, T. (2008). On a generalization to Marshall-Olkin scheme and its application to Burr type XII distribution. Statistical Papers, 49: 421-439.

[8] Lai, C. D., Zhang, L. and Xie, M. (2004). Mean Residual Life and Other Properties of Weibull Related Bathtub Shape Failure Rate Distribution. International Journal of Reliability, Quality and Safety Engineering, 11(2): 113-132.

[9] Marshall, A. W. and Olkin, I. (1997). A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families. Biometrika, 84: 641-652.

[10] Nadarajah, S. and Kotz, S. (2004). The beta Gumbel distribution. Mathematical Problems in Engineering, 10: 323-332.

[11] Nadarajah, S. and Gupta, A. K. (2004). The beta Frechet distribution. Far East Journal of Theoretical Statistics, 14: 15-24.

[12] Nadarajah, S. and Kotz, S. (2006). The beta exponential distribution. Reliability Engineering and System Safety, 91: 689-697.

[13] René Van Dorp, J. and Kotz, S. (2002). The Standard Two-Sided Power Distribution and its Properties. The American Statistician, 56(2): 90-99.

[14] Silva, R. B., Bourguignon, M., Dias, C. R. B. and Cordeiro, G. M. (2013). The compound class of extended Weibull power series distributions. Computational Statistics and Data Analysis, 58: 352-367.

[15] Zografos, K. and Balakrishnan, N. (2009). On the families of beta- and generalized gamma-generated distributions and associated inference. Statistical Methodology, 6: 344-362.
