CC BY
Some Properties and Different Estimation Methods for Inverse A(α) Distribution with an Application to Tongue Cancer Data

Shreya Bhunia* and Proloy Banerjee

Department of Mathematics and Statistics, Aliah University, Kolkata, India [email protected] and [email protected] * Corresponding author E-mail: [email protected]

Abstract

The inverted distribution is the distribution of the reciprocal of a random variable that follows a specified distribution. Here, a new one-parameter inverse A(α) distribution is introduced, which is the distribution of the reciprocal of an A(α) random variable. Mathematical and statistical properties of the new distribution, such as survival characteristics, quantile function, mode, order statistics, ageing intensity function and stochastic ordering, are derived and discussed. Furthermore, from the frequentist viewpoint, we discuss several estimation approaches: the maximum likelihood method, the method of maximum product of spacings, ordinary and weighted least squares, Cramér-von Mises estimation and Anderson-Darling estimation. These methods are compared for both small and large samples through an extensive numerical simulation. The flexibility of the new lifetime distribution is demonstrated by modelling a tongue cancer dataset. The results indicate that the proposed model fits these data better than several popular competing models.

Keywords: Inverse distribution, Estimation methods, Hazard rate function, Lifetime distribution

1. Introduction

In applied fields such as engineering, medical sciences, economics and biological sciences, analyzing and modeling complex datasets is an essential task. Although many well-known standard distributions exist in the literature, they do not always reflect real-world scenarios, so researchers seek to extend the structure of probability models. Recently, [1] introduced a new one-parameter A(α) distribution and investigated its applicability by analyzing three datasets. A continuous random variable Y is said to follow an A(α) distribution if its probability density function (pdf) is of the form

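$$f_Y(y) = \frac{1}{y^2}\exp\left[\frac{1}{\alpha}\left(1-\exp\left(\frac{\alpha}{y}\right)\right)+\frac{\alpha}{y}\right]; \quad y > 0 \tag{1}$$

and is denoted by Y ~ A(α). The corresponding cumulative distribution function (cdf) of Y is given by

$$F_Y(y) = \exp\left[\frac{1}{\alpha}\left(1-\exp\left(\frac{\alpha}{y}\right)\right)\right]; \quad y > 0 \tag{2}$$

with scale parameter α > 0.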

The statistical literature offers various methods for proposing new distributions from baseline distributions. For example, [2] introduced a general method for obtaining more flexible distributions by adding a new parameter to an existing family of distributions; other approaches include the quadratic rank transmutation map (QRTM) [3], the DUS transformation [4] and the α-power transformation method [5]. In this context, inverting univariate probability distributions and studying their applicability has become an active area of research. Inverted versions of distributions have sometimes been found to capture aspects of a phenomenon that the non-inverted distributions cannot. For instance, the inverse exponential distribution is studied by [6], the inverse Weibull distribution by [7], the inverse Lindley distribution by [8], the inverse Xgamma distribution by [9], the inverse power Lindley distribution by [10], the inverted Gamma distribution by [11] and the inverse Kumaraswamy distribution by [12].

In the present study, we introduce the inverted version of the A(α) distribution using the same technique and name it the inverse A(α) distribution. The new distribution is flexible enough to model positive real datasets that possess an increasing hazard rate function. Its other attractive features are that it is heavy-tailed, unimodal, parsimonious in parameters and easy to use. The objectives of this article are (i) to obtain some mathematical properties of the inverse A(α) distribution and (ii) to estimate the unknown parameter of the model from a frequentist perspective. Maximum likelihood estimation (MLE), the method of maximum product of spacings (MPS), ordinary least squares estimation (OLS), weighted least squares estimation (WLS), Cramér-von Mises estimation (CVM) and the method of Anderson-Darling (AD) are considered as frequentist methods for parameter estimation. We also compare these estimation procedures on the basis of root mean square error (RMSE) values for different sample sizes and different parameter values using Monte Carlo simulation. To the best of our knowledge, no attempt has previously been made to compare all of these estimators for the inverse A(α) distribution along with its mathematical and statistical properties. Additionally, to illustrate the flexibility of this distribution, a tongue cancer patient dataset is analyzed.

The remainder of this article is organized as follows. In Section 2, the new distribution is introduced. Statistical properties and associated measures of the Inv-A(α) distribution are discussed in Section 3. Classical estimation procedures for the parameter of the inverse A(α) distribution are considered in Section 4. In Section 5, a simulation study is conducted to compare the resulting estimators. An empirical application based on a real dataset is discussed in Section 6. Finally, concluding remarks are given in Section 7.

2. The inverted A(α) distribution

A new probability distribution, termed the inverted or inverse A(α) distribution, is introduced in this section. By construction, it is the distribution of the reciprocal of an A(α) random variable, and for simplicity we use the notation Inv-A(α) for this new lifetime model throughout the study. Let the random variable Y have the density function (1). Then the cdf of the inverted random variable X = 1/Y is

$$F_X(x) = P[X \le x] = 1 - F_Y\left(\frac{1}{x}\right) = 1 - \exp\left[\frac{1}{\alpha}\left(1-\exp(\alpha x)\right)\right]; \quad x > 0. \tag{3}$$

Differentiating $F_X(x)$ in (3) gives the pdf of the Inv-A(α) distribution:

$$f_X(x) = \exp\left[\frac{1}{\alpha}\left(1-\exp(\alpha x)\right)+\alpha x\right]; \quad x > 0 \tag{4}$$

where α > 0 is the scale parameter. The Inv-A(α) distribution is a one-parameter family of continuous probability distributions on the positive real line.

The pdf and cdf of the Inv-A(α) distribution for different choices of the scale parameter α are plotted in Figures 1a and 1b respectively. The plots reveal that the Inv-A(α) density can be decreasing, unimodal and right skewed.

Figure 1: The pdf (a) and cdf (b) plots of the Inv-A(α) distribution for different parameter choices.
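As a quick numerical companion to (3) and (4), the following minimal sketch evaluates the cdf and pdf and checks that the density integrates to approximately one. The function names `inv_a_cdf`, `inv_a_pdf` and `trapezoid`, the value α = 1.5 and the integration range [0, 10] are illustrative assumptions, not part of the original study.

```python
import math

def inv_a_cdf(x, alpha):
    """cdf (3): F(x) = 1 - exp[(1 - e^(alpha x)) / alpha] for x > 0."""
    if x <= 0.0:
        return 0.0
    # expm1 avoids cancellation when alpha * x is small
    return -math.expm1(-math.expm1(alpha * x) / alpha)

def inv_a_pdf(x, alpha):
    """pdf (4): f(x) = exp[(1 - e^(alpha x)) / alpha + alpha x].

    The value returned at x = 0 is the right-hand limit f(0+) = 1.
    """
    if x < 0.0:
        return 0.0
    return math.exp(-math.expm1(alpha * x) / alpha + alpha * x)

def trapezoid(func, a, b, steps=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for k in range(1, steps):
        total += func(a + k * h)
    return total * h

alpha = 1.5  # illustrative parameter value
area = trapezoid(lambda x: inv_a_pdf(x, alpha), 0.0, 10.0)
print(f"integral of the pdf over [0, 10]: {area:.6f}")
```

Using `math.expm1` rather than `math.exp(...) - 1` is a design choice that keeps the evaluation accurate for small αx; very large αx (above roughly 700) would overflow and is not handled in this sketch.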

3. Some statistical properties

3.1. Reliability characteristics

The survival function (sf) and the hazard rate function (hrf) are the basic characteristics of any lifetime distribution. Both measures are commonly employed to describe and model the fundamental properties of survival datasets. The survival function S(t) is the probability that an individual or an item survives at least t (t > 0) units of time, that is, S(t) = P(X > t) = 1 - F(t).

Thus, the sf of the Inv-A(α) distribution is

$$S(t) = \exp\left[\frac{1}{\alpha}\left(1-e^{\alpha t}\right)\right]. \tag{5}$$

The hazard rate function, also known as the failure rate function, is another key feature to consider when describing a real-life phenomenon with a lifetime distribution. It can be interpreted as the instantaneous rate of failure at time t, given survival up to at least time t (t > 0), and is defined as h(t) = f(t)/(1 - F(t)) = f(t)/S(t), where f(t) is the pdf and S(t) is the sf of the corresponding distribution. Therefore, the hrf of the Inv-A(α) distribution is

$$h(t) = \frac{\exp\left[\frac{1}{\alpha}\left(1-e^{\alpha t}\right)+\alpha t\right]}{\exp\left[\frac{1}{\alpha}\left(1-e^{\alpha t}\right)\right]} = e^{\alpha t}. \tag{6}$$

The ratio of the lifetime probability density to its distribution function is the reversed (or proportional) hazard rate function (rhrf) of a random lifetime. For the Inv-A(α) distribution the rhrf is

$$H(t) = \frac{f(t)}{F(t)} = \frac{\exp\left[\frac{1}{\alpha}\left(1-e^{\alpha t}\right)+\alpha t\right]}{1-\exp\left[\frac{1}{\alpha}\left(1-e^{\alpha t}\right)\right]}. \tag{7}$$

A related measure is the cumulative hazard function, defined as follows [13]:

$$\Lambda(t) = -\log S(t) = -\log\left[e^{\frac{1}{\alpha}\left(1-e^{\alpha t}\right)}\right] = \frac{1}{\alpha}\left(e^{\alpha t}-1\right). \tag{8}$$

Expression (6) shows that the hazard rate function is increasing for every α > 0. The shape of the hazard rate is displayed in Figure 2b for different choices of α, whereas Figure 2a shows the shape of the survival function.

Figure 2: The survival (a) and hazard (b) plots of the Inv-A(α) distribution for different parameter choices.
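The reliability measures (5) to (8) can be sketched in a few lines. The function names and the parameter values in the loop below are illustrative assumptions; the formulas themselves are those given above.

```python
import math

def survival(t, alpha):
    """Survival function (5)."""
    return math.exp(-math.expm1(alpha * t) / alpha)

def hazard(t, alpha):
    """Hazard rate function (6): h(t) = e^(alpha t)."""
    return math.exp(alpha * t)

def reversed_hazard(t, alpha):
    """Reversed hazard rate (7): f(t) / F(t), using f(t) = h(t) S(t)."""
    s = survival(t, alpha)
    return hazard(t, alpha) * s / (1.0 - s)

def cumulative_hazard(t, alpha):
    """Cumulative hazard (8): (e^(alpha t) - 1) / alpha."""
    return math.expm1(alpha * t) / alpha

# The hazard should grow with t for every alpha > 0.
for alpha in (0.5, 1.5, 4.0):
    rates = [hazard(t, alpha) for t in (0.5, 1.0, 1.5, 2.0)]
    print(alpha, [round(r, 3) for r in rates])
```

For large αt the survival probability underflows to zero in floating point, so quantities such as -log S(t) are better computed from `cumulative_hazard` directly.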

The shape of the hazard rate function can also be derived mathematically by using the following lemma.

Lemma 1. Suppose f(t), t > 0, is the density function of a positive real-valued continuous random variable, f'(t) is the derivative of f(t), and

$$\eta(t) = -\frac{f'(t)}{f(t)}.$$

If η'(t) > 0 for all t > 0, then the hazard rate function is an increasing function of t.

Proof. See [14].

Here,

$$\eta(x) = e^{\alpha x} - \alpha.$$

Differentiating with respect to x, we have

$$\eta'(x) = \alpha e^{\alpha x}; \quad \alpha > 0.$$

Clearly η'(x) > 0 for x > 0. Therefore, the distribution has an increasing hazard rate function.

3.2. Quantile functions, median and mode

The qth quantile $x_q$ of the Inv-A(α) distribution is obtained by solving

$$F(x_q) = q, \quad 0 < q < 1.$$

Therefore,

$$1 - \exp\left[\frac{1}{\alpha}\left(1-e^{\alpha x_q}\right)\right] = q \quad\Longrightarrow\quad x_q = \frac{1}{\alpha}\log\left[1-\alpha\log(1-q)\right]. \tag{9}$$

Thus, the median (or 2nd quartile) of the proposed distribution is obtained by substituting q = 1/2 in (9), i.e.,

$$x_{1/2} = Q_2 = \frac{1}{\alpha}\log\left[1-\alpha\log\left(\frac{1}{2}\right)\right] = \frac{1}{\alpha}\log\left[1+\alpha\log 2\right]. \tag{10}$$

Similarly, the 1st and 3rd quartiles are obtained by setting q = 1/4 and q = 3/4 respectively:

$$Q_1 = \frac{1}{\alpha}\log\left[1-\alpha\log\left(\frac{3}{4}\right)\right] \quad\text{and}\quad Q_3 = \frac{1}{\alpha}\log\left[1+\alpha\log 4\right].$$
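The closed-form quantile function (9) is easy to check numerically against the cdf (3). The sketch below is illustrative only: the function names and the value α = 2 are assumptions made for the example.

```python
import math

def inv_a_cdf(x, alpha):
    """cdf (3) of the Inv-A distribution."""
    if x <= 0.0:
        return 0.0
    return -math.expm1(-math.expm1(alpha * x) / alpha)

def inv_a_quantile(q, alpha):
    """Quantile function (9): x_q = log[1 - alpha log(1 - q)] / alpha."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return math.log1p(-alpha * math.log1p(-q)) / alpha

alpha = 2.0  # illustrative parameter value
q1, q2, q3 = (inv_a_quantile(q, alpha) for q in (0.25, 0.5, 0.75))
print("quartiles:", q1, q2, q3)
print("cdf at the median:", inv_a_cdf(q2, alpha))
```

Because (9) is explicit, the same function can serve as an inverse-transform sampler by feeding it uniform random numbers.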

The mode of the inverse A(α) distribution, denoted by $x_m$, is obtained by solving f'(x; α) = 0 subject to f''(x) < 0; the solution $x_m$ is the point at which f(x) attains its maximum. The mode of the proposed distribution therefore solves

$$\frac{d}{dx}\exp\left[\frac{1}{\alpha}\left(1-e^{\alpha x}\right)+\alpha x\right] = 0. \tag{11}$$

After simplification, we get

$$x_m = \frac{1}{\alpha}\log\alpha. \tag{12}$$

Expression (12) is positive only when α > 1; for α ≤ 1 the density (4) is decreasing on x > 0. Although the mode of the A(α) distribution exists, it cannot be expressed in closed form. For the Inv-A(α) distribution, by contrast, the mode has an explicit form.
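A simple grid search offers a sanity check on the mode. In this sketch the closed form log(α)/α is used for α > 1, and the mode is taken to be 0 for α ≤ 1 on the grounds that the derivative of log f(x) is α - exp(αx), which is negative for all x > 0 in that case. The function names, the grid range [0, 2] and the grid size are illustrative assumptions.

```python
import math

def inv_a_pdf(x, alpha):
    """pdf (4) of the Inv-A distribution for x >= 0."""
    return math.exp(-math.expm1(alpha * x) / alpha + alpha * x)

def inv_a_mode(alpha):
    """log(alpha) / alpha when alpha > 1, otherwise 0 (decreasing density)."""
    return math.log(alpha) / alpha if alpha > 1.0 else 0.0

def grid_argmax(alpha, upper=2.0, steps=20000):
    """Location of the largest pdf value on an equally spaced grid."""
    grid = [upper * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda x: inv_a_pdf(x, alpha))

for alpha in (0.5, 3.0, 8.0):
    print(alpha, inv_a_mode(alpha), grid_argmax(alpha))
```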

3.3. Order statistics

Order statistics are among the most useful tools in nonparametric statistics and inference, and they have a wide range of applications in life testing and reliability analysis. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics of a random sample $X_1, X_2, \ldots, X_n$ drawn from a continuous population with cdf $F_X(x)$ and pdf $f_X(x)$. Under these assumptions, the pdf and cdf of the rth order statistic $X_{(r)}$, r = 1, 2, ..., n, are

$$f_r(x) = \frac{n!}{(r-1)!\,(n-r)!}\, f(x)\, F^{\,r-1}(x)\,[1-F(x)]^{\,n-r}; \quad r = 1, 2, \ldots, n \tag{13}$$

and

$$F_r(x) = \sum_{i=r}^{n}\binom{n}{i} F_X^{\,i}(x)\,[1-F_X(x)]^{\,n-i}. \tag{14}$$

Using the pdf (4) and cdf (3) in equation (13), the pdf of the rth order statistic of the Inv-A(α) distribution is

$$f_{X_{(r)}}(x) = \frac{n!}{(r-1)!\,(n-r)!}\sum_{k=0}^{r-1}\binom{r-1}{k}(-1)^k \exp\left[\frac{k+n-r+1}{\alpha}\left(1-e^{\alpha x}\right)+\alpha x\right]. \tag{15}$$

Using equation (3) in (14), the cdf of the rth order statistic becomes

$$F_{X_{(r)}}(x) = \sum_{i=r}^{n}\sum_{k=0}^{i}\binom{n}{i}\binom{i}{k}(-1)^k \exp\left[\frac{n+k-i}{\alpha}\left(1-e^{\alpha x}\right)\right]. \tag{16}$$

In particular, the densities of the smallest and largest order statistics of the Inv-A(α) distribution are obtained by substituting r = 1 and r = n respectively in (15). The pdf of the smallest order statistic $X_{(1)}$ is

$$f_{X_{(1)}}(x) = n\exp\left[\frac{n}{\alpha}\left(1-e^{\alpha x}\right)+\alpha x\right]$$

and the pdf of the largest order statistic $X_{(n)}$ is

$$f_{X_{(n)}}(x) = n\sum_{k=0}^{n-1}\binom{n-1}{k}(-1)^k \exp\left[\frac{k+1}{\alpha}\left(1-e^{\alpha x}\right)+\alpha x\right].$$
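One way to check the binomial expansion behind (15) is to compare it numerically with the general formula (13) evaluated with the Inv-A(α) pdf and cdf. The following sketch does this for one illustrative choice of n, α and x; all names and values are assumptions made for the example.

```python
import math

def log_sf(x, alpha):
    """log S(x) = (1 - e^(alpha x)) / alpha, from the survival function (5)."""
    return -math.expm1(alpha * x) / alpha

def order_pdf_general(x, r, n, alpha):
    """Equation (13) evaluated with the Inv-A pdf (4) and cdf (3)."""
    s = math.exp(log_sf(x, alpha))
    f = math.exp(alpha * x) * s
    coef = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    return coef * f * (1.0 - s) ** (r - 1) * s ** (n - r)

def order_pdf_series(x, r, n, alpha):
    """Binomial-expansion form (15) of the same density."""
    coef = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    ls = log_sf(x, alpha)
    total = 0.0
    for k in range(r):
        total += (math.comb(r - 1, k) * (-1) ** k
                  * math.exp((k + n - r + 1) * ls + alpha * x))
    return coef * total

n, alpha, x = 5, 1.2, 0.4  # illustrative values
for r in range(1, n + 1):
    print(r, order_pdf_general(x, r, n, alpha), order_pdf_series(x, r, n, alpha))
```

The alternating sum in (15) can lose precision when r is large, so for numerical work the product form (13) is usually the safer choice.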

3.4. Ageing intensity function

The ageing intensity (AI) function was developed by [15], according to whom a unimodal failure rate can be represented as approximately decreasing, approximately increasing or approximately constant. [16] investigated various features of AI functions, whereas [17] discussed the AI function in the field of reliability theory. The AI function of a positive random variable X, denoted by $L_X(t)$ for t > 0, is the ratio of the instantaneous failure rate to a baseline (average) failure rate. It is defined as

$$L_X(t) = \frac{h(t)}{H(t)} = \frac{-t f(t)}{S(t)\ln S(t)}, \quad t > 0,$$

where f(.) and S(.) are the probability density function and survival function of the random variable X respectively. Here H(t) is the failure rate average (not the reversed hazard rate of (7)), and it can be written as $H(t) = \frac{1}{t}\int_0^t h(u)\,du$. If X ~ Inv-A(α), the AI function is obtained as

$$L_X(t) = \frac{\alpha t\, e^{\alpha t}}{e^{\alpha t}-1}. \tag{17}$$

The AI function is uniquely determined by the failure rate function; however, the converse is not true. The higher the value of the AI function, the stronger the ageing tendency of the random variable. If the failure rate is constant then AI = 1; if the failure rate is increasing then AI > 1; and if the failure rate is decreasing then AI < 1.
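The AI function of the Inv-A(α) distribution follows from h(t) = exp(αt) and the failure rate average (exp(αt) - 1)/(αt). The sketch below evaluates that closed form and compares it with the general definition -t f(t) / (S(t) ln S(t)); the function names and test points are illustrative assumptions.

```python
import math

def ageing_intensity(t, alpha):
    """Closed form: alpha t e^(alpha t) / (e^(alpha t) - 1), written stably."""
    return alpha * t / (-math.expm1(-alpha * t))

def ageing_intensity_general(t, alpha):
    """General definition -t f(t) / (S(t) ln S(t)) for the Inv-A model."""
    log_s = -math.expm1(alpha * t) / alpha   # ln S(t)
    s = math.exp(log_s)                      # S(t)
    f = math.exp(alpha * t) * s              # f(t) = h(t) S(t)
    return -t * f / (s * log_s)

for t in (0.1, 0.5, 1.0):
    print(t, ageing_intensity(t, 2.0), ageing_intensity_general(t, 2.0))
```

Values above one for every t > 0 are consistent with the increasing hazard rate noted in Section 3.1.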

3.5. Stochastic ordering

The notion of stochastic ordering was first suggested by [18] and is used to compare the behaviour of two positive continuous random variables. Suppose X and Y are two random variables with respective cdfs $F_X$ and $F_Y$. Then X is said to be smaller than Y in the following cases:

• Stochastic order ($X \le_{st} Y$) if $F_X(x) \ge F_Y(x)$ for all x;

• Hazard rate order ($X \le_{hr} Y$) if $h_X(x) \ge h_Y(x)$ for all x;

• Mean residual life order ($X \le_{mrl} Y$) if $m_X(x) \le m_Y(x)$ for all x;

• Likelihood ratio order ($X \le_{lr} Y$) if $f_X(x)/f_Y(x)$ decreases in x.

The following well-known implications relate these orderings:

$$X \le_{lr} Y \;\Rightarrow\; X \le_{hr} Y \;\Rightarrow\; X \le_{mrl} Y, \qquad X \le_{hr} Y \;\Rightarrow\; X \le_{st} Y.$$

When the required conditions are met, the Inv-A(α) distribution is ordered with respect to the strongest of these, the likelihood ratio ordering, as stated in the following theorem.

Theorem 1. Let X ~ Inv-A($\alpha_1$) and Y ~ Inv-A($\alpha_2$). If $\alpha_1 < \alpha_2$, then $X \le_{lr} Y$, and hence the other orderings follow.

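Proof. By definition, the likelihood ratio is

$$\xi(x) = \frac{f_X(x)}{f_Y(x)} = \frac{\exp\left[\frac{1}{\alpha_1}\left(1-\exp(\alpha_1 x)\right)+\alpha_1 x\right]}{\exp\left[\frac{1}{\alpha_2}\left(1-\exp(\alpha_2 x)\right)+\alpha_2 x\right]} = \exp\left[\frac{1}{\alpha_1}-\frac{1}{\alpha_2}+(\alpha_1-\alpha_2)x-\frac{e^{\alpha_1 x}}{\alpha_1}+\frac{e^{\alpha_2 x}}{\alpha_2}\right].$$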

S. Bhunia, P. Banerjee RT&A, No 1 (67)

Inverse A(a) Distribution Volume 17, March 2022

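Now differentiating with respect to x, we get

$$\frac{\xi'(x)}{\xi(x)} = (\alpha_1-\alpha_2)+e^{\alpha_2 x}-e^{\alpha_1 x}$$

$$\Rightarrow\; \xi'(x) = \xi(x)\left\{(\alpha_1-\alpha_2)+e^{\alpha_2 x}-e^{\alpha_1 x}\right\}$$

$$\Rightarrow\; \xi'(x) < 0 \ \text{ if } \ \alpha_1 < \alpha_2.$$

Therefore, ξ(x) is a decreasing function of x if $\alpha_1 < \alpha_2$ and hence $X \le_{lr} Y$. The remaining orderings can be established in a similar manner.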

4. Methods of Estimation

In this section, we describe several parameter estimation techniques for the Inv-A(α) distribution from the frequentist viewpoint, namely maximum likelihood estimation (MLE), maximum product of spacings (MPS), ordinary least squares (OLS), weighted least squares (WLS), Cramér-von Mises (CVM) and Anderson-Darling (AD) estimation.

4.1. Method of Maximum Likelihood

Here, we discuss the maximum likelihood estimation (MLE) method for estimating the unknown scale parameter α. Desirable properties such as consistency, asymptotic efficiency and invariance make this the most popular estimation technique [19]. Let $x_1, x_2, \ldots, x_n$ be an observed random sample from the Inv-A(α) distribution with pdf (4). The likelihood function is

$$L(\alpha) = \prod_{i=1}^{n} f(x_i;\alpha) = \prod_{i=1}^{n}\exp\left[\frac{1}{\alpha}\left(1-\exp(\alpha x_i)\right)+\alpha x_i\right].$$

So, the log-likelihood function becomes

$$\log L = \frac{n}{\alpha}-\frac{1}{\alpha}\sum_{i=1}^{n}e^{\alpha x_i}+\alpha\sum_{i=1}^{n}x_i. \tag{18}$$

Differentiating log L in (18) with respect to α and equating to zero gives the non-linear equation

$$\frac{\partial \log L}{\partial\alpha} = \frac{1}{\alpha^2}\sum_{i=1}^{n}e^{\alpha x_i}-\frac{1}{\alpha}\sum_{i=1}^{n}x_i e^{\alpha x_i}-\frac{n}{\alpha^2}+\sum_{i=1}^{n}x_i = 0,$$

that is,

$$\sum_{i=1}^{n}e^{\alpha x_i}-\alpha\sum_{i=1}^{n}x_i e^{\alpha x_i}+\alpha^2\sum_{i=1}^{n}x_i = n. \tag{19}$$

The solution of the non-linear equation (19) gives the MLE of the parameter α. As the equation cannot be solved analytically, an iterative technique such as the Newton-Raphson method may be adopted to obtain the MLE.
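A minimal sketch of the numerical step is given below. It uses bisection on the score (the derivative of (18)) instead of Newton-Raphson, which is a swapped-in technique chosen here because each term of the log-likelihood is concave in α, so the score has at most one sign change. The sampler, the function names, the seed, the starting bracket and the simulated data are all illustrative assumptions, not the authors' implementation.

```python
import math
import random

def sample_inv_a(n, alpha, rng):
    """Inverse-transform sampling through the quantile function (9)."""
    return [math.log1p(-alpha * math.log1p(-rng.random())) / alpha
            for _ in range(n)]

def score(alpha, xs):
    """Derivative of the log-likelihood (18) with respect to alpha."""
    return sum(math.expm1(alpha * x) / alpha ** 2
               - x * math.exp(alpha * x) / alpha + x for x in xs)

def equation_19_residual(alpha, xs):
    """Left-hand side of (19) minus n; zero at an interior MLE."""
    return (sum(math.exp(alpha * x) for x in xs)
            - alpha * sum(x * math.exp(alpha * x) for x in xs)
            + alpha ** 2 * sum(xs) - len(xs))

def mle_alpha(xs, lo=1e-3, hi=1.0, tol=1e-10):
    """Bisection on the score; returns lo if the score is never positive."""
    if score(lo, xs) <= 0.0:
        return lo
    while score(hi, xs) > 0.0:   # expand the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, xs) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(2022)               # fixed seed for repeatability
data = sample_inv_a(2000, 2.0, rng)     # simulated data, true alpha = 2
alpha_hat = mle_alpha(data)
print("MLE:", alpha_hat, "residual of (19):", equation_19_residual(alpha_hat, data))
```

Working with the score rather than with (19) directly avoids the spurious root at α = 0 that appears after multiplying through by α².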

4.2. Method of maximum product of spacings

For the estimation of unknown parameters of continuous univariate distributions, the maximum product of spacings (MPS) approach provides a strong alternative to MLE. [20] initially discussed the use of the MPS estimation method, whereas [22] demonstrated that this technique is as efficient as the MLE and consistent across a wider range of situations. Independently, [21] developed the MPS approach as an approximation to the Kullback-Leibler information measure. Recently, [23], [24], [25] and [26], among others, applied this approach to parameter estimation problems.

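According to the procedure, the uniform spacings of a random sample drawn from the Inv-A(α) distribution are defined as

$$D_i(\alpha) = F(x_{i:n};\alpha) - F(x_{i-1:n};\alpha), \quad i = 1, 2, \ldots, n+1,$$

where $F(x_{0:n};\alpha) = 0$ and $F(x_{n+1:n};\alpha) = 1$. Clearly, $\sum_{i=1}^{n+1} D_i(\alpha) = 1$.

The maximum product of spacings estimate $\hat{\alpha}_{MPS}$ is obtained by maximizing the geometric mean of the spacings,

$$G(\alpha) = \left[\prod_{i=1}^{n+1} D_i(\alpha)\right]^{1/(n+1)}$$

with respect to α, or equivalently, by maximizing the logarithm of the geometric mean of the sample spacings:

$$\eta(\alpha) = \frac{1}{n+1}\sum_{i=1}^{n+1}\log D_i(\alpha). \tag{20}$$

Writing $x_i$ for $x_{i:n}$, the estimate of the parameter α can be obtained by solving the following non-linear equation:

$$\frac{e^{\frac{1}{\alpha}(1-e^{\alpha x_1})}\left\{\frac{1}{\alpha}x_1 e^{\alpha x_1}-\frac{1}{\alpha^2}\left(e^{\alpha x_1}-1\right)\right\}}{1-e^{\frac{1}{\alpha}(1-e^{\alpha x_1})}} + \sum_{i=2}^{n}\frac{e^{\frac{1}{\alpha}(1-e^{\alpha x_{i-1}})}\left\{\frac{1}{\alpha^2}\left(e^{\alpha x_{i-1}}-1\right)-\frac{1}{\alpha}x_{i-1}e^{\alpha x_{i-1}}\right\} - e^{\frac{1}{\alpha}(1-e^{\alpha x_i})}\left\{\frac{1}{\alpha^2}\left(e^{\alpha x_i}-1\right)-\frac{1}{\alpha}x_i e^{\alpha x_i}\right\}}{e^{\frac{1}{\alpha}(1-e^{\alpha x_{i-1}})}-e^{\frac{1}{\alpha}(1-e^{\alpha x_i})}} + \frac{1}{\alpha^2}\left(e^{\alpha x_n}-1\right)-\frac{1}{\alpha}x_n e^{\alpha x_n} = 0. \tag{21}$$

Since the above non-linear equation has no closed-form solution, it cannot be solved analytically. We therefore solve it numerically in the next section using an iterative technique.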
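Rather than solving (21), one can maximize the objective (20) directly. The sketch below does so with a golden-section search, which is a swapped-in technique and assumes the objective has a single interior maximum on the search interval. The interval [0.001, 20], the seed, the simulated data and all function names are illustrative assumptions.

```python
import math
import random

def sample_inv_a(n, alpha, rng):
    """Inverse-transform sampling through the quantile function (9)."""
    return [math.log1p(-alpha * math.log1p(-rng.random())) / alpha
            for _ in range(n)]

def neg_mean_log_spacing(alpha, xs_sorted):
    """Negative of (20), computed from log S to avoid underflow."""
    total = 0.0
    prev = 0.0                                   # log S(x_{0:n}) = log 1
    for x in xs_sorted:
        cur = -math.expm1(alpha * x) / alpha     # log S(x_{i:n})
        diff = -math.expm1(cur - prev)           # 1 - S_i / S_{i-1}
        if diff <= 0.0:                          # tied observations
            return math.inf
        total += prev + math.log(diff)           # log(S_{i-1} - S_i)
        prev = cur
    total += prev                                # last spacing equals S(x_{n:n})
    return -total / (len(xs_sorted) + 1)

def golden_section_min(func, lo, hi, tol=1e-8):
    """Golden-section search for a minimum of func on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - ratio * (hi - lo)
    d = lo + ratio * (hi - lo)
    fc, fd = func(c), func(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - ratio * (hi - lo)
            fc = func(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + ratio * (hi - lo)
            fd = func(d)
    return 0.5 * (lo + hi)

def mps_alpha(xs, lo=1e-3, hi=20.0):
    xs_sorted = sorted(xs)
    return golden_section_min(lambda a: neg_mean_log_spacing(a, xs_sorted), lo, hi)

rng = random.Random(2022)               # fixed seed for repeatability
data = sample_inv_a(2000, 2.0, rng)     # simulated data, true alpha = 2
alpha_mps = mps_alpha(data)
print("MPS estimate:", alpha_mps)
```

Tied observations give zero spacings, for which the logarithm is undefined; the sketch simply rejects such parameter values and does not implement any tie correction.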

4.3. Ordinary and weighted least square estimation

The ordinary least squares and weighted least squares estimators are two conventional estimation procedures developed by [27] in the context of estimating the parameters of the Beta distribution. Let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be the ordered sample of size n from a distribution with cdf $F(x;\alpha)$, and write $x_i$ for the ith ordered value $x_{i:n}$. Then the ordinary least squares estimator $\hat{\alpha}_{OLS}$ is obtained by minimizing

$$OLS = \sum_{i=1}^{n}\left[F(x_{i:n};\alpha)-\frac{i}{n+1}\right]^2$$

with respect to α. The OLS estimator of the parameter of the Inv-A(α) distribution can be obtained by solving the following non-linear equation:

$$\sum_{i=1}^{n}\left[1-e^{\frac{1}{\alpha}(1-e^{\alpha x_i})}-\frac{i}{n+1}\right]e^{\frac{1}{\alpha}(1-e^{\alpha x_i})}\left\{\frac{1}{\alpha^2}\left(e^{\alpha x_i}-1\right)-\frac{1}{\alpha}x_i e^{\alpha x_i}\right\} = 0. \tag{22}$$

Similarly, the weighted least squares (WLS) estimate of the unknown parameter is obtained by minimizing

$$WLS = \sum_{i=1}^{n} w_i\left[F(x_{i:n};\alpha)-\frac{i}{n+1}\right]^2$$

with respect to α, where $w_i = \frac{(n+1)^2(n+2)}{i(n-i+1)}$ is the weight at the ith point.

Using equation (3) in the above expression and differentiating with respect to α, we obtain $\hat{\alpha}_{WLS}$ by solving the following non-linear equation:

$$\sum_{i=1}^{n}\frac{(n+1)^2(n+2)}{i(n-i+1)}\left[1-e^{\frac{1}{\alpha}(1-e^{\alpha x_i})}-\frac{i}{n+1}\right]e^{\frac{1}{\alpha}(1-e^{\alpha x_i})}\left\{\frac{1}{\alpha^2}\left(e^{\alpha x_i}-1\right)-\frac{1}{\alpha}x_i e^{\alpha x_i}\right\} = 0. \tag{23}$$
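Both least squares estimators can also be found by minimizing their objective functions directly instead of solving (22) and (23). The sketch below does this with a golden-section search, under the assumption that each objective has a single minimum on the search interval; the weights are the ones stated above. The interval, the seed, the simulated data and the function names are illustrative assumptions.

```python
import math
import random

def sample_inv_a(n, alpha, rng):
    """Inverse-transform sampling through the quantile function (9)."""
    return [math.log1p(-alpha * math.log1p(-rng.random())) / alpha
            for _ in range(n)]

def inv_a_cdf(x, alpha):
    """cdf (3) of the Inv-A distribution for x >= 0."""
    return -math.expm1(-math.expm1(alpha * x) / alpha)

def ols_objective(alpha, xs_sorted):
    """Sum of squared differences between F(x_{i:n}) and i / (n + 1)."""
    n = len(xs_sorted)
    return sum((inv_a_cdf(x, alpha) - i / (n + 1)) ** 2
               for i, x in enumerate(xs_sorted, start=1))

def wls_objective(alpha, xs_sorted):
    """Same squared differences, weighted by (n+1)^2 (n+2) / (i (n-i+1))."""
    n = len(xs_sorted)
    return sum((n + 1) ** 2 * (n + 2) / (i * (n - i + 1))
               * (inv_a_cdf(x, alpha) - i / (n + 1)) ** 2
               for i, x in enumerate(xs_sorted, start=1))

def golden_section_min(func, lo, hi, tol=1e-8):
    """Golden-section search for a minimum of func on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - ratio * (hi - lo)
    d = lo + ratio * (hi - lo)
    fc, fd = func(c), func(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - ratio * (hi - lo)
            fc = func(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + ratio * (hi - lo)
            fd = func(d)
    return 0.5 * (lo + hi)

rng = random.Random(2022)                       # fixed seed for repeatability
data = sorted(sample_inv_a(2000, 2.0, rng))     # simulated data, true alpha = 2
alpha_ols = golden_section_min(lambda a: ols_objective(a, data), 1e-3, 20.0)
alpha_wls = golden_section_min(lambda a: wls_objective(a, data), 1e-3, 20.0)
print("OLS:", alpha_ols, "WLS:", alpha_wls)
```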

4.4. Minimum distance estimators

In this subsection, we briefly present two estimation approaches for the unknown parameter α of the proposed lifetime distribution, both based on minimizing a goodness-of-fit statistic with respect to α. This class of statistics is defined through the discrepancies between the estimated cdf and the empirical distribution function [28], [29].

4.4.1 Cramér-von Mises estimation

The Cramér-von Mises (CVM) estimator is a minimum distance estimator, computed from the discrepancies between the estimated cumulative distribution function and the empirical distribution function. This estimator is also known as a maximum goodness-of-fit estimator. For more details about this method we refer to [28, 29, 30]. [31] justified the use of Cramér-von Mises type minimum distance estimators by demonstrating that their bias is lower than that of other minimum distance estimators.

Let $x_1 < x_2 < \cdots < x_n$ be the ordered sample from the pdf (4). Then the Cramér-von Mises estimator $\hat{\alpha}_{CVM}$ is obtained by minimizing Z with respect to α, where

$$Z = \frac{1}{12n}+\sum_{i=1}^{n}\left[F(x_i)-\frac{2i-1}{2n}\right]^2 = \frac{1}{12n}+\sum_{i=1}^{n}\left[1-e^{\frac{1}{\alpha}(1-e^{\alpha x_i})}-\frac{2i-1}{2n}\right]^2.$$

Thus, the CVM estimator can be obtained by solving the following non-linear equation:

$$\sum_{i=1}^{n}\left[1-e^{\frac{1}{\alpha}(1-e^{\alpha x_i})}-\frac{2i-1}{2n}\right]e^{\frac{1}{\alpha}(1-e^{\alpha x_i})}\left\{\frac{1}{\alpha^2}\left(e^{\alpha x_i}-1\right)-\frac{1}{\alpha}x_i e^{\alpha x_i}\right\} = 0. \tag{24}$$

4.4.2 Anderson-Darling estimation

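The Anderson-Darling (AD) estimator is another type of minimum distance estimator. It is based on the Anderson-Darling statistic, which was introduced as an alternative to traditional statistical tests for detecting departures of sample distributions from normality [32]. The AD estimate $\hat{\alpha}_{AD}$ of the parameter is obtained by minimizing the following expression with respect to α:

$$A = -n-\frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\log F(x_{i:n})+\log\left(1-F(x_{n+1-i:n})\right)\right].$$

Differentiating the above expression with respect to α and equating to zero, with $x_i$ written for $x_{i:n}$, gives the following non-linear equation, which is solved numerically to obtain the AD estimator:

$$\sum_{i=1}^{n}(2i-1)\left[\frac{e^{\frac{1}{\alpha}(1-e^{\alpha x_i})}\left\{\frac{1}{\alpha}x_i e^{\alpha x_i}-\frac{1}{\alpha^2}\left(e^{\alpha x_i}-1\right)\right\}}{1-e^{\frac{1}{\alpha}(1-e^{\alpha x_i})}}-\left\{\frac{1}{\alpha}x_{n+1-i}\,e^{\alpha x_{n+1-i}}-\frac{1}{\alpha^2}\left(e^{\alpha x_{n+1-i}}-1\right)\right\}\right] = 0. \tag{25}$$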
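The two minimum distance estimators can be sketched by minimizing the statistics Z and A directly. As before, a golden-section search is used as a stand-in for solving the estimating equations, under the assumption that each statistic has a single minimum on the search interval. The interval, seed, simulated data and function names are illustrative assumptions.

```python
import math
import random

def sample_inv_a(n, alpha, rng):
    """Inverse-transform sampling through the quantile function (9)."""
    return [math.log1p(-alpha * math.log1p(-rng.random())) / alpha
            for _ in range(n)]

def log_sf(x, alpha):
    """log[1 - F(x)] = (1 - e^(alpha x)) / alpha."""
    return -math.expm1(alpha * x) / alpha

def inv_a_cdf(x, alpha):
    return -math.expm1(log_sf(x, alpha))

def cvm_objective(alpha, xs_sorted):
    """Cramer-von Mises statistic Z."""
    n = len(xs_sorted)
    return 1.0 / (12 * n) + sum(
        (inv_a_cdf(x, alpha) - (2 * i - 1) / (2 * n)) ** 2
        for i, x in enumerate(xs_sorted, start=1))

def ad_objective(alpha, xs_sorted):
    """Anderson-Darling statistic A."""
    n = len(xs_sorted)
    total = 0.0
    for i, x in enumerate(xs_sorted, start=1):
        log_f = math.log(inv_a_cdf(x, alpha))         # log F(x_{i:n})
        log_s = log_sf(xs_sorted[n - i], alpha)       # log[1 - F(x_{n+1-i:n})]
        total += (2 * i - 1) * (log_f + log_s)
    return -n - total / n

def golden_section_min(func, lo, hi, tol=1e-8):
    """Golden-section search for a minimum of func on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - ratio * (hi - lo)
    d = lo + ratio * (hi - lo)
    fc, fd = func(c), func(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - ratio * (hi - lo)
            fc = func(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + ratio * (hi - lo)
            fd = func(d)
    return 0.5 * (lo + hi)

rng = random.Random(2022)                       # fixed seed for repeatability
data = sorted(sample_inv_a(2000, 2.0, rng))     # simulated data, true alpha = 2
alpha_cvm = golden_section_min(lambda a: cvm_objective(a, data), 1e-3, 20.0)
alpha_ad = golden_section_min(lambda a: ad_objective(a, data), 1e-3, 20.0)
print("CVM:", alpha_cvm, "AD:", alpha_ad)
```

The AD statistic takes the logarithm of F at the smallest observation, so an observation equal to zero would need separate handling that this sketch omits.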

5. Simulation study for different estimation methods

In this section, a Monte Carlo simulation study is performed to investigate the behaviour of the proposed estimators. Performance is evaluated through the root mean square error (RMSE) of six estimates: maximum likelihood (MLE), maximum product of spacings (MPS), ordinary least squares (OLS), weighted least squares (WLS), Cramér-von Mises (CVM) and Anderson-Darling (AD). We generate K = 1000 random samples $X_1, X_2, \ldots, X_n$ of sizes n = 10, 25, 50, 75, 100 from the Inv-A(α) distribution using the inverse transformation method. The parameter values considered are α = 0.1, 0.5, 1.0, 3.0, 5.0. We calculate the ML, MPS, OLS, WLS, CVM and AD estimates for every choice of the scale parameter. The numerical outcomes are reported in Table 1, which displays the average estimates and the corresponding RMSE values.
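The structure of such a simulation can be sketched as follows for the MLE alone. The sketch draws samples by inverse transformation, estimates α in each replicate and reports the usual RMSE, the square root of the mean squared deviation from the true value. It makes no attempt to reproduce the figures in Table 1: the number of replicates, the seed, the bisection-based estimator and the function names are all illustrative assumptions.

```python
import math
import random

def sample_inv_a(n, alpha, rng):
    """Inverse-transform sampling through the quantile function (9)."""
    return [math.log1p(-alpha * math.log1p(-rng.random())) / alpha
            for _ in range(n)]

def score(alpha, xs):
    """Derivative of the log-likelihood (18) with respect to alpha."""
    return sum(math.expm1(alpha * x) / alpha ** 2
               - x * math.exp(alpha * x) / alpha + x for x in xs)

def mle_alpha(xs, lo=1e-3, hi=1.0, tol=1e-8):
    """Bisection on the score; returns lo if the score is never positive."""
    if score(lo, xs) <= 0.0:
        return lo
    while score(hi, xs) > 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, xs) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate(alpha, n, replicates, rng):
    """Return (average estimate, RMSE) of the MLE over the replicates."""
    estimates = [mle_alpha(sample_inv_a(n, alpha, rng))
                 for _ in range(replicates)]
    mean = sum(estimates) / replicates
    rmse = math.sqrt(sum((e - alpha) ** 2 for e in estimates) / replicates)
    return mean, rmse

rng = random.Random(67)      # fixed seed for repeatability
for n in (10, 50, 100):
    mean, rmse = simulate(1.0, n, 200, rng)
    print(f"n={n:3d}  average estimate={mean:.4f}  RMSE={rmse:.4f}")
```

The other five estimators could be slotted into `simulate` in the same way by replacing `mle_alpha` with the corresponding objective-minimizing routine.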

Table 1: Average estimates and the associated RMSEs (in parentheses) for the Inv-A(α) distribution

| α | n | MLE | MPS | OLS | WLS | CVM | AD |
|---|---|---|---|---|---|---|---|
| 0.1 | 10 | 0.334841 (0.007426) | 0.071176 (0.000912) | 0.322474 (0.007035) | 0.353796 (0.008026) | 0.426889 (0.010337) | 0.169213 (0.002189) |
| 0.1 | 25 | 0.189182 (0.002820) | 0.061142 (0.001229) | 0.180246 (0.002538) | 0.174847 (0.002367) | 0.224161 (0.003926) | 0.123333 (0.000738) |
| 0.1 | 50 | 0.145191 (0.001429) | 0.068292 (0.001002) | 0.124540 (0.000776) | 0.124579 (0.000777) | 0.146387 (0.001467) | 0.108763 (0.000277) |
| 0.1 | 75 | 0.128844 (0.000912) | 0.071494 (0.000901) | 0.116508 (0.000522) | 0.116079 (0.000508) | 0.130995 (0.000980) | 0.105799 (0.000183) |
| 0.1 | 100 | 0.123515 (0.000744) | 0.076731 (0.000736) | 0.112897 (0.000408) | 0.114688 (0.000464) | 0.123671 (0.000749) | 0.106887 (0.000218) |
| 0.5 | 10 | 0.752290 (0.007978) | 0.436867 (0.001996) | 0.716765 (0.006855) | 0.739353 (0.007569) | 0.844578 (0.010897) | 0.572157 (0.002282) |
| 0.5 | 25 | 0.595096 (0.003007) | 0.438685 (0.001939) | 0.578822 (0.002493) | 0.575220 (0.002379) | 0.627734 (0.004040) | 0.526648 (0.000843) |
| 0.5 | 50 | 0.548418 (0.001531) | 0.454349 (0.001444) | 0.523074 (0.000730) | 0.524836 (0.000785) | 0.547285 (0.001495) | 0.509773 (0.000309) |
| 0.5 | 75 | 0.530561 (0.000966) | 0.460606 (0.001246) | 0.515562 (0.000492) | 0.516496 (0.000522) | 0.531643 (0.001000) | 0.506529 (0.000206) |
| 0.5 | 100 | 0.525595 (0.000809) | 0.468765 (0.000988) | 0.512467 (0.000394) | 0.515438 (0.000488) | 0.524463 (0.000774) | 0.508000 (0.000253) |
| 1.0 | 10 | 1.275037 (0.008697) | 0.904548 (0.003018) | 1.226252 (0.007155) | 1.223553 (0.007069) | 1.425039 (0.013441) | 1.077413 (0.002448) |
| 1.0 | 25 | 1.103888 (0.003285) | 0.918020 (0.002592) | 1.079842 (0.002525) | 1.076663 (0.002424) | 1.133836 (0.004232) | 1.030476 (0.000964) |
| 1.0 | 50 | 1.053224 (0.001683) | 0.941482 (0.001850) | 1.022732 (0.000719) | 1.026182 (0.000828) | 1.049406 (0.001562) | 1.010681 (0.000338) |
| 1.0 | 75 | 1.033451 (0.001058) | 0.950529 (0.001564) | 1.015402 (0.000487) | 1.017632 (0.000558) | 1.033153 (0.001048) | 1.007198 (0.000228) |
| 1.0 | 100 | 1.028529 (0.000902) | 0.961345 (0.001222) | 1.012633 (0.000399) | 1.016743 (0.000529) | 1.025899 (0.000819) | 1.00906 (0.000286) |
| 3.0 | 10 | 3.356249 (0.011266) | 2.809940 (0.006010) | 3.258877 (0.008186) | 3.224152 (0.007088) | 3.430144 (0.013602) | 3.098857 (0.003126) |
| 3.0 | 25 | 3.136043 (0.004302) | 2.857535 (0.004505) | 3.090781 (0.002871) | 3.091121 (0.002882) | 3.160051 (0.005062) | 3.042742 (0.001352) |
| 3.0 | 50 | 3.070314 (0.002224) | 2.903247 (0.003060) | 3.024113 (0.000763) | 3.032452 (0.001026) | 3.058492 (0.001850) | 3.013378 (0.000423) |
| 3.0 | 75 | 3.044011 (0.001392) | 2.920498 (0.002514) | 3.016559 (0.000524) | 3.022434 (0.000709) | 3.039508 (0.001249) | 3.009221 (0.000292) |
| 3.0 | 100 | 3.038722 (0.001224) | 2.939091 (0.001926) | 3.014403 (0.000455) | 3.021810 (0.000690) | 3.031602 (0.000999) | 3.012286 (0.000389) |
| 5.0 | 10 | 5.426122 (0.013475) | 4.735217 (0.008373) | 5.287511 (0.009092) | 5.219631 (0.006945) | 5.491406 (0.01554) | 5.11804 (0.003733) |
| 5.0 | 25 | 5.16381 (0.005180) | 4.809465 (0.006025) | 5.102561 (0.003243) | 5.104967 (0.003319) | 5.184119 (0.005822) | 5.052456 (0.001659) |
| 5.0 | 50 | 5.084771 (0.002681) | 4.872474 (0.004033) | 5.026128 (0.000826) | 5.038306 (0.001211) | 5.06668 (0.002109) | 5.015507 (0.00049) |
| 5.0 | 75 | 5.053036 (0.001677) | 4.896338 (0.003278) | 5.018092 (0.000572) | 5.026767 (0.000846) | 5.045195 (0.001429) | 5.010834 (0.000343) |
| 5.0 | 100 | 5.047319 (0.001496) | 4.92115 (0.002493) | 5.016312 (0.000516) | 5.026294 (0.000831) | 5.036646 (0.001159) | 5.014891 (0.000471) |

Figure 3: RMSE of α under the six estimation methods (MLE, MPS, OLS, WLS, CVM, AD) as the sample size n varies: (a) RMSE for α = 0.5, (b) RMSE for α = 5.

According to Table 1, as the sample size increases, the RMSEs of the ML, MPS, OLS, WLS, CVM and AD estimates of the scale parameter decrease. Hence, all the estimators are consistent. It is also observed that the RMSEs of the ML, MPS, WLS, CVM and AD estimates increase as the scale parameter increases. For the small sample size n = 10, the MPS estimate performs best when α < 1. Overall, the AD estimate is the most effective, as it produces the smallest RMSE in most of the cases considered in our study. These results are also confirmed by Figure 3.

6. Real data application of the Inv-A(α) distribution

A real dataset is considered with the goal of evaluating the potential of the Inv-A(α) distribution by comparing it with other well-known distributions available in the literature. The competing models, all belonging to the inverse family, are the inverse exponential (IE) [6], inverse Xgamma (IXg) [9], inverse Lindley (IL) [8], inverse Gamma (IG) [11], inverse Kumaraswamy (IK) [12], inverse Weibull (IW) [7], inverted Nadarajah-Haghighi (INH) [33], exponentiated inverse Rayleigh (EIR) [34] and inverse power Lindley (IPL) [10] distributions. The parameters of the considered models are estimated by the MLE approach.

The data consists of death times (in weeks) of 52 patients having tongue cancer with an aneuploid DNA profile discussed by [35] and given by [36]. Recently, [37] used this dataset in their study. Patients with sexually transmitted illnesses who had a paraffin-embedded sample of malignant tissue obtained were chosen and the time frames to reinfection were estimated. Using a flow cytometer, the tissue samples were evaluated to see if the tumour had an aneuploid (abnormal) or diploid (normal) DNA profile, as described by [35].

To compare the considered models, several criteria are used: twice the negative log-likelihood (-2 log L), the Akaike information criterion (AIC), the corrected AIC (CAIC), the Bayesian information criterion (BIC) and the Hannan-Quinn information criterion (HQIC). The model with the minimum values of these statistics is considered the best. We also use goodness-of-fit tests, namely the Kolmogorov-Smirnov (K-S), Cramér-von Mises (CVM) and Anderson-Darling (AD) tests, along with their corresponding P values. The MLEs with their standard errors (in parentheses) and the values of -2 log L, AIC, BIC, CAIC and HQIC are displayed in Table 2; the K-S, CVM and AD statistics with their P values are displayed in Table 3. The Inv-A(α) distribution has the lowest values of all goodness-of-fit statistics and the largest P values among the competing models. As a result, the proposed model outperforms the other models for the tongue cancer data.

Figure 4: Empirical pdf, cdf, PP and sf plots for the tongue cancer data: (a) histogram, (b) empirical cdf, (c) PP plot, (d) empirical survival plot.
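The information criteria named above can be computed from -2 log L, the number of parameters k and the sample size n. The text does not state the formulas, so the sketch below assumes the standard definitions: AIC = -2 log L + 2k, CAIC = AIC + 2k(k+1)/(n-k-1), BIC = -2 log L + k log n and HQIC = -2 log L + 2k log(log n). The function name and dictionary layout are illustrative.

```python
import math

def information_criteria(neg2_loglik, k, n):
    """Standard model-selection criteria from -2 log L, k parameters, n observations."""
    aic = neg2_loglik + 2 * k
    return {
        "AIC": aic,
        "CAIC": aic + 2 * k * (k + 1) / (n - k - 1),
        "BIC": neg2_loglik + k * math.log(n),
        "HQIC": neg2_loglik + 2 * k * math.log(math.log(n)),
    }

# Inv-A(alpha) fit to the tongue cancer data: -2 log L = 80.9634, k = 1, n = 52
for name, value in information_criteria(80.9634, 1, 52).items():
    print(f"{name}: {value:.4f}")
```

Under these assumed definitions the computed values appear to agree with the criteria tabulated for the Inv-A(α) model to about three decimal places.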

Figure 4 shows the histogram with the fitted density, the empirical cdf, the PP plot and the fitted survival function overlaid on the empirical survival function (Kaplan-Meier estimate), which may be used to verify the goodness of fit of the proposed model. A graphical technique based on the total time on test (TTT) plot is also used to identify the shape of the hazard rate of the data. According to [38], the hrf is constant if the TTT plot is a straight diagonal, and increasing (or decreasing) if the TTT plot is concave (or convex). The hrf is U-shaped (bathtub) if the TTT plot is first convex and then concave; if it is first concave and then convex, the hrf is unimodal. The TTT plot in Figure 5 indicates that the empirical hrf of the tongue cancer dataset is monotonically increasing. Hence the proposed lifetime model Inv-A(α), which has an increasing hrf, is a plausible candidate for the cancer data.


Table 2: Analytical results of the Inv-A(α) distribution and the other competing models for the tongue cancer data

| Model | Estimates (SE) | -2 log L | AIC | CAIC | BIC | HQIC |
|---|---|---|---|---|---|---|
| Inv-A(α) | 0.2333 (0.1187) | 80.9634 | 82.9634 | 83.0434 | 84.9146 | 83.7114 |
| IE (θ) | 0.1766 (0.0245) | 146.0563 | 148.0563 | 148.1363 | 150.0075 | 148.8044 |
| IXg (θ) | 0.3741 (0.0351) | 216.1880 | 218.1880 | 218.2680 | 220.1393 | 218.9361 |
| IL (θ) | 0.3112 (0.0310) | 191.1509 | 193.1509 | 193.2309 | 195.1021 | 193.8989 |
| IG (α, θ) | 0.5805 (0.0953), 0.1025 (2.3912) | 133.2271 | 137.2271 | 137.4720 | 141.1295 | 138.7232 |
| IK (α, θ) | 2.5056 (0.3860), 1.6592 (0.3184) | 88.4087 | 92.4087 | 92.6536 | 96.3112 | 93.9048 |
| IW (α, θ) | 0.4113 (0.0790), 0.6745 (0.0619) | 121.5629 | 125.5629 | 125.8078 | 129.4654 | 127.0590 |
| INH (α, θ) | 1.2836 (0.4463), 0.4228 (0.0580) | 107.3259 | 111.3259 | 111.5708 | 115.2284 | 112.8220 |
| EIR (α, σ) | 0.1706 (0.0256), 0.0283 (0.0049) | 161.3631 | 165.3631 | 165.6080 | 169.2656 | 166.8592 |
| IPL (α, θ) | 0.5790 (0.0497), 0.7788 (0.1020) | 122.8562 | 126.8562 | 127.1011 | 130.7587 | 128.3524 |
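The information criteria in Table 2 can be cross-checked with a few lines of code. The sketch below assumes the standard definitions AIC = -2 log L + 2k, CAIC = AIC + 2k(k + 1)/(n - k - 1), BIC = -2 log L + k log n and HQIC = -2 log L + 2k log(log n), where k is the number of parameters. The sample size n = 52 is an assumption: it is not stated in this section, but it is the value that reproduces the reported figures. The function name `information_criteria` is illustrative.

```python
import math


def information_criteria(neg2_loglik, k, n):
    """Compute AIC, CAIC, BIC and HQIC from -2 log L.

    neg2_loglik: twice the negative maximised log-likelihood
    k: number of estimated parameters
    n: sample size (assumed to be 52 for the tongue cancer data)
    """
    aic = neg2_loglik + 2 * k
    caic = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = neg2_loglik + k * math.log(n)
    hqic = neg2_loglik + 2 * k * math.log(math.log(n))
    return {"AIC": aic, "CAIC": caic, "BIC": bic, "HQIC": hqic}


if __name__ == "__main__":
    # -2 log L values for Inv-A (one parameter) and IK (two parameters), from Table 2.
    for name, value, k in [("Inv-A", 80.9634, 1), ("IK", 88.4087, 2)]:
        crit = information_criteria(value, k, n=52)
        print(name, {key: round(val, 4) for key, val in crit.items()})
```

Because the penalty terms depend only on k and n, the ranking among the one-parameter models is driven entirely by -2 log L.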

Table 3: Goodness of fit measures of the Inv-A(α) distribution and the other competing models for the tongue cancer data

| Model | K-S | P value | CVM | P value | AD | P value |
|---|---|---|---|---|---|---|
| Inv-A(α) | 0.13896 | 0.26783 | 0.22006 | 0.23203 | 1.13202 | 0.29461 |
| IE (θ) | 0.40253 | 9.606×10⁻⁸ | 2.52648 | 4.603×10⁻⁷ | 12.57664 | 1.153×10⁻⁵ |
| IXg (θ) | 0.51125 | 3.130×10⁻¹² | 4.93455 | 0.00000 | 28.18699 | 1.153×10⁻⁵ |
| IL (θ) | 0.48784 | 3.563×10⁻¹¹ | 4.04569 | 0.00000 | 23.22221 | 1.153×10⁻⁵ |
| IG (α, θ) | 0.27891 | 6.130×10⁻⁴ | 1.21856 | 6.903×10⁻⁴ | 6.15950 | 8.319×10⁻⁴ |
| IK (α, θ) | 0.20757 | 2.265×10⁻² | 0.41892 | 6.405×10⁻² | 2.08487 | 8.278×10⁻² |
| IW (α, θ) | 0.21708 | 1.488×10⁻² | 0.81699 | 6.410×10⁻³ | 4.48490 | 5.123×10⁻³ |
| INH (α, θ) | 0.19485 | 3.857×10⁻² | 0.63488 | 1.800×10⁻² | 3.51034 | 1.531×10⁻² |
| EIR (α, σ) | 0.34511 | 8.350×10⁻⁶ | 2.08864 | 5.598×10⁻⁶ | 9.97822 | 1.867×10⁻⁵ |
| IPL (α, θ) | 0.21509 | 1.627×10⁻² | 0.81766 | 6.386×10⁻³ | 4.5175 | 4.941×10⁻³ |
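For readers who wish to reproduce statistics of the kind reported in Table 3, the following minimal sketch evaluates the K-S, CVM and AD statistics for a fully specified cdf using their standard computing formulas. It does not compute P values. The inverse exponential cdf F(x) = exp(-θ/x) is used only as a stand-in, since the Inv-A(α) cdf is not restated in this section; the function names and the sample data are hypothetical.

```python
import math


def gof_statistics(data, cdf):
    """Return the (K-S, CVM, AD) statistics of `data` against a fitted `cdf`."""
    u = sorted(cdf(x) for x in data)  # fitted cdf at the ordered observations
    n = len(u)
    # Kolmogorov-Smirnov: largest gap between empirical and fitted cdf.
    d_plus = max(i / n - ui for i, ui in enumerate(u, start=1))
    d_minus = max(ui - (i - 1) / n for i, ui in enumerate(u, start=1))
    ks = max(d_plus, d_minus)
    # Cramér-von Mises: W^2 = 1/(12n) + sum (u_(i) - (2i - 1)/(2n))^2.
    cvm = 1.0 / (12 * n) + sum(
        (ui - (2 * i - 1) / (2 * n)) ** 2 for i, ui in enumerate(u, start=1)
    )
    # Anderson-Darling: A^2 = -n - (1/n) sum (2i - 1)[log u_(i) + log(1 - u_(n+1-i))].
    ad = -n - sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
        for i in range(1, n + 1)
    ) / n
    return ks, cvm, ad


def inverse_exponential_cdf(theta):
    """Stand-in model: cdf of the inverse exponential distribution, x > 0."""
    return lambda x: math.exp(-theta / x)


if __name__ == "__main__":
    # Hypothetical positive observations and a hypothetical parameter value.
    sample = [0.4, 0.9, 1.3, 2.1, 3.5, 6.0]
    print(gof_statistics(sample, inverse_exponential_cdf(theta=1.2)))
```

Smaller values of all three statistics indicate closer agreement between the fitted and empirical cdfs, which is the sense in which the models in Table 3 are ranked.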

7. Conclusion

In medical science and reliability engineering, the development of distributions with an increasing hazard rate function is of considerable practical interest. In this article, we have presented the Inv-A(α) distribution and derived several of its properties, such as the quantile function, median, reliability function, hazard rate function, order statistics and ageing intensity function. The flexibility of this distribution stems mainly from its reliability behaviour, since it has an increasing hazard rate function. This feature enhances its applicability to real-world problems: for instance, it describes items that become more likely to fail with age, whether a human being or a machine whose parts wear out.

The model parameter is estimated by the ML, maximum product of spacings, ordinary and weighted least squares, CVM and AD methods.

Figure 5: TTT plot for the tongue cancer data (horizontal axis: i/n)

A Monte Carlo simulation study has been performed to investigate the performance of the resulting estimators, and all the estimators are found to be asymptotically unbiased and consistent. Among the estimation methods considered, the Anderson-Darling method outperforms the others. Furthermore, we consider the tongue cancer data to illustrate the applicability of the Inv-A(α) distribution in biomedical science. To assess the proposed model, we compared it with several competing models and found that it provides the best fit among them according to the goodness-of-fit measures. We therefore hope that the proposed model, from the family of inverse probability distributions, may be a viable choice for analysing medical data.
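The kind of Monte Carlo check summarised above can be sketched as follows. Since the Inv-A(α) density is not restated in this section, the sketch uses the inverse exponential distribution with cdf F(x) = exp(-θ/x) as a stand-in: if X follows this distribution, then 1/X is exponential with rate θ, and the MLE has the closed form n / Σ(1/x_i). The function name, the replication count and the seed are assumptions made for illustration and are not the settings used in the paper.

```python
import random


def simulate_ie_mle(theta, n, reps, seed=0):
    """Estimate the bias and MSE of the MLE of theta by simulation.

    Stand-in model: inverse exponential, cdf exp(-theta / x). The reciprocals
    1/X are drawn directly as exponential variates with rate theta.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        recip_sum = sum(rng.expovariate(theta) for _ in range(n))
        estimates.append(n / recip_sum)  # closed-form MLE
    bias = sum(estimates) / reps - theta
    mse = sum((e - theta) ** 2 for e in estimates) / reps
    return bias, mse


if __name__ == "__main__":
    for n in (10, 50, 200):
        bias, mse = simulate_ie_mle(theta=1.0, n=n, reps=2000)
        print(f"n={n:4d}  bias={bias:.4f}  mse={mse:.4f}")
```

Bias and MSE that shrink towards zero as n grows are the empirical signature of an asymptotically unbiased and consistent estimator.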

References

[1] Alshenawy, R. (2020). A new one parameter distribution: properties and estimation with applications to complete and type II censored data. Journal of Taibah University for Science, Taylor & Francis, 14(1):11-18.

[2] Marshall, A. W. and Olkin, I. (1997). A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families. Biometrika, Oxford University Press, 84(3):641-652.

[3] Shaw, W. T. and Buckley, I. R. C. (2009). The alchemy of probability distributions: beyond Gram-Charlier expansions, and a skew-kurtotic-normal distribution from a rank transmutation map. arXiv preprint arXiv:0901.0434.

[4] Kumar, D., Singh, U. and Singh, S. K. (2015). A method of proposing new distribution and its application to Bladder cancer patients data. J. Stat. Appl. Pro. Lett, 2(3):235-245.

[5] Mahdavi, A. and Kundu, D. (2017). A new method for generating distributions with an application to exponential distribution. Communications in Statistics-Theory and Methods, Taylor & Francis, 46(13):6543-6557.

[6] Keller, A. Z., Kamath, A. R. R. and Perera, U. D. (1982). Reliability analysis of CNC machine tools. Reliability Engineering, Elsevier, 3(6):449-473.

[7] Calabria, R. and Pulcini, G. On the maximum likelihood and least-squares estimation in the inverse Weibull distribution. Statistica Applicata, 2(1):53-66.

[8] Sharma, V. K., Singh, S. K., Singh, U. and Agiwal, V. (2015). The inverse Lindley distribution: a stress-strength reliability model with application to head and neck cancer data. Journal of Industrial and Production Engineering, Taylor & Francis, 32(3):162-173.

[9] Yadav, A. S., Maiti, S. S. and Saha, M. (2021). The inverse xgamma distribution: statistical properties and different methods of estimation. Annals of Data Science, Springer, 8(2): 275-293.

[10] Barco, K. V. P., Mazucheli, J. and Janeiro, V. (2017). The inverse power Lindley distribution. Communications in Statistics-Simulation and Computation, Taylor & Francis, 46(8): 6308-6323.

[11] Abid, S. H. and Al-Hassany, S. A. (2016). On the inverted gamma distribution. International Journal of Systems Science and Applied Mathematics, 1(3): 16-22.

[12] Abd AL-Fattah, A. M., El-Helbawy, A. A. and Al-Dayian, G. R. (2017). Inverted Kumaraswamy Distribution: Properties and Estimation. Pakistan Journal of Statistics, 33(1).

[13] Sakthivel, K. M. and Dhivakar, K. (2021). Transmuted Sine-Dagum Distribution and its Properties. Reliability: Theory & Applications, 16,4 (65):150-166.

[14] Glaser, R. E. (1980). Bathtub and Related Failure Rate Characterizations. Journal of the American Statistical Association, 75(371):667-672.

[15] Jiang, R., Ji, P., and Xiao, X. (2003). Aging property of unimodal failure rate models. Reliability Engineering & System Safety, Elsevier, 79(1): 113-116.

[16] Nanda, A. K., Bhattacharjee, S. and Alam, S. S. (2007). Properties of aging intensity function. Statistics & Probability Letters, Elsevier, 77(4):365-373.

[17] Bhattacharjee, S., Nanda, A. K. and Misra, S. Kr. (2013). Reliability analysis using ageing intensity function. Statistics & Probability Letters, Elsevier, 83(5): 1364-1371.

[18] Shanthikumar, J. G. (1994). Stochastic orders and their applications. Academic Press.

[19] Casella, G., and Berger, R. L. (2021). Statistical inference. Cengage Learning.

[20] Cheng, R. C. H. and Amin, N. A. K. (1983). Maximum product of spacings estimation with applications to the lognormal distribution. Journal of the Royal Statistical Society: Series B, 45(3):394-403.

[21] Ranneby, B. (1984). The Maximum Spacing Method. An Estimation Method Related to the Maximum Likelihood Method. Scandinavian Journal of Statistics, 11(2):93-112.

[22] Coolen, F. P. A. and Newby, M. J. (1991). The Maximum Spacing Method. An Estimation Method Related to the Maximum Likelihood Method. Kwantitatieve Methoden, 37:19-32.

[23] Ghosh, K. and Jammalamadaka, S. R. (2001). A general estimation method using spacings. Journal of Statistical Planning and Inference, 93:71-82.

[24] Wong, T. S. T. and Li, W. K. (2006). A Note on the Estimation of Extreme Value Distributions Using Maximum Product of Spacings. IMS Lecture Notes-Monograph Series, 52:272-283.

[25] Singh, U., Singh, S. K. and Singh, R. K. (2014). The Maximum Spacing Method. An Estimation Method Related to the Maximum Likelihood Method. Journal of Statistics Applications and Probability, 3(2):179-188.

[26] Mazucheli, J., Ghitany, M. E. and Louzada, F. (2017). Comparisons of ten estimation methods for the parameters of Marshall-Olkin extended exponential distribution. Communications in Statistics-Simulation and Computation, Taylor & Francis, 46(7):5627-5645.

[27] Swain, J. J., Venkatraman, S. and Wilson, J. R. (1988). Least-squares estimation of distribution functions in Johnson's translation system. Journal of Statistical Computation and Simulation, 29(4):271-297.

[28] D'Agostino, R. B. (1986). Goodness-of-fit-techniques. CRC press, 68.

[29] Luceño, A. (2006). Fitting the generalized Pareto distribution to data using maximum goodness-of-fit estimators. Computational Statistics & Data Analysis, Elsevier, 51(2):904-917.

[30] Louzada, F., Ramos, P. L. and Perdoná, G. S. C. (2016). Different estimation procedures for the parameters of the extended exponential geometric distribution for medical data. Computational and Mathematical Methods in Medicine, Hindawi, 2016.

[31] Macdonald, P. D. M. (1971). Comments and queries: comment on "An estimation procedure for mixtures of distributions" by Choi and Bulgren. Journal of the Royal Statistical Society: Series B (Methodological), 33(2):326-329.

[32] Anderson, T. W. and Darling, D. A. (1952). Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. The Annals of Mathematical Statistics, JSTOR, 193-212.

[33] Tahir, M. H., Cordeiro, G. M., Ali, S., Dey, S. and Manzoor, A. (2018). The inverted Nadarajah-Haghighi distribution: estimation methods and applications. Journal of Statistical Computation and Simulation, Taylor & Francis, 88(14): 2775-2798.

[34] Rao, G. S. and Mbwambo, S. (2019). Exponentiated inverse Rayleigh distribution and an application to coating weights of iron sheets data. Journal of probability and statistics, Hindawi, 2019.

[35] Sickle-Santanello, B. J., Farrar, W. B., Decenzo, J. F., Keyhani-Rofagha, S., Klein, J., Pearl, D., Laufman, H. and O'Toole, R. V. (1988). Technical and statistical improvements for flow cytometric DNA analysis of paraffin-embedded tissue. Cytometry: The Journal of the International Society for Analytical Cytology, 9(6):594-599.

[36] Klein, J. P. and Moeschberger, M. L. (2003). Survival analysis: techniques for censored and truncated data. Springer, 1230.

[37] Bantan, R., Hassan, A. S., Elsehetry, M. and Kibria, B. M. (2020). Half-logistic xgamma distribution: Properties and estimation under censored samples. Discrete Dynamics in Nature and Society, Hindawi, 2020.

[38] Aarset, M. V. (1987). How to identify a bathtub hazard rate. IEEE Transactions on Reliability, IEEE, 36(1):106-108.
