CC BY

A PROBABILITY MODEL FOR SURVIVAL ANALYSIS OF CANCER PATIENTS

Mousumi Ray*, Rama Shanker

Department of Statistics, Assam University, Silchar, India [email protected] , [email protected]

*Corresponding Author

Abstract

Statisticians have long observed that finding a suitable model for the survival analysis of cancer patients is challenging, mainly because such datasets are highly positively skewed. In recent decades several statisticians have proposed one-, two-, three-, four- and five-parameter probability models, but from either a theoretical or an applied point of view the goodness of fit provided by these distributions is not very satisfactory. In this paper a compound probability model called the gamma-Sujatha distribution, a compound of the gamma and Sujatha distributions, is proposed for modeling the survival times of cancer patients. Many important properties of the proposed distribution, including its shape, negative moments, hazard function, reversed hazard function and quantile function, are discussed. The method of maximum likelihood is used to estimate its parameters, and a simulation study is conducted to examine the consistency of the maximum likelihood estimators. Two real datasets, one relating to acute bone cancer and the other to head and neck cancer, are considered to examine the applicability, suitability and flexibility of the proposed distribution. The proposed distribution gives a quite satisfactory fit compared with the other distributions considered.

Keywords: Survival analysis, compounding, hazard function, reversed hazard rate function, stress-strength parameter, maximum likelihood estimation, applications.

I. Introduction

Several statistical distributions have been used extensively for modeling and analysing survival-time (time-to-event) data, also known as reliability data, in the biomedical sciences. A comparative study of the gamma and Weibull [1] distributions by Shanker et al [2] shows that on some datasets relating to head and neck cancer these two classical two-parameter lifetime distributions do not provide a good fit, and that on some datasets they perform differently. In recent decades researchers have tried to modify the Weibull distribution to obtain a better fit to the survival times of cancer patients. The Weibull distribution is the most popular distribution for modeling survival data because it properly explains mortality and failure. Several authors have extended the Weibull distribution by adding one or more shape parameters to bring more flexibility to the shape of the distribution and to accommodate the nature of

Mousumi Ray and Rama Shanker RT&A, No 3 (79) A PROBABILITY MODEL FOR SURVIVAL ANALYSIS ..._Volume I9, September 2024

the data. Examples include the exponentiated generalized Weibull (EGW) distribution by Cordeiro et al [3], the Beta-Weibull (BW) distribution by Famoye et al [4], the Kumaraswamy Weibull (Kum-W) distribution by Cordeiro et al [5], the exponentiated Kumaraswamy Weibull (EKumW) distribution by Eissa [6] and the Alpha power Weibull (APW) distribution by Nassar et al [7], among others. Although these two-, three- and four-parameter extended Weibull distributions provide a good fit to the survival times of cancer patients, the fit is not quite satisfactory because, in general, cancer data are highly positively skewed.

In recent decades several researchers have tried to derive a suitable lifetime distribution for data that are highly positively skewed, especially the survival times of cancer patients. The search for highly positively skewed continuous distributions (mean much less than the variance) has been pursued by several researchers using the compounding technique, since compounding always yields a highly positively skewed distribution. For instance, the gamma distribution is positively skewed, and its compounding with another positively skewed distribution gives a highly positively skewed distribution. A compound gamma distribution arises when a random variable $X$ follows a gamma distribution with shape parameter $\phi$ and scale parameter $\lambda$, and the parameter $\lambda$ itself behaves as a random variable with some distribution, known as the mixing distribution. There are four important one-parameter positively skewed lifetime distributions for modeling and analysing the survival times of cancer patients, namely the exponential distribution, the Lindley distribution by Lindley [8], the Shanker distribution by Shanker [9] and the Sujatha distribution by Shanker [10]; of these four, the Sujatha distribution provides a much better fit than the others. The gamma-Lindley distribution (G-LD) proposed by Abdi et al [11], a compound of the gamma distribution with the Lindley distribution of Lindley [8], is highly positively skewed. The gamma-Shanker distribution (G-SD) introduced by Ray and Shanker [12], a compound of the gamma distribution with the Shanker distribution of Shanker [9], is also highly positively skewed. Further, the exponential-Shanker distribution (E-SD) suggested by Ray and Shanker [13], a compound of the exponential distribution with the Shanker distribution, is also positively skewed.

The G-LD and the G-SD, for $x > 0$, $\phi > 0$, $\omega > 0$, are defined by their probability density functions (pdf) and cumulative distribution functions (cdf) as follows

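$$ f_{G\text{-}LD}(x;\phi,\omega) = \frac{\phi\,\omega^{2}\,(1+\phi+\omega+x)\,x^{\phi-1}}{(\omega+1)(\omega+x)^{\phi+2}} \tag{1} $$

$$ F_{G\text{-}LD}(x;\phi,\omega) = \frac{x^{\phi}\left[(\omega+1)x+(1+\phi+\omega)\,\omega\right]}{(\omega+1)(\omega+x)^{\phi+1}} \tag{2} $$

$$ f_{G\text{-}SD}(x;\phi,\omega) = \frac{\phi\,\omega^{2}\,(1+\phi+\omega x+\omega^{2})\,x^{\phi-1}}{(1+\omega^{2})(\omega+x)^{\phi+2}} \tag{3} $$

$$ F_{G\text{-}SD}(x;\phi,\omega) = \frac{x^{\phi}\left[x(1+\omega^{2})+(1+\phi+\omega^{2})\,\omega\right]}{(1+\omega^{2})(\omega+x)^{\phi+1}} \tag{4} $$

The Sujatha distribution is defined by its pdf and cdf

$$ f_{SUD}(x;\omega) = \frac{\omega^{3}}{\omega^{2}+\omega+2}\,(1+x+x^{2})\,e^{-\omega x}, \quad x>0,\ \omega>0 \tag{5} $$

$$ F_{SUD}(x;\omega) = 1-\left[1+\frac{\omega x(\omega x+\omega+2)}{\omega^{2}+\omega+2}\right]e^{-\omega x}, \quad x>0,\ \omega>0 \tag{6} $$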

The motivations for considering the gamma-Sujatha distribution (G-SUD), the compound of gamma and Sujatha distribution are as follows:


(i). Suppose $X$ is the lifetime of a component following a gamma distribution with shape parameter $\phi$ and scale parameter $\lambda$. If the sample is drawn from a population with variability in the scale parameter $\lambda$, that variability can be explained well by assuming that $\lambda$ follows the Sujatha distribution.

(ii). In real-life situations, the sustainability of the components of a population differs from one component to another, that is, the population is heterogeneous. When analysing data from such populations, heterogeneity can easily be taken into account using compound distributions. The G-LD and the G-SD are the two compound distributions proposed for analysing such variation in the components of populations. As the Sujatha distribution provides a better fit than the Lindley and Shanker distributions, the G-SUD is expected to provide a better fit than the existing compound distributions.

(iii). In general, compound distributions are the most suitable distributions for datasets with a long right tail, which has been observed in some real lifetime datasets relating to cancer.

(iv). As the Sujatha distribution performs well compared with the exponential and Lindley distributions, it is hoped that the G-SUD will perform better than the classical gamma and Weibull distributions as well as other two-parameter distributions.

The paper is divided into eleven sections. Section I is introductory. The gamma-Sujatha probability model and some of its results are given in Section II. The hazard function and the reversed hazard function of the proposed model are given in Section III. Section IV contains the quantiles and the moments of the distribution. The extreme order statistics and the stochastic ordering of the distribution are given in Sections V and VI respectively. The maximum likelihood estimation of the parameters and the estimation of the stress-strength parameter are discussed in Sections VII and VIII. The simulation study on the consistency of the maximum likelihood estimators and the applications of the distribution are provided in Sections IX and X respectively. The conclusion of the paper is given in Section XI.

II. Gamma-Sujatha Distribution

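The pdf and the cdf of the gamma-Sujatha distribution (G-SUD) are obtained as

$$ f_{G\text{-}SUD}(x;\phi,\omega) = \frac{\phi\,\omega^{3}\left[(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)\right]x^{\phi-1}}{(\omega^{2}+\omega+2)(\omega+x)^{\phi+3}}; \quad x>0,\ \phi>0,\ \omega>0 \tag{7} $$

$$ F_{G\text{-}SUD}(x;\phi,\omega) = \frac{x^{\phi}\left[(\omega+x)^{2}\{\omega^{2}+(\phi+1)(\phi+\omega+2)\}-\phi x(\omega+x)\{\omega+2(\phi+2)\}+\phi(\phi+1)x^{2}\right]}{(\omega^{2}+\omega+2)(\omega+x)^{\phi+2}}; \quad x>0,\ \phi>0,\ \omega>0 \tag{8} $$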
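The density in (7) can be traced back to the compounding construction described in the Introduction. The following is a sketch of that calculation, assuming the gamma density is written with shape $\phi$ and rate $\lambda$ (so that $E(X \mid \lambda) = \phi/\lambda$) and that $\lambda$ follows the Sujatha density (5) with parameter $\omega$; the symbol names used here are a notational assumption.

```latex
\begin{aligned}
f(x;\phi,\omega)
 &= \int_{0}^{\infty} \frac{\lambda^{\phi} x^{\phi-1} e^{-\lambda x}}{\Gamma(\phi)}
    \cdot \frac{\omega^{3}\,(1+\lambda+\lambda^{2})\,e^{-\omega\lambda}}{\omega^{2}+\omega+2}\, d\lambda \\
 &= \frac{\omega^{3} x^{\phi-1}}{(\omega^{2}+\omega+2)\,\Gamma(\phi)}
    \left[ \frac{\Gamma(\phi+1)}{(\omega+x)^{\phi+1}}
         + \frac{\Gamma(\phi+2)}{(\omega+x)^{\phi+2}}
         + \frac{\Gamma(\phi+3)}{(\omega+x)^{\phi+3}} \right] \\
 &= \frac{\phi\,\omega^{3} x^{\phi-1}
    \left[ (\omega+x)^{2} + (\phi+1)(\omega+x) + (\phi+1)(\phi+2) \right]}
    {(\omega^{2}+\omega+2)\,(\omega+x)^{\phi+3}} .
\end{aligned}
```

Since $(\phi+1)(\omega+x)+(\phi+1)(\phi+2) = (\phi+1)(\phi+\omega+x+2)$, the last line agrees with the form of the density stated in (7).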

Figures 1 and 2 show the pdf and cdf of the G-SUD for selected parameter values. The G-SUD can accommodate an extended right tail, and for particular parameter values the tail approaches zero at a faster rate. The G-SUD is therefore expected to fit well those datasets in which there is an extended right tail or in which the right tail approaches zero quickly. Such datasets are quite common in the biomedical sciences, in particular for the survival times of cancer patients.

Fig. 2: cdf plots of G-SUD
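The pdf and cdf of the G-SUD can be evaluated directly for any parameter values. The sketch below is illustrative only: the function names `gsud_pdf`, `gsud_cdf` and `midpoint_integral` are hypothetical, the parameters are called `phi` (shape) and `omega`, and the cdf is the closed form obtained by integrating the density (7), written in the variable t = x/(omega + x). A crude midpoint rule is included purely as a cross-check of the closed form.

```python
import math


def gsud_pdf(x, phi, omega):
    """Density of the G-SUD at x > 0, following the form of equation (7)."""
    u = omega + x
    bracket = u * u + (phi + 1.0) * (phi + omega + x + 2.0)
    norm = omega ** 2 + omega + 2.0
    return phi * omega ** 3 * bracket * x ** (phi - 1.0) / (norm * u ** (phi + 3.0))


def gsud_cdf(x, phi, omega):
    """Distribution function obtained by integrating gsud_pdf in closed form."""
    if x <= 0.0:
        return 0.0
    t = x / (omega + x)  # maps (0, inf) onto (0, 1)
    a = omega ** 2 + (phi + 1.0) * (phi + omega + 2.0)
    b = omega + 2.0 * (phi + 2.0)
    norm = omega ** 2 + omega + 2.0
    return t ** phi * (a - phi * b * t + phi * (phi + 1.0) * t * t) / norm


def midpoint_integral(x, phi, omega, steps=20000):
    """Midpoint-rule integral of the density over (0, x); a cross-check only."""
    h = x / steps
    return h * sum(gsud_pdf((k + 0.5) * h, phi, omega) for k in range(steps))


if __name__ == "__main__":
    phi, omega = 2.5, 1.5  # illustrative values, not taken from the paper
    for x in (0.5, 2.0, 10.0):
        print(x, gsud_cdf(x, phi, omega), midpoint_integral(x, phi, omega))
```

The two printed columns should agree to several decimal places, which is a quick way to confirm that a coded cdf is consistent with the coded density before using either in a fit.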

Theorem 1: The pdf of the G-SUD is decreasing for $\phi < 1$.

Proof: We have

$$ f(x;\phi,\omega) = \frac{\phi\,\omega^{3}\left[(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)\right]x^{\phi-1}}{(\omega^{2}+\omega+2)(\omega+x)^{\phi+3}}; \quad x>0,\ \phi>0,\ \omega>0 $$

$$ \log f(x;\phi,\omega) = \log\left[(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)\right]+(\phi-1)\log(x)-(\phi+3)\log(\omega+x)+C, $$

where $C$ is a constant. We have

$$ \frac{d}{dx}\log f(x;\phi,\omega) = \frac{\phi-1}{x}-\frac{(\phi+1)\left\{(\omega+x)^{2}+(\phi+2)(\phi+\omega+x+3)\right\}}{(\omega+x)\left\{(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)\right\}} $$

For $\phi < 1$, $\frac{d}{dx}\log f(x;\phi,\omega) < 0$, and this means that $f(x)$ is decreasing for all $x$.

III. Hazard function and Reversed hazard function

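The hazard function and the reversed hazard function are two important functions of a distribution. The reliability (survival) function of the G-SUD is given by

$$ R(x;\phi,\omega) = \frac{(\omega^{2}+\omega+2)(\omega+x)^{\phi+2}-x^{\phi}\left[(\omega+x)^{2}\{\omega^{2}+(\phi+1)(\phi+\omega+2)\}-\phi x(\omega+x)\{\omega+2(\phi+2)\}+\phi(\phi+1)x^{2}\right]}{(\omega^{2}+\omega+2)(\omega+x)^{\phi+2}} $$

The corresponding hazard and reversed hazard functions of the G-SUD are given by

$$ h(x;\phi,\omega) = \frac{f(x;\phi,\omega)}{R(x;\phi,\omega)} = \frac{\phi\,\omega^{3}x^{\phi-1}\left[(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)\right]}{(\omega+x)\left\{(\omega^{2}+\omega+2)(\omega+x)^{\phi+2}-x^{\phi}\left[(\omega+x)^{2}\{\omega^{2}+(\phi+1)(\phi+\omega+2)\}-\phi x(\omega+x)\{\omega+2(\phi+2)\}+\phi(\phi+1)x^{2}\right]\right\}} \tag{9} $$

$$ r(x;\phi,\omega) = \frac{f(x;\phi,\omega)}{F(x;\phi,\omega)} = \frac{\phi\,\omega^{3}\left[(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)\right]}{x(\omega+x)\left[(\omega+x)^{2}\{\omega^{2}+(\phi+1)(\phi+\omega+2)\}-\phi x(\omega+x)\{\omega+2(\phi+2)\}+\phi(\phi+1)x^{2}\right]} \tag{10} $$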

Fig.3: Hazard function of G-SUD

Fig.4: Reverse hazard function of G-SUD
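The survival, hazard and reversed hazard functions follow mechanically from the pdf and cdf, so they are easy to tabulate numerically. The sketch below is a minimal illustration under the same assumptions as the formulas of this section: the names `gsud_survival`, `gsud_hazard` and `gsud_reversed_hazard` are hypothetical, and the cdf used is the closed-form integral of the density (7).

```python
import math


def gsud_pdf(x, phi, omega):
    """Density of the G-SUD at x > 0, following the form of equation (7)."""
    u = omega + x
    bracket = u * u + (phi + 1.0) * (phi + omega + x + 2.0)
    norm = omega ** 2 + omega + 2.0
    return phi * omega ** 3 * bracket * x ** (phi - 1.0) / (norm * u ** (phi + 3.0))


def gsud_cdf(x, phi, omega):
    """Closed-form integral of gsud_pdf, written in t = x / (omega + x)."""
    t = x / (omega + x)
    a = omega ** 2 + (phi + 1.0) * (phi + omega + 2.0)
    b = omega + 2.0 * (phi + 2.0)
    norm = omega ** 2 + omega + 2.0
    return t ** phi * (a - phi * b * t + phi * (phi + 1.0) * t * t) / norm


def gsud_survival(x, phi, omega):
    """Reliability function R(x) = 1 - F(x)."""
    return 1.0 - gsud_cdf(x, phi, omega)


def gsud_hazard(x, phi, omega):
    """Hazard function h(x) = f(x) / R(x)."""
    return gsud_pdf(x, phi, omega) / gsud_survival(x, phi, omega)


def gsud_reversed_hazard(x, phi, omega):
    """Reversed hazard function r(x) = f(x) / F(x)."""
    return gsud_pdf(x, phi, omega) / gsud_cdf(x, phi, omega)


if __name__ == "__main__":
    phi, omega = 0.8, 1.5  # illustrative values with phi < 1
    for x in (0.5, 1.0, 2.0):
        print(x, gsud_hazard(x, phi, omega), gsud_reversed_hazard(x, phi, omega))
```

With a shape value below one, as in the usage lines, the printed hazard values fall as x grows, which is the behaviour the decreasing-hazard case describes.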

Theorem 2: For $\phi < 1$, the hazard function of the G-SUD is decreasing, and for $\phi > 1$ it is unimodal.

Proof: We have

$$ f(x;\phi,\omega) = \frac{\phi\,\omega^{3}\left[(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)\right]x^{\phi-1}}{(\omega^{2}+\omega+2)(\omega+x)^{\phi+3}}; \quad x>0,\ \phi>0,\ \omega>0, $$

and

$$ f'(x;\phi,\omega) = \frac{\phi\,\omega^{3}x^{\phi-1}\left[(\phi\omega-\omega-4x)\left\{(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)\right\}+x(\omega+x)(\phi+2\omega+2x+1)\right]}{x\,(\omega^{2}+\omega+2)(\omega+x)^{\phi+4}}; \quad x>0,\ \phi>0,\ \omega>0 $$

Now, suppose that

$$ \eta(x) = -\frac{f'(x;\phi,\omega)}{f(x;\phi,\omega)} = -\frac{\phi-1}{x}+\frac{(\omega+x)\left(\phi^{2}+\phi(\omega+x+3)+(\omega+x+2)\right)+(\phi+1)(\phi+2)(\phi+3)}{(\omega+x)\left\{(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)\right\}} $$

This gives

$$ \eta'(x) = \frac{\phi-1}{x^{2}}+\frac{(\omega+x)\left\{(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)\right\}\left\{\phi^{2}+\phi(2\omega+2x+3)+(2\omega+2x+2)\right\}-\left\{(\omega+x)\left(\phi^{2}+\phi(\omega+x+3)+(\omega+x+2)\right)+(\phi+1)(\phi+2)(\phi+3)\right\}\left\{(\omega+x)(\phi+3\omega+3x+1)+(\phi+1)(\phi+\omega+x+2)\right\}}{(\omega+x)^{2}\left\{(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)\right\}^{2}} $$

It follows that for $\phi < 1$, $\eta'(x) < 0$, and for $\phi > 1$, $\eta(x)$ has a global maximum at the mode (say $x_{0}$).

Theorem 3: The G-SUD has a decreasing reversed hazard function.

Proof: We have

$$ r(x) = \frac{\phi\,\omega^{3}\left[(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)\right]}{x(\omega+x)\left[(\omega+x)^{2}\{\omega^{2}+(\phi+1)(\phi+\omega+2)\}-\phi x(\omega+x)\{\omega+2(\phi+2)\}+\phi(\phi+1)x^{2}\right]} $$

Taking logarithms and differentiating with respect to $x$ gives

$$ \frac{d\log r(x)}{dx} = \frac{2(\omega+x)+(\phi+1)}{(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)}-\frac{2(\omega+x)\{\omega^{2}+(\phi+1)(\phi+\omega+2)\}-\phi(\omega+2x)\{\omega+2(\phi+2)\}+2\phi(\phi+1)x}{(\omega+x)^{2}\{\omega^{2}+(\phi+1)(\phi+\omega+2)\}-\phi x(\omega+x)\{\omega+2(\phi+2)\}+\phi(\phi+1)x^{2}}-\frac{1}{x}-\frac{1}{\omega+x} < 0 $$

This proves the theorem for all $\phi, \omega$.

IV. Quantiles and Moments

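The $p$th quantile $x_{p}$ of the G-SUD, defined by $F(x_{p}) = p$, is the root of the equation

$$ \frac{x_{p}^{\phi}\left[(\omega+x_{p})^{2}\{\omega^{2}+(\phi+1)(\phi+\omega+2)\}-\phi x_{p}(\omega+x_{p})\{\omega+2(\phi+2)\}+\phi(\phi+1)x_{p}^{2}\right]}{(\omega^{2}+\omega+2)(\omega+x_{p})^{\phi+2}} = p \tag{12} $$

This gives

$$ x_{p} = \frac{(\omega+x_{p})^{2}\{\omega^{2}+(\phi+1)(\phi+\omega+2)\}-\phi x_{p}(\omega+x_{p})\{\omega+2(\phi+2)\}+\phi(\phi+1)x_{p}^{2}}{p\,(\omega^{2}+\omega+2)(\omega+x_{p})\left(1+\dfrac{\omega}{x_{p}}\right)^{\phi+1}} \tag{13} $$

It should be noted that this $x_{p}$ may be used to generate G-SUD random variates. Further, the median of the G-SUD can be obtained from the above equation by taking $p = 1/2$.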
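Because the quantile equation has no explicit solution, it has to be solved numerically. The sketch below is one possible approach, not the authors' procedure: it solves F(x) = p by bisection in the transformed variable t = x/(omega + x), which lies in (0, 1), and then draws variates by inverse transformation. The names `gsud_cdf_t`, `gsud_cdf`, `gsud_quantile` and `gsud_sample` are hypothetical, and the cdf is the closed-form integral of the density (7).

```python
import random


def gsud_cdf_t(t, phi, omega):
    """G-SUD cdf expressed in t = x / (omega + x), with 0 <= t <= 1."""
    a = omega ** 2 + (phi + 1.0) * (phi + omega + 2.0)
    b = omega + 2.0 * (phi + 2.0)
    norm = omega ** 2 + omega + 2.0
    return t ** phi * (a - phi * b * t + phi * (phi + 1.0) * t * t) / norm


def gsud_cdf(x, phi, omega):
    """G-SUD cdf as a function of x > 0."""
    return gsud_cdf_t(x / (omega + x), phi, omega)


def gsud_quantile(p, phi, omega, iterations=100):
    """Solve F(x) = p for 0 < p < 1 by bisection on t, then map back to x."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gsud_cdf_t(mid, phi, omega) < p:
            lo = mid
        else:
            hi = mid
    t = min(0.5 * (lo + hi), 1.0 - 2.0 ** -52)  # guard against t == 1
    return omega * t / (1.0 - t)


def gsud_sample(n, phi, omega, rng):
    """Draw n variates by inverse transformation of uniform random numbers."""
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # random() may return exactly 0.0
            draws.append(gsud_quantile(u, phi, omega))
    return draws


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run is repeatable
    print("median:", gsud_quantile(0.5, 2.5, 1.5))
    print("five draws:", gsud_sample(5, 2.5, 1.5, rng))
```

Bisection is slower than Newton-type iterations but cannot diverge, which matters here because the distribution has a very heavy right tail.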

The moments of the G-SUD can be obtained as follows. If $X \sim$ G-SUD$(\phi,\omega)$, then

$$ E(X) = E\left(E(X \mid \lambda)\right) = E\left(\frac{\phi}{\lambda}\right) = \phi\,E\left(\frac{1}{\lambda}\right) = \infty $$

Thus, in general, $E(X^{r}) = \infty$ for $r \geq 1$. This means that all moments of the G-SUD are infinite and hence the G-SUD has no mean. As the G-SUD has no mean, if we take a sample $(X_{1},X_{2},\ldots,X_{n})$ from the G-SUD, the sample mean $\bar{X}$ does not tend to a particular value. Since the G-SUD has no raw or central moments, we derive its inverse (negative) moments instead. Negative moments are useful in several applications, such as life-testing problems and estimation. The negative moments of the G-SUD can be obtained as follows.

The $r$th negative moment (about the origin), $\mu'_{(-r)}$, of the G-SUD is given by

$$ \mu'_{(-r)} = E\left(X^{-r}\right) = \frac{\Gamma(\phi-r)}{\Gamma(\phi)}\cdot\frac{r!\left[\omega^{2}+(r+1)\omega+(r+1)(r+2)\right]}{\omega^{r}(\omega^{2}+\omega+2)}, \quad \phi > r $$

Thus, for $r = 1, 2, 3$ and $4$, we have

$$ \mu'_{(-1)} = \frac{\omega^{2}+2\omega+6}{\omega(\omega^{2}+\omega+2)(\phi-1)}, \quad \phi>1 \tag{15} $$

$$ \mu'_{(-2)} = \frac{2(\omega^{2}+3\omega+12)}{\omega^{2}(\omega^{2}+\omega+2)(\phi-1)(\phi-2)}, \quad \phi>2 \tag{16} $$

$$ \mu'_{(-3)} = \frac{6(\omega^{2}+4\omega+20)}{\omega^{3}(\omega^{2}+\omega+2)(\phi-1)(\phi-2)(\phi-3)}, \quad \phi>3 \tag{17} $$

$$ \mu'_{(-4)} = \frac{24(\omega^{2}+5\omega+30)}{\omega^{4}(\omega^{2}+\omega+2)(\phi-1)(\phi-2)(\phi-3)(\phi-4)}, \quad \phi>4 \tag{18} $$

It is obvious from the above expressions that the negative moments are not defined for $\phi \leq 1$.
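The general expression for the negative moments is simple to evaluate and to compare against the special cases (15) to (18). The sketch below assumes the general formula in the form Γ(φ−r)/Γ(φ) · r![ω² + (r+1)ω + (r+1)(r+2)] / (ω^r(ω² + ω + 2)), which reproduces those special cases; the function name `gsud_negative_moment` is hypothetical.

```python
import math


def gsud_negative_moment(r, phi, omega):
    """r-th negative moment E(X^-r) of the G-SUD; defined only for phi > r."""
    if r < 1 or int(r) != r:
        raise ValueError("r must be a positive integer")
    if phi <= r:
        raise ValueError("the r-th negative moment requires phi > r")
    gamma_ratio = math.gamma(phi - r) / math.gamma(phi)
    numerator = math.factorial(int(r)) * (
        omega ** 2 + (r + 1) * omega + (r + 1) * (r + 2)
    )
    denominator = omega ** r * (omega ** 2 + omega + 2.0)
    return gamma_ratio * numerator / denominator


if __name__ == "__main__":
    phi, omega = 3.0, 2.0  # illustrative values, not taken from the paper
    # Compare the general formula with the r = 1 special case, equation (15).
    special = (omega ** 2 + 2 * omega + 6) / (
        omega * (omega ** 2 + omega + 2) * (phi - 1)
    )
    print(gsud_negative_moment(1, phi, omega), special)
```

The explicit `phi > r` check mirrors the existence condition attached to each of the special cases.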

V. Extreme Order Statistics

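Let $X_{1:n},\ldots,X_{n:n}$ be the order statistics of a random sample of size $n$ from the G-SUD$(\phi,\omega)$ distribution with distribution function $F(x)$. The cdf of the minimum order statistic $X_{1:n}$ is given by

$$ F_{X_{1:n}}(x) = 1-\left[1-F(x)\right]^{n} = 1-\left[\frac{(\omega^{2}+\omega+2)(\omega+x)^{\phi+2}-x^{\phi}\left[(\omega+x)^{2}\{\omega^{2}+(\phi+1)(\phi+\omega+2)\}-\phi x(\omega+x)\{\omega+2(\phi+2)\}+\phi(\phi+1)x^{2}\right]}{(\omega^{2}+\omega+2)(\omega+x)^{\phi+2}}\right]^{n} $$

The cdf of the maximum order statistic $X_{n:n}$ is given by

$$ F_{X_{n:n}}(x) = \left[F(x)\right]^{n} = \left[\frac{x^{\phi}\left[(\omega+x)^{2}\{\omega^{2}+(\phi+1)(\phi+\omega+2)\}-\phi x(\omega+x)\{\omega+2(\phi+2)\}+\phi(\phi+1)x^{2}\right]}{(\omega^{2}+\omega+2)(\omega+x)^{\phi+2}}\right]^{n} $$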

VI. Stochastic Orderings

In probability theory and statistics, a stochastic order quantifies the concept of one random variable being "bigger" than another. In many problems it is necessary to compare two lifetime distributions with reference to some of their characteristics. Stochastic orders provide the necessary tools in such cases.

A random variable $X$ is said to be smaller than a random variable $Y$ in the

i. Stochastic order ($X \leq_{st} Y$) if $F_{X}(x) \geq F_{Y}(x)$ for all $x$

ii. Hazard rate order ($X \leq_{hr} Y$) if $h_{X}(x) \geq h_{Y}(x)$ for all $x$

iii. Mean residual life order ($X \leq_{mrl} Y$) if $m_{X}(x) \leq m_{Y}(x)$ for all $x$

iv. Likelihood ratio order ($X \leq_{lr} Y$) if $\dfrac{f_{X}(x)}{f_{Y}(x)}$ decreases in $x$

The following results due to Shaked and Shanthikumar [14] are well known for establishing the stochastic ordering of distributions:

$$ X \leq_{lr} Y \Rightarrow X \leq_{hr} Y \Rightarrow X \leq_{mrl} Y, \qquad X \leq_{hr} Y \Rightarrow X \leq_{st} Y $$

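Theorem 4: Let $X_{1} \sim$ G-SUD$(\phi_{1},\omega_{1})$ and $X_{2} \sim$ G-SUD$(\phi_{2},\omega_{2})$. If $\phi_{1}=\phi_{2}=\phi$ and $\omega_{1}<\omega_{2}$, or if $\omega_{1}=\omega_{2}=\omega>1$ with $\phi_{1}<\phi_{2}$, then $X_{1} \leq_{lr} X_{2} \Rightarrow X_{1} \leq_{hr} X_{2} \Rightarrow X_{1} \leq_{st} X_{2}$.

Proof: We have

$$ \frac{f_{X_{1}}(x)}{f_{X_{2}}(x)} = \frac{\phi_{1}\omega_{1}^{3}\left[(\omega_{1}+x)^{2}+(\phi_{1}+1)(\phi_{1}+\omega_{1}+x+2)\right](\omega_{2}^{2}+\omega_{2}+2)(\omega_{2}+x)^{\phi_{2}+3}}{\phi_{2}\omega_{2}^{3}\left[(\omega_{2}+x)^{2}+(\phi_{2}+1)(\phi_{2}+\omega_{2}+x+2)\right](\omega_{1}^{2}+\omega_{1}+2)(\omega_{1}+x)^{\phi_{1}+3}}\;x^{\phi_{1}-\phi_{2}} \tag{19} $$

Case I: For $\phi_{1}=\phi_{2}=\phi$, we get

$$ G_{1}(x) = \frac{\omega_{1}^{3}\left[(\omega_{1}+x)^{2}+(\phi+1)(\phi+\omega_{1}+x+2)\right](\omega_{2}^{2}+\omega_{2}+2)}{\omega_{2}^{3}\left[(\omega_{2}+x)^{2}+(\phi+1)(\phi+\omega_{2}+x+2)\right](\omega_{1}^{2}+\omega_{1}+2)}\left(\frac{\omega_{2}+x}{\omega_{1}+x}\right)^{\phi+3} $$

$$ \frac{d\log G_{1}(x)}{dx} = \left\{\frac{\phi+3}{\omega_{2}+x}-\frac{2(\omega_{2}+x)+(\phi+1)}{(\omega_{2}+x)^{2}+(\phi+1)(\phi+\omega_{2}+x+2)}\right\}-\left\{\frac{\phi+3}{\omega_{1}+x}-\frac{2(\omega_{1}+x)+(\phi+1)}{(\omega_{1}+x)^{2}+(\phi+1)(\phi+\omega_{1}+x+2)}\right\} = Q(\omega_{2})-Q(\omega_{1}) \tag{20} $$

where

$$ Q(\omega) = \frac{\phi+3}{\omega+x}-\frac{2(\omega+x)+(\phi+1)}{(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)} $$

$$ \frac{d}{d\omega}Q(\omega) = -\frac{\phi+3}{(\omega+x)^{2}}+\frac{2(\omega+x)^{2}+2(\phi+1)(\omega+x)-(\phi+1)(\phi+3)}{\left\{(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)\right\}^{2}} < 0 \tag{21} $$

Since $Q(\omega)$ is decreasing in $\omega$, $\omega_{1}<\omega_{2}$ gives $\dfrac{d\log G_{1}(x)}{dx}<0$. Hence $X_{1}$ is stochastically smaller than $X_{2}$ with respect to the likelihood ratio for $\phi_{1}=\phi_{2}=\phi$, provided $\omega_{1}<\omega_{2}$.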

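Case II: For $\omega_{1}=\omega_{2}=\omega>1$, we get

$$ G_{2}(x) = \frac{\phi_{1}\left[(\omega+x)^{2}+(\phi_{1}+1)(\phi_{1}+\omega+x+2)\right]}{\phi_{2}\left[(\omega+x)^{2}+(\phi_{2}+1)(\phi_{2}+\omega+x+2)\right]}\left(\frac{x}{\omega+x}\right)^{\phi_{1}-\phi_{2}} \tag{22} $$

$$ \frac{d\log G_{2}(x)}{dx} = \left\{\frac{2(\omega+x)+(\phi_{1}+1)}{(\omega+x)^{2}+(\phi_{1}+1)(\phi_{1}+\omega+x+2)}+\frac{\phi_{1}}{x}-\frac{\phi_{1}}{\omega+x}\right\}-\left\{\frac{2(\omega+x)+(\phi_{2}+1)}{(\omega+x)^{2}+(\phi_{2}+1)(\phi_{2}+\omega+x+2)}+\frac{\phi_{2}}{x}-\frac{\phi_{2}}{\omega+x}\right\} = S(\phi_{1})-S(\phi_{2}) \tag{23} $$

where

$$ S(\phi) = \frac{2(\omega+x)+(\phi+1)}{(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)}+\frac{\phi}{x}-\frac{\phi}{\omega+x} $$

$$ \frac{d}{d\phi}S(\phi) = \frac{-\left\{\omega(\omega+4\phi+6)+x(x+4\phi+2\omega+6)+(\phi+1)^{2}\right\}}{\left\{(\omega+x)^{2}+(\phi+1)(\phi+\omega+x+2)\right\}^{2}}+\frac{1}{x}-\frac{1}{\omega+x} > 0 \quad \text{for } \omega>1 $$

Thus, for $\phi_{1}<\phi_{2}$, $\dfrac{d\log G_{2}(x)}{dx}<0$. Hence $X_{1}$ is stochastically smaller than $X_{2}$ with respect to the likelihood ratio for $\omega_{1}=\omega_{2}=\omega>1$, provided $\phi_{1}<\phi_{2}$.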

VII. Estimation of parameters

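Let $(x_{1},x_{2},\ldots,x_{n})$ be the observed values of a random sample $(X_{1},X_{2},\ldots,X_{n})$ from the G-SUD. Then the likelihood function is given by

$$ L(\phi,\omega) = \left(\frac{\phi\,\omega^{3}}{\omega^{2}+\omega+2}\right)^{n}\prod_{i=1}^{n}\frac{\left[(\omega+x_{i})^{2}+(\phi+1)(\omega+x_{i})+(\phi+1)(\phi+2)\right]x_{i}^{\phi-1}}{(\omega+x_{i})^{\phi+3}} $$

The log-likelihood function of the G-SUD is thus obtained as

$$ \ln L(\phi,\omega) = n\ln\phi+3n\ln\omega-n\ln(\omega^{2}+\omega+2)+\sum_{i=1}^{n}\ln\left[(\omega+x_{i})^{2}+(\phi+1)(\omega+x_{i})+(\phi+1)(\phi+2)\right]+(\phi-1)\sum_{i=1}^{n}\ln(x_{i})-(\phi+3)\sum_{i=1}^{n}\ln(\omega+x_{i}) $$

The maximum likelihood estimators of $\phi$ and $\omega$, say $\hat{\phi}$ and $\hat{\omega}$, are the simultaneous solutions of the following log-likelihood equations

$$ \frac{\partial\ln L(\phi,\omega)}{\partial\phi} = \frac{n}{\phi}+\sum_{i=1}^{n}\frac{(\omega+x_{i})+(2\phi+3)}{(\omega+x_{i})^{2}+(\phi+1)(\omega+x_{i})+(\phi+1)(\phi+2)}+\sum_{i=1}^{n}\ln(x_{i})-\sum_{i=1}^{n}\ln(\omega+x_{i}) = 0 $$

$$ \frac{\partial\ln L(\phi,\omega)}{\partial\omega} = \frac{3n}{\omega}-\frac{n(2\omega+1)}{\omega^{2}+\omega+2}+\sum_{i=1}^{n}\frac{2(\omega+x_{i})+(\phi+1)}{(\omega+x_{i})^{2}+(\phi+1)(\omega+x_{i})+(\phi+1)(\phi+2)}-(\phi+3)\sum_{i=1}^{n}\frac{1}{\omega+x_{i}} = 0 $$

It is very difficult to solve these two log-likelihood equations directly, so we use Fisher's scoring method. Writing $P_{i} = (\omega+x_{i})^{2}+(\phi+1)(\omega+x_{i})+(\phi+1)(\phi+2)$, we have

$$ \frac{\partial^{2}\ln L(\phi,\omega)}{\partial\phi^{2}} = -\frac{n}{\phi^{2}}+\sum_{i=1}^{n}\frac{2P_{i}-\left[(\omega+x_{i})+(2\phi+3)\right]^{2}}{P_{i}^{2}} $$

$$ \frac{\partial^{2}\ln L(\phi,\omega)}{\partial\phi\,\partial\omega} = \sum_{i=1}^{n}\frac{P_{i}-\left[(\omega+x_{i})+(2\phi+3)\right]\left[2(\omega+x_{i})+(\phi+1)\right]}{P_{i}^{2}}-\sum_{i=1}^{n}\frac{1}{\omega+x_{i}} = \frac{\partial^{2}\ln L(\phi,\omega)}{\partial\omega\,\partial\phi} $$

$$ \frac{\partial^{2}\ln L(\phi,\omega)}{\partial\omega^{2}} = -\frac{3n}{\omega^{2}}-\frac{2n(\omega^{2}+\omega+2)-n(2\omega+1)^{2}}{(\omega^{2}+\omega+2)^{2}}+\sum_{i=1}^{n}\frac{2P_{i}-\left[2(\omega+x_{i})+(\phi+1)\right]^{2}}{P_{i}^{2}}+(\phi+3)\sum_{i=1}^{n}\frac{1}{(\omega+x_{i})^{2}} $$

The following equation can be solved iteratively for the MLEs of $\phi$ and $\omega$ of the G-SUD

$$ \begin{bmatrix} \dfrac{\partial^{2}\ln L(\phi,\omega)}{\partial\phi^{2}} & \dfrac{\partial^{2}\ln L(\phi,\omega)}{\partial\phi\,\partial\omega} \\ \dfrac{\partial^{2}\ln L(\phi,\omega)}{\partial\omega\,\partial\phi} & \dfrac{\partial^{2}\ln L(\phi,\omega)}{\partial\omega^{2}} \end{bmatrix}_{\hat{\phi}=\phi_{0},\,\hat{\omega}=\omega_{0}} \begin{bmatrix} \hat{\phi}-\phi_{0} \\ \hat{\omega}-\omega_{0} \end{bmatrix} = -\begin{bmatrix} \dfrac{\partial\ln L(\phi,\omega)}{\partial\phi} \\ \dfrac{\partial\ln L(\phi,\omega)}{\partial\omega} \end{bmatrix}_{\hat{\phi}=\phi_{0},\,\hat{\omega}=\omega_{0}} $$

where $\phi_{0}$ and $\omega_{0}$ are the initial values of $\phi$ and $\omega$ respectively. The initial values taken in this paper for estimating the parameters are $\phi_{0} = 0.5$ and $\omega_{0} = 0.5$.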
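Before running any iterative scheme it can be useful to confirm that coded score equations really are the derivatives of the coded log-likelihood. The sketch below does that with central finite differences; it is an illustration only, the function names `gsud_loglik` and `gsud_score` are hypothetical, and the small data vector consists of the first eight observations of the acute bone cancer dataset listed in Section X.

```python
import math


def gsud_loglik(phi, omega, data):
    """Log-likelihood of the G-SUD for observations in `data` (all > 0)."""
    n = len(data)
    total = (
        n * math.log(phi)
        + 3.0 * n * math.log(omega)
        - n * math.log(omega ** 2 + omega + 2.0)
    )
    for x in data:
        u = omega + x
        p = u * u + (phi + 1.0) * u + (phi + 1.0) * (phi + 2.0)
        total += math.log(p) + (phi - 1.0) * math.log(x) - (phi + 3.0) * math.log(u)
    return total


def gsud_score(phi, omega, data):
    """Analytic partial derivatives of the log-likelihood in phi and omega."""
    n = len(data)
    d_phi = n / phi
    d_omega = 3.0 * n / omega - n * (2.0 * omega + 1.0) / (omega ** 2 + omega + 2.0)
    for x in data:
        u = omega + x
        p = u * u + (phi + 1.0) * u + (phi + 1.0) * (phi + 2.0)
        d_phi += (u + 2.0 * phi + 3.0) / p + math.log(x) - math.log(u)
        d_omega += (2.0 * u + phi + 1.0) / p - (phi + 3.0) / u
    return d_phi, d_omega


if __name__ == "__main__":
    sample = [0.09, 0.76, 1.81, 1.10, 3.72, 0.72, 2.49, 1.00]
    phi0, omega0, h = 0.5, 0.5, 1e-6  # starting values quoted in the text
    num_phi = (gsud_loglik(phi0 + h, omega0, sample)
               - gsud_loglik(phi0 - h, omega0, sample)) / (2.0 * h)
    num_omega = (gsud_loglik(phi0, omega0 + h, sample)
                 - gsud_loglik(phi0, omega0 - h, sample)) / (2.0 * h)
    print(gsud_score(phi0, omega0, sample), (num_phi, num_omega))
```

Agreement between the analytic and numerical pairs suggests the score is coded correctly; the same check can be repeated for the second derivatives before they are used in an iteration.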

VIII. Estimation of the Stress-Strength parameter R = P (X > Y)

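In reliability, the stress-strength model describes the life of a component which has a random strength $X$ and is subjected to a random stress $Y$. The component fails at the instant the stress applied to it exceeds its strength, and it functions satisfactorily whenever $X > Y$. In this section our objective is to estimate $R = P(X > Y)$ when $X \sim$ G-SUD$(\phi_{1},\omega_{1})$ and $Y \sim$ G-SUD$(\phi_{2},\omega_{2})$, with $X$ and $Y$ independently distributed. Then the stress-strength parameter is given by

$$ R = P(X>Y) = \int_{0}^{\infty}P(X>Y \mid Y=y)\,f_{Y}(y)\,dy = \int_{0}^{\infty}\left[1-F_{X}(y)\right]f_{Y}(y)\,dy $$

$$ = 1-\frac{\phi_{2}\,\omega_{2}^{3}}{(\omega_{1}^{2}+\omega_{1}+2)(\omega_{2}^{2}+\omega_{2}+2)}\int_{0}^{\infty}\frac{\left[(\omega_{1}+y)^{2}\{\omega_{1}^{2}+(\phi_{1}+1)(\phi_{1}+\omega_{1}+2)\}-\phi_{1}y(\omega_{1}+y)\{\omega_{1}+2(\phi_{1}+2)\}+\phi_{1}(\phi_{1}+1)y^{2}\right]\left[(\omega_{2}+y)^{2}+(\phi_{2}+1)(\omega_{2}+y)+(\phi_{2}+1)(\phi_{2}+2)\right]}{(\omega_{1}+y)^{\phi_{1}+2}(\omega_{2}+y)^{\phi_{2}+3}}\,y^{\phi_{1}+\phi_{2}-1}\,dy $$

$$ = G(\phi_{1},\phi_{2},\omega_{1},\omega_{2}) $$

Let $(x_{1},x_{2},\ldots,x_{n})$ be the observed values of a random sample of size $n$ from G-SUD$(\phi_{1},\omega_{1})$ and $(y_{1},y_{2},\ldots,y_{m})$ be the observed values of a random sample of size $m$ from G-SUD$(\phi_{2},\omega_{2})$. The log-likelihood function of $\phi_{1},\phi_{2},\omega_{1}$ and $\omega_{2}$ is given by

$$ \ln L(\phi_{1},\phi_{2},\omega_{1},\omega_{2}) = n\ln(\phi_{1})+3n\ln(\omega_{1})-n\ln(\omega_{1}^{2}+\omega_{1}+2)+\sum_{i=1}^{n}\ln\left[(\omega_{1}+x_{i})^{2}+(\phi_{1}+1)(\omega_{1}+x_{i})+(\phi_{1}+1)(\phi_{1}+2)\right]+(\phi_{1}-1)\sum_{i=1}^{n}\ln(x_{i})-(\phi_{1}+3)\sum_{i=1}^{n}\ln(\omega_{1}+x_{i}) $$

$$ +\,m\ln(\phi_{2})+3m\ln(\omega_{2})-m\ln(\omega_{2}^{2}+\omega_{2}+2)+\sum_{i=1}^{m}\ln\left[(\omega_{2}+y_{i})^{2}+(\phi_{2}+1)(\omega_{2}+y_{i})+(\phi_{2}+1)(\phi_{2}+2)\right]+(\phi_{2}-1)\sum_{i=1}^{m}\ln(y_{i})-(\phi_{2}+3)\sum_{i=1}^{m}\ln(\omega_{2}+y_{i}) $$

Now,

$$ \frac{\partial\ln L}{\partial\phi_{1}} = \frac{n}{\phi_{1}}+\sum_{i=1}^{n}\frac{(\omega_{1}+x_{i})+(2\phi_{1}+3)}{(\omega_{1}+x_{i})^{2}+(\phi_{1}+1)(\omega_{1}+x_{i})+(\phi_{1}+1)(\phi_{1}+2)}+\sum_{i=1}^{n}\ln(x_{i})-\sum_{i=1}^{n}\ln(\omega_{1}+x_{i}) = 0 $$

$$ \frac{\partial\ln L}{\partial\phi_{2}} = \frac{m}{\phi_{2}}+\sum_{i=1}^{m}\frac{(\omega_{2}+y_{i})+(2\phi_{2}+3)}{(\omega_{2}+y_{i})^{2}+(\phi_{2}+1)(\omega_{2}+y_{i})+(\phi_{2}+1)(\phi_{2}+2)}+\sum_{i=1}^{m}\ln(y_{i})-\sum_{i=1}^{m}\ln(\omega_{2}+y_{i}) = 0 $$

$$ \frac{\partial\ln L}{\partial\omega_{1}} = \frac{3n}{\omega_{1}}-\frac{n(2\omega_{1}+1)}{\omega_{1}^{2}+\omega_{1}+2}+\sum_{i=1}^{n}\frac{2(\omega_{1}+x_{i})+(\phi_{1}+1)}{(\omega_{1}+x_{i})^{2}+(\phi_{1}+1)(\omega_{1}+x_{i})+(\phi_{1}+1)(\phi_{1}+2)}-(\phi_{1}+3)\sum_{i=1}^{n}\frac{1}{\omega_{1}+x_{i}} = 0 $$

$$ \frac{\partial\ln L}{\partial\omega_{2}} = \frac{3m}{\omega_{2}}-\frac{m(2\omega_{2}+1)}{\omega_{2}^{2}+\omega_{2}+2}+\sum_{i=1}^{m}\frac{2(\omega_{2}+y_{i})+(\phi_{2}+1)}{(\omega_{2}+y_{i})^{2}+(\phi_{2}+1)(\omega_{2}+y_{i})+(\phi_{2}+1)(\phi_{2}+2)}-(\phi_{2}+3)\sum_{i=1}^{m}\frac{1}{\omega_{2}+y_{i}} = 0 $$

Solving these non-linear equations using any of the iterative methods available in R packages, we can obtain the MLEs of the parameters as $(\hat{\phi}_{1},\hat{\phi}_{2},\hat{\omega}_{1},\hat{\omega}_{2})$, and hence the MLE of $R$ is obtained as

$$ \hat{R} = G(\hat{\phi}_{1},\hat{\phi}_{2},\hat{\omega}_{1},\hat{\omega}_{2}) $$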
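Since the integral defining R has no simple closed form, it can be approximated numerically once parameter estimates are available. The sketch below is one way to do this, not the authors' procedure: it substitutes y = t/(1 − t) to map the half-line onto (0, 1) and applies a midpoint rule. The function name `stress_strength` is hypothetical, and the cdf used is the closed-form integral of the density (7).

```python
import math


def gsud_pdf(x, phi, omega):
    """Density of the G-SUD at x > 0, following the form of equation (7)."""
    u = omega + x
    bracket = u * u + (phi + 1.0) * (phi + omega + x + 2.0)
    norm = omega ** 2 + omega + 2.0
    return phi * omega ** 3 * bracket * x ** (phi - 1.0) / (norm * u ** (phi + 3.0))


def gsud_cdf(x, phi, omega):
    """Closed-form integral of gsud_pdf, written in t = x / (omega + x)."""
    t = x / (omega + x)
    a = omega ** 2 + (phi + 1.0) * (phi + omega + 2.0)
    b = omega + 2.0 * (phi + 2.0)
    norm = omega ** 2 + omega + 2.0
    return t ** phi * (a - phi * b * t + phi * (phi + 1.0) * t * t) / norm


def stress_strength(phi1, omega1, phi2, omega2, steps=20000):
    """Approximate R = P(X > Y) as the integral of [1 - F_X(y)] f_Y(y) dy."""
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps          # midpoint of the k-th cell in (0, 1)
        y = t / (1.0 - t)              # maps (0, 1) onto (0, inf)
        jacobian = 1.0 / (1.0 - t) ** 2
        survival_x = 1.0 - gsud_cdf(y, phi1, omega1)
        total += survival_x * gsud_pdf(y, phi2, omega2) * jacobian
    return total / steps


if __name__ == "__main__":
    # Illustrative parameter values, not estimates from the paper.
    print(stress_strength(3.0, 2.0, 2.0, 1.0))
    print(stress_strength(2.5, 1.5, 2.5, 1.5))  # identical laws, so close to 0.5
```

When strength and stress follow the same distribution the result should be close to one half, which gives a simple sanity check on the quadrature.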

IX. A Simulation Study

This section contains a simulation study to examine the consistency of the maximum likelihood estimators of the G-SUD. The mean, bias (B), MSE and variance of the MLEs are computed using the formulae

$$ \text{Mean} = \frac{1}{n}\sum_{i=1}^{n}\hat{H}_{i}, \quad B = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{H}_{i}-H\right), \quad \text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{H}_{i}-H\right)^{2}, \quad \text{Variance} = \text{MSE}-B^{2} $$

where $H = (\omega,\phi)$ and $\hat{H}_{i} = (\hat{\omega}_{i},\hat{\phi}_{i})$.

The simulation results for different parameter values of the G-SUD, obtained using the acceptance-rejection method, are presented in Tables 1 and 2 respectively:

a. The acceptance-rejection method for generating random samples from the G-SUD consists of the following steps.

i. Generate a random variable $Y$ from exponential$(\omega)$ and $U$ from Uniform$(0,1)$.

ii. If $U \leq \dfrac{f(Y)}{M\,g(Y)}$, then set $X = Y$ ("accept the sample"); otherwise "reject the sample" and repeat the whole process until the required number of samples is obtained, where $g$ is the density of the exponential proposal and $M$ is a constant.

b. The sample sizes $n = 25, 50, 100, 150, 200$ are taken.

c. The parameter values are taken as ($\phi = 5.0$, $\omega = 0.6$) and ($\phi = 6$, $\omega = 10$).

d. Each sample size is replicated 10000 times.

Tables 1 and 2 show that as the sample size increases, the biases, MSEs and variances of the MLEs of the parameters of the G-SUD tend to become smaller, which is in line with the first-order asymptotic theory of maximum likelihood estimators.
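The four summary measures used in the simulation study can be computed from a list of replicated estimates in a few lines. The sketch below assumes the estimates of a single parameter across replications are already available as a list; the function name `summarise_estimates` and the toy numbers in the usage lines are hypothetical and are not results from the paper.

```python
def summarise_estimates(estimates, true_value):
    """Return (mean, bias, MSE, variance) of replicated estimates.

    The variance is computed as MSE minus squared bias, as in the formulae
    of this section.
    """
    n = len(estimates)
    if n == 0:
        raise ValueError("at least one estimate is required")
    mean = sum(estimates) / n
    bias = sum(e - true_value for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    variance = mse - bias ** 2
    return mean, bias, mse, variance


if __name__ == "__main__":
    toy_estimates = [1.0, 2.0, 3.0]  # toy values for illustration only
    print(summarise_estimates(toy_estimates, true_value=2.5))
```

Because the variance is defined as MSE minus squared bias, it coincides with the population variance of the replicated estimates about their own mean.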

Table 1: The means, biases, MSEs and variances of the MLEs of the G-SUD for $\phi = 5.0$, $\omega = 0.6$

| Parameter | Sample size | Mean | Bias | MSE | Variance |
|---|---|---|---|---|---|
| $\hat{\phi}$ | 25 | 5.105803 | 0.1058031 | 0.01352763 | 0.002333342 |
| | 50 | 5.097851 | 0.0978509 | 0.01195673 | 0.00238192 |
| | 100 | 5.093918 | 0.0939184 | 0.0109286 | 0.00210792 |
| | 150 | 5.092278 | 0.0922778 | 0.01075683 | 0.00224162 |
| | 200 | 5.089048 | 0.0890482 | 0.00983284 | 0.00190325 |
| $\hat{\omega}$ | 25 | 0.595456 | -0.0045436 | 0.00004471 | 0.00002407 |
| | 50 | 0.595628 | -0.0043716 | 0.00004846 | 0.00002935 |
| | 100 | 0.596119 | -0.0038801 | 0.00004259 | 0.00002753 |
| | 150 | 0.596454 | -0.0035456 | 0.00003761 | 0.00002504 |
| | 200 | 0.596588 | -0.0034117 | 0.00003651 | 0.00002487 |

Table 2: The means, biases, MSEs and variances of the MLEs of the G-SUD for $\phi = 6.0$, $\omega = 10$

| Parameter | Sample size | Mean | Bias | MSE | Variance |
|---|---|---|---|---|---|
| $\hat{\phi}$ | 25 | 5.945172 | -0.05482844 | 0.0042597 | 0.00125354 |
| | 50 | 5.961664 | -0.03833594 | 0.0027186 | 0.00271866 |
| | 100 | 5.980525 | -0.01947528 | 0.0025010 | 0.00212172 |
| | 150 | 5.985068 | -0.01893228 | 0.0023664 | 0.00200800 |
| | 200 | 5.987536 | -0.01246439 | 0.0022490 | 0.00209365 |
| $\hat{\omega}$ | 25 | 10.08853 | 0.08852744 | 0.0120358 | 0.00419872 |
| | 50 | 10.06313 | 0.06317850 | 0.0088113 | 0.00481984 |
| | 100 | 10.03813 | 0.03813436 | 0.0064691 | 0.00501485 |
| | 150 | 10.02125 | 0.02124674 | 0.0064981 | 0.00604670 |
| | 200 | 10.00821 | 0.00820760 | 0.0055774 | 0.00550691 |

X. Applications

This section compares the goodness of fit of the G-SUD with that of the G-LD, G-SD, Weibull and gamma distributions, using two real datasets relating to the survival times of acute bone cancer patients and of head and neck cancer patients. Summaries of the two datasets are presented in Tables 3 and 4 respectively. The total time on test (TTT) plots of the two datasets are given in Figures 5 and 6 respectively. The goodness of fit of the considered distributions for the two datasets is reported in Tables 5 and 6 respectively. The fitted plots of the considered distributions for the two datasets are given in Figure 7. The P-P plots of the considered distributions for the two datasets are presented in Figures 8 and 9 respectively. The datasets are as follows:

Dataset 1: Acute bone cancer

This dataset represents the survival times (in days) of 73 patients diagnosed with acute bone cancer, available in Mansour et al [15], and is as follows:

0.09, 0.76, 1.81, 1.10, 3.72, 0.72, 2.49, 1.00, 0.53,0.66, 31.61, 0.60, 0.20, 1.61, 1.88, 0.70, 1.36, 0.43, 3.16, 1.57, 4.93, 11.07, 1.63, 1.39, 4.54, 3.12,86.01, 1.92, 0.92, 4.04, 1.16, 2.26, 0.20, 0.94, 1.82, 3.99, 1.46, 2.75, 1.38, 2.76, 1.86, 2.68, 1.76,0.67, 1.29, 1.56, 2.83, 0.71, 1.48, 2.41, 0.66, 0.65, 2.36, 1.29, 13.75, 0.67, 3.70, 0.76, 3.63, 0.68,2.65, 0.95, 2.30, 2.57, 0.61, 3.93, 1.56, 1.29, 9.94, 1.67, 1.42, 4.18, 1.37.

Table 3: Summary of the acute bone cancer dataset

| Min. | 1st Qu. | Median | Mean | Variance | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 0.090 | 0.920 | 1.570 | 3.755 | 112.33 | 2.750 | 86.010 |

Fig.5: TTT-plot of the acute bone cancer dataset and simulated data of G-SUD respectively.

Dataset 2: Head and Neck cancer

This dataset gives the survival times of 44 patients diagnosed with head and neck cancer, available in Efron [16], and is as follows:

12.20, 23.56, 23.74, 25.87, 31.98, 37, 41.35, 47.38, 55.46, 58.36, 63.47, 68.46, 78.26, 74.47, 81.43, 84, 92, 94, 110, 112, 119, 127, 130, 133, 140, 146, 155, 159, 173, 179, 194,195, 209, 249, 281, 319, 339, 432, 469, 519, 633, 725, 817, 1776

Table 4: Summary of the head and neck cancer dataset

| Min. | 1st Qu. | Median | Mean | Variance | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 12.20 | 67.21 | 128.50 | 223.48 | 93286.41 | 219.00 | 1776.00 |

Fig.6: TTT-plot of the head and neck cancer dataset and simulated data of G-SUD respectively.

Table 5: ML estimates, $-2\log L$, AIC, BIC and K-S statistics with their p-values for the distributions fitted to the acute bone cancer dataset

| Distribution | ML estimate $\hat{\phi}$ (S.E. of $\hat{\phi}$) | ML estimate $\hat{\omega}$ (S.E. of $\hat{\omega}$) | $-2\log L$ | AIC | BIC | K-S | p-value |
|---|---|---|---|---|---|---|---|
| G-SUD | 4.4567 (1.1253) | 0.7646 (0.1776) | 281.7757 | 285.7757 | 300.0857 | 0.09 | 0.86 |
| G-SD | 4.8969 (1.3904) | 0.4967 (0.1360) | 282.8051 | 286.8051 | 301.1151 | 0.10 | 0.39 |
| G-LD | 5.1600 (1.8468) | 0.4375 (0.1602) | 284.315 | 288.315 | 302.625 | 0.11 | 0.33 |
| Gamma | 0.1985 (0.0389) | 0.7456 (0.1057) | 334.5311 | 338.5311 | 352.8411 | 0.56 | 0.00 |
| Weibull | 0.4395 (0.0687) | 0.7655 (0.0567) | 322.8033 | 326.8033 | 341.1133 | 0.25 | 0.00 |

Table 6: ML estimates, $-2\log L$, AIC, BIC and K-S statistics with their p-values for the distributions fitted to the head and neck cancer dataset

| Distribution | ML estimate $\hat{\phi}$ (S.E. of $\hat{\phi}$) | ML estimate $\hat{\omega}$ (S.E. of $\hat{\omega}$) | $-2\log L$ | AIC | BIC | K-S | p-value |
|---|---|---|---|---|---|---|---|
| G-SUD | 8.6223 (11.3202) | 11.1699 (14.5932) | 558.4763 | 562.4763 | 576.7863 | 0.08 | 0.90 |
| G-SD | 8.6787 (11.7435) | 10.0923 (14.8515) | 558.4641 | 562.4641 | 576.7741 | 0.09 | 0.81 |
| G-LD | 8.4483 (10.4902) | 11.1557 (14.3688) | 558.4555 | 562.4555 | 576.7655 | 0.09 | 0.70 |
| Gamma | 0.0047 (0.0010) | 1.0522 (0.1886) | 564.0254 | 568.0254 | 582.3354 | 1.00 | 0.00 |
| Weibull | 0.0070 (0.0034) | 0.9234 (0.0809) | 563.7155 | 567.7155 | 582.0255 | 0.5 | 0.04 |

Fig. 7: Fitted plots of distributions for acute bone cancer and head and neck cancer datasets (panels: acute bone cancer; head and neck cancer)

Fig. 8: P-P plots for considered distributions of acute bone cancer dataset

Fig. 9: P-P plots for considered distributions of head and neck cancer dataset

From the summaries of the two datasets in Tables 3 and 4, it is clear that the datasets are highly positively skewed and highly over-dispersed. Based on the values of $-2\log L$, AIC (Akaike information criterion) and the Kolmogorov-Smirnov (K-S) statistic, together with the fitted plots of the two-parameter lifetime distributions, the goodness of fit indicates that the two-parameter G-SUD is the best of the considered distributions for modelling the survival times of patients suffering from acute bone cancer and head and neck cancer. Recently, Klakattawi [17] proposed a new extended Weibull distribution with five parameters, used it for analysing the survival times of cancer patients, and found that it gave a much better fit than several two-, three-, four- and five-parameter lifetime distributions, including the Weibull distribution, the alpha power Weibull (APW) distribution by Nassar et al [7], the Beta-Weibull (BW) distribution by Famoye et al [4], the Kumaraswamy-Weibull (Kum-W) distribution by Cordeiro et al [5], the exponentiated generalized Weibull (EGW) distribution by Cordeiro et al [3], the new Kumaraswamy family of generalized Weibull distributions by Ahmed et al [18] and the exponentiated Kumaraswamy Weibull distribution by Eissa [6], among others. We would like to emphasize that the proposed gamma-Sujatha distribution (G-SUD) provides a much closer fit than all these two-, three-, four- and five-parameter lifetime distributions, as can be seen by comparison with the goodness-of-fit results given by Klakattawi [17]. The most interesting feature of the G-SUD is that, being a two-parameter distribution, it is much easier to characterize and handle than three-, four- and five-parameter distributions, and hence it can be considered an important probability model for the survival times of cancer patients.

XI. Concluding Remarks

In this paper we propose the gamma-Sujatha probability model, a compound of the gamma and Sujatha distributions, for modeling long-tailed data. Some of its important statistical and reliability properties have been discussed. Maximum likelihood estimation of its parameters has been discussed, and a simulation study on the consistency of the ML estimators has been presented. The goodness of fit of the G-SUD has been compared with that of several well-known two-parameter distributions; it is observed to provide a much better fit, and hence it can be considered an important probability model for the survival times of patients suffering from acute bone cancer and head and neck cancer in biomedical science. As the proposed distribution is a new probability model, much further work can be done on it, and it is expected to draw the attention of researchers in the biomedical sciences and biomedical engineering.

Conflict of Interest

The authors declare that there is no conflict of interest.


References

[1] Weibull, W. (1951). A statistical distribution function of wide applicability. Journal of Applied Mechanics, 18:293-297.

[2] Shanker, R., Shukla, K.K., Shanker, R. and Tekie, A.L. (2016). On modelling of lifetime data using two-parameter gamma and Weibull distributions. Biometrics & Biostatistics International Journal, 4:201-206.

[3] Cordeiro, G.M., Ortega, E.M.M. and Da-Cunha, D.C.C. (2013). The exponentiated generalized class of distributions. Journal of Data Science, 11:1-27.

[4] Famoye, F., Lee, C. and Olumolade, O. (2005). The Beta-Weibull distribution. Journal of Statistical Theory and Applications, 4:121-136.

[5] Cordeiro, G.M., Ortega, E.M.M. and Nadarajah, S. (2010). The Kumaraswamy Weibull distribution with application to failure data. Journal of the Franklin Institute, 347:1399-1429.

[6] Eissa, F.H. (2017). The exponentiated Kumaraswamy-Weibull distribution with application to real data. International Journal of Statistics and Probability, 6:167-182.

[7] Nassar, M., Alzaatreh, A., Mead, M. and Abo-Kasem, O. (2017). Alpha power Weibull distribution: properties and applications. Communications in Statistics - Theory and Methods, 46:10236-10252.

[8] Lindley, D.V. (1958). Fiducial distributions and Bayes' theorem. Journal of the Royal Statistical Society, 20:102-107.

[9] Shanker, R. (2015). Shanker distribution and its applications. International Journal of Statistics and Applications, 5:338-348.

[10] Shanker, R. (2016). Sujatha distribution and its applications. Statistics in Transition New Series, 17:391-410.

[11] Abdi, M., Asgharzadeh, A., Bakouch, H.S. and Alipour, Z. (2019). A new compound gamma and Lindley distribution with application to failure data. Austrian Journal of Statistics, 48:54-75.

[12] Ray, M. and Shanker, R. (2023a). A compound of gamma and Shanker distribution. Reliability: Theory & Applications, 18:87-99.

[13] Ray, M. and Shanker, R. (2023b). A compound of exponential and Shanker distribution with an application. Journal of Scientific Research of the Banaras Hindu University, 67:39-46.

[14] Shaked, M. and Shanthikumar, J.G. (1994). Stochastic Orders and Their Applications. Academic Press, New York.

[15] Mansour, M., Yousof, H.M., Shehata, W.A. and Ibrahim, M. (2020). A new two parameter Burr XII distribution: properties, copula, different estimation methods and modeling acute bone cancer data. Journal of Nonlinear Science and Applications, 13:223-238.

[16] Efron, B. (1988). Logistic regression, survival analysis, and the Kaplan-Meier curve. Journal of the American Statistical Association, 83:414-425.

[17] Klakattawi, H.S. (2022). Survival analysis of cancer patients using a new extended Weibull distribution. PLOS ONE, 17:1-20.

[18] Ahmed, M.A., Mahmoud, M.R. and Elsherbini, E.A. (2015). The new Kumaraswamy family of generalized distributions with application. Pakistan Journal of Statistics and Operation Research, 11:159-180.
