Analysing Random Censored Data from Discrete Teissier Model

Abhishek Tyagi1, Bhupendra Singh1, Varun Agiwal2, Amit Singh Nayal1

1 Department of Statistics, Chaudhary Charan Singh University, Meerut, India
2 Indian Institute of Public Health, Hyderabad, Telangana, India
abhishektyagi033@gmail.com, bhupendra.rana@gmail.com, varunagiwal.stats@gmail.com, *amitnayal009@gmail.com

Abstract

This paper deals with the classical and Bayesian estimation of the discrete Teissier distribution with randomly censored data. We have obtained the maximum likelihood point and interval estimator for the unknown parameter. Under the squared error loss function, a Bayes estimator is also computed utilising informative and non-informative priors. Furthermore, an algorithm to generate randomly right-censored data from the proposed model is presented. The performance of various estimation approaches is compared through comprehensive simulation studies. Finally, the applicability of the suggested discrete model has been demonstrated using two real datasets. The results show that the suggested discrete distribution fits censored data adequately and can be used to analyse randomly right-censored data generated from various domains.

Keywords: Bayesian estimation; Classical estimation; Discrete Teissier distribution; Random censoring.

1. Introduction

In many instances, the collection of data is constrained by time or budgetary limitations, making it difficult to obtain the whole data set. Such partial data are known as censored data. Various censoring schemes are available in the literature to examine such partial data. The conventional Type I and Type II censoring techniques are the most often used schemes. In Type I censoring, the event is observed only if it occurs prior to some pre-specified time, whereas in Type II censoring, the study continues until a predetermined number of individuals are observed to have failed. Random censoring is another important censoring technique in the literature; it occurs when the subject under study is lost or removed from the experiment before its failure or the event of interest. This type of censoring commonly arises in medical time-to-event studies; for example, in clinical trials some patients do not complete the course of treatment and leave before the termination point. Therefore, a subject who leaves the study before the event of interest occurs has a randomly censored value. Random censoring was introduced in the literature by [1] as part of his doctoral dissertation. For more details about censoring schemes, their generalizations, and analysis, one may refer to [2].

Randomly censored lifetime data frequently occur in many applications, such as medical science, biology, and reliability studies, and need to be analysed properly to make correct inferences and draw suitable research conclusions. These data are often right-censored because it is not always possible to observe the patients or items under study until failure, or because patients may withdraw during the study period. In the existing literature, the random censoring scheme is widely studied under continuous models [see [3]].

In recent years, researchers in a variety of domains have acknowledged the distinctive role played by discrete distributions. In certain circumstances, discrete distributions are more suitable than continuous distributions, even if the data are collected on a continuous scale. Also, discrete distributions are the only choice if the number of completed cycles of operation is used to measure the lifetime of different types of equipment [see [4], [5], [6] and references cited therein]. Most of the discrete models in the present literature were designed primarily to fit count data, and in most cases they fail to capture the diversity of censored data. In the literature, a few studies have considered a random censoring scheme for discrete models, viz. [7], [8], [9]; recently, [10] discussed inference for the discrete inverted Nadarajah-Haghighi distribution with complete and randomly censored data. Because most discrete distributions do not sufficiently portray the variety of real-world censored data, there is always a need for novel discrete distributions that can fit censored data adequately. One such discrete distribution is the Discrete Teissier (DT) distribution proposed by [11], which provides the flexibility to fit censored data with just a single parameter. Moreover, it can also model equi-, over-, and under-dispersed, positively skewed, negatively skewed, and increasing failure rate data. The probability mass function (PMF) of the DT distribution is given by

$$p_y = P[Y = y] = \exp(1)\exp(\lambda y)\left(\exp(-e^{\lambda y}) - \exp(\lambda - e^{\lambda(y+1)})\right); \quad y = 0, 1, 2, \ldots, \; \lambda > 0. \quad (1)$$

Putting $\theta = \exp(\lambda)$, the PMF (1) can be written as

$$p_y = P[Y = y] = \exp(1)\,\theta^{y}\left(\exp(-\theta^{y}) - \theta\exp(-\theta^{y+1})\right); \quad y = 0, 1, 2, \ldots, \; \theta > 1. \quad (2)$$

The cumulative distribution function (CDF) corresponding to PMF (2) is

$$F(y) = 1 - \theta^{y+1}\exp\left(1 - \theta^{y+1}\right); \quad y = 0, 1, 2, \ldots, \; \theta > 1. \quad (3)$$
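Although the paper gives no code, the PMF (2) and CDF (3) translate directly into R. The sketch below is a minimal version using the hypothetical names dDT and pDT (our own, not from the paper).

```r
# Minimal sketch of the DT PMF (2) and CDF (3), vectorised over y, for theta > 1
dDT <- function(y, theta) exp(1) * theta^y * (exp(-theta^y) - theta * exp(-theta^(y + 1)))
pDT <- function(y, theta) 1 - theta^(y + 1) * exp(1 - theta^(y + 1))

# Quick sanity check: the PMF should sum to (approximately) one
sum(dDT(0:200, theta = 1.5))
```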

In this paper, we investigate the features of the DT distribution under randomly censored data. The article is organized as follows: the maximum likelihood estimator (MLE) for the model's parameter under randomly right-censored data is discussed in Section 2. Section 3 deals with the Bayesian estimation of the unknown parameter. The algorithm to generate censored data from the proposed model is given in Section 4. We use a Monte Carlo simulation analysis in Section 5 to investigate the characteristics of the different estimators established in the previous sections. Section 6 deals with real data analysis to study the applications of random censoring in the DT distribution. Finally, some concluding remarks are given in Section 7.

2. Method of maximum likelihood

2.1. Point Estimation

In this part, we compute the maximum likelihood point estimator for the DT distribution's parameter $\theta$ in the presence of randomly censored data. Let $y_i$ be the $i$th individual's lifetime. In the presence of right-censored observations, the $i$th individual contributes to the likelihood function (LF), based on a random sample $(y_i, d_i)$ of size $n$, as follows:

$$L_i = [p(y_i)]^{d_i}\,[S(y_i)]^{1-d_i},$$

where $S(y_i)$ is the survival function and $d_i$ is a censoring indicator variable, that is, $d_i = 1$ for an observed lifetime and $d_i = 0$ for a censored lifetime ($i = 1, 2, \ldots, n$). Then, for the DT distribution under random censoring, the LF of $\theta$ is given by

$$L(y, \theta) = \exp(n)\,\theta^{\sum_{i=1}^{n}(y_i - d_i + 1)}\exp\left(-\sum_{i=1}^{n}\theta^{y_i+1}\right)\prod_{i=1}^{n}\left(\exp(\theta^{y_i+1} - \theta^{y_i}) - \theta\right)^{d_i}. \quad (4)$$

The log likelihood (LL) function corresponding to LF (4) is

$$LL = n + \sum_{i=1}^{n}(y_i - d_i + 1)\log\theta - \sum_{i=1}^{n}\theta^{y_i+1} + \sum_{i=1}^{n} d_i \log\left(\exp(\theta^{y_i+1} - \theta^{y_i}) - \theta\right). \quad (5)$$

Taking the partial derivative of the LL function (5) with respect to the parameter $\theta$, we get the following normal equation,

$$\frac{\partial LL}{\partial\theta} = \frac{1}{\theta}\left[\sum_{i=1}^{n}(y_i - d_i + 1) - \sum_{i=1}^{n}(E_2 + 1) - \sum_{i=1}^{n}\frac{d_i\,(\theta E_1 + E_3)}{1 - \theta E_1}\right] = 0, \quad (6)$$

where $E_1 = \exp(\theta^{y_i} - \theta^{y_i+1})$, $E_2 = (y_i + 1)\,\theta^{y_i+1} - 1$ and $E_3 = \theta^{y_i}\left(y_i - (y_i + 1)\,\theta\right)$. The MLE of the parameter $\theta$ can be obtained by solving Equation (6); however, this equation does not admit an analytical solution. We therefore employ an iterative method such as Newton-Raphson (NR) and compute the estimate numerically using built-in routines in the R software.

2.2. Interval Estimation

The MLE of the unknown parameter $\theta$ is not available in closed form; hence, the exact distribution of the MLE of $\theta$ cannot be derived, and it is infeasible to compute an exact confidence interval for $\theta$. We therefore construct the asymptotic confidence interval (ACI) for $\theta$ using the asymptotic distribution of the MLE. We know that the MLE $\hat{\theta}$ of $\theta$ is consistent and asymptotically Gaussian, with $\sqrt{n}(\hat{\theta} - \theta)$ following $N\left(0, I^{-1}(\theta)\right)$, where $I(\theta) = E\left[-\frac{\partial^2 LL}{\partial\theta^2}\right]$. Therefore, the variance of the estimator $\hat{\theta}$ can be approximated as $V(\hat{\theta}) \approx J^{-1}(\hat{\theta})$, where $J(\hat{\theta}) = -\left.\frac{\partial^2 \log L}{\partial\theta^2}\right|_{\theta = \hat{\theta}}$.

The second-order partial derivative of the LL function (5) is

$$\frac{\partial^2 LL}{\partial\theta^2} = -\frac{1}{\theta^2}\sum_{i=1}^{n}(y_i - d_i + 1) - \frac{1}{\theta^2}\sum_{i=1}^{n}y_i(1 + E_2) + \frac{1}{\theta^2}\sum_{i=1}^{n}\frac{d_i\left[(1 - \theta E_1)\left(E_3^2 - y_i(E_3 - \theta^{y_i})\right) - (\theta E_1 + E_3)^2\right]}{(1 - \theta E_1)^2}.$$

Hence, the $100(1-\alpha)\%$ ACI for the parameter $\theta$ is

$$\hat{\theta} \pm Z_{\alpha/2}\sqrt{V(\hat{\theta})},$$

where $Z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard Gaussian distribution.
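Continuing the sketch of Section 2.1, a small helper (our own, hypothetical) can turn the minimized negative log-likelihood into the ACI above by evaluating the observed information $J(\hat{\theta})$ numerically with stats::optimHess.

```r
# Hypothetical helper: 100*level% ACI from a negative log-likelihood 'negLL'
# (a function of theta only) and its minimiser 'theta_hat'
aci_DT <- function(theta_hat, negLL, level = 0.95) {
  J  <- optimHess(theta_hat, negLL)              # numerical -d^2 LL / d theta^2 at the MLE
  se <- sqrt(1 / J[1])                           # V(theta_hat) ~ J^{-1}(theta_hat)
  theta_hat + c(-1, 1) * qnorm(1 - (1 - level) / 2) * se
}
# e.g. aci_DT(theta_hat, function(t) negLL_DT(t, y, d))
```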

3. Bayesian estimation

Bayesian estimation blends prior and experimental information, in terms of a prior density and the LF respectively, to derive posterior inferences about the unknown quantities. The prior information is generally divided into two categories: informative priors and non-informative priors. Here, we perform Bayesian estimation using both informative and non-informative priors to obtain Bayes estimators of the unknown parameter. Furthermore, the highest posterior density (HPD) interval for the parameter $\theta$ is also derived.

Case 1: An informative prior (IP) is used when a probability distribution for the parameter $\theta$ provides adequate and full information. In this scenario, we suppose that $\theta$ has an exponential prior distribution with density

$$g(\theta) = a\,e^{-a(\theta - 1)}; \quad \theta > 1, \; a > 0. \quad (7)$$

By combining the prior distribution (7) with the LF (4) using the Bayes rule, the posterior distribution of $\theta$ given the data is

$$P_1(\theta|y) \propto \theta^{\sum_{i=1}^{n}(y_i - d_i + 1)}\exp\left(-\sum_{i=1}^{n}\theta^{y_i+1} - a\theta\right)\prod_{i=1}^{n}\left(\exp(\theta^{y_i+1} - \theta^{y_i}) - \theta\right)^{d_i}. \quad (8)$$

A loss function reflects the statistical risk (error) that arises while estimating parameters. It is a function of true and estimated parameters and is used to choose the best estimator with the lowest risk. The squared error loss function (SELF), which gives equal weight to overestimation and underestimation, is one of the most often used loss functions in literature. The Bayes estimator of a parameter under SELF is simply the expectation of that parameter with respect to its posterior distribution.

For the proposed distribution, the Bayes estimator of a function of the parameter $\theta$, say $\phi(\theta)$, under SELF is

$$\hat{\phi}(\theta) = \int_{1}^{\infty}\phi(\theta)\,P_1(\theta|y)\,d\theta. \quad (9)$$

The integral (9) cannot be given explicitly because the posterior distribution (8) is not in closed form. In this case, we may use a family of Markov Chain Monte Carlo (MCMC) algorithms to mimic draws from a posterior distribution. The Metropolis-Hastings (MH) algorithm [[12] and [13]] is a prominent approach in MCMC that generates a chain of random samples based on a given function, which may then be used to get Bayes estimates of interest.

To implement the MH algorithm for the proposed model, we go through the following steps:

Step 1. Set an initial value $\theta^{(0)}$ of $\theta$ and begin with $i = 1$.

Step 2. Propose a move $\theta^{*(i)}$ from a candidate proposal density $g\left(\theta^{(i-1)}, \theta^{*(i)}\right)$.

Step 3. Calculate the Hastings ratio

$$\rho\left(\theta^{(i-1)}, \theta^{*(i)}\right) = \frac{P_1\left(\theta^{*(i)}\,|\,y\right)\, g\left(\theta^{*(i)}, \theta^{(i-1)}\right)}{P_1\left(\theta^{(i-1)}\,|\,y\right)\, g\left(\theta^{(i-1)}, \theta^{*(i)}\right)}.$$

Step 4. Draw $u \sim U(0, 1)$. Accept the proposed move, setting $\theta^{(i)} = \theta^{*(i)}$, if $u \le \min\left\{1, \rho\left(\theta^{(i-1)}, \theta^{*(i)}\right)\right\}$; otherwise reject it and set $\theta^{(i)} = \theta^{(i-1)}$.

Step 5. Set $i = i + 1$.

Step 6. Repeat Steps 2-5 for all $i = 1, 2, 3, \ldots, M$, where $M$ is large, to simulate the sequence of samples $\theta^{(i)}$, $i = 1, 2, 3, \ldots, M$.

Step 7. The Bayes estimator of $\theta$ under SELF is calculated as

$$\hat{\theta} = \frac{1}{M - m}\sum_{i=m+1}^{M}\theta^{(i)},$$

where $m$ is the number of burn-in iterations of the Markov chain.

To compute the HPD interval for $\theta$, let $\theta_{(m+1)} \le \theta_{(m+2)} \le \ldots \le \theta_{(M)}$ denote the ordered values of $\theta_{m+1}, \theta_{m+2}, \ldots, \theta_{M}$. Then, by the algorithm of [14], the $100(1-\alpha)\%$ HPD interval for $\theta$ is $\left(\theta_{(m+i^*)}, \theta_{(m+i^*+[(1-\alpha)(M-m)])}\right)$, where $i^*$ is chosen so that

$$\theta_{(m+i^*+[(1-\alpha)(M-m)])} - \theta_{(m+i^*)} = \min_{1 \le i \le (M-m)-[(1-\alpha)(M-m)]}\left(\theta_{(m+i+[(1-\alpha)(M-m)])} - \theta_{(m+i)}\right).$$

Case 2: A non-informative prior (NIP) is used when little or no information is available about the unknown parameter. For the proposed model, we perform the Bayesian analysis when $\theta$ has a NIP of the following form,

$$g(\theta) \propto \frac{1}{\theta}; \quad \theta > 1. \quad (10)$$

The un-normalized posterior distribution of $\theta$ given the data is computed by combining the prior distribution (10) with the LF (4):

$$P_2(\theta|y) \propto \theta^{\sum_{i=1}^{n}(y_i - d_i) + (n-1)}\exp\left(-\sum_{i=1}^{n}\theta^{y_i+1}\right)\prod_{i=1}^{n}\left(\exp(\theta^{y_i+1} - \theta^{y_i}) - \theta\right)^{d_i}. \quad (11)$$

Since the posterior distribution $P_2(\theta|y)$ is again not in closed form, the Bayes estimator of $\theta$ cannot be obtained analytically. Therefore, using the same algorithm as in Case 1, we can obtain the required point and interval estimates.

4. Algorithm to simulate random right censored data

In this section, we present a simple algorithm to generate randomly right-censored data from the proposed model [15]. The algorithm consists of the following steps:

Step 1. Fix the value of the parameter $\theta$.

Step 2. Draw $n$ pseudo-random numbers from Uniform(0,1), i.e. $u_i \sim U(0, 1)$; $i = 1, 2, \ldots, n$.

Step 3. Obtain $y_i' = F^{-1}(u_i)$; $i = 1, 2, \ldots, n$, where $F^{-1}(\cdot)$ is the inverse of the CDF given in Equation (3).

Step 4. Draw $n$ pseudo-random censoring times $c_i \sim U(0, \max(y_i'))$; $i = 1, 2, \ldots, n$. This is the distribution that controls the censoring mechanism.

Step 5. If $y_i' \le c_i$, then $y_i = [y_i']$ and $d_i = 1$; otherwise $y_i = [c_i]$ and $d_i = 0$, $i = 1, 2, \ldots, n$.

Hence, the pairs of values $(y_1, d_1), (y_2, d_2), \ldots, (y_n, d_n)$ are obtained as the randomly right-censored data.
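A minimal R version of this algorithm is sketched below (our illustration, not the authors' code). Step 3 is evaluated through the continuous Teissier quantile function given in Section 5, computed with lamW::lambertWm1, and the pseudo-continuous lifetimes are discretized by the floor operator in Step 5.

```r
# Sketch of Steps 1-5: pseudo-continuous lifetimes from the Section 5 quantile
# function (via the Lambert W function), then uniform censoring and flooring
library(lamW)

rDT_censored <- function(n, theta) {
  u  <- runif(n)                                         # Step 2: Uniform(0,1) draws
  yc <- log(-lambertWm1((u - 1) / exp(1))) / log(theta)  # Step 3: inverse-CDF value
  cc <- runif(n, 0, max(yc))                             # Step 4: censoring variable
  d  <- as.integer(yc <= cc)                             # Step 5: censoring indicator
  y  <- floor(pmin(yc, cc))                              # observed (discretized) value
  data.frame(y = y, d = d)
}

# Example: a censored sample of size 50 with theta = 1.5
set.seed(1)
head(rDT_censored(50, theta = 1.5))
```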

5. Simulation study

The performance of the MLE and the Bayes estimators (under IP and NIP with SELF) for randomly right-censored data is investigated in this section via a simulation study. The whole study is based on random samples drawn from the DT distribution of sizes 20, 25, ..., 150. The true values of the parameter are taken as 1.05, 1.50, 2.0, and 3.0. To produce the required random variable Y from the DT distribution, we employed the conventional strategy of first drawing a pseudo-random value X from the continuous Teissier distribution and then discretizing this value to obtain Y. A random variable X may be generated by using the following formula:

$$Q(u) = \frac{1}{\lambda}\log\left(-W_{-1}\left(\frac{u - 1}{\exp(1)}\right)\right); \quad 0 < u < 1,$$

where $\theta = \exp(\lambda)$ and $W_{-1}$ denotes the lower branch of the Lambert W function, whose value can easily be obtained with the built-in R function lambertWm1 available in the package lamW. The method described in Section 4 is utilized to produce the required randomly right-censored data. All simulation results are based on 2000 replicates for each of the sample sizes and parameter settings considered. We have calculated the mean-squared error (MSE) and the average absolute bias (AB) for the MLE and Bayes point estimates, and the average width (AW) of the 95% ACI and HPD intervals with their respective coverage probabilities (CP), based on these 2000 values; the resulting findings are shown in Figures 1-3. Notably, when Bayesian estimation is used, the parameter of the DT distribution is estimated using an exponential prior as the IP and a uniform prior as the NIP. Under the exponential prior, the value of the hyper-parameter is chosen so that the expectation of the corresponding prior density of the unknown parameter equals its actual parametric value. In this estimation scenario, we drew 51,000 MCMC samples for the parameter of the proposed distribution using the MH algorithm, excluding the first 11,000 samples as a burn-in phase to eliminate the effect of the initial values. Additionally, to reduce the autocorrelation between successive draws, every tenth observation has been retained. We have finally calculated the posterior quantities of interest using the generated posterior samples.
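For concreteness, the replication summaries can be computed with a small helper like the one below (our own, hypothetical): given the point estimates and interval limits from the replications together with the true value theta0, it returns the MSE, AB, AW and CP.

```r
# Hypothetical helper summarising simulation replications for a true value theta0;
# est, lower, upper are vectors with one entry per replication
sim_summary <- function(est, lower, upper, theta0) {
  c(MSE = mean((est - theta0)^2),                  # mean-squared error
    AB  = mean(abs(est - theta0)),                 # average absolute bias
    AW  = mean(upper - lower),                     # average interval width
    CP  = mean(lower <= theta0 & theta0 <= upper)) # coverage probability
}
```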

The following are some important inferences that are drawn from Figures 1-3:

The MLE and Bayes estimators of the unknown parameter show the consistency property, i.e., the MSE reduces as the sample size rises.

As n becomes larger, the average AB approaches zero.

The Bayes estimator with IP performs better than the MLE and the Bayes estimator with NIP.

The AW of the HPD intervals under IP is smaller than that of the ACI and of the HPD intervals under NIP.


For large values of the parameter $\theta$, all estimation methods produce nearly similar results. A similar trend is observed when the sample size $n$ becomes large.

Figure 1: The MLE and Bayes estimates for (i) θ = 1.05 (ii) θ = 1.50 (iii) θ = 2.0 (iv) θ = 3.0.

Figure 2: The classical AW and CP for (i) θ = 1.05 (ii) θ = 1.50 (iii) θ = 2.0 (iv) θ = 3.0.

Figure 3: The HPD AW and CP for (i) θ = 1.05 (ii) θ = 1.50 (iii) θ = 2.0 (iv) θ = 3.0.

6. Application to random censored data

Here, we examine two real datasets to demonstrate the applicability of the DT model to censored data. These data sets along with their fitting are described as follows:

The first data set (I): This data set consists of failure times for epoxy insulation specimens at the voltage level 57.5 kV [see [16], pp. 335]. The failure times, in minutes, for the insulation specimens are given below (censoring times are indicated with asterisks): 510, 1000*, 252, 408, 528, 690, 900*, 714, 348, 546, 174, 696, 294, 234, 288, 444, 390, 168, 558, 288. Using the Kolmogorov-Smirnov (K-S) statistic, we now evaluate the suitability of the DT distribution for modelling the above data. The K-S statistic and associated p-value of 0.17219 and 0.5936 indicate that the proposed model, with MLE and associated standard error (SE) in parentheses of 1.00195 (0.00018), sufficiently reflects the diversity of the data. Figure 4 (upper left panel) depicts the unique existence of the MLE, whereas Figure 4 (upper right panel) demonstrates that the suggested model captures the data accurately. In addition, Table 1 displays the ACI, the Bayes estimate and HPD interval under the NIP, and the K-S statistic with its p-value.
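As an illustration (not the authors' code), the MLE for data set I can be computed along the lines of the Section 2.1 sketch; the indicator d flags the two starred (censored) observations.

```r
# Fitting the DT model to data set I; d = 0 marks the censored times (1000*, 900*)
y <- c(510, 1000, 252, 408, 528, 690, 900, 714, 348, 546,
       174, 696, 294, 234, 288, 444, 390, 168, 558, 288)
d <- c(1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)

negLL <- function(theta) {                 # negative of the log-likelihood (5)
  w <- theta^(y + 1) - theta^y
  -(length(y) + sum((y - d + 1) * log(theta)) - sum(theta^(y + 1)) +
      sum(d * (w + log1p(-theta * exp(-w)))))
}
optimize(negLL, interval = c(1 + 1e-8, 1.05))$minimum   # MLE of theta (compare with Table 1)
```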

The second data set (II): These data come from a nine-month study on the effect of the known carcinogens DES and DMBA in the induction of mammary tumors in female rats [see [16], pp. 339]. After treatment, the times to tumor appearance for the animals were noted. The censored observations are indicated by asterisks. The data values are 57*, 67*, 88, 94, 100, 107, 113, 123, 123, 125, 129, 129, 129, 136, 136, 143, 144, 191, 191, 192, 211, 218, 266*, 266*. For these data, the MLE (SE) of the parameter $\theta$ is 1.00658 (0.00058). Now, using this estimate for the considered data, the K-S statistic and associated p-value are 0.2354 and 0.1396, respectively. This well-known goodness-of-fit measure indicates that the suggested discrete model is adequate for modelling the given censored data. The unique existence of the MLE can be verified from Figure 4 (lower left panel). Graphically, from Figure 4 (lower right panel), we can conclude that the DT model closely follows the pattern of this censored data. Also, we have obtained the ACI, the Bayes estimate and HPD interval under the NIP, and the K-S statistic with its p-value; they can be viewed in Table 1.

Table 1: Classical and Bayesian estimates for censored data sets I and II.

Data set   Estimates                          K-S       P-value
I          MLE (SE)    1.00193 (0.00018)      0.17219   0.5936
           ACI         [1.00158, 1.00228]
           Bayes (SE)  1.00199 (0.00028)      0.1596    0.68790
           HPD         [1.00148, 1.00246]
II         MLE (SE)    1.00658 (0.00058)      0.23549   0.13960
           ACI         [1.00546, 1.00773]
           Bayes (SE)  1.00660 (0.00061)      0.23465   0.14220
           HPD         [1.00549, 1.00766]

Figure 4: The LL and CDF plots for data sets I and II.

7. Conclusion

In this article, the one-parameter DT distribution introduced by [11] was studied, taking into account the use of right-censored data. We used both classical and Bayesian methods to estimate the unknown parameter of the DT distribution. Furthermore, an algorithm to produce randomly right-censored data is also provided. An extensive simulation study is presented for the assessment of the various estimation procedures under censored data. Finally, the usefulness of the proposed model is illustrated with two examples considering right-censored real data sets. The study suggests that the proposed model can be used to analyse randomly right-censored data generated from various domains. Moreover, the DT distribution has the potential to attract more comprehensive applications in a variety of fields. A future plan of action regarding the current study might be an examination of other types of censored data using the proposed model.

Conflict of interest

The authors declare no conflicts of interest.

References

[1] Gilbert, J. P. (1962). Random censorship. The University of Chicago.

[2] Klein, J. P., & Moeschberger, M. L. (2003). Survival analysis: techniques for censored and truncated data (Vol. 1230). New York: Springer.

[3] Garg, R., Dube, M., & Krishna, H. (2020). Estimation of parameters and reliability characteristics in Lindley distribution using randomly censored data. Statistics, Optimization & Information Computing, 8(1), 80-97.

[4] El-Morshedy, M., Eliwa, M. S., & Tyagi, A. (2022). A discrete analogue of odd Weibull-G family of distributions: properties, classical and Bayesian estimation with applications to count data. Journal of Applied Statistics, 49(11), 2928-2952.

[5] Pandey, A., Singh, R. P., & Tyagi, A. (2022). An inferential study of discrete Burr-Hatke exponential distribution under complete and censored data. Reliability: Theory & Applications, 17(4 (71)), 109-122.

[6] Eliwa, M. S., Tyagi, A., Almohaimeed, B., & El-Morshedy, M. (2022). Modelling coronavirus and larvae Pyrausta data: A discrete binomial exponential II distribution with properties, classical and Bayesian estimation. Axioms, 11(11), 646.

[7] Krishna, H., & Goel, N. (2017). Maximum likelihood and Bayes estimation in randomly censored geometric distribution. Journal of Probability and Statistics, 2017.

[8] de Oliveira, R. P., de Oliveira Peres, M. V., Martinez, E. Z., & Achcar, J. A. (2019). Use of a discrete Sushila distribution in the analysis of right-censored lifetime data. Model Assisted Statistics and Applications, 14(3), 255-268.

[9] Achcar, J. A., Martinez, E. Z., de Freitas, B. C. L., & de Oliveira Peres, M. V. (2021). Classical and Bayesian inference approaches for the exponentiated discrete Weibull model with censored data and a cure fraction. Pakistan Journal of Statistics and Operation Research, 467-481.

[10] Singh, B., Singh, R. P., Nayal, A. S., & Tyagi, A. (2022). Discrete Inverted Nadarajah-Haghighi Distribution: Properties and Classical Estimation with Application to Complete and Censored data. Statistics, Optimization & Information Computing, 10(4), 1293-1313.

[11] Singh, B., Agiwal, V., Nayal, A. S., & Tyagi, A. (2022). A Discrete Analogue of Teissier Distribution: Properties and Classical Estimation with Application to Count Data. Reliability: Theory & Applications, 17(1 (67)), 340-355.

[12] Metropolis, N., & Ulam, S. (1949). The Monte Carlo method. Journal of the American Statistical Association, 44, 335-341.

[13] Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97-109.

[14] Chen, M.-H., & Shao, Q.-M. (1999). Monte Carlo estimation of Bayesian credible and HPD intervals. Journal of Computational and Graphical Statistics, 8(1), 69-92.

[15] Ramos, P. L., Guzman, D. C., Mota, A. L., Rodrigues, F. A., & Louzada, F. (2020). Sampling with censored data: a practical guide. arXiv preprint arXiv:2011.08417.

[16] Lawless, J. F. (2003). Statistical models and methods for lifetime data (Vol. 362). John Wiley & Sons.
