Scientific article: A Discrete Analogue of Teissier Distribution: Properties and Classical Estimation with Application to Count Data (field: Mathematics)

License: CC BY

Volume 17, March 2022

A Discrete Analogue of Teissier Distribution: Properties and Classical Estimation with Application to Count Data

Bhupendra Singh¹, Varun Agiwal², Amit Singh Nayal¹, Abhishek Tyagi¹*

¹Department of Statistics, Chaudhary Charan Singh University, Meerut-250004, India

²Indian Institute of Public Health, Hyderabad, Telangana, India

bhupendra.rana@gmail.com varunagiwal.stats@gmail.com

amitnayal009@gmail.com *abhishektyagi033@gmail.com

Abstract

This article presents a novel one-parameter discrete distribution, called the discrete Teissier distribution. Despite having a single parameter, the model offers a high degree of fitting flexibility: it can model equi-, over-, and under-dispersed, positively and negatively skewed, and increasing failure rate datasets. We explore its essential distributional features, such as the recurrence relation, moments, generating function, index of dispersion, coefficient of variation, entropy, survival and hazard rate functions, mean residual life and mean past life functions, stress-strength reliability, order statistics, and infinite divisibility. Classical point estimators are developed using the methods of maximum likelihood, moments, and least squares, and an interval estimator based on the Fisher information is also presented. Finally, the applicability of the proposed discrete model is demonstrated using two complete real datasets.

Keywords: COVID-19; Discrete Teissier distribution; Maximum Likelihood estimation; Method of moment estimation; Least square estimation

1. Introduction

In today's competitive world, the data generated in numerous sectors such as engineering, finance, and medical science are becoming increasingly complex, so we need distributions that are well suited to analyzing them. As a result, developing new probability distributions has become a major focus of statistical research during the last three decades. However, much of this research has focused on continuous probability distributions, whereas in many situations the data are inherently discrete or a discrete distribution is more appropriate for modelling. For example, in reliability engineering one may record the number of successful cycles before failure of a device operating in cycles, or the number of times a device is switched on/off; in survival analysis, the survival time of patients with diseases such as lung cancer, or the period from remission to relapse, may be recorded as a number of days or weeks; and the number of deaths or daily cases due to the COVID-19 pandemic is observed over a specified duration. Count phenomena also arise in many other practical situations, such as the number of earthquakes in a calendar year, the number of absences, the number of accidents, the number of species in ecology, and the number of insurance claims. Hence, it is reasonable to model such scenarios with appropriate discrete distributions.


Because conventional discrete distributions such as the Binomial, Poisson, Geometric, and Negative Binomial are insufficient to model many kinds of discrete data, [21] suggested a novel approach for building a discrete model from the survival function of a continuous model; [3] named this approach the survival discretization method. A major advantage of this technique is that the resulting discrete distribution preserves the functional form of the survival function of its continuous counterpart, so many reliability properties of the distribution remain unaltered. Under this methodology, for a continuous random variable (RV) $X$ with survival function (SF) $S_X(x) = P(X > x)$, the discretized version $Y$ is given by

$$P(Y = y) = P(y \le X < y + 1) = S_X(y) - S_X(y + 1), \quad y = 0, 1, 2, \ldots \tag{1}$$
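Eq. (1) applies to any continuous survival function. The minimal Python sketch below (the helper name `discretize` is ours, not the authors') illustrates it on an exponential SF. This is a standard textbook case that does not appear in the article: with rate λ, the discretization yields the geometric PMF $(1-q)q^{y}$ with $q = e^{-\lambda}$.

```python
import math
from typing import Callable


def discretize(sf: Callable[[float], float], y: int) -> float:
    """Survival discretization, Eq. (1): P(Y = y) = S_X(y) - S_X(y + 1)."""
    return sf(y) - sf(y + 1)


# Illustrative continuous SF (exponential with rate lam); not taken from the article.
lam = 0.7
exp_sf = lambda x: math.exp(-lam * x)

q = math.exp(-lam)
pmf = [discretize(exp_sf, y) for y in range(10)]
geometric = [(1 - q) * q**y for y in range(10)]  # known closed form, for comparison
```

Because Eq. (1) telescopes, the probabilities sum to $S_X(0) = 1$ whenever the continuous support starts at 0.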

Over the last two decades, this approach has received considerable attention. Using it, [21] gave a discretized version of the normal distribution, and [22] subsequently obtained the discrete Rayleigh distribution. A comprehensive review of the development of discrete distributions up to 2014 was provided by [3]. Since then, many significant discrete distributions have appeared in the literature; see, for example, [1], [11], [29], [28], [6], and the references cited therein. Most recently, [7] gave a discrete analogue of the odd Weibull-G family of distributions, discussed its classical and Bayesian estimation, and showed the applicability of the proposed family to count datasets.

In this paper, we propose a discrete analogue of the Teissier model [27], named the discrete Teissier (DT) distribution, using the survival discretization method. The Teissier distribution recently came to light again when [26] introduced the two-parameter exponentiated Teissier distribution. The main objectives of proposing the DT model can be summarized as follows:

• An important objective of the proposed study is to provide a discrete model that offers greater flexibility with fewer parameters, so that its distributional characteristics have manageable forms and real datasets are easy to analyze.

• The discrete data generated from many practical studies, such as mortality experiments, industrial experiments, etc., show constant or increasing failure rates, so we want to develop a discrete model with a monotonically increasing failure rate function.

• To produce a model that not only fits equi-, over-, and under-dispersed real data but is also capable of modelling positively skewed, negatively skewed, platykurtic, and leptokurtic datasets.

• To provide consistently better fits than other well-known discrete models in the existing statistical literature.

The rest of the article is organized as follows. Section 2 introduces the one-parameter DT distribution. Section 3 studies some important distributional and reliability characteristics. In Section 4, we estimate the parameter of the DT distribution by different classical methods. In Section 5, numerical illustrations using simulated and real datasets are presented. Finally, some concluding remarks are given in Section 6.

2. Discrete Teissier distribution

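If $X$ follows the univariate continuous Teissier distribution with parameter $\alpha$, then its probability density function (PDF) and SF can be written as

$$f(x; \alpha) = \alpha\left(e^{\alpha x} - 1\right)\exp\left(\alpha x - e^{\alpha x} + 1\right), \quad \alpha > 0,\ x > 0, \tag{2}$$

$$S(x) = \exp\left(\alpha x - e^{\alpha x} + 1\right), \quad \alpha > 0,\ x > 0. \tag{3}$$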

Using the survival discretization approach (1), the DT distribution is obtained as

$$p_y = P[Y = y] = S_X(y) - S_X(y + 1) = e\, e^{\alpha y}\left(\exp\left(-e^{\alpha y}\right) - \exp\left(\alpha - e^{\alpha(y+1)}\right)\right), \quad y = 0, 1, 2, \ldots,\ \alpha > 0. \tag{4}$$


For ease of notation, after the re-parametrization $\theta = \exp(\alpha)$, the probability mass function (PMF) in (4) can be written as

$$p_y = P[Y = y] = e\,\theta^{y}\left(\exp\left(-\theta^{y}\right) - \theta\exp\left(-\theta^{y+1}\right)\right), \quad y = 0, 1, 2, \ldots,\ \theta > 1. \tag{5}$$

The cumulative distribution function (CDF) corresponding to PMF (5) is

$$F(y) = 1 - \theta^{y+1}\exp\left(1 - \theta^{y+1}\right), \quad y = 0, 1, 2, \ldots,\ \theta > 1. \tag{6}$$
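As a numerical cross-check of (4)-(6), the following Python sketch (an illustration in place of the authors' R code, with helper names of our own) evaluates the PMF in three equivalent ways. The `dt_sf` helper computes $\theta^{y}\exp(1-\theta^{y})$ on the log scale. We assume that returning 0 once $y\log\theta > 700$ is acceptable, since the SF has already underflowed there.

```python
import math


def dt_sf(y, theta):
    """P(Y >= y) = theta**y * exp(1 - theta**y), i.e. S_X(y) with alpha = log(theta)."""
    u = y * math.log(theta)          # log(theta**y)
    if u > 700.0:                    # exp(u) would overflow; the SF is 0.0 in double precision anyway
        return 0.0
    return math.exp(1.0 + u - math.exp(u))


def dt_pmf(y, theta):
    """PMF (5), evaluated as S_X(y) - S_X(y + 1) to avoid underflow of exp(-theta**y)."""
    return dt_sf(y, theta) - dt_sf(y + 1, theta)


def dt_pmf_eq5(y, theta):
    """PMF (5) typed literally: e * theta^y * (exp(-theta^y) - theta * exp(-theta^(y+1)))."""
    return math.e * theta**y * (math.exp(-theta**y) - theta * math.exp(-theta**(y + 1)))


def dt_pmf_eq4(y, alpha):
    """PMF (4) in the original parametrization, alpha = log(theta) > 0."""
    return math.e * math.exp(alpha * y) * (
        math.exp(-math.exp(alpha * y)) - math.exp(alpha - math.exp(alpha * (y + 1)))
    )


def dt_cdf(y, theta):
    """CDF (6): F(y) = 1 - theta**(y+1) * exp(1 - theta**(y+1))."""
    return 1.0 - dt_sf(y + 1, theta)


theta = 2.0
print([round(dt_pmf(y, theta), 4) for y in range(4)])  # [0.2642, 0.5366, 0.1919, 0.0073]
```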

3. Statistical properties

3.1. The Shape of the Probability Mass Function

Figure 1 shows PMF plots of the DT distribution for different parameter values. As seen there, the PMF of the proposed distribution may be decreasing, bell-shaped, or unimodal (right-skewed). Furthermore, as $\theta$ increases, the asymmetry and peakedness of the PMF increase.

Figure 1: The shapes of the PMF of the DT distribution for various values of the parameter θ (panels: θ = 1.01, 1.1, 1.5, 1.75, 2, 4; horizontal axis: y).

The limiting behavior of the DT distribution at the boundary points of the parameter space is:

(i) $\lim_{y \to \infty} p_y = 0$; (ii) $\lim_{\theta \to 1} p_y = 0$; (iii) $\lim_{\theta \to \infty} p_y = 1$ for $y = 0$, and $\lim_{\theta \to \infty} p_y = 0$ otherwise.

3.2. Recurrence Relation for Probabilities

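The following recurrence relation can be used to calculate the probability masses for successive values of $y$:

$$P[Y = y + 1] = \theta\,\frac{\exp\left(-\theta^{y+1}\right) - \theta\exp\left(-\theta^{y+2}\right)}{\exp\left(-\theta^{y}\right) - \theta\exp\left(-\theta^{y+1}\right)}\,P[Y = y].$$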

It can be verified that $[P_Y(y)]^2 \ge P_Y(y + 1)\,P_Y(y - 1)$ for all $y$; hence, the DT distribution is log-concave. Log-concavity implies that the proposed distribution has a non-decreasing failure rate, is strongly unimodal, remains log-concave when truncated, and has moments of all orders. Consequently, the convolution of the proposed model with any unimodal discrete distribution is unimodal, and its convolution with any log-concave discrete distribution is log-concave ([13], [10]).
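The recurrence and the log-concavity claim can be checked numerically. In the Python sketch below (our own helper names), `dt_pmf_recursive` builds $p_0, p_1, \ldots$ from $p_0$ with the recurrence above. `is_log_concave` tests $\log p_{y+1} - 2\log p_y + \log p_{y-1} \le 0$ on the log scale, which avoids the underflow that products of tiny probabilities would cause. The truncation point `y_max` and the tolerance are our assumptions. A numerical check over a grid is only a sanity check, not a proof.

```python
import math


def dt_pmf_eq5(y, theta):
    """PMF (5) evaluated directly; adequate while exp(-theta**(y+1)) does not underflow."""
    return math.e * theta**y * (math.exp(-theta**y) - theta * math.exp(-theta**(y + 1)))


def dt_pmf_recursive(n_terms, theta):
    """p_0, ..., p_{n_terms-1} generated from p_0 with the recurrence of Section 3.2."""
    probs = [dt_pmf_eq5(0, theta)]
    for y in range(n_terms - 1):
        ratio = theta * (math.exp(-theta**(y + 1)) - theta * math.exp(-theta**(y + 2))) / (
            math.exp(-theta**y) - theta * math.exp(-theta**(y + 1))
        )
        probs.append(ratio * probs[-1])
    return probs


def dt_log_pmf(y, theta):
    """log p_y = 1 + y*log(theta) - theta^y + log(1 - theta*exp(theta^y - theta^(y+1)))."""
    t = theta**y
    return 1.0 + y * math.log(theta) - t + math.log1p(-theta * math.exp(t - theta * t))


def is_log_concave(theta, y_max=40, tol=1e-10):
    """Check 2*log p_y >= log p_{y+1} + log p_{y-1} for y = 1, ..., y_max - 1."""
    logs = [dt_log_pmf(y, theta) for y in range(y_max + 1)]
    return all(logs[y + 1] - 2.0 * logs[y] + logs[y - 1] <= tol for y in range(1, y_max))


print([is_log_concave(th) for th in (1.01, 1.5, 2.0, 4.0)])
```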


3.3. Moments and related concepts

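The moments of a probability distribution are important for describing properties such as its mean, variance, skewness, and kurtosis. The $r$-th raw moment of the DT distribution follows from $E(Y^r) = \sum_{y=1}^{\infty}\left(y^r - (y-1)^r\right)P(Y \ge y)$ with $P(Y \ge y) = \theta^{y}\exp(1 - \theta^{y})$, after expanding $\exp(-\theta^{y})$ in its power series:

$$\mu_r' = E(Y^r) = \sum_{y=0}^{\infty} y^r p_y = e\sum_{y=1}^{\infty}\sum_{k=0}^{\infty}(-1)^k\left(y^r - (y-1)^r\right)\frac{\theta^{(k+1)y}}{k!}. \tag{7}$$

Using Equation (7), the first four raw moments of the DT distribution are

$$\mu_1' = E(Y) = e\sum_{y=1}^{\infty}\sum_{k=0}^{\infty}(-1)^k\frac{\theta^{(k+1)y}}{k!}, \tag{8}$$

$$\mu_2' = E(Y^2) = e\sum_{y=1}^{\infty}\sum_{k=0}^{\infty}(-1)^k(2y - 1)\frac{\theta^{(k+1)y}}{k!}, \tag{9}$$

$$\mu_3' = E(Y^3) = e\sum_{y=1}^{\infty}\sum_{k=0}^{\infty}(-1)^k\left(3y^2 - 3y + 1\right)\frac{\theta^{(k+1)y}}{k!}, \tag{10}$$

$$\mu_4' = E(Y^4) = e\sum_{y=1}^{\infty}\sum_{k=0}^{\infty}(-1)^k\left(4y^3 - 6y^2 + 4y - 1\right)\frac{\theta^{(k+1)y}}{k!}. \tag{11}$$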

The variance of the DT distribution is

$$\mathrm{Var}(Y) = e\sum_{y=1}^{\infty}\sum_{k=0}^{\infty}(-1)^k(2y - 1)\frac{\theta^{(k+1)y}}{k!} - \left[e\sum_{y=1}^{\infty}\sum_{k=0}^{\infty}(-1)^k\frac{\theta^{(k+1)y}}{k!}\right]^2.$$

Using the raw moments in (8)-(11), the skewness (Sk) and the (excess) kurtosis (Kur) are obtained from

$$Sk = \frac{\mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3}{(\mathrm{Var}(Y))^{3/2}} \quad\text{and}\quad Kur = \frac{\mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4}{(\mathrm{Var}(Y))^2} - 3,$$

respectively.

The moment generating function (MGF) is an alternative representation of a probability distribution and an important tool for obtaining various distributional characteristics. For the proposed model, it can be obtained as

$$M_Y(t) = E[\exp(tY)] = \sum_{y=0}^{\infty}\exp(ty)\,p_y = 1 + e\,\theta\left(e^{t} - 1\right)\sum_{y=1}^{\infty}\exp\left(-\theta^{y}\right)\left(\theta e^{t}\right)^{y-1}.$$
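The series form of the MGF can be checked against the defining sum $\sum_y e^{ty}p_y$. The Python sketch below (function names are ours) evaluates both. It truncates each sum once the remaining terms underflow; the cut-off $y\log\theta > 50$ in `mgf_series` is our assumption, chosen because $\exp(-\theta^{y})$ is then far below double precision. A central difference at $t = 0$ recovers $E(Y)$.

```python
import math


def dt_sf(y, theta):
    """P(Y >= y) = theta**y * exp(1 - theta**y)."""
    u = y * math.log(theta)
    if u > 700.0:
        return 0.0
    return math.exp(1.0 + u - math.exp(u))


def mgf_direct(t, theta):
    """M_Y(t) = sum_y exp(t*y) p_y, summed until P(Y >= y) underflows to 0."""
    total, y = 0.0, 0
    while True:
        s_y = dt_sf(y, theta)
        if s_y == 0.0:
            return total
        total += math.exp(t * y) * (s_y - dt_sf(y + 1, theta))
        y += 1


def mgf_series(t, theta):
    """1 + e*theta*(e^t - 1) * sum_{y>=1} exp(-theta^y) * (theta*e^t)^(y-1)."""
    log_ratio = math.log(theta) + t              # log(theta * e^t)
    total, y = 0.0, 1
    while y * math.log(theta) <= 50.0:           # later terms are 0 in double precision
        total += math.exp(-theta**y + (y - 1) * log_ratio)
        y += 1
    return 1.0 + math.e * theta * math.expm1(t) * total


h = 1e-5
print((mgf_series(h, 2.0) - mgf_series(-h, 2.0)) / (2 * h))   # numerical E(Y) for theta = 2
```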

The index of dispersion (IOD) indicates whether data are equi-, under-, or over-dispersed: IOD > 1 indicates over-dispersion, IOD < 1 indicates under-dispersion, and IOD = 1 indicates equi-dispersion. For the proposed model, the IOD is

$$IOD = \frac{\mathrm{Var}(Y)}{E(Y)} = \frac{\sum_{y=1}^{\infty}\sum_{k=0}^{\infty}(-1)^k(2y - 1)\frac{\theta^{(k+1)y}}{k!} - e\left[\sum_{y=1}^{\infty}\sum_{k=0}^{\infty}(-1)^k\frac{\theta^{(k+1)y}}{k!}\right]^2}{\sum_{y=1}^{\infty}\sum_{k=0}^{\infty}(-1)^k\frac{\theta^{(k+1)y}}{k!}}.$$


The coefficient of variation (CV) is a relative measure of dispersion and is generally used to compare the variability of two independent samples; a higher CV indicates higher variability. For the DT distribution, the CV is

$$CV = \frac{(\mathrm{Var}(Y))^{1/2}}{E(Y)} = \frac{\left\{\sum_{y=1}^{\infty}\sum_{k=0}^{\infty}(-1)^k(2y - 1)\frac{\theta^{(k+1)y}}{k!} - e\left[\sum_{y=1}^{\infty}\sum_{k=0}^{\infty}(-1)^k\frac{\theta^{(k+1)y}}{k!}\right]^2\right\}^{1/2}}{e^{1/2}\sum_{y=1}^{\infty}\sum_{k=0}^{\infty}(-1)^k\frac{\theta^{(k+1)y}}{k!}}.$$

Closed-form expressions for the above quantities are not available, so we evaluate them numerically using the R software. Table 1 lists the mean, variance, skewness, kurtosis, IOD, and CV of the DT distribution for different values of the parameter. From this table, it can be concluded that:

• The mean of the DT distribution decreases as the value of θ increases.

• From the observed values of skewness, the DT distribution can be used to model both positively and negatively skewed data.

• The proposed model is appropriate for modelling leptokurtic and platykurtic datasets.

• The DT distribution can be used to analyze over-dispersed, under-dispersed, and equi-dispersed datasets.

• As the value of θ increases, the CV tends to increase.

Table 1: Descriptive measures at different values of the parameter θ.

| θ | Mean | Variance | Skewness | Kurtosis | IOD | CV |
|---|---|---|---|---|---|---|
| 1.001 | 98.8320 | 8.2885 | -20.5690 | 469.1044 | 0.0838 | 0.0291 |
| 1.005 | 94.5488 | 199.106 | -3.6319 | 13.3240 | 2.1058 | 0.1492 |
| 1.010 | 81.4812 | 575.2569 | -1.2399 | 0.4283 | 7.0599 | 0.2943 |
| 1.050 | 19.9959 | 81.0311 | 0.2090 | -0.4210 | 4.0523 | 0.4501 |
| 1.100 | 9.9920 | 21.2957 | 0.2081 | -0.4187 | 2.1312 | 0.4618 |
| 1.248 | 4.0301 | 4.0376 | 0.2035 | -0.4075 | 1.0018 | 0.4985 |
| 1.750 | 1.2866 | 0.6969 | 0.1956 | -0.4256 | 0.5416 | 0.6488 |
| 2.000 | 0.9422 | 0.4820 | 0.2086 | -0.5048 | 0.5115 | 0.7368 |
| 2.500 | 0.5906 | 0.3074 | 0.2159 | -0.9031 | 0.5205 | 0.9387 |
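The measures in Table 1 can be reproduced along the lines of the Python sketch below, offered as a stand-in for the authors' R code. The double series in (7) involves huge alternating terms $\theta^{(k+1)y}/k!$, so the sketch uses the equivalent single series $E(Y^r) = \sum_{y\ge1}(y^r - (y-1)^r)P(Y \ge y)$ instead. It stops once $P(Y \ge y)$ drops below an assumed tolerance. Because the continuous Teissier mean equals $1/\alpha$, the series needs more and more terms as θ approaches 1.

```python
import math


def dt_sf(y, theta):
    """P(Y >= y) = theta**y * exp(1 - theta**y)."""
    u = y * math.log(theta)
    if u > 700.0:
        return 0.0
    return math.exp(1.0 + u - math.exp(u))


def raw_moment(r, theta, tol=1e-17):
    """E(Y^r) = sum_{y>=1} (y^r - (y-1)^r) P(Y >= y): Eq. (7) before expanding exp(-theta^y)."""
    total, y = 0.0, 1
    while True:
        s = dt_sf(y, theta)
        total += (y**r - (y - 1) ** r) * s
        if s < tol:                     # P(Y >= y) decays doubly exponentially from here on
            return total
        y += 1


def describe(theta):
    """Mean, variance, skewness, excess kurtosis, IOD and CV from the raw moments."""
    m1, m2, m3, m4 = (raw_moment(r, theta) for r in (1, 2, 3, 4))
    var = m2 - m1**2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4
    return {
        "mean": m1,
        "variance": var,
        "skewness": mu3 / var**1.5,
        "excess_kurtosis": mu4 / var**2 - 3.0,   # compare with the Kurtosis column of Table 1
        "iod": var / m1,
        "cv": math.sqrt(var) / m1,
    }


for theta in (1.05, 1.1, 2.0, 2.5):
    print(theta, {k: round(v, 4) for k, v in describe(theta).items()})
```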

3.4. Entropy

Entropy is an important measure of complexity and uncertainty, used in many fields including problem identification in statistics, statistical inference, physics, econometrics, and pattern recognition in computer science. One important entropy measure is the Rényi entropy (RE) (see [20]). For the DT distribution, the RE of order $\rho$ ($\rho > 0$, $\rho \ne 1$) is

$$I_R(\rho) = \frac{1}{1-\rho}\log\sum_{y=0}^{\infty}p_y^{\rho} = \frac{1}{1-\rho}\left(\rho + \log\sum_{y=0}^{\infty}\theta^{\rho y}\left(\exp\left(-\theta^{y}\right) - \theta\exp\left(-\theta^{y+1}\right)\right)^{\rho}\right).$$

The well-known Shannon entropy (ShE) is obtained as the limiting case of the RE as $\rho \to 1$, namely $ShE = -E[\log P(Y; \theta)]$.
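The Python sketch below evaluates both entropies numerically (function names are ours). The PMF is truncated once $P(Y \ge y)$ falls below an assumed tolerance of $10^{-18}$. The test confirms two standard facts: the Rényi entropy approaches the Shannon entropy as $\rho \to 1$, and it is non-increasing in $\rho$.

```python
import math


def dt_sf(y, theta):
    """P(Y >= y) = theta**y * exp(1 - theta**y)."""
    u = y * math.log(theta)
    if u > 700.0:
        return 0.0
    return math.exp(1.0 + u - math.exp(u))


def dt_pmf_values(theta, tol=1e-18):
    """p_0, p_1, ... up to the point where P(Y >= y) < tol (the omitted tail is negligible)."""
    probs, y = [], 0
    while dt_sf(y, theta) >= tol:
        probs.append(dt_sf(y, theta) - dt_sf(y + 1, theta))
        y += 1
    return probs


def renyi_entropy(rho, theta):
    """I_R(rho) = log(sum_y p_y^rho) / (1 - rho), for rho > 0 and rho != 1."""
    if rho <= 0 or rho == 1:
        raise ValueError("rho must be positive and different from 1")
    return math.log(sum(p**rho for p in dt_pmf_values(theta) if p > 0)) / (1.0 - rho)


def shannon_entropy(theta):
    """ShE = -sum_y p_y log p_y, the rho -> 1 limit of the Renyi entropy."""
    return -sum(p * math.log(p) for p in dt_pmf_values(theta) if p > 0)


print(shannon_entropy(1.5), renyi_entropy(2.0, 1.5))
```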

3.5. Survival and hazard rate functions

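The SF and the hazard rate function (HRF) of the DT distribution are, respectively,

$$S(y; \theta) = P(Y \ge y) = \theta^{y}\exp\left(1 - \theta^{y}\right), \quad y = 0, 1, 2, \ldots,$$

$$H(y; \theta) = P(Y = y \mid Y \ge y) = 1 - \theta\exp\left(\theta^{y} - \theta^{y+1}\right), \quad y = 0, 1, 2, \ldots.$$

Figure 2 depicts various plots of the HRF of the proposed model, which show that the HRF of the DT distribution is increasing. Indeed, since $\theta^{y}$ increases with $y$, the term $\theta\exp\left(-\theta^{y}(\theta - 1)\right)$ decreases, so $H(y; \theta)$ increases. Also, $\lim_{y \to \infty} H(y; \theta) = \lim_{\theta \to \infty} H(y; \theta) = 1$. Moreover, the reversed hazard rate function (RHRF) and the second rate of failure (SRF) of the proposed model are

$$H^{*}(y; \theta) = P(Y = y \mid Y \le y) = \frac{p_y}{F(y)} = \frac{e\,\theta^{y}\left(\exp\left(-\theta^{y}\right) - \theta\exp\left(-\theta^{y+1}\right)\right)}{1 - \theta^{y+1}\exp\left(1 - \theta^{y+1}\right)}, \quad y = 0, 1, 2, \ldots,$$

and

$$H^{**}(y; \theta) = \log\left[\frac{S(y)}{S(y+1)}\right] = \theta^{y}(\theta - 1) - \log\theta, \quad y = 0, 1, 2, \ldots,$$

respectively.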

Figure 2: The shapes of the HRF of the DT distribution for various values of the parameter θ (panel values include θ = 1.01, 1.1, 1.4, 1.6, and 2; horizontal axis: y).
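The reliability functions above are straightforward to code. The Python sketch below (function names are ours) implements the closed forms of $H$, $H^{*}$ and $H^{**}$. It then checks them against their definitions through $S(y) = P(Y \ge y)$, $p_y$ and $F(y)$, and confirms that the HRF is non-decreasing on a grid of θ values.

```python
import math


def dt_sf(y, theta):
    """S(y) = P(Y >= y) = theta**y * exp(1 - theta**y)."""
    u = y * math.log(theta)
    if u > 700.0:
        return 0.0
    return math.exp(1.0 + u - math.exp(u))


def dt_pmf(y, theta):
    return dt_sf(y, theta) - dt_sf(y + 1, theta)


def dt_cdf(y, theta):
    return 1.0 - dt_sf(y + 1, theta)


def dt_hrf(y, theta):
    """H(y) = P(Y = y | Y >= y) = 1 - theta * exp(theta^y - theta^(y+1))."""
    t = theta**y
    return 1.0 - theta * math.exp(t - theta * t)


def dt_rhrf(y, theta):
    """H*(y) = P(Y = y | Y <= y) = p_y / F(y)."""
    return dt_pmf(y, theta) / dt_cdf(y, theta)


def dt_srf(y, theta):
    """H**(y) = log(S(y) / S(y + 1)) = theta^y (theta - 1) - log(theta)."""
    return theta**y * (theta - 1.0) - math.log(theta)


print([round(dt_hrf(y, 1.1), 4) for y in range(6)])
```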

3.6. Mean residual lifetime and mean past lifetime function

The mean residual life (MRL) function is used extensively in many areas, including reliability engineering, survival analysis, and biomedical research, since it describes the ageing mechanism. It is well known that the MRL function characterizes the distribution function $F$ uniquely, since it contains all the information about the model. In the discrete setup, the MRL, denoted by $m(i)$, is defined as

$$m(i) = E(Y - i \mid Y \ge i) = \frac{1}{S(i)}\sum_{j=i+1}^{\infty}S(j), \quad i = 0, 1, 2, \ldots$$

If $Y$ has the DT distribution with parameter $\theta$, then the MRL function of $Y$ is

$$m(i) = \theta^{-i}\exp\left(\theta^{i}\right)\sum_{j=i+1}^{\infty}\theta^{j}\exp\left(-\theta^{j}\right).$$

The expected inactivity time or mean past life (MPL) function, denoted by $m^{*}(i)$, measures the time elapsed since the failure of $Y$, given that the system failed sometime before $i$. It has applications in many areas, including reliability theory, survival analysis, actuarial research, and forensic science. In the discrete setup, the MPL function is defined as

$$m^{*}(i) = E(i - Y \mid Y < i) = \frac{1}{F(i-1)}\sum_{k=1}^{i}F(k-1), \quad i = 1, 2, \ldots$$

Substituting the CDF (6) into this expression gives the MPL of the proposed model.
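The following Python sketch (function names are ours) evaluates the MRL and MPL from the sums above. It also checks both against brute-force conditional expectations computed directly from the PMF. Two further identities serve as checks: $m(0) = E(Y)$ and $m^{*}(1) = 1$. The relative truncation tolerance in `dt_mrl` is our assumption.

```python
import math


def dt_sf(y, theta):
    """S(y) = P(Y >= y) = theta**y * exp(1 - theta**y)."""
    u = y * math.log(theta)
    if u > 700.0:
        return 0.0
    return math.exp(1.0 + u - math.exp(u))


def dt_pmf(y, theta):
    return dt_sf(y, theta) - dt_sf(y + 1, theta)


def dt_cdf(y, theta):
    return 1.0 - dt_sf(y + 1, theta)


def dt_mrl(i, theta, rel_tol=1e-16):
    """m(i) = E(Y - i | Y >= i) = sum_{j > i} S(j) / S(i)."""
    s_i = dt_sf(i, theta)
    if s_i == 0.0:
        raise ValueError("P(Y >= i) underflows to 0 for this i and theta")
    total, j = 0.0, i + 1
    while True:
        s_j = dt_sf(j, theta)
        total += s_j
        if s_j == 0.0 or s_j < rel_tol * s_i:
            return total / s_i
        j += 1


def dt_mpl(i, theta):
    """m*(i) = E(i - Y | Y < i) = sum_{k=1}^{i} F(k - 1) / F(i - 1), for i >= 1."""
    return sum(dt_cdf(k - 1, theta) for k in range(1, i + 1)) / dt_cdf(i - 1, theta)


print([round(dt_mrl(i, 1.1), 4) for i in range(5)], [round(dt_mpl(i, 1.1), 4) for i in range(1, 6)])
```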

3.7. Stress-strength analysis

Stress-strength ($S$-$S^{*}$) analysis is widely applicable in areas including engineering, medical science, and psychology. A component fails when the stress $S$ exceeds its strength $S^{*}$. Suppose that $S$ and $S^{*}$ take non-negative integer values; then the $S$-$S^{*}$ reliability $R$ can be computed as

$$R = P[Y_S < Y_{S^{*}}] = \sum_{y=0}^{\infty}p_{Y_S}(y)\,P(Y_{S^{*}} > y).$$

If $Y_S \sim DT(\theta_1)$ and $Y_{S^{*}} \sim DT(\theta_2)$, then $R$ can be expressed as

$$R = \theta_2 e^{2}\sum_{y=0}^{\infty}(\theta_1\theta_2)^{y}\exp\left(-\theta_2^{y+1}\right)\left(\exp\left(-\theta_1^{y}\right) - \theta_1\exp\left(-\theta_1^{y+1}\right)\right). \tag{12}$$

Since an explicit expression for $R$ is difficult to obtain here, we evaluate it numerically using the R software. Table 2 lists the computed values of $R$ for various parameter combinations. From Table 2, we infer that for a fixed value of $\theta_2$, the reliability increases as $\theta_1$ increases, whereas for a fixed value of $\theta_1$, $R \to 0$ as $\theta_2 \to \infty$.

Table 2: The numerical values of R for fixed values of θ1 (rows) and θ2 (columns).

| θ1 | θ2 = 1.001 | θ2 = 1.010 | θ2 = 1.050 | θ2 = 1.250 | θ2 = 1.500 |
|---|---|---|---|---|---|
| 1.001 | 0.02093 | 0.00617 | 0.00024 | 0.00001 | 0.00000 |
| 1.010 | 0.98078 | 0.49681 | 0.02554 | 0.00100 | 0.00026 |
| 1.050 | 0.99974 | 0.97225 | 0.48476 | 0.02531 | 0.00636 |
| 1.250 | 0.99999 | 0.99855 | 0.96300 | 0.43096 | 0.13811 |
| 1.500 | 0.99999 | 0.99950 | 0.98739 | 0.73634 | 0.37738 |
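The Python sketch below evaluates $R$ (function names are ours). It uses the summand $p_{Y_S}(y)\,P(Y_{S^{*}} \ge y+1)$, which is algebraically the summand of (12), and stops once the stress SF underflows. The test compares the result with a brute-force double sum over both PMFs. It also checks that, for equal parameters, ties aside, $R = (1 - \sum_y p_y^2)/2$ by symmetry.

```python
import math


def dt_sf(y, theta):
    """P(Y >= y) = theta**y * exp(1 - theta**y)."""
    u = y * math.log(theta)
    if u > 700.0:
        return 0.0
    return math.exp(1.0 + u - math.exp(u))


def dt_pmf(y, theta):
    return dt_sf(y, theta) - dt_sf(y + 1, theta)


def reliability(theta1, theta2):
    """R = P(Y_S < Y_S*) with Y_S ~ DT(theta1) (stress) and Y_S* ~ DT(theta2) (strength)."""
    total, y = 0.0, 0
    while dt_sf(y, theta1) > 0.0:
        total += dt_pmf(y, theta1) * dt_sf(y + 1, theta2)   # p_S(y) * P(Y_S* > y)
        y += 1
    return total


def reliability_bruteforce(theta1, theta2, y_max=200):
    """Direct double sum over x < z of P(Y_S = x) P(Y_S* = z), for cross-checking."""
    p1 = [dt_pmf(y, theta1) for y in range(y_max)]
    p2 = [dt_pmf(y, theta2) for y in range(y_max)]
    return sum(p1[x] * p2[z] for x in range(y_max) for z in range(x + 1, y_max))


print(round(reliability(1.25, 1.5), 5), round(reliability(1.5, 1.25), 5))
```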

3.8. Order statistics

Order statistics play a vital role in constructing tolerance intervals and in drawing inferences on population parameters, especially in survival analysis. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from the DT distribution, and let $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$ denote the corresponding order statistics. Then the CDF of the $r$-th order statistic, say $W = Y_{(r)}$, is given by

$$F_r(w) = \sum_{i=r}^{n}\binom{n}{i}F^{i}(w)\left[1 - F(w)\right]^{n-i} = \sum_{i=r}^{n}\sum_{k=0}^{n-i}(-1)^{k}\binom{n}{i}\binom{n-i}{k}\left(1 - \theta^{w+1}\exp\left(1 - \theta^{w+1}\right)\right)^{i+k}. \tag{13}$$

The corresponding PMF of the $r$-th order statistic is

$$f_r(w) = F_r(w) - F_r(w-1) = \sum_{i=r}^{n}\sum_{k=0}^{n-i}(-1)^{k}\binom{n}{i}\binom{n-i}{k}\left\{\left(1 - \theta^{w+1}\exp\left(1 - \theta^{w+1}\right)\right)^{i+k} - \left(1 - \theta^{w}\exp\left(1 - \theta^{w}\right)\right)^{i+k}\right\}. \tag{14}$$

In particular, setting $r = 1$ and $r = n$ in Equation (14) gives the PMFs of the minimum $Y_{(1)} = \min(Y_1, \ldots, Y_n)$ and the maximum $Y_{(n)} = \max(Y_1, \ldots, Y_n)$, respectively.
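The Python sketch below implements (13) and (14) (function names are ours). It evaluates the order-statistic CDF both in its binomial form and as the double sum of (13). The test checks the two forms against each other, against the known minimum and maximum CDFs $1 - (1-F)^n$ and $F^n$, and verifies that the PMF (14) sums to one.

```python
import math
from math import comb


def dt_sf(y, theta):
    """P(Y >= y) = theta**y * exp(1 - theta**y)."""
    u = y * math.log(theta)
    if u > 700.0:
        return 0.0
    return math.exp(1.0 + u - math.exp(u))


def dt_cdf(y, theta):
    """CDF (6); note dt_cdf(-1, theta) == 0."""
    return 1.0 - dt_sf(y + 1, theta)


def os_cdf(w, r, n, theta):
    """F_r(w) = sum_{i=r}^{n} C(n, i) F(w)^i (1 - F(w))^(n - i)."""
    F = dt_cdf(w, theta)
    return sum(comb(n, i) * F**i * (1.0 - F) ** (n - i) for i in range(r, n + 1))


def os_cdf_eq13(w, r, n, theta):
    """Double-sum form of Eq. (13)."""
    F = dt_cdf(w, theta)
    return sum((-1) ** k * comb(n, i) * comb(n - i, k) * F ** (i + k)
               for i in range(r, n + 1) for k in range(n - i + 1))


def os_pmf(w, r, n, theta):
    """Eq. (14): f_r(w) = F_r(w) - F_r(w - 1)."""
    return os_cdf(w, r, n, theta) - os_cdf(w - 1, r, n, theta)


print([round(os_pmf(w, 1, 5, 1.5), 4) for w in range(5)])   # PMF of the sample minimum, n = 5
```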

3.9. Infinite divisibility

In this section, we examine the infinite divisibility of the DT distribution. This property plays an important role in probability theory, modelling problems, and waiting-time distributions. A necessary condition for a distribution on the non-negative integers with PMF $p_x$ (and $p_0 > 0$) to be infinitely divisible is $p_x \le e^{-1}$ for all $x = 1, 2, \ldots$ [24]. For the DT distribution with $\theta = 2$, we find $p_1 = 0.5366$, which exceeds $e^{-1} \approx 0.3679$. Hence, the DT distribution is not infinitely divisible in general. Furthermore, since the classes of self-decomposable and stable distributions, in their discrete versions, are subclasses of the infinitely divisible distributions, the DT distribution is in general neither self-decomposable nor stable.
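The check used above is easy to automate. The Python sketch below (the function name is ours) reports whether some $p_x$ with $x \ge 1$ exceeds $e^{-1}$. The search limit `y_max` is our assumption. A `True` result rules out infinite divisibility. A `False` result is inconclusive, because the condition is only necessary.

```python
import math


def dt_sf(y, theta):
    """P(Y >= y) = theta**y * exp(1 - theta**y)."""
    u = y * math.log(theta)
    if u > 700.0:
        return 0.0
    return math.exp(1.0 + u - math.exp(u))


def dt_pmf(y, theta):
    return dt_sf(y, theta) - dt_sf(y + 1, theta)


def violates_id_condition(theta, y_max=1000):
    """True if p_x > 1/e for some x >= 1, so the necessary condition for infinite divisibility fails."""
    return any(dt_pmf(x, theta) > math.exp(-1.0) for x in range(1, y_max))


for theta in (1.05, 1.5, 2.0, 3.0):
    print(theta, violates_id_condition(theta))
```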

4. Classical Estimation

In this section, we address the estimation problem using well-known procedures: the method of maximum likelihood, the method of moments, and ordinary and weighted least squares estimation. For maximum likelihood estimation, we also derive the asymptotic distribution of the ML estimator and construct an asymptotic confidence interval (ACI) for the unknown parameter.

4.1. Method of maximum likelihood

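Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size $n$ with sample mean $\bar{y}$. The likelihood function (LF) of the DT distribution can then be written as

$$L(\mathbf{y}; \theta) = e^{n}\theta^{n\bar{y}}\prod_{i=1}^{n}\left(\exp\left(-\theta^{y_i}\right) - \theta\exp\left(-\theta^{y_i+1}\right)\right). \tag{15}$$

The log-likelihood (LL) function is

$$\log L(\mathbf{y}; \theta) = n + n\bar{y}\log\theta + \sum_{i=1}^{n}\log\left(\exp\left(-\theta^{y_i}\right) - \theta\exp\left(-\theta^{y_i+1}\right)\right). \tag{16}$$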

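Taking the partial derivative of the LL function with respect to the parameter, we obtain the following normal equation:

$$\frac{\partial\log L}{\partial\theta} = \frac{n\bar{y}}{\theta} + \sum_{i=1}^{n}\frac{E_1E_2 - y_i\theta^{y_i-1}}{1 - \theta E_1} = 0, \tag{17}$$

where $E_1 = \exp\left(\theta^{y_i} - \theta^{y_i+1}\right)$ and $E_2 = (y_i + 1)\theta^{y_i+1} - 1$.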

The maximum likelihood (ML) estimator of $\theta$ is obtained by solving Equation (17). This equation has no analytical solution, so we use an iterative approach such as the Newton-Raphson (NR) method to compute the estimate.

The ML estimator $\hat{\theta}$ of $\theta$ is consistent and asymptotically Gaussian, with $\sqrt{n}(\hat{\theta} - \theta) \to N\left(0, I^{-1}(\theta)\right)$, where $I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(Y; \theta)\right]$. Therefore, the variance of $\hat{\theta}$ can be approximated by $V(\hat{\theta}) \approx J^{-1}(\hat{\theta})$, where $J(\theta) = -\frac{\partial^2\log L}{\partial\theta^2}$. The second-order partial derivative of the LL function is

$$\frac{\partial^2\log L}{\partial\theta^2} = -\frac{n\bar{y}}{\theta^2} + \sum_{i=1}^{n}\frac{(1 - \theta E_1)\left(-y_i(y_i - 1)\theta^{y_i-1} + E_1E_2E_3 + (y_i + 1)^2\theta^{y_i+1}E_1\right) + \left(\theta E_1E_2 - y_i\theta^{y_i}\right)(E_3 + 1)E_1}{\theta(1 - \theta E_1)^2},$$

where $E_3 = \theta^{y_i}\left(y_i - (y_i + 1)\theta\right)$. Hence, the $100(1 - \gamma)\%$ ACI for the parameter $\theta$ is $\hat{\theta} \pm z_{\gamma/2}\sqrt{V(\hat{\theta})}$, where $z_{\gamma/2}$ is the upper $\gamma/2$ quantile of the standard Gaussian distribution.
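The following is a minimal Python sketch of the NR procedure and the ACI, using the score (17) and the second derivative above with the same $E_1, E_2, E_3$. It is not the authors' R implementation. Two choices are ours: the start value $\exp(1/(\bar{y} + 1/2))$, a heuristic based on our approximation $E(Y) \approx 1/\log\theta - 1/2$, and the small fallback ascent step when the second derivative is not negative. In `loglik`, the log term of (16) is rewritten as $-\theta^{y_i} + \log(1 - \theta E_1)$ to avoid underflow.

```python
import math
from statistics import NormalDist


def _parts(y, theta):
    """theta^y, E1, E2, E3 of Section 4.1 for a single observation y."""
    t = theta**y
    e1 = math.exp(t - theta * t)              # E1 = exp(theta^y - theta^(y+1))
    e2 = (y + 1) * theta * t - 1.0            # E2 = (y+1) theta^(y+1) - 1
    e3 = t * (y - (y + 1) * theta)            # E3 = theta^y (y - (y+1) theta)
    return t, e1, e2, e3


def loglik(theta, ys):
    """Eq. (16), with log(exp(-theta^y) - theta*exp(-theta^(y+1))) = -theta^y + log(1 - theta*E1)."""
    total = len(ys) + sum(ys) * math.log(theta)
    for y in ys:
        t, e1, _, _ = _parts(y, theta)
        total += -t + math.log1p(-theta * e1)
    return total


def score(theta, ys):
    """Eq. (17)."""
    total = sum(ys) / theta
    for y in ys:
        t, e1, e2, _ = _parts(y, theta)
        total += (e1 * e2 - y * t / theta) / (1.0 - theta * e1)
    return total


def hessian(theta, ys):
    """Second derivative of the LL function (Section 4.1)."""
    total = -sum(ys) / theta**2
    for y in ys:
        t, e1, e2, e3 = _parts(y, theta)
        d = 1.0 - theta * e1
        first = d * (-y * (y - 1) * t / theta + e1 * e2 * e3 + (y + 1) ** 2 * theta * t * e1)
        second = (theta * e1 * e2 - y * t) * (e3 + 1.0) * e1
        total += (first + second) / (theta * d**2)
    return total


def mle_newton(ys, tol=1e-10, max_iter=100):
    """Newton-Raphson for Eq. (17); the start value is a heuristic, not from the article."""
    theta = math.exp(1.0 / (sum(ys) / len(ys) + 0.5))
    for _ in range(max_iter):
        g, h = score(theta, ys), hessian(theta, ys)
        step = g / h if h < 0 else -0.05 * math.copysign(1.0, g)   # fallback: small ascent step
        new = theta - step
        while new <= 1.0:                     # stay inside the parameter space theta > 1
            step /= 2.0
            new = theta - step
        if abs(new - theta) < tol:
            return new
        theta = new
    return theta


def wald_interval(ys, level=0.95):
    """ACI theta_hat -/+ z * sqrt(V), with V = J^{-1}(theta_hat) = -1 / hessian(theta_hat)."""
    th = mle_newton(ys)
    se = math.sqrt(-1.0 / hessian(th, ys))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return th, se, (th - z * se, th + z * se)


ys = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 1, 2, 0, 3, 2]    # illustrative counts, not from the article
print(wald_interval(ys))
```

The test cross-checks the analytic score and second derivative against finite differences of `loglik`, which guards against transcription slips in the long formula.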

4.2. Method of moment estimation

In this method, we equate the population moment(s) to the corresponding sample moment(s) and solve the resulting equation(s) for the unknown parameter(s). In our case, the equation is

$$\bar{y} = \sum_{y=1}^{\infty}\theta^{y}\exp\left(1 - \theta^{y}\right), \tag{18}$$

where $\bar{y}$ is the mean of the random sample $y_1, y_2, \ldots, y_n$ drawn from the DT distribution (5). The method of moments (MOM) estimator $\hat{\theta}_{MOM}$ is obtained by solving Equation (18) for $\theta$. Since (18) does not yield $\hat{\theta}_{MOM}$ in explicit form, numerical methods are used to compute it.
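The right-hand side of (18) decreases in θ: each term $u\,e^{1-u}$ with $u = \theta^{y} \ge 1$ decreases as $u$ grows. A bracketing root-finder is therefore sufficient. The Python sketch below uses bisection on $\alpha = \log\theta$. The initial bracket built around $1/(\bar{y} + 1/2)$ is our heuristic, and the bracket is widened until it contains the root.

```python
import math


def dt_sf(y, theta):
    """P(Y >= y) = theta**y * exp(1 - theta**y)."""
    u = y * math.log(theta)
    if u > 700.0:
        return 0.0
    return math.exp(1.0 + u - math.exp(u))


def dt_mean(theta, tol=1e-17):
    """Right-hand side of Eq. (18): sum_{y>=1} theta^y exp(1 - theta^y)."""
    total, y = 0.0, 1
    while True:
        s = dt_sf(y, theta)
        total += s
        if s < tol:
            return total
        y += 1


def mom_estimate(ybar, tol=1e-14):
    """Solve Eq. (18) for theta by bisection on alpha = log(theta)."""
    if ybar <= 0:
        raise ValueError("the sample mean must be positive")
    a0 = 1.0 / (ybar + 0.5)                  # heuristic guess from E(Y) ~ 1/alpha - 1/2
    lo, hi = 0.5 * a0, 2.0 * a0
    while dt_mean(math.exp(lo)) < ybar:      # widen until the root is bracketed
        lo *= 0.5
    while dt_mean(math.exp(hi)) > ybar:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if dt_mean(math.exp(mid)) > ybar:
            lo = mid                         # mean still too large: alpha must increase
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


print(mom_estimate(2.0))
```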

4.3. Method of least squares estimation

Here, we present regression-based methods for estimating the model parameter, namely the ordinary least squares (OLS) and weighted least squares (WLS) estimators, which were first suggested by [25]. The OLS and WLS estimators are based on the discrepancy between the non-parametric (empirical) and parametric distribution functions.

These methods are widely used to estimate the parameters of continuous models. Some authors use them to estimate the unknowns of a discrete model by treating the non-parametric CDF as if it were continuous (see [23]). Because discrete data contain tied observations, a non-parametric CDF that accounts for ties is more suitable, and we therefore use such a form of the non-parametric CDF. The methods can be described as follows.

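Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from $F(\cdot)$ in Equation (6), and let $Y_{(1)} \le Y_{(2)} \le \cdots \le Y_{(n)}$ be the corresponding ordered values. Suppose these contain $r$ tie-runs, where the $j$-th tie-run has length $z_j$, $j = 1, 2, \ldots, r$, and $n_j$ denotes the number of observations greater than or equal to the value of the $j$-th tie-run. Then the mean and variance of $F(Y_{(i)})$ are, respectively,

$$E\left[F(Y_{(i)})\right] = 1 - \prod_{j=1}^{i}\frac{n_j - z_j}{n_j} \quad\text{and}\quad V\left[F(Y_{(i)})\right] = \left(1 - E\left[F(Y_{(i)})\right]\right)^2\sum_{j=1}^{i}\frac{z_j}{n_j(n_j - z_j)},$$

where the product and the sum run over the tie-runs up to and including the one containing $Y_{(i)}$. The expression for $V[F(Y_{(i)})]$ is known as Greenwood's formula. The OLS and WLS estimators of the unknown parameter are obtained by minimizing

$$W_1(\theta) = \sum_{i=1}^{n}\left(F(Y_{(i)}) - E\left[F(Y_{(i)})\right]\right)^2 \quad\text{and}\quad W_2(\theta) = \sum_{i=1}^{n}V\left[F(Y_{(i)})\right]^{-1}\left(F(Y_{(i)}) - E\left[F(Y_{(i)})\right]\right)^2,$$

respectively, with respect to the unknown parameter of the model.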

Thus, in our case, the OLS estimator $\hat{\theta}_{OLS}$ of the unknown parameter $\theta$ is obtained by minimizing

$$W_1(\theta) = \sum_{i=1}^{n}\left(\theta^{y_{(i)}+1}\exp\left(1 - \theta^{y_{(i)}+1}\right) - \prod_{j=1}^{i}\frac{n_j - z_j}{n_j}\right)^2$$

with respect to $\theta$. Equivalently, $\hat{\theta}_{OLS}$ can be determined by solving

$$\frac{\partial W_1(\theta)}{\partial\theta} = 2\sum_{i=1}^{n}\left(\theta^{y_{(i)}+1}\exp\left(1 - \theta^{y_{(i)}+1}\right) - \prod_{j=1}^{i}\frac{n_j - z_j}{n_j}\right)\xi(y_{(i)}; \theta) = 0,$$

where $\xi(y; \theta) = (y + 1)\theta^{y}\left(1 - \theta^{y+1}\right)\exp\left(1 - \theta^{y+1}\right)$. The WLS estimator $\hat{\theta}_{WLS}$ of $\theta$ is obtained by minimizing

$$W_2(\theta) = \sum_{i=1}^{n}V\left[F(Y_{(i)})\right]^{-1}\left(\theta^{y_{(i)}+1}\exp\left(1 - \theta^{y_{(i)}+1}\right) - \prod_{j=1}^{i}\frac{n_j - z_j}{n_j}\right)^2,$$

or, equivalently, by solving

$$\sum_{i=1}^{n}V\left[F(Y_{(i)})\right]^{-1}\left(\theta^{y_{(i)}+1}\exp\left(1 - \theta^{y_{(i)}+1}\right) - \prod_{j=1}^{i}\frac{n_j - z_j}{n_j}\right)\xi(y_{(i)}; \theta) = 0.$$
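The Python sketch below minimizes $W_1$ and $W_2$ directly rather than solving the estimating equations. It uses a golden-section search over $\alpha = \log\theta$ on an assumed range $[10^{-3}, 5]$, which presumes the objective is unimodal there. `km_terms` computes the tie-aware $E[F(Y_{(i)})]$ and the Greenwood variance for each ordered observation. For the last tie-run, Greenwood's sum is undefined ($n_j = z_j$). The sketch therefore drops that term from $W_2$; this is our assumption, not something the article specifies.

```python
import math
from collections import Counter


def dt_sf(y, theta):
    """P(Y >= y) = theta**y * exp(1 - theta**y), so F(y) = 1 - dt_sf(y + 1, theta)."""
    u = y * math.log(theta)
    if u > 700.0:
        return 0.0
    return math.exp(1.0 + u - math.exp(u))


def km_terms(ys):
    """(y, E[F(Y_(i))], V[F(Y_(i))]) for each ordered observation, using tie-runs and Greenwood."""
    at_risk, prod, gw_sum, out = len(ys), 1.0, 0.0, []
    for value, z in sorted(Counter(ys).items()):
        prod *= (at_risk - z) / at_risk        # running product of (n_j - z_j) / n_j
        if at_risk > z:
            gw_sum += z / (at_risk * (at_risk - z))
            var = prod**2 * gw_sum
        else:
            var = math.inf                     # last tie-run: Greenwood's sum is undefined
        out.extend([(value, 1.0 - prod, var)] * z)
        at_risk -= z
    return out


def ls_objective(alpha, obs, weighted):
    """W1 (weighted=False) or W2 (weighted=True) evaluated at theta = exp(alpha)."""
    theta, total = math.exp(alpha), 0.0
    for y, e_f, var in obs:
        if weighted and math.isinf(var):
            continue                           # assumption: drop the term with undefined weight
        w = 1.0 / var if weighted else 1.0
        total += w * (dt_sf(y + 1, theta) - (1.0 - e_f)) ** 2
    return total


def golden_min(f, a, b, tol=1e-10):
    """Golden-section search for a minimizer of f on [a, b] (assumes unimodality)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def ls_estimate(ys, weighted=False, alpha_range=(1e-3, 5.0)):
    """theta_OLS (weighted=False) or theta_WLS (weighted=True)."""
    obs = km_terms(ys)
    return math.exp(golden_min(lambda a: ls_objective(a, obs, weighted), *alpha_range))


ys = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 1, 2, 0, 3, 2]    # illustrative counts, not from the article
print(ls_estimate(ys), ls_estimate(ys, weighted=True))
```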

5. Numerical illustration

Here, we present the numerical illustrations of the proposed model based on the empirical and real datasets.

5.1. Using simulated data

In this sub-section, we assess the performance of different estimation techniques for the unknown parameter of the proposed model. The assessment consists of the following steps:

1. Generate 2000 samples of sizes n = 20, 25, ..., 150 from the DT distribution with θ = 1.05, 1.5, and 3.0. To generate an RV Y from the DT distribution, we use the general approach of first drawing a pseudo-random value X from the continuous Teissier distribution and then discretizing it as Y = ⌊X⌋. X can be generated through the quantile function

$$Q(u) = \frac{1}{\alpha}\log\left[-W_{-1}\left(\frac{u - 1}{e}\right)\right], \quad 0 < u < 1,$$

where $\theta = \exp(\alpha)$ and $W_{-1}$ denotes the lower branch of the Lambert W function, whose values can be obtained with the function lambertWm1 in the R package lamW.

2. Compute the ML, MOM, OLS, and WLS estimates for the 2000 samples, say $\hat{\theta}_j$, $j = 1, 2, \ldots, 2000$, for each method. Also compute the 95% ACIs for the generated samples.

3. Compute the mean-squared error (MSE) and average absolute bias (AB) of all point estimates, and the average width (AW) and coverage probability (CP) of the ACIs, where

$$MSE = \frac{1}{2000}\sum_{j=1}^{2000}\left(\hat{\theta}_j - \theta\right)^2,\quad AB = \frac{1}{2000}\sum_{j=1}^{2000}\left|\hat{\theta}_j - \theta\right|,\quad AW = \frac{1}{2000}\sum_{j=1}^{2000}\left(UCL_j - LCL_j\right),$$

and $CP = \frac{1}{2000}\sum_{j=1}^{2000}I\left(LCL_j \le \theta \le UCL_j\right)$. Here, $UCL_j$ and $LCL_j$ denote the upper and lower confidence limits for the $j$-th sample, respectively. $I(\cdot)$ is the indicator function, which takes the value 1 if $LCL_j \le \theta \le UCL_j$ and 0 otherwise.

4. The empirical results are shown in Figures 3-4.
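Steps 1 and 3 can be sketched in Python as follows. This is an illustration, not the authors' R code. Instead of calling a Lambert W routine, `teissier_quantile` solves the equivalent equation $w - \log w = 1 - \log(1 - u)$ for $w = e^{\alpha x} \ge 1$ with Newton's method. This is our substitution, chosen to keep the sketch dependency-free. Starting at $w = 2c$ is safe because the function is increasing and convex on $w > 1$ and positive at $2c$. `summarize` computes MSE, AB, AW and CP exactly as defined in step 3.

```python
import math
import random


def teissier_quantile(u, alpha, tol=1e-14, max_iter=200):
    """Continuous Teissier quantile: solve w - log(w) = 1 - log(1 - u) with w = exp(alpha * x) >= 1.

    Newton's method stands in for the Lambert W_{-1} expression given in the text.
    """
    if u <= 0.0:
        return 0.0
    c = 1.0 - math.log1p(-u)
    w = 2.0 * c                    # right of the root: h(w) = w - log(w) - c is convex and h(2c) > 0
    for _ in range(max_iter):
        w_new = max(1.0, w - (w - math.log(w) - c) / (1.0 - 1.0 / w))
        done = abs(w_new - w) <= tol * w or w_new == 1.0
        w = w_new
        if done:
            break
    return math.log(w) / alpha


def draw_dt(n, theta, rng):
    """Draw X from the continuous Teissier law and discretize it as Y = floor(X), following Eq. (1)."""
    alpha = math.log(theta)
    return [math.floor(teissier_quantile(rng.random(), alpha)) for _ in range(n)]


def summarize(estimates, intervals, theta):
    """MSE, AB, AW and CP as defined in step 3 (for any number of replications)."""
    m, k = len(estimates), len(intervals)
    mse = sum((e - theta) ** 2 for e in estimates) / m
    ab = sum(abs(e - theta) for e in estimates) / m
    aw = sum(ucl - lcl for lcl, ucl in intervals) / k
    cp = sum(1 for lcl, ucl in intervals if lcl <= theta <= ucl) / k
    return mse, ab, aw, cp


rng = random.Random(2022)
sample = draw_dt(20, 1.5, rng)
print(sample)
```

A full replication of the study would pair `draw_dt` with the estimators of Section 4 in a loop over sample sizes, which we omit here.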

From Figures 3-4, the following key conclusions can be drawn:

• The MSEs decrease toward zero as n increases, which indicates the consistency of the estimators. The ABs also decrease toward zero as n becomes large.

• All estimation procedures perform satisfactorily for different values of n and θ. However, the ML estimator performs best with respect to MSE. The MOM estimator is the second choice, since its MSEs are smaller than those of the OLS and WLS estimators.

• The AW of the ACIs decreases as the sample size n increases.

• The CPs of the ACIs remain close to the nominal level, which supports the validity of the simulation results.

• All estimation procedures perform better for small values of the parameter θ than for large values. Also, as n becomes large, the estimation methods produce more or less similar results with respect to MSE and AB.

Figure 3: The MSEs and ABs of different estimators for (i) θ = 1.05, (ii) θ = 1.50, (iii) θ = 3.0 (horizontal axis: sample size n).

5.2. The real data application

In this part, we use two real datasets to demonstrate the relevance and superiority of the DT distribution. The datasets come from two distinct areas: the first represents daily COVID-19 cases in India, and the second consists of the survival times of a group of laboratory mice. The fitting capability of the proposed model is compared with that of various well-known conventional and recently developed models, which are listed in Table 3.


Figure 4: AW and CP for (i) θ = 1.05, (ii) θ = 1.50, (iii) θ = 3.0 (horizontal axis: sample size n).

Table 3: The competing models.

| Model | Parameter(s) | Abbreviation | Reference |
|---|---|---|---|
| Geometric | θ | Geo | - |
| Discrete Lindley | α | DsLi | [9] |
| Discrete Rayleigh | θ | DR | [22] |
| Discrete Poisson Lindley | α | DPL | [18] |
| Discrete Burr | (α, β) | DBr | [12] |
| Discrete Pareto | θ | DPa | [12] |
| Two-Parameter Discrete Half Logistic | (α, β) | DHLo-II | [8] |
| Discrete Perks | (α, β) | DP | [28] |
| Discrete Weibull | (q, β) | DW | [19] |
| Discrete Logistic | (α, β) | DLOG | [4] |
| A flexible one-parameter discrete model | α | DsFx-I | [5] |
| Poisson Bilal distribution | θ | PB | [2] |

For comparison purposes, all fitted models are estimated by ML. The models are compared using -LL, the Akaike information criterion (AIC), the corrected Akaike information criterion (CAIC), the Bayesian information criterion (BIC), and the Kolmogorov-Smirnov (K-S) statistic, all computed with the open-source R software. A refined approach to computing the K-S statistic is described in [17], [15], and [16]. Lower values of these criteria, together with a higher K-S p-value, indicate a better fit.
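The comparison criteria can be reproduced from -LL, the number of parameters k, and the sample size n. The Python sketch below assumes that "CAIC" denotes the small-sample corrected AIC, $\mathrm{AIC} + 2k(k+1)/(n-k-1)$. With n = 24 (the 24 days in dataset I), that reading reproduces the DT and DW rows of Table 4. `ks_distance` computes only the K-S distance for count data; it evaluates both step CDFs at the integers $0, \ldots, \max(y)$, which is enough for the supremum. It does not compute a p-value.

```python
import math


def info_criteria(neg_ll, k, n):
    """AIC, BIC and the corrected AIC (assumed to be the 'CAIC' of Table 4)."""
    aic = 2.0 * k + 2.0 * neg_ll
    bic = k * math.log(n) + 2.0 * neg_ll
    caic = aic + 2.0 * k * (k + 1) / (n - k - 1)
    return aic, bic, caic


def ks_distance(ys, cdf):
    """sup_y |F_n(y) - F(y)| for non-negative integer data; both CDFs are step functions."""
    n = len(ys)
    return max(abs(sum(v <= y for v in ys) / n - cdf(y)) for y in range(max(ys) + 1))


print(info_criteria(61.6498, 1, 24))   # DT row of Table 4: -LL = 61.6498, one parameter
```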

The first dataset (I): In the first application, we consider the daily new COVID-19 cases in India from 16 March 2021 to 08 April 2021. The data are available at https://www.worldometers.info/coronavirus/country/india-sar/. The original data values are

28869, 35838, 39643, 40950, 43815, 40611, 47264, 53419, 59069, 62291, 62631, 68206, 56119, 53158, 72182, 81441, 89019, 92998, 103793, 96557, 115269, 126315, 131893, 14482.

This dataset is modelled with the DT and the competing models. For ease of fitting, the data values were divided by 10,000 and their floor values were used. Table 4 contains the estimated parameters and their standard errors (SEs), together with the fitting measures discussed earlier. From Table 4, we conclude that the DT model performs best among the models considered, since it has the lowest values of AIC, BIC, CAIC, HQIC, and the K-S statistic, together with the highest p-value. The -LL and CDF plots are shown in Figure 5 (upper left and upper right panels). This figure confirms the existence and uniqueness of the ML estimate and shows that the fitted CDF closely follows the empirical CDF of the data.

Table 4: The ML estimate (SE) and various goodness-of-fit measures under dataset I.

Model ML estimate (SE) -LL AIC BIC CAIC K-S p-value

DT 1.1447 (0.0121) 61.6498 125.2997 126.4778 125.4815 0.12640 0.8374

DW 0.0035 (0.0034), 2.5990 (0.4083) 61.1957 126.3914 128.7475 126.9628 0.12669 0.7907

DR 0.9844 (0.0031) 61.8800 125.76 126.9381 125.9418 0.15186 0.6373

DP 0.0252 (0.0208), 0.5020 (0.0974) 62.2001 128.4003 130.7564 128.9717 0.13958 0.6869

DLOG 0.5860 (0.05317), 7.4515 (0.6792) 62.7109 129.4219 131.778 129.9934 0.13598 0.7166

PB 0.1136 (0.0187) 67.2632 136.5266 137.7046 136.7084 0.30641 0.0169

DsLi 0.7920 (0.0269) 67.8623 137.7246 138.9027 137.9064 0.31724 0.0120

DPL 0.2460 (0.0396) 68.7832 139.5665 140.7445 139.7483 0.32841 0.0083

DHLo-II 0.8548 (0.0316), 0.7729 (0.0626) 69.1321 142.2644 144.6205 142.8358 0.35068 0.0038

DsFx-I 0.9020 (0.0167) 71.0351 144.0703 145.2484 144.2521 0.81633 <0.0001

Geo 0.8716 (0.0244) 71.6627 145.3255 146.5035 145.5073 0.29772 0.0284

DB 0.9261 (0.0709), 6.6364 (6.4802) 87.3628 178.7273 181.0834 179.2987 0.74374 <0.0001

DPa 0.6217 (0.0603) 92.3961 186.7923 187.9703 186.9741 0.72802 <0.0001

Table 5 reports the ML, MOM, OLS, and WLS estimates of θ with their SEs, together with the 95% asymptotic confidence interval (ACI) for the ML estimate. To compare the methods, the K-S statistics with associated p-values are also provided for all methods. From Table 5, all estimation methods perform satisfactorily, as the p-values associated with the K-S statistics are greater than 0.05.

Table 5: The different estimates, SEs, and K-S statistics with p-values under dataset I.

Method Estimate SE K-S p-value 95% ACI

ML estimate 1.1447 0.0121 0.1264 0.8374 [1.1209,1.1683]

MOM 1.1372 0.0367 0.1544 0.6162 -

OLS 1.1561 0.0392 0.1633 0.5440 -

WLS 1.1561 0.0392 0.1633 0.5439 -
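The ACI column can be checked from the ML row alone. The sketch below assumes the interval is the usual Wald form, θ̂ ± z(0.975) SE(θ̂), with the SE taken as the square root of the inverse Fisher information at θ̂; the function name `wald_interval` is hypothetical. Using the rounded estimates and SEs from Table 5 and Table 7, it reproduces the reported intervals to within about 1e-4, the gap being due to rounding of the inputs.

```python
from statistics import NormalDist

def wald_interval(estimate: float, se: float, level: float = 0.95) -> tuple:
    """Asymptotic (Wald) interval: estimate +/- z * SE.

    Assumes se is the square root of the inverse Fisher information
    evaluated at the ML estimate.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level = 0.95
    return (estimate - z * se, estimate + z * se)

lo1, hi1 = wald_interval(1.1447, 0.0121)  # ML row of Table 5 (dataset I)
lo2, hi2 = wald_interval(1.0024, 0.0002)  # ML row of Table 7 (dataset II)
print(f"dataset I : [{lo1:.4f}, {hi1:.4f}]")
print(f"dataset II: [{lo2:.4f}, {hi2:.4f}]")
```

Because the interval is symmetric by construction, it can extend below the parameter's lower bound for small samples; a log-scale interval is a common alternative, but nothing in the text indicates the authors used one.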

The second dataset (II): This dataset gives the survival times of a group of laboratory mice that were exposed to a fixed dose of radiation at an age of 5 to 6 weeks [see [14], p. 445]. This group of mice lived in a conventional laboratory environment. After autopsy, the cause of death of each mouse was assigned to one of three categories: thymic lymphoma (C1), reticulum cell sarcoma (C2), or other causes (C3). Here, we use only the data under C3. All mice died by the end of the experiment, so there is no censoring. The data values are: 40, 42, 51, 62, 163, 179, 206, 222, 228, 252, 259, 282, 324, 333, 341, 366, 385, 407, 420, 431, 441, 461, 462, 482, 517, 517, 524, 564, 567, 586, 619, 620, 621, 622, 647, 651, 686, 761, 763. This dataset is modelled with the DT, DW, DR, PB, DsLi, DPL, Geo, DB, and DPa models. The estimated parameters and other fitting measures are reported in Table 6. From Table 6, we conclude that the DT distribution is the best choice among the competing models, since it has the lowest -LL, AIC, BIC, CAIC, HQIC, and K-S statistic and the highest p-value. Figure 5 (lower left and lower right panels) also shows that the DT distribution has a unique ML estimate for these data and fits them well.

A Discrete Analogue of Teissier Distribution: Properties and Classical Estimation with Application to Count Data, Volume 17, March 2022

Table 6: The ML estimate (SE) and various goodness-of-fit measures under dataset II.

Model ML estimate (SE) -LL AIC BIC CAIC K-S p-value

DT 1.0024 (0.0002) 262.0291 526.0581 527.7217 526.1663 0.0907 0.9049

DW 0.9999 (3.548e-07), 2.0772 (0.0318) 263.1519 530.3039 533.6310 530.6372 0.1008 0.8223

DR 0.9999 (5.874e-07) 263.1909 528.3818 530.0454 528.4899 0.1080 0.7525

PB 0.0020 (0.0002) 267.1738 536.3476 538.0112 536.4557 0.1597 0.2723

DsLi 0.9951 (0.0005) 266.9048 535.8097 537.4733 535.9178 0.1587 0.2797

DPL 0.0048 (0.0005) 266.9121 535.8242 537.4878 535.9323 0.1588 0.2786

Geo 0.9975 (0.0004) 273.9544 549.9088 551.5723 550.0169 0.2385 0.0236

DB 0.9282 (0.0621), 2.3077 (2.0575) 334.6387 673.2775 676.6046 673.6108 0.6803 <0.0001

DPa 0.8422 (0.0231) 334.8421 671.6843 673.3478 671.7924 0.6801 <0.0001
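To show how a row of Table 6 is obtained, the sketch below refits only the geometric (Geo) baseline to dataset II, because its ML estimator has a closed form; it does not implement the DT distribution. It assumes the parameterization P(X = x) = (1 - p)p^x on x = 0, 1, 2, ... and takes the K-S statistic as the supremum of |F_n(x) - F(x)| over the integers, which is how a discrete fitted CDF is usually compared with the empirical CDF. Under these assumptions it gives p̂ ≈ 0.99758, -LL ≈ 273.9544, and K-S ≈ 0.2386, agreeing with the Geo row of Table 6 (0.9975, 273.9544, 0.2385) to within the last reported digit. The names `DATA_II`, `fit_geometric`, and `ks_discrete` are hypothetical.

```python
import math

# Dataset II: survival times of mice dying from other causes (C3), n = 39
DATA_II = [40, 42, 51, 62, 163, 179, 206, 222, 228, 252, 259, 282, 324, 333,
           341, 366, 385, 407, 420, 431, 441, 461, 462, 482, 517, 517, 524,
           564, 567, 586, 619, 620, 621, 622, 647, 651, 686, 761, 763]

def fit_geometric(data):
    """Closed-form ML fit of P(X = x) = (1 - p) p^x, x = 0, 1, 2, ...

    Returns (p_hat, -LL at p_hat).
    """
    n = len(data)
    total = sum(data)
    mean = total / n
    p_hat = mean / (1.0 + mean)  # root of the score equation
    loglik = n * math.log(1.0 - p_hat) + total * math.log(p_hat)
    return p_hat, -loglik

def ks_discrete(data, cdf):
    """sup_x |F_n(x) - F(x)| for a fitted CDF supported on 0, 1, 2, ...

    Both step functions can only jump at integers, and beyond max(data)
    the gap 1 - F(x) only shrinks, so checking x = 0..max(data) suffices.
    """
    n = len(data)
    counts = {}
    for v in data:
        counts[v] = counts.get(v, 0) + 1
    at_or_below = 0
    d = 0.0
    for x in range(max(data) + 1):
        at_or_below += counts.get(x, 0)
        d = max(d, abs(at_or_below / n - cdf(x)))
    return d

p_hat, nll = fit_geometric(DATA_II)
ks = ks_discrete(DATA_II, lambda x: 1.0 - p_hat ** (x + 1))
aic = 2.0 * nll + 2.0  # one estimated parameter
print(f"p_hat = {p_hat:.5f}, -LL = {nll:.4f}, AIC = {aic:.4f}, K-S = {ks:.4f}")
```

The same `ks_discrete` helper works for any fitted discrete CDF, so it could be reused for the DT model once its CDF is coded.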

Table 7 displays the ML, MOM, OLS, and WLS estimates of θ with their SEs, together with the 95% ACI for the ML estimate. The table also contains the K-S statistics with associated p-values for all methods. From Table 7, all estimation methods perform satisfactorily, as the p-values associated with the K-S statistics are greater than 0.05.

Table 7: The different estimates, SEs, and K-S statistics with p-values under dataset II.

Method Estimate SE K-S p-value 95% ACI

ML estimate 1.0024 0.0002 0.0907 0.9049 [1.0020,1.0027]

MOM 1.0096 0.0022 0.2047 0.0759 -

OLS 1.0024 0.0004 0.0909 0.9034 -

WLS 1.0024 0.0004 0.0909 0.9034 -
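The OLS and WLS rows in Tables 5 and 7 are least-squares fits of the model CDF to plotting positions. A minimal sketch, assuming the standard criteria of Swain et al. [25] (OLS minimizes the sum over i of [F(x_(i)) - i/(n + 1)]^2 over the ordered sample; WLS multiplies each term by (n + 1)^2 (n + 2)/[i(n - i + 1)]), is given below. It is illustrated on the geometric baseline rather than the DT model, whose PMF is not restated in this section, and the golden-section search and the search interval [0.9, 0.99999] are assumptions of this sketch, not the authors' procedure. All names are hypothetical.

```python
import math

# Dataset II (cause C3), copied from the text; n = 39
DATA_II = [40, 42, 51, 62, 163, 179, 206, 222, 228, 252, 259, 282, 324, 333,
           341, 366, 385, 407, 420, 431, 441, 461, 462, 482, 517, 517, 524,
           564, 567, 586, 619, 620, 621, 622, 647, 651, 686, 761, 763]

def geometric_cdf(x, p):
    """CDF of P(X = x) = (1 - p) p^x on x = 0, 1, 2, ... (assumed form)."""
    return 1.0 - p ** (x + 1)

def ls_criterion(p, data, weighted):
    """OLS (weighted=False) or WLS (weighted=True) criterion:
    sum_i w_i * (F(x_(i)) - i/(n+1))^2 over the ordered sample."""
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        # WLS weight is 1 / Var[F(X_(i))] = (n+1)^2 (n+2) / (i (n-i+1))
        w = (n + 1) ** 2 * (n + 2) / (i * (n - i + 1)) if weighted else 1.0
        total += w * (geometric_cdf(x, p) - i / (n + 1)) ** 2
    return total

def golden_section_min(f, a, b, tol=1e-10):
    """Shrink [a, b] around the minimizer of f, assuming f is unimodal there."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            b = d  # minimizer lies in [a, d]
        else:
            a = c  # minimizer lies in [c, b]
    return (a + b) / 2.0

p_ols = golden_section_min(lambda p: ls_criterion(p, DATA_II, False), 0.9, 0.99999)
p_wls = golden_section_min(lambda p: ls_criterion(p, DATA_II, True), 0.9, 0.99999)
print(f"OLS estimate: {p_ols:.6f}, WLS estimate: {p_wls:.6f}")
```

Unlike ML, neither criterion has a closed-form minimizer even for the geometric model, which is why a one-dimensional numerical search is used; for a one-parameter model such as DT, the same search applies once its CDF replaces `geometric_cdf`.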

Figure 5: The -LL and CDF plots for datasets I and II. Panels: -LL plot for dataset I (upper left); fitted vs empirical CDF plot for dataset I (upper right); -LL plot for dataset II (lower left); fitted vs empirical CDF plot for dataset II (lower right). The CDF panels compare the empirical CDF with the theoretical (fitted DT) CDF.

6. Conclusion

In this article, a new one-parameter discrete Teissier (DT) distribution is obtained. Despite having a single parameter, the model is flexible in fitting: it can model equi-, over-, and under-dispersed datasets, positively and negatively skewed datasets, and datasets with an increasing failure rate. Various important distributional properties of the DT distribution are discussed.

The unknown parameter of the proposed model is estimated using several classical methods. An extensive simulation study is presented to assess the various estimators for count data. Finally, the fitting capability of the proposed model for count data is illustrated using two real datasets. Hence, the suggested model may be used as an alternative to some well-known existing models for analyzing discrete data arising in various domains.

Future work could examine censored data using the proposed model. A load-sharing model in which the component failure times follow the DT distribution may also be investigated, as may the stress-strength parameter under various censoring schemes. In addition, a bivariate extension of the DT distribution can be developed.

Acknowledgement

The authors would like to thank the editor and an anonymous referee for their careful reading of the manuscript and constructive suggestions, which significantly improved the earlier version of the manuscript. The last author is thankful to the Department of Science and Technology, India for providing financial aid for the research work under the Inspire Fellowship Program vide letter number DST/INSPIRE Fellowship/2017/IF170038.

Conflict of interest

The authors declare no conflicts of interest.

References

[1] Alamatsaz, M. H., Dey, S., Dey, T., & Harandi, S. S. (2016). Discrete generalized Rayleigh distribution. Pakistan Journal of Statistics, 32(1).

[2] Altun, E. (2020). A new one-parameter discrete distribution with associated regression and integer-valued autoregressive models. Mathematica Slovaca, 70(4), 979-994.

[3] Chakraborty, S. (2015). Generating discrete analogues of continuous probability distributions-A survey of methods and constructions. Journal of Statistical Distributions and Applications, 2(1), 6.

[4] Chakraborty, S., & Chakravarty, D. (2016). A new discrete probability distribution with integer support on (−∞, ∞). Communications in Statistics-Theory and Methods, 45(2), 492-505.

[5] Eliwa, M. S., & El-Morshedy, M. (2021). A one-parameter discrete distribution for over-dispersed data: statistical and reliability properties with applications. Journal of Applied Statistics, 1-21.

[6] Eliwa, M. S., Alhussain, Z. A., & El-Morshedy, M. (2020). Discrete Gompertz-G family of distributions for over-and under-dispersed data with properties, estimation, and applications. Mathematics, 8(3), 358.

[7] El-Morshedy, M., Eliwa, M. S., & Tyagi, A. (2021a). A discrete analogue of odd Weibull-G family of distributions: properties, classical and Bayesian estimation with applications to count data. Journal of Applied Statistics, 1-25.

[8] El-Morshedy, M., Alizadeh, M., Al-Bossly, A.,& Eliwa, M. S. (2021b). A Probability Mass Function for Various Shapes of the Failure Rates, Asymmetric and Dispersed Data with Applications to Coronavirus and Kidney Dysmorphogenesis. Symmetry, 13(10), 1790.

[9] Gomez-Deniz, E., & Calderin-Ojeda, E. (2011). The discrete Lindley distribution: properties and applications. Journal of Statistical Computation and Simulation, 81(11), 1405-1416.

[10] Gupta, P. L., Gupta, R. C., & Tripathi, R. C. (1997). On the monotonic properties of discrete failure rates. Journal of Statistical Planning and Inference, 65(2), 255-268.

[11] Jayakumar, K., & Babu, M. G. (2018). Discrete Weibull geometric distribution and its properties. Communications in Statistics-Theory and Methods, 47(7), 1767-1783.

[12] Krishna, H., & Pundir, P. S. (2009). Discrete Burr and discrete Pareto distributions. Statistical Methodology, 6(2), 177-188.

[13] Keilson, J., & Gerber, H. (1971). Some results for discrete unimodality. Journal of the American Statistical Association, 66(334), 386-389.

[14] Lawless, J. F. (2003). Statistical models and methods for lifetime data (Vol. 362). John Wiley & Sons.

[15] Lemeshko, B. Yu., Lemeshko, S. B., & Postovalov, S. N. (2010). Statistic Distribution Models for Some Nonparametric Goodness-of-Fit Tests in Testing Composite Hypotheses. Communications in Statistics-Theory and Methods, 39, 460-471. DOI: 10.1080/03610920903140148

[16] Lemeshko, B. Yu., & Lemeshko, S. B. (2011). Models of Statistic Distributions of Nonparametric Goodness-of-Fit Tests in Composite Hypotheses Testing for Double Exponential Law Cases. Communications in Statistics-Theory and Methods, 40, 2879-2892. DOI: 10.1080/03610926.2011.562770

[17] Lemeshko, B. Yu., & Postovalov, S. N. (2001). Application of the nonparametric goodness-of-fit tests in testing composite hypotheses. Optoelectronics, Instrumentation and Data Processing, (2), 76-88.

[18] Sankaran, M. (1970). The discrete Poisson-Lindley distribution. Biometrics, 26(1), 145-149.

[19] Nakagawa, T., & Osaki, S. (1975). The discrete Weibull distribution. IEEE Transactions on Reliability, 24(5), 300-301.

[20] Rényi, A. (1961). On measures of entropy and information. Mathematical Statistics and Probability, 1, 547-561.

[21] Roy, D. (2003). The discrete normal distribution. Communications in Statistics-Theory and Methods, 32(10), 1871-1883.

[22] Roy, D. (2004). Discrete Rayleigh distribution. IEEE Transactions on Reliability, 53, 255-260.

[23] Shafqat, M., Ali, S., Shah, I., & Dey, S. (2020). Univariate Discrete Nadarajah and Haghighi Distribution: Properties and Different Methods of Estimation. Statistica, 80(3), 301-330.


[24] Steutel, F. W., & van Harn, K. (2004). Infinite Divisibility of Probability Distributions on the Real Line. New York: Marcel Dekker.

[25] Swain, J. J., Venkatraman, S., & Wilson, J. R. (1988). Least-squares estimation of distribution functions in Johnson's translation system. Journal of Statistical Computation and Simulation, 29(4), 271-297.

[26] Sharma, V. K., Singh, S. V., & Shekhawat, K. (2020). Exponentiated Teissier distribution with increasing, decreasing and bathtub hazard functions. Journal of Applied Statistics, 1-23.

[27] Teissier, G. (1934). Recherches sur le vieillissement et sur les lois de la mortalité. Annales de physiologie et de physicochimie biologique, 10(2), 237-284.

[28] Tyagi, A., Choudhary, N., & Singh, B. (2020). A new discrete distribution: Theory and applications to discrete failure lifetime and count data. J. Appl. Probab. Statist., 15, 117-143.

[29] Tyagi, A., Choudhary, N., & Singh, B. (2019). Discrete additive Perks-Weibull distribution: Properties and applications. Life Cycle Reliability and Safety Engineering, 8(3), 183-199.
