Bootstrap methods for the censored data in empirical Bayes estimation of the reliability parameters

F. Grabski; A. Załęska-Fornal

Department of Mathematics, Naval University, Gdynia, Poland e-mail: [email protected] ; [email protected]

ABSTRACT

Bootstrap and resampling methods are computer-based methods used in applied statistics. They are Monte Carlo methods based on the observed data. Bradley Efron described the bootstrap method in 1979 and has since written extensively about it and its generalizations. Here we apply these methods to empirical Bayes estimation, using bootstrap copies of the censored data to obtain an empirical prior distribution.

1 INTRODUCTION

The bootstrap is a computer-based method used in applied statistics. It is a data-based simulation method for assessing statistical accuracy. The term bootstrap derives from the phrase 'to pull oneself up by one's bootstraps', which can be found in the eighteenth-century Adventures of Baron Munchausen by Rudolf Erich Raspe. The method was proposed by Bradley Efron in 1979 as a way to estimate the standard error of a parameter estimate. The main goal of the bootstrap is a computer-based implementation of basic statistical concepts. Recent environmental applications of the bootstrap can be found in toxicology, fisheries surveys, groundwater and air pollution modeling, hydrology, etc. Bootstrapping rests on a powerful principle: create many repeated data samples from the single sample we have and make inferences from those samples. We apply the bootstrap in empirical Bayes estimation, using the so-called bootstrap copies of the censored data to obtain an empirical prior distribution.

2 BOOTSTRAP AND RESAMPLING COPIES OF THE CENSORED DATA

The random variable $X$ denotes the time to failure of an element. The probability distribution of the time to failure is defined by the cumulative distribution function (cdf)

$$F_\theta(x) = P(X \le x), \tag{1}$$

where $\theta \in \Theta$ is the true but unknown parameter. To assess this distribution we test $n$ identical elements $e_1, e_2, \ldots, e_n$ for the times $y_1, y_2, \ldots, y_n$, respectively. Suppose that the numbers $x_1, x_2, \ldots, x_n$ are the times to failure of these elements. The data vector $x_n = (x_1, x_2, \ldots, x_n)$ is assumed to be a value of the random vector $X_n = (X_1, X_2, \ldots, X_n)$, where the random variables $X_1, X_2, \ldots, X_n$ are mutually independent and identically distributed (i.i.d.). That random vector is a sample from the distribution $F_\theta(\cdot)$. The vector $y_n = (y_1, y_2, \ldots, y_n)$ of the testing times of the elements (observation times, censoring points) can be treated as a value of the random vector $Y_n = (Y_1, Y_2, \ldots, Y_n)$. We assume that $Y_1, Y_2, \ldots, Y_n$ are mutually independent random variables and that they are also independent of the $X_j$. The probability distributions of the random variables $Y_1, Y_2, \ldots, Y_n$ are defined by the cdfs

$$G_i(y) = P(Y_i \le y), \quad i = 1, 2, \ldots, n. \tag{2}$$

Those functions do not depend on the parameter $\theta \in \Theta$. In many cases they are defined as

$$G_i(y) = \begin{cases} 0 & \text{for } y < y_i, \\ 1 & \text{for } y \ge y_i, \end{cases} \qquad y_i \in [0, \infty].$$

This means that the values of $Y_1, Y_2, \ldots, Y_n$ are fixed in advance, i.e. deterministic.

The observations are described by the random variables

$$U_j = \min(X_j, Y_j), \quad j = 1, \ldots, n, \tag{3}$$

$$\Delta_j = \begin{cases} 1 & \text{for } X_j \le Y_j, \\ 0 & \text{for } X_j > Y_j. \end{cases} \tag{4}$$

The sufficient statistic describing the observations can be written as the vector $Z_n = ((U_1, \Delta_1), \ldots, (U_n, \Delta_n))$. The value of that random vector is $z_n = ((u_1, \delta_1), \ldots, (u_n, \delta_n))$, from which we obtain the vector $z_{(n)} = (z_{(1)}, z_{(2)}, \ldots, z_{(k)}, z_{(k+1)}, \ldots, z_{(n)})$, where $z_{(1)}, z_{(2)}, \ldots, z_{(k)}$ are the failure instants of the elements and $z_{(k+1)}, z_{(k+2)}, \ldots, z_{(n)}$ are the observation times of the elements that are still working.
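The construction of the censored observations in (3) and (4) can be sketched in a few lines of Python. This is only an illustration: the function name `censor`, the numbers, and the convention that a tie $X_j = Y_j$ counts as an observed failure are assumptions, not part of the paper.

```python
def censor(failure_times, censoring_times):
    """Return the observed pairs (u_j, delta_j) of equations (3) and (4)."""
    observed = []
    for x, y in zip(failure_times, censoring_times):
        u = min(x, y)               # U_j = min(X_j, Y_j)
        delta = 1 if x <= y else 0  # 1: failure observed, 0: element still working
        observed.append((u, delta))
    return observed


# Hypothetical lifetimes and a common test duration of 5 time units.
lifetimes = [2.0, 7.5, 1.2, 9.0]
test_times = [5.0, 5.0, 5.0, 5.0]
pairs = censor(lifetimes, test_times)
print(pairs)  # [(2.0, 1), (5.0, 0), (1.2, 1), (5.0, 0)]

# Rearranged vector: the k failure instants first, then the censored times.
z_ordered = [u for u, d in pairs if d == 1] + [u for u, d in pairs if d == 0]
print(z_ordered)  # [2.0, 1.2, 5.0, 5.0]
```

Here $k = 2$ elements failed during the test and the other two were still working when observation stopped.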

Suppose that we are able to estimate the parameter $\theta \in \Theta$ by using an estimator $\hat\theta_n = T(Z_n)$ (or $\hat\theta_n = T(Z_{(n)})$). The numbers $\hat\theta_n = T(z_n)$ (or $\hat\theta_n = T(z_{(n)})$) are its values. We can then use the distribution $F_{\hat\theta_n}(\cdot)$ to simulate the so-called bootstrap copies

$$z^{*(b)}_{(n)} = \left(z^{*(b)}_{(1)}, z^{*(b)}_{(2)}, \ldots, z^{*(b)}_{(n)}\right), \quad b = 1, 2, \ldots, B,$$

of the data $z_{(n)} = (z_{(1)}, z_{(2)}, \ldots, z_{(n)})$. The bootstrap copies of the data are the values of the random vectors

$$Z^{*(b)}_{(n)} = \left(Z^{*(b)}_{(1)}, Z^{*(b)}_{(2)}, \ldots, Z^{*(b)}_{(n)}\right), \quad b = 1, 2, \ldots, B,$$

which are called the bootstrap samples. The function $F_{\hat\theta_n}(\cdot)$ is the cumulative distribution function of the independent random variables $Z^{*(b)}_{1}, Z^{*(b)}_{2}, \ldots, Z^{*(b)}_{n}$.

Given a vector of observations $z_{(n)} = (z_{(1)}, z_{(2)}, \ldots, z_{(n)})$ of size $n$, we can define the empirical cumulative distribution function $\hat F$ as

$$\hat F(z; z_{(n)}) = \frac{\#\{z_{(i)} : z_{(i)} \le z\}}{n},$$

which is equivalent to the discrete distribution

$$p_k = \frac{n_k}{n}, \quad k = 1, 2, \ldots, l,$$

where $n_k = \#\{i : z_{(i)} = z_{(k)}\}$.

This distribution can be expressed as a vector of frequencies $p = (p_1, p_2, \ldots, p_l)$. Vectors of data

$$z^{\circ(r)}_{n} = \left(z^{\circ(r)}_{1}, z^{\circ(r)}_{2}, \ldots, z^{\circ(r)}_{n}\right), \quad r = 1, 2, \ldots, R,$$

drawn from the distribution $\hat F(z; z_{(n)})$ are said to be resampling copies of the data $z_{(n)} = (z_{(1)}, z_{(2)}, \ldots, z_{(n)})$.

In other words, a resampling copy $z^{\circ(r)}_{n}$ is generated by sampling $n$ times at random, with replacement, from the original data points $z_{(n)} = (z_{(1)}, z_{(2)}, \ldots, z_{(n)})$. Random sampling means a random choice of one element from among $z_{(1)}, z_{(2)}, \ldots, z_{(n)}$ in each of the $n$ drawings. The resampling copy is composed of elements of the original sample: some of them may be taken zero times, some once, some twice, and so on. Notice that in a resampling copy $z^{\circ(r)}_{n}$ the elements are, as a rule, repeated. The typical number $B$ of bootstrap copies, or $R$ of resampling copies, ranges from 50 to 1000.
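As a minimal sketch of the two ideas above, the snippet below computes the frequency vector of the empirical distribution and draws resampling copies with replacement. The function names, the seed, and the data values are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def frequencies(data):
    """Return {value: n_k / n}, the discrete form of the empirical cdf."""
    n = len(data)
    return {value: count / n for value, count in Counter(data).items()}


def resampling_copy(data, rng):
    """Draw n points at random, with replacement, from the n original points."""
    return rng.choices(data, k=len(data))


rng = random.Random(2024)          # fixed seed, only for reproducibility
z = [1.2, 2.0, 5.0, 5.0]           # hypothetical observed times
print(frequencies(z))              # {1.2: 0.25, 2.0: 0.25, 5.0: 0.5}

copies = [resampling_copy(z, rng) for _ in range(3)]   # R = 3 copies
for copy in copies:
    print(sorted(copy))            # same length as z, values may repeat
```

Each copy has the same length as the original sample and contains only original values, which is why repeated elements are the rule rather than the exception.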

3 BOOTSTRAP ESTIMATORS

Let $Z^*_n = (Z^*_1, Z^*_2, \ldots, Z^*_n)$ be a bootstrap sample for the given data vector $z_n = (z_1, z_2, \ldots, z_n)$. The random variable $\hat\theta^*_n = T(Z^*_n)$ is said to be a bootstrap estimator of the parameter $\theta$.

The distribution of the statistic $\hat\theta^*_n - \hat\theta_n$ for the bootstrap sample, with the data held fixed, is close to the distribution of the statistic $\hat\theta_n - \theta$.

From that rule it follows that the shapes of the distributions of the statistics $\hat\theta^*_n$ and $\hat\theta_n$ are similar. To obtain the empirical distribution of the random variable $\hat\theta^*_n$ we have to simulate bootstrap copies

$$z^{*(b)}_n = \left(z^{*(b)}_1, z^{*(b)}_2, \ldots, z^{*(b)}_n\right), \quad b = 1, 2, \ldots, B,$$

of the data $z_n = (z_1, z_2, \ldots, z_n)$. After that we calculate the values of the statistic

$$\hat\theta^{*(b)}_n = T\left(z^{*(b)}_n\right), \quad b = 1, 2, \ldots, B.$$

We can use a nonparametric kernel estimator to estimate the probability density of the bootstrap estimator $\hat\theta^*_n$. The value of this estimator with the Gaussian kernel is given by

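$$\hat g(\vartheta) = \frac{1}{Bh} \sum_{b=1}^{B} K\!\left(\frac{\vartheta - \hat\theta^{*(b)}_n}{h}\right),$$

where

$$K(\vartheta) = \frac{1}{\sqrt{2\pi}}\, e^{-\vartheta^2/2}, \quad \vartheta \in (-\infty, \infty),$$

and $h = 1.06\, s\, B^{-1/5}$, with $s$ the standard deviation of $\hat\theta^{*(b)}_n$, $b = 1, 2, \ldots, B$.

4 THE BOOTSTRAP ESTIMATE OF STANDARD ERROR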
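Before turning to the standard error, the Gaussian kernel estimate of Section 3 can be illustrated with a short sketch. It assumes the rule-of-thumb bandwidth $h = 1.06\, s\, B^{-1/5}$ and the sample standard deviation with divisor $B - 1$. The function name and the replication values are hypothetical.

```python
import math
import statistics


def gaussian_kernel_density(replications):
    """Return g(x) = (1 / (B h)) * sum_b K((x - theta_b) / h), Gaussian K."""
    B = len(replications)
    s = statistics.stdev(replications)   # sample standard deviation (divisor B - 1)
    h = 1.06 * s * B ** (-0.2)           # rule-of-thumb bandwidth

    def density(x):
        total = sum(math.exp(-0.5 * ((x - t) / h) ** 2) for t in replications)
        return total / (B * h * math.sqrt(2.0 * math.pi))

    return density


reps = [0.8, 1.0, 1.1, 1.3, 0.9]         # hypothetical bootstrap replications
g = gaussian_kernel_density(reps)
print(g(1.0))                            # density estimate near the centre
```

In practice the list of replications would hold the $B$ values of the statistic computed from the bootstrap copies rather than five hand-picked numbers.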

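The vectors $z^{*(b)}_n = \left(z^{*(b)}_1, z^{*(b)}_2, \ldots, z^{*(b)}_n\right)$, $b = 1, 2, \ldots, B$, are the bootstrap copies of the censored data, and the corresponding bootstrap replications of the statistic are the values

$$\hat\theta^{*(b)}_n = T\left(z^{*(b)}_n\right), \quad b = 1, 2, \ldots, B. \tag{5}$$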

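The bootstrap estimate of the standard error of $\hat\theta_n$ is defined by the formula

$$\widehat{se}_B = \left[\frac{\sum_{b=1}^{B} \left(\hat\theta^{*(b)}_n - \bar\theta^{*}\right)^2}{B - 1}\right]^{1/2}, \tag{6}$$

where

$$\bar\theta^{*} = \frac{1}{B} \sum_{b=1}^{B} \hat\theta^{*(b)}_n.$$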

The bootstrap algorithm for estimating standard errors is as follows:

- Generate $B$ independent bootstrap samples $z^{*(b)}_n = \left(z^{*(b)}_1, z^{*(b)}_2, \ldots, z^{*(b)}_n\right)$, $b = 1, 2, \ldots, B$ (for estimating a standard error, $B$ should be in the range 30-200).

- Compute the bootstrap replication corresponding to each bootstrap sample, $\hat\theta^{*(b)}_n = T\left(z^{*(b)}_n\right)$, $b = 1, 2, \ldots, B$.

- Compute the standard error $\widehat{se}_B$ as the sample standard deviation of the $B$ replications according to (6).
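The three steps above can be sketched as follows. The sketch assumes an exponential lifetime model with fixed test times, the model treated in Example 1, and at least one observed failure in the data. All function names and numbers are hypothetical and serve only as an illustration.

```python
import math
import random


def mle_rate(sample):
    """MLE of an exponential failure rate from pairs (u, delta): k / sum of all u."""
    k = sum(delta for _, delta in sample)
    return k / sum(u for u, _ in sample)


def bootstrap_copy(rate, test_times, rng):
    """Simulate lifetimes from Exp(rate) and censor them at the given test times."""
    copy = []
    for y in test_times:
        x = rng.expovariate(rate)
        copy.append((min(x, y), 1 if x <= y else 0))
    return copy


def standard_error(replications):
    """Sample standard deviation with divisor B - 1, as in formula (6)."""
    B = len(replications)
    mean = sum(replications) / B
    return math.sqrt(sum((r - mean) ** 2 for r in replications) / (B - 1))


rng = random.Random(1)                       # fixed seed for reproducibility
test_times = [5.0] * 6                       # hypothetical censoring points
sample = [(2.0, 1), (5.0, 0), (1.2, 1), (5.0, 0), (3.3, 1), (0.7, 1)]
rate_hat = mle_rate(sample)                  # estimate from the observed data

# Step 1 and 2: B = 200 bootstrap copies and their replications.
replications = [mle_rate(bootstrap_copy(rate_hat, test_times, rng))
                for _ in range(200)]
# Step 3: standard error according to (6).
print(rate_hat, standard_error(replications))
```

The bootstrap here is parametric: copies are simulated from the fitted distribution and then censored at the same test times as the original data.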

5 EMPIRICAL BAYES ESTIMATION

Recent work on empirical Bayes estimation has been stimulated by the work of Robbins (1955). It is well known that the value of the Bayes estimator $\hat\theta_B$ of the parameter $\theta$ under the squared-error loss function is the expectation in the posterior distribution. If $\hat\theta$ is a value of a sufficient statistic for the parameter $\theta$, then the value of the Bayes estimator $\hat\theta_B$ of the parameter $\theta$ is

$$\hat\theta_B = E(\theta \mid \hat\theta) = \frac{\int_\Theta \theta\, f(\hat\theta \mid \theta)\, g(\theta)\, d\nu(\theta)}{\int_\Theta f(\hat\theta \mid \theta)\, g(\theta)\, d\nu(\theta)}, \tag{7}$$

where $\nu$ denotes the counting measure or the Lebesgue measure and $g(\theta)$ is the prior density function of the parameter $\theta$ with respect to the measure $\nu$.

We suppose that the prior density of the parameter is unknown. In the classical empirical Bayes procedure the prior distribution is assessed from past data. Very often the only data we have is a small sample $z = (z_1, z_2, \ldots, z_n)$. In such cases, instead of past data, we can use the vectors $z^{*(b)}_n = \left(z^{*(b)}_1, z^{*(b)}_2, \ldots, z^{*(b)}_n\right)$, $b = 1, 2, \ldots, B$, which are values of the bootstrap samples corresponding to the unknown distribution $F_\theta(\cdot)$ of a random variable $X$ denoting, for example, the time to failure. The bootstrap copies for censored data are generated from the distribution $F_{\hat\theta}(\cdot)$, where $\hat\theta = T(z_{(n)})$. To estimate the unknown parameter $\theta$ we have to calculate the values of the bootstrap statistics $\hat\theta^{*(b)} = T(z^{*(b)})$, $b = 1, 2, \ldots, B$. As the prior density we propose the discrete density function

$$g(\theta) = \sum_{i=1}^{w} \frac{m_{j_i}}{m}\, \delta\!\left(\theta, \hat\theta^{*(j_i)}\right), \tag{8}$$

where $\{j_1, j_2, \ldots, j_w\} \subset \{1, \ldots, B\}$, $m_l = \#\{k : \hat\theta^{*(k)} = \hat\theta^{*(l)}\}$ denotes the number of observations equal to $\hat\theta^{*(l)}$,

$$\delta\!\left(\theta, \hat\theta^{*(l)}\right) = \begin{cases} 1 & \text{for } \theta = \hat\theta^{*(l)}, \\ 0 & \text{for } \theta \ne \hat\theta^{*(l)}, \end{cases}$$

and $m = \sum_{i=1}^{w} m_{j_i} = B$.

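From (7), for the counting measure $\nu$ and the density function defined by (8), we obtain

$$\hat\theta_B = E(\theta \mid \hat\theta) = \frac{\sum_{i=1}^{w} m_{j_i}\, \hat\theta^{*(j_i)} f\!\left(\hat\theta \mid \hat\theta^{*(j_i)}\right)}{\sum_{i=1}^{w} m_{j_i}\, f\!\left(\hat\theta \mid \hat\theta^{*(j_i)}\right)} = \frac{\sum_{l=1}^{B} \hat\theta^{*(l)} f\!\left(\hat\theta \mid \hat\theta^{*(l)}\right)}{\sum_{l=1}^{B} f\!\left(\hat\theta \mid \hat\theta^{*(l)}\right)}. \tag{9}$$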

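Let $f_\theta\!\left(z^{*(b)}_{(n)}\right) = l\!\left(z^{*(b)}_{(n)}; \theta\right)$ be the likelihood function for the bootstrap sample

$$z^{*(b)}_{(n)} = \left(z^{*(b)}_{(1)}, z^{*(b)}_{(2)}, \ldots, z^{*(b)}_{(n)}\right)$$

with unknown parameter $\theta \in \Theta$. The function is defined by the formula

$$l\!\left(z^{*(b)}_{(n)}; \theta\right) = \prod_{i=1}^{k_b} f_\theta\!\left(z^{*(b)}_{(i)}\right) \prod_{i=k_b+1}^{n} \left[1 - F_\theta\!\left(z^{*(b)}_{(i)}\right)\right]. \tag{10}$$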

Notice that the prior distribution is constructed on the basis of the bootstrap samples. Hence the value of the bootstrap empirical Bayes estimator has the form (9).
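A minimal sketch of formula (9) follows: the posterior mean under a discrete prior that puts mass $m_l / B$ on each distinct bootstrap replication. The function names are hypothetical, and the likelihood passed in the usage example is a toy placeholder rather than a real model.

```python
from collections import Counter


def empirical_bayes(replications, likelihood):
    """Posterior mean (9) with prior mass 1/B on every bootstrap replication."""
    weights = [likelihood(t) for t in replications]
    return sum(t * w for t, w in zip(replications, weights)) / sum(weights)


def empirical_bayes_grouped(replications, likelihood):
    """Same value, written with the distinct replications and their counts m."""
    counts = Counter(replications)
    num = sum(m * t * likelihood(t) for t, m in counts.items())
    den = sum(m * likelihood(t) for t, m in counts.items())
    return num / den


def toy_likelihood(theta):
    """Placeholder for f(theta_hat | theta); not a real model."""
    return theta


reps = [1.0, 1.0, 2.0]                                # hypothetical replications
print(empirical_bayes(reps, toy_likelihood))          # 1.5
print(empirical_bayes_grouped(reps, toy_likelihood))  # 1.5
```

Summing over all $B$ replications and summing over the distinct values weighted by their counts give the same result, which is the equality between the two forms of (9).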

6 EXAMPLES

Example 1.

Suppose that we wish to estimate the failure rate $\theta = \lambda$ in the exponential distribution given by the pdf

$$f_\theta(x) = \lambda e^{-\lambda x}, \quad x \ge 0, \ \lambda > 0. \tag{11}$$

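Assume that we have data in the form of the vector

$$z_{(n)} = \left(z_{(1)}, z_{(2)}, \ldots, z_{(k)}, z_{(k+1)}, \ldots, z_{(n)}\right),$$

where $z_{(1)}, z_{(2)}, \ldots, z_{(k)}$ are the times to failure of the tested elements and $z_{(k+1)}, z_{(k+2)}, \ldots, z_{(n)}$ are the observation times of the elements still working. In that case the likelihood function is

$$l\!\left(z_{(n)}, \lambda\right) = \prod_{i=1}^{k} f_\theta\!\left(z_{(i)}\right) \prod_{i=k+1}^{n} \left[1 - F_\theta\!\left(z_{(i)}\right)\right] = \prod_{i=1}^{k} \lambda e^{-\lambda z_{(i)}} \prod_{i=k+1}^{n} e^{-\lambda z_{(i)}} = \lambda^{k} e^{-\lambda \sum_{i=1}^{n} z_{(i)}}. \tag{12}$$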

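The number

$$z = \sum_{i=1}^{n} z_{(i)} \tag{13}$$

is the value of a sufficient statistic for the unknown parameter $\lambda$. By substitution we obtain the likelihood function

$$l(z, \lambda) = \lambda^{k} e^{-\lambda z},$$

which depends on the data only through $z$. To find the value of the maximum likelihood estimator we have to solve the equation

$$\frac{\partial \ln l(z, \lambda)}{\partial \lambda} = 0.$$

Its solution is

$$\hat\lambda = \frac{k}{z} = \frac{k}{\sum_{i=1}^{n} z_{(i)}}. \tag{14}$$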
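For completeness, the intermediate steps behind (14) can be written out as follows. This is a standard calculation and assumes at least one observed failure, $k \ge 1$.

```latex
\begin{aligned}
\ln l(z,\lambda) &= k \ln \lambda - \lambda z, \\
\frac{\partial \ln l(z,\lambda)}{\partial \lambda} &= \frac{k}{\lambda} - z = 0
  \quad\Longrightarrow\quad \hat\lambda = \frac{k}{z}, \\
\frac{\partial^{2} \ln l(z,\lambda)}{\partial \lambda^{2}} &= -\frac{k}{\lambda^{2}} < 0 .
\end{aligned}
```

The negative second derivative confirms that the stationary point is a maximum of the likelihood.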

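In the same way, using formula (14) for the bootstrap samples

$$z^{*(b)}_{(n)} = \left(z^{*(b)}_{(1)}, z^{*(b)}_{(2)}, \ldots, z^{*(b)}_{(n)}\right), \quad b = 1, 2, \ldots, B,$$

we obtain the values of the maximum likelihood estimator of $\lambda$:

$$\hat\lambda^{*(b)} = \frac{k^{*(b)}}{z^{*(b)}} = \frac{k^{*(b)}}{\sum_{i=1}^{n} z^{*(b)}_{(i)}}, \quad b = 1, 2, \ldots, B,$$

where $k^{*(b)}$ is the number of failures in the $b$-th bootstrap copy.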

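In this case formula (9) takes the form

$$\hat\lambda_B = E(\lambda \mid \hat\lambda) = \frac{\sum_{i=1}^{w} m_{j_i}\, \hat\lambda^{*(j_i)} f\!\left(\hat\lambda \mid \hat\lambda^{*(j_i)}\right)}{\sum_{i=1}^{w} m_{j_i}\, f\!\left(\hat\lambda \mid \hat\lambda^{*(j_i)}\right)},$$

where

$$f\!\left(\hat\lambda \mid \hat\lambda^{*(l)}\right) = \left(\hat\lambda^{*(l)}\right)^{k} \exp\!\left(-\frac{k \hat\lambda^{*(l)}}{\hat\lambda}\right).$$

Finally we obtain

$$\hat\lambda_B = \frac{\sum_{i=1}^{w} m_{j_i} \left(\hat\lambda^{*(j_i)}\right)^{k+1} \exp\!\left(-k \hat\lambda^{*(j_i)} / \hat\lambda\right)}{\sum_{i=1}^{w} m_{j_i} \left(\hat\lambda^{*(j_i)}\right)^{k} \exp\!\left(-k \hat\lambda^{*(j_i)} / \hat\lambda\right)} = \frac{\sum_{l=1}^{B} \left(\hat\lambda^{*(l)}\right)^{k+1} \exp\!\left(-k \hat\lambda^{*(l)} / \hat\lambda\right)}{\sum_{l=1}^{B} \left(\hat\lambda^{*(l)}\right)^{k} \exp\!\left(-k \hat\lambda^{*(l)} / \hat\lambda\right)},$$

where

$$\hat\lambda = \frac{k}{\sum_{i=1}^{n} z_{(i)}}, \qquad \hat\lambda^{*(b)} = \frac{k^{*(b)}}{\sum_{i=1}^{n} z^{*(b)}_{(i)}}, \quad b = 1, 2, \ldots, B.$$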

By repetition we can obtain a sequence of values of a Bayes estimator that we can use to construct its empirical distribution.
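Example 1 can be sketched end to end as follows. The sketch assumes fixed test times, at least one observed failure, and a parametric bootstrap from the fitted exponential distribution. Function names, seed, and data are hypothetical, and the weights are evaluated on the log scale only to avoid numerical underflow.

```python
import math
import random


def mle_rate(sample):
    """lambda_hat = k / sum of all observed times, for pairs (u, delta)."""
    k = sum(d for _, d in sample)
    return k / sum(u for u, _ in sample)


def bootstrap_rates(rate, test_times, B, rng):
    """MLEs computed from B parametric bootstrap copies censored at test_times."""
    rates = []
    for _ in range(B):
        lifetimes = [rng.expovariate(rate) for _ in test_times]
        copy = [(min(x, y), 1 if x <= y else 0)
                for x, y in zip(lifetimes, test_times)]
        rates.append(mle_rate(copy))
    return rates


def bayes_rate(rate_hat, k, boot_rates):
    """Weighted mean of bootstrap rates, weights lam**k * exp(-k*lam/rate_hat)."""
    positive = [lam for lam in boot_rates if lam > 0.0]  # zero rate: zero weight (k >= 1)
    logs = [k * math.log(lam) - k * lam / rate_hat for lam in positive]
    shift = max(logs)                                    # common factor cancels in the ratio
    weights = [math.exp(v - shift) for v in logs]
    return sum(lam * w for lam, w in zip(positive, weights)) / sum(weights)


sample = [(2.0, 1), (5.0, 0), (1.2, 1), (5.0, 0), (3.3, 1), (0.7, 1)]
k = sum(d for _, d in sample)                            # number of failures
rate_hat = mle_rate(sample)
boot = bootstrap_rates(rate_hat, [5.0] * 6, 500, random.Random(7))
print(rate_hat, bayes_rate(rate_hat, k, boot))
```

Repeating the last three lines with different seeds gives a sequence of values of the Bayes estimator, from which its empirical distribution can be built.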

Example 2.

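We wish to estimate a value of the exponential reliability function

$$R_\theta(x) = e^{-\lambda x}, \quad x \ge 0, \ \lambda > 0, \ \theta = \lambda. \tag{15}$$

At a fixed moment $x_0$ the number

$$r = R_\theta(x_0) = e^{-\lambda x_0}$$

is a value of the reliability function. Hence

$$\lambda = -\frac{\ln r}{x_0}.$$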

We are given a vector

$$z_{(n)} = \left(z_{(1)}, z_{(2)}, \ldots, z_{(k)}, z_{(k+1)}, \ldots, z_{(n)}\right)$$

whose coordinates have the same meaning as in Example 1. Let $t$ denote the statistic defined by (13). The likelihood function of the parameter $\lambda$ for $z_{(n)}$ is

$$l(t, \lambda) = \lambda^{k} e^{-\lambda t}. \tag{16}$$

Substituting $\lambda = -\ln r / x_0$ and using $r = e^{\ln r}$, we get the likelihood function in the form

$$l(t, r) = f(\hat r \mid r) = \left(-\frac{\ln r}{x_0}\right)^{k} r^{\,t / x_0}. \tag{17}$$

The likelihood equation

$$\frac{\partial \ln l(t, r)}{\partial r} = 0$$

reduces to the form

$$\frac{k}{r \ln r} + \frac{t}{r x_0} = 0.$$

The root of this equation is the value of the maximum likelihood estimate of $r$, and it has the form

$$\hat r = \exp\!\left(-\frac{k x_0}{t}\right) = \exp\!\left(-\frac{k x_0}{\sum_{i=1}^{n} z_{(i)}}\right).$$
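The reduction of the likelihood equation can be checked step by step. The derivation below is a sketch that assumes $0 < r < 1$, so that $-\ln r > 0$, and uses the likelihood in the form $\left(-\ln r / x_0\right)^{k} r^{t/x_0}$ with $t$ given by (13).

```latex
\begin{aligned}
\ln l(t,r) &= k \ln\!\left(-\frac{\ln r}{x_0}\right) + \frac{t}{x_0} \ln r, \\
\frac{\partial \ln l(t,r)}{\partial r} &= \frac{k}{r \ln r} + \frac{t}{r x_0} = 0
  \quad\Longrightarrow\quad \ln r = -\frac{k x_0}{t}
  \quad\Longrightarrow\quad \hat r = \exp\!\left(-\frac{k x_0}{t}\right).
\end{aligned}
```

The result agrees with substituting the estimate $\hat\lambda = k/t$ from (14) into $r = e^{-\lambda x_0}$.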

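Using the bootstrap samples

$$z^{*(b)}_{(n)} = \left(z^{*(b)}_{(1)}, z^{*(b)}_{(2)}, \ldots, z^{*(b)}_{(n)}\right), \quad b = 1, 2, \ldots, B,$$

we obtain the values of the maximum likelihood estimator of $r$, defined by

$$\hat r^{*(b)} = \exp\!\left(-\frac{k^{*(b)} x_0}{\sum_{i=1}^{n} z^{*(b)}_{(i)}}\right), \quad b = 1, 2, \ldots, B.$$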

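Since

$$f\!\left(\hat r \mid \hat r^{*(l)}\right) = l\!\left(t, \hat r^{*(l)}\right) = \left(-\frac{\ln \hat r^{*(l)}}{x_0}\right)^{k} \left(\hat r^{*(l)}\right)^{t / x_0},$$

the value of the empirical Bayes estimate of $r$, computed on the basis of

$$\hat r_B = E(r \mid \hat r) = \frac{\sum_{i=1}^{w} m_{j_i}\, \hat r^{*(j_i)} f\!\left(\hat r \mid \hat r^{*(j_i)}\right)}{\sum_{i=1}^{w} m_{j_i}\, f\!\left(\hat r \mid \hat r^{*(j_i)}\right)},$$

has the following form:

$$\hat r_B = \frac{\sum_{i=1}^{w} m_{j_i}\, \hat r^{*(j_i)} \left(-\dfrac{\ln \hat r^{*(j_i)}}{x_0}\right)^{k} \left(\hat r^{*(j_i)}\right)^{t / x_0}}{\sum_{i=1}^{w} m_{j_i} \left(-\dfrac{\ln \hat r^{*(j_i)}}{x_0}\right)^{k} \left(\hat r^{*(j_i)}\right)^{t / x_0}}, \tag{18}$$

where $t = \sum_{i=1}^{n} z_{(i)} = -k x_0 / \ln \hat r$.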
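A small sketch of the computation in Example 2 is given below. It assumes that the weights take the form $\left(-\ln r^{*} / x_0\right)^{k} (r^{*})^{t/x_0}$, with $k$ and $t$ taken from the original sample, and that $k \ge 1$. The function names and the bootstrap values are hypothetical.

```python
import math


def reliability_mle(k, t, x0):
    """MLE of r = exp(-lambda * x0), namely r_hat = exp(-k * x0 / t)."""
    return math.exp(-k * x0 / t)


def bayes_reliability(k, t, x0, boot_r):
    """Weighted mean of the r*, weights (-ln r* / x0)**k * (r*)**(t / x0)."""
    usable = [r for r in boot_r if 0.0 < r < 1.0]   # r* = 1 has zero weight for k >= 1
    logs = [k * math.log(-math.log(r) / x0) + (t / x0) * math.log(r)
            for r in usable]
    shift = max(logs)                               # common factor cancels in the ratio
    weights = [math.exp(v - shift) for v in logs]
    return sum(r * w for r, w in zip(usable, weights)) / sum(weights)


k, t, x0 = 4, 17.2, 1.0                             # hypothetical sample summary
boot_r = [0.70, 0.75, 0.80, 0.85, 0.80]             # hypothetical bootstrap values
print(reliability_mle(k, t, x0), bayes_reliability(k, t, x0, boot_r))
```

Because repeated bootstrap values simply appear several times in the list, the counts $m_{j_i}$ of (18) are handled implicitly.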

7 CONCLUSIONS

In this paper we present the possibility of applying the bootstrap and resampling methods in empirical Bayes estimation. The bootstrap and resampling copies of the given data are used to construct an empirical prior distribution.

REFERENCES

1. Belyaev Yu. K.: Resampling and bootstrap methods in analysis of reliability data. Safety & Reliability, ESREL 2001, pp. 1877-1882.

2. Grabski F., Jaźwiński J.: Metody bayesowskie w niezawodności i diagnostyce. WKŁ, Warszawa 2001.

3. Efron B., Tibshirani R. J.: An Introduction to the Bootstrap. Chapman & Hall, New York, London 1993.

4. Koronacki J., Mielniczuk J.: Statystyka dla studentów kierunków technicznych i przyrodniczych. Wydawnictwa Naukowo-Techniczne, Warszawa 2001.

5. Savcuk W.P.: Bayesian methods of the statistical assessment. Nauka, Moskva 1989 (in Russian).
