BOOTSTRAP METHODS FOR THE CENSORED DATA IN EMPIRICAL BAYES ESTIMATION OF THE RELIABILITY PARAMETERS
F. Grabski, A. Załęska-Fornal
Department of Mathematics, Naval University, Gdynia, Poland e-mail: [email protected] ; [email protected]
ABSTRACT
Bootstrap and resampling methods are computer-based methods used in applied statistics. They are Monte Carlo methods based on the observed data. Bradley Efron introduced the bootstrap in 1979 and has since written extensively about it and its generalizations. Here we apply these methods to empirical Bayes estimation, using bootstrap copies of the censored data to obtain an empirical prior distribution.
1 INTRODUCTION
The bootstrap is a computer-based method used in applied statistics. It is a data-based simulation method for assessing statistical accuracy. The term bootstrap derives from the phrase 'to pull oneself up by one's bootstraps', which can be found in the eighteenth-century Adventures of Baron Munchausen by Rudolf Erich Raspe. The method was proposed by Bradley Efron in 1979 as a way to estimate the standard error of a parameter estimate. The main goal of the bootstrap is a computer-based implementation of basic statistical ideas. Recent environmental applications of the bootstrap can be found in toxicology, fisheries surveys, groundwater and air pollution modeling, hydrology, etc. Bootstrapping rests on a powerful principle: creating many repeated data samples from the single sample we have and making inferences from those samples. We apply the bootstrap in empirical Bayes estimation, using the so-called bootstrap copies of the censored data to obtain an empirical prior distribution.
2 BOOTSTRAP AND RESAMPLING COPIES OF THE CENSORED DATA
The random variable $X$ denotes the time to failure of an element. The probability distribution of the time to failure is defined by the cumulative distribution function (cdf)
$$F_\theta(x) = P(X < x), \tag{1}$$
where $\theta \in \Theta$ is the true but unknown parameter. To assess this distribution we test $n$ identical elements $e_1, e_2, \ldots, e_n$ over the times $y_1, y_2, \ldots, y_n$, respectively. Suppose that the numbers $x_1, x_2, \ldots, x_n$ are the times to failure of these elements. The data vector $x_n = (x_1, x_2, \ldots, x_n)$ is assumed to be a value of the random vector $X_n = (X_1, X_2, \ldots, X_n)$, where the random variables $X_1, X_2, \ldots, X_n$ are mutually independent and identically distributed (i.i.d.). This random vector is a sample from the distribution $F_\theta(\cdot)$. The vector $y_n = (y_1, y_2, \ldots, y_n)$ of the testing times of the elements (times of observation, censoring points) can be treated as a value of the random vector $Y_n = (Y_1, Y_2, \ldots, Y_n)$. We assume that $Y_1, Y_2, \ldots, Y_n$ are mutually independent random variables and that they are also independent of the $X$'s. The probability distributions of the random variables $Y_1, Y_2, \ldots, Y_n$ are defined by the cdfs
$$G_i(y) = P(Y_i < y), \quad i = 1, 2, \ldots, n. \tag{2}$$
These functions do not depend on the parameter $\theta \in \Theta$. In many cases they are defined as
$$G_i(y) = \begin{cases} 0 & \text{for } y \le y_i, \\ 1 & \text{for } y > y_i, \end{cases} \qquad y_i \in [0, \infty].$$
This means that the values of $Y_1, Y_2, \ldots, Y_n$ are fixed in advance (non-random).
The observations are described by the random variables
$$U_j = \min(X_j, Y_j), \quad j = 1, \ldots, n, \tag{3}$$
$$\Delta_j = \begin{cases} 1 & \text{for } X_j \le Y_j, \\ 0 & \text{for } X_j > Y_j. \end{cases} \tag{4}$$
The sufficient statistic describing the observations can be written as the vector $Z_n = ((U_1, \Delta_1), \ldots, (U_n, \Delta_n))$. The value of this random vector is $z_n = ((u_1, \delta_1), \ldots, (u_n, \delta_n))$, from which we obtain the vector $z_{(n)} = (z_{(1)}, z_{(2)}, \ldots, z_{(k)}, z_{(k+1)}, \ldots, z_{(n)})$, where $z_{(1)}, z_{(2)}, \ldots, z_{(k)}$ are the failure instants of the elements and $z_{(k+1)}, z_{(k+2)}, \ldots, z_{(n)}$ are the observation times of the elements still working.
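As an illustration of (3) and (4), the following minimal Python sketch builds the censored observations and arranges them with the failure instants first. The function names `censor` and `order_sample`, the sorting within each group, and the numerical values are assumptions made for this example only.

```python
import random

def censor(failure_times, censoring_times):
    """Return pairs (u_j, delta_j) with u_j = min(x_j, y_j) and
    delta_j = 1 if the failure was observed (x_j <= y_j), else 0."""
    return [(min(x, y), 1 if x <= y else 0)
            for x, y in zip(failure_times, censoring_times)]

def order_sample(pairs):
    """Arrange the observations as z_(n): the k failure instants first,
    then the n - k observation times of the elements still working.
    Sorting inside each group is an assumption of this sketch."""
    failures = sorted(u for u, d in pairs if d == 1)
    survivors = sorted(u for u, d in pairs if d == 0)
    return failures + survivors, len(failures)

# Hypothetical data: exponential lifetimes, every element tested for 2.0 time units.
rng = random.Random(0)
x = [rng.expovariate(0.5) for _ in range(10)]
y = [2.0] * 10
pairs = censor(x, y)
z, k = order_sample(pairs)
```

Here `z[:k]` holds the failure instants and `z[k:]` the censoring times of the elements that survived the test.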
Suppose that we are able to estimate the parameter $\theta \in \Theta$ by using an estimator $\hat\theta_n = T(Z_n)$ (or $\hat\theta_n = T(Z_{(n)})$). The numbers $\hat\theta_n = T(z_n)$ (or $\hat\theta_n = T(z_{(n)})$) are its values. We can then use the distribution $F_{\hat\theta_n}(\cdot)$ to simulate the so-called bootstrap copies
$$z^{*(b)}_{(n)} = \left(z^{*(b)}_{(1)}, z^{*(b)}_{(2)}, \ldots, z^{*(b)}_{(n)}\right), \quad b = 1, 2, \ldots, B,$$
of the data $z_{(n)} = (z_{(1)}, z_{(2)}, \ldots, z_{(n)})$. The bootstrap copies of the data are values of the random vectors
$$Z^{*(b)}_{(n)} = \left(Z^{*(b)}_{(1)}, Z^{*(b)}_{(2)}, \ldots, Z^{*(b)}_{(n)}\right), \quad b = 1, 2, \ldots, B,$$
which are called the bootstrap samples. The function $F_{\hat\theta_n}(\cdot)$ is the cumulative distribution function of the independent random variables $Z^{*(b)}_{(1)}, Z^{*(b)}_{(2)}, \ldots, Z^{*(b)}_{(n)}$.
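A minimal sketch of how such bootstrap copies might be simulated is given below. It assumes that the censoring times are known and fixed, and that lifetimes can be drawn from the fitted distribution through a user-supplied sampler; the names `bootstrap_copies` and `exponential_lifetime` are hypothetical, and the exponential model is used only as an example.

```python
import random

def exponential_lifetime(rng, rate):
    """Draw one lifetime from the exponential distribution with the given rate."""
    return rng.expovariate(rate)

def bootstrap_copies(theta_hat, censoring_times, B, sample_lifetime, seed=None):
    """Simulate B bootstrap copies of censored data from the fitted model.

    Each copy is a list of pairs (u, delta): u is the observed time and
    delta = 1 marks an observed failure, delta = 0 a censored observation.
    """
    rng = random.Random(seed)
    copies = []
    for _ in range(B):
        copy = []
        for y in censoring_times:
            x = sample_lifetime(rng, theta_hat)  # lifetime from the fitted distribution
            copy.append((min(x, y), 1 if x <= y else 0))
        copies.append(copy)
    return copies

# Hypothetical usage: fitted rate 0.5, eight elements, each observed for 2.0 units.
copies = bootstrap_copies(0.5, [2.0] * 8, B=20,
                          sample_lifetime=exponential_lifetime, seed=1)
```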
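If we have a vector of observations $z_{(n)} = (z_{(1)}, z_{(2)}, \ldots, z_{(n)})$ of size $n$, we can define the empirical cumulative distribution function $\hat F$ as
$$\hat F(z; z_{(n)}) = \frac{\#\{z_{(i)} : z_{(i)} < z\}}{n},$$
which is equivalent to the discrete distribution
$$p_k = \frac{n_k}{n}, \quad k = 1, 2, \ldots, l,$$
where $n_k = \#\{i : z_{(i)} = z_{(k)}\}$ is the number of observations equal to the $k$-th of the $l$ distinct observed values.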
This distribution can be expressed as a vector of frequencies $p = (p_1, p_2, \ldots, p_l)$. Vectors of data $z^{\circ(r)}_n = (z^{\circ(r)}_1, z^{\circ(r)}_2, \ldots, z^{\circ(r)}_n)$, $r = 1, 2, \ldots, R$, drawn from the distribution $\hat F(z; z_{(n)})$ are said to be resampling copies of the data $z_{(n)} = (z_{(1)}, z_{(2)}, \ldots, z_{(n)})$.
In other words, a resampling copy $z^{\circ(r)}_n$ of the data is generated by sampling $n$ times at random, with replacement, from the original data points $z_{(1)}, z_{(2)}, \ldots, z_{(n)}$. Random sampling means the random choice of one element from among $z_{(1)}, z_{(2)}, \ldots, z_{(n)}$ in each of the $n$ drawings. A resampling copy is composed of elements of the original sample; some of them may be taken zero times, some once, some twice, etc. Notice that in a resampling copy $z^{\circ(r)}_n$ the elements are, as a rule, repeated. The typical number $B$ of bootstrap copies, or $R$ of resampling copies, ranges from 50 to 1000.
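The resampling procedure described above can be sketched in a few lines of Python. The names `empirical_frequencies` and `resampling_copies` and the sample values are assumptions of this illustration; for censored data the elements of `z` could equally be pairs of an observed time and a censoring indicator.

```python
import random
from collections import Counter

def empirical_frequencies(z):
    """Return the discrete distribution p_k = n_k / n over the distinct values of z."""
    n = len(z)
    return {value: count / n for value, count in Counter(z).items()}

def resampling_copies(z, R, seed=None):
    """Generate R resampling copies, each obtained by n draws with replacement from z."""
    rng = random.Random(seed)
    n = len(z)
    return [[rng.choice(z) for _ in range(n)] for _ in range(R)]

# Hypothetical sample in which the value 1.0 occurs twice.
z = [1.0, 1.0, 2.0, 4.0]
p = empirical_frequencies(z)
copies = resampling_copies(z, R=50, seed=0)
```

Each copy has the same length as the original sample and contains only original data points, typically with repetitions.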
3 BOOTSTRAP ESTIMATORS
Let $Z^*_n = (Z^*_1, Z^*_2, \ldots, Z^*_n)$ be a bootstrap sample for the given data vector $z_n = (z_1, z_2, \ldots, z_n)$. The random variable $\hat\theta^*_n = T(Z^*_n)$ is said to be a bootstrap estimator of the parameter $\theta$.
The distribution of the statistic $\hat\theta^*_n - \hat\theta_n$ for the bootstrap sample, with the data values held fixed, is close to the distribution of the statistic $\hat\theta_n - \theta$.
From this rule it follows that the shapes of the distributions of the statistics $\hat\theta^*_n$ and $\hat\theta_n$ are similar. To obtain the empirical distribution of the random variable $\hat\theta^*_n$ we have to simulate bootstrap copies $z^{*(b)}_n = (z^{*(b)}_1, z^{*(b)}_2, \ldots, z^{*(b)}_n)$, $b = 1, 2, \ldots, B$, of the data $z_n = (z_1, z_2, \ldots, z_n)$. We then calculate the values of the statistic
$$\hat\theta^{*(b)}_n = T(z^{*(b)}_n), \quad b = 1, 2, \ldots, B.$$
We can use a nonparametric kernel estimator to estimate the probability density of the bootstrap estimator $\hat\theta^*_n$. The value of this estimator with the Gaussian kernel is given by
$$\hat g(\vartheta) = \frac{1}{Bh} \sum_{b=1}^{B} K\!\left(\frac{\vartheta - \hat\theta^{*(b)}_n}{h}\right),$$
where
$$K(\vartheta) = \frac{1}{\sqrt{2\pi}}\, e^{-\vartheta^2/2}, \quad \vartheta \in (-\infty, \infty),$$
$h = 1.06\, s\, B^{-1/5}$, and $s$ is the standard deviation of $\hat\theta^{*(b)}_n$, $b = 1, 2, \ldots, B$.
4 THE BOOTSTRAP ESTIMATE OF STANDARD ERROR
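Let $z^{*(b)}_n = (z^{*(b)}_1, z^{*(b)}_2, \ldots, z^{*(b)}_n)$, $b = 1, 2, \ldots, B$, be bootstrap copies of the censored data. The values
$$\hat\theta^{*(b)}_n = T(z^{*(b)}_n), \quad b = 1, 2, \ldots, B, \tag{5}$$
are the corresponding bootstrap replications of the statistic.
The bootstrap estimate of the standard error of $\hat\theta_n$ is defined by the formula
$$\widehat{se}_B = \sqrt{\frac{\sum_{b=1}^{B} \left(\hat\theta^{*(b)}_n - \bar\theta^{*}\right)^2}{B - 1}}, \tag{6}$$
where
$$\bar\theta^{*} = \frac{1}{B} \sum_{b=1}^{B} \hat\theta^{*(b)}_n.$$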
The bootstrap algorithm for estimating standard errors is as follows:
- Generate $B$ independent bootstrap samples $z^{*(b)}_n = (z^{*(b)}_1, z^{*(b)}_2, \ldots, z^{*(b)}_n)$, $b = 1, 2, \ldots, B$ (for estimating a standard error, $B$ should be in the range 30-200).
- Compute the bootstrap replication corresponding to each bootstrap sample, $\hat\theta^{*(b)}_n = T(z^{*(b)}_n)$, $b = 1, 2, \ldots, B$.
- Compute the standard error $\widehat{se}_B$ as the sample standard deviation of the $B$ replications, according to (6).
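The three steps above can be sketched as follows. This is only an illustration: the bootstrap samples are drawn here by resampling with replacement, which is one possible choice and an assumption of the sketch, and the function name `bootstrap_standard_error` and the example data are hypothetical.

```python
import math
import random

def bootstrap_standard_error(data, statistic, B=200, seed=None):
    """Estimate the standard error of statistic(data) from B bootstrap replications."""
    rng = random.Random(seed)
    n = len(data)
    replications = []
    for _ in range(B):
        # Step 1: one bootstrap sample (here: n draws with replacement).
        sample = [rng.choice(data) for _ in range(n)]
        # Step 2: the bootstrap replication of the statistic.
        replications.append(statistic(sample))
    # Step 3: sample standard deviation of the replications, as in (6).
    mean = sum(replications) / B
    variance = sum((t - mean) ** 2 for t in replications) / (B - 1)
    return math.sqrt(variance), replications

def sample_mean(values):
    return sum(values) / len(values)

# Hypothetical usage: standard error of the mean of ten observations.
data = [float(v) for v in range(1, 11)]
se, reps = bootstrap_standard_error(data, sample_mean, B=200, seed=0)
```

The returned replications can also be reused, for instance to build a histogram or a kernel density estimate of the bootstrap estimator.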
5 EMPIRICAL BAYES ESTIMATION
Recent work on empirical Bayes estimation has been stimulated by the work of Robbins (1955). It is well known that the value of the Bayes estimator $\hat\theta_B$ of the parameter $\theta$ under the squared-error loss function is the expectation with respect to the posterior distribution. If $\hat\theta$ is a value of a sufficient statistic for the parameter $\theta$, then the value of the Bayes estimator $\hat\theta_B$ of the parameter $\theta$ is
$$\hat\theta_B = E(\theta \mid \hat\theta) = \frac{\int_\Theta \theta\, f(\hat\theta \mid \theta)\, g(\theta)\, d\nu(\theta)}{\int_\Theta f(\hat\theta \mid \theta)\, g(\theta)\, d\nu(\theta)}, \tag{7}$$
where $\nu$ denotes the counting measure or the Lebesgue measure and $g(\cdot)$ is a prior density function of the parameter $\theta$ with respect to the measure $\nu$.
We suppose that the prior density of the parameter is unknown. In the classical empirical Bayes procedure the prior distribution is assessed from past data. Very often the only data we have is a small sample $z_n = (z_1, z_2, \ldots, z_n)$. In such cases, instead of past data, we can use the vectors $z^{*(b)}_n = (z^{*(b)}_1, z^{*(b)}_2, \ldots, z^{*(b)}_n)$, $b = 1, 2, \ldots, B$, which are values of the bootstrap samples corresponding to the unknown distribution $F_\theta(\cdot)$ of a random variable $X$ denoting (for example) time to failure. The bootstrap copies for censored data are generated from the distribution $F_{\hat\theta}(\cdot)$, where $\hat\theta = T(z_{(n)})$. To estimate the unknown parameter $\theta$ we have to calculate the values of the bootstrap statistics $\hat\theta^{*(b)} = T(z^{*(b)})$, $b = 1, 2, \ldots, B$. As a prior density we propose the discrete density function
$$g(\theta) = \sum_{i=1}^{w} \frac{m_{j_i}}{m}\, \delta\!\left(\theta, \hat\theta^{*(j_i)}\right), \tag{8}$$
where $\{j_1, j_2, \ldots, j_w\} \subset \{1, \ldots, B\}$ indexes the distinct bootstrap values, $m_l = \#\{k : \hat\theta^{*(k)} = \hat\theta^{*(l)}\}$ denotes the number of observations equal to $\hat\theta^{*(l)}$,
$$\delta\!\left(\theta, \hat\theta^{*(l)}\right) = \begin{cases} 1 & \text{for } \theta = \hat\theta^{*(l)}, \\ 0 & \text{for } \theta \ne \hat\theta^{*(l)}, \end{cases}$$
and $m = \sum_{i=1}^{w} m_{j_i} = B$.
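From (7), for the counting measure $\nu$ and the density function defined by (8), we obtain
$$\hat\theta_B = E(\theta \mid \hat\theta) = \frac{\sum_{i=1}^{w} m_{j_i}\, \hat\theta^{*(j_i)} f\!\left(\hat\theta \mid \hat\theta^{*(j_i)}\right)}{\sum_{i=1}^{w} m_{j_i}\, f\!\left(\hat\theta \mid \hat\theta^{*(j_i)}\right)} = \frac{\sum_{b=1}^{B} \hat\theta^{*(b)} f\!\left(\hat\theta \mid \hat\theta^{*(b)}\right)}{\sum_{b=1}^{B} f\!\left(\hat\theta \mid \hat\theta^{*(b)}\right)}. \tag{9}$$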
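Let $f_\theta(z^{*(b)}_{(n)}) = l(z^{*(b)}_{(n)}; \theta)$ be the likelihood function for the bootstrap sample
$$z^{*(b)}_{(n)} = \left(z^{*(b)}_{(1)}, z^{*(b)}_{(2)}, \ldots, z^{*(b)}_{(n)}\right)$$
with unknown parameter $\theta \in \Theta$. The function is defined by the formula
$$l(z^{*(b)}_{(n)}; \theta) = \prod_{i=1}^{k_b} f_\theta\!\left(z^{*(b)}_{(i)}\right) \prod_{i=k_b+1}^{n} \left[1 - F_\theta\!\left(z^{*(b)}_{(i)}\right)\right], \tag{10}$$
where $k_b$ is the number of failures in the $b$-th bootstrap copy.
Notice that the prior distribution is constructed on the basis of the bootstrap samples. Hence the value of the bootstrap empirical Bayes estimator has the form (9).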
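The posterior mean under a discrete prior concentrated on the bootstrap values can be sketched in Python as below. The function name `empirical_bayes_estimate` and the toy likelihoods are assumptions of this illustration. Summing over all $B$ bootstrap values with equal weight gives repeated values their multiplicity automatically, so no explicit counting is needed.

```python
def empirical_bayes_estimate(theta_hat, bootstrap_values, likelihood):
    """Posterior mean under the prior that puts mass 1/B on each bootstrap value.

    likelihood(theta_hat, theta) must return f(theta_hat | theta) up to a
    factor that does not depend on theta.
    """
    weights = [likelihood(theta_hat, theta) for theta in bootstrap_values]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all bootstrap values have zero likelihood")
    return sum(w * theta for w, theta in zip(weights, bootstrap_values)) / total

# Hypothetical check: with a constant likelihood the estimate is the plain
# average of the bootstrap values.
flat = empirical_bayes_estimate(1.0, [1.0, 2.0, 3.0], lambda th_hat, th: 1.0)

# With a likelihood concentrated on one value, the estimate equals that value.
peaked = empirical_bayes_estimate(
    1.0, [1.0, 2.0, 3.0], lambda th_hat, th: 1.0 if th == 2.0 else 0.0)
```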
6 EXAMPLES
Example 1.
Suppose that we wish to estimate the failure rate $\theta = \lambda$ of the exponential distribution given by
$$f_\theta(x) = \lambda e^{-\lambda x}, \quad x > 0, \ \lambda > 0. \tag{11}$$
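Assume that the data are given by the vector
$$z_{(n)} = (z_{(1)}, z_{(2)}, \ldots, z_{(k)}, z_{(k+1)}, \ldots, z_{(n)}),$$
where $z_{(1)}, z_{(2)}, \ldots, z_{(k)}$ are the times to failure of the tested elements and $z_{(k+1)}, z_{(k+2)}, \ldots, z_{(n)}$ are the observation times of the elements still working. In this case the likelihood function is
$$l(z_{(n)}, \lambda) = \prod_{i=1}^{k} f_\theta(z_{(i)}) \prod_{i=k+1}^{n} \left[1 - F_\theta(z_{(i)})\right] = \prod_{i=1}^{k} \lambda e^{-\lambda z_{(i)}} \prod_{i=k+1}^{n} e^{-\lambda z_{(i)}} = \lambda^{k} e^{-\lambda \sum_{i=1}^{n} z_{(i)}}. \tag{12}$$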
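The number
$$t = \sum_{i=1}^{n} z_{(i)} \tag{13}$$
is the value of a sufficient statistic for the unknown parameter $\lambda$. By substitution we obtain the likelihood function
$$l(t, \lambda) = \lambda^{k} e^{-\lambda t},$$
which depends on the data only through $t$. To find the value of the maximum likelihood estimator we have to solve the equation
$$\frac{\partial \ln l(t, \lambda)}{\partial \lambda} = 0.$$
Its solution is
$$\hat\lambda = \frac{k}{t} = \frac{k}{\sum_{i=1}^{n} z_{(i)}}. \tag{14}$$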
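In the same way, applying formula (14) to the bootstrap samples $z^{*(b)}_{(n)} = (z^{*(b)}_{(1)}, z^{*(b)}_{(2)}, \ldots, z^{*(b)}_{(n)})$, $b = 1, 2, \ldots, B$, we obtain the values of the maximum likelihood estimator of $\lambda$
$$\hat\lambda^{*(b)} = \frac{k^{*(b)}}{\sum_{i=1}^{n} z^{*(b)}_{(i)}}, \quad b = 1, 2, \ldots, B.$$
The function (9) in this case is given by the formula
$$\hat\lambda_B = E(\lambda \mid \hat\lambda) = \frac{\sum_{i=1}^{w} m_{j_i}\, \hat\lambda^{*(j_i)} f\!\left(\hat\lambda \mid \hat\lambda^{*(j_i)}\right)}{\sum_{i=1}^{w} m_{j_i}\, f\!\left(\hat\lambda \mid \hat\lambda^{*(j_i)}\right)},$$
where
$$f\!\left(\hat\lambda \mid \hat\lambda^{*(i)}\right) = \left(\hat\lambda^{*(i)}\right)^{k} e^{-k \hat\lambda^{*(i)} / \hat\lambda}.$$
Finally we obtain
$$\hat\lambda_B = \frac{\sum_{i=1}^{w} m_{j_i}\, \hat\lambda^{*(j_i)} \left(\hat\lambda^{*(j_i)}\right)^{k} e^{-k \hat\lambda^{*(j_i)} / \hat\lambda}}{\sum_{i=1}^{w} m_{j_i} \left(\hat\lambda^{*(j_i)}\right)^{k} e^{-k \hat\lambda^{*(j_i)} / \hat\lambda}} = \frac{\sum_{b=1}^{B} \left(\hat\lambda^{*(b)}\right)^{k+1} e^{-k \hat\lambda^{*(b)} / \hat\lambda}}{\sum_{b=1}^{B} \left(\hat\lambda^{*(b)}\right)^{k} e^{-k \hat\lambda^{*(b)} / \hat\lambda}},$$
where
$$\hat\lambda = \frac{k}{\sum_{i=1}^{n} z_{(i)}}, \qquad \hat\lambda^{*(b)} = \frac{k^{*(b)}}{\sum_{i=1}^{n} z^{*(b)}_{(i)}}, \quad b = 1, 2, \ldots, B.$$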
By repetition we can obtain a sequence of values of a Bayes estimator that we can use to construct its empirical distribution.
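The whole procedure of Example 1 can be put together in a short Python sketch. Several points are assumptions of this illustration rather than part of the method as stated: the censoring times are taken as known and fixed, bootstrap copies with no observed failure (which have zero weight when $k \ge 1$) are skipped, the weights are computed on the log scale for numerical stability, and the names `mle_rate` and `bootstrap_bayes_rate` as well as the data are hypothetical.

```python
import math
import random

def mle_rate(sample):
    """Maximum likelihood estimate k / t for a list of pairs (u, delta)."""
    k = sum(delta for _, delta in sample)
    t = sum(u for u, _ in sample)
    return k / t

def bootstrap_bayes_rate(sample, censoring_times, B=200, seed=None):
    """Bootstrap empirical Bayes estimate of the exponential failure rate."""
    k = sum(delta for _, delta in sample)
    if k == 0:
        raise ValueError("at least one observed failure is required")
    lam_hat = mle_rate(sample)
    rng = random.Random(seed)
    stars = []
    for _ in range(B):
        copy = []
        for y in censoring_times:
            x = rng.expovariate(lam_hat)  # lifetime drawn from the fitted model
            copy.append((min(x, y), 1 if x <= y else 0))
        stars.append(mle_rate(copy))
    # Copies without failures give rate 0 and hence zero weight; skip them.
    positive = [s for s in stars if s > 0.0]
    # Log of the weight (lambda*)^k * exp(-k * lambda* / lambda_hat).
    log_w = [k * math.log(s) - k * s / lam_hat for s in positive]
    shift = max(log_w)
    w = [math.exp(v - shift) for v in log_w]
    estimate = sum(wi * si for wi, si in zip(w, positive)) / sum(w)
    return estimate, stars

# Hypothetical data: two failures and two elements still working at time 2.0.
sample = [(0.5, 1), (1.0, 1), (2.0, 0), (2.0, 0)]
lam_B, lam_stars = bootstrap_bayes_rate(sample, [2.0] * 4, B=100, seed=3)
```

Calling the function repeatedly with different seeds yields a sequence of values of the Bayes estimator from which its empirical distribution can be built.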
Example 2.
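We wish to estimate a value of the exponential reliability function
$$R_\lambda(x) = e^{-\lambda x}, \quad x > 0, \ \lambda > 0, \ \theta = \lambda. \tag{15}$$
At a fixed moment $x_0$ the number $r = R_\lambda(x_0) = e^{-\lambda x_0}$ is a value of the reliability function. Hence
$$\lambda = -\frac{\ln r}{x_0}.$$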
Let the vector
$$z_{(n)} = (z_{(1)}, z_{(2)}, \ldots, z_{(k)}, z_{(k+1)}, \ldots, z_{(n)})$$
be given, the coordinates of which have the same meaning as in Example 1, and let $t$ be defined by (13). The likelihood function of the parameter $\lambda$ for $z_{(n)}$ is
$$l(t, \lambda) = \lambda^{k} e^{-\lambda t}. \tag{16}$$
Substituting the value of $\lambda$ and using $r = e^{\ln r}$, we get the likelihood function in the form
$$l(t, r) = f(\hat r \mid r) = \left(-\frac{\ln r}{x_0}\right)^{k} e^{\frac{\ln r}{x_0} t} = \left(-\frac{\ln r}{x_0}\right)^{k} r^{t / x_0}. \tag{17}$$
The likelihood equation
$$\frac{\partial \ln l(t, r)}{\partial r} = 0$$
reduces to the form
$$\frac{k}{r \ln r} + \frac{t}{r x_0} = 0.$$
The root of this equation is the value of the maximum likelihood estimate of $r$, and it has the form
$$\hat r = e^{-k x_0 / t} = \exp\!\left(-\frac{k x_0}{\sum_{i=1}^{n} z_{(i)}}\right).$$
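Using the bootstrap samples $z^{*(b)}_{(n)} = (z^{*(b)}_{(1)}, z^{*(b)}_{(2)}, \ldots, z^{*(b)}_{(n)})$, $b = 1, 2, \ldots, B$, we obtain the values of the maximum likelihood estimator of $r$, defined by
$$\hat r^{*(b)} = \exp\!\left(-\frac{k^{*(b)} x_0}{\sum_{i=1}^{n} z^{*(b)}_{(i)}}\right), \quad b = 1, 2, \ldots, B.$$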
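As
$$f\!\left(\hat r \mid \hat r^{*(i)}\right) = l\!\left(t, \hat r^{*(i)}\right) = \left(-\frac{\ln \hat r^{*(i)}}{x_0}\right)^{k} \left(\hat r^{*(i)}\right)^{t / x_0},$$
the value of the empirical Bayes estimate of $r$, computed on the basis of
$$\hat r_B = E(r \mid \hat r) = \frac{\sum_{i=1}^{w} m_{j_i}\, \hat r^{*(j_i)} f\!\left(\hat r \mid \hat r^{*(j_i)}\right)}{\sum_{i=1}^{w} m_{j_i}\, f\!\left(\hat r \mid \hat r^{*(j_i)}\right)},$$
has the following form
$$\hat r_B = \frac{\sum_{i=1}^{w} m_{j_i}\, \hat r^{*(j_i)} \left(-\dfrac{\ln \hat r^{*(j_i)}}{x_0}\right)^{k} \left(\hat r^{*(j_i)}\right)^{t / x_0}}{\sum_{i=1}^{w} m_{j_i} \left(-\dfrac{\ln \hat r^{*(j_i)}}{x_0}\right)^{k} \left(\hat r^{*(j_i)}\right)^{t / x_0}}. \tag{18}$$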
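A minimal Python sketch of the reliability estimate in Example 2 follows. It assumes that the bootstrap values of the reliability have already been computed, takes the weight of each value to be the exponential likelihood rewritten in terms of $r$, skips values equal to 1 (copies without failures, which have zero weight when $k \ge 1$), and works on the log scale. The function names and the numbers are hypothetical.

```python
import math

def mle_reliability(k, t, x0):
    """Maximum likelihood estimate of R(x0) = exp(-lambda * x0) with lambda = k / t."""
    return math.exp(-k * x0 / t)

def bayes_reliability(r_stars, k, t, x0):
    """Empirical Bayes estimate of R(x0) from bootstrap reliability values.

    The weight of each bootstrap value r is (-ln r / x0)^k * r^(t / x0).
    """
    values, log_w = [], []
    for r in r_stars:
        if 0.0 < r < 1.0:  # r = 1 corresponds to a zero failure rate
            lam = -math.log(r) / x0
            log_w.append(k * math.log(lam) + (t / x0) * math.log(r))
            values.append(r)
    shift = max(log_w)
    w = [math.exp(v - shift) for v in log_w]
    return sum(wi * ri for wi, ri in zip(w, values)) / sum(w)

# Hypothetical numbers: k = 2 failures, total time on test t = 5.5, x0 = 1.0.
r_hat = mle_reliability(2, 5.5, 1.0)
r_B = bayes_reliability([0.55, 0.70, 0.80, 1.0], k=2, t=5.5, x0=1.0)
```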
7 CONCLUSIONS
In this paper we have presented the possibility of applying bootstrap and resampling methods in empirical Bayes estimation. The bootstrap and resampling copies of the given data are used to construct an empirical prior distribution.
REFERENCES
1. Belyaev Yu. K.: Resampling and bootstrap methods in analysis of reliability data. Safety & Reliability, ESREL 2001, pp. 1877-1882.
2. Grabski F., Jaźwiński J.: Metody bayesowskie w niezawodności i diagnostyce. WKŁ, Warszawa 2001.
3. Efron B., Tibshirani R. J.: An Introduction to the Bootstrap. Chapman & Hall, New York, London 1993.
4. Koronacki J., Mielniczuk J.: Statystyka dla studentów kierunków technicznych i przyrodniczych. Wydawnictwa Naukowo-Techniczne, Warszawa 2001.
5. Savcuk W.P.: Bayesian methods of the statistical assessment. Nauka, Moskva 1989 (in Russian).