y/(rk ) = ln(rk )
1 1 1
(26)
----1--
2rk 12(rk)2 120(rk )
+
i4
is the digamma function. The estimator of the parameter □ is simply given by X = k/1.
Another type of reliability field data was considered in papers by Usher [24] and Lin et al. [19]. These authors considered the case of so-called masked data. This type of data is observed when lifetimes of whole systems are observed, but the exact cause of failure (i.e. a component that failed) can be isolated only to some subset of components. Unfortunately, the problem of estimation of lifetime characteristics has been solved only either in the case of two-component systems [24] or in the case when there exists additional prior information about reliability of considered components.
2.4. Estimation of the failure rate from field data
Hu and Lawless [10] considered the case when reliability data sets contain information not only on times to first failures, but also on times to consecutive failures if the observed units failed several times during a warranty period. In such cases, which are typical for the reliability analysis of repairable objects, the most frequently used model that describes the process of failures is a Poisson process characterized by a failure rate □. When the failure rate varies in time the process of failures is called the non-homogeneous Poisson process, and the reliability characteristic of interest is the time-dependent failure rate X(t). Hu and Lawless [10] considered parametric and non-parametric estimation of A(t) in two cases: when only data on failed units are reported (i.e. in case of data truncation, when the number of non-failed units is unknown), and when the population's size and the distribution of individual censoring times are known. In the first case the estimator of the failure rate X(() can be found iteratively. In the second case the complexity of computations depends on the amount of knowledge about the population size and the distribution of censoring times.
3. Statistical analysis of reliability field data with incomplete interval-type information
In the previous section we have presented different mathematical models that can be used for the analysis of reliability field data. This type of lifetime data is in general more difficult to analyse using classical statistical methods. What is typical to field data is the existence of missing, unobserved or imprecisely reported data. If we want to analyse such data using a thorough statistical approach we immediately are in troubles. First of all, additional statistical information is needed which is necessary for the description of missing or imprecisely reported data in terms of the theory of probability. For example, if lifetime data are imprecisely reported due to the unknown delay time, see [17] for the description of the problem, the probability distribution of the delay time has to be identified using independent investigation. The same problem arises when we need to know the usage rate. The probability distribution of ai in (22) has to be estimated even in the case when there exist doubts whether such unique distribution ever exists. Another group of problems arises even in those cases when the additional information is available. Mathematical models used for the estimation of reliability characteristics become very complicated, and in many cases are rather intractable for an average user. Specialized software is needed, and this software is rarely commercially available. In all these and similar cases there exists, however, additional imprecise information about possible values of the quantities of interest. This information may be expressed in a form of intervals of possible values of model parameters or values of imprecisely reported observations. It has to be noted that this type of the representation of imprecision is not equivalent to the assumption that quantities with unknown or imprecisely reported values are uniformly distributed on those intervals. The interpretation of these intervals should be rather made in the spirit of the classical theory of measurement. If such unknown or imprecisely reported quantity is represented by the interval of its possible values it may be understood as if that value could be represented by any probability distribution defined over such interval. Thus, the application of interval data yields both pessimistic and optimistic bounds for the reliability characteristics of interest. In this section we present some examples of the usage of this approach in dealing with reliability field data.
Let us begin with the simplest model of life data. Suppose that M units are tested during a fixed time period T0. Let t1,..., tm be the observed times of m reported failures. The failures of the remaining M-m units have not been reported by the time T0, and we assume that for these units T0 is their censoring time. As usual, we assume that f (t; 8) is the density function of the time to failure, and □ is a vector of its unknown parameters. The maximum likelihood estimators of □ can be found by the maximization of the log-likelihood function
m
Z(8) = ^ log f t; 8) + (M - m)S(T0; 8), (27)
i=1
where S(To; 8) = P(T > To) is the survivor function. The problem stated above is a classical statistical problem extensively investigated for numerous probability distributions of lifetimes. Consider now its more realistic version. First of all let us assume that the reported failure times Ti do not represent real failure times due to some random delay. For example, a transmission leakage in a car may be reported after a visit to a service centre, and not after observing its first signs on a garage floor [21]. Let Di be a random delay time. Hence, the real time to failure is described by an unobserved random variable. Note however, that even if observed lifetimes Ti are distributed according to a well known probability distribution, e.g. the Weibull distribution, then the distribution of Xi may be completely different, even when the distribution of delays Di is known. In real situation the distribution of Di is usually very difficult to estimate, so the derivation of a more or less precise probabilistic model for the description ofXi is usually hardly possible. The existence of delays in the reporting of failures may cause additional complication. As a matter of fact we may not be sure if all failures have been reported by the censoring time T0. We do not consider this possibility in our model, as its thorough description seems to be very complicated. Now, let us consider another serious problem with the analysis of reliability field data. In the majority of practical cases reliability engineers are rather not interested in the description of reliability in terms of calendar time, but in terms of operational or usage time. In the previous section we discussed some basic problems that arise when we want to model this situation. Even in the simplest case of a linear transformation of a calendar time to a usage time we have to know the daily usage rate Ui that is a random variable whose distribution is very difficult to estimate. In practice this can be done only for products like cars when the usage time is continuously monitored in an automatic way. In all other practical situations the usage rate may be only estimated from imprecise statements of users. Let Zi be the lifetime in terms of usage time. Then we have Zi = (Ti - Di )Ui. In face of all difficulties mentioned above the derivation of the probability distribution of Zi seems to be hardly possible. Finally, let us notice that different usage rates influence the values of censoring times of non-failed units. These censoring times are now the realizations of a random variable Z0=T0U, where U represents a random usage rate for non-failed units. It is quite obvious that this distribution can be estimated either using expert opinions or from a specially designed statistical experiment.
The discussion presented above shows quite clearly that even in the simplest case of the analysis of real field data the precise mathematical description of the problem becomes very difficult or even mathematically intractable. However, we still have to analyse the data in the form they are available to us. Therefore, there is a need to propose approximate methods that should be simple enough in order to be applied in practice. In this section of the paper we propose to represent our lack of knowledge in terms of intervals representing the values of considered characteristics or quantities.
In order to simplify further notation let us denote by Dx a compact interval [xmin,xmax]. The lack of knowledge about the precise value of the time to a real failure let us describe by assuming that the real time to failure takes place in the interval Dti, where timcx! is equal to the reported failure time ti. Similarly, we assume that the usage rate for each observed failed unit is described by the interval Dui, and the usage rate for all censored unit belongs to the interval Du. Hence, we can calculate the interval the usage time to a failure belongs to. This can be done using the rules of simple interval arithmetics; the lower bound of the
interval Dzi is equal to Z^ min = timminUi,min , and the upper bound is given by Zi, max = ti,maxUi,max . SimilarlУ, the
lower bound for the usage censoring time is given by Z0min = T0 umin, and the upper bound by Z0 max = T0umax . We should also make the assumption that the probability distribution of lifetimes belongs to a certain class of probability distributions. This assumption is a crucial one, as strictly speaking this
distribution is different from that which describes observed times to failures measured in the calendar time. However, when the intervals of interest are not very wide this assumption seems to be practically acceptable.
In the next step of our analysis we calculate a multivariate interval A8 = [8min ,8max ] that describes the estimated values of □. Lower and upper bounds of □ can be found by solving two optimisation problems.
8 mm = inf arg max L(q) (28)
ztGazt,Z0GaZ0 8
0 max = SUP arg max LÍO), (29)
ZiGazi,Z0GáZ0 8
where L(o) is the log-likelihood function given by
m
L(e) = 2 log f (zt ; 0) + (M - m)S(Z0 ; 8). (30)
i=1
The optimisation problem defined by (28) - (29) may be, in a general case, difficult, as the interval computations for non-linear functions are usually time consuming. However, in some practical cases the optimisation problem may be significantly simplified. In the case of the exponential distribution the lower
and upper bound for the hazard rate □ are given by simple formulae
m
¿mm =—--(31)
2 +(M - m)Z0m
m
¿max =—-• (32)
2 Zi.mm +(M - m)Z0, min
i=1
Unfortunately, in the case of the Weibull distribution the interval for the possible estimated values of the shape parameter □ cannot be calculated using separately lower and upper bounds for observed lifetimes and censoring times. Only the bounds for the scale parameter (or its reciprocal) can be calculated in such a way. In general, simple computations are possible only then if a lifetime distribution is of a location-scale type. In such a case, the bounds for a location parameter can be calculated using lower and upper bounds for lifetimes and censoring times separately.
Let us consider now another relatively simple example of a practical application of the interval approach in the analysis of reliability field data. In section 2.2 of this paper we presented a mathematical model of lifetimes when the probability distribution of these random variables depends also on certain covariates, which describe usage conditions. These conditions may be described by a vector of covariates z, and the dependence of lifetimes on these covariates may be described by different mathematical models. Assume now, that this dependence is described by the proportional hazard model defined by equations (7) -(9). In this model probability distribution of lifetimes depends on the values of covariates via zp = zjpj +-----+ zqf5q, where □□'s are unknown regression coefficients. Estimation of these coefficients in
the proportional hazard model was proposed by Cox [5], and is briefly presented in section 2.2.
In case of reliability field experiments each investigated unit may be used in different conditions. Theoretically, these conditions may be defined quite precisely, and described by real numbers. However, in practice it is much more convenient to describe usage conditions by categorical variables. In such a case each covariate zj, j = 1,...,p may adopt only a finite number of possible values zjl, j = 1,...,p;l = 1,...,n} . If
the set of these values can be identified for each of k failed units we can find the estimators of □ by solving equations (19). However, in certain circumstances the users may face difficulties with a precise identification of the values of covariates z. Let us suppose, for example, that exploitation conditions vary in time, and it is not obvious whether these conditions should be labelled as moderate or severe. In such situation the
i=1
necessity to choose only one value of the covariate that describes the severity of exploitation conditions may distort a final reliability analysis. Introduction of another probabilistic model for the description of this situation may be too difficult from a practical point of view. Therefore, it seems to be much more convenient to use a set-valued description of the considered covariates. In case of covariates described by real numbers we can directly use the notation introduced previously, i.e. Az j =[zj,min,zj,max] j = 1,...,p. However, we
also can use this notation in case of ordered categorical data. Let D^^be a multivariate interval that describes the estimated values of the regression coefficients □□in the presence of interval data Azi,i = 1,...,m, where m is the number of observed failures. The lower and upper bounds for □□can be found by solving the optimisation problems
Pmin = inf arg maxL($) (33)
zieazi p
Pmax = SuP argmaxL(p) , (34)
ZjEAZj P
where L(p) is the log-likelihood function given by (18).
The solution of (33) - (34) is, in general, difficult. However, in many cases the dependence of reliability upon covariates has a monotonic nature. In this case the lower and upper bounds of □ defined by (33) - (34) may be found using appropriately chosen (depending on the direction of the dependence) boundary values of Azi,i = 1,...,m .
The limited volume of this paper allows us to present only a general description of relatively simple models for the analysis of reliability field data. These models are more complicated than the simplest lifetime models, but are applicable in such cases when a proper probabilistic analysis of reliability field data is either very difficult or even impossible. In order to overcome these problems we have to deal with some information of subjective nature. This is the price we have to pay if we want to solve more realistic problems.
4. Statistical analysis of reliability field data with imprecise fuzzy information
In the previous section we considered the case when the information which is necessary for a proper evaluation of reliability in terms of the theory of probability and mathematical statistics may be incomplete and imprecise. Our lack of full information we represented in terms of intervals describing the quantities of interest. Representation of uncertainty by intervals has its origins in the theory of measurement. If no additional information is present, this methodology allows the calculation of the bounds for reliability characteristics of interest. These bounds may be interpreted as "the worse" and "the best" possible values which take into account any type of variability of imprecisely or partially known values of field lifetime data. However, one can argue that this type of representation of uncertainty may not reflect the complexity of available information. For example, let us suppose that the daily usage rate of certain equipment is reported by its user as "about five hours a day". From further inquiry one may get information that it means "between four and six hours a day". Note, that this information does not tell anything about the way the usage rate varies in time. Therefore, the representation of uncertainty in a form of an interval seems to be quite appropriate. On the other hand, it is easy to note that the original information, "about five hours a day", carries additional information. One may believe that the real usage rate is more often closer to five hours than to any other number of hours. This still vague information, which does not allow building any probability distribution, may be described formally using the theory of fuzzy sets introduced by Lotfi A. Zadeh[25].
Fuzzy sets are the generalization of ordinary sets. In order to define a fuzzy set we have to specify a so called universe of discourse X, i.e. an ordinary set that contains all elements that are relevant for the description of a vaguely defined (or described) object. In the considered in this paper reliability context it
might be a set (or a subset) of positive real numbers, when we describe the usage rate, a set of integers, when we describe a partially known number of units on test, or a set of labels, when we describe the severity of working conditions. A membership function ua :X ^[0,l] such that ua (x) tells us to which degree an element x e X belongs to the fuzzy set A, is the second part of the definition of a fuzzy set. Thus, a fuzzy set A in a universe of discourse X is a set of pairs
A = \Ma (x W
(35)
This formalism is very useful for the description of vague and imprecise concepts, as the value of the membership function ua (x )e[0,l] describes our degree of belief that the value x describes the considered concept.
Each fuzzy set has a unique representation in terms of so called D-cuts, or □-level sets. The ordinary (non-fuzzy) set Aa — {x e X : ua(x)> a},for each a e (0,l], is called the D-cut of the fuzzy set A, and the set of all D-cuts uniquely defines this fuzzy sets.
When the universe of discourse is represented by the set of real numbers we can generalize the concept of a real number and define a fuzzy number. The fuzzy subset A of the real line R, with the membership function ua : R ^ [0,l] is a fuzzy number iff
a) A is normal, i.e. there exists an element x0 such that u(x0) = 1;
b) A is fuzzy convex, i.e. ua (( +(l ~^)x2)>UA (x1 )AUA (x2), Vxl ,x2 e R, V2e[0,l];
c) UA is upper semi-continous;
d) supp A is bounded.
From the definition given above one can easily find that for any fuzzy number A there exist four real numbers al, a2, al3, a4 and two functions: non-decreasing function nA: R ^[0,l], and non-increasing function E A : R ^[0,l], such that the membership function U A is given by
Va
(x) =
0
Va (x) 1
(x) 0
if if if if if
x < a1 a1 < x < a 2 a2 < x < a3 a3 < x < a4
(36)
Functions r/A and EA are called the left side and the right side of a fuzzy number A, respectively. A special, and very useful in practice, case of a general fuzzy number is a trapezoidal fuzzy number defined by the following membership function
mx (x) =
0 if x < x1
(x - "xi y (x2 - xi ) if x1 < x < x 2
1 if x2 < x < x3
(x4 - x)/(x4 - x3 ) if x3 < x < x 4
0 if x < 4 x
(37)
Note, that real-valued intervals considered in the previous section of this paper can be looked upon as trapezoidal fuzzy numbers for which xl — x2 — xmin ,
and
x3 — x4 — xmax .
Membership functions of fuzzy numbers that are defined as functions of other fuzzy numbers may be calculated using the following extension principle introduced by Zadeh, and described in Dubois and Prade [6] as follows:
Let X be a Cartesian product of universeX — Xl x X2 x ••• xXr, and Al,...,Ar be r fuzzy sets inXl,...,Xr, respectively. Let f be a mapping from X — Xl xX2 x ••• xXrto a universe Y such thaty — f (xl,...,xr).The extension principle allows us to induce from r fuzzy sets A; a fuzzy set B on Y through f such that
a4 < x
Mb ( )= suP min[^Ai (x ),... ,цАг (xr )J
xi,.,xr;y=f (xi,..,xr)
(38)
Mb (y )= 0 if f - (y )=0
(39)
One can prove, see e.g. books by Dubois and Prade [6] or by Zimmermann [27], that the application of the extension principle is equivalent to the application of the interval arithmetics on □-cuts.
Fuzzy sets, and their special instances - fuzzy numbers, have been applied in solving different reliability problems. An extensive overview of these applications can be found in Hryniewicz [8]. If we want to apply this approach to the analysis of field lifetime tests we can directly apply the results presented in the previous section. In order to do so let us notice that the calculations presented in that section are exactly the same as the calculations that should be done for given □ -cuts representing fuzzy data.
5. Conclusion
Probabilistic models that have been proposed for the description of field lifetime data, and are relatively easy to be applied in practice, usually do not describe all the aspects of this type of data. If we want to build models, which better describe reality, then immediately these models become very complicated. Moreover, additional assumptions have to be made in order to describe complex phenomena characteristic for this problem. In this paper we have proposed an alternative but only approximate approach where unknown values of model parameters are represented in terms of intervals. By applying the interval arithmetics we can calculate the bounds on the values of respective reliability characteristics. If additional but still imprecise information is available we propose to generalize the interval-valued calculations to fuzzy-valued ones. The results of these calculations can be interpreted as possibility distributions in the sense of Zadeh [26], defined on sets of possible values of vague quantities. It has to be stressed, however, that if appropriate probabilistic information is available it should not be replaced with the fuzzy one. Fuzziness in our models does not replace randomness, but supplements it if we have to use imprecisely perceived notions or vague statistical data.
[1] Cohen, C. A. (1959). Simplified estimators for the normal distribution when samples are singly censored or truncated. Technometrics, 1, 217 - 237.
[2] Cohen, C. A. (1991). Truncated and censored samples: theory and applications. Marcel Dekker, New York.
[3] Coit, D. W. & Dey, K. A. (1999). Analysis of grouped data from field-failure reporting systems. Reliability Engineering and System Safety, 65, 95 - 101.
[4] Coit, D. W. & Jin, T. (2000). Gamma distribution parameter estimation for field reliability data with missing failure times. IEE Transactions, 32, 1161 - 1166.
[5] Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, ser.B, 34, 187 - 202.
[6] Dubois, D. & Prade, H. (1980). Fuzzy Sets and Systems. Theory and Applications. Academic Press, New York.
[7] Duchesne, T. & Lawless, J. F. (2000). Alternative time scales and failure time models. Lifetime Data Analysis, 6, 157-179.
[8] Hryniewicz, O. (2007). Fuzzy sets in the evaluation of reliability. In: Computational Intelligence in Reliability Engineering. New Metaheuristcs, Neural and Fuzzy Terchniques in Reliability, Levitin, G. (Ed.), Springer, Berlin., 363 - 386.
[9] Hu, J. X. & Lawless, J. F. (1996a). Estimation from truncated lifetime data with supplementary information on covariates and censoring times. Biometrika, 83(4), 747-761.
[10] Hu, J. X. & Lawless, J. F. (1996b). Estimation of rate and mean functions from truncated recurrent event data. Journal of the American Statistical Association, 91, 300-310.
References
[11] Hu, J. X., Lawless, J. F. & Suzuki, K. (1998). Nonparametric estimation of a lifetime distribution when censoring times are missing. Technometrics, 40, 3-13.
[12] Jung, M. & Bai, D. S. (2007). Analysis of field data under two-dimensional warranty. Reliability Engineering and System Safety, 92, 135-143.
[13] Kalbfleisch, J. D. & Lawless, J. F. (1988). Estimation of reliability in field-performance studies. Technometrics. 30, 365-388.
[14] Kalbfleisch, J. D., Lawless, J. F. & Robinson, J.A. (1991). Methods for the analysis and prediction of warranty claims. Technometrics, 33, 273-285.
[15] Kaplan, E. L. & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53, 457 - 481.
[16] Lawless, J. F. (1982). Statistical Models and Methods for Lifetime Data. John Wiley and Sons, New York.
[17] Lawless, J. F. (1998). Statistical analysis of product warranty data. International Statistical Review, 66, 41-60.
[18] Lawless, J. F., Hu, J. & Cao, J. (1995). Methods for the estimation of a lifetime distributions and rates from automotive warranty data. Lifetime Data Analysis, 1, 227-240.
[19] Lin, D. K. J., Usher, J. S. & Guess, F.M (1996). Bayes estimation of component-reliability from masked system-life data. IEEE Transactions on Reliability, 45, 233 - 237.
[20] Oh, Y. S. & Bai, D. S. (2001). Field data analyses with after-warranty failure data. Reliability Engineering and System Safety, 72, 1-8.
[21] Rai, B. & Singh, N. (2003). Hazard rate estimation from incomplete and unclean warranty data. Reliability Engineering and System Safety, 81, 79 - 82.
[22] Suzuki, K. (1985). Estimation of lifetime parameters from incomplete field data. Technometrics, 27, 263272.
[23] Suzuki, K. (1985). Nonparametric estimation of lifetime distributions from a record of failures and follow-ups. Journal of the American Statistical Association, 80, 68-72.
[24] Usher, J. S. (1996). Weibull component reliability-prediction in the presence of masked data. IEEE Transactions on Reliability, 45, 229-232.
[25] Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338 - 353.
[26] Zadeh, L.A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1, 3 - 28.
[27] Zimmermann, H. J. (1996). Fuzzy Set Theory and its Applications (Third Edition), Kluwer, Boston.