
- the friction force of the bottom part of the ship's hull against the ground, taking into account the friction coefficient.

- the depth of the ship's penetration into the ground.

- work performed for overcoming friction force of the hull's bottom part.

- work performed for overcoming the resistance of friction of the lateral parts of the hull for a specified depth of the ship's penetration into the ground.

- work performed for overcoming soil wedge.

- the decrease of the ship's kinetic energy caused by contact with the ground.

- the decrease of the ship's speed components.

5. Conclusion

The keel clearance should warrant the safe manoeuvring of a ship in the port water area. Its value depends on many elements, among which the sea water level is very important. A large keel clearance increases the ship's safety, but it reduces the admissible ship draft. This can cause:

- limited quantities of cargo loaded and unloaded, which means lower earnings for the port and stevedoring companies;

- lower ship-owners' profits, as the ship's capacity is not used to the full, or a longer turnaround time due to the necessary lighterage at the roads before the ship's entrance; port charges are also smaller, as they depend on the ship's tonnage (berthing, towage etc.);

- in many cases, large ships giving up the services of a port where they are not able to use their total cargo capacity.

It is possible to predict the maximum sailing draft for ships entering the port by a proper method of calculation. Such predictions enable increases in maximum drafts in relation to the UKC defined by port law as a fixed value. This can translate into cargo increases of up to several thousand tonnes per ship. In particular this refers to the Polish ports (Gdansk, Gdynia, Swinoujscie). UKC requirements should be determined with a much higher degree of certainty, allowing ships to be manoeuvred more safely. A ship can touch the bottom of a navigable area due to the reduction of its keel clearance. The mechanism of a ship's impact against the bottom basically differs from grounding or hitting a port structure (berth) and is not sufficiently described in the literature on the subject. Phenomena such as the ship's pressure on the bottom ground and its reaction (passive earth pressure) are essential in the assessment of the impact effects. The kind and degree of hull damage mainly depend on the energy absorbed by the hull during its impact against the sea bottom. The results of the research permit the assessment of navigational risk and thus improve the safety of ship manoeuvring in port water areas.


METHODS FOR RISK ANALYSIS IN DISASTER REDUCTION

Van Gelder, P.H.A.J.M.

TU Delft, Delft, The Netherlands

Keywords

risk analysis, disaster reduction, natural hazards

Abstract

This paper discusses a proposal for a risk management tool for applications to risk reduction of natural hazards.

1. Natural Hazards

It is always a difficult dilemma for research projects on natural hazards whether they should focus on certain aspects of the hazard (its probability of occurrence, its damage potential, the effectiveness of mitigation measures and building codes, human behaviour and injury causation during the catastrophe, etc.), or whether the project should be addressed as a complete entity which involves physical, technological, economic and social realities. In this paper the first option is chosen, although now and then parts of the second option are presented.

Many books on natural hazards too often fall to an anecdotal level of 'horror stories', lacking a serious academic treatment of the subject. This is in contrast with one of the first complete treatises on natural hazards by White et al. [16]. Since the book is over 30 years old, many of its issues are unfortunately outdated. It describes the status of natural hazards research in the USA in the 1970s and gives recommendations for future research. The main message of the book is that research in the 1970s concentrated largely on technologically oriented solutions to problems of natural hazards, instead of focusing equally on the social, economic and political factors which lead to non-adoption of technological findings, or which indicate that proposed technological steps would not work or would only tend to perpetuate the problem (according to the authors). For floods the authors propose five major lines of new research: improving control and prediction; warnings and flood proofing; land management; insurance, relief and rehabilitation; and basic data and methods. For the other natural hazards, 15 in total, similar lines are outlined. It is interesting that the authors already present methods of estimating research results within an evaluation framework, including economic efficiency, trade-offs and values.

Natural hazards under climate change have been studied by McGuire et al. [12]; their treatment is heavily based on the results of the 3rd assessment report of 2001 by the IPCC (Intergovernmental Panel on Climate Change), which upgraded its temperature rise forecasts to 5.8 degrees Celsius by the end of the century. The natural hazards in McGuire [12] are described in the light of the IPCC's forecasts. Windstorms are described in relation to anthropogenic climate change and are shown to have the potential for large changes for relatively small changes in the general climate. The natural patterns of climate variability are discussed by McGuire, amongst which ENSO, NAO, and PNA (the Pacific North American tele-connection). Studies are presented which try to observe and predict the frequency and severity of extreme windstorms on a spatial and temporal scale. River and coastal floods under global warming are also examined. Most research on river floods has concentrated on changes in observed precipitation and on prediction methods, but the authors also present non-climatic factors involving human influences on the river basin. Coastal flooding from tropical and extra-tropical storms under sea level change is investigated, as well as sea temperature changes (heat and cold waves). The 1999 Venezuela landslides, causing 50 000 fatalities, have put this undervalued natural hazard on the agenda again. The authors concentrate on the water accumulation below the surface of unstable slopes. The landslide's rheological properties (which resist the movement) are studied under environmental change.

Sea level change is discussed under the uncertainties of the response of the Greenland and Antarctic ice sheets to warming and the effect of CO2 mitigation in the coming decades. The effect of sea level rise on submarine landslides and, as a consequence, on ocean-wide tsunamis is analysed. Coastal erosion and other geomorphologic effects of sea level rise are left out here.

Also asteroid and comet impacts as initiators of environmental change are included in McGuire [12]. Time domain simulations of a 20km/s impact in a 4 km deep ocean are presented.

McGuire [12] ends with some results from a recent paper in Science (v. 289, p. 2068-74, D.R. Easterling et al.) on different forecasts of climate extremes. The authors plead for political will from industrialized countries such as the USA, Japan and Australia to reverse the increase in their gas emissions before the hazardous aspects of climatic shift make themselves felt.

Bryant [5] gives a complete overview of natural hazards, as well as their social impacts. Apart from how natural hazards occur, the author also presents (controversial) methods for predicting whether hazards will occur again (in the short and long term). The author claims that there is sound scientific evidence that cosmic / planetary links exist with the occurrence of earthquakes and floods. The 11-year sunspot cycle and the 18.6-year lunar cycle (caused by the fluctuation of the moon's orbit) are used to show a correlation with the ENSO index and with occurrences of floods and droughts in North America, Northern China, Australia and Patagonia, amongst others. Very surprisingly, Bryant [5] shows that in some parts of the world (such as the Mediterranean) the sunspot frequency and the seismic activity are correlated, via fluctuations in the Earth's rotation (of the order of milliseconds). However, if earthquake occurrence is dominated by some force external to the Earth (as mentioned by the author), then one would expect clustering to take place at the same time worldwide, which is not supported by the data.

Cannon et al [7] claim that natural disasters are not only caused by the natural environment, but also (or maybe even more) by the social, political and economic environment. This is shown throughout their work when they concentrate on the various hazard types: floods, coastal storms, earthquakes, landslides, volcanoes, biological hazards and famine. The authors consistently use a flow diagram describing the framework of the root causes, dynamic pressures, unsafe conditions (on the one side), the hazard (on the other side), and the disaster (in the middle).

Cannon et al. [7] describe 12 principles towards a safer environment. Such an environment cannot be achieved by technical measures alone. It should address the root causes by challenging any ideology, political or economic system that causes or increases vulnerability. It should reduce the pressures exerted by macro forces such as urbanization and re-forestation, among others. It should achieve safe conditions through a protected environment, a resilient local economy and public actions, such as disaster preparedness. Together with technical measures to reduce certain hazards (such as flood defences, shelter breaks, etc.), this should all lead to a substantial reduction in disaster risk.

The authors illustrate natural hazards from a social studies point of view, with striking observations, such as the bureaucratic blindness and biased relief assistance in South Carolina following Hurricane Hugo in 1989 with respect to the needs of many African Americans who lacked insurance and other support systems. The huge North Vietnam floods in 1971 resulted in only a few hundred deaths, largely because of a highly efficient wartime village-level organization that allowed rapid evacuation and provision of first aid, whereas the similar 1970 Bangladesh floods killed a record 300,000 people.

2. Ten steps for a structured approach of risk analysis and risk reduction of natural hazards

In recent years probabilistic and statistical approaches and procedures have found wider and wider application in all fields of engineering science, from nuclear power and aeronautic applications down to structural mechanics and engineering and offshore and coastal engineering, and in more or less sophisticated forms they are the basis of many of the most recent versions of Structural Codes of Practice throughout the world. Detailed commentaries on these codes have been written as CIRIA (1977) or ISO (1973) reports. Applications to civil engineering are described in the comprehensive text of Benjamin & Cornell [3]. More recent similar comprehensive texts are Augusti et al. [1] and Thoft-Christensen & Baker [15]. A general application to structures in a coastal environment is provided by Burcharth [6].

Risk analysis is usually structured in:

1. analysis of hazard (risk source, natural processes causing damages),

2. analysis of failure (risk pathway, mechanisms through which the hazard causes damages),

3. analysis of vulnerability (behaviour of the risk receptors).

For the first analysis, extreme events and joint probabilities of the natural processes making up the hazards should be statistically described. In the second analysis, the components of the defence systems should be identified and characterized, and the processes leading to failure deterministically described. The third analysis covers the understanding and assessment of direct and indirect damages and intangible losses, including risk perception and acceptance by the population and the social and ecological reaction (resilience). The second step is process specific and will be described below, separately for each considered hazard. This step is structured, however, into identification and prediction of failure modes, reliability analysis of the defence structure or system (combination of hazard statistics and structure behaviour), and modelling of post-failure scenarios aiming to identify damages.

Damages caused by natural disasters can be distinguished as economic and non-economic, depending on whether or not a monetary value can be assigned to a specific damage. In addition, these damages are distinguished as direct and indirect, depending on whether the damage is the result of direct contact with the natural hazard or whether it results from disruption of economic activity consequent upon the hazard [13]. The economic approaches to the valuation of disasters generally pursue an objective of public policy: given a set of courses of action to take to alleviate damages from hazardous events, which one has the highest economic value? To answer that question, the literature has followed two approaches.

The first approach is that in which the value of a given public policy comes from the avoided damage. There is a series of damages associated with hazardous events, some of those that come to mind are loss of property, injury and loss of human life, or natural habitat disruption. Farber [9] and Yohe et al. [17] illustrate complex cases of valuation of property loss and disruption of economic activity caused by potential storm and flooding events. A qualitative list of potential losses can be found in Penning-Rowsell and Fordham [14]. A benefit transfer exercise consists in a statistical estimation of a function based on existing evidence in order to transfer value ("benefit") from the various study sites to the policy site, see Brouwer [4] and Bateman et al. [2]. On the basis of the evidence gathered to estimate the transfer function, it is possible to assess the risk of error in transferring values. End-users may then decide what risk they are willing to run for a particular application. The trade-off is between administering an expensive valuation survey (with low risk of error) and an inexpensive transfer of values with a potentially high risk of error depending on the particular site analysed.

The second approach is more direct in the sense that the researcher directly asks the relevant public to value the public policy itself, including its effects on flooding risk and potential physical damage. This approach has been illustrated in Penning-Rowsell and Fordham [14] and relies on "stated preferences" methods such as contingent valuation or choice experiments; see Carson [8] and Haab and McConnell [10] for recent reviews on the former and Louviere et al. [11] on the latter. Contingent valuation surveys consist of the following steps: survey design, whose aim is to draw up a questionnaire suitable for the specific situation considered; sample design, to provide guidelines for obtaining a random sample; a pre-test of 30-50 interviews to check the wording of the questionnaire; and a main field survey of at least 600 interviews. As regards sites under risk of flooding, in general it is possible to carry out site-specific surveys to obtain data about property damages and to estimate damages from flooding, and post-flood household surveys to identify the immediate needs of the flood victims and to assess the intangible or non-economic flood effects [13].

Historically human civilizations have striven to protect themselves against natural and man-made hazards. The degree of protection is a matter of political choice. Today this choice should be expressed in terms of risk and acceptable probability of failure to form the basis of the probabilistic design of the protection. It is additionally argued that the choice for a certain technology and the connected risk is made in a cost-benefit framework. The benefits and the costs including risk are weighed in the decision process. Engineering is a multi-disciplinary subject, which also involves interaction with many stakeholders (individuals or organizations who have an interest in a project). This paper addresses the specific issue of how numerical occurrence probability levels of natural hazards are both formulated and achieved within the context of engineering design and how these relate to risk consequence.

A proposal for a common framework for risk assessment of any type of natural hazard is given by adapting the general theoretical approaches to the specific aspects of natural hazards, such as mass movements, and extreme waves. The specific features of each case will be presented in this paper and it will be shown that the common procedure proposed is able to deal appropriately with the specifics of each of the natural hazards considered.

Statistical methods are abundantly available to quantify the probability distributions of the occurrences of different hazards, with special topics such as treating very rare events, dealing with spatial and temporal variability of data, as well as with joint occurrences of different types of data. The two cases will demonstrate the applicability of the general methods to the specific aspects of the data from mass movements and extreme waves. The 1st step in a structured risk analysis of natural hazards is:

Step 1. Statistical analysis of observations

Data is collected from mass movements, flooding, extreme waves and earthquakes and analysed with statistical methods. Proper tools are used in order to harmonize data that come from different sources (for instance instrumental or historical observations of natural hazards).

Step 2. Integration of mathematical-physical models in probabilistic models

The possible progress of a natural hazard from phase I to phase I+1 is described with transition probabilities in Markov models. Mathematical-physical models are used to generate data to be combined with observations and measurements for statistical analysis.
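As an illustration of this step, a minimal sketch of such a phase-transition model might look as follows; the three hazard phases and all transition probabilities are invented for illustration only.

```python
import numpy as np

# Hypothetical hazard phases and an assumed transition matrix P[i, j] = P(phase j at step k+1 | phase i at step k).
# In practice these probabilities would be estimated from observations and from data generated
# by mathematical-physical models.
phases = ["dormant", "active", "critical"]
P = np.array([
    [0.90, 0.09, 0.01],
    [0.30, 0.60, 0.10],
    [0.05, 0.45, 0.50],
])

def propagate(p0, P, steps):
    """Propagate an initial phase distribution p0 through `steps` Markov transitions."""
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        p = p @ P
    return p

# Probability of each phase after 5 time steps, starting from the dormant phase.
print(dict(zip(phases, propagate([1.0, 0.0, 0.0], P, steps=5))))
```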

Step 3. Estimation of dependencies between natural hazards

Collected data from mass movements, flooding, extreme waves and earthquakes in some instances are analysed with respect to linear correlations and non-linear dependencies. Mathematical-physical-based reasons can be investigated to explain the existence of correlations and dependencies between the occurrences of hazards at the same time.
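A minimal sketch of such a dependence check, using invented paired observations, could be:

```python
import numpy as np
from scipy.stats import pearsonr, kendalltau

# Illustrative paired annual maxima (made-up numbers): surge level [m] and significant wave height [m].
surge = np.array([1.2, 0.8, 2.1, 1.5, 0.9, 2.6, 1.1, 1.8, 2.3, 1.0])
waves = np.array([3.1, 2.4, 4.8, 3.6, 2.2, 5.5, 2.9, 4.1, 5.0, 2.7])

# Linear correlation (Pearson) versus a rank-based measure (Kendall's tau),
# which also captures monotone non-linear dependence.
r, _ = pearsonr(surge, waves)
tau, _ = kendalltau(surge, waves)
print(f"Pearson r = {r:.2f}, Kendall tau = {tau:.2f}")
```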

Step 4. Use of multivariate statistical models

Joint probability distribution functions (JPDFs) describe the probability that a number of extreme events happen simultaneously. Dependencies between events cause difficulties in deriving these JPDFs.
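For instance, a simple empirical sketch of a joint exceedance probability, again with invented paired data, might be:

```python
import numpy as np

# Empirical joint exceedance probability P(X > x AND Y > y) from paired observations,
# here for hypothetical surge levels and wave heights.
surge = np.array([1.2, 0.8, 2.1, 1.5, 0.9, 2.6, 1.1, 1.8, 2.3, 1.0])
waves = np.array([3.1, 2.4, 4.8, 3.6, 2.2, 5.5, 2.9, 4.1, 5.0, 2.7])

def joint_exceedance(x, y, x_thr, y_thr):
    """Fraction of observations for which both variables exceed their thresholds."""
    both = (x > x_thr) & (y > y_thr)
    return both.mean()

# If X and Y were independent, the joint exceedance would be the product of the marginals;
# a higher empirical value indicates positive dependence between the extremes.
p_joint = joint_exceedance(surge, waves, 1.5, 4.0)
p_indep = (surge > 1.5).mean() * (waves > 4.0).mean()
print(f"empirical joint exceedance = {p_joint:.2f}, independence would give {p_indep:.2f}")
```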

Elements characterizing the degree of the past and future hazards can be combined with indicators for the vulnerability of the inhabited areas or of infrastructure installations. In databases, the damage is expressed in terms of fatalities and damage costs for private buildings, infrastructure installations and agricultural land. In the next steps it is necessary to relate the expected physical damage to the expected economic losses and expected losses of life.

Step 5. Economic models to derive (in)direct consequences of hazards: FD-curves

Risk is considered as the product of probability and consequences. All natural hazards are analysed with respect to their economic impacts on society. This leads to so-called FD-curves (the cumulative distribution function of the amount of damage D). Economic expertise is an important part in this step.

Step 6. Models to estimate loss of human lives: FN curves.

Apart from economic damage, natural hazards can also lead to human casualties. Estimates are derived and covariates are found of the possible number of casualties caused by natural hazards.
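As an illustration of Steps 5 and 6, a minimal sketch of an empirical exceedance (FD-type) curve computed from simulated damages might look as follows; the damage model and all numbers are assumptions.

```python
import numpy as np

# Sketch of an FD-curve: the exceedance probability of the economic damage D per event.
# The damages below are hypothetical, drawn from an assumed heavy-tailed (lognormal) damage model.
rng = np.random.default_rng(1)
damages = rng.lognormal(mean=2.0, sigma=1.0, size=10_000)   # damage per event, in million EUR

def exceedance_curve(values):
    """Return sorted damage levels and the empirical probability of exceeding each level."""
    x = np.sort(values)
    p_exceed = 1.0 - np.arange(1, len(x) + 1) / len(x)
    return x, p_exceed

d_levels, p_exceed = exceedance_curve(damages)
# e.g. the empirical probability that a single event causes more than 100 million EUR of damage
print(f"P(D > 100) = {(damages > 100.0).mean():.4f}")
# An FN-curve is built in exactly the same way, with the number of fatalities N per event in place of D.
```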

Step 7. Cost-Benefit transfer

The aim of step 7 is to examine whether or not it is possible to transfer values from natural disaster mitigation and, in case it is, to extract a transfer function. First, the different methodologies used to value hazardous events are compared, together with whether and how they can be aggregated. Then the construction of the actual value database can be carried out. Finally, if sufficient data quality criteria are met, a statistical analysis is performed in order to extract a benefit transfer function for one or several categories of values of hazardous events.

The methods presently accepted to set the acceptable risk levels related to industrial risks can be considered and their applicability to set acceptable risk levels of natural hazards can be studied. An approach is proposed to determine risk acceptance levels for different types of natural hazards, discussing in particular the specific aspects of mass movements, flooding, extreme waves and earthquakes.

Step 8. Acceptable risk framework development

Decisions to provide protection against natural hazards are the outcome of risk analyses and probabilistic computations, which serve as an objective basis. Concepts and methods to achieve this are available from the literature; they cover both multi-attribute design and the setting of acceptable risk levels. The research reinforces the concept that efficient design not only requires good technical analysis, but also needs to consider the social aspects of design and to incorporate the concerns and aspirations of stakeholders.

Each stakeholder has a different perspective on the objectives of a particular project and it is the designer's challenge to manage these multiple concerns and aspirations efficiently. If the efficiency of decision-making can be improved then it is quite possible that a 5% saving or larger can be achieved.

The main approaches to assess the costs and benefits of different risk reduction measures can be analysed, dealing in particular with approaches that handle multiple risks and take their interaction into consideration. An approach is proposed to determine actions leading to as low as reasonably practicable (ALARP) levels of risk for different types of natural hazards, discussing in particular the specific aspects of mass movements, flooding, extreme waves and earthquakes. For cost-benefit analysis it is necessary to have models of the costs and of the benefits. Rough estimates of these numbers for the two cases will be shown in Sec. 3 and 4.
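As a rough illustration of such a cost-benefit weighing, a minimal sketch under purely hypothetical cost, damage and probability assumptions could be:

```python
import numpy as np

# Illustrative cost-benefit sketch: choose a protection level h that minimizes
# total cost = investment cost I(h) + present value of the expected annual damage.
# All numbers (costs, exceedance model, discount rate) are assumptions for illustration only.

def investment_cost(h, fixed=10.0, per_unit=40.0):
    """Cost of raising the protection level to h (e.g. dike height in m), in million EUR."""
    return fixed + per_unit * h

def annual_failure_prob(h, a=0.05, b=2.0):
    """Assumed exponential decay of the annual exceedance probability with protection level h."""
    return a * np.exp(-b * h)

def total_cost(h, damage=500.0, rate=0.03):
    """Investment plus capitalized expected annual damage (damage in million EUR, discount rate `rate`)."""
    return investment_cost(h) + annual_failure_prob(h) * damage / rate

levels = np.linspace(0.0, 3.0, 301)
costs = np.array([total_cost(h) for h in levels])
best = levels[np.argmin(costs)]
print(f"economically optimal protection level ≈ {best:.2f} m, "
      f"annual failure probability ≈ {annual_failure_prob(best):.4f}")
```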

Step 9. Cost analysis of mitigation measures

In order to reduce the risks of natural hazards, mitigation strategies are applied. To answer the question whether more mitigation is necessary (or, in general, the question "how safe is safe enough"), insight is developed into the costs of mitigation measures for natural hazards.

Step 10. Effectiveness analysis of mitigation measures

Apart from insight into the costs of mitigation measures, it is also necessary to quantify the effectiveness of these measures, in other words, how much they can reduce the consequences of natural hazards or reduce the probability of occurrence of these negative impacts.

3. Conclusion

The above 10 steps are proposed as an overall integrated and structured way to analyse risks from natural hazards and are identified as 'best practice'.

References

[1] Augusti, G., Baratta, A. & Casciati, F. (1984). Probabilistic methods in Structural Engineering. Chapman and Hall, London.

[2] Bateman, I.J., Jones, A.P., Nishikawa, N., & Brouwer, R. (2000). Benefits transfer in theory and practice: A review and some new studies. CSERGE and School of Environmental Sciences, University of East Anglia.

[3] Benjamin, J.R. & Cornell, C.A. (1970). Probability, Statistics and Decision for Civil Engineers. McGraw-Hill, New York.

[4] Brouwer, R. (2000). Environmental value transfer: state of the art and future prospects. Ecological Economics, 32:1, 137 - 52.

[5] Bryant, E. (1991). Natural Hazards. Cambridge University Press, 312 pages. ISBN 0521378893.

[6] Burcharth, H.F. (1997). Reliability-based design of coastal structures. In Advances in Coastal and Ocean Engineering, Vol. 3, Philip L.-F. Liu (Ed.), World Scientific, 145-214.

[7] Cannon, T., Davis, I., Wisner, B. & Blaikie, P. (1994). At Risk: Natural Hazards, People's Vulnerability, and Disasters. Routledge, 284 pages. ISBN 0415084768.

[8] Carson, R. (2000). Contingent Valuation: A User's Guide. Environmental Science & Technology, 34(8): 1413-18.

[9] Farber, S. (2001). The Economic Value of Coastal Barrier Islands: A Case Study of the Louisiana Barrier Island System. University of Pittsburgh: 26: Pittsburgh.

[10] Haab, T. & McConnell, K.E. (2002). Valuing Environmental and Natural Resources: The Econometrics of Non-Market Valuation. Cheltenham, UK: Edward Elgar.

[11] Louviere, J. J., Hensher, D. A., & Swait, J.D. (2000). Stated Choice Methods. Cambridge University Press, Cambridge.

[12] McGuire, B., Mason, I. & Kilburn, C. (2002). Natural Hazards and Environmental Change. Hodder Arnold, 202 pages. ISBN 0340742194.

[13] Penning-Rowsell et al. (1992). The Economics of Coastal Management. Belhaven Press, London.

[14] Penning-Rowsell, E. C. & Fordham, M. (1994). Floods across Europe. Middlesex University Press, London.

[15] Thoft-Christensen, P. & Baker, M.J. (1982). Structural Reliability Theory and Its Applications. Springer-Verlag, Berlin.

[16] White, G.F. & Haas, J.E. (1975). Assessment of Research on Natural Hazards. 487 pages.

[17] Yohe, G., Neumann, J.E. & Marshall, P. (1999). The economic damage induced by sea level rise in the United States. In The Impact of Climate Change on the United States Economy, Robert Mendelsohn and James E. Neumann (Eds.). Cambridge, New York and Melbourne: Cambridge University Press, 331.

STATISTICAL ANALYSIS OF INTERVAL AND IMPRECISE DATA - APPLICATIONS IN THE ANALYSIS OF RELIABILITY FIELD DATA


Hryniewicz Olgierd

Systems Research Institute, Warszawa, Poland

Keywords

reliability, statistical analysis, field data, interval data, fuzzy data

Abstract

The analysis of field lifetime data is much more complicated than the analysis of the results of reliability laboratory tests. In the paper we present an overview of the most important problems of the statistical analysis of field lifetime data, and present their solutions known from literature. When the input information is partial or imprecise, we propose to use interval arithmetics for the calculation of bounds on reliability characteristics of interest. When this information can be described in a more informative fuzzy form, we can generalize our interval-valued results to their fuzzy equivalents.

1. Introduction

Statistical analysis of the results of lifetime tests has a history of over 50 years. In contrast to the methods usually applied for the analysis of ordinary statistical data, in the case of lifetime data we have to take into account such specific features as censoring of observations or the existence of covariates. The first applications of this methodology were designed for the analysis of reliability data. However, since the 1970s its main field of application has been survival analysis, applied not only to technical objects but to human beings as well.

In classical textbooks on reliability it is always assumed that n independent objects (systems or components) are put on test, and in the ideal case of no censoring we observe the realizations of n independent and identically distributed (iid) random variables T_1, ..., T_n. When the lifetime test is terminated after the r-th observed failure, i.e. when we observe a predetermined number of failures r at times t_(1) < ... < t_(r), and the remaining n − r objects survive the random censoring time t_(r), we have the case of type-II censoring. On the other hand, when the lifetime test is terminated at a predetermined time t_B, and the number of observed failures is a random variable, we have the case of type-I censoring. In more general models, we may also assume the cases of individual random censoring (when we observe random variables X_i = min(T_i, C_i), i = 1, ..., n, where C_i, i = 1, ..., n, are random censoring times independent of T_i, i = 1, ..., n), multiple censoring (when for each subgroup of tested objects there exists a predetermined censoring time), or progressive censoring (when a predetermined number of objects is withdrawn from the test after each observed failure). A detailed description of these censoring schemes may be found in classical textbooks on the analysis of lifetime data, such as the book of Lawless [16].

Unfortunately, the practical applicability of the well-known methods is often limited to the case of precisely designed laboratory tests, when all important assumptions made by statisticians are at least approximately fulfilled. These tests provide precise information about lifetimes and censoring times, but due to the restricted (usually low) number of observed failures and/or the restricted test times, the accuracy of reliability estimation is rather low. Moreover, some types of possible failures may not be observed in such tests (e.g. due to their limited duration), and reliability may consequently be overestimated.

It is beyond any discussion that the most informative reliability data may come only from field experiments, i.e. from the exploitation of the considered objects in real conditions. Unfortunately, we very seldom have statistical data that are obtained under field conditions and fit exactly the well-known theoretical models. For example, test conditions are usually not exactly the same for all considered objects. Therefore, the random variables that describe their lifetimes are not identically distributed. Another serious practical problem is related to the lack of precision in reported lifetime data. Individual lifetimes are often imprecisely recorded. For example, they are presented in a grouped form, when only the number of failures which occurred during a certain time interval is recorded. Sometimes times to failure are reported as calendar times, and this does not necessarily mean the same as if they were reported as times of actual operation. Finally, reliability data that come from warranty and other service programs are not appropriately balanced: there is much more information related to the relatively short warranty time, and significantly less information about the objects which survived that time.

Statisticians who work with lifetime data have tried to build models that are useful for the analysis of field data. The majority of papers devoted to this problem are related to the methodology of dealing with data from warranty programs. These programs should be considered as the main source of reliability field data. Therefore, the presentation of the statistical problems of the analysis of warranty data will be an important part of this paper. Some important mathematical models related to the analysis of warranty data are presented in the second section of this paper. In all these models it is assumed that all observations are described more or less precisely, and all necessary probability distributions are either known or evaluated using precisely defined statistical data. In many cases this approach is fully justified. However, close examination of real practical problems shows that in many cases the available statistical data are reported imprecisely. We claim that making these data precise by force may introduce unnecessary errors. Therefore, we believe that in the case of really imprecise data this fact should be taken into account in an appropriate way. In the third section of the paper we present the solution of some chosen practical problems when the input information is given in an interval form. These results are generalized in the fourth section to the case of fuzzy input data. Some conclusions and proposals for future investigations are presented in the last section of the paper.

2. Mathematical models of reliability field data coming from warranty programs

Lifetime data collected during precisely controlled laboratory tests provide important, albeit limited, information about the reliability of the tested equipment. This limitation has different reasons. First, the number of tested units and/or the duration of a lifetime test are usually very limited due to economic constraints. Second, controlled laboratory conditions do not reflect real usage conditions. For this reason, for example, some important types of failures cannot be revealed during the test. Finally, only field data can provide useful information about dependencies between the reliability characteristics of tested units and specific conditions of exploitation. In contrast to laboratory lifetime data, reliability field data may yield much more interesting information to a manufacturer. Unfortunately, the information that is characteristic of laboratory data is seldom available in the case of field data. First of all, warranty programs, which serve as the main source of reliability field data, are not designed to collect precise data. For example, reliability data are collected only from those items that have failed during the warranty period. Moreover, this period may not be uniquely defined. It is a common practice to define the warranty period both in calendar time (for example, one year) and in operational time (for example, in terms of mileage). Therefore, in many practical cases reliability data are intrinsically imprecise. Also exploitation conditions, important for the correct assessment of reliability, are not precisely reported. All these problems, and many others, make the statistical analysis of reliability field data a difficult problem. Therefore, despite its practical importance, the number of statistical papers devoted to the analysis of reliability field data is surprisingly low.

Statistical analysis of reliability field data coming from warranty programs can be roughly divided into two related, but distinct, parts: the analysis of claims processes and the analysis of lifetime probability distributions. From the point of view of a manufacturer the most important information is contained in the description of the process of warranty claims. A comprehensive description of this type of analysis can be found in the papers by Lawless [17] and Kalbfleisch et al. [14]. In the analysis of claims processes statistical data are discrete, and are described by stochastic count processes like the Poisson process or its generalizations. By analysing count reliability data we can estimate such important characteristics as, e.g., the expected number of warranty claims during a specific period of time, the expected costs of such claims, etc. This type of information is extremely important for the design of warranty programs, the planning of the supply of spare parts, and the evaluation of the efficiency of service activities, but it does not yield precise enough information about the intrinsic reliability characteristics of the investigated units. Information of the latter type is much more useful for improving quality at the design stage of a product, especially for the comparison of different solutions, etc.

In this paper we limit the scope of our investigations to the statistical analysis of probability distributions of lifetimes. Throughout the paper we will denote by T the continuous random lifetime whose probability density function is denoted by f(t | x; θ), where x is a vector of parameters (covariates) that describe exploitation conditions, and θ is a vector of parameters that describe the lifetime distribution.

2.1. Estimation from truncated lifetime data

In the case of the analysis of warranty data we often face situations in which both failure times t_i, i = 1, ..., and the corresponding vectors of covariates x_i, i = 1, ..., are observed only for failed units. Let T_c be a certain prespecified censoring time such that failures are observed only when T_i < T_c, where T_i, i = 1, ..., denote the random variables describing the lifetimes of failed units. If only the lifetimes of failed units are available, and the form of the lifetime probability distribution is known, the statistical inference about the parameters θ can be based on the truncated conditional likelihood function

$$ L(\theta) = \prod_{i} \frac{f(t_i \mid x_i; \theta)}{F(T_c \mid x_i; \theta)}, \qquad (1) $$

where F(t | x; θ) = P{T < t | x; θ}. This likelihood function arises from the conditional (truncated at time T_c) probability distribution of the random lifetime T. It is interesting to note that the likelihood function (1) does not depend upon the number N of units in the considered population of tested items. Therefore, (1) is suitable for the estimation of θ when this number is unknown. However, in the case of a low proportion of failed items this likelihood function can be quite uninformative, as was noticed by Kalbfleisch and Lawless [13]. They showed using computer simulations that the variance of the estimators of the unknown parameters is substantially larger than in the case when some information about non-failed units is available.

Estimation of θ using the likelihood function (1) is rather complicated, even in simple cases. A comprehensive presentation of this problem can be found in the book by Cohen [2]. A relatively simple solution was proposed by Cohen [1] for the lognormal probability distribution of lifetimes, i.e. when the logarithms of the observed lifetimes are distributed according to the normal distribution. In this case the maximum likelihood estimators based on (1) and the moment estimators based on the first two moments coincide, but the computation requires either special tables or the usage of numerical procedures. Cohen [1] considered a single left truncation at x_0. In this case the kth sample moment of the n observed (non-truncated) values x_1, ..., x_n is calculated from

"k =± . (2)

i=1 "

For the estimation of the unknown parameters μ and σ, Cohen [1] proposed to use the first three moments defined by (2). The obtained estimators can be calculated from the following simple formulae:

$$ \hat{\sigma}^2 = \frac{\nu_2^2 - \nu_1 \nu_3}{\nu_2 - 2\nu_1^2} \qquad (3) $$

and

$$ \hat{\mu} = x_0 + \frac{\nu_3 - 2\nu_1 \nu_2}{\nu_2 - 2\nu_1^2}. \qquad (4) $$

These formulae are derived for a left-truncated sample. However, they can also be applied in the case of right-truncated samples, in which case the odd moments are negative. The solution given by (3) and (4) is theoretically less efficient than the maximum likelihood estimators obtained from (1). However, Rai and Singh [21] have shown using extensive Monte Carlo simulations that there is no significant difference between the two methods. Nevertheless, the efficiency of these estimators decreases significantly, as expected, when the percentage of truncated (i.e. not observed) lifetimes is larger than 30%.
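As a numerical illustration, a minimal sketch of moment estimation of the lognormal parameters from a left-truncated sample, along the lines of (2)-(4), could look as follows; the data and the truncation point are invented.

```python
import numpy as np

def cohen_moment_estimates(lifetimes, t0):
    """Moment estimates of the lognormal parameters (mu, sigma) of the log-lifetime
    distribution from a sample left-truncated at time t0, following formulae (2)-(4)."""
    x = np.log(np.asarray(lifetimes, dtype=float))    # logs of the observed lifetimes
    x0 = np.log(t0)                                   # truncation point on the log scale
    nu1, nu2, nu3 = (np.mean((x - x0) ** k) for k in (1, 2, 3))
    denom = nu2 - 2.0 * nu1 ** 2
    sigma2 = (nu2 ** 2 - nu1 * nu3) / denom           # equation (3)
    mu = x0 + (nu3 - 2.0 * nu1 * nu2) / denom         # equation (4)
    return mu, np.sqrt(sigma2)

# Made-up lifetimes (hours); only failures after the truncation time t0 = 100 h are observed.
rng = np.random.default_rng(0)
sample = np.exp(rng.normal(5.0, 0.8, size=5000))
truncated = sample[sample > 100.0]
print(cohen_moment_estimates(truncated, 100.0))       # should be close to (5.0, 0.8)
```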

2.2. Estimation from censored lifetime data with full information about censored lifetimes

In the previous subsection we considered the case when only the data from units that failed prior to a certain time T_c are available. The situation when information about non-failed units is available is well known as "censoring". Following Hu and Lawless [9], let us present the general mathematical model of lifetime data. We consider a population P consisting of n units described by their lifetimes t_i, i = 1, ..., n, random censoring times τ_i, i = 1, ..., n, and vectors of covariates z_i, i = 1, ..., n, respectively. The triplets (t_i, τ_i, z_i) are the realizations of a random sample from a distribution with joint probability function

$$ f(t \mid \theta; \tau, z)\, dG(\tau, z), \quad t > 0,\ \tau > 0,\ z \in R^q, \qquad (5) $$

where lifetimes and censoring times are usually considered independent given fixed z, and G(τ, z) is an arbitrary cumulative distribution function. Let O be the set of m units for which the lifetimes are observed, i.e. for which t_i < τ_i, i = 1, ..., n. The remaining n − m units belong to the set C of censored lifetimes, for which only the censoring times τ_i and the covariates z_i are known. The function S(t | θ; τ, z) = 1 − F(t | θ; τ, z), where F(t | θ; τ, z) is the cumulative distribution function of the lifetime, is called in the literature the survivor function or the survival function. The likelihood function that describes the lifetime data is now given by [9]

$$ L(\theta) = \prod_{i \in O} f(t_i \mid \theta; \tau_i, z_i)\, dG(\tau_i, z_i) \times \prod_{i \in C} S(\tau_i \mid \theta; \tau_i, z_i)\, dG(\tau_i, z_i). \qquad (6) $$

The special cases of (6) are well known, and are comprehensively described in many reliability textbooks, such as the excellent book by Lawless [16]. However, they are mainly suited to the description of laboratory life tests, where all censoring times are known and the values of the covariates that describe the test conditions are under control. In this paper we recall only those results which, in our opinion, are pertinent to the analysis of field lifetime data.

One of the features that distinguish reliability field tests from laboratory tests is the variety of test conditions. In a laboratory test these conditions are usually the same for all tested units; only in the case of accelerated life tests do the test conditions differ between groups of tested units. In contrast, in reliability field tests the usage conditions may be different for every tested unit. Therefore, statistical methods that allow taking different test conditions into account are especially useful for the analysis of reliability field data.

There exist two general mathematical models that link lifetimes to test conditions and are frequently used in practice: proportional hazards models and location-scale regression models. In the proportional hazards model the hazard function, defined as h(t; θ, z) = f(t; θ, z)/S(t; θ, z), is linked to the test conditions by the following equation

$$ h(t \mid z) = h_0(t)\, g(z), \qquad (7) $$

where functions h0(.) and g(.) may have unknown parameters which have to be estimated from statistical data. Another representation of the proportional hazard model is the following:

$$ S(t \mid z) = S_0(t)^{g(z)}. \qquad (8) $$

The most frequently used model is given by the following expression

h(t\ z) = h0 (t>, (9)

where zβ = z_1β_1 + ... + z_qβ_q, and the β_j's are unknown regression coefficients. This model has been investigated by many authors. To give an illustration of its application, let us recall the results given in Lawless [16] for the case of the Weibull distribution of lifetimes.

The Weibull probability distribution is the most frequently used mathematical model of lifetime data. In the considered case of the proportional hazard model its survivor function is given by the following expression

$$ S(t \mid z) = \exp\!\left[ -\left( t\, e^{-z\beta} \right)^{\delta} \right], \qquad (10) $$

where δ > 0 is the shape parameter, responsible for the description of the type of failure process. If we use the transformation Y = log T, the logarithms of lifetimes are described by the simple linear model

$$ Y = z\beta + \sigma W, \qquad (11) $$

where σ = 1/δ, and the random variable W has a standard extreme value distribution with the probability density function exp[w − exp(w)].

Suppose that n units are tested, and independent observations (x_i, z_i), i = 1, ..., n, are available, where x_i is either the logarithm of the lifetime or the logarithm of the censoring time of the ith tested unit. Additionally suppose that exactly r failures are observed. If we apply the maximum likelihood methodology to this model, we arrive at the following set of equations [16]:

$$ -\frac{1}{\sigma}\sum_{i \in D} z_{il} + \frac{1}{\sigma}\sum_{i=1}^{n} z_{il}\, e^{x_i^{*}} = 0, \quad l = 1, \ldots, q, \qquad (12) $$

$$ -\frac{r}{\sigma} - \frac{1}{\sigma}\sum_{i \in D} x_i^{*} + \frac{1}{\sigma}\sum_{i=1}^{n} x_i^{*} e^{x_i^{*}} = 0, \qquad (13) $$

where x_i^* = (x_i − z_iβ)/σ, and D denotes the set of the r units whose failures have been observed. The solution of the q+1 equations given by (12) and (13) yields the maximum likelihood estimators of σ (and hence of the shape parameter δ) and of the regression coefficients β_1, ..., β_q. The formulae for the calculation of the asymptotic covariance matrix of these estimators can be found in [16].
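As a numerical illustration, instead of solving (12) and (13) directly one may maximize the corresponding censored log-likelihood numerically; a minimal sketch with simulated data (all parameter values invented) could be:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, x, z, failed):
    """Censored extreme-value regression: x = log(time), z = covariate matrix,
    failed[i] = True if unit i failed (otherwise x[i] is a log censoring time)."""
    q = z.shape[1]
    beta, log_sigma = params[:q], params[q]
    sigma = np.exp(log_sigma)                      # keep sigma positive
    w = (x - z @ beta) / sigma                     # standardized log-times
    # failures contribute log f = w - exp(w) - log(sigma); censored units contribute log S = -exp(w)
    ll = np.sum(failed * (w - log_sigma)) - np.sum(np.exp(w))
    return -ll

# Simulated data: Y = z*beta + sigma*W with beta = (4.0, 0.5), sigma = 0.5, type-I censoring at t = 200.
rng = np.random.default_rng(0)
n = 500
z = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
w_true = np.log(rng.exponential(size=n))           # standard extreme value variable W
y = z @ np.array([4.0, 0.5]) + 0.5 * w_true
c = np.log(200.0)
x = np.minimum(y, c)
failed = y <= c

res = minimize(neg_log_likelihood, x0=np.array([4.0, 0.0, 0.0]), args=(x, z, failed))
beta_hat, sigma_hat = res.x[:2], np.exp(res.x[2])
print(beta_hat, sigma_hat, 1.0 / sigma_hat)        # 1/sigma estimates the Weibull shape delta
```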

A second regression model commonly used for the analysis of lifetimes is the location-scale model for the logarithm of the lifetime T. In this model the random variable Y = log T has a distribution with a location parameter μ(z) and a scale parameter σ which does not depend upon the covariates z. This model can be written as follows:

$$ Y = \mu(z) + \sigma\xi, \qquad (14) $$

where σ > 0 and ξ is a random variable with a distribution that does not depend on z. An alternative representation of this model can be written as

$$ S(t \mid z) = S_0\!\left( \frac{t}{e^{\mu(z)}} \right). \qquad (15) $$

Both models, i.e. the proportional hazards model and the location-scale model, have been applied for different probability distributions of lifetimes. A detailed description of those results can be found, for example, in [16]. However, it is worth noting that only in the case of the Weibull distribution (and the exponential distribution, which is a special case of the Weibull distribution) do the two models coincide.

When the type of the lifetime probability distribution is not known and the proportional hazards model seems to be appropriate we can apply distribution-free methods for the analysis of lifetimes. Let (8) be of the form

$$ S(t \mid z) = S_0(t)^{e^{z\beta}}. \qquad (16) $$

Cox [5] proposed a method for separating the estimation of the vector of regression coefficients β from the estimation of the survivor function S_0(t). Suppose that the observed lifetimes are ordered as follows: t_(1) < ... < t_(m). Let R_i = R(t_(i)) be the set of all units at risk at time t_(i), that is, the set of all non-failed and uncensored units just prior to t_(i). Note that in this model the censoring times of the remaining n − m units may take arbitrary values. For the estimation of β, Cox [5] proposed to use a pseudo-likelihood function given by

$$ L(\beta) = \prod_{i=1}^{m} \frac{e^{z_{(i)}\beta}}{\sum_{l \in R_i} e^{z_l \beta}}. \qquad (17) $$

A slight modification of (17) has been proposed in Lawless [16]. This modification allows for a few multiple failures at the times t_(i), i = 1, ..., m. Let D_i be the set of units that fail at t_(i), let d_i = |D_i| be the number of those units, and let s_i denote the sum of the covariate vectors z_l over the units l in D_i. The likelihood function is now given by [16]

$$ L(\beta) = \prod_{i=1}^{m} \frac{e^{s_i \beta}}{\left[ \sum_{l \in R_i} e^{z_l \beta} \right]^{d_i}}. \qquad (18) $$

The maximum likelihood estimators of the regression coefficients β are found from the following equations:


$$ \sum_{i=1}^{m} \left( s_{ir} - d_i\, \frac{\sum_{l \in R_i} z_{lr}\, e^{z_l \beta}}{\sum_{l \in R_i} e^{z_l \beta}} \right) = 0, \quad r = 1, \ldots, q, \qquad (19) $$

where s_{ir} is the rth component of s_i = (s_{i1}, ..., s_{iq}). Formulae for the calculation of the asymptotic covariance matrix of these estimators are given in [16]. When the vector of regression coefficients β has been estimated, we can use a distribution-free method, such as the Kaplan-Meier estimator [15], for the estimation of S_0(t).
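As a numerical illustration of the partial likelihood (17), a minimal sketch for the case of a single covariate and no ties (all data invented) could be:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Made-up data: observed times, event indicator (1 = failure, 0 = censored) and a single covariate z.
t = np.array([2.0, 3.0, 5.0, 7.0, 8.0, 11.0, 13.0, 17.0])
event = np.array([1, 1, 0, 1, 1, 0, 1, 1])
z = np.array([0.5, -1.0, 0.2, 1.5, -0.3, 0.8, -1.2, 0.4])

def neg_log_partial_likelihood(beta):
    """Negative log of Cox's partial likelihood (17): a product over failure times of
    exp(z_(i)*beta) divided by the sum of exp(z_l*beta) over the risk set R_i."""
    ll = 0.0
    for i in np.where(event == 1)[0]:
        at_risk = t >= t[i]                       # risk set: units still under observation at t_(i)
        ll += z[i] * beta - np.log(np.sum(np.exp(z[at_risk] * beta)))
    return -ll

beta_hat = minimize_scalar(neg_log_partial_likelihood).x
print(f"estimated regression coefficient beta = {beta_hat:.3f}")
# S0(t) can then be estimated non-parametrically, e.g. with a Kaplan-Meier-type estimator.
```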

2.3. Estimation from censored lifetime data with incomplete information about censored lifetimes

In the case of real field lifetime data the full information about the non-failed units is often unavailable, even when there exists full and precise information about all failed units. Consider, for example, the data from warranty programs. Suppose that we do our analysis at a certain moment of time using the data (lifetimes and values of covariates) on all units that have failed by that moment. As we usually do not have information about the units which have not failed, we neither know their censoring times nor the values of their corresponding covariates. Moreover, we may not even know the total number of units n. However, if even partial information about these units is available, it can be used to improve the efficiency of estimation. This information may come, for example, from follow-ups of certain units during the warranty period or from monitoring of some units after their warranty has expired.

Suzuki [22], [23] was one of the first researchers who considered the case of incomplete information coming from field reliability data. In [23] he considered the case when a certain fraction p of units is additionally monitored during their warranty period. Thus, we have the lifetimes of all units that have failed during the warranty period and the censoring times of all units that did not fail during the warranty period but have been monitored. Under the assumption of random censoring times independent of the random times to failure, Suzuki [23] derived the maximum likelihood estimator of the survivor function S(t) that generalizes the estimator proposed by Kaplan and Meier [15]. In [22] Suzuki applied his methodology to find estimators of the parameters of such lifetime distributions as the exponential distribution or the Weibull distribution. Consider, for example, the exponential distribution with the survivor function S(t) = exp(−λt), λ > 0, t > 0.

Let t_1, ..., t_m be the observed lifetimes of the m failed units, and t_1^*, ..., t_k^* be the known censoring times of those k monitored units that have not failed during the warranty period. The censoring times of the remaining n_l = n − m − k units that have not failed during the warranty period are unknown. The maximum likelihood estimator of the hazard rate λ is given as [22]

$$ \hat{\lambda} = \frac{m}{\sum_{i=1}^{m} t_i + \left( 1 + \dfrac{n_l}{k} \right) \sum_{j=1}^{k} t_j^{*}}. \qquad (20) $$

In the similar case of the Weibull distribution Suzuki [22] derived modified maximum likelihood equations.
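A small numerical sketch of the estimator (20), with invented warranty data, might be:

```python
import numpy as np

def suzuki_exponential_rate(failure_times, monitored_censoring_times, n_total):
    """Estimate the exponential hazard rate via equation (20): all m failures are reported,
    censoring times are known only for the k monitored survivors, and the exposure of the
    n_l unmonitored survivors is extrapolated from the monitored ones."""
    m = len(failure_times)
    k = len(monitored_censoring_times)
    n_l = n_total - m - k
    denom = np.sum(failure_times) + (1.0 + n_l / k) * np.sum(monitored_censoring_times)
    return m / denom

# Made-up example: 1000 units sold, 38 reported failures, 50 monitored survivors.
rng = np.random.default_rng(2)
failures = rng.uniform(0.0, 1.0, size=38)              # failure times within a 1-year warranty
monitored = rng.uniform(0.5, 1.0, size=50)             # censoring times of monitored survivors
print(f"lambda_hat = {suzuki_exponential_rate(failures, monitored, n_total=1000):.4f} per year")
```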

In [22] Suzuki considered also another problem related to the analysis of warranty data. In modern warranty systems the warranty "time" is often bi-dimensional. For example, for newly sold cars warranties are defined both in terms of calendar time and mileage. Thus, failures that occurred during the calendar-time warranty period but after the moment when the maximum mileage had been exceeded are not reported. Formulae used for the calculation of respective estimators are more complicated in this case. A more general model, when the additional information about covariates is available, was considered by Kalbfleisch and Lawless [13].

The results of Suzuki [22], [23] originated the paper by Oh and Bai [20], who considered the case when certain units taken randomly from the whole population of considered objects are monitored not only during the warranty period, but also during some after-warranty period. They assumed that: (i) each failure that occurs during the warranty period (0, T_1] is reported with probability 1; (ii) each failure that occurs during the after-warranty period (T_1, T_2] is reported with probability p; and (iii) each unreported unit either fails during the after-warranty period but is not reported (with probability 1 − p) or survives time T_2. Let f(t; θ) be the probability density function of the lifetime, and S(t; θ) be the corresponding survivor function of the considered objects. We assume that the vector of parameters θ is unknown, but we know the probability p. In this case the log-likelihood function is given by [20]

$$ \log L(\theta) = \sum_{i \in D_1} \log f(t_i; \theta) + \sum_{i \in D_2} \left[ \log p + \log f(t_i; \theta) \right] + n_3 \log\!\left[ (1-p)\, S(T_1; \theta) + p\, S(T_2; \theta) \right], \qquad (21) $$

where D_1 is the set of units which failed during the warranty period (0, T_1], D_2 is the set of units which failed and were reported during the after-warranty period (T_1, T_2], and n_3 is the number of units (both failed and not failed) not reported during (0, T_2]. Maximum likelihood estimators of θ can be found, as usual, by the maximization of (21). Oh and Bai [20] also considered a more difficult problem, when the probability of revealing failures during the after-warranty period is unknown. To solve this problem they applied the EM maximum likelihood algorithm and proposed an iterative procedure for finding the estimators of θ. For both cases of known and unknown p, Oh and Bai [20] calculated the asymptotic covariance matrix of the obtained estimators. Another approach was used by Hu et al. [11], who found non-parametric estimators of the probability distribution of the time to failure f(t) when additional information about the probability distribution of censoring times is available. Hu et al. [11] assumed that times to failure and censoring times are described by mutually independent discrete random variables and found moment and maximum likelihood estimators of f(t).
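As a numerical illustration, a minimal sketch of maximizing (21) for exponentially distributed lifetimes and a known reporting probability p (all numbers invented) could be:

```python
import numpy as np
from scipy.optimize import minimize_scalar

T1, T2, p = 1.0, 2.0, 0.7          # warranty period, after-warranty period, known reporting probability

def neg_log_likelihood(lam, t_d1, t_d2, n3):
    """Negative of the log-likelihood (21) for exponential lifetimes, f(t) = lam*exp(-lam*t)."""
    if lam <= 0.0:
        return np.inf
    log_f = lambda t: np.log(lam) - lam * t        # log density
    S = lambda t: np.exp(-lam * t)                 # survivor function
    ll = (np.sum(log_f(t_d1))
          + np.sum(np.log(p) + log_f(t_d2))
          + n3 * np.log((1.0 - p) * S(T1) + p * S(T2)))
    return -ll

# Made-up data: failures reported in (0, T1], failures reported in (T1, T2], and n3 unreported units.
t_d1 = np.array([0.2, 0.5, 0.7, 0.9, 0.4])
t_d2 = np.array([1.1, 1.6, 1.9])
n3 = 92

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded",
                      args=(t_d1, t_d2, n3))
print(f"lambda_hat = {res.x:.4f}")
```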

The problem of two time scales mentioned in the paper by Suzuki [22] has attracted many researchers. A general discussion of alternative time scales in modelling lifetimes is given in the paper by Duchesne and Lawless [7]. In the context of the analysis of field reliability data considered in this paper, this problem has been addressed by several authors. For example, Lawless et al. [18] considered the following linear transformation of the original calendar time t to an operational (usage) time u:

$$ u_i(t) = a_i\, t, \quad t > 0, \qquad (22) $$

where a_i is a random usage rate described by the cumulative probability function G(a). Jung and Bai [12] have used this approach for the analysis of lifetime data coming from warranty programs when the warranty periods are defined in two time scales (e.g. calendar time and mileage). The results of their computations are rather difficult to use in real applications, and cannot be applied without specialized software. Moreover, this model requires the knowledge of G(a), and this probability distribution is rarely known to practitioners.

Another approach for solving this problem was proposed by Jung and Bai [12] who described lifetime data by a bivariate Weibull distribution. They calculated a very complicated log-likelihood function that can be used for the estimation of the parameters of this distribution when the data are reported both in calendar time and operational time. They showed an example where this approach may be more appropriate than the linear transformation model proposed by Lawless et al. [18].

Reliability field data may also be collected and stored in other forms that are far from those known from classical textbooks. Coit and Dey [3] and Coit and Jin [4] consider the case, typical for the collection of real reliability data, when data from different test programs are available in the form (r, T_r), where r is the number of observed failures and T_r is the cumulative time on test for the data record with r failures. Coit and Dey [3] considered the case when lifetimes are distributed according to the exponential distribution. They also proposed a test for the verification of this assumption.

Coit and Jin [4] considered a case when lifetimes are distributed according to the gamma distribution

$$ f(t) = \frac{\lambda^k t^{k-1} e^{-\lambda t}}{\Gamma(k)}, \quad t > 0,\ k > 0,\ \lambda > 0. \qquad (23) $$

They considered the case, typical for the analysis of field data for repairable objects, where a single data record consists of the number of observed failures and the total time between those failures. Let T_rj be the jth cumulative operating time for a data record with exactly r failures (note that no censoring is considered in this case); let n_r be the number of data records with exactly r failures; let m be the maximum number of failures in any considered data record; let M be the total number of observed failures, i.e. $M = \sum_{r=1}^{m} r\, n_r$; and let the average time to failure be $\bar{t} = \sum_{r=1}^{m}\sum_{j=1}^{n_r} T_{rj} / M$. The maximum likelihood estimator of the shape parameter k can be found from the equation [4]

$$ \sum_{r=1}^{m} r\, n_r\, \psi(rk) - M \ln k = K', \qquad (24) $$

where

$$ K' = \sum_{r=1}^{m} \sum_{j=1}^{n_r} r \ln T_{rj} - M \ln \bar{t}, \qquad (25) $$

and
