Contribution to failure description

D. Valis

David Valis - CONTRIBUTION TO FAILURE DESCRIPTION R&RATA # 2

(Vol. 2) 2009, June

CONTRIBUTION TO FAILURE DESCRIPTION

D. Valis

University of Defence, Brno, Czech Republic e-mail: david.valis@unob.cz

ABSTRACT

In our lives we meet many events which have very diverse causes, mechanisms of development and consequences. In safety and risk assessment we frequently work with descriptions of such events alongside other assessments. In purely technical applications these events are related to the occurrence of a failure of equipment, a device, a system or an item. The theory deals with the failure itself, its mechanisms, circumstances of occurrence, etc., but at the same time we need appropriate terminology to describe these conditions. Our basic approaches to observing and handling failures fall into two groups: a probabilistic approach and a deterministic (logic) approach. Since we need information about a failure, we have to find it or transfer it from different sources. This contribution addresses the term "failure" and its related characteristics in a comprehensive way. The paper covers the functions of an object and their description, the classification of failures, the main characteristics of failure, possible causes, mechanisms and consequences of failure, and other closely related topics.

1 INTRODUCTION

Before we introduce the topic of failure, let us ask a simple question: why do things actually break? Answers can vary. One answer, which we develop further below, is that the applied load exceeds the dimensioning (robustness) of the product. The load can be purely mechanical (force, tension, etc.), purely electrical (power, electromagnetic field, etc.), purely chemical (effect of chemical substances, etc.), generally physical (heat, radiation, etc.), or of a totally different nature. Whenever the applied load exceeds the assumed dimensioning of the item, unwanted (usually irreversible) processes start, and sooner or later a failure occurs. The load can be applied once or a number of times. In the first case an overload failure occurs; in the second, a fatigue failure. Unless a failure occurs immediately, the product can also become weaker over time for any one of many reasons.

One of the basic assumptions in dealing with failure is the following: before any failure due to an inner cause (e.g. operation or use of an item) can occur, the device has to be in operation. Idleness of an item or a system can also end in a failure due to natural ageing, but in this case the initial mechanism is not properly understood. A relevant failure mostly occurs only during operation.

Some factors and characteristics for describing failures:

Process in time of occurrence and manifestation (together these form the failure profile):

- failure causes;

- failure manifestations;

- failure consequences;

Failure causes:

- design failures;

- manufacturing failures;

- overstress failures;

- misuse failures;

- degradation failures;

Failure manifestations:

- random failures;

- gradual;

- sudden;

- common caused failures;

- primary and secondary failures;

- intrinsic failures;

- extrinsic failures;

Failure consequences:

- insignificant;

- marginal;

- minor;

- major;

- critical;

- catastrophic;

Failure is a term widely used in technical practice, especially in dependability theory. For reliability practitioners it is a basic term, essential for observing the stochastic behaviour of an item. On a general level it is the kind of event that probability theory calls a random event. In dependability theory it is necessary to treat failure as a stochastic term, to understand its meaning, and to understand its links to other concepts. Only then are the mathematical tools used in dependability more than a dead and boring "set" of formulas, relations and graphical expressions.

While observing a technical item we concentrate basically on the possible causes of failures, their development over time, their process and mechanism, and of course their impact, effect, or other influences which might result from a failure occurrence. It is essential to realize that a failure is of key importance for the operation and function of technical items. Theory, and practice in particular, show us that failures occur in different situations and under various circumstances and conditions. In theory, we can describe the possible causes of failures, the nature of their occurrence and the process of their development, and we are able to model them at the same time. We can see connections between individual groups of failures and their profiles. We can assign degrees of importance and numerical values to failures and sort them into groups, sets, etc. However, our biggest and continual effort is to eliminate failure occurrence, or to limit the number (frequency) of occurrences over a specified time period or in relation to another observed quantity (mileage, cycles, etc.). Our intention is to be able to determine the occurrence of failures so exactly that we can be prepared to face it as well as possible. Simply put, our aim is to get a better profile of an observed item from the viewpoint of its dependability and related properties.

Furthermore, we would like to describe possible classes of failures, their profiles, courses, development, consequences, and other relations which might be important for dependability theory and especially for this paper. The phenomena covered in this article are by no means a complete and synoptic list of all known and possible events accompanying a failure. The aim of the article is to introduce a topic which is usually believed to be obvious, familiar and clear. However, reality need not fully match our ideas or the ideas of other people. The purpose of the paper is also to introduce the reader to the topic of failure and at the same time to popularize it. We would not like the reader to absorb a piece of ready-made information without full understanding and without grasping its complexity, because a frequently used term might then take on a totally different meaning. It would be great if, when meeting the term in a book or when using theoretical tools, profiles, graphs, models, and other descriptions and contexts, we were able to imagine that there is definitely something more to it (Blischke 2000, Elsayed 1996, Meeker & Luis 1998, Modares & Kaminskyi & Krivtsov 1999).

2 CURRENT TERMINOLOGY SITUATION

The following part speaks briefly about the current terminology situation in the standardization field, especially in the branch of dependability and risk. The situation is shaped by the ISO/IEC representatives and national bodies. According to the present version of IEC 60050-191/1990, failure is defined as follows: "termination of the ability of an item to perform a required function".

Note 1. After failure the item has a fault.

Note 2. Failure is an event, as distinguished from fault, which is a state.

Note 3. This concept as defined does not apply to items consisting of software only.

According to the newly revised version of IEC 60050-191, failure is defined as follows: "loss of ability to perform as required".

Note 1: When the loss of ability is caused by a pre-existing condition, the failure occurs when a particular set of circumstances is encountered.

Note 2: A failure of an item is an event, as distinct from a fault of an item, which is a state.

Note 3: Qualifiers may be used to classify failures according to the severity of consequences, such as catastrophic, critical, major, minor, marginal and insignificant, the definitions depending upon the field of application.

It follows from these definitions and further analysis that the term "failure" will be understood as an event which leads directly to either a partial or a complete loss of the ability of an item to fulfil a required function. Most of the terms used in the introduction to describe failure factors and profiles can also be found in the source document mentioned above.

At present, because terminology is being modified and updated, the existing understanding of a failure and related facts may change. To demonstrate the complexity of the present state, consider the following. According to Note 1 to the term failure cited above (see IEC 60050-191/1990), an item has a fault after a failure. In the continuing discussions on this topic, however, it is impossible to ignore the opposite idea, namely that a fault does not follow a failure but precedes it. This technical incompatibility, together with many others, has not been resolved yet, although it has been much discussed. A possible decision in favour of the new view will radically influence the existing approach to, conception of and observation of failure.

When working with the term failure, as well as with related states, it is necessary to take the current terminology mismatch into account and to adapt possible decisions to it. The possibility that the change will be realized has to be accepted along with all its consequences. Unfortunately, this change will disrupt the understanding of all existing terms and disciplines introduced so far that deal with proper function, failure and dependability.

3 WHAT MIGHT THE FAILURE AFFECT

In this part it is necessary to draw attention to some related events. We are dealing with a failure which prevents an item from performing a required function (the main one, a minor one, or some other one, as detailed below). It follows from all the definitions in the paper that the inability of a system or a product to operate in the required way is the key term determining a failure.

Based on many studies and approaches, a practical scheme for describing the individual functions of a system in a comprehensive way has been formed. On the basis of these assumptions it is also essential to distinguish the influence of a failure on the function performed by an item, since a failure occurrence might affect the range of the function. The outline of item functions below is provided to make this easier to understand; the occurrence of failures is not strictly limited to one kind of item function.

A required function - specifies the task of an item. A correct, exact and unambiguous definition is the starting point for all dependability definitions as well as for a correct definition of failure. Operating conditions - significantly affect both dependability and, in particular, possible failure occurrence, which is why they have to be determined very thoroughly.

1. Main function - an intended (required) or primary function.

2. Minor function - a function needed to provide the main function.

3. Supporting function - its aim is to protect people and the environment from potential damage resulting from a failure of the main or a minor function, and to provide common support (brakes, circuit breakers, filters, etc.).

4. Information function - it provides information on conditions, monitoring, measuring, diagnostics, etc. (displays, indicators, etc.).

5. Interface function - it provides an interface between the assessed item and other items (cabling, operating elements, switches, breakers, etc.).

The required function and/or operating conditions might be time dependent. In this case a mission profile has to be determined and all dependability viewpoints have to be related to it. A representative mission profile and the corresponding dependability targets have to be stated in the item's specification. The mission duration is usually considered as a parameter t, that is, time. The dependability function, in particular the reliability function, is designated R(t). R(t) is the probability that no failure at item level will occur in the interval (0, t], often with the assumption R(0) = 1, which means that at the time t = 0 the object was in the state of operation. In order to avoid confusion, a distinction should be made between predicted and estimated (assessed) dependability; the latter rests on a real evaluation during operation or tests. The predicted dependability is calculated on the basis of the item's dependability structure and the failure rates of its components. The estimated dependability is determined by a statistical evaluation of dependability tests or field data obtained under known operating and environmental conditions.
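The definition of R(t) can be illustrated with a minimal Python sketch of the estimated (assessed) reliability. It assumes a complete, uncensored sample of observed times to failure, which the paper does not provide; the function name `empirical_reliability` and the sample values are hypothetical. The estimate is simply the fraction of items with no failure in the interval (0, t].

```python
from typing import Sequence


def empirical_reliability(times_to_failure: Sequence[float], t: float) -> float:
    """Estimate R(t) as the fraction of items still operating after time t.

    Assumes every item was operating at t = 0 (so R(0) = 1) and that every
    time to failure was observed (no censoring).
    """
    if not times_to_failure:
        raise ValueError("at least one observed time to failure is required")
    if t < 0:
        raise ValueError("t must be non-negative")
    # An item has "no failure in (0, t]" exactly when its time to failure > t.
    survivors = sum(1 for ttf in times_to_failure if ttf > t)
    return survivors / len(times_to_failure)


# Hypothetical times to failure in operating hours (illustrative only).
sample_hours = [120.0, 340.0, 410.0, 560.0, 980.0]

r_at_0 = empirical_reliability(sample_hours, 0.0)      # every item survives t = 0
r_at_400 = empirical_reliability(sample_hours, 400.0)  # 3 of 5 items survive
```

With censored field data (items still running when observation ends) this simple fraction is no longer adequate and a dedicated estimator would be needed.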

Failure - it occurs when an item terminates its ability to perform its required function. However simple the definition might look, it is difficult to apply to complex items or systems. The failure-free operating time is generally a random variable. It is often reasonably long, but it might also be very short, for example because of the influence of a systematic failure or of an early failure resulting from a transient event at turn-on. A general presumption in investigating failure-free operating times is that at the instant t = 0 the object is free of defects and systematic failures and is therefore fully able to operate. Besides their relative frequency, failures can be categorized according to the views mentioned before (mode, course, cause, consequences, mechanisms, etc.).

Failure profiles:

- critical stage: consequence seriousness

- failure cause:
    - misuse failure;
    - mishandling failure;
    - weakness failure;
    - design failure;
    - manufacturing failure;
    - ageing/wearout failure;
    - others (e.g. software).

- failure mode (velocity):
    - sudden;
    - gradual (degradation).

- according to the range of a consequence:
    - cataleptic;
    - complete;
    - partial.

- according to the place of occurrence:
    - during a test;
    - during operation.

- according to the occurrence mechanism (physical, chemical, or other processes leading to a failure):
    - primary;
    - secondary;
    - systematic/reproducible.

- according to verification possibility:
    - verified failure;
    - unverified failure.

These are the very basic failure categories and the factors they fall into, and this is the common way of working with them. Some other (supplementary) failure categories can be determined, but they cannot be presented here due to the space limits of the paper. The authors may provide more information to those who are interested (Elsayed 1996, Meeker & Luis 1998, Modares & Kaminskyi & Krivtsov 1999).
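For an analyst who wants to record failures consistently, the categories above can be captured in a small data structure. The following Python sketch is only one possible encoding: the class and field names are hypothetical, and the criticality and occurrence-mechanism categories are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    MISUSE = "misuse"
    MISHANDLING = "mishandling"
    WEAKNESS = "weakness"
    DESIGN = "design"
    MANUFACTURING = "manufacturing"
    AGEING_WEAROUT = "ageing/wearout"
    OTHER = "other"


class Mode(Enum):
    SUDDEN = "sudden"
    GRADUAL = "gradual"


class Extent(Enum):
    CATALEPTIC = "cataleptic"
    COMPLETE = "complete"
    PARTIAL = "partial"


class Place(Enum):
    TEST = "during a test"
    OPERATION = "during operation"


@dataclass(frozen=True)
class FailureProfile:
    """One recorded failure, described by the categories listed in the text."""
    cause: Cause
    mode: Mode
    extent: Extent
    place: Place
    verified: bool

    def summary(self) -> str:
        status = "verified" if self.verified else "unverified"
        return (f"{self.mode.value}, {self.extent.value} {self.cause.value} "
                f"failure {self.place.value} ({status})")


# Hypothetical record, for illustration only.
example = FailureProfile(Cause.AGEING_WEAROUT, Mode.GRADUAL,
                         Extent.PARTIAL, Place.OPERATION, verified=True)
```

Using enumerations rather than free text keeps records comparable across analyses, which matters when failures are later grouped or counted.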

4 FAILURE OCCURRENCE CAUSE

According to IEC 60050-191/1990, the cause of a failure is the circumstances occurring during design, manufacture or use which have resulted in the failure. Knowing the cause of a failure is useful when we want to decide how to prevent the failure or its recurrence. Failure causes can be classified in relation to the life cycle of the system.

Cause - the cause of a failure can be intrinsic, due to weaknesses in the item and/or wearout, or extrinsic, due to errors, misuse or mishandling during design, production and especially use. Extrinsic causes often lead to systematic failures, which are deterministic and might be considered to be like defects (dynamic defects in software quality). Defects are present at t = 0, even if they cannot be discovered at t = 0. Failures, in contrast, always appear in time, even if the time to failure is very short, as it can be with systematic or early failures.

1. Design failure - occurs due to inadequate design. It is basically any failure directly related to item design: because of the design, a part of the item degraded or was damaged, and this resulted in a failure of the whole.

2. Weakness failure - occurs due to a weakness (internal), inherent or induced in the system, so that the system cannot withstand the stress it encounters in its normal environment.

3. Manufacturing failure - a failure caused by nonconformity during manufacturing and processing. It is basically any failure caused by faulty processing, or inadequate manufacturing, or an error made while controlling the process during manufacturing, tests and repairs.

4. Ageing failure - a failure caused by the effects of usage and/or age.

5. Misuse failure - a failure caused by misuse of the system (operating in environments for which it was not designed).

6. Mishandling failure - a failure caused by incorrect handling and/or lack of care and maintenance.

7. Software error failure - a failure caused by a computer program error.

5 FAILURE MECHANISM

The failure mechanism is a very complex and extensive part of the failure profile. A failure can be sudden or gradual, with the related manifestations.

Failure mechanism - physical, chemical, electrical, thermal or other process that results in failure.

Mode (manifestation, course) - the mode of a failure is a symptom (local effect) by which a failure is observed. Examples are opens, shorts, or drifts (for electronic components) and brittle rupture, creep, cracking, seizure, or fatigue (for mechanical components).

A complete and sudden failure is called a catastrophic failure, and a gradual and partial failure is designated a degradation failure.

The connections related to these aspects of a failure are shown in the following description:


1. Intermittent failure - a failure which lasts only for a short time. A good example is a fault that occurs only under certain conditions which arise intermittently (irregularly).

2. Extended failure - a failure that persists until some corrective action rectifies it. Extended failures can be divided into the following two categories:

a) Sudden failure - a failure which occurs without warning.

b) Gradual failure - a failure which is preceded by warning signals. Usually these are significant changes in behaviour (decreasing performance, increasing temperature, rising vibrations, etc.).

We have to distinguish among the different failure mechanisms of mechanical, electrical, electronic and hydraulic parts. The differentiation is so complex that it cannot easily be presented in this paper. An example of failure mechanisms is given in Section 9 (Blischke 2000, Elsayed 1996, Meeker & Luis 1998, Modares & Kaminskyi & Krivtsov 1999).

6 FAILURE CONSEQUENCES

Many information sources use the term failure consequence. Many standards also define failure consequences and work with them differently. The following part should help to clarify the concept of failure consequences as we know them from many reliability analyses.

Effect - the effect (consequence) of a failure can differ depending on whether it is considered on the item itself or at a higher level. A usual qualitative classification of failures is: non-relevant, partial, complete, ..., critical failure. Since a failure can also cause further failures in an item or a system, a distinction between primary and secondary failure is important.

A classification of the severity of a failure mode in accordance with MIL-STD 882 is listed below:

1. Catastrophic failure - a failure that can lead to death or can cause total system (item) loss.

2. Critical failure - a failure which results in many serious injuries or major system damage. Sometimes we think of it as a failure, or combination of failures, that prevents an item from performing a required mission.

3. Marginal failure - a failure that leads to minor injury or minor system damage.

4. Negligible failure - a failure that leads to less than minor injury or system damage.
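Because these four classes are ordered, they are convenient to handle as an ordered enumeration, for example when an item has several failure modes and the analyst needs the worst case. The sketch below is a hypothetical illustration: the numeric values are an assumption made only to express the ordering and are not taken from MIL-STD 882.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    """Severity classes as listed in the text; a higher value is more severe.

    The integer values are an illustrative assumption, not part of the standard.
    """
    NEGLIGIBLE = 1
    MARGINAL = 2
    CRITICAL = 3
    CATASTROPHIC = 4


def worst_severity(severities: Iterable[Severity]) -> Severity:
    """Return the most severe class among the failure modes of an item."""
    return max(severities)


# Hypothetical severities assigned to three failure modes of one item.
modes = [Severity.MARGINAL, Severity.CRITICAL, Severity.NEGLIGIBLE]
worst = worst_severity(modes)
```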

Another classification can be found in the RCM approach, where the following classes are used:

Failures with safety consequences;

Failures with environmental consequences;

Failures with operational consequences;

Failures with non-operational consequences.

A classification of failure severity into groups (categories) is given in a number of other standards. Each of them is specific in some way and corresponds to a presupposed application. IEC 61 882, IEC 60 812, IEC 50 126 and many others are examples. We do not have the ambition to make a complete list of failure consequences and their classifications. The point is to take the many different approaches into account, handle them with care, and use them with a clear intention (Meeker & Luis 1998, Modares & Kaminskyi & Krivtsov 1999).

7 SOURCES FOR FAILURE PROFILE DETERMINATION

We do not want to speak about the basic failure measures and characteristics, which are well known in our community. Our aim is to present the different sources from which failure data, measures and characteristics can be obtained. The main sources are:

1. Data on elements' reliability guaranteed by a producer - there is no need to expand on it;

2. Conclusive results of reliability tests (observations) of the same or a comparable item. They are based on the standardized assessment of reliability tests of technical items. The methods and methodologies for conducting tests are standardized for different equipment.

3. Predictions - standardised calculation of an item's reliability based on a reliable source (MIL HDBK 217F). This is an American military handbook that enables the reliability of electronic elements to be estimated. It is commonly used for estimating the failure rate of elements, especially in military applications.

4. Specialized information databases on elements' reliability (specialized in terms of the elements' profile or conditions of usage). These databases are usually established and kept to meet the needs of single industrial branches or technical areas. They collect data acquired by observing items in operation or the results of specialized dependability tests. One of the most respected and frequently used sources in this area is the Reliability Analysis Center (RAC), which at present distributes three important databases on a commercial basis (EPRD-97; NPRD-95; FMD-97; SPIDR 2007).

5. General information databases on elements' reliability. These databases are usually published as parts of specialized literature in the dependability area. The information in them is usually very general.

6. Expert estimations. Expert estimations of the numerical values of reliability measures should be used only when appropriate values cannot be specified by a different, more reliable method. The authors know from experience that this solution is accepted only as an exception, because in most cases the numerical values of reliability measures can be determined by the other methods described in this paper.

8 TYPICAL MEASURES OF A FAILURE OCCURRENCE

Failure rate

Failure rate plays a major role in dependability analyses. It is a numerical measure that describes failure occurrence in relation to a measured continuous or discrete quantity: it specifies the number of events occurring per observed (measured) unit.
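As a minimal sketch of this "events per observed unit" idea, the following Python function computes a simple point estimate of the failure rate. It assumes the rate is treated as constant over the observed period; the function name and the numbers are hypothetical. The observed unit may be hours, kilometres, cycles or any other quantity.

```python
def failure_rate(n_failures: int, cumulative_units: float) -> float:
    """Point estimate of the failure rate: failures per observed unit.

    cumulative_units is the total observed quantity summed over all items
    (for example operating hours, kilometres or cycles).
    """
    if n_failures < 0:
        raise ValueError("the number of failures cannot be negative")
    if cumulative_units <= 0:
        raise ValueError("the cumulative observed quantity must be positive")
    return n_failures / cumulative_units


# Hypothetical field data: 4 failures over 20,000 cumulative operating hours.
rate_per_hour = failure_rate(4, 20_000.0)
```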

Factors affecting failure rate:

- Component type;

- Component design;

- Component technology;

- Operational stress (temperature, voltage, pressure, etc.);

- Component quality grade (involving production quality control and post-production screening including burn-in);

- Environmental stress (vibration, shock, humidity);

- Activation and deactivation transients, e.g. voltage spikes, current surges, transient thermal stresses;

- Component application.

Failure occurrence probability

This is another measure that describes the possible occurrence of a failure numerically. Depending on the kind of variable, it can be described by a discrete or a continuous distribution, and an estimate of it is stated with a certain level of confidence (a confidence interval).
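For a continuous variable such as time, one common modelling choice is the exponential distribution, which corresponds to a constant failure rate. The sketch below assumes that model (the paper itself does not prescribe a distribution), and the rate and mission time are hypothetical values. It computes F(t) = 1 - exp(-rate * t), the probability of a failure in the interval (0, t].

```python
import math


def failure_probability(rate: float, t: float) -> float:
    """Probability of a failure in (0, t] under a constant failure rate.

    Implements F(t) = 1 - exp(-rate * t); math.expm1 keeps the result
    accurate when rate * t is very small.
    """
    if rate < 0 or t < 0:
        raise ValueError("rate and t must be non-negative")
    return -math.expm1(-rate * t)


# Hypothetical values: 2e-4 failures per hour over a 1,000-hour mission.
p_fail = failure_probability(2e-4, 1000.0)
```

Under the same assumption the reliability function is the complement, R(t) = 1 - F(t) = exp(-rate * t).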

Mean-time to failure

Another frequently used measure, defined for a continuous random variable (usually time), which specifies the expected (mean) time to failure.
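Two simple ways of obtaining this measure can be sketched in Python. The first takes the arithmetic mean of a complete sample of observed times to failure; the second uses the relation MTTF = 1 / rate, which holds only under the assumption of a constant failure rate. The function names and all numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def mttf_from_sample(times_to_failure: Sequence[float]) -> float:
    """Estimate the mean time to failure from a complete (uncensored) sample."""
    if not times_to_failure:
        raise ValueError("at least one observed time to failure is required")
    return mean(times_to_failure)


def mttf_from_rate(rate: float) -> float:
    """MTTF under a constant failure rate (exponential model): 1 / rate."""
    if rate <= 0:
        raise ValueError("the failure rate must be positive")
    return 1.0 / rate


# Hypothetical times to failure in operating hours (illustrative only).
observed_hours = [120.0, 340.0, 410.0, 560.0, 980.0]
mttf_hours = mttf_from_sample(observed_hours)

# Hypothetical constant failure rate of 2e-4 failures per hour.
mttf_model_hours = mttf_from_rate(2e-4)
```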

9 EXAMPLE OF FAILURE CHARACTERISTICS

The intention of this example is to present some commonly used technical parts and to show their typical failure mechanisms, failure modes/causes and the percentage distribution of these characteristics. The example of several mechanical parts is based on (EPRD-97; NPRD-95; FMD-97; SPIDR 2007). The items chosen are the most common mechanical parts typically used in systems. This example, as well as the guidelines presented in the paper, is intended to contribute to the analyst's knowledge and to help him or her with orientation while conducting a standard analysis (e.g. PHA, FMECA, FTA, OSHA, JSA, etc.).

Example of several mechanical parts:

Statically loaded

Demountable:

- Screw: Loose (approx 50%); Worn (approx 25%); Induced - vibration/missing (approx 25%)

- Nut: Bearing failure (approx 50%); Loose (approx 50%)

- Key: Bent/Dented/Warped (approx 100%)

Non-demountable:

- Welded joint:

- Riveted joint:

    Broken (approx 50%); Workmanship (approx 50%)

Dynamically loaded

- Bearing: Worn (approx 60%); Binding/Sticking (approx 20%); Loss of lubrication (approx 10%); Contaminated (approx 5%); Scored (approx 5%)

- Gear: Worn (approx 52%); Binding/Sticking (approx 19%); Stripped (approx 10%); Broken (approx 7%); Jammed/Stuck (approx 7%); Displaced (approx 3%); Noisy (approx 2%)
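In analyses such as FMECA, a mode distribution of this kind is commonly used to split the overall failure rate of a part among its failure modes. The following Python sketch illustrates the idea with the approximate gear percentages quoted above; the overall failure rate of 1e-5 per hour and the function name `apportion` are hypothetical and serve only as an illustration.

```python
from typing import Dict

# Approximate failure mode distribution for a gear, in percent (from the text).
GEAR_MODES: Dict[str, float] = {
    "Worn": 52,
    "Binding/Sticking": 19,
    "Stripped": 10,
    "Broken": 7,
    "Jammed/Stuck": 7,
    "Displaced": 3,
    "Noisy": 2,
}


def apportion(part_failure_rate: float,
              mode_percentages: Dict[str, float]) -> Dict[str, float]:
    """Split a part failure rate among failure modes by their percentages."""
    total = sum(mode_percentages.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"mode percentages must sum to 100, got {total}")
    return {mode: part_failure_rate * pct / 100.0
            for mode, pct in mode_percentages.items()}


# Hypothetical overall gear failure rate: 1e-5 failures per operating hour.
gear_mode_rates = apportion(1e-5, GEAR_MODES)
```

The check that the percentages sum to 100 is a cheap guard against transcription errors when such distributions are copied from a database.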

This is only a small example of failure characteristics for a few typical mechanical parts. Its purpose is to help fill the lack of information we normally face. As noted in the section on information sources, we frequently do not have such information about failures and their characteristics guaranteed by the producer. Nor do we have plenty of information from tests, since tests are not conducted very frequently or across a wide range. Some prediction methods (MIL HDBK 217F and others) are not suitable for every part and give only one characteristic of the failure.

A further purpose of the example was to present, apart from the measure, the related characteristics of a failure (mode/cause). If the analyst does not have a clear idea of the modes and causes of failure, he or she can hardly judge whether the item may fail or not.

10 CONCLUSION

This contribution is intended to give a general overview of the basic term "failure" as described above. As the understanding of all related matters is very complex, it is not possible to express complete knowledge and experience here. Some reliability and safety engineers might be confused when beginning a specific analysis (e.g. FMECA, PHA, JSA, OSHA, etc.). The main benefit of this contribution is that it serves as general, introductory material for understanding a failure and its full profile with all related characteristics. A further purpose of the paper is to provide guidelines that direct the analyst to the appropriate information sources necessary for the analysis. Due to the limited space within the paper, the information provided is not complete; those who are interested are kindly asked to contact the authors.

ACKNOWLEDGEMENTS

This paper has been prepared with support of the Grant Agency of the Czech Republic project No. 101/08/P020 - "Contribution to risk assessment of technical systems" and with support of the Ministry of Education, Youth and Sports of the Czech Republic, project No. 1M06047 - "Centre for Quality and reliability of production".

REFERENCES

1. BLISCHKE, W. R. Reliability: Modelling, Prediction, and Optimisation, John Wiley, 2000, New York.

2. ELSAYED, A. E. Reliability Engineering, Addison-Wesley, 1996, New York.

3. MEEKER, W. Q., LUIS, A. E. Statistical Methods for Reliability Data, John Wiley, 1998, New York.

4. MODARES, M., KAMINSKYI, M., KRIVTSOV, V. Reliability Engineering and Risk Analysis. A Practical Guide, Marcel Dekker, 1999, New York.

5. EPRD-97 Electronic Part Reliability Data. IIT Research Institute - Reliability Analysis Center. Rome, New York. 1999.

6. NPRD-95 Non-electronic Part Reliability Data. IIT Research Institute - Reliability Analysis Center. Rome, New York. 1999.

7. FMD-97 Failure Mode/Mechanism Distributions. IIT Research Institute - Reliability Analysis Center. Rome, New York. 1999.

8. SPIDR 2007 System and Part Integrated Data Resource. Alion Science and Technology and System Reliability Center.

9. MIL-HDBK-217F Reliability Prediction of Electronic Equipment.

10. IEC 60050-191, International Electrotechnical Vocabulary (IEV) - Chapter 191: Dependability and quality of service.
