# Reliability Evaluation Methods

## Introduction

Over the years, a large body of literature in the area of reliability has appeared in the form of books, journal articles, conference proceedings articles, and technical reports [1-5]. Many of these publications report the development of various types of methods and approaches for performing reliability analysis. Some of these methods are known as the Markov method, fault tree analysis (FTA), and failure modes and effect analysis (FMEA).

The Markov method is named after a Russian mathematician, Andrei A. Markov (1856-1922), and is a highly mathematical approach that is frequently used for evaluating the reliability of repairable systems. The FTA method was developed in the early 1960s to analyze the safety of rocket launch control systems in the United States. The FMEA method was developed in the early 1950s by the US Navy’s Bureau of Aeronautics. Later, the National Aeronautics and Space Administration (NASA) extended it for classifying each potential failure effect according to its severity and renamed it failure mode effects and criticality analysis (FMECA). Nowadays, the Markov, FTA, and FMEA methods are used across many diverse areas to analyze various types of problems.

This chapter presents a number of methods considered useful to evaluate reliability of engineering systems.

## Failure Modes and Effect Analysis (FMEA)

FMEA is a versatile method widely used in the industrial sector for analyzing systems during the design phase from the reliability and safety aspects. It may simply be described as an effective approach for analyzing each potential failure mode in the system to determine the effects of such modes on the entire system. The history of this method goes back to the early 1950s, when the Bureau of Aeronautics of the US Navy developed a requirement known as failure analysis for establishing a mechanism for reliability control over the detail design-related effort. Subsequently, the term failure analysis was replaced by failure modes and effect analysis.

Usually, the following seven steps are followed for performing FMEA:

• Step I: Establish the system boundaries and associated requirements
• Step II: List all systems, subsystems, and components
• Step III: Identify and describe each component/part and list its possible failure modes
• Step IV: Assign occurrence probability/failure rate to each failure mode
• Step V: List effect(s) of each failure mode on subsystem(s), system, and plant
• Step VI: Enter necessary remarks for each identified failure mode
• Step VII: Review all critical failure modes and take appropriate actions
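The steps above map naturally onto the columns of an FMEA worksheet. The following is a minimal sketch of such a worksheet in Python, assuming one record per failure mode; the class name `FailureModeEntry`, its field names, the `critical` flag, and all sample rows are hypothetical and are used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FailureModeEntry:
    """One worksheet row; field names are illustrative, not from a standard."""
    component: str            # Step III: component/part
    failure_mode: str         # Step III: how the component can fail
    probability: float        # Step IV: occurrence probability
    effects: dict = field(default_factory=dict)  # Step V: level -> effect
    remarks: str = ""         # Step VI: remarks
    critical: bool = False    # assumed flag marking modes to review in Step VII


def critical_modes(worksheet):
    """Step VII: return the critical entries, most probable first."""
    flagged = [row for row in worksheet if row.critical]
    return sorted(flagged, key=lambda row: row.probability, reverse=True)


# Hypothetical rows for a room-lighting circuit (all values are made up).
worksheet = [
    FailureModeEntry("switch", "fails to close", 0.01,
                     {"subsystem": "circuit open", "system": "no light"},
                     critical=True),
    FailureModeEntry("bulb 1", "burns out", 0.04,
                     {"subsystem": "one bulb dark", "system": "reduced light"}),
    FailureModeEntry("fuse", "fails open", 0.02,
                     {"subsystem": "no power", "system": "no light"},
                     critical=True),
]

for row in critical_modes(worksheet):
    print(f"{row.component}: {row.failure_mode} (p = {row.probability})")
```

Sorting the critical entries by occurrence probability is only one possible review order; an analysis team may prefer to order by severity or by another criterion.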

Prior to the implementation of FMEA, several factors must be explored carefully. Four of these factors are as follows [9,10]:

• Factor I: Examination of each conceivable failure mode by all the involved personnel
• Factor II: Making necessary decisions based on the risk priority number
• Factor III: Measuring costs/benefits
• Factor IV: Obtaining approval and support of the involved engineer(s)
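Factor II refers to the risk priority number (RPN) without defining it. In common FMEA practice, the RPN is the product of three rankings (severity, occurrence, and detection), each usually assigned on a scale of 1 to 10. The sketch below assumes that convention; the function name, the failure modes, and the rankings are hypothetical.

```python
def risk_priority_number(severity, occurrence, detection):
    """Return RPN = severity * occurrence * detection (each ranked 1 to 10)."""
    for name, rank in (("severity", severity),
                       ("occurrence", occurrence),
                       ("detection", detection)):
        if not 1 <= rank <= 10:
            raise ValueError(f"{name} rank must be between 1 and 10")
    return severity * occurrence * detection


# Hypothetical (severity, occurrence, detection) rankings for three failure modes.
modes = {
    "switch fails to close": (8, 3, 4),
    "bulb burns out": (4, 6, 2),
    "fuse fails open": (8, 2, 7),
}

# Rank failure modes from highest to lowest RPN.
ranked = sorted(modes, key=lambda m: risk_priority_number(*modes[m]), reverse=True)
for mode in ranked:
    print(mode, risk_priority_number(*modes[mode]))
```

A higher RPN suggests that a failure mode deserves earlier attention, although the number is a ranking aid and not a probability.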

Over the years, professionals involved in reliability analysis have developed a number of guidelines/facts concerning FMEA. Some of these guidelines/facts are as follows:

• Developing most of the FMEA in a meeting should be avoided
• FMEA has certain limitations
• FMEA is not a method for choosing the optimum design concept
• FMEA is not designed for superseding the engineer’s work

Some of the main advantages of conducting FMEA are as follows [9,10]:

• Starts from the detailed level and works upward
• Compares designs and highlights safety-related concerns
• Reduces engineering-related changes and improves the efficiency of test planning
• Safeguards against repeating the same mistakes in the future
• Provides a systematic approach to categorize/classify hardware failures
• Helps to understand and improve customer satisfaction
• Serves as a useful visibility tool for management that reduces product development time and cost
• Improves communication between design interface personnel

## Fault Tree Analysis (FTA)

This method was developed in the early 1960s at Bell Telephone Laboratories for performing the safety analysis of the Minuteman Launch Control System [11,12]. Nowadays, it is widely used in industry to evaluate the reliability of engineering systems during their design and development phase, particularly in the area of nuclear power generation. A fault tree may simply be described as a logical representation of the relationship of basic fault events that lead to the occurrence of a stated undesirable event, called the “top event”, and is depicted using a tree structure with logic gates such as AND and OR gates.

The main objectives of conducting FTA are as follows:

• To comprehend the functional relationships of system failures
• To satisfy jurisdictional requirements
• To highlight critical areas and cost-effective improvements
• To verify the system’s ability to satisfy its imposed safety requirements
• To understand the degree of protection that the design concept provides against the occurrence of failures

There are many prerequisites associated with FTA. Six of the main prerequisites are as follows:

• I: Clear identification of all related assumptions
• II: A clear definition of what constitutes system/item failure: the undesirable event (top event)
• III: Clearly defined analysis scope and objectives
• IV: A comprehensive review of system/item operational-related experience
• V: Clear comprehension of design, operation, and maintenance aspects of the system/item under consideration
• VI: Clearly defined system/item interfaces as well as system/item physical bounds

Four basic symbols used for constructing fault trees are shown in Figure 4.1 [11,12]. Each of the four symbols shown in Figure 4.1 is described below.

FIGURE 4.1 Basic fault tree symbols: (a) AND gate, (b) OR gate, (c) basic fault event, and (d) resultant fault event.

• Step II: Highlight the undesirable fault event (i.e., top fault event) to be investigated.
• Step III: Determine all the possible causes that can lead to the occurrence of the top fault event by using fault tree symbols such as those given in Figure 4.1 and the logic tree format.
• Step IV: Develop the fault tree to the lowest level of detail as per the requirements.
• Step V: Conduct analysis of the developed fault tree in regard to factors such as comprehending the logic and interrelationships among the fault paths and gaining insight into the unique modes of item faults.
• Step VI: Determine the most effective corrective measures.
• Step VII: Document analysis and follow-up on all highlighted corrective measures.

Example 4.1

Assume that a windowless room has three light bulbs and one switch. Develop a fault tree, using Figure 4.1 symbols, for the undesired (i.e., top) fault event, Dark room, assuming that the switch can only fail to close.

In this case, the room can be dark (i.e., Dark room) only if all three light bulbs burn out, if there is no incoming electricity, or if the switch fails to close. Using Figure 4.1 symbols, a fault tree for the example is shown in Figure 4.2. The single capital letters in Figure 4.2 denote corresponding fault events (e.g., T: Dark room [top fault event]).

FIGURE 4.2 A fault tree for the top fault event: Dark room.
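The logic of this fault tree can be sketched as a small Boolean function. The letter assignments follow the worked solution of Example 4.2 (A: three light bulbs burnt out, B: No electricity, T: Dark room); taking C as the switch failing to close and D, E, and F as the individual bulb failures is an inference from that solution, and the text does not say what G and H represent, so they appear here only as two unnamed causes of no electricity. The function name `dark_room` is hypothetical.

```python
def dark_room(c, d, e, f, g, h):
    """Return True if the top fault event T (Dark room) occurs.

    c: switch fails to close (inferred meaning of event C)
    d, e, f: the individual light bulbs burn out (inferred)
    g, h: two causes of no incoming electricity (not named in the text)
    """
    a = d and e and f     # A: all three light bulbs burnt out (AND gate)
    b = g or h            # B: no electricity (OR gate)
    return a or b or c    # T: dark room (OR gate)


# Two bulbs burnt out, everything else working: the room is still lit.
print(dark_room(c=False, d=True, e=True, f=False, g=False, h=False))

# All three bulbs burnt out: the room is dark.
print(dark_room(c=False, d=True, e=True, f=True, g=False, h=False))

# The switch alone failing to close is enough to darken the room.
print(dark_room(c=True, d=False, e=False, f=False, g=False, h=False))
```

The AND gate captures the redundancy of the three bulbs, while any single input to an OR gate is sufficient to cause its output event.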

### Fault Tree Probability Evaluation

When occurrence probabilities of basic/primary fault events are known, then the occurrence probability of the top fault event can be calculated. This can only be calculated by first calculating the occurrence probabilities of the output fault events of all the lower and intermediate logic gates (e.g., OR and AND gates).

Thus, the probability of occurrence of the OR gate output fault event, X, is given by

$$P(X) = 1 - \prod_{j=1}^{m} \left[1 - P(X_j)\right] \tag{4.1}$$

where

• $P(X)$ is the occurrence probability of the OR gate output fault event X.
• $m$ is the number of OR gate input independent fault events.
• $P(X_j)$ is the occurrence probability of the OR gate input fault event $X_j$, for j = 1, 2, 3, ..., m.

Similarly, the probability of occurrence of the AND gate output fault event, Y, is given by

$$P(Y) = \prod_{j=1}^{n} P(Y_j) \tag{4.2}$$

where

• $P(Y)$ is the occurrence probability of the AND gate output fault event Y.
• $n$ is the number of AND gate input independent fault events.
• $P(Y_j)$ is the occurrence probability of the AND gate input fault event $Y_j$, for j = 1, 2, 3, ..., n.
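For independent input fault events, the OR gate output probability is one minus the product of the complements of the input probabilities, and the AND gate output probability is the product of the input probabilities. A minimal Python sketch of these two gate calculations follows; the function names `or_gate` and `and_gate` and the sample input values are hypothetical.

```python
from math import prod


def or_gate(probabilities):
    """OR gate: one minus the product of the complements of the inputs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def and_gate(probabilities):
    """AND gate: product of the input probabilities."""
    return prod(probabilities)


# Made-up input probabilities, used only to exercise the two functions.
inputs = [0.1, 0.2]
print(or_gate(inputs))   # for two inputs this equals p1 + p2 - p1 * p2
print(and_gate(inputs))
```

Both functions assume that the input fault events are independent, as the gate equations do; they are not valid for dependent or repeated events.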

Example 4.2

Assume that the occurrence probabilities of independent fault events C, D, E, F, G, and H in Figure 4.2 are 0.01, 0.04, 0.05, 0.06, 0.02, and 0.08, respectively. Calculate the probability of occurrence of the top fault event T (Dark room) using Equations (4.1) and (4.2).

By substituting the given occurrence probability values of fault events G and H into Equation (4.1), we get

$$P(B) = 1 - (1 - 0.02)(1 - 0.08) = 0.0984$$

where $P(B)$ is the occurrence probability of fault event B (No electricity).

Similarly, by substituting the given occurrence probability values of fault events D, E, and F into Equation (4.2), we get

$$P(A) = (0.04)(0.05)(0.06) = 0.00012$$

where $P(A)$ is the occurrence probability of fault event A (three light bulbs burnt out).

By substituting the above two calculated values and the given data value into Equation (4.1), we get

$$P(T) = 1 - (1 - 0.00012)(1 - 0.0984)(1 - 0.01) = 0.1075$$

Thus, the probability of occurrence of the top fault event T (Dark room) is 0.1075. Figure 4.3 shows the Figure 4.2 fault tree with the above calculated values and the specified fault event occurrence probability values.
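The calculation in this example can be checked with a short recursive evaluation of the fault tree. The sketch below is an illustration only: the dictionary layout and the function name `probability` are hypothetical, and the gate structure (A as an AND gate over D, E, and F; B as an OR gate over G and H; T as an OR gate over A, B, and C) is inferred from the steps of the worked solution. It also prints the complement 1 - P(T), the probability that the room is not dark.

```python
from math import prod

# Basic fault event probabilities given in Example 4.2.
BASIC = {"C": 0.01, "D": 0.04, "E": 0.05, "F": 0.06, "G": 0.02, "H": 0.08}

# Gate structure inferred from the worked solution: (gate type, inputs).
GATES = {
    "A": ("AND", ["D", "E", "F"]),   # three light bulbs burnt out
    "B": ("OR", ["G", "H"]),         # no electricity
    "T": ("OR", ["A", "B", "C"]),    # dark room (top fault event)
}


def probability(event):
    """Recursively evaluate the occurrence probability of a fault event."""
    if event in BASIC:
        return BASIC[event]
    gate, inputs = GATES[event]
    values = [probability(name) for name in inputs]
    if gate == "AND":
        return prod(values)                        # AND gate, Equation (4.2)
    return 1.0 - prod(1.0 - v for v in values)     # OR gate, Equation (4.1)


p_top = probability("T")
print(f"P(A) = {probability('A'):.5f}")
print(f"P(B) = {probability('B'):.4f}")
print(f"P(T) = {p_top:.4f}")
print(f"1 - P(T) = {1.0 - p_top:.4f}")
```

Under these assumptions, the sketch gives P(T) of about 0.1075 and a complement of about 0.8925.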

• AND gate: This symbol denotes that an output fault event occurs only if all the input fault events occur.
• OR gate: This symbol denotes that an output fault event occurs if any one or more input fault events occur.
• Circle: This symbol denotes a basic or primary fault event (e.g., failure of an elementary component/part); the basic fault parameters are failure probability, failure rate, repair rate, and unavailability.
• Rectangle: This symbol denotes a fault event that results from the logical combination of fault events through the input of a logic gate such as AND and OR.

Normally, FTA is conducted in the following seven steps [11,13]:

• Step I: Define the system and its associated assumptions.
