


Computer Failure Classifications, Hardware and Software Error Sources, and Computer Reliability Measures
Computer-related failures may be categorized under the following five classifications [15]:
Hardware and software errors arise from many sources. Some of these sources are inherited errors, data preparation errors, handwriting errors, keying errors, and optical character reader errors. In a computer-based system, inherited errors can account for over 50% of the errors [16]. Furthermore, data preparation-associated tasks can also generate a significant proportion of errors. As per Bailey [16], at least 40% of all errors come from manipulating the data (i.e., data preparation) prior to writing it down or entering it into the computer system involved. Additional information on computer failure classifications and hardware and software error sources is available in Refs. [15,16].

There are many measures used in the area of computer system reliability. They may be grouped under the following two categories [14,17]:
Computer Hardware Reliability versus Software Reliability
Because it is important to understand clearly how hardware reliability differs from software reliability, a number of comparisons of important areas are presented in Table 6.1 [12,18,19].

TABLE 6.1 Hardware and software reliability comparisons

Fault Masking
The term fault masking is used in the area of fault-tolerant computing, in the sense that a system with redundancy can tolerate a number of failures/malfunctions prior to its own failure. More clearly, the term implies that some kind of problem has surfaced somewhere within a digital system but, because of the design, the problem does not affect the overall operation of the system under consideration. The best known fault masking method is probably modular redundancy, which is presented in the following sections [12].

Triple Modular Redundancy (TMR)
In this case, three identical modules/units perform the same task simultaneously, and the voter compares the outputs of the modules/units and sides with the majority [12,20]. More clearly, the TMR system fails only when more than one module/unit fails or the voter fails. In other words, the TMR system can tolerate failure of a single module/unit. An important example of the TMR system's application is the Saturn V launch vehicle computer [12,20]. The vehicle computer used TMR with voters in the central processor and duplication in the main memory [12,21].

The block diagram of the TMR scheme is shown in Figure 6.1; the blocks in the diagram denote the modules/units and the circle denotes the voter.

FIGURE 6.1 Block diagram for TMR system with voter.

For independently failing modules/units and the voter, the reliability of the system in Figure 6.1 is given by [12]

$$ R_{tmv} = (3R^{2} - 2R^{3}) R_{v} \tag{6.1} $$

where
R_{tmv} is the reliability of the TMR system with voter.
R is the reliability of the module/unit.
R_{v} is the reliability of the voter.

With a perfect voter (i.e., 100% reliable), Equation (6.1) becomes

$$ R_{tmp} = 3R^{2} - 2R^{3} \tag{6.2} $$

where R_{tmp} is the reliability of the TMR system with perfect voter.

It is to be noted that the voter reliability and the single unit's reliability determine the improvement in reliability of the TMR system over a single unit system. For the perfect voter (i.e., R_{v} = 1), the TMR system reliability given by Equation (6.2) is better than the single unit system reliability only when the reliability of the single unit is greater than 0.5. At R_{v} = 0.8, the TMR system's reliability is always less than the single unit's reliability. Furthermore, when the voter reliability is 0.9 (i.e., R_{v} = 0.9), the TMR system's reliability is only marginally better than the single unit/module reliability, and only when the single unit/module reliability is approximately between 0.667 and 0.833 [22].

TMR System Maximum Reliability with Perfect Voter
For a perfect voter, the TMR system reliability is expressed by Equation (6.2). Under this scenario, the ratio of R_{tmp} to a single unit reliability, R, is given by [23]

$$ \gamma = \frac{R_{tmp}}{R} = 3R - 2R^{2} \tag{6.3} $$
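By differentiating Equation (6.3) with respect to R and equating it to zero, we get

$$ \frac{d\gamma}{dR} = \frac{d}{dR}\left(3R - 2R^{2}\right) = 3 - 4R = 0 \tag{6.4} $$

Thus, from Equation (6.4), we obtain R = 0.75. This simply means that the maximum value of the reliability improvement ratio, γ, and the corresponding reliability of the TMR system, R_{tmp}, are respectively:

$$ \gamma = 3(0.75) - 2(0.75)^{2} = 1.125 $$

and

$$ R_{tmp} = 3(0.75)^{2} - 2(0.75)^{3} = 0.8438 $$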
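The relationships above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names are illustrative, and applying the ratio of Equation (6.3) to an imperfect voter by multiplying it by R_{v} is an assumption that follows from Equation (6.1). It evaluates the improvement ratio and finds the module reliabilities at which a TMR system and a single unit are equally reliable.

```python
import math


def tmr_reliability(r, r_voter=1.0):
    """TMR reliability per Equation (6.1); r_voter=1.0 gives Equation (6.2)."""
    return (3 * r**2 - 2 * r**3) * r_voter


def improvement_ratio(r, r_voter=1.0):
    """TMR reliability divided by single-unit reliability r.

    With r_voter=1.0 this is Equation (6.3); the voter factor is an
    illustrative extension based on Equation (6.1).
    """
    return (3 * r - 2 * r**2) * r_voter


def break_even_points(r_voter=1.0):
    """Module reliabilities where TMR and a single unit are equally reliable.

    Solves 2R^2 - 3R + 1/r_voter = 0, i.e. improvement_ratio(R) = 1.
    Returns None when no real solution exists (TMR is never better).
    """
    discriminant = 9 - 8 / r_voter
    if discriminant < 0:
        return None
    root = math.sqrt(discriminant)
    return ((3 - root) / 4, (3 + root) / 4)
```

Under these assumptions, `improvement_ratio(0.75)` evaluates the peak ratio, `break_even_points(0.9)` gives the interval of module reliabilities in which TMR with a 0.9-reliable voter beats a single unit, and `break_even_points(0.8)` returns `None` because the peak ratio times 0.8 stays below 1.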
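Example 6.1
Assume that a TMR system's reliability with a perfect voter is expressed by Equation (6.2). Determine the points where the single-unit and the TMR system reliabilities are equal.

To determine the points, we equate a single unit's reliability with Equation (6.2) to obtain

$$ R = 3R^{2} - 2R^{3} \tag{6.5} $$

By dividing Equation (6.5) by R (for R > 0) and rearranging, we get

$$ 2R^{2} - 3R + 1 = 0 \tag{6.6} $$

Equation (6.6) is a quadratic equation and its roots are

$$ R = \frac{3 + \sqrt{9 - 4(2)(1)}}{2(2)} = 1 \tag{6.7} $$

and

$$ R = \frac{3 - \sqrt{9 - 4(2)(1)}}{2(2)} = \frac{1}{2} \tag{6.8} $$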
This means the reliabilities of the TMR system with perfect voter and the single unit are equal at R = 1/2 or R = 1. Furthermore, the reliability of the TMR system with perfect voter will be greater than the single unit's reliability only when the value of R is higher than 0.5.

TMR System with Voter Time-Dependent Reliability and Mean Time to Failure
With the aid of material presented in Chapter 3 and Equation (6.1), for constant failure rates of the TMR system units and the voter unit, the TMR system with voter reliability is expressed by [12,24]

$$ R_{tmv}(t) = \left[ 3e^{-2\lambda t} - 2e^{-3\lambda t} \right] e^{-\lambda_{vr} t} \tag{6.9} $$
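where
R_{tmv}(t) is the TMR system with voter reliability at time t.
λ is the unit/module constant failure rate.
λ_{vr} is the voter unit constant failure rate.

By integrating Equation (6.9) over the time interval from 0 to ∞, we get the following equation for the TMR system with voter mean time to failure [12,14]:

$$ MTTF_{tmv} = \int_{0}^{\infty} \left[ 3e^{-2\lambda t} - 2e^{-3\lambda t} \right] e^{-\lambda_{vr} t} \, dt = \frac{3}{2\lambda + \lambda_{vr}} - \frac{2}{3\lambda + \lambda_{vr}} \tag{6.10} $$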
where MTTF_{tmv} is the mean time to failure of the TMR system with voter. For a perfect voter (i.e., λ_{vr} = 0), Equation (6.10) reduces to

$$ MTTF_{tmp} = \frac{3}{2\lambda} - \frac{2}{3\lambda} = \frac{5}{6\lambda} \tag{6.11} $$
where MTTF_{tmp} is the TMR system with perfect voter mean time to failure.

Example 6.2
Assume that the constant failure rate of a unit/module belonging to a TMR system with voter is λ = 0.0004 failures per hour. Calculate the system reliability for a 500-hour mission if the voter unit constant failure rate is λ_{vr} = 0.0002 failures per hour. In addition, calculate the TMR system mean time to failure.

By substituting the specified data values into Equation (6.9), we get

$$ R_{tmv}(500) = \left[ 3e^{-2(0.0004)(500)} - 2e^{-3(0.0004)(500)} \right] e^{-(0.0002)(500)} = 0.8264 $$

Similarly, by inserting the specified data values into Equation (6.10), we get

$$ MTTF_{tmv} = \frac{3}{2(0.0004) + 0.0002} - \frac{2}{3(0.0004) + 0.0002} = 1571.42 \text{ hours} $$
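The two substitutions can be reproduced with a short Python sketch. It is only an illustration under the constant-failure-rate assumption of Equations (6.9) and (6.10); the function and parameter names are hypothetical and do not come from the original text.

```python
import math


def tmr_reliability_at(t, lam, lam_voter=0.0):
    """Equation (6.9): TMR-with-voter reliability at time t (hours).

    lam is the module failure rate and lam_voter the voter failure rate,
    both in failures per hour (illustrative parameter names).
    """
    modules = 3 * math.exp(-2 * lam * t) - 2 * math.exp(-3 * lam * t)
    return modules * math.exp(-lam_voter * t)


def tmr_mttf(lam, lam_voter=0.0):
    """Equation (6.10); with lam_voter=0 it reduces to Equation (6.11)."""
    return 3 / (2 * lam + lam_voter) - 2 / (3 * lam + lam_voter)


# Data of Example 6.2
mission_reliability = tmr_reliability_at(500, 0.0004, 0.0002)
mean_time_to_failure = tmr_mttf(0.0004, 0.0002)
```

Calling `tmr_mttf(0.0004)` without a voter failure rate gives the perfect-voter value 5/(6λ), which is a quick consistency check between Equations (6.10) and (6.11).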
Thus, the TMR system with voter reliability and mean time to failure are 0.8264 and 1571.42 hours, respectively.

N-Modular Redundancy (NMR)
This is the general form of the TMR (i.e., it contains N identical modules/units instead of only three units). The number N is any odd number, and the NMR system can tolerate a maximum of n module/unit failures if the value of N is equal to (2n + 1). As the voter acts in series with the N-module system, the complete system malfunctions whenever a voter unit failure occurs. The reliability of the NMR system with independent modules/units is given by [12,25]

$$ R_{nmv} = \left[ \sum_{j=0}^{n} \binom{N}{j} R^{N-j} (1 - R)^{j} \right] R_{v}, \qquad \binom{N}{j} = \frac{N!}{(N-j)!\, j!} \tag{6.12} $$
where
R_{nmv} is the reliability of the NMR system with voter.
R_{v} is the voter reliability.
R is the module/unit reliability.

Finally, it is added that the time-dependent reliability analysis of an NMR system can be performed in a manner similar to the TMR system reliability analysis. Additional information on redundancy schemes is available in Nerber [26].
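As an illustration, the NMR reliability can be evaluated with the sketch below. It assumes the standard majority-voting form of the expression (the system works as long as at most n of the N = 2n + 1 independent modules have failed and the voter works); the function name and argument names are hypothetical.

```python
import math


def nmr_reliability(n_modules, r, r_voter=1.0):
    """Reliability of an N-modular redundant system with a voter in series.

    n_modules must be odd (N = 2n + 1); r is the module reliability and
    r_voter the voter reliability. Sums the probabilities of exactly
    j = 0..n failed modules, then multiplies by the voter reliability.
    """
    if n_modules < 1 or n_modules % 2 == 0:
        raise ValueError("n_modules must be a positive odd integer")
    n = (n_modules - 1) // 2  # maximum number of tolerable module failures
    working = sum(
        math.comb(n_modules, j) * r ** (n_modules - j) * (1 - r) ** j
        for j in range(n + 1)
    )
    return working * r_voter
```

For N = 3 the sum collapses to 3R^2 - 2R^3, so `nmr_reliability(3, r, r_voter)` agrees with the TMR expression of Equation (6.1), and N = 1 reduces to a single unit in series with the voter.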