
Error Chain Model

While Edwards' SHELL model was useful in identifying causal factors and their linked relevance post-accident, a different and simpler model started to emerge in the mid-1970s. The new model is known as the Error chain and, in short, explains that a significant event is likely to transpire when seven (plus or minus two) events occur. This new hypothesis was supported by Boeing Commercial Airplane Company's suggestion that an accident is unlikely to result from a single point of failure, such as the loss of a single hydraulic circuit. Rather, a series of events transpires, and when these occur in a given flight, the outcome is serious, with a crash or major accident taking place.

Figure 6.5 Error chain illustrating the individual 'links' of the chain forming an accident.

Figure 6.5 represents the Error chain model. The model is based on the theory that each accident comprises 7 ± 2 individual events, i.e. a minimum of five or a maximum of nine separate events.

These chain elements are based on the following terms:

  • 1. Ambiguity. Any time when two or more sources or persons do not agree.
  • 2. Fixation or Preoccupation. The (excessive) focus of attention on any one item or event to the exclusion of all others.
  • 3. Confusion. A sense of uncertainty or anxiety in the context of the event.
  • 4. No One Flying the Aircraft. A lack of monitoring of the current state and progress of the flight.
  • 5. No One Looking Out of the Window. A 'heads-down' attitude, where flight crews fail to continuously look outside to prevent events such as CFIT or mid-air collisions.
  • 6. Use of an Undocumented Procedure. The application of an unauthorised procedure in abnormal or emergency conditions, i.e. deviation from the manuals and checklists.
  • 7. Violating Limitations or Minimum Operating Standards. Deliberate, intentional deviation from minimum standards, such as weather, landing or take-off speeds, etc.
  • 8. Unresolved Discrepancies. The flight crew fails to resolve conflicts of opinion, information or changes in conditions that are necessary to generate consensus.
  • 9. Failure to Meet Targets. The flight crew fails to achieve given targets, such as airspeeds, minima, altitudes, etc.
  • 10. Departure from Standard Operating Procedures. The flight crew intentionally departs from the prescribed SOPs, with the intention of saving time.


  • 11. Incomplete Communications. Communication is not fully effective because information is withheld. This failure to share information impedes other activities, leading to misunderstanding, confusion or disagreement.

Figure 6.6 Error chain model with a LINK broken - resulting in the 'no accident' concept.

When reviewing accidents, applying the Error chain model allows the reviewer to identify the major causal factors clearly and simply, with the understanding that a combination of 7 (±2) of the above factors will summarise the overall event. Another advantage of the Error chain is its preventative philosophy: if one single link event is removed from the equation, then the final result (i.e. the accident) will not transpire. This 'break the chain to prevent an accident' philosophy is the principal advantage of this newer model, illustrated by Figure 6.6.
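The break-the-chain concept can be sketched as a minimal simulation (an illustrative sketch only; the link names are drawn from the list above, and the seven links chosen for this hypothetical flight are an assumption):

```python
# Minimal sketch of the Error chain concept: an accident is modelled as
# requiring an unbroken chain of 7 +/- 2 contributing link events;
# breaking any single link prevents the accident.

def accident_occurs(links_present):
    """The accident transpires only if every link in the chain is present."""
    return all(links_present.values())

# A hypothetical flight with seven links present (within the 7 +/- 2 range).
chain = {
    "ambiguity": True,
    "fixation": True,
    "confusion": True,
    "no_one_flying": True,
    "undocumented_procedure": True,
    "violating_limitations": True,
    "departure_from_SOPs": True,
}

print(accident_occurs(chain))  # every link intact: the accident transpires

# Breaking one link - e.g. resolving the ambiguity - breaks the chain.
chain["ambiguity"] = False
print(accident_occurs(chain))  # the accident no longer occurs
```

The all-or-nothing logic is the point of the model: any one of the 7 ± 2 links removed is sufficient to prevent the final outcome.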

Flight Crew Training to Prevent Events

After the Tenerife and Staines accidents, the aviation industry took bold steps to educate flight crews about the potential problems that can arise in an operational context, with the introduction of regular bespoke awareness training. This early recurrent training, known as the Crew Coordination Concept (CCC), focused on identifying the elements that contributed to the Staines crash. Heavy emphasis was given to designated pilot duties on the flight deck: the designation of the Pilot Flying (PF) exclusively to fly the aircraft; improved exchange of information between the crew, including transparent communication; and the monitoring and supportive role of the Pilot Not Flying (PNF). While the leading edge and trailing edge controls became integrated into a single control lever, the underlying problem of cohesion on the flight deck continued to be an issue. This was in part attributed to the different life experiences of the pilots, emerging either from a military background, where orders are followed and not questioned, or from civilian training, where a more liberal approach to learning and human performance was taught. After the events surrounding the Tenerife crash, the training philosophy was adjusted to incorporate the newer Error chain model, which succeeded the SHELL model.

In the early 1980s, the CCC training was expanded to include the professional roles that cabin attendants play in an event. For example, Saudia Flight 163 on 19 August 1980 highlighted the importance of the cabin crew's role. Flight 163 suffered an in-flight fire shortly after take-off from Riyadh Airport en route to Jeddah. Although the aircraft returned to make an emergency landing at Riyadh, the cabin crew did not commence a full evacuation when the aircraft came to a stop. All 301 souls on board perished due to smoke inhalation from the cargo fire, resulting in Saudi Arabia's worst aviation loss of life (see Figure 6.7).

The CCC training was expanded to include the positive views and actions of the cabin crews, and the training became known as Crew Resource Management (CRM). The inclusion of the cabin crews in flight crew training was very positive. There existed an 'us and them' culture, where the flight deck door was known to impede the sharing of information. This psychological barrier (the flight deck door) was mitigated by a new workplace communication strategy: the encouraged sharing of experiences among all the crews from all areas of the aircraft. Another useful improvement in this cooperation was seen in airline briefing rooms. Pilots and cabin crews have traditionally briefed separately, which has always been the industry norm; the solution to the 'them and us' culture was for the senior Captain of the flight to visit the cabin crew's briefing prior to departure. Prior to the 1980s, it was not uncommon for the pilots and cabin crews to meet for the first time inside the aircraft, because many airlines had separate ground transportation to move each group (pilots or cabin crews) from security screening to the aircraft.

Figure 6.7 Saudia Flight 163 in-flight fire disaster: the aircraft landed, but all passengers and crew perished before they could escape, 1980. (Leigh Kitto.)

The advantages and effectiveness of CRM were demonstrated by other accidents in the later years of the 1980s. Weak CRM was a factor in the loss of life in events such as the British Airtours fire (Manchester Airport, UK) in August 1985, the British Midland crash at Kegworth (East Midlands Airport) in January 1989 and the Air Ontario crash (Dryden Airport, Canada) in March 1989. All three of these accidents were used as case studies by airlines throughout the world, so lessons could be learned in the hope of preventing similar events.

Professor James Reason's Swiss Cheese Model

Professor James Reason is one of the most important and revolutionary influences on error models and accident prevention in recent times. Reason, appointed to a Chair in Psychology at the University of Manchester, UK, conceptualised a radically different model in the late 1980s, based on numerous underlying observations of the human condition. Reason reviewed several well-known disasters (e.g. the Piper Alpha oil platform disaster, the Challenger shuttle explosion, the Chernobyl nuclear disaster, etc.) with a different view from most accident reports. The underlying principle that Reason surmised is that employees have good, well-meaning intentions at work: deliberate acts such as sabotage are possible, but such deliberate, unlawful acts are already managed within the context of the law. However, events do take place in the workplace that are outside the direct control of the individual, where the final outcome of the mishap is not intended. Reason identifies that safety-critical industries, such as nuclear power generation or the space industry, are very rule-based and procedure-driven, yet both of these industries suffered very significant failures in the 1980s. Reason further stated that different types of errors (or single failures) exist, and these differ significantly.

An Active Failure is an occurrence where the operator of the system immediately recognises the failure: typically, active failures are easy to identify due to their immediately apparent nature (e.g. a mechanical pump failing).

The other type is known as a Latent Failure, referring to an error that has been introduced into a system through an unintended action. The failure is hidden from view and will only be detected at a later time, when other events transpire. Due to this latent condition, these failures are by their very nature complex and extremely difficult to identify in advance of a major event. Typically, latent failures can lie dormant for many years, and more than one such failure is likely to be associated with a system.

Reason suggests that different risk-averse industries have attempted to prevent failures by introducing preventative barriers. In the context of aviation, these barriers could include the following elements:

  • The aircraft's initial design and manufacture are subject to 'very careful oversight' by the national aviation authority responsible for certification
  • Pilots require extensive basic and type training on aircraft and must perform to a high standard in written and practical examinations
  • Aircraft engineers likewise require extensive basic and type training on aircraft and must perform to a high standard in written and practical examinations
  • All maintenance is carried out by engineers who hold the appropriate licences (above), following written maintenance instructions from the aircraft manufacturer precisely
  • All calibrated specialist tooling is in good condition, checked and certified at given intervals
  • The planning of all maintenance activities is performed by technical staff, who review the supplied literature from the manufacturer and supply it in a working format to the engineering staff
  • Etc.

Each of the above bullet points is considered a barrier, or protective layer, introduced because historical events have taken place. These barriers require some 'tuning' to prevent the same event happening again. It is this breaching of barriers or protective layers that Reason explains as the Swiss Cheese model. Reason's book Human Error (1990) explores this new Swiss Cheese theory, and his subsequent book Managing the Risks of Organizational Accidents (1997) contains numerous airline observations.

Figure 6.8 represents Reason's Swiss Cheese model. The layers of Swiss cheese with holes in them represent the barriers that have been introduced. All barriers have 'holes' that appear and disappear, and are therefore considered dynamic. Errors can penetrate single layers of protection, and sometimes multiple layers of defence. If a single barrier rejects the error (i.e. identifies it as a problem), the protection has been successful, and the potentially tragic outcome is averted.

Figure 6.8 Swiss Cheese model showing individual barriers intact.

No single system can be 100% safe all the time and prevent all accidents and serious events. When errors do occur, which is expected, an error that passes through a barrier's holes moves on to the next layer. If the error is of a type that can penetrate through all the barriers, then an accident or significant event will occur. In an aviation context, this can include the loss of the aircraft, the passengers' lives and any other persons on the ground who interact with the aircraft. Figure 6.9 illustrates how these barriers can be identified and how the error(s) can then penetrate through all the layers of protection, resulting in the accident.

The Swiss Cheese model explains many events, and the significant difference between this and other models is the awareness that can be gained by educated individuals on the front line (i.e. pilots, engineers, air traffic controllers, etc.). Additionally, second-tier staff members who are not directly involved in hands-on maintenance or flying activities, but work in support roles (e.g. technical services or general engineering staff), can be made aware of the possibility of their own creation of events that, if undiscovered, will be classed as latent failures.
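The layered-defence idea can be sketched in code as a minimal illustration (this sketch is not part of Reason's published work, and the barrier names are hypothetical examples drawn from the earlier bullet list):

```python
# Minimal sketch of the Swiss Cheese model: each barrier is a protective
# layer whose 'holes' are dynamic. An error leads to an accident only if
# it finds a hole in every layer; a single intact layer stops it.

def error_penetrates(barrier_holes):
    """True if the error finds a hole in every barrier layer."""
    return all(barrier_holes.values())

# Hypothetical barriers (names illustrative), with the current hole state.
barriers = {
    "design_certification": False,  # False = no hole, the layer is intact
    "pilot_training": True,
    "maintenance_procedures": True,
    "calibrated_tooling": True,
}

print(error_penetrates(barriers))   # one intact layer rejects the error

barriers["design_certification"] = True  # the holes now line up
print(error_penetrates(barriers))   # the error penetrates all layers
```

The contrast with the Error chain is worth noting: in the chain, every link must be present for the accident; in the Swiss Cheese model, every barrier must have a hole. In both cases, one sound element is enough to prevent the event.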

Reason also makes a profound observation: staff go to work to perform their duties to the best of their ability. Sometimes things do not go as planned, and when the management of the organisation discovers this, the management feels it necessary to impose punitive measures. Reason states that, typically, the employee with the defective performance is sanctioned and privileges are withheld (often with a financial implication); the individual is then subjected to retraining because it is believed they require upskilling to perform better. Neither of these actions is effective; this 'blame and train' culture is counter-productive, as a future failure remains probable. If the initial failure was not a deliberate act, then why would a sanction and additional training help prevent such possible future failures? Furthermore, when the next failures do occur, management takes the view that the individual has chosen to fail; therefore a more severe punishment is required to remedy this lack of performance. Reason rightly explains (1997) that staff in such an organisation will be reluctant to report failures, events and occurrences to the company, and the likelihood of latent failures remaining undetected is high. Staff will cover up their own failures, and the management will be totally unaware of the levels of failure. To improve upon this negative, vicious circle, Reason (1997) promotes the substitution test, namely for the manager to put themselves in the employee's shoes. The substitution test simply states: if you find you could make this mistake, what are you (the manager) going to do about it; and if you state you wouldn't make the mistake, then what makes you so special?

Figure 6.9 Swiss Cheese model showing all barriers breached and the accident.

Reason's application of the Swiss Cheese model to commercial aviation in the late 1990s achieved remarkable success and adoption. This was because the worldwide accident rate at the time suggested that, if nothing was put in place, by 2015 a hull loss with significant loss of life would be happening on a weekly basis. The adoption of the Swiss Cheese theory, and the mandated Human Factors (HF) training for all staff in an airline who can have an operationally detrimental effect through latent and active errors, has been instrumental in improving the safety record of the industry since 2001. This mandated training requirement has resulted in a significant reduction in losses of aircraft and loss of life, as supported by data published by Boeing and various aviation safety organisations. Prior to the mandating of initial and recurrent HF training, the UK Civil Aviation Authority (CAA) conducted industry briefings for all accountable staff in UK-registered airlines (between 1999 and 2001). The CAA's message to the airlines was that the necessary improvements in human performance and accident reduction should not be 'sold' to staff principally on the basis that the subject will save lives. This is because measuring safety and accidents presents a simple dilemma: commercial aircraft accidents are very infrequent. The lack of accidents leads to a false sense of achievement, because the absence of events in the preceding month, months or years implies that the airline is fully resilient to possible future crashes. Reason confirms this difficulty, as demonstrated by the lack of catastrophic experience in other, non-aviation industries followed by massive losses (e.g. Challenger, Chernobyl, etc.). The CAA stated in presentations that corporate commitment would be present when safety improvements resulted in fewer technical delays, fewer errors and a more productive work environment. Furthermore, the modus operandi suggested to gain the necessary full corporate commitment from the airline was to imply that '3% savings can be achieved from the engineering budget'.

Therefore, the financial savings from performing the work correctly, first time and every time, are significant and measurable, and the accident reduction was a welcome by-product of better working practices.

Safety Management Systems

The SMS is a collective system of elements with the principal objective of enhancing safety. This is achieved through an organisation creating its own bespoke practices and operations from four distinct areas:

  • Safety Policy - the structure of how the organisation responds to legal requirements and operational delivery. This policy creation and commitment requires corporate endorsement and defines methods/processes and objectives.
  • Risk Management - in the most simplistic evaluation, individual risks are assessed against likelihood and severity. For example, the likelihood might scale from highly improbable through to highly likely. Likewise, the consequence might be scaled from very minor damage or injury (e.g. a bruise) to a catastrophic event (e.g. a crash with loss of life). Using this simplistic two-parameter assessment, a matrix of numbers (often colour-coded) can be produced, where each cell contains the product of the two corresponding factors. The highest numbers (e.g. certainty of event, with death being the result) score the largest risk value. The model is used to quantify the risk, and control measures to mitigate the likelihood and effects of the occurrence are included.
  • Safety Assurance - evaluates independently how well the system is performing and whether the risk management is effective. Measurements can be performed to assess losses in a unit, whether financially, through loss of technical dispatch reliability (a commercial aircraft being available for revenue service) or even through sickness and absenteeism.
  • Safety Promotion - the continuous training and communication activities for all staff, to enhance the safety culture in the whole business.
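The two-parameter risk assessment described under Risk Management can be sketched as follows (the 1-5 scales and the colour bands are illustrative assumptions, not a regulatory standard):

```python
# Minimal sketch of a likelihood x severity risk matrix.
# Likelihood: 1 (highly improbable) .. 5 (highly likely)
# Severity:   1 (very minor, e.g. a bruise) .. 5 (catastrophic, e.g. a crash)

def risk_score(likelihood, severity):
    """Each cell of the matrix is the product of the two factors."""
    return likelihood * severity

def risk_band(score):
    """Illustrative colour-coded bands for the resulting score."""
    if score >= 15:
        return "red (intolerable - immediate mitigation required)"
    if score >= 6:
        return "amber (tolerable - control measures required)"
    return "green (acceptable - monitor)"

# Print the full 5 x 5 matrix, as often reproduced in SMS manuals.
for likelihood in range(1, 6):
    print([risk_score(likelihood, severity) for severity in range(1, 6)])

# The highest score: certainty of event with death as the result.
print(risk_score(5, 5), "->", risk_band(25))
```

The multiplication simply quantifies the qualitative judgement; the colour bands are then used to decide which risks demand control measures before the operation proceeds.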

The SMS is the combination of all four of the above items. While the content may appear broad and uninspiring (to some), the implication for the aviation sector is immense. Firstly, an SMS has been mandated in commercial aviation since 2013 by the International Civil Aviation Organization (ICAO), and although ICAO has no legal authority, the regional aviation authorities (including the European Aviation Safety Agency) have the legal framework to ensure corporate organisations fully participate. In the context of aviation, an interesting inclusion is the extension of the SMS to third-party repair and overhaul stations, because it is not only airlines that experience safety problems requiring close management and improvement.

An SMS is not a transferable product or software solution that can be sold from one company to another. Every organisation is different because staff in companies have different working practices and procedures. The SMS is, by definition, a unique collection of tools that have been adapted and applied to an organisation.

Other industries (aside from the aviation sector) have adopted the SMS as a regulatory minimum. The maritime industry has adopted the practices, as have certain national railway authorities, and the resemblance to the successful aviation adoption is very noticeable. The guiding principles are identical: deliberate acts (i.e. sabotage) require deliberate sanctions (i.e. custodial sentences), but since most deviations from approved procedures or processes result from external factors outside the individual's control, the management's use of the substitution test after an event demonstrates the effectiveness of human performance and the SMS.


The aviation industry has demonstrated for many years a willingness to learn from mistakes and events. After a significant event (such as a plane crash), the causal factors are identified and the national accident investigation bodies make these findings public via their published reports. This openness has allowed a number of models to be developed over the years, with each model explaining the different interactions and complexities that appear. The models have all been developed and applied to different events, with the principal objective of learning from the past, because there is no advantage to the industry if the same type of accident occurs over and over.

While the models are all excellent in explaining factors pertaining to previous accidents, the Swiss Cheese model is particularly advantageous in allowing an organisation to benchmark how close to the edge it currently is, without the need to experience catastrophic events. This self-awareness is the underlying principle that aligns many existing procedures and management processes to form an effective SMS.

The weakness of any safety critical system is the infrequency of the recorded events, and senior management must continue to ensure they fully appreciate all the challenges that the staff experience on the front line (e.g. pilots, engineers, air traffic controllers, airline engineering planners, etc.). Failure of the management to grasp the actual challenges and control them will likely result in Swiss Cheese 'holes' appearing in existing systems. History shows us that when enough of these holes line up, it is more likely that a catastrophic event takes place. The senior management (from the Chief Executive Officer down) must accept that as named 'accountable persons' they will be the first airline employees to be prosecuted by the national aviation authorities, thus the 'us and them' culture cannot be an excuse after an event.

The next chapter will discuss the changes to physical barriers and processes associated with security events in commercial aviation, such as 9/11. Additionally, other relevant occurrences, including non-aviation security events such as liquid-based improvised explosive devices, are included.

