Exploring Structural Elements of a Resilient System
The previous section elaborated on the idea of systems as enacted decisions. The system as envisaged in my hierarchical model is an abstract framework that describes the space where decisions are made but the purpose of the system is to achieve an outcome. It is a dynamic entity, not a static structure. The next case study illustrates the space in which action occurs. I want to explore boundaries, margins and tolerance. The event itself is relatively straightforward, albeit with a tragic outcome. It is another example of the banality of failure. It is the circumstances that allowed the event to happen that are of interest to us.
On 23 December 2003 a Learjet 24B took off from San Bernardino County Airport bound for Hailey, Idaho (NTSB, 2006). Climbing through 26,000 feet, the FO requested a return to San Bernardino. When asked if he wanted to declare an emergency, he declined. The controller cleared the aircraft to 24,000 feet. The initial descent was normal but the aircraft quickly pitched nose down, losing height at 10,000 fpm. Ninety seconds later the aircraft crashed into the high desert. Such was the impact force that little of the aircraft was recoverable. Tissue samples from the captain were collected but no identifiable remains of the FO were found. The weather was fine, no problems were reported with the aircraft and no other aircraft were in the vicinity. Apart from two recent incidents where the left engine flamed out in flight, we have no idea of what caused this accident.
It is a sad, tragic irony that the complete lack of evidence about the performance of the crew allows us to fully focus on the system the crew were working within. The context of the operation is summarised below and illustrated in Figure 3.2:
FIGURE 3.2 The Learjet ‘System’.
The goal of the system, on this day, was to undertake a flight on behalf of a customer. In order to be considered successful, two constraint sets needed to be satisfied. The first related to the nature of the dispatch of the aircraft and the second related to the condition of the aircraft. The boundary was created by the decision to undertake the job for the ‘loyal paying customer’, which set the point by which the constraints needed to be satisfied. Although the aircraft was technically owned by one company, it was managed on a daily basis by another and maintenance was provided by a third. This complicated arrangement is commonplace and allows private owners of aircraft to generate revenue. It also adds to complexity in that the behaviour of the system cannot necessarily be explained by the functioning of the component parts. This configuration illustrates the concept of interfaces: internal boundaries, in effect.
Interfaces in this event existed between the FAA and both the aircraft owner and the maintenance provider. There was also an interface between Pavair and JETT. Each party engaged in transactions across their respective interfaces. Transactions included work requests, completion sign-offs, form submissions, provision of requested information, checking and verification of documents, communication of decisions and the issuing of approvals. These various processes were effortful in that they required time and activity to satisfy and, then, to validate that they met requirements.
Dispatching aircraft under Part 91 or Part 135 approvals requires adherence to different sets of rules: one is simpler than the other. One set applies to private flying while the other covers commercial operations. A business jet might be dispatched under Part 91 regulations for an empty, positioning sector, with Part 135 rules then followed for the revenue sector. In this case, it appears that Pavair was operating in its own right and, thus, because there was no passenger on board when the aircraft departed, the first sector was technically legal. We have no way of making a judgement about the second sector with the ‘loyal customer’ on board. The first set of goal constraints was initially met.
The second set of constraints that needed to be satisfied related to the condition of the aircraft. The FAA provided the regulatory framework for both airworthiness and operations. It exercised control by establishing formal requirements for work processes and for documentation, by setting standards of evidence to satisfy requirements, by authorising agents to act on its behalf and, ultimately, by granting or withdrawing permissions and approvals. Alongside formal inspections, logbooks and official forms provided the feedback the FAA could use to renew permission to fly. The FAA had requested evidence in relation to the aircraft, which did not arrive.
The system boundary is the last point at which the constraints we have identified can be satisfied before the system moves from a valid state to one that is invalid. A valid state is one that can be considered safe or viable. On this day, the boundary was established once the decision had been made to collect the passenger. The margin is the zone in which behaviour relative to the specific boundary is manifested. There were two signals of system status: the technical condition of the aircraft and the probability of compliance with the dispatch regulations for the second sector. We have no evidence of how Pavair intended to resolve the latter condition. We do know how the first condition was managed. A request was made by a third party for permission to operate the aircraft.
Buffering describes the variability of performance the system can accommodate. In effect, it captures the range of alternative behaviours that might be seen to depart from a notional ideal template and, yet, still allow the task to be done. It seems that significant periods of the aircraft’s maintenance history were undocumented. Information was held in different locations and, yet, the aircraft continued in use. If we look at the action taken to release the aircraft, we see surrogates interacting. The management of the part-processed FAA Form 337 is a further example. Despite all of this behaviour, chaotic as it was, the system remained intact. In lieu of actual sight of the requested documents, permission was granted on the basis of an understanding that the information had been sent. There is no evidence to suggest that either party knew that statement to be true. My analysis makes no suggestion that either the FAA inspector or the JETT engineer was culpable. They were simply acting to achieve an outcome. The act that breached the boundary was the release of the aircraft on the basis of a promise. Of course, this did not cause the aircraft to crash but, even if the crew had been successful in fulfilling the promise to the loyal customer, the system would still have failed. This suggests that systems can possess significant buffering capacity. This example raises questions about the legitimacy of actors. Although the aircraft was owned by Pavair, communication was left to an engineer from JETT. Equally, the approval was granted not by the inspector with responsibility for the case but by another, who happened to be in the office and took the call.
Tolerance describes the robustness of the system when operating close to the boundary. The one element of this story that we have not been able to examine is the behaviour of the flight crew. All we can say is that, in the space of a few minutes, the flight went from being within the control of the crew - the FO had declined to declare an emergency when asked - to catastrophic. In this case tolerance was brittle. If we look at the management of the aircraft, we cannot say that degradation was graceful. The system did not degrade to a lesser but still safe state. This is an example of what I consider an inert state. Inadequacies in the management of the aircraft’s airworthiness status were apparent and action was demanded but no meaningful progress was being made.
Efficacy describes the probability of an intervention achieving the goal within the margin, before the boundary is breached. The only evidence we have to assess efficacy in this example is the use of an informal telephone call and a subsequent verbal approval based on a promise that action had been taken. The formal process had been usurped by, probably, well-intentioned actors. The action was expedient but not efficacious in that the problem was not solved.
This first exploration of the systems concept illustrates a number of key points. First, actors in a system engage with one another to achieve a goal. The goal must not be confused with an immediate task. The goal was to undertake a legal flight. The tasks were those processes associated with achieving the goal. Although the FAA was the controlling authority, the manner in which control was exercised allowed sufficient variability for the process to fail. The boundary and its associated margin, in this case, are a negotiated space in which work is accomplished. They are not absolutes. Any intervention that changes the available margin, such as the intention to operate the flight, modifies the risk attached to the situation. Buffering describes a broad range of activity. It is important to understand that we are not making a judgement about the correctness of the various actions taken. Buffering recognises that normal work comprises a multiplicity of solutions and describes a system’s ability to cope with non-ergodicity and radical uncertainty. Efficacy describes the value of work, by which I mean: given the current conditions, will the actions of individuals satisfy the applicable constraints before the boundary is reached? Efficacy is a measure of fitness, not ‘rightness’.
This example also illustrates how organisational complexity increases the opportunity for safeguards to be circumvented, for mechanisms of control to become confused and for communication to become fragile. The distributed nature of ownership, management and responsibilities in this case - who was actually accountable for what? - increased the probability of failure. The system functioned in the absence of the formal, expected protocols and artefacts.