
Training Assessors

Baker and Dismukes (2003) addressed some of the problems associated with training assessors. Quite often the introduction of assessment using markers represents a culture change, and it can also be quite intimidating for both the assessor and the trainee. The first step is to ensure that both sides of the process understand, and buy in to, the concept of assessment. Assessors must have a thorough understanding of the marker framework in terms of what it represents and where the boundaries between categories lie. They need to understand the grade scale and its use. Finally, they need to understand the process to follow in gathering and evaluating evidence.

Video clips of crews are probably the best medium for calibration and standardisation but can be challenging to create. Having crews rehearse non-normal events in a simulator can often provide ample evidence to observe. However, normal performance, especially on line checks, is often routine and, therefore, apparently lacking in events of note; nonetheless, assessing ‘normal’ work is still a valid training goal. Smartphones are capable of capturing adequate video material on the flight deck, but audio recording can sometimes be an issue. Where assessors are from different aircraft types, unfamiliar technical and procedural details can sometimes be a distraction. Video segments of around five minutes’ duration are usually adequate.

Training involves giving candidates exposure to examples of workplace performance so that they can identify specific episodes or samples of behaviour. A recommended approach is, first, simply to let assessors watch a video clip and then discuss what they saw. Quite often, trainees will initially offer a generic overview of the performance: they will broadly summarise what they saw in qualitative terms. The goal, however, is to identify specific instances of behaviour. Show the clip a second time, but now get the class to cite the episodes they observed. Having gathered the examples on a flip chart or whiteboard, the next step is to get the class to categorise their evidence against the marker framework. Where an episode possibly falls into more than one category, get delegates to explain their rationale. As a rule of thumb, an episode should be assigned to the marker for which it represents ‘best evidence’. That is to say, the most significant impact of the action observed tends to fall into one specific category. The activity is then repeated with a different video, but now each trainee collects and categorises evidence before sharing with their colleagues. Observing a performance and capturing evidence is a skill in its own right. Experienced trainers can often correctly evaluate the performance of a crew but struggle to support their conclusion with evidence.

Having collected and categorised the evidence, the next step is to assign a value to the performance. Calibration involves getting groups of assessors to discuss episodes and develop a consensus view about the most appropriate grade. Initially, it will be useful to look at the videos used for the initial observation task, as the class will already be familiar with the performance. The class will also have developed a richer set of samples against each marker based on the group discussion. At first, get the class to assign a grade to just one or two markers rather than attempt the whole suite at this stage. In any case, if manual handling is one of the categories, it will be very difficult to assess from a video. Each trainee assigns a grade and writes it on a post-it note. These are gathered in and then displayed on a whiteboard or flip chart. At this point, the spread of scores can often be surprising. Table 12.5 contains scores for ten trainees observing a video of a captain.

For one of the markers graded, the trainees giving the highest and lowest scores are asked to cite their evidence and explain their reasons for assigning the performance to that particular category. At this stage, their views are not challenged. The activity can then be repeated with another marker, this time with the discussion led by the facilitator.

TABLE 12.5 Allocation of Grades

Marker   A   B   C   D   E   F   G   H   I   J
AP       3   3   3   3   3   3   3   2   4   2
SM       4   2   3   3   3   3   4   3   3   2
TM       4   2   3   3   4   3   4   2   3   2
C        3   2   3   3   4   3   3   2   4   2

AP = application of procedures; SM = systems management; TM = task management; C = communication.
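Before the facilitated discussion, the spread shown in Table 12.5 can be summarised quickly. The short Python sketch below is illustrative only: it tabulates, for each marker, how many trainees awarded each grade. The grades are those shown in the table and the marker abbreviations follow the legend above.

```python
from collections import Counter

# Grades awarded by the ten trainees (A-J) in Table 12.5, one list per marker.
grades = {
    "AP": [3, 3, 3, 3, 3, 3, 3, 2, 4, 2],  # application of procedures
    "SM": [4, 2, 3, 3, 3, 3, 4, 3, 3, 2],  # systems management
    "TM": [4, 2, 3, 3, 4, 3, 4, 2, 3, 2],  # task management
    "C":  [3, 2, 3, 3, 4, 3, 3, 2, 4, 2],  # communication
}

for marker, scores in grades.items():
    counts = Counter(scores)
    spread = ", ".join(f"{n} x grade {g}" for g, n in sorted(counts.items()))
    print(f"{marker}: range {min(scores)}-{max(scores)} ({spread})")
```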

Trainees can now be invited to change their grading based on the views of their colleagues. Finally, using a new, unfamiliar video clip, the class has to gather the evidence, grade it and then arrive at a consensus view of the quality of the performance. The exercise is not complete until all agree on the correct score. Garrison et al. (2006), looking at IRR in coders of transcripts, found that agreement improved from 43% to 80% after a similar ‘negotiated agreement’ activity. The training course should finish with a final standardisation exercise where assessors have an independent opportunity to demonstrate competence in assessment. If possible, scores awarded by the group should be analysed using the rwg statistic.
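As a worked illustration of that final check, a minimal sketch follows, assuming the index intended is the single-item within-group agreement statistic rwg, which compares the observed variance of the grades with the variance expected if raters responded uniformly at random across the scale. The four-point grade scale and the 0.70 ‘adequate agreement’ cut-off are assumptions for illustration, not figures from the text.

```python
from statistics import variance

def rwg(scores, scale_points=4):
    """Single-item within-group agreement index.

    rwg = 1 - (observed variance of the grades / variance of a uniform
    null distribution), where the null variance for A scale points is
    (A**2 - 1) / 12.
    """
    expected_var = (scale_points ** 2 - 1) / 12
    return 1 - variance(scores) / expected_var

# Grades from Table 12.5 (ten trainees, four markers).
grades = {
    "AP": [3, 3, 3, 3, 3, 3, 3, 2, 4, 2],
    "SM": [4, 2, 3, 3, 3, 3, 4, 3, 3, 2],
    "TM": [4, 2, 3, 3, 4, 3, 4, 2, 3, 2],
    "C":  [3, 2, 3, 3, 4, 3, 3, 2, 4, 2],
}

for marker, scores in grades.items():
    value = rwg(scores)
    verdict = "adequate agreement" if value >= 0.70 else "recalibration needed"
    print(f"{marker}: rwg = {value:.2f} ({verdict})")
```

Values close to 1 indicate that the group’s grades cluster tightly; values near 0 suggest the raters are effectively guessing.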

Because of the issue of attitude shift - which could be either a hardening or a softening over time - periodic recalibration should be done at an annual workshop, where groups of assessors review a performance (Baker & Dismukes, 2002). Trainers assess and share their conclusions about the evidence and the grade awarded. A periodic review of assessor performance by inspection of grades awarded is also recommended.
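A sketch of what such a periodic review might look like is given below; it is not from the source, and the rater identifiers and quarterly figures are invented placeholders. It tracks the mean grade awarded per period, to expose any hardening or softening, and the exact-agreement percentage between raters who graded the same events.

```python
from itertools import combinations
from statistics import mean

def pairwise_agreement(ratings_by_rater):
    """Mean exact-agreement percentage across all pairs of raters who
    graded the same sequence of events."""
    pairs = combinations(ratings_by_rater.values(), 2)
    rates = [sum(a == b for a, b in zip(r1, r2)) / len(r1) for r1, r2 in pairs]
    return 100 * mean(rates)

def mean_grade_by_period(grades_by_period):
    """Mean grade awarded in each period; a steady rise or fall across
    periods suggests hardening or softening."""
    return {period: round(mean(g), 2) for period, g in grades_by_period.items()}

# Hypothetical placeholder data: three raters grading the same five events,
# and grades pooled by calendar quarter.
raters = {"R1": [3, 3, 2, 4, 3], "R2": [3, 2, 2, 4, 3], "R3": [4, 3, 2, 3, 3]}
quarters = {"Q1": [3, 3, 4, 2, 3, 3], "Q2": [3, 2, 3, 3, 2, 3]}

print(f"Exact agreement across rater pairs: {pairwise_agreement(raters):.0f}%")
print(f"Mean grade by period: {mean_grade_by_period(quarters)}")
```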

Rater bias, leading to unreliable assessment, is a significant and ongoing challenge. Tests of IRR will identify the presence of an issue but not necessarily the cause. The stability of the assessment system across time can also be tracked statistically but, again, this will not identify the cause of any discernible drift. For example, a change in pilot recruitment demographics may influence the data, or a refresh of the rater cadre, with a resulting drop in the group’s shared expertise, could influence performance. From a systems safety perspective, meaningful assessment is a guide to resilience in that it points to the likely efficacy of crew interventions. Reflecting on the reasons for assessing, the data gathered should also point to gaps between the operational risk profile and the capability of crews to cope. An assessment system lacking validity and reliability is an operational risk.

 