Reliability of references
Early research on the reliability of the employment reference produced pessimistic results (Muchinsky, 1979). For example, a study examining letters of recommendation in the US civil service found that ratings of the same candidate from different referees correlated at only 0.40 (Mosel & Goheen, 1959). This is somewhat lower than, but still comparable to, the agreement obtained in multi-source or 360-degree feedback settings, where inter-rater reliability can approach 0.60 (Murphy & Cleveland, 1995). Such modest agreement is to be expected, as people may show different aspects of themselves to different people. As Murphy and Cleveland (1995) argue, there would be little point in using multiple sources if we expected them to provide the same information. A similar contradiction is well known in academic grading, where exams are frequently double-marked by faculty only for the markers to agree on similar marks in the end (Baird, Greatorex & Bell, 2004; Dracup, 1997). However, inter-rater agreements of 0.60 are low and mean that only 36% of the variance in candidates’ attributes is accounted for, leaving the remaining 64% unexplained.
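The 36% figure follows from squaring the agreement coefficient, on the reading adopted in the text that the square of a correlation between raters gives the proportion of variance accounted for. As a brief worked sketch (the symbol r is introduced here for illustration only, and the second line simply applies the same arithmetic to the 0.40 reported by Mosel and Goheen, 1959):

```latex
\begin{align*}
r = 0.60 &\;\Rightarrow\; r^{2} = 0.36 \ (36\%\ \text{accounted for}), & 1 - r^{2} &= 0.64 \ (64\%\ \text{unexplained}) \\
r = 0.40 &\;\Rightarrow\; r^{2} = 0.16 \ (16\%\ \text{accounted for}), & 1 - r^{2} &= 0.84 \ (84\%\ \text{unexplained})
\end{align*}
```

On this reading, the agreement between referees found in the early civil service study would leave an even larger share of variance unexplained than the multi-source feedback benchmark.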
This low reliability has been explained in terms of evaluative biases (Feldman, 1981) attributable to the personality characteristics of the referee. A referee’s mood when writing the reference will influence whether the reference is more or less positive (Judge & Higgins, 1998). This is in line with Fiske’s well-known finding that emotional labels, notably extreme ones, are used to categorize factual information about others (Fiske, 1980). Thus, when referees retrieve information about candidates, their judgement is already clouded by emotional information (often as simple and general as ‘good’ or ‘bad’). Some of the sources of such mood states are dispositional (e.g., emotionally stable and extraverted individuals more frequently experience positive affect states, whereas the opposite applies to neurotic, introverted people), and personality characteristics can have other (non-affective) effects on evaluations, too. Thus the ability, personality and values of the referee shape the unstructured reference so much that it has more to do with the compatibility between referee and candidate than with the candidate’s suitability for the job. It is, however, noteworthy that little research has been conducted in this area, so these hypotheses remain speculative.
More reliable information can be obtained from reference letters if different raters base their ratings and conclusions on the same information. For instance, as early as the 1940s the UK Civil Service Selection Board (CSSB) examined multiple references for the same candidates (e.g., from school, university, the armed forces and previous employment), written by different referees. Results showed that inter-rater reliabilities for a panel of five or six people can be as high as 0.73 (Wilson, 1948). However, few employers can afford to examine such detailed information. Furthermore, even if consistency indices such as inter-rater reliabilities are adequate, that does not mean that employment references will be valid predictors of job-related outcomes. Indeed, the validity of references has been an equally important topic of concern when assessing the utility of this method in personnel selection.