Psychometric Properties

In establishing the job-relatedness of a testing process, defendants must pay particular attention to the psychometric properties of their selection tools. There are several important factors that employers should focus on when demonstrating the job-relatedness of their tests, many of which are outlined in the various professional standards for employment testing.


Reliability concerns the amount of random error variance in a set of test scores. In other words, an observed test score is partly a ‘true score’ (what the test is designed to measure) and partly error (both systematic and random). Estimates of reliability specifically assess random error: the less variance in test scores that can be attributed to random error, the more that can be attributed to the true score. Thus, estimates of reliability concern the test’s precision, or consistency, in producing results. This consistency in turn sets limits on the validity (or job-relatedness) of a test. Specifically, a test cannot be job-related (i.e., valid) if it is not first shown to be reliable. For employment tests, it is practically impossible to remove all random error; there will always be some discrepancy between the assessment of a candidate’s potential and the candidate’s true potential.

Because selection tests cannot be entirely free of random error, rank-ordered test scores are unlikely to reproduce perfectly the rank order of the ability or knowledge a test is designed to measure. To account for random error in measurement, researchers have proposed using test bands to identify groups of candidates who are expected to be interchangeable, or near-interchangeable, on their true scores (Cascio, Outtz, Zedeck & Goldstein, 1991). Test bands use a test’s reliability to compute a standard error of the difference (SED). The SED is then used to specify bands such that all candidates within a band may overlap, and hence not differ, in their true potential.
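The band computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular employer or court case. It assumes the commonly used formulas SEM = SD × √(1 − reliability) and SED = SEM × √2, a band width of 1.96 × SED (one conventional multiplier among several possible choices), and fixed rather than sliding bands. The function names, scores, standard deviation, and reliability value are all hypothetical.

```python
import math


def standard_error_of_difference(sd: float, reliability: float) -> float:
    """Return the SED, assuming SED = SEM * sqrt(2) with SEM = SD * sqrt(1 - r)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    sem = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    return sem * math.sqrt(2.0)


def fixed_bands(scores, sd, reliability, z=1.96):
    """Group scores into fixed bands.

    Each band starts at the highest score not yet banded and includes every
    score within z * SED of that top score. The multiplier z is an assumption.
    """
    width = z * standard_error_of_difference(sd, reliability)
    bands = []
    for score in sorted(scores, reverse=True):
        if bands and bands[-1][0] - score <= width:
            bands[-1].append(score)  # within the band width of the band's top score
        else:
            bands.append([score])  # this score opens a new band
    return bands


# Hypothetical data: six candidates, SD = 10, reliability = 0.90.
example_bands = fixed_bands([95, 93, 91, 90, 84, 80], sd=10.0, reliability=0.90)
print(example_bands)
```

With these hypothetical inputs the band width is roughly 8.8 points, so the four highest scores fall into one band and the remaining two into a second. A less reliable test yields a larger SED and therefore wider bands.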

Although candidates are first rank-ordered to establish bands, hiring in a banding approach is then conducted using a secondary criterion (e.g., random selection) within bands rather than starting with the top-ranked candidate. Thus, the banding approach is an alternative to simply hiring in a top-down fashion. Because ranking one candidate over another is potentially imprecise when a test contains error, test banding minimizes the potential negative effects of a test’s ‘unreliability’. This can be particularly valuable from a diversity standpoint. When using test bands, more minority candidates can be included in the top group for hiring and therefore have a greater chance of being hired. For example, if the two top-ranked candidates are both White, but the two candidates ranked directly below them are Black, a test band including all four candidates will result in an increased chance of the two Black candidates receiving initial offers.
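The four-candidate example can be made concrete with a short calculation. The sketch below assumes that exactly two initial offers are made and that selection within the band is by simple random draw. Both are assumptions for illustration, since the number of offers is not stated and random selection is only one possible secondary criterion. The function names are hypothetical.

```python
from fractions import Fraction


def offer_probability_top_down(rank: int, n_offers: int) -> Fraction:
    """Strict top-down hiring: only the n_offers highest-ranked candidates get offers."""
    return Fraction(1) if rank <= n_offers else Fraction(0)


def offer_probability_random_in_band(band_size: int, n_offers: int) -> Fraction:
    """Random selection within one band: every member has the same chance."""
    return Fraction(min(n_offers, band_size), band_size)


# Assumed scenario: a band of four candidates and two initial offers.
top_down_rank_3 = offer_probability_top_down(rank=3, n_offers=2)
banded_any_member = offer_probability_random_in_band(band_size=4, n_offers=2)
print(top_down_rank_3, banded_any_member)
```

Under these assumptions, the candidates ranked third and fourth have no chance of an initial offer under top-down hiring, while each of the four band members has a one-in-two chance under random selection within the band.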

As such, employers can improve diversity in their hiring without using subgroup norming, which is prohibited by the Civil Rights Act of 1991. Test banding is a relatively recent development and remains a controversial strategy (for reviews, see Bobko & Roth, 2004; Campion et al., 2001). Although courts have generally upheld test banding in selection procedures when decisions were not based solely on race (e.g., Chicago Firefighters v. City of Chicago, 2001; San Francisco Fire Fighters Local 798 v. San Francisco, 2006), it has yet to be reviewed by the Supreme Court. Furthermore, test banding has been found to be improper in other contexts, such as promotions (Massachusetts Association of Minority Law Enforcement Officers v. Gerald T. Abban and Others, 2001). Therefore, both the use of banding and the method for making decisions within bands must be considered and reviewed carefully before implementation (Hanges et al., 2013).

