Overview of Prior Research
Research on SJTs has mushroomed since their reintroduction into the academic literature by Motowidlo and colleagues (1990). The vast majority of this research evidence pertains to the contextualized view, the traditional perspective on SJTs. In this section, we review such research evidence concerning reliability, criterion-related and incremental validity, construct-related validity, subgroup differences, applicant reactions, faking, and retest and coaching effects. Whenever meta-analytic findings are available, we refer to them.
Several meta-analyses have integrated the internal consistency reliability coefficients reported in the SJT literature. The mean α values reported in these meta-analyses ranged from 0.46 to 0.68 (Campion, Ployhart & MacKenzie, 2014; Catano, Brochu & Lamerson, 2012; Kasten & Freund, 2015). These moderate internal consistency coefficients reflect the fact that SJTs are created on the basis of job situations that require the expression of a combination of different constructs, which results in heterogeneous test items and response options. Evidence for this item heterogeneity comes from factor analytic investigations of SJTs, which reveal no clear factor structure in the items (Schmitt & Chan, 2006).
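To make the α values above concrete, the sketch below computes Cronbach's α from a respondents-by-items score matrix. The item scores are invented for illustration and do not come from any SJT cited here.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(scores[0])                       # number of items
    items = list(zip(*scores))               # columns = items
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical scores: 5 respondents x 3 items (not real SJT data)
scores = [
    [3, 3, 3],
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
]
alpha = cronbach_alpha(scores)
```

Note that α rises when items covary strongly (as in this invented homogeneous example) and falls when items tap different constructs, which is exactly why heterogeneous SJT items yield the moderate coefficients reported above.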
As internal consistency is not a suitable reliability estimate for a measurement method with heterogeneous items (Osburn, 2000), other types of reliability estimates, such as test-retest reliability and alternative form reliability, have been proposed in the literature (Lievens et al., 2008; Whetzel & McDaniel, 2009). Studies examining test-retest reliability are scarce, but they tend to report considerably higher estimates. For instance, Catano and colleagues (2012) reported two SJT test-retest coefficients of r = 0.82 and r = 0.66, respectively. Studies examining alternative form reliability are even scarcer because of the difficulty of developing alternative SJT forms that capture the same constructs when those constructs are often not clearly distinguishable to begin with. Notwithstanding this, Clause, Mullins, Nee, Pulakos and Schmitt (1998) reported alternative form reliability estimates ranging from r = 0.70 to r = 0.77 when they adopted a rigorous item cloning method for constructing alternative SJT forms (see also Lievens & Sackett, 2007). So, SJTs have generally been found to be sufficiently reliable measurement instruments, provided that appropriate reliability estimates are used.
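Both test-retest and alternative form reliability reduce to a Pearson correlation between two score vectors (two administrations of the same test, or scores on two parallel forms). A minimal sketch with invented scores:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

# Hypothetical SJT total scores at two administrations (not real data)
time1 = [10, 12, 14, 16, 18]
time2 = [11, 12, 15, 15, 19]
retest_r = pearson_r(time1, time2)
```

A high r here indicates that respondents keep roughly the same rank order across administrations, which is what the test-retest coefficients of 0.66 to 0.82 cited above capture.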