Developing contextualized SJTs typically involves three stages (Lievens, Peeters & Schollaert, 2008; Motowidlo et al., 1990). The first stage concerns the development of item stems, or situations to be presented in the SJT. The second stage involves the collection of response options from subject matter experts (SMEs) and the choice of response instructions and response format. The third and final stage targets the development of the scoring key.
Stage 1: Item stems To gather the item stems or situations presented in the SJT, a job analysis is usually conducted. During this job analysis, SMEs are asked to generate critical incidents (Flanagan, 1954); that is, they are asked to recall examples of situations in which exceptionally good or exceptionally poor performance was demonstrated. The test developer often prompts the SMEs so as to collect information about all the content domains and constructs deemed important for the job. The selected SMEs are typically incumbents, supervisors, managers or a mix of these sources. Alternatively, archival sources and even customers might serve as sources of information (Weekley et al., 2006). The critical incidents obtained are then sorted and checked for redundancy and level of specificity. The surviving incidents serve as the basis for writing item stems, that is, descriptions of job-related situations. As an alternative to this inductive method of gathering critical incidents, a deductive method can be followed, in which the item stem content is derived from theoretical models (e.g., a model of conflict management).
Stage 2: Response options, response instructions, and response format After developing the situation descriptions, another group of SMEs is asked to generate response options they believe to be (in-)effective reactions to the situations. To obtain a wider range of response options with different levels of effectiveness, the test developer might also ask inexperienced workers to generate responses. The test developer then decides which options to retain, usually choosing a mix of response options that differ in effectiveness in each situation. There are no general rules regarding the number of response options to retain: most SJT items include four or five, although items with up to ten response options also exist (e.g., the Tacit Knowledge Inventory; Wagner & Sternberg, 1991).
Next, the test developer decides on the response instructions. This is not a trivial choice, because the response instruction format affects the construct saturation of the SJT (McDaniel et al., 2007). One of two formats is usually chosen: behavioural tendency instructions or knowledge instructions (McDaniel & Nguyen, 2001). Behavioural tendency instructions ask respondents what they would do in the given situation, whereas knowledge instructions ask respondents what they should do; in other words, they ask respondents to identify the best response to the situation.
Test developers also choose a response format. Generally, three response formats can be distinguished: respondents are asked to select the best/worst response options, to rank the response options from most to least effective, or to rate the response options on Likert-type scales. Arthur, Glaze, Jarrett, White, Schurig and Taylor (2014) compared these three formats while keeping the rest of the SJT design and content constant. The rate format showed higher construct-related validity, smaller subgroup differences and higher reliability than the other two. A drawback of the rate format, however, was its greater susceptibility to response distortion.
Stage 3: Scoring key After the situations, response options, response format and instructions have been developed, the test requires a scoring key. Four different methods can be delineated. The rational method involves asking a group of SMEs to rate the response options on (in-)effectiveness; options with acceptable inter-rater agreement (e.g., > 0.60) are retained for the test. The second, empirical method involves quantifying endorsements of response options gathered from a large sample of lay people instead of SMEs; for instance, options chosen as correct by over 25% of the sample are retained for the test. Although the two approaches differ notably, researchers have found no differences between the resulting scoring keys in terms of validity (e.g., Weekley & Jones, 1999). A third approach combines the rational and empirical methods; an example of this hybrid approach is retaining an empirically derived key only after SMEs have agreed on it. The final and least frequently used method develops the scoring key from a chosen theoretical framework (e.g., leadership theories), scoring as correct those answer options that reflect effective performance according to that framework (Weekley et al., 2006).
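The empirical and hybrid keying procedures above can be sketched in a few lines of code. The following Python fragment is a minimal illustration, not an implementation from the literature: the respondent data, option labels and function names are hypothetical, and only the example cut-off (retaining options endorsed by more than 25% of the sample) comes from the text.

```python
# Illustrative sketch of empirical and hybrid scoring-key construction.
# All data and names are hypothetical; the 25% endorsement cut-off is the
# example threshold mentioned in the text.
from collections import Counter

def empirical_key(endorsements, threshold=0.25):
    """Retain options endorsed as 'best' by more than `threshold` of a lay sample."""
    counts = Counter(endorsements)
    n = len(endorsements)
    return {option for option, c in counts.items() if c / n > threshold}

def hybrid_key(endorsements, sme_approved, threshold=0.25):
    """Hybrid approach: keep empirically keyed options only if SMEs also endorse them."""
    return empirical_key(endorsements, threshold) & sme_approved

# Hypothetical item with options A-D; 20 lay respondents each pick the 'best' option.
picks = ["A"] * 9 + ["B"] * 6 + ["C"] * 4 + ["D"] * 1
print(empirical_key(picks))           # A (45%) and B (30%) exceed the 25% cut-off
print(hybrid_key(picks, {"A", "C"}))  # only A survives the SME screen
```

In practice the endorsement threshold, the SME agreement criterion and the treatment of partially effective options would all be design decisions made by the test developer, not fixed constants as in this sketch.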
In Figure 11.1 we present an example of a contextualized SJT item taken from the Tacit Knowledge Inventory for Managers (Wagner & Sternberg, 1991). Although not strictly called an SJT by its developers, the test is similar in format and content to a typical SJT (McDaniel, Morgeson, Finnegan, Campion & Braverman, 2001).
Figure 11.1 Example of a contextualized SJT item. Source. Wagner & Sternberg (1991). Reproduced with permission of Robert J. Sternberg.