Testing with different devices
Another assumption employers sometimes make, and one that may prove false, is that a test that is valid for a particular purpose when administered on one device remains valid (and equivalent) when administered on another. For example, a test validated for a particular job in paper form may not be valid when administered on a computer, and a test validated on a desktop or laptop may or may not be valid when administered on a smartphone. Several studies have compared paper with computer administration, and computer administration with administration on mobile devices.
Although there is some research on score differences and equivalence, there is little research on the validity of the same tests administered in different formats (e.g., paper and online versions) or on different devices. There are probably many reasons why such research is not conducted and published; a likely one is the difficulty of acquiring an appropriate and reliable criterion for a criterion-related validity study. Instead of validity studies, many researchers have turned to studies of the measurement invariance of scores across devices. A lack of invariance would suggest that validities across different forms or devices may not be equivalent. Research results indicate inconsistent patterns of equivalence across media. In addition, some researchers have established equivalence but noted higher scores or greater variance in one situation than in another. Consequently, some researchers (e.g., Buchanan & Smith, 1999; Lievens & Harris, 2003) emphasize that equivalence must be established for each new set of conditions because equivalence found under one set of conditions does not necessarily extend to another.
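To make the basic idea of a score-level equivalence check concrete (this is an illustration only, not any cited study's method), the sketch below compares mean scores and score variance across two administration conditions. The data are invented, and a real equivalence study would test measurement invariance rather than stop at descriptive comparisons.

```python
# Illustrative sketch (invented data): comparing summary statistics
# for the same test administered under two conditions, e.g. paper
# vs. online. Similar means but unequal variances would be the kind
# of partial-equivalence pattern the literature describes.
from statistics import mean, variance

paper = [31, 28, 35, 30, 27, 33, 29, 32, 34, 26]
online = [30, 36, 25, 33, 38, 24, 31, 37, 27, 35]

def summarize(scores):
    # Sample mean and unbiased sample variance.
    return mean(scores), variance(scores)

paper_mean, paper_var = summarize(paper)
online_mean, online_var = summarize(online)

print(f"paper:  mean={paper_mean:.1f}, var={paper_var:.1f}")
print(f"online: mean={online_mean:.1f}, var={online_var:.1f}")
```

With these invented numbers the means are close while the online variance is noticeably larger, echoing the pattern of equal scores but greater variance that some studies report.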
Studies of the measurement equivalence of paper- and computer-administered personality tests have had mixed results. Some studies have concluded that the medium of administration has no effect on equivalence; others have reached the opposite conclusion. Several studies have compared scores from paper-based and computer-based tests. Although small differences in scores are sometimes found, most authors conclude that computer administration neither increases nor decreases socially desirable responding (Dwight & Feigelson, 2000; Lautenschlager & Flaherty, 1990; Martin & Nagao, 1989; Potosky & Bobko, 1997; Richman et al., 1999).
Using item response theory analyses, factor analysis, criterion-related validity and mean score differences, Chuah and colleagues (2006) examined the equivalence of scores on a personality test administered on paper and online, and concluded that scores on the neuroticism, extraversion, agreeableness and conscientiousness scales were equivalent. Salgado and Moscoso (2003) reached a similar conclusion regarding scores on a five-factor personality measure. They observed similar scores, factor structures and reliability estimates across the two formats, but noted greater variance in the computer-administered version of the test.
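One of the comparisons mentioned above, reliability across formats, can be illustrated with a short sketch. Cronbach's alpha is a standard internal-consistency estimate; the item-level data below are invented, and this is a generic illustration rather than a reconstruction of any cited study.

```python
# Illustrative sketch (invented data): estimating Cronbach's alpha
# for the same four-item scale administered in two formats, to
# compare internal-consistency reliability across formats.
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: list of items, each a list of respondents' scores."""
    k = len(item_scores)
    item_vars = sum(variance(item) for item in item_scores)
    # Total score per respondent = sum across items.
    totals = [sum(resp) for resp in zip(*item_scores)]
    return (k / (k - 1)) * (1 - item_vars / variance(totals))

# Four items, six respondents per format (hypothetical).
paper_items = [
    [3, 4, 2, 5, 4, 3],
    [4, 4, 3, 5, 3, 3],
    [3, 5, 2, 4, 4, 2],
    [4, 4, 3, 5, 4, 3],
]
online_items = [
    [3, 5, 2, 5, 4, 2],
    [4, 4, 2, 5, 3, 3],
    [2, 5, 2, 5, 4, 2],
    [4, 5, 2, 5, 4, 3],
]

print(f"alpha (paper):  {cronbach_alpha(paper_items):.2f}")
print(f"alpha (online): {cronbach_alpha(online_items):.2f}")
```

Comparable alpha values across the two formats would correspond to the "similar reliability estimates" finding described above; a marked discrepancy would be one signal of non-equivalence.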
Morelli, Illingworth, Scott and Lance (2012) evaluated the measurement equivalence and psychometric properties (configural, metric, scalar, measurement error, construct variance and construct means) of a non-cognitive personality measure assessing conscientiousness, customer service, integrity, interpersonal skill, stress tolerance and teamwork administered on mobile and non-mobile devices. They found scores from both types of device to be invariant, except for construct means. In addition, distributions, reliabilities, inter-correlations and descriptive statistics were similar. Illingworth, Morelli, Scott and Boyd (2015) produced similar results. Using multi-group factor analysis, they demonstrated equivalence across mobile and non-mobile devices at the configural, metric, scalar and latent mean levels, and the absence of meaningful practical score differences. Arthur and colleagues (2014) found a similar pattern: equivalence between the non-cognitive measures administered on mobile and non-mobile devices and no meaningful score differences.
Ployhart and colleagues (2003) and Mead and colleagues (2007) reached the opposite conclusion. Examining scores from a personality measure of conscientiousness, agreeableness and emotional stability, a biodata form and a situational judgement test administered in paper-and-pencil form and online, Ployhart and colleagues found that the variance-covariance matrices were not equivalent, suggesting some sources of non-equivalence. The online version also showed better distributional properties, lower means, more variance, higher internal-consistency reliability and stronger correlations.
Mead and colleagues (2007) found equivalence for some constructs (e.g., conscientiousness) but not for all personality constructs when tests were administered in paper-and-pencil and online forms. When study participants had a choice of format, metric invariance was present across formats; when participants had no choice, measurement invariance was not present.
Several researchers have found measurement equivalence between scores from tests administered on computers and those administered on mobile devices; however, lower scores on mobile devices appear to be a consistent finding. Arthur and colleagues (2014) found measurement invariance when comparing scores on cognitive measures across mobile and non-mobile devices, but scores from tests taken on mobile devices were significantly lower than those from tests taken on non-mobile devices. They also noted greater differences on the verbal component of the test than on the numerical component.
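Score differences of the kind just described are often reported as standardized mean differences. The sketch below computes Cohen's d for invented mobile and non-mobile score samples; the data and the size of the resulting effect are hypothetical and not drawn from any cited study.

```python
# Illustrative sketch (invented data): quantifying a mobile vs.
# non-mobile score difference as a standardized mean difference
# (Cohen's d), using the pooled standard deviation of the groups.
from math import sqrt
from statistics import mean, variance

non_mobile = [52, 48, 55, 50, 47, 53, 49, 51]
mobile = [47, 44, 50, 45, 43, 48, 44, 46]

def cohens_d(a, b):
    # Pooled sample variance, then standardize the mean difference.
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

print(f"Cohen's d (non-mobile minus mobile) = {cohens_d(non_mobile, mobile):.2f}")
```

A positive d here indicates higher non-mobile scores, the direction the mobile-device literature reports; the magnitude in this toy example is arbitrary.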
Morelli and colleagues (2014) used multi-group confirmatory factor analysis to evaluate the measurement invariance of a cognitive ability test, a multimedia work simulation, a text-based SJI and a biodata measure of conscientiousness and customer service administered on mobile and non-mobile devices. They concluded that the mobile and non-mobile versions of these tests were equivalent, and they noted no score differences except that the mean SJI score on mobile devices was lower than the non-mobile mean.
Several authors (e.g., Arthur et al., 2014; Hawkes, 2013; Mitchell & Blair, 2013) have hypothesized various reasons for the differences between scores from computer-based tests and those from mobile devices, among them instability of the internet connection, the unavailability of a mobile application for the test, increased scrolling time, greater difficulty manipulating the interface, the additional time required to read a small screen, content incompatibility with the mobile device and higher-ability applicants' preference for non-mobile devices. This suggests that caution should be used when interpreting the research on mobile devices, as portable devices vary in size and ease of manipulation. For example, a small smartphone creates a different user experience from a tablet, which in turn differs from a laptop, yet all are mobile devices.