Accuracy and Utility of Using Paradata to Detect Question-Reading Deviations


Words matter, especially in survey research. Interviewer deviations from question wording can change question meaning, thus undermining validity (Groves, et al. 2009; Krosnick, Malhotra, and Mittal 2014; Schuman and Presser 1996). For this reason, interviewers in standardized interviews are instructed to read questions exactly as worded. However, estimates of question-reading deviations in telephone interviews range from a low of 4.6% (Mathiowetz and Cannell 1980) to a high of 36% (Cannell, Lawson, and Hausser 1975), and in face-to-face interviews they can be as high as 84% (Ackermann-Piek and Massing 2014).

Given the importance of reading questions verbatim, and variability of interviewers' question-reading behavior, monitoring interviewers' behavior is arguably one of the most important procedures for quality control. In telephone interviews, live monitoring is often used, but in face-to-face interviews, monitoring is typically done by listening to interview recordings. Listening to interviews is resource-intensive and not always feasible (e.g., some interviews cannot be recorded because of technical limitations), and thus some organizations are looking for alternative tools to make existing quality control methods more efficient. One such tool is paradata.

Most modern survey software can easily and cheaply capture survey process data, or paradata, throughout the survey life cycle. Organizations are looking for ways to leverage these data to reduce costs, increase efficiency, and improve data quality. For example, time stamps have long been used to calculate interview length and detect respondent comprehension issues with individual questions, but they are also increasingly being used to monitor interviewers' question-reading behavior (Mneimneh, et al. 2014; Mneimneh, et al. 2018; Sun and Meng 2014).

To monitor interviewers' question-reading behavior using timing data, organizations estimate the expected question administration time to establish minimum or maximum (or both) question administration time thresholds (QATTs). The question duration is compared to the QATTs to identify and flag for further review questions that violate the thresholds or interviewers with high rates of such questions. Violations of minimum QATTs may indicate that interviewers omitted words from the question text, while violations of maximum QATTs may indicate that they added words. This method gives organizations a tool for identifying likely problematic interviewers who should be the primary focus of monitoring efforts, thus saving time and money.
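The flagging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (interviewer ID, question ID, duration in seconds), the function name `violation_rates`, and the example threshold values are all hypothetical and do not come from any of the cited studies.

```python
from collections import defaultdict


def violation_rates(records, thresholds):
    """Return each interviewer's share of questions that violate a QATT.

    records:    iterable of (interviewer_id, question_id, duration_seconds)
    thresholds: dict mapping question_id -> (min_qatt, max_qatt);
                either bound may be None when only one threshold is used
    """
    flagged = defaultdict(int)  # questions outside the thresholds
    total = defaultdict(int)    # all questions administered
    for interviewer_id, question_id, duration in records:
        min_qatt, max_qatt = thresholds[question_id]
        total[interviewer_id] += 1
        too_fast = min_qatt is not None and duration < min_qatt
        too_slow = max_qatt is not None and duration > max_qatt
        if too_fast or too_slow:
            flagged[interviewer_id] += 1
    return {i: flagged[i] / total[i] for i in total}


# Hypothetical timing paradata and thresholds, for illustration only.
example_records = [
    ("A", "q1", 2.0),
    ("A", "q2", 5.0),
    ("B", "q1", 4.5),
    ("B", "q2", 5.5),
]
example_thresholds = {"q1": (4.0, 8.0), "q2": (3.0, 6.0)}
rates = violation_rates(example_records, example_thresholds)
```

Interviewers whose rate exceeds some review cutoff chosen by the organization would then be prioritized for monitoring; the text does not prescribe a specific cutoff.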

There are two known published methods of creating minimum QATTs: (1) the words per second (WPS) method, which consists of dividing the number of words in the question by a specified reading pace (e.g., ten words in the question divided by two words per second yields a cut point of 5 seconds; Sun and Meng 2014); and (2) using an a priori cutoff, such as one second (Mneimneh, et al. 2014). Little is known about how well these methods detect actual question misreadings. These studies also use only minimum QATTs, ignoring question readings that might be too long, such as when interviewers add words to the question text. There may be more accurate methods, such as using a WPS range or deriving QATTs from standard deviations of the mean question-reading times, so that both a minimum and maximum QATT can be used to identify questions read "too fast" or "too slow."
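As a minimal sketch of the WPS calculation just described, the following Python function divides a question's word count by an assumed reading pace. Counting words by splitting on whitespace is an assumption made here for illustration; the cited studies may count words differently. The function names and the sample question text are hypothetical.

```python
def min_qatt_wps(question_text: str, words_per_second: float) -> float:
    """Minimum QATT in seconds: word count divided by reading pace."""
    if words_per_second <= 0:
        raise ValueError("words_per_second must be positive")
    # Assumption: a "word" is any whitespace-delimited token.
    word_count = len(question_text.split())
    return word_count / words_per_second


def read_too_fast(duration_seconds: float, question_text: str,
                  words_per_second: float) -> bool:
    """True when the recorded duration falls below the minimum QATT."""
    return duration_seconds < min_qatt_wps(question_text, words_per_second)


# A ten-word question at 2 WPS gives a 5-second cut point.
question = "How many hours do you usually work in a week?"
cut_point = min_qatt_wps(question, 2)  # 10 words / 2 WPS = 5.0 seconds
```

An a priori cutoff, by contrast, needs no calculation: the same fixed value (for example, one second) is applied to every question regardless of its length.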

This study will take advantage of a unique data set from Wave 3 of the Understanding Society Innovation Panel that includes question timing paradata and behavior codes identifying misread questions to test how well QATT methods detect actual interviewer question-reading deviations. Three methods of developing QATTs are tested: (1) WPS point estimates (minimum QATT only), (2) WPS ranges (minimum and maximum QATTs), and (3) standard deviations of mean question-reading times (minimum and maximum QATTs). Because there is no compelling evidence on which cutoff value is most accurate for a given method, various threshold values will be tested for each method: (1) WPS point estimate method: 2, 3, and 4 WPS; (2) WPS range method: 2-3, 1-3, 2-4, and 1-4 WPS; (3) standard deviation method: 0.5, 1.0, 1.5, and 2.0 standard deviations.
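The two methods that produce both a minimum and a maximum QATT might be sketched as follows. Several details are assumptions, not taken from the study: that the faster pace in a WPS range sets the minimum QATT and the slower pace sets the maximum; that the standard deviation method uses the mean plus or minus k sample standard deviations of the observed reading times for a single question; and that a negative lower bound is clipped to zero. All function names and example durations are hypothetical.

```python
from statistics import mean, stdev


def wps_range_qatts(word_count, slow_wps, fast_wps):
    """Return (min_qatt, max_qatt) in seconds for a WPS range.

    Assumption: reading at the fastest pace gives the shortest acceptable
    time, and reading at the slowest pace gives the longest.
    """
    if not 0 < slow_wps < fast_wps:
        raise ValueError("expected 0 < slow_wps < fast_wps")
    return word_count / fast_wps, word_count / slow_wps


def sd_qatts(durations, k):
    """Return (min_qatt, max_qatt) as mean -/+ k sample standard deviations.

    durations: observed reading times (seconds) for one question;
               at least two observations are required by stdev().
    """
    center = mean(durations)
    spread = stdev(durations)
    # Assumption: a duration cannot be negative, so clip the lower bound.
    return max(0.0, center - k * spread), center + k * spread


def classify(duration, min_qatt, max_qatt):
    """Label a single question reading against its thresholds."""
    if duration < min_qatt:
        return "too fast"
    if duration > max_qatt:
        return "too slow"
    return "ok"


# Thresholds for a hypothetical 12-word question under each tested setting.
range_results = {
    (slow, fast): wps_range_qatts(12, slow, fast)
    for slow, fast in [(2, 3), (1, 3), (2, 4), (1, 4)]
}
observed = [4.0, 5.0, 6.0]  # hypothetical reading times in seconds
sd_results = {k: sd_qatts(observed, k) for k in (0.5, 1.0, 1.5, 2.0)}
```

Wider WPS ranges and larger standard deviation multipliers both loosen the thresholds, so they flag fewer questions; which setting best matches the behavior codes is the empirical question the study sets out to test.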
