Differences in Interaction Quantity and Conversational Flow in CAPI and CATI Interviews


An extensive literature compares computer-assisted personal interviewing (CAPI) and computer-assisted telephone interviewing (CATI) (for an overview, see Cernat 2015; Holbrook, Green, and Krosnick 2003; Scherpenzeel 2001). Working within the Total Survey Error (TSE) framework (Groves and Lyberg 2010), these studies have examined differences between CAPI and CATI for outcomes related to selection (i.e., coverage and nonresponse) and measurement.

Overall, the mode comparisons have yielded inconsistent effects, which Scherpenzeel (2001) argues may be due to four causes. First, some studies are true experimental comparisons that vary only the medium of administration, while others involve a "real life" comparison in which each mode is optimized within its own common fieldwork procedures (e.g., sampling). Second, the experimental designs of studies differ: the split-ballot design is much more common but less informative than test-retest designs (see Martin, O'Muircheartaigh, and Curtice 1993) or Multitrait-Multimethod designs (Andrews 1984; Scherpenzeel 2001; Scherpenzeel and Saris 1995, 1997). Third, early mode comparisons of personal versus telephone interviewing did not always use computer-assisted techniques for the personal interviewing mode, which made it difficult to monitor the face-to-face interviewers. Nowadays, computer audio-recorded interviewing (CARI) for CAPI allows for a rather unobtrusive means to monitor interaction (Pascale 2016). Fourth, studies differ in their criteria for comparison. While most studies evaluate costs, speed in terms of finishing fieldwork, response rates, and similarity of the response distributions, only a few studies evaluate the quality of the data in terms of measurement error, such as by comparing data collected in interviews with official records (Kormendi 1988). Lacking validating information, studies evaluate data quality using other criteria, such as comparing how response effects differ between modes (for an overview, see Cernat 2015; Holbrook et al. 2003; Jäckle, Roberts, and Lynn 2010; Scherpenzeel 2001). Regardless of the criteria for comparison, inferences about the likely causes of mode effects are often limited, though satisficing and social desirability response bias have been suggested as underlying mechanisms.
More specifically, the hypotheses tested in earlier works are based on the argument that differences are a result of greater trust and rapport and more effective nonverbal communication in face-to-face interviews as compared to telephone interviews (Holbrook et al. 2003). An exhaustive review of 48 studies (all published before 2002) that compared face-to-face interviews with telephone interviews revealed confounding factors for 32 studies (Holbrook et al. 2003). A more recent systematic review that specifically takes these four factors into account is not currently available and is beyond the scope of our chapter.

In the current study, we use interaction quantity as a criterion for comparison to better understand mode effects. Interaction quantity is a rough indicator of actual behaviors of interviewers and respondents in interviewer-respondent interactions. In studies of interviewer-respondent interactions, generally referred to as "interaction analysis" or "behavior coding" depending on the level of detail employed (see Ongena and Dijkstra 2006), interactions are systematically evaluated for deviations from the so-called paradigmatic sequence (Schaeffer and Maynard 1996). A paradigmatic sequence, including one, two, or three turns, is the interaction as intended by the researcher: the interviewer reads the question exactly as worded, the respondent provides an answer that exactly matches one of the response options, and optionally, the interviewer acknowledges the response (e.g., "thank you"). Any deviation from this sequence may inform researchers about problematic aspects of the questionnaire or the interviewing procedure (Schaeffer and Maynard 1996).
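To make the notion of a deviation concrete, the following minimal Python sketch checks whether a coded QA sequence matches the paradigmatic form described above. It is an illustration only, not the coding scheme used in this study: the class name CodedTurn, the speaker labels "I" and "R", and the behavior codes are hypothetical, and the sketch covers only the two-turn and three-turn forms spelled out in the preceding paragraph.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CodedTurn:
    speaker: str  # "I" = interviewer, "R" = respondent (hypothetical labels)
    code: str     # hypothetical behavior code assigned to the turn


# Paradigmatic patterns as described in the text: question read as worded,
# adequate answer, and an optional interviewer acknowledgment.
PARADIGMATIC_PATTERNS = [
    [("I", "question_as_worded"), ("R", "adequate_answer")],
    [("I", "question_as_worded"), ("R", "adequate_answer"), ("I", "acknowledgment")],
]


def is_paradigmatic(turns: List[CodedTurn]) -> bool:
    """Return True if the sequence of (speaker, code) pairs matches a paradigmatic pattern."""
    pattern = [(t.speaker, t.code) for t in turns]
    return pattern in PARADIGMATIC_PATTERNS


# A sequence in which the respondent first asks for clarification deviates
# from the paradigmatic form.
deviating = [
    CodedTurn("I", "question_as_worded"),
    CodedTurn("R", "request_for_clarification"),
    CodedTurn("I", "clarification"),
    CodedTurn("R", "adequate_answer"),
]
print(is_paradigmatic(deviating))
```

Any sequence for which the check fails would be flagged for closer inspection of the question or the interviewing procedure.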

Previous studies of interviewer-respondent interaction in CAPI and CATI have shown that the rate of standardization in CATI is usually higher than in CAPI (Pascale 2016; Pascale, Goerman, and Drom 2013; Snijkers 2002). To our knowledge, however, CAPI and CATI interactions have never been systematically compared in terms of the number of turns, events, and words spoken in the actual interaction. This study aims to fill that gap.

We aim to answer the following research questions:

  • (1) To what extent do CAPI and CATI interviewers differ in interaction quantity, hesitations, and uncertainty markers?
  • (2) How much variability in interaction quantity, hesitations, and uncertainty markers is due to mode versus respondents or questions?
  • (3) What question and respondent characteristics are associated with interaction quantity, hesitations, and uncertainty markers in both modes?

Number of Turns, Events, and Words

A question-answer (QA) sequence consists of all the interactions between an interviewer and a respondent for a given question. QA sequences are made up of turns: separate and distinct utterances spoken by either the interviewer or the respondent. Within a given turn, the interviewer or the respondent can produce multiple actions or events (see Houtkoop-Steenstra 2002). Events are discrete, identifiable, meaningful actions, such as questions, answers, and instructions (see Ongena and Dijkstra 2006). We compare the QA sequences in CAPI and CATI interviews on three variables. First, we compare the number of turns taken by the interviewer and the respondent. Second, since both the interviewer and the respondent may produce multiple events within one turn, we analyze the number of events per question by mode. Third, at an even more detailed level, we compare the number of words uttered (integrated in one variable for both the interviewer and the respondent) across modes and across different questions of the European Social Survey (ESS) questionnaire. This comparison may reveal differences in communication efficiency. Because face-to-face interaction offers more possibilities for nonverbal communication (Clark and Schober 1992), we expect interviewers and respondents to utter fewer words while discussing the questions in CAPI than in CATI. However, the pace of question asking is likely to be slower in CAPI than in CATI, since silences are less problematic and the risk of break-offs is lower (Holbrook et al. 2003). Several studies have indeed found that face-to-face interviews take longer than telephone interviews (Groves and Kahn 1979; Holbrook, Green, and Krosnick 2003; Rogers 1976), but no data are available on mode differences in the length of individual questions.
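The three quantity measures can be illustrated with a minimal Python sketch. It assumes a hypothetical transcript representation in which each turn holds a speaker label and a list of events, each event stored as its transcribed text; the names Turn and interaction_quantity, the whitespace-based word count, and the example utterances are illustrative assumptions rather than the procedure or data used in this study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    speaker: str        # "I" = interviewer, "R" = respondent (hypothetical labels)
    events: List[str]   # transcribed text of each event produced within the turn


def interaction_quantity(turns: List[Turn]) -> Dict[str, int]:
    """Count turns, events, and words in one QA sequence.

    Words are counted by splitting on whitespace and pooled over both
    speakers, mirroring the single combined word variable described above.
    """
    n_events = sum(len(t.events) for t in turns)
    n_words = sum(len(event.split()) for t in turns for event in t.events)
    return {"turns": len(turns), "events": n_events, "words": n_words}


# Invented example: the interviewer's first turn contains two events
# (a question and an instruction), so events outnumber turns.
qa_sequence = [
    Turn("I", ["How satisfied are you with your life as a whole?",
               "Please use this card."]),
    Turn("R", ["Um, what do you mean?"]),
    Turn("I", ["Whatever it means to you."]),
    Turn("R", ["Eight."]),
]
print(interaction_quantity(qa_sequence))
# {'turns': 4, 'events': 5, 'words': 25}
```

Computing these counts per question and per interview would yield the unit of comparison across modes; how hesitations or partial words are tokenized is a design choice the sketch does not address.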

