Data and Sample
To demonstrate the methods mentioned above, we use data from a German household panel study, the "IAB-BAMF-SOEP Survey of Refugees in Germany," launched in 2016 (Brücker et al. 2017). The target population was drawn from the German Central Register of Foreigners (AZR) and comprised asylum-seekers and refugees who arrived in Germany between January 2013 and January 2016, plus adult household members (Kroh et al. 2017). The first wave yielded 4,816 respondents in 3,554 households, which corresponds to a household-level response rate of 48.84% (Kroh et al. 2017), interviewed by 98 interviewers using computer-assisted personal interviewing (CAPI). Brief interviews were conducted with every head of household (household interviews). More detailed interviews (person interviews) were conducted with adults in those households, including asylum-seekers and refugees from various home countries, i.e. Syria (40%), Afghanistan (10%), Iraq (8%), the West Balkans (7%), and Eritrea/Somalia (7%) (Kroh et al. 2017). Questionnaires were provided in seven languages (Arabic, English, Farsi/Dari, German, Kurmanji, Pashtu, and Urdu), complemented by audio recordings of the questions in the different languages and an interpreter hotline. During the re-contact quality checks, a first case of interviewer falsification (complete falsifications) by one interviewer (hereafter "F1") was detected (IAB, 2017). F1 accounted for 6% of all person interviews (n = 289) and household interviews (n = 217). Analysis of these data with statistical methods not only confirmed this first falsification case but also identified two further deviating interviewers (Kosyakova et al. 2019). The second interviewer ("F2") had conducted 46 person-level interviews, and the third interviewer ("F3") had conducted 16. Data collected by these interviewers were excluded from the official data release (v34).
We therefore use this high-profile case to demonstrate the effectiveness of different statistical identification tools.
Identification of Complete Falsifications
To illustrate the statistical identification tools discussed above, we consider the following indicators: acquiescent response style, extreme response style, interview duration, middle response style, recency effects, semi-open responses, and stereotyping. All indicator values were standardized such that high positive values indicate suspicious patterns. A list of these indicators, their operationalization, and their means and standard deviations can be found in Table 7.1. These specific indicators cover a range of possible data sources: interview duration is based on paradata, acquiescent response style on agreement questions, stereotyping on attitudinal items, and recency effects on unordered answer-option lists.
Table 7.1 List of Used Falsification Indicators on the Interviewer Level
Source: IAB-BAMF-SOEP Survey of Refugees in Germany, 2016, own calculations.
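The standardization step can be sketched as follows. The indicator matrix here is randomly generated for illustration and does not reproduce the chapter's data; the dimensions merely match the setting (91 interviewers, 7 indicators).

```python
import numpy as np

# Illustrative interviewer-by-indicator matrix (91 interviewers, 7
# indicators); the values are random stand-ins, not the survey data.
rng = np.random.default_rng(0)
indicators = rng.normal(loc=5.0, scale=2.0, size=(91, 7))

# z-standardize each indicator column so that, after any sign flips
# needed (e.g. negating interview duration if short interviews are
# the suspicious ones), high positive values mark suspicious behavior.
z = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0)
```

After standardization, every indicator is on a common scale, so the indicators can be combined in a distance-based cluster analysis without any single indicator dominating.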
We illustrate combining the indicators via cluster analysis, using a simple agglomerative hierarchical clustering algorithm (Single-Linkage) with a Euclidean distance measure. The Single-Linkage algorithm is particularly useful in this context, since similar observations are fused first (Kaufman and Rousseeuw 2009), so the most outlying interviewers are separated from the unsuspicious ones. The optimal clustering solution was determined using the Duda-Hart index (Duda and Hart 1973; Milligan and Cooper 1985). We excluded seven interviewers who had conducted fewer than five interviews, because for these interviewers the aggregated indicators are more likely to reflect respondent-level influences. Results are therefore presented for 91 interviewers.
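A minimal sketch of this clustering step, using SciPy's hierarchical-clustering routines on synthetic data. The planted outliers and the fixed number of clusters are assumptions for illustration; the chapter selects the cut with the Duda-Hart index instead.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Synthetic standardized indicators for 91 interviewers; the first
# three rows are shifted upward to mimic suspicious interviewers
# (illustrative only, not the survey data).
rng = np.random.default_rng(1)
X = rng.normal(size=(91, 7))
X[:3] += 8.0  # three artificial outliers

# Single-Linkage agglomerative clustering on Euclidean distances:
# nearest observations merge first, so outliers are fused last and
# end up isolated in small clusters when the dendrogram is cut.
Z = linkage(pdist(X, metric="euclidean"), method="single")

# Cut the tree into k clusters (k = 6 is fixed here purely for
# illustration; the chapter chooses it via the Duda-Hart index).
labels = fcluster(Z, t=6, criterion="maxclust")
```

Because Single-Linkage chains similar observations together, the bulk of unsuspicious interviewers collapses into one large cluster, while the planted outliers receive their own small cluster labels.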
Table 7.2 lists the clusters identified via Single-Linkage clustering. The algorithm created one cluster containing the unsuspicious interviewers (cluster 1) and five further clusters (clusters 2-6) containing seven outlying (suspicious) interviewers.
Table 7.2 List of Suspicious Interviewers According to Single-Linkage Clustering
Source: IAB-BAMF-SOEP Survey of Refugees in Germany, 2016, own calculations. Note: Falsifying interviewers are highlighted in boldface type.
Figure 7.1 Mean Indicator Values per Cluster. Source: IAB-BAMF-SOEP Survey of Refugees in Germany 2016, own calculations.
All three misbehaving interviewers are classified as outliers, and four further interviewers are falsely suspected. Figure 7.1 shows the mean indicator values for every cluster. The largest, unsuspicious interviewer group shows no clear direction for any of the indicators; on average, these interviewers produced unsuspicious values. F2, in contrast, shows suspicious values for all indicators except recency effects. F1 and F3 were grouped into one cluster, since all of their indicators point in the suspicious direction. Interviewer 158 in cluster 3 shows some suspicious and some unsuspicious indicator values; in addition, this interviewer had a small number of interviews, and quality controls did not confirm the suspicion. The three remaining outliers in clusters 4 and 5 do not show any systematic suspicious patterns. The only remarkable pattern is their low value on semi-open responses, indicating that those interviewers, on average, recorded more open answers to semi-open questions. The algorithm therefore classifies them as outliers, but not as falsifiers.
Identification of Partial Falsifications
Although there are no known partial falsifications in the data, we still demonstrate the previously described method of using similar (or identical) response patterns across item batteries to identify partial falsifications. We apply the method for identifying identical response patterns to one attitudinal scale that asked all respondents about their satisfaction with their accommodation, comprising nine items with 11 answer categories ranging from 0 ("totally dissatisfied") to 10 ("totally satisfied"). This yields 11^9 = 2,357,947,691 different answer combinations under the assumption of independent, random responses (Simmons et al. 2016), which certainly does not hold in real survey data. Information on the identical response patterns in the data can be found in the online Appendix 7A. The pattern that occurs most often is a straight-lining pattern with the extreme answer "totally satisfied" on all nine items (169 interviews). Most other patterns that occur multiple times are variations of this pattern, e.g. 20 times the pattern 10-10-10-10-N/A-10-10-10-10, 16 times 10-10-10-10-10-10-10-10-5, and 15 times 10-10-10-10-10-10-10-10-9. To identify whether single interviewers mainly produced these patterns, we calculated the share of identical response patterns across each interviewer's workload, labeling a response pattern as identical if it occurred in at least one other interview. On average, interviewers produced 15.55% (SD = 12.64%) identical response patterns. F1 produced the highest share (53.3% of her 289 interviews). The other two misbehaving interviewers did not produce excessively large shares (F2: 10.9%; F3: no identical responses at all). We therefore conclude that this method can help identify falsifiers, but it is quite sensitive to the strategies falsifiers use to manipulate the data. We recommend it as a supplement to the methods presented above.
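The workload-share computation described above can be sketched as follows; the interviewer IDs and answer patterns are invented for illustration.

```python
from collections import Counter

# Toy person-level records: (interviewer_id, tuple of answers to the
# 9-item satisfaction battery). IDs and answers are made up.
records = [
    ("A", (10, 10, 10, 10, 10, 10, 10, 10, 10)),
    ("A", (10, 10, 10, 10, 10, 10, 10, 10, 10)),
    ("A", (3, 7, 5, 8, 2, 6, 4, 9, 1)),
    ("B", (10, 10, 10, 10, 10, 10, 10, 10, 10)),
    ("B", (0, 4, 6, 2, 8, 5, 3, 7, 9)),
]

# Count how often each full response pattern occurs across all interviews.
pattern_counts = Counter(pattern for _, pattern in records)

# Share of each interviewer's workload whose pattern appears in at
# least one other interview (the "identical pattern" rule above).
shares = {}
for interviewer in {i for i, _ in records}:
    own = [p for i, p in records if i == interviewer]
    identical = sum(1 for p in own if pattern_counts[p] > 1)
    shares[interviewer] = identical / len(own)
```

In this toy example the straight-lining 10s pattern occurs three times, so both of interviewer A's straight-lined interviews and interviewer B's single one count as identical, while the remaining patterns are unique.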
Identification of Duplicates across the Interview
We also tested the data for duplicates and near-duplicates across all survey questions. Since the household data include only a small and homogeneous set of answers, only the person data are analyzed. A standard duplication test revealed no duplicates, meaning that all response patterns are unique. A test for near-duplicates showed a highest match score of 94.73%, meaning that at least two interviews had identical answers on 94.73% of all questions. The mean lies at 80.23%, the median at 79.90%. Taking a closer look at interviews above the 99th percentile, it is noticeable that these near-duplicates can be attributed to only a few interviewers: ten interviewers produced 56 interviews with very similar responses, and F1 produced 25% of these interviews. To examine this in more depth, we calculated the mean near-duplicate rate for every interviewer, i.e. how many similar answers an interviewer produced on average. Three interviewers in particular show suspiciously high values: F1, F2, and a third interviewer who was not confirmed as a falsifier. Again, F3 did not show any suspicious values. We therefore conclude that near-duplicates can help identify falsifiers, but they should be used with caution: the data need to meet many assumptions and, in our case, the method did not identify all misbehaving interviewers.
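A minimal sketch of the near-duplicate check, with invented answer vectors; a real application compares answers across all (suitably recoded) survey questions for every pair of interviews.

```python
from itertools import combinations

# Toy answer matrix: each row is one interview's answers across all
# questions (illustrative values; real data have hundreds of items).
interviews = [
    [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
    [1, 2, 3, 1, 2, 3, 1, 2, 9, 1],   # differs on one of ten questions
    [5, 5, 1, 4, 0, 2, 7, 6, 3, 8],
]

def match_score(a, b):
    """Share of questions on which two interviews give the same answer."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Highest pairwise match score: a value near 1 flags a near-duplicate
# (an exact duplicate would score 1.0).
scores = [match_score(a, b) for a, b in combinations(interviews, 2)]
max_score = max(scores)
```

The same pairwise scores can then be averaged per interviewer to obtain the mean near-duplicate rate used above.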