The analytic data set has a complex nested structure. Each of the 125 outcomes was measured at the respondent level. Respondents (level 1) are uniquely nested within interviewers (level 2) as well as counties (level 2). Interviewers, however, are not uniquely nested within counties, resulting in a cross-classified data structure. Because interviewers and counties are both treated as level 2 units in this analysis, interviewers and counties are said to be cross-classified for each of the 125 outcomes. Figure 21.1 depicts the cross-classified data structure. In total, 26,742 sample adults were nested within 991 interviewers and 815 counties. Therefore, on average, there are roughly 27 respondents per interviewer and 33 respondents per county. We did not constrain the number of respondents (level 1 units) per interviewer or county (level 2 units) to some minimum (e.g., 10 per group). Research has shown that the number of groups or level 2 units (of which we have large sample sizes) is more important in determining statistical power for multilevel modeling than the number of level 1 units within groups (Snijders 2005; West, Chapter 23, this volume), and that small numbers of level 1 units within groups are not detrimental to point and interval estimates of parameters (Bell et al. 2014; Maas and Hox 2005).

Figure 21.1 Network graph depicting cross-classification of respondents (125 outcomes) by interviewer and county.
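The reported average cluster sizes follow directly from the counts above; a quick arithmetic check:

```python
# Counts taken from the text; the averages below should match the reported
# "roughly 27 respondents per interviewer and 33 respondents per county."
respondents, interviewers, counties = 26_742, 991, 815
per_interviewer = respondents / interviewers
per_county = respondents / counties
print(round(per_interviewer), round(per_county))  # 27 33
```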
For each of the 125 outcomes, we started with an unconditional model that included only the random effects for interviewers. In step two, we added the random effects for counties. In step three, we added the fixed effects of respondent characteristics to the model. In steps four and five, we added the fixed effects of the county-level characteristics and then the interviewer characteristics, respectively. Hence, step five was the final step in the modeling process in which all random and fixed effects were included in the model. The IIC based on the final model for each outcome is the focus of all analyses reported hereafter.
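The five-step model-building sequence can be sketched as a loop that accumulates fixed-effect blocks on top of the random-effect structure. This is an illustrative sketch only, not the authors' SAS code; the variable names are hypothetical placeholders:

```python
# Sketch of the five-step modeling sequence for one outcome, expressed as
# lme4-style model formulas. Covariate names are hypothetical.

def build_model_steps(respondent_vars, county_vars, interviewer_vars):
    """Return the nested sequence of model specifications, steps 1 through 5."""
    steps = []
    random_parts = ["(1 | interviewer)"]          # step 1: interviewer random effect only
    steps.append("outcome ~ 1 + " + " + ".join(random_parts))
    random_parts.append("(1 | county)")           # step 2: add county random effect
    steps.append("outcome ~ 1 + " + " + ".join(random_parts))
    fixed = []
    # steps 3-5: respondent, then county, then interviewer fixed effects
    for block in (respondent_vars, county_vars, interviewer_vars):
        fixed.extend(block)
        steps.append("outcome ~ 1 + " + " + ".join(fixed + random_parts))
    return steps

steps = build_model_steps(["age", "sex"], ["county_pop"], ["int_experience"])
for i, s in enumerate(steps, 1):
    print(f"Step {i}: {s}")
```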
Of the 125 outcomes, 119 are dichotomous and were therefore modeled using logistic regression. Following Beretvas (2010), the full logistic regression model predicts a logit transformation of the probability that a dichotomous outcome for respondent i is equal to one as a function of an overall mean ($\gamma_0$), a set of p respondent characteristics ($\sum_{a=1}^{p}\beta_a \text{Respondent\_char}_a$), a set of q county characteristics ($\sum_{b=1}^{q}\beta_b \text{County\_char}_b$), a set of r interviewer characteristics ($\sum_{f=1}^{r}\beta_f \text{Interviewer\_char}_f$), a random effect due to county s ($u_{0s}$), and a random effect due to interviewer t ($u_{0t}$), where $u_{0s}$ and $u_{0t}$ are normally distributed with mean zero and variances $\sigma^2_{u_{0s}}$ and $\sigma^2_{u_{0t}}$, respectively:

$$\text{logit}\left[P\left(Y_{i(s,t)}=1\right)\right] = \gamma_0 + \sum_{a=1}^{p}\beta_a \text{Respondent\_char}_a + \sum_{b=1}^{q}\beta_b \text{County\_char}_b + \sum_{f=1}^{r}\beta_f \text{Interviewer\_char}_f + u_{0s} + u_{0t} \quad (21.1)$$
The following formula was used to approximate the value of the IIC for dichotomous outcomes:

$$\text{IIC} = \frac{\sigma^2_{u_{0t}}}{\sigma^2_{u_{0s}} + \sigma^2_{u_{0t}} + 3.29} \quad (21.2)$$

The level 1 (respondent) variance is set to 3.29, which is the variance of the underlying standard logistic distribution (Snijders and Bosker 1999).
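Given estimated variance components for interviewers and counties, the dichotomous-outcome IIC is straightforward to compute. A minimal Python sketch (the function name and example values are ours, not from the chapter):

```python
import math

# Variance of the standard logistic distribution, used as the fixed
# level 1 (respondent) variance: pi^2 / 3, approximately 3.29.
LOGISTIC_VAR = math.pi ** 2 / 3

def iic_dichotomous(var_interviewer, var_county):
    """IIC approximation for a cross-classified logistic model: interviewer
    variance over the total of interviewer, county, and logistic variance."""
    return var_interviewer / (var_interviewer + var_county + LOGISTIC_VAR)

# Hypothetical variance components for illustration:
print(round(iic_dichotomous(0.10, 0.05), 3))  # ≈ 0.029
```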
The remaining six outcomes were treated as continuous and modeled using linear regression. Again, following Beretvas (2010), the full linear regression model predicts a continuous outcome ($Y_{i(s,t)}$) as a function of an overall mean ($\gamma_0$), a set of p respondent characteristics ($\sum_{a=1}^{p}\beta_a \text{Respondent\_char}_a$), a set of q county characteristics ($\sum_{b=1}^{q}\beta_b \text{County\_char}_b$), a set of r interviewer characteristics ($\sum_{f=1}^{r}\beta_f \text{Interviewer\_char}_f$), a random effect due to county s ($u_{0s}$), a random effect due to interviewer t ($u_{0t}$), and a residual term ($e_{i(s,t)}$), where $u_{0s}$ and $u_{0t}$ are normally distributed with mean zero and variances $\sigma^2_{u_{0s}}$ and $\sigma^2_{u_{0t}}$, respectively, and $e_{i(s,t)}$ is normally distributed with mean zero and variance $\sigma^2_e$:

$$Y_{i(s,t)} = \gamma_0 + \sum_{a=1}^{p}\beta_a \text{Respondent\_char}_a + \sum_{b=1}^{q}\beta_b \text{County\_char}_b + \sum_{f=1}^{r}\beta_f \text{Interviewer\_char}_f + u_{0s} + u_{0t} + e_{i(s,t)} \quad (21.3)$$
The IIC equation for continuous outcomes is as follows:

$$\text{IIC} = \frac{\sigma^2_{u_{0t}}}{\sigma^2_{u_{0s}} + \sigma^2_{u_{0t}} + \sigma^2_e} \quad (21.4)$$
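The continuous-outcome computation parallels the dichotomous case, with the estimated residual variance taking the place of the fixed logistic variance of 3.29. A sketch with hypothetical variance components:

```python
def iic_continuous(var_interviewer, var_county, var_residual):
    """IIC for a cross-classified linear model: interviewer variance over the
    total of interviewer, county, and residual variances."""
    return var_interviewer / (var_interviewer + var_county + var_residual)

# Hypothetical variance components for illustration:
print(round(iic_continuous(0.4, 0.2, 7.4), 3))  # 0.4 / 8.0 = 0.05
```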
The two-level, cross-classified logistic regression models were estimated with SAS PROC GLIMMIX (v9.4) using maximum likelihood estimation (method = laplace), while the two-level, cross-classified linear regression models were estimated with SAS PROC MIXED (v9.4) using restricted maximum likelihood estimation.
IICs by question characteristics. As noted previously, 13 of the 102 questions under analysis produced two or more outcomes. To analyze results at the question level, IICs for multiple outcomes from a single question were averaged; the average IIC was then assigned to that question. We took this approach to ensure that a characteristic associated with a single question would not be over-represented in the data set. This was especially important for our analysis of IICs by question characteristics. To explore differences in IICs by question characteristics, we computed median IICs across the 102 questions for each category of a question characteristic (e.g., the median IIC for questions that include optional text versus the median IIC for questions that do not include optional text; see Table 21.1 for descriptive statistics for the question characteristics). As discussed in the Results section, we focus on medians as opposed to means due to the right-skewed distribution of the estimated IICs. Therefore, to test for significant differences in median IICs by categories of a question characteristic, we used the following non-parametric tests: the Mann-Whitney-Wilcoxon two-sample test for characteristics with two categories (e.g., sensitive questions versus non-sensitive questions) and the Kruskal-Wallis test for measures with three or more categories (e.g., length of question, broken into quartiles). Given the small sample size (n = 102) for these analyses, we used an alpha level of .10 to determine if differences in median IICs were statistically significant.

Table 21.1 Descriptive Statistics for Measures Used in Analyses of IICs by Question Characteristics
Source: National Health Interview Survey, 2017.
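As an illustration only, the two non-parametric tests can be run in Python with scipy.stats; the IIC values below are invented for the example:

```python
from scipy.stats import mannwhitneyu, kruskal

# Hypothetical IICs for sensitive vs. non-sensitive questions:
iic_sensitive = [0.012, 0.034, 0.051, 0.020, 0.045]
iic_nonsensitive = [0.008, 0.015, 0.022, 0.011, 0.019]

# Two categories: Mann-Whitney-Wilcoxon two-sample test.
mww_stat, mww_p = mannwhitneyu(iic_sensitive, iic_nonsensitive,
                               alternative="two-sided")
print(f"MWW p = {mww_p:.3f}; significant at alpha = .10: {mww_p < 0.10}")

# Three or more categories (e.g., question-length quartiles): Kruskal-Wallis.
quartiles = [[0.010, 0.021], [0.018, 0.030], [0.027, 0.049], [0.042, 0.060]]
kw_stat, kw_p = kruskal(*quartiles)
print(f"Kruskal-Wallis p = {kw_p:.3f}")
```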
IICs by interviewer characteristics. We are also interested in the associations between characteristics of interviewers or their behaviors and interviewer effects. For interviewer pace, we were particularly interested in the tails of the distribution, that is, interviewers who went the fastest and interviewers who went the slowest. We created a trichotomous measure of pace, whereby the fastest interviewers (who worked roughly 15% of interviews) comprised one group, the slowest interviewers (who also worked roughly 15% of interviews) comprised the second group, and the remainder of interviewers comprised the third group. A similar measure was created for interviewer cooperation rates. Interviewers with the lowest cooperation rates (who worked roughly 15% of interviews) were assigned to one group, interviewers with the highest cooperation rates (who also worked roughly 15% of interviews) were assigned to a second group, and the remainder of interviewers comprised the third group. Finally, we focused on two measures of interviewer experience: whether the interviewer worked on the NHIS in 2016 and the total number of sample adult interviews the interviewer worked in 2017. The latter measure defined three groups of interviewers: 1-20 sample adult interviews, 21-40 sample adult interviews, and 41 or more sample adult interviews. (See Table 21.2 for descriptive statistics for these measures.)

Table 21.2 Descriptive Statistics for Measures Used in Analyses of IICs by Interviewer Characteristics
Source: National Health Interview Survey, 2017.
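The trichotomous grouping rule can be sketched as follows, assuming a hypothetical list of (pace, number of interviews) pairs; the fastest and slowest groups are chosen so that each accounts for roughly 15% of all interviews worked:

```python
def trichotomize(interviewers, share=0.15):
    """interviewers: list of (pace, n_interviews) tuples, where lower pace
    means faster. Returns a dict mapping list index -> group label."""
    ordered = sorted(range(len(interviewers)), key=lambda i: interviewers[i][0])
    total = sum(n for _, n in interviewers)
    groups = {}
    cum = 0
    for i in ordered:                      # fastest (lowest pace) first
        groups[i] = "fastest" if cum < share * total else "medium"
        cum += interviewers[i][1]
    cum = 0
    for i in reversed(ordered):            # slowest first
        if cum >= share * total:
            break
        groups[i] = "slowest"
        cum += interviewers[i][1]
    return groups

# Hypothetical (pace, n_interviews) data for six interviewers:
ivwers = [(9.5, 30), (10.0, 30), (10.5, 30), (11.0, 30), (11.5, 30), (12.0, 30)]
labels = list(trichotomize(ivwers).values())
print(labels.count("fastest"), labels.count("medium"), labels.count("slowest"))
```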
Taking interviewer pace as an example, the full model described in Equation 21.1 (dichotomous outcomes) or Equation 21.3 (continuous outcomes) was then estimated and the IIC was computed (Equation 21.2 for dichotomous outcomes, Equation 21.4 for continuous outcomes) for each of the 39 questions analyzed (see details regarding selection of questions below) using data collected by the fastest interviewers. This process was repeated for the slowest interviewers, and then for the interviewers with medium pace. A data set was then constructed that included a total of 117 IICs: 39 for interviewers with the slowest pace, 39 for interviewers with the fastest pace, and 39 for interviewers with a medium pace. The same steps were undertaken for the three interviewer cooperation rate groups, the three groups of interviewers defined by the number of sample adult interviews completed in 2017, and the two groups of interviewers defined by whether they worked on the NHIS in 2016. (See online supplemental Table A21.5 for a description of the data set used in these analyses.)
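The resulting analysis file is a simple cross of questions by pace groups, which can be sketched as follows; `iic_for_group` is a hypothetical stand-in for the full estimate-then-compute pipeline described above:

```python
def build_iic_dataset(questions, groups, iic_for_group):
    """One row per (question, group) pair, holding that group's computed IIC."""
    return [
        {"question": q, "group": g, "iic": iic_for_group(q, g)}
        for g in groups
        for q in questions
    ]

rows = build_iic_dataset(
    [f"q{i}" for i in range(39)],              # the 39 analyzed questions
    ["fastest", "slowest", "medium"],          # the three pace groups
    lambda q, g: 0.0,                          # placeholder IIC function
)
print(len(rows))  # 117
```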
We then took the computed IICs and tested for differences in median IIC for the groups defined by each interviewer characteristic using either a Kruskal-Wallis test (for interviewer measures broken into three groups) or a Mann-Whitney-Wilcoxon two-sample test (for the "worked on the NHIS in 2016" measure). Again, the resulting distributions of IICs within groups of interviewers tended to be right-skewed; hence, the focus on medians as opposed to means. In addition, where suggested by the initial results, we collapsed two groups (for the trichotomous measures) and re-tested for a significant difference using Mann-Whitney-Wilcoxon two-sample tests. Again, given the small sample sizes for these analyses, we used an alpha level of .10 to determine if differences in median IICs were statistically significant.
As a check on the robustness of the findings of these analyses, we also estimated two-level models in which random intercepts were included for question and fixed effects were estimated for a specific interviewer characteristic (e.g., pace of interview defined as fastest, slowest, and medium). The results of these models corroborate the findings of the analyses described in this section and reported in Section 21.3.2.
Selection of questions for analysis of interviewer characteristics. We used the following criteria to select questions for this analysis. First, to ensure adequate sample sizes, we focused on questions where all sample adults were in universe. Second, for dichotomous outcomes, we limited the analysis to those with a split no more extreme than 90%/10% to avoid unstable models resulting from empty cells. Third, for questions that produced multiple outcomes, we selected the most prevalent response category to represent the question. In all, 45 of the 102 questions met these criteria. Due to model convergence errors, however, six items were eliminated, leaving 39 questions for analysis.
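The selection criteria can be sketched as a simple filter over question records; the records and field names below are invented for illustration:

```python
def select_questions(questions):
    """Apply the selection criteria to hypothetical question records."""
    selected = []
    for q in questions:
        if not q["all_adults_in_universe"]:          # criterion 1
            continue
        # Criterion 2: for dichotomous outcomes, the most prevalent category
        # may cover at most 90% of responses (minority category >= 10%).
        if q["dichotomous"] and q["prevalence"] > 0.90:
            continue
        selected.append(q["name"])
    return selected

# Hypothetical question records:
candidates = [
    {"name": "q_everyday", "all_adults_in_universe": True,
     "dichotomous": True, "prevalence": 0.60},
    {"name": "q_rare",     "all_adults_in_universe": True,
     "dichotomous": True, "prevalence": 0.97},   # fails 90/10 split
    {"name": "q_subset",   "all_adults_in_universe": False,
     "dichotomous": True, "prevalence": 0.55},   # fails universe criterion
]
print(select_questions(candidates))  # ['q_everyday']
```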