Data Structure

The analytic data set has a complex nested structure. Each of the 125 outcomes was measured at the respondent level. Respondents (level 1) are uniquely nested within interviewers (level 2) as well as counties (level 2). Interviewers, however, are not uniquely nested within counties, resulting in a cross-classified data structure. Since interviewers and counties are both treated as level 2 units in this analysis, interviewers and counties are said to be cross-classified for each of the 125 outcomes. Figure 21.1 depicts the cross-classified data structure. In total, 26,742 sample adults were nested within 991 interviewers and 815 counties. Therefore, on average, there are roughly 27 respondents per interviewer and 33 respondents per county.[1] We did not constrain the number of respondents (level 1 units) per interviewer or county (level 2 units) to some minimum (e.g., 10 per group). Research has shown that the number of groups or level 2 units (of which we have large sample sizes) is more important in determining statistical power for multilevel modeling than the number of level 1 units within groups (Snijders 2005; West, Chapter 23, this volume), and that small numbers of level 1 units within groups are not detrimental to point and interval estimates of parameters (Bell et al. 2014; Maas and Hox 2005).

FIGURE 21.1
Network graph depicting cross-classification of respondents (125 outcomes) by interviewer and county.

Statistical Analyses

For each of the 125 outcomes, we started with an unconditional model that included only the random effects for interviewers. In step two, we added the random effects for counties. In step three, we added the fixed effects of respondent characteristics to the model. In steps four and five, we added the fixed effects of the county-level characteristics and then the interviewer characteristics, respectively. Hence, step five was the final step in the modeling process, in which all random and fixed effects were included in the model.[2] The IIC based on the final model for each outcome is the focus of all analyses reported hereafter.

Of the 125 outcomes, 119 are dichotomous and were therefore modeled using logistic regression. Following Beretvas (2010), the full logistic regression model predicts a logit transformation of the probability that a dichotomous outcome for respondent $i$ is equal to 1 as a function of an overall mean ($\gamma$), a set of $p$ respondent characteristics $\left(\sum_{a=1}^{p}\beta_a\,\mathrm{Respondent\_char}_a\right)$, a set of $q$ county characteristics $\left(\sum_{b=1}^{q}\beta_b\,\mathrm{County\_char}_b\right)$, a set of $r$ interviewer characteristics $\left(\sum_{c=1}^{r}\beta_c\,\mathrm{Interviewer\_char}_c\right)$, a random effect due to county $s$ ($u_{0s}$), and a random effect due to interviewer $t$ ($u_{0t}$), where $u_{0s}$ and $u_{0t}$ are normally distributed with mean zero and variances $\sigma^2_{u0s}$ and $\sigma^2_{u0t}$, respectively:

$$\ln\left(\frac{P(y_{i(st)}=1)}{1-P(y_{i(st)}=1)}\right) = \gamma + \sum_{a=1}^{p}\beta_a\,\mathrm{Respondent\_char}_a + \sum_{b=1}^{q}\beta_b\,\mathrm{County\_char}_b + \sum_{c=1}^{r}\beta_c\,\mathrm{Interviewer\_char}_c + u_{0s} + u_{0t} \tag{21.1}$$

The following formula was used to approximate the value of the IIC for dichotomous outcomes:

$$\mathrm{IIC} = \frac{\sigma^2_{u0t}}{\sigma^2_{u0t} + \sigma^2_{u0s} + 3.29} \tag{21.2}$$

The level 1 (respondent) variance is set to 3.29, which is the variance of the underlying standard logistic distribution (Snijders and Bosker 1999).
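To give a sense of the magnitude this formula produces (a purely illustrative calculation, not an estimate from this study): an interviewer variance of 0.10 combined with a county variance of 0.05 would yield

$$\mathrm{IIC} = \frac{0.10}{0.10 + 0.05 + 3.29} \approx 0.029,$$

that is, roughly 3% of the variance of the underlying latent outcome would be attributable to interviewers.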

The remaining six outcomes were treated as continuous and modeled using linear regression. Again, following Beretvas (2010), the full linear regression model predicts a continuous outcome ($y_{i(st)}$) as a function of an overall mean ($\gamma$), a set of $p$ respondent characteristics $\left(\sum_{a=1}^{p}\beta_a\,\mathrm{Respondent\_char}_a\right)$, a set of $q$ county characteristics $\left(\sum_{b=1}^{q}\beta_b\,\mathrm{County\_char}_b\right)$, a set of $r$ interviewer characteristics $\left(\sum_{c=1}^{r}\beta_c\,\mathrm{Interviewer\_char}_c\right)$, a random effect due to county $s$ ($u_{0s}$), a random effect due to interviewer $t$ ($u_{0t}$), and a residual term ($e_{i(st)}$), where $u_{0s}$ and $u_{0t}$ are normally distributed with mean zero and variances $\sigma^2_{u0s}$ and $\sigma^2_{u0t}$, respectively, and $e_{i(st)}$ is normally distributed with mean zero and variance $\sigma^2_e$:

$$y_{i(st)} = \gamma + \sum_{a=1}^{p}\beta_a\,\mathrm{Respondent\_char}_a + \sum_{b=1}^{q}\beta_b\,\mathrm{County\_char}_b + \sum_{c=1}^{r}\beta_c\,\mathrm{Interviewer\_char}_c + u_{0s} + u_{0t} + e_{i(st)} \tag{21.3}$$

The IIC equation for continuous outcomes is as follows:

$$\mathrm{IIC} = \frac{\sigma^2_{u0t}}{\sigma^2_{u0t} + \sigma^2_{u0s} + \sigma^2_e} \tag{21.4}$$

The two-level, cross-classified logistic regression models were estimated with SAS PROC GLIMMIX (v9.4) using maximum likelihood estimation (method=laplace), while the two-level, cross-classified linear regression models were estimated with SAS PROC MIXED (v9.4) using restricted maximum likelihood estimation.
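In sketch form, these calls look roughly as follows; all data set and variable names (nhis2017, y, y_cont, intv_id, county_id, and the *_char predictors) are hypothetical placeholders rather than actual NHIS variable names:

```sas
/* Cross-classified logistic model for a dichotomous outcome (cf. Equation 21.1).
   Data set and variable names are illustrative only. */
proc glimmix data=nhis2017 method=laplace;
  class intv_id county_id;
  model y(event='1') = resp_char1 resp_char2   /* respondent fixed effects  */
                       county_char1            /* county fixed effects      */
                       intv_char1              /* interviewer fixed effects */
                       / dist=binary link=logit solution;
  random intercept / subject=intv_id;          /* random interviewer effect */
  random intercept / subject=county_id;        /* random county effect      */
run;

/* Cross-classified linear model for a continuous outcome (cf. Equation 21.3). */
proc mixed data=nhis2017 method=reml;
  class intv_id county_id;
  model y_cont = resp_char1 resp_char2 county_char1 intv_char1 / solution;
  random intercept / subject=intv_id;
  random intercept / subject=county_id;
run;
```

The covariance parameter tables from these procedures supply the $\sigma^2_{u0t}$ and $\sigma^2_{u0s}$ (and, for PROC MIXED, $\sigma^2_e$) terms needed for Equations 21.2 and 21.4.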

IICs by question characteristics. As noted previously, 13 of the 102 questions under analysis produced two or more outcomes. To analyze results at the question level, IICs for multiple outcomes from a single question were averaged; the average IIC was then assigned to that question. We took this approach to ensure that a characteristic associated with a single question would not be over-represented in the data set. This was especially important for our analysis of IICs by question characteristics. To explore differences in IICs by question characteristics, we computed median IICs across the 102 questions for each category of a question characteristic (e.g., the median IIC for questions that include optional text versus the median IIC for questions that do not; see Table 21.1 for descriptive statistics for the question characteristics). As discussed in the Results section, we focus on medians as opposed to means due to the right-skewed distribution of the estimated IICs. Accordingly, to test for significant differences in median IICs by categories of a question characteristic, we used the following non-parametric tests: the Mann-Whitney-Wilcoxon two-sample test for characteristics with two categories (e.g., sensitive versus non-sensitive questions) and the Kruskal-Wallis test for measures with three or more categories (e.g., question length, broken into quartiles). Given the small sample size (n = 102) for these analyses, we used an alpha level of .10 to determine whether differences in median IICs were statistically significant.

TABLE 21.1
Descriptive Statistics for Measures Used in Analyses of IICs by Question Characteristics

Measure                                                               Number of Questions   Percent
Question on a complex topic
  Yes                                                                  22                    21.6
  No                                                                   80                    78.4
Question length
  Quartile 1 (<68 characters)                                          24                    23.5
  Quartile 2 (68-94 characters)                                        26                    25.5
  Quartile 3 (95-155 characters)                                       26                    25.5
  Quartile 4 (≥156 characters)                                         26                    25.5
Flesch reading ease score
  Very easy/easy/fairly easy (scores of 70.0-100.0)                    55                    53.9
  Standard (scores of 60.0 to <70.0)                                   25                    24.5
  Fairly difficult/difficult/very confusing (scores of 0.0 to <60.0)   22                    21.6
Question includes definitions/clarifying statements
  Yes                                                                  25                    24.5
  No                                                                   77                    75.5
Question includes optional text
  Yes                                                                  52                    51.0
  No                                                                   50                    49.0
Type of question
  Factual/demographic                                                  85                    83.3
  Attitudinal/subjective                                               17                    16.7
Question is deemed to be sensitive
  Yes                                                                  17                    16.7
  No                                                                   85                    83.3

Source: National Health Interview Survey, 2017.
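Both non-parametric tests are implemented in the same SAS procedure the chapter already relies on for estimation; a minimal sketch (data set and variable names such as iic_by_question, iic, optional_text, and length_quartile are hypothetical):

```sas
/* Two categories: the WILCOXON option gives the Mann-Whitney-Wilcoxon
   two-sample test (e.g., optional text: yes vs. no). */
proc npar1way data=iic_by_question wilcoxon;
  class optional_text;
  var iic;
run;

/* Three or more categories: the same option yields the Kruskal-Wallis
   test (e.g., question length quartiles). */
proc npar1way data=iic_by_question wilcoxon;
  class length_quartile;
  var iic;
run;
```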

IICs by interviewer characteristics. We are also interested in the associations between characteristics of interviewers or their behaviors and interviewer effects. For interviewer pace, we were particularly interested in the tails of the distribution, that is, interviewers who went the fastest and interviewers who went the slowest. We created a trichotomous measure of pace, whereby the fastest interviewers (who worked roughly 15% of interviews) comprised one group, the slowest interviewers (who also worked roughly 15% of interviews) comprised a second group, and the remaining interviewers comprised the third group. A similar measure was created for interviewer cooperation rates. Interviewers with the lowest cooperation rates (who worked roughly 15% of interviews) were assigned to one group, interviewers with the highest cooperation rates (who also worked roughly 15% of interviews) were assigned to a second group, and the remaining interviewers comprised the third group. Finally, we focused on two measures of interviewer experience: whether the interviewer worked on the NHIS in 2016 and the total number of sample adult interviews the interviewer completed in 2017. The latter measure defined three groups of interviewers: 1-20 sample adult interviews, 21-40 sample adult interviews, and 41 or more sample adult interviews. (See Table 21.2 for descriptive statistics for these measures.)

TABLE 21.2
Descriptive Statistics for Measures Used in Analyses of IICs by Interviewer Characteristics

Measure                                        Number of      Percent of     Number of    Percent of
                                               Interviewers   Interviewers   Interviews   Interviews
Pace of interview (mean seconds per question)
  Group 1: <6.81                               165            16.6            4,041       15.1
  Group 2: >6.81 to <10.57                     632            63.8           18,709       70.0
  Group 3: >10.57                              194            19.6            3,992       14.9
Cooperation rate
  Group 1: <64.02                              296            29.9            3,980       14.9
  Group 2: >64.02 to <87.85                    530            53.5           18,793       70.3
  Group 3: >87.85                              165            16.6            3,969       14.8
Number of sample adult interviews
  Group 1: 1-20                                515            52.0            4,157       15.5
  Group 2: 21-40                               236            23.8            6,977       26.1
  Group 3: 41 or more                          240            24.2           15,608       58.4
Worked on the NHIS in 2016?
  Yes                                          855            86.3            1,768        6.6
  No                                           136            13.7           24,974       93.4

Source: National Health Interview Survey, 2017.
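The pace and cooperation groups were defined by shares of interviews rather than shares of interviewers, which can be built by sorting interviewers and accumulating their interview counts. A sketch under hypothetical names (intv_pace with one row per interviewer, pace in mean seconds per question, n_interviews per interviewer); the 15% cuts are approximate:

```sas
/* Sort interviewers from fastest to slowest. */
proc sort data=intv_pace;
  by pace;                               /* mean seconds per question */
run;

/* Accumulate each interviewer's share of the 26,742 interviews and cut
   at roughly 15% from each end of the pace distribution. */
data pace_groups;
  set intv_pace;
  cum_share + n_interviews / 26742;      /* sum statement: retained across rows */
  if cum_share <= 0.15 then pace_group = 1;        /* fastest, ~15% of interviews */
  else if cum_share > 0.85 then pace_group = 3;    /* slowest, ~15% of interviews */
  else pace_group = 2;                             /* medium pace */
run;
```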

Taking interviewer pace as an example, the full model described in Equation 21.1 (dichotomous outcomes) or Equation 21.3 (continuous outcomes) was then estimated and the IIC was computed (Equation 21.2 for dichotomous outcomes, Equation 21.4 for continuous outcomes) for each of the 39 questions analyzed (see details regarding selection of questions below) using data collected by the fastest interviewers.[3] This process was repeated for the slowest interviewers, and then for the interviewers with medium pace. A data set was then constructed that included a total of 117 IICs: 39 for interviewers with the slowest pace, 39 for interviewers with the fastest pace, and 39 for interviewers with a medium pace. The same steps were undertaken for the three interviewer cooperation rate groups, the three groups of interviewers defined by the number of sample adult interviews completed in 2017, and the two groups of interviewers defined by whether they worked on the NHIS in 2016. (See online supplemental Table A21.5 for a description of the data set used in these analyses.)
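This per-group estimation can be scripted with a simple macro loop over the three pace groups; a sketch, again under hypothetical names (nhis2017, pace_group, covparms_&g) and with the model statement abbreviated:

```sas
%macro iic_by_pace;
  %do g = 1 %to 3;
    /* Re-estimate the full model within one pace group at a time. */
    proc glimmix data=nhis2017 method=laplace;
      where pace_group = &g;
      class intv_id county_id;
      model y(event='1') = resp_char1 county_char1 intv_char1
                           / dist=binary link=logit solution;
      random intercept / subject=intv_id;
      random intercept / subject=county_id;
      /* Save variance components to compute the IIC via Equation 21.2. */
      ods output CovParms=covparms_&g;
    run;
  %end;
%mend iic_by_pace;
%iic_by_pace;
```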

We then took the computed IICs and tested for differences in median IIC for the groups defined by each interviewer characteristic using either a Kruskal-Wallis test (for interviewer measures broken into three groups) or a Mann-Whitney-Wilcoxon two-sample test (for the "worked on the NHIS in 2016" measure). Again, the resulting distributions of IICs within groups of interviewers tended to be right-skewed; hence, the focus on medians as opposed to means. In addition, where suggested by the initial results, we collapsed two groups (for the trichotomous measures) and re-tested for a significant difference using Mann-Whitney-Wilcoxon two-sample tests. Again, given the small sample sizes for these analyses, we used an alpha level of .10 to determine if differences in median IICs were statistically significant.

As a check on the robustness of the findings of these analyses, we also estimated two-level models in which random intercepts were included for question and fixed effects were estimated for a specific interviewer characteristic (e.g., pace of interview defined as fastest, slowest, and medium). The results of these models corroborate the findings of the analyses described in this section and reported in Section 21.3.2.

Selection of questions for analysis of interviewer characteristics. We used the following criteria to select questions for this analysis. First, to ensure adequate sample sizes, we focused on questions for which all sample adults were in universe. Second, for dichotomous outcomes, we limited the analysis to those in which the rarer response category accounted for at least 10% of responses (i.e., no split more extreme than 90%/10%), to avoid unstable models resulting from empty cells. Third, for questions that produced multiple outcomes, we selected the most prevalent response category to represent the question. In all, 45 of the 102 questions met these criteria. Due to model convergence errors, however, six questions were eliminated, leaving 39 for analysis.
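The 90%/10% screen can be checked programmatically; a sketch in SAS SQL (data set and variable names are hypothetical):

```sas
/* Flag whether a dichotomous outcome passes the 90%/10% screen:
   the proportion in the rarer category must be at least 10%. */
proc sql;
  create table screen as
  select mean(y = 1) as p_yes,
         calculated p_yes between 0.10 and 0.90 as passes_screen
  from nhis2017
  where not missing(y);
quit;
```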

  • [1] Note that the numbers of sample adults, interviewers, and counties vary across outcomes given item nonresponse and question universe.
  • [2] Due to minimal or no variance at the county level, final models did not converge for some outcomes. To achieve convergence in these cases, random effects for county were dropped and the model was re-estimated (county-level fixed effects were retained).
  • [3] Convergence errors and/or non-positive definite random effect covariance matrices emerged for some outcomes due to insufficient variance at the county level. Dropping the random county effects from the models generally solved this problem.
 