The data used in these analyses come from 26,742 sample adults aged 18 and over who participated in the 2017 NHIS. A multipurpose, nationally representative health survey of the civilian, noninstitutionalized U.S. population, the NHIS is conducted annually by the National Center for Health Statistics. Interviewers with the U.S. Census Bureau administer the questionnaire using computer-assisted personal interviewing (CAPI). Telephone interviewing is permitted to complete missing portions of the interview.
The sample adult is randomly selected from all adults aged 18 years and over in the family and answers for himself/herself (unless physically or mentally unable to do so, in which case a knowledgeable adult serves as a proxy respondent). The Sample Adult Core module collects information on adult sociodemographics, health conditions, health status and limitations, health behaviors, and health care access and use. The final sample adult response rate for 2017 was 53.0% (National Center for Health Statistics 2018).
Outcome variables. A total of 125 outcomes, based on 102 questions, were included in the analysis (see supplemental Table A21.1). Thirteen questions with a nominal scale produced multiple outcomes, using the approach described by Mangione, Fowler, and Louis (1992). For these 13 questions, response categories with a prevalence of 5% or greater were made into separate outcomes.
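As an illustration of the 5% rule described above, the following minimal sketch expands one nominal question into separate binary outcomes. The function name, the handling of missing responses, and the use of unweighted prevalence are assumptions made for this example; the text does not say whether prevalence was computed with or without sampling weights.

```python
from collections import Counter

def nominal_to_outcomes(responses, min_prevalence=0.05):
    """Expand one nominal question into binary outcomes.

    responses: list of category labels, with None for item nonresponse.
    Returns {category: [0/1/None, ...]} for every category whose
    (unweighted, assumed) prevalence among valid responses is at least
    min_prevalence.
    """
    valid = [r for r in responses if r is not None]
    counts = Counter(valid)
    n_valid = len(valid)
    kept = sorted(c for c, k in counts.items() if k / n_valid >= min_prevalence)
    # One indicator per retained category; missing responses stay missing.
    return {
        c: [None if r is None else int(r == c) for r in responses]
        for c in kept
    }

# Hypothetical question with three response categories.
example = ["a"] * 60 + ["b"] * 37 + ["c"] * 3
outcomes = nominal_to_outcomes(example)
```

In this hypothetical example, category "c" has a prevalence of 3% and is dropped, so the question yields two outcomes rather than three.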
Twenty-five of the 102 questions came from the Family Core module, which collects data on topics such as disability status, health insurance coverage, and income. For this module, a respondent aged 18 or older answers for himself/herself and all other members of the family. In roughly 30% of interviews, the sample adult was not the family respondent; hence, outcomes generated from these 25 questions include a mix of self-report and proxy report. Preliminary analyses showed proxy responses to be associated with slightly larger interviewer effects than self-reports. Rather than exclude proxy reports and incur a substantial loss of data, we included a control variable for self (versus proxy) report in all models of outcomes based on questions from the Family Core module.
Given that a primary focus of our research is how interviewer effects vary by question characteristics, the 102 questions used in this analysis were purposively selected based on question characteristics shown to be associated with interviewer effects in past research*: difficult/complex topics (e.g., details of private health insurance plans) versus easier/less complex topics (e.g., demographics such as age, sex, and marital status); long versus short questions; definitions/clarifying text versus no definitions/clarifying text; optional text versus no optional text; sensitive versus not sensitive; and demographic/factual versus attitudinal/subjective.
*It is important to note that this is neither an exhaustive nor a representative set of NHIS questions. How later results would be affected if all NHIS questions were included is unknown.
For this analysis, we defined difficult questions or questions on complex topics to be those that required respondents to recall behaviors or events that may be difficult to remember (e.g., ever received the hepatitis A vaccine) or dealt with a complicated topic to which respondents may have given little or no thought (e.g., details of public and private forms of health insurance plans). As proxies for difficult or complex topic questions, we hypothesized that these questions would generate a greater number of inadequate responses and more requests for clarification by the respondent. In turn, this may provide more opportunities for interviewers to improvise and deviate from the interview script, potentially leading to larger interviewer effects.
We also hypothesized that questions including definitions of key terms or clarifying information would present greater comprehension problems for respondents. In turn, these items would likely elicit more requests for clarification or to have the question repeated, again providing more opportunities for interviewers to improvise and stray from the interview protocol (e.g., shortening or simplifying the question). Similarly, we hypothesized that questions with lower Flesch reading ease scores (Flesch 1948), i.e., questions that are more difficult to read, would lead to variations across interviewers in how the questions were administered. Additionally, these questions may lead to more comprehension problems among respondents. Flesch reading ease scores were grouped into the following categories for analysis: very easy/easy/fairly easy (70.0 < scores < 100.0), standard (60.0 < scores < 70.0), and fairly difficult/difficult/very confusing (0.0 < scores < 60.0). Finally, we hypothesized that questions with optional text would lead to variability across interviewers as to when this text was read, in turn leading to larger interviewer effects.
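For readers unfamiliar with the measure, the sketch below computes the standard Flesch reading ease score from word, sentence, and syllable counts and assigns it to one of the three analysis categories. The function names are hypothetical. How scores of exactly 60.0 or 70.0 were classified is not stated in the text, so the boundary handling here (exact boundary values fall in the harder category) is an assumption.

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Standard Flesch (1948) reading ease formula.

    Higher scores indicate text that is easier to read.
    """
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

def flesch_category(score):
    """Collapse a score into the three categories used in the analysis.

    Assumption: scores of exactly 70.0 or 60.0 go to the lower (harder)
    category; the source text does not specify boundary handling.
    """
    if score > 70.0:
        return "very easy/easy/fairly easy"
    if score > 60.0:
        return "standard"
    return "fairly difficult/difficult/very confusing"

# Hypothetical question text: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(100, 5, 150)
category = flesch_category(score)
```

Counting syllables automatically requires a dictionary or heuristic, which is why the sketch takes the counts as inputs rather than raw question text.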
Sensitive questions may lead to larger interviewer effects compared to non-sensitive items (Mangione, Fowler, and Louis 1992; Schnell and Kreuter 2005). To identify sensitive questions for this study, we turned to earlier research using NHIS data to explore the effects of respondent, question, and interviewer characteristics on item response times and item nonresponse (Dahlhamer et al. 2019). In that research, each of the five authors rated the sensitivity of 270 questions using two rating items: "This question is very personal (1 = completely disagree to 5 = completely agree)" and "I would be uncomfortable asking this question (1 = completely disagree to 5 = completely agree)". For each question under analysis, the five raters' scores on the first rating item were summed and then the five raters' scores on the second rating item were summed. Since the summed scores based on the two rating items were highly correlated (0.94), the two summed scores for each question under analysis were added together to create an index of sensitivity. The index was then recoded into four discrete categories using quartiles as cut points. Questions falling in quartile four had the highest sensitivity scores (index scores ranging from 18 to 37). A subset of these questions was selected for inclusion in this study.
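The construction of the sensitivity index can be sketched as follows. The function names and the example ratings are hypothetical, and the quartile method (the default "exclusive" method of Python's statistics.quantiles, with ties on a cut point assigned to the lower category) is an assumption, since the text does not describe how cut points were computed.

```python
from statistics import quantiles

def sensitivity_index(personal_ratings, uncomfortable_ratings):
    """Sum the raters' scores on each rating item, then add the two sums.

    Each argument holds one 1-5 rating per rater for a single question.
    """
    return sum(personal_ratings) + sum(uncomfortable_ratings)

def quartile_codes(index_scores):
    """Recode index scores into categories 1-4 using quartile cut points.

    Assumption: a score equal to a cut point falls in the lower category.
    """
    q1, q2, q3 = quantiles(index_scores, n=4)
    return [1 + (s > q1) + (s > q2) + (s > q3) for s in index_scores]

# Hypothetical ratings from five raters for one question.
index = sensitivity_index([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])

# Hypothetical index scores for eight questions.
codes = quartile_codes([10, 12, 14, 16, 18, 20, 22, 24])
```

With five raters and two 1-5 rating items, the index can range from 10 to 50; questions coded 4 correspond to the most sensitive quartile.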
Finally, we selected a mix of attitudinal/subjective questions and factual/demographic questions, again using Dahlhamer et al. (2019) as a guide. (Please see online supplemental Table A21.1 for all questions included in the analysis and how they were coded with respect to question characteristics.)
Covariates included in multilevel models. For each outcome variable, we fitted a multilevel model that included fixed effects of respondent characteristics, interviewer characteristics, and county characteristics. Respondent characteristics included age (18-24, 25-44, 45-64, 65+), sex, race/ethnicity and language (Hispanic-English, Hispanic-Spanish, non-Hispanic white-any language, non-Hispanic black-any language, non-Hispanic other-any language), education (less than high school diploma/General Educational Development (GED) credential, high school diploma/GED, some college/Associate of Arts (AA) degree, bachelor's degree or higher), marital status (married/living with partner, never married, divorced/separated, widowed), reported health status (poor/fair, good, very good/excellent), cognitive difficulties (whether or not sample adult is limited in any way because of difficulty remembering or because he/she experiences periods of confusion), and mode of interview (telephone versus in-person). As noted earlier, for outcomes based on questions in the Family Core module, an additional control was included for whether the sample adult response was self-reported or reported by a proxy. Finally, we included the sample adult sampling weight (log-transformed) as an additional control variable in the full models. (Please see supplemental Table A21.2 for descriptive statistics for respondent characteristics included in the multilevel models.)
The fixed effects of county-level control variables were also included in the final models. These measures largely mirrored the respondent-level demographic measures and represent five-year rolling estimates taken from the American Community Survey (2012-2016). Measures included the proportion of the county population that was: aged 65 and over; under the age of 30; female; aged 25 and older with less than a high school diploma or GED; aged 25 and older with at least a bachelor's degree; married; never married; without health insurance coverage; with a physical or mental disability; Hispanic; and non-Hispanic black. The last two measures on this list were categorical (quartiles), while all other measures were retained as continuous measures and grand-mean centered. (Please see online supplemental Table A21.3 for descriptive statistics for county-level measures included in the multilevel models.)
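Grand-mean centering of a continuous county-level measure amounts to subtracting the overall mean from every value, so that a value of zero represents an average county. The sketch below is illustrative only: the function name and the example proportions are hypothetical, and whether the mean was taken over counties or over respondent records is not stated in the text.

```python
from statistics import fmean

def grand_mean_center(values):
    """Subtract the overall (grand) mean from each value.

    Assumption: `values` holds one entry per analytic unit; the source
    does not say whether that unit is the county or the respondent.
    """
    grand_mean = fmean(values)
    return [v - grand_mean for v in values]

# Hypothetical county proportions aged 65 and over.
centered = grand_mean_center([0.12, 0.15, 0.18, 0.21])
```

After centering, the model intercept refers to a respondent in a county at the mean of each continuous county measure rather than at an implausible value of zero.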
Finally, the fixed effects of four interviewer characteristics were included in the final models. Two were measures of interviewer experience: whether the interviewer worked on the NHIS in 2016 and interviewing experience in 2017. The latter is a chronological count of the sample adult interviews conducted by the interviewer in 2017. For both measures, we hypothesized that greater interviewing experience would lead to greater mastery of interviewing skills and the interview protocol which, in turn, may be associated with smaller interviewer effects on estimates. A third measure captured the interviewer's cooperation rate for 2017. We included this measure to explore the link between interviewers' abilities to secure respondent participation and subsequent data quality. Finally, we included a measure of within-interview interviewer behavior: pace of sample adult interviews for 2017. Pace was defined as the mean number of seconds per question. We hypothesized that a faster interview pace may lead to greater deviations from the interview protocol (e.g., not reading questions exactly as worded) and, therefore, may be associated with larger interviewer effects on estimates. (Please see supplemental Table A21.4 for descriptive statistics for interviewer characteristics included in the multilevel models.)
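Two of the interviewer measures, the chronological interview count and pace, can be derived from interview-level records along the following lines. This is a sketch under stated assumptions: the record layout, function names, and example values are hypothetical, and pace is computed here by pooling seconds and questions across an interviewer's interviews, which is only one reading of "mean number of seconds per question".

```python
from collections import defaultdict

def chronological_counts(interviews):
    """Number each interviewer's interviews in date order (1, 2, 3, ...).

    interviews: list of (interviewer_id, iso_date) tuples.
    Returns a list of counts aligned with the input order.
    """
    order = sorted(range(len(interviews)), key=lambda i: interviews[i])
    seen = defaultdict(int)
    counts = [0] * len(interviews)
    for i in order:
        interviewer_id = interviews[i][0]
        seen[interviewer_id] += 1
        counts[i] = seen[interviewer_id]
    return counts

def interviewer_pace(timings):
    """Mean seconds per question for one interviewer.

    timings: list of (total_seconds, questions_asked) per interview.
    Assumption: seconds and questions are pooled across interviews.
    """
    total_seconds = sum(s for s, _ in timings)
    total_questions = sum(q for _, q in timings)
    return total_seconds / total_questions

# Hypothetical records for two interviewers.
records = [("A", "2017-03-01"), ("B", "2017-01-05"),
           ("A", "2017-01-10"), ("A", "2017-06-02")]
experience = chronological_counts(records)
pace = interviewer_pace([(600, 60), (900, 90)])
```

Because the experience measure increases with each completed interview, it varies within interviewer, unlike the 2016 experience indicator, which is fixed for each interviewer.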