Investigating the Use of Nurse Paradata in Understanding Nonresponse to Biological Data Collection


The collection of biological data in social surveys has led to new possibilities for biosocial research by enabling a more comprehensive understanding of the interplay between the different factors affecting health and behavioral outcomes. This combination of survey and biological data has already yielded some significant discoveries in the social and health sciences (e.g., Banks, et al. 2006; Kim, et al. 2013; Puterman, et al. 2016). The biological measurements collected in social surveys include blood and saliva specimens, which can be used to assess markers of disease (e.g., hemoglobin A1c). For example, the Wisconsin Longitudinal Study collects salivary DNA samples from respondents by mail (Dykema, et al. 2017). The Survey of Health, Ageing and Retirement in Europe collects dried blood spots in several European countries plus Israel. Lay interviewers collect the blood spots in addition to carrying out the interview (Weiss, Sakshaug, and Börsch-Supan 2018). Other examples include Understanding Society: the UK Household Longitudinal Study (UKHLS) and the English Longitudinal Study of Ageing (ELSA), which use nurses to collect venous blood samples from respondents (Benzeval, et al. 2014; Steptoe, et al. 2012). Recent analyses of the UKHLS show evidence of links between informal caregiving and adverse metabolic markers (Lacey, McMunn, and Webb 2018), and higher levels of chronic stress biomarkers for those reemployed in poorer quality jobs (Chandola and Zhang 2017).

The collection of biological data in conjunction with survey data comes with a unique set of challenges, one of which is nonresponse. There are at least three stages at which nonresponse can occur. The first stage is the decision of the respondent to take part in the biological collection component of the interview. Respondents could decide to forgo the biological component entirely because of time constraints, general unwillingness, or physical limitations. This would result in nonresponse for all possible measurements, and it could lead to nonresponse bias if those refusing to participate in the biological component are significantly different from those who participate (Groves 2006). This initial stage of participation is likely to have different effects depending on whether the biological data collection occurs at the same time as the interview (as in the lay interviewer collection model), or whether there is a gap between the interview and the biological data collection (as in the nurse collection model). In the case of the UKHLS, the nurse visit took place five months after the interview. By this time, respondents may have reconsidered their decision to participate in the survey, or they may be harder to reach due to increasing work, family, or other competing demands. The 1970 British Birth Cohort tried using nurses to conduct interviews and collect biological data all in one visit, but these efforts were not successful, as nurses did not like making repeat visits and persuading reluctant respondents to participate (Brown 2018).

For respondents who do decide to proceed with the biological component, some may only be willing to consent to a subset of possible measurements. This introduces a second stage of participation. For example, a respondent may be unwilling to consent to the collection of blood, but willing to consent to a less invasive collection, such as saliva. This situation is typical in biosocial surveys where cooperation rates are generally lower for the most invasive measurements (Dykema, et al. 2017; Jaszczak, Lundeen, and Smith 2009; Weiss, Sakshaug, and Börsch-Supan 2018). In the third stage of participation, respondents who consent to specific measurements may be unwilling, or physically unable, to complete the measurement once it begins. For example, it may not be possible to extract sufficient amounts of blood or saliva from the respondent, or the respondent may experience physical discomfort during the collection procedure that prevents them from finishing the collection. Even if a sufficient amount of sample is collected, nonresponse can still occur if the collected sample is mishandled or damaged during the shipment or processing stage.
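To make the three-stage structure concrete, the following minimal sketch computes the conditional response rate at each stage and the overall rate as their product. The function name and all counts are hypothetical and chosen only for illustration; they are not figures from the UKHLS or any other study.

```python
def stage_response_rates(eligible, took_part, consented, completed):
    """Conditional response rate at each stage, plus the overall rate.

    Stage 1: agreed to the biological component at all.
    Stage 2: consented to a specific measurement (e.g., blood).
    Stage 3: the measurement was actually completed and usable.
    """
    stage1 = took_part / eligible
    stage2 = consented / took_part
    stage3 = completed / consented
    return {
        "stage1": stage1,
        "stage2": stage2,
        "stage3": stage3,
        # The overall rate is the product of the conditional rates.
        "overall": stage1 * stage2 * stage3,
    }


# Hypothetical counts, for illustration only.
rates = stage_response_rates(eligible=1000, took_part=800,
                             consented=600, completed=540)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Even when each stage loses only a modest share of cases, the losses compound, which is why each stage is worth modeling separately.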

While samples selected for surveys are designed to be representative of the target population, the multiple stages at which nonresponse can occur during the biological data collection can lead to bias in statistical analyses of the collected data (Cernat, et al. 2018; Korbmacher 2014; Sakshaug, Couper, and Ofstedal 2010). The situation is likely to be more severe when a separate nurse visit is required following the interview and/or more invasive biological measures, such as blood, are collected (McFall, et al. 2014). In order to address this nonresponse, survey researchers typically create and apply sampling weights based on response propensity models (Brick and Kalton 1996). These models may also be useful in adapting or targeting approaches to increase contact and cooperation during fieldwork (e.g., Lynn 2017). However, variables included in the models must be available for both respondents and nonrespondents in the sample. Furthermore, to be effective at reducing bias, the variables used in response propensity models must be related both to the response outcome and the analysis variables of interest (Little and Vartivarian 2005).
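As a rough illustration of propensity-based weighting, the sketch below uses a simple weighting-class adjustment: the response propensity is estimated as the observed response rate within cells of an auxiliary variable, and each respondent is weighted by its inverse. This is a simplified stand-in for a fitted response propensity model, not the procedure used by any of the surveys discussed here. The variable `age_group`, the record layout, and the toy data are assumptions made for the example.

```python
from collections import defaultdict


def weighting_class_adjustment(sample, cell_key, responded_key="responded"):
    """Return inverse-propensity weights for respondents, keyed by record id.

    The propensity in each cell is the share of sample members in that
    cell who responded. The auxiliary variable must therefore be known
    for respondents and nonrespondents alike.
    """
    totals = defaultdict(int)
    responders = defaultdict(int)
    for rec in sample:
        cell = rec[cell_key]
        totals[cell] += 1
        if rec[responded_key]:
            responders[cell] += 1

    weights = {}
    for rec in sample:
        if rec[responded_key]:
            cell = rec[cell_key]
            propensity = responders[cell] / totals[cell]
            weights[rec["id"]] = 1.0 / propensity
    return weights


# Hypothetical toy sample, for illustration only.
sample = [
    {"id": 1, "age_group": "under_50", "responded": True},
    {"id": 2, "age_group": "under_50", "responded": True},
    {"id": 3, "age_group": "under_50", "responded": False},
    {"id": 4, "age_group": "under_50", "responded": True},
    {"id": 5, "age_group": "50_plus", "responded": True},
    {"id": 6, "age_group": "50_plus", "responded": False},
]
weights = weighting_class_adjustment(sample, "age_group")
print(weights)
```

In this toy sample the respondent weights sum to the full sample size, so respondents in low-response cells stand in for the nonrespondents who resemble them on the auxiliary variable. The adjustment reduces bias only to the extent that the auxiliary variable is also related to the analysis variables.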

Auxiliary variables used in response propensity models include different types of data collected during the survey process, known as paradata (for a collection of examples, see Kreuter 2013; for the use of paradata in nonresponse adjustment, see Olson 2013). Paradata can include variables such as call histories, response timings, and interviewer observations, and may provide a cost-effective way to address nonresponse in analyses which use survey data. Such data have been shown to be related to response and to improve the fit of response propensity models, though in many studies these data tend not to be strongly related to the survey variables of interest (Diez Roux 2001; Kreuter and Kohler 2009; Kreuter, et al. 2010; Lin and Schaeffer 1995; Peytchev and Olson 2007; Sakshaug and Kreuter 2011; West, Kreuter, and Trappmann 2014). Paradata can have other quality issues, such as missingness and measurement error, which can vary across sources and variables (Sinibaldi, Durrant, and Kreuter 2013; Smith 2011; Stoop, et al. 2010; West 2013).

The use of paradata to study nonresponse to biological data collection and adjust for its occurrence is an understudied area of research, particularly when the collection involves the employment of trained nurses to carry out the fieldwork in a separate visit following the interview, such as in the UKHLS and ELSA. There is evidence that nurses can systematically affect response at each stage of participation (Cernat, et al. 2018). Although separate weights are constructed to adjust for nonresponse to biological data collection in these large-scale surveys, the auxiliary data used in the adjustment mostly come from the sampling frame and the questionnaire. Paradata collected from the attempted nurse visit may provide further useful information to better predict nonresponse and improve the response propensity models fitted after biological data collection, especially since there is some evidence that nurses may differ from interviewers in terms of how they approach and carry out survey tasks related to the contact and cooperation of sample members (Brown 2018). However, nurses are not used to making follow-up visits and collecting paradata (e.g., doorstep interactions) and may be more prone to errors, especially for difficult cases. Therefore, the range and quality of survey paradata collected by nurses are likely to vary.

The aim of this chapter is to explore the potential for using nurse visit paradata to address nonresponse to biological data collection. We carry out an investigation of the quality of paradata variables collected during the nurse visit phase of the UKHLS. Through this analysis, we determine whether such data can help to predict response across the different stages of biological data collection. This determination will inform our conclusion about whether paradata collected by nurses during biological data collection should be incorporated into nonresponse adjustment procedures, or at least disseminated to data users who can then decide for themselves whether to use these data as control variables in their analysis.

In this chapter, we aim to address the following three research questions: (1) What types of paradata are available at each stage of biological data collection where nonresponse might occur? (2) What is the quality of the available paradata (for example, how much variation exists in the variables, and how much data are missing)? (3) Can paradata variables improve models of nonresponse for each of the stages of biological data collection? Can these variables still be useful even when there are significant concerns about their quality?
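The quality checks named in the second research question can be sketched as a per-variable summary of missingness and variation. This is a minimal illustration, not the analysis carried out in the chapter. The variable names `n_calls` and `doorstep_note` and the records are hypothetical.

```python
def paradata_quality(records, variables):
    """Summarize missingness and variation for each paradata variable.

    A value of None is treated as missing. The number of distinct
    observed values is a crude indicator of variation: a variable with
    a single distinct value cannot help predict response.
    """
    n = len(records)
    summary = {}
    for var in variables:
        observed = [r[var] for r in records if r.get(var) is not None]
        summary[var] = {
            "missing_rate": (n - len(observed)) / n if n else 0.0,
            "distinct_values": len(set(observed)),
        }
    return summary


# Hypothetical nurse-visit paradata, for illustration only.
records = [
    {"n_calls": 1, "doorstep_note": "refusal"},
    {"n_calls": 3, "doorstep_note": None},
    {"n_calls": None, "doorstep_note": None},
    {"n_calls": 2, "doorstep_note": "refusal"},
]
quality = paradata_quality(records, ["n_calls", "doorstep_note"])
for var, stats in quality.items():
    print(var, stats)
```

Here `doorstep_note` is missing for half of the cases and takes a single observed value, so it would contribute little to a response propensity model regardless of how it is coded.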
