


Limits of validity

Although the evidence for the reliability, validity and usefulness of subjective well-being measures is strong, like all measures they are not perfect, and there are limitations that need to be considered by both producers and users of subjective well-being data.

Subjective well-being measures have been found to have a relatively high noise-to-signal ratio. For example, in reviewing the evidence, Diener (2011) states that around 60-80% of the variability in life satisfaction scales is associated with long-term factors and that the remaining 20-40% is due to occasion-specific factors and errors of measurement. These occasion-specific factors can include one-off occurrences that affect large numbers of people simultaneously, such as major news events or Valentine’s Day (Deaton, 2011), or circumstantial events that may affect individuals’ momentary mood prior to the survey (Schwarz and Strack, 2003). Whilst the latter effect should be sufficiently random to wash out of large representative data sets, the former implies that a reasonable number of days, as well as people, need to be sampled to reduce the risk of systematic error. This is further supported by work demonstrating that the day of the week (e.g. Taylor, 2006; Helliwell and Wang, 2011), the season (Harmatz et al., 2000) and the weather (e.g. Barrington-Leigh, 2008) can also influence certain subjective well-being measures, although results do tend to be more mixed in these areas.
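The point about sampling days as well as people can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a method taken from the studies cited: the true mean, the size of the shared day effect (day_sd) and the size of individual mood noise (person_sd) are hypothetical values chosen only for illustration.

```python
import random
import statistics


def survey_mean(n_people, n_days, rng, true_mean=7.0, day_sd=0.3, person_sd=1.5):
    """Mean reported life satisfaction for one survey.

    Respondents are spread evenly over n_days. Each day has a shock shared
    by everyone interviewed that day (e.g. a major news event); each person
    also has independent mood noise. All parameter values are hypothetical.
    """
    day_effects = [rng.gauss(0.0, day_sd) for _ in range(n_days)]
    total = 0.0
    for i in range(n_people):
        shared_shock = day_effects[i % n_days]
        personal_noise = rng.gauss(0.0, person_sd)
        total += true_mean + shared_shock + personal_noise
    return total / n_people


def spread_of_estimates(n_people, n_days, n_surveys=200, seed=0):
    """Standard deviation of the survey mean across repeated simulated surveys."""
    rng = random.Random(seed)
    means = [survey_mean(n_people, n_days, rng) for _ in range(n_surveys)]
    return statistics.pstdev(means)


if __name__ == "__main__":
    # Same number of respondents, interviewed on one day versus fifty days.
    print("1 day:  ", round(spread_of_estimates(1000, 1), 3))
    print("50 days:", round(spread_of_estimates(1000, 50), 3))
```

In this sketch the individual noise averages out as the sample grows, but the shared day effect does not: with a single interview day, the estimate stays about as uncertain as the day effect itself, however many people are interviewed.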

Table 1.2. Evidence on the validity of measures of subjective well-being

Type of evidence, with sources

Face validity

• Item-specific non-response rates.
• Time to reply.

Rassler and Riphahn (2006); Smith (2013); ONS (2011).

Convergent validity

• Ratings by friends and family.

Frey and Stutzer (2002); Pavot and Diener (1993); Schneider and Schimmack (2009).

• Ratings by interviewers.

Pavot and Diener (1993).

• Emotion judgements by strangers.

Diener, Suh, Lucas and Smith (1999).

• Frequency/intensity of smiling.

Frey and Stutzer (2002); Kahneman and Krueger (2006); Seder and Oishi (2012).

• Changes in behaviour.

Frijters (2000); Diener (2011); Clark, Georgellis and Sanfey (1998).

• Biophysical measures.

Urry et al. (2004); Steptoe, Wardle and Marmot (2005); Kahneman and Krueger (2006).

• Relationships among different evaluative, affective and/or eudaimonic measures.

Diener, Helliwell and Kahneman (2010); Kahneman and Krueger (2006); Clark and Senik (2011); Diener, Wirtz, Biswas-Diener, Tov, Kim-Prieto, Choi and Oishi (2009); Huppert and So (2009).

Construct validity

• Association with income (individual and national level).

Sacks, Stevenson and Wolfers (2010).

• Life events (e.g. impact of becoming unemployed, married, disabled, divorced or widowed).

Diener, Lucas and Napa Scollon (2006); Lucas (2007); Lucas, Clark, Georgellis and Diener (2003); Winkelmann and Winkelmann (1998).

• Life circumstances (health status, education, social contact, being in a stable relationship).

Dolan, Peasgood and White (2008); NEF (2009).

• Daily activities (e.g. commuting, socialising, relaxing, eating, praying, working, childcare, housework).

Kahneman and Krueger (2006); Frey and Stutzer (2008); Helliwell and Wang (2011); Stone (2011).

Subjective well-being measures can also be sensitive to specific aspects of the survey content. For example, Deaton (2011) found that asking questions about whether or not the country is going in the right direction immediately before an evaluative subjective well-being measure exerted a strong downward influence on the data. Similarly, a number of authors have shown a question order effect when life satisfaction and dating or marriage satisfaction questions are asked (e.g. Strack, Martin and Schwarz, 1988; Schwarz, Strack and Mai, 1991; Tourangeau, Rasinski and Bradburn, 1991). Pudney (2010) finds some evidence that the survey mode impacts the relationship between different satisfaction domains and their objective drivers. These effects are real and can have significant implications for measurement and fitness for use. However, they are largely factors that have the potential to be managed through consistent survey design. Chapter 2 discusses these issues and the best way of handling them.

Differences may also exist among respondents in terms of how questions are interpreted, how different response formats and scale numbers are used, and the existence of certain response styles, such as extreme responding and acquiescence. Socially desirable responding may also impact the mean levels of reported subjective well-being. The evidence for these effects, and the methodological implications, are discussed in further detail in Chapter 2. To the extent that these differences are randomly distributed among populations of interest, they will contribute to random “noise” in the data without necessarily posing a fundamental challenge to data users. However, where they systematically vary across different survey methods, and/or where they affect certain groups, nationalities or cultures differently, this can make the interpretation of group and sample differences in subjective well-being problematic.

Group differences in scale use may arise for a number of reasons, including translation issues (e.g. Veenhoven, 2008; Oishi, 2010), differences in susceptibility to certain response styles (Hamamura, Heine and Paulhus, 2008; Minkov, 2009; van Herk, Poortinga and Verhallen, 2004), or the cultural relevance and sensitivity of certain subjective well-being questions (Oishi, 2006; Vittersø, Biswas-Diener and Diener, 2005). Various methods do exist to detect and control for these effects, and survey design can also seek to minimise this variability, as is described in Chapter 2. However, further research is needed to inform the best approach to international comparisons in particular. That said, there is evidence to suggest that these response differences do not extensively bias the analysis of determinants in micro-data (Helliwell, 2008; Fleche, Smith and Sorsa, 2011). At the national level, there remain clear and consistent relationships between objective life circumstances and subjective well-being (Helliwell, 2008), and these differences are reflected in mean scores - such that, for example, the distributions of life satisfaction scores in Denmark and Togo are almost non-overlapping (Diener, 2011).

A final consideration is the extent to which individual, cultural or national fixed effects influence subjective well-being measures - including differences in personality and dispositional affect (Diener, Oishi and Lucas, 2003; Diener, Suh, Lucas and Smith, 1999; Schimmack, Diener and Oishi, 2002; Suls and Martin, 2005). These differences may be due to genetic factors (e.g. Lykken and Tellegen, 1996) or environmental factors in a person’s development that produce chronically accessible and stable sources of information on which subjective well-being assessments may be based (e.g. Schimmack, Oishi and Diener, 2002). To the extent that public policy can shape a person’s life experiences, particularly in developmental phases, the existence of large and relatively stable individual fixed effects does not necessarily mean that measures are insensitive to the effects of policy interventions - but the time frames over which these experiences take effect may be quite long.

It is clear that individual fixed effects are real, and they account for a significant proportion of variance across individuals. For example, in two national longitudinal panel studies, Lucas and Donnellan (2011) found that 34-38% of variance in life satisfaction was due to stable trait differences, and a further 29-34% of additional variance was due to an autoregressive component that was moderately stable in the short-term, but could fluctuate over longer time periods. However, this study did not include measures of the objective life circumstances that might impact on both stable trait-like and autoregressive components. Different cultures and nations also vary in both mean levels (e.g. Bjørnskov, 2010; OECD, 2011) and in the composition or construction of reported subjective well-being (e.g. Schimmack, Oishi and Diener, 2002; Diener, Napa Scollon, Oishi, Dzokoto and Suh, 2000; Oishi, 2006) - although again, it is rare to find research that explicitly documents the contribution of national or cultural fixed effects over and above objective life circumstances.
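To see what a decomposition of this kind implies, the sketch below computes the correlation one would expect between two life satisfaction measurements taken some years apart. It assumes a simple additive model in which total variance splits into a stable trait share, an autoregressive share that decays with the time lag, and an occasion-specific remainder. The additive form, the particular shares (chosen from within the ranges quoted above) and the yearly stability value ar_stability are illustrative assumptions, not figures or methods confirmed by the study.

```python
def implied_retest_correlation(lag_years, trait_share=0.36, ar_share=0.31,
                               ar_stability=0.8):
    """Expected correlation between two measurements lag_years apart.

    Assumed model (for illustration only):
      - trait_share: share of variance that never changes,
      - ar_share: share that carries over from year to year, with a
        year-on-year correlation of ar_stability (hypothetical value),
      - the remainder is occasion-specific and shared by no two occasions.
    """
    if lag_years < 1:
        raise ValueError("lag_years must be at least 1")
    return trait_share + ar_share * ar_stability ** lag_years


if __name__ == "__main__":
    for lag in (1, 2, 5, 10, 20):
        print(lag, round(implied_retest_correlation(lag), 3))
```

Under these assumptions the correlation is highest at short lags and falls towards the stable trait share as the lag grows, which matches the description of a component that is moderately stable in the short term but can fluctuate over longer periods.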

It is possible that individual, cultural and national fixed effects are substantive, in so far as they have a genuine impact on how people subjectively feel about their well-being. If this is the case, they should not be regarded as measurement error. In practice, however, it is not always easy to disentangle fixed effects in actual experienced subjective well-being from differences in translation, response styles, question meaning and retrospective recall biases - although Oishi (2010) suggests that measurement artefacts (such as differences in number use, item functioning and self-presentation) play a relatively small role in explaining overall national differences at the mean level. What we do know is that, although there is a reasonably stable component to life evaluations, these measures are still sensitive to life circumstances, and do change over time in response to life events (Lucas, 2007; Lucas, Clark, Georgellis and Diener, 2003; Diener, Lucas and Napa Scollon, 2006). Thus, the main drivers of subjective well-being of interest to policy-makers, including long-term life circumstances that may influence resilience to the impact of negative life events, can still be examined. Panel data can also be used so that the impact of changes in life circumstances (including policy interventions) can be examined whilst controlling for fixed effects where necessary.
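One common way of controlling for individual fixed effects in panel data is the within (person-demeaning) estimator, sketched below on simulated data. Everything in the simulation is a hypothetical assumption made for illustration: the true effect of unemployment on life satisfaction (true_effect), the link between a person's stable disposition and their risk of unemployment, and the noise levels. The source does not prescribe this particular estimator.

```python
import random


def slope(xs, ys):
    """Ordinary least squares slope of ys on xs (single regressor)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_panel(n_people=500, n_waves=6, true_effect=-1.0, seed=1):
    """Rows of (person, unemployed, life_satisfaction); all values hypothetical."""
    rng = random.Random(seed)
    rows = []
    for person in range(n_people):
        disposition = rng.gauss(0.0, 1.0)  # stable individual fixed effect
        # Assumed confounding: a lower disposition means a higher unemployment risk.
        p_unemployed = 0.3 if disposition < 0 else 0.1
        for _ in range(n_waves):
            unemployed = 1.0 if rng.random() < p_unemployed else 0.0
            satisfaction = (7.0 + disposition + true_effect * unemployed
                            + rng.gauss(0.0, 0.5))
            rows.append((person, unemployed, satisfaction))
    return rows


def pooled_estimate(rows):
    """Ignores the panel structure, so fixed effects leak into the slope."""
    return slope([r[1] for r in rows], [r[2] for r in rows])


def within_estimate(rows):
    """Subtracts each person's own means, removing anything fixed per person."""
    by_person = {}
    for person, x, y in rows:
        by_person.setdefault(person, []).append((x, y))
    dx, dy = [], []
    for obs in by_person.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        dx.extend(x - mean_x for x, _ in obs)
        dy.extend(y - mean_y for _, y in obs)
    return slope(dx, dy)


if __name__ == "__main__":
    panel = simulate_panel()
    print("pooled:", round(pooled_estimate(panel), 2))
    print("within:", round(within_estimate(panel), 2))
```

In this simulated setting the pooled estimate overstates the drop in life satisfaction, because it mixes the effect of unemployment with the lower disposition of those more likely to be unemployed. The within estimate uses only changes within each person and so lands close to the assumed true effect.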

Summary: Limits of validity

While there is a wide range of issues that potentially limit the validity of subjective measures of well-being, many of these are either of marginal impact in terms of fitness for purpose (i.e. they do not substantively affect the conclusions reached) or can be dealt with through appropriate survey design. For example, any contextual effects from the preceding question will not bias the analysis of changes over time if the survey itself does not change over time. In practical terms, the impact of these limitations on fitness for purpose depends very much on the purpose for which the data is being used. These issues will be dealt with in Chapter 2. One major limit does, however, need to be acknowledged. Despite evidence that cultural factors do not substantively bias multivariate analysis, there is good reason to be cautious about cross-country comparisons of levels of subjective well-being - particularly life satisfaction.

Coherence and the measurement of subjective well-being

Coherence addresses the degree to which different measures of the same underlying construct tell the same story. Two similar statistics might reflect slightly different concepts, and hence not be comparable even though they are both highly accurate. Coherence depends on the use of common concepts, definitions and classifications. For this reason, the issue of coherence in measures of subjective well-being essentially reduces to the case for a common measurement framework. As discussed in the first section of this chapter, this is the primary rationale for producing a set of guidelines. While the guidelines themselves will not initially constitute a standard, they are a necessary initial step in the process that might later lead to a formal standard.
