Missing or Partially Observed Data

As discussed in Koskinen et al. (2010), ignoring missing data may be particularly detrimental in social network analysis. Specifically, if we assume the type of dependencies discussed in Chapter 7, the state of missing tie-variables may alter our interpretation of what we have observed. The central problem may be understood from a synthetic example: assume that several people know each other on account of a person who then leaves the room, whereupon a researcher arriving late will be at a loss to explain how these strangers came to be together.

If ties are elicited from self-reports, the most likely form of missing data is usually nonresponse, which results in what Huisman (2009) terms “unit” missing. To use as much data as possible while allowing for the nonrespondents to differ from respondents, one may treat nonresponse as a covariate that interacts with configurations of interest, as in Robins, Pattison, and Woolcock (2004). More recently, Handcock and Gile (2007, 2010) proposed a likelihood-based approach to treating missing data in ERGMs. Under the assumption that data are missing at random, the missing data are simulated in the course of estimation, so that the vector of observed statistics is replaced by the expected statistics conditional on the part that has been observed. Procedures for fitting ERGMs to data with missing values are available in statnet and PNet.

If there are missing data, an adapted maximum likelihood estimator can be used. This is the case only if the missingness is ignorable, which roughly means that the probability distribution of what is missing depends on the observed data only and not on nonobserved data, and parameters of the ERGM are distinct from parameters determining the missingness mechanism. For example, snowball sampling designs lead to ignorable missingness (Handcock & Gile, 2010; Thompson & Frank, 2000). Let $x = (x_{\mathrm{obs}}, x_{\mathrm{mis}})$, where $x_{\mathrm{obs}}$ denotes the array of observed tie-variables, and $x_{\mathrm{mis}}$ the array of nonobserved tie-variables. With ignorable missingness, the missing information principle of Orchard and Woodbury (1972) implies that the maximum likelihood equation may be written as

$$E_\theta\{z(X)\} = E_\theta\{z(X_{\mathrm{obs}}, X_{\mathrm{mis}}) \mid X_{\mathrm{obs}} = x_{\mathrm{obs}}\}.$$

Compare this to the maximum likelihood equation for complete data,

$$E_\theta\{z(X)\} = z(x_{\mathrm{obs}}),$$

where $x_{\mathrm{obs}}$ is simply the total observed graph. The left-hand sides of both equations are the same and require calculating the expected value of the sufficient statistic for the ERGM. The right-hand side of the equation that is valid for incomplete data,

$$E_\theta\{z(X_{\mathrm{obs}}, X_{\mathrm{mis}}) \mid X_{\mathrm{obs}} = x_{\mathrm{obs}}\},$$

requires that we make simulated draws from the missing data, conditional on the observed data. The condition means that, in the course of the Metropolis algorithm, only the missing tie-variables are changed, not the observed tie-variables.
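A brief sketch of why the incomplete-data equation takes this form may help. It assumes the usual exponential-family form of the ERGM, with $z$ written for the sufficient statistic and $\kappa(\theta)$ for the normalizing constant (both symbols are notational assumptions here). Under ignorable missingness the likelihood of the observed part is obtained by summing over all possible values of the missing tie-variables:

$$
\begin{aligned}
\ell(\theta) &= \log \sum_{x_{\mathrm{mis}}} \exp\{\theta^{\top} z(x_{\mathrm{obs}}, x_{\mathrm{mis}})\} - \log \kappa(\theta), \\
\frac{\partial \ell(\theta)}{\partial \theta} &= E_\theta\{z(X_{\mathrm{obs}}, X_{\mathrm{mis}}) \mid X_{\mathrm{obs}} = x_{\mathrm{obs}}\} - E_\theta\{z(X)\}.
\end{aligned}
$$

Setting the derivative to zero equates the unconditional expectation of the statistic with its expectation conditional on the observed data. When nothing is missing, the conditional expectation reduces to the observed statistic and the complete-data equation is recovered.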

Koskinen et al. (2010) presented a Bayesian approach for dealing with missing data. Similarly to Handcock and Gile (2007, 2010), missing data are simulated in the course of estimation of the parameters. Although the Bayesian procedure was not designed for link prediction, it appears to do a reasonable job of predicting missing tie-variables.
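The simulation step shared by the likelihood-based and Bayesian approaches can be illustrated with a minimal sketch. It assumes an undirected network, a toy model with only edge and triangle statistics, and fixed parameter values; the function names, the example data, and the parameter values are all hypothetical, and this is not the implementation used in statnet or PNet. The sampler proposes toggles only on missing dyads, so the observed tie-variables stay fixed, and the draws are averaged to approximate the conditional expectation of the statistics and the frequency with which each missing tie is present.

```python
import math
import random


def edge_triangle_stats(adj):
    """Return (edges, triangles) for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    triangles = sum(
        adj[i][j] * adj[j][k] * adj[i][k]
        for i in range(n) for j in range(i + 1, n) for k in range(j + 1, n)
    )
    return edges, triangles


def change_stats(adj, i, j):
    """Change in (edges, triangles) when tie i-j goes from absent to present."""
    shared = sum(adj[i][k] * adj[j][k] for k in range(len(adj)) if k not in (i, j))
    return 1, shared


def conditional_metropolis(adj, missing, theta, n_steps, rng):
    """Metropolis updates restricted to the missing dyads.

    adj     : symmetric 0/1 matrix; missing dyads hold an arbitrary starting value
    missing : list of (i, j) dyads whose tie-variables are unobserved
    theta   : (edge, triangle) parameters, assumed known for this sketch
    """
    adj = [row[:] for row in adj]  # work on a copy
    for _ in range(n_steps):
        i, j = rng.choice(missing)  # observed dyads are never proposed
        d_edge, d_tri = change_stats(adj, i, j)
        log_odds = theta[0] * d_edge + theta[1] * d_tri
        # Log acceptance ratio for toggling the current state of the dyad.
        log_ratio = log_odds if adj[i][j] == 0 else -log_odds
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            adj[i][j] = adj[j][i] = 1 - adj[i][j]
    return adj


def conditional_expectation(adj, missing, theta, n_draws=200, thin=50, seed=1):
    """Average the statistics and missing-tie indicators over conditional draws."""
    rng = random.Random(seed)
    current = conditional_metropolis(adj, missing, theta, 500, rng)  # burn-in
    stat_sums = [0, 0]
    tie_counts = {dyad: 0 for dyad in missing}
    for _ in range(n_draws):
        current = conditional_metropolis(current, missing, theta, thin, rng)
        edges, triangles = edge_triangle_stats(current)
        stat_sums[0] += edges
        stat_sums[1] += triangles
        for i, j in missing:
            tie_counts[(i, j)] += current[i][j]
    mean_stats = [s / n_draws for s in stat_sums]
    tie_freq = {dyad: c / n_draws for dyad, c in tie_counts.items()}
    return mean_stats, tie_freq


# Hypothetical example: node 4 did not respond, so its four dyads are missing.
observed = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
missing = [(0, 4), (1, 4), (2, 4), (3, 4)]
theta = (-1.0, 0.5)
mean_stats, tie_freq = conditional_expectation(observed, missing, theta)
```

In an estimation routine, `mean_stats` would play the role of the right-hand side of the incomplete-data equation at the current parameter value, while `tie_freq` gives a rough indication of which missing ties the model considers likely.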

Dealing with partially missing covariate values is complicated by the fact that we typically do not have a model for the covariates immediately available and that the missing covariates themselves would inherit some of the dependencies of the ERGM itself. Although this may be dealt with in a Bayesian framework, a convenient solution is to bootstrap the missing covariates, to impute them using sample means, or to impute them using some suitably chosen model.
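The two simplest options can be sketched as follows. The sketch assumes a single numeric node covariate with missing values coded as `None`, and it interprets "bootstrap" as drawing replacements with replacement from the observed values; that reading, the function names, and the example data are assumptions made for illustration.

```python
import random
from statistics import mean


def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_bootstrap(values, rng):
    """Replace each missing value (None) with a random draw from the observed values."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


# Hypothetical covariate (e.g., age) for six actors, two of them missing.
ages = [34, None, 29, 41, None, 38]
mean_filled = impute_mean(ages)
boot_filled = impute_bootstrap(ages, random.Random(0))
```

Neither option uses the network itself, so any dependence between the covariate and the tie-variables is ignored. Repeating the bootstrap imputation several times and refitting the model gives a rough sense of how sensitive the estimates are to the imputed values.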

For further conceptual issues and background on missing data in social network analysis, see, for example, Kossinets (2006).
