For the alternating statistics, as noted previously, we often fix the smoothing constant λ, but we may instead treat it as a free parameter to be estimated. The resulting model then belongs to the curved exponential family of distributions, with more parameters than sufficient statistics: in the maximum likelihood equation there are more unknowns, θ, than knowns. The moment equation is no longer equivalent to the maximum likelihood equation, but a similar equation may be constructed by a transformation that increases the number of statistics and parameters so that they are equal (Hunter & Handcock, 2006).
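To illustrate how the smoothing constant enters such a statistic, the sketch below computes the geometrically weighted edgewise shared-partner statistic, one common form of the alternating statistics (Hunter & Handcock, 2006). The function name and the counts-based input are illustrative assumptions, not from the source; fixing the decay constant yields a linear statistic, while estimating it makes the model curved.

```python
import math

def gwesp(esp_counts, lam):
    """Geometrically weighted edgewise shared-partner statistic (a sketch).
    esp_counts[k] = number of edges whose endpoints share exactly k partners;
    lam is the smoothing (decay) constant discussed in the text."""
    return math.exp(lam) * sum(
        (1.0 - (1.0 - math.exp(-lam)) ** k) * n_k
        for k, n_k in enumerate(esp_counts)
    )
```

With lam fixed, gwesp is a single linear statistic of the graph; treating lam as free makes the natural parameters nonlinear functions of (theta, lam), which is exactly the curved-family situation described above.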

Bayesian Inference

In the Bayesian paradigm, uncertainty about what is unknown is modeled by probabilities and probability distributions (see, e.g., Bernardo & Smith, 1994; Box & Tiao, 1973; Lindley, 1965). Because the values of the parameters are unknown, we model our uncertainty about the parameters using a prior distribution. When we have observed data, our model for the data may be used to assess the likelihood of different parameter values, that is, the probability of observing the data under those values. Hence, we may use the data model to update our uncertainty about the parameters into a posterior distribution.
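The prior-to-posterior updating can be illustrated with a model far simpler than an ERGM. The following sketch uses a conjugate Beta prior for a single unknown tie probability with Bernoulli observations; it is purely illustrative and not part of the ERGM machinery discussed here.

```python
def beta_posterior(alpha, beta, successes, failures):
    """Conjugate updating: a Beta(alpha, beta) prior for an unknown
    probability combined with Bernoulli data gives a Beta posterior
    with the observed counts added to the prior parameters."""
    return alpha + successes, beta + failures

# Uniform prior Beta(1, 1); observe 7 ties present and 3 absent.
a, b = beta_posterior(1.0, 1.0, 7, 3)
posterior_mean = a / (a + b)  # = 8 / 12 = 2/3
```

For ERGMs no such closed-form update exists, which is why the sampling procedures described next are needed.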

The posterior distribution is not available in closed form, so a sampling procedure has to be performed in which we draw values from the posterior distribution. This may be done using MCMC, where we update the current parameter vector by proposing new values for the parameters and accept these with a probability that depends on how probable the proposed values are relative to the current values, according to the posterior (an alternative is proposed by Atchadé, Lartillot, and Robert, 2008). This does not rely on the moment equation and applies equally well to curved ERGMs. To evaluate the ratios of posterior densities in the MCMC updating steps, we also need to draw some sample graphs from the model defined by the proposed parameter value (Koskinen, 2008; Koskinen, Robins, & Pattison, 2010). A Bayesian estimation procedure is proposed by Koskinen (2008) and Koskinen et al. (2010), who argue that a Bayesian procedure offers advantages over the maximum likelihood approach. The estimation procedure returns a distribution of likely parameters and offers the possibility of using prior specifications of uncertainty about the parameters (specified subjectively or through conjugacy; Diaconis & Ylvisaker, 1979). Caimo and Friel (2011) proposed a related but superior approach that uses the exchange algorithm of Murray, Ghahramani, and MacKay (2006), which makes Bayesian estimation more computationally efficient (and less complicated than maximum likelihood estimation). These algorithms produce an output θ^{(r)}, θ^{(r+1)}, θ^{(r+2)}, ..., θ^{(T)}, which is a draw from the posterior distribution of θ | x_obs, where r is a suitably chosen burn-in. The relevant information about the parameters given our observation is contained in this output, and a point estimate is obtained as the mean

θ̄ = (θ^{(r)} + θ^{(r+1)} + ··· + θ^{(T)}) / (T − r + 1), with uncertainty measured by the standard deviation SD(θ_k).
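Computing these posterior summaries from the MCMC output is straightforward. The sketch below assumes the draws are stored as rows of an array; the function name is illustrative.

```python
import numpy as np

def posterior_summary(draws, burn_in):
    """Point estimate and uncertainty from MCMC output.
    draws: sequence of parameter vectors theta^{(0)}, ..., theta^{(T)};
    burn_in: the index r, so rows r onward are kept as posterior draws.
    Returns the posterior mean and, per coordinate k, the standard
    deviation SD(theta_k)."""
    kept = np.asarray(draws)[burn_in:]
    return kept.mean(axis=0), kept.std(axis=0, ddof=1)
```

The mean corresponds to the point estimate above, and the per-coordinate standard deviation plays the role that the standard error plays in maximum likelihood estimation.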

The algorithm of Caimo and Friel (2011) is surprisingly simple: in each iteration, a move to θ* is proposed, drawn from a normal distribution centered on the current value θ^{(t)}. Given this proposal, one graph is generated, x* ~ ERGM(θ*). We draw a uniform random variate, u ~ Unif(0, 1), and based on θ* and the generated graph x*, we accept θ* and set θ^{(t+1)} = θ* if log(u) < (θ^{(t)} − θ*)^T (z(x*) − z(x_obs)); otherwise, we set θ^{(t+1)} = θ^{(t)}. Heuristically, this means that we accept parameters that generate graphs similar to the one we have observed.
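The iteration just described can be sketched in a few lines. The toy example below assumes the simplest possible ERGM, the edge-only (Bernoulli) model, where z(x) is the edge count and a graph can be drawn from ERGM(θ) exactly, dyad by dyad; it also assumes the flat prior implicit in the acceptance rule quoted above. Function names, the proposal standard deviation, and the exact sampling shortcut are illustrative assumptions, not part of Caimo and Friel's general procedure, which uses MCMC graph simulation for arbitrary statistics.

```python
import math
import random

def sample_graph_edges(theta, n_dyads, rng):
    """Draw a graph from the edge-only ERGM: each dyad is an independent
    Bernoulli(e^theta / (1 + e^theta)) tie, so z(x) is the edge count."""
    p = math.exp(theta) / (1.0 + math.exp(theta))
    return sum(rng.random() < p for _ in range(n_dyads))

def exchange_sampler(z_obs, n_dyads, n_iter, proposal_sd=0.5, seed=1):
    """Exchange-algorithm sketch for the edge-only ERGM with a flat prior.
    Returns the chain theta^{(0)}, ..., theta^{(n_iter)}."""
    rng = random.Random(seed)
    theta = 0.0
    chain = [theta]
    for _ in range(n_iter):
        # Propose theta* from a normal centered on the current value.
        theta_star = rng.gauss(theta, proposal_sd)
        # Generate one graph x* ~ ERGM(theta*).
        z_star = sample_graph_edges(theta_star, n_dyads, rng)
        # Accept if log(u) < (theta^{(t)} - theta*)^T (z(x*) - z(x_obs)).
        if math.log(rng.random()) < (theta - theta_star) * (z_star - z_obs):
            theta = theta_star
        chain.append(theta)
    return chain
```

In this toy model the acceptance rule rewards proposals whose simulated edge count z(x*) is close to the observed count z(x_obs), so the chain settles around parameter values that reproduce the observed density.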