# Outlining an econometric model of depression, anxiety, and anger

This section outlines the issues involved, and the variables used, in an econometric analysis of depression, anxiety, and anger. The analysis was based on estimating a logit model for a dependent variable $y_i$ such that $y_i = 1$ if person $i$ ($i = 1, \ldots, N$) had had a particular condition (depression, anxiety, or anger) and $y_i = 0$ if he or she had not experienced that condition. The model was estimated on a vector of variables (discussed later) which could, plausibly, affect the chances of that condition existing. The purpose of the logit model was, first, to establish the probabilities of depression, anxiety, and anger and, then, to examine how the probability of having a condition changed in response to a change in the value of one of the condition-affecting factors. It should be emphasised that, in estimating the logit model, it was not possible, for reasons of multicollinearity, to include all the categories of each categorical variable: the category omitted for a variable is referred to as the reference category (for that variable). In the results, presented later, these reference categories are marked by [R].

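If $\Pr[y_i = 1]$ and $\Pr[y_i = 0]$ represent, respectively, the probabilities of a person having and not having a condition, the logit formulation expresses the log of the odds as a linear function of $K$ variables (indexed $k = 1, \ldots, K$) which take values $X_{i1}, X_{i2}, \ldots, X_{iK}$ with respect to person $i$, $i = 1, \ldots, N$:

$$\log\left(\frac{\Pr[y_i = 1]}{1 - \Pr[y_i = 1]}\right) = \sum_{k=1}^{K} \beta_k X_{ik} \tag{3.1}$$

where $\beta_k$ is the coefficient associated with variable $k$, $k = 1, \ldots, K$.

From equation (3.1) it follows that

$$\Pr[y_i = 1] = \frac{e^{\sum_{k=1}^{K} \beta_k X_{ik}}}{1 + e^{\sum_{k=1}^{K} \beta_k X_{ik}}} \tag{3.2}$$

where the term $e$ in equation (3.2) represents the exponential term.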
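The step from log odds to probability can be checked numerically. The following is a minimal sketch, not the chapter's estimation code: the coefficient values and the person's characteristics are hypothetical, and the leading 1.0 in `x_i` assumes that one of the $K$ variables is a constant.

```python
import math

def log_odds_to_probability(z):
    """Equation (3.2): map the linear index z = sum_k beta_k * X_ik to Pr[y_i = 1]."""
    return math.exp(z) / (1.0 + math.exp(z))

def probability_to_log_odds(p):
    """Equation (3.1) read from left to right: the log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients and one person's values (illustrative only).
beta = [-1.0, 0.5, 0.25]
x_i = [1.0, 1.0, 2.0]  # the first entry, 1.0, plays the role of a constant term

z_i = sum(b * x for b, x in zip(beta, x_i))   # linear index for person i
p_i = log_odds_to_probability(z_i)            # predicted probability for person i

print(f"linear index: {z_i:.3f}, probability: {p_i:.3f}")
```

Applying `probability_to_log_odds` to `p_i` returns the linear index, which confirms that the two equations are inverses of one another.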

## The method of recycled predictions

The results in this chapter are presented in terms of the probabilities computed from equation (3.2), using the method of “recycled predictions” described in Long and Freese (2014, chapter 4) and in a Stata manual.⁷ Since this method underpins the results presented in this chapter it is useful, at the very outset, to describe it in some detail. The variable $y_i$ in equation (3.1) is defined over persons distinguished by different characteristics: gender, race, region, and so forth. Suppose that one of these characteristics is gender, so that persons are identified, inter alia, by whether they are male or female. The object is to identify the probabilities of having a particular condition which can be entirely ascribed to gender and, furthermore, to test whether these differ significantly between men and women. The method of “recycled predictions” enables one to do this.

Suppose that the first variable relates to a person’s gender so that $X_{i1} = 1$ if person $i$ is a man and $X_{i1} = 2$ if she is a woman. For ease of exposition, assume that the persons are ordered so that $X_{i1} = 1$ for $i = 1, \ldots, M$ and $X_{i1} = 2$ for $i = M + 1, \ldots, N$. Now, using the logit estimates from equation (3.1), equation (3.2) predicts for each person his or her probability of having had the condition. This probability is denoted $p_i$ ($i = 1, \ldots, N$). The mean of the $p_i$ defined over all the $N$ persons in the estimation sample will be the same as the (estimation) sample proportion of persons that have had the condition (i.e., persons for whom $y_i = 1$). Similarly, the means of the $p_i$ defined over the $M$ men and over the $N - M$ women will be the same as the (estimation) sample proportions of men and of women, respectively, that have had that condition. In other words, the estimated logit equation passes through the sample means.⁸
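The reason the fitted probabilities reproduce the sample proportions can be sketched from the first-order conditions of the logit log-likelihood. The derivation below assumes that the estimated model contains a constant term and that gender enters through an indicator variable (with one category as the reference); neither assumption is spelled out in the text above. At the maximum-likelihood estimates,

$$\sum_{i=1}^{N} (y_i - p_i)\, X_{ik} = 0, \qquad k = 1, \ldots, K.$$

For the constant term ($X_{ik} = 1$ for all $i$) this condition gives

$$\frac{1}{N}\sum_{i=1}^{N} p_i = \frac{1}{N}\sum_{i=1}^{N} y_i,$$

and for an indicator equal to 1 for women ($i = M + 1, \ldots, N$) and 0 for men it gives

$$\frac{1}{N - M}\sum_{i=M+1}^{N} p_i = \frac{1}{N - M}\sum_{i=M+1}^{N} y_i.$$

Subtracting the women's sums from the full-sample sums yields the corresponding equality for the $M$ men.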

However, the difference between the two sample means, for men ($\bar{p}_M$) and for women ($\bar{p}_W$), does not reflect the differences, due solely to gender, between men and women in their probabilities of having had that condition. This is because men and women differ not just in terms of gender but also with respect to variables like race, region, income, and education, among others. Computing the mean probabilities over each subgroup will not neutralise these differences and, hence, differences between $\bar{p}_M$ and $\bar{p}_W$ cannot be attributed solely to differences in gender (although, of course, some part may be so attributable).

The method of “recycled predictions” isolates the gender effect on the predicted probability of men and women having had a condition. First, “pretend” that all $N$ persons in the estimation sample are men. Holding the values of the other variables constant (either at their observed sample values, as in this chapter, or at their mean values), compute the average probability of having had a condition under this assumption and denote it $\hat{p}_M$. Next, “pretend” that all $N$ persons in the estimation sample are women and, again holding the values of the other variables constant, compute the average probability of having had a condition under this assumption and denote it $\hat{p}_W$.

Since the values of the non-gender variables are unchanged between these two hypothetical scenarios, the only difference between them is that, in the first scenario, the male coefficient is “switched on” (with the female coefficient “switched off”), while in the other scenario the female coefficient is “switched on” (with the male coefficient “switched off”), for all the $N$ persons in the estimation sample.⁹ Consequently, the difference between $\hat{p}_M$ and $\hat{p}_W$ is entirely due to differences in gender. In essence, therefore, in evaluating the effect of two characteristics A and B on the likelihood of a particular outcome, the method of “recycled predictions” compares two probabilities: first, under an “all have the characteristic A” scenario and then under an “all have the characteristic B” scenario, with the values of the other variables remaining unchanged between the scenarios. The difference in the two probabilities is then entirely due to the attributes represented by A and B (in this case, gender differences). These probabilities, respectively $\hat{p}_A$ and $\hat{p}_B$, are referred to in this chapter as the predicted probabilities (PPs) of an event under A and B. So, for example, in the earlier exposition, $\hat{p}_M$ and $\hat{p}_W$ refer to the PPs of men and women having a particular condition.
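The two “pretend” scenarios can be illustrated on made-up data. The sketch below is not the chapter's estimation code: the sample is simulated, `income` is a hypothetical stand-in for the non-gender variables, gender is coded as a 0/1 `female` indicator with men as the reference category (rather than the 1/2 coding used in the exposition), and the logit is fitted by a small hand-written Newton-Raphson routine so that the example needs only the standard library.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fit_logit(X, y, iterations=25):
    """Maximum-likelihood logit coefficients via Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        p = [sigmoid(sum(bk * v for bk, v in zip(beta, row))) for row in X]
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(k)]
        hess = [[sum(pi * (1 - pi) * row[r] * row[c] for row, pi in zip(X, p))
                 for c in range(k)] for r in range(k)]
        beta = [bk + s for bk, s in zip(beta, solve(hess, grad))]
    return beta

# Simulated sample (hypothetical): men first, then women; women have lower income.
rng = random.Random(0)
rows, y = [], []
for i in range(2000):
    female = 1.0 if i >= 1000 else 0.0
    income = rng.gauss(1.0 - 0.5 * female, 1.0)
    z = -0.5 + 0.4 * female - 0.8 * income
    rows.append([1.0, female, income])          # constant, gender indicator, income
    y.append(1.0 if rng.random() < sigmoid(z) else 0.0)

beta = fit_logit(rows, y)

def mean_prob(X):
    """Average predicted probability over the rows of X, using the fitted coefficients."""
    return sum(sigmoid(sum(bk * v for bk, v in zip(beta, row))) for row in X) / len(X)

# Subgroup means: these mix the gender effect with the income difference.
men = [r for r in rows if r[1] == 0.0]
women = [r for r in rows if r[1] == 1.0]
raw_gap = mean_prob(women) - mean_prob(men)

# Recycled predictions: everyone is treated as a man, then everyone as a woman,
# with income left at its observed value for each person.
pp_men = mean_prob([[r[0], 0.0, r[2]] for r in rows])
pp_women = mean_prob([[r[0], 1.0, r[2]] for r in rows])
recycled_gap = pp_women - pp_men

print(f"subgroup-mean gap: {raw_gap:.4f}")
print(f"recycled-prediction gap (PP women - PP men): {recycled_gap:.4f}")
```

Because the simulated women have lower incomes and income lowers the probability of the condition, the subgroup-mean gap absorbs part of the income effect, whereas the recycled-prediction gap reflects only the switch in the gender indicator.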
