Implicational relations
Let us first return to the issue of how implicational relations constrain the inflectional uncertainty associated with individual cells, continuing with the example of the Russian noun paradigms in Table 7.5. As shown in Table 7.6, the dative and instrumental singulars exhibit the greatest variation, each with three distinct realizations. In the case of the dative singular, the first declension form slovarju ends in u, the second declension form nedele ends in e and the third declension form tetradi ends in i. The instrumental singular is realized by the corresponding forms in ëm, ej and ju. In order to measure the reduction in uncertainty obtained by locating these cells in paradigms, we must first determine the amount of uncertainty that would be associated with them in isolation.
A measure of the uncertainty associated with a cell can be defined in terms of the surprisal values of the cell’s realizations. Surprisal assigns a measure of information to an ‘outcome’, in this case the occurrence of a given realization, that is inversely related to its likelihood of occurrence. Let us use ‘p(u)’ to represent the probability that the dative singular cell is realized by a form in u. Then I_{u}, the surprisal value of the dative singular exponent u, can be defined in Figure 7.1 as the negative log of its probability p(u).^{[1]}
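Figure 7.1 itself is not reproduced in this text. Assuming the base-2 logarithm that note 2 adopts for entropy, the definition stated above amounts to the following. This is a reconstruction from the prose, not a copy of the figure:

```latex
I_{u} = -\log_{2} p(u)
```

On this assumption, an exponent with p(u) = 1/2 carries 1 bit of surprisal, while one with p(u) = 1/4 carries 2 bits.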
This notion of surprisal expresses the intuition that the less likely u is to occur, the more informative it is when it does occur. Surprisal values can be defined for each realization, based on its probability of occurrence. The uncertainty associated with the dative singular cell can then be defined as the sum of the surprisal values of the cell’s realizations, weighted by frequency. This is the uncertainty value measured by entropy (Shannon 1948).
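Written out, the frequency-weighted sum just described takes the following form, where x ranges over the realizations of a cell C and I_{x} is the surprisal of x. This is again a reconstruction from the prose that assumes base-2 logarithms:

```latex
H(C) = \sum_{x} p(x)\, I_{x} = -\sum_{x} p(x) \log_{2} p(x)
```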
To apply this measure to a paradigm cell C, let R_{C} be the set of realizations of C, i.e. the set {Xu, Xe, Xi} in the case of the dative singular. As in the definition of surprisal, let p(x) represent the probability that the cell C is realized by a realization x. Then H(C), the entropy of C, is defined in Figure 7.2.^{[2]} The entropy of a cell increases as a function of the number of outcomes and the uniformity of their distribution. The greatest uncertainty arises when there is a large number of equiprobable outcomes. Uncertainty is reduced when there are fewer ‘choices’, either because there are few outcomes in total or because the outcomes have a highly skewed distribution. As with surprisal, there is an intuitive correlation between entropy and the number and distribution of alternatives. The larger the choice space and the more evenly distributed the alternatives, the more difficult it is to guess which alternative will occur.
Figure 7.1 Exponent ‘surprisal’ (cf. Cover and Thomas 1991, Hale 2001)
Figure 7.2 Cell entropy (cf. Shannon 1948)
INFORMATION-THEORETIC WP
Figure 7.3 Entropy ‘ceiling’ for dative singular cell in Russian
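The two factors named above, the number of outcomes and the uniformity of their distribution, can be seen in a minimal Python sketch of the entropy definition. The function name `entropy` and the skewed distribution [0.8, 0.1, 0.1] are illustrative assumptions, not data from Russian.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits: the probability-weighted sum of surprisal values."""
    # Outcomes with zero probability contribute nothing (0 * log 0 is taken as 0).
    return sum(-p * math.log2(p) for p in probs if p > 0)


# More equiprobable outcomes -> more uncertainty.
two_way = entropy([1/2, 1/2])          # 1 bit
three_way = entropy([1/3, 1/3, 1/3])   # log2(3), about 1.585 bits
four_way = entropy([1/4] * 4)          # 2 bits

# The same three outcomes, skewed toward one of them -> less uncertainty.
skewed = entropy([0.8, 0.1, 0.1])

print(f"{two_way:.3f} {three_way:.3f} {four_way:.3f} {skewed:.3f}")
```

Under these assumptions the skewed three-way distribution comes out at roughly 0.92 bits, below even the even two-way split, because one outcome is far easier to guess than the others.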
The use of entropy provides a particularly robust estimation of morphological uncertainty. On the one hand, the accuracy of the estimation improves as the accuracy of distributional information obtained from corpora increases. On the other hand, the estimation degrades gracefully as the accuracy of distributional information decreases, so that the absence of frequency information does not prevent estimation altogether but instead yields a worst-case estimate. As a consequence, an ‘entropy ceiling’ can be estimated from a grammar, word list or other descriptive source that identifies an inventory of morphological variants but provides little or no information about their distribution.
For example, an entropy ceiling for the dative singular cell, C_{ds}, can be defined from a standard reference grammar. For the sake of illustration, let us assume that the forms u, e and i are the only dative singular exponents in Russian.^{[3]} In the absence of frequency information, let us also assume that the endings are evenly distributed, so that each occurs one-third of the time. Under these conditions, the probability values ‘p(x)’ in Figure 7.2 effectively ‘cancel out’ in Figure 7.3. As a result, the entropy of the dative singular cell, H(C_{ds}), reduces to log_{2}(n), where n is the number of case exponents.^{[4]}
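Under the uniform assumption p(u) = p(e) = p(i) = 1/3, the reduction can be reconstructed step by step as follows, with the last step using the identity in note 4:

```latex
\begin{aligned}
H(C_{ds}) &= -\sum_{x \in \{u, e, i\}} p(x) \log_{2} p(x)
           = -3 \cdot \tfrac{1}{3} \log_{2} \tfrac{1}{3} \\
          &= -\log_{2} \tfrac{1}{3} = \log_{2} 3 \approx 1.585 \text{ bits}
\end{aligned}
```

More generally, with n equiprobable exponents each of the n terms is (1/n) log_{2}(1/n), so the terms sum to log_{2}(1/n) and H = -log_{2}(1/n) = log_{2}(n).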
The entropy of approximately 1.59 bits (log_{2} 3) represents the worst-case uncertainty measure for a cell that can be realized by three exponence patterns. Adding information about the frequency of these alternatives will tend to introduce a distributional bias that reduces entropy. This is again an intuitive effect, since the more unbalanced the distribution of alternatives is, the easier it is to guess the more frequent alternatives. But even in the absence of frequency information, entropy defines a useful uncertainty ceiling for a cell.
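The ceiling property can be checked numerically: for a fixed number of outcomes n, no distribution exceeds log_{2}(n), and any departure from uniformity lowers the value. The sketch below compares three distributions over u, e and i; the two skewed ones are invented for illustration and are not corpus frequencies for Russian.

```python
import math


def entropy(probs):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


ceiling = math.log2(3)  # worst case for a three-way contrast

# Hypothetical distributions over the dative singular endings u, e, i.
hypothetical = {
    "uniform": [1/3, 1/3, 1/3],
    "mildly skewed": [0.5, 0.3, 0.2],
    "strongly skewed": [0.9, 0.05, 0.05],
}

for label, dist in hypothetical.items():
    h = entropy(dist)
    print(f"{label:16s} H = {h:.3f} bits (ceiling {ceiling:.3f})")
```

Only the uniform distribution reaches the ceiling; the more strongly skewed the hypothetical frequencies, the further the estimate falls below it.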
Classical WP models constrain the uncertainty associated with individual cells by locating those cells in paradigms that exhibit patterns of interdependent choices.
Figure 7.4 Conditional entropy (cf. Cover and Thomas 1991:16)
The uncertainty-reducing effect of paradigmatic affiliation can again be expressed in information-theoretic terms, as the conditional entropy of a cell C_{2} given a known cell C_{1}. Conditional entropy measures the amount of uncertainty that remains associated with C_{2} if C_{1} is already known.
Conditional entropy is standardly defined, as in Figure 7.4, in terms of p(y|x), the conditional probability of a realization y of C_{2} given a realization x of C_{1}. The conditional probability of y given x, p(y|x), is in turn defined as p(x, y)/p(x), the joint probability of x and y divided by the probability of x.
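In the notation of this section, the standard definition can be written as follows, with x ranging over the realizations of C_{1} and y over those of C_{2}. This is a reconstruction following the textbook formulation cited for Figure 7.4, assuming base-2 logarithms; the second line is an equivalent form:

```latex
\begin{aligned}
H(C_{2} \mid C_{1}) &= -\sum_{x} p(x) \sum_{y} p(y \mid x) \log_{2} p(y \mid x),
\qquad p(y \mid x) = \frac{p(x, y)}{p(x)} \\
 &= -\sum_{x}\sum_{y} p(x, y) \log_{2} p(y \mid x)
\end{aligned}
```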
Just as entropy corresponds to the frequency-weighted sum of surprisal values, conditional entropy can be thought of as the weighted sum of the surprisal values of conditional probabilities. This correspondence again reflects a simple intuition. Continuing with the paradigms in Table 7.5, there will be three basic relations between the realization of the dative singular cell and the realization of other cells. If a cell exhibits no inflectional variation, the same realization will co-occur with each of the three dative singular realizations and will therefore be of no value in predicting which of the three occurs in a given paradigm or inflection class. At the other extreme, a cell that exhibits the same three-way contrast as the dative singular cell will be maximally useful in predicting the dative singular realization. Between these extremes lie cells that exhibit variation that partially overlaps with the variation shown by the dative singular.
The paradigms in Table 7.5 illustrate these three types of covariation. The nominative, dative, instrumental and locative plurals exhibit no variation, the instrumental singular exhibits congruent variation, and the remaining cells exhibit overlapping variation. Each pattern is represented by a block of rows in Table 7.8. The first block indicates that the instrumental plural realization ami co-occurs one-third of the time with each of the dative singular realizations u, e and i. Hence the joint probability p(ami, x) = 1/3 for each dative singular realization x, and p(ami), the summed probability of ami, is 1.
The second block of rows indicates that the genitive singular realization a co-occurs with the dative singular realization u, while the genitive singular realization i co-occurs with the other two dative singular realizations, e and i. The final block indicates that the three instrumental singular realizations each co-occur with a single dative singular realization. The joint probabilities in Table 7.8 in turn determine the conditional probabilities, and hence the conditional entropy, of the corresponding cells. The conditional entropy of the dative singular given the instrumental plural, H(C_{ds}|C_{ip}), is defined in Figure 7.5. The intuitive observation that an invariant cell is a poor predictor of inflectional variants is reflected in the fact that H(C_{ds}|C_{ip}) preserves all of the entropy, 1.59 bits, associated with H(C_{ds}) in Figure 7.3.
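Using the joint probabilities of Table 7.8 and the definition of conditional entropy, the calculation behind Figure 7.5 can be reconstructed as follows. Since p(ami) = 1, each conditional probability is p(y | ami) = (1/3)/1 = 1/3, so the sum is the same as for the unconditioned cell:

```latex
\begin{aligned}
H(C_{ds} \mid C_{ip}) &= -\,p(\mathit{ami}) \sum_{y \in \{u, e, i\}} p(y \mid \mathit{ami}) \log_{2} p(y \mid \mathit{ami}) \\
 &= -1 \cdot 3 \cdot \tfrac{1}{3} \log_{2} \tfrac{1}{3} = \log_{2} 3 \approx 1.585 \text{ bits} = H(C_{ds})
\end{aligned}
```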
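Table 7.8 Invariant, overlapping, and congruent co-occurrence
(joint probability of a realization x of the cell shown and a dative singular realization; final column: summed probability p(x))
Cell       x      Dat Sg u   Dat Sg e   Dat Sg i   Σ
Inst Pl    ami    1/3        1/3        1/3        1
Gen Sg     a      1/3        0          0          1/3
Gen Sg     i      0          1/3        1/3        2/3
Inst Sg    ëm     1/3        0          0          1/3
Inst Sg    ej     0          1/3        0          1/3
Inst Sg    ju     0          0          1/3        1/3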
Figure 7.5 Conditional entropy of dative singular given instrumental plural
Figure 7.6 Conditional entropy of dative singular given instrumental singular
Figure 7.7 Conditional entropy of dative singular given genitive singular
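The calculations behind Figures 7.6 and 7.7 can likewise be reconstructed from Table 7.8, using the convention 0 · log 0 = 0. Each instrumental singular realization fixes the dative singular with conditional probability 1, while the genitive singular realization i leaves a two-way choice between e and i, each with conditional probability (1/3)/(2/3) = 1/2:

```latex
\begin{aligned}
H(C_{ds} \mid C_{is}) &= -\sum_{x \in \{\text{\"{e}m},\, \text{ej},\, \text{ju}\}} \tfrac{1}{3}\,\bigl(1 \cdot \log_{2} 1\bigr) = 0 \\[4pt]
H(C_{ds} \mid C_{gs}) &= -\tfrac{1}{3}\,\bigl(1 \cdot \log_{2} 1\bigr)
  - \tfrac{2}{3}\,\bigl(\tfrac{1}{2}\log_{2}\tfrac{1}{2} + \tfrac{1}{2}\log_{2}\tfrac{1}{2}\bigr) \\
 &= 0 + \tfrac{2}{3} \cdot 1 = \tfrac{2}{3} \approx 0.667 \text{ bits}
\end{aligned}
```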
Figure 7.6 shows how cells that covary are each good predictors of the variation in the other. In this case, the conditional entropy of the dative singular given the instrumental singular, H(C_{ds}|C_{is}), eliminates all of the entropy associated with the dative singular. A cell that exhibits partially overlapping covariation will likewise reduce uncertainty by an intermediate amount. Thus the covariation of the dative and genitive singular in Table 7.8 is reflected in Figure 7.7 by the fact that the conditional entropy of the dative singular given the genitive singular, H(C_{ds}|C_{gs}), is 2/3 of a bit, so that knowing the genitive singular eliminates a little under 60 per cent (about 0.92 of the 1.59 bits) of the entropy associated with the dative singular.
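The three calculations in Figures 7.5 to 7.7 can be recomputed from the joint probabilities in Table 7.8 with a short Python sketch. It assumes, as the text does, that each declension accounts for one third of the distribution; the dictionary layout, the cell labels and the function name `conditional_entropy` are illustrative choices, not part of the original analysis.

```python
import math
from collections import defaultdict
from fractions import Fraction

# Joint probabilities p(x, y) from Table 7.8: x is the realization of the
# predictor cell, y the dative singular realization. Assumes the three
# declensions of Table 7.5 are equiprobable (1/3 each).
THIRD = Fraction(1, 3)
TABLE_7_8 = {
    "inst_pl": {("ami", "u"): THIRD, ("ami", "e"): THIRD, ("ami", "i"): THIRD},
    "gen_sg": {("a", "u"): THIRD, ("i", "e"): THIRD, ("i", "i"): THIRD},
    "inst_sg": {("ëm", "u"): THIRD, ("ej", "e"): THIRD, ("ju", "i"): THIRD},
}


def conditional_entropy(joint):
    """H(Y | X) in bits: -sum over (x, y) of p(x, y) * log2 p(y | x)."""
    p_x = defaultdict(Fraction)
    for (x, _y), p in joint.items():
        p_x[x] += p  # summed (marginal) probability of each predictor realization
    h = 0.0
    for (x, _y), p in joint.items():
        if p > 0:
            h -= float(p) * math.log2(float(p / p_x[x]))  # p(y|x) = p(x,y)/p(x)
    return h


h_ds = math.log2(3)  # unconditioned entropy of the dative singular
for cell, joint in TABLE_7_8.items():
    h = conditional_entropy(joint)
    print(f"H(ds | {cell}) = {h:.3f} bits, {h / h_ds:.0%} of H(ds) remains")
```

Under these assumptions the sketch gives about 1.585 bits for the instrumental plural, 0 bits for the instrumental singular and 2/3 of a bit for the genitive singular, which leaves roughly 42 per cent of the original uncertainty.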
 [1] Hale (2001) applies this notion of ‘surprisal’ (which he attributes to Attneave (1959:6)) to the syntagmatic task of measuring cognitive load in sentence processing.
 [2] Shannon’s use of a base 2 logarithm provides an entropy value that measures uncertainty in bits, which is the standard measure of information.
 [3] See Timberlake (2004: §3.6) for a more detailed discussion of case allomorphy.
 [4] The final step in this reduction exploits the general correspondence between the logarithm of a reciprocal and the negative of a logarithm, i.e., for any base b, log_{b}(1/x) = -log_{b}(x).