# Information-theoretic WP

A formalization of the classical WP model must make explicit the structure that remains implicit in an ‘exemplary paradigm and principal parts’ description while clarifying the uncertainty-reducing role of the insight “that one inflection tends to predict another” (Matthews 1991:197). As discussed in Chapters 4 and 5, the latter insight runs like a leitmotif through the classical WP tradition. The Neogrammarian notion of ‘analogy’ associated with Paul (1920) rests on the idea that a speaker’s knowledge of inflectional patterns can be exploited to reduce uncertainty about the forms of a new item:

One learns a number of paradigms by heart and then memorizes only as many forms of individual words as is necessary to recognize their affiliation to this or that paradigm. Now and then a single form suffices. One forms the remaining forms at the moment that one needs them, in accordance with the paradigm, that is, by analogy. (Paul 1920:112)

Hockett (1967:221) expresses a similar position in a more contemporary setting when he asserts that a linguist or learner who matches forms against paradigms to deduce novel forms “would now be required to produce new forms in exactly the way the native user of the language produces or recognizes them—by analogy”.^{[1]} However, as discussed in Chapter 4.4, Hockett qualifies this endorsement by explicitly rejecting the pedagogical idealization that the forms of a language can be divided into stored exemplars, diagnostic forms and deduced forms:^{[2]}

The native user of the language... operates in terms of all sorts of internally stored paradigms, many of them doubtless only partial; and he may first encounter a new basic verb in any of its inflected forms. For the native user, the forms that we have for convenience selected to be our ‘principal parts’ have no such favored position. They are as likely to be created analogically, as needed, as are any of the other forms. (Hockett 1967:221)

Traditional proportional analogies provide a format for expressing deductions sanctioned by the implicational structure of a morphological system. If the system is conceptualized as a network of partially interdependent patterns, somewhat along the lines suggested by Bybee (2010), proportions can be thought of as expressing individual correspondences. Hence a description of the structure of a system can be obtained from sets of mutually compatible proportions, along with sequences of proportions that identify the chains of deductions that originate from a particular morphological choice. The ‘paradigm structure conditions’ (PSCs) of Wurzel (1984), in which “implicative patterns determine the structure of the paradigms of a language”, represent the most explicitly formalized variant of this type of analysis:
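A single proportion of the form a : b :: c : x can be read as one such deduction. The sketch below makes that reading concrete under a deliberately narrow assumption: forms are plain strings, and the contrast between a and b lies entirely at the right edge of the word. The function name `solve_proportion` and the Latin illustration (vowel length omitted) are hypothetical choices for this example, not part of Wurzel's or Bybee's proposals.

```python
from typing import Optional


def solve_proportion(a: str, b: str, c: str) -> Optional[str]:
    """Solve the proportion a : b :: c : x by right-edge substitution.

    Assumes the contrast between a and b lies entirely after their longest
    shared prefix, and that c ends in the same material as a.
    """
    # Length of the longest shared initial substring of a and b.
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    a_ending, b_ending = a[n:], b[n:]
    if not c.endswith(a_ending):
        return None  # c does not follow a's pattern, so no deduction is sanctioned
    stem = c[: len(c) - len(a_ending)]
    return stem + b_ending


# Two deductions originating from the same form of a new item:
print(solve_proportion("amo", "amare", "laudo"))  # laudare
print(solve_proportion("amo", "amavi", "laudo"))  # laudavi
```

Returning `None` rather than a guess mirrors the idea that a proportion only sanctions a deduction when the new item shares the relevant pattern; stem alternations, prefixal contrasts and competing proportions fall outside this simple sketch.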

Observation of complicated paradigms shows that implicative relations do not only obtain between one basic inflexional form ... and all the other inflexional forms, but exist throughout the whole paradigm: all paradigms (apart from suppletive cases) are structured on the basis of implicative patterns which go beyond the individual word, patterns of varying complexity. (Wurzel 1984:208)

As discussed in Chapter 4.2, the choice of material implication to represent implicational relations limits the usefulness of PSCs. A formalization of the classical WP model must be able to express exceptionless patterns where they occur. However, a viable formalization must also be highly sensitive to the statistical properties of language, in order to capture the fact that patterns and predictions typically take the form of tendencies of varying reliability.

Models with these properties were initially developed as processing models in the psycholinguistic literature. Somewhat unexpectedly, the conceptions of uncertainty and uncertainty reduction proposed in these approaches turned out to be equally applicable to the problem of modelling paradigmatic variation and implicational structure. As discussed in Chapter 3.1.1, the point of origin for current formalizations of classical WP models lies in a series of morphological processing studies (Kostić 1991, 1995; Kostić *et al.* 2003; Moscoso del Prado Martín *et al.* 2004b) that develop measures of ‘morphological information’ in terms of entropy (Shannon 1948). By chaining together classical WP conceptions of variation as uncertainty with the information-theoretic treatment of uncertainty as entropy, one arrives at a general characterization of variation as entropy. The greater the variation exhibited by a system (or part of a system), the greater the uncertainty and the higher the entropy. From this point, it is a small step to propose, as Ackerman *et al.* (2009) do, that the reduction in uncertainty achieved by implicational relations can be modelled in terms of conditional entropy.^{[3]} One morphological element, whether a cell, a form or some other unit of analysis, is informative about a second element to the extent that knowledge of the first element reduces the amount of uncertainty about the second element, as measured by its entropy.
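Stated in the standard Shannon form, and taking base-2 logarithms as a convention (so that values are in bits), the entropy of a cell $X$ whose realizations $x$ occur with probability $P(x)$, and the conditional entropy of a second cell $Y$ given $X$, are:

$$
H(X) = -\sum_{x \in X} P(x)\,\log_2 P(x)
$$

$$
H(Y \mid X) = -\sum_{x \in X} P(x) \sum_{y \in Y} P(y \mid x)\,\log_2 P(y \mid x)
$$

On this reading, the informativeness of $X$ about $Y$ is the reduction $H(Y) - H(Y \mid X)$, which information theory calls the mutual information of $X$ and $Y$. For instance, if $Y$ has four equally likely realizations, $H(Y) = \log_2 4 = 2$ bits; if each realization of $X$ leaves just two equally likely options for $Y$, then $H(Y \mid X) = 1$ bit, and knowing $X$ reduces uncertainty about $Y$ by 1 bit.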

There are of course different information-theoretic measures that can be applied to the analysis of morphological systems and different interpretations that can be assigned to those measures, and the measures and interpretations adopted in initial accounts are in many respects provisional. Nevertheless, standard information-theoretic measures have properties that are useful for formalizing classical WP models. Entropy provides a measure of the variation that correlates with the complexity of a system. Entropy reduction provides a good approximation of the notion of ‘informativeness’ that underpins implicational relations. A similar notion suggests solutions to the longstanding challenges raised by principal part selection and analogy validation.
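Both measures can be computed directly from a description of inflection classes. The sketch below assumes a hypothetical four-class system in which each class lists the exponent it uses in two cells, `SG` and `PL`, and treats every class as equally probable; the cell labels, exponents and function names are invented for illustration, and weighting classes by type or token frequency would be an equally reasonable alternative.

```python
from collections import Counter
from math import log2

# Hypothetical toy system: four inflection classes, each listing the exponent
# it uses in two cells. Classes are assumed to be equiprobable.
CLASSES = [
    {"SG": "-a", "PL": "-e"},
    {"SG": "-a", "PL": "-i"},
    {"SG": "-o", "PL": "-i"},
    {"SG": "-o", "PL": "-a"},
]


def entropy(cell, classes=CLASSES):
    """H(cell) in bits: uncertainty about which exponent realizes `cell`."""
    counts = Counter(c[cell] for c in classes)
    total = len(classes)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def conditional_entropy(target, given, classes=CLASSES):
    """H(target | given) in bits: uncertainty left once `given` is known."""
    total = len(classes)
    result = 0.0
    for value, n in Counter(c[given] for c in classes).items():
        # Restrict to the classes compatible with the known exponent.
        subset = [c for c in classes if c[given] == value]
        result += (n / total) * entropy(target, subset)
    return result


h_pl = entropy("PL")                          # 1.5
h_pl_given_sg = conditional_entropy("PL", "SG")  # 1.0
print(h_pl, h_pl_given_sg, h_pl - h_pl_given_sg)  # 1.5 1.0 0.5
```

Here the plural cell shows 1.5 bits of variation, and knowing the singular exponent removes 0.5 bits of it: the singular is informative about the plural without fully determining it, which is exactly the kind of graded implicational relation that material implication cannot express.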

Sections 7.2.1–7.2.3 now outline an information-theoretic formulation of a classical WP model and clarify how entropy and entropy reduction contribute to a formal treatment of traditional notions of variation and structure.

- [1] As mentioned in connection with ‘novel forms’ in fn. 1 of Chapter 5 above, the term ‘new forms’ is again interpreted as referring to unencountered forms, not neologisms.
- [2] The pedagogical idealization that a single diagnostic form can be identified for each lexical item corresponds to the ‘Single Base Hypothesis’ of Albright (2002), discussed on p. 95, and to the ‘Single Root Hypothesis’ assumed in the accounts discussed in Aronoff (2012).
- [3] The idea of modelling the structure of morphological systems in terms of entropy had been in circulation before Ackerman et al. (2009) and was proposed the previous year in two independent presentations, Blevins et al. (2008) and Sproat (2008), at the 3rd Workshop on Quantitative Investigations in Theoretical Linguistics in Helsinki.