The Zipfian Paradigm Cell Filling Problem
The claim that predictive dependencies play a pivotal role in language learning and use may appear counterintuitive at first, particularly for speakers of languages (like English) that exhibit relatively little inflectional variation. Given the volume of input to which speakers are normally exposed, it might seem reasonable to expect that they would encounter all or most of the inflectional variants in a language learned under naturalistic conditions. However, there is good reason to believe that this expectation merely reflects the fact that speakers are so adept at extrapolating from partial exposure that they have no introspective ability to identify the forms they have directly encountered.
Language corpora, which provide the best available representation of language input, appear to exhibit a systematic form of data sparsity. The forms (usually words) of a corpus obey Zipf’s law (Zipf 1949), according to which the frequency of a form is inversely proportional to its rank in the corpus. As corpora (or samples of corpora) increase in size, they do not gradually enumerate all the inflectional variants of regular items. Instead, they reinforce the rank-size distributions established in smaller corpora or samples.
The result is sparsely populated inflectional paradigms that collectively exhaust the inflectional variation exhibited by the inflection and word classes of a language. The distributional biases in the input thus create a genuine ‘stimulus sparsity’ problem for morphological acquisition, since a speaker cannot be assumed to encounter all of the forms of their language. This problem is in some ways reminiscent of the ‘stimulus poverty’ issue that has been debated in the domain of syntax. However, morphological sparsity appears to be a robust fact about the observable composition of what is sometimes termed ‘primary linguistic data’. Sparsity at the morphological level merely reflects the fact that, as Kurumada et al. (2013:440) note, “Zipfian distributions are ubiquitous across natural language”. The initially counterintuitive character of this phenomenon is attributable to their observation that the “consequences [of these distributions] for learning are only beginning to be explored”.
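The way a rank-frequency distribution leaves paradigms sparsely populated can be illustrated with a small simulation. This is a minimal sketch under loudly stated assumptions, not a model of any real corpus. The inventory of lexemes and cells is hypothetical, frequency ranks are assigned to forms at random, and the inventory is closed. A closed inventory would eventually be exhausted by a large enough sample, whereas real corpora continue to introduce new items as they grow.

```python
import random
from collections import defaultdict


def zipf_weights(n):
    """Unnormalised Zipfian weights: the form at rank r gets weight 1/r."""
    return [1.0 / r for r in range(1, n + 1)]


def simulate_coverage(n_lexemes=5000, n_cells=12,
                      sample_sizes=(1_000, 10_000, 100_000), seed=0):
    """Sample tokens from a Zipfian distribution over (lexeme, cell) forms.

    Returns {sample_size: (share of all cells filled,
                           share of lexemes with a complete paradigm)}.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    forms = [(lex, cell) for lex in range(n_lexemes) for cell in range(n_cells)]
    rng.shuffle(forms)  # assumption: frequency rank is unrelated to cell or lexeme
    weights = zipf_weights(len(forms))

    results = {}
    for size in sample_sizes:
        tokens = rng.choices(forms, weights=weights, k=size)
        seen = defaultdict(set)  # lexeme -> set of attested cells
        for lex, cell in tokens:
            seen[lex].add(cell)
        filled = sum(len(cells) for cells in seen.values())
        complete = sum(1 for cells in seen.values() if len(cells) == n_cells)
        results[size] = (filled / len(forms), complete / n_lexemes)
    return results


if __name__ == "__main__":
    for size, (cell_share, full_share) in simulate_coverage().items():
        print(f"{size:>7} tokens: {cell_share:.1%} of cells filled, "
              f"{full_share:.1%} of paradigms complete")
```

Under these assumptions, the share of filled cells grows much more slowly than the sample, and complete paradigms remain rare even when a substantial share of individual cells has been attested.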
The phenomenon itself was recognized by Hockett (1967), who drew a pair of initial consequences. The first was that ‘exemplary paradigms’ were just as much a pedagogical idealization as ‘principal parts’. The second was that a psychologically plausible account must model a speaker’s ability to extrapolate from any new form of an item on the basis of sparsely populated paradigms:
in his analogizing... [t]he native user of the language... operates in terms of all sorts of internally stored paradigms, many of them doubtless only partial; and he may first encounter a new basic verb in any of its inflected forms. (Hockett 1967:221)
Given that patterns of interpredictability permit the extrapolation of a larger system from a subset of forms, these patterns contribute to a solution to the challenge posed by morphological sparsity. Of course interpredictability is just a type of system-level regularity, without which speakers could not reliably predict forms that they had not directly encountered. In a classical WP approach, exemplary paradigms provide the model for the deduction of unencountered forms. In a more psychologically realistic approach, the basis for analogical deductions will be sets of paradigms that exhibit congruent patterns of form variation. These sets correspond to the inflection classes of a classical WP model, with the difference that the component paradigms are partial, so it is the set, rather than any individual paradigm, that collectively exhausts the form variation associated with the class. The idea that the organization of items into inflection classes plays a role in guiding analogical deduction is also consistent with psycholinguistic studies showing that these classes have a direct influence on morphological processing (Milin et al. 2009a).
Although the assignment of paradigms to classes offers a solution to the problem of extrapolating from partial paradigms, this does not entirely avoid the effects of morphological sparsity, given that class assignment is in general less determinate for partial than for full paradigms. In some cases it may be that systems evolve in such a way that the attested variants of items tend to be of high diagnostic value for class assignment. This would, for example, be true of languages in which highly frequent nominative singular forms were a reliable guide to declension class. Other more general factors may also facilitate class assignment. One candidate is lexical form neighbourhoods, whose effects have been investigated in a range of psycholinguistic studies (Baayen et al. 2006; Gahl et al. 2011). In systems where there is a systematic correlation between lexical neighbourhoods and inflection classes, neighbourhood structures could facilitate class assignment. There is suggestive evidence that something of this sort is true in German. The preliminary study reported in Blevins et al. (2016b) found a strong correlation between similarity in form between items, measured by Levenshtein distance (Levenshtein 1966), and the number of co-filled cells in their paradigms. The more similar two forms were, the more of the same cells were filled in their respective paradigms.
These effects may reflect at least in part the fact that the common patterns of inflection that define inflection classes often correlate with similarities in stem shape or other aspects of form. However, whatever their ultimate origin, such correlations will be of use in assigning items to classes in which the variants collectively provide a basis for the analogical deduction of forms.
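The two quantities related in the German study, form similarity measured by Levenshtein distance and the number of co-filled cells, can be sketched as follows. This is only an illustration of the measures themselves and not a reconstruction of the procedure in Blevins et al. (2016b). The partial paradigms and their cell labels are hypothetical attestation patterns invented for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def co_filled(paradigm_a: dict, paradigm_b: dict) -> int:
    """Number of paradigm cells attested for both items."""
    return len(paradigm_a.keys() & paradigm_b.keys())


# Hypothetical partial paradigms: only the cells assumed to be attested.
PARADIGMS = {
    "Tasche":  {"nom.sg": "Tasche", "nom.pl": "Taschen", "dat.pl": "Taschen"},
    "Flasche": {"nom.sg": "Flasche", "acc.sg": "Flasche", "nom.pl": "Flaschen"},
    "Kind":    {"nom.sg": "Kind", "gen.sg": "Kindes"},
}

if __name__ == "__main__":
    items = sorted(PARADIGMS)
    for i, x in enumerate(items):
        for y in items[i + 1:]:
            print(x, y, levenshtein(x, y), co_filled(PARADIGMS[x], PARADIGMS[y]))
```

In this toy data the closest pair of forms also shares the most attested cells, which is the direction of the reported correlation. Nothing about its strength can be read off three invented items.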
-  It does not, e.g., depend on assumptions about the status of ‘structure-dependent operations’ (Chomsky 1965: 56) or on disputed claims (see, e.g., Sampson 1989; Pullum and Scholz 2002; Clark and Lappin 2011) to the effect that “People attain knowledge of the structure of their language for which NO evidence is available in the data to which they are exposed as children” (Hornstein and Lightfoot 1981:9).
-  The ubiquity of Zipfian distributions also helps to account for the prevalence of low conditional entropy, in the sense of Ackerman and Malouf (2013). Since a system will be deduced from the forms that speakers actually encounter, it follows, from a learnability perspective, that those forms will be informative about the deduced system.