
Knowledge and uncertainty

A learning-based perspective also helps to clarify why information-theoretic measures are so useful for modelling morphological systems and for predicting speakers’ behavioural responses (Milin et al. 2009a,b). Contemporary theoretical models typically describe a morphological system by characterizing an inventory of well-formed morphological units and associated sets of contrastive grammatical properties/structures. In almost all models, the well-formedness of each individual unit is determined independently of other units in the system. The main exception involves cases of the kind discussed in Chapter 6.2.3, where an association between forms is established by ‘rules of referral’.

The main extension proposed in information-theoretic approaches involves the integration of information about the distribution of units. Taking type frequency into account provides a principled means of correcting the overrepresentation of restricted patterns in ‘one-unit-one-vote’ inventories. More generally, frequency information permits an uncertainty-based definition of morphological variation in terms of the distribution of alternatives. Implicational structure then correlates with the reduction of this uncertainty.
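The uncertainty-based definition can be made concrete with Shannon entropy: variation is the entropy of the distribution over alternatives, and an implicational dependency reduces that entropy. The sketch below uses invented type frequencies for a hypothetical declension system (all exponents and figures are illustrative assumptions, not data from the works cited):

```python
import math

def entropy(freqs):
    """Shannon entropy (bits) of a distribution given raw type frequencies."""
    total = sum(freqs.values())
    return -sum(f / total * math.log2(f / total) for f in freqs.values() if f)

# Invented type frequencies for genitive singular exponents.
gen_sg = {"-a": 60, "-u": 30, "-i": 10}
h_gen = entropy(gen_sg)  # uncertainty when nothing else is known

# Knowing the nominative exponent narrows the choice of genitive
# exponent: an implicational dependency, again with invented counts.
gen_given_nom = {
    "-o": {"-a": 55, "-u": 5},
    "-ø": {"-u": 25, "-i": 10, "-a": 5},
}
nom_total = sum(sum(d.values()) for d in gen_given_nom.values())
h_cond = sum(
    sum(d.values()) / nom_total * entropy(d) for d in gen_given_nom.values()
)
print(round(h_gen, 3), round(h_cond, 3))
```

The conditional value comes out lower than the unconditional one, which is the sense in which implicational structure ‘correlates with the reduction of this uncertainty’.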

The notions of uncertainty and uncertainty reduction measured by this kind of information-theoretic approach have direct reflexes in psycholinguistic models of language learning and processing. Language learning, like learning in other cognitive domains, receives a natural interpretation as a process of uncertainty reduction (Ramscar et al. 2013a; Ramscar and Port 2016).[1] Uncertainty reduction is so central to language learning that infants appear to track the rates of uncertainty reduction associated with information sources:

Given how rapidly infants learn, even in complex environments, we can infer that they are able to access implicit knowledge about their rate of uncertainty reduction and use that knowledge to select to which source(s) of information to attend. (Gerken and Balcomb 2010: 82)

Surprisal-based models of syntactic processing (Hale 2001; Levy 2008) operate with a similar conception of uncertainty reduction, most transparently in models that embody the Entropy Reduction Hypothesis (Hale 2003, 2006).
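In these models, the surprisal of an input is its negative log probability given the preceding context, and the entropy of the predictive distribution is the expected surprisal; the Entropy Reduction Hypothesis tracks how this quantity falls as input is processed. A minimal numeric sketch, with an invented next-word distribution (the words and probabilities are assumptions for illustration only):

```python
import math

def surprisal(p):
    """Surprisal (bits) of an event with probability p."""
    return -math.log2(p)

# Invented predictive distribution over the next word in some context.
next_word = {"man": 0.5, "boat": 0.25, "regime": 0.25}

# Entropy of the distribution = expected surprisal of the next input.
h = sum(p * surprisal(p) for p in next_word.values())
print({w: surprisal(p) for w, p in next_word.items()})  # "man": 1 bit; others: 2 bits
print(h)  # 1.5 bits of uncertainty before the next word arrives
```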

In sum, language learning and language processing are both usefully construed in terms of uncertainty reduction. Speakers appear to be particularly attentive to features that reduce uncertainty about unencountered input. In processing syntagmatic structures, these are the features that permit speakers to predict unencountered input or accommodate to the distributional patterns of an interlocutor or larger speech community. In a paradigmatic context, implicational dependencies express information that a speaker can exploit in extrapolating from a partial (and biased) sample of a language.

In contrast, theoretical approaches can be construed as modelling an essentially pedagogical task. These approaches confront the same basic challenge as the lexicographer, who must enumerate the well-formed words (or other units) of a language and exclude the ill-formed units. It is of course true that producing and recognizing acceptable expressions of a language are tasks that a competent speaker is normally expected to be able to perform. Yet the ability to distinguish well-formed from ill-formed expressions is a poor idealization of a speaker’s linguistic ‘knowledge’ and there is no evidence that the acquisition of this knowledge is ever a goal in itself. Instead, the task of discriminating the acceptable expressions of a language appears to be subsumed within the larger and more useful task of determining the distribution and use of the expressions that are in circulation in a language community.

The divergence between inventory- and distribution-based approaches can be seen to reflect more fundamental assumptions about the nature of language. For the most part, the theoretical models developed in the contemporary period have approached languages as systems of discrete categorical inventories. The sound system is idealized as a discrete set of phones, classified as phonemes or allophones, and a similar inventory-based conception is extended to larger morphological and syntactic domains (whether characterized in terms of units or rules). Although there are various ways in which models can be augmented to accommodate statistical patterns, the underlying language system remains categorical, as does the speaker’s knowledge of the system.

This conception often incorporates a distinction between the content of a speaker’s knowledge (their linguistic ‘competence’ in the terms of Chomsky 1965) and their use of language (Chomsky’s ‘performance’). However, studies of phenomena such as frequency effects have shown that a speaker’s command of a language involves detailed knowledge about the frequency of expressions, relative frequencies of component parts and larger collections, patterns of collocation, and other types of contextual information.[2] All of these types of ostensibly ‘performance-related’ knowledge have a much more clearly established psychological relevance than the factors proposed in connection with economy conditions or scientific compactness. The role that distributional information appears to play in learning and its value for predicting behavioural responses support a conception of the speaker’s knowledge in terms of a probabilistic language model rather than a simple inventory.
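The contrast between an inventory and a probabilistic language model can be sketched in a few lines. The toy corpus and counts below are invented; the point is only that a distributional model records graded expectations (conditional probabilities) rather than a binary verdict of membership:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()  # invented toy corpus

# An inventory-style model: a set of attested words (binary membership).
inventory = set(corpus)

# A distribution-style model: bigram counts supporting graded expectations.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def p_next(prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    total = sum(bigrams[prev].values())
    return bigrams[prev][word] / total if total else 0.0

# Both "cat" and "mat" pass the inventory test, but the probabilistic
# model additionally knows that "cat" is the likelier continuation.
print("cat" in inventory, "mat" in inventory)  # True True
print(p_next("the", "cat"), p_next("the", "mat"))
```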

  • [1] The notion of ‘information gain’ in machine learning expresses a broadly similar idea (Abu-Mostafa et al. 2012).
  • [2] See, e.g., the essays in Bybee (2007) and the discussion in Baayen (2010).
 