Grammatical and phonological words
These claims can be clarified by distinguishing two of the established senses of the notion ‘word’. Phonological words (or word forms) are sequences of phonemes (or graphemes). Grammatical words (or words) are phonological words (or sequences of phonological words, in the case of multi-word expressions) with a morphosyntactic interpretation. It is the second of these senses that is of primary relevance to ‘word-based’ models. Bloomfield’s notion of “a minimum free form” provides the classic definition of the ‘grammatical’ word:
A minimum free form is a word.
A word is thus a form which may be uttered alone (with meaning) but cannot be analyzed into parts that may (all of them) be uttered alone (with meaning). (Bloomfield 1926:156)
On this interpretation, the three homophonous occurrences of hit in (3.1) are different words, realizing the preterite, past participle and infinitive forms of the verb hit. This is the notion of word that is most relevant to measures of token frequency, which treat each occurrence of a form as a separate ‘token’.
(3.1) a. She hit the target. (preterite)
b. She has hit the target. (past participle)
c. She will hit the target. (infinitive)
A “minimum free form... (with meaning)” also corresponds to the unit from which Robins (1959:128) claims “grammatical statements... are more profitably abstracted... than from individual morphemes”. In most languages, the abstraction of words is facilitated by cues that enhance their perceptual salience. Open-class items are often subject to a minimum word constraint, whether measured in terms of moras, syllables or metrical feet, and there is experimental evidence that speakers exploit these constraints in the segmentation of continuous speech (Norris et al. 1997). Words often define the positions at which stress, pitch or other suprasegmental features are realized, and word edges may be marked by processes such as boundary lengthening (Bybee 2001). The perceptual salience of words is enhanced by the fact that words (unlike sub-word units such as phonemes or morphemes) may stand on their own as independent utterances. In addition, if there is any content to notions like ‘the one-word stage’ (Dromi 1987), it would appear that the word is the basic utterance during early stages of language acquisition.
As one might expect, the functional load of individual cues varies across languages, reflecting general differences in phonological systems, so that no single cue identifies words cross-linguistically. Moreover, it is often word forms that are most clearly demarcated in the speech stream by phonetic cues. Word forms in this sense are sequences of phonemes (or graphemes) without a fixed meaning or function. In (3.1), the single word form hit realizes the preterite, past participle and infinitive forms of the verb hit. The same word form realizes other forms of hit, as well as the corresponding noun in They took a hit, etc. Such simple forms do not ordinarily play a significant grammatical role, though they are central to recent frequency-based models.
Although grammatical words often correspond to single word forms, the correlation between these units may be imperfect, disrupted by phonological or syntactic processes. Cliticization of prosodically light elements creates sequences in which multiple grammatical words correspond to a single phonological word. In (3.2a), the contracted phonological word she’s corresponds to the grammatical words she and has. Separable particles illustrate a converse mismatch, in which a single grammatical word such as German wegwerfen ‘throw out’ is realized by the word forms werfen and weg in (3.2b).
(3.2) a. She’s hit the flowers.
b. Sie werfen die Werbung einfach weg.
they throw the flyers simply out
‘They simply throw the flyers out.’
As Robins (1959) acknowledges, these processes introduce discrepancies between the notion of ‘word’ relevant for the description of grammatical relations, and the sense of ‘word’ that is marked phonetically. This divergence has led some scholars to question the status and even the usefulness of the notion ‘word’ as a unit of analysis, within individual systems and across languages. Matthews (2002: 266) summarizes the collection of typological studies in Dixon and Aikhenvald (2002) by observing that they “make clear not just that criteria conflict, but that different linguists may resolve some kinds of conflict very differently”. Haspelmath (2011:70) echoes this assessment in acknowledging that “‘Words’ as language-specific units are often unproblematic... but the criteria employed in different languages are often very different”. Following a comprehensive cross-linguistic survey of prosodic domains, Schiering et al. (2010: 657) likewise conclude that “the ‘word’ has no privileged or universal status in phonology, but only emerges through frequent reference of sound patterns to a given construction type in a given language”.
Although they bring new data and methods to bear on questions regarding the status of words, these studies essentially reprise a longstanding criticism of word-based approaches. The difficulty of demarcating words is taken by Bloomfield (1914) as evidence for the primacy of the sentence:
It has long been recognized that the first and original datum of language is the sentence,— that the individual word is the product of a theoretical reflection which ought not to be taken for granted, and, further, that the grouping of derived and inflected words into paradigms, and the abstraction of roots, stems, affixes, or other formative processes, is again the result of an even more refined analysis. (Bloomfield 1914: 65)
The epistemological priority that Bloomfield assigns to utterances is particularly compatible with exemplary construction-based approaches, such as Booij (2010), as well as with a more utterance-based model of the lexicon. But the “abstraction” of units that Bloomfield mentions is entirely parallel to “abstractions” in the passage on p. 6 above, where Robins (1959:128) notes that “grammatical statements are abstractions, but they are more profitably abstracted from words as wholes than from individual morphemes”. At every level below the utterance (or even discourse), the units of a linguistic analysis are subject to exactly the same type of cost-benefit analysis.
Hence, while it is true that discrepancies arise between phonological and grammatical words, within and across languages, these discrepancies arise precisely because there are cues which, with varying degrees of reliability, mark word boundaries or otherwise guide the segmentation of utterances into words. The existence of mismatches should not obscure the fact that the two notions of ‘word’ overlap, at least partially, in many languages, and that this overlap permits speakers to isolate grammatical words. Although grammatical words may be imperfectly demarcated, sub-word units (including, significantly, roots) are rarely if ever cued at all by phonetic properties. There is no discrepancy between the ‘grammatical morpheme’ and the ‘phonological morpheme’ for the simple reason that there is no such thing as a ‘phonological morpheme’. Hence the objection that grammatical words are not reliably and invariantly cued in the speech stream provides no motivation for shifting the focus of morphological analysis onto units smaller than the word (such as stems, roots or morphemes), since these units require an even greater degree of abstraction from the speech signal. The observation that words are abstractions just falls under the broader generalization that all linguistic units smaller than utterances are abstracted from larger sequences of connected speech.
It is also worth bearing in mind the possibility that much of the debate about the alignment of grammatical and phonological words may be misconceived at a more fundamental level. Nearly all linguistic approaches to this issue frame the problem in terms of specifying reliable and cross-linguistically valid cues for demarcating grammatical words. Underlying these approaches is the assumption that there should be some set of necessary and sufficient conditions for defining words in a given language, or across languages.
From an abstractive perspective, the cross-linguistic variability of phonetic cues does not necessarily call into question the status of words as units but may instead reflect the fact that the cues serve a secondary function, reinforcing boundaries that are principally of predictive value. From this standpoint, what unites the units that ‘emerge’ in different languages is the fact that they reflect common statistical patterns. The ‘words’ of a given language will have two characteristic properties. The first is that they correspond to sequences that occur between peaks of uncertainty about following phonetic material. The second is that they reduce uncertainty about following words. Statistical units abstracted in this way from larger forms will be ‘emergent’ in essentially the sense of Schiering et al. (2010: 657). Phonetic cues will then tend to reinforce these units in ways that reflect the sound patterns of individual languages.
This perspective has mainly been developed in the large and diverse psycholinguistic and computational literature on word segmentation and recognition. This literature includes work on identifying ‘uniqueness points’ in Marslen-Wilson and Welsh (1978), Marslen-Wilson and Tyler (1980) and Balling and Baayen (2012), neural network-based predictive models (Elman 1990), and statistical models of word segmentation (Goldwater et al. 2009). The models of word segmentation developed in these studies are based on the observation that entropy (roughly, uncertainty about the segments that follow) declines as more of a word is processed, then peaks again at word boundaries. The treatment of ‘words’ as statistically emergent units of predictive value is expressed directly in the second assumption below:
Observations about predictability at word boundaries are consistent with two different kinds of assumptions about what constitutes a word: either a word is a unit that is statistically independent of other units, or it is a unit that helps to predict other units (but to a lesser degree than the beginning of a word predicts its end). (Goldwater et al. 2009:22)
From this perspective, the phonological word would correspond to phonetic sequences between entropy ‘peaks’, with phonetic cues representing one source of entropy reduction. The correlation between phonological and grammatical words could be similarly probabilistic rather than defined by discrete criteria.
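The idea that units fall between entropy ‘peaks’ can be illustrated with a deliberately small sketch. It is not the Bayesian model of Goldwater et al. (2009) or any other model cited above; it simply estimates the entropy of the next symbol after each three-symbol context in a set of unsegmented strings, and posits a boundary wherever that entropy exceeds a threshold. The vocabulary, the use of letters in place of phonetic segments, the fixed context length (`order`) and the `threshold` value are all assumptions made for the illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import product

def successor_entropy(utterances, order=3):
    """Entropy (in bits) of the next symbol after each context of `order` symbols."""
    counts = defaultdict(Counter)
    for u in utterances:
        for i in range(order, len(u)):
            counts[u[i - order:i]][u[i]] += 1
    entropy = {}
    for context, following in counts.items():
        total = sum(following.values())
        entropy[context] = -sum(
            (n / total) * math.log2(n / total) for n in following.values()
        )
    return entropy

def segment(utterance, entropy, order=3, threshold=0.5):
    """Posit a boundary wherever uncertainty about the next symbol exceeds `threshold`."""
    pieces, start = [], 0
    for i in range(order, len(utterance)):
        if entropy.get(utterance[i - order:i], 0.0) > threshold:
            pieces.append(utterance[start:i])
            start = i
    pieces.append(utterance[start:])
    return pieces

# Hypothetical toy 'language': determiner + noun + verb + determiner + noun,
# written without spaces so that no boundary is given in advance.
determiners = ["the", "one"]
nouns = ["dog", "cat", "pig"]
verbs = ["saw", "hit", "bit"]
utterances = [
    d1 + n1 + v + d2 + n2
    for d1, n1, v, d2, n2 in product(determiners, nouns, verbs, determiners, nouns)
]

entropy_by_context = successor_entropy(utterances)
segments = segment("thedogsawonecat", entropy_by_context)
```

In this toy corpus, uncertainty about the next symbol is zero inside a word (after `hed` the only continuation is `o`) and rises at the end of each word, so the thresholded boundaries coincide with the word boundaries. With less regular input the peaks would be graded rather than categorical, which is the sense in which discrete boundaries are obtained only by smoothing a continuous measure.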
On this type of approach, the absence of invariant linguistic cues does not call into question the status of the units that are cued. Rather, it is the idea that these cues are definitional that stands in need of revision. A predictive perspective also helps to account for the intractability of what Spencer (2012) terms ‘The Segmentation Problem’. Given that uncertainty is continuous, not discrete, it is only by smoothing that we obtain discrete boundaries and units. Generalizing over these units lends a measure of support to the traditional claim that words are “a more stable and solid focus of grammatical relations than the component morpheme”. But this does not necessarily determine a unique segmentation of the speech stream, least of all at the sub-word level.