The Evolution of Language

The difference between an assumption of such distinctive evolutionary steps and the perspective of extended evolution becomes strikingly clear when we turn to our first example, the problem of the origin of language and more generally of communicative structures among actors. These structures arise in collaborative situations and depend on the specific constellation, size and ensuing challenges of the relevant communities. Even before the first proto-linguistic communication systems came into being, there must have existed some regulative patterns of cooperation, such as situative action coordination mediated by visual and other material clues. At least 1.8 million years ago, at the time of Homo habilis, these regulative structures had already been shaped by a shared material culture of tool use and transmission. Communication systems, including gestures, pointing, facial expressions, pantomiming and, much later, vocalizations, initially would have only marginally supported such regulative structures without representing the full range of cooperative possibilities (Dediu and Levinson 2013). Rather, they may have begun as sporadic, domain-specific and highly context-dependent communicative interactions, which complemented other regulative structures and inherited their “meaning” from these structures.

While communication systems presuppose certain cognitive capabilities on the side of the actors, such as joint attention, they also affect the development of these capabilities by opening up an explorative space in the Vygotskyan sense discussed above (Lock 2000; Damerow 2000). This space exists precisely because communication systems constitute, like the underlying material culture, external representations of regulative structures that typically have a larger horizon of applicability than that given by their initial purpose or circumstances of application. This opens up the possibility for an iterative process of language evolution, resulting in the layered structure of the human communication system that we actually observe today, which comprises gestures, facial expressions, pointing, pantomiming and vocalizations, all as part of one integrated multi-modal system of human communication. In a path-breaking paper, Levinson and Holler (2014) have suggested that this multi-modal system is indeed the result of a superposition of evolutionary layers. But what kind of evolution could produce such a layered structure?

Tomasello refers to the famous treatment of major transitions in evolution by Maynard Smith and Szathmary (1995), pointing out that “humans have created genuine evolutionary novelties via new forms of cooperation, supported and extended by new forms of communication. ... And humans have done this twice, the second step building on the first” (Tomasello 2014, 141). For Maynard Smith and Szathmary, the major transitions involve changes in the way information is stored and processed; they list duplication and divergence, symbiosis and epigenesis as the three main possibilities. Their own evolutionary account of language, for example, involves the explanation of the genesis of grammatical structures by genetic assimilation, essentially turning cultural into biological inheritance.

Such an argument may fit well with the concept of human cognitive evolution being characterized by major evolutionary transitions, as is assumed by Tomasello (2014, 127) who claims that language “plays its role only fairly late in the process. ... Language is the capstone of uniquely human cognition and thinking, not its foundation.” Dediu and Levinson (2013), on the other hand, argue “that recognizably modern language is likely an ancient feature of our genus pre-dating at least the common ancestor of modern humans and Neanderthals about half a million years ago.” According to their view, human evolution is a more protracted and reticulated process, involving both vertical and horizontal processes of gene-cultural coevolution and leading to the multi-layered regulatory structure of the human communication system observed today.

In our view, it is precisely the combination of regulatory networks and niche construction that may account for this structure. The evolution of human communication systems is thus governed by the same dynamics that is at the forefront of an emerging new synthesis in evolutionary theory. The process must have started from some contingent ecological context that constituted an external scaffolding for human social interactions, such as conditions favoring collective foraging or hunting. These initially fragile social interactions must have involved some context-dependent signaling, for instance, by means of gestures mimicking actions. The next step would then be a gradual exploration and extension of this situative action coordination, including a discovery of new possibilities such as the ritualization and conventionalization of gestures. The point is that this exploration effectively changed the environment in which social interactions took place, creating a new niche with feedback on the action coordination itself, in particular, on the possibilities for anticipating, by means of communicative acts, the articulation of goals of actions and hence the separation of their planning and execution, as well as the division of labor.

As a consequence, actions were then performed in a new context, accompanied by an ever more extended communication system. As we have emphasized, this extension of the communication system enriched the possibilities of action coordination. The enhanced coordination may also be considered as an internalization of the contingent external conditions of social interaction, which are now turned into intrinsic properties of this interaction. In short, what may have been initially sporadic, situation-dependent signals within an originally only marginal communication process were eventually transformed into elements of a more and more self-sustaining system of communication, comprising, for instance, conventionalized gestures that are used outside of immediate action contexts. These elements hence receive their meaning not only from the contexts of action in which they are being applied but also from their role in the emerging communicative system. One immediate effect of such an internalization of external contexts is the stabilization of the originally fragile social scaffolding that may have been highly dependent on contingent external factors, for instance, specific ecological conditions.

A further consequence of the exploration of the inherent potential of an incipient communication system is the bootstrapping of developmental possibilities that can only be exploited under appropriate external conditions. This is what is designated in psychology as Vygotsky’s zone of proximal development, that is, the difference between what an actor can do spontaneously without help and what the actor can do with support from a favorable environment (Vygotsky 1978; Smith et al. 1997). This difference is familiar not only from children but also from acculturated apes brought into a human environment. In our case, however, the environment favoring ontogenetic development must, of course, be itself constructed in the course of historical evolution as a cultural niche. In agreement with arguments by Lock and Damerow (Lock 2000; Damerow 2000), we claim that in the evolution of populations, systems of external representation provide the conditions for the elaboration of implications that correspond functionally to Vygotsky’s zone of proximal development in individual learning.

Following Peter Damerow (1996), we further assume that on the level of the individual, action coordination is ruled by cognitive structures that are built up in ontogenesis through interaction with the environment and, furthermore, that these cognitive structures are shaped by the material means of actions. Indeed, the material means available in a given situation determine not only a horizon of possibilities for actions but also what is generic about an action, that is, what can be transferred from one context to another, thus shaping the cognitive structures emerging from their usage. Under this assumption, the construction of a cultural niche encompassing both material means for interacting with the environment and an external system of communication represented by bodily signals must in turn affect the cognitive structures acquired by individuals during their development.

This then is the beginning of a further step in the bootstrapping process: the relation between material means and external representations of thinking and communication, on the one hand, and cognitive structures, on the other, is, of course, not one-sided, with the actions forming the basis and cognition following suit. Newly developed cognitive structures may in turn enable an improvement of the material means of actions employed; similarly, these cognitive structures can be represented in communication processes by new kinds of external representations, thus giving rise to an iterative, albeit highly path-dependent, evolutionary process that generates cognitive structures of ever greater generality, in the sense of an increasing decontextualization.

It is quite conceivable that such an iterative evolution can indeed account for the emergence of the multimodal system of human communication of which modern language forms the capstone. It is, in any case, a characteristic feature of the process we describe here that new layers do not replace earlier ones but are rather integrated with the pre-existing layers in an ever more extended regulative architecture. Consider the fact, for instance, that our vocal language continues to be accompanied by body language. But rather than illustrating this in detail for the example of language, we now turn to much more recent periods of history to demonstrate the generality of our model of cultural evolution.

