Modelling paradigms and AI eras: cognitivist and emergentist perspectives

As briefly illustrated in the previous sections, the early days of AI were mainly characterized by the “cognitivist” assumption that intelligent activity in both living and artificial systems is made possible by the capability of encoding knowledge about the external world via “internal” abstract symbolic representations that directly correspond to elements of reality. In this setting, intelligent behaviour (e.g., in language, vision, planning, etc.) was viewed as the expression of operations carried out on such symbols, and the motto of this early phase (also known as “cognitivism”; see, e.g., Vernon, 2014) was summarized by the expression “cognition is computation”. Here, the word “computation” was intended to mean the capability of manipulating such symbolic structures. The theoretical reference framework that inspired this assumption, in both cognitive psychology and artificial intelligence, was the so-called “Physical Symbol System Hypothesis”[1] (PSSH), introduced by Newell and Simon (1976). According to this theory, intelligent beings are physical symbol systems. In this framework, symbolic representations were not only a denotational means for referring to entities of the external world but also a means for denoting other internal symbolic structures (thus making it possible to hypothesise an internal information processing mechanism able to overcome the classical direct Input-Output mapping assumed by the behaviourist tradition[2]). In this view, symbolic systems are assumed to be realizable by means of different “hardware” (e.g., a Von Neumann architecture or a natural brain[3]), and symbolic processing is considered a necessary and sufficient condition for intelligent behaviour. In particular, the apparatus of such a hypothesis assumes that an intelligent agent should be equipped with the following elements (Newell, 1990):

  • Memory Systems (to contain the symbolic information)
  • Symbols (to provide a pattern to match or index other symbols)
  • Operations (to manipulate symbols)
  • Interpretations (to allow symbols to specify operations)
  • Symbolic Capacities for



With respect to what was mentioned earlier about the “symbolic paradigm”, some additional clarifications are needed to fully grasp both the “Symbols” and the “Compositionality” requirements identified by Newell in the above-mentioned list.

12 Cognitive science and Al

As far as the “symbols” are concerned, the PSSH assumes, as mentioned, that such abstract structures can refer to and be combined with other internal symbols and processes (as is more clearly evident in Figure 1.1 below).

This possibility is important in light of the “compositionality” requirement. Compositionality is an important feature of symbolic systems and is also considered an irrevocable trait of human cognition. In a compositional system of representation, it is possible to distinguish between a set of primitive, or atomic, symbols and a set of complex symbols. Complex symbols are generated from primitive symbols through the application of suitable recursive syntactic rules: generally, a potentially infinite set of complex symbols can be generated from a finite set of primitive symbols. The meaning of complex symbols can be determined starting from the meaning of primitive symbols, using recursive semantic rules that work in parallel with syntactic composition rules. In the context of classical cognitive science, it is often assumed that mental representations are indeed compositional. A clear and explicit formulation of this assumption was proposed by Fodor and Pylyshyn (1988). They claim that the compositionality of mental representations is mandatory to explain fundamental cognitive phenomena (i.e., the generative and systematic character of human cognition), and they also argue that the contrasting neural, distributed representations encoded in artificial neural networks are not compositional.[4]
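The idea of syntactic and semantic rules working in parallel can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration (the toy lexicon, the `Atom` and `Plus` classes, and the choice of addition as the only composition rule); it is not a formalism taken from the text.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical lexicon: the meanings of the primitive (atomic) symbols.
LEXICON = {"one": 1, "two": 2, "three": 3}


@dataclass(frozen=True)
class Atom:
    name: str  # a primitive symbol


@dataclass(frozen=True)
class Plus:
    left: "Expr"   # complex symbols are built recursively
    right: "Expr"  # from other (atomic or complex) symbols


Expr = Union[Atom, Plus]


def render(expr: Expr) -> str:
    """Syntactic rule: spell out a complex symbol from its parts."""
    if isinstance(expr, Atom):
        return expr.name
    return f"({render(expr.left)} plus {render(expr.right)})"


def meaning(expr: Expr) -> int:
    """Semantic rule: mirrors the syntactic rule, one case per constructor."""
    if isinstance(expr, Atom):
        return LEXICON[expr.name]
    return meaning(expr.left) + meaning(expr.right)


expr = Plus(Atom("one"), Plus(Atom("two"), Atom("three")))
print(render(expr))   # (one plus (two plus three))
print(meaning(expr))  # 6
```

With only three atoms and one rule, arbitrarily deep expressions can be built, and the meaning of each is fixed by the meanings of its parts.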

FIGURE 1.1 Overview of the internal dynamics of physical symbol systems (adapted from Vernon, 2014).

Given this state of affairs, solving a problem for a physical symbol system means being able to perform a heuristic search within a problem space represented by symbolic structures. Here, in fact, intelligent behaviour is assumed to emerge by generating and progressively modifying symbol structures until a solution structure (e.g., a goal) is reached. This overall assumption is known as the Heuristic Search Hypothesis[5] and, as is probably evident to readers, some of the above-mentioned early systems like GPS (as well as formalisms like Semantic Networks and, as we will see, SOAR) are heavily built upon the PSSH and its Heuristic Search corollary.
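As a rough sketch of what "generating and progressively modifying symbol structures until a solution structure is reached" can look like, the following greedy best-first search reorders a string of symbols by swapping neighbours. The toy problem, the function names, and the misplaced-symbol heuristic are all assumptions chosen for illustration; this is not the search procedure of GPS or of any specific system mentioned in the text.

```python
import heapq


def heuristic_search(start, is_goal, successors, heuristic):
    """Greedy best-first search: always expand the most promising structure."""
    # Each entry: (heuristic value, tie-breaker, state, path so far).
    frontier = [(heuristic(start), 0, start, [start])]
    seen = {start}
    counter = 1
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(
                    frontier, (heuristic(nxt), counter, nxt, path + [nxt])
                )
                counter += 1
    return None  # the problem space was exhausted without reaching the goal


GOAL = "abc"


def swap_neighbours(state):
    """Operators: every structure obtained by swapping two adjacent symbols."""
    for i in range(len(state) - 1):
        chars = list(state)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        yield "".join(chars)


def misplaced(state):
    """Heuristic: number of symbols not yet in their goal position."""
    return sum(a != b for a, b in zip(state, GOAL))


path = heuristic_search("cba", lambda s: s == GOAL, swap_neighbours, misplaced)
print(" -> ".join(path))
```

The heuristic only guides which structure is expanded next; a greedy search of this kind finds a solution path when one exists in a finite space, but the path is not guaranteed to be the shortest.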

Parallel to these “symbolic” developments, a radically different modelling approach based on neuron-like “subsymbolic” or “connectionist” computations (e.g., Grossberg, 1976; McClelland, 2010) was being explored. Proponents of this approach (one of the most successful in the so-called “emergentist” field[6]) maintain that many classic types of structured knowledge, such as graphs, grammars, rules, objects, structural descriptions, programs, etc., can be useful yet misleading metaphors for characterizing “thought” in both natural and artificial systems. In particular, these structures are seen as epiphenomenal rather than real, emergent properties of more fundamental sub-symbolic cognitive processes (McClelland, 2010) (Figure 1.1).

In general, in contrast to the symbolic paradigm, the knowledge in these neural networks is distributed across a collection of units rather than localized as in symbolic data structures. The central idea of such models, in fact, is that a large number of simple computational units can achieve intelligent behaviour when networked together. This insight applies equally to neurons in biological nervous systems and to hidden units in computational models. The representations and algorithms used by this approach, therefore, were (and are) more directly inspired by neuroscience rather than psychology. As a consequence, differing from the PSSH, in this modelling framework (and in general all the so-called emergentist modelling frameworks) the “physical hardware” (e.g., the body) instantiating the actual computation is assumed to play an important role.

From a historical perspective, the connectionist movement took inspiration from the functional models of nervous cells introduced in the pioneering work by Warren McCulloch and Walter Pitts (developed during the pre-cybernetic period and heavily influencing cybernetic research), showing how every “net” of formal neurons, if furnished with a tape and suitable input, output, and scanning systems, is equivalent to a Turing machine[7] (McCulloch & Pitts, 1943). Such initial insights were later enriched by research from Donald Hebb (Hebb, 1948) about the learning processes in the nervous system,[8] and further studies of learning and classification processes in networks, à la McCulloch and Pitts, led to the development of the first artificial neural network (ANN), known as the Perceptron (developed by Rosenblatt in 1958).[9]
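To make the notion of an error-correcting threshold unit concrete, here is a minimal sketch of a single-unit perceptron with the classic perceptron learning rule. It is a simplification, not Rosenblatt's original layered design, and the AND and XOR datasets are standard textbook stand-ins (an assumption here, not examples from the text) for a task such a unit can learn and one it cannot.

```python
def predict(weights, bias, x):
    """Threshold unit: fire (1) if the weighted input sum exceeds zero."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: nudge the weights by the error on each sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def accuracy(samples, weights, bias):
    hits = sum(predict(weights, bias, x) == t for x, t in samples)
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in (("AND", AND), ("XOR", XOR)):
    w, b = train_perceptron(data)
    print(name, accuracy(data, w, b))
```

AND is linearly separable, so the rule settles on weights that classify every sample correctly. XOR is not linearly separable, so no single threshold unit can reach full accuracy on it, however long it is trained.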

After these pioneering works, during the 1960s, research on neural nets seemed to take a step back once a notorious book by Minsky and Papert (1969) showed the limitations of the then-existent Perceptron in discriminating very simple visual stimuli. Despite such limitations, however, various researchers continued to work on this framework, and the “renascence of neural nets” that took place in the 1980s happened in ground that was still fertile. Nevertheless, “this renascence was marked by at least two crucial events, accompanied by the development in those years of computers with great computing power, allowing them to simulate neural nets of increasing complexity” (Cordeschi, 2002: 213). In particular, in 1982, John Hopfield proved that symmetrical neural nets necessarily evolve towards steady states (then interpreted as attractors in dynamical systems theory) and that they can function as associative memories (Hopfield, 1982). In 1985, James McClelland, David Rumelhart, and their collaborators introduced the approach known as parallel distributed processing (PDP) of information, starting a number of investigations on natural language acquisition that emphasized the role of artificial neural networks and of parallel computation in the study of cognitive phenomena. They showed how a learning algorithm based on error correction, known as “backpropagation”,[25] made it possible to overcome the main limitations of neural nets described by Minsky and Papert (Rumelhart, McClelland, & the PDP Research Group, 1986).[26] Back then, the achieved results had a strong echo since they were also considered the first example countering the predominant (in both linguistics and AI) Chomskian view of language processing, which took its cue from the book “Syntactic Structures” (Chomsky, 1957), declaring the primacy of syntax and grammars.
Since these pioneering works, connectionist systems have been widely adopted in a variety of applications in both the cognitive modelling and AI communities. Connectionist systems (and emergent systems in general) have been important in the AI landscape since they have provided more suitable solutions (with respect to the symbolic approach) for dealing with the environment and with the processing of the perceptual aspects of sensory input. In particular, they have fought the tendency of (early) symbolic AI to consider perceptual systems, motor systems, and high-level cognitive functions in isolation.[27] Instead, they have targeted the close interaction between the “mind” (natural or artificial), the body (i.e., the “hardware”), and the environment.[28] This has led, in some cases, to radical assumptions that have proposed the complete elimination of the notion of “representation” (intended in the cognitivist/symbolic sense) from the vocabulary of the cognitive and artificial sciences. This movement was led by the roboticist Rodney Brooks through the proposal of the so-called “Subsumption Architecture” (Brooks, 1986, 1991). This proposal consists of a layered, decentralized, robotic control architecture that does not make any use of internal representation of the world (i.e., the motto of this view is “use the world as a model”), where the relevant parts of the control system interact and activate each other through sensing the world.

  • [25] The backpropagation rule intervenes to change the weights of the connections between the hidden units, going backward from the error, which is calculated at the output units. Rosenblatt had anticipated the formulation of various aspects of this rule that, however, was fully formalised by Geoff Hinton, winner of the Turing Award in 2019 for, among other things, the invention of the backpropagation algorithm.
  • [26] The work and its assumptions were not free from criticisms. See, for example, Pinker and Prince (1988) and the subsequent debate that dominated the late 1980s and 1990s.
  • [27] This tendency was, in a later period, contrasted also within the cognitivist/symbolic approach by Allen Newell. While Simon, in fact, continued his development of “microtheories” or “middle-range” theories (see Cordeschi, 2002) by focusing on the refinement of the analysis of verbal protocol, Newell didn’t consider the construction of single simulative microtheories a sufficient means to enable the generalisation of “unifying” theories of cognition (the original goal of Information Processing Psychology). Therefore, diverging from Simon, he proposed building simulative programs independent from single cognitive tasks and able to include invariant structures of human cognition. In this way, he started the enterprise of studying and developing integrated and multi-tasking intelligence via cognitive architectures that would lead to the development of the SOAR system.
  • [28] It is worth noticing, however, that in the classic “cognitivist” tradition as well, the importance of the environment in the deployment of intelligent behaviour was somehow recognised. Herbert Simon, in fact, in his lecture series on “The sciences of the artificial” (later published as a famous book with the same title), introduced the so-called “Ant metaphor”, which would later come to be known as “Simon’s Ant metaphor” and which can be described as follows: “An ant, viewed as a behaving system, is quite simple. The apparent complexity of its behaviour over time is largely a reflection of the complexity of the environment in which it finds itself”. Simon then applies this consideration to human beings by suggesting that the apparent complexity of human behaviour is also largely a reflection of the complexity of the environment in which we live. Therefore he suggests that the environment should play an important role in building simulative models of cognition since “the behaviour takes on the shape of the task environment”. Despite these relevant insights, however, early AI systems assuming the PSSH did not succeed in integrating such aspects in their models and were severely criticized by proponents of the [10]

Subsumption architecture has been very influential from an engineering point of view since a vast variety of effective, implemented robotic systems use it.[11] It is based on the so-called “creature hypothesis”, according to which the most important part in the design of an intelligent artificial system can be reduced to the difficulty of building a machine that acts as smart as an insect. In other words, the underlying assumption of such a hypothesis is that once the perceptual/reactive part of a “creature” (natural or artificial) is built, then building the rest of the intelligence features is an easy task to achieve. Figure 1.2 below shows the characteristics of this kind of architecture. Each layer, programmed by using finite state machines of problem solving, was assumed to deal with a specific task (e.g., the task of avoiding obstacles, wandering, seeking, etc.), and higher levels of the hierarchy subsume the actions of lower levels. The design of successive task-achieving layers is stopped once the overall desired task is achieved (Figure 1.2).
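The layering idea can be sketched in a few lines, under strong simplifying assumptions. Each layer below is a plain function rather than a finite state machine, the sensor keys and action names are hypothetical, and the arbitration (each layer sees the action proposed by the layers beneath it and may override it) is only a loose stand-in for the suppression and inhibition wiring of Brooks' actual design.

```python
def avoid(sensors, lower_action):
    """Level 0: react to nearby obstacles; otherwise do nothing."""
    return "turn_away" if sensors.get("obstacle_near") else "halt"


def wander(sensors, lower_action):
    """Level 1: subsume an idle lower level by moving around."""
    return "wander" if lower_action == "halt" else lower_action


def seek(sensors, lower_action):
    """Level 2: head for a visible target unless an obstacle is being avoided."""
    if sensors.get("target_visible") and lower_action != "turn_away":
        return "approach_target"
    return lower_action


LAYERS = [avoid, wander, seek]  # ordered from lowest to highest level


def control_step(sensors, layers=LAYERS):
    """Run the layers bottom-up; no world model is stored between steps."""
    action = None
    for layer in layers:
        action = layer(sensors, action)
    return action


print(control_step({}))                         # wander
print(control_step({"target_visible": True}))   # approach_target
print(control_step({"obstacle_near": True, "target_visible": True}))  # turn_away
```

Passing a shorter list, such as `LAYERS[:1]`, yields a working but less capable controller, which mirrors the incremental way task-achieving layers are added until the desired overall behaviour is obtained.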

Such a radical proposal, however, has also shown significant limitations. In fact, even though these systems led, through the development of innovative architectures for decentralized action control, to the ability to act in non-structured environments in real time, they nevertheless showed their limitations when asked to deal with higher-level cognitive tasks, such as planning, reasoning, multi-agent coordination, and so on. Such tasks, on the other hand, were dealt with in a more satisfactory way via the symbolic approach, thus suggesting the practical utility of the notion of “representation”.

FIGURE 1.2 Brooks’ Subsumption Architecture (adapted from Brooks, 1999).

The classical move, in this case, was the adoption of hybrid approaches trying to connect low-level and high-level faculties by integrating neural and symbolic approaches. Investigations of the integration between “symbolic” and “subsymbolic” approaches in AI have been pursued during recent decades but, despite the realization of many hybrid systems, a general solution to the problem of the ad hoc integration of such heterogeneous components does not yet exist. In particular, connectionist models have continued to achieve the best results in handling activities like pattern recognition and classification or associative learning. They have failed, however, in handling higher cognitive functions, like complex inference-based reasoning, which are better modelled by symbolic approaches.[12] A well-known problem of these connectionist representations, for example, concerns the difficulty of implementing compositionality in neural networks (Fodor & Pylyshyn, 1988). Finally, another classical problem of artificial neural networks is represented by their “opacity”: a neural network behaves as a sort of “black box”, and a specific interpretation of the operation of its units and weights is far from trivial. Despite such foundational problems, today neural networks are used in a variety of fields that range from machine vision to natural language processing to autonomous cars, due to the success of the new generation of deep learning architectures. On the other hand, symbolic approaches also suffer from a number of problems other than the above-mentioned ones of dealing with commonsense reasoning and commonsense compositionality; these range from the “frame problem” (McCarthy & Hayes, 1969) to the “symbol grounding” one (Harnad, 1990). In short, the frame problem consists of a difficulty in formally representing, in logic-based representational languages, changes in an environment in which an agent (e.g., a robot) has to solve some tasks without having to explicitly resort to an enormous number of axioms to also exclude a number of non-effects that are intuitively obvious for humans.
For example, if a robot places a cup on a table, it is necessary not only to specify that the cup is now on the table, but also that the light remains on, that the table is still in the same place, that the robot is still in the same room, etc. The symbol grounding issue, on the other hand, concerns the problem of how to ground symbolic representations in the corresponding entities that they denote in the external world. This is notoriously hard for symbolic systems and is alleviated in connectionist systems, since the data they directly take as input (e.g., images, signals, etc.) are closer to the perceptual “real world” sensory data. Summing up: the last 65 years of applied research have shown that the two main modelling approaches developed in the cognitive modelling and AI communities have different strengths and limitations. In any case, neither is able to account, if considered in isolation, for all aspects of cognitive faculties.

  • [1] As reported in Cordeschi (2002), Minsky emphasized that these two tendencies were distinguished “in methods and goals” from a third tendency, which “has a physiological orientation and alleges to be based on an imitation of the brain,” i.e., neural net and self-organizing system approaches. We will discuss later in this book the “neural” or brain-inspired methods in early (and modern) AI research. As anticipated, such approaches belong to the so-called “connectionist agenda”.
  • [2] Behaviourism (in this context we are referring to so-called “methodological behaviourism”, which is different from “philosophical behaviourism”) is a methodological approach to the study of behaviour in natural systems, born at the beginning of last century, and based on the observable analysis of the responses (e.g., the produced output) to certain stimuli (the input) manipulated via different types of reinforcement (this is also known as “operant conditioning”). Watson (1913), one of the founders of this approach, defined psychology as “a purely objective experimental branch of natural science” and its program as the “prediction and control of behavior”. As a consequence of this radical view, behaviourists did not consider/analyze the internal mechanisms driving a given behaviour (provided certain stimuli). The now-famous experiments done by the Russian physiologist and Nobel Prize winner Ivan Pavlov about the conditioned reflex of dogs and their automatic stimulus-response behaviour (where the stimulus was constituted by a “ringing bell” that the dogs had learned to associate with the arrival of food, and the response by the salivation caused by the bell ringing) were an important landmark in this tradition (as was his other work about so-called “classical conditioning”). This approach was severely criticised by the cognitivist tradition in psychology, the “computationalist” view in the philosophy of mind, and Information Processing Psychology, which, on the other hand, assumed the presence of internal information processing mechanisms as driving forces leading to a manifest behaviour.
  • [3] This claimed “interchangeability” means that, in this framework, the physical instantiation (i.e., the “hardware”) per se is not important since the intelligent behaviour emerging via symbol manipulation is assumed to be independent of the particular form of the instantiation.
  • [4] It is worth noting that, while standard compositionality is easily handled by symbolic systems, “commonsense compositionality” (i.e., one involving typicality-based reasoning à la Rosch) has always been a problematic aspect to model. This problem is paradigmatically represented by the so-called PET FISH problem: if we consider this concept, in its prototypical characterisation, as the result of the composition of the prototypical representations of the concepts “PET” and “FISH”, we soon realise that the prototype of pet fish cannot result from the composition of the “PET” and “FISH”. A typical pet, indeed, is furry and warm, a typical fish is greyish, but a typical pet fish is neither furry and warm nor greyish (typically, it is red). The pet fish phenomenon is a classic example of the difficulty faced when building formalisms and systems aiming at imitating this compositional human ability. Nowadays, a proposal to deal with the
  • [5] The Heuristic Search Hypothesis has been very influential in AI since many algorithms (e.g., from “hill climbing” to “beam search” to the well-known A* algorithm) that improve the efficiency of finding optimal or suboptimal paths in problems represented as a graph-like structure have been developed by starting from this hypothesis. For an introduction to these classical algorithms, we refer the reader to introductory books on AI (see, e.g., Russell & Norvig, 2002). One of the first successful and convincing implementations of such “search-based” approaches (e.g., the A* algorithm) was in the robot Shakey, developed in 1966 by Nilsson and colleagues (see Hart et al., 1968; Nilsson, 1971; Fikes et al., 1972).
  • [6] The expression “emergentist approaches” is determined by the fact that the class of modelling frameworks of this tradition assume that the information to be processed is learned from the environment in a bottom-up way and intelligent behaviour (if any) is assumed to be an emergent property coming from this interaction. Within emergentist frameworks we can include dynamical systems (using differential equations to model the dynamic of a system and its change over time, caused by the interaction with the environment) and enactive approaches (usually employing both connectionist and dynamical frameworks and assuming embodied agents). We refer to Vernon (2014, Chapter 2) for an introduction to such frameworks.
  • [7] In 1936, Turing introduced the abstract computing machine bearing his name and explicitly constructed a universal machine that could simulate, with appropriate encoding, any computation carried out by any Turing machine (including, of course, the universal one) (Turing, 1936-37).
  • [8] Roughly speaking, so-called “Hebbian learning” consists of the evidence that when the axon of a given cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, this kind of associative connection A→B leaves a trace in the nervous system that learns this simple associative rule.
  • [9] The Perceptron was one of the first neural network architectures. This simple form of neural network consists of a first layer, corresponding to the sensory system (an analog for a retina), which is randomly connected to one or more elements in a second layer of nodes: the association system. The latter consists of association cells, or A-units, whose output is a function of the input signal.
  • [10] “emergentist” paradigms. Emergentist modelling approaches, in fact, have proven to be more efficacious in modelling the environment and its intervention in the emergence of intelligent behaviour.
  • [11] The first implementation of such an architecture was executed in robots like “Allen” and “Herbert”, developed by Brooks and his group at MIT in the late 1980s. In particular, Herbert, a soda-can collecting robot, was able to exhibit the following capabilities (uncommon at that time): moving around in a real environment without running into obstacles; detecting soda cans using a camera and a laser; using an arm that could extend, sense, and evaluate whether or not to pick up the soda can, etc. Nowadays, Subsumption Architecture is employed in the most successful robotic platform so far: the Roomba robot!
  • [12] The novel generation of connectionist models based on deep learning have also recently gained attention for the results obtained in tasks like automated machine translation (Jean et al., 2015). However, this success in language-based tasks seems to be mainly obtained because that task has been treated as a machine vision task, where the structure (i.e., the patterns) of a source language had to be mapped and compared with the one of a target language. Despite these new achievements, deep learning language models still provide poor results, compared to other approaches, in high-level cognitive tasks, ranging from Question Answering and Narrative/Story Comprehension to Commonsense Reasoning.