The way forward

In the next few pages I will provide some examples of topics where a cognitive approach to the science of the artificial could benefit both the AI and the Cognitive Science communities. I will also provide some pointers to initiatives in this direction that are already in place in the international scientific community. A preliminary general analysis concerning the cognitive design approach (CDA) necessarily involves the individuation of a class of problems where CDA can prove its efficacy in building better machines and better simulative models of cognition. In this respect, evidence from AI research suggests that a place to look is represented by all those tasks that are particularly easy for humans to perform but that, on the other hand, are difficult to model in artificial systems. The list of these tasks is transversal and encompasses different research topics within the AI and Cognitive Science communities. A non-exhaustive list would certainly contain, for example, the following: machine learning (in particular, the field of unsupervised learning), transfer learning, commonsense reasoning (concerning physical, spatial, and action-oriented environments), analogical reasoning, computational creativity, story and narrative understanding, multimodal integration, emotion modelling, dialogue and conversation management in unrestricted settings, decision explainability, and social cognition abilities relying on notions such as trust and theory of mind (intended as the capability of understanding other agents’ purposes, intentions, and goals).

Let us now zoom in on some of these activities by pointing out how the cognitive design approach can be of help. A primary field of interest concerns the area of cognitively inspired machine learning, which encompasses problems and issues ranging from learning from few or no examples (e.g., few-shot, one-shot, or zero-shot learning) to multimodal integration and transfer learning. From this point of view, among the main characteristics of the learning capability in humans are the relative ease of learning new knowledge through a limited number of examples and, in addition, the capability of generalizing and mapping what was learnt to other domains. A particularly interesting point of contact between machine learning and the results from experimental research in cognitive science, in particular from developmental psychology, was suggested in a recent paper by Zaadnoordijk, Besold, and Cusack (2020). Here, the authors propose that, in order to improve the currently poor performance of systems dealing with unsupervised machine learning (which, as mentioned, is the type of learning obtained with no human supervision), AI engineers and computer scientists should look at how babies acquire their spatial, language, and motor cognitive capabilities, so as to endow such AI systems with the possibility of manifesting some forms of ontogenetic development. This general idea is, of course, not new and is well rooted in developmental approaches to AI. However, these authors specifically identify five main elements of learning in babies (in particular, babies within the first 12 months of life) that could be used as a possibly fruitful focus for machine learning research. Such factors, indeed, are assumed to be the crucial elements enabling infants’ quality and speed of learning. The first identified factor concerns the fact that babies’ information processing is guided and constrained from birth by their (biological) neural architecture.
This means that, similarly, machine learning researchers should find the starting conditions (i.e., the initial training setup) that determine the developmental learning capabilities in different machine learning architectures. The second factor concerns the fact that babies are able to learn statistical relations across diverse, multimodal inputs. Multimodal approaches have already been successfully pursued in machine learning for decades. However, to date, these successes have not caused a widespread shift from unimodal to multimodal training of Deep Neural Networks (DNNs). The third factor is that infants are able to learn in a cumulative way over time while, on the other hand, neural networks (in particular, deep neural networks used in deep learning applications) suffer from the problem known as catastrophic interference: a process where new knowledge overwrites, rather than integrates with, previous knowledge. Although some solutions have been proposed to alleviate this problem, the general issue of catastrophic interference remains unsolved. This aspect represents a crucial element that needs to be addressed in current AI research, and insights coming from the way in which babies scaffold learned knowledge over time could provide valuable technical guidance. The fourth factor identified is that learning in infants does not happen in a passive way and, similarly, more interdisciplinary efforts should be made in the context of so-called “active learning” and “curiosity-driven learning”. Finally, the fifth factor concerns the fact that learning in babies is also the result of interactions with other agents. This fact should represent a call for doing machine learning research “out of the vacuum” and in more ecological settings; i.e., machine learning systems should be studied within a societal context of interacting agents (which can be humans or other artificial systems).
This aspect has also recently been considered within the “cognitive AI” community with the proposal of the so-called Interactive Task Learning (Laird et al., 2017). This task consists of allowing an artificial system to learn the underlying elements of a task (rather than how to optimally solve it) via natural interactions with humans, who act as instructors in a typical student-teacher situation. The overall idea of this relatively new field of investigation is that the task to be learnt can be described (via natural language or physical activity, e.g., by pointing to objects in an environment) such that its overall purpose, the appropriate “moves”, and its goal and termination conditions can be internalized by an artificial system. The “student” (i.e., the artificial system, in our case) can ask specific questions to the instructor (e.g., a human) to verify that the internal representation of the task that it is building is actually the correct one. Finally, once the system is able to understand the essence of the task, it can learn how to execute it well. This example goes exactly in the direction suggested by Zaadnoordijk, Besold, and Cusack (2020) and would require a robust campaign of experiments to show the limitations of the current systems in this ecological setting.
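
The catastrophic interference discussed in connection with the third factor can be made concrete with a deliberately tiny example. The following Python sketch is an illustrative assumption and is not taken from any of the cited works: a two-weight linear model is trained by plain gradient descent first on a hypothetical task A and then on a hypothetical task B. A single weight vector, (2, -2), solves both tasks, yet sequential training moves the weights away from what was learnt for A, whereas interleaving the two tasks recovers the joint solution. All names (TASK_A, TASK_B, train, squared_error) are invented for the illustration.

```python
# Toy illustration of catastrophic interference (hypothetical setup).
# A linear model y = w . x is trained with plain gradient descent.

def predict(w, x):
    """Linear prediction: dot product of weights and inputs."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train(w, examples, steps=2000, lr=0.1):
    """Return new weights after `steps` passes over `examples`."""
    w = list(w)
    for _ in range(steps):
        for x, y in examples:
            err = predict(w, x) - y
            # Gradient step on the squared error of this single example.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def squared_error(w, examples):
    """Sum of squared prediction errors over a set of examples."""
    return sum((predict(w, x) - y) ** 2 for x, y in examples)


# Two hypothetical tasks sharing the same two weights.
# Both are solved exactly by w = (2, -2), so they are compatible.
TASK_A = [((1.0, 0.5), 1.0)]
TASK_B = [((0.5, 1.0), -1.0)]

# Sequential training: learn A, then continue training on B only.
w_after_a = train([0.0, 0.0], TASK_A)
w_sequential = train(w_after_a, TASK_B)

# Interleaved training: both tasks are seen in every pass.
w_interleaved = train([0.0, 0.0], TASK_A + TASK_B)

print("error on A right after learning A:", squared_error(w_after_a, TASK_A))
print("error on A after then learning B: ", squared_error(w_sequential, TASK_A))
print("error on A with interleaving:     ", squared_error(w_interleaved, TASK_A))
```

In this sketch the error on task A is close to zero right after learning A, becomes large once the model has been trained on B alone, and stays close to zero when the two tasks are interleaved. The point is only qualitative: the overwriting arises from sequential training on shared weights, not from any incompatibility between the two tasks.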

If we switch our attention to a different area, we can explore a similar discourse. For example, let us consider the context of story and narrative understanding. Since the beginning of early AI research, cognitive scientists like Schank and Abelson (with the introduction of the “script” notion) unveiled the role of story-centred data structures as building blocks for the realization of dialogue and narrative understanding systems. Even today, cognitive science research plays (and has played) a major role in the development of one of the most advanced narrative AI technologies: the Genesis system developed by Patrick Henry Winston and his collaborators at MIT (Winston, 2012a, 2014). This system analyzes stories in textual formats and is able to provide short summaries with a remarkable level of synthesis. The system implements the Strong Story Hypothesis: a theoretical view that, according to Winston, explains why humans are smarter than other primates by hypothesizing that story understanding (and storytelling) plays a central role in human intelligence (Winston, 2011, 2012b). At the heart of the story-understanding mechanisms of Genesis are a few commonsense rules that are able to extract explicit representations of events from texts and materialize some implicit connections within a plot. This area of research represents a quite straightforward example of the benefits of collaboration between AI and Cognitive Science scholars (Winston, who passed away in 2019, was one of the rare scholars able to do research in both fields). This is an example that should be encouraged and reinforced to improve current AI technologies.[1] By mentioning Winston’s research, we have also touched on two other important topics that would still benefit from a cognitively inspired approach: the research area on commonsense reasoning and the one on natural language understanding.
Concerning the former: we have already seen (in Chapter 2) that this is a very old problem within the AI agenda. Many logic-oriented approaches (ranging from fuzzy logic to default logics to all the different proposals of non-monotonic logics) have mostly failed to provide a suitable solution, due to the intrinsic computational complexity of their reasoning procedures (ranging from undecidability to intractability, according to the different versions of the proposed formalisms). Only recently have we witnessed the development, under certain strict specific assumptions, of non-monotonic reasoning frameworks able to reach the same complexity as standard deductive reasoning procedures (e.g., see Giordano et al., 2020, for a review) or a polynomial time complexity (see Bonatti et al., 2015). However, commonsense reasoning is a truly ubiquitous task in humans, an everyday activity, and these achievements are not sufficient to account for its linear and real-time execution. In other words, by using the previously introduced “dual process” terminology proposed by Kahneman, commonsense reasoning processes are mostly Type 1 processes since they operate in a very fast way. On the other hand, most of the logics proposed to handle this phenomenon have a complexity that is higher than that of standard deductive processes (i.e., those belonging to Type 2 processes). The poor performance of current AI systems in commonsense reasoning (involving not just natural language or question answering but also systems dealing with temporal, physical, and visual reasoning tasks) has also been recently pointed out in a detailed review by Davis and Marcus (2015). Therefore, it is striking to see how, despite this being a well-known problem since the early days of AI, we have not seen much progress on this topic. It is also clear, however, that being able to produce human-level solutions for this challenge represents a crucial research goal for AI researchers to achieve.
The complexity of this task also calls for a strong interdisciplinary effort. The advantages are quite evident: Cognitive Science can provide AI with knowledge about heuristics to implement in commonsense reasoning systems, while cognitively inspired AI systems can provide insights about the dynamics (which are not visible through neuroscience experiments nor psychological ones) of the interaction between the different commonsense heuristics put in place in a machine, thus providing a computational ground to test theories and identify possible theoretical “bugs” to fix (as the case of the DUAL PECCS system shows). Commonsense reasoning also represents one of the main problems of current natural language processing (NLP) systems. Let us consider, for our purposes, the most recent and advanced technology in the field: GPT-3, announced in May 2020 by researchers from OpenAI (Brown et al., 2020). GPT-3 is a new deep-learning model (trained on text scraped from the web) with around 175 billion parameters. The model achieves state-of-the-art performance across a variety of NLP tasks. Training the model, however, is so expensive that it could not be retrained to fix a possible bug (retraining on half a trillion words costs, according to one estimate, $12 million). The neural network model, indeed, is so large that it cannot easily be moved off the cluster of machines it was trained on (and this leads to a problem that we could call “gigantic immobility”). Now, these simple data alone can give an idea of how far a model of this type is from the considerations that we have raised above. In particular, a language model of this type cannot learn new knowledge and be updated, does not develop itself over time, is completely ungrounded in any kind of biological or cognitively inspired constraint, and, ultimately, is galaxies away from understanding anything and from being comparable to an infant’s ability to learn and grasp the foundations of language.
The incredible performance shown by the GPT-3 model in NLP tasks is, indeed, obtained thanks to the enormous computational power that its use demands. The cost of handling such an unconstrained and gigantic model, however, is unsustainable in all possible ways (e.g., both economically and environmentally, due to the high energy consumption of such a system). In addition, as with its predecessors, the model seems to fall short in dealing with the foundational issues of natural language understanding and, as in the cases analyzed previously in the book, the super-human performance is accompanied by subhuman-like errors in tasks of commonsense and analogical reasoning. In this respect, a caveat must be mentioned: the experiments in question are only preliminary and unsystematic[2] (and, as such, need to be scientifically and extensively validated in order to have scientific relevance). Nonetheless, they seem to confirm the classical limitations of this type of system. In particular, Davis and Marcus report - at this link: davise/papers/GPT3CompleteTests.html - a whole set of examples used in their experiments with GPT-3 on commonsense reasoning involving different kinds of abilities,[3] while Melanie Mitchell has described in a Medium article[4] her experiments on analogical reasoning by comparing GPT-3 to the well-known system CopyCat, developed by Douglas Hofstadter and colleagues (including Mitchell) at MIT in the 1980s and 1990s (Hofstadter and Mitchell, 1995). Of course, as mentioned, these preliminary and unsystematic experiments cannot be considered scientific evidence of the inability of GPT-3 to deal with commonsense or analogical reasoning (also because in many cases the reported answers are correct); however, the types of errors produced reveal, in nuce, some classical misalignment between functional AI systems and human performance.
Such initial hints, however, need to be confirmed or disproved by extensive and systematic evaluation campaigns. A fact that certainly emerges and doesn’t need any additional analysis, however, concerns the complete lack of “transparency” in this model as to why certain errors are made or answers provided. Previously (in Chapter 1) we called this the “opacity” problem and we have already mentioned how this is a foundational element that affects connectionist systems, in general, and that explodes with deep learning architectures. The call for transparency and accountability of algorithmic procedures is, however, not only an important desideratum for human-machine interaction. In certain cases, e.g., in Europe, it is now also a mandatory requirement, requested by the novel legislation on privacy known by the acronym “GDPR” (General Data Protection Regulation) that has been in effect since 2018. GDPR, indeed, explicitly calls for the “right to an explanation” for the decisions made by automatic systems. This has led to the development of a brand new field called Explainable AI (or XAI, see footnote 4 in Chapter 3).[5] However, the problem here is: what kind of explanations do humans need in order to make intelligible the underlying decision strategies used by artificial systems? Again, Cognitive Science can help in this respect. The first hint coming from this field, indeed, is that humans look for causal explanations (i.e., a very restrictive type of mechanistic explanation, see Chapter 3). The importance of causality in human reasoning (and in AI) has been championed by Turing Award winner Judea Pearl (Pearl, 2009; Pearl and Mackenzie, 2018). The extraction of causal cues from systems like deep neural networks, however, is far from being solved, as Pearl warns.
Apart from the extraction of causal cues, however, there are other cognitive traits that should be taken into account in XAI systems (at least in those that revolve around final users as their direct targets). For example, humans look for selected explanations: due to our limitations, we do not want to be presented with the whole chain of causal connections explaining a given algorithmic decision (since this would require, in certain cases, managing too many unintelligible variables). We want a synthetic explanation that gets to the core of the causal chain. The mechanisms behind the individuation of such selections, however, are not known and would require closer collaboration between AI and Cognitive Science researchers. Another cognitively effective explanatory cue concerns the use of contrastive explanations (see Miller, 2019): people do not ask why event P happened, but rather why event P happened instead of some event Q. This aspect should be embedded in systems interacting with end-users. Finally, explanations are a social element in human-human interactions. So, including them in a narrative and argumentative framework would facilitate their comprehensibility to human users.[6] The implementation of all these characteristics in AI systems has huge potential but, again, requires close collaboration between researchers working in the two fields (or at their intersection). A similar discourse holds for systems capable of analogical reasoning. There are actually very few systems able, to a certain extent, to deal with this crucial cognitive capability. Apart from the previously mentioned pioneering CopyCat system, the most advanced in this respect is the Companions Cognitive Architecture developed by Ken Forbus and colleagues, and formed by a triplet of software components known as the Structure Mapping Engine (SME), the MAC/FAC system for similarity-based retrieval, and the SAGE system for generalization (Forbus and Hinrich, 2017). Unsurprisingly, this system is built using a structural design approach based on a theoretically driven decomposition of the processes being simulated and, in particular, on the implementation of the Structure-Mapping Theory (Gentner, 1983), which sees analogy as a knowledge-mapping and transfer process from a base domain to a target domain. Analogy and computational analogical reasoning is also an area that machine learning researchers should look at more closely, in order to deal with the problem of transfer learning that affects their models. Analogies are also important in the context of automatic knowledge discovery[7] and creative invention of novel knowledge and can be crucially implemented also in AI systems operating in the context of “computational creativity” research. This subfield of AI research is currently open to both cognitively inspired and machine-oriented approaches (see e.g., Augello et al., 2016; Lieto et al., 2019; Veale and Cardoso, 2019; Chiodino et al., 2020) for building systems able to exhibit creative traits when involved in tasks requiring what Margaret Boden calls transformational, exploratory, and combinatorial creative processes (Boden, 2009). The overall impression is that, for the field of AI in general, the invention of authentic “artificial styles” (e.g., in diverse artistic domains) requires something more than what is available nowadays. Creativity, however, is not only a matter of artistic domains. When we find a novel solution for solving an everyday problem, we also exploit the potential of our creative thinking.
In this respect, the vast amount of literature on cognitive research about creative problem solving (in human and animal cognition) could be very useful for building creative machines. The quest for computational creativity, however, also requires considering additional elements: (computational models of) emotions that are a driving force of our decisions and meta-cognitive capabilities (Damasio, 1994); and the fact that creativity happens in a social context and, as such, must intercept, interpret, and abstract in an insightful way what happens in a multiagent societal setting. For the first aspect, it is well known that current research on affective computing in AI could benefit from insights from psychological studies on emotion (Picard, 1997). However, with a few notable exceptions of systems like SenticNet (Cambria et al., 2020), developed by Erik Cambria and colleagues at the NTU and explicitly adopting psychological models of emotion (Susanto et al., 2020) to conduct sentiment analysis,[8] most current research in the field is focused on the multimodal recognition of human emotions from physical states (e.g., from face recognition or computer vision systems or from wearable devices) by using standard machine learning techniques, with little research done on emotion-driven decisions and appraisal. Within the same context of “sentiment analysis”, in fact, the focus is on building very shallow models that are able to classify, after an extensive phase of manual annotation, the “positive” vs. “negative” polarity of texts without any grounding and grasping of the emotional content expressed therein and also without providing significant findings in the context of natural language processing research (this field is mainly driven by business needs rather than scientific ones).
In the context of cognitive architectures and robotics, on the other hand, computational models of emotions have been more accurately devised and adopted within agent architectures in order to inform the processing mechanisms of intelligent systems. Some of these architectures (e.g., the H-CogAff architecture developed by Sloman and colleagues, see Sloman, 2001, 2002) have shown, in accordance with psychological theories, how emotions are not a unitary phenomenon and are not all the same. The neuroscientist Antonio Damasio (1994: 131-134), for example, suggests that there are primary and secondary emotions, while Sloman (1998) suggests including tertiary emotions as well. Apart from the terminology, the difference between such varieties of emotional states lies in the type of processes involved in their manifestation (ranging from physiological reactions to higher forms of semantically rich, affective states). While the first type of emotional reactive responses can be somehow detected and reproduced by shallow models, secondary and tertiary emotions require more sophisticated metamanagement activities. Emotion modelling and recognition have played and still play an important role in the context of social robotics. The first social robot ever realized was Kismet, developed by Cynthia Breazeal (Breazeal and Scassellati, 2000; Breazeal, 2004) at MIT. This robot was the first one able to demonstrate a wide repertoire of emotional behaviours, able to engage humans in an interaction. Most social robotic platforms currently used (like Pepper or the iCub) have a more stylized humanoid shape (in the case of iCub, the shape resembles that of a little child). Still, these platforms are mainly focused on the perceptual task of assigning an emotional label to certain visual or auditory signals or individuating some low-level social cues (e.g., gaze following) that require the interplay of perception, attention, and memory modules.
However, more complex social interactions (such as building rapport, negotiation, etc.) require the interplay of additional modules such as emotion/motivation information processing modules, metacognition, and language, just to name a few. The gap to fill, therefore, concerns, once again, the integration of perceptual-level social cues with higher-order metacognitive functions that would allow emotions to be considered a real driver for human-like decision making. This would enable a more natural human-robot interaction that can take place over longer time scales (days, weeks, and months: what Newell calls the “Social Band”) and would extend our interaction with machines to crucial notions such as cooperation, collective action, theory of mind, and trust (Castelfranchi and Falcone, 2010).
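
Returning briefly to analogical reasoning: the idea, taken from the Structure-Mapping Theory, of analogy as a mapping and transfer process from a base domain to a target domain can be illustrated with a minimal Python sketch. This is not the SME algorithm and does not reproduce any component of the Companions architecture; it is a brute-force toy, written under the assumption that both domains are small sets of binary relations and that the target has at least as many entities as the base. The facts about the solar system and the atom, and all function names, are hypothetical choices made for the illustration.

```python
from itertools import permutations

# Each fact is (relation, first_argument, second_argument).
# Hypothetical base domain: a simplified solar system.
BASE = {
    ("attracts", "sun", "planet"),
    ("more_massive", "sun", "planet"),
    ("revolves_around", "planet", "sun"),
}
# Hypothetical target domain: a simplified atom, with one relation missing.
TARGET = {
    ("attracts", "nucleus", "electron"),
    ("more_massive", "nucleus", "electron"),
}


def entities(facts):
    """Sorted list of all entities mentioned as arguments in the facts."""
    return sorted({arg for _, *args in facts for arg in args})


def best_mapping(base, target):
    """Try every one-to-one entity correspondence and keep the one that
    preserves the largest number of base relations in the target.
    Assumes the target has at least as many entities as the base."""
    base_ents, target_ents = entities(base), entities(target)
    best, best_score = {}, -1
    for perm in permutations(target_ents, len(base_ents)):
        mapping = dict(zip(base_ents, perm))
        score = sum(
            (rel, mapping[a], mapping[b]) in target for rel, a, b in base
        )
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score


def candidate_inferences(base, target, mapping):
    """Base facts that, once translated through the mapping, are not yet
    known in the target: the knowledge 'transferred' by the analogy."""
    projected = {(rel, mapping[a], mapping[b]) for rel, a, b in base}
    return projected - target


mapping, score = best_mapping(BASE, TARGET)
inferred = candidate_inferences(BASE, TARGET, mapping)
print(mapping)   # entity correspondences chosen by shared relational structure
print(inferred)  # relations projected from the base onto the target
```

The correspondence is selected purely on the basis of shared relational structure (here, the sun is aligned with the nucleus and the planet with the electron), and the unmatched base relation is projected onto the target as a candidate inference. Exhaustive search over correspondences is only viable for toy domains of this size.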

  • [1] Along this line, Patrick Winston, together with the computational linguist Mark Alan Finlayson - his former Ph.D. student at MIT - promoted, from 2009 to 2016, a series of workshops called “Computational Models of Narratives”, partly financed by DARPA, which became a reference point for scholars working in this field. I co-organized with Finlayson and other international scholars (Ben Miller, Remi Ronfard, and Stephen Ware) the 2015 and 2016 editions of the workshop.
  • [2] There are two reasons behind the lack of systematic results. The first one concerns the fact that the software has been announced very recently, relative to the time of writing. The second one concerns the fact that GPT-3 is currently (as of September 2020) accessible only via an Application Programming Interface (API) and that this access has been provided to a minimal set of researchers around the world, selected on a discretionary basis by OpenAI.
  • [3] A synthesis is provided in an MIT Technology Review article:
  • [4]
  • [5] Actually, this field of investigation is not really new (only its name is), since the quest for providing human-understandable explanations of algorithmic decisions was already the battlefield of the expert systems and case-based reasoning communities.
  • [6] Antonis Kakas and Loizos Michael are two of the most active researchers in the AI community who argue for the need for a cognitive approach to argumentation and explanation frameworks (Kakas and Michael, 2016) and who have proposed, in the last few years, a number of tutorials and events at major AI conferences on this theme.
  • [7] An important sub-area of this historical field of AI is the one of automatic scientific discovery, championed by protagonists of the cognitivist tradition like Pat Langley and his pioneering BACON system (Langley, Bradshaw, and Simon, 1983). An updated view on the field is provided in Langley (2019).
  • [8] Sentiment analysis is a subfield of affective computing aiming at automatically identifying the driving sentiment or mood in pieces of text.