Emotion AI and the Ubuntu Philosophy

Alex Potamianos
Sep 12, 2022


In this article, I will talk about emotions in a social context for humans, primates, and machines. The article comes in three parts. First, I will review the aspects of sociality that make us humans unique among all species, especially emotion sharing as a prerequisite for cognitive development; this part is mainly an elevator pitch of Tomasello's work in two of his seminal books, “Origins of Human Communication” and “Becoming Human”. In the second part, I will briefly discuss the state of the art in the quest to arm computers with emotional and social intelligence, as well as the challenges that lie ahead. I will conclude with a reminder of the marvels and pitfalls of our socio-affective neural pathways, which can lead us anywhere from socio-affective flow to mob mentality (socio-affective groupthink).

Human sociality and the Ubuntu philosophy: Ubuntu here is not the popular open-source Linux distribution, but rather the set of values, behaviors, and social norms that ethnic groups typically of Central and Southern African origin consider to define us as humans. Ubuntu (the word itself comes from the Zulu language), in its many variants, emphasizes that humans share a universal bond: “I am because you are”. Recent findings in neuroscience, developmental psychology, and evolutionary developmental biology affirm that social cognition and social norms are a significant differentiator between humans and our primate relatives.

Let’s take a step back and dig a bit deeper. Scientists, most notably Tomasello and his team at the Max Planck Institute, have experimented over the past four decades with chimpanzees, bonobos, and human children to understand what makes us uniquely human. One of the main findings is that although children and primates at age two have very similar capabilities when it comes to physical cognition (skills associated with space, quantities, causality, etc.), toddlers are significantly more adept at social cognition (skills associated with imitation, communication, and intention reading). Furthermore, children demonstrate a much faster developmental growth rate in social skills, but also need many more years for their social skills to fully mature.

But how does this uniquely human sociality arise? Humans, unlike primates, have mental models of multiple points of view for the same object, idea, or person. The main tools for achieving this cognitive breakthrough are joint attention and joint intentionality. Primates learn dyadically: me and you, me and an object; and, just like humans, they have elaborate capabilities of (mostly spatial) inference and mind-reading, that is, reading intentions. Humans learn triadically: me, you, and a third person; me, you, and an object. The “me” and the “you” in this context become an “us”: we have shared attention, share common ground from previous interactions, and are willing to cooperate. The “us” is not always in agreement or consonance; rather, we negotiate our “views” about a third party or an object. This semantic triangulation is what gives rise to the notion of “perspective”: different views about an object, idea, or person.

Now that we are beginning to understand what makes us uniquely human, let’s go back to the Ubuntu philosophy. According to Michael Onyebuchi Eze: “‘A person is a person through other people’ strikes an affirmation of one’s humanity through recognition of an ‘other’ in his or her uniqueness and difference.”

In essence, what is being advocated here is the notion of perspective: you and I can be different and can hold different views. I am a human, you are a human; we are the same, but we are also different. The cognitive capacity to hold, for example, different cognitive representations for the same word emerges around three years of age, together with the notion of collective intentionality. Not just me and you, us, sharing attention, experiences, and goals; rather, me and you and you and you and you, collectively us, sharing attention, experiences, and goals.

Continuing on Eze’s quote:

“It is a demand for a creative intersubjective formation in which the “other” becomes a mirror (but only a mirror) for my subjectivity. This idealism suggests to us that humanity is not embedded in my person solely as an individual; my humanity is co-substantively bestowed upon the other and me. Humanity is a quality we owe to each other.”

To paraphrase: your subjective views, as they are mirrored off me via mind-reading, collectively become an “objective” view, our common understanding or the “truth”. These “truths” eventually get coded into social norms and moral codes, the building blocks of societies. All in all, our ability to negotiate and coordinate among multiple views, and to tell “right” from “wrong”, gives rise to our unique sociality and makes us truly human. The ability to form “objective” views and conform to social norms (through executive regulation) emerges at age 4–5 for typically developing children.

OK, but I promised to talk about emotion; how does emotion play into all this? Well, our social skills cannot develop without forming emotional bonds with our caregivers in the first year of our lives. Emotional sharing is a prerequisite to normal human development. Children engage in proto-conversations with their caregivers at a very early age, way before they develop language skills. Proto-conversations involve facial gestures, pointing, and vocalizations, and are structured in turns just like regular dialogue; see, for example, the “still face” experiments. Affective convergence is the goal here: when the adult caregiver is not reciprocating, the baby gets angry and confused. Emotion sharing is absent only in the most serious cases of autism spectrum disorder.

What is the difference between our sociality and the sociality of ants or bees? Choice. Ants and bees have a mostly predefined sociality; their social rules are hardcoded genetically or epigenetically in their DNA (an epigenetic example is a parasitic “infection” that can cause worker bees to abandon their colony). Humans have precoded developmental pathways for their sociality, but the social norms and associated morality can have tremendous variation. Humans can create societies à la carte, and invent and negotiate communication protocols on the fly; ants and bees get prefabricated sociality. We are genetically predisposed to be builders of societies: we can collectively place brick upon brick to re-define (within strict pre-existing bounds of social conventions) our social edifice.

Social Machines: How are computers doing as far as being social and understanding and expressing emotions and behaviors? Over the past three decades, we have improved by leaps and bounds in our ability to analyze human speech and facial expressions to extract highly accurate estimates of emotions and some basic behaviors. We can claim today that we can detect emotion in text and voice almost as well as humans, and approach human performance in image and video processing of emotions. However, machines still do pretty poorly at understanding emotion and behaviors in a social context, because machines still lack common ground, the set of common experiences that you and I share. So despite the ever-advancing mind-reading capabilities that machines have today, their limited social skills make them poor interlocutors and companions. Conversational AI systems such as Alexa, Google Assistant, and Siri are becoming ubiquitous in our homes; however, the interaction remains at a very basic social level. Conversational assistants are just that: assistants for providing information, giving access to music, setting alarms, etc. So although machines can read your intentions and emotions today, they are unable to share emotions, create joint experiences that define common ground, and, most importantly, achieve joint intentionality. To over-simplify, machines today have, at best, the social cognitive capabilities of a two-year-old.
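To make the task of “detecting emotion in text” concrete (text in, affect estimate out), here is a toy lexicon-based valence scorer. The word list and scores are invented for illustration; real systems use large trained models rather than hand-built lexicons like this one:

```python
# Toy illustration: lexicon-based valence scoring of text.
# Hypothetical mini-lexicon mapping words to a valence in [-1, 1].
VALENCE = {
    "love": 0.9, "great": 0.7, "happy": 0.8,
    "angry": -0.8, "sad": -0.7, "terrible": -0.9,
}

def text_valence(text: str) -> float:
    """Average valence of known words; 0.0 if no word is in the lexicon."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [VALENCE[w] for w in words if w in VALENCE]
    return sum(scores) / len(scores) if scores else 0.0

print(text_valence("I love this great happy day"))   # positive score
print(text_valence("What a sad, terrible outcome"))  # negative score
```

Note how the sketch captures only the surface of the task: it has no notion of context, sarcasm, or the shared history between speakers, which is exactly the “common ground” gap discussed above.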

Let’s now discuss some of the recent advances in cognitive AI and artificial general intelligence, disciplines that are motivated by the cognitive capabilities of humans and attempt to create machines that have similar traits. What are the missing pieces towards creating machines with socio-affective capabilities similar to those of humans? Let’s start from how machines vs. humans learn: machines today learn dyadically, using pairs of data and associated labels for that data. An emerging area in machine learning, contrastive learning, teaches machines how to learn triadically, by comparing two data points and their associated labels.

Machines today learn mostly in a supervised manner, trying to optimize a specific criterion towards achieving a goal, e.g., word accuracy for a speech recognizer. Human learning, especially social learning, is not necessarily goal-oriented; the social interaction itself is the goal, so learning emerges as exploration rather than exploitation of a criterion/goal. Open-ended learning and self-learning are also hot research areas in machine learning today, trying to better balance the exploitation vs. exploration (EvE) dilemma.

Cognitive processing in humans happens as a combination of bottom-up processes, from senses/perception to abstractions/meaning, but also top-down, as feedback from higher-level cognitive processing to lower-level perceptual processing. The combination of bottom-up statistical processing, top-down feedback, and high-level symbolic processing is an exciting research area in cognitive AI. There are many other skills that machines lack today, for example, elaborate turn-taking: knowing when and how to speak in a multi-party conversation is a complex skill that children fully master only in the first years of primary school. Another is knowledge representations that can take into account multiple points of view, aka perspectives, and coherently synthesize them into the “objective view” of the social group, aka social learning.
And the list goes on … All these are prerequisites for the breakthroughs needed for human-like socio-affective intelligence, namely for humans and machines to achieve: emotion sharing, joint attention, common ground, joint intentionality (creating goals in pairs and in a group), and, finally, social norms and morality.
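The dyadic-vs-triadic distinction above can be made concrete: contrastive learning is often implemented with a triplet objective, where an anchor example is pulled toward a “positive” example and pushed away from a “negative” one, so the model learns from a comparison of three points rather than a single (data, label) pair. A minimal sketch, with made-up 2-D points standing in for learned embeddings:

```python
from math import dist

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet (contrastive) objective: the loss is zero once the anchor
    is closer to the positive than to the negative by at least `margin`."""
    d_pos = dist(anchor, positive)  # anchor-positive distance
    d_neg = dist(anchor, negative)  # anchor-negative distance
    return max(0.0, d_pos - d_neg + margin)

a = (0.0, 0.0)   # anchor embedding
p = (0.1, 0.0)   # same class: already close to the anchor
n = (3.0, 0.0)   # different class: already far from the anchor
print(triplet_loss(a, p, n))  # 0.0 -- the triadic constraint is satisfied
```

In a real system the three points would be embeddings produced by a neural network, and gradients of this loss would reshape the embedding space; the sketch only shows the three-way comparison itself.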

I left for last the notion of machines with personas and personalities. Humans have very strict preferences about whom they socially interact with, i.e., who is a partner, a friend, or an acquaintance. This depends on common interests of course, what we have been calling joint intentionality, but there are also relationships that are mainly fulfilling at a social level. Here is where machine personality comes into play. In recent years, we have built technology that can match people towards a goal using just the voice prints of those people. The matching occurs via a machine learning algorithm that uses the emotional and behavioral profiles extracted from the way people speak, their speaking style. In the future, these technologies will also be used for speech generation, to build machines that can optimally interact with you in a social, educational, or workplace setting.
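The actual matching algorithm is not public, but the general idea can be sketched as nearest-neighbor search over per-speaker profile vectors, e.g., via cosine similarity. The names and the (arousal, valence, dominance) profiles below are entirely hypothetical:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two profile vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical speaking-style profiles: (arousal, valence, dominance),
# as might be extracted from each speaker's voice.
profiles = {
    "ana":   (0.8, 0.6, 0.4),
    "bo":    (0.7, 0.5, 0.5),
    "carla": (0.1, -0.4, 0.9),
}

def best_match(name):
    """Return the other speaker whose profile is most similar."""
    return max((p for p in profiles if p != name),
               key=lambda p: cosine(profiles[name], profiles[p]))

print(best_match("ana"))  # "bo" -- the most similar speaking style
```

A production system would learn the matching function from outcome data (e.g., which pairs actually collaborate well) rather than assume raw profile similarity is the right criterion; the sketch only illustrates profile-based matching.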

OK, so we built it. Will they come? In addition to the aforementioned technology breakthroughs, the social role that we attribute to machines has to change for these technologies to be relevant: machines have to be accepted as partners rather than as assistants. My belief is that when machines start exhibiting human-like sociality, this role transformation will happen quickly and seamlessly in our minds, simply because this is how human sociality works. Our children will be the first to embrace this change because they are the least racist among us. Of course, there will always be Luddites, groups of people that will reject machines as equals, a form of racism that is inherent in the dark side of human sociality that I describe next.

The bright and dark sides of human sociality: The fact that we are social animals with excellent mind-reading skills also makes us expert killers, simply because our (emotional and social) intelligence makes us capable of predicting the reactions of prey. Our sociality also makes us capable of waging wars (weirdly, wars are not a uniquely human trait; wars have also been waged between clans of chimpanzees, see for example the Gombe Chimpanzee War).

Human sociality, the ability to create norms and moral codes that are “objectively” shared in our (often narrow) social group, is what also makes us capable of terrible crimes, essentially driving competing social groups to disrepute, even to extinction. The views of others that don’t belong in our narrow social group are almost by default perceived as “wrong” if they deviate from the “objective views” of our social group. Going one step further, our shared views get codified into social norms and behaviors that define an “objective” morality for our social group. Deviations from these canonical social behaviors are inhibited within our social group by individuals that have a high social IQ, i.e., a need to belong, and are polemically criticized outside our social group. This need to conform can often become unbearable for the individual, especially in social groups where the “moral” code has been painstakingly codified over decades and centuries into rigid social structures, e.g., castes, or rules and regulations applied to your day-to-day life. This conflict between the individual’s ego and the social us is the source of much cognitive dissonance and unhappiness in human societies. Further, differences in social norms and behaviors are often associated with a notion of “objective” moral superiority in a group, which leads to universally recognized criminal behavior: bullying, violence, beatings, lynching, assassinations, even genocide when moral superiority is backed by a state or ethnic group in power. Historically, all such actions of group violence can be attributed to, or are significantly amplified by, a sense of moral superiority and/or a sense of being “wronged”.

Another danger of our sociality is mob mentality and groupthink. In our need to conform to social norms and be collaborative, we often reach a state of dark social flow that leads to actions and decisions that are, in retrospect, dangerous or stupid.

Let’s now turn to social flow and the bright side of sociality. In a classic book, “Flow: The Psychology of Optimal Experience”, the psychologist with the most unpronounceable name, Mihaly Csikszentmihalyi (Chee’k-sent-mee-ha-lee), explains in detail the cognitive state of “deep focus, high energy, contentment, and high performance while performing a task”. Also known as being “in the zone”: your brain cannot put a foot wrong; you are in total connection with your task, fully absorbed in it, losing track of time, and completely detached from the outside world. It feels like you cannot miss. Similarly, social flow is a mental state that you enter jointly with other people, where all your focus, energy, and socio-cognitive skills help you become entrained, in sync with a group of people, like pendulums swinging together. To achieve this state of social trance you need to employ your skills at recursive mind-reading, a silent mind dialogue, as well as be willing to cooperate and surrender your ego to the group. These experiences become even more fulfilling when you share past experiences, i.e., common ground, with the group. Consider, for example, the deep state of group flow in a group of musicians. Social flow is a uniquely human ability and indeed one of the most enjoyable and rewarding human experiences.

Let’s talk about the future: how humans and machines will coexist not only for utilitarian but also for social purposes. In the not-so-far-away future, maybe only about half of your friends will be made out of carbon. My hope is that machines will become actors of good in this futuristic scenario. Human sociality has a strong maturational component, but the maturation process passes through interacting with other humans, initially your caregivers and later on your peers.

By pointing out and helping us reflect on our negative behaviors, our dark human moments, machines can be positive regulators of human behavior. Social machines, together with the abundance of technological wonders, may bring an end to human strife and give a new meaning to the notion of Ubuntu. After all, social beings are the quintessential reincarnation of “I am because you are!”


Alex Potamianos

Amazon scholar, academic, co-founder of Behavioral Signals