1. Introduction
Affective computing supports artificial intelligence by designing technologies that allow computational systems to recognize and process human emotions and affections [1], enriching Decision Support Systems' features in making decisions, from emotional classification [2] to biometry [3]. Up to now, the research community has gained information about human emotions and affections [4,5,6,7]. Current technologies try to go deeper in recognizing "profound" human features [8,9,10], rather than just recognizing shapes, objects, or faces. Applied to autonomous agents, brain- and consciousness-related scientific research continuously spreads towards a wide range of study fields, such as those related to psychiatric disorders [11], Neuroscience, and Cognitive Psychology. It is interesting, though, to examine in depth how the human cognitive scale leads to intelligence and how it can be translated into technology. Computational models of cognition are of great utility when used to simulate cognitive processes, providing testbeds for cognitive scientists to evaluate their hypotheses [12]. Proposing formal models of cognition does not represent a reductionism of human mind description, but introduces new frontiers of comprehension.
In this paper, we do not want to enter the philosophical, psychological, neuroscientific, or medical debate about brain functioning. We limit our work to the analysis of a selected set of human cognitive levels, with the aim of translating their hierarchy mathematically into simple computable models. By means of empirical experience and scientific suggestions, we intend to contribute to widening the way artificial intelligence is conceived; we think that the computational reproduction of human intelligence should not be based only on a single learning layer—for example, a model designed with just one neural network that classifies sensory inputs—but also on further levels of cognition related to those we think are the most important human factors. It is necessary to pursue the building of new paradigms of artificial intelligence that can be compared with human attitudes. Classification and regression algorithms, when exclusively based on sensory inputs, represent a conceptual reduction with respect to human intelligence, since human beings do not just recognize shapes or facial expressions. To be considered "intelligent", an agent should make decisions not only by processing sensory samples, but also by taking into account cognitive levels like those of emotion and consciousness.
Is it sufficient to consider just a neural network as intelligent? Do we need to design further levels of learning in order to simulate reliable human cognition? The answer lies in deepening our comprehension of human cognition.
Although this paper mainly impacts Smart Sensing, it is prodromal to the rational integration of consciousness in solutions with artificial intelligence. In fact, already in the present work, the modeling of artificial sensations, perceptions, emotions, and affections is analyzed in a framework that places attention, awareness, and cognition at the sensory-cognitive stages. These last stages will be discussed in a subsequent paper, which will allow us to show the complete methodology of an artificial consciousness interposed between sensing and artificial intelligence. Such a vision is absolutely new in the international scenario since, without dealing with religious, ethical, or psychological issues, it allows us to strengthen cognition in the context of artificial thinking, reinforcing our ability to create more sophisticated artificial intelligence. In the last 10 years, there have been many works in the context of Smart Sensing, especially if we consider the applications implemented with Industry 4.0 and with the Internet of Things [13,14]. The European Commission has promoted and financed different initiatives favoring human–machine interaction through emotional involvement. For example, within the 7th Framework Programme, the ALICE (Adaptive Learning via Intuitive/Interactive, Collaborative and Emotional systems) project showed the effect of a type of learning that increased the level of attention of the learners, thanks to an analysis of learning styles and emotional involvement. With Horizon 2020, not only emotions but also affections have become the center of attention in different projects with international impact. These concern not only learning assisted by artificial intelligence technologies, but also automotive, online trading and, more generally, computational finance, customer profiling in digital marketing, etc. This work frames Smart Sensing, specifically perceptions, emotions, and affections, as a bridge towards artificial consciousness intended to close the gap between sensing and artificial intelligence, enriching the latter with elements that allow us to take another step towards mimicking the processes of analysis, evaluation, and human understanding. In fact, in the international scenario, in robotics, we find that the sensor system is placed at the service of artificial intelligence directly, with no intermediate stage and no elements that, in addition to sensations, can digitize perceptions, emotions, and affections. Furthermore, studies on affective computing typically stand on their own, analyzing specific issues and responding to particular needs. In this work, we instead present an integrated vision. The sensation captured by the sensor is transformed into perception, and then enriched with emotions and affections. Only after these steps is it possible to analyze the cognitive effect and the subsequent decision, thanks to artificial intelligence. The present work fills the gap between sensing and artificial intelligence by modeling artificial human-inspired cognitive levels, so that we can define the "historical–cognitive enrichment of information" thanks to the stimulus–memories interaction.
In Figure 1, we introduce our info-structural model of cognition as an onion-like structure. Layers 1, 2, 3, and 4 depend on the layers immediately outside them, while Layers 5, 6, and 7 depend on the layers inside them. The points of contact between two or more layers represent their retroactive dependencies.
In the following sections, we present the model, showing how cognitive state processing has been managed and how Smart Sensing has been modelled. The last part of the present study illustrates the experimental results obtained by processing visual stimuli. In the discussion, the cognitive levels of the model are written with an initial uppercase letter, while real human cognitive levels are written in lowercase. The contribution we want to convey is a general framework that acquires the five sensory signals and transforms them into emotional and affective artificial cognition.
2. Related Works
The term "smart sensing" is commonly associated with applications regarding energy-efficient smart sensors, mostly for the Internet of Things [15,16]. In this paper, we adopt the same term to denote the section of our framework that acquires sensory inputs to compute artificial instances of the Sensation, Perception, Emotion, and Affection cognitive levels. This goal requires the definition of a cognition model which assumes an info-structural form.
The attempt to reproduce human perception, as well as emotion and affection, has been addressed in many ways, and it is difficult to illustrate each of them in a study that is not a review, but an original contribution in the direction of creating a basis to arrive at cognition starting from sensation. We can, however, describe some of the most common approaches. In Reference [17], the concepts of sensation, perception, and cognition are taken into account separately, distinguishing the second into active, i.e., sensory acquisition with environment adaptation capabilities, and passive, i.e., sensory acquisition without any feedback; in the case of an electronic tongue, modeling was performed according to a mapping of human perceptions to an artificial sensor system. The study takes into account the levels of sensing, perception, and cognition: the immediate activity of our sensory system, the interpretation of sensory stimuli, and the acquisition, retrieval, and use of the information. The concept of attention is also taken into account for active perception as a function of the task being executed. The authors obtained two relevant results: a human-like electronic tongue, for evaluating food and water, and an artificial hand, for simulating the sense of touch. From a more biological point of view, tactile perception has also been emulated through piezoelectric artificial synapses [18]. In Reference [19], texture features coinciding with human eye perception have been proposed, obtaining results that are comparable with the human visual sensory system. Visual perception has also been considered in [20], in which a cognitive system guides attention to an object of interest, and it has been assumed simultaneous to goal strategies in [21], by validating a system through functional magnetic resonance imaging. Other interesting models of perception are those based on Bayesian frameworks [22] and artificial neural networks [23,24]. In general, except for [17], which explicitly considers a hierarchical model similar to the one presented in this study, all of the previously cited works regard particular sensory sources or address the concept of perception by focusing on specific tasks such as feature extraction, object recognition, and localization.
In Reference [25], a hierarchical model based on multi-attribute group decision-making that combines personality, mood, and emotional states has been proposed. Here, personality is represented as a vector of characteristics, mood as a state space in which the origin corresponds to the neutral state, and the affective model as the mapping between these states and the emotions. The hierarchy follows the order of personality, mood, emotion, and affective state. The results showed that, by creating a model based on group experts' traits, it is possible to assist, or even to replace, the groups themselves in generating affective states for decision making. An example of an emotional framework based on Reinforcement Learning has been proposed in [26]. In this framework, agents learn cooperative behaviors. They receive rewards from the environment as a consequence of their actions, computing sensations through an internal environment composed of emotion appraisal and derivation models that generate intrinsic rewards useful for behavioral adaptation. Furthermore, affective interaction mechanisms have also been studied in [27]: social effects of emotions are classified primarily as emotions experienced but not communicated, emotions experienced and intentionally communicated, and emotions not experienced but intentionally communicated. Starting from a psychological background, the authors realized a multi-agent system in which competitive and cooperative interactions were obtained between agents with negative and positive social connections, respectively.
The purpose of this work was to generalize the attempt to reproduce the above cognitive levels, providing a framework that does not model an agent's actions with respect to the environment, but only stimuli acquisition through time-dependent polynomial functions, sliding window memories, and machine learning.
3. Proposed Model and Cognitive State Processing
As we can see in Table 1, it is possible to refine, in a computational fashion and across several cognitive levels, what happens when we make decisions, estimates, or assessments, or when we recognize patterns. Specifically, this study refers to non-interacting agents—machines that perform no actions with respect to the external environment—and their hierarchy of cognitive levels.
In our model, the artificial agent intercepts environmental events by means of its Sensation level, processes the sensory data, and sends the result to the next level. Each cognitive level receives and processes a set of data, producing a result, subsequently recorded in memory, that is forwarded to the next level of the hierarchy. Once the computations above have been completed, the agent obtains a tuple of results, whose components are kept in memory for a limited period of time.
A cognitive state $s_n$ is defined as the tuple of results related to the cognitive levels' computations. Formally,

$$s_n = (r_{1,n}, r_{2,n}, \ldots, r_{l,n}),$$

where $r_{i,n}$ is the result, or instance, of the $i$-th cognitive level related to the acquisition, at the discrete time $n$, of an external event, and $l$ is the number of cognitive levels in our structure, which is 7 in total.
By "external event" we mean the sampling, at the instant $n$, with a fixed sampling step, of the sensory input signals, since the sensory sphere of a robot is made of sensors. The above assumption seems reasonable since artificial agents, e.g., acquire visual capacity through cameras, which capture frames, i.e., samples of the visual reality. In this study, cognitive state acquisition and cognitive level processing are characterized by non-limited capacity, pseudo-instantaneous behavior, and ideal parallelism/concurrency.
Human beings do not seem to possess a memory capable of retaining, over time, every acquired cognitive instance. In fact, memory is often classified as short-term and long-term [28]. In order to simplify this characteristic mathematically, it is reasonable to treat cognitive instances as volatile information. In fact, compared to an emotion, a sensation is less decisive for the subject's decision; it is noticeable that an agent's behavior depends on its emotional state even when the sensory stimulus that elicited the given emotion is no longer captured [29]. The closer we get to affection, the longer cognitive levels seem to hold back cognitive data over time; emotions stimulate the activity of memory [30]. Thus, we consider cognitive instances as information progressively decaying in the agent's memory.
The removal period of the $i$-th cognitive level instance is defined as the period $T_i$ that elapses between the cognitive state acquisition and the elimination of $r_{i,n}$ from the memory related to the $i$-th level. It must also respect the following bound:

$$T_1 \leq T_2 \leq \ldots \leq T_7,$$

where $T_1, \ldots, T_7$ are the removal periods of the instances related to the seven cognitive levels. In this way, the agent's decisions depend more decisively on the activity of deeper cognitive levels. Indeed, human–environment interaction leads to decisions that depend more on an affection than on a sensation [31,32].
By way of example, let the cognitive state acquisition period $T_a$ be the period that elapses between the first cognitive level's input acquisition and the last cognitive level's result generation. Considering $l$ levels, the cognitive state temporal processing follows these sequential steps:
an agent acquires a Sensation instance at time $n$;
the agent processes the result of the $i$-th cognitive level at time $n$, with $1 < i < l$;
the agent processes the result of the last cognitive level $l$ at the instant $n + T_a$;
the agent removes the Sensation instance acquired at time $n$ at the instant $n + T_1$, where $T_1$ is the removal period of the Sensation instance;
the agent removes the result recorded at time $n$ of the $i$-th cognitive level at the instant $n + T_i$, with $1 < i < l$;
the agent removes the acquired result at time $n$ of the last cognitive level $l$ at the instant $n + T_l$.
Consequently, a new Sensation instance acquisition can take place at the instant $n + T_a$.
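A minimal sketch of this temporal processing may help fix ideas. All names (CognitiveAgent, StoredInstance, purge_expired), the scalar instances, and the placeholder per-level computation are illustrative assumptions, not the authors' implementation; the sketch only shows how removal periods govern how long each level's instances persist.

```python
"""Illustrative sketch of cognitive state temporal processing.

Assumptions: scalar cognitive instances, a trivial per-level computation,
and example removal periods T_1 <= ... <= T_7 (deeper levels persist longer).
"""
from dataclasses import dataclass, field

L = 7  # number of cognitive levels in the hierarchy

@dataclass
class StoredInstance:
    level: int        # 1-based index of the cognitive level
    acquired_at: int  # discrete acquisition instant n
    value: float      # cognitive instance (a scalar here for brevity)

@dataclass
class CognitiveAgent:
    # removal periods, expressed in acquisition steps
    removal_periods: tuple = (2, 4, 8, 16, 32, 64, 128)
    memory: list = field(default_factory=list)

    def acquire(self, n: int, sensory_sample: float) -> None:
        """Acquire a cognitive state at instant n: each level processes the
        result of the previous one and records its instance in memory."""
        value = sensory_sample
        for level in range(1, L + 1):
            value = self._process_level(level, value)
            self.memory.append(StoredInstance(level, n, value))

    def _process_level(self, level: int, upstream: float) -> float:
        # placeholder for the level-specific computation (Sensation, Perception, ...)
        return 0.9 * upstream

    def purge_expired(self, n: int) -> None:
        """Remove every instance whose removal period has elapsed."""
        self.memory = [
            inst for inst in self.memory
            if n - inst.acquired_at < self.removal_periods[inst.level - 1]
        ]

agent = CognitiveAgent()
for n, sample in enumerate([0.3, 0.8, 0.5, 0.9]):
    agent.acquire(n, sample)
    agent.purge_expired(n)
print(len(agent.memory), "instances currently in memory")
```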
4. Smart Sensing
Smart Sensing is modeled as a hierarchy of four cognitive levels: Sensation, Perception, Emotion, and Affection. Each level is thought of as linked to the next one, providing as output a result, i.e., a cognitive instance. In order to support subsequent processing, each instance is periodically stored in an evanescent window memory, disappearing after a predefined period of time.
To provide visual demonstrations of how the model works, we assume low-dimensional sensory inputs and cognitive instances. In a real use case, as we show in the last section of this study, cognitive instances can be high-dimensional, with dimensionality proportional to the deployed feature extraction. For example, to describe a sensory visual input, it would be possible to use a Convolutional Neural Network [33] to provide, in time, as many signals as needed to match the dimensionality of the feature vector. This aspect is neglected here in order to show some plots without overly weighing down the notation.
As regards sensations and perceptions, the literature is generally discordant, since many researchers consider them synonyms. In this study, these concepts are considered not equivalent; specifically, the following hypotheses and definitions are adopted.
4.1. Sensation
Sensations allow our mind to understand ourselves and the world around us. Since they are essentially personal and subjective, it is impossible to measure them exactly, but it is possible to ask people to describe them. This first qualitative experiment makes it possible to compare sensations and to note that, in some cases, they are caused by specific changes in the physical world, i.e., in what is outside of us and what we perceive. Generally, a given variation of the physical world is perceived by different human beings in such a way that their descriptions of it are very similar.
Although the above premise seems obvious, it allows us to suppose that there are psychophysical relations between certain stimuli—physical variables—and some sensations—psychological variables—that tend to be predictable and independent of the observer. To reach a further level of detail, it is possible to distinguish the concepts of "sensation" and "neuro-sensation". The first is exactly the definition provided above and is linked to the sensory organs. On the other hand, when the stimulus coming from the sensory organs reaches the central nervous system, we must more correctly consider a "neuro-sensation", or perception, a feeling that is enriched thanks to the memory of experiences. This explains, for example, why people can derive different perceptions from the same sensory stimuli. In our study, we will consider sensation and perception as two different cognitive levels.
In human beings, sensation is considered as the modification of our neurological system due to stimuli offered by the environment and captured by our sensory organs, whose channels are hearing, sight, smell, taste, and touch. In machines, a sensation can be considered as the sum of the contributions related to the sensors’ signals. For example, while for human beings the sense of sight is acquired through the activity of eyes, for a machine a similar result is acquired through a camera. The same goes for the hearing sense; what humans listen to through their ears can be computed by machines through their microphones.
While human beings' sensation is a complex cognitive level to describe, for a machine its definition is simpler. For example, a machine's visual sensation is made of pixels, each of which, independently of the color channel, can assume maximum and minimum values (corresponding, respectively, to white and to black, considering the RGB format). The sensory input related to camera frames has huge dimensions, since three signals are evaluated for each pixel in the scene, but their characteristics are practically the same. Similarly, machine auditory sensation is made of sampled wave intervals, each of which can assume a certain range of magnitude values on the decibel scale. These two examples of sensation can be considered as a computational approximation of those of human beings.
The additive sensory signal is then defined as the sum of the five sensory input signals. Formally,

$$x(n) = \sum_{i=1}^{5} x_i(n),$$

where the sensory functions $x_i(n)$ are described by random processes with threshold, since the arrival of a sensation to the agent derives from stochastic occurrences. Plausibly, human beings perceive their conscious sensory experience as long as the stimuli overcome a certain threshold [34]; this phenomenon is also noticeable when analyzing electrodermal activity (EDA) signals [35]. Therefore, we can define a threshold $\theta_i$ for the $i$-th sensory input: only samples satisfying $x_i(n) \geq \theta_i$ contribute to the Sensation level.
The Sensation cognitive instance is defined as the vector of decaying sensory inputs: its $i$-th component is the thresholded sample of the $i$-th sensory input, acquired with the sampling period of that input and decaying as a function of time. Furthermore, the Sensation memory is defined as a tensor collecting the Sensation instances acquired over the most recent acquisition instants, whose terms expire following a decay function of the generic continuous temporal instant.
Analyzing the total additive sensory signal, evaluated at discrete cognitive state acquisition time instants, it is possible to understand how the Sensation cognitive level processes sensory input signals together. The additive sensory signal tends to acquire the shape of the input with the highest magnitude.
In Figure 2, we highlight how Sensation computes its output when processing both increasing and decreasing sensory signals, together with how its cognitive instances are managed in the general memory. Dashed lines, in the left picture, represent the time ranges during which the most recent sensory sample is acquired and the decay behavior of the cognitive instances is achieved. When the sixth state of the memory is acquired, the first sensory sample, recorded at the zero instant, has already been deleted from the general memory. When an instance is added to the memory, its intensity and its relevance have already started decaying.
Sensation has been modeled as a cognitive layer which takes, at every instant $n$, samples of the five sensory inputs and, according to their amplitude, computes the decay function that decreases their importance as a function of time. The values of the decay function are acquired by the Perception cognitive level, which keeps them in memory for combination. For each sensory dimension, the current sample is acquired, and the decay function relative to the previously acquired samples is computed.
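The following sketch illustrates this Sensation layer under stated assumptions: the threshold values, the exponential form of the decay, and all names (threshold, additive_sensory_signal, REMOVAL_PERIOD) are illustrative choices rather than the paper's exact formulation.

```python
"""Illustrative Sensation level: five thresholded sensory inputs are summed
into the additive sensory signal, and each acquired sample decays in a
sliding window memory. Thresholds, decay rate, and removal period are
assumed values for demonstration."""
import numpy as np

N_SENSES = 5                              # hearing, sight, smell, taste, touch
THRESHOLDS = np.full(N_SENSES, 0.1)       # per-sense activation thresholds (assumed)
REMOVAL_PERIOD = 5                        # steps after which a Sensation instance is removed
DECAY_RATE = 0.5                          # assumed decay constant

def threshold(inputs: np.ndarray) -> np.ndarray:
    """Suppress sub-threshold stimuli: only samples above the threshold reach Sensation."""
    return np.where(inputs >= THRESHOLDS, inputs, 0.0)

def additive_sensory_signal(inputs: np.ndarray) -> float:
    """Sum of the five (thresholded) sensory input signals."""
    return float(threshold(inputs).sum())

def decay(sample: np.ndarray, elapsed: float) -> np.ndarray:
    """Exponential decay of a stored Sensation instance as a function of time."""
    return sample * np.exp(-DECAY_RATE * elapsed)

# Sensation memory as a list of (acquisition instant, thresholded sample)
memory: list[tuple[int, np.ndarray]] = []

rng = np.random.default_rng(0)
for n in range(8):
    sample = threshold(rng.random(N_SENSES))     # five sensory samples acquired at instant n
    memory.append((n, sample))
    memory = [(t, s) for (t, s) in memory if n - t < REMOVAL_PERIOD]  # evict expired instances
    decayed = [decay(s, n - t) for (t, s) in memory]                  # decayed view passed to Perception
    print(f"n={n}  additive={additive_sensory_signal(sample):.2f}  window sum={np.sum(decayed):.2f}")
```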
4.2. Perception
The laws of perception are said to be autochthonous, since they are considered innate and not a result of learning, even though an evolutionary progress in the elaboration of perceptions themselves is present. From the first months of life, the newborn is able to recognize colors and shapes—in particular, human figures—but only after acquiring the so-called "perceptive constancy", i.e., the ability to connect forms or figures in which similarities are recognized [36].
While in Europe the Gestalt school developed the phenomenological laws of perception, in the United States the New Look of Perception paradigm took hold. The latter school gives relevance to aspects practically neglected by Gestalt, namely the personal and social values related to perceived objects. Forms are no longer considered innate, but are anchored to the needs and purposes of individuals. Personal values and needs become key elements in structuring perceptive processes, and significant objects and symbols can be perceived in distorted and dissonant ways.
Under the above hypothesis, by leaving out more psychological aspects and orienting the discussion on a scientific-informative vision, we can introduce, from a systemic-functional point of view, the following definition.
The Perception cognitive instance is defined as the weighted combination of the current Sensation instance and of the Perception instances residing in memory, where the contribution of the memory decays in time following a polynomial order. A weight vector related to the memory indicates, progressively, the relevance of the window elements: the weights decrease monotonically from the most recently added perceptive element in the window to the least recently added one. A further weight vector is linked to the single Sensation instances, and another is related to the Perception instances residing in memory. The weight vector linked to the Sensation instances is not constant, but variable: it depends on time and on the results retrieved from the cognitive level of attention, since human beings' perception is affected by attention [37]. Dedicated terms have been introduced to obtain the desired decay behavior of the instances, while the term related to the memory is further scaled down to achieve a faster decay.
The elimination of a generic memory element occurs at the instant at which the removal period of perceptual instances in memory has elapsed since its acquisition.
Finally, we define the perceptive vector, useful to compute the cognitive instances of the next layer of the model, as the vector collecting the Perception instances related to the single sensory dimensions.
The effect of the weight vector linked to the Sensation instances, together with the contributions related to the designed memory model, can be observed by analyzing the results of Perception. Figure 3 and Figure 4 show an effective example of how Perception computes its results. By setting a short removal period, this cognitive level does not increase the magnitude of its inputs, but shapes the sensory signals according to the content of the general memory and to the state of the weight vector.
Like the additive sensory signal, Perception also tends to acquire the shape of the input characterized by the highest magnitude. This result seems reasonable, since human beings tend to perceive the most relevant received signals. For example, a loud sound increases the perception level with respect to the hearing sense [38], while a room with no light decreases the perception related to sight. Moreover, high-frequency sensory signals play an important role in defining the output, since the human cognitive model regards them as highly emotionally informative [39]. For example, the auditory sensory signal of a scream is characterized by a higher frequency, since the pitch of the human voice becomes more acute.
Figure 3 shows a comparison between the total additive sensory signal and the Perception; on the right, perceptive peaks and the memory contribution between them are apparent. While flatter regions of the additive sensory signal exhibit an oscillatory behavior, the corresponding Perception regions look smoother. In addition, as shown in Figure 5, Perception outputs always keep information about the sensory signals' shape, even when the general memory capacity is increased.
Decays are computed as exponential functions characterized by the amplitude of the acquired data. However, this statement is not necessarily true, since the sensory amplitude is modulated by the attention-dependent weight, which depends on the agent's level of attention related to the given input source. We think that this model reasonably simplifies the way humans remember perceptions of external events through their senses, and effectively takes into account factors—memory and attention capacities—that are related to a subject's personal characteristics.
Perception takes the vector of Sensation cognitive instances computed in the previous level and, for each of its dimensions, applies the time-varying weights stored in its weight vector to simulate the perception of a subject with respect to each sensory organ. For example, considering the weight related to the Perception of the sense of sight, its lowering towards zero indicates a visual deficit that can occur. This cognitive level applies the decay function to its inputs to decrease their importance as a function of time. All the instances of Perception acquired in the past are added to the current one to keep memory of the past. Finally, the values of the decay function are acquired by the Emotion cognitive level, which keeps them in memory for combination.
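A compact sketch of this Perception step follows, under assumptions: the exponential memory decay, the per-sense attention weights, and the names (perception, attention_w, REMOVAL_PERIOD) are illustrative stand-ins for the paper's polynomial formulation.

```python
"""Illustrative Perception level: the current Sensation vector is weighted by
a time-varying attention vector, and decayed past Perception instances stored
in a window memory are added back with progressively smaller contributions."""
import numpy as np

N_SENSES = 5
REMOVAL_PERIOD = 4          # removal period of perceptual instances in memory (assumed)
DECAY_RATE = 0.8            # assumed decay constant

def perception(sensation: np.ndarray, attention_w: np.ndarray,
               memory: list[tuple[int, np.ndarray]], n: int) -> np.ndarray:
    """Combine the attended Sensation instance with the decaying memory content."""
    current = attention_w * sensation
    if memory:
        # older instances contribute less; the memory term is scaled down further
        aged = [np.exp(-DECAY_RATE * (n - t)) * p for (t, p) in memory]
        memory_term = np.mean(aged, axis=0) / len(memory)
    else:
        memory_term = np.zeros(N_SENSES)
    return current + memory_term

memory: list[tuple[int, np.ndarray]] = []
attention_w = np.array([1.0, 0.8, 0.3, 0.3, 0.6])   # lowering the sight weight models a visual deficit

rng = np.random.default_rng(1)
for n in range(6):
    sensation = rng.random(N_SENSES)
    p = perception(sensation, attention_w, memory, n)
    memory.append((n, p))
    memory = [(t, x) for (t, x) in memory if n - t < REMOVAL_PERIOD]
    print(f"n={n}  perception={np.round(p, 2)}")
```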
4.3. Emotion
In evolutionary or Darwinian terms, the main function of emotions is to make humans' reactions more effective in situations in which an immediate response is required for survival, i.e., a reaction with no necessary cognitive or conscious processing. According to the Cannon–Bard theory, the emotional stimulus is first processed by subcortical centers of the brain, particularly by the amygdala, which receives information from the thalamus posterior nuclei to induce an autonomic and neuroendocrine reaction. Emotions also cause many somatic modifications, e.g., heart rate changes, increased or decreased sweating, respiratory rhythm acceleration, and muscle tension increase or relaxation.
Emotions also have a relational function, i.e., they communicate and self-regulate our psychophysiological state. According to the James–Lange theory, emotion is a response to physiological variations. Humans experience many emotions with different physiological sensations and reactions. These theories have been criticized, since people affected by spinal cord injuries still express emotions, as well as many similar physiological manifestations. In some cases, especially for strong emotions, a direct association between physiological and emotional manifestations still exists [40].
In order to build an effective model, it is necessary to take into account the classification of the most important emotions. One decisive contribution comes from the significant research conducted by Paul Ekman, who led thousands of experiments and acquired a large amount of data related to our topic of interest [41]. Following his work, we classify emotions into anger, disgust, sadness, happiness, fear, surprise, and contempt [42].
The Emotion cognitive instance is defined, analogously to Perception, as the weighted combination of the current Perception instances and of the Emotion instances residing in memory, where the contribution of the memory decays in time following a polynomial order. The weight vector related to the memory indicates, progressively, the relevance of the window elements: the weights decrease monotonically from the most recently added emotional element in the window to the least recently added one. A further weight vector is linked to the single Perception instances, and another is related to the Emotion instances residing in memory. The weight vector linked to the Perception instances is not constant, but variable, and it depends on time. Dedicated terms have been introduced to obtain the desired decay behavior of the instances, while the term related to the memory is further scaled down to achieve a faster decay.
The elimination of a generic memory element occurs at the instant at which the removal period of emotional instances in memory has elapsed since its acquisition.
Finally, we define the emotional vector, useful to compute the cognitive instances of the next layer of the model, as the vector collecting the Emotion instances related to the single sensory dimensions.
The results shown in Figure 6 and Figure 7 represent sample plots of Emotion. While Perception presents evident fluctuations—mostly related to the sensory signals' shape—Emotion radically attenuates this behavior by providing smoother functions. It is possible to discriminate emotional peaks and assign them to emotional classes. In addition, as illustrated in the figures, increasing the emotional memory capacity, by increasing the corresponding removal period, results in a more relevant memory contribution and wider emotional peaks, but the differences with Perception remain evident.
We also observe that, for both Perception and Emotion, an increase in the memory capacity results in an increase in the agent's sensibility. This characteristic can be seen by comparing the plots in Figure 5 and Figure 6: in the first plot, the function trend changes rapidly; in the second, the function stabilizes more persistently over time. This is an interesting result, since sensitive subjects are inclined to maintain perceptive and emotional states for a longer duration.
However, considering Emotion as a polynomial combination of Perception instances and of the emotional memory is reductive, since it is not possible to deterministically associate a cognitive instance with a given emotion. A stochastic approach is therefore required. In general, let $M$ be the family of stochastic models that perform the classification of emotions by means of a training set; we define the emotional class related to an Emotion cognitive instance as follows:

$$\hat{y}_n = \arg\max \mathbf{p}_n, \qquad \mathbf{p}_n = M\left(\phi(\mathbf{e}_n); \Theta\right),$$

with $\mathbf{p}_n$ the vector of probabilities related, respectively, to neutral, anger, disgust, sadness, happiness, contempt, surprise, and fear. Here, $\hat{y}_n$ is the predicted emotional class at the instant $n$, $\Theta$ is the model's set of parameters, and $\phi$ is the basis function for the transformation of the Emotion cognitive instance $\mathbf{e}_n$ at time $n$.
Training samples are therefore of the following form:

$$\left\{ (\mathbf{e}_k, y_k) \right\}_{k=1}^{N},$$

where each pair indicates the association between an Emotion cognitive instance $\mathbf{e}_k$ and an emotional class $y_k$, and $N$ is the number of samples in the training set. We can obtain a vector of scores for every emotional class. This approach is clearly supervised since, to obtain fully unsupervised emotional learning, it would be necessary to classify Emotion cognitive instances through an autonomous activity of consciousness, given that emotion discrimination requires at least a moral basis. However, in this discussion, we do not want to introduce such a complexity.
Emotion takes the vector of Perception cognitive instances computed in the previous level and, for each of its dimensions, applies the time-varying weights stored in its weight vector to simulate the emotion of a subject with respect to each sensory organ. For example, considering the weight of the Emotion related to the sense of sight, its lowering towards zero indicates an emotional desensitization that can occur. This cognitive level applies the decay function to its inputs to decrease their importance as a function of time. All the instances of Emotion acquired in the past are added to the current one to keep memory of the past. Finally, each Emotion cognitive instance is associated with an emotional class, and the values of the decay function are acquired by the Affection cognitive level, which keeps them in memory for combination. The weights are also dependent on the decay function related to the Affection cognitive instances acquired at the immediately previous instant.
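A minimal sketch of the supervised classification step described above follows; the logistic-regression learner, the toy training data, and the five-dimensional instances are assumptions made purely for illustration (the experiments in Section 6 use ensemble models instead).

```python
"""Illustrative supervised classification of Emotion cognitive instances.

Assumptions: 5-dimensional instances, random placeholder labels, and a
logistic-regression learner standing in for the stochastic model family M."""
import numpy as np
from sklearn.linear_model import LogisticRegression

CLASSES = ["neutral", "anger", "disgust", "sadness",
           "happiness", "contempt", "surprise", "fear"]

rng = np.random.default_rng(2)
X_train = rng.random((80, 5))                     # toy Emotion cognitive instances
y_train = rng.integers(0, len(CLASSES), size=80)  # placeholder emotional class labels

model = LogisticRegression(max_iter=500).fit(X_train, y_train)

emotion_instance = rng.random((1, 5))             # instance produced by the Emotion level at instant n
probs = model.predict_proba(emotion_instance)[0]  # vector of class probabilities
predicted = CLASSES[int(model.classes_[np.argmax(probs)])]
print(predicted, np.round(probs, 2))
```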
4.4. Affection
Aristotle conceives affection as "páthos", one of the ten categories; the senses produce affections to impress sensory data on the spirit. The elements that cause sensitive and sentimental changes in the spirit, e.g., pleasure, pain, and desire, come from external objects; therefore, the affections coincide with the "passions" of the ethical sphere. The latter meaning is also found in Cicero, who adopts "affectiones" as a synonym of "perturbatio animi", and in Augustine of Hippo, who uses the terms "perturbationes", "affectus", and "affectiones" as synonyms for "passiones".
According to Plato, Descartes, Spinoza, Leibniz, and Hegel, whereas good behavior is based on knowledge of the truth, the affections are dangerous because they negatively affect cognition and moral attitudes. In the Aristotelian and Epicurean philosophies, the affections are valid in the cognitive field, since sensory data are passively received by the subject and are therefore always true, while anticipatory judgments are false. No man exists without passions; they need to be moderated rather than removed. Kant states that it is essential that our spirit is "affected" by affections—otherwise, the cognitive activity of reasoning would be false—but if they are conceived as passions, their role is negative, i.e., they are cancers of practical reason.
For this work, with the aim of stimulating an advancement of ICT technologies by computationally deepening the above concepts, we provide the following definition.
The Affection cognitive instance is defined, analogously to the previous levels, as the weighted combination of the current Emotion instances and of the Affection instances residing in memory, where the contribution of the memory decays in time following a polynomial order. The weight vector related to the memory indicates, progressively, the relevance of the window elements: the weights decrease monotonically from the most recently added affective element in the window to the least recently added one. A further weight vector is linked to the single Emotion instances, and another is related to the Affection instances residing in memory. The weight vector linked to the Emotion instances is not constant, but variable, and it depends on time. Dedicated terms have been introduced to obtain the desired decay behavior of the instances, while the term related to the memory is further scaled down to achieve a faster decay.
The elimination of a generic memory element occurs at the instant at which the removal period of affective instances in memory has elapsed since its acquisition.
Affection, as shown in Figure 8 and Figure 9, heavily attenuates the Emotion behavior with a sort of emotional peak grouping. This seems reasonable, since an affection can be considered as the synthesis of a certain set of emotions felt at a given time. Notice that the more the memory capacity increases, the more Affection improves its emotional synthesis.
Affection takes the vector of Emotion cognitive instances computed in the previous level and, for each of its dimensions, applies the time-varying weights stored in its weight vector to simulate the affection of a subject with respect to each sensory organ. For example, considering the weight of the Affection related to the sense of sight, its lowering towards zero indicates an affection decrease that can occur. This cognitive level applies the decay function to its inputs to decrease their importance as a function of time. All the instances of Affection acquired in the past are added to the current one to retain memories from the past.
We have modeled Affection as the decaying contributions of the current emotions related to the sensory signals and of the past affective history. This cognitive level is intended as a combination, a "grouping", of different emotions related to an external entity—an "emotional synthesis". Finally, we can conclude that Sensation and Perception can be grouped into one category, which we call "Sensing", while Emotion and Affection can be grouped into another category, called "Sentiment". Both categories are part of the macro-category called "Smart Sensing".
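To summarize the whole chain, here is an end-to-end sketch under simplifying assumptions: fixed per-sense weights, exponential decays, and illustrative removal periods that grow from Sensation to Affection. Names such as level_output and REMOVAL_PERIODS are hypothetical and only indicate how the four levels could be composed.

```python
"""Illustrative end-to-end Smart Sensing chain: each level weights the
previous level's instance and adds a decaying memory of its own past
instances, with removal periods growing from Sensation to Affection."""
import numpy as np

N_SENSES = 5
LEVELS = ["Sensation", "Perception", "Emotion", "Affection"]
REMOVAL_PERIODS = {"Sensation": 2, "Perception": 4, "Emotion": 8, "Affection": 16}
DECAY_RATE = 0.6

memories = {name: [] for name in LEVELS}   # per-level sliding window memories

def level_output(name: str, upstream: np.ndarray, n: int) -> np.ndarray:
    """Weighted upstream instance plus the decayed content of this level's memory."""
    weights = np.full(N_SENSES, 0.9)        # time-varying per-sense weights, fixed here for brevity
    memory = memories[name]
    decayed = [np.exp(-DECAY_RATE * (n - t)) * v for (t, v) in memory]
    out = weights * upstream + (np.mean(decayed, axis=0) / len(memory) if memory else 0.0)
    memory.append((n, out))
    memories[name] = [(t, v) for (t, v) in memory if n - t < REMOVAL_PERIODS[name]]
    return out

rng = np.random.default_rng(3)
for n in range(5):
    instance = rng.random(N_SENSES)          # sensory sample acquired at instant n
    for name in LEVELS:                      # Sensation -> Perception -> Emotion -> Affection
        instance = level_output(name, instance, n)
    print(f"n={n}  Affection instance={np.round(instance, 2)}")
```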
6. Experiments with Visual Stimuli
The following experimentation is influenced by the work in [7], in which spontaneous emotional activity was recognized and classified for a group of people subjected to visual stimuli, in order to create a database of facial expressions. The above study considered a real case of human emotional action, and we were inspired to try a similar approach by "replacing" one of those subjects with the model presented in this paper. A person receives a visual stimulus, e.g., an image, and accordingly shows an emotion; we endeavor to have an artificial agent that can acquire the same kind of visual stimuli and consequently output emotions. Our purpose was not to accomplish a comparison between machines and individuals, but to instruct Smart Sensing to supply the emotions felt by human beings when they are exposed to certain stimuli. Thus, we present a use case in which sequences of images are transformed into Emotion cognitive instances in order to produce artificial emotional activity.
In Figure 10, we show the learning architecture and a method to validate a learner coupled with the Emotion cognitive level. The present experiment is partitioned into two parts: (i) model evaluation, conducted by training and testing the learner with predefined episodes—series of Emotion cognitive instances obtained by supplying pre-established sequences of visual stimuli; (ii) emotional activity, achieved by testing the previously trained model on never-seen episodes—series of Emotion cognitive instances obtained by supplying shuffled configurations of the visual stimuli used for the evaluation test. The former task is necessary to guarantee that the learner provides the desired emotions with respect to the established episodes; the latter is essential to inspect the emotions the learner produces when the Emotion memory presents states different from those involved in the evaluation.
The images, reshaped to a 150 × 220 size, are labeled according to the approach described in (32) and are processed by the ImageNet [43] network, which provides feature vectors of 12,288 components; an example of their transformation into a cognitive instance is shown in Figure 11. After a first training phase, the learner is evaluated and re-trained after a feature selection based on importance weights. The evaluation is executed by training and testing on three types of episodes, whose stimuli are listed in Table 2. During the training phase, when a given emotional class becomes associated with an Emotion cognitive instance, the learner acquires the above association as a function of the current instance and of the state of the Emotion memory. By way of example, when an Emotion cognitive instance associated with an injury stimulus is labeled with disgust, the agent will be trained to show disgust with respect to the injury stimulus, also keeping information, through the Emotion memory, about previously acquired instances. Thus, the emotional behavior depends on the order in which images are supplied to the agent.
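The feature extraction step described above can be approximated as follows. This is a stand-in sketch, not the paper's pipeline: the ResNet-18 backbone, the file name stimulus.jpg, and the 512-dimensional output are assumptions (the paper reports 12,288-component vectors from a different ImageNet-trained network).

```python
"""Illustrative extraction of a visual feature vector to be used as the
agent's visual sensory input. Any ImageNet-trained CNN with its classifier
removed would serve the same purpose."""
import torch
from torchvision import models, transforms
from PIL import Image

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()        # drop the classifier head to expose the feature vector
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((150, 220)),       # image size used in the experiments
    transforms.ToTensor(),
])

image = Image.open("stimulus.jpg").convert("RGB")   # hypothetical stimulus image
with torch.no_grad():
    features = backbone(preprocess(image).unsqueeze(0)).squeeze(0)
print(features.shape)                    # 512-dimensional visual feature vector
```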
Together with the accuracy, a "coherency" metric is also considered. It represents the ability of the model to distinguish between positive, negative, and neutral emotional stimuli. Positive stimuli are considered as associated with happiness and surprise, while neutral and negative ones are associated, respectively, with the neutral class and with the other remaining emotions. We decided to train the learner by means of ensemble models [44], in order to acquire more stable predictions for the small dataset we constructed—about 612 samples. In Table 3, we show the model evaluation performances obtained by using Random Forest [45,46] and XGBoost [47]; we reached better accuracy and coherency with gradient boosting, whose confusion matrices are shown in Table 4 and Table 5. Contempt- and happiness-related stimuli are sometimes confused; the former is recognized as happiness, the latter as sadness. However, as can be seen in Table 5, the confusion regarding positive and negative emotional stimuli turns out to be acceptable for the experiments. The results of the emotional activity on never-seen episodes involve a reduction in accuracy and coherency due to the effect of memory. In fact, without the contribution of the Emotion memory, the evaluation and emotional activity tests provide the same results.
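The coherency metric, as defined above, can be sketched as follows; the grouping of classes into valences follows the text, while the function and variable names are illustrative.

```python
"""Illustrative coherency metric: a prediction counts as coherent when it
falls in the same valence group (positive, neutral, negative) as the true
emotional class."""
POSITIVE = {"happiness", "surprise"}
NEUTRAL = {"neutral"}

def valence(emotion: str) -> str:
    if emotion in POSITIVE:
        return "positive"
    if emotion in NEUTRAL:
        return "neutral"
    return "negative"

def coherency(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions whose valence matches the true valence."""
    matches = sum(valence(t) == valence(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

y_true = ["happiness", "disgust", "fear", "neutral", "surprise"]
y_pred = ["surprise", "sadness", "fear", "neutral", "happiness"]
print(f"coherency = {coherency(y_true, y_pred):.2f}")   # 1.00: every valence matches
```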
In Table 6, we show the emotional activity tests performed over episodes different from those used for training; the model was tested on a test set composed of the same samples used for the evaluation, but shuffled randomly. The contribution of the Emotion memory is noticeable; it determines a lowering of accuracy and coherency, causing a completely different emotional activity compared to the emotions predicted in the evaluation test. Even if the agent shows different emotions towards the same kind of stimuli, the relative frequency of the predicted emotions is approximately equal to the relative frequency obtained for the evaluation. For example, as can be seen in Table 6, even when the agent, during the emotional activity, has predicted different emotions with respect to the same visual stimuli provided for the evaluation test, it always shows approximately 23% Fear. The obtained results show the dependence of the emotional activity on the Emotion memory and on the order in which visual stimuli are provided to the model. The agent outputs emotions according to its past history and with an approximately constant relative frequency.
The results of the emotional activity as a function of the Emotion visual weight, shown in Figure 12, suggest that, when this weight tends toward 0.1, accuracy and coherency over never-seen episodes assume the same scores achieved for the evaluation test (85% accuracy and 92% coherency).
When the importance of Emotion cognitive instances decreases, the agent tends to ascribe less relevance to the current memory states for the emotional activity, predicting emotional classes according to the past state of the memory related to the training phase. During the emotional activity, when this weight decreases, the agent takes into account the emotional history to which it ascribed more importance in the past.
7. Discussion
Results show that the agent acquires the desired emotional experience with suitable accuracy and without the use of a neural network. The emotional behavior is consistent by virtue of the constant relative frequency of emotions; the content of cognitive memories affects the emotional output and, as a consequence of the tests conducted on episodes different from those learned in the training phase, the agent provides different emotional behavior with respect to different sequences of stimuli. The variation in the emotional behavior is represented by the lowering of accuracy with respect to the initial model evaluation. We have also shown that the agent, by lowering the importance of the current emotion, behaves as if the new sequence of stimuli was identical to the one learned in the past. Therefore, when the agent does not ascribe importance to its emotions, it starts assuming a behavior that does not depend on the current stimuli but on the most emotionally relevant past ones.
A comparison between our model and the related works presented in Section 2 reveals differences. The study conducted in [17] introduced a threshold concept that is similar to the one we defined in (4), but associates it with perception rather than with sensation; we have instead assumed this process to occur prior to the cognitive acquisition of the Perception cognitive level. In Reference [17], the concept of artificial perception is also conceived as subjected to the assessment of a level of cognition that provides meaning to sensory stimuli. On the contrary, in our info-structural model, we associate the level of cognition with that of consciousness, which we assume also takes into account the emotional and affective processing related to the perceived stimuli. We believe, in fact, that human cognition also depends on the emotional memory associated with those stimuli. Furthermore, our perception representation is closer to the concept of passive perception, since our model does not take into account the agent's actions towards the environment. Other studies on perception [18,19,20,21,22,23,24] have very few similarities with the present study, since they focus on particular characteristics by addressing problems that we have overlooked due to the generalization aims of the present study. In fact, our intention was to model the general functioning of perception in relation to the cognitive levels of Emotion and Affection. The study in [25] models a personality characteristics vector as a function of variables related to neuroticism, extraversion, openness, conscientiousness, and agreeableness, which are instantiated by answering personality tests. In our model, the traits of personality are determined by the emotional training occurring during the model evaluation theorized in Section 4.3 and described experimentally in Section 6. In Smart Sensing, an agent's personality is determined dynamically by the associations between the emotional classes and the perceptions of stimuli, with the addition of the related emotional memory content. This is, in fact, a pre-defined orientation through which we define the way the agent emotionally reacts to the environment. The approach in [26] regards an agent's behavioral adaptation as a function of the rewards received from the environment. In our framework, the emotional behavior does not depend on the actions the agent performs, since non-interacting emotional activity does not seem to depend on rewards. In fact, as shown in Section 6, our agent, like a human being, outputs negative emotions with respect to negative stimuli that, like the vision of someone sick, could not depend on its actions. Even though it will eventually be necessary to model an emotional mechanism based on the interaction with the environment, we think that it is more suitable to first build a cognitive model based on human behaviors that do not depend on rewards. Furthermore, we consider adaptation to be associated with the conception the agent has of good and evil, a topic we will deepen in the continuation of our studies regarding consciousness, in which we will investigate the level of human morality that does not depend on the actions performed in the past.
The experiment described in Section 6 can be placed in the scenario of research on so-called "artificial emotion" [48,49]—emotions "felt" by a machine. In this field, remarkable studies include those of [50], in which robots are provided with facial expressions based on the interaction with a human partner, and [51], which provides a general framework for designing emotions in autonomous agents. With the present study, our contribution allows an empirical and info-structural representation of artificial cognitive instances in terms of vectors. Machine emotions become "recognizable" inside an artificial agent and contribute to determining a form of emotional activity that depends on the upper cognitive levels—Sensation and Perception—and on cognitive memories. The present model acts regardless of agent goals and actions towards the environment, providing the agent with emotions even when it acts just as a "viewer". Smart Sensing could be used as an empirical hypothesis for inquiring into the way humans process information perceptively and emotionally, as well as for verifying the future emotional activities of subjects downstream of a specific history of stimuli. Smart Sensing represents the starting point of a new approach for implementing an artificial consciousness, which takes into account the Sensation, Perception, Emotion, and Affection cognitive levels as a function of time. In addition to demonstrating that the model succeeds in reproducing artificial emotional activity, providing good results with a limited amount of samples, this study provides a framework, never presented before and supported by the psychological and behavioral literature, which encourages the development of artificial intelligence systems through the observation of human sensations, perceptions, emotions, and affections. We argue that, at the current stage of research in artificial intelligence, it is no longer suitable to design cognitive systems that neglect the highlighted cognitive levels and the inter-functional relationships between them. As we have seen, emotion depends on the way the stimuli are perceived through sensory sources and on the level of affection related to the stimuli themselves; this statement is revealed to be true for human beings, thus testable by everyone, and should be true also for artificial "minds". The research community can contribute to the present research line by performing experiments that also take other sensory channels into account, e.g., auditory or tactile, and by expanding Smart Sensing with further cognitive levels. From an applicative point of view, through our implementation choices, it is possible to avoid the use of neural networks, which typically need large amounts of samples, when developing emotional sensor systems, e.g., agents with affective capacity. Future developments could consist in trying to turn the learner used to output emotional classes into a formal model that does not include any form of supervision, which is a challenging task. The solution to this last issue may involve the dependence of the Emotion cognitive level on a mechanism of consciousness capable of distinguishing emotionally positive stimuli from negative and neutral ones. Another future extension of this work is the support of cognitive instance classification with a facial expression recognition model, a smart way of providing the agent with empathic functionalities. The present work lays the foundations for the design of our idea of artificial consciousness, which will be addressed in a subsequent paper. There, we will present an info-structural model of cognition based on attention, awareness, and consciousness, which, as we will see, intrinsically depends on Smart Sensing.
We intend to underline the novelty of this study from a methodological point of view. It opens a unitary perspective regarding the interaction between all the cognitive levels reproducible in an artificial agent. This is a great challenge for the research community, since it also favors the convergence of the human sciences and the other branches of knowledge, which, in a logic of integration, can offer a renewed technological–scientific and humanistic path in the present moment of research, elaboration, and application. The present work intends to outline a research hypothesis and illustrate horizons according to an overall vision of the human-subject/artificial-agent relationship.