**1. Introduction**

The dual fears of identity theft and password hacking are now becoming a reality, and behavioral systems offer a promising method for preserving data securely. Systems that rely on user behavior are usually understood as behavioral systems, and behavioral traits are almost impossible to steal. Multiple commercial, civilian, and government entities have already started using behavioral biometrics to secure sensitive data. One of the major components of behavioral biometrics is the recognition of facial emotion and its intensity [1–3]. In industry and academic research, physiological traits have long been used for biometric identification. No level of biometrics can be performed without good sensors, and when it comes to facial emotion intensity recognition, apart from high-quality sensors (cameras), efficient algorithms are needed to recognize emotional intensity in real time. With the increased use of images over the past decade, automated facial analytics such as face detection, face recognition, and expression recognition along with its intensity have gained importance and are useful in security and forensics. Components such as behavior, voice, posture, vocal intensity, and the emotion intensity of the person depicting the emotion, when combined, help in measuring and recognizing various emotions.

Considering human facial images, as seen in Figure 1, recognizing the emotions and estimating their intensity are vital tasks. 3D facial images are the most thoroughly researched [4–7], and the available systems make predictions based on features extracted from the images for emotion and intensity recognition. The intensity of emotion plays a significant role in behavioral biometrics for future crime prediction systems. Intensity may be defined as "*the degree of manifestation along the dimension of behavior*" [3]. The intensity of emotion is often directly associated with the intensity of facial muscle movements; in turn, the intensity of muscle movement serves as an index of the intensity of the emotional state being experienced. Such intensities can be measured in both spontaneous and posed expressions, and they are affected by whether the emotion depicted is voluntary or involuntary. Spontaneous facial expressions of an emotion occur when a person displays an involuntary emotion, with no prior planning or intention. Posed facial expressions of emotions, on the other hand, are widely used in studies involving the intensity of facial emotions. The criterion for the accuracy of intensity detection of the five observed basic emotions (and a neutral expression) is based on the analysis of the facial behavior components that are relevant to emotional intensity communication [8]. This involves detecting the face and recognizing the intensity of the depicted emotion, both of which can be achieved using classifiers trained on a suitable dataset. Existing work gives a detailed survey of the algorithms used in this area [9–11]. Also, multiple intensities are reported as rank-one and rank-five recognition values, where rank one corresponds to the intensity measured at the highest accuracy level and rank five to the lowest. Algorithms used for feature extraction span from classical techniques such as principal component analysis (PCA) and linear discriminant analysis (LDA) to modern techniques such as machine learning (ML) and artificial neural networks (ANN) [10–16].
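
To make this classical pipeline concrete, below is a minimal sketch (not drawn from any of the cited works) of PCA-based feature extraction feeding an ML classifier with scikit-learn; the random arrays stand in for real flattened face crops and intensity labels.

```python
# Minimal sketch: PCA features feeding an SVM classifier for intensity ranks.
# Assumes X is an (n_samples, n_pixels) array of flattened grayscale face crops
# and y holds discrete intensity labels (e.g., 0 = low, 1 = medium, 2 = high).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((90, 64 * 64))        # placeholder for real face images
y = rng.integers(0, 3, size=90)      # placeholder intensity ranks

model = make_pipeline(
    StandardScaler(),                # normalize pixel features
    PCA(n_components=20),            # classical dimensionality reduction
    SVC(kernel="rbf"),               # downstream ML classifier
)
print(cross_val_score(model, X, y, cv=5).mean())
```

In practice, the PCA step could be swapped for LDA or a learned (ANN) feature extractor without changing the rest of the pipeline.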

**Figure 1.** Illustration of 5 Basic Emotions: Happy, Surprise, Neutral, Sad and Angry.

In recent years, major studies were carried out in the field of emotion intensity detection relating to three major areas. The first area relates to cross-cultural character, where studies concluded that cultures agree highly with one another in facial emotion identification [8,17,18]. These studies also revealed a cross-cultural agreement in the prediction of the intensity of two different expressions of the same emotion [19]. The second area brings forward research showing that there exist major gender differences in the ability to decode/predict nonverbal signs; in these studies, women have been better predictors of emotions than men [20–23]. In the third area, several studies have been conducted, based on the five basic emotions, to find major error patterns and effects of emotional intensity in conditions such as schizophrenia, autism, and borderline personality disorder [24–26].

Emotion recognition is being put to service in diverse real-life applications where a person's emotional state serves as a cue to the successful operation of these systems. Physically and mentally challenged people are often unable to show their emotions and require an alternate means of conveying their emotional state of mind. Autism spectrum disorders have been a core area of research in affective computing, and several research works have laid tangible emphasis on emotion recognition for such cases [27–29]. Patients seeking therapy [30,31] and counselling for depression [32], disorders of consciousness [33], and schizophrenia [34] have been a subject of interest in many explorations to unearth their curbed and concealed emotions. Information filtering systems, also referred to as recommender systems, likewise have an interest in detecting a person's emotional state. These systems predict a user's preference for a particular good or service and recommend the best possible option. Examples include health-related recommender systems such as emHealth [35], hospital recommender systems [36], multimedia recommender systems [37,38], and systems for movies [39,40], web search [41], and e-commerce [42,43]. A prominent use of emotion recognition is in marketing and automatic feedback collection. The concept, referred to as emotional marketing [44], exploits affective computing for decision support [45,46] and for product feedback assessment [47,48]. E-learning is also getting its feet wet by exploiting emotion recognition [49,50]: the emotional state of the learner may suggest a modification of the presentation style toward more interactive and effective tutoring. The use of emotion recognition is also becoming prevalent in gaming, where the user's emotional state governs their interactions. Exergame (or "gamercizing") frameworks [51,52] and other interactive and cloud-based games are becoming emotion-aware to provide a more immersive gaming experience [53,54]. Emotion recognition is also applicable in law, for detecting and perceiving the intentions of a suspect [55], in monitoring for risk prevention [56], and in smart health care [57].

This work primarily explores the use of popular ML algorithms to recognize the intensity of emotion in combination with popular feature extraction algorithms. Following is a brief explanation of the major contributions of this comparative study:


The paper includes an in-depth literature review discussing recent works in the area of facial emotion intensity recognition, presented in Section 2, which also covers the current challenges and motivation behind this research. Section 3 gives a brief insight into the experiment along with the techniques used, paying particular attention to face detection techniques and emotion recognition algorithms. The different correlations between Action Unit (AU) intensities and the method for calculating emotion intensities from facial expressions are described in Section 3.1. AUs can be understood as the specific parts (or units) of a face that come into action while a face depicts an emotion. This section also discusses the gold-standard categorization of AUs, popularly used in emotion detection and recognition systems. The accuracy achieved in the various experiments and the comparative study are presented in Section 4. Finally, the paper concludes with a discussion of insights gained from the experiments and future work in Section 5.

#### **2. Literature Review**

Facial emotion recognition has been used for a variety of applications, such as the identification of autism and schizophrenia, detection of drowsy drivers [59], identifying abnormalities in early stages of Alzheimer's disease or schizophrenia, and crime prediction systems. Before we compare the prevalent works, it is important to understand the datasets used to train the recognition algorithms in popular works. Several databases have been used; they are summarized in Table 1, which has been reproduced from [60]. It is also important to understand that several algorithms pose limitations. As an example, the Viola-Jones algorithm can be used along with Haar feature selection and the AdaBoost training algorithm for remarkable detection of the eye region and nose bridge region, with the limitation that it is only effective for frontal images and can hardly cope with a 45° face rotation [61].
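
As a concrete illustration, the following sketch runs Viola-Jones detection through OpenCV's pretrained frontal-face Haar cascade; the image path is a placeholder, and, per the limitation noted above, detection degrades well before a 45° rotation.

```python
# Sketch: Viola-Jones face detection with OpenCV's pretrained Haar cascade.
# "subject.jpg" is a hypothetical file; the cascade ships with opencv-python.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
image = cv2.imread("subject.jpg")             # assumes the file exists
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:                    # draw a box around each detection
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
print(f"Detected {len(faces)} face(s)")
```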


**Table 1.** Popular databases for measuring intensity of emotions (Reproduced from [60]). (Type: P: Only Posed, S: Only Spontaneous, SP: Spontaneous and Posed).

Furthermore, we studied various works where emotion recognition was reported as an aggregate score over all emotions or expressions and summarized our findings in Table 2. We also studied works that attempt to measure the emotional intensity of AUs and, therefore, of facial emotion. Here, we look in more detail at some of the works that reported accuracies for individual emotion detection. Jens et al. conducted an experiment to find expression leakage in relation to emotional intensity, using a database created by capturing images of 21 participants [68]. The study was based on the three emotions of happiness, sadness, and fear. The results showed that there was facial expression leakage between the posed and genuine emotions. These *leaks* were measured by finding the intensity differences between the emotions. The participants were shown video footage and were told to enact genuine and posed emotions for the experiments. The study was limited to participants from an academic background with an average age of 19.4 years.


**Table 2.** Comparative study of accuracy achieved for intensity of emotions.

\* "Low" accuracy indicates an accuracy too low for real-world applications.

Hess et al. measured the intensity of emotional facial expressions for the emotions of happiness, sadness, anger, and disgust, posed by two men and women [83]. The expressions were depicted at six levels of emotional intensity. The accuracies achieved were 85.3% (anger), 79.3% (disgust), 97.9% (sadness), and 91.8% (happiness). The method included collecting the colored images and pre-processing them into high-quality grayscale images. The study was conducted using a set of five pre-defined expression intensities: 20%, 40%, 60%, 80%, and 100%. The authors concluded the study by providing evidence that the perceived intensity of the underlying emotion varied linearly with the physical intensity of the expressions.
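
A common way to produce such graded stimuli is to interpolate between a neutral image and the apex expression. The sketch below is a crude linear-blending stand-in for the morphing software typically used in such studies (not Hess et al.'s actual procedure); the file names are hypothetical, and both images are assumed aligned and equally sized.

```python
# Sketch: graded-intensity stimuli via linear blending of a neutral face
# with the same actor's apex expression, after grayscale conversion.
import cv2

neutral = cv2.cvtColor(cv2.imread("neutral.jpg"), cv2.COLOR_BGR2GRAY)
apex = cv2.cvtColor(cv2.imread("apex.jpg"), cv2.COLOR_BGR2GRAY)

for level in (0.2, 0.4, 0.6, 0.8, 1.0):   # the 20%..100% steps used above
    blend = cv2.addWeighted(apex, level, neutral, 1.0 - level, 0)
    cv2.imwrite(f"intensity_{int(level * 100)}.png", blend)
```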

Biele et al. performed an experiment in which the intensity of happiness and anger was measured using static and dynamic images (animations) [23]. Anger was perceived as more intense than extreme happiness. Also, the intensity ratings for dynamic images were higher than those for static images. Gender differences also had an impact on the results; e.g., a difference in intensity was observed for anger among males and for both emotions among females. They used a 2 (Subject Sex) × 2 (Actor Sex) × 2 (Emotion) × 2 (Stimuli Dynamics) ANOVA for analyzing the intensity of emotions. The work stressed the need for a new methodology for measuring the intensity of emotions, to gain better insight into how men and women process different emotional information.
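
For illustration, a factorial ANOVA of this shape can be run with statsmodels as sketched below; the column names and synthetic ratings are assumptions for the sake of a runnable example, not Biele et al.'s data.

```python
# Sketch: a 2 x 2 x 2 x 2 factorial ANOVA on intensity ratings.
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
cells = list(itertools.product(["M", "F"], ["M", "F"],
                               ["happy", "angry"], ["static", "dynamic"]))
# Three hypothetical ratings per cell so the full factorial model keeps residual df.
rows = [(*c, int(rng.integers(1, 8))) for c in cells for _ in range(3)]
df = pd.DataFrame(rows, columns=["subject_sex", "actor_sex",
                                 "emotion", "dynamics", "intensity"])
model = smf.ols("intensity ~ C(subject_sex) * C(actor_sex)"
                " * C(emotion) * C(dynamics)", data=df).fit()
print(anova_lm(model, typ=2))   # F-tests for main effects and interactions
```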

Another work examined perceived emotion by measuring the intensities of the emotions [84]. The basis of this analysis relied entirely on the fact that facial expressions convey rich information about the state of mind. Researchers have argued that neuromotor programs of facial expression patterns might serve as the building blocks of emotion communication [84–87]. The interaction pattern was observed behaviorally through emotion intensity ratings, and neurally through functional magnetic resonance imaging activation in the amygdala (a roughly almond-shaped mass of gray matter inside each cerebral hemisphere, involved in the experiencing of emotions), as well as the fusiform and medial prefrontal cortices. However, this behavior was observed only for mild-intensity expressions.

Delannoy et al.'s experiments were based on an image-based approach for three ranks of intensity: low, medium, and high [88]. They trained one-against-all SVM classifiers, using only 68 images from 10 subjects in the CK+ dataset with ten-fold cross-validation. However, the drawback of their approach was that the multi-class classifier setup assumes the intensity rankings are independent of each other; hence, the relation between labels was not exploited for performance enhancement. Furthermore, even with the classification of expressions into three intensity rankings, the AUs must still be extracted from the image sequence as a feature representation.
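
A minimal sketch of such a one-against-all SVM setup with ten-fold cross-validation is shown below, using scikit-learn; the random feature matrix and labels merely stand in for the features Delannoy et al. extracted. Note how the one-vs-rest wrapper treats each rank as an independent binary problem, which is exactly the drawback discussed above.

```python
# Sketch: one-against-all SVMs over three intensity ranks, 10-fold CV.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((68, 50))            # 68 samples, 50 features (placeholders)
y = rng.integers(0, 3, size=68)     # 0 = low, 1 = medium, 2 = high

clf = OneVsRestClassifier(SVC(kernel="linear"))   # ranks treated independently
scores = cross_val_score(clf, X, y, cv=10)
print(f"mean 10-fold accuracy: {scores.mean():.2f}")
```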

Several other researchers performed experiments to achieve more accurate expression intensity estimation [78,89–91]. These works employed training data with known intensity labels, categorized into two types: discrete rank representations [88,89] and continuous value representations [92]. Once the label intensities were employed, the degree of expression intensity was predicted and validated more accurately. The drawback of these approaches was that multiple images or an image sequence were required to estimate the intensity of emotion; moreover, sufficient images were required for estimation in a sequence-based approach.

Littlewort et al. conducted their experiment by estimating the intensity rankings of facial expressions using SVMs [93]. The authors used the distances to the SVM hyperplanes to estimate the rankings of emotion intensities. Chang et al. applied a manifold learning approach for recognizing facial expressions and used distances on the manifold to determine the intensities of the emotions [94]. However, the distances to the classification boundaries in the feature space may not necessarily reflect how neutral or strong an expression is and hence could yield inaccurate results.
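
The idea of reading intensity off the margin can be sketched with scikit-learn's `decision_function`, which returns the signed distance to the separating hyperplane; the data here are placeholders, and, per the caveat above, that distance need not track perceived intensity.

```python
# Sketch: signed distance to an SVM hyperplane as a proxy for intensity.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((100, 30))          # placeholder expression features
y_train = rng.integers(0, 2, size=100)   # 0 = neutral, 1 = expressive

svm = SVC(kernel="linear").fit(X_train, y_train)
X_test = rng.random((5, 30))
distances = svm.decision_function(X_test)   # larger margin -> "stronger" expression
print(distances)
```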

Rudovic et al. proposed a method using the intrinsic topology of multidimensional continuous facial affect data. The data were first modeled on an ordinal manifold, whose topology was then used to constrain the parameters of a Hidden Conditional Ordinal Random Field (H-CORF) for dynamic ordinal regression [95]. The resulting model attained simultaneous dynamic recognition and intensity estimation of facial expressions of multiple emotions. The method was applied to both posed and spontaneous expressions, and the model was tested on databases such as BU-4DFE, CK, and CK+. Taken together, this body of work on previously applied techniques provides evidence that more research is required for the accurate detection of emotions and their intensities.

#### *Current Challenges and Motivation*

Even with higher accuracies of emotional intensity measurement, practical real-time systems face many problems. These problems are primarily related to time resolution, low-level emotion recognition (facial expressions captured at low peak frequencies), scarcity of available databases for research on intensity measurement, face and angle variation, illumination variation, and non-alignment of faces. Research on facial emotion recognition is divided into two categories: (i) image-based and (ii) video-based. Image-based recognition uses static frames, while video-based methods use dynamic frames. Much work is needed to address the problems mentioned above. Low-level emotion recognition is one such problem, since measuring the facial intensity of emotion is directly relevant to human–machine interaction (HMI), where robots can interact more naturally if they know the intensity of the emotion of the person they are communicating with. Security, surveillance, biometrics, and patient monitoring are a few other areas that have not been explored much. The majority of the work concentrates on emotions depicted at the peak level of intensity, neglecting emotions depicted at lower intensity levels, as they are comparatively difficult to recognize. Measuring low-level frame intensity is a challenge because the available databases lack the discriminating features.

Comprehensive standards were designed by Paul Ekman and Wallace Friesen to subdivide each emotion into several special AUs; this scheme is popularly known as the Facial Action Coding System (FACS). These categorizations are considered a gold standard for emotion detection and recognition systems. An insight into the relevant literature reveals that examination of the accuracy of intensity predictions of the five basic emotions from spontaneous/posed facial expressions has been largely ignored. Moreover, attention has primarily been paid to the factors that affect the perception of the category of emotion (the kind that leads to expression analysis such as happiness/sadness/fear/anger), neglecting the intensity. Naturally, this does not mean that research on the accuracy of intensity prediction is less critical. With advances in technology, providing an input image and classifying it into one of the six emotions (five basic, and one neutral) is proving to be insufficient. Hence, more attention needs to be given to the intensity of the five basic emotions to judge and understand human emotions for HMI, medical, and biometric applications. Over the years, psychological research has shown that understanding the dynamics of expressions is equally essential for understanding human emotion; in other words, emotion dynamics is primarily related to emotion intensity variations in the spatial and temporal domains. The future in which robots successfully understand human emotions through intensity is still far away, since much work remains to be done. This underlying problem was the primary motivation to study AUs and their intensities to address these futuristic issues.
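
To illustrate what FACS-style AU categorization looks like in practice, below is a hypothetical mapping from four of the basic emotions to prototypical (EMFACS-style) AU combinations, with a toy matcher; exact AU sets vary across sources, so these are illustrative examples rather than a normative coding.

```python
# Illustrative mapping from basic emotions to prototypical AU combinations.
PROTOTYPICAL_AUS = {
    "happiness": {6, 12},        # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},     # inner brow raiser + brow lowerer + lip corner depressor
    "surprise":  {1, 2, 5, 26},  # brow raisers + upper lid raiser + jaw drop
    "anger":     {4, 5, 7, 23},  # brow lowerer + lid/lip tighteners
}

def match_emotion(active_aus: set[int]) -> str:
    """Return the emotion whose prototypical AU set overlaps most with the input."""
    return max(PROTOTYPICAL_AUS, key=lambda e: len(PROTOTYPICAL_AUS[e] & active_aus))

print(match_emotion({6, 12, 25}))  # -> "happiness"
```

In a full system, each detected AU would additionally carry a FACS intensity grade (A–E), and it is precisely these grades that the intensity-estimation work surveyed above tries to recover automatically.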
