**4. Results**

#### *4.1. Automatic ROI Location*

The performance of the proposed method, which combines trained-expert criteria and error probability to accurately detect face landmarks, was validated on 11 children using two sets of 220 thermal frames annotated by the trained expert, one set per condition: (1) Baseline and (2) Test (see Section 3.1). Figure 5a,b shows that our proposal significantly improved the ROI placements with respect to the trained-expert criteria. For both conditions, the proposed method closely agreed with the trained expert (errors lower than 10 pixels), even though the children tended to make abrupt face movements during the second condition, possibly because they were surprised by the uncovered robot. In contrast, the Viola-Jones algorithm, unassisted by the error probability, significantly disagreed with the trained expert, as shown in Figure 5a,b. Notice that Viola-Jones presented the highest error when locating both **R**10 and **R**11, which correspond to the right and left chin sides, respectively. These undesirable errors may have been caused by mouth movements during talking or by facial expressions, such as happiness and surprise. As a highlight, our proposal takes into account the best-located ROI, **R**1 or **R**5 for Baseline and **R**3 for Test, which reduced the location error as much as possible, as shown in Figure 5a,b, respectively. It is worth mentioning that our approach for ROI relocation uses the error probability to select (taking into account the manual annotations of a trained expert) the ROI best located during the first stage of automatic ROI location by Viola-Jones. This selected ROI is then used to relocate the neighboring ROIs. Therefore, the first stage, in which Viola-Jones detects the three initial ROIs, is decisive.
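To make the relocation step concrete, the following is a minimal Python sketch of the idea, assuming hypothetical ROI names, offsets, and error probabilities (the paper does not publish an implementation): the ROI with the lowest empirical error probability, estimated offline against the trained-expert annotations, is used as an anchor, and the neighboring ROIs are re-derived from its position.

```python
def relocate_rois(detected, error_prob, offsets):
    """Relocate face ROIs around the most reliable Viola-Jones detection.

    detected   : dict mapping ROI name -> (x, y) centre from Viola-Jones
    error_prob : dict mapping ROI name -> empirical error probability,
                 estimated offline against trained-expert annotations
    offsets    : dict mapping ROI name -> (dx, dy) displacement from the
                 anchor ROI (hypothetical geometric face model)
    """
    # Anchor on the ROI with the lowest expected location error,
    # e.g. R1/R5 for Baseline or R3 for Test in the paper's data.
    anchor = min(error_prob, key=error_prob.get)
    ax, ay = detected[anchor]

    # Re-derive every ROI centre from the anchor position.
    return {roi: (ax + dx, ay + dy) for roi, (dx, dy) in offsets.items()}

# Example with made-up coordinates, probabilities, and offsets:
detected = {"R1": (120, 40), "R5": (128, 95), "R10": (98, 150)}
error_prob = {"R1": 0.08, "R5": 0.05, "R10": 0.31}
offsets = {"R1": (-8, -55), "R5": (0, 0), "R10": (-30, 55)}
print(relocate_rois(detected, error_prob, offsets))
```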

Figure 6 shows, for one child, the ROI placement using the automated Viola-Jones algorithm (green), the relocation algorithm (blue), and the manual placement (yellow). In Refs. [17,21], the authors demonstrated that the face regions (see Figure 4) linked to the set of branches and sub-branches of vessels that innervate the face are decisive for studying emotions, as the skin temperature variations over these regions can be measured by IRTI. Figure 6 shows that our proposal based on error probability can be used to supervise automatic methods for ROI location, in order to improve the appearance features extracted over the detected ROIs and, consequently, increase the performance of emotion recognition.

**Figure 5.** Comparison between Viola-Jones (without applying ROI relocations) and Viola-Jones applying ROI relocations, computing the mean and standard error per ROI: (**a**) analysis from Baseline; (**b**) analysis from Test. n.s. means no significant difference (*p* > 0.05), while \* (*p* < 0.05), \*\* (*p* < 0.01), \*\*\* (*p* < 0.001) and \*\*\*\* (*p* < 0.0001) indicate significant differences. **R**1, right forehead side; **R**2, left forehead side; **R**3, right periorbital side; **R**4, left periorbital side; **R**5, tip of nose; **R**6, right cheek; **R**7, left cheek; **R**8, right perinasal side; **R**9, left perinasal side; **R**10, right chin side; **R**11, left chin side.

**Figure 6.** Comparison between the manual placement (in yellow, used as reference), the Viola-Jones algorithm (green), and Viola-Jones with our relocation algorithm (blue). As shown, our relocation algorithm obtained better results than the Viola-Jones algorithm alone.

#### *4.2. Emotion Recognition*

A database of 28 typically developing children was used to analyze the performance of the proposed system for recognizing five emotions [17]. This database contains appearance feature vectors that were computed over eleven face ROIs (see Figure 4), whose correct placements were carefully verified by a trained expert. The effectiveness of PCA combined with several classifiers for five-emotion classification was then evaluated, PCA being a popular adaptive transform that, under controlled head-pose and imaging conditions, can capture features of expressions efficiently [9]. Figure 7a shows the average performance obtained by applying PCA with different numbers of principal components plus an LDA classifier based on a full covariance matrix, with 60 components being sufficient for recognizing the five emotions. Notice that no significant difference (*p* > 0.05) was achieved using more than 60 components. Similarly, no significant difference was obtained using PCA with 60 components plus other classifiers, such as LDA using full (Linear) or diagonal (Diag Linear) covariance matrices, and Linear SVM, as shown in Figure 7b. For instance, LDA using full and diagonal covariance matrices achieved Kappa values of 81.84 ± 1.72% and 77.99 ± 1.74%, respectively, and Linear SVM achieved 78.58 ± 1.83%. LDA based on a full covariance matrix has a low computational cost and can be easily embedded into the N-MARIA hardware for on-line emotion recognition during child-robot interaction. Moreover, PCA and LDA have been successfully used in other similar studies [16,17,20,23,36], achieving promising results. We therefore selected PCA with 60 principal components plus LDA based on a full covariance matrix as the best setup for recognizing the five emotions.
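A minimal sketch of this pipeline in Python with scikit-learn is shown below. The synthetic feature matrix is only a stand-in for the appearance feature vectors; the component count and the full-covariance LDA (scikit-learn's default `svd` solver) follow the setup selected above.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the appearance feature vectors (5 emotion classes).
X, y = make_classification(n_samples=600, n_features=200, n_informative=80,
                           n_classes=5, random_state=0)

# PCA to 60 components, then LDA with a full covariance model.
clf = make_pipeline(PCA(n_components=60),
                    LinearDiscriminantAnalysis(solver="svd"))
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f}")
```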

**Figure 7.** Performance of classification of five emotions applying principal component analysis (PCA) for dimensionality reduction. (**a**) Average accuracy and Kappa obtained by applying PCA for different principal components plus Linear Discriminant Analysis for classification; (**b**) average Kappa achieved for 60 principal components plus other classifiers.

Consequently, a comparison between PCA using 60 components and FNCA was carried out to recognize the five emotions of the 28 typically developing children, using LDA based on a full covariance matrix. Figure 8 and Tables 3 and 4 show that the PCA and FNCA methods are interchangeable for dimensionality reduction, achieving mean ACC values of 85.29% and 85.75%, respectively.
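The comparison can be reproduced in spirit with the sketch below, which swaps the PCA stage for a neighbourhood component analysis stage. Note that scikit-learn ships `NeighborhoodComponentsAnalysis` (plain NCA) rather than the FNCA variant used in the paper, so it is only a stand-in under that assumption.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline

# Same synthetic stand-in data as in the previous sketch.
X, y = make_classification(n_samples=600, n_features=200, n_informative=80,
                           n_classes=5, random_state=0)

reducers = {
    "PCA": PCA(n_components=60),
    # NCA as a stand-in for FNCA; same downstream full-covariance LDA.
    "NCA": NeighborhoodComponentsAnalysis(n_components=60, random_state=0),
}
for name, reducer in reducers.items():
    pipe = make_pipeline(reducer, LinearDiscriminantAnalysis(solver="svd"))
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name}+LDA: mean accuracy = {acc:.3f}")
```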

**Table 3.** Performance of the system for recognizing five emotions using FNCA plus LDA based on a full covariance matrix.


ACC, accuracy; TPR, true positive rate; FPR, false positive rate.

**Figure 8.** Performance of classification of five emotions applying principal component analysis (PCA) and Fast Neighbourhood Component Analysis (FNCA) for dimensionality reduction.

**Table 4.** Performance of the proposed system for recognizing five emotions using PCA with 60 components plus LDA based on a full covariance matrix.


ACC, accuracy; TPR, true positive rate; FPR, false positive rate.

It is worth noting that PCA improved the performance for subject S06 (ACC of 83.27% and Kappa of 78.10% for PCA, versus ACC of 73.11% and Kappa of 65.09% for FNCA), as shown in Figure 8a,b. This improvement may be a consequence of the robustness of PCA for feature extraction under controlled head-pose and imaging conditions [9]. Also notice that the lowest performance was achieved for subject S22, with both PCA (ACC of 49.27% and Kappa of 36.75%) and FNCA (ACC of 49.98% and Kappa of 37.78%). However, ACC values higher than 85.74% were achieved for a total of 14 children by applying PCA or FNCA, with the highest ACC of 99.78% obtained for subject S26 using PCA. Labelling feature vectors of emotion patterns is challenging, mainly when labels are assigned over periods of time, as there may be delays between perception and manual label assignment. Similarly, a visual stimulus may produce different facial emotions, so the manual labelling procedure is subjective. An automatic method for feature-set labelling may therefore be suitable to obtain reliable labels and, in turn, a more reliable classification model for LDA.
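For reference, the per-subject ACC and Kappa values quoted above can be obtained from true and predicted label sequences as in this short sketch (the label arrays are made-up examples, not the study's data):

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical per-subject label sequences (ground truth vs. predictions).
y_true = ["happiness", "surprise", "fear", "sadness", "disgust", "happiness"]
y_pred = ["happiness", "surprise", "fear", "happiness", "disgust", "happiness"]

# ACC is the fraction of correct decisions; Kappa corrects it for chance.
print(f"ACC   = {accuracy_score(y_true, y_pred):.2%}")
print(f"Kappa = {cohen_kappa_score(y_true, y_pred):.2%}")
```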

On the other hand, Figure 8c shows that four emotions (disgust, fear, happiness, and surprise) were successfully recognized by applying both PCA and FNCA, achieving TPR > 85%, whereas sadness was classified with less sensitivity (TPR of 74.69% and 76.15% for FNCA and PCA, respectively). As a highlight, low FPR values (≤5.40%) were obtained for all studied emotions. In general, the results show that PCA and FNCA are interchangeable for obtaining the features that increase emotion discrimination by discovering the regions that are informative in terms of expressions [9]. Similar to other studies [52], the sadness emotion was recognized less accurately (ACC < 77%) than the other emotions, as each child expressed sadness in a different manner, producing a large intra-class variability for sadness.
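The per-emotion TPR and FPR values reported above follow directly from the confusion matrix; a minimal sketch, again with made-up labels for illustration:

```python
from sklearn.metrics import confusion_matrix

labels = ["disgust", "fear", "happiness", "sadness", "surprise"]
# Hypothetical decisions, only to exercise the computation.
y_true = ["disgust", "fear", "happiness", "sadness", "surprise", "sadness"]
y_pred = ["disgust", "fear", "happiness", "sadness", "surprise", "happiness"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
for i, emotion in enumerate(labels):
    tp = cm[i, i]
    fn = cm[i, :].sum() - tp   # missed instances of this emotion
    fp = cm[:, i].sum() - tp   # other emotions mistaken for it
    tn = cm.sum() - tp - fn - fp
    tpr = tp / (tp + fn)       # per-emotion sensitivity
    fpr = fp / (fp + tn)
    print(f"{emotion:>9}: TPR = {tpr:.2%}, FPR = {fpr:.2%}")
```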

Many researchers [1] have conducted studies to infer facial emotions from visual images using only three of the six basic expressions, specifically happiness, surprise, and sadness, as these emotions are easier for humans to recognize and have been successfully used to generate cohesive expressions across participants. Similarly, other studies using visual images report that the recognition rates for expressions such as fear or disgust can be very low, in the range of 40 to 50%. As a highlight, five basic emotions were recognized by our proposed system based on an infrared camera, achieving the highest performance for disgust (TPR of 89.93%) and fear (TPR of 88.22%), followed by surprise (TPR of 87.14%) and happiness (TPR of 86.36%), as shown in Figure 8c and Table 4. In agreement with other studies [17,21], our results show that thermal images are promising for facial emotion analysis and recognition.

Similarly, this database was used as the training set for our recognition system to infer the emotions of 17 children, who first remained in front of the covered N-MARIA for a period of time, after which the robot was uncovered. Figures 9a–d and 10a,b show the emotions inferred by our system for the seventeen children during the Baseline and Test experiments, with happiness and surprise being the most frequent outputs. Emotions such as disgust, fear, and sadness were rarely inferred by the system, which agrees with what the children reported, as shown in Figure 10c.

**Figure 9.** Automatic emotion recognition over unknown patterns. (**a**,**b**) are inferred emotions obtained during the Baseline phase of the experiment; (**c**,**d**) are emotion decisions achieved for the Test phase of the experiment.

**Figure 10.** Summary of children's emotions before and after they saw the robot for the first time. (**a**,**b**) emotions inferred by the recognition system; (**c**) emotions reported by the children.

Figure 10c shows that more children reported a neutral emotion while N-MARIA was covered, whereas they felt surprise and happiness upon seeing N-MARIA for the first time. Accordingly, happiness and surprise were inferred slightly more often after the robot was uncovered for the children, which also agreed with what the children reported. It is worth noting that emotions may vary among children for the same visual stimulus, such as a social robot. For this reason, a single evaluation indicator is not sufficient to accurately interpret the responses of children while they interact with a robot. Hence, a questionnaire was answered by each child, reporting their emotions during both Baseline and Test (see Section 3.1). However, self-assessments have problems with validity and corroboration, as described in Ref. [1], since participants may report differently from how they are actually thinking or feeling. It is therefore not trivial to attribute the children's responses to their true behaviours. Similarly, the emotion produced by each child may be conditioned by knowing that they are being observed by the robot, a parent, and the specialist conducting the experiment. Also notice that an existing database of 28 typically developing children (age: 7–11 years) was used in our study to train the proposed system to infer one of five basic emotions (disgust, fear, happiness, sadness, surprise) produced by the 17 children during the interaction with N-MARIA. However, visual and thermal image processing may be affected by the quality of the input image, which depends on the illumination conditions and the distance between child and robot during the interaction.
