1. Introduction
Autism spectrum disorder (ASD) is a lifelong neurodevelopmental disorder that occurs to different degrees and in a variety of forms [1]. The fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) states that persons with ASD show deficits in social-emotional reciprocity, ranging from an abnormal social approach and failure of normal back-and-forth conversation to a total lack of initiation of social interaction [2]. The social communication challenge faced by individuals with autism is rooted partially in emotion recognition (ER) deficits that underlie the difficulties in processing and interpreting socio-emotional cues [3]. Individuals with autism demonstrate impairment in social cognition that includes the identification of facial expressions, face recognition, discrimination of faces, and memory for faces. As a result, individuals with autism often demonstrate increased stress and anxiety, abnormal perception of faces, and impaired processing of emotions [4].
This paper is part of the work of the Erasmus+ project EMBOA: Affective loop in Socially Assistive Robotics as an intervention for Children with Autism (https://emboa.eu/, accessed on 20 December 2021). The project aims to implement, evaluate, and develop guidelines on the feasibility of applying emotion recognition technologies in robot-supported intervention for children with autism in order to create an affective loop in child–robot interactions. The project combines three domains: autism therapy, social robots, and automatic emotion recognition [5]. Against this background, in this paper we focus on automatic emotion recognition applied to children with autism, not only in the context of child–robot interaction applications, but more generally with robots' perception of emotions in mind.
The purpose of this paper is to report the results of a systematic literature review aimed at exploring the state of the art in automatic emotion recognition technologies applied to recognize the emotions of children with autism. To be more precise, the field of interest covers studies that show how to automatically recognize emotions felt by autistic children, not the capacity of children to recognize emotions in others. Following [6], we understand automatic emotion recognition as an interdisciplinary research field that deals with the algorithmic detection of human affect, e.g., anger or sadness, from a variety of sources, such as speech or facial gestures.
There are three studies (literature reviews) that we follow in our study [7,8,9]. In the study by Kowallik A. E. and Schweinberger S. R. [7], the authors review papers related to sensor-based social information processing. They focus on studies that use sensors to identify (diagnose) autism and to support intervention. The study does not focus specifically on emotion recognition (only three of the mentioned intervention papers are related to emotion recognition), although the listed modalities are the ones used in automatic recognition as well. In the study by Chaidi I. and Drigas A. [8], the authors present a literature review on both the expression and understanding of emotions in autism. They refer to the perception of emotions by children with autism rather than to the recognition of the emotions of the children. The study by Rashidan et al. [9] focuses on emotion recognition applied to children with autism. It raises research questions regarding the stimuli used and the method of feature extraction, and might be considered complementary to ours. Moreover, these reviews do not report challenges and recommendations for using emotion recognition technologies, while this is an important part of our study. In our paper, whenever we refer to emotion recognition, we mean the automatic kind, applied to recognizing emotion in children.
This paper presents a systematic literature review (SLR [10,11]) of automatic emotion recognition in children with autism. The paper uses the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) [12] standard for reporting the study and is organized as follows.
Section 2 describes the research methods used and the systematic literature review execution.
Section 3 reports the quantitative and qualitative results. The results are followed by a discussion of research validity and an outline of challenges that future works might address.
2. Methods
A systematic literature review was used in the study as a methodological approach for capturing the state of the art in the domain of interest. The systematic method was chosen because the study aimed at finding key studies and performing the review with the transparency and rigour that would allow replication [10,11]. According to the PRISMA approach, the following steps were performed: setting up the research questions; defining the keywords, search string, and inclusion and exclusion criteria; deciding on the search engines; performing the data extraction; a multiple-phase selection based on quality criteria and research questions; the final selection of papers and the snowballing technique; and the extraction of the key findings. The design of the study is described in detail in Sections 2.1–2.5.
2.1. Research Questions
In the study, we aimed at the identification of the key previous studies, covering all aspects, technical and psychological, related to automatic emotion recognition applied to children with autism. We arrived at four research questions:
RQ1: What emotions are recognized in children with autism in these studies?
RQ2: Which observation channels are used in emotion recognition in children with autism?
RQ3: Which techniques are used in emotion recognition in children with autism?
RQ4: What techniques are used for multimodal recognition?
Regarding research question 1, the question covers the issue of which distinct emotions are automatically tracked in autism-related studies. The question also covers how emotions are represented for recognition purposes: whether they are yes–no labels or have a scale assigned. There are several popular models for the representation of emotions in affective computing [13]. The first is Ekman's model of basic emotions (happiness, anger, fear, sadness, surprise, and disgust), sometimes expanded with a neutral state if none of the six occur. In this model, the emotions can be treated as discrete (yes–no) or continuous [14,15]. Another popular model is the valence–arousal dimensional model of emotions, which represents an emotional state as a point in the plane of valence (positive–negative scale) and arousal (active–passive scale) [16]. In answering research question 1, we want to determine whether those models are used or what alternatives are proposed in studies involving children with autism.
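The two representations described above can be sketched as simple data structures. The coordinate values and intensity below are illustrative placements only, not taken from any reviewed study:

```python
from dataclasses import dataclass

# Discrete representation: one of Ekman's six basic emotions (plus neutral),
# optionally with a continuous intensity in [0, 1] instead of a yes-no label.
EKMAN_LABELS = ["happiness", "anger", "fear", "sadness",
                "surprise", "disgust", "neutral"]

@dataclass
class DiscreteEmotion:
    label: str              # one of EKMAN_LABELS
    intensity: float = 1.0  # 1.0 corresponds to a plain yes-no label

# Dimensional representation: a point in the valence-arousal plane, with
# valence in [-1, 1] (negative..positive) and arousal in [-1, 1]
# (passive..active).
@dataclass
class DimensionalEmotion:
    valence: float
    arousal: float

# Example: "anger" as a discrete label vs. an illustrative point in the
# negative-valence, high-arousal quadrant of the plane.
anger_discrete = DiscreteEmotion("anger", intensity=0.8)
anger_dimensional = DimensionalEmotion(valence=-0.7, arousal=0.8)
```

The discrete form supports straightforward classification, while the dimensional form allows regression over two continuous scales; a study may use either, or map between them.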
Regarding RQ2, by observation channel we mean a type of signal holding information on observable symptoms of emotional state that was used for emotion recognition. From the observation channels, one might extract modalities, e.g., facial expressions, body posture, skin conductance, or eye fixation areas. Modality is a type of information on a specific observable symptom extracted from the signal that is further analyzed to estimate an emotional state. Please note that one recorded channel might bring several modalities to analyze (e.g., both facial expressions and voice might be extracted from the video channel). A single modality might be obtained from multiple channels (e.g., facial expressions might be obtained from video channels or electromyographic sensors placed on the face). The question addressed in this paper is which channels and which modalities are used in the automatic recognition of emotions in children with autism.
Regarding RQ3 and RQ4, by technique we mean a technical data-processing method for the extraction of emotional state. One might find machine-learning algorithms, such as support vector machine (SVM) or neural networks, among frequent techniques for emotion recognition. Research question RQ4 was added because multimodal emotion recognition is frequently used to obtain more reliable and accurate results, and diverse fusion/integration mechanisms are applied. Therefore, in this study, we want to identify the recognition and fusion methods used in emotion recognition in children with autism.
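As an illustration of one common fusion mechanism, the sketch below implements decision-level (late) fusion: each modality's classifier outputs class probabilities independently, and the fused estimate is a weighted average. The modality names and probability values are hypothetical:

```python
def late_fusion(per_modality_probs, weights=None):
    """Average per-class probabilities across modalities.

    per_modality_probs: dict mapping modality name -> dict of class -> prob.
    weights: optional dict of modality -> weight (defaults to equal weights).
    """
    modalities = list(per_modality_probs)
    if weights is None:
        weights = {m: 1.0 for m in modalities}
    total = sum(weights[m] for m in modalities)
    # Collect the union of classes reported by any modality.
    classes = set()
    for probs in per_modality_probs.values():
        classes.update(probs)
    # Weighted average; a missing class contributes probability 0.
    fused = {c: sum(weights[m] * per_modality_probs[m].get(c, 0.0)
                    for m in modalities) / total
             for c in classes}
    return fused

# Hypothetical per-modality classifier outputs for one observation:
probs = {
    "face":  {"happiness": 0.7, "neutral": 0.3},
    "voice": {"happiness": 0.5, "neutral": 0.5},
}
fused = late_fusion(probs)          # {"happiness": 0.6, "neutral": 0.4}
best = max(fused, key=fused.get)    # "happiness"
```

Early (feature-level) fusion, by contrast, concatenates per-modality features before a single classifier; which strategy the reviewed studies use is exactly what RQ4 asks.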
2.2. Keywords
The keywords defined were grouped into the cluster related to emotions (emotion; affective; emotional; mood; affect; expression), related to children (children; child; young), and related to autism (autism; ASD; autism spectrum disorder; autism spectrum; ASC; autism spectrum condition; autistic; pervasive disorder). The final search query appears as follows:
(emotion OR affective OR emotional OR
mood OR affect OR expression)
AND (children OR child OR young)
AND (autism OR ASD OR ASC OR autistic
OR "pervasive disorder")
Please note that with the keyword "autism" we also cover the phrases "autism spectrum disorder", "autism spectrum", and "autism spectrum condition".
2.3. Inclusion/Exclusion Criteria
We settled on including in the SLR only original research and review papers written in English and published in journals or conference proceedings. For databases where no language filter was available, we agreed to exclude papers written in other languages at a later phase of the SLR. Since emotion recognition in children with ASD is relatively new, and since we are interested in all psychological discoveries related to emotions in autism, no constraints on the publication date were made. We also decided to exclude short communications. These are our report eligibility criteria.
In addition to research questions and report eligibility criteria, we defined study eligibility criteria for studies to be included in the further stage of the SLR. We required that a paper concerns emotion recognition, and that there were children with ASD involved in the study. However, we also agreed to include some papers that do not satisfy these criteria (in the qualitative analysis only) if they brought some value, such as a well-described challenge or a guideline.
2.4. Search Engines and Search Strings
We decided to use seven scientific databases: ACM Digital Library, Elsevier Science Direct, IEEE Xplore, Scopus, SpringerLink, Web of Science, and PubMed. The first six are among the most popular scientific databases in the technical sciences. The last, PubMed, is important for scientists in the medical and psychological domains, so we chose to include it.
In the beginning, we performed an initial analysis to decide which search field we should use to obtain a feasible number of records for the SLR. We considered three options, i.e., all fields, title only, and topic/keywords only. Please note that some engines provide search by topic and some by keywords, which are not exactly the same but were treated as equivalent in the study. The search was performed in two rounds: the first in October and November 2019 for the technical databases and in January 2020 for PubMed, and the second in January 2022 to update the review with papers from 2020 and 2021.
Each scientific database has its own search engine, resulting in different query formats. Thus, we had to slightly modify our search query to fit these requirements. For example, for IEEE Xplore, we had to add the field name to each keyword. The query for the title field is shown below.
("Document Title":emotion OR
"Document Title":affective OR
"Document Title":emotional OR
"Document Title":mood OR
"Document Title":affect OR
"Document Title":expression)
AND ("Document Title":children OR
"Document Title":child OR
"Document Title":young)
AND ("Document Title":autism OR
"Document Title":ASD OR
"Document Title":ASC OR
"Document Title":autistic OR
("Document Title":pervasive AND
"Document Title":disorder))
Elsevier Science Direct allows the usage of only eight logical operators in a single query, while our search query contains fourteen. Therefore, we decided to split our query into six queries, one per keyword in the first parenthesis, and merge their results at the end of the search process.
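The splitting-and-merging workaround can be sketched as follows; the helper functions and record identifiers are hypothetical, since each database exposes its own interface:

```python
EMOTION_TERMS = ["emotion", "affective", "emotional",
                 "mood", "affect", "expression"]
CHILD_PART = "(children OR child OR young)"
AUTISM_PART = '(autism OR ASD OR ASC OR autistic OR "pervasive disorder")'

def build_subqueries():
    # One subquery per emotion keyword; each stays within the
    # eight-operator limit imposed by the search engine.
    return [f"{term} AND {CHILD_PART} AND {AUTISM_PART}"
            for term in EMOTION_TERMS]

def merge_results(result_sets):
    """Merge lists of record identifiers (e.g., DOIs), removing
    duplicates while preserving first-seen order."""
    seen, merged = set(), []
    for results in result_sets:
        for record in results:
            if record not in seen:
                seen.add(record)
                merged.append(record)
    return merged

queries = build_subqueries()  # six subqueries to run separately
```

The duplicate removal step matters because the six subqueries overlap: a paper whose title contains both "emotion" and "affective" is returned by two of them.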
The number of records obtained for each database and search field in the initial search is presented in Table 1. For Elsevier Science Direct, we present two numbers for the title field. The main one is the sum of the numbers of results for the six queries that arise after splitting the original one; this also applies to the other search fields. The number in parentheses is the count of records after removing duplicates.
Finally, we decided to use the title field because 1565 records is feasible to analyze, in contrast to the many-times-larger result sets from the other fields. However, we are aware that this could lead to publication bias in our study. Some papers could be missing because they did not fit our selection criteria. On the other hand, some studies could be reported in multiple papers and, thus, counted multiple times in the quantitative analysis.
2.5. Papers Selection and Key Findings Extraction
We settled on a manual selection of papers for their relevance to the eligibility criteria by title only. Papers were evaluated on a three-point scale: 0 (irrelevant), 1 (somehow relevant), 2 (strongly relevant). Four independent investigators, recruited from the authors of this paper, performed the tagging. Further decisions were made based on the sum of the four scores. Papers that scored 8 were taken to the next stage automatically. Papers that scored less than 4 were excluded automatically. The remaining papers, with a score of 4–7, underwent screening by abstract. The Fleiss kappa coefficient was used to determine the inter-rater consistency of the tagging. Papers that passed the screening phase were analyzed in detail. We assumed that we could add papers during reading by noting highly relevant papers from the bibliographies that did not occur in the primary list (following the snowballing technique).
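The scoring rule described above can be sketched as a small decision function; this is a minimal illustration of the stated thresholds, not the actual tooling used in the study:

```python
def selection_decision(scores):
    """Decide a paper's fate from four tagger scores, each in {0, 1, 2}.

    Sum of 8  -> include automatically (all four taggers scored 2).
    Sum < 4   -> exclude automatically.
    Sum 4-7   -> proceed to screening by abstract.
    """
    assert len(scores) == 4 and all(s in (0, 1, 2) for s in scores)
    total = sum(scores)
    if total == 8:
        return "include"
    if total < 4:
        return "exclude"
    return "screen_abstract"

selection_decision([2, 2, 2, 2])  # "include"
selection_decision([1, 0, 1, 1])  # "exclude" (sum 3)
selection_decision([2, 1, 1, 1])  # "screen_abstract" (sum 5)
```

Note that a single tagger cannot force inclusion or exclusion on their own: any decision short of abstract screening requires broad agreement across the four scores.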
We prepared forms in spreadsheets to extract the key findings in emotion recognition in children with autism. We were interested in several research issues related to automatic emotion recognition, i.e., which emotions are recognized in children with autism, which channels and techniques are used for emotion recognition, and how multiple modalities are handled. We also planned to analyze demographic data, including the number of children in the study (with ASD or typically developing), their gender, and age. We also wanted to note all challenges, recommendations, and other relevant observations. We agreed that a tagger could extend the forms, e.g., with new emotions or modalities that were not considered at the beginning of this study.
After we had read the papers and filled in the forms, we extended those forms that contained too little information and repeated the key findings extraction. For example, we extended a spreadsheet with information about children by adding the vocabulary (wording) used to describe children: in some papers, authors used the expression "child with autism", while in others, "autistic child" was used. This repetition was intended to avoid selective reporting bias.
We decided to analyze the data by means of quantitative and qualitative analysis. For the quantitative analysis, we used a simple count measure. We decided to unify some columns from the final spreadsheets; e.g., we grouped some emotion names under labels from the two emotion models: Ekman's basic emotions or the two-dimensional model.
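The unification and counting step can be sketched as follows; the label mapping shown is a hypothetical example, not the actual mapping used in the study:

```python
from collections import Counter

# Hypothetical mapping of free-text emotion names, as reported in the
# papers, onto Ekman's basic-emotion labels.
LABEL_MAP = {
    "joy": "happiness",
    "happy": "happiness",
    "rage": "anger",
    "scared": "fear",
}

def unify_and_count(reported_names):
    """Map reported emotion names to unified labels and tally them.

    Names with no entry in LABEL_MAP are kept as-is (lowercased).
    """
    unified = [LABEL_MAP.get(name.lower(), name.lower())
               for name in reported_names]
    return Counter(unified)

counts = unify_and_count(["Joy", "happy", "anger", "rage", "fear"])
# happiness: 2, anger: 2, fear: 1
```

The same pattern applies to unifying modality or channel names before the simple counts reported in Section 3.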
5. Conclusions
The paper provides a systematic literature review on the challenge of automatic emotion recognition applied in studying and training children with autism. Over 2000 papers were initially extracted from 7 search engines, with 50 papers finally included in the qualitative analysis and 27 in the quantitative analysis.
The study yields observations regarding observation channels, modalities, and methods used for emotion recognition in children with autism. The qualitative analysis revealed important clues on participant group construction and on the most common combinations of modalities and methods. The study might be of interest to researchers who apply emotion recognition or enhance methods for affect classification in autism-related studies.
This systematic literature review revealed a number of challenges related to the application of emotion recognition in studies on children on the autism spectrum. Some good practices were also identified. The findings have several implications, both for science and for practical applications. Further works might analyze diverse stimuli in autism and perhaps create a stimuli set dedicated to this special group, adjusted to its reactivity type. Moreover, multimodal approaches seem underexplored, and more studies might reveal more practical results. Other issues of concern for further studies include: (1) participant group construction taking into account sex, developmental age, and level of functioning; (2) mixed (compound) labelling approaches; (3) creating datasets for further research, with measurement and label data related only to children with autism; and (4) determining which emotions are of real interest in the analyzed interaction rather than starting from a basic emotions model.