**1. Introduction**

Emotions play an essential role in rational decision-making, perception, learning and a variety of other functions that affect both human physiological and psychological status [1]. Therefore, understanding and recognising emotions are very important aspects of human behaviour research. To study human emotions, affective states need to be evoked in laboratory environments, using elicitation methods such as images, audio, videos and, recently, virtual reality (VR). VR has experienced an increase in popularity in recent years in scientific and commercial contexts [2]. Its general applications include gaming, training, education, health and marketing. This increase is based on the development of a new generation of low-cost headsets which has democratised global purchases of head-mounted displays (HMDs) [3]. Nonetheless, VR has been used in research since the 1990s [4]. The scientific interest in VR is due to the fact that it provides simulated experiences that create the sensation of being in the real world [5]. In particular, environmental simulations are representations of physical environments that allow researchers to analyse reactions to common concepts [6]. They are especially important when what they depict cannot be physically represented. VR makes it possible to study these scenarios under controlled laboratory conditions [7]. Moreover, VR allows the time- and cost-effective isolation and modification of variables, unfeasible in real space [8].

#### *1.1. Virtual Reality Set-Ups*

The set-ups that display VR simulations have been progressively integrated into studies as the relevant technologies have evolved. They consist of a combination of three objective features: formats, display devices and user interfaces.

The format describes the structure of the information displayed. The most common are two-dimensional (2D) multimedia and three-dimensional (3D) environments, and the main difference between them is their levels of interactivity [9]. 2D multimedia, including 360° panoramic images and videos, provide non-interactive visual representations. The validity of this format has been extensively explored [10]. Moreover, the latest advances in computer-generated images simulate light, texture and atmospheric conditions to such a degree of photorealism that it is possible to produce a virtual image that is indistinguishable, to the naked eye, from a photograph of a real-world scene [11]. This format allows scientists to test static computer-generated environments, with many variations, cheaply and quickly in a laboratory. On the other hand, 3D environments generate interactive representations which allow changes in the user's point of view, navigation and even interaction with objects and people [12]. Developing realistic 3D environments is more time consuming than developing 360° computer-generated photographs, and their level of realism is limited by the power of the hardware. However, the processing power of GPUs (graphics processing units) is increasing every year, which will enhance the performance of 3D environments. Moreover, the interaction capacity of 3D environments, which facilitates the simulation of real-world tasks, is a key aspect in the application of virtual reality [2].

The display devices are the technological equipment used to visualise the formats. They are classified according to the level of immersion they provide, that is, the sensorimotor contingencies that they support. These are related to the actions that experimental subjects carry out in the perception process, for example, when they bend down and shift the position of their heads, and their gaze direction, to see underneath an object. Therefore, the sensorimotor contingencies supported by a system define a set of valid actions (e.g., turning the head, bending forward) that carry meaning in terms of perception within the virtual environment [13]. Since immersion is objective, one system is more immersive than another if it is superior in at least one characteristic while the others remain equal. There are three categories of immersion system: non-immersive, semi-immersive and immersive [2]. Non-immersive systems are simpler devices which use a single screen, such as a desktop PC, to display environments [14]. Semi-immersive systems, such as the cave automatic virtual environment (CAVE), or the powerwall screen, use large projections to display environments on walls, enveloping the viewer [15,16]. These displays typically provide a stereo image of an environment, using a perspective projection linked to the position of the observer's head. Immersive devices, such as HMDs, are fully-immersive systems that isolate the user from external world stimuli [17]. These provide a complete simulated experience, including a stereoscopic view, which responds to the user's head movements. During the last two decades, VR has usually been displayed through desktop PCs or semi-immersive systems, such as CAVEs and powerwalls [18]. However, improvements in the performance and availability of the new generation of HMDs are boosting their use in research [19].

The user interfaces, exclusive to 3D environments that allow this level of interaction, are the functional connections between the user and the VR environment which allow him or her to interact with objects and navigate [20]. Regarding interaction with objects, manipulation tasks include: selection, that is, acquiring or identifying an object or subset of objects; positioning, that is, changing an object's 3D position; and rotation, that is, changing an object's 3D orientation. In terms of the navigation metaphors in 3D environments, virtual locomotion has been thoroughly analysed [21], and can be classified as physical or artificial. Regarding the physical, there are room-scale-based metaphors, such as real-walking, which allow the user to walk freely inside a limited physical space. These are normally used with HMDs, and position and orientation are determined by the position of the user's head. They are the most naturalistic of the metaphors, but are highly limited by the physical tracked area [22]. In addition, there are motion-based metaphors, such as walking-in-place or redirected walking. Walking-in-place is a pseudo-naturalistic metaphor in which the user performs virtual locomotion while remaining stationary, for example, by moving his/her hands as if (s)he were walking, or by performing footstep-like movements [23]. Redirected walking is a technique in which the user perceives that (s)he is walking freely but is, in fact, being unknowingly manipulated by the virtual display: this allows navigation in an environment larger than the actual tracked area [24]. Regarding the artificial, controller-based metaphors allow users to control their movements directly through joysticks or similar devices, such as keyboards and trackballs [25]. In addition, teleportation-based metaphors allow the user to point where (s)he wants to go and teleport him or her there with an instantaneous "jump" [26]. Moreover, recent advancements in the latest generation of HMD devices have increased the performance of navigation metaphors. Point-and-click teleport metaphors have become mainstream technologies implemented in all low-cost devices. However, other techniques have also improved: walking-in-place metaphors have become more user-friendly and robust, room-scale-based metaphors now have increased coverage areas, provided by low-cost tracking methods, and controller-based locomotion now addresses virtual sickness through effective, dynamic field-of-view adjustments [27].

#### *1.2. Sense of Presence*

In addition to the objective features of the set-up, the experience of users in virtual environments can be measured by the concept of presence, understood as the subjective feeling of "being-there" [28]. A high degree of presence creates in the user the sensation of physical presence and the illusion of interacting and reacting as if (s)he was in the real world [29]. In the 2000s, the strong illusion of being in a place, in spite of the sure knowledge that one is not actually there, was characterised as "place illusion" (PI), to avoid any confusion that might be caused by the multiple meanings of the word "presence". Moreover, just as PI relates to how the world is perceived, and the correlation of movements and concomitant changes in the images that form perceptions, "plausibility illusion" (PsI) relates to what is perceived, in a correlation of external events not directly caused by the participant [13]. PsI is determined by the extent to which a system produces events that directly relate to the participant, and the overall credibility of the scenario being depicted in comparison with viewer expectations, for example, when an experimental participant is provoked into giving a quick, natural and automatic reply to a question posed by an avatar.

Although presence plays a critical role in VR experiences, there is limited understanding of what factors affect presence in virtual environments. However, there is consensus that exteroception and interoception factors affect presence. It has been shown that exteroception factors, such as higher levels of interactivity and immersion, which are directly related to the experimental set-up, provoke increased presence, especially in virtual environments not designed to induce particular emotions [30–32]. As to the interoception factors, which are defined by the content displayed, participants will perceive higher presence if they feel emotionally affected; for example, previous studies have found a strong correlation between arousal and presence [33]. Recent research has also analysed presence in specific contexts and suggested that, for example, in social environments, it is enhanced when the VR elicits genuine cognitive, emotional and behavioural responses, and when participants create their own narratives about events [34]. On the other hand, presence decreases when users experience physical problems, such as cybersickness [35].

#### *1.3. Virtual Reality in Human Behaviour Research*

VR is, thus, proposed as a powerful tool to simulate complex, real situations and environments, offering researchers unprecedented opportunities to investigate human behaviour in closely controlled designs under laboratory conditions [33]. The field now counts many active researchers and a substantial body of published studies, forming a strong, interdisciplinary community [2].

Education and training is one field where VR has been much applied. Freina and Ott [36] showed that VR can offer great educational advantages. It can solve time-travel problems, for example, students can experience different historical periods. It can address physical inaccessibility, for example, students can explore the solar system in the first person. It can circumvent ethical problems, for example, students can "perform" serious surgery. Surgical training is now one of the most analysed research topics. Interventional surgery lacked satisfactory training methods before the advent of VR, other than learning on real patients [37]. Bhagat, Liou and Chang [38] analysed improvements in military training. These authors suggested that cost-effective 3D VR significantly improved subjects' learning motivation and outcomes and had a positive impact on their live-firing achievement scores. In addition, besides enhancements in cost-effectiveness, VR offers a safe training environment, as evidenced by the extensive research into driving and flight simulators [39,40]. Moreover, de-Juan-Ripoll et al. [41] proposed that VR is an invaluable tool for assessing risk-taking profiles and training related skills, due to its transferability to real-world situations.

Several researchers have also demonstrated the effectiveness of VR in therapeutic applications. It offers some distinct advantages over standard therapies, including precise control over the degree of exposure to the therapeutic scenario, the possibility of tailoring scenarios to individual patients' needs and even the capacity to provide therapies that might otherwise be impossible [42]. For example, studies using VR have analysed improvements in social skills training for persons with mental and behavioural disorders, such as phobias [43], schizophrenia [44] and autism [45]. Lloréns, Noé, Colomer and Alcañiz [46] showed that VR-based telerehabilitation interventions promoted the reacquisition of locomotor skills associated with balance, in the same way as in-clinic interventions (both complemented with conventional therapy programmes). Moreover, VR has been proposed as a key tool for the diagnosis of neurodevelopmental disorders [47].

In addition, VR has been applied transversally to many fields, such as architecture and marketing. In architecture, VR has been used as a framework within which to test the overall validity of proposed plans and architectural designs, generate alternatives and conceptualise learning, instruction and the design process itself [48]. In marketing, it has been applied in the analysis of consumer behaviour in laboratory-controlled conditions [49] and as a tool to develop emotionally engaging consumer experiences [50].

One of the most important topics in human behaviour research is human emotions, due to the central role that they play in many background processes, such as perception, decision-making, creativity, memory and social interaction [51]. Given the presence that VR provokes in users, it has been suggested as a powerful means of evoking emotions in laboratory environments [8]. In one of the first confirmatory studies into the efficacy of immersive VR as an affective medium, Baños et al. [30] showed that emotion has an impact on presence. Subsequently, many other similar studies showed that VR can evoke emotions, such as anxiety and relaxation [52], positive valence in obese children taking exercise [53], arousal in natural environments, such as parks [54], and different moods in social environments featuring avatars [55].

#### *1.4. The Validity of Virtual Reality*

Finally, it is crucial to point out that the usefulness of simulation in human behaviour research has been analysed through the concept of validity, that is, the capacity of a simulated environment to evoke a response from the user similar to one that might be evoked by a physical environment [56]. Thus, there is a need to perform direct comparisons between virtual and real environments. Some comparisons have studied the validity of virtual environments by assessing psychological responses [57] and cognitive performance [58]. However, there have been fewer analyses of physiological and behavioural responses [59,60]. Heydarian et al. analysed user performance in office-related activities, for example, reading texts and identifying objects, and found that the participants performed similarly in an immersive virtual environment setting and in a benchmarked physical environment for all of the measured tasks [61]. Chamilothori, Wienold, and Andersen compared subjective perceptions of daylit spaces, and identified no significant differences between the real and virtual environments studied [62]. Kimura et al. analysed orienteering-task performance, where participants in a VR room showed less facility, suggesting that caution must be applied when interpreting the nuances of spatial cue use in virtual environments [63]. Higuera-Trujillo, López-Tarruella, and Llinares analysed psycho-physiological responses, through electrodermal activity (EDA), evoked by real-world and VR scenarios with different immersion levels, and demonstrated correlations in the physiological dynamics between real-world and 3D environments [64]. Marín-Morales et al. analysed the emotional responses evoked in subjects in a real and a virtual museum, and found no self-assessment differences, but did find differences in brain dynamics [65]. Therefore, further research is needed to understand the validity of VR in terms of physiological responses and behavioural performance.

#### *1.5. Implicit Measures and the Neuroscience Approach*

Traditionally, most theories of human behaviour research have been based on a model of the human mind that assumes that humans can think about and accurately verbalise their attitudes, emotions and behaviours [66]. Therefore, classical psychological evaluations used self-assessment questionnaires and interviews to quantify subjects' responses. However, these explicit measures have been demonstrated to be subjective, as stereotype-based expectations can lead to systematically biased behaviour, given that most individuals are motivated to be, or appear to be, nonbiased [67]. The terms used in questionnaires can also be differentially interpreted by respondents, and the outcomes depend on the subjects possessing a wide knowledge of their dispositions, which is not always the case [68].

Recent advances in neuroscience show that most of the brain processes that regulate our emotions, attitudes and behaviours are not conscious. In contrast to explicit processes, humans cannot verbalise these implicit processes [69]. In recent years, growing interest has developed in "looking" inside the brain to seek solutions to problems that have not traditionally been addressed by neuroscience. Thus, neuroscience offers techniques that can recognise implicit measurements not controlled by conscious processes [70]. These developments have provoked the emergence in the last decades of a new field called neuroeconomics, which blends psychology, neuroscience and economics into models of decision-making, rewards, risks and uncertainties [71]. Neuroeconomics addresses human behaviour research, in particular the brain mechanisms involved in economic decision-making, from the point of view of cognitive neuroscience, using implicit measures.

Several implicit measuring techniques have been proposed in recent years. Some examples of their applications in human behaviour research are: heart rate variability (HRV) has been correlated with arousal changes in vehicle drivers when detecting critical points on a route [72]; electrodermal activity (EDA) has been used to measure stress caused by cognitive load in the workplace [73]; electroencephalogram (EEG) has been used to assess engagement in audio-visual content [74]; functional magnetic resonance imaging (fMRI) has been used to record the brain activity of participants engaged in social vs. mechanical/analytic tasks [75]; functional near-infrared spectroscopy (fNIRS) has been used as a direct measure of brain activity related to decision-making processes in approach-avoidance theories [76]; eye-tracking (ET) has been used to measure subconscious brain processes that show correlations with information processing in risky decisions [77]; facial expression analysis (FEA) has been applied to detect emotional responses in e-learning environments [78]; and speech emotion recognition (SER) has been used to detect depressive disorders [79]. Table 1 gives an overview of the implicit measuring techniques that have been used in human behaviour research.
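To illustrate how one such signal becomes an implicit measure, the sketch below computes two standard time-domain HRV indices, SDNN (standard deviation of RR intervals) and RMSSD (root mean square of successive differences), from a series of heartbeat intervals. The interval values are invented for the example; a real analysis would first clean the RR series of ectopic beats and artifacts.

```python
import math

def hrv_metrics(rr_intervals_ms):
    """Compute two standard time-domain HRV metrics from RR intervals (ms).

    SDNN:  standard deviation of all RR intervals.
    RMSSD: root mean square of successive RR-interval differences.
    """
    n = len(rr_intervals_ms)
    mean_rr = sum(rr_intervals_ms) / n
    sdnn = math.sqrt(sum((rr - mean_rr) ** 2 for rr in rr_intervals_ms) / n)
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    rmssd = math.sqrt(sum(d ** 2 for d in diffs) / len(diffs))
    return sdnn, rmssd

# Illustrative RR series (ms); higher arousal typically shortens intervals
# and reduces variability.
sdnn, rmssd = hrv_metrics([800, 810, 790, 805, 795, 815])
```

Features such as these are what emotion recognition pipelines feed into statistical tests or classifiers.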



In addition, recent studies have highlighted the potential of virtual reality environments for enhancing ecological validity in the clinical, affective and social neurosciences. Studies in these fields have usually involved simple, static stimuli which lack many of the potentially important aspects of real-world activities and interactions [90]. Therefore, VR could play an important role in the future of neuroeconomics by providing a more ecological framework within which to develop experimental studies with implicit measures.

#### *1.6. Affective Computing and Emotion Recognition Systems*

Affective computing, which analyses human responses using implicit measures, has developed into an important field of study in recent decades. Introduced by Rosalind Picard in 1997, it was proposed as an interdisciplinary field, based on psychophysiology, computer science, biomedical engineering and artificial intelligence, dedicated to the automatic quantification and recognition of human emotions [1]. The automatic recognition of human emotional states using implicit measures can be transversally applied to all human behaviour topics and can complement classic explicit measures. In particular, it can be applied to neuroeconomic research, as the two share the same neuroscientific approach of using implicit measures, and because of the important relationship that has been found between emotions and decision-making [71]. Emotion recognition models can be divided into three approaches: emotional modelling, emotion classification and emotion elicitation.

The emotional modelling approach can be divided into the discrete and the dimensional. Discrete models characterise the emotion system as a set of basic emotions, which includes anger, disgust, fear, joy, sadness and surprise, and the complex emotions that result from combining them [91]. On the other hand, dimensional models propose that emotional responses can be modelled in a multidimensional space where each dimension represents a fundamental property common to all emotions. The most commonly used theory is the circumplex model of affect (CMA), which proposes a three-dimensional space consisting of: valence, that is, the degree to which an emotion is perceived as positive or negative, arousal, that is, the intensity of the emotion in terms of activation, from low to high, and dominance, which ranges from feelings of total lack of control or influence on events and surroundings to the opposite extreme of feeling influential and in control [92].
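As a minimal illustration of the dimensional approach, the sketch below maps a (valence, arousal) pair onto one of the four quadrants of the valence-arousal plane. The quadrant labels are a common simplification chosen here for illustration only; they are not part of the model's formal definition, and the dominance dimension is omitted.

```python
def cma_quadrant(valence, arousal):
    """Map a (valence, arousal) pair, each scaled to [-1, 1], onto a
    quadrant of the valence-arousal plane of the circumplex model.

    The example labels (excited, distressed, depressed, relaxed) are an
    illustrative convention, not canonical model terminology.
    """
    if valence >= 0 and arousal >= 0:
        return "high-arousal positive (e.g., excited)"
    if valence < 0 and arousal >= 0:
        return "high-arousal negative (e.g., distressed)"
    if valence < 0:
        return "low-arousal negative (e.g., depressed)"
    return "low-arousal positive (e.g., relaxed)"
```

Dimensional labels of this kind are what classification models (next paragraph) are typically trained to predict.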

Affective computing uses biometric signals and machine-learning algorithms to classify emotions automatically. Many signals have been used, such as voice, face, neuroimaging and physiological signals [93]. It is noteworthy that one of the main emotion classification topics uses variables associated with central nervous system (CNS) and autonomic nervous system (ANS) dynamics [93]. First, human emotional processing and perception involve cerebral cortex activity, which allows the automatic classification of emotions using the CNS. EEG is one of the techniques most used in this context [94]. Second, many emotion recognition studies have used the ANS to analyse the changes in cardiovascular dynamics provoked by mood changes, where HRV and EDA are the most used techniques [95]. The combination of physiological features and machine-learning algorithms, such as support vector machines, linear discriminant analysis, K-nearest neighbour and neural networks, has achieved high levels of accuracy in inferring subjects' emotional states [96].
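A minimal sketch of this classification approach, using a K-nearest-neighbour vote over toy physiological features: the heart-rate and EDA values and the arousal labels below are invented for illustration, and a real system would use many more subjects, validated features and feature normalisation.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(train, key=lambda sample: dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: (mean heart rate in bpm, EDA in microsiemens) -> label.
# Feature values are invented for the example.
train = [
    ((62, 1.1), "low arousal"), ((65, 1.3), "low arousal"), ((60, 0.9), "low arousal"),
    ((95, 4.2), "high arousal"), ((98, 4.8), "high arousal"), ((92, 3.9), "high arousal"),
]
label = knn_predict(train, (94, 4.1))
```

In practice the features would be standardised before computing distances, since heart rate otherwise dominates the Euclidean metric; the toy scale difference is harmless here because the two classes are well separated.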

Finally, emotion elicitation is the ability to reliably and ethically elicit affective states. This elicitation is a critical factor in the development of systems that can detect, interpret and adapt to human affect [97]. The many methods that elicit emotions in laboratories can be divided into two main groups, active and passive. Active methods involve directly influencing subjects, including behavioural manipulation [98], social interaction [99] and dyadic interaction [100]. Passive methods usually present external stimuli, such as images, sound or video. As to the use of images, the International Affective Picture System (IAPS) is among the databases most used as an elicitation tool in emotion recognition methodologies [95]. It includes over a thousand depictions of people, objects and events, standardised on the basis of valence and arousal [97]. As to audio, the International Affective Digitalised Sound System (IADS) database is the most commonly applied in studies which use sound to elicit emotions [101]. However, some studies directly use music or narrative to elicit emotions [102]. With respect to audio-visual stimuli, many studies have used film to induce arousal and valence [103]. These emotion elicitation methods have two important limitations. The set-ups used, mostly screens, are non-immersive devices, which provoke only a low level of presence in subjects [30]. Therefore, the stimuli do not evoke in the subjects a feeling of "being there", which is needed to analyse emotions in simulated real-world situations. In addition, the stimuli are non-interactive, so they do not allow the subjects to intervene in the scene, which would open up the possibility of recognising emotional states during interactive tasks. These limitations can be overcome by using immersive VR as a new emotion elicitation method. Since the year 2000, VR has increasingly been used as affective stimulation; however, the majority of the studies undertaken have applied classic statistical methods, such as hypothesis testing and correlation, to analyse subjects' physiological responses to different emotions [104]. In recent years, however, some research has started to apply affective computing paradigms with VR as the emotion elicitation method, combining implicit measures with machine-learning methods to develop automatic emotion recognition models [105].

This paper provides a systematic review of the literature on the use of head-mounted displays in implicit measure-based emotion recognition research, and examines the evolution of the research field, the emotions analysed, the implicit techniques, the data analysis, the set-ups and the validations performed.

#### **2. Materials and Methods**
