1. Introduction
Automated emotion recognition (AEE) is the automatic recognition of human emotional responses by computers. It is an important research branch in the field of human-computer interaction (HCI) [1]. AEE has great potential in various intelligent systems, including education (assessment of students’ learning status), marketing (assessment of customers’ feedback), and mental health monitoring (adjustment of patients’ emotional states) [2]. Emotion recognition can be achieved through facial expression, body posture, voice, bio-signals, etc. [3]. Among these, bio-signals express human emotions more objectively and faithfully because they are difficult to manipulate deliberately [1,4].
In recent years, with the development of virtual reality (VR) technology, emotion recognition based on multimodal bio-signals in VR environments has become a hot research topic [5,6]. VR creates an immersive virtual environment for users and allows them to experience realistic feelings [7]. It is very important for research in psychology, where it is considered the technique most likely to replace real and laboratory environments. To date, most researchers have used music, pictures, videos and other non-immersive stimuli to evoke emotions. However, it has been shown that VR scenes evoke users’ emotions more effectively [8].
Bio-signals show excellent data consistency in emotion analysis because they represent an unfiltered, immediate response to human emotions. At present, the bio-signals used in emotion analysis mainly include electroencephalogram (EEG) [9,10,11], photoplethysmogram (PPG) [12,13,14], electrodermal activity (EDA) [15,16] and skin temperature (SKT) [17,18]. There are mainly three types of devices for acquiring bio-signals. The first is special-purpose medical devices, such as the electroencephalograph, electrocardiograph, electromyograph, pulse oximeter and thermometer. However, combining several such devices raises a data synchronization problem. The second is dedicated bio-signal measuring devices, such as Biopac MP150, Procomp Infiniti and Power Lab. Their advantage is that they can collect multiple bio-signals synchronously. The common drawbacks of these two types of devices are high price, cumbersome cables and time-consuming setup.
The third type is wearable devices. With the development of wearable physiological sensing, bio-signal sensors can already be built into items such as clothes, hats, shoes, gloves, beds and game controllers for long-term perception of, and feedback on, users’ emotions. Empatica launched the E4 smart watch to collect data for affective computing researchers [19]. This watch collects EDA, PPG and SKT at the wrist and can also calculate pulse rate (PR) from the PPG signal. The authors of [20] embedded a dry-electrode EEG sensor and two eye-tracking cameras in the HTC Vive headset, which can synchronously record multi-channel EEG signals and eye-movement images. They also provided an emotion recognition interface for predicting users’ evaluations of attractiveness. However, the system has two disadvantages: it cannot acquire other peripheral bio-signals, which are vital for emotion recognition, and its EEG sensor module cannot be fitted to other VR or AR head-mounted displays (HMDs) on the market. In [21], a single dry-electrode wearable device, the NeuroSky MindWave, was used to collect prefrontal EEG signals in order to relate personality traits to emotional states. In [22], the wearable devices Emotiv EPOC and SHIMMER were used to collect 14-channel EEG signals and a one-channel PPG signal, respectively, in five VR game scenes. The results suggested that bio-signals acquired by low-cost wearable devices can be used to recognize emotion with high precision. All of the above studies indicate that wearable devices are effective for collecting bio-signals for emotion recognition. However, these devices still have drawbacks for emotion recognition in VR environments. For example, none of them can synchronously collect emotion-related EEG signals and other peripheral bio-signals. In addition, they lack a unified interface for stimulus selection, data acquisition, and emotion modeling.
In this paper, we propose a wearable forehead bio-signal acquisition platform called HMD Bio Pad, which can be attached to most VR or AR HMDs on the market using a velcro fastener. The HMD Bio Pad hardware consists of a flexible sensor pad and a bio-signal acquisition system. The flexible sensor pad is built on a flexible printed circuit board (PCB) that can be bent freely, and it carries the metal dry electrodes and sensors. The bio-signal acquisition system is mainly responsible for signal conditioning, acquisition and transmission. Two-channel EEG signals, a one-channel EDA signal, a PPG signal and SKT are recorded simultaneously and transmitted via Bluetooth while the user wears the HMD Bio Pad in the ordinary way. Wireless communication supports the portability of the HMD Bio Pad. We then conducted experiments to evaluate the quality of the EEG, EDA, PPG and SKT signals. Finally, we developed a user-friendly HCI interface for psychophysiology research in VR environments. The major contributions of this work can be summarized as follows:
A wearable forehead bio-signal acquisition device called HMD Bio Pad is developed, which has the advantages of portability, comfort and ease of wearing;
Using metal dry electrodes and attaching the flexible sensor pad to the HMD greatly reduces experimental preparation time and improves the convenience of the system;
HMD Bio Pad simultaneously collects EEG signals and other peripheral bio-signals, whose quality has been validated by different experiments;
An HCI interface is provided for researchers to perform stimulus selection, data acquisition, and emotion modeling within a single tool.
The remainder of this paper is organized as follows: Section 2 describes the system overview. Section 3 and Section 4 introduce the HMD Bio Pad hardware and software platform, respectively. The evaluation of HMD Bio Pad is presented in Section 5. Conclusions and future work are discussed in the last section.
3. The Hardware Design of HMD Bio Pad
In this section, we describe the HMD Bio Pad hardware design. The main goal is to design a wearable device that collects multimodal bio-signals from the forehead. The HMD Bio Pad hardware consists of a flexible sensor pad and a bio-signal acquisition system, which are connected by a USB Type-C connector. The structure of the flexible sensor pad is shown in Figure 3a. One side of the flexible printed circuit board (PCB) is covered with a breathable, skin-friendly leather material, and the other side is covered with a velcro fastener. The electrodes and sensors are placed on the flexible PCB, as shown in Figure 3b.
The bio-signal acquisition system consists of the following parts: EEG acquisition system, EDA acquisition system, SKT acquisition system, PPG acquisition system, switch, power management, microcontroller unit (MCU) and Bluetooth Low Energy (BLE) wireless communication.
3.1. EEG Acquisition System
In recent years, numerous neurophysiological studies have reported correlations between EEG signals and emotions. Recent studies showed that the frontal scalp appears to exhibit more emotional activation than other regions of the brain [23,24]. The EEG asymmetry between the left and right hemispheres is an important reference feature for assessing cognitive and affective disorders [25]. Since the forehead is not covered by hair, its skin-electrode interface impedance is small, which favors the use of dry electrodes to collect high-fidelity EEG signals. Therefore, we place one dry electrode on each of the left and right sides of the prefrontal region. The reference electrode is an ear clip, which is soldered to the bio-signal acquisition system through a shielded cable. According to the international 10–20 system, FP1 and FP2 are chosen as the positions of the active electrodes and the earlobe (A2) as the position of the reference electrode, which together form the two-channel forehead EEG acquisition system. The electrode positions are shown in Figure 4a.
EEG is a technique for recording the electrophysiological activity of brain neurons at the surface of the cerebral cortex or scalp. The EEG amplitude is very weak, usually on the order of microvolts (μV). Therefore, the EEG signal is susceptible to interference and difficult to detect directly, and an analog front-end (AFE) circuit is needed for signal amplification and conditioning. The two-channel EEG AFE circuit designed in this paper is shown in Figure 4b.
The AFE circuit of EEG is composed of an instrumentation amplifier (IA) circuit, a DC voltage correction circuit, a second-order low-pass filter circuit and a single-ended-to-differential circuit. Because the EEG amplitude is as weak as 10–50 μV, an IA with low input-referred noise, high common-mode rejection ratio (CMRR) and high input impedance is required for the first amplification stage. The AD8422 chip (Analog Devices, Norwood, MA, USA), with low input-referred noise (0.1 μVPP), high CMRR (94–150 dB) and high input impedance (200 GΩ), is used as the IA. It is a third-generation successor to the AD620 chip (Analog Devices, USA), and its gain G is determined by the resistance $R_G$ across the gain-setting terminals. The range of the gain G is 1–1000. Note that excessive gain also amplifies the DC voltage contained in the EEG and saturates the amplifier output. Therefore, the gain G is set to 100. According to Equation (1), the value of $R_G$ is 200 Ω.
According to Equation (2), the CMRR is about 120 dB.
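As a quick check of the gain-resistor value, the AD8422 datasheet relates the gain to the external resistor as G = 1 + 19.8 kΩ / R_G. The following minimal sketch solves this relation for R_G; the function names are illustrative and do not come from the paper.

```python
# Sketch: gain-setting resistor of the AD8422 instrumentation amplifier.
# Assumes the datasheet relation G = 1 + 19.8 kOhm / R_G.
# Function and constant names are illustrative, not taken from the paper.

AD8422_INTERNAL_OHMS = 19.8e3  # internal feedback resistance in the gain equation


def gain_resistor(gain: float) -> float:
    """Return R_G in ohms for a desired gain G > 1."""
    if gain <= 1:
        raise ValueError("gain must be greater than 1 to need a finite R_G")
    return AD8422_INTERNAL_OHMS / (gain - 1)


def gain_from_resistor(r_g: float) -> float:
    """Return the gain G produced by a gain resistor R_G in ohms."""
    return 1 + AD8422_INTERNAL_OHMS / r_g


if __name__ == "__main__":
    print(f"R_G for G = 100: {gain_resistor(100):.1f} ohm")
    # First-stage output for the 10-50 uV EEG amplitudes quoted in the text.
    for amplitude_uv in (10, 50):
        print(f"{amplitude_uv} uV in -> {amplitude_uv * 100 / 1000:.1f} mV out")
```

With G = 100 the relation gives R_G = 19.8 kΩ / 99 = 200 Ω, which matches the value stated above.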
The transfer function of the AD8422 is given as follows:
$$V_{OUT} = G\,(V_{IN+} - V_{IN-}) + V_{REF} \qquad (3)$$
where $V_{IN+}$ and $V_{IN-}$ represent the positive and negative inputs, respectively, and $V_{REF}$ denotes the input reference voltage.
The DC voltage introduced at the contact between the electrode and the scalp is called the polarization voltage (on the order of millivolts). Once amplified by the IA, the polarization voltage causes saturation and serious distortion of the EEG signal. It also limits the CMRR of the preamplifier and narrows the usable gain range of the IA. The purpose of the DC voltage correction circuit is to eliminate the polarization voltage. In this paper, an integral feedback circuit with a cut-off frequency of 0.5 Hz is used for DC voltage correction, as shown in Figure 4b. After the input EEG signal is amplified by the IA, it passes through the integral feedback circuit, whose output is fed back to the REF pin of the IA. The output of the integral feedback circuit is given by Equation (4).
Substituting Equation (4) into Equation (3) gives the output transfer function of the instrumentation amplifier, Equation (5).
It can be seen from Equation (5) that the amplified DC offset voltage is eliminated at the output of the instrumentation amplifier by the integral circuit, which thus realizes dynamic DC correction.
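The effect of the integral feedback can be illustrated numerically. The sketch below assumes an ideal inverting integrator with time constant τ = RC driving the REF pin, so that V_REF(s) = −V_OUT(s)/(sτ); substituting this into V_OUT = G·V_in + V_REF gives V_OUT/V_in = G·sτ/(1 + sτ), a first-order high-pass response with cut-off 1/(2πτ). The resistor and capacitor values are not given in the text, so τ is derived here from the stated 0.5 Hz cut-off.

```python
# Sketch: closed-loop response of an IA with ideal integral feedback on REF.
# ASSUMPTION: ideal inverting integrator, V_REF(s) = -V_OUT(s) / (s * tau).
# tau is derived from the 0.5 Hz cut-off quoted in the text, not from the schematic.
import math

GAIN = 100.0       # IA gain G from the text
CUTOFF_HZ = 0.5    # DC-correction cut-off frequency from the text
TAU = 1.0 / (2.0 * math.pi * CUTOFF_HZ)  # assumed integrator time constant RC


def closed_loop_gain(freq_hz: float) -> complex:
    """Return V_OUT / V_in = G * s*tau / (1 + s*tau) at the given frequency."""
    s = 1j * 2.0 * math.pi * freq_hz
    return GAIN * s * TAU / (1.0 + s * TAU)


if __name__ == "__main__":
    for f in (0.0, 0.05, 0.5, 10.0, 30.0):
        print(f"{f:5.2f} Hz -> |gain| = {abs(closed_loop_gain(f)):7.2f}")
```

Under these assumptions the gain is zero at DC, so the polarization voltage is rejected, while the 0.5–30 Hz EEG band is passed at close to the full gain of 100.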
After IA amplification and DC correction, the extremely low-frequency content of the EEG signal has been removed, but high-frequency interference remains. This interference mainly includes environmental electromagnetic waves, electromyography (EMG) signals and noise generated by the active devices. Therefore, a low-pass filter is needed to eliminate it. The frequency range of the EEG signal is generally 0.5 to 100 Hz, but the delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–13 Hz) and beta (13–30 Hz) rhythms associated with emotion lie between 0.5 and 30 Hz [2]. In this paper, a second-order low-pass filter with a 35-Hz cut-off frequency is designed to remove noise from the EEG signal. In addition, the filter circuit amplifies the EEG signal by a factor of 2.
Converting a single-ended signal into a differential signal effectively reduces common-mode interference and increases the dynamic range of the signal. The output of the second-order low-pass filter is converted into a differential signal by a single-ended-to-differential circuit built around the fully differential amplifier THS4521 (Texas Instruments, Dallas, TX, USA). The THS4521 has very low input noise and power consumption. At a bandwidth of 100 kHz, its voltage noise density is as low as 4.6 nV/√Hz, which makes it well suited to driving a high-precision Σ-Δ analog-to-digital converter (ADC). The differential signal is fed into the high-precision differential ADC, which converts the analog signal into a digital signal. The ADC chip ADS1256 (Texas Instruments, Dallas, TX, USA) offers extremely low noise, 24-bit resolution and 4-channel Σ-Δ differential inputs. The ADC conversion result is calculated using the following equation:
where the reference voltage $V_{REF}$ = 2.5 V, $PGA$ is the internal gain value, and L represents the two’s-complement digital value collected by the ADC.
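The conversion from ADC code to voltage can be sketched as follows. This is a minimal example that assumes the ADS1256 datasheet scaling, in which the full-scale differential input is ±2·V_REF/PGA and the positive full-scale code is 2^23 − 1; the byte order (most significant byte first) and the function names are assumptions for illustration and may differ from the firmware described in the paper.

```python
# Sketch: convert a 24-bit two's-complement ADS1256 sample to volts.
# ASSUMPTIONS: full scale is +/- 2*V_REF/PGA, one LSB = 2*V_REF / (PGA * (2**23 - 1)),
# and the three bytes arrive most significant byte first.

V_REF = 2.5  # reference voltage in volts (from the text)


def raw_to_code(raw: bytes) -> int:
    """Interpret 3 big-endian bytes as a signed 24-bit integer."""
    if len(raw) != 3:
        raise ValueError("expected exactly 3 bytes")
    return int.from_bytes(raw, byteorder="big", signed=True)


def code_to_volts(code: int, pga: int = 1, v_ref: float = V_REF) -> float:
    """Scale a signed ADC code to the differential input voltage in volts."""
    return code * 2.0 * v_ref / (pga * (2 ** 23 - 1))


if __name__ == "__main__":
    for raw in (b"\x7f\xff\xff", b"\x00\x00\x01", b"\xff\xff\xff", b"\x80\x00\x00"):
        code = raw_to_code(raw)
        print(f"{raw.hex()} -> code {code:>9d} -> {code_to_volts(code):+.9f} V")
```

Dividing the result by the analog gain of the front end (100 in the IA and 2 in the low-pass filter, according to the text) would refer the voltage back to the electrode.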
3.2. EDA Acquisition System
Electrodermal activity (EDA) refers to the sympathetic response caused by strong emotional stimulation, which leads to a rapid increase in sweat gland secretion within a short period of time, that is, mental sweating [26]. Mental sweating is most pronounced on the palmar and plantar sites, and can also be found on the back of the hands, forehead, neck, forearms and legs [27]. In this paper, the EDA electrodes are placed on the forehead, as shown in Figure 4b. EDA acquisition converts the change in forehead surface impedance into a change in an electrical signal. The AFE circuit of EDA is depicted in Figure 5, which also marks the two metal electrodes.
The AFE circuit of EDA consists of two stages. The first stage is the Ω/V conversion of the forehead surface impedance. Considering power consumption (4.4 μA per amplifier, typical), offset voltage (±1 mV, maximum), output voltage swing and the number of operational amplifiers, the MCP6422 chip (Microchip Technology, Chandler, AZ, USA), which contains two low-power amplifiers, is selected in this paper. According to the “virtual short” and “virtual open” principles of the operational amplifier, the current flowing between the two electrodes is calculated in the Ω/V stage. The calculation shows that the maximum current through the human body is less than 10 μA, which is within the safe range for the human body. Generally, the impedance of the human body ranges from tens of kΩ to hundreds of kΩ. Accordingly, the output voltage of the EDA AFE circuit ranges from 0.4 to 2.4 V. Since the useful frequency range of the EDA signal is below 5 Hz [28], the second stage uses a classical Sallen-Key second-order active low-pass filter to eliminate high-frequency interference. The cut-off frequency of the low-pass filter is calculated by Equation (7):
With the component values of the filter, the calculated cut-off frequency is approximately 5.3 Hz. The output EDA signal is calculated by Equation (8):
where the unknown quantity is the skin impedance. Substituting the parameters marked in Figure 5 into Equation (8) gives the relationship between the AFE output voltage and the skin impedance, as shown in Equation (9).
The skin impedance can then be obtained by sampling the AFE output voltage with the ADC. The EDA signal is generally expressed as conductance. From Equation (9), the conductance is obtained as shown in Equation (10):
where the conductance is expressed in μS.
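Two of the calculations in this subsection can be sketched in a few lines: the cut-off frequency of a second-order Sallen-Key low-pass filter, f_c = 1/(2π√(R1·R2·C1·C2)), and the conversion of skin impedance to conductance in μS. The component values used below (300 kΩ and 100 nF) are hypothetical, chosen only because they give a cut-off near the 5.3 Hz quoted above; the actual values are those marked in Figure 5.

```python
# Sketch: Sallen-Key cut-off frequency and impedance-to-conductance conversion.
# ASSUMPTION: R = 300 kOhm and C = 100 nF are illustrative values only; they are
# not taken from the paper and merely reproduce a cut-off of about 5.3 Hz.
import math


def sallen_key_cutoff(r1: float, r2: float, c1: float, c2: float) -> float:
    """Cut-off frequency in Hz of a second-order Sallen-Key low-pass filter."""
    return 1.0 / (2.0 * math.pi * math.sqrt(r1 * r2 * c1 * c2))


def conductance_us(impedance_ohm: float) -> float:
    """Skin conductance in microsiemens from skin impedance in ohms."""
    if impedance_ohm <= 0:
        raise ValueError("impedance must be positive")
    return 1e6 / impedance_ohm


if __name__ == "__main__":
    fc = sallen_key_cutoff(300e3, 300e3, 100e-9, 100e-9)
    print(f"cut-off with the illustrative values: {fc:.2f} Hz")
    for r_skin in (50e3, 100e3, 500e3):
        print(f"{r_skin / 1e3:5.0f} kOhm -> {conductance_us(r_skin):5.1f} uS")
```

For skin impedances of tens to hundreds of kΩ, the conductance therefore falls in the range of a few μS to a few tens of μS.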
3.3. SKT Acquisition System
The change in skin temperature is a manifestation of the vascular response. Body skin temperature differs slightly between emotional states, which can be used for emotion recognition research [29]. At present, body temperature is mainly measured at the mouth, rectum, armpit, ear and forehead. In this paper, the integrated temperature sensor is placed at the center of the forehead, as shown in Figure 4b. The integrated contact temperature sensor LMT70 (Texas Instruments, Dallas, TX, USA) is small (0.88 mm × 0.8 mm), accurate (±0.05 °C over 20–42 °C) and low-power (9.2 μA), which makes it suitable for wearable devices. The structure of the temperature sensor assembly is shown in Figure 6.
The LMT70 is attached to the flexible PCB by reflow soldering, and a high-performance thermally conductive adhesive is filled into a custom hollow thermally conductive metal part. The filled metal part is then bonded and fixed to the LMT70. One purpose of this structure is to keep the conductive metal at the same height as the EEG and EDA metal electrodes, so that the electrodes and the conductive metal all make full contact with the forehead skin. Another is to avoid skin injury that might be caused by long-term direct contact of the LMT70 with the skin. The output of the LMT70 is an analog voltage, which must be converted into a digital value by the ADC. We use the third channel of the ADS1256, and the voltage value is calculated according to Equation (6). The temperature value T is calculated by Equation (11):
where m represents the slope and b denotes the intercept. The forehead temperature of a healthy person is known to lie between 30 °C and 40 °C in a laboratory environment. In this paper, according to the typical voltages corresponding to 30 °C and 40 °C in the LMT70 datasheet, the slope m and the intercept b are set to 0.1943 °C/mV and 213.340 °C, respectively.
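The linear conversion can be sketched as below. One point is an assumption rather than something stated in the text: the LMT70 output voltage falls as temperature rises, so the sketch applies the slope of 0.1943 °C/mV with a negative sign, which is the only way the stated intercept of 213.340 °C yields temperatures in the 30–40 °C range for output voltages of roughly 0.9 V. The function name and the example voltages are illustrative.

```python
# Sketch: first-order voltage-to-temperature conversion for the LMT70.
# ASSUMPTION: the slope is applied with a negative sign because the LMT70 output
# voltage decreases as temperature increases; the magnitude and the intercept
# are the values quoted in the text.

SLOPE_C_PER_MV = -0.1943   # degrees Celsius per millivolt (sign assumed)
INTERCEPT_C = 213.340      # degrees Celsius


def lmt70_temperature_c(v_mv: float) -> float:
    """Return temperature in degrees Celsius for an LMT70 output in millivolts."""
    return SLOPE_C_PER_MV * v_mv + INTERCEPT_C


if __name__ == "__main__":
    for v_mv in (940.0, 920.0, 900.0):  # illustrative output voltages
        print(f"{v_mv:.0f} mV -> {lmt70_temperature_c(v_mv):.2f} degC")
```

The two-point fit is only intended for the 30–40 °C window; outside it, the datasheet lookup table or a higher-order fit would be more appropriate.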
3.4. PPG Acquisition System
The amplitude, period, pulse rate (PR) and pulse rate variability (PRV) of PPG can be used as features for emotion recognition, especially PR. Studies have reported that PR is lower when people are in a positive emotional state and higher otherwise, which is not beneficial to health. The common locations for obtaining the PPG signal are the fingers, wrists, earlobes and forehead [30]. Studies showed that a position 4 cm to the left or right of the forehead center is ideal for PPG acquisition [31]. In this paper, a reflective PPG sensor is selected and placed on the forehead, as shown in Figure 4b. The PPG sensor is the high-sensitivity, low-power (<1 mW) MAX30102 chip (Maxim Integrated, San Jose, CA, USA), which integrates two LEDs (red and infrared) and a photodetector. The PPG sensor communicates bidirectionally with the MCU through the I²C bus. The MCU can set the LED current, sampling frequency, ADC resolution and other parameters, and reads the photodetector output through the I²C bus.
3.5. Switch, Power Management, MCU and BLE Wireless Communication
The system is started or stopped by holding a touch switch for 3 s. A green LED activated by the touch switch indicates whether the system is running. Power management includes the battery charging and power supply circuits. A 3.7 V lithium-ion battery (around 4.2 V when fully charged) powers the whole system. The LTC4054 chip (Linear Technology, Milpitas, CA, USA) controls battery charging through the USB Type-C connector. The red charge indicator LED turns off when the battery is full. Low-dropout regulators (LDOs) generate the ±5.0 V, 3.3 V and 2.7 V supplies required by the analog and digital circuits of the system. The MCU is mainly responsible for multimodal physiological data collection, fusion and transmission. Based on these requirements, the Bluetooth Low Energy SoC nRF52832 (Nordic Semiconductor, Trondheim, Norway) is selected. This chip supports the Bluetooth 5 protocol and programmable broadcast gain, and its effective data throughput is as high as 1447 kbps. In addition, it is built around an Arm Cortex-M4F CPU with a built-in floating-point unit and DSP processing unit, which handles complex tasks quickly, so that the CPU can stay in a low-power state for long periods. The nRF52832 integrates a rich set of digital peripherals, such as UART and SPI. The multi-channel EasyDMA and PPI functions allow peripherals to communicate without CPU intervention.
The PCB of the bio-signal acquisition system is shown in Figure 7a. The board size is 5.2 cm × 3.2 cm. The main hardware function modules are outlined in red and have been tested. An example of the HMD Bio Pad assembled in a DPVR E3 VR HMD is depicted in Figure 7b. Figure 7c shows the HMD Bio Pad worn by a subject in a VR scene.
4. HCI Interface Software Design
The human-computer interaction (HCI) interface provides a unified environment for researchers to acquire bio-signals, induce emotions and recognize emotions. It raises work efficiency and improves the user experience. The HCI interface is implemented in Python. The software is mainly divided into three parts: the data visualization interface, the experimental paradigm setting interface and the emotion modeling interface.
4.1. Data Visualization Interface
The data visualization interface is mainly used for real-time waveform display of the multimodal bio-signals, for setting the communication rate and port number, and for data storage. The visualization program mainly comprises two threads: one is responsible for data receiving, unpacking and verification, and the other for waveform drawing. The data receiving module receives bio-signal data from the MCU over Bluetooth in a fixed data packet format, which is shown in Figure 8. In this paper, the sampling frequency of EEG is 400 Hz. Since the other peripheral bio-signals are low-frequency signals, they are sampled at 100 Hz. Each data packet consists of 168 bytes, including a header byte (0x7F), a payload length byte (0xE6), the data payload and a CRC checksum byte. At these rates, 20 data packets of 168 bytes each must be transmitted per second, that is, one packet is sent to the PC every 50 ms. Therefore, the Bluetooth connection interval is set to 50 ms. Bluetooth sleeps when there is no data to transmit; batching the samples in this way reduces the number of transmissions and thereby the power consumption of Bluetooth.
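The packet arithmetic and the framing check can be sketched as follows. The rates, packet size and header byte come from the text; everything else is an assumption made for illustration: the CRC variant (CRC-8 with polynomial 0x07 and zero initial value), the fact that the checksum covers all preceding bytes, and the treatment of the payload as opaque bytes, since the exact field layout is defined only in Figure 8.

```python
# Sketch: per-packet sample counts, link rate and a frame check for the
# 168-byte packets described in the text.
# ASSUMPTIONS: CRC-8 (poly 0x07, init 0) over header + length + payload; the
# payload layout is not decoded because it is defined only in Figure 8.

PACKET_BYTES = 168
HEADER = 0x7F
PACKET_PERIOD_S = 0.050  # one packet every 50 ms


def samples_per_packet(rate_hz: float) -> int:
    """Number of samples of one channel carried by one packet."""
    return round(rate_hz * PACKET_PERIOD_S)


def link_rate_bps() -> float:
    """Raw application data rate in bits per second."""
    return PACKET_BYTES * 8 / PACKET_PERIOD_S


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 with zero initial value (assumed variant)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def extract_payload(packet: bytes) -> bytes:
    """Validate size, header and checksum; return the payload bytes."""
    if len(packet) != PACKET_BYTES:
        raise ValueError("unexpected packet size")
    if packet[0] != HEADER:
        raise ValueError("bad header byte")
    if crc8(packet[:-1]) != packet[-1]:
        raise ValueError("checksum mismatch")
    return packet[2:-1]  # skip header and length byte, drop checksum


if __name__ == "__main__":
    print("EEG samples per channel per packet:", samples_per_packet(400))
    print("peripheral samples per packet:", samples_per_packet(100))
    print(f"link rate: {link_rate_bps() / 1000:.2f} kbps")
```

The resulting rate of about 27 kbps is far below the 1447 kbps throughput quoted for the nRF52832, which leaves the radio idle for most of each 50 ms interval.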
After receiving a data packet sent over BLE, the PC unpacks and verifies it according to the data format shown in Figure 8. The time-domain and frequency-domain waveforms of the two-channel EEG signals show that the collected data contain some power line interference. Considering the low power and small size required of wearable devices, we did not include an analog 50-Hz notch filter in the AFE circuit of EEG. Instead, a digital comb filter is implemented on the PC to eliminate power line interference. The Filter Design & Analysis Tool (FDATool) in the MATLAB (MathWorks, Natick, MA, USA) Signal Processing Toolbox, which provides an interactive design environment, is used to design the filter. We choose an eighth-order infinite impulse response (IIR) comb filter with a 1-Hz bandwidth. The frequency response of the comb filter in Figure 9a shows an attenuation of 20.4 dB at 50 Hz and its harmonics. The difference equation of the comb filter is as follows:
where N = 8 is the order of the filter, and $x(n)$ and $y(n)$ represent the input and output signals, respectively. The filter coefficient generated by FDATool is 0.96852105385218623. The eye-blink EEG data collected from the forehead before and after passing through the comb filter are shown in Figure 9b. It can be seen that the 50-Hz power line interference is attenuated in the filtered EEG signal.
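A notching comb filter of this kind can be sketched in plain Python. The sketch assumes the conventional form y(n) = b·[x(n) − x(n−N)] + a·y(n−N), with b taken from the coefficient quoted above and a = 2b − 1 chosen so that the gain midway between notches is one; both the form and the value of a are assumptions, since the difference equation itself is not reproduced here. With a 400 Hz sampling rate and N = 8, the notches fall at multiples of 400/8 = 50 Hz.

```python
# Sketch: IIR notching comb filter, y[n] = B*(x[n] - x[n-N]) + A*y[n-N].
# ASSUMPTIONS: this difference-equation form and A = 2*B - 1 are conventional
# choices, not confirmed by the text; B is the coefficient quoted in the text.
import math

FS_HZ = 400                    # EEG sampling rate from the text
ORDER_N = 8                    # notches every FS_HZ / ORDER_N = 50 Hz
B = 0.96852105385218623        # coefficient quoted in the text
A = 2.0 * B - 1.0              # assumed feedback coefficient


def comb_notch(x):
    """Filter a sequence with zero initial conditions and return a new list."""
    y = []
    for n, xn in enumerate(x):
        x_delayed = x[n - ORDER_N] if n >= ORDER_N else 0.0
        y_delayed = y[n - ORDER_N] if n >= ORDER_N else 0.0
        y.append(B * (xn - x_delayed) + A * y_delayed)
    return y


def sine(freq_hz, n_samples):
    """Unit-amplitude sine wave sampled at FS_HZ."""
    return [math.sin(2.0 * math.pi * freq_hz * n / FS_HZ) for n in range(n_samples)]


def tail_peak(signal, tail=400):
    """Peak absolute value over the last `tail` samples (after settling)."""
    return max(abs(v) for v in signal[-tail:])


if __name__ == "__main__":
    for f in (10, 50):
        peak = tail_peak(comb_notch(sine(f, 4000)))
        print(f"{f} Hz input -> settled output peak {peak:.4f}")
```

An ideal comb of this form places exact nulls at 50 Hz and its harmonics (and at DC), whereas the response reported above shows a finite 20.4 dB attenuation, so the filter actually used may differ in detail from this sketch.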
The frequency of the emotion-related EEG signal ranges from 0.5 to 30 Hz, and after comb filtering there may still be high-frequency interference above 30 Hz. Therefore, a low-pass filter is applied to the output of the comb filter to further eliminate it. In this paper, a direct form II Butterworth IIR low-pass filter with a cut-off frequency of 30 Hz is designed. To support real-time display of the EEG signal, the filter order is set to 2. The feedback and forward paths are expressed by Equations (13) and (14), respectively:
where the coefficients are the feedback and forward filter coefficients. The feedback value is calculated by substituting the input signal into Equation (13), and the output signal is then calculated according to Equation (14). Figure 9c shows the frequency response of the second-order low-pass filter, and Figure 9d shows the eye-blink EEG waveform before and after the low-pass filter.
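The direct form II structure can be sketched as follows. The sketch computes second-order Butterworth coefficients with the bilinear transform (with frequency pre-warping) and then runs the two-step recursion: a feedback step w(n) = x(n) − a1·w(n−1) − a2·w(n−2) followed by a forward step y(n) = b0·w(n) + b1·w(n−1) + b2·w(n−2). The symbol names and the design route are assumptions for illustration; the coefficients used in the paper were presumably generated by a filter design tool and are not listed in the text.

```python
# Sketch: second-order Butterworth low-pass in direct form II.
# ASSUMPTIONS: coefficients come from the bilinear transform with pre-warping;
# the names w, a1, a2, b0..b2 are illustrative, not taken from the paper.
import cmath
import math

FS_HZ = 400.0   # EEG sampling rate from the text
FC_HZ = 30.0    # low-pass cut-off from the text


def butter2_lowpass(fc_hz, fs_hz):
    """Return (b, a) coefficient tuples of a 2nd-order Butterworth low-pass."""
    k = math.tan(math.pi * fc_hz / fs_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b = (k * k * norm, 2.0 * k * k * norm, k * k * norm)
    a = (1.0, 2.0 * (k * k - 1.0) * norm, (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a


class DirectForm2:
    """Sample-by-sample biquad using two shared delay elements."""

    def __init__(self, b, a):
        self.b, self.a = b, a
        self.w1 = 0.0  # w(n-1)
        self.w2 = 0.0  # w(n-2)

    def step(self, x):
        w0 = x - self.a[1] * self.w1 - self.a[2] * self.w2               # feedback path
        y = self.b[0] * w0 + self.b[1] * self.w1 + self.b[2] * self.w2   # forward path
        self.w2, self.w1 = self.w1, w0
        return y


def magnitude(b, a, freq_hz, fs_hz):
    """Magnitude of the frequency response at freq_hz."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


if __name__ == "__main__":
    b, a = butter2_lowpass(FC_HZ, FS_HZ)
    for f in (5.0, 30.0, 50.0, 100.0):
        print(f"{f:6.1f} Hz -> {20 * math.log10(magnitude(b, a, f, FS_HZ)):7.2f} dB")
```

Because only two state variables are kept per channel, each new sample costs five multiplications, which suits the real-time display requirement mentioned above.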
Since the other peripheral bio-signals are low-frequency signals, low-pass filters with different cut-off frequencies are designed for them using the same method. The filtered data are drawn in real time by the drawing thread. The data visualization interface also displays the real-time SKT and PR values, and it can save the collected data to a local file at any time.
4.2. Experimental Paradigm Setting Interface
The experimental paradigm setting interface is used to configure the experimental paradigm according to the researchers’ requirements. It mainly includes the following functions.
- (1) VR scene selection: Users select the VR scene required for the experiment from the VR scene library;
- (2) Parameter setting: Users set the duration of each phase of the subject’s immersion in the VR environment, including the prompt time, VR scene playback time and questionnaire time;
- (3) VR scene play: Users start the VR scene by clicking the ‘Play’ button. When the experiment time reaches the specified value, the VR scene stops playing automatically. The ‘Stop’ button and the ‘Reset’ button are used to stop playing the VR scene and to reset the parameters, respectively;
- (4) Interactive control: Users interact with the controls on the interface to change parameters and thereby send instructions to the system.
4.3. Emotion Modeling Interface
The emotion modeling interface mainly includes data pre-processing, feature extraction, feature dimensionality reduction, feature standardization, model training, model saving, model loading and emotion classification.
- (1) Data pre-processing: Digital IIR filters such as Butterworth and Chebyshev type I are provided. The user can choose among low-pass, high-pass, band-pass and band-stop filters and then set the sampling frequency, order, cut-off frequency and other parameters;
- (2) Feature extraction: Time-domain, frequency-domain, time-frequency-domain and nonlinear features can be extracted. The user can also choose multi-feature fusion across the multimodal bio-signals;
- (3) Feature dimensionality reduction: Dimensionality reduction methods, including principal component analysis (PCA) and linear discriminant analysis (LDA), are provided to avoid the “curse of dimensionality”. The user can set the target number of dimensions;
- (4) Feature standardization: Z-score standardization and min-max standardization are provided;
- (5) Feature classification: The user selects a common classifier and sets its parameters. The available classifiers are Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Bayesian Network (BN) and Decision Tree (DT).
The schematic diagram of the HCI interface is shown in Figure 10.
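The two feature standardization options listed above can be sketched in a few lines of plain Python. This is a minimal illustration for a single feature column; the function names are hypothetical, and the use of the population standard deviation in the Z-score is an assumption, since the text does not state which convention the interface follows.

```python
# Sketch: Z-score and min-max standardization of one feature column.
# ASSUMPTIONS: population standard deviation (divide by n); constant columns
# are mapped to zeros to avoid division by zero. Names are illustrative.
import math


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    pulse_rates = [62.0, 70.0, 78.0, 90.0]  # illustrative feature values
    print("z-score:", [round(v, 3) for v in z_score(pulse_rates)])
    print("min-max:", [round(v, 3) for v in min_max(pulse_rates)])
```

In practice the mean, standard deviation, minimum and maximum would be estimated on the training data and reused unchanged on new data, so that the classifier sees consistently scaled features.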