1. Introduction
Emotion is a human state that has been subject to extensive research in different fields, including but not limited to psychology [1,2,3], neuroscience [4], and human-computer interaction [5]. As a result of the growing interest in virtual reality, emotions shifted into the focus of virtual reality research [6,7]. To this end, prior research looked at the role of embodiment in VR [8,9,10], and, more specifically, how different approaches to embodiment influence emotions [11]. At the same time, there is no comprehensive investigation into how avatar personalization and gender influence our emotions in VR.
This knowledge is beneficial for several reasons. Social VR platform designers could provide more suitable avatar options to users, for example, in terms of experience, enjoyment, and efficiency. Therapy sessions conducted in VR might lead to better outcomes with knowledge of the emotional effects avatars have on users when resembling themselves. And finally, researchers working on avatar embodiment could be provided with insights into how avatar designs may affect participants’ affective states in a VR user study.
VR applications often support the embodiment of a user through virtual avatars, making avatars an integral part of VR environments. We see the use of avatars in various contexts (such as gaming [12,13], training [14], and socialization [15]). Avatars are defined as virtual characters driven by human behavior [16] and act as a user’s surrogate in the virtual environments (VEs) [17]. Avatar appearances and gender were also shown to have an impact on the actual perception of oneself and may result in a change in behavior [18].
Research showed ways in which the behavior of digital representations can be altered to induce positive effects [19]. Even though there is an understanding of the effects of avatar appearance (e.g., full-body avatars create a stronger presence in VR [20], physically attractive male and female avatars are rated more socially competent [21]), we lack knowledge on how different avatar appearances affect our emotions in VR. We address this research gap by exploring how different avatar appearances affect emotion and emotion elicitation in VR. Our work is guided by the following research question: How does the embodied avatar appearance based on personalization and gender affect user emotions in VR?
To answer this research question, we explored four different avatar appearances (personalized same-gender, personalized opposite-gender, non-personalized same-gender, and non-personalized opposite-gender; cf. Figure 1) in a within-subject user study with 40 participants, investigating the effect of avatar appearance on emotion and emotion elicitation. To evoke emotions among the participants, we used the autobiographical recall method [22]. Emotions were measured using the self-assessment manikin (SAM) [23] and physiological measures. We found significant differences in happiness with personalized same-gender avatars and personalized opposite-gender avatars.
We contribute towards understanding the effect of avatar appearance and embodiment on user emotions in VR by conducting a within-subject user study with 40 participants. The findings from qualitative and quantitative data analysis show that the autobiographical recall method can successfully elicit happiness among the participants with all avatars. Of the four kinds of avatars (personalized same- and opposite-gender, non-personalized same- and opposite-gender), significant differences in the emotional state were found with personalized same-gender avatars. Participants also reported the highest embodiment with a personalized same-gender avatar. We present qualitative feedback collected from participants to improve the design of personalized avatars.
This article is structured as follows: In Section 2, we present prior research. Section 3 presents our methodology and the conducted user study. Furthermore, we present our analysis and results in Section 4, followed by a discussion of the findings in Section 5. We conclude our work and suggest directions for future work in Section 7.
3. Methodology
We designed a within-subject user study (N = 40) to understand the effect of avatar appearance and gender on emotions in VR. Participants experienced four different avatars of two different genders. Each participant embodied all four avatars. To study the effect of different avatar appearances, we designed a total of four different avatars as follows:
Personalized Avatar with the Same Gender (PSG)
Personalized Avatar with the Opposite Gender (POG)
Non-Personalized Avatar with the Same Gender (NPSG)
Non-Personalized Avatar with the Opposite Gender (NPOG)
3.1. Apparatus
3.1.1. VR Environment
We created the VR environment using Unity3D and presented it to participants via an HTC VIVE Pro headset (2880 × 1600 pixels, 90 Hz, 110° field of view), using the SteamVR plugin. The VR environment was a virtual room where the participants were seated in front of a table and a mirror. The room had additional props, e.g., a window and a framed picture, to make it more lively. To avoid distraction, these props were kept very simple. Using the mirror, participants could view their embodied avatars. In addition, the mirror served to enhance participants’ sense of avatar embodiment. On the right side of the room, instructions were displayed to participants. The room setup is shown in Figure 2.
3.1.2. Autobiographical Recall Method
Autobiographical recall involves recalling intense emotional experiences and is a widely used method to induce emotions experimentally [23,39]. This method was selected as it does not require additional material that participants need to interact with. This allowed participants to focus on their avatar appearance reflected in the mirror. Happiness was chosen as the target emotion because autobiographical recall has been found to be effective in eliciting happiness [40], both in VR and in the real world [7]. It was also found that happy memories occur proportionally more often than unhappy memories [22].
3.1.3. Avatar Design
The avatars were generated with Ready Player Me (https://readyplayer.me, accessed on 22 February 2023), a tool that generates avatars based on photographs. It integrates with Unity and eases generating multiple avatars of one’s own and the opposite gender. Mouth movement from Ready Player Me was included to enhance embodiment. We created two different kinds of avatars.
To create non-personalized avatars, we used photographs of a male and female face from Braun et al. [41]. For the hair and eye color, the most common colors [42] in Europe were chosen for the non-personalized avatars, i.e., blonde hair and blue eyes. The avatars are shown in Figure 1d,e.
To generate a personalized avatar, a photograph of each participant was used. Minor adjustments were made to ensure that the generated avatar looked as similar as possible to the provided picture of the participant. The same picture was also used to generate the avatar of the opposite gender. The facial features were chosen mainly by Ready Player Me. Examples are shown in Figure 1b,c.
3.2. Measures
In order to answer our research question, we collected the following data:
We asked for participants’ ages, gender, profession, education, and prior experience with VR (scale from ‘no experience’ to ‘owner of a headset’).
Emotions have been measured traditionally in two ways: (1) by subjective self-reports and (2) by observing physiological responses while being exposed to an emotion elicitation task. To answer our research question, we collected both self-reported assessments and physiological data. All the scales used are standardized, well-established questionnaires. Additionally, the physiological responses considered are inspired by the literature [43,44].
The BFI-10 questionnaire was used to assess participants’ personalities. The questionnaire evaluates Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness [45] on a 5-point Likert scale.
To measure the degree of the experienced embodiment with each avatar, a virtual embodiment questionnaire (VEQ) [47] was used. The VEQ measures three main factors: the perceived body ownership over a virtual avatar (e.g., “It felt like the virtual body was my body”), the agency over a virtual avatar (e.g., “I felt like I was controlling the movements of the virtual body.”), and the perceived change in the body schema (e.g., “I felt like the form or appearance of my own body had changed.”). Each factor consists of 4 statements on a 7-point Likert scale.
We assessed participants’ emotional responses using the SAM scale, which is a pictorial self-assessment technique. It consists of three dimensions: valence, arousal, and dominance [23], rated on a 9-point Likert scale.
Previous work showed a correlation between emotions and colors [48]. In our work, we investigate if there is any correlation between color and the elicited emotion in VR. Hence, participants were asked to choose a color that best represents how they feel. A color palette was shown to them in the interface, and they selected a color from it using their controller. The choices available were yellow, red, green, blue, purple, and orange.
The participants were asked to rate how well they could identify with the shown avatar. The scale was from 1 to 7 (1 = not similar at all; 7 = perfect virtual representation). They were also asked what they liked and disliked about the avatar and how the avatar could be improved for a better rating.
In addition to self-report assessments, physiological data were recorded to observe physiological responses resulting from the emotions elicited by the use of different avatars. We measured the heart rate (HR), electrodermal activity (EDA), and temperature with the E4 Empatica wristwatch and processed the data using the Empatica analysis software service (https://www.empatica.com/research/, accessed on 17 March 2023).
We conducted a short interview after the study, where participants reflected and talked about their experience with the embodied avatars to see if they felt any differences. The interview was audio recorded and later transcribed.
3.3. Study Procedure
Prior to the study, each participant was asked to prepare four different happy stories so that they could elicit happiness with each of the four avatar appearances. In addition, they were asked to send a photo of themselves, which was used to create their personalized avatars.
As participants arrived at the lab, they gave consent to data collection (demographics, self-assessments, the physiological data, and the audio of the interviews). Then, they provided demographic data and filled in the personality questionnaire (BFI-10). Afterwards, they were shown the avatars they would embody during the study. For the personalized avatar, we let participants rate how well they could identify themselves with the shown avatar. The experimenter instructed them to recall their happy memory, taking as much time as they needed. Furthermore, they were instructed to look at the mirror at least once before emotion elicitation, so as to be aware of which avatar they currently represented.
We used multiple neutralization videos to help participants reach a neutral state after embodying each avatar [49]. After entering the VR scene, the first neutralization video was played. Afterwards, they had to fill in the SAM questionnaire and color picker. They were then asked to recall their first happy memory. Then the participants needed to answer the SAM scale, the color representing their feeling, and the embodiment questionnaire. After this, a new neutralization video was played. During the neutralization video, the avatar was switched. This process was repeated for the other 3 avatars. The order was counterbalanced using a Latin square. After the last elicitation, participants filled in the presence questionnaire. When participants left the VR scene, a post-interview was conducted. Participants recalled their experiences and we asked them how they felt. The study took approximately 30 min.
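As an illustration of the counterbalancing step, the following minimal sketch builds a cyclic 4 × 4 Latin square over the four avatar conditions and assigns one row per participant. The article does not state which Latin square variant or assignment rule was used, so the cyclic construction and the function names `latin_square` and `order_for_participant` are assumptions made for this example only.

```python
# Condition labels as defined in Section 3.
CONDITIONS = ["PSG", "POG", "NPSG", "NPOG"]


def latin_square(conditions):
    """Return a cyclic Latin square: each condition appears exactly once
    in every row (presentation order) and every column (position)."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def order_for_participant(participant_index, conditions=CONDITIONS):
    """Hypothetical assignment rule: participants cycle through the rows."""
    square = latin_square(conditions)
    return square[participant_index % len(conditions)]


if __name__ == "__main__":
    for p in range(4):
        print(p, order_for_participant(p))
```

With 40 participants and four rows, this assignment would use each presentation order ten times. A cyclic square balances position but not which condition precedes which; a balanced (Williams) design would be needed to control first-order carry-over as well.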
4. Analysis and Results
We present the analysis of the data collected. Embodiment, SAM scale, and the physiological data were first tested for normality with the Shapiro–Wilk test before further analysis. If not normally distributed (p < 0.05), a non-parametric test was chosen.
4.1. Participants
A total of 40 participants (21 females, 19 males) with an average age of 25.55 years (, , ) were recruited for the study through mailing lists and social media platforms. About half of the sample () consisted of students. Seventeen participants had a bachelor’s degree (42.5%), and seven had a master’s degree (17.5%).
Regarding prior VR use, 16 participants reported that they had used VR only once (40%), 10 reported an occasional use (25%), nine had no experience (22.5%), four used it often (10%) and one was a VR device owner (2.5%). All participants were healthy individuals.
4.2. Big Five Inventory
The BFI-10 [50] results and reference group are shown in Table 1. The reference group is 18–35 years old, with high education, and mixed gender, which fits the demographics of our participants. Compared to the reference group, our participants had on average a lower extraversion and conscientiousness score but a higher neuroticism score.
4.3. Presence
To assess how participants perceived the environment, the total Presence score was calculated. The results show a Presence score of 4.92 () out of 7, which indicates that participants felt quite present in the virtual environment during the study and, thus, no particular bias is expected.
4.4. Embodiment
The descriptive embodiment scores are depicted in Table 2. Due to an issue with data recording, the embodiment data of the first 13 participants were lost. We therefore conducted the analysis of the embodiment measures with the remaining 27 participants (12 males and 15 females). This subsample had a similar distribution to the main sample, with an average age of years (, , ).
Shapiro–Wilk tests showed that the data were not normally distributed for some of the embodiment factors assessed. Therefore, we performed individual pairwise comparisons and report Wilcoxon Signed-Rank tests for the comparisons where the normality assumption was violated, and t-tests for those where normality could be assumed.
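To make the paired non-parametric comparison concrete, the sketch below computes the Wilcoxon Signed-Rank statistic and a Z value for two sets of paired ratings. It is an illustration only: the article does not say which software or which variant of the test was used, so dropping zero differences, averaging tied ranks, and using the plain normal approximation (without tie or continuity correction) are assumptions, and the function name is hypothetical. Statistical packages may therefore return slightly different Z values.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon Signed-Rank test (sketch).

    Returns (w_plus, w_minus, z), where z uses the normal approximation
    of the smaller rank sum. Zero differences are discarded.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)

    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Mean and standard deviation of the rank sum under the null hypothesis.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd if n > 0 else float("nan")
    return w_plus, w_minus, z
```

For example, `wilcoxon_signed_rank(psg_ownership, npsg_ownership)` would compare two hypothetical lists holding each participant's ownership score under PSG and NPSG, in the same participant order.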
4.4.1. Effects of Avatar Personalization on Embodiment
To assess the effect of personalization, the personalized same-gender (PSG) and non-personalized same-gender (NPSG) avatars were compared. A Wilcoxon Signed-Rank test showed a significant difference (, ) for ownership, with the personalized condition PSG () being rated higher than the non-personalized condition NPSG (). A significant difference was also found for agency (, ). Again, PSG () was rated higher than NPSG (). We did not find a significant difference for the change factor (, ), with the medians for PSG and NPSG being equal; .
We also compared the personalized opposite-gender (POG) and non-personalized opposite-gender (NPOG) avatars. A Wilcoxon Signed-Rank test shows no significant difference (Z = −1.766, p = 0.077) in ownership between POG () and NPOG (). For agency, there is no significant difference (Z = −0.263, p = 0.793), with the median for POG and NPOG being . For change, there is no significant difference (Z = −0.238, p = 0.812) with the median for POG () and NPOG ().
In summary, there is a stronger embodiment perception for the personalized same-gender avatar compared to the non-personalized same-gender avatar in ownership and agency. No significant difference was found between non-personalized or personalized avatars of the opposite-gender.
4.4.2. Effects of Avatar Gender on Embodiment
We first compared the personalized same-gender (PSG) and personalized opposite-gender (POG) avatars. Wilcoxon Signed-Rank test showed a significant difference (, ) in ownership, with the median for PSG () being higher than for POG (). For agency, there is a significant difference (, ), with PSG () being higher rated than POG (). For the change, we found no significant difference (, ; for PSG, for POG).
When comparing the non-personalized same-gender (NPSG) and non-personalized opposite-gender (NPOG) avatars, we found no significant difference (, ) for ownership. Similarly, no significant difference was found for agency (, ; for NPSG, for NPOG) or change (Z = 0.310, p = 0.757; for NPSG, for NPOG).
In summary, significant differences were found between avatar gender for the personalized avatars. A personalized avatar of the same-gender is favored regarding ownership and agency. However, there is no significant difference in embodiment between avatar gender when comparing the non-personalized avatars.
An exploratory 2 (avatar gender) × 2 (avatar personalization) analysis of variance (ANOVA) corroborates the above results for gender and personalization and showed respective main effects (). Pairwise comparisons revealed an influence of avatar gender only in the personalized avatar conditions (), whereas an influence of personalization was only visible in conditions where the avatar had the same-gender as the participant (). Controlling for the participant gender using analysis of covariance (ANCOVA) did not show any significant influence of the participant gender on the results.
4.5. Emotion Elicitation
During the study, we collected a total of five SAM assessments: one before the emotion elicitation and the other four for each avatar after the emotion elicitation. Box-plot diagrams for each dimension of the SAM assessments are shown in Figure 3. Shapiro–Wilk tests revealed that the data were not normally distributed for some of the SAM assessments. We therefore performed individual pairwise comparisons and report Wilcoxon Signed-Rank tests for the comparisons where the normality assumption was violated, and t-tests for those where normality could be assumed.
4.5.1. Baseline
We first compared the emotional rating for each condition to the baseline. Wilcoxon Signed-Rank tests showed that all emotion dimensions were significantly different from the baseline in all conditions ().
4.5.2. Effects of Avatar Personalization on Emotion
First, the personalized same-gender (PSG) and non-personalized same-gender avatars (NPSG) are tested for significant differences. A Wilcoxon Signed-Rank test () shows significant differences (, ) in valence, with PSG () being higher rated than the NPSG (). For arousal, there is a significant difference (, ), with the median of PSG () being higher than the median of NPSG (). For dominance, there was no significant difference (, ). The median of PSG () is slightly higher than the median of NPSG ().
Second, the personalized opposite-gender (POG) and non-personalized opposite-gender (NPOG) avatars were compared. A Wilcoxon Signed-Rank test shows no significant difference (, ) in valence, with POG () being higher than NPOG (). For arousal, there is a significant difference (, ) with the median of POG () being higher than NPOG (). For dominance, there is no significant difference (, ), with POG and NPOG resulting in a median of .
In summary, there is a significant difference between the personalized avatar and non-personalized avatar of the same avatar gender as the participant, which is shown by a stronger emotion elicitation for the personalized avatar in valence and arousal with a mean increase of 1 in valence and 1 in arousal. For the opposite gender, a significant difference was only found for arousal, where an increase of 1 in arousal can be observed for the personalized opposite-gender avatar compared to its non-personalized counterpart.
4.5.3. Effects of Avatar Gender on Emotion
We first compared the two personalized avatars of the same and opposite gender. A Wilcoxon signed rank test shows no significant difference (, ) in valence, with the median for PSG of being higher than for POG (). For arousal, there is no significant difference (, ), with the median of PSG () being higher than POG (). For dominance, there is no significant difference (, ). The median is the same for PSG and POG ().
Furthermore, we compared the two non-personalized avatar types. A Wilcoxon Signed-Rank test shows no significant difference (, ) in valence, with the median of NPSG () being higher than NPOG (). For arousal, there is no significant difference (), with the median of NPSG () being higher than NPOG (). For dominance, there seems to be no significant difference with (, ), with the median for NPSG and NPOG being .
To conclude, no significant differences were found for the avatar gender, neither when comparing the personalized avatars nor when comparing the non-personalized avatars.
The above-stated effects were tested in an exploratory 2 (avatar gender) × 2 (avatar personalization) ANOVA. The results corroborate the above-stated findings with the main effects of avatar personalization. Pairwise comparisons confirmed that for arousal, personalization led to a significant effect in both gender types (). However, the effects for valence and dominance were only present in the same-gender avatars (). Exploratory ANCOVAs using gender, age, and the dimensions of the Big Five Inventory as covariates revealed a significant impact of agreeableness for the arousal ratings, a significant impact of extraversion and neuroticism for the dominance ratings, and a significant impact of conscientiousness for the valence rating (all ).
4.6. Color Choice
Since there were six choices for the colors, an equal distribution would mean that each color was picked by chance in 16.7% of the cases. Before emotion elicitation, the colors appear to be distributed rather evenly, with the top three being orange (25%), followed by blue (22.5%) and yellow (20%). After elicitation, blue generally seems to be chosen less often compared to green. This is especially noticeable for the personalized same-gender avatar, where the most frequently picked color was green (30%), followed by yellow (25%). In contrast, the other three avatars seem to share a similar pattern, with yellow being the most picked color. For the personalized opposite-gender avatar, yellow (25%) is followed by orange (22.5%). For the non-personalized same-gender avatar, yellow (27.5%) is followed by orange (22.5%) and green (22.5%). For the non-personalized opposite-gender avatar, yellow (35%) is followed by green (22.5%) and orange (17.5%). In comparison, the second least picked color is red. The least picked color was purple, ranked last for every avatar in this study. In summary, after recalling their happy memory, participants favored the colors green, orange, and yellow compared to the other colors.
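The chance level and the participant counts behind the reported percentages can be checked with a few lines of arithmetic. The sketch below assumes that every percentage refers to the full sample of 40 participants; the helper names are hypothetical.

```python
N_PARTICIPANTS = 40
COLORS = ["yellow", "red", "green", "blue", "purple", "orange"]


def chance_share_percent(n_options):
    """Share (in percent) each option would receive if picked uniformly at random."""
    return 100.0 / n_options


def share_to_count(share_percent, n=N_PARTICIPANTS):
    """Convert a reported percentage back to a number of participants."""
    return round(share_percent / 100.0 * n)


if __name__ == "__main__":
    print(f"chance level: {chance_share_percent(len(COLORS)):.1f}%")
    for label, share in [("orange", 25.0), ("blue", 22.5), ("yellow", 20.0)]:
        print(label, share_to_count(share))
```

Under this assumption, the pre-elicitation shares of 25%, 22.5%, and 20% correspond to 10, 9, and 8 participants, respectively.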
4.7. Physiological Data
The time intervals for each avatar were calculated with the help of time stamps. Physiological data were averaged for the different time sections of the study, which are: (1) The time when participants elicit their emotion, (2) the baseline before the emotion elicitation, which is the time of the previous neutralization videos before each elicitation, (3) the time before they entered the VR and (4) the time after they left the VR.
Since each participant has different absolute physiological values, changes were calculated in percentages. For calculation, the given time section (before VR, baseline, and after VR) was used as the base value. The results are shown in Table 3, Table 4 and Table 5. Shapiro–Wilk tests showed that the majority of measures for temperature, heart rate, and EDA deviate from a normal distribution (p’s ). We performed Friedman tests to analyze the data.
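The percentage change described above can be sketched as follows: a signal is averaged within a time section delimited by time stamps, and the elicitation average is then expressed relative to the chosen base section. The data layout (a list of `(timestamp, value)` pairs), the half-open interval convention, and the function names are assumptions made for this example; the article does not describe the processing at this level of detail.

```python
def mean_in_window(samples, start, end):
    """Average the values whose time stamp t satisfies start <= t < end."""
    values = [value for t, value in samples if start <= t < end]
    if not values:
        raise ValueError("no samples in the requested time section")
    return sum(values) / len(values)


def percent_change(elicitation_mean, base_mean):
    """Change of the elicitation average relative to the base section, in percent."""
    if base_mean == 0:
        raise ValueError("base value must be non-zero")
    return (elicitation_mean - base_mean) / base_mean * 100.0


if __name__ == "__main__":
    # Toy skin-temperature trace: (seconds, degrees Celsius), invented values.
    trace = [(0, 30.0), (1, 30.0), (2, 33.0), (3, 33.0)]
    base = mean_in_window(trace, 0, 2)         # e.g., the baseline section
    elicitation = mean_in_window(trace, 2, 4)  # e.g., the elicitation section
    print(f"{percent_change(elicitation, base):.2f}%")
```

A positive result indicates that the measure was higher during elicitation than in the base section, and a negative result indicates a decrease.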
We found that for all participants, the temperature increased during emotional elicitation using autobiographical memory, compared to the temperature before entering the virtual environment. A Friedman test was used and showed no significant differences before VR, baseline, and after VR between the avatars ().
An analysis of the heart rate showed a decrease in the heart rate when emotion elicitation happens compared to the heart rate before entering VR. The heart rate during elicitation is also lower compared to the after-VR heart rate. Friedman tests showed no significant differences in any of the avatar conditions between the time sections ().
The largest percentage changes can be found for EDA. A decrease can be observed in EDA when recalling the memory compared to the before-VR EDA. For the baseline, the difference is less than one percent for POG and NPOG, but a decrease of around four to five percent can be seen for PSG and NPSG. The EDA during elicitation is also lower compared to the after-VR EDA. The largest difference here can be found for NPSG with a 26.54 percent difference. Friedman tests showed no significant differences between the avatars for all time sections ().
4.8. Avatar Similarity
For the personalized avatar of the same and opposite gender, participants were asked how well they can identify with the avatar. The mean value is () for the avatar with the same gender and () for the avatar with the opposite gender. A breakdown of the rating is shown in Table 6. For both genders, the personalized same-gender avatar has a higher score than the personalized opposite-gender avatar. The data suggest that male participants give slightly higher scores for the avatar with the opposite gender compared to female participants.
4.9. Participant Feedback
Most participants felt more comfortable with their personalized same-gender avatars. This was followed by the personalized opposite-gender avatar. Opinions regarding the generic avatars were mixed. Some participants mentioned that when they saw the non-personalized avatar in the mirror, they felt as though it was another person in the room. P02 stated, “At first I thought that I can generally identify more with males but when I saw the non-personalized man I had the feeling that it was a second person in the room and not me. I had this feeling [of identity] more towards the female avatars”. P05 said, “Sometimes I did not feel like myself when looking in the mirror”.
The participants were also asked to rank the avatars based on how well they could identify with the avatars after recalling a happy moment with each of them. The data suggest that people identify more with personalized avatars. The personalized same-gender avatar was ranked first most often, followed by the personalized avatar of the opposite gender.
The opinions on the avatar’s clothes were generally positive or indifferent. Some female participants noted that the clothing for the female avatar was too serious and, thus, they lowered the rating since the avatar felt very stiff to them. The body size was generally rated positively, but some participants mentioned that the avatar looked too muscular or too thin, depending on the participant’s own body. The feedback on the proposed hairstyle for the avatar was overall very positive. Facial features that needed improvement were the eyebrows, a slimmer face, a better jawline, and a more similar-looking nose.
As seen in Table 6, the avatar rating for the opposite gender was on average lower compared to the avatar of the same gender. A common remark was that participants felt that the avatar of the opposite gender should automatically receive a lower identity score compared to the avatar of their own gender. Another reason for a lower score was that the participants had a harder time identifying with the shown avatar because they felt no connection to it, since they had no reference for what their real-life opposite-gender appearance would look like.
5. Discussion
We conducted a VR experiment to investigate the effect of different avatar appearances in terms of personalization and the avatar’s gender on embodiment and emotion.
To confirm previous findings and to investigate interactions with gender, we compared the two levels of personalization as well as the two levels of avatar gender with regard to embodiment. From the literature, we expected that personalization leads to a stronger embodiment perception, specifically with regard to the perception of virtual body ownership [10,47]. We found such an impact for body ownership, as expected, but also on the level of agency. Since we did not change any control paradigms, the latter may be explained by the strong association and link between the concepts of body ownership and agency [47,51]. One of the most interesting findings of our study is that these effects seem to be limited to the avatars of the same gender but to be rather strong here, whereas this effect was not visible when the avatars had the opposite gender of the participant. In a very similar fashion, the avatar gender was influencing the level of embodiment significantly, but only in the personalized avatar condition.
This is particularly interesting because it suggests that an “ideal” avatar is of the same gender and personalized. In this way, a high score with regard to embodiment can be achieved, which might lead to positive effects, for example, in therapy, social VR applications, and the like.
Furthermore, we compared the effects of different avatar appearances on emotion using the autobiographical recall method, eliciting a happy emotion. This study answers an interesting question about emotion elicitation. Prior research investigated the autobiographical recall method in VR using the think-aloud method in the absence of any avatars. Findings suggest that users find it difficult and also silly at some point to speak aloud without anyone present virtually. In our study, participants recalled past life events in front of a mirror reflection that embodied either their own self or a non-personalized avatar.
Two findings are interesting. Firstly, the comparison to the baseline indicates that emotion can be successfully elicited using the recall method when an avatar is present in the virtual scene, which extends previous findings. Results show that this method successfully elicited the desired emotions across the avatar types.
Secondly, we found that the level of personalization affects the elicited emotion. Based on our findings, we believe using self-reflection can be a successful method of eliciting emotions in VR environments. Qualitative feedback from participants also suggests that participants feel comfortable using this method.
We consider three measures to understand the effectiveness of this method: the SAM scale, the physiological data, and the feedback gathered in the interview. Every avatar led to an increase in valence, arousal, and dominance compared to values obtained before the elicitation. The physiological data, consisting of heart rate, EDA, and temperature, can be found in Table 3, Table 4 and Table 5. To see if emotion elicitation caused a physiological response, the average of the physiological data gathered during emotion elicitation is compared with the data obtained (1) before the participants entered VR and (2) the baseline, which is the time frame of the previous neutralization video before emotion elicitation. Kreibig’s review of the autonomic nervous system shows that eliciting happiness should cause an increase in temperature, heart rate, and EDA [52].
The interview revealed that nearly all participants felt happy when recalling their memories with the different avatars. While some noted that with some avatars it was easier or more intense, the general opinion was that emotion elicitation works with every avatar. Using the autobiographical recall method, eliciting happiness is possible with all avatars.
Generally, the personalized same-gender avatar seems to perform best in various aspects. It had the highest embodiment score, its avatar rating compared to the personalized opposite-gender avatar was higher, its SAM score regarding valence was the highest and the interview showed that the avatar was often placed first when asked to rank all embodied avatars. These results show that, if an avatar is to be used, it should closely resemble the user for eliciting the happy emotion.
The average age of participants was rather young (25.55) and many participants mentioned how they are used to embodying different avatars from video games. Therefore, different-gender avatars might not trigger any discomfort. This absence of discomfort when embodying different avatars might also explain the average presence score of 4.92 out of 7 which means that participants, despite embodying various avatars in a short period of time, felt predominantly present during the study.
Significant differences were found between the avatars for embodiment and SAM score. Regarding embodiment, the personalized same-gender avatar has a significantly higher score for ownership (acceptance) and agency (control) compared to the non-personalized same-gender avatar. There is also a significant difference between both personalized avatars: the avatar of the same gender was found to have a significantly higher ownership and agency score. For SAM values, there are also significant differences between the personalized and non-personalized avatars of the same gender, where valence and arousal are significantly higher for the personalized same-gender avatar. A significant difference was also found between the personalized and non-personalized avatars of the opposite gender, where arousal levels are higher for the personalized avatar of the opposite gender.
In summary, to answer our research question, the data in this study show that personalized avatars perform better than non-personalized avatars regarding embodiment score and SAM score, and they were rated higher regarding identity. In addition, this study found significant differences between avatars for embodiment and SAM score. Furthermore, the data hint at a correlation between color and embodiment for some avatars. Also, there are additional correlations found between personality traits, SAM score, and embodiment.
6. Limitations
We acknowledge that our work has limitations. Firstly, while physiological data from the participants were recorded through the E4 watch, this data might contain inaccuracies as participants were constantly moving their arms during the study which might cause the sensor to record inaccurate data. The psycho-physiological literature discusses a range of challenges related to the (real-time) assessment of a person’s physiological information. Most important are the ‘Baseline Problem’, the ‘Timing of Data Assessment Problem’, and the ‘Intensity of Emotion Problem’ [53].
Secondly, autobiographical recall for the same emotion might be limited to a maximum number of uses. Some participants felt that they were already happy and recalling another happy memory did not cause a further change in mood.
Thirdly, Ready Player Me was used for the avatar creation. This stylized avatar has its advantages as it tries to avoid the Uncanny Valley effect, an effect where an increase in robot or avatar realism might actually lead to an acceptance decrease [54]. However, Ready Player Me also struggles to create avatars with more unusual faces, for example, smaller eyes or a bigger nose. As a result, the avatars from Ready Player Me looked somewhat similar and differences between the avatars might be subtle.
Fourthly, we conducted our study with 40 people only. While future work could explore how the findings generalize to larger populations, for example by means of out-of-the-lab studies [55], our exploration still demonstrates the influence of avatar personalization on emotions in VR. Moreover, we did not consider the impact of the game itself nor the different groups of game players on the attitude towards the specific avatar.
Lastly, we considered healthy individuals only, leaving an investigation of users with potentially influencing factors for future work.