1. Introduction
The rapid development of digital technology and the incorporation of artificial intelligence (AI) have profoundly transformed the way humans and machines interact. In particular, the advent of human-like machines (HLMs) has enhanced the user experience (UX) by making interactions more natural and effective [1]. HLMs can communicate with users by mimicking human language, emotions, and behavior, and they are now being utilized in various fields, including smart homes, healthcare, and education [2,3,4]. Understanding the psychological and social impact of interacting with HLMs is therefore crucial for both technology design and UX research.
Early research in this area primarily focused on the visual attributes of HLMs. The external design of HLMs significantly affects the UX, with facial expressions and emotions playing a pivotal role in user interaction. A number of studies have shown that robot and humanoid designs that emulate human appearance can elicit positive cognitive and emotional responses from users [5,6,7,8,9,10]. For example, HLMs capable of expressing emotions in a natural manner have been observed to enhance user trust and intimacy. However, the “uncanny valley” phenomenon, whereby users experience discomfort or eeriness when a machine appears almost but not entirely human, remains a significant challenge [11,12,13]. To address this issue, researchers have studied how the sophistication and subtlety of HLMs’ detailed facial movements and emotional expressions affect user perception and emotion across various interaction contexts and scenarios.
Recently, advanced AI technologies such as ChatGPT, which is based on Large Language Models (LLMs), have enabled more sophisticated interactions with machines through a multitude of modalities, such as text, images, and voice. In particular, voice-based interactions are becoming increasingly important [14,15]. Previously, users interacted with chatbots through structured, text-based dialogue; now they can converse with machines by voice almost as naturally as they would with another person. This allows users to interact with HLMs more intuitively and naturally, making the technology more accessible and providing a positive UX. Accordingly, research on HLMs across various voice-based applications is expanding.
The application of HLMs and research on voice-based interactions are increasingly important in the field of autonomous vehicles. As the driver’s role changes with the development of autonomous driving technology, the way drivers interact with their vehicles also changes. Depending on the level of technology deployment, vehicles can take over some or most driving tasks, allowing drivers to pay more attention to secondary tasks [16,17,18]. This expansion in how drivers interact with their vehicles necessitates changes to the user interface, and voice-based interactions will play a crucial role in this environment. While visual or touch-based interactions have the potential to distract drivers, voice interactions allow for effective communication between drivers and their vehicles while maintaining visual focus on the road [19]. Additionally, in various secondary task situations, such as watching videos, playing games, or reading, voice-based interactions can effectively capture the driver’s attention and facilitate intuitive interactions. Human-like dialogue with HLMs enhances the UX because the agent understands the driver’s intent and context and responds naturally. Nevertheless, there is still a lack of UX research on the nature and design direction of voice interactions between drivers and HLMs in autonomous vehicles.
Mixed-method (MM) research, which integrates quantitative and qualitative research techniques, is an appropriate methodology for studying voice interactions with HLMs in autonomous vehicles. This research method facilitates the development of a comprehensive understanding of user behavior, preferences, and the complex emotional responses that arise from interacting with technology [20]. Quantitative research is useful for statistically analyzing interactions between users and HLMs to identify general trends and patterns. In contrast, qualitative research provides a more profound understanding of individual users’ experiences, perceptions, and technology use, exploring aspects of the UX that are challenging to capture through quantitative data alone. This integrated approach allows for the examination of the various variables and contexts that may arise when designing and evaluating interactions with HLMs in autonomous vehicles.
This research aims to understand the impact of HLMs on the UX of young adults during voice interactions between drivers and autonomous vehicles through MM research methods. Additionally, it seeks to propose design directions for HLMs in autonomous vehicles. It is expected that this study will uncover the nuanced relationship between intimacy, trust, and perceived safety when interacting with HLMs, and suggest design recommendations for voice agents in autonomous vehicles.
The remainder of this paper is organized as follows: Section 2 presents a review of related work on anthropomorphism in voice agents and the MM approach. Section 3 describes the methodology, including details of participants and the experimental setup. Section 4 presents the results of this study, followed by a discussion in Section 5. Finally, Section 6 concludes the paper with a summary of contributions, key findings, and suggestions for future research.
2. Related Works
2.1. Anthropomorphism in Voice Agent Applications
The application of anthropomorphic design elements in voice agents has been extensively explored across various domains. Baird et al. [21] investigated synthesized voices and found that achieving complete human likeness remains a challenge. Wagner and Schramm-Klein [22] emphasized that human-like characteristics, such as social behavior and personality, positively influence user acceptance of digital voice assistants. These studies support the broader applicability of anthropomorphism for enhancing trust across different AI technologies. Voice characteristics and interaction styles also play a crucial role in shaping user trust and acceptance, a point underscored by Seaborn et al. [23] in their survey of the social function of voice in human–agent interactions. Abdulrahman and Richards [24] found that both human and synthetic voices could build trust, challenging the assumed preference for human voices and suggesting that synthetic voices can deliver equivalent benefits when designed with context-specific considerations. In the context of healthcare applications, Hagens [25] examined the design of trustworthy voice assistants, emphasizing the need for anthropomorphic design to enhance user trust and engagement. Hu and Lu [26] explored the dual humanness of conversational AI, showing that both speaking and listening capabilities contribute to the establishment of trust, but that an imbalance between these capabilities can hinder user acceptance. Pias et al. [27] studied the impact of voice characteristics, such as tone, age, and gender, on the persuasiveness of voice assistants, and found that these factors significantly influence user engagement and decision-making. Additionally, Calahorra-Candao and Martín-de Hoyos [28] focused on the impact of anthropomorphism on perceived safety in voice shopping, revealing that human-like attributes in virtual assistants significantly enhance user trust and safety perception.
The integration of anthropomorphic features in autonomous vehicles (AVs) has also been extensively studied, with findings indicating that such features have a significant impact on user trust, acceptance, and interaction quality (Table 1). A key study by Waytz, Heafner, and Epley [29] demonstrated that users are more likely to trust AVs when they possess human-like characteristics, such as names, gender, and voices. This foundational research highlighted the potential of anthropomorphism to mitigate the trust gap in autonomous systems, making the technology appear more competent and reliable. Further extending this understanding, Niu, Terken, and Eggen [30] found that anthropomorphic information significantly enhanced trust compared to purely symbolic information: when human-like features were added, users perceived AVs as more relatable and functional. This highlights the importance of integrating anthropomorphic design elements to foster a social connection between users and AVs.
Voice characteristics also play a crucial role in influencing user perceptions. Lee, Ratan, and Park [31] showed that voice agents designed to align with gender stereotypes (informative male and social female) were perceived as more useful and user-friendly. This suggests that users have intrinsic expectations about voice characteristics that can be leveraged to enhance the acceptance of AVs. Dong, Lawson, Olsen, and Jeon [32] compared different voice agent embodiments and discovered that a female voice-only agent was perceived as more likable, comfortable, and competent. This finding highlights the impact of voice gender on user comfort and trust. Similarly, Im, Sung, Lee, and Kok [33] demonstrated that synthetic voices could be as effective as human voices, especially for functional tasks. These results suggest that the selection between human and synthetic voices should be context-dependent, with synthetic voices offering advantages in specific scenarios.
Interaction strategies incorporating human-like conversational behaviors significantly enhance the UX and trust. Large et al. [34] explored the role of natural language conversation in fostering trust within AVs. They found that participants sought both functional and emotional connections with the vehicle, indicating that conversational interfaces could create a more engaging and trustworthy UX. Lee, Sanghavi, Ko, and Jeon [35] investigated the effects of informative versus conversational speech styles in AV agents. Their findings showed that conversational agents were perceived as more competent and socially present, suggesting that human-like interaction styles could improve trust and satisfaction. Moreover, Wang et al. [36,37] studied the combined effects of speech style and embodiment on driver perception and performance. They concluded that conversational and embodied agents promoted likability and perceived warmth, contributing to safer driving behaviors and an enhanced UX.
While existing research highlights the importance of anthropomorphic design elements in voice characteristics and interaction strategies for enhancing user trust and acceptance of voice agents, several limitations persist. Many studies have focused on individual aspects of anthropomorphism or specific user interactions, often neglecting the combined impact of these elements on the overall UX. Additionally, previous research has predominantly utilized either quantitative or qualitative methods in isolation, limiting the depth and comprehensiveness of the findings. This study aims to bridge these gaps by employing a mixed-method approach to investigate the combined effects of voice characteristics and humanized speech strategies in a fully autonomous vehicle context. By integrating both quantitative and qualitative data, our research provides a more nuanced understanding of how these factors influence the UX and trust. Furthermore, findings from existing studies highlight that anthropomorphism does not always have a positive impact; the combination of anthropomorphic strategies and their application context can yield varying effects. This underscores the need for further multifaceted research to explore these dynamics comprehensively.
Table 1. Studies on anthropomorphic voice agents in AVs.
Study | Anthropomorphism Strategy | Evaluation Method | Evaluation Metrics | Experimental Environment | Key Findings |
---|---|---|---|---|---|
Waytz et al. [29] | Names, Gender, Voice | Quantitative | Trust, Functionality | Simulation environment | Trust in autonomous vehicles increased with human-like features. |
Niu et al. [30] | Human-like information | Quantitative | Trust | Simulation environment | Anthropomorphized information significantly enhances trust. |
Lee et al. [31] | Gendered voice | Quantitative | Usefulness, User-friendliness | Simulation environment | Voices matching gender stereotypes are perceived as more useful. |
Dong et al. [32] | Female voice | Quantitative | Likeability, Comfort, Competence | Simulation environment | Female voice agents are perceived as more likable, comfortable, and competent. |
Wang et al. [36] | Speech style, Embodiment | Quantitative | Likeability, Warmth | Simulation environment | Conversational and embodied agents are rated higher in likeability and warmth. |
Wang et al. [37] | Conversational voice | Quantitative | Driving performance, Likeability | Simulation environment | Conversational agents enhance driving performance and are preferred by users. |
Large et al. [34] | Natural language conversation | Quantitative | Trust, Engagement | Simulation environment | Conversational interfaces increase trust and engagement. |
Im et al. [33] | Human and synthetic voice | Quantitative | Fluency, Competence, Attitude | Simulation environment | Synthetic voices can be as effective as human voices in certain contexts. |
Our study | Human and synthetic voice, Humanized speech | Mixed methods | Intimacy, Trust, Intention to use, Perceived safety, Perceived anthropomorphism | Simulation environment | Higher levels of anthropomorphism increase perceived intimacy and anthropomorphism, but do not significantly impact trust or perceived safety. |
2.2. Mixed-Method Approach
Mixed-method research combines quantitative and qualitative research techniques to harness the strengths of both approaches, allowing for a more comprehensive analysis of research questions. This method is particularly useful in fields in which understanding complex human behaviors and interactions is crucial, such as in the development and assessment of voice agents in autonomous vehicles. By integrating numerical data with rich textual data, mixed methods provide a holistic view of the subject matter, capturing both the breadth and depth of UXs. The primary advantage of employing a mixed-method approach is its dual capability to generalize findings through quantitative data while also providing contextual depth through qualitative insights [38,39]. This approach not only enhances the reliability of the research findings but also ensures that the data interpretation is grounded in actual UXs and perceptions [40].
However, the main challenge lies in the complexity of data integration and the need for extensive resources to manage two distinct types of data collection and analysis processes effectively [41]. Mixed methods are particularly widely adopted in emerging research fields that focus on uncovering more profound insights into interactive technologies. This approach is highly beneficial in areas in which UXs are not yet fully understood and when there is a significant need to explore the complex interactions between humans and technology [42]. For instance, quantitative data might reveal how users generally feel about the helpfulness of a voice agent, while qualitative data can provide deeper insights into why users feel a certain way, perhaps pointing to specific features of the voice or interaction style that influence user satisfaction [43].
Quantitative data, often gathered through structured tools like surveys or experiments, are analyzed using statistical techniques such as descriptive statistics (mean, standard deviation) to summarize trends and inferential statistics (e.g., ANOVA, regression) to explore relationships or differences across variables. This allows researchers to identify patterns or significant differences among groups. On the other hand, qualitative data are collected through methods like interviews, focus groups, or observations, and analyzed using techniques such as coding and thematic analysis. Additionally, more advanced methods like network analysis or text mining can be employed to identify patterns and extract themes from large datasets. These techniques help uncover the deeper, contextual meanings behind participants’ responses, explaining the reasons behind trends identified in quantitative data. The key challenge lies in integrating these two types of data. Common methods such as explanatory sequential design, in which quantitative results are followed by qualitative exploration to explain the findings, or convergent design, in which both types of data are collected simultaneously and then compared for consistency, are often used. Triangulation is another widely used method, in which findings from different data sources are cross-validated to ensure robustness. This combined approach enhances both the reliability and richness of the research findings, ensuring that they are robust and contextually meaningful.
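To make the integration step concrete, the sketch below shows one minimal convergent-design merge in Python: descriptive statistics from subjective ratings placed alongside qualitative code frequencies so that trends and themes can be cross-checked (a simple form of triangulation). All data values and column names here are invented for illustration, not taken from the study.

```python
# Illustrative convergent-design merge: quantitative summaries and
# qualitative code frequencies in one table for cross-checking.
import pandas as pd

ratings = pd.DataFrame({            # hypothetical survey ratings
    "agent": ["A", "A", "B", "B", "C", "C"],
    "trust": [4.1, 3.8, 3.9, 3.5, 4.3, 4.0],
})
comments = pd.DataFrame({           # hypothetical coded interview comments
    "agent": ["A", "B", "C", "C"],
    "code": ["Trust", "Conciseness", "Empathy", "Trust"],
})

quant = ratings.groupby("agent")["trust"].agg(["mean", "std"])         # descriptive statistics
qual = comments.value_counts(["agent", "code"]).unstack(fill_value=0)  # code frequencies
print(quant.join(qual))  # one table to compare quantitative trends against themes
```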
In this study, leveraging a mixed-method approach enables a detailed assessment of voice agents’ effectiveness for a specific driving scenario and their impact on the overall UX. Quantitative measures provide a broad overview of intimacy, trust, intention to use, perceived safety, and perceived anthropomorphism, while qualitative interviews offer insights into the emotional impacts of these agents. This dual examination helps identify key attributes of voice agents that promote a positive UX, informing more targeted and effective future enhancements [44].
3. Method
3.1. Participants
A total of 30 participants, consisting of mentally and physically healthy undergraduate and graduate students, were recruited for the study (Table 2). The sample size was determined through an a priori analysis using G*Power software (ver. 3.1.9.7, Heinrich-Heine-Universität Düsseldorf, Germany) to detect a medium effect size of 0.25 with a power of 0.8 and a significance level of 0.05. The analysis determined that a sample size of 28 would be sufficient for the experimental design. Among the participants, 13 were males with an average age of 24.23 years (SD = 1.64), and 17 were females with an average age of 22.24 years (SD = 1.14); ages ranged from 20 to 28 years. The standard deviation (SD) reflects the variability of age within each gender group. Half the participants reported driving at least once a week, and over 80% had experience interacting with voice agents. As the study assumed voice interactions in a fully autonomous vehicle, possessing a driving license was not a necessary criterion for participation. However, approximately 70% of the participants held a driving license.
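For readers who wish to check this figure, the sketch below approximates the within-factors repeated-measures power computation in Python. The correlation among repeated measures (0.5) and the absence of a sphericity correction are assumptions on our part, since the exact G*Power settings are not reported here; different settings will shift the required N slightly.

```python
# Approximate a priori power analysis for a one-way repeated-measures ANOVA
# with m = 3 within-subject conditions (G*Power "within factors" convention).
from scipy.stats import f as f_dist, ncf

def rm_anova_power(n, m=3, f_eff=0.25, alpha=0.05, rho=0.5):
    lam = f_eff**2 * n * m / (1.0 - rho)        # noncentrality parameter
    df1 = m - 1                                 # numerator df
    df2 = (n - 1) * (m - 1)                     # denominator df (no sphericity correction)
    f_crit = f_dist.ppf(1.0 - alpha, df1, df2)  # critical F under H0
    return 1.0 - ncf.cdf(f_crit, df1, df2, lam) # power under the noncentral F

def required_n(target_power=0.80):
    n = 4
    while rm_anova_power(n) < target_power:
        n += 1
    return n

print(required_n())        # roughly 26 under these assumptions (the paper reports 28)
print(rm_anova_power(30))  # power achieved with the 30 recruited participants
```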
3.2. Interaction with Voice Agent
Three types of voice agents were developed based on whether they used a machine or human voice and on the application of humanized speech strategies (Table 3). All three agents were in-vehicle agents that supported voice-based conversations in Korean and used a formal, polite sentence-ending style. Agent A used a machine voice without humanized speech strategies. Agent B employed a human voice but without humanized speech strategies. Agent C utilized a human voice combined with humanized speech strategies. This progression from Agent A to Agent C was designed to increase the level of anthropomorphism in the interactions. The human voice used for Agents B and C was female. Humanized speech strategies are designed to make interactions more natural and engaging by incorporating elements that mimic human conversation. Based on previous research, these strategies include empathetic responses, explanations of actions, agreement, providing advice, and asking questions [45,46,47,48]. Agent C incorporated the strategies of emotional responses, reasons for decisions, transparency, agreement, and asking questions, each contributing to a more human-like interaction. Emotional responses allow the agent to express empathy, making the interaction more engaging. Reasons for decisions provide users with explanations for the agent’s actions, fostering trust and understanding. Transparency involves the agent openly communicating its decision-making processes and actions, which builds trust by ensuring that the user understands what the agent is doing and why, thereby reducing uncertainty. Agreement helps align the agent with the user’s preferences, creating a sense of cooperation. Lastly, asking questions encourages two-way dialogue, making the interaction more dynamic and personalized. To simulate interactions with the voice agents in the driving scenarios, text-to-speech (TTS) software was used. Conversations between the driver and the voice agents were recorded based on pre-established driving scenario scripts, and the voice agents’ utterances were generated using a female TTS voice provided by Selvy Voice.
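The three experimental conditions can be summarized as a simple configuration. The encoding below is purely illustrative, paraphrasing Table 3; it is not an artifact of the study’s actual implementation.

```python
# Illustrative encoding of the three agent conditions described above.
from dataclasses import dataclass, field

@dataclass
class AgentCondition:
    name: str
    voice: str                      # "machine" or "human" (female voice for B and C)
    humanized_strategies: list[str] = field(default_factory=list)

AGENTS = [
    AgentCondition("A", voice="machine"),
    AgentCondition("B", voice="human"),
    AgentCondition("C", voice="human", humanized_strategies=[
        "emotional responses", "reasons for decisions",
        "transparency", "agreement", "asking questions",
    ]),
]
```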
3.3. Driving Scenario and Apparatus
This study focused on a general driving scenario deemed most suitable for driving contexts, specifically a navigation scenario. In this context, the scenario assumed driving in a downtown city setting in which the vehicle started from a stationary position and headed toward a designated destination (
Table 4). The driving process included typical elements such as lane changes and speed adjustments and lasted approximately 2 min. The driving scenarios were simulated using City Car Driving (Ver. 1.5.9.2, Forward Global Group, Ltd.) and did not include hazardous elements, such as abrupt traffic changes, sudden braking by the leading vehicle, dangerous vehicle entries from oncoming lanes, or pedestrians crossing at unexpected locations. The simulation parameters were set to represent a typical urban driving environment. For instance, vehicular traffic density was set at 20%, with quiet traffic behavior, while pedestrian density was set higher at 70% to ensure a natural urban experience. Hazardous conditions, such as dangerous traffic changes, emergency braking, and dangerous vehicle entries from oncoming lanes, were excluded to maintain a smooth driving process. Additionally, road accidents were set not to occur frequently, and no traffic controller appearances or traffic light malfunctions were included to prevent unnecessary interruptions during the simulation. To enhance interaction fidelity, a Logitech G29, including a steering wheel and pedals, was used and connected to a driving seat to create a realistic driving environment. A 32-inch PC monitor was utilized along with speakers to present the voice agent interaction stimuli. The experimental environment is depicted in
Figure 1.
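For reference, these simulator settings can be gathered into a single configuration sketch. The keys below are descriptive labels of our own, not City Car Driving’s internal parameter names, which are set through its GUI.

```python
# Illustrative summary of the City Car Driving settings reported above.
SIMULATION_CONFIG = {
    "scenario": "urban navigation, stationary start to a designated destination",
    "duration_min": 2,
    "traffic_density_pct": 20,
    "traffic_behavior": "quiet",
    "pedestrian_density_pct": 70,
    "hazardous_events": False,        # no abrupt changes, emergency braking, or risky entries
    "road_accidents": "rare",
    "traffic_controllers": False,
    "traffic_light_malfunctions": False,
}
```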
3.4. Data Collection and Analysis
A mixed-method approach was employed in this study. A survey was conducted to quantitatively evaluate the UX of interactions between drivers and the three types of voice agents using five subjective measures: intimacy, trust, intention to use, perceived safety, and perceived anthropomorphism. These measures are known to directly or indirectly impact the acceptance of AI-enabled technologies, as demonstrated in previous research [29,30,34,45,47,48,49,50,51]. They serve as key UX evaluation indicators for voice agents, and their selection is grounded in their relevance to understanding user interactions with HLMs, as discussed in Section 1 and Section 2. In particular, intimacy and trust are essential for fostering positive relationships between users and technology, while intention to use, perceived safety, and perceived anthropomorphism are critical factors in determining user engagement and acceptance of voice agents in autonomous vehicles. Using the collected quantitative data, a one-way repeated-measures ANOVA was conducted to compare mean values across the agent conditions, with statistical significance set at 0.05, using SPSS software (Version 29). Sphericity was assessed using Mauchly’s test. In cases where the assumption of sphericity was violated, the degrees of freedom were adjusted using either the Greenhouse–Geisser correction (when ε < 0.75) or the Huynh–Feldt correction (when ε > 0.75) [52]. Post-hoc analyses with multiple comparisons were performed whenever the ANOVA revealed a significant difference. Semi-structured interviews were conducted for qualitative data collection. Participants’ responses regarding the positive and negative aspects of each agent were recorded, and additional questions were posed at the researcher’s discretion. The qualitative data were then structured and analyzed through content analysis. Following the conventional content analysis method described by Hsieh and Shannon [53], two researchers (Y.M.K. and J.K.) independently identified initial coding schemes from the raw data. Subsequently, all researchers participated in collaborative discussions to derive the final coding scheme, which served as a framework for organizing the qualitative data. The outcomes of this analysis are presented in Section 4.2.
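A minimal sketch of this quantitative pipeline, using the open-source pingouin package in place of SPSS, is shown below. The long-format layout and column names (participant, agent, intimacy) are hypothetical stand-ins for the study’s actual data; the same calls would be repeated for each of the five measures.

```python
# Sphericity check, repeated-measures ANOVA, and post-hoc comparisons
# with pingouin, mirroring the SPSS analysis described above.
import pandas as pd
import pingouin as pg

df = pd.read_csv("ratings_long.csv")  # hypothetical long-format export

# Mauchly's test of sphericity for the within-subject factor
print(pg.sphericity(df, dv="intimacy", within="agent", subject="participant"))

# One-way repeated-measures ANOVA; correction=True reports
# epsilon-adjusted p-values when sphericity is violated
aov = pg.rm_anova(df, dv="intimacy", within="agent",
                  subject="participant", correction=True, effsize="np2")
print(aov)

# Post-hoc pairwise comparisons with a multiple-comparison correction
post = pg.pairwise_tests(df, dv="intimacy", within="agent",
                         subject="participant", padjust="bonf")
print(post)
```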
3.5. Procedure
Before the main experiment, upon arrival at the laboratory, participants were given a detailed explanation of the study, and the researchers reconfirmed their consent to participate. Participants who agreed to partake in the study were shown examples of interactions with voice agents in an autonomous driving simulation to enhance their understanding of the experimental setup. Additionally, the definitions of the subjective measures and how they would be evaluated were explained. Each participant performed three autonomous driving simulations in random order. After each simulation, subjective measurements were assessed, followed by a semi-structured interview. This procedure was repeated for all three voice agents, with participants allowed to rest between simulations. The entire experiment took approximately 40 min per participant, and the experimental procedure is illustrated in Figure 2.
4. Results
4.1. Quantitative Data Analysis Results
4.1.1. Reliability of the Subjective Measures
Cronbach’s alpha was used to validate the internal consistency of the subjective evaluation questions for each measure used in the study (Table 5). A Cronbach’s alpha of 0.7 or higher indicates high internal consistency [54]. All measures scored above 0.7 across the agents, indicating that the questions for each measure were answered consistently.
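As an illustration, this internal-consistency check can be reproduced with pingouin. The item file and column names below are hypothetical; one such check would be run per measure and per agent.

```python
# Cronbach's alpha for one measure/agent combination, assuming a dataframe
# whose columns are the individual questionnaire items (e.g., q1..q4).
import pandas as pd
import pingouin as pg

items = pd.read_csv("intimacy_items_agent_a.csv")  # hypothetical item responses
alpha, ci = pg.cronbach_alpha(data=items)
print(f"Cronbach's alpha = {alpha:.2f}, 95% CI = {ci}")  # >= 0.7 indicates high consistency
```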
4.1.2. Effects of the Voice Agent
Descriptive statistics for the measures are shown in Table 6. Agent C scored the highest on intimacy, trust, perceived safety, and perceived anthropomorphism. Although Agent A scored the lowest on intimacy and perceived anthropomorphism, it had the highest score on intention to use.
The results of the one-way repeated measures ANOVA revealed significant effects of the voice agent on intimacy (F(2, 58) = 33.997, p < 0.001, ηp² = 0.540), intention to use (F(2, 58) = 3.509, p < 0.05, ηp² = 0.108), and perceived anthropomorphism (F(2, 58) = 30.449, p < 0.001, ηp² = 0.512). Multiple comparison analyses were performed to identify significant differences between the agents (Figure 3). Intimacy levels were significantly different between each pair of agents, increasing progressively from Agent A to B and from B to C. Intention to use was significantly lower for Agent B than for Agents A and C. Perceived anthropomorphism also showed significant differences between each pair of agents, similar to the pattern observed for intimacy.
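Here, ηp² denotes partial eta squared, the proportion of effect-plus-error variance attributable to the voice agent factor, computed from the ANOVA sums of squares:

$$
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
$$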
4.2. Qualitative Data Analysis Results
A total of 216 comments were collected and categorized into eight distinct codes (Table 7): Conciseness and Clarity, Naturalness of Conversation Flow, Familiarity and Intimacy, Understandability and Accuracy, Trust, Empathy and Emotion Expression, Information Adequacy, and Autonomy. Below is a summary of each code and the key findings.
Conciseness and Clarity: This refers to the ability of the voice agent to deliver information in a clear and concise manner. Agents A and B received positive feedback for being brief and to the point, while Agent C had more mixed reviews. For example, one participant praised Agent A for its “clear and concise answers” (7th participant), while another noted that Agent C was “kind but not concise” (18th participant).
Naturalness of Conversation Flow: This evaluates the smoothness and fluidity of the conversation. Agents A and C were generally well-received, while Agent B’s formal tone led to mixed feedback. A participant noted that Agent B’s voice felt “unnatural” despite being human (20th participant), while Agent C was described as more natural and conversational (30th participant).
Familiarity and Intimacy: This code assesses the degree to which users feel a sense of connection or emotional closeness with the voice agent. Agent C was most successful in creating a friendly and intimate atmosphere, with multiple participants praising its warm and personable tone. One participant noted that the agent’s tone was “kind and friendly” (18th participant), contributing to a more comfortable and engaging interaction. In contrast, while Agent B used a human voice, it was often described as impersonal and rigid, which detracted from the sense of familiarity. For example, one participant mentioned that Agent B felt “distant and disconnected,” despite its humanized voice (5th participant). This suggests that simply using a human voice does not automatically enhance the perception of intimacy if the delivery feels mechanical or formal.
Understandability and Accuracy: This measures how well the agent understands and accurately executes user commands. Agents A and B generally performed well in this category, with participants highlighting their ability to immediately comprehend and act on instructions. For instance, one participant remarked regarding Agent B, “there was no inconvenience because the agent immediately understood and accepted my command” (11th participant). In contrast, Agent C received minimal feedback in this area, suggesting it performed adequately without standing out in terms of understandability and accuracy.
Trust: Trust reflects users’ confidence in the reliability and consistency of the agent’s performance. Interestingly, Agent A, which used a machine voice, was perceived as more trustworthy than Agent B, which used a human voice. Some participants mentioned that the machine voice seemed more reliable because it focused solely on providing accurate and consistent information, without attempting to simulate emotional tones. One participant specifically noted that “the machine voice felt more trustworthy than the human voice” (28th participant). This suggests that, for some users, a machine-like tone can enhance perceptions of reliability by avoiding the potential inconsistencies or unpredictability associated with human voices. It highlights an important nuance in voice interaction design: while human-like voices may improve familiarity and empathy, they do not necessarily inspire greater trust. Further exploration of how different tones of voice impact user trust would be a valuable area for future research.
Empathy and Emotion Expression: This code evaluates how effectively the voice agent conveys empathy and expresses emotions during interactions. Agent C stood out the most in this area, receiving both positive and negative feedback. Some participants appreciated the emotional engagement of the agent, noting that it created a more human-like and empathetic experience. One participant commented that they felt the agent was “continuously caring for the driver” (1st participant). However, for other users, especially those who preferred more functional or task-oriented interactions, the expression of empathy was perceived as unnecessary or even distracting, with one participant stating that the empathetic comments “felt like a distraction” (28th participant). The variation in feedback suggests that while emotional tone can enhance the user experience for some, it may not be universally preferred. As such, a more detailed analysis of how different user groups respond to the emotional tone of voice agents could be an interesting direction for future research.
Information Adequacy: This refers to the sufficiency and relevance of the information provided by the agent. Agent C stood out in this regard, offering comprehensive and relevant information that participants found helpful in decision-making. One participant appreciated that Agent C “explained why it was increasing speed” (26th participant), adding a layer of transparency to its actions. On the other hand, Agents A and B were noted for lacking adequate explanations for their decisions, with participants expressing a desire for more context in their responses (11th participant).
Autonomy: This measures the agent’s ability to act independently without needing constant user input. Agent C was the only agent noted for its autonomous capabilities, especially in adjusting speed based on road conditions. A participant commented that it was “good that the agent adjusted the speed autonomously based on the characteristics of the road” (3rd participant). This feature was well-received by participants who valued the agent’s ability to take proactive actions for a safer and smoother driving experience. In contrast, Agents A and B did not exhibit significant autonomous behavior, and there were no notable comments regarding their performance in this area.
5. Discussion
This study effectively categorized perceived anthropomorphism into three levels, corresponding to Agents A, B, and C. The results indicate a clear and significant increase in perceived anthropomorphism from Agent A (machine voice, no humanized speech strategy) to Agent B (human voice, no humanized speech strategy), and from Agent B to Agent C (human voice with humanized speech strategy). This progression confirms that incorporating human-like elements into voice agents enhances users’ perception of anthropomorphism, as the agents were designed to do, and demonstrates the robustness of our experimental design for exploring UXs across different levels of anthropomorphism.
Interestingly, this study found no significant differences in trust and perceived safety between Agents A, B, and C. This could be attributed to the inherent complexity of these constructs, which may not be influenced solely by the human-like qualities of the voice agents. Trust and perceived safety are multifaceted and could be influenced by prior experience with technology, individual differences in risk perception, and the specific context of use. For instance, one participant commented, “While the human voice lacked unnecessary emotional explanations, it felt more trustworthy and efficient” (Agent B, 6th participant). Conversely, another noted, “As the voice felt more familiar, the trust and safety perception diminished, especially since it used a human voice” (Agent C, 7th participant). In contrast, positive feedback included, “The explanations provided a sense of stability, and the empathetic remarks made me feel the agent considered my safety and comfort” (Agent C, 22nd participant). These varied responses suggest that while humanized voices may increase perceived anthropomorphism, this does not necessarily translate into increased trust or perceived safety. Additional research is therefore required to dissect these constructs and identify other factors that may influence users’ trust and perceived safety when interacting with voice agents in autonomous vehicles.
The results regarding intention to use revealed several interesting patterns. Specifically, Agent A showed a higher intention to use than Agent B, and Agent C showed a higher intention to use than Agent B, with no significant difference between Agents A and C. This suggests that while humanized speech and a human voice contribute positively to intention to use, the mechanical voice of Agent A still has a strong appeal. One possible explanation is that users may perceive the mechanical voice as more efficient or reliable, qualities that are highly valued in utilitarian contexts such as driving [55]. In this regard, mechanical voices could play a significant role in future systems, particularly in contexts where efficiency and reliability are prioritized over emotional engagement, such as automated customer service, navigation systems, or industrial environments. Additionally, individuals who prefer functional interactions or are less inclined toward emotionally driven interfaces may find mechanical voices more appealing. This suggests that the value of mechanical voices may continue to be recognized in certain fields and for specific user preferences, and that their adoption could remain relevant in applications in which trust, accuracy, and task-oriented communication are important. Furthermore, the use of a human voice without a humanized speech strategy in the anthropomorphic design of Agent B may have caused participants to feel that the conversational communication differed from what they would expect in either a normal human-to-human conversation or a human-to-voice agent conversation. Consequently, this mismatch may have lowered the intention to use for Agent B compared to Agents A and C.
This study also revealed a divergence between intimacy and intention to use. While Agent C scored highest in intimacy, it did not significantly differ from Agent A in terms of intention to use. This discrepancy can be understood by recognizing that intimacy and intention to use have a complex relationship across UX dimensions. Intimacy relates to the emotional connection and personal rapport that users feel with the agent and can affect the intention to use [56]. However, the intention to use is not solely affected by intimacy and can be more influenced by practical considerations, such as efficiency and reliability, depending on the context of use [57]. The findings indicate that users might value a human-like, intimate interaction, yet can prefer the mechanical voice in the context of autonomous driving. This divergence highlights the need for future research to delve deeper into the distinct factors that drive these different aspects of UX and how they can be optimized for various applications.
The finding that a human voice augmented with human-like speech strategies did not differ significantly from a mechanical voice without such strategies in terms of intention to use suggests that the effort may not always yield the desired results. Nevertheless, these human-like elements showed positive effects on intimacy, suggesting potential long-term benefits for user engagement. The sense of intimacy could lead to more proactive and participatory interactions over time. However, a mismatch between human-like speech strategies and voice characteristics, as seen in this study, can potentially lead to uncomfortable experiences, similar to the uncanny valley effect observed in visual contexts. This highlights the importance of a harmonious application of anthropomorphic elements and suggests that future research should focus not only on the levels of anthropomorphism but also on the coherence and consistency of its application.
The current study opens several avenues for future research. One important direction is to investigate other potential factors that could influence trust and perceived safety beyond anthropomorphism. For example, exploring the impact of voice agent transparency, reliability of the information provided, and user control over interactions could provide deeper insights. In addition, understanding the contexts in which different levels of anthropomorphism are most beneficial in the context of autonomous driving will be critical. Finally, further exploration of the balance between human-like attributes and functional efficiency could inform the design of voice agents that meet both emotional and practical user needs.
6. Conclusions
This study examined the impact of varying levels of anthropomorphism in voice agents on the UX of young adults in autonomous vehicles. First, we categorized the voice agents into three types: Agent A (machine voice without a humanized speech strategy), Agent B (human voice without a humanized speech strategy), and Agent C (human voice with a humanized speech strategy). Our findings revealed an increase in perceived anthropomorphism from Agent A to Agent C, confirming that human-like elements significantly enhance users’ perception of anthropomorphism. Second, it was observed that there were no significant differences in trust and perceived safety across the agents, indicating that these constructs are influenced by factors beyond anthropomorphism. Additionally, while Agent A had a higher intention to use compared to Agent B, and Agent C had a higher intention to use than Agent B, there was no significant difference between Agents A and C. This suggests that while humanized speech and a human voice have a positive impact on intention to use, the mechanical voice of Agent A remains appealing due to its perceived efficiency and reliability. Third, the divergence between intimacy and intention to use highlights the complexity of the UX dimensions, suggesting that a balanced approach is needed when designing voice agents to optimize both emotional engagement and practical functionality.
Despite these insights, this study has several limitations that suggest potential directions for further investigation. The participant sample, consisting of young, healthy students, lacked diversity in age, gender, and driving experience, which may limit the generalizability of our findings. Additionally, the small sample size of 30 participants may affect the robustness of the results. Future studies should aim for a larger and more diverse participant pool to enhance the applicability of the findings across different demographic groups. Furthermore, the experimental setup was conducted in a controlled laboratory environment, which does not fully replicate real-world driving conditions. Environmental factors such as noise and the presence of other passengers were not considered, which could influence the effectiveness of voice interactions. Future research should explore these interactions in more naturalistic settings, including real vehicle experiments, to validate and expand upon our findings. In addition, the combination of mechanical voices with highly emotional or humanized agents was not explored in the present study. Investigating how users respond to emotionally expressive agents with mechanical voices remains an untested area and could provide valuable insights into the balance between functionality and emotional engagement. Future research should consider exploring this combination across different interaction contexts, such as customer service or safety-critical scenarios, to further understand the role of voice characteristics in shaping the UX. Finally, this study focused on a single, generic driving scenario. Future studies should investigate a wider range of driving contexts, such as highways and country roads, and various tasks, including emergency handling and entertainment system control, to provide a more comprehensive understanding of user interactions with voice agents in autonomous vehicles. Additionally, further exploration could be conducted on user preferences for different types of voice assistants and the psychological and emotional factors influencing these preferences.
Author Contributions
Conceptualization, Y.M.K. and J.K.; Funding acquisition, D.P.; Methodology, Y.M.K. and D.P.; Project administration, Y.M.K.; Supervision, D.P.; Writing—original draft, Y.M.K. and D.P.; Writing—review & editing, J.K. All authors have read and agreed to the published version of the manuscript.
Funding
This research was partially supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2023-00242528).
Institutional Review Board Statement
This study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of Pukyong National University (protocol code 48513, 10 April 2023).
Informed Consent Statement
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement
Data will be available on request.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Pelau, C.; Dabija, D.-C.; Ene, I. What makes an AI device human-like? The role of interaction quality, empathy and perceived psychological anthropomorphic characteristics in the acceptance of artificial intelligence in the service industry. Comput. Hum. Behav. 2021, 122, 106855. [Google Scholar] [CrossRef]
- Garg, R.; Sengupta, S. He is just like me: A study of the long-term use of smart speakers by parents and children. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2020, 4, 1–24. [Google Scholar] [CrossRef]
- Kim, J.; Merrill, K., Jr.; Xu, K.; Sellnow, D.D. I like my relational machine teacher: An AI instructor’s communication styles and social presence in online education. Int. J. Hum. Comput. Interact. 2021, 37, 1760–1770. [Google Scholar] [CrossRef]
- Martins, A.; Nunes, I.; Lapão, L.; Londral, A. Unlocking human-like conversations: Scoping review of automation techniques for personalized healthcare interventions using conversational agents. Int. J. Med. Inform. 2024, 105385. [Google Scholar] [CrossRef] [PubMed]
- Carter, E.J.; Mistry, M.N.; Carr, G.P.K.; Kelly, B.A.; Hodgins, J.K. Playing catch with robots: Incorporating social gestures into physical interactions. In Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication, Edinburgh, UK, 25–29 August 2014. [Google Scholar]
- Dong, J.; Santiago-Anaya, A.; Jeon, M. Facial Expressions Increase Emotion Recognition Clarity and Improve Warmth and Attractiveness on a Humanoid Robot without Adding the Uncanny Valley. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Washington, DC, USA, 23–27 October 2023; Sage CA: Los Angeles, CA, USA, 2023; Volume 67, pp. 933–939. [Google Scholar]
- Kiesler, S.; Powers, A.; Fussell, S.R.; Torrey, C. Anthropomorphic interactions with a robot and robot–like agent. Soc. Cogn. 2008, 26, 169–181. [Google Scholar] [CrossRef]
- Martini, M.C.; Buzzell, G.A.; Wiese, E. Agent appearance modulates mind attribution and social attention in human-robot interaction. In Proceedings of the Social Robotics: 7th International Conference 2015, ICSR 2015, Paris, France, 26–30 October 2015. Proceedings 7. [Google Scholar]
- Song, C.S.; Kim, Y.K. The role of the human-robot interaction in consumers’ acceptance of humanoid retail service robots. J. Bus. Res. 2022, 146, 489–503. [Google Scholar] [CrossRef]
- Takahashi, Y.; Kayukawa, Y.; Terada, K.; Inoue, H. Emotional expressions of real humanoid robots and their influence on human decision-making in a finite iterated prisoner’s dilemma game. Int. J. Soc. Robot. 2021, 13, 1777–1786. [Google Scholar] [CrossRef]
- Di Natale, A.F.; Simonetti, M.E.; La Rocca, S.; Bricolo, E. Uncanny valley effect: A qualitative synthesis of empirical research to assess the suitability of using virtual faces in psychological research. Comput. Hum. Behav. Rep. 2023, 10, 100288. [Google Scholar] [CrossRef]
- Kätsyri, J.; Förger, K.; Mäkäräinen, M.; Takala, T. A review of empirical evidence on different uncanny valley hypotheses: Support for perceptual mismatch as one road to the valley of eeriness. Front. Psychol. 2015, 6, 390. [Google Scholar] [CrossRef]
- Song, S.W.; Shin, M. Uncanny valley effects on chatbot trust, purchase intention, and adoption intention in the context of e-commerce: The moderating role of avatar familiarity. Int. J. Hum. Comput. Interact. 2024, 40, 441–456. [Google Scholar] [CrossRef]
- Huang, S.; Zhao, X.; Wei, D.; Song, X.; Sun, Y. Chatbot and Fatigued Driver: Exploring the Use of LLM-Based Voice Assistants for Driving Fatigue. In Proceedings of the CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 11–16 May 2024. [Google Scholar]
- Yang, Z.; Xu, X.; Yao, B.; Rogers, E.; Zhang, S.; Intille, S.; Shara, N.; Gao, G.G.; Wang, D. Talk2Care: An LLM-based Voice Assistant for Communication between Healthcare Providers and Older Adults. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2024, 8, 1–35. [Google Scholar] [CrossRef]
- Hungund, A.P.; Pradhan, A.K. Impact of non-driving related tasks while operating automated driving systems (ADS): A systematic review. Accid. Anal. Prev. 2023, 188, 107076. [Google Scholar] [CrossRef] [PubMed]
- Li, Q.; Wang, Z.; Wang, W.; Yuan, Q. Understanding Driver Preferences for Secondary Tasks in Highly Autonomous Vehicles. In Proceedings of the International Conference on Man-Machine-Environment System Engineering, Beijing, China, 21–23 October 2022; Springer Nature: Singapore, 2023; pp. 126–133. [Google Scholar]
- Wilson, C.; Gyi, D.; Morris, A.; Bateman, R.; Tanaka, H. Non-Driving Related tasks and journey types for future autonomous vehicle owners. Transp. Res. Part F Traffic Psychol. Behav. 2022, 85, 150–160. [Google Scholar] [CrossRef]
- Mahajan, K.; Large, D.R.; Burnett, G.; Velaga, N.R. Exploring the benefits of conversing with a digital voice assistant during automated driving: A parametric duration model of takeover time. Transp. Res. Part F Traffic Psychol. Behav. 2021, 80, 104–126. [Google Scholar] [CrossRef]
- Orlovska, J.; Novakazi, F.; Wickman, C.; Soderberg, R. Mixed-method design for user behavior evaluation of automated driver assistance systems: An automotive industry case. In Proceedings of the Design Society: International Conference on Engineering Design, Delft, The Netherlands, 5–8 August 2019. [Google Scholar]
- Baird, A.; Jørgensen, S.H.; Parada-Cabaleiro, E.; Cummins, N.; Hantke, S.; Schuller, B. The perception of vocal traits in synthesized voices: Age, gender, and human likeness. J. Audio Eng. Soc. 2018, 66, 277–285. [Google Scholar] [CrossRef]
- Wagner, K.; Schramm-Klein, H. Alexa, Are You Human? Investigating Anthropomorphism of Digital Voice Assistants-A Qualitative Approach. In Proceedings of the Fortieth International Conference on Information Systems, Munich, Germany, 15–18 December 2019. [Google Scholar]
- Seaborn, K.; Miyake, N.P.; Pennefather, P.; Otake-Matsuura, M. Voice in human–agent interaction: A survey. ACM Comput. Surv. (CSUR) 2021, 54, 1–43. [Google Scholar] [CrossRef]
- Abdulrahman, A.; Richards, D. Is natural necessary? Human voice versus synthetic voice for intelligent virtual agents. Multimodal Technol. Interact. 2022, 6, 51. [Google Scholar] [CrossRef]
- Hagens, E. Designing Trustworthy Voice Assistants for Healthcare: Theory and Practice of Voice Assistants for the Outpatient Clinic Healthy Pregnancy. Master’s Thesis, Delft University of Technology, Delft, The Netherlands, 2022. [Google Scholar]
- Hu, P.; Lu, Y. Dual humanness and trust in conversational AI: A person-centered approach. Comput. Hum. Behav. 2021, 119, 106727. [Google Scholar] [CrossRef]
- Pias, S.B.H.; Huang, R.; Williamson, D.S.; Kim, M.; Kapadia, A. The Impact of Perceived Tone, Age, and Gender on Voice Assistant Persuasiveness in the Context of Product Recommendations. In Proceedings of the 6th ACM Conference on Conversational User Interfaces, Luxembourg, 8–10 July 2024; pp. 1–15. [Google Scholar]
- Calahorra-Candao, G.; Martín-de Hoyos, M.J. The effect of anthropomorphism of virtual voice assistants on perceived safety as an antecedent to voice shopping. Comput. Hum. Behav. 2024, 153, 108124. [Google Scholar] [CrossRef]
- Waytz, A.; Heafner, J.; Epley, N. The mind in the machine: Anthropomorphism increases trust in an autonomous vehicle. J. Exp. Soc. Psychol. 2014, 52, 113–117. [Google Scholar] [CrossRef]
- Niu, D.; Terken, J.; Eggen, B. Anthropomorphizing information to enhance trust in autonomous vehicles. Hum. Factors Ergon. Manuf. Serv. Ind. 2018, 28, 352–359. [Google Scholar] [CrossRef]
- Lee, S.; Ratan, R.; Park, T. The voice makes the car: Enhancing autonomous vehicle perceptions and adoption intention through voice agent gender and style. Multimodal Technol. Interact. 2019, 3, 20. [Google Scholar] [CrossRef]
- Dong, J.; Lawson, E.; Olsen, J.; Jeon, M. Female voice agents in fully autonomous vehicles are not only more likeable and comfortable, but also more competent. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Online, 5–9 October 2020; Sage CA: Los Angeles, CA, USA, 2020; Volume 64, pp. 1033–1037. [Google Scholar]
- Im, H.; Sung, B.; Lee, G.; Kok, K.Q.X. Let voice assistants sound like a machine: Voice and task type effects on perceived fluency, competence, and consumer attitude. Comput. Hum. Behav. 2023, 145, 107791. [Google Scholar] [CrossRef]
- Large, D.R.; Clark, L.; Burnett, G.; Harrington, K.; Luton, J.; Thomas, P.; Bennett, P. “It’s small talk, Jim, but not as we know it”: Engendering trust through human-agent conversation in an autonomous, self-driving car. In Proceedings of the 1st International Conference on Conversational User Interfaces, Dublin, Ireland, 22–23 August 2019; pp. 1–7. [Google Scholar]
- Lee, S.C.; Sanghavi, H.; Ko, S.; Jeon, M. Autonomous driving with an agent: Speech style and embodiment. In Proceedings of the 11th International Conference on Automotive User Interfaces and Interactive Vehicular Applications: Adjunct Proceedings, Utrecht, The Netherlands, 21–25 September 2019; pp. 209–214. [Google Scholar]
- Wang, M.; Lee, S.C.; Kamalesh Sanghavi, H.; Eskew, M.; Zhou, B.; Jeon, M. In-vehicle intelligent agents in fully autonomous driving: The effects of speech style and embodiment together and separately. In Proceedings of the 13th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, Leeds, UK, 9–14 September 2021; pp. 247–254. [Google Scholar]
- Wang, M.; Lee, S.C.; Montavon, G.; Qin, J.; Jeon, M. Conversational voice agents are preferred and lead to better driving performance in conditionally automated vehicles. In Proceedings of the 14th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, Seoul, Republic of Korea, 17–20 September 2022; pp. 86–95. [Google Scholar]
- Creswell, J.W.; Clark, V.L.P. Designing and Conducting Mixed Methods Research; Sage Publications: Thousand Oaks, CA, USA, 2017. [Google Scholar]
- Johnson, R.B.; Onwuegbuzie, A.J.; Turner, L.A. Toward a definition of mixed methods research. J. Mix. Methods Res. 2007, 1, 112–133. [Google Scholar] [CrossRef]
- Tashakkori, A.; Teddlie, C. (Eds.) Handbook of Mixed Methods in Social & Behavioral Research; Sage Publications: Thousand Oaks, CA, USA, 2010. [Google Scholar]
- Ivankova, N.V.; Creswell, J.W.; Stick, S.L. Using mixed-methods sequential explanatory design: From theory to practice. Field Methods 2006, 18, 3–20. [Google Scholar] [CrossRef]
- Greene, J.C.; Caracelli, V.J.; Graham, W.F. Toward a conceptual framework for mixed-method evaluation designs. Educ. Eval. Policy Anal. 1989, 11, 255–274. [Google Scholar] [CrossRef]
- Mertens, D.M. Research and Evaluation in Education and Psychology: Integrating Diversity with Quantitative, Qualitative, and Mixed Methods; Sage Publications: Thousand Oaks, CA, USA, 2023. [Google Scholar]
- Morse, J.M. Principles of Mixed Methods. In Handbook of Mixed Methods in Social & Behavioral Research; Sage Publications: Thousand Oaks, CA, USA, 2003; p. 189. [Google Scholar]
- Ha, T.; Kim, S.; Seo, D.; Lee, S. Effects of explanation types and perceived risk on trust in autonomous vehicles. Transp. Res. Part F Traffic Psychol. Behav. 2020, 73, 271–280. [Google Scholar] [CrossRef]
- Haslam, N. Dehumanization: An integrative review. Personal. Soc. Psychol. Rev. 2006, 10, 252–264. [Google Scholar] [CrossRef]
- Lee, J.G.; Lee, K.M. Polite speech strategies and their impact on drivers’ trust in autonomous vehicles. Comput. Hum. Behav. 2022, 127, 107015. [Google Scholar] [CrossRef]
- Ruijten, P.A.; Terken, J.M.; Chandramouli, S.N. Enhancing trust in autonomous vehicles through intelligent user interfaces that mimic human behavior. Multimodal Technol. Interact. 2018, 2, 62. [Google Scholar] [CrossRef]
- Han, S.; Yang, H. Understanding adoption of intelligent personal assistants: A parasocial relationship perspective. Ind. Manag. Data Syst. 2018, 118, 618–636. [Google Scholar] [CrossRef]
- Yoo, Y.; Yang, M.Y.; Lee, S.; Baek, H.; Kim, J. The effect of the dominance of an in-vehicle agent’s voice on driver situation awareness, emotion regulation, and trust: A simulated lab study of manual and automated driving. Transp. Res. Part F Traffic Psychol. Behav. 2022, 86, 33–47. [Google Scholar] [CrossRef]
- Lu, L.; Cai, R.; Gursoy, D. Developing and validating a service robot integration willingness scale. Int. J. Hosp. Manag. 2019, 80, 36–51. [Google Scholar] [CrossRef]
- Field, A. Discovering Statistics Using IBM SPSS Statistics; Sage: Thousand Oaks, CA, USA, 2013. [Google Scholar]
- Hsieh, H.-F.; Shannon, S.E. Three Approaches to Qualitative Content Analysis. Qual. Health Res. 2005, 15, 1277–1288. [Google Scholar] [CrossRef]
- Nunnally, J.; Bernstein, I. Psychometric Theory, 3rd ed.; McGraw-Hill: New York, NY, USA, 1994. [Google Scholar]
- Wang, Y.; Zhang, W.; Zhou, R. Speech-based takeover requests in conditionally automated driving: Effects of different voices on the driver takeover performance. Appl. Ergon. 2022, 101, 103695. [Google Scholar] [CrossRef]
- Lee, Y.; Kwon, O. Intimacy, familiarity and continuance intention: An extended expectation–confirmation model in web-based services. Electron. Commer. Res. Appl. 2011, 10, 342–357. [Google Scholar] [CrossRef]
- Partala, T.; Saari, T. Understanding the most influential user experiences in successful and unsuccessful technology adoptions. Comput. Hum. Behav. 2015, 53, 381–395. [Google Scholar] [CrossRef]