Voice, Text, or Embodied AI Avatar? Effects of Generative AI Interface Modalities in VR Museums
Pakinee Ariya, Perasuk Worragin, Phichete Julrode, and 2 other authors
Virtual museums delivered through immersive virtual reality (VR) function as information environments where users access interpretive content while navigating spatially. With the integration of generative artificial intelligence (AI), conversational assistants can dynamically mediate information interaction; however, evidence remains limited regarding how different AI interface representations affect user experience. This study compares three generative AI interface modalities in a VR virtual museum: voice only, voice with synchronized text, and voice with an embodied AI avatar. A controlled experiment with 75 participants examined their effects on user engagement, perceived information quality, and subjective cognitive workload while holding informational content constant. The results indicate that the voice-and-text modality produced the highest perceived information quality, whereas the embodied AI avatar modality yielded the highest user engagement. No significant differences were observed in cognitive workload across modalities. These findings suggest that AI interface modalities play complementary roles in VR-based information interaction and provide design guidance for selecting appropriate AI representations in immersive information systems.
11 March 2026



![Workflow diagram of the application of the PRISMA-ScR protocol. Source: Authors’ own elaboration based on Bastos et al., Codina and Page et al. [18,19,20].](https://mdpi-res.com/cdn-cgi/image/w=281,h=192/https://mdpi-res.com/informatics/informatics-13-00041/article_deploy/html/images/informatics-13-00041-g001-550.jpg)



