Multimodal Technologies and Interaction

18 pages, 1447 KB

Open Access Systematic Review

Parental Communication Strategies During Screen Time in Early Childhood: A Scoping Review of Joint Media Engagement

by Litna A Varghese, Gagan Bajaj, Megha Mohan, Jayashree S. Bhat, Jayashree Kanthila and Aiswarya Liz Varghese

Multimodal Technol. Interact. 2026, 10(6), 66; https://doi.org/10.3390/mti10060066 - 4 Jun 2026

Abstract

Background: This scoping review aimed to systematically identify communication strategies used during Joint Media Engagement (JME) and examine their associations with developmental outcomes and contextual factors. Methods: A systematic search of seven databases (up to April 2025) was conducted using Rayyan, following PRISMA-ScR guidelines; 26 studies met inclusion criteria and were synthesized to categorize parent communication strategies and their theoretical underpinnings. Results: Fifteen distinct communication strategies were identified and organized into four theoretical frameworks (Social Learning, Sociopragmatic, Behaviourist, and Theory of Mind), along with a fifth category for technical scaffolding. Strategies aligned with Social Learning were most frequently reported and consistently associated with improvements in children’s language, cognitive, and socio-emotional outcomes. Findings also showed that JME strategies vary based on contextual factors, including parent type, geography, device type, media content, and child characteristics. Although most studies did not explicitly focus on JME, those employing mixed methods provided deeper insights. Conclusions: JME is shaped by both interaction quality and context, with Social Learning-based strategies playing a central role in supporting child development. The findings highlight the need for more rigorous, JME-focused research across diverse digital formats to strengthen evidence-based parent coaching approaches that optimize JME practices in early childhood.

31 pages, 22757 KB

Open Access Article

Personalizing Live Avatar Interaction for Children with ASD Through Restricted Interests: A Feasibility Study

by Luis Fernando Guerrero-Vásquez, Martín López-Nores, Henry J. Jara-Quito, Dalila M. González-González and Jack Fernando Bravo-Torres

Multimodal Technol. Interact. 2026, 10(6), 65; https://doi.org/10.3390/mti10060065 - 2 Jun 2026

Abstract

Virtual avatars have shown potential as supports in Autism Spectrum Disorder (ASD) interventions, but many existing systems provide largely standardized interactions that do not account for individual variability. This study presents an exploratory evaluation of a virtual puppet system that enables real-time interaction by synchronously transmitting a human model’s movements, facial gestures, and voice to a digital avatar. The system was personalized using each participant’s restricted interests (RIs), identified through a clinical triangulation process involving therapist input, caregiver reports, and observation. After an initial technical validation with 16 neurotypical children, the system was evaluated in a proof-of-concept sample of 11 children with ASD (7 in an experimental group exposed to RI-based personalization and 4 in a control group interacting with a standard interface). Data sources included eye tracking and therapist-completed observational questionnaires. Across sessions, descriptive patterns in gaze fixation and therapist reports suggested that RI-based personalization may help sustain attention to the screen and support engagement with the therapeutic environment relative to non-personalized interaction. Heatmap patterns further indicated that children under the personalized condition visually explored RI-related elements within the scene. This study provides evidence of technical and procedural feasibility and generates hypotheses for future research.

30 pages, 3261 KB

Open Access Article

Illusionary Selves: Critiquing Online Persona Construction Through AI-Mediated Interaction Design

by Xueyi Li, Yonghong Liu and Yangcheng Wang

Multimodal Technol. Interact. 2026, 10(6), 64; https://doi.org/10.3390/mti10060064 - 1 Jun 2026

Abstract

Social media platforms have become central sites of identity construction, where visibility and legitimacy are shaped through algorithmic systems, aesthetic conventions, and platform economies. This paper approaches online personas through the lens of illusionary selves, understood here as online personas experienced as authentic while being shaped by sociotechnical processes, and examines how they are produced through the entanglement of design practices, generative artificial intelligence (AI), and cultural expectations. We present an AI-mediated critical design inquiry into how generative systems translate and normalize visual patterns of online self-imaging. Using a pix2pix-based model trained on 630 internet celebrity selfies, facial images are abstracted into dot-based representations and aggregated across selfie angles, foregrounding repetition and normalization. An interactive design installation links bodily orientation and numerical parameters to generative output in real time, introducing perceptual friction in self-imaging. A total of 30 participants engaged with the system in situated contexts, and their experiences were documented through observation, video recording, and a 5-point Likert questionnaire across three dimensions: perceptual friction, awareness of algorithmic mediation, and reflective responses to self-presentation. Results indicate high levels of perceptual friction (mean [M] = 4.21), strong awareness of algorithmic mediation (M = 4.29), and consistent reflective unease (M = 4.07). Through situated use, the system renders algorithmic mediation tangible and positions AI as an implicated actor in identity construction. This work contributes a conceptual framing of AI-mediated critical design, showing how generative and interactive systems operate as epistemic devices interrogating online persona construction.

(This article belongs to the Special Issue Human-AI Collaborative Interaction Design: Rethinking Human-Computer Symbiosis in the Age of Intelligent Systems)

33 pages, 10391 KB

Open Access Article

Computational Method for Predicting Visual Attention in Older Adults with Age-Related Features

by Xiangdong Li, Xinchi Shi, Haoyu Gu, Tianai Shen, Shiwei Cheng and Jing Wang

Multimodal Technol. Interact. 2026, 10(6), 63; https://doi.org/10.3390/mti10060063 - 1 Jun 2026

Abstract

Age-related changes in visual perception alter attentional deployment, yet computational models of visual attention have been validated almost exclusively on younger populations. This limits both the theoretical investigation of age-specific mechanisms and practical applications in age-inclusive design, where researchers depend on specialised eye-tracking equipment to observe such differences. Therefore, we present the Elderly Visual Attention Estimation (EVAE) model, a computational framework that predicts early visual attentional orienting in older adults by combining stimulus-driven image features with age-specific top-down priors. The framework models six dimensions of elderly visual attention from cross-age eye-tracking data: colour brightness sensitivity, centre bias, foreground–background differentiation, depth detection, early attentional prior, and sustained-attention spatial prior. On public datasets, EVAE achieves an AUC-Judd of 0.92, outperforming existing saliency models and deep learning approaches such as DeepGaze II. The framework is optimised for an input resolution of 128 × 96 pixels, producing fixation probability maps that are upsampled to match the original stimulus resolution for practical interface evaluation. Cross-age validation confirms the model’s specificity, as EVAE predicts attentional behaviour in older adults but does not generalise to younger adults. An ablation study shows that image features and top-down spatial priors each contribute independently to prediction accuracy, and that bottom-up saliency alone cannot account for age-related attentional patterns. Centre bias and early attentional prior are the strongest predictors, indicating that visual ageing involves greater reliance on spatial strategies and compensatory processing. As an alternative to hardware-based eye-tracking, EVAE widens the scope of empirical research into older adults’ visual attention and informs the design of accessible digital interfaces.

15 pages, 547 KB

Open Access Article

First-Grade Students’ Perspectives on Digital and Traditional Learning

by Josipa Jurić, Branka Šegvić and Zoa Šimundić

Multimodal Technol. Interact. 2026, 10(6), 62; https://doi.org/10.3390/mti10060062 - 1 Jun 2026

Abstract

The development of digital technologies in early education raises important questions regarding how students perceive and experience learning in digital environments. The aim of this study was to explore first-grade students’ perspectives on learning using mobile devices, tablets, and digital textbooks, with particular emphasis on their perceived advantages, limitations, and preferences. The research was conducted using a qualitative approach through four focus groups with a total of 20 first-grade students. The results indicate that students recognise the motivational and stimulating potential of digital technologies, particularly in terms of visualisation and the engaging nature of content. However, they simultaneously express a clear preference for traditional learning, emphasising the importance of concentration, independent thinking, and adult support. Digital tools are perceived as useful but secondary, and are often associated with distractions and reduced cognitive effort. The findings suggest that students do not perceive all forms of learning equally, but rather associate meaningful learning with effort, autonomy, and active cognitive engagement. The results highlight the need for the careful integration of digital technologies into the teaching process, particularly in early education, where attention, structure, and social interaction are crucial for effective learning.

(This article belongs to the Special Issue Online Learning to Multimodal Era: Interfaces, Analytics and User Experiences)

35 pages, 5851 KB

Open Access Article

AUMOR: Augmented-Reality-Based Mobile Application for University Orientation

by Muhammad Nadeem, Melinda Oroszlanyova, Pauly Awad, Hasan Ozkan and Svetlana Beryozkina

Multimodal Technol. Interact. 2026, 10(6), 61; https://doi.org/10.3390/mti10060061 - 29 May 2026

Abstract

Newly enrolled engineering students are often required to absorb a large amount of new information within a short period of time, which can be academically and emotionally challenging. To address this challenge, this study introduces AUMOR, a mobile application designed to enhance university orientation by delivering contextual information at the point of need. AUMOR integrates GPS-based localization with QR code triggers to provide real-time, location-specific guidance and interactive content through an augmented reality (AR) interface. The GPS functionality supports location-based services, including information about academic buildings, student services, and recreational facilities, while QR codes on devices and laboratory equipment provide relevant information when scanned. A post-deployment user perception survey was conducted using a paper-based questionnaire involving 128 participants, including both students and faculty members. The results indicate that users perceived the application as helpful in enhancing their spatial awareness, navigation confidence, and ability to locate campus facilities, demonstrating high levels of usability and acceptance. The findings suggest that students perceived AUMOR as helpful for university orientation and indicate its potential as a scalable solution.

(This article belongs to the Special Issue Educational Virtual/Augmented Reality)

20 pages, 6107 KB

Open Access Article

Improving Chatbot Usability Through Structured Prompt-Based Interaction Design

by Gisel Katerine Bastidas-Guacho, Edison Patricio Azogue Martínez, Marco Antonio Gabilanes Martínez and Patricio Xavier Moreno-Vallejo

Multimodal Technol. Interact. 2026, 10(6), 60; https://doi.org/10.3390/mti10060060 - 28 May 2026

Abstract

This study presents a comparative evaluation of the usability of an intelligent chatbot implemented in a childcare center management system, focusing on the impact of a prompt-enhanced conversational configuration on user experience. The Chatbot Usability Questionnaire (CUQ) was used to assess perceived usability under two conditions: a baseline configuration and an enhanced configuration incorporating role-based prompting and preprocessing mechanisms. The results indicate a substantial increase in CUQ scores, from 69 in the baseline condition to 91 in the enhanced condition, suggesting improved perceived usability. Rather than isolating prompt engineering as a standalone variable, this work evaluates a system-level design approach that integrates structured prompts, role-based contextualization, and interaction refinement strategies. This study contributes to the understanding of how prompt-enhanced conversational designs can improve response clarity, relevance, and interaction quality in multi-role environments, including parents, teachers, and administrators. The findings provide empirical evidence that such configurations are associated with more coherent and role-appropriate interactions in service-oriented chatbot systems.

20 pages, 1030 KB

Open Access Article

The Pedagogical Transfer Chain in the DigCompEdu Framework from a Teacher-Reported Perspective: A Predictive Analysis Using PLS-SEM and ANN

by Daira Marizol Carvajal Morales, Jessica Mariela Carvajal Morales, Milton Alfonso Criollo Turusina, Santiago José Chele Delgado, Erika Jadira Romero Cardenas and Juan Diego Valenzuela Cobos

Multimodal Technol. Interact. 2026, 10(6), 59; https://doi.org/10.3390/mti10060059 - 26 May 2026

Abstract

The steady advancement of online education has not automatically translated into improved educational quality. Teacher training often continues to focus on the technical use of digital tools, while the pedagogical processes through which teachers report supporting students’ digital competence remain insufficiently understood. The objective of this study was to examine the sequential and predictive structure of teachers’ digital competence using the DigCompEdu framework as a reference. A quantitative cross-sectional study was conducted with a sample of 136 university teachers involved in online education. Data were collected through a self-reported questionnaire based on DigCompEdu and analyzed in two phases: Partial Least Squares Structural Equation Modeling (PLS-SEM) and Artificial Neural Networks (ANNs). The PLS-SEM results suggested a sequential pattern of associations among teacher-reported constructs: Professional Commitment (PC) was positively associated with Digital Resource Management (DR), which in turn was positively associated with Digital Pedagogy (DP) and Assessment and Feedback (AF). These dimensions were associated with Student Empowerment (SE), which showed the strongest positive relationship with teachers’ reported practices for Facilitating Students’ Digital Competence (FS). The ANN sensitivity analysis showed adequate predictive performance in the testing phase (RMSE = 0.155) and identified Student Empowerment as the predictor with the highest normalized importance within the specified model. These findings suggest that faculty development in online higher education may benefit from moving beyond basic digital literacy and platform management toward pedagogical design, formative assessment, inclusive participation, and learner agency. However, the results should be interpreted as evidence of teacher-reported facilitation practices within the analyzed sample, rather than as direct evidence of students’ actual digital competence development.

(This article belongs to the Special Issue Online Learning to Multimodal Era: Interfaces, Analytics and User Experiences)

36 pages, 2361 KB

Open Access Review

A Comprehensive Review of Deep Learning Approaches for Video-Based Sign Language Recognition: Datasets, Challenges and Insights

by Ulmeken Berzhanova, Aigerim Yerimbetova, Marek Milosz, Bakzhan Sakenov, Dina Oralbekova, Elmira Daiyrbayeva and Daniyar Turgan

Multimodal Technol. Interact. 2026, 10(6), 58; https://doi.org/10.3390/mti10060058 - 22 May 2026

Abstract

This study presents a comprehensive review of more than 100 research papers on sign language recognition (SLR) published between 2020 and 2026. The analysis focuses on deep learning approaches applied to video-based SLR, including spatiotemporal feature extraction, temporal modeling, attention mechanisms, motion-based representations, hybrid frameworks, transfer learning, and other methods. Particular attention is given to how these methods model spatiotemporal dynamics and capture subtle gesture characteristics in sign language communication. The review highlights several recent developments, such as the introduction of specialized datasets, the emergence of real-time recognition systems, and the integration of multimodal fusion strategies. At the same time, persistent challenges remain, including data scarcity in low-resource sign languages, limited linguistic standardization of datasets, and insufficient model interpretability. The findings underline the importance of developing scalable and generalizable models capable of handling diverse datasets and user variability. The distinct contributions of this review are fourfold: (1) a comprehensive synthesis of over 100 studies published between 2020 and 2026, covering the full spectrum of deep learning architectures for video-based SLR; (2) a structured six-category taxonomy enabling systematic cross-architectural comparison; (3) a comprehensive focus on low-resource sign languages, which remain underrepresented in the existing literature; and (4) a critical analysis of the current benchmark landscape for low-resource sign languages, identifying key gaps and outlining strategic directions for future dataset development. These contributions are intended to guide further research toward more robust, inclusive, and universally applicable SLR systems.

44 pages, 2602 KB

Open Access Article

From Prompt to Play: Examining Computational Thinking Through Vibe Coding in Game Making for Pre-Service Teacher Education

by Nikolaos Pellas

Multimodal Technol. Interact. 2026, 10(5), 57; https://doi.org/10.3390/mti10050057 - 21 May 2026

Abstract

Computational thinking (CT) is increasingly recognized as essential in education, yet teacher preparation programs struggle to develop both computational proficiency and pedagogical readiness in pre-service teachers (PSTs). This study examines an AI-mediated, game-making course grounded in the emerging “vibe coding” paradigm, where 24 novice PSTs iteratively constructed programs through natural language prompting. Adopting a mixed-methods design, the study drew on pre- and post-course attitude questionnaires, reflective accounts of prompting strategies, and open-ended responses. Results indicate that participants substantively engaged with core CT practices, particularly debugging, iterative refinement, and problem decomposition. Nonetheless, self-reported coding and teaching confidence recalibrated downward, which is interpreted as a productive adjustment rather than a failure. Conversely, attitudes toward game-making improved significantly, with a statistically significant medium effect size for perceived instructional value (d = 0.51), the largest practical effect observed across dimensions. Most participants intended to integrate CT into future teaching. These findings suggest that prompt-driven learning environments support meaningful engagement with computational processes when carefully scaffolded, but do not inherently ensure pedagogical readiness, particularly for higher-order CT practices such as abstraction and pattern recognition. Unlike prior research that has examined game-making processes or PST attitudes toward CT in isolation, this study empirically integrates all three within a single scaffolded instructional design using vibe coding. This integration enables a process-level account of how CT is enacted, and how it develops, when code generation is partially delegated to AI systems. Beyond documenting attitude shifts, the study introduces an analytical rubric for identifying CT engagement in AI-mediated prompting and derives evidence-based design principles that specify the pedagogical conditions under which vibe coding supports, rather than bypasses, computational reasoning.

(This article belongs to the Special Issue Technology-Enhanced Game-Based Approaches in Education: Learning, Emotions, and Motivation)

17 pages, 1515 KB

Open Access Article

Attention-Based Multimodal Fusion for Salience-Aware Blended Emotion Recognition

by José Salas-Cáceres, Modesto Castrillón-Santana, Oliverio J. Santana, Daniel Hernández-Sosa and Javier Lorenzo-Navarro

Multimodal Technol. Interact. 2026, 10(5), 56; https://doi.org/10.3390/mti10050056 - 20 May 2026

Abstract

Blended emotion recognition introduces the challenge of identifying not only which emotions are present in an expressive display but also their relative salience. The proposed methodology builds upon the pre-extracted features provided with the dataset and enhances performance through a combination of temporal modeling and multimodal fusion strategies. Unimodal experiments revealed that visual encoders consistently outperformed audio ones, with the multimodal HiCMAE encoder achieving the strongest single-encoder results with 34% presence accuracy and 18.23% salience accuracy. Multimodal fusion further improved performance, with the best validation results obtained using a combination of simple concatenation and attention-based fusion, reaching 47.86% in presence accuracy and 27.92% in salience accuracy. Overall, the proposed methodology surpasses the chosen baseline introduced in the original paper across a k-fold experiment, confirming the effectiveness of multimodal attention-based fusion for the accurate prediction of both emotion presence and salience in blended affective behaviour. The experimental results further indicate that multimodal expression recognition consistently outperforms unimodal approaches, highlighting the complementary nature of cross-modal information.

18 pages, 4228 KB

Open Access Article

MAVAGEN: Multimodal Avatar Generation Framework for Personalized Human–Computer Interaction

by Alexandr Axyonov, Elena Ryumina, Dmitry Ryumin and Alexey Karpov

Multimodal Technol. Interact. 2026, 10(5), 55; https://doi.org/10.3390/mti10050055 - 18 May 2026

Abstract

Digital-avatar systems still provide limited control over emotionally expressive behavior in human–computer interaction, especially in Large Language Model (LLM)-based chatbots and virtual assistants with personalized visual embodiments. To address this problem, we propose Multimodal Avatar Generation (MAVAGEN), a multimodal avatar generation framework for synthesizing upper-body digital avatars with personalized appearance and controllable emotional expression. The user specifies the desired gender and age and provides a short text input from which the target emotional state is inferred. MAVAGEN then retrieves an identity image from the HaGRIDv2-1M corpus and generates an avatar clip with synchronized facial expressions, hand gestures, and expressive speech. The framework uses the following six feature streams: textual features, emotion-distribution features, landmark-based pose features, depth-geometry features, RGB-appearance features, and acoustic features. In a quantitative evaluation against recent human animation methods, MAVAGEN achieves the best overall avatar quality, with FID 48.20, FVD 592.00, SSIM 0.741, Sync-C 7.40, HKC 0.929, HKV 25.30, CSIM 0.563, and EmoAcc 0.88. Ablation results show that emotion and acoustic features contribute most to emotional agreement, while landmark-based pose and depth features improve geometric and motion stability. These results support the practical use of MAVAGEN in personalized LLM-based assistants and other emotion-sensitive interactive systems.

20 pages, 19314 KB

Open Access Article

Haptic and Thermal Rendering of Astronomical Data: A Multimodal Approach to Inclusive Science Communication

by Beatriz García, Johanna Casado and Alexis Mancilla

Multimodal Technol. Interact. 2026, 10(5), 54; https://doi.org/10.3390/mti10050054 - 12 May 2026

Abstract

Universal Accessibility in Astronomy requires a paradigm shift from visual-centric communication to multisensory data interaction. Because astronomy communication relies inherently on high-resolution imagery and visual metaphors, it creates significant accessibility barriers for blind and low-vision (BLV) audiences. To address this, multimodal encoding offers a feasible and meaningful solution by redistributing information across alternative sensory channels, ensuring that the absence of sight does not preclude the comprehension of spatial data. This article explores the development and evaluation of a low-cost, multimodal tool designed to represent complex astronomical concepts, specifically stellar magnitude and color, through tactile and auditory stimuli. Unlike traditional methods, our approach focuses on the haptic-cognitive link, allowing users to “feel” data through physical relief models. We present a structured impact study involving a heterogeneous group of blind, low-vision, and sighted participants. The methodology followed a mixed-methods approach, including a participatory workshop with 20 individuals and a detailed usability assessment with a core group (n = 6) of blind and low-vision participants. Preliminary results from this pilot phase demonstrate that multimodal integration effectively reduces the perceived mental effort for complex spatial data comprehension. Quantitative and qualitative feedback suggests that tactile-auditory sensory substitution not only improves accessibility but also enhances engagement and information retention across all user groups. These findings highlight the potential of multimodal models in transforming public scientific environments, such as museums and observatories, into inclusive, interactive spaces.

28 pages, 896 KB

Open Access Article

A Conceptual Framework for Mobile Augmented-Reality Storytelling to Support Collaborative Language Learning in Vocational Education and Training

by Eirini Maria Paraskevioti, Athanasios Christopoulos, Stylianos Mystakidis, Mikko-Jussi Laakso and Tapio Salakoski

Multimodal Technol. Interact. 2026, 10(5), 53; https://doi.org/10.3390/mti10050053 - 11 May 2026

Abstract

Augmented Reality (AR) has been found to produce significant effects on individual learning outcomes but its impact on collaborative applications remains moderate. Existing AR frameworks emphasize individual instructional design, whereas frameworks for collaborative learning rarely engage with the spatial and device-mediated affordances of mobile AR. In response to this inadequacy in the literature, we introduce the Mobile Augmented-Reality Storytelling for Vocational Education and Training (MARS-VET) framework, a four-dimensional conceptual architecture that integrates Computer-Supported Collaborative Learning (CSCL) scripting principles with mobile AR affordances for collaborative English as a Foreign Language (EFL) writing in Vocational Education and Training (VET) settings. MARS-VET synthesizes theoretical perspectives across four dimensions: contextual anchoring, which embeds activities within authentic workplace scenarios; collaborative orchestration, which structures group interaction through macro- and micro-level scripts; competency cultivation, which sequences writing progression from model-based reproduction toward autonomous professional text production; and capacity building, which addresses the professional-development requirements of implementing educators. Content validity was established through expert panel evaluation involving international specialists (N = 11) who rated the framework against 36 items using a four-point relevance scale and provided additional qualitative feedback. The Scale-level Content Validity Index (S-CVI/Ave = 0.91) exceeded established thresholds, with all four dimensions achieving satisfactory item-level indices. Experts reached unanimous agreement on items addressing workplace scenario identification and co-located access to linguistic resources. Qualitative feedback led to terminology refinements and clarification of orchestration mechanisms. 
The framework offers VET institutions and educators a reference for the design and evaluation of collaborative AR experiences in an area where integrative frameworks have so far been lacking. Full article

(This article belongs to the Special Issue Online Learning to Multimodal Era: Interfaces, Analytics and User Experiences)
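
The S-CVI/Ave statistic reported in the abstract above can be illustrated with a minimal sketch. It assumes the common convention that a rating of 3 or 4 on a four-point relevance scale counts as "relevant", that the item-level index (I-CVI) is the proportion of experts giving such a rating, and that S-CVI/Ave is the mean of the I-CVIs. The abstract does not state the exact computation the authors used, and the ratings below are invented for illustration only.

```python
from typing import Sequence


def item_cvi(ratings: Sequence[int]) -> float:
    """Proportion of experts rating the item 3 or 4 (assumed relevance cut-off)."""
    relevant = sum(1 for r in ratings if r >= 3)
    return relevant / len(ratings)


def scale_cvi_ave(rating_matrix: Sequence[Sequence[int]]) -> float:
    """Average of the item-level indices across all items (S-CVI/Ave)."""
    item_indices = [item_cvi(row) for row in rating_matrix]
    return sum(item_indices) / len(item_indices)


# Hypothetical ratings: one row per item, one column per expert.
example_ratings = [
    [4, 4, 3, 4],  # all four experts rate the item relevant
    [4, 3, 2, 4],  # three of four experts rate the item relevant
    [3, 3, 4, 4],  # all four experts rate the item relevant
]

print(f"S-CVI/Ave = {scale_cvi_ave(example_ratings):.2f}")
```

In the study itself the matrix would have 36 rows (items) and 11 columns (experts).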

43 pages, 4156 KB

Open AccessArticle

AI-Mediated Multimodal Learning and Its Impact on Sustainable Design Cognition: An Experimental Study with Interior Design Students

by Yang Song and Shaochen Wang

Multimodal Technol. Interact. 2026, 10(5), 52; https://doi.org/10.3390/mti10050052 - 9 May 2026

Abstract

In recent years, artificial intelligence has been fully involved in design practice and educational activities, and its impact on practice and education has received widespread attention from the academic community. This study aimed to preliminarily explore, through a controlled experiment, the differences in the impact of generative artificial intelligence (AI) tools and traditional web/literature tools on the sustainable design learning outcomes of interior design students in a specific teaching context at a university in China. A study was conducted on 58 third-year college students who were divided into an AI tool group (Class B) and a traditional tool group (Class A). Three semi-structured questionnaire surveys were conducted over two months to collect data on their understanding, attitudes, and practical applications of sustainable design. Quantitative statistics and text analysis methods were used for the comparison. The results showed that under specific experimental conditions, students who used AI tools showed a more significant improvement in their self-evaluation of knowledge mastery, but their sense of recognition of the importance of knowledge and subsequent learning willingness also decreased. In subsequent design practice, students in the traditional tool group showed higher initiative in applying concepts and diversity in strategies. Text analysis further suggests that AI-assisted learning may be more conducive to the rapid structured acquisition of knowledge, while traditional learning methods exhibit different characteristics in promoting deep semantic associations. The conclusions of this study are based on short-term experimental observations of specific samples and toolsets, revealing the tension between efficiency and depth that may be faced when integrating AI tools into interior design education, providing a reference and discussion basis for broader and longer-term teaching research in the future. Full article

32 pages, 4367 KB

Open AccessArticle

Comparison of Path Planning Algorithms for Manipulator Robots in Collaborative Manufacturing Environments: An Immersive Virtual Reality-Based Approach

by Jonathan David Aguilar and Carlos Felipe Rengifo

Multimodal Technol. Interact. 2026, 10(5), 51; https://doi.org/10.3390/mti10050051 - 6 May 2026

Abstract

Trajectory planning algorithms are essential in human–robot collaboration (HRC), as they must generate efficient trajectories for seamless interaction. Given the risks and complexity of testing in real-world scenarios, a virtual environment was developed in Unity 3D, integrating a virtual model of the UR3 robot that delivers workpieces to a user equipped with a Meta Quest device. The RRT, RRT-Star (RRTS), and RRT-Connect (RRTC) algorithms were evaluated using ANOVA and Tukey post hoc tests, considering the following response variables: safety, feasibility, smoothness, and computation time across three experimental scenarios characterized by (i) low, (ii) medium, and (iii) high levels of movement of the participant’s left hand. The statistical results indicate that RRTC exhibited the best performance in terms of smoothness and computation time. Based on these findings, a multicriteria decision-making analysis was conducted using the Analytic Hierarchy Process (AHP), combining quantitative evidence derived from the statistical analysis with expert judgments supported by bibliographic references. This multicriteria analysis enabled the coherent integration of the different evaluation criteria and concluded that RRTC is the most suitable alternative for collaborative assembly tasks in HRC environments. Full article
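
The AHP step mentioned in the abstract above derives criterion weights from a pairwise comparison matrix. The following minimal sketch uses the row geometric-mean method, a common approximation to the principal-eigenvector weights. The four criterion names come from the abstract, but the comparison values are hypothetical, and the abstract does not say which AHP variant or judgments the authors used.

```python
import math
from typing import List, Sequence


def ahp_weights(pairwise: Sequence[Sequence[float]]) -> List[float]:
    """Approximate AHP priority weights via normalized row geometric means."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


criteria = ["safety", "feasibility", "smoothness", "computation time"]

# Hypothetical judgments: entry [i][j] says how strongly criterion i is
# preferred over criterion j; entry [j][i] holds the reciprocal.
pairwise = [
    [1.0, 2.0, 4.0, 4.0],
    [1 / 2, 1.0, 2.0, 2.0],
    [1 / 4, 1 / 2, 1.0, 1.0],
    [1 / 4, 1 / 2, 1.0, 1.0],
]

for name, weight in zip(criteria, ahp_weights(pairwise)):
    print(f"{name}: {weight:.3f}")
```

The resulting weights would then be combined with each algorithm's per-criterion scores to rank RRT, RRTS, and RRTC; a full analysis would also check the consistency ratio of the judgments.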

14 pages, 1029 KB

Open AccessArticle

Sentiment Analysis Based on Enhanced Feature Decoupling and Multimodal Logical Reasoning

by Hua Yang, Ming Zhao, Yuanhao Qiu, Yuanyuan Li, Junying Guo, Ziran Zhang, Baozhou Chen, Mingzhe He and Yu Hong

Multimodal Technol. Interact. 2026, 10(5), 50; https://doi.org/10.3390/mti10050050 - 3 May 2026

Abstract

Despite significant advances, multimodal sentiment analysis still faces critical challenges in modeling complex cross-modal interactions and extracting discriminative sentiment features. To address these limitations, this paper proposes a hierarchical multimodal sentiment analysis framework. Specifically, a cross-modal feature enhancement module is first introduced to capture deep correlations among textual, visual, and acoustic modalities via cross-attention mechanisms, thereby obtaining context-aware fused representations. Subsequently, an attention-gated feature disentanglement approach is employed to effectively separate sentiment-relevant information from content-specific features within the fused representations; an independence loss is further imposed to enforce orthogonality between these two feature subsets, thereby mitigating noise induced by repetitive visual frames and textual stop words. Finally, all disentangled features are integrated to facilitate high-level sentiment reasoning through a multimodal logical inference module, where supervised contrastive loss is incorporated to enhance the discriminability of sentiment expressions. Extensive experiments conducted on two public benchmarks, CMU-MOSI and CMU-MOSEI, demonstrate that the proposed framework achieves improvements of 2–6% across multiple evaluation metrics compared with state-of-the-art methods. Full article

28 pages, 12791 KB

Open AccessArticle

Empirical Validation of Fitts’ Law in Virtual Reality: Modeling, Prediction, and Modality Comparison

by Nikolina Rodin, Dario Ogrizović, Luka Batistić and Sandi Ljubic

Multimodal Technol. Interact. 2026, 10(5), 49; https://doi.org/10.3390/mti10050049 - 1 May 2026

Abstract

Fitts’ law is a foundational model for predicting pointing performance and has been increasingly explored in immersive virtual reality (VR) environments. This paper presents a controlled experimental framework for deriving modality-specific Fitts’ law models in VR and evaluating their predictive transfer to applied interaction tasks. The framework comprises two scenarios. The first replicates a standardized ISO 9241 pointing task in a 3D virtual environment to derive predictive movement time models by systematically varying target distance (20–50 cm), target size (2.5–5 cm), and spatial configuration ($0^{\circ}$, $45^{\circ}$, $90^{\circ}$, $135^{\circ}$). The second simulates an applied warehouse-inspired task involving tool sorting and structured placement actions to evaluate the generalizability of the derived models in more ecologically valid VR interactions. Thirty-two participants completed all tasks using the Meta Quest 3 headset and two interaction modalities: a handheld controller and hand tracking with gesture recognition. Results show that Fitts’ law remains a strong predictor of movement time for 3D pointing in VR, with high linear fits for both the controller ($R^{2} = 0.9615$) and hand tracking ($R^{2} = 0.9668$). However, models derived from standardized pointing tasks showed limited transferability to applied object-manipulation scenarios, producing prediction errors of approximately 27–35% and systematically underestimating movement times. Additionally, both objective metrics and subjective evaluations indicated that controller-based interaction outperformed hand tracking in efficiency, accuracy, perceived workload, and usability. These findings highlight both the robustness and limitations of Fitts-based performance modeling in realistic VR interaction contexts. Full article
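
A Fitts' law model of the kind described in the abstract above is typically a straight-line fit of movement time against an index of difficulty. The sketch below assumes the Shannon formulation, ID = log2(D/W + 1), and an ordinary least-squares fit of MT = a + b · ID. The abstract does not state which formulation or fitting procedure the authors used. The distances and target sizes fall within the ranges quoted in the abstract, while the movement times are invented placeholders.

```python
import math
from typing import Sequence, Tuple


def index_of_difficulty(distance_cm: float, width_cm: float) -> float:
    """Index of difficulty in bits, Shannon formulation (an assumption here)."""
    return math.log2(distance_cm / width_cm + 1.0)


def fit_fitts(ids: Sequence[float], times: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of MT = a + b * ID; returns (a, b, r_squared)."""
    n = len(ids)
    mean_x = sum(ids) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in ids)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ids, times))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(ids, times))
    ss_tot = sum((y - mean_y) ** 2 for y in times)
    return a, b, 1.0 - ss_res / ss_tot


# (distance, width) pairs in cm; movement times in seconds are hypothetical.
conditions = [(20.0, 5.0), (20.0, 2.5), (50.0, 5.0), (50.0, 2.5)]
times_s = [0.62, 0.78, 0.85, 1.01]

ids = [index_of_difficulty(d, w) for d, w in conditions]
a, b, r2 = fit_fitts(ids, times_s)
print(f"MT = {a:.3f} + {b:.3f} * ID, R^2 = {r2:.4f}")
```

A transfer test like the one in the second scenario would apply the fitted `a` and `b` to the distances and target sizes of the applied task and compare predicted against observed movement times.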

41 pages, 1843 KB

Open AccessArticle

FLAG: Fatty Liver Awareness Game for Liver Health Literacy in Last-Semester Software Engineering Students

by Franklin Parrales-Bravo, José Borbor-Albay, Janio Jadán-Guerrero and Leonel Vasquez-Cevallos

Multimodal Technol. Interact. 2026, 10(5), 48; https://doi.org/10.3390/mti10050048 - 1 May 2026

Abstract

Non-alcoholic fatty liver disease affects approximately thirty percent of the global population, yet public awareness remains dangerously low among young adults facing occupational risk factors. This study introduces the Fatty Liver Awareness Game (FLAG), an educational serious game designed to improve liver health literacy among software engineering students at the University of Guayaquil. While evaluated with this specific sample, FLAG is intended for the broader target population of young adults in developing nations who face occupational sedentary risk and limited access to preventive health education. Through a controlled experiment with fifty participants randomly assigned to game-based or traditional lecture instruction, the game demonstrated superior effectiveness, with a twenty-percentage-point advantage in post-test scores and a seventy-two percent reduction in incorrect responses compared to fifty percent in the lecture group. The large effect size (Cohen’s d = 1.43) and reduced performance variability among game participants indicate that interactive, feedback-rich learning environments can outperform passive instruction for this population and content domain. While the present design does not isolate the contribution of individual game elements—such as narrative framing, explanatory feedback, or mini-game interleaving—the results establish FLAG as a replicable model for digital health interventions targeting underserved populations at critical developmental junctures. Future component analyses are needed to determine which specific design features drive the observed advantages. Full article
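
The effect size quoted in the abstract above can be illustrated with a minimal sketch of Cohen's d for two independent groups. It assumes the standard pooled-standard-deviation form; the abstract does not say which variant the authors computed, and the scores below are made up.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d: difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical post-test scores (percent correct) for the two conditions.
game_scores = [85.0, 90.0, 78.0, 88.0, 92.0]
lecture_scores = [70.0, 65.0, 80.0, 60.0, 75.0]

print(f"d = {cohens_d(game_scores, lecture_scores):.2f}")
```

By the usual rule of thumb, values of d above roughly 0.8 are described as large, which is the sense in which the abstract calls d = 1.43 a large effect.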
