Multimodal Technologies and Interaction

MTI, Vol. 10, Pages 61: AUMOR: Augmented-Reality-Based Mobile Application for University Orientation

Muhammad Nadeem — 2026-05-29

Multimodal Technologies and Interaction doi: 10.3390/mti10060061

Authors: Muhammad Nadeem Melinda Oroszlanyova Pauly Awad Hasan Ozkan Svetlana Beryozkina

Newly enrolled engineering students are often required to absorb a large amount of information within a short period of time, which can be academically and emotionally challenging. To address this challenge, this study introduces AUMOR, a mobile application designed to enhance university orientation by delivering contextual information at the point of need. It integrates GPS-based localization with QR code triggers to provide real-time, location-specific guidance and interactive content through an augmented reality (AR) interface. The GPS functionality supports location-based services, including information about academic buildings, student services, and recreational facilities, while QR codes on devices and laboratory equipment provide relevant information when scanned. A post-deployment user perception survey was conducted using a paper-based questionnaire involving 128 participants, including both students and faculty members. The results indicate that users perceived the application as helpful in enhancing their spatial awareness, navigation confidence, and ability to locate campus facilities, and that they reported high levels of usability and acceptance. The findings suggest that students perceived AUMOR as helpful for university orientation and that it has potential as a scalable solution.
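The GPS-triggered guidance described in this abstract can be illustrated with a small proximity check. This is a minimal sketch, not AUMOR's implementation: the function names, the 50 m trigger radius, and the point-of-interest records are all hypothetical, and the great-circle (haversine) distance is simply one common way to compare a device position with known campus locations.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def nearby_pois(lat, lon, pois, radius_m=50.0):
    """Return names of points of interest within radius_m, nearest first.

    `pois` is a hypothetical list of dicts with "name", "lat" and "lon" keys.
    """
    hits = [(haversine_m(lat, lon, p["lat"], p["lon"]), p["name"]) for p in pois]
    return [name for dist, name in sorted(hits) if dist <= radius_m]
```

A caller would pass the current GPS fix and a list of campus locations, then show AR content for whatever names come back.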

MTI, Vol. 10, Pages 60: Improving Chatbot Usability Through Structured Prompt-Based Interaction Design

Gisel Katerine Bastidas-Guacho — 2026-05-28

Multimodal Technologies and Interaction doi: 10.3390/mti10060060

Authors: Gisel Katerine Bastidas-Guacho Edison Patricio Azogue Martínez Marco Antonio Gabilanes Martínez Patricio Xavier Moreno-Vallejo

This study presents a comparative evaluation of the usability of an intelligent chatbot implemented in a childcare center management system, focusing on the impact of a prompt-enhanced conversational configuration on user experience. The Chatbot Usability Questionnaire (CUQ) was used to assess perceived usability under two conditions: a baseline configuration and an enhanced configuration incorporating role-based prompting and preprocessing mechanisms. The results indicate a substantial increase in CUQ scores, from 69 in the baseline condition to 91 in the enhanced condition, suggesting improved perceived usability. Rather than isolating prompt engineering as a standalone variable, this work evaluates a system-level design approach that integrates structured prompts, role-based contextualization, and interaction refinement strategies. This study contributes to the understanding of how prompt-enhanced conversational designs can improve response clarity, relevance, and interaction quality in multi-role environments, including parents, teachers, and administrators. The findings provide empirical evidence that such configurations are associated with more coherent and role-appropriate interactions in service-oriented chatbot systems.
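For readers unfamiliar with how a CUQ score on a 0 to 100 scale is obtained, the sketch below follows the commonly published scoring procedure for the questionnaire: 16 items rated 1 to 5, with odd-numbered items worded positively and even-numbered items worded negatively. The abstract does not describe the scoring itself, so treating it this way is an assumption, and the function name is hypothetical.

```python
def cuq_score(responses):
    """Score one participant's CUQ responses on a 0-100 scale.

    `responses` holds 16 Likert ratings (1-5) in questionnaire order.
    Assumes odd-numbered items are positive and even-numbered items negative.
    """
    if len(responses) != 16 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("expected 16 ratings between 1 and 5")
    positive = sum(responses[0::2]) - 8   # items 1, 3, ..., 15 contribute 0-32
    negative = 40 - sum(responses[1::2])  # items 2, 4, ..., 16 contribute 0-32
    return (positive + negative) / 64 * 100
```

Under this procedure, a condition-level score such as those reported above would be the mean of the per-participant scores.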

MTI, Vol. 10, Pages 59: The Pedagogical Transfer Chain in the DigCompEdu Framework from a Teacher-Reported Perspective: A Predictive Analysis Using PLS-SEM and ANN

Daira Marizol Carvajal Morales — 2026-05-26

Multimodal Technologies and Interaction doi: 10.3390/mti10060059

Authors: Daira Marizol Carvajal Morales Jessica Mariela Carvajal Morales Milton Alfonso Criollo Turusina Santiago José Chele Delgado Erika Jadira Romero Cardenas Juan Diego Valenzuela Cobos

The steady advancement of online education has not automatically translated into improved educational quality. Teacher training often continues to focus on the technical use of digital tools, while the pedagogical processes through which teachers report supporting students’ digital competence remain insufficiently understood. The objective of this study was to examine the sequential and predictive structure of teachers’ digital competence using the DigCompEdu framework as a reference. A quantitative cross-sectional study was conducted with a sample of 136 university teachers involved in online education. Data were collected through a self-reported questionnaire based on DigCompEdu and analyzed in two phases: Partial Least Squares Structural Equation Modeling (PLS-SEM) and Artificial Neural Networks (ANNs). The PLS-SEM results suggested a sequential pattern of associations among teacher-reported constructs: Professional Commitment (PC) was positively associated with Digital Resource Management (DR), which in turn was positively associated with Digital Pedagogy (DP) and Assessment and Feedback (AF). These dimensions were associated with Student Empowerment (SE), which showed the strongest positive relationship with teachers’ reported practices for Facilitating Students’ Digital Competence (FS). The ANN sensitivity analysis showed adequate predictive performance in the testing phase (RMSE = 0.155) and identified Student Empowerment as the predictor with the highest normalized importance within the specified model. These findings suggest that faculty development in online higher education may benefit from moving beyond basic digital literacy and platform management toward pedagogical design, formative assessment, inclusive participation, and learner agency. However, the results should be interpreted as evidence of teacher-reported facilitation practices within the analyzed sample, rather than as direct evidence of students’ actual digital competence development.

MTI, Vol. 10, Pages 58: A Comprehensive Review of Deep Learning Approaches for Video-Based Sign Language Recognition: Datasets, Challenges and Insights

Ulmeken Berzhanova — 2026-05-22

Multimodal Technologies and Interaction doi: 10.3390/mti10060058

Authors: Ulmeken Berzhanova Aigerim Yerimbetova Marek Milosz Bakzhan Sakenov Dina Oralbekova Elmira Daiyrbayeva Daniyar Turgan

This study presents a comprehensive review of more than 100 research papers on sign language recognition (SLR) published between 2020 and 2026. The analysis focuses on deep learning approaches applied to video-based SLR, including spatiotemporal feature extraction, temporal modeling, attention mechanisms, motion-based representations, hybrid frameworks, transfer learning, and other methods. Particular attention is given to how these methods model spatiotemporal dynamics and capture subtle gesture characteristics in sign language communication. The review highlights several recent developments, such as the introduction of specialized datasets, the emergence of real-time recognition systems, and the integration of multimodal fusion strategies. At the same time, persistent challenges remain, including data scarcity in low-resource sign languages, limited linguistic standardization of datasets, and insufficient model interpretability. The findings underline the importance of developing scalable and generalizable models capable of handling diverse datasets and user variability. The distinct contributions of this review are fourfold: (1) a synthesis of over 100 studies published between 2020 and 2026, covering the full spectrum of deep learning architectures for video-based SLR; (2) a structured six-category taxonomy enabling systematic cross-architectural comparison; (3) a dedicated focus on low-resource sign languages, which remain underrepresented in the existing literature; and (4) a critical analysis of the current benchmark landscape for low-resource sign languages, identifying key gaps and outlining strategic directions for future dataset development. These contributions are intended to guide further research toward more robust, inclusive, and universally applicable SLR systems.

MTI, Vol. 10, Pages 57: From Prompt to Play: Examining Computational Thinking Through Vibe Coding in Game Making for Pre-Service Teacher Education

Nikolaos Pellas — 2026-05-21

Multimodal Technologies and Interaction doi: 10.3390/mti10050057

Authors: Nikolaos Pellas

Computational thinking (CT) is increasingly recognized as essential in education, yet teacher preparation programs struggle to develop both computational proficiency and pedagogical readiness in pre-service teachers (PSTs). This study examines an AI-mediated, game-making course grounded in the emerging “vibe coding” paradigm, where 24 novice PSTs iteratively constructed programs through natural language prompting. Adopting a mixed-methods design, the study drew on pre- and post-course attitude questionnaires, reflective accounts of prompting strategies, and open-ended responses. Results indicate that participants substantively engaged with core CT practices, particularly debugging, iterative refinement, and problem decomposition. Self-reported coding and teaching confidence nonetheless recalibrated downward, which the study interprets as a productive adjustment rather than a failure. Conversely, attitudes toward game-making improved significantly, with a medium effect size for perceived instructional value (d = 0.51), the largest practical effect observed across dimensions. Most participants intended to integrate CT into future teaching. These findings suggest that prompt-driven learning environments support meaningful engagement with computational processes when carefully scaffolded, but do not inherently ensure pedagogical readiness, particularly for higher-order CT practices such as abstraction and pattern recognition. Unlike prior research that has examined game-making processes or PST attitudes toward CT in isolation, this study empirically integrates all three within a single scaffolded instructional design using vibe coding. This integration enables a process-level account of how CT is enacted, and how it develops, when code generation is partially delegated to AI systems. Beyond documenting attitude shifts, the study introduces an analytical rubric for identifying CT engagement in AI-mediated prompting and derives evidence-based design principles that specify the pedagogical conditions under which vibe coding supports, rather than bypasses, computational reasoning.

MTI, Vol. 10, Pages 56: Attention-Based Multimodal Fusion for Salience-Aware Blended Emotion Recognition

José Salas-Cáceres — 2026-05-20

Multimodal Technologies and Interaction doi: 10.3390/mti10050056

Authors: José Salas-Cáceres Modesto Castrillón-Santana Oliverio J. Santana Daniel Hernández-Sosa Javier Lorenzo-Navarro

Blended emotion recognition introduces the challenge of identifying not only which emotions are present in an expressive display but also their relative salience. The proposed methodology builds upon the pre-extracted features provided with the dataset and enhances performance through a combination of temporal modeling and multimodal fusion strategies. Unimodal experiments revealed that visual encoders consistently outperformed audio ones, with the multimodal HiCMAE encoder achieving the strongest single-encoder results with 34% presence accuracy and 18.23% salience accuracy. Multimodal fusion further improved performance, with the best validation results obtained using a combination of simple concatenation and attention-based fusion, reaching 47.86% in presence accuracy and 27.92% in salience accuracy. Overall, the proposed methodology surpasses the chosen baseline introduced in the original paper across a k-fold experiment, confirming the effectiveness of multimodal attention-based fusion for the accurate prediction of both emotion presence and salience in blended affective behaviour. The experimental results further indicate that multimodal expression recognition consistently outperforms unimodal approaches, highlighting the complementary nature of cross-modal information.

MTI, Vol. 10, Pages 55: MAVAGEN: Multimodal Avatar Generation Framework for Personalized Human–Computer Interaction

Alexandr Axyonov — 2026-05-18

Multimodal Technologies and Interaction doi: 10.3390/mti10050055

Authors: Alexandr Axyonov Elena Ryumina Dmitry Ryumin Alexey Karpov

Digital-avatar systems still provide limited control over emotionally expressive behavior in human–computer interaction, especially in Large Language Model (LLM)-based chatbots and virtual assistants with personalized visual embodiments. To address this problem, we propose Multimodal Avatar Generation (MAVAGEN), a framework for synthesizing upper-body digital avatars with personalized appearance and controllable emotional expression. The user specifies the desired gender and age and provides a short text input from which the target emotional state is inferred. MAVAGEN then retrieves an identity image from the HaGRIDv2-1M corpus and generates an avatar clip with synchronized facial expressions, hand gestures, and expressive speech. The framework uses the following six feature streams: textual features, emotion-distribution features, landmark-based pose features, depth-geometry features, RGB-appearance features, and acoustic features. In a quantitative evaluation against recent human animation methods, MAVAGEN achieves the best overall avatar quality, with FID 48.20, FVD 592.00, SSIM 0.741, Sync-C 7.40, HKC 0.929, HKV 25.30, CSIM 0.563, and EmoAcc 0.88. Ablation results show that emotion and acoustic features contribute most to emotional agreement, while landmark-based pose and depth features improve geometric and motion stability. These results support the practical use of MAVAGEN in personalized LLM-based assistants and other emotion-sensitive interactive systems.

MTI, Vol. 10, Pages 54: Haptic and Thermal Rendering of Astronomical Data: A Multimodal Approach to Inclusive Science Communication

Beatriz García — 2026-05-12

Multimodal Technologies and Interaction doi: 10.3390/mti10050054

Authors: Beatriz García Johanna Casado Alexis Mancilla

Universal Accessibility in Astronomy requires a paradigm shift from visual-centric communication to multisensory data interaction. Because astronomy communication relies inherently on high-resolution imagery and visual metaphors, it creates significant accessibility barriers for blind and low-vision (BLV) audiences. To address this, multimodal encoding offers a feasible and meaningful solution by redistributing information across alternative sensory channels, ensuring that the absence of sight does not preclude the comprehension of spatial data. This article explores the development and evaluation of a low-cost, multimodal tool designed to represent complex astronomical concepts, specifically stellar magnitude and color, through tactile and auditory stimuli. Unlike traditional methods, our approach focuses on the haptic-cognitive link, allowing users to “feel” data through physical relief models. We present a structured impact study involving a heterogeneous group of blind, low-vision, and sighted participants. The methodology followed a mixed-methods approach, including a participatory workshop with 20 individuals and a detailed usability assessment with a core group (n = 6) of blind and low-vision participants. Preliminary results from this pilot phase demonstrate that multimodal integration effectively reduces the perceived mental effort for complex spatial data comprehension. Quantitative and qualitative feedback suggests that tactile-auditory sensory substitution not only improves accessibility but also enhances engagement and information retention across all user groups. These findings highlight the potential of multimodal models in transforming public scientific environments, such as museums and observatories, into inclusive, interactive spaces.

MTI, Vol. 10, Pages 53: A Conceptual Framework for Mobile Augmented-Reality Storytelling to Support Collaborative Language Learning in Vocational Education and Training

Eirini Maria Paraskevioti — 2026-05-11

Multimodal Technologies and Interaction doi: 10.3390/mti10050053

Authors: Eirini Maria Paraskevioti Athanasios Christopoulos Stylianos Mystakidis Mikko-Jussi Laakso Tapio Salakoski

Augmented Reality (AR) has been found to produce significant effects on individual learning outcomes, but its impact on collaborative applications remains moderate. Existing AR frameworks emphasize individual instructional design, whereas frameworks for collaborative learning rarely engage with the spatial and device-mediated affordances of mobile AR. In response to this gap in the literature, we introduce the Mobile Augmented-Reality Storytelling for Vocational Education and Training (MARS-VET) framework, a four-dimensional conceptual architecture that integrates Computer-Supported Collaborative Learning (CSCL) scripting principles with mobile AR affordances for collaborative English as a Foreign Language (EFL) writing in Vocational Education and Training (VET) settings. MARS-VET synthesizes theoretical perspectives across four dimensions: contextual anchoring, which embeds activities within authentic workplace scenarios; collaborative orchestration, which structures group interaction through macro- and micro-level scripts; competency cultivation, which sequences writing progression from model-based reproduction toward autonomous professional text production; and capacity building, which addresses the professional-development requirements of implementing educators. Content validity was established through expert panel evaluation involving international specialists (N = 11) who rated the framework against 36 items using a four-point relevance scale and provided additional qualitative feedback. The Scale-level Content Validity Index (S-CVI/Ave = 0.91) exceeded established thresholds, with all four dimensions achieving satisfactory item-level indices. Experts reached unanimous agreement on items addressing workplace scenario identification and co-located access to linguistic resources. Qualitative feedback led to terminology refinements and clarification of orchestration mechanisms. The framework offers VET institutions and educators a reference for the design and evaluation of collaborative AR experiences in an area where integrative frameworks have so far been lacking.
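The content validity figures reported in this abstract can be made concrete with a short calculation. The sketch below uses the conventional definitions: an item-level index (I-CVI) is the share of experts rating an item 3 or 4 on the four-point relevance scale, and S-CVI/Ave is the mean of the I-CVIs. Whether the authors computed it exactly this way is an assumption, and the function names are hypothetical.

```python
def item_cvi(ratings):
    """I-CVI: proportion of experts rating the item as relevant (3 or 4 on a 1-4 scale)."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)


def scale_cvi_ave(rating_matrix):
    """S-CVI/Ave: average of the item-level indices.

    `rating_matrix` is a list of items, each a list of expert ratings for that item.
    """
    icvis = [item_cvi(item) for item in rating_matrix]
    return sum(icvis) / len(icvis)
```

With 11 experts, an item on which all raters choose 3 or 4 has an I-CVI of 1.0, which corresponds to the unanimous agreement mentioned above.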

MTI, Vol. 10, Pages 52: AI-Mediated Multimodal Learning and Its Impact on Sustainable Design Cognition: An Experimental Study with Interior Design Students

Yang Song — 2026-05-09

Multimodal Technologies and Interaction doi: 10.3390/mti10050052

Authors: Yang Song Shaochen Wang

In recent years, artificial intelligence has become deeply involved in design practice and education, and its impact on both has received widespread attention from the academic community. This study used a controlled experiment to conduct a preliminary comparison of how generative artificial intelligence (AI) tools and traditional web/literature tools affect the sustainable design learning outcomes of interior design students in a specific teaching context at a university in China. The participants were 58 third-year college students, divided into an AI tool group (Class B) and a traditional tool group (Class A). Three semi-structured questionnaire surveys were conducted over two months to collect data on their understanding, attitudes, and practical applications of sustainable design. Quantitative statistics and text analysis methods were used for the comparison. The results showed that, under the specific experimental conditions, students who used AI tools showed a more pronounced improvement in their self-evaluation of knowledge mastery, but their recognition of the importance of the knowledge and their willingness to continue learning decreased. In subsequent design practice, students in the traditional tool group showed higher initiative in applying concepts and greater diversity in strategies. Text analysis further suggests that AI-assisted learning may be more conducive to the rapid, structured acquisition of knowledge, while traditional learning methods exhibit different characteristics in promoting deep semantic associations. The conclusions of this study are based on short-term experimental observations of specific samples and toolsets. They reveal a tension between efficiency and depth that may arise when integrating AI tools into interior design education, and they provide a reference and basis for discussion for broader and longer-term teaching research.

MTI, Vol. 10, Pages 51: Comparison of Path Planning Algorithms for Manipulator Robots in Collaborative Manufacturing Environments: An Immersive Virtual Reality-Based Approach

Jonathan David Aguilar — 2026-05-06

Multimodal Technologies and Interaction doi: 10.3390/mti10050051

Authors: Jonathan David Aguilar Carlos Felipe Rengifo

Trajectory planning algorithms are essential in human–robot collaboration (HRC), as they must generate efficient trajectories for seamless interaction. Given the risks and complexity of testing in real-world scenarios, a virtual environment was developed in Unity 3D, integrating a virtual model of the UR3 robot that delivers workpieces to a user equipped with a Meta Quest device. The RRT, RRT-Star (RRTS), and RRT-Connect (RRTC) algorithms were evaluated using ANOVA and Tukey post hoc tests, considering the following response variables: safety, feasibility, smoothness, and computation time across three experimental scenarios characterized by (i) low, (ii) medium, and (iii) high levels of movement of the participant’s left hand. The statistical results indicate that RRTC exhibited the best performance in terms of smoothness and computation time. Based on these findings, a multicriteria decision-making analysis was conducted using the Analytic Hierarchy Process (AHP), combining quantitative evidence derived from the statistical analysis with expert judgments supported by bibliographic references. This multicriteria analysis enabled the coherent integration of the different evaluation criteria and concluded that RRTC is the most suitable alternative for collaborative assembly tasks in HRC environments.
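The AHP step mentioned in this abstract turns pairwise judgments between criteria into weights. The sketch below shows one standard way to do that, the row geometric-mean approximation, together with Saaty's consistency ratio. The abstract does not say which variant the authors used, and the example weights in the usage note are purely hypothetical.

```python
import math

# Saaty's random consistency index for small matrices (n = 3, 4, 5).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Approximate AHP priority weights via normalized row geometric means."""
    gms = [math.prod(row) ** (1 / len(row)) for row in matrix]
    total = sum(gms)
    return [g / total for g in gms]


def consistency_ratio(matrix, weights):
    """Consistency ratio CR = CI / RI; values below about 0.1 are usually accepted."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n  # principal eigenvalue estimate
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]
```

For the four criteria named above (safety, feasibility, smoothness, computation time), a 4x4 matrix whose entry (i, j) states how much more important criterion i is than criterion j would be passed to `ahp_weights`. Each candidate algorithm is then scored against the weighted criteria.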

MTI, Vol. 10, Pages 50: Sentiment Analysis Based on Enhanced Feature Decoupling and Multimodal Logical Reasoning

Hua Yang — 2026-05-03

Multimodal Technologies and Interaction doi: 10.3390/mti10050050

Authors: Hua Yang Ming Zhao Yuanhao Qiu Yuanyuan Li Junying Guo Ziran Zhang Baozhou Chen Mingzhe He Yu Hong

Despite significant advances, multimodal sentiment analysis still faces critical challenges in modeling complex cross-modal interactions and extracting discriminative sentiment features. To address these limitations, this paper proposes a hierarchical multimodal sentiment analysis framework. Specifically, a cross-modal feature enhancement module is first introduced to capture deep correlations among textual, visual, and acoustic modalities via cross-attention mechanisms, thereby obtaining context-aware fused representations. Subsequently, an attention-gated feature disentanglement approach is employed to effectively separate sentiment-relevant information from content-specific features within the fused representations; an independence loss is further imposed to enforce orthogonality between these two feature subsets, thereby mitigating noise induced by repetitive visual frames and textual stop words. Finally, all disentangled features are integrated to facilitate high-level sentiment reasoning through a multimodal logical inference module, where supervised contrastive loss is incorporated to enhance the discriminability of sentiment expressions. Extensive experiments conducted on two public benchmarks, CMU-MOSI and CMU-MOSEI, demonstrate that the proposed framework achieves improvements of 2–6% across multiple evaluation metrics compared with state-of-the-art methods.
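The independence loss in this abstract is described only as enforcing orthogonality between sentiment-relevant and content-specific features. One simple way such a penalty could be written is the mean squared cosine similarity between the paired feature vectors, which is zero when every pair is orthogonal. This is an illustrative assumption, not the paper's formulation, and the names below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (small epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def independence_loss(sentiment_feats, content_feats):
    """Mean squared cosine similarity over a batch of paired feature vectors.

    Returns 0.0 when each sentiment vector is orthogonal to its content vector,
    and approaches 1.0 when the two are parallel.
    """
    sims = [cosine(s, c) ** 2 for s, c in zip(sentiment_feats, content_feats)]
    return sum(sims) / len(sims)
```

In a training setup, a term like this would be added to the task loss with a weighting coefficient, so that minimizing it pushes the two feature subsets apart.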

MTI, Vol. 10, Pages 49: Empirical Validation of Fitts’ Law in Virtual Reality: Modeling, Prediction, and Modality Comparison

Nikolina Rodin — 2026-05-01

Multimodal Technologies and Interaction doi: 10.3390/mti10050049

Authors: Nikolina Rodin Dario Ogrizović Luka Batistić Sandi Ljubic

Fitts’ law is a foundational model for predicting pointing performance and has been increasingly explored in immersive virtual reality (VR) environments. This paper presents a controlled experimental framework for deriving modality-specific Fitts’ law models in VR and evaluating their predictive transfer to applied interaction tasks. The framework comprises two scenarios. The first replicates a standardized ISO 9241 pointing task in a 3D virtual environment to derive predictive movement time models by systematically varying target distance (20–50 cm), target size (2.5–5 cm), and spatial configuration (0°, 45°, 90°, 135°). The second simulates an applied warehouse-inspired task involving tool sorting and structured placement actions to evaluate the generalizability of the derived models in more ecologically valid VR interactions. Thirty-two participants completed all tasks using the Meta Quest 3 headset and two interaction modalities: a handheld controller and hand tracking with gesture recognition. Results show that Fitts’ law remains a strong predictor of movement time for 3D pointing in VR, with high linear fits for both the controller (R² = 0.9615) and hand tracking (R² = 0.9668). However, models derived from standardized pointing tasks showed limited transferability to applied object-manipulation scenarios, producing prediction errors of approximately 27–35% and systematically underestimating movement times. Additionally, both objective metrics and subjective evaluations indicated that controller-based interaction outperformed hand tracking in efficiency, accuracy, perceived workload, and usability. These findings highlight both the robustness and limitations of Fitts-based performance modeling in realistic VR interaction contexts.
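The modeling step in this abstract, deriving a movement time model and reporting its linear fit, can be sketched as follows. The code assumes the Shannon formulation of the index of difficulty, ID = log2(D/W + 1), which is commonly used with ISO 9241 pointing tasks. The abstract does not state the formulation, and the function names are hypothetical.

```python
import math


def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(distance / width + 1)


def fit_fitts(ids, movement_times):
    """Ordinary least-squares fit of MT = a + b * ID; returns (a, b, r_squared)."""
    n = len(ids)
    mean_x = sum(ids) / n
    mean_y = sum(movement_times) / n
    sxx = sum((x - mean_x) ** 2 for x in ids)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ids, movement_times))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(ids, movement_times))
    ss_tot = sum((y - mean_y) ** 2 for y in movement_times)
    return a, b, 1 - ss_res / ss_tot
```

In use, each distance and size condition yields one ID value paired with the mean observed movement time, and the fitted intercept and slope then predict movement time for new targets. The transfer errors reported above would correspond to comparing such predictions with times measured in the applied task.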

MTI, Vol. 10, Pages 48: FLAG: Fatty Liver Awareness Game for Liver Health Literacy in Last-Semester Software Engineering Students

Franklin Parrales-Bravo — 2026-05-01

Multimodal Technologies and Interaction doi: 10.3390/mti10050048

Authors: Franklin Parrales-Bravo José Borbor-Albay Janio Jadán-Guerrero Leonel Vasquez-Cevallos

Non-alcoholic fatty liver disease affects approximately thirty percent of the global population, yet public awareness remains dangerously low among young adults facing occupational risk factors. This study introduces the Fatty Liver Awareness Game (FLAG), an educational serious game designed to improve liver health literacy among software engineering students at the University of Guayaquil. While evaluated with this specific sample, FLAG is intended for the broader target population of young adults in developing nations who face occupational sedentary risk and limited access to preventive health education. Through a controlled experiment with fifty participants randomly assigned to game-based or traditional lecture instruction, the game demonstrated superior effectiveness, with a twenty-percentage-point advantage in post-test scores and a seventy-two percent reduction in incorrect responses compared to fifty percent in the lecture group. The large effect size (Cohen’s d = 1.43) and reduced performance variability among game participants indicate that interactive, feedback-rich learning environments can outperform passive instruction for this population and content domain. While the present design does not isolate the contribution of individual game elements—such as narrative framing, explanatory feedback, or mini-game interleaving—the results establish FLAG as a replicable model for digital health interventions targeting underserved populations at critical developmental junctures. Future component analyses are needed to determine which specific design features drive the observed advantages.
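The effect size quoted in this abstract is a standardized mean difference. The sketch below computes Cohen's d for two independent groups using the pooled sample standard deviation. The abstract does not state which variant the authors used, so the pooled form is an assumption, and the function name is hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
```

By the usual rule of thumb, values around 0.8 and above are described as large, which is consistent with how the reported d = 1.43 is characterized.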

MTI, Vol. 10, Pages 47: Parallel Bilingual Datasets: A Multimodal Deep Learning Framework for Proficiency and Style Classification

Padmavathi Kesavan — 2026-04-30

Multimodal Technologies and Interaction doi: 10.3390/mti10050047

Authors: Padmavathi Kesavan Miranda Lakshmi Travis Martin Aruldoss Martin Wynn

This study presents a multimodal deep learning framework for automatic proficiency and style classification of parallel Bilingual Tamil–Hindi learner data. The proposed system employs a dual-headed neural architecture to simultaneously predict proficiency levels (Basic, Advanced) and stylistic categories (Formal, Literary) using shared feature representations. A curated dataset of bilingual text samples is utilized, along with synthetic speech generated through text-to-speech (TTS) to enable controlled multimodal experimentation. Five deep learning architectures are evaluated under text-only, audio-only, and learnable fusion settings. Experimental findings indicate that text-based models consistently achieve strong performance in both proficiency and style classification tasks. In contrast, the audio-only model demonstrates limited effectiveness, highlighting the constraints of synthetic acoustic features in capturing meaningful linguistic information. The fusion models provide only marginal improvements over text-based approaches, suggesting that textual representations play a dominant role in proficiency and stylistic classification within controlled datasets. These results emphasize the importance of linguistic features over acoustic signals for automated language assessment in low-resource settings. The proposed framework provides a scalable and reproducible approach and offers a foundation for future work incorporating real speech data and more diverse linguistic inputs.

MTI, Vol. 10, Pages 46: AI-Enhanced Motion Capture for Multimodal Interaction in Chinese Shadow Puppetry Heritage

Gaihua Wang — 2026-04-28

Multimodal Technologies and Interaction doi: 10.3390/mti10050046

Authors: Gaihua Wang Hengchao Yun Lixin Yang Qingyuan Zheng Tianmuran Liu

This study examines how AI-enhanced motion capture (AI-MoCap) mediates the preservation, transmission, and re-creation of Chinese shadow puppetry as performative intangible cultural heritage. Through a state-of-the-art review and comparative analysis of three representative application models—technology-driven, culturally integrated, and entertainment-oriented—the paper explores how AI-MoCap supports the digitization of performative techniques while reshaping modes of cultural presentation and interaction. Cross-case comparison highlights recurring tensions between technical standardization and cultural authenticity while also indicating possibilities for symbolic reconstruction, contextual continuity, and ethically grounded design. Based on this comparison, the paper develops a dual-channel inheritance framework—“perception–symbol” and “design–performance”—and treats cultural resolution and digital ethics as analytical and normative principles for resisting algorithmic homogenization. Rather than functioning only as a digitization tool, AI-MoCap can be understood as a mediating mechanism whose cultural value depends on how it remains embedded in community-based performative logics, symbolic systems, and ethical boundaries. The resulting framework offers transferable guidance for future research, curation, training, and policy discussion in the digital safeguarding of performance-based heritage.

MTI, Vol. 10, Pages 45: Interactive Narratives and Serious Games in Oncology and Grief Support: A Systematic Literature Review

João Macieira — 2026-04-27

Multimodal Technologies and Interaction doi: 10.3390/mti10050045

Authors: João Macieira Marco Vale Elena Vanica Vitor Carvalho

The impact of oncological diseases extends far beyond the clinical patient, profoundly affecting the mental health of caregivers, family members, and volunteers who navigate complex emotional landscapes of grief, anxiety, and trauma. While the domain of digital health has seen a proliferation of serious games aimed at pediatric patient education and treatment adherence, the specific perspective of the “second-order patient”, the caregiver or survivor, remains significantly under-explored. The primary objective of this study is to systematically review the current state of interactive narratives in oncology, palliative care, and grief support, identifying research gaps to inform the broader design space of empathy-driven serious games. Following the PRISMA guidelines, 31 articles were selected from an initial query of 116 records. Interventions were categorized into Serious Games, Games, and Gamification. The analysis reveals a critical thematic transition: early interventions relied heavily on biological “battle” metaphors to empower patients, whereas the current literature advocates for “thanatosensitive” designs that foster empathy. However, a distinct research gap persists regarding narratives that explore post-loss meaning reconstruction and the hospital volunteer experience. Synthesizing these findings, this paper establishes an evidence-based theoretical framework demonstrating a significant opportunity for games that prioritize dialogue and emotional processing over traditional winning conditions. As a practical application of these findings, we also briefly outline the conceptualization of a prototype simulating a widower’s experience volunteering in a palliative ward, shifting the ludic focus from defeating a disease to navigating loss.

MTI, Vol. 10, Pages 44: From Tradition to Technology: A Framework for Smart Pilgrim Management on the Camino de Santiago

Adriana Mar — 2026-04-23

Multimodal Technologies and Interaction doi: 10.3390/mti10050044

Authors: Adriana Mar Fernando Monteiro Pedro Pereira Jose Carlos García João F. A. Martins Daniel Basulto

The Camino de Santiago, a UNESCO-listed pilgrimage route, has experienced sustained growth in visitor numbers, challenging municipalities to preserve cultural integrity while ensuring service quality. This study reviews people-counting technologies and proposes a smart pilgrim management framework grounded in flux measurement systems to support data-driven and sustainable decision-making. Drawing on the smart tourism literature, the conceptual framework integrates infrared counters, mobile tracking solutions, and GPS/Wi-Fi data to generate real-time insights into pilgrim flows. A pilot simulation illustrates how these data can inform operational and strategic planning. The framework enables local authorities to monitor pedestrian movements, anticipate service demands (sanitation, accommodation, and safety), and detect overcrowding in sensitive heritage areas. By incorporating technological solutions into traditionally low-tech pilgrimage settings, municipalities can transition from reactive to proactive management approaches. The paper contributes a scalable and ethically grounded framework tailored to heritage pilgrimage routes, advancing smart tourism applications in culturally significant contexts.

MTI, Vol. 10, Pages 43: Who, Where, What, and How to Nudge: A Systematic Review of Co-Designed Digital Nudges for Behavioral Interventions

Alaa Ziyud — 2026-04-21

Multimodal Technologies and Interaction doi: 10.3390/mti10040043

Authors: Alaa Ziyud Khaled Al-Thelaya Jens Schneider

Digital nudges refer to subtle modifications in digital choice architectures that are increasingly applied across domains such as healthcare, human–computer interaction, and behavioral science. However, existing approaches often overlook users’ needs, contextual factors, and ethical considerations related to transparency and autonomy. This systematic literature review, guided by PRISMA 2020, examines the integration of co-design methodologies in digital nudging across four dimensions: participants, application domains, nudge forms, and development methods. The findings show that co-design is primarily driven by end-users, supported by domain experts and technology specialists. Applications are concentrated in health-related contexts, particularly chronic disease management and mental health. The effectiveness of priming varied across studies, with some reporting short-term benefits and others indicating user fatigue, suggesting context-dependent impact and limited long-term effectiveness.

MTI, Vol. 10, Pages 42: From Prompts to High-Fidelity Prototypes: A Usability Evaluation of Generative AI-Driven Prototyping Tools for Smart Mobile App Design

John Bustamante-Orejuela — 2026-04-17

Multimodal Technologies and Interaction doi: 10.3390/mti10040042

Authors: John Bustamante-Orejuela Xavier Quiñonez-Ku Pablo Pico-Valencia

The integration of Generative Artificial Intelligence (GAI) into software design tools has transformed the early stages of mobile application development, particularly prototype creation from natural-language prompts. This study evaluates the usability and effectiveness of GAI-assisted prototyping tools for generating high-fidelity mobile application prototypes. A controlled laboratory usability study was conducted in which undergraduate Information Technology Engineering students used and evaluated four widely adopted prototyping platforms: Figma, Uizard, Visily, and Stitch. Participants employed these tools to recreate mobile interfaces corresponding to the interaction model of the Duolingo application. The System Usability Scale (SUS) was used to assess perceived usability and effectiveness from the users’ perspective. The results indicate that all evaluated tools enabled rapid prototype generation; however, significant differences emerged in usability, structural fidelity, and perceived control. Figma and Stitch achieved the highest usability scores (82.86 and 80.36, respectively) and demonstrated greater alignment with the reference prototype. Visily achieved a favorable usability score (78.57), while Uizard obtained a moderate score (67.14). Although Uizard and Visily exhibited strong automation capabilities and faster initial generation, their outputs required additional manual refinement to achieve higher fidelity and customization. Participant feedback emphasized the importance of output quality, responsiveness, and foundational design knowledge in achieving satisfactory results. Overall, the findings suggest that current GAI-based prototyping tools are effective and valuable in real-world software development contexts. However, their effectiveness appears closely related to the degree of user control, responsiveness, and the ability to iteratively refine AI-generated interface components.
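
For readers unfamiliar with how SUS values such as 82.86 or 67.14 arise, the conventional scoring rule can be sketched as follows. This is a minimal illustration of the standard SUS formula, not the authors' analysis code; the function name `sus_score` and the sample responses are assumptions made for this example.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one respondent.

    `responses` holds the ten questionnaire answers in order,
    each on the usual 1 (strongly disagree) to 5 (strongly agree) scale.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for item_number, r in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd items are positively worded: contribution is (answer - 1).
            total += r - 1
        else:
            # Even items are negatively worded: contribution is (5 - answer).
            total += 5 - r
    # The summed contributions (0-40) are rescaled to 0-100.
    return total * 2.5


# Hypothetical respondent, for illustration only.
example = [4, 2, 5, 1, 4, 2, 4, 2, 5, 1]
print(sus_score(example))
```

A tool's reported score is typically the mean of these per-respondent scores, which is why non-round values such as 82.86 appear.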

MTI, Vol. 10, Pages 41: Introducing Brain–Computer Interfaces in Factories and Fabrication Lines for the Inclusion of Disabled Workers–Industry 5.0—A Modern Challenge and Opportunity

Marian-Silviu Poboroniuc — 2026-04-17

Multimodal Technologies and Interaction doi: 10.3390/mti10040041

Authors: Marian-Silviu Poboroniuc Zoltán Nochta Martin Klepal Nina Hunter Danut-Constantin Irimia Alina Georgiana Baciu Kelaja Schert Tim Piotrowski Alexandru Mitocaru

Flexible factories and adaptive fabrication lines offer a testbed for advanced multimodal interaction concepts that can support the inclusion of disabled workers in Industry 5.0 manufacturing systems. The study synthesizes interdisciplinary data from ergonomics, industrial automation, and EU regulatory frameworks to establish a conceptual model for human-machine interaction. Building on conceptual modeling and a structured literature analysis, the study proposes a six-step integration framework that links task demands, worker capabilities, and interaction modalities within human-in-the-loop manufacturing environments. Although no empirical case study was conducted in this phase, an exemplary application is presented for a semi-automated bike wheel manufacturing process. Detailed machine-based assembly line flows and simulated process data were utilized for illustrative purposes to depict the process and validate the proposed Capability–Task Matching Matrix. The results operationalize the human-centric vision of Industry 5.0 by providing a structured methodology for the inclusion of disabled workers within fabrication environments. The findings are organized into two primary components: the conceptual development of the Integration Approach and its practical application to a semi-automated industrial use-case. Finally, a particular focus is placed on Brain–Computer Interfaces (BCIs) as an emerging interaction channel that enables non-muscular control, attention monitoring, and neuroadaptive feedback, complementing conventional interfaces rather than replacing them. The framework is illustrated through application to the same semi-automated bicycle wheel assembly line, where BCI-supported interaction, augmented interfaces, and robotic assistance are mapped to specific production tasks and assessed in terms of feasibility and technological maturity. 
Drawing on the paper’s results, an explanatory 10-year roadmap outlines the feasibility and phased deployment of BCI solutions. It aligns technological advances with European regulations and a vision for a fully inclusive manufacturing enterprise.

MTI, Vol. 10, Pages 40: The Discrimination Threshold on the Palm for Two Successive Rectangular Stimuli

Mayuka Kojima — 2026-04-15

Multimodal Technologies and Interaction doi: 10.3390/mti10040040

Authors: Mayuka Kojima Akio Yamamoto

This study investigates tactile spatial resolution on the palm using two successive rectangular stimuli. Whereas classical tactile resolution studies have focused mainly on point or circular stimulation, less is known about how spatial resolution depends on the placement and geometry of rectangular, device-relevant stimuli. We measured the successive two-stimulus discrimination threshold using three rectangular stimulators across five palm areas aligned along the proximal–distal axis. Participants compared a fixed reference stimulus with a variable comparison stimulus, and the minimum separation at which the two stimuli were perceived as occurring at different locations was recorded as the threshold. The overall average threshold across all experimental conditions was approximately 5.2 mm. The threshold varied systematically across palm regions, being smallest around the palmar digital crease and the base of the fingers. In the central palm, threshold differences were more evident for changes in stimulator width than for changes in stimulator length. These results extend tactile spatial resolution research beyond point stimulation and provide design-relevant guidance for palm-based haptic devices.

MTI, Vol. 10, Pages 39: Multimodal Smart-Skin for Real-Time Sitting Posture Recognition with Cross-Session Validation

Giva Andriana Mutiara — 2026-04-09

Multimodal Technologies and Interaction doi: 10.3390/mti10040039

Authors: Giva Andriana Mutiara Muhammad Rizqy Alfarisi Paramita Mayadewi Lisda Meisaroh Periyadi

Prolonged sitting with poor posture is associated with musculoskeletal disorders, reduced productivity, and long-term health risks. Many existing posture monitoring systems predominantly rely on single-modality sensing, such as pressure or vision-based approaches, limiting their ability to capture both static alignment and dynamic micro-movements. This study proposes a multimodal smart-skin system integrating pressure, temperature, and vibration sensors for sitting posture recognition. A total of 42 sensors distributed across 14 anatomical locations were deployed, generating 15,037 samples collected over three independent sessions to evaluate cross-session temporal generalization across nine posture classes under controlled experimental conditions. Two deep learning architectures, Temporal Convolutional Networks with Attention (TCN + Attn) and Convolutional Neural Network–Long Short-Term Memory (CNN–LSTM), were compared under Leave-One-Session-Out (LOSO) cross-validation. TCN + Attn achieved 85.23% LOSO accuracy, outperforming CNN–LSTM by 2.56 percentage points while reducing training time by 36.7% and inference latency by 33.9%. Ablation analysis revealed that temperature sensing was the most discriminative unimodal modality (71.5% accuracy), and full multimodal fusion improved LOSO accuracy by 22.93% compared to pressure-only configurations. These results demonstrate the feasibility of multimodal smart-skin sensing combined with temporal convolutional modeling for cross-session posture recognition and indicate potential for efficient real-time, privacy-preserving ergonomic monitoring. This study should be interpreted as a controlled, single-subject proof-of-concept, and further validation in multi-subject and real-world environments is required to establish broader generalizability.
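
The Leave-One-Session-Out protocol mentioned above can be illustrated with a short sketch: each session is held out once for testing while the remaining sessions are used for training, and accuracy is averaged over the folds. The code below is a generic illustration under that assumption, not the authors' pipeline; the names `leave_one_session_out`, `loso_accuracy`, and `fit_predict` are hypothetical, and the toy data are invented.

```python
def leave_one_session_out(session_ids):
    """Yield (held_out_session, train_indices, test_indices) per session."""
    for held_out in sorted(set(session_ids)):
        train = [i for i, s in enumerate(session_ids) if s != held_out]
        test = [i for i, s in enumerate(session_ids) if s == held_out]
        yield held_out, train, test


def loso_accuracy(session_ids, labels, fit_predict):
    """Average per-fold accuracy over all held-out sessions.

    `fit_predict(train_idx, test_idx)` is a caller-supplied function that
    trains on the training indices and returns one prediction per test index.
    """
    fold_accuracies = []
    for _, train, test in leave_one_session_out(session_ids):
        predictions = fit_predict(train, test)
        correct = sum(p == labels[i] for p, i in zip(predictions, test))
        fold_accuracies.append(correct / len(test))
    return sum(fold_accuracies) / len(fold_accuracies)


# Toy data: six samples recorded over three sessions (illustrative only).
sessions = [1, 1, 2, 2, 3, 3]
postures = ["upright", "upright", "upright", "slouch", "upright", "upright"]


def majority_baseline(train_idx, test_idx):
    """Predict the most frequent training label for every test sample."""
    train_labels = [postures[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return [majority for _ in test_idx]


print(loso_accuracy(sessions, postures, majority_baseline))
```

Splitting by session rather than by random sample keeps temporally adjacent samples from the same recording out of both the training and test sets, which is the point of a cross-session evaluation.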

MTI, Vol. 10, Pages 38: PAD-Guided Multimodal Hybrid Contrastive Emotion Recognition upon STEM-E2VA Dataset

Shufei Duan — 2026-04-02

Multimodal Technologies and Interaction doi: 10.3390/mti10040038

Authors: Shufei Duan Wenjie Zhang Liangqi Li Ting Zhu Fangyu Zhao Fujiang Li Huizhi Liang

Speech emotion recognition still faces several challenges: the representational capacity of single-modal information is limited, discrete emotion annotations struggle to capture continuous emotional transitions, and multimodal fusion methods still suffer from structural differences between modalities and from cross-sample alignment problems. To address these, this study works from both data and model perspectives. On the data side, a Chinese multimodal database, STEM-E2VA, was constructed by synchronously collecting four modalities: articulatory kinematics, acoustics, glottal signals, and video. It covers seven discrete emotion categories and employs continuous PAD (pleasure–arousal–dominance) annotation. By integrating discrete and continuous dimensional annotations, it better represents the distinction between strong and weak emotions under the same discrete emotion label. To address biases in the PAD annotations, we used the SCL-90 psychological questionnaire to analyze annotators’ cognitive and emotional perceptions, helping to ensure data reliability. On the model side, this paper proposes a multimodal supervised contrastive fusion network incorporating PAD perception. It employs a PAD-enhanced hybrid contrastive loss function to optimize intra-modal and inter-modal feature alignment. Utilizing a cross-attention mechanism combined with a GRU–Transformer network for temporal feature extraction, it achieves deep fusion of multimodal information, reducing inter-modal discrepancies and cross-class confusion. Experiments demonstrate that the proposed method achieves 85.47% accuracy in discrete sentiment recognition on STEM-E2VA, with a substantial reduction in RMSE for PAD dimension prediction. It also exhibits excellent generalization capability on IEMOCAP, providing a novel framework for integrating discrete and continuous sentiment representations.

MTI, Vol. 10, Pages 37: Ergonomic Evaluation of Augmented Reality-Based Visualization of Scattered Radiation Distribution During Partial-Angle CT

Hiroaki Hasegawa — 2026-04-02

Multimodal Technologies and Interaction doi: 10.3390/mti10040037

Authors: Hiroaki Hasegawa

Computed tomography (CT)-guided procedures require close proximity to the CT gantry or patient, increasing occupational exposure to scattered radiation. Even though radiation-protective equipment is commonly used, the optimization of CT fluoroscopic techniques remains important. Partial-angle CT (PACT) employs a limited exposure angle, producing cumulative scattered radiation distributions that vary with the selected angle and are difficult to estimate in advance. I aimed to develop an augmented reality (AR)-based visualization method for cumulative scattered radiation distributions during PACT and to evaluate its ergonomic feasibility as a proof of concept for occupational exposure reduction. An AR display system was developed to overlay cumulative scattered radiation distributions onto physical space using AR glasses. Workload was assessed using the NASA Task Load Index (NASA-TLX), and usability was assessed using the System Usability Scale (SUS). Compared with non-virtual conditions using radiation-protective glasses alone, AR-assisted visualization was associated with increased perceived workload, and usability scores were lower than those reported in previous AR studies. These findings indicate that, for AR display systems to support occupational exposure reduction, perceived task demands must be comparable to conventional protection strategies. Further improvements in visualization methods, user familiarity with AR environments, and ergonomic optimization are required to facilitate clinical implementation.

MTI, Vol. 10, Pages 36: The Emergent Rhythms of a Robot Vacuum Cleaner—An Empirically Grounded Account of Agential Realism

Linus de Petris — 2026-04-01

Multimodal Technologies and Interaction doi: 10.3390/mti10040036

Authors: Linus de Petris Siamak Khatibi Yuan Zhou

This article builds on the argument that design for complex interactive systems should shift from creating linear transactional interactions toward organizing relational complexity. Grounded in Karen Barad’s agential realism, we argue that a designer’s role can benefit from not predefining interactions but from curating the material-discursive conditions under which meaningful relations can emerge. To explore the empirical and temporal dimensions of this practice, we conducted an exploratory workshop setting the conditions for emergent gameplay dynamics and discussions on agential realist anticipation. Participants utilized a custom-designed game and built their own physical controllers to anticipate and adapt to shifting gameplay conditions. Our results demonstrate how alterations in relational constraints, rather than explicit pre-programmed goals, drove the emergence of non-predefined gameplay rhythms. The findings provide empirical grounding for an agential realist understanding of anticipation, showing that an interactive system’s identity lies in its unfolding processual patterns rather than a static final state. Based on these findings, we propose three design principles for further exploration: Design for Relational Emergence, Design for Re-membering, and Design for Emergent Patterns. Consequently, we conclude by outlining a conceptual approach for non-linear computational architectures, drawing on principles from Enactive AI and reservoir computing.

MTI, Vol. 10, Pages 35: Reading Noise: Integrating Physiological Sensing and Sound-Driven Visualization to Externalize Noise-Related Cognitive Disruption During Reading

Xueyi Li — 2026-03-30

Multimodal Technologies and Interaction doi: 10.3390/mti10040035

Authors: Xueyi Li Yonghong Liu Zihui Jiang Yangcheng Wang

Environmental noise may interfere with the reading experience by increasing cognitive load and psychophysiological arousal, yet these effects are difficult to perceive and communicate in real time. This study presents Reading Noise, an interactive installation that combines physiological sensing and sound-driven visualization to externalize perceived noise-related disturbance and psychophysiological strain during reading. In a controlled experiment, 46 participants completed reading tasks under four levels of background conversational noise (0–30, 31–60, 61–90, and >90 dB) while ambient sound level, electrodermal activity (EDA), and electrocardiogram (ECG) were recorded in real time. Following data quality screening, inferential statistical analyses were performed on the analyzable physiological subset (n = 16). Based on these data, a hybrid mapping strategy combining rule-based assignment and LMM-informed exploratory calibration was developed to map acoustic and physiological changes onto dynamic text-based visual parameters, including deformation intensity, jitter, and motion instability, for real-time feedback. Within the analyzable subset, noise level was associated with significant changes in the recorded physiological indicators (all p < 0.05): skin conductance level (SCL) and skin conductance responses per minute (SCRs/min) increased (4.69 ± 2.13 to 5.93 ± 2.19 μS; 1.49 ± 1.59 to 2.51 ± 2.13), whereas the percentage of successive RR intervals differing by more than 50 ms (pNN50) and the root mean square of successive differences (RMSSD) decreased (15.84 ± 16.52% to 10.57 ± 11.35%; 36.63 ± 17.62 to 29.67 ± 16.66 ms). Subjective cognitive load also increased significantly (2.06 ± 0.29 to 6.38 ± 0.31). 
A follow-up installation study with 24 cross-disciplinary participants, with reported group interaction observations drawn from a 12-participant subset, suggested that the installation may facilitate shared interpretation of attention-related disruption and cognitive strain, indicating the potential of physiology-informed visual translation as a boundary object approach for empathetic, sound-mediated communication.
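
The two heart rate variability indicators named in this abstract, RMSSD and pNN50, follow directly from their definitions: both are computed from differences between successive RR intervals. The sketch below applies those textbook definitions to a made-up series of RR intervals in milliseconds; the function names and sample values are assumptions for illustration and do not reproduce the study's processing chain.

```python
import math


def successive_differences(rr_ms):
    """Differences between consecutive RR intervals (milliseconds)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    return [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]


def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences, in ms."""
    diffs = successive_differences(rr_ms)
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def pnn50(rr_ms):
    """Percentage of successive RR differences larger than 50 ms."""
    diffs = successive_differences(rr_ms)
    return 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)


# Hypothetical RR intervals in milliseconds (not data from the study).
rr = [800, 860, 850, 790]
print(rmssd(rr), pnn50(rr))
```

Lower values of both measures indicate less beat-to-beat variability, which is consistent with the decrease the abstract reports as noise level rises.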

MTI, Vol. 10, Pages 34: Distributed Teaching Agency–AI in the University: A Typology Based on Student Voice

Tomás Fontaines-Ruiz — 2026-03-27

Multimodal Technologies and Interaction doi: 10.3390/mti10040034

Authors: Tomás Fontaines-Ruiz Antonio Ponce-Rojo Paolo Fabre Merchán Walther Casimiro Urcos Liliana Cánquiz Rincón

Generative AI is reshaping university teaching and creating tension around authority, evidence, and accountability when decisions are made using algorithms. From a student perspective, this study constructed a typology of distributed teacher–AI agency (TAI) and examined the discursive mechanisms that produce the illusion of teacher autonomy. A non-experimental, cross-sectional, explanatory study was conducted: an ALCESTE lexicometric analysis (IRAMUTEQ) of open-ended questionnaire responses from 3120 students (Mexico, n = 2051; Ecuador, n = 1069), segmented into 1077 units and interpreted using positioning theory. Co-agency was operationalized using Teacher Agency (A), Delegation to AI (D), Governance (G: disclosure, criteria, verification), and the Illusion Index (II = A/(D + G + 1)). Three configurations emerged: Immediate Customizer (28.8%) with very high A and minimal D/G (II = 25.4); Technological Literacy Facilitator (27.3%) with visible delegation and safeguards (II ≈ 2.0); and Operational Optimizer (43.9%) oriented toward accelerating tasks with moderate governance (II ≈ 2.7). The illusion was associated with the agentive erasure of AI and a rhetoric of immediacy/efficiency that replaced verifiable criteria. These findings transform the student voice into a criteria-based diagnostic tool for strengthening traceability, minimal verification, and responsible orchestration of AI in higher education.
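
The Illusion Index formula given in the abstract, II = A/(D + G + 1), is simple enough to sketch in code. The abstract does not state how A, D, and G are measured, so the example below assumes they are non-negative numeric scores (for instance, counts of coded text segments); the function name and the input values are hypothetical.

```python
def illusion_index(agency, delegation, governance):
    """Compute II = A / (D + G + 1) from the abstract's definition.

    All three inputs are assumed to be non-negative scores; the abstract
    does not specify their units, so that is an assumption of this sketch.
    """
    if min(agency, delegation, governance) < 0:
        raise ValueError("scores must be non-negative")
    # The +1 keeps the denominator positive even when D and G are both zero.
    return agency / (delegation + governance + 1)


# Hypothetical scores, for illustration only.
print(illusion_index(agency=10, delegation=2, governance=2))
print(illusion_index(agency=10, delegation=0, governance=0))
```

With delegation and governance both at zero the index equals the agency score itself, so a profile that foregrounds teacher agency while barely mentioning AI delegation or safeguards produces a high value, as in the Immediate Customizer configuration.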

MTI, Vol. 10, Pages 33: HMI Design of Intelligent Vehicles Based on Multimodal Experiments of Driver Emotions

Tongyue Sun — 2026-03-21

Multimodal Technologies and Interaction doi: 10.3390/mti10030033

Authors: Tongyue Sun Yongjia Li Xihui Yang

Negative driving emotions constitute a significant factor compromising road safety. Current intelligent vehicle human–machine interaction (HMI) systems predominantly focus on functional implementation, lacking the capability to perceive and adapt to the driver’s psychological state. To address this issue, this study investigates the intrinsic relationship between driving emotions and HMI through multimodal experiments. Experiment One reveals the distribution patterns of drivers’ visual attentional scope under different emotional states. Experiment Two establishes a color preference model for HMI interfaces corresponding to specific emotions. Experiment Three quantitatively analyzes the impact of emotional variations on the perceptual efficiency of auditory warnings. Based on the experimental data, an interaction design principle matching “Emotion-Scene-Modality” is formulated, guiding the design of a data-driven, emotion-adaptive HMI prototype system. This system can perceive the driver’s emotional state in real time via multimodal sensors and dynamically adjust interface color themes, information layout, warning sound effects, and voice interaction style according to predefined interaction strategies. Usability testing demonstrates that, compared to traditional static HMI, this affective adaptive system effectively mitigates the driver’s negative emotional load and provides alerts that are more perceptible and less likely to cause irritation during critical moments. Consequently, it offers a significant theoretical foundation and practical reference for constructing a safer and more comfortable next-generation intelligent vehicle cockpit interaction paradigm.

MTI, Vol. 10, Pages 32: Design and Evaluation of Interactive Radar Visualisation of Academic Performance for Parents and Students

Ka Ian Chan — 2026-03-20

Multimodal Technologies and Interaction doi: 10.3390/mti10030032

Authors: Ka Ian Chan Patrick Pang Huiwen Zou

This study investigates how parents and students interpret and form continued engagement intentions with a radar visualisation tool designed to present multi-subject academic performance. While data visualisation is increasingly used in education, limited empirical attention has been given to whether parents and students, who share the same performance information but hold distinct roles, respond to visualised reports through similar behaviours. To address this gap, an interactive radar visualisation was developed to present secondary school students’ achievement across subjects with peer reference points. Drawing on the Unified Theory of Acceptance and Use of Technology (UTAUT) as an analytical framework, this study examines the determinants of continued intention to use the visualisation tool. Questionnaire data were collected from 706 parents and 264 students in a Macao secondary school. Structural equation modelling (SEM) revealed fundamentally different patterns of continued engagement. For parents, continued intention was significantly associated with performance expectancy (PE), effort expectancy (EE), social influence (SI), and facilitating conditions (FC), suggesting the tool functioned as a decision support system for academic planning. For students, only social influence (SI) and facilitating conditions (FC) emerged as significant predictors, indicating that peer comparison and external expectations may not fit their needs. Parents also reported significantly higher continued intention than students. The findings extend UTAUT by demonstrating that core acceptance relationships are moderated by different roles, reframing technology acceptance in educational visualisation from system adoption to information interpretation. The study provides empirical evidence that visualised performance reporting functions not merely as a data display but also as a communication medium whose meaning is actively constructed by users.
These insights highlight the need for role-sensitive design, emphasising actionable planning support for parents and personally meaningful, agency-oriented feedback for students, in order to foster productive home–school communication and sustained engagement with learning information.

MTI, Vol. 10, Pages 31: Navigating the Future: A Design Fiction Study on User Perceptions of Next-Gen LLM-Based Voice Interaction

Biju Thankachan — 2026-03-20

Multimodal Technologies and Interaction doi: 10.3390/mti10030031

Authors: Biju Thankachan Deepak Akkil Sama Rahman Kristiina Jokinen Markku Turunen

Voice user interfaces (VUIs) have evolved from simple command-based systems to more advanced platforms capable of engaging in complex, multi-turn conversations. While current VUIs primarily perform routine tasks, their future trajectory is poised to be significantly shaped by advancements in large language models (LLMs), enhancing their language understanding and human-like interaction capabilities. This study explores user perceptions of next-generation VUIs using a design fiction approach. We crafted five plausible future scenarios, depicted in comic-style formats, showcasing diverse VUI use-cases. Results from the focus group discussions reveal valuable insights highlighting the potential and challenges of integrating advanced VUIs into everyday interactions. Our results highlight the importance of building trust, factors influencing trust, social aspects and implications of technology, preferences for interaction techniques, and various ethical considerations associated with technology. We conclude by providing design guidelines for future VUIs, emphasizing the need for designing to build trust, the importance of domain specificity, the importance of enabling social experiences mediated via VUIs, and more.

MTI, Vol. 10, Pages 30: Possibilities of Artificial Intelligence in Sports Refereeing: An Exploratory Study Contrasting the Literature Review with Expert-Perceived Opportunities

David Martín Moncunill — 2026-03-19

Multimodal Technologies and Interaction doi: 10.3390/mti10030030

Authors: David Martín Moncunill Domingo Sampedro Lirio Miguel Ángel Bravo Hijón

Sports have progressively incorporated technological advances, yet while the impact on performance and broadcasting is remarkable, the application of Artificial Intelligence (AI) in sports refereeing appears residual. A closer examination of prior research suggests that this limited development reflects deeper conceptual patterns within the field. While existing research on AI in sports officiating has predominantly conceptualized the field under an accuracy-optimization paradigm (focusing on decision precision, visual attention patterns, referee fatigue, and performance enhancement), there is a systematic lack of theoretical and empirical work that frames officiating as a broader socio-technical ecosystem. In particular, the literature does not provide conceptual models addressing (i) AI-assisted risk prevention and athlete safety as a core officiating function, (ii) human–AI task redistribution in cognitively overloaded and hybrid evaluative environments (e.g., disciplines such as artistic gymnastics or bodybuilding, where technical execution and aesthetic judgment are simultaneously assessed), and (iii) the redefinition of the referee’s role when AI operates as an anticipatory or real-time alert system rather than merely as a post hoc verification tool. Thus, the gap is not only one of application but of knowledge production: the dominant paradigm optimizes decision accuracy, yet it leaves the question of how AI can transform refereeing responsibilities, cognitive load distribution, and safety governance within competitive ecosystems under-theorized. This exploratory study adopts a Human–Computer Interaction (HCI) perspective to contrast existing initiatives with the practical expectations of professional referees. The methodology comprises two pillars: a systematic literature review following PRISMA guidelines and qualitative experimentation involving professional referees using focus groups and affinity diagramming.
From an initial total of 1251 records retrieved across five academic databases (2019–2025), 1122 articles were analyzed after applying strict inclusion/exclusion criteria. The findings provide preliminary support for our hypothesis of a significant underutilization gap, showing that research is concentrated on accuracy systems, while high-potential areas identified as critical by experts, such as athlete safety, represent only 0.6% of the analyzed literature. The study contributes a conceptual framework based on five categories established by experts, according to the identified use cases, providing guidance for future AI integration and interdisciplinary research in the sports officiating ecosystem. Based on the results, we point to future applications and lines of research aimed at integrating AI as a tool for sports refereeing.

MTI, Vol. 10, Pages 29: IRVINE: An Interactive Visualization for Spontaneous Reporting Systems Databases Missing Values

Ali Sharifi Kia — 2026-03-13

Multimodal Technologies and Interaction doi: 10.3390/mti10030029

Authors: Ali Sharifi Kia Kamran Sedig Niaz Chalabianloo Sheikh S. Abdullah Flory T. Muanda

Large-scale post-marketing drug safety data from spontaneous reporting systems offer new opportunities to explore adverse drug events (ADEs). However, these datasets often contain high rates of missing and incomplete data, undermining the reliability and interpretability of pharmacovigilance analyses. Effective management of these data quality issues requires interactive tools to explore patterns of missingness across multiple dimensions. We present IRVINE (Interactive Visualization for Spontaneous Reporting Systems Databases Missing Values), an interactive visualization system designed to explore and compare missing data in spontaneous reporting systems. IRVINE integrates multiple coordinated components—including a global overview, detailed attribute-level breakdowns, a temporal analysis interface, and a cross-database comparison environment—allowing users to fluidly transition between global summaries and fine-grained diagnostic views. The system supports dynamic filtering, drill-down exploration, and interactive temporal analysis to examine changes in data completeness over time and across categories. Through three usage scenarios and a user study, we demonstrate how IRVINE supports effective exploration of reporting completeness. Results indicate that users perceived the system as easy to use and effective for identifying missingness patterns, with particular strengths in comparative and detail-level analysis. This work lays a foundation for improved transparency, interpretability, and data quality assessment in large-scale pharmacovigilance systems.

MTI, Vol. 10, Pages 28: Short-Term Performance of Visual Attention Prompt Methods Across Driver Proficiency in a Driving Simulator

Jinwei Liang — 2026-03-11

Multimodal Technologies and Interaction doi: 10.3390/mti10030028

Authors: Jinwei Liang Makio Ishihara

In complex driving environments, drivers must continuously detect and respond to critical visual information such as traffic signs and pedestrians. However, important targets may sometimes be overlooked due to high cognitive load during driving. Therefore, visual attention prompt methods have been proposed to guide drivers’ gaze toward relevant targets. A visual attention prompt method is a visual cue presented in a key area of a user’s field of view to draw their visual attention. This study evaluates the short-term performance of five visual attention prompt methods (Point, Arrow, Blur, Dusk, and ModAF) in a driving simulator and compares their performance between novice and proficient drivers. Eye-tracking data and multiple analyses are used to examine whether the influence of these methods could be maintained after they are disabled and to clarify drivers’ response patterns across methods in relation to their driving proficiency. The results indicate that visual attention prompt methods could induce a short-term transfer effect, as drivers still tend to fixate on target traffic signs earlier after the methods are disabled, and the elapsed-time analysis estimates that this effect lasts about 84.35 s. Overall, the Point, Arrow, and Dusk methods show relatively stronger performance with significant reductions in the elapsed time to fixate on the traffic sign. The clustering analysis further shows that drivers’ response patterns are not uniform, with two clusters for novice drivers and three clusters for proficient drivers. The results suggest that most novice drivers tend to benefit from explicit non-directional visual cues that enhance target salience, such as the Point method, whereas proficient drivers are more likely to benefit from explicit directional visual cues that provide clear directional guidance, such as the Arrow method.
These findings suggest that visual attention prompt methods may be useful for developing driver training strategies tailored to different levels of driving proficiency, helping drivers maintain more effective visual attention allocation during driving and potentially contributing to improved driving safety.

MTI, Vol. 10, Pages 27: Exploring Student and Educator Challenges in AI Competency Development: A Comparative Analysis

Xin Zhao — 2026-03-09

Multimodal Technologies and Interaction doi: 10.3390/mti10030027

Authors: Xin Zhao Fengchun Miao Haoyu Xie Xuanning Chen

As artificial intelligence (AI) rapidly transforms the landscape of higher education, there is a critical need to develop AI competency among both educators and students. However, current AI policies and guidelines are often top-down and lack grassroots insights from key stakeholders. Drawing on the recently released UNESCO AI competency frameworks for educators and students (2024), this study presents findings from a global survey of over 600 students and educators. The results highlight significant disparities in AI engagement across groups, disciplines, and regions, as well as barriers such as inconsistent institutional guidance, limited access to hands-on training, and infrastructural constraints, particularly in Global South contexts. Drawing on these insights, the study offers practical, evidence-informed recommendations for higher education institutions, educators, and students to support equitable, sustainable, and context-sensitive AI competency development.

MTI, Vol. 10, Pages 26: Interactive Simulation of Plaster Model Turning for Porcelain Slip-Casting Mould-Master Design

Dimitrios Zourarakis — 2026-03-06

Multimodal Technologies and Interaction doi: 10.3390/mti10030026

Authors: Dimitrios Zourarakis Ines Moreno Arnaud Dubois Jessie Derogy Panagiotis Koutlemanis Nikolaos Partarakis Xenophon Zabulis

This paper presents the design and evaluation of an interactive simulator for plaster turning in porcelain slip-casting. Whereas most virtual pottery systems model clay deformation, our tool simulates the subtractive shaping of rigid plaster blanks, an essential intermediate step in mould-master production. Co-designed with expert practitioners through a user-centred process, the simulator follows workshop practice from blank preparation to the geometric constraints of the turning wheel. We report five iterative prototypes and show how expert feedback replaced generic sculpting metaphors with task-faithful interactions, including correct hand positioning, rotation-dependent turning, and authentic preparatory routines. Our evaluation suggests that the system supports the acquisition of tacit procedural knowledge while also producing geometric data compatible with physically based rendering workflows. This research contributes to the digital preservation of intangible cultural heritage by making the material reasoning of porcelain manufacture accessible in a virtual environment.

MTI, Vol. 10, Pages 25: Augmented Reality’s Impact on Student Creativity in Design and Technology: An Immersive Learning Study

Zuraini Yakob — 2026-03-04

Multimodal Technologies and Interaction doi: 10.3390/mti10030025

Authors: Zuraini Yakob Nazlena Mohamad Ali Mohamad Hidir Mhd Salim Norshita Mat Nayan

This quasi-experimental study examined the effectiveness of Augmented Reality (AR)-enhanced instruction on creativity development in Malaysian Design and Technology education. Forty-six fifteen-year-old female students were assigned to AR-enhanced (n = 23) or traditional instruction (n = 23) groups for a four-week Mechatronic Design unit. Creativity was assessed using an adapted Torrance Tests of Creative Thinking-Figural (TTCT-F) instrument with expert validation and independent scoring by three raters. Bootstrapped ANCOVA (5000 iterations) controlling for pretest differences revealed significant improvements across all Guilford creativity components in the AR group: Elaboration (F = 27.093, p < 0.001, η² = 0.387), Originality (F = 20.445, p < 0.001, η² = 0.322), Fluency (F = 17.896, p < 0.001, η² = 0.294), and Flexibility (F = 7.593, p = 0.008, η² = 0.150). The differential effect pattern suggests AR operates through multiple mechanisms, primarily socio-constructivist collaborative scaffolding, followed by motivational enhancement and cognitive load reduction. These findings demonstrate AR’s substantial potential for creativity development in Design and Technology education, particularly for collaborative elaboration and generative ideation. However, single-gender sampling, brief intervention duration, and quasi-experimental design limit generalizability, warranting future research with diverse populations and extended interventions.
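
The link between the reported F statistics and effect sizes can be checked with a short calculation. The Python sketch below assumes the reported values are partial eta squared for a one-degree-of-freedom group effect with 43 error degrees of freedom (46 students, two groups, one covariate). The abstract does not state the degrees of freedom, so that choice is an assumption, and the function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values as reported in the abstract.
reported_f = {
    "Elaboration": 27.093,
    "Originality": 20.445,
    "Fluency": 17.896,
    "Flexibility": 7.593,
}

# Assumed degrees of freedom (not reported in the abstract):
# 46 students - 2 groups - 1 pretest covariate = 43.
DF_EFFECT, DF_ERROR = 1, 43

effect_sizes = {
    name: round(partial_eta_squared(f, DF_EFFECT, DF_ERROR), 3)
    for name, f in reported_f.items()
}

for name, eta in effect_sizes.items():
    print(f"{name}: partial eta squared = {eta:.3f}")
```

Under these assumed degrees of freedom, the calculation gives 0.387, 0.322, 0.294, and 0.150, matching the four reported effect sizes.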

MTI, Vol. 10, Pages 24: From the Reality–Virtuality Continuum to the XR Ecosystem: A Systematic Literature Review of Definitions and Conceptual Models

Xiaoran Han — 2026-03-02

Multimodal Technologies and Interaction doi: 10.3390/mti10030024

Authors: Xiaoran Han Teijo Lehtonen Tuomas Mäkilä

Extended Reality (XR) technologies are rapidly reshaping human–computer interaction; however, persistent ambiguity in the use of core terms (VR, AR, MR) hampers cumulative knowledge building, cross-study comparability, and technical standardisation. This review evaluates the XR conceptual landscape across four primary dimensions: the historical evolution of core definitions, the synthesis of contemporary theoretical frameworks, the critical extensions of the Reality-Virtuality (RV) Continuum, and the alignment between academic taxonomies and industry practices. To address this issue, we conducted a PRISMA-guided systematic literature review across four major databases (IEEE Xplore, ACM Digital Library, Scopus, and Web of Science), complemented by seminal and industry sources. Of the 173,677 retrieved records, 59 studies were included in the synthesis. Using thematic synthesis, we mapped the historical evolution of definitions and conceptual models and identified recurring analytical dimensions. The results indicate a clear paradigm shift from Milgram’s one-dimensional Reality–Virtuality continuum, originally grounded in visual display technology, towards a multidimensional conceptual space that integrates subjective user-experience constructs (e.g., coherence and plausibility) with objective system characteristics. The included studies cover 1968–2025, with marked acceleration in the 2020s: 2022 alone accounts for the highest annual count (9 studies), and nearly half of the corpus (47.5%) was published in 2021–2025.
We further show that industry actors pragmatically re-bound these academic concepts for product and market positioning, leading to systematic divergences between academic and industrial definitions. By distilling key turning points and synthesising core analytical dimensions into a structured lens, this review provides a historically grounded, actionable understanding of the XR conceptual landscape to support terminological alignment across research and practice.

MTI, Vol. 10, Pages 23: Behavioral Engagement in VR-Based Sign Language Learning: Visual Attention as a Predictor of Performance and Temporal Dynamics

Davide Traini — 2026-03-02

Multimodal Technologies and Interaction doi: 10.3390/mti10030023

Authors: Davide Traini José Manuel Alcalde-Llergo Mariana Buenestado-Fernández Domenico Ursino Enrique Yeguas-Bolívar

Understanding how learners engage with immersive sign language training environments is essential for advancing virtual reality-based education and inclusion. This study analyzes behavioral engagement in SONAR, a virtual reality application designed for sign language training and validation. We focus on three automatically derived engagement indicators (Visual Attention (VA), Video Replay Frequency (VRF), and Post-Playback Viewing Time (PPVT)) and examine their relationship with learning performance in a sample of 117 university students. Participants completed a self-paced Training phase with 12 sign language instructional videos, followed by a Validation quiz assessing retention. We employed Pearson correlation analysis to examine the relationships between engagement indicators and quiz performance, followed by binomial Generalized Linear Model (GLM) regression to assess their joint predictive contributions. Additionally, we conducted temporal analysis by aggregating moment-to-moment VA traces across all learners to characterize engagement dynamics during the learning session. Results show that VA exhibits a strong positive correlation with quiz performance (r = 0.76), followed by PPVT (r = 0.66), whereas VRF shows no meaningful association. A binomial GLM confirms that VA and PPVT are significant predictors of learning success, jointly explaining a substantial proportion of performance variance (pseudo-R² = 0.83). Going beyond outcome-oriented analysis, we characterize temporal engagement patterns by aggregating moment-to-moment VA traces across all learners. The temporal profile reveals distinct attention peaks aligned with informationally dense segments of both training and validation videos, as well as phase-specific engagement dynamics, including initial acclimatization, oscillatory attention cycles during learning, and pronounced attentional peaks during assessment.
Together, these findings highlight the central role of sustained and strategically allocated visual attention in VR-based sign language learning and demonstrate the value of behavioral trace data for understanding and predicting learner engagement in immersive environments.

MTI, Vol. 10, Pages 22: When Interfaces “Act for You”: An Eye-Tracking Experiment on Delegation, Transparency Cues, and Trust in Agentic Shopping Assistants

Stefanos Balaskas — 2026-03-01

Multimodal Technologies and Interaction doi: 10.3390/mti10030022

Authors: Stefanos Balaskas Kyriakos Komis Ioanna Yfantidou Dimitra Skandali

Agentic shopping assistants increasingly move beyond recommending products to executing actions in users’ workflows (e.g., adding items to cart, applying coupons, selecting shipping). This shift from advice to delegation raises questions about appropriate reliance, perceived control, and how interface cues support oversight when systems can act. We report a laboratory eye-tracking experiment using a chat-only e-commerce prototype in a mixed 2 × 2 design: action autonomy varied within participants (recommend-only vs. act-on-behalf, with undo/edit), and transparency cues varied between participants (minimal statements vs. preview + rationale describing what will happen and why). Three standardized shopping tasks were completed by 72 participants. Measures included behavioral logs (task time, overrides), areas-of-interest (AOI)-based eye-tracking (chat attention and verification indicators), and post-task self-reports (trust, control, uneasiness, perceived transparency). Act-on-behalf autonomy reduced completion time, but it also increased unease, decreased trust and perceived control, and increased the likelihood of an override, suggesting a trade-off between efficiency and oversight. The autonomy-related penalties for trust and perceived control under act-on-behalf execution were lessened by preview + rationale transparency, which additionally enhanced perceived transparency, trust, and unease. This mechanism coincided with eye-tracking: transparency decreased verification latency during agent actions and redirected attention toward information supplied by assistants. Transparency did not reliably reduce overrides, suggesting that minimal effective transparency can streamline supervision and improve evaluations without eliminating corrective behavior.

MTI, Vol. 10, Pages 21: Agent-Based Paradigm for the Self-Configuration of a Conceptual Mechanical Assembly Modeling Application in Virtual Reality

Julian Conesa — 2026-02-22

Multimodal Technologies and Interaction doi: 10.3390/mti10020021

Authors: Julian Conesa Francisco José Mula Manuel Contero

The immersive, multisensory experiences offered by virtual reality have been transformative across multiple disciplines, enhancing practical and theoretical skills while increasing user motivation and learning. On the other hand, multi-agent systems have proven to be effective in facilitating the expansion and modularity of computer systems. This paper presents an application developed in a virtual reality environment based on multi-agent systems for the conceptual design of mechanical assemblies from primitives. As a main novelty, the primitives can be defined by the user of the application from a set of models and images, and an Excel document, without the need for programming knowledge, taking advantage of the possibilities offered by multi-agent systems. In addition, for each primitive, it is possible to define a set of geometric and dimensional modifications, as well as a set of position relations with respect to other primitives to generate mechanical assemblies.

MTI, Vol. 10, Pages 20: How Virtual Reality Design Reshapes Our Ecological Connection to Natural Systems

Ivonne Angelica Castiblanco Jimenez — 2026-02-20

Multimodal Technologies and Interaction doi: 10.3390/mti10020020

Authors: Ivonne Angelica Castiblanco Jimenez Santiago Parra Barrios Ana Maria Correa Jimenez

This integrative literature review examines how virtual reality (VR) design can transform environmental understanding by changing users from passive observers to active participants in ecological systems. We aimed to analyze the interaction strategies through which VR enables environmental awareness and to identify the most effective approaches for fostering ecological connection. Through systematic analysis of studies published between 2015 and 2025, we found that effective VR implementations share three core design mechanisms: progressive engagement that builds connection over time, a careful balance between interaction and reflection, and multisensory integration that creates believable immersive experiences. These design mechanisms, in turn, build ecological connection through three fundamental pillars: perspective-taking that generates empathy, the creation of authentic sensory experiences, and the development of network thinking to understand complex interconnections. This review contributes to the field by mapping the development of environmental VR applications, identifying successful implementation strategies, and highlighting research gaps. Our analysis provides a comprehensive interaction framework for designing more effective environmental experiences and advancing this emerging field when innovative approaches are most needed.

MTI, Vol. 10, Pages 19: MoodScape: Emotion-Informed Terrain Synthesis for Virtual Reality System

Rahul Kumar Rai — 2026-02-11

Multimodal Technologies and Interaction doi: 10.3390/mti10020019

Authors: Rahul Kumar Rai Reshu Bansal Shashi Shekhar Jha

(1) Background: Virtual environments (VEs) significantly influence human emotions through various elements such as lighting, color, and terrain. While the effects of lighting and color on emotions within VEs have been extensively studied, the impact of the terrain remains underexplored. This paper addresses this gap by investigating the correlation between terrain characteristics in VEs and users’ emotional states. (2) Methods: We conducted a user study in which participants were exposed to various 3D terrains and used the Self-Assessment Manikin (SAM) to rate their emotional responses (valence, arousal, and dominance). Building on these insights, we propose MoodScape, an automated framework for emotion-informed terrain generation that significantly reduces the need for extensive expertise and manual effort. MoodScape uses a generative adversarial network (GAN) architecture called DH-CVAE-GAN, which integrates a dual-head conditional variational autoencoder as the generator alongside a discriminator network to ensure effective and realistic terrain generation. In the current implementation, continuous SAM valence–arousal targets are discretised into four quadrant-based affect/terrain classes, and this discrete class label conditions DH-CVAE-GAN terrain synthesis. The DH-CVAE-GAN is trained on a satellite-derived digital elevation model (DEM) dataset, which helps the generated terrains reflect realistic geographic patterns. (3) Results: Quantitative and qualitative evaluations on our study sample suggest that MoodScape can generate terrains whose perceived affective tone is broadly consistent with the specified affect-class inputs, indicating potential applications in gaming and exploratory therapeutic Virtual Reality, while formal clinical efficacy remains for future work.
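
The quadrant-based discretisation of valence–arousal targets mentioned in the abstract can be illustrated with a small Python sketch. It assumes a 9-point SAM scale split at its midpoint of 5, with ties assigned to the "high" side. The scale, the threshold, the tie rule, and the class names are all assumptions made for illustration and are not taken from the paper.

```python
def affect_quadrant(valence: float, arousal: float, midpoint: float = 5.0) -> str:
    """Map a continuous (valence, arousal) pair to one of four quadrant classes.

    Assumes ratings on a 1-9 SAM scale; values at the midpoint count as 'high'.
    """
    high_valence = valence >= midpoint
    high_arousal = arousal >= midpoint
    if high_valence and high_arousal:
        return "high_valence_high_arousal"
    if high_valence:
        return "high_valence_low_arousal"
    if high_arousal:
        return "low_valence_high_arousal"
    return "low_valence_low_arousal"


# Hypothetical targets, one per quadrant.
for v, a in [(7.5, 8.0), (7.0, 2.5), (2.0, 8.5), (3.0, 2.0)]:
    print((v, a), "->", affect_quadrant(v, a))
```

A class label produced this way is the kind of discrete value that could condition a generator. How the actual system encodes and consumes the label is not described in the abstract.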

MTI, Vol. 10, Pages 18: Virtual Reality Learning Environments: A Review of Support for Autonomous Learning Development

Pablo Fernández-Arias — 2026-02-05

Multimodal Technologies and Interaction doi: 10.3390/mti10020018

Authors: Pablo Fernández-Arias Antonio del Bosque Diego Vergara

The rapid expansion of digital education in the 21st century has positioned Virtual Reality Learning Environments (VRLEs) as promising spaces for fostering greater learner autonomy. As immersive technologies become more accessible and pedagogically versatile, they offer students opportunities to regulate their learning processes, experiment in interactive scenarios, and progress at their own pace. This review examines how autonomous learning has been conceptualized and investigated within VRLE research through a comprehensive bibliometric analysis of studies published between 2000 and 2025. The results reveal a research field shaped by two major orientations: one focused on human and pedagogical dimensions (learner diversity, instructional design, and evidence-based strategies) and another on technological innovation (artificial intelligence, machine learning, and simulation-based systems). Topic analyses show that digital and immersive education dominate current scholarly production, while areas directly related to autonomy, personalized learning, and student-centered methodologies remain comparatively less developed. Accordingly, it is crucial to reinforce pedagogical structures that enable autonomous learning in VR environments and to integrate technological advancements in a manner that translates into tangible improvements in educational quality across different settings.

MTI, Vol. 10, Pages 17: Beyond the Classroom: Technology-Enabled Acceleration Models for Gifted Learners in the Digital Era

Yusra Zaki Aboud — 2026-02-04

Multimodal Technologies and Interaction doi: 10.3390/mti10020017

Authors: Yusra Zaki Aboud

The digital era represents a paradigm shift in gifted education, moving at an accelerating pace away from traditional models toward flexible and personalized technology-based pathways. This study investigates the impact of a model implemented via the FutureX platform in Saudi Arabia on the autonomy and self-regulated learning (SRL) of 63 gifted high school students. Using a quasi-experimental design, the study integrated quantitative measures (paired t-tests) with phenomenological analysis of interviews. The quantitative results showed statistically significant improvements (p < 0.001) in the dimensions of autonomy and self-regulated learning, with large Cohen’s d effect sizes for planning (d = 1.05), monitoring (d = 1.05), and cognitive control (d = 1.30). These gains were supported by a pedagogical design intentionally embedded within the platform to scaffold self-regulation. These findings were reinforced by qualitative results, with 88% of gifted students reporting that the platform provided appropriately challenging content and promoted self-learning and goal-setting behaviors.

MTI, Vol. 10, Pages 16: Systematic Analysis of Vision–Language Models for Medical Visual Question Answering

Muhammad Haseeb Shah — 2026-02-03

Multimodal Technologies and Interaction doi: 10.3390/mti10020016

Authors: Muhammad Haseeb Shah Heriberto Cuayáhuitl

General-purpose vision–language models (VLMs) are increasingly applied to imaging tasks, yet their reliability on medical visual question answering (Med-VQA) remains unclear. We investigate how three state-of-the-art VLMs—ViLT, BLIP, and MiniCPM-V-2—perform on radiology-focused Med-VQA when evaluated in a modality-aware manner. Using SLAKE and OmniMedVQA-Mini, we construct harmonised subsets for computed tomography (CT), magnetic resonance imaging (MRI), and X-ray, standardising schema and answer processing. We first benchmark all models in a strict zero-shot setting, then perform supervised fine-tuning on modality-specific data splits, and finally add a post-hoc semantic option-selection layer that maps free-text predictions to multiple-choice answers. Zero-shot performance is modest (exact match ≈20% for ViLT/BLIP and 0% for MiniCPM-V-2), confirming that off-the-shelf deployment is inadequate. Fine-tuning substantially improves all models, with ViLT reaching ≈80% exact match and BLIP ≈50%, while MiniCPM-V-2 lags behind. When coupled with option selection, ViLT and BLIP achieve 90–93% exact match and F1 across all modalities, corresponding to 95–97% BERTScore-F1. Our novel results show that (i) modality-specific supervision is essential for Med-VQA, and (ii) post-hoc option selection can transform strong but imperfect generative predictions into highly reliable discrete decisions on harmonised radiology benchmarks. The latter is useful for medical VLMs that combine generative responses with option or sentence selection.
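
The post-hoc option-selection step can be pictured as a nearest-match lookup from a free-text prediction to the closest answer option. The Python sketch below is only an illustration. It uses character-level similarity from the standard-library `difflib` module as a stand-in for the semantic similarity the abstract describes, and the function name and example options are hypothetical.

```python
from difflib import SequenceMatcher


def select_option(prediction: str, options: list[str]) -> str:
    """Return the answer option most similar to a free-text prediction.

    Similarity here is a character-level ratio, used as a simple stand-in
    for a semantic (embedding-based) similarity score.
    """
    normalised = prediction.lower().strip()

    def score(option: str) -> float:
        return SequenceMatcher(None, normalised, option.lower().strip()).ratio()

    return max(options, key=score)


# Hypothetical multiple-choice options for an organ-identification question.
options = ["Left Lung", "Right Lung", "Heart", "Liver"]
print(select_option("Left lung", options))
print(select_option("the liver", options))
```

A character-level score keeps the sketch dependency-free, but it misses paraphrases that share few characters, which is where an embedding-based score would be needed.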

MTI, Vol. 10, Pages 15: A User-Centered Evaluation of a VR HMD-Based Harvester Training Simulator

Pranjali Barve — 2026-02-02

Multimodal Technologies and Interaction doi: 10.3390/mti10020015

Authors: Pranjali Barve Raffaele De Amicis

Skilled operation of forestry harvesters is essential for ensuring safety, efficiency, and sustainability in logging practices. However, conventional training methods are often prohibitively expensive and limited by access to specialized equipment. This study delivers one of the first user-centered validations of a low-cost, VR HMD-based forestry harvester simulator, directly addressing access and scalability barriers in training. With 26 participants, we quantify cognitive load, usability, user experience, and simulator sickness using established instruments. Cognitive load increased from the baseline tutorial to each training module (NASA-TLX: 18.65→34.26→38.43; rm-ANOVA, p < 0.001). Usability was ‘Good’ (mean SUS score: 76.63), hedonic UX ranked in the top decile (UEQ-S), and simulator sickness was moderate (mean SSQ score: 28.91), while task success remained high across all modules. These results indicate early-stage feasibility and usability of a low-cost VR HMD harvester simulator for student-focused introductory instruction, and they provide actionable design guidance (e.g., managing extraneous load, comfort safeguards) advancing evidence-based VR HMD-based training in the forest engineering and harvesting domain. Our findings validate the potential of VR-HMD as a tool for forestry education capable of addressing training accessibility gaps and enhancing learner motivation through immersive experiential learning.
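
The SUS figure reported above follows the System Usability Scale's standard scoring rule, which converts ten Likert responses into a 0 to 100 score. A minimal Python sketch of that rule follows, assuming responses on a 1 to 5 scale given in questionnaire order. The function name and the sample responses are illustrative and do not reproduce the study's data.

```python
def sus_score(responses: list[int]) -> float:
    """Score one System Usability Scale questionnaire (10 items, each rated 1-5).

    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the sum is scaled by 2.5 to a 0-100 range.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # positively worded item
        else:
            total += 5 - response   # negatively worded item
    return total * 2.5


# Hypothetical single participant.
print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))
```

A study-level mean such as the one reported is the average of these per-participant scores.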

MTI, Vol. 10, Pages 14: An HCI-Centered Experiences of ICT Integration and Its Impact on Professional Competencies Supporting Formative Assessment in Higher Education e-Learning

Abdelaziz Boumahdi — 2026-02-02

Multimodal Technologies and Interaction doi: 10.3390/mti10020014

Authors: Abdelaziz Boumahdi Fadwa Ammari Mohammed Ammari

As universities expand their e-learning systems, it becomes increasingly important to understand how the use of information and communication technologies (ICTs) changes the skills needed for effective formative assessment. This study uses the principles of human–computer interaction (HCI) to create a framework for examining how digital tools, interfaces, and modes of interaction influence the way teachers assess students in higher education. The research relies on the information provided by 115 Mohammed V University teachers, who filled out a competency-based assessment grid regarding online assessment practices. The results remain exploratory and context-dependent and do not make claims of statistical representativeness beyond the studied institutional context. The findings attest to the virtues of digital technology in improving methodological and techno-pedagogical skills, without excluding the existence of serious shortcomings in semio-ethical and evaluative skills. It is certainly useful to leverage feedback to correct imperfections in evaluation practices and make them more responsive to digital interfaces. It is becoming imperative to rethink professional skills as the regulatory halo of the online formative assessment system, in order to evaluate a more synergistic framework that can give better visibility to virtual classrooms.

MTI, Vol. 10, Pages 13: A Mixed Reality Tool with Automatic Speech Recognition for 3D CAD Based Visualization and Automatic Dimension Generation in the Industry 5.0 Shipyard

Aida Vidal-Balea — 2026-02-01

Multimodal Technologies and Interaction doi: 10.3390/mti10020013

Authors: Aida Vidal-Balea Antón Valladares-Poncela Javier Vilar-Martínez Tiago M. Fernández-Caramés Paula Fraga-Lamas

Industry 5.0 involves a variety of complex tasks and challenging processes that require specialized labor and multidisciplinary coordination. In shipbuilding specifically, shipyards leverage advanced technologies to replace operations that still rely on traditional methods, such as 2D blueprints and paper-based documentation, which can lead to inefficiencies and alignment errors in precision-dependent tasks. For this reason, this article focuses on Mixed Reality (MR) technologies to address these challenges in the context of electrical outfitting tasks. It presents the design, development and evaluation of an MR application tailored for HoloLens 2 smart glasses, which aims to streamline the workflow for operators, reduce reliance on paper-based documentation and enhance the precision of assembly processes. The proposed system allows for the precise positioning of 3D models in the real environment, ensuring accurate alignment during assembly. Additionally, it incorporates automatic dimension generation between objects in the scene. To further enhance usability, the application integrates a Galician on-device Automatic Speech Recognition (ASR) system, allowing operators to interact seamlessly with the MR interface using voice commands. The whole system has been exhaustively tested through both usability and functionality evaluations, which validate MR as a viable tool for shipyard assembly and inspection tasks.

MTI, Vol. 10, Pages 12: Design and Prototype of a Chatbot for Public Participation in Major Infrastructure Projects

Jonathan Matthei — 2026-01-30

Multimodal Technologies and Interaction doi: 10.3390/mti10020012

Authors: Jonathan Matthei Johannes Maas Maurice Wischum Sven Mackenbach Katharina Klemt-Albert

Public participation is a central element of democratic decision-making processes, but it often faces challenges within planning approval procedures due to problems of understanding and accessibility. This paper aims to counteract these challenges through the conceptual development, prototypical implementation and validation of a chatbot. The chatbot is designed to facilitate access to planning documents and improve the participation process as a whole. After presenting the theoretical foundations of chatbots and large language models (LLMs), three central use cases are described. The main tasks of the chatbot are to simplify the language of complex planning documents, find documents and information, and answer frequently asked questions. The underlying architecture of the prototype is based on the concept of retrieval augmented generation (RAG) and uses a vector database in which the information is embedded and stored as vectors. To evaluate the developed prototype, four focus workshops were conducted with professionals affiliated with road and rail infrastructure administrations at both state and federal levels in Germany. During these workshops, participants tested the core functionalities and assessed the system using both quantitative and qualitative criteria. The results indicate a strong potential for improving the handling of standard inquiries. By improving access to complex planning documents, the system may also contribute to a reduction in objections. At the same time, the evaluation emphasizes the importance of limiting hallucinations through appropriate technical safeguards and clearly indicating the use of AI to users. The insights gained from this study will be incorporated into the prototype developed within the BIM4People research project, funded by the German Federal Ministry of Transport. The aim therefore is to implement additional use cases and continuously optimize the functionality of the system through an iterative development process.
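The retrieval half of a retrieval augmented generation pipeline, as described above, can be sketched in a few lines. This is a toy illustration under stated assumptions: a bag-of-words count vector stands in for a learned embedding, an in-memory list stands in for the vector database, the passages are invented, and no language model is called. The names `VectorStore` and `build_prompt` are hypothetical and do not come from the prototype.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Minimal in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (vector, original text)

    def add(self, text):
        self.items.append((embed(text), text))

    def search(self, query, k=2):
        """Return the k stored passages most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question, passages):
    """Assemble the prompt that would be sent to the language model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    store = VectorStore()
    # Invented passages for illustration only.
    store.add("the noise barrier along the rail line will be four metres high")
    store.add("the planning approval hearing takes place in the town hall")
    store.add("objections can be submitted in writing within the consultation period")
    question = "how high is the noise barrier"
    print(build_prompt(question, store.search(question, k=1)))
```

Restricting the model to retrieved passages, and telling it to admit when they are insufficient, is one simple way to limit hallucinations in the spirit of the safeguards the evaluation calls for.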

MTI, Vol. 10, Pages 11: Adaptive Realities: Human-in-the-Loop AI for Trustworthy XR Training in Safety-Critical Domains

Daniele Pretolesi — 2026-01-22

Multimodal Technologies and Interaction doi: 10.3390/mti10010011

Authors: Daniele Pretolesi Georg Regal Helmut Schrom-Feiertag Manfred Tscheligi

Extended Reality (XR) technologies have matured into powerful tools for training in high-stakes domains, from emergency response to search and rescue. Yet current systems often struggle to balance real-time AI-driven personalisation with the need for human oversight and calibrated trust. This article synthesizes the programmatic contributions of a multi-study doctoral project to advance a design-and-evaluation framework for trustworthy adaptive XR training. Across six studies, we explored (i) recommender-driven scenario adaptation based on multimodal performance and physiological signals, (ii) persuasive dashboards for trainers, (iii) architectures for AI-supported XR training in medical mass-casualty contexts, (iv) theoretical and practical integration of Human-in-the-Loop (HITL) supervision, (v) user trust and over-reliance in the face of misleading AI suggestions, and (vi) the role of interaction modality in shaping workload, explainability, and trust in human–robot collaboration. Together, these investigations show how adaptive policies, transparent explanation, and adjustable autonomy can be orchestrated into a single adaptation loop that maintains trainee engagement, improves learning outcomes, and preserves trainer agency. We conclude with design guidelines and a research agenda for extending trustworthy XR training into safety-critical environments.

MTI, Vol. 10, Pages 10: APAR: A Structural Design and Guidance Framework for Gamification in Education Based on Motivation Theories

J. Carlos López-Ardao — 2026-01-10

Multimodal Technologies and Interaction doi: 10.3390/mti10010010

Authors: J. Carlos López-Ardao Miguel Rodríguez-Pérez Sergio Herrería-Alonso M. Estrella Sousa-Vieira Alfonso Lago Ferreiro Andrés Suárez-González Raúl F. Rodríguez-Rubio

Gamification is widely used to enhance student motivation, yet many educational design proposals remain conceptual and provide limited operational guidance for digital learning environments. This paper introduces APAR (Activities, Points, Achievements and Rewards), a content-independent structural framework for designing and implementing educational gamification in learning platforms. Grounded in motivation theories (including Self-Determination Theory and Relatedness–Autonomy–Mastery–Purpose) and reward taxonomies (Status, Access, Power and Stuff), APAR distinguishes high-level design constructs from concrete game elements (e.g., points, badges and leaderboards) and provides a systematic design loop linking learning activities, feedback, intermediate goals and reinforcement. The contribution includes (i) a mapping table relating each APAR construct to motivation models, supported dynamics and typical learning-platform implementations; (ii) an actionable design guide; and (iii) an empirical illustration implemented in Moodle in a higher-education Computer Networks course. In this setting, the proportion of enrolled students taking the final exam increased from 58% to 72% in the first year, and the proportion of enrolled students passing increased from 17% to 38%; in 2022–2023 these values were 70% and 39%, respectively (56% of exam takers passed). While the use case relies on quantitative course-level indicators and is observational, the findings support the potential of structural gamification as an integrated methodological tool and motivate further mixed-method validations.
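The parenthetical "56% of exam takers passed" follows from the two enrolment-based percentages, since the pass rate among exam takers is the share of enrolled students passing divided by the share taking the exam. A short check is given below; the baseline and first-year pass rates among exam takers are derived here from the reported figures and are not stated in the abstract, and the function name is hypothetical.

```python
def pass_rate_among_takers(pct_enrolled_passing, pct_enrolled_taking_exam):
    """Convert two enrolment-based percentages into a pass rate among exam takers."""
    return 100.0 * pct_enrolled_passing / pct_enrolled_taking_exam


if __name__ == "__main__":
    print(round(pass_rate_among_takers(17, 58)))  # baseline: 29
    print(round(pass_rate_among_takers(38, 72)))  # first year: 53
    print(round(pass_rate_among_takers(39, 70)))  # 2022-2023: 56
```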

MTI, Vol. 10, Pages 9: AI-Powered Procedural Haptics for Narrative VR: A Systematic Literature Review

Vimala Perumal — 2026-01-09

Multimodal Technologies and Interaction doi: 10.3390/mti10010009

Authors: Vimala Perumal Zeeshan Jawed Shah

Haptic feedback is important for narrative virtual reality (VR), yet authoring remains costly and difficult to scale due to device-specific tuning, placement constraints, and the need for semantically congruent timing. We systematically reviewed user studies on haptics in narrative VR to establish an empirical baseline and identify gaps for AI-powered procedural haptics. Following PRISMA 2020, we searched IEEE Xplore, ACM Digital Library, Scopus, Web of Science, PubMed, and PsycINFO (English; human participants; haptics synchronized to narrative events) and performed backward/forward citation chasing (final search: 31 July 2025). We also conducted a parallel scoping scan of grey literature (arXiv and CHI/SIGGRAPH workshops/demos), finalized on 7 September 2025; these records are summarized separately and were not included in the evidence synthesis. Of 493 records screened, 26 full texts were assessed, and 10 studies were included. Quantitatively, presence improved in 6/8 studies that measured it and immersion improved in 3/3; sample sizes ranged 8–108. Across varied modalities and placements, haptics improved presence and immersion and often enhanced affect; validated measures of narrative comprehension were rare. None of the included studies evaluated AI-generated procedural haptics in user studies. We conclude by proposing a structured, three-phase research roadmap designed to bridge this critical gap, moving the field from theoretical promise to the empirical validation of intelligent systems capable of making rich, adaptive, and scalable haptic narratives a reality.

MTI, Vol. 10, Pages 8: From Cues to Engagement: A Comprehensive Survey and Holistic Architecture for Computer Vision-Based Audience Analysis in Live Events

Marco Lemos — 2026-01-08

Multimodal Technologies and Interaction doi: 10.3390/mti10010008

Authors: Marco Lemos Pedro J. S. Cardoso João M. F. Rodrigues

The accurate measurement of audience engagement in real-world live events remains a significant challenge, with the majority of existing research confined to controlled environments like classrooms. This paper presents a comprehensive survey of Computer Vision AI-driven methods for real-time audience engagement monitoring and proposes a novel, holistic architecture to address this gap, with this architecture being the main contribution of the paper. The paper identifies and defines five core constructs essential for a robust analysis: Attention, Emotion and Sentiment, Body Language, Scene Dynamics, and Behaviours. Through a selective review of state-of-the-art techniques for each construct, the necessity of a multimodal approach that surpasses the limitations of isolated indicators is highlighted. The work synthesises a fragmented field into a unified taxonomy and introduces a modular architecture that integrates these constructs with practical, business-oriented metrics such as Commitment, Conversion, and Retention. Finally, by integrating cognitive, affective, and behavioural signals, this work provides a roadmap for developing operational systems that can transform live event experience and management through data-driven, real-time analytics.

MTI, Vol. 10, Pages 7: Eye-Tracking and Emotion-Based Evaluation of Wardrobe Front Colors and Textures in Bedroom Interiors

Yushu Chen — 2026-01-06

Multimodal Technologies and Interaction doi: 10.3390/mti10010007

Authors: Yushu Chen Wangyu Xu Xinyu Ma

Wardrobe fronts form a major visual element in bedroom interiors, yet material selection for their colors and textures often relies on intuition rather than evidence. This study develops a data-driven framework that links gaze behavior and affective responses to occupants’ preferences for wardrobe front materials. Forty adults evaluated color and texture swatches and rendered bedroom scenes while eye-tracking data capturing attraction, retention, and exploration were collected. Pairwise choices were modeled using a Bradley–Terry approach, and visual-attention features were integrated with emotion ratings to construct an interpretable attention index for predicting preferences. Results show that neutral light colors and structured wood-like textures consistently rank highest, with scene context reducing preference differences but not altering the order. Shorter time to first fixation and longer fixation duration were the strongest predictors of desirability, demonstrating the combined influence of rapid visual capture and sustained attention. Within the tested stimulus set and viewing conditions, the proposed pipeline yields consistent preference rankings and an interpretable attention-based score that supports evidence-informed shortlisting of wardrobe-front materials. The reported relationships between gaze, affect, and choice are associative and are intended to guide design decisions within the scope of the present experimental settings.
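To make the Bradley–Terry step concrete, the sketch below fits item strengths to a matrix of pairwise choices using the standard minorization-maximization update. It is an illustration only: the abstract does not say how the authors estimated their model, and the win counts in the usage example are invented.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    wins[i][j] is the number of times item i was chosen over item j.
    Returns strengths that sum to 1; under the model, item i is preferred
    to item j with probability p[i] / (p[i] + p[j]).
    """
    k = len(wins)
    p = [1.0 / k] * k
    for _ in range(iterations):
        updated = []
        for i in range(k):
            total_wins = sum(wins[i][j] for j in range(k) if j != i)
            # Each pair contributes its number of comparisons divided by
            # the current combined strength of the two items.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(k)
                if j != i
            )
            updated.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(updated)
        p = [value / norm for value in updated]
    return p


if __name__ == "__main__":
    # Invented counts for three hypothetical swatches (not study data).
    wins = [
        [0, 8, 9],
        [2, 0, 7],
        [1, 3, 0],
    ]
    strengths = fit_bradley_terry(wins)
    ranking = sorted(range(len(strengths)), key=lambda i: strengths[i], reverse=True)
    print(ranking, [round(s, 3) for s in strengths])
```

The fitted strengths give both a ranking and a predicted choice probability for any pair, which is what makes the model convenient for comparing rankings across conditions such as isolated swatches versus rendered scenes.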

MTI, Vol. 10, Pages 6: HISF: Hierarchical Interactive Semantic Fusion for Multimodal Prompt Learning

Haohan Feng — 2026-01-06

Multimodal Technologies and Interaction doi: 10.3390/mti10010006

Authors: Haohan Feng Chen Li

Recent vision-language pre-training models, like CLIP, have been shown to generalize well across a variety of multitask modalities. Nonetheless, their generalization for downstream tasks is limited. As a lightweight adaptation approach, prompt learning could allow task transfer by optimizing only several learnable vectors and thus is more flexible for pre-trained models. However, current methods mainly concentrate on the design of unimodal prompts and ignore effective means for multimodal semantic fusion and label alignment, which limits their representation power. To tackle these problems, this paper designs a Hierarchical Interactive Semantic Fusion (HISF) framework for multimodal prompt learning. On top of frozen CLIP backbones, HISF injects visual and textual signals simultaneously in intermediate layers of a Transformer through a cross-attention mechanism as well as fitting category embeddings. This architecture realizes the hierarchical semantic fusion at the modality level with structural consistency kept at each layer. In addition, a Label Embedding Constraint and a Semantic Alignment Loss are proposed to promote category consistency while alleviating semantic drift in training. Extensive experiments across 11 few-shot image classification benchmarks show that HISF improves the average accuracy by around 0.7% compared to state-of-the-art methods and has remarkable robustness in cross-domain transfer tasks. Ablation studies also verify the effectiveness of each proposed part and their combination: hierarchical structure, cross-modal attention, and semantic alignment collaborate to enrich representational capacity. In conclusion, the proposed HISF is a new hierarchical view for multimodal prompt learning and provides a more lightweight and generalizable paradigm for adapting vision-language pre-trained models.

MTI, Vol. 10, Pages 5: Neuroplasticity-Informed Learning Under Cognitive Load: A Systematic Review of Functional Imaging, Brain Stimulation, and Educational Technology Applications

Evgenia Gkintoni — 2025-12-31

Multimodal Technologies and Interaction doi: 10.3390/mti10010005

Authors: Evgenia Gkintoni Andrew Sortwell Stephanos P. Vassilopoulos Georgios Nikolaou

Background/Objectives: This systematic review examines neuroplasticity-informed approaches to learning under cognitive load, synthesizing evidence from functional imaging, brain stimulation, and educational technology research. As digital learning environments increasingly challenge learners with complex cognitive demands, understanding how neuroplasticity principles can inform adaptive educational design becomes critical. This review examines how neural mechanisms underlying learning under cognitive load can inform the development of evidence-based educational technologies that optimize neuroplastic potential while mitigating cognitive overload. Methods: Following PRISMA guidelines, we synthesized 94 empirical studies published between 2005 and 2025 across PubMed, Scopus, Web of Science, and PsycINFO. Studies were selected based on rigorous inclusion criteria that emphasized functional neuroimaging (fMRI, EEG), non-invasive brain stimulation (tDCS, TMS), and educational technology applications that examined learning outcomes under varying cognitive load conditions. Priority was given to research with translational implications for adaptive learning systems and personalized educational interventions. Results: Functional imaging studies reveal an inverted-U relationship between cognitive load and neuroplasticity, with moderate challenge optimizing prefrontal-parietal network activation and learning-related neural adaptations. Brain stimulation research demonstrates that tDCS and TMS can enhance neuroplastic responses under cognitive load, particularly benefiting learners with lower baseline abilities. Educational technology applications demonstrate that neuroplasticity-informed adaptive systems, which incorporate real-time cognitive load monitoring and dynamic difficulty adjustment, significantly enhance learning outcomes compared to traditional approaches. Individual differences in cognitive capacity, neurodiversity, and baseline brain states substantially moderate these effects, necessitating the development of personalized intervention strategies. Conclusions: Neuroplasticity-informed learning approaches offer a robust framework for educational technology design that respects cognitive load limitations while maximizing adaptive neural changes. Integration of functional imaging insights, brain stimulation protocols, and adaptive algorithms enables the development of inclusive educational technologies that support diverse learners under cognitive stress. Future research should focus on scalable implementations of real-time neuroplasticity monitoring in authentic educational settings, as well as on developing ethical frameworks for deploying neurotechnology-enhanced learning systems across diverse populations.

MTI, Vol. 10, Pages 4: Neuroception of Psychological Safety and Attitude Towards General AI in uHealth Context

Anca-Livia Panfil — 2025-12-30

Multimodal Technologies and Interaction doi: 10.3390/mti10010004

Authors: Anca-Livia Panfil Simona C. Tamasan Claudia C. Vasilian Raluca Horhat Diana Lungeanu

Interest in general AI is widespread, and much is expected from its large-scale adoption in the healthcare sector. However, the success of uHealth implementations relies on genuine trust, beyond technical performance. Neuroception of psychological safety (NPS), grounded in polyvagal theory, encompasses the human subconscious and automatic processes of safety and risk detection. We conducted a cross-sectional survey to explore a hypothetical connection between NPS and the perception of general AI in the uHealth context, by an anonymous online questionnaire comprising the following: Neuroception of Psychological Safety Scale (NPSS), four-item AI Attitude Scale (AIAS-4), and questions on AI threat, age, gender, and level of education. Multivariate analysis was performed using covariance-based structural equation modeling. We received 201 responses: 73 (36.3%) males vs. 128 (63.7%) females, all adults with varying levels of education (from 0 = basic formal education to 4 = master’s degree). Respondents belonged to four demographic cohorts: from Baby boomers to Generation Z. SEM results indicated that attitudes towards AI-driven health interventions are significantly impacted by social engagement and compassion (NPSS factors). Gender, education, and demographic cohort were confirmed as significant covariates. NPS-related attitudes towards AI should be considered and analyzed by healthcare providers, application developers, and policy or regulatory authorities.

MTI, Vol. 10, Pages 3: Mixed Reality Game Design for the Effectiveness and Application Research of Integrating Sustainable Concepts into Blended Learning

Zhengqing Wang — 2025-12-30

Multimodal Technologies and Interaction doi: 10.3390/mti10010003

Authors: Zhengqing Wang Chenxi Xiao Pengwei Hsiao

This study explores how mixed reality (MR) game environments, enabled by sensor-based motion tracking and interactive visualization technologies, can be effectively integrated into blended learning to promote sustainability education. Using eight Macau bakeries as empirical cases, field investigations collected and categorized surplus bread samples, while carbon emission frameworks informed pedagogical design. Employing a multidimensional research methodology combining questionnaires and semi-structured interviews, the study delved into the intrinsic link between bread waste and carbon emissions. Through perceptual interaction design and task-oriented challenge modes within the MR environment, users were immersed in experiencing the pathway of sustainable behavioral impact. Post-instructional engagement with the MR game revealed that >90% of participants expressed strong affinity for the system design, and >85% perceived it as intuitively operable. Analysis of user feedback and performance data demonstrates the system’s potential to deliver solutions for reducing bread waste and carbon emissions. By establishing a replicable MR game framework and technical mechanisms, this research offers novel perspectives for future sustainability education studies in the field of behavioral mixed reality design.

MTI, Vol. 10, Pages 2: Human–AI Feedback Loop for Pronunciation Training: A Mobile Application with Phoneme-Level Error Highlighting

Aleksei Demin — 2025-12-26

Multimodal Technologies and Interaction doi: 10.3390/mti10010002

Authors: Aleksei Demin Georgii Vorontsov Dmitrii Chaikovskii

This paper presents an AI-augmented pronunciation training approach for Russian language learners through a mobile application that supports an interactive learner–system feedback loop. The system combines a pre-trained Wav2Vec2Phoneme neural network with Needleman–Wunsch global sequence alignment to convert reference and learner speech into aligned phoneme sequences. Rather than producing an overall pronunciation score, the application provides localized, interpretable feedback by highlighting phoneme-level matches and mismatches in a red/green transcription, enabling learners to see where sounds were substituted, omitted, or added. Implemented as a WeChat Mini Program with a WebSocket-based backend, the design illustrates how speech-to-phoneme models and alignment procedures can be integrated into a lightweight mobile interface for autonomous pronunciation practice. We further provide a feature-level comparison with widely used commercial applications (Duolingo, HelloChinese, Babbel), emphasizing differences in feedback granularity and interpretability rather than unvalidated accuracy claims. Overall, the work demonstrates the feasibility of alignment-based phoneme-level feedback for mobile pronunciation training and motivates future evaluation of recognition reliability, latency, and learning outcomes on representative learner data.
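The alignment step described above can be illustrated with a compact Needleman–Wunsch implementation over phoneme lists. This is a sketch, not the application's code: the scoring values (match +1, mismatch -1, gap -1), the labels, and the example phoneme strings are assumptions, and the phoneme recognition itself (the Wav2Vec2Phoneme model) is out of scope here.

```python
def needleman_wunsch(ref, hyp, match=1, mismatch=-1, gap=-1):
    """Globally align two phoneme sequences.

    Returns a list of (ref_phoneme, hyp_phoneme) pairs; None marks a gap.
    """
    n, m = len(ref), len(hyp)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = match if ref[i - 1] == hyp[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align the two phonemes
                score[i - 1][j] + gap,       # reference phoneme left unpaired
                score[i][j - 1] + gap,       # learner phoneme left unpaired
            )
    # Trace back from the bottom-right corner to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if ref[i - 1] == hyp[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                pairs.append((ref[i - 1], hyp[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def label_alignment(pairs):
    """Turn aligned pairs into feedback labels (shown as green/red in a UI)."""
    labels = []
    for expected, spoken in pairs:
        if expected == spoken:
            labels.append("match")
        elif spoken is None:
            labels.append("omitted")
        elif expected is None:
            labels.append("added")
        else:
            labels.append("substituted")
    return labels


if __name__ == "__main__":
    reference = ["p", "r", "i", "v", "e", "t"]  # illustrative phoneme strings
    learner = ["p", "r", "i", "e", "t"]
    aligned = needleman_wunsch(reference, learner)
    for pair, tag in zip(aligned, label_alignment(aligned)):
        print(pair, tag)
```

Because the alignment is global, every reference phoneme ends up with exactly one label, which is what allows feedback to be localized rather than collapsed into a single score.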

MTI, Vol. 10, Pages 1: Motion Capture as an Immersive Learning Technology: A Systematic Review of Its Applications in Computer Animation Training

Xinyi Jiang — 2025-12-23

Multimodal Technologies and Interaction doi: 10.3390/mti10010001

Authors: Xinyi Jiang Zainuddin Ibrahim Jing Jiang Gang Liu

Motion capture (MoCap) is increasingly recognized as a powerful multimodal immersive learning technology, providing embodied interaction and real-time motion visualization that enrich educational experiences. Although MoCap is gaining prominence within educational research, its pedagogical value and integration into computer animation training environments have received relatively limited systematic investigation. This review synthesizes findings from 17 studies to analyze how MoCap supports instructional design, creative development, and workflow efficiency in animation education. Results show that MoCap enables a multimodal learning process by combining visual, kinesthetic, and performative modalities, strengthening learners’ sense of presence, agency, and perceptual–motor understanding. Furthermore, we identified five key technical affordances of MoCap, including precision and fidelity, multi-actor and creative control, interactivity and immersion, perceptual–motor learning, and emotional expressiveness, which together shape both cognitive and creative learning outcomes. Emerging trends highlight MoCap’s growing convergence with VR/AR, XR, real-time rendering engines, and AI-augmented motion analysis, expanding its role in the design of immersive and interactive educational systems. This review offers insights into the use of MoCap in animation education research and provides a springboard for future work on more immersive and industry-relevant training.

MTI, Vol. 9, Pages 118: Mapping Blended Learning Activities to Students’ Digital Competence in VET

Marko Radovan — 2025-12-15

Multimodal Technologies and Interaction doi: 10.3390/mti9120118

Authors: Marko Radovan Danijela Makovec Radovan

While blended learning facilitates digital literacy development, the specific design models and student factors contributing to this process remain underexplored. This study examined the relationship between various blended learning design models and digital literacy skill acquisition among 106 upper-secondary Vocational Education and Training (VET) students. Relationships among student activities, digital competencies, and prior blended learning experience were analyzed. Engagement in collaborative, task-based instructional designs—specifically collaborative projects and regular quizzing supported by digital tools—was positively associated with digital competence. Conversely, passive participation in live sessions or viewing pre-recorded videos exhibited a comparatively weaker association with competence development. While the use of virtual/augmented reality and interactive video correlated positively with digital tool usage, it did not significantly predict perceptions of online safety or content creation skills. Students with prior blended learning experience reported higher proficiency in developmental competencies, such as content creation and research, compared to their inexperienced peers. Cluster analysis identified three distinct student profiles based on technical specialization and blended learning experience. Overall, these findings suggest that blended learning implementation should prioritize structured collaboration and formative assessment.

MTI, Vol. 9, Pages 117: Augmented Reality in Biology Education: A Literature Review

Katja Stanič — 2025-11-25

Multimodal Technologies and Interaction doi: 10.3390/mti9120117

Authors: Katja Stanič Andreja Špernjak

This systematic review summarises the latest research on the use of augmented reality (AR) in biology education at primary, secondary and tertiary levels. Searching Web of Science, Scopus and Google Scholar, we found 40 empirical studies published up until early 2024. For each study, we analysed biological content, technical features, learning practices and pedagogical impact. AR is most often used in human anatomy, particularly for the circulatory and respiratory systems, but also in genetics, cell biology, virology, botany, ecology and molecular processes. Mobile devices dominate as the delivery platform, with marker-based tracking and either commercial apps or self-developed Unity/Vuforia solutions. Almost all studies embed AR in constructivist or inquiry-based pedagogies and report improved motivation, engagement and conceptual understanding. Nevertheless, reporting of technical details is inconsistent and long-term effects are not yet sufficiently researched. AR should therefore be viewed as a pedagogical tool rather than a technological goal, and it requires careful instructional design and equitable access to ensure meaningful and sustainable learning.

MTI, Vol. 9, Pages 116: Cross-Modal Attention Fusion: A Deep Learning and Affective Computing Model for Emotion Recognition

Himanshu Kumar — 2025-11-24

MTI, Vol. 9, Pages 116: Cross-Modal Attention Fusion: A Deep Learning and Affective Computing Model for Emotion Recognition

Multimodal Technologies and Interaction doi: 10.3390/mti9120116

Authors: Himanshu Kumar Martin Aruldoss Martin Wynn

Artificial emotional intelligence is a sub-domain of human–computer interaction research that aims to develop deep learning models capable of detecting and interpreting human emotional states through various modalities. A major challenge in this domain is identifying meaningful correlations between heterogeneous modalities (for example, between audio and visual data) due to their distinct temporal and spatial properties. Traditional fusion techniques used in multimodal learning to combine data from different sources often fail to capture meaningful cross-modal interactions at low computational cost, and struggle to adapt to varying modality reliability. Following a review of the relevant literature, this study adopts an experimental research method to develop and evaluate a mathematical cross-modal fusion model, thereby addressing a gap in the extant research literature. The framework uses the Tucker tensor decomposition to decompose the multi-dimensional data array into a set of matrices, supporting the integration of temporal features from the audio modality and spatiotemporal features from the visual modality. A cross-attention mechanism is incorporated to enhance cross-modal interaction, enabling each modality to attend to the relevant information from the other. The efficacy of the model is evaluated on three publicly available datasets, and the results indicate that the proposed fusion technique outperforms conventional fusion methods and several more recent approaches. The findings will be of interest to researchers and developers in artificial emotional intelligence.

MTI, Vol. 9, Pages 115: Reducing Periprocedural Pain and Anxiety of Child Patients with Guided Relaxation Exercises in a Virtual Natural Environment: A Clinical Research Study

Ilmari Jyskä — 2025-11-24

MTI, Vol. 9, Pages 115: Reducing Periprocedural Pain and Anxiety of Child Patients with Guided Relaxation Exercises in a Virtual Natural Environment: A Clinical Research Study

Multimodal Technologies and Interaction doi: 10.3390/mti9120115

Authors: Ilmari Jyskä Markku Turunen Kaija Puura Elina Karppa Sauli Palmu Jari Viik

Fear of needles is common among child patients. It causes stress and can lead to difficulty in procedures and future treatment avoidance. Virtual reality (VR) has emerged as a promising tool to reduce pain and anxiety non-pharmacologically. However, a research gap exists regarding what VR content is most effective in decreasing periprocedural stress. This article reports a VR feasibility study conducted with 83 child patients aged 8–12 years during a cannulation procedure. It has a between-subjects design with four groups, comparing deep breathing and mindfulness-based relaxation in a virtual nature environment (VNE) to passive VNE and standard care. The results from both relaxation exercise groups have been previously reported. This follow-up article adds findings from passive VNE and control groups, comparing all four for effectiveness and patient experience. The key findings highlight that deep breathing was highly effective according to heart rate variability (HRV) data, but less enjoyable than the mindfulness-based relaxation, which achieved higher patient satisfaction but was less effective according to HRV. Passive VNEs were pleasant but did not cause measurable stress reduction. All VR interventions improved patient experience over standard care. Relaxation exercises in a VNE reduce periprocedural stress more efficiently than passive VNEs or standard care in pediatrics.

MTI, Vol. 9, Pages 114: Evaluating Rich Visual Feedback on Head-Up Displays for In-Vehicle Voice Assistants: A User Study

Mahmoud Baghdadi — 2025-11-16

MTI, Vol. 9, Pages 114: Evaluating Rich Visual Feedback on Head-Up Displays for In-Vehicle Voice Assistants: A User Study

Multimodal Technologies and Interaction doi: 10.3390/mti9110114

Authors: Mahmoud Baghdadi Dilara Samad-Zada Achim Ebert

In-vehicle voice assistants face usability challenges due to limitations in delivering feedback within the constraints of the driving environment. The presented study explores the potential of Rich Visual Feedback (RVF) on Head-Up Displays (HUDs) as a multimodal solution to enhance system usability. A user study with 32 participants evaluated three HUD User Interface (UI) designs: the AR Fusion UI, which integrates augmented reality elements for layered, dynamic information presentation; the Baseline UI, which displays only essential keywords; and the Flat Fusion UI, which uses conventional vertical scrolling. To explore HUD interface principles and inform future HUD design without relying on specific hardware, a simulated near-field overlay was used. Usability was measured using the System Usability Scale (SUS), and distraction was assessed with a penalty point method. Results show that RVF on the HUD significantly influences usability, with both content quantity and presentation style affecting outcomes. The minimal Baseline UI achieved the highest overall usability. However, among the two Fusion designs, the AR-based layered information mechanism outperformed the flat scrolling method. Distraction effects were not statistically significant, indicating the need for further research. These findings suggest RVF-enabled HUDs can enhance in-vehicle voice assistant usability, potentially contributing to safer, more efficient driving.
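
The abstract does not describe how the SUS scores were computed. As a minimal sketch, the standard SUS scoring rule can be written as follows: ten items rated 1 to 5, odd-numbered items contribute the rating minus 1, even-numbered items contribute 5 minus the rating, and the sum is scaled by 2.5. The function name `sus_score` and the sample responses are illustrative assumptions, not data from the study.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for ten Likert responses (each 1 to 5)."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, rating in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribution = rating - 1
            total += rating - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribution = 5 - rating
            total += 5 - rating
    return total * 2.5


# Hypothetical participant, not taken from the study
example = [4, 2, 4, 2, 5, 1, 4, 2, 4, 3]
print(sus_score(example))  # 77.5
```

Per-participant scores computed this way would typically be averaged per UI condition before the designs are compared.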

MTI, Vol. 9, Pages 113: A Multi-Institution Mixed Methods Analysis of a Novel Acid-Base Mnemonic Algorithm

Camille Massaad — 2025-11-11

MTI, Vol. 9, Pages 113: A Multi-Institution Mixed Methods Analysis of a Novel Acid-Base Mnemonic Algorithm

Multimodal Technologies and Interaction doi: 10.3390/mti9110113

Authors: Camille Massaad Harrison Howe Meize Guo Tyler Bland

Acid-base analysis is a high-load diagnostic skill that many medical students struggle to master when taught using traditional text-based flowcharts. This multi-institution mixed-methods study evaluated a novel visual mnemonic algorithm that integrated Medimon characters, symbolic imagery, and pop-culture references into the standard acid-base diagnostic framework. First-year medical students (n = 273) at six distributed WWAMI campuses attended an identical lecture on acid-base physiology. Students at five control campuses received the original text-based algorithm, while students at one experimental campus received the Medimon algorithm in addition. Achievement was measured with a unit exam (nine focal items, day 7) and a final exam (four focal items, day 11). A Differences-in-Differences approach compared performance on focal items versus baseline items across sites. Students at the experimental campus showed no significant advantage on the unit exam (DiD = +1.2%, g = 0.12) but demonstrated a larger, but still non-significant, medium-to-large effect on the final exam (DiD = +11.0%, g = 0.85). At the experimental site, 39 students completed the Situational Interest Survey for Multimedia (SIS-M), revealing significantly higher triggered, maintained-feeling, maintained-value, and overall situational interest scores for the Medimon algorithm (all p < 0.001). Thematic analysis of open-ended responses identified four themes: enhanced clarity, improved memorability, increased engagement, and barriers to interpretation. Collectively, the findings suggest that embedding visual mnemonics and serious-game characters into diagnostic algorithms can enhance learner interest and may improve long-term retention in preclinical medical education.
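
To make the Differences-in-Differences comparison concrete, the sketch below shows one common way to form such an estimate: the focal-minus-baseline gap at the experimental site minus the same gap at the control sites. The study's exact estimator may differ, and the function name and all numbers are hypothetical.

```python
def diff_in_diff(exp_focal, exp_baseline, ctrl_focal, ctrl_baseline):
    """Difference-in-differences of mean scores (all inputs as proportions correct).

    (experimental focal - experimental baseline)
      minus (control focal - control baseline)
    """
    experimental_gap = exp_focal - exp_baseline
    control_gap = ctrl_focal - ctrl_baseline
    return experimental_gap - control_gap


# Hypothetical mean scores, NOT the values reported in the study
estimate = diff_in_diff(
    exp_focal=0.80, exp_baseline=0.75,
    ctrl_focal=0.70, ctrl_baseline=0.76,
)
print(f"DiD = {estimate * 100:+.1f} percentage points")
```

Subtracting the baseline-item gap removes site-level differences in overall ability, so the estimate reflects only the extra change on the focal items.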

MTI, Vol. 9, Pages 112: A Scoping Review of AI-Driven mHealth Systems for Precision Hydration: Integrating Food and Beverage Water Content for Personalized Recommendations

Kyriaki Apergi — 2025-11-08

MTI, Vol. 9, Pages 112: A Scoping Review of AI-Driven mHealth Systems for Precision Hydration: Integrating Food and Beverage Water Content for Personalized Recommendations

Multimodal Technologies and Interaction doi: 10.3390/mti9110112

Authors: Kyriaki Apergi Georgios D. Styliaras George Tsirogiannis Grigorios N. Beligiannis Olga Malisova

Background: Precision nutrition increasingly integrates mobile health (mHealth) and artificial intelligence (AI) tools. However, personalized hydration remains underdeveloped, particularly in accounting for both food- and beverage-derived water intake. Objective: This scoping review maps the existing literature on mHealth applications that incorporate machine learning (ML) or AI for personalized hydration. The focus is on systems that combine dietary (food-based) and fluid (beverage-based) water sources to generate individualized hydration assessments and recommendations. Methods: Following the PRISMA-ScR guidelines, we conducted a structured literature search across three databases (PubMed, Scopus, Web of Science) through March 2025. Studies were included if they addressed AI or ML within mHealth platforms for personalized hydration or nutrition, with an emphasis on systems using both beverage and food intake data. Results: Of the 43 included studies, most examined dietary recommender systems or hydration-focused apps. Few studies used hydration assessments focusing on both food and beverages or employed AI for integrated guidance. Emerging trends include wearable sensors, AR tools, and behavioral modeling. Conclusions: While numerous digital health tools address hydration or nutrition separately, there is a lack of comprehensive systems leveraging AI to guide hydration from both food and beverage sources. Bridging this gap is essential for effective, equitable, and precise hydration interventions. In this direction, we propose a hydration diet recommender system that integrates demographic, anthropometric, psychological, and socioeconomic data to create a truly personalized diet and hydration plan with a holistic approach.

MTI, Vol. 9, Pages 111: A Digital Model-Based Serious Game for PID-Controller Education: One-Axis Drone Model, Analytics, and Student Study

Raul Brumar — 2025-10-24

MTI, Vol. 9, Pages 111: A Digital Model-Based Serious Game for PID-Controller Education: One-Axis Drone Model, Analytics, and Student Study

Multimodal Technologies and Interaction doi: 10.3390/mti9110111

Authors: Raul Brumar Stelian Nicola Horia Ciocârlie

This paper presents a serious game designed to support the teaching of PID controllers. The game couples a visually clear Unity scene with a physics-accurate digital model of a drone with a single degree of freedom (called a one-axis drone) and helps prepare students to meet the demands of Industry 4.0 and 5.0. An analytics back-end logs system error at 10 Hz and interaction metrics, enabling instructors to diagnose common tuning issues from a plot and to provide actionable hints to students. The design process that led to choosing the one-axis drone and turbulence application via “turbulence balls” is explained, after which the implementation is described. The proposed solution is evaluated in a within-subjects study performed with 21 students from mixed technical backgrounds across two short, unsupervised tinkering sessions of up to 10 min framed by four quizzes of both general and theoretical content. Three questions shaped the analysis: (i) whether error traces can be visualized by instructors to generate actionable hints for students; (ii) whether brief, unsupervised play sessions yield measurable gains in knowledge or stability; and (iii) whether efficiency of tuning improves without measurable changes in tune performance. Results show that analysis of plotted error values exposes recognizable issues with PID tunes that map to concrete hints provided by the instructor. When it comes to unsupervised play sessions, no systematic pre/post improvement in quiz scores or normalized area under absolute error was observed. However, it required significantly less effort from students in the second session to reach the same tune performance, indicating improved tuning efficiency. Overall, the proposed serious game with the digital twin-inspired one-axis drone and custom analytics back-end has emerged as a practical, safe, and low-cost auxiliary tool for teaching PID controllers, helping bridge the gap between theory and practice.
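
The logging and scoring idea can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' Unity model: the plant is a unit-mass point moving on one axis under gravity, the error is logged at 10 Hz, and "normalized area under absolute error" is taken here to mean the mean absolute logged error. All function names, gains, and parameters are hypothetical.

```python
def simulate_pid(kp, ki, kd, setpoint=1.0, duration=10.0, dt=0.01,
                 log_hz=10.0, gravity=9.81):
    """Simulate a unit-mass one-axis body under PID thrust; return logged errors."""
    steps = int(round(duration / dt))
    log_every = int(round(1.0 / (log_hz * dt)))  # simulation steps per log sample
    position, velocity = 0.0, 0.0
    integral = 0.0
    prev_error = setpoint - position  # avoids a derivative spike on the first step
    error_log = []
    for step in range(steps):
        error = setpoint - position
        integral += error * dt
        derivative = (error - prev_error) / dt
        thrust = kp * error + ki * integral + kd * derivative
        prev_error = error
        if step % log_every == 0:
            error_log.append(error)  # 10 Hz error trace for later plotting
        # Semi-implicit Euler step: acceleration = thrust - gravity (mass = 1)
        velocity += (thrust - gravity) * dt
        position += velocity * dt
    return error_log


def normalized_abs_error(error_log):
    """Mean absolute logged error (assumed normalization: divide by sample count)."""
    return sum(abs(e) for e in error_log) / len(error_log)


tuned = simulate_pid(kp=8.0, ki=4.0, kd=5.0)        # integral term cancels gravity
no_integral = simulate_pid(kp=8.0, ki=0.0, kd=5.0)  # leaves a steady-state offset
print(normalized_abs_error(tuned), normalized_abs_error(no_integral))
```

A trace that settles at a non-zero offset, as in the second run, is the kind of recognizable pattern an instructor could map to a concrete hint such as "increase the integral gain".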

MTI, Vol. 9, Pages 109: Testing a New Approach to Monitor Mild Cognitive Impairment and Cognition in Older Adults at the Community Level

Isabel Paniak — 2025-10-21

MTI, Vol. 9, Pages 109: Testing a New Approach to Monitor Mild Cognitive Impairment and Cognition in Older Adults at the Community Level

Multimodal Technologies and Interaction doi: 10.3390/mti9100109

Authors: Isabel Paniak Ethan Cohen Christa Studzinski Lia Tsotsos

Dementia and mild cognitive impairment (MCI) are growing health concerns in Canada’s aging population. Over 700,000 Canadians currently live with dementia, and this number is expected to rise. As the older adult population increases, coupled with an already strained healthcare system, there is a pressing need for innovative tools that support aging in place. This study explored the feasibility and acceptability of using a Digital Human (DH) conversational agent, combined with AI-driven speech analysis, to monitor cognitive function, anxiety, and depression in cognitively healthy community-dwelling older adults (CDOA) aged 65 and older. Sixty older adults participated in up to three in-person sessions over six months, interacting with the DH through journaling and picture description tasks. Afterward, 51 of the participants completed structured interviews about their experiences and perceptions of the DH and AI more generally. Findings showed that 84% enjoyed interacting with the DH, and 96% expressed interest in learning more about AI in healthcare. While participants were open and curious about AI, 67% voiced concerns about AI replacing human interaction in healthcare. Most found the DH friendly, though reactions to its appearance varied. Overall, participants viewed AI as a promising tool, provided it complements, rather than replaces, human interactions.

MTI, Vol. 9, Pages 110: From Consumption to Co-Creation: A Systematic Review of Six Levels of AI-Enhanced Creative Engagement in Education

Margarida Romero — 2025-10-21

MTI, Vol. 9, Pages 110: From Consumption to Co-Creation: A Systematic Review of Six Levels of AI-Enhanced Creative Engagement in Education

Multimodal Technologies and Interaction doi: 10.3390/mti9100110

Authors: Margarida Romero

As AI systems become more integrated into society, the relationship between humans and AI is shifting from simple automation to co-creative collaboration. This evolution is particularly important in education, where human intuition and imagination can combine with AI’s computational power to enable innovative forms of learning and teaching. This study is grounded in the #ppAI6 model, a framework that describes six levels of creative engagement with AI in educational contexts, ranging from passive consumption to active, participatory co-creation of knowledge. The model highlights progression from initial interactions with AI tools to transformative educational experiences that involve deep collaboration between humans and AI. In this study, we explore how educators and learners can engage in deeper, more transformative interactions with AI technologies. The #ppAI6 model categorizes these levels of engagement as follows: level 1 involves passive consumption of AI-generated content, while level 6 represents expansive, participatory co-creation of knowledge. This model provides a lens through which we investigate how educational tools and practices can move beyond basic interactions to foster higher-order creativity. We conducted a systematic literature review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for reporting the levels of creative engagement with AI tools in education. This review synthesizes existing literature on various levels of engagement, such as interactive consumption through Intelligent Tutoring Systems (ITS), and shifts focus to the exploration and design of higher-order forms of creative engagement. The findings highlight varied levels of engagement across both learners and educators. For learners, a total of four studies were found at level 2 (interactive consumption). Two studies were found that looked at level 3 (individual content creation). Four studies focused on collaborative content creation at level 4. No studies were observed at level 5, and only one study was found at level 6. These findings show a lack of development in AI tools for more creative involvement. For teachers, AI tools mainly support levels 2 and 3, facilitating personalized content creation and performance analysis, with limited examples of higher-level creative engagement; this indicates room for improvement in supporting collaborative teaching practices. The review found that two studies focused on level 2 (interactive consumption) for teachers. In addition, four studies were identified at level 3 (individual content creation). Only one study was found at level 5 (participatory co-creation), and no studies were found at level 6. In practical terms, the review suggests that educators need professional development focused on building AI literacy, enabling them to recognize and leverage the different levels of creative engagement that AI tools offer.

MTI, Vol. 9, Pages 108: NAMI: A Neuro-Adaptive Multimodal Architecture for Wearable Human–Computer Interaction

Christos Papakostas — 2025-10-18

MTI, Vol. 9, Pages 108: NAMI: A Neuro-Adaptive Multimodal Architecture for Wearable Human–Computer Interaction

Multimodal Technologies and Interaction doi: 10.3390/mti9100108

Authors: Christos Papakostas Christos Troussas Akrivi Krouska Cleo Sgouropoulou

The increasing ubiquity of wearable computing and multimodal interaction technologies has created unprecedented opportunities for natural and seamless human–computer interaction. However, most existing systems adapt only to external user actions such as speech, gesture, or gaze, without considering internal cognitive or affective states. This limits their ability to provide intelligent and empathetic adaptations. This paper addresses this critical gap by proposing the Neuro-Adaptive Multimodal Architecture (NAMI), a principled, modular, and reproducible framework designed to integrate behavioral and neurophysiological signals in real time. NAMI combines multimodal behavioral inputs with lightweight EEG and peripheral physiological measurements to infer cognitive load and engagement and adapt the interface dynamically to optimize user experience. The architecture is formally specified as a three-layer pipeline encompassing sensing and acquisition, cognitive–affective state estimation, and adaptive interaction control, with clear data flows, mathematical formalization, and real-time performance on wearable platforms. A prototype implementation of NAMI was deployed in an augmented reality Java programming tutor for postgraduate informatics students, where it dynamically adjusted task difficulty, feedback modality, and assistance frequency based on inferred user state. Empirical evaluation with 100 participants demonstrated significant improvements in task performance, reduced subjective workload, and increased engagement and satisfaction, confirming the effectiveness of the neuro-adaptive approach.

MTI, Vol. 9, Pages 107: The Hybrid Learning Atelier: Designing a Hybrid Learning Space

Jan Michael Sieber — 2025-10-14

MTI, Vol. 9, Pages 107: The Hybrid Learning Atelier: Designing a Hybrid Learning Space

Multimodal Technologies and Interaction doi: 10.3390/mti9100107

Authors: Jan Michael Sieber Anne Brannys Heinrich Söbke Mubtasim Islam Sabik Eckhard Kraft

Hybrid learning spaces may be described as physical environments enhanced by digital technologies, which enable learning scenarios involving both in-person and online participation. This article presents a hybrid learning space designed for higher education. The design of the space has been informed by Lefebvre’s design principles: (a) spatial practice enabling flexible usage scenarios, (b) representations of space conveying openness and adaptability, and (c) representational spaces supporting experiences of presence in both physical and digital form. The article describes design characteristics guiding the implementation of the hybrid learning space and explains corresponding design decisions, such as the use of a wall-sized projection. Further, the article introduces affordances and usage scenarios of the hybrid learning space developed. Moreover, an evaluation study of the hybrid learning space is conducted by means of a 360°-based virtual field trip (VFT). The VFT, led by an educator, serves as preparation for a field trip (FT) to a composting plant two weeks later. Participants of both VFT and FT (N = 11) completed a questionnaire addressing psychological constructs related to learning, including motivation, emotion, immersion, presence, and cognitive load. We report the results of the VFT alongside those of the FT as a baseline. Some notable differences, for example in social presence, suggest areas for further development of the hybrid learning space. Overall, the study characterises key features of hybrid learning spaces, identifies their contribution to high-quality teaching and provides inspirations for their further development.

MTI, Vol. 9, Pages 106: Multimodal Learning Interactions Using MATLAB Technology in a Multinational Statistical Classroom

Qiaoyan Cai — 2025-10-13

MTI, Vol. 9, Pages 106: Multimodal Learning Interactions Using MATLAB Technology in a Multinational Statistical Classroom

Multimodal Technologies and Interaction doi: 10.3390/mti9100106

Authors: Qiaoyan Cai Mohd Razip Bajuri Kwan Eu Leong Liangliang Chen

This study explores and models the use of MATLAB technology in multimodal learning interactions to address the challenges of teaching and learning statistics in a multinational postgraduate classroom. The term multimodal refers to the deliberate integration of multiple representational and interaction modes, i.e., visual, textual, symbolic, and interactive computational modelling, within a coherent instructional design. MATLAB is utilised as it is a comprehensive tool for enhancing students’ understanding of statistical skills, practical applications, and data analysis—areas where traditional methods often fall short. International postgraduate students were chosen for this study because their diverse educational backgrounds present unique learning challenges. A qualitative case study design was employed, and data collection methods included classroom observations, interviews, and student work analysis. The collected data were analysed and modelled by conceptualising key elements and themes using thematic analysis, with findings verified through data triangulation and expert review. Emerging themes were structured into models that illustrate multimodal teaching and learning interactions. The novelty of this research lies in its contribution to multimodal teaching and learning strategies for multinational students in statistics education. The findings highlight significant challenges international students face, including language and technical barriers, limited prior content knowledge, time constraints, technical difficulties, and a lack of independent thinking. To address these challenges, MATLAB promotes collaborative learning, increases student engagement and discussion, boosts motivation, and develops essential skills. This study suggests that educators integrate multimodal interactions in their teaching strategies to better support multinational students in statistical learning environments.

MTI, Vol. 9, Pages 105: Research on the Detection Method of Flight Trainees’ Attention State Based on Multi-Modal Dynamic Depth Network

Gongpu Wu — 2025-10-10

MTI, Vol. 9, Pages 105: Research on the Detection Method of Flight Trainees’ Attention State Based on Multi-Modal Dynamic Depth Network

Multimodal Technologies and Interaction doi: 10.3390/mti9100105

Authors: Gongpu Wu Changyuan Wang Zehui Chen Guangyi Jiang

In aviation safety, pilots must efficiently process dynamic visual information and maintain a high level of attention. Any misjudgment of critical information or delay in decision-making may lead to mission failure or catastrophic consequences. Therefore, accurately detecting pilots’ attention states is the primary prerequisite for improving flight safety and performance. To better detect the attention state of pilots, this paper takes flight trainees as the research object and the simulated flight environment as the experimental background. It proposes a method for detecting the attention state of flight trainees based on a multi-modal dynamic depth network (M3D-Net). The M3D-Net architecture is a lightweight neural network architecture that integrates temporal image features, visual information features, and flight operation data features. It aligns image and text features through an attention mechanism to enhance the semantic association between modalities; it utilizes the Depth-wise Separable Convolution and LSTM (DSC-LSTM) module to model temporal information, dynamically capturing the contextual dependencies within the sequence, and achieves six-level attention state classification. Ablation experiments are conducted to compare the classification performance of the model, and the effectiveness of the proposed method is also assessed using model evaluation metrics. Experiments show that the classification performance of the proposed architecture reaches 97.56%, with a model size of 18.6 M. Compared with traditional algorithms, the M3D-Net architecture shows better prospects for practical application.

MTI, Vol. 9, Pages 104: Adaptive Neuro-Fuzzy Inference System Framework for Paediatric Wrist Injury Classification

Olamilekan Shobayo — 2025-10-08

MTI, Vol. 9, Pages 104: Adaptive Neuro-Fuzzy Inference System Framework for Paediatric Wrist Injury Classification

Multimodal Technologies and Interaction doi: 10.3390/mti9100104

Authors: Olamilekan Shobayo Reza Saatchi Shammi Ramlakhan

An Adaptive Neuro-Fuzzy Inference System (ANFIS) framework for paediatric wrist injury classification (fracture versus sprain) was developed utilising infrared thermography (IRT). ANFIS combines artificial neural network (ANN) learning with interpretable fuzzy rules, mitigating the “black-box” limitation of conventional ANNs through explicit membership functions and Takagi–Sugeno rule consequents. Forty children (19 fractures, 21 sprains, confirmed by X-ray radiograph) provided thermal image sequences from which three statistically discriminative temperature distribution features, namely standard deviation, inter-quartile range (IQR) and kurtosis, were selected. A five-layer Sugeno ANFIS with Gaussian membership functions was trained using hybrid least-squares/gradient descent optimisation and evaluated under three premise-parameter initialisation strategies: random seeding, K-means clustering, and fuzzy C-means (FCM) data partitioning. Five-fold cross-validation guided the selection of the membership function standard deviation (σ) and rule count, yielding an optimal nine-rule model. Comparative experiments show that K-means initialisation achieved the best balance between convergence speed and generalisation, compared with the slower but highly precise random initialisation and the rapidly convergent yet unstable FCM. The proposed K-means–driven ANFIS offered data-efficient decision support, highlighting the potential of thermal feature fusion with neuro-fuzzy modelling to reduce unnecessary radiographs in emergency bone fracture triage.
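
The forward pass of a first-order Sugeno system with Gaussian membership functions can be sketched as follows. This is a generic illustration of the technique, not the trained nine-rule model from the study: the rule parameters, the two-rule example, and the function names are all hypothetical, and no training step is shown.

```python
import math


def gaussian_mf(x, centre, sigma):
    """Gaussian membership degree of x for a fuzzy set with given centre and sigma."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def sugeno_forward(features, rules):
    """Evaluate a first-order Sugeno fuzzy system.

    Each rule is (centres, sigmas, coefficients, bias):
      firing strength = product of per-feature Gaussian memberships
      consequent      = coefficients . features + bias
    The output is the firing-strength-weighted average of the consequents.
    """
    strengths, consequents = [], []
    for centres, sigmas, coefficients, bias in rules:
        strength = 1.0
        for x, c, s in zip(features, centres, sigmas):
            strength *= gaussian_mf(x, c, s)
        strengths.append(strength)
        consequents.append(sum(a * x for a, x in zip(coefficients, features)) + bias)
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fired for this input")
    return sum(w * f for w, f in zip(strengths, consequents)) / total


# Hypothetical two-rule system over one feature; constant consequents 0 and 1
EXAMPLE_RULES = [
    ((-1.0,), (1.0,), (0.0,), 0.0),
    ((1.0,), (1.0,), (0.0,), 1.0),
]
print(sugeno_forward([0.0], EXAMPLE_RULES))  # midway between the rules: 0.5
```

In a classifier like the one described, the features would be the three temperature statistics and the output would be thresholded to decide between fracture and sprain. The premise parameters (centres and sigmas) are what the initialisation strategies set before training.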

MTI, Vol. 9, Pages 103: Research on Safe Multimodal Detection Method of Pilot Visual Observation Behavior Based on Cognitive State Decoding

Heming Zhang — 2025-10-01

MTI, Vol. 9, Pages 103: Research on Safe Multimodal Detection Method of Pilot Visual Observation Behavior Based on Cognitive State Decoding

Multimodal Technologies and Interaction doi: 10.3390/mti9100103

Authors: Heming Zhang Changyuan Wang Pengbo Wang

Pilot visual behavior safety assessment is a cross-disciplinary technology that analyzes pilots’ gaze behavior and neurocognitive responses. This paper proposes a multimodal analysis method for pilot visual behavior safety based on cognitive state decoding, aiming at a quantitative and efficient assessment of pilots’ observational behavior. To address the subjective limitations of traditional methods, it introduces an observational behavior detection model that integrates facial images to achieve dynamic and quantitative analysis of observational behavior. It addresses the “Midas touch” problem of observational behavior by constructing a cognitive analysis method using multimodal signals. We propose a bidirectional long short-term memory (LSTM) network that matches physiological signal rhythmic features to address the problem of isolated features in multidimensional signals. This method captures the dynamic correlations between multiple physiological behaviors, such as prefrontal theta and chest-abdominal coordination, to decode the cognitive state of pilots’ observational behavior. Finally, the paper uses a decision-level fusion method based on an improved Dempster–Shafer (DS) evidence theory to provide a quantifiable detection strategy for aviation safety standards. This dual-dimensional quantitative assessment system of “visual behavior–neurophysiological cognition” reveals the dynamic correlations between visual behavior and cognitive state among pilots of varying experience. This method can provide a new paradigm for pilot neuroergonomics training and early warning of vestibular-visual integration disorders.
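
The abstract does not specify how the DS evidence theory was improved, so the sketch below shows only the classical Dempster rule of combination on which such decision-level fusion builds. The two-hypothesis frame, the mass values, and the source names are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with the classical Dempster rule.

    Mass functions map frozenset focal elements to masses that sum to 1.
    Mass assigned to empty intersections (conflict) is removed and the
    remainder is renormalized by 1 - conflict.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


SAFE = frozenset({"safe"})
UNSAFE = frozenset({"unsafe"})
EITHER = SAFE | UNSAFE  # full frame: mass left uncommitted

# Hypothetical evidence from a visual-behavior model and a physiological model
visual = {SAFE: 0.6, UNSAFE: 0.1, EITHER: 0.3}
physio = {SAFE: 0.5, UNSAFE: 0.2, EITHER: 0.3}

fused = dempster_combine(visual, physio)
print(round(fused[SAFE], 3), round(fused[UNSAFE], 3), round(fused[EITHER], 3))
# 0.759 0.133 0.108
```

Here the conflicting mass is 0.17, so each combined mass is divided by 0.83. Improved DS variants usually change how this conflict is redistributed or how the sources are weighted.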

MTI, Vol. 9, Pages 102: Design of Enhanced Virtual Reality Training Environments for Industrial Rotary Dryers Using Mathematical Modeling

Ricardo A. Gutiérrez-Aguiñaga — 2025-09-30

MTI, Vol. 9, Pages 102: Design of Enhanced Virtual Reality Training Environments for Industrial Rotary Dryers Using Mathematical Modeling

Multimodal Technologies and Interaction doi: 10.3390/mti9100102

Authors: Ricardo A. Gutiérrez-Aguiñaga Jonathan H. Rosales-Hernández Rogelio Salinas-Santiago Froylán M. E. Escalante Efrén Aguilar-Garnica

Rotary dryers are widely used in industry for their ease of operation in processing large volumes of material continuously despite persistent challenges in energy efficiency, cost-effectiveness, and safety. Addressing the need for effective operator training, the purpose of this study is to develop virtual reality (VR) environments for industrial rotary dryers. Visual and behavioral aspects were considered in the methodology for developing the environments for two application cases—ammonium nitrate and low-rank coal drying. Visual aspects considered include the industrial-scale geometry and detailed components of the rotary dryer, while behavioral aspects were governed by mathematical modeling of heat and mass transfer phenomena. The case studies of ammonium nitrate and low-rank coal were selected due to their industrial relevance and contrasting drying characteristics, ensuring the versatility and applicability of the developed VR environments. The main contribution of this work is the embedding of validated mathematical models—expressed as ordinary differential equations—into these environments. The numerical integration of these models provides key process variables, such as solid temperature and moisture content along the rotary dryer, thereby enhancing the behavioral realism of the developed VR environments.

MTI, Vol. 9, Pages 101: SmartRead: A Multimodal eReading Platform Integrating Computing and Gamification to Enhance Student Engagement and Knowledge Retention

Ifeoluwa Pelumi — 2025-09-23

MTI, Vol. 9, Pages 101: SmartRead: A Multimodal eReading Platform Integrating Computing and Gamification to Enhance Student Engagement and Knowledge Retention

Multimodal Technologies and Interaction doi: 10.3390/mti9100101

Authors: Ifeoluwa Pelumi Neil Gordon

This paper explores the integration of computing and multimodal technologies into personal reading practices to enhance student engagement and knowledge assimilation in higher education. In response to a documented decline in voluntary academic reading, we investigated how technology-enhanced reading environments can re-engage students through interactive and personalized experiences. Central to this research is SmartRead, a proposed multimodal eReading platform that incorporates gamification, adaptive content delivery, and real-time feedback mechanisms. Drawing on empirical data collected from students at a higher education institution, we examined how features such as progress tracking, motivational rewards, and interactive comprehension aids influence reading behavior, engagement levels, and information retention. Results indicate that such multimodal interventions can significantly improve learner outcomes and user satisfaction. This paper contributes actionable insights into the design of innovative, accessible, and pedagogically sound digital reading tools and proposes a framework for future eReading technologies that align with multimodal interaction principles.

MTI, Vol. 9, Pages 100: Investigating the Effect of Pseudo-Haptics on Perceptions Toward Onomatopoeia Text During Finger-Point Tracing

Satoshi Saga — 2025-09-23

MTI, Vol. 9, Pages 100: Investigating the Effect of Pseudo-Haptics on Perceptions Toward Onomatopoeia Text During Finger-Point Tracing

Multimodal Technologies and Interaction doi: 10.3390/mti9100100

Authors: Satoshi Saga Kanta Shirakawa

With the advancement of haptic technology, the use of pseudo-haptics to provide tactile feedback without physical contact has garnered significant attention. This paper aimed to investigate whether sliding fingers over onomatopoetic text strings with pseudo-haptic effects induces change in perception toward their symbolic semantics. To address this, we conducted an experiment using finger-point reading as our subject matter. The experimental results confirmed that the “neba-neba,” “puru-puru,” and “fusa-fusa” effects create a pseudo-haptic feeling for the associated texts on the “hard–soft,” “slippery–sticky,” and “elastic–inelastic” adjective pairs. Specifically, for “hard–soft,” it was found that the proposed effects could consistently produce an impact.

MTI, Vol. 9, Pages 99: Assessing Cognitive Load Using EEG and Eye-Tracking in 3-D Learning Environments: A Systematic Review

Rozemun Khan — 2025-09-22

MTI, Vol. 9, Pages 99: Assessing Cognitive Load Using EEG and Eye-Tracking in 3-D Learning Environments: A Systematic Review

Multimodal Technologies and Interaction doi: 10.3390/mti9090099

Authors: Rozemun Khan Johannes Vernooij Daniela Salvatori Beerend P. Hierck

The increasing use of immersive 3-D technologies in education raises critical questions about their cognitive impact on learners. This systematic review evaluates how electroencephalography (EEG) and eye-tracking have been used to objectively measure cognitive load in 3-D learning environments. We conducted a comprehensive literature search (2009–2025) across PubMed, Scopus, Web of Science, PsycInfo, and ERIC, identifying 51 studies that used EEG or eye-tracking in experimental contexts involving stereoscopic or head-mounted 3-D technologies. Our findings suggest that 3-D environments may enhance learning and engagement, particularly in spatial tasks, while affecting cognitive load in complex, task-dependent ways. Studies reported mixed patterns across psychophysiological measures, including spectral features (e.g., frontal theta, parietal alpha), workload indices (e.g., theta/alpha ratio), and gaze-based metrics (e.g., fixation duration, pupil dilation): some studies observed increased load, while others reported reductions or no difference. These discrepancies reflect methodological heterogeneity and underscore the value of time-sensitive assessments. While a moderate cognitive load supports learning, an excessive load may impair performance, and overload thresholds can vary across individuals. EEG and eye-tracking offer scalable methods for monitoring cognitive effort dynamically. Overall, 3-D and XR technologies hold promise but must be aligned with task demands and learner profiles and guided by real-time indicators of cognitive load in immersive environments.

MTI, Vol. 9, Pages 98: A Review of Socially Assistive Robotics in Supporting Children with Autism Spectrum Disorder

Muhammad Nadeem — 2025-09-18

MTI, Vol. 9, Pages 98: A Review of Socially Assistive Robotics in Supporting Children with Autism Spectrum Disorder

Multimodal Technologies and Interaction doi: 10.3390/mti9090098

Authors: Muhammad Nadeem Julien Moussa H. Barakat Dani Daas Albert Potams

This study aimed to investigate the use of social robots as an interactive learning approach for treating children diagnosed with autism spectrum disorder (ASD). A review was conducted using the meta-analysis technique to compile pertinent research. An analysis was performed on the results of the online search process, which gathered information on pertinent research published until 31 January 2025, from three publication databases: IEEE Xplore, SCOPUS, and Google Scholar. One hundred and seven papers out of the 591 publications that were retrieved satisfied the previously established inclusion and exclusion criteria. Despite the differences in methodology and heterogeneity, the data were synthesized narratively. This review focuses on the various types of social robots used to treat ASD, as well as their communication mechanisms, development areas, target behaviors, challenges, and future directions. Both practitioners and seasoned researchers looking for a fresh approach to their next project will find this review a useful resource that offers broad summaries of state-of-the-art research in this field.

MTI, Vol. 9, Pages 97: Exploring Consumer Perception of Augmented Reality (AR) Tools for Displaying and Understanding Nutrition Labels: A Pilot Study

Cristina Botinestean — 2025-09-16

MTI, Vol. 9, Pages 97: Exploring Consumer Perception of Augmented Reality (AR) Tools for Displaying and Understanding Nutrition Labels: A Pilot Study

Multimodal Technologies and Interaction doi: 10.3390/mti9090097

Authors: Cristina Botinestean Stergios Melios Emily Crofton

Augmented reality (AR) technology offers a promising approach to providing consumers with detailed and personalized information about food products. The aim of this pilot study was to explore how the use of AR tools comprising visual and auditory formats affects consumers’ perception and understanding of nutrition labels of two commercially available products (lasagne ready meal and strawberry yogurt). The nutritional information of both the lasagne and yogurt products was presented to consumers (n = 30) under three experimental conditions: original packaging, visual AR, and visual and audio AR. Consumers answered questions about their perceptions of the products’ overall healthiness, caloric content, and macronutrient composition, as well as how the information was presented. The results showed that while nutritional information presented under the original packaging condition was more effective in changing consumer perceptions, the AR tools were found to be more “novel” and “memorable”. More specifically, for both lasagne and yogurt, the visual AR tool resulted in a more memorable experience compared to original packaging. The use of both the visual AR and the visual and audio AR tools was considered a novel experience for both products. However, the provision of nutritional information had a greater impact on product perception than the specific experimental condition used to present it. These results provide evidence from a pilot study supporting the development of an AR tool for displaying and potentially improving the understanding of nutrition labels.

MTI, Vol. 9, Pages 96: Analyzing Player Behavior in a VR Game for Children Using Gameplay Telemetry

Mihai-Alexandru Grosu — 2025-09-09

MTI, Vol. 9, Pages 96: Analyzing Player Behavior in a VR Game for Children Using Gameplay Telemetry

Multimodal Technologies and Interaction doi: 10.3390/mti9090096

Authors: Mihai-Alexandru Grosu Stelian Nicola

Virtual reality (VR) has become increasingly popular and has started entering homes, schools, and clinics, yet evidence on how children interact during free-form, unguided play remains limited. Understanding how interaction dynamics relate to player performance is essential for designing more accessible and engaging VR experiences, especially in educational contexts. For this reason, we developed VRBloons, a child-friendly VR game about popping balloons. The game logs real-time gameplay telemetry such as total hand movement, accuracy, throw rate, and other performance-related gameplay data. By analyzing several feature-engineered metrics using unsupervised clustering and non-parametric statistical validation, we aim to identify distinct behavioral patterns. The analysis revealed several associations between input preferences, movement patterns, and performance outcomes, forming clearly distinct clusters. In this analysis, input preference emerged as an independent dimension of play style, supporting the inclusion of redundant input mappings to accommodate diverse motor capabilities. Additionally, the results highlight the opportunities for performance-sensitive assistance systems that adapt the difficulty of the game in real time. Overall, this study demonstrates how telemetry-based profiling can shape the design decisions in VR experiences, offering a methodological framework for assessing varied interaction styles and a diverse player population.

MTI, Vol. 9, Pages 95: Unplugged Activities in the Development of Computational Thinking with Poly-Universe

Aldemir Malveira de Oliveira — 2025-09-09

MTI, Vol. 9, Pages 95: Unplugged Activities in the Development of Computational Thinking with Poly-Universe

Multimodal Technologies and Interaction doi: 10.3390/mti9090095

Authors: Aldemir Malveira de Oliveira Piedade Vaz-Rebelo Maria da Graça Bidarra

This paper presents an educational experience of using Poly-Universe, a game created by Janos Saxon, with the aim of developing computational thinking (CT) skills through unplugged activities. It was implemented in the course “Algorithm Analysis,” with the participation of students in the sixth period of Computer Science at a University Center for Higher Education in Brazil. These students were facing various cognitive difficulties in using the four pillars of CT, namely abstraction, pattern recognition, algorithm, and decomposition. To address the students’ learning gaps, unplugged activities were implemented using Poly-Universe pieces (geometric shapes such as triangles, squares, and circles), exploring the connection through the pillars of CT. A mixed methodology integrating quantitative and qualitative approaches was applied to compare the progress of the students and their reactions when developing the activities. The results showed that the level of learning involving the computational pillars in “Algorithm Analysis” improved significantly, from 30% to almost 80% in terms of achievement in academic tests. In addition, an increase in students’ engagement and collaboration was also recorded. Therefore, the implementation of unplugged activities with Poly-Universe promoted skills related to the pillars of CT, especially in the analysis of algorithms.

MTI, Vol. 9, Pages 94: Augmented Reality in Education Through Collaborative Learning: A Systematic Literature Review

Georgios Christoforos Kazlaris — 2025-09-06

MTI, Vol. 9, Pages 94: Augmented Reality in Education Through Collaborative Learning: A Systematic Literature Review

Multimodal Technologies and Interaction doi: 10.3390/mti9090094

Authors: Georgios Christoforos Kazlaris Euclid Keramopoulos Charalampos Bratsas Georgios Kokkonis

The rapid advancement of technology in our era has brought significant changes to various fields of human activity, including education. As a key pillar of intellectual and social development, education integrates innovative tools to enrich learning experiences. One such tool is Augmented Reality (AR), which enables dynamic interaction between physical and digital environments. This systematic review, following PRISMA guidelines, examines AR’s use in education, with a focus on enhancing collaborative learning across various educational levels. A total of 29 peer-reviewed studies published between 2010 and 2024 were selected based on defined inclusion criteria, retrieved from major databases such as Scopus, Web of Science, IEEE Xplore, and ScienceDirect. The findings suggest that AR can improve student engagement and foster collaboration through interactive, immersive methods. However, the review also identifies methodological gaps in current research, such as inconsistent sample size reporting, limited information on questionnaires, and the absence of standardized evaluation approaches. This review contributes to the field by offering a structured synthesis of current research, highlighting critical gaps, and proposing directions for more rigorous, transparent, and pedagogically grounded studies on the integration of AR in collaborative learning environments.

MTI, Vol. 9, Pages 93: Augminded: Ambient Mirror Display Notifications

Timo Götzelmann — 2025-09-04

MTI, Vol. 9, Pages 93: Augminded: Ambient Mirror Display Notifications

Multimodal Technologies and Interaction doi: 10.3390/mti9090093

Authors: Timo Götzelmann Pascal Karg Mareike Müller

This paper presents a new approach for providing contextual information in real-world environments. Our approach is consciously designed to be low-threshold; by using mirrors as augmented reality surfaces, no devices such as AR glasses or smartphones have to be worn or held by the user. It enables technical and non-technical objects in the environment to be visually highlighted and thus subtly draw the attention of people passing by. The presented technology enables the provision of information that can be viewed in more detail by the user if required by slowing down their movement. Users can decide whether this is relevant to them or not. A prototype system was implemented and evaluated through a user study. The results show a high level of acceptance and intuitive usability of the system, with participants being able to reliably perceive and process the information displayed. The technology thus offers promising potential for the unobtrusive and context-sensitive provision of information in various application areas. The paper discusses limitations of the system and outlines future research directions to further optimize the technology and extend its applicability.

MTI, Vol. 9, Pages 92: Evaluating Educational Game Design Through Human–Machine Pair Inspection: Case Studies in Adaptive Learning Environments

Ioannis Sarlis — 2025-09-01

MTI, Vol. 9, Pages 92: Evaluating Educational Game Design Through Human–Machine Pair Inspection: Case Studies in Adaptive Learning Environments

Multimodal Technologies and Interaction doi: 10.3390/mti9090092

Authors: Ioannis Sarlis Dimitrios Kotsifakos Christos Douligeris

Educational games often fail to effectively merge game mechanics with educational goals, lacking adaptive feedback and real-time performance monitoring. This study explores how Human–Computer Interaction principles and adaptive feedback can enhance educational game design to improve learning outcomes and user experience. Four educational games were analyzed using a mixed-methods approach and evaluated through established frameworks, such as the Serious Educational Games Evaluation Framework, the Assessment of Learning and Motivation Software, the Learning Object Evaluation Scale for Students, and Universal Design for Learning guidelines. In addition, a novel Human–Machine Pair Inspection protocol was employed to gather real-time data on adaptive feedback, cognitive load, and interactive behavior. Findings suggest that Human–Machine Pair Inspection-based adaptive mechanisms significantly boost personalized learning, knowledge retention, and student motivation by better aligning games with learning objectives. Although the sample size is small, this research provides practical insights for educators and designers, highlighting the effectiveness of adaptive Game-Based Learning. The study proposes the Human–Machine Pair Inspection methodology as a valuable tool for creating educational games that successfully balance user experience with learning goals, warranting further empirical validation with larger groups.

MTI, Vol. 9, Pages 91: Assessment of the Validity and Reliability of Reaction Speed Measurements Using the Rezzil Player Application in Virtual Reality

Jacek Polechoński — 2025-09-01

MTI, Vol. 9, Pages 91: Assessment of the Validity and Reliability of Reaction Speed Measurements Using the Rezzil Player Application in Virtual Reality

Multimodal Technologies and Interaction doi: 10.3390/mti9090091

Authors: Jacek Polechoński Agata Horbacz

Virtual reality (VR) is widely used across various areas of human life. One field where its application is rapidly growing is sport and physical activity (PA). Training applications are being developed that support various sports disciplines, motor skill acquisition, and the development of motor abilities. Immersive technologies are increasingly being used to assess motor and cognitive capabilities. As such, validation studies of these diagnostic tools are essential. The aim of this study was to estimate the validity and reliability of reaction speed (RS) measurements using the Rezzil Player application (“Reaction” module) in immersive VR compared to results obtained with the SMARTFit device in a real environment (RE). The study involved 43 university students (17 women and 26 men). Both tests required participants to strike light targets on a panel with their hands. Two indicators of response were analyzed in both tests: the number of hits on illuminated targets within a specified time frame and the average RS in response to visual stimuli. Statistically significant and relatively strong correlations were observed between the two measurement methods: number of hits (rS = 0.610; p < 0.001) and average RS (rS = 0.535; p < 0.001). High intraclass correlation coefficients (ICCs) were also found for both test environments: number of hits in VR (ICC = 0.851), average RS in VR (0.844), number of hits in RE (ICC = 0.881), and average RS in RE (0.878). The findings indicate that the Rezzil Player application can be considered a valid and reliable tool for measuring reaction speed in VR. The correlation with conventional methods and the high ICC values attest to the psychometric quality of the tool.
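
For readers unfamiliar with the rS statistic reported above, the sketch below shows one way to compute Spearman's rank correlation as the Pearson correlation of rank-transformed samples, with tied values given the mean of their positions. The helper names and the sample values are hypothetical and made up for illustration; they are not data from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for a constant sample")
    return cov / (var_x * var_y) ** 0.5


# Made-up hit counts for five hypothetical participants in each environment.
hits_vr = [41, 38, 45, 50, 36]
hits_re = [44, 40, 43, 52, 39]
print(round(spearman_rho(hits_vr, hits_re), 3))
```

Because it works on ranks rather than raw values, this statistic captures monotonic agreement between the two measurement methods without assuming a linear relationship or normally distributed scores.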

MTI, Vol. 9, Pages 90: Design and Evaluation of a Serious Game Prototype to Stimulate Pre-Reading Fluency Processes in Paediatric Hospital Classrooms

Juan Pedro Tacoronte-Sosa — 2025-08-27

MTI, Vol. 9, Pages 90: Design and Evaluation of a Serious Game Prototype to Stimulate Pre-Reading Fluency Processes in Paediatric Hospital Classrooms

Multimodal Technologies and Interaction doi: 10.3390/mti9090090

Authors: Juan Pedro Tacoronte-Sosa María Ángeles Peña-Hita

Didactic digital tools can initiate, enhance, and strengthen reading fluency in children undergoing long-term hospitalization due to oncological conditions. However, resources specifically designed to support rapid naming and decoding in Spanish remain scarce. This study presents the design, development, and evaluation of a game prototype aimed at addressing this gap among Spanish-speaking preschoolers in hospital settings. Developed using Unity through a design-based research methodology, the game comprises three narratively linked levels targeting rapid naming, decoding, and fluency. A sequential exploratory mixed-methods design (QUAL-quan) guided the evaluation. Qualitative data were obtained from a focus group of hospital teachers (N = 6) and interviews with experts (N = 30) in relevant fields. Quantitative validation involved 274 experts assessing the game’s contextual, pedagogical, and technical quality. The prototype was also piloted with four end-users using standardised tests for rapid naming, decoding, and fluency in Spanish. Results indicated strong expert consensus regarding the game’s educational value, contextual fit, and usability. Preliminary findings suggest potential for fostering and supplementing early literacy skills in hospitalised children. Further research with larger clinical samples is recommended to validate these outcomes.

MTI, Vol. 9, Pages 89: Cognitive Workload Assessment in Aerospace Scenarios: A Cross-Modal Transformer Framework for Multimodal Physiological Signal Fusion

Pengbo Wang — 2025-08-26

MTI, Vol. 9, Pages 89: Cognitive Workload Assessment in Aerospace Scenarios: A Cross-Modal Transformer Framework for Multimodal Physiological Signal Fusion

Multimodal Technologies and Interaction doi: 10.3390/mti9090089

Authors: Pengbo Wang Hongxi Wang Heming Zhang

In the field of cognitive workload assessment for aerospace training, existing methods exhibit significant limitations in unimodal feature extraction and in leveraging complementary synergy among multimodal signals, while current fusion paradigms struggle to effectively capture nonlinear dynamic coupling characteristics across modalities. This study proposes DST-Net (Cross-Modal Downsampling Transformer Network), which synergistically integrates pilots’ multimodal physiological signals (electromyography, electrooculography, electrodermal activity) with flight dynamics data through an Anti-Aliasing and Average Pooling LSTM (AAL-LSTM) data fusion strategy combined with cross-modal attention mechanisms. Evaluation on the “CogPilot” dataset for flight task difficulty prediction demonstrates that AAL-LSTM achieves substantial performance improvements over existing approaches (AUC = 0.97, F1 Score = 94.55). Given the dataset’s frequent sensor data missingness, the study further enhances simulated flight experiments. By incorporating eye-tracking features via cross-modal attention mechanisms, the upgraded DST-Net framework achieves even higher performance (AUC = 0.998, F1 Score = 97.95) and reduces the root mean square error (RMSE) of cumulative flight error prediction to 1750. These advancements provide critical support for safety-critical aviation training systems.

MTI, Vol. 9, Pages 88: Development of a Multi-Platform AI-Based Software Interface for the Accompaniment of Children

Isaac León — 2025-08-26

MTI, Vol. 9, Pages 88: Development of a Multi-Platform AI-Based Software Interface for the Accompaniment of Children

Multimodal Technologies and Interaction doi: 10.3390/mti9090088

Authors: Isaac León Camila Reyes Iesus Davila Bryan Puruncajas Dennys Paillacho Nayeth Solorzano Marcelo Fajardo-Pruna Hyungpil Moon Francisco Yumbla

The absence of parental presence has a direct impact on the emotional stability and social routines of children, especially during extended periods of separation from their family environment, as in the case of daycare centers, hospitals, or when they remain alone at home. At the same time, the technology currently available to provide emotional support in these contexts remains limited. In response to the growing need for emotional support and companionship in child care, this project proposes the development of a multi-platform software architecture based on artificial intelligence (AI), designed to be integrated into humanoid robots that assist children between the ages of 6 and 14. The system enables daily verbal and non-verbal interactions intended to foster a sense of presence and personalized connection through conversations, games, and empathetic gestures. Built on the Robot Operating System (ROS), the software incorporates modular components for voice command processing, real-time facial expression generation, and joint movement control. These modules allow the robot to hold natural conversations, display dynamic facial expressions on its LCD (Liquid Crystal Display) screen, and synchronize gestures with spoken responses. Additionally, a graphical interface enhances the coherence between dialogue and movement, thereby improving the quality of human–robot interaction. Initial evaluations conducted in controlled environments assessed the system’s fluency, responsiveness, and expressive behavior. Subsequently, it was implemented in a pediatric hospital in Guayaquil, Ecuador, where it accompanied children during their recovery. It was observed that this type of AI-based software can significantly enhance the experience of children, opening promising opportunities for its application in clinical, educational, recreational, and other child-centered settings.

MTI, Vol. 9, Pages 87: 3D Printing as a Multimodal STEM Learning Technology: A Survey Study in Second Chance Schools

Despina Radiopoulou — 2025-08-24

MTI, Vol. 9, Pages 87: 3D Printing as a Multimodal STEM Learning Technology: A Survey Study in Second Chance Schools

Multimodal Technologies and Interaction doi: 10.3390/mti9090087

Authors: Despina Radiopoulou Antreas Kantaros Theodore Ganetsos Paraskevi Zacharia

This study explores the integration of 3D printing technology by adult learners in Greek Second Chance Schools (SCS), institutions designed to address Early School Leaving and promote Lifelong Learning. Grounded in constructivist and experiential learning theories, the research examines adult learners’ attitudes toward 3D printing technology through a hands-on STEM activity in the context of teaching scientific literacy. The instructional activity was centered on a physics experiment illustrating Archimedes’ principle using a multimodal approach, combining 3D computer modeling for visualization and design with tangible manipulation of a printed object, thereby offering both digital and hands-on learning experiences. Quantitative data were collected using a structured questionnaire to assess participants’ perceptions of 3D printing technology. Findings indicate a positive trend in adult learners’ responses: participants found 3D printing accessible, interesting, and easy to use. Although participants expressed hesitation about independently applying the technology in the future, overall responses suggest strong interest and openness to using emerging technologies within educational settings, even among marginalized adult populations. This work highlights the value of integrating emerging technologies into alternative education frameworks, offers a replicable model for inclusive STEM education, and lays the groundwork for further research in adult learning environments using innovative, learner-centered approaches.

MTI, Vol. 9, Pages 86: Telerehabilitation Strategy for University Students with Back Pain Based on 3D Animations: Case Study

Carolina Ponce-Ibarra — 2025-08-24

MTI, Vol. 9, Pages 86: Telerehabilitation Strategy for University Students with Back Pain Based on 3D Animations: Case Study

Multimodal Technologies and Interaction doi: 10.3390/mti9090086

Authors: Carolina Ponce-Ibarra Diana-Margarita Córdova-Esparza Teresa García-Ramírez Julio-Alejandro Romero-González Juan Terven Mauricio Arturo Ibarra-Corona Rolando Pérez Palacios-Bonilla

Nowadays, the use of technology has become increasingly indispensable, leading to prolonged exposure to computers and other screen devices. This situation is common in work areas related to Information and Communication Technologies (ICTs), where people spend long hours in front of a computer. This exposure has been associated with the development of musculoskeletal disorders, among which nonspecific back pain is particularly prevalent. This observational study presents the design of a telerehabilitation strategy based on 3D animations, which is aimed at enhancing the musculoskeletal health of individuals working or studying in ICT-related fields. The intervention was developed through the Moodle platform and designed using the ADDIE instructional model, incorporating educational content and therapeutic exercises adapted to digital ergonomics. The sample included university students in the field of computer science who were experiencing symptoms associated with prolonged computer use. After a four-week intervention period, the results show favorable changes in pain perception and knowledge of postural hygiene. These findings suggest that a distance-based educational and therapeutic strategy may be a useful approach for the prevention and treatment of back pain in academic settings.

MTI, Vol. 9, Pages 85: Do Novices Struggle with AI Web Design? An Eye-Tracking Study of Full-Site Generation Tools

Chen Chu — 2025-08-22

MTI, Vol. 9, Pages 85: Do Novices Struggle with AI Web Design? An Eye-Tracking Study of Full-Site Generation Tools

Multimodal Technologies and Interaction doi: 10.3390/mti9090085

Authors: Chen Chu Jianan Zhao Zhanxun Dong

AI-powered full-site web generation tools promise to democratize website creation for novice users. However, their actual usability and accessibility for this group remain insufficiently studied. This study examines interaction barriers faced by novice users when using Wix ADI to complete three tasks: Task 1 (onboarding), Task 2 (template customization), and Task 3 (product page creation). Twelve participants with no web design background were recruited to perform these tasks while their behavior was recorded via screen capture and eye-tracking (Tobii Glasses 2), supplemented by post-task interviews. Task completion rates declined significantly in Task 2 (66.67%) and Task 3 (33.33%). Help-seeking behaviors increased significantly, particularly during template customization and product page creation. Eye-tracking data indicated elevated cognitive load in later tasks, with fixation count and saccade count peaking in Task 2 and pupil diameter peaking in Task 3. Qualitative feedback identified core challenges such as interface ambiguity, limited transparency in AI control, and disrupted task logic. These findings reveal a gap between AI tool affordances and novice user needs, underscoring the importance of interface clarity, editable transparency, and adaptive guidance. As full-site generators increasingly target general users, lowering barriers for novice audiences is essential for equitable access to web creation.

MTI, Vol. 9, Pages 84: Systematic Review of Artificial Intelligence in Education: Trends, Benefits, and Challenges

Juan Garzón — 2025-08-20

MTI, Vol. 9, Pages 84: Systematic Review of Artificial Intelligence in Education: Trends, Benefits, and Challenges

Multimodal Technologies and Interaction doi: 10.3390/mti9080084

Authors: Juan Garzón Eddy Patiño Camilo Marulanda

Artificial intelligence (AI) is changing how we teach and learn, generating excitement and concern about its potential to transform education. To contribute to the debate, this systematic literature review examines current research trends (publication year, country of study, publication journal, education level, education field, and AI type), as well as the benefits and challenges of integrating AI into education. This review analyzed 155 peer-reviewed empirical studies published between 2015 and 2025. The review reveals a significant increase in research activity since 2022, reflecting the impact of generative AI tools, such as ChatGPT. Studies highlight a range of benefits, including enhanced learning outcomes, personalized instruction, and increased student motivation. However, there are challenges to overcome, such as students’ ethical use of AI, teachers’ resistance to using AI systems, and the digital dependency these systems can generate. These findings show AI’s potential to enhance education; however, its success depends on careful implementation and collaboration among educators, researchers, and policymakers to ensure meaningful and equitable outcomes.

MTI, Vol. 9, Pages 83: Homo smartphonus: Psychological Aspects of Smartphone Use—A Literature Review

Piotr Sorokowski — 2025-08-19

Multimodal Technologies and Interaction doi: 10.3390/mti9080083

Authors: Piotr Sorokowski Marta Sobczak

The increasing prevalence of smartphone use has raised concerns about its impact on human psychological functioning. This literature review provides a comprehensive overview of the psychological dimensions influenced by smartphone use, spanning health psychology, individual differences, social psychology, and cognitive functioning. The review draws on findings from numerous studies, primarily conducted in highly developed Western and Asian countries, where cultural factors may influence usage patterns and psychological outcomes. Key limitations in the current body of research include geographical biases and methodological challenges such as sample homogeneity and reliance on self-report measures. Evidence suggests that excessive smartphone use can lead to addiction and is associated with negative psychological and health consequences. The review also highlights how individual differences—such as personality traits, age, and gender—affect smartphone usage. Social implications, both positive (e.g., increased connectivity) and negative (e.g., interpersonal conflict), are explored in depth. Cognitive effects are considered, particularly in relation to attention and memory, where findings suggest potential impairments in sustained focus and information retention. While the literature often emphasizes risks, this review also points to the need for further exploration of the potential benefits of smartphone use. In summary, the review offers valuable insights into the complex psychological effects of smartphones and underscores the importance of future research to better understand their nuanced impact on well-being.

MTI, Vol. 9, Pages 82: Perception and Monitoring of Sign Language Acquisition for Avatar Technologies: A Rapid Focused Review (2020–2025)

Khansa Chemnad — 2025-08-14

Multimodal Technologies and Interaction doi: 10.3390/mti9080082

Authors: Khansa Chemnad Achraf Othman

Sign language avatar systems have emerged as a promising solution to bridge communication gaps where human sign language interpreters are unavailable. However, the design of these avatars often fails to account for the diversity in how users acquire and perceive sign language. This study presents a rapid review of 17 empirical studies (2020–2025) to synthesize how linguistic and cognitive variability affects sign language perception and how these findings can guide avatar development. We extracted and synthesized key constructs, participant profiles, and capture techniques relevant to avatar fidelity. This review finds that delayed exposure to sign language is consistently linked to persistent challenges in syntactic processing, classifier use, and avatar comprehension. In contrast, early-exposed signers demonstrate more robust parsing and greater tolerance of perceptual irregularities. Key perceptual features, such as smooth transitions between signs, expressive facial cues for grammatical clarity, and consistent spatial placement of referents, emerge as critical for intelligibility, particularly for late learners. These findings highlight the importance of participatory design and user-centered validation in advancing accessible, culturally responsive human–computer interaction through next-generation avatar systems.

MTI, Vol. 9, Pages 81: Organizing Relational Complexity—Design of Interactive Complex Systems

Linus de Petris — 2025-08-12

Multimodal Technologies and Interaction doi: 10.3390/mti9080081

Authors: Linus de Petris Siamak Khatibi

With the advent of AI and robot systems, the current Human–Computer Interaction (HCI) paradigm, which treats interaction as a transactional exchange, is increasingly insufficient for complex socio-technical systems. This paper argues for a shift toward an agential realist perspective, which understands interaction not as an exchange between separate entities, but as a phenomenon continuously enacted through dynamic, material-discursive practices known as ‘intra-actions’. Through a diffractive reading of agential realism, HCI, complex systems theory, and an empirical case study of a touring exhibition on skateboarding culture, this paper explores an alternative approach. A key finding emerged from a sound-recording workshop when a participant described the recordings not as “how it sounds,” but as “how it feels” to skate. The finding reveals the limits of traditional HCI and illustrates how interacting parts are co-constituted through the intra-actions of entangled agencies. The paper argues that design for interactive complex systems should shift from a focus on causal, transactional interaction toward organizing relational complexity, that is, staging the conditions for a rich scope of emergent encounters to unfold. The paper concludes by suggesting further research into non-causal explanation and computation.

MTI, Vol. 9, Pages 80: Space Medicine Meets Serious Games: Boosting Engagement with the Medimon Creature Collector

Martin Hundrup — 2025-08-07

Multimodal Technologies and Interaction doi: 10.3390/mti9080080

Authors: Martin Hundrup Jessi Holte Ciara Bordeaux Emma Ferguson Joscelyn Coad Terence Soule Tyler Bland

Serious games that integrate educational content with engaging gameplay mechanics hold promise for reducing cognitive load and increasing student motivation in STEM and health science education. This preliminary study presents the development and evaluation of the Medimon NASA Demo, a game-based learning prototype designed to teach undergraduate students about the musculoskeletal and visual systems—two critical domains in space medicine. Participants (n = 23) engaged with the game over a two-week self-regulated learning period. The game employed mnemonic-based characters, visual storytelling, and turn-based battle mechanics to reinforce medical concepts. Quantitative results demonstrated significant learning gains, with posttest scores increasing by an average of 23% and a normalized change of c = 0.4. Engagement levels were high across multiple dimensions of situational interest, and 74% of participants preferred the game over traditional formats. Qualitative analysis of open-ended responses revealed themes related to intrinsic appeal, perceived learning efficacy, interaction design, and cognitive resource management. While the game had minimal impact on short-term STEM career interest, its educational potential was clearly supported. These findings suggest that mnemonic-driven serious games like Medimon can effectively enhance engagement and learning in health science education, especially when aligned with real-world contexts such as space medicine.