Article

VR Co-Lab: A Virtual Reality Platform for Human–Robot Disassembly Training and Synthetic Data Generation

1 Department of Mechanical Engineering, Iowa State University, Ames, IA 50011, USA
2 J. Mike Walker ’66 Department of Mechanical Engineering, Texas A&M University, College Station, TX 77843, USA
3 Zachry Department of Civil and Environmental Engineering, Texas A&M University, College Station, TX 77843, USA
4 School of Environmental, Civil, Agricultural and Mechanical Engineering, University of Georgia, Athens, GA 30602, USA
* Author to whom correspondence should be addressed.
Machines 2025, 13(3), 239; https://doi.org/10.3390/machines13030239
Submission received: 11 February 2025 / Revised: 8 March 2025 / Accepted: 14 March 2025 / Published: 17 March 2025

Abstract

This research introduces a virtual reality (VR) training system for improving human–robot collaboration (HRC) in industrial disassembly tasks, particularly for e-waste recycling. Conventional training approaches frequently fail to provide sufficient adaptability, immediate feedback, or scalable solutions for complex industrial workflows. The implementation leverages Quest Pro’s body-tracking capabilities to enable ergonomic, immersive interactions with planned eye-tracking integration for improved interactivity and accuracy. The Niryo One robot aids users in hands-on disassembly while generating synthetic data to refine robot motion planning models. A Robot Operating System (ROS) bridge enables the seamless simulation and control of various robotic platforms using Unified Robotics Description Format (URDF) files, bridging virtual and physical training environments. A Long Short-Term Memory (LSTM) model predicts user interactions and robotic motions, optimizing trajectory planning and minimizing errors. Monte Carlo dropout-based uncertainty estimation enhances prediction reliability, ensuring adaptability to dynamic user behavior. Initial technical validation demonstrates the platform’s potential, with preliminary testing showing promising results in task execution efficiency and human–robot motion alignment, though comprehensive user studies remain for future work. Limitations include the lack of multi-user scenarios, potential tracking inaccuracies, and the need for further real-world validation. This system establishes a sandbox training framework for HRC in disassembly, leveraging VR and AI-driven feedback to improve skill acquisition, task efficiency, and training scalability across industrial applications.

1. Introduction

Human–robot collaboration (HRC) has gained increasing traction in industrial settings, particularly in assembly and disassembly tasks. The growing complexity of industrial workflows, such as e-waste recycling and precision manufacturing, demands advanced training methodologies seamlessly integrating human expertise with robotic assistance. Traditional training approaches often fall short in preparing workers for these hybrid environments due to several critical limitations: a lack of realistic interaction scenarios, insufficient adaptability to individual learning curves, the absence of real-time feedback mechanisms, and limited scalability for complex industrial workflows [1,2,3,4].
The need for effective training systems is further amplified by the increasing adoption of Industry 4.0 principles, emphasizing smart manufacturing, cyber–physical systems, and digital twins [5]. These principles call for training platforms that simulate real-world tasks, adapt dynamically to user performance, and integrate seamlessly with robotic systems. However, existing training methods often lack the flexibility and scalability to meet these demands. For example, traditional classroom-based or on-the-job training approaches are limited by their inability to replicate complex industrial environments or provide real-time feedback tailored to individual users.
Virtual reality (VR) offers a transformative solution by enabling immersive, high-fidelity simulations replicating real-world disassembly processes while allowing for real-time data collection and adaptive feedback. Augmented reality (AR) and mixed reality (MR) further enhance the training experience by overlaying digital guidance in physical environments, improving task efficiency and cognitive load management [6,7]. Prior studies have demonstrated the effectiveness of VR/AR-based training for procedural learning and industrial applications. For instance, Webel et al. [2] highlighted the advantages of AR-based technician training, integrating multimodal feedback to enhance learning outcomes. Similarly, Schwarz et al. [8] demonstrated how immersive VR can improve cognitive training, leading to better memory retention and procedural accuracy. Lerner et al. [9] show how VR can improve performance in surgical environments, demonstrating that skills acquired in virtual training environments can effectively transfer to real-world applications. Recent work by Dwivedi et al. [10] further established that immersive VR environments can reduce learning time by up to 40% while improving task efficiency in manual assembly training. These studies highlight the potential of XR technologies for industrial training.
Despite these advancements, significant challenges remain in developing integrated systems that combine immersive training environments with real-time robotic motion prediction and task adaptation. Current solutions often cannot anticipate human actions dynamically or adapt robot behavior based on human preferences and task context. Additionally, ergonomic considerations such as minimizing worker strain during collaborative tasks are often overlooked in existing systems [11]. Addressing these limitations requires a cohesive framework integrating advanced motion prediction models, adaptive cognitive architectures, and uncertainty-aware planning.
Synthetic data generation techniques have significantly enhanced VR-based training platforms by enabling iterative learning and the refinement of robotic behaviors. Synthetic datasets can simulate various human–robot interactions under varying conditions, providing valuable insights into task optimization and safety improvements [12]. Moreover, advancements in machine learning algorithms such as Long Short-Term Memory (LSTM) networks have shown promise in predicting dynamic human motion patterns [13]. These developments pave the way for creating intelligent systems capable of adapting to complex industrial workflows.
To address these gaps, this paper presents VR Co-Lab, a virtual-reality-based platform designed to enhance HRC in disassembly training. This research extends existing knowledge in three key areas:
  • Development of an integrated VR-based training environment that combines full-body tracking using the Quest Pro HMD with a real-time ROS–Unity bridge, enabling precise human motion capture and dynamic control of robotic models.
  • Implementation of an LSTM-based motion prediction model with Monte Carlo dropout for uncertainty estimation, which continuously refines robotic motion trajectories, leveraging past interaction data for dynamic, real-time task adaptation. The model runs using Unity’s Barracuda engine, eliminating external inference server latency and ensuring responsive interactions. This builds upon the work of Liu et al. [13] but extends it to immersive VR environments with real-time feedback loops.
  • Creation of a synthetic data generation pipeline that captures complex human–robot interactions to refine disassembly sequence planning and optimize collaborative task execution. Our approach specifically targets human–robot collaboration in disassembly tasks, addressing a critical gap in the literature identified by Jacob et al. [14].
Preliminary system validation shows a promising technical performance, with the LSTM model demonstrating strong alignment between predicted and actual motion trajectories during initial testing. While these early results demonstrate the system’s potential for enhancing human–robot collaboration in disassembly tasks, more extensive user evaluation studies will be conducted in future work to assess training effectiveness across different skill levels and task complexities.
This framework provides immersive, interactive training and generates synthetic datasets for iterative learning, enabling the data-driven refinement of robotic disassembly strategies. Additionally, by integrating uncertainty-aware motion forecasting, the system ensures smoother transitions between human and robot actions in shared workspaces. While existing approaches have explored VR training or motion prediction separately [15], our work uniquely combines these elements into a cohesive system designed explicitly for human–robot collaborative disassembly training.
The remainder of this paper is structured as follows: Section 2 reviews the relevant literature on VR/AR training methodologies, human–robot motion prediction frameworks, and collaborative disassembly systems. Section 3 details the system architecture, followed by an evaluation of performance metrics in Section 4. Finally, Section 5 discusses the implications of this approach and outlines future research directions.

2. Related Work

2.1. VR/AR Training Systems for Industrial Applications

Virtual reality (VR) and augmented reality (AR) have emerged as transformative tools for industrial training, offering immersive environments that replicate real-world tasks while providing real-time feedback. Dwivedi et al. [10] demonstrated that immersive VR environments could enhance manual assembly training by providing controlled simulations, reducing learning time, and improving task efficiency. Schwarz et al. [16] evaluated VR’s effectiveness in cognitive training, showing improvements in memory retention and cognitive load management.
Webel et al. [2] explored AR-based technician training, incorporating multimodal interaction such as haptic feedback to enhance learning outcomes. Hořejší et al. [7] highlighted AR’s role in virtual training for parts assembly, proving its effectiveness in reducing assembly time and errors. Similarly, Rocca et al. [17] investigated integrating VR and digital twins into circular economy practices, demonstrating their role in optimizing disassembly workflows. Boud et al. [18] demonstrated VR’s effectiveness as a tool for assembly tasks, highlighting how haptic feedback through instrumented objects can improve object manipulation times compared to conventional techniques. Li et al. [4] developed a desktop VR prototype system (V-REALISM) for maintenance training that integrated disassembly sequence planning with virtual reality, demonstrating cost reduction benefits for industrial applications.
Al-Ahmari et al. [19] developed a virtual manufacturing assembly simulation system that offers visual, auditory, tactile, and force feedback, enhancing the realism of the training environment. Numfu et al. [20] presented a VR platform for maintenance training that is easy to use, understand, and transport, enabling users to train anywhere and anytime according to their needs. Gutiérrez et al. [21] developed a multimodal virtual training system to transfer motor and cognitive skills in industrial maintenance and assembly tasks, providing multimodal aids and learning strategies that adapt to users’ needs during training.
Li et al. [22] discussed an interactive virtual training system for assembly and disassembly based on precedence constraints, offering both immersive VR and conventional desktop interfaces to balance training quality and cost. Westerfield et al. [23] highlighted intelligent augmented reality training’s ability to improve task performance and knowledge retention in industrial operations. Pan et al. [24] designed a training system for robotic component assembly based on VR technology, aimed at enabling students to quickly understand the assembly methods of robot components and mobilize their enthusiasm for practice and innovation. Seth et al. [25] provided a comprehensive review of VR for assembly methods prototyping, identifying key requirements for effective virtual assembly systems including physics-based modeling and haptic feedback.
Recent advancements emphasize the scalability of VR/AR systems for multi-user scenarios and the integration of AI-based adaptive learning models [5]. For example, Carlson et al. [12] discussed synthetic data generation within AR/VR environments to improve training scenarios. Devagiri et al. [26] provided a comprehensive review of AR and AI integration in industrial applications, emphasizing their role in enhancing workforce training and operational efficiency. However, challenges remain in addressing ergonomic considerations [11], ensuring seamless integration with robotic systems, and overcoming physiological effects such as dizziness during extended VR training sessions [27].

2.2. Human Motion Prediction in HRC

Human motion prediction plays a critical role in enhancing task efficiency and ensuring operator safety in HRC settings. Lee et al. [28] proposed a disassembly task planning algorithm that dynamically assigns tasks between human operators and robots by leveraging human behavior prediction models. Liu et al. [13] introduced a task-constrained motion planning approach incorporating uncertainty-informed human motion prediction to improve adaptability and safety. Vongbunyong et al. [29] presented a process demonstration platform that enables skillful human operators to demonstrate disassembly processes, with the knowledge at planning and operational levels being transferred to robots to achieve automated disassembly.
Jacob et al. [14] conducted a rapid review of human–robot cooperation in disassembly, identifying key challenges including task allocation, safety management, and adaptive collaboration. Their work emphasized that effective HRC systems must balance human cognitive capabilities with robotic precision, particularly for complex disassembly sequences. Hjorth and Chrysostomou [30] examined the landscape of human–robot collaborative disassembly (HRCD) and reviewed progress in the field, analyzing principles and elements of human–robot collaboration in industrial environments, such as safety standards and collaborative operation modes.
Ottogalli et al. [31] used virtual reality to simulate human–robot coexistence for an aircraft final assembly line, evaluating ergonomics and process efficiency to identify issues beforehand and prevent unexpected costs.
Recent advancements include frameworks incorporating Gaussian Process Regression (GPR) models [32], which integrate human joint constraints with scene constraints for accurate motion trajectory prediction. Additionally, Recurrent Neural Networks (RNNs), particularly Long Short-Term Memory (LSTM) models, have been extensively utilized for predicting dynamic human motion [13]. While LSTMs excel in capturing temporal dependencies, alternative approaches like Transformer-based models or Kalman filters may offer an improved performance in certain scenarios. Liu et al. [33] developed a dynamic model-informed human motion prediction approach based on an Unscented Kalman Filter, demonstrating improved accuracy in forecasting human movements during collaborative tasks.
Daneshmand et al. [34] surveyed robotic assembly and disassembly in the context of Industry 4.0 and the circular economy, highlighting the importance of integrating human factors into collaborative robotic systems. Their work emphasized that effective HRC requires not only technical solutions for motion prediction but also consideration of ergonomic factors, cognitive load, and user experience aspects that our VR training system specifically addresses through immersive interaction and real-time feedback mechanisms.

2.3. Inverse and Forward Kinematics in Robotic Training

Inverse kinematics (IK) is essential for accurately controlling robotic arms during industrial tasks such as assembly or disassembly. Kuts et al. [35] demonstrated that IK ensures high fidelity in replicating physical movements within VR environments, enabling the precise manipulation of virtual robotic models. Similarly, Xu et al. [36] explored adaptive IK algorithms that dynamically adjust joint angles based on task constraints, improving the precision of robotic operations during collaborative tasks.
Forward kinematics (FK), on the other hand, calculates the position of robotic end-effectors based on given joint parameters. Xu et al. [36] highlighted FK’s role in simulating robotic movements within VR environments for operator training. Their robotic kinematics teaching system combined VR simulation and remote control to enhance robotics education while mitigating safety risks, providing hands-on experience without physical danger and creating an effective bridge between theoretical knowledge and practical implementation.
By combining FK with IK algorithms, researchers have developed hybrid models capable of optimizing both trajectory planning and collision avoidance during complex industrial workflows [35].

2.4. Digital Twins and Industry 4.0 Integration

Digital twins have gained prominence as a key enabler of smart manufacturing under Industry 4.0 principles [5]. Rocca et al. [17] demonstrated the integration of digital twins with VR to optimize disassembly workflows, enabling real-time monitoring and task execution. Pérez et al. [37] developed a digital-twin-based methodology for multi-robot manufacturing cell commissioning, highlighting its potential for improving system efficiency. Sassanelli et al. [38] conducted a systematic literature review on simulation tools supporting disassembly processes with a focus on printed circuit boards, identifying key roles of simulation in sequence planning, process optimization, and training.
Zhao et al. [39] demonstrated the integration of augmented reality and digital twin technologies in a teleoperated disassembly system, enabling improved operator safety and task efficiency through remote visualization and control. Their work particularly emphasized the benefits of digital twins for hazardous disassembly operations, such as EV battery recycling, an application domain directly relevant to our VR training system.
Zhao and Wang [40] developed a machine tool disassembly and assembly training system based on virtual simulation that establishes bidirectional communication between digital models and physical equipment. Their approach enables learners to observe the dynamic responses of machine tool models through the virtual environment, introducing digital twin technology into assembly/disassembly training.
Pérez et al. [41] used VR interfaces for industrial robot control and operator training, increasing usability and reducing complexity. Their approach demonstrated how VR interfaces can simplify complex robotic operations, making them accessible to operators with varying levels of technical expertise. Zhao et al. [42] demonstrate a robot teleoperation system with integrated digital twins and augmented reality for improved safety. Their system allows operators to enter the control loop when anomalies are detected, enabling them to receive real-time feedback while remotely controlling the robot, thereby improving the safety and efficiency of disassembly operations.
However, challenges remain in integrating digital twins with real-time VR/AR systems due to computational complexities and latency issues [43]. Our work addresses these challenges through optimized data pipelines and efficient model deployment strategies, ensuring responsive interaction while maintaining digital twin fidelity.

2.5. Comparison with Existing Approaches

While previous research has made significant contributions to VR/AR training and human–robot collaboration, VR Co-Lab distinguishes itself in several key aspects:
Integration of full-body tracking with robotic control: Unlike systems by Webel et al. [2] and Schwarz et al. [8] that explored AR/VR for technical training without comprehensive body tracking, our system leverages Quest Pro’s Inside-Out Body-Tracking technology to capture upper body movements with high precision. This tracking framework enables more natural interactions and provides richer data for motion prediction models.
Bidirectional Unity–ROS communication: While existing platforms often implement unidirectional communication between VR environments and robotic systems, VR Co-Lab features a bidirectional Unity–ROS bridge that enables seamless data exchange. This allows for the real-time adaptation of both the virtual environment and robotic responses based on user interactions, creating a more responsive and adaptive training experience.
LSTM-based motion prediction with uncertainty quantification: Our implementation extends Liu et al.’s [13] work on motion prediction by integrating Monte Carlo dropout for uncertainty estimation directly within an immersive VR environment. This approach not only predicts human movements but also quantifies confidence in these predictions, enabling safer and more adaptive human–robot collaboration during disassembly tasks.
Synthetic data generation for disassembly tasks: Unlike general-purpose VR training systems, VR Co-Lab specifically addresses the challenges of human–robot collaborative disassembly identified by Jacob et al. [14]. Our synthetic data generation pipeline captures complex interaction patterns during disassembly procedures, creating valuable training datasets that enhance both human learning and robotic adaptation.
On-device neural network inference: By implementing the Barracuda engine for neural network inference directly in Unity, our system eliminates the latency associated with external inference servers. This approach ensures responsive real-time interactions essential for effective training and collaboration in VR environments.
These distinctive features collectively create a comprehensive platform that addresses limitations in existing systems by combining immersive VR training, real-time motion prediction with uncertainty estimation, and synthetic data generation specifically tailored for human–robot collaborative disassembly tasks.
The literature strongly supports the integration of VR and AR technologies in industrial training systems. Our research demonstrates that effective VR-based training depends on four key factors: robust observational learning capabilities, balanced cognitive and physical fidelity, immediate feedback mechanisms, and strategic synthetic data utilization. The immersive nature of collaborative VR environments significantly enhances both team-based training outcomes and individual task performance metrics. These insights have directly informed the development of our VR Co-Lab platform, ensuring it delivers the comprehensive capabilities required for advanced industrial training applications in human–robot collaborative disassembly tasks.

3. VR Training System Implementation

The VR training system integrates selected technologies to enhance immersion and interactivity while ensuring a seamless connection between virtual and physical robotics.

3.1. Technology Stack

The VR training system is built upon a carefully selected technology stack that ensures robust performance and scalability. Unity serves as the core of our virtual environment, providing advanced 3D modeling, real-time graphics rendering, and physics simulation capabilities essential for creating detailed and interactive simulations of industrial tasks. This foundation enables a realistic and engaging training experience that closely mimics real-world conditions.
The Unity Perception Toolkit enhances our approach to capturing and utilizing real-time interaction data, supporting immediate feedback and iterative learning enhancements within the training environment. Communication between Unity and the ROS is facilitated by the Unity ROS TCP Connector, ensuring that user interactions within the virtual environment are accurately mirrored by the physical robot’s actions, maintaining a coherent experience across both platforms.
For accurate simulation, the Unity URDF Importer recreates realistic robot models within Unity by importing URDF files. This capability is vital for simulating complex mechanical interactions and ensuring that virtual manipulations directly affect the physical robot’s behavior. The Robot Operating System (ROS) serves as the backbone for managing robotic communications and control, integrating various hardware and software components necessary for practical training simulations.
The Quest Pro headset was selected for its superior tracking capabilities, enhancing user interaction by providing accurate tracking of body and hand movements, crucial for tasks requiring fine motor control and precise manipulations. To facilitate the easy deployment and management of the VR and ROS environments, Docker is employed with a custom image created using the ROS Melodic base, including additional required ROS packages and submodules to ensure a consistent and reproducible development environment.

3.2. VR Training System Architecture

The architecture of the VR training system is designed to be modular, facilitating easy updates and scalability.

3.2.1. VR Environment

VR environment (Unity scene): Developed in Unity, this component replicates a disassembly workstation populated with virtual replicas of realistic tools, components, and machinery. This setup engages the user and serves educational purposes by providing a safe, controlled environment for complex training scenarios. Figure 1 shows the real-world setup, and Figure 2 shows the corresponding virtual environment.
Interactivity: Users manipulate tools such as Phillips and flathead screwdrivers of various sizes and power tools for hard drive disassembly, alongside component parts such as screws of different sizes, circuit boards, disk platters, actuator arms, and various electronic components, using their hands or controllers. For example, a virtual screwdriver in the environment behaves according to the physics rules defined in Unity, providing tactile feedback if the VR controllers are used.
The system facilitates specific disassembly procedures, including (1) removing the external casing screws using appropriate screwdrivers, (2) detaching and safely removing the disk platter, (3) disconnecting and removing the read/write head and actuator arm, (4) extracting the circuit board and electronic components, and (5) sorting components for recycling or refurbishment. Each procedure requires specific tool selection and proper technique, with the system providing feedback on proper tool usage and procedural accuracy. The system also allows the customization of the setup, type of robot, and different environments to accommodate various training scenarios.

3.2.2. Robotic Control System

Managed by the ROS, this component includes multiple ROS nodes that communicate with each other and with the Unity environment through serialized ROS messages. The nodes handle various tasks such as sensor data processing, control commands, and state monitoring. Communication between the Unity scene and the ROS network is facilitated via a server endpoint that manages the publish/subscribe model of ROS communication.
The Unity–ROS integration leverages the ROS# framework [44], enabling seamless communication between the virtual environment and physical robotic systems. This integration allows for importing various robots as long as there is an associated controller file and a URDF file. The current environment uses the Niryo One robot that works alongside the user in disassembling the hard disk.
For neural network inference within Unity, we utilize the Barracuda engine [45], which provides the efficient execution of ONNX models directly in the game engine. This approach eliminates the need for external inference servers, reducing latency and ensuring responsive interaction in the VR environment.
The architecture diagram in Figure 3 illustrates the communication between the Unity scene and the ROS network. The Unity scene contains ROS service, subscriber, and publisher scripts that interact with ROS nodes over a network. Serialized ROS messages facilitate this interaction, ensuring data consistency and real-time communication.
ROS service script: This script in Unity sends service requests to ROS nodes and processes the responses. It is used for tasks that require a request–response communication pattern, such as querying the state of a robot or commanding it to perform a specific action.
ROS subscriber script: This script subscribes to specific ROS topics and processes incoming data. It allows the Unity environment to receive real-time updates from the ROS nodes.
ROS publisher script: This script publishes data to ROS topics, enabling Unity to send commands and updates to ROS nodes. It is used for controlling the robot and sending state information from the VR environment to the ROS network.
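To make the publish/subscribe pattern concrete, the following Python sketch shows what a ROS-side node complementing these Unity scripts might look like. The topic names, message types, and 30 Hz update rate are illustrative assumptions, not the platform's actual interface.

```python
import rospy
from sensor_msgs.msg import JointState
from std_msgs.msg import Float64MultiArray

def on_unity_command(msg):
    # Joint-angle targets published by the Unity ROS publisher script (hypothetical topic)
    rospy.loginfo("Received %d joint targets from Unity", len(msg.data))

if __name__ == "__main__":
    rospy.init_node("vr_colab_bridge_node")
    # Subscribe to commands coming from the Unity scene
    rospy.Subscriber("/unity/joint_commands", Float64MultiArray, on_unity_command)
    # Publish the robot state back to the Unity ROS subscriber script
    state_pub = rospy.Publisher("/niryo/joint_states", JointState, queue_size=10)
    rate = rospy.Rate(30)  # assumed 30 Hz update loop
    while not rospy.is_shutdown():
        state = JointState()
        state.header.stamp = rospy.Time.now()
        state.name = ["joint_%d" % i for i in range(1, 7)]  # Niryo One has six joints
        state.position = [0.0] * 6  # placeholder; a real node reads the controller state
        state_pub.publish(state)
        rate.sleep()
```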
AI inference engine (Barracuda): The Barracuda inference engine is a lightweight, cross-platform solution for running deep learning models in Unity. It supports the Open Neural Network Exchange (ONNX) format, allowing pre-trained models (such as LSTMs and CNNs) to be deployed within Unity environments. In this system, Barracuda facilitates real-time motion prediction to improve human–robot interaction.
  • Loads a pre-trained ONNX LSTM model optimized for human motion forecasting.
  • Processes real-time user movement data to anticipate the next actions.
  • Feeds the predicted motion trajectory into the ROS network for adaptive robot response.
  • Enables low-latency inference directly in Unity’s runtime, reducing dependence on external servers.
  • Supports GPU acceleration when available, improving the efficiency of AI inference in VR.
Integration with ROS: The AI-based motion prediction refines robot control by ensuring that the motion planning and task execution nodes receive real-time human movement predictions. This allows for the following:
  • Adaptive task scheduling, reducing wait times in collaborative workflows.
  • Enhanced safety mechanisms, as the robot can proactively adjust to user movements.
  • Increased task efficiency, with optimized movement paths informed by AI predictions.
The ROS network comprises multiple nodes, each responsible for specific tasks such as sensor data processing, control commands, and state monitoring. The server endpoint manages the publish/subscribe communication model, ensuring messages are delivered reliably and efficiently between nodes and the Unity scene.
Motion planning node: utilizes LSTM-based motion prediction data to plan optimal robot actions.
Data processing node: handles real-time sensor data and refines predictions based on Unity inputs.
Task execution node: executes disassembly actions dynamically based on predicted and real-time user interactions.
This enhancement enables a more intelligent and adaptive robotic training environment, leveraging AI-powered predictive control to improve real-time HRC interactions. The combination of Barracuda, ONNX, and LSTMs allows Unity to act as a robust simulation environment for industrial training.

3.2.3. Data Collection and Feedback Mechanisms

This module integrates advanced data analytics to track performance metrics such as task completion times, error rates, and user engagement. These data are essential for refining training protocols, personalizing the learning experience, and identifying areas for improvement.
Specifically, the task completion time $TCT$ is computed as follows:
$$TCT = \frac{\sum \text{(Time Taken Per Task)}}{\text{Number of Tasks}}.$$
This equation calculates the average time users spend to complete each task. A lower average indicates increased proficiency and comfort with the task and the VR environment.
The error rate E is computed as
$$E = \frac{\text{Number of Errors}}{\text{Total Number of Tasks}} \times 100.$$
User engagement is quantified through the parameters $U(D)$ and $UEI$, where $U(D)$ quantifies the utility of synthetic data and $UEI$ is a user engagement index. They are computed as follows:
$$U(D) = 1 - e^{-\lambda D},$$
where $D$ is the amount of data and $\lambda$ is a parameter indicating the efficiency of data conversion into learning improvement. This reflects the value added by synthetic data in training machine learning models within the VR system:
$$UEI = \alpha \times \text{Active Time} + \beta \times \text{Interaction Frequency}.$$
This user engagement index $UEI$ combines multiple factors: active time in the system, frequency of interactions, and quality of user feedback. Coefficients $\alpha$ and $\beta$ weigh the importance of each factor, offering a holistic measure of user engagement.
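The following Python sketch illustrates how these metrics could be computed from logged session data. The weighting coefficients and the example values are illustrative assumptions, not parameters used by the platform.

```python
import math

def task_completion_time(times_per_task):
    """Average time per task (TCT)."""
    return sum(times_per_task) / len(times_per_task)

def error_rate(num_errors, total_tasks):
    """Percentage of erroneous interactions (E)."""
    return num_errors / total_tasks * 100.0

def synthetic_data_utility(d, lam):
    """U(D) = 1 - exp(-lambda * D): diminishing returns of additional data."""
    return 1.0 - math.exp(-lam * d)

def user_engagement_index(active_time, interaction_freq, alpha=0.6, beta=0.4):
    """UEI = alpha * active time + beta * interaction frequency (weights are illustrative)."""
    return alpha * active_time + beta * interaction_freq

# Example: 12 tasks averaging ~21 s each, 2 errors, 500 logged samples
print(task_completion_time([20.5] * 12))   # ~20.5 s per task
print(error_rate(2, 12))                   # ~16.7 %
print(synthetic_data_utility(500, 0.01))   # ~0.993
print(user_engagement_index(35.0, 4.2))    # weighted engagement score
```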

3.2.4. Task Flow and User Actions

User interaction within the VR environment is carefully designed to ensure that it is intuitive, engaging, and educational. The task flow in the VR training system is designed to guide users through a series of logically sequenced steps, from task initiation to completion, ensuring a comprehensive learning experience.
The flowchart in Figure 4 illustrates the sequence of user interactions within the VR system, incorporating both user actions and real-time robot motion predictions using the LSTM model. The key steps are as follows:
  • Initialize tasks: the VR environment is set up with all necessary tools, components, and the initial system configuration.
  • Activate step: the system prompts the user to begin the task, initiating the first step in the disassembly process.
  • Is part removable?: The system checks whether the targeted part can be safely removed. If not, the part is reset, and the process loops back to prompt the user again.
  • Display instructions: the VR system provides detailed instructions specific to the current step in the task, guiding the user through the disassembly process.
  • Predict user/robot actions (LSTM): the LSTM model predicts both the user’s and the robot’s next actions based on the current task state and previous interactions.
  • Human or robot task?: the system determines whether the predicted action should be executed by the user (human task) or by the robot (robot task).
  • Apply shader and execute actions:
    • Human task: a purple shader highlights the part that the user needs to interact with, and the system waits for the user’s action.
    • Robot task: a green shader is applied to the part to be manipulated by the robot, followed by the computation and execution of the robot’s trajectory.
  • Validate action execution: the system validates whether the action performed by the user or robot was correct and executed as planned.
  • Update model (feedback to LSTM): the results of the action validation are fed back into the LSTM model to update its predictions and improve future interactions.
  • Is action valid?: The system checks if the action taken was valid. If not, it resets the part and loops back to display instructions.
  • More tasks?: The system checks if there are additional tasks to be performed. If yes, it returns to the “Activate Step” phase; if not, it proceeds to calculate the final metrics.
  • Metrics calculation: at the end of the session, the system computes and displays performance metrics, providing immediate feedback to the user on the effectiveness and efficiency of their interactions.
While real-time updates are displayed via the dashboard, periodic validation metrics ensure consistency between observed and computed performance values. User interactions within the VR environment are designed to be seamless and intuitive, utilizing hand gestures and tool manipulations. Immediate feedback is delivered through visual and auditory signals, aiding users in correcting actions and learning in real time.
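As a minimal illustration of this control flow, the Python sketch below mirrors the main loop of Figure 4, with stub functions standing in for the Unity scene, the LSTM predictor, and the robot controller; all names, parts, and probabilities are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Task:
    part: str
    removable: bool = True

# Stub helpers standing in for the Unity scene, LSTM predictor, and robot controller (hypothetical).
def display_instructions(task): print(f"Instructions for {task.part}")
def predict_next_actor(task): return random.choice(["human", "robot"])   # LSTM stand-in
def apply_shader(part, color): print(f"{color} shader applied to {part}")
def execute_and_validate(task, actor): return random.random() > 0.1      # ~90% of actions valid

def run_session(tasks):
    metrics = {"completed": 0, "errors": 0}
    for task in tasks:                                   # "More tasks?" loop
        while True:
            if not task.removable:                       # "Is part removable?"
                task.removable = True                    # reset the part, then re-prompt
                continue
            display_instructions(task)
            actor = predict_next_actor(task)             # "Predict user/robot actions (LSTM)"
            apply_shader(task.part, "purple" if actor == "human" else "green")
            if execute_and_validate(task, actor):        # "Validate action execution"
                metrics["completed"] += 1
                break
            metrics["errors"] += 1                       # invalid action: reset and retry
    return metrics                                       # "Metrics calculation"

print(run_session([Task("casing screws"), Task("disk platter"), Task("circuit board")]))
```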
The present iteration features a specific training scenario centered on hard drive disassembly that replicates a real-world setup. This focused approach allows us to thoroughly evaluate both the VR training methodology and the LSTM-based motion prediction model within a controlled context. Future implementations will expand to include multiple scenarios with different robotic systems and diverse component types, enabling broader application of the training framework across various industrial contexts.

3.3. Body Tracking and Interaction Through Meta SDKs

Integrating Quest Pro’s advanced body and hand-tracking capabilities into the VR training environment allows for replicating real-world movements within the virtual space. The Meta Quest Pro utilizes Inside-Out Body-Tracking (IOBT) technology [46], which leverages the headset’s cameras to track upper body movements including wrists, elbows, shoulders, and torso. While this provides accurate tracking for the upper body, the system has inherent limitations for lower body tracking. To address this limitation, Meta has implemented “Generative Legs”—an AI-based solution that creates realistic leg movements by extrapolating from upper body data [47].
Hand tracking on the Quest Pro offers support for multiple gestures including point and pinch, pinch and drag, palm pinch, and direct touch interactions [48]. While this enables a controller-free interaction, the tracking system faces challenges with occlusion when hands overlap or move outside the cameras’ field of view, potentially affecting precision during complex manipulation tasks [47].
In the current project phase focused on robotic path planning, eight key upper body tracking points shown in Figure 5 are used to monitor and translate user movements effectively. This integrated tracking framework captures complex user movements and feeds this data back into the system to enhance interaction models and improve the training algorithms continuously. As a result, users can interact with various virtual tools, each behaving according to predefined physics properties. This interaction is crucial for task completion and adds an educational layer by allowing users to experience the physical feel of different tools.
The Meta Movement SDK provides developers with tools to implement these tracking capabilities, supporting both controller-based and hand-tracking interactions [47]. For VR training applications, this flexibility allows users to choose the most appropriate interaction method based on specific task requirements and personal preferences.

3.4. Performance Monitoring and Synthetic Data Generation

The system monitors performance and generates synthetic data to train machine learning models. These models predict user performance and suggest training adjustments.

3.4.1. Performance Metrics

During the training session, various performance indicators were tracked to assess training effectiveness. Task completion time (TCT) measures the average time taken to complete a full disassembly task, calculated as the sum of time taken per task divided by the number of tasks. Lower TCT values indicate improved task efficiency as users become more proficient with both the task and the VR environment.
Error rate (E) quantifies the percentage of incorrect interactions during the training process, computed using the following formula:
$$E = \frac{\text{Number of Errors}}{\text{Total Number of Tasks}} \times 100.$$
Motion path efficiency analyzes the similarity between predicted and actual movement paths through trajectory deviation analysis. This metric shows improvements across training sessions, illustrating increasing optimization in human–robot movement synchronization.
Preliminary system validation conducted solely by the author indicates a task completion time of 4.2 min during self-testing, with an error rate of 10.8% observed during implementation testing. A motion path efficiency analysis showed that robot trajectories aligned with the author’s intended interactions in 82% of cases, suggesting that the proposed system architecture has the potential for modeling human–robot collaboration. These technical validation metrics serve as proof of concept for the implementation rather than formal user evaluation, as this paper focuses on system design and architecture rather than human subjects research.

Synthetic Data for Machine Learning

The VR system continuously monitors performance and generates synthetic data that serve as the foundation for training machine learning models. This data collection process captures the comprehensive movement data of both the user and the robot within the training environment, creating valuable synthetic datasets that replicate complex interaction scenarios. These datasets are then used to train machine learning models that can predict user performance and suggest training adjustments.
The synthetic data enable the development of advanced algorithms capable of predicting user performance and suggesting personalized training modifications. This approach ensures that the training adapts to individual learning styles and speeds, maximizing learning efficiency and personalizing the user experience. Behdad et al. [49] explored leveraging VR experiences with mixed-integer nonlinear programming for the visualization of disassembly sequence planning under uncertainty, demonstrating how virtual environments can enhance decision making in complex disassembly tasks.
Integrating these detailed tracking capabilities and sophisticated data analysis tools ensures that the VR training system is not only immersive and interactive but is also grounded in robust empirical research, enabling continuous improvement and the personalization of training processes. Lerner et al. [9] demonstrated how VR training can improve performance in surgical environments, showing that skills acquired in virtual training environments can effectively transfer to real-world applications.

3.5. Mathematical Foundations and Implementation of Forward Kinematics

Forward kinematics (FK) involves calculating the position and orientation of a robotic manipulator’s end-effector based on its joint parameters. The FK process utilizes homogeneous transformation matrices to compute the cumulative transformations from the robot’s base to its end-effector. Each joint in the manipulator is described using Denavit–Hartenberg (DH) parameters [50], which provide a standardized method to represent the kinematic parameters of robotic manipulators. The parameters include the following:
  • $a_i$: link length, or the distance between two joint axes along the x-axis.
  • $\alpha_i$: link twist, or the angle between two joint axes around the x-axis.
  • $d_i$: link offset, or the displacement along the z-axis.
  • $\theta_i$: joint angle, or the rotation about the z-axis.
The transformation matrix for the i-th joint is given as
$$T_i = \begin{bmatrix} \cos\theta_i & -\sin\theta_i\cos\alpha_i & \sin\theta_i\sin\alpha_i & a_i\cos\theta_i \\ \sin\theta_i & \cos\theta_i\cos\alpha_i & -\cos\theta_i\sin\alpha_i & a_i\sin\theta_i \\ 0 & \sin\alpha_i & \cos\alpha_i & d_i \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
The overall transformation matrix T from the base to the end-effector is obtained by multiplying the individual transformations for all n joints:
$$T = T_1 \cdot T_2 \cdots T_n.$$
The transformation matrix $T$ depends primarily on the joint angles $\theta_i$ rather than time. However, in dynamic scenarios where joint angles change over time, $T$ becomes implicitly time-dependent as $\theta_i(t)$ varies. This temporal dependency is particularly relevant for motion prediction, where the LSTM model forecasts future joint configurations based on previous states, effectively capturing the time evolution of $T$ through sequential joint angle predictions.
This transformation encapsulates the position and orientation of the end-effector in the workspace.

3.5.1. Implementation in Unity

In the Unity environment, forward kinematics (FK) for robotic arms is implemented using the ArticulationBody components, which facilitate the simulation of complex jointed structures essential for modeling robotic arms in virtual environments [51]. The FK process involves several key steps:
  • DH parameter extraction: the joint parameters, such as angles or displacements, are extracted dynamically from Unity’s articulation bodies.
  • Transformation matrix calculation: each joint’s transformation matrix is computed using the Denavit–Hartenberg (DH) parameters.
  • Composite transformation: the transformation matrices are multiplied sequentially to compute the end-effector’s position and orientation relative to the robot’s base frame.
  • Simulation output: the computed position is used for visualization and task execution in Unity’s 3D environment.
The function ‘FK’ in the Unity script iteratively calculates the transformation matrices and combines them to yield the final transformation T, which defines the end-effector’s pose.
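A minimal NumPy sketch of this FK computation is shown below. It illustrates the standard DH formulation rather than the actual Unity ‘FK’ script, and the two-joint parameter values are made up for the example.

```python
import numpy as np

def dh_transform(a, alpha, d, theta):
    """Homogeneous transform for one joint from its DH parameters (standard convention)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(dh_params):
    """Multiply the per-joint transforms T_1 ... T_n into the base-to-end-effector pose."""
    T = np.eye(4)
    for a, alpha, d, theta in dh_params:
        T = T @ dh_transform(a, alpha, d, theta)
    return T

# Illustrative two-joint arm (link lengths and angles are invented for the example)
pose = forward_kinematics([
    (0.21, 0.0, 0.08, np.deg2rad(30)),   # joint 1: a, alpha, d, theta
    (0.18, 0.0, 0.00, np.deg2rad(-45)),  # joint 2
])
print("End-effector position:", pose[:3, 3])
```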

3.5.2. Mathematical Workflow

The workflow for forward kinematics in the Unity script corresponds to the following:
  • Input: joint parameters $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$.
  • Transformation matrix calculation: for each joint, the transformation matrix $T_i$ is computed using the DH parameter set $[a_i, \alpha_i, \theta_i, d_i]$.
  • Cumulative transformation: the transformation matrices are multiplied sequentially to form the composite matrix $T$, representing the end-effector’s pose.
  • Output: the position $(x, y, z)$ and orientation (rotation matrix) of the end-effector in the global frame.
Example: Calculating the End-Effector Pose
For a robotic arm with two joints, the transformation matrices T 1 and T 2 are given by
$$T_1 = \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 & 0 & a_1\cos\theta_1 \\ \sin\theta_1 & \cos\theta_1 & 0 & a_1\sin\theta_1 \\ 0 & 0 & 1 & d_1 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad T_2 = \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 & 0 & a_2\cos\theta_2 \\ \sin\theta_2 & \cos\theta_2 & 0 & a_2\sin\theta_2 \\ 0 & 0 & 1 & d_2 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
The final transformation matrix T is computed as
$$T = T_1 \cdot T_2.$$
This computation provides the final pose of the end-effector.

3.5.3. Applications in Robotic Training

Forward kinematics enables an accurate simulation of robotic arm movements, crucial in tasks like disassembly training. By incorporating this methodology, Unity-based simulations can achieve high fidelity, aiding real-time task execution and performance evaluation. Yang et al. [52] demonstrated the importance of validating virtual haptic disassembly platforms against real-world performance metrics, particularly when considering the disassembly process and physical attributes of components. Tahriri et al. [53] proposed methods for optimizing robot arm movement time using virtual reality robotic teaching systems, showing increased production rates and decreased cycle times.

3.6. Integrated Pipeline for Predictive Analytics

The integration of the LSTM model into the Unity VR training environment creates a robust system for predictive analytics and real-time robotic control. By leveraging Unity’s simulation capabilities [51], Docker containers, the ROS [54], synthetic data generation, and ONNX-based model deployment [55], this pipeline enhances human–robot collaboration for tasks such as disassembly training. Figure 6 illustrates the LSTM architecture used in our implementation.
LSTM was chosen due to its ability to effectively model long-term dependencies in sequential data while maintaining a relatively low computational cost suitable for real-time inference. While architectures such as Transformers or Gaussian Process models could offer alternative solutions, they may introduce latency overheads that impact VR responsiveness. Future work will explore comparative evaluations with alternative architectures.

3.6.1. Synthetic Data Generation and Augmentation

Synthetic data are generated in Unity by simulating user and robotic arm movements, following methodologies similar to those in virtual human modeling for disassembly operations [56]. Unity’s ArticulationBody components capture joint angles, velocities, and end-effector positions, providing a rich dataset for training machine learning models. Simulated user actions are captured via Meta’s Movement SDK, while the ROS# bridge facilitates seamless data transfer between Unity and the ROS environment for preprocessing, storage, and further analysis.
While synthetic data generation enables controlled training scenarios, ensuring its applicability to real-world interactions remains challenging. Future work will include validation with real-world datasets to assess generalization performance and domain adaptation strategies. A key aspect of this validation will be testing the LSTM model’s ability to adapt synthetic training data to real-world robotic disassembly tasks, ensuring that movement patterns and trajectory predictions remain applicable outside controlled simulations.

3.6.2. Data Cleaning and Preprocessing

Before training, the collected motion data undergo a cleaning step to remove inconsistencies such as abrupt jumps or unrealistic movements caused by physics simulation artifacts in Unity. This ensures that the dataset accurately represents real-world motion patterns.

3.6.3. Data Augmentation

To enhance model generalization and robustness, data augmentation is applied. The augmentation process involves the following (a minimal sketch follows the list):
  • Motion perturbations: Gaussian noise injection (standard deviation 0.005) combined with a small random scaling factor (0.05) is applied to simulate real-world sensor noise and variations in movement execution.
  • Temporal warping: slight shifts in the motion sequence timing help simulate variations in human and robotic reaction times.
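The NumPy sketch below illustrates these two augmentation steps. The noise and scaling magnitudes follow the values above, while the temporal-shift bound and array shapes are illustrative assumptions.

```python
import numpy as np

def augment_sequence(seq, noise_std=0.005, scale_jitter=0.05, max_shift=3, rng=None):
    """Apply the perturbations described above to one motion sequence.

    seq: array of shape (timesteps, features), e.g. joint angles over time.
    noise_std and scale_jitter follow the text; max_shift is an assumed bound
    for the temporal warping step.
    """
    rng = rng or np.random.default_rng()
    # Motion perturbation: Gaussian noise plus a small random per-sequence scaling
    noisy = seq + rng.normal(0.0, noise_std, seq.shape)
    noisy *= 1.0 + rng.uniform(-scale_jitter, scale_jitter)
    # Temporal warping: shift the sequence by a few timesteps to vary reaction times
    shift = rng.integers(-max_shift, max_shift + 1)
    return np.roll(noisy, shift, axis=0)

original = np.random.rand(50, 8)        # 50 timesteps, 8 tracked joints (illustrative)
augmented = augment_sequence(original)
```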

3.6.4. Augmented Data for Training

The cleaned and augmented dataset is then used to train the LSTM model. The augmentation ensures that the model learns from diverse movement patterns, improving its ability to generalize to unseen user behaviors in the virtual training environment.
The dataset effectively supports the predictive model by integrating synthetic data generation, augmentation, and preprocessing while ensuring realism and variability in human–robot interactions.

3.6.5. LSTM Model Architecture and Training

The LSTM model is trained in PyTorch (version 2.1.0) to predict user movement trajectories and robotic actions [57]. This training approach is inspired by uncertainty-informed human motion prediction in collaborative environments [13]. The training pipeline utilizes sequential data, including the following:
  • User movement data captured from the Meta Movement SDK.
  • Robotic motion data captured through Unity’s ArticulationBody components [51].
The model consists of three stacked LSTM layers with 256 hidden units per layer. Each LSTM cell captures sequential dependencies in motion data over time, making it well-suited for human–robot interaction modeling.
The LSTM updates its hidden and cell states at each timestep:
$$(h_t, c_t) = \mathrm{LSTM}(x_t, h_{t-1}, c_{t-1}),$$
where
  • $x_t$ is the input motion data at time $t$.
  • $h_t$ is the updated hidden state, encoding learned temporal dependencies.
  • $c_t$ is the updated cell state, retaining long-term memory across sequences.
This structure enables the model to predict motion trajectories by leveraging past observations while mitigating issues like vanishing gradients affecting traditional RNNs.
LSTMs were selected over standard RNNs due to their superior ability to retain long-term dependencies and their relatively lower computational overhead compared to attention-based architectures such as Transformers. Future work may explore alternative architectures for improved accuracy and efficiency.
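For illustration, the following PyTorch sketch defines a model with this structure (three stacked LSTM layers, 256 hidden units, dropout 0.1). The input/output dimensions and the linear output head are assumptions, since the exact feature layout is not specified here.

```python
import torch
import torch.nn as nn

class MotionLSTM(nn.Module):
    """Three stacked LSTM layers with 256 hidden units and dropout 0.1 (as described above)."""
    def __init__(self, input_dim=24, hidden_dim=256, output_dim=24, num_layers=3, p=0.1):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True, dropout=p)
        self.dropout = nn.Dropout(p)   # kept active at inference for MC dropout
        self.head = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # x: (batch, sequence_length, input_dim); h_t and c_t are handled internally by nn.LSTM
        out, _ = self.lstm(x)
        return self.head(self.dropout(out[:, -1, :]))  # predict the next pose from the last state

model = MotionLSTM()
pred = model(torch.randn(4, 50, 24))   # batch of 4 sequences, 50 timesteps each
```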

Monte Carlo Dropout for Uncertainty Estimation

To enhance robustness, the model applies Monte Carlo dropout (MC dropout) with a rate of 0.1 during inference. Keeping dropout active enables multiple stochastic forward passes, producing a distribution of possible predictions. This technique helps quantify uncertainty in motion prediction, ensuring the model does not overfit specific patterns and can adapt to varying user behaviors. Given the dynamic nature of disassembly tasks, incorporating uncertainty estimation allows for more reliable and adaptable robotic trajectory adjustments.
Formally, for a given input sequence X, the final prediction is computed as
$$\hat{x}_t = \frac{1}{M} \sum_{m=1}^{M} f_{\theta_m}(X),$$
where $f_{\theta_m}(X)$ represents a stochastic forward pass and $M$ is the number of MC samples. The standard deviation of these predictions represents the uncertainty estimate.
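A minimal PyTorch sketch of this MC dropout procedure is given below, reusing the MotionLSTM sketch from above; the number of stochastic passes $M$ is an assumed value.

```python
import torch

def mc_dropout_predict(model, x, num_samples=30):
    """Monte Carlo dropout: keep dropout active and average M stochastic forward passes.
    Returns the mean prediction and its standard deviation as an uncertainty estimate."""
    model.train()                     # keeps dropout layers stochastic at inference time
    with torch.no_grad():             # gradients are not needed for inference
        samples = torch.stack([model(x) for _ in range(num_samples)])
    return samples.mean(dim=0), samples.std(dim=0)

# Usage with the MotionLSTM sketch above (M = 30 is an assumed value)
# mean_traj, uncertainty = mc_dropout_predict(model, torch.randn(1, 50, 24))
```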

Loss Function and Training

The model is trained to minimize a loss function L that evaluates the difference between the predicted and actual trajectories:
$$L = \frac{1}{N} \sum_{t=1}^{N} \lVert \hat{x}_t - x_t \rVert^2,$$
where $\hat{x}_t$ is the predicted trajectory at time $t$ and $x_t$ is the ground truth.
The AdamW optimizer is used with a cosine annealing learning rate to improve convergence. Training is conducted over 100 epochs with a batch size of 128.

3.6.6. Model Training and Evaluation

The LSTM-based model for human–robot motion prediction was trained using a structured pipeline involving data preprocessing, model training, evaluation, and deployment. The training was conducted on Google Colab utilizing a Tesla T4 GPU to accelerate computations. The model predicts human and robotic movement trajectories within the virtual environment, enhancing real-time decision making for collaborative tasks.

Training Setup

The model consists of a three-layer LSTM architecture with 256 hidden units per layer and dropout (0.1) for Monte Carlo uncertainty estimation. The training was performed using the following hyperparameters (a condensed training-loop sketch follows the list):
  • Sequence length: 50 time steps per input sequence.
  • Batch size: 128.
  • Optimizer: AdamW with weight decay ($10^{-4}$).
  • Learning rate: 0.001, decayed using a cosine annealing schedule.
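The sketch below shows a condensed PyTorch training loop with these hyperparameters. It reuses the MotionLSTM sketch defined earlier and substitutes a synthetic dataset for the logged motion data; the data shapes and dataset size are assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: 1024 sequences of 50 timesteps x 24 features (shapes are assumptions)
data = TensorDataset(torch.randn(1024, 50, 24), torch.randn(1024, 24))
train_loader = DataLoader(data, batch_size=128, shuffle=True)

model = MotionLSTM()                                        # from the sketch above
criterion = nn.MSELoss()                                    # trajectory loss L
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):                                    # 100 epochs, batch size 128
    for sequences, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(sequences), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()                                        # cosine-annealed learning rate
```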

Evaluation Metrics

The model’s performance was assessed using standard regression metrics (a short computation sketch follows the list):
  • Root Mean Squared Error (RMSE): 0.0179
  • Mean Absolute Error (MAE): 0.0060
  • R² score: 0.4965
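These metrics can be reproduced from arrays of predicted and ground-truth trajectories with a few lines of Python, as sketched below; this uses scikit-learn for convenience and is not necessarily the evaluation code used for Table 1.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def evaluate(y_true, y_pred):
    """RMSE, MAE, and R^2 for predicted vs. actual motion trajectories."""
    return {
        "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
    }

# evaluate(actual_test_trajectories, lstm_predictions)
```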
The low RMSE and MAE values indicate that the model predicts motion sequences accurately. The comparatively low R² score is expected for real-time motion prediction in immersive VR, where dynamic interactions, hand jitter, sensor drift, and user variability introduce intrinsic noise; improving it further would require a larger, more diverse dataset encompassing a broader range of user behaviors. Given that motion prediction in human–robot collaboration is dynamic and influenced by external uncertainties, the primary focus remains on minimizing absolute error rather than maximizing the R² score.

Visualization and Results

The predictive performance of the model was analyzed through multiple visualizations:
  • LSTM predictions with Monte Carlo dropout: a time-series plot comparing predicted and actual motion sequences, including uncertainty estimation using Monte Carlo dropout (Figure 7).
  • True vs. predicted values: scatter plot comparing predicted and actual values, demonstrating model fit (Figure 8).
  • Performance metrics table: summarizing model accuracy (Table 1).
The trained model was exported to the Open Neural Network Exchange (ONNX) format for efficient inference and cross-platform compatibility. This evaluation validates the system’s capability to improve human–robot interaction through predictive analytics and uncertainty estimation. Future improvements include expanding the dataset and fine-tuning hyperparameters for greater accuracy and reliability.
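A minimal export sketch is shown below, assuming the trained MotionLSTM from the sketches above. The file name, tensor names, and input shape are illustrative assumptions, and the opset version should be checked against the Barracuda importer in use.

```python
import torch

# Export the trained model to ONNX for inference in Unity.
model.eval()
dummy_input = torch.randn(1, 50, 24)   # (batch, sequence length, features); features assumed
torch.onnx.export(
    model, dummy_input, "motion_lstm.onnx",
    input_names=["motion_sequence"],
    output_names=["predicted_pose"],
    opset_version=11,                  # assumed; verify against the target importer
)
```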

3.6.7. Docker-Based Deployment

To ensure reproducibility and consistency, the system is encapsulated within a Docker container that integrates the following (an inference sketch follows the list):
  • ROS Melodic: facilitates robotic control, sensor integration, and data communication [54].
  • ONNX runtime: provides efficient execution of the trained LSTM model for real-time inference [55].
  • ROS#: enables seamless data exchange between Unity and the ROS environment [44].
By encapsulating all dependencies in a containerized environment, the pipeline remains consistent across development and deployment, reducing compatibility issues and simplifying workflow management.
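For the ONNX runtime component, a minimal Python inference sketch is shown below. The model path and tensor names follow the export sketch above and are assumptions rather than the system’s actual configuration.

```python
import numpy as np
import onnxruntime as ort

# Load the exported model and run one prediction on a sliding window of tracked frames.
session = ort.InferenceSession("motion_lstm.onnx")
window = np.random.rand(1, 50, 24).astype(np.float32)   # most recent 50 frames (illustrative)
(pred,) = session.run(["predicted_pose"], {"motion_sequence": window})
print("Predicted next pose:", pred.shape)
```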

3.6.8. Integration into Unity Environment

The trained LSTM model is deployed in Unity through the following steps:
  • The ONNX model is imported into Unity using the Unity Barracuda library, a lightweight inference engine optimized for neural networks [45].
  • The ROS–Unity bridge is established using ROS#, enabling real-time communication between Unity and the ROS [44].
  • User and robotic motion data are streamed to the ONNX model for inference, generating movement predictions. These predictions guide robotic trajectory adjustments, optimizing collaborative task allocation in disassembly.

3.6.9. Real-Time Inference and Monte Carlo Dropout

The trained ONNX model predicts user movement trajectories and corresponding robotic adjustments during inference. Monte Carlo (MC) dropout is applied to quantify uncertainty, enabling the following:
  • More reliable robotic trajectory adjustments by considering uncertainty estimates.
  • Enhanced user experience by adapting training difficulty based on confidence levels in motion predictions.
  • Identification of high-variance cases where additional data collection or model fine-tuning is needed.
This approach ensures that the system remains adaptable to real-world user behavior and motion variations.
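A minimal sketch of this uncertainty estimation is shown below for the PyTorch model from the earlier sketch: the dropout layers are re-enabled at inference time while the weights stay frozen, and the spread over repeated stochastic passes serves as the uncertainty estimate. The number of passes is an illustrative choice.

```python
import torch

def mc_dropout_predict(model, x, n_passes=30):
    """Return the mean prediction and per-dimension uncertainty over stochastic forward passes."""
    model.eval()
    for module in model.modules():             # keep weights frozen, re-activate dropout only
        if isinstance(module, torch.nn.Dropout):
            module.train()
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_passes)])  # (n_passes, batch, out_dim)
    return samples.mean(dim=0), samples.std(dim=0)

sample = torch.randn(1, 50, feature_dim, device=next(model.parameters()).device)
mean_pred, uncertainty = mc_dropout_predict(model, sample)
# Large `uncertainty` values can trigger more conservative robot trajectories
# or flag motions for additional data collection, as described above.
```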

3.6.10. Validation and Refinement

The system undergoes validation using synthetic scenarios generated in Unity, simulating user–robot collaboration for disassembling a hard disk. The validation process includes the following:
  • Comparison between predicted and actual motion sequences using RMSE, MAE, and R2 metrics (Table 1).
  • Uncertainty estimation using MC dropout to analyze prediction confidence (Figure 7).
  • Error analysis through distribution plots to refine the model (Figure 8).
Initial deployment tests in Unity using ONNX indicate smooth real-time inference, with predictions aligning well with expected motion trajectories. Further optimizations will reduce latency and ensure consistent inference across different hardware configurations.
The insights obtained from these tests iteratively refine the LSTM model, enhancing accuracy and robustness under diverse training conditions.
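For reference, the regression metrics reported in Table 1 can be computed from predicted and ground-truth motion arrays as in the short sketch below; the arrays here are synthetic placeholders standing in for the validation outputs.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Placeholder arrays standing in for validation targets and predictions,
# each of shape (num_samples, output_dim).
y_true = np.random.rand(500, 51)
y_pred = y_true + 0.02 * np.random.randn(500, 51)

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"RMSE={rmse:.4f}  MAE={mae:.4f}  R2={r2:.4f}")
```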

3.6.11. Benefits of the Integration

  • Efficiency: ONNX provides fast and lightweight inference suitable for real-time VR applications [55].
  • Cross-platform compatibility: The ONNX format enables deployment across different devices and hardware configurations [55].
  • Reproducibility: encapsulation in a Docker container ensures that all components (ROS, Unity, and model dependencies) remain consistent across various development and deployment environments [58].
  • Uncertainty-aware predictions: MC dropout improves prediction reliability by incorporating uncertainty into robotic motion planning.

Deployment and Future Considerations

This structured integration of deep learning within a virtual training environment effectively enhances human–robot collaboration. Future improvements will explore expanding the dataset with real-world motion capture data. While synthetic data have provided a practical foundation for training, real-world validation remains crucial for assessing the system’s generalizability. This will involve collecting user motion data from physical training sessions and evaluating how well the VR-based model translates to real disassembly scenarios.
The integration of synthetic data, Monte Carlo dropout, and ONNX-based real-time inference lays the groundwork for scalable and adaptable motion prediction models in VR-based human–robot collaboration systems.

4. VR Training and Discussion

The developed VR training system integrates cutting-edge virtual reality technology to create an immersive, interactive environment for technical training. It is designed to simulate real-world tasks within a controlled setting, allowing detailed monitoring and evaluation of user performance. The system centers on a VR scenario tailored to a disassembly task, highlighting its potential to enhance technical skills and accuracy.

4.1. Hard Disk Disassembly Task

The case study centers on disassembling a hard disk within the VR environment. Users are presented with a virtual hard disk model and the set of tools required for the task. The objective is to disassemble the drive collaboratively with the robot, using the provided virtual tools and adhering to the standard procedures followed in actual hardware maintenance. User actions are highlighted with a purple material (Figure 9), and robot actions with a green material (Figure 10). The corresponding instructions are also displayed in the environment, guiding the user to the specific tool required for each step.

4.2. Training Setup

The VR setup includes a head-mounted display (HMD)—Quest Pro in this case—hand-tracking, and a space that mimics a typical technician’s workstation. The Unity engine and the Meta Presence SDK facilitate realistic interactions with virtual components, such as unscrewing bolts or replacing the disk platter. A demonstration of the VR training environment setup for human–robot collaborative disassembly tasks is shown in Figure 11.

4.3. Demonstration of a Short Disassembly Task

We conducted a short disassembly demonstration to evaluate the effectiveness of our VR training environment for human–robot collaborative disassembly tasks. Figure 12 presents snapshots of key steps in the disassembly process, performed collaboratively by a human and a robot (the full video is available at https://drive.google.com/file/d/1yc43qBkES99NDzTyj84-tnFHQBmjO3qO/view, accessed on 16 March 2025).
Safety is a critical component of the disassembly task. As illustrated in Figure 12f, the system provides visual and auditory feedback to alert users to errors during disassembly, guiding them to avoid hazards while handling sensitive components. To further demonstrate the effectiveness of the VR training system, Figure 13 presents a graphical comparison of the movement patterns of the human operator and the robot during a typical disassembly task. The blue line represents the human's movement path, while the green line traces the robot's motions; the points where these paths intersect and diverge highlight areas of collaboration and independent action. This synchronization is crucial for ensuring that the human and robot can work together seamlessly, minimizing interference and maximizing safety and efficiency. Analyzing these movement patterns allows us to fine-tune the robot's programming to better anticipate human actions, resulting in more intuitive and responsive interactions. This enhances the training experience and prepares both the human and the robot for smoother collaboration in real-world tasks.
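For readers who want to reproduce this kind of path comparison from their own session logs, the following is a minimal matplotlib sketch; the log files and their (x, y) column format are illustrative assumptions rather than the logging scheme used in this work.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder logs: one row per frame, columns are workspace x and y positions.
human_path = np.loadtxt("human_path.csv", delimiter=",")   # shape (T, 2)
robot_path = np.loadtxt("robot_path.csv", delimiter=",")

plt.plot(human_path[:, 0], human_path[:, 1], color="blue", label="Human operator")
plt.plot(robot_path[:, 0], robot_path[:, 1], color="green", label="Robot")
plt.xlabel("X position (m)")
plt.ylabel("Y position (m)")
plt.title("Human vs. robot motion paths during a disassembly step")
plt.legend()
plt.show()
```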

4.4. Discussion

Analysis of the effectiveness of the VR training indicates significant improvements in task performance due to the immersive and interactive nature of the system. Integrating advanced tracking technologies and realistic interaction scenarios significantly enhances learning outcomes and user engagement. Comparisons with traditional training methods show that VR training increases the retention and applicability of learned skills.

4.4.1. Training Performance Analysis

To evaluate the effectiveness of the VR training environment, we analyzed key performance metrics over multiple training sessions.
Figure 14 presents the trends in task completion time and error rate, demonstrating a consistent improvement with additional training. The decreasing task completion times indicate that users become more efficient over repeated sessions while the declining error rate suggests increased precision in task execution.
Similarly, Figure 15 shows increased motion path efficiency over sessions, reflecting enhanced coordination between human operators and the robot. The trend suggests that VR-based training helps users develop more optimal movement strategies, minimizing unnecessary actions and improving collaboration.
Real-time metrics are displayed within the virtual environment to evaluate user performance during the VR-based human–robot collaborative disassembly task. This allows users to monitor their efficiency and adjust their actions accordingly.
Figure 16 illustrates the in-game metrics dashboard, which provides real-time feedback on four key performance indicators:
  • Timer: displays the session duration in minutes.
  • Task completion time: shows the time to complete the last disassembly step.
  • Error rate: tracks the percentage of incorrect actions performed by the user.
  • User engagement: represents an engagement score based on interaction frequency and consistency.
This dashboard enhances user awareness and encourages iterative improvement by providing instant feedback during training.
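As one possible interpretation of these indicators, the sketch below computes an error rate and a simple engagement score from a list of logged interaction events. The event format and the engagement formula are assumptions for illustration only, not the exact definitions used by the dashboard.

```python
from dataclasses import dataclass
from statistics import pstdev

@dataclass
class InteractionEvent:
    timestamp: float   # seconds since session start
    correct: bool      # whether the action matched the expected disassembly step

def error_rate(events):
    """Percentage of incorrect actions."""
    if not events:
        return 0.0
    wrong = sum(1 for e in events if not e.correct)
    return 100.0 * wrong / len(events)

def engagement_score(events, session_length_s):
    """Toy score combining interaction frequency with pacing consistency."""
    if len(events) < 2:
        return 0.0
    gaps = [b.timestamp - a.timestamp for a, b in zip(events, events[1:])]
    frequency = len(events) / session_length_s        # actions per second
    consistency = 1.0 / (1.0 + pstdev(gaps))          # steadier pacing scores higher
    return round(100.0 * min(1.0, frequency) * consistency, 1)

events = [InteractionEvent(5.2, True), InteractionEvent(11.8, True), InteractionEvent(19.0, False)]
print(error_rate(events), engagement_score(events, session_length_s=60.0))
```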

4.4.2. Effectiveness

Hand- and body-tracking integration proved instrumental in enhancing the training experience by providing real-time feedback and allowing for natural interactions. The data suggest that such immersive training significantly improves precision in tasks that require fine motor skills, as evidenced by the reduced error rates and increased speed of completion.
The VR training system creates a psychologically safe environment where users can practice potentially hazardous disassembly procedures without the risk of damaging expensive components or causing workplace injuries. This safety aspect allows trainees to experiment freely, make mistakes, and learn from them without real-world consequences. Our observations indicate that this psychological safety accelerates confidence building, as users become more willing to attempt complex procedures they might otherwise approach with hesitation. However, the system must balance this safety with appropriate feedback that maintains awareness of real-world risks. Visual and auditory cues during error states (as shown in Figure 12f) help maintain this awareness, ensuring that the confidence gained in VR translates appropriately to physical environments where consequences are tangible.

4.4.3. Comparison with Traditional Training Methods

Compared to conventional training methods, the VR system speeds up the learning process and engages users more effectively, maintaining high attention levels throughout the session. The VR environment’s immediate feedback and interactive nature foster enhanced learning outcomes, preparing participants for real-world applications more effectively than traditional theoretical training.
While the VR environment provides a consistent and organized training space compared to potentially cluttered real-world workstations (as seen in the comparison between Figure 1 and Figure 2), this controlled nature presents both advantages and limitations. The consistent environment enables focused learning without distractions and standardized evaluation metrics across users. However, this idealized environment may not fully prepare trainees for the unpredictability of real-world scenarios where tools may be misplaced, components may vary in condition, or environmental factors may interfere with task execution. Future iterations of the system will incorporate controlled variability to better bridge this gap between virtual consistency and real-world unpredictability.

4.5. Implications for VR Training

The findings from this case study highlight the transformative potential of VR training systems in enhancing technical education and training across various sectors. By simulating a diverse array of realistic scenarios, such as complex disk assembly and disassembly, VR facilitates a deeper understanding of user performance and generates synthetic data that can be instrumental in developing more sophisticated machine learning models. This capability to enrich training with detailed analytics and adaptable scenarios extends the applicability of VR technologies to various industries, including electronics, automotive, and aerospace, setting a robust foundation for future research into its expansive potential in professional training environments.

4.6. Limitations of the System

While our system demonstrates promising results in enhancing training efficiency, several limitations should be acknowledged. First, the current implementation lacks multi-user scenarios, which Shamsuzzoha et al. [59] identified as crucial for comprehensive industrial training. Future work will explore collaborative VR environments where multiple users can interact simultaneously with robotic systems, enabling team-based training scenarios that better reflect real-world industrial settings.
Second, potential tracking inaccuracies may affect motion prediction accuracy. The Meta Quest Pro’s Inside-Out Body-Tracking technology faces inherent limitations, particularly for lower body tracking [46]. While “Generative Legs” provides AI-based predictions for lower body movements, it cannot track specific foot positions or individual leg movements [47]. Similarly, hand tracking encounters challenges with occlusion when hands overlap or move outside the camera’s field of view [48], potentially affecting precision during complex manipulation tasks. Chen et al. [60] demonstrated how these tracking limitations can impact disassembly task evaluation, particularly when precise movements are required. Further, the inclusion of an immersive sensory experience may also affect the operator’s fatigue levels [60]. Future implementations will incorporate multi-sensor fusion approaches to improve tracking robustness and accuracy.
Third, our current LSTM model, while effective for motion prediction, could benefit from comparison with alternative architectures. Transformer-based models have shown promising results in sequential prediction tasks, as noted by Daneshmand et al. [34] in their survey of deep learning approaches for robotic assembly and disassembly in Industry 4.0 contexts. Hjorth and Chrysostomou [30] highlighted the potential of hybrid approaches combining deep learning with physical models for improved prediction accuracy in human–robot collaborative disassembly. Jacob et al. [14] further emphasized the importance of exploring various prediction architectures to address the unique challenges of human–robot cooperation in disassembly tasks. Future work will explore these alternatives to potentially improve prediction accuracy and computational efficiency while maintaining the real-time performance required for immersive VR environments.
Fourth, further validation in real-world industrial settings is needed to fully assess the system’s effectiveness, following evaluation frameworks proposed by Schwarz et al. [8]. This validation will include comparative studies between VR-trained and traditionally trained operators performing actual disassembly tasks, measuring the transfer of learning and retention over time. Yang et al. [52] demonstrated the importance of validating virtual haptic disassembly platforms against real-world performance metrics, particularly when considering the disassembly process and physical attributes of components.

5. Conclusions and Future Work

5.1. Conclusions

This study demonstrates the effectiveness of a VR-based training system for technical skills development in hard disk drive disassembly tasks. By integrating advanced body- and hand-tracking technologies with LSTM-based motion prediction, the system creates an immersive environment that enhances both training outcomes and human–robot collaboration.
Our implementation shows several key advantages over traditional training methods:
  • Enhanced engagement: the interactive VR environment increases user engagement through realistic simulation of industrial tasks.
  • Accelerated learning: immediate feedback and risk-free repetition enable faster skill acquisition without material costs.
  • Improved precision: advanced tracking technology ensures proper technique execution, particularly for fine motor skills.
  • Data-driven insights: the system collects comprehensive performance metrics that inform both training optimization and motion prediction models.
The LSTM-based motion prediction framework demonstrates promising results in forecasting human movements during disassembly tasks, enabling more intuitive human–robot collaboration. The system’s ability to generate synthetic training data further enhances the robustness of these predictive models.
Our approach aligns with recent work by Qu et al. [61], who demonstrated the effectiveness of reinforcement learning for robotic disassembly task training and skill transfer. However, our system extends beyond purely robotic learning to incorporate human–robot collaboration in an immersive VR environment, addressing the need for integrated training solutions identified by Jacob et al. [14].
Future work will explore the integration of ergonomic design principles as suggested by Bortolini et al. [11] to further enhance the training experience and reduce physical strain during collaborative tasks. Additionally, we plan to investigate multi-user training scenarios [59] and the integration of more advanced robotic systems [35] to expand the applicability of our framework.
Furthermore, we will focus on improving the UI/UX experience through iterative design processes that incorporate user-centered design methodologies. This will include enhancing the visual feedback mechanisms, implementing more intuitive interaction patterns, and developing adaptive interfaces that respond to individual user proficiency levels. The current metrics dashboard will be expanded to provide more granular and personalized feedback, helping users identify specific areas for improvement while maintaining an uncluttered visual experience.
To validate these enhancements, we will conduct comprehensive user studies with participants of varying expertise levels, from novices to experienced technicians. These studies will employ mixed-methods approaches combining quantitative performance metrics (task completion time, error rates, and learning curves) with qualitative assessments (cognitive load measurements, user satisfaction surveys, and semi-structured interviews). Special attention will be paid to evaluating the system’s learnability, efficiency, memorability, error prevention, and overall user satisfaction following established usability heuristics. The findings from these studies will not only validate the effectiveness of our UI/UX improvements but also provide valuable insights for further refinements to maximize training outcomes in human–robot collaborative disassembly tasks.
As Industry 4.0 continues to transform manufacturing processes, our VR training system provides a valuable tool for preparing workers to collaborate effectively with robotic systems in disassembly tasks. By combining immersive VR, advanced motion prediction, and synthetic data generation, we contribute to the development of more efficient, safe, and adaptable human–robot collaborative workflows.

5.2. Future Work

Building on these findings, we identify several directions for future research. We plan to implement additional industrial scenarios beyond hard drive disassembly to validate the system’s flexibility across different applications. To enhance prediction capabilities, we will explore alternative deep learning architectures, such as Transformer models, that may improve motion prediction accuracy. The integration of eye-tracking technology will provide deeper insights into user attention patterns during complex tasks, potentially revealing optimization opportunities for task sequencing and interface design. We also aim to develop multi-user collaborative scenarios where multiple operators can interact simultaneously with robotic systems, better reflecting real industrial team environments. Finally, longitudinal validation studies will be conducted to measure skill transfer and retention from VR to real-world environments, establishing empirical evidence for the long-term effectiveness of our approach. These enhancements will further strengthen the connection between immersive training and predictive modeling for human–robot collaboration in industrial settings.

5.3. Implications for Industry

Developing sophisticated VR training systems has significant implications for the industry, particularly in fields where precision and accuracy are paramount. The utilization of synthetic data generation within these systems is a key innovation that enhances their capability further, allowing for the training of machine learning models that can adapt to user behaviors. This makes the training process more robust and highly adaptable to the nuances of individual performance, paving the way for highly personalized training experiences.
As technology progresses, it is anticipated that VR training systems will become a staple in technical education and professional development across various industries, revolutionizing the way practical skills are taught and refined.

Author Contributions

Conceptualization, B.L.; funding acquisition, X.L., M.Z., and B.L.; methodology, Y.M. and B.L.; project administration, X.L., M.Z., and B.L.; software, Y.M. and S.T.; supervision, B.L.; visualization, Y.M.; writing—original draft, Y.M.; writing—review and editing, S.T., X.L., M.Z., and B.L. All authors have read and agreed to the published version of the manuscript.

Funding

National Science Foundation (NSF) Directorate for Engineering (grant no. 2132773/2506209 and 2132923/2422640).

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Acknowledgments

The authors acknowledge the funding support from NSF. However, the views expressed here are those of the authors and are not necessarily those of the NSF.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AR    Augmented reality
VR    Virtual reality
MR    Mixed reality
HRC    Human–robot collaboration
ROS    Robot Operating System
URDF    Unified Robot Description Format
LSTM    Long Short-Term Memory
RNN    Recurrent Neural Network
TCT    Task completion time
ONNX    Open Neural Network Exchange

References

  1. Mourtzis, D.; Zogopoulos, V.; Vlachou, E. Augmented Reality Application to Support Remote Maintenance as a Service in the Robotics Industry. Procedia CIRP 2017, 63, 46–51. [Google Scholar] [CrossRef]
  2. Webel, S.; Bockholt, U.; Engelke, T.; Gavish, N.; Olbrich, M.; Preusche, C. An augmented reality training platform for assembly and maintenance skills. Robot. Auton. Syst. 2013, 61, 398–403. [Google Scholar] [CrossRef]
  3. Gavish, N.; Gutierrez, T.; Webel, S.; Rodriguez, J.; Tecchia, F. Design Guidelines for the Development of Virtual Reality and Augmented Reality Training Systems for Maintenance and Assembly Tasks. Bio Web Conf. 2011, 1, 00029. [Google Scholar] [CrossRef]
  4. Li, J.R.; Khoo, L.P.; Tor, S.B. Desktop virtual reality for maintenance training: An object oriented prototype system (V-REALISM). Comput. Ind. 2003, 52, 109–125. [Google Scholar] [CrossRef]
  5. Tao, F.; Zhang, H.; Liu, A.; Nee, A.Y.C. Digital Twin in Industry: State-of-the-Art. IEEE Trans. Ind. Inform. 2019, 15, 2405–2415. [Google Scholar] [CrossRef]
  6. Werrlich, S.; Nitsche, K.; Notni, G. Demand Analysis for an Augmented Reality based Assembly Training. In Proceedings of the PETRA ’17: 10th International Conference on PErvasive Technologies Related to Assistive Environments, Island of Rhodes, Greece, 21–23 June 2017; pp. 416–422. [Google Scholar]
  7. Hořejší, P. Augmented Reality System for Virtual Training of Parts Assembly. Procedia Eng. 2015, 100, 699–706. [Google Scholar] [CrossRef]
  8. Schwarz, S.; Regal, G.; Kempf, M.; Schatz, R. Learning Success in Immersive Virtual Reality Training Environments: Practical Evidence from Automotive Assembly. In Proceedings of the NordiCHI ’20: 11th Nordic Conference on Human-Computer Interaction, Tallinn, Estonia, 25–29 October 2020; pp. 1–11. [Google Scholar] [CrossRef]
  9. Lerner, M.A.; Ayalew, M.; Peine, W.J.; Sundaram, C.P. Does Training on a Virtual Reality Robotic Simulator Improve Performance on the da Vinci® Surgical System? J. Endourol. 2010, 24, 467–472. [Google Scholar] [CrossRef]
  10. Dwivedi, P.; Cline, D.; Jose, C.; Etemadpour, R. Manual Assembly Training in Virtual Environments. In Proceedings of the 2018 IEEE 18th International Conference on Advanced Learning Technologies, Mumbai, India, 9–13 July 2018; pp. 395–401. [Google Scholar]
  11. Bortolini, M.; Botti, L.; Galizia, F.G.; Mora, C. Ergonomic Design of an Adaptive Automation Assembly System. Machines 2023, 11, 898. [Google Scholar] [CrossRef]
  12. Carlson, R.; Gonzalez, R.; Geary, D. Enhancing Training Effectiveness with Synthetic Data and Feedback in VR Environments. J. Appl. Ergon. 2015, 46, 56–65. [Google Scholar]
  13. Liu, W.; Liang, X.; Zheng, M. Task-Constrained Motion Planning Considering Uncertainty-Informed Human Motion Prediction for Human–Robot Collaborative Disassembly. IEEE/ASME Trans. Mechatron. 2023, 28, 2056–2071. [Google Scholar] [CrossRef]
  14. Jacob, S.; Klement, N.; Bearee, R.; Pacaux-Lemoine, M.P. Human-Robot Cooperation in Disassembly: A Rapid Review. In Proceedings of the 21st International Conference on Informatics in Control, Automation and Robotics, Porto, Portugal, 18–20 November 2024; SCITEPRESS-Science and Technology Publications: Setubal, Portugal, 2024; Volume 2, pp. 212–219. [Google Scholar]
  15. Lee, Y.S.; Rashidi, A.; Talei, A.; Beh, H.J.; Rashidi, S. A Comparison Study on the Learning Effectiveness of Construction Training Scenarios in a Virtual Reality Environment. Virtual Worlds 2023, 2, 36–52. [Google Scholar] [CrossRef]
  16. Schwarz, M.; Weser, J.; Martinetz, T.; Pawelzik, K. Immersive Virtual Reality for Cognitive Training: A Pilot Study on Spatial Navigation in the Cave Automatic Virtual Environment. Front. Psychol. 2020, 11, 579993. [Google Scholar]
  17. Rocca, R.; Rosa, P.; Fumagalli, L.; Terzi, S. Integrating Virtual Reality and Digital Twin in Circular Economy Practices: A Laboratory Application Case. Sustainability 2020, 12, 2286. [Google Scholar] [CrossRef]
  18. Boud, A.C.; Baber, C.; Steiner, S.J. Virtual Reality: A Tool for Assembly? Presence 2000, 9, 486–496. [Google Scholar] [CrossRef]
  19. Al-Ahmari, A.M.; Abidi, M.H.; Ahmad, A.; Darmoul, S. Development of a virtual manufacturing assembly simulation system. Adv. Mech. Eng. 2016, 8, 1687814016639824. [Google Scholar] [CrossRef]
  20. Numfu, M.; Riel, A.; Noel, F. Virtual Reality Technology for Maintenance Training. Appl. Sci. Eng. Prog. 2020, 13, 274–282. [Google Scholar] [CrossRef]
  21. Gutiérrez, T.; Rodríguez, J.; Vélaz, Y.; Casado, S.; Suescun, A.; Sánchez, E.J. IMA-VR: A multimodal virtual training system for skills transfer in Industrial Maintenance and Assembly tasks. In Proceedings of the 19th International Symposium in Robot and Human Interactive Communication, Viareggio, Italy, 13–15 September 2010; pp. 428–433, ISSN 1944–9437. [Google Scholar] [CrossRef]
  22. Li, Z.; Wang, J.; Yan, Z.; Wang, X.; Anwar, M.S. An Interactive Virtual Training System for Assembly and Disassembly Based on Precedence Constraints. In Advances in Computer Graphics, Proceedings of the 36th Computer Graphics International Conference, CGI 2019, Calgary, AB, Canada, 17–20 June 2019; Gavrilova, M., Chang, J., Thalmann, N.M., Hitzer, E., Ishikawa, H., Eds.; Springer: Cham, Switzerland, 2019; pp. 81–93. [Google Scholar] [CrossRef]
  23. Westerfield, G.; Mitrovic, A.; Billinghurst, M. Intelligent Augmented Reality Training for Industrial Operations. IEEE Trans. Learn. Technol. 2014, 7, 331–344. [Google Scholar]
  24. Pan, X.; Cui, X.; Huo, H.; Jiang, Y.; Zhao, H.; Li, D. Virtual Assembly of Educational Robot Parts Based on VR Technology. In Proceedings of the 2019 IEEE 11th International Conference on Engineering Education (ICEED), Kanazawa, Japan, 6–7 November 2019; pp. 1–5. [Google Scholar] [CrossRef]
  25. Seth, A.; Vance, J.M.; Oliver, J.H. Virtual reality for assembly methods prototyping: A review. Virtual Real. 2011, 15, 5–20. [Google Scholar] [CrossRef]
  26. Devagiri, J.S.; Paheding, S.; Niyaz, Q.; Yang, X.; Smith, S. Augmented Reality and Artificial Intelligence in industry: Trends, tools, and future challenges. Expert Syst. Appl. 2022, 207, 118002. [Google Scholar] [CrossRef]
  27. Zhao, X. Extended Reality for Safe and Effective Construction Management: State-of-the-Art, Challenges, and Future Directions. Buildings 2023, 13, 155. [Google Scholar] [CrossRef]
  28. Lee, M.L.; Liu, W.; Behdad, S.; Liang, X.; Zheng, M. Robot-Assisted Disassembly Sequence Planning with Real-Time Human Motion Prediction. IEEE Trans. Syst. Man Cybern. Syst. 2022, 53, 438–450. [Google Scholar] [CrossRef]
  29. Vongbunyong, S.; Vongseela, P.; Sreerattana-aporn, J. A Process Demonstration Platform for Product Disassembly Skills Transfer. Procedia CIRP 2017, 61, 281–286. [Google Scholar] [CrossRef]
  30. Hjorth, S.; Chrysostomou, D. Human–robot collaboration in industrial environments: A literature review on non-destructive disassembly. Robot. Comput. Integr. Manuf. 2022, 73, 102208. [Google Scholar] [CrossRef]
  31. Ottogalli, K.; Rosquete, D.; Rojo, J.; Amundarain, A.; María Rodríguez, J.; Borro, D. Virtual reality simulation of human-robot coexistence for an aircraft final assembly line: Process evaluation and ergonomics assessment. Int. J. Comput. Integr. Manuf. 2021, 34, 975–995. [Google Scholar] [CrossRef]
  32. Kothari, A. Real-Time Motion Prediction for Efficient Human-Robot Collaboration. Master’s Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2023. Available online: https://dspace.mit.edu/handle/1721.1/152639 (accessed on 16 March 2025).
  33. Liu, W.; Liang, X.; Zheng, M. Dynamic Model Informed Human Motion Prediction Based on Unscented Kalman Filter. IEEE/ASME Trans. Mechatron. 2022, 27, 5287–5295. [Google Scholar] [CrossRef]
  34. Daneshmand, M.; Noroozi, F.; Corneanu, C.; Mafakheri, F.; Fiorini, P. Industry 4.0 and prospects of circular economy: A survey of robotic assembly and disassembly. Int. J. Adv. Manuf. Technol. 2023, 124, 2973–3000. [Google Scholar] [CrossRef]
  35. Kuts, V.; Cherezova, N.; Sarkans, M.; Otto, T. Digital Twin: Industrial robot kinematic model integration to the virtual reality environment. J. Mach. Eng. 2020, 20, 53–64. [Google Scholar] [CrossRef]
  36. Xu, X.; Guo, P.; Zhai, J.; Zeng, X. Robotic kinematics teaching system with virtual reality, remote control and an on–site laboratory. Int. J. Mech. Eng. Educ. 2020, 48, 197–220. [Google Scholar] [CrossRef]
  37. Pérez, L.; Rodríguez-Jiménez, S.; Rodríguez, N.; Usamentiaga, R.; García, D.F. Digital Twin and Virtual Reality Based Methodology for Multi-Robot Manufacturing Cell Commissioning. Appl. Sci. 2020, 10, 3633. [Google Scholar] [CrossRef]
  38. Sassanelli, C.; Rosa, P.; Terzi, S. Supporting disassembly processes through simulation tools: A systematic literature review with a focus on printed circuit boards. J. Manuf. Syst. 2021, 60, 429–448. [Google Scholar] [CrossRef]
  39. Zhao, F.; Deng, W.; Pham, D.T. A Robotic Teleoperation System with Integrated Augmented Reality and Digital Twin Technologies for Disassembling End-of-Life Batteries. Batteries 2024, 10, 382. [Google Scholar] [CrossRef]
  40. Zhao, G.; Wang, Y. Development of Machine Tool Disassembly and Assembly Training and Digital Twin Model Building System based on Virtual Simulation. Int. J. Mech. Electr. Eng. 2024, 2, 50–59. [Google Scholar] [CrossRef]
  41. Pérez, L.; Diez, E.; Usamentiaga, R.; García, D.F. Industrial robot control and operator training using virtual reality interfaces. Comput. Ind. 2019, 109, 114–120. [Google Scholar] [CrossRef]
  42. Zhao, F.; Pham, D.T. Integration of Augmented Reality and Digital Twins in a Teleoperated Disassembly System. In Advances in Remanufacturing, Proceedings of the VII International Workshop on Autonomous Remanufacturing, IWAR 2023, Detroit, MI, USA, 16–18 October 2023; Fera, M., Caterino, M., Macchiaroli, R., Pham, D.T., Eds.; Springer Nature Switzerland: Cham, Switzerland, 2024; pp. 93–105. [Google Scholar] [CrossRef]
  43. Lewczuk, K. Virtual Reality Application for the Safety Improvement of Intralogistics Systems. Sustainability 2024, 16, 6024. [Google Scholar] [CrossRef]
  44. Siemens AG. ROS#: Integration of Unity3D with ROS 2019. Available online: https://github.com/siemens/ros-sharp (accessed on 16 March 2025).
  45. Unity-Technologies. Unity Barracuda Documentation 2024. Available online: https://docs.unity3d.com/Packages/com.unity.barracuda%401.0/manual/index.html (accessed on 16 March 2025).
  46. Meta Platforms Inc. Inside-Out Body Tracking and Generative Legs; Meta Platforms Inc.: Menlo Park, CA, USA, 2023; Available online: https://developers.meta.com/horizon/blog/inside-out-body-tracking-and-generative-legs/ (accessed on 16 March 2025).
  47. Meta Platforms Inc. Movement Overview; Meta Platforms Inc.: Menlo Park, CA, USA, 2023; Available online: https://developers.meta.com/horizon/documentation/unity/move-overview (accessed on 16 March 2025).
  48. Meta Platforms Inc. Learn About Hand and Body Tracking on Meta Quest; Meta Platforms Inc.: Menlo Park, CA, USA, 2024; Available online: https://www.meta.com/help/quest/290147772643252/ (accessed on 16 March 2025).
  49. Behdad, S.; Berg, L.P.; Thurston, D.; Vance, J. Leveraging Virtual Reality Experiences with Mixed-Integer Nonlinear Programming Visualization of Disassembly Sequence Planning Under Uncertainty. J. Mech. Des. 2014, 136, 041005. [Google Scholar] [CrossRef]
  50. Denavit, J.; Hartenberg, R.S. A kinematic notation for lower-pair mechanisms based on matrices. J. Appl. Mech. 1955, 22, 215–221. [Google Scholar] [CrossRef]
  51. Technologies, U. Articulation Body Component Reference. 2021. Available online: https://docs.unity3d.com/2021.3/Documentation/Manual/class-ArticulationBody.html (accessed on 13 March 2025).
  52. Yang, Y.; Yang, P.; Li, J.; Zeng, F.; Yang, M.; Wang, R.; Bai, Q. Research on virtual haptic disassembly platform considering disassembly process. Neurocomputing 2019, 348, 74–81. [Google Scholar] [CrossRef]
  53. Tahriri, F.; Mousavi, M.; Yap, I.D.H.J. Optimizing the Robot Arm Movement Time Using Virtual Reality Robotic Teaching System. Int. J. Simul. Model. 2015, 14, 28–38. [Google Scholar] [CrossRef]
  54. Quigley, M.; Conley, K.; Gerkey, B.; Faust, J.; Foote, T.; Leibs, J.; Wheeler, R.; Ng, A.Y. ROS: An open-source Robot Operating System. In Proceedings of the ICRA Workshop on Open Source Software, Kobe, Japan, 12–17 May 2009; Volume 3, p. 5. [Google Scholar]
  55. Corporation, M. ONNX Runtime: A High-Performance Inference Engine for Machine Learning Models 2024. Available online: https://onnxruntime.ai (accessed on 13 March 2025).
  56. Liu, Q.; Liu, Z.; Xu, W.; Tang, Q.; Zhou, Z.; Pham, D.T. Human-robot collaboration in disassembly for sustainable manufacturing. Int. J. Prod. Res. 2019, 57, 4027–4044. [Google Scholar] [CrossRef]
  57. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 10761–10771. [Google Scholar]
  58. Merkel, D. Docker: Lightweight Linux containers for consistent development and deployment. Linux J. 2014, 239, 2. [Google Scholar]
  59. Shamsuzzoha, A.; Toshev, R.; Vu Tuan, V.; Kankaanpaa, T.; Helo, P. Digital factory—Virtual reality environments for industrial training and maintenance. Interact. Learn. Environ. 2021, 29, 1339–1362. [Google Scholar] [CrossRef]
  60. Chen, J.; Mitrouchev, P.; Coquillart, S.; Quaine, F. Disassembly task evaluation by muscle fatigue estimation in a virtual reality environment. Int. J. Adv. Manuf. Technol. 2017, 88, 1523–1533. [Google Scholar] [CrossRef]
  61. Qu, M.; Wang, Y.; Pham, D.T. Robotic Disassembly Task Training and Skill Transfer Using Reinforcement Learning. IEEE Trans. Ind. Inform. 2023, 19, 10934–10943. [Google Scholar] [CrossRef]
Figure 1. Real-world Setup.
Figure 2. Virtual world training environment.
Figure 3. Updated Unity–ROS architecture with AI-based motion prediction using Barracuda.
Figure 4. Task flowchart.
Figure 5. Tracked body points.
Figure 6. LSTM model.
Figure 7. LSTM predictions with Monte Carlo dropout uncertainty estimation.
Figure 8. True vs. predicted motion values.
Figure 9. Awaiting user action.
Figure 10. Awaiting robot action.
Figure 11. The setup of VR training environment for disassembly task.
Figure 12. A short demonstration of a set of human-robot collaborative disassembly tasks for a hard disk device: (a) Initial setup with the hard disk device positioned for disassembly; (b) Robot positioning and alignment for collaborative work; (c) Human operator carefully removing a disassembled component and placing it in the designated bin; (d) Robot dynamically adjusting its trajectory to avoid collision with human hands during collaborative workspace interaction; (e) Robot arm autonomously grasping a separated component and placing it in the bin; (f) Error state notification with visual feedback when incorrect action is detected; (g) Successful component removal and task completion.
Figure 13. Comparison of human and robot motion paths in VR training.
Figure 14. Overall training performance—task completion time and error rate over 7 sessions.
Figure 15. Motion path efficiency over training sessions, showing increased optimal movement patterns.
Figure 16. The in-game metrics dashboard displays real-time feedback on session duration, task completion time, error rate, and user engagement, allowing users to assess and optimize their performance.
Table 1. Performance metrics for the LSTM model.

Metric                            Value
Root Mean Squared Error (RMSE)    0.0179
Mean Absolute Error (MAE)         0.0060
R2 score                          0.4965
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
