1. Introduction
Container handling is a core aspect of port operations. It demands precise handling and strong safety awareness from operators, both of which directly affect the successful delivery of goods. However, the frequent use of heavy machinery at container terminals often leads to accidents, resulting in severe economic losses, injuries, and even fatalities [1]. Since absolute safety cannot be guaranteed in these operations, and given the significant risks involved, safety training is particularly crucial. One of the main causes of frequent accidents is improper handling by operators. Traditional port operator training relies on real equipment operation and experience-based guidance, which is costly and risky. Moreover, it cannot comprehensively simulate various emergency situations or recreate accident scenarios, which limits the effectiveness and scope of training. Researching and developing effective safety training methods and techniques to improve operators’ safety awareness and precision can reduce the frequency and severity of accidents and thereby lower economic losses and personal injuries.
Hands-on training on real equipment develops the skills necessary for crane operation, but on-site crane operation training often unavoidably hinders normal construction processes and exposes novice operators to danger, as even a minor mistake can lead to serious incidents, such as injuries, delays, and budget overruns [2].
The advent of virtual reality (VR) technology offers an alternative to the traditional hands-on training model, providing a realistic experience while reducing costs and risks [3,4]. VR-based safety training not only provides a comprehensive and long-lasting platform for developing the perceptual experience and cognitive abilities needed to solve problems and make decisions in complex and stressful situations but also eliminates real-time exposure to danger or environmental conditions [5]. Accordingly, VR technology is often used to train crane operators [2,6].
Lin et al. [7] demonstrated the use of 4D-BIM technology to create real-time augmented reality (AR) visualizations for crane operations, thereby optimizing both operational safety and efficiency. Fang and Teizer [8] designed a multi-user virtual 3D training environment to improve teamwork between crane operators and ground personnel. Juang et al. [9] developed a crane simulator named SimCrane 3D, which applies kinesthetic feedback and stereoscopic vision to enhance operator training. Patrão and Menezes [10] proposed an immersive virtual reality simulator that uses a stereoscopic head-mounted display to deliver visual and auditory stimuli for training tower crane operators. Noda et al. [11] researched a training system that allows operators to safely and effectively master the operation of electric overhead cranes while suppressing load sway. Additionally, a distributed virtual reality bridge crane simulator system based on a 3D engine has been developed [12].
Although many existing virtual crane training systems can develop operators’ practical skills, they mainly focus on enhancing performance and lack the simulation of potential accidents during operations. These systems also do not include the design of corresponding risk simulation modules, which makes them ineffective in enhancing operators’ safety risk awareness. Since container handling at container terminals is one of the most accident-prone activities in port operations, enhancing drivers’ risk awareness and providing comprehensive training for port operators is particularly important.
Safety risk assessment is a crucial part of safety management and an essential component of safety system engineering [13]. Safety risk assessments mainly focus on areas such as coal mining [14], subway stations [15], construction sites [16], and ports [17,18,19,20,21,22,23]. In the context of ports, researchers have primarily studied loading and unloading operations, port equipment, and dangerous goods. For instance, Chen-Yu Lin et al. [18] developed a novel region-specific risk assessment model for container ports. Based on the container transport process, this model divides the container port into four areas: the loading and unloading area, the internal transportation area, the storage area, and the gate area. By combining Failure Mode and Effect Analysis (FMEA) with a quantitative risk analysis model, it assesses the risk of accidents in each segment of the process. Alyami et al. [19] proposed an advanced FMEA method based on a fuzzy rule-based Bayesian network (FRBN) to evaluate the severity of hazardous events (HEs) at container terminals. Sunaryo and Hamka [20] developed a risk assessment model for container ports using hazard identification, risk assessment, and fault tree analysis methods. Yang et al. [21] analyzed the risk factors affecting safe loading and unloading operations, their causal relationships, and their interconnections. They used the Decision-Making Trial and Evaluation Laboratory (DEMATEL) method to evaluate the core risk factors for container handling operations at Taiwan’s Kaohsiung Port. Other methods include safety assessment models for dangerous goods within the port [22,23].
In maritime navigation, various types of risk assessment are employed. Previous studies on quantitative risk analysis of maritime navigation have used different methods for risk evaluation, each with distinct applications, such as event trees (ETs), fault trees (FTs) [24], Bayesian networks (BNs) [25,26], and the Analytic Hierarchy Process (AHP) [27,28]. ETs and FTs are particularly useful for analyzing the causal effects of specific risks and are easy to understand. However, they rely heavily on historical data and can become extremely time-consuming as the number of factors increases. BNs, on the other hand, are quantitative tools that have been applied to maritime traffic modeling and extended into the field of maritime traffic safety. While BNs allow for the integration of data with expert knowledge and are suitable for complex systems with probabilistic uncertainties, their complexity and the difficulty of determining expert probabilities can present challenges [29]. A summary and analysis of the literature are presented in Table 1.
Currently, few risk assessment methods directly target quay crane operators at container ports, and they mostly focus on real operating scenarios, with limited research on risk assessment in virtual environments. Therefore, this paper proposes an automated evaluation method for the risk behaviors of quay crane operators at ports. Building upon the existing automated quay crane remote operation simulator, a risk simulation module is designed, and a Deep Q-Network (DQN) model is constructed to learn the operational methods and specifications of skilled operators. This model, trained using a large amount of simulated operational data, accurately reflects the operational behaviors and decision-making processes of skilled operators. Based on this model, a method for evaluating operators’ risk behaviors is proposed. Through interaction with the baseline model, the system can simulate various operational scenarios, monitor and evaluate operators’ behaviors in real time, and provide an objective and effective assessment of their risk behaviors.
2. Risk Simulation Module Design
In container port operations, improper handling by drivers can lead to severe property damage and casualties, making the standardization of driver operations and the evaluation of operational risks crucial. To ensure the scientific rigor and effectiveness of operational training and risk assessment methods for remote quay crane operators at ports, this study adds a risk simulation module to the existing automated quay crane remote operation intelligent simulation system.
Figure 1 shows the original automated quay crane remote operation simulator, which comprises a model physics engine module, a model motion control module, and a remote crane monitoring system data management module. This simulator not only accurately replicates the real operational scenarios of automated quay cranes but also enables the training of novice operators’ operational capabilities [30].
The risk simulation module is developed around one of the most significant risks that remote quay crane operators face during operations: collisions. It identifies and collects key data that could lead to collision risks during operations. These data help determine which operational methods and strategies employed by drivers are effective and safe. Therefore, in the automated quay crane remote operation simulator, this study uses the simulator’s sensor system to capture environmental information and record key data at every time point during the skilled drivers’ operations, including actions, speed, position, and the safety distance between the crane and the operational target. The specific data recorded are listed in Table 2, and Figure 2 illustrates the parameters corresponding to Table 2. The physical quantities recorded in this study are classified into constants, control variables, and passive measurements. Control variables are those that the driver can manually control via the joystick, while passive measurements are the variables measured by the system during the driver’s operation. All variables in this study use the International System of Units (SI), and the ranges of other variables are specified in Equations (10) and (11).
These data help identify the potential risks that drivers may encounter under different operational conditions. For example, the speed of the trolley enables drivers to learn how to control it in various operating environments, particularly when approaching obstacles and target positions, to avoid collisions. The swinging angle and speed of the load are used to train drivers on their sensitivity to the swinging angle and how to effectively control the swinging speed, thereby improving their skills in maintaining load stability. The position of the spreader allows drivers to grasp the real-time location of the spreader, enhancing their spatial awareness of both the spreader and the surrounding environment, thus avoiding collisions. The relative distance between the load and the target position trains drivers to accurately judge and operate toward the target, reducing the risk of collisions caused by errors in distance judgment. The relative distance between the container and the truck pallet helps drivers precisely control the load position during loading and unloading, preventing collisions between the container and the pallet. The relative distance between the left and right sides of the load’s operational path along the side of the ship and obstacles is intended to train drivers to operate in complex environments, improving their perception and avoidance of obstacles on both sides. The relative distance between the load and obstacles is designed to keep drivers vigilant about the distance between the load and obstacles during operations, thereby avoiding collisions. The relative distance between the load and the top of the target container helps drivers accurately control the height of the load, preventing collisions with the top of the target container and avoiding container collapse.
3. Baseline Model Design Based on a DQN
This study uses the DQN algorithm to establish a baseline model for the operational behaviors of skilled quay crane operators. The DQN is a value-based deep reinforcement learning (DRL) method that combines deep neural networks with the Q-learning algorithm to train an agent (in this case, the simulated port operator) to make optimal decisions in a specific environment.
3.1. State Space
The state space represents the relevant characteristics of the port operation environment and is typically used as the input to DRL. In this model, the input state information is collected from the virtual operation simulator’s sensors and includes data on the relative position and speed of the spreader and the container. The specific state information comprises the load mass m, wire rope length l, wire rope lifting speed, trolley speed, swing angle of the spreader, swing speed of the load, swing angular velocity, time taken to perform each operation t, spreader position, trolley position, relative distance between the load and the target position, relative distance between the container and the truck pallet, relative distance between the operational path along the ship’s side and obstacles on both sides, relative distance between the load and obstacles, and relative distance between the load and the top of the target container, totaling 15 states.
The final state space is defined as the vector of these 15 quantities, which represents the status of the spreader at any given moment in the environment.
3.2. Reward Function
In the process of learning and training the operational model of skilled drivers, the design of the reward function plays a crucial role, as it determines the effect and efficiency of the neural network’s training. The reward function serves as an indicator to evaluate the effectiveness and safety of the operator’s actions with a results-oriented approach. The reward function in this study is designed as a combination of operational safety, operational efficiency, and load stability.
3.2.1. Operational Safety
In terms of safety, the spreader needs to avoid collisions with obstacles and successfully complete the lifting task. This paper sets different safety rewards according to the actual operating conditions and adds a danger zone around the obstacles so that the spreader learns to steer clear of obstacles and reach the target point more quickly.
If the spreader collides with an obstacle, the driver receives a significant penalty to strongly discourage dangerous operations. At each time step in which no collision occurs, a small positive reward is given to the driver, encouraging them to cover as much distance as possible without collisions.
The danger zone is defined as the area where the spreader (whether unloaded or loaded) approaches an obstacle but has not yet collided with it. To help the spreader quickly move away from obstacles, the penalty around obstacles is intensified. However, to avoid trapping the spreader in local situations, the penalties in the danger zone should not be too concentrated and need to be adjusted according to the distance from the nearest obstacle. The reward function for the safety distance therefore depends on the distance between the container and the obstacle at time t. The value of the safety distance is set based on the safety regulations of remote-controlled quay crane operations and the average operating values of skilled drivers.
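As a rough illustration of the safety terms described in this subsection, the following Python sketch combines a collision penalty, a small per-step reward, and a danger-zone penalty that grows as the load nears an obstacle. The function name `safety_reward`, the linear ramp inside the danger zone, and every numeric constant are hypothetical placeholders chosen for the example; they are not the values or the exact functional form used in this study.

```python
# Hypothetical constants: the source does not state these values.
COLLISION_PENALTY = -100.0   # large penalty when a collision occurs
STEP_REWARD = 0.1            # small reward for each collision-free step
SAFE_DISTANCE = 2.0          # metres; assumed radius of the danger zone
MAX_ZONE_PENALTY = -5.0      # assumed penalty as the distance approaches zero


def safety_reward(distance: float, collided: bool) -> float:
    """Safety reward for one time step.

    distance: distance between the container and the nearest obstacle (m).
    collided: True if a collision occurred on this step.
    """
    if collided:
        return COLLISION_PENALTY
    reward = STEP_REWARD
    if distance < SAFE_DISTANCE:
        # Assumed linear ramp: 0 at the zone boundary, MAX_ZONE_PENALTY at contact.
        closeness = 1.0 - max(distance, 0.0) / SAFE_DISTANCE
        reward += MAX_ZONE_PENALTY * closeness
    return reward
```

Spreading the penalty over the whole danger zone, rather than applying one fixed value, follows the requirement above that penalties should vary with the distance to the nearest obstacle.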
The overall operational safety reward function combines the collision reward and the safety-distance reward.
3.2.2. Operational Efficiency
In terms of operational efficiency, the objective of the efficiency reward is to encourage the model to complete tasks as quickly as possible while maintaining safety and accuracy. Therefore, the efficiency reward function consists of time efficiency, operational accuracy, and safety rewards.
The time efficiency reward is designed based on the time required to complete an operation. For every 5 s by which the actual time taken to complete the operation falls below the target operational time for a loading or unloading task, a reward of +1 point is given.
Operational accuracy rewards are designed to encourage high accuracy upon completion of the task. They are based on the gap between the actual deviation distance from the target position during operation and the ideal deviation for placing the container. The reward is scaled by a reward coefficient for accuracy and decays with this gap at a rate set by a decay parameter.
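The time efficiency and accuracy terms described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: counting only whole 5 s blocks, giving no reward to runs slower than the target, the exponential decay form, and the default values of `coeff` and `decay` are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def time_efficiency_reward(target_time: float, actual_time: float) -> int:
    """+1 point for every full 5 s by which actual_time beats target_time."""
    saved = target_time - actual_time
    # Assumption: only whole 5 s blocks count and slower runs earn nothing.
    return max(0, math.floor(saved / 5.0))


def accuracy_reward(actual_dev: float, ideal_dev: float,
                    coeff: float = 10.0, decay: float = 2.0) -> float:
    """Reward that decays with the excess placement deviation.

    coeff: reward coefficient for accuracy (hypothetical default).
    decay: rate of the reward decay (hypothetical default).
    """
    # Assumption: deviations at or below the ideal deviation earn the full reward.
    excess = max(0.0, actual_dev - ideal_dev)
    return coeff * math.exp(-decay * excess)
```

For example, finishing a task with a 120 s target in 108 s saves two full 5 s blocks, so `time_efficiency_reward(120, 108)` yields 2 points under these assumptions.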
To prevent compromising safety while pursuing efficiency, a safety reward is also set, scaled by a safety reward coefficient whose value lies in the range [0, 1].
The total operational efficiency reward combines the time efficiency, operational accuracy, and safety rewards.
3.2.3. Load Stability
When moving the spreader and container, the load’s maximum swing angle is measured to assess the operational smoothness. A smaller swing angle indicates better load stability. The reward function for load stability is designed as follows.
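According to the site environmental conditions and the actual operation of the driver, the values of a and b in the above formula are chosen to be in line with real site conditions and operational safety requirements. Let the weights of operational safety, operational efficiency, and load stability be $\omega_1$, $\omega_2$, and $\omega_3$, respectively. The final design of the reward function is expressed as
\[ R = \omega_1 R_{\mathrm{safety}} + \omega_2 R_{\mathrm{efficiency}} + \omega_3 R_{\mathrm{stability}} \tag{9} \]
where $R_{\mathrm{safety}}$, $R_{\mathrm{efficiency}}$, and $R_{\mathrm{stability}}$ denote the operational safety, operational efficiency, and load stability rewards, respectively.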
3.3. Network Design
In this study, the DQN network model for driver operations is structured as shown in Figure 3. The input layer takes the current state of the spreader in the simulator as input. The loss function is the Mean Squared Error (MSE), and the activation function is ReLU. The optimizer is Adam, which helps stabilize training. The weights of the neural network are initialized using the Xavier initialization method, which helps mitigate vanishing and exploding gradients by keeping the variance of each layer’s output roughly constant from layer to layer. The evaluation network and the target network share the structure shown in Figure 3, but their update frequencies differ. The evaluation network is updated at every step, while the target network is updated only periodically, when the parameters of the evaluation network are copied to it.
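To make the ingredients named above concrete, the following dependency-free Python sketch builds a small fully connected Q-network with Xavier (Glorot) uniform initialization, ReLU activation, and an MSE loss. It is illustrative only: the single hidden layer, its width of 32, and the 5 output actions are hypothetical, since the actual layer sizes are those of Figure 3; only the 15 input states come from the text. The Adam update step is omitted for brevity.

```python
import math
import random


def xavier_uniform(fan_in: int, fan_out: int, rng: random.Random):
    """Weights drawn from U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


def relu(values):
    return [max(0.0, v) for v in values]


def linear(weights, bias, inputs):
    """Affine layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


class QNetwork:
    """Tiny MLP: state -> hidden (ReLU) -> one Q-value per action."""

    def __init__(self, n_states, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = xavier_uniform(n_states, n_hidden, rng)
        self.b1 = [0.0] * n_hidden
        self.w2 = xavier_uniform(n_hidden, n_actions, rng)
        self.b2 = [0.0] * n_actions

    def forward(self, state):
        hidden = relu(linear(self.w1, self.b1, state))
        return linear(self.w2, self.b2, hidden)


def mse_loss(predicted, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Hypothetical sizes: 15 state inputs (from the text), 32 hidden units, 5 actions.
evaluation_net = QNetwork(n_states=15, n_hidden=32, n_actions=5)
q_values = evaluation_net.forward([0.5] * 15)
```

In practice a deep learning framework would supply these pieces; the hand-written version is used here only so that the initialization rule and the data flow are visible in one place.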
3.4. Model Training and Learning Process
The DQN-based skilled driver training model consists of the following stages: First, before training and learning, it is essential to correctly design the environmental states, skilled operator actions, action strategies, and reward functions. Next, the model learns to obtain a strategy that maximizes future rewards through interaction and continuously adjusts the value function to achieve the optimal strategy. Finally, after the training and learning process is completed, the optimal control strategy can be derived based on state information. Novice drivers can then use this model to learn the operational methods of skilled drivers, guiding them to safely and efficiently complete the entire loading and unloading process of containers.
Figure 4 shows a flowchart of the model training using the DQN algorithm based on skilled driver data. First, the sensor system is used to collect environmental information from the simulated port operation scenario. Then, the current environmental state of the port operation scene is input into the evaluation network, which outputs the Q-value of each action in the action space. Based on the policy, an action $a$ is selected, and the spreader executes it to obtain a reward $r$, transitioning the spreader to a new state $s'$. The current state $s$, action $a$, next state $s'$, and reward $r$ are stored as historical experience in the experience replay pool.
At each time step, a random sample of data is selected from the experience pool to train the evaluation network, allowing it to approximate the optimal action value. Finally, the DQN loss function is computed by combining the Q-values of the evaluation network and the target network, and the network parameters are updated using historical experience data. When the evaluation network has been sufficiently trained, its weights will approximate the optimal parameters. Additionally, the evaluation network periodically copies its parameters to the target network every N time steps to reduce the correlation between the two networks.
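The training loop described above rests on three small mechanisms: an experience replay pool, the temporal-difference target computed from the target network, and the periodic parameter copy. A minimal Python sketch of these pieces follows. The class and function names, the discount factor `gamma`, and the `done` flag marking the end of an episode are assumptions made for the example and are not taken from this study.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size pool of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity: int):
        # The oldest experiences are discarded once capacity is reached.
        self.pool = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.pool.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int, rng: random.Random):
        """Uniform random minibatch, which breaks up temporal correlation."""
        return rng.sample(list(self.pool), batch_size)


def td_target(reward, next_q_values, done, gamma=0.99):
    """Standard DQN target: r + gamma * max over a' of Q_target(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def should_sync(step: int, sync_every: int) -> bool:
    """True on the steps where evaluation weights are copied to the target."""
    return step > 0 and step % sync_every == 0
```

The evaluation network is then trained to minimize the squared difference between its Q-value for the stored action and `td_target(...)`, where `next_q_values` comes from the target network evaluated at the next state.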
4. Evaluation Method Design for Driver Risk Behaviors
In this section, the trained baseline model is used to evaluate the basic skills and operational safety risk behaviors of novice drivers in the remote operation of quay container cranes. Novice drivers interact with the trained baseline model, comparing their operational behaviors with the model’s expected behaviors to assess their capabilities and risk behaviors.
4.1. Operational Difference Measurement
To quantify the disparity between the operation sequences of novice drivers and the optimal ones recommended by the baseline model, this paper compares the state transitions resulting from the novice drivers’ operations with those suggested by the model. This comparison assesses whether the novice drivers’ operations can bring the equipment to the desired state stably and rapidly. The Euclidean distance is used to measure the gap between the actual state transitions and the expected ones.
The Euclidean distance is the straight-line distance between two points in a multi-dimensional space. In this context, it measures the difference between the state achieved by the novice operator’s actions, denoted as $s^{\mathrm{nov}}$, and the expected state derived from the model’s optimal actions, denoted as $s^{\mathrm{opt}}$. Each state consists of multiple dimensions, such as position, speed, and other variables. The Euclidean distance between the novice operator’s state and the model’s optimal state is given by
\[ D\left(s^{\mathrm{nov}}, s^{\mathrm{opt}}\right) = \sqrt{\sum_{i=1}^{n} \left(s_i^{\mathrm{nov}} - s_i^{\mathrm{opt}}\right)^2} \tag{10} \]
where $n$ is the number of state dimensions. In Equation (10), each dimension $i$ of the state vector represents a different variable $x_i$ (e.g., position or speed). To prevent variables with larger ranges from dominating the distance calculation, every variable $x_i$ is normalized before the distance is computed, so that all variables contribute equally to the distance metric regardless of their original scales. The normalization adjusts each variable to the range $[0, 1]$. For a variable $x_i$ with a range of $[x_{i,\min}, x_{i,\max}]$, it is given by
\[ x_i' = \frac{x_i - x_{i,\min}}{x_{i,\max} - x_{i,\min}} \tag{11} \]
Without this normalization, variables with larger ranges would dominate the calculation, leading to biased results that do not accurately reflect the true differences between the state vectors.
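The normalized distance described in this subsection can be sketched in a few lines of Python. The function names and the handling of a zero-width range are assumptions made for the example; the min-max scaling and the Euclidean distance follow the description above.

```python
import math


def min_max_normalize(value, lower, upper):
    """Scale value from [lower, upper] to [0, 1]."""
    if upper == lower:
        return 0.0  # assumption: a constant dimension contributes nothing
    return (value - lower) / (upper - lower)


def state_distance(novice_state, model_state, ranges):
    """Euclidean distance between two states after per-dimension scaling.

    ranges: one (lower, upper) pair per state dimension.
    """
    squared = 0.0
    for nov, opt, (lo, hi) in zip(novice_state, model_state, ranges):
        diff = min_max_normalize(nov, lo, hi) - min_max_normalize(opt, lo, hi)
        squared += diff * diff
    return math.sqrt(squared)
```

With ranges of (0, 10) and (0, 100), a novice state of [3, 40] and a model state of [0, 0] differ by 0.3 and 0.4 after scaling, so neither dimension dominates even though the raw differences are 3 and 40.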
4.2. Safety Evaluation Indicators
In the remote control quay crane operation simulator, it is very important to establish safety evaluation indices, which can help evaluate the operational safety of novice drivers and guide training and improvement. Based on the operational standards followed by quay crane operators and the qualitative manual evaluation criteria, this paper quantifies the safety evaluation indicators. Manual scoring often relies on experiential judgment to assess operators’ risk behaviors, such as visually estimating whether the operating distance is within the safe range, which typically results in considerable deviation. By establishing these safety evaluation indicators, risk behaviors can be quantified, enabling a comprehensive evaluation of novice operators’ operational safety and providing targeted improvement measures accordingly. The specific safety evaluation indicators are described below.
4.2.1. Number of Approaches to Obstacles
The number of times the spreader’s distance from the obstacle is below the safe distance during operation is counted, reflecting the operator’s ability to maintain a safe distance between the equipment and the obstacle.
This paper sets a safe distance threshold. Every time the distance between the spreader and the obstacle falls below this threshold, the count increases, and the total number of such occurrences during the entire operation is recorded.
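One way to compute this indicator from a logged distance series is sketched below in Python. The function name is hypothetical, and the sketch assumes that consecutive samples below the safe distance belong to a single approach, so the counter increases only when the spreader crosses from the safe side to the unsafe side; counting every unsafe sample instead would be an equally simple variant.

```python
def count_approaches(distances, safe_distance):
    """Count entries into the unsafe zone (distance < safe_distance).

    Assumption: consecutive unsafe samples belong to one approach, so the
    counter increases only on a safe -> unsafe transition.
    """
    count = 0
    was_unsafe = False
    for d in distances:
        is_unsafe = d < safe_distance
        if is_unsafe and not was_unsafe:
            count += 1
        was_unsafe = is_unsafe
    return count
```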
4.2.2. Speed Control Consistency
Speed control consistency is evaluated by calculating the standard deviation of the speed changes of the trolley and the spreader, reflecting the driver’s ability to maintain consistent and smooth speed control. For a series of recorded speed values $v_1, v_2, \ldots, v_n$, the average speed $\bar{v}$ is calculated as
\[ \bar{v} = \frac{1}{n}\sum_{i=1}^{n} v_i \tag{12} \]
where $n$ is the total number of recorded speed values, and $v_i$ represents the speed at the $i$-th recorded moment. The standard deviation of speed changes $\sigma_v$ is then calculated as
\[ \sigma_v = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(v_i - \bar{v}\right)^2} \tag{13} \]
A smaller standard deviation indicates better consistency in speed control.
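The speed consistency indicator maps directly onto Python's standard `statistics` module, as in the sketch below. The function name is hypothetical, and the population form of the standard deviation (division by n) is an assumption; `statistics.stdev` gives the sample form (division by n - 1) if that is the intended definition.

```python
import statistics


def speed_consistency(speeds):
    """Return (mean, standard deviation) of the recorded speed samples."""
    mean = statistics.fmean(speeds)
    # Assumption: population form (divide by n); use statistics.stdev
    # for the sample form (divide by n - 1).
    return mean, statistics.pstdev(speeds, mu=mean)
```

A driver who holds a steady trolley speed yields a standard deviation near zero, whereas abrupt joystick corrections widen the spread of the recorded values and raise it.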
4.2.3. Operational Accuracy
Operational accuracy assesses the novice driver’s ability to perform specific tasks (e.g., short-distance movement, long-distance movement, and alignment of the spreader and container). The Euclidean distance is used to calculate the deviation between the target point and the actual stopping point to measure the accuracy of these specific operations.
Let the target point be $(x_t, y_t)$ and the actual stopping point be $(x_a, y_a)$. The difference in operational accuracy can be expressed as
\[ \Delta D = \sqrt{(x_a - x_t)^2 + (y_a - y_t)^2} \tag{14} \]
Here, $x_a$ and $y_a$ are the horizontal and vertical coordinates of the actual stopping point, while $x_t$ and $y_t$ are the horizontal and vertical coordinates of the target point. $\Delta D$ represents the difference in operational accuracy, where a smaller value indicates higher operational accuracy.
4.2.4. Emergency Response Time
The emergency response time measures the driver’s reaction time in the event of sudden incidents (e.g., container tilting during lifting). Sensors record the exact moment when an emergency occurs and the moment when the driver initiates a response. The average emergency response time for a novice driver is calculated as
\[ \bar{T} = \frac{1}{N}\sum_{j=1}^{N}\left(t_j^{\mathrm{resp}} - t_j^{\mathrm{event}}\right) \tag{15} \]
where $t_j^{\mathrm{resp}}$ is the time when the driver responds to the $j$-th event, $t_j^{\mathrm{event}}$ is the time when the $j$-th emergency occurs, and $N$ is the total number of events.
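The averaging step can be sketched in Python as follows. The function name and the input layout (two equally long lists of timestamps in seconds, paired by event index) are assumptions made for the example.

```python
def mean_response_time(event_times, response_times):
    """Average delay between each emergency and the driver's response."""
    if len(event_times) != len(response_times):
        raise ValueError("each event needs exactly one response time")
    if not event_times:
        raise ValueError("at least one event is required")
    delays = [resp - event for event, resp in zip(event_times, response_times)]
    return sum(delays) / len(delays)
```

For instance, emergencies at 10.0 s and 50.0 s answered at 11.5 s and 52.5 s give delays of 1.5 s and 2.5 s, and therefore an average response time of 2.0 s.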
4.2.5. Load Stability
Load stability is assessed by measuring the swing angle of the load during movement. A smaller swing angle indicates better stability.
From the recorded position data, the spreader position $p_s$ and the trolley position $p_t$ are identified. Let the swing angle of the load be $\theta$ and define the unit vector $e$ pointing vertically downward. The difference vector $d$ between the two positions is then
\[ d = p_s - p_t \tag{16} \]
The swing angle $\theta$ is then computed as
\[ \theta = \arccos\left(\frac{d \cdot e}{\lVert d \rVert}\right) \times \frac{180}{\pi} \tag{17} \]
In the formula, $\lVert d \rVert$ is the length of the position difference vector between the spreader and the trolley, and $180/\pi$ is the factor used to convert radians to degrees. Equation (18) adjusts the sign of the angle based on the sign of the $y$ component of the difference vector $d$ to ensure the correct direction of the final angle. When that component indicates that the spreader is in front of the trolley (relative to the negative $y$-axis), the angle is taken as positive; otherwise, it is negative.
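The swing angle computation described above can be sketched in Python as follows. Several details are assumptions made for the example rather than facts from this study: positions are taken as (x, y, z) tuples with z pointing up, the difference vector is the spreader position minus the trolley position, and the angle is treated as positive when the y component of that vector is non-negative. The function name is hypothetical.

```python
import math


def swing_angle_deg(spreader_pos, trolley_pos):
    """Signed swing angle (degrees) of the rope relative to vertical.

    Positions are (x, y, z) tuples with z pointing up (assumed convention).
    """
    d = [s - t for s, t in zip(spreader_pos, trolley_pos)]
    length = math.sqrt(sum(c * c for c in d))
    if length == 0.0:
        raise ValueError("spreader and trolley positions coincide")
    down = (0.0, 0.0, -1.0)  # unit vector pointing vertically downward
    cos_theta = sum(a * b for a, b in zip(d, down)) / length
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    angle = math.acos(cos_theta) * 180.0 / math.pi
    # Assumption: positive angle when the y component of d is non-negative.
    return angle if d[1] >= 0.0 else -angle
```

A spreader hanging directly below the trolley gives an angle of zero, while equal horizontal and vertical offsets give a magnitude of about 45 degrees, with the sign set by the assumed convention.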
6. Discussion
With the improvement of automation levels in ports and the growth of cargo volumes, the requirements for drivers’ operational skills and safety awareness are also increasing. However, traditional training methods are not only costly and risky but also yield unstable training results and insufficient coverage. To address these problems, this study proposes an automated evaluation method that uses virtual reality and the DQN model. This method not only reduces training costs and risks but also improves drivers’ abilities to deal with complex environments and sudden events.
The experimental results of this study show that the automated evaluation method based on the DQN model has higher objectivity and consistency in evaluating operational risk behaviors. The standard deviation and coefficient of variation of the experimental group are significantly lower than those of the control group, indicating that the system can effectively reduce the fluctuations caused by subjective human judgments. This consistency is particularly important for port operations, as the operating environment in ports is complex and variable. A highly stable evaluation method can more accurately reflect drivers’ operational abilities, reduce accidents, and improve overall operational efficiency.
Although only 10 operators participated in our experiment, the scores presented are the averages of multiple evaluations, which supports the validity of the experimental results for verifying this method. In the future, we will consider incorporating more safety risk factors related to quay crane operations and recruiting more participants to further enhance the accuracy of the risk behavior assessment for operators from multiple perspectives.
7. Conclusions
This paper first designs a risk simulation module based on the existing automated quay crane remote operation simulator. This module can simulate potential risks in various real operating scenarios, filling the gap in the current simulator’s capability to model driver operational risks. A DQN-based baseline model is then constructed to learn the operational methods and standards of skilled drivers. This model accurately reflects the operational behaviors and decision-making processes of experienced drivers. Based on this baseline model, an automated evaluation method for drivers’ risk behaviors is proposed. Novice drivers interact with the model to simulate various operational scenarios, allowing the system to monitor and assess their behaviors in real time and score their risk-related actions. This approach not only makes the evaluation process more objective and scientific but also enhances the drivers’ safety risk awareness, thereby reducing errors and accidents during actual operations.
Currently, this experiment has been authorized at only one terminal, resulting in a relatively small sample size limited to the drivers of a single port. Future research can expand to more ports and broader areas to validate the applicability of the model across different operational environments. Additionally, this study can be enhanced by incorporating more sensor data, such as drivers’ physiological states (e.g., fatigue levels and attention levels), to improve the accuracy of the evaluation from multiple dimensions. This will contribute to establishing a more comprehensive port safety evaluation system, providing new perspectives and practical guidelines for the development of the industry.