The underwater intelligent cleaning and inspection robot is designed for the safety inspection of marine oil platform risers and the removal of marine organisms attached to them. It is equipped with an ultra-short baseline (USBL) positioning system, an attitude sensor, a depth sensor, and a compass, enabling precise positioning, attitude awareness, and depth perception. Its propulsion system comprises four horizontal thrusters and four vertical thrusters. The model parameters of the underwater robot are listed in Table 3, and the physical prototype and thruster layout are illustrated in Figure 7.
To verify the robustness of DDPG-LADRC, two simulation scenarios are considered.
5.1. Scenario 1
To verify that combining the DDPG reinforcement learning algorithm with a linear active disturbance rejection controller (LADRC) improves disturbance suppression and control accuracy, the position and attitude of the underwater robot are tracked under time-varying external disturbances. The transient performance of the control system under these perturbations is evaluated to validate the disturbance rejection and robustness of the DDPG-LADRC control scheme. Disturbances are introduced during the movement of the ROV as follows:
The initial conditions for the underwater robot are set as
, with the velocity and angular velocity set as
. Additionally, for the controller parameters, the PID parameters are set as:
The parameters for the Active Disturbance Rejection Control are set as follows:
, because
which means
= 15,
,
The relevant DDPG parameters are listed in Table 2 above. The underwater robot simulation runs for 100 s with a step size of 0.01 s. The proposed control algorithm is compared with PID and fixed-parameter LADRC through three-dimensional and planar trajectory tracking, to assess how much the DDPG-LADRC control strategy improves the system’s transient performance. The trajectory to be tracked, expressed in the inertial coordinate system, is:
First, the feasibility of optimizing the LESO parameters is analyzed. Figure 8 compares the observation errors of the LESO optimized by DDPG with those of the fixed-parameter LESO. It can be observed that the fixed-parameter LADRC controller does not estimate the total disturbance precisely. In contrast, DDPG-LADRC reaches a smaller observation error in a shorter time, despite model parameter uncertainty and strong unknown external disturbances in underwater robot trajectory tracking control. DDPG-LADRC responds quickly to changes in the disturbance and adjusts its control strategy accordingly, thereby enhancing the system’s dynamic performance. This indicates that the optimized observer parameters of DDPG-LADRC are effective.
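The dependence of the observation error on the observer parameters can be illustrated with a minimal sketch. It assumes a single-axis second-order toy plant and the commonly used bandwidth parameterization of a third-order LESO, with gains 3ω_o, 3ω_o², and ω_o³; it is not the LESO formulation or the ROV model used in this paper. The names `leso_gains`, `leso_step`, and `simulate`, and the sinusoidal disturbance, are hypothetical. The sketch only compares two fixed bandwidths, whereas DDPG adjusts the parameters online.

```python
import numpy as np

def leso_gains(omega_o):
    """Gains that place all three observer poles at -omega_o (bandwidth parameterization)."""
    return np.array([3.0 * omega_o, 3.0 * omega_o**2, omega_o**3])

def leso_step(z, y, u, b0, omega_o, dt):
    """One forward-Euler step of a third-order LESO for the toy plant y'' = f + b0*u.

    z = [estimate of y, estimate of y', estimate of the total disturbance f].
    """
    beta = leso_gains(omega_o)
    e = y - z[0]  # output estimation error
    dz = np.array([
        z[1] + beta[0] * e,
        z[2] + b0 * u + beta[1] * e,
        beta[2] * e,
    ])
    return z + dt * dz

def simulate(omega_o, t_end=5.0, dt=0.001):
    """Return the largest disturbance estimation error over the last second."""
    b0 = 1.0                # assumed input gain of the toy plant
    x = np.zeros(2)         # plant state [y, y']
    z = np.zeros(3)         # observer state
    worst = 0.0
    for k in range(int(round(t_end / dt))):
        t = k * dt
        f = np.sin(t)       # hypothetical time-varying disturbance
        u = 0.0             # open loop: only the observer is exercised
        z = leso_step(z, x[0], u, b0, omega_o, dt)
        x = x + dt * np.array([x[1], f + b0 * u])
        if t >= t_end - 1.0:
            worst = max(worst, abs(f - z[2]))
    return worst

if __name__ == "__main__":
    for omega_o in (5.0, 50.0):
        print(f"omega_o = {omega_o:5.1f}  max |f - z3| = {simulate(omega_o):.4f}")
```

In this toy setting the higher bandwidth gives the smaller estimation error, which is the kind of sensitivity to observer parameters that motivates tuning them rather than fixing them in advance.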
Figure 9 shows the three-dimensional trajectory tracking performance of the ROV under the different control schemes, together with the tracking curves in the XY, XZ, and YZ planes. Even in the presence of disturbances, the DDPG-LADRC control scheme achieves precise trajectory tracking, with control performance superior to that of the PID controller and the fixed-parameter LADRC controller, demonstrating stronger robustness. Therefore, parameter optimization based on DDPG can enhance the control performance of LADRC.
The selected evaluation indicators for transient performance are overshoot, settling time, and peak time.
In underwater robot control, overshoot is an important indicator of the dynamic performance of a system. Overshoot is typically measured as the difference between the maximum output value and the steady-state value, and it can also be expressed as a percentage of the steady-state value. A system without overshoot settles at the setpoint without exceeding the target value, indicating that there is no significant overreaction or oscillation during the response. Table 4 shows that DDPG-LADRC maintains its response speed without overshoot, while PID and LADRC exhibit overshoot. When the overshoot is too large, the control system is prone to oscillation. The results indicate that DDPG-LADRC maintains a well-behaved dynamic response and high robustness even in the face of model uncertainty or external disturbances. The parameter optimization effect of DDPG-LADRC is evident, effectively meeting the dynamic performance requirements of the system.
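The overshoot definition above can be sketched as a short function. This is an illustration only, assuming a sampled step-like response that starts below a positive steady-state value (or above a negative one); the function name `percent_overshoot` and the sample values are hypothetical and are not taken from Table 4.

```python
import numpy as np

def percent_overshoot(y, y_ss):
    """Overshoot as a percentage of the steady-state value y_ss.

    Returns 0.0 when the response never passes beyond y_ss.
    """
    y = np.asarray(y, dtype=float)
    if y_ss == 0:
        raise ValueError("percentage overshoot is undefined for a zero steady-state value")
    # The extreme value on the far side of the steady-state value.
    peak = np.max(y) if y_ss > 0 else np.min(y)
    return max(0.0, (peak - y_ss) / y_ss * 100.0)

if __name__ == "__main__":
    oscillatory = [0.0, 0.5, 1.2, 0.95, 1.0]   # hypothetical response with overshoot
    monotonic = [0.0, 0.5, 0.9, 1.0, 1.0]      # hypothetical response without overshoot
    print(percent_overshoot(oscillatory, 1.0))
    print(percent_overshoot(monotonic, 1.0))
```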
In underwater robot trajectory tracking control, the settling time is an important dynamic performance indicator. It reflects how quickly the robot responds to changes in the control signal, and is defined as the time required for the ROV response to enter and remain within an allowable error band (usually ±2% or ±5% of the final value) around the target value. A shorter settling time means that the ROV stabilizes more quickly around the target value, reducing oscillations or instability during the transition. Additionally, a rapid response can better handle external disturbances and changes in the internal parameters of the ROV, enhancing the system’s robustness and stability. Referring to Table 5, the settling time of DDPG-LADRC for the ROV in the
-direction is significantly shorter than that of the other two control strategies, with reductions of 93% and 98%, respectively. In the
-direction, the reductions are 93% and 86%, respectively, and in the
-direction, the reductions are 66.7% and 90%, respectively. For the attitude angles
, the settling time is reduced by 64% and 89%, respectively.
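As a rough illustration of how the settling (adjustment) time and the percentage reductions quoted above can be computed from a sampled response, consider the sketch below. The function names `settling_time` and `percent_reduction`, the ±2% default band, and all numerical values are assumptions chosen for illustration; they are not the data of Table 5.

```python
import numpy as np

def settling_time(t, y, y_ss, band=0.02):
    """Time after which |y - y_ss| stays within band * |y_ss|.

    Returns None if the response is still outside the band at the last sample.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    outside = np.abs(y - y_ss) > band * abs(y_ss)
    if not outside.any():
        return float(t[0])
    last_outside = np.flatnonzero(outside)[-1]
    if last_outside == len(y) - 1:
        return None
    return float(t[last_outside + 1])

def percent_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return (baseline - improved) / baseline * 100.0

if __name__ == "__main__":
    t = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [0.0, 0.5, 1.1, 1.01, 1.0]          # hypothetical response, target value 1.0
    print(settling_time(t, y, y_ss=1.0))     # first sample from which the band is kept
    print(percent_reduction(4.0, 1.0))       # hypothetical settling times in seconds
```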
Even when the overshoot is zero, the system response may still have a “peak”; here the term refers not to a deviation beyond the steady-state value but to the maximum value reached during the response. In the underwater robot trajectory tracking control system, the peak time is an important dynamic performance indicator: it is the time required for the system response to reach this first peak. Referring to Table 6, the comparison of peak times shows that in the
-direction, DDPG-LADRC significantly outperforms the other two control strategies, reducing the peak time by 93% and 98%, respectively. In the
-direction, the reductions are 82% and 90%, respectively, and in the
-direction, the reductions are 80% and 98%, respectively. For the attitude angles
, the peak time is reduced by 93% and 89%, respectively.
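Under the definition used here, where the peak is the maximum value reached during the response, the peak time can be read from sampled data as in the following sketch. The function name `peak_time` and the sample values are hypothetical, and the sketch measures the peak as the largest excursion from the initial value, which is an assumption about how the response is referenced.

```python
import numpy as np

def peak_time(t, y):
    """Time of the first sample at which the response is farthest from its initial value."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = int(np.argmax(np.abs(y - y[0])))  # argmax returns the first maximum
    return float(t[idx])

if __name__ == "__main__":
    t = [0.0, 0.1, 0.2, 0.3, 0.4]
    with_overshoot = [0.0, 0.6, 1.15, 1.05, 1.0]   # hypothetical: peak above the final value
    no_overshoot = [0.0, 0.6, 0.9, 1.0, 1.0]       # hypothetical: peak equals the final value
    print(peak_time(t, with_overshoot))
    print(peak_time(t, no_overshoot))
```

For a response without overshoot, the peak time under this definition coincides with the first time the response reaches its final value.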
In summary, through a comparative analysis of transient performance under different control methods, the results indicate the superiority of the DDPG-LADRC control strategy in terms of transient performance. Compared to PID controllers and traditional LADRC controllers, the proposed DDPG-LADRC is more suitable for underwater robotic systems that are multivariable, strongly coupled, have significant randomness, and are subject to unknown disturbances.
Figure 10 shows the tracking errors corresponding to the trajectory tracking results in Figure 9. Compared to the PID controller and the fixed-parameter LADRC controller, the proposed DDPG-LADRC controller has a smaller steady-state error. The PID and fixed-parameter LADRC control schemes are unable to eliminate steady-state errors in a short time, and therefore cannot track the desired trajectory accurately. In contrast, DDPG-LADRC significantly improves the control accuracy of the system by using DDPG to tune the LADRC parameters online in response to environmental changes. This ensures that the ROV can maintain satisfactory control performance even in the presence of inaccurate model parameters and significant uncertain disturbances.
After 60 s, 1000 sampling points are collected to calculate the root mean square error (RMSE), which quantifies the steady-state accuracy of each control method; the results are presented in Table 7.
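The steady-state RMSE evaluation described above can be sketched as follows. The sketch assumes that the 1000 samples are consecutive simulation steps starting at 60 s (a 10 s window at the 0.01 s step size); the function name `steady_state_rmse` and the synthetic error signal are hypothetical and do not reproduce the data behind Table 7.

```python
import numpy as np

def steady_state_rmse(t, err, t_start=60.0, n_samples=1000):
    """RMSE of the tracking error over n_samples consecutive samples from t_start onward."""
    t = np.asarray(t, dtype=float)
    err = np.asarray(err, dtype=float)
    start = int(np.searchsorted(t, t_start, side="left"))
    window = err[start:start + n_samples]
    if window.size < n_samples:
        raise ValueError("not enough samples after t_start")
    return float(np.sqrt(np.mean(window**2)))

if __name__ == "__main__":
    dt = 0.01
    t = np.arange(10000) * dt                       # 100 s of simulation time
    err = 0.5 * np.exp(-0.2 * t) + 0.01             # hypothetical decaying tracking error
    print(f"steady-state RMSE: {steady_state_rmse(t, err):.5f}")
```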
In underwater robot control, better steady-state accuracy means that the robot can reach the target position more precisely. Simulation results indicate that the designed DDPG-LADRC controller is not only robust but also able to track commands quickly and suppress disturbances. Further comparisons show that the performance of DDPG-LADRC surpasses that of PID and conventional fixed-parameter LADRC. Therefore, parameter optimization based on DDPG can enhance the control performance of LADRC.
5.2. Scenario 2
To further verify the robustness of the controller, the anti-interference capability of different control methods is compared under strong interference. The most representative tracking trajectory during the ROV’s motion is selected (Formula (51)). A dual closed-loop sliding mode control scheme based on a nonlinear extended state observer (NESO-DSMC) [26] is added to the comparison to validate the superiority of the DDPG-LADRC controller’s performance.
The parameters for the Active Disturbance Rejection Control are set as follows:
, Because
which means
,
,
The relevant DDPG parameters are listed in Table 2 above. In addition, the controller parameters of the NESO-DSMC are chosen as follows:
,
[26].
The applied external disturbance is given in Equation (50). It depends on the state of the ROV and therefore varies continuously.
The tracked trajectory is shown in Formula (51). This trajectory indicates that the ROV first descends vertically, then performs linear back-and-forth and spiral movements on a horizontal plane, accompanied by changes in depth and adjustments in heading, ultimately returning to a horizontal straight path. The initial position and attitude of the ROV are set as:
.
The simulation results in Figure 11 show the disturbance estimates of DDPG-LADRC and the corresponding observation error curves for the three state variables
. The maximum observation error value for the observer in the
-direction is 0.00141, the maximum observation error in the
-direction is 0.0016, and the maximum observation error in the
-direction is 0.0021. The DDPG-optimized LESO thus achieves a disturbance estimation accuracy that meets the requirements.
Figure 12 shows that LADRC, because its controller parameters are fixed, is unable to eliminate steady-state errors in a short time. Under continuously changing external disturbances, LADRC cannot achieve optimal control performance. In the presence of significant uncertain disturbances, NESO-DSMC cannot reach the same level of error convergence accuracy as DDPG-LADRC.
Table 8 and Table 9 report the root mean square error (RMSE) and mean absolute error (MAE) under the different control methods, indicating that DDPG-LADRC is more robust than LADRC and NESO-DSMC. DDPG-LADRC eliminates steady-state errors within 5 s because it uses DDPG to adjust the LADRC parameters online in response to uncertain disturbances caused by environmental changes, significantly improving the control accuracy of the system. This ensures that the ROV can maintain satisfactory control performance even in the presence of inaccurate model parameters and significant uncertain disturbances.
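For reference, the two error measures can be written in their standard form. The symbols are assumptions introduced here for illustration: e_k denotes the tracking error of one state variable at sample k, and N is the number of samples in the evaluation window.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N} e_k^{2}},
\qquad
\mathrm{MAE} = \frac{1}{N}\sum_{k=1}^{N} \lvert e_k \rvert
```

Because the errors are squared before averaging, RMSE weights occasional large deviations more heavily than MAE, so the two measures together give a fuller picture of tracking accuracy than either one alone.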