Fuzzy Reinforcement Learning Disturbance Cancellation Optimized Course Tracking Control for USV Autopilot Under Actuator Constraint

Gao, Xiaoyang; Hu, Xin; Yang, Ang

doi:10.3390/jmse13081429


by Xiaoyang Gao ¹, Xin Hu ²,* and Ang Yang ¹

¹ School of Maritime Economics and Management, Dalian Maritime University, Dalian 116026, China

² School of Mathematics and Statistics Science, Ludong University, Yantai 264025, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2025, 13(8), 1429; https://doi.org/10.3390/jmse13081429

Submission received: 25 June 2025 / Revised: 22 July 2025 / Accepted: 25 July 2025 / Published: 27 July 2025

(This article belongs to the Special Issue Control and Optimization of Ship Propulsion System)


Abstract

Course control of unmanned surface vehicles (USVs) is a vital branch of ship motion control research and a key technology for developing critical marine equipment. To address the model uncertainties, external marine disturbances, performance optimization, and actuator constraints encountered by the autopilot system, this paper proposes a composite disturbance cancellation optimized control method based on fuzzy reinforcement learning. First, a finite-time disturbance observer is coupled with a fuzzy logic system to estimate and reject the composite disturbance composed of internal model uncertainty and ocean disturbances. Second, a modified backstepping technique is employed to design the autopilot controller and construct the error system. Based on the designed performance index function, fuzzy reinforcement learning is used to derive an optimized compensation term for the error system. Meanwhile, to address actuator saturation, an auxiliary system is introduced to modify the error surface, reducing the impact of saturation on the system. Finally, the stability of the autopilot system is proved using Lyapunov stability theory. Simulation studies on the ocean-going training ship “Yulong” demonstrate the effectiveness of the proposed algorithm: under both the weak and the strong ocean conditions considered, the tracking error converges within 7 s.

Keywords:

USV course tracking; reinforcement learning; optimized control; composite disturbance cancellation

1. Introduction

With global maritime powers actively promoting intelligent marine vessel technologies, the unmanned surface vehicle (USV) has emerged as a pivotal platform for marine environmental monitoring, patrol, search-and-rescue, and resource exploration, owing to its autonomy, mobility, and cost effectiveness [1,2,3]. As a fundamental guarantee for USV mission execution, course control has become a critical branch of ship motion control research. However, course control requires high-precision tracking under complex marine conditions and with constrained mechanical equipment [4,5,6]. The USV propulsion system is inherently challenged by strong nonlinearity and environmental disturbances. Meanwhile, achieving low-cost control has become an urgent issue for maintaining the long-term operation of USVs. In addition, the autopilot system is subject to physical constraints; exceeding the rated limits will inevitably lead to actuator failure, causing the vehicle to deviate from the predetermined route and even endangering navigation safety. Under such circumstances, constructing a coordinated control scheme that integrates disturbance elimination, performance optimization, and rudder angle constraints has become a key challenge in improving USV safety [7,8,9,10].

The USV autopilot system typically relies on gyrocompass feedback for heading measurement, while the heading rate can be obtained from rate sensors, gyroscopes, numerical differentiation of heading measurements, or state estimators. The first control scheme implemented on ships was a PID-based autopilot [11], followed by control techniques such as backstepping [12]. However, these early control algorithms showed limited effectiveness against various disturbances. To enhance USV adaptability in complex sea conditions and achieve stable course control, methods including adaptive control [13,14,15,16,17,18], disturbance observer-based control [19,20], and H∞ robust control [21,22] have been developed. For example, Zhu et al. [23] designed an adaptive backstepping controller using fuzzy approximators to handle unknown internal uncertainties, while [21] proposed a robust fault-tolerant autopilot control scheme to suppress disturbances and address actuator failures. Notably, both approaches focus on disturbance suppression rather than estimation and elimination. In contrast, [19,20] use disturbance estimation and cancellation techniques, which are better suited to high-precision control requirements. Concurrently, research on composite disturbances (combining internal uncertainties and external perturbations) has evolved [24], coupling neural networks or fuzzy logic systems (FLSs) with finite-time disturbance observers to co-design disturbance cancellation control schemes. However, these studies prioritize anti-disturbance control for USV safety while neglecting optimal performance indices, which represent critical engineering demands.

For nonlinear autopilot systems, traditional linear optimal control cannot directly yield optimal solutions, while solving the Hamilton–Jacobi–Bellman (HJB) partial differential equation analytically remains intractable [25]. Reinforcement learning (RL) has emerged as an effective approach to this challenge [26,27,28], fostering RL-based autopilot controller designs in recent years [29,30]. References [31,32] proposed nonlinear optimal control schemes for USV autopilots using function approximators to construct critic and actor networks. However, while addressing nonlinear optimization, these studies overlook anti-disturbance design, which compromises adaptability in complex marine environments. Additionally, USV autopilot actuators are subject to physical constraints (e.g., 35° rudder angle limits), highlighting the importance of a coordinated control scheme that integrates composite disturbance cancellation, performance optimization, and actuator saturation handling.

Building on the above analyses, this paper presents a fuzzy RL-based composite disturbance cancellation optimal saturation control scheme for USVs. The approach couples finite-time disturbance observers with fuzzy logic systems, designs optimal controllers for the performance index functions via fuzzy RL, and introduces an auxiliary system to modify the error surface and mitigate saturation effects. Table 1 compares the proposed USV RL scheme with existing methods. The novel contributions of this work are as follows.

Composite Disturbance Cancellation with RL Optimization. This study presents a composite disturbance cancellation optimized autopilot control scheme by integrating an FLS with a finite-time disturbance estimator and incorporating this coupled design into an RL framework. The proposed strategy not only achieves precise disturbance rejection but also minimizes the performance index, thereby potentially reducing the energy consumption of USVs.
RL-Optimized Control with Actuator Constraint. In USV control systems, actuator constraints (e.g., rudder angle limits) pose critical challenges for maintaining both control precision and system safety. This study proposes an RL framework that uses the tanh function to address actuator saturation, complemented by an auxiliary system for error surface modification.

The remainder of this paper is structured as follows. Section 2 formulates the problem statement, while Section 3 details the design of the disturbance cancellation optimized controller with actuator constraints. Section 4 demonstrates the simulation results. Finally, conclusions are drawn in Section 5.

2. Problem Formulation and Preliminaries

This section describes the USV autopilot model and the fuzzy logic system (FLS).

2.1. Problem Formulation

The planar motion of the USV is illustrated in Figure 1, where $u$ and $v$ denote the forward (surge) speed and the sway speed, respectively, and $\Psi$ and $\varphi$ denote the heading angle and the rudder angle. $O_B X_B Y_B$ is the body-fixed coordinate system, with origin $O_B$ at the center of gravity of the USV; $O_E X_E Y_E$ is the earth-fixed (geodetic) coordinate system, with origin $O_E$ and the $O_E X_E$ axis pointing north.

The model of the USV autopilot control system is established as [34]

$$\begin{cases} \dot{\varphi} = r \\ \dot{r} = H(r) + g\,c(\tau) + d \end{cases} \tag{1}$$

where $\varphi$ is the heading angle of the USV; $r$ is the heading angular velocity; $g = K/T$ is the control gain, in which $K$ and $T$ are the turning index and the following index, both maneuverability indices of the USV; $c(\tau)$ denotes the constrained actuator output; $d$ is the unknown environmental disturbance; and $H(r)$ is a nonlinear function that can be approximated as

$$H(r) = -a_1 r - a_2 r^3 - a_3 r^5 - \cdots \tag{2}$$

where the constants $a_i$ are the nonlinearity coefficients of the ship. Owing to physical limitations, the rudder is subject to the saturation constraint

$$c(\tau) = \begin{cases} \operatorname{sign}(\tau)\,\tau_M, & |\tau| \ge \tau_M \\ \tau, & |\tau| < \tau_M \end{cases}$$

Clearly, the mapping from the control input $\tau$ to the applied control $c(\tau)$ has a sharp corner at $|\tau| = \tau_M$, so non-smooth saturated control signals enter the control system. To avoid compromising system stability and accelerating rudder wear, the constraint in (1) is approximated by the smooth function

$$u(\tau) = \tau_M \tanh\!\left(\frac{\tau}{\tau_M}\right) \tag{3}$$

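where $\tau_M > 0$ is the saturation bound. The approximation error $e(\tau) = c(\tau) - u(\tau)$ satisfies

$$|e(\tau)| = |c(\tau) - u(\tau)| \le \tau_M\bigl(1 - \tanh(1)\bigr) = e_M,$$

that is, $e(\tau)$ is bounded by $e_M$. The maximum is attained at $|\tau| = \tau_M$, because $|e(\tau)|$ increases on $|\tau| < \tau_M$ and decreases on $|\tau| > \tau_M$.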
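The bound $e_M$ can be checked numerically. The sketch below is an illustrative check, not part of the paper's method: it assumes $\tau_M = 35^\circ$ (the limit used in Section 4), and the helper names `c_sat`, `u_smooth` and `max_approx_error` are hypothetical. It evaluates $|c(\tau) - u(\tau)|$ on a dense grid and compares the largest value with $\tau_M(1 - \tanh(1))$.

```python
import math

TAU_M = 35.0  # rudder limit in degrees (value taken from the simulation section)


def c_sat(tau, tau_m=TAU_M):
    """Hard saturation c(tau): clip tau to [-tau_m, tau_m]."""
    return math.copysign(tau_m, tau) if abs(tau) >= tau_m else tau


def u_smooth(tau, tau_m=TAU_M):
    """Smooth approximation u(tau) = tau_m * tanh(tau / tau_m), Eq. (3)."""
    return tau_m * math.tanh(tau / tau_m)


def max_approx_error(tau_m=TAU_M, span=5.0, n=200001):
    """Largest |c(tau) - u(tau)| on a uniform grid over [-span*tau_m, span*tau_m]."""
    lo, hi = -span * tau_m, span * tau_m
    step = (hi - lo) / (n - 1)
    return max(abs(c_sat(lo + k * step, tau_m) - u_smooth(lo + k * step, tau_m))
               for k in range(n))


e_M = TAU_M * (1.0 - math.tanh(1.0))  # analytical bound from the text
print(f"e_M = {e_M:.4f} deg, grid maximum = {max_approx_error():.4f} deg")
```

On the grid, the largest error sits next to $|\tau| = \tau_M$ and agrees with $e_M \approx 8.344^\circ$ up to the grid resolution.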

The following assumptions are made.

Assumption 1 ([23]). The desired signal $y_r$ is smooth and bounded, and its derivatives up to second order are continuous and bounded.

Assumption 2 ([33]). The unknown environmental disturbance $d$ is time-varying, and its first-order derivative is bounded with maximum value $d_M$, i.e., $|\dot{d}| \le d_M$.

The control objective of this study is to develop a fuzzy reinforcement learning-based composite disturbance cancellation optimal control strategy for USV autopilots subject to actuator constraints, such that the USV heading accurately tracks the desired signal while the performance index function is minimized, and all signals in the closed-loop system remain bounded.

2.2. Fuzzy Logic Systems (FLSs)

FLSs are used to approximate the uncertainties. Following [33,35,36], an FLS consists of four components: a fuzzy rule base that encodes domain knowledge, a singleton fuzzifier that converts crisp inputs into fuzzy sets, a fuzzy inference engine that applies inference mechanisms to generate fuzzy outputs, and a defuzzifier that converts the fuzzy results back into crisp values for practical use. The fuzzy rule base consists of IF-THEN rules of the form

$R^L$: IF $X_1$ is $F_1^L$, …, and $X_n$ is $F_n^L$, THEN $Y$ is $G^L$, $L = 1, \dots, N$,

where $X = [X_1, \dots, X_n]^\top$ is the input, $Y$ is the output, and $F_i^L$ and $G^L$ denote fuzzy sets. The FLS can be expressed as

$$Y = \frac{\sum_{L=1}^{N} w_L \prod_{i=1}^{n} s_{F_i^L}(X_i)}{\sum_{L=1}^{N} \prod_{i=1}^{n} s_{F_i^L}(X_i)} \tag{4}$$

where $w_L$ is the point at which the output membership function $s_{G^L}(Y)$ attains its maximum, i.e., $w_L = \arg\max_{Y \in \mathbb{R}} s_{G^L}(Y)$. The fuzzy basis functions are

$$s_L(X) = \frac{\prod_{i=1}^{n} s_{F_i^L}(X_i)}{\sum_{L=1}^{N} \prod_{i=1}^{n} s_{F_i^L}(X_i)} \tag{5}$$

Letting $w = [w_1, \dots, w_N]^\top$ and $s(X) = [s_1(X), \dots, s_N(X)]^\top$, the FLS can be written as $Y = w^\top s(X)$.
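Equations (4) and (5) can be made concrete with a short sketch. It assumes Gaussian membership functions of unit width (the form used for the basis functions in Section 4) and product inference; `fuzzy_basis` and `fls_output` are hypothetical names, and `math.prod` requires Python 3.8 or later. Because the basis functions are normalized, they sum to one, so $Y = w^\top s(X)$ is a weighted average of the rule weights $w_L$.

```python
import math


def gaussian(x, center, width=1.0):
    """Gaussian membership value exp(-(x - center)^2 / (2 width^2))."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def fuzzy_basis(X, centers, width=1.0):
    """Normalized fuzzy basis vector s(X), Eq. (5).

    centers[L][i] is the center of membership function F_i^L; each rule fires
    with the product of its per-input memberships (product inference).
    """
    firing = [math.prod(gaussian(x, c, width) for x, c in zip(X, rule))
              for rule in centers]
    total = sum(firing)
    return [f / total for f in firing]


def fls_output(w, X, centers, width=1.0):
    """FLS output Y = w^T s(X), the center-average defuzzifier of Eq. (4)."""
    return sum(wl * sl for wl, sl in zip(w, fuzzy_basis(X, centers, width)))


# Five single-input rules centred at -2..2, mirroring delta(alpha) in Section 4.
centers = [[c] for c in (-2.0, -1.0, 0.0, 1.0, 2.0)]
s = fuzzy_basis([0.3], centers)
print([round(v, 3) for v in s], round(sum(s), 12))
print(fls_output([-2.0, -1.0, 0.0, 1.0, 2.0], [0.3], centers))
```

The same functions handle several inputs: each entry of `centers` then lists one center per input, and the firing strength is the product over inputs, as in the numerator of (5).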

Lemma 1 (Universal Approximation Property of Fuzzy Logic Systems [37]). Let $f(\cdot)$ be a continuous function defined on a compact set $\Omega$. Then, for any $\epsilon > 0$, there exists an FLS such that

$$\sup_{x \in \Omega} \left| f(x) - w^\top s(x) \right| \le \epsilon \tag{6}$$

3. RL-Based Composite Disturbance Cancellation Optimized Tracking Control Design

The FLS is deployed to approximate the system uncertainties, handling the unknown dynamics, while a finite-time disturbance observer is constructed to estimate external disturbances accurately and quickly. To handle the actuator constraint, an auxiliary system is integrated into the control framework to compensate for the effect of saturation on the error surface. The optimal controller $\tau$ is formed as the sum of the backstepping-based controller $\tau^{\Theta}$ and the RL-optimized term $\tau^{\Xi}$, i.e., $\tau = \tau^{\Theta} + \tau^{\Xi}$. The stability of the entire autopilot control system is proven using Lyapunov theory.

3.1. Construction of Finite-Time Disturbance Observer (FTDO)

An FTDO is constructed for disturbance estimation. Define $p_1 = r$ and $p_2 = d$. System (1) can then be extended and rewritten as

$$\begin{cases} \dot{p}_1 = H(r) + g\,c(\tau) + p_2 \\ \dot{p}_2 = \dot{d} \end{cases} \tag{7}$$

The FTDO is designed as [19]

$$\begin{cases} \dot{\hat{p}}_1 = \hat{H}(r) + g\,u(\tau) + \hat{p}_2 - \mu_1 \operatorname{sig}^{1/2}(\hat{p}_1 - p_1) \\ \dot{\hat{p}}_2 = -\mu_2 \operatorname{sgn}(\hat{p}_1 - p_1) \end{cases} \tag{8}$$

where $\mu_1$ and $\mu_2$ are positive constants and $\operatorname{sig}^{1/2}(\hat{p}_1 - p_1) = \operatorname{sgn}(\hat{p}_1 - p_1)\,|\hat{p}_1 - p_1|^{1/2}$.

Defining $\tilde{p}_i = p_i - \hat{p}_i$ and subtracting (8) from (7), the observation error system is

$$\begin{cases} \dot{\tilde{p}}_1 = \tilde{p}_2 + \mu_1 \operatorname{sig}^{1/2}(\hat{p}_1 - p_1) + \tilde{H}(r) + g\,e(\tau) \\ \dot{\tilde{p}}_2 = \dot{d} + \mu_2 \operatorname{sgn}(\hat{p}_1 - p_1) \end{cases} \tag{9}$$

where $\hat{H}(r)$ and $\tilde{H}(r) = H(r) - \hat{H}(r)$ denote the FLS estimate of $H(r)$ and its estimation error, respectively.

The finite-time stability of the proposed FTDO can be proved following [38] and is not elaborated further here.
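The observer (8) can be tried out with a short forward-Euler simulation. This is a hedged sketch under simplifying assumptions that are not from the paper: the FLS estimate is taken as exact ($\hat{H} = H$, with $H(r) = -a_1 r - a_2 r^3$), the rudder command is held at $\tau = 0$ so that $c(\tau) = u(\tau) = 0$, and the Case 1 disturbance of Section 4 is used with $K_0/T_0$ read as $K/T$. The gains $\mu_1 = 1$, $\mu_2 = 2.5$ and the initial estimates of 0.1 follow Section 4; the step size and horizon are arbitrary choices.

```python
import math

# Values from Section 4 (Yulong model and FTDO gains); the rest is assumed.
K, T = 0.478, 216.0
g = K / T
a1, a2 = 1.0, 30.0
mu1, mu2 = 1.0, 2.5


def H(r):
    """Truncated nonlinearity H(r) = -a1 r - a2 r^3 from Eq. (2)."""
    return -a1 * r - a2 * r ** 3


def sgn(x):
    return (x > 0) - (x < 0)


def sig_half(x):
    """sig^{1/2}(x) = sgn(x) |x|^{1/2}."""
    return sgn(x) * math.sqrt(abs(x))


def d_case1(t):
    """Case 1 disturbance, with K0/T0 taken as K/T (assumption)."""
    return (1.0 / (K / T)) * 0.01 * math.sin(0.05 * t) + 0.007 * math.cos(0.03 * t)


def simulate_ftdo(t_end=40.0, dt=1e-3):
    """Euler run of plant (1) and FTDO (8); returns (t, p2_hat, d) samples."""
    r = 0.0                    # yaw rate, p1 = r
    p1_hat, p2_hat = 0.1, 0.1  # initial observer states (Section 4)
    u_tau = 0.0                # rudder held at zero, so c(tau) = u(tau) = 0
    log = []
    for k in range(int(round(t_end / dt))):
        t = k * dt
        err = p1_hat - r
        r_dot = H(r) + g * u_tau + d_case1(t)                         # plant, Eq. (1)
        p1_hat_dot = H(r) + g * u_tau + p2_hat - mu1 * sig_half(err)  # Eq. (8), H_hat = H
        p2_hat_dot = -mu2 * sgn(err)
        r += dt * r_dot
        p1_hat += dt * p1_hat_dot
        p2_hat += dt * p2_hat_dot
        log.append((t + dt, p2_hat, d_case1(t + dt)))
    return log


log = simulate_ftdo()
worst = max(abs(p2 - d) for t, p2, d in log if t >= 30.0)
print(f"max |p2_hat - d| over the last 10 s: {worst:.4f}")
```

After a short transient, $\hat{p}_2$ follows $d$ up to the small chattering that the discontinuous $\operatorname{sgn}$ term produces under a fixed step, which is the behavior the finite-time result of [38] describes for this super-twisting-type structure.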

3.2. Composite Disturbance Cancellation Control Design

A composite disturbance cancellation controller is designed via adaptive backstepping combined with an auxiliary system. First, the following error surfaces are defined:

$$z_1 = \varphi - y_r \tag{10}$$

$$z_2 = r - \alpha - \lambda \tag{11}$$

where $\alpha = \alpha^{\Theta} + \alpha^{\Xi}$ is the optimized virtual control, in which $\alpha^{\Theta}$ is the backstepping virtual control and $\alpha^{\Xi}$ is the optimized virtual term. $\lambda$ is the state of the auxiliary system, designed as

$$\dot{\lambda} = -\lambda + g\bigl(u(\tau) - \tau\bigr) \tag{12}$$
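A short sketch of the auxiliary system (12) under an assumed command profile: $\tau$ is held at 60°, beyond $\tau_M = 35^\circ$, for 10 s and then returned to zero. $K$, $T$ and $\tau_M$ follow Section 4; the command profile, step size and function names are hypothetical. The state $\lambda$ accumulates the gap $g(u(\tau) - \tau)$ between the achievable and the commanded input, and decays exponentially once the command returns to zero.

```python
import math

K, T = 0.478, 216.0
g = K / T
TAU_M = 35.0


def u_smooth(tau):
    """Smooth actuator model u(tau) = tau_M tanh(tau / tau_M), Eq. (3)."""
    return TAU_M * math.tanh(tau / TAU_M)


def command(t):
    """Hypothetical rudder command: 60 deg for 10 s, then zero."""
    return 60.0 if t < 10.0 else 0.0


def simulate_aux(t_end=30.0, dt=1e-3, lam0=0.0):
    """Forward-Euler integration of lambda_dot = -lambda + g (u(tau) - tau), Eq. (12)."""
    lam = lam0
    hist = []
    for k in range(int(round(t_end / dt))):
        t = k * dt
        tau = command(t)
        lam += dt * (-lam + g * (u_smooth(tau) - tau))
        hist.append((t + dt, lam))
    return hist


hist = simulate_aux()
lam_sat = hist[9999][1]  # lambda at t = 10 s, end of the saturated phase
lam_end = hist[-1][1]    # lambda at t = 30 s
print(f"lambda(10 s) = {lam_sat:.4f}, lambda(30 s) = {lam_end:.2e}")
```

Since $\lambda$ enters $z_2$ in (11), the saturation gap is absorbed into the error surface rather than appearing as an unmodeled input error.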

Step 1: Differentiating $z_1$ in (10) and using (1) and (11) gives

$$\begin{aligned} \dot{z}_1 &= \dot{\varphi} - \dot{y}_r \\ &= r - \dot{y}_r \\ &= z_2 + \alpha^{\Theta} + \alpha^{\Xi} + \lambda - \dot{y}_r \end{aligned} \tag{13}$$

Choose the Lyapunov function candidate

$$V_1 = \frac{1}{2} z_1^2 \tag{14}$$

The time derivative of $V_1$ is

$$\dot{V}_1 = z_1 \dot{z}_1 = z_1\bigl(z_2 + \alpha^{\Theta} + \alpha^{\Xi} + \lambda - \dot{y}_r\bigr) \tag{15}$$

Design the virtual control as

$$\alpha^{\Theta} = -k_1 z_1 - \lambda + \dot{y}_r \tag{16}$$

where $k_1$ is a positive design constant.

Substituting (16) into (15) yields

$$\dot{V}_1 = -k_1 z_1^2 + z_1 \alpha^{\Xi} + z_1 z_2 \tag{17}$$
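The substitution behind (17) can be written out explicitly; this expansion only combines (13), (15), and (16):

$$\dot{z}_1 = z_2 + \left(-k_1 z_1 - \lambda + \dot{y}_r\right) + \alpha^{\Xi} + \lambda - \dot{y}_r = -k_1 z_1 + z_2 + \alpha^{\Xi},$$

$$\dot{V}_1 = z_1 \dot{z}_1 = -k_1 z_1^2 + z_1 \alpha^{\Xi} + z_1 z_2.$$

The auxiliary state $\lambda$ and the reference derivative $\dot{y}_r$ cancel exactly, and the coupling term $z_1 z_2$ is left to be cancelled by the $-z_1$ term of the actual control (21) in Step 2.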

Step 2: Differentiating the second error surface (11) and using (1), (12), and $e(\tau) = c(\tau) - u(\tau)$ gives

$$\begin{aligned} \dot{z}_2 &= \dot{r} - \dot{\alpha} - \dot{\lambda} \\ &= H(r) + g\,c(\tau) + d + \lambda - g\,u(\tau) + g\tau - \dot{\alpha} \\ &= g\tau + g\,e(\tau) + f(\theta) + d + \lambda - \dot{\alpha} + \omega^\top \delta(\alpha) + \eta_{\alpha} \end{aligned} \tag{18}$$

where $\theta = [r, \alpha]$ and $f(\theta) = H(r) - H(\alpha)$. The uncertainties $H(r)$ and $H(\alpha)$ can be handled by the FLS as $H(\cdot) = \omega^\top \delta(\cdot) + \eta_{\cdot}$, where $\eta_{\cdot}$ denotes the approximation error. The actual control is $\tau = \tau^{\Theta} + \tau^{\Xi}$.

Consider the Lyapunov function candidate

$$V_2 = V_1 + \frac{1}{2} z_2^2 + \frac{1}{2} \tilde{\omega}^\top \tilde{\omega} \tag{19}$$

The time derivative of $V_2$ along (18) is

$$\begin{aligned} \dot{V}_2 = {} & -k_1 z_1^2 + z_1 \alpha^{\Xi} + z_1 z_2 + z_2\bigl[g(\tau^{\Theta} + \tau^{\Xi}) + g\,e(\tau) + f(\theta) + d + \lambda - \dot{\alpha} \\ & + \omega^\top \delta(\alpha) + \eta_{\alpha}\bigr] + \frac{1}{2} \tilde{\omega}^\top \dot{\hat{\omega}} \end{aligned} \tag{20}$$

The actual controller and the adaptation law are designed as

$$\tau^{\Theta} = g^{-1}\bigl[-z_1 - k_2 z_2 - \hat{\omega}^\top \delta(\alpha) - \lambda - \hat{d} + \dot{\alpha}\bigr] \tag{21}$$

$$\dot{\hat{\omega}} = z_2 \delta(\alpha) - k_3 \hat{\omega} \tag{22}$$

where $k_2$ and $k_3$ are positive design constants. Substituting (21) and (22) into (20) yields

$$\begin{aligned} \dot{V}_2 = {} & -k_1 z_1^2 - k_2 z_2^2 + z_1 \alpha^{\Xi} + z_2 g \tau^{\Xi} + z_2 g\,e(\tau) \\ & + z_2 \tilde{d} + z_2 f(\theta) + z_2 \eta_{\alpha} + k_3 \tilde{\omega}^\top \hat{\omega} + z_2 \tilde{\omega}^\top \delta(\alpha) \end{aligned} \tag{23}$$

with $\tilde{d} = d - \hat{d}$.
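As an ideal-case check of the cancellation in (21), the sketch below assumes that the FLS weights and the FTDO estimate are exact ($\hat{\omega} = \omega$, $\hat{d} = d$) and that $e(\tau)$, $f(\theta)$, $\eta_\alpha$ and $\tau^{\Xi}$ are zero. Under these assumptions, substituting (21) into (18) should leave $\dot{z}_2 = -z_1 - k_2 z_2$. The basis $\delta(\alpha)$ is the one given in Section 4; the weight vector and signal values are made up for illustration.

```python
import math

K, T = 0.478, 216.0
g = K / T


def delta_basis(alpha):
    """delta(alpha) from Section 4: five unit-width Gaussians centred at l - 3, normalized."""
    a = [math.exp(-((alpha + 3 - l) ** 2) / 2.0) for l in range(1, 6)]
    total = sum(a)
    return [x / total for x in a]


def tau_theta(z1, z2, omega_hat, alpha, lam, d_hat, alpha_dot, k2):
    """Disturbance cancellation control, Eq. (21)."""
    w_delta = sum(w * b for w, b in zip(omega_hat, delta_basis(alpha)))
    return (-z1 - k2 * z2 - w_delta - lam - d_hat + alpha_dot) / g


def omega_hat_dot(z2, omega_hat, alpha, k3):
    """Adaptation law, Eq. (22): z2 * delta(alpha) - k3 * omega_hat."""
    return [z2 * b - k3 * w for b, w in zip(delta_basis(alpha), omega_hat)]


# Hypothetical operating point and "true" FLS weights.
omega = [0.2, 0.5, 0.1, 0.7, 0.3]
z1, z2, alpha, lam, d, alpha_dot, k2 = 0.4, -0.3, 0.1, 0.02, 0.5, 0.05, 5.0

tau = tau_theta(z1, z2, omega, alpha, lam, d, alpha_dot, k2)
w_delta = sum(w * b for w, b in zip(omega, delta_basis(alpha)))
# Eq. (18) with e = f = eta = 0, tau_Xi = 0 and H(alpha) = omega^T delta(alpha):
z2_dot = g * tau + d + lam - alpha_dot + w_delta
print(z2_dot, -z1 - k2 * z2)
```

With non-ideal estimates, the leftover terms are exactly the residuals $z_2\tilde{d}$, $z_2\tilde{\omega}^\top\delta(\alpha)$, $z_2 f(\theta)$, $z_2\eta_\alpha$ and $z_2 g\,e(\tau)$ that appear in (23).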

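Using Young's inequality (together with $\hat{\omega} = \omega - \tilde{\omega}$ for (28)), the following bounds hold:

$$z_2 e(\tau) \le \frac{1}{2} z_2^2 + \frac{1}{2} e_M^2 \tag{24}$$

$$z_2 \tilde{d} \le \frac{1}{2} z_2^2 + \frac{1}{2} \tilde{d}_M^2 \tag{25}$$

$$z_2 \eta_{\alpha} \le \frac{1}{2} z_2^2 + \frac{1}{2} \eta_{\alpha M}^2 \tag{26}$$

$$z_2 \tilde{\omega}^\top \delta(\alpha) \le \frac{1}{2} z_2^2 + \frac{1}{2} \tilde{\omega}^\top \tilde{\omega} \tag{27}$$

$$\tilde{\omega}^\top \hat{\omega} \le -\frac{1}{2} \tilde{\omega}^\top \tilde{\omega} + \frac{1}{2} \omega^\top \omega \tag{28}$$

where $\tilde{d}_M$ and $\eta_{\alpha M}$ are the upper bounds of $|\tilde{d}|$ and $|\eta_{\alpha}|$, respectively.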
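The bounds (24)–(28) follow from $ab \le \tfrac{1}{2}a^2 + \tfrac{1}{2}b^2$; for (27) one also needs $\delta^\top(\alpha)\delta(\alpha) \le 1$, which holds for nonnegative normalized fuzzy basis functions, and (28) uses $\hat{\omega} = \omega - \tilde{\omega}$. The randomized check below is only a numerical illustration, not a proof; in (24)–(26) the bounds $e_M$, $\tilde{d}_M$, $\eta_{\alpha M}$ are replaced by the sampled values themselves, and the sample ranges, dimension and seed are arbitrary choices.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def check_bounds(trials=20000, n=5, seed=1, tol=1e-9):
    """Randomized spot-check of (24)-(28); returns True if every sample satisfies them."""
    rng = random.Random(seed)
    for _ in range(trials):
        z2 = rng.uniform(-10.0, 10.0)
        # (24)-(26): z2 * b <= z2^2/2 + b^2/2 for b = e(tau), d_tilde, eta_alpha
        for b in [rng.uniform(-10.0, 10.0) for _ in range(3)]:
            if z2 * b > 0.5 * z2 ** 2 + 0.5 * b ** 2 + tol:
                return False
        omega = [rng.uniform(-3.0, 3.0) for _ in range(n)]      # ideal weights
        omega_hat = [rng.uniform(-3.0, 3.0) for _ in range(n)]  # estimates
        omega_t = [w - wh for w, wh in zip(omega, omega_hat)]   # omega_tilde
        raw = [rng.random() + 1e-12 for _ in range(n)]
        total = sum(raw)
        delta = [x / total for x in raw]  # nonnegative, sums to 1, so delta^T delta <= 1
        # (27): z2 * omega_t^T delta <= z2^2/2 + omega_t^T omega_t / 2
        if z2 * dot(omega_t, delta) > 0.5 * z2 ** 2 + 0.5 * dot(omega_t, omega_t) + tol:
            return False
        # (28): omega_t^T omega_hat <= -omega_t^T omega_t / 2 + omega^T omega / 2
        if dot(omega_t, omega_hat) > -0.5 * dot(omega_t, omega_t) + 0.5 * dot(omega, omega) + tol:
            return False
    return True


print(check_bounds())
```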

Substituting (24)–(28) into (23) yields

$$\begin{aligned} \dot{V}_2 \le {} & -k_z \|Z\|^2 + \frac{1}{2} k_3 \omega_M^2 + \frac{1}{2} k_3 \tilde{\omega}_M^2 + \frac{1}{2} \eta_{\alpha}^2 + \frac{1}{2} \tilde{d}_M^2 + \frac{1}{2} g e_M^2 \\ & + Z^\top\left(\begin{bmatrix} 0 \\ f(\theta) \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & g \end{bmatrix} \Phi^{\Xi}\right) \end{aligned} \tag{29}$$

where $k_z = \min\{k_1, k_2\}$, $Z = [z_1, z_2]^\top$, and $\Phi^{\Xi} = [\alpha^{\Xi}, \tau^{\Xi}]^\top$ is the optimized controller containing the optimal virtual and actual controls.

Based on [39], the boundedness of all variables in the closed-loop system can be guaranteed by designing an optimized control law that stabilizes the following tracking error system:

$$\dot{Z} = \begin{bmatrix} 0 \\ f(\theta) \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & g \end{bmatrix} \Phi^{\Xi} \tag{30}$$

3.3. Fuzzy RL Optimized Compensator Design

For the tracking error system above, a fuzzy RL optimized compensator is devised. The error system (30) can be rewritten as

$$\dot{Z} = F(Z) + G \Phi^{\Xi} \tag{31}$$

where $F(Z) = [0, f(\theta)]^\top$ and $G = \begin{bmatrix} 1 & 0 \\ 0 & g \end{bmatrix}$.

Define the performance index function as

$$\rho(Z(t)) = \int_{t}^{\infty} \bigl(Q(Z(s)) + \Phi^{\Xi\top}(s) R\, \Phi^{\Xi}(s)\bigr)\, ds \tag{32}$$

where $Q(Z) \in \mathbb{R}$ is a positive definite function and $R \in \mathbb{R}^{2 \times 2}$ is a positive definite matrix.

For (32) under an admissible control $\Phi$, the corresponding Hamiltonian is

$$\vartheta(Z, \Phi) = Q(Z) + \Phi^\top R \Phi + \nabla\rho_Z^\top \bigl(F(Z) + G\Phi\bigr) \tag{33}$$

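Here, $\nabla\rho_Z$ denotes the gradient of $\rho(Z)$ with respect to $Z$. Solving $\partial\vartheta(Z, \Phi)/\partial\Phi = 2R\Phi + G^\top \nabla\rho_Z = 0$ gives the optimal control

$$\Phi^{\Xi} = -\frac{1}{2} R^{-1} G^\top \nabla\rho_Z^{\Xi} \tag{34}$$

where $\nabla\rho_Z^{\Xi}$ is the gradient of the optimal performance index function.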

Substituting (34) into (33) yields the HJB equation

$$0 = Q(Z) + \nabla\rho_Z^{\Xi\top} F(Z) - \frac{1}{4} \nabla\rho_Z^{\Xi\top} G R^{-1} G^\top \nabla\rho_Z^{\Xi} \tag{35}$$
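The step from (33) to (34) and (35) can be checked numerically. The sketch below assumes $R = 3I_2$ (reading the scalar $R = 3$ of Section 4 as a multiple of the identity), $Q(Z) = 4Z^\top Z$, $g = K/T$ from Section 4, and arbitrary test values for $Z$, $\nabla\rho_Z$ and $F(Z)$. It confirms that (34) minimizes the Hamiltonian and that the minimum equals the right-hand side of (35).

```python
K, T = 0.478, 216.0
g = K / T
G = [[1.0, 0.0], [0.0, g]]
R = [[3.0, 0.0], [0.0, 3.0]]  # assumption: R = 3 I
R_INV = [[1.0 / 3.0, 0.0], [0.0, 1.0 / 3.0]]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def optimal_phi(grad_rho):
    """Eq. (34): Phi = -1/2 R^{-1} G^T grad_rho."""
    return [-0.5 * v for v in matvec(R_INV, matvec(transpose(G), grad_rho))]


def hamiltonian(Z, Phi, grad_rho, F):
    """Eq. (33) with Q(Z) = 4 Z^T Z."""
    g_phi = matvec(G, Phi)
    return (4.0 * dot(Z, Z) + dot(Phi, matvec(R, Phi))
            + dot(grad_rho, [f + gp for f, gp in zip(F, g_phi)]))


def hjb_rhs(Z, grad_rho, F):
    """Right-hand side of Eq. (35): Q + grad^T F - 1/4 grad^T G R^{-1} G^T grad."""
    v = matvec(transpose(G), grad_rho)
    return 4.0 * dot(Z, Z) + dot(grad_rho, F) - 0.25 * dot(v, matvec(R_INV, v))


Z, grad, F = [0.5, -0.2], [1.0, 40.0], [0.0, 0.3]  # hypothetical test values
phi = optimal_phi(grad)
print(phi, hamiltonian(Z, phi, grad, F), hjb_rhs(Z, grad, F))
```

Because $\vartheta(Z, \Phi^{\Xi} + p) - \vartheta(Z, \Phi^{\Xi}) = p^\top R p \ge 0$, any perturbation of (34) increases the Hamiltonian, which is why a positive definite $R$ is required.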

Since the HJB equation is intractable to solve directly, the FLS is employed to approximate the performance index function as

$$\rho(Z) = \varpi^\top \varsigma(Z) + \varepsilon \tag{36}$$

where $\varpi \in \mathbb{R}^{l}$ is the weight vector, $\varsigma(Z) \in \mathbb{R}^{l}$ is the fuzzy basis function vector, and $\varepsilon \in \mathbb{R}$ is the approximation error.

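The estimated forms of (36) and (34) are

$$\hat{\rho}(Z) = \hat{\varpi}^\top \varsigma(Z) \tag{37}$$

$$\hat{\Phi}^{\Xi} = -\frac{1}{2} R^{-1} G^\top \nabla\varsigma_Z^\top \hat{\varpi} \tag{38}$$

where $\nabla\varsigma_Z^\top \hat{\varpi}$ is the estimate of $\nabla\rho_Z$.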

To drive the HJB residual, i.e., the right-hand side of (35) evaluated with $\hat{\varpi}$, toward zero, the following weight update law is designed by gradient descent on the squared residual:

$$\begin{aligned} \dot{\hat{\varpi}} = {} & -\beta_1 \Bigl[(\nabla\varsigma_Z^\top)^\top \hat{f}(\theta) - \frac{1}{2} (\nabla\varsigma_Z^\top)^\top \Psi \nabla\varsigma_Z^\top \hat{\varpi}\Bigr] \\ & \times \Bigl[Q(Z) + \hat{\varpi}^\top (\nabla\varsigma_Z^\top)^\top \hat{f}(\theta) - \frac{1}{4} \hat{\varpi}^\top (\nabla\varsigma_Z^\top)^\top \Psi \nabla\varsigma_Z^\top \hat{\varpi}\Bigr] \end{aligned} \tag{39}$$

where $0 < \beta_1 < 1$ denotes the learning rate and $\Psi = G R^{-1} G^\top$, consistent with (35).
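A minimal sketch of one Euler step of the update law (39) at a fixed operating point. Assumptions not stated in the paper are flagged here and in the code: $\Psi = G R^{-1} G^\top$ with $R = 3I_2$, $\hat{f}(\theta)$ is represented by a made-up two-vector $\hat{F} = [0, \hat{f}]^\top$, the Jacobian of $\varsigma(Z)$ is taken by central differences, and the time step is arbitrary. The basis $\varsigma(Z)$, $Q(Z) = 4Z^\top Z$ and $\beta_1 = 0.02$ follow Section 4. Since the first bracket in (39) is the gradient of the residual in the second bracket, a small step should reduce the squared HJB residual.

```python
import math
import random

K, T = 0.478, 216.0
g = K / T
G = [[1.0, 0.0], [0.0, g]]
R_INV = [[1.0 / 3.0, 0.0], [0.0, 1.0 / 3.0]]  # assumption: R = 3 I
# Assumption: Psi = G R^{-1} G^T, which makes (39) consistent with (35).
PSI = [[sum(G[i][k] * R_INV[k][m] * G[j][m] for k in range(2) for m in range(2))
        for j in range(2)] for i in range(2)]


def varsigma(Z):
    """Critic basis from Section 4: five normalized products of Gaussians."""
    z1, z2 = Z
    a = [math.exp(-(z1 + 2 - l) ** 2 / 2) * math.exp(-(z2 + 2 - l) ** 2 / 2)
         for l in range(1, 6)]
    total = sum(a)
    return [x / total for x in a]


def jacobian(Z, h=1e-6):
    """J[l][j] = d varsigma_l / d z_j by central differences (5 x 2)."""
    J = [[0.0, 0.0] for _ in range(5)]
    for j in range(2):
        zp, zm = list(Z), list(Z)
        zp[j] += h
        zm[j] -= h
        sp, sm = varsigma(zp), varsigma(zm)
        for l in range(5):
            J[l][j] = (sp[l] - sm[l]) / (2 * h)
    return J


def hjb_terms(w, Z, F_hat):
    """Return (HJB residual, its gradient w.r.t. w): the two brackets of (39)."""
    J = jacobian(Z)
    grad_rho = [sum(J[l][j] * w[l] for l in range(5)) for j in range(2)]  # J^T w
    psi_grad = [sum(PSI[i][j] * grad_rho[j] for j in range(2)) for i in range(2)]
    residual = (4.0 * (Z[0] ** 2 + Z[1] ** 2)
                + sum(grad_rho[j] * F_hat[j] for j in range(2))
                - 0.25 * sum(grad_rho[i] * psi_grad[i] for i in range(2)))
    v = [F_hat[i] - 0.5 * psi_grad[i] for i in range(2)]
    sens = [sum(J[l][j] * v[j] for j in range(2)) for l in range(5)]
    return residual, sens


def critic_step(w, Z, F_hat, beta1=0.02, dt=0.01):
    """One forward-Euler step of Eq. (39)."""
    residual, sens = hjb_terms(w, Z, F_hat)
    return [w[l] - dt * beta1 * sens[l] * residual for l in range(5)]


rng = random.Random(0)
w0 = [rng.random() for _ in range(5)]  # initial weights in (0, 1), as in Section 4
Z, F_hat = [0.5, -0.2], [0.0, 0.3]    # hypothetical state and model estimate
w1 = critic_step(w0, Z, F_hat)
print(hjb_terms(w0, Z, F_hat)[0], hjb_terms(w1, Z, F_hat)[0])
```

In a full closed-loop simulation, $Z$ and $\hat{f}$ would change at every step; holding them fixed here isolates the gradient-descent character of (39).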

The weight error dynamics are given by

$$\begin{aligned} \dot{\tilde{\varpi}} = {} & -\beta_1 \Bigl[(\nabla\varsigma_Z^\top)^\top Z - (\nabla\varsigma_Z^\top)^\top f(\theta) + (\nabla\varsigma_Z^\top)^\top \Psi \nabla\varsigma_Z^\top \tilde{\varpi} + \frac{1}{2} (\nabla\varsigma_Z^\top)^\top \Psi \nabla\varepsilon_Z\Bigr] \\ & \times \Bigl[\tilde{\varpi}^\top (\nabla\varsigma_Z^\top)^\top \dot{Z} + \hat{\varpi}^\top (\nabla\varsigma_Z^\top)^\top \tilde{f}(\theta) + \frac{1}{4} \tilde{\varpi}^\top (\nabla\varsigma_Z^\top)^\top \Psi \nabla\varsigma_Z^\top \tilde{\varpi} + \frac{1}{2} \tilde{\varpi}^\top (\nabla\varsigma_Z^\top)^\top \Psi \nabla\varepsilon_Z + \kappa\Bigr] \end{aligned} \tag{40}$$

The whole control scheme is now complete, and the final fuzzy RL-optimized control law is $\tau = \tau^{\Theta} + \tau^{\Xi}$. The overall control diagram of the proposed RL autopilot control strategy is shown in Figure 2.

Theorem 1. Consider the autopilot control system (1) under Assumptions 1 and 2, with the FTDO (8), the disturbance cancellation controller (21), the optimized compensator (38), and the adaptation laws (22) and (39). With appropriately chosen parameters, the proposed optimal strategy guarantees that all signals in the closed-loop USV autopilot control system are bounded.

Proof.

The Lyapunov function candidate is selected as

$$V_{all} = V_3 + \frac{1}{2} \tilde{\varpi}^\top \tilde{\varpi} \tag{41}$$

Taking the time derivative of $V_{all}$ in (41) and using (29) and (40) yields

$$\begin{aligned} \dot{V}_{all} \le {} & -k_z \|Z\|^2 + \frac{1}{2} k_3 \omega_M^2 + \frac{1}{2} k_3 \tilde{\omega}_M^2 + \frac{1}{2} \eta_{\alpha}^2 + \frac{1}{2} \tilde{d}_M^2 + \frac{1}{2} g e_M^2 \\ & + Z^\top\bigl(F(Z) + G\Phi^{\Xi}\bigr) - \beta_1 \Bigl[\tilde{\varpi}^\top (\nabla\varsigma_Z^\top)^\top \dot{Z} - \tilde{\varpi}^\top (\nabla\varsigma_Z^\top)^\top \tilde{f}(\theta) \\ & + \tilde{\varpi}^\top (\nabla\varsigma_Z^\top)^\top \Psi \nabla\varsigma_Z^\top \tilde{\varpi} + \frac{1}{2} \tilde{\varpi}^\top (\nabla\varsigma_Z^\top)^\top \Psi \nabla\varepsilon_Z\Bigr] \\ & \times \Bigl[\tilde{\varpi}^\top (\nabla\varsigma_Z^\top)^\top \dot{Z} + \hat{\varpi}^\top (\nabla\varsigma_Z^\top)^\top \tilde{f}(\theta) \\ & + \frac{1}{4} \tilde{\varpi}^\top (\nabla\varsigma_Z^\top)^\top \Psi \nabla\varsigma_Z^\top \tilde{\varpi} + \frac{1}{2} \tilde{\varpi}^\top (\nabla\varsigma_Z^\top)^\top \Psi \nabla\varepsilon_Z + \kappa\Bigr] \end{aligned} \tag{42}$$

Following [39], the following bounds are assumed: $\|\nabla\varsigma_Z^\top\| \le \varsigma_M$, $\|\nabla\varepsilon_Z\| \le \varepsilon_M$, $\|F(Z) + G\Phi^{\Xi}\| \le b_2 \sqrt{\|Z\|}$, $\|\varpi\| \le \varpi_M$, and $b_3 \le \|(\nabla\varsigma_Z^\top)^\top \Psi \nabla\varsigma_Z^\top\| \le b_4$, where $\varsigma_M, \varepsilon_M, b_2, b_3, b_4, \varpi_M$ are positive constants. Then, (42) can be bounded as

$$\begin{aligned} \dot{V}_{all} \le {} & -k_z\|Z\|^2 + \frac{1}{2}k_3\omega_M^2 + \frac{1}{2}k_3\tilde{\omega}_M^2 + \frac{1}{2}\eta_\alpha^2 + \frac{1}{2}\tilde{d}_M^2 + \frac{1}{2} g e_M^2 + b_2 \|Z\|\sqrt{\|Z\|} \\ & + \beta_1\Bigl[\|\tilde{\varpi}\|^2\Bigl(\varsigma_M^2 - b_3 + \frac{b_3^2}{4}\varsigma_M^{-2}\Bigr) + \frac{b_2^2}{2}\|Z\| + \frac{1}{2}\|\tilde{\omega}\|^2 + \frac{1}{4}\varepsilon_M^2\Bigr] \\ & \times \Bigl[\|\tilde{\varpi}\|^2\Bigl(\varsigma_M^2 + b_4 + \frac{b_3^2}{4}\varsigma_M^{-2}\Bigr) + b_2^2\|Z\| + \frac{1}{2}\|\tilde{\omega}\|^2 + \varsigma_M^2 + \frac{b_3^2}{4}\varsigma_M^{-2} + \frac{5}{16}\varepsilon_M^2\Bigr] \end{aligned} \tag{43}$$

By Young's inequality, one obtains

$$\dot{V}_{all} \le -\gamma_1\|Z\|^2 - \gamma_2\|\tilde{\omega}\|^2 - \gamma_3\|\tilde{\varpi}\|^2 + \gamma_4 \tag{44}$$

where

$$\gamma_1 = k_z - \frac{11 b_2^2}{16} - \frac{b_5^2}{2} - \frac{3 b_2^2}{2}\varsigma_M^2 - \frac{b_3^2}{4}\varsigma_M^{-2},$$

$$\gamma_2 = \frac{b_1}{8} - \frac{3 b_4^2}{4} - \frac{5}{2},$$

$$\gamma_3 = b_4\varepsilon_M^2 + b_3\varsigma_M^2 - \frac{b_3^2}{4} - \frac{b_3^4}{32}\varsigma_M^4,$$

$$\gamma_4 = \frac{1}{2}\varsigma_M^2\varepsilon_M^2 + \frac{7}{16}\varepsilon_M^2 + \frac{b_3^2}{32}\varsigma_M^{-2} + \frac{1}{2}\eta_\alpha^2 + \frac{1}{2}\tilde{d}_M^2 + \frac{1}{2} g e_M^2,$$

$$b_5 = b_2 b_3 - \frac{3 b_2}{32}\varsigma_M^2 - \frac{b_2 b_3^2}{4}\varsigma_M^{-2}.$$

By selecting the design parameters such that $\gamma_1, \gamma_2, \gamma_3 > 0$, it follows that $\dot{V}_{all} < 0$ whenever

$$\|Z\| > \sqrt{\gamma_4/\gamma_1} \quad \text{or} \quad \|\tilde{\omega}\| > \sqrt{\gamma_4/\gamma_2} \quad \text{or} \quad \|\tilde{\varpi}\| > \sqrt{\gamma_4/\gamma_3}.$$

Hence, $Z$, $\tilde{\omega}$, and $\tilde{\varpi}$ are uniformly ultimately bounded.

This concludes the proof. □

4. Simulation Results

This section presents simulation results verifying the effectiveness of the proposed disturbance cancellation fuzzy RL-optimized control approach on the ocean-going training vessel Yulong. The numerical simulations are performed on a computer with an Intel Core Ultra 7-U7-265 CPU @ 5.30 GHz and 16.00 GB of RAM, using MATLAB R2024a. The relevant parameters of the vessel are provided in [40].

To verify the effectiveness of the proposed algorithm, three cases are considered. Cases 1 and 2 demonstrate the performance under weak and strong disturbances, respectively.

Case 1 (weak ocean disturbance):

$$d = \frac{1}{K_0/T_0} \times 0.01 \sin(0.05t) + 0.007\cos(0.03t)$$

Case 2 (strong ocean disturbance):

$$d = 5 \times \left(\frac{1}{K_0/T_0} \times 0.01 \sin(0.05t) + 0.007\cos(0.03t)\right)$$

Case 3 verifies the algorithm on a USV model extended to 3 DOF. For Cases 1 and 2, three algorithms are used for comparison: PID control, RL with a traditional disturbance observer [30], and adaptive neural network (NN) control considering input saturation [34]. To ensure a fair comparison, the parameters and initial values in these experiments are kept essentially consistent. The results are shown in Figure 3, Figure 4 and Figure 5, and the root mean square error (RMSE) of the different controllers in the different cases is compared in Table 2.

The tracking signal is generated by the following reference model, which represents a practical performance requirement [34]:

$$\ddot{y}_m(t) + 0.1\,\dot{y}_m(t) + 0.0025\,y_m(t) = 0.0025\,y_r(t) \tag{45}$$
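Model (45) is a second-order filter with $\omega_n^2 = 0.0025$ and $2\zeta\omega_n = 0.1$, i.e., $\omega_n = 0.05$ rad/s and $\zeta = 1$, so the reference heading is critically damped and approaches a step command without overshoot. The sketch below integrates (45) with forward Euler for an assumed 30° step command (the upper end of the stated command range); the step size and horizon are arbitrary choices.

```python
import math


def reference_model(y_r=30.0, t_end=500.0, dt=0.01):
    """Forward-Euler integration of y_m'' + 0.1 y_m' + 0.0025 y_m = 0.0025 y_r, Eq. (45)."""
    y, y_dot = 0.0, 0.0
    trajectory = []
    for _ in range(int(round(t_end / dt))):
        y_ddot = 0.0025 * y_r - 0.1 * y_dot - 0.0025 * y
        y, y_dot = y + dt * y_dot, y_dot + dt * y_ddot
        trajectory.append(y)
    return trajectory


omega_n = math.sqrt(0.0025)    # natural frequency, rad/s
zeta = 0.1 / (2.0 * omega_n)   # damping ratio
ys = reference_model()
print(f"omega_n = {omega_n:.3f} rad/s, zeta = {zeta:.2f}, "
      f"y_m(500 s) = {ys[-1]:.4f} deg, peak = {max(ys):.4f} deg")
```

The time constant $1/\omega_n = 20$ s sets how gradually the desired heading is shaped, which keeps the reference within what a rudder limited to $\pm 35^\circ$ can follow.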

where $y_m$ characterizes the ideal heading performance of the system and $y_r$ is the command input signal, which ranges from 0° to 30°. The model parameters are calculated as $K = 0.478$, $T = 216$, $a_1 = 1$, and $a_2 = 30$. In the simulation, the initial conditions are $\varphi(0) = 15^\circ$ (i.e., an initial heading deviation of +15°) and $r(0) = 1^\circ/\mathrm{s}$. The actuator constraint limit is $\tau_M = 35^\circ$.

For the disturbance cancellation control scheme, the initial auxiliary and observer states are $\lambda(0) = 0$, $\hat{p}_1(0) = 0.1$, and $\hat{p}_2(0) = 0.1$. The parameters are $k_1 = 2$, $k_2 = 5$, $\mu_1 = 1$, $\mu_2 = 2.5$, and $k_3 = 0.1$. The initial adaptive parameters are assigned random values in the interval (0,1). The fuzzy basis functions are constructed as

$$\delta_l(\alpha) = \frac{\exp\left[-\frac{(\alpha + 3 - l)^2}{2}\right]}{\sum_{j=1}^{5} \exp\left[-\frac{(\alpha + 3 - j)^2}{2}\right]}, \qquad \delta_l(r) = \frac{\exp\left[-\frac{(r + 2 - l)^2}{2}\right]}{\sum_{j=1}^{5} \exp\left[-\frac{(r + 2 - j)^2}{2}\right]}, \qquad l = 1, \dots, 5.$$

For the optimal controller, the performance index function uses $Q(Z) = 4 Z^\top Z$ and $R = 3$. The initial weights $\hat{\varpi}(0)$ are assigned random values in the interval (0,1), and $\beta_1 = 0.02$. The fuzzy basis functions are selected as

$$\varsigma_l(Z) = \frac{\exp\left[-\frac{(z_1 + 2 - l)^2}{2}\right] \exp\left[-\frac{(z_2 + 2 - l)^2}{2}\right]}{\sum_{j=1}^{5} \exp\left[-\frac{(z_1 + 2 - j)^2}{2}\right] \exp\left[-\frac{(z_2 + 2 - j)^2}{2}\right]}, \qquad l = 1, \dots, 5.$$

4.1. Comparisons with Traditional PID Controller

The designed RL controller is compared with a PID controller under nearly identical initial conditions and parameters.

The simulation results are presented in Figure 3 and Figure 4, respectively. Figure 3 shows that under weak marine disturbances, the tracking performance of PID and the proposed RL method differs little; however, the tracking error plot reveals that PID yields relatively larger errors. Moreover, without constraints on the control input, the initial PID control input reaches approximately −200° to 500°, far exceeding the physical rudder angle limits, which is unacceptable in engineering practice. Figure 4 shows that under strong marine disturbances, the tracking performance of PID and the proposed RL method differs significantly: the PID controller exhibits notable tracking errors of ±0.5°, whereas the RL controller maintains tighter error bounds. These results demonstrate that the proposed optimized controller outperforms the PID controller in both control accuracy and constraint compliance.

4.2. Comparisons with Traditional NN Adaptive Control with Input Constraints [34]

The designed RL controller is compared with the NN controller under nearly identical initial states and parameters.

The simulation results are presented in Figure 3 and Figure 4, respectively. Figure 3 shows that under weak marine disturbances, NN control has a larger overshoot than the proposed algorithm. The initial tracking error of NN control is relatively large, but owing to adaptive adjustment, the error gradually converges after 8 s. However, compared with the proposed algorithm and the method in [30], both of which include disturbance rejection, the disturbance suppression of NN control is relatively weak. Consequently, panel (c) shows that the NN control input saturates multiple times, lasting about 8 s. In Figure 4, the problems of NN control are further amplified: between 350 s and 500 s, its performance is clearly inferior to that of the disturbance rejection methods. Under strong disturbances, the tracking error of NN control increases significantly to about 0.5°, and its control input remains saturated for a long time starting at about 9 s, which imposes a heavy burden on the machinery. Therefore, the proposed optimized controller has certain advantages over the NN controller in terms of disturbance rejection and optimization.

4.3. Comparisons with RL Control with Traditional DO [30]

These comparisons were conducted between the designed RL finite-time disturbance rejection controller with input constraints and the reinforcement learning optimized controller with DO under nearly identical initial states and parameters.

The simulation results are presented in Figure 3 and Figure 4, respectively. Under weak marine disturbances, panels (a) and (b) of Figure 3 show that the RL control in [30] tracks the desired signal within 3 s and makes the tracking error converge faster than the proposed algorithm. However, panel (c) reveals that this is achieved by removing the input constraints. Panel (d) shows that, for the same disturbance, the proposed FTDO converges nearly 2 s faster than the DO. In Figure 4, when the control input is unconstrained, the method in [30] exhibits better control performance, but this ignores the requirements of engineering practice. Panel (d) shows that, for the same strong disturbance, the estimation accuracy of the proposed FTDO is somewhat higher than that of the DO. Panels (e) and (f) illustrate the convergence of the adaptive parameters of the proposed algorithm. Therefore, the proposed optimized controller has certain advantages over [30] in terms of disturbance rejection.

4.4. Extended Simulation on 3-DOF USV

To verify the generality of the proposed algorithm, it is extended to a fully actuated 3-DOF USV. The specific parameters and turning trajectory of the USV can be found in [41]. The results are shown in Figure 5. Panels (a) to (d) show that the proposed algorithm is also applicable to the 3-DOF USV: although the tracking error shows a considerable overshoot, the disturbance estimation is relatively accurate. Panels (e) and (f) show the corresponding evolution of the adaptive parameters and weight parameters. Therefore, the proposed optimized control algorithm can also be applied to the 3-DOF USV.
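The 3-DOF extension builds on the standard kinematic relation between the earth-fixed and body-fixed frames of Figure 1 (see, e.g., [41]): η̇ = R(ψ)ν, with η = [x, y, ψ]ᵀ the earth-fixed position and heading, ν = [u, v, r]ᵀ the surge, sway and yaw-rate velocities (here u is the surge speed, not the control input), and R(ψ) the planar rotation about the vertical axis. The sketch below implements only this kinematic layer with forward-Euler integration; the vessel kinetics, parameters and turning trajectory of [41] are not reproduced, and the function names and example velocities are hypothetical.

```python
import math
from typing import List


def rotation(psi: float) -> List[List[float]]:
    """Planar rotation R(psi) mapping body-fixed [u, v, r] to earth-fixed rates."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def eta_dot(eta: List[float], nu: List[float]) -> List[float]:
    """Kinematics eta_dot = R(psi) @ nu with eta = [x, y, psi], nu = [u, v, r]."""
    R = rotation(eta[2])
    return [sum(R[i][j] * nu[j] for j in range(3)) for i in range(3)]


def euler_step(eta: List[float], nu: List[float], h: float) -> List[float]:
    """One forward-Euler step of the kinematic equation."""
    d = eta_dot(eta, nu)
    return [eta[i] + h * d[i] for i in range(3)]


# Constant surge 1 m/s and yaw rate 0.1 rad/s give a turning circle of radius 10 m.
nu = [1.0, 0.0, 0.1]
n_steps = 6000
h = 2.0 * math.pi / (nu[2] * n_steps)  # step chosen so n_steps covers one full turn
eta = [0.0, 0.0, 0.0]
path = [eta]
for _ in range(n_steps):
    eta = euler_step(eta, nu, h)
    path.append(eta)
print(f"max y: {max(p[1] for p in path):.2f} m")  # max y: 20.00 m
```

After one full turn the Euler path closes on itself, and the largest lateral offset equals the circle diameter 2u/r = 20 m.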

5. Conclusions

In this paper, a composite disturbance cancellation optimized control method based on fuzzy reinforcement learning has been proposed to handle the problems of model uncertainties, external marine disturbances, performance optimization, and actuator constraints encountered by the USV autopilot system. A finite-time disturbance observer has been designed in coupling with a fuzzy logic system to estimate and reject the composite disturbance. Backstepping and RL techniques, together with the designed performance index function and the auxiliary system, have been used to construct the optimized constrained controller. The stability of the autopilot control system has been proved using Lyapunov stability theory. Simulation studies conducted on the ocean-going training ship “Yulong” have demonstrated the effectiveness of the proposed algorithm.
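The role of the auxiliary system mentioned above can be illustrated with one common first-order form from the constrained backstepping literature, χ̇ = −cχ + Δ with Δ = sat(v) − v, where v is the unconstrained command; the modified error surface is then typically taken as z = e − χ. This is a minimal sketch under that assumption: the auxiliary system actually used in this paper is defined in its design section and may differ, and the gain c, the limit and the function names are illustrative.

```python
def saturate(v: float, v_max: float) -> float:
    """Symmetric actuator limit |v| <= v_max (for example a rudder-angle bound)."""
    return max(-v_max, min(v_max, v))


def auxiliary_step(chi: float, v: float, v_max: float, c: float, h: float) -> float:
    """One forward-Euler step of chi_dot = -c * chi + (sat(v) - v).

    chi is driven only by the part of the command the actuator cannot deliver;
    while |v| <= v_max the input term vanishes and chi decays towards zero.
    """
    delta = saturate(v, v_max) - v
    return chi + h * (-c * chi + delta)


# Illustrative run: the command exceeds an assumed 35-unit limit by 2 units for 10 s.
chi, c, h = 0.0, 4.0, 1e-3
for _ in range(10_000):
    chi = auxiliary_step(chi, v=37.0, v_max=35.0, c=c, h=h)
print(round(chi, 6))  # approaches delta / c = -2 / 4 = -0.5
```

Because χ stays at zero whenever the actuator is unsaturated, the modified error z = e − χ coincides with the ordinary tracking error in that regime and differs from it only while the constraint is active.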

The present work addresses adaptive optimized control of the rudder, and simulation experiments have verified the effectiveness of the framework. Future research will consider the following directions. First, physical experiments can be conducted on real ships subject to sensor noise and other external factors, taking a step toward real-world applications. Second, as the computational burden of high-complexity algorithms grows, exploring ways to reduce the computational load and complexity of the algorithm is a promising direction.

Author Contributions

Methodology, X.G. and A.Y.; Resources, X.H.; Writing—original draft, X.G.; Writing—review & editing, X.G. and X.H.; Supervision, X.H. and A.Y.; Project administration, X.H.; Funding acquisition, X.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the National Natural Science Foundation of China under Grants 52301418, 62273172, and 72203029; the Sichuan Science and Technology Program under Grant 2024NSFSC0878; the Fundamental Research Funds for the Central Universities under Grant 3132025295; the Natural Science Foundation of Shandong Province under Grant ZR2024MF055; the Yantai Science and Technology Innovation Development Plan Research Project under Grant 2024JCYJ092; the Youth Foundation Project of Humanities and Social Sciences Research of the Chinese Ministry of Education under Grant 22YJCZH210; and the National Key R&D Program of China under Grant 2023YFE0113200.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Peng, Z.; Wang, J.; Wang, D.; Han, Q. An Overview of Recent Advances in Coordinated Control of Multiple Autonomous Surface Vehicles. IEEE Trans. Ind. Inform. 2021, 17, 732–745.
2. Shi, Y.; Shen, C.; Fang, H.; Li, H. Advanced control in marine mechatronic systems: A survey. IEEE/ASME Trans. Mechatron. 2017, 22, 1121–1131.
3. Gao, X.; Li, T. Dynamic Positioning Control for Marine Crafts: A Survey and Recent Advances. J. Mar. Sci. Eng. 2024, 12, 362.
4. Azzeri, M.; Adnan, F.; Zain, M.M. Review of course keeping control system for unmanned surface vehicle. J. Teknol. Sci. Eng. 2015, 74, 1–20.
5. Ning, J.; Wang, Y.; Liu, L.; Li, T. Disturbance observer based adaptive heading control for unmanned marine vehicles with event-triggered and input quantization. Int. J. Robust Nonlinear Control 2024, 34, 11469–11486.
6. Gao, X.; Long, Y.; Li, T.; Hu, X.; Chen, C.P.; Sun, F. Optimal Fuzzy Output Feedback Control for Dynamic Positioning of Vessels With Finite-Time Disturbance Rejection Under Thruster Saturations. IEEE Trans. Fuzzy Syst. 2023, 31, 3447–3458.
7. Yue, Y.; Ning, J.; Li, T.; Liu, L. Adaptive neural network course tracking control of USV with input quantisation and output constraints. Int. J. Syst. Sci. 2025, 56, 2674–2688.
8. Wang, W.; Wang, Y.; Li, T. Distributed Formation Maneuvering Quantized Control of Under-Actuated Unmanned Surface Vehicles with Collision and Velocity Constraints. J. Mar. Sci. Eng. 2024, 12, 848.
9. Liu, W.; Ye, H.; Yang, X. Model-Free Adaptive Sliding Mode Control Method for Unmanned Surface Vehicle Course Control. J. Mar. Sci. Eng. 2023, 11, 1904.
10. Lyu, G.; Peng, Z.; Wang, D.; Wang, J. Safety-certified Receding-horizon Motion Planning and Containment Control of Autonomous Surface Vehicles via Neurodynamic Optimization. IEEE Trans. Intell. Veh. 2024, 1–13.
11. Minorsky, N. Directional stability of automatically steered bodies. J. Am. Soc. Nav. Eng. 1922, 34, 280–309.
12. Fossen, T.I.; Grovlen, A. Nonlinear output feedback control of dynamically positioned ships using vectorial observer backstepping. IEEE Trans. Control Syst. Technol. 1998, 6, 121–128.
13. Zhang, Q.; Zhang, M.; Hu, Y.; Zhu, G. Error-driven-based adaptive nonlinear feedback control of course-keeping for ships. J. Mar. Sci. Technol. 2021, 26, 357–367.
14. González-Prieto, J.A.; Pérez-Collazo, C.; Singh, Y. Adaptive integral sliding mode based course keeping control of unmanned surface vehicle. J. Mar. Sci. Eng. 2022, 10, 68.
15. Ning, J.; Wang, Y.; Chen, C.L.P.; Li, T. Neural Network Observer Based Adaptive Trajectory Tracking Control Strategy of Unmanned Surface Vehicle With Event-Triggered Mechanisms and Signal Quantization. IEEE Trans. Emerg. Top. Comput. Intell. 2025, 9, 3136–3146.
16. Chen, B.; Hu, J.; Zhao, Y.; Ghosh, B.K. Finite-time observer based tracking control of uncertain heterogeneous underwater vehicles using adaptive sliding mode approach. Neurocomputing 2022, 481, 322–332.
17. Vu, M.T.; Hsia, K.H.; El-Sousy, F.F.M.; Rojsiraphisal, T.; Rahmani, R.; Mobayen, S. Adaptive Fuzzy Control of a Cable-Driven Parallel Robot. Mathematics 2022, 10, 3826.
18. Lv, J.; Ju, X.; Wang, C. Neural network prescribed-time observer-based output-feedback control for uncertain pure-feedback nonlinear systems. Expert Syst. Appl. 2025, 264, 125813.
19. Yang, D.; Hu, X.; Liu, W.; Guo, C. Finite-time control design for course tracking of disturbed ships subject to input saturation. Int. J. Control 2022, 95, 1409–1418.
20. Mu, D.; Wang, G.; Fan, Y.; Qiu, B.; Sun, X. Adaptive course control based on trajectory linearization control for unmanned surface vehicle with unmodeled dynamics and input saturation. Neurocomputing 2019, 330, 1–10.
21. Zhang, X.; Xu, X.; Li, J.; Ma, F.; Zhang, Z.; Brunauer, G.; Steyskal, F. Fault estimation and H∞ fuzzy active fault-tolerant control design for ship steering autopilot subject to actuator and sensor faults. IEEE Sens. J. 2023, 23, 28110–28119.
22. Wang, Y.; Yang, X.; Hao, L.; Li, T.; Chen, C.L.P. Integral Sliding Mode Output Feedback Control for Unmanned Marine Vehicles Using T–S Fuzzy Model with Unknown Premise Variables and Actuator Faults. J. Mar. Sci. Eng. 2024, 12, 920.
23. Zhu, L.; Li, T. Observer-based autopilot heading finite-time control design for intelligent ship with prescribed performance. J. Mar. Sci. Eng. 2021, 9, 828.
24. Guo, L.; Wen, X.Y. Hierarchical anti-disturbance adaptive control for non-linear systems with composite disturbances and applications to missile systems. Trans. Inst. Meas. Control 2011, 33, 942–956.
25. Werbos, P. Advanced forecasting methods for global crisis warning and models of intelligence. In General Systems Yearbook; Society for General Systems Research: Washington, DC, USA, 1977; pp. 25–38.
26. Kamalapurkar, R.; Andrews, L.; Walters, P.; Dixon, W.E. Model-based reinforcement learning for infinite horizon approximate optimal tracking. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 753–758.
27. Yuan, L.; Li, T.; Tong, S.; Xiao, Y.; Shan, Q. Broad Learning System Approximation-Based Adaptive Optimal Control for Unknown Discrete-Time Nonlinear Systems. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 5028–5038.
28. Vu, M.T.; Nguyen, V.T.; Do, Q.T.; Youn, W.; Nguyen, T.H. Robust non-integer predictive control for wind turbine pitch angle regulation in full load regions using deep on-policy learning. Eng. Appl. Artif. Intell. 2025, 156, 111156.
29. Liu, S.; Zuo, Y.; Li, T.; Wang, H.; Gao, X.; Xiao, Y. Adaptive Composite Fixed-Time RL-Optimized Control for Nonlinear Systems and Its Application to Intelligent Ship Autopilot. IEEE Trans. Artif. Intell. 2025, 6, 66–78.
30. Yuan, L.E.; Xiao, Y.; Li, T.; Zhou, D. Output Feedback Adaptive Optimal Control of Multiple Unmanned Marine Vehicles with Unknown External Disturbance. J. Mar. Sci. Eng. 2024, 12, 1697.
31. Zwierzewicz, Z.; Dorobczyński, L.; Jaszczak, S. Designing an optimal ship course-keeping system for an unknown object model via adaptive dynamic programming approach. Procedia Comput. Sci. 2023, 225, 4667–4674.
32. Bai, X.; Yi, J.; Zhao, D. Approximate Dynamic Programming for Ship Course Control. In Proceedings of the Advances in Neural Networks–ISNN, Nanjing, China, 3–7 June 2007; pp. 349–357.
33. Hu, X.; Long, Y.; Li, T.; Chen, C.L.P. Adaptive Fuzzy Backstepping Asymptotic Disturbance Rejection of Multiagent Systems With Unknown Model Dynamics. IEEE Trans. Fuzzy Syst. 2022, 30, 4775–4787.
34. Li, J.; Li, T.; Fan, Z.; Bu, R.; Li, Q.; Hu, J. Robust adaptive backstepping design for course-keeping control of ship with parameter uncertainty and input saturation. In Proceedings of the 2011 International Conference of Soft Computing and Pattern Recognition (SoCPaR), Dalian, China, 14–16 October 2011; pp. 63–67.
35. Li, Y.; Dong, S.; Li, K. Fixed-Time Command Filter Fuzzy Adaptive Formation Control for Nonholonomic Multirobot Systems With Unknown Dead-Zones. IEEE Trans. Intell. Transp. Syst. 2024, 25, 17305–17316.
36. Zhou, L.; Sun, Q.; Ding, S.; Han, S.; Wang, A. A machine-learning-based method for ship propulsion power prediction in ice. J. Mar. Sci. Eng. 2023, 11, 1381.
37. Zhang, J.; Ning, J.; Tong, S. Adaptive Fuzzy Secure Collision-Free Formation Control for Nonlinear MASs With DoS Attacks. IEEE Trans. Syst. Man Cybern. Syst. 2025, 55, 5705–5716.
38. Gong, C.; Su, Y.; Zhu, Q.; Zhang, D.; Hu, X. Finite-time dynamic positioning control design for surface vessels with external disturbances, input saturation and error constraints. Ocean Eng. 2023, 276, 114259.
39. Tong, S.; Sun, K.; Sui, S. Observer-Based Adaptive Fuzzy Decentralized Optimal Control Design for Strict-Feedback Nonlinear Large-Scale Systems. IEEE Trans. Fuzzy Syst. 2018, 26, 569–584.
40. Yang, Y.; Ren, J. Adaptive fuzzy robust tracking controller design via small gain approach and its application. IEEE Trans. Fuzzy Syst. 2003, 11, 783–795.
41. Fossen, T.I. Marine Control Systems: Guidance, Navigation and Control of Ships, Rigs and Underwater Vehicles; Marine Cybernetics: Trondheim, Norway, 2002.

Figure 1. Earth coordinate system and body coordinate system.

Figure 2. Overall control diagram of the designed RL control strategy.

Figure 3. Under Case 1: (a) Tracking performance with desired heading signal. (b) Tracking errors. (c) Control inputs. (d) Performance of the disturbance cancellation methods. (e) Convergence behavior of adaptive parameters. (f) Convergence of weight parameters.

Figure 4. Under Case 2: (a) Tracking performance with desired heading signal. (b) Tracking errors. (c) Control inputs. (d) Performance of the disturbance cancellation methods. (e) Convergence behavior of adaptive parameters. (f) Convergence of weight parameters.

Figure 5. Under Case 3: (a) Tracking performance of 3-DOF USV. (b) Tracking errors. (c) Control inputs. (d) Performance of the disturbance cancellation methods. (e) Convergence behavior of adaptive parameters. (f) Convergence of weight parameters.

Table 1. Comparison of different control methods.

Control Method	Optimization	Adaptation	Disturbance Cancellation	Finite-Time Disturbance Cancellation	Actuator Constraint
Ref. [5]	–	–	✓	–	–
Ref. [19]	–	–	✓	–	✓
Ref. [23]	–	✓	–	–	–
Ref. [33]	–	✓	✓	–	–
Ref. [29]	✓	✓	–	–	✓
Proposed control	✓	✓	✓	✓	✓

Table 2. RMSE comparison of different controllers in different cases.

Controller	Case 1 RMSE	Case 2 RMSE
Proposed controller	0.004	0.007
PID	0.226	0.464
Ref. [34]	0.122	0.293
Ref. [30]	0.009	0.143
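Table 2 can also be read as relative improvements. The sketch below recomputes, from the tabulated values, the percentage RMSE reduction of the proposed controller with respect to each baseline, together with the usual RMSE definition for a sampled tracking-error sequence. It assumes the tabulated RMSE is computed this way over the heading tracking error; the sampling interval and units used for Table 2 are not stated in this part of the paper, and the names `rmse`, `TABLE2` and `reduction` are illustrative.

```python
import math
from typing import Dict, Sequence


def rmse(errors: Sequence[float]) -> float:
    """Root-mean-square of a sampled tracking-error sequence."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# RMSE values transcribed from Table 2.
TABLE2: Dict[str, Dict[str, float]] = {
    "Case 1": {"Proposed": 0.004, "PID": 0.226, "Ref. [34]": 0.122, "Ref. [30]": 0.009},
    "Case 2": {"Proposed": 0.007, "PID": 0.464, "Ref. [34]": 0.293, "Ref. [30]": 0.143},
}


def reduction(case: str, baseline: str) -> float:
    """Percentage RMSE reduction of the proposed controller relative to a baseline."""
    row = TABLE2[case]
    return 100.0 * (row[baseline] - row["Proposed"]) / row[baseline]


for case in TABLE2:
    for baseline in ("PID", "Ref. [34]", "Ref. [30]"):
        print(f"{case} vs {baseline}: {reduction(case, baseline):.1f}%")
```

For example, in Case 1 the proposed controller reduces the RMSE by about 98.2% relative to PID and about 55.6% relative to [30]; in Case 2 the corresponding reduction relative to [30] is about 95.1%.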

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Gao, X.; Hu, X.; Yang, A. Fuzzy Reinforcement Learning Disturbance Cancellation Optimized Course Tracking Control for USV Autopilot Under Actuator Constraint. J. Mar. Sci. Eng. 2025, 13, 1429. https://doi.org/10.3390/jmse13081429


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
