Adaptive Multi-Surface Sliding Mode Control with Radial Basis Function Neural Networks and Reinforcement Learning for Multirotor Slung Load Systems

Peris, Clevon; Norton, Michael; Khoo, Suiyang

doi:10.3390/electronics13122424


by Clevon Peris *, Michael Norton and Suiyang Khoo

School of Engineering, Deakin University, Geelong, VIC 3220, Australia

* Author to whom correspondence should be addressed.

Electronics 2024, 13(12), 2424; https://doi.org/10.3390/electronics13122424

Submission received: 15 May 2024 / Revised: 14 June 2024 / Accepted: 19 June 2024 / Published: 20 June 2024

(This article belongs to the Special Issue UAV (Unmanned Aerial Vehicles) Networks: Recent Developments and Emerging Trends)


Abstract:

While using multirotor UAVs for transport of suspended payloads, there is a need for stability along the desired path, in addition to avoidance of any excessive payload oscillations, and a good level of precision in maintaining the desired path of the vehicle. However, due to the nonlinear and underactuated nature of the system, in addition to the presence of mismatched uncertainties, the development of a control system for this application poses an interesting research problem. This paper proposes a control architecture for a multirotor slung load system by integrating a Multi-Surface Sliding Mode Control, aided by a Radial Basis Function Neural Network, with a Deep Q-Network Reinforcement Learning agent. The former will be used to ensure asymptotic tracking stability, while the latter will be used to suppress payload oscillations. First, we will present the dynamics of a multirotor slung load system, represented here as a quadrotor with a single pendulum load suspended from it. We will then propose a control method in which a multi-surface sliding mode controller, based on an adaptive RBF Neural Network for trajectory tracking of the quadrotor, works in tandem with a Deep Q-Network Reinforcement Learning agent whose reward function aims to suppress the oscillations of the single pendulum slung load. Simulation results demonstrate the effectiveness and potential of the proposed approach in achieving precise and reliable control of multirotor slung load systems.

Keywords:

aerial robotics; multi-surface sliding mode control; adaptive control; control systems; radial basis function; neural networks; reinforcement learning

1. Introduction

1.1. Background

Unmanned Aerial Vehicles (UAVs) have been applied in a variety of fields [1]. One of the most well-known applications is the transport of suspended payloads, which can be in the form of parcel delivery, military rescue, surveillance systems, etc. UAVs are particularly beneficial for this purpose, owing to their versatility, adaptability, and efficiency. Slung load UAVs are capable of carrying payloads suspended beneath the aircraft, enabling the transportation of goods to remote or inaccessible areas with effective agility and speed. However, the dynamics of slung load systems introduce unique challenges that must be addressed to ensure safe and efficient operation. The combination of UAVs with slung loads poses an interesting control problem. A multirotor UAV is underactuated [2], in that it has more degrees of freedom than the number of control inputs available to us. The addition of this slung load adds further degrees of freedom to the system, only increasing the complexity of the problem. The slung load also has its own complexities such as load oscillations, aerodynamic disturbances, and dynamic instabilities, all of which significantly impact the UAV’s flight performance and the safety of the payload it transports.

In recent years, researchers have explored various control methods to handle this problem. In [3], a backstepping technique was introduced to manage the dynamics of a quadrotor carrying a suspended load with limited actuators. The focus was on tackling the challenge of accurately following desired trajectories for the hanging payload. The research outlined in [4] utilized Fractional Order Sliding Mode Control (FOSMC) to improve the performance of a quadrotor tracking a predefined trajectory, adding to existing knowledge by developing a resilient FOSMC strategy for quadrotors to also handle external disturbances. In [5], an approach of Nonlinear Proportional–Integral–Derivative (NLPID) controllers for position response was combined with PID controllers to handle the attitude and swing response of a coupled UAV system. The authors of [6] explored an adaptive neural network-based fault-tolerant control method tailored for quadcopter slung load systems, looking to account for perturbations such as actuator faults, wind disturbances, and the presence of a suspended payload. A double active disturbance rejection control (ADRC) strategy was suggested in [7], in which the control system incorporated an extended state observer (ESO) in both the position and attitude loops to accurately estimate and counteract system disturbances. The authors of [8] studied an axially moving helicopter slung load system, by employing a fuzzy logic system (FLS). A fuzzy control technique was utilized to compensate for system uncertainties and fault deviation vectors, after which an adaptive fuzzy control law was crafted within the introduced FLS framework. In [9], a geometric finite-time inner–outer-loop control approach was introduced for quadrotor slung load systems. This strategy involved a two-loop feedback control system. The inner-loop control law focused on achieving finite-time attitude tracking for the quadrotor, while the outer-loop controller stabilized the cable direction, aiming to ensure precise tracking of the payload trajectory. In [10], an Adaptive LQR controller was introduced, where the controller’s objective focused on regulating an arbitrary point, specifically the geometric centroid of the quadcopter, rather than that of the payload or the quadcopter’s center of gravity. In [11], a backstepping technique, accompanied by the introduction of an Uncertainty and Disturbance Estimator (UDE), was developed. The UDE was noted for its ability to transform the robust control challenge into a low-pass filter design in the frequency domain, thereby producing an estimation of lumped uncertainties.

1.2. Overview

In the case of multirotor UAVs with slung loads, most existing control methods consider the application of a robust control method, which primarily focuses on the trajectory tracking control of the multirotor, while using methods such as disturbance rejection or fault tolerance for any mismatched uncertainties, including those of slung load oscillations. Reinforcement learning methods are known to be effective in the stabilization of pendula attached to moving vehicles, although these have primarily focused on inverted or rotating pendula. With the benefits of robust trajectory tracking control, and the effectiveness of a self-learning reinforcement learning (RL) controller, there is potential to produce an efficient controller which can suppress payload oscillations for any given trajectory of the multirotor.

RL has previously been implemented with robust controllers, such as model predictive control for autonomous vehicles [12], sliding mode control for PV system grid integration [13], PID for a dynamic positioning system [14], LQR for continuous-time systems [15], and also some adaptive control methods [16,17]. In these cases, however, the RL techniques are applied to the robust controllers themselves, such that the control parameters—for instance, the P, I, and D parameters of a PID controller [18]—are adjusted by an RL algorithm based on feedback received from the system it is applied to. They have not commonly been used alongside robust control methods, and their application to slung load pendulum systems is still in its early stages.

Deep Q-Network (DQN) is an RL agent that combines the principles of reinforcement learning with deep neural networks, enabling the training of agents to make high-level decisions in environments with large state and action spaces. At its core, DQN is built upon the Q-learning algorithm, a classic reinforcement learning technique. The fundamental idea behind DQN is to approximate the optimal action-value function using a deep neural network. The use of deep neural networks allows DQN to handle complex and high-dimensional input spaces effectively [19,20,21,22]. DQN has also proven to be effective in applications involving drones, such as autonomous navigation [23], parcel delivery [24], and surveillance systems [25]. Some work implementing DQN agents for the control of inverted pendula presently exists [26,27,28], although no existing work has been carried out for slung load systems, or for multilink pendula. Some research has been carried out with UAVs [29,30], although as is the case with RL, this does not include slung load transport.
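The Q-learning update that DQN builds on can be sketched in a few lines. The snippet below is only an illustration, assuming a discount factor `gamma` and a plain list standing in for the Q-values that a target network would return; the helper names `td_target` and `epsilon_greedy` are hypothetical and are not taken from the paper.

```python
import random


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Illustrative numbers only (not from the paper).
target = td_target(reward=1.0, next_q_values=[0.2, 0.5, -0.1], done=False, gamma=0.9)
greedy_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

In a full DQN, the squared difference between this target and the network's current estimate of Q(s, a) would serve as the training loss.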

Multi-Surface Sliding Mode Control (MSSC) is a variation of sliding mode control which was suggested as a means of simplifying the development of control systems with complexity in model differentiation [31]. MSSC divides a system into several smaller subsystems based on their structural characteristics, with each subsystem having its own sliding surface. In recent times, MSSC has emerged as a highly promising approach in control methodology. Unlike traditional sliding mode control, where a single sliding surface guides the system’s states along a desired trajectory, MSSC divides the control space into multiple sliding surfaces. This addresses challenges such as uncertainties, disturbances, and nonlinearities that can lead to performance issues in conventional methods. Researchers have explored MSSC applications across a range of fields including robotics, aerospace, automotive, and power systems [32,33].

While a conventional MSSC is effective in controlling the system, both in terms of trajectory tracking and suppression of oscillations, there is potential for it to be improved. The integration of RBF Neural Networks (RBFNNs) with MSSC enhances the robustness and adaptability of the control system, as RBFNNs are utilized to approximate and compensate for the uncertainties and nonlinearities in the multirotor slung load (MSL) system, while the MSSC maintains the system’s trajectory. Furthermore, the use of DQN-based RL introduces an adaptive element to the control strategy, which enables the agent acting on the slung load pendulum to learn optimal control policies through interaction with the environment, thus improving performance over time, particularly in scenarios where the system dynamics may change or where there are varying operational conditions.

There is presently an attempt to shift industrial applications such as load delivery and transport to methods which are more effective and efficient in terms of time and labor. These include the replacement of urban trucks for such operations with drones [34], healthcare delivery [35], and agriculture [36], and utilizing modern advancements in aerial vehicles, such as eVTOLs, for intelligent transportation systems [37]. With the advent of these intelligent systems, a control method to support such systems in the presence of unexpected disturbances would prove to be very useful.

Keeping these factors in mind, in this work, we will combine the benefits of both the robust MSSC and the self-learning DQN-based RL methods. We will present a framework wherein an MSSC, aided by a Radial Basis Function Neural Network (RBFNN), will be used to guarantee asymptotic stability of the multirotor UAV, while the RL agent will be applied to the suppression of the slung load oscillations. The main contributions of this paper are as follows:

(i): Investigation of an RBFNN-MSSC applied to a thrust-vectored multirotor for trajectory tracking purposes;
(ii): Application of a DQN-based RL agent to a slung load pendulum;
(iii): Comparison of the performance of the combined control system with an RBFNN-MSSC applied to the entire multirotor slung load (MSL) system, based on the slung load oscillations.

2. Methodology

2.1. Dynamic Modeling

In this section, we propose a dynamic model for our multirotor slung load system, cascading the dynamics of the multirotor and the slung load into a unified system. We use two state vectors to describe this system: $q_{trans} = [x, y, z, θ_{x}, θ_{y}]$ represents the system’s translational motion, with $x$, $y$ and $z$ denoting the position and $θ_{x}$ and $θ_{y}$ denoting the slung load oscillation angles, and $q_{rot} = [ϕ, θ, ψ]$ represents the rotational motion of the multirotor in terms of its roll, pitch, and yaw. The overall state vector is $q = [q_{trans}, q_{rot}]$. We take $M$ to be the multirotor mass and $m$ to be the slung load mass.

2.1.1. Multirotor Dynamics

For this work, we will consider a quadcopter with 4 rotors as our multirotor. The characteristics of this system are highlighted in Figure 1. In the following equations,

R_{A}^{B}

will denote the rotation of some frame B into another frame A. This procedure is well documented in the literature [38].

R_{X} (θ) = [\begin{matrix} 1 & 0 & 0 \\ 0 & C (θ) & - S (θ) \\ 0 & S (θ) & C (θ) \end{matrix}], \quad R_{Y} (θ) = [\begin{matrix} C (θ) & 0 & S (θ) \\ 0 & 1 & 0 \\ - S (θ) & 0 & C (θ) \end{matrix}], \quad R_{Z} (θ) = [\begin{matrix} C (θ) & - S (θ) & 0 \\ S (θ) & C (θ) & 0 \\ 0 & 0 & 1 \end{matrix}]

(1)

Here, $C (θ) = \cos θ$ and $S (θ) = \sin θ$. These represent the general rotation matrices of the multirotor in terms of its roll, pitch, and yaw.

R_{P_{i}}^{b} = R_{Z} (\frac{(i - 1) π}{2}) R_{X} (A_{i}) R_{Y} (B_{i}) = [\begin{matrix} R_{P_{i}, 11}^{b} & R_{P_{i}, 12}^{b} & R_{P_{i}, 13}^{b} \\ R_{P_{i}, 21}^{b} & R_{P_{i}, 22}^{b} & R_{P_{i}, 23}^{b} \\ R_{P_{i}, 31}^{b} & R_{P_{i}, 32}^{b} & R_{P_{i}, 33}^{b} \end{matrix}]

(2)

where, with $C_{(\cdot)}$ and $S_{(\cdot)}$ denoting the cosine and sine of the subscripted angle,

\begin{array}{l} R_{P_{i}, 11}^{b} = C_{\frac{(i - 1) π}{2}} C_{B_{i}} - S_{\frac{(i - 1) π}{2}} S_{A_{i}} S_{B_{i}} \\ R_{P_{i}, 12}^{b} = - S_{\frac{(i - 1) π}{2}} C_{A_{i}} \\ R_{P_{i}, 13}^{b} = C_{\frac{(i - 1) π}{2}} S_{B_{i}} + S_{\frac{(i - 1) π}{2}} S_{A_{i}} C_{B_{i}} \\ R_{P_{i}, 21}^{b} = S_{\frac{(i - 1) π}{2}} C_{B_{i}} + C_{\frac{(i - 1) π}{2}} S_{A_{i}} S_{B_{i}} \\ R_{P_{i}, 22}^{b} = C_{\frac{(i - 1) π}{2}} C_{A_{i}} \\ R_{P_{i}, 23}^{b} = S_{\frac{(i - 1) π}{2}} S_{B_{i}} - C_{\frac{(i - 1) π}{2}} S_{A_{i}} C_{B_{i}} \\ R_{P_{i}, 31}^{b} = - C_{A_{i}} S_{B_{i}} \\ R_{P_{i}, 32}^{b} = S_{A_{i}} \\ R_{P_{i}, 33}^{b} = C_{A_{i}} C_{B_{i}} \end{array}

(3)
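As a numerical sanity check, the product in (2) can be multiplied out and compared with the entries obtained by carrying that product out by hand. The sketch below is a minimal illustration using plain Python lists; the function names and the test angles are assumptions made for this example only.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(p, q):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def propeller_rotation(i, tilt_a, tilt_b):
    """R_Z((i - 1) * pi / 2) R_X(A_i) R_Y(B_i) for propeller i = 1, 2, ..."""
    arm = (i - 1) * math.pi / 2
    return matmul(rot_z(arm), matmul(rot_x(tilt_a), rot_y(tilt_b)))


def closed_form(i, tilt_a, tilt_b):
    """Entries obtained by expanding the same product symbolically."""
    arm = (i - 1) * math.pi / 2
    cg, sg = math.cos(arm), math.sin(arm)
    ca, sa = math.cos(tilt_a), math.sin(tilt_a)
    cb, sb = math.cos(tilt_b), math.sin(tilt_b)
    return [[cg * cb - sg * sa * sb, -sg * ca, cg * sb + sg * sa * cb],
            [sg * cb + cg * sa * sb, cg * ca, sg * sb - cg * sa * cb],
            [-ca * sb, sa, ca * cb]]


def max_abs_difference(i, tilt_a, tilt_b):
    num = propeller_rotation(i, tilt_a, tilt_b)
    ref = closed_form(i, tilt_a, tilt_b)
    return max(abs(num[r][c] - ref[r][c]) for r in range(3) for c in range(3))
```

Because $R_{Z}$ leaves the third row of a matrix unchanged, the third-row entries depend only on the tilt angles $A_{i}$ and $B_{i}$, not on the arm index.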

O_{P_{i}}^{b} = R_{Z} (\frac{(i - 1) π}{2}) [\begin{matrix} L \\ 0 \\ 0 \end{matrix}] = [\begin{matrix} L C_{\frac{(i - 1) π}{2}} \\ L S_{\frac{(i - 1) π}{2}} \\ 0 \end{matrix}]

(4)

where $i = 1, \dots, n$ (for this quadrotor, $n = 4$), and $L$ is the arm length of the multirotor, measured from the body frame center ($O_{b}$) to the center of a propeller ($O_{P_{i}}^{b}$). The torque acting on any propeller $i$, denoted as $τ_{p_{i}}$, is first obtained using Euler’s angular momentum theory:

τ_{p_{i}} = I_{P_{i}} \dot{ω}_{P_{i}} + ω_{P_{i}} \times I_{P_{i}} ω_{P_{i}} + {[\begin{matrix} 0 & 0 & k_{c} ω_{P_{i} Z} |ω_{P_{i} Z}| \end{matrix}]}^{T}

(5)

where $I_{P_{i}}$ represents each propeller’s inertia matrix, $ω_{P_{i} Z}$ denotes the z-axis component of the vector $ω_{P_{i}}$, $k_{c} > 0$ is the proportionality coefficient between $ω_{P_{i} Z}$ and the counter-rotating torque about the $Z_{P_{i}}$ axis,

ω_{P_{i}} = {(R_{P_{i}}^{b})}^{- 1} ω_{b} + [\begin{matrix} \dot{α}_{i} \\ \dot{β}_{i} \\ \hat{ω}_{i} \end{matrix}]

(6)

and $\hat{ω}_{i}$ is the angular velocity of the $i^{th}$ propeller. If $T_{P_{i}}$ is the thrust produced by the $i^{th}$ propeller, we obtain

T_{P_{i}} = {[\begin{matrix} 0 & 0 & k_{f} \hat{ω}_{i} |\hat{ω}_{i}| \end{matrix}]}^{T}

(7)

where $k_{f} > 0$ is a fixed proportionality constant. Applying the fundamental theorem of mechanics for the body frame, together with Euler’s angular momentum theory, we obtain

S_{1} : M [\begin{matrix} \ddot{X}_{b} \\ \ddot{Y}_{b} \\ \ddot{Z}_{b} \end{matrix}] = R_{W}^{b} [\begin{matrix} 0 \\ 0 \\ - M g \end{matrix}] + \sum_{i = 1}^{4} R_{P_{i}}^{b} T_{P_{i}} + F_{D}

(8)

S_{2} : I_{b} \dot{ω}_{b} = \sum_{i = 1}^{4} (O_{P_{i}}^{b} \times R_{P_{i}}^{b} T_{P_{i}} - R_{P_{i}}^{b} τ_{p_{i}}) - ω_{b} \times I_{b} ω_{b}

(9)

where $I_{b}$ is the inertia matrix of the multirotor body and $F_{D}$ represents the force due to drag.
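The thrust in (7) and the counter-rotating torque term in (5) share the same quadratic dependence on propeller speed, which is why their ratio reduces to $k_{c} / k_{f}$. The sketch below illustrates this under the assumption that the z-axis component of the propeller's angular velocity is dominated by its spin rate; the constants and the speed are illustrative placeholders, not values from the paper.

```python
def propeller_thrust(omega_hat, k_f):
    """Thrust vector of Eq. (7) in the propeller frame: [0, 0, k_f * w * |w|]."""
    return [0.0, 0.0, k_f * omega_hat * abs(omega_hat)]


def reaction_torque(omega_z, k_c):
    """Counter-rotating torque term of Eq. (5): [0, 0, k_c * w_z * |w_z|]."""
    return [0.0, 0.0, k_c * omega_z * abs(omega_z)]


# Illustrative constants and speed (assumed, not from the paper).
K_F, K_C = 1.0e-5, 2.0e-7
spin = 400.0

thrust = propeller_thrust(spin, K_F)
torque = reaction_torque(spin, K_C)  # assumes omega_z is close to the spin rate
ratio = torque[2] / thrust[2]
```

The `w * |w|` form keeps the sign of the speed, so reversing the spin direction reverses both the thrust and the reaction torque.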

Considering an n-propeller multicopter, where $i = 1, \dots, n$ and $n \geq 2$, and using the above equations, we obtain

S_{3} : M [\begin{matrix} \ddot{X}_{b} \\ \ddot{Y}_{b} \\ \ddot{Z}_{b} \end{matrix}] = R_{W}^{b} [\begin{matrix} 0 \\ 0 \\ - M g \end{matrix}] + \sum_{i = 1}^{n} R_{P_{i}}^{b} T_{P_{i}} + F_{D}

(10)

S_{4} : I_{b} \dot{ω}_{b} = \sum_{i = 1}^{n} (O_{P_{i}}^{b} \times R_{P_{i}}^{b} T_{P_{i}} - R_{P_{i}}^{b} τ_{p_{i}}) - ω_{b} \times I_{b} ω_{b}

(11)

2.1.2. Slung Load Dynamics

Now, we will define the dynamics of the slung load model, which will be represented in the form of a single pendulum. As we take the pendulum to be a point mass, its kinetic energy is purely translational. By considering the pendulum’s three-dimensional angle, we can obtain its exact location, while ignoring cable hoisting and keeping the cable length $l$ constant, essentially considering the cable to be rigid and inelastic. The slung load displacement is represented by $θ_{x}$ and $θ_{y}$, which denote the oscillations about the co-ordinate axes. To represent the linear position of the slung load in the co-ordinate axes, we define a vector $X_{1}$, such that

X_{1} = l [\begin{matrix} \sin (θ_{x}) \cos (θ_{y}) \\ \sin (θ_{x}) \sin (θ_{y}) \\ \cos (θ_{x}) \end{matrix}]

(12)
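Equation (12) maps the two swing angles to a point at fixed distance $l$ from the suspension point. A minimal sketch of that mapping follows; the function name and the sample values are assumptions chosen for illustration.

```python
import math


def load_offset(theta_x, theta_y, cable_length):
    """Eq. (12): offset of the point-mass load from the suspension point."""
    return [
        cable_length * math.sin(theta_x) * math.cos(theta_y),
        cable_length * math.sin(theta_x) * math.sin(theta_y),
        cable_length * math.cos(theta_x),
    ]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


# Illustrative angles (rad) and cable length (m), not from the paper.
offset = load_offset(0.2, -0.4, 1.5)
distance = norm(offset)
```

Whatever the angles, the norm of the offset equals the cable length, which reflects the rigid, inelastic cable assumption.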

As per the dynamic model defined in [39], the system’s dynamic equations are

(M + m) \ddot{x} - m l \cos (θ_{x}) \cos (θ_{y}) \ddot{θ}_{x} + m l \sin (θ_{x}) \sin (θ_{y}) \ddot{θ}_{y} - m l \cos (θ_{x}) \sin (θ_{y}) \dot{θ}_{y}^{2} - m l \sin (θ_{x}) \cos (θ_{y}) \dot{θ}_{x}^{2} = f_{x} (t)

(13)

(M + m) \ddot{y} - m l \cos (θ_{x}) \sin (θ_{y}) \ddot{θ}_{x} + m l \sin (θ_{x}) \cos (θ_{y}) \ddot{θ}_{y} + m l \cos (θ_{x}) \cos (θ_{y}) \dot{θ}_{x}^{2} - m l \sin (θ_{x}) \sin (θ_{y}) \dot{θ}_{y}^{2} = f_{y} (t)

(14)

(M + m) \ddot{z} + m l \sin (θ_{x}) \ddot{θ}_{x} + m l \sin (θ_{x}) \dot{θ}_{x}^{2} + m l \sin (θ_{y}) \ddot{θ}_{y} + m l \sin (θ_{y}) \dot{θ}_{y}^{2} + (M + m) g = f_{z} (t)

(15)

(m l^{2} \cos^{2} (θ_{x}) + I_{x x}) \ddot{θ}_{x} - m l \cos (θ_{x}) \cos (θ_{y}) \ddot{x} - m l \cos (θ_{x}) \sin (θ_{y}) \ddot{y} + m l \sin (θ_{x}) \ddot{z} + m l^{2} \cos (θ_{y}) \sin (θ_{y}) \dot{θ}_{y}^{2} + m g l \sin (θ_{x}) = 0

(16)

(m l^{2} + I_{y y}) \ddot{θ}_{y} + m l \sin (θ_{x}) \sin (θ_{y}) \ddot{x} + m l \sin (θ_{x}) \cos (θ_{y}) \ddot{y} + m l \sin (θ_{y}) \ddot{z} - m l^{2} \sin (θ_{x}) \cos (θ_{x}) \dot{θ}_{y} = 0

(17)

Here, we will use the $θ_{x}$ and $θ_{y}$ terms to train our DQN agent.

Defining $x_{b} = {[\begin{matrix} X_{b} & Y_{b} & Z_{b} \end{matrix}]}^{T}$, taking $T_{P_{i}} = {[\begin{matrix} 0 & 0 & k_{f} \hat{ω}_{i} |\hat{ω}_{i}| \end{matrix}]}^{T}$ as in (7), and obtaining the linear and angular accelerations from subsystems $S_{3}$ and $S_{4}$ results in

\ddot{x}_{b} = R_{W}^{b} [\begin{matrix} 0 \\ 0 \\ - g \end{matrix}] + \frac{1}{M} \sum_{i = 1}^{4} R_{P_{i}}^{b} T_{P_{i}} + D_{x_{b}}

(18)

\dot{ω}_{b} = I_{b}^{- 1} \sum_{i = 1}^{4} (O_{P_{i}}^{b} \times R_{P_{i}}^{b} T_{P_{i}} - \frac{k_{c}}{k_{f}} R_{P_{i}}^{b} T_{P_{i}}) + D_{ω}

(19)

where

D_{ω} = - I_{b}^{- 1} [\sum_{i = 1}^{4} R_{P_{i}}^{b} (I_{P_{i}} \dot{ω}_{P_{i}} + ω_{P_{i}} \times I_{P_{i}} ω_{P_{i}}) + ω_{b} \times I_{b} ω_{b}] + Δ_{ω}

(20)

D_{x_{b}} = \frac{1}{M + M_{L}} F_{D} + Δ_{x_{b}}

(21)

where $Δ_{ω}$ and $Δ_{x_{b}}$ represent any external uncertainties impacting the UAV’s rate of change in angular and linear momentum, respectively, and $x_{b}$ and $ω_{b}$ represent the system’s position and angular velocity, respectively, in the body frame.

Differentiating (18) and (19) with respect to time results in

\dddot{x}_{b} = \dot{R}_{W}^{b} [\begin{matrix} 0 \\ 0 \\ - g \end{matrix}] + \frac{1}{M + M_{L}} \sum_{i = 1}^{n} (\frac{\partial R_{P_{i}}^{b}}{\partial A_{i}} T_{P_{i}} \dot{A}_{i} + \frac{\partial R_{P_{i}}^{b}}{\partial B_{i}} T_{P_{i}} \dot{B}_{i} + R_{P_{i}}^{b} \frac{\partial T_{P_{i}}}{\partial \hat{ω}_{i}} \dot{\hat{ω}}_{i}) + \dot{D}_{x_{b}}

(22)

= \dot{R}_{W}^{b} [\begin{matrix} 0 \\ 0 \\ - g \end{matrix}] + F_{x \dot{A}} \dot{A} + F_{x \dot{B}} \dot{B} + F_{x \dot{\hat{ω}}} \dot{\hat{ω}} + \dot{D}_{x_{b}}

(23)

\ddot{ω}_{b} = I_{b}^{- 1} [\sum_{i = 1}^{n} O_{P_{i}}^{b} \times (\frac{\partial R_{P_{i}}^{b}}{\partial A_{i}} T_{P_{i}} \dot{A}_{i} + \frac{\partial R_{P_{i}}^{b}}{\partial B_{i}} T_{P_{i}} \dot{B}_{i} + R_{P_{i}}^{b} \frac{\partial T_{P_{i}}}{\partial \hat{ω}_{i}} \dot{\hat{ω}}_{i}) - \frac{k_{c}}{k_{f}} \sum_{i = 1}^{n} (\frac{\partial R_{P_{i}}^{b}}{\partial A_{i}} T_{P_{i}} \dot{A}_{i} + \frac{\partial R_{P_{i}}^{b}}{\partial B_{i}} T_{P_{i}} \dot{B}_{i} + R_{P_{i}}^{b} \frac{\partial T_{P_{i}}}{\partial \hat{ω}_{i}} \dot{\hat{ω}}_{i})] + \dot{D}_{ω} = F_{ω \dot{A}} \dot{A} + F_{ω \dot{B}} \dot{B} + F_{ω \dot{\hat{ω}}} \dot{\hat{ω}} + \dot{D}_{ω}

(24)

where

\begin{array}{l} A = {[A_{1}, A_{2}, \dots, A_{n}]}^{T}, \quad \dot{A} = {[\dot{A}_{1}, \dot{A}_{2}, \dots, \dot{A}_{n}]}^{T} \\ B = {[B_{1}, B_{2}, \dots, B_{n}]}^{T}, \quad \dot{B} = {[\dot{B}_{1}, \dot{B}_{2}, \dots, \dot{B}_{n}]}^{T} \\ \hat{ω} = {[\hat{ω}_{1}, \hat{ω}_{2}, \dots, \hat{ω}_{n}]}^{T}, \quad \dot{\hat{ω}} = {[\dot{\hat{ω}}_{1}, \dot{\hat{ω}}_{2}, \dots, \dot{\hat{ω}}_{n}]}^{T} \end{array}

(25)

F_{x \dot{A}} = \frac{1}{M + M_{L}} [\begin{matrix} \frac{\partial R_{P_{1}}^{b}}{\partial A_{1}} T_{P_{1}} & \frac{\partial R_{P_{2}}^{b}}{\partial A_{2}} T_{P_{2}} & \dots & \frac{\partial R_{P_{n}}^{b}}{\partial A_{n}} T_{P_{n}} \end{matrix}],

(26)

F_{ω \dot{A}} = I_{b}^{- 1} [\begin{matrix} O_{P_{1}}^{b} \times \frac{\partial R_{P_{1}}^{b}}{\partial A_{1}} T_{P_{1}} - \frac{k_{c}}{k_{f}} \frac{\partial R_{P_{1}}^{b}}{\partial A_{1}} T_{P_{1}} & O_{P_{2}}^{b} \times \frac{\partial R_{P_{2}}^{b}}{\partial A_{2}} T_{P_{2}} - \frac{k_{c}}{k_{f}} \frac{\partial R_{P_{2}}^{b}}{\partial A_{2}} T_{P_{2}} & \dots & O_{P_{n}}^{b} \times \frac{\partial R_{P_{n}}^{b}}{\partial A_{n}} T_{P_{n}} - \frac{k_{c}}{k_{f}} \frac{\partial R_{P_{n}}^{b}}{\partial A_{n}} T_{P_{n}} \end{matrix}]

(27)

It is understood that $F_{x \dot{B}}$ and $F_{x \dot{\hat{ω}}}$ can be expressed similarly to $F_{x \dot{A}}$, while $F_{ω \dot{B}}$ and $F_{ω \dot{\hat{ω}}}$ can be defined like $F_{ω \dot{A}}$.

Taking $x_{1} = {[x_{b}^{T} \; X_{1}^{T} \; {(\int ω_{b} d t)}^{T}]}^{T}$, $x_{2} = {[\dot{x}_{b}^{T} \; X_{2}^{T} \; ω_{b}^{T}]}^{T}$ and $x_{3} = {[\ddot{x}_{b}^{T} \; X_{3}^{T} \; \dot{ω}_{b}^{T}]}^{T}$, where $X_{2} = \dot{X}_{1}$ and $X_{3} = \dot{X}_{2}$, we can rearrange (22)–(24) as

\begin{array}{l} \dot{x}_{1} = x_{2} + δ_{1} \\ \dot{x}_{2} = x_{3} + δ_{2} \\ \dot{x}_{3} = [\begin{matrix} \dot{R}_{W}^{b} [\begin{matrix} 0 \\ 0 \\ - g \end{matrix}] \\ 0 \end{matrix}] + [\begin{matrix} J_{1} \\ J_{2} \\ J_{3} \end{matrix}] U + [\begin{matrix} \dot{D}_{x_{b}} \\ \dot{D}_{ω} \end{matrix}] \end{array}

(28)

where

J_{1} = J_{2} = [\begin{matrix} F_{x \dot{A}} (A, B, \hat{ω}) & F_{x \dot{B}} (A, B, \hat{ω}) & F_{x \dot{\hat{ω}}} (A, B, \hat{ω}) \end{matrix}],

(29)

J_{3} = [\begin{matrix} F_{ω \dot{A}} (A, B, \hat{ω}) & F_{ω \dot{B}} (A, B, \hat{ω}) & F_{ω \dot{\hat{ω}}} (A, B, \hat{ω}) \end{matrix}]

(30)

are the Jacobian matrices which multiply the control law, $U$, of the multirotor, and $δ_{1}$, $δ_{2}$ and $δ_{3} = [\begin{matrix} \dot{D}_{x_{b}} \\ \dot{D}_{ω} \end{matrix}]$ represent unknown disturbances in the states of position, velocity, and acceleration, respectively.

2.2. Multi-Surface Sliding Mode Control

In this section, we derive the control for our MSL system. This is in the form of a Multi-Surface Sliding Mode Control, aided by a Radial Basis Function Neural Network. While MSSC has been effectively applied to the MSL system in question previously [40], the addition of an RBFNN to aid with system estimation yields some significant benefits. Its inclusion allows for the dynamic approximation of complex nonlinear functions and system uncertainties, which do not have a particular definition. Furthermore, RBFNNs provide an adaptive mechanism that continuously updates the control strategy based on real-time feedback from the system, ensuring that the controller adapts to changes in the system dynamics, maintaining optimal performance without the need for manual retuning.

For the co-ordinate system defined in (28), we first define the sliding surface variables for this system as

s_{1} = x_{1} - x_{1 d}

(31)

s_{2} = x_{2} - x_{2 d}

(32)

s_{3} = x_{3} - x_{3 d}

(33)

where $x_{1 d}$ is the desired position of the multirotor, which is assumed to be bounded, together with $\dot{x}_{1 d}$ and $\ddot{x}_{1 d}$, and $x_{i d}$, $i = 2, 3$, is the virtual control at sliding surface $s_{i}$. Now, as the slung load oscillations are suppressed by the RL agent, $θ_{x}$ and $θ_{y}$ must be obtained from the RL simulation. While performing any maneuvers on the slung load pendulum, the RL reward function also takes the velocity, or the displacement of the slung load from its desired path, into account. The reward function governing the slung load will be defined later. First, we shall explain the derivation of MSSC using an RBF Neural Network (RBFNN).

Differentiating $s_{1}$ with respect to time, we acquire

\dot{s}_{1} = \dot{x}_{1} - \dot{x}_{1 d} = s_{2} + x_{2 d} + Δ_{1} - \dot{x}_{1 d}

(34)

where $Δ_{1} = δ_{1}$. Taking

x_{2 d} = \dot{x}_{1 d} - \frac{K_{1}}{ϕ_{1}} s_{1} - \hat{Δ}_{1}

(35)

where $\hat{Δ}_{1}$ is the approximation of $Δ_{1}$, $K_{1} > 0$ and $ϕ_{1} > 0$, and substituting (35) into (34) results in

\dot{s}_{1} = s_{2} + (Δ_{1} - \hat{Δ}_{1}) - \frac{K_{1}}{ϕ_{1}} s_{1}

(36)

Differentiating (35) with respect to time, and using $\dot{s}_{1} = x_{2} + Δ_{1} - \dot{x}_{1 d}$, results in

\dot{x}_{2 d} = \ddot{x}_{1 d} - \dot{\hat{Δ}}_{1} - \frac{K_{1}}{ϕ_{1}} (x_{2} + Δ_{1} - \dot{x}_{1 d})

(37)

Now, $\dot{x}_{2 d}$ is split into $\dot{x}_{2 d} = \hat{\dot{x}}_{2 d} + \tilde{\dot{x}}_{2 d}$, where $\hat{\dot{x}}_{2 d}$ consists of the known portions of $\dot{x}_{2 d}$ and $\tilde{\dot{x}}_{2 d}$ collects the unknown portions:

\hat{\dot{x}}_{2 d} = \ddot{x}_{1 d} - \frac{K_{1}}{ϕ_{1}} (x_{2} - \dot{x}_{1 d})

(38)

\tilde{\dot{x}}_{2 d} = - \dot{\hat{Δ}}_{1} - \frac{K_{1}}{ϕ_{1}} Δ_{1}

(39)

Now, differentiating $s_{2}$ with respect to time, we acquire

\dot{s}_{2} = \dot{x}_{2} - \dot{x}_{2 d} = s_{3} + x_{3 d} + Δ_{2} - \hat{\dot{x}}_{2 d}

(40)

where $Δ_{2} = δ_{2} - \tilde{\dot{x}}_{2 d}$. Then, $x_{3 d}$ is chosen as

x_{3 d} = \hat{\dot{x}}_{2 d} - \frac{K_{2}}{ϕ_{2}} s_{2} - \hat{Δ}_{2}

(41)

where $\hat{Δ}_{2}$ is an estimation of $Δ_{2}$, $K_{2} > 0$ and $ϕ_{2} > 0$.

Substituting (41) into (40), we obtain

\dot{s}_{2} = s_{3} + (Δ_{2} - \hat{Δ}_{2}) - \frac{K_{2}}{ϕ_{2}} s_{2}

(42)

Differentiating $x_{3 d}$ with respect to time yields

\dot{x}_{3 d} = \hat{\dot{x}}_{3 d} + \tilde{\dot{x}}_{3 d}

(43)

where

\hat{\dot{x}}_{3 d} = \dddot{x}_{1 d} - \frac{K_{1}}{ϕ_{1}} (x_{3} - \ddot{x}_{1 d}) - \frac{K_{2}}{ϕ_{2}} (x_{3} - \hat{\dot{x}}_{2 d})

(44)

\tilde{\dot{x}}_{3 d} = - \dot{\hat{Δ}}_{2} - \frac{K_{1}}{ϕ_{1}} δ_{2} - \frac{K_{2}}{ϕ_{2}} Δ_{2}

(45)

Finally, differentiating $s_{3}$ with respect to time, we acquire

\dot{s}_{3} = \dot{x}_{3} - \dot{x}_{3 d} = Q + J U + δ_{3} - \hat{\dot{x}}_{3 d} - \tilde{\dot{x}}_{3 d}

(46)

where $Q$ denotes the first term on the right-hand side of the $\dot{x}_{3}$ equation in (28). The control law is thus designed as

U = J^{- 1} (\hat{\dot{x}}_{3 d} - Q - \hat{Δ}_{3} - \frac{K_{3}}{ϕ_{3}} s_{3})

(47)

where $\hat{Δ}_{3}$ is an estimate of $Δ_{3} = δ_{3} - \tilde{\dot{x}}_{3 d}$, $K_{3} > 0$, and $ϕ_{3} > 0$, and $J = [\begin{matrix} J_{1} \\ J_{2} \\ J_{3} \end{matrix}]$ is as defined in (29) and (30).
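The cascade of virtual controls in (35), (41) and (47) can be exercised on a toy problem. The sketch below is a minimal illustration, assuming a scalar chain with dynamics x1' = x2, x2' = x3, x3' = Q + J U, no disturbances (so every estimate $\hat{Δ}_{i}$ is set to zero), a sinusoidal reference, and illustrative gains. The function names, the explicit-Euler integrator, and all numeric values are assumptions and do not reproduce the paper's simulation setup. The known parts of the virtual-control derivatives are obtained by differentiating the virtual controls directly.

```python
import math


def mssc_control(t, state, gains, layers, q_term, j_gain):
    """Scalar multi-surface sliding mode control with zero disturbance estimates."""
    x1, x2, x3 = state
    k1, k2, k3 = (gain / phi for gain, phi in zip(gains, layers))
    # Reference x1d(t) = sin(t) and its first three time derivatives.
    x1d, dx1d, ddx1d, dddx1d = math.sin(t), math.cos(t), -math.sin(t), -math.cos(t)
    s1 = x1 - x1d
    x2d = dx1d - k1 * s1                      # virtual control, cf. (35)
    s2 = x2 - x2d
    dx2d_known = ddx1d - k1 * (x2 - dx1d)     # known part of d(x2d)/dt
    x3d = dx2d_known - k2 * s2                # virtual control, cf. (41)
    s3 = x3 - x3d
    dx3d_known = dddx1d - k1 * (x3 - ddx1d) - k2 * (x3 - dx2d_known)
    u = (dx3d_known - q_term - k3 * s3) / j_gain   # control law, cf. (47)
    return u, (s1, s2, s3)


def simulate(t_end=10.0, dt=1e-3, q_term=0.0, j_gain=2.0):
    """Integrate the closed loop with explicit Euler; return the final tracking error."""
    state = [0.5, 0.0, 0.0]                   # start 0.5 away from the reference
    gains, layers = (2.0, 2.0, 2.0), (0.5, 0.5, 0.5)
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        u, _ = mssc_control(t, state, gains, layers, q_term, j_gain)
        state = [state[0] + dt * state[1],
                 state[1] + dt * state[2],
                 state[2] + dt * (q_term + j_gain * u)]
        t += dt
    return state[0] - math.sin(t)


final_error = simulate()
```

With the disturbances removed, the surface dynamics reduce to a stable linear cascade, so the tracking error decays from its initial value of 0.5 toward a small residual set by the integration step.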

2.3. Neural Network Approximation

Neural networks are typically used to approximate functions. Here, we will use RBF networks to estimate the unknown disturbances $Δ_{i}$, $i = 1, 2, 3$, similar to [41]. An RBF Neural Network with $N$ inputs and $M$ outputs, $\hat{F} (ζ) : R^{N} \to R^{M}$, can be presented by

\hat{F} (ζ) = {[f_{1}, f_{2}, \dots, f_{M}]}^{T}

(48)

f_{i} = \sum_{j = 1}^{L} γ_{i j} ρ_{j}, i = 1,2 \dots M

(49)

ρ_{j} = e^{(\frac{- {(ζ - μ_{j})}^{T} (ζ - μ_{j})}{ψ_{j}^{2}})}

(50)

where $L$ is the number of hidden nodes, $γ_{i j}$ signifies the weight which connects the $j^{th}$ hidden node to the $i^{th}$ output node, $μ_{j}$ denotes the $j^{th}$ central vector, $ρ_{j}$ is the $j^{th}$ Gaussian function, and $ψ_{j}$ represents the width of the $j^{th}$ Gaussian function, where $0 < ψ_{j} < 1$.
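The forward pass in (48)–(50) amounts to evaluating a set of Gaussian bumps and taking weighted sums of them. A minimal sketch follows, assuming plain Python lists for vectors and matrices; the function names, centers, widths and weights are illustrative choices, not values from the paper.

```python
import math


def rbf_activations(zeta, centers, widths):
    """Eq. (50): rho_j = exp(-||zeta - mu_j||^2 / psi_j^2)."""
    activations = []
    for mu, psi in zip(centers, widths):
        squared_distance = sum((z - m) ** 2 for z, m in zip(zeta, mu))
        activations.append(math.exp(-squared_distance / psi ** 2))
    return activations


def rbf_output(zeta, weights, centers, widths):
    """Eqs. (48)-(49): f_i = sum_j gamma_ij * rho_j for each output i."""
    rho = rbf_activations(zeta, centers, widths)
    return [sum(g * r for g, r in zip(row, rho)) for row in weights]


# Two hidden nodes, two outputs (illustrative values only).
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [0.5, 0.5]
weights = [[1.0, 0.0], [0.0, 2.0]]   # row i holds gamma_i1 ... gamma_iL

outputs = rbf_output([0.0, 0.0], weights, centers, widths)
```

At an input that coincides with a center, the corresponding Gaussian evaluates to one, so the first output here is just the weight attached to that node.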

To estimate an $M$-dimensional continuous function $F (ζ)$ using an RBF network, (49) is rearranged as

F (ζ) = \hat{F} (ζ) + ϵ = {γ^{*}}^{T} ρ + ϵ

(51)

where

ρ = {[ρ_{1}, ρ_{2}, \dots, ρ_{L}]}^{T}

(52)

{γ^{*}}^{T} = [\begin{matrix} γ_{11} & γ_{12} & \dots & γ_{1 L} \\ γ_{21} & γ_{22} & \dots & γ_{2 L} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ γ_{M 1} & γ_{M 2} & \dots & γ_{M L} \end{matrix}]

(53)

ϵ = {[ϵ_{1}, ϵ_{2}, \dots, ϵ_{M}]}^{T}

(54)

with $γ^{*}$ representing the ideal weight matrix and $ϵ$ denoting the approximation error vector obtained when the number of nodes being utilized is known beforehand. The unknown disturbances $Δ_{i}$, $i = 1, 2, 3$, are estimated by utilizing an RBF network as in (51), where

{\hat{Δ}}_{i} (X, γ_{Δ_{i}}) = γ_{Δ_{i}}^{T} ρ_{i}

(55)

with $X = (x_{1}, x_{2}, x_{3})$. The ideal weight $γ_{Δ_{i}}^{*}$ can be determined using the RBF network as follows:

γ_{Δ_{i}}^{*} = \arg \min_{γ_{Δ_{i}}} \sup_{X \in Ω} | Δ_{i} - \hat{Δ}_{i} |

(56)

where $Ω \subset R^{n}$ represents the region in which $X$ exists. We make the following assumptions in the deduction that follows:

Assumption 1: For $i = 1, 2, 3$, the ideal weight $γ_{Δ_{i}}^{*}$ is within the bounds:

γ_{Δ_{i_{min}}} \leq γ_{Δ_{i}}^{*} \leq γ_{Δ_{i_{max}}}

(57)

Assumption 2: For $i = 1, 2, 3$, the approximation error $ϵ_{i}$ is within the bounds:

|ϵ_{i}| \leq ϵ_{N_{i}}

(58)

where $ϵ_{N_{i}} > 0$; i.e., the error and the uncertainties lie within certain limits of the system’s behavior, to ensure that the system state is able to reach the sliding surface from any initial condition.

2.4. RBFNN-Based MSSC

Now, a robust controller is designed, aiming to control a quadrotor UAV subjected to unknown disturbances. In this regard, the multi-surface sliding controller represents the base control, while the disturbances will be approximated by the RBFNN.

Theorem 1:

For the multirotor slung load system, considering (57) and (58), if $ϵ_{N_{j}} \neq 0$ for $j = 1, 2, 3$, then with the virtual controllers in (35) and (41), and the control law in (47), in addition to the adaptation law defined by

\dot{γ}_{Δ_{j}}^{T} = \frac{s_{j}}{η_{j}} ρ_{j}

(59)

where $η_{j} > 0$, the system trajectories will arrive at the desired trajectory with a steady-state error bounded by $| ϕ_{1} |$.
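In discrete time, the adaptation law (59) can be applied as a simple weight update driven by the sliding variable. The sketch below uses a single explicit-Euler step for one scalar disturbance channel; the step size, the learning parameter and the helper names are assumptions made for illustration.

```python
def adapt_weights(weights, s, rho, eta, dt):
    """One explicit-Euler step of Eq. (59): gamma_dot = (s / eta) * rho."""
    return [g + dt * (s / eta) * r for g, r in zip(weights, rho)]


def estimate(weights, rho):
    """Eq. (55): disturbance estimate as the weighted sum of activations."""
    return sum(g * r for g, r in zip(weights, rho))


# Illustrative values only: two hidden nodes, positive sliding variable.
rho = [1.0, 0.5]
weights = adapt_weights([0.0, 0.0], s=0.2, rho=rho, eta=0.1, dt=0.01)
delta_hat = estimate(weights, rho)
```

A positive sliding variable pushes the estimate upward and a negative one pulls it down, while a smaller $η_{j}$ makes the update more aggressive.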

Proof:

Beginning with

S_{3}

, the Lyapunov function is taken as

V_{3} = \frac{1}{2} s_{3}^{2} + η_{3} Γ_{Δ_{3}} Γ_{Δ_{3}}^{T}

(60)

where

Γ_{Δ_{3}} = γ_{Δ_{3}}^{*} - γ_{Δ_{3}}

(61)

Differentiating

V_{3}

with respect to time, we obtain

{\dot{V}}_{3} = s_{3} {\dot{s}}_{3} - η_{3} Γ_{Δ_{3}} {\dot{γ}}_{Δ_{3}}^{T}

(62)

Substituting Equations (36), (42), (46), and (47) into (62), we acquire

{\dot{V}}_{3} = s_{3} ((Δ_{3} - {\hat{Δ}}_{3}) - ϵ_{3} - \frac{K_{3}}{ϕ_{3}} s_{3}) - η_{3} Γ_{Δ_{3}} {\dot{γ}}_{Δ_{3}}^{T}

(63)

Substituting (59) and (61), we obtain

{\dot{V}}_{3} = s_{3} ϵ_{3} - \frac{K_{3}}{ϕ_{3}} s_{3}^{2}

(64)

When

|s_{3}| > ϕ_{3}

,

{\dot{V}}_{3} < - \frac{K_{3}}{ϕ_{3}} s_{3}^{2}

(65)

where

K_{3} > ϵ_{N_{3}}

. Note that once

|s_{3}| \leq ϕ_{3}

is achieved,

s_{3}

will always remain within the boundary layer. Next, the second surface

s_{2}

is studied. The following Lyapunov function is taken:

V_{2} = \frac{1}{2} s_{2}^{2} + \frac{1}{2} η_{2} Γ_{Δ_{2}} Γ_{Δ_{2}}^{T}

(66)

By differentiating

V_{2}

with respect to time, we obtain

{\dot{V}}_{2} = s_{2} {\dot{s}}_{2} - η_{2} Γ_{Δ_{2}} {\dot{γ}}_{Δ_{2}}^{T} = s_{2} (s_{3} + (Δ_{2} - {\hat{Δ}}_{2}) - ϵ_{2} - \frac{K_{2}}{ϕ_{2}} s_{2}) - η_{2} Γ_{Δ_{2}} {\dot{γ}}_{Δ_{2}}^{T}

(67)

= s_{2} s_{3} + ϵ_{2} s_{2} - \frac{K_{2}}{ϕ_{2}} s_{2}^{2}

(68)

As

s_{3}

is initially arbitrary, we consider two cases:

(i): $|s_{3} (0)| < ϕ_{3}$; when $|s_{2}| > ϕ_{2}$, then $s_{2} {\dot{s}}_{2} < 0$, with $K_{2} = ϕ_{3} + c_{2}$, where $c_{2} > ϵ_{N_{2}}$.
(ii): $|s_{3} (0)| > ϕ_{3}$; substituting $K_{2}$ into (68), we obtain

{\dot{V}}_{2} = s_{2} (s_{3} + ϵ_{2} - (ϕ_{3} + c_{2}) \frac{s_{2}}{ϕ_{2}}) = s_{2} (s_{3} - \frac{ϕ_{3}}{ϕ_{2}} s_{2}) + s_{2} ϵ_{2} - \frac{c_{2}}{ϕ_{2}} s_{2}^{2}

(69)

Now, when

|s_{2}| \geq ϕ_{2}

, the following holds:

s_{2} {\dot{s}}_{2} < s_{2} (s_{3} - \frac{ϕ_{3}}{ϕ_{2}} s_{2}) - c_{2} \frac{s_{2}^{2}}{ϕ_{2}}

(70)

Now, dividing (70) by

| s_{2} |

and knowing that

\frac{s_{2}}{| s_{2} |} = \operatorname{sgn} (s_{2})

and

\operatorname{sgn} (s_{2}) {\dot{s}}_{2} = \frac{d}{d t} |s_{2}|

, (70) becomes

\frac{d}{d t} |s_{2}| < \operatorname{sgn} (s_{2}) s_{3} - \frac{ϕ_{3}}{ϕ_{2}} |s_{2}| - c_{2} \frac{|s_{2}|}{ϕ_{2}}

(71)

As

\frac{ϕ_{3}}{ϕ_{2}} > 0

and

s_{3}

is bounded, then

| s_{2} |

is bounded. When

|s_{3}| < ϕ_{3}

, then by the same reasoning as in case (i),

s_{2} {\dot{s}}_{2} < 0

is achieved when

|s_{2}| > ϕ_{2}

. Using the same approach and using

K_{1} = ϕ_{2} + c_{1}

with

c_{1} > ϵ_{N_{1}}

, the convergence of

s_{1}

may also be verified.

Hence, we have verified the asymptotic stability of the system in tracking the multirotor position with a suspended payload. □
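The argument above can be checked numerically on a single scalar surface. The sketch below is illustrative only: it assumes hypothetical surface dynamics of the form `s_dot = Gamma·rho + eps - (K/phi)·s`, a Gaussian regressor, made-up ideal weights, and a Lyapunov function with a factor of one half on both terms (so that its derivative matches the form of (62)). Under those assumptions, the adaptation law in the form of (59) cancels the weight-error term, and the surface should settle inside the boundary layer.

```python
import math

def rho(x, centers=(-1.0, 0.0, 1.0), width=1.0):
    """Gaussian basis vector (assumed form of the RBF regressor)."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]

def simulate(s0=1.0, K=1.0, phi=0.1, eta=2.0, eps_n=0.05, dt=1e-3, t_end=10.0):
    """Forward-Euler simulation of one sliding surface with weight adaptation."""
    gamma_star = [0.5, -0.3, 0.2]  # hypothetical ideal weights
    gamma = [0.0, 0.0, 0.0]        # initial weight estimate
    s, t = s0, 0.0
    history = []
    while t < t_end:
        r = rho(math.sin(t))                          # regressor along a test input
        Gamma = [gs - g for gs, g in zip(gamma_star, gamma)]
        eps = eps_n * math.sin(5.0 * t)               # bounded error, |eps| <= eps_n
        V = 0.5 * s * s + 0.5 * eta * sum(G * G for G in Gamma)
        history.append((t, s, V))
        # Assumed surface dynamics: s_dot = Gamma.rho + eps - (K / phi) * s
        s_dot = sum(G * ri for G, ri in zip(Gamma, r)) + eps - (K / phi) * s
        # Adaptation law in the form of (59): gamma_dot = (s / eta) * rho
        gamma = [g + dt * (s / eta) * ri for g, ri in zip(gamma, r)]
        s += dt * s_dot
        t += dt
    return history

phi = 0.1
history = simulate(phi=phi)
_, s_start, V_start = history[0]
_, s_final, V_final = history[-1]
print(f"|s|: {abs(s_start):.3f} -> {abs(s_final):.4f}, V: {V_start:.3f} -> {V_final:.3f}")
```

The gain `K = 1.0` is chosen larger than the error bound `eps_n = 0.05`, mirroring the condition on the gains in the proof; the specific numbers carry no meaning beyond this sketch.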

Since the RL is primarily being utilized to suppress the slung load oscillations, the system state for the RL algorithm now only requires the pendulum angles to be defined. These will initially be obtained from the MSL simulation before being provided to the MSSC, which will concurrently operate on the multirotor.

2.5. Deep Q-Network Reinforcement Learning

Reinforcement learning problems are, in general, formulated as a Markov Decision Process (MDP), which is described by the following terms:

O_{t}

is an observation,

a_{t}

is any action and

r_{t}

represents a reward.

s_{t}

represents the system state. As the agent observes the full system state in this case,

O_{t} = s_{t}

. Figure 2 shows the reinforcement learning flowchart: the agent first obtains the observation from the environment and then returns an action to it. Based on this action, the environment provides a reward or penalty to the agent, together with the next observation. In the MSL system, the agent’s action is the force applied to move the UAV to its desired position in the presence of the slung load.

The system state includes the UAV’s position and the slung load pendulum orientation. It is defined as

O_{t} = s_{t} = [θ_{x} \;\; {\dot{θ}}_{x} \;\; θ_{y} \;\; {\dot{θ}}_{y}]

(72)

The goal for this system is to bring the slung load to its normal position directly below the UAV’s centre of mass, with no oscillations. The goal is achieved when the pendulum angles, measured from the vertical, remain within the angular threshold while the UAV position also remains within its threshold in the XY plane. Initially, the pendulum is placed at a random starting angle. The episode terminates if the angular threshold is exceeded.

In the course of the interaction, the intelligent agent first determines the next state,

s_{t + 1}

, from the environment. In this case, that would be the multirotor’s position and the slung load’s oscillations. Next, it creates an action

a_{t}

—which in this case is the force applied to move the MSL—based on the current state, and finally earns the reward,

r_{t + 1}

, by executing that action. The intelligent agent’s ultimate objective is to maximize the cumulative reward over long-term operation, taking into account the rewards acquired during the interaction process. This cumulative reward is expressed as the discounted return

g_{t} = r_{t + 1} + γ (r_{t + 2} + γ r_{t + 3} + γ^{2} r_{t + 4} + \dots) = r_{t + 1} + γ g_{t + 1}

(73)

where

γ

represents the discount factor of the agent, which weights future rewards relative to immediate ones.
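The recursion on the right-hand side of (73) lends itself to a backward pass over a recorded episode. The following sketch is one way to compute it; the reward sequence and the value 0.9 for the weighting factor γ are illustrative assumptions, not values from this paper.

```python
def discounted_returns(rewards, gamma):
    """Compute g_t for every step using g_t = r_{t+1} + gamma * g_{t+1}.

    rewards[t] holds r_{t+1}, the reward received after the action at step t.
    """
    returns = [0.0] * len(rewards)
    g_next = 0.0  # the return after the final step is zero
    for t in reversed(range(len(rewards))):
        g_next = rewards[t] + gamma * g_next
        returns[t] = g_next
    return returns

# Hypothetical episode: three in-threshold steps and one penalized step.
rewards = [1.0, 1.0, -10.0, 1.0]
returns = discounted_returns(rewards, gamma=0.9)
print([round(g, 3) for g in returns])  # [-5.471, -7.19, -9.1, 1.0]
```

Working backwards avoids re-summing the tail of the series at every step, which is why the recursive form is usually preferred over the expanded sum.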

Reinforcement learning often employs the epsilon-greedy method [42] with a Q-table to guide actions based on states. The policy uses the Q-table as a lookup that relates states to actions. Every entry in the Q-table is a Q-value

Q (s_{j}, a_{i})

where

i = 1, 2, \dots, m

and

j = 1, 2, \dots, n

. The chosen action is determined based on the policy of the system in question. However, managing a large Q-table can be cumbersome, especially in complex scenarios like UAV control. To address this, Deep Q-Networks (DQNs) replace the Q-table with neural networks.
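A minimal sketch of epsilon-greedy selection over a Q-table is given below. The table contents, its size, and the function name `epsilon_greedy` are hypothetical and chosen only to illustrate the lookup; they are not taken from the authors' implementation.

```python
import random

def epsilon_greedy(q_table, state, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    q_values = q_table[state]
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                        # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])    # exploit

# Hypothetical Q-table: n = 2 states (rows) by m = 3 actions (columns).
q_table = [
    [0.1, 0.5, -0.2],
    [0.0, -1.0, 0.3],
]

greedy_actions = [epsilon_greedy(q_table, s, epsilon=0.0) for s in range(2)]
print(greedy_actions)  # [1, 2]
```

With `epsilon=0.0` the choice is purely greedy, while `epsilon=1.0` gives uniformly random actions; training schemes commonly start near the latter and decay towards the former.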

In a DQN, one network predicts Q-values (prediction model) and another estimates target Q-values (target model). These networks adjust their weights through gradient descent using the error between target and predicted Q-values. During training, the agent generates training data by interacting with the environment and storing experiences in an experience replay buffer. These experiences are used to update the networks iteratively.

By using neural networks instead of Q-tables, DQN efficiently handles large state–action spaces, making it suitable for complex tasks like UAV control.
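The target and prediction roles described above can be sketched as follows. This is a simplified illustration, assuming the standard DQN target (reward plus the discounted maximum target-network Q-value, with no bootstrap on terminal steps); the function `stub_q` merely stands in for both networks and is not a trained model, and the batch values are invented.

```python
def td_targets(batch, q_target, gamma):
    """Targets y = r + gamma * max_a' Q_target(s', a'); no bootstrap at terminal steps."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_target(next_state))
        targets.append(reward + bootstrap)
    return targets

def mean_squared_td_error(batch, q_pred, targets):
    """Mean squared difference between the targets and the predicted Q-values."""
    errors = [y - q_pred(state)[action] for (state, action, *_), y in zip(batch, targets)]
    return sum(e * e for e in errors) / len(errors)

def stub_q(state):
    """Hypothetical stand-in for a Q-network with three discrete actions."""
    total = sum(state)
    return [total, 0.0, -total]

# Each transition: (state, action, reward, next_state, done).
batch = [
    ((0.0, 0.0, 0.0, 0.0), 1, 1.0, (0.1, 0.0, 0.0, 0.0), False),
    ((0.02, 0.0, 0.0, 0.0), 0, -10.0, (0.03, 0.0, 0.0, 0.0), True),
]

targets = td_targets(batch, stub_q, gamma=0.9)
loss = mean_squared_td_error(batch, stub_q, targets)
print(targets, loss)
```

In an actual DQN the prediction network would be updated by gradient descent on this loss, and the target network would be a periodically refreshed copy of it.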

In this research, we will utilize the DQN agent to suppress the slung load oscillations, whose flow of control is as expressed in Figure 3. The DQN is used to suppress the pendulum’s oscillations by learning an optimal policy for controlling the pendulum through reinforcement learning. The key idea is to train a neural network to approximate the Q-function, which estimates the expected future rewards for each action given the current state of the system. With the initial environment and system state vector having been defined, a reward function is designed to encourage the pendulum to stay in its normal position, with oscillations suppressed. The reward function is defined as

r_{t} = \begin{cases} 1, & | θ | \leq 1^{\circ} \\ - 10, & | θ | > 1^{\circ} \end{cases}

(74)

This implies that the DQN agent is rewarded for suppressing the oscillation to within one degree, and penalized tenfold for failing to do so. Hence, the reward function restricts the oscillation angle to

1^{\circ}

, which is further suppressed by the MSSC if necessary.
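The piecewise reward in (74) translates directly into code. The sketch below assumes the angle is supplied in degrees and treats a single oscillation angle; how the two angles of the state vector are combined is not specified here, so that is left out.

```python
def reward(theta_deg, threshold_deg=1.0):
    """Reward of (74): +1 inside the angular threshold, -10 outside it."""
    return 1.0 if abs(theta_deg) <= threshold_deg else -10.0

print([reward(a) for a in (0.0, -0.8, 1.0, 1.2, -3.0)])  # [1.0, 1.0, 1.0, -10.0, -10.0]
```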

The network architecture consists of six layers. The Feature Input Layer is the state vector input layer, taking the pendulum (end effector) angle, its velocity, and the UAV position as inputs. ‘CriticStateFC1’ is a Fully Connected Layer with 24 neurons, while ‘CriticStateFC2’ is another Fully Connected Layer with 48 neurons. The activation function is a rectified linear unit (ReLU), represented by ‘CriticRelu1’, and there is an additional ReLU layer, ‘CriticCommonRelu’. A third Fully Connected Layer with three neurons, called ‘Output’, generates the network’s final output. This architecture is summarized in Table 1.
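To make the layer sizes concrete, the following sketch runs a forward pass through a network with 24 and 48 hidden neurons and three outputs. Several things are assumptions: the layer ordering (fully connected, ReLU, fully connected, ReLU, fully connected), the input size of four taken from the state vector in (72), and the random initial weights. It is written in plain Python for illustration and is not the MATLAB network used for the reported results.

```python
import random

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_critic(n_obs=4, n_actions=3, seed=0):
    """Three fully connected layers with 24, 48 and n_actions neurons."""
    rng = random.Random(seed)
    return [init_layer(n_obs, 24, rng), init_layer(24, 48, rng), init_layer(48, n_actions, rng)]

def q_values(observation, layers):
    """Forward pass: ReLU after each hidden layer, linear output layer."""
    hidden = observation
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    return dense(hidden, weights, biases)

layers = build_critic()
obs = [0.01, 0.0, -0.02, 0.0]  # hypothetical observation
q = q_values(obs, layers)
print(len(q), q.index(max(q)))  # number of Q-values and index of the greedy action
```

The three outputs are read here as one Q-value per discrete action, which is the usual interpretation for a DQN critic.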

A replay buffer is set up to store present values of the system state, helping to break the correlation between consecutive samples by sampling random mini-batches for training. The MSL must now be trained to have its payload oscillations suppressed. For this, it is provided with training episodes in which the pendulum has various starting positions in each episode. In these episodes, upon obtaining information from the MSL environment, using the epsilon greedy method mentioned above, an action with the highest likelihood of maintaining the maximum reward, based on the selected reward function, is applied to the pendulum, and the resultant oscillation angle and state vector, as well as the reward obtained, are stored in the replay buffer. Periodically, random mini-batches are sampled from this replay buffer, and the target Q-value is calculated using the reward and the highest Q-value from the next state of the target network. The training continues until the desired average reward is maintained, such that the pendulum remains within the predefined threshold regardless of its starting position. The trained policy agent can then be saved and used again without having to re-train a new agent each time that the MSL is required to move from one point to another. Once trained, the DQN agent-based real-time control need not necessarily be trained again unless it is subject to a disturbance of a significantly different nature.
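The replay buffer described above can be sketched with a bounded queue and uniform random sampling. The class name, capacity, and transition layout below are illustrative assumptions rather than details of the authors' implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random mini-batch sampling."""

    def __init__(self, capacity, seed=None):
        self._storage = deque(maxlen=capacity)  # oldest entries are dropped when full
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a mini-batch without replacement, breaking temporal correlation."""
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)

# Hypothetical usage: store eight transitions in a buffer that holds five.
buffer = ReplayBuffer(capacity=5, seed=0)
for step in range(8):
    angle = 0.1 * step
    buffer.add((angle, 0.0), step % 3, 1.0, (angle + 0.1, 0.0), False)

mini_batch = buffer.sample(3)
print(len(buffer), len(mini_batch))  # 5 3
```

Sampling uniformly from the stored transitions, rather than using them in the order they were collected, is what breaks the correlation between consecutive samples.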

2.6. Control Architecture

Here, to combine the two controllers, we use the RBFNN-MSSC to maintain the multirotor trajectory and the DQN-RL to train the slung load to suppress its oscillations through self-learning. The architecture of this combined control system is displayed in Figure 4. Initially, the state vector consists of the position and attitude of the multirotor, in addition to the slung load oscillation angles. To feed them into the RBFNN-MSSC, the oscillation angles must be linearized as in (12). These vectors depend upon the magnitude of the oscillation angles. As seen in (72), the DQN agent requires an observation vector, which takes these oscillation angles,

θ_{x}

and

θ_{y}

, as inputs. Once the desired reward as explained in (74) is achieved, the training process is complete. The oscillation angles have been suppressed to a certain limit and can then be fed back into the system control loop. The RBFNN-MSSC then ensures that the desired trajectory of the system is achieved, by minimizing the error from the reference signal, which indicates the ideal trajectory of the multirotor and the desired orientation of the slung load.

3. Results

To verify the effectiveness of the proposed control, we ran simulations in MATLAB R2022b. The parameters for the simulation are defined in Table 2. The sliding surface ratios are taken as 0.8, 1.0 and 1000 for

s_{1}

,

s_{2}

, and

s_{3}

, respectively. We train the system which is subject to an initial disturbance with a vector defined by

0.6 \times \mathrm{rand}(n) + 0.1

, where

\mathrm{rand}(n)

generates a number between 0 and 1.
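For reference, the disturbance expression maps uniform samples from the interval 0 to 1 onto the interval 0.1 to 0.7. The sketch below reproduces that mapping in Python; treating the result as a vector of length `n`, and the seed used, are assumptions for illustration.

```python
import random

def initial_disturbance(n, rng):
    """n samples of 0.6 * rand + 0.1, each lying between 0.1 and 0.7."""
    return [0.6 * rng.random() + 0.1 for _ in range(n)]

rng = random.Random(0)  # seed chosen only for repeatability
disturbance = initial_disturbance(4, rng)
print(disturbance)
```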

The training of the DQN agent yields the results shown in Figure 5. Each episode runs for up to 500 steps and terminates early if the threshold of

1^{\circ}

is exceeded after the angle has entered the bounds of

[- 1^{\circ}, 1^{\circ}]

. Upon achieving the desired reward, the agent can be saved and reused in future simulations; i.e., it need not be retrained each time a simulation is run. The resultant oscillation angle is then fed back into the MSSC control loop. To verify its effectiveness, we compare its performance against the case in which the MSSC, rather than the RL agent, also controls the slung load as part of the MSL.

The parameters used for the simulation are highlighted in Table 2. We first consider the case of the MSL control along a square path. The robustness of the trajectory tracking is highlighted in Figure 6, where the sliding surfaces ensure that the quadrotor remains within the bounds of the desired path. In tracking this square trajectory, there is a slight offset from the desired path each time that a turn is performed at the corner of the desired square path, as seen in Figure 7, but it is quickly able to get back on track along the linear path of movement. The magnitude of reduction in slung load oscillations is greater with the RL agent over the MSSC, as represented in Figure 8, with a lower peak and a slightly quicker convergence. In Figure 9, the control inputs indicate that the initial chattering effect brought about by the disturbances applied to the MSL is quickly suppressed by the RBFNN. Hence, the proposed control architecture is effective when the MSL traces a square path.

To further validate the proposed control, we next consider a butterfly path. Once again, the sliding surfaces in Figure 10 indicate that the MSL is kept within the bounds of the desired path, whose tracked trajectory is shown in Figure 11. The step response characteristics in Table 3 highlight the reduction in the transient time of the slung load pendulum when the DQN-based RL is used instead of the RBFNN-MSSC. A reduction in the undershoot represents a reduction in the lag of the pendulum behind the forward-moving trajectory of the multirotor. From the butterfly columns of Table 3, the peak is reduced by about 14% (from 1.83 to 1.58), while the peak time is reduced by about 63% (from 1.01 to 0.37). The trajectory is traced almost point for point, even in the presence of applied impulse disturbances. The effectiveness of the DQN-RL agent over the RBFNN-MSSC on its own is once again displayed in Figure 12, where the oscillation of the slung load is reduced. This reduction is further emphasized in Table 4, where the maximum and RMS oscillation angles along the butterfly path are reduced by about 30% when the RL agent is used instead of the MSSC. Finally, Figure 13 shows that adding the RBFNN to the MSSC eliminates the chattering effect even when the direction changes at every instant.
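The relative improvements can be recomputed directly from the tabulated values. The short sketch below does this for the butterfly-path entries of Tables 3 and 4; the numbers are copied from those tables, and only the helper function `percent_reduction` is introduced here.

```python
def percent_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline

# (MSSC, RL) pairs for the butterfly trajectory.
table3_butterfly = {
    "peak": (1.83, 1.58),
    "peak time": (1.01, 0.37),
    "transient time": (27.08, 22.6),
}
table4_butterfly = {
    "max oscillation": (1.43, 1.01),
    "RMS oscillation": (0.37, 0.26),
}

reductions = {
    name: percent_reduction(mssc, rl)
    for name, (mssc, rl) in {**table3_butterfly, **table4_butterfly}.items()
}
for name, value in reductions.items():
    print(f"{name}: {value:.1f}% lower with RL")
```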

4. Conclusions

In this paper, we proposed a robust adaptive controller in which a Multi-Surface Sliding Mode Controller was aided by a Radial Basis Function Neural Network to ensure that a multirotor carrying a suspended payload would have guaranteed asymptotic stability within certain bounds. To suppress the oscillations of the suspended payload, we combined the RBFNN-MSSC with a Deep Q-Network-based reinforcement learning agent, with a reward function that focuses on limiting the oscillations to a certain threshold. When the results of this control architecture were compared with those of a control system employing the RBFNN-MSSC on its own for the entire system, based on the oscillations of the suspended payload, the proposed architecture was found to be the more effective of the two. Lyapunov stability theory was used to verify the stability of the proposed control. Future work may include proposing an additional reward function for the multirotor to combine with that proposed for the suspended payload.

Author Contributions

Conceptualization, S.K. and M.N.; methodology, C.P.; validation, C.P., M.N. and S.K.; writing—original draft preparation, C.P.; writing—review and editing, M.N. and S.K.; supervision, M.N. and S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Nawaz, H.; Ali, H.M.; Massan, S. Applications of unmanned aerial vehicles: A review. Tecnol. Glosas Innovación Apl. Pyme. Spec. 2019, 2019, 85–105.
2. Emran, B.J.; Najjaran, H. A review of quadrotor: An underactuated mechanical system. Annu. Rev. Control. 2018, 46, 165–180.
3. Baraean, A.; Hamanah, W.M.; Bawazir, A.; Quama, M.M.; El Ferik, S.; Baraean, S.; Abido, M.A. Optimal Nonlinear backstepping controller design of a Quadrotor-Slung load system using particle Swarm Optimization. Alex. Eng. J. 2023, 68, 551–560.
4. Al-Dhaifallah, M.; Al-Qahtani, F.M.; Elferik, S.; Saif, A.-W.A. Quadrotor robust fractional-order sliding mode control in unmanned aerial vehicles for eliminating external disturbances. Aerospace 2023, 10, 665.
5. Manalathody, A.; Krishnan, K.S.; Subramanian, J.A.; Thangavel, S.; Thangeswaran, R.S.K. Non-linear Controller for a Drone with Slung Load. In Proceedings of the International Conference on Modern Research in Aerospace Engineering, Noida, India, 21–22 September 2023; pp. 219–228.
6. Li, B.; Li, Y.; Yang, P.; Zhu, X. Adaptive neural network-based fault-tolerant control for quadrotor-slung-load system under marine scene. IEEE Trans. Intell. Veh. 2023, 9, 681–691.
7. Wang, Z.; Qi, J.; Wu, C.; Wang, M.; Ping, Y.; Xin, J. Control of quadrotor slung load system based on double ADRC. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 6810–6815.
8. Ren, Y.; Zhao, Z.; Ahn, C.K.; Li, H.-X. Adaptive fuzzy control for an uncertain axially moving slung-load cable system of a hovering helicopter with actuator fault. IEEE Trans. Fuzzy Syst. 2022, 30, 4915–4925.
9. Gajbhiye, S.; Cabecinhas, D.; Silvestre, C.; Cunha, R. Geometric finite-time inner-outer loop trajectory tracking control strategy for quadrotor slung-load transportation. Nonlinear Dyn. 2022, 107, 2291–2308.
10. Tolba, M.; Shirinzadeh, B.; El-Bayoumi, G.; Mohamady, O. Adaptive optimal controller design for an unbalanced UAV with slung load. Auton. Robot. 2023, 47, 267–280.
11. Wang, Y.; Yu, G.; Xie, W.; Zhang, W.; Silvestre, C. UDE-based Robust Control of a Quadrotor-Slung-Load System. IEEE Robot. Autom. Lett. 2023, 8, 6851–6858.
12. Kabzan, J.; Hewing, L.; Liniger, A.; Zeilinger, M.N. Learning-based model predictive control for autonomous racing. IEEE Robot. Autom. Lett. 2019, 4, 3363–3370.
13. Bag, A.; Subudhi, B.; Ray, P.K. A combined reinforcement learning and sliding mode control scheme for grid integration of a PV system. CSEE J. Power Energy Syst. 2019, 5, 498–506.
14. Lee, D.; Lee, S.J.; Yim, S.C. Reinforcement learning-based adaptive PID controller for DPS. Ocean Eng. 2020, 216, 108053.
15. Rizvi, S.A.A.; Lin, Z. Reinforcement learning-based linear quadratic regulation of continuous-time systems using dynamic output feedback. IEEE Trans. Cybern. 2019, 50, 4670–4679.
16. Annaswamy, A.M. Adaptive control and intersections with reinforcement learning. Annu. Rev. Control Robot. Auton. Syst. 2023, 6, 65–93.
17. Du, B.; Lin, B.; Zhang, C.; Dong, B.; Zhang, W. Safe deep reinforcement learning-based adaptive control for USV interception mission. Ocean Eng. 2022, 246, 110477.
18. Wu, L.; Wang, C.; Zhang, P.; Wei, C. Deep reinforcement learning with corrective feedback for autonomous uav landing on a mobile platform. Drones 2022, 6, 238.
19. Liang, X.; Du, X.; Wang, G.; Han, Z. Deep reinforcement learning for traffic light control in vehicular networks. arXiv 2018, arXiv:1803.11115.
20. Ma, S.; Lee, J.; Serban, N.; Yang, S. Deep Attention Q-Network for Personalized Treatment Recommendation. In Proceedings of the 2023 IEEE International Conference on Data Mining Workshops (ICDMW), Shanghai, China, 4 December 2023; IEEE: Piscataway, NJ, USA; pp. 329–337.
21. Peng, B.; Sun, Q.; Li, S.E.; Kum, D.; Yin, Y.; Wei, J.; Gu, T. End-to-end autonomous driving through dueling double deep Q-network. Automot. Innov. 2021, 4, 328–337.
22. Fang, X.; Gong, G.; Li, G.; Chun, L.; Peng, P.; Li, W.; Shi, X.; Chen, X. Deep reinforcement learning optimal control strategy for temperature setpoint real-time reset in multi-zone building HVAC system. Appl. Therm. Eng. 2022, 212, 118552.
23. Kersandt, K.; Muñoz, G.; Barrado, C. Self-training by reinforcement learning for full-autonomous drones of the future. In Proceedings of the 2018 IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), London, UK, 23–27 September 2018; pp. 1–10.
24. Muñoz, G.; Barrado, C.; Çetin, E.; Salami, E. Deep reinforcement learning for drone delivery. Drones 2019, 3, 72.
25. Raja, G.; Baskar, Y.; Dhanasekaran, P.; Nawaz, R.; Yu, K. An efficient formation control mechanism for multi-UAV navigation in remote surveillance. In Proceedings of the 2021 IEEE Globecom Workshops (GC Wkshps), Madrid, Spain, 7–11 December 2021; pp. 1–6.
26. Özalp, R.; Varol, N.K.; Taşci, B.; Uçar, A. A review of deep reinforcement learning algorithms and comparative results on inverted pendulum system. In Machine Learning Paradigms. Learning and Analytics in Intelligent Systems; Springer: Cham, Switzerland, 2020; pp. 237–256.
27. Dang, K.N.; Van, L.V. Development of deep reinforcement learning for inverted pendulum. Int. J. Electr. Comput. Eng. 2023, 13, 3895–3902.
28. Li, X.; Liu, H.; Wang, X. Solve the inverted pendulum problem base on DQN algorithm. In Proceedings of the 2019 Chinese Control and Decision Conference (CCDC), Nanchang, China, 3–5 June 2019; pp. 5115–5120.
29. Huang, H.; Yang, Y.; Wang, H.; Ding, Z.; Sari, H.; Adachi, F. Deep reinforcement learning for UAV navigation through massive MIMO technique. IEEE Trans. Veh. Technol. 2019, 69, 1117–1121.
30. Wang, S.; Qi, N.; Jiang, H.; Xiao, M.; Liu, H.; Jia, L.; Zhao, D. Trajectory Planning for UAV-Assisted Data Collection in IoT Network: A Double Deep Q Network Approach. Electronics 2024, 13, 1592.
31. Hedrick, J.K.; Yip, P.P. Multiple sliding surface control: Theory and application. J. Dyn. Sys. Meas. Control 2000, 122, 586–593.
32. Thanh, H.L.N.N.; Hong, S.K. An extended multi-surface sliding control for matched/mismatched uncertain nonlinear systems through a lumped disturbance estimator. IEEE Access 2020, 8, 91468–91475.
33. Ullah, S.; Khan, Q.; Mehmood, A.; Bhatti, A.I. Robust backstepping sliding mode control design for a class of underactuated electro–mechanical nonlinear systems. J. Electr. Eng. Technol. 2020, 15, 1821–1828.
34. Qu, X.; Zeng, Z.; Wang, K.; Wang, S. Replacing urban trucks via ground–air cooperation. Commun. Transp. Res. 2022, 2, 100080.
35. Nyaaba, A.A.; Ayamga, M. Intricacies of medical drones in healthcare delivery: Implications for Africa. Technol. Soc. 2021, 66, 101624.
36. Rejeb, A.; Abdollahi, A.; Rejeb, K.; Treiblmaier, H. Drones in agriculture: A review and bibliometric analysis. Comput. Electron. Agric. 2022, 198, 107017.
37. Zheng, C.; Yan, Y.; Liu, Y. Prospects of eVTOL and modular flying cars in China urban settings. J. Intell. Connect. Veh. 2023, 6, 187–189.
38. Khoo, S.; Norton, M.; Kumar, J.J.; Yin, J.; Yu, X.; Macpherson, T.; Dowling, D.; Kouzani, A. Robust control of novel thrust vectored 3D printed multicopter. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017; pp. 1270–1275.
39. Peris, C.; Norton, M.; Khoo, S.Y. Variations in Finite-Time Multi-Surface Sliding Mode Control for Multirotor Unmanned Aerial Vehicle Payload Delivery with Pendulum Swinging Effects. Machines 2023, 11, 899.
40. Peris, C.; Norton, M.; Khoo, S.Y. Multi-surface Sliding Mode Control of a Thrust Vectored Quadcopter with a Suspended Double Pendulum Weight. In Proceedings of the IECON 2021–47th Annual Conference of the IEEE Industrial Electronics Society, Toronto, ON, Canada, 13–16 October 2021; pp. 1–6.
41. Peris, C.; Norton, M.; Khoo, S.Y. Adaptive Multi Surface Sliding Mode Control of a Quadrotor Slung Load System. In Proceedings of the IEEE 10th International Conference on Automation, Robotics and Application (ICARA 2024), Athens, Greece, 23 February 2024.
42. Kuang, N.L.; Leung, C.H. Performance effectiveness of multimedia information search using the epsilon-greedy algorithm. In Proceedings of the 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 929–936.

Figure 1. Model of a quadcopter with a suspended pendulum load.

Figure 2. General flow of an RL control system.

Figure 3. RL using the DQN method.

Figure 4. Framework of the RBFNN-MSSC with DQN-RL.

Figure 5. Training of the DQN agent for real-time control of the slung load.

Figure 6. Sliding surfaces for trajectory tracking along a square path.

Figure 7. Trajectory tracking of the multirotor by the RBFNN-MSSC.

Figure 8. Comparison of the slung load oscillation with MSSC and with RL along a square path.

Figure 9. Control inputs (rotor tilt angles) along the square path.

Figure 10. Sliding surfaces for trajectory tracking along a butterfly path.

Figure 11. Trajectory tracking of the multirotor by the RBFNN-MSSC along a butterfly path.

Figure 12. Comparison of the slung load oscillation with MSSC and with RL along the butterfly path.

Figure 13. Control inputs (rotor tilt angles) along the butterfly path.

Table 1. DQN agent layers.

Layer	Name	Definition
1	‘observation’	Feature Input
2	‘CriticStateFC1’	Fully Connected
3	‘CriticStateFC2’	Fully Connected
4	‘CriticRelu1’	Activation Function
5	‘CriticCommonRelu’	ReLU
6	‘Output’	Fully Connected

Table 2. Simulation parameters.

Parameters	Symbols	Values
Multirotor mass	M	4 kg
Slung load mass	m	0.5 kg
Slung load link length	l	1 m
Rotor speed for hover	ω_res	29,700 m/s
UAV moment of inertia	I	2.07 × 10⁻² kg·m²
Feedback control time step	T_c	0.005
Simulation run time	T	10 s
Sampling rate	T_s	0.01 s
Learning rate	γ	5 × 10⁻³

Table 3. Step response characteristics of MSSC vs. RL.

Parameter	Butterfly Trajectory (MSSC)	Butterfly Trajectory (RL)	Square Trajectory (MSSC)	Square Trajectory (RL)
Rise time	0.01	0.006	0.008	0.006
Transient time	27.08	22.6	25.35	22.21
Settling time	29.94	29.98	29.98	29.97
Settling min	−0.12	−0.12	−2.55	−3.26
Settling max	1.43	1.01	3.53	2.91
Overshoot	2.71	6.94	3.81	4.0
Undershoot	3.6	1.1	4.04	3.58
Peak	1.83	1.58	3.75	3.26
Peak time	1.01	0.37	1.02	0.38

Table 4. Quantitative data of slung load oscillations.

Parameter		Max	Mean	RMS
Butterfly Trajectory	MSSC	1.43	−0.01	0.37
Butterfly Trajectory	RL	1.01	−0.01	0.26
Square Trajectory	MSSC	3.53	−1.22 × 10⁻⁴	0.98
Square Trajectory	RL	2.92	−7.41 × 10⁻⁴	0.70

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
