Article

Adaptive Multi-Surface Sliding Mode Control with Radial Basis Function Neural Networks and Reinforcement Learning for Multirotor Slung Load Systems

School of Engineering, Deakin University, Geelong, VIC 3220, Australia
* Author to whom correspondence should be addressed.
Electronics 2024, 13(12), 2424; https://doi.org/10.3390/electronics13122424
Submission received: 15 May 2024 / Revised: 14 June 2024 / Accepted: 19 June 2024 / Published: 20 June 2024

Abstract

When multirotor UAVs are used to transport suspended payloads, the vehicle must track its desired path precisely and remain stable while avoiding excessive payload oscillations. However, due to the nonlinear and underactuated nature of the system, in addition to the presence of mismatched uncertainties, the development of a control system for this application poses an interesting research problem. This paper proposes a control architecture for a multirotor slung load system by integrating a Multi-Surface Sliding Mode Control, aided by a Radial Basis Function Neural Network, with a Deep Q-Network Reinforcement Learning agent. The former is used to ensure asymptotic tracking stability, while the latter is used to suppress payload oscillations. First, we present the dynamics of a multirotor slung load system, represented here as a quadrotor with a single pendulum load suspended from it. We then propose a control method in which a multi-surface sliding mode controller, based on an adaptive RBF Neural Network for trajectory tracking of the quadrotor, works in tandem with a Deep Q-Network Reinforcement Learning agent whose reward function aims to suppress the oscillations of the single pendulum slung load. Simulation results demonstrate the effectiveness and potential of the proposed approach in achieving precise and reliable control of multirotor slung load systems.

1. Introduction

1.1. Background

Unmanned Aerial Vehicles (UAVs) have been applied in a variety of fields [1]. One of the most well-known applications is the transport of suspended payloads, which can be in the form of parcel delivery, military rescue, surveillance systems, etc. UAVs are particularly beneficial for this purpose, owing to their versatility, adaptability, and efficiency. Slung load UAVs are capable of carrying payloads suspended beneath the aircraft, enabling the transportation of goods to remote or inaccessible areas with effective agility and speed. However, the dynamics of slung load systems introduce unique challenges that must be addressed to ensure safe and efficient operation. The combination of UAVs with slung loads poses an interesting control problem. A multirotor UAV is underactuated [2], in that it has more degrees of freedom than the number of control inputs available to us. The addition of this slung load adds further degrees of freedom to the system, only increasing the complexity of the problem. The slung load also has its own complexities such as load oscillations, aerodynamic disturbances, and dynamic instabilities, all of which significantly impact the UAV’s flight performance and the safety of the payload it transports.
In recent years, to handle this problem, various control methods have been sought by researchers. In [3], a backstepping technique was introduced to manage the dynamics of a quadrotor carrying a suspended load with limited actuators. The focus was on tackling the challenge of accurately following desired trajectories for the hanging payload. The research outlined in [4] utilized Fractional Order Sliding Mode Control (FOSMC) to improve the performance of a quadrotor tracking a predefined trajectory, adding to existing knowledge by developing a resilient FOSMC strategy for quadrotors to also handle external disturbances. In [5], an approach of Nonlinear Proportional–Integral–Derivative (NLPID) controllers for position response was combined with PID controllers to handle the attitude and swing response of a coupled UAV system. The authors of [6] explored an adaptive neural network-based fault-tolerant control method tailored for quadcopter slung load systems, looking to account for perturbations such as actuator faults, wind disturbances, and the presence of a suspended payload. A double active disturbance rejection control (ADRC) strategy was suggested in [7], in which the control system incorporated an extended state observer (ESO) in both the position and attitude loops to accurately estimate and counteract system disturbances. The authors of [8] studied an axially moving helicopter slung load system by employing a fuzzy logic system (FLS). A fuzzy control technique was utilized to compensate for system uncertainties and fault deviation vectors, after which an adaptive fuzzy control law was crafted within the introduced FLS framework. In [9], a geometric finite-time inner–outer-loop control approach was introduced for quadrotor slung load systems. This strategy involved a two-loop feedback control system. The inner-loop control law focused on achieving finite-time attitude tracking for the quadrotor, while the outer-loop controller stabilized the cable direction, aiming to ensure precise tracking of the payload trajectory. In [10], an Adaptive LQR controller was introduced, where the controller’s objective focused on regulating an arbitrary point, specifically the geometric centroid of the quadcopter, rather than that of the payload or the quadcopter’s center of gravity. In [11], a backstepping technique, accompanied by the introduction of an Uncertainty and Disturbance Estimator (UDE), was developed. The UDE was noted for its ability to transform the robust control challenge into a low-pass filter design in the frequency domain, thereby producing an estimation of lumped uncertainties.

1.2. Overview

In the case of multirotor UAVs with slung loads, most existing control methods consider the application of a robust control method, which primarily focuses on the trajectory tracking control of the multirotor, while using methods such as disturbance rejection or fault tolerance for any mismatched uncertainties, including those of slung load oscillations. Reinforcement learning methods are known to be effective in the stabilization of pendula attached to moving vehicles, although these have primarily focused on inverted or rotating pendula. With the benefits of robust trajectory tracking control, and the effectiveness of a self-learning reinforcement learning (RL) controller, there is potential to produce an efficient controller which can suppress payload oscillations for any given trajectory of the multirotor.
RL has previously been implemented with robust controllers, such as model predictive control for autonomous vehicles [12], sliding mode control for PV system grid integration [13], PID for a dynamic positioning system [14], LQR for continuous-time systems [15], and also some adaptive control methods [16,17]. In these cases, however, the RL techniques are applied to the robust controllers themselves, such that the control parameters (for instance, the P, I, and D parameters of a PID controller [18]) are adjusted by an RL algorithm based on feedback received from the system it is applied to. RL agents have rarely been deployed as independent controllers operating alongside robust control methods, and their application to slung load pendulum systems is still in its early stages.
Deep Q-Network (DQN) is an RL agent that combines the principles of reinforcement learning with deep neural networks, enabling the training of agents to make high-level decisions in environments with large state and action spaces. At its core, DQN is built upon the Q-learning algorithm, a classic reinforcement learning technique. The fundamental idea behind DQN is to approximate the optimal action-value function using a deep neural network. The use of deep neural networks allows DQN to handle complex and high-dimensional input spaces effectively [19,20,21,22]. DQN has also proven to be effective in applications involving drones, such as autonomous navigation [23], parcel delivery [24], and surveillance systems [25]. Some work implementing DQN agents for the control of inverted pendula presently exists [26,27,28], although no existing work has been carried out for slung load systems, or for multilink pendula. Some research has been carried out with UAVs [29,30], although as is the case with RL, this does not include slung load transport.
Multi-Surface Sliding Mode Control (MSSC) is a variation of sliding mode control which was suggested as a means of simplifying the development of control systems with complexity in model differentiation [31]. MSSC divides a system into several smaller subsystems based on their structural characteristics, with each subsystem having an independent sliding mode surface. In recent times, MSSC has emerged as a highly promising approach in control methodology. Unlike traditional sliding mode control, where a single sliding surface guides the system’s states along a desired trajectory, MSSC divides the control space into multiple sliding surfaces. This innovation addresses challenges such as uncertainties, disturbances, and nonlinearities that can lead to performance issues in conventional methods. Researchers have explored MSSC applications across a range of fields including robotics, aerospace, automotive, and power systems [32,33].
While a conventional MSSC is effective in maintaining effective control of the system, both in terms of trajectory tracking and suppression of oscillations, there is potential for it to be improved. The integration of RBF Neural Networks (RBFNNs) with MSSC enhances the robustness and adaptability of the control system, as RBFNNs are utilized to approximate and compensate for the uncertainties and nonlinearities in the MSL, while the MSSC can maintain the system’s trajectory. Furthermore, the use of DQN-based RL introduces an adaptive element to the control strategy, which enables the slung load pendulum to learn optimal control policies through interaction with the environment, thus improving performance over time, particularly in scenarios where the system dynamics may change or where there are varying operational conditions.
There is presently an attempt to shift industrial applications such as load delivery and transport to methods which are more effective and efficient in terms of time and labor. These include the replacement of urban trucks for such operations with drones [34], healthcare delivery [35], and agriculture [36], and utilizing modern advancements in aerial vehicles, such as eVTOLs, for intelligent transportation systems [37]. With the advent of these intelligent systems, a control method to support such systems in the presence of unexpected disturbances would prove to be very useful.
Keeping these factors in mind, in this work, we will combine the benefits of both the robust MSSC and the self-learning DQN-based RL methods. We will present a framework wherein an MSSC, aided by a Radial Basis Function Neural Network (RBFNN), will be used to guarantee asymptotic stability of the multirotor UAV, while the RL agent will be applied to the suppression of the slung load oscillations. The main contributions of this paper are as follows:
(i)
Investigation of an RBFNN-MSSC applied to a thrust-vectored multirotor for trajectory tracking purposes;
(ii)
Application of a DQN-based RL agent to a slung load pendulum;
(iii)
Comparison of the performance of the combined control system with an RBFNN-MSSC applied to the entire multirotor slung load (MSL) system, based on the slung load oscillations.

2. Methodology

2.1. Dynamic Modeling

In this section, we propose a dynamic model for our multirotor slung load system, cascading their dynamics into a unified system. We use two state vectors to describe this system: $q_{trans} = [x, y, z, \theta_x, \theta_y]$ represents the system’s translational motion, with $x$, $y$ and $z$ denoting the position and $\theta_x$ and $\theta_y$ denoting the slung load oscillation angles, and $q_{rot} = [\phi, \theta, \psi]$ represents its rotational motion. The overall state vector is $q = [q_{trans}, q_{rot}]$. We take $M$ to be the multirotor mass and $m$ to be the slung load mass.
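As a minimal illustration of this state layout, the sketch below assembles the two sub-vectors and the overall state in Python; the numerical entries are placeholders, and only the masses are taken from Table 2.

```python
import numpy as np

# Illustrative state-vector layout; numerical entries are placeholders.
q_trans = np.array([0.0, 0.0, 1.5, 0.02, -0.01])  # [x, y, z, theta_x, theta_y]
q_rot   = np.array([0.0, 0.0, 0.0])               # [phi, theta, psi]
q = np.concatenate([q_trans, q_rot])              # overall state vector

M, m = 4.0, 0.5                                   # multirotor and slung load masses (Table 2)
```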

2.1.1. Multirotor Dynamics

For this work, we will consider a quadcopter with 4 rotors as our multirotor. The characteristics of this system are highlighted in Figure 1. In the following equations, $R_B^A$ will denote the rotation of some frame B into another frame A. This procedure is well documented in the literature [38].
$$R_X(\theta) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & C_\theta & -S_\theta \\ 0 & S_\theta & C_\theta \end{bmatrix}, \quad R_Y(\theta) = \begin{bmatrix} C_\theta & 0 & S_\theta \\ 0 & 1 & 0 \\ -S_\theta & 0 & C_\theta \end{bmatrix}, \quad R_Z(\theta) = \begin{bmatrix} C_\theta & -S_\theta & 0 \\ S_\theta & C_\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Here, $C_\theta = \cos\theta$ and $S_\theta = \sin\theta$. These represent the general rotation matrices of the multirotor in terms of its roll, pitch, and yaw.
$$R_{P_i}^{b} = R_Z\!\left(\tfrac{(i-1)\pi}{2}\right) R_X(A_i)\, R_Y(B_i) = \begin{bmatrix} R_{P_i11}^{b} & R_{P_i12}^{b} & R_{P_i13}^{b} \\ R_{P_i21}^{b} & R_{P_i22}^{b} & R_{P_i23}^{b} \\ R_{P_i31}^{b} & R_{P_i32}^{b} & R_{P_i33}^{b} \end{bmatrix}$$
where
$$\begin{aligned}
R_{P_i11}^{b} &= C\!\left(\tfrac{(i-1)\pi}{2}\right)C(B_i) - S\!\left(\tfrac{(i-1)\pi}{2}\right)S(A_i)S(B_i) \\
R_{P_i12}^{b} &= -S\!\left(\tfrac{(i-1)\pi}{2}\right)C(A_i) \\
R_{P_i13}^{b} &= C\!\left(\tfrac{(i-1)\pi}{2}\right)S(B_i) + S\!\left(\tfrac{(i-1)\pi}{2}\right)S(A_i)C(B_i) \\
R_{P_i21}^{b} &= S\!\left(\tfrac{(i-1)\pi}{2}\right)C(B_i) + C\!\left(\tfrac{(i-1)\pi}{2}\right)S(A_i)S(B_i) \\
R_{P_i22}^{b} &= C\!\left(\tfrac{(i-1)\pi}{2}\right)C(A_i) \\
R_{P_i23}^{b} &= S\!\left(\tfrac{(i-1)\pi}{2}\right)S(B_i) - C\!\left(\tfrac{(i-1)\pi}{2}\right)S(A_i)C(B_i) \\
R_{P_i31}^{b} &= -C(A_i)S(B_i) \\
R_{P_i32}^{b} &= S(A_i) \\
R_{P_i33}^{b} &= C(A_i)C(B_i)
\end{aligned}$$
$$O_{P_i}^{b} = R_Z\!\left(\tfrac{(i-1)\pi}{2}\right)\begin{bmatrix} L \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} C\!\left(\tfrac{(i-1)\pi}{2}\right)L \\ S\!\left(\tfrac{(i-1)\pi}{2}\right)L \\ 0 \end{bmatrix}$$
where $i = 1, \ldots, n$ (for this quadrotor, $n = 4$), and $L$ is the arm length of the multirotor, measured from the body frame centre ($O_b$) to the center of propeller $i$ ($O_{P_i}^{b}$).
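For readers who prefer code, the short NumPy sketch below evaluates $R_{P_i}^{b}$ and $O_{P_i}^{b}$. It assumes the standard right-handed elemental rotations (the signs in the reconstructed matrices above) and an illustrative arm length; it is not the authors' simulation code.

```python
import numpy as np

def rot_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def propeller_frame(i, A_i, B_i, L=0.25):
    """Rotation R_Pi^b and offset O_Pi^b of rotor i (1-based), following the
    propeller-frame definitions above. A_i, B_i are the rotor tilt angles;
    the arm length L is an assumed value."""
    psi = (i - 1) * np.pi / 2
    R = rot_z(psi) @ rot_x(A_i) @ rot_y(B_i)
    O = rot_z(psi) @ np.array([L, 0.0, 0.0])
    return R, O

R1, O1 = propeller_frame(1, A_i=0.05, B_i=-0.02)
```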
The torque acting on any propeller $i$, denoted as $\tau_{p_i}$, is first obtained using Euler’s angular momentum theory:
$$\tau_{p_i} = I_{P_i}\dot{\omega}_{P_i} + \omega_{P_i} \times I_{P_i}\omega_{P_i} + \begin{bmatrix} 0 & 0 & k_c\,\omega_{P_iZ}\left|\omega_{P_iZ}\right| \end{bmatrix}^{T}$$
where $I_{P_i}$ represents each propeller’s inertia matrix, $\omega_{P_iZ}$ denotes the z-axis component of $\omega_{P_i}$, and $k_c > 0$ is the proportionality constant between $\omega_{P_iZ}$ and the counter-rotating torque about the $Z_{P_i}$ axis,
$$\omega_{P_i} = \left(R_{P_i}^{b}\right)^{-1}\omega_b + \begin{bmatrix} \dot{A}_i & \dot{B}_i & \hat{\omega}_i \end{bmatrix}^{T}$$
and $\hat{\omega}_i$ is the angular velocity of the $i$th propeller. If $T_{P_i}$ is the $i$th propeller’s produced thrust, we obtain
$$T_{P_i} = \begin{bmatrix} 0 & 0 & k_f\,\hat{\omega}_i\left|\hat{\omega}_i\right| \end{bmatrix}^{T}$$
where k f > 0 is a fixed proportionality constant. Applying the fundamental theorem of mechanics for the body frame, together with Euler’s angular momentum theory, we obtain
$$S_1:\quad M\begin{bmatrix} \ddot{X}_b \\ \ddot{Y}_b \\ \ddot{Z}_b \end{bmatrix} = R_W^{b}\begin{bmatrix} 0 \\ 0 \\ -mg \end{bmatrix} + \sum_{i=1}^{4} R_{P_i}^{b}T_{P_i} + F_D$$
$$S_2:\quad I_b\dot{\omega}_b = \sum_{i=1}^{4}\left(O_{P_i}^{b} \times R_{P_i}^{b}T_{P_i} - R_{P_i}^{b}\tau_{p_i}\right) - \omega_b \times I_b\omega_b$$
where $I_b$ is the inertia matrix of the multirotor body and $F_D$ represents the drag force.
Considering an $n$-propeller multicopter, where $i = 1, \ldots, n$ and $n \geq 2$, and using the above equations, we obtain
$$S_3:\quad M\begin{bmatrix} \ddot{X}_b \\ \ddot{Y}_b \\ \ddot{Z}_b \end{bmatrix} = R_W^{b}\begin{bmatrix} 0 \\ 0 \\ -mg \end{bmatrix} + \sum_{i=1}^{n} R_{P_i}^{b}T_{P_i} + F_D$$
$$S_4:\quad I_b\dot{\omega}_b = \sum_{i=1}^{n}\left(O_{P_i}^{b} \times R_{P_i}^{b}T_{P_i} - R_{P_i}^{b}\tau_{p_i}\right) - \omega_b \times I_b\omega_b$$

2.1.2. Slung Load Dynamics

Now, we will define the dynamics of the slung load model, which is represented in the form of a single pendulum. As we take the pendulum to be a point mass, its kinetic energy is purely translational. By considering the pendulum’s three-dimensional angle, we can obtain its exact location, while ignoring cable hoisting and keeping the cable length constant, essentially considering the cable to be rigid and inelastic. The slung load displacement is represented by $\theta_x$ and $\theta_y$, which denote the oscillations about the co-ordinate axes. To represent the linear position of the slung load in the co-ordinate axes, we define a vector $X_1$ such that
$$X_1 = l\begin{bmatrix} \sin(\theta_x)\cos(\theta_y) \\ \sin(\theta_x)\sin(\theta_y) \\ \cos(\theta_x) \end{bmatrix}$$
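A small helper evaluating (12) is sketched below; it simply maps the two oscillation angles to the Cartesian offset of the point-mass load, with the cable length taken from Table 2.

```python
import numpy as np

def slung_load_offset(theta_x, theta_y, l=1.0):
    """Cartesian offset X1 of the point-mass load for cable length l, following (12)."""
    return l * np.array([np.sin(theta_x) * np.cos(theta_y),
                         np.sin(theta_x) * np.sin(theta_y),
                         np.cos(theta_x)])

X1 = slung_load_offset(0.05, 0.02)  # small-oscillation example (angles in radians)
```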
As per the dynamic model defined in [39], the system’s dynamic equations are
$$(M+m)\ddot{x} - ml\cos(\theta_x)\cos(\theta_y)\ddot{\theta}_x + ml\sin(\theta_x)\sin(\theta_y)\ddot{\theta}_y - ml\cos(\theta_x)\sin(\theta_y)\dot{\theta}_y^2 - ml\sin(\theta_x)\cos(\theta_y)\dot{\theta}_x^2 = f_x(t)$$
$$(M+m)\ddot{y} - ml\cos(\theta_x)\sin(\theta_y)\ddot{\theta}_x + ml\sin(\theta_x)\cos(\theta_y)\ddot{\theta}_y + ml\cos(\theta_x)\cos(\theta_y)\dot{\theta}_x^2 - ml\sin(\theta_x)\sin(\theta_y)\dot{\theta}_y^2 = f_y(t)$$
$$(M+m)\ddot{z} + ml\sin(\theta_x)\ddot{\theta}_x + ml\sin(\theta_x)\dot{\theta}_x^2 + ml\sin(\theta_y)\ddot{\theta}_y + ml\sin(\theta_y)\dot{\theta}_y^2 + (M+m)g = f_z(t)$$
$$\left(ml^2\cos^2(\theta_x) + I_{xx}\right)\ddot{\theta}_x - ml\cos(\theta_x)\cos(\theta_y)\ddot{x} - ml\cos(\theta_x)\sin(\theta_y)\ddot{y} + ml\sin(\theta_x)\ddot{z} + ml^2\cos(\theta_y)\sin(\theta_y)\dot{\theta}_y^2 + mgl\sin(\theta_x) = 0$$
$$\left(ml^2 + I_{yy}\right)\ddot{\theta}_y + ml\sin(\theta_x)\sin(\theta_y)\ddot{x} + ml\sin(\theta_x)\cos(\theta_y)\ddot{y} + ml\sin(\theta_y)\ddot{z} - ml^2\sin(\theta_x)\cos(\theta_x)\dot{\theta}_y = 0$$
Here, we will use the θ x and θ y terms to train our DQN agent.
Defining $x_b = \begin{bmatrix} X_b & Y_b & Z_b \end{bmatrix}^{T}$, $T_{P_i} = \begin{bmatrix} 0 & 0 & k_c\,\omega_{P_iZ} \end{bmatrix}^{T}$ and obtaining the linear and angular accelerations from subsystems $S_3$ and $S_4$ results in
$$\ddot{x}_b = R_W^{b}\begin{bmatrix} 0 \\ 0 \\ -g \end{bmatrix} + \frac{1}{M}\sum_{i=1}^{4} R_{P_i}^{b}T_{P_i} + D_{x_b}$$
$$\dot{\omega}_b = I_b^{-1}\left[\sum_{i=1}^{4}\left(O_{P_i}^{b} \times R_{P_i}^{b}T_{p_i} - \frac{k_c}{k_f}R_{P_i}^{b}T_{p_i}\right) + D_\omega\right]$$
where
$$D_\omega = I_b^{-1}\left[\sum_{i=1}^{4} R_{P_i}^{b}\left(I_{P_i}\dot{\omega}_{p_i} + \omega_{P_i} \times I_{P_i}\omega_{P_i}\right) + \omega_b \times I_b\omega_b\right] + \Delta_\omega$$
$$D_{x_b} = \frac{1}{M+M_L}F_D + \Delta_{x_b}$$
where $\Delta_\omega$ and $\Delta_{x_b}$ represent any external uncertainties impacting the UAV’s rate of change in angular and linear momentum, respectively, and $x_b$ and $\omega_b$ represent the system’s linear and angular co-ordinates, respectively, in the body frame.
Differentiating $\ddot{x}_b$ and $\dot{\omega}_b$ with respect to time results in
$$\begin{aligned}
\dddot{x}_b &= \dot{R}_W^{b}\begin{bmatrix} 0 \\ 0 \\ -g \end{bmatrix} + \frac{1}{M+M_L}\sum_{i=1}^{n}\left(\frac{\partial R_{P_i}^{b}}{\partial A_i}T_{P_i}\dot{A}_i + \frac{\partial R_{P_i}^{b}}{\partial B_i}T_{P_i}\dot{B}_i + R_{P_i}^{b}\frac{\partial T_{P_i}}{\partial \hat{\omega}_i}\dot{\hat{\omega}}_i\right) + \dot{D}_{x_b} \\
&= \dot{R}_W^{b}\begin{bmatrix} 0 \\ 0 \\ -g \end{bmatrix} + F_{x\dot{A}}\dot{A} + F_{x\dot{B}}\dot{B} + F_{x\dot{\hat{\omega}}}\dot{\hat{\omega}} + \dot{D}_{x_b}
\end{aligned}$$
$$\ddot{\omega}_b = I_b^{-1}\sum_{i=1}^{n}\left[O_{P_i}^{b} \times \left(\frac{\partial R_{P_i}^{b}}{\partial A_i}T_{P_i}\dot{A}_i + \frac{\partial R_{P_i}^{b}}{\partial B_i}T_{P_i}\dot{B}_i + R_{P_i}^{b}\frac{\partial T_{P_i}}{\partial \hat{\omega}_i}\dot{\hat{\omega}}_i\right) - \frac{k_c}{k_f}\left(\frac{\partial R_{P_i}^{b}}{\partial A_i}T_{P_i}\dot{A}_i + \frac{\partial R_{P_i}^{b}}{\partial B_i}T_{P_i}\dot{B}_i + R_{P_i}^{b}\frac{\partial T_{P_i}}{\partial \hat{\omega}_i}\dot{\hat{\omega}}_i\right)\right] + \dot{D}_\omega = F_{\omega\dot{A}}\dot{A} + F_{\omega\dot{B}}\dot{B} + F_{\omega\dot{\hat{\omega}}}\dot{\hat{\omega}} + \dot{D}_\omega$$
where
$$\begin{aligned}
A &= \begin{bmatrix} A_1, A_2, \ldots, A_n \end{bmatrix}^{T}, & \dot{A} &= \begin{bmatrix} \dot{A}_1, \dot{A}_2, \ldots, \dot{A}_n \end{bmatrix}^{T} \\
B &= \begin{bmatrix} B_1, B_2, \ldots, B_n \end{bmatrix}^{T}, & \dot{B} &= \begin{bmatrix} \dot{B}_1, \dot{B}_2, \ldots, \dot{B}_n \end{bmatrix}^{T} \\
\hat{\omega} &= \begin{bmatrix} \hat{\omega}_1, \hat{\omega}_2, \ldots, \hat{\omega}_n \end{bmatrix}^{T}, & \dot{\hat{\omega}} &= \begin{bmatrix} \dot{\hat{\omega}}_1, \dot{\hat{\omega}}_2, \ldots, \dot{\hat{\omega}}_n \end{bmatrix}^{T}
\end{aligned}$$
$$F_{x\dot{A}} = \frac{1}{m}\begin{bmatrix} \frac{\partial R_{P_1}^{b}}{\partial A_1}T_{P_1} & \frac{\partial R_{P_2}^{b}}{\partial A_2}T_{P_2} & \cdots & \frac{\partial R_{P_n}^{b}}{\partial A_n}T_{P_n} \end{bmatrix},$$
$$F_{\omega\dot{A}} = I_b^{-1}\begin{bmatrix} O_{P_1}^{b} \times \frac{\partial R_{P_1}^{b}}{\partial A_1}T_{P_1} - \frac{k_c}{k_f}\frac{\partial R_{P_1}^{b}}{\partial A_1}T_{P_1} & O_{P_2}^{b} \times \frac{\partial R_{P_2}^{b}}{\partial A_2}T_{P_2} - \frac{k_c}{k_f}\frac{\partial R_{P_2}^{b}}{\partial A_2}T_{P_2} & \cdots & O_{P_n}^{b} \times \frac{\partial R_{P_n}^{b}}{\partial A_n}T_{P_n} - \frac{k_c}{k_f}\frac{\partial R_{P_n}^{b}}{\partial A_n}T_{P_n} \end{bmatrix}$$
It is understood that $F_{x\dot{B}}$ and $F_{x\dot{\hat{\omega}}}$ can be expressed similarly to $F_{x\dot{A}}$, while $F_{\omega\dot{B}}$ and $F_{\omega\dot{\hat{\omega}}}$ can be defined like $F_{\omega\dot{A}}$.
Taking $x_1 = \begin{bmatrix} x_b^T & X_1^T & \left(\int \omega_b\,dt\right)^T \end{bmatrix}^T$, $x_2 = \begin{bmatrix} \dot{x}_b^T & X_2^T & \omega_b^T \end{bmatrix}^T$ and $x_3 = \begin{bmatrix} \ddot{x}_b^T & X_3^T & \dot{\omega}_b^T \end{bmatrix}^T$, where $X_2 = \dot{X}_1$ and $X_3 = \dot{X}_2$, we can rearrange (24) as
$$\begin{aligned}
\dot{x}_1 &= x_2 + \delta_1 \\
\dot{x}_2 &= x_3 + \delta_2 \\
\dot{x}_3 &= \begin{bmatrix} \dot{R}_W^{b}\begin{bmatrix} 0 \\ 0 \\ -g \end{bmatrix} \\ 0 \end{bmatrix} + \begin{bmatrix} J_1 \\ J_2 \\ J_3 \end{bmatrix}U + \begin{bmatrix} \dot{D}_{x_B} \\ \dot{D}_\omega \end{bmatrix}
\end{aligned}$$
where
$$J_1 = J_2 = \begin{bmatrix} F_{x\dot{A}}(A, B, \hat{\omega}) & F_{x\dot{B}}(A, B, \hat{\omega}) & F_{x\dot{\hat{\omega}}}(A, B, \hat{\omega}) \end{bmatrix},$$
$$J_3 = \begin{bmatrix} F_{\omega\dot{A}}(A, B, \hat{\omega}) & F_{\omega\dot{B}}(A, B, \hat{\omega}) & F_{\omega\dot{\hat{\omega}}}(A, B, \hat{\omega}) \end{bmatrix}$$
which represent the Jacobian matrices multiplying the control law $U$ of the multirotor, and $\delta_1$, $\delta_2$ and $\delta_3 = \begin{bmatrix} \dot{D}_{x_B}^T & \dot{D}_\omega^T \end{bmatrix}^T$ represent undefined disturbances in the states of position, velocity, and acceleration, respectively.

2.2. Multi-Surface Sliding Mode Control

In this section, we derive the control for our MSL system. This is in the form of a Multi-Surface Sliding Mode Control, aided by a Radial Basis Function Neural Network. While MSSC has been effectively applied to the MSL system in question previously [40], the addition of an RBFNN to aid with system estimation yields some significant benefits. Its inclusion allows for the dynamic approximation of complex nonlinear functions and system uncertainties, which do not have a particular definition. Furthermore, RBFNNs provide an adaptive mechanism that continuously updates the control strategy based on real-time feedback from the system, ensuring that the controller adapts to changes in the system dynamics, maintaining optimal performance without the need for manual retuning.
For the co-ordinate system defined in (28), we first define the sliding surface variables for this system as
$$s_1 = x_1 - x_{1d}$$
$$s_2 = x_2 - x_{2d}$$
$$s_3 = x_3 - x_{3d}$$
where $x_{1d}$ is the desired position of the multirotor, which is assumed to be bounded together with $\dot{x}_{1d}$ and $\ddot{x}_{1d}$, and $x_{id}$, $i = 1, 2, 3$, is the virtual control at sliding surface $s_i$. Now, as the slung load oscillations are suppressed by the RL agent, $\theta_x$ and $\theta_y$ must be obtained from the RL simulation. While performing any maneuvers on the slung load pendulum, the RL reward function also takes the velocity, or the displacement of the slung load from its desired path, into account. The reward function governing the slung load will be defined later. First, we shall explain the derivation of MSSC using an RBF Neural Network (RBFNN).
Differentiating s 1 with respect to time, we acquire
$$\dot{s}_1 = \dot{x}_1 - \dot{x}_{1d} = s_2 + x_{2d} + \Delta_1 - \dot{x}_{1d}$$
where $\Delta_1 = \delta_1$.
Taking
$$x_{2d} = \dot{x}_{1d} - \frac{K_1}{\phi_1}s_1 - \hat{\Delta}_1$$
where $\hat{\Delta}_1$ is the approximation of $\Delta_1$, $K_1 > 0$ and $\phi_1 > 0$. Substituting (35) into (33) results in
$$\dot{s}_1 = s_2 + \Delta_1 - \hat{\Delta}_1 - \frac{K_1}{\phi_1}s_1$$
Differentiating (35) with respect to time results in
$$\dot{x}_{2d} = \ddot{x}_{1d} - \dot{\hat{\Delta}}_1 - \frac{K_1}{\phi_1}\left(x_2 + \Delta_1 - \dot{x}_{1d}\right)$$
Now, $\dot{x}_{2d}$ is split into $\dot{x}_{2d} = \hat{\dot{x}}_{2d} + \tilde{\dot{x}}_{2d}$, where $\hat{\dot{x}}_{2d}$ consists of the known portions of $\dot{x}_{2d}$:
$$\hat{\dot{x}}_{2d} = \ddot{x}_{1d} - \frac{K_1}{\phi_1}\left(x_2 - \dot{x}_{1d}\right)$$
$$\tilde{\dot{x}}_{2d} = -\dot{\hat{\Delta}}_1 - \frac{K_1}{\phi_1}\Delta_1$$
Now, differentiating s 2 with respect to time, we acquire
$$\dot{s}_2 = \dot{x}_2 - \dot{x}_{2d} = s_3 + x_{3d} + \Delta_2 - \hat{\dot{x}}_{2d}$$
where $\Delta_2 = \delta_2 - \tilde{\dot{x}}_{2d}$. Then, $x_{3d}$ is chosen as
$$x_{3d} = \hat{\dot{x}}_{2d} - \frac{K_2}{\phi_2}s_2 - \hat{\Delta}_2$$
where $\hat{\Delta}_2$ is an estimation of $\Delta_2$, $K_2 > 0$ and $\phi_2 > 0$.
Substituting (41) into (40), we obtain
$$\dot{s}_2 = s_3 + \Delta_2 - \hat{\Delta}_2 - \frac{K_2}{\phi_2}s_2$$
Differentiating $x_{3d}$ with respect to time yields
$$\dot{x}_{3d} = \hat{\dot{x}}_{3d} + \tilde{\dot{x}}_{3d}$$
where
$$\hat{\dot{x}}_{3d} = \dddot{x}_{1d} - \frac{K_1}{\phi_1}\left(x_3 - \ddot{x}_{1d}\right) - \frac{K_2}{\phi_2}\left(x_3 - \hat{\dot{x}}_{2d}\right)$$
$$\tilde{\dot{x}}_{3d} = -\dot{\hat{\Delta}}_2 - \frac{K_1}{\phi_1}\delta_2 - \frac{K_2}{\phi_2}\Delta_2$$
Finally, differentiating s 3 with respect to time, we acquire
$$\dot{s}_3 = \dot{x}_3 - \dot{x}_{3d} = Q + JU + \delta_3 - \hat{\dot{x}}_{3d} - \tilde{\dot{x}}_{3d}$$
The control law is thus designed as
$$U = J^{-1}\left(\hat{\dot{x}}_{3d} - Q - \hat{\Delta}_3 - \frac{K_3}{\phi_3}s_3\right)$$
where $\hat{\Delta}_3$ is an estimate of $\Delta_3 = \delta_3 - \tilde{\dot{x}}_{3d}$, $K_3 > 0$, and $\phi_3 > 0$, and $J = \begin{bmatrix} J_1 & J_2 & J_3 \end{bmatrix}$ as defined in (29) and (30).
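To make the structure of the controller concrete, the sketch below evaluates the virtual controls and the final control law for a single scalar channel in Python. The gains reuse the sliding-surface ratios quoted in Section 3, while the boundary-layer widths, reference derivatives, and the RBFNN estimates d1_hat, d2_hat, d3_hat are placeholders, so this is an illustrative outline rather than the authors' implementation.

```python
def mssc_control(x1, x2, x3,
                 x1d, x1d_dot, x1d_ddot, x1d_dddot,
                 d1_hat, d2_hat, d3_hat, Q, J,
                 K=(0.8, 1.0, 1000.0), phi=(0.1, 0.1, 0.1)):
    """Single-channel multi-surface control: virtual controls (35), (41) and input (47).
    d1_hat..d3_hat are the RBFNN disturbance estimates; phi are assumed boundary layers."""
    K1, K2, K3 = K
    p1, p2, p3 = phi

    s1 = x1 - x1d
    x2d = x1d_dot - (K1 / p1) * s1 - d1_hat                 # virtual control (35)

    s2 = x2 - x2d
    x2d_dot_hat = x1d_ddot - (K1 / p1) * (x2 - x1d_dot)     # known part of d(x2d)/dt
    x3d = x2d_dot_hat - (K2 / p2) * s2 - d2_hat             # virtual control (41)

    s3 = x3 - x3d
    x3d_dot_hat = (x1d_dddot - (K1 / p1) * (x3 - x1d_ddot)
                   - (K2 / p2) * (x3 - x2d_dot_hat))
    u = (x3d_dot_hat - Q - d3_hat - (K3 / p3) * s3) / J     # control law (47), scalar J
    return u, (s1, s2, s3)

# Example call with zero disturbances and a unit-step position reference:
u, surfaces = mssc_control(0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
                           d1_hat=0.0, d2_hat=0.0, d3_hat=0.0, Q=0.0, J=1.0)
```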

2.3. Neural Network Approximation

Neural networks are typically used to approximate functions. Here, we will use RBF networks to estimate the unknown disturbances $\Delta_i$, $i = 1, 2, 3$, similar to [41]. An RBF Neural Network with $M$ output dimensions, $F(\zeta): \mathbb{R}^N \to \mathbb{R}^M$, can be presented by
$$\hat{F}(\zeta) = \begin{bmatrix} f_1, f_2, \ldots, f_M \end{bmatrix}^T$$
$$f_i = \sum_{j=1}^{L} \gamma_{ij}\rho_j, \quad i = 1, 2, \ldots, M$$
$$\rho_j = e^{-\frac{(\zeta - \mu_j)^T(\zeta - \mu_j)}{\psi_j^2}}$$
where $L$ is the number of hidden nodes, $\gamma_{ij}$ signifies the weight which connects the $j$th hidden node to the $i$th output node, $\mu_j$ denotes the $j$th central vector, $\rho_j$ is the $j$th Gaussian function, and $\psi_j$ represents the width of the $j$th Gaussian function, where $0 < \psi_j < 1$.
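A minimal NumPy sketch of this approximator is shown below; the centers, widths, and weights are randomly chosen placeholders purely for illustration.

```python
import numpy as np

def rbf_estimate(zeta, centers, widths, weights):
    """Evaluate the RBF approximation: Gaussian kernels rho_j followed by the
    weighted sum f_i = sum_j gamma_ij * rho_j.
    Shapes: zeta (n,), centers (L, n), widths (L,), weights (L, M)."""
    diff = zeta - centers                               # (L, n)
    rho = np.exp(-np.sum(diff**2, axis=1) / widths**2)  # (L,) Gaussian activations
    return weights.T @ rho                              # (M,) network output

# Toy configuration: 5 hidden nodes, 3-dimensional input, scalar output.
rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(5, 3))
widths = np.full(5, 0.5)
weights = rng.normal(size=(5, 1))
delta_hat = rbf_estimate(np.zeros(3), centers, widths, weights)
```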
To estimate an $M$-dimensional continuous function $F(\zeta)$ using an RBF network, (49) is rearranged as
$$F(\zeta) = \hat{F}(\zeta) + \epsilon = \gamma^{*T}\rho + \epsilon$$
where
$$\rho = \begin{bmatrix} \rho_1, \rho_2, \ldots, \rho_L \end{bmatrix}^T$$
$$\gamma^{*T} = \begin{bmatrix} \gamma_{11} & \gamma_{12} & \cdots & \gamma_{1L} \\ \gamma_{21} & \gamma_{22} & \cdots & \gamma_{2L} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{M1} & \gamma_{M2} & \cdots & \gamma_{ML} \end{bmatrix}$$
$$\epsilon = \begin{bmatrix} \epsilon_1, \epsilon_2, \ldots, \epsilon_M \end{bmatrix}^T$$
with $\gamma^{*}$ representing the desired weight matrix and $\epsilon$ denoting an error vector obtained when the number of nodes being utilized is known beforehand. The unknown uncertainties $\Delta_i$, $i = 1, 2, 3$, are estimated by utilizing an RBF network as in (51), where
$$\hat{\Delta}_i\left(X, \gamma_{\Delta_i}\right) = \gamma_{\Delta_i}^{T}\rho_i$$
with $X = (x_1, x_2, x_3)$. The ideal weight matrix $\gamma_{\Delta_i}^{*}$ can be determined using the RBF network as follows:
$$\gamma_{\Delta_i}^{*} = \arg\min_{\gamma_{\Delta_i}}\left[\sup_{X \in \Omega}\left|\Delta_i - \hat{\Delta}_i\right|\right]$$
where $\Omega \subset \mathbb{R}^n$ represents the region in which $X$ exists. We make the following assumptions in the deduction that follows:
Assumption 1: For $i = 1, 2, 3$, the ideal weight matrix $\gamma_{\Delta_i}^{*}$ is within the bounds:
$$\gamma_{\Delta_i}^{min} \leq \gamma_{\Delta_i}^{*} \leq \gamma_{\Delta_i}^{max}$$
Assumption 2: For $i = 1, 2, 3$, the ideal estimation error $\epsilon_i$ is within the bounds:
$$\left|\epsilon_i\right| \leq \epsilon_{N_i}$$
where $\epsilon_{N_i} > 0$; i.e., the error and the uncertainties lie within certain limits of the system’s behavior, to ensure that the system state is able to reach the sliding surface from any initial condition.

2.4. RBFNN-Based MSSC

Now, a robust controller is designed, aiming to control a quadrotor UAV subjected to unknown disturbances. In this regard, the multi-surface sliding controller represents the base control, while the disturbances will be approximated by the RBFNN.
Theorem 1: 
For the multirotor slung load system, considering (57) and (58), if $\epsilon_{N_j} \geq 0$ for $j = 1, 2, 3$, then with the virtual controllers in (35) and (41), and the control law in (47), in addition to the adaptation law defined by
$$\dot{\gamma}_{\Delta_j}^{T} = \frac{s_j}{\eta_j}\rho_j$$
where $\eta_j > 0$, the system trajectories will converge to the desired trajectory with a steady-state error bounded by $\left|\phi_1\right|$.
Proof: 
Beginning with the surface $s_3$, the Lyapunov function is taken as
$$V_3 = \frac{1}{2}s_3^2 + \eta_3\Gamma_{\Delta_3}\Gamma_{\Delta_3}^{T}$$
where
$$\Gamma_{\Delta_3} = \gamma_{\Delta_3}^{*} - \gamma_{\Delta_3}$$
Differentiating $V_3$ with respect to time, we obtain
$$\dot{V}_3 = s_3\dot{s}_3 - \eta_3\Gamma_{\Delta_3}\dot{\gamma}_{\Delta_3}^{T}$$
Substituting Equations (36), (42), (46), and (47) into (62), we acquire
$$\dot{V}_3 = s_3\left(\Delta_3 - \hat{\Delta}_3 + \epsilon_3 - \frac{K_3}{\phi_3}s_3\right) - \eta_3\Gamma_{\Delta_3}\dot{\gamma}_{\Delta_3}^{T}$$
Substituting (59) and (61), we obtain
$$\dot{V}_3 = s_3\epsilon_3 - \frac{K_3}{\phi_3}s_3^2$$
When $\left|s_3\right| > \phi_3$,
$$\dot{V}_3 < -\frac{K_3 - \epsilon_{N_3}}{\phi_3}s_3^2$$
where $K_3 > \epsilon_{N_3}$. Note that once $\left|s_3\right| \leq \phi_3$ is achieved, $s_3$ will always remain within the boundary layers. Next, the second surface $s_2$ is studied. The following Lyapunov function is taken:
$$V_2 = \frac{1}{2}s_2^2 + \eta_2\Gamma_{\Delta_2}\Gamma_{\Delta_2}^{T}$$
By differentiating $V_2$ with respect to time, we obtain
$$\dot{V}_2 = s_2\dot{s}_2 - \eta_2\Gamma_{\Delta_2}\dot{\gamma}_{\Delta_2}^{T} = s_2\left(s_3 + \Delta_2 - \hat{\Delta}_2 + \epsilon_2 - \frac{K_2}{\phi_2}s_2\right) - \eta_2\Gamma_{\Delta_2}\dot{\gamma}_{\Delta_2}^{T}$$
$$= s_2 s_3 + \epsilon_2 s_2 - \frac{K_2}{\phi_2}s_2^2$$
As s 3 is initially arbitrary, we consider two cases:
(i)
$\left|s_3(0)\right| < \phi_3$; when $\left|s_2\right| > \phi_2$, then $s_2\dot{s}_2 < 0$, with $K_2 = \phi_3 + c_2$, where $c_2 > \epsilon_{N_2}$.
(ii)
$\left|s_3(0)\right| > \phi_3$; substituting $K_2$ into (65), we obtain
$$\dot{V}_2 = s_2\left(s_3 + \epsilon_2 - \frac{\phi_3 + c_2}{\phi_2}s_2\right) = s_2 s_3 - \frac{\phi_3}{\phi_2}s_2^2 + s_2\epsilon_2 - \frac{c_2}{\phi_2}s_2^2$$
Now, when $\left|s_2\right| \geq \phi_2$, the following holds:
$$s_2\dot{s}_2 < s_2 s_3 - \frac{\phi_3}{\phi_2}s_2^2 - \frac{c_2}{\phi_2}s_2^2$$
Now, dividing (69) by $\left|s_2\right|$ and knowing that $\frac{s_2}{\left|s_2\right|} = \mathrm{sgn}(s_2)$ and $\mathrm{sgn}(s_2)\dot{s}_2 = \frac{d}{dt}\left|s_2\right|$, (69) becomes
$$\frac{d}{dt}\left|s_2\right| < \mathrm{sgn}(s_2)s_3 - \frac{\phi_3}{\phi_2}\left|s_2\right| - \frac{c_2}{\phi_2}\left|s_2\right|$$
As $\frac{\phi_3}{\phi_2} > 0$ and $s_3$ is bounded, $\left|s_2\right|$ is bounded. When $\left|s_3\right| < \phi_3$, by the same consideration as in case (i), $s_2\dot{s}_2 < 0$ is achieved when $\left|s_2\right| > \phi_2$. Using the same approach and taking $K_1 = \phi_2 + c_1$ with $c_1 > \epsilon_{N_1}$, the convergence of $s_1$ may also be verified.
Hence, we have verified the asymptotic stability of the system in tracking the multirotor position with a suspended payload. □
Since the RL is primarily being utilized to suppress the slung load oscillations, the system state for the RL algorithm now only requires the pendulum angles to be defined. These will initially be obtained from the MSL simulation before being provided to the MSSC, which will concurrently operate on the multirotor.

2.5. Deep Q-Network Reinforcement Learning

Reinforcement learning algorithms are generally formulated as a Markov Decision Process, which includes a few terms to understand: $O_t$ is an observation, $a_t$ is an action, $r_t$ represents a reward, and $s_t$ represents the system state. As the environment is fully observable in this case, $O_t = s_t$. Figure 2 represents the reinforcement learning flowchart; the observation is first obtained from the environment by the agent, which then returns an action to the environment. Based on this action, the environment provides a reward or penalty to the agent in addition to the observation. In the MSL system, the agent action is the force applied in moving the UAV to its desired position in the presence of the slung load.
The system state for the RL agent comprises the slung load pendulum orientation and its rate. It is defined as
$$O_t = s_t = \begin{bmatrix} \theta_x & \dot{\theta}_x & \theta_y & \dot{\theta}_y \end{bmatrix}$$
The goal for this system is to bring the slung load to the normal position directly below the UAV’s centre of mass, such that it has no oscillations. The goal is achieved when the pendulum load remains within the angular threshold about the vertical, while the UAV position also remains within a threshold in the XY plane. Initially, the pendulum is placed at a random starting angle. The episode terminates if the angular threshold is exceeded.
In the course of the interaction, the intelligent agent first determines the next state, $s_{t+1}$, from the environment. In this case, that would be the multirotor’s position and the slung load’s oscillations. Next, it creates an action $a_t$, which in this case is the force applied to move the MSL, based on the current state, and finally earns the reward, $r_{t+1}$, by executing the current action. The intelligent agent’s ultimate objective is to maximize the cumulative reward over the course of long-term operation, taking into account the benefits acquired during the interaction process.
$$g_t = r_{t+1} + \gamma\left(r_{t+2} + \gamma r_{t+3} + \gamma^2 r_{t+4} + \cdots\right) = r_{t+1} + \gamma g_{t+1}$$
where $\gamma \in [0, 1]$ is the discount factor of the agent.
Reinforcement learning often employs the epsilon greedy method [42] with a Q-table to guide actions based on states. A Q-table is used by the policy to find references representing the relationship between states and actions. Every term in the Q-table represents a Q-value $Q(s_j, a_i)$, where $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$. The chosen action is determined based on the policy of the system in question. However, managing a large Q-table can be cumbersome, especially in complex scenarios like UAV control. To address this, Deep Q-Networks (DQNs) replace the Q-table with neural networks.
In a DQN, one network predicts Q-values (prediction model) and another estimates target Q-values (target model). These networks adjust their weights through gradient descent using the error between target and predicted Q-values. During training, the agent generates training data by interacting with the environment and storing experiences in an experience replay buffer. These experiences are used to update the networks iteratively.
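Written out, the update described here takes the standard DQN form (notation ours, not taken from the paper): the target network supplies a bootstrapped value and the prediction network is regressed towards it,
$$y_t = r_{t+1} + \gamma \max_{a'} Q_{\text{target}}\left(s_{t+1}, a'\right), \qquad \mathcal{L}(\theta) = \mathbb{E}\left[\left(y_t - Q_{\theta}\left(s_t, a_t\right)\right)^2\right]$$
where $\theta$ denotes the prediction-network weights and the expectation is taken over mini-batches drawn from the replay buffer.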
By using neural networks instead of Q-tables, DQN efficiently handles large state–action spaces, making it suitable for complex tasks like UAV control.
In this research, we will utilize the DQN agent to suppress the slung load oscillations, whose flow of control is as expressed in Figure 3. The DQN is used to suppress the pendulum’s oscillations by learning an optimal policy for controlling the pendulum through reinforcement learning. The key idea is to train a neural network to approximate the Q-function, which estimates the expected future rewards for each action given the current state of the system. With the initial environment and system state vector having been defined, a reward function is designed to encourage the pendulum to stay in its normal position, with oscillations suppressed. The reward function is defined as
$$r_t = \begin{cases} 1, & \left|\theta\right| \leq 1^\circ \\ -10, & \left|\theta\right| > 1^\circ \end{cases}$$
This implies that the DQN agent is rewarded for suppressing the oscillation to within one degree, and penalized tenfold for failing to do so. Hence, the reward function restricts the oscillation angle to $1^\circ$, which is further suppressed by the MSSC if necessary.
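In code, this reward is only a few lines; applying the same threshold to both oscillation angles is our assumption, since the paper states the condition for a single angle $\theta$.

```python
def slung_load_reward(theta_x_deg, theta_y_deg):
    """+1 while both oscillation angles stay within 1 degree, -10 otherwise (reward (74))."""
    within = max(abs(theta_x_deg), abs(theta_y_deg)) <= 1.0
    return 1.0 if within else -10.0
```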
The network architecture consists of 6 layers; the Feature Input Layer represents the state vector input layer, taking the slung load oscillation angle, its velocity, and the UAV position as inputs. ‘CriticStateFC1’ is a Fully Connected Layer with 24 neurons, while ‘CriticStateFC2’ is another Fully Connected Layer with 48 neurons. The activation function is a rectified linear unit (ReLU), represented by ‘CriticRelu1’. There is also an additional ReLU layer, ‘CriticCommonRelu’. A third fully connected layer with three neurons, ‘Output’, generates the network’s final output. This architecture is summarized in Table 1.
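A PyTorch sketch of a critic with this layer list is given below. The layer ordering, the 4-dimensional observation from (72), and the interpretation of the three output neurons as Q-values for three discrete actions are our assumptions; the paper's implementation is in MATLAB.

```python
import torch
import torch.nn as nn

class CriticNetwork(nn.Module):
    """Q-network mirroring the layer list of Table 1 (assumed ordering and sizes)."""
    def __init__(self, n_obs: int = 4, n_actions: int = 3):
        super().__init__()
        self.critic_state_fc1 = nn.Linear(n_obs, 24)   # 'CriticStateFC1'
        self.critic_relu1 = nn.ReLU()                  # 'CriticRelu1'
        self.critic_state_fc2 = nn.Linear(24, 48)      # 'CriticStateFC2'
        self.critic_common_relu = nn.ReLU()            # 'CriticCommonRelu'
        self.output = nn.Linear(48, n_actions)         # 'Output': one Q-value per action

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        x = self.critic_relu1(self.critic_state_fc1(obs))
        x = self.critic_common_relu(self.critic_state_fc2(x))
        return self.output(x)

q_values = CriticNetwork()(torch.zeros(1, 4))  # -> tensor of shape (1, 3)
```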
A replay buffer is set up to store present values of the system state, helping to break the correlation between consecutive samples by sampling random mini-batches for training. The MSL must now be trained to have its payload oscillations suppressed. For this, it is provided with training episodes in which the pendulum has various starting positions in each episode. In these episodes, upon obtaining information from the MSL environment, using the epsilon greedy method mentioned above, an action with the highest likelihood of maintaining the maximum reward, based on the selected reward function, is applied to the pendulum, and the resultant oscillation angle and state vector, as well as the reward obtained, are stored in the replay buffer. Periodically, random mini-batches are sampled from this replay buffer, and the target Q-value is calculated using the reward and the highest Q-value from the next state of the target network. The training continues until the desired average reward is maintained, such that the pendulum remains within the predefined threshold regardless of its starting position. The trained policy agent can then be saved and used again without having to re-train a new agent each time that the MSL is required to move from one point to another. Once trained, the DQN agent-based real-time control need not necessarily be trained again unless it is subject to a disturbance of a significantly different nature.
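The training procedure described in this paragraph, i.e., epsilon-greedy action selection, an experience replay buffer sampled in mini-batches, and a periodically synchronized target network, is sketched below in PyTorch. The environment step is a toy stand-in for the MSL simulation, and all hyperparameters are illustrative assumptions rather than the values used in the paper.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

N_OBS, N_ACT = 4, 3              # observation (72); three discrete force levels (assumed)
GAMMA, EPS, BATCH = 0.99, 0.1, 64

def make_net():
    # Same layout as the critic of Table 1: 4 -> 24 -> ReLU -> 48 -> ReLU -> 3
    return nn.Sequential(nn.Linear(N_OBS, 24), nn.ReLU(),
                         nn.Linear(24, 48), nn.ReLU(),
                         nn.Linear(48, N_ACT))

q_net, target_net = make_net(), make_net()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=5e-3)
replay = deque(maxlen=50_000)                       # experience replay buffer

def reward(angle_deg):
    return 1.0 if abs(angle_deg) <= 1.0 else -10.0  # reward (74)

def epsilon_greedy(obs):
    if random.random() < EPS:
        return random.randrange(N_ACT)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(obs)).argmax())

def toy_step(obs, action):
    """Placeholder dynamics: a lightly damped oscillation nudged by the chosen force
    level. In practice this is replaced by the MSL simulation."""
    force = (action - 1) * 0.05
    tx, txd, ty, tyd = obs
    txd += (-0.5 * tx - 0.05 * txd + force) * 0.01
    tyd += (-0.5 * ty - 0.05 * tyd + force) * 0.01
    nxt = np.array([tx + 0.01 * txd, txd, ty + 0.01 * tyd, tyd], dtype=np.float32)
    return nxt, reward(np.degrees(max(abs(nxt[0]), abs(nxt[2]))))

def train_minibatch():
    obs, act, rew, nxt = map(np.array, zip(*random.sample(replay, BATCH)))
    q = q_net(torch.as_tensor(obs)).gather(1, torch.as_tensor(act).long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = torch.as_tensor(rew, dtype=torch.float32) + \
                 GAMMA * target_net(torch.as_tensor(nxt)).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

for episode in range(50):
    obs = np.random.uniform(-0.02, 0.02, size=N_OBS).astype(np.float32)  # random start
    for step in range(500):                        # 500 steps per episode, as in Section 3
        action = epsilon_greedy(obs)
        nxt, r = toy_step(obs, action)
        replay.append((obs, action, r, nxt))
        obs = nxt
        if len(replay) >= BATCH:
            train_minibatch()
    target_net.load_state_dict(q_net.state_dict())  # periodic target-network sync
```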

2.6. Control Architecture

Here, to combine the two controllers, we use the RBFNN-MSSC to maintain the multirotor trajectory and the DQN-RL to train the slung load to suppress its oscillations through self-learning. The architecture of this combined control system is displayed in Figure 4. Initially, the state vector consists of the position and attitude of the multirotor, in addition to the slung load oscillation angles. To feed them into the RBFNN-MSSC, the oscillation angles must be linearized as in (12). These vectors depend upon the magnitude of the oscillation angles. As seen in (72), the DQN agent requires an observation vector, which takes these oscillation angles, θ x and θ y , as inputs. Once the desired reward as explained in (74) is achieved, the training process is complete. The oscillation angles have been suppressed to a certain limit and can then be fed back into the system control loop. The RBFNN-MSSC then ensures that the desired trajectory of the system is achieved, by minimizing the error from the reference signal, which indicates the ideal trajectory of the multirotor and the desired orientation of the slung load.
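As a high-level illustration of the Figure 4 loop, one iteration of the combined architecture can be organized as below; all callables are stand-ins for the components sketched in the earlier sections and for the trained DQN policy.

```python
def combined_control_step(state, reference, dqn_policy, rbfnn_mssc, load_offset):
    """One pass of the combined architecture: the trained DQN acts on the slung load
    angles while the RBFNN-MSSC tracks the multirotor reference trajectory."""
    load_obs = (state["theta_x"], state["theta_x_dot"],
                state["theta_y"], state["theta_y_dot"])
    load_action = dqn_policy(load_obs)                    # oscillation suppression
    X1 = load_offset(state["theta_x"], state["theta_y"])  # load position as in (12)
    u = rbfnn_mssc(state, reference, X1)                  # trajectory-tracking input
    return u, load_action

# Wiring example with trivial stand-ins:
u, a = combined_control_step(
    state={"theta_x": 0.02, "theta_x_dot": 0.0, "theta_y": -0.01, "theta_y_dot": 0.0},
    reference={"position": (1.0, 0.0, 1.5)},
    dqn_policy=lambda obs: 0,
    rbfnn_mssc=lambda s, ref, X1: (0.0, 0.0, 0.0, 0.0),
    load_offset=lambda tx, ty: (tx, ty, 1.0),
)
```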

3. Results

To verify the effectiveness of the proposed control, we have implemented some simulations in MATLAB R2022b. The parameters for the simulation are defined in Table 2. The sliding surface ratios are taken as 0.8, 1.0 and 1000 for $s_1$, $s_2$, and $s_3$, respectively. We train the system, which is subject to an initial disturbance vector defined by $0.6 \times \mathrm{rand}(n) + 0.1$, where $\mathrm{rand}(n)$ generates numbers uniformly distributed between 0 and 1.
The training of the DQN agent yields the results seen in Figure 5. Each episode is implemented for 500 steps and terminates early if the $1^\circ$ threshold is exceeded after the oscillation angle has entered the bounds of $[-1^\circ, 1^\circ]$. Upon achieving the desired reward, the agent can be saved and used for future simulations, i.e., it need not be trained each time a simulation is implemented. The resultant oscillation angle is then fed back into the MSSC control loop. To verify its effectiveness, we compare its performance with the MSSC being applied to the slung load as part of the MSL as well, in contrast to it being controlled by the RL agent.
We first consider the case of the MSL control along a square path. The robustness of the trajectory tracking is highlighted in Figure 6, where the sliding surfaces ensure that the quadrotor remains within the bounds of the desired path. In tracking this square trajectory, there is a slight offset from the desired path each time a turn is performed at a corner of the desired square path, as seen in Figure 7, but the vehicle quickly gets back on track along the linear path of movement. The magnitude of reduction in slung load oscillations is greater with the RL agent than with the MSSC, as represented in Figure 8, with a lower peak and a slightly quicker convergence. In Figure 9, the control inputs indicate that the initial chattering effect brought about by the disturbances applied to the MSL is quickly suppressed by the RBFNN. Hence, the proposed control architecture is effective when the MSL traces a square path.
To further validate the proposed control, we next consider a butterfly path. Once again, the sliding surfaces represented in Figure 10 indicate that the MSL is kept within the bounds of the desired path, whose tracked trajectory is shown in Figure 11. The step response characteristics displayed in Table 3 highlight the reduction in the transient time of the slung load pendulum when utilizing the DQN-based RL over the RBFNN-MSSC. A reduction in the undershoot represents a reduction in the lag of the pendulum behind the forward-moving trajectory of the multirotor. The peak is reduced by about 20%, while the peak time is also significantly reduced by 66%. The trajectory traced here is almost point for point, even in the presence of applied impulse disturbances. The effectiveness of the DQN-RL agent over the RBFNN-MSSC on its own is once again displayed in Figure 12, where the oscillation of the slung load is reduced. This reduction is further emphasized in Table 4, where we see that the oscillation angle, in utilizing the RL agent over the MSSC, is reduced by 30%. Finally, Figure 13 displays the elimination of the chattering effect even when there is a change in direction at every instant, with the addition of the RBFNN to the MSSC.

4. Conclusions

In this paper, we proposed a robust adaptive controller in which a Multi-Surface Sliding Mode Controller was aided by a Radial Basis Function Neural Network to ensure that a multirotor carrying a suspended payload would have guaranteed asymptotic stability within certain bounds. To suppress the oscillations of the suspended payload, we combined the RBFNN-MSSC with a Deep Q-Network-based reinforcement learning agent, with a reward function that focuses on limiting the oscillations to a certain threshold. Upon comparing the results of this control architecture with a control system employing the RBFNN-MSSC on its own for the entire system, based on the oscillations of the suspended payload, the proposed architecture was found to be the more effective of the two. Lyapunov Stability Theory was also employed to verify the stability of the proposed control. Future work may include proposing an additional reward function for the multirotor to combine with that proposed for the suspended payload.

Author Contributions

Conceptualization, S.K. and M.N.; methodology, C.P.; validation, C.P., M.N. and S.K.; writing—original draft preparation, C.P.; writing—review and editing, M.N. and S.K.; supervision, M.N. and S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Nawaz, H.; Ali, H.M.; Massan, S. Applications of unmanned aerial vehicles: A review. Tecnol. Glosas Innovación Apl. Pyme. Spec. 2019, 2019, 85–105. [Google Scholar] [CrossRef]
  2. Emran, B.J.; Najjaran, H. A review of quadrotor: An underactuated mechanical system. Annu. Rev. Control. 2018, 46, 165–180. [Google Scholar] [CrossRef]
  3. Baraean, A.; Hamanah, W.M.; Bawazir, A.; Quama, M.M.; El Ferik, S.; Baraean, S.; Abido, M.A. Optimal Nonlinear backstepping controller design of a Quadrotor-Slung load system using particle Swarm Optimization. Alex. Eng. J. 2023, 68, 551–560. [Google Scholar] [CrossRef]
  4. Al-Dhaifallah, M.; Al-Qahtani, F.M.; Elferik, S.; Saif, A.-W.A. Quadrotor robust fractional-order sliding mode control in unmanned aerial vehicles for eliminating external disturbances. Aerospace 2023, 10, 665. [Google Scholar] [CrossRef]
  5. Manalathody, A.; Krishnan, K.S.; Subramanian, J.A.; Thangavel, S.; Thangeswaran, R.S.K. Non-linear Controller for a Drone with Slung Load. In Proceedings of the International Conference on Modern Research in Aerospace Engineering, Noida, India, 21–22 September 2023; pp. 219–228. [Google Scholar]
  6. Li, B.; Li, Y.; Yang, P.; Zhu, X. Adaptive neural network-based fault-tolerant control for quadrotor-slung-load system under marine scene. IEEE Trans. Intell. Veh. 2023, 9, 681–691. [Google Scholar] [CrossRef]
  7. Wang, Z.; Qi, J.; Wu, C.; Wang, M.; Ping, Y.; Xin, J. Control of quadrotor slung load system based on double ADRC. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 6810–6815. [Google Scholar]
  8. Ren, Y.; Zhao, Z.; Ahn, C.K.; Li, H.-X. Adaptive fuzzy control for an uncertain axially moving slung-load cable system of a hovering helicopter with actuator fault. IEEE Trans. Fuzzy Syst. 2022, 30, 4915–4925. [Google Scholar] [CrossRef]
  9. Gajbhiye, S.; Cabecinhas, D.; Silvestre, C.; Cunha, R. Geometric finite-time inner-outer loop trajectory tracking control strategy for quadrotor slung-load transportation. Nonlinear Dyn. 2022, 107, 2291–2308. [Google Scholar] [CrossRef]
  10. Tolba, M.; Shirinzadeh, B.; El-Bayoumi, G.; Mohamady, O. Adaptive optimal controller design for an unbalanced UAV with slung load. Auton. Robot. 2023, 47, 267–280. [Google Scholar] [CrossRef]
  11. Wang, Y.; Yu, G.; Xie, W.; Zhang, W.; Silvestre, C. UDE-based Robust Control of a Quadrotor-Slung-Load System. IEEE Robot. Autom. Lett. 2023, 8, 6851–6858. [Google Scholar] [CrossRef]
  12. Kabzan, J.; Hewing, L.; Liniger, A.; Zeilinger, M.N. Learning-based model predictive control for autonomous racing. IEEE Robot. Autom. Lett. 2019, 4, 3363–3370. [Google Scholar] [CrossRef]
  13. Bag, A.; Subudhi, B.; Ray, P.K. A combined reinforcement learning and sliding mode control scheme for grid integration of a PV system. CSEE J. Power Energy Syst. 2019, 5, 498–506. [Google Scholar]
  14. Lee, D.; Lee, S.J.; Yim, S.C. Reinforcement learning-based adaptive PID controller for DPS. Ocean Eng. 2020, 216, 108053. [Google Scholar] [CrossRef]
  15. Rizvi, S.A.A.; Lin, Z. Reinforcement learning-based linear quadratic regulation of continuous-time systems using dynamic output feedback. IEEE Trans. Cybern. 2019, 50, 4670–4679. [Google Scholar] [CrossRef] [PubMed]
  16. Annaswamy, A.M. Adaptive control and intersections with reinforcement learning. Annu. Rev. Control Robot. Auton. Syst. 2023, 6, 65–93. [Google Scholar] [CrossRef]
  17. Du, B.; Lin, B.; Zhang, C.; Dong, B.; Zhang, W. Safe deep reinforcement learning-based adaptive control for USV interception mission. Ocean Eng. 2022, 246, 110477. [Google Scholar] [CrossRef]
  18. Wu, L.; Wang, C.; Zhang, P.; Wei, C. Deep reinforcement learning with corrective feedback for autonomous uav landing on a mobile platform. Drones 2022, 6, 238. [Google Scholar] [CrossRef]
  19. Liang, X.; Du, X.; Wang, G.; Han, Z. Deep reinforcement learning for traffic light control in vehicular networks. arXiv 2018, arXiv:1803.11115. [Google Scholar]
  20. Ma, S.; Lee, J.; Serban, N.; Yang, S. Deep Attention Q-Network for Personalized Treatment Recommendation. In Proceedings of the 2023 IEEE International Conference on Data Mining Workshops (ICDMW), Shanghai, China, 4 December 2023; IEEE: Piscataway, NJ, USA; pp. 329–337. [Google Scholar]
  21. Peng, B.; Sun, Q.; Li, S.E.; Kum, D.; Yin, Y.; Wei, J.; Gu, T. End-to-end autonomous driving through dueling double deep Q-network. Automot. Innov. 2021, 4, 328–337. [Google Scholar] [CrossRef]
  22. Fang, X.; Gong, G.; Li, G.; Chun, L.; Peng, P.; Li, W.; Shi, X.; Chen, X. Deep reinforcement learning optimal control strategy for temperature setpoint real-time reset in multi-zone building HVAC system. Appl. Therm. Eng. 2022, 212, 118552. [Google Scholar] [CrossRef]
  23. Kersandt, K.; Muñoz, G.; Barrado, C. Self-training by reinforcement learning for full-autonomous drones of the future. In Proceedings of the 2018 IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), London, UK, 23–27 September 2018; pp. 1–10. [Google Scholar]
  24. Muñoz, G.; Barrado, C.; Çetin, E.; Salami, E. Deep reinforcement learning for drone delivery. Drones 2019, 3, 72. [Google Scholar] [CrossRef]
  25. Raja, G.; Baskar, Y.; Dhanasekaran, P.; Nawaz, R.; Yu, K. An efficient formation control mechanism for multi-UAV navigation in remote surveillance. In Proceedings of the 2021 IEEE Globecom Workshops (GC Wkshps), Madrid, Spain, 7–11 December 2021; pp. 1–6. [Google Scholar]
  26. Özalp, R.; Varol, N.K.; Taşci, B.; Uçar, A. A review of deep reinforcement learning algorithms and comparative results on inverted pendulum system. In Machine Learning Paradigms. Learning and Analytics in Intelligent Systems; Springer: Cham, Switzerland, 2020; pp. 237–256. [Google Scholar]
  27. Dang, K.N.; Van, L.V. Development of deep reinforcement learning for inverted pendulum. Int. J. Electr. Comput. Eng. 2023, 13, 3895–3902. [Google Scholar]
  28. Li, X.; Liu, H.; Wang, X. Solve the inverted pendulum problem base on DQN algorithm. In Proceedings of the 2019 Chinese Control and Decision Conference (CCDC), Nanchang, China, 3–5 June 2019; pp. 5115–5120. [Google Scholar]
  29. Huang, H.; Yang, Y.; Wang, H.; Ding, Z.; Sari, H.; Adachi, F. Deep reinforcement learning for UAV navigation through massive MIMO technique. IEEE Trans. Veh. Technol. 2019, 69, 1117–1121. [Google Scholar] [CrossRef]
  30. Wang, S.; Qi, N.; Jiang, H.; Xiao, M.; Liu, H.; Jia, L.; Zhao, D. Trajectory Planning for UAV-Assisted Data Collection in IoT Network: A Double Deep Q Network Approach. Electronics 2024, 13, 1592. [Google Scholar] [CrossRef]
  31. Hedrick, J.K.; Yip, P.P. Multiple sliding surface control: Theory and application. J. Dyn. Sys. Meas. Control 2000, 122, 586–593. [Google Scholar] [CrossRef]
  32. Thanh, H.L.N.N.; Hong, S.K. An extended multi-surface sliding control for matched/mismatched uncertain nonlinear systems through a lumped disturbance estimator. IEEE Access 2020, 8, 91468–91475. [Google Scholar] [CrossRef]
  33. Ullah, S.; Khan, Q.; Mehmood, A.; Bhatti, A.I. Robust backstepping sliding mode control design for a class of underactuated electro–mechanical nonlinear systems. J. Electr. Eng. Technol. 2020, 15, 1821–1828. [Google Scholar] [CrossRef]
  34. Qu, X.; Zeng, Z.; Wang, K.; Wang, S. Replacing urban trucks via ground–air cooperation. Commun. Transp. Res. 2022, 2, 100080. [Google Scholar] [CrossRef]
  35. Nyaaba, A.A.; Ayamga, M. Intricacies of medical drones in healthcare delivery: Implications for Africa. Technol. Soc. 2021, 66, 101624. [Google Scholar] [CrossRef]
  36. Rejeb, A.; Abdollahi, A.; Rejeb, K.; Treiblmaier, H. Drones in agriculture: A review and bibliometric analysis. Comput. Electron. Agric. 2022, 198, 107017. [Google Scholar] [CrossRef]
  37. Zheng, C.; Yan, Y.; Liu, Y. Prospects of eVTOL and modular flying cars in China urban settings. J. Intell. Connect. Veh. 2023, 6, 187–189. [Google Scholar] [CrossRef]
  38. Khoo, S.; Norton, M.; Kumar, J.J.; Yin, J.; Yu, X.; Macpherson, T.; Dowling, D.; Kouzani, A. Robust control of novel thrust vectored 3D printed multicopter. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017; pp. 1270–1275. [Google Scholar]
  39. Peris, C.; Norton, M.; Khoo, S.Y. Variations in Finite-Time Multi-Surface Sliding Mode Control for Multirotor Unmanned Aerial Vehicle Payload Delivery with Pendulum Swinging Effects. Machines 2023, 11, 899. [Google Scholar] [CrossRef]
  40. Peris, C.; Norton, M.; Khoo, S.Y. Multi-surface Sliding Mode Control of a Thrust Vectored Quadcopter with a Suspended Double Pendulum Weight. In Proceedings of the IECON 2021–47th Annual Conference of the IEEE Industrial Electronics Society, Toronto, ON, Canada, 13–16 October 2021; pp. 1–6. [Google Scholar]
  41. Peris, C.; Norton, M.; Khoo, S.Y. Adaptive Multi Surface Sliding Mode Control of a Quadrotor Slung Load System. In Proceedings of the IEEE 10th International Conference on Automation, Robotics and Application (ICARA 2024), Athens, Greece, 23 February 2024. [Google Scholar]
  42. Kuang, N.L.; Leung, C.H. Performance effectiveness of multimedia information search using the epsilon-greedy algorithm. In Proceedings of the 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 929–936. [Google Scholar]
Figure 1. Model of a quadcopter with a suspended pendulum load.
Figure 2. General flow of an RL control system.
Figure 3. RL using the DQN method.
Figure 4. Framework of the RBFNN-MSSC with DQN-RL.
Figure 5. Training of the DQN agent for real-time control of the slung load.
Figure 6. Sliding surfaces for trajectory tracking along a square path.
Figure 7. Trajectory tracking of the multirotor by the RBFNN-MSSC.
Figure 8. Comparison of the slung load oscillation with MSSC and with RL along a square path.
Figure 9. Control inputs (rotor tilt angles) along the square path.
Figure 10. Sliding surfaces for trajectory tracking along a butterfly path.
Figure 11. Trajectory tracking of the multirotor by the RBFNN-MSSC along a butterfly path.
Figure 12. Comparison of the slung load oscillation with MSSC and with RL along the butterfly path.
Figure 13. Control inputs (rotor tilt angles) along the butterfly path.
Table 1. DQN agent layers.
Layer | Name | Definition
1 | ‘observation’ | Feature Input
2 | ‘CriticStateFC1’ | Fully Connected
3 | ‘CriticStateFC2’ | Fully Connected
4 | ‘CriticRelu1’ | Activation Function
5 | ‘CriticCommonRelu’ | ReLU
6 | ‘Output’ | Fully Connected
Table 2. Simulation parameters.
Parameter | Symbol | Value
Multirotor mass | M | 4 kg
Slung load mass | m | 0.5 kg
Slung load link length | l | 1 m
Rotor speed for hover | ω_res | 29,700 m/s
UAV moment of inertia | I | 2.07 × 10⁻² kg·m²
Feedback control time step | T_c | 0.005 s
Simulation run time | T | 10 s
Sampling rate | T_s | 0.01 s
Learning rate | γ | 5 × 10⁻³
Table 3. Step response characteristics of MSSC vs. RL.
Parameter | Butterfly Trajectory, MSSC | Butterfly Trajectory, RL | Square Trajectory, MSSC | Square Trajectory, RL
Rise time | 0.01 | 0.006 | 0.008 | 0.006
Transient time | 27.08 | 22.6 | 25.35 | 22.21
Settling time | 29.94 | 29.98 | 29.98 | 29.97
Settling min | −0.12 | −0.12 | −2.55 | −3.26
Settling max | 1.43 | 1.01 | 3.53 | 2.91
Overshoot | 2.71 | 6.94 | 3.81 | 4.0
Undershoot | 3.6 | 1.1 | 4.04 | 3.58
Peak | 1.83 | 1.58 | 3.75 | 3.26
Peak time | 1.01 | 0.37 | 1.02 | 0.38
Table 4. Quantitative data of slung load oscillations.
Trajectory | Controller | Max | Mean | RMS
Butterfly Trajectory | MSSC | 1.43 | −0.01 | 0.37
Butterfly Trajectory | RL | 1.01 | −0.01 | 0.26
Square Trajectory | MSSC | 3.53 | −1.22 × 10⁻⁴ | 0.98
Square Trajectory | RL | 2.92 | −7.41 × 10⁻⁴ | 0.70
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
