1. Introduction
Visual servoing (VS) is a motion control method for robotic systems that utilizes visual feedback obtained from mono or stereo cameras [
1]. It is often deployed in applications where a robot needs to move according to given visual goals, such as assembly tasks. The growing use of cameras in robot manipulator control has drawn more attention to VS and resulted in advanced approaches [
2]. The use of these approaches in different fields, applications, and systems such as humanoid robots or unmanned vehicles is also becoming widespread [
3,
4].
Classical VS starts with obtaining meaningful data defined as features from an image which is obtained by using a camera, as shown in
Figure 1 [
5]. These
k features are arranged in vector form
s according to their metrics such as coordinates in the image plane. The main goal is to reach the desired
s*, and the VS control law, expressed in terms of the velocities of the robotic system (especially the Cartesian velocities of the end effector), is defined by the error vector between
s and
s*. After this generalized definition of VS, the approaches can be classified into two main types: position-based visual servoing (PBVS) and image-based visual servoing (IBVS) [
1]. On the one hand, in PBVS, the control law uses the information about the pose of the end effector relative to the desired pose to obtain
s*. On the other hand, IBVS utilizes
s obtained directly from the current image without any further operations. PBVS is affected by pose estimation errors, and IBVS must keep the image features in the field of view (FOV); therefore, hybrid approaches such as partitioned and 2½ D VS [
6,
7] or featureless approaches such as kernel-based VS [
8] have been derived to avoid these problems. Its robustness against depth estimation errors makes IBVS well suited to practical applications, and it is therefore preferred in this study.
The main platform of this study is a quadrotor which can be defined as an unmanned aerial vehicle (UAV) that is propelled by four independent rotors and can be used in a variety of applications, including aerial photography, mapping, or surveillance. While applications of VS on UAVs are becoming popular, the first application of VS on quadrotors can be considered to be the study by Mahony and Hamel [
9]. The study used linear features but feature noise was neglected. Ceren and Altug proposed a VS system for quadrotors based on spherical image projection [
10]. Although spherical camera projection is a potential replacement for VS image projection, investigations on VS control of quadrotors with spherical imaging have shown that this leads to inappropriate motion characteristics in Cartesian space [
11,
12]. De Plinval et al. utilized the homography matrix obtained from frontal camera motion and gyrometer measurements to propose VS stabilizer laws [
13]. Mebarki et al. used nonlinear observers to estimate translational velocity and spherical image features and designed a control law by utilizing an integral backstepping approach [
14]. Abdessameud and Janabi-Sharifi proposed a VS system based on similar translational velocity estimates, but the velocity signals were neglected [
15]. Zhang et al. focused on the visibility problem of features by optimizing VS control inputs under visibility constraints [
16], while neglecting velocity signals. Control methods such as model predictive control (MPC) are also adapted to VS control of quadrotors. Zhang et al. proposed a robust nonlinear MPC VS scheme for quadrotors under disturbances, but the sudden changes in control inputs which could be dangerous for practical applications were neglected [
17]. Quadrotor team formation with IBVS was discussed in [
18], but the velocity signals and image feature trajectories were neglected. Xie and Lynch presented an input saturated controller for UAVs to regulate the relative pose between a vehicle and a planar horizontal visual target [
19]. They simplified the control law design by using a virtual camera to define a set of image moment features and proposed another input saturation law to keep features in FOV. They mentioned it was a generalized law for UAVs, but only features of two points were selected and different dynamics between fixed-wing and rotary-wing UAVs were neglected. Zhao et al. focused on PBVS control of quadrotors with sensor fusion [
20]. They utilized an IMU, an ultrasonic sensor, and a vision sensor and designed a robust compensator to enhance the robustness against nonlinearities, couplings, and uncertainties. The position of the quadrotor was estimated using a visual feature and the designed controller was compared to the classical PID controller, but details such as feature trajectory were not given. Cao et al. proposed an IBVS controller for quadrotors to stabilize hovering and tracking [
21]. Backstepping controllers were designed to stabilize the VS of the quadrotor and a trajectory observer based on a nonlinear tracking differentiator was integrated into the proposed system. The system was verified under disturbance such as a sudden change of goal image feature locations, but motion in 3D, rotor torques, and image noises were neglected.
Other active topics in motion control systems are fault detection and diagnosis (FDD) and fault tolerant control (FTC). FDD is the process of identifying and diagnosing the type, the magnitude, and the time of a malfunction or fault in a system [
22]. This process is followed by FTC which can be defined as the ability of a control system to continue operating under defined operation constraints in the presence of a fault [
23]. It must be noted that temporary faults that show their presence at a time interval can be encountered. To ensure fault tolerance, critical control systems are designed with hardware redundant components, such as triple computer redundancy in airplanes, but the FTC design focuses on reconfiguring the controller with or without using FDD information.
In the last two decades, FDD and FTC methods have been adapted, modified, and updated for quadrotors as the flight dynamics are quite different from fixed-wing UAVs. A detailed review on FDD and FTC for UAVs can be found in [
24]. The present study focuses on actuator faults, AI-based FDD, and active FTC for quadrotors, and the studies on these topics are summarized below. Avram et al. designed an FDD system that involved a nonlinear fault detection estimator, a bank of nonlinear adaptive fault isolation estimators, and a fault accommodator which used fault estimation information [
25]. The study showed sufficient results, but the partial actuator fault ratio was quite limited which could be passively tolerated by PID controllers of the quadrotor. Song et al. proposed an indirect neural network (NN)-based adaptive FTC scheme with virtual parameter estimating algorithms [
26]. The proposed system provided robustness against model uncertainties, but sudden changes in the control signals that could be catastrophic for a real quadrotor were neglected in the study. Ren proposed a robust
observer to observe actuator fault and state estimation of a quadrotor in the presence of external disturbances, parameter uncertainties, and nonlinear terms [
27]. The fault was defined as a failure factor in lift and thrust torques, but the faulty actuator was not diagnosed in the study. Ma et al. also proposed an observer-based adaptive controller to estimate and compensate for actuator and sensor faults of quadrotors [
28]. The faults in the actuators were defined as biases in the form of a nonlinear function, but the physical equivalence was not defined and the results were given for a noiseless scenario.
From a practical point of view, actuator faults in UAVs may cause catastrophic results while performing dedicated tasks. To avoid these undesirable scenarios, UAVs should exploit all hardware that remains reliable after the fault to diagnose and accommodate it. Quadrotors are equipped with IMUs for navigation, but it is hard to diagnose actuator faults from IMU signals since the inner and outer loop controllers are coupled according to RPY motions. Signals from auxiliary sensors such as cameras may therefore improve diagnosability without the need for hardware redundancy. Additionally, cameras can provide this assistance while performing other tasks such as visual SLAM or visual autolanding. Cameras are also more robust than IMUs and can remain active when other sensors on the quadrotor are faulty. In this study, camera images and features are deployed to diagnose quadrotor actuator faults as the primary contribution to the literature. The proposed system, which combines detection and diagnostics, first approximates actuator faults. The candidates for this approximation are NN, ELM, linear SVM, and LSTM. NN had the best performance based on the RMSE measure, although the other candidates performed comparably.
Active FTC is defined as the reconfiguration of controller parameters or switching between different controllers according to a fault diagnosis [
23]. These options may cause sudden changes and discontinuities that lead to abrupt maneuvers, which are undesirable for a reliable flight. As a solution, a fuzzy gain scheduling (GS)-based active FTC stage is proposed to tune the faulty actuator gain while providing convergence after the fault diagnosis.
The system’s robustness is evaluated in the presence of feature noise. Despite the steady-state feature errors, the system exhibits sufficient tolerance to actuator faults and does not diverge. The proposed system is also tested for tracking moving feature targets under feature noise, and it tracks these targets while keeping the features in the FOV.
The study is organized as follows: In
Section 2, the visual servo approach for quadrotors, and the proposed system with fault approximators and the fuzzy gain scheduling mechanism are described in detail; in
Section 3, the simulation results for the proposed system under feature noise are given; in
Section 4, conclusions and future goals are summarized.
2. The Proposed Fault Tolerant Visual Servo Control System
The proposed fault tolerant visual servo control system is presented in depth in this section. As the first step, it should be emphasized that an IBVS system with point features and with an eye-in-hand configuration is proposed as the quadrotor is carrying a down-looking camera. The proposed system is shown in
Figure 2.
VS begins with the projection of point features to the image plane. Assume that a manipulator equipped with an eye-in-hand camera is seeing
k feature points in 3D. These feature points’ coordinates in the image plane of the camera are given as:
where
ui, vi are the coordinates in the
u-v image plane. To characterize the behavior of the fixed-motionless feature points, these vectors are merged into a matrix in the form:
with
s* having the same dimensions as the desired fixed feature points matrix. Error convergence is the primary objective of all VS approaches:
Following these notations, the linear velocity
and angular velocity
of a point are determined in the world coordinates as:
where
are used to denote the point location and the velocity vectors, respectively. Subsequently, the focal length of the camera
, the depth of the feature
, and
are used to define the projection of the linear and angular velocities in Equation (4) to the image plane.
Equation (5) is shown as:
where
is the interaction matrix (also called the image Jacobian). Since it is challenging to determine a feature’s actual depth, its estimated value is presumed to represent the depth at
and this results in the estimated interaction matrix
.
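As a minimal sketch of the quantities described above, the following snippet builds the interaction matrix of point features in the form commonly used for IBVS, with pixel coordinates measured from the principal point. The function names, the focal length of 800 pixels, and the sample points are illustrative assumptions; only the 2 m depth estimate is taken from the simulation settings of this study.

```python
import numpy as np

def point_interaction_matrix(u, v, Z, f):
    """2x6 interaction matrix of one point feature.

    u, v : image coordinates measured from the principal point (pixels)
    Z    : estimated depth of the point in the camera frame (m)
    f    : focal length expressed in pixels (assumed value in this sketch)
    Columns correspond to the camera velocity (vx, vy, vz, wx, wy, wz).
    """
    return np.array([
        [-f / Z, 0.0, u / Z, u * v / f, -(f**2 + u**2) / f, v],
        [0.0, -f / Z, v / Z, (f**2 + v**2) / f, -u * v / f, -u],
    ])

def stacked_interaction_matrix(points_uv, Z_hat, f):
    """Stack the 2x6 blocks of k points into one (2k)x6 matrix."""
    return np.vstack([point_interaction_matrix(u, v, Z_hat, f)
                      for u, v in points_uv])

# Hypothetical feature locations: four points around the principal point.
pts = [(-100.0, -100.0), (100.0, -100.0), (100.0, 100.0), (-100.0, 100.0)]
L_hat = stacked_interaction_matrix(pts, Z_hat=2.0, f=800.0)
print(L_hat.shape)
```

Using a single estimated depth for all points mirrors the common practice of replacing the unknown true depth with a constant estimate.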
Most VS systems make assumptions about the camera configuration and feature specifications. Two assumptions for the proposed VS system are given below:
Assumption 1. The camera is attached to the center of the quadrotor with the eye-in-hand configuration, and the camera frame (
) and the center of the quadrotor (
) intersect without any transformations:
Assumption 2. All feature points have the same depth, and all feature points are collinear. The features adhere to the collinearity criteria outlined in [29] since A, B, and C are constants:
Classical IBVS uses the Moore–Penrose pseudoinverse of the estimated interaction matrix
, the error vector, and a fixed gain as a kinematic velocity controller to exponentially reduce the error:
Here, it should be noted that
and
define the effectiveness of this velocity controller. There are two major differences when applying the conventional IBVS that was designed for fully actuated robot manipulators to quadrotors [
5]. First, due to the underactuation of quadrotor systems, the Jacobian only includes the columns that correspond to
. Secondly, the features in the image will move as a result of the quadrotor’s inability to translate without first tilting in the desired direction, increasing the image feature error. This may be disregarded for small amounts of roll and pitch, but it has to be considered for aggressive maneuvers. Here, it is ignored in this study.
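The classical velocity law discussed in this paragraph can be sketched as follows, under the assumption that the error is defined as the current features minus the desired features. The 2x6 matrix below is a hypothetical interaction matrix of a single point, and the feature values are illustrative; the gain of 0.3 is the value reported for the healthy system in the simulations.

```python
import numpy as np

def ibvs_velocity(L_hat, s, s_star, gain):
    """Classical IBVS law: v = -gain * pinv(L_hat) @ (s - s_star)."""
    e = np.asarray(s, dtype=float) - np.asarray(s_star, dtype=float)
    return -gain * np.linalg.pinv(L_hat) @ e

# Hypothetical interaction matrix of one point (not taken from the study).
L_hat = np.array([
    [-400.0, 0.0, 25.0, -1.875, -803.125, -30.0],
    [0.0, -400.0, -15.0, 801.125, 1.875, -50.0],
])
s = np.array([50.0, -30.0])       # current feature (pixels, illustrative)
s_star = np.array([0.0, 0.0])     # desired feature (pixels, illustrative)

v_cam = ibvs_velocity(L_hat, s, s_star, gain=0.3)
s_dot = L_hat @ v_cam             # predicted feature velocity in the image
print(s_dot, -0.3 * (s - s_star))
```

For a single point the matrix has full row rank, so the predicted feature velocity equals the negative gain times the error, which is the exponential decay the controller aims for. For a quadrotor, only the columns of the actuated degrees of freedom would be retained before taking the pseudoinverse.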
The VS controller directly creates the necessary velocities for the velocity loops of these degrees of freedom in addition to performing the function of the outermost position loops for
x- and
y-position, altitude, and yaw. Here, it must be noted that the velocity loops still need rate information as an input, and quadrotors obtain this information through an inertial measurement unit. The inner loop PD controllers define attitude torque demands that are roll torque
, pitch torque
, yaw torque
, and height control force
, with gravity feedforward. Then, Euler’s equation of motion provides the rotational acceleration for the quadrotor as:
where
I is the inertia matrix,
ω is the angular velocity vector, and
τ is the torque vector applied to the quadrotor. The motion of the quadrotor is defined by the relation between torque vector and height control force using the torque applied to each propeller:
where
is the coefficient matrix including aerodynamic drag, distances of blades from the center of mass of the quadrotor, and lift constant defined in [
5]. This closed loop is shown with blue signal flows in
Figure 2.
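The rotational part of this closed loop can be illustrated with a short sketch of Euler's equation in its standard rigid-body form. The inertia values and torques below are hypothetical placeholders, not the parameters of the quadrotor model used in this study.

```python
import numpy as np

def angular_acceleration(inertia, omega, tau):
    """Euler's equation: I * omega_dot = tau - omega x (I * omega)."""
    gyroscopic = np.cross(omega, inertia @ omega)
    return np.linalg.solve(inertia, tau - gyroscopic)

# Hypothetical diagonal inertia matrix (kg m^2) and torque vector (N m).
I = np.diag([0.08, 0.08, 0.15])
omega = np.array([0.0, 0.0, 1.0])      # spinning about the body z axis only
tau = np.array([0.01, -0.02, 0.005])   # roll, pitch, and yaw torques

omega_dot = angular_acceleration(I, omega, tau)
print(omega_dot)
```

With a diagonal inertia matrix and rotation about one principal axis, the gyroscopic term vanishes and each angular acceleration reduces to the torque divided by the corresponding inertia.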
In this study, we focus on actuator faults with partial loss of effectiveness (LOE) that may be a result of motor or propeller damage or component deterioration [
25,
26]. While diagnosing a fault, the LOE ratio should be approximated to prepare for the stage of FTC. The fault diagnosis stage of the proposed system starts with
s* as the input of the bank of fault approximators. Each approximator approximates the LOE of a propeller as
fi.
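A partial LOE fault is often modeled as a multiplicative reduction of the actuator output. The sketch below follows that common convention as an assumption, since the fault model is not written out explicitly here; the function name, the command values, and the fault time are illustrative.

```python
import numpy as np

def apply_loe(commands, loe, t, t_fault):
    """Scale rotor commands by (1 - LOE) once the fault time is reached.

    commands : nominal commands of the four rotors
    loe      : loss-of-effectiveness ratio per rotor, each in [0, 1]
    """
    commands = np.asarray(commands, dtype=float)
    loe = np.asarray(loe, dtype=float)
    if t < t_fault:
        return commands.copy()          # healthy operation before the fault
    return (1.0 - loe) * commands       # degraded output after the fault

nominal = np.array([1.0, 1.0, 1.0, 1.0])
fault = np.array([0.3, 0.0, 0.0, 0.0])  # 30% LOE on Rotor 1 (illustrative)
print(apply_loe(nominal, fault, t=9.0, t_fault=8.0))
```

Each fault approximator would then be trained to recover the corresponding entry of the LOE vector from the image features.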
Four AI candidates, i.e., neural networks (NN), linear support vector machine (SVM), extreme learning machine (ELM), and a deep NN, i.e., long short-term memory (LSTM), are chosen as function approximators. As the first candidate, NN is a powerful classification and regression tool that can map input–output relations without any user interference as a black-box unit [
30]. The architecture of the fault approximator NN is given in
Figure 3.
Haykin [
30] provided detailed information about NNs. Here, the learning algorithm, i.e., the Levenberg–Marquardt (LM) algorithm, which is one of the essential components of the learning stage of NN, is briefly reviewed. NNs seek to reduce error and update their parameters with learning algorithms to provide acceptable outputs for given inputs. The gradient descent approach, which is described in Equation (12), is the most popular learning algorithm used for this purpose:
where
E(
n) is the
nth-step error function,
wij is the weight from neuron
i to neuron
j, and
η is the learning rate parameter. Instead of this parameter update, LM uses the sum of square errors for each input sample as the first step and its first derivative is defined in partial differential form as the Jacobian matrix in Equation (14). Then, the parameter update is implemented with
μ learning rate using Equation (15):
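The LM update can be sketched on a toy problem as below. This is a generic illustration of the damped Gauss-Newton step with a simple accept or reject rule for the damping factor, assuming errors are defined as prediction minus target; the one-neuron linear model, the data, and the damping schedule are hypothetical and are not the training setup of this study.

```python
import numpy as np

def lm_step(w, residual_fn, jacobian_fn, mu):
    """One LM update: w_new = w - (J^T J + mu I)^-1 J^T e."""
    e = residual_fn(w)
    J = jacobian_fn(w)
    H = J.T @ J + mu * np.eye(w.size)
    return w - np.linalg.solve(H, J.T @ e)

def fit_lm(w0, residual_fn, jacobian_fn, mu=1e-2, iters=50):
    w = np.asarray(w0, dtype=float)
    sse = np.sum(residual_fn(w) ** 2)     # sum of squared errors
    for _ in range(iters):
        w_try = lm_step(w, residual_fn, jacobian_fn, mu)
        sse_try = np.sum(residual_fn(w_try) ** 2)
        if sse_try < sse:                 # accept: behave more like Gauss-Newton
            w, sse, mu = w_try, sse_try, mu * 0.1
        else:                             # reject: behave more like gradient descent
            mu *= 10.0
    return w

# Toy data from a linear neuron y = w0 * x + w1 (hypothetical values).
x = np.linspace(-1.0, 1.0, 20)
w_true = np.array([0.5, -0.2])
y = w_true[0] * x + w_true[1]

residual = lambda w: w[0] * x + w[1] - y                    # error per sample
jacobian = lambda w: np.column_stack([x, np.ones_like(x)])  # d(error)/d(w)

w_fit = fit_lm(np.zeros(2), residual, jacobian)
print(w_fit)
```

A large damping factor makes the step resemble gradient descent with a small learning rate, while a small one approaches the Gauss-Newton step, which is why LM typically converges in fewer epochs than plain gradient descent.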
SVMs are a prominent and commonly used approach for machine learning classification problems; they map the data to another space in which the classes can be separated linearly by a hyperplane. In contrast, the goal of regression analysis is to find a function that approximately maps an input domain to real numbers using a training sample. Linear SVM defines a linear equation under an error constraint:
where
yn is the output dataset,
xn is the input dataset, and
ε is the maximum error, as shown in
Figure 4. The line segments for the boundary linear equations are shown in red and the linear equation defined by linear SVM is shown in blue. This minimization problem can be stated in standard quadratic programming form. However, solving it with general quadratic programming approaches might be computationally costly. Therefore, approaches such as sequential minimal optimization (SMO) are preferred [
31]. For input–output mapping for fault detection, four independent SVMs, each for one fault output, should be defined.
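The error constraint of linear SVM regression can be illustrated with the epsilon-insensitive loss, which is zero inside the tube of half-width ε around the fitted line. The sketch below uses the standard soft-margin objective with a penalty factor C, which is an assumption beyond the hard constraint stated above; all names and numbers are illustrative.

```python
import numpy as np

def epsilon_insensitive_loss(y, y_pred, eps):
    """Zero inside the eps-tube, linear outside of it."""
    return np.maximum(0.0, np.abs(y - y_pred) - eps)

def svr_objective(w, b, X, y, eps, C):
    """Soft-margin linear SVR objective: 0.5 * ||w||^2 + C * sum of tube violations."""
    y_pred = X @ w + b
    return 0.5 * w @ w + C * epsilon_insensitive_loss(y, y_pred, eps).sum()

# Illustrative targets and predictions.
y = np.array([1.0, 1.05, 2.0])
y_pred = np.array([1.0, 1.0, 1.0])
loss = epsilon_insensitive_loss(y, y_pred, eps=0.1)
print(loss)

# One such regressor would be defined per fault output (four in total).
X = np.array([[0.0], [1.0], [2.0]])
print(svr_objective(np.array([0.5]), 1.0, X, y, eps=0.1, C=1.0))
```

Samples whose error stays within ε do not contribute to the objective, so only the points on or outside the red boundary lines influence the fitted equation.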
The most well-known regressors are NNs, SVMs, and hybrid structures such as ANFIS; however, each of them has flaws. For input–output mapping, the hidden layer parameters of single-layer feedforward networks (SLFNs) and the user-defined parameters of SVMs and ANFIS should be carefully set [
32,
33]. As a member of the NN family, ELM is a generalized single-layer feedforward NN whose hidden layer does not require tuning. For this architecture, the output function of ELM for the single-output-node case is:
where
is the vector of the output weights between the hidden layer of
nodes and the output node,
is the output vector of the hidden layer with
activation function. Input–output mapping of ELM is shown in
Figure 5.
Sigmoid, sine, or hard limiter functions can be used as the activation functions of the green hidden layer nodes in
Figure 5. ELM tries to minimize the training error as well as the norm of the output weights, whereas conventional learning algorithms typically aim only at the minimum training error between output and goal
T:
where
is the minimum norm least square solution of
with
where
is the Moore–Penrose pseudoinverse as in Equation (9). In [
33], it was stated that the orthogonal projection technique, orthogonalization method, iterative approach, and singular value decomposition (SVD) could all be used to obtain this pseudoinverse. Input–output presentation is the same as NNs.
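The ELM training procedure described above can be sketched in a few lines: the hidden layer is drawn at random and left untouched, and the output weights are the minimum-norm least-squares solution obtained with the Moore–Penrose pseudoinverse. The toy target, the weight ranges, and the random seed are illustrative assumptions; the 10 hidden neurons and sigmoid activation match the settings reported for the simulations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, T, n_hidden, rng):
    """X: (N, d) inputs, T: (N,) targets. Returns hidden and output weights."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random, never tuned
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ W + b)               # hidden layer output matrix
    beta = np.linalg.pinv(H) @ T         # minimum-norm least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)   # toy input, not the fault dataset
T = np.sin(X[:, 0])                              # toy target
W, b, beta = elm_fit(X, T, n_hidden=10, rng=rng)
rmse = np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
print(f"training RMSE: {rmse:.4f}")
```

Because only a linear least-squares problem is solved, training takes a single step, which is the main practical appeal of ELM over iteratively trained NNs.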
The fourth regressor for fault function approximation is a deep NN architecture, i.e., long short-term memory (LSTM). An LSTM network is a kind of recurrent neural network (RNN) that can discover long-term relationships between sequence data’s time steps [
34]. Practically speaking, basic RNNs have a limited ability to learn longer term dependencies. RNNs are frequently trained via backpropagation, which can cause "vanishing gradient" or "exploding gradient" issues. These issues result in either extremely tiny or very high network weights, which limit the efficacy of applications that call on the network to learn long-term relationships. To solve this problem, LSTM networks employ extra gates to regulate which data from the hidden cell are transmitted as output and to the following hidden state, as shown in the LSTM cell in
Figure 6. Here,
f is the forgetting gate that decides what information is to be carried forward,
g is the memory cell,
i is the input gate that decides which values will be updated,
o is the output gate,
ci is the cell state that is the memory of LSTM,
hi is the hidden state, and
xt is the input. The network can more successfully learn long-term associations in the data thanks to the extra gates. LSTM networks are superior to simple RNNs for evaluating sequential data because they are less sensitive to the time gap.
A sequence input layer and an LSTM layer are the two main parts of an LSTM network. Data from time series or sequences are fed into the network using a sequence input layer. Sequence data’s long-term relationships between time steps are learned by an LSTM layer. A regression output layer is also defined for regression purposes. It is not shown in
Figure 6.
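A single forward step of an LSTM cell with the gates named above can be sketched as follows. This is the standard LSTM formulation; the weight layout, the function names, and the small dimensions are illustrative assumptions rather than the configuration used in this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, R, b):
    """One LSTM time step.

    W: (4n, d) input weights, R: (4n, n) recurrent weights, b: (4n,) bias,
    with rows ordered as [input gate i, forget gate f, cell candidate g, output gate o].
    """
    n = h_prev.size
    z = W @ x_t + R @ h_prev + b
    i = sigmoid(z[0:n])            # input gate: which values are updated
    f = sigmoid(z[n:2 * n])        # forget gate: what is carried forward
    g = np.tanh(z[2 * n:3 * n])    # cell candidate
    o = sigmoid(z[3 * n:4 * n])    # output gate
    c_t = f * c_prev + i * g       # new cell state (memory)
    h_t = o * np.tanh(c_t)         # new hidden state
    return h_t, c_t

# Illustrative sizes: 8 inputs (four point features) and 3 hidden units.
d, n = 8, 3
h, c = lstm_step(np.zeros(d), np.zeros(n), np.ones(n),
                 np.zeros((4 * n, d)), np.zeros((4 * n, n)), np.zeros(4 * n))
print(h, c)
```

With all weights at zero, every gate outputs 0.5, so the cell state is simply halved; trained weights let the gates decide per time step how much memory to keep.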
These AI regressor candidates are selected as fault approximators of the fault diagnosis system in
Figure 2. The output of the approximator gives information about the LOE of the faulty rotor, and the quadrotor controllers should be updated according to this information, which is defined as active FTC. The inner and outer loop controllers are coupled according to RPY motions, and changing the PD values of these controllers will not compensate for the LOE of the faulty rotor directly. Therefore, torque and force signals in Equation (11) should be scheduled by
, the fault gain factor that is defined by the fault approximators, which is a form of gain scheduling (GS). However, this GS approach may cause a sudden change in the gain, resulting in sudden velocity changes in the motion and ultimately in divergence during the VS tasks. To avoid catastrophic results, an adaptive gain should be defined to provide a soft transition. Instead of an analytical definition, a fuzzy logic (FL) unit is deployed to obtain a soft transition and to incorporate user experience. The impact of error magnitude on IBVS features is taken into consideration while defining the linguistic rules of FL.
An FL unit’s output depends on its type, input membership function (MF) types, MF aggregation, rulebase, and defuzzification type. A Mamdani-type FL unit is employed in this study, and therefore, the output functions are fuzzy MFs. As the most common fuzzy implication type, minimum is chosen and it must be noted that new implication types such as IFESI can be found in the literature [
35]. Maximum is the aggregation type, and the centroid of area (COA), which is the weighted average of the centroids of output MFs with weighting factors
of input MFs
, and
ith rule is the defuzzification type [
32]:
After this FTC stage, the quadrotor receives the appropriate velocities for each rotor. Then, the quadrotor kinematics and dynamics provide the motion in 3D, completing the closed loop of the system presented in
Figure 2.
3. Simulation Results
To show the performance of the proposed system, the proposed fault approximators and FTC system are implemented using MATLAB Simulink, Robotics Toolbox, Machine Vision Toolbox [
5], Deep Learning Toolbox, Fuzzy Logic Toolbox, and ELM codes from [
33]. In this study, an X-4 flyer model is chosen as the quadrotor platform and the details of this platform can be found in [
36].
Table 1 provides an overview of the quadrotor model’s parameters.
It is assumed that a camera is fixed to the quadrotor’s center without being transformed, as mentioned in the assumptions in
Section 2. The camera’s resolution is 1024 × 1024 pixels, and the principal point’s coordinates are (512,512). The system’s control loop and video stream both run at a rate of 20 Hz. The four fixed collinear points of a square with a side length of 0.5 m are used to define the features as
in Cartesian coordinates. As the goal of the IBVS system,
, the center of these points should coincide with the principal point.
and
are defined as:
In Equation (9), the estimated value of the depth is needed for
and it is assumed to be 2 m. The performance of the system may be affected by this estimation, as in [
1].
As the first step of fault diagnosis, a dataset for fault approximators is created. As mentioned in [
30], regressors need a dataset that covers all the workspace for the best approximation. Therefore, a healthy IBVS system is simulated with initial linear and angular values as
m. and
rad., respectively. Here, it must be noted that IBVS drags the features in the image plane through a sliding surface mode which is reached in the sliding phase in sliding mode control. Five LOE percentages, i.e., 10%, 20%, 30%, 40%, and 50%, are defined for each rotor and the actuator faults are injected to the healthy system at six different time instants which may change the behavior of the system. Here, it must be noted that the healthy IBVS diverges after a fault bigger than 10% LOE. The fixed gain value in Equation (9),
, for the healthy system is 0.3. The dimensions of the dataset for inputs and outputs are 8 × 27,613 and 4 × 27,613, respectively. The data are divided into two parts, 85% data for training and 15% data for testing. Furthermore, the inputs and outputs are normalized for better approximation performance.
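The data preparation step can be sketched as below, using placeholder random data with the reported dimensions. The min-max scaling, the random shuffling, and the seed are assumptions, since the normalization and partitioning methods are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 27613
X = rng.normal(size=(8, n_samples))             # placeholder feature inputs
Y = rng.uniform(0.0, 0.5, size=(4, n_samples))  # placeholder LOE outputs

def minmax_normalize(A, lo, hi):
    """Scale each row to [0, 1] using the given per-row bounds."""
    return (A - lo) / (hi - lo)

# 85% of the samples for training and 15% for testing.
perm = rng.permutation(n_samples)
n_train = int(0.85 * n_samples)
train_idx, test_idx = perm[:n_train], perm[n_train:]

# Bounds are taken from the training part only and reused for the test part.
x_lo = X[:, train_idx].min(axis=1, keepdims=True)
x_hi = X[:, train_idx].max(axis=1, keepdims=True)
X_train = minmax_normalize(X[:, train_idx], x_lo, x_hi)
X_test = minmax_normalize(X[:, test_idx], x_lo, x_hi)
Y_train, Y_test = Y[:, train_idx], Y[:, test_idx]
print(X_train.shape, X_test.shape)
```

Reusing the training bounds for the test part keeps information about the test samples out of the training stage.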
The first candidate is NN with three hidden layers with 10 neurons in each layer. The activation functions of each layer, including the output layer, are logarithmic sigmoid and the learning algorithm is LM. The goal is 0 RMSE and 1000 epochs are performed. As the second candidate, the proposed system uses four different ELMs, each for a rotor fault, with sigmoid-type activation functions with 10 hidden neurons. The third candidate is linear SVM with five-fold cross validation without any dimensionality reduction methods. Here, it must be noted that other SVM types with different kernel functions such as cubic SVM are tested, but linear SVM gives the best result. The last candidate is LSTM with 200 hidden units with Adam optimizer [
37]. Again, the goal is 0 RMSE and 250 epochs are performed. The RMSE results for each approximator and each rotor are given in
Table 2.
Table 2 shows that NN is the best approximator according to the RMSE results. While other approximators, as more recent AI architectures, are expected to show better performance, NN surpasses them with nonlinear regression capability. As an example, the output of the Rotor 1 fault approximator NN with 30% LOE injected at six time instants is shown in
Figure 7 with blue as the real LOE and red as the approximation.
As the GS stage of the proposed system, a gain factor for torque and force signals in Equation (11),
is defined using an FL unit as feature error norm,
as input, and
as output. The generalized bell MFs of this FL unit were selected to provide a smooth nonlinear surface, free of discontinuities. There are three MFs for each input and output. The rule base is modified in accordance with lessons learned through experiments that
should be small, while the error norm is high to avoid discontinuities. The MFs and the surface between input and output are given in
Figure 8.
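A Mamdani-type unit of this kind can be sketched as follows, with generalized bell MFs, minimum implication, maximum aggregation, and centroid defuzzification. The MF parameters, the normalized universes, and the three rules are hypothetical placeholders chosen so that a high error norm yields a small gain; they are not the tuned values of this study.

```python
import numpy as np

def gbell(x, a, b, c):
    """Generalized bell MF: 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

# Hypothetical MF parameters (a, b, c) on normalized universes [0, 1].
IN_MFS = {"small": (0.25, 2, 0.0), "medium": (0.25, 2, 0.5), "large": (0.25, 2, 1.0)}
OUT_MFS = {"small": (0.25, 2, 0.0), "medium": (0.25, 2, 0.5), "large": (0.25, 2, 1.0)}
# Hypothetical rule base: high error norm -> small gain, and vice versa.
RULES = {"small": "large", "medium": "medium", "large": "small"}

def fuzzy_gain(error_norm, n_grid=501):
    y = np.linspace(0.0, 1.0, n_grid)            # output universe
    aggregated = np.zeros_like(y)
    for in_label, out_label in RULES.items():
        w = gbell(error_norm, *IN_MFS[in_label])                # firing strength
        clipped = np.minimum(w, gbell(y, *OUT_MFS[out_label]))  # minimum implication
        aggregated = np.maximum(aggregated, clipped)            # maximum aggregation
    return np.sum(y * aggregated) / np.sum(aggregated)          # centroid of area

for e in (0.1, 0.5, 0.9):
    print(e, round(fuzzy_gain(e), 3))
```

Because the bell-shaped MFs overlap smoothly, the resulting gain surface has no jumps, which is the property sought to avoid abrupt changes after fault diagnosis.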
As stated in the section on practical disturbances for real-time VS systems, all systems can be exposed to feature noise and poor camera calibration [
29,
38]. When the system approaches convergence, which can be defined as very small feature errors, noise in the features may result in oscillations in the motion of the quadrotor. Two cases are discussed in the following subsections.
3.1. Case 1: Fixed Target Features under Noise
To show the performance and the robustness of the system, a fault at Rotor 1 with 30% LOE at
t = 8 s and uniformly distributed random noise with a magnitude of five disturbing all feature points are considered. The results for this scenario are shown in
Figure 9.
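The feature noise of this scenario can be sketched as below, assuming that the stated magnitude bounds a symmetric uniform distribution in pixels. The feature values and the seed are illustrative.

```python
import numpy as np

def add_feature_noise(s, magnitude, rng):
    """Add uniform noise in [-magnitude, magnitude] to every feature coordinate."""
    s = np.asarray(s, dtype=float)
    return s + rng.uniform(-magnitude, magnitude, size=s.shape)

rng = np.random.default_rng(0)
# Illustrative feature vector (u1, v1, ..., u4, v4) in pixels.
s = np.array([412.0, 412.0, 612.0, 412.0, 612.0, 612.0, 412.0, 612.0])
s_noisy = add_feature_noise(s, magnitude=5.0, rng=rng)
print(np.round(s_noisy - s, 2))
```

Near convergence the feature error becomes comparable to this noise level, which explains the small oscillations expected in the motion of the quadrotor.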
In
Figure 9a, the blue circles, the red circles, and the black circles show the starting, the finishing, and the target point features, respectively. The feature trajectories, also shown in
Figure 9a, and the errors under feature noise, shown in
Figure 9b, demonstrate the proposed IBVS system’s convergence and resilience under the defined real-world disturbances. The fault approximator NN detects the fault at t = 8.02 s with an approximation of 36.28% LOE, and fuzzy GS reconfigures the system within a short time interval, but the delay of 0.02 s in the fault diagnosis causes an abrupt change in Rotor 1 speed, as shown in
Figure 9d. Furthermore, the fault causes steady-state errors in the features, as shown in
Figure 9b, but the system converges under the actuator fault and the features are kept in the FOV, which is an important practical advantage. The steady-state error is a tradeoff of the proposed FTC system, but convergence is the main goal of an FTC system. The trajectory with the blue circle as the starting location and the red circle as the finishing location in
Figure 9c and the RPY signals of the quadrotor in
Figure 9e do not contain any abrupt changes, which is desirable for a reliable flight system.
3.2. Case 2: Moving Target Features under Noise
As a challenging scenario, it is assumed that the target features are moving in the
u-v image plane with a sine shape. The starting feature points and feature noise characteristics are the same as in the first case and the goal features are defined as:
The results for this scenario are shown in
Figure 10. The feature colors in
Figure 10a are the same as in
Figure 9a. Additionally, cyan circles show the starting of goal features. The fault approximator shows its ability again and diagnoses the fault at 8.021 s with an approximation of 36.05% LOE.
It is clear that the proposed FTC system can track the moving features in
Figure 10a, but the errors increase by an admissible amount, as shown in
Figure 10b. Furthermore, the effect of rotor fault at Rotor 1 is obvious after the time instant at
t = 8 s. After this time instant, the feature trajectories are still kept in the FOV while tracking moving targets, as in the first case, which is an important advantage. The trajectory followed by the quadrotor does not contain any sudden maneuvers, which supports a reliable flight in practice. RPY motion and rotor angular speed characteristics are quite similar to the fixed-target results. This is a consequence of the very small roll and pitch maneuvers with fixed yaw while tracking the moving targets.