One of the core problems in an SAR operation with an unmanned system is navigation: driving a vehicle safely from its starting point to its target without colliding with obstacles. Navigation comprises three parts: task assignment, global path planning, and local collision avoidance.
3.3.1. Task Assignment
UAVs in SAR operations can be controlled manually by expert pilots, but coordination among pilots is difficult because of the dynamic nature of the environment during a disaster. Automatic search by multiple drones has therefore received much attention; it is a non-deterministic polynomial (NP) combinatorial optimization problem under multiple constraints. Successful and efficient allocation of the available resources is what resolves such a situation and maximizes rescue efficiency. Vehicles should be able to quickly, reliably, and efficiently answer the question: considering the resources available in the network and the tasks that should be performed, what is the best allocation of these tasks among us? The key to solving this problem is to establish a task assignment model and apply an assignment algorithm.
The proposed assignment algorithms can be divided into centralized approaches and distributed approaches. Centralized approaches, such as genetic algorithms [
90,
91] and particle swarm optimization [
92], require UAVs to constantly communicate their situational awareness to a central station, which generates plans for the entire team of agents. In addition, the paper [
93] modified the previous genetic algorithm to handle the complexity caused by heterogeneity in the UAV swarm. Max-sum is also a centralized optimization method suitable for a wide range of UAV applications, including task assignment in SAR [
94]. Max-sum enables the best performance of the system’s applications in wireless sensor networks. Its main drawback is the need to replan the entire assignment in each period to keep it optimal. It may therefore be unsuitable for highly dynamic real-time applications, and it may scale poorly to large numbers of UAVs because of the increased communication overhead. Centralized approaches share these disadvantages in general: with limited communication, they are slow to respond to dynamic changes and are susceptible to system failures. However, it should be noted that, with the development of a cloud-based UAV Internet management system [
95] and less computationally intensive algorithms [
96], UAVs may in the future communicate over the Internet with a cloud server that coordinates task allocation among them, which may resolve the dilemma of the centralized approach.
In reality, most environments are uncertain, dynamic, and only partially observable, so it is difficult to implement a centralized global optimization algorithm. Individual agents need to decide independently, and possibly myopically, what to do next based on the information they receive. Various distributed methods have been proposed. One of the most basic is the opportunistic task allocation (OTA) strategy [
97], in which an unmanned aircraft randomly selects blocks in an unexplored search area. This is one way to proceed when information is extremely scarce. There are also more sophisticated distributed methods, such as market-based methods and consensus algorithms. The market-based approach can be pictured as an auction: the auctioneer announces a task, each agent sends its bid to the auctioneer, and the highest bidder wins the task. In a multi-drone SAR operation, the bid value is calculated from the distance between the drone and the survivor so that the survivor can be rescued in the shortest possible time. However, this method requires a connected topology and a large amount of data transmission to send bids [
98,
99]. Consensus algorithms focus on solving the consistency problem of distributed systems; the most famous one is the Raft algorithm [
100], which is considered easy to understand in design and excellent in performance. It is a crucial step for distributed consensus algorithms from theoretical research to practical application. In general, centralized and distributed control should be combined: distributed control should be used for basic group behaviors such as formation flight, obstacle avoidance, and collision avoidance, whereas more advanced behaviors (e.g., information sharing, task scheduling, and distributed computing) should be controlled centrally. In recent years, some studies have also applied learning-based methods. The paper [
101] formulates the problem as a Markov decision process (MDP) and uses deep reinforcement learning (DRL) to obtain state-based decisions. Some other studies have investigated solutions to limited drone battery power, including optimizing energy efficiency [
47] and heterogeneous collaborative systems for vehicles and UAVs [
102,
103]. Since multiple agents act in a decentralized way, discouraging competitive behavior, rather than promoting cooperation, is another idea [
104]. Task assignment is a mature problem that has been studied extensively. The various task assignment approaches and their features are tabulated in
Table 7.
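The market-based scheme described above can be illustrated with a minimal sketch of a sequential single-item auction. The function name `auction_assign`, the one-task-per-drone rule, and the use of negative Euclidean distance as the bid are assumptions made for illustration; they are not taken from the cited works.

```python
import math

def auction_assign(drones, survivors):
    """Sequential single-item auction (illustrative sketch).

    drones, survivors: dicts mapping a name to an (x, y) position.
    Returns a dict {survivor_name: drone_name}.
    """
    free = dict(drones)  # drones that have not yet won a task
    assignment = {}
    for task, target in survivors.items():
        if not free:
            break  # more tasks than drones: the rest stay unassigned
        # Each free drone bids; a shorter distance means a higher bid.
        bids = {name: -math.dist(pos, target) for name, pos in free.items()}
        winner = max(bids, key=bids.get)  # the auctioneer picks the highest bid
        assignment[task] = winner
        del free[winner]
    return assignment

# Hypothetical positions, for illustration only.
drones = {"d1": (0.0, 0.0), "d2": (10.0, 0.0), "d3": (5.0, 8.0)}
survivors = {"s1": (9.0, 1.0), "s2": (1.0, 1.0)}
result = auction_assign(drones, survivors)
```

Every bid has to reach the auctioneer, which is why the text notes that this family of methods needs a connected topology and a comparatively large amount of data transmission.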
3.3.2. Path Planning
The path planning problem for a UAV may be viewed as an optimization problem [
105] in which the most common goal is to find a feasible path from the start location to the terminal position subject to various optimization parameters and constraints. SAR missions do not always take place in open wilderness and are sometimes carried out in cluttered, obstacle-rich environments such as urban areas or indoor spaces. There, a UAV must adopt a path planning algorithm that ensures the traversed path is collision-free and optimal in terms of path length. Variations arise depending on the actual situation. For example, to realize the combined use of UAVs and mobile charging stations, the vehicle routing problem (VRP) with the synchronous network (VRPSN) is defined in [
106], which is a new kind of VRP.
Many methods for UAV path planning have been proposed in recent years. The most common are sampling-based methods, such as rapidly exploring random trees (RRT) and the probabilistic roadmap, and graph-based methods, including the Voronoi graph algorithm, Dijkstra’s algorithm, the A* algorithm, and Markov decision processes [
107,
108]. It is worth mentioning that [
108] integrates target motion prediction with the tracking trajectory planning (as
Figure 6 shows), enabling more advanced path planning. Applications in this area were studied as early as 2011; the paper [
109] preliminarily addresses SAR using quadrotors. The paper [
110] applied a hill-climbing algorithm iteratively, first finding a path at each step and then optimizing the objective function to assign the search effort (flight time) to each cell in the path. The paper [
111] relaxed two assumptions and extended a decision-making-based framework for probabilistic search to merge multiple observations of grid cells and changes in UAV altitude, enabling small, light, low-speed, and agile UAVs, such as quadrotors, to perform occupancy network-based search tasks. The paper [
112] presents a path planning method in which the UAV is regarded as a Dubins vehicle. The method is based on a tangent graph in which obstacles are abstracted as circles, so that the graph consists of straight lines and arcs on the circles. The shortest path from a source position to a destination can then be found by a graph search algorithm. However, these algorithms do not consider UAV kinematic and dynamic limitations. Furthermore, they require prior knowledge of the environment map, a requirement that greatly limits their applicability.
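As an illustration of the graph-search step mentioned above, the following is a minimal sketch of A* on a 4-connected occupancy grid. The grid, the unit step cost, and the Manhattan heuristic are assumptions chosen for brevity; the grid plays the role of the prior map that such methods require.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on a grid; grid[r][c] == 1 marks an obstacle."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {start: None}
    cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if g > cost[cell]:
            continue  # stale heap entry, a cheaper route was found later
        if cell == goal:
            path = []
            while cell is not None:  # walk the parent links back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g + 1
                if new_g < cost.get(nxt, float("inf")):
                    cost[nxt] = new_g
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None  # the goal is unreachable

# Hypothetical map: a wall with a single gap in column 2.
grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
```

In this example the only route passes through the gap, so the returned path visits seven cells. The sketch ignores vehicle kinematics, which is exactly the limitation noted in the text.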
Another kind of optimal path-planning approach is the potential field (PF) method, which was proposed by Khatib in 1986 [
113]. In PF, the target
and obstacles have attractive and repulsive potentials, respectively. The two potentials together form the potential field of the UAV, and the resultant force of this field on the UAV determines its direction of motion. Algorithms such as the artificial potential field and the interfered fluid dynamical system [
114] were subsequently proposed to realize global offline path planning. However, because the potential field method drives the vehicle toward a minimum of the field, it often becomes trapped in a local minimum. When the target and an obstacle are close to each other, a feasible route cannot be found.
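The local-minimum problem can be seen in a small numerical sketch. The code assumes the commonly used quadratic attractive potential and an inverse-distance repulsive potential with influence range `d0`; the gains `k_att` and `k_rep`, the function name `pf_force`, and the scenario are illustrative choices, not values from the cited works.

```python
import math

def pf_force(pos, goal, obstacles, k_att=1.0, k_rep=1.0, d0=2.0):
    """Resultant 2D force: attractive pull to the goal plus repulsive pushes."""
    # Attractive force, the negative gradient of 0.5 * k_att * |pos - goal|^2.
    fx = -k_att * (pos[0] - goal[0])
    fy = -k_att * (pos[1] - goal[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d <= d0:  # obstacles only act inside their influence range
            # Negative gradient of 0.5 * k_rep * (1/d - 1/d0)^2.
            mag = k_rep * (1.0 / d - 1.0 / d0) / d**2
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy

# Hypothetical scene: the obstacle lies on the straight line to the goal.
goal, obstacles = (4.0, 0.0), [(2.0, 0.0)]
far = pf_force((0.0, 0.0), goal, obstacles)   # still pulled toward the goal
near = pf_force((1.5, 0.0), goal, obstacles)  # pushed back by the obstacle
```

Along this line the x-component of the force changes sign between the two sample points, so somewhere in between the resultant force is zero and a vehicle that simply follows the force stalls in front of the obstacle instead of reaching the goal.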
Biologically inspired path planning algorithms, which are mainly based on machine learning, have made great progress in recent years with the support of swarm intelligence techniques. Many algorithms have been proposed; a few are described here. A genetic algorithm (GA) can be used to solve constrained and unconstrained optimization problems, but it cannot guarantee an optimal path. Local minima can occur in narrow environments, so reduced safety and narrow-corridor problems need to be avoided [
115,
116]. Particle swarm optimization (PSO) is a classical population-based meta-heuristic for multi-objective path planning problems. The paper [
117] used simulation to compare PSO with other optimization algorithms, including layered search and rescue, spiral search, and fish-inspired allocation. The authors further proposed a PSO-based algorithm that reduces collisions without affecting the number of iterations to convergence. The paper [
118] proposes an optimal trajectory determination method for multi-robot paths in cluttered environments based on an improved particle swarm optimization algorithm (IPSO) and an improved gravitational search algorithm (IGSA) to minimize the maximum path length required for all robots in the environment to reach their respective destinations. Ant colony optimization (ACO) is a probabilistic technique for solving computational problems that can be reduced to finding good paths through graphs. In [
119], the authors propose an improved ACO to solve various vehicle routing problems (VRPs), which is used for unmanned aerial vehicle (UAV) task allocation and route planning. Finally, a comparison of the path planning algorithms mentioned above is provided in
Table 8 for reference.
Unknown Environments Exploration
UAVs often need to conduct SAR operations in unknown environments, for example, a GPS-denied environment or a place for which no pre-stored map is available. UAVs are then assigned the task of exploring unknown space. Agents need to formulate an effective exploration strategy together with motion planning to decide how to move in an unknown environment so as to minimize exploration time and cost. This usually requires UAVs to perceive the surrounding environment and adjust their strategies in real time.
Plenty of papers focus on developing effective exploration strategies, one of which is the random exploration strategy. With this method, the robot obtains more information by randomly choosing the direction and speed at which it moves in an unknown environment. The rapidly exploring random trees (RRT) algorithm introduced in the paper [
120] is a randomized algorithm that builds a tree-like structure by repeatedly adding new nodes to the existing tree, with the goal of efficiently exploring the search space. RRT has been widely adopted in the robotics community. To make RRT practical, sensors are integrated for the robot’s perception of the environment. This strategy is called the Sensor-based Random Tree (SRT) [
121,
122], a variant of the RRT algorithm. In the SRT algorithm, the robot uses its sensors to check if the proposed connection would collide with any obstacles in the environment before adding the new point to the tree. The paper [
123] proposed an improved implementation of the RRT algorithm. Instead of moving agents using RRT, it uses multiple independent RRT trees to quickly and efficiently search for frontier points and discover unknown areas.
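The tree-growing loop shared by RRT and its variants can be sketched as follows. This is a minimal 2D illustration under stated assumptions: obstacles are known circles (standing in for the sensor-based collision check of SRT), and the step size, goal bias, bounds, and random seed are arbitrary illustrative values.

```python
import math
import random

def segment_hits_circle(p, q, center, radius):
    """True if the segment p-q passes within `radius` of `center`."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        t = 0.0
    else:
        t = ((center[0] - p[0]) * dx + (center[1] - p[1]) * dy) / seg_len2
        t = max(0.0, min(1.0, t))  # clamp to the segment
    nearest = (p[0] + t * dx, p[1] + t * dy)
    return math.dist(nearest, center) <= radius

def rrt(start, goal, obstacles, bounds=(0.0, 10.0), step=0.5,
        goal_tol=0.5, goal_bias=0.1, max_iter=5000, seed=0):
    """Grow a tree from start; return a node path to the goal region or None."""
    rng = random.Random(seed)
    parent = {start: None}
    for _ in range(max_iter):
        # Sample a random point, occasionally the goal itself.
        if rng.random() < goal_bias:
            sample = goal
        else:
            sample = (rng.uniform(*bounds), rng.uniform(*bounds))
        near = min(parent, key=lambda n: math.dist(n, sample))
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        scale = min(step, d) / d  # move at most `step` toward the sample
        new = (near[0] + scale * (sample[0] - near[0]),
               near[1] + scale * (sample[1] - near[1]))
        if new in parent:
            continue
        if any(segment_hits_circle(near, new, c, r) for c, r in obstacles):
            continue  # reject edges that would collide
        parent[new] = near
        if math.dist(new, goal) <= goal_tol:
            path = [new]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
    return None

# Hypothetical scene: one circular obstacle between start and goal.
obstacles = [((5.0, 5.0), 1.5)]
path = rrt((1.0, 1.0), (9.0, 9.0), obstacles)
```

The returned path is feasible but generally not the shortest one, and the linear nearest-node search is kept only for readability.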
Similar to the RRT algorithm, Monte Carlo tree search (MCTS) has also been proposed for area exploration and path planning. It was first proposed as a framework for game AI in the paper [
124], which described the MCTS procedure in detail. It has since been applied to path planning in several games, including Ms. Pac-Man [
125,
126] and Go, a two-player turn-based strategy board game (in which multi-agent Monte Carlo is considered) [
127]. Simulation in games has shown positive results.
MCTS has also been applied to robot operations, including the coordination of UAVs in disaster response and casualty discovery [
128], and search and rescue planning [
129]. The Monte Carlo method can estimate the obstacle distribution of an unknown environment through repeated random sampling and then generate a probability map, on which UAV agents can base their path planning. A comprehensive review of the Monte Carlo method is presented in [
130], which describes its development and variants in detail.
However, the RRT and Monte Carlo methods are both randomness-based exploration strategies, which are very inefficient for exploring large areas or complex environments. This is because agents may repeatedly explore known areas, an inherent defect of this strategy [
131].
To avoid collisions and repeated exploration, model-based exploration strategies have been proposed, in which robot agents collect and analyze environmental information. Such a strategy requires the robotic system to model the environment, for instance by using LiDAR or cameras to construct 3D maps. The robot then uses the map information to plan a trajectory and explore effectively.
In this strategy, the state-of-the-art method is simultaneous localization and mapping (SLAM), which was proposed in [
132] in 1991. Robots or mobile devices equipped with sensors such as cameras and LiDARs, supplemented by inertial measurement units (IMUs), capture information about the environment and their own movements. Analysis of this information enables the robot to estimate its position and the shape of the environment map simultaneously [
133]. The paper [
134] briefly introduced the development of the SLAM technique. Robots such as drones are equipped with LiDAR sensors and other sensors (such as IMUs) for real-time acquisition of environmental information. By filtering, segmenting, and registering the point cloud data obtained by LiDAR, the computer can extract feature information in the environment, such as walls and corridors. After analyzing the sensor data, the UAV can realize real-time positioning (that is, obtain the position and attitude of the UAV). At the same time, by combining the extracted feature information with the position and attitude information of the drone, a map of the indoor environment can be constructed (for example, LiDAR sensors can generate laser point cloud data for building a three-dimensional map of the environment). After that, the UAV makes navigation decisions based on the constructed map and real-time positioning information, such as path planning, obstacle avoidance, etc., so as to realize the function of environmental exploration.
The paper [
135] improved the basic SLAM technique and introduced the FastSLAM algorithm, which was further enhanced into FastSLAM 2.0 [
136] in 2003. FastSLAM is a classical particle filter-based SLAM algorithm that uses two parallel filters: a particle filter for robot localization and an extended Kalman filter (EKF)-based filter for map building. The FastSLAM 2.0 algorithm uses the Rao–Blackwellized particle filter (RBPF) to simultaneously handle robot position and map building. This improvement can significantly reduce the number of particles required to achieve accurate SLAM results. The paper [
137] introduced Unscented FastSLAM (UFastSLAM), based on the unscented particle filter, which uses an unscented Kalman filter (UKF) to further reduce the number of particles; however, UFastSLAM is restricted to nonlinear measurement models. The Differential Evolution technique is proposed in [
138] to handle non-linear optimization problems and further enhance SLAM performance.
Nowadays, SLAM enhances UAVs’ autonomous control and environment perception, resulting in increased efficiency, reliability, and safety. A typical SLAM-based exploration block diagram is introduced in
Figure 7 with the basic SLAM function (localization and mapping), a planning layer, and a communications layer for UAV teams connected through the network. The SLAM algorithm proposed in the paper [
139] uses LiDAR and a MEMS IMU (micro-electro-mechanical system inertial measurement unit) with a fixed Kalman filter for state estimation, resulting in improved feature extraction accuracy and reduced computation in the filtering algorithm. The work in paper [
140] further demonstrates an enhanced ability to navigate unexplored floors through LiDAR grid construction of orthogonal walls, filtering out static furniture and dynamic human bodies, and utilizing the Linear Quadratic Estimation (LQE) method to assist in calculating the displacement and orientation of the robot. The paper [
141] suggests the utilization of RGB-D cameras to obtain dense color and depth images [
142] for an onboard UAV SLAM approach. The paper [
143] uses visual SLAM to propose an exploration strategy for distributed multi-UAV systems. Under limited connectivity between agents, the results show a clear reduction in exploration time and traveled distance for both two and three UAVs.
Furthermore, the SLAM-based exploration performance of UAV agents is enhanced by the integration of machine learning [
144,
145]. Ref. [
145] utilizes population coding algorithms and self-learning features to construct a cooperative multi-coupling system for collaborative decision-making in UAV-based SLAM operations. By adopting consensus architecture and neural network training, the drones in the system can reach a consensus and work together to help the system adapt to changing environments and achieve self-adaptation and optimization. The research in [
144] realizes the ability to make indoor environmental maps in real-time by combining SLAM and the Single Image Depth Estimation (SIDE) algorithm based on Convolutional Neural Networks (CNN). A learning-based exploration solution is proposed in work [
146], which aims at using an end-to-end learning method to obtain the geometric information of the environment directly from RGB images without relying on specialized sensors, resulting in greater flexibility and adaptability. However, the paper [
147] stated that this work shows little difference in navigation performance from untrained traditional methods. It instead proposed Active Neural SLAM, which achieves fast exploration of unknown areas by modularizing the task and training each module independently while combining traditional analytical path planners with learning-based SLAM modules. This indicates that the combination of the model-based exploration strategy and machine learning has promising prospects for further development.
In conclusion, UAV SLAM technology enables real-time environmental perception and mapping, allowing deeper detection and recognition of the environment, such as terrain, obstacles, buildings, and roads. Going further, collaborative simultaneous localization and mapping (C-SLAM) achieves a global perspective by integrating the local perspectives of multiple UAVs [
148], thereby enabling more accurate topological localization and map construction. However, the accuracy of map construction and positioning is affected by the number of drones, limitations in network bandwidth, delay, and the different frames of reference used by the UAVs [
149,
150].
To overcome the technical limitations of C-SLAM, multi-sensor fusion technology should be further researched and developed to better combine data from different sensors and improve the robustness and accuracy of localization and map construction. At the same time, for communication and data transmission in C-SLAM, the UAV SLAM system should adopt higher-speed data transmission technology and optimize network architecture and protocols to reduce delay and ease bandwidth limitations. Moreover, the continued development of SLAM technology and algorithms is improving the reliability of the model-based exploration strategy, along with the performance of UAV environmental detection. To further broaden the applicability of UAV SLAM to different scenarios, automatic parameter tuning, including feature-matching thresholds and RANSAC parameters, should be achieved [
151]. This points to the integration of AI (or learning features) with the SLAM system in future studies. For example, machine learning algorithms can leverage large amounts of real data to learn the relationship between the environment and parameters, enabling automatic adjustment of SLAM parameters based on real-time data and environmental changes for improved performance and accuracy.
3.3.3. Collision Avoidance
As the above shows, path planning generates a set of path points that bypass obstacles from the initial position to the final goal, whereas collision avoidance takes a given waypoint as a local goal and avoids obstacles on the way to it. The rest of this section surveys these works and presents the development history and the latest research progress of collision avoidance algorithms, especially smart collision avoidance in dense and narrow spaces.
Early collision avoidance algorithms mainly targeted static obstacles. Since path planning also needs to be considered in obstacle avoidance, the predominant idea for obstacle avoidance in the 1970s and 1980s was to construct a configuration space, and many improved path-planning algorithms were proposed. However, none of these classical algorithms can minimize the input energy and achieve optimal results while avoiding obstacles. In the mid-1980s, path-planning algorithms that consider uncertain and dynamic environments were proposed, such as potential functions [
113], control theory, and other heuristic algorithms. These algorithms address the shortcomings of the classical algorithms but still face challenges when dealing with complex moving obstacles. In the 1990s, many local motion planning algorithms were proposed to improve efficiency, such as the dynamic window approach, inevitable collision states, and velocity obstacles. Such algorithms give up the globally optimal solution to improve efficiency and can process inputs in real time, but because they do not optimize the trajectories with respect to time or energy, UAVs can fall into a deadlock when facing dynamic obstacles.
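Of the local methods listed above, the velocity obstacle idea is the easiest to show in a few lines. The sketch below assumes disc-shaped vehicles, constant velocities, and an unlimited time horizon; the function name `in_velocity_obstacle` and the numbers are illustrative.

```python
import math

def in_velocity_obstacle(p_uav, v_uav, p_obs, v_obs, combined_radius):
    """True if keeping v_uav eventually brings the UAV within combined_radius
    of an obstacle that keeps moving at v_obs."""
    # Work in the obstacle's frame: relative position and relative velocity.
    px, py = p_obs[0] - p_uav[0], p_obs[1] - p_uav[1]
    vx, vy = v_uav[0] - v_obs[0], v_uav[1] - v_obs[1]
    vv = vx * vx + vy * vy
    # Time of closest approach, restricted to the future (t >= 0).
    t = 0.0 if vv == 0.0 else max(0.0, (px * vx + py * vy) / vv)
    miss_distance = math.hypot(px - vx * t, py - vy * t)
    return miss_distance < combined_radius

# Hypothetical head-on encounter along the x-axis.
head_on = in_velocity_obstacle((0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (-1.0, 0.0), 1.0)
sidestep = in_velocity_obstacle((0.0, 0.0), (1.0, 1.0), (10.0, 0.0), (-1.0, 0.0), 1.0)
```

A local planner would discard candidate velocities for which this test is true and pick among the rest. Because the choice is made greedily at each step, nothing prevents the deadlocks mentioned in the text.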
Since the 2000s, with the development of new technology and improved hardware computing power, more and more obstacle avoidance algorithms have been proposed, making UAVs more agile and robust. The paper [
152] describes the safety evaluation process that the international community has deemed necessary to certify such systems for UAVs. The paper [
153] proposes an adaptive tracking controller based on output feedback linearization that compensates for dynamic changes in the quadrotor’s center of gravity. The paper [
154] combines the improved Lyapunov Guidance Vector Field (LGVF), the Interfered Fluid Dynamical System (IFDS), and the strategy of varying receding-horizon optimization from Model Predictive Control (MPC) to track the target and avoid obstacles in a complex dynamic environment.
Figure 8 demonstrates its local obstacle avoidance strategy. In [
155], Voliro, a new extended multi-rotor, is proposed: a new type of aerial platform that can fly in any direction while maintaining any orientation, which significantly improves the agility of the UAV and, if reduced in size, may be used for indoor SAR operations. The work in the paper [
156] even accomplishes a rapid 180-degree course reversal for UAVs with minimal computational effort, using a simple feedforward/feedback controller, which was successfully implemented on small fixed-wing UAVs. A new method for autonomous navigation of small UAVs in artificial forests using only a single camera was proposed, in which a Faster region-based convolutional neural network (Faster R-CNN) detects tree trunks [
157]. The paper [
158] proposes a hybrid approach incorporating first principles and learning to model the quadrotor and its aerodynamic effects with unprecedented accuracy, enabling flight close to the physical limits of the platform. Further, the paper [
159] presents an online planning method following the MPC framework to jointly optimize the motion of the UAV and the configurations of the reconfigurable intelligent surfaces (RISs) with energy efficiency taken into account.
The application of non-linear MPC to collision avoidance among multiple UAV agents is proposed in [
160], where cost penalties are imposed below a critical distance to keep operations collision-free. The paper [
161] also applied non-linear MPC collision avoidance to object transportation, where the agents are required to transport an object collaboratively. The simulation results demonstrate the validity and convergence of the method. The various collision avoidance approaches and their features are tabulated in
Table 9.
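The role of a critical-distance penalty in an MPC-style cost can be illustrated with a small sketch. The quadratic form of the penalty, the weights, and the function names are assumptions for illustration; the cited works solve a full non-linear optimization, whereas this example only scores two hand-written candidate trajectories.

```python
import math

def collision_penalty(traj, other, d_crit=1.0, weight=100.0):
    """Soft penalty that is active only when two agents are closer than d_crit."""
    cost = 0.0
    for p, q in zip(traj, other):
        gap = d_crit - math.dist(p, q)
        if gap > 0.0:
            cost += weight * gap**2
    return cost

def trajectory_cost(traj, reference, other, d_crit=1.0, weight=100.0):
    """Tracking error to the reference plus the collision penalty."""
    tracking = sum(math.dist(p, r) ** 2 for p, r in zip(traj, reference))
    return tracking + collision_penalty(traj, other, d_crit, weight)

# Hypothetical head-on encounter over a three-step horizon.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
other = [(2.0, 0.0), (1.0, 0.0), (0.0, 0.0)]   # predicted path of a second UAV
candidates = {
    "straight": reference,
    "detour": [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)],
}
costs = {name: trajectory_cost(t, reference, other) for name, t in candidates.items()}
best = min(costs, key=costs.get)
```

The straight candidate tracks the reference perfectly but meets the other UAV at the middle step, so its penalty dominates and the detour is preferred.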
Agile Movement in Tight Spaces
Agile movement of drones in confined spaces also offers many benefits for UAV search and rescue operations. UAVs can reach the intended destination quickly, and their agility and hovering capabilities let them respond promptly to search and rescue demands, speeding up the overall effort. Furthermore, UAVs can easily enter narrow spaces or dangerous terrain that is difficult for rescue personnel to access, such as confined spaces, buildings, ravines, and woodlands, to carry out search and rescue tasks.
In UAV flight control, model-based control involves creating a mathematical model that represents the UAV’s motion and dynamic characteristics. This allows precise control of the UAV using techniques such as model predictive control and optimal control. Various model-based flight control methods exist, one of which is nonlinear dynamic inversion (NDI) [
162]. NDI linearizes the dynamics of an aircraft using an aerodynamic model, which yields a linear system that is fundamentally identical for all aircraft, provided that the aerodynamic model is correct. Based on NDI, the paper [
163] eliminates the sensitivity to model mismatch and reduces the cost of flight control system design by feeding back angular acceleration. Adaptive control [
164,
165] is another model-based flight control method. The adaptive parameters are updated in real-time to maintain flight stability by adapting to the environment. The paper [
166] further applies the Cerebellar Model Arithmetic Computer (CMAC) to update the adaptive parameters so as to adapt to a varying payload and unknown disturbances simultaneously. Furthermore, the MPC mentioned earlier also belongs to the model-based flight control methods.
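The principle behind dynamic inversion can be shown on a scalar toy system. The sketch assumes dynamics of the form x' = f(x) + g(x)u with a made-up drag-like term `f` and a constant control effectiveness `g`; the gain `k`, the time step, and the simulated plant are illustrative and not taken from the cited works.

```python
def f(x):
    """Assumed nonlinear drift term (drag-like), for illustration only."""
    return -0.5 * x * abs(x)

def g(x):
    """Assumed control effectiveness; it must be nonzero to be inverted."""
    return 2.0

def ndi_control(x, x_ref, k=2.0):
    """Invert the model so that the closed loop obeys x' = -k * (x - x_ref)."""
    v = -k * (x - x_ref)       # desired (virtual) rate: linear error dynamics
    return (v - f(x)) / g(x)   # cancel the nonlinearity with the model

def simulate(x0, x_ref, dt=0.01, steps=500):
    """Forward-Euler simulation; the plant is assumed identical to the model."""
    x = x0
    for _ in range(steps):
        u = ndi_control(x, x_ref)
        x += dt * (f(x) + g(x) * u)
    return x

x_final = simulate(x0=3.0, x_ref=2.0)
```

With a perfect model the nonlinear term cancels and the error decays like a first-order linear system. If the plant's drift term differed from `f`, the cancellation would be incomplete, which is the model-mismatch sensitivity that feeding back angular acceleration is meant to reduce.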
Practical issues exist in the application of model-based control. Firstly, the motion and dynamic characteristics of UAVs are complex, making it difficult to establish accurate mathematical models [
167]. Secondly, there are many uncertainties and interferences in practical application scenarios, making it challenging to achieve precise model predictive control and optimal control [
168]. Instability is also introduced when terms that are effective in practice are deleted from the models to simplify calculation [
14].
Therefore, combining deep learning with control theory can effectively address these issues and improve the robustness and adaptability of UAV control, which has better prospects for practical applications. An artificial neural network (ANN) can be combined with a model-based flight controller to learn complex control systems. This integration enables real-time adaptation and learning of the control system with relatively modest requirements on hardware and processing. The detailed advantages and limitations are further illustrated in [
168] and are listed in
Table 10.
The papers [
169,
170] proposed hybrid supervised neural network models for dynamic systems, using flexible modules in non-recurrent and recurrent networks. The models are evaluated on an autonomous helicopter/UAV system by comparing radial basis function and multilayer perceptron networks against the real system. The results confirmed their feasibility and potential for further investigation. The paper [
171] investigates the use of ANN-based dynamic models for control synthesis and demonstrates that even a simple ANN architecture can accurately generalize dynamics beyond training data when coupled with LQR and PD control for trajectory tracking, making it suitable for control purposes. The paper [
172] develops a ReLU model that combines a quadratic lag model with a double-layered simple ReLU network. Following training via least-squares regression and stochastic gradient descent, the model exhibits an overall improvement of 58% in acceleration prediction performance.
The paper [
173] proposes a control system design that uses feedback linearization together with a neural network that compensates for model inversion error. Pseudo Control Hedging protects the adaptive element from non-linearities, and the resulting system accepts position, velocity, attitude, and angular rate commands. The outer loop tracks position, velocity, attitude, and angular rate and provides an attitude correction. The paper [
174] introduces a hybrid adaptive control method to recover the stability of a damaged aircraft under single damage, in which the direct adaptive ANN has less dynamic inversion error to compensate; the simulation results show that the approach is effective. The paper [
175] uses a neural network-based adaptive sliding mode controller to control the altitude of a quadrotor. The paper [
176] integrates the ANN into backstepping flight control of a helicopter, and the result presents strong resistance to sudden mass-changing perturbations.
The flight control method combined with the neural network can learn the nonlinear dynamic characteristics of an aircraft, making it highly adaptable and robust, especially for uncertain or unknown nonlinear system characteristics. However, this method requires a considerable amount of data for training, resulting in a longer training time.
Despite the significant contributions of academic research towards the development of technology, the majority of it remains in a simulation stage within the laboratory. This is due to the presence of various assumptions that may not hold in real-world applications or may pose implementation challenges in hardware. Therefore, there is still a considerable distance to cover in the advancement of technology.