This section provides a comprehensive examination of recent research findings. It is divided into two parts: we first review the relevant papers and then discuss them. The discussion covers the navigation methods presented and reviews the datasets used for the autonomous navigation of 5G drones.
2.1. Review of Papers
Previous studies have introduced potential solutions employing machine learning, genetic algorithms, and traditional search methods. The resulting navigation strategies can be categorized into three distinct types: mapless, online map, and offline map approaches. An offline map indicates that the drone has complete information about the environment, including the start and end points and obstacle locations, prior to autonomous navigation [
27]. Online map building, also known as simultaneous localization and mapping (SLAM), is the process of creating a map during navigation without any prior information. Mapless strategies do not need any prior knowledge of the environment; instead, they rely on the ability to observe and navigate the surroundings without a map [
27]. Numerous studies have been conducted on the application of RL in this field. In [
28], the authors introduced a reinforcement learning framework that enabled a drone to accomplish multiple subgoals in the path-planning context. Experiments took place in a two-dimensional grid setting without using any sensors, and the drone’s objective was simply to reach the target point. Lee et al. [
29] proposed enhancing the path optimization of surveillance drones during flight using a reinforcement learning technique, without specifying an obstacle avoidance mechanism. The proposed method was tested only in a two-dimensional grid environment. Cui et al. [
30] developed a dual-layered Q-learning methodology to address the issue of planning paths for drones. They also employed the B-spline technique for path smoothing, which improved the planned path’s performance (a minimal sketch of this smoothing step is given after this paragraph). They used a 30 × 30 cell grid arrangement in MATLAB to validate the efficacy of the suggested methodology. Arshad et al. [
25] used the Udacity and collision sequence datasets to train CNNs, which provided two outputs: the forward velocity and steering angle of UAVs. Such UAVs utilize a front camera, and the proposed technique teaches them to fly by mimicking the movements of automobiles and bikers. The performance was measured using the collision probability and steering angle, with an accuracy of 96.26%. Darwish and Arie [
31] used a hybrid approach that combines a CNN and an LSTM to track and move toward a dynamic target and to predict Q and V values. They used a sequence of four grayscale depth and RGB images for tracking and navigation. They utilized AirSim and Unreal Engine for the simulation and OpenAI Gym for training and control. They encountered challenges during this simulation because the background could conceal the small target drone. The proposal by Artizzu et al. [
32] involves using an omnidirectional camera to perceive the drone’s entire surroundings and an actor–critic network to control the drone, trained on three image types: depth, RGB, and segmentation. The experiments demonstrated the method’s ability to navigate in two different virtual forest environments.
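To make the B-spline smoothing step reported in [30] concrete, the following minimal Python sketch smooths a hypothetical grid-planner path with SciPy. The waypoints, smoothing factor, and sampling density are illustrative assumptions rather than values taken from the original paper.

```python
# Minimal sketch: cubic B-spline smoothing of a grid-planner path (SciPy).
# The waypoints below stand in for a path found by Q-learning on a 30 x 30 grid.
import numpy as np
from scipy.interpolate import splprep, splev

waypoints = np.array([[0, 0], [3, 1], [5, 4], [8, 5], [12, 9], [15, 10]], dtype=float)

# Fit a cubic (k=3) parametric B-spline; larger s trades fidelity for smoothness.
tck, _ = splprep([waypoints[:, 0], waypoints[:, 1]], k=3, s=2.0)

# Sample the smooth path densely for the flight controller to track.
u = np.linspace(0.0, 1.0, 100)
x_s, y_s = splev(u, tck)
smooth_path = np.column_stack([x_s, y_s])
print(smooth_path.shape)  # (100, 2)
```

The smoothed curve removes the sharp grid-aligned turns of the raw path, which is what makes it easier for a quadrotor controller to follow.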
Recently, Yu et al. [
16] presented a hybrid approach that combines a VAE with an LSTM to apply their strategy at a variable speed rather than at a fixed speed. This hybrid approach uses depth images captured from a stereo camera. Their approach is based on a latent representation of depth images, similar to the work of Kulkarni et al. [
33], who proposed using depth cameras to navigate cluttered environments and variational autoencoders for collision prediction. This method provides collision scores, which enable safe autonomous flight. They emphasized the method’s advantage in handling real sensor-data errors, as opposed to methods that rely solely on simulation training. In addition, Kulkarni et al. [
1] also relied on variational autoencoders, combining a VAE with deep reinforcement learning for aerial robots to generate velocity and yaw-rate commands by converting a depth image into a collision image. Zhang et al. [
15] proposed combining a CNN with an autoencoder; the input of this network is depth images from an RGB camera. The network outputs steering commands in the form of angular velocities to avoid obstacles. They mentioned that additional sensors could be integrated to enhance the efficiency of their method. Similarly to Yu et al. [
16], González et al. [
34] proposed a variable speed instead of a fixed speed. They presented a differential evolution algorithm for path planning that covers a specific area of an environment. This algorithm yields the steering angle of the path that incurs the lowest distance cost. The drone’s velocity increases when it is far from obstacles and decreases when it is near one.
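As an illustration of the variable-speed idea shared by [16,34], the sketch below maps the distance to the nearest detected obstacle to a commanded speed. The function `commanded_speed`, the linear interpolation, and every threshold value are assumptions chosen for illustration, not the rules used in those papers.

```python
# Minimal sketch of a distance-dependent speed rule: fly faster far from
# obstacles and slow down near them. All thresholds are illustrative.
def commanded_speed(nearest_obstacle_dist: float,
                    v_min: float = 0.5,   # m/s, assumed minimum speed near obstacles
                    v_max: float = 5.0,   # m/s, assumed maximum cruise speed
                    d_safe: float = 1.0,  # m, below this distance fly at v_min
                    d_free: float = 8.0   # m, above this distance fly at v_max
                    ) -> float:
    if nearest_obstacle_dist <= d_safe:
        return v_min
    if nearest_obstacle_dist >= d_free:
        return v_max
    # Linear interpolation between the cautious and free-flight regimes
    ratio = (nearest_obstacle_dist - d_safe) / (d_free - d_safe)
    return v_min + ratio * (v_max - v_min)

print(commanded_speed(0.7), commanded_speed(4.0), commanded_speed(12.0))
```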
Drones in swarms have become a topic of interest to researchers, including Wang et al. [
18], who proposed implementing reinforcement learning to prevent collisions between multiple agents in a decentralized environment where communication is absent. Each drone’s observation includes the positions and velocities of its neighbors, and steering commands are the only actions. However, the experiment did not specify the exact sensors used. Mikkelsen et al. [
35] introduced a decentralized method, positing that robots communicate wirelessly to share data within a certain distance. This study aimed to develop a distributed algorithm that determines the velocities of individual robots within a swarm while maintaining formation, communication, and collision avoidance. The experiment was conducted on a two-dimensional grid. Ma et al. [
14] presented a framework for controlling the formation of a UAV swarm using vision-based techniques. The authors proposed a hierarchical swarm with a centralized framework. The algorithm architecture consists of four main modules: detection, tracking, localization, and formation control. Deep learning enables the system to determine its position without relying on the Global Navigation Satellite System (GNSS). The lead UAV takes an image as input and outputs a velocity command, and a broadcaster distributes the locations to the remaining UAVs. Schilling et al. [
36] aimed to establish a safe path for drones during their Internet of Things (IoT) operations. They proposed that vision-based drone swarms rely on Voronoi diagrams (VDs), where each drone detects nearby drones. In the diagram, every point represents a drone treated as a dynamic obstacle, and the plane is partitioned into constrained cells that form the VD. Because all edges of a VD-derived path remain far from obstacles, the resulting paths are safe and reliable. Each drone is equipped with an omnidirectional camera.
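The following sketch illustrates the Voronoi-diagram idea behind [36]: each detected drone is a point treated as a dynamic obstacle, and the ego drone plans within its own Voronoi cell, whose edges stay equidistant from the neighbors. The positions are illustrative, and the sketch only extracts the cell; the full method in [36] additionally handles detection and control.

```python
# Minimal sketch: extract the ego drone's Voronoi cell from detected neighbors.
import numpy as np
from scipy.spatial import Voronoi

# Hypothetical 2D positions (meters): the ego drone first, then its neighbors.
positions = np.array([[0.0, 0.0],
                      [4.0, 1.0],
                      [1.0, 5.0],
                      [-3.0, 2.0],
                      [-1.0, -4.0]])

vor = Voronoi(positions)

# Vertices of the ego drone's cell; index -1 marks an unbounded edge.
ego_region = vor.regions[vor.point_region[0]]
cell_vertices = [vor.vertices[i] for i in ego_region if i != -1]

# Paths kept well inside this cell remain far from all neighbors, which is
# what makes VD-derived paths safe.
print(cell_vertices)
```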
Several methods are based on point cloud construction to represent obstacles in the field of view. Chen et al. [
37] proposed using a quadrotor with a depth camera to capture a point cloud of obstacles within its field of view. They then used this point cloud to construct the map. They pointed out that the quadrotor’s limited field of view can make it challenging to achieve completely safe flights in dynamic environments, potentially leaving some dynamic obstacles undetected. Xu et al. [
38] proposed using an RGB camera to generate depth images, on which a model is trained to detect obstacles and convert them into point clouds. The point clouds are then classified as dynamic or static and tracked. The authors asserted that the main limitation of their algorithm’s performance is the sensor’s field of view, and that future enhancements could be achieved by using sensor fusion in a multiple-camera system.
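As a concrete illustration of the point-cloud construction that methods such as [37,38] start from, the sketch below back-projects a depth image into a 3D point cloud with a standard pinhole camera model. The camera intrinsics and the synthetic depth image are assumptions for illustration; the original works obtain depth from their own sensors and networks.

```python
# Minimal sketch: back-project a depth image into a point cloud (camera frame).
import numpy as np

fx, fy = 320.0, 320.0      # assumed focal lengths (pixels)
cx, cy = 320.0, 240.0      # assumed principal point (pixels)

depth = np.full((480, 640), 5.0, dtype=np.float32)  # synthetic 5 m flat depth image

v, u = np.indices(depth.shape)   # pixel row (v) and column (u) coordinates
z = depth
x = (u - cx) * z / fx
y = (v - cy) * z / fy
points = np.stack([x, y, z], axis=-1).reshape(-1, 3)  # N x 3 point cloud

# Drop invalid returns (zero depth) before mapping or obstacle classification.
points = points[points[:, 2] > 0]
print(points.shape)
```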
Several previous studies have fused multiple sensors. Yue et al. [
39] proposed using a camera and a laser to sense the target distance while deep reinforcement learning autonomously navigates a drone in an unknown environment. These sensors only make the drone aware of the environment in front of it. The authors noted that, due to its limited field of view, a UAV cannot perceive the direction of motion of dynamic obstacles throughout the global range. Doukhi et al. [
40] utilized the deep Q-network algorithm to learn a collision-free policy that selects the best of three movement commands: right, left, and forward (a minimal sketch of this discrete-action setup is given after this paragraph). They used LiDAR data and a depth camera as input for the algorithm. However, this method does not address navigation problems in three-dimensional space. Jevtic et al. [
41] presented a reinforcement learning approach for flying mobile robots to avoid collisions. LiDAR data were used to identify obstacles in the mobile robot’s surroundings. They defined three actions: forward movement, diagonal left movement, and diagonal right movement. Experiments were conducted in a two-dimensional environment using MATLAB Simulink R2022a. Xie et al. [
12] introduced a deep reinforcement learning algorithm that relies on a camera and a distance sensor to capture environmental information from in front of a drone and plan its path in a complex and dynamic three-dimensional environment. CNNs derive features from the images, while recurrent neural networks (RNNs) preserve the trajectory history. They performed 3D simulations on the virtual robot experimentation platform (V-REP); the experiments did not include dynamic objects. Chronis et al. [
13] used a reinforcement learning approach to enable autonomous drone navigation using four distance sensors: one on the drone’s belly and three on its front. These sensors only provide the drone with awareness of the environment in front of it. Based on this awareness, the drone moves by rotating about its Z-axis in both clockwise and counterclockwise directions, adjusting its speed and altitude accordingly. Experiments were conducted in static environments using Microsoft’s AirSim. They mentioned that cameras would enable them to leverage additional information from the environment beyond what these sensors provide.
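The sketch below illustrates the kind of discrete-action deep Q-network setup described for [40], where a network scores three movement commands (left, forward, right) from a fused sensor observation and an epsilon-greedy rule selects the action. The network architecture, observation dimensionality, and epsilon value are assumptions for illustration and are not taken from the original paper; training is omitted.

```python
# Minimal sketch: Q-network over three movement commands with epsilon-greedy selection.
import random
import torch
import torch.nn as nn

ACTIONS = ["left", "forward", "right"]

class QNetwork(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # one Q-value per action

def select_action(q_net: QNetwork, obs: torch.Tensor, epsilon: float = 0.1) -> str:
    """Epsilon-greedy selection over the three movement commands."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)                # explore
    with torch.no_grad():
        q_values = q_net(obs)
    return ACTIONS[int(q_values.argmax().item())]    # exploit

# Example: a flattened observation combining LiDAR ranges and depth-image features.
obs_dim = 64
q_net = QNetwork(obs_dim)
observation = torch.rand(obs_dim)
print(select_action(q_net, observation))
```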
Several papers have pointed out the limited field of view of currently proposed methods for path planning and obstacle avoidance. In [
12], the authors mentioned that different sensors provide different types of information; therefore, using multiple sensors in reinforcement learning to make more reasonable decisions is an important research direction [
12] (a minimal sketch of such a fused observation is given at the end of this paragraph). In future work, the authors of [
13] intend to repeat the same series of experiments but integrate alternative sensors, such as cameras, to take advantage of additional data from the environment. The experiments in [
38] showed that the sensor’s field of view was the main bottleneck affecting their algorithm’s performance. Consequently, they proposed that sensor fusion in multiple-camera systems could be used to improve future performance. In [
37], the authors mentioned that a significant factor affecting robustness was the unsatisfactory perception performance caused by a limited field of view in an environment containing numerous dynamic obstacles. Hence, they proposed investigating perception-aware planning in future work to better predict the status of dynamic obstacles and enhance the robustness of the system. The experiments in [
39] showed that a UAV cannot perceive the motion direction of dynamic obstacles across the entire global range due to its limited field of view. In future research, the authors plan to better integrate multi-sensory data to emulate the autonomous perceptual capabilities of animal systems. Examining the robustness of the proposed algorithm for exiting U-shaped obstacles poses an additional challenge that warrants further study [
20]. Researchers need to find even better ways to integrate these sensors so that drones can better sense their surroundings and make quick decisions to prevent them from colliding with other objects [
42].
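As a minimal illustration of the multi-sensor direction advocated in [12,13,38], the sketch below concatenates readings from several distance sensors pointing in different directions (including upward) with a compact embedding of a depth image into a single observation vector for a reinforcement learning policy. The function `fuse_observation`, the sensor layout, the normalization constant, and the embedding size are all illustrative assumptions rather than a design from any of the cited works.

```python
# Minimal sketch: fusing directional range sensors with a depth-image embedding
# into one RL observation vector. All values below are synthetic.
import numpy as np

MAX_RANGE = 20.0  # m, assumed maximum sensor range used for normalization

def fuse_observation(ranges: dict, depth_embedding: np.ndarray) -> np.ndarray:
    """Concatenate normalized range readings (front/left/right/back/up) with a
    compact depth-image embedding produced, e.g., by a CNN or autoencoder."""
    order = ["front", "left", "right", "back", "up"]
    range_part = np.array([min(ranges[d], MAX_RANGE) / MAX_RANGE for d in order],
                          dtype=np.float32)
    return np.concatenate([range_part, depth_embedding.astype(np.float32)])

# Synthetic example: five range readings (meters) plus a 32-dimensional embedding.
ranges = {"front": 6.2, "left": 3.8, "right": 11.5, "back": 20.0, "up": 2.4}
depth_embedding = np.random.rand(32)
obs = fuse_observation(ranges, depth_embedding)
print(obs.shape)  # (37,)
```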
The four tables below compare autonomous navigation methods for 5G drones, highlighting the obstacles they avoid, testing environments, sensor fusion, and field of view.
Table 1 shows that the strategies in [
19,
35,
37,
38] are ineffective in dynamic environments due to their reliance on online and offline map strategies.
Table 2 shows that only [
12] presented a way to escape U-shaped obstacles by recalling previous actions. However, this solution is insufficient for complicated U-shaped obstacles. A limitation of the studies [
14,
18] is that drones can only avoid collisions with other drones in the same swarm by transmitting data via a communication link; therefore, drones without a communication link cannot avoid collisions. The drawback of most studies is that their methods do not avoid dynamic objects, as shown in
Table 2. Meanwhile, the approaches that did avoid them suffered from limited fields of view.
Table 3 illustrates the sensors utilized in each method. In [
20], the authors identified sensors that require high computational time; therefore, in
Table 3, we indicate these sensors: stereo cameras, monocular cameras, omnidirectional cameras, and LiDAR. Certain sensors, such as stereo, monocular, omnidirectional, and RGB cameras, are sensitive to light and weather conditions, unlike radar and LiDAR, which are not affected by these factors [
20,
43].
In
Table 3, there are three approaches [
14,
18,
35] that rely on communication links to avoid collisions with other drones. However, this approach may not work well because not all drones can communicate with one another. Additionally, static obstacles, such as buildings, and dynamic objects, such as enemy drones, cannot be communicated with.
Table 4 illustrates the limited fields of view of the drones in all the studies. Most studies focused only on perceiving the environment in front of the drone and ignored the other directions. We note that no method proposed a way to perceive the environment above the drone. Although [
32,
36] perceived the environment from four sides, they still suffered from a limited field of view because they did not perceive the environment above the drone. In addition, these studies used an omnidirectional camera, so they incurred a high computational time.