5.1. Detection and Localization
The detection of UAVs can be achieved either by direct self-reporting or by indirect discovery performed by ground BSs. Conventional detection techniques, including radar, LiDAR sensors, acoustic recognition, electro-optical sensors, and computer vision, appear poorly suited to crowded dense-urban environments with numerous physical obstacles, ambient noise, light variation, and NLoS propagation conditions. To overcome these issues, intelligent detection and identification techniques based on data-driven algorithms have emerged. Among them, ML classifier models have been proposed as an accurate classification method, and DL methods have also been applied to the UAV detection problem.
In Reference [88], a complementary ML-based identification method was applied using large training data sets consisting of UAV and non-UAV data flows from real-world encrypted Wi-Fi traffic. In general, exploiting Wi-Fi traffic is not straightforward because the data may be encrypted, and applying ML methods to real-time UAV identification may not always be beneficial because of time constraints and possible processing delays. Thus, a delay-aware ML-based UAV identification framework was proposed that achieves acceptable accuracy while minimizing the delay. This framework handles the encrypted data flow as a time series, exploiting packet size and inter-arrival time information for statistical feature extraction and estimating the packet inter-arrival time using the maximum likelihood estimation (MLE) method. The prediction time is significantly reduced thanks to l1-norm regularization and the integration of feature selection and accuracy optimization into one objective function. Based on the results, the delay-aware ML solution achieves an accuracy of about 85–95% and is capable of identifying UAVs within 0.15 to 0.35 s in four scenarios, each of which used a data set from a different traffic class.
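The feature-extraction idea described for Reference [88] can be sketched as follows. This is a minimal illustration, not the framework of Reference [88]: the function names are hypothetical, the exponential inter-arrival model behind the MLE is an assumption made here for simplicity, and the soft-thresholding function only illustrates how an l1 penalty drives small feature weights to exactly zero.

```python
import math
from statistics import mean, pstdev

def flow_features(timestamps, sizes):
    """Statistical features of one traffic flow (hypothetical feature set)."""
    # Inter-arrival times between consecutive packets.
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {
        "size_mean": mean(sizes),
        "size_std": pstdev(sizes),
        "gap_mean": mean(gaps),
        "gap_std": pstdev(gaps),
        # Assumption: exponential inter-arrival times, whose rate MLE is 1 / sample mean.
        "rate_mle": 1.0 / mean(gaps),
    }

def soft_threshold(weight, lam):
    """Proximal step of an l1 penalty: shrink toward zero, zeroing small weights."""
    return math.copysign(max(abs(weight) - lam, 0.0), weight)
```

For example, `flow_features([0.0, 0.1, 0.3, 0.6], [100, 200, 100, 200])` yields a mean packet size of 150 and a rate estimate of about 5 packets per second, and a weight whose magnitude is below `lam` is removed from the model entirely, which is what shortens prediction time.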
In Reference [89], an air-to-ground LTE cellular network was considered under strong LoS propagation conditions and harsh interference. The presence of aerial users (AUs) was detected accurately and in a timely manner using ML, with measured RF data from UEs in a rural area as input. More specifically, for AU detection, a relatively small number of training samples and a weighted distribution for the training set were used along with three ML techniques: (i) the Bayesian estimator [90], which is rather simple and insensitive to the distribution or size of the training set; (ii) the SVM [90], which has high computational complexity and high sensitivity but adequate performance; and (iii) the MLP [91], which supports real-time on-the-fly training and avoids storage costs. It is noted that the decision time and the performance of the classification algorithms are a function of the number of features. Using the proposed ML classification methods and a significant number of features, the reliability was approximately 99%, whereas the accuracy approached 100%. Moreover, it was observed that, depending on the available data set, each ML solution can lead to different results, as is the case when the data are unevenly balanced between aerial and ground users. More specifically, Bayesian and SVM estimators exhibit better generalization capabilities across different performance metrics, while MLP can offer improved performance if a specific metric is targeted.
An ANN-based detection algorithm was proposed in Reference [92] using three features: the slope, the kurtosis, and the skewness of the signal received from a UAV. Both outdoor and indoor measured RF data were used to train the algorithm and to perform feature extraction and classification of UAV and non-UAV signals. The results revealed that the recognition rate can exceed 82% within a distance of 3 km, which is better than the scores provided by conventional non-ML detection methods. The detection and classification of UAVs using Markov-based naive Bayes ML techniques was also realized in Reference [93] for multiple raw RF signal fingerprints from several UAV controllers and different SNR levels. In particular, the classification was based on the energy transient signal and statistical processing in order to avoid noise sensitivity and to adapt to modulation techniques. This method is characterized by low computational complexity and does not exploit the time domain, and hence averts possible delays in detecting the signal transient, especially in low-SNR scenarios. The feature sets were used to train different ML algorithms, including KNN classification [90], discriminant analysis (DA), the SVM, and the NN. The results showed that the KNN classifier has the best performance, obtaining an accuracy above 80% for 15 dB SNR and 14 controllers, whereas the NN has the worst performance when the SNR is low. Further, increasing the number of controllers leads to an average accuracy of 96.3%. Overall, RF fingerprinting is a promising technique, but it requires specific equipment, e.g., expensive SDRs at high frequencies.
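As an illustration of the three signal features named for Reference [92], the sketch below computes a slope, the skewness, and the kurtosis of a sampled signal. The exact definitions used in Reference [92] are not given here, so the choices are assumptions: the slope is taken as the least-squares slope against the sample index, and the skewness and (non-excess) kurtosis are the standardized third and fourth central moments. The function name is hypothetical.

```python
from statistics import mean

def signal_features(x):
    """Slope, skewness, and kurtosis of a sample sequence (assumed definitions)."""
    n = len(x)
    mu = mean(x)
    # Central moments of order 2, 3, and 4 (population form).
    m2 = sum((v - mu) ** 2 for v in x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    # Least-squares slope of the samples against their index 0..n-1.
    t_mu = (n - 1) / 2
    slope = (sum((i - t_mu) * (v - mu) for i, v in enumerate(x))
             / sum((i - t_mu) ** 2 for i in range(n)))
    # Note: a constant signal has m2 == 0 and would raise ZeroDivisionError here.
    return {"slope": slope, "skewness": m3 / m2 ** 1.5, "kurtosis": m4 / m2 ** 2}
```

A feature vector of this kind would then be passed to a classifier such as an ANN or KNN to label the signal as UAV or non-UAV.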
In Reference [94], an acoustic ML-based UAV detection approach with multiple listening nodes, i.e., computing units capable of detecting UAV sounds, and a control center was presented, relying on features such as Mel-frequency cepstral coefficients (MFCC) [95] and the short-time Fourier transform (STFT) for training. STFT provides more signal information but contains more noise than MFCC, and it has recently been considered promising for training on acoustic signals, as DNNs and DL are capable of processing more complicated data. SVM and CNNs were considered in order to estimate in real time the flying path of a UAV in noisy environments. Overall, the STFT-SVM model achieved the best performance, as SVM shows improved classification performance in binary tasks with reduced training resources compared to CNN. The current flying status (i.e., whether it is flying or static on the ground) of a powered-on UAV that is remotely controlled by a malicious operator was detected in real time in Reference [
96] by eavesdropping the communication traffic exchanged between the UAV and its controller and by applying classification algorithms. In particular, the Weka platform [
97] was adopted that contains various standard ML algorithms for data mining tasks. Among them, the Trees-J48 (J48) [
98], RandF [
99], and NN were used. This cost-effective passive method does not require special equipment or signal transmission and intends to statistically analyze features, such as packet size and inter-arrival time analysis. The results showed that the delay is less than 4 s with 93% detection accuracy.
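To make the STFT feature mentioned for the acoustic approach of Reference [94] concrete, the following sketch computes a magnitude spectrogram with a direct (slow) discrete Fourier transform per frame. It is a didactic sketch under stated assumptions: the Hann window, the frame length, and the hop size are illustrative choices, not parameters from Reference [94], and a practical system would use an FFT library instead.

```python
import cmath
import math

def stft_magnitude(signal, frame_len, hop):
    """Return one list of frame_len // 2 + 1 bin magnitudes per analysis frame."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame at frequency bin k.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames
```

Each row of the result can be flattened into a feature vector for a classifier such as an SVM, whereas MFCC would compress the same frame into far fewer coefficients.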
In Reference [100], a system that autonomously, instantaneously, and accurately locates the controllers of UAVs from the transmitted RF control signals was described. This system uses an RF sensor array to monitor the signal spectrum, whereas a CNN is trained to predict the bearing of the drone controller relative to the sensor and, then, to estimate the position of the controller. Using the proposed configuration, an operator can be apprehended at distances of up to 500 m with a mean absolute error of 3.67° in bearing calculation, which in turn corresponds to a fair positional error of 40 m. Then, in Reference [101], ML tools were used for estimating the locations of drone-UEs based on sparse information. In particular, employing kernel density estimation, the UAV spatial distribution was estimated and then used for obtaining the optimal cell association. Moreover, tools from optimal transport theory were also adopted. The presented results confirm that the proposed approach significantly reduces the latency of the drone-UEs compared to conventional UE-cell association based on SINR while also improving the spectral efficiency. Finally, three deep neural networks (DNNs) were trained and tested in Reference [102] using an open-source RF database in order to detect the presence of UAVs, to identify their type, and to determine their flight mode. This database includes raw RF signals, which were collected for a variety of flight modes. The performance of these DNNs was verified via a tenfold cross-validation process. According to the classification results, the number of classes dramatically affects the accuracy, since some UAVs were manufactured by the same company. In particular, the mean accuracy decreased from 99.7% for the first DNN (2 classes) to 84.5% for the second DNN (4 classes) and to 46.8% for the third DNN (10 classes).
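The kernel density estimation step mentioned for Reference [101] can be illustrated with a small sketch that turns sparse location samples into a smooth spatial density. This is an assumption-laden example: the isotropic Gaussian kernel, the fixed bandwidth, and the 2-D setting are simplifications chosen here, and the function name is hypothetical.

```python
import math

def gaussian_kde_2d(samples, bandwidth):
    """Build a density estimate p(x, y) from sparse (x, y) location samples."""
    # Normalization of an isotropic 2-D Gaussian kernel, averaged over the samples.
    norm = 1.0 / (len(samples) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        return norm * sum(
            math.exp(-((x - sx) ** 2 + (y - sy) ** 2) / (2.0 * bandwidth ** 2))
            for sx, sy in samples
        )

    return density
```

For instance, `gaussian_kde_2d([(0, 0), (1, 0), (0, 1), (50, 50)], 5.0)` returns a function that assigns a much higher density near the cluster around the origin than midway between the cluster and the isolated sample. A density of this kind is what a cell-association step would take as its input.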
In UAV swarm mission-critical scenarios, each UAV should detect and track the other swarm members in order to acquire information regarding the relative distance and bearing. In this direction, a DL- and visual-based detection/tracking algorithmic framework based on the You Only Look Once (YOLO) object detection system [
103] was proposed in Reference [
104]. In this framework, the tracker, i.e., UAVs equipped with a low-cost and lightweight visual camera, aims at reliably predicting the location of target UAVs on the image plane of their camera. To experimentally evaluate the performance of this approach and to collect multiple data sets of images to train the learning algorithms, a flight test campaign comprising different UAVs and cameras with varying resolution was accomplished. The results underlined the accuracy of the proposed method, which obtained 90% of correct detection instances, in a timely manner. These results also demonstrated that this method offers robustness against challenging conditions, such as sun illumination, as well as background and target-range variability.
Table 5 summarizes papers tackling detection and localization issues through ML-based techniques in UAV networks.
5.2. Placement and Trajectory Design
The placement and trajectory design challenge is also directly related to the nontechnical but equally important legal and regulatory issue of constructing an international legislative framework for UAV operation [105]. These regulations do not permit UAVs to fly over all areas and introduce altitude limitations as well. Note that some rules are common across different countries: (i) UAVs should fly only within visual line-of-sight (VLoS), usually limited to 500 m from the pilot; (ii) UAVs should fly only below 120 m relative to the ground. For other operations, such as flying over people or cities, or beyond VLoS (BVLoS), special permissions can be granted depending on the country, while testing of new systems and applications must usually be done in segregated airspace areas. Moreover, flying over people not involved in the operation is not allowed in most countries or is allowed only under a specific authorization of the aviation authorities. Several studies on the optimized placement and trajectory of UAVs as BSs or relays have been previously conducted (e.g., References [106,107]). These studies have focused on providing efficient connectivity to a group of distributed ground users.
The joint optimization of trajectory and power control in multi-UAV scenarios was addressed in Reference [108], aiming at maximizing the users’ throughput while satisfying the users’ rate requirements. The optimization problem relied on ML techniques and included three steps in order to determine the trajectory design of UAVs acting as agents. First, a multi-agent Q-learning-based placement algorithm was proposed for estimating the optimal 3-D placement of the UAVs with respect to the initial positions of the ground users. Second, the mobility and the upcoming positions of the ground users were predicted using an ESN-based prediction algorithm [109] with a real-world geographical data set as input. This data set was collected from an online social network, i.e., Twitter, and included GPS coordinates of anonymous users and recorded time stamps. Third, a multi-agent Q-learning-based algorithm was used to determine the optimal position and transmit power of the UAVs in each time slot according to the users’ mobility. The results underlined that the proposed algorithm can converge to an optimal state, while comparisons with two benchmark schemes, i.e., the historical average (HA) model and the LSTM, showed the superiority of ESN in terms of MSE over both at a lower complexity. These results also illustrated that the proposed approach can increase the throughput by approximately 17%, while accuracy improves as the size of the reservoir increases.
The authors of Reference [110] follow a Gaussian Process (GP) approach to model the air-to-ground communication channel of a UAV that provides an additional link to a group of ground nodes, thus improving their communication quality. For this purpose, GP is employed for predicting the communication channel strength at random UAV positions in an urban environment. Then, the channel model can be used to perform optimal UAV trajectory planning, either with offline pre-scanning based on GP followed by a nonlinear model predictive control (NMPC) planner, or with an NMPC planner combined with online GP. More specifically, in the offline method, the UAV collects measurements from the ground nodes along a prespecified scanning flight pattern. Afterwards, these data are fed to the GP to build a map of the channel strength between the different UAV positions and the ground nodes. In contrast, in the online method, the UAV trajectory is designed according to the current knowledge of the air-to-ground channel, and the still incomplete map is updated using GP through periodic measurements by the UAV. It was shown that the offline creation of the communication channel strength map with GP prior to the start of the mission outperforms the online creation of the map during the mission without scanning.
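The way a GP can turn scattered measurements into a channel strength map may be sketched with plain GP regression. This is a minimal sketch, not the model of Reference [110]: the squared-exponential kernel, its length scale, the constant mean, and the noise level are assumptions chosen for illustration, and the small linear solver stands in for a numerical library.

```python
import math

def rbf(a, b, length):
    """Squared-exponential kernel between two positions (assumed kernel choice)."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-d2 / (2.0 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_fit(positions, strengths, length=50.0, noise=1e-6):
    """Return a predictor for the posterior mean channel strength at a position."""
    mean_y = sum(strengths) / len(strengths)
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(positions)] for i, a in enumerate(positions)]
    alpha = solve(K, [y - mean_y for y in strengths])

    def predict(p):
        return mean_y + sum(rbf(p, q, length) * w for q, w in zip(positions, alpha))

    return predict
```

In the offline variant, `gp_fit` would be called once on the data from the scanning flight. In the online variant, it would be refitted as new measurements arrive, and far from any measurement the prediction falls back to the mean, which reflects the incomplete map.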
In Reference [111], the trajectory of a UAV BS employed to provide communication services to multiple users was optimized using Q-learning, targeting sum-rate improvement. To model the air-to-ground channel between the UAV BS and the ground users, the log-distance path loss model was used. The simulation scenario included a UAV BS flying at a constant altitude and acting as an autonomous agent, two static ground users, and a cuboid obstacle. To obtain Q-function approximators, a standard table-based approach and an NN were utilized. Based on the simulation results, the UAV BS was capable of learning the network topology without preexisting knowledge of the environment. Also, the UAV BS could autonomously obtain its landing position in a timely manner. These results also underlined that the NN is more efficient and scalable and requires significantly less training data than table-based Q-learning.
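The table-based variant of this setup can be sketched as follows. It is a toy example under stated assumptions rather than the simulation of Reference [111]: the grid size, the cell width, the path loss exponent, the power and noise levels, and the learning parameters are all illustrative values, and the obstacle is reduced to a blocked grid cell.

```python
import math
import random

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # north, south, east, west

def path_loss_db(d, d0=1.0, pl0=40.0, n=2.5):
    """Log-distance path loss: PL(d) = PL(d0) + 10 * n * log10(d / d0)."""
    return pl0 + 10.0 * n * math.log10(max(d, d0) / d0)

def sum_rate(cell, users, cell_m=10.0, height=20.0, tx_dbm=20.0, noise_dbm=-90.0):
    """Sum of Shannon rates (bit/s/Hz) from a UAV above `cell` to all users."""
    total = 0.0
    for ux, uy in users:
        ground = cell_m * math.hypot(cell[0] - ux, cell[1] - uy)
        d = math.hypot(ground, height)
        snr_db = tx_dbm - path_loss_db(d) - noise_dbm
        total += math.log2(1.0 + 10.0 ** (snr_db / 10.0))
    return total

def step(cell, action, size, obstacles):
    """Move one cell; stay in place if the move leaves the grid or hits an obstacle."""
    nxt = (cell[0] + action[0], cell[1] + action[1])
    inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
    return nxt if inside and nxt not in obstacles else cell

def train(users, size=5, obstacles=frozenset({(2, 2)}), episodes=200,
          horizon=20, alpha=0.3, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning; the reward is the sum rate at the cell reached."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        cell = (0, 0)
        for _ in range(horizon):
            if rng.random() < eps:  # explore
                a = rng.randrange(len(ACTIONS))
            else:                   # exploit the current table
                a = max(range(len(ACTIONS)), key=lambda i: q.get((cell, i), 0.0))
            nxt = step(cell, ACTIONS[a], size, obstacles)
            best_next = max(q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
            old = q.get((cell, a), 0.0)
            q[(cell, a)] = old + alpha * (sum_rate(nxt, users) + gamma * best_next - old)
            cell = nxt
    return q
```

The table grows with the number of visited state-action pairs, which is the scalability limit that motivates replacing it with an NN approximator.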
In topologies where mobile ground nodes require UAV relaying, Reference [112] proposed combining ML-based measurement with a probabilistic LAP channel model to facilitate UAV trajectory planning. More specifically, the authors assumed four distinct types of urban environments with unknown building positions and shapes and a wireless channel including the effects of path loss, multipath fading, and shadowing with empirically known distributions. Also, the ground nodes’ current positions are available to the UAVs, but their mobility patterns are not known. Thus, in order to select the appropriate probabilistic LAP model, the UAVs collect pairs of signal strength and elevation angle measurements between the UAV and the ground nodes. These data allow a neural network (NN) predictor to determine the current urban environment type. The NN’s output can then be used for UAV trajectory design, using a Cross Entropy Optimiser (CEO) to generate a set of possible trajectories. Then, a convergence criterion is imposed and, when it is fulfilled, the optimal trajectory is fed to the UAVs. By periodically performing this process, channel prediction for trajectory design in scenarios with mobile ground nodes can be efficiently achieved. Therefore, the NN-based trajectory planning exhibits promising performance in settings where little information on the mission area is available while considering the mobility of the network nodes.
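The general cross-entropy optimization loop behind such a trajectory generator can be sketched as follows. This is a generic version of the technique, not the CEO of Reference [112]: a candidate is just a vector of numbers (which could stand for waypoint coordinates), the Gaussian sampling distribution, population size, and elite count are assumptions, and a fixed iteration budget replaces the convergence criterion.

```python
import random
from statistics import mean, pstdev

def cross_entropy_minimize(cost, dim, iters=40, pop=60, elite=12, seed=0):
    """Minimize `cost` over a dim-dimensional vector with the cross-entropy method."""
    rng = random.Random(seed)
    mu = [0.0] * dim      # mean of the sampling distribution
    sigma = [5.0] * dim   # per-dimension standard deviation
    for _ in range(iters):
        # Sample candidate solutions from the current distribution.
        samples = [[rng.gauss(mu[d], sigma[d]) for d in range(dim)]
                   for _ in range(pop)]
        # Keep the lowest-cost candidates and refit the distribution to them.
        samples.sort(key=cost)
        best = samples[:elite]
        mu = [mean([s[d] for s in best]) for d in range(dim)]
        sigma = [pstdev([s[d] for s in best]) + 1e-6 for d in range(dim)]
    return mu
```

In a trajectory setting, `cost` would score a candidate path using the channel model selected by the NN predictor, and the loop would stop once the sampling distribution has collapsed around one trajectory.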
The non-convex, NP-hard (nondeterministic polynomial-time hard) problem of jointly handling in real time the 3-D deployment and the dynamic movement of multiple UAVs was studied in Reference [113], where the goal was to maximize the sum mean opinion score (MOS) of ground mobile users while attaining an adequate QoE. A three-step solution was proposed, which comprised (i) the use of a genetic-algorithm-based k-means algorithm [114] to determine the initial cell partition of the users; (ii) the development of a Q-learning-based deployment algorithm, which considers each UAV as an agent that self-trains offline to find its optimal 3-D placement assuming static ground users (at the first time slot); and (iii) the development of a Q-learning-based movement algorithm for scenarios where the users are moving. The results demonstrated that these algorithms rapidly converge to a desired solution after a small number of iterations. Furthermore, Q-learning-based deployment offers improved performance and reduced complexity when compared to the k-means and Iterative-GAKmean alternatives.
In Reference [115], an unsupervised online self-tuning learning algorithm for joint mobility prediction and object profiling of individual UAVs was proposed. Apart from predicting the flying objects’ future locations without requiring prior knowledge of the mobility profiles or trajectories of the UAVs, the proposed method also enables the classification of the UAVs into particular groups based on their motion properties, e.g., rotary-wing and fixed-wing UAVs, via a hierarchical generative model. The results showed a success rate of 90% in profiling mobile objects for a reasonable noise level and a relatively small training data set (over time) compared to conventional data-driven methods. Overall, this method can be practically applied to FANETs with dynamic network topologies and autonomous UAVs and used to predict future topologies.
A UAV-based IoT data harvesting scenario in an urban area was also presented in Reference [
116], where a resource-constrained aerial base station was employed to serve multiple static ground nodes, e.g., IoT sensors. Contrary to most works in the field, the propagation parameters are considered to be unknown. Based on this uplink scenario, a joint flight trajectory and node scheduling design problem was formulated to minimize the estimation error of the channel model parameters and to maximize the data traffic between the UAV and each node. The trajectory learning phase included the collection of measured data from the ground users that resulted in adequate knowledge of the propagation parameters from the UAV side. Then, an iterative path planning algorithm was proposed along with dynamic programming [
117] techniques and the exploitation of a 3-D city map compression method in order to efficiently handle the aforementioned non-convex problem. The results demonstrated the benefits of the proposed learning method, whereas it was proved that the algorithm can converge to at least a locally optimal solution. Moreover, even though the ML-based solution leverages the rich map data, a map compression method is employed, making the trajectory design problem less complex compared to standard optimization tools.
Then, in Reference [
118], a UAV BS flying at an altitude well above the building height was adopted for the provision of video streaming services to several UEs clustered in a circular area. In an effort to optimize the flight planning and, thus, to enhance the QoE of UEs with regard to the video segment delay during streaming applications, a Q-learning approach was formulated. This approach was denoted as Q-SQUARE and modeled the UAV BS’s path as a sequence of states, which were related to the UAV BS’s position, the elapsed flight time, and the residual energy. Also, Q-SQUARE considered multiple recharging stations, where the UAV BSs were recharged after a flight period. The numerical results demonstrated that Q-SQUARE can substantially improve the system performance in terms of the QoE by optimizing the flight path of multiple UAV BSs serving the area of interest.
The joint optimization of the placement and power allocation of multiple UAVs with respect to the ground users in an unknown region was also accomplished in Reference [119]. Since the UAVs initially have imperfect CSI, the optimization problem was formulated by exploiting game theory. Then, a robust and distributed learning algorithm was proposed to guide multiple UAV flight plans in order to maximize the sum-rate with fairness for all ground users under specific flight region and power constraints. This algorithm converged to a stable state that maximizes the aforementioned optimization objective, resulting in a guide that facilitated the flight path of the UAVs. A flight path planning model for a dynamic and auto-configurable FANET was presented in Reference [120]. This model took advantage of a metaheuristic optimization-based approach that intends to optimize in real time the position of a flying relay device, i.e., a UAV, in order to obtain the best throughput while having knowledge of the positions of the other UAVs. Thus, an ANN was trained using mobility data of UAVs, including the position and traffic information that was supplied through the Network Simulator 2 (NS-2). According to the results, the average throughput of the optimized FANET was significantly higher, i.e., 135%, in comparison with the non-optimized FANET in an area of 200 m × 200 m.
In Reference [
121], a Q-learning positioning approach was proposed in an effort to find the best 3-D placement of drones in multiple drone small cells considering propagation areas, where the conventional terrestrial communication infrastructure is not operational or accessible due to a large-scale natural disaster. In these temporary intelligent small cells, the drones have limited resources and the ground users could have distinct requirements in terms of the data throughput and the mobility characteristics. The goal of this approach was to facilitate the construction of an efficient emergency communication network by maximizing the total network radio coverage while attaining robustness against dynamic network conditions, mobility issues, and interference. The simulation results included performance comparison between the proposed Q-learning method and random fixed positioning and circular positioning strategies and depicted that the former outperforms the others, in terms of coverage, QoS, and backhaul throughput.
The performance of a downlink air-to-ground communication system was optimized in Reference [122], where an aerial platform acted as a BS and the ground users were not static. In particular, a low-complexity Q-learning algorithm was used to find the optimal 3-D position of the aerial BS, in a relatively short processing time compared to heuristic algorithms, in such a dynamic environment, where the network topology continuously changes, while satisfying the QoS requirements. Thus, the benefits of RL-based positioning are evident, as the need for reinitialization in the heuristic-based approaches is avoided and gradual changes due to UE mobility are efficiently tracked, leading to positioning with lower complexity. In Reference [
101], a 3-D cellular network was also proposed that utilizes drones as BSs and drones as UEs. The former use ML tools to determine the spatial probability distribution of the latter for a certain time period, considering the mobility properties and aiming at minimizing the latency. In particular, a kernel density estimation method was developed, for which the training is attained using the sparse (owing to the excessive overhead costs) available prior information of the location of drone-UEs. Additionally, the problem of the proper placement of an aerial platform as a BS was investigated in Reference [
123], taking into account the users’ requirements and specific scenarios. Specifically, a DQN algorithm was proposed that maximizes the spectral efficiency by exploiting a large-scale pre-learning experience of different user layouts. The DQN combines a CNN from DL with Q-learning from reinforcement learning and has the advantage of being more time efficient. The superior performance of this algorithm was demonstrated in the simulation results, where the system achieved 91.3% of the maximum spectral efficiency with lower complexity than conventional heuristic search algorithms, such as hill climbing and simulated annealing.
In Reference [124], the self-optimization of multiple UAVs’ trajectories in real-time sensing applications was tackled using a Q-learning method in a decentralized manner. A single-cell UAV orthogonal frequency division multiple access (OFDMA) network was considered, where the UAVs transmit the sensory data to a terrestrial BS over orthogonal subchannels to avoid mutual interference, and the locations of the terrestrial BS and the UAVs were specified by 3-D Cartesian coordinates. It was also considered that the UAVs perform the sensing tasks in a synchronized iterative manner and send the measured data to the BS, whereas the sensing quality of the UAVs was evaluated using the probabilistic sensing model [125]. In addition, a sense-and-send protocol was proposed that facilitates the coordination of the UAVs handling different tasks, and then, the probability of successful valid data transmission was investigated using nested Markov chains. A frame-level simulation of this protocol was built in MATLAB®, and the results underlined the rapid convergence of the proposed algorithm compared to traditional single- and multi-agent Q-learning algorithms.
A distributed DRL algorithm for the navigation of a group of UAVs acting as BSs flying in a target region was developed in Reference [126]. The optimization problem was formulated as a partially observable MDP (POMDP), where the UAVs were capable of observing only the areas in their vicinity. Since each UAV has connectivity constraints and limited battery lifetime, the optimization problem aimed at improving the temporal average radio coverage and geographical fairness while minimizing the total energy consumption of the UAVs. In this algorithm, an agent, i.e., the UAV, and the environment interact at each of a sequence of discrete time slots. The algorithm was trained using the observations of each UAV regarding the propagation environment. According to the simulation results, the proposed optimization method outperforms the state-of-the-art DRL-EC3 approach based on deep deterministic policy gradient (DDPG) [127] in terms of energy efficiency. A double Q-learning algorithm that handles the problem of trajectory design of UAVs was presented in Reference [128]. Whereas standard Q-learning algorithms use the same Q-table for both selection and evaluation and therefore tend toward overestimation and suboptimal results, this algorithm uses two Q-tables to decouple the selection from the evaluation. The effectiveness of the proposed algorithm was demonstrated via extensive simulations, and the gain was up to 19.4% and 6.7% regarding the number of satisfied users compared to the random algorithm and the Q-learning algorithm, respectively.
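The decoupling of selection from evaluation in double Q-learning can be shown with a single update step. This is a generic sketch of the standard double Q-learning rule, not the implementation of Reference [128]: the dictionary-based tables, the state and action labels, and the parameter values are illustrative assumptions.

```python
def double_q_update(qa, qb, s, a, r, s_next, actions, pick_a, alpha=0.5, gamma=0.9):
    """One double Q-learning step on two tables stored as dicts keyed by (state, action).

    `pick_a` chooses which table is updated; in training it would be drawn
    at random with probability 0.5 (for example, random.random() < 0.5).
    """
    update, other = (qa, qb) if pick_a else (qb, qa)
    # Selection: the greedy next action according to the table being updated.
    best = max(actions, key=lambda x: update.get((s_next, x), 0.0))
    # Evaluation: the value of that action according to the other table.
    target = r + gamma * other.get((s_next, best), 0.0)
    old = update.get((s, a), 0.0)
    update[(s, a)] = old + alpha * (target - old)
```

With `qa = {("s1", "L"): 1.0, ("s1", "R"): 2.0}` and `qb = {("s1", "L"): 5.0, ("s1", "R"): 0.5}`, updating `qa` selects action `"R"` but evaluates it with the value 0.5 held in `qb`, so the target is 1.45 for a reward of 1. A single-table update would instead use the maximum of 2.0, which illustrates the source of the overestimation.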
Further advancements on UAV navigation were given in Reference [129]. In greater detail, a UAV navigation scheme based on DRL that can select the best UAV-to-ground links in real time was presented for a massive MIMO system, which included one ground station with a large number of antennas and multiple single-antenna UAVs. First, a DQN was constructed to extract the environment information and to obtain useful features of the massive MIMO channel, and then, the DQN was trained in order to facilitate the decision-making procedure based on the received signal strengths. Using the DQN, a Q-learning policy was also exploited for successfully optimizing UAV navigation while achieving increased coverage and rapid convergence. The high performance of the proposed DQN navigation scheme was confirmed through extensive simulation results, where the channel was Rician with a Rician factor of 6 dB and a ground station with 128 transmit antennas and 32 single-antenna UAVs flying at different velocities were considered.
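For reference, a Rician fading coefficient with a given Rician factor, such as the 6 dB used in the simulations above, can be generated as in the sketch below. It is a minimal single-link sketch under stated assumptions: unit average power, a LoS component with zero phase, and a hypothetical function name. It does not model the 128-antenna array or the UAV velocities.

```python
import math
import random

def rician_samples(k_db, count, seed=0):
    """Draw complex Rician fading coefficients with unit average power."""
    rng = random.Random(seed)
    k = 10.0 ** (k_db / 10.0)            # Rician factor in linear scale
    los = math.sqrt(k / (k + 1.0))       # deterministic LoS amplitude
    # Standard deviation of each real and imaginary scattered component.
    scale = math.sqrt(1.0 / (2.0 * (k + 1.0)))
    return [complex(los + scale * rng.gauss(0.0, 1.0), scale * rng.gauss(0.0, 1.0))
            for _ in range(count)]
```

A factor of 6 dB corresponds to a linear value of about 3.98, so roughly 80% of the average power is carried by the LoS component and the rest by the scattered component.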
Finally, the optimization problem of proper dynamic placement of UAVs as BSs in downlink scenarios was studied in Reference [
130]. It was considered that the UAVs should make decisions about their placement in each time-slot, depending on the unknown density of ground users in an ML manner. Moreover, a constraint regarding the minimum UAV-recall-frequency or otherwise the maximum life-time, indicating the energy efficiency of mobile UAVs networks, should be also satisfied. In each time-slot, the UAV-recall-frequency was considered static and the results showed that the optimal UAV placement is achieved when the transmit power becomes equal to the onboard circuit power. In addition, the optimal hovering altitude that minimizes transmit power is proportional to the coverage radius, whereas the slope depends on the propagation environment and tends to increase in areas with high-rise buildings. These results also underlined that limiting on-board circuit power prolongs the life-time of mobile UAV networks. Next, the multiple time-slot scenario was studied, where unstable and non-ergodic time-varying density of ground users served with fixed data rate exists. For this scenario, the optimization problem was more complex and required a multi-stage decision process leading to an integer nonlinear programming coupled with an inherent integer linear programming. Since this problem was NP-hard, a sequential-Markov-greedy-decision (SMGD) method was proposed in order to achieve near-minimal UAV-recall-frequency in polynomial time. According to the results, a large number of sample sets are indispensable for effective pattern formation and reduction of UAV-recall-frequency in large areas with high-rise buildings and low ground user density whereas the SMGD becomes more complex as the number of UAVs increases.
Table 6 includes the relevant works on ML-based resource allocation and network planning techniques for UAV-enabled communication networks.