### **1. Introduction**

Applications that use Unmanned Aerial Vehicles (UAVs) collectively, better known as UAV swarms, are increasingly common. In addition to the advantages of using these systems individually, the main motivation for swarm usage is the reduction of flight time and operating costs, together with increased fault tolerance [1]. Advances in algorithms [2] and telecommunications [3] make it possible to build collective systems that are almost entirely autonomous, so an operator per vehicle is no longer necessary. Few systems in the literature currently address these path planning problems for agricultural and forestry use, especially for the optimization of field survey tasks. This sector can benefit strongly from the group use of aircraft. Therefore, the main field of application of this project is field prospecting.

The objective of this paper is to develop a system that solves the Path Planning problem on 2D grid-based maps adapted to the UAVs' sensors, for different numbers of UAVs, using Q-Learning techniques.

### **2. Materials and Methods**

This section describes the calculation used to extract the flight maps and the proposed method for calculating the flight paths, each in its corresponding subsection.

**Citation:** Puente-Castro, A.; Rivero, D.; Pazos, A.; Fernandez-Blanco, E. Using Reinforcement Learning in the Path Planning of Swarms of UAVs for the Photographic Capture of Terrains. *Eng. Proc.* **2021**, *7*, 32. https://doi.org/10.3390/engproc2021007032


Academic Editors: Joaquim de Moura, Marco A. González, Javier Pereira and Manuel G. Penedo

Published: 15 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

#### *2.1. Flight Maps*

For the flight maps, the cell size is calculated as the projection of the sensor's capture area onto the terrain, based on the image size, the flight height and the lens angle of view. To make the captured data easier to combine, the smallest capture area among all UAVs is chosen as the cell size, which takes advantage of the overlap produced by the UAVs with larger capture areas.
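As an illustration of this projection, the following minimal sketch assumes a pinhole camera pointing straight down over flat terrain, so that each side of the ground footprint is `2 * h * tan(angle / 2)`. The function names, the tuple layout `(height, horizontal angle, vertical angle)` and the use of separate horizontal and vertical angles of view are assumptions made for the example, not details taken from the paper.

```python
import math


def footprint_side(height_m: float, angle_of_view_deg: float) -> float:
    """Ground length (m) covered along one image axis by a nadir camera.

    Assumes a pinhole model over flat terrain: side = 2 * h * tan(angle / 2).
    """
    return 2.0 * height_m * math.tan(math.radians(angle_of_view_deg) / 2.0)


def footprint(height_m: float, hfov_deg: float, vfov_deg: float):
    """Width and height (m) of the ground area captured in one image."""
    return (footprint_side(height_m, hfov_deg),
            footprint_side(height_m, vfov_deg))


def common_cell_size(uavs):
    """Pick the smallest footprint (by area) among all UAVs as the cell size.

    `uavs` is an iterable of (height_m, hfov_deg, vfov_deg) tuples
    (a hypothetical layout chosen for this sketch).
    """
    return min((footprint(*uav) for uav in uavs),
               key=lambda wh: wh[0] * wh[1])


if __name__ == "__main__":
    swarm = [(100.0, 90.0, 60.0), (50.0, 90.0, 60.0)]
    print(common_cell_size(swarm))
```

Because the smallest footprint defines the grid, UAVs with a larger footprint cover more than one cell per image, which is the overlap mentioned above.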

No prior information is extracted from the calculated grid map to guide the calculation of the paths, in order to avoid biases. However, by also storing information such as the positions of the drones and the cells already visited at each moment, a large amount of information can be provided in real time to improve the calculation of the paths.
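One possible way to keep this real-time information is sketched below. The `GridState` class, its fields and its methods are hypothetical names introduced for the example; the paper does not describe its data structures at this level of detail.

```python
from dataclasses import dataclass, field


@dataclass
class GridState:
    """Grid map plus the dynamic information stored while UAVs move."""
    rows: int
    cols: int
    positions: list                     # one (row, col) tuple per UAV
    visited: set = field(default_factory=set)

    def __post_init__(self):
        # The starting cells count as already visited.
        self.visited.update(self.positions)

    def move(self, uav: int, d_row: int, d_col: int) -> bool:
        """Move one UAV by a cell offset; reject moves that leave the map."""
        row, col = self.positions[uav]
        new_row, new_col = row + d_row, col + d_col
        if not (0 <= new_row < self.rows and 0 <= new_col < self.cols):
            return False
        self.positions[uav] = (new_row, new_col)
        self.visited.add((new_row, new_col))
        return True

    def coverage(self) -> float:
        """Fraction of the map cells visited so far."""
        return len(self.visited) / (self.rows * self.cols)


if __name__ == "__main__":
    state = GridState(rows=2, cols=2, positions=[(0, 0)])
    state.move(0, 0, 1)
    print(state.positions, state.coverage())
```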

#### *2.2. Proposed Model*

The proposed model for the calculation of the paths is a variation of the Q-Learning algorithm [4]. In this Reinforcement Learning (RL) algorithm [5], the Q-values are predicted by an Artificial Neural Network (ANN) [6] with two fully connected layers with sigmoid activations, trained with the RMSprop optimizer.
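The following dependency-free sketch shows the general shape of such a model: a forward pass through two fully connected sigmoid layers, the standard Q-Learning target, and the standard RMSprop update rule. The layer sizes, the four-action output, the state encoding, the discount factor `gamma` and all function names are assumptions for illustration; the paper does not report them, and gradient computation is omitted.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in: int, n_out: int, rng: random.Random):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense_sigmoid(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def q_values(state, params):
    """Forward pass: state -> hidden layer -> one Q-value per action."""
    (w1, b1), (w2, b2) = params
    return dense_sigmoid(dense_sigmoid(state, w1, b1), w2, b2)


def td_target(reward, next_q, done, gamma=0.9):
    """Standard Q-Learning target: r + gamma * max_a' Q(s', a')."""
    return reward if done else reward + gamma * max(next_q)


def rmsprop_step(weights, grads, cache, lr=0.001, rho=0.9, eps=1e-8):
    """In-place RMSprop update for a flat list of parameters."""
    for i, g in enumerate(grads):
        cache[i] = rho * cache[i] + (1.0 - rho) * g * g
        weights[i] -= lr * g / (math.sqrt(cache[i]) + eps)


if __name__ == "__main__":
    rng = random.Random(0)
    params = [init_layer(4, 8, rng), init_layer(8, 4, rng)]
    print(q_values([1.0, 0.0, 0.0, 1.0], params))
```

Note that a sigmoid on the output layer bounds the predicted Q-values to the interval (0, 1), so in a setup like this one the rewards would need to be scaled accordingly.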

To obtain better results in less time, a Hill-Climbing policy [7] is followed to update the rewards received by the UAVs as they move. A training strategy based on Memory Replay [8] is also used.
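A replay memory of the kind commonly used with Q-Learning can be sketched as follows. This covers only the Memory Replay part; the capacity, the batch size, the transition layout and the class name `ReplayMemory` are assumptions for the example rather than the authors' settings.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of past transitions, sampled uniformly at random."""

    def __init__(self, capacity: int, seed=None):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        """Store one transition observed while a UAV moves."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        """Return up to `batch_size` distinct stored transitions."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


if __name__ == "__main__":
    memory = ReplayMemory(capacity=3, seed=0)
    for step in range(5):
        memory.push(step, 0, 1.0, step + 1, False)
    print(len(memory), memory.sample(2))
```

Sampling past transitions at random, instead of training only on the latest move, reduces the correlation between consecutive training examples.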

Another issue inherent to the proposed model is how it is configured with respect to the UAVs. There are two possibilities: first, a single global ANN shared by all UAVs; and second, one ANN per UAV, or local ANN. The first option requires fewer computational resources, but the path calculation for one UAV can be distorted by erroneous information from the paths of the other UAVs. The second option requires more computational resources, but each ANN is specialized for a single UAV.
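The difference between the two configurations can be expressed in a few lines. In this sketch, `assign_networks` and the `make_network` factory are hypothetical names; the factory stands in for whatever code builds one ANN.

```python
def assign_networks(n_uavs: int, make_network, shared: bool):
    """Return one network reference per UAV.

    shared=True  -> global ANN: every UAV points to the same object, so
                    experience from any UAV updates the same weights.
    shared=False -> local ANNs: each UAV gets its own independent object.
    """
    if shared:
        network = make_network()
        return [network] * n_uavs
    return [make_network() for _ in range(n_uavs)]


if __name__ == "__main__":
    def make_network():
        return {"weights": []}          # placeholder for a real ANN

    global_nets = assign_networks(3, make_network, shared=True)
    local_nets = assign_networks(3, make_network, shared=False)
    print(global_nets[0] is global_nets[2], local_nets[0] is local_nets[2])
```

With the shared configuration, memory and training cost stay constant as the swarm grows, whereas with local networks they grow with the number of UAVs.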

### **3. Results**

For the experiments, simulations were carried out on the terrain of the CITIC research center. The metric of interest is the flight time taken to find a solution, as it influences the energy consumption of each UAV. Results are listed in Table 1.


**Table 1.** Best times for each experiment with different numbers of UAVs and ANN configurations.

### **4. Conclusions**

The calculation of flight paths for UAV swarms can be approached with Q-Learning using small fully connected ANNs. This makes the system faster and more efficient than others found in the literature, which facilitates its use by other users. Minimizing the time taken to find each solution is a satisfactory metric that is rarely used by other authors. However, it is one of the most realistic, because battery consumption cannot be predicted: it depends on external factors such as the incident wind. One ANN per UAV is usually the best option: as the number of UAVs increases, the time taken to find a solution does not grow much, unlike with a global ANN.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Acknowledgments:** The authors would like to thank NVIDIA Corporation for its support in granting the GPU used in this work. They also acknowledge the support of CESGA, where many of the preliminary tests were run.



