*Article* **A Multi-Colony Social Learning Approach for the Self-Organization of a Swarm of UAVs**

**Muhammad Shafiq <sup>1</sup>, Zain Anwar Ali <sup>1,</sup>\*, Amber Israr <sup>1</sup>, Eman H. Alkhammash <sup>2</sup> and Myriam Hadjouni <sup>3</sup>**


**Abstract:** This research offers an improved method for the self-organization of a swarm of UAVs based on a social learning approach. To start, we use three different colonies and three best members, i.e., unmanned aerial vehicles (UAVs), randomly placed in the colonies. This study uses max-min ant colony optimization (MMACO) in conjunction with a social learning mechanism to plan the optimized path for an individual colony. Hereinafter, the multi-agent system (MAS) chooses the most optimal UAV as the leader of each colony and the remaining UAVs as agents, which helps to organize the randomly positioned UAVs into three different formations. Afterward, the algorithm synchronizes and connects the three colonies into a swarm and controls it using dynamic leader selection. The major contribution of this study is to hybridize two different approaches to produce a more optimized, efficient, and effective strategy. The results verify that the proposed algorithm completes the given objectives. This study also compares the designed method with the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) to prove that our method offers better convergence and reaches the target using a shorter route than NSGA-II.

**Keywords:** social learning; ant colony optimization; multi-agent system

## **1. Introduction**

In the last decade, research has been exponentially increasing in the domains of flight control, path planning, and obstacle avoidance of unmanned aerial vehicles (UAVs) [1–3]. The analyses get increasingly complex when dealing with multiple UAVs in different formations. The natural behaviors of birds, ants, and fishes have been proven to be significant in formulating successful bio-inspired algorithms for the formation control, route planning, and trajectory tracking of a swarm of multiple UAVs [4–6]. Some of the important algorithms inspired by nature include ant colony optimization [7], pigeon-inspired optimization [8], and particle swarm optimization [9].

The primary inspiration for this study is to utilize the knowledge obtained from studying the natural flocking and swarming activities of ants and use it for controlling UAVs. Researchers have used these nature-inspired algorithms for numerous purposes, including cooperative path planning of multiple UAVs [10], distributed UAV flocking among obstacles [11], and forest fire fighting missions [12]. We also find multiple studies that hybridize a bio-inspired algorithm with another method to increase its efficiency [13–15].

There are many existing solutions regarding the problems of path planning and multi-UAV cooperation. One such research study [16] deals with the inspection of an oilfield using multiple UAVs while avoiding obstacles. The researchers achieve this using an improved version of the Non-Dominated Sorting Genetic Algorithm (NSGA). Another existing solution to tackle a multi-objective optimization is addressed in reference [17]. In [17], the researchers use a hybrid of NSGA and local fruit fly optimization to solve interval multi-objective optimization problems. In reference [18], academics use a modified particle swarm optimization algorithm for the dynamic target tracking of multiple UAVs.

Our proposed method consists of many concepts, which are explained as follows:

Ant colony optimization (ACO) is an optimization technique inspired by the way ant colonies find the shortest route to their food [19]. The ACO mimics ants looking for food, using the pheromones left behind by earlier ants. The path used by the most ants contains the most pheromone, which aids the next ant in choosing the shortest way [20]. The ACO technique is used in a variety of applications, such as the routing of autonomous vehicles and robots, which need to find the shortest path to a destination, and the design of computer algorithms, which need to find the optimal solution for a given problem.

Sometimes, however, the ACO is slow to converge and falls into a local optimum. To help solve these issues, researchers introduced a modified version of ACO called max-min ant colony optimization (MMACO). It operates by controlling the maximum and minimum amounts of pheromone that can be left on each possible trail [21]. The range of possible pheromone amounts on each route is limited to avoid stagnation in the search process.

In social animals, social learning plays a significant part in behavior learning. Social learning, as opposed to asocial (individual) learning, allows individuals to learn from the actions of others without experiencing the costs of individual trials and errors [22]. That is why this study incorporates social learning mechanisms into MMACO. Unlike traditional MMACO variations, which update ants based on past data, each ant in the proposed SL-MMACO learns from any better ants (called demonstrators) in the present swarm.

Some of the state-of-the-art work in the field of optimization algorithms includes research [23] that proposes the use of social learning-based particle swarm optimization (SL-PSO) for integrated circuit manufacturing. The SL-PSO is used to increase the imaging performance in extreme ultraviolet lithography. Results showed that the errors were reduced significantly compared to conventional methods. Similarly, another recent study [24] uses improved ant colony optimization (IACO) for human gait recognition. The IACO is used to enhance the extracted features, which are then passed on to the classifier. Compared with current methods, the IACO technique in [24] is more accurate and takes less time to compute.

A multi-agent system (MAS) is a collection of agents that interact with each other and the environment to achieve a common goal. The main function of the MAS is to tackle issues that a single agent would find difficult to solve. To achieve its objective, another important function of the MAS is to be able to interact with each agent and respond accordingly. In MAS, each agent can determine its state and behavior based on the state and behavior of its neighbors [25]. MAS has multiple uses in fields such as robotics, computer vision, and transportation [26–28]. In most MAS scenarios, an external entity is required to direct the agents toward the destination [29].

A leader is an agent that can alter the states of the follower agents. A leader can be outside or inside the MAS or can even be virtual. A large swarm of UAVs can be controlled more efficiently by selecting fewer leaders than follower agents. For example, when we want to control a fleet of UAVs, we can select a few leader agents, let the remaining UAVs follow the leader, and use the leading UAVs to control the state of the system.

The major contributions of this study are as follows:

- We hybridize max-min ant colony optimization (MMACO) with a social learning mechanism to plan an optimized path for each colony.
- We use a multi-agent system (MAS) to appoint the most optimal UAV of each colony as its leader and to organize the randomly positioned UAVs into formations.
- We synchronize and connect the three colonies into a single swarm and control it using dynamic leader selection.
- We compare the proposed method with NSGA-II to show that it offers better convergence and reaches the target using a shorter route.

The paper is organized into six sections. Section 1 presents the introduction and the literature review. In Section 2, we break down the problem into three scenarios and describe each one in detail. Section 3 provides the framework of the proposed solution. Section 4 defines the proposed method, with each of its constituent parts discussed at length, and offers the algorithm and the flowchart. In Section 5, we discuss the simulations and their outcomes. Section 6 presents the conclusion of the study.

#### **2. Research Design**

We broke our problem into three different scenarios to make it more manageable. In the first scenario, the UAVs are in random positions, and then, using our proposed algorithm, they organize themselves into a formation. In the second scenario, we navigate the newly organized formations through a set of obstacles. In the last scenario, we combine the three formations into one swarm and then navigate it through the same environment. Below, we describe each scenario in detail:

Scenario 1:

Figure 1 presents the first scenario. In this scenario, there are three UAVs in three different colonies and the UAVs are placed at random positions within each colony. The environment contains different obstacles like mountains and rough terrain. The goal is to reach the target using the shortest route possible without colliding with other UAVs. The main objective here is to maintain the formation throughout the journey.

**Figure 1.** Illustration of the first scenario.

#### Scenario 2:

Figure 2 illustrates the second scenario. In this scenario, there are again three UAVs in three different colonies and the UAVs are placed at random positions within each colony. The environment is also the same. The goal is to reach the target using the shortest route possible without colliding with the obstacles or other UAVs. The main distinction between the first and the second scenario is that, in the first task, we only had to demonstrate the ability of the algorithm to maintain a formation. However, the second task also requires navigating through the obstacles without any collision.

**Figure 2.** Illustration of the second scenario.

Scenario 3:

Figure 3 illustrates the third scenario. In this scenario, we pick up where the second scenario left off, i.e., the three colonies are now in the desired formations. The environment is the same as in the second scenario. The goal is to first synchronize the three colonies into one big swarm, and then, while maintaining the swarm, reach the target using the shortest route possible without colliding with the obstacles or other UAVs.

**Figure 3.** Illustration of the third scenario.

#### **3. Solution Architecture**

Figure 4 presents the framework of our proposed solution for the aforementioned problems. It is clear from the figure that, initially, the UAVs in the three different colonies are at random positions. Here, we apply the social learning-based max-min ant colony optimization (SL-MMACO) to each colony. SL-MMACO works by first finding the most optimal routes for each colony, and then the social learning mechanism sorts the ants from best to worst. Afterward, the multi-agent system (MAS) appoints the best ant as the leader and the remaining ants as agents. In the next step, we see that all the colonies are now synchronized to avoid any collision between the UAVs. Finally, we connect all three colonies into one big swarm and then select its leader dynamically according to the mission requirements.

**Figure 4.** Solution architecture.

#### **4. Proposed Algorithm**

This section introduces the different concepts and theories that we used for the development of our proposed algorithm. In this research paper, we are using a graph-based approach.

#### *4.1. Ant Colony Optimization*

The concepts of nodes, edges, and legs are important for understanding path planning using ant colony optimization (ACO). Figure 5 presents the relationship between the nodes, edges, and legs. The ACO generates intermediary points between the initial and final positions; these intermediate points are called nodes. An edge is a link between two nodes. For instance, edge (b, c) is the length of the link from node b to node c, whereas a leg is generated whenever a UAV turns.

**Figure 5.** Relationship between nodes, edges, and legs.

Suppose that the *m*th ant is at node *i* at time *t*; the probability of transition can be written as:

$$p\_{ij}^m(t) = \frac{\tau\_{ij}^\alpha \eta\_{ij}^\beta}{\sum\_{\mathbf{c} \in allowed\_i} \tau\_{ic}^\alpha \eta\_{ic}^\beta} \tag{1}$$

where the probability of transition from node *i* to node *j* of the *m*th ant is *p<sup>m</sup><sub>ij</sub>*(*t*), the pheromone on the edge (*i*, *j*) is *τ<sub>ij</sub>*(*t*), the transit feasibility from node *i* to node *j* is *η<sub>ij</sub>*(*t*), the set of nodes that are neighboring *i* is *allowed<sub>i</sub>*, the constant influencing *τ<sub>ij</sub>*(*t*) is *α*, and the constant influencing *η<sub>ij</sub>*(*t*) is *β*.
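
As a minimal illustration of Equation (1), the sketch below computes the transition probabilities for a single ant leaving node *i* and samples the next node; the pheromone and feasibility values are made-up numbers, and the helper name is ours rather than the authors' implementation.

```python
import random

def transition_probabilities(tau, eta, allowed, alpha=1.0, beta=2.0):
    """Equation (1): probability of moving from the current node to each
    allowed neighbor, weighted by pheromone (tau) and feasibility (eta)."""
    weights = {j: (tau[j] ** alpha) * (eta[j] ** beta) for j in allowed}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

# Toy data: pheromone and heuristic values on the edges leaving node i.
tau = {"b": 0.8, "c": 0.3, "d": 0.5}   # pheromone tau_ij(t)
eta = {"b": 1.0, "c": 2.0, "d": 0.7}   # transit feasibility eta_ij(t)
probs = transition_probabilities(tau, eta, allowed=["b", "c", "d"])
next_node = random.choices(list(probs), weights=list(probs.values()))[0]
print(probs, next_node)
```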

After the method begins, the starting pheromone rate varies according to the edges. The pheromone rate is then updated by each ant that generated a result, which starts the next cycle of the process. The pheromone *τ<sub>ij</sub>*(*t*) on the edge (*i*, *j*) is updated as:

$$\tau\_{ij}(t+1) = (1 - \rho) \times \tau\_{ij}(t) + \sum\_{m=1}^{k} \Delta \tau\_{ij}^{m}(t) \tag{2}$$

where the rate of pheromone evaporation is *ρ* (0 ≤ *ρ* ≤ 1), the total number of ants is *k*, and the pheromone deposited on the edge (*i*, *j*) by the *m*th ant is ∆*τ<sup>m</sup><sub>ij</sub>*(*t*). ∆*τ<sup>m</sup><sub>ij</sub>*(*t*) can be further defined as

$$\Delta \tau\_{ij}^{m}(t) = \begin{cases} Q/L\_m, & \text{if ant } m \text{ uses edge } (i,j) \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

where the length of the route built by the *m*th ant is *L<sub>m</sub>* and *Q* is a constant.
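
The pheromone update of Equations (2) and (3) can be sketched as follows: the pheromone on every edge first evaporates at rate *ρ*, and each ant then deposits *Q*/*L<sub>m</sub>* on the edges of its own route. The edge keys and toy routes are purely illustrative.

```python
def update_pheromone(tau, routes, rho=0.1, Q=1.0):
    """Equations (2)-(3): evaporate pheromone, then let every ant deposit
    Q / L_m on each edge of the route it built (L_m = route length)."""
    # Evaporation term (1 - rho) * tau_ij(t)
    tau = {edge: (1.0 - rho) * value for edge, value in tau.items()}
    # Deposit term: sum over ants of Delta tau_ij^m(t)
    for route, length in routes:
        for edge in route:
            tau[edge] = tau.get(edge, 0.0) + Q / length
    return tau

# Toy example: two ants, each route given as (list of edges, total length L_m).
tau = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0}
routes = [([("a", "b"), ("b", "c")], 5.0), ([("a", "c")], 3.0)]
print(update_pheromone(tau, routes))
```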

#### *4.2. Max-Min Ant Colony Optimization*

We need to improve the traditional method to ensure that the ACO converges quickly. MMACO delivers remarkable results in this area by restricting the pheromone on each route. To understand MMACO, we must first examine the cost of a path. The average path cost *J<sub>a,k</sub>*(*t*) can be given as:

$$J\_{a,k}(t) = \frac{1}{k} \sum\_{m=1}^{k} J\_{a,m}(t) \tag{4}$$

Note that the *m*th ant only updates the pheromone when its path cost in the *t*th iteration fulfills *J<sub>a,k</sub>*(*t*) ≥ *J<sub>a,m</sub>*(*t*).

The MMACO updates the route using Equation (3) after every iteration. After each iteration, the algorithm determines the most and least optimal paths. To improve the probability of discovering the globally best route, it discards the least optimal route. As a result, Equation (3) may be revised as follows:

$$\Delta \tau\_{ij}^{m}(t) = \begin{cases} Q/L\_o, & \text{if edge } (i,j) \text{ belongs to the optimal route} \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

In the above equation, *L<sub>o</sub>* is the length of the most optimal route, while *L<sub>w</sub>* denotes the length of the current iteration's worst route, which is discarded. The quantity of pheromone produced by MMACO is limited to specified values. This limitation aids in accelerating convergence and avoiding stagnation.

The algorithm restricts the pheromone on each route to a specified minimum and maximum value, denoted by *τmin* and *τmax*, respectively. This can be represented mathematically as:

$$\tau\_{ij}(t) = \begin{cases} \tau\_{\max}, & \tau\_{ij}(t) \ge \tau\_{\max} \\ \tau\_{ij}(t), & \tau\_{\min} < \tau\_{ij}(t) < \tau\_{\max} \\ \tau\_{\min}, & \tau\_{ij}(t) \le \tau\_{\min} \end{cases} \tag{6}$$
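
A minimal sketch of the clamping rule in Equation (6): every pheromone value is kept within [*τ<sub>min</sub>*, *τ<sub>max</sub>*]. The bounds used here are placeholders, not values from the paper.

```python
def clamp_pheromone(tau, tau_min=0.05, tau_max=5.0):
    """Equation (6): keep every pheromone value within [tau_min, tau_max]
    to avoid stagnation and speed up convergence (MMACO)."""
    return {edge: min(max(value, tau_min), tau_max) for edge, value in tau.items()}

# Values below tau_min are raised, values above tau_max are capped.
print(clamp_pheromone({("a", "b"): 0.001, ("b", "c"): 9.7, ("a", "c"): 1.2}))
```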

## *4.3. Social Learning Mechanism*

The conduct of a person to learn from their surroundings is referred to as social learning. You should learn not just from the top students in class, but also from students who are better than you. Most biological groups follow the same idea. Initially, for the map and compass process, locations and velocities of the ants are produced at random and are indicated as *X<sup>i</sup>* and *V<sup>i</sup>* (*i* = 1, 2, . . . , *m*). At the next iteration, the new location Xi and velocity Vi are calculated by the formula [30]:

$$X\_a^{N\_c} = X\_a^{N\_c-1} + V\_a^{N\_c} \tag{7}$$

$$V\_a^{N\_c} = V\_a^{N\_c-1} \times e^{-R \cdot N\_c} + rand \times c\_1 \times (X\_{mod} - X\_a^{N\_c-1}) \tag{8}$$

$$c\_1 = 1 - \log\left(\frac{N\_c}{m}\right) \tag{9}$$

where *R* represents the map and compass factor, which is between 0 and 1, *N<sub>c</sub>* is the current number of iterations, *c<sub>1</sub>* is the learning factor, *X<sub>mod</sub>* is the demonstrator ant superior to the current ant, and *m* is the total number of ants. Each ant follows an ant better than itself, and this is known as the learning behavior. Figure 6 illustrates the process for the selection of the demonstrator *X<sub>mod</sub>*.
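
The map and compass update of Equations (7)–(9) can be sketched for a single ant as below; the parameter values and the demonstrator position are illustrative assumptions, not settings reported by the authors.

```python
import math
import random
import numpy as np

def map_and_compass_step(x, v, x_mod, n_c, m, R=0.3):
    """Equations (7)-(9): move an ant toward a demonstrator x_mod that is
    better than itself, with learning factor c1 = 1 - log(N_c / m)."""
    c1 = 1.0 - math.log(n_c / m)                                          # Equation (9)
    v_new = v * math.exp(-R * n_c) + random.random() * c1 * (x_mod - x)   # Equation (8)
    x_new = x + v_new                                                     # Equation (7)
    return x_new, v_new

x = np.array([0.0, 0.0])        # current position X_a
v = np.array([0.5, -0.2])       # current velocity V_a
x_mod = np.array([4.0, 3.0])    # demonstrator (a better ant)
print(map_and_compass_step(x, v, x_mod, n_c=5, m=30))
```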

**Figure 6.** Social learning mechanism.

For the landmark operation, the social behavior occurs when the ants far from the center are removed, and the remaining ants migrate toward the center. The procedure can be given as:

$$X\_{cent}^{N\_c-1} = \sum\_{a=1}^{m} \frac{X\_a^{N\_c-1}}{m} \tag{10}$$

$$X\_a^{N\_c} = X\_a^{N\_c-1} + rand \times c\_2 \times (X\_{cent}^{N\_c-1} - X\_a^{N\_c-1}) \tag{11}$$

$$c\_2 = \alpha\_s \left(\frac{N\_c}{m}\right) \tag{12}$$

where *c<sub>2</sub>* is the social influence factor, and *α<sub>s</sub>* is called the social coefficient.
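
Similarly, a minimal sketch of the landmark operation in Equations (10)–(12), assuming the center is simply the mean position of the ants; *α<sub>s</sub>* and the toy positions are illustrative.

```python
import random
import numpy as np

def landmark_step(positions, n_c, m, alpha_s=1.0):
    """Equations (10)-(12): compute the center of the ants and pull every
    ant toward it with social influence factor c2 = alpha_s * (N_c / m)."""
    x_cent = positions.mean(axis=0)                    # Equation (10)
    c2 = alpha_s * (n_c / m)                           # Equation (12)
    return np.array([x + random.random() * c2 * (x_cent - x) for x in positions])  # Equation (11)

positions = np.array([[0.0, 0.0], [2.0, 1.0], [4.0, 5.0]])
print(landmark_step(positions, n_c=5, m=30))
```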

#### *4.4. Multi-Agent Systems*

The multi-agent system comprises *n* independent agents that move with the same absolute velocity. Every agent's heading is updated according to the status of its neighbors. At time *t*, the neighbors of an agent *a* (1 ≤ *a* ≤ *n*) are those located within a circle of radius *r* (*r* > 0) centered on the position of agent *a*. At time *t*, the neighbor set of agent *a* is *N<sub>a</sub>*(*t*),

$$N\_a(t) = \{ b | d\_{ab}(t) < r \} \tag{13}$$

where *d<sub>ab</sub>*(*t*) is the distance between agents *a* and *b*, which can be computed using the Pythagorean theorem.

The coordinates of agent *a* at time *t* are (*x<sub>a</sub>*(*t*), *y<sub>a</sub>*(*t*)). The absolute velocity *v* (*v* > 0) of each agent in the system is the same.

$$\mathbf{x}\_a(t+1) = \mathbf{x}\_a(t) + v \cos \theta\_a(t) \tag{14}$$

$$y\_a(t+1) = y\_a(t) + v \sin \theta\_a(t) \tag{15}$$

At time *t*, the heading angle of agent *a* is *θa*(*t*). The following equation is used by the algorithm to update the heading angle:

$$\theta\_a(t+1) = \tan^{-1} \frac{\sum\_{b \in N\_a(t)} \sin \theta\_b(t)}{\sum\_{b \in N\_a(t)} \cos \theta\_b(t)} \tag{16}$$

The equation above also allows an agent to discover obstructions by examining its surroundings. If a node is missing from the neighborhood (i.e., there is a hurdle), the heading angle is changed to prevent a collision with the obstruction.
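
A hedged sketch of the neighbor-based update in Equations (13)–(16): each agent averages the sines and cosines of the headings of agents within radius *r* (including itself) and then advances with the common speed *v*. The positions, headings, and parameter values are illustrative only.

```python
import numpy as np

def mas_step(pos, theta, v=0.1, r=2.0):
    """Equations (13)-(16): update headings from neighbors within radius r,
    then advance every agent with the same absolute velocity v."""
    n = len(pos)
    new_theta = np.empty(n)
    for a in range(n):
        dist = np.linalg.norm(pos - pos[a], axis=1)      # d_ab(t), Pythagorean theorem
        nbrs = dist < r                                   # Equation (13); includes agent a
        new_theta[a] = np.arctan2(np.sin(theta[nbrs]).sum(),
                                  np.cos(theta[nbrs]).sum())   # Equation (16)
    # Equations (14)-(15): move with constant speed along the current heading.
    new_pos = pos + v * np.column_stack((np.cos(theta), np.sin(theta)))
    return new_pos, new_theta

pos = np.array([[0.0, 0.0], [1.0, 0.5], [5.0, 5.0]])
theta = np.array([0.2, 1.0, -0.5])
print(mas_step(pos, theta))
```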

We can analyze this algorithm using basic graph theory. Please note that each agent's neighbors are not always the same. The undirected graph G*<sup>t</sup>* = {V, *ε<sup>t</sup>*} is used for agent coordination, where V = {1, 2, . . . , *N*} is the set containing every agent and *ε<sup>t</sup>* is the time-varying edge set. A graph is connected if any two of its vertices are connected by a path.

Equation (16) can be modified as:

$$\tan \theta\_a(t+1) = \sum\_{b \in N\_a(t)} \frac{\cos \theta\_b(t)}{\sum\_{m \in N\_a(t)} \cos \theta\_m(t)} \tan \theta\_b(t) \tag{17}$$

To further simplify Equation (17), we use a matrix,

$$
\tan\theta(t+1) = I(t)\tan\theta(t)\tag{18}
$$

where tan *θ*(*t*) ≜ (tan *θ*<sub>1</sub>(*t*), . . . , tan *θ<sub>N</sub>*(*t*))<sup>T</sup>. For the graph G*<sup>t</sup>*, the weighted average matrix is *I*(*t*) ≜ (*i<sub>ab</sub>*(*t*)).

$$i\_{ab}(t) = \begin{cases} \frac{\cos\theta\_b(t)}{\sum\_{m \in N\_a(t)} \cos\theta\_m(t)}, & \text{if } (a, b) \in \varepsilon\_t\\ 0, & \text{otherwise} \end{cases} \tag{19}$$

For the synchronization, we study the linear model of Equation (16) as follows:

$$\theta\_a(t+1) = \frac{1}{n\_a(t)} \sum\_{b \in N\_a(t)} \theta\_b(t) \tag{20}$$

where *n<sub>a</sub>*(*t*) is the number of elements in *N<sub>a</sub>*(*t*). Similar to Equation (18), Equation (20) can be rewritten as,

$$\theta(t+1) = \widetilde{I}(t)\,\theta(t) \tag{21}$$

where *θ*(*t*) ≜ (*θ*<sub>1</sub>(*t*), . . . , *θ<sub>N</sub>*(*t*))<sup>T</sup>, and the entries of the matrix *Ĩ*(*t*) are,

$$\widetilde{i}\_{ab}(t) = \begin{cases} \frac{1}{n\_a(t)}, & \text{if } (a,b) \in \varepsilon\_t\\ 0, & \text{otherwise} \end{cases} \tag{22}$$
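
To make the matrix form concrete, the snippet below builds the weighted average matrix *I*(*t*) of Equation (19) for a toy neighbor graph and applies Equation (18); the neighbor sets and headings are illustrative, and each agent's neighbor set is assumed to include the agent itself.

```python
import numpy as np

def weighted_average_matrix(theta, neighbors):
    """Equation (19): i_ab(t) = cos(theta_b) / sum_{m in N_a} cos(theta_m)
    for pairs (a, b) that are neighbors, and 0 otherwise."""
    n = len(theta)
    I = np.zeros((n, n))
    for a in range(n):
        denom = sum(np.cos(theta[m]) for m in neighbors[a])
        for b in neighbors[a]:
            I[a, b] = np.cos(theta[b]) / denom
    return I

theta = np.array([0.3, -0.2, 0.6])
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}   # toy neighbor sets N_a(t)
I = weighted_average_matrix(theta, neighbors)
# Equation (18): tan(theta(t+1)) = I(t) tan(theta(t)); recover the new headings.
print(np.arctan(I @ np.tan(theta)))
```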

#### *4.5. Synchronization and Connectivity*

To continue the study of the synchronization of the designed algorithm and the connectivity of the associated neighboring graphs, we should formally describe synchronization. If the headings of each agent match the following criteria, the system will achieve synchronization.

$$\lim\_{t \to \infty} \theta\_a(t) = \theta, \ a = 1, \dots, N \tag{23}$$

where *θ* depends on the starting values {*θ<sub>a</sub>*(0), *x<sub>a</sub>*(0), *y<sub>a</sub>*(0), *a* = 1, . . . , *N*} and the system parameters *v* and *r*.

Considering the model in Equations (14)–(16), let *θ<sub>a</sub>*(0) ∈ (−*π*/2, *π*/2), *a* = 1, . . . , *N*, and assume that the neighbor graph at the start, G<sub>0</sub> = {V, *ε*<sub>0</sub>}, is connected. Therefore, to achieve synchronization, the system will have to satisfy,

$$v \le \frac{d}{\Delta\_0} \left( \frac{\cos \overline{\theta}}{N} \right)^N \tag{24}$$

whereas the number of agents is represented by *N*. Meanwhile,

$$\overline{\theta} = \max\_{a} |\theta\_a(0)| \tag{25}$$

$$d = r - \max\_{(a,b) \in \varepsilon\_0} d\_{ab}(0) \tag{26}$$

$$\Delta\_0 = \max\_{a,b} \left\{ \tan \theta\_a(0) - \tan \theta\_b(0) \right\} \tag{27}$$

Considering the models in Equations (14), (15) and (20), let *θ<sub>a</sub>*(0) ∈ [0, 2*π*), and let us assume that the starting neighbor graph is connected. Therefore, to achieve synchronization, the system will have to satisfy,

$$v \le \frac{d\left(\frac{1}{N}\right)^N}{2\pi} \tag{28}$$

whereas *d* is the same as described in Equation (26).
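
As a numerical illustration of the sufficient conditions in Equations (24)–(28), the snippet below evaluates both velocity bounds for a toy initial configuration, computing *d*, θ̄, and ∆<sub>0</sub> as defined above; the positions, headings, and radius are made up, and the helper name is ours.

```python
import numpy as np

def velocity_bounds(pos, theta, r):
    """Equations (24)-(28): sufficient velocity bounds for synchronization,
    assuming the initial neighbor graph is connected."""
    n = len(pos)
    edges = [(a, b) for a in range(n) for b in range(a + 1, n)
             if np.linalg.norm(pos[a] - pos[b]) < r]
    d = r - max(np.linalg.norm(pos[a] - pos[b]) for a, b in edges)   # Equation (26)
    theta_bar = np.max(np.abs(theta))                                # Equation (25)
    delta0 = np.max(np.tan(theta)) - np.min(np.tan(theta))           # Equation (27)
    v_bound_eq24 = d / delta0 * (np.cos(theta_bar) / n) ** n         # Equation (24)
    v_bound_eq28 = d * (1.0 / n) ** n / (2.0 * np.pi)                # Equation (28)
    return v_bound_eq24, v_bound_eq28

pos = np.array([[0.0, 0.0], [1.0, 0.2], [0.5, 1.0]])
theta = np.array([0.3, -0.4, 0.1])   # within (-pi/2, pi/2) so Equation (24) applies
print(velocity_bounds(pos, theta, r=2.0))
```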

## *4.6. Dynamic Leader Selection*

Due to communication problems between agents, the formation structure of multi-agent systems might occasionally change. Considering a random failure of communication, each connection (*a*, *b*) ∈ *E* fails independently with probability *p*. Let *Ĝ* be the graph topology and *E<sub>Ĝ</sub>*(*t<sub>conv</sub>*) be the expected convergence time for this model of communication failure, while *E<sub>Ĝ</sub>*(·) denotes the expectation of its argument across the group of network structures represented by *Ĝ*. The convergence rate is maximized by minimizing *E<sub>Ĝ</sub>*(*t<sub>conv</sub>*). As a result, the formula for choosing a leader *k* to maximize the convergence rate is as follows:

$$\max\_{\mathbf{Y}} \mathbf{E}\_{\hat{\mathbf{G}}} \Big( \min\_{\mathbf{x}(0)} \mathbf{x}(0)^{T} (\mathbf{Y}\hat{\mathbf{L}} + \hat{\mathbf{L}}\mathbf{Y}) \mathbf{x}(0) \Big) \tag{29}$$

subject to the following conditions,

$$\begin{cases} \mathrm{tr}(Y) \ge n - k \\ Y\_{aa} \in \{0, 1\}, & \forall a \in V \\ Y\_{ab} = 0, & \forall a \ne b \end{cases} \tag{30}$$

As such, the objective function is in line with the projected convergence rate for potential network topologies, where min<sub>*x*(0)</sub> *x*(0)<sup>T</sup>(*YL̂* + *L̂Y*)*x*(0) is a convex function of *Y*.
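
A hedged Monte-Carlo sketch of the leader selection rule in Equations (29) and (30) for *k* = 1: we assume the inner minimization is taken over unit-norm initial states, so that it equals the smallest eigenvalue of the symmetric matrix *YL̂* + *L̂Y*; the graph, failure probability, and sample count are illustrative, and the function names are ours.

```python
import numpy as np

def expected_objective(adj, leader_set, p=0.2, samples=200, rng=None):
    """Monte-Carlo estimate of Equation (29) for one candidate leader set:
    each link fails independently with probability p; Y marks followers
    (Equation (30)); the inner minimization over unit-norm x(0) is assumed
    to equal the smallest eigenvalue of Y*L_hat + L_hat*Y."""
    rng = rng or np.random.default_rng(0)
    n = adj.shape[0]
    Y = np.eye(n)
    for a in leader_set:
        Y[a, a] = 0.0                            # leaders get Y_aa = 0
    edges = [(a, b) for a in range(n) for b in range(a + 1, n) if adj[a, b]]
    total = 0.0
    for _ in range(samples):
        A = np.zeros((n, n))
        for a, b in edges:
            if rng.random() > p:                 # link survives this sample
                A[a, b] = A[b, a] = 1.0
        L_hat = np.diag(A.sum(axis=1)) - A       # graph Laplacian of the sample
        total += np.linalg.eigvalsh(Y @ L_hat + L_hat @ Y).min()
    return total / samples

# Toy 4-node path graph; pick the single leader (k = 1) maximizing Equation (29).
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
best = max(range(4), key=lambda a: expected_objective(adj, {a}))
print("selected leader:", best)
```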

#### *4.7. B-Spline Path Smoothing*

The route produced by the hybrid algorithm consists mostly of a combination of line segments. The B-spline curve method is utilized to ensure the smoothness of the generated route. The B-spline method is an improvement over the Bezier approach that preserves convexity and geometric invariance.

The B-spline path smoothing can be written as:

$$P(u) = \sum\_{i=0}^{n} d\_i N\_{i,j}(u) \tag{31}$$

Considering Equation (31), *d<sub>i</sub>* (*i* = 0, 1, . . . , *n*) are the control points, and *N<sub>i,j</sub>*(*u*) are the normalized B-spline basis functions of degree *j*. These can be described as:

$$\begin{cases} N\_{i,0}(u) = \begin{cases} 1, & \text{if } u\_i \le u \le u\_{i+1} \\ 0, & \text{otherwise} \end{cases} \\ N\_{i,j}(u) = \frac{u - u\_i}{u\_{i+j} - u\_i} N\_{i,j-1}(u) + \frac{u\_{i+j+1} - u}{u\_{i+j+1} - u\_{i+1}} N\_{i+1,j-1}(u) \\ \text{define } \frac{0}{0} = 0 \end{cases} \tag{32}$$

The basis functions of the B-spline curve are determined by the parametric knots *u*<sub>0</sub> ≤ *u*<sub>1</sub> ≤ . . . ≤ *u*<sub>*n*+*j*</sub>. In contrast to the Bezier curve, altering a single control point only affects the B-spline curve locally. Another benefit over the Bezier curve is that the degree of the polynomials does not increase when the number of control points increases.
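
A minimal sketch of Equations (31) and (32), assuming the standard Cox–de Boor recursion and a clamped knot vector: the planner's piecewise-linear waypoints act as control points and are smoothed into a B-spline curve. The waypoints, degree, and sampling are illustrative.

```python
import numpy as np

def basis(i, j, u, knots):
    """Equation (32): normalized basis function N_{i,j}(u) via the standard
    Cox-de Boor recursion, with the 0/0 := 0 convention."""
    if j == 0:
        # Half-open span avoids double counting at interior knots.
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    def ratio(num, den):
        return num / den if den != 0.0 else 0.0          # define 0/0 = 0
    left = ratio(u - knots[i], knots[i + j] - knots[i]) * basis(i, j - 1, u, knots)
    right = ratio(knots[i + j + 1] - u, knots[i + j + 1] - knots[i + 1]) * basis(i + 1, j - 1, u, knots)
    return left + right

def bspline_path(control_points, degree=3, samples=50):
    """Equation (31): P(u) = sum_i d_i * N_{i,j}(u) over a clamped knot vector."""
    d = np.asarray(control_points, dtype=float)
    n = len(d) - 1
    knots = np.concatenate((np.zeros(degree), np.linspace(0.0, 1.0, n - degree + 2), np.ones(degree)))
    us = np.linspace(0.0, 1.0, samples, endpoint=False)
    return np.array([sum(basis(i, degree, u, knots) * d[i] for i in range(n + 1)) for u in us])

# Toy waypoint path (line segments from the planner) smoothed into a curve.
waypoints = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.5), (4.0, 0.0), (6.0, 1.0)]
print(bspline_path(waypoints, samples=5))
```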
