**1. Introduction**

In recent years, unmanned aerial vehicles (UAVs) have attracted growing attention. UAVs offer several distinct benefits over conventional crewed aircraft, particularly in terms of operating cost, operator safety, functionality in arduous or risky settings, and availability for civil applications [1]. Recent technological developments have made it easy to set up an unmanned aerial system with a complex topology for critical operations [2]. The swift development of UAVs and their intense involvement in intelligent transportation (IT) have significantly shaped the path that drone communities have attempted to establish for prospective UAV systems. Present advances in decentralized technology allow for diverse operations and the coordination of resources [3]. This technique makes reliance on critical elements unnecessary and enhances the system's overall robustness [4]. Nevertheless, many contemporary developments in the networked UAV fleet domain concentrate on the path to attaining a drone network (DN) [5]. Little regard is given to the cyber security of DN systems, leaving even very advanced DN systems defenseless against diverse security threats (STs) [6,7].

**Citation:** Abdulsattar, N.F.; Abedi, F.; Ghanimi, H.M.A.; Kumar, S.; Abbas, A.H.; Abosinnee, A.S.; Alkhayyat, A.; Hassan, M.H.; Abbas, F.H. Botnet Detection Employing a Dilated Convolutional Autoencoder Classifier with the Aid of Hybrid Shark and Bear Smell Optimization Algorithm-Based Feature Selection in FANETs. *Big Data Cogn. Comput.* **2022**, *6*, 112. https://doi.org/10.3390/bdcc6040112

Academic Editors: Yang-Im Lee and Peter R.J. Trim

Received: 9 September 2022; Accepted: 27 September 2022; Published: 11 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Ensuring the confidentiality, availability, and integrity of data during UAV-to-UAV transmission, together with the safety of UAV-to-ground-node transmission, remains a major problem for flying ad hoc networks (FANETs). In FANETs, UAVs transfer data that include audio, video, images, text, GPS positions, and other formats. These data must be transmitted with good quality of service (QoS), meaning low delay and low error rates [8]. In different deployments, FANETs carry highly significant data that must be delivered within time bounds, so the networks must remain highly dependable [9].

Compromised FANET-IoT devices (IoTD) exhibit no signs of being hacked and function as zombies for the botmaster (BM) when attacks are initiated [10]. A botnet (BN) may be small, comprising hundreds of bots, while a larger BN can have thousands of bots. Small groups of bots are available on the dark web very cheaply, while enormous BNs are costly [11].

There are two kinds of BNs: (i) BNs that accept commands and remain in constant interaction with the BM within a client–server framework; (ii) peer-to-peer bots that communicate independently with one another and initiate attacks after obtaining the BM's commands. BMs interact with bots through a command-and-control (CnC) server; the bots remain concealed until the BM gives commands. This concealed behavior makes detecting infected bots and a botnet attack (BA) an intricate job [12]. BAs include the following: (i) scan commands used to discover vulnerable IoTD; (ii) ACK, SYN, UDP, and TCP flooding; (iii) combination attacks that open a connection and send spam over it [13]. The current drawback in UAV-assisted FANETs is the lack of effective detection of security threats. To address this, the feature selection and classification methods need improvement. The contributions of this study are described below:


The proposed hybrid HSBSOpt\_DCA approach allows for more precise multiclass classification, covering various types of attacks and non-attacks (NAs), and has shown encouraging results. The remainder of this paper is organized as follows. Section 2 provides a state-of-the-art literature review. Section 3, Materials and Methods, discusses the dataset used and the proposed methods. Section 4 provides a detailed analysis and the results. Section 5 concludes the article.

#### **2. Related Works**

In [14], Fried and Last proposed a novel and promising technique that uses large-scale, publicly accessible flight records to train machine learning (ML) models, which could identify anomalous flight patterns and proved to be an effective countermeasure against many ADS-B attacks. Unlike formerly proposed methodologies, this technique integrates simply with the present ADS-B system. In [15], Mall et al. discussed unsupervised settings with sensors fixed in specific regions, where the data can be gathered via mobile devices attached to a UAV or drone. The authors first modeled an appropriate framework and a lightweight protocol for establishing secure transmission between the devices and the cloud through a mobile drone. This protocol also exploits the advantages of the physically unclonable function (PUF) for key creation, and the key is employed to encrypt the messages in transmission. The well-known Scyther tool is used to simulate the protocol, and the outcomes show that it remains fully secure, preventing leakage of confidential data.

In [16], Mairaj et al. studied the benefits of game-theoretic (GT) approaches for preventing DDoS attacks (DDoSAs) on a drone, deriving data from standard game solutions and refining them with a bounded-rationality concept named the quantal response equilibrium (QRE). The authors identified possible strategies for every player via simulations and devised five non-cooperative game scenarios for two versions of the DDoSAs. In such games, the conventional GT solution, or Nash equilibrium (NashE), gives information on the drone's suggested modes, the attacker's preferred strategy, and the GT threshold (TH), presuming that the players are perfectly rational.

In [17], Popoola et al. proposed a federated deep learning (FDL) methodology for zero-day BA detection that prevents data privacy leakage in IoT-edge devices (IoTEG). The study uses an optimal deep neural network (ODNN) framework for network traffic (NT) classification. A model parameter server (MPS) remotely coordinates the training of the DNN models in several IoTEGs, while the federated averaging algorithm aggregates the local model updates. A global DNN model is generated after many communication rounds between the MPS and the IoTEGs.

In [18], Hatzivasilis et al. introduced WARDOG, an awareness and digital forensic system that notifies the end-user of a BN infection, reveals the BN structure, and captures verifiable data that can then be used in a court of law. The responsible administration system collects the data and automatically creates documentation for each case. The documentation comprises authentic forensic data tracing all involved entities and their roles in the attack.

In [19], Xi et al. proposed a new deep learning framework that combines dilated convolutional neural networks (CNNs) and recurrent neural networks. The stacked dilated convolutional networks perform effective feature selection, and a softmax classifier is used to recognize activities, which increases classification accuracy. In [20], Alharbi and Alsubhi proposed a graph-based machine learning (ML) technique for botnet detection. Filter-based methods are used for feature evaluation, which exhibit robustness to zero-day attacks. This method achieved high precision, but its accuracy was moderate. In [21], Sung et al. presented a new methodology for detecting malware in ground control stations (GCSs), which employed a fastText model to generate lower-dimensional vectors than those produced by one-hot encoding (OhE) and a bidirectional LSTM model applied to sequential opcodes (SO). Furthermore, API function names were used to enhance the classification accuracy of the SO. The experiments used the Microsoft malware classification challenge dataset, in which the malware is classified by family type. The proposed methodology exhibited a performance improvement of 1.8% compared with the OhE-based technique.

In [22], Shitharth and Prasad proposed an approach for supervisory control and data acquisition (SCADA) systems that combines the Markov chain clustering (MCC) technique, the rapid probabilistic correlated optimization (RPCO) approach, and the block-correlated neural network (BCNN) method to improve the accuracy of the network. However, it failed to improve the cost-effectiveness of the process. Several studies have performed intrusion and malware detection. Nevertheless, there is a lack of research addressing the problems of BN detection and feature extraction, dimensionality reduction to suppress spurious data, overfitting, and careful parameter tuning. Many research studies have employed real BA datasets in real settings.

Furthermore, studies have analyzed ML models on synthetic BN data without provision for feature engineering or an exhaustive overfitting analysis. Many studies have employed imbalanced live datasets for learning and BN detection. These studies chiefly concentrate on achieving higher accuracy, without discussing the constraints of highly imbalanced datasets or the risk of obtaining only apparent accuracy. Table 1 summarizes the earlier research studies and their limitations.


**Table 1.** Summary and limitations of some existing studies.

### **3. Proposed HSBSOpt\_DCA**

Sets of UAVs can be linked with one another to function as a relay that transfers data out of a remote area (RA) network. Generally, the UAVs have a surveillance mission and the task of creating a relay network for gathering data from RAs, such as a desert or jungle. The UAVs' mobility and versatility make it easy to reach these RAs and provide network connectivity. Nevertheless, with minor effort, an attacker could easily hijack the system. The lack of a fixed infrastructure and the vulnerable wireless medium within FANETs make the nodes susceptible to attackers.

Traffic data from the N-BaIoT dataset are pre-processed using the one-hot encoding (OhE) method. The pre-processed data are then passed to the feature selection (FS) step, which uses the hybrid shark and bear smell optimization algorithm (HSBSOA), after which classification is performed using a dilated convolutional autoencoder (DCAE). The proposed HSBSOpt\_DCA (Figure 1) consists of several segments: the dataset description, pre-processing using OhE, FS using HSBSOA, optimization initialization, odor absorption, frontward motion (FtM) toward the target, rotatory motion, updating the particle location, attaining the GS and LS, and classification using DCAE.

**Figure 1.** Block schematic illustration for attack classification.

#### *3.1. Dataset Description*

The N-BaIoT dataset [23] comprises traffic data from nine industrial IoTD, of which seven devices provide instances for eleven classes and the other two (Ennio\_doorbell and Samsung\_SNH\_1011\_N\_Webcam) provide data for six classes. The data consist of benign traffic and diverse malicious attacks such as scan, TCP, UDP, and SYN attacks. The current version of the dataset contains eighty-nine csv files, with a total size of 7.58 GB and 1,486,418 instances of normal and attack events. The two BAs, MIRAI and BASHLITE, have been classified into ten attack classes (AC) and NA. The AC includes:


#### *3.2. Pre-Processing Employing OhE*

A categorical column (CC) is a column containing classes with low cardinality. In the N-BaIoT dataset, four columns are identified as CCs, specifically 'Dir', 'Proto', 'sTos', and 'dTos'. The first column comprises seven classes, the second fifteen, the third six, and the fourth five. OhE is the procedure of transforming CCs into vectors of zeros and ones. Columns with two and three classes have vector lengths of two and three, respectively. Transforming a five-class CC into a vector of zeros and ones with a length of five, however, produces a multicollinearity problem (MP).

The MP leads to redundant data and correlated predictors. The MP can be resolved by dropping one of a column's OhE classes. Thus, a column with five classes has a vector length of four rather than five. For N-BaIoT, the four CCs therefore yield twenty-nine OhE columns (6 + 14 + 5 + 4). Every categorical feature (CF) with *m* possible categorical values is converted into a value in $\mathbb{R}^m$ using a function *e*, which maps the feature's *j*th value to the *j*th element of the *m*-dimensional vector:

$$e(x_i) = (0, \dots, 1, \dots, 0) \text{ if } x_i = j \tag{1}$$

The two numerical features are scaled using each feature's mean *π* and standard deviation *β*:

$$m(x_i) = \frac{x_i - \pi}{\beta} \tag{2}$$

Pre-processing transforms the network traffic (NT) into a sequence of observations in which every observation is represented as a feature vector (FV). The observations are optionally labelled by their class as 'normal' or 'anomalous'. Such FVs are then suitable as inputs for data mining or ML algorithms.
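As a minimal sketch of the pre-processing described in this subsection, the following Python snippet one-hot encodes a categorical column while dropping one class (Equation (1) with the multicollinearity fix) and standardizes a numerical column (Equation (2)). The function names and the toy protocol values are illustrative assumptions, not part of the original study, and the population standard deviation is assumed for *β*.

```python
from statistics import mean, pstdev


def one_hot_drop_first(values):
    """One-hot encode a categorical column, dropping the first (sorted) class.

    Dropping one class gives vectors of length m - 1 instead of m,
    which avoids the multicollinearity problem described in the text.
    """
    categories = sorted(set(values))
    kept = categories[1:]  # the dropped class is encoded as all zeros
    encoded = [[1 if v == c else 0 for c in kept] for v in values]
    return encoded, kept


def standardize(values):
    """Scale a numerical column as (x - pi) / beta, as in Equation (2).

    Assumption: beta is the population standard deviation.
    """
    pi = mean(values)
    beta = pstdev(values)
    if beta == 0:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - pi) / beta for v in values]


# Toy example with hypothetical protocol values (not taken from N-BaIoT).
encoded, kept = one_hot_drop_first(["tcp", "udp", "icmp", "tcp"])
scaled = standardize([1.0, 2.0, 3.0])

# Column count stated in the text: classes (7, 15, 6, 5), one dropped per column.
ohe_columns = sum(m - 1 for m in (7, 15, 6, 5))
print(kept, encoded, ohe_columns)
```

Dropping the first sorted class is an arbitrary choice made here for determinism; any single class per column could be dropped with the same effect on multicollinearity.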

#### *3.3. FS Employing HSBSOA*

The motivation behind the shark smell optimization (SSO) algorithm is the shark's ability to capture prey in a short time by using its strong sense of smell (SoS). A bear's olfactory bulb is many times bigger than that of other animals, and its main job is to forward smell data from the nose to the brain. In the bear smell optimization (BSO) methodology, the bear's SoS is exemplary in seeking food at 1000 miles and beyond (known as the global solution (GS) in optimization). As bears cannot see food that far away, the mathematical model based on the SoS provides a decisive way of seeking such goals. By merging these two algorithms, a better fitness value (FtV) can be acquired for the FS procedure.

#### *3.4. Initialization Procedure*

The initial solution (IS) for the population of the SSO algorithm (SSOA) should be generated randomly inside the search space (SSp). Every IS represents an odor particle (OP) that indicates a possible shark location at the start of the search procedure. The IS vector is given in Equations (3) and (4), where $x_i^1$ is the *i*th initial position vector of the population and *NP* is the population size:

$$X^1 = \left[x_1^1, x_2^1, \dots, x_{NP}^1\right] \tag{3}$$

The corresponding optimization problem can be expressed as:

$$x_i^1 = \left[x_{i,1}^1, x_{i,2}^1, \dots, x_{i,ND}^1\right] \tag{4}$$

where $x_{i,j}^1$ represents the *j*th dimension of the shark's *i*th position and *ND* represents the number of decision variables. In the BSO methodology, the bear's nose absorbs different smells, each of which indicates a location to move to, since everything in the ecosystem has a distinct odor. Several of these are called local solutions (LS). The specific smell of the desired food is the final solution and is regarded as the GS. Let $F_i = [fc_i^1, fc_i^2, \dots, fc_i^j, \dots, fc_i^k]$ be the *i*th received smell with *k* components or particles, which is designed to solve the optimization problem $x_i^1 = [x_{i,1}^1, x_{i,2}^1, \dots, x_{i,ND}^1]$. As the bear receives *N* smells during the breathing period, the IS is a matrix $FM = [fc_i^j]_{N \times k}$. According to the glomerular layer process and the breathing action in a sniff cycle, $DS_i^j$ denotes the *j*th smell component of the *i*th smell. Based on the mathematical formulas, we obtain two conditions, $t_{inhale} \le t \le t_{exhale}$ and $t_{exhale} \le t$, with the presence of fairness, which includes the balanced energy needed to maintain the traffic in the transmission line:

$$DS_i^j = MG_i\,(t - t_{inhale}) + DS_i^{t_{inhale}} + BE_i\,(t - t_{inhale}) \tag{5}$$

Equation (5) holds for the condition $t_{inhale} \le t \le t_{exhale}$, where $t_{inhale}$ represents the inhalation time (IT) and $BE_i(t - t_{inhale})$ denotes the balanced energy required during the inhalation process. The exhalation phase is described by:

$$DS_i^j = DS_i^{t_{exhale}} * BE_i^{t_{exhale}} \exp\left(\frac{t_{exhale} - t}{\varepsilon_{exhale}}\right) \tag{6}$$

Equation (6) holds for the condition $t_{exhale} \le t$, where $t_{exhale}$ represents the exhalation time (ET) and $BE_i^{t_{exhale}}$ denotes the balanced energy required during the exhalation process. In the optimization procedure, the total duration of a breathing cycle is equal to *k*, the length of the *i*th smell, and the smell components are split into two sets according to the ET and IT.

The total balanced energy is the sum of the vital energy (VE) and the energy loss (EL), and is expressed mathematically as:

$$BE_{total} = BE_{vital} + BE_{loss} \tag{7}$$

where $BE_{vital}$ denotes the energy dissipated during inhalation and exhalation and $BE_{loss}$ denotes the transmission loss that occurs.
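The random initialization described at the start of this subsection (Equations (3) and (4)) can be sketched as follows. This is a minimal illustration, assuming a box-shaped search space with hypothetical bounds `lower` and `upper` and uniform sampling; the text only states that the initial solutions are generated randomly inside the search space.

```python
import random


def init_population(np_size, nd, lower, upper, rng=None):
    """Return NP random position vectors, each with ND decision variables.

    Assumption: every decision variable is sampled uniformly in [lower, upper].
    """
    rng = rng or random.Random()
    return [
        [rng.uniform(lower, upper) for _ in range(nd)]  # x_i^1, Equation (4)
        for _ in range(np_size)                         # X^1, Equation (3)
    ]


# Example: 5 odor particles in a 3-dimensional search space.
population = init_population(np_size=5, nd=3, lower=-1.0, upper=1.0,
                             rng=random.Random(0))
print(len(population), len(population[0]))
```

Passing a seeded `random.Random` instance keeps runs reproducible, which is convenient when comparing feature selection results across experiments.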

#### *3.5. Odor Absorption (OA)*

For odor absorption, the mitral and granular parts contain the receptor sensitivity, the OA, and the input data, which are represented as $OB_{MG} = [OB_{MG}^1, OB_{MG}^2, \dots, OB_{MG}^i, \dots, OB_{MG}^N]$. In this setting, $DS_i^j = 0$ indicates that there is no smell in the olfactory epithelium prior to the subsequent inhalation. The non-negative array can be computed as:

$$OB_{MG}^{i}(F_i) = \frac{1}{k} \sum_{j=1}^{k} f\left(fc_i^j\right), \quad f\left(fc_i^j\right) * S_{factor} \tag{8}$$

where *k* indicates the length of the *i*th odor. Equation (8) applies under two conditions on the threshold value, $V_t \le fc_i^j$ and $V_t \ge fc_i^j$, where the arrays based on the odor data represent the mean value. Here, $S_{factor}$ denotes the satisfaction factor, which is expressed as:

$$S_{factor} = W * \sum_{i=1}^{N} (1 - W) \tag{9}$$

where *N* denotes the total number of odor-absorbing mitral cells and *W* denotes the weight factor. The neural dynamics arising from the granular and mitral (GM) layers are calculated as:

$$\begin{aligned} X &= -H_0 \omega_y(Y) - \alpha_x X + \sum L_0 \omega_y(X) + DS + \left(E_{initial} - E_{least}\right) \\ Y &= W_0 \omega_x(X) - \alpha_y \left(Y + DS_c + \left(E_{initial} - E_{least}\right)\right) \end{aligned} \tag{10}$$

where $X = \{x_1, x_2, \dots, x_n\}$ and $Y = \{y_1, y_2, \dots, y_n\}$ represent the G-M cell (GMC) activities; $DS = \{ds_1, ds_2, \dots, ds_n\}$ and $DS_c = \{ds_{c1}, ds_{c2}, \dots, ds_{cn}\}$ represent the external inputs to the mitral cells and the central inputs to the granule cells, respectively; $E_{initial}$ denotes the initial energy and $E_{least}$ denotes the lowest energy unit.

#### *3.6. Frontward Motion (FtM) toward the Target*

When blood is released into the water, a shark with velocity V moves toward the stronger OPs in every position to get nearer to the prey (target). Thus, the velocity in each dimension is computed as:

$$v_{i,j}^k = \mu_k \cdot R1 \cdot \frac{\partial (OF)}{\partial x_j}\bigg|_{x_{i,j}^k} \tag{11}$$

where $k = 1, 2, \dots, k_{max}$; $\partial(OF)/\partial x_j$ is the derivative of the objective function (*OF*) at location $x_{i,j}^k$; $k_{max}$ indicates the maximum number of stages for the forward motion of the shark; *k* indicates the stage number; $\mu_k$ is a value in the interval [0, 1]; and *R*1 is a random number in the interval [0, 1]. An increase in odor intensity determines the increase in the shark's velocity. Owing to inertia, the shark's acceleration is limited. Thus, the shark's current velocity depends on its former velocity, which is obtained by modifying Equation (11) as follows:

$$v_{i,j}^k = \mu_k \cdot R1 \cdot \frac{\partial (OF)}{\partial x_j}\bigg|_{x_{i,j}^k} + \alpha_k \cdot R2 \cdot v_{i,j}^{k-1} \tag{12}$$

where $\alpha_k$ is the inertia coefficient in the interval [0, 1], $v_{i,j}^{k-1}$ is the shark's former velocity, and *R*2, like *R*1, is a random number in the interval [0, 1]. As a result of the shark's FtM, its new location is $Y_i^{k+1}$, which is determined by its former location ($x_i^k$) and velocity ($v_i^k$). Hence, the shark's new location can be described as:

$$Y_i^{k+1} = x_i^k + v_i^k \cdot \Delta t_k \tag{13}$$

where $\Delta t_k$ denotes the time interval of stage *k*, which can be assumed to be one for simplicity. The pseudocode for the frontward motion is as follows:

Begin

Calculate velocity V

Update the position of the target prey

Velocity of each shark ($v_{i,j}^k$):

> $v_{i,j}^k = \mu_k \cdot R1 \cdot \partial(OF)/\partial x_j$

Find the maximum number of stages for forward motion

Release the odor and find its intensity

Update the shark's new location

End
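The forward motion step can be illustrated with a short Python sketch of the velocity and position updates in Equations (12) and (13). Several details are assumptions made for illustration only: the gradient is estimated by central differences, a single *R*1 and *R*2 are drawn per step, and the default values of `mu`, `alpha`, and `dt` as well as the toy objective are hypothetical.

```python
import random


def numerical_gradient(objective, x, h=1e-6):
    """Central-difference estimate of d(OF)/dx_j at position x (assumption)."""
    grad = []
    for j in range(len(x)):
        up = x[:j] + [x[j] + h] + x[j + 1:]
        down = x[:j] + [x[j] - h] + x[j + 1:]
        grad.append((objective(up) - objective(down)) / (2 * h))
    return grad


def forward_move(objective, x, v_prev, mu=0.9, alpha=0.1, dt=1.0, rng=None):
    """One forward-motion step: returns the new location Y and velocity v."""
    rng = rng or random.Random()
    r1, r2 = rng.random(), rng.random()  # R1, R2 in [0, 1)
    grad = numerical_gradient(objective, x)
    # Equation (12): gradient term plus inertia term from the former velocity.
    v = [mu * r1 * g + alpha * r2 * vp for g, vp in zip(grad, v_prev)]
    # Equation (13): new location from the former location and the velocity.
    y = [xi + vi * dt for xi, vi in zip(x, v)]
    return y, v


def toy_objective(p):
    """Hypothetical objective to maximize, with its optimum at the origin."""
    return -sum(c * c for c in p)


y, v = forward_move(toy_objective, [1.0, 1.0], [0.0, 0.0], rng=random.Random(0))
print(y, v)
```

Because the velocity follows the gradient of the objective, this update moves toward higher objective values, consistent with the argmax selection used later in the algorithm.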

#### *3.7. Rotatory Motion (RM)*

The shark also has an RM that is used to discover stronger OPs. This procedure in the SSOA can be regarded as a local search (LcS), which is defined as:

$$Z_i^{k+1,m} = Y_i^{k+1} + R3 \cdot Y_i^{k+1} \tag{14}$$

in which $m = 1, 2, \dots, M$, and *R*3 denotes a random number in the interval [−1, 1]. In the LcS, several points (*M*) are connected to form closed contour lines and to model the shark's RM within the SSp.
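A minimal Python sketch of the rotational local search is given here, assuming the form $Z = Y + R3 \cdot Y$ with one random *R*3 drawn per candidate point. The function name and the example values are illustrative assumptions.

```python
import random


def rotational_candidates(y, m_points, rng=None):
    """Generate M local-search points around the location Y.

    Assumption: one R3 in [-1, 1] is drawn per candidate and applied
    to every dimension of Y.
    """
    rng = rng or random.Random()
    candidates = []
    for _ in range(m_points):
        r3 = rng.uniform(-1.0, 1.0)
        candidates.append([yj + r3 * yj for yj in y])
    return candidates


candidates = rotational_candidates([1.0, 2.0], m_points=4, rng=random.Random(0))
print(candidates)
```

With this form, each candidate is a rescaling of Y by a factor between 0 and 2, so the search stays in the neighborhood of the current location.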

#### *3.8. Updating the Particle Location*

The shark's search path continues with the RM as it moves nearer to the point with the stronger odor. This feature of the SSOA can be described by:

$$x_i^{k+1} = \operatorname{argmax}\left\{ OF\left(Y_i^{k+1}\right), OF\left(Z_i^{k+1,1}\right), \dots, OF\left(Z_i^{k+1,M}\right) \right\} \tag{15}$$

in which $x_i^{k+1}$ is the shark's subsequent location with the greatest *OF* value.
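The selection rule in Equation (15) amounts to keeping whichever of the forward-motion location and the rotational candidates scores highest. The following sketch shows this with a hypothetical objective; the function names and the sample points are assumptions for illustration.

```python
def select_next_position(objective, y, candidates):
    """Return the point with the greatest OF value among Y and the Z points."""
    return max([y] + candidates, key=objective)


def toy_objective(p):
    """Hypothetical objective to maximize, with its optimum at the origin."""
    return -sum(c * c for c in p)


y = [1.0, 1.0]
z_points = [[0.5, 0.5], [2.0, 2.0]]
next_position = select_next_position(toy_objective, y, z_points)
print(next_position)  # [0.5, 0.5], the point closest to the origin
```

Including Y itself in the comparison means the local search can never make the shark's location worse than the result of the forward motion.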
