*Article* **Squirrel Search Optimization with Deep Transfer Learning-Enabled Crop Classification Model on Hyperspectral Remote Sensing Imagery**

**Manar Ahmed Hamza <sup>1,\*</sup>, Fadwa Alrowais <sup>2</sup>, Jaber S. Alzahrani <sup>3</sup>, Hany Mahgoub <sup>4</sup>, Nermin M. Salem <sup>5</sup> and Radwa Marzouk <sup>6</sup>**


**Abstract:** With recent advances in remote sensing image acquisition and the increasing availability of fine spectral and spatial information, hyperspectral remote sensing images (HSIs) have received considerable attention in several application areas, such as agriculture, environment, forestry, and mineral mapping. HSIs have become an essential method for distinguishing crop classes and accomplishing growth information monitoring for precision agriculture, depending upon the fine spectral response to the crop attributes. The recent advances in computer vision (CV) and deep learning (DL) models allow for the effective identification and classification of different crop types on HSIs. This article introduces a novel squirrel search optimization with a deep transfer learning-enabled crop classification (SSODTL-CC) model on HSIs. The proposed SSODTL-CC model intends to identify the crop type in HSIs properly. To accomplish this, the proposed SSODTL-CC model initially derives a MobileNet with an Adam optimizer for the feature extraction process. In addition, an SSO algorithm with a bidirectional long short-term memory (BiLSTM) model is employed for crop type classification. To demonstrate the better performance of the SSODTL-CC model, a wide-ranging experimental analysis is performed on two benchmark datasets, namely dataset-1 (WHU-Hi-LongKou) and dataset-2 (WHU-Hi-HanChuan). The comparative analysis pointed out the better outcomes of the SSODTL-CC model over other models, with a maximum accuracy of 99.23% and 97.15% on test datasets 1 and 2, respectively.

**Keywords:** hyperspectral remote sensing; crop mapping; image classification; deep transfer learning; hyperparameter optimization

#### **1. Introduction**

Due to advancements in remote sensing image acquisition mechanisms and the increasing availability of rich spatial and spectral data from various sensors, hyperspectral imaging has become more prominent [1]. In particular, hyperspectral remote sensing image (HSI) classification has become a major source for real-time applications in fields such as mineral mapping, agriculture, environment, and forestry [2,3]. Usually, the HSI is acquired at a large number of contiguous narrow spectral wavelengths for the improved analysis of earth objects. Since the spectral resolution can be on the order of nanometers, the hyperspectral sensor offers significant facility in data analysis [4] for many humanitarian tasks, including precision agriculture for improved farming practices and discrimination among vegetation classes for better treatment [5]. The current study emphasizes using and analyzing HSI in the agricultural area. Conventional techniques, such as statistics-based analyses and field surveys, are time-consuming [6]. Cutting-edge remote sensing technologies involving HSI provide an appropriate solution and might fill the gap with applications such as crop classification. In the HSI framework, classification has the common objective of automatically labeling each pixel (spectral pattern or signature) into a predetermined class [7]. The classification is implemented using either transformed features or the original features. An HSI has numerous features and is hard to adapt to a single convolutional kernel size; when the number of model layers is increased, many useful features are lost [8–10].

**Citation:** Hamza, M.A.; Alrowais, F.; Alzahrani, J.S.; Mahgoub, H.; Salem, N.M.; Marzouk, R. Squirrel Search Optimization with Deep Transfer Learning-Enabled Crop Classification Model on Hyperspectral Remote Sensing Imagery. *Appl. Sci.* **2022**, *12*, 5650. https://doi.org/10.3390/app12115650

Academic Editors: Dimitrios S. Paraforos, Giovanni Randazzo, Anselme Muzirafuti and Stefania Lanza

Received: 11 April 2022; Accepted: 18 May 2022; Published: 2 June 2022

The authors of [11] proposed a rotation-invariant local binary pattern-based weighted generalized closest neighbor (RILBP-WGCN) approach for HSI classification. The presented RILBP is an improved texture-based classification paradigm that applies LBP filters to designated bands to generate a broad sketch of the spatial texture data. Similarly, the presented WGCN approach effectively maintains the spatial uniformity among adjacent pixels by employing a local weighting method and point-to-set distances. Meng et al. [12] concentrated on DL-based crop mapping utilizing one-shot hyperspectral satellite imagery, in which three CNN architectures, namely 1D-CNN, 2D-CNN, and 3D-CNN, were executed for end-to-end crop mapping. Furthermore, a manifold learning-based visualization method, i.e., t-distributed stochastic neighbor embedding (t-SNE), was established for demonstrating the discriminative capability of the deep semantic features extracted by the distinct CNN approaches.

In [13], a hybrid model was established for estimating the chlorophyll content of crops utilizing HSI segmentation with active learning, which contains two important stages. First, it utilizes a sparse multinomial logistic regression (SMLR) method for learning the class posterior probability distributions with quadratic programming or joint probability distributions. Second, it utilizes the data developed from the preceding step for segmenting the HSI using a Markov random field segmenter. Farooq et al. [14] examined patch-based weed identification utilizing HSI. A CNN was evaluated and compared to a histogram of oriented gradients (HoG) for this solution, and appropriate patch sizes were examined. The limitations of RGB imagery were also established. In [15], a deep one-class crop (DOCC) framework that contains a DOCC feature-extraction element and an OCC extraction loss element was presented for large-scale OCC mapping. The DOCC framework takes only the instances of one target class as input for extracting the crop of interest by positive and unlabeled learning and automatically extracts the features for OCC mapping.

In [16], a low-altitude UAV hyperspectral remote sensing platform was created for collecting higher-spatial-resolution remote sensing images of degraded grassland. The GDIF-3D-CNN classifier was utilized for classifying the pure-pixel and every-pixel data sets, and its accuracy and performance were enhanced by optimizing eight parameters of the method. Wei et al. [17] present a fine classification approach based on multi-feature fusion and DL. In this case, the morphological profiles, GLCM texture, and endmember abundance features were leveraged to exploit the spatial data of the HSI. Next, the spatial data were fused with the original spectral data to generate a classification outcome by utilizing a DNN with a conditional random field (DNN + CRF) method. In detail, the DNN is a deep detection method that extracts depth features and mines the potential data.

For smaller samples and higher-dimensional HSIs, it becomes very complex to learn wide-ranging image features; subsequently, it becomes hard to precisely recognize complex HSIs. UAV-borne HSIs have rich spatial data, and the spatial resolution reaches centimeter level; however, the higher spatial resolution causes serious spatial heterogeneity and spectral variability. Nowadays, the deep learning (DL) method is extensively employed in image processing because of its effective feature-learning abilities [9]. Currently, the most common DL-based network framework is the convolutional neural network (CNN). CNN has the features of parameter sharing, equivariant mapping, and sparse interaction, which reduce the training parameter size and the complexity of the network. Such features permit the algorithm to achieve a certain degree of invariance to scaling, shifting, and distortion and also create fault tolerance and stronger robustness [10]. Consequently, CNN has been extensively employed in HSI classification.

This article introduces a novel squirrel search optimization with a deep transfer learning-enabled crop classification (SSODTL-CC) model on HSIs. The proposed SSODTL-CC model initially derives a MobileNet with an Adam optimizer for the feature extraction process. The utilization of the Adam optimizer allows for effectual adjustment of the hyperparameters of the MobileNet model. In addition, a bidirectional long short-term memory (BiLSTM) method is employed for crop type classification. To enhance the classifier efficiency of the BiLSTM model, the SSO algorithm is employed for hyperparameter optimization, which shows the novelty of the work. To demonstrate the better performance of the SSODTL-CC model, a wide-ranging experimental analysis is performed on two benchmark datasets.

#### **2. Materials and Methods**

In this article, a new SSODTL-CC model has been developed to identify the crop type in HSIs properly. To do so, the proposed SSODTL-CC model performed feature extraction using MobileNet with an Adam optimizer. In addition, the BiLSTM model received feature vectors and performed crop type classification. To enhance the classifier efficiency of the BiLSTM model, the SSO algorithm was employed for hyperparameter optimization. Figure 1 illustrates the block diagram of the SSODTL-CC technique.

**Figure 1.** Block diagram of the SSODTL-CC technique.

#### *2.1. Data Collection*

In this section, the experimental validation of the proposed model is performed against two datasets [18], namely dataset-1 (WHU-Hi-LongKou) and dataset-2 (WHU-Hi-HanChuan). The dataset-1 comprises a total of 9000 samples with nine class labels, holding 1000 samples under each class. In addition, dataset-2 comprises a total of 16,000 samples with 16 class labels, holding 1000 samples under each class. Figure 2 shows the sample HSIs from various classes, such as water spinach, soybean, strawberry, corn, sesame, and broad-leaf soybean.

**Figure 2.** Sample images: (**a**) water spinach, (**b**) soybean, (**c**) strawberry, (**d**) corn, (**e**) sesame, and (**f**) broad-leaf soybean.
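The exact loading procedure for the WHU-Hi scenes depends on how the data are distributed, but the balanced per-class sampling and the 70%/30% train/test split used in the experiments can be sketched as follows (the label layout is a placeholder mirroring dataset-1, not an actual dataset loader):

```python
import numpy as np

def stratified_split(labels, train_frac=0.7, seed=42):
    """Split sample indices class by class, mirroring the 70/30 protocol."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        cut = int(len(idx) * train_frac)
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)

# Dataset-1 layout: 9 classes with 1000 samples each (labels only).
labels = np.repeat(np.arange(9), 1000)
tr, te = stratified_split(labels)
print(len(tr), len(te))  # 6300 2700
```

Stratifying per class keeps each split balanced, which matters here because every class holds exactly 1000 samples.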

#### *2.2. Feature Extraction: MobileNet Model*

During the feature extraction process, the HSIs were passed into the MobileNet model to generate feature vectors. MobileNet is a CNN-based technique that is extensively applied in classification procedures. The most important benefit of the presented method is that it needs comparatively little computation in comparison with a standard CNN, which makes it appropriate for operation on mobile devices and computers with lower computational capabilities. The presented method is a fundamental architecture that combines convolution layers and efficiently distinguishes details according to two controllable attributes (the width and resolution multipliers) that trade accuracy against computational cost. The presented method is valuable in diminishing the size of the system.

The MobileNet structure is very efficient, requiring only a minimal number of parameters. It is built on depth-wise convolution: the fundamental architecture consists of discrete abstracted layers, i.e., modules of dissimilar convolution layers that together approximate a standard full convolution [19]. A resolution multiplier *α* is added to reduce the size of the input dataset and the internal layer representations by the same factor.

Let the feature map be of size *Fm* × *Fm* and the filter be of size *Fs* × *Fs*. The number of input channels is denoted by *ω*, and the number of output channels is denoted by *ρ*. For the basic abstract layer of the structure, the whole computation work is considered as the variable *ce*, and it can be evaluated as follows:

$$c\_e = F\_s \cdot F\_s \cdot \omega \cdot \alpha F\_m \cdot \alpha F\_m + \omega \cdot \rho \cdot \alpha F\_m \cdot \alpha F\_m \tag{1}$$

The multiplier *ω* can take values from one to *n*. The resolution multiplier is denoted by *α*. The computational effort of the standard convolution is recognized as the variable *coste* and is evaluated by the following equation:

$$\text{cost}\_e = F\_s \cdot F\_s \cdot \omega \cdot \rho \cdot F\_m \cdot F\_m \tag{2}$$

The proposed approach incorporates the pointwise and depth-wise convolutions, whose cost reduction relative to the standard convolution is captured by the reduction factor *d*, which is evaluated as follows:

$$d = \frac{F\_s \cdot F\_s \cdot \omega \cdot \alpha F\_m \cdot \alpha F\_m + \omega \cdot \rho \cdot \alpha F\_m \cdot \alpha F\_m}{F\_s \cdot F\_s \cdot \omega \cdot \rho \cdot F\_m \cdot F\_m} \tag{3}$$

The two hyperparameters, the resolution and width multipliers, enable adjusting the effective window size for accurate prediction based on the context. The input's third dimension indicates that it contains three channels. The principle underlying the MobileNet structure is to replace the complicated convolutional layer, comprising a convolutional layer with 3 × 3 filters applied to the input dataset, with a depth-wise convolution followed by a pointwise convolutional layer of size 1 × 1 that combines the filtered outputs to construct a feature.
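As a sketch of Equations (1)–(3) under the assumption *α* = 1, the following computes both costs and the reduction factor for an illustrative layer (the filter and channel sizes are made-up examples, not the paper's configuration):

```python
# Cost comparison between a standard convolution and MobileNet's
# depth-wise separable convolution, following Equations (1)-(3).
def depthwise_separable_cost(Fs, Fm, w, p, alpha=1.0):
    """Eq. (1): depth-wise cost + point-wise cost (alpha = resolution multiplier)."""
    return Fs * Fs * w * (alpha * Fm) ** 2 + w * p * (alpha * Fm) ** 2

def standard_conv_cost(Fs, Fm, w, p):
    """Eq. (2): cost of a standard convolution layer."""
    return Fs * Fs * w * p * Fm * Fm

# Example: 3x3 filters, a 112x112 feature map, 32 input / 64 output channels.
c_sep = depthwise_separable_cost(3, 112, 32, 64)
c_std = standard_conv_cost(3, 112, 32, 64)
d = c_sep / c_std  # Eq. (3): reduction factor, equal to 1/p + 1/Fs^2
print(round(d, 3))  # 0.127
```

The reduction factor simplifies to 1/*ρ* + 1/*Fs*², which is why the savings are dominated by the filter size for typical channel counts.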

To optimally tune the hyperparameters of the MobileNet model, the Adam optimizer is exploited. Adam estimates adaptive learning rates for the parameters while training the DNN [20]. It is a well-designed and efficient first-order gradient method for stochastic optimization with modest memory requirements. The approach is well suited to machine learning problems with high-dimensional parameter spaces and massive datasets, since it adapts the learning rate for different features using estimates of the first- and second-order moments of the gradients. Additionally, Adam builds upon gradient descent (GD) with momentum. The first moment is obtained using Equation (4):

$$m\_i = \beta\_1 m\_{i-1} + (1 - \beta\_1) \frac{\partial \mathcal{C}}{\partial w}. \tag{4}$$

The second moment is expressed as:

$$v\_i = \beta\_2 v\_{i-1} + (1 - \beta\_2) \left(\frac{\partial C}{\partial w}\right)^2 \tag{5}$$

The parameter update is then:

$$w\_{i+1} = w\_i - \eta \frac{\hat{m}\_i}{\sqrt{\hat{v}\_i} + \epsilon} \tag{6}$$

in which $\hat{m}\_i = m\_i / (1 - \beta\_1^i)$ and $\hat{v}\_i = v\_i / (1 - \beta\_2^i)$ are the bias-corrected moment estimates.
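A minimal sketch of the update rule in Equations (4)–(6), applied to a toy one-dimensional quadratic (the learning rate and iteration count are illustrative choices, not the paper's training settings):

```python
import math

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update following Equations (4)-(6)."""
    m = b1 * m + (1 - b1) * grad             # Eq. (4): first moment
    v = b2 * v + (1 - b2) * grad ** 2        # Eq. (5): second moment
    m_hat = m / (1 - b1 ** t)                # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)  # Eq. (6)
    return w, m, v

# Minimize f(w) = w^2 starting from w = 5.0; the gradient is 2w.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.05)
print(f"w after training: {w:.4f}")
```

Because the update divides the first moment by the root of the second, the effective step size is roughly the learning rate regardless of the gradient scale, which is what makes Adam robust across features.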

#### *2.3. Crop Type Classification: BiLSTM Model*

At the time of image classification, the extracted feature vectors are fed into the BiLSTM model, which receives the feature vectors as input and executes the classification. The LSTM is a variant of the RNN that solves the gradient-vanishing problem of RNNs by introducing a gating mechanism and a memory unit [21]. Here, *x* denotes the network input at different times, *y* refers to the network output, *h* stands for the hidden layer (HL), *u* refers to the weights from the input to the HL, *w* denotes the weights from the previous HL state to the current HL state, and *v* signifies the weights from the HL to the output layer.

During the actual implementation of the LSTM technique, the LSTM unit was upgraded at time *t* as:

$$i\_t = \sigma(W\_i h\_{t-1} + U\_i x\_t + b\_i) \tag{7}$$

$$f\_t = \sigma(W\_f h\_{t-1} + U\_f x\_t + b\_f) \tag{8}$$

$$\tilde{c}\_t = \tanh(W\_c h\_{t-1} + U\_c x\_t + b\_c) \tag{9}$$

$$c\_t = f\_t \odot c\_{t-1} + i\_t \odot \tilde{c}\_t \tag{10}$$

$$o\_t = \sigma(W\_o h\_{t-1} + U\_o x\_t + b\_o) \tag{11}$$

$$h\_t = o\_t \odot \tanh(c\_t) \tag{12}$$

At this point, ⊙ stands for the element-wise product, and *σ* denotes the sigmoid function. *xt* signifies the input vector at time *t*. *ht* refers to the HL vector, namely the output vector that stores all of the information at time *t* and the preceding times. *bi*, *bf*, *bc*, and *bo* denote the offset vectors. *Wi*, *Wf*, *Wc*, and *Wo* imply the weights of the various gates applied to the HL vector *ht*−1. *Ui*, *Uf*, *Uc*, and *Uo* stand for the weights applied to the input vector *xt*. *it*, *ft*, *ct*, and *ot* stand for the input, forget, cell, and output gates, correspondingly. Using this gated infrastructure, the LSTM permits the recurrent network to retain the useful data of the task in the memory units during training, thereby avoiding the vanishing-gradient problem of the RNN while covering an extensive range of data.

In addition to processing the sequence data forward, the BiLSTM performs a backward estimation procedure, unlike a normal LSTM. This process employs the subsequent data of the sequence. Finally, the forward and reverse estimations are executed, and their values are obtained at the output layer simultaneously; as a result, the sequence data are covered in both directions, which is useful for a variety of sequence-processing tasks.
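Under the assumption of standard LSTM gating as in Equations (7)–(12), a minimal NumPy sketch of one cell step and the bidirectional state concatenation is given below; the dimensions and the sharing of one weight set between directions are illustrative simplifications, not the trained model:

```python
import numpy as np

def lstm_cell(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step following Equations (7)-(12).

    W, U, b hold the gate parameters stacked in the order i, f, c~, o.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    z = W @ h_prev + U @ x_t + b             # all four gate pre-activations
    H = h_prev.size
    i = sigmoid(z[0:H])                      # Eq. (7): input gate
    f = sigmoid(z[H:2*H])                    # Eq. (8): forget gate
    c_tilde = np.tanh(z[2*H:3*H])            # Eq. (9): candidate state
    c = f * c_prev + i * c_tilde             # Eq. (10): cell state
    o = sigmoid(z[3*H:4*H])                  # Eq. (11): output gate
    h = o * np.tanh(c)                       # Eq. (12): hidden state
    return h, c

rng = np.random.default_rng(0)
D, H = 8, 4                                  # input and hidden sizes
W, U, b = rng.normal(size=(4*H, H)), rng.normal(size=(4*H, D)), np.zeros(4*H)
x = rng.normal(size=(5, D))                  # a toy sequence of 5 feature vectors

# A BiLSTM processes the sequence in both directions and concatenates states.
h_f = c_f = h_b = c_b = np.zeros(H)
for t in range(5):
    h_f, c_f = lstm_cell(x[t], h_f, c_f, W, U, b)
    h_b, c_b = lstm_cell(x[4 - t], h_b, c_b, W, U, b)
bi_state = np.concatenate([h_f, h_b])
print(bi_state.shape)  # (8,)
```

A real BiLSTM learns separate forward and backward weight sets; the concatenated state is what a downstream classifier layer would consume.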

#### *2.4. Hyperparameter Tuning: SSO Algorithm*

To enhance the classifier efficiency of the BiLSTM model, the SSO algorithm is employed for hyperparameter optimization. The SSO technique is inspired by the foraging behavior of flying squirrels and the effective gliding locomotion these small animals employ for migration. Based on the food-foraging hierarchy of squirrels [22], the SSO algorithm is iteratively developed as an arithmetical model. The important parameters in SSO are the population size *NP*, the maximum number of iterations *Iter*max, the predator presence probability *Pdp*, the number of decision variables *n*, the gliding constant *Gc*, the scaling factor *sf*, and the upper and lower limits of the decision variables *FSU* and *FSL*. They are given in the following. The positions of the squirrels are randomly initialized in the search space:

$$FS\_{i,j} = FS\_L + rand(\,) \times (FS\_U - FS\_L), \; i = 1, 2, \dots, NP, \; j = 1, 2, \dots, n \tag{13}$$

where *rand*( ) denotes an arbitrary value in [0, 1]. The fitness values *f* = (*f*1, *f*2, . . . , *fNP*) of the squirrel positions are computed by substituting the decision variables into the fitness function (FF):

$$f\_i = f\_i(FS\_{i,1}, FS\_{i,2}, \dots, FS\_{i,n}), \; i = 1, 2, \dots, NP \tag{14}$$

Next, the quality of food sources is evaluated by the fitness measure of a squirrel position as follows:

$$[sorted\\_f, \; sort\\_index] = sort(f) \tag{15}$$

In addition, the food sources are organized into three categories: hickory trees, oak trees (acorn nuts), and normal trees. The optimal food source (lowest fitness) is assumed to be the hickory nut tree (*FSht*), the next-best food sources are denoted as acorn nut trees (*FSat*), and the rest are called normal trees (*FSnt*):

$$FS\_{ht} = FS(sort\\_index(1)) \tag{16}$$

$$FS\_{at}(1:3) = FS(sort\\_index(2:4)) \tag{17}$$

$$FS\_{nt}(1:NP-4) = FS(sort\\_index(5:NP)) \tag{18}$$

The three states that denote the dynamic gliding approach of squirrels are described in the following.

Scenario 1. The squirrel resides in an acorn nut tree and jumps to a hickory nut tree. A novel location can be given as follows:

$$FS\_{at}^{new} = \begin{cases} FS\_{at}^{old} + d\_g G\_c \left( FS\_{ht}^{old} - FS\_{at}^{old} \right) & \text{if } R\_1 \ge P\_{dp} \\ \text{random location} & \text{otherwise} \end{cases} \tag{19}$$

Here, *dg* indicates the gliding distance, *R*1 denotes a random number drawn from a uniform distribution in [0, 1], and *Gc* denotes the gliding constant.

Scenario 2. The squirrel resides in a normal tree and moves to acorn nut trees for gathering needed food. A novel location can be determined by:

$$FS\_{nt}^{new} = \begin{cases} FS\_{nt}^{old} + d\_g G\_c \left( FS\_{at}^{old} - FS\_{nt}^{old} \right) & \text{if } R\_2 \ge P\_{dp} \\ \text{random location} & \text{otherwise} \end{cases} \tag{20}$$

Here, *R*2 indicates a random number drawn from a uniform distribution in [0, 1].

Scenario 3. Squirrels on normal trees go to hickory nut trees once they meet the routine objectives. Now, a novel position of squirrel can be determined by:

$$FS\_{nt}^{new} = \begin{cases} FS\_{nt}^{old} + d\_g G\_c \left( FS\_{ht}^{old} - FS\_{nt}^{old} \right) & \text{if } R\_3 \ge P\_{dp} \\ \text{random location} & \text{otherwise} \end{cases} \tag{21}$$

where *R*3 denotes a random number drawn from a uniform distribution in [0, 1]. Because the gliding distance can be large enough to cause high perturbation, a scaling factor (*sf*) is employed as a divisor of *dg* to achieve suitable step sizes.

The foraging behavior of flying squirrels depends on the season, which changes periodically. Therefore, a seasonal monitoring condition is implemented so that trapping in locally optimal results is avoided. The seasonal constant *Sc* and its minimum value are given as:

$$S\_c^t = \sqrt{\sum\_{k=1}^n \left( F S\_{at,k}^t - F S\_{ht,k} \right)^2}, t = 1, \ 2, \ 3 \tag{22}$$

$$S\_{c\min} = \frac{10^{-6}}{365^{\,Iter/(Iter\_{\max}/2.5)}} \tag{23}$$

When *S<sup>t</sup><sub>c</sub>* < *Sc*min, the winter season is over; flying squirrels that have lost their exploration ability then search for food sources from new locations:

$$FS\_{nt}^{new} = FS\_L + \text{Lévy}(n) \times (FS\_U - FS\_L) \tag{24}$$

Now the Lévy distribution is employed to improve the global search to an enhanced method:

$$\text{Lévy}(x) = 0.01 \times \frac{r\_d \times \sigma}{|r\_b|^{1/\beta}} \tag{25}$$

$$\sigma = \left(\frac{\Gamma(1+\beta) \times \sin\left(\pi \beta/2\right)}{\Gamma((1+\beta)/2) \times \beta \times 2^{((\beta-1)/2)}}\right)^{1/\beta} \tag{26}$$

The algorithm stops when the maximum iteration count is reached. Otherwise, the generation of new locations and the seasonal monitoring condition are repeatedly applied.
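A simplified sketch of one SSO iteration, covering the sorting in Equations (15)–(18) and the gliding scenarios in Equations (19)–(21), is given below on a toy minimization problem; the parameter values, the random-relocation bounds, and the omission of the seasonal Lévy restart (Equations (22)–(26)) are all simplifications for illustration:

```python
import numpy as np

def sso_step(FS, fitness, Gc=1.9, Pdp=0.1, dg=0.8, rng=None):
    """One iteration of the squirrel position updates (Equations (15)-(21))."""
    rng = rng if rng is not None else np.random.default_rng()
    NP, n = FS.shape
    order = np.argsort([fitness(s) for s in FS])   # Eq. (15): rank by fitness
    ht, at, nt = order[0], order[1:4], order[4:]   # Eqs. (16)-(18)
    new = FS.copy()                                # the hickory squirrel stays put
    for i in at:                                   # Scenario 1: acorn -> hickory
        if rng.random() >= Pdp:
            new[i] = FS[i] + dg * Gc * (FS[ht] - FS[i])
        else:
            new[i] = rng.uniform(-5, 5, n)         # random relocation
    for i in nt:                                   # Scenarios 2/3: normal trees
        target = FS[ht] if rng.random() < 0.5 else FS[rng.choice(at)]
        if rng.random() >= Pdp:
            new[i] = FS[i] + dg * Gc * (target - FS[i])
        else:
            new[i] = rng.uniform(-5, 5, n)
    return new

# Minimize the sphere function: the best squirrel should improve over iterations.
rng = np.random.default_rng(1)
pop = rng.uniform(-5, 5, size=(20, 3))
f = lambda s: float(np.sum(s ** 2))
best0 = min(f(s) for s in pop)
for _ in range(50):
    pop = sso_step(pop, f, rng=rng)
best = min(f(s) for s in pop)
print(best <= best0)  # True: the best position is never discarded
```

Because the hickory-tree squirrel is excluded from both update loops, the best solution found so far is retained across iterations, which is what makes the best fitness monotonically non-increasing.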

#### **3. Experimental Validation**

*3.1. Result Analysis of SSODTL-CC Model*

This section investigates the performance of the proposed model on test images. Figure 3 showcases the sample classification results obtained by the SSODTL-CC model. The figure implies that the proposed model has obtained effective classification results. In addition, some of the misclassified regions by the SSODTL-CC model are marked in blue circles.

**Figure 3.** Sample classification result of the SSODTL-CC technique under dataset-1: (**a**) input image, (**b**) class labels, and (**c**) classification output.

Figure 4 inspects the confusion matrices created by the SSODTL-CC model on the classification of nine classes under dataset-1. The figure reports that the SSODTL-CC model has categorized all the classes under different sets of datasets. For the entire dataset, the SSODTL-CC model recognized 956 samples under corn, 975 samples under cotton, 971 samples under sesame, 971 samples under broad-leaf soybean, 964 samples under narrow-leaf soybean, 949 samples under rice, 965 samples under water, 958 samples under roads and houses, and 967 samples under mixed weed. Similarly, the SSODTL-CC model has categorized the class labels proficiently on 70% of the training samples and 30% of the testing samples on dataset-1.

**Figure 4.** Confusion matrix of the SSODTL-CC technique under dataset-1. (**a**) Entire dataset-1. (**b**) 70% of Training dataset-1 and (**c**) 30% of Testing dataset-1.

Table 1 reports detailed crop classification outcomes of the SSODTL-CC model on all of dataset-1. The experimental values indicated that the SSODTL-CC model gained effectual outcomes under every individual class. For instance, in the corn class, the SSODTL-CC model offered *accuy*, *precn*, and *recal* of 99.24%, 97.55%, and 95.60%, respectively. Similarly, on the mixed weed class, the SSODTL-CC model reached *accuy*, *precn*, and *recal* of 99.27%, 96.70%, and 96.70%, respectively. Overall, the SSODTL-CC model showed a maximum average *accuy*, *precn*, and *recal* of 99.20%, 96.43%, and 96.40%, respectively.
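The per-class *accuy*, *precn*, and *recal* values of this kind can be derived from a confusion matrix as follows (the counts are illustrative, not the paper's results):

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class accuracy, precision, and recall from a confusion matrix.

    cm[i, j] = number of samples of true class i predicted as class j.
    """
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = total - tp - fp - fn
    accuracy = (tp + tn) / total          # one-vs-rest accuracy per class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Toy 3-class confusion matrix with 100 samples per true class.
cm = np.array([[95, 3, 2],
               [4, 90, 6],
               [1, 5, 94]])
acc, prec, rec = per_class_metrics(cm)
print(np.round(rec, 3))  # fraction of each true class correctly recovered
```

Averaging these per-class values over all classes gives the mean *accuy*, *precn*, and *recal* figures reported in the tables.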

Table 2 depicts a brief crop classification outcome of the SSODTL-CC approach on 70% of training dataset-1. The experimental values stated that the SSODTL-CC method gained effectual outcomes under every individual class. For instance, in the corn class, the SSODTL-CC model offered *accuy*, *precn*, and *recal* of 99.19%, 97.04%, and 95.49%, respectively. In addition, in the mixed weed class, the SSODTL-CC system obtained *accuy*, *precn*, and *recal* of 99.27%, 96.67%, and 96.67%, respectively. Overall, the SSODTL-CC model demonstrated maximum average *accuy*, *precn*, and *recal* of 99.19%, 96.38%, and 96.35%, correspondingly.


**Table 1.** Result analysis of the SSODTL-CC technique with distinct classes under all of dataset-1.

**Table 2.** Result analysis of the SSODTL-CC technique with distinct classes under 70% of training dataset-1.


Table 3 defines the detailed crop classification outcomes of the SSODTL-CC model on 30% of testing dataset-1. The experimental values indicated that the SSODTL-CC model gained effectual outcomes under every individual class. For instance, in the corn class, the SSODTL-CC approach presented *accuy*, *precn*, and *recal* of 99.37%, 98.68%, and 95.85%, correspondingly. Furthermore, in the mixed weed class, the SSODTL-CC methodology reached *accuy*, *precn*, and *recal* of 99.26%, 96.76%, and 96.76%, respectively. Overall, the SSODTL-CC model portrayed enhanced average *accuy*, *precn*, and *recal* of 99.23%, 96.54%, and 96.53%, correspondingly.

Figure 5 illustrates the confusion matrices created by the SSODTL-CC approach on the classification of sixteen classes under dataset-2. The figure reveals that the SSODTL-CC model categorized all the classes under different sets of datasets. On the entire dataset, the SSODTL-CC model recognized 783 samples under class 1, 757 samples under class 2, 766 samples under class 3, 728 samples under class 4, 721 samples under class 5, 774 samples under class 6, 764 samples under class 7, 788 samples under class 8, 779 samples under class 9, 779 samples under class 10, 733 samples under class 11, 806 samples under class 12, 771 samples under class 13, 829 samples under class 14, 733 samples under class 15, and 821 samples under class 16. Similarly, the SSODTL-CC approach categorized the class labels proficiently on 70% of the training samples and 30% of the testing samples on dataset-2.

**Table 3.** Result analysis of the SSODTL-CC technique with distinct classes under 30% of testing dataset-1.


Table 4 demonstrates the detailed crop classification outcomes of the SSODTL-CC model on all of dataset-2. The experimental values exposed that the SSODTL-CC model gained effectual outcomes under every individual class. For instance, in class 1, the SSODTL-CC algorithm obtained *accuy*, *precn*, and *recal* of 97.39%, 79.57%, and 78.30% correspondingly. In addition, in class 16, the SSODTL-CC model gained *accuy*, *precn*, and *recal* of 97.17%, 74.98%, and 82.10%, correspondingly. Overall, the SSODTL-CC model outperformed higher average *accuy*, *precn*, and *recal* of 97.13%, 74.98%, and 82.10%, respectively.

Table 5 reports a brief crop classification outcome of the SSODTL-CC model on 70% of training dataset-2. The experimental values exposed that the SSODTL-CC model gained effectual outcomes under every individual class. For instance, in class 1, the SSODTL-CC model offered *accuy*, *precn*, and *recal* of 97.43%, 79.74%, and 79.29%, respectively. In addition, in class 16, the SSODTL-CC model reached *accuy*, *precn*, and *recal* of 97.21%, 74.08%, and 81.65%, respectively. Overall, the SSODTL-CC methodology exhibited maximal average *accuy*, *precn*, and *recal* of 97.13%, 77.08%, and 77.05%, correspondingly.

Table 6 defines the detailed crop classification outcome of the SSODTL-CC technique on 30% of testing dataset-2. The experimental values indicated that the SSODTL-CC algorithm gained effectual outcomes under every individual class. For sample, in class 1, the SSODTL-CC model offered *accuy*, *precn*, and *recal* of 97.29%, 79.15%, and 75.93%, correspondingly. In the same way, in class 16, the SSODTL-CC system reached *accuy*, *precn*, and *recal* of 97.06%, 76.80%, and 82.99%, respectively. Overall, the SSODTL-CC approach showed maximal average *accuy*, *precn*, and *recal* of 97.15%, 77.25%, and 77.09%, correspondingly.

#### *3.2. Discussion*

To ensure the improved crop classification results of the SSODTL-CC model, a comparison study with recent models on two datasets is given in Table 7 [22,23].

Figure 6 investigates a comparative classification outcome of the SSODTL-CC model with existing models on dataset-1. The results indicated that the SVM model gained an ineffectual outcome with the least *accuy* of 95.98%. In line with this, the FNEA-OO model certainly accomplished increased performance with an *accuy* of 97.07%. In addition, the SVRFMC, CNN, and CNN-CRF models depicted closer *accuy* values of 98.20%, 98.08%, and 98.80%, respectively. However, the SSODTL-CC model demonstrated superior performance with an *accuy* of 99.23%.

**Figure 5.** Confusion matrix of the SSODTL-CC technique under dataset-2. (**a**) Entire dataset-2. (**b**) 70% of Training dataset-2, and (**c**) 30% of Testing dataset-2.

Figure 7 examines a comparative classification outcome of the SSODTL-CC model with existing approaches on dataset-2. The outcomes indicated that the SVM model gained an ineffectual outcome with the least *accuy* of 77.34%. Likewise, the FNEA-OO model certainly accomplished an increased performance with an *accuy* of 86.49%. Then, the SVRFMC, CNN, and CNN-CRF models depicted closer *accuy* values of 86.95%, 87.72%, and 94.67%, correspondingly. At last, the SSODTL-CC methodology demonstrated superior performance with an *accuy* of 97.15%.

From these results and discussions, it is evident that the SSODTL-CC model has the capability of attaining improved crop classification outcomes on HSIs.


**Table 4.** Result analysis of the SSODTL-CC technique with distinct classes under all of dataset-2.

**Table 5.** Result analysis of the SSODTL-CC technique with distinct classes under 70% of training dataset-2.



**Table 6.** Result analysis of the SSODTL-CC model with distinct classes under 30% of testing dataset-2.

**Table 7.** Comparative analysis of the SSODTL-CC technique with recent algorithms in terms of *accuy*.


**Figure 6.** Comparative analysis of the SSODTL-CC technique under dataset-1.

**Figure 7.** Comparative analysis of the SSODTL-CC technique under dataset-2.

#### **4. Conclusions**

In this article, a new SSODTL-CC model was developed to properly identify the crop type in HSIs. To do so, the proposed SSODTL-CC model performed feature extraction using MobileNet with an Adam optimizer. In addition, the BiLSTM model received feature vectors and performed crop type classification. To enhance the classifier efficiency of the BiLSTM model, the SSO algorithm was employed for hyperparameter optimization. To demonstrate the better performance of the SSODTL-CC model, a wide-ranging experimental analysis was performed on two benchmark datasets, namely dataset-1 (WHU-Hi-LongKou) and dataset-2 (WHU-Hi-HanChuan). The comparative analysis pointed out the better outcomes of the SSODTL-CC model over the recent approaches, with a maximum of 99.23% and 97.15% on test datasets 1 and 2, respectively. Therefore, the SSODTL-CC model can be utilized for effective crop type classification on HSIs. In the future, the classification performance of the SSODTL-CC model can be enhanced by the design of hybrid DL models.

**Author Contributions:** Conceptualization, M.A.H.; Data curation, F.A.; Formal analysis, F.A. and J.S.A.; Investigation, J.S.A.; Methodology, M.A.H.; Project administration, H.M.; Resources, H.M.; Software, N.M.S.; Supervision, N.M.S.; Validation, R.M.; Visualization, R.M.; Writing—original draft, M.A.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by King Khalid University, grant number RGP 2/46/43, Princess Nourah bint Abdulrahman University, grant number PNURSP2022R77 and Umm al-Qura University, grant number 22UQU4340237DSR24.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data sharing not applicable to this article as no datasets were generated during the current study.

**Acknowledgments:** The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through Large Groups Project under grant number (46/43). Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R77), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: 22UQU4340237DSR19.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

