1. Introduction
Remote sensing target detection marks the objects of interest in remote sensing images (RSIs) and predicts the location and type of these targets [1]. Viewed from an Earth vision platform, objects in aerial images appear in arbitrary orientations, whereas the targets in conventional detection datasets are more concentrated [2]. The object detection (OD) technique is used to detect instances of semantic objects of specific classes (for example, humans, birds, or airplanes) in digital videos and images. Small target detection has become an active and challenging area within target detection. Transport planning, environmental management, military operations, and disaster control are crucial applications of RSIs [3]. Moreover, vehicles in RSIs, as a special class (whether transportation, civilian, or military), are of particular significance and are increasingly difficult to detect. First, vehicle targets in RSIs often occupy fewer than twenty or even ten pixels, whereas in target detection a small target is generally defined as one that occupies fewer than thirty pixels in an image [4]. Next, weather and environmental factors, including shadows and building and atmospheric occlusions, together with other factors such as similar colors among vehicles, differing vehicle sizes within the same image, different overhead views, and the surroundings, can all lead to poor detection accuracy for vehicle targets [5].
Vehicle detection in RSIs aims to identify each instance of a vehicle [6]. In earlier approaches, researchers designed and extracted vehicle features manually and then classified them to detect vehicles [7]. The fundamental idea is to extract vehicle features and apply traditional machine learning (ML) techniques for classification. Commonly used features include integral channel features, the scale-invariant feature transform (SIFT), and the histogram of oriented gradients (HOG) [8]. The classifiers used include the intersection kernel support vector machine (IKSVM), AdaBoost, and SVM. However, conventional target detection techniques focus on simply completing the RSI vehicle detection task, and it is challenging for them to balance speed and accuracy. Compared with the rapidly growing deep learning (DL) techniques, they differ considerably in detection efficiency and accuracy [9]. Network models based on DL can map complex nonlinear relationships and extract richer features. Driven by advances in hardware and the availability of large datasets, two categories of target detection networks have been continually developed and optimized: single-stage networks (e.g., SSD and YOLOv3) and two-stage networks (e.g., Cascade R-CNN and Fast R-CNN) [10].
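The handcrafted-feature idea described above can be illustrated with a small sketch. The pure-Python function below computes a magnitude-weighted histogram of unsigned gradient orientations for a single image cell, which is the core idea behind HOG. It is a simplified illustration that omits block normalization and bin interpolation; the function name `orientation_histogram`, the 9-bin default, and the toy `cell` input are assumptions made here, not details taken from the cited works.

```python
import math


def orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0-180 degrees) for one grayscale cell given as a list of rows."""
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    # Central differences on interior pixels only (borders are skipped).
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    # L2-normalize so the descriptor is less sensitive to contrast.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# Toy cell with a vertical edge: intensity changes only along the x axis.
cell = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
print(orientation_histogram(cell))
```

In a classical detector, descriptors of this kind from many cells would be concatenated and passed to a classifier such as an SVM or AdaBoost.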
This study designs an improved chimp optimization algorithm with a DL-based vehicle detection and classification (ICOA-DLVDC) technique for RSIs. The ICOA-DLVDC technique uses a DL model to detect vehicles in RSIs, together with a hyperparameter tuning strategy. First, the ICOA-DLVDC method employs the EfficientDet model for OD. Next, the detected objects are classified using the sparse autoencoder (SAE) model. Finally, the hyperparameters of the SAE model are tuned by the ICOA. An extensive set of experiments has been conducted to demonstrate the improved vehicle classification outcomes of the ICOA-DLVDC technique. In short, the key contributions of the paper are as follows.
An intelligent ICOA-DLVDC technique comprising an EfficientDet object detector, SAE classification, and ICOA-based hyperparameter tuning for RSIs is presented; to the best of our knowledge, the proposed model has not been reported in the literature;
The SAE learns informative and discriminative features while reducing the data dimensionality, which helps in handling large and complex remote sensing datasets;
The integration of the EfficientDet object detector with SAE classification can improve generalization and adaptability across various RSI datasets;
Hyperparameter optimization of the SAE model using the ICOA with cross-validation helps to boost the predictive performance of the ICOA-DLVDC model on unseen data.
The rest of the paper is organized as follows: Section 2 provides the related works and Section 3 offers the proposed model. Then, Section 4 gives the result analysis and Section 5 concludes the paper.
2. Related Works
Ahmed et al. [11] designed an IoT-assisted smart surveillance solution for multi-OD using segmentation. In particular, the study proposes using DL, IoT, and collaborative drones to enhance surveillance applications in smart cities. It presents an AI-based technique that uses a DL-based pyramid scene parsing network (PSPNet) for multiple-object segmentation and applies it to an aerial drone dataset. The authors in [12] developed a new single-stage OD technique termed MDCT based on a transformer block and multi-kernel dilated convolution (MDC) blocks. Initially, a feature enhancement module, the MDC block, was introduced into the single-stage OD technique. Next, a transformer block was incorporated into the neck network. Finally, a depth-wise convolutional layer was incorporated into the MDC block to reduce the computation cost. Qiu, Bai, and Chen [13] designed a new technique called YOLO-GNS for vehicle detection. First, the SSH (single-stage headless) module was devised to facilitate the detection of smaller objects and optimize the feature extraction.
The authors in [14] developed an OD technique based on YOLOv5 for aerial RSIs, named KCFS-YOLOv5. The K-means++ algorithm was used to optimize the initial cluster points and obtain suitable anchor boxes. Coordinate attention (CA) was embedded into the backbone network of YOLOv5 to develop the bi-directional FPN (BiFPN) architecture. Ye et al. [15] designed a convolutional network using an adaptive attention fusion module (AAFM). Initially, a stitcher was used to compose a single image containing objects of different scales according to the object distribution in the dataset. Moreover, a spatial attention module was developed to obtain the semantic information of the feature map. Xiaolin et al. [16] presented an S²ANET-SR model based on the S²A-NET network. The original and reduced images were fed to the detection model; a super-resolution enhancement module for the reduced images was then developed to enhance the feature extraction of smaller objects, and texture matching loss and perceptual loss were introduced as supervision.
Javadi et al. [17] investigated the ability of 3D feature maps to enhance the accuracy of DNNs for vehicle recognition. First, they introduced a DNN using YOLOv3 with several base networks, including DenseNet201, DarkNet53, SqueezeNet, and MobileNetv2. Next, 3D depth maps were produced. Later, an FCNN was trained on the 3D feature maps. Wu et al. [18] introduced a global context-weaving network (GCWNet) for object recognition in RSIs, in which two novel modules were introduced for refinement and feature extraction.
Several automated vehicle detection and classification models have been presented in the literature. Despite the benefits of these earlier studies, there is still a need to improve vehicle classification performance. As models continue to deepen, the number of parameters of DL models also grows quickly, which can result in overfitting. At the same time, hyperparameters have a significant impact on the efficiency of the CNN model. In particular, hyperparameters such as the epoch count, batch size, and learning rate are essential to attaining an effective outcome. Since trial-and-error hyperparameter tuning is a tedious and error-prone process, metaheuristic algorithms can be applied instead. Therefore, in this work, we employ the ICOA for the parameter selection of the SAE model.
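To make the idea of metaheuristic hyperparameter tuning concrete, the following is a minimal sketch of a population-based search. It is a generic, best-guided stand-in and is not the ICOA or the chimp optimization algorithm; the function names, the search space, and the toy error surface `toy_validation_error` are all hypothetical and chosen only for illustration. In practice, the objective would train the model with the candidate settings and return a validation error.

```python
import random


def population_search(objective, bounds, pop_size=10, iterations=30, seed=0):
    """Minimize `objective` over box-bounded hyperparameters with a simple
    best-guided population search (a generic stand-in, not the ICOA)."""
    rng = random.Random(seed)
    # Random initial population inside the bounds.
    population = [
        {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}
        for _ in range(pop_size)
    ]
    scores = [objective(candidate) for candidate in population]
    best_index = min(range(pop_size), key=scores.__getitem__)
    best, best_score = dict(population[best_index]), scores[best_index]

    for step in range(iterations):
        shrink = 1.0 - step / iterations  # exploration radius decays over time
        for i in range(pop_size):
            trial = {}
            for name, (low, high) in bounds.items():
                current = population[i][name]
                pull = rng.random() * (best[name] - current)  # move toward the best
                jitter = rng.uniform(-1.0, 1.0) * 0.1 * shrink * (high - low)
                trial[name] = min(high, max(low, current + pull + jitter))
            trial_score = objective(trial)
            if trial_score < scores[i]:  # greedy replacement of the candidate
                population[i], scores[i] = trial, trial_score
            if trial_score < best_score:
                best, best_score = dict(trial), trial_score
    return best, best_score


def toy_validation_error(params):
    """Hypothetical error surface with its minimum at lr = 1e-2, dropout = 0.5."""
    return (params["log10_lr"] + 2.0) ** 2 + (params["dropout"] - 0.5) ** 2


search_space = {"log10_lr": (-4.0, 0.0), "dropout": (0.0, 0.9)}
best_params, best_error = population_search(toy_validation_error, search_space)
print(best_params, best_error)
```

Searching the learning rate on a logarithmic scale is a common design choice, since useful values typically span several orders of magnitude.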
4. Results and Discussion
The proposed model is simulated using Python 3.6.5 on a PC with an i5-8600k CPU, a GeForce 1050Ti 4 GB GPU, 16 GB RAM, a 250 GB SSD, and a 1 TB HDD. The parameter settings are as follows: learning rate: 0.01; dropout: 0.5; batch size: 5; epoch count: 50; activation: ReLU.
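The parameter settings listed above can be collected in a small configuration object, as in the sketch below. The class name `TrainingConfig`, the helper `steps_per_epoch`, and the sample count of 1000 are hypothetical and used only for illustration; only the five values themselves come from the text.

```python
import math
from typing import NamedTuple


class TrainingConfig(NamedTuple):
    """Training settings as stated in the text (field names are assumptions)."""
    learning_rate: float = 0.01
    dropout: float = 0.5
    batch_size: int = 5
    epochs: int = 50
    activation: str = "relu"


def steps_per_epoch(num_samples, config):
    """Number of mini-batch updates needed to see every sample once."""
    return math.ceil(num_samples / config.batch_size)


config = TrainingConfig()
print(config)
# 1000 is an arbitrary illustrative sample count, not a dataset size from the paper.
print(steps_per_epoch(1000, config))
```

`typing.NamedTuple` is used here rather than a dataclass so that the sketch also runs on Python 3.6.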
The experimental evaluation of the ICOA-DLVDC technique is performed on two datasets: the VEDAI [22] and ISPRS Potsdam [23] datasets. The former includes 3687 images, and the latter has 2244 images. Table 1 and Table 2 provide a detailed description of the two datasets. Figure 3 depicts the sample images.
Figure 4 illustrates the classifier outcomes of the ICOA-DLVDC method on the VEDAI dataset. Figure 4a,b shows the confusion matrices produced by the ICOA-DLVDC technique at a 70:30 TR set/TS set split. The figure indicates that the ICOA-DLVDC method detected and classified all nine class labels accurately. Similarly, Figure 4c presents the precision-recall (PR) analysis of the ICOA-DLVDC system, which shows that the method accomplishes high PR outcomes across the nine classes. Finally, Figure 4d presents the ROC analysis of the ICOA-DLVDC method, which shows proficient outcomes with high ROC values across the nine class labels.
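A 70:30 TR set/TS set partition of the kind used above could be produced as in the following sketch. It assumes a stratified split that preserves the class proportions, which the text does not state; the function name `stratified_split` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices), splitting each class separately
    so that both parts keep roughly the same class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Toy labels for illustration only; these are not the dataset's class counts.
toy_labels = ["car"] * 10 + ["truck"] * 20
train_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the partition reproducible across runs, which matters when results on the two parts are reported separately.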
In Table 3, the vehicle classification outcomes of the ICOA-DLVDC method on the VEDAI dataset are reported. The table values indicate that the ICOA-DLVDC technique properly recognized all the vehicle types. On the 70% TR set, the ICOA-DLVDC technique attains average accuracy, precision, recall, F-score, and MCC of 99.43%, 96.66%, 94.45%, 95.43%, and 95.15%, respectively. Moreover, on the 30% TS set, the ICOA-DLVDC method attains average accuracy, precision, recall, F-score, and MCC of 99.50%, 97.27%, 94.45%, 95.94%, and 95.72%, respectively.
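Averaged per-class measures of the kind reported in Table 3, including the MCC, can be computed from a multi-class confusion matrix. The sketch below assumes a one-vs-rest treatment of each class followed by an unweighted (macro) average, which is one common convention; the averaging scheme actually used for the reported values is not specified in this section, and the function name `macro_metrics` and the toy matrix are hypothetical.

```python
import math


def macro_metrics(confusion):
    """Macro-averaged one-vs-rest metrics from a square confusion matrix
    (rows are true classes, columns are predicted classes)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    totals = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0,
              "f_score": 0.0, "mcc": 0.0}
    for k in range(n_classes):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[r][k] for r in range(n_classes)) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        mcc = (tp * tn - fp * fn) / denom if denom else 0.0
        totals["accuracy"] += (tp + tn) / total
        totals["precision"] += precision
        totals["recall"] += recall
        totals["f_score"] += f_score
        totals["mcc"] += mcc
    # Unweighted mean over classes.
    return {name: value / n_classes for name, value in totals.items()}


# Toy three-class confusion matrix for illustration only.
toy_confusion = [[50, 2, 0],
                 [3, 45, 2],
                 [0, 1, 47]]
for name, value in macro_metrics(toy_confusion).items():
    print(f"{name}: {100 * value:.2f}%")
```

Under this convention, the one-vs-rest accuracy counts true negatives for every class, which is why it can sit well above the recall when there are many classes.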
Figure 5 shows the training accuracy and validation accuracy of the ICOA-DLVDC method on the VEDAI dataset. The training accuracy is determined by evaluating the ICOA-DLVDC technique on the TR dataset, whereas the validation accuracy is computed by evaluating the performance on a separate testing dataset. The outcomes demonstrate that both the training accuracy and the validation accuracy increase with the number of epochs. Thus, the performance of the ICOA-DLVDC method improves on both the TR and TS datasets as the number of epochs rises.
In Figure 6, the training loss and validation loss outcomes of the ICOA-DLVDC method on the VEDAI dataset are shown. The training loss measures the error between the predicted and original values on the TR data. The validation loss measures the performance of the ICOA-DLVDC technique on separate validation data. The results indicate that both losses tend to decrease with rising epochs, which reflects the enhanced performance of the ICOA-DLVDC method and its capability to generate accurate classifications. The reduced values of the training loss and validation loss demonstrate the enhanced performance of the ICOA-DLVDC technique in capturing patterns and relationships.
The comparison study of the ICOA-DLVDC technique with other DL models on the VEDAI dataset is highlighted in Table 4 and Figure 7 [24]. The outcomes show that the ICOA-DLVDC technique accomplishes improved performance, with an accuracy of 99.50%. On the other hand, the CSOTL-VDCRS, LeNet, AlexNet, and VGG-16 models achieve lower accuracies of 98.07%, 79.78%, 88.98%, and 94.46%, respectively.
Figure 8 illustrates the classifier results of the ICOA-DLVDC technique on the ISPRS Potsdam dataset. Figure 8a,b shows the confusion matrices produced by the ICOA-DLVDC system at a 70:30 TR set/TS set split. The figure indicates that the ICOA-DLVDC method detected and classified all four class labels accurately. Similarly, Figure 8c presents the precision-recall (PR) analysis of the ICOA-DLVDC model, which shows that the technique accomplishes high PR outcomes across the four classes. Lastly, Figure 8d presents the ROC analysis of the ICOA-DLVDC model, which shows proficient outcomes with high ROC values across the four class labels.
In Table 5, the vehicle classification outcomes of the ICOA-DLVDC technique on the ISPRS Potsdam dataset are reported. The table values indicate that the ICOA-DLVDC technique properly recognized all the vehicle types. On the 70% TR set, the ICOA-DLVDC method attains average accuracy, precision, recall, F-score, and MCC of 99.52%, 96.86%, 95.12%, 95.79%, and 94.77%, respectively. Furthermore, on the 30% TS set, the ICOA-DLVDC method attains average accuracy, precision, recall, F-score, and MCC of 99.70%, 95.90%, 95.90%, 95.90%, and 95.15%, respectively.
Figure 9 shows the training accuracy and validation accuracy of the ICOA-DLVDC technique on the ISPRS Potsdam dataset. The training accuracy is determined by evaluating the ICOA-DLVDC technique on the TR dataset, whereas the validation accuracy is computed by evaluating the performance on a separate testing dataset. The outcomes demonstrate that both the training accuracy and the validation accuracy increase with the number of epochs. As a result, the performance of the ICOA-DLVDC technique improves on both the TR and TS datasets as the number of epochs rises.
In Figure 10, the training loss and validation loss outcomes of the ICOA-DLVDC technique on the ISPRS Potsdam dataset are shown. The training loss measures the error between the predicted and original values on the TR data. The validation loss measures the performance of the ICOA-DLVDC technique on separate validation data. The results indicate that both losses tend to decrease with rising epochs, which reflects the enhanced performance of the ICOA-DLVDC technique and its capability to generate accurate classifications. The reduced values of the training loss and validation loss demonstrate the enhanced performance of the ICOA-DLVDC technique in capturing patterns and relationships.
The comparison analysis of the ICOA-DLVDC method with other DL techniques [24] on the ISPRS Potsdam dataset is highlighted in Table 6 and Figure 11. The outcomes show that the ICOA-DLVDC technique accomplishes improved performance, with an accuracy of 99.70%. On the other hand, the CSOTL-VDCRS, LeNet, AlexNet, and VGG-16 models achieve lower accuracies of 98.67%, 94.54%, 95.86%, and 89.54%, respectively.