Article

Lightweight UAV Detection Method Based on IASL-YOLO

1 Yunnan Key Laboratory of Computer Technology Application, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
2 School of Information and Network Security, Yunnan Police College, Kunming 650223, China
3 Kunming Educational Science Research Institute, Kunming 650031, China
* Author to whom correspondence should be addressed.
Drones 2025, 9(5), 325; https://doi.org/10.3390/drones9050325
Submission received: 9 March 2025 / Revised: 8 April 2025 / Accepted: 21 April 2025 / Published: 23 April 2025
(This article belongs to the Special Issue UAV Detection, Classification, and Tracking)

Abstract

The widespread application of drone technology has raised security concerns, as unauthorized drones can lead to illegal intrusions and privacy breaches. Traditional detection methods often fall short in balancing performance and lightweight design, making them unsuitable for resource-constrained scenarios. To address this, we propose the IASL-YOLO algorithm, which optimizes the YOLOv8s model to enhance detection accuracy and lightweight efficiency. First, we design the CFE-AFPN network to streamline the architecture while boosting feature fusion capabilities across non-adjacent layers. Second, we introduce the SIoU loss function to address the orientation mismatch issue between predicted and ground truth bounding boxes. Finally, we employ the LAMP pruning algorithm to compress the model. Experimental results on the Anti-UAV dataset show that the improved model achieves a 2.9% increase in Precision, a 6.8% increase in Recall, and 3.9% and 3.8% improvements in mAP50 and mAP50-95, respectively. Additionally, the model size is reduced by 75%, the parameter count by 78%, and computational workload by 30%. Compared to mainstream algorithms, IASL-YOLO demonstrates significant advantages in both performance and lightweight design, offering an efficient solution for drone detection tasks.

1. Introduction

Unmanned Aerial Vehicles (UAVs), which are aircraft operated remotely or controlled by autonomous flight programs, have gained widespread application across various fields such as military [1], communication [2], and logistics [3], owing to their flexibility and efficiency. However, with the proliferation of UAV technology, their misuse has increasingly become a critical security concern, manifesting in forms such as unauthorized incursions [4], privacy violations [5], and potential threats to critical infrastructure [6]. These escalating risks further underscore the importance and urgency of research into anti-UAV detection technologies.
The rapid advancement of deep learning technology has provided novel solutions for anti-UAV detection, with deep learning-based methods gradually becoming the mainstream research direction in this field. Recent studies have demonstrated significant progress in enhancing detection performance through attention mechanisms and efficient feature learning. For instance, Ge et al. [7] proposed Neural Attention Learning (NEAL), a gradient-driven method that refines attention maps to optimize detection without extra computational overhead, achieving significant improvements on COCO and VOC benchmarks. In applications requiring real-time processing, Chen et al. [8] introduced MANet, a multi-attention framework that directly processes compressed video streams to reduce latency, demonstrating its potential for resource-constrained UAV scenarios. Meanwhile, Phan et al. [9] addressed the challenge of limited supervision by incorporating structural attention into Transformers, showing that anatomical priors can significantly improve synthesis accuracy in unpaired learning settings—an insight applicable to UAV detection under data scarcity. These works collectively advance attention-based modeling for accuracy–efficiency trade-offs in dynamic environments.
In recent years, the YOLO [10] series of object detection algorithms has gained widespread adoption across various fields due to their outstanding efficiency and real-time performance. Academic research on YOLO-based hybrid methods has yielded a range of innovative optimization strategies and practical engineering solutions. Stefenon et al. [11] proposed a Hypertuned-YOLO approach that creatively integrates genetic algorithm-based hyperparameter optimization with EigenCAM visual interpretation, successfully applying it to power distribution network fault localization. Experimental results demonstrate that this method achieves remarkable performance in insulator contamination detection, with an F1-score of 0.867 and mAP50 of 0.922. Singh et al. [12] developed a Pseudo-Prototype Component Network (Ps-ProtoPNet) for insulator defect classification in high-voltage transmission lines. Through systematic comparisons of various YOLO variants, they ultimately adopted YOLOv8m as the detection framework, achieving exceptional performance with an mAP50 of 0.9950 and mAP50-95 of 0.9125. The latest research further explores the integration of YOLOv5 detection modules with Quasi-ProtoPNet classifiers, opening new technical pathways for insulator defect classification tasks [13].
Meanwhile, the YOLO series of algorithms has demonstrated broad potential for application in UAV object detection tasks. However, due to the characteristics of UAV targets, such as their small size and low pixel proportion, combined with the influence of complex background textures and similar disturbances, the detection accuracy of the YOLO series algorithms for UAV small targets is often suboptimal. To enhance the precision of the YOLO series models in detecting UAV small targets, existing approaches typically optimize the models by incorporating attention mechanisms or adding specialized detection heads tailored for small targets. For instance, Hu et al. [14] were the first to apply an improved YOLOv3 algorithm to the field of anti-UAV detection. Their algorithm introduced an additional feature map scale to predict target bounding boxes, thereby capturing more texture and contour information, effectively improving the detection capability for UAV small targets. Fang et al. [15] proposed the SEB-YOLOv8s model, which reconstructs the YOLOv8 architecture using SPD-Conv, replaces the original Faster Implementation of Cross-Stage Partial Bottleneck with 2 convolutions (C2f) module with the Attention-enhanced C2f (AttC2f) module, and optimizes the Neck part by integrating BiLevel Routing Attention. On the Anti-UAV dataset, the model achieved a high accuracy of 95.9%. Ma et al. [16] proposed a high-performance LA-YOLO network that integrated the SimAM attention mechanism and a fusion block with normalized Wasserstein distance into YOLOv5, significantly enhancing the model’s detection accuracy for UAVs in low-altitude backgrounds. Zamri et al. [17] developed the P2-YOLOv8n-ResCBAM model based on the YOLOv8n architecture, incorporating multiple attention mechanisms and a high-resolution small target detection head, which increased the mean average precision (mAP) from 90.3% to 92.6%. However, this also increased model complexity, resulting in a decrease in inference speed.
Moreover, the practical application scenarios of anti-UAV detection often require deploying detection models on embedded devices or mobile terminals with limited computational resources. However, the limitations of the YOLO series models in terms of parameter volume and computational complexity make it difficult for them to directly adapt to such resource-constrained environments. To address this, existing research typically achieves the lightweight optimization of YOLO series models by introducing lightweight network structures or employing techniques such as model pruning. For example, Niu et al. [18] utilized the lightweight image detection network MobileNetV3 to replace the original CSPDarknet53 as the backbone network in the YOLOv4 framework and integrated a Coordinate Attention (CA) module, significantly reducing the model’s parameter count while maintaining detection accuracy. Zhang et al. [19] optimized YOLOv3-SPP3 using channel pruning and shortcut layer pruning algorithms, combined with fine-tuning training, which substantially compressed the model size and improved detection speed, albeit at the cost of a reduced mean average precision (mAP). Feng et al. [20] proposed an efficient UAV detection method based on YOLOv5s, constructing a lightweight backbone network using the ShuffleNetV2 network and a Coordinate Attention mechanism, and designing a balanced neck network with a Bidirectional Feature Pyramid Network (BiFPN) and Ghost Convolution. This method reduced the model’s computational complexity (GFLOPs) from 16.0 to 2.2 and increased the frame rate (FPS) from 153 to 188, though the mAP experienced a slight drop of 1.1%.
The trade-off between detection accuracy and model lightweighting is challenging to balance, as improving one aspect often significantly impacts the other. Consequently, existing research tends to focus on either enhancing detection accuracy or achieving model lightweighting as a singular goal, often overlooking the importance of maintaining high detection performance while ensuring a compact model architecture. To address this issue, this paper proposes the IASL-YOLO model based on the YOLOv8s framework, which optimizes the Neck structure, introduces a novel localization loss function, and implements pruning techniques to simultaneously enhance detection accuracy and achieve lightweighting. The acronym IASL encapsulates its core innovations:
  • IA (Improved AFPN): A novel C2f-Faster-EMA-enhanced Asymptotic Feature Pyramid Network (CFE-AFPN) is proposed to replace the Neck module in YOLOv8s. It strengthens multi-scale feature fusion for UAV object detection while maintaining lightweight advantages.
  • S (SIoU Loss): The original Complete-IoU (CIoU) loss is replaced with SCYLLA-IoU (SIoU), improving localization accuracy without additional computational overhead.
  • L (LAMP Pruning): The Layer-Adaptive Sparsity for the Magnitude-Based Pruning (LAMP) algorithm is applied to eliminate redundant parameters while preserving detection performance.
The remainder of this paper is organized as follows: Section 2 provides a detailed introduction to the proposed method; Section 3 describes the datasets used in the experiments; Section 4 presents the experimental results and conducts a comprehensive analysis of the findings; Section 5 discusses the experimental details and compares the results with existing studies; and finally, Section 6 concludes the paper and suggests directions for future research.

2. Methods

The UAV target detection model proposed in this paper is developed by enhancing the YOLOv8 [21] framework. As an iterative version of the YOLO series, YOLOv8 introduces five hierarchical models: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. These models balance detection accuracy and inference speed by adjusting the network scale. YOLOv8 excels in anti-UAV detection tasks, leveraging its optimized multi-scale feature extraction capabilities and robust adaptability to complex environments, which are crucial for addressing the challenges of detecting small UAV targets. Its architecture, as illustrated in Figure 1, consists of three main components: the Backbone, Neck, and Head. To address the issues of insufficient detection accuracy and excessive model size in UAV small target detection scenarios, we propose the IASL-YOLO model, based on YOLOv8s as the baseline. The specific improvement strategies will be detailed in the subsequent sections.
In the design of Backbone, YOLOv8 adopts the CSPDarknet architecture, combined with the C2f module featuring a gradient shunting mechanism to achieve multi-scale feature extraction. It is further enhanced by the Fast Spatial Pyramid Pooling (SPPF) module, which optimizes the contextual information aggregation capability of feature maps. Notably, the C2f module retains the advantage of cross-stage feature concatenation from the Cross-Stage Partial Bottleneck with 3 convolutions (C3) module in YOLOv5 [22], while integrating the multi-branch design philosophy of Efficient Layer Aggregation Networks (ELANs). By introducing additional branches and more flexible feature concatenation, the module significantly boosts the model’s ability to preserve spatial information. This enables YOLOv8 to precisely capture the geometric structure and spatial distribution characteristics of small UAV targets amidst complex background interference, effectively suppressing such disturbances during recognition tasks.
In the Neck section, YOLOv8 utilizes a bidirectional feature pyramid network based on the Path Aggregation Network-Feature Pyramid Network (PAN-FPN) structure, achieving efficient multi-level feature fusion through two complementary pathways. The top–down pathway propagates high-level semantic information from deep features to shallow features, enhancing the semantic representation of shallow layers. Conversely, the bottom–up pathway conveys rich detail information from shallow features to deep features, refining the local details of high-level features. This fusion structure allows YOLOv8 to retain the fine features and critical details of small UAV targets, significantly improving their detection effectiveness in anti-UAV scenarios.
In the Head section, YOLOv8’s loss function comprises classification loss and localization loss. The classification loss employs the Binary Cross-Entropy (BCE) function to measure the discrepancy between predicted and true class probability distributions, ensuring accurate learning of class features. For localization, a combination of Distribution Focal Loss (DFL) [23] and CIoU [24] loss is used. This approach not only ensures precise bounding box localization for UAV targets but also optimizes the shape and position of predicted boxes, thereby enhancing detection performance for fast-moving or dynamically changing UAV targets.

2.1. The Structure of IASL-YOLO

Although the YOLOv8 object detection algorithm can effectively detect small UAV targets, it still faces challenges such as insufficient model lightweighting and inadequate learning and fusion of multi-level features. To address these issues, this paper introduces a lightweight small UAV target detection model, IASL-YOLO, based on YOLOv8s as the baseline model. The network structure is illustrated in Figure 2. The specific improvements include the following aspects: Firstly, the PAN-FPN structure in the Neck part of YOLOv8 is replaced with the proposed CFE-AFPN network, which employs a progressive fusion strategy to better integrate non-adjacent level features. The C2f-Faster-EMA module within this network enhances the representation of local details and global contextual information while reducing computational complexity. Secondly, the CIoU localization loss function of YOLOv8 is replaced with the SIoU localization loss function to address the misalignment between predicted and ground truth bounding boxes. Lastly, the LAMP pruning algorithm is applied to prune the model, significantly reducing its complexity without compromising detection accuracy, thereby achieving further lightweighting.

2.1.1. Improvement of the Neck

In UAV-based object detection tasks, the diversity of target scales imposes higher demands on the design of feature fusion networks. Although YOLOv8 employs the PAN-FPN structure in its Neck section, which enhances feature interaction between adjacent layers through top–down and bottom–up pathways, its support for feature transmission between non-adjacent layers is limited. This limitation may result in the loss of fine details for small targets in deeper layers or the dilution of semantic information for large targets in shallower layers, thereby affecting the overall multi-scale object detection performance.
To address this issue, this paper introduces the Asymptotic Feature Pyramid Network (AFPN) [25], enhances it further by integrating the C2f-Faster-EMA module, and ultimately proposes the CFE-AFPN network. This network significantly enhances UAV object detection performance by employing a progressive fusion strategy, an efficient multi-scale attention mechanism, and the incorporation of a 160 × 160 feature output layer. Additionally, by incorporating a single bottom–up feature fusion path and a design featuring cross-stage partial connections, the architecture significantly simplifies the network structure and effectively reduces model complexity. The structure of the CFE-AFPN network is shown in the Neck section of Figure 2.
The specific workflow of CFE-AFPN is as follows: In the initial stage of the network, CFE-AFPN fuses adjacent low-level features to reduce the semantic gap between them. As the network depth increases, CFE-AFPN progressively introduces higher-level features and integrates them with low-level features through a bottom–up progressive pathway, ultimately incorporating the highest-level features into the fusion process. After each feature fusion, CFE-AFPN employs the C2f-Faster-EMA module to further refine the learned features. Simultaneously, to address information conflicts between features of different levels, CFE-AFPN adopts the Adaptively Spatial Feature Fusion (ASFF) [26] network, which multiplies features from different levels by learnable coefficients, assigning spatial weights to features at various levels. This enables CFE-AFPN to adaptively retain effective information and achieve superior feature fusion. Finally, CFE-AFPN outputs four feature maps with different resolutions, each corresponding to a specific scale to meet the detection requirements of targets of varying sizes. Additionally, the newly introduced 160 × 160 output layer enhances the model’s capability to detect small UAV targets, thereby significantly improving its overall performance in anti-drone detection tasks.
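As an illustration of the adaptive spatial weighting described above, the following minimal PyTorch sketch shows an ASFF-style fusion step in which each pyramid level contributes a learnable, softmax-normalized spatial weight map. The module name, channel count, and number of levels are assumptions for illustration and do not reproduce the exact CFE-AFPN implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSpatialFusion(nn.Module):
    """ASFF-style fusion: per-level 1x1 convs produce spatial weight maps that
    are softmax-normalized across levels and used to blend the features."""
    def __init__(self, channels: int, num_levels: int = 3):
        super().__init__()
        self.weight_convs = nn.ModuleList(
            [nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_levels)]
        )

    def forward(self, feats):
        # feats: list of tensors of identical shape (B, C, H, W), already
        # resized/projected to a common resolution before fusion.
        logits = torch.cat([conv(f) for conv, f in zip(self.weight_convs, feats)], dim=1)
        weights = F.softmax(logits, dim=1)              # (B, num_levels, H, W), sums to 1
        return sum(weights[:, i:i + 1] * f for i, f in enumerate(feats))

if __name__ == "__main__":
    fuse = AdaptiveSpatialFusion(channels=64, num_levels=3)
    levels = [torch.randn(1, 64, 40, 40) for _ in range(3)]
    print(fuse(levels).shape)  # torch.Size([1, 64, 40, 40])
```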
Notably, while retaining the feature fusion capabilities of AFPN, CFE-AFPN reduces computational costs and also improves detection performance by replacing the Residual Networks (ResNet) [27] residual units in the AFPN network with the C2f-Faster-EMA module. The structure of the C2f-Faster-EMA module is illustrated in Figure 3.
The C2f-Faster-EMA module begins by performing a convolution operation on the input feature map, followed by splitting the feature map into two branches. One branch directly participates in subsequent feature fusion, while the other branch is processed through multiple stacked FasterBlock modules. The outputs of all FasterBlock modules except the last one are also extracted and incorporated into the subsequent feature fusion process to capture more fine-grained features. The two branches are then fused, and the EMA mechanism is employed to enhance cross-spatial feature representation, ultimately generating multi-scale attention feature maps. By integrating the strengths of C2f, FasterNet [28], and the EMA [29] attention mechanism, the C2f-Faster-EMA module achieves a lightweight design while further improving the model’s detection performance.
We introduced the C2f structure to replace the original residual units in AFPN for further feature learning after feature fusion. C2f, a key component of YOLOv8, utilizes Cross-Stage Partial Network (CSPNet)’s [30] cross-stage partial connection design to divide the input feature map into two parts: one part is passed directly to the subsequent stage, while the other part undergoes convolution before being fused with the first part. Compared to the residual units in ResNet, which require convolution on the entire feature map before adding it to the original feature map, this design in CSPNet significantly reduces computational redundancy and memory demands.
To further enhance model efficiency, we replaced the original stacked Bottleneck structure with the more lightweight FasterBlock in the C2f framework. FasterBlock, the core component of FasterNet, consists of a 3 × 3 partial convolution (PConv) layer and two 1 × 1 point convolutions. Specifically, PConv applies convolution to only 1/4 of the input feature channels, while the remaining channels are directly preserved and involved in subsequent feature fusion. This design reduces the computational load of PConv to 1/16 of that of a standard convolution, significantly alleviating the model’s computational burden.
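As a concrete illustration of the partial-convolution idea described above, here is a minimal PyTorch sketch of a PConv layer and a FasterBlock-style wrapper (a 3 × 3 partial convolution followed by two 1 × 1 pointwise convolutions with a residual connection). The expansion factor and channel counts are assumptions for illustration, not the exact FasterNet configuration.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: a 3x3 conv over a fraction of the channels (1/4 here);
    the remaining channels are passed through untouched."""
    def __init__(self, channels: int, partial_ratio: float = 0.25):
        super().__init__()
        self.conv_channels = int(channels * partial_ratio)
        self.conv = nn.Conv2d(self.conv_channels, self.conv_channels,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        x_conv, x_id = torch.split(
            x, [self.conv_channels, x.size(1) - self.conv_channels], dim=1)
        return torch.cat([self.conv(x_conv), x_id], dim=1)

class FasterBlock(nn.Module):
    """PConv followed by two 1x1 pointwise convolutions and a residual connection
    (the expansion factor of 2 is an assumption)."""
    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        self.pconv = PConv(channels)
        self.pointwise = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels * expansion),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels * expansion, channels, kernel_size=1, bias=False),
        )

    def forward(self, x):
        return x + self.pointwise(self.pconv(x))

if __name__ == "__main__":
    block = FasterBlock(64)
    print(block(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 64, 80, 80])
```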
Additionally, we incorporated the Efficient Multi-scale Attention (EMA) mechanism after feature concatenation in the C2f module, as illustrated in Figure 4, to notably improve small-target detection performance. The EMA mechanism reshapes partial channels into batch dimensions and combines them with a grouped processing strategy to achieve uniform distribution of semantic features across channels, effectively preserving channel information. Its core architecture consists of two parallel branches: one branch employs 1 × 1 convolution to capture global context information, while the other uses 3 × 3 convolution to extract multi-scale local features. By fusing the outputs of both branches through cross-spatial learning, EMA precisely captures pixel-level pairwise relationships, enhancing the global contextual representation of features. This design enables more efficient extraction of multi-scale features, significantly improving the detection capability for small UAV targets.
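The sketch below captures, in heavily simplified form, the two-branch idea behind EMA: channel groups are folded into the batch dimension, a 1 × 1 branch encodes pooled global context, a 3 × 3 branch encodes local detail, and their combination gates the features spatially. It is a conceptual illustration under these assumptions, not the published EMA module.

```python
import torch
import torch.nn as nn

class TwoBranchAttention(nn.Module):
    """Simplified grouped two-branch attention: a global branch (1x1 conv on
    pooled context) and a local branch (3x3 conv) produce a sigmoid gate."""
    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        self.groups = groups
        cg = channels // groups
        self.conv1x1 = nn.Conv2d(cg, cg, kernel_size=1)
        self.conv3x3 = nn.Conv2d(cg, cg, kernel_size=3, padding=1)

    def forward(self, x):
        b, c, h, w = x.shape
        xg = x.reshape(b * self.groups, c // self.groups, h, w)   # channels -> batch dim
        ctx = self.conv1x1(xg.mean(dim=(2, 3), keepdim=True).expand_as(xg))
        loc = self.conv3x3(xg)
        gate = torch.sigmoid(ctx + loc)                           # fused spatial weights
        return (xg * gate).reshape(b, c, h, w)

if __name__ == "__main__":
    attn = TwoBranchAttention(channels=64, groups=8)
    print(attn(torch.randn(2, 64, 40, 40)).shape)  # torch.Size([2, 64, 40, 40])
```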

2.1.2. Introduction of the SIoU Loss Function

In YOLOv8, the traditional localization loss function CIoU comprehensively considers the distance between the centers of the predicted and ground truth bounding boxes, their aspect ratios, and their overlapping regions. However, it still has a key limitation: it does not account for directional alignment between the two boxes. Specifically, when there is a significant angular deviation between the line connecting the centers of the predicted and ground truth boxes and the coordinate axes (horizontal or vertical directions), the regression process of the model requires additional adjustments for directional correction, which increases optimization difficulty and reduces convergence efficiency.
To address this issue more effectively, we introduce the SIoU [31] loss function. Compared to CIoU, SIoU incorporates an angle cost term to penalize directional misalignment between the predicted and ground truth boxes. This mechanism guides the model to prioritize directional alignment, thereby accelerating convergence and improving detection accuracy. The SIoU loss consists of four components: angle cost, distance cost, shape cost, and IoU cost. The physical meanings and calculation methods of each component are as follows:
Angle Cost ($\Lambda$): Precisely quantifies the angular deviation between the predicted and ground truth boxes, ensuring accurate directional alignment.
$$\Lambda = 1 - 2\sin^2\!\left(\arcsin\!\left(\frac{c_h}{\sigma}\right) - \frac{\pi}{4}\right)$$
where $c_h$ is the height difference between the centers of the ground truth and predicted boxes, and $\sigma$ is the distance between their centers.
Distance Cost ($\Delta$): Measures the distance between the centers of the two boxes, optimizing positional regression.
$$\Delta = \sum_{t \in \{x, y\}} \left(1 - e^{-\gamma \rho_t}\right), \quad \rho_x = \left(\frac{b_{c_x}^{gt} - b_{c_x}}{c_w}\right)^2, \quad \rho_y = \left(\frac{b_{c_y}^{gt} - b_{c_y}}{c_{h1}}\right)^2, \quad \gamma = 2 - \Lambda$$
where $b_{c_x}^{gt}$ and $b_{c_x}$ are the x-coordinates of the centers of the ground truth and predicted boxes, $b_{c_y}^{gt}$ and $b_{c_y}$ are their y-coordinates, $c_{h1}$ is the height of the minimum enclosing rectangle, $c_w$ is its width, and $\rho_t$ denotes $\rho_x$ or $\rho_y$.
Shape Cost ($\Omega$): Evaluates the difference in aspect ratios between the predicted and ground truth boxes to accurately capture the target’s morphological characteristics.
$$\Omega = \sum_{t \in \{w, h\}} \left(1 - e^{-w_t}\right)^{\theta}, \quad w_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \quad w_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}$$
where $(w, h)$ and $(w^{gt}, h^{gt})$ represent the widths and heights of the predicted and ground truth boxes, respectively, $\theta$ controls the influence of the shape cost, and $w_t$ denotes $w_w$ or $w_h$.
IoU Cost: Measures the degree of overlap between the two boxes to ensure spatial consistency.
$$\mathrm{IoU} = \frac{|b \cap b^{gt}|}{|b \cup b^{gt}|}$$
where $b$ is the predicted box and $b^{gt}$ is the ground truth box.
Finally, the SIoU loss function is obtained by integrating these four components, as shown in the following equation.
$$L_{SIoU} = 1 - \mathrm{IoU} + \frac{\Delta + \Omega}{2}$$
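For reference, the sketch below assembles the four cost terms above into the SIoU loss in PyTorch. Boxes are assumed to be given as (x1, y1, x2, y2) tensors, θ is set to an assumed default of 4, and the epsilon/clamping details are illustrative, so it should be read as a hedged reference sketch rather than the authors' exact code.

```python
import math
import torch

def siou_loss(pred, gt, theta: float = 4.0, eps: float = 1e-7):
    """SIoU loss for boxes in (x1, y1, x2, y2) format, shape (N, 4)."""
    # Widths, heights, and centers of predicted and ground truth boxes
    pw, ph = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    gw, gh = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    pcx, pcy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    gcx, gcy = (gt[:, 0] + gt[:, 2]) / 2, (gt[:, 1] + gt[:, 3]) / 2

    # IoU cost
    inter_w = (torch.min(pred[:, 2], gt[:, 2]) - torch.max(pred[:, 0], gt[:, 0])).clamp(min=0)
    inter_h = (torch.min(pred[:, 3], gt[:, 3]) - torch.max(pred[:, 1], gt[:, 1])).clamp(min=0)
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Minimum enclosing box (c_w and c_h1 in the text)
    cw = torch.max(pred[:, 2], gt[:, 2]) - torch.min(pred[:, 0], gt[:, 0])
    ch = torch.max(pred[:, 3], gt[:, 3]) - torch.min(pred[:, 1], gt[:, 1])

    # Angle cost: Lambda = 1 - 2 sin^2(arcsin(c_h / sigma) - pi / 4)
    sigma = torch.sqrt((gcx - pcx) ** 2 + (gcy - pcy) ** 2) + eps
    sin_alpha = (torch.abs(gcy - pcy) / sigma).clamp(-1 + eps, 1 - eps)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost with gamma = 2 - Lambda
    gamma = 2 - angle
    rho_x = ((gcx - pcx) / (cw + eps)) ** 2
    rho_y = ((gcy - pcy) / (ch + eps)) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # Shape cost
    ww = torch.abs(pw - gw) / torch.max(pw, gw).clamp(min=eps)
    wh = torch.abs(ph - gh) / torch.max(ph, gh).clamp(min=eps)
    shape = (1 - torch.exp(-ww)) ** theta + (1 - torch.exp(-wh)) ** theta

    return 1 - iou + (dist + shape) / 2

if __name__ == "__main__":
    pred = torch.tensor([[10.0, 10.0, 50.0, 60.0]])
    gt = torch.tensor([[12.0, 8.0, 48.0, 62.0]])
    print(siou_loss(pred, gt))
```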

2.1.3. Model Pruning Using the LAMP Algorithm

YOLOv8 has achieved remarkable breakthroughs in object detection performance and accuracy. However, the increased model complexity has led to a dramatic rise in computational load and parameter volume. This issue makes efficient deployment on resource-constrained embedded devices particularly challenging, especially for anti-UAV detection tasks, where real-time processing and lightweight models are crucial. To address these limitations, we introduce the LAMP [32] algorithm, which adaptively optimizes the model structure to enable lightweight deployment while maintaining detection accuracy, providing a viable solution for anti-UAV detection.
The implementation of model pruning involves three key steps: First, determine the speed-up ratio, which is the ratio of the computational load of the original model to that of the pruned model. Second, apply the LAMP pruning algorithm to perform the pruning operation. Finally, fine-tune the pruned model to recover any potential performance loss during the pruning process.
The core of the LAMP algorithm lies in its layer-adaptive scoring mechanism. This algorithm meticulously scores the weight tensors corresponding to fully connected layers and convolutional layers, dynamically pruning the connections with the smallest LAMP scores until the predefined speed-up ratio is achieved. By dynamically adjusting weight scores and integrating layer-wise optimization strategies, the LAMP algorithm effectively preserves the model’s core features while significantly reducing redundant computations. This approach greatly enhances computational resource efficiency, making the model more suitable for deployment in resource-constrained anti-UAV detection scenarios.
The calculation formula for the LAMP score is as follows. Here, $W[u]$ represents the weight corresponding to the convolution kernel indexed by $u$, with the weights sorted in ascending order of magnitude so that $|W[u]| \le |W[v]|$ whenever $u < v$.
$$\mathrm{score}(u; W) = \frac{(W[u])^2}{\sum_{v \ge u} (W[v])^2}$$
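To make the scoring rule concrete, the following sketch computes LAMP scores for a single weight tensor exactly as in the formula above: weights are sorted in ascending order of magnitude, and each squared weight is divided by the sum of squares of all weights of equal or greater magnitude. Extending it to a whole network (gathering scores across layers and removing the globally smallest connections until the target speed-up ratio is met) follows the workflow described earlier; this is an illustrative sketch, not the authors' pruning pipeline.

```python
import torch

def lamp_scores(weight: torch.Tensor) -> torch.Tensor:
    """Return the LAMP score of every element of a layer's weight tensor."""
    w2 = weight.detach().flatten() ** 2
    sorted_w2, order = torch.sort(w2)                          # ascending magnitude
    # Denominator for index u: sum of squares over all v >= u (a suffix sum).
    suffix = torch.flip(torch.cumsum(torch.flip(sorted_w2, [0]), dim=0), [0])
    scores_sorted = sorted_w2 / suffix.clamp(min=1e-12)
    scores = torch.empty_like(scores_sorted)
    scores[order] = scores_sorted                              # undo the sort
    return scores.view_as(weight)

if __name__ == "__main__":
    w = torch.tensor([0.10, -0.50, 0.20, 0.05])
    # The largest-magnitude weight always scores 1.0; smaller weights score less.
    print(lamp_scores(w))
```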

3. Dataset Description

The small-target detection images used in this experiment were sourced from the publicly available Anti-UAV [33] dataset released by Dalian University of Technology in November 2023. This dataset comprises 10,000 images specifically for drone target detection. It covers a wide range of outdoor environments, including high altitudes, dense clouds, forests, high-rise buildings, urban and rural residences, agricultural lands, and sports fields. Additionally, the dataset incorporates diverse lighting conditions (such as daytime, dusk, and nighttime) and weather variations (including clear and cloudy conditions). The dataset features over 35 types of drone targets, with sizes ranging from 35 × 20 pixels to 110 × 80 pixels. The average target coverage is approximately 0.013, with the minimum coverage as low as $1.9 \times 10^{-6}$, while the largest targets can occupy up to 0.7 of the image area. Following the original dataset’s partitioning method, we divided the images into a training set of 5200 images, a validation set of 2600 images, and a test set of 2200 images. All images were manually annotated in a format suitable for YOLO training, including target categories and bounding box coordinates. Figure 5 provides a selection of images from this dataset.
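To clarify the annotation format mentioned above, each image has a plain-text label file with one line per target: the class index followed by the bounding-box center coordinates, width, and height, all normalized to [0, 1]. The helper below converts a pixel-space box into this format; the pixel values and the single drone class id are hypothetical examples.

```python
def to_yolo_label(class_id: int, x1: float, y1: float, x2: float, y2: float,
                  img_w: int, img_h: int) -> str:
    """Convert a pixel-space box (x1, y1, x2, y2) to a YOLO-format label line."""
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# Example: a 110 x 80 pixel drone in a 640 x 512 image (class 0 = drone)
print(to_yolo_label(0, 265, 216, 375, 296, 640, 512))
```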

4. Results

In this section, experiments are conducted based on the publicly available Anti-UAV dataset to validate the effectiveness of the proposed method. The experimental process encompasses several aspects: first, a detailed explanation of the experimental environment and hyperparameter configurations is provided; second, various metrics for evaluating model performance and lightweight design are introduced; next, the ablation experiment results are thoroughly analyzed and interpreted; finally, by comparing with baseline models and other different YOLOv8 models, the detection efficacy of the proposed method is further verified.

4.1. Experimental Environment and Hyperparameter Settings

The experimental environment is a Linux server equipped with Nvidia Tesla K80 GPUs (4 GPUs, 48 GB of VRAM in total). The software versions used are Python 3.8.19, PyTorch 1.12.1, CUDA 10.2, and YOLOv8 8.3.12. Through systematic experimental validation, the optimal hyperparameters were determined as follows: a batch size of 24, 300 training epochs, an input image size of 640 × 640, and an initial learning rate of 0.01. Note that the proposed model adopts a speed-up ratio of 1.5 when applying the LAMP pruning algorithm; the rationale behind this setting is discussed in Section 5.1. The experiments are based on the YOLOv8s model, and Table 1 lists the key parameter configurations.
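For reproducibility, a minimal Ultralytics training call consistent with the Table 1 settings might look as follows. The dataset YAML path is a placeholder assumption, and the final learning rate of 0.0001 is expressed through the lrf factor (final LR = lr0 × lrf).

```python
from ultralytics import YOLO

# Train the YOLOv8s baseline with the hyperparameters listed in Table 1.
model = YOLO("yolov8s.yaml")
model.train(
    data="anti_uav.yaml",   # hypothetical dataset configuration file
    epochs=300,
    batch=24,
    imgsz=640,
    optimizer="SGD",
    momentum=0.937,
    lr0=0.01,               # initial learning rate
    lrf=0.01,               # final LR factor: 0.01 * 0.01 = 0.0001
)
```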

4.2. Experimental Evaluation Metrics

For the models obtained during the training phase in this experiment, we adopted Precision, Recall, mAP50, and mAP50-95 as performance metrics for detection accuracy, along with Model Size, Parameters, and GFLOPs as lightweight metrics to evaluate the model efficiency. Additionally, FPS was employed as a key metric to measure the model’s inference speed.
Precision represents the proportion of actual positive samples among those predicted as positive by the model. Here, TP (True Positives) denotes the number of samples correctly predicted as drones, while FP (False Positives) denotes the number of samples incorrectly predicted as drones.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall refers to the proportion of actual positive samples that are correctly predicted as positive by the model. Here, TP (True Positives) denotes the number of samples correctly predicted as drones, while FN (False Negatives) denotes the number of samples that the model failed to correctly predict as drones.
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
mAP (Mean Average Precision) is a core evaluation metric in object detection tasks, used to measure the overall performance of a model in multi-class detection. It is calculated by taking the mean of the area under the precision–recall curve (also known as Average Precision, AP) for each class, comprehensively reflecting the model’s balanced performance between detection precision and recall. This metric is widely used for performance evaluation in the field of object detection. mAP50 (Mean Average Precision at IoU = 0.5) represents the average precision when the IoU threshold is set to 0.5, which is the area under the precision–recall curve. mAP50-95, on the other hand, is a comprehensive evaluation of the average precision across IoU thresholds ranging from 0.5 to 0.95.
$$AP = \int_0^1 P(R)\, dR$$
$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$
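As a minimal numerical illustration of these definitions, AP can be approximated as the area under the precision-recall curve and mAP as the mean of the per-class AP values; full COCO/VOC-style evaluators additionally perform IoU-based matching and interpolation, so the sketch below is a simplification.

```python
import numpy as np

def average_precision(recall, precision):
    """Trapezoidal area under a precision-recall curve sorted by increasing recall."""
    r = np.asarray(recall, dtype=float)
    p = np.asarray(precision, dtype=float)
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0))

def mean_average_precision(per_class_ap):
    """mAP is the mean of the per-class AP values."""
    return float(np.mean(per_class_ap))

if __name__ == "__main__":
    r = [0.0, 0.5, 1.0]
    p = [1.0, 0.8, 0.6]
    ap = average_precision(r, p)
    print(ap)                             # 0.8
    print(mean_average_precision([ap]))   # single drone class, so mAP equals AP
```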
Model Size refers to the size of a model, measured in megabytes (MB), which indicates the amount of storage and bandwidth resources required for saving and transferring the model.
Parameters represent the number of learnable weights in a model, reported here in units of 10^6. They reflect the complexity of the model.
GFLOPs (giga floating-point operations) quantifies the computational load of a model as the number of floating-point operations required for a single forward pass, measured in billions of operations.
FPS (Frames Per Second) denotes the number of image frames processed by the model per second, measuring its inference speed.

4.3. Ablation Experiment

To validate the rationality of the IASL-YOLO model, this study conducted a series of ablation experiments on the Anti-UAV dataset; the detailed results are presented in Table 2. Using the YOLOv8s model (Y) as the baseline, the experiments evaluated the individual and combined effects of the CFE-AFPN module (IA), the SIoU localization loss function (S), and the LAMP pruning algorithm (L). When the CFE-AFPN module was introduced independently (Y+IA), detection performance improved markedly: precision, recall, and mAP50 increased by 2.0%, 6.1%, and 3.7%, respectively. In terms of lightweighting, this module reduced the model size by 37% and the parameter count by 39%; despite a 7% increase in computational cost and a drop in FPS, the overall gain was substantial. Introducing the SIoU loss function alone (Y+S) yielded modest performance improvements, while the LAMP pruning algorithm alone (Y+L) achieved slight gains in performance metrics alongside significant lightweighting and higher FPS. The progressive experiments (Y→Y+IA→Y+IA+S→Y+IA+S+L) show that, after integrating the CFE-AFPN module, adding SIoU further improved the performance metrics slightly. Applying the LAMP pruning algorithm on top of this maintained performance while achieving further lightweighting, reducing the model size, parameter count, and computational cost to 5.3 MB, 2.4 million parameters, and 19.9 GFLOPs, respectively, with FPS recovering to 51.1.

4.4. Comparative Experiments with the Baseline Model

To validate the effectiveness of the proposed model in this paper, we conducted experiments on the Anti-UAV dataset and compared the IASL-YOLO model with the baseline model under the same experimental conditions. The experimental results are presented in Table 3. The results demonstrate that the IASL-YOLO model achieves significant improvements in detection performance, with precision increasing by 2.9%, recall by 6.8%, mAP50 by 3.9%, and mAP50-95 by 3.8%. In terms of model lightweighting, the model size is reduced by 75%, the number of parameters is decreased by 78%, and the computational load is reduced by 30%. While the proposed model exhibits a decrease in FPS, the significant improvements in detection accuracy and model lightweighting make it markedly superior to the YOLOv8s baseline model in overall performance.
To further evaluate the detection performance of the IASL-YOLO model, this study selected images from the Anti-UAV dataset featuring complex backgrounds at different times of day and varying UAV pixel ratios for comparative experiments. The experimental results demonstrate that the IASL-YOLO model exhibits significant advantages over the baseline model, YOLOv8s.
As illustrated in Figure 6, in images with a higher UAV pixel ratio, YOLOv8s is capable of identifying UAV targets but suffers from insufficient bounding box accuracy and missed detections. In contrast, the IASL-YOLO model not only provides more precise bounding boxes but also achieves a notable improvement in detection accuracy. In images with a lower UAV pixel ratio, YOLOv8s continues to struggle with missed detections and low precision, whereas the IASL-YOLO model not only accurately identifies targets but also delivers higher confidence scores.
Furthermore, as seen in the inference results in Figure 7, in complex backgrounds at different times of day, the IASL-YOLO model consistently outperforms YOLOv8s. Specifically, it significantly reduces the missed detection rate while enhancing detection accuracy. These experimental results comprehensively validate the superior performance of the IASL-YOLO model in UAV detection tasks.
We also conducted a performance comparison between YOLOv8s and our proposed model in multi-UAV target scenarios, as shown in Figure 8. The results demonstrate that our model outperforms other models in multi-UAV detection tasks, effectively reducing missed detections and false positives while achieving significantly higher accuracy than YOLOv8s, particularly for small-sized targets with low pixel proportions, where the detection improvement is even more pronounced.
It is important to emphasize that the cases where the proposed model displayed confidence scores below 0.7 in the presented test samples occurred only under extreme testing conditions, such as scenes with severe background interference. These instances represent edge cases that fall well beyond the scope of conventional applications. Notably, in these highly challenging scenarios where the benchmark model YOLOv8s either failed to detect objects entirely or produced critically low confidence scores, our IASL-YOLO model consistently delivered reliable detection performance. This comparative result strongly confirms the model’s significant robustness advantage when handling edge cases.

4.5. Comparative Experiments with Different YOLOv8 Models

To further validate the effectiveness of the drone target detection model proposed in this study, comparative experiments were conducted with the YOLOv8 series models, namely YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. The experimental results are presented in Table 4. In terms of detection performance, the proposed model achieved the highest values in precision (96.8%), recall (88.2%), mAP50 (92.4%), and mAP50-95 (61.9%). Additionally, in terms of lightweight design, compared to YOLOv8n, the smallest model in the YOLOv8 series, the proposed model reduced the model size to 88.3% of YOLOv8n and the number of parameters to 75%. Meanwhile, in terms of inference speed, the FPS metric of the proposed model shows significant improvement over YOLOv8m, YOLOv8l, and YOLOv8x. In conclusion, the proposed model demonstrates optimal performance in balancing detection accuracy and lightweight design.

5. Discussion

This section systematically evaluates the proposed IASL-YOLO model through comprehensive experiments. First, we analyze the impact of different speed-up ratios in model pruning to determine the optimal balance between accuracy and model lightweightness. Second, we visualize and analyze the training/validation curves of the proposed SIoU loss function. Finally, we benchmark IASL-YOLO against state-of-the-art methods to demonstrate its superior performance in both detection accuracy and computational efficiency.

5.1. Experiment to Determine the Speed-Up Ratio for Model Pruning

In the process of model pruning experiments, determining the speed-up ratio is a critical step, requiring a comprehensive consideration of the balance between performance and model simplification. An excessively high speed-up ratio may lead to a significant decline in performance or even the inability to achieve the target speed-up ratio, whereas a lower speed-up ratio may not result in substantial model lightweighting. Therefore, the optimal speed-up ratio should aim for the least performance degradation while pursuing the maximum degree of model simplification.
To explore the optimal speed-up ratio for the proposed model on the selected dataset, this experiment compared the model’s performance at different speed-up ratios; the results are shown in Table 5. At a speed-up ratio of 1.0, the model is not pruned. At a speed-up ratio of 1.5, compared with the unpruned model, precision increased by 0.8%, recall by 0.5%, mAP50 by 0.1%, and mAP50-95 by 0.6%, while the model size was reduced by 61%, the number of parameters by 65%, and the computational load by 35%. At this speed-up ratio, the model therefore achieves a slight performance improvement together with excellent lightweighting metrics. At speed-up ratios of 2.0 and 2.5, although the model size, number of parameters, and computational load were further reduced and FPS gradually increased, precision, recall, and mAP50 suffered varying degrees of loss, so these settings cannot maintain good detection performance. Therefore, a speed-up ratio of 1.5 was chosen for pruning the model. With the speed-up ratio set to 1.5, a comparison of the number of channels in each convolutional layer before and after pruning, as depicted in Figure 9, reveals that the LAMP pruning algorithm predominantly removes redundant channels from the convolutional layers in the model’s Backbone network.

5.2. Training and Validation Curves of SIoU Loss

To better visualize the loss dynamics after introducing the SIoU localization loss function, its change curves during model training and validation are plotted in Figure 10. The model demonstrates favorable convergence within 300 epochs: the loss values drop rapidly in the initial stage, then slow down and stabilize in later phases, eventually approaching 1.0 for both training and validation. The training loss remains slightly lower than the validation loss, but the gap is reasonable, with no signs of overfitting. The above analysis indicates that the SIoU loss function effectively guides the model optimization process, and its smooth, stable convergence characteristics provide reliable support for improving model performance.

5.3. Comparative Experiments with Existing Methods

To validate the performance of the IASL-YOLO model, this paper conducted comparative experiments against RT-DETR-L [34], YOLOv7-tiny [35], YOLOv9s [36], YOLOv10s [37], and YOLOv11s [21] on the Anti-UAV dataset. The pruning speed-up setting for the IASL-YOLO model was 1.5, and the experimental results are shown in Table 6. The results demonstrate that IASL-YOLO exhibits significant advantages in both object detection performance and lightweight design. In terms of detection performance, the proposed IASL-YOLO model achieved the best results in all four key metrics: precision, recall, mAP50, and mAP50-95. Notably, IASL-YOLO showed a substantial performance advantage in the recall metric, outperforming the next-best model, RT-DETR-L, by 5%. In terms of lightweight design, IASL-YOLO also excels. Experimental data indicate that IASL-YOLO is significantly smaller in model size and number of parameters compared to the other models. Specifically, the number of parameters and model size of IASL-YOLO are only 8% of those of RT-DETR-L. Although the computational cost of IASL-YOLO is higher than that of YOLOv7-tiny, it significantly outperforms YOLOv7-tiny across all other performance and lightweight evaluation metrics. Meanwhile, in terms of inference speed, the model proposed in this paper demonstrates significantly superior FPS performance compared to both the RT-DETR-L and YOLOv9s models.
In summary, the IASL-YOLO model proposed in this paper outperforms the compared models in both detection performance and lightweight design. The model integrates features from different levels more effectively, significantly improving object localization accuracy, and, by leveraging model pruning, achieves substantial lightweight optimization while maintaining high detection precision. These advantages explain why it stands out in both the UAV small-object detection and the lightweight comparison experiments.

6. Conclusions

To address the balance between accuracy and lightweight design in anti-drone detection, this study proposes a lightweight drone detection model named IASL-YOLO. The model first replaces the Neck part of YOLOv8s with the CFE-AFPN network, utilizing a bottom–up progressive feature fusion mechanism to reduce the model’s size while minimizing the semantic gap between non-adjacent hierarchical features. The integrated C2f-Faster-EMA module not only reduces computational load but also enhances the expressive power of local details and global features. Secondly, the SIoU localization loss function is employed in place of CIoU, effectively resolving the misalignment between predicted and true bounding boxes, thereby significantly improving the model’s detection accuracy. Lastly, the LAMP pruning algorithm is applied to the model, substantially reducing its size, number of parameters, and computational complexity. Experiments demonstrate that the proposed model outperforms the original model and other state-of-the-art models on the Anti-UAV dataset. The innovative design of this model offers a unique advantage in solving the challenge of balancing performance and lightweight in drone target detection, providing an efficient solution for anti-drone detection.
In future research, we aim to explore the application of knowledge distillation techniques to further optimize the model’s performance. By transferring knowledge from complex models to simplified ones, this technique can effectively enhance the accuracy and efficiency of pruned models. We plan to apply this technique to our detection model, with the goal of achieving a comprehensive improvement in performance.

Author Contributions

Conceptualization, H.Y. and S.F.; methodology, H.Y.; software, H.Y.; validation, H.Y., B.L., and J.J.; formal analysis, H.Y. and B.L.; investigation, S.F., A.F., and C.L.; resources, B.L.; data curation, H.Y. and S.F.; writing—original draft preparation, H.Y.; writing—review and editing, H.Y. and S.F.; visualization, A.F. and C.L.; supervision, J.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China; grant number 2022YFC3320800.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhu, X. Analysis of military application of UAV swarm technology. In Proceedings of the 2020 3rd International Conference on Unmanned Systems (ICUS), Harbin, China, 27–28 November 2020; pp. 1200–1204. [Google Scholar]
  2. Yan, C.; Fu, L.; Zhang, J.; Wang, J. A comprehensive survey on UAV communication channel modeling. IEEE Access 2019, 7, 107769–107792. [Google Scholar] [CrossRef]
  3. Murray, C.C.; Raj, R. The multiple flying sidekicks traveling salesman problem: Parcel delivery with multiple drones. Transp. Res. Part C Emerg. Technol. 2020, 110, 368–398. [Google Scholar] [CrossRef]
  4. Hengy, S.; Laurenzis, M.; Schertzer, S.; Hommes, A.; Kloeppel, F.; Shoykhetbrod, A.; Geibig, T.; Johannes, W.; Rassy, O.; Christnacher, F. Multimodal UAV detection: Study of various intrusion scenarios. In Proceedings of the Electro-Optical Remote Sensing XI, SPIE, Warsaw, Poland, 11–12 September 2017; Volume 10434, pp. 203–212. [Google Scholar]
  5. Zhi, Y.; Fu, Z.; Sun, X.; Yu, J. Security and privacy issues of UAV: A survey. Mob. Netw. Appl. 2020, 25, 95–101. [Google Scholar] [CrossRef]
  6. Pietrek, G. Threats to critical infrastructure. The case of unmanned aerial vehicles. J. Mod. Sci. 2022, 49, 120–133. [Google Scholar] [CrossRef]
  7. Ge, C.; Song, Y.; Ma, C.; Qi, Y.; Luo, P. Rethinking attentive object detection via neural attention learning. IEEE Trans. Image Process. 2023, 33, 1726–1739. [Google Scholar] [CrossRef] [PubMed]
  8. Chen, W.; Hong, D.; Qi, Y.; Han, Z.; Wang, S.; Qing, L.; Huang, Q.; Li, G. Multi-attention network for compressed video referring object segmentation. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 4416–4425. [Google Scholar]
  9. Phan, V.M.H.; Xie, Y.; Zhang, B.; Qi, Y.; Liao, Z.; Perperidis, A.; Phung, S.L.; Verjans, J.W.; To, M.S. Structural attention: Rethinking transformer for unpaired medical image synthesis. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Marrakesh, Morocco, 6–10 October 2024; pp. 690–700. [Google Scholar]
  10. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  11. Stefenon, S.F.; Seman, L.O.; Klaar, A.C.R.; Ovejero, R.G.; Leithardt, V.R.Q. Hypertuned-YOLO for interpretable distribution power grid fault location based on EigenCAM. Ain Shams Eng. J. 2024, 15, 102722. [Google Scholar] [CrossRef]
  12. Singh, G.; Stefenon, S.F.; Yow, K.C. Interpretable visual transmission lines inspections using pseudo-prototypical part network. Mach. Vis. Appl. 2023, 34, 41. [Google Scholar] [CrossRef]
  13. Stefenon, S.F.; Singh, G.; Souza, B.J.; Freire, R.Z.; Yow, K.C. Optimized hybrid YOLOu-Quasi-ProtoPNet for insulators classification. IET Gener. Transm. Distrib. 2023, 17, 3501–3511. [Google Scholar] [CrossRef]
  14. Hu, Y.; Wu, X.; Zheng, G.; Liu, X. Object detection of UAV for anti-UAV based on improved YOLO v3. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 8386–8390. [Google Scholar]
  15. Fang, A.; Feng, S.; Liang, B.; Jiang, J. Real-time detection of unauthorized unmanned aerial vehicles using SEB-YOLOv8s. Sensors 2024, 24, 3915. [Google Scholar] [CrossRef] [PubMed]
  16. Ma, J.; Huang, S.; Jin, D.; Wang, X.; Li, L.; Guo, Y. LA-YOLO: An effective detection model for multi-UAV under low altitude background. Meas. Sci. Technol. 2024, 35, 055401. [Google Scholar] [CrossRef]
  17. Zamri, F.N.M.; Gunawan, T.S.; Yusoff, S.H.; Alzahrani, A.A.; Bramantoro, A.; Kartiwi, M. Enhanced small drone detection using optimized YOLOv8 with attention mechanisms. IEEE Access 2024, 12, 90629–90643. [Google Scholar] [CrossRef]
  18. Niu, R.; Qu, Y.; Wang, Z. UAV detection based on improved YOLOv4 object detection model. In Proceedings of the 2021 2nd International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), Zhuhai, China, 24–26 September 2021; pp. 25–29. [Google Scholar]
  19. Zhang, X.; Fan, K.; Hou, H.; Liu, C. Real-time detection of drones using channel and layer pruning, based on the YOLOv3-SPP3 deep learning algorithm. Micromachines 2022, 13, 2199. [Google Scholar] [CrossRef] [PubMed]
  20. Feng, Y.; Wang, T.; Jiang, Q.; Zhang, C.; Sun, S.; Qian, W. A Efficient and Accurate UAV Detection Method Based on YOLOv5s. Appl. Sci. 2024, 14, 6398. [Google Scholar] [CrossRef]
  21. Jocher, G.; Chaurasia, A.; Qiu, J. YOLO by Ultralytics. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 1 March 2025).
  22. Jocher, G. YOLOv5 by Ultralytics. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 1 March 2025).
  23. Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Adv. Neural Inf. Process. Syst. 2020, 33, 21002–21012. [Google Scholar]
  24. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000. [Google Scholar]
  25. Yang, G.; Lei, J.; Zhu, Z.; Cheng, S.; Feng, Z.; Liang, R. AFPN: Asymptotic feature pyramid network for object detection. arXiv 2023, arXiv:2306.15988. [Google Scholar]
  26. Liu, S.; Huang, D.; Wang, Y. Learning spatial fusion for single-shot object detection. arXiv 2019, arXiv:1911.09516. [Google Scholar]
  27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  28. Chen, J.; Kao, S.h.; He, H.; Zhuo, W.; Wen, S.; Lee, C.H.; Chan, S.H.G. Run, don’t walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 12021–12031. [Google Scholar]
  29. Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient multi-scale attention module with cross-spatial learning. In Proceedings of the ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
  30. Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 390–391. [Google Scholar]
  31. Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740. [Google Scholar]
  32. Lee, J.; Park, S.; Mo, S.; Ahn, S.; Shin, J. Layer-adaptive sparsity for the magnitude-based pruning. arXiv 2020, arXiv:2010.07611. [Google Scholar]
  33. Zhao, J.; Zhang, J.; Li, D.; Wang, D. Vision-based anti-UAV detection and tracking. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25323–25334. [Google Scholar] [CrossRef]
  34. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974. [Google Scholar]
  35. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  36. Wang, C.Y.; Yeh, I.H.; Mark Liao, H.Y. YOLOv9: Learning what you want to learn using programmable gradient information. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 1–21. [Google Scholar]
  37. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011. [Google Scholar]
Figure 1. Structure of YOLOv8.
Figure 2. Structure of IASL-YOLO.
Figure 3. C2f-Faster-EMA module.
Figure 4. EMA module.
Figure 5. Partial examples from the Anti-UAV dataset include: (a,b) UAV images with a high pixel ratio; (c,d) UAV images with a low pixel ratio; (e,f) daytime images with complex backgrounds; (g,h) nighttime images with complex backgrounds.
Figure 6. Inference results of YOLOv8s and the proposed IASL-YOLO model on the Anti-UAV dataset for UAV targets with small and large pixel proportions. (a) Inference results of YOLOv8s. (b) Inference results of the proposed IASL-YOLO model.
Figure 7. Inference results of YOLOv8s and the proposed IASL-YOLO model on the Anti-UAV dataset for UAV targets in complex daytime and nighttime backgrounds. (a) Inference results of YOLOv8s. (b) Inference results of the proposed IASL-YOLO model.
Figure 8. Inference results of YOLOv8s and the proposed IASL-YOLO model on the Anti-UAV dataset for multi-UAV targets. (a) Inference results of YOLOv8s. (b) Inference results of the proposed IASL-YOLO model.
Figure 9. Comparison of the number of channels in each convolutional layer before and after model pruning, with the speed-up ratio set to 1.5.
Figure 10. Training and validation curves of SIoU Loss.
Table 1. Key hyperparameter settings for model training.

| Parameters | Setup |
| --- | --- |
| Epochs | 300 |
| Batch size | 24 |
| Image size | 640 × 640 |
| Optimizer | SGD |
| Momentum | 0.937 |
| Initial learning rate | 0.01 |
| Final learning rate | 0.0001 |
| Speed-up | 1.5 |
Table 2. Comparison of detection results after introducing different improvement strategies (bolded data in table indicates best results).

| Model | Precision/% | Recall/% | mAP50/% | mAP50-95/% | Model Size/MB | Parameters/10^6 | GFLOPs | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Y (baseline) | 93.9 | 81.4 | 88.5 | 58.1 | 21.5 | 11.1 | 28.4 | 59.8 |
| Y+IA | 95.9 | 87.5 | 92.2 | 61.4 | 13.6 | 6.8 | 30.4 | 42 |
| Y+S | 94.2 | 81.5 | 88.7 | 58 | 21.5 | 11.1 | 28.4 | 59.8 |
| Y+L | 95 | 81.4 | 88.8 | 59.7 | 9.6 | 4.9 | 18.8 | 73.1 |
| Y+IA+S | 96 | 87.7 | 92.3 | 61.3 | 13.6 | 6.8 | 30.4 | 42 |
| Y+IA+S+L (Ours) | 96.8 | 88.2 | 92.4 | 61.9 | 5.3 | 2.4 | 19.9 | 51.1 |
Table 3. Experimental results comparison between the proposed model and the baseline model.

| Model | Precision/% | Recall/% | mAP50/% | mAP50-95/% | Model Size/MB | Parameters/10^6 | GFLOPs | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv8s (baseline) | 93.9 | 81.4 | 88.5 | 58.1 | 21.5 | 11.1 | 28.4 | 59.8 |
| IASL-YOLO (Ours) | 96.8 | 88.2 | 92.4 | 61.9 | 5.3 | 2.4 | 19.9 | 51.1 |
Table 4. The comparative results between the different YOLOv8 models and the proposed model (the bolded data in the table indicate the best results).

| Model | Precision/% | Recall/% | mAP50/% | mAP50-95/% | Model Size/MB | Parameters/10^6 | GFLOPs | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv8n | 94.1 | 78.2 | 85.6 | 54.1 | 6.0 | 3.2 | 8.7 | 146.6 |
| YOLOv8s | 93.9 | 81.4 | 88.5 | 58.1 | 21.5 | 11.1 | 28.4 | 59.8 |
| YOLOv8m | 94.4 | 82.3 | 89 | 58.3 | 44.6 | 25.9 | 78.9 | 26.2 |
| YOLOv8l | 93.8 | 81.1 | 88.4 | 58.4 | 75.6 | 43.7 | 165.2 | 17 |
| YOLOv8x | 93.7 | 81 | 88.7 | 58.8 | 118 | 68.2 | 257.8 | 10.8 |
| IASL-YOLO (Ours) | 96.8 | 88.2 | 92.4 | 61.9 | 5.3 | 2.4 | 19.9 | 51.1 |
Table 5. Comparison results of the pruned models under different speed-up ratios (the bolded data in the table indicate the best results).

| Speed-Up | Precision/% | Recall/% | mAP50/% | mAP50-95/% | Model Size/MB | Parameters/10^6 | GFLOPs | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.0 | 96 | 87.7 | 92.3 | 61.3 | 13.6 | 6.8 | 30.4 | 42 |
| 1.5 | 96.8 | 88.2 | 92.4 | 61.9 | 5.3 | 2.4 | 19.9 | 51.1 |
| 2.0 | 94.6 | 88.3 | 92.3 | 61.6 | 4.2 | 1.8 | 15.1 | 60.1 |
| 2.5 | 96.4 | 86.4 | 91.5 | 60.2 | 3.5 | 1.5 | 12.1 | 64.6 |
Table 6. Experimental results comparison between the proposed model and existing methods (the bolded data in the table indicate the best results).

| Model | Precision/% | Recall/% | mAP50/% | mAP50-95/% | Model Size/MB | Parameters/10^6 | GFLOPs | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RT-DETR-L | 91.3 | 83.2 | 89.5 | 54.7 | 63.1 | 32 | 103.4 | 11.9 |
| YOLOv7-tiny | 93.3 | 68.6 | 77.6 | 50.8 | 12.3 | 6.0 | 13.2 | 117.6 |
| YOLOv9s | 94.7 | 82.5 | 89.7 | 60.3 | 19.4 | 9.7 | 39.6 | 37 |
| YOLOv10s | 92.1 | 78.9 | 87.1 | 55.9 | 15.8 | 7.2 | 21.4 | 61.7 |
| YOLOv11s | 94.4 | 80.7 | 88.2 | 57.4 | 18.4 | 9.4 | 21.3 | 55.3 |
| IASL-YOLO (Ours) | 96.8 | 88.2 | 92.4 | 61.9 | 5.3 | 2.4 | 19.9 | 51.1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
