Comparison and ablation experiments were conducted to comprehensively evaluate the performance of DT-YOLO. In the comparison experiments, DT-YOLO was compared with current mainstream object detection models, including Fast R-CNN, YOLOv3, YOLOv4, YOLOv5, YOLOv6, YOLOv7, YOLOv8, SSD, and RetinaNet. In the ablation study, components were removed or added individually to quantify the contribution of each.
5.4.1. The Comparison Experiments
To evaluate the performance of each model objectively, we utilized AP50, AP75, APs, APm, APl, mAP, and GFLOPs as the evaluation metrics, where the following applies: AP50 represents the AP value with the IoU threshold set to 0.5; AP75 represents the AP value with the IoU threshold set to 0.75; mAP represents the AP value averaged over multiple IoU thresholds; and APs, APm, and APl represent the AP values for small, medium, and large objects, respectively.
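These are the standard COCO evaluation metrics. Assuming the ADD-dataset annotations and DT-YOLO detections are exported in COCO JSON format (the file names below are hypothetical placeholders), they can be reproduced with pycocotools:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Hypothetical file names: COCO-format ground truth and detection results.
coco_gt = COCO("add_dataset_val.json")
coco_dt = coco_gt.loadRes("dt_yolo_detections.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()

# evaluator.stats holds, in order:
# mAP@[0.5:0.95], AP50, AP75, APs, APm, APl, ...
mAP, ap50, ap75, ap_s, ap_m, ap_l = evaluator.stats[:6]
```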
Table 6 reports the mAP values of the different models, and Table 7 reports the values of the various evaluation metrics for the different object detection models.
As shown in Table 6, DT-YOLO performs excellently across all categories in airport apron scenes, achieving the highest AP values among the compared object detection algorithms. In detail, the AP values for the “Engine” and “Nose” categories reached 94.9 and 94.6, respectively, making them the best-performing categories, which indicates that DT-YOLO can effectively and accurately detect these objects. The “Tail” category follows closely behind with an AP value of 88.4, demonstrating performance that is still strong, though slightly lower than that of the “Engine” and “Nose” categories. However, the detection performance for the “Person”, “LandingGear”, and “WingTip” categories is relatively weaker, with AP values of 75.8, 74.8, and 74.9, respectively.
These results can be explained by object scale and structure. The “Engine” and “Nose” categories typically belong to medium- and large-scale objects, which occupy larger regions of the images, making them more prominent and clearer. Additionally, their relatively fixed structure makes them easier for object detection algorithms to detect. The precise detection of these objects by DT-YOLO can be attributed to its powerful feature extraction and object localization capabilities, which enable DT-YOLO to handle the localization and detection of large objects in the complex environments of airport aprons. In contrast, although the “Tail” category also belongs to medium- and large-scale objects and is generally clear in the images, its appearance varies significantly across different orientations; in some extreme viewpoints, it even degenerates into a “straight line”, which makes accurate detection of “Tail” targets more difficult. Nonetheless, detection performance for this category remains strong, highlighting the adaptability and robustness of DT-YOLO in dealing with object appearance variations. On the other hand, the “Person”, “LandingGear”, and “WingTip” categories tend to be smaller in size and are significantly affected by environmental factors such as shooting angle, distance, and lighting conditions. These factors can cause small objects to appear blurred in the image, making it difficult for the algorithm to extract and localize clear features. Furthermore, small objects are often occluded by complex backgrounds or other objects, further reducing detection accuracy.
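The small/medium/large split referenced here follows, under the common COCO convention (which we assume the APs/APm/APl metrics adopt), a simple area threshold on the ground-truth box; a minimal sketch:

```python
def size_bucket(box_w: float, box_h: float) -> str:
    """COCO convention (assumed here for APs/APm/APl):
    small < 32^2 px^2, medium < 96^2 px^2, large otherwise."""
    area = box_w * box_h
    if area < 32 ** 2:
        return "small"   # e.g., distant "Person" or "LandingGear" instances
    if area < 96 ** 2:
        return "medium"
    return "large"       # e.g., "Engine" and "Nose" at typical apron distances
```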
Table 7 presents the performance of DT-YOLO compared to other mainstream object detection algorithms on the ADD-dataset. From an overall perspective, DT-YOLO outperforms the other object detection algorithms on all evaluation metrics, demonstrating its superiority in object detection tasks in complex airport apron scenes. Specifically, DT-YOLO achieves 98.4 and 82.4 for AP50 and AP75, respectively, which indicates that DT-YOLO can accurately detect various types of objects in airport apron scenes. Even under the stricter AP75 evaluation criterion, DT-YOLO maintains high detection accuracy, proving its superiority and efficiency.
Furthermore, DT-YOLO also performs well in detecting objects of different sizes, achieving 75.5, 90.8, and 85.4 for APs, APm, and APl, respectively, all of which are superior to the values obtained by the other object detection algorithms. In detail, DT-YOLO excels in detecting medium and large objects, maintaining high accuracy even in complex backgrounds and under varying camera angles. In contrast, its performance in small-object detection is slightly weaker, consistent with the preceding per-category analysis; small objects are more prone to detection difficulties caused by blurriness or occlusion. Nevertheless, DT-YOLO still outperforms the other object detection algorithms in small-object detection.
In terms of computational cost, the introduction of the Transformer self-attention mechanism and deformable convolutions increases computation, leading to a slight rise in GFLOPs to 262.9, roughly 3 to 6 points higher than the other YOLO-based algorithms. However, the increase remains within an acceptable range and still meets the real-time detection demands of airport aprons. Additionally, as shown in the table, some of the latest YOLO models, such as YOLOv11, outperform DT-YOLO on a few specific metrics. However, since these models are new and the literature and research on them are limited, there may be open questions about their performance stability. Therefore, considering all factors, DT-YOLO remains preferable overall. In future work, we will continue to investigate this direction.
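GFLOPs figures such as these are typically estimated by profiling a single forward pass; a minimal sketch using the third-party thop package (the 640-pixel input size is an assumption, and thop reports multiply-accumulate operations, so we double them to obtain FLOPs under the common convention):

```python
import torch
from thop import profile  # pip install thop

def count_gflops(model: torch.nn.Module, img_size: int = 640) -> float:
    """Estimate GFLOPs for one forward pass at the given input size."""
    model.eval()
    dummy = torch.randn(1, 3, img_size, img_size)  # one RGB image
    with torch.no_grad():
        macs, _params = profile(model, inputs=(dummy,), verbose=False)
    return 2 * macs / 1e9  # 2 FLOPs per multiply-accumulate
```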
In summary, despite the relatively lower detection precision for small objects, DT-YOLO demonstrates a notable performance improvement across all categories compared to the other object detection algorithms. This suggests that DT-YOLO not only excels in detecting medium and large objects but also improves the detection of small objects in complex and dynamic airport apron environments, proving its superiority and reliability. To demonstrate the superiority of DT-YOLO more intuitively, we created a histogram of the different object detection algorithms under the various evaluation metrics and provide visualizations of the detection results, as shown in Figure 14, Figure 15, and Figure 16.
These figures visually demonstrate the performance of DT-YOLO across the evaluation metrics and its detection effectiveness. As the figures show, DT-YOLO outperforms the other object detection algorithms on all metrics and can effectively detect key aircraft components and personnel on airport aprons, indicating that DT-YOLO not only improves accuracy but also shortens inference time and improves detection speed relative to the compared models.
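A grouped bar chart of this kind can be produced with matplotlib; a minimal sketch using DT-YOLO’s values reported above (the other models’ rows, elided here, would be taken from Table 7):

```python
import matplotlib.pyplot as plt
import numpy as np

metrics = ["AP50", "AP75", "APs", "APm", "APl"]
results = {
    "DT-YOLO": [98.4, 82.4, 75.5, 90.8, 85.4],  # values reported in the text
    # "YOLOv8": [...],  # remaining rows taken from Table 7
}

x = np.arange(len(metrics))
width = 0.8 / max(len(results), 1)  # bar width per model
for i, (name, vals) in enumerate(results.items()):
    plt.bar(x + i * width, vals, width, label=name)

plt.xticks(x + width * (len(results) - 1) / 2, metrics)
plt.ylabel("AP (%)")
plt.legend()
plt.tight_layout()
plt.savefig("metric_comparison.png", dpi=200)
```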
5.4.2. The Ablation Experiments
In order to evaluate the contribution of individual components to overall model performance more precisely, we conducted a series of ablation experiments by systematically removing or adding different components of the model. Here, we utilized AP50, AP75, mAP, and GFLOPs as the evaluation metrics. The experiments aimed to quantify the contribution of each component and thereby guide further optimization of the model; seven improvement configurations were designed. In detail, Improvements 1 to 4 each add a single component: the Transformer self-attention mechanism, the Dropout layer, DCN (deformable convolution network), and the SLoss loss function, respectively. Improvement 5 integrates the D-CTR module, which incorporates both the Transformer self-attention mechanism and the Dropout layer, aiming to enhance the feature representation capabilities of the model. Building on the D-CTR module, Improvements 6 and 7 additionally include deformable convolution and the new SLoss loss function, respectively. Each improvement was evaluated through corresponding experiments, as sketched by the component-toggling harness below. The results are shown in Table 8, where “✓” marks the methods included in each configuration.
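A hypothetical sketch of such a harness follows; the configuration names mirror Table 8, but the flag semantics and both callables are illustrative assumptions, not the authors’ code (Improvements 6 and 7 are read as adding one component each on top of the D-CTR base, per the description above):

```python
# Illustrative ablation harness; not the authors' implementation.
CONFIGS = {
    "baseline":      dict(transformer=False, dropout=False, dcn=False, sloss=False),
    "improvement_1": dict(transformer=True,  dropout=False, dcn=False, sloss=False),
    "improvement_2": dict(transformer=False, dropout=True,  dcn=False, sloss=False),
    "improvement_3": dict(transformer=False, dropout=False, dcn=True,  sloss=False),
    "improvement_4": dict(transformer=False, dropout=False, dcn=False, sloss=True),
    "improvement_5": dict(transformer=True,  dropout=True,  dcn=False, sloss=False),  # D-CTR
    "improvement_6": dict(transformer=True,  dropout=True,  dcn=True,  sloss=False),  # D-CTR + DCN
    "improvement_7": dict(transformer=True,  dropout=True,  dcn=False, sloss=True),   # D-CTR + SLoss
}

def run_ablation(build_model, train_and_eval):
    """build_model(**flags) -> model; train_and_eval(model) -> dict with
    AP50/AP75/mAP/GFLOPs. Both callables are assumed to be provided."""
    return {name: train_and_eval(build_model(**flags))
            for name, flags in CONFIGS.items()}
```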
The experimental results in Table 8 show that adding each component individually (Improvements 1 to 4) does not significantly improve model performance; in some cases, it even degrades it, which indicates that introducing a single component in isolation may lead to overfitting or the loss of key feature information, negatively affecting overall performance. For example, the Transformer self-attention mechanism in Improvement 1 and the Dropout layer in Improvement 2 caused decreases of 0.6 and 0.8 in the mAP metric, respectively, and also reduced the AP50 and AP75 metrics. These drops occur because the Transformer self-attention mechanism, with its strong feature extraction and representation capabilities, can overfit when used alone, while the standalone Dropout layer, although intended to improve generalization, randomly drops neurons and thereby discards some important feature information, decreasing accuracy.
In contrast, Improvement 5 integrates the Transformer self-attention mechanism and the Dropout layer into the D-CTR module, successfully combining the advantages of both components and yielding a significant improvement in model performance. Compared to the baseline model, Improvement 5 achieved a 1.4 increase in mAP, along with improvements in the AP50 and AP75 metrics. These results indicate that combining the Transformer self-attention mechanism with the Dropout layer effectively balances the model’s feature extraction and generalization capabilities, enhancing its overall performance. Improvements 6 and 7 added DCN and the new SLoss loss function, respectively, on top of the D-CTR module, further improving the feature extraction capability of the model. Compared to the baseline model, Improvements 6 and 7 increased mAP by 1.6 and 1.8, respectively, further confirming the positive impact of these components on model performance.
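The internal structure of D-CTR is not reproduced here; the following is only a generic sketch of the underlying idea, a self-attention block whose residual branches are regularized with Dropout (dimensions and layout are assumptions, not the exact D-CTR design):

```python
import torch
import torch.nn as nn

class AttentionDropoutBlock(nn.Module):
    """Generic Transformer block with Dropout on both residual branches;
    a sketch of the combination behind Improvement 5, not the exact D-CTR."""

    def __init__(self, dim: int = 256, num_heads: int = 8, p_drop: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.drop = nn.Dropout(p_drop)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Dropout(p_drop),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim), e.g. a flattened H*W x C backbone feature map
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + self.drop(attn_out))    # Dropout on the attention branch
        x = self.norm2(x + self.drop(self.mlp(x)))  # Dropout on the MLP branch
        return x
```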
In terms of detection speed, although the addition of new components increased computational complexity, resulting in a slight increase in GFLOPs, the overhead remained within an acceptable range and met the real-time requirements of airport apron detection tasks. These results demonstrate the effectiveness and applicability of DT-YOLO in dynamic and complex airport apron scenarios.