Figure 1. Original network.
Figure 2. Modified network structure.
Figure 3. Stride-free convolution module.
Figure 4. Convolutional triplet attention module.
Figure 5. Schematic diagram of the contextual transformer module.
Figure 6. Visual representation of IoU.
Figure 7. Visual representation of MPDIoU.
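The following is a minimal sketch of the two quantities illustrated in Figures 6 and 7, assuming the standard definitions: IoU for axis-aligned boxes, and MPDIoU as IoU penalized by the squared distances between the top-left and bottom-right corners of the predicted and ground-truth boxes, normalized by the squared image dimensions. The function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def iou_xyxy(box_a, box_b):
    """Plain IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def mpdiou_xyxy(pred, gt, img_w, img_h):
    """MPDIoU (assumed definition): IoU minus the squared top-left and
    bottom-right corner distances, each normalized by w^2 + h^2 of the image."""
    d1_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left corners
    d2_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right corners
    norm = img_w ** 2 + img_h ** 2
    return iou_xyxy(pred, gt) - d1_sq / norm - d2_sq / norm

# Example: a predicted box slightly shifted from its ground truth in a 640x640 image.
pred, gt = (100, 100, 200, 220), (110, 105, 210, 215)
print(iou_xyxy(pred, gt), mpdiou_xyxy(pred, gt, 640, 640))
```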
Figure 8. Network’s output results.
Figure 9. Schematic diagram of the precision and recall calculation formulas.
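As a companion to Figure 9, the snippet below expresses the precision and recall formulas (precision = TP / (TP + FP), recall = TP / (TP + FN)) as a small self-contained Python function; the counts in the example are made up for illustration.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Example: 80 correct detections, 20 false alarms, 10 missed objects.
print(precision_recall(tp=80, fp=20, fn=10))  # -> (0.8, 0.888...)
```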
Figure 10. Data augmentation methods.
Figure 11. Flowchart of model training.
Figure 12. Label statistics for the training sets. (a,d) Instance count for each category in the augmented DIOR and DOTA version 1.0 training sets, respectively. Different colors represent different categories, and the height of each bar signifies the total count of instances. (b,e) Box plots representing the length and width distributions of instances for every category. The blue box plots represent the width distribution, while the red box plots represent the length distribution. Outliers are indicated by circles outside the box plots, and the category order is consistent with (a) or (d). (c,f) Histograms of the length and width distributions for all instances in the DIOR and DOTA version 1.0 training sets, respectively. The height of each bar represents the proportion of instances within a specific size range relative to the overall instance count in the dataset.
Figure 13. Detection results on the DIOR dataset.
Figure 14. Detection results on the DOTA version 1.0 dataset.
Figure 15. Comparison of mAP during the training process of the improved model on various datasets. In each sub-figure, the horizontal axis shows the training epoch and the vertical axis shows the corresponding mAP50 or mAP50:95 value. The yellow curve represents the improved model, and the blue curve represents the unimproved model. (a,b) mAP50 and mAP50:95 curves of the models before and after improvement on the DIOR dataset. (c,d) mAP50 and mAP50:95 curves of the models before and after improvement on the DOTA version 1.0 dataset.
Figure 16. Precision–recall (PR) curves for different object categories on the two datasets. In each sub-figure, the recall value is plotted on the x-axis and the precision value on the y-axis. Different-colored curves correspond to different object categories, and the blue curve indicates the average AP across all categories. The numerical value following each category name in the legend is the AP50 for that category. (a,b) PR curves for the models before and after improvement on the DIOR dataset, respectively. (c,d) PR curves for the models before and after improvement on the DOTA version 1.0 dataset, respectively.
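The per-category AP50 values in the legends of Figure 16 correspond to the area under each precision–recall curve. A minimal sketch of that computation, assuming the common all-points interpolation scheme and using illustrative input arrays, is given below.

```python
import numpy as np

def average_precision(precisions, recalls):
    """Area under the PR curve with monotone (all-points) interpolation,
    the usual computation behind per-category AP50 values."""
    # Append sentinel points so the curve spans recall 0..1.
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([1.0], precisions, [0.0]))
    # Make precision monotonically non-increasing as recall grows.
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Integrate precision over the recall steps.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Toy PR points for one category (recall sorted ascending).
recalls = np.array([0.1, 0.4, 0.7, 0.9])
precisions = np.array([0.95, 0.90, 0.80, 0.60])
print(average_precision(precisions, recalls))
```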
Figure 17. Comparison of AP50 for various categories. (a) Comparison of AP50 for various categories on DIOR; (b) comparison of AP50 for various categories on DOTA version 1.0.
Figure 18. (a,b) Confusion matrices for the models before and after improvement on DIOR. (c,d) Confusion matrices for the models before and after improvement on DOTA version 1.0.
Figure 19. Comparison of the detection results on DIOR. (a–e) Local area comparisons of the model detection results in five different scenarios.
Figure 20. Comparison of the detection results on DOTA version 1.0. (a–e) Local area comparisons of the model detection results in five different scenarios.
Figure 21. Precision–recall curves for each category on DIOR.
Figure 22. Precision–recall curves for each category on DOTA version 1.0.
Figure 23. Comparison of detection outcomes among various models on DIOR.
Figure 24. Comparison of detection outcomes among various models on DOTA version 1.0.
Figure 25. Heatmaps generated by GradCAM on DIOR. Images (a–c) correspond to three different scenes from the DIOR validation set.
Figure 26. Heatmaps generated by GradCAM on DOTA version 1.0. Images (a–c) correspond to three distinct scenes from the DOTA version 1.0 validation set.
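The heatmaps in Figures 25 and 26 are produced with Grad-CAM. The sketch below is a from-scratch, minimal Grad-CAM for a PyTorch model with a classification-style output; it is illustrative only and does not reproduce the authors' detector-specific setup (model, target_layer, and class_idx are placeholders).

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    """Minimal Grad-CAM: weight the target layer's activations by the
    spatially averaged gradients of the chosen class score."""
    feats, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    try:
        model.eval()
        score = model(image)[0, class_idx]   # assumes an [N, num_classes] output
        model.zero_grad()
        score.backward()
        weights = grads["a"].mean(dim=(2, 3), keepdim=True)   # GAP over gradients
        cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-9)
        return cam[0, 0]                     # H x W heat map in [0, 1]
    finally:
        h1.remove()
        h2.remove()
```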
Table 1. Nomenclature.

| Abbreviation | Meaning |
|---|---|
| SFC | Stride-free convolution module |
| CTA | Convolutional triplet attention module |
| CoT | Contextual transformer module |
| IoU | Intersection over union |
| MPDIoU | Minimum point distance-based IoU |
Table 2. Model training parameters.
| Parameter | Value |
|---|---|
| Batch size | 32 |
| Image size | 640 × 640 |
| Optimizer | Adam |
| Weight decay | 0.0005 |
| Learning rate | 0.01 |
| Momentum | 0.937 |
| Epochs | 180 |
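As a hedged illustration of how the values in Table 2 might be wired into training code, the snippet below builds a PyTorch Adam optimizer from them; in the YOLO-style reference scripts the "momentum" hyperparameter is reused as Adam's beta1, which is the mapping assumed here. The placeholder module stands in for the actual detector.

```python
import torch

# Hyperparameters from Table 2 (illustrative mapping, not the authors' exact code).
hyp = {"lr0": 0.01, "momentum": 0.937, "weight_decay": 0.0005,
       "batch_size": 32, "img_size": 640, "epochs": 180}

model = torch.nn.Conv2d(3, 16, 3)  # placeholder module standing in for the detector

# In YOLO-style training scripts, "momentum" becomes Adam's beta1.
optimizer = torch.optim.Adam(model.parameters(), lr=hyp["lr0"],
                             betas=(hyp["momentum"], 0.999),
                             weight_decay=hyp["weight_decay"])
```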
Table 3. Correspondence table for object category names and their numbers in the DIOR dataset.
| R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 |
|---|---|---|---|---|---|---|---|---|---|
| airplane | airport | baseball field | basketball court | bridge | chimney | dam | Expressway-Service-area | Expressway toll station | golf field |
| R11 | R12 | R13 | R14 | R15 | R16 | R17 | R18 | R19 | R20 |
| ground track field | harbor | overpass | ship | stadium | storage tank | tennis court | train station | vehicle | windmill |
Table 4. Correspondence table for object category names and their numbers in the DOTA version 1.0 dataset.
| A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 |
|---|---|---|---|---|---|---|---|
| small vehicle | large vehicle | plane | storage tank | ship | harbor | ground track field | soccer ball field |
| A9 | A10 | A11 | A12 | A13 | A14 | A15 | |
| tennis court | swimming pool | baseball diamond | roundabout | basketball court | bridge | helicopter | |
Table 5. AP50 (%) of each category for DIOR.
| Method | mAP | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | R11 | R12 | R13 | R14 | R15 | R16 | R17 | R18 | R19 | R20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv7 | 87.3 | 98.4 | 93.7 | 95.9 | 92.3 | 58.9 | 93.3 | 77.8 | 96.0 | 90.1 | 89.6 | 91.5 | 76.7 | 73.1 | 95.2 | 95.7 | 90.7 | 97.0 | 72.7 | 72.0 | 94.4 |
| Ours | 91.2 | 99.1 | 96.3 | 96.8 | 94.6 | 66.3 | 97.4 | 87.2 | 96.8 | 94.2 | 91.6 | 94.7 | 83.8 | 81.9 | 95.1 | 97.5 | 93.3 | 98.1 | 82.3 | 78.9 | 97.8 |
Table 6. AP50 (%) of each category for DOTA version 1.0.
| Method | mAP | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 | A11 | A12 | A13 | A14 | A15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv7 | 75.5 | 74.1 | 87.5 | 94.0 | 79.6 | 89.2 | 87.1 | 73.3 | 70.6 | 95.3 | 65.1 | 80.7 | 63.9 | 73.1 | 50.4 | 49.2 |
| Ours | 80.8 | 77.5 | 90.4 | 94.7 | 80.6 | 94.6 | 88.2 | 78.7 | 76.9 | 95.6 | 72.1 | 83.2 | 69.4 | 79.8 | 59.6 | 70.4 |
Table 7. Analysis of significance before and after model improvements based on data from Table 5 and Table 6.
| Dataset | t-Statistic | p-Value |
|---|---|---|
| DIOR | −5.629211 | 0.000020 |
| DOTA version 1.0 | −3.929775 | 0.001511 |
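The t-statistics and p-values in Table 7 are consistent with a two-sided paired t-test over the per-category AP50 values in Tables 5 and 6. The sketch below runs that test for the DIOR row using the values copied from Table 5 (expected result t ≈ −5.63); applying the same call to the Table 6 columns yields the DOTA version 1.0 row.

```python
from scipy import stats

# Per-category AP50 values on DIOR, copied from Table 5.
yolov7 = [98.4, 93.7, 95.9, 92.3, 58.9, 93.3, 77.8, 96.0, 90.1, 89.6,
          91.5, 76.7, 73.1, 95.2, 95.7, 90.7, 97.0, 72.7, 72.0, 94.4]
ours   = [99.1, 96.3, 96.8, 94.6, 66.3, 97.4, 87.2, 96.8, 94.2, 91.6,
          94.7, 83.8, 81.9, 95.1, 97.5, 93.3, 98.1, 82.3, 78.9, 97.8]

# Two-sided paired t-test; a negative t-statistic means the baseline scores
# are lower than the improved model's scores on the paired categories.
t_stat, p_value = stats.ttest_rel(yolov7, ours)
print(f"t = {t_stat:.6f}, p = {p_value:.6f}")
```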
Table 8. Ablation experiments on two datasets, with modules added cumulatively.
| Dataset | Baseline | SFC | CTA | CoT | MPD | SKC | AUN | P (%) | R (%) | mAP50 (%) | mAP50:95 (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DIOR | √ | | | | | | | 85.7 | 82.9 | 87.3 | 62.3 |
| | √ | √ | | | | | | 86.7 | 83.5 | 88.0 | 63.5 |
| | √ | √ | √ | | | | | 87.2 | 84.0 | 88.5 | 64.1 |
| | √ | √ | √ | √ | | | | 88.1 | 84.4 | 89.0 | 65.3 |
| | √ | √ | √ | √ | √ | | | 88.8 | 85.0 | 89.5 | 66.2 |
| | √ | √ | √ | √ | √ | √ | | 90.1 | 85.8 | 90.5 | 67.7 |
| | √ | √ | √ | √ | √ | √ | √ | 91.1 | 86.4 | 91.2 | 68.9 |
| DOTA version 1.0 | √ | | | | | | | 77.9 | 71.7 | 75.5 | 51.1 |
| | √ | √ | | | | | | 78.9 | 72.7 | 76.5 | 51.8 |
| | √ | √ | √ | | | | | 79.5 | 73.6 | 77.2 | 52.4 |
| | √ | √ | √ | √ | | | | 80.2 | 74.5 | 77.9 | 53.0 |
| | √ | √ | √ | √ | √ | | | 80.9 | 75.4 | 78.6 | 53.5 |
| | √ | √ | √ | √ | √ | √ | | 82.1 | 76.6 | 79.8 | 54.5 |
| | √ | √ | √ | √ | √ | √ | √ | 83.1 | 77.6 | 80.8 | 55.3 |
Table 9. Ablation experiments on two datasets, with each module added to the baseline individually.
| Dataset | Module | P (%) | R (%) | mAP50 (%) | mAP50:95 (%) |
|---|---|---|---|---|---|
| DIOR | Baseline | 85.7 | 82.9 | 87.3 | 62.3 |
| | SFC | 86.7 | 83.5 | 88.0 | 63.5 |
| | CTA | 86.1 | 83.3 | 87.7 | 62.8 |
| | CoT | 86.5 | 83.1 | 87.5 | 63.4 |
| | MPD | 86.2 | 83.3 | 87.6 | 60.1 |
| | SKC | 86.9 | 83.5 | 88.1 | 63.6 |
| | AUN | 86.5 | 83.4 | 88.0 | 63.3 |
| DOTA version 1.0 | Baseline | 77.9 | 71.7 | 75.5 | 51.1 |
| | SFC | 78.9 | 72.5 | 76.4 | 51.8 |
| | CTA | 78.5 | 72.6 | 76.2 | 51.3 |
| | CoT | 78.4 | 72.2 | 76.0 | 51.7 |
| | MPD | 78.3 | 72.3 | 76.2 | 51.3 |
| | SKC | 79.1 | 72.6 | 76.7 | 52.0 |
| | AUN | 78.8 | 72.4 | 76.5 | 51.8 |
Table 10. Comparative experiments on DIOR.
| Method | Params (M) | GFLOPs | FPS | P (%) | R (%) | mAP50 (%) | mAP50:95 (%) |
|---|---|---|---|---|---|---|---|
| YOLOv5l [18] | 45.6 | 81.1 | 36 | 91.2 | 84.2 | 88.6 | 65.3 |
| YOLOv6 [19] | 52.3 | 88.9 | 50 | 84.6 | 81.7 | 86.0 | 60.5 |
| YOLOv7 [20] | 62.6 | 99.3 | 57 | 85.7 | 82.9 | 87.3 | 62.3 |
| YOLOv8 [21] | 67.5 | 99.0 | 61 | 89.9 | 83.2 | 88.4 | 67.6 |
| YOLOv9 [22] | 61.8 | 96.2 | 59 | 89.3 | 82.7 | 88.2 | 65.3 |
| CenterNet [57] | 35.8 | 52.3 | 41 | 75.1 | 65.7 | 69.8 | 43.4 |
| DPAFPN [58] | 31.1 | 39.7 | 37 | 74.4 | 66.2 | 68.9 | 43.1 |
| RetinaNet [59] | 46.7 | 60.9 | 46 | 71.9 | 63.5 | 66.7 | 41.8 |
| Ours | 63.2 | 99.8 | 55 | 91.1 | 86.4 | 91.2 | 68.9 |
Table 11. AP50 (%) comparative experiments for each category on DIOR with various modules.
| Method | mAP | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | R11 | R12 | R13 | R14 | R15 | R16 | R17 | R18 | R19 | R20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv5l [18] | 88.6 | 98.4 | 93.4 | 96.3 | 92.4 | 64.2 | 95.7 | 84.8 | 96.8 | 93.9 | 87.1 | 90.2 | 75.6 | 77.0 | 95.1 | 94.8 | 91.1 | 96.7 | 76.1 | 75.5 | 97.8 |
| YOLOv6 [19] | 86.0 | 98.0 | 93.3 | 95.3 | 91.7 | 54.7 | 94.2 | 75.6 | 95.9 | 85.3 | 89.3 | 90.5 | 78.5 | 72.9 | 94.3 | 95.7 | 90.1 | 95.3 | 73.3 | 64.4 | 92.5 |
| YOLOv7 [20] | 87.3 | 98.4 | 93.7 | 95.9 | 92.3 | 58.9 | 93.3 | 77.8 | 96.0 | 90.1 | 89.6 | 91.5 | 76.7 | 73.1 | 95.2 | 95.7 | 90.7 | 97.0 | 72.7 | 72.0 | 94.4 |
| YOLOv8 [21] | 88.4 | 97.9 | 93.4 | 96.9 | 93.0 | 61.3 | 95.0 | 79.7 | 97.5 | 93.5 | 88.4 | 91.6 | 76.4 | 75.1 | 95.5 | 96.8 | 90.6 | 96.9 | 78.4 | 72.4 | 97.4 |
| YOLOv9 [22] | 88.2 | 98.3 | 91.2 | 96.8 | 94.3 | 62.7 | 95.3 | 82.3 | 97.0 | 94.0 | 87.9 | 91.3 | 77.3 | 74.4 | 95.0 | 96.5 | 90.5 | 97.9 | 73.0 | 70.8 | 97.1 |
| CenterNet [57] | 69.8 | 94.2 | 72.4 | 89.3 | 82.6 | 24.5 | 83.4 | 50.7 | 62.6 | 50.2 | 79.3 | 75.6 | 60.3 | 55.3 | 92.1 | 83.2 | 83.3 | 90.9 | 39.9 | 54.0 | 72.9 |
| DPAFPN [58] | 68.9 | 95.3 | 50.2 | 92.5 | 87.4 | 29.7 | 86.3 | 48.2 | 68.5 | 68.4 | 60.2 | 81.1 | 36.6 | 49.0 | 86.8 | 90.5 | 83.2 | 93.1 | 28.8 | 54.0 | 88.3 |
| RetinaNet [59] | 66.7 | 94.4 | 47.4 | 92.0 | 87.4 | 27.9 | 85.5 | 44.0 | 59.9 | 67.1 | 60.2 | 79.5 | 30.8 | 43.7 | 86.4 | 87.6 | 82.6 | 92.4 | 26.4 | 52.7 | 85.9 |
| Ours | 91.2 | 99.1 | 96.3 | 96.8 | 94.6 | 66.3 | 97.4 | 87.2 | 96.8 | 94.2 | 91.6 | 94.7 | 83.8 | 81.9 | 95.1 | 97.5 | 93.3 | 98.1 | 82.3 | 78.9 | 97.8 |
Table 12. Comparison experiments on DOTA version 1.0.
| Method | Params (M) | GFLOPs | FPS | P (%) | R (%) | mAP50 (%) | mAP50:95 (%) |
|---|---|---|---|---|---|---|---|
| YOLOv5l [18] | 45.5 | 81.3 | 31 | 81.5 | 69.1 | 73.9 | 48.8 |
| YOLOv6 [19] | 52.3 | 88.8 | 46 | 79.2 | 63.1 | 74.9 | 49.9 |
| YOLOv7 [20] | 62.6 | 99.2 | 52 | 77.9 | 71.7 | 75.5 | 51.1 |
| YOLOv8 [21] | 67.5 | 99.0 | 56 | 78.8 | 67.1 | 71.6 | 49.2 |
| YOLOv9 [22] | 61.8 | 96.2 | 54 | 79.6 | 72.0 | 76.5 | 52.4 |
| CenterNet [57] | 35.7 | 52.3 | 36 | 76.4 | 71.2 | 73.6 | 50.1 |
| DPAFPN [58] | 31.1 | 39.9 | 33 | 73.6 | 67.1 | 68.2 | 44.0 |
| RetinaNet [59] | 46.7 | 61.1 | 39 | 67.9 | 61.7 | 62.7 | 42.1 |
| Ours | 63.2 | 99.7 | 51 | 83.1 | 77.6 | 80.8 | 55.3 |
Table 13. AP50 (%) comparison experiments for each category on DOTA version 1.0 with various modules.
| Method | mAP | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 | A11 | A12 | A13 | A14 | A15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv5l [18] | 73.9 | 65.8 | 85.6 | 93.8 | 75.1 | 88.3 | 85.6 | 68.6 | 59.4 | 94.4 | 60.4 | 80.6 | 63.5 | 71.1 | 52.0 | 64.3 |
| YOLOv6 [19] | 74.9 | 68.9 | 87.8 | 93.9 | 78.5 | 88.4 | 86.3 | 72.6 | 60.6 | 94.9 | 65.8 | 80.6 | 61.5 | 71.9 | 49.1 | 62.2 |
| YOLOv7 [20] | 75.5 | 74.1 | 87.5 | 94.0 | 79.6 | 89.2 | 87.1 | 73.3 | 70.6 | 95.3 | 65.1 | 80.7 | 63.9 | 73.1 | 50.4 | 49.2 |
| YOLOv8 [21] | 71.6 | 62.8 | 86.9 | 93.4 | 75.4 | 89.8 | 83.6 | 64.2 | 55.1 | 94.3 | 59.7 | 72.4 | 57.3 | 69.2 | 45.1 | 64.6 |
| YOLOv9 [22] | 76.5 | 70.1 | 88.3 | 94.6 | 80.8 | 90.1 | 87.6 | 71.1 | 67.9 | 95.6 | 66.3 | 79.3 | 64.7 | 75.3 | 54.6 | 61.9 |
| CenterNet [57] | 73.6 | 65.9 | 85.5 | 94.4 | 78.2 | 89.3 | 87.0 | 61.9 | 65.9 | 95.4 | 59.8 | 79.0 | 61.1 | 71.3 | 50.7 | 58.6 |
| DPAFPN [58] | 68.2 | 68.9 | 86.1 | 91.7 | 75.3 | 88.1 | 83.0 | 62.3 | 64.2 | 95.1 | 59.4 | 70.5 | 51.1 | 61.4 | 39.1 | 26.3 |
| RetinaNet [59] | 62.7 | 66.3 | 85.2 | 91.1 | 70.8 | 87.3 | 81.7 | 53.1 | 50.4 | 94.1 | 51.8 | 53.2 | 47.0 | 55.9 | 30.2 | 22.1 |
| Ours | 80.8 | 77.5 | 90.4 | 94.7 | 80.6 | 94.6 | 88.2 | 78.7 | 76.9 | 95.6 | 72.1 | 83.2 | 69.4 | 79.8 | 59.6 | 70.4 |