Author Contributions
Conceptualization, S.W. and X.L.; methodology, X.L.; validation, S.W. and X.L.; formal analysis, S.W. and C.G.; investigation, S.W. and X.L.; data curation, S.W. and X.L.; writing—original draft preparation, S.W.; writing—review and editing, S.W., X.L. and H.G.; supervision, C.G.; project administration, C.G.; funding acquisition, C.G. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Comparison of various methods [2, 5, 6, 7].
Figure 2.
The HRMamba-YOLO architecture (in the neck, the orange arrows represent the fusion of different feature maps, and the arrows in other colors represent the forward transfer of feature maps at different resolutions).
Figure 4.
Description of ES2D (ES2D scans forward vertically and horizontally while skipping patches, keeping the total number of patches unchanged. The efficient visual state space (EVSS) block comprises a convolutional branch for local features and an SSM branch, built on ES2D, for global features; all branches end in a squeeze–excitation block. EVSS blocks are used in the horizontal direction (marked with green lines) and inverted residual blocks in the vertical direction (marked with red lines) to enhance the capture of global representations).
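The skip-scanning idea in the ES2D caption can be illustrated with a minimal sketch. It assumes a skip step of 2 and a small 4 × 4 grid of patch indices; the helper names `skip_partition`, `scan_orders`, and `merge` are hypothetical, and the sketch only shows strided partitioning, the two forward scan orders, and the fact that the patch count is preserved. It is not the authors' implementation, and how scan directions are assigned to sub-grids in the actual model is not specified here.

```python
def skip_partition(grid, step=2):
    """Split an H x W grid of patches into step*step sub-grids by strided skipping."""
    return {
        (m, n): [row[n::step] for row in grid[m::step]]
        for m in range(step)
        for n in range(step)
    }


def scan_orders(sub):
    """Return the forward horizontal (row-major) and forward vertical (column-major) scans."""
    horizontal = [value for row in sub for value in row]
    vertical = [row[j] for j in range(len(sub[0])) for row in sub]
    return horizontal, vertical


def merge(subgrids, height, width, step=2):
    """Place every sub-grid entry back at its original position."""
    out = [[None] * width for _ in range(height)]
    for (m, n), sub in subgrids.items():
        for i, row in enumerate(sub):
            for j, value in enumerate(row):
                out[m + i * step][n + j * step] = value
    return out


# Illustrative 4 x 4 grid whose entries are patch indices 0..15 (assumed size).
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
subs = skip_partition(grid)

# Each sub-grid holds a quarter of the patches; together they cover all 16.
total_patches = sum(len(row) for sub in subs.values() for row in sub)
horizontal, vertical = scan_orders(subs[(0, 0)])
restored = merge(subs, 4, 4)
```

Each sub-grid is a shorter sequence than the full map, which is the source of the efficiency gain in this kind of scan, while merging restores every patch to its original position.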
Figure 5.
The EMM module.
Figure 6.
Comparison of various feature pyramid networks.
Figure 7.
The specific structure of the HRFPN.
Figure 8.
The FMM module.
Figure 9.
The visualization of HRMamba-YOLO.
Figure 10.
The Visdrone2019 dataset.
Figure 11.
Visualization of the feature map with one-eighth resolution (here, (a–f) represent YOLOv5-m, YOLOv6-m, YOLOv7, YOLOv8-m, YOLOX-m, and HRMamba-YOLO, respectively).
Figure 12.
Comparison of the detection results (Visdrone2019; here, (a–f) represent YOLOv5-m, YOLOv6-m, YOLOv7, YOLOv8, YOLOX-m, and HRMamba-YOLO, respectively, and the same applies to the following figures).
Figure 13.
Comparison of the detection results (Dota1.5; here, (a–f) represent YOLOv5-m, YOLOv6-m, YOLOv7, YOLOv8, YOLOX-m, and HRMamba-YOLO, respectively).
Table 1.
Comparison of state-of-the-art methods.
Method | mAP (%) | Latency (ms) |
---|---|---|
YOLOv3-tiny [26] | 15.9 | 3.2 |
YOLOv4-tiny [1] | 27.6 | 20 |
YOLOv5-s [5] | 29.1 | 10.8 |
YOLOv5-m [5] | 33.9 | 22.1 |
YOLOv6-s [6] | 30.2 | 12.8 |
YOLOv6-m [6] | 33.7 | 21.9 |
YOLOv7 [2] | 36.3 | 34.6 |
YOLOv7-tiny [2] | 29.8 | 7.5 |
YOLOv8-s [5] | 34.8 | 12.3 |
YOLOv8-m [5] | 34.5 | 26.9 |
YOLOX-s [7] | 32.7 | 11.5 |
YOLOX-m [7] | 34.2 | 24.6 |
Faster-RCNN [38] | 21.4 | – |
CenterNet [39] | 29.1 | – |
DMNet [40] | 28.7 | – |
SSD [3] | 25.3 | – |
ClusDet [41] | 31.7 | – |
DREN [42] | 30.3 | – |
GLSAN [24] | 32.5 | – |
QueryDet [43] | 28.3 | – |
HRMamba-YOLO (Ours) | 38.9 | 31.1 |
Table 2.
Ablation experiments on Double SPP.
Variant | Number of SPPs | SE | mAP (%) | Latency (ms) |
---|---|---|---|---|
A | 1 | | 34.8 | 12.3 |
B | 1 | ✓ | 34.9 | 12.5 |
C | 2 | | 34.9 | 12.7 |
D | 2 | ✓ | 35.1 | 12.9 |
Table 3.
Ablation experiments on EMM.
Global | Local | Param (M) | GFLOPs | mAP (%) | Latency (ms) |
---|---|---|---|---|---|
| | 15.2 | 32.1 | 35.1 | 12.9 |
✓ | | 17.6 | 35.1 | 35.6 | 17.4 |
✓ | ✓ | 16.9 | 35.2 | 35.3 | 16.4 |
| ✓ | 17.7 | 35.4 | 35.9 | 18.7 |
Table 4.
Experiment to validate the effectiveness of multi-scale input (FMM).
Method | mAP (%) | Latency (ms) |
---|---|---|
w/ Auxiliary Input | 36.4 | 25.5 |
w/o Auxiliary Input | 35.6 | 24.7 |
Table 5.
Comparison experiment of different feature fusion networks.
Method | Param (M) | GFLOPs | mAP (%) | Latency (ms) |
---|---|---|---|---|
PANet | 15.2 | 32.1 | 35.1 | 12.9 |
HRNetv1 style | 19.4 | 42.5 | 35.9 | 19.2 |
HRNetv2 style | 14.9 | 35.5 | 36.2 | 16.2 |
HRFPN | 15.1 | 36.8 | 36.6 | 16.4 |
Table 6.
The step-by-step ablation experiment for HRMamba-YOLO.
Variant | Double SPP | PANet | HRFPN | EMM | FMM | Param (M) | GFLOPs | mAP (%) | Latency (ms) |
---|---|---|---|---|---|---|---|---|---|
A | | ✓ | | | | 11.2 | 28.8 | 34.8 | 12.3 |
B | ✓ | ✓ | | | | 15.2 | 32.1 | 35.1 | 12.9 |
C | ✓ | ✓ | | ✓ | | 17.7 | 35.4 | 35.9 | 18.7 |
D | ✓ | ✓ | | | ✓ | 31.6 | 90.3 | 36.4 | 25.5 |
E | ✓ | ✓ | | ✓ | ✓ | 34.1 | 93.6 | 37.4 | 27.4 |
F | ✓ | | ✓ | | | 15.1 | 36.8 | 36.6 | 16.4 |
G | ✓ | | ✓ | ✓ | | 17.6 | 39.9 | 38.2 | 23 |
H | ✓ | | ✓ | | ✓ | 31.4 | 94.8 | 38.5 | 29.8 |
I | ✓ | | ✓ | ✓ | ✓ | 33.5 | 96.4 | 38.9 | 31.1 |
Table 7.
The mAP comparison for different classes in Visdrone2019.
Model | mAP (%) | Pedestrian | People | Bicycle | Car | Van | Truck | Tricycle | Awning-Tricycle | Bus | Motor |
---|---|---|---|---|---|---|---|---|---|---|---|
YOLOv5-m [5] | 33.9 | 34.1 | 23.8 | 17.3 | 64.6 | 41.2 | 35.0 | 25.4 | 16.7 | 49.0 | 31.7 |
YOLOv6-m [6] | 33.7 | 32.8 | 23.1 | 17.0 | 64.0 | 41.4 | 35.5 | 26.1 | 17.0 | 49.8 | 30.6 |
YOLOv7 [2] | 36.3 | 36.9 | 25.4 | 19.7 | 66.8 | 44.4 | 37.6 | 26.2 | 16.7 | 55.5 | 33.7 |
YOLOv8-m [5] | 34.5 | 34.6 | 23.9 | 17.8 | 65.1 | 42.0 | 36.5 | 25.4 | 16.1 | 51.4 | 32.2 |
YOLOX-m [7] | 34.2 | 34.7 | 24.3 | 17.9 | 65.5 | 42.3 | 33.5 | 25.0 | 17.1 | 49.3 | 32.0 |
HRMamba-YOLO (Ours) | 38.9 | 37.5 | 26.8 | 21.6 | 68.2 | 46.8 | 41.6 | 30.9 | 19.9 | 58.9 | 36.8 |
Table 8.
The mAP comparison for different classes in Dota1.5 (part 1).
Method | mAP (%) | Plane | Ship | Storage Tank | Baseball Diamond | Tennis Court | Basketball Court | Ground Track Field |
---|---|---|---|---|---|---|---|---|
YOLOv5-m [5] | 31.1 | 58.9 | 44.1 | 38.5 | 27.2 | 74.0 | 20.2 | 19.2 |
YOLOv6-m [6] | 32.0 | 60.2 | 44.9 | 39.5 | 29.6 | 77.9 | 24.2 | 21.1 |
YOLOv7 [2] | 34.4 | 63.5 | 48.2 | 43.6 | 32.1 | 78.1 | 23.8 | 19.5 |
YOLOv8-m [5] | 33.3 | 60.3 | 45.8 | 40.6 | 31.1 | 78.1 | 22.7 | 22.9 |
YOLOX-m [7] | 30.7 | 60.2 | 45.2 | 39.6 | 29.9 | 72.0 | 18.8 | 15.4 |
HRMamba-YOLO (Ours) | 37.1 | 64.6 | 48.2 | 39.6 | 39.3 | 83.0 | 25.8 | 26.0 |
Table 9.
The mAP comparison for different classes in Dota1.5 (part 2).
Method | mAP (%) | Bridge | Large Vehicle | Small Vehicle | Helicopter | Roundabout | Soccer Ball Field | Swimming Pool |
---|---|---|---|---|---|---|---|---|
YOLOv5-m [5] | 31.1 | 2.7 | 56.0 | 34.6 | 14.1 | 9.2 | 18.5 | 19.6 |
YOLOv6-m [6] | 32.0 | 2.5 | 55.7 | 36.0 | 16.7 | 8.5 | 18.7 | 15.5 |
YOLOv7 [2] | 34.4 | 5.0 | 56.3 | 36.0 | 24.3 | 15.1 | 20.9 | 20.7 |
YOLOv8-m [5] | 33.3 | 3.7 | 55.7 | 35.3 | 19.9 | 11.4 | 21.7 | 19.8 |
YOLOX-m [7] | 30.7 | 2.2 | 55.7 | 34.6 | 14.2 | 10.5 | 18.1 | 17.3 |
HRMamba-YOLO (Ours) | 37.1 | 6.9 | 59.7 | 38.8 | 26.5 | 15.1 | 19.0 | 28.0 |
Table 10.
The mAP comparison for different classes in UCAS-AOD.
Method | mAP (%) | Plane | Car |
---|---|---|---|
YOLOv5-m [5] | 64.5 | 72.7 | 56.4 |
YOLOv6-m [6] | 64.7 | 72.4 | 57.1 |
YOLOv7 [2] | 64.6 | 70.9 | 58.3 |
YOLOv8-m [5] | 65.2 | 71.9 | 58.5 |
YOLOX-m [7] | 63.9 | 72.2 | 55.5 |
HRMamba-YOLO (Ours) | 66.7 | 73.2 | 60.1 |
Table 11.
The mAP comparison for different classes in DIOR (part 1).
Method | mAP (%) | Airplane | Airport | Baseball Field | Basketball Court | Bridge | Chimney | Dam |
---|---|---|---|---|---|---|---|---|
YOLOv5-m [5] | 56.3 | 72.1 | 50.3 | 78.4 | 77.2 | 27.5 | 72.6 | 36.0 |
YOLOv6-m [6] | 55.0 | 71.9 | 47.7 | 77.5 | 76.1 | 25.4 | 73.6 | 34.2 |
YOLOv7 [2] | 58.8 | 73.9 | 60.0 | 79.8 | 78.9 | 29.2 | 75.5 | 37.4 |
YOLOv8-m [5] | 65.8 | 78.2 | 70.7 | 82.4 | 83.3 | 37.4 | 81.7 | 53.1 |
YOLOX-m [7] | 64.6 | 77.4 | 66.4 | 82.1 | 82.1 | 38.3 | 80.0 | 50.1 |
HRMamba-YOLO (Ours) | 66.1 | 79.1 | 71.4 | 83.0 | 83.4 | 38.1 | 81.0 | 51.7 |
Table 12.
The mAP comparison for different classes in DIOR (part 2).
Method | mAP (%) | Expressway Service Area | Expressway Toll Station | Golf Field | Ground Track Field | Harbor | Overpass | Ship |
---|---|---|---|---|---|---|---|---|
YOLOv5-m [5] | 56.3 | 62.9 | 51.2 | 59.7 | 70.1 | 45.8 | 42.2 | 57.0 |
YOLOv6-m [6] | 55.0 | 60.4 | 52.3 | 53.3 | 69.3 | 42.5 | 40.7 | 55.7 |
YOLOv7 [2] | 58.8 | 65.1 | 55.7 | 64.9 | 72.5 | 46.5 | 42.6 | 57.6 |
YOLOv8-m [5] | 65.8 | 75.1 | 65.8 | 75.2 | 78.3 | 55.3 | 50.5 | 61.5 |
YOLOX-m [7] | 64.6 | 74.6 | 64.4 | 71.1 | 76.9 | 55.4 | 48.9 | 61.0 |
HRMamba-YOLO (Ours) | 66.1 | 75.7 | 66.0 | 75.4 | 78.1 | 55.3 | 50.5 | 61.5 |
Table 13.
The mAP comparison for different classes in DIOR (part 3).
Method | mAP (%) | Stadium | Storage Tank | Tennis Court | Train Station | Vehicle | Windmill |
---|---|---|---|---|---|---|---|
YOLOv5-m [5] | 56.3 | 76.7 | 56.7 | 83.7 | 25.4 | 34.1 | 45.8 |
YOLOv6-m [6] | 55.0 | 74.7 | 53.6 | 82.8 | 27.5 | 33.3 | 46.6 |
YOLOv7 [2] | 58.8 | 77.8 | 57.7 | 84.6 | 33.2 | 36.0 | 47.7 |
YOLOv8-m [5] | 65.8 | 84.6 | 60.9 | 87.0 | 41.0 | 40.5 | 54.3 |
YOLOX-m [7] | 64.6 | 83.1 | 60.5 | 86.4 | 41.2 | 40.1 | 52.9 |
HRMamba-YOLO (Ours) | 66.1 | 86.4 | 61.3 | 87.1 | 41.4 | 40.9 | 55.2 |