Figure 1.
The comparison between horizontal bounding boxes and oriented bounding boxes. (a) Horizontal bounding box annotation of objects. (b) Oriented bounding box annotation of objects.
Figure 2.
Comparison of mainstream networks with our proposed network on the DOTA-V1.0 test set in terms of mAP and GFLOPs. The X-axis represents GFLOPs and the Y-axis represents mAP. Our GCA2Net achieves an excellent balance between mAP and GFLOPs.
Figure 3.
The overall framework of GCA2Net. The detector first acquires feature maps through the backbone network DRC-ResNet, which incorporates Dynamic Rotational Convolution. Subsequently, the AugFPN+ network is employed for feature fusion to obtain enhanced feature maps that provide a better representation of rotational characteristics. The fusion network consists of the Residual Feature Augmentation+ module and the Adaptive Spatial Fusion+ module. Finally, the Oriented RPN network generates oriented candidate bounding boxes, which are then passed to the classification and regression heads for prediction.
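As a reading aid, the data flow of Figure 3 can be summarized in the following Python-style sketch. The component names mirror the caption, but their interfaces (`backbone`, `aug_fpn_plus`, `oriented_rpn`, `roi_head`) are illustrative assumptions rather than the released implementation.

```python
def gca2net_forward(image, backbone, aug_fpn_plus, oriented_rpn, roi_head):
    # DRC-ResNet backbone with Dynamic Rotational Convolution
    multi_scale_feats = backbone(image)
    # AugFPN+ fusion (Residual Feature Augmentation+ and Adaptive Spatial Fusion+)
    fused_feats = aug_fpn_plus(multi_scale_feats)
    # Oriented RPN proposes rotated candidate boxes
    proposals = oriented_rpn(fused_feats)
    # classification and oriented-box regression heads make the final predictions
    return roi_head(fused_feats, proposals)
```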
Figure 4.
The overall framework of the DRC-ResNet backbone. This backbone network is built on ResNet50. Specifically, we replaced the 3 × 3 convolutions in the last three stages (Stage 3, Stage 4, and Stage 5) with a sequential combination of DRC and MOSCAB.
Figure 5.
Rotation mechanism of the convolution kernels. Different colors represent different convolution weights. (a) Initialize a convolution kernel, (b) expand the kernel onto a finer grid through bilinear interpolation, (c) rotate the kernel and resample it, where each output weight takes the average of the weights of all the fine-grid cells it covers, (d) obtain the rotated convolution kernel.
Figure 6.
Example of the convolution kernel rotation process. We rotate the convolution kernels by 45 degrees. Here, interpolation denotes the convolution kernel expanded by bilinear interpolation. For convenience, we have extended the convolution kernel to . Finally, we resample and take the average value of the region occupied by each grid cell to obtain the final rotated convolution kernel.
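The interpolate-rotate-resample procedure of Figures 5 and 6 can be sketched in a few lines of NumPy/SciPy. The upsampling factor and the use of `scipy.ndimage` are our own assumptions for illustration, not the DRC implementation itself.

```python
import numpy as np
from scipy.ndimage import zoom, rotate

def rotate_kernel(kernel: np.ndarray, angle_deg: float, factor: int = 5) -> np.ndarray:
    """Rotate a k x k convolution kernel by angle_deg degrees (Figure 5 (a)-(d))."""
    k = kernel.shape[0]
    # (b) expand the kernel onto a finer grid with bilinear interpolation
    fine = zoom(kernel, factor, order=1)
    # (c) rotate the fine grid and resample it
    fine_rot = rotate(fine, angle_deg, reshape=False, order=1, mode="nearest")
    # (d) each output weight is the average of the fine-grid cells it covers
    return fine_rot.reshape(k, factor, k, factor).mean(axis=(1, 3))

kernel = np.arange(9, dtype=np.float32).reshape(3, 3)
print(rotate_kernel(kernel, 45.0))
```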
Figure 7.
The overall framework of Dynamic Rotational Convolution. MOSCAB acts on the feature map output by DRC. The weighted addition operation is shown in Formula (2).
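To make the weighted addition of Figure 7 concrete, the following hedged PyTorch sketch combines the outputs of several convolution branches (standing in for kernels rotated to different angles) with input-dependent weights. The routing function (global pooling, a linear layer, and a softmax) and the branch count are illustrative assumptions, not the exact DRC design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedRotatedConv(nn.Module):
    def __init__(self, channels: int, num_angles: int = 4):
        super().__init__()
        # one branch per rotation angle; plain convs stand in for rotated kernels
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_angles)]
        )
        self.router = nn.Linear(channels, num_angles)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # predict per-branch weights from globally pooled features
        w = F.softmax(self.router(x.mean(dim=(2, 3))), dim=1)        # (B, A)
        feats = torch.stack([b(x) for b in self.branches], dim=1)    # (B, A, C, H, W)
        # weighted addition over the rotated branches
        return (w[:, :, None, None, None] * feats).sum(dim=1)

y = WeightedRotatedConv(16)(torch.randn(2, 16, 32, 32))
print(y.shape)  # torch.Size([2, 16, 32, 32])
```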
Figure 8.
The overall framework of MOSCAB.
Figure 9.
The overall framework of Adaptive Spatial Fusion+ and Residual Feature Augmentation+. The Hadamard product denotes element-wise multiplication.
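The role of the Hadamard product in Figure 9 can be illustrated with a minimal PyTorch sketch of adaptive spatial fusion: per-level spatial weight maps are normalized with a softmax across levels and multiplied element-wise with the (already resized) feature maps before summation. The weight-prediction layers below are our assumption for illustration, not the exact ASF+ design.

```python
from typing import List

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSpatialFusion(nn.Module):
    def __init__(self, channels: int, num_levels: int = 3):
        super().__init__()
        # one 1x1 conv per level produces a single-channel spatial weight map
        self.weight_convs = nn.ModuleList(
            [nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_levels)]
        )

    def forward(self, feats: List[torch.Tensor]) -> torch.Tensor:
        # assume all inputs have already been resized to a common resolution
        logits = torch.cat([conv(f) for conv, f in zip(self.weight_convs, feats)], dim=1)
        weights = F.softmax(logits, dim=1)  # (B, L, H, W), sums to 1 over levels
        # Hadamard (element-wise) product of weight maps and features, then sum
        return sum(weights[:, i : i + 1] * f for i, f in enumerate(feats))

feats = [torch.randn(2, 64, 32, 32) for _ in range(3)]
print(AdaptiveSpatialFusion(64)(feats).shape)  # torch.Size([2, 64, 32, 32])
```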
Figure 10.
Schematic diagram of the confusion matrix.
Figure 11.
Data augmentation methods. The data augmentation strategies shown above were employed during the training phase; no data augmentation was used during the testing phase.
Figure 12.
The model training progress.
Figure 13.
The detection results of the HRSC2016 dataset. The model achieved accurate detection results for ships of different categories and scales, demonstrating its adaptability to target orientation and target scale.
Figure 14.
Visualization of detection results on the DOTA-V1.0 dataset, covering all 15 object categories. The model adapts well to target orientation and target scale.
Figure 15.
Comparison of detection performance of different methods on the DOTA-V1.0 dataset. Our method outperforms the others in reducing false positives. For example, Oriented RCNN and S2ANet both produce false detections for the swimming pool (pink box), large vehicle (orange box), and ship (red box); S2ANet additionally produces a false helicopter detection (blue box). RetinaNet misses the harbor (yellow box) and also makes errors on the swimming pool, harbor, and ground track field.
Figure 16.
Experimental results under spatial resolution variation. Due to space limitations, we provide results for typical categories. When the image resolution changes, typical objects such as planes (purple box), tennis courts (light blue box), and storage tanks (green box) are still effectively located, and good detection accuracy is maintained.
Figure 17.
Experimental results under observation angle changes. Due to space limitations, we provide visualization results for typical categories. When the observation angle changes, our model still effectively detects typical objects such as planes (purple box), tennis courts (light blue box), harbors (yellow box), and ships (red box), and the oriented boxes remain accurately localized.
Table 1.
The model training parameters.
Parameter | Value |
---|---|
Batch Size | 2 |
Input Size | |
Epoch | 12 |
Optimizer | SGD |
Learning Rate | 0.005 |
Weight Decay | 0.0001 |
Momentum | 0.9 |
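For reference, the optimizer settings of Table 1 correspond to the following PyTorch configuration. This is a minimal sketch: the model here is a placeholder, and the learning-rate schedule is not specified in the table.

```python
import torch

model = torch.nn.Conv2d(3, 16, 3)  # placeholder module standing in for GCA2Net
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.005,             # learning rate (Table 1)
    momentum=0.9,         # momentum (Table 1)
    weight_decay=0.0001,  # weight decay (Table 1)
)
# Training runs for 12 epochs with batch size 2, per Table 1.
```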
Table 2.
Quantitative comparisons with state-of-the-art methods on the DOTA-V1.0 test set. The best and second-best results are highlighted in bold and underline, respectively. ↑ indicates that the higher the numerical value, the better the model performance.
Method | PL | BD | BR | GTF | SV | LV | SH | TC | BC | ST | SBF | RA | HA | SP | HC | mAP (↑) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
One-stage | | | | | | | | | | | | | | | | |
DRN [49] | 88.91 | 80.22 | 43.52 | 63.35 | 73.48 | 70.69 | 84.94 | 90.14 | 83.85 | 84.11 | 50.12 | 58.41 | 67.62 | 68.60 | 52.50 | 70.70 |
R3Det [34] | 88.76 | 83.09 | 50.91 | 67.27 | 76.23 | 80.39 | 86.72 | 90.78 | 84.68 | 83.24 | 61.98 | 61.35 | 66.91 | 70.63 | 53.94 | 73.79 |
RSDet [15] | 89.80 | 82.90 | 48.60 | 65.20 | 69.50 | 70.10 | 70.20 | 90.50 | 85.60 | 83.40 | 62.50 | 63.90 | 65.60 | 67.20 | 68.00 | 72.20 |
DAL [24] | 88.68 | 76.55 | 45.08 | 66.80 | 67.00 | 76.76 | 79.74 | 90.84 | 79.54 | 78.45 | 57.71 | 62.27 | 69.05 | 73.14 | 60.11 | 71.44 |
S2ANet [12] | 89.30 | 80.11 | 50.97 | 73.91 | 78.59 | 77.34 | 86.38 | 90.91 | 85.14 | 84.84 | 60.45 | 66.94 | 66.78 | 68.55 | 51.65 | 74.13 |
G-Rep [20] | 88.89 | 74.62 | 43.92 | 70.24 | 67.26 | 67.26 | 79.80 | 90.87 | 84.46 | 78.47 | 54.59 | 62.60 | 66.67 | 67.98 | 52.16 | 70.59 |
CFA [19] | 89.08 | 83.20 | 54.37 | 66.87 | 81.23 | 80.96 | 87.17 | 90.21 | 84.32 | 86.09 | 52.34 | 69.94 | 75.52 | 80.76 | 67.96 | 76.67 |
DFDet [50] | 89.41 | 82.42 | 49.93 | 70.63 | 79.57 | 79.02 | 87.22 | 90.91 | 82.80 | 84.49 | 62.05 | 64.26 | 72.30 | 72.90 | 58.29 | 75.08 |
Two-stage | | | | | | | | | | | | | | | | |
VitDet [51] | 88.38 | 75.86 | 52.24 | 74.42 | 78.52 | 83.22 | 88.47 | 90.86 | 77.18 | 86.98 | 48.95 | 62.77 | 76.66 | 72.97 | 57.48 | 74.41 |
CAD-Net [52] | 87.80 | 82.40 | 49.40 | 73.50 | 71.10 | 63.50 | 76.60 | 90.90 | 79.20 | 73.30 | 48.40 | 60.90 | 62.00 | 67.00 | 62.20 | 69.90 |
RoI Trans [11] | 88.64 | 78.52 | 43.44 | 75.92 | 68.81 | 73.68 | 83.59 | 90.74 | 77.27 | 81.46 | 58.39 | 53.54 | 62.83 | 58.93 | 47.67 | 69.56 |
SCRDet [36] | 89.98 | 80.65 | 52.09 | 68.36 | 68.36 | 60.32 | 72.41 | 90.85 | 87.94 | 86.86 | 65.02 | 66.68 | 66.25 | 68.24 | 65.21 | 72.61 |
G Vertex [13] | 89.64 | 85.00 | 52.26 | 77.34 | 73.01 | 73.14 | 86.82 | 90.74 | 79.02 | 86.81 | 59.55 | 70.91 | 72.94 | 70.86 | 57.32 | 75.02 |
FAOD [53] | 90.21 | 79.58 | 45.49 | 76.41 | 73.18 | 68.27 | 79.56 | 90.83 | 83.40 | 86.48 | 53.40 | 65.42 | 74.17 | 69.69 | 64.86 | 73.28 |
Mask OBB [54] | 89.61 | 85.09 | 51.85 | 72.90 | 75.28 | 73.23 | 85.57 | 90.37 | 82.08 | 85.05 | 55.73 | 68.39 | 71.61 | 69.87 | 66.33 | 74.86 |
ReDet [55] | 88.79 | 82.64 | 53.97 | 74.00 | 78.13 | 84.06 | 88.04 | 90.89 | 87.78 | 85.75 | 61.76 | 60.39 | 75.96 | 68.07 | 63.59 | 76.25 |
AOPG [37] | 89.14 | 82.74 | 51.87 | 69.28 | 77.65 | 82.42 | 88.08 | 90.89 | 86.26 | 85.13 | 60.60 | 66.30 | 74.05 | 67.76 | 58.77 | 75.39 |
Oriented RCNN [10] | 86.42 | 78.97 | 52.47 | 69.84 | 77.30 | 75.99 | 86.72 | 90.89 | 82.63 | 85.66 | 60.13 | 68.25 | 73.98 | 72.22 | 62.37 | 74.92 |
GCA2Net (Ours) | 90.06 | 84.48 | 59.70 | 81.30 | 81.01 | 84.91 | 88.20 | 90.90 | 87.61 | 86.42 | 67.27 | 70.33 | 81.18 | 78.73 | 71.08 | 77.56 |
Other Models | | | | | | | | | | | | | | | | |
FRED [56] | 89.37 | 82.12 | 50.84 | 73.89 | 77.58 | 77.38 | 87.51 | 90.82 | 86.30 | 84.25 | 62.54 | 65.10 | 72.65 | 69.55 | 63.41 | 75.56 |
ARS-DETR [57] | 87.65 | 76.54 | 50.64 | 69.85 | 79.76 | 83.91 | 87.92 | 90.26 | 86.24 | 85.09 | 54.58 | 67.01 | 75.62 | 73.66 | 63.39 | 75.47 |
SODC [58] | 88.54 | 84.72 | 51.21 | 70.18 | 79.43 | 80.82 | 88.56 | 90.66 | 87.28 | 86.49 | 55.37 | 64.59 | 68.22 | 71.03 | 64.85 | 76.06 |
HOFA-Net [59] | 90.42 | 76.64 | 47.57 | 59.01 | 73.41 | 85.64 | 89.29 | 90.76 | 73.30 | 89.44 | 71.15 | 69.39 | 75.16 | 67.05 | 74.77 | 75.53 |
Table 3.
Comparison of computational efficiency and accuracy. (↑) indicates that higher is better, and (↓) indicates that lower is better. Computational efficiency was measured on an NVIDIA GeForce RTX 4090D GPU.
Model | Params (M) (↓) | FLOPs (G) (↓) | FPS (img/s) (↑) | mAP (↑) |
---|---|---|---|---|
G. Vertex | 43.05 | 211.7 | 22.9 | 75.02 |
OrientedRCNN | 42.92 | 211.71 | 23.1 | 74.92 |
RetinaNet_OBB | 38.19 | 216.19 | 23.3 | 68.42 |
RoITransformer | 56.90 | 225.56 | 14.4 | 69.56 |
S2ANet | 38.02 | 171.37 | 23.0 | 74.13 |
GCA2Net (Ours) | 80.5 | 196.75 | 12.5 | 77.56 |
Table 4.
Quantitative comparisons with preceding state-of-the-art methods on the HRSC2016 test set. The best and second-best results are highlighted in bold and underline, respectively.
Method | Backbone | mAP |
---|---|---|
Rotated RetinaNet [60] | ResNet-50 | 85.10 |
R2PN [61] | IMP-VGG-16 | 79.6 |
TIOE-Det [62] | ResNet-101 | 90.16 |
CFC-Net [63] | ResNet-101 | 89.70 |
RSDet [15] | ResNet-50 | 86.5 |
R2CNN [64] | IMP-ResNet-101 | 73.10 |
RoI Trans [11] | IMP-ResNet-101 | 86.20 |
G. Vertex [13] | IMP-ResNet-101 | 88.20 |
R3Det [34] | IMP-ResNet-101 | 89.30 |
GWD [16] | ResNet-101 | 88.95 |
DAL [24] | IMP-ResNet-101 | 89.80 |
SLA [23] | ResNet-101 | 89.51 |
S2ANet [12] | IMP-ResNet-101 | 90.20 |
MSFN [65] | - | 90.00 |
AAM [66] | ResNet-50-AAM | 88.49 |
GCA2Net (Ours) | DRC-ResNet-50 | 90.40 |
Table 5.
Ablation studies of the influence of DRC and MOSCAB on the DOTA-V1.0 dataset, where “✓” indicates that the module is used. The best result is highlighted in bold.
Baseline | DRC | MOSCAB | mAP |
---|---|---|---|
✓ | | | 75.81 |
✓ | ✓ | | 76.9 |
✓ | ✓ | ✓ | 77.46 |
Table 6.
The ablation study of feature fusion methods on the DOTA-V1.0 dataset. Here, “✓” indicates that the module is used. ↑ indicates that the higher the numerical value, the better the model performance. The best result is highlighted in bold.
FPN | AugFPN | AugFPN+ | mAP (↑) |
---|---|---|---|
✓ | | | 76.02 |
| ✓ | | 76.59 |
| | ✓ | 77.56 |
Table 7.
The ablation study of different feature enhancement methods on the DOTA-V1.0 dataset. Here, “✓” indicates that the module is used. ↑ indicates that the higher the numerical value, the better the model performance. The best result is highlighted in bold. In this table, CAA stands for Context Anchor Attention.
MOSCAB | SE-Net | CAA | CBAM | mAP (↑) |
---|---|---|---|---|
✓ | | | | 77.56 |
| ✓ | | | 76.33 |
| | ✓ | | 68.78 |
| | | ✓ | 76.89 |
Table 8.
Ablation study of the replacement strategy on the DOTA-V1.0 dataset. Here, “✓” indicates that the stage is replaced. The best result is highlighted in bold.
Stage2 | Stage3 | Stage4 | mAP |
---|---|---|---|
✓ | | | 76.75 |
✓ | ✓ | | 77.08 |
✓ | ✓ | ✓ | 77.56 |
Table 9.
Ablation study of the spatial encoding methods of the ARU on the DOTA-V1.0 dataset. Here, “✓” indicates that the module is used. The best result is highlighted in bold.
DWConv | WTConv | Involution | mAP |
---|---|---|---|
✓ | | | 76.24 |
| ✓ | | 76.30 |
| | ✓ | 77.56 |