Author Contributions
Conceptualization, D.Z.; methodology, Z.L.; software, Z.L.; validation, Z.Y., Z.L. and C.Z.; formal analysis, Z.Y.; resources, J.Q.; writing—original draft preparation, Z.Y. and Z.L.; writing—review and editing, C.Z. and D.Z.; visualization, C.Z.; supervision, J.Q.; funding acquisition, D.Z. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Three main problems in object detection in ORSIs. (a) Scattered objects in one image. (b) A railway station in the middle of the image, overwhelmed by the complicated background. (c) A scene with two categories of objects at different scales.
Figure 2.
Architecture of MFPNet.
Figure 3.
The structure of the Multi-Feature Pyramid Module. The mark * denotes resized feature maps.
Figure 4.
The structure of the Receptive Field Block. R refers to the dilation rate of the dilated convolution.
Figure 5.
The instance distribution of each category in Levir. Small instances are smaller than 30 × 30 pixels, middle instances are between 30 × 30 and 150 × 150 pixels, and large instances are bigger than 150 × 150 pixels.
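The size bins in this caption can be sketched as a small helper. This is a minimal illustration, not the authors' code: it assumes the thresholds are compared against box area (the caption does not say whether area or side length is used), and the names `size_category`, `size_histogram` and `example_boxes` are hypothetical.

```python
def size_category(width, height, small_side=30, large_side=150):
    """Assign a bounding box to the small / middle / large bin.

    Assumption: thresholds are applied to the box area
    (30 * 30 and 150 * 150 pixels). Boxes exactly on a threshold
    fall into the middle bin.
    """
    area = width * height
    if area < small_side * small_side:
        return "small"
    if area > large_side * large_side:
        return "large"
    return "middle"


def size_histogram(boxes):
    """Count how many (width, height) boxes fall into each size bin."""
    counts = {"small": 0, "middle": 0, "large": 0}
    for width, height in boxes:
        counts[size_category(width, height)] += 1
    return counts


# Made-up boxes for illustration only (not taken from Levir).
example_boxes = [(12, 20), (30, 30), (80, 60), (150, 150), (200, 180)]
print(size_histogram(example_boxes))
```

Running the helper over the annotations of one category would give the three counts plotted for that category.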
Figure 6.
Total number of instances of each category in DIOR.
Figure 7.
Example images from the datasets. The first row shows images from Levir and the other rows show images from DIOR. Both datasets contain scenes with the three problems pointed out: (i) scattered instances in one image, (ii) complex background, and (iii) scale differences between targets.
Figure 8.
Results for ships on the Levir dataset. Our method gives promising results for incomplete objects, objects of different sizes, and large numbers of targets.
Figure 9.
Results for aircraft on the Levir dataset. Our method gives promising results for incomplete objects, objects of different sizes, and large numbers of targets.
Figure 10.
Results for oil tanks on the Levir dataset. Our method gives promising results for incomplete objects, objects of different sizes, and large numbers of targets.
Figure 11.
Qualitative results of MFPNet320 on the DIOR test set. The first row shows detection results for images with a large number of scattered objects. The second row shows results for images with complex background texture. The third row shows results when scale differences exist within and between classes. The fourth row shows results for other cases.
Figure 12.
Results on images with complex backgrounds. Faster R-CNN (a,d) and RetinaNet (b,e) produce several false or missed detections. MFPNet (c,f) gives better results.
Figure 13.
Results on images containing targets with scale differences. Compared with Faster R-CNN (a,d) and RetinaNet (b,e), MFPNet (c,f) produces better detections.
Figure 14.
Results on images with numerous scattered targets. Faster R-CNN (a,d) and RetinaNet (b,e) miss many objects, while MFPNet (c,f) detects almost all of them.
Figure 15.
Failure cases on the DIOR dataset. False detections may occur when facing dense arrays of oriented objects.
Table 1.
The details of local and global features in M-FPM.
Layer | Output | Feature |
---|---|---|
Conv3_3 | 80 × 80 × 256 | local |
Conv4_3 | 40 × 40 × 512 | ⇓ |
Conv5_3 | 20 × 20 × 512 | |
Conv_fc7 | 10 × 10 × 1024 | global |
Table 2.
The details of local and global features in RFB.
Branch | Kernel | Input | Output | Receptive Field |
---|---|---|---|---|
1 | 1 × 1 | 80 × 80 × 512 | 80 × 80 × 128 | 12 × 12 |
 | 3 × 3, R = 1 | 80 × 80 × 128 | 80 × 80 × 128 | |
2 | 1 × 1 | 80 × 80 × 512 | 80 × 80 × 128 | 36 × 36 |
 | 3 × 1 | 80 × 80 × 128 | 80 × 80 × 128 | |
 | 3 × 3, R = 3 | 80 × 80 × 128 | 80 × 80 × 128 | |
3 | 1 × 1 | 80 × 80 × 512 | 80 × 80 × 128 | 36 × 36 |
 | 1 × 3 | 80 × 80 × 128 | 80 × 80 × 128 | |
 | 3 × 3, R = 3 | 80 × 80 × 128 | 80 × 80 × 128 | |
4 | 1 × 1 | 80 × 80 × 512 | 80 × 80 × 64 | 60 × 60 |
 | 1 × 3 | 80 × 80 × 64 | 80 × 80 × 96 | |
 | 3 × 1 | 80 × 80 × 96 | 80 × 80 × 128 | |
 | 3 × 3, R = 5 | 80 × 80 × 128 | 80 × 80 × 128 | |
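The 3 × 3 layers in this table use rates R = 1, 3 and 5 while keeping the 80 × 80 spatial size. The sketch below uses the standard relations for dilated convolutions to show how that is possible. It assumes stride 1 and a padding equal to R, neither of which is stated in the table, and it does not attempt to reproduce the Receptive Field column. The function names are hypothetical.

```python
def effective_kernel(kernel, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (R - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def conv_output_size(size, kernel, dilation=1, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    k_eff = effective_kernel(kernel, dilation)
    return (size + 2 * padding - k_eff) // stride + 1


# Assumption: each 3 x 3 dilated layer uses padding equal to its rate R,
# which is the usual choice for keeping the feature map size unchanged.
for rate in (1, 3, 5):
    k_eff = effective_kernel(3, rate)
    out = conv_output_size(80, 3, dilation=rate, padding=rate)
    print(f"R = {rate}: effective kernel {k_eff} x {k_eff}, output {out} x {out}")
```

Under these assumptions a 3 × 3 kernel covers a 3, 7 or 11 pixel span for R = 1, 3 and 5, and the output stays at 80 × 80 in every branch.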
Table 3.
Detection results on the Levir test set. All models are trained on the Levir trainval set. The best results are shown in red.
Method | Backbone | Airplane | Ship | Oil Tank | mAP |
---|---|---|---|---|---|
Faster R-CNN [10] | VGG16 | 90.1 | 89.5 | 71.0 | 83.6 |
Faster R-CNN [10] | ResNet-50 [49] | 87.6 | 81.6 | 71.9 | 80.4 |
SSD300 [2] | ResNet-50 | 87.7 | 81.5 | 68.7 | 79.3 |
RetinaNet500 [1] | ResNet-50 | 87.6 | 80.2 | 74.0 | 80.6 |
RefineDet320 [4] | VGG16 | 90.7 | 89.3 | 85.4 | 88.5 |
CenterNet-DLA [16] | DLA-34 | 81.7 | 79.0 | 75.7 | 78.8 |
Ours | VGG16 | 90.7 | 88.8 | 89.8 | 89.8 |
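The mAP column can be checked against the per-class columns. The sketch below assumes that mAP is the unweighted mean of the per-class AP values, which is the usual definition; the function name `mean_average_precision` is hypothetical. Because the published APs are already rounded to one decimal, a recomputed mean can differ from the table by 0.1 (the Faster R-CNN VGG16 row recomputes to 83.5 rather than 83.6).

```python
def mean_average_precision(per_class_ap):
    """Unweighted mean of per-class average precision values."""
    return sum(per_class_ap) / len(per_class_ap)


# Per-class AP (Airplane, Ship, Oil Tank) copied from two rows of the table.
levir_ap = {
    "RefineDet320 (VGG16)": [90.7, 89.3, 85.4],
    "Ours (VGG16)": [90.7, 88.8, 89.8],
}

for method, aps in levir_ap.items():
    print(f"{method}: mAP = {mean_average_precision(aps):.1f}")
```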
Table 4.
Categories in the DIOR dataset and their corresponding numbers.
Number | Category | Number | Category | Number | Category | Number | Category |
---|---|---|---|---|---|---|---|
c1 | Airplane | c6 | Chimney | c11 | Ground track field | c16 | Storage tank |
c2 | Airport | c7 | Dam | c12 | Harbor | c17 | Tennis court |
c3 | Baseball field | c8 | Expressway service area | c13 | Overpass | c18 | Train station |
c4 | Basketball court | c9 | Expressway toll station | c14 | Ship | c19 | Vehicle |
c5 | Bridge | c10 | Golf court | c15 | Stadium | c20 | Wind mill |
Table 5.
Detection results on the DIOR test set. All models are trained on the DIOR trainval set. * refers to adding FPN to the method. The best results are marked in red, the second best in blue, and the third best in green.
Method | Backbone | mAP | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 | c10 | c11 | c12 | c13 | c14 | c15 | c16 | c17 | c18 | c19 | c20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
R-CNN [26] | VGG16 | 37.7 | 35.6 | 43.0 | 53.8 | 62.3 | 15.6 | 53.7 | 33.7 | 50.2 | 33.5 | 50.1 | 49.3 | 39.5 | 30.9 | 9.1 | 60.8 | 18.0 | 54.0 | 36.1 | 9.1 | 16.4 |
Faster R-CNN [26] | VGG16 | 54.1 | 53.6 | 49.3 | 78.8 | 66.2 | 28.0 | 70.9 | 62.3 | 69.0 | 55.2 | 68.0 | 56.9 | 50.2 | 50.1 | 27.7 | 73.0 | 39.8 | 75.2 | 38.6 | 23.6 | 45.4 |
Faster R-CNN * [26] | ResNet‐50 | 63.1 | 54.1 | 71.4 | 63.3 | 81.0 | 42.6 | 72.5 | 57.5 | 68.7 | 62.1 | 73.1 | 76.5 | 42.8 | 56.0 | 71.8 | 57.0 | 53.5 | 81.2 | 53.0 | 43.1 | 80.9 |
Faster R-CNN * [26] | ResNet‐101 | 65.1 | 54.0 | 74.5 | 63.3 | 80.7 | 44.8 | 72.5 | 60.0 | 75.6 | 62.3 | 76.0 | 76.8 | 46.4 | 57.2 | 71.8 | 68.3 | 53.8 | 81.1 | 59.5 | 43.1 | 81.2 |
Mask R-CNN * [26] | ResNet‐50 | 63.5 | 53.8 | 72.3 | 63.2 | 81.0 | 38.7 | 72.6 | 55.9 | 71.6 | 67.0 | 73.0 | 75.8 | 44.2 | 56.5 | 71.9 | 58.6 | 53.6 | 81.1 | 54.0 | 43.1 | 81.1 |
Mask R-CNN * [26] | ResNet‐101 | 65.2 | 53.9 | 76.6 | 63.2 | 80.9 | 40.2 | 72.5 | 60.4 | 76.3 | 62.5 | 76.0 | 75.9 | 46.5 | 57.4 | 71.8 | 68.3 | 53.7 | 81.0 | 62.3 | 43.0 | 81.0 |
PANet [26] | ResNet‐50 | 63.8 | 61.9 | 70.4 | 71.0 | 80.4 | 38.9 | 72.5 | 56.6 | 68.4 | 60.0 | 69.0 | 74.6 | 41.6 | 55.8 | 71.7 | 72.9 | 62.3 | 81.2 | 54.6 | 48.2 | 86.7 |
PANet [26] | ResNet‐101 | 66.1 | 60.2 | 72.0 | 70.6 | 80.5 | 43.6 | 72.3 | 61.4 | 72.1 | 66.7 | 72.0 | 73.4 | 45.3 | 56.9 | 71.7 | 70.4 | 62.0 | 80.9 | 57.0 | 47.2 | 84.5 |
CBD-E [50] | ResNet-101 | 67.8 | 54.2 | 77.0 | 71.5 | 87.1 | 44.6 | 75.4 | 63.5 | 76.2 | 65.3 | 79.3 | 79.5 | 47.5 | 59.3 | 69.1 | 69.7 | 64.3 | 84.5 | 59.4 | 44.7 | 83.1 |
SSD300 [26] | VGG16 | 58.6 | 59.5 | 72.7 | 72.4 | 75.7 | 29.7 | 65.8 | 56.6 | 63.5 | 53.1 | 65.3 | 68.6 | 49.4 | 48.1 | 59.2 | 61.0 | 46.6 | 76.3 | 55.1 | 27.4 | 65.7 |
YOLO-v3 [26] | Darknet-53 | 57.1 | 72.2 | 29.2 | 74.0 | 78.6 | 31.2 | 69.7 | 26.9 | 48.6 | 54.4 | 31.1 | 61.1 | 44.9 | 49.7 | 87.4 | 70.6 | 68.7 | 87.3 | 29.4 | 48.3 | 78.7 |
RetinaNet500 [26] | ResNet‐50 | 65.7 | 53.7 | 77.3 | 69.0 | 81.3 | 44.1 | 72.3 | 62.5 | 76.2 | 66.0 | 77.7 | 74.2 | 50.7 | 59.6 | 71.2 | 69.3 | 44.8 | 81.3 | 54.2 | 45.1 | 83.4 |
RetinaNet500 [26] | ResNet‐101 | 66.1 | 53.3 | 77.0 | 69.3 | 85.0 | 44.1 | 73.2 | 62.4 | 78.6 | 62.8 | 78.6 | 76.6 | 49.9 | 59.6 | 71.1 | 68.4 | 45.8 | 81.3 | 55.2 | 44.4 | 85.5 |
RefineDet320 [4] | VGG16 | 67.1 | 69.5 | 80.4 | 74.4 | 81.1 | 40.0 | 72.7 | 68.8 | 80.2 | 58.9 | 77.7 | 74.2 | 61.3 | 57.8 | 63.3 | 75.3 | 47.3 | 81.3 | 65.7 | 34.7 | 78.2 |
CornerNet [26] | Hourglass-104 | 64.9 | 58.8 | 84.2 | 72.0 | 80.8 | 46.4 | 75.3 | 64.3 | 81.6 | 76.3 | 79.5 | 79.5 | 26.1 | 60.6 | 37.6 | 70.7 | 45.2 | 84.0 | 57.1 | 43.0 | 75.9 |
M2Det320 [42] | VGG16 | 44.0 | 54.7 | 61.4 | 67.1 | 54.6 | 16.7 | 61.6 | 33.2 | 60.1 | 51.7 | 58.5 | 60.2 | 19.6 | 32.7 | 31.3 | 63.0 | 12.4 | 71.4 | 21.5 | 9.0 | 38.2 |
CenterNet [16] | Hourglass-104 | 52.4 | 50.2 | 51.2 | 62.2 | 62.3 | 31.7 | 61.0 | 38.5 | 63.1 | 57.0 | 57.3 | 56.6 | 26.2 | 41.1 | 58.3 | 54.1 | 49.7 | 73.6 | 41.7 | 40.5 | 66.8 |
RICNN [26] | VGG16 | 44.2 | 39.1 | 61.0 | 60.1 | 66.3 | 25.3 | 63.3 | 41.1 | 51.7 | 36.6 | 55.9 | 58.9 | 43.5 | 39.0 | 9.1 | 61.1 | 19.1 | 63.5 | 46.1 | 11.4 | 31.5 |
RICAOD [26] | VGG16 | 50.9 | 42.2 | 69.7 | 62.0 | 79.0 | 27.7 | 68.9 | 50.1 | 60.5 | 49.3 | 64.4 | 65.3 | 42.3 | 46.8 | 11.7 | 53.5 | 24.5 | 70.3 | 53.3 | 20.4 | 56.2 |
RIFD-CNN [26] | VGG16 | 56.1 | 56.6 | 53.2 | 79.9 | 69.0 | 29.0 | 71.5 | 63.1 | 69.0 | 56.0 | 68.9 | 62.4 | 51.2 | 51.1 | 31.7 | 73.6 | 41.5 | 79.5 | 40.1 | 28.5 | 46.9 |
MFPNet320 | ResNet-50 | 69.1 | 73.6 | 80.6 | 80.2 | 80.9 | 41.1 | 74.0 | 69.3 | 84.2 | 63.3 | 76.3 | 74.6 | 62.8 | 58.2 | 69.9 | 71.5 | 55.5 | 82.3 | 66.4 | 38.5 | 79.1 |
MFPNet320 | VGG16 | 71.2 | 76.6 | 83.4 | 80.6 | 82.1 | 44.3 | 75.6 | 68.5 | 85.9 | 63.9 | 77.3 | 77.2 | 62.1 | 58.8 | 77.2 | 76.8 | 60.3 | 86.4 | 64.5 | 41.5 | 80.2 |
Table 6.
Results of the ablation experiments. All models are trained on the DIOR trainval set and tested on the DIOR test set.
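Cascade Layers | M-FPM | RFB | MAdd | FLOPs | mAP | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 | c10 | c11 | c12 | c13 | c14 | c15 | c16 | c17 | c18 | c19 | c20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|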
5 | ✓ | ✓ | 200.6 G | 99.8 G | 71.2 | 76.6 | 83.4 | 80.6 | 82.1 | 44.3 | 75.6 | 68.5 | 85.9 | 63.9 | 77.3 | 77.2 | 62.1 | 58.8 | 77.2 | 76.8 | 60.3 | 86.4 | 64.5 | 41.5 | 80.2 |
4 | ✓ | ✓ | 140.5 G | 71.7 G | 67.3 | 70.6 | 84.1 | 74.9 | 81.3 | 40.9 | 74.6 | 70.1 | 82.2 | 59.0 | 77.7 | 75.2 | 62.2 | 58.3 | 55.8 | 76.8 | 45.0 | 81.3 | 66.5 | 32.6 | 76.3 |
5 | ✓ | - | 181.4 G | 90.2 G | 70.6 | 77.1 | 81.4 | 80.6 | 81.1 | 43.7 | 74.3 | 69.8 | 84.5 | 63.0 | 77.3 | 76.4 | 62.2 | 58.9 | 76.2 | 76.8 | 60.0 | 86.7 | 62.2 | 40.8 | 79.9 |
5 | - | ✓ | 153.9 G | 76.5 G | 64.9 | 66.7 | 78.7 | 71.8 | 81.0 | 35.0 | 74.0 | 65.2 | 79.5 | 53.7 | 78.5 | 73.9 | 61.7 | 55.6 | 57.8 | 79.5 | 41.2 | 81.0 | 59.8 | 30.8 | 72.5 |
5 | - | - | 134.7 G | 66.9 G | 63.2 | 60.8 | 78.1 | 71.0 | 80.7 | 32.7 | 72.7 | 62.4 | 77.8 | 52.8 | 75.9 | 70.1 | 60.1 | 53.2 | 59.5 | 76.7 | 40.5 | 80.5 | 58.8 | 29.4 | 70.2 |
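To read the ablation rows at a glance, the mAP gain of each configuration over the baseline without M-FPM and RFB can be computed from the table. The snippet below is only a convenience sketch over the four rows with 5 cascade layers; the dictionary layout and variable names are hypothetical.

```python
# mAP of the 5-cascade-layer rows, keyed by (uses M-FPM, uses RFB).
ablation_map = {
    (True, True): 71.2,
    (True, False): 70.6,
    (False, True): 64.9,
    (False, False): 63.2,
}

baseline = ablation_map[(False, False)]

# Gain in mAP points of each configuration over the plain baseline.
gains = {
    config: round(value - baseline, 1)
    for config, value in ablation_map.items()
}

for (use_mfpm, use_rfb), gain in gains.items():
    print(f"M-FPM={use_mfpm}, RFB={use_rfb}: +{gain} mAP")
```

On these numbers M-FPM alone adds 7.4 points, RFB alone adds 1.7 points, and the two together add 8.0 points over the baseline.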