*4.5. Performance Comparison with Other Methods*

Several one-stage and two-stage object detection methods are adopted to evaluate the effectiveness of MB-RPN: SSD [3], RetinaNet [10], FPN [8], Libra RCNN [11], GA-RPN [13], and SRetinaNet [12]. All of these methods except SRetinaNet are implemented with the source code provided by the authors; SRetinaNet is implemented by adjusting the hyperparameters of RetinaNet.

Table 2 shows the quantitative results on DOTA; the best performance for each category is colored in red. The mAP of MB-RPN reaches 68.5%, which outperforms the other methods. Since the one-stage approaches lack a positive/negative sample discrimination process, their detection accuracy is obviously lower than that of all the two-stage approaches. Compared with the original FPN, the mAP is 3% higher, and the AP of each category is also higher. Compared with GA-RPN and Libra RCNN, the mAP is increased by about 1.7% and 0.9%, respectively, and the AP of most categories is increased. The visual comparison between MB-RPN and Libra RCNN is shown in Figure 10; MB-RPN detects small objects more accurately in various challenging cases, e.g., the small vehicle objects at the bottom left of Figure 10a and in the middle of Figure 10b. At the same time, the performance on medium and large objects does not decrease. These results demonstrate the effectiveness of MB-RPN in enhancing detection performance on images with small objects and large scale variation.
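For reference, the mAP figures above are the mean of the per-category APs, each computed as the area under the precision-recall curve. A minimal sketch of this computation (the all-point interpolation scheme and the example AP values are assumptions for illustration, not taken from the actual DOTA evaluation code):

```python
import numpy as np

def average_precision(recall, precision):
    """AP as the area under the precision-recall curve
    (all-point interpolation)."""
    # Pad the curve so it starts at recall 0 and ends at recall 1.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangle areas where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# mAP is simply the mean of the per-category APs
# (category names and values here are placeholders).
ap_per_category = {"plane": 0.90, "small-vehicle": 0.65, "ship": 0.80}
mAP = sum(ap_per_category.values()) / len(ap_per_category)
```

A detector that achieves precision 1.0 at every recall level would score AP = 1.0 under this definition; a single missed recall range lowers AP in proportion to the precision reached there.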

Considering the performance gap between one-stage and two-stage approaches, in this paper the performance comparison on the UAVB dataset is only carried out among FPN, GA-RPN, Libra RCNN, and MB-RPN. Table 3 shows the quantitative evaluation of these approaches; the best performance for each category is colored in red. None of the mAP values achieved by these approaches is ideal, which may be caused by the imbalance of samples. However, MB-RPN still outperforms the other approaches by a large margin on both mAP and the AP of each category. Figure 11 provides a visual comparison between our approach and Libra RCNN; since the performance of both approaches is not ideal, it only shows the localization results without categories. It can be seen that MB-RPN detects more small objects, such as the cars in the upper part of the input image, which corresponds to the sampling results and distributions shown above. These results demonstrate the effectiveness of MB-RPN in enhancing detection performance.

**Figure 10.** Selected visual comparison on the DOTA benchmark dataset. Each subfigure includes the result of Libra RCNN on the left and MB-RPN on the right. (**a**) Small objects; (**b**) medium objects.


**Table 3.** Quantitative performance (AP %) of our model on the UAVB benchmark dataset compared with the comparison approaches. The best performance in each category is colored in red.

**Figure 11.** Selected visual comparison on the UAVB benchmark dataset, which includes Libra RCNN on the top and MB-RPN on the bottom.
