Precise Orientation Estimation for Rotated Object Detection Based on a Unit Vector Coding Approach
Abstract
1. Introduction
- (1) We propose a novel UVC encoding and decoding method that parameterizes object orientation through vector components. The encoded parameters exhibit continuous and reversible characteristics, thereby overcoming the boundary discontinuity problem and improving the accuracy of object orientation estimation.
- (2) We propose a novel CDL function as the loss function of the orientation angle prediction branch in model training to evaluate the predicted angle of rotated objects. Experimental results show that the design of this loss function significantly improves the accuracy of rotated object detection tasks.
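As a hedged sketch of these two contributions (the function names, the angle convention, and the exact loss formulation are assumptions, not taken from the paper): an orientation angle θ can be encoded as the unit vector (cos θ, sin θ), decoded with `atan2`, and supervised with a cosine-distance loss of the form 1 − v_pred · v_gt on normalized vectors.

```python
import numpy as np

def uvc_encode(theta):
    """Encode an orientation angle (radians) as a unit vector (cos, sin)."""
    return np.array([np.cos(theta), np.sin(theta)])

def uvc_decode(v):
    """Decode a (possibly unnormalized) predicted vector back to an angle."""
    return np.arctan2(v[1], v[0])

def cosine_distance_loss(v_pred, v_gt):
    """CDL-style loss: 1 - cosine similarity of the two orientation vectors.
    The paper's exact CDL may differ; this is an illustrative form."""
    v_pred = v_pred / np.linalg.norm(v_pred)
    v_gt = v_gt / np.linalg.norm(v_gt)
    return 1.0 - float(np.dot(v_pred, v_gt))

# The encoding is continuous across the angular boundary: angles just above
# -pi and just below +pi map to nearby vectors, so the loss between them is
# small even though their raw angle difference is close to 2*pi.
eps = 1e-3
v_a = uvc_encode(-np.pi + eps)
v_b = uvc_encode(np.pi - eps)
print(cosine_distance_loss(v_a, v_b))  # near zero despite the large raw angle gap
```

This continuity is exactly what a raw-angle regression loss lacks at the periodic boundary, which is the boundary discontinuity problem the encoding is designed to remove.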
2. Related Research
2.1. Rotated Object Detection
2.2. Representation of RBB
2.3. Boundary Discontinuity Problem
3. The Proposed Rotated Object Detection Method
3.1. Baseline
3.1.1. Orientation Angle Representation
3.1.2. Network Architecture
- (1) Regression output: this branch predicts the center (tx, ty), width (tw), and height (th) of the RBB.
- (2) Angle output: this branch predicts the orientation angle by decomposing it into components (tcos, tsin) in the form of a unit vector.
- (3) Center-ness output: this branch predicts the center-ness of the object, with an output channel number of 1.
- (4) Classification output: this branch predicts the class of the object within the RBB, where the number of output channels corresponds to the number of classes in the dataset.
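The four branches above can be assembled into a rotated bounding box at inference time. The following sketch shows one plausible decoding; the grid-offset/stride transform mirrors common YOLO-style decoding and is an assumption, not the paper's exact formulation. Only the angle recovery via `atan2(tsin, tcos)` follows directly from the unit-vector description above.

```python
import math

def decode_rbb(tx, ty, tw, th, tcos, tsin, grid_x=0.0, grid_y=0.0, stride=1.0):
    """Assemble a rotated bounding box (cx, cy, w, h, theta) from the head
    outputs. The center/size transforms are assumed YOLO-style; the angle is
    recovered from the predicted unit-vector components."""
    cx = (tx + grid_x) * stride          # center x, offset from the grid cell
    cy = (ty + grid_y) * stride          # center y
    w = math.exp(tw) * stride            # width (log-space regression assumed)
    h = math.exp(th) * stride            # height
    theta = math.atan2(tsin, tcos)       # unit-vector components -> angle
    return cx, cy, w, h, theta
```

Because `atan2` is defined for any non-zero (tcos, tsin) pair, the decode step never needs the prediction to be exactly unit length.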
3.2. Unit Vector Coding (UVC)
3.3. Loss Functions
3.3.1. PIoU Loss
3.3.2. Cosine Distance Loss (CDL)
3.4. Dataset and Training Method
3.4.1. Dataset
3.4.2. Training Method
4. Experimental Results
4.1. Ablation Study of the Proposed UVC Method Applied to Two IoU Loss Functions
- (1) Comparing the impact of the KLD loss function and the PIoU loss function on the mAP and CS metrics, the experimental results show that PIoU performs better as a regression error metric on this dataset.
- (2) As shown in Table 1, when the KLD loss function is combined with the proposed UVC method, mAP50 and mAP5095 increased by 1.8% and 4.5%, respectively. For angle accuracy evaluation, the CS metric increased by 0.013. Similarly, when the PIoU loss function is combined with the UVC method, mAP50 and mAP5095 increased by 0.2% and 1.1%, respectively, while CS improved by 0.02. These results validate that the proposed UVC method enhances detection accuracy across multiple metrics when integrated with different IoU loss functions. Moreover, the processing speed of the proposed method achieves 53.2 FPS, enabling real-time rotated object detection.
- (3) Figure 11 illustrates the visualization results on the MVTec test set. As seen in the figure, training with the proposed UVC method, in combination with both PIoU and KLD loss functions, effectively mitigates the boundary discontinuity problem, thereby improving the model’s prediction accuracy and stability.
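The CS values reported above can be read as an angle-accuracy score. Whether the paper's CS metric is exactly the mean cosine similarity over matched detections is an assumption; the sketch below shows that interpretation, under which CS = 1.0 means perfect orientation agreement.

```python
import math

def cs_metric(pred_angles, gt_angles):
    """Mean cosine similarity between predicted and ground-truth orientation
    angles (radians), computed over matched detections. The matching and
    aggregation scheme here is an assumption for illustration."""
    sims = [math.cos(p - g) for p, g in zip(pred_angles, gt_angles)]
    return sum(sims) / len(sims)
```

Under this reading, the reported gains of 0.013 and 0.02 in CS correspond to the predicted orientations moving measurably closer to the ground truth on average.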
4.2. Evaluation on Different Weight Values for the Angle Loss Function
4.3. Performance Comparison of Different Angle Parameter Encoding Methods
- (1) As shown in Table 2, on the MVTec dataset, the proposed UVC encoding method achieved the highest mAP50 and mAP5095 scores, significantly outperforming two recently published methods, PSC [19] and ACM [40]. These results demonstrate that the proposed UVC method provides significant performance improvements compared to the current state-of-the-art (SOTA) encoding techniques.
- (2) Table 2 also presents a summary of the number of Mega Parameters (MParams) and Giga Floating-point Operations per second (GFLOPs) for each encoding method. The CSL method [15] increases both the parameter number and computational load compared to the method without encoding (None). In contrast, the proposed UVC encoding method introduces negligible changes in both MParams and GFLOPs, while offering superior accuracy performance.
- (3) Table 3 presents the performance evaluation of the proposed method for each class on the MVTec test set. Compared to the method without encoding, the proposed method significantly enhances detection accuracy across several classes, including Type01, Type02, Type03, and Type05. These results demonstrate that the proposed method effectively improves the rotated object detection accuracy in terms of mAP5095 and CS metrics compared to the method without encoding.
- (4) Figure 13 presents two comparisons of the proposed method with the method without encoding on the MVTec test set. Figure 13a shows the results of the method without encoding. It is clear that when the object’s orientation angle approaches the boundary, the boundary discontinuity results in lower-quality bounding boxes, thereby reducing detection accuracy. In contrast, Figure 13b illustrates the results of the proposed method, which effectively addresses the boundary discontinuity issue, leading to enhanced detection accuracy for rotated objects.
4.4. Performance Comparison with SOTA Methods on the HRSC2016 Test Set
- (1) The existing CGD method [38], using ResNet101 as the backbone network, achieved the highest mAP score of 90.61 on the test set. The proposed method, also using the ResNet101 backbone, achieved the second-highest mAP score of 90.54. Additionally, when using the ResNet50 backbone network, the proposed method achieved the third-highest mAP score of 90.44 on the test set. These results confirm that the proposed method achieves comparable performance to the recently published CGD method and outperforms other SOTA methods addressing the boundary discontinuity problem, including KLD [33], CSL [15], and PSC [19].
- (2) We also compared our method with the VGL method [35], which employs the Deep-Layer Aggregation network with Deformable Convolutional Networks (DLA34-DCN) backbone. The results indicate that our method, using both ResNet50 and ResNet101 backbones, achieved superior performance, highlighting its efficiency in orientation estimation accuracy.
- (3) The detection performance of the proposed method is further evaluated on small and large ship objects in the HRSC2016 test set. In this experiment, a size threshold of 128 × 128 pixels was established to classify the ground truth into small and large objects. The average precision (AP) for small objects (APS) and large objects (APL) is then measured. Table 5 presents the results of this evaluation. From Table 5, it is evident that the proposed method achieves comparable AP scores for both small and large objects, indicating that the method exhibits significant scale-invariant detection performance.
- (4) Figure 14 presents a comparison between the proposed method and the method without encoding on the HRSC2016 test set. Figure 14a,b show the results of the method without encoding and the proposed method, respectively. The results clearly demonstrate that the proposed UVC encoding method significantly improves the detection robustness of the rotated object detector. This improvement is evident in the more accurate RBBs generated for ships of different sizes, shapes, and orientations, leading to an overall increase in rotated object detection performance and reliability.
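The small/large split used for the APS/APL evaluation in item (3) can be sketched as follows. Whether the paper thresholds on box area against 128 × 128 pixels or on the longest side is not stated here; area is assumed in this illustration.

```python
def split_by_size(gt_boxes, threshold=128):
    """Split ground-truth RBBs (cx, cy, w, h, theta) into small and large
    sets by comparing box area against threshold*threshold pixels.
    Thresholding on area (rather than max side) is an assumption."""
    small, large = [], []
    for box in gt_boxes:
        _, _, w, h, _ = box
        (small if w * h < threshold * threshold else large).append(box)
    return small, large
```

AP is then computed separately over each set to obtain APS and APL.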
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Ding, J.; Xue, N.; Long, Y.; Xia, G.-S.; Lu, Q. Learning RoI Transformer for Oriented Object Detection in Aerial Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 2849–2858.
- Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. SCRDet: Towards more robust detection for small, cluttered and rotated objects. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8232–8241.
- Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-oriented scene text detection via rotation proposals CNN: Rotational region CNN for orientation robust scene text detection. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 3610–3615.
- Liao, M.; Shi, B.; Bai, X. TextBoxes++: A single-shot oriented scene text detector. IEEE Trans. Image Process. 2018, 27, 3676–3690.
- Wang, Z.; Shen, L.; Li, B.; Yang, J.; Yang, F.; Yuan, K.; Fang, C.; Fanwang, Y. Real-Time Rotated Object Detection Using Angle Decoupling. In Proceedings of the 2021 China Automation Congress (CAC), Beijing, China, 22–24 October 2021; pp. 2772–2778.
- Nie, K.; von Drigalski, F.; Triyonoputro, J.C.; Nakashima, C.; Shibata, Y.; Konishi, Y.; Ijiri, Y.; Yoshioka, T.; Domae, Y.; Ueshiba, T.; et al. Team O2AS’ approach for the task-board task of the World Robot Challenge 2018. Adv. Robot. 2020, 34, 477–498.
- Zhou, D.; Fang, J.; Song, X.; Guan, C.; Yin, J.; Dai, Y.; Yang, R. IoU loss for 2D/3D object detection. In Proceedings of the 2019 International Conference on 3D Vision (3DV), Quebec City, QC, Canada, 16–19 September 2019; pp. 85–94.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 3520–3529.
- Yang, X.; Yan, J.; Feng, Z.; He, T. R3Det: Refined single-stage detector with feature refinement for rotating object. Proc. AAAI Conf. Artif. Intell. 2021, 35, 3163–3171.
- Han, J.; Ding, J.; Xue, N.; Xia, G.S. ReDet: A rotation-equivariant detector for aerial object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2786–2795.
- Wagner, R.; Matuschek, M.; Knaack, P.; Zwick, M.; Geiß, M. IndustrialEdgeML—End-to-end edge-based computer vision system for Industry 5.0. Procedia Comput. Sci. 2023, 217, 594–603.
- Yang, X.; Yan, J.; Liao, W.; Yang, X.; Tang, J.; He, T. SCRDet++: Detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2384–2399.
- Yang, X.; Yan, J. Arbitrary-oriented object detection with circular smooth label. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 677–694.
- Yang, X.; Yan, J.; Ming, Q.; Wang, W.; Zhang, X.; Tian, Q. Rethinking rotated object detection with Gaussian Wasserstein distance loss. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 11830–11841.
- Yang, X.; Zhou, Y.; Zhang, G.; Yang, J.; Wang, W.; Yan, J.; Zhang, X.; Tian, Q. The KFIoU loss for rotated object detection. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
- Yang, X.; Hou, L.; Zhou, Y.; Wang, W.; Yan, J. Dense label encoding for boundary discontinuity free rotation detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15819–15829.
- Yu, Y.; Da, F. Phase-shifting coder: Predicting accurate orientation in oriented object detection. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 13354–13363.
- MVTec Datasets. MVTec Screws Dataset. Available online: https://www.mvtec.com/company/research/datasets/mvtec-screws (accessed on 7 November 2024).
- Liu, Z.; Yuan, L.; Weng, L.; Yang, Y. A high resolution optical satellite image dataset for ship recognition and some new baselines. In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods, Porto, Portugal, 24–26 February 2017; pp. 324–331.
- Ding, J.; Xue, N.; Xia, G.-S.; Bai, X.; Yang, W.; Yang, M.Y.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; et al. Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7778–7796.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, Proceedings, Part I; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Liu, F.; Chen, R.; Zhang, J.; Xing, K.; Liu, H.; Qin, J. R2YOLOX: A lightweight refined anchor-free rotated detector for object detection in aerial images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5632715.
- Han, J.; Ding, J.; Li, J.; Xia, G.S. Align deep features for oriented object detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5602511.
- Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.-S.; Bai, X. Gliding vertex on the horizontal bounding box for multi-oriented object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 1452–1459.
- Zhang, F.; Wang, X.; Zhou, S.; Wang, Y.; Hou, Y. Arbitrary-oriented ship detection through center-head point extraction. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5612414.
- Yang, X.; Yang, X.; Yang, J.; Ming, Q.; Wang, W.; Tian, Q.; Yan, J. Learning high-precision bounding box for rotated object detection via Kullback-Leibler divergence. Adv. Neural Inf. Process. Syst. 2021, 34, 18381–18394.
- Wang, J.; Li, F.; Bi, H. Gaussian focal loss: Learning distribution polarized angle prediction for rotated object detection in aerial images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4707013.
- Zhao, T.; Liu, N.; Celik, T.; Li, H.-C. An arbitrary-oriented object detector based on variant Gaussian label in remote sensing images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8013605.
- Ming, Q.; Miao, L.; Zhou, Z.; Yang, X.; Dong, Y. Optimization for arbitrary-oriented object detection via representation invariance loss. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8021505.
- Cheng, G.; Wang, J.; Li, K.; Xie, X.; Lang, C.; Yao, Y.; Han, J. Anchor-free oriented proposal generator for object detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5625414.
- Xu, H.; Liu, X.; Ma, Y.; Zhu, Z.; Wang, S.; Yan, C.; Dai, F. Rotated object detection with circular Gaussian distribution. Electronics 2023, 12, 3265.
- Zhao, Z.; Li, S. ABFL: Angular boundary discontinuity free loss for arbitrary oriented object detection in aerial images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5611411.
- Xu, H.; Liu, X.; Xu, H.; Ma, Y.; Zhu, Z.; Yan, C.; Dai, F. Rethinking boundary discontinuity problem for oriented object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 17406–17415.
- Chen, Z.; Chen, K.; Lin, W.; See, J.; Yu, H.; Ke, Y.; Yang, C. PIoU loss: Towards accurate oriented object detection in complex environments. In Computer Vision–ECCV 2020: 16th European Conference, Proceedings, Part V; Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 195–211.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Everingham, M. The PASCAL Visual Object Classes Challenge 2007. Available online: http://host.robots.ox.ac.uk/pascal/VOC/voc2007/ (accessed on 7 November 2024).
- Experimental Results of the Proposed Method on the MVTec Test Set, Results of Precise Orientation Estimation for Rotated Object Detection Based on Unit Vector Coding. Available online: https://youtu.be/ulJX3NIFMDE (accessed on 7 November 2024).
Table 1. Ablation study of the proposed UVC method applied to two IoU loss functions (KLD and PIoU) on the MVTec test set.

| Detector | IoU Loss | UVC | mAP50 (%) | mAP5095 (%) | CS | FPS |
|---|---|---|---|---|---|---|
| YOLOX-s | KLD [33] | | 96.34 | 75.43 | 0.983 | - |
| YOLOX-s | KLD [33] | ✓ | 98.14 | 79.93 | 0.996 | - |
| YOLOX-s | PIoU [41] | | 98.55 | 86.46 | 0.977 | 53.8 |
| YOLOX-s | PIoU [41] | ✓ | 98.71 | 87.48 | 0.997 | 53.2 |
Table 2. Performance comparison of different angle parameter encoding methods on the MVTec test set.

| Detector | Encoding | Len | Loss Function | mAP50 (%) | mAP5095 (%) | CS | MParams | GFLOPs |
|---|---|---|---|---|---|---|---|---|
| YOLOX-s | None | 1 | Smooth L1 loss | 98.55 | 86.46 | 0.977 | 9.83 | 31.76 |
| YOLOX-s | CSL [15] | 180 | Gaussian focal loss | 98.65 | 80.45 | 0.957 | 9.90 | 32.14 |
| YOLOX-s | CSL [15] | 360 | Gaussian focal loss | 98.24 | 84.22 | 0.984 | 9.97 | 32.53 |
| YOLOX-s | PSC [19] | 3 | Smooth L1 loss | 98.14 | 86.65 | 0.989 | 9.83 | 31.76 |
| YOLOX-s | PSC [19] | 60 | Smooth L1 loss | 98.55 | 86.35 | 0.601 | 9.85 | 31.88 |
| YOLOX-s | ACM [40] | 2 | Smooth L1 loss | 98.62 | 86.96 | 0.993 | 9.83 | 31.76 |
| YOLOX-s | UVC | 2 | CDL (ours) | 98.71 | 87.48 | 0.997 | 9.83 | 31.76 |
Table 3. Per-class performance (AP5095 and CS) of the proposed method on the MVTec test set.

| Encoding | Metric | Type01 | Type02 | Type03 | Type04 | Type05 | Type06 | Type07 | Type08 | Type09 | Type10 | Type11 | Type12 | Type13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| None | AP5095 | 0.863 | 0.833 | 0.844 | 0.854 | 0.836 | 0.845 | 0.886 | 0.926 | 0.924 | 0.880 | 0.916 | 0.822 | 0.812 |
| None | CS | 1.000 | 0.830 | 0.958 | 1.000 | 0.965 | 1.000 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.958 |
| UVC | AP5095 | 0.888 | 0.860 | 0.864 | 0.852 | 0.847 | 0.859 | 0.862 | 0.923 | 0.939 | 0.884 | 0.927 | 0.858 | 0.811 |
| UVC | CS | 1.000 | 1.000 | 1.000 | 1.000 | 0.998 | 1.000 | 0.998 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.962 |
Table 4. Performance comparison with SOTA methods on the HRSC2016 test set.

| Method | Backbone | mAP07 (%) |
|---|---|---|
| VGL [35] | DLA34-DCN | 89.78 |
| RIDet [36] | ResNet50 | 89.47 |
| KLD [33] | ResNet50 | 89.76 |
| CSL [15] | ResNet50 | 89.84 |
| PSC [19] | ResNet50 | 90.06 |
| RoI Transformer [1] | ResNet101 | 86.20 |
| R3Det-DCL [18] | ResNet101 | 89.46 |
| RIDet [36] | ResNet101 | 89.63 |
| R3Det-GWD [16] | ResNet101 | 89.85 |
| S2A-Net [30] | ResNet101 | 90.17 |
| ABFL [39] | ResNet101 | 90.30 |
| AOPG [37] | ResNet101 | 90.34 |
| CGD [38] | ResNet101 | 90.61 |
| UVC (ours) | ResNet50 | 90.48 |
| UVC (ours) | ResNet101 | 90.54 |
Table 5. AP for small objects (APS) and large objects (APL) of the proposed method on the HRSC2016 test set.

| Method | Size Threshold | Backbone | APS (%) | APL (%) |
|---|---|---|---|---|
| UVC (ours) | 128 × 128 | ResNet50 | 89.48 | 90.86 |
| UVC (ours) | 128 × 128 | ResNet101 | 89.65 | 90.91 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Tsai, C.-Y.; Lin, W.-C. Precise Orientation Estimation for Rotated Object Detection Based on a Unit Vector Coding Approach. Electronics 2024, 13, 4402. https://doi.org/10.3390/electronics13224402