An Improved YOLOX Model and Domain Transfer Strategy for Nighttime Pedestrian and Vehicle Detection
Abstract
1. Introduction
- (1) An improved algorithm based on YOLOX was proposed for small-target pedestrian and vehicle detection at night. The main improvements are: (a) re-parameterization of the model structure using the Re-parameterization Visual Geometry Group (RepVGG) technique; (b) the introduction of a coordinate attention mechanism; (c) the addition of a new feature-scale fusion branch; and (d) a redesigned loss function.
- (2) A domain transfer training strategy was proposed that allows the model to be trained more efficiently using daytime datasets. A large-scale daytime dataset is degraded to low illumination and fused with a much smaller nighttime dataset; the models are then trained and tested after a unified low-illumination enhancement step. This fully exploits the features of the existing daytime data and remedies the shortage of nighttime data.
- (3) The proposed improved YOLOX and domain transfer training strategy were validated on a real-world dataset. The experimental results showed that the improved YOLOX algorithm produced fewer errors than the original algorithm, and that it was more accurate for nighttime vehicle and pedestrian detection when combined with the domain transfer training strategy.
2. Related Work
2.1. Object Detection
2.1.1. Traditional Detection Methods
2.1.2. Deep-Learning-Based Detection Methods
2.1.3. Domain Adaptation-Based Object Detection Methods
2.2. Low-Light Detection
2.2.1. Low-Illumination Datasets
2.2.2. Low-Illumination Enhancement and Restoration Methods
2.2.3. Different Methods Applied to Low-Illumination Detection
3. Improved YOLOX Detection Algorithm
3.1. YOLOX Model
3.2. Structural Re-Parameterization and Light-Weighting of Model
3.2.1. Structural Re-Parameterization
3.2.2. Lightweight Design of the Model
3.3. Introduction of Attention Mechanism
3.4. Feature Pyramid Improvement
3.5. Loss Function Design
3.5.1. Confidence Loss Function Design
3.5.2. Bounding Box Regression Loss Function Design
3.5.3. Combined Loss Function
4. Training Strategy for Data Domain Transfer
4.1. Low Light Enhancement
4.2. Low-Illumination Degrading Transformations
4.2.1. Domain Adaptation Estimation
4.2.2. Gamma Correction and Inverse Gamma Correction
4.2.3. Color Space Conversion
4.2.4. White Balance and Inverse White Balance
4.2.5. Low-Light Corruption
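Sections 4.2.1–4.2.5 name the steps of the low-illumination degrading transformation: estimate the target domain, move between gamma-corrected and linear intensities, convert color spaces, apply (inverse) white balance, and inject low-light corruption. As a rough illustration of the core idea (linearize, darken, corrupt with noise, re-encode), the following single-pixel sketch may help; the parameter values and function names are assumptions for illustration, not the paper's calibrated pipeline.

```python
# Illustrative sketch of a low-illumination degrading transform for one
# pixel intensity in [0, 1]: undo gamma to reach a roughly linear domain,
# attenuate brightness to mimic a dark scene, add sensor-like noise, clip,
# and re-apply gamma. All parameter defaults are assumed, not calibrated.
import random

def degrade_pixel(v, gamma=2.2, attenuation=0.15, noise_std=0.01, rng=random):
    linear = v ** gamma                        # inverse gamma correction
    dark = linear * attenuation                # simulate low illumination
    noisy = dark + rng.gauss(0.0, noise_std)   # crude shot/read noise model
    noisy = min(max(noisy, 0.0), 1.0)          # clip to the valid range
    return noisy ** (1.0 / gamma)              # gamma-correct back

def degrade_image(pixels, **kw):
    return [degrade_pixel(p, **kw) for p in pixels]
```

Running a daytime image through such a transform produces a synthetic nighttime counterpart, which is the mechanism the domain transfer strategy uses to fuse the large daytime dataset with the small nighttime one.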
5. Experimental Comparison and Discussion
5.1. Experimental Datasets and Performance Evaluation Metrics
5.1.1. Experimental Dataset
5.1.2. Evaluation Metrics
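The precision, recall, and F1 values reported in the tables of Section 5 follow the standard definitions from true-positive (TP), false-positive (FP), and false-negative (FN) counts at a given IoU and confidence threshold. This small helper is a generic sketch of those definitions, not the paper's evaluation code.

```python
# Standard detection metrics from TP/FP/FN counts, guarded against
# division by zero for degenerate cases (e.g., no detections at all).

def detection_metrics(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predictions, how many correct
    recall = tp / (tp + fn) if tp + fn else 0.0      # of ground truths, how many found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return precision, recall, f1
```

AP then summarizes precision over the full range of recall for one class (the area under the PR curve), and mAP averages AP over the two classes, person and car.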
5.2. Experimental Parameter Setting
5.3. Analysis
5.3.1. Training Evaluation Process and PR Curve
5.3.2. Ablation Studies
5.3.3. Effect of CA Module Location and Number of Layers
5.3.4. Comparison of Different Testing Methods
5.4. Effectiveness Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Liu, J.; Li, J.; Wang, K.; Zhao, J.; Cong, H.; He, P. Exploring Factors Affecting the Severity of Night-Time Vehicle Accidents under Low Illumination Conditions. Adv. Mech. Eng. 2019, 11, 1687814019840940.
- Chuma, E.L.; Iano, Y. Human Movement Recognition System Using CW Doppler Radar Sensor with FFT and Convolutional Neural Network. In Proceedings of the 2020 IEEE MTT-S Latin America Microwave Conference (LAMC 2020), Cali, Colombia, 26–28 May 2021; pp. 1–4.
- Navarro, P.J.; Fernández, C.; Borraz, R.; Alonso, D. A Machine Learning Approach to Pedestrian Detection for Autonomous Vehicles Using High-Definition 3D Range Data. Sensors 2017, 17, 18.
- Lee, W.; Cho, H.; Hyeong, S.; Chung, W. Practical Modeling of GNSS for Autonomous Vehicles in Urban Environments. Sensors 2019, 19, 4236.
- Wei, Y.; Tian, Q.; Guo, J.; Huang, W.; Cao, J. Multi-Vehicle Detection Algorithm through Combining Harr and HOG Features. Math. Comput. Simul. 2019, 155, 130–145.
- Wu, H.; Hu, Y.; Wang, W.; Mei, X.; Xian, J. Ship Fire Detection Based on an Improved YOLO Algorithm with a Lightweight Convolutional Neural Network Model. Sensors 2022, 22, 7420.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving Into High Quality Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
- Cheng, Z.; Lv, J.; Wu, A.; Qu, N. YOLOv3 Object Detection Algorithm with Feature Pyramid Attention for Remote Sensing Images. Sens. Mater. 2020, 32, 4537.
- Ju, M.; Luo, H.; Wang, Z.; Hui, B.; Chang, Z. The Application of Improved YOLO V3 in Multi-Scale Target Detection. Appl. Sci. 2019, 9, 3775.
- Zhu, Y.; Ma, C.; Du, J. Rotated Cascade R-CNN: A Shape Robust Detector with Coordinate Regression. Pattern Recognit. 2019, 96, 106964.
- Cai, Y.; Luan, T.; Gao, H.; Wang, H.; Chen, L.; Li, Y.; Sotelo, M.A.; Li, Z. YOLOv4-5D: An Effective and Efficient Object Detector for Autonomous Driving. IEEE Trans. Instrum. Meas. 2021, 70, 1–13.
- Zhang, M.; Wang, C.; Yang, J.; Zheng, K. Research on Engineering Vehicle Target Detection in Aerial Photography Environment Based on YOLOX. In Proceedings of the 2021 14th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 11–12 December 2021; pp. 254–256.
- Viola, P.; Jones, M. Rapid Object Detection Using a Boosted Cascade of Simple Features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA, 8–14 December 2001; Volume 1, p. I.
- Viola, P.; Jones, M.J. Robust Real-Time Face Detection. Int. J. Comput. Vis. 2004, 57, 137–154.
- Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893.
- Felzenszwalb, P.; McAllester, D.; Ramanan, D. A Discriminatively Trained, Multiscale, Deformable Part Model. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D. Cascade Object Detection with Deformable Part Models. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2241–2248.
- Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object Detection with Discriminatively Trained Part-Based Models. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1627–1645.
- Girshick, R.; Felzenszwalb, P.; McAllester, D. Object Detection with Grammar Models. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2011; Volume 24.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750.
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 9627–9636.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 1440–1448.
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-Based Fully Convolutional Networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2016; Volume 29.
- Lin, T.-Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Hu, H.; Gu, J.; Zhang, Z.; Dai, J.; Wei, Y. Relation Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3588–3597.
- Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards Balanced Learning for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 821–830.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: High Quality Object Detection and Instance Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1483–1498.
- Chen, Y.; Li, W.; Sakaridis, C.; Dai, D.; Van Gool, L. Domain Adaptive Faster R-CNN for Object Detection in the Wild. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3339–3348.
- Cai, Q.; Pan, Y.; Ngo, C.-W.; Tian, X.; Duan, L.; Yao, T. Exploring Object Relation in Mean Teacher for Cross-Domain Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 11457–11466.
- Saito, K.; Ushiku, Y.; Harada, T.; Saenko, K. Strong-Weak Distribution Alignment for Adaptive Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 6956–6965.
- Zhuang, C.; Han, X.; Huang, W.; Scott, M. IFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13122–13129.
- He, Z.; Zhang, L. Domain Adaptive Object Detection via Asymmetric Tri-Way Faster-RCNN. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 309–324.
- Zhao, G.; Li, G.; Xu, R.; Lin, L. Collaborative Training Between Region Proposal Localization and Classification for Domain Adaptive Object Detection. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 86–102.
- Xu, M.; Wang, H.; Ni, B.; Tian, Q.; Zhang, W. Cross-Domain Detection via Graph-Induced Prototype Alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 12355–12364.
- Xu, C.-D.; Zhao, X.-R.; Jin, X.; Wei, X.-S. Exploring Categorical Regularization for Domain Adaptive Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11724–11733.
- Neumann, L.; Karg, M.; Zhang, S.; Scharfenberger, C.; Piegert, E.; Mistr, S.; Prokofyeva, O.; Thiel, R.; Vedaldi, A.; Zisserman, A.; et al. NightOwls: A Pedestrians at Night Dataset. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018.
- Nada, H.; Sindagi, V.A.; Zhang, H.; Patel, V.M. Pushing the Limits of Unconstrained Face Detection: A Challenge Dataset and Baseline Results. In Proceedings of the 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS), Redondo Beach, CA, USA, 22–25 October 2018; pp. 1–10.
- Yang, W.; Yuan, Y.; Ren, W.; Liu, J.; Scheirer, W.J.; Wang, Z.; Zhang, T.; Zhong, Q.; Xie, D.; Pu, S.; et al. Advancing Image Understanding in Poor Visibility Environments: A Collective Benchmark Study. IEEE Trans. Image Process. 2020, 29, 5737–5752.
- Loh, Y.P.; Chan, C.S. Getting to Know Low-Light Images with the Exclusively Dark Dataset. Comput. Vis. Image Underst. 2019, 178, 30–42.
- Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2636–2645.
- Land, E.H. An Alternative Technique for the Computation of the Designator in the Retinex Theory of Color Vision. Proc. Natl. Acad. Sci. USA 1986, 83, 3078–3080.
- Jobson, D.J.; Rahman, Z.; Woodell, G.A. A Multiscale Retinex for Bridging the Gap between Color Images and the Human Observation of Scenes. IEEE Trans. Image Process. 1997, 6, 965–976.
- Guo, X.; Li, Y.; Ling, H. LIME: Low-Light Image Enhancement via Illumination Map Estimation. IEEE Trans. Image Process. 2017, 26, 982–993.
- Lee, C.; Lee, C.; Kim, C.-S. Contrast Enhancement Based on Layered Difference Representation of 2D Histograms. IEEE Trans. Image Process. 2013, 22, 5372–5384.
- Tomasi, C.; Manduchi, R. Bilateral Filtering for Gray and Color Images. In Proceedings of the Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271), Washington, DC, USA, 7 January 1998; pp. 839–846.
- Wei, C.; Wang, W.; Yang, W.; Liu, J. Deep Retinex Decomposition for Low-Light Enhancement. arXiv 2018, arXiv:1808.04560.
- Jiang, Y.; Gong, X.; Liu, D.; Cheng, Y.; Fang, C.; Shen, X.; Yang, J.; Zhou, P.; Wang, Z. EnlightenGAN: Deep Light Enhancement Without Paired Supervision. IEEE Trans. Image Process. 2021, 30, 2340–2349.
- Guo, C.; Li, C.; Guo, J.; Loy, C.C.; Hou, J.; Kwong, S.; Cong, R. Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1780–1789.
- Guo, S.; Wang, W.; Wang, X.; Xu, X. Low-Light Image Enhancement with Joint Illumination and Noise Data Distribution Transformation. Vis. Comput. 2022.
- Xu, K.; Chen, H.; Xu, C.; Jin, Y.; Zhu, C. Structure-Texture Aware Network for Low-Light Image Enhancement. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 4983–4996.
- Sasagawa, Y.; Nagahara, H. YOLO in the Dark—Domain Adaptation Method for Merging Multiple Models. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 345–359.
- Xu, N.; Huo, C.; Pan, C. Adaptive Brightness Learning for Active Object Recognition. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 2162–2166.
- Wang, W.; Peng, Y.; Cao, G.; Guo, X.; Kwok, N. Low-Illumination Image Enhancement for Night-Time UAV Pedestrian Detection. IEEE Trans. Ind. Inform. 2021, 17, 5208–5217.
- Arad, B.; Kurtser, P.; Barnea, E.; Harel, B.; Edan, Y.; Ben-Shahar, O. Controlled Lighting and Illumination-Independent Target Detection for Real-Time Cost-Efficient Applications. The Case Study of Sweet Pepper Robotic Harvesting. Sensors 2019, 19, 1390.
- Chen, C.; Chen, Q.; Xu, J.; Koltun, V. Learning to See in the Dark. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3291–3300.
- Wu, T.-H.; Wang, T.-W.; Liu, Y.-Q. Real-Time Vehicle and Distance Detection Based on Improved Yolo v5 Network. In Proceedings of the 2021 3rd World Symposium on Artificial Intelligence (WSAI), Guangzhou, China, 18–20 June 2021; pp. 24–28.
- Wang, C.-Y.; Liao, H.-Y.M.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A New Backbone That Can Enhance Learning Capability of CNN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 390–391.
- Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-Style ConvNets Great Again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13733–13742.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
- Zhang, H.; Wang, Y.; Dayoub, F.; Sunderhauf, N. VarifocalNet: An IoU-Aware Dense Object Detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8514–8523.
- Lore, K.G.; Akintayo, A.; Sarkar, S. LLNet: A Deep Autoencoder Approach to Natural Low-Light Image Enhancement. Pattern Recognit. 2017, 61, 650–662.
- Cui, Z.; Qi, G.-J.; Gu, L.; You, S.; Zhang, Z.; Harada, T. Multitask AET With Orthogonal Tangent Regularity for Dark Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2553–2562.
- Wen, L.; Du, D.; Cai, Z.; Lei, Z.; Chang, M.-C.; Qi, H.; Lim, J.; Yang, M.-H.; Lyu, S. UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking. Comput. Vis. Image Underst. 2020, 193, 102907.
- Karaimer, H.C.; Brown, M.S. A Software Platform for Manipulating the Camera Imaging Pipeline. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 429–444.
- Ramanath, R.; Snyder, W.E.; Yoo, Y.; Drew, M.S. Color Image Processing Pipeline. IEEE Signal Process. Mag. 2005, 22, 34–43.
- Foi, A.; Trimeche, M.; Katkovnik, V.; Egiazarian, K. Practical Poissonian-Gaussian Noise Modeling and Fitting for Single-Image Raw-Data. IEEE Trans. Image Process. 2008, 17, 1737–1754.
- Plotz, T.; Roth, S. Benchmarking Denoising Algorithms With Real Photographs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 June 2017; pp. 1586–1595.
| Algorithms | Processing Time | Training Parameters | FLOPs | Average Absolute Error |
|---|---|---|---|---|
| RetinexNet | 0.1200 | 555,205 | 587.47 | 104.81 |
| EnlightenGAN | 0.0078 | 8,636,675 | 273.24 | 102.78 |
| Zero-DCE | 0.0025 | 79,416 | 84.99 | 98.78 |
| Training Method | AP (Person) | AP (Car) | mAP | F1 | Recall | Precision | Inference Time (ms) | Model Size (MB) |
|---|---|---|---|---|---|---|---|---|
| YOLOX (Baseline) | 0.768 | 0.762 | 0.765 | 0.736 | 0.685 | 0.796 | 11.2 | 17.2 |
| +Lightweighting (lower number of channels) | 0.760 | 0.756 | 0.758 | 0.729 | 0.678 | 0.788 | 11.0 | 13.3 |
| +Lightweighting (CSP depth increase) | 0.763 | 0.762 | 0.763 | 0.736 | 0.687 | 0.793 | 12.8 | 13.8 |
| +Re-parameterization and lightweighting | 0.770 | 0.761 | 0.766 | 0.740 | 0.662 | 0.839 | 12.6 | 13.5 |
| +Attention mechanism module (3 levels) | 0.781 | 0.769 | 0.775 | 0.743 | 0.700 | 0.792 | 20.3 | 13.8 |
| +Feature pyramid improvement | 0.789 | 0.796 | 0.793 | 0.752 | 0.700 | 0.813 | 23.0 | 15.8 |
| +Varifocal loss | 0.795 | 0.798 | 0.797 | 0.753 | 0.724 | 0.786 | 23.2 | 15.8 |
| +CIoU loss | 0.801 | 0.801 | 0.801 | 0.761 | 0.718 | 0.809 | 23.1 | 15.8 |
| Model | P2 | P3 | P4 | P5 | mAP (3 layers) | Inference Time (ms, 3 layers) | mAP (1 layer) | Inference Time (ms, 1 layer) |
|---|---|---|---|---|---|---|---|---|
| 1 | √ | | | | 0.788 | 17.2 | 0.780 | 16.1 |
| 2 | √ | √ | | | 0.791 | 21.6 | 0.789 | 17.2 |
| 3 | √ | √ | √ | | 0.801 | 23.1 | 0.796 | 17.8 |
| 4 | √ | √ | √ | √ | 0.782 | 25.1 | 0.782 | 18.5 |
| Algorithms | AP (Person) | AP (Car) | mAP | F1 | Recall | Precision | Inference Time (ms) | Model Size (MB) |
|---|---|---|---|---|---|---|---|---|
| Faster R-CNN | 0.762 | 0.757 | 0.760 | 0.753 | 0.705 | 0.810 | 84.7 | 166.0 |
| Cascade R-CNN | 0.765 | 0.765 | 0.765 | 0.749 | 0.710 | 0.792 | 97.1 | 277.0 |
| YOLOv3 | 0.700 | 0.743 | 0.722 | 0.676 | 0.564 | 0.836 | 34.9 | 235.0 |
| YOLOv4 | 0.747 | 0.745 | 0.746 | 0.687 | 0.570 | 0.863 | 44.2 | 244.0 |
| YOLOv5 | 0.740 | 0.737 | 0.738 | 0.721 | 0.636 | 0.831 | 10.6 | 13.7 |
| YOLOX | 0.768 | 0.762 | 0.765 | 0.736 | 0.685 | 0.796 | 11.2 | 17.2 |
| Improved YOLOX (ours) | 0.794 | 0.798 | 0.796 | 0.758 | 0.723 | 0.797 | 17.8 | 15.6 |
| Improved YOLOX + Dataset Expansion | 0.820 | 0.815 | 0.817 | 0.767 | 0.742 | 0.793 | 17.8 | 15.6 |
| Improved YOLOX + Domain Transfer (ours) | 0.831 | 0.816 | 0.824 | 0.779 | 0.734 | 0.829 | 19.7 | 15.9 |
Yi, K.; Luo, K.; Chen, T.; Hu, R. An Improved YOLOX Model and Domain Transfer Strategy for Nighttime Pedestrian and Vehicle Detection. Appl. Sci. 2022, 12, 12476. https://doi.org/10.3390/app122312476