Adaptive Enhanced Detection Network for Low Illumination Object Detection
Abstract
1. Introduction
- (1) We propose an adaptive enhanced detection network structure for low illumination object detection, which effectively integrates dual discriminators, encoder–decoders, and attention mechanisms.
- (2) We designed the light conversion stage as the first stage of the global model. Trained in a standardized way on paired image data, it adaptively enhances low illumination images and prevents inputs with spatially varying lighting conditions from becoming overexposed or underexposed after enhancement (see the first sketch following this list).
- (3) We designed the risk factor detection stage as the second stage of the global model. This part is mainly based on the Transformer, adopts a lightweight encoder–decoder structure, and can effectively process the adaptively enhanced images output by the first stage. On this basis, small target detection and the overall detection pipeline are optimized to improve the automatic detection of risk factors (see the second sketch following this list).
- (4) Through comparative experiments, we demonstrated the adaptive conversion effect of the first-stage model on images that are dark in whole or in part, and through ablation experiments we showed that first-stage adaptive conversion improves the detection performance of the second stage. Quantitative experiments further showed that the second-stage detector is generally superior to mainstream CNN (convolutional neural network) detectors and that the first stage overcomes the impact of whole- or partial-region low illumination on detection. The global network can therefore be used for low illumination object detection and achieves good detection performance.
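To make contribution (2) more concrete, the following is a minimal PyTorch sketch of how a light conversion stage of this kind might be organized: an encoder–decoder generator with a channel-attention block, plus two PatchGAN-style discriminators, one judging the whole enhanced image and one judging local crops so that regions with different lighting conditions are assessed separately. All module names, layer sizes, and the attention design are illustrative assumptions, not the configuration used in this paper.

```python
# A minimal sketch, assuming a small encoder-decoder generator with channel
# attention and two PatchGAN-style discriminators; names and sizes are
# illustrative, not the paper's configuration.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (an assumed design)."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)


class Generator(nn.Module):
    """Encoder-decoder generator mapping a low-light image to an enhanced one."""

    def __init__(self, base=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, base, 3, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        )
        self.attention = ChannelAttention(base * 2)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.attention(self.encoder(x)))


def patch_discriminator(base=32):
    """PatchGAN-style discriminator; instantiated twice for the dual-discriminator idea."""
    return nn.Sequential(
        nn.Conv2d(3, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(base, base * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),  # per-patch real/fake scores
    )


if __name__ == "__main__":
    g = Generator()
    d_global, d_local = patch_discriminator(), patch_discriminator()
    low_light = torch.rand(1, 3, 256, 256)   # stand-in for one image of a paired sample
    enhanced = g(low_light)
    # The global discriminator sees the whole enhanced image; the local one can be
    # fed crops so that differently lit regions are judged separately.
    crop = enhanced[:, :, :128, :128]
    print(enhanced.shape, d_global(enhanced).shape, d_local(crop).shape)
```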
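Likewise for contribution (3), the sketch below outlines a lightweight, DETR-like detection stage: a small CNN backbone extracts features from the enhanced image, a transformer encoder–decoder attends over the flattened feature tokens with a fixed set of object queries, and per-query heads predict classes and boxes. The backbone depth, model width, query count, and class count are assumptions made only for illustration; the predictive matching phase (e.g., Hungarian matching of predictions to ground-truth objects, as in DETR) and the detection losses are omitted.

```python
# A minimal sketch of a lightweight, DETR-like detection stage; widths, query
# count, and class count are assumptions, and matching/losses are omitted.
import torch
import torch.nn as nn


class TinyDetector(nn.Module):
    def __init__(self, d_model=128, num_queries=50, num_classes=5):
        super().__init__()
        # Small CNN backbone: enhanced image -> 16x16 feature map (for 256x256 inputs).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=4, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, d_model, 3, stride=4, padding=1), nn.ReLU(inplace=True),
        )
        self.pos_embed = nn.Parameter(torch.randn(1, 16 * 16, d_model) * 0.02)  # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            dim_feedforward=256, batch_first=True,
        )
        self.queries = nn.Parameter(torch.randn(1, num_queries, d_model) * 0.02)  # object queries
        self.class_head = nn.Linear(d_model, num_classes + 1)  # +1 for "no object"
        self.box_head = nn.Linear(d_model, 4)                  # (cx, cy, w, h), normalized

    def forward(self, images):
        feats = self.backbone(images)                # (B, C, H, W)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)    # (B, H*W, C) encoder tokens
        tokens = tokens + self.pos_embed[:, : h * w]
        queries = self.queries.expand(b, -1, -1)
        decoded = self.transformer(tokens, queries)  # (B, num_queries, C)
        return self.class_head(decoded), self.box_head(decoded).sigmoid()


if __name__ == "__main__":
    model = TinyDetector()
    enhanced = torch.rand(2, 3, 256, 256)   # output of the first stage
    logits, boxes = model(enhanced)
    print(logits.shape, boxes.shape)         # (2, 50, 6) and (2, 50, 4)
```

In a full pipeline, the enhanced image produced by the first-stage generator would be fed to this detector, and a set-based loss (classification plus box regression, e.g., GIoU) would supervise the matched predictions.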
2. Related Works
2.1. Generative Adversarial Networks
2.2. Transformer
3. Methods
3.1. Overall Framework Description
3.2. Light Conversion
3.2.1. Generation Phase
3.2.2. Discrimination Phase
3.2.3. Conversion Loss
3.3. Risk Factor Detection
3.3.1. Feature Extraction Phase
3.3.2. Encoding–Decoding Phase
3.3.3. Predictive Matching Phase
3.3.4. Detection Loss
4. Results
4.1. Data and Experimental Settings
4.2. Conversion Effect Experiment (E1)
4.3. Ablation Detection Experiment (E2)
4.4. Risk Factor Confusion Quantification Experiment (E3)
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Method | Ours (I) | Ours (II) | mTCr | mFCr |
|---|---|---|---|---|
| Ours | √ | √ | 0.85 | 0.15 |
| Ours | | √ | 0.55 | 0.45 |
| CycleGAN | | √ | 0.67 | 0.33 |
| Fast RCNN | | | 0.49 | 0.51 |
| Deformable-DETR | | | 0.53 | 0.47 |