Underwater Image Translation via Multi-Scale Generative Adversarial Network
Abstract
1. Introduction
- In response to the challenge of data scarcity, we construct the backbone of SID-AM-MSITM on a fundamental image translator, TuiGAN [23]. TuiGAN performs image translation with only two unpaired images, and we make further improvements to its encoders and decoders.
- In response to the challenge of insufficient feature-extraction ability, we apply Convolutional Block Attention Modules (CBAM) [24] to the generators and discriminators of SID-AM-MSITM. CBAM reweights feature maps along the channel and spatial dimensions, increasing the weight of important features so that SID-AM-MSITM attends to meaningful information.
- In response to the loss of content details, we further improve SID-AM-MSITM by constructing style-independent discriminators. These discriminators produce similar outputs when discriminating images that share content but differ in style, so that SID-AM-MSITM focuses on content information rather than style information.
- We conduct systematic experiments on multiple datasets, including submarine, underwater optical, sunken-ship, crashed-plane, and underwater sonar images. Compared with multiple baseline models, SID-AM-MSITM better extracts effective information and retains content details.
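To make the CBAM mechanism above concrete, the following is a minimal NumPy sketch of a CBAM-style block (channel attention followed by spatial attention). This is an illustrative sketch rather than the paper's implementation: in the actual model the attention layers are learned end-to-end inside a CNN, and the weights `w1`, `w2`, and `kernel` here are placeholder arrays.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fmap, w1, w2):
    """fmap: (C, H, W). A shared two-layer MLP (w1: (r, C), w2: (C, r))
    is applied to global average- and max-pooled descriptors; their sum,
    squashed by a sigmoid, reweights the channels."""
    avg = fmap.mean(axis=(1, 2))                       # (C,)
    mx = fmap.max(axis=(1, 2))                         # (C,)
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)
                  + w2 @ np.maximum(w1 @ mx, 0.0))     # (C,)
    return fmap * att[:, None, None]

def spatial_attention(fmap, kernel):
    """Average- and max-pool across channels, stack the two maps, and
    apply a naive 'same' cross-correlation with a (2, k, k) kernel;
    the sigmoid of the result is a per-pixel mask."""
    stacked = np.stack([fmap.mean(axis=0), fmap.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    H, W = stacked.shape[1:]
    mask = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            mask[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return fmap * sigmoid(mask)[None]

def cbam(fmap, w1, w2, kernel):
    # CBAM order: channel attention first, then spatial attention
    return spatial_attention(channel_attention(fmap, w1, w2), kernel)
```

Because both attention maps pass through a sigmoid, every activation is scaled by factors in (0, 1), which lets the network emphasize informative channels and regions relative to the rest of the feature map.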
2. Methodology
2.1. Generators and Discriminators with CBAM Modules
2.2. Style-Independent Discriminators
2.2.1. Instance-Level Style-Independent Discriminators
2.2.2. Vector-Level Style-Independent Discriminators
2.3. Implementation
3. Experiment and Result Analysis
3.1. Evaluation Metric
- (1) PSNR: PSNR measures the pixel-level difference between two images. We use PSNR to compare source-domain images with their reconstructions; a larger PSNR value indicates a smaller difference between the two images.
- (2) SSIM: SSIM measures the structural similarity of two images. Its value lies between 0 and 1, and a larger SSIM indicates a better reconstruction, which reflects the translation quality of an image translation model.
- (3) Entropy: Information entropy measures the complexity of an image; larger entropy indicates a more complex image that contains more information.
- (4) SIFID: The Single Image Fréchet Inception Distance (SIFID) is a variant of the Fréchet Inception Distance (FID) [37]. It measures the deviation between the feature distributions of two single images; a smaller SIFID indicates better generated images.
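For reference, the first three metrics can be computed directly with NumPy. The SSIM below is a simplified single-window version (the standard metric averages local SSIM over sliding windows), so treat this as an illustrative sketch rather than the exact evaluation code:

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means a and b are closer."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def global_ssim(a, b, peak=255.0):
    """Single-window SSIM over the whole image (simplified)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # standard stabilizers
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def entropy(img, bins=256):
    """Shannon entropy (bits) of the grey-level histogram of an 8-bit image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())
```

For example, an 8-bit image that uses all 256 grey levels equally often reaches the maximum entropy of 8 bits, while identical images give an SSIM of 1 and an infinite PSNR.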
3.2. Ablation Experiment
3.3. Comparative Experiment
- (1) CycleGAN: CycleGAN is one of the most typical translation models using cycle consistency. The model assumes a potential correspondence between source-domain and target-domain images.
- (2) FUNIT: FUNIT is an unsupervised few-shot image translation model that achieves satisfactory performance with limited data.
- (3) AdaIN: AdaIN is an image translation model that achieves real-time, arbitrary style transfer.
- (4) SinDiffusion: SinDiffusion is a diffusion model trained on a single natural image.
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
SID-AM-MSITM | multi-scale image translation model based on style-independent discriminators and attention modules |
CBAM | convolutional block attention module |
GAN | generative adversarial network |
CNN | convolutional neural network |
LeakyReLU | leaky rectified linear unit |
CycleGAN | cycle-consistent adversarial network |
TV loss | total variation loss |
PSNR | peak signal-to-noise ratio |
SSIM | structural similarity index measure |
Entropy | information entropy |
AdaIN | adaptive instance normalization |
FID | Fréchet inception distance |
SIFID | single-image Fréchet inception distance |
References
- Zhao, Y.; Zhu, K.; Zhao, T.; Zheng, L.; Deng, X. Small-Sample Seabed Sediment Classification Based on Deep Learning. Remote Sens. 2023, 15, 2178.
- Chen, B.; Li, R.; Bai, W.; Zhang, X.; Li, J.; Guo, R. Research on recognition method of optical detection image of underwater robot for submarine cable. In Proceedings of the 2019 IEEE 3rd Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Chongqing, China, 11–13 October 2019; pp. 1973–1976.
- Teng, B.; Zhao, H. Underwater target recognition methods based on the framework of deep learning: A survey. Int. J. Adv. Robot. Syst. 2020, 17, 1729881420976307.
- Cruz, L.; Lucio, D.; Velho, L. Kinect and RGBD images: Challenges and applications. In Proceedings of the 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images Tutorials, Ouro Preto, Brazil, 22–25 August 2012; pp. 36–49.
- Yang, L.; Wang, B.; Zhang, R.; Zhou, H.; Wang, R. Analysis on location accuracy for the binocular stereo vision system. IEEE Photonics J. 2017, 10, 1–16.
- Lin, E. Comparative Analysis of Pix2Pix and CycleGAN for Image-to-Image Translation. Highlights Sci. Eng. Technol. 2023, 39, 915–925.
- Multi-view underwater image enhancement method via embedded fusion mechanism. Eng. Appl. Artif. Intell. 2023, 121, 105946.
- Zhou, J.; Liu, Q.; Jiang, Q.; Ren, W.; Lam, K.M.; Zhang, W. Underwater camera: Improving visual perception via adaptive dark pixel prior and color correction. Int. J. Comput. Vis. 2023.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 27.
- Hertzmann, A.; Jacobs, C.E.; Oliver, N.; Curless, B.; Salesin, D.H. Image analogies. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, Los Angeles, CA, USA, 12–17 August 2001; pp. 327–340.
- Rosales, R.; Achan, K.; Frey, B. Unsupervised image translation. In Proceedings of the 9th IEEE International Conference on Computer Vision, Nice, France, 14–17 October 2003; pp. 472–478.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423.
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134.
- Wang, T.C.; Liu, M.Y.; Zhu, J.Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8798–8807.
- Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410.
- Choi, Y.; Choi, M.; Kim, M.; Ha, J.W.; Kim, S.; Choo, J. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8789–8797.
- Li, J.; Skinner, K.A.; Eustice, R.M.; Johnson-Roberson, M. WaterGAN: Unsupervised generative network to enable real-time color correction of monocular underwater images. IEEE Robot. Autom. Lett. 2017, 3, 387–394.
- Wang, N.; Zhou, Y.; Han, F.; Zhu, H.; Yao, J. UWGAN: Underwater GAN for real-world underwater color restoration and dehazing. arXiv 2019, arXiv:1912.10269.
- Li, N.; Zheng, Z.; Zhang, S.; Yu, Z.; Zheng, H.; Zheng, B. The synthesis of unpaired underwater images using a multistyle generative adversarial network. IEEE Access 2018, 6, 54241–54257.
- Zhou, J.; Li, B.; Zhang, D.; Yuan, J.; Zhang, W.; Cai, Z.; Shi, J. UGIF-Net: An Efficient Fully Guided Information Flow Network for Underwater Image Enhancement. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–17.
- Geirhos, R.; Rubisch, P.; Michaelis, C.; Bethge, M.; Wichmann, F.A.; Brendel, W. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv 2018, arXiv:1811.12231.
- Lin, J.; Pang, Y.; Xia, Y.; Chen, Z.; Luo, J. TuiGAN: Learning versatile image-to-image translation with two unpaired images. In Proceedings of the 16th European Conference on Computer Vision (ECCV 2020), Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; Proceedings, Part IV; pp. 18–35.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved training of Wasserstein GANs. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Mahendran, A.; Vedaldi, A. Understanding deep image representations by inverting them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5188–5196.
- You, Q.; Wan, C.; Sun, J.; Shen, J.; Ye, H.; Yu, Q. Fundus image enhancement method based on CycleGAN. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; pp. 4500–4503.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1501–1510.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Xu, J.; Li, Z.; Du, B.; Zhang, M.; Liu, J. Reluplex made more practical: Leaky ReLU. In Proceedings of the 2020 IEEE Symposium on Computers and Communications (ISCC), Rennes, France, 7–10 July 2020; pp. 1–7.
- Najafipour, A.; Babaee, A.; Shahrtash, S.M. Comparing the trustworthiness of signal-to-noise ratio and peak signal-to-noise ratio in processing noisy partial discharge signals. IET Sci. Meas. Technol. 2013, 7, 112–118.
- Khadtare, M.S. GPU based image quality assessment using structural similarity (SSIM) index. In Emerging Research Surrounding Power Consumption and Performance Issues in Utility Computing; IGI Global: Hershey, PA, USA, 2016; pp. 276–282.
- Xu, N.; Zhuang, J.; Xiao, J.; Peng, C. Regional Differential Information Entropy for Super-Resolution Image Quality Assessment. arXiv 2021, arXiv:2107.03642.
- Shaham, T.R.; Dekel, T.; Michaeli, T. SinGAN: Learning a generative model from a single natural image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4570–4580.
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Zhang, J.; Zhang, J.; Zhou, K.; Zhang, Y.; Chen, H.; Yan, X. An Improved YOLOv5-Based Underwater Object-Detection Framework. Sensors 2023, 23, 3693.
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
- Liu, M.Y.; Huang, X.; Mallya, A.; Karras, T.; Aila, T.; Lehtinen, J.; Kautz, J. Few-shot unsupervised image-to-image translation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 10551–10560.
- Wang, W.; Bao, J.; Zhou, W.; Chen, D.; Chen, D.; Yuan, L.; Li, H. SinDiffusion: Learning a diffusion model from a single natural image. arXiv 2022, arXiv:2211.12445.
- Zhou, J.; Pang, L.; Zhang, D.; Zhang, W. Underwater Image Enhancement Method via Multi-Interval Subhistogram Perspective Equalization. IEEE J. Ocean. Eng. 2023, 48, 474–488.
Ablation results, PSNR (dB; higher is better):
Category | TuiGAN | TuiGAN with CBAM Modules | TuiGAN with Style-Independent Discriminators | SID-AM-MSITM |
---|---|---|---|---|
Sunken Ship + Sonar | 17.20 | 20.28 | 19.53 | 22.77 |
Sunken Ship + Optics | 21.24 | 18.91 | 19.98 | 22.87 |
Crashed plane + Sonar | 20.02 | 23.00 | 20.50 | 24.26 |
Crashed plane + Optics | 26.46 | 25.04 | 29.17 | 26.46 |
Submarine + Sonar | 27.27 | 25.88 | 31.11 | 31.85 |
Submarine + Optics | 23.20 | 31.83 | 27.90 | 25.96 |
Ablation results, SSIM (higher is better):
Category | TuiGAN | TuiGAN with CBAM Modules | TuiGAN with Style-Independent Discriminators | SID-AM-MSITM |
---|---|---|---|---|
Sunken Ship + Sonar | 0.68 | 0.82 | 0.75 | 0.83 |
Sunken Ship + Optics | 0.85 | 0.88 | 0.87 | 0.90 |
Crashed plane + Sonar | 0.83 | 0.91 | 0.83 | 0.90 |
Crashed plane + Optics | 0.91 | 0.89 | 0.92 | 0.92 |
Submarine + Sonar | 0.85 | 0.84 | 0.88 | 0.89 |
Submarine + Optics | 0.70 | 0.87 | 0.79 | 0.84 |
Ablation results, information entropy (bits):
Category | TuiGAN | TuiGAN with CBAM Modules | TuiGAN with Style-Independent Discriminators | SID-AM-MSITM |
---|---|---|---|---|
Sunken Ship + Sonar | 6.75 | 6.70 | 7.12 | 7.00 |
Sunken Ship + Optics | 6.00 | 6.21 | 6.86 | 6.78 |
Crashed plane + Sonar | 7.31 | 6.53 | 7.23 | 7.38 |
Crashed plane + Optics | 5.57 | 5.53 | 5.70 | 5.63 |
Submarine + Sonar | 6.47 | 5.54 | 6.96 | 6.59 |
Submarine + Optics | 6.32 | 6.29 | 5.44 | 6.33 |
Comparative results (lower is better):
Our Model | CycleGAN | FUNIT | AdaIN | TuiGAN | SinDiffusion |
---|---|---|---|---|---|
0.092 | 0.130 | 17.5 | 9.51 | 0.101 | 1.21 |
0.054 | 0.080 | 17.6 | 9.41 | 0.072 | 1.23 |
0.050 | 0.108 | 17.6 | 9.36 | 0.052 | 1.30 |
0.015 | 0.035 | 17.5 | 9.35 | 0.017 | 1.20 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yang, D.; Zhang, T.; Li, B.; Li, M.; Chen, W.; Li, X.; Wang, X. Underwater Image Translation via Multi-Scale Generative Adversarial Network. J. Mar. Sci. Eng. 2023, 11, 1929. https://doi.org/10.3390/jmse11101929