WEDM: Wavelet-Enhanced Diffusion with Multi-Stage Frequency Learning for Underwater Image Enhancement
Abstract
1. Introduction
- We propose WEDM, a novel UIE framework built on WTConv and diffusion models. The framework targets the frequency-domain characteristics of underwater images and the tendency of diffusion models to degrade high-frequency information, realizing both frequency-domain enhancement and diffusion adjustment. This significantly improves the model’s ability to restore underwater image details and textures.
- We introduce the Wavelet Color Compensation Module (WCCM), which uses discrete wavelet transform for frequency domain decomposition and enhancement fusion. This effectively compensates for the degradation in underwater images. In the WEDM, the image processed by the color compensation module serves as a strong conditional guide, driving the diffusion model’s denoising process to accurately restore high-frequency details and reduce recovery bias caused by degradation.
- We propose the WTConv Residual Diffusion Adjustment Module (WDM), which deeply explores the potential of diffusion models in frequency domain modeling. This significantly improves the restoration of image details and textures while enhancing the model’s robustness to noise and generalization capabilities.
- Experimental results show that the WEDM outperforms previous UIE methods. Extensive ablation experiments validate the effectiveness of our contributions.
2. Related Work
2.1. Traditional Underwater Image Enhancement Methods
2.2. Deep Learning Underwater Image Enhancement Methods
2.3. Diffusion Models
3. Methodology
3.1. Overall Framework
3.2. Wavelet Color Compensation Module
- LAB conversion: Convert the RGB image to the LAB color space, which allows for the separation of luminance (L) and chrominance (a, b) channels. This separation facilitates the independent adjustment of color components, where the luminance information is less affected by the scattering effects of water.
- Mask generation: Based on the luminance values, a binary mask is generated using a thresholding approach. Specifically, pixels with luminance greater than 0.847 are assigned a value of zero, and others are assigned one. This mask enables discriminative processing of regions with higher and lower luminance, ensuring that color compensation is applied appropriately based on the image’s luminance distribution.
- Wavelet-based correction: After decomposition of the image into wavelet subbands, color correction is applied to the low-frequency subbands of the a and b chrominance channels. This step adjusts for the color bias introduced by the underwater environment.
- Reconstruction: After color correction is applied to the low-frequency subbands, the image is reconstructed by merging all subbands using the inverse discrete wavelet transform (IDWT). The final image is then converted back to the RGB color space to obtain the color-corrected result.
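The steps above can be sketched in NumPy with a hand-rolled single-level Haar (db1) transform. The 0.847 luminance threshold and the low-frequency-only chrominance correction come from the text; the RGB↔LAB conversion is omitted and the compensation gain is a placeholder, not the module’s actual estimate:

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2-D Haar (db1) DWT; returns (LL, LH, HL, HH) subbands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return ((a + b + c + d) / 2, (a + b - c - d) / 2,
            (a - b + c - d) / 2, (a - b - c + d) / 2)

def haar_idwt2(LL, LH, HL, HH):
    """Inverse Haar transform; exact reconstruction for even-sized inputs."""
    x = np.empty((2 * LL.shape[0], 2 * LL.shape[1]), dtype=LL.dtype)
    x[0::2, 0::2] = (LL + LH + HL + HH) / 2
    x[0::2, 1::2] = (LL + LH - HL - HH) / 2
    x[1::2, 0::2] = (LL - LH + HL - HH) / 2
    x[1::2, 1::2] = (LL - LH - HL + HH) / 2
    return x

def luminance_mask(L, thresh=0.847):
    """Binary mask: 0 for pixels brighter than the threshold, 1 otherwise."""
    return (L <= thresh).astype(L.dtype)

def compensate_chroma(chroma, gain):
    """Correct only the low-frequency (LL) subband of a chrominance channel,
    then reconstruct via the inverse DWT. `gain` is an illustrative stand-in
    for the module's actual compensation."""
    LL, LH, HL, HH = haar_dwt2(chroma)
    return haar_idwt2(gain * LL, LH, HL, HH)
```

With `gain=1.0` the round trip reconstructs the channel exactly, which is a quick sanity check that the subband split/merge is lossless.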
3.3. Wavelet Diffusion Module
- Forward Diffusion: The forward diffusion process is modeled as a Markov chain in which noise is gradually added to the image. A single forward noise-addition step can be written as $I_t = I_{t-1} + \alpha_t I_{res} + \beta_t \epsilon_t$, where $\epsilon_t \sim \mathcal{N}(0, \mathbf{I})$. Here, $I_t$ represents the image at time step $t$, and $I_{res} = I_{in} - I_0$ is the residual between the degraded image $I_{in}$ and the clean image $I_0$. The parameters $\alpha_t$ and $\beta_t$ control the influence of the residual and the noise, respectively.
- Wavelet Convolution Parameter Setting and Sensitivity Analysis: The WTConv module leverages wavelet transform to effectively avoid the frequency aliasing problem inherent in conventional convolution, while achieving a large receptive field without significantly increasing computational cost. We adopt the db1 wavelet (Haar wavelet), which provides clear frequency localization and minimal computational complexity in image processing tasks. In addition, the wavelet convolution kernel size is set to 5 × 5. Experimental results show that, compared with smaller kernels, the 5 × 5 kernel is more effective in capturing long-range dependencies among image features, thereby better preserving and restoring high-frequency details. Following the experimental study of Finder et al. [38], we set the wavelet decomposition level to 2, which yields superior multi-scale feature representations.
- Reverse Denoising: The reverse diffusion employs an L1 loss for residual prediction to preserve high-frequency details. The reverse denoising process inverts the forward step using the network’s predictions: $I_{t-1} = I_t - \alpha_t I_{res}^{\theta}(I_t, t) - \beta_t \epsilon_{\theta}(I_t, t)$.
- Feature Reconstruction: After the WTConv operations, the processed high-frequency subbands are scaled to control their influence. The feature map is then reconstructed by merging the high-frequency and low-frequency components through the inverse wavelet transform (IWT): $F_{out} = \mathrm{IWT}(F_{LL}, \lambda F_{LH}, \lambda F_{HL}, \lambda F_{HH})$, where $\lambda$ is the scaling factor applied to the high-frequency subbands.
- Training Process and Loss Function: During the reverse diffusion process, the model predicts both the residual $I_{res}$ and the noise $\epsilon$. The training objective is to minimize the Kullback–Leibler (KL) divergence between the true posterior and the predicted posterior, which simplifies to the following L1 loss: $\mathcal{L} = \|I_{res} - I_{res}^{\theta}(I_t, t)\|_1 + \|\epsilon - \epsilon_{\theta}(I_t, t)\|_1$. This simplified loss function ensures that the model learns to predict the residuals effectively, leading to better performance in underwater image enhancement tasks.
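The residual forward step and the simplified L1 objective described above can be sketched as follows; the $\alpha_t$, $\beta_t$ values and array shapes are illustrative placeholders, not the trained schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_step(x_prev, residual, alpha_t, beta_t):
    """One forward diffusion step: inject a fraction of the degradation
    residual plus Gaussian noise (returned so a loss can target it)."""
    eps = rng.standard_normal(x_prev.shape)
    return x_prev + alpha_t * residual + beta_t * eps, eps

def l1_objective(res_true, res_pred, eps_true, eps_pred):
    """Simplified L1 loss over the predicted residual and predicted noise."""
    return (np.abs(res_true - res_pred).mean()
            + np.abs(eps_true - eps_pred).mean())
```

A perfect predictor drives the objective to zero, and subtracting the noise term recovers the residual contribution, mirroring how the reverse process inverts the forward step.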
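Similarly, the feature-reconstruction step (scale the high-frequency subbands, then merge via the IWT) can be sketched with a NumPy Haar transform standing in for WTConv’s db1 wavelet; the scale factor is a placeholder and the learned 5 × 5 convolutions on the subbands are omitted:

```python
import numpy as np

def haar_dwt2(x):
    # One-level Haar (db1) analysis into low- (LL) and high-frequency subbands.
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return ((a + b + c + d) / 2, (a + b - c - d) / 2,
            (a - b + c - d) / 2, (a - b - c + d) / 2)

def haar_idwt2(LL, LH, HL, HH):
    # Inverse wavelet transform (IWT); exact for even-sized feature maps.
    x = np.empty((2 * LL.shape[0], 2 * LL.shape[1]), dtype=LL.dtype)
    x[0::2, 0::2] = (LL + LH + HL + HH) / 2
    x[0::2, 1::2] = (LL + LH - HL - HH) / 2
    x[1::2, 0::2] = (LL - LH + HL - HH) / 2
    x[1::2, 1::2] = (LL - LH - HL + HH) / 2
    return x

def reconstruct_features(feat, hf_scale=0.8):
    # Scale only the high-frequency subbands, then merge with LL via the IWT.
    LL, LH, HL, HH = haar_dwt2(feat)
    return haar_idwt2(LL, hf_scale * LH, hf_scale * HL, hf_scale * HH)
```

With `hf_scale=1.0` the map is reconstructed exactly; smaller values attenuate detail while leaving the low-frequency content (and hence the overall mean) intact.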
4. Experiments
4.1. Setup
4.2. Results and Comparisons
4.2.1. Qualitative Comparison
4.2.2. Visual Evaluation of Enhancement Results
4.3. Applicability Analysis
Ablation Tests
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Zhou, W.; Zheng, F.; Yin, G.; Pang, Y.; Yi, J. YOLOTrashCan: A deep learning marine debris detection network. IEEE Trans. Instrum. Meas. 2023, 72, 1–12.
2. Qi, Q.; Zhang, Y.; Tian, F.; Wu, Q.J.; Li, K.; Luan, X.; Song, D. Underwater image co-enhancement with correlation feature matching and joint learning. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 1133–1147.
3. Sun, S.; Guo, H.; Wan, G.; Dong, C.; Zheng, C.; Wang, Y. High precision underwater acoustic localization of the black box utilizing an autonomous underwater vehicle based on the improved artificial potential field. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–10.
4. Li, C.-Y.; Guo, J.-C.; Cong, R.-M.; Pang, Y.-W.; Wang, B. Underwater image enhancement by dehazing with minimum information loss and histogram distribution prior. IEEE Trans. Image Process. 2016, 25, 5664–5677.
5. Cao, X.; Ren, L.; Sun, C. Dynamic target tracking control of autonomous underwater vehicle based on trajectory prediction. IEEE Trans. Cybern. 2023, 53, 1968–1981.
6. Guan, M.; Xu, H.; Jiang, G.; Yu, M.; Chen, Y.; Luo, T.; Zhang, X. DiffWater: Underwater image enhancement based on conditional denoising diffusion probabilistic model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 17, 2319–2335.
7. Hummel, R. Image enhancement by histogram transformation. Comput. Graph. Image Process. 1975, 6, 184–195.
8. Liang, Z.; Ding, X.; Wang, Y.; Yan, X.; Fu, X. GUDCP: Generalization of underwater dark channel prior for underwater image restoration. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 4879–4884.
9. Luan, X.; Fan, H.; Wang, Q.; Yang, N.; Liu, S.; Li, X.; Tang, Y. FMambaIR: A Hybrid State Space Model and Frequency Domain for Image Restoration. IEEE Trans. Geosci. Remote Sens. 2025, 63, 1234–1245.
10. Sarkar, P.; De, S.; Gurung, S.; Dey, P. UICE-MIRNet guided image enhancement for underwater object detection. Sci. Rep. 2024, 14, 22448.
11. Fabbri, C.; Islam, M.J.; Sattar, J. Enhancing underwater imagery using generative adversarial networks. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 7159–7165.
12. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851.
13. Liu, J.; Wang, Q.; Fan, H.; Wang, Y.; Tang, Y.; Qu, L. Residual denoising diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 2773–2783.
14. Shi, X.; Wang, Y.G. CPDM: Content-preserving diffusion model for underwater image enhancement. Sci. Rep. 2024, 14, 31309.
15. Peng, L.; Zhu, C.; Bian, L. U-Shape Transformer for underwater image enhancement. IEEE Trans. Image Process. 2023, 32, 3066–3079.
16. Li, C.; Anwar, S.; Hou, J.; Cong, R.; Guo, C.; Ren, W. Underwater image enhancement via medium transmission-guided multi-color space embedding. IEEE Trans. Image Process. 2021, 30, 4985–5000.
17. Li, H.; Li, J.; Wang, W. A fusion adversarial underwater image enhancement network with a public test dataset. arXiv 2019, arXiv:1906.06819.
18. Peng, Y.-T.; Cao, K.; Cosman, P.C. Generalization of the dark channel prior for single image restoration. IEEE Trans. Image Process. 2018, 27, 2856–2868.
19. Liang, Z.; Zhang, W.; Ruan, R.; Zhuang, P.; Li, C. GIFM: An image restoration method with generalized image formation model for poor visible conditions. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4110616.
20. Zhang, W.; Dong, L.; Xu, W. Retinex-inspired color correction and detail preserved fusion for underwater image enhancement. Comput. Electron. Agric. 2022, 192, 106585.
21. Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Garcia, R. Locally adaptive color correction for underwater image dehazing and matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 997–1005.
22. Fu, X.; Cao, X. Underwater image enhancement with global–local networks and compressed-histogram equalization. Signal Process. Image Commun. 2020, 86, 115892.
23. Zhuang, P.; Wu, J.; Porikli, F.; Li, C. Underwater image enhancement with hyper-Laplacian reflectance priors. IEEE Trans. Image Process. 2022, 31, 5442–5455.
24. Li, C.; Anwar, S.; Porikli, F. Underwater scene prior inspired deep underwater image and video enhancement. Pattern Recognit. 2020, 98, 107038–107049.
25. Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An underwater image enhancement benchmark dataset and beyond. IEEE Trans. Image Process. 2020, 29, 4376–4389.
26. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2242–2251.
27. Li, J.; Skinner, K.A.; Eustice, R.M.; Johnson-Roberson, M. WaterGAN: Unsupervised generative network to enable real-time color correction of monocular underwater images. IEEE Robot. Autom. Lett. 2018, 3, 387–394.
28. Peng, Y.T.; Cosman, P.C. Underwater image restoration based on image blurriness and light absorption. IEEE Trans. Image Process. 2017, 26, 1579–1594.
29. Islam, M.J.; Xia, Y.; Sattar, J. Fast underwater image enhancement for improved visual perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234.
30. Yang, J.; Li, C.; Li, X. Underwater image restoration with light-aware progressive network. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
31. Huang, S.; Wang, K.; Liu, H.; Chen, J.; Li, Y. Contrastive semi-supervised learning for underwater image restoration via reliable bank. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18145–18155.
32. Wang, Q.; Li, B.; Li, N.; Xie, J.; Wang, X.; Wang, X.; Chen, Y. Domain adaptive multi-frequency underwater image enhancement network. J. Electron. Imaging 2024, 33, 053035.
33. Saharia, C.; Chan, W.; Chang, H.; Lee, C.; Ho, J.; Salimans, T.; Fleet, D.; Norouzi, M. Palette: Image-to-image diffusion models. In Proceedings of the ACM SIGGRAPH 2022 Conference Proceedings, Vancouver, BC, Canada, 7–11 August 2022; pp. 1–10.
34. Tang, Y.; Kawasaki, H.; Iwaguchi, T. Underwater Image Enhancement by Transformer-Based Diffusion Model with Non-Uniform Sampling for Skip Strategy. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 5419–5427.
35. Zhao, C.; Cai, W.; Dong, C.; Hu, C. Wavelet-based Fourier information interaction with frequency diffusion adjustment for underwater image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 8281–8291.
36. Phung, H.; Dao, Q.; Tran, A. Wavelet Diffusion Models Are Fast and Scalable Image Generators. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 10199–10208.
37. Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Sbert, M. Color Channel Compensation (3C): A Fundamental Pre-Processing Step for Image Enhancement. IEEE Trans. Image Process. 2019, 29, 2653–2665.
38. Finder, S.E.; Amoyal, R.; Treister, E.; Freifeld, O. Wavelet Convolutions for Large Receptive Fields. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 22–26 September 2024; Springer Nature: Cham, Switzerland, 2024; pp. 363–380.
39. Drews, P., Jr.; do Nascimento, E.; Moraes, F.; Botelho, S.; Campos, M. Transmission estimation in underwater single images. In Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops, Sydney, Australia, 2–8 December 2013; pp. 825–830.
40. Zhang, W.; Zhou, L.; Zhuang, P.; Li, G.; Pan, X.; Zhao, W.; Li, C. Underwater image enhancement via weighted wavelet visual perception fusion. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 2469–2483.
41. Fu, Z.; Wang, W.; Huang, Y.; Ding, X.; Ma, K.K. Uncertainty inspired underwater image enhancement. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2022; pp. 465–482.
42. Zhang, S.; Zhao, S.; An, D.; Li, D.; Zhao, R. LiteEnhanceNet: A lightweight network for real-time single underwater image enhancement. Expert Syst. Appl. 2024, 240, 122546.
43. Naik, A.; Swarnakar, A.; Mittal, K. Shallow-UWnet: Compressed model for underwater image enhancement (student abstract). Proc. AAAI Conf. Artif. Intell. 2021, 35, 15853–15854.
44. Korhonen, J.; You, J. Peak signal-to-noise ratio revisited: Is simple beautiful? In Proceedings of the 2012 Fourth International Workshop on Quality of Multimedia Experience, Melbourne, VIC, Australia, 5–7 July 2012; pp. 37–38.
45. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
46. Yang, M.; Sowmya, A. An underwater color image quality evaluation metric. IEEE Trans. Image Process. 2015, 24, 6062–6071.
47. Panetta, K.; Gao, C.; Agaian, S. Human-visual-system-inspired underwater image quality measures. IEEE J. Ocean. Eng. 2016, 41, 541–551.
| Method | TEST-L400 PSNR | TEST-L400 SSIM | UIEB PSNR | UIEB SSIM |
|---|---|---|---|---|
| UDCP | 14.53 | 0.656 | 12.64 | 0.617 |
| WWPF | 18.29 | 0.759 | 19.04 | 0.823 |
| PUIE-Net | 28.53 | 0.917 | 22.47 | 0.883 |
| LENet | 26.64 | 0.929 | 23.37 | 0.891 |
| Shallow | 23.26 | 0.878 | 19.45 | 0.754 |
| FUnIE | 23.11 | 0.823 | 20.16 | 0.819 |
| DM-water | 29.95 | 0.946 | 23.19 | 0.893 |
| WEDM | 35.44 | 0.961 | 24.23 | 0.910 |
| Method | Challenge60 UIQM | Challenge60 UCIQE | U45 UIQM | U45 UCIQE | Enhancement Time |
|---|---|---|---|---|---|
| UDCP | 1.36 | 0.55 | 2.30 | 0.59 | 1.120 s |
| WWPF | 2.34 | 0.58 | 2.80 | 0.60 | 0.228 s |
| PUIE-Net | 2.53 | 0.56 | 3.15 | 0.57 | 0.015 s |
| LENet | 2.58 | 0.57 | 3.07 | 0.59 | 0.010 s |
| Shallow | 2.30 | 0.50 | 2.89 | 0.52 | 0.050 s |
| FUnIE | 2.37 | 0.54 | 3.22 | 0.58 | 0.057 s |
| DM-water | 2.56 | 0.58 | 2.96 | 0.60 | 0.862 s |
| WEDM | 2.70 | 0.59 | 3.28 | 0.62 | 0.110 s |
| Baseline | PSNR | SSIM |
|---|---|---|
| Base | 29.12 | 0.865 |
| w/o WCCM | 32.80 | 0.947 |
| w/o WDM | 31.54 | 0.958 |
| Full model | 35.44 | 0.963 |
Share and Cite
Chen, J.; Ye, S.; Ouyang, X.; Zhuang, J. WEDM: Wavelet-Enhanced Diffusion with Multi-Stage Frequency Learning for Underwater Image Enhancement. J. Imaging 2025, 11, 114. https://doi.org/10.3390/jimaging11040114