Vision Transformers in Image Restoration: A Survey
Abstract
1. Introduction
1.1. Techniques Used in Image Restoration
1.2. Related Surveys
1.3. Main Contributions and Organization of the Survey
- The study lists the most important ViT-based architectures introduced in the image restoration domain, classified into seven subtasks: Image Super-Resolution, Image Denoising, General Image Enhancement, JPEG Compression Artifact Reduction, Image Deblurring, Removing Adverse Weather Conditions, and Image Dehazing.
- It describes the impact of using ViT in these image restoration tasks, along with its advantages and drawbacks.
- It compares ViT-based architectures on the main benchmarks and datasets used in each image restoration task, using quality metrics such as the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Metric (SSIM); a minimal computation sketch of both metrics follows this list.
- It discusses the most critical challenges facing ViT in image restoration and presents possible solutions and future work.
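To make the two headline metrics concrete, below is a minimal Python sketch of how PSNR and a single-window SSIM can be computed between a restored image and its ground truth. It is illustrative only: the function and variable names are ours, the 8-bit peak value of 255 is an assumption, and the numbers reported throughout this survey come from each paper's own evaluation protocol (often computed on the Y channel only).

```python
import numpy as np

def psnr(ref, out, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(ref, out, peak=255.0, k1=0.01, k2=0.03):
    """SSIM over one global window. Published numbers use the sliding-window
    variant (11x11 Gaussian window, local scores averaged over the image)."""
    x = ref.astype(np.float64)
    y = out.astype(np.float64)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

In practice, most of the surveyed papers rely on library implementations such as skimage.metrics.peak_signal_noise_ratio and skimage.metrics.structural_similarity, which apply the sliding-window SSIM; the global version above only illustrates the formula.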
2. Vision Transformer Model
3. Evaluation Metrics
4. Image Restoration Tasks
4.1. Image Super-Resolution
4.2. Image Denoising
4.3. Image Deblurring
4.4. Image Dehazing
4.5. Image JPEG Compression Artifact Reduction
4.6. Removing Adverse Weather Conditions
4.7. General Image Enhancement
5. Discussion and Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
- El-Shafai, W.; Ali, A.M.; El-Rabaie, E.S.M.; Soliman, N.F.; Algarni, A.D.; Abd El-Samie, F.E. Automated COVID-19 Detection Based on Single-Image Super-Resolution and CNN Models. Comput. Mater. Contin. 2021, 70, 1141–1157.
- El-Shafai, W.; El-Nabi, S.A.; El-Rabaie, E.S.M.; Ali, A.M.; Soliman, N.F.; Algarni, A.D.; Abd El-Samie, F.E. Efficient Deep-Learning-Based Autoencoder Denoising Approach for Medical Image Diagnosis. Comput. Mater. Contin. 2021, 70, 6107–6125.
- El-Shafai, W.; Mohamed, E.M.; Zeghid, M.; Ali, A.M.; Aly, M.H. Hybrid Single Image Super-Resolution Algorithm for Medical Images. Comput. Mater. Contin. 2022, 72, 4879–4896.
- Lu, D.; Weng, Q. A Survey of Image Classification Methods and Techniques for Improving Classification Performance. Int. J. Remote Sens. 2007, 28, 823–870.
- Noor, A.; Benjdira, B.; Ammar, A.; Koubaa, A. DriftNet: Aggressive Driving Behaviour Detection Using 3D Convolutional Neural Networks. In Proceedings of the 1st International Conference of Smart Systems and Emerging Technologies, SMART-TECH 2020, Riyadh, Saudi Arabia, 3–5 November 2020; pp. 214–219.
- Varone, G.; Boulila, W.; lo Giudice, M.; Benjdira, B.; Mammone, N.; Ieracitano, C.; Dashtipour, K.; Neri, S.; Gasparini, S.; Morabito, F.C.; et al. A Machine Learning Approach Involving Functional Connectivity Features to Classify Rest-EEG Psychogenic Non-Epileptic Seizures from Healthy Controls. Sensors 2022, 22, 129.
- Benjdira, B.; Koubaa, A.; Boulila, W.; Ammar, A. Parking Analytics Framework Using Deep Learning. In Proceedings of the 2022 2nd International Conference of Smart Systems and Emerging Technologies, SMARTTECH 2022, Riyadh, Saudi Arabia, 9–11 May 2022; pp. 200–205.
- Benjdira, B.; Koubaa, A.; Azar, A.T.; Khan, Z.; Ammar, A.; Boulila, W. TAU: A Framework for Video-Based Traffic Analytics Leveraging Artificial Intelligence and Unmanned Aerial Systems. Eng. Appl. Artif. Intell. 2022, 114, 105095.
- Benjdira, B.; Ouni, K.; al Rahhal, M.M.; Albakr, A.; Al-Habib, A.; Mahrous, E. Spinal Cord Segmentation in Ultrasound Medical Imagery. Appl. Sci. 2020, 10, 1370.
- Benjdira, B.; Ammar, A.; Koubaa, A.; Ouni, K. Data-Efficient Domain Adaptation for Semantic Segmentation of Aerial Imagery Using Generative Adversarial Networks. Appl. Sci. 2020, 10, 1092.
- Khan, A.R.; Saba, T.; Khan, M.Z.; Fati, S.M.; Khan, M.U.G. Classification of Human’s Activities from Gesture Recognition in Live Videos Using Deep Learning. Concurr. Comput. 2022, 34, e6825.
- Ubaid, M.T.; Saba, T.; Draz, H.U.; Rehman, A.; Ghani, M.U.; Kolivand, H. Intelligent Traffic Signal Automation Based on Computer Vision Techniques Using Deep Learning. IT Prof. 2022, 24, 27–33.
- Delia-Alexandrina, M.; Nedevschi, S.; Fati, S.M.; Senan, E.M.; Azar, A.T. Hybrid and Deep Learning Approach for Early Diagnosis of Lower Gastrointestinal Diseases. Sensors 2022, 22, 4079.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Commun. ACM 2020, 63, 139–144.
- Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851.
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in Vision: A Survey. ACM Comput. Surv. 2022, 54, 200.
- Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110.
- Islam, K. Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work. arXiv 2022.
- Shamshad, F.; Khan, S.; Zamir, S.W.; Khan, M.H.; Hayat, M.; Khan, F.S.; Fu, H. Transformers in Medical Imaging: A Survey. arXiv 2022.
- Su, J.; Xu, B.; Yin, H. A Survey of Deep Learning Approaches to Image Restoration. Neurocomputing 2022, 487, 46–65.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
- Pérez, J.; Marinković, J.; Barceló, P. On the Turing Completeness of Modern Neural Network Architectures. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019.
- Cordonnier, J.-B.; Loukas, A.; Jaggi, M. On the Relationship between Self-Attention and Convolutional Layers. arXiv 2019.
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773.
- Li, X.; Yin, X.; Li, C.; Zhang, P.; Hu, X.; Zhang, L.; Wang, L.; Hu, H.; Dong, L.; Wei, F.; et al. Oscar: Object-Semantics Aligned Pre-Training for Vision-Language Tasks. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 121–137.
- Su, W.; Zhu, X.; Cao, Y.; Li, B.; Lu, L.; Wei, F.; Dai, J. VL-BERT: Pre-Training of Generic Visual-Linguistic Representations. arXiv 2019.
- Fedus, W.; Zoph, B.; Shazeer, N. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. J. Mach. Learn. Res. 2021, 23, 1–40.
- Jing, L.; Tian, Y. Self-Supervised Visual Feature Learning with Deep Neural Networks: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 4037–4058.
- Liu, X.; Zhang, F.; Hou, Z.; Mian, L.; Wang, Z.; Zhang, J.; Tang, J. Self-Supervised Learning: Generative or Contrastive. IEEE Trans. Knowl. Data Eng. 2021, 35, 857–876.
- Thung, K.H.; Raveendran, P. A Survey of Image Quality Measures. In Proceedings of the International Conference for Technical Postgraduates 2009, TECHPOS 2009, Kuala Lumpur, Malaysia, 14–15 December 2009.
- Saad, M.A.; Bovik, A.C.; Charrier, C. Blind Image Quality Assessment: A Natural Scene Statistics Approach in the DCT Domain. IEEE Trans. Image Process. 2012, 21, 3339–3352.
- Horé, A.; Ziou, D. Image Quality Metrics: PSNR vs. SSIM. In Proceedings of the International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369.
- Almohammad, A.; Ghinea, G. Stego Image Quality and the Reliability of PSNR. In Proceedings of the 2010 2nd International Conference on Image Processing Theory, Tools and Applications, IPTA 2010, Paris, France, 7–10 July 2010; pp. 215–220.
- Rouse, D.M.; Hemami, S.S. Analyzing the Role of Visual Structure in the Recognition of Natural Image Content with Multi-Scale SSIM. In Human Vision and Electronic Imaging XIII; SPIE: Bellingham, WA, USA, 2008; Volume 6806, pp. 410–423.
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
- Sara, U.; Akter, M.; Uddin, M.S. Image Quality Assessment through FSIM, SSIM, MSE and PSNR—A Comparative Study. J. Comput. Commun. 2019, 7, 8–18.
- Pambrun, J.F.; Noumeir, R. Limitations of the SSIM Quality Metric in the Context of Diagnostic Imaging. In Proceedings of the International Conference on Image Processing, ICIP 2015, Quebec City, QC, Canada, 27–30 September 2015; pp. 2960–2963.
- Marathe, A.; Jain, P.; Walambe, R.; Kotecha, K. RestoreX-AI: A Contrastive Approach Towards Guiding Image Restoration via Explainable AI Systems. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 3030–3039.
- Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H. Restormer: Efficient Transformer for High-Resolution Image Restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 5728–5739.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE International Conference on Computer Vision 2017, Venice, Italy, 22–29 October 2017; pp. 618–626.
- Cheon, M.; Yoon, S.-J.; Kang, B.; Lee, J. Perceptual Image Quality Assessment with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021, Nashville, TN, USA, 20–25 June 2021; pp. 433–442.
- Conde, M.V.; Burchi, M.; Timofte, R. Conformer and Blind Noisy Students for Improved Image Quality Assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 940–950.
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Available online: https://dl.acm.org/doi/10.5555/3298023.3298188 (accessed on 30 October 2022).
- Park, S.C.; Park, M.K.; Kang, M.G. Super-Resolution Image Reconstruction: A Technical Overview. IEEE Signal Process. Mag. 2003, 20, 21–36.
- Chen, X.; Wang, X.; Zhou, J.; Dong, C. Activating More Pixels in Image Super-Resolution Transformer. arXiv 2022.
- Liang, J.; Cao, J.; Sun, G.; Zhang, K.; van Gool, L.; Timofte, R. SwinIR: Image Restoration Using Swin Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2021, Montreal, QC, Canada, 11–17 October 2021; pp. 1833–1844.
- Zhang, D.; Huang, F.; Liu, S.; Wang, X.; Jin, Z. SwinFIR: Revisiting the SwinIR with Fast Fourier Convolution and Improved Training for Image Super-Resolution. arXiv 2022.
- Yang, C.Y.; Ma, C.; Yang, M.H. Single-Image Super-Resolution: A Benchmark. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; pp. 372–386.
- Yang, C.-Y.; Yang, M.-H. Fast Direct Super-Resolution by Simple Functions. In Proceedings of the IEEE International Conference on Computer Vision 2013, Sydney, Australia, 2–8 December 2013; pp. 561–568.
- Shan, Q.; Li, Z.; Jia, J.; Tang, C.K. Fast Image/Video Upsampling. ACM Trans. Graph. 2008, 27, 153.
- Sun, J.; Sun, J.; Xu, Z.; Shum, H.Y. Image Super-Resolution Using Gradient Profile Prior. In Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Anchorage, AK, USA, 23–28 June 2008.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2021, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
- Tu, J.; Mei, G.; Ma, Z.; Piccialli, F. SWCGAN: Generative Adversarial Network Combining Swin Transformer and CNN for Remote Sensing Image Super-Resolution. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 5662–5673.
- Ma, Q.; Jiang, J.; Liu, X.; Ma, J. Learning a 3D-CNN and Transformer Prior for Hyperspectral Image Super-Resolution. arXiv 2021.
- Chen, H.; Wang, Y.; Guo, T.; Xu, C.; Deng, Y.; Liu, Z.; Ma, S.; Xu, C.; Xu, C.; Gao, W. Pre-Trained Image Processing Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021, Nashville, TN, USA, 20–25 June 2021; pp. 12299–12310.
- He, D.; Wu, S.; Liu, J.; Xiao, G. Cross Transformer Network for Scale-Arbitrary Image Super-Resolution. In Proceedings of the Knowledge Science, Engineering and Management: 15th International Conference, KSEM 2022, Singapore, 6–8 August 2022; pp. 633–644.
- Liu, H.; Shao, M.; Wang, C.; Cao, F. Image Super-Resolution Using a Simple Transformer Without Pretraining. Neural Process. Lett. 2022, 1–19.
- Zhang, X.; Zeng, H.; Guo, S.; Zhang, L. Efficient Long-Range Attention Network for Image Super-Resolution. arXiv 2022.
- Cai, Q.; Qian, Y.; Li, J.; Lv, J.; Yang, Y.-H.; Wu, F.; Zhang, D. HIPA: Hierarchical Patch Transformer for Single Image Super Resolution. arXiv 2022.
- Yoo, J.; Kim, T.; Lee, S.; Kim, S.H.; Lee, H.; Kim, T.H. Enriched CNN-Transformer Feature Aggregation Networks for Super-Resolution. arXiv 2022.
- Wang, S.; Zhou, T.; Lu, Y.; Di, H. Detail-Preserving Transformer for Light Field Image Super-Resolution. In Proceedings of the AAAI Conference on Artificial Intelligence 2022, Virtual, 22 February–1 March 2022; Volume 36, pp. 2522–2530.
- Liang, Z.; Wang, Y.; Wang, L.; Yang, J.; Zhou, S. Light Field Image Super-Resolution with Transformers. IEEE Signal Process. Lett. 2022, 29, 563–567.
- Lei, S.; Shi, Z.; Mo, W. Transformer-Based Multistage Enhancement for Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 3136190.
- Wang, L.; Zhu, H.; He, Z.; Jia, Y.; Du, J. Adjacent Slices Feature Transformer Network for Single Anisotropic 3D Brain MRI Image Super-Resolution. Biomed. Signal Process. Control 2022, 72, 103339.
- Zhang, W.; Wang, L.; Chen, W.; Jia, Y.; He, Z.; Du, J. 3D Cross-Scale Feature Transformer Network for Brain MR Image Super-Resolution. In Proceedings of the ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore, 23–27 May 2022; pp. 1356–1360.
- Fang, C.; Zhang, D.; Wang, L.; Zhang, Y.; Cheng, L.; Han, J. Cross-Modality High-Frequency Transformer for MR Image Super-Resolution. arXiv 2022.
- Hu, J.F.; Huang, T.Z.; Deng, L.J.; Dou, H.X.; Hong, D.; Vivone, G. Fusformer: A Transformer-Based Fusion Network for Hyperspectral Image Super-Resolution. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6012305.
- Liu, Y.; Hu, J.; Kang, X.; Luo, J.; Fan, S. Interactformer: Interactive Transformer and CNN for Hyperspectral Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5531715.
- Yang, F.; Yang, H.; Fu, J.; Lu, H.; Guo, B. Learning Texture Transformer Network for Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020, Seattle, WA, USA, 13–19 June 2020; pp. 5791–5800.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015.
- Gao, G.; Xu, Z.; Li, J.; Yang, J.; Zeng, T.; Qi, G.-J. CTCNet: A CNN-Transformer Cooperation Network for Face Image Super-Resolution. arXiv 2022.
- Yan, B.; Cao, L.; Qi, F.; Wang, H. Bilateral Network with Channel Splitting Network and Transformer for Thermal Image Super-Resolution. arXiv 2022.
- Zhang, K.; Zuo, W.; Zhang, L. FFDNet: Toward a Fast and Flexible Solution for CNN-Based Image Denoising. IEEE Trans. Image Process. 2018, 27, 4608–4622.
- Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095.
- Thakur, R.S.; Chatterjee, S.; Yadav, R.N.; Gupta, L. Image De-Noising with Machine Learning: A Review. IEEE Access 2021, 9, 93338–93363.
- Tian, C.; Fei, L.; Zheng, W.; Xu, Y.; Zuo, W.; Lin, C.W. Deep Learning on Image Denoising: An Overview. Neural Netw. 2020, 131, 251–275.
- Prayuda, A.W.H.; Prasetyo, H.; Guo, J.M. AWGN-Based Image Denoiser Using Convolutional Vision Transformer. In Proceedings of the 2021 International Symposium on Electronics and Smart Devices: Intelligent Systems for Present and Future Challenges, ISESD 2021, Bandung, Indonesia, 29–30 June 2021.
- Liu, X.; Hong, Y.; Yin, Q.; Zhang, S. DnT: Learning Unsupervised Denoising Transformer from Single Noisy Image. In Proceedings of the 4th International Conference on Image Processing and Machine Vision, Hong Kong, China, 25–27 March 2022; pp. 50–56.
- Pang, T.; Zheng, H.; Quan, Y.; Ji, H. Recorrupted-to-Recorrupted: Unsupervised Deep Learning for Image Denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021, Nashville, TN, USA, 20–25 June 2021; pp. 2043–2052.
- Zhao, M.; Cao, G.; Huang, X.; Yang, L. Hybrid Transformer-CNN for Real Image Denoising. IEEE Signal Process. Lett. 2022, 29, 1252–1256.
- Xue, T.; Ma, P. TC-Net: Transformer Combined with CNN for Image Denoising. Appl. Intell. 2022, 1–10.
- Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A General U-Shaped Transformer for Image Restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 17683–17693.
- Fan, C.-M.; Liu, T.-J.; Liu, K.-H. SUNet: Swin Transformer UNet for Image Denoising. arXiv 2022.
- Li, Z.; Jiang, H.; Zheng, Y. Polarized Color Image Denoising Using Pocoformer. arXiv 2022.
- Zhang, Z.; Yu, L.; Liang, X.; Zhao, W.; Xing, L. TransCT: Dual-Path Transformer for Low Dose Computed Tomography. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, 27 September–1 October 2021; pp. 55–64.
- Wang, D.; Wu, Z.; Yu, H. TED-Net: Convolution-Free T2T Vision Transformer-Based Encoder-Decoder Dilation Network for Low-Dose CT Denoising. In Proceedings of the Machine Learning in Medical Imaging: 12th International Workshop, MLMI 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, 27 September 2021; pp. 416–425.
- Luthra, A.; Sulakhe, H.; Mittal, T.; Iyer, A.; Yadav, S. Eformer: Edge Enhancement Based Transformer for Medical Image Denoising. arXiv 2021.
- Gupta, A.; Joshi, N.; Lawrence Zitnick, C.; Cohen, M.; Curless, B. Single Image Deblurring Using Motion Density Functions. In Proceedings of the Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; pp. 171–184.
- Wang, R.; Tao, D. Recent Progress in Image Deblurring. arXiv 2014.
- Lai, W.-S.; Huang, J.-B.; Hu, Z.; Ahuja, N.; Yang, M.-H. A Comparative Study for Single Image Blind Deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1701–1709.
- Zhang, K.; Ren, W.; Luo, W.; Lai, W.S.; Stenger, B.; Yang, M.H.; Li, H. Deep Image Deblurring: A Survey. Int. J. Comput. Vis. 2022, 130, 2103–2130.
- Zheng, Z.; Jia, X. UHD Image Deblurring via Multi-Scale Cubic-Mixer. arXiv 2022.
- Tsai, F.-J.; Peng, Y.-T.; Lin, Y.-Y.; Tsai, C.-C.; Lin, C.-W. Stripformer: Strip Transformer for Fast Image Deblurring. arXiv 2022.
- Sun, J.; Cao, W.; Xu, Z.; Ponce, J. Learning a Convolutional Neural Network for Non-Uniform Motion Blur Removal. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, Boston, MA, USA, 7–12 June 2015; pp. 769–777.
- Guo, C.-L.; Yan, Q.; Anwar, S.; Cong, R.; Ren, W.; Li, C. Image Dehazing Transformer with Transmission-Aware 3D Position Embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 5812–5820.
- Gao, G.; Cao, J.; Bao, C.; Hao, Q.; Ma, A.; Li, G. A Novel Transformer-Based Attention Network for Image Dehazing. Sensors 2022, 22, 3428.
- Li, X.; Hua, Z.; Li, J. Two-Stage Single Image Dehazing Network Using Swin-Transformer. IET Image Process. 2022, 16, 2518–2534.
- Jiao, Q.; Liu, M.; Ning, B.; Zhao, F.; Dong, L.; Kong, L.; Hui, M.; Zhao, Y. Image Dehazing Based on Local and Non-Local Features. Fractal Fract. 2022, 6, 262.
- Song, Y.; He, Z.; Qian, H.; Du, X. Vision Transformers for Single Image Dehazing. arXiv 2022.
- Zhao, D.; Li, J.; Li, H.; Xu, L. Complementary Feature Enhanced Network with Vision Transformer for Image Dehazing. arXiv 2021.
- Dong, C.; Deng, Y.; Loy, C.C.; Tang, X. Compression Artifacts Reduction by a Deep Convolutional Network. In Proceedings of the IEEE International Conference on Computer Vision 2015, Santiago, Chile, 7–13 December 2015; pp. 576–584.
- Liu, X.; Cheung, G.; Ji, X.; Zhao, D.; Gao, W. Graph-Based Joint Dequantization and Contrast Enhancement of Poorly Lit JPEG Images. IEEE Trans. Image Process. 2019, 28, 1205–1219.
- Foi, A.; Katkovnik, V.; Egiazarian, K. Pointwise Shape-Adaptive DCT for High-Quality Denoising and Deblocking of Grayscale and Color Images. IEEE Trans. Image Process. 2007, 16, 1395–1411.
- Chen, T.; Wu, H.R.; Qiu, B. Adaptive Postfiltering of Transform Coefficients for the Reduction of Blocking Artifacts. IEEE Trans. Circuits Syst. Video Technol. 2001, 11, 594–602.
- Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155.
- Jiang, J.; Zhang, K.; Timofte, R. Towards Flexible Blind JPEG Artifacts Removal. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2021, Montreal, QC, Canada, 11–17 October 2021; pp. 4997–5006.
- Fu, X.; Zha, Z.-J.; Wu, F.; Ding, X.; Paisley, J. JPEG Artifacts Reduction via Deep Convolutional Sparse Coding. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2019, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2501–2510.
- Jiang, X.; Tan, W.; Cheng, R.; Zhou, S.; Yan, B. Learning Parallax Transformer Network for Stereo Image JPEG Artifacts Removal. In Proceedings of the 30th ACM International Conference on Multimedia, Lisbon, Portugal, 10–14 October 2022; pp. 6072–6082.
- Charbonnier, P.; Blanc-Féraud, L.; Aubert, G.; Barlaud, M. Two Deterministic Half-Quadratic Regularization Algorithms for Computed Imaging. In Proceedings of the International Conference on Image Processing, ICIP 1994, Austin, TX, USA, 13–16 November 1994; Volume 2, pp. 168–172.
- Wang, L.; Zhang, J.; Wang, O.; Lin, Z.; Lu, H. SDC-Depth: Semantic Divide-and-Conquer Network for Monocular Depth Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020, Seattle, WA, USA, 13–19 June 2020; pp. 541–550.
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision 2018, Munich, Germany, 8–14 September 2018; pp. 801–818.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 213–229.
- Perera, A.G.; Law, Y.W.; Chahl, J. UAV-GESTURE: A Dataset for UAV Control and Gesture Recognition. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops 2018, Munich, Germany, 8–14 September 2018.
- Liang, M.; Yang, B.; Wang, S.; Urtasun, R. Deep Continuous Fusion for Multi-Sensor 3D Object Detection. In Proceedings of the European Conference on Computer Vision 2018, Munich, Germany, 8–14 September 2018; pp. 641–656.
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D Object Detection from RGB-D Data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA, 18–23 June 2018; pp. 918–927.
- Rudin, L.I.; Osher, S.; Fatemi, E. Nonlinear Total Variation Based Noise Removal Algorithms. Phys. D 1992, 60, 259–268.
- Roth, S.; Black, M.J. Fields of Experts: A Framework for Learning Image Priors. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, San Diego, CA, USA, 20–25 June 2005; Volume II, pp. 860–867.
- He, K.; Sun, J.; Tang, X. Single Image Haze Removal Using Dark Channel Prior. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 2341–2353.
- Zhang, H.; Patel, V.M. Density-Aware Single Image De-Raining Using a Multi-Stream Dense Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA, 18–23 June 2018; pp. 695–704.
- Wei, W.; Meng, D.; Zhao, Q.; Xu, Z.; Wu, Y. Semi-Supervised Transfer Learning for Image Rain Removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA, 15–20 June 2019; pp. 3877–3886.
- Qian, R.; Tan, R.T.; Yang, W.; Su, J.; Liu, J. Attentive Generative Adversarial Network for Raindrop Removal from a Single Image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2482–2491.
- Fu, X.; Huang, J.; Zeng, D.; Huang, Y.; Ding, X.; Paisley, J. Removing Rain from Single Images via a Deep Detail Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 3855–3863.
- Zhang, K.; Li, R.; Yu, Y.; Luo, W.; Li, C. Deep Dense Multi-Scale Network for Snow Removal Using Semantic and Depth Priors. IEEE Trans. Image Process. 2021, 30, 7419–7431.
- Ren, W.; Tian, J.; Han, Z.; Chan, A.; Tang, Y. Video Desnowing and Deraining Based on Matrix Decomposition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 4210–4219.
- Liu, Y.F.; Jaw, D.W.; Huang, S.C.; Hwang, J.N. DesnowNet: Context-Aware Deep Network for Snow Removal. IEEE Trans. Image Process. 2018, 27, 3064–3073.
- You, S.; Tan, R.T.; Kawakami, R.; Mukaigawa, Y.; Ikeuchi, K. Adherent Raindrop Modeling, Detection and Removal in Video. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 1721–1733.
- Quan, Y.; Deng, S.; Chen, Y.; Ji, H. Deep Learning for Seeing Through Window with Raindrops. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2019, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2463–2471.
- Qin, Q.; Yan, J.; Wang, X.; Wang, Q.; Li, M.; Wang, Y. ETDNet: An Efficient Transformer Deraining Model. IEEE Access 2021, 9, 119881–119893.
- Tan, F.; Kong, Y.; Fan, Y.; Liu, F.; Zhou, D.; Zhang, H.; Chen, L.; Gao, L.; Qian, Y. SDNet: Mutil-Branch for Single Image Deraining Using Swin. arXiv 2021.
- Liu, L.; Xie, L.; Zhang, X.; Yuan, S.; Chen, X.; Zhou, W.; Li, H.; Tian, Q. TAPE: Task-Agnostic Prior Embedding for Image Restoration. arXiv 2022.
- Valanarasu, J.M.J.; Yasarla, R.; Patel, V.M. TransWeather: Transformer-Based Restoration of Images Degraded by Adverse Weather Conditions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 2353–2363.
- Liu, L.; Yuan, S.; Liu, J.; Guo, X.; Yan, Y.; Tian, Q. SiamTrans: Zero-Shot Multi-Frame Image Restoration with Pre-Trained Siamese Transformers. In Proceedings of the AAAI Conference on Artificial Intelligence 2022, Virtual, 22 February–1 March 2022; Volume 36, pp. 1747–1755.
- Deng, Z.; Cai, Y.; Chen, L.; Gong, Z.; Bao, Q.; Yao, X.; Fang, D.; Yang, W.; Zhang, S.; Ma, L. RFormer: Transformer-Based Generative Adversarial Network for Real Fundus Image Restoration on a New Clinical Benchmark. IEEE J. Biomed. Health Inform. 2022, 26, 4645–4655.
- Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134.
- Boudiaf, A.; Guo, Y.; Ghimire, A.; Werghi, N.; de Masi, G.; Javed, S.; Dias, J. Underwater Image Enhancement Using Pre-Trained Transformer. In Proceedings of the Image Analysis and Processing–ICIAP 2022: 21st International Conference, Lecce, Italy, 23–27 May 2022; pp. 480–488.
- Wei, T.; Li, Q.; Chen, Z.; Liu, J. FRGAN: A Blind Face Restoration with Generative Adversarial Networks. Math. Probl. Eng. 2021, 2021, 2384435.
- Souibgui, M.A.; Biswas, S.; Jemni, S.K.; Kessentini, Y.; Fornés, A.; Lladós, J.; Pal, U. DocEnTr: An End-to-End Document Image Enhancement Transformer. arXiv 2022.
- Zhang, P.; Zhang, K.; Luo, W.; Li, C.; Wang, G. Blind Face Restoration: Benchmark Datasets and a Baseline Model. arXiv 2022.
- Wang, C.; Shang, K.; Zhang, H.; Li, Q.; Hui, Y.; Zhou, S.K. DuDoTrans: Dual-Domain Transformer Provides More Attention for Sinogram Restoration in Sparse-View CT Reconstruction. arXiv 2021.
- Ji, H.; Feng, X.; Pei, W.; Li, J.; Lu, G. U2-Former: A Nested U-Shaped Transformer for Image Restoration. arXiv 2021.
- Yan, C.; Shi, G.; Wu, Z. SMIR: A Transformer-Based Model for MRI Super-Resolution Reconstruction. In Proceedings of the 2021 IEEE International Conference on Medical Imaging Physics and Engineering, ICMIPE 2021, Hefei, China, 13–14 November 2021.
- Wang, H.; Jiang, K. Research on Image Super-Resolution Reconstruction Based on Transformer. In Proceedings of the 2021 IEEE International Conference on Artificial Intelligence and Industrial Design, AIID 2021, Guangzhou, China, 28–30 May 2021; pp. 226–230.
Comparison of related surveys and the scope of each:

Method | Year | Algorithm | Scope |
---|---|---|---|
[18] | 2022 | ViT | Generative modeling, low-level vision (e.g., image super-resolution, image enhancement, and colorization), recognition tasks (e.g., image classification, object detection, action recognition, and segmentation), video processing (e.g., activity recognition, video forecasting), multi-modal tasks (e.g., visual question answering, visual reasoning, and visual grounding), and three-dimensional analysis (e.g., point cloud classification and segmentation) |
[19] | 2022 | ViT | Backbone network (e.g., supervised pretraining and self-supervised pretraining), high/mid-level vision (e.g., object detection, segmentation, and pose estimation), low-level vision (e.g., image generation, and image enhancement), video processing (e.g., video inpainting, and video captioning), multimodality tasks (e.g., classification, image generation, and multi-task), and efficient transformer (e.g., decomposition, distillation, quantization, and architecture design) |
[20] | 2022 | ViT, CNN | Fundamental concepts, a background of the self-attention mechanism, strengths and weaknesses, computational cost, and comparison of ViT and CNN performance on benchmark datasets |
[21] | 2022 | ViT | Medical image segmentation, detection, classification, reconstruction, synthesis, registration, and clinical report generation |
[22] | 2022 | CNN, ViT | General deep learning techniques for super-resolution, dehazing, deblurring, and image denoising, followed by a summary of the leading architectural components involved in these tasks, such as residual/skip connections, receptive fields, and unsupervised autoencoder mechanisms |
Ours | - | ViT | Seven image restoration tasks are studied in depth: image super-resolution, image denoising, general image enhancement, JPEG compression artifact reduction, image deblurring, removing adverse weather conditions, and image dehazing |
Quantitative comparison of ViT-based image super-resolution methods at ×2 scale:

Method | Training Dataset | Set14 PSNR/SSIM | BSD100 PSNR/SSIM | Manga109 PSNR/SSIM | Parameters
---|---|---|---|---|---|
HAT [49], 2022 | DF2K | 35.29/0.9293 | 32.74/0.9066 | 41.01/0.9831 | 20.8M |
SwinFIR [51], 2022 | DF2K | 34.93/0.9276 | 32.64/0.9054 | 40.61/0.9816 | 13.99M |
IPT [59], 2021 | ImageNet | 34.43/- | 32.48/- | - | 46.0M
SwinIR [50], 2021 | DF2K | 34.61/0.9260 | 32.55/0.9043 | 40.02/0.9800 | 11.8M |
CrossSR [60], 2022 | DIV2K | 33.99/0.9218 | 32.27/0.9000 | - | 18.3M |
SRT [61], 2022 | DIV2K | 33.95/0.9207 | 32.35/0.9018 | - | 11M |
ELAN [62], 2022 | DIV2K | 33.94/0.9207 | 32.30/0.9012 | 39.11/0.9782 | 582K |
HIPA [63], 2022 | DIV2K | 34.25/0.9235 | 32.48/0.9040 | 39.75/0.9794 | 19.2M |
ACT [64], 2022 | ImageNet | 34.68/0.9260 | 32.60/0.9052 | 40.11/0.9807 | 46.0M |
Quantitative comparison of ViT-based image super-resolution methods at ×3 scale:

Method | Training Dataset | Set14 PSNR/SSIM | BSD100 PSNR/SSIM | Manga109 PSNR/SSIM | Parameters
---|---|---|---|---|---|
HAT [49], 2022 | DF2K | 31.47/0.8584 | 29.63/0.8191 | 36.02/0.9576 | 20.8M |
SwinFIR [51], 2022 | DF2K | 31.24/0.8566 | 29.55/0.8169 | 35.77/0.9563 | 13.99M |
IPT [59], 2021 | ImageNet | 30.85/- | 29.38/- | - | 46.0M
SwinIR [50], 2021 | DF2K | 31.00/0.8542 | 29.49/0.8150 | 35.28/0.9543 | 11.8M |
CrossSR [60], 2022 | DIV2K | 30.43/0.8433 | 29.15/0.8063 | 33.95/0.9455 | 0.68M |
SRT [61], 2022 | DIV2K | 30.53/0.8460 | 29.21/0.8082 | - | 18.3M |
ELAN [62], 2022 | DIV2K | 30.55/0.8463 | 29.21/0.8081 | 34.00/0.9478 | 590K |
HIPA [63], 2022 | DIV2K | 30.38/0.8417 | 29.13/0.8061 | 33.82/0.9460 | 365K |
ACT [64], 2022 | DIV2K | 30.80/0.8504 | 29.45/0.8127 | 34.86/0.9521 | 19.2M |
HAT [49], 2022 | ImageNet | 31.17/0.8549 | 29.55/0.8171 | 35.47/0.9548 | 46.0M |
Quantitative comparison of ViT-based image super-resolution methods at ×4 scale:

Method | Training Dataset | Set14 PSNR/SSIM | BSD100 PSNR/SSIM | Manga109 PSNR/SSIM | Parameters
---|---|---|---|---|---|
HAT [49], 2022 | DF2K | 29.47/0.8015 | 28.09/0.7551 | 33.09/0.9335 | 20.8M |
SwinFIR [51], 2022 | DF2K | 29.36/0.7993 | 28.03/0.7520 | 32.83/0.9314 | 13.99M |
IPT [59], 2021 | ImageNet | 29.01/- | 27.82/- | - | 46.0M
SwinIR [50], 2021 | DF2K | 29.15/0.7958 | 27.95/0.7494 | 32.22/0.9273 | 11.8M |
CrossSR [60], 2022 | DIV2K | 28.11/0.7842 | 27.54/0.7464 | 30.09/0.9077 | - |
SRT [61], 2022 | DIV2K | 28.69/0.7833 | 27.69/0.7379 | 30.75/0.9100 | 0.68M |
ELAN [62], 2022 | DIV2K | 28.79/0.7856 | 27.70/0.7405 | 32.46/0.8975 | 18.3M |
HIPA [63], 2022 | DIV2K | 28.87/0.7880 | 27.75/0.7429 | - | 18.3M |
ACT [64], 2022 | DIV2K | 28.78/0.7858 | 27.69/0.7406 | 30.92/0.9150 | 601K |
HAT [49], 2022 | DIV2K | 28.68/0.7832 | 27.62/0.7382 | 30.76/0.9111 | 365K |
SwinFIR [51], 2022 | DIV2K | 29.02/0.7945 | 27.94/0.7463 | 31.77/0.9231 | 19.2M |
IPT [59], 2021 | ImageNet | 29.27/0.7968 | 28.00/0.7516 | 32.44/0.9282 | 46.0M |
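The ×2/×3/×4 results above follow the standard bicubic degradation protocol: the low-resolution input is obtained by bicubic downsampling of the high-resolution ground truth, and the restored output is compared against that ground truth. Below is a minimal sketch of the degradation step; the function name and file path are ours, and evaluation code in the individual papers may use the MATLAB imresize kernel instead of Pillow's.

```python
from PIL import Image

def make_lr(hr_path, scale=4):
    """Standard SR benchmark degradation: bicubic downsampling of the HR image."""
    hr = Image.open(hr_path).convert("RGB")
    w, h = hr.size
    return hr.resize((w // scale, h // scale), Image.BICUBIC)

# Example: build the x4 input for a Set14 image (path is illustrative)
lr = make_lr("Set14/baboon.png", scale=4)
```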
Method | Scale | EPFL PSNR/SSIM | HCInew PSNR/SSIM | HCIold PSNR/SSIM | INRIA PSNR/SSIM | Parameters |
---|---|---|---|---|---|---|
SA-LSA [65], 2022 | ×2 | 34.48/0.9759 | 37.35/0.9770 | 44.31/0.9943 | 36.40/0.9843 | 3.78M
LFT-transformer [66], 2022 | ×2 | 34.80/0.978 | 37.84/0.979 | 44.52/0.995 | 36.59/0.986 | 1.16M
SA-LSA [65], 2022 | ×4 | 28.93/0.9167 | 31.19/0.9186 | 37.39/0.9720 | 30.96/0.9502 | 3.78M
LFT-transformer [66], 2022 | ×4 | 29.25/0.921 | 31.46/0.922 | 37.63/0.974 | 31.20/0.952 | 1.16M
Method | Scale | Training Dataset | AID PSNR/SSIM | UCMerced PSNR/ SSIM | Parameters |
---|---|---|---|---|---|
TransENet [67], 2022 | ×2 | UCMerced, AID | 35.28/0.9374 | 34.03/0.9301 | -
TransENet [67], 2022 | ×4 | UCMerced, AID | 29.38/0.7909 | 27.77/0.7630 | -
SWCGAN [57], 2022 | ×4 | - | - | 27.63 | 3.8M
Method | Scale | Training Dataset | Kirby21 PSNR/SSIM | ANVIL-adult PSNR/SSIM | MSSEG PSNR/SSIM | BraTS2018 PSNR/SSIM | Parameters |
---|---|---|---|---|---|---|---|
ASFT [68], 2022 | ×2 | Kirby21 | 43.68 ± 2.08/0.9965 ± 0.0014 | 40.96 ± 1.00/0.9906 ± 0.0013 | 41.22 ± 1.37/0.9978 ± 0.0004 | - | 1.85M
CFTN [69], 2022 | ×2 | - | 39.70/0.9847 | - | - | - | 21.93M
ASFT [68], 2022 | ×3 | Kirby21 | 40.19 ± 2.04/0.9882 ± 0.0034 | 37.54 ± 1.10/0.9703 ± 0.0041 | 36.82 ± 1.43/0.9868 ± 0.0021 | - | 1.85M
CFTN [69], 2022 | ×3 | - | 36.03/0.9612 | - | - | - | 21.93M
Cohf-T [70], 2022 | ×3 | - | - | - | - | 34.84/0.9507 | 152M
Cohf-T [70], 2022 | ×4 | - | - | - | - | 33.26/0.9425 | 152M
Method | Training Dataset | Harvard PSNR/SSIM | CAVE PSNR/SSIM | Houston PSNR/SSIM | Pavia Centre PSNR/SSIM | Parameters |
---|---|---|---|---|---|---|
3DT-Net [58], 2021 | Harvard, CAVE | 50.93/0.996 | 48.05/0.991 | - | - | 3.46M |
Fusformer [71], 2022 | Harvard, CAVE | 48.56/0.995 | 44.42/0.984 | - | - | 0.10M |
Interactformer [72], 2022 | - | - | - | 29.74/0.9181 | 28.51/0.8897 | 4.465M
Method | Sub Task | Dataset | PSNR/SSIM | Parameters |
---|---|---|---|---|
TTSR [73], 2020 | Reference-based SR ×4 | CUFED5 | 27.09/0.804 | 9.10M
TTSR [73], 2020 | Reference-based SR ×4 | Sun80 | 30.02/0.814 | 9.10M
TTSR [73], 2020 | Reference-based SR ×4 | Urban100 | 25.87/0.784 | 9.10M
TTSR [73], 2020 | Reference-based SR ×4 | Manga109 | 30.09/0.907 | 9.10M
CTCNet [75], 2022 | Face image SR ×8 | CelebA | 28.37/0.8115 | -
CTCNet [75], 2022 | Face image SR ×8 | Helen | 27.08/0.8077 | -
BN-CSNT [76], 2022 | Thermal image SR ×2 | PBVS-2022 | 21.08/0.7803 | -
BN-CSNT [76], 2022 | Thermal image SR ×4 | PBVS-2022 | 33.64/0.9263 | -
Method | Training Dataset | Test Dataset | Noise Factor (NF) | PSNR/SSIM | Parameters
---|---|---|---|---|---
CVT [81], 2021 | DIV2K | Set12 | 15 | 34.548 | -
CVT [81], 2021 | DIV2K | Set12 | 25 | 31.865 | -
CVT [81], 2021 | DIV2K | Set12 | 30 | 27.676 | -
CVT [81], 2021 | DIV2K | BSD68 | 15 | 33.790 | -
CVT [81], 2021 | DIV2K | BSD68 | 25 | 30.828 | -
CVT [81], 2021 | DIV2K | BSD68 | 30 | 26.688 | -
SwinIR [50], 2021 | DIV2K | Kodak24 | 15 | 35.34 | 11.8M
SwinIR [50], 2021 | DIV2K | Kodak24 | 25 | 32.89 | 11.8M
SwinIR [50], 2021 | DIV2K | Kodak24 | 50 | 29.79 | 11.8M
SwinIR [50], 2021 | DIV2K | McMaster | 15 | 35.61 | 11.8M
SwinIR [50], 2021 | DIV2K | McMaster | 25 | 33.20 | 11.8M
SwinIR [50], 2021 | DIV2K | McMaster | 50 | 30.22 | 11.8M
SwinIR [50], 2021 | DIV2K | Urban100 | 15 | 35.13 | 11.8M
SwinIR [50], 2021 | DIV2K | Urban100 | 25 | 32.90 | 11.8M
SwinIR [50], 2021 | DIV2K | Urban100 | 50 | 29.82 | 11.8M
SwinIR [50], 2021 | DIV2K | CBSD68 | 15 | 34.42 | 11.8M
SwinIR [50], 2021 | DIV2K | CBSD68 | 25 | 31.78 | 11.8M
SwinIR [50], 2021 | DIV2K | CBSD68 | 50 | 28.56 | 11.8M
IPT [59], 2021 | ImageNet | CBSD68 | 30 | 32.32 | 46.0M
IPT [59], 2021 | ImageNet | CBSD68 | 50 | 29.88 | 46.0M
Uformer [86], 2022 | SIDD and DND | SIDD | Real-world noise | 39.89/0.960 | 50.88M
Uformer [86], 2022 | SIDD and DND | DND | Real-world noise | 40.04/0.956 | 50.88M
Restormer [43], 2022 | DIV2K and SIDD | CBSD68 | 15 | 34.40 | 25.31M
Restormer [43], 2022 | DIV2K and SIDD | CBSD68 | 25 | 31.79 | 25.31M
Restormer [43], 2022 | DIV2K and SIDD | CBSD68 | 50 | 28.60 | 25.31M
Restormer [43], 2022 | DIV2K and SIDD | Urban100 | 15 | 35.13 | 25.31M
Restormer [43], 2022 | DIV2K and SIDD | Urban100 | 25 | 32.96 | 25.31M
Restormer [43], 2022 | DIV2K and SIDD | Urban100 | 50 | 30.02 | 25.31M
Restormer [43], 2022 | DIV2K and SIDD | SIDD | Real-world noise | 40.02/0.960 | 25.31M
DnT [82], 2022 | - | Set9 | 25 | 32.18 | -
DnT [82], 2022 | - | Set9 | 50 | 29.29 | -
DnT [82], 2022 | - | Set9 | 75 | 27.62 | -
DnT [82], 2022 | - | Set9 | 100 | 26.31 | -
DnT [82], 2022 | - | CBSD68 | 25 | 28.78/0.816 | -
DnT [82], 2022 | - | CBSD68 | 50 | 25.72/0.702 | -
TECDNet [85], 2022 | - | SIDD | Real-world noise | 39.77/0.970 | 20.87M
TECDNet [85], 2022 | - | DND | Real-world noise | 39.92/0.955 | 20.87M
TC-Net [86], 2022 | - | SIDD | Real-world noise | 39.69/0.970 | -
TC-Net [86], 2022 | - | DND | Real-world noise | 39.88/0.954 | -
SUNet [87], 2022 | DIV2K | CBSD68 | 10 | 35.94/0.958 | 99M
SUNet [87], 2022 | DIV2K | CBSD68 | 30 | 30.28/0.870 | 99M
SUNet [87], 2022 | DIV2K | CBSD68 | 50 | 27.85/0.799 | 99M
SUNet [87], 2022 | DIV2K | Kodak24 | 10 | 36.79/0.953 | 99M
SUNet [87], 2022 | DIV2K | Kodak24 | 30 | 31.82/0.899 | 99M
SUNet [87], 2022 | DIV2K | Kodak24 | 50 | 29.54/0.810 | 99M
Pocoformer [88], 2022 | Real-world polarized color images | Real-world polarized color images | Real-world noise | 39.33/0.966 | 26.26M
TransCT [89], 2021 | Mayo Clinic Low-Dose CT | Mayo | Low-dose CT noise | -/0.923 ± 0.024 | -
TransCT [89], 2021 | Mayo Clinic Low-Dose CT | Pig | Low-dose CT noise | -/0.87 ± 0.029 | -
TED-Net [90], 2021 | Mayo Clinic Low-Dose CT | Mayo | Low-dose CT noise | -/0.9144 | 18.88M
Eformer [91], 2021 | Mayo Clinic Low-Dose CT | Mayo | Low-dose CT noise | 43.487/0.9861 | -
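In the table above, a numeric noise factor (NF) denotes the standard deviation σ of additive white Gaussian noise (AWGN) applied to 8-bit images — the synthetic protocol used by CVT, SwinIR, IPT, Restormer, DnT, and SUNet — while the real-world rows (SIDD, DND) use camera noise captured in the datasets themselves. A minimal sketch of how such synthetic inputs are typically generated (the function name is ours):

```python
import numpy as np

def add_awgn(clean, sigma, seed=None):
    """Noisy benchmark input: clean 8-bit image plus N(0, sigma^2) noise."""
    rng = np.random.default_rng(seed)
    noisy = clean.astype(np.float64) + rng.normal(0.0, sigma, size=clean.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

# Example: the NF = 25 setting from the table
# noisy = add_awgn(clean_image, sigma=25)
```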
Method | Training Dataset | GoPro PSNR/SSIM | HIDE PSNR/SSIM | RealBlur-R PSNR/SSIM | Parameters |
---|---|---|---|---|---|
Uformer [86], 2022 | GoPro | 33.06/0.967 | 30.90/0.953 | 36.19/0.956 | 50.88M |
Restormer [43], 2022 | GoPro | 32.92/0.961 | 31.22/0.942 | 36.19/0.957 | 25.31M |
Multi-scale Cubic-Mixer [96], 2022 | 4KRD | 33.79/0.962 | - | 39.66/0.969 | 40M |
Stripformer [97], 2022 | RealBlur | 33.08/0.962 | 31.03/0.940 | 39.84/0.974 | 20M |
Method | Training Dataset | SOTS-Indoor PSNR/SSIM | SOTS-Outdoor PSNR/SSIM | HSTS PSNR/SSIM | O-HAZE PSNR/SSIM | Parameters |
---|---|---|---|---|---|---
DeHamer [99], 2022 | RESIDE | 36.63/0.9881 | 35.18/0.9860 | - | - | -
TCAM [100], 2022 | RESIDE | 21.44/0.8851 | - | 23.83/0.9022 | - | -
ISM [101], 2022 | RESIDE | 36.34/0.9836 | 30.85/0.9628 | 30.40/0.9696 | - | -
Jiao et al. [102], 2022 | O-HAZE and I-HAZE | - | - | - | 15.89/0.56 | -
Song et al. [103], 2022 | RESIDE | 38.46/0.994 | 34.29/0.983 | - | - | 4.634M
Zhao et al. [104], 2021 | RESIDE | 32.17/0.970 | - | - | 29.87/0.758 | -
Method | Type | Dataset | Quality Factor | PSNR/SSIM | Parameters |
---|---|---|---|---|---|
SwinIR [50], 2021 | Single image | Classic5 | 10 | 30.27/0.8249 | 11.8M
SwinIR [50], 2021 | Single image | Classic5 | 20 | 32.52/0.8748 | 11.8M
SwinIR [50], 2021 | Single image | Classic5 | 30 | 33.73/0.8961 | 11.8M
SwinIR [50], 2021 | Single image | Classic5 | 40 | 34.52/0.9082 | 11.8M
SwinIR [50], 2021 | Single image | LIVE1 | 10 | 29.86/0.8287 | 11.8M
SwinIR [50], 2021 | Single image | LIVE1 | 20 | 32.25/0.8909 | 11.8M
SwinIR [50], 2021 | Single image | LIVE1 | 30 | 33.69/0.9174 | 11.8M
SwinIR [50], 2021 | Single image | LIVE1 | 40 | 34.67/0.9317 | 11.8M
PTNet [112], 2022 | Stereo image pairs | Flickr1024 | 10 | 28.05/0.8403 | 0.91M
PTNet [112], 2022 | Stereo image pairs | Flickr1024 | 20 | 30.39/0.9017 | 0.91M
PTNet [112], 2022 | Stereo image pairs | Flickr1024 | 30 | 31.83/0.9264 | 0.91M
PTNet [112], 2022 | Stereo image pairs | KITTI2012 | 10 | 31.43/0.8786 | 0.91M
PTNet [112], 2022 | Stereo image pairs | KITTI2012 | 20 | 33.85/0.9231 | 0.91M
PTNet [112], 2022 | Stereo image pairs | KITTI2012 | 30 | 35.18/0.9404 | 0.91M
PTNet [112], 2022 | Stereo image pairs | KITTI2015 | 10 | 31.42/0.8730 | 0.91M
PTNet [112], 2022 | Stereo image pairs | KITTI2015 | 20 | 34.07/0.9245 | 0.91M
PTNet [112], 2022 | Stereo image pairs | KITTI2015 | 30 | 35.57/0.9449 | 0.91M
PTNet [112], 2022 | Stereo image pairs | Middlebury | 10 | 32.05/0.8676 | 0.91M
PTNet [112], 2022 | Stereo image pairs | Middlebury | 20 | 34.51/0.9200 | 0.91M
PTNet [112], 2022 | Stereo image pairs | Middlebury | 30 | 35.85/0.9400 | 0.91M
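The quality factor column corresponds to the setting of the JPEG encoder used to synthesize the compressed inputs; lower values give stronger blocking and ringing artifacts. Below is a minimal sketch of the degradation step, assuming Pillow's JPEG encoder (benchmark papers typically use the MATLAB encoder, so the exact artifacts may differ slightly):

```python
import io
from PIL import Image

def jpeg_degrade(img, quality=10):
    """Round-trip an image through JPEG compression at a given quality factor."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)
```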
Method | Training Dataset | Rain100L PSNR/SSIM | Rain100H PSNR/SSIM | SPAD PSNR/SSIM | Raindrop800 PSNR/SSIM | Snow100K PSNR/SSIM | Parameters |
---|---|---|---|---|---|---|---
ETDNet [132], 2021 | Rain100L and Rain100H | 41.09/0.986 | 32.35/0.9299 | - | - | - | 32.97M
SDNet [133], 2021 | Rain100L and Rain100H | 37.92/0.9843 | 28.26/0.8957 | - | - | - | 2.14M
IPT [59], 2021 | ImageNet | 41.62/0.9880 | - | - | - | - | 46.0M
Uformer [86], 2022 | SPAD | - | - | 47.84/0.9925 | - | - | 50.88M
Restormer [43], 2022 | Rain100L and Rain100H | 38.99/0.978 | 31.46/0.904 | - | - | - | 25.31M
TAPE [134], 2022 | Rain200H, Raindrop800 and Snow100K | 33.17 | - | - | 27.69 | 26.33 | 1.07M
TransWeather [135], 2022 | Raindrop800 and Snow100K | - | - | - | 34.55/0.9502 | 33.78/0.9287 | 31M
SiamTrans [136], 2022 | NTURain | 27.02/0.9024 | - | - | - | 26.05/0.8605 | -
Method | Type | Training Dataset | Test Dataset/Setting | PSNR/SSIM | Parameters |
---|---|---|---|---|---
RFormer [137], 2022 | Real fundus image restoration | 120 pairs of real fundus images | - | 28.38/0.863 | 21.11M
UIE-IPT [139], 2022 | Underwater image enhancement | UFO-120 | - | 23.14/0.90 | 46.0M
FRGAN [140], 2021 | Blind face restoration | VGGFace2 and CASIA-WebFace | - | 23.54/0.8199 | -
DocEnTr [141], 2022 | Handwritten document image enhancement | DIBCO | - | 20.81/- | -
DocEnTr [141], 2022 | Handwritten document image enhancement | H-DIBCO | - | 22.29/- | -
STUNet [142], 2022 | Blind face restoration | EDFace-Celeb-1M (BFR128) | - | 24.5500/0.6978 | -
STUNet [142], 2022 | Blind face restoration | EDFace-Celeb-150K (BFR512) | - | 27.1833/0.7346 | -
DuDoTrans [143], 2021 | Sparse-view CT reconstruction | NIH-AAPM | - | 32.68/0.9047 | -
DuDoTrans [143], 2021 | Sparse-view CT reconstruction | COVID-19 | - | 37.83/0.9727 | -
U2-Former [144], 2021 | Image reflection removal | PLNet | Real20 | 23.67/0.835 | -
U2-Former [144], 2021 | Image reflection removal | PLNet | Nature | 24.75/0.848 | -
U2-Former [144], 2021 | Image reflection removal | PLNet | Solid | 25.27/0.907 | -
U2-Former [144], 2021 | Image reflection removal | PLNet | Wild | 25.68/0.905 | -
U2-Former [144], 2021 | Image reflection removal | PLNet | Postcard | 22.43/0.889 | -
U2-Former [144], 2021 | Image rain removal | Rain100L | - | 39.31/0.982 | -
U2-Former [144], 2021 | Image rain removal | Rain100H | - | 30.87/0.899 | -
U2-Former [144], 2021 | Image haze removal | RESIDE | Indoor | 36.42/0.988 | -
U2-Former [144], 2021 | Image haze removal | RESIDE | Outdoor | 31.10/0.976 | -
SMIR [145], 2021 | MRI reconstruction at different sampling ratios | HCP | 10% | -/0.72 | 11M
SMIR [145], 2021 | MRI reconstruction at different sampling ratios | HCP | 20% | -/0.86 | 11M
SMIR [145], 2021 | MRI reconstruction at different sampling ratios | HCP | 30% | -/0.87 | 11M
SMIR [145], 2021 | MRI reconstruction at different sampling ratios | HCP | 40% | -/0.89 | 11M
SMIR [145], 2021 | MRI reconstruction at different sampling ratios | HCP | 50% | -/0.91 | 11M
Wang et al. [146], 2021 | Image reconstruction | ImageNet 20k | Set5 | 32.61/- | -
Wang et al. [146], 2021 | Image reconstruction | ImageNet 20k | Set14 | 28.92/- | -
Wang et al. [146], 2021 | Image reconstruction | ImageNet 20k | B100 | 27.78/- | -
Wang et al. [146], 2021 | Image reconstruction | ImageNet 20k | Urban100 | 26.82/- | -
Ali, A.M.; Benjdira, B.; Koubaa, A.; El-Shafai, W.; Khan, Z.; Boulila, W. Vision Transformers in Image Restoration: A Survey. Sensors 2023, 23, 2385. https://doi.org/10.3390/s23052385