A Comprehensive Survey of Deep Learning Approaches in Image Processing
Abstract
1. Introduction
- We provide an in-depth examination of the evolution of DL models in image processing, from foundational architectures to the latest advancements, highlighting the key developments that have shaped the field.
- The survey synthesizes various DL techniques that have been instrumental in advancing image processing, including those that enhance model efficiency, generalization, and robustness.
- We discuss the critical metrics used to evaluate DL models in image processing, offering a nuanced understanding of how these metrics are applied across different tasks.
- This survey identifies the persistent challenges in applying DL to image processing and explores potential future directions, including the integration of emerging technologies that could further advance the field.
2. Evolution of Deep Learning in Image Processing
2.1. Architectural Innovations
2.2. Specialized Architectures for Task-Specific Challenges
2.3. Expanding Capabilities with Transformers and Self-Attention
2.4. Integration of Generative Models
3. Deep Learning Techniques in Image Processing
3.1. Transfer Learning
3.2. Data Augmentation
3.3. Regularization Techniques
3.4. Adversarial Training
3.5. Self-Supervised and Unsupervised Learning
3.6. Domain Generalization and Adaptation
3.7. Meta-Learning
3.8. Prompt Learning
3.9. Model Compression and Optimization Techniques for Efficiency and Scalability
4. Advanced Deep Learning Models
4.1. Deep Residual Networks and Beyond
4.2. Attention Mechanisms and Transformers
4.3. Generative Models and Adversarial Networks
4.4. Hybrid and Multi-Modal Models
5. Evaluation Metrics for Image Processing Models
6. Applications of Deep Learning in Image Processing
6.1. Medical Imaging
6.2. Autonomous Systems
6.3. Remote Sensing and Environmental Monitoring
6.4. Security and Surveillance
6.5. Art and Cultural Heritage
6.6. Ethical and Social Considerations
6.7. Interdisciplinary Collaboration
7. Challenges and Future Directions
7.1. Challenges
7.2. Future Directions
8. Conclusions
Author Contributions
Funding
Conflicts of Interest
List of Abbreviations
| Acronym | Meaning |
| --- | --- |
| AI | Artificial Intelligence |
| DL | Deep Learning |
| ML | Machine Learning |
| GPUs | Graphics Processing Units |
| CNN | Convolutional Neural Network |
| ResNet | Residual Network |
| DenseNet | Densely Connected Convolutional Network |
| FCN | Fully Convolutional Network |
| R-CNN | Region-based Convolutional Neural Network |
| YOLO | You Only Look Once |
| NN | Neural Network |
| ConvNeXt | Next Generation of Convolutional Networks |
| ViT | Vision Transformer |
| GAN | Generative Adversarial Network |
| CGAN | Conditional GAN |
| WGAN | Wasserstein GAN |
| FGSM | Fast Gradient Sign Method |
| PGD | Projected Gradient Descent |
| MAT | Model-based Adversarial Training |
| UPGD | Universal Projected Gradient Descent |
| HTPL | Hierarchical Transfer Progressive Learning |
| RL | Reinforcement Learning |
| SimCLR | Simple Framework for Contrastive Learning of Visual Representations |
| DDC | Deep Domain Confusion |
| DICA | Domain-Invariant Component Analysis |
| DANN | Domain-Adversarial NN |
| MANN | Memory-Augmented NN |
| SNAIL | Simple Neural Attentive Meta-Learner |
| MAML | Model-Agnostic Meta-Learning |
| LSTM | Long Short-Term Memory |
| SGD | Stochastic Gradient Descent |
| CLIP | Contrastive Language-Image Pretraining |
| PRISM | Promptable and Robust Interactive Segmentation Model |
| MoCo | Momentum Contrast |
| NAS | Neural Architecture Search |
| TP | True Positive |
| TN | True Negative |
| FP | False Positive |
| FN | False Negative |
| TPR | True Positive Rate |
| FPR | False Positive Rate |
| IoU | Intersection over Union |
| AP | Average Precision |
| mAP | Mean AP |
| SSIM | Structural Similarity Index |
| FID | Fréchet Inception Distance |
| PSNR | Peak Signal-to-Noise Ratio |
| NCC | Normalized Cross-Correlation |
| ROC | Receiver Operating Characteristic |
| AUC | Area Under the ROC Curve |
| MCC | Matthews Correlation Coefficient |
| ADAS | Advanced Driver-Assistance System |
| XAI | Explainable AI |
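As a minimal illustration of two of the evaluation metrics listed above, the sketch below computes IoU for axis-aligned bounding boxes and PSNR from a mean squared error. This is an assumption-laden example, not the survey's own code: it assumes `(x1, y1, x2, y2)` box coordinates and 8-bit images (`max_val = 255`); both definitions are standard.

```python
import math

def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # Union = sum of areas minus the double-counted intersection.
    return inter / (area_a + area_b - inter)

def psnr(mse, max_val=255.0):
    """Peak Signal-to-Noise Ratio (in dB) from a mean squared error."""
    return 10 * math.log10(max_val ** 2 / mse)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # intersection 25, union 175 -> ~0.143
print(psnr(1.0))  # PSNR of an 8-bit image with MSE = 1
```

The same TP/FP/FN counts in the table above plug into precision (TP / (TP + FP)) and recall, i.e., TPR (TP / (TP + FN)), from which AP and mAP are derived.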
References
- Monga, V.; Li, Y.; Eldar, Y.C. Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing. IEEE Signal Process. Mag. 2021, 38, 18–44. [Google Scholar] [CrossRef]
- Banan, A.; Nasiri, A.; Taheri-Garavand, A. Deep learning-based appearance features extraction for automated carp species identification. Aquac. Eng. 2020, 89, 102053. [Google Scholar] [CrossRef]
- Wang, P.; Fan, E.; Wang, P. Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recognit. Lett. 2021, 141, 61–67. [Google Scholar] [CrossRef]
- Li, L.; Zhou, T.; Wang, W.; Li, J.; Yang, Y. Deep hierarchical semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–22 June 2022; pp. 1246–1257. [Google Scholar]
- Li, X.; Wang, T.; Cui, H.; Zhang, G.; Cheng, Q.; Dong, T.; Jiang, B. SARPointNet: An automated feature learning framework for spaceborne SAR image registration. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 6371–6381. [Google Scholar] [CrossRef]
- Alshayeji, M.; Al-Buloushi, J.; Ashkanani, A.; Abed, S. Enhanced brain tumor classification using an optimized multi-layered convolutional neural network architecture. Multimed. Tools Appl. 2021, 80, 28897–28917. [Google Scholar] [CrossRef]
- Duan, R.; Deng, H.; Tian, M.; Deng, Y.; Lin, J. SODA: A large-scale open site object detection dataset for deep learning in construction. Autom. Constr. 2022, 142, 104499. [Google Scholar] [CrossRef]
- Jeon, W.; Ko, G.; Lee, J.; Lee, H.; Ha, D.; Ro, W.W. Deep learning with GPUs. In Advances in Computers; Elsevier: Amsterdam, The Netherlands, 2021; Volume 122, pp. 167–215. [Google Scholar]
- Cai, L.; Gao, J.; Zhao, D. A review of the application of deep learning in medical image classification and segmentation. Ann. Transl. Med. 2020, 8. [Google Scholar] [CrossRef]
- Wang, X.; Zhao, Y.; Pourpanah, F. Recent advances in deep learning. Int. J. Mach. Learn. Cybern. 2020, 11, 747–750. [Google Scholar] [CrossRef]
- Liu, Y.; Pu, H.; Sun, D.W. Efficient extraction of deep image features using convolutional neural network (CNN) for applications in detecting and analysing complex food matrices. Trends Food Sci. Technol. 2021, 113, 193–204. [Google Scholar] [CrossRef]
- Hoeser, T.; Kuenzer, C. Object detection and image segmentation with deep learning on earth observation data: A review-part i: Evolution and recent trends. Remote Sens. 2020, 12, 1667. [Google Scholar] [CrossRef]
- Shin, D.; He, S.; Lee, G.M.; Whinston, A.B.; Cetintas, S.; Lee, K.C. Enhancing Social Media Analysis with Visual Data Analytics: A Deep Learning Approach; SSRN: Amsterdam, The Netherlands, 2020. [Google Scholar]
- Csurka, G.; Hospedales, T.M.; Salzmann, M.; Tommasi, T. Visual Domain Adaptation in the Deep Learning Era; Springer: Cham, Switzerland, 2022. [Google Scholar] [CrossRef]
- Lilhore, U.K.; Simaiya, S.; Kaur, A.; Prasad, D.; Khurana, M.; Verma, D.K.; Hassan, A. Impact of deep learning and machine learning in industry 4.0: Impact of deep learning. In Cyber-Physical, IoT, and Autonomous Systems in Industry 4.0; CRC Press: Boca Raton, FL, USA, 2021; pp. 179–197. [Google Scholar]
- Li, X.; Xiong, H.; Li, X.; Wu, X.; Zhang, X.; Liu, J.; Bian, J.; Dou, D. Interpretable deep learning: Interpretation, interpretability, trustworthiness, and beyond. Knowl. Inf. Syst. 2022, 64, 3197–3234. [Google Scholar] [CrossRef]
- Greenwald, N.F.; Miller, G.; Moen, E.; Kong, A.; Kagel, A.; Dougherty, T.; Fullaway, C.C.; McIntosh, B.J.; Leow, K.X.; Schwartz, M.S.; et al. Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Nat. Biotechnol. 2022, 40, 555–565. [Google Scholar] [CrossRef] [PubMed]
- Thompson, N.C.; Greenewald, K.; Lee, K.; Manso, G.F. The computational limits of deep learning. arXiv 2020, arXiv:2007.05558. [Google Scholar]
- Zhan, Z.H.; Li, J.Y.; Zhang, J. Evolutionary deep learning: A survey. Neurocomputing 2022, 483, 42–58. [Google Scholar] [CrossRef]
- Sarwinda, D.; Paradisa, R.H.; Bustamam, A.; Anggia, P. Deep learning in image classification using residual network (ResNet) variants for detection of colorectal cancer. Procedia Comput. Sci. 2021, 179, 423–431. [Google Scholar] [CrossRef]
- Liang, J. Image classification based on RESNET. In Proceedings of the Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2020; Volume 1634, p. 012110. [Google Scholar]
- Yu, D.; Yang, J.; Zhang, Y.; Yu, S. Additive DenseNet: Dense connections based on simple addition operations. J. Intell. Fuzzy Syst. 2021, 40, 5015–5025. [Google Scholar] [CrossRef]
- Chen, B.; Zhao, T.; Liu, J.; Lin, L. Multipath feature recalibration DenseNet for image classification. Int. J. Mach. Learn. Cybern. 2021, 12, 651–660. [Google Scholar] [CrossRef]
- Liu, M.; Chen, L.; Du, X.; Jin, L.; Shang, M. Activated gradients for deep neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 2156–2168. [Google Scholar] [CrossRef] [PubMed]
- Khan, S.D.; Basalamah, S. Multi-branch deep learning framework for land scene classification in satellite imagery. Remote Sens. 2023, 15, 3408. [Google Scholar] [CrossRef]
- Alom, M.Z.; Hasan, M.; Yakopcic, C.; Taha, T.M.; Asari, V.K. Inception recurrent convolutional neural network for object recognition. Mach. Vis. Appl. 2021, 32, 1–14. [Google Scholar] [CrossRef]
- Wang, Z.; Wang, Z.; Zeng, C.; Yu, Y.; Wan, X. High-quality image compressed sensing and reconstruction with multi-scale dilated convolutional neural network. Circuits Syst. Signal Process. 2023, 42, 1593–1616. [Google Scholar] [CrossRef]
- Bergamasco, L.; Bovolo, F.; Bruzzone, L. A dual-branch deep learning architecture for multisensor and multitemporal remote sensing semantic segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 2147–2162. [Google Scholar] [CrossRef]
- Ragab, M.G.; Abdulkader, S.J.; Muneer, A.; Alqushaibi, A.; Sumiea, E.H.; Qureshi, R.; Al-Selwi, S.M.; Alhussian, H. A Comprehensive Systematic Review of YOLO for Medical Object Detection (2018 to 2023). IEEE Access 2024, 12, 57815–57836. [Google Scholar] [CrossRef]
- Vijayakumar, A.; Vairavasundaram, S. Yolo-based object detection models: A review and its applications. Multimed. Tools Appl. 2024, 83, 83535–83574. [Google Scholar] [CrossRef]
- Qi, J.; Nguyen, M.; Yan, W.Q. Waste classification from digital images using ConvNeXt. In Proceedings of the 10th Pacific-Rim Symposium on Image and Video Technology, Online, 25–28 November 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 1–13. [Google Scholar]
- Todi, A.; Narula, N.; Sharma, M.; Gupta, U. ConvNext: A Contemporary Architecture for Convolutional Neural Networks for Image Classification. In Proceedings of the 3rd International Conference on Innovative Sustainable Computational Technologies (CISCT), Dehradun, India, 8–9 September 2023; pp. 1–6. [Google Scholar]
- Ramos, L.; Casas, E.; Romero, C.; Rivas-Echeverría, F.; Morocho-Cayamcela, M.E. A study of convnext architectures for enhanced image captioning. IEEE Access 2024, 12, 13711–13728. [Google Scholar] [CrossRef]
- Mou, L.; Hua, Y.; Zhu, X.X. Relation matters: Relational context-aware fully convolutional network for semantic segmentation of high-resolution aerial images. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7557–7569. [Google Scholar] [CrossRef]
- Du, G.; Cao, X.; Liang, J.; Chen, X.; Zhan, Y. Medical Image Segmentation based on U-Net: A Review. J. Imaging Sci. Technol. 2020, 64, 1. [Google Scholar] [CrossRef]
- Li, H.; Wang, W.; Wang, M.; Li, L.; Vimlund, V. A review of deep learning methods for pixel-level crack detection. J. Traffic Transp. Eng. (Engl. Ed.) 2022, 9, 945–968. [Google Scholar] [CrossRef]
- Yang, H.; Huang, C.; Wang, L.; Luo, X. An improved encoder–decoder network for ore image segmentation. IEEE Sensors J. 2020, 21, 11469–11475. [Google Scholar] [CrossRef]
- Lin, K.; Zhao, H.; Lv, J.; Li, C.; Liu, X.; Chen, R.; Zhao, R. Face Detection and Segmentation Based on Improved Mask R-CNN. Discret. Dyn. Nat. Soc. 2020, 2020, 9242917. [Google Scholar] [CrossRef]
- Muhammad, K.; Hussain, T.; Ullah, H.; Del Ser, J.; Rezaei, M.; Kumar, N.; Hijji, M.; Bellavista, P.; de Albuquerque, V.H.C. Vision-based semantic segmentation in scene understanding for autonomous driving: Recent achievements, challenges, and outlooks. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22694–22715. [Google Scholar] [CrossRef]
- Nie, X.; Duan, M.; Ding, H.; Hu, B.; Wong, E.K. Attention mask R-CNN for ship detection and segmentation from remote sensing images. IEEE Access 2020, 8, 9325–9334. [Google Scholar] [CrossRef]
- Li, S.; Zhao, X. Pixel-level detection and measurement of concrete crack using faster region-based convolutional neural network and morphological feature extraction. Meas. Sci. Technol. 2021, 32, 065010. [Google Scholar] [CrossRef]
- Udendhran, R.; Balamurugan, M.; Suresh, A.; Varatharajan, R. Enhancing image processing architecture using deep learning for embedded vision systems. Microprocess. Microsystems 2020, 76, 103094. [Google Scholar] [CrossRef]
- Khan, A.; Rauf, Z.; Khan, A.R.; Rathore, S.; Khan, S.H.; Shah, N.S.; Farooq, U.; Asif, H.; Asif, A.; Zahoora, U.; et al. A recent survey of vision transformers for medical image segmentation. arXiv 2023, arXiv:2312.00634. [Google Scholar]
- Liu, Q.; Xu, Z.; Bertasius, G.; Niethammer, M. Simpleclick: Interactive image segmentation with simple vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 22290–22300. [Google Scholar]
- Qian, X.; Zhang, C.; Chen, L.; Li, K. Deep learning-based identification of maize leaf diseases is improved by an attention mechanism: Self-attention. Front. Plant Sci. 2022, 13, 864486. [Google Scholar] [CrossRef]
- Azad, R.; Kazerouni, A.; Heidari, M.; Aghdam, E.K.; Molaei, A.; Jia, Y.; Jose, A.; Roy, R.; Merhof, D. Advances in medical image analysis with vision transformers: A comprehensive review. Med. Image Anal. 2023, 91, 103000. [Google Scholar] [CrossRef] [PubMed]
- Hassani, A.; Walton, S.; Shah, N.; Abuduweili, A.; Li, J.; Shi, H. Escaping the big data paradigm with compact transformers. arXiv 2021, arXiv:2104.05704. [Google Scholar]
- Zhao, H.; Jia, J.; Koltun, V. Exploring self-attention for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10076–10085. [Google Scholar]
- Li, S.; Wu, C.; Xiong, N. Hybrid architecture based on CNN and transformer for strip steel surface defect classification. Electronics 2022, 11, 1200. [Google Scholar] [CrossRef]
- Fang, J.; Lin, H.; Chen, X.; Zeng, K. A hybrid network of cnn and transformer for lightweight image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1103–1112. [Google Scholar]
- Sun, Q.; Fang, N.; Liu, Z.; Zhao, L.; Wen, Y.; Lin, H. HybridCTrm: Bridging CNN and transformer for multimodal brain image segmentation. J. Healthc. Eng. 2021, 2021, 7467261. [Google Scholar] [CrossRef]
- Akil, M.; Saouli, R.; Kachouri, R. Fully automatic brain tumor segmentation with deep learning-based selective attention using overlapping patches and multi-class weighted cross-entropy. Med. Image Anal. 2020, 63, 101692. [Google Scholar]
- Kumar, V.R.; Yogamani, S.; Milz, S.; Mäder, P. FisheyeDistanceNet++: Self-supervised fisheye distance estimation with self-attention, robust loss function and camera view generalization. Electron. Imaging 2021, 33, 1–11. [Google Scholar]
- Gong, M.; Chen, S.; Chen, Q.; Zeng, Y.; Zhang, Y. Generative adversarial networks in medical image processing. Curr. Pharm. Des. 2021, 27, 1856–1868. [Google Scholar] [CrossRef] [PubMed]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Christophe, S.; Mermet, S.; Laurent, M.; Touya, G. Neural map style transfer exploration with GANs. Int. J. Cartogr. 2022, 8, 18–36. [Google Scholar] [CrossRef]
- Chen, H. Challenges and corresponding solutions of generative adversarial networks (GANs): A survey study. In Proceedings of the Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2021; Volume 1827, p. 012066. [Google Scholar]
- Qin, Z.; Liu, Z.; Zhu, P.; Ling, W. Style transfer in conditional GANs for cross-modality synthesis of brain magnetic resonance images. Comput. Biol. Med. 2022, 148, 105928. [Google Scholar] [CrossRef] [PubMed]
- Kim, C.; Park, S.; Hwang, H.J. Local stability of wasserstein GANs with abstract gradient penalty. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 4527–4537. [Google Scholar] [CrossRef] [PubMed]
- Zeng, Q.; Ma, X.; Cheng, B.; Zhou, E.; Pang, W. Gans-based data augmentation for citrus disease severity detection using deep learning. IEEE Access 2020, 8, 172882–172891. [Google Scholar] [CrossRef]
- Balaji, Y.; Chellappa, R.; Feizi, S. Robust optimal transport with applications in generative modeling and domain adaptation. Adv. Neural Inf. Process. Syst. 2020, 33, 12934–12944. [Google Scholar]
- Figueira, A.; Vaz, B. Survey on synthetic data generation, evaluation methods and GANs. Mathematics 2022, 10, 2733. [Google Scholar] [CrossRef]
- Kazeminia, S.; Baur, C.; Kuijper, A.; van Ginneken, B.; Navab, N.; Albarqouni, S.; Mukhopadhyay, A. GANs for medical image analysis. Artif. Intell. Med. 2020, 109, 101938. [Google Scholar] [CrossRef]
- Yamaguchi, S.; Kanai, S.; Eda, T. Effective data augmentation with multi-domain learning gans. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 6566–6574. [Google Scholar]
- Croitoru, F.A.; Hondru, V.; Ionescu, R.T.; Shah, M. Diffusion models in vision: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10850–10869. [Google Scholar] [CrossRef] [PubMed]
- Cao, H.; Tan, C.; Gao, Z.; Xu, Y.; Chen, G.; Heng, P.A.; Li, S.Z. A survey on generative diffusion models. IEEE Trans. Knowl. Data Eng. 2024, 36, 2814–2830. [Google Scholar] [CrossRef]
- Iman, M.; Arabnia, H.R.; Rasheed, K. A review of deep transfer learning and recent advancements. Technologies 2023, 11, 40. [Google Scholar] [CrossRef]
- Matsoukas, C.; Haslum, J.F.; Sorkhei, M.; Söderberg, M.; Smith, K. What makes transfer learning work for medical images: Feature reuse & other factors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9225–9234. [Google Scholar]
- Alzubaidi, L.; Fadhel, M.A.; Al-Shamma, O.; Zhang, J.; Santamaría, J.; Duan, Y.; R. Oleiwi, S. Towards a better understanding of transfer learning for medical imaging: A case study. Appl. Sci. 2020, 10, 4523. [Google Scholar] [CrossRef]
- Alzubaidi, L.; Al-Amidie, M.; Al-Asadi, A.; Humaidi, A.J.; Al-Shamma, O.; Fadhel, M.A.; Zhang, J.; Santamaría, J.; Duan, Y. Novel transfer learning approach for medical imaging with limited labeled data. Cancers 2021, 13, 1590. [Google Scholar] [CrossRef] [PubMed]
- Chen, H.; Wang, Y.; Guo, T.; Xu, C.; Deng, Y.; Liu, Z.; Ma, S.; Xu, C.; Xu, C.; Gao, W. Pre-trained image processing transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12299–12310. [Google Scholar]
- Gupta, J.; Pathak, S.; Kumar, G. Deep learning (CNN) and transfer learning: A review. In Proceedings of the Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2022; Volume 2273, p. 012029. [Google Scholar]
- Kim, H.E.; Cosa-Linan, A.; Santhanam, N.; Jannesari, M.; Maros, M.E.; Ganslandt, T. Transfer learning for medical image classification: A literature review. BMC Med. Imaging 2022, 22, 69. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z. Mitigating Negative Transfer for Better Generalization and Efficiency in Transfer Learning. Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2022. [Google Scholar]
- Agarwal, N.; Sondhi, A.; Chopra, K.; Singh, G. Transfer learning: Survey and classification. Smart Innov. Commun. Comput. Sci. Proc. ICSICCS 2020 2021, 1168, 145–155. [Google Scholar]
- Zhang, W.; Deng, L.; Zhang, L.; Wu, D. A survey on negative transfer. IEEE/CAA J. Autom. Sin. 2022, 10, 305–329. [Google Scholar] [CrossRef]
- Yang, Y.; Huang, L.K.; Wei, Y. Concept-wise Fine-tuning Matters in Preventing Negative Transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 18753–18763. [Google Scholar]
- Chen, X.; Tao, H.; Zhou, H.; Zhou, P.; Deng, Y. Hierarchical and progressive learning with key point sensitive loss for sonar image classification. Multimed. Syst. 2024, 30, 1–16. [Google Scholar] [CrossRef]
- Yang, S.; Xiao, W.; Zhang, M.; Guo, S.; Zhao, J.; Shen, F. Image data augmentation for deep learning: A survey. arXiv 2022, arXiv:2204.08610. [Google Scholar]
- Maharana, K.; Mondal, S.; Nemade, B. A review: Data pre-processing and data augmentation techniques. Glob. Transit. Proc. 2022, 3, 91–99. [Google Scholar] [CrossRef]
- Xu, M.; Yoon, S.; Fuentes, A.; Park, D.S. A comprehensive survey of image augmentation techniques for deep learning. Pattern Recognit. 2023, 137, 109347. [Google Scholar] [CrossRef]
- Rebuffi, S.A.; Gowal, S.; Calian, D.A.; Stimberg, F.; Wiles, O.; Mann, T.A. Data augmentation can improve robustness. Adv. Neural Inf. Process. Syst. 2021, 34, 29935–29948. [Google Scholar]
- Li, P.; Li, D.; Li, W.; Gong, S.; Fu, Y.; Hospedales, T.M. A simple feature augmentation for domain generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 8886–8895. [Google Scholar]
- Mumuni, A.; Mumuni, F. Data augmentation: A comprehensive survey of modern approaches. Array 2022, 16, 100258. [Google Scholar] [CrossRef]
- Termritthikun, C.; Jamtsho, Y.; Muneesawang, P. An improved residual network model for image recognition using a combination of snapshot ensembles and the cutout technique. Multimed. Tools Appl. 2020, 79, 1475–1495. [Google Scholar] [CrossRef]
- Galdran, A.; Carneiro, G.; González Ballester, M.A. Balanced-mixup for highly imbalanced medical image classification. In Proceedings of the 24th International Conference on Medical Image Computing and Computer Assisted Intervention, Strasbourg, France, 27 September–1 October 2021; pp. 323–333. [Google Scholar]
- Walawalkar, D.; Shen, Z.; Liu, Z.; Savvides, M. Attentive cutmix: An enhanced data augmentation approach for deep learning based image classification. arXiv 2020, arXiv:2003.13048. [Google Scholar]
- Yun, J.P.; Shin, W.C.; Koo, G.; Kim, M.S.; Lee, C.; Lee, S.J. Automated defect inspection system for metal surfaces based on deep learning and data augmentation. J. Manuf. Syst. 2020, 55, 317–324. [Google Scholar] [CrossRef]
- Tian, K.; Lin, C.; Sun, M.; Zhou, L.; Yan, J.; Ouyang, W. Improving auto-augment via augmentation-wise weight sharing. Adv. Neural Inf. Process. Syst. 2020, 33, 19088–19098. [Google Scholar]
- Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. Randaugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 702–703. [Google Scholar]
- Moradi, R.; Berangi, R.; Minaei, B. A survey of regularization strategies for deep models. Artif. Intell. Rev. 2020, 53, 3947–3986. [Google Scholar] [CrossRef]
- Nandini, G.S.; Kumar, A.S.; Chidananda, K. Dropout technique for image classification based on extreme learning machine. Glob. Transit. Proc. 2021, 2, 111–116. [Google Scholar] [CrossRef]
- Garbin, C.; Zhu, X.; Marques, O. Dropout vs. batch normalization: An empirical study of their impact to deep learning. Multimed. Tools Appl. 2020, 79, 12777–12815. [Google Scholar] [CrossRef]
- Wu, L.; Li, J.; Wang, Y.; Meng, Q.; Qin, T.; Chen, W.; Zhang, M.; Liu, T.Y. R-drop: Regularized dropout for neural networks. Adv. Neural Inf. Process. Syst. 2021, 34, 10890–10905. [Google Scholar]
- Andriushchenko, M.; D’Angelo, F.; Varre, A.; Flammarion, N. Why Do We Need Weight Decay in Modern Deep Learning? arXiv 2023, arXiv:2310.04415. [Google Scholar]
- Li, X.; Chen, S.; Yang, J. Understanding the disharmony between weight normalization family and weight decay. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 4715–4722. [Google Scholar]
- De, S.; Smith, S. Batch normalization biases residual blocks towards the identity function in deep networks. Adv. Neural Inf. Process. Syst. 2020, 33, 19964–19975. [Google Scholar]
- Awais, M.; Iqbal, M.T.B.; Bae, S.H. Revisiting internal covariate shift for batch normalization. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 5082–5092. [Google Scholar] [CrossRef] [PubMed]
- Zhao, W.; Alwidian, S.; Mahmoud, Q.H. Adversarial training methods for deep learning: A systematic review. Algorithms 2022, 15, 283. [Google Scholar] [CrossRef]
- Allen-Zhu, Z.; Li, Y. Feature purification: How adversarial training performs robust deep learning. In Proceedings of the 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), Denver, CO, USA, 7–10 February 2022; pp. 977–988. [Google Scholar]
- Chang, C.L.; Hung, J.L.; Tien, C.W.; Tien, C.W.; Kuo, S.Y. Evaluating robustness of ai models against adversarial attacks. In Proceedings of the 1st ACM Workshop on Security and Privacy on Artificial Intelligence, Taipei, Taiwan, 6 October 2020; pp. 47–54. [Google Scholar]
- Silva, S.H.; Najafirad, P. Opportunities and challenges in deep learning adversarial robustness: A survey. arXiv 2020, arXiv:2007.00753. [Google Scholar]
- Xie, C.; Tan, M.; Gong, B.; Wang, J.; Yuille, A.L.; Le, Q.V. Adversarial examples improve image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 819–828. [Google Scholar]
- Naqvi, S.M.A.; Shabaz, M.; Khan, M.A.; Hassan, S.I. Adversarial attacks on visual objects using the fast gradient sign method. J. Grid Comput. 2023, 21, 52. [Google Scholar] [CrossRef]
- Lanfredi, R.B.; Schroeder, J.D.; Tasdizen, T. Quantifying the preferential direction of the model gradient in adversarial training with projected gradient descent. Pattern Recognit. 2023, 139, 109430. [Google Scholar] [CrossRef] [PubMed]
- Wong, E.; Rice, L.; Kolter, J.Z. Fast is better than free: Revisiting adversarial training. arXiv 2020, arXiv:2001.03994. [Google Scholar]
- Deng, Y.; Karam, L.J. Universal adversarial attack via enhanced projected gradient descent. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Virtual Conference, Abu Dhabi, United Arab Emirates, 25–28 September 2020; pp. 1241–1245. [Google Scholar]
- Robey, A.; Hassani, H.; Pappas, G.J. Model-based robust deep learning: Generalizing to natural, out-of-distribution data. arXiv 2020, arXiv:2005.10247. [Google Scholar]
- Schmarje, L.; Santarossa, M.; Schröder, S.M.; Koch, R. A survey on semi-, self-and unsupervised learning for image classification. IEEE Access 2021, 9, 82146–82168. [Google Scholar] [CrossRef]
- Yuan, Y.; Wang, C.; Jiang, Z. Proxy-based deep learning framework for spectral–spatial hyperspectral image classification: Efficient and robust. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–15. [Google Scholar] [CrossRef]
- Jaiswal, A.; Babu, A.R.; Zadeh, M.Z.; Banerjee, D.; Makedon, F. A survey on contrastive self-supervised learning. Technologies 2020, 9, 2. [Google Scholar] [CrossRef]
- Li, Y.; Chen, J.; Zheng, Y. A multi-task self-supervised learning framework for scopy images. In Proceedings of the 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 2005–2009. [Google Scholar]
- Chen, S.; Xue, J.H.; Chang, J.; Zhang, J.; Yang, J.; Tian, Q. SSL++: Improving self-supervised learning by mitigating the proxy task-specificity problem. IEEE Trans. Image Process. 2021, 31, 1134–1148. [Google Scholar] [CrossRef]
- Wang, C.; Wu, Y.; Qian, Y.; Kumatani, K.; Liu, S.; Wei, F.; Zeng, M.; Huang, X. Unispeech: Unified speech representation learning with labeled and unlabeled data. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 10937–10947. [Google Scholar]
- Ericsson, L.; Gouk, H.; Loy, C.C.; Hospedales, T.M. Self-supervised representation learning: Introduction, advances, and challenges. IEEE Signal Process. Mag. 2022, 39, 42–62. [Google Scholar] [CrossRef]
- Chen, X.; Ding, M.; Wang, X.; Xin, Y.; Mo, S.; Wang, Y.; Han, S.; Luo, P.; Zeng, G.; Wang, J. Context autoencoder for self-supervised representation learning. Int. J. Comput. Vis. 2024, 132, 208–223. [Google Scholar] [CrossRef]
- Albelwi, S. Survey on self-supervised learning: Auxiliary pretext tasks and contrastive learning methods in imaging. Entropy 2022, 24, 551. [Google Scholar] [CrossRef] [PubMed]
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, Virtual, 12–18 July 2020; pp. 1597–1607. [Google Scholar]
- Ci, Y.; Lin, C.; Bai, L.; Ouyang, W. Fast-MoCo: Boost momentum-based contrastive learning with combinatorial patches. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 290–306. [Google Scholar]
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738. [Google Scholar]
- Wang, Y.; Zhang, J.; Kan, M.; Shan, S.; Chen, X. Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12275–12284. [Google Scholar]
- Diba, A.; Sharma, V.; Safdari, R.; Lotfi, D.; Sarfraz, S.; Stiefelhagen, R.; Van Gool, L. Vi2clr: Video and image for visual contrastive learning of representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1502–1512. [Google Scholar]
- Allaoui, M.; Kherfi, M.L.; Cheriet, A. Considerably improving clustering algorithms using UMAP dimensionality reduction technique: A comparative study. In Proceedings of the International Conference on Image and Signal Processing, Virtual, 23–25 October 2020; pp. 317–325. [Google Scholar]
- Zebari, R.; Abdulazeez, A.; Zeebaree, D.; Zebari, D.; Saeed, J. A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction. J. Appl. Sci. Technol. Trends 2020, 1, 56–70. [Google Scholar] [CrossRef]
- Nalepa, J.; Myller, M.; Imai, Y.; Honda, K.i.; Takeda, T.; Antoniak, M. Unsupervised segmentation of hyperspectral images using 3-D convolutional autoencoders. IEEE Geosci. Remote Sens. Lett. 2020, 17, 1948–1952. [Google Scholar] [CrossRef]
- Raza, K.; Singh, N.K. A tour of unsupervised deep learning for medical image analysis. Curr. Med. Imaging 2021, 17, 1059–1077. [Google Scholar]
- Rai, S.; Bhatt, J.S.; Patra, S.K. An unsupervised deep learning framework for medical image denoising. arXiv 2021, arXiv:2103.06575. [Google Scholar]
- Kim, W.; Kanezaki, A.; Tanaka, M. Unsupervised learning of image segmentation based on differentiable feature clustering. IEEE Trans. Image Process. 2020, 29, 8055–8068. [Google Scholar] [CrossRef]
- Yoon, J.S.; Oh, K.; Shin, Y.; Mazurowski, M.A.; Suk, H.I. Domain Generalization for Medical Image Analysis: A Review. Proc. IEEE 2024, 112, 1583–1609. [Google Scholar] [CrossRef]
- Zhou, K.; Liu, Z.; Qiao, Y.; Xiang, T.; Loy, C.C. Domain generalization: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 4396–4415. [Google Scholar] [CrossRef] [PubMed]
- Zhang, W.; Wang, F.; Jiang, Y.; Xu, Z.; Wu, S.; Zhang, Y. Cross-subject EEG-based emotion recognition with deep domain confusion. In Proceedings of the 12th International Conference on Intelligent Robotics and Applications (ICIRA), Shenyang, China, 8–11 August 2019; pp. 558–570. [Google Scholar]
- Wang, F.; Han, Z.; Gong, Y.; Yin, Y. Exploring domain-invariant parameters for source free domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7151–7160. [Google Scholar]
- Khoee, A.G.; Yu, Y.; Feldt, R. Domain generalization through meta-learning: A survey. Artif. Intell. Rev. 2024, 57, 285. [Google Scholar] [CrossRef]
- Sicilia, A.; Zhao, X.; Hwang, S.J. Domain adversarial neural networks for domain generalization: When it works and how to improve. Mach. Learn. 2023, 112, 2685–2721. [Google Scholar] [CrossRef]
- Liu, Y.; Chen, A.; Shi, H.; Huang, S.; Zheng, W.; Liu, Z.; Zhang, Q.; Yang, X. CT synthesis from MRI using multi-cycle GAN for head-and-neck radiation therapy. Comput. Med. Imaging Graph. 2021, 91, 101953. [Google Scholar] [CrossRef] [PubMed]
- Ostankovich, V.; Yagfarov, R.; Rassabin, M.; Gafurov, S. Application of CycleGAN-based augmentation for autonomous driving at night. In Proceedings of the International Conference Nonlinearity, Information and Robotics (NIR), Innopolis, Russia, 3–6 December 2020; pp. 1–5. [Google Scholar]
- Huisman, M.; Van Rijn, J.N.; Plaat, A. A survey of deep meta-learning. Artif. Intell. Rev. 2021, 54, 4483–4541. [Google Scholar] [CrossRef]
- Tian, Y.; Zhao, X.; Huang, W. Meta-learning approaches for learning-to-learn in deep learning: A survey. Neurocomputing 2022, 494, 203–223. [Google Scholar] [CrossRef]
- Luo, S.; Li, Y.; Gao, P.; Wang, Y.; Serikawa, S. Meta-seg: A survey of meta-learning for image segmentation. Pattern Recognit. 2022, 126, 108586. [Google Scholar] [CrossRef]
- He, K.; Pu, N.; Lao, M.; Lew, M.S. Few-shot and meta-learning methods for image understanding: A survey. Int. J. Multimed. Inf. Retr. 2023, 12, 14. [Google Scholar] [CrossRef]
- Jha, A. In the Era of Prompt Learning with Vision-Language Models. arXiv 2024, arXiv:2411.04892. [Google Scholar]
- Zhou, K.; Yang, J.; Loy, C.C.; Liu, Z. Learning to prompt for vision-language models. Int. J. Comput. Vis. 2022, 130, 2337–2348. [Google Scholar] [CrossRef]
- Fang, A.; Ilharco, G.; Wortsman, M.; Wan, Y.; Shankar, V.; Dave, A.; Schmidt, L. Data determines distributional robustness in contrastive language image pre-training (CLIP). In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 6216–6234. [Google Scholar]
- Li, Y.; Wang, H.; Duan, Y.; Xu, H.; Li, X. Exploring visual interpretability for contrastive language-image pre-training. arXiv 2022, arXiv:2209.07046. [Google Scholar]
- Liu, J.; Wang, H.; Yin, W.; Sonke, J.J.; Gavves, E. Click prompt learning with optimal transport for interactive segmentation. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 93–110. [Google Scholar]
- Rao, A.; Fisher, A.; Chang, K.; Panagides, J.C.; McNamara, K.; Lee, J.Y.; Aalami, O. IMIL: Interactive Medical Image Learning Framework. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 5241–5250. [Google Scholar]
- Li, H.; Liu, H.; Hu, D.; Wang, J.; Oguz, I. PRISM: A promptable and robust interactive segmentation model with visual prompts. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Marrakesh, Morocco, 6–10 October 2024; pp. 389–399. [Google Scholar]
- Marinov, Z.; Jäger, P.F.; Egger, J.; Kleesiek, J.; Stiefelhagen, R. Deep interactive segmentation of medical images: A systematic review and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10998–11018. [Google Scholar] [CrossRef] [PubMed]
- Jain, P.; Ienco, D.; Interdonato, R.; Berchoux, T.; Marcos, D. SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting. arXiv 2024, arXiv:2412.08536. [Google Scholar]
- Zhao, M.; Li, M.; Peng, S.L.; Li, J. A novel deep learning model compression algorithm. Electronics 2022, 11, 1066. [Google Scholar] [CrossRef]
- Mohammed, S.B.; Krothapalli, B.; Althat, C. Advanced Techniques for Storage Optimization in Resource-Constrained Systems Using AI and Machine Learning. J. Sci. Technol. 2023, 4, 89–125. [Google Scholar]
- Vadera, S.; Ameen, S. Methods for pruning deep neural networks. IEEE Access 2022, 10, 63280–63300. [Google Scholar] [CrossRef]
- Cheng, H.; Zhang, M.; Shi, J.Q. A Survey on Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, and Recommendations. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10558–10578. [Google Scholar] [CrossRef] [PubMed]
- Daghero, F.; Pagliari, D.J.; Poncino, M. Energy-efficient deep learning inference on edge devices. In Advances in Computers; Academic Press Inc.: Cambridge, MA, USA, 2021; Volume 122, pp. 247–301. [Google Scholar]
- Abdolrasol, M.G.; Hussain, S.S.; Ustun, T.S.; Sarker, M.R.; Hannan, M.A.; Mohamed, R.; Ali, J.A.; Mekhilef, S.; Milad, A. Artificial neural networks based optimization techniques: A review. Electronics 2021, 10, 2689. [Google Scholar] [CrossRef]
- Zhang, W.; Ji, M.; Yu, H.; Zhen, C. ReLP: Reinforcement learning pruning method based on prior knowledge. Neural Process. Lett. 2023, 55, 4661–4678. [Google Scholar] [CrossRef]
- Zakariyya, I.; Kalutarage, H.; Al-Kadri, M.O. Towards a robust, effective and resource efficient machine learning technique for IoT security monitoring. Comput. Secur. 2023, 133, 103388. [Google Scholar] [CrossRef]
- Rokh, B.; Azarpeyvand, A.; Khanteymoori, A. A comprehensive survey on model quantization for deep neural networks in image classification. ACM Trans. Intell. Syst. Technol. 2023, 14, 1–50. [Google Scholar] [CrossRef]
- Qin, H.; Zhang, Y.; Ding, Y.; Liu, X.; Danelljan, M.; Yu, F. QuantSR: Accurate low-bit quantization for efficient image super-resolution. Adv. Neural Inf. Process. Syst. 2024, 36. [Google Scholar]
- Alkhulaifi, A.; Alsahli, F.; Ahmad, I. Knowledge distillation in deep learning and its applications. PeerJ Comput. Sci. 2021, 7, e474. [Google Scholar] [CrossRef]
- Xu, Q.; Li, Y.; Shen, J.; Liu, J.K.; Tang, H.; Pan, G. Constructing deep spiking neural networks from artificial neural networks with knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7886–7895. [Google Scholar]
- Wang, J.; Wu, Y.; Liu, M.; Yang, M.; Liang, H. A real-time trajectory optimization method for hypersonic vehicles based on a deep neural network. Aerospace 2022, 9, 188. [Google Scholar] [CrossRef]
- Zhang, L.; Bao, C.; Ma, K. Self-distillation: Towards efficient and compact neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4388–4403. [Google Scholar] [CrossRef]
- Tian, G.; Chen, J.; Zeng, X.; Liu, Y. Pruning by training: A novel deep neural network compression framework for image processing. IEEE Signal Process. Lett. 2021, 28, 344–348. [Google Scholar] [CrossRef]
- Weng, O. Neural network quantization for efficient inference: A survey. arXiv 2021, arXiv:2112.06126. [Google Scholar]
- Tang, J.; Shivanna, R.; Zhao, Z.; Lin, D.; Singh, A.; Chi, E.H.; Jain, S. Understanding and improving knowledge distillation. arXiv 2020, arXiv:2002.03532. [Google Scholar]
- Luo, S.; Fang, G.; Song, M. Deep semantic image compression via cooperative network pruning. J. Vis. Commun. Image Represent. 2023, 95, 103897. [Google Scholar] [CrossRef]
- Shafiq, M.; Gu, Z. Deep residual learning for image recognition: A survey. Appl. Sci. 2022, 12, 8972. [Google Scholar] [CrossRef]
- Xie, G.; Ren, J.; Marshall, S.; Zhao, H.; Li, R.; Chen, R. Self-attention enhanced deep residual network for spatial image steganalysis. Digit. Signal Process. 2023, 139, 104063. [Google Scholar] [CrossRef]
- Liu, F.; Ren, X.; Zhang, Z.; Sun, X.; Zou, Y. Rethinking skip connection with layer normalization in transformers and resnets. arXiv 2021, arXiv:2105.07205. [Google Scholar]
- Shehab, L.H.; Fahmy, O.M.; Gasser, S.M.; El-Mahallawy, M.S. An efficient brain tumor image segmentation based on deep residual networks (ResNets). J. King Saud Univ. Eng. Sci. 2021, 33, 404–412. [Google Scholar] [CrossRef]
- Alotaibi, B.; Alotaibi, M. A hybrid deep ResNet and inception model for hyperspectral image classification. PFG–J. Photogramm. Remote Sens. Geoinf. Sci. 2020, 88, 463–476. [Google Scholar] [CrossRef]
- Zhang, C.; Benz, P.; Argaw, D.M.; Lee, S.; Kim, J.; Rameau, F.; Bazin, J.C.; Kweon, I.S. ResNet or DenseNet? Introducing dense shortcuts to ResNet. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 3550–3559. [Google Scholar]
- Yadav, D.; Jalal, A.; Garlapati, D.; Hossain, K.; Goyal, A.; Pant, G. Deep learning-based ResNeXt model in phycological studies for future. Algal Res. 2020, 50, 102018. [Google Scholar] [CrossRef]
- Hasan, N.; Bao, Y.; Shawon, A.; Huang, Y. DenseNet convolutional neural networks application for predicting COVID-19 using CT image. SN Comput. Sci. 2021, 2, 389. [Google Scholar] [CrossRef] [PubMed]
- Liu, J.W.; Liu, J.W.; Luo, X.L. Research progress in attention mechanism in deep learning. Chin. J. Eng. 2021, 43, 1499–1511. [Google Scholar]
- Ghaffarian, S.; Valente, J.; Van Der Voort, M.; Tekinerdogan, B. Effect of attention mechanism in deep learning-based remote sensing image processing: A systematic literature review. Remote Sens. 2021, 13, 2965. [Google Scholar] [CrossRef]
- Osman, A.A.; Shalaby, M.A.W.; Soliman, M.M.; Elsayed, K.M. A survey on attention-based models for image captioning. Int. J. Adv. Comput. Sci. Appl. 2023, 14. [Google Scholar] [CrossRef]
- Zhao, J.; Hou, X.; Pan, M.; Zhang, H. Attention-based generative adversarial network in medical imaging: A narrative review. Comput. Biol. Med. 2022, 149, 105948. [Google Scholar] [CrossRef]
- Liu, Y.; Shao, Z.; Hoffmann, N. Global attention mechanism: Retain information to enhance channel-spatial interactions. arXiv 2021, arXiv:2112.05561. [Google Scholar]
- Li, J.; Yan, Y.; Liao, S.; Yang, X.; Shao, L. Local-to-global self-attention in vision transformers. arXiv 2021, arXiv:2107.04735. [Google Scholar]
- Mehrani, P.; Tsotsos, J.K. Self-attention in vision transformers performs perceptual grouping, not attention. Front. Comput. Sci. 2023, 5, 1178450. [Google Scholar] [CrossRef]
- Chen, X.; Pan, J.; Lu, J.; Fan, Z.; Li, H. Hybrid CNN-transformer feature fusion for single image deraining. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 378–386. [Google Scholar]
- Sardar, A.S.; Ranjan, V. Enhancing Computer Vision Performance: A Hybrid Deep Learning Approach with CNNs and Vision Transformers. In Proceedings of the International Conference on Computer Vision and Image Processing, Jammu, India, 3–5 November 2023; pp. 591–602. [Google Scholar]
- Zhang, Z.; Jiang, Y.; Jiang, J.; Wang, X.; Luo, P.; Gu, J. STAR: A structure-aware lightweight transformer for real-time image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 4106–4115. [Google Scholar]
- Wang, L.; Chen, W.; Yang, W.; Bi, F.; Yu, F.R. A state-of-the-art review on image synthesis with generative adversarial networks. IEEE Access 2020, 8, 63514–63537. [Google Scholar] [CrossRef]
- Shamsolmoali, P.; Zareapoor, M.; Granger, E.; Zhou, H.; Wang, R.; Celebi, M.E.; Yang, J. Image synthesis with adversarial networks: A comprehensive survey and case studies. Inf. Fusion 2021, 72, 126–146. [Google Scholar] [CrossRef]
- Lee, I.H.; Chung, W.Y.; Park, C.G. Style transformation super-resolution GAN for extremely small infrared target image. Pattern Recognit. Lett. 2023, 174, 1–9. [Google Scholar] [CrossRef]
- Agnese, J.; Herrera, J.; Tao, H.; Zhu, X. A survey and taxonomy of adversarial neural networks for text-to-image synthesis. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2020, 10, e1345. [Google Scholar] [CrossRef]
- Sharma, P.; Kumar, M.; Sharma, H.K.; Biju, S.M. Generative adversarial networks (GANs): Introduction, Taxonomy, Variants, Limitations, and Applications. Multimed. Tools Appl. 2024, 83, 88811–88858. [Google Scholar] [CrossRef]
- Stanczuk, J.; Etmann, C.; Kreusser, L.M.; Schönlieb, C.B. Wasserstein GANs work because they fail (to approximate the Wasserstein distance). arXiv 2021, arXiv:2103.01678. [Google Scholar]
- Raman, G.; Cao, X.; Li, A.; Raman, G.; Peng, J.C.H.; Lu, J. CGANs-based real-time stability region determination for inverter-based systems. In Proceedings of the IEEE Power & Energy Society General Meeting (PESGM), Montreal, QC, Canada, 2–6 August 2020; pp. 1–5. [Google Scholar]
- Khanuja, S.S.; Khanuja, H.K. GAN challenges and optimal solutions. Int. Res. J. Eng. Technol. (IRJET) 2021, 8, 836–840. [Google Scholar]
- Biau, G.; Sangnier, M.; Tanielian, U. Some theoretical insights into Wasserstein GANs. J. Mach. Learn. Res. 2021, 22, 1–45. [Google Scholar]
- Ahmad, Z.; Jaffri, Z.u.A.; Chen, M.; Bao, S. Understanding GANs: Fundamentals, variants, training challenges, applications, and open problems. Multimed. Tools Appl. 2024, 1–77. [Google Scholar] [CrossRef]
- Li, Z.; Li, D.; Xu, C.; Wang, W.; Hong, Q.; Li, Q.; Tian, J. TFCNs: A CNN-transformer hybrid network for medical image segmentation. In Proceedings of the International Conference on Artificial Neural Networks, Bristol, UK, 6–9 September 2022; pp. 781–792. [Google Scholar]
- Zhao, M.; Cao, G.; Huang, X.; Yang, L. Hybrid transformer-CNN for real image denoising. IEEE Signal Process. Lett. 2022, 29, 1252–1256. [Google Scholar] [CrossRef]
- Gupta, D.; Suman, S.; Ekbal, A. Hierarchical deep multi-modal network for medical visual question answering. Expert Syst. Appl. 2021, 164, 113993. [Google Scholar] [CrossRef]
- Liang, Y.; Wang, X.; Duan, X.; Zhu, W. Multi-modal contextual graph neural network for text visual question answering. In Proceedings of the 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 3491–3498. [Google Scholar]
- Wang, Y.; Qiu, Y.; Cheng, P.; Zhang, J. Hybrid CNN-transformer features for visual place recognition. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 1109–1122. [Google Scholar] [CrossRef]
- Weng, W.; Zhang, Y.; Xiong, Z. Event-based video reconstruction using transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2563–2572. [Google Scholar]
- Tang, Q.; Liang, J.; Zhu, F. A comparative review on multi-modal sensors fusion based on deep learning. Signal Process. 2023, 213, 109165. [Google Scholar] [CrossRef]
- Park, S.; Vien, A.G.; Lee, C. Cross-modal transformers for infrared and visible image fusion. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 770–785. [Google Scholar] [CrossRef]
- He, X.; Wang, Y.; Zhao, S.; Chen, X. Co-attention fusion network for multimodal skin cancer diagnosis. Pattern Recognit. 2023, 133, 108990. [Google Scholar] [CrossRef]
- Xu, L.; Tang, Q.; Zheng, B.; Lv, J.; Li, W.; Zeng, X. CGFTrans: Cross-Modal Global Feature Fusion Transformer for Medical Report Generation. IEEE J. Biomed. Health Inform. 2024, 28, 5600–5612. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Ibanez-Guzman, J. Lidar for autonomous driving: The principles, challenges, and trends for automotive lidar and perception systems. IEEE Signal Process. Mag. 2020, 37, 50–61. [Google Scholar] [CrossRef]
- Reinke, A.; Tizabi, M.D.; Sudre, C.H.; Eisenmann, M.; Rädsch, T.; Baumgartner, M.; Acion, L.; Antonelli, M.; Arbel, T.; Bakas, S.; et al. Common limitations of image processing metrics: A picture story. arXiv 2021, arXiv:2104.05642. [Google Scholar]
- Singh, S.; Mittal, N.; Singh, H. Classification of various image fusion algorithms and their performance evaluation metrics. Comput. Intell. Mach. Learn. Healthc. Inform. 2020, 179–198. [Google Scholar] [CrossRef]
- Wang, Z.; Wang, E.; Zhu, Y. Image segmentation evaluation: A survey of methods. Artif. Intell. Rev. 2020, 53, 5637–5674. [Google Scholar] [CrossRef]
- Zhou, J.; Gandomi, A.H.; Chen, F.; Holzinger, A. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics 2021, 10, 593. [Google Scholar] [CrossRef]
- Baraheem, S.S.; Le, T.N.; Nguyen, T.V. Image synthesis: A review of methods, datasets, evaluation metrics, and future outlook. Artif. Intell. Rev. 2023, 56, 10813–10865. [Google Scholar] [CrossRef]
- Luo, G.; Cheng, L.; Jing, C.; Zhao, C.; Song, G. A thorough review of models, evaluation metrics, and datasets on image captioning. IET Image Process. 2022, 16, 311–332. [Google Scholar] [CrossRef]
- Zhou, S.K.; Greenspan, H.; Davatzikos, C.; Duncan, J.S.; Van Ginneken, B.; Madabhushi, A.; Prince, J.L.; Rueckert, D.; Summers, R.M. A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises. Proc. IEEE 2021, 109, 820–838. [Google Scholar] [CrossRef] [PubMed]
- Suganyadevi, S.; Seethalakshmi, V.; Balasamy, K. A review on deep learning in medical image analysis. Int. J. Multimed. Inf. Retr. 2022, 11, 19–38. [Google Scholar] [CrossRef]
- Narin, A.; Kaya, C.; Pamuk, Z. Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. Pattern Anal. Appl. 2021, 24, 1207–1220. [Google Scholar] [CrossRef] [PubMed]
- Allugunti, V.R. A machine learning model for skin disease classification using convolution neural network. Int. J. Comput. Program. Database Manag. 2022, 3, 141–147. [Google Scholar] [CrossRef]
- Francolini, G.; Desideri, I.; Stocchi, G.; Salvestrini, V.; Ciccone, L.P.; Garlatti, P.; Loi, M.; Livi, L. Artificial Intelligence in radiotherapy: State of the art and future directions. Med. Oncol. 2020, 37, 1–9. [Google Scholar] [CrossRef]
- Bera, K.; Braman, N.; Gupta, A.; Velcheti, V.; Madabhushi, A. Predicting cancer outcomes with radiomics and artificial intelligence in radiology. Nat. Rev. Clin. Oncol. 2022, 19, 132–146. [Google Scholar] [CrossRef]
- Ebrahimi, A.; Luo, S.; for the Alzheimer’s Disease Neuroimaging Initiative. Convolutional neural networks for Alzheimer’s disease detection on MRI images. J. Med. Imaging 2021, 8, 024503. [Google Scholar] [CrossRef]
- Hatuwal, B.K.; Thapa, H.C. Lung cancer detection using convolutional neural network on histopathological images. Int. J. Comput. Trends Technol. 2020, 68, 21–24. [Google Scholar] [CrossRef]
- Samanta, A.; Saha, A.; Satapathy, S.C.; Fernandes, S.L.; Zhang, Y.D. Automated detection of diabetic retinopathy using convolutional neural networks on a small dataset. Pattern Recognit. Lett. 2020, 135, 293–298. [Google Scholar] [CrossRef]
- Krishnan, R.; Rajpurkar, P.; Topol, E.J. Self-supervised learning in medicine and healthcare. Nat. Biomed. Eng. 2022, 6, 1346–1352. [Google Scholar] [CrossRef] [PubMed]
- Huang, S.C.; Pareek, A.; Jensen, M.; Lungren, M.P.; Yeung, S.; Chaudhari, A.S. Self-supervised learning for medical image classification: A systematic review and implementation guidelines. npj Digit. Med. 2023, 6, 74. [Google Scholar] [CrossRef] [PubMed]
- Shurrab, S.; Duwairi, R. Self-supervised learning methods and applications in medical imaging analysis: A survey. PeerJ Comput. Sci. 2022, 8, e1045. [Google Scholar] [CrossRef] [PubMed]
- Celi, L.A.; Cellini, J.; Charpignon, M.L.; Dee, E.C.; Dernoncourt, F.; Eber, R.; Mitchell, W.G.; Moukheiber, L.; Schirmer, J.; Situ, J.; et al. Sources of bias in artificial intelligence that perpetuate healthcare disparities—A global review. PLoS Digit. Health 2022, 1, e0000022. [Google Scholar] [CrossRef] [PubMed]
- Chowdhury, R.H. Intelligent systems for healthcare diagnostics and treatment. World J. Adv. Res. Rev. 2024, 23, 007–015. [Google Scholar] [CrossRef]
- Xie, Y.; Lu, L.; Gao, F.; He, S.J.; Zhao, H.J.; Fang, Y.; Yang, J.M.; An, Y.; Ye, Z.W.; Dong, Z. Integration of artificial intelligence, blockchain, and wearable technology for chronic disease management: A new paradigm in smart healthcare. Curr. Med. Sci. 2021, 41, 1123–1133. [Google Scholar] [CrossRef] [PubMed]
- Chawla, N. AI, IOT and Wearable Technology for Smart Healthcare—A Review. Int. J. Recent Res. Asp. 2020, 7, 9–14. [Google Scholar]
- Kuutti, S.; Bowden, R.; Jin, Y.; Barber, P.; Fallah, S. A survey of deep learning applications to autonomous vehicle control. IEEE Trans. Intell. Transp. Syst. 2020, 22, 712–733. [Google Scholar] [CrossRef]
- Grigorescu, S.; Trasnea, B.; Cocias, T.; Macesanu, G. A survey of deep learning techniques for autonomous driving. J. Field Robot. 2020, 37, 362–386. [Google Scholar] [CrossRef]
- Tran, L.A.; Do, T.D.; Park, D.C.; Le, M.H. Enhancement of robustness in object detection module for advanced driver assistance systems. In Proceedings of the International Conference on System Science and Engineering (ICSSE), Nha Trang, Vietnam, 26–28 August 2021; pp. 158–163. [Google Scholar]
- Farooq, M.A.; Corcoran, P.; Rotariu, C.; Shariff, W. Object detection in thermal spectrum for advanced driver-assistance systems (ADAS). IEEE Access 2021, 9, 156465–156481. [Google Scholar] [CrossRef]
- Tran, L.A.; Do, T.D.; Park, D.C.; Le, M.H. Robustness Enhancement of Object Detection in Advanced Driver Assistance Systems (ADAS). arXiv 2021, arXiv:2105.01580. [Google Scholar]
- Li, G.; Li, S.; Li, S.; Qin, Y.; Cao, D.; Qu, X.; Cheng, B. Deep reinforcement learning enabled decision-making for autonomous driving at intersections. Automot. Innov. 2020, 3, 374–385. [Google Scholar] [CrossRef]
- Harrison, K.; Ingole, R.; Surabhi, S.N.R.D. Enhancing Autonomous Driving: Evaluations Of AI And ML Algorithms. Educ. Adm. Theory Pract. 2024, 30, 4117–4126. [Google Scholar] [CrossRef]
- Jeyaraman, J.; Malaiyappan, J.N.A.; Sistla, S.M.K. Advancements in Reinforcement Learning Algorithms for Autonomous Systems. Int. J. Innov. Sci. Res. Technol. (IJISRT) 2024, 9, 1941–1946. [Google Scholar]
- Ekatpure, R. Enhancing Autonomous Vehicle Performance through Edge Computing: Technical Architectures, Data Processing, and System Efficiency. Appl. Res. Artif. Intell. Cloud Comput. 2023, 6, 17–34. [Google Scholar]
- Lv, Z.; Chen, D.; Wang, Q. Diversified technologies in internet of vehicles under intelligent edge computing. IEEE Trans. Intell. Transp. Syst. 2020, 22, 2048–2059. [Google Scholar] [CrossRef]
- Ma, Y.; Wang, Z.; Yang, H.; Yang, L. Artificial intelligence applications in the development of autonomous vehicles: A survey. IEEE/CAA J. Autom. Sin. 2020, 7, 315–329. [Google Scholar] [CrossRef]
- Bathla, G.; Bhadane, K.; Singh, R.K.; Kumar, R.; Aluvalu, R.; Krishnamurthi, R.; Kumar, A.; Thakur, R.; Basheer, S. Autonomous vehicles and intelligent automation: Applications, challenges, and opportunities. Mob. Inf. Syst. 2022, 2022, 7632892. [Google Scholar] [CrossRef]
- Yuan, Q.; Shen, H.; Li, T.; Li, Z.; Li, S.; Jiang, Y.; Xu, H.; Tan, W.; Yang, Q.; Wang, J.; et al. Deep learning in environmental remote sensing: Achievements and challenges. Remote Sens. Environ. 2020, 241, 111716. [Google Scholar] [CrossRef]
- Li, J.; Pei, Y.; Zhao, S.; Xiao, R.; Sang, X.; Zhang, C. A review of remote sensing for environmental monitoring in China. Remote Sens. 2020, 12, 1130. [Google Scholar] [CrossRef]
- Chen, J.; Chen, S.; Fu, R.; Li, D.; Jiang, H.; Wang, C.; Peng, Y.; Jia, K.; Hicks, B.J. Remote sensing big data for water environment monitoring: Current status, challenges, and future prospects. Earth’s Future 2022, 10, e2021EF002289. [Google Scholar] [CrossRef]
- Pi, Y.; Nath, N.D.; Behzadan, A.H. Convolutional neural networks for object detection in aerial imagery for disaster response and recovery. Adv. Eng. Inform. 2020, 43, 101009. [Google Scholar] [CrossRef]
- Park, J.; Lee, D.; Lee, J.; Cheon, E.; Jeong, H. Study on Disaster Response Strategies Using Multi-Sensors Satellite Imagery. Korean J. Remote Sens. 2023, 39, 755–770. [Google Scholar]
- Rashid, M.; Bari, B.S.; Yusup, Y.; Kamaruddin, M.A.; Khan, N. A comprehensive review of crop yield prediction using machine learning approaches with special emphasis on palm oil yield prediction. IEEE Access 2021, 9, 63406–63439. [Google Scholar] [CrossRef]
- Masolele, R.N.; De Sy, V.; Herold, M.; Marcos, D.; Verbesselt, J.; Gieseke, F.; Mullissa, A.G.; Martius, C. Spatial and temporal deep learning methods for deriving land-use following deforestation: A pan-tropical case study using Landsat time series. Remote Sens. Environ. 2021, 264, 112600. [Google Scholar] [CrossRef]
- Sun, X.; Zhang, Y.; Shi, K.; Zhang, Y.; Li, N.; Wang, W.; Huang, X.; Qin, B. Monitoring water quality using proximal remote sensing technology. Sci. Total Environ. 2022, 803, 149805. [Google Scholar] [CrossRef] [PubMed]
- Shafique, A.; Cao, G.; Khan, Z.; Asad, M.; Aslam, M. Deep learning-based change detection in remote sensing images: A review. Remote Sens. 2022, 14, 871. [Google Scholar] [CrossRef]
- Desai, S.; Ghose, D. Active learning for improved semi-supervised semantic segmentation in satellite images. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 553–563. [Google Scholar]
- Gu, X.; Angelov, P.P.; Zhang, C.; Atkinson, P.M. A semi-supervised deep rule-based approach for complex satellite sensor image analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 2281–2292. [Google Scholar] [CrossRef]
- Raghavan, R.; Verma, D.C.; Pandey, D.; Anand, R.; Pandey, B.K.; Singh, H. Optimized building extraction from high-resolution satellite imagery using deep learning. Multimed. Tools Appl. 2022, 81, 42309–42323. [Google Scholar] [CrossRef]
- Qin, R.; Liu, T. A review of landcover classification with very-high resolution remotely sensed optical images—Analysis unit, model scalability and transferability. Remote Sens. 2022, 14, 646. [Google Scholar] [CrossRef]
- Rezaee, K.; Rezakhani, S.M.; Khosravi, M.R.; Moghimi, M.K. A survey on deep learning-based real-time crowd anomaly detection for secure distributed video surveillance. Pers. Ubiquitous Comput. 2024, 28, 135–151. [Google Scholar] [CrossRef]
- Iqbal, M.J.; Iqbal, M.M.; Ahmad, I.; Alassafi, M.O.; Alfakeeh, A.S.; Alhomoud, A. Real-Time Surveillance Using Deep Learning. Secur. Commun. Netw. 2021, 2021, 6184756. [Google Scholar] [CrossRef]
- Schuartz, F.C.; Fonseca, M.; Munaretto, A. Improving threat detection in networks using deep learning. Ann. Telecommun. 2020, 75, 133–142. [Google Scholar] [CrossRef]
- Raut, M.; Dhavale, S.; Singh, A.; Mehra, A. Insider threat detection using deep learning: A review. In Proceedings of the 3rd International Conference on Intelligent Sustainable Systems (ICISS), Thoothukudi, India, 3–5 December 2020; pp. 856–863. [Google Scholar]
- Maddireddy, B.R.; Maddireddy, B.R. Advancing Threat Detection: Utilizing Deep Learning Models for Enhanced Cybersecurity Protocols. Rev. Esp. Doc. Cient. 2024, 18, 325–355. [Google Scholar]
- Salama AbdElminaam, D.; Almansori, A.M.; Taha, M.; Badr, E. A deep facial recognition system using computational intelligent algorithms. PLoS ONE 2020, 15, e0242269. [Google Scholar] [CrossRef] [PubMed]
- Singh, A.; Bhatt, S.; Nayak, V.; Shah, M. Automation of surveillance systems using deep learning and facial recognition. Int. J. Syst. Assur. Eng. Manag. 2023, 14, 236–245. [Google Scholar] [CrossRef]
- Saheb, T. Ethically contentious aspects of artificial intelligence surveillance: A social science perspective. AI Ethics 2023, 3, 369–379. [Google Scholar] [CrossRef] [PubMed]
- Wang, X.; Wu, Y.C.; Zhou, M.; Fu, H. Beyond surveillance: Privacy, ethics, and regulations in face recognition technology. Front. Big Data 2024, 7, 1337465. [Google Scholar] [CrossRef]
- Smith, M.; Miller, S. The ethical application of biometric facial recognition technology. AI Soc. 2022, 37, 167–175. [Google Scholar] [CrossRef] [PubMed]
- Andrejevic, M.; Selwyn, N. Facial recognition technology in schools: Critical questions and concerns. Learn. Media Technol. 2020, 45, 115–128. [Google Scholar] [CrossRef]
- Ferrer, X.; Van Nuenen, T.; Such, J.M.; Coté, M.; Criado, N. Bias and discrimination in AI: A cross-disciplinary perspective. IEEE Technol. Soc. Mag. 2021, 40, 72–80. [Google Scholar] [CrossRef]
- Ntoutsi, E.; Fafalios, P.; Gadiraju, U.; Iosifidis, V.; Nejdl, W.; Vidal, M.E.; Ruggieri, S.; Turini, F.; Papadopoulos, S.; Krasanakis, E.; et al. Bias in data-driven artificial intelligence systems—An introductory survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2020, 10, e1356. [Google Scholar] [CrossRef]
- Lee, R.S.; Lee, R.S. AI ethics, security and privacy. In Artificial Intelligence in Daily Life; Springer: Singapore, 2020; pp. 369–384. [Google Scholar] [CrossRef]
- Gupta, V.; Sambyal, N.; Sharma, A.; Kumar, P. Restoration of artwork using deep neural networks. Evol. Syst. 2021, 12, 439–446. [Google Scholar] [CrossRef]
- Gaber, J.A.; Youssef, S.M.; Fathalla, K.M. The role of artificial intelligence and machine learning in preserving cultural heritage and art works via virtual restoration. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, 10, 185–190. [Google Scholar] [CrossRef]
- Mendoza, M.A.D.; De La Hoz Franco, E.; Gómez, J.E.G. Technologies for the preservation of cultural heritage—A systematic review of the literature. Sustainability 2023, 15, 1059. [Google Scholar] [CrossRef]
- Trček, D. Cultural heritage preservation by using blockchain technologies. Herit. Sci. 2022, 10, 6. [Google Scholar] [CrossRef]
- Belhi, A.; Bouras, A.; Al-Ali, A.K.; Foufou, S. A machine learning framework for enhancing digital experiences in cultural heritage. J. Enterp. Inf. Manag. 2023, 36, 734–746. [Google Scholar] [CrossRef]
- Leshkevich, T.; Motozhanets, A. Social perception of artificial intelligence and digitization of cultural heritage: Russian context. Appl. Sci. 2022, 12, 2712. [Google Scholar] [CrossRef]
- Yu, T.; Lin, C.; Zhang, S.; Wang, C.; Ding, X.; An, H.; Liu, X.; Qu, T.; Wan, L.; You, S.; et al. Artificial intelligence for Dunhuang cultural heritage protection: The project and the dataset. Int. J. Comput. Vis. 2022, 130, 2646–2673. [Google Scholar] [CrossRef]
- Fiorucci, M.; Khoroshiltseva, M.; Pontil, M.; Traviglia, A.; Del Bue, A.; James, S. Machine learning for cultural heritage: A survey. Pattern Recognit. Lett. 2020, 133, 102–108. [Google Scholar] [CrossRef]
- Kusters, R.; Misevic, D.; Berry, H.; Cully, A.; Le Cunff, Y.; Dandoy, L.; Díaz-Rodríguez, N.; Ficher, M.; Grizou, J.; Othmani, A.; et al. Interdisciplinary research in artificial intelligence: Challenges and opportunities. Front. Big Data 2020, 3, 577974. [Google Scholar] [CrossRef]
- Meron, Y. Graphic Design and Artificial Intelligence: Interdisciplinary Challenges for Designers in the Search for Research Collaboration. In Proceedings of the DRS Conference Proceedings, Bilbao, Spain, 25 June–3 July 2022. [Google Scholar] [CrossRef]
- Audry, S. Art in the Age of Machine Learning; MIT Press: Cambridge, MA, USA, 2021. [Google Scholar]
- Mello, M.M.; Wang, C.J. Ethics and governance for digital disease surveillance. Science 2020, 368, 951–954. [Google Scholar] [CrossRef] [PubMed]
- Dhirani, L.L.; Mukhtiar, N.; Chowdhry, B.S.; Newe, T. Ethical dilemmas and privacy issues in emerging technologies: A review. Sensors 2023, 23, 1151. [Google Scholar] [CrossRef]
- Drukker, K.; Chen, W.; Gichoya, J.; Gruszauskas, N.; Kalpathy-Cramer, J.; Koyejo, S.; Myers, K.; Sá, R.C.; Sahiner, B.; Whitney, H.; et al. Toward fairness in artificial intelligence for medical image analysis: Identification and mitigation of potential biases in the roadmap from data collection to model deployment. J. Med. Imaging 2023, 10, 061104. [Google Scholar] [CrossRef] [PubMed]
- Tripathi, S.; Musiolik, T.H. Fairness and ethics in artificial intelligence-based medical imaging. In Ethical Implications of Reshaping Healthcare with Emerging Technologies; IGI Global: Hershey, PA, USA, 2022; pp. 71–85. [Google Scholar]
- Santosh, K.; Gaur, L. Artificial Intelligence and Machine Learning in Public Healthcare: Opportunities and Societal Impact; Springer: Singapore, 2022. [Google Scholar]
- Panigutti, C.; Monreale, A.; Comandè, G.; Pedreschi, D. Ethical, societal and legal issues in deep learning for healthcare. Deep. Learn. Biol. Med. 2022, 265–313. [Google Scholar] [CrossRef]
- Hussain, I.; Nazir, M.B. Empowering Healthcare: AI, ML, and Deep Learning Innovations for Brain and Heart Health. Int. J. Adv. Eng. Technol. Innov. 2024, 1, 167–188. [Google Scholar]
- Khanna, S.; Srivastava, S. Patient-centric ethical frameworks for privacy, transparency, and bias awareness in deep learning-based medical systems. Appl. Res. Artif. Intell. Cloud Comput. 2020, 3, 16–35. [Google Scholar]
- Hogenhout, L. A framework for ethical AI at the United Nations. arXiv 2021, arXiv:2104.12547. [Google Scholar]
- Vegesna, V.V. Privacy-Preserving Techniques in AI-Powered Cyber Security: Challenges and Opportunities. Int. J. Mach. Learn. Sustain. Dev. 2023, 5, 1–8. [Google Scholar]
- Dhinakaran, D.; Sankar, S.; Selvaraj, D.; Raja, S.E. Privacy-Preserving Data in IoT-based Cloud Systems: A Comprehensive Survey with AI Integration. arXiv 2024, arXiv:2401.00794. [Google Scholar]
- Shanmugam, L.; Tillu, R.; Jangoan, S. Privacy-Preserving AI/ML Application Architectures: Techniques, Trade-offs, and Case Studies. J. Knowl. Learn. Sci. Technol. 2023, 2, 398–420. [Google Scholar] [CrossRef]
- Memarian, B.; Doleck, T. Fairness, Accountability, Transparency, and Ethics (FATE) in Artificial Intelligence (AI), and higher education: A systematic review. Comput. Educ. Artif. Intell. 2023, 5, 100152. [Google Scholar] [CrossRef]
- Akinrinola, O.; Okoye, C.C.; Ofodile, O.C.; Ugochukwu, C.E. Navigating and reviewing ethical dilemmas in AI development: Strategies for transparency, fairness, and accountability. GSC Adv. Res. Rev. 2024, 18, 050–058. [Google Scholar] [CrossRef]
- Lepore, D.; Dolui, K.; Tomashchuk, O.; Shim, H.; Puri, C.; Li, Y.; Chen, N.; Spigarelli, F. Interdisciplinary research unlocking innovative solutions in healthcare. Technovation 2023, 120, 102511. [Google Scholar] [CrossRef]
- Rasheed, K.; Qayyum, A.; Ghaly, M.; Al-Fuqaha, A.; Razi, A.; Qadir, J. Explainable, trustworthy, and ethical machine learning for healthcare: A survey. Comput. Biol. Med. 2022, 149, 106043. [Google Scholar] [CrossRef]
- Geroski, T.; Filipović, N. Artificial Intelligence Empowering Medical Image Processing. In Silico Clinical Trials for Cardiovascular Disease: A Finite Element and Machine Learning Approach; Springer: Cham, Switzerland, 2024; pp. 179–208. [Google Scholar]
- Castiglioni, I.; Rundo, L.; Codari, M.; Di Leo, G.; Salvatore, C.; Interlenghi, M.; Gallivanone, F.; Cozzi, A.; D’Amico, N.C.; Sardanelli, F. AI applications to medical images: From machine learning to deep learning. Phys. Medica 2021, 83, 9–24. [Google Scholar] [CrossRef]
- Gupta, S.; Kumar, S.; Chang, K.; Lu, C.; Singh, P.; Kalpathy-Cramer, J. Collaborative privacy-preserving approaches for distributed deep learning using multi-institutional data. RadioGraphics 2023, 43, e220107. [Google Scholar] [CrossRef]
- Kim, J.C.; Chung, K. Hybrid multi-modal deep learning using collaborative concat layer in health bigdata. IEEE Access 2020, 8, 192469–192480. [Google Scholar] [CrossRef]
- Qian, Y. Network Science, Big Data Analytics, and Deep Learning: An Interdisciplinary Approach to the Study of Citation, Social and Collaboration Networks. Ph.D. Thesis, Queen Mary University of London, London, UK, 2021. [Google Scholar]
- Peters, D.; Vold, K.; Robinson, D.; Calvo, R.A. Responsible AI—Two frameworks for ethical design practice. IEEE Trans. Technol. Soc. 2020, 1, 34–47. [Google Scholar] [CrossRef]
- Rakova, B.; Yang, J.; Cramer, H.; Chowdhury, R. Where responsible AI meets reality: Practitioner perspectives on enablers for shifting organizational practices. Proc. Acm Hum. Comput. Interact. 2021, 5, 1–23. [Google Scholar] [CrossRef]
- Sarker, I.; Colman, A.; Han, J.; Watters, P. Context-Aware Machine Learning and Mobile Data Analytics: Automated Rule-Based Services with Intelligent Decision-Making; Springer: Cham, Switzerland, 2021. [Google Scholar]
- Unger, M.; Tuzhilin, A.; Livne, A. Context-aware recommendations based on deep learning frameworks. ACM Trans. Manag. Inf. Syst. (TMIS) 2020, 11, 1–15. [Google Scholar] [CrossRef]
- Jeong, S.Y.; Kim, Y.K. Deep learning-based context-aware recommender system considering contextual features. Appl. Sci. 2021, 12, 45. [Google Scholar] [CrossRef]
- Bansal, M.A.; Sharma, D.R.; Kathuria, D.M. A systematic review on data scarcity problem in deep learning: Solution and applications. ACM Comput. Surv. (CSUR) 2022, 54, 1–29. [Google Scholar] [CrossRef]
- Alzubaidi, L.; Bai, J.; Al-Sabaawi, A.; Santamaría, J.; Albahri, A.S.; Al-dabbagh, B.S.N.; Fadhel, M.A.; Manoufali, M.; Zhang, J.; Al-Timemy, A.H.; et al. A survey on deep learning tools dealing with data scarcity: Definitions, challenges, solutions, tips, and applications. J. Big Data 2023, 10, 46. [Google Scholar] [CrossRef]
- Dewi, C.; Chen, R.C.; Liu, Y.T.; Tai, S.K. Synthetic Data generation using DCGAN for improved traffic sign recognition. Neural Comput. Appl. 2022, 34, 21465–21480. [Google Scholar] [CrossRef]
- de Melo, C.M.; Torralba, A.; Guibas, L.; DiCarlo, J.; Chellappa, R.; Hodgins, J. Next-generation deep learning based on simulators and synthetic data. Trends Cogn. Sci. 2022, 26, 174–187. [Google Scholar] [CrossRef] [PubMed]
- Wen, Q.; Sun, L.; Yang, F.; Song, X.; Gao, J.; Wang, X.; Xu, H. Time series data augmentation for deep learning: A survey. arXiv 2020, arXiv:2002.12478. [Google Scholar]
- Khosla, C.; Saini, B.S. Enhancing performance of deep learning models with different data augmentation techniques: A survey. In Proceedings of the International Conference on Intelligent Engineering and Management (ICIEM), London, UK, 17–19 June 2020; pp. 79–85. [Google Scholar]
- Wani, M.A.; Bhat, F.A.; Afzal, S.; Khan, A.I. Advances in Deep Learning; Springer: Singapore, 2020. [Google Scholar] [CrossRef]
- Freire, P.; Srivallapanondh, S.; Napoli, A.; Prilepsky, J.E.; Turitsyn, S.K. Computational complexity evaluation of neural network applications in signal processing. arXiv 2022, arXiv:2206.12191. [Google Scholar]
- Murshed, M.S.; Murphy, C.; Hou, D.; Khan, N.; Ananthanarayanan, G.; Hussain, F. Machine learning at the network edge: A survey. ACM Comput. Surv. (CSUR) 2021, 54, 1–37. [Google Scholar] [CrossRef]
- Merenda, M.; Porcaro, C.; Iero, D. Edge machine learning for AI-enabled IoT devices: A review. Sensors 2020, 20, 2533. [Google Scholar] [CrossRef]
- Acun, B.; Murphy, M.; Wang, X.; Nie, J.; Wu, C.J.; Hazelwood, K. Understanding training efficiency of deep learning recommendation models at scale. In Proceedings of the IEEE International Symposium on High-Performance Computer Architecture (HPCA), Seoul, Republic of Korea, 27 February–3 March 2021; pp. 802–814. [Google Scholar]
- Menghani, G. Efficient deep learning: A survey on making deep learning models smaller, faster, and better. ACM Comput. Surv. 2023, 55, 1–37. [Google Scholar] [CrossRef]
- Stiglic, G.; Kocbek, P.; Fijacko, N.; Zitnik, M.; Verbert, K.; Cilar, L. Interpretability of machine learning-based prediction models in healthcare. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2020, 10, e1379. [Google Scholar] [CrossRef]
- Brigo, D.; Huang, X.; Pallavicini, A.; Borde, H.S.d.O. Interpretability in deep learning for finance: A case study for the Heston model. arXiv 2021, arXiv:2104.09476. [Google Scholar] [CrossRef]
- Von Eschenbach, W.J. Transparency and the black box problem: Why we do not trust AI. Philos. Technol. 2021, 34, 1607–1622. [Google Scholar] [CrossRef]
- Franzoni, V. From black box to glass box: Advancing transparency in artificial intelligence systems for ethical and trustworthy AI. In Proceedings of the International Conference on Computational Science and Its Applications, Athens, Greece, 3–6 July 2023; pp. 118–130. [Google Scholar]
- Saisubramanian, S.; Galhotra, S.; Zilberstein, S. Balancing the tradeoff between clustering value and interpretability. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, New York, NY, USA, 7–9 February 2020; pp. 351–357. [Google Scholar]
- He, C.; Ma, M.; Wang, P. Extract interpretability-accuracy balanced rules from artificial neural networks: A review. Neurocomputing 2020, 387, 346–358. [Google Scholar] [CrossRef]
- Zhao, L.; Liu, T.; Peng, X.; Metaxas, D. Maximum-entropy adversarial data augmentation for improved generalization and robustness. Adv. Neural Inf. Process. Syst. 2020, 33, 14435–14447. [Google Scholar]
- Zhang, L.; Deng, Z.; Kawaguchi, K.; Ghorbani, A.; Zou, J. How does mixup help with robustness and generalization? arXiv 2020, arXiv:2010.04819. [Google Scholar]
- Bai, T.; Luo, J.; Zhao, J.; Wen, B.; Wang, Q. Recent advances in adversarial training for adversarial robustness. arXiv 2021, arXiv:2102.01356. [Google Scholar]
- Han, D.; Wang, Z.; Zhong, Y.; Chen, W.; Yang, J.; Lu, S.; Shi, X.; Yin, X. Evaluating and improving adversarial robustness of machine learning-based network intrusion detectors. IEEE J. Sel. Areas Commun. 2021, 39, 2632–2647. [Google Scholar] [CrossRef]
- Taori, R.; Dave, A.; Shankar, V.; Carlini, N.; Recht, B.; Schmidt, L. Measuring robustness to natural distribution shifts in image classification. Adv. Neural Inf. Process. Syst. 2020, 33, 18583–18599. [Google Scholar]
- Wiles, O.; Gowal, S.; Stimberg, F.; Alvise-Rebuffi, S.; Ktena, I.; Dvijotham, K.; Cemgil, T. A fine-grained analysis on distribution shift. arXiv 2021, arXiv:2110.11328. [Google Scholar]
- Puyol-Antón, E.; Ruijsink, B.; Piechnik, S.K.; Neubauer, S.; Petersen, S.E.; Razavi, R.; King, A.P. Fairness in cardiac MR image analysis: An investigation of bias due to data imbalance in deep learning based segmentation. In Proceedings of the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), Strasbourg, France, 27 September–1 October 2021; pp. 413–423. [Google Scholar]
- Shah, M.; Sureja, N. A Comprehensive Review of Bias in Deep Learning Models: Methods, Impacts, and Future Directions. Arch. Comput. Methods Eng. 2024, 32, 255–267. [Google Scholar] [CrossRef]
- Almeida, D.; Shmarko, K.; Lomas, E. The ethics of facial recognition technologies, surveillance, and accountability in an age of artificial intelligence: A comparative analysis of US, EU, and UK regulatory frameworks. AI Ethics 2022, 2, 377–387. [Google Scholar] [CrossRef]
- Fontes, C.; Perrone, C. Ethics of Surveillance: Harnessing the Use of Live Facial Recognition Technologies in Public Spaces for Law Enforcement; Technical University of Munich: Munich, Germany, 2021. [Google Scholar]
- Alikhademi, K.; Drobina, E.; Prioleau, D.; Richardson, B.; Purves, D.; Gilbert, J.E. A review of predictive policing from the perspective of fairness. Artif. Intell. Law 2022, 30, 1–17. [Google Scholar] [CrossRef]
- Yen, C.P.; Hung, T.W. Achieving equity with predictive policing algorithms: A social safety net perspective. Sci. Eng. Ethics 2021, 27, 1–16. [Google Scholar] [CrossRef] [PubMed]
- Akrim, A.; Gogu, C.; Vingerhoeds, R.; Salaün, M. Self-Supervised Learning for data scarcity in a fatigue damage prognostic problem. Eng. Appl. Artif. Intell. 2023, 120, 105837. [Google Scholar] [CrossRef]
- Wittscher, L.; Pigorsch, C. Exploring Self-supervised Capsule Networks for Improved Classification with Data Scarcity. In Proceedings of the International Conference on Image Processing and Capsule Networks, Bangkok, Thailand, 20–21 May 2022; pp. 36–50. [Google Scholar]
- Bekker, J.; Davis, J. Learning from positive and unlabeled data: A survey. Mach. Learn. 2020, 109, 719–760. [Google Scholar] [CrossRef]
- Guo, L.Z.; Zhang, Z.Y.; Jiang, Y.; Li, Y.F.; Zhou, Z.H. Safe deep semi-supervised learning for unseen-class unlabeled data. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 12–18 July 2020; pp. 3897–3906. [Google Scholar]
- Huang, W.; Yi, M.; Zhao, X.; Jiang, Z. Towards the generalization of contrastive self-supervised learning. arXiv 2021, arXiv:2111.00743. [Google Scholar]
- Kim, D.; Yoo, Y.; Park, S.; Kim, J.; Lee, J. Selfreg: Self-supervised contrastive regularization for domain generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 9619–9628. [Google Scholar]
- Wang, D.; Li, M.; Gong, C.; Chandra, V. Attentivenas: Improving neural architecture search via attentive sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6418–6427. [Google Scholar]
- White, C.; Zela, A.; Ru, R.; Liu, Y.; Hutter, F. How powerful are performance predictors in neural architecture search? Adv. Neural Inf. Process. Syst. 2021, 34, 28454–28469. [Google Scholar]
- Kim, J.; Chang, S.; Kwak, N. PQK: Model compression via pruning, quantization, and knowledge distillation. arXiv 2021, arXiv:2106.14681. [Google Scholar]
- Liang, T.; Glossner, J.; Wang, L.; Shi, S.; Zhang, X. Pruning and quantization for deep neural network acceleration: A survey. Neurocomputing 2021, 461, 370–403. [Google Scholar] [CrossRef]
- Marković, D.; Grollier, J. Quantum neuromorphic computing. Appl. Phys. Lett. 2020, 117, 150501. [Google Scholar] [CrossRef]
- Ghosh, S.; Nakajima, K.; Krisnanda, T.; Fujii, K.; Liew, T.C. Quantum neuromorphic computing with reservoir computing networks. Adv. Quantum Technol. 2021, 4, 2100053. [Google Scholar] [CrossRef]
- Bento, V.; Kohler, M.; Diaz, P.; Mendoza, L.; Pacheco, M.A. Improving deep learning performance by using Explainable Artificial Intelligence (XAI) approaches. Discov. Artif. Intell. 2021, 1, 1–11. [Google Scholar] [CrossRef]
- Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
- Van der Velden, B.H.; Kuijf, H.J.; Gilhuijs, K.G.; Viergever, M.A. Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Med. Image Anal. 2022, 79, 102470. [Google Scholar] [CrossRef] [PubMed]
- Chen, Z.; Xiao, F.; Guo, F.; Yan, J. Interpretable machine learning for building energy management: A state-of-the-art review. Adv. Appl. Energy 2023, 9, 100123. [Google Scholar] [CrossRef]
- Molnar, C.; Casalicchio, G.; Bischl, B. Interpretable machine learning–a brief history, state-of-the-art and challenges. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Ghent, Belgium, 14–18 September 2020; pp. 417–431. [Google Scholar]
- Nannini, L.; Balayn, A.; Smith, A.L. Explainability in AI policies: A critical review of communications, reports, regulations, and standards in the EU, US, and UK. In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, Chicago, IL, USA, 12–15 June 2023; pp. 1198–1212. [Google Scholar]
- Ebers, M. Regulating explainable AI in the European Union. An overview of the current legal framework(s). In Nordic Yearbook of Law and Informatics; The Swedish Law and Informatics Research Institute: Stockholm, Sweden, 2020. [Google Scholar]
- Alchieri, L.; Badalotti, D.; Bonardi, P.; Bianco, S. An introduction to quantum machine learning: From quantum logic to quantum deep learning. Quantum Mach. Intell. 2021, 3, 28. [Google Scholar] [CrossRef]
- Peral-García, D.; Cruz-Benito, J.; García-Peñalvo, F.J. Systematic literature review: Quantum machine learning and its applications. Comput. Sci. Rev. 2024, 51, 100619. [Google Scholar] [CrossRef]
- Dou, W.; Zhao, X.; Yin, X.; Wang, H.; Luo, Y.; Qi, L. Edge computing-enabled deep learning for real-time video optimization in IIoT. IEEE Trans. Ind. Inform. 2020, 17, 2842–2851. [Google Scholar] [CrossRef]
- Wang, F.; Zhang, M.; Wang, X.; Ma, X.; Liu, J. Deep learning for edge computing applications: A state-of-the-art survey. IEEE Access 2020, 8, 58322–58336. [Google Scholar] [CrossRef]
- Zhang, C.; Wang, J.; Yen, G.G.; Zhao, C.; Sun, Q.; Tang, Y.; Qian, F.; Kurths, J. When autonomous systems meet accuracy and transferability through AI: A survey. Patterns 2020, 1, 100050. [Google Scholar] [CrossRef]
- Sollini, M.; Bartoli, F.; Marciano, A.; Zanca, R.; Slart, R.H.; Erba, P.A. Artificial intelligence and hybrid imaging: The best match for personalized medicine in oncology. Eur. J. Hybrid Imaging 2020, 4, 1–22. [Google Scholar] [CrossRef]
- Nanda, V.; Dooley, S.; Singla, S.; Feizi, S.; Dickerson, J.P. Fairness through robustness: Investigating robustness disparity in deep learning. In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, Virtual Event, Canada, 3–10 March 2021; pp. 466–477. [Google Scholar]
- Hamon, R.; Junklewitz, H.; Sanchez, I. Robustness and explainability of artificial intelligence. Publ. Off. Eur. Union 2020, 207, 2020. [Google Scholar]
- Munoko, I.; Brown-Liburd, H.L.; Vasarhelyi, M. The ethical implications of using artificial intelligence in auditing. J. Bus. Ethics 2020, 167, 209–234. [Google Scholar] [CrossRef]
- Adelakun, B.O. Ethical Considerations in the Use of AI for Auditing: Balancing Innovation and Integrity. Eur. J. Account. Audit. Financ. Res. 2022, 10, 91–108. [Google Scholar]
- Mökander, J. Auditing of AI: Legal, ethical and technical approaches. Digit. Soc. 2023, 2, 49. [Google Scholar] [CrossRef]
- Ashok, M.; Madan, R.; Joha, A.; Sivarajah, U. Ethical framework for Artificial Intelligence and Digital technologies. Int. J. Inf. Manag. 2022, 62, 102433. [Google Scholar] [CrossRef]
- Xu, J. A review of self-supervised learning methods in the field of medical image analysis. Int. J. Image Graph. Signal Process. (IJIGSP) 2021, 13, 33–46. [Google Scholar] [CrossRef]
- Taleb, A.; Lippert, C.; Klein, T.; Nabi, M. Multimodal self-supervised learning for medical image analysis. In Proceedings of the 27th International Conference on Information Processing in Medical Imaging, Virtual Event, 28–30 June 2021; pp. 661–673. [Google Scholar]
- Zeebaree, S.R.; Ahmed, O.; Obid, K. CSAERNet: An efficient deep learning architecture for image classification. In Proceedings of the 3rd International Conference on Engineering Technology and its Applications (IICETA), Najaf, Iraq, 6–7 September 2020; pp. 122–127. [Google Scholar]
- Özyurt, F. Efficient deep feature selection for remote sensing image recognition with fused deep learning architectures. J. Supercomput. 2020, 76, 8413–8431. [Google Scholar] [CrossRef]
- Jin, W.; Li, X.; Fatehi, M.; Hamarneh, G. Guidelines and evaluation of clinical explainable AI in medical image analysis. Med. Image Anal. 2023, 84, 102684. [Google Scholar] [CrossRef] [PubMed]
- Han, S.H.; Kwon, M.S.; Choi, H.J. EXplainable AI (XAI) approach to image captioning. J. Eng. 2020, 2020, 589–594. [Google Scholar] [CrossRef]
- Yang, G.; Rao, A.; Fernandez-Maloigne, C.; Calhoun, V.; Menegaz, G. Explainable AI (XAI) in biomedical signal and image processing: Promises and challenges. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 1531–1535. [Google Scholar]
Architecture | Innovation | References |
---|---|---|
CNNs | CNNs are foundational to image processing, enabling automatic capture of spatial hierarchies through convolutional layers that process image patterns at different levels of granularity. | [19] |
ResNets | Introduced residual connections to address vanishing gradient problems, allowing deeper networks to be trained by learning residual functions rather than direct mappings. | [20,21] |
DenseNets | DenseNets enable direct connections between all layers to enhance feature reuse, reduce computational costs, and improve efficiency in image classification and object detection tasks. | [22,23,24] |
Multi-branch Architectures | Inception networks process image features at multiple scales simultaneously within a single model, significantly improving performance on complex tasks like semantic segmentation. | [25,26,27,28] |
YOLO | YOLO transformed object detection with a single network approach, simultaneously predicting bounding boxes and class probabilities, achieving real-time efficiency with high accuracy. | [29,30] |
ConvNext | ConvNext integrates principles from vision transformers into traditional CNNs, improving performance with innovations like depth-wise convolutions and larger kernel sizes while retaining simplicity. | [31,32,33] |
FCNs | FCNs replace fully connected layers with convolutional ones, preserving spatial hierarchies for dense predictions in tasks such as semantic segmentation. | [34] |
U-Net | U-Net’s encoder–decoder structure with skip connections enables precise boundary delineation, making it especially effective for medical imaging and other pixel-level prediction tasks. | [35,36,37] |
Mask R-CNN | Mask R-CNN extends object detection by integrating segmentation, creating pixel-level masks for detected objects, which is valuable for tasks like autonomous driving and video analysis. | [38,39] |
Specialized Task-Specific Architectures | Tailored architectures address specific challenges in advanced image processing, ensuring accuracy and efficiency in highly specialized domains. | [40,41,42] |
ViTs | Vision transformers handle global dependencies in images by modeling them as sequences of patches, offering advantages in scene understanding and holistic image analysis. | [43,44,45,46,47] |
Self-Attention Mechanisms | Self-attention dynamically prioritizes relevant image regions for tasks like classification and segmentation, enabling robust generalization across diverse datasets. | [48,49,50,51,52,53] |
GANs | GANs use adversarial training between a generator and discriminator to create realistic images, excelling in tasks like image synthesis, super-resolution, and style transfer. | [54,55,56,57] |
CGANs | CGANs integrate class labels or other auxiliary information into GANs, enabling controlled generation of specific types of images based on given conditions. | [58] |
WGANs | WGANs improve GAN training stability by introducing a Wasserstein distance-based loss function, addressing mode collapse and convergence issues. | [59] |
Other GAN Applications | GANs are used for synthetic data generation, data augmentation, and domain adaptation, improving robustness and generalization in low-data scenarios and cross-domain tasks. | [60,61,62,63,64] |
Diffusion Models | Diffusion models utilize a probabilistic framework to iteratively add and remove noise, achieving state-of-the-art results in tasks like image restoration, synthesis, and denoising. | [65,66] |
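The residual connection that distinguishes ResNets in the table above reduces to one line of arithmetic. The following NumPy sketch is an illustrative toy (the function and variable names are ours, not from the cited works): a block computes ReLU(F(x) + x), and the identity shortcut keeps a gradient path open even when F's weights are near zero.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Toy residual block: output = ReLU(F(x) + x)."""
    f = relu(x @ w1) @ w2  # F(x): a small two-layer transformation
    # Identity shortcut: gradients always flow through "+ x",
    # which is what lets very deep stacks remain trainable.
    return relu(f + x)
```

With w1 = w2 = 0 the block degenerates to ReLU(x), so stacking many such blocks cannot make optimization harder than the shallower network, which is the core ResNet argument.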
Topic | Description | References |
---|---|---|
Pre-trained models and transfer learning strategies | Analyzes transfer learning using CNN- and transformer-based pre-trained models and their application in medical imaging. Also covers key categories, i.e., adversarial-based and network-based transfer (fine-tuning, freezing CNN layers, and progressive learning). | [67,68,69,70,71,72,73] |
Negative transfer | Highlights the issue of negative transfer, where source and target tasks differ significantly, hindering performance, and strategies to mitigate its impact. | [74,75] |
Negative transfer: mitigation strategies | Explores data transferability, model transferability, training process enhancement, and prediction refinement strategies. | [76] |
Negative transfer: mitigation strategies | Concept-wise fine-tuning, a model transferability method. | [77] |
Negative transfer: mitigation strategies | HTPL, a feature-based transfer learning approach that progressively fine-tunes features, ensuring effective domain alignment and mitigating negative transfer. | [78] |
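The "freeze pre-trained layers, fine-tune the head" recipe from the table can be sketched on a toy two-layer linear model with an MSE loss. This is an illustrative reduction, not any cited method; `finetune_step` and its parameters are hypothetical names.

```python
import numpy as np

def finetune_step(x, y, base_w, head_w, lr=0.1, freeze_base=True):
    """One gradient step of a toy two-layer linear model (0.5 * MSE loss).

    With freeze_base=True only the task head is updated, mimicking the
    transfer-learning recipe of freezing pre-trained layers."""
    h = x @ base_w                       # features from the "pre-trained" base
    err = (h @ head_w) - y               # dL/dprediction
    grad_head = h.T @ err / len(x)
    grad_base = x.T @ (err @ head_w.T) / len(x)
    new_head = head_w - lr * grad_head
    new_base = base_w if freeze_base else base_w - lr * grad_base
    return new_base, new_head
```

Freezing keeps the (usually well-initialized) base features intact and cheap to reuse; unfreezing trades compute and overfitting risk for tighter domain alignment.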
Technique | Description | References |
---|---|---|
Basic Methods | Basic augmentation techniques (e.g., rotation, scaling) for increasing dataset diversity and preventing overfitting, especially when large datasets are impractical or expensive. | [79,80,81,82,83] |
Advanced Modern Approaches | Techniques like Cutout, Mixup, and CutMix enhance model robustness by introducing complex image variations and encouraging focus on global context rather than localized features. | [84] |
Complex Image Transformations | Reviews the application of techniques such as blending images, masking, and targeted transformations to improve model generalization and reduce overfitting. | [85,86,87] |
Automated Strategies | AutoAugment and RandAugment leverage optimization and RL to identify the most effective augmentation policies for specific datasets, significantly improving performance with reduced manual effort. | [88,89,90] |
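Of the advanced methods in the table, Mixup is simple enough to state exactly: train on convex combinations of input pairs and of their labels. A minimal sketch, assuming one-hot labels (the function name is illustrative):

```python
import numpy as np

def mixup(x1, y1, x2, y2, lam=None, alpha=0.2, rng=None):
    """Mixup: a convex combination of two samples and their one-hot labels.
    lam ~ Beta(alpha, alpha) when not supplied explicitly."""
    if lam is None:
        lam = (rng or np.random.default_rng()).beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

CutMix follows the same label-mixing logic but pastes a rectangular patch from one image into the other and sets lam to the patch's area fraction.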
Topic/Technique | Description | References |
---|---|---|
Comprehensive survey in Regularization | Reviews traditional and modern regularization methods, comparing their effectiveness, computational cost, and applicability to mitigate overfitting in DL. | [91] |
Dropout | Prevents overfitting by randomly deactivating neurons during training, forcing the model to learn redundant feature representations and increasing robustness. | [92,93,94] |
Weight Decay | Penalizes large weights by adding a regularization term to the loss function, discouraging excessive reliance on specific parameters and improving generalization. Also discusses the disharmony between weight decay and weight normalization methods. | [95,96] |
Batch Normalization | Stabilizes and accelerates training by normalizing layer inputs, reducing internal covariate shift, and indirectly functioning as a regularization technique to improve model performance. | [97,98] |
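Of the regularizers in the table, dropout is the easiest to show concretely. Below is the common "inverted dropout" formulation as a hedged sketch (names are illustrative): survivors are scaled by 1/(1-p) at training time so the expected activation is unchanged and inference needs no rescaling.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each activation with probability p during
    training; scale survivors by 1/(1-p) to keep the expectation fixed."""
    if not training or p == 0.0:
        return x
    mask = (rng or np.random.default_rng()).random(x.shape) >= p
    return x * mask / (1.0 - p)
```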
Topic | Description | References |
---|---|---|
Adversarial examples and Training | Overview of methods, challenges, and opportunities for generating adversarial examples to expose and improve model robustness. | [99,100,101,102,103] |
FGSM | A computationally efficient method for generating adversarial examples with minimal perturbation. | [104] |
PGD | An iterative approach and stronger method for crafting adversarial examples by refining perturbations stepwise. | [105] |
Free Adversarial Training | Efficiently reuses gradient computations via minibatch replays to achieve robustness with reduced cost. | [106] |
UPGD | Enhanced PGD algorithm for generating universal adversarial perturbations, balancing accuracy and robustness. | [107] |
MAT | Leverages models of natural variation to generate adversarial examples, enhancing robustness against naturally shifted datasets. | [108] |
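FGSM, listed above, perturbs an input one epsilon-step in the direction of the sign of the loss gradient with respect to that input. The self-contained toy below uses a fixed logistic model so the gradient is available in closed form; it is an illustration of the attack's mechanics, not code from the cited papers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_input_grad(x, w, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. x."""
    p = sigmoid(x @ w)
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    grad_x = (p - y) * w              # dL/dx for logistic regression
    return loss, grad_x

def fgsm(x, grad_x, eps=0.1):
    """One signed gradient step, clipped to the valid pixel range [0, 1]."""
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)
```

Adversarial training then simply mixes such perturbed inputs (with their original labels) back into the training batches.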
Technique | Description | References |
---|---|---|
Self-supervised and Unsupervised Learning | Surveys techniques for improving image classification without labeled data. | [109] |
Proxy-based Learning | Frameworks for spectral–spatial hyperspectral image classification, enhancing robustness and efficiency. | [110] |
Contrastive Learning | Reviews contrastive learning techniques in self-supervised frameworks, highlighting their success in extracting meaningful representations. | [111] |
Multi-task Learning | Proposes frameworks for self-supervised learning in specific domains like medical imaging. | [112] |
Self-supervised Learning Improvements | Mitigating issues related to proxy task specificity, improving performance across various downstream applications. | [113] |
Representation Learning | Explores the combinations of labeled and unlabeled data for unified speech and visual representation. | [114] |
Representation Learning | Reviews advances and challenges in self-supervised representation learning, emphasizing its potential in scalable applications. | [115] |
Representation Learning | Context autoencoders: demonstrates their use for effective representation learning in image processing tasks. | [116] |
Self-supervised methods | Reviews key methods such as SimCLR and MoCo, focusing on robust representation learning through contrastive approaches. | [117,118,119] |
Advanced self-supervised techniques | Integration of attention mechanisms and other advanced approaches to enhance capabilities in self-supervised frameworks. | [120,121,122] |
Unsupervised Learning and Dimensionality reduction | Examines the use of clustering with UMAP in uncovering intrinsic structures of data, improving latent feature representations. Highlights the effectiveness of preserving essential features while reducing redundancy. | [123,124] |
3D Convolutional Autoencoders | Explains their application in compact representation of hyperspectral image data. | [125] |
Unsupervised Learning frameworks | Discusses applications into tasks such as medical image analysis, denoising, and segmentation. | [126,127,128] |
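The contrastive objective behind SimCLR- and MoCo-style methods in the table is the InfoNCE loss: maximize cosine similarity to the positive view of an image relative to negatives, sharpened by a temperature. A minimal single-anchor sketch, assuming embeddings are already computed:

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE for one anchor: cross-entropy of identifying the positive
    among negatives, using temperature-scaled cosine similarities."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = np.array([cos(anchor, positive)] +
                    [cos(anchor, n) for n in negatives]) / tau
    sims -= sims.max()                       # numerical stability
    return float(-np.log(np.exp(sims[0]) / np.exp(sims).sum()))
```

In practice the positive is a second augmented view of the anchor image, and the negatives are the other images in the batch (SimCLR) or a queue of past embeddings (MoCo).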
Technique | Description | References |
---|---|---|
Domain variability | Discusses the challenges caused by domain shifts, such as differences in lighting, resolution, or imaging devices. | [129] |
Domain generalization | Focuses on training models to perform robustly on unseen domains without direct access to target domain data during training. | [130] |
DDC | Aligns feature distributions across multiple source domains for robust generalization. | [131] |
DICA | Aligns features for learning domain-invariant representations. | [132] |
Episodic training | Frameworks to enhance robustness by simulating domain shifts during training to prepare models for unseen variations. | [133] |
DANNs | Adversarial learning to generate domain-agnostic features, improving generalization across domains. | [134] |
CycleGAN | Transform target domain images into the appearance of the source domain (style transfer), improving alignment. | [135] |
CycleGAN | Demonstrates its application in autonomous driving, adapting object detection models to rural environments despite training in urban areas. | [136] |
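DDC-style feature alignment can be illustrated with the simplest MMD variant: the squared distance between batch feature means, added to the training loss so source- and target-domain features are pulled together. Real systems typically use richer (e.g., Gaussian) kernels, so treat this as a sketch under that simplification.

```python
import numpy as np

def linear_mmd(source_feats, target_feats):
    """Linear-kernel MMD: squared Euclidean distance between the mean
    feature vectors of a source batch and a target batch."""
    diff = source_feats.mean(axis=0) - target_feats.mean(axis=0)
    return float(diff @ diff)
```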
Technique | Description | References |
---|---|---|
Metric-based | Prototypical, siamese, and matching networks classify new data points by comparing them to learned class prototypes using feature extractors and similarity metrics. | [137,138] |
Model-based | MANNs combine NNs with external memory modules to enhance learning efficiency. SNAIL (the Simple Neural Attentive Meta-Learner) improves parameter tuning efficiency. | [139] |
Optimization-based | MAML fine-tunes model parameters for rapid task adaptation. Other methods: META-LSTM, META-SGD, Reptile. | [138,139,140] |
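MAML's inner/outer loop structure can be shown on scalar regression y = theta * x. The sketch below uses the first-order approximation (dropping second derivatives, as FOMAML does); `fomaml_step` and the toy tasks are illustrative, not drawn from the cited works.

```python
import numpy as np

def fomaml_step(theta, tasks, inner_lr=0.05, outer_lr=0.05):
    """One first-order MAML step for toy regression y = theta * x.

    Inner loop: adapt theta to each task with one gradient step.
    Outer loop: update the shared initialization from the
    post-adaptation gradients (first-order approximation)."""
    meta_grad = 0.0
    for x, y in tasks:
        grad = 2.0 * np.mean((theta * x - y) * x)   # d/dtheta of MSE
        theta_i = theta - inner_lr * grad           # task-adapted parameter
        meta_grad += 2.0 * np.mean((theta_i * x - y) * x)
    return theta - outer_lr * meta_grad / len(tasks)
```

For tasks with slopes 2 and 4, repeated meta-steps drive the initialization toward 3, the point from which one inner step adapts best to either task.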
Technique | Description | References |
---|---|---|
Vision-language models (e.g., CLIP) | Enable classification using textual prompts, enhancing applications such as environmental monitoring and disaster assessment. | [141,142,143,144] |
Interactive segmentation | Click prompt learning for real-time refinement of outputs using user-provided prompts, particularly useful in medical imaging. An interactive medical image learning framework whose DL algorithms, trained during the user study, are compared against state-of-the-art modern augmentations. The PRISM model, applied to 3D medical image segmentation with significantly improved performance. In-depth analysis of the foundational principles of interactive segmentation methodologies, categorized by common characteristics in medical imaging. | [145,146,147,148] |
Zero-shot learning | Utilizes task-specific prompts to adapt models trained on general datasets to niche tasks, such as land-use mapping or satellite imagery analysis, without retraining. | [149] |
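The CLIP-style zero-shot workflow in the table amounts to nearest-neighbor search between an image embedding and the embeddings of textual prompts ("a satellite photo of a <label>"). In this sketch the embeddings are illustrative stand-ins for real encoder outputs, and the function name is ours:

```python
import numpy as np

def zero_shot_classify(image_emb, prompt_embs, labels):
    """Pick the label whose prompt embedding is most cosine-similar
    to the image embedding -- no task-specific retraining involved."""
    def norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    sims = norm(np.asarray(prompt_embs)) @ norm(np.asarray(image_emb))
    return labels[int(np.argmax(sims))]
```

Swapping the prompt set is all it takes to repurpose the same encoders for a new label vocabulary, which is what makes the approach attractive for niche tasks.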
Technique | Description | References |
---|---|---|
Model Compression Overview | Highlights the need for compression in DL to reduce computational cost and memory usage for deployment in resource-constrained environments such as edge devices. | [150] |
Pruning | Removes redundant or insignificant parameters (e.g., weights, neurons, layers) to reduce model size, computational requirements, and inference time. Includes structured and unstructured pruning approaches. | [151,152,153,154,155,156,157] |
Quantization | Reduces parameter precision (e.g., from 32-bit floats to 8-bit integers) for faster computation and lower memory usage while preserving model accuracy, enabling efficient deployment. | [158,159] |
Knowledge Distillation | Transfers knowledge from a large teacher model to a smaller student model, preserving essential characteristics while enhancing scalability and efficiency for deployment. | [160,161] |
Energy-Efficient Architectures | Focuses on developing architectures that consume less power and are optimized for specific hardware, including FPGA and ASIC implementations. | [162] |
Self-Distillation Methods | Refine models using predictions from their intermediate layers as supervisory signals, improving performance without external teacher models. | [163,164] |
RL for Optimization | Explores RL-based strategies to automate pruning and quantization, achieving optimal compression and performance trade-offs. | [165,166] |
Cooperative Compression | Discusses collaborative approaches like joint optimization of pruning, quantization, and distillation for maximum resource utilization and scalability. | [167] |
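Two of the techniques listed above, magnitude pruning and linear quantization, can be sketched in a few lines: pruning zeroes the smallest-magnitude weights, and symmetric quantization maps 32-bit floats to 8-bit integers via a scale factor. This is a toy illustration of the arithmetic, not a production compression pipeline.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of smallest-magnitude weights
    (unstructured pruning)."""
    threshold = np.quantile(np.abs(weights).ravel(), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def quantize_int8(weights):
    """Symmetric linear quantization of float weights to int8,
    returning (q, scale) so that q * scale approximates weights."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.array([[0.02, -0.9, 0.4], [-0.05, 0.7, -0.01]])
pruned = magnitude_prune(w, sparsity=0.5)            # small weights become zero
q, s = quantize_int8(w)
err = np.abs(q.astype(np.float32) * s - w).max()     # worst-case rounding error
```

The maximum reconstruction error of symmetric int8 quantization is bounded by half the scale factor, which is why accuracy is largely preserved when weight magnitudes are well distributed.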
Model | Description | References |
---|---|---|
ResNets | Introduced skip connections to address vanishing gradient problems, enabling the training of very deep networks and improving performance in tasks like classification and detection. | [168,169,170,171,172] |
ResNeXt | Utilized a split–transform–merge strategy to aggregate transformations, enhancing feature diversity and capture efficiency. | [173] |
DenseNet | Introduced densely connected layers to promote feature reuse, reduce the number of parameters, and improve computational efficiency and accuracy. | [174,175] |
Attention Mechanisms | Focus dynamically on the most relevant input regions, enhancing spatial dependency modeling for tasks like classification, segmentation, and detection. | [176,177,178,179,180] |
Self-Attention | Captures long-range dependencies by relating all elements within a sequence, boosting spatial and temporal understanding for image processing tasks. | [181] |
ViTs | Treat images as sequences of patches and use self-attention mechanisms for global dependency modeling and scalability with large datasets. | [182,183,184] |
Hybrid Architectures | Combine CNNs for local feature extraction with transformers for global context modeling, achieving improved performance for complex image tasks. | [185] |
GANs | Involve a generator and a discriminator in an adversarial process to synthesize realistic images, with applications in style transfer and super-resolution. | [186,187,188,189,190] |
WGANs | Introduce the Wasserstein loss, a more stable objective that mitigates mode collapse and training instability in GANs. | [191] |
CGANs | Enable controlled image generation using auxiliary information, improving specific task performance like cross-domain synthesis. | [192] |
GAN challenges | Limitations such as training instability, mode collapse, and heavy computational resource demands, which emphasize the need for careful design. | [193,194,195] |
Hybrid Models | Combine CNNs with transformers to leverage local feature extraction and global dependency modeling for tasks like video analysis and visual question answering. | [196,197,198,199,200,201] |
Multi-Modal Models | Integrate visual, textual, auditory, or sensory data to enhance understanding and decision-making in tasks like medical diagnostics and autonomous driving. | [202,205,206] |
Cross-Modal Transformers | Employ transformers for fusing multiple modalities, such as infrared and visible image fusion, enhancing model adaptability and performance across data modalities. | [203] |
Co-Attention Fusion Networks | Focus on aligning multi-modal data streams for specific tasks like multimodal skin cancer diagnosis, improving feature integration and decision accuracy. | [204] |
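The self-attention operation underlying the transformer and ViT rows above relates every element of a sequence (e.g., image patches) to every other. Below is a minimal single-head, scaled dot-product sketch in NumPy; real transformers add multi-head projections, positional encodings, and layer normalization, and the random projection matrices here stand in for learned weights.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence x of shape (n, d).
    wq, wk, wv are (d, d_k) projection matrices."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (n, n) pairwise affinities
    # Row-wise softmax: each position attends over all positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                              # (n, d_k) context vectors

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                    # 4 tokens, e.g., image patches
wq, wk, wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # → (4, 8)
```

Because every output row mixes information from all input positions, a single layer already captures the long-range dependencies that stacked convolutions need many layers to reach.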
Category | Metric |
---|---|
Classification | Accuracy, Precision, Recall (Sensitivity), F1-Score, AUC-ROC, Log Loss |
Segmentation and Detection | IoU, Dice Coefficient, Jaccard Index, Pixel Accuracy |
Image Quality | SSIM, PSNR, NCC |
Object Detection | mAP (mean Average Precision) |
Agreement | Cohen’s Kappa, MCC |
Advanced Evaluation | Balanced Accuracy, FID |
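Several of the segmentation metrics above are simple set overlaps between a predicted and a ground-truth binary mask. IoU and the Dice coefficient, for instance, differ only in how the overlap is normalized:

```python
import numpy as np

def iou(pred, target):
    """Intersection over Union for binary masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0

def dice(pred, target):
    """Dice coefficient: 2|A∩B| / (|A| + |B|).
    Related to IoU by Dice = 2*IoU / (1 + IoU)."""
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return 2 * inter / total if total else 1.0

pred   = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
target = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(iou(pred, target))   # 2 overlapping pixels / 4 in the union → 0.5
print(dice(pred, target))  # 2*2 / (3 + 3)
```

Dice weights the intersection more heavily than IoU, which is why it is often preferred as a loss surrogate for small structures in medical image segmentation.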
Topic | References | Description |
---|---|---|
Medical Imaging | [213,214,215,216,217,218,219,220,221] | Discusses the revolutionary impact of CNNs on medical diagnostics, such as cancer detection, Alzheimer’s disease, and diabetic retinopathy, and highlights how DL models aid in treatment planning and patient monitoring. |
Health monitoring | [222,223,224,225,226,227,228] | Continuous health monitoring using wearable devices, self-supervised learning, and AI, addressing challenges such as bias and disparities in diagnostic accuracy across demographic groups. |
Autonomous systems | [229,230,231,232,233,234,235] | Discusses DL applications such as object detection, lane-keeping, and obstacle avoidance in self-driving cars, focusing on real-time decision-making systems like ADASs. |
AI and edge computing in autonomous systems | [236,237,238,239,240] | Covers recent advancements in real-time AI and edge computing that enhance the efficiency of autonomous systems. |
Remote Sensing and Environmental Monitoring | [241,242,243,244,245,246,247] | Discusses the applications of DL in analyzing satellite and aerial imagery, particularly for tracking deforestation and wildlife, assessing damage from natural disasters, and predicting crop yields. |
Enhancing Environmental Monitoring | [248,249,250,251,252,253] | Highlights the integration of DL with remote sensing for environmental decision-making, as well as challenges like the computational cost of processing high-resolution images. |
Security and Surveillance | [254,255,256,257,258,259,260] | Pertains to the role of DL in real-time video surveillance, facial recognition, and anomaly detection. |
Security Surveillance and Ethical Concerns | [261,262,263,264,265,266,267] | Discusses the ethical implications of deploying DL in surveillance systems, including concerns around privacy, the potential misuse of technology, and bias. |
Art and Cultural Heritage | [268,269,270,271,272,273] | Describes the applications of DL in restoring damaged artwork, colorizing old photographs, and digitizing cultural artifacts. |
AI Collaboration in Cultural Preservation | [274,275,276,277,278] | Focuses on interdisciplinary collaboration between art historians and AI researchers to ensure DL respects cultural integrity and enhances public engagement. |
Ethical and Social Considerations | [279,280,281,282,283,284,285,286,287,288,289,290,291,292] | Addresses bias in DL models, particularly in medical imaging and surveillance, as well as the need for fairness, transparency, and accountability in AI systems and privacy-preserving algorithms. |
Interdisciplinary Collaboration | [293,294,295,296,297,298,299,300,301,302,303,304] | Highlights the importance of collaboration between AI researchers and domain experts in healthcare, environmental science, and security for advancing DL applications. |
Challenges | References |
---|---|
Data scarcity, particularly in medical imaging, autonomous vehicles, and satellite imagery | [305,306,307,308,309,310] |
Computational complexity of DL models and the challenge of deployment on edge devices | [311,312,313,314,315,316] |
Interpretability challenges, especially in healthcare, finance, and law | [317,318,319,320,321,322] |
Generalization and robustness challenges in DL models | [323,324,325,326,327,328] |
Ethical implications, including bias and privacy concerns in AI systems | [329,330,331,332,333,334] |
Future Directions | References |
---|---|
Self-supervised learning and data scarcity | [335,336,337,338,339,340] |
Efficient model architectures, NAS, pruning, quantization, distillation, neuromorphic and quantum computing | [341,342,343,344,345,346] |
XAI and methods for making DL models interpretable | [347,348,349,350,351,352,353] |
Integration of emerging technologies with DL, quantum computing, and edge computing | [354,355,356,357,358,359] |
Development of new evaluation metrics, fairness, and ethical considerations | [360,361,362,363,364,365] |
Opportunities for innovation in self-supervised learning, efficient model architectures, explainable AI, and emerging technologies | [366,367,368,369,370,371,372] |
Share and Cite
Trigka, M.; Dritsas, E. A Comprehensive Survey of Deep Learning Approaches in Image Processing. Sensors 2025, 25, 531. https://doi.org/10.3390/s25020531