Better with Less: Efficient and Accurate Skin Lesion Segmentation Enabled by Diffusion Model Augmentation
Abstract
1. Introduction
- An enhanced DDPM architecture for high-fidelity dermoscopic image synthesis. We design a DDPM whose U-Net backbone is modified with dilated convolutions and self-attention layers to capture the diffuse borders and complex internal textures characteristic of skin lesions (an illustrative sketch of such a block follows this list).
- A powerful data augmentation framework that boosts model efficiency. We systematically demonstrate that our synthetic data augmentation strategy provides consistent and substantial performance gains across a broad spectrum of segmentation architectures, from lightweight to complex models. Our framework enables compact, computationally efficient models to achieve accuracy on par with, or even exceeding, that of models with far more parameters, which is valuable for deploying accurate models in resource-constrained environments.
- Rigorous and comprehensive experimental validation. We conduct a thorough evaluation of our framework on standard benchmark datasets, including statistical validation of segmentation improvements across multiple architectures and an in-depth analysis of the trade-offs between model complexity and performance, demonstrating the efficacy and generalizability of our approach.
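To make the architectural idea in the first contribution concrete, the PyTorch sketch below shows one way a U-Net stage could combine residual dilated convolutions with spatial self-attention. The channel width, dilation rate, and number of attention heads are assumptions for illustration; this is a minimal sketch, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class DilatedAttentionBlock(nn.Module):
    """Illustrative U-Net stage: dilated convolutions followed by self-attention."""
    def __init__(self, channels: int = 128, dilation: int = 2, num_heads: int = 4):
        super().__init__()
        # Dilated convolutions enlarge the receptive field to cover diffuse lesion borders.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=dilation, dilation=dilation),
            nn.GroupNorm(8, channels),
            nn.SiLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=dilation, dilation=dilation),
            nn.GroupNorm(8, channels),
            nn.SiLU(),
        )
        # Self-attention over spatial positions models long-range texture dependencies.
        self.norm = nn.GroupNorm(8, channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x) + x                                 # residual dilated-conv path
        b, c, height, width = h.shape
        tokens = self.norm(h).flatten(2).transpose(1, 2)     # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        return h + attn_out.transpose(1, 2).reshape(b, c, height, width)

# Quick shape check: output resolution matches the input.
x = torch.randn(1, 128, 32, 32)
print(DilatedAttentionBlock()(x).shape)  # torch.Size([1, 128, 32, 32])
```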
2. Related Works
2.1. Skin Lesion Segmentation with Deep Learning
2.2. Generative Models for Medical Data Augmentation
3. Proposed Method
3.1. Stage 1: Enhanced DDPM for Data Synthesis
3.2. Stage 2: Segmentation Training and Inference
3.2.1. Model Architecture: The Dilated-UNet
3.2.2. Loss Function and Training
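Binary lesion segmentation networks are commonly trained with a weighted combination of Dice loss and binary cross-entropy. The sketch below shows that generic combination purely as an assumption for illustration; it is not necessarily the paper's exact loss formulation, and the weighting factor is a placeholder.

```python
import torch
from torch import nn

class DiceBCELoss(nn.Module):
    """Generic combined Dice + BCE loss for binary lesion masks (illustrative only)."""
    def __init__(self, smooth: float = 1.0, bce_weight: float = 0.5):
        super().__init__()
        self.smooth = smooth
        self.bce_weight = bce_weight  # placeholder weighting between the two terms
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # logits, target: (B, 1, H, W); target is a 0/1 float mask.
        probs = torch.sigmoid(logits)
        intersection = (probs * target).sum(dim=(1, 2, 3))
        dice = (2.0 * intersection + self.smooth) / (
            probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3)) + self.smooth
        )
        dice_loss = 1.0 - dice.mean()
        return self.bce_weight * self.bce(logits, target) + (1.0 - self.bce_weight) * dice_loss
```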
4. Experiments and Results
4.1. Dataset
4.2. Experimental Setup
4.3. Evaluation Metrics
The Fréchet Inception Distance (FID) between the real and generated feature distributions is

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right),$$

where:
- $\mu_r$, $\Sigma_r$ are the mean and covariance of the real image features.
- $\mu_g$, $\Sigma_g$ are the mean and covariance of the generated image features.
- $\lVert \mu_r - \mu_g \rVert_2^2$ is the squared Euclidean distance between the two mean vectors.
- $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix (sum of diagonal elements).
- $(\Sigma_r \Sigma_g)^{1/2}$ is the matrix square root of the product of the two covariance matrices.
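As a concrete illustration (not the authors' released evaluation code), the FID can be computed from two feature matrices with NumPy and SciPy; real_feats and gen_feats below are assumed to be N×D arrays of features extracted by an Inception network.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    """Compute FID between two feature sets of shape (N, D)."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(gen_feats, rowvar=False)

    # Squared Euclidean distance between the mean vectors.
    diff = mu_r - mu_g
    mean_term = diff @ diff

    # Matrix square root of the product of the covariance matrices.
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    if np.iscomplexobj(covmean):  # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real

    return float(mean_term + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```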
4.4. Composite Image Quality Assessment
4.5. Generalization of Data Augmentation to Other Segmentation Architectures
4.6. Impact of Synthetic Data on Segmentation Performance
- Full Synthetic Data: 0 real images + 1000 synthetic images.
- Real/Synthetic (1:10): 100 real images + 1000 synthetic images.
- Real/Synthetic (1:2): 500 real images + 1000 synthetic images.
- Real/Synthetic (1:1): 1000 real images + 1000 synthetic images.
- Real/Synthetic (2:1): 2000 real images + 1000 synthetic images.
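As an illustration of how such mixtures could be assembled, a subset of the real training set can be concatenated with the DDPM-generated images. The RealLesionDataset/SyntheticLesionDataset objects implied here are hypothetical placeholders, not part of any released code.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Subset

def build_mixed_dataset(real_ds, synthetic_ds, n_real: int, seed: int = 0):
    """Combine n_real randomly chosen real samples with all synthetic samples.

    real_ds / synthetic_ds are assumed to be image+mask datasets with identical
    transforms; e.g., n_real=100 with 1000 synthetic images gives the 1:10 setting.
    """
    g = torch.Generator().manual_seed(seed)
    real_indices = torch.randperm(len(real_ds), generator=g)[:n_real].tolist()
    return ConcatDataset([Subset(real_ds, real_indices), synthetic_ds])

# Example: the Real/Synthetic (1:2) configuration (500 real + 1000 synthetic).
# train_ds = build_mixed_dataset(real_ds, synthetic_ds, n_real=500)
# train_loader = DataLoader(train_ds, batch_size=8, shuffle=True)
```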
4.7. Analysis of Augmentation Efficacy and Model Efficiency
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Flavio, J.J.; Fernandez, S. The Rising Incidence of Skin Cancers in Young Adults: A Population-Based Study in Brazil. Sci. J. Dermatol. Venereol. 2025, 3, 39–53. [Google Scholar]
- Waseh, S.; Lee, J.B. Advances in melanoma: Epidemiology, diagnosis, and prognosis. Front. Med. 2023, 10, 1268479. [Google Scholar] [CrossRef]
- Abbas, Q.; Fondón, I.; Rashid, M. Unsupervised skin lesions border detection via two-dimensional image analysis. Comput. Methods Programs Biomed. 2011, 104, e1–e15. [Google Scholar] [CrossRef]
- Singh, G.; Kamalja, A.; Patil, R.; Karwa, A.; Tripathi, A.; Chavan, P. A comprehensive assessment of artificial intelligence applications for cancer diagnosis. Artif. Intell. Rev. 2024, 57, 179. [Google Scholar] [CrossRef]
- Khouloud, S.; Ahlem, M.; Fadel, T.; Amel, S. W-net and inception residual network for skin lesion segmentation and classification. Appl. Intell. 2022, 52, 3976–3994. [Google Scholar] [CrossRef]
- Zhang, J.; Xie, Y.; Xia, Y.; Shen, C. Attention residual learning for skin lesion classification. IEEE Trans. Med. Imaging 2019, 38, 2092–2103. [Google Scholar] [CrossRef] [PubMed]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. Unet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging 2019, 39, 1856–1867. [Google Scholar] [CrossRef]
- Kaur, R.; Kaur, S. Automatic skin lesion segmentation using attention residual U-Net with improved encoder-decoder architecture. Multimed. Tools Appl. 2025, 84, 4315–4341. [Google Scholar] [CrossRef]
- Chen, J.; Mei, J.; Li, X.; Lu, Y.; Yu, Q.; Wei, Q.; Luo, X.; Xie, Y.; Adeli, E.; Wang, Y.; et al. TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers. Med. Image Anal. 2024, 97, 103280. [Google Scholar] [CrossRef]
- Bie, Y.; Luo, L.; Chen, H. Mica: Towards explainable skin lesion diagnosis via multi-level image-concept alignment. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 837–845. [Google Scholar]
- El-Shafai, W.; El-Fattah, I.A.; Taha, T.E. Advancements in non-invasive optical imaging techniques for precise diagnosis of skin disorders. Opt. Quantum Electron. 2024, 56, 1112. [Google Scholar] [CrossRef]
- Jiang, H.; Imran, M.; Zhang, T.; Zhou, Y.; Liang, M.; Gong, K.; Shao, W. Fast-DDPM: Fast denoising diffusion probabilistic models for medical image-to-image generation. IEEE J. Biomed. Health Inform. 2025; Early Access. [Google Scholar] [CrossRef]
- Bourou, A.; Boyer, T.; Gheisari, M.; Daupin, K.; Dubreuil, V.; De Thonel, A.; Mezger, V.; Genovesio, A. PhenDiff: Revealing subtle phenotypes with diffusion models in real images. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Marrakesh, Morocco, 6–10 October 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 358–367. [Google Scholar]
- Huijben, E.M.; Pluim, J.P.; van Eijnatten, M.A. Denoising diffusion probabilistic models for addressing data limitations in chest X-ray classification. Inform. Med. Unlocked 2024, 50, 101575. [Google Scholar] [CrossRef]
- Pan, Z.; Xia, J.; Yan, Z.; Xu, G.; Wu, Y.; Jia, Z.; Chen, J.; Shi, Y. Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective. arXiv 2024, arXiv:2408.08228. [Google Scholar] [CrossRef]
- Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122. [Google Scholar]
- Mehta, S.; Rastegari, M.; Caspi, A.; Shapiro, L.; Hajishirzi, H. Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 552–568. [Google Scholar]
- Dash, M.; Londhe, N.D.; Ghosh, S.; Semwal, A.; Sonawane, R.S. PsLSNet: Automated psoriasis skin lesion segmentation using modified U-Net-based fully convolutional network. Biomed. Signal Process. Control 2019, 52, 226–237. [Google Scholar] [CrossRef]
- Xie, F.; Yang, J.; Liu, J.; Jiang, Z.; Zheng, Y.; Wang, Y. Skin lesion segmentation using high-resolution convolutional neural network. Comput. Methods Programs Biomed. 2020, 186, 105241. [Google Scholar] [CrossRef]
- Al-Masni, M.A.; Al-Antari, M.A.; Choi, M.T.; Han, S.M.; Kim, T.S. Skin lesion segmentation in dermoscopy images via deep full resolution convolutional networks. Comput. Methods Programs Biomed. 2018, 162, 221–231. [Google Scholar] [CrossRef] [PubMed]
- Zhao, C.; Shuai, R.; Ma, L.; Liu, W.; Wu, M. Segmentation of skin lesions image based on U-Net++. Multimed. Tools Appl. 2022, 81, 8691–8717. [Google Scholar] [CrossRef]
- Zeng, Z.; Hu, Q.; Xie, Z.; Li, B.; Zhou, J.; Xu, Y. Small but mighty: Enhancing 3d point clouds semantic segmentation with u-next framework. Int. J. Appl. Earth Obs. Geoinf. 2025, 136, 104309. [Google Scholar] [CrossRef]
- Liu, Y.; Zhu, H.; Liu, M.; Yu, H.; Chen, Z.; Gao, J. Rolling-unet: Revitalizing mlp’s ability to efficiently extract long-distance dependencies for medical image segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 3819–3827. [Google Scholar]
- Tang, H.; Li, Z.; Zhang, D.; He, S.; Tang, J. Divide-and-conquer: Confluent triple-flow network for RGB-T salient object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 47, 1958–1974. [Google Scholar] [CrossRef]
- Eddy, S.R. What is a hidden Markov model? Nat. Biotechnol. 2004, 22, 1315–1316. [Google Scholar] [CrossRef]
- Rasmussen, C.E. The infinite Gaussian mixture model. Adv. Neural Inf. Process. Syst. 1999, 12, 554–560. Available online: https://proceedings.neurips.cc/paper_files/paper/1999/file/97d98119037c5b8a9663cb21fb8ebf47-Paper.pdf (accessed on 20 August 2025).
- Kingma, D.P.; Welling, M. An Introduction to Variational Autoencoders (Foundations and Trends® in Machine Learning); Now Publishers: Norwell, MA, USA, 2019; Volume 12, pp. 307–392. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Qin, Z.; Liu, Z.; Zhu, P.; Xue, Y. A GAN-based image synthesis method for skin lesion classification. Comput. Methods Programs Biomed. 2020, 195, 105568. [Google Scholar] [CrossRef] [PubMed]
- Casti, P.; Cardarelli, S.; Comes, M.C.; D’Orazio, M.; Filippi, J.; Antonelli, G.; Mencattini, A.; Di Natale, C.; Martinelli, E. S3-VAE: A novel Supervised-Source-Separation Variational AutoEncoder algorithm to discriminate tumor cell lines in time-lapse microscopy images. Expert Syst. Appl. 2023, 232, 120861. [Google Scholar] [CrossRef]
- Kazerouni, A.; Aghdam, E.K.; Heidari, M.; Azad, R.; Fayyaz, M.; Hacihaliloglu, I.; Merhof, D. Diffusion models in medical imaging: A comprehensive survey. Med. Image Anal. 2023, 88, 102846. [Google Scholar] [CrossRef]
- Wang, G.; Zhang, N.; Liu, W.; Chen, H.; Xie, Y. MFST: A multi-level fusion network for remote sensing scene classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
- Zhang, B.; Zheng, Z.; Zhao, Y.; Shen, Y.; Sun, M. MCBTNet: Multi-Feature Fusion CNN and Bi-Level Routing Attention Transformer-based Medical Image Segmentation Network. IEEE J. Biomed. Health Inform. 2025, 29, 5069–5082. [Google Scholar] [CrossRef] [PubMed]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar] [CrossRef]
- Codella, N.C.; Gutman, D.; Celebi, M.E.; Helba, B.; Marchetti, M.A.; Dusza, S.W.; Kalloo, A.; Liopyris, K.; Mishra, N.; Kittler, H.; et al. Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi), hosted by the international skin imaging collaboration (isic). In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; IEEE: New York, NY, USA, 2018; pp. 168–172. [Google Scholar]
- Tschandl, P.; Rosendahl, C.; Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 2018, 5, 1–9. [Google Scholar] [CrossRef]
- Nie, X.; Zhang, C.; Cao, Q. Image segmentation method on quartz particle-size detection by deep learning networks. Minerals 2022, 12, 1479. [Google Scholar] [CrossRef]
- Sang, D.V.; Minh, N.D. Fully residual convolutional neural networks for aerial image segmentation. In Proceedings of the 9th International Symposium on Information and Communication Technology, Danang City, Vietnam, 6–7 December 2018; pp. 289–296. [Google Scholar]
- Heryadi, Y.; Irwansyah, E.; Miranda, E.; Soeparno, H.; Hashimoto, K. The effect of resnet model as feature extractor network to performance of DeepLabV3 model for semantic satellite image segmentation. In Proceedings of the 2020 IEEE Asia-Pacific Conference on Geoscience, Electronics and Remote Sensing Technology (AGERS), Jakarta, Indonesia, 7–8 December 2020; IEEE: New York, NY, USA, 2020; pp. 74–77. [Google Scholar]
- Hao, S.; Yu, Z.; Zhang, B.; Dai, C.; Fan, Z.; Ji, Z.; Ganchev, I. MEFP-Net: A dual-encoding multi-scale edge feature perception network for skin lesion segmentation. IEEE Access 2024, 12, 140039–140052. [Google Scholar] [CrossRef]
- Khan, S.; Khan, A.; Teng, Y. DFF-UNet: A Lightweight Deep Feature Fusion UNet Model for Skin Lesion Segmentation. IEEE Trans. Instrum. Meas. 2025, 74, 5030214. [Google Scholar] [CrossRef]
| Component | Parameter Group | Setting |
|---|---|---|
| DDPM for Image Generation | U-Net Architecture | 4 levels, channel multipliers [1, 2, 2, 2], attention at level 2 |
| | Time Embedding | Sinusoidal positional |
| | Optimizer | AdamW (weight decay, cosine annealing LR schedule) |
| | Regularization | Gradient clipping (1.0), augmentation (flip, normalization) |
| | Batch Size | 8 |
| Segmentation Model Training | Model Setup | Pre-trained backbones (ResNet50/101) |
| | Optimizer | RMSprop (momentum 0.9, LR, weight decay) |
| | Input Image Size | |
| | Data Split (Train/Val) | 90% (real + synthetic) / 10% (real) |
| | Max Epochs / Batch Size | Up to 50 / 8 |
| Key Software and Hardware | Framework | PyTorch 1.11.0 (CUDA 11.3) |
| | GPU | NVIDIA GeForce RTX 3090 |
| | OS / CUDA Driver | Linux / 11.7 |
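The optimizer and scheduler settings listed above can be expressed in PyTorch roughly as follows. The learning-rate and weight-decay values are placeholders (the exact numbers are not given in the table), and ddpm_unet / seg_model stand in for the respective networks.

```python
import torch
from torch import nn

def make_ddpm_optimizer(ddpm_unet: nn.Module, total_steps: int = 1000):
    """AdamW with weight decay and cosine annealing, per the setup table.
    The LR/WD values below are placeholders, not the paper's exact numbers."""
    opt = torch.optim.AdamW(ddpm_unet.parameters(), lr=1e-4, weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=total_steps)
    return opt, sched

def ddpm_train_step(ddpm_unet: nn.Module, loss: torch.Tensor, opt: torch.optim.Optimizer) -> None:
    opt.zero_grad()
    loss.backward()
    # Gradient clipping at 1.0, as listed under "Regularization".
    torch.nn.utils.clip_grad_norm_(ddpm_unet.parameters(), max_norm=1.0)
    opt.step()

def make_segmentation_optimizer(seg_model: nn.Module):
    """RMSprop with momentum 0.9; LR and weight decay are again placeholders."""
    return torch.optim.RMSprop(seg_model.parameters(), lr=1e-4, momentum=0.9, weight_decay=1e-8)
```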
Model | FID Score |
---|---|
Traditional DDPM | 143.7268 |
DDPM + dilated convolutions | 140.3926 |
| Segmentation Model | DICE (R) | IoU (R) | DICE (R + S) | IoU (R + S) | ΔDICE [pp] | ΔIoU [pp] | Significance |
|---|---|---|---|---|---|---|---|
Standard U-Net | 0.8757 | 0.7988 | 0.8829 | 0.8086 | +0.72 | +0.98 | ** |
Dilated-UNet (ours) | 0.8797 | 0.8013 | 0.8843 | 0.8088 | +0.46 | +0.75 | * |
FCN-ResNet50 | 0.8859 | 0.8134 | 0.8931 | 0.8221 | +0.72 | +0.87 | * |
FCN-ResNet101 | 0.8842 | 0.8110 | 0.8907 | 0.8200 | +0.65 | +0.90 | * |
DeepLabV3-ResNet50 | 0.8847 | 0.8118 | 0.8835 | 0.8112 | −0.12 | −0.06 | n.s. |
DeepLabV3-ResNet101 | 0.8775 | 0.7855 | 0.8859 | 0.7968 | +0.84 | +1.13 | ** |
MEFP-Net [41] | 0.8965 | 0.8275 | 0.9014 | 0.8321 | +0.49 | +0.46 | * |
DFF-UNet [42] | 0.8762 | 0.8175 | 0.8804 | 0.8219 | +0.42 | +0.44 | n.s. |
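For reference, the DICE and IoU values reported above can be computed from binary prediction and ground-truth masks as in the sketch below. The significance markers in the table come from paired statistical tests whose exact form is not stated here, so only the metrics are shown.

```python
from typing import Tuple
import numpy as np

def dice_and_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> Tuple[float, float]:
    """DICE and IoU for binary masks (arrays of 0/1 with the same shape)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    iou = (intersection + eps) / (union + eps)
    return float(dice), float(iou)

# Example: a perfect prediction yields DICE = IoU = 1.0.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 2:6] = 1
print(dice_and_iou(mask, mask))  # (1.0, 1.0)
```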
Training Data Configuration | DICE |
---|---|
Real Data Only (2000 images) | 0.8797 |
Synthetic Data (Dilated Conv.) Only (1000 images) | 0.7836 |
Synthetic Data (Normal Conv.) Only (1000 images) | 0.7802 |
Real Data (2000) + Synthetic (Normal Conv., 1000) | 0.8841 |
Real Data (2000) + Synthetic (Dilated Conv., 1000) | 0.8843 |
Training Data Configuration | Real Images | Synthetic Images | DICE |
---|---|---|---|
Full Synthetic Data Only | 0 | 1000 | 0.7836 |
Real/Synthetic (1:10) | 100 | 1000 | 0.8178 |
Real/Synthetic (1:2) | 500 | 1000 | 0.8509 |
Real/Synthetic (1:1) | 1000 | 1000 | 0.8640 |
Real/Synthetic (2:1) | 2000 | 1000 | 0.8843 |
| Segmentation Model | Parameters (M) | IoU (Baseline) | IoU (Augmented) | Change (pp) |
|---|---|---|---|---|
DFF-UNet [42] | 0.19 | 0.8175 | 0.8219 | +0.44 |
MEFP-Net [41] | 5.62 | 0.8280 | 0.8315 | +0.35 |
Standard U-Net | 17.26 | 0.7988 | 0.8086 | +0.98 |
Dilated-UNet | 17.26 | 0.8013 | 0.8088 | +0.75 |
FCN-ResNet50 | 35.31 | 0.8121 | 0.8221 | +1.00 |
DeepLabV3-ResNet50 | 42.00 | 0.8120 | 0.8110 | −0.10 |
FCN-ResNet101 | 54.30 | 0.8130 | 0.8210 | +0.80 |
DeepLabV3-ResNet101 | 60.99 | 0.7860 | 0.7990 | +1.30 |
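The parameter counts in the first column can be reproduced with a short helper such as the one below; the toy model is only there to show the helper in action, and the exact count for any architecture depends on its output head (number of classes, auxiliary classifier).

```python
import torch
from torch import nn

def count_parameters_m(model: nn.Module) -> float:
    """Total trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Tiny stand-in model just to demonstrate the helper; applying the same helper
# to each architecture in the table yields its "Parameters (M)" entry.
toy = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 1, 1))
print(f"{count_parameters_m(toy):.4f} M parameters")

# e.g., for a torchvision segmentation model:
# from torchvision.models.segmentation import fcn_resnet50
# print(count_parameters_m(fcn_resnet50(num_classes=1)))
```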