MSLUnet: A Medical Image Segmentation Network Incorporating Multi-Scale Semantics and Large Kernel Convolution
Abstract
1. Introduction
- (1) We introduce a new segmentation network, MSLUnet, designed specifically for lesion segmentation in medical images. MSLUnet efficiently extracts multi-scale and global contextual information while requiring fewer parameters and less computation, thereby improving the accuracy of lesion segmentation;
- (2) We present the Multi-scale Feature Extraction (MSE) block, which captures multi-scale feature information at a granular level. This block uses a symmetric structure built from multiple basic blocks to comprehensively extract features at different scales;
- (3) We propose a convolutional decoder module that combines depthwise separable convolution with an inverted bottleneck structure to optimize the extraction of global contextual information while minimizing the parameter count. The module is further enhanced with normalization, residual connections, and activation functions to improve its performance in medical image segmentation;
- (4) Experimental evaluations demonstrate that MSLUnet, with only 2.18 million parameters (just 38% of the parameters of the traditional Unet), achieves superior segmentation results compared to other models across several public medical image datasets.
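The parameter savings claimed in contributions (3) and (4) follow directly from how depthwise separable convolution factorizes a standard convolution. As a minimal illustration (the 7 × 7 kernel and 64-channel width below are hypothetical, not the paper's exact layer configuration), the parameter counts can be compared as:

```python
def conv2d_params(c_in, c_out, k):
    """Parameters in a standard k x k convolution layer (with bias)."""
    return c_in * c_out * k * k + c_out

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution (with biases)."""
    depthwise = c_in * k * k + c_in        # one k x k filter per input channel
    pointwise = c_in * c_out + c_out       # 1 x 1 conv mixes channels
    return depthwise + pointwise

# Hypothetical large-kernel layer: 7 x 7 kernel, 64 -> 64 channels
std = conv2d_params(64, 64, 7)             # 200768 parameters
sep = depthwise_separable_params(64, 64, 7)  # 7360 parameters
print(f"standard: {std}, separable: {sep} ({sep / std:.1%} of standard)")
```

The factorized form keeps a large receptive field at a small fraction of the cost, which is why large-kernel designs typically pair big kernels with depthwise convolution.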
2. Related Work
3. Methods
3.1. Multiscale Feature Extraction Encoder
3.2. Large-Kernel Convolution Feature Extraction Block (LKE Block)
3.3. Three-Branch Attention Mechanism
3.4. Downsampling and Upsampling
4. Experiments
4.1. Datasets
4.2. Experimental Details
4.3. Experimental Results
4.3.1. Comparison of BUSI Dataset
4.3.2. Comparison of Kvasir-SEG Dataset
4.3.3. Comparison of ISIC 2018 Dataset
4.3.4. Statistical Test Analysis
- BUSI dataset: the F-statistic is 302.87 with a p-value of 1.78 × 10⁻²⁹, indicating an extremely significant difference in performance between the models on this dataset. Some models handle this type of data well while others perform poorly, giving high variability across models.
- Kvasir-SEG dataset: the F-statistic is 126.46 with a p-value of 2.39 × 10⁻²³, indicating lower variability in performance among models than on the BUSI dataset; the difference nonetheless remains highly significant.
- ISIC 2018 dataset: the F-statistic is 62.44 with a p-value of 1.62 × 10⁻¹⁸, the smallest between-model difference of the three datasets, though still significant.
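The F-statistics above come from a one-way ANOVA across the compared models. A minimal sketch of the computation, using hypothetical per-run Dice scores for three models (not the paper's actual data):

```python
def one_way_anova_f(groups):
    """One-way ANOVA F-statistic for a list of sample groups (one group per model)."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical Dice scores from repeated runs of three models on one dataset
scores = [[0.79, 0.80, 0.78], [0.75, 0.74, 0.76], [0.72, 0.73, 0.71]]
f_stat = one_way_anova_f(scores)
```

A large F means the differences between model means dominate the run-to-run noise within each model, which is then converted to a p-value against the F-distribution with (k − 1, n − k) degrees of freedom.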
4.3.5. Analysis
4.3.6. Ablation Experiments
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Xu, G.; Feng, C.; Ma, F. Review of Medical Image Segmentation Based on UNet. J. Front. Comput. Sci. Technol. 2023, 17, 1776–1792. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Wang, F.; Wang, X.; Sun, S. A Reinforcement Learning Level-Based Particle Swarm Optimization Algorithm for Large-Scale Optimization. Inf. Sci. 2022, 602, 298–312. [Google Scholar] [CrossRef]
- Wang, Y.; Cai, S.; Chen, J.; Yin, M. SCCWalk: An Efficient Local Search Algorithm and Its Improvements for Maximum Weight Clique Problem. Artif. Intell. 2020, 280, 103230. [Google Scholar] [CrossRef]
- Wang, L.; Pan, Z.; Wang, J. A Review of Reinforcement Learning Based Intelligent Optimization for Manufacturing Scheduling. Complex Syst. Model. Simul. 2021, 1, 257–270. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597. [Google Scholar]
- Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Springer: Cham, Switzerland, 2018; pp. 3–11. [Google Scholar]
- Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.W.; Wu, J. UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 1055–1059. [Google Scholar]
- Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.J.; Heinrich, M.P.; Misawa, K.; Mori, K.; McDonagh, S.G.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar]
- Diakogiannis, F.I.; Waldner, F.; Caccetta, P.; Wu, C. ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data. ISPRS J. Photogramm. Remote Sens. 2020, 162, 94–114. [Google Scholar] [CrossRef]
- Jha, D.; Riegler, M.A.; Johansen, D.; Halvorsen, P.; Johansen, H.D. DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation. In Proceedings of the 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), Rochester, MN, USA, 28–30 July 2020; pp. 558–564. [Google Scholar]
- Valanarasu, J.M.J.; Patel, V.M. UNeXt: MLP-Based Rapid Medical Image Segmentation Network. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2022, Singapore, 18–22 September 2022; Part VI. pp. 23–33. [Google Scholar]
- Han, Z.; Jian, M.; Wang, G.-G. ConvUNeXt: An efficient convolution neural network for medical image segmentation. Knowl.-Based Syst. 2022, 253, 109512. [Google Scholar] [CrossRef]
- Zhou, Y.; Chang, H.; Lu, X.; Lu, Y. DenseUNet: Improved image classification method using standard convolution and dense transposed convolution. Knowl.-Based Syst. 2022, 254, 109658. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
- Amyar, A.; Modzelewski, R.; Vera, P.; Morard, V.; Ruan, S. Multi-task multi-scale learning for outcome prediction in 3D PET images. Comput. Biol. Med. 2022, 151, 106208. [Google Scholar] [CrossRef]
- Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Zuo, S.; Xiao, Y.; Chang, X.; Wang, X. Vision transformers for dense prediction: A survey. Knowl.-Based Syst. 2022, 253, 109552. [Google Scholar] [CrossRef]
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021, arXiv:2102.04306. [Google Scholar]
- Chen, B.; Liu, Y.; Zhang, Z.; Lu, G.; Kong, A.W.-K. TransAttUnet: Multi-Level Attention-Guided U-Net with Transformer for Medical Image Segmentation. IEEE Trans. Emerg. Top. Comput. Intell. 2021, 8, 55–68. [Google Scholar] [CrossRef]
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation. In Computer Vision—ECCV 2022 Workshops; Springer: Cham, Switzerland, 2023; pp. 205–218. [Google Scholar]
- Jha, D.; Smedsrud, P.H.; Riegler, M.A.; Johansen, D.; Lange, T.D.; Halvorsen, P.; Johansen, H.D. ResUNet++: An Advanced Architecture for Medical Image Segmentation. In Proceedings of the 2019 IEEE International Symposium on Multimedia (ISM), San Diego, CA, USA, 9–11 December 2019; pp. 225–2255. [Google Scholar]
- Shao, H.; Zeng, Q.; Hou, Q.; Yang, J. MCANet: Medical Image Segmentation with Multi-Scale Cross-Axis Attention. arXiv 2023, arXiv:2312.08866. [Google Scholar]
- Titoriya, A.K.; Singh, M.P. PVT-CASCADE network on skin cancer dataset. In Proceedings of the 8th International Conference on Computing in Engineering and Technology (ICCET 2023), Hybrid Conference, Patna, India, 14–15 July 2023; pp. 480–486. [Google Scholar]
- Lu, Z.; She, C.; Wang, W.; Huang, Q. LM-Net: A light-weight and multi-scale network for medical image segmentation. Comput. Biol. Med. 2024, 168, 107717. [Google Scholar] [CrossRef]
- Dinh, B.D.; Nguyen, T.T.; Tran, T.T.; Pham, V.T. 1M parameters are enough? A lightweight CNN-based model for medical image segmentation. In Proceedings of the 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Taipei, Taiwan, 31 October–3 November 2023; pp. 1279–1284. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Zhao, Z.; Liu, Q.; Wang, S. Learning Deep Global Multi-Scale and Local Attention Features for Facial Expression Recognition in the Wild. IEEE Trans. Image Process. 2021, 30, 6544–6556. [Google Scholar] [CrossRef]
- Quan, X.P.; Xiang, L.Y.; Ying, L. Medical Image Segmentation Fusing Multi-Scale Semantic and Residual Bottleneck Attention. Comput. Eng. 2023, 49, 162–170. [Google Scholar] [CrossRef]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, inception-ResNet and the impact of residual connections on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 4278–4284. [Google Scholar]
- Ba, J.; Kiros, J.R.; Hinton, G.E. Layer Normalization. arXiv 2016, arXiv:1607.06450. [Google Scholar]
- Agarap, A.F. Deep Learning using Rectified Linear Units (ReLU). arXiv 2018, arXiv:1803.08375. [Google Scholar]
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807. [Google Scholar]
- Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 16133–16142. [Google Scholar]
- Misra, D.; Nalamada, T.; Arasanipalai, A.U.; Hou, Q. Rotate to Attend: Convolutional Triplet Attention Module. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 3138–3147. [Google Scholar]
- Al-Dhabyani, W.; Gomaa, M.; Khaled, H.; Fahmy, A. Dataset of breast ultrasound images. Data Brief 2020, 28, 104863. [Google Scholar] [CrossRef] [PubMed]
- Jha, D.; Smedsrud, P.H.; Riegler, M.; Halvorsen, P.; de Lange, T.; Johansen, D.; Johansen, H.D. Kvasir-SEG: A Segmented Polyp Dataset. In Proceedings of the Conference on Multimedia Modeling, Daejeon, Republic of Korea, 5–8 January 2020. [Google Scholar]
- Codella, N.C.F.; Rotemberg, V.M.; Tschandl, P.; Celebi, M.E.; Dusza, S.W.; Gutman, D.; Helba, B.; Kalloo, A.; Liopyris, K.; Marchetti, M.A.; et al. Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC). arXiv 2019, arXiv:1902.03368. [Google Scholar]
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Tang, F.; Ding, J.; Wang, L.; Ning, C.Y.; Zhou, S.K. CMUNeXt: An Efficient Medical Image Segmentation Network based on Large Kernel and Skip Fusion. arXiv 2023, arXiv:2308.01239. [Google Scholar]
- Chen, G.; Li, L.; Dai, Y.; Zhang, J.; Yap, M.H. AAU-Net: An Adaptive Attention U-Net for Breast Lesions Segmentation in Ultrasound Images. IEEE Trans. Med. Imaging 2023, 42, 1289–1300. [Google Scholar] [CrossRef]
- Rahman, M.M.; Marculescu, R. G-CASCADE: Efficient Cascaded Graph Convolutional Decoding for 2D Medical Image Segmentation. In Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2024; pp. 7713–7722. [Google Scholar]
- Amyar, A.; Guo, R.; Cai, X.; Assana, S.; Chow, K.; Rodriguez, J.; Yankama, T.; Cirillo, J.; Pierce, P.; Goddu, B.; et al. Impact of deep learning architectures on accelerated cardiac T₁ mapping using MyoMapNet. NMR Biomed. 2022, 35, e4794. [Google Scholar] [CrossRef]
| Dataset | Train | Test | Input Image |
|---|---|---|---|
| BUSI [39] | 517 | 130 | 256 × 256 |
| Kvasir-SEG [40] | 800 | 200 | 256 × 256 |
| ISIC 2018 [41] | 2075 | 518 | 256 × 256 |
| Architecture | mIoU (%) | Dice (%) | Recall (%) | Precision (%) | Specificity (%) |
|---|---|---|---|---|---|
| UNet [8] | 77.1 | 75.2 | 84.2 | 87.7 | 70.4 |
| UNet++ [9] | 78.3 | 75.8 | 83.1 | 91.7 | 70.8 |
| Attention U-Net [11] | 76.6 | 72.8 | 83.5 | 87.7 | 69.9 |
| ResUNet++ [25] | 78.6 | 77.0 | 87.5 | 86.6 | 72.6 |
| SwinUnet [24] | 78.5 | 77.4 | 87.9 | 89.7 | 72.7 |
| ConvUNeXt [15] | 77.5 | 73.2 | 82.8 | 90.6 | 70.1 |
| CMUNeXt [43] | 78.3 | 75.0 | 87.3 | 85.9 | 72.5 |
| AAU-Net [44] | 78.7 | 78.9 | 87.6 | 86.3 | 72.9 |
| ULite [29] | 76.4 | 72.9 | 82.2 | 89.3 | 69.2 |
| LM-Net [28] | 79.0 | 79.1 | 87.9 | 86.5 | 73.2 |
| MSLUnet (ours) | 79.3 | 79.8 | 88.3 | 88.7 | 73.7 |
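The metrics reported in these comparison tables (mIoU, Dice, recall, precision, specificity) are standard confusion-matrix quantities over predicted and ground-truth masks. A minimal sketch on toy binary masks (the mask values below are illustrative, not the paper's data):

```python
def confusion_counts(pred, target):
    """TP/FP/FN/TN for binary masks given as flat 0/1 sequences."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)
    return tp, fp, fn, tn

def dice(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

def iou(tp, fp, fn):
    return tp / (tp + fp + fn)

# Toy 8-pixel masks; recall = tp/(tp+fn), precision = tp/(tp+fp),
# specificity = tn/(tn+fp) follow the same pattern.
pred   = [1, 1, 0, 0, 1, 0, 1, 1]
target = [1, 0, 0, 0, 1, 1, 1, 1]
tp, fp, fn, tn = confusion_counts(pred, target)
```

mIoU in the tables is this IoU averaged over test images (or classes); Dice weights true positives twice, so it is always at least as large as IoU on the same counts.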
| Architecture | mIoU (%) | Dice (%) | Recall (%) | Precision (%) | Specificity (%) |
|---|---|---|---|---|---|
| UNet [8] | 87.3 | 86.6 | 94.2 | 94.2 | 86.0 |
| UNet++ [9] | 90.8 | 90.0 | 95.1 | 95.0 | 87.2 |
| Attention U-Net [11] | 88.9 | 89.5 | 95.2 | 95.1 | 87.4 |
| ResUNet++ [25] | 88.3 | 88.0 | 92.7 | 94.5 | 83.3 |
| SwinUnet [24] | 82.5 | 84.6 | 91.8 | 93.2 | 82.8 |
| ConvUNeXt [15] | 90.7 | 89.6 | 94.3 | 95.4 | 86.6 |
| CMUNeXt [43] | 90.9 | 91.2 | 94.9 | 95.3 | 87.0 |
| G-CASCADE [45] | 89.9 | 92.6 | 95.2 | 94.6 | 87.7 |
| ULite [29] | 90.4 | 88.4 | 95.2 | 94.4 | 86.8 |
| LM-Net [28] | 89.2 | 91.5 | 95.0 | 95.3 | 87.3 |
| MSLUnet (ours) | 91.1 | 93.2 | 95.6 | 94.9 | 88.4 |
| Architecture | mIoU (%) | Dice (%) | Recall (%) | Precision (%) | Specificity (%) |
|---|---|---|---|---|---|
| UNet [8] | 85.1 | 86.8 | 91.3 | 93.5 | 78.7 |
| UNet++ [9] | 86.6 | 87.3 | 91.9 | 93.3 | 81.1 |
| Attention U-Net [11] | 86.0 | 87.1 | 91.1 | 93.7 | 80.1 |
| ResUNet++ [25] | 85.7 | 86.6 | 90.6 | 93.8 | 79.5 |
| TransUnet [22] | 85.9 | 88.3 | 91.8 | 93.8 | 81.0 |
| ConvUNeXt [15] | 86.2 | 87.2 | 92.0 | 92.9 | 80.5 |
| CMUNeXt [43] | 86.9 | 88.5 | 91.3 | 93.4 | 79.9 |
| UNeXt [14] | 85.4 | 90.3 | 93.2 | 93.9 | 80.7 |
| ULite [29] | 86.0 | 87.0 | 90.8 | 94.0 | 80.0 |
| G-CASCADE [45] | 86.5 | 90.9 | 93.1 | 94.7 | 80.9 |
| MSLUnet (ours) | 87.3 | 91.4 | 92.9 | 95.1 | 81.2 |
| Dataset | F-Statistic | p-Value | Conclusion |
|---|---|---|---|
| BUSI [39] | 302.87 | 1.78 × 10⁻²⁹ | Significant difference |
| Kvasir-SEG [40] | 126.46 | 2.39 × 10⁻²³ | Significant difference |
| ISIC 2018 [41] | 62.44 | 1.62 × 10⁻¹⁸ | Significant difference |
| Architecture | Parameters (M) | FLOPs (G) | BUSI mIoU (%) | BUSI Dice | Kvasir-SEG mIoU (%) | Kvasir-SEG Dice | ISIC 2018 mIoU (%) | ISIC 2018 Dice |
|---|---|---|---|---|---|---|---|---|
| UNet | 4.32 | 35.61 | 77.1 | 0.737 | 87.3 | 0.866 | 85.1 | 0.848 |
| UNet + MSE | 3.86 | 10.55 | 78.2 | 0.779 | 89.9 | 0.897 | 86.9 | 0.896 |
| UNet + LKE | 2.64 | 5.21 | 77.9 | 0.764 | 91.8 | 0.904 | 86.5 | 0.893 |
| UNet + AT | 4.31 | 10.15 | 78.0 | 0.767 | 90.6 | 0.901 | 87.2 | 0.884 |
| MSLUnet (ours) | 2.18 | 5.69 | 79.3 | 0.798 | 91.1 | 0.932 | 87.3 | 0.914 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhu, S.; Cheng, L. MSLUnet: A Medical Image Segmentation Network Incorporating Multi-Scale Semantics and Large Kernel Convolution. Appl. Sci. 2024, 14, 6765. https://doi.org/10.3390/app14156765