Dual Attention-Based 3D U-Net Liver Segmentation Algorithm on CT Images
Abstract
1. Introduction
- (1) Limited network depth restricts the extraction and modeling ability of complex features.
- (2) Lack of comprehensive global contextual analysis capability limits performance improvement.
- (3) Loss of semantic features during the upsampling process results in decreased accuracy of the final prediction.
- (1) An improved 3D U-Net network with residual connections is introduced to increase network depth, better capture multiscale information, and alleviate semantic information loss.
- (2) Integration of the DA-Block into the encoder enhances feature extraction capability and extracts rich global contextual information.
- (3) The introduction of CBAM modules in each skip connection optimizes the transfer of encoder features, reduces semantic gaps, aids the decoder in reconstructing feature maps, and achieves accurate liver segmentation.
2. Related Works
2.1. U-Net Models
2.2. The Use of Attention Mechanisms in Liver Segmentation
3. Methodology
3.1. 3D Res-UNet
- (1) The encoder module utilizes convolution and pooling operations to downsample the input, progressively reducing the image size and feature count to extract low-level features.
- (2) The decoder module employs transpose convolution operations to upsample the input, gradually increasing the image size and feature count, and performs feature fusion to generate high-level features.
- (3) The residual connections link features of the same resolution between the downsampling and upsampling paths, helping the network capture multiscale information, alleviating semantic information loss, and improving segmentation performance.
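The interplay of the three modules above can be sketched in NumPy to make the shapes explicit. Note this is only an illustration: the actual network learns the downsampling and upsampling operators as strided and transpose convolutions; fixed max-pooling and nearest-neighbour repetition are stand-ins.

```python
import numpy as np

def max_pool3d(x, k=2):
    """Downsample a (C, D, H, W) volume with non-overlapping k*k*k max-pooling."""
    C, D, H, W = x.shape
    return x.reshape(C, D // k, k, H // k, k, W // k, k).max(axis=(2, 4, 6))

def upsample3d(x, k=2):
    """Upsample a (C, D, H, W) volume by nearest-neighbour repetition."""
    return x.repeat(k, axis=1).repeat(k, axis=2).repeat(k, axis=3)

# Encoder path: pooling halves each spatial dimension.
enc = np.random.rand(8, 16, 16, 16).astype(np.float32)
down = max_pool3d(enc)        # (8, 8, 8, 8)

# Decoder path: upsampling restores the resolution; the residual connection
# then fuses the same-resolution encoder features by element-wise addition.
up = upsample3d(down)         # (8, 16, 16, 16)
fused = up + enc              # fusion at matching resolution
```

The key constraint the sketch demonstrates is that the skip/residual fusion only works because both operands share the same (C, D, H, W) shape.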
3.2. Dual Attention Block
3.2.1. DA-PAM
- (1) Input feature map A ∈ ℝ^(C×D×H×W), where C denotes the number of channels, D the depth, H the height, and W the width. Three convolutional layers produce three feature maps P, Q, and M, each of size ℝ^(C×D×H×W).
- (2) Let N = D × H × W denote the number of voxels. Reshape the feature maps P, Q, and M into P′, Q′, and M′, each of size ℝ^(C×N).
- (3) Transpose P′ to obtain P′ᵀ ∈ ℝ^(N×C) and multiply it with the feature map Q′. The product is passed through a softmax layer to obtain the normalized weight map S ∈ ℝ^(N×N), the spatial attention map: S = softmax(P′ᵀQ′).
- (4) Multiply the feature map M′ with the transposed weight map Sᵀ, scale the product by a factor α, and reshape the result into ℝ^(C×D×H×W). α is initialized to 0 and iteratively learns a larger weight during training.
- (5) Finally, element-wise addition with the input feature map A yields the output E ∈ ℝ^(C×D×H×W): E = α · reshape(M′Sᵀ) + A.
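The five steps above reduce to a handful of matrix operations. A minimal NumPy sketch follows; `da_pam` is a hypothetical name, and P, Q, M are passed in precomputed, standing in for the outputs of the three convolutional layers:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def da_pam(A, P, Q, M, alpha=0.0):
    """Position attention over a (C, D, H, W) feature map.
    alpha is the learnable scale, initialized to 0 as in the paper."""
    C, D, H, W = A.shape
    N = D * H * W
    Pp, Qp, Mp = P.reshape(C, N), Q.reshape(C, N), M.reshape(C, N)
    S = softmax(Pp.T @ Qp, axis=-1)        # (N, N) spatial attention map
    out = (Mp @ S.T).reshape(C, D, H, W)   # attend over all voxel positions
    return alpha * out + A                 # residual fusion with the input

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2, 3, 3))
E = da_pam(A, A, A, A, alpha=0.5)          # same shape as A
```

With alpha = 0 the module is an identity mapping, which matches the paper's initialization: attention is blended in only as training increases alpha.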
3.2.2. DA-CAM
- (1) Let N = D × H × W denote the number of voxels. Reshape the input feature map A into A′ ∈ ℝ^(C×N).
- (2) Multiply A′ with its transpose A′ᵀ ∈ ℝ^(N×C); the product is passed through a softmax layer to obtain the normalized weight map X ∈ ℝ^(C×C), the channel attention map: X = softmax(A′A′ᵀ).
- (3) Multiply the transposed weight map Xᵀ with the reshaped feature map A′, scale the product by a factor β, and reshape the result into ℝ^(C×D×H×W). β is initialized to 0 and iteratively learns a larger weight during training.
- (4) Finally, element-wise addition with the input feature map A yields the output E ∈ ℝ^(C×D×H×W): E = β · reshape(XᵀA′) + A.
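The channel branch is even simpler, since no extra convolutions are needed before the attention map. A NumPy sketch under the same conventions as the position branch (`da_cam` is a hypothetical name):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def da_cam(A, beta=0.0):
    """Channel attention over a (C, D, H, W) feature map.
    beta is the learnable scale, initialized to 0 as in the paper."""
    C, D, H, W = A.shape
    Ap = A.reshape(C, -1)                   # A': (C, N)
    X = softmax(Ap @ Ap.T, axis=-1)         # (C, C) channel attention map
    out = (X.T @ Ap).reshape(C, D, H, W)    # re-weight channels
    return beta * out + A                   # residual fusion with the input

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 2, 3, 3))
E = da_cam(A, beta=0.5)                     # same shape as A
```

Because X is only C×C, the channel branch is much cheaper than the N×N spatial attention map for large volumes.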
3.2.3. The Detailed Process of the DA-Block
3.3. Convolutional Block Attention Module
3.3.1. CBAM-CAM
- (1) Input feature map A ∈ ℝ^(C×D×H×W), where C denotes the number of channels, D the depth, H the height, and W the width. The feature map is subjected separately to max-pooling and average-pooling along the spatial dimensions, producing two feature descriptors B and C, both of size ℝ^(C×1×1×1): B = MaxPool(A), C = AvgPool(A).
- (2) B and C are passed through a shared-weight multilayer perceptron (MLP) comprising two convolutional layers with weights W0 and W1, yielding new feature maps B′ = W1(W0(B)) and C′ = W1(W0(C)).
- (3) B′ and C′ are element-wise added and passed through a sigmoid activation function σ to obtain the normalized channel attention weights: Mc(A) = σ(B′ + C′).
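The three steps can be sketched in NumPy. Since the pooled descriptors are (C, 1, 1, 1), the 1×1×1 convolutions W0 and W1 reduce to matrix products; the reduction ratio implied by the weight shapes is an assumption here, and a ReLU between the two layers follows the standard CBAM design:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_channel_attention(A, W0, W1):
    """A: (C, D, H, W). W0: (C//r, C) and W1: (C, C//r) are the shared-MLP
    weights. Returns channel weights Mc of shape (C, 1, 1, 1)."""
    B = A.max(axis=(1, 2, 3))                   # spatial max-pool -> (C,)
    Cd = A.mean(axis=(1, 2, 3))                 # spatial avg-pool -> (C,)
    mlp = lambda v: W1 @ np.maximum(W0 @ v, 0)  # shared two-layer MLP (ReLU)
    return sigmoid(mlp(B) + mlp(Cd)).reshape(-1, 1, 1, 1)

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 2, 4, 4))
W0 = rng.standard_normal((2, 8))    # reduction ratio r = 4 (assumed)
W1 = rng.standard_normal((8, 2))
Mc = cbam_channel_attention(A, W0, W1)
refined = Mc * A                    # channel-wise re-weighting via broadcasting
```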
3.3.2. CBAM-SAM
- (1) Input feature map A′ ∈ ℝ^(C×D×H×W). Max-pooling and average-pooling are applied separately along the channel dimension, producing two feature descriptors D and E, both of size ℝ^(1×D×H×W): D = MaxPool(A′), E = AvgPool(A′).
- (2) D and E are stacked along the channel dimension and passed through a convolution f followed by a sigmoid activation σ to obtain the normalized spatial attention weights: Ms(A′) = σ(f([D; E])).
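A NumPy sketch of the spatial branch follows. For brevity the learned convolution over the two stacked pooled maps is replaced by a fixed 1×1×1 mixing with weights w, which is purely illustrative and not the network's actual 3D convolution:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_spatial_attention(A, w=(0.5, 0.5)):
    """A: (C, D, H, W). Returns spatial weights Ms of shape (1, D, H, W)."""
    Dmap = A.max(axis=0)                  # channel max-pool -> (D, H, W)
    Emap = A.mean(axis=0)                 # channel avg-pool -> (D, H, W)
    mixed = w[0] * Dmap + w[1] * Emap     # stand-in for the learned convolution
    return sigmoid(mixed)[None]           # (1, D, H, W)

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 2, 4, 4))
Ms = cbam_spatial_attention(A)
refined = Ms * A                          # voxel-wise re-weighting via broadcasting
```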
3.3.3. The Detailed Process of the CBAM
4. Experiments
4.1. Datasets and Evaluation
- (1) The DSC measures the similarity, or overlap, between the predicted segmentation Vseg and the ground truth Vgt, and remains informative even when object boundaries are not clearly defined. Its formula is: DSC = 2|Vseg ∩ Vgt| / (|Vseg| + |Vgt|).
- (2) The VOE quantifies the volumetric error between two volumes as the complement of their Jaccard overlap: VOE = 1 − |Vseg ∩ Vgt| / |Vseg ∪ Vgt|.
- (3) The HD measures the distance between sets of points. Given two point sets A and B, the directed distance h(A, B) is the largest distance from a point in A to its nearest point in B; the symmetric HD takes the larger of the two directed distances: HD(A, B) = max(h(A, B), h(B, A)), where h(A, B) = max_{a∈A} min_{b∈B} ‖a − b‖.
- (4) The RMSD quantifies the difference between the segmented and ground-truth surfaces; it is commonly computed as the root mean square of the surface distances dᵢ between the two boundaries: RMSD = sqrt((1/n) Σᵢ dᵢ²).
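The overlap and distance metrics follow directly from these definitions. A NumPy sketch of DSC, VOE, and the symmetric HD (the HD95 reported in the results tables replaces the maximum with the 95th percentile of the same distances):

```python
import numpy as np

def dsc(seg, gt):
    """Dice similarity coefficient between two binary volumes."""
    inter = np.logical_and(seg, gt).sum()
    return 2.0 * inter / (seg.sum() + gt.sum())

def voe(seg, gt):
    """Volumetric overlap error: 1 minus the Jaccard index."""
    inter = np.logical_and(seg, gt).sum()
    union = np.logical_or(seg, gt).sum()
    return 1.0 - inter / union

def hausdorff(A, B):
    """Symmetric Hausdorff distance between point sets A (n, 3) and B (m, 3)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # pairwise distances
    return max(d.min(axis=1).max(), d.min(axis=0).max())

seg = np.array([1, 1, 0, 0], dtype=bool)
gt  = np.array([0, 1, 1, 0], dtype=bool)
# intersection = 1, union = 3  ->  DSC = 0.5, VOE = 2/3
```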
4.2. Implementation Details
4.3. Segmentation Results
4.3.1. Ablation Studies
- (1) Effect of the CBAM in Encoder and Skip Connection.
- (2) Effect of the DA-Block in Encoder.
4.3.2. Comparison with Other Methods
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Rahman, H.; Bukht, T.F.N.; Imran, A.; Tariq, J.; Tu, S.; Alzahrani, A. A Deep Learning Approach for Liver and Tumor Segmentation in CT Images Using ResUNet. Bioengineering 2022, 9, 368.
- Cheemerla, S.; Balakrishnan, M. Global epidemiology of chronic liver disease. Clin. Liver Dis. 2021, 17, 365–370.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Meng, L.; Zhang, Q.; Bu, S. Two-stage liver and tumor segmentation algorithm based on convolutional neural network. Diagnostics 2021, 11, 1806.
- Xi, X.F.; Wang, L.; Sheng, V.S.; Cui, Z.; Fu, B.; Hu, F. Cascade U-ResNets for simultaneous liver and lesion segmentation. IEEE Access 2020, 8, 68944–68952.
- Zhang, F.; Yang, J.; Nezami, N.; Laage-Gaupp, F.; Chapiro, J.; De Lin, M.; Duncan, J. Liver tissue classification using an auto-context-based deep neural network with a multi-phase training framework. In Proceedings of the Patch-Based Techniques in Medical Imaging: 4th International Workshop, Granada, Spain, 20 September 2018; pp. 59–66.
- Li, L.; Ma, H. Rdctrans u-net: A hybrid variable architecture for liver ct image segmentation. Sensors 2022, 22, 2452.
- Wei, C.; Ren, S.; Guo, K.; Hu, H.; Liang, J. High-resolution Swin transformer for automatic medical image segmentation. Sensors 2023, 23, 3420.
- Gao, Y.; Guo, J.; Fu, C.; Wang, Y.; Cai, S. VLSM-Net: A fusion architecture for CT image segmentation. Appl. Sci. 2023, 13, 4384.
- Xing, Z.; Wan, L.; Fu, H.; Yang, G.; Zhu, L. Diff-unet: A diffusion embedded network for volumetric segmentation. arXiv 2023, arXiv:2303.10326.
- Bogoi, S.; Udrea, A. A lightweight deep learning approach for liver segmentation. Mathematics 2022, 11, 95.
- Liu, J.; Yan, Z.; Zhou, C.; Shao, L.; Han, Y.; Song, Y. mfeeU-Net: A multi-scale feature extraction and enhancement U-Net for automatic liver segmentation from CT Images. Math. Biosci. Eng. 2023, 20, 7784–7801.
- Kushnure, D.T.; Tyagi, S.; Talbar, S.N. LiM-Net: Lightweight multi-level multiscale network with deep residual learning for automatic liver segmentation in CT images. Biomed. Signal Process. Control 2023, 80, 104305.
- Luan, S.; Xue, X.; Ding, Y.; Wei, W.; Zhu, B. Adaptive attention convolutional neural network for liver tumor segmentation. Front. Oncol. 2021, 11, 680807.
- Pettit, R.W.; Marlatt, B.B.; Corr, S.J.; Havelka, J.; Rana, A. nnU-Net deep learning method for segmenting parenchyma and determining liver volume from computed tomography images. Ann. Surg. Open 2022, 3, e155.
- Almotairi, S.; Kareem, G.; Aouf, M.; Almutairi, B.; Salem, M. Liver tumor segmentation in CT scans using modified SegNet. Sensors 2020, 20, 1516.
- Liu, Z.; Han, K.; Wang, Z.; Zhang, J.; Song, Y.; Yao, X.; Yuan, D.; Sheng, V.S. Automatic liver segmentation from abdominal CT volumes using improved convolution neural networks. Multimed. Syst. 2021, 27, 111–124.
- Wardhana, G.; Naghibi, H.; Sirmacek, B.; Abayazid, M. Toward reliable automatic liver and tumor segmentation using convolutional neural network based on 2.5D models. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 41–51.
- Lei, T.; Wang, R.; Zhang, Y.; Wan, Y.; Liu, C.; Nandi, A.K. DefED-Net: Deformable encoder-decoder network for liver and liver tumor segmentation. IEEE Trans. Radiat. Plasma Med. Sci. 2021, 6, 68–78.
- Mourya, G.K.; Gogoi, M.; Talbar, S.N.; Dutande, P.V.; Baid, U. Research Anthology on Improving Medical Imaging Techniques for Analysis and Intervention; Medical Info Science Reference: Hershey, PA, USA, 2023; Chapter 59; pp. 1153–1165. ISBN 978-1-6684-7544-7.
- Tian, Y.; Xue, F.; Lambo, R.; He, J.; An, C.; Xie, Y.; Cao, H.; Qin, W. Fully-automated functional region annotation of liver via a 2.5D class-aware deep neural network with spatial adaptation. Comput. Methods Programs Biomed. 2021, 200, 105818.
- Hong, J.; Yu, S.C.H.; Chen, W. Unsupervised domain adaptation for cross-modality liver segmentation via joint adversarial learning and self-learning. Appl. Soft Comput. 2022, 121, 108729.
- Tan, M.; Wu, F.; Kong, D.; Mao, X. Automatic liver segmentation using 3D convolutional neural networks with a hybrid loss function. Med. Phys. 2021, 48, 1707–1719.
- Jeong, J.G.; Choi, S.; Kim, Y.J.; Lee, W.S.; Kim, K.G. Deep 3D attention CLSTM U-Net based automated liver segmentation and volumetry for the liver transplantation in abdominal CT volumes. Sci. Rep. 2022, 12, 6370.
- Pandey, S.; Chen, K.F.; Dam, E.B. Comprehensive Multimodal Segmentation in Medical Imaging: Combining YOLOv8 with SAM and HQ-SAM Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Paris, France, 2–6 October 2023; pp. 2584–2590.
- Lin, S.; Lin, C. Brain tumor segmentation using U-Net in conjunction with EfficientNet. PeerJ Comput. Sci. 2024, 10, e1754.
- Cheng, J.; Ye, J.; Deng, Z.; Chen, J.; Li, T.; Wang, H.; Su, Y.; Huang, Z.; Chen, J.; Jiang, L.; et al. Sam-med2d. arXiv 2023, arXiv:2308.16184.
- Liu, Z.; Song, Y.Q.; Sheng, V.S.; Wang, L.; Jiang, R.; Zhang, X.; Yuan, D. Liver CT sequence segmentation based with improved U-Net and graph cut. Expert Syst. Appl. 2019, 126, 54–63.
- Song, L.I.; Geoffrey, K.F.; Kaijian, H.E. Bottleneck feature supervised U-Net for pixel-wise liver and tumor segmentation. Expert Syst. Appl. 2020, 145, 113131.
- Wu, J.; Zhou, S.; Zuo, S.; Chen, Y.; Sun, W.; Luo, J.; Duan, J.; Wang, H.; Wang, D. U-Net combined with multi-scale attention mechanism for liver segmentation in CT images. BMC Med. Inf. Decis. Mak. 2021, 21, 283.
- Yu, A.H.; Liu, Z.; Sheng, V.S.; Song, Y.; Liu, X.; Ma, C.; Wang, W.; Ma, C. CT segmentation of liver and tumors fused multi-scale features. Intell. Autom. Soft Comput. 2021, 30, 589–599.
- Jiang, L.; Ou, J.; Liu, R.; Zou, Y.; Xie, T.; Xiao, H.; Bai, T. Rmau-net: Residual multi-scale attention u-net for liver and tumor segmentation in ct images. Comput. Biol. Med. 2023, 158, 106838.
- Chen, Y.; Wang, K.; Liao, X.; Qian, Y.; Wang, Q.; Yuan, Z.; Heng, P.A. Channel-Unet: A spatial channel-wise convolutional neural network for liver and tumors segmentation. Front. Genet. 2019, 10, 1110.
- Han, L.; Chen, Y.; Li, J.; Zhong, B.; Lei, Y.; Sun, M. Liver segmentation with 2.5D perpendicular UNets. Comput. Electr. Eng. 2021, 91, 107118.
- Lv, P.; Wang, J.; Wang, H. 2.5D lightweight RIU-Net for automatic liver and tumor segmentation from CT. Biomed. Signal Process. Control 2022, 75, 103567.
- Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, 17–21 October 2016; pp. 424–432.
- Czipczer, V.; Manno-Kovacs, A. Adaptable volumetric liver segmentation model for CT images using region-based features and convolutional neural network. Neurocomputing 2022, 505, 388–401.
- Chi, J.; Han, X.; Wu, C.; Wang, H.; Ji, P. X-Net: Multi-branch UNet-like network for liver and tumor segmentation from 3D abdominal CT scans. Neurocomputing 2021, 459, 81–96.
- He, R.; Xu, S.; Liu, Y.; Li, Q.; Liu, Y.; Zhao, N.; Yuan, Y.; Zhang, H. Three-dimensional liver image segmentation using generative adversarial networks based on feature restoration. Front. Med. 2022, 8, 794969.
- Chen, Y.; Zheng, C.; Zhou, T.; Feng, L.; Liu, L.; Zeng, Q.; Wang, G. A deep residual attention-based U-Net with a biplane joint method for liver segmentation from CT scans. Comput. Biol. Med. 2023, 152, 106421.
- Chen, X.; Zhang, R.; Yan, P. Feature fusion encoder decoder network for automatic liver lesion segmentation. In Proceedings of the 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI), Venice, Italy, 8–11 April 2019; pp. 430–433.
- Guo, C.; Szemenyei, M.; Yi, Y.; Wang, W.; Chen, B.; Fan, C. Sa-unet: Spatial attention u-net for retinal vessel segmentation. In Proceedings of the 2020 IEEE 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 1236–1242.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Xiao, X.; Lian, S.; Luo, Z.; Li, S. Weighted res-unet for high-quality retina vessel segmentation. In Proceedings of the 2018 9th International Conference on Information Technology in Medicine and Education (ITME), Hangzhou, China, 19–21 October 2018; pp. 327–331.
- Sun, G.; Pan, Y.; Kong, W.; Xu, Z.; Ma, J.; Racharak, T.; Nguyen, L.M.; Xin, J. DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image Segmentation. Front. Bioeng. Biotechnol. 2024, 12, 1398237.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3141–3149.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Bilic, P.; Christ, P.; Li, H.B.; Vorontsov, E.; Ben-Cohen, A.; Kaissis, G.; Menze, B.; Szeskin, A.; Jacobs, C.; Mamani, G.E.H.; et al. The liver tumor segmentation benchmark (LiTS). Med. Image Anal. 2023, 84, 102680.
- Salehi, S.S.M.; Erdogmus, D.; Gholipour, A. Tversky loss function for image segmentation using 3D fully convolutional deep networks. In Proceedings of the International Workshop on Machine Learning in Medical Imaging (MLMI), Quebec City, QC, Canada, 10 September 2017; pp. 379–387.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Valanarasu, J.M.J.; Sindagi, V.A.; Hacihaliloglu, I.; Patel, V.M. KiU-Net: Overcomplete convolutional architectures for biomedical image and volumetric segmentation. IEEE Trans. Med. Imaging 2021, 41, 965–976.
| Methods * | Encoder with CBAM | Skip with CBAM (1st Layer) | Skip with CBAM (2nd Layer) | Skip with CBAM (3rd Layer) | DSC↑ | HD95↓ |
|---|---|---|---|---|---|---|
| Res-UNet | | | | | 91.72 | 30.54 |
| Our approach | √ | | | | 90.63 | 34.21 |
| Our approach | | √ | | | 90.87 | 30.67 |
| Our approach | | | √ | | 91.61 | 28.74 |
| Our approach | | | | √ | 90.15 | 30.99 |
| Our approach | | √ | √ | √ | 91.33 | 28.81 |
| Our approach | √ | √ | √ | √ | 92.47 | 29.03 |
| Methods | Encoder with DA-Block | DSC↑ | HD95↓ |
|---|---|---|---|
| Res-UNet | | 91.72 | 30.54 |
| Our approach (without CBAM) | √ | 92.25 | 29.72 |
| Methods * | DSC (%) | VOE (%) | HD95 (mm) | RMSD (mm) |
|---|---|---|---|---|
| SegNet | 64.95 | 35.27 | 39.83 | 16.01 |
| KiU-Net | 88.67 | 11.39 | 32.65 | 13.12 |
| 3D U-Net | 89.26 | 10.89 | 31.33 | 12.63 |
| 3D Res-UNet | 91.72 | 9.08 | 30.54 | 11.49 |
| Our approach | 92.56 | 7.34 | 28.09 | 10.61 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, B.; Qiu, S.; Liang, T. Dual Attention-Based 3D U-Net Liver Segmentation Algorithm on CT Images. Bioengineering 2024, 11, 737. https://doi.org/10.3390/bioengineering11070737