SALM: A Unified Model for 2D and 3D Region of Interest Segmentation in Lung CT Scans Using Vision Transformers
Abstract
1. Introduction
- Development of a Unified 2D and 3D Segmentation Model: We propose SALM, a unified Vision Transformer-based model capable of performing both 2D and 3D ROI segmentation in lung CT images. This unified architecture simplifies the segmentation pipeline and demonstrates the versatility of Vision Transformers in handling both 2D and 3D inputs within a single framework.
- Novel Adaptation of Positional Encoding for Enhanced Spatial Context Awareness: We demonstrate that modifying the periodicity of the standard sinusoidal positional encoding significantly enhances the model's ability to capture multi-scale spatial relationships, particularly within 3D volumetric data. By introducing a modulation factor, we effectively increase the spatial frequency of the encoding, allowing the model to integrate information across scales and improving segmentation accuracy for complex structures such as those found in lung CT images (an illustrative formulation follows this list). This finding provides insight into optimizing Transformer architectures for volumetric medical image analysis.
- Adaptation to Lung Image Size Variability: A further contribution lies in our handling of the size variability of lung CT images. SALM incorporates image processing steps that standardize scale and resolution during preprocessing, enabling accurate segmentation regardless of an image's original size. This standardization ensures high accuracy in detecting and segmenting ROI of varying sizes and maintains consistent performance across a wide range of lung imaging data, including smaller original images. By directly addressing the diversity of lung CT images, SALM makes lung image segmentation more efficient and reliable without compromising performance.
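For context, the standard sinusoidal encoding and one illustrative reading of the proposed modification are sketched below. The exact modified form is the one defined in Section 3.2.2; the modulation factor λ and its placement inside the sine argument are our notational assumption for illustration.

```latex
% Standard sinusoidal encoding (Vaswani et al. [10]), for position p,
% channel index i, and embedding dimension d:
\mathrm{PE}(p, 2i)   = \sin\!\left(p / 10000^{2i/d}\right), \quad
\mathrm{PE}(p, 2i+1) = \cos\!\left(p / 10000^{2i/d}\right)

% One hedged reading of the modification: a modulation factor \lambda > 1
% scales the argument, shortening the period (raising the spatial
% frequency) of every channel:
\mathrm{PE}_{\lambda}(p, 2i)   = \sin\!\left(\lambda p / 10000^{2i/d}\right), \quad
\mathrm{PE}_{\lambda}(p, 2i+1) = \cos\!\left(\lambda p / 10000^{2i/d}\right)
```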
2. Related Works
3. Methods
3.1. Dataset and Preprocessing
- Isotropic resampling (for 3D images): We first resampled each CT volume to an isotropic voxel spacing of 1 × 1 × 1 mm using linear interpolation functions available in standard imaging libraries. This ensures that spatial relationships between voxels are consistent across all dimensions, regardless of the original slice spacing.
- Slice-by-slice processing: Each 3D volume was then processed slice by slice. This approach allows us to apply SALM, designed for 2D inputs, to 3D volumetric data. While the Vision Transformer processes each slice independently, the modified 3D positional encoding, described in detail in Section 3.2.2, allows for the integration of 3D spatial context.
- Resizing to 1024 × 1024: Each slice, whether from an original 2D image or extracted from a 3D volume, was resized to a standard 1024 × 1024 pixels using linear interpolation. This ensures that the model receives inputs of uniform dimensions, which is crucial for the Vision Transformer. The 1024 × 1024 size represents a compromise between preserving anatomical detail and keeping memory and computational costs manageable.
- Channel repetition: For original 2D images and slices extracted from 3D volumes, we repeated the intensity channel three times to create a 3-channel input, compatible with the ViT architecture and consistent with the processing of color images, even though CT images are grayscale. A sketch of the full preprocessing pipeline follows this list.
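The four steps above can be summarized in a short sketch. This is a minimal illustration, assuming SimpleITK and OpenCV as the imaging libraries (the text specifies only linear interpolation); the function names are ours:

```python
import numpy as np
import SimpleITK as sitk
import cv2


def resample_isotropic(volume: sitk.Image, spacing=(1.0, 1.0, 1.0)) -> sitk.Image:
    """Resample a CT volume to isotropic 1 x 1 x 1 mm voxels (linear interpolation)."""
    old_spacing, old_size = volume.GetSpacing(), volume.GetSize()
    new_size = [int(round(sz * sp / nsp))
                for sz, sp, nsp in zip(old_size, old_spacing, spacing)]
    return sitk.Resample(volume, new_size, sitk.Transform(), sitk.sitkLinear,
                         volume.GetOrigin(), spacing, volume.GetDirection(),
                         0.0, volume.GetPixelID())


def vit_ready_slices(volume: sitk.Image):
    """Yield 1024 x 1024, 3-channel axial slices for the 2D ViT encoder."""
    array = sitk.GetArrayFromImage(volume)  # shape (z, y, x)
    for axial_slice in array.astype(np.float32):
        resized = cv2.resize(axial_slice, (1024, 1024),
                             interpolation=cv2.INTER_LINEAR)
        # Repeat the grayscale channel three times for an RGB-like input.
        yield np.repeat(resized[..., None], 3, axis=-1)
```

Each yielded slice is then encoded independently, with the 3D positional encoding of Section 3.2.2 supplying the through-plane context.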
3.2. Model Architecture
3.2.1. Image Encoder
3.2.2. 3D Positional Encoding
3.2.3. Mask Decoder
3.3. Training Procedure
4. Results and Discussion
4.1. Quantitative Results
4.2. Qualitative Results
4.3. External Validation on PleThora Dataset
4.4. Discussion
- Tissue Density Similarity: Anatomical structures with attenuation coefficients (Hounsfield units) similar to those of the targeted ROI can induce classification errors. This densitometric ambiguity is particularly pronounced in transition zones between soft tissues and vascular structures. For example, blood vessels, scar tissue, and areas of fibrosis can exhibit Hounsfield values that overlap with those of lung nodules, especially part-solid or ground-glass nodules. This overlap makes it difficult for the model to rely solely on intensity information for accurate differentiation, leading to false-positive detections in regions with dense vasculature or fibrotic changes.
- Morphological Characteristics: Anatomical structures whose geometry resembles that of irregular or hidden ROI can also generate segmentation errors, as the model may misinterpret these formations as potential ROI. In advanced lung disease, ROI such as tumors or affected parenchyma can exhibit highly irregular shapes and may be partially obscured by surrounding diseased tissue, pleural effusions, or consolidation, as frequently observed in the PleThora dataset. SALM, trained primarily on datasets with relatively well-defined ROI such as LUNA16, may struggle to delineate these poorly defined ROI in advanced pathological cases. Its reliance on shape features learned from less complex examples can lead to under-segmentation or inaccurate boundary delineation when confronted with atypical morphologies. This is compounded by the reduced contrast between the ROI and the surrounding abnormal tissue in diseased lungs, which makes morphological differentiation even more challenging for the model.
- Positional Encoding Limitations: Overprediction can result from imperfect capture of three-dimensional spatial relationships, despite the modified positional encoding. Peripheral areas of anatomical structures, where intensity gradients are less pronounced, are particularly susceptible to misclassification. While our modified 3D positional encoding enhances spatial awareness, it is not perfect: where anatomy is complex or boundaries are ill-defined, particularly at the periphery of organs or lesions where intensity transitions are gradual, the model may fail to delineate the ROI precisely. This can lead to over-segmentation extending into surrounding tissues or misclassification of adjacent structures.
- Acquisition Artifacts: Artifacts inherent to CT imaging, such as beam hardening or partial volume effects, can create intensity variations that mislead the model, particularly at interfaces between different tissue types. Beam hardening artifacts, often seen near dense bone structures like ribs, and partial volume averaging, especially in regions with thin slices or complex anatomy, can introduce artificial intensity gradients and noise. These artifacts can create spurious features that the model might misinterpret as relevant structures or boundaries, leading to false-positive detections or inaccurate segmentation, especially at the lung periphery or near the chest wall.
- Refinement for Small, Irregular, and Hidden ROI: To improve segmentation accuracy for ROI that are small, irregularly shaped, and poorly defined or hidden by the complexities of advanced lung disease, future work could incorporate more advanced data augmentation, such as localized deformations and synthetic ROI generation [60], targeting the variability in lesion size, shape, and visibility observed in advanced pathological cases like those in the PleThora dataset. Research could also investigate attention mechanisms that are more sensitive to fine-grained detail and boundary information, through novel Transformer architectures or modifications to the existing self-attention layers, to better capture the indistinct and ambiguous boundaries of hidden or irregular ROI. Finally, loss functions that explicitly emphasize boundary sharpness and penalize over-segmentation, such as boundary loss [61] or contour loss [62], could further improve delineation of ROI with the complex, ill-defined borders often encountered in diseased lung tissue.
- Addressing Over-segmentation and False Positives: To counter over-segmentation and reduce false-positive detections, particularly those arising from tissue density similarity and acquisition artifacts, future research could incorporate anatomical priors and contextual information into the model architecture. This might involve integrating anatomical segmentation maps [63] as additional input channels, or hierarchical segmentation approaches [64] that first segment broader anatomical regions before refining the ROI segmentation. Contrastive learning techniques [65] that better discriminate between ROI and similar-appearing background structures could further enhance the model's robustness to densitometric ambiguities.
- Exploration of Advanced Architectures for Efficiency and Performance: To further improve both the efficiency and the performance of SALM, future work could investigate the integration of novel Transformer architectures. Specifically, the exploration of xLSTM [66], with its ability to process sequential data efficiently across multiple timescales, could be beneficial for modeling temporal dependencies in longitudinal studies or dynamic contrast-enhanced CT. Mamba [67], with its selective state space model, offers a promising avenue for capturing long-range dependencies in both spatial and temporal dimensions with high computational efficiency, potentially allowing for more efficient processing of large 3D volumes. The integration of these advanced architectures, combined with the modified positional encoding scheme described in this paper, could lead to significant improvements in segmentation accuracy, robustness, and clinical utility, paving the way for real-time applications and deployment on resource-constrained devices.
- Incorporation of Clinical and Demographic Data: To further enhance the clinical relevance and diagnostic potential of SALM, future work could explore the possibility of incorporating additional clinical information, such as patient demographics, medical history, and radiology reports, as auxiliary inputs to the model. This could enable SALM to learn more nuanced relationships between image features and clinical context, potentially improving segmentation accuracy and enabling more informed diagnostic predictions or risk stratification.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
2D | two-dimensional |
3D | three-dimensional |
ACDC | Automatic Cancer Detection and Classification |
ACM | active contour model |
AI | artificial intelligence |
AUC | Area under the Curve |
BCE | binary cross-entropy |
CAD | computer-aided detection |
CNN | Convolutional Neural Network |
CRF | conditional random field |
CT | Computed Tomography |
DSC | Dice Similarity Coefficient |
FMs | Foundation Models |
FCN | Fully Convolutional Network |
FFN | Feedforward Network |
HU | Hounsfield unit |
kg CO2 eq/kWh | kilograms of carbon dioxide equivalent per kilowatt-hour |
LIDC-IDRI | Lung Image Database Consortium and Image Database Resource Initiative |
LUNA16 | Lung Nodule Analysis 2016 |
MRI | Magnetic Resonance Imaging |
MSA | Multi-head Self-Attention |
NSCLC | Non-Small Cell Lung Cancer |
PE | positional encoding |
PUE | Power Usage Effectiveness |
ROC | Receiver Operating Characteristic |
ROI | Region of Interest |
SALM | Segment Anything in Lung Model |
UNET | U-Shaped Network |
ViT | Vision Transformer |
Appendix A. Carbon Footprint
- # GPUs: the number of GPUs used for training;
- GPU Power Consumption (W): the power consumption of a single GPU in watts;
- TT (h): the total training time in hours;
- PUE (Power Usage Effectiveness): a factor that accounts for the energy used by the entire data center infrastructure, including cooling and other overhead. We use a PUE of 1.1, which is in line with recommendations from [71] and represents a relatively efficient data center.
Stage | No. of GPUs | Training Time (h) | Energy Consumption (kWh) | Carbon Emissions (kg CO2eq) |
---|---|---|---|---|
Hyperparameter Tuning | 1 | 2016 | 887.04 | 1.508 |
Final Model Training | 1 | 672 | 295.68 | 0.503 |
Total | - | 2688 | 1182.72 | 2.011 |
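The energy and emission figures in the table follow from a simple product of the quantities defined above. The Python sketch below reproduces both rows; the 400 W per-GPU draw and the 0.0017 kg CO2 eq/kWh grid factor are assumptions back-calculated from the table rather than values stated in the text:

```python
def training_footprint(num_gpus: int, gpu_power_w: float, hours: float,
                       pue: float = 1.1, grid_kg_per_kwh: float = 0.0017):
    """Return (energy in kWh, emissions in kg CO2 eq) for one training stage.

    The default grid factor and the 400 W draw used in the calls below are
    back-calculated from the table above, not stated in this appendix.
    """
    energy_kwh = num_gpus * gpu_power_w * hours * pue / 1000.0
    return energy_kwh, energy_kwh * grid_kg_per_kwh


# Reproduces both rows of the table:
print(training_footprint(1, 400, 2016))  # (887.04, ~1.508)
print(training_footprint(1, 400, 672))   # (295.68, ~0.503)
```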
References
- World Health Organization. Cancer. 2023. Available online: https://www.who.int/news-room/fact-sheets/detail/cancer (accessed on 24 June 2024).
- Razzak, M.I.; Naz, S.; Zaib, A. Deep learning for medical image processing: Overview, challenges and the future. In Classification in BioApps: Automation of Decision Making; Springer: Berlin/Heidelberg, Germany, 2018; pp. 323–350. [Google Scholar]
- Bates, K.; Le, K.N.; Lu, H. Deep learning for robust and flexible tracking in behavioral studies for C. elegans. PLoS Comput. Biol. 2022, 18, e1009942. [Google Scholar] [CrossRef] [PubMed]
- Lambert, T.; Waters, J. Towards effective adoption of novel image analysis methods. Nat. Methods 2023, 20, 971–972. [Google Scholar] [CrossRef]
- Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.; van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [Google Scholar] [CrossRef] [PubMed]
- Leibig, C.; Allken, V.; Ayhan, M.; Berens, P.; Wahl, S. Leveraging uncertainty information from deep neural networks for disease detection. Sci. Rep. 2016, 7, 17816. [Google Scholar] [CrossRef]
- Ghali, R.; Akhloufi, M.A. ARSeg: An Attention RegSeg Architecture for CXR Lung Segmentation. In Proceedings of the 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI), Virtual, 9–11 August 2022; pp. 291–296. [Google Scholar]
- Ghali, R.; Akhloufi, M.A. Vision Transformers for Lung Segmentation on CXR Images. SN Comput. Sci. 2023, 4, 414. [Google Scholar] [CrossRef] [PubMed]
- Li, F.; Zhou, L.; Wang, Y.; Chen, C.; Yang, S.; Shan, F.; Liu, L. Modeling long-range dependencies for weakly supervised disease classification and localization on chest X-ray. Quant. Imaging Med. Surg. 2022, 12, 3364. [Google Scholar] [CrossRef] [PubMed]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762. [Google Scholar] [CrossRef]
- Jung, K.H. Uncover this tech term: Foundation model. Korean J. Radiol. 2023, 24, 1038. [Google Scholar] [CrossRef] [PubMed]
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. arXiv 2023, arXiv:2304.02643. [Google Scholar]
- Ma, J.; Wang, B. Segment anything in medical images. arXiv 2023, arXiv:2304.12306. [Google Scholar] [CrossRef] [PubMed]
- Setio, A.A.A.; Traverso, A.; de Bel, T.; Berens, M.S.; Van Den Bogaard, C.; Cerello, P.; Chen, H.; Dou, Q.; Fantacci, M.E.; Geurts, B.; et al. Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge. Med. Image Anal. 2017, 42, 1–13. [Google Scholar] [CrossRef]
- Kiser, K.; Ahmed, S.; Stieb, S.M.; Mohamed, A.A.S.R.; Elhalawani, H.; Park, P.Y.S.; Doyle, N.S.; Wang, B.J.; Barman, A.; Fuller, C.D.; et al. Thoracic Volume and Pleural Effusion Segmentations in Diseased Lungs for Benchmarking Chest CT Processing Pipelines. The Cancer Imaging Archive (TCIA). Available online: https://www.cancerimagingarchive.net/analysis-result/plethora/ (accessed on 12 February 2025). [CrossRef]
- Aerts, H.J.W.L.; Wee, L.; Velazquez, E.R.; Leijenaar, R.T.H.; Parmar, C.; Grossmann, P.; Carvalho, S.; Bussink, J.; Monshouwer, R.; Haibe-Kains, B.; et al. Data From NSCLC-Radiomics. The Cancer Imaging Archive (TCIA). Available online: https://www.cancerimagingarchive.net/collection/nsclc-radiomics/ (accessed on 12 February 2025).
- Gayap, H.T.; Akhloufi, M.A. Deep machine learning for medical diagnosis, application to lung cancer detection: A review. BioMedInformatics 2024, 4, 236–284. [Google Scholar] [CrossRef]
- Bellotti, R.; Carlo, F.D.; Gargano, G.; Tangaro, S.; Cascio, D.; Catanzariti, E.; Cerello, P.; Cheran, S.; Delogu, P.; Mitri, I.D.; et al. A CAD system for nodule detection in low-dose lung CTs based on region growing and a new active contour model. Med. Phys. 2007, 34, 4901–4910. [Google Scholar] [CrossRef] [PubMed]
- Li, X.; Wang, X.; Dai, Y.; Zhang, P. Supervised recursive segmentation of volumetric CT images for 3D reconstruction of lung and vessel tree. Comput. Methods Programs Biomed. 2015, 122, 316–329. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Milletari, F.; Navab, N.; Ahmadi, S.A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; IEEE: New York, NY, USA, 2016; pp. 565–571. [Google Scholar]
- Wu, W.; Gao, L.; Duan, H.; Huang, G.; Ye, X.; Nie, S. Segmentation of pulmonary nodules in CT images based on 3D-UNET combined with three-dimensional conditional random field optimization. Med. Phys. 2020, 47, 4054–4063. [Google Scholar] [CrossRef]
- Zhao, W.; Yang, J.; Sun, Y.; Li, C.; Wu, W.; Jin, L.; Yang, Z.; Ni, B.; Gao, P.; Wang, P.; et al. 3D deep learning from CT scans predicts tumor invasiveness of subcentimeter pulmonary adenocarcinomas. Cancer Res. 2018, 78, 6881–6889. [Google Scholar] [CrossRef] [PubMed]
- Wang, S.; Mahon, R.; Weiss, E.; Jan, N.; Taylor, R.J.; McDonagh, P.R.; Quinn, B.; Yuan, L. Automated lung cancer segmentation using a PET and CT dual-modality deep learning neural network. Int. J. Radiat. Oncol. Biol. Phys. 2023, 115, 529–539. [Google Scholar] [CrossRef] [PubMed]
- Riaz, Z.; Khan, B.; Abdullah, S.; Khan, S.; Islam, M.S. Lung tumor image segmentation from computer tomography images using MobileNetV2 and transfer learning. Bioengineering 2023, 10, 981. [Google Scholar] [CrossRef] [PubMed]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Mkindu, H.; Wu, L.; Zhao, Y. Lung nodule detection in chest CT images based on vision transformer network with Bayesian optimization. Biomed. Signal Process. Control 2023, 85, 104866. [Google Scholar] [CrossRef]
- Niu, C.; Wang, G. Unsupervised contrastive learning based transformer for lung nodule detection. Phys. Med. Biol. 2022, 67, 204001. [Google Scholar] [CrossRef] [PubMed]
- Xie, Y.; Zhang, J.; Xia, Y. Semi-supervised adversarial model for benign–malignant lung nodule classification on chest CT. Med. Image Anal. 2019, 57, 237–248. [Google Scholar] [CrossRef]
- Tang, T.; Zhang, R.; Lin, K.; Li, F.; Xia, X. SM-RNet: A Scale-aware-based Multi-attention Guided Reverse Network for Pulmonary Nodules Segmentation. IEEE Trans. Instrum. Meas. 2023, 72, 3315365. [Google Scholar] [CrossRef]
- Bhattacharyya, D.; Thirupathi Rao, N.; Joshua, E.S.N.; Hu, Y.C. A bi-directional deep learning architecture for lung nodule semantic segmentation. Vis. Comput. 2023, 39, 5245–5261. [Google Scholar] [CrossRef] [PubMed]
- Said, Y.; Alsheikhy, A.A.; Shawly, T.; Lahza, H. Medical images segmentation for lung cancer diagnosis based on deep learning architectures. Diagnostics 2023, 13, 546. [Google Scholar] [CrossRef] [PubMed]
- Antonelli, M.; Reinke, A.; Bakas, S.; Farahani, K.; Kopp-Schneider, A.; Landman, B.A.; Litjens, G.; Menze, B.; Ronneberger, O.; Summers, R.M.; et al. The Medical Segmentation Decathlon. Nat. Commun. 2022, 13, 4128. [Google Scholar] [CrossRef] [PubMed]
- Zhang, F.; Wang, Q.; Fan, E.; Lu, N.; Chen, D.; Jiang, H.; Yu, Y. Enhancing non-small cell lung cancer tumor segmentation with a novel two-step deep learning approach. J. Radiat. Res. Appl. Sci. 2024, 17, 100775. [Google Scholar] [CrossRef]
- Chen, W.; Wei, H.; Peng, S.; Sun, J.; Qiao, X.; Liu, B. HSN: Hybrid segmentation network for small cell lung cancer segmentation. IEEE Access 2019, 7, 75591–75603. [Google Scholar] [CrossRef]
- He, B.; Hu, W.; Zhang, K.; Yuan, S.; Han, X.; Su, C.; Zhao, J.; Wang, G.; Wang, G.; Zhang, L. Image segmentation algorithm of lung cancer based on neural network model. Expert Syst. 2022, 39, e12822. [Google Scholar] [CrossRef]
- Armato, S.G.; McLennan, G.; Bidaut, L.; McNitt-Gray, M.F.; Meyer, C.R.; Reeves, A.P.; Zhao, B.; Aberle, D.R.; Henschke, C.I.; Hoffman, E.A.; et al. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A Completed Reference Database of Lung Nodules on CT Scans. Med. Phys. 2011, 38, 915–931. [Google Scholar] [CrossRef] [PubMed]
- Li, Z.; Zhang, J.; Tan, T.; Teng, X.; Sun, X.; Zhao, H.; Liu, L.; Xiao, Y.; Lee, B.; Li, Y.; et al. Deep learning methods for lung cancer segmentation in whole-slide histopathology images—The ACDC@LungHP Challenge 2019. IEEE J. Biomed. Health Inform. 2020, 25, 429–440. [Google Scholar] [CrossRef] [PubMed]
- Šarić, M.; Russo, M.; Stella, M.; Sikora, M. CNN-based Method for Lung Cancer Detection in Whole Slide Histopathology Images. In Proceedings of the 2019 4th International Conference on Smart and Sustainable Technologies (SpliTech), Split, Croatia, 18–21 June 2019; pp. 1–4. [Google Scholar] [CrossRef]
- Baek, S.; He, Y.; Allen, B.G.; Buatti, J.M.; Smith, B.J.; Tong, L.; Sun, Z.; Wu, J.; Diehn, M.; Loo, B.W.; et al. Deep segmentation networks predict survival of non-small cell lung cancer. Sci. Rep. 2019, 9, 17286. [Google Scholar] [CrossRef]
- Jiang, J.; Hu, Y.C.; Tyagi, N.; Zhang, P.; Rimner, A.; Mageras, G.S.; Deasy, J.O.; Veeraraghavan, H. Tumor-aware, adversarial domain adaptation from CT to MRI for lung cancer segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, 16–20 September 2018; Proceedings, Part II 11. Springer: Berlin/Heidelberg, Germany, 2018; pp. 777–785. [Google Scholar]
- Reeves, T.; Mah, P.; McDavid, W. Deriving Hounsfield units using grey levels in cone beam CT: A clinical application. Dentomaxillofacial Radiol. 2012, 41, 500–508. [Google Scholar] [CrossRef] [PubMed]
- Xu, G.; Hao, Z.; Luo, Y.; Hu, H.; An, J.; Mao, S. DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices. IEEE Trans. Mob. Comput. 2023, 23, 5917–5932. [Google Scholar] [CrossRef]
- Li, C.; Kim, K.; Wu, B.; Zhang, P.; Zhang, H.; Dai, X.; Vajda, P.; Lin, Y. An Investigation on Hardware-Aware Vision Transformer Scaling. ACM Trans. Embed. Comput. Syst. 2023, 23, 1–19. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8024. [Google Scholar]
- Digital Research Alliance of Canada. Narval Documentation. Available online: https://docs.alliancecan.ca/wiki/Narval/en (accessed on 12 February 2025).
- Mohammed, K.K.; Hassanien, A.E.; Afify, H.M. A 3D image segmentation for lung cancer using V.Net architecture based deep convolutional networks. J. Med. Eng. Technol. 2021, 45, 337–343. [Google Scholar] [CrossRef]
- Gan, W.; Wang, H.; Gu, H.; Duan, Y.; Shao, Y.; Chen, H.; Feng, A.; Huang, Y.; Fu, X.; Ying, Y.; et al. Automatic segmentation of lung tumors on CT images based on a 2D & 3D hybrid convolutional neural network. Br. J. Radiol. 2021, 94, 20210038. [Google Scholar] [CrossRef] [PubMed]
- Kamal, U.; Rafi, A.M.; Hoque, R.; Wu, J.; Hasan, M.K. Lung Cancer Tumor Region Segmentation Using Recurrent 3D-DenseUNet. In Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; pp. 36–47. [Google Scholar] [CrossRef]
- Peixoto, S.A.; Medeiros, A.G.; Hassan, M.M.; Dewan, M.A.A.; Albuquerque, V.H.C.d.; Filho, P.P.R. Floor of log: A novel intelligent algorithm for 3D lung segmentation in computer tomography images. Multimed. Syst. 2022, 28, 1151–1163. [Google Scholar] [CrossRef]
- Saood, A.; Hatem, I. COVID-19 lung CT image segmentation using deep learning methods: U-Net versus SegNet. BMC Med. Imaging 2021, 21, 19. [Google Scholar] [CrossRef] [PubMed]
- Zhao, T.; Gao, D.; Wang, J.; Yin, Z. Lung segmentation in CT images using a fully convolutional neural network with multi-instance and conditional adversary loss. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; IEEE: New York, NY, USA, 2018; pp. 505–509. [Google Scholar]
- Murugappan, M.; Bourisly, A.K.; Prakash, N.B.; Sumithra, M.G.; Acharya, U.R. Automated semantic lung segmentation in chest CT images using deep neural network. Neural Comput. Appl. 2023, 35, 15343–15364. [Google Scholar] [CrossRef] [PubMed]
- NVIDIA. NVIDIA GeForce GTX 1080 Ti: The Ultimate GeForce. Available online: https://www.nvidia.com/en-us/geforce/news/gfecnt/nvidia-geforce-gtx-1080-ti/ (accessed on 12 February 2025).
- Apple Inc. MacBook Pro. Available online: https://www.apple.com/ca/shop/buy-mac/macbook-pro (accessed on 12 February 2025).
- Wang, J.; Oliveira, M.M. A hole-filling strategy for reconstruction of smooth surfaces in range images. In Proceedings of the 16th Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI 2003), São Carlos, Brazil, 12–15 October 2003; IEEE: New York, NY, USA, 2003; pp. 11–18. [Google Scholar]
- Hasan, M.M.; Mishra, P.K. Improving morphology operation for 2D hole filling algorithm. Int. J. Image Process. IJIP 2012, 6, 635–646. [Google Scholar]
- Chudasama, D.; Patel, T.; Joshi, S.; Prajapati, G.I. Image segmentation using morphological operations. Int. J. Comput. Appl. 2015, 117, 8887. [Google Scholar] [CrossRef]
- Kim, Y.; Lee, J.H.; Kim, C.; Jin, K.N.; Park, C.M. GAN based ROI conditioned synthesis of medical image for data augmentation. In Proceedings of the Medical Imaging 2023: Image Processing, San Diego, CA, USA, 19–23 February 2023; SPIE: Bellingham, WA, USA, 2023; Volume 12464, pp. 752–758. [Google Scholar]
- Kervadec, H.; Bouchtiba, J.; Desrosiers, C.; Granger, E.; Dolz, J.; Ayed, I.B. Boundary loss for highly unbalanced segmentation. In Proceedings of the International Conference on Medical Imaging with Deep Learning, PMLR, London, UK, 8–10 July 2019; pp. 285–296. [Google Scholar]
- Chen, Z.; Zhou, H.; Lai, J.; Yang, L.; Xie, X. Contour-aware loss: Boundary-aware learning for salient object segmentation. IEEE Trans. Image Process. 2020, 30, 431–443. [Google Scholar] [CrossRef]
- Evans, A.; Collins, D.; Holmes, C. Automatic 3D regional MRI segmentation and statistical probability anatomy maps. In Quantification of Brain Function Using PET; Elsevier: Amsterdam, The Netherlands, 1996; pp. 123–130. [Google Scholar]
- Lee, D.U.; Cheung, R.C.; Luk, W.; Villasenor, J.D. Hierarchical segmentation for hardware function evaluation. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2008, 17, 103–116. [Google Scholar] [CrossRef]
- Hu, H.; Wang, X.; Zhang, Y.; Chen, Q.; Guan, Q. A comprehensive survey on contrastive learning. Neurocomputing 2024, 610, 128645. [Google Scholar] [CrossRef]
- Beck, M.; Pöppel, K.; Spanring, M.; Auer, A.; Prudnikova, O.; Kopp, M.; Klambauer, G.; Brandstetter, J.; Hochreiter, S. xLSTM: Extended Long Short-Term Memory. arXiv 2024, arXiv:2405.04517. [Google Scholar]
- Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752. [Google Scholar]
- Lacoste, A.; Luccioni, A.; Schmidt, V.; Dandres, T. Quantifying the carbon emissions of machine learning. arXiv 2019, arXiv:1910.09700. [Google Scholar]
- Strubell, E.; Ganesh, A.; McCallum, A. Energy and policy considerations for modern deep learning research. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13693–13696. [Google Scholar]
- Wu, C.J.; Raghavendra, R.; Gupta, U.; Acun, B.; Ardalani, N.; Maeng, K.; Chang, G.; Aga, F.; Huang, J.; Bai, C.; et al. Sustainable ai: Environmental implications, challenges and opportunities. Proc. Mach. Learn. Syst. 2022, 4, 795–813. [Google Scholar]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
- Environment and Climate Change Canada. Federal Greenhouse Gas Offset System: Emission Factors and Reference Values. 2023. Available online: https://www.canada.ca/en/environment-climate-change/services/climate-change/pricing-pollution-how-it-will-work/output-based-pricing-system/federal-greenhouse-gas-offset-system/emission-factors-reference-values.html (accessed on 12 February 2025).
Ref. | Objective | Database | Results (%) |
---|---|---|---|
[31] | Detection of pulmonary nodules | LUNA16 [14] | DSC = 88.89
[32] | Early detection of lung cancer | MSD [33] | Seg. accuracy = 97.83; Class. accuracy = 98.77
[24] | Tumor segmentation for radiotherapy planning | Private clinical database | DSC = 83
[25] | Cancerous lesion segmentation | MSD [33] | DSC = 87.93; Recall = 86.02; Precision = 93
[34] | Accurate tumor segmentation in NSCLC treatment | NSCLC cases | DSC = 80
[35] | Segmentation | 134 CT scans | Dice = 88.8; Sensitivity = 87.2; Precision = 90.9
[36] | Recognition of lung cancer | LIDC-IDRI [37] | Sensitivity = 95.7
[38] | Lung cancer detection challenge | ACDC@LungHP Database [39] | -
[40] | Lung cancer segmentation | NSCLC | DSC = 86.1
[41] | Automatic tumor segmentation from T2 MRI | 377 CT patients + 6 MRI | DSC = 80
[18] | Automatic detection of lung nodules using region growing and active contour models | 15 CT scans (approx. 4700 slices) | Detection rate = 88.5; 6.6 false positives per CT
[19] | Supervised semi-3D segmentation of lung tissue and reconstruction of untrimmed 3D models | 15 scans of healthy and early-stage lung tumor patients | Outperforms comparable methods in accuracy and speed
[22] | Accurate segmentation of pulmonary nodules using a 3D-UNET optimized with a 3D CRF | 936 lung nodules from LIDC-IDRI, validated on clinical data | Dice = 80.1
[23] | Automatic classification and segmentation of lung nodules using 3D CNN and multitask learning | 651 nodules annotated with segmentation masks and pathological labels | Weighted average F1-score = 63.3 (vs. radiologists: 51.0 to 56.6)
Model | Accuracy | AUC | DSC |
---|---|---|---|
SegNet [52] | 95.00 | N/A | N/A |
UNET [52] | 91.00 | N/A | N/A |
FCN [53] | N/A | N/A | 92.00 |
DeepLabV3 [54] | 94.90 | N/A | N/A |
SALM | 99.00 | 99.00 | 93.00 |
Model | F1-Score | DSC | Test Size | Database (No. of Patients) |
---|---|---|---|---|
FCN V.Net [48] | N/A | 80.00 | 32 | 96 |
Hybrid CNN [49] | N/A | 58.00 | N/A | 260 |
Recurrent 3D-DenseUNet [50] | N/A | 72.28 | N/A | 300 |
FoL [51] | N/A | 83.68 | 430 slices | 40 |
SAM [12] | 42.83 | 43.95 | 174 | - |
SALM-3D | 75.57 | 81.88 | 174 | 888 |
Positional Encoding | F1-Score (%) | DSC (%) |
---|---|---|
Standard | 56.23 | 59.53 |
Modified (3) | 75.57 | 81.88 |
GPU | Processing Time (Minutes) | Images per Second |
---|---|---|
NVIDIA GeForce GTX 1080 Ti (11 GB) | 2.25 | 1.68
Apple M3 (16 GB) | 7.51 | 1.92 |
Metric | Value |
---|---|
Mean Dice Score | 0.7882 ± 0.1514 |
Mean Accuracy | 0.9764 ± 0.0146 |
AUC-ROC | 0.8592 |