Article

Review of the AI-Based Analysis of Abdominal Organs from Routine CT Scans

1 Gina Cody School of Engineering and Computer Science, Concordia University, Montreal, QC H3H 2L9, Canada
2 School of Computing, Engineering & The Built Environment, Edinburgh Napier University, Edinburgh EH10 5DT, UK
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(5), 2516; https://doi.org/10.3390/app15052516
Submission received: 16 December 2024 / Revised: 24 January 2025 / Accepted: 3 February 2025 / Published: 26 February 2025
(This article belongs to the Special Issue Computer Vision for Medical Informatics and Biometrics Applications)

Abstract: Accurate and timely segmentation of liver trauma in computed tomography (CT) images is essential for effective diagnosis and management in emergency medicine. This review examines advancements in liver segmentation techniques from 1993 to 2024, focusing on deep learning models and their impact on improving diagnostic accuracy for liver injuries. Early methods relied on basic image processing, which faced limitations due to noise, intensity variations, and complex abdominal anatomy. The advent of deep learning has transformed this domain, with architectures such as UNet, UNet++, UNet3+, multiscale large kernel (MSLUNet), and Swin-Unet achieving significant improvements in segmentation precision. Additionally, generative adversarial networks (GANs), including conditional GAN and pixel-to-pixel (Pix2Pix) GAN, have further enhanced image quality and detail, addressing deficiencies in traditional methods. This review provides a comparative analysis of these models, highlighting their strengths and limitations in liver injury segmentation.

1. Introduction

Medical image segmentation is essential in modern healthcare, providing critical support for diagnosis, treatment planning, surgical guidance, and patient monitoring. Over recent decades, the field has progressed from manual analysis to sophisticated automated systems, fueled by advancements in computing power and artificial intelligence (AI). While manual and semi-automatic methods are still in use, they are time-intensive, dependent on operator expertise, and often costly, highlighting the urgent need for fully automated, reliable segmentation solutions. This review explores the evolution of medical image segmentation for abdominal and liver trauma analysis, emphasizing key technological advancements that have transformed clinical workflows. Precise segmentation of liver and abdominal organs is particularly valuable in pre-operative planning and surgical guidance, where accurate lesion and trauma segmentation can allow for more conservative resections, preserving healthy tissue and improving patient outcomes. AI-driven techniques, especially deep learning, have led to substantial improvements in segmentation accuracy and speed, with convolutional neural networks (CNNs), U-Net architectures, and novel adaptations like MSLUNet showing promising results in trauma analysis. Regarding the segmentation masks used in CT scan analysis, distinguishing between tumors in soft tissue organs and lacerated tissues from trauma can be challenging, particularly if the segmentation is solely based on the image data without clinical context. Additionally, automatic segmentation for abdominal and liver trauma remains challenging due to:
  • Artifacts in CT images and variations in intensity information [1];
  • Irregular shapes and diverse sizes of anatomical structures [2];
  • Low contrast between liver parenchyma and lesions [3];
  • Weak boundaries between the liver and adjacent abdominal organs [4].
These complexities necessitate robust, adaptive AI models that can handle noisy, diverse data while achieving high segmentation accuracy. Emerging approaches, such as multiscale feature extraction and attention mechanisms, have shown potential in addressing these issues and enabling reliable segmentation even in complex anatomical regions. This review provides a comprehensive overview of state-of-the-art methods and discusses future directions to improve the performance and clinical applicability of AI-driven segmentation tools in abdominal trauma, specifically liver analysis.
The rest of this article is structured as follows: Section 2 summarizes the historical evolution of medical image analysis. Section 3 presents abdominal medical image datasets and then briefly explains evaluation metrics for medical image analysis. Section 4 presents a comparison between different network approaches, and finally, the conclusion is presented.

2. Historical Evolution

2.1. Early Foundations (1993–2007)

The foundation of modern medical image analysis was solidified in 1993 when Roberts et al. introduced a systematic approach to CT-based diagnosis, highlighting the modality’s superior contrast resolution and its capability to visualize internal organs clearly without structural overlap. This method proved particularly effective for diagnosing abdominal and pelvic trauma in selected patient populations, demonstrating distinct advantages over other diagnostic techniques [5]. Further advancements were documented in 2007 by Hamidi et al. in a comprehensive study involving 245 patients with blunt abdominal trauma. Their retrospective analysis of CT scans, correlated with surgical outcomes and clinical follow-ups, reported a sensitivity of 97% and a specificity of 95%, establishing the reliability of CT for this purpose [6]. Concurrently, Taourel et al. (2007) focused on vascular emergencies in liver trauma, developing specialized CT protocols to identify and grade various conditions such as lacerations, parenchymal hematomas, contusions, and active bleeding. Their work emphasized the diagnostic precision of CT in assessing vascular complications and elementary lesions in liver trauma, contributing significantly to management and treatment strategies based on early diagnosis and possible arterial embolization [7].

2.2. Transition to Automated Methods (2010–2018)

In 2010, Gao Yan et al. introduced a method for automatic kidney segmentation from 2D abdominal CT images. The complexity of kidney segmentation arises from the similar grey levels of adjacent organs, the effects of contrast media, and the high variability in organ positions and shapes within abdominal CT scans. Traditionally, kidney segmentation has been performed manually or semi-automatically. Gao Yan’s team proposed an enhanced connected component labeling algorithm based on intensity values to estimate kidney positions. This was followed by a method that combines multiscale mathematical morphology and labeling algorithms to delineate fine kidney regions accurately [8].
In 2017, Negar Farzaneh et al. developed a fully automated Bayesian-based method for 3D liver segmentation. Their technique achieved high accuracy, demonstrated by Dice and Jaccard similarity coefficients of 93.5% and 87.9%, respectively, highlighting its effectiveness in precise volumetric analysis of the liver [9]. Also in 2017, R. Vivanti et al. presented an approach for the automatic detection of new tumors and quantification of tumor burden in longitudinal liver CT studies. Recognizing the challenge of identifying small new tumors that radiologists might overlook during routine scans, they utilized baseline and follow-up CT scans, along with baseline tumor delineations and a tumor appearance prior model. By employing a convolutional neural network classifier, they enhanced new tumor detection rates significantly, achieving a true positive rate of 86% compared to the previous 72% [10]. In 2018, Negar Farzaneh et al. proposed a method for automated kidney segmentation that utilizes ensemble learning coupled with active contour modeling. This approach begins with the detection of an initial mask within the kidney region, which is then refined by evolving its boundary. Their technique demonstrated high efficacy, achieving an average recall score of 92.6% and a Dice similarity value of 88.9% [11].

2.3. Deep Learning Revolution (2018–2024)

2.3.1. Deep Learning System (DLS) Using Contrast-Enhanced CT

In 2018, Kyu Jin Choi et al. developed a DLS for accurately staging liver fibrosis using contrast-enhanced CT images during the portal venous phase, leveraging a large dataset of CT images from 7461 patients with confirmed liver fibrosis. The system’s effectiveness was validated on separate datasets comprising 891 patients, with detailed logistic regression analyses examining the impact of patient demographics and CT imaging techniques on diagnostic accuracy. The DLS’s performance was compared to traditional methods such as radiologist evaluations using metrics like the area under the receiver operating characteristic curve (AUROC) and the Obuchowski index, achieving a staging accuracy of 79.4% and AUROC scores of 0.96, 0.97, and 0.95 for diagnosing significant fibrosis, advanced fibrosis, and cirrhosis, respectively [12].

2.3.2. Deep Convolutional Neural Network (DCNN)

In 2018, Koichiro Yasaka et al. conducted a retrospective clinical study utilizing deep learning to stage liver fibrosis from CT images. This pilot study included 496 CT scans from 286 patients, leveraging a development set of 396 portal phase images and a test set of 100 images, with both sets sorted by fibrosis stage. A DCNN was trained with augmentation techniques such as rotation and noise addition. Their method achieved AUROC for diagnosing significant fibrosis, advanced fibrosis, and cirrhosis of 0.74, 0.76, and 0.73, respectively [13].

2.3.3. UNet++

In 2018, Zhou et al. introduced UNet++ to enhance medical image segmentation by leveraging a deeply supervised encoder-decoder network with nested, dense skip pathways. The redesigned skip pathways aim to reduce the semantic gap between the feature maps of the encoder and decoder sub-networks. Additionally, integrating deep supervision improves segmentation accuracy, particularly for lesions appearing at multiple scales. The authors evaluated UNet++ on four diverse medical imaging datasets, including lung nodule segmentation in CT scans, colon polyp segmentation in colonoscopy videos, cell nuclei segmentation in microscopy images, and liver segmentation in abdominal CT scans. Their experiments revealed significant performance improvements, with UNet++ achieving average IoU gains of 3.9 and 3.4 points over U-Net and wide U-Net, respectively, highlighting its efficacy across various medical imaging tasks [14].

2.3.4. Dense V-Net

In 2018, Eli Gibson et al. developed Dense V-Net, a deep-learning-based, registration-free method for multi-organ segmentation. The targeted organs include the pancreas, parts of the gastrointestinal tract (esophagus, stomach, and duodenum), as well as adjacent organs such as the liver, spleen, left kidney, and gallbladder. The study thoroughly assessed the segmentation accuracy of the proposed approach by comparing it to existing methods, including traditional deep learning and multi-atlas label fusion (MALF) techniques, through cross-validation on a multi-center dataset involving 90 subjects. The method demonstrated clear improvements, achieving higher Dice similarity scores and reduced mean absolute distances across most organs. For example, Dice scores reached 0.78 for the pancreas, 0.90 for the stomach, and 0.76 for the esophagus [15].

2.3.5. Fully Convolutional Network (FCN) and Conditional GAN

In 2019, Ben-Cohen et al. developed a system to generate virtual positron emission tomography (PET) images from CT scans utilizing FCN and conditional GAN technologies. This system was designed to simulate PET data from CT scans, which could potentially reduce false positives in lesion detection. Clinically, the technology offers a significant advantage by enabling lesion detection and drug treatment assessments in a CT-only setting, reducing the dependency on more costly and radiation-heavy PET/CT scans. The research utilized 60 PET/CT scans from Sheba Medical Center, with 23 for training and 37 for testing. This incorporated the synthetic PET data as a mechanism to decrease false positives in the identification of malignant liver lesions. The initial outcomes are encouraging, with a 28% reduction in the average number of false positives per case, decreasing from 2.9 to 2.1 [16].

2.3.6. UNet 3+

In 2020, Huang et al. proposed UNet 3+, an advanced version of the U-Net architecture designed for improved semantic segmentation in medical imaging. The architecture introduces full-scale skip connections, which effectively integrate low-level details with high-level semantic features across multiple scales, and employs deep supervision to enhance hierarchical representation learning. Additionally, a hybrid loss function and a classification-guided module are incorporated to improve boundary accuracy and minimize over-segmentation in non-organ regions. UNet 3+ achieves higher segmentation accuracy with fewer network parameters, offering computational efficiency. The model achieved Dice scores of 0.9675 and 0.9620 for liver and spleen segmentation, respectively [17].

2.3.7. Multiscale DL and CART

In 2020, David Dreizin et al. presented the findings of a retrospective study focused on the automatic detection of blunt hepatic injuries. They utilized a multiscale deep learning algorithm, trained on manually labeled data with an 80/20% cross-validation split, to derive voxel-wise measurements of liver lacerations. Liver volume was determined using a pre-trained liver segmentation model. The Liver Parenchymal Disruption Index (LPDI) was automatically calculated for each patient. To enhance diagnostic accuracy, a classification and regression tree (CART) analysis incorporated both automated LPDI measurements and manually segmented contrast extravasation (CE) volumes. They achieved a Dice score of 0.61 for blunt hepatic injury [18].

2.3.8. Deep Learning Algorithm (DLA)

In 2020, Yura Ahn et al. developed a method to address time-consuming segmentation for liver and spleen volumes. They used a DLA for fully automated segmentation in portal venous phase CT images across various liver conditions. The DLA was trained on a dataset comprising portal venous phase images from 813 patients and evaluated across two separate test datasets: the first included 150 CT examinations of patients with conditions ranging from healthy to cirrhotic livers and post-hepatectomy cases; the second dataset comprised 50 pairs of CT examinations from different institutions. The DLA demonstrated high accuracy with mean Dice similarity score (DSS) of 0.973 and 0.974 for liver and spleen, respectively, across varying liver conditions and showed no significant performance drop in images from external sources, highlighting its robustness and potential for clinical application [19].

2.3.9. Residual Attention UNet

In 2020, Qiangguo Jin et al. developed a 3D hybrid deep attention-aware network to extract liver/tumors from CT scans, utilizing a modified residual attention U-Net (RA-UNet). This approach integrated U-Net’s ability to capture multiscale attention information, merging low-level and high-level features. They introduced the attention residual mechanism to medical imaging, optimizing the model with fewer parameters than typical approaches. Their work expanded the U-Net framework to include 3D liver/tumor segmentation tasks. The system featured a two-stage process: initial liver localization by RA-UNet-I, and precise liver segmentation and tumor lesion identification by RA-UNet-II. The model was trained on both the LiTS and the 3DIRCADb datasets. The training process is notably time-intensive, primarily due to the use of 3D convolutions. Their result presented a Dice score of 0.595 and 0.830 for tumor segmentation on LiTS and 3DIRCADb datasets, respectively [20].
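The attention-residual idea used in RA-UNet can be illustrated at the shape level with a toy NumPy sketch. The arrays and function names below are illustrative assumptions, not the authors' implementation: a soft mask M in (0, 1) highlights salient features, while the residual "1 +" term keeps the original signal intact so gating cannot erase information.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_residual(features, mask_logits):
    """Attention-residual gating: out = (1 + M) * F, where the soft mask
    M in (0, 1) amplifies salient features and the residual '1 +' term
    preserves the original feature map."""
    mask = sigmoid(mask_logits)
    return (1.0 + mask) * features

F = np.random.rand(4, 4)   # toy non-negative feature map
M = np.random.randn(4, 4)  # toy mask logits
out = attention_residual(F, M)
# Gated output stays between 1x and 2x the input features.
print(np.all(out >= F) and np.all(out <= 2 * F))  # True
```

In the full network this gating is applied per channel inside 3D convolutional blocks; the sketch only shows why the residual formulation cannot suppress features below their original magnitude.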

2.3.10. Encoder-Decoder

In 2020, Setareh Dabiri et al. developed a method to automatically identify the central axial slice at the third lumbar vertebra (L3) vertebral level in CT scans and segment tissues including skeletal muscle (SM), subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT), and intermuscular adipose tissue (IMAT). The approach involved an L3 slice localization network, trained on over 12,000 images, and a muscle-fat segmentation network based on an encoder-decoder CNN structure. Using CT datasets from cancer patients, the method achieved a slice localization error of 0.87 ± 2.54 on 1748 CT scans. For tissue segmentation, mean Jaccard scores were 97% for SM, 98% for SAT, 97% for VAT, and 83% for IMAT, evaluated on datasets of 1327 and 1202 test samples [21].

2.3.11. S-Net

In 2021, Shunyao Luan et al. introduced the S-Net, a neural network that integrates attention mechanisms and long-hop connections for liver tumor segmentation from CT images. Their method, featuring a classic encoder-decoder structure with added attention, enhanced the semantic information processing across the network. The effectiveness of S-Net was validated on several datasets including MICCAI 2017 LiTS, where it achieved a Dice global score for tumor segmentation of 0.7555 and a Dice per case score of 0.613, demonstrating its potential for improving radiotherapy planning [22].

2.3.12. Conditional GAN

In 2021, Pierre-Henri Conze et al. developed an advanced deep learning model for fully automated multi-organ segmentation from abdominal CT images, based on a conditional generative adversarial network (cGAN) framework. The model employs a discriminator to refine organ boundaries and a generator with cascaded, partially pre-trained convolutional encoder-decoder networks. By fine-tuning non-medical images, the model addresses data scarcity and achieves multi-level segmentation refinements using auto-context. It demonstrated exceptional performance, earning first place in liver CT, liver MR, and multi-organ MR segmentation at the CHAOS challenge, with Dice scores of 97.95%, 89.67%, 90.56%, and 84.70% for liver, spleen, right kidney, and left kidney segmentation, respectively [23].

2.3.13. RetinaNet and U-Net

In 2021, José Denes Lima Araújo et al. presented a comprehensive framework combining two CNN models: RetinaNet for initial lesion detection and U-Net for detailed segmentation. The method incorporates image acquisition and pre-processing in segmentation refinement, achieving a Dice coefficient of 82.99% and a Matthews correlation coefficient (MCC) of 83.62%. This demonstrates the method’s capability to handle the variability in lesion shapes and sizes, providing accurate and reliable segmentation results [24].

2.3.14. Multiscale DL and Support Vector Machine (SVM)

In 2021, David Dreizin et al. evaluated a multiscale deep learning algorithm aimed at the quantitative visualization and measurement of traumatic hemoperitoneum, comparing its diagnostic performance against categorical estimation methods. They employed Dice similarity coefficients (DSCs) to assess the performance of a three-dimensional (3D) U-Net and a coarse-to-fine deep learning method. Additionally, an SVM with a radial basis function was used to derive an optimal cutoff. The results showed that the mean DSC for the multiscale algorithm was 0.61, significantly outperforming the 0.32 achieved by the 3D U-Net and the 0.52 by the coarse-to-fine method, indicating a superior accuracy of the multiscale approach in detecting and quantifying traumatic hemoperitoneum [25].

2.3.15. Recursive Cascaded Network (RCN)

In 2021, Shaodi Yang et al. developed an enhanced unsupervised-learning-based framework for multi-organ registration on 3D abdominal CT images. They incorporated a coarse-to-fine RCN within a standard U-Net architecture to achieve precise 3D abdominal CT image multi-organ registration results. To ensure the integrity of the registered images, they integrated a topology-preserving loss into the total loss function, which helped to prevent distortion in the predicted transformations. The efficacy of their method was validated using four public databases, demonstrating high-precision clinical multi-organ registration results suitable for real-time applications. Their experimental results indicate a Dice score of 97.75% for multi-organ (liver, spleen, right kidney, and left kidney) segmentation [26].

2.3.16. CNN

In 2022, Negar Farzaneh et al. developed an end-to-end pipeline for quantifying liver parenchymal disruption due to trauma using three-dimensional contrast-enhanced CT scans. The process involved generating segmentation masks for both normal and trauma-affected liver regions, leveraging expert knowledge to reduce false positives by incorporating typical liver trauma patterns. Volumes were then quantified to calculate the extent of parenchymal disruption. Utilizing deep convolutional neural networks trained and validated on a dataset from the University of Michigan Health System, the model achieved high accuracy for liver parenchyma (96.13%, 96.00%, 96.35%) and moderate accuracy for trauma regions (51.21%, 53.20%, 56.76%) [27].

2.3.17. Swin-Unet

In 2023, Hu Cao et al. proposed Swin-Unet, a Transformer-based U-shaped encoder-decoder architecture for medical image segmentation. Addressing the limitations of convolutional neural networks (CNNs) in capturing global and long-range semantic interactions, Swin-Unet utilizes hierarchical Swin Transformers with shifted windows as the encoder to extract context features and a symmetric Swin Transformer-based decoder with patch-expanding layers. They achieved a Dice score of 79.13 for accurately segmenting various organs, including the aorta, gallbladder, spleen, left kidney, right kidney, liver, pancreas, and stomach [28].
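The shifted-window partitioning at the heart of Swin Transformers can be sketched on a toy 2D map. Real Swin operates on batched multi-channel tensors and runs self-attention inside each window; the `window_partition` helper and example array below are illustrative assumptions only.

```python
import numpy as np

def window_partition(x, w):
    """Split an (H, W) feature map into non-overlapping w x w windows."""
    H, W = x.shape
    return x.reshape(H // w, w, W // w, w).swapaxes(1, 2).reshape(-1, w, w)

x = np.arange(16).reshape(4, 4)
regular = window_partition(x, 2)  # 4 windows of shape 2x2
# Shifting the map before partitioning lets the next layer's windows
# straddle the previous layer's window boundaries.
shifted = window_partition(np.roll(x, shift=(-1, -1), axis=(0, 1)), 2)
print(regular.shape)  # (4, 2, 2)
print(regular[0])     # top-left window: [[0, 1], [4, 5]]
```

Alternating regular and shifted partitions is what gives the hierarchical encoder cross-window information flow without global attention cost.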

2.3.18. Pix2Pix

In 2023, Ali Jamali et al. introduced an AI-based decision support system designed to enhance the triage process for liver trauma using CT scans. The system employs a generative adversarial network (GAN), specifically the Pix2Pix model, to detect liver bleeding and lacerations effectively, achieving Dice scores of 97% for the liver and 93% for lacerations. The model was trained on 3Dircadb (liver tumors) and tested on a local dataset containing lacerations, where it achieved high evaluation metrics [29].

2.3.19. 3D Convolutional Block Attention Module Neural Network (CBAMNN)

In 2024, Chi-Tung Cheng et al. introduced a deep learning model leveraging the 3D convolutional block attention module (CBAM) to rapidly identify life-threatening solid organ injuries, including spleen, liver, and kidney damage, in trauma settings. Using abdominal CT scans from a single trauma center (2008–2017), the model classified cases as positive or negative based on organ injuries. The dataset comprised 1302 scans (87%) for training and validation and 194 scans (13%) for testing. Evaluated using AUROC, accuracy, sensitivity, specificity, and predictive values, the model delivered impressive results: spleen injury detection achieved 93.8% accuracy and 95.2% specificity; liver injury detection reached 82.0% accuracy and 84.7% specificity; and kidney injury detection achieved 95.9% accuracy and 98.9% specificity [30].

2.3.20. MSLUNet

In 2024, Zhu and Cheng presented MSLUNet, a network designed to enhance medical image segmentation. This network combines a symmetric multiscale feature extraction module with a three-branch attention mechanism, tailored to reduce computational demands and parameter counts, making it well suited for mobile medical devices. They used small convolutional kernels in the encoder to capture detailed features across scales and an inverse bottleneck structure in the decoder for effective feature integration. MSLUNet’s attention mechanism improves segmentation accuracy by focusing on relevant features and capturing global contextual information. On their evaluation set, MSLUNet achieved Dice, recall, precision, and specificity scores of 91.1, 93.2, 95.6, and 94.9, respectively [31].

2.3.21. Deep Abdominal Net

In 2024, Xinru Shen et al. developed a deep-learning-based detection algorithm aimed at enhancing the initial screening of internal abdominal injuries on computed tomography (CT) scans, which are essential for diagnosing traumatic abdominal injuries. Given the challenges of interpreting CT images in emergency contexts, this approach offers a solution through automated image analysis. Utilizing a dataset of 3147 patients, 855 of whom had confirmed abdominal trauma, the researchers applied a 2D semantic segmentation model and a 2.5D classification model to estimate injury probabilities for specific organs. Their model achieved impressive accuracy, especially in detecting renal injuries (0.932) [32].
Table 1 presents the reviewed deep learning methods and their evaluation metrics based on their objectives. Based on this table, the methods that specifically address liver/liver lesion segmentation are extracted for further study.

3. Datasets

This section provides an overview of several pivotal datasets employed in the field of medical image segmentation. Each dataset, created for specific segmentation challenges, plays a crucial role in advancing the development and evaluation of computational methods for medical imaging. These resources are integral for researchers focusing on various organs and conditions, from liver tumors/trauma to multi-organ analysis, offering a rich ground for algorithmic innovation and benchmarking.

3.1. BTCV

The BTCV dataset, designed for assessing 3D abdominal CT image segmentation methods, serves as a benchmark for performance comparisons. Its training set includes 30 annotated samples, with segmentation masks provided for four organs: the liver, left kidney, right kidney, and spleen. This dataset is widely used for developing and evaluating segmentation algorithms, especially in multi-organ medical imaging tasks [33].

3.2. Combined Healthy Abdominal Organ Segmentation (CHAOS)

The CHAOS dataset, provided by Kavur et al., is used for the Combined Healthy Abdominal Organ Segmentation Challenge. It features abdominal CT and magnetic resonance imaging (MRI) images from different patients, including 20 training and 20 testing cases for each modality. The CT images are annotated for the liver; they come from potential liver donors with healthy livers and were acquired in the upper abdomen during the portal venous phase, 70–80 s after contrast agent injection. The training data includes DICOM images and their corresponding ground truth masks, while the test set contains only DICOM images [34].

3.3. Liver Tumor Segmentation (LiTS)

LiTS is a benchmark dataset designed to evaluate segmentation methods for detecting the liver and liver tumors in medical imaging. It is known for being challenging due to the complexity of accurately identifying both the liver and tumors within CT scans. It includes 201 abdominal CT scans, 194 of which show lesions. The dataset provides 131 training samples, each annotated with segmentation masks outlining the liver and any tumors present [35].

3.4. Sliver07

Sliver07 is a benchmark dataset designed for liver segmentation tasks in medical imaging, known for its challenging nature. The dataset’s training set consists of 20 samples, each annotated with segmentation masks that precisely outline the liver. This makes Sliver07 an essential resource for developing and evaluating algorithms focused on accurate liver segmentation in medical images [36].

3.5. 3Dircadb

3D Image Reconstruction for Comparison of Algorithm Database (3Dircadb) is a 3D abdominal CT image dataset widely used for medical image segmentation tasks. It includes 20 annotated samples, each containing segmentation masks for multiple anatomical structures, such as the liver, left kidney, liver tumors, and other relevant regions. This dataset serves as a valuable resource for developing and benchmarking algorithms designed for multi-structure segmentation in abdominal imaging [37].

3.6. CT-ORG

The CT-ORG dataset contains 140 CT scans with 3D annotations for lungs, bones, liver, kidneys, and bladder, with some scans also labeling the brain. It includes diverse imaging conditions and patients with liver lesions and metastatic diseases. Of the scans, 131 are standalone CTs and 9 are PET-CT components. The dataset is divided into 119 training and 21 testing scans. While lungs and bones in the training set were segmented automatically using morphological techniques, all other organs were segmented manually, primarily using ITK-SNAP with semi-automatic active contouring. The dataset also incorporates images from the LiTS challenge, supporting advancements in multi-class organ segmentation and algorithm benchmarking [38].
3.7. Evaluation Metrics
Commonly used metrics in medical image processing are as follows: for segmentation, the Dice coefficient, IoU, and Hausdorff distance; for classification, accuracy, precision, recall, and F1-score; and for detection, sensitivity, specificity, and AUROC.
Segmentation Metrics
  • Dice Coefficient:
    $\mathrm{Dice}(A, B) = \dfrac{2\,|A \cap B|}{|A| + |B|}$
  • Intersection over Union (IoU):
    $\mathrm{IoU}(A, B) = \dfrac{|A \cap B|}{|A \cup B|}$
  • Hausdorff Distance:
    $d_{H}(A, B) = \max\left\{ \sup_{a \in A} \inf_{b \in B} \lVert a - b \rVert,\; \sup_{b \in B} \inf_{a \in A} \lVert b - a \rVert \right\}$
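These segmentation metrics can be computed directly for toy masks. The NumPy sketch below (masks, point sets, and function names are illustrative, not from any cited work) treats masks as boolean arrays and evaluates the Hausdorff distance on the sets of foreground pixel coordinates:

```python
import numpy as np

def dice(a, b):
    """Dice coefficient of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def iou(a, b):
    """Intersection over Union (Jaccard index) of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    return inter / np.logical_or(a, b).sum()

def hausdorff(A, B):
    """Symmetric Hausdorff distance between point sets of shape (n, 2), (m, 2)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

a = np.zeros((8, 8), dtype=bool); a[2:6, 2:6] = True  # 16-pixel square
b = np.zeros((8, 8), dtype=bool); b[3:7, 3:7] = True  # shifted copy, 9-pixel overlap
print(dice(a, b))          # 2*9/32 = 0.5625
print(round(iou(a, b), 4)) # 9/23 ≈ 0.3913
print(hausdorff(np.argwhere(a), np.argwhere(b)))  # √2 ≈ 1.4142
```

Note that Dice and IoU are monotonically related (Dice = 2·IoU/(1+IoU)), which is why papers reporting either one can be compared after conversion.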
Classification Metrics
  • Accuracy:
    $\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$
  • Precision:
    $\mathrm{Precision} = \dfrac{TP}{TP + FP}$
  • Recall (Sensitivity):
    $\mathrm{Recall} = \dfrac{TP}{TP + FN}$
  • F1-Score:
    $F_{1} = \dfrac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
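The classification metrics follow directly from confusion-matrix counts; the short sketch below (the counts and function name are illustrative) computes all four for a toy example:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Toy counts: 8 true positives, 5 true negatives, 2 false positives, 1 false negative.
acc, prec, rec, f1 = classification_metrics(tp=8, tn=5, fp=2, fn=1)
print(acc)           # 13/16 = 0.8125
print(prec)          # 8/10 = 0.8
print(round(rec, 4)) # 8/9 ≈ 0.8889
print(round(f1, 4))  # 16/19 ≈ 0.8421
```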
Detection Metrics
  • Sensitivity:
    $\mathrm{Sensitivity} = \dfrac{TP}{TP + FN}$
  • Specificity:
    $\mathrm{Specificity} = \dfrac{TN}{TN + FP}$
  • AUROC: the area under the receiver operating characteristic curve, which plots
    $\mathrm{TPR} = \dfrac{TP}{TP + FN}$ against $\mathrm{FPR} = \dfrac{FP}{FP + TN}$.
    Approximated with the trapezoidal rule over $n$ operating points,
    $\mathrm{AUC} = \sum_{i=1}^{n-1} \left( \mathrm{FPR}_{i+1} - \mathrm{FPR}_{i} \right) \cdot \dfrac{\mathrm{TPR}_{i+1} + \mathrm{TPR}_{i}}{2}$
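The trapezoidal AUC sum can be evaluated in a few lines of Python; the ROC operating points below are made up for illustration:

```python
def auc_trapezoid(fpr, tpr):
    """Trapezoidal-rule AUROC from ROC points ordered by increasing FPR."""
    return sum((fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2
               for i in range(len(fpr) - 1))

# Toy ROC curve from four operating points (FPR, TPR), including (0,0) and (1,1).
fpr = [0.0, 0.1, 0.4, 1.0]
tpr = [0.0, 0.6, 0.9, 1.0]
print(round(auc_trapezoid(fpr, tpr), 4))  # 0.825
```

A chance-level classifier, whose ROC is the diagonal, yields 0.5 under this formula, which is the usual baseline against which the AUROC values reported in Section 2 should be read.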
Incorporating explainable AI (XAI) techniques, such as Grad-CAM [39], SHAP [40], and LIME [41], can enhance the interpretability of deep learning models used for abdominal organ segmentation.

4. Cons and Pros of Each Network for Liver or Liver Lesion Segmentation

This section specifically compares different networks in terms of liver/liver lesion segmentation.
Deep convolutional neural networks (DCNNs) are robust and accurate for liver and large lesion segmentation but require architectural enhancements for subtle anomalies. They depend on large datasets and high computational resources, making them less suitable for resource-limited or small-dataset scenarios.
FCNs efficiently classify individual pixels with a simple downsampling and upsampling structure. While effective for liver segmentation, they struggle with intricate or small lesions and often require post-processing for finer details.
Encoder-decoder architectures are versatile for liver segmentation but need refinements, such as skip connections and advanced loss functions, to handle small, irregular lesions and to address issues with boundary preservation, noisy datasets, and class imbalance.
S-Net (Figure 1) is a lightweight, computationally efficient model suitable for liver segmentation and real-time applications, particularly in resource-constrained settings. It excels in feature blending and low-level detail preservation, but its simplicity, shallow encoder, and fixed structural design reduce accuracy on complex shapes, small lesions, and noisy datasets. While effective as a baseline for simpler tasks, S-Net is less competitive for detailed liver lesion segmentation; enhancements such as advanced loss functions and preprocessing can improve its performance [42].
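As one example of the "advanced loss functions" mentioned above, a soft Dice loss is a common choice for small or imbalanced lesion masks because it scores overlap rather than per-pixel accuracy. The sketch below is a minimal illustration (the function name, epsilon value, and toy arrays are assumptions, not any specific paper's implementation):

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss on probability maps; eps avoids division by zero on
    empty masks, which keeps the loss usable for very small lesions."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

pred = np.array([[0.9, 0.8], [0.1, 0.2]])    # predicted foreground probabilities
target = np.array([[1.0, 1.0], [0.0, 0.0]])  # ground-truth binary mask
print(round(soft_dice_loss(pred, target), 4))  # ≈ 0.15
```

In practice this term is often combined with cross-entropy, trading the latter's smooth gradients for the former's robustness to foreground/background imbalance.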
Dense V-Net (Figure 2) is a robust architecture for liver/liver lesion segmentation, excelling in 3D volumetric analysis. Its densely connected layers enhance feature extraction, gradient flow, and fine detail preservation, making it highly effective for complex and irregular liver lesions. Its drawbacks are a reliance on extensive computational resources and large annotated datasets, together with sensitivity to noise, which limit its use in resource-limited scenarios. Proper preprocessing, data augmentation, and loss function customization can help Dense V-Net achieve strong performance. The network uses convolutions both to extract features and to reduce resolution at the end of each stage by applying suitable strides. The structure features a compression path on the left side and a decompression path on the right, which restores the signal to its original size. All convolutions are performed with proper padding. The compression path is organized into distinct stages, each operating at a specific resolution and consisting of one to three convolutional layers [43].
U-Net operates effectively with a small number of training samples and produces highly accurate segmentation outcomes. The network is structured with a contracting path that captures semantic and contextual details from the image, and an expansive path that reintegrates location information for each pixel, pinpointing its specific position. The two symmetric paths form a U-shaped architecture. Increasing the network's depth, adding more skip connections, and incorporating dropout layers can further improve results, while reducing the number of filters in each convolutional block shortens training time by lowering the overall count of trainable parameters. U-Net is a reliable baseline for liver segmentation due to its simplicity, efficiency, and effectiveness. For liver lesion segmentation, which often involves small or complex structures, enhancements such as attention mechanisms, modified loss functions, or integration with modern architectures are required [44]. Figure 3 shows an architectural overview of U-Net.
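To make the U-shape concrete, the NumPy sketch below mimics how one encoder/decoder level changes tensor shapes: 2 × 2 max-pooling halves the spatial resolution on the contracting path, nearest-neighbour upsampling restores it on the expansive path, and the skip connection concatenates encoder features back along the channel axis. Convolutions and learned weights are omitted; this only illustrates the shape plumbing:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pooling on a (C, H, W) tensor; halves H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample_2x(x):
    """Nearest-neighbour upsampling on a (C, H, W) tensor; doubles H and W."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_concat(decoder_feat, encoder_feat):
    """U-Net skip connection: concatenate along the channel axis."""
    return np.concatenate([decoder_feat, encoder_feat], axis=0)

# One encoder/decoder level of the U-shape (convolutions omitted):
x = np.random.rand(16, 64, 64)   # encoder features at full resolution
down = max_pool_2x2(x)           # (16, 32, 32): contracting path
up = upsample_2x(down)           # (16, 64, 64): expansive path
fused = skip_concat(up, x)       # (32, 64, 64): location detail restored
```

The doubling of channels at `fused` is why each decoder level in a real U-Net is followed by convolutions that merge the skipped low-level detail with the upsampled context.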
U-Net++ builds on the traditional U-Net by introducing nested and dense skip connections, which improve accuracy and robustness for liver/liver lesion segmentation tasks. This architecture is particularly effective for detecting small and complex lesions due to its refined feature extraction and multiscale representation capabilities. However, its added complexity increases computational demands and training times. U-Net++ performs exceptionally well for liver segmentation and excels in capturing irregular shapes and small details in lesion segmentation with careful optimization.
U-Net 3+'s increased complexity and resource demands necessitate careful planning and optimization. With large and diverse datasets, U-Net 3+ delivers accurate results, while smaller datasets require regularization techniques to mitigate overfitting. Table 2 compares the layers of U-Net, U-Net++, and U-Net 3+.
RA-UNet excels in detailed liver/lesion segmentation, using residual connections and attention mechanisms for high accuracy, especially with small or irregularly shaped lesions. While computationally demanding, it performs exceptionally well in resource-rich environments with high-quality datasets, delivering excellent results with appropriate tuning and preprocessing. Figure 4 shows the architectural overview for RA-UNet.
Conditional GANs (cGANs) excel in liver lesion segmentation, effectively capturing complex shapes and fine details. They are robust against noise and class imbalance, making them suitable for challenging datasets. Their training complexity, resource requirements, and reliance on paired data are drawbacks. Figure 5 shows an architectural overview for cGAN.
Pix2Pix, with its adversarial training, excels in capturing intricate details and irregular shapes for liver/lesion segmentation. However, its dependency on paired data, training instability, and high resource requirements limit its practicality.
Swin-Unet, leveraging attention-based and multiscale feature representation, is highly effective for detecting small and irregular lesions while preserving liver boundaries. Despite its accuracy, it demands significant computational resources and longer training times, making it less ideal for resource-constrained settings. Figure 6 shows the architectural overview for Swin-Unet.
MSLUNet offers an efficient and accurate solution for liver/lesion segmentation, excelling in resource-constrained or real-time scenarios. Its multiscale feature extraction performs well on small, irregular lesions, though it may not match the precision of advanced models like Swin-Unet or Dense V-Net. It is ideal for lightweight deployments and serves as a strong baseline for tasks requiring multiscale representation.
Table 3 summarizes the pros and cons of each network for liver/liver lesion segmentation.
For applications requiring a detection-first approach, a two-stage framework such as RetinaNet combined with U-Net is an effective choice, allowing initial lesion localization followed by high-precision segmentation. However, when simultaneous segmentation and detection are needed, models like RA-UNet and conditional GANs (cGANs) provide an end-to-end solution that captures both global liver structures and local lesion characteristics. In cases where generalization to unseen data is a priority, Swin-Unet—a Transformer-based architecture—offers superior performance by capturing long-range dependencies and fine anatomical details. Finally, for applications focused on lesion quantification, particularly in trauma cases, multiscale deep learning combined with SVM-based classification provides a robust framework for measuring lesion severity and extent (Table 4).
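The detection-first workflow can be sketched as follows; `detect_lesions` and `segment_roi` are hypothetical stand-ins for trained models such as RetinaNet and U-Net, and only the ROI cropping and mask paste-back are meant literally:

```python
import numpy as np

def detect_lesions(ct_slice):
    """Stand-in detector: returns candidate boxes as (y0, x0, y1, x1).

    In practice this would be detector proposals (e.g., RetinaNet)
    kept above a confidence threshold; here one fixed box is returned.
    """
    return [(10, 20, 40, 60)]

def segment_roi(roi):
    """Stand-in segmenter: returns a binary mask the same shape as the ROI.

    A real pipeline would run a segmentation network (e.g., U-Net) here;
    simple intensity thresholding stands in for the forward pass.
    """
    return (roi > roi.mean()).astype(np.uint8)

def two_stage_segmentation(ct_slice):
    """Stage 1 localizes lesions; stage 2 segments only inside each box."""
    full_mask = np.zeros(ct_slice.shape, dtype=np.uint8)
    for (y0, x0, y1, x1) in detect_lesions(ct_slice):
        roi = ct_slice[y0:y1, x0:x1]                  # crop the detected region
        full_mask[y0:y1, x0:x1] |= segment_roi(roi)   # paste the ROI mask back
    return full_mask
```

Restricting the segmenter to detected boxes keeps the second stage focused on lesion-scale context, which is what gives detection-first pipelines their precision advantage on small lesions.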

5. Conclusions

In the domain of liver lesion (trauma or tumor) segmentation, several models stand out due to their ability to handle the complexity and variability of liver anatomy and lesion characteristics. Among these, U-Net++ and RA-UNet demonstrate exceptional performance, particularly in detecting small and irregular lesions. U-Net++ excels with its nested and dense skip connections, which enhance feature extraction and multiscale representation, making it well suited for challenging lesion segmentation tasks. Similarly, RA-UNet integrates residual and attention mechanisms, providing superior accuracy in distinguishing lesions with weak boundaries and irregular shapes.
For advanced segmentation tasks requiring global context understanding and fine detail extraction, Swin-Unet leverages Transformer-based architectures to capture long-range dependencies and small lesion details effectively. Additionally, Dense V-Net is highly effective for 3D volumetric segmentation, preserving fine details, and handling complex lesion morphologies, though it demands substantial computational resources.
Emerging architectures like MSLUNet offer a balanced approach, combining efficiency and accuracy, making them suitable for real-time applications, although they may not match the precision of more computationally intensive models. Furthermore, conditional GANs (cGANs), including Pix2Pix, excel in capturing intricate details and complex lesion structures, particularly in noise-robust environments, although their dependency on paired training data and high resource requirements can be limiting.
Overall, the choice of model should align with the specific clinical requirements, computational resources, and dataset characteristics. For scenarios prioritizing segmentation accuracy and detail, U-Net++, RA-UNet, and Swin-Unet are leading contenders, while MSLUNet and Dense V-Net provide practical alternatives for resource-constrained or large-scale applications. These models collectively set a strong foundation for advancing liver lesion diagnosis and treatment planning.

Author Contributions

Conceptualization, A.N.; methodology, N.T.; validation, A.N. and N.T.; formal analysis, N.T.; investigation, A.N.; resources, N.T.; writing—original draft preparation, A.N. and N.T.; writing—review and editing, A.N. and N.T.; visualization, A.N. and N.T.; supervision, C.Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to acknowledge the use of Grammarly (www.grammarly.com) for proofreading and grammar checking throughout the manuscript preparation.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
Medical Imaging and Analysis
Imaging Techniques
CT: Computed Tomography
MRI: Magnetic Resonance Imaging
PET: Positron Emission Tomography
Technological Frameworks
ML: Machine Learning
AI: Artificial Intelligence
XAI: Explainable AI
CHAOS: Combined Healthy Abdominal Organ Segmentation
Machine Learning, Neural Networks, and Deep Learning
CNN: Convolutional Neural Networks
MSLUNet: Multiscale and Large Kernel U-Net
RA-UNet: Residual Attention U-Net
RCN: Recursive Cascaded Network
GAN: Generative Adversarial Network
SVM: Support Vector Machine
DCNN: Deep Convolutional Neural Network
FCN: Fully Convolutional Networks
DLA: Deep Learning Algorithm
DLM: Deep Learning Model
DLS: Deep Learning System
CBAMNN: Convolutional Block Attention Module Neural Network
Pix2Pix: Pixel-to-Pixel Image Translation Network
Clinical and Anatomical Considerations
L3: Third Lumbar Vertebra
LPDI: Liver Parenchymal Disruption Index
CE: Contrast Extravasation
SAT: Subcutaneous Adipose Tissue
VAT: Visceral Adipose Tissue
IMAT: Intermuscular Adipose Tissue
SM: Skeletal Muscle
Datasets and Evaluation Metrics
LiTS: Liver Tumor Segmentation Challenge
3DIRCADb: 3D Image Reconstruction for Comparison of Algorithm Database
DSC: Dice Similarity Coefficient
CART: Classification and Regression Tree
DSS: Dice Similarity Score
MCC: Matthews Correlation Coefficient
AUC: Area Under the Curve
Technical and Computational Aspects
2D: Two-Dimensional
3D: Three-Dimensional

References

  1. Niño, S.B.; Bernardino, J.; Domingues, I. Algorithms for Liver Segmentation in Computed Tomography Scans: A Historical Perspective. Sensors 2024, 24, 1752. [Google Scholar] [CrossRef] [PubMed]
  2. Sivakumaran, L.; Chartrand, G.; Vu, K.N.; Vandenbroucke-Menu, F.; Kauffmann, C.; Tang, A. Liver segmentation: Indications, techniques, and future directions. Insights Imaging 2017, 8, 377–392. [Google Scholar] [CrossRef]
  3. Wei, D.; Jiang, Y.; Zhou, X.; Wu, D.; Feng, X. A Review of Advancements and Challenges in Liver Segmentation. J. Imaging 2024, 10, 202. [Google Scholar] [CrossRef]
  4. Kumar, S.S.; Vinod Kumar, K. Literature survey on deep learning methods for liver segmentation from CT images: A comprehensive review. Multimed. Tools Appl. 2024, 83, 71833–71862. [Google Scholar] [CrossRef]
  5. Roberts, J.L.; Dalen, K.; Bosanko, C.M.; Jafir, S.Z. CT in abdominal and pelvic trauma. Radiographics 1993, 13, 735–752. [Google Scholar] [CrossRef] [PubMed]
  6. Hamidi, M.I.; Aldaoud, K.M.; Qtaish, I. The role of computed tomography in blunt abdominal trauma. Sultan Qaboos Univ. Med. J. 2007, 7, 41. [Google Scholar]
  7. Taourel, P.; Vernhet, H.; Suau, A.; Granier, C.; Lopez, F.M.; Aufort, S. Vascular emergencies in liver trauma. Eur. J. Radiol. 2007, 64, 73–82. [Google Scholar] [CrossRef]
  8. Yan, G.; Wang, B. An automatic kidney segmentation from abdominal CT images. In Proceedings of the 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems, Xiamen, China, 29–31 October 2010; Volume 1, pp. 280–284. [Google Scholar]
  9. Farzaneh, N.; Habbo-Gavin, S.; Soroushmehr, S.R.; Patel, H.; Fessell, D.P.; Ward, K.R.; Najarian, K. Atlas based 3D liver segmentation using adaptive thresholding and superpixel approaches. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 1093–1097. [Google Scholar]
  10. Vivanti, R.; Szeskin, A.; Lev-Cohain, N.; Sosna, J.; Joskowicz, L. Automatic detection of new tumors and tumor burden evaluation in longitudinal liver CT scan studies. Int. J. Comput. Assist. Radiol. Surg. 2017, 12, 1945–1957. [Google Scholar] [CrossRef]
  11. Farzaneh, N.; Soroushmehr, S.M.R.; Patel, H.; Wood, A.; Gryak, J.; Fessell, D.; Najarian, K. Automated Kidney Segmentation for Traumatic Injured Patients through Ensemble Learning and Active Contour Modeling. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. 2018, 2018, 3418–3421. [Google Scholar]
  12. Choi, K.J.; Jang, J.K.; Lee, S.S.; Sung, Y.S.; Shim, W.H.; Kim, H.S.; Yun, J.; Choi, J.Y.; Lee, Y.; Kang, B.K.; et al. Development and Validation of a Deep Learning System for Staging Liver Fibrosis by Using Contrast Agent-enhanced CT Images in the Liver. Radiology 2018, 289, 688–697. [Google Scholar] [CrossRef]
  13. Yasaka, K.; Akai, H.; Kunimatsu, A.; Abe, O.; Kiryu, S. Deep learning for staging liver fibrosis on CT: A pilot study. Eur. Radiol. 2018, 28, 4578–4585. [Google Scholar] [CrossRef]
  14. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. DLMIA ML-CDS 2018; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; Volume 11045. [Google Scholar]
  15. Gibson, E.; Giganti, F.; Hu, Y.; Bonmati, E.B.; Bandula, S.; Gurusamy, K.; Davidson, B.; Pereira, S.P.; Clarkson, M.J.; Barratt, D.C. Automatic Multi-Organ Segmentation on Abdominal CT With Dense V-Networks. IEEE Trans. Med. Imaging 2018, 37, 1822–1834. [Google Scholar] [CrossRef] [PubMed]
  16. Ben-Cohen, A.; Klang, E.; Raskin, S.P.; Soffer, S.; Ben-Haim, S.; Konen, E.; Amitai, M.M.; Greenspan, H. Cross-modality synthesis from CT to PET using FCN and GAN networks for improved automated lesion detection. Eng. Appl. Artif. Intell. 2019, 78, 186–194. [Google Scholar] [CrossRef]
  17. Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.-W.; Wu, J. UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation. arXiv 2020, arXiv:2004.08790. [Google Scholar] [CrossRef]
  18. Dreizin, D.; Zhou, Y.; Fu, S.; Wang, Y.; Li, G.; Champ, K.; Yuille, A.L. A multiscale deep learning method for quantitative visualization of traumatic hemoperitoneum at CT: Assessment of feasibility and comparison with subjective categorical estimation. Radiol. Artif. Intell. 2020, 2, e190220. [Google Scholar] [CrossRef] [PubMed]
  19. Ahn, Y.; Yoon, J.S.; Lee, S.S.; Suk, H.I.; Son, J.H.; Sung, Y.S.; Kim, H.S. Deep learning algorithm for automated segmentation and volume measurement of the liver and spleen using portal venous phase computed tomography images. Korean J. Radiol. 2020, 21, 987–997. [Google Scholar] [CrossRef]
  20. Jin, Q.; Meng, Z.; Sun, C.; Cui, H.; Su, R. RA-UNet: A hybrid deep attention-aware network to extract liver and tumor in CT scans. Front. Bioeng. Biotechnol. 2020, 8, 605132. [Google Scholar] [CrossRef] [PubMed]
  21. Dabiri, S.; Popuri, K.; Ma, C.; Chow, V.; Cespedes Feliciano, E.M.; Caan, B.J.; Baracos, V.E.; Beg, M.F. Deep learning method for localization and segmentation of abdominal CT. Comput. Med. Imaging Graph. 2020, 85, 101776. [Google Scholar] [CrossRef]
  22. Luan, S.; Xue, X.; Ding, Y.; Wei, W.; Zhu, B. Adaptive Attention Convolutional Neural Network for Liver Tumor Segmentation. Front. Oncol. 2021, 11, 680807. [Google Scholar] [CrossRef]
  23. Conze, P.H.; Kavur, A.E.; Cornec-Le Gall, E.; Gezer, N.S.; Le Meur, Y.; Selver, M.A.; Rousseau, F. Abdominal multi-organ segmentation with cascaded convolutional and adversarial deep networks. Artif. Intell. Med. 2021, 117, 102109. [Google Scholar] [CrossRef]
  24. Araújo, J.D.L.; Cruz, L.B.d.; Ferreira, J.L.; Neto, O.P.d.S.; Silva, A.C.; Paiva, A.C.d.; Gattass, M. An automatic method for segmentation of liver lesions in computed tomography images using deep neural networks. Expert Syst. Appl. 2021, 180, 115064. [Google Scholar] [CrossRef]
  25. Dreizin, D.; Chen, T.; Liang, Y.; Zhou, Y.; Paes, F.; Wang, Y.; Morrison, J.J. Added value of deep learning-based liver parenchymal CT volumetry for predicting major arterial injury after blunt hepatic trauma: A decision tree analysis. Abdom. Radiol. 2021, 46, 2556–2566. [Google Scholar] [CrossRef]
  26. Yang, S.; Zhao, Y.; Liao, M.; Zhang, F. An Unsupervised Learning-Based Multi-Organ Registration Method for 3D Abdominal CT Images. Sensors 2021, 21, 6254. [Google Scholar] [CrossRef]
  27. Farzaneh, N.; Stein, E.B.; Soroushmehr, R.; Gryak, J.; Najarian, K. A deep learning framework for automated detection and quantitative assessment of liver trauma. BMC Med. Imaging 2022, 22, 39. [Google Scholar] [CrossRef] [PubMed]
  28. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation. In Computer Vision—ECCV 2022 Workshops. ECCV 2022; Karlinsky, L., Michaeli, T., Nishino, K., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2023. [Google Scholar] [CrossRef]
  29. Jamali, A.; Nazemi, A.; Sami, A.; Bahrololoom, R.; Paydar, S.; Shakibafar, A. Decision Support System to triage of liver trauma. arXiv 2024, arXiv:2408.02012. [Google Scholar]
  30. Cheng, C.T.; Lin, H.H.; Hsu, C.P.; Chen, H.W.; Huang, J.F.; Hsieh, C.H.; Liao, C.H. Deep Learning for Automated Detection and Localization of Traumatic Abdominal Solid Organ Injuries on CT Scans. J. Digit. Imaging Inform. Med. 2024, 37, 1113–1123. [Google Scholar] [CrossRef] [PubMed]
  31. Zhu, S.; Cheng, L. MSLUNet: A Medical Image Segmentation Network Incorporating Multi-Scale Semantics and Large Kernel Convolution. Appl. Sci. 2024, 14, 6765. [Google Scholar] [CrossRef]
  32. Shen, X.; Zhou, Y.; Shi, X.; Zhang, S.; Ding, S.; Ni, L.; Dou, X.; Chen, L. The application of deep learning in abdominal trauma diagnosis by CT imaging. World J. Emerg. Surg. 2024, 19, 17. [Google Scholar] [CrossRef]
  33. Multi-Atlas Labeling Beyond the Cranial Vault—Workshop and Challenge. Synapse Liver Lesion Segmentation Challenge. 2015. Available online: https://www.synapse.org/Synapse:syn3193805/wiki/217754 (accessed on 1 February 2025).
  34. Kavur, A.E.; Alper, S.; Oguz, D.; Mustafa, B.; Gezer, S.N. CHAOS—Combined (CT-MR) Healthy Abdominal Organ Segmentation Challenge Data (Version v1.03) [Dataset]; Zenodo: Geneva, Switzerland, 2019. [Google Scholar]
  35. Bilic, P.; Christ, P.F.; Vorontsov, E.; Chlebus, G.; Chen, H.; Dou, Q.; Fu, C.-W.; Han, X.; Heng, P.-A.; Hesser, J.; et al. The Liver Tumor Segmentation Benchmark (LiTS). arXiv 2019, arXiv:1901.04056. [Google Scholar] [CrossRef]
  36. Heimann, T.; Ginneken, B.V.; Styner, M.A. Segmentation of the Liver 2007 (SLIVER07). Available online: http://sliver07.isi.uu.nl/ (accessed on 12 December 2018).
  37. Soler, L.; Hosttettle, A.; Charnoz, A.; Fasquel, J.; Moreau, J. 3D Image Reconstruction for Comparison of Algorithm Database: A Patient Specific Anatomical and Medical Image Database. Available online: https://www.ircad.fr/research/data-sets/liver-segmentation-3d-ircadb-01/ (accessed on 16 April 2018).
  38. Rister, B.; Shivakumar, K.; Nobashi, T.; Rubin, D.L. CT-ORG: A Dataset of CT Volumes with Multiple Organ Segmentations (Version 1) [Dataset]; The Cancer Imaging Archive, University of Arkansas for Medical Sciences: Little Rock, AR, USA, 2019. [Google Scholar] [CrossRef]
  39. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar] [CrossRef]
  40. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems (NeurIPS). arXiv 2017, arXiv:1705.07874. [Google Scholar] [CrossRef]
  41. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar] [CrossRef]
  42. Mazhar, S.; Atif, N.; Bhuyan, M.; Ahamed, S.R. S-Net: A Lightweight Real-Time Semantic Segmentation Network for Autonomous Driving. In Proceedings of the Computer Vision and Image Processing. CVIP 2023. Communications in Computer and Information Science, Jammu, India, 3–5 November 2023; Springer: Cham, Switzerland, 2024; Volume 2010, pp. 735–752. [Google Scholar] [CrossRef]
  43. Milletari, F.; Navab, N.; Ahmadi, S.-A. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In Proceedings of the Fourth International Conference on 3D Vision (3DV), Montreal, QC, Canada, 11–17 October 2016; pp. 565–571. [Google Scholar] [CrossRef]
  44. Ayalew, Y.A.; Fante, K.A.; Mohammed, M. Modified U-Net for liver cancer segmentation from computed tomography images with a new class balancing method. BMC Biomed. Eng. 2021, 3, 4. [Google Scholar] [CrossRef]
Figure 1. Architectural overview for S-Net. The encoder path extracts hierarchical features, while skip connections preserve spatial details and ensure information flow between layers. The bridge incorporates the convolutional block attention module. The decoder path reconstructs segmentation maps using up-convolution layers, progressively combining feature maps from corresponding encoder layers. Attention mechanisms enhance focus on relevant liver regions, ensuring precise segmentation. The output layer generates the final segmentation map, delineating the liver from surrounding tissues [22].
Figure 2. Architectural overview of Dense VNet. The downsampling path extracts multiscale features using dense blocks with batch normalization (BN) and ReLU activations, while the upsampling path reconstructs the segmentation map with feature refinement. Skip connections preserve spatial details by directly linking encoder and decoder features. The output layer generates the final segmented liver mask [43].
Figure 3. Architectural overview for U-Net. The contracting path captures spatial and contextual information through convolutional layers and max-pooling. Skip connections link corresponding layers in the contracting and expanding paths, preserving fine-grained spatial details for better segmentation accuracy. The expanding path reconstructs the segmentation map with up-convolutions. The bottom block represents the network’s bottleneck, extracting high-level features. The output layer generates a binary mask or multi-class segmentation map for the liver region.
Figure 4. Architectural overview for RA-UNet. The RA-UNet architecture enhances traditional U-Net by integrating attention mechanisms for improved segmentation accuracy. The encoder path extracts hierarchical features, while the bottleneck serves as the central processing block for global feature representation. Skip connections preserve spatial and contextual information, enabling effective reconstruction in the decoder path. Attention mechanisms prioritize relevant regions, ensuring precise delineation of target areas. The output layer generates the final segmentation map for applications such as liver segmentation in medical imaging.
Figure 5. Architectural overview for cGAN.
Figure 6. Architectural overview for Swin-Unet.
Table 1. Chronological summary of deep learning studies in medical imaging (2018–2024).
Year | Authors | Method | Purpose and Used Data | Evaluation Metrics
2018 | Kyu Jin Choi et al. [12] | DLS | Staging liver fibrosis; 7461 patients with confirmed liver fibrosis | AUROC: 0.96, 0.97, and 0.95 for diagnosing significant fibrosis, advanced fibrosis, and cirrhosis, respectively; accuracy: 74.9
2018 | Koichiro Yasaka et al. [13] | DCNN | Staging liver fibrosis; 496 CT scans from 286 patients | AUROC: 0.74, 0.76, and 0.73 for diagnosing significant fibrosis, advanced fibrosis, and cirrhosis, respectively
2018 | Zongwei Zhou et al. [14] | UNet++ | Liver segmentation; 331 samples, 512 × 512 | IoU gains of 3.9 and 3.4 points over U-Net and wide U-Net, respectively
2018 | Eli Gibson et al. [15] | Dense V-Networks | Pancreas, stomach, and esophagus segmentation for endoscopic procedures; 90 subjects | Dice scores: pancreas 0.78, stomach 0.90, esophagus 0.76
2019 | Avi Ben-Cohen et al. [16] | FCN & cGAN | Liver lesion; 60 PET/CT scans | 28% false positive reduction
2020 | Huimin Huang et al. [17] | UNet 3+ | Liver/spleen segmentation; 131 samples (103 train, 28 test) | Dice scores: 0.9675 (liver) and 0.9620 (spleen)
2020 | David Dreizin et al. [18] | Multiscale DL | Blunt hepatic injury; 130 samples (113 blunt, 17 penetrating) | Dice score of 0.61
2020 | Yura Ahn et al. [19] | DLA | Segmentation of liver/spleen; 813 patients; evaluated 150/50 pairs of CT | Dice scores: liver 0.973; spleen 0.974
2020 | Qiangguo Jin et al. [20] | RA-UNet | 3D liver/tumor segmentation; LiTS and 3DIRCADb datasets | Dice scores of 0.595 and 0.830 for tumor segmentation
2020 | Setareh Dabiri et al. [21] | Encoder-Decoder Networks | Segmentation at the L3 vertebra level; 1748 CT images | Jaccard scores: 97% for SM and VAT; 98% for SAT; 83% for IMAT
2021 | Shunyao Luan et al. [22] | S-Net | Liver tumor segmentation; LiTS | Dice scores: global 75.5%; per case 61.3%
2021 | Pierre-Henri Conze et al. [23] | Conditional GAN | Liver, kidney, and spleen segmentation from abdominal CT; CHAOS | Dice scores: 97.95%, 89.67%, 90.56%, and 84.70% for liver, spleen, right kidney, and left kidney, respectively
2021 | José Denes Lima Araújo et al. [24] | RetinaNet and UNet | Lesion detection and segmentation; LiTS | Dice score: 82.99%; MCC: 83.62%
2021 | David Dreizin et al. [25] | Multiscale DL & SVM | Quantification of traumatic hemoperitoneum; 130 patients with traumatic hemoperitoneum | Dice score: 0.61
2021 | Shaodi Yang et al. [26] | RCN | Multi-organ registration on 3D abdominal CT images; LiTS, 3DIRCADb, BTCV, SLIVER07 | Dice score of 97.75% for multi-organ segmentation
2022 | Negar Farzaneh et al. [27] | CNN | Quantitative assessment of liver trauma; 77 CT scans (34 with and 43 without liver parenchymal trauma) | Dice scores of 96.31% and 51.21% for liver parenchyma and liver trauma, respectively
2023 | Hu Cao et al. [28] | Swin-Unet | Segmentation of the aorta, gallbladder, spleen, left kidney, right kidney, liver, pancreas, and stomach; 3779 CT images from 30 patients (18 train, 12 test) | Dice score of 79.13%
2023 | Ali Jamali et al. [29] | Pix2Pix GAN | Decision support for liver trauma triage; 20 patients, 2823 images, 1153 liver masks, 467 tumors or cysts | Dice scores of 97% for liver, 93% for lacerations
2024 | Chi-Tung Cheng et al. [30] | 3D CBAMNN | Detection of traumatic abdominal injuries; 1302 scans (87%) for training and validation, 194 scans (13%) for testing | Spleen injury model: accuracy 0.938, specificity 0.952; liver injury model: accuracy 0.820, specificity 0.847; kidney injury model: accuracy 0.959, specificity 0.989
2024 | Zhu and Cheng [31] | MSLUNet | Enhanced medical image segmentation; BUSI dataset (780 images: 133 normal, 437 benign, 210 malignant; 647 tumor images used); Kvasir-SEG dataset (1000 images + 1000 masks) | Dice, recall, precision, and specificity of 91.1, 93.2, 95.6, and 94.9, respectively
2024 | Xinru Shen et al. [32] | 2D Semantic Segmentation | Abdominal injury detection; 855 of 3147 patients had confirmed abdominal trauma | Accuracy of 93.2% for renal injuries
Table 2. Layer summary of U-Net, U-Net++, and U-Net 3+ architectures.
Feature/Layer | U-Net | U-Net++ | U-Net 3+
Convolutional Layers | Encoder and decoder use 3 × 3 convolutions with ReLU activation | Similar to U-Net, with added 3 × 3 convolutions in nested skip connections | Similar to U-Net; used in both encoder and decoder for feature fusion
Activation Function | ReLU | ReLU | ReLU
Pooling | 2 × 2 max-pooling in the encoder for downsampling | 2 × 2 max-pooling in the encoder for downsampling | 2 × 2 max-pooling in the encoder for downsampling
Upsampling | Transposed convolutions for spatial resolution recovery | Transposed convolutions for spatial resolution recovery | Bilinear upsampling or transposed convolutions
Skip Connections | Direct connections between corresponding encoder and decoder levels | Nested skip connections with intermediate convolutional layers | Full-scale skip connections integrating features from all encoder and decoder levels
Fusion Layers | No explicit fusion layer | Intermediate nodes use convolution layers for feature refinement | Fusion of multiscale features using 3 × 3 convolutions
Deep Supervision | Not supported | Not supported | Supported, with intermediate outputs at multiple decoder levels
Output Layer | 1 × 1 convolution for segmentation map generation | 1 × 1 convolution for segmentation map generation | 1 × 1 convolution for segmentation map generation
Computational Complexity | Moderate | Higher, due to nested connections | Highest, due to full-scale skip connections and deep supervision
Table 3. Pros and cons of networks for liver/liver lesion segmentation.
| Method | Pros | Cons |
| --- | --- | --- |
| DCNNs | High accuracy and robustness for liver segmentation. Effective for larger lesions. | Struggles with small or subtle lesions. Requires large datasets and significant computational resources. |
| Fully Convolutional Networks (FCNs) | Simple and computationally efficient for liver segmentation. | Poor performance on small or intricate lesions. Requires post-processing for fine details. |
| Encoder–decoder architectures | Simple, flexible, and efficient. | Struggles with small or irregular lesions and noisy datasets. Needs enhancements for lesions. |
| S-Net | Lightweight and efficient. Ideal for resource-constrained or real-time applications. | Reduced accuracy for complex shapes and small lesions. Sensitive to noise. |
| Dense V-Net | Strong for 3D volumetric analysis. Preserves fine details. Handles complex lesions effectively. | High computational demands. Sensitive to noise. Requires large annotated datasets. |
| U-Net | Simple, efficient, and effective for liver segmentation. | Limited performance on small or complex lesions without enhancements. |
| U-Net++ | Superior accuracy with nested and dense skip connections. Effective for small and complex lesions. | Increased complexity and computational requirements. |
| U-Net 3+ | High accuracy and robustness for liver and liver lesion segmentation. Suitable for clinical applications. | High computational demands. Risk of overfitting on small datasets. |
| RA-UNet | Excellent for small or irregular lesions. Combines residual and attention mechanisms to improve accuracy. | High computational requirements. Requires high-quality datasets. |
| Conditional GANs (cGANs) | Robust to noise and class imbalance. Captures complex shapes and fine details. | Complex training. High resource demands. Requires paired data. |
| Pix2Pix | Handles intricate details and complex structures effectively. | Depends on paired data. Training instability. Resource-intensive. |
| Swin-Unet | Excellent for small and irregular lesions. Preserves liver boundaries with global context understanding. | High memory and computational requirements. Long training times. |
| MSLUNet | Balanced efficiency and accuracy; good for small or irregular lesions. Suitable for real-time applications. | Less accurate than more complex architectures. Limited performance on noisy datasets. |
Table 4. Summary of integrated methods for simultaneous liver segmentation and lesion detection.
| Method | Approach | Strengths |
| --- | --- | --- |
| RetinaNet + U-Net | Detection + segmentation | Two-stage approach for high accuracy |
| cGANs (conditional GANs) | Joint segmentation and detection | Robust to noise and domain shifts |
| Multiscale DL + SVM | Multi-task learning | Quantifies injury severity |
| RA-UNet | Attention-based segmentation and detection | High accuracy, fewer false positives |
| Swin-Unet | Transformer-based model | Captures global and local details |
Tavakolian, N.; Nazemi, A.; Suen, C.Y. Review of the AI-Based Analysis of Abdominal Organs from Routine CT Scans. Appl. Sci. 2025, 15, 2516. https://doi.org/10.3390/app15052516