Proceeding Paper

Modification of U-Net with Pre-Trained ResNet-50 and Atrous Block for Polyp Segmentation: Model TASPP-UNet †

Assel Mukasheva, Dina Koishiyeva, Gani Sergazin, Madina Sydybayeva, Dinargul Mukhammejanova and Syrym Seidazimov
1 School of Information Technology and Engineering, Kazakh-British Technical University, Almaty 050000, Kazakhstan
2 Department of Information Technology, Almaty University of Power Engineering and Telecommunications, Almaty 050013, Kazakhstan
3 Academy of Logistics and Transport, Almaty 050012, Kazakhstan
4 Faculty of Computer Technologies and Cyber Security, International University of Information Technology, Almaty 050013, Kazakhstan
5 Department of Artificial Intelligence and Big Data, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan
* Author to whom correspondence should be addressed.
Presented at the International Conference on Electronics, Engineering Physics and Earth Science (EEPES’24), Kavala, Greece, 19–21 June 2024.
Eng. Proc. 2024, 70(1), 16; https://doi.org/10.3390/engproc2024070016
Published: 31 July 2024

Abstract

Colorectal cancer is the third most prevalent type of cancer globally, and it typically progresses unnoticed, making early detection via effective screening methods crucial. This study presents TASPP-UNet, an advanced deep learning model that integrates Atrous Spatial Pyramid Pooling (ASPP) blocks and a ResNet-50 encoder to improve the accuracy of polyp boundary delineation in colonoscopy images. We used the Kvasir-SEG and CVC Clinic-DB datasets, augmented to 2000 images, to enrich the variability of the training examples. The TASPP-UNet achieved a superior IOU of 0.9276, compared to 0.9128 for ResNet50-UNet and 0.8607 for the standard U-Net, demonstrating its efficacy in precise segmentation tasks. Notably, the model exhibited impressive computational efficiency, with a processing speed of 151.1 frames per second (FPS), underscoring its potential for real-time clinical applications aimed at early and accurate colorectal cancer detection. This performance highlights the model's capability not only to improve diagnostic accuracy but also to enhance clinical workflows, potentially leading to better patient outcomes.

1. Introduction

Colorectal cancer (CRC) ranks as the third most prevalent type of cancer and claims the lives of millions of people worldwide, according to GLOBOCAN [1]. Because the stomach is a pouch-shaped organ, luminal tumor growth there typically does not produce symptoms or obstructive complications in its initial stages [2]. Recently, major efforts have focused on the early detection of colorectal polyps because they may represent a precancerous condition [3]. Adenomatous colorectal polyps are considered dangerous because they are more likely to develop into cancer; a small adenoma can slowly progress for 10 years before becoming invasive cancer through the pathway of chromosomal instability [4,5]. Currently, colonoscopy is widely used to examine the large intestine. Wireless capsule endoscopy (WCE) offers an alternative approach in which the patient swallows a small capsule that collects video images of the colon [6]. However, earlier research indicates that 26% of colorectal polyps go undetected during colonoscopy, and when WCE is administered, the endoscopist needs a long time to analyze the recordings, which also increases the likelihood of polyps being missed [7]. In response to the problems of missed polyps during colonoscopy and the complexity of analyzing WCE data, systems based on artificial intelligence are being developed to automatically segment colon polyps. One study compared a computer-aided detection system with traditional colonoscopy and found that the computer-assisted group revealed more polyps and adenomas than the group examined with the traditional white-light method [8]. Along the same lines, another study found that using a computer-aided system in colonoscopy reduced the polyp miss rate to 13.89%, compared to 40.00% with traditional colonoscopy [9].
The application of neural network algorithms facilitates the automation of medical image analysis. However, the segmentation and analysis of colon polyps using neural networks faces several challenges, namely the blurred boundaries of polyps, their similarity in appearance and size to the surrounding tissue, and a lack of high-quality annotated data, which can lead to problems with model generalization [10]. In computer vision for medical image segmentation, convolutional neural networks have received particular attention, above all the U-Net architecture [11,12], with its symmetric encoder and decoder and skip connections for better feature transfer between network levels. However, this architecture has several disadvantages: the mixing of features of different abstraction levels, low generalizability, and limited context perception. Many additions and modifications have been developed to improve the basic U-Net model for the polyp segmentation task, but outstanding challenges remain [13].
In this paper, a TASPP-UNet model is proposed that is based on the U-Net architecture, improved by integrating a pre-trained encoder [14] and adding an Atrous Spatial Pyramid Pooling (ASPP) block [15] for better context analysis. An experimental comparison with the baseline U-Net and a version with only the pre-trained encoder was performed. Improving neural network architectures and applying transfer learning aim to overcome the existing problems in polyp segmentation and thereby to assist in the diagnosis of CRC.

2. Materials and Methods

The following section describes the basic encoder and decoder blocks of the TASPP-UNet method and compares its parameters with those of ResNet50-UNet and the classical U-Net.

2.1. Pre-Trained Encoder

The TASPP-UNet model uses the deep convolutional network ResNet-50 [16] as an encoder for extracting advanced contextual features. The ResNet-50 model, pre-trained on the multi-million-image ImageNet dataset [17], consists of fifty trainable layers organized into residual blocks designed to avoid vanishing gradients, the problem in which the error gradient propagating backward from the output to the input layers becomes increasingly small, making it difficult to update the weights of the initial layers during training [18]. The residual block computes Formula (1):

$$y = F(x) + x, \tag{1}$$

where $x$ is the input, $F(x)$ is the output of the stacked layers, and $y$ is the block output. Figure 1 depicts the residual block.
In the context of colon polyp segmentation, employing ResNet-50 as a feature extractor should enhance the model’s accuracy.
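To make the idea concrete, here is a minimal Keras sketch of a residual block implementing Formula (1); the 3 × 3 convolution stack and the assumption that the input already has `filters` channels are simplifications for illustration, not the exact bottleneck design used inside ResNet-50.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Minimal residual block: output = ReLU(F(x) + x)."""
    shortcut = x  # identity path; assumes x already has `filters` channels
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])  # F(x) + x: gradients flow through the sum
    return layers.ReLU()(y)
```

Because the identity path bypasses the convolutions, the gradient reaching earlier layers never has to pass through the full stack, which is what mitigates the vanishing gradient problem described above.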

2.2. ASPP Module

The ASPP module is placed at the output of the pre-trained ResNet-50 encoder, where it processes the deepest and most abstract features before they are reconstructed in the decoder [19]. The ASPP block uses atrous (dilated) convolution to process images at different scales, increasing the receptive field of the filters without increasing the number of parameters or the amount of computation. The ASPP module is shown in Figure 2.
Figure 2 illustrates a schematic of the ASPP block, showing parallel convolutional pathways with different dilation rates as well as a global pooling layer. These paths are combined using a 1 × 1 convolution to create an enriched set of output features.
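A minimal Keras sketch of such a block is given below. The dilation rates (6, 12, 18), the filter count, and the assumption of a fixed spatial input size follow the common DeepLab convention and are assumptions, since the paper does not state the exact values.

```python
import tensorflow as tf
from tensorflow.keras import layers

def aspp_block(x, filters=256, rates=(6, 12, 18)):
    """Parallel atrous convolutions plus image-level pooling, fused by 1x1 conv."""
    branches = [layers.Conv2D(filters, 1, padding="same", activation="relu")(x)]
    for r in rates:
        # dilated 3x3 convolutions enlarge the receptive field at no extra parameter cost
        branches.append(layers.Conv2D(filters, 3, padding="same",
                                      dilation_rate=r, activation="relu")(x))
    # global context branch: pool to 1x1, project, then upsample back
    pooled = layers.GlobalAveragePooling2D(keepdims=True)(x)
    pooled = layers.Conv2D(filters, 1, activation="relu")(pooled)
    pooled = layers.UpSampling2D(size=(x.shape[1], x.shape[2]),
                                 interpolation="bilinear")(pooled)  # needs static H, W
    branches.append(pooled)
    y = layers.Concatenate()(branches)
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(y)
```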

2.3. Model Decoder Path

The decoder in the TASPP-UNet model sequentially increases the resolution of the feature map using an up-sampling process to recover the spatial details lost in the encoding stage. Subsequent convolutional layers further refine the features, while batch normalization ensures the stability of the learning process. Regularization mechanisms such as Dropout are applied to reduce overfitting [20]. Transposed convolution layers further expand the feature map, refining the segmentation. A final 1 × 1 convolution with sigmoid activation transforms the feature map into the probability of each pixel belonging to the object of interest, thus creating a segmentation map. Figure 3 shows the overall architecture of the TASPP-UNet model.
Figure 3 illustrates the overall structure of the TASPP-UNet model, which includes an encoder with a pre-trained ResNet-50, an ASPP block for multilevel context analysis, and a decoder with up-sampling and skip connections to recover a detailed segmentation map [21]. The model outputs binary polyp segmentation and segmentation overlay on the original image.
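The following sketch shows one decoder stage assembled from the operations described above; the kernel sizes, dropout rate, and exact layer ordering are illustrative assumptions rather than the published configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def decoder_block(x, skip, filters, dropout_rate=0.1):
    """One up-sampling stage: transpose conv, skip fusion, conv + BN + dropout."""
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
    x = layers.Concatenate()([x, skip])  # skip connection restores spatial detail
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)   # stabilizes training
    x = layers.ReLU()(x)
    return layers.Dropout(dropout_rate)(x)  # regularization against overfitting

# The final head maps features to per-pixel polyp probabilities:
# output = layers.Conv2D(1, 1, activation="sigmoid")(x)
```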

2.4. Comparison of Segmentation Models

In our study, we conducted a comparative analysis of the TASPP-UNet model against two other architectures: an advanced U-Net with a ResNet-50 encoder and the basic U-Net. Table 1 provides a quantitative comparison of these architectures.
The TASPP-UNet model, integrating ASPP and ResNet-50, has the highest complexity, with more than 19.5 million parameters and a model size of about 74.7 MB. The ResNet50-UNet method, which uses the powerful ResNet-50 encoder but omits the ASPP module, occupies an intermediate position with approximately 14.9 million parameters and a model size of 63.42 MB. The basic U-Net is the lightest, with about 7.5 million parameters and a size of 28.73 MB.

2.5. Evaluation Metrics

Several key indicators, each with a specific formula, were used to assess colon polyp segmentation performance in the research experiment.
Binary Cross-Entropy Loss: This function measures the difference between the actual label and the predicted probability; it is referred to below as Loss. It is calculated using Formula (2):

$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \right], \tag{2}$$

where $N$ is the number of samples, $y_i$ is the actual label, and $\hat{y}_i$ is the predicted probability [22].
Dice Score: It measures the overlap between the prediction and ground truth (Formula (3)):

$$\mathrm{Dice\;score} = \frac{2\,|X \cap Y|}{|X| + |Y|}, \tag{3}$$

where $X$ is the set of predicted positives and $Y$ is the set of actual positives [23].
Accuracy: This reflects the proportion of true results out of the total number of cases examined (Formula (4)):

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}, \tag{4}$$

where TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative, respectively [24].
Mean Intersection Over Union (Mean IOU): This is a common evaluation metric for segmentation tasks that measures the mean overlap between the predicted segmentation and the reference standard for each class. It is given by Formula (5):

$$\mathrm{Mean\;IOU} = \frac{1}{N}\sum_{i=1}^{N} \frac{|X_i \cap Y_i|}{|X_i \cup Y_i|}, \tag{5}$$

where $X_i$ and $Y_i$ are the prediction and the reference standard for class $i$, and $N$ is the number of classes [25].
Each metric provides insight into a different aspect of model performance: Loss gives a direct probabilistic measure of prediction error, the Dice score and Mean IOU assess the spatial accuracy of the segmentation, and Accuracy provides a quick overall measure of correctness.
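For reference, a small NumPy sketch of these metrics for a pair of binary masks follows; the 0.5 binarization threshold and the smoothing constant are added assumptions for numerical stability.

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy, Formula (2)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def dice_score(y_true, y_pred, thr=0.5, smooth=1e-7):
    """Dice score, Formula (3), after thresholding the predicted probabilities."""
    p = (y_pred > thr).astype(float)
    inter = (p * y_true).sum()
    return (2 * inter + smooth) / (p.sum() + y_true.sum() + smooth)

def iou(y_true, y_pred, thr=0.5, smooth=1e-7):
    """Intersection over union for one class, the summand of Formula (5)."""
    p = (y_pred > thr).astype(float)
    inter = (p * y_true).sum()
    union = p.sum() + y_true.sum() - inter
    return (inter + smooth) / (union + smooth)
```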

2.6. Data Preprocessing

In this study, two public databases were used to address the colon polyp segmentation problem: Kvasir-SEG [26], comprising 1000 annotated images, and CVC Clinic-DB [27], containing 612 annotated images. These sets were combined for data diversity. During data preparation, a data augmentation method was applied to expand the original image set to 2000 image instances. The augmentation process used operations such as random variation of brightness and contrast, rotation by a limited angle, and shifting and scaling while preserving the aspect ratio [28]. These operations were chosen to simulate possible variations in the imaging conditions, thus enriching the training dataset. After the target amount of data was reached, the pixel values of the images and masks were normalized to the range between 0 and 1 to improve the convergence of model training.
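A sketch of such an augmentation pipeline using the Albumentations library is shown below; the probabilities and limits are assumptions, since the paper does not report the exact values.

```python
import albumentations as A

# operations mirror those described above: brightness/contrast jitter,
# limited-angle rotation, and shift/scale that preserves the aspect ratio
augment = A.Compose([
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
    A.Rotate(limit=20, p=0.5),
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=0, p=0.5),
])

# applying the same transform to image and mask keeps them aligned;
# dividing by 255 performs the [0, 1] normalization mentioned above
# out = augment(image=image, mask=mask)
# image, mask = out["image"] / 255.0, out["mask"] / 255.0
```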

2.7. Implementation Details

The neural network models for this study were developed on an NVIDIA A100 PCIe 80 GB GPU (NVIDIA Corporation, Santa Clara, CA, USA). This accelerator is built on a 7 nm process, features a GA100 GPU, and contains 6912 shading units, 432 tensor cores, and 80 GB of HBM2e memory on a 5120-bit interface. The software used was Python 3.10 and TensorFlow 2.15.0, providing the necessary compatibility and performance for medical data processing.

3. Results and Discussion

In this experiment on colon polyp segmentation, the datasets used were Kvasir-SEG, which includes 1000 annotated images, and CVC Clinic-DB, which comprises 612 images, for a total of 1612 images; these were augmented to 2000 images to enhance dataset variability. Model training used the Adam optimizer [29] with a starting learning rate of 0.0001 and early stopping if no improvement on the validation sample was observed over 25 epochs, with training capped at 140 epochs. The segmentation results for all metrics were rounded to four decimal places. The performance results for the main metrics on the training data are given in Table 2, and the corresponding learning curves are plotted in Figure 4.
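A sketch of this training configuration in Keras follows; the monitored quantity (validation loss) and the choice to restore the best weights are assumptions, while the learning rate, patience, and epoch cap come from the text above.

```python
import tensorflow as tf

def train(model, train_ds, val_ds):
    """Train a compiled segmentation model with the reported settings."""
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=25, restore_best_weights=True)
    return model.fit(train_ds, validation_data=val_ds,
                     epochs=140, callbacks=[early_stop])
```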
The TASPP-UNet demonstrated the highest Mean IOU at 92.76%, reflecting its superior capability in precise boundary delineation, despite a Dice score of 0.9147 and an Accuracy score of 76.55%. The ResNet-50 U-Net, while achieving a marginally better Dice score of 0.9176, lagged slightly behind in Accuracy and Mean IOU at 74.17% and 91.28%, respectively. The standard U-Net showed the lowest performance across these metrics, with a Dice score of 0.8731, an Accuracy score of 72.56%, and a Mean IOU of 86.07%, underscoring the enhancements provided by ASPP and the ResNet-50 encoder in the advanced models.
The outcome of the effectiveness of the main metrics on the validation data is presented in Table 3.
The validation data show that TASPP-UNet attained the top scores in all metrics, with a Mean IOU of 0.7141, indicating a strong capacity for accurate segmentation. It also led in Dice score and Accuracy, reinforcing its robustness. The ResNet50-UNet, despite a slightly higher Loss, maintained competitive scores, notably a Dice score of 0.7430. The standard U-Net, trailing with a Mean IOU of 0.5076, a Dice score of 0.6386, and an Accuracy score of 72.56%, demonstrates the potential for improvement in segmentation tasks, particularly in terms of precision and generalizability. Figure 5 presents the model learning curves for the main metrics on the validation data.
The outcomes of the methods on the test set of 200 images are summarized in Table 4.
On the test dataset, the TASPP-UNet method performed best, with a Loss of 0.0963, a Dice score of 0.8967, an Accuracy score of 0.5624, and a Mean IOU of 0.8789. Resnet50-UNet, although it showed a higher Loss, achieved a moderate Dice score and Mean IOU. The standard U-Net showed intermediate results, with a Loss of 0.1461, a Dice score of 0.8234, an Accuracy score of 0.5529, and a Mean IOU of 0.7667, indicating the potential for improving segmentation accuracy on unfamiliar data.
Table 5 compares the training time and speed of different methods in the polyp segmentation task.
Referring to the results in Table 5, the TASPP-UNet exhibits the shortest training time and the fastest training speed (FPS), which makes it preferable when computing resources are limited; Resnet50-UNet and the standard U-Net require longer training times. These measurements are important for understanding the trade-offs between accuracy and training efficiency. Figure 6 and Figure 7 show the model predictions of colon polyp segments on the original images and the segments overlaid on the images.
Figure 6 shows that the TASPP-UNet predictions align closely with the expert annotations, indicating its precision. In contrast, the Resnet50-UNet and the standard U-Net show some inaccuracies, marked by blue boxes where polyps were either missed or incorrectly identified, illustrating the challenges these models face in polyp boundary delineation.
The polyp segmentation predictions presented in Figure 7 illustrate that the TASPP-UNet model produces segmentations that agree well with the ground truth provided by the experts, which emphasizes its performance.

4. Conclusions

The TASPP-UNet model proposed in this study and its experimental comparison with the classical U-Net and Resnet-50 U-Net demonstrate its performance in the task of colon polyp segmentation. Achieving a Mean IOU of 0.9276, TASPP-UNet demonstrates robust boundary detection, exceeding the Resnet-50 U-Net (0.9128) and the standard U-Net (0.8607). This performance in boundary detection can be attributed to the ASPP blocks, which enhance contextual information. In terms of operational efficiency, TASPP-UNet maintained a frame rate of 151.1 FPS during training, suggesting its potential for rapid deployment in clinical settings. This high frame rate emphasizes the model's ability to process large datasets quickly, facilitating its application in real-time medical imaging scenarios. The balance between high segmentation accuracy and computational efficiency underscores the promise of the TASPP-UNet model as a practical tool to assist endoscopists in the accurate and timely identification of colorectal polyps.
In addition, extending the applicability of the model to other types of medical images and conditions could greatly expand its utility. The inclusion of large and diverse datasets will be critical to improving the generalizability and robustness of the model. Efforts should also be directed towards optimizing the model for resource-constrained environments to ensure that it can be used in a variety of clinical settings, including those with limited computing infrastructure.
Thus, the TASPP-UNet model represents a step forward in medical image segmentation, with promising implications for improving the efficiency and accuracy of colorectal polyp detection and segmentation in clinical practice.
Future research should focus on improving these computational paradigms, expanding the applicability of the model, and incorporating large datasets to assist endoscopists in segmenting and recognizing colon polyps.

Author Contributions

Conceptualization, A.M. and D.K.; methodology, A.M., G.S., D.K. and M.S.; software, D.K., A.M. and S.S.; validation, A.M., D.K. and M.S.; formal analysis, A.M. and D.K.; investigation, D.M., A.M. and D.K.; resources, M.S. and G.S.; data curation, D.K., A.M. and D.M.; writing—original draft preparation, D.K. and A.M.; writing—review and editing, D.K. and A.M.; visualization, D.K. and A.M.; supervision, A.M. and S.S.; project administration, A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Morgan, E.; Arnold, M.; Gini, A.; Lorenzoni, V.; Cabasag, C.J.; Laversanne, M.; Vignat, J.; Ferlay, J.; Murphy, N.; Bray, F. Global burden of colorectal cancer in 2020 and 2040: Incidence and mortality estimates from GLOBOCAN. Gut 2023, 72, 338–344.
2. Waldum, H.; Fossmark, R. Gastritis, Gastric Polyps and Gastric Cancer. Int. J. Mol. Sci. 2021, 22, 6548.
3. Maida, M.; Macaluso, F.S.; Ianiro, G.; Mangiola, F.; Sinagra, E.; Hold, G.; Maida, C.; Cammarota, G.; Gasbarrini, A.; Scarpulla, G. Screening of colorectal cancer: Present and future. Expert Rev. Anticancer Ther. 2017, 17, 1131–1146.
4. Mathews, A.A.; Draganov, P.V.; Yang, D. Endoscopic management of colorectal polyps: From benign to malignant polyps. World J. Gastrointest. Endosc. 2021, 13, 356–370.
5. Yang, D.-H. Understanding colorectal polyps to prevent colorectal cancer. J. Korean Med. Assoc. 2023, 66, 626–631.
6. Jia, X.; Xing, X.; Yuan, Y.; Xing, L.; Meng, M.Q.-H. Wireless Capsule Endoscopy: A New Tool for Cancer Screening in the Colon With Deep-Learning-Based Polyp Recognition. Proc. IEEE 2020, 108, 178–197.
7. Kim, N.H.; Jung, Y.S.; Jeong, W.S.; Yang, H.J.; Park, S.K.; Choi, K.; Park, D.I. Miss rate of colorectal neoplastic polyps and risk factors for missed polyps in consecutive colonoscopies. Intest. Res. 2017, 15, 411–418.
8. Glissen Brown, J.R.; Mansour, N.M.; Wang, P.; Chuchuca, M.A.; Minchenberg, S.B.; Chandnani, M.; Liu, L.; Gross, S.A.; Sengupta, N.; Berzin, T.M. Deep Learning Computer-aided Polyp Detection Reduces Adenoma Miss Rate: A United States Multi-center Randomized Tandem Colonoscopy Study (CADeT-CS Trial). Clin. Gastroenterol. Hepatol. 2022, 20, 1499–1507.e4.
9. Wang, P.; Liu, P.; Glissen Brown, J.R.; Berzin, T.M.; Zhou, G.; Lei, S.; Liu, X.; Li, L.; Xiao, X. Lower Adenoma Miss Rate of Computer-Aided Detection-Assisted Colonoscopy vs. Routine White-Light Colonoscopy in a Prospective Tandem Study. Gastroenterology 2020, 159, 1252–1261.e5.
10. Ali, S.; Ghatwary, N.; Jha, D.; Isik-Polat, E.; Polat, G.; Yang, C.; Li, W.; Galdran, A.; Ballester, M.Á.G.; Thambawita, V.; et al. Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge. Sci. Rep. 2024, 14, 2032.
11. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention; Navab, N., Hornegger, J., Wells, W., Frangi, A., Eds.; Springer: Cham, Switzerland, 2015; Volume 9351.
12. Zunair, H.; Ben Hamza, A. Sharp U-Net: Depthwise convolutional network for biomedical image segmentation. Comput. Biol. Med. 2021, 136, 104699.
13. Sanchez-Peralta, L.F.; Bote-Curiel, L.; Picon, A.; Sanchez-Margallo, F.M.; Pagador, J.B. Deep learning to find colorectal polyps in colonoscopy: A systematic literature review. Artif. Intell. Med. 2020, 108, 101923.
14. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
15. Kalinin, A.A.; Iglovikov, V.I.; Rakhlin, A.; Shvets, A.A. Medical Image Segmentation Using Deep Neural Networks with Pre-trained Encoders. In Deep Learning Applications: Advances in Intelligent Systems and Computing; Wani, M., Kantardzic, M., Sayed-Mouchaweh, M., Eds.; Springer: Singapore, 2020; Volume 1098.
16. Zamanoglu, E.S.; Erbay, S.; Cengil, E.; Kosunalp, S.; Tumen, V.; Demir, K. Land Cover Segmentation using DeepLabV3 and ResNet50. In Proceedings of the 2023 4th International Conference on Communications, Information, Electronic and Energy Systems (CIEES), Plovdiv, Bulgaria, 23–25 November 2023; pp. 1–6.
17. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
18. Bengio, Y. Practical Recommendations for Gradient-Based Training of Deep Architectures. In Neural Networks: Tricks of the Trade; Lecture Notes in Computer Science; Montavon, G., Orr, G.B., Müller, K.R., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7700.
19. Jie, F.; Nie, Q.; Li, M.; Yin, M.; Jin, T. Atrous spatial pyramid convolution for object detection with encoder-decoder. Neurocomputing 2021, 464, 107–118.
20. Khan, S.H.; Hayat, M.; Porikli, F. Regularization of deep neural networks with spectral dropout. Neural Netw. 2019, 110, 82–90.
21. Yu, M.; Zhang, W.; Chen, X.; Liu, Y.; Niu, J. An End-to-End Atrous Spatial Pyramid Pooling and Skip-Connections Generative Adversarial Segmentation Network for Building Extraction from High-Resolution Aerial Images. Appl. Sci. 2022, 12, 5151.
22. Jadon, S. A survey of loss functions for semantic segmentation. In Proceedings of the 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Via del Mar, Chile, 27–29 October 2020; pp. 1–7.
23. Yeung, M.; Rundo, L.; Nan, Y.; Sala, E.; Schönlieb, C.B.; Yang, G. Calibrating the Dice Loss to Handle Neural Network Overconfidence for Biomedical Image Segmentation. J. Digit. Imaging 2023, 36, 739–752.
24. Popovic, A.; De la Fuente, M.; Engelhardt, M.; Radermacher, K. Statistical validation metric for accuracy assessment in medical image segmentation. Int. J. Comput. Assist. Radiol. Surg. 2007, 2, 169–181.
25. Berman, M.; Triki, A.R.; Blaschko, M.B. The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4413–4421.
26. Jha, D.; Smedsrud, P.H.; Riegler, M.A.; Halvorsen, P.; de Lange, T.; Johansen, D. Kvasir-SEG: A Segmented Polyp Dataset. In MultiMedia Modeling (MMM 2020); Lecture Notes in Computer Science; Ro, Y., Cheng, W.-H., Kim, J., Chu, W.-T., Cui, P., Choi, J.-W., Hu, M.-C., De Neve, W., Eds.; Springer: Cham, Switzerland, 2020; Volume 11962.
27. Bernal, J.; Sánchez, F.J.; Fernández-Esparrach, G.; Gil, D.; Rodríguez, C.; Vilariño, F. WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians. Comput. Med. Imaging Graph. 2015, 43, 99–111.
28. Mukasheva, A.; Koishiyeva, D.; Suimenbayeva, Z.; Rakhmetulayeva, S.; Bolshibayeva, A.; Sadikova, G. Comparison evaluation of unet-based models with noise augmentation for breast cancer segmentation on ultrasound image. East.-Eur. J. Enterp. Technol. 2023, 5, 85–97.
29. Bock, S.; Weiß, M. A Proof of Local Convergence for the Adam Optimizer. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8.
Figure 1. Residual block.
Figure 2. ASPP block.
Figure 3. TASPP-UNet model architecture.
Figure 4. Comparative performance metrics on the training set for the colon polyp segmentation methods: (a) Loss, (b) Dice score, (c) Accuracy score, and (d) Mean IOU by epoch.
Figure 5. Comparative analysis on the validation data for the colon polyp segmentation methods: (a) Loss, (b) Dice score, (c) Accuracy score, and (d) Mean IOU by epoch.
Figure 6. Comparison of polyp segmentation methods (predicted results are overlaid on the images, and areas with segmentation errors are highlighted in blue).
Figure 7. Three samples from the test set comparing the methods' polyp segmentation predictions (color segmentation is shown on a dark blue background).
Table 1. Comparison of characteristics and parameters of segmentation methods.

| Method        | Features           | Parameters | Size (MB) |
|---------------|--------------------|------------|-----------|
| TASPP-UNet    | ASPP and ResNet-50 | 19,582,337 | 74.70     |
| Resnet50-UNet | ResNet-50          | 14,877,313 | 63.42     |
| U-Net         | Skip connections   | 7,531,521  | 28.73     |
Table 2. Evaluation of segmentation methods on the training dataset.

| Method        | Loss   | Dice Score | Accuracy Score | Mean IOU |
|---------------|--------|------------|----------------|----------|
| TASPP-UNet    | 0.2243 | 0.9147     | 0.7655         | 0.9276   |
| Resnet50-UNet | 0.0983 | 0.9176     | 0.7417         | 0.9128   |
| U-Net         | 0.0608 | 0.8731     | 0.7256         | 0.8607   |
Table 3. Evaluation of segmentation methods on the validation dataset.

| Method        | Loss   | Dice Score | Accuracy Score | Mean IOU |
|---------------|--------|------------|----------------|----------|
| TASPP-UNet    | 0.2628 | 0.7798     | 0.7555         | 0.7141   |
| Resnet50-UNet | 0.2800 | 0.7430     | 0.7417         | 0.6311   |
| U-Net         | 0.2877 | 0.6386     | 0.7256         | 0.5076   |
Table 4. Evaluation of segmentation methods on the test dataset.

| Method        | Loss   | Dice Score | Accuracy Score | Mean IOU |
|---------------|--------|------------|----------------|----------|
| TASPP-UNet    | 0.0963 | 0.8967     | 0.5624         | 0.8789   |
| Resnet50-UNet | 0.2248 | 0.7587     | 0.5406         | 0.6714   |
| U-Net         | 0.1461 | 0.8234     | 0.5529         | 0.7667   |
Table 5. Comparison of training time and speed for segmentation methods.

| Method        | Training Time (s) | Epoch Duration (s) | Training Speed (FPS) |
|---------------|-------------------|--------------------|----------------------|
| TASPP-UNet    | 1186              | 8.5                | 151.1                |
| Resnet50-UNet | 2123              | 15.2               | 84.4                 |
| U-Net         | 2269              | 16.2               | 79.0                 |