Unsupervised Image Segmentation on 2D Echocardiogram
Abstract
1. Introduction
2. Related Work
2.1. Deep Learning for Image Segmentation in Medical Image Analysis
2.2. Image Segmentation in Cardiology
2.3. Deep Learning in Echocardiography
2.4. Unsupervised Medical Image Segmentation
2.5. Challenges and Limitations
3. Methods
3.1. Architecture
- Contracting Path (Down-sampling): This path reduces the spatial dimensions of the input while capturing high-level features. Each stage applies two 3 × 3 convolutions followed by ReLU activation, then down-samples with 2 × 2 max pooling. This reduces resolution but increases the number of channels, capturing more complex features.
- Bottleneck (Middle Layer): The bottleneck captures the most abstract features, using two 3 × 3 convolutions with ReLU activation. It processes the reduced feature map before the expanding path begins, preserving the essential information needed for reconstruction.
- Expanding Path (Up-sampling): The expanding path up-samples the feature maps with transposed convolutions, restoring spatial resolution. Skip connections from the contracting path help retain fine-grained details, ensuring the final output preserves spatial information.
- Final Layer: A 1 × 1 convolution reduces the feature maps to the required number of output channels, generating a segmentation mask with the same size as the input image. Softmax activation provides class probabilities for each pixel.
- Reconstruction Layer: The reconstruction layer outputs a 3-channel image using a 1 × 1 convolution, regularizing training by forcing the network to preserve sufficient spatial information. A minimal sketch of this two-head architecture follows this list.
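The sketch below is a minimal PyTorch rendering of the architecture described above, showing the two-head design (segmentation plus reconstruction). The depth of two encoder stages, the channel widths, and the class count `n_classes` are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # Two 3x3 convolutions, each followed by ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class UNetWithReconstruction(nn.Module):
    def __init__(self, in_ch=3, n_classes=4, base=64):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)                       # 2x2 max pooling
        self.bottleneck = conv_block(base * 2, base * 4)  # most abstract features
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)        # concat doubles channels
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.seg_head = nn.Conv2d(base, n_classes, 1)     # 1x1 conv -> class scores
        self.recon_head = nn.Conv2d(base, 3, 1)           # 1x1 conv -> 3-channel image

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        seg = torch.softmax(self.seg_head(d1), dim=1)  # per-pixel class probabilities
        recon = self.recon_head(d1)                    # regularizing reconstruction
        return seg, recon
```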
3.2. Loss Functions
- Reconstruction Loss: Ensures that the network accurately reproduces the input image by minimizing the Mean Squared Error (MSE) between the input and the reconstructed output.
- Contour Regularization Loss: This loss penalizes sharp variations along the boundaries in the segmentation mask, encouraging smoother predictions. It computes the difference between the maximum and minimum values of neighboring pixels within a given radius to enforce smoothness [22]. In our approach, we used a radius of 3 pixels.
- Similarity Loss: To measure the similarity within the predicted segmentation mask, we use a cross-entropy-based similarity loss. This component enhances the similarity of features within the same cluster and differentiates features across clusters [6]; its form and a sketch of all three losses are given after this list.
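Following Kim et al. [6], a standard form of the similarity loss is the cross-entropy between each pixel's response vector and its own argmax cluster assignment:

$$\mathcal{L}_{\mathrm{sim}} = \sum_{n=1}^{N} \mathrm{CE}\left(\mathbf{r}'_n,\ c_n\right), \qquad c_n = \operatorname*{arg\,max}_{i}\ r'_{n,i},$$

where $N$ is the number of pixels, $\mathbf{r}'_n$ is the normalized per-pixel response (class-probability) vector, and $c_n$ is its argmax pseudo-label. The following PyTorch sketch implements the three losses under the assumption that the network outputs per-pixel class logits and a 3-channel reconstruction; the relative weighting of the terms (selected by grid search, Appendix A.3) is omitted.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(recon, image):
    # MSE between the 3-channel reconstruction and the input image.
    return F.mse_loss(recon, image)

def contour_regularization_loss(probs, radius=3):
    # Difference between local max and local min within a (2r+1)x(2r+1)
    # window; large differences mean sharp boundary variation [22].
    k = 2 * radius + 1
    local_max = F.max_pool2d(probs, k, stride=1, padding=radius)
    local_min = -F.max_pool2d(-probs, k, stride=1, padding=radius)
    return (local_max - local_min).mean()

def similarity_loss(logits):
    # Cross-entropy against the argmax pseudo-labels, sharpening
    # within-cluster similarity (Kim et al. [6]).
    pseudo = logits.argmax(dim=1)  # (B, H, W)
    return F.cross_entropy(logits, pseudo)
```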
3.3. Training Process
Pre-Processing the Network Inputs
- Frame Selection: A subset of frames is extracted from each video sequence, chosen so that it provides enough information for segmentation (frames from different phases of the cardiac cycle, including end-diastole and end-systole). Since consecutive cardiac cycles within the same video differ only minimally, increasing the number of frames significantly lengthens processing time without notable performance improvement. Selecting enough frames to cover at least one complete cardiac cycle per video is therefore sufficient for the training phase.
- Pre-processing for Training: The selected frames are first resized to a resolution of 224 × 224 to ensure uniform input size for the model. A series of transformations is applied to augment the data, including the random application of Gaussian blur with varying kernel sizes (3, 5, and 7) and sigma values (ranging from 0.1 to 2.0), as well as random rotations of up to 5 degrees. These transformations are applied with a probability of 0.5, adding variability to the training data to make the model more robust to noise and orientation changes. Finally, the frames are normalized using the mean and standard deviation values specific to the dataset, ensuring that pixel intensities are standardized for improved model performance.
- Pre-processing for Validation and Testing: For validation and testing, only resizing to 224 × 224 and tensor conversion are performed, without additional augmentation, ensuring consistency in evaluation. A sketch of both transform pipelines follows this list.
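A torchvision sketch of the two pipelines described above; the normalization statistics `MEAN` and `STD` are placeholders for the dataset-specific values.

```python
from torchvision import transforms as T

# Placeholder dataset statistics (assumption); replace with the real values.
MEAN, STD = [0.1289] * 3, [0.1902] * 3

# Random kernel size from {3, 5, 7}; sigma sampled from [0.1, 2.0] per call.
blur = T.RandomChoice([T.GaussianBlur(k, sigma=(0.1, 2.0)) for k in (3, 5, 7)])

train_transform = T.Compose([
    T.Resize((224, 224)),
    T.RandomApply([blur], p=0.5),                 # Gaussian blur with p = 0.5
    T.RandomApply([T.RandomRotation(5)], p=0.5),  # rotations up to 5 degrees
    T.ToTensor(),
    T.Normalize(MEAN, STD),
])

eval_transform = T.Compose([
    T.Resize((224, 224)),  # resizing and tensor conversion only
    T.ToTensor(),
])
```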
3.4. Post-Processing Through 3D Watershed Segmentation
- Gradient Computation: The first step involves computing a gradient of the image to highlight regions of rapid intensity changes, which will serve as potential boundaries. This gradient map is combined with a positional bias that accounts for the spatial position of the pixels within the image, ensuring that the algorithm can handle complex 3D structures in a volumetric image.
- Thresholding and Mask Creation: A binary mask is generated for each frame using Otsu’s thresholding technique, which automatically separates the foreground and background by selecting an optimal threshold that minimizes the variance within each group. This technique analyzes the histogram of pixel intensities in the gray-scale image, determining a threshold that best distinguishes between the two regions. After applying the threshold, all pixels below the threshold are classified as background, and those above as foreground. The mask is then inverted, as we are interested in using the background areas to initiate the watershed flooding process.
- Positional Bias and Distance Transform: In addition to the standard distance transform, a positional bias is introduced. This bias is weighted across axes as follows: 0.4 for the z-axis, 0.3 for the y-axis, and 0.3 for the x-axis. This helps prioritize certain axes when calculating the distances, leading to improved segmentation accuracy for 3D data.
- Marker-Based Segmentation: To control the flooding process, we use markers that identify regions of interest such as the left ventricle. These markers are derived from the U-Net’s initial segmentation output and refined with the computed distance transform.
- Watershed Transformation: The Watershed transformation is then applied, segmenting the regions based on the previously computed markers and gradient map. The segmentation is constrained to a maximum number of classes, and labels are assigned to each segmented region.
- Label Assignment and Visualization: The segmented regions are labeled, and each class is assigned a distinct color from a persistent color map for visualization. This ensures the left ventricle is isolated for further analysis and that colors remain consistent across frames. However, the algorithm cannot guarantee that the same color/label is assigned to a given chamber across different videos; this variability necessitates on-the-fly selection of the left ventricle for subsequent mask isolation and volume calculation. A condensed sketch of the post-processing pipeline follows this list.
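The sketch below condenses the steps above using scikit-image and SciPy. Mapping the 0.4/0.3/0.3 positional bias onto the distance transform's per-axis `sampling` argument, and seeding markers from distance peaks, are simplifying assumptions; in the actual pipeline the markers are derived from the U-Net's initial segmentation.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import sobel, threshold_otsu
from skimage.segmentation import watershed

def watershed_3d(volume, axis_weights=(0.4, 0.3, 0.3)):
    """volume: (frames, H, W) gray-scale stack."""
    # 1. Gradient computation: per-frame Sobel magnitude as the elevation map.
    gradient = np.stack([sobel(f) for f in volume])

    # 2. Otsu threshold per frame, then invert so background seeds the flooding.
    mask = np.stack([f < threshold_otsu(f) for f in volume])

    # 3. Distance transform; the z/y/x positional bias is approximated here
    #    with the per-axis `sampling` weights (an assumption).
    distance = ndi.distance_transform_edt(mask, sampling=axis_weights)

    # 4. Markers: stand-in seeds from strong distance peaks; the real pipeline
    #    refines markers taken from the U-Net output.
    markers, _ = ndi.label(distance > 0.7 * distance.max())

    # 5. Watershed flooding; class capping and persistent coloring omitted.
    return watershed(gradient, markers, mask=mask)
```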
3.5. Performance Evaluation
3.5.1. Visual Inspection of Segmentation Results
3.5.2. Quantitative Evaluation and Mask Extraction
- Dice Coefficient (Dice) is given by: $\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$, where $A$ and $B$ are the predicted and ground-truth masks.
- Intersection over Union (IoU), also known as the Jaccard Index, is defined as: $\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$. IoU measures the ratio of the intersection to the union of the predicted and actual segments, providing a stricter evaluation than Dice. A NumPy sketch of both metrics follows.
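Both metrics reduce to a few array operations on binary masks; a small epsilon guards against empty masks.

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    # Dice = 2|A ∩ B| / (|A| + |B|)
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def iou(pred, gt, eps=1e-7):
    # IoU = |A ∩ B| / |A ∪ B| (Jaccard Index)
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)
```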
4. Results
4.1. Quantitative Analysis
4.1.1. Segmentation Accuracy
4.1.2. Ejection Fraction
4.2. Qualitative Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix A.1. Quality Assessment of Selected vs. Non-Selected Videos
Appendix A.2. Training and Validation Loss for Different Frame Sizes
Appendix A.3. Grid Search and Weight Selection
Appendix A.4. Ablation Study
Appendix A.5. Comparison with State-of-the-Art Models
- DeepLabV3: Known for its ability to capture multi-scale contextual information, DeepLabV3 uses atrous convolutions and is particularly effective for semantic segmentation tasks where precise boundary detection is important. The model employs a ResNet-101 backbone and requires pixel-level annotations to achieve high performance [26]. In this study, we used the pre-trained DeepLabV3 with a ResNet-101 backbone to evaluate its performance on echocardiograms. The results show that, without fine-tuning for gray-scale images, it cannot extract any useful information from the echocardiograms (see Figure A6(4)).
- Mask R-CNN: The Mask R-CNN approach extends Faster R-CNN by adding a segmentation branch to predict pixel-wise masks for each detected object. The model is highly effective for instance segmentation tasks, though it also depends on labeled data to generate accurate segmentation results [27]. It uses a ResNet-50 backbone with a Feature Pyramid Network (FPN) to capture multi-scale features. While Mask R-CNN is effective at object detection and segmentation, its performance on gray-scale echocardiograms (see Figure A6(5)) is limited because it was pre-trained on RGB images and does not account for medical-specific features like heart chambers without further adaptation.
- U-Net: Widely used in biomedical image segmentation, U-Net employs an encoder–decoder structure and is particularly successful in medical imaging due to its ability to capture both low- and high-level features. Like the other supervised models, it relies on annotated data for training [28]. In this study, we adopted a ResNet-34 encoder pre-trained on ImageNet. Like the other models, however, it showed poor segmentation accuracy on gray-scale echocardiogram frames (see Figure A6(6)). A sketch of how these baselines can be instantiated follows this list.
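The snippet below sketches one way to instantiate the three baselines with torchvision and the `segmentation_models_pytorch` package; the exact weights and head configurations used in the study are assumptions.

```python
import torch
import segmentation_models_pytorch as smp
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.models.segmentation import deeplabv3_resnet101

deeplab = deeplabv3_resnet101(weights="DEFAULT").eval()     # ResNet-101 backbone
maskrcnn = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()  # ResNet-50 + FPN
unet = smp.Unet(encoder_name="resnet34", encoder_weights="imagenet",
                in_channels=3, classes=1).eval()            # ImageNet encoder

# Gray-scale frames replicated to 3 channels to match the pre-trained inputs.
frame = torch.rand(3, 224, 224)
with torch.no_grad():
    seg = deeplab(frame.unsqueeze(0))["out"]  # (1, 21, 224, 224) class scores
    det = maskrcnn([frame])                   # list of dicts: boxes, labels, masks
    lv = unet(frame.unsqueeze(0))             # (1, 1, 224, 224) mask logits
```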
Appendix A.6. Extra Results
References
1. Tromp, J.; Bauer, D.; Claggett, B.L.; Frost, M.; Iversen, M.B.; Prasad, N.; Petrie, M.C.; Larson, M.G.; Ezekowitz, J.A.; Scott, D.; et al. A formal validation of a deep learning-based automated workflow for the interpretation of the echocardiogram. Nat. Commun. 2022, 13, 6776.
2. Zhang, J.; Gajjala, S.; Agrawal, P.; Tison, G.H.; Hallock, L.A.; Beussink-Nelson, L.; Lassen, M.H.; Fan, E.; Aras, M.A.; Jordan, C.; et al. Fully automated echocardiogram interpretation in clinical practice: Feasibility and diagnostic accuracy. Circulation 2018, 138, 1623–1635.
3. Cai, L.; Gao, J.; Zhao, D. A review of the application of deep learning in medical image classification and segmentation. Ann. Transl. Med. 2020, 8, 713.
4. Xia, X.; Kulis, B. W-Net: A deep model for fully unsupervised image segmentation. arXiv 2017, arXiv:1711.08506.
5. Litjens, G.; Ciompi, F.; Wolterink, J.M.; de Vos, B.D.; Leiner, T.; Teuwen, J.; Isgum, I. State-of-the-art deep learning in cardiovascular image analysis. JACC Cardiovasc. Imaging 2019, 12, 1589–1601.
6. Kim, W.; Kanezaki, A.; Tanaka, M. Unsupervised learning of image segmentation based on differentiable feature clustering. IEEE Trans. Image Process. 2020, 29, 8055–8068.
7. Ouyang, D.; He, B.; Ghorbani, A.; Yuan, N.; Ebinger, J.; Langlotz, C.P.; Heidenreich, P.A.; Harrington, R.A.; Liang, D.H.; Ashley, E.A.; et al. Video-based AI for beat-to-beat assessment of cardiac function. Nature 2020, 580, 252–256.
8. Zhou, T.; Canu, S.; Ruan, S. A review: Deep learning for medical image segmentation using multi-modality fusion. Array 2019, 3, 100004.
9. Chen, C.; Qin, C.; Qiu, H.; Tarroni, G.; Duan, J.; Bai, W.; Rueckert, D. Deep Learning for Cardiac Image Segmentation: A Review. Front. Cardiovasc. Med. 2020, 7, 25.
10. Lin, Z.; Lei, C.; Yang, L. Modern Image-Guided Surgery: A Narrative Review of Medical Image Processing and Visualization. Sensors 2023, 23, 9872.
11. Cruz-Aceves, I.; Avina-Cervantes, J.G.; Lopez-Hernandez, J.M.; Garcia-Hernandez, M.G.; Ibarra-Manzano, M.A. Unsupervised Cardiac Image Segmentation via Multiswarm Active Contours with a Shape Prior. Comput. Math. Methods Med. 2013, 2013, 909625.
12. Fozilov, K.; Colan, J.; Davila, A.; Misawa, K.; Qiu, J.; Hayashi, Y.; Mori, K.; Hasegawa, Y. Endoscope Automation Framework with Hierarchical Control and Interactive Perception for Multi-Tool Tracking in Minimally Invasive Surgery. Sensors 2023, 23, 9865.
13. Erkmen, H.; Schulze, H.; Wiesmann, T.; Mettin, C.; El-Monajem, A.; Kron, F. Sensing Technologies for Guidance During Needle-Based Interventions. Sustainability 2023, 13, 1224.
14. Seetharam, K.; Raina, S.; Sengupta, P.P. The Role of Artificial Intelligence in Echocardiography. Curr. Cardiol. Rep. 2020, 22, 99.
15. Vesal, S.; Gu, M.; Kosti, R.; Maier, A.; Ravikumar, N. Adapt Everywhere: Unsupervised Adaptation of Point-Clouds and Entropy Minimisation for Multi-modal Cardiac Image Segmentation. arXiv 2021, arXiv:2103.08219.
16. Kalra, A.; Kumar, V.; Chung, E.; Biswas, L.; Kuruvilla, S.; Zoghbi, W.A.; Gilliam, L. Unsupervised Myocardial Segmentation for Cardiac BOLD MRI. J. Cardiovasc. Magn. Reson. 2020, 22, 89–95.
17. Ding, X.; Han, Z. A Semi-Supervised Approach Combining Image and Frequency Enhancement for Echocardiography Segmentation. IEEE Access 2024, 12, 92549–92559.
18. Abdi, A.H.; Luong, C.; Tsang, T.; Allan, G.; Nouranian, S.; Jue, J.; Hawley, D.; Fleming, S.; Gin, K.; Swift, J.; et al. Automatic Quality Assessment of Echocardiograms Using Convolutional Neural Networks: Feasibility on the Apical Four-Chamber View. IEEE Trans. Med. Imaging 2017, 36, 1221–1230.
19. Lang, R.M.; Badano, L.P.; Mor-Avi, V.; Afilalo, J.; Armstrong, A.; Ernande, L.; Foster, R.M.; Goldstein, E.A.; Kuznetsova, S.; Lancellotti, L.; et al. Recommendations for cardiac chamber quantification by echocardiography in adults: An update from the American Society of Echocardiography and the European Association of Cardiovascular Imaging. J. Am. Soc. Echocardiogr. 2015, 28, 1–39.
20. Aganj, I.; Harisinghani, M.G.; Weissleder, R.; Fischl, B. Unsupervised Medical Image Segmentation Based on the Local Center of Mass. Sci. Rep. 2018, 8, 13012.
21. Yang, J.; Ding, X.; Zheng, Z.; Xu, X.; Li, X. GraphEcho: Graph-Driven Unsupervised Domain Adaptation for Echocardiogram Video Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 4–6 October 2023; pp. 11878–11887.
22. Ye, K.; Liu, P.; Zou, X.; Zhou, Q.; Zheng, G. KiPA22 Report: U-Net with Contour Regularization for Renal Structures Segmentation. In Proceedings of the KiPA22 Conference, Shanghai Jiao Tong University, Shanghai, China, 8–12 August 2022.
23. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006.
24. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
25. Kornilov, A.; Safonov, I.; Yakimchuk, I. A Review of Watershed Implementations for Segmentation of Volumetric Images. J. Imaging 2022, 8, 127.
26. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587.
27. He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969.
28. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
| Metric | Method | Value | 95% CI |
|---|---|---|---|
| Dice Coefficient | Our Method (Unsupervised) | 0.8225 | [0.8098, 0.8353] |
| Dice Coefficient | EchoNet Dynamic (Supervised) | 0.9241 | [0.9207, 0.9275] |
| IoU | Our Method (Unsupervised) | 0.7071 | [0.6928, 0.7213] |
| IoU | EchoNet Dynamic (Supervised) | 0.8600 | [0.8542, 0.8657] |
| Metric | Our Method vs. GT | EchoNet Dynamic vs. GT |
|---|---|---|
| MAE | 13.7057 | 13.1116 |
| MAPE | 28.01% | 25.38% |
| RMSE | 16.9375 | 18.5931 |
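The error metrics in the table above follow their standard definitions; a NumPy sketch, where `pred_ef` and `true_ef` are assumed to be paired ejection-fraction estimates:

```python
import numpy as np

def ef_errors(pred_ef, true_ef):
    """pred_ef, true_ef: paired ejection-fraction values (in %)."""
    pred_ef = np.asarray(pred_ef, dtype=float)
    true_ef = np.asarray(true_ef, dtype=float)
    err = pred_ef - true_ef
    return {
        "MAE": np.mean(np.abs(err)),                   # mean absolute error
        "MAPE": 100.0 * np.mean(np.abs(err) / true_ef),  # mean absolute % error
        "RMSE": np.sqrt(np.mean(err ** 2)),            # root mean squared error
    }
```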