Advanced Diagnosis of Cardiac and Respiratory Diseases from Chest X-Ray Imagery Using Deep Learning Ensembles
Abstract
1. Introduction
- Novel Ensemble Architecture: We propose a unique ensemble framework that integrates Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), effectively leveraging their complementary strengths to improve multi-label classification accuracy. This hybrid design offers a new paradigm for combining spatial and contextual representations in medical imaging.
- Comprehensive and Scalable Methodology: Our pipeline includes advanced preprocessing, targeted data augmentation, and iterative model refinement, specifically optimized for large-scale clinical datasets. This ensures that the proposed approach is not only accurate but also robust and generalizable.
- Empirical Evaluation and Model Insights: The study provides an in-depth comparative analysis of individual models and the ensemble configuration, highlighting their respective strengths and guiding the ensemble’s design. The ensemble consistently achieves superior metrics, including lower Hamming Loss and higher ROC-AUC, validating the efficacy of the integration strategy.
- Clinical and Societal Impact: With its high accuracy and reliability, the proposed system demonstrates strong potential for deployment in real-world clinical environments, particularly in resource-constrained settings. This contributes to ongoing efforts to improve diagnostic equity and accessibility through AI-powered solutions.
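As a concrete illustration of the ensemble idea in the contribution list above, the horizontal stacking and meta-model ensembling described later (Sections 3.4 and 3.5) can be sketched as follows. The base-model outputs here are random placeholders, and the per-label logistic-regression meta-learner is an assumed, illustrative choice rather than the paper's exact meta-model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
n_samples, n_labels = 200, 14

# Stand-ins for per-label probability outputs of the four base models
# (VGG19, ResNet152, Xception, ViT); each has shape (n_samples, n_labels).
base_preds = [rng.random((n_samples, n_labels)) for _ in range(4)]
y_true = (rng.random((n_samples, n_labels)) > 0.7).astype(int)

# Horizontal stacking: concatenate base-model outputs into one feature matrix.
stacked = np.hstack(base_preds)  # shape (200, 56)

# Meta-model: one logistic regression fitted per disease label.
meta = MultiOutputClassifier(LogisticRegression(max_iter=1000))
meta.fit(stacked, y_true)
probs = np.column_stack([est.predict_proba(stacked)[:, 1]
                         for est in meta.estimators_])
print(stacked.shape, probs.shape)
```

The design rationale is that the meta-learner sees all base models' per-label confidences at once, so it can learn which model to trust for which disease.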
2. Literature Review
3. Proposed Methodology
3.1. Dataset Overview
3.2. Approach and Model Selection
3.3. Model Training Overview
3.3.1. VGG19 Model
3.3.2. ResNet152 Model
3.3.3. Xception Model
3.3.4. Vision Transformer (ViT) Model
3.4. Horizontal Stacking and Hybrid Architecture
3.5. Meta-Model Ensembling
4. Results and Discussion
4.1. Evaluation Metrics
4.1.1. Metrics Used During Model Training
- Binary Accuracy: Measures correctness of predictions on a per-label basis. This is suitable for multi-label classification, where each class is treated independently.
- Mean Absolute Error (MAE): Calculates the average absolute difference between predicted probabilities and actual labels, reflecting the model’s confidence calibration.
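A minimal sketch of how these two training metrics behave on multi-label outputs; the label and probability values below are invented purely for illustration.

```python
import numpy as np

# Toy ground truth and predicted probabilities: 3 images, 4 disease labels.
y_true = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 0],
                   [1, 1, 0, 0]])
y_prob = np.array([[0.9, 0.2, 0.1, 0.7],
                   [0.4, 0.8, 0.3, 0.2],
                   [0.6, 0.4, 0.2, 0.1]])

# Binary accuracy: threshold each label independently at 0.5, then average.
binary_acc = ((y_prob >= 0.5).astype(int) == y_true).mean()

# MAE: average absolute gap between predicted probabilities and labels.
mae = np.abs(y_prob - y_true).mean()

print(round(binary_acc, 4), round(mae, 4))  # 0.9167 0.2583
```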
4.1.2. Metrics Used for Model Comparison
- ROC-AUC Curve: Evaluates the discriminative ability of each model across all classes using the Receiver Operating Characteristic (ROC) curve and its Area Under the Curve (AUC). As a threshold-independent metric, it offers insights into classification performance regardless of the decision threshold [31].
4.1.3. Rationale for Excluding Traditional Metrics
- These metrics require micro, macro, or weighted averaging to adapt them for multi-label settings, which can obscure class-wise performance and inflate aggregated results in the presence of class imbalance.
- In datasets like chest X-rays, where labels are non-exclusive and often sparse, aggregate metrics can be misleading or overly optimistic/pessimistic depending on label prevalence.
- Interpreting these metrics becomes non-trivial, particularly when multiple pathologies co-occur or when certain classes are under-represented.
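The averaging pitfall described above is easy to reproduce. In the hypothetical scenario below, micro-averaged F1 looks excellent because the common label dominates the pooled counts, while macro-averaging exposes that the rare label is never detected.

```python
import numpy as np
from sklearn.metrics import f1_score

# Two labels: a common disease (90/100 positive) and a rare one (5/100 positive).
y_true = np.zeros((100, 2), dtype=int)
y_true[:90, 0] = 1
y_true[:5, 1] = 1

# Hypothetical model: perfect on the common label, never flags the rare one.
y_pred = np.zeros((100, 2), dtype=int)
y_pred[:90, 0] = 1

micro = f1_score(y_true, y_pred, average="micro", zero_division=0)
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(micro, macro)  # micro ≈ 0.973, macro = 0.5
```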
4.2. Model Performance
4.3. Comparative Analysis and Implications
4.4. Error Analysis and Insights
- Patterns in Misclassification:
- The Vision Transformer (ViT) model, despite its strong performance in detecting Edema, struggled significantly with conditions such as Hernia and Fibrosis [8]. These cases often presented subtle features that might not be captured well by global attention mechanisms.
- ResNet-152 and VGG19 showed improved consistency across multiple conditions but were prone to errors in diseases with overlapping visual features, such as Infiltration and Mass [4].
- Dataset Biases:
- An imbalance in the NIH Chest X-ray dataset was observed, with certain conditions like Pneumonia and Effusion being overrepresented compared to rarer diseases such as Hernia or Fibrosis [7].
- Model Limitations:
- The Xception model, while balanced in performance, lacked the specificity required to differentiate closely related thoracic anomalies, leading to near-random guessing for some rare disease labels [6].
- Potential Bias Sources:
- Class Imbalance: The long-tailed distribution in the dataset caused a disproportionate focus on majority classes during training [7].
- Co-Occurrence of Labels: Certain diseases co-occurred frequently (e.g., Infiltration and Pneumonia), which led the models to overgeneralize, reducing their ability to handle cases where only one condition was present [8].
- Image Quality Variability: Variations in image resolution and noise across the dataset further impacted the model’s ability to generalize [5].
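One standard mitigation for the long-tailed class distribution noted above — not necessarily the one adopted in this work — is to weight the positive term of the per-label binary cross-entropy by the negative-to-positive ratio, so rare labels such as Hernia contribute more per positive example. The counts below loosely echo the NIH ChestX-ray14 label frequencies and are illustrative.

```python
import numpy as np

# Illustrative per-label positive counts (approximate NIH ChestX-ray14 figures).
label_counts = {"Effusion": 13317, "Infiltration": 19894, "Pneumonia": 1431,
                "Fibrosis": 1686, "Hernia": 227}
n_images = 112120  # total images in ChestX-ray14

# Positive weight for a weighted binary cross-entropy:
# w_pos = (# negatives) / (# positives); rare labels receive larger weights.
pos_weight = {d: (n_images - c) / c for d, c in label_counts.items()}
for d, w in sorted(pos_weight.items(), key=lambda kv: -kv[1]):
    print(f"{d:>12}: {w:6.1f}")
```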
4.5. Comparison with Existing Literature
- Average AUC: 0.635 across 14 thoracic diseases;
- Best per-class AUC: 0.74 for Edema;
- Meta-model Hamming Loss: 0.1408.
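For reference, Hamming Loss is simply the fraction of individual label slots predicted incorrectly across all samples; a toy computation with invented labels:

```python
import numpy as np
from sklearn.metrics import hamming_loss

# 2 samples, 4 labels: one wrong slot in each row.
y_true = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 0]])
y_pred = np.array([[1, 0, 1, 1],
                   [0, 0, 0, 0]])

hl = hamming_loss(y_true, y_pred)
print(hl)  # 2 wrong slots out of 8 → 0.25
```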
5. Conclusions and Future Work
5.1. Conclusions
- The ViT model demonstrated moderate effectiveness, with notable performance in detecting conditions like Edema, but struggled with others like Hernia.
- ResNet152 and VGG19 showed similar levels of accuracy in their predictions, with VGG19 slightly outperforming in terms of lower Hamming Loss.
- Xception exhibited a balanced performance across various conditions but did not show exceptionally high discriminative ability for any specific disease.
- The Meta model emerged as the most effective overall, combining a competitive Hamming Loss with consistently higher ROC-AUC values across diseases, indicating superior discriminatory capability.
- These findings highlight the Meta model’s potential for clinical application, where accuracy and reliability are crucial.
5.2. Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Geroski, T.; Filipović, N. Artificial Intelligence Empowering Medical Image Processing. In In Silico Clinical Trials for Cardiovascular Disease: A Finite Element and Machine Learning Approach; Springer: Berlin/Heidelberg, Germany, 2024; pp. 179–208. [Google Scholar]
- Huang, M.L.; Liao, Y.C. A lightweight CNN-based network on COVID-19 detection using X-ray and CT images. Comput. Biol. Med. 2022, 146, 105604. [Google Scholar] [CrossRef] [PubMed]
- Jain, A.; Bhardwaj, A.; Murali, K.; Surani, I. A Comparative Study of CNN, ResNet, and Vision Transformers for Multi-Classification of Chest Diseases. arXiv 2024, arXiv:2406.00237. [Google Scholar]
- Liu, C.; Cao, Y.; Alcantara, M.; Liu, B.; Brunette, M.; Peinado, J.; Curioso, W. TX-CNN: Detecting tuberculosis in chest X-ray images using convolutional neural network. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 2314–2318. [Google Scholar] [CrossRef]
- Rahman, T.; Chowdhury, M.E.H.; Khandakar, A.; Islam, K.R.; Islam, K.F.; Mahbub, Z.B.; Kadir, M.A.; Kashem, S. Transfer Learning with Deep Convolutional Neural Network (CNN) for Pneumonia Detection Using Chest X-ray. Appl. Sci. 2020, 10, 3233. [Google Scholar] [CrossRef]
- Patil, P.; Patil, H. X-ray Imaging Based Pneumonia Classification using Deep Learning and Adaptive Clip Limit based CLAHE Algorithm. In Proceedings of the 2020 IEEE 4th Conference on Information & Communication Technology (CICT), Chennai, India, 3–5 December 2020; pp. 1–4. [Google Scholar] [CrossRef]
- Basu, S.; Mitra, S.; Saha, N. Deep Learning for Screening COVID-19 using Chest X-Ray Images. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, ACT, Australia, 1–4 December 2020; pp. 2521–2527. [Google Scholar]
- Alsaati, M.A.Y. Diagnosis of COVID-19 in X-ray Images using Deep Neural Networks. Int. Res. J. Multidiscip. Technov. 2024, 6, 232–244. [Google Scholar] [CrossRef]
- Madani, A.; Moradi, M.; Karargyris, A.; Syeda-Mahmood, T. Semi-supervised learning with generative adversarial networks for chest X-ray classification with ability of data domain adaptation. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; pp. 1038–1042. [Google Scholar] [CrossRef]
- Do, O.C.; Luong, C.M.; Dinh, P.H.; Tran, G.S. An efficient approach to medical image fusion based on optimization and transfer learning with VGG19. Biomed. Signal Process. Control. 2024, 87, 105370. [Google Scholar] [CrossRef]
- Yimer, F.; Tessema, A.; Simegn, G. Multiple Lung Diseases Classification from Chest X-Ray Images using Deep Learning approach. Int. J. Adv. Trends Comput. Sci. Eng. 2021, 10, 1–7. [Google Scholar]
- Onalaja, J.; Shahra, E.Q.; Basurra, S.; Jabbar, W.A. Image Classifier for an Online Footwear Marketplace to Distinguish between Counterfeit and Real Sneakers for Resale. Sensors 2024, 24, 3030. [Google Scholar] [CrossRef]
- Saeed, M.; Ullah, M.; Khan, S.D.; Cheikh, F.A.; Sajjad, M. ViT based COVID-19 detection and classification from CXR images. Electron. Imaging 2023, 35, VDA-407. [Google Scholar] [CrossRef]
- Uparkar, O.; Bharti, J.; Pateriya, R.K.; Gupta, R.K.; Sharma, A. Vision Transformer Outperforms Deep Convolutional Neural Network-based Model in Classifying X-ray Images. Procedia Comput. Sci. 2023, 218, 2338–2349. [Google Scholar] [CrossRef]
- Okolo, G.I. IEViT: An enhanced vision transformer architecture for chest X-ray image classification. Comput. Methods Programs Biomed. 2022, 226, 107141. [Google Scholar] [CrossRef]
- Jiang, X.; Zhu, Y.; Cai, G.; Zheng, B.; Yang, D. MXT: A New Variant of Pyramid Vision Transformer for Multi-label Chest X-ray Image Classification. Cogn. Comput. 2022, 14, 1362–1377. [Google Scholar] [CrossRef]
- Holste, G.; Zhou, Y.; Wang, S.; Jaiswal, A.; Lin, M.; Zhuge, S.; Yang, Y.; Kim, D.; Nguyen-Mau, T.H.; Tran, M.T.; et al. Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge. Med. Image Anal. 2024, 97, 103224. [Google Scholar] [CrossRef] [PubMed]
- Öztürk, Ş.; Turalı, M.Y.; Çukur, T. Hydravit: Adaptive multi-branch transformer for multi-label disease classification from chest X-ray images. arXiv 2023, arXiv:2310.06143. [Google Scholar] [CrossRef]
- Jeong, J.; Jeoun, B.; Park, Y.; Han, B. An Optimized Ensemble Framework for Multi-Label Classification on Long-Tailed Chest X-ray Data. In Proceedings of the ICCV Workshop on Computer Vision for Automated Medical Diagnosis (CVAMD), Paris, France, 2 October 2023. [Google Scholar]
- Kim, D. CheXFusion: Effective Fusion of Multi-View Features using Transformers for Long-Tailed Chest X-Ray Classification. arXiv 2023, arXiv:2308.03968. [Google Scholar]
- Krishnan, K.S.; Krishnan, K.S. Vision transformer based COVID-19 detection using chest X-rays. In Proceedings of the 2021 6th International Conference on Signal Processing, Computing and Control (ISPCC), Solan, India, 7–9 October 2021; pp. 644–648. [Google Scholar]
- Showkat, S.; Qureshi, S. Efficacy of Transfer Learning-based ResNet models in Chest X-ray image classification for detecting COVID-19 Pneumonia. Chemom. Intell. Lab. Syst. 2022, 224, 104534. [Google Scholar] [CrossRef]
- Ikechukwu, A.V.; Murali, S.; Deepu, R.; Shivamurthy, R. ResNet-50 vs VGG-19 vs training from scratch: A comparative analysis of the segmentation and classification of Pneumonia from chest X-ray images. Glob. Transitions Proc. 2021, 2, 375–381. [Google Scholar] [CrossRef]
- Gupta, A.; Anjum; Gupta, S.; Katarya, R. InstaCovNet-19: A deep learning classification model for the detection of COVID-19 patients using Chest X-ray. Appl. Soft Comput. 2021, 99, 106859. [Google Scholar] [CrossRef]
- Xie, J.; Xu, B.; Chuang, Z. Horizontal and vertical ensemble with deep representation for classification. arXiv 2013, arXiv:1306.2759. [Google Scholar]
- Mallick, J.; Talukdar, S.; Ahmed, M. Combining high resolution input and stacking ensemble machine learning algorithms for developing robust groundwater potentiality models in Bisha watershed, Saudi Arabia. Appl. Water Sci. 2022, 12, 77. [Google Scholar] [CrossRef]
- Acar, E.; Rais-Rohani, M. Ensemble of metamodels with optimized weight factors. Struct. Multidiscip. Optim. 2009, 37, 279–294. [Google Scholar] [CrossRef]
- Lu, M.; Hou, Q.; Qin, S.; Zhou, L.; Hua, D.; Wang, X.; Cheng, L. A Stacking Ensemble Model of Various Machine Learning Models for Daily Runoff Forecasting. Water 2023, 15, 1265. [Google Scholar] [CrossRef]
- Stemerman, R.; Arguello, J.; Brice, J.; Krishnamurthy, A.; Houston, M.; Kitzmiller, R.R. Identification of social determinants of health using multi-label classification of electronic health record clinical notes. JAMIA Open 2021, 4, ooaa069. [Google Scholar] [CrossRef]
- Tsoumakas, G.; Katakis, I.; Vlahavas, I. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook; Springer: Berlin/Heidelberg, Germany, 2010; pp. 667–685. [Google Scholar]
- Grandini, M.; Bagli, E.; Visani, G. Metrics for multi-class classification: An overview. arXiv 2020, arXiv:2008.05756. [Google Scholar]
- Rajpurkar, P.; Irvin, J.; Zhu, K.; Yang, B.; Mehta, H.; Duan, T.; Ding, D.; Bagul, A.; Langlotz, C.; Shpanskaya, K.; et al. CheXNet: Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv 2017, arXiv:1711.05225. [Google Scholar]
- Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Bagheri, M.; Summers, R.M. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly Supervised Classification and Localization of Common Thorax Diseases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3462–3471. [Google Scholar] [CrossRef]
- Bharati, S.; Podder, P.; Mondal, M.R. Hybrid deep learning for detecting lung diseases from X-ray images. Informatics Med. Unlocked 2020, 20, 100391. [Google Scholar] [CrossRef]
- Kim, S.; Rim, B.; Choi, S.; Lee, A.; Min, S.; Hong, M. Deep learning in multi-class lung diseases’ classification on chest X-ray images. Diagnostics 2022, 12, 915. [Google Scholar] [CrossRef] [PubMed]
- Ashraf, S.N.; Mamun, M.A.; Abdullah, H.M.; Alam, M.G.R. SynthEnsemble: A Fusion of CNN, Vision Transformer, and Hybrid Models for Multi-Label Chest X-Ray Classification. In Proceedings of the 2023 26th International Conference on Computer and Information Technology (ICCIT), Cox’s Bazar, Bangladesh, 13–15 December 2023. [Google Scholar] [CrossRef]
- Marikkar, U.; Atito, S.; Awais, M.; Mahdi, A. LT-ViT: A Vision Transformer for Multi-Label Chest X-ray Classification. arXiv 2023, arXiv:2311.07263. [Google Scholar]
- Sajed, S.; Sanati, A.; Garcia, J.E.; Rostami, H.; Keshavarz, A.; Teixeira, A. The effectiveness of deep learning vs. traditional methods for lung disease diagnosis using chest X-ray images: A systematic review. Appl. Soft Comput. 2023, 147, 110817. [Google Scholar] [CrossRef]
- Ravi, V.; Narasimhan, H.; Pham, T.D. A cost-sensitive deep learning-based meta-classifier for pediatric pneumonia classification using chest X-rays. Expert Syst. 2022, 39, e12966. [Google Scholar] [CrossRef]
Ref. | Method Used | Dataset Used | Advantages | Challenges |
---|---|---|---|---|
[3] | Comparative Study of CNN, ResNet, and ViTs | NIH Chest X-ray dataset | ViTs achieve highest accuracy; pre-training on large datasets enhances performance | High computational resources; requires extensive pre-training. |
[10] | A novel image fusion method using TL_VGG19 network combined with Equilibrium Optimization Algorithm (EOA) | C1: MRI-PET (269 pairs), C2: MRI-SPECT (357 pairs), C3: MRI, CT, PET, SPECT (1424 images) | Enhanced image synthesis quality with improved brightness, contrast, and sharpness; demonstrated highest performance in six evaluation metrics including QA, QAB/F, and QC. Preserved intricate details from input images. | High computational cost due to complex model training; potential limitations in generalizing to datasets not included in the transfer learning. |
[17] | Multi-Label Classification with Long-Tailed Distribution | CXR-LT dataset (350,000 chest X-rays) | Addresses label imbalance and co-occurrence; utilizes vision-language models for future tasks | Handling rare diseases; managing large-scale datasets |
[18] | Hybrid CNN-Transformer with Multi-Branch Output | ChestX-ray14 dataset (112,120 images) | Captures long-range dependencies; handles label co-occurrence; improves classification accuracy by 1.2–1.4% | High computational complexity; managing adaptive weights for co-occurrence relationships |
[19] | Optimized Ensemble Framework with CSRA | MIMIC-CXR-LT dataset | Improves classification performance; handles long-tailed distribution effectively | Complexity of ensemble methods; managing computational resources |
[20] | Transformer-based Fusion with Self-Attention and Cross-Attention | MIMIC-CXR dataset | Enhances multi-view classification; achieves state-of-the-art performance; handles class imbalance | Managing computational complexity; optimizing data balancing |
[13] | Vision Transformer (ViT) with self-attention mechanism | COVID-19 Dataset and COVID-19 Radiography Dataset | Outperforms CNN-based models; achieves 97% accuracy and 94% F1-score on Radiography dataset; effective in capturing global context | Limited generalizability to other chest conditions; requires fine-tuning of ViT architecture and intensive preprocessing |
Parameter | Value |
---|---|
Input Shape | Images resized from 256 × 256 × 3 to 224 × 224 × 3 |
Normalization | Pixel values scaled to the range [0, 1] using Rescaling (1./255) |
CLAHE | Contrast limited adaptive histogram equalization |
Data Augmentation | Horizontal flipping, Rotation, Shearing, Zooming |
Loss Function | Binary Cross-Entropy (suitable for multi-label classification) |
Optimizer | Adam optimizer with default learning rate |
Metrics | Binary Accuracy and Mean Absolute Error (MAE) |
Training Duration | Maximum of 10 epochs |
Batch Size | 32 |
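Two entries in the training-configuration table above can be made concrete in a short numpy sketch — the 1/255 rescaling and the multi-label binary cross-entropy loss. The values are illustrative, and the actual pipeline presumably uses the Keras equivalents (`Rescaling`, `BinaryCrossentropy`).

```python
import numpy as np

# Rescaling step from the table: pixel values 0–255 → [0, 1].
image = np.array([[0, 128, 255]], dtype=np.float32)
scaled = image / 255.0

# Multi-label binary cross-entropy (the table's loss), averaged over labels.
def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# One image's ground truth and predicted probabilities for 3 labels (invented).
y_true = np.array([1.0, 0.0, 1.0])
y_prob = np.array([0.9, 0.1, 0.8])
print(round(binary_cross_entropy(y_true, y_prob), 4))  # 0.1446
```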
Disease/Model | ViT | ResNet | VGG19 | Xception | Meta Model |
---|---|---|---|---|---|
Atelectasis | 0.53 | 0.56 | 0.56 | 0.49 | 0.62 |
Cardiomegaly | 0.57 | 0.51 | 0.52 | 0.49 | 0.61 |
Consolidation | 0.59 | 0.59 | 0.58 | 0.51 | 0.63 |
Edema | 0.66 | 0.68 | 0.68 | 0.54 | 0.74 |
Effusion | 0.56 | 0.54 | 0.55 | 0.48 | 0.64 |
Emphysema | 0.47 | 0.48 | 0.49 | 0.40 | 0.61 |
Fibrosis | 0.41 | 0.36 | 0.37 | 0.52 | 0.68 |
Hernia | 0.42 | 0.43 | 0.41 | 0.50 | 0.65 |
Infiltration | 0.52 | 0.52 | 0.53 | 0.51 | 0.55 |
Mass | 0.48 | 0.46 | 0.46 | 0.51 | 0.55 |
Nodule | 0.44 | 0.44 | 0.43 | 0.57 | 0.61 |
Pleural_Thickening | 0.48 | 0.43 | 0.44 | 0.52 | 0.63 |
Pneumonia | 0.54 | 0.53 | 0.51 | 0.49 | 0.53 |
Pneumothorax | 0.49 | 0.46 | 0.46 | 0.47 | 0.63 |
Hamming Loss | 0.24258 | 0.18126 | 0.13355 | 0.16085 | 0.14080 |
Share and Cite
Nakrani, H.; Shahra, E.Q.; Basurra, S.; Mohammad, R.; Vakaj, E.; Jabbar, W.A. Advanced Diagnosis of Cardiac and Respiratory Diseases from Chest X-Ray Imagery Using Deep Learning Ensembles. J. Sens. Actuator Netw. 2025, 14, 44. https://doi.org/10.3390/jsan14020044