Interpretable Deep Learning for Pneumonia Detection Using Chest X-Ray Images
Abstract
1. Introduction
2. Related Works
3. Methodology
3.1. Experimental Setup
3.2. Experimental Design
3.2.1. Proposed Model
3.2.2. Dataset
3.2.3. Data Preprocessing
- Resizing: As mentioned earlier, the chest X-ray images were resized to a standardized 224 × 224 input size for the CNN, a common choice that balances the capture of essential detail with computational efficiency.
- Normalization: The pixel values of the images were normalized to a standard scale, from 0 to 1. Normalization helps stabilize training and ensures that each feature contributes equally to model learning.
- Data Augmentation: To enhance the robustness and generalization of the model, data augmentation techniques such as rotation, flipping, and zooming were applied. This introduces variation into the training data, reducing overfitting and improving model performance on unseen data. The specific augmentation parameters were as follows (a minimal preprocessing sketch appears after this list):
- Rotation: Images were randomly rotated by up to 15 degrees.
- Flipping: Horizontal flipping was applied to randomly selected images.
- Zooming: Images were randomly zoomed in or out by up to 20%.
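To make these steps concrete, the following is a minimal preprocessing and augmentation sketch. It assumes a PyTorch/torchvision pipeline; the paper does not state which framework was used, so the transform names are illustrative rather than the authors' exact code.

```python
from torchvision import transforms

# Training-time preprocessing matching the parameters listed above
# (framework choice is an assumption, not stated in the paper).
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),                         # standardized input dimensions
    transforms.RandomRotation(degrees=15),                 # random rotation up to 15 degrees
    transforms.RandomHorizontalFlip(p=0.5),                # flip randomly selected images
    transforms.RandomAffine(degrees=0, scale=(0.8, 1.2)),  # zoom in/out by up to 20%
    transforms.ToTensor(),                                 # scales pixel values to [0, 1]
])

# Validation/test images are resized and normalized only, with no augmentation.
eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```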
3.2.4. Model Development
- Adversarial Training: The model is trained with a perturbation magnitude of 0.01 and a learning rate of 0.001 to improve robustness against adversarial inputs. Iterative tuning of these parameters ensures that the model can handle slight image disturbances while remaining accurate (a minimal training sketch appears after this list).
- Class Activation Maps (CAMs): Using Grad-CAM, gradient-based feature maps help localize pneumonia features within chest X-rays. Grad-CAM’s focus on class-specific regions is adjusted through its gradient thresholds, balancing interpretability, as measured by the Mean Relevance Score (MRS), against diagnostic performance.
- Layer-wise Relevance Propagation (LRP): This method utilizes the epsilon variant (ε = 0.001), ensuring numerical stability and a clear relevance distribution across layers. Adjusting the epsilon refines the pixel-level relevance, yielding high interpretability without performance trade-offs, making LRP especially suitable for clinical settings.
- Spatial Attention Mechanism (SAM): The SAM optimizes the spatial focus by tuning attention weights across lung regions, emphasizing relevant image features while minimizing background noise. This method’s optimization focuses on clarity in spatial regions rather than granular features, sacrificing minor specificity to broaden attention across the X-ray for overall interpretability.
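The adversarial training sketch referenced above follows. The text specifies the perturbation magnitude (0.01) and learning rate (0.001) but not the attack; the fast gradient sign method (FGSM) is assumed here, and the optimizer choice is likewise an assumption.

```python
import torch
import torch.nn.functional as F

def fgsm_examples(model, images, labels, epsilon=0.01):
    """Craft adversarial examples with perturbation magnitude epsilon (FGSM, assumed attack)."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Step each pixel in the direction that increases the loss,
    # then clamp back to the valid [0, 1] range.
    adv = images + epsilon * images.grad.sign()
    return adv.clamp(0.0, 1.0).detach()

def adversarial_step(model, optimizer, images, labels, epsilon=0.01):
    """One parameter update on a mixed clean + adversarial batch."""
    adv = fgsm_examples(model, images, labels, epsilon)
    optimizer.zero_grad()
    loss = (F.cross_entropy(model(images), labels)
            + F.cross_entropy(model(adv), labels))
    loss.backward()
    optimizer.step()
    return loss.item()

# e.g. optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # lr from the text
```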
Key Hyperparameters
Iterative Refinement and Model Optimization
- For Adversarial Training, fine-tuning the perturbation magnitude and learning rate helped balance robustness with interpretability, with the final settings demonstrating the optimal trade-off score; this allowed the model to remain accurate while becoming more resistant to adversarial input.
- Grad-CAM optimization involved gradient analysis and feature-map adjustments, using the Mean Relevance Score (MRS) as the guiding indicator; the resulting maps effectively highlight pneumonia-affected regions (see the Grad-CAM sketch after this list).
- LRP was refined through adjustments to its epsilon parameter, achieving the highest MRS possible with no significant trade-offs in accuracy, which made it the best-optimized technique for interpretability without performance loss.
- The SAM underwent multiple adjustments to spatial configurations, balancing attention weights and feature map visualization. This enabled clear visualization of critical areas, which is beneficial for understanding the model focus despite a slight compromise in accuracy.
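The Grad-CAM sketch referenced above, assuming a PyTorch ResNet-50 backbone; hook-based extraction of activations and gradients is one common way to implement the technique, not necessarily the authors' exact approach.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class, feature_layer):
    """Grad-CAM heatmap for a single image tensor of shape (1, C, H, W)."""
    acts, grads = [], []
    fh = feature_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    bh = feature_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    model.zero_grad()
    model(image)[0, target_class].backward()   # gradient of the class score
    fh.remove(); bh.remove()
    # Weight each feature map by its spatially averaged gradient and keep
    # only positive evidence for the target class.
    weights = grads[0].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * acts[0]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()  # normalized to [0, 1]

# For a ResNet-50 backbone, the last convolutional block is a natural choice:
# heatmap = grad_cam(model, x, target_class=1, feature_layer=model.layer4[-1])
```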
3.2.5. Interpretability Technique Implementation
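As a hedged illustration of the ε-rule LRP (ε = 0.001) described in Section 3.2.4, the sketch below redistributes relevance through a single linear layer using the standard gradient trick. A full implementation would apply a rule of this form layer by layer, from the output back to the input pixels; this reconstruction is illustrative, not the authors' implementation.

```python
import torch

def lrp_epsilon_linear(layer, activation, relevance, eps=1e-3):
    """Epsilon-rule LRP through one torch.nn.Linear layer: redistributes the
    relevance of the layer's outputs to its inputs, with an eps-stabilized
    denominator (eps = 0.001, matching the text)."""
    a = activation.clone().detach().requires_grad_(True)
    z = layer(a) + eps                 # stabilized pre-activations
    s = (relevance / z).detach()       # relevance per unit of output
    (z * s).sum().backward()           # gradient trick: a.grad = W^T s
    return (a * a.grad).detach()       # input relevance R_j = a_j * [W^T s]_j
```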
4. Results
4.1. Model Evaluation
4.1.1. Model Optimization Process and Loss Function
The model was optimized by minimizing the categorical cross-entropy loss:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \, \log\left(\hat{y}_{i,c}\right)$$

where:
- $N$ is the number of samples;
- $C$ is the number of classes;
- $y_{i,c}$ is the ground-truth label for class $c$ of sample $i$;
- $\hat{y}_{i,c}$ is the predicted probability of class $c$.

The weights were updated by gradient descent:

$$W_{t+1} = W_t - \eta \, \nabla L(W_t)$$

where:
- $W_t$ are the model weights at iteration $t$;
- $\eta$ is the learning rate;
- $\nabla L(W_t)$ is the gradient of the loss function with respect to the weights.
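A minimal training loop tying the two formulas together; the SGD step mirrors the update rule exactly. The pre-trained ResNet-50 matches Section 4.2, while the optimizer choice and `train_loader` are assumptions for illustration.

```python
import torch
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V1")           # pre-trained backbone (Section 4.2)
model.fc = torch.nn.Linear(model.fc.in_features, 2)        # pneumonia vs. normal head

criterion = torch.nn.CrossEntropyLoss()                    # the loss L above
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # eta; SGD mirrors the update rule

for images, labels in train_loader:                        # train_loader: assumed DataLoader
    optimizer.zero_grad()
    loss = criterion(model(images), labels)                # L(W_t)
    loss.backward()                                        # computes the gradient of L(W_t)
    optimizer.step()                                       # W_{t+1} = W_t - eta * grad L(W_t)
```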
4.1.2. Quantitative Assessment
- Accuracy measures the overall correctness of the model’s predictions, calculating the ratio of correctly predicted instances to the total instances. In the context of pneumonia detection, a high accuracy indicates the model’s proficiency in distinguishing between pneumonia and normal cases.
- Recall (Sensitivity) evaluates the model’s ability to correctly identify true positive cases among all actual positive cases. It is particularly crucial in medical applications to ensure that pneumonia cases are not overlooked, which requires minimizing false negatives and enhancing the model’s diagnostic sensitivity.
- Specificity gauges the model’s capability to correctly identify true negative cases among all actual negative cases. In the context of pneumonia detection, a high specificity indicates the model’s proficiency in avoiding false positives, reducing unnecessary concern for patients without pneumonia.
- The Area Under the ROC Curve (AUC-ROC) summarizes the model’s ability to discriminate between pneumonia and normal cases across all decision thresholds; the ROC curve itself provides the graphical representation, and a higher AUC-ROC signifies better discrimination, offering a threshold-independent assessment of the model’s diagnostic accuracy.
- The Mean Relevance Score (MRS) [29] is used to assess the performance of interpretability techniques. For CNN-based pneumonia detection in chest X-ray images, the MRS quantifies how effectively each technique reveals the features that drive the CNN’s decisions. Comparing the MRS across techniques identifies the approach that best exposes the critical features associated with pneumonia, helping clinicians and researchers understand and trust the model’s decision-making process.
- The trade-off score [30] quantifies the balance between predictive performance and interpretability. It can be constructed by combining a performance metric (such as accuracy) with a quantitative measure of interpretability and adjusting the model’s score accordingly (an illustrative computation is included in the sketch following this list).
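A sketch of how these metrics might be computed with scikit-learn, assuming binary labels (0 = normal, 1 = pneumonia) and predicted pneumonia probabilities. The `trade_off_score` helper is an illustrative weighted combination with a hypothetical weighting; the paper's exact formulation follows [30].

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score, confusion_matrix

def evaluate(y_true, y_prob, threshold=0.5):
    """Quantitative assessment for a binary classifier where y_true holds
    ground-truth labels (0 = normal, 1 = pneumonia) and y_prob holds
    predicted pneumonia probabilities."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "sensitivity": recall_score(y_true, y_pred),  # true-positive rate (recall)
        "specificity": tn / (tn + fp),                # true-negative rate
        "auc_roc": roc_auc_score(y_true, y_prob),     # threshold-independent discrimination
    }

def trade_off_score(performance, interpretability, alpha=0.5):
    """Illustrative trade-off: a weighted combination of a performance metric
    (e.g., accuracy) and an interpretability measure (e.g., MRS). The weighted
    form and alpha are hypothetical choices; the paper follows [30]."""
    return alpha * performance + (1 - alpha) * interpretability
```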
4.1.3. Visual and Graphical Representations
4.2. Pre-Trained ResNet50 Without Interpretability Technique Results
4.3. Layer-Wise Relevance Propagation Implementation Results
4.4. Adversarial Training Implementation Results
4.5. Class Activation Map Implementation Results
4.6. Attention Mechanism Implementation Results
5. Discussion
5.1. Summary of Evaluation
5.2. Research Contributions
5.3. Limitations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning, 2nd ed.; MIT Press: Cambridge, MA, USA, 2020.
2. Arrieta, A.B.; Díaz-Rodríguez, N.; Ser, J.D.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI. Inf. Fusion 2020, 58, 82–115.
3. World Health Organization. Pneumonia in Children. Available online: https://www.who.int/news-room/fact-sheets/detail/pneumonia (accessed on 10 December 2024).
4. Chen, J.; Wu, L.; Zhang, J.; Zhang, L.; Gong, D.; Zhao, Y.; Chen, Q.; Huang, H.; Yang, M.; Yang, X.; et al. Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography: A prospective study. Sci. Rep. 2020, 10, 19196.
5. Our World in Data. Pneumonia. Available online: https://ourworldindata.org/pneumonia (accessed on 10 December 2024).
6. Yan, K.; Wang, X.; Lu, L.; Summers, R.M. DeepLesion: Automated Mining of Large-scale Lesion Annotations and Universal Lesion Detection with Deep Learning. J. Med. Imaging 2020, 7, 014501.
7. Wang, S.; Kang, B.; Ma, J.; Zeng, X.; Xiao, M.; Guo, J.; Cai, M.; Yang, J.; Li, Y.; Meng, X.; et al. A Deep Learning Algorithm Using CT Images to Screen for Corona Virus Disease (COVID-19). Eur. Radiol. 2021, 31, 6096–6104.
8. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), Vienna, Austria, 12–18 July 2020; pp. 6105–6114.
9. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpretable Machine Learning. Adv. Neural Inf. Process. Syst. 2020, 33, 1701–1710.
10. Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, 2nd ed.; Leanpub: Vancouver, BC, Canada, 2020; pp. 45–67. Available online: https://christophm.github.io/interpretable-ml-book/ (accessed on 12 November 2024).
11. Zhou, J.; Ye, J.; Zhang, Y.; Chen, J.; Xu, Y.; Cao, L. Pneumonia Detection Using Chest X-ray Images Based on Convolutional Neural Network. J. Med. Imaging Health Inform. 2021, 11, 1512.
12. Quazi, S. Artificial Intelligence and Machine Learning in Precision and Genomic Medicine. Med. Oncol. 2022, 39, 120.
13. Johnson, A.E.W.; Pollard, T.J.; Mark, R.G.; Berkowitz, S.J.; Horng, S. MIMIC-CXR: Chest Radiographs in Critical Care. Sci. Data 2020, 7, 317.
14. Fonseka, D.; Chrysoulas, C. Data augmentation to improve the performance of a convolutional neural network on Image Classification. In Proceedings of the International Conference on Decision Aid Sciences and Application (DASA), Sakheer, Bahrain, 8–9 November 2020; pp. 515–518.
15. Ren, H.; Wong, A.B.; Lian, W.; Cheng, W.; Zhang, Y.; He, J.; Liu, Q.; Yang, J.; Zhang, C.J.; Wu, K.; et al. Interpretable pneumonia detection by combining deep learning and explainable models with Multisource Data. IEEE Access 2021, 9, 95872–95883.
16. Rajaraman, S.; Thoma, G.; Antani, S.; Candemir, S. Visualizing and explaining deep learning predictions for pneumonia detection in pediatric chest radiographs. In Proceedings of the SPIE Conference Medical Imaging 2019: Computer-Aided Diagnosis, San Diego, CA, USA, 16–21 February 2019.
17. Aljawarneh, S.A.; Al-Quraan, R. Pneumonia detection using enhanced convolutional neural network model on chest X-ray image. Big Data 2023.
18. Siddiqi, R.; Javaid, S. Deep Learning for Pneumonia Detection in Chest X-ray Images: A Comprehensive Survey. J. Imaging 2024, 10, 176.
19. Koh, P.W.; Liang, P.; Nguyen, A.; Tang, K.; Guo, Z.; Doshi-Velez, F. Concept Bottleneck Models. Adv. Neural Inf. Process. Syst. 2020, 33, 11623–11634. Available online: https://api.semanticscholar.org/CorpusID:220424448 (accessed on 12 November 2024).
20. Han, T.; Nebelung, S.; Pedersoli, F.; Zimmermann, M.; Schulze-Hagen, M.; Ho, M.; Haarburger, C.; Kiessling, F.; Kuhl, C.; Schulz, V.; et al. Advancing diagnostic performance and clinical usability of neural networks via adversarial training and dual batch normalization. Comput. Biol. Med. 2021, 124, 103926.
21. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2020, 128, 336–359.
22. He, K.; Gan, C.; Li, Z.; Rekik, I.; Yin, Z.; Ji, W.; Zhang, Y.; Shen, D. Transformers in medical image analysis. Intell. Med. 2023, 3, 59–78. Available online: https://mednexus.org/doi/full/10.1016/j.imed.2022.07.002 (accessed on 12 November 2024).
23. Nillmani; Sharma, N.; Saba, L.; Khanna, N.N.; Kalra, M.K.; Fouda, M.M.; Suri, J.S. Segmentation-Based Classification Deep Learning Model Embedded with Explainable AI for COVID-19 Detection in Chest X-ray Scans. Diagnostics 2022, 12, 2132.
24. Zhong, Y.; Piao, Y.; Tan, B.; Liu, J. A multi-task fusion model based on a residual–Multi-layer perceptron network for mammographic breast cancer screening. Comput. Methods Programs Biomed. 2024, 247, 108101.
25. Dong, J.; Chen, J.; Xie, X.; Lai, J.; Chen, H. Adversarial Attacks and Defenses for Medical Image Analysis: Methods and Applications. ACM Comput. Surv. 2024, 57, 1–38.
26. Suara, S.; Jha, A.; Sinha, P.; Sekh, A. Is Grad-CAM Explainable in Medical Images? In Computer Vision and Image Processing, 1st ed.; Springer Nature: Cham, Switzerland, 2024; pp. 124–135.
27. Li, X.; Zheng, Y.; Ge, Z.; Dai, D.; Ju, Z. Attention Mechanism-Based Image Analysis for Medical Diagnostics. Neurocomputing 2022, 478, 31–45.
28. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 2011–2023.
29. Ismail, H.; Wu, T.; Roy, A.; Guestrin, C.; Xing, E. Benchmarking Deep Learning Interpretability in Time Series Predictions. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada, 6–12 December 2020; pp. 6441–6451. Available online: https://proceedings.neurips.cc/paper/2020/file/47a3893cc405396a5c30d91320572d6d-Paper.pdf (accessed on 12 November 2024).
30. Lakkaraju, H.; Rudin, C.; McCormick, T.H. An Empirical Study of the Accuracy-Explainability Trade-off in Machine Learning for Public Policy. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 2120–2130. Available online: https://facctconference.org/static/pdfs_2022/facct22-3533090.pdf (accessed on 12 November 2024).
Year | Confirmed Cases (in Millions) | Misdiagnosis Rate | Deaths (in Millions) |
---|---|---|---|
2010 | 220 | 1.400 | 46 |
2013 | 230 | 2.200 | 67.5 |
2016 | 250 | 3.000 | 94 |
2019 | 280 | 2.500 | 118 |
Publication | Interpretability Technique | Architecture | Results |
---|---|---|---|
Ren et al., 2021 [15] | Bayesian network | AlexNet, DenseNet121, Inception V-4, ResNet-50, Xception | Accuracy: 82.9% (highest, with ResNet-50); Interpretability: 75.9% (highest, with ResNet-50) |
Rajaraman et al., 2019 [16] | Grad-CAM and LIME | CNN, Visual Geometry Group-16 (VGG-16) | Accuracy: 96.2% (VGG-16), 94.1% (CNN); Interpretability: 91.8% (VGG-16), 87.3% (CNN) |
Aljawarneh and Al-Quraan, 2023 [17] | Adversarial Training, LRP, Attention Mechanism | Enhanced CNN, ResNet-50 | Accuracy: 82.8% (lowest) to 92.4% (highest); Interpretability: 79.6% (lowest) to 87.3% (highest) |
Siddiqi and Javaid, 2024 [18] | Grad-CAM, LIME, SHAP | ResNet-50, VGG-16, DenseNet, AlexNet, MobileNet | Accuracy: 99.39% (highest); Interpretability: 94.96% (highest) |
Model | Accuracy | Sensitivity | Specificity | AUC-ROC | MRS | Trade-Off |
---|---|---|---|---|---|---|
Base ResNet-50 | 0.90 | 0.92 | 0.88 | 0.93 | 0.00 | - |
LRP | 0.91 | 0.90 | 0.92 | 0.93 | 0.85 | - |
Adversarial Training | 0.90 | 0.87 | 0.79 | 0.87 | 0.76 | 0.79 |
CAM | 0.905 | 0.86 | 0.83 | 0.89 | 0.70 | - |
Attention Mechanism | 0.88 | 0.90 | 0.87 | 0.91 | 0.12 | 0.42 |