Explainable Survival Analysis of Censored Clinical Data Using a Neural Network Approach
Abstract
1. Introduction
2. Materials and Methods
2.1. Survival Analysis
- If the regression coefficient $\beta_i$ is < 0 (hazard ratio $HR < 1$), when the covariate increases, the event hazard decreases, and the survival time increases.
- If $\beta_i$ is = 0 ($HR = 1$), the covariate has no effect on the event hazard.
- If $\beta_i$ is > 0 ($HR > 1$), when the covariate increases, the event hazard increases, and the survival time decreases.
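The three cases above follow from the Cox proportional hazards relation between a coefficient and its hazard ratio, $HR = \exp(\beta)$. A minimal sketch is given below; the function name `hazard_ratio` and the example coefficient values are illustrative assumptions, not taken from the study.

```python
import math


def hazard_ratio(beta: float, delta: float = 1.0) -> float:
    """Hazard ratio for an increase of `delta` in one covariate.

    Assumes a Cox proportional hazards model, where the hazard is
    multiplied by exp(beta * delta) when the covariate grows by delta.
    """
    return math.exp(beta * delta)


def effect_on_hazard(beta: float) -> str:
    """Map the sign of the coefficient to the three cases listed above."""
    hr = hazard_ratio(beta)
    if hr < 1.0:
        return "hazard decreases"
    if hr > 1.0:
        return "hazard increases"
    return "no effect"


# Illustrative coefficients only (hypothetical values).
for beta in (-0.5, 0.0, 0.5):
    print(f"beta={beta:+.1f}  HR={hazard_ratio(beta):.3f}  {effect_on_hazard(beta)}")
```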
2.2. Dataset
2.2.1. Synthetic Dataset Analysis
- (i) We designed the time axis, including k points in a time interval T. Internal points were randomly generated from a uniform distribution. We generated the cumulative distribution function (CDF) of event occurrences by randomly sampling CDF values from a uniform distribution and sorting them in ascending order to simulate a monotonically increasing function.
- (ii) The obtained CDF samples were fitted with a cubic smoothing spline, obtaining the simulated CDF.
- (iii) We calculated the probability density function (PDF) as the derivative of the CDF.
- (iv) We obtained the baseline survival function as (1 − CDF).
- (v) Finally, we computed the baseline hazard by dividing the PDF by the survival function.
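The steps above can be sketched with NumPy and SciPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name, the default values of `k`, `T`, and the smoothing factor, the pinning of the CDF to 0 and 1 at the ends of the interval, and the running-maximum step that enforces monotonicity after smoothing are all choices made here. The survival function is taken as the complement of the CDF, which is its standard definition.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def simulate_baseline(k=10, T=100.0, n_grid=200, smooth=1e-3, seed=0):
    """Simulate a baseline CDF, PDF, survival function and hazard on [0, T]."""
    rng = np.random.default_rng(seed)

    # (i) k time points: fixed endpoints, uniformly drawn internal points.
    t_knots = np.concatenate(([0.0], np.sort(rng.uniform(0.0, T, k - 2)), [T]))
    # CDF samples: uniform draws sorted in ascending order.
    # Assumption: the CDF is pinned to 0 at t=0 and to 1 at t=T.
    cdf_knots = np.concatenate(([0.0], np.sort(rng.uniform(0.0, 1.0, k - 2)), [1.0]))

    # (ii) Cubic smoothing spline through the sampled CDF values.
    spline = UnivariateSpline(t_knots, cdf_knots, k=3, s=smooth)
    t = np.linspace(0.0, T, n_grid)
    # Assumption: a running maximum and clipping keep the smoothed curve
    # a valid (non-decreasing, [0, 1]-valued) CDF.
    cdf = np.clip(np.maximum.accumulate(spline(t)), 0.0, 1.0)

    # (iii) PDF as the numerical derivative of the CDF (uniform grid step).
    pdf = np.gradient(cdf, t[1] - t[0])

    # (iv) Survival function S(t) = 1 - CDF(t).
    surv = 1.0 - cdf

    # (v) Hazard h(t) = PDF(t) / S(t); the floor avoids division by zero
    # where the survival function reaches 0.
    hazard = pdf / np.maximum(surv, 1e-8)
    return t, cdf, pdf, surv, hazard


t, cdf, pdf, surv, hazard = simulate_baseline()
```

Because the survival function reaches zero at the end of the interval, the hazard grows very large near `T`; in practice the last grid points may need to be discarded or the floor adjusted.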
2.2.2. Clinical Dataset
- The quantification of hepatic and cardiac iron overload by the T2* MRI technique, with the subsequent conversion of hepatic T2* values into liver iron concentration (LIC) values;
- The measurement of left ventricular (LV) end-diastolic volume index (EDVI), LV mass index (MI), LV ejection fraction (EF), right ventricular (RV) EDVI, and RV EF from cine images;
- The measurement of left atrial (LA) and right atrial (RA) area indices (AI) from cine images;
- The detection of replacement myocardial fibrosis by the late gadolinium enhancement technique;
- An assessment of serum ferritin levels within one month of the MRI scan.
- Both continuous and categorical variables;
- The censoring variable: a categorical variable that describes whether the patient is censored, that is, whether the event of interest (i.e., cardiac event) has occurred;
- The time variable: a continuous variable that represents the time at which the event occurred or, for patients who did not experience the event, the last time that they were seen.
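As an illustration of how the censoring and time variables are typically encoded, the sketch below builds a toy cohort with two parallel arrays. All values are hypothetical and are not drawn from the clinical dataset; the names `time`, `event`, and `censoring_rate` are assumptions made for this example.

```python
import numpy as np

# Hypothetical toy cohort (arbitrary time units), not real patient data.
# time[i]: time of the event, or last follow-up for censored patients.
time = np.array([5.0, 8.0, 3.0, 10.0])
# event[i]: True if the cardiac event was observed, False if censored.
event = np.array([True, False, True, False])


def censoring_rate(event: np.ndarray) -> float:
    """Fraction of patients for whom the event was not observed."""
    return float(1.0 - np.mean(event))


print(f"censored: {censoring_rate(event):.0%}")
```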
2.3. Data Pre-Processing
2.4. Cox-Net Model
2.5. Synthetic Data Generation
2.6. Model Training and Validation
2.7. Model Evaluation: C-Index
- If both i and j are not censored, then we can determine when both patients developed the disease. Denoting by $\eta$ the predicted risk score and by $T$ the observed time, the pair $(i, j)$ is a concordant pair if $\eta_i > \eta_j$ and $T_i < T_j$, and it is a discordant pair if $\eta_i > \eta_j$ and $T_i > T_j$.
- If both i and j are censored, then we do not know who developed the disease first (if at all), so we do not consider this pair in the computation.
- If exactly one of i and j is censored, we only observe one disease onset. For example, if patient i develops the disease at time $T_i$ and patient j is censored at time $T_j$, then two different situations can arise:
- – If $T_j < T_i$, then we do not know for sure who developed the disease first, so we do not consider this pair in the computation.
- – If $T_j > T_i$, then we know for sure that patient i developed the disease first. Hence, $(i, j)$ is a concordant pair if $\eta_i > \eta_j$, and it is a discordant pair if $\eta_i < \eta_j$.
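The pair-counting rules above can be written as a short reference implementation. This is a minimal sketch rather than the evaluation code used in the study: the function name and argument names are assumptions, pairs with tied times are skipped for simplicity, and pairs with tied risk scores are counted as half-concordant, which is a common convention but is not stated in the text.

```python
import numpy as np


def concordance_index(time, event, risk):
    """C-index from observed times, event flags and predicted risk scores.

    A pair is comparable when the patient with the shorter time had an
    observed event; a higher risk score for that patient is concordant.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)

    concordant = discordant = tied = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            # j is comparable with i only if j is known to outlast i:
            # either j's event came later or j was censored later.
            if time[j] <= time[i]:
                continue
            if risk[i] > risk[j]:
                concordant += 1
            elif risk[i] < risk[j]:
                discordant += 1
            else:
                tied += 1  # assumption: risk ties count as half-concordant

    comparable = concordant + discordant + tied
    if comparable == 0:
        return float("nan")
    return (concordant + 0.5 * tied) / comparable


# Toy example (hypothetical values): five comparable pairs, four concordant.
c = concordance_index(
    time=[2, 4, 6, 8],
    event=[True, True, False, True],
    risk=[0.9, 0.7, 0.8, 0.1],
)
print(round(c, 3))
```

The double loop is quadratic in the number of patients, which is acceptable for illustration; optimized implementations are available in dedicated survival analysis libraries.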
2.8. Explanation by Permutation Feature Importance
3. Results
3.1. Synthetic Dataset Results
3.2. MIOT Dataset Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
CDF | cumulative distribution function |
c-index | concordance index |
HR | hazard ratio |
MIOT | Myocardial Iron Overload in Thalassemia |
MRI | magnetic resonance imaging |
NN | neural networks |
PDF | probability density function |
PFI | permutation feature importance |
TM | thalassemia major |
Variable | Mean ± SD |
---|---|
Age | 29.3 ± 9.1 (years) |
Left Ventricular End-Diastolic Volume Index (LVEDVI) | 86.2 ± 18.8 (mL/m²) |
Left Ventricular Ejection Fraction (LVEF) | 62.5 ± 5.9% |
Left Ventricular Mass Index (LVMI) | 58.8 ± 12.8 (g/m²) |
Right Ventricular End-Diastolic Volume Index (RVEDVI) | 82.9 ± 19.4 (mL/m²) |
Right Ventricular Ejection Fraction (RVEF) | 61.6 ± 6.7% |
Left Atrium Area Index (LAAI) | 12.9 ± 2.7 (cm²/m²) |
Right Atrium Area Index (RAAI) | 12.3 ± 2.4 (cm²/m²) |
Cardiac T2* | 28.1 ± 12.3 (ms) |
Liver Iron Concentration (LIC) | 9.1 ± 9.0 (mg/g dw) |
Ferritin | 1594 ± 1394 (ng/mL) |
Replacement Myocardial Fibrosis | 21% |
Male Sex | 47% |
Censored (%) | Cox-Net Fold 1 | Cox-Net Fold 2 | Cox-Net Fold 3 | Cox-Net Mean ± SD | Cox Regression Fold 1 | Cox Regression Fold 2 | Cox Regression Fold 3 | Cox Regression Mean ± SD |
---|---|---|---|---|---|---|---|---|
0% | 0.832 | 0.812 | 0.834 | 0.826 ± 0.009 | 0.832 | 0.819 | 0.836 | 0.828 ± 0.007 |
20% | 0.883 | 0.863 | 0.852 | 0.866 ± 0.013 | 0.838 | 0.831 | 0.833 | 0.834 ± 0.003 |
35% | 0.832 | 0.828 | 0.798 | 0.819 ± 0.015 | 0.793 | 0.776 | 0.759 | 0.776 ± 0.014 |
50% | 0.828 | 0.859 | 0.827 | 0.838 ± 0.015 | 0.756 | 0.764 | 0.760 | 0.760 ± 0.003 |
65% | 0.808 | 0.862 | 0.835 | 0.835 ± 0.022 | 0.749 | 0.739 | 0.726 | 0.738 ± 0.009 |
80% | 0.857 | 0.862 | 0.813 | 0.844 ± 0.022 | 0.763 | 0.756 | 0.702 | 0.740 ± 0.027 |
95% | 0.817 | 0.850 | 0.739 | 0.802 ± 0.047 | 0.745 | 0.660 | 0.720 | 0.708 ± 0.036 |
98% | 0.857 | 0.745 | 0.863 | 0.821 ± 0.054 | 0.697 | 0.771 | 0.758 | 0.742 ± 0.032 |
Model | Fold 1 | Fold 2 | Fold 3 | Mean ± SD | Test Set |
---|---|---|---|---|---|
Cox-Net | 0.771 | 0.856 | 0.806 | 0.811 ± 0.035 | 0.795 |
Cox Regression | 0.736 | 0.806 | 0.829 | 0.790 ± 0.040 | 0.690 |
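The Mean ± SD column of the table above can be reproduced from the three fold values. In the sketch below, the reported figures appear consistent with the population standard deviation (`ddof=0`); this is an inference from the numbers and is not stated in the source. The helper name `summarize` is an assumption.

```python
import numpy as np

# Fold values copied from the table above.
folds = {
    "Cox-Net": [0.771, 0.856, 0.806],
    "Cox Regression": [0.736, 0.806, 0.829],
}


def summarize(values):
    """Return (mean, population SD) of the fold scores."""
    v = np.asarray(values, dtype=float)
    # Assumption: ddof=0, which matches the reported SD values.
    return float(v.mean()), float(v.std(ddof=0))


for model, values in folds.items():
    mean, sd = summarize(values)
    print(f"{model}: {mean:.3f} ± {sd:.3f}")
```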
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
De Santi, L.A.; Orlandini, F.; Positano, V.; Pistoia, L.; Sorrentino, F.; Messina, G.; Roberti, M.G.; Missere, M.; Schicchi, N.; Vallone, A.; Santarelli, M.F.; Clemente, A.; Meloni, A. Explainable Survival Analysis of Censored Clinical Data Using a Neural Network Approach. BioMedInformatics 2025, 5, 17. https://doi.org/10.3390/biomedinformatics5020017