Survival Analysis as Imprecise Classification with Trainable Kernels
Abstract
1. Introduction
- We propose three training strategies and the corresponding survival models, iSurvM, iSurvQ, iSurvJ, as an alternative to kernel-based models such as the Beran estimator.
- The models can be trained with different attention mechanisms, implemented either by neural networks or by simple Gaussian kernels (an additional model, iSurvJ(G), uses a Gaussian kernel for the attention weights); an illustrative sketch contrasting the two weighting schemes is given after this list. The number of trainable parameters is determined by the key and query attention matrices and can be chosen arbitrarily.
- Additionally, unlike the Beran estimator, the proposed models impose no restrictions on the number of concurrent (tied) event times, making them more versatile. Moreover, a high proportion of censored data does not significantly degrade the accuracy of the proposed models, in contrast to the Beran estimator.
- No parametric assumptions are made in the proposed models.
- Various numerical experiments with real and synthetic datasets are conducted to compare the proposed survival models with each other under different conditions and to compare them with the Beran estimator using the concordance index and the Brier score. We compare the proposed models with the Beran estimator, but not with available transformer-based models [16,41,42], because the introduced models are regarded as alternatives to the kernel-based Beran estimator and can be incorporated into a more complex model as a component. The corresponding codes implementing the proposed models are publicly available at: https://github.com/NTAILab/iSurvMQJ (accessed on 14 September 2025).
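To make the distinction between fixed-kernel and trainable attention weights concrete, the following minimal sketch contrasts Gaussian-kernel weights of the Nadaraya–Watson type (as used in the Beran estimator and in iSurvJ(G)) with dot-product weights produced by key and query matrices. All function names, shapes, and the toy data are illustrative assumptions and do not reproduce the implementation in the linked repository.

```python
import numpy as np

def gaussian_attention_weights(x_new, X_train, tau=1.0):
    """Fixed Gaussian-kernel weights, as in Nadaraya-Watson regression
    and the Beran estimator (no trainable parameters)."""
    sq_dist = np.sum((X_train - x_new) ** 2, axis=1)
    logits = -sq_dist / tau
    w = np.exp(logits - logits.max())
    return w / w.sum()

def trainable_attention_weights(x_new, X_train, W_q, W_k):
    """Dot-product attention weights with trainable query/key matrices
    W_q, W_k (illustrative shapes: d x d_att)."""
    q = x_new @ W_q                      # query for the new point
    K = X_train @ W_k                    # keys for the training points
    logits = K @ q / np.sqrt(W_k.shape[1])
    w = np.exp(logits - logits.max())
    return w / w.sum()

# Toy usage: 5 training points with 3 features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(5, 3))
x_new = rng.normal(size=3)
print(gaussian_attention_weights(x_new, X_train, tau=0.5))
print(trainable_attention_weights(x_new, X_train,
                                  rng.normal(size=(3, 4)),
                                  rng.normal(size=(3, 4))))
```

In both cases the weights are nonnegative and sum to one, so they can be used to mix per-instance survival estimates; the trainable variant simply replaces the fixed bandwidth with learned projections.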
2. Related Work
3. Preliminaries
3.1. Survival Analysis
3.2. Attention Mechanism and the Nadaraya–Watson Regression
4. Survival Analysis as an Imprecise Multi-Label Classification Problem
5. Training Strategies and Three Survival Models
5.1. A General Form of Attention Weights
5.2. First Model
Algorithm 1 Training algorithm for iSurvM
5.3. Second Model
5.4. Third Model
Algorithm 2 Training algorithm for iSurvJ
5.5. Extended Intervals
6. Numerical Experiments
6.1. Description of Synthetic Data
6.2. Description of Real Data
6.3. Study of the Model Properties
6.3.1. Dependence on the Number of Features
6.3.2. Dependence on Parameter k
6.3.3. Dependence on the Number of Censored Observations
6.3.4. Intervals for Survival Functions
6.3.5. Unconditional Survival Functions
6.3.6. An Illustrative Comparative Example of iSurvJ(G) and Beran Estimator
6.4. Some Conclusions from Numerical Experiments
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
References
- Hosmer, D.; Lemeshow, S.; May, S. Applied Survival Analysis: Regression Modeling of Time to Event Data; John Wiley & Sons: Hoboken, NJ, USA, 2008. [Google Scholar]
- Jing, B.; Zhang, T.; Wang, Z.; Jin, Y.; Liu, K.; Qiu, W.; Ke, L.; Sun, Y.; He, C.; Hou, D.; et al. A deep survival analysis method based on ranking. Artif. Intell. Med. 2019, 98, 1–9. [Google Scholar] [CrossRef] [PubMed]
- Hao, L.; Kim, J.; Kwon, S.; Ha, I.D. Deep learning-based survival analysis for high-dimensional survival data. Mathematics 2021, 9, 1244. [Google Scholar] [CrossRef]
- Lee, E.; Wang, J. Statistical Methods for Survival Data Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2003. [Google Scholar]
- Hothorn, T.; Bühlmann, P.; Dudoit, S.; Molinaro, A.; van der Laan, M. Survival ensembles. Biostatistics 2006, 7, 355–373. [Google Scholar] [CrossRef]
- Wrobel, L.; Gudys, A.; Sikora, M. Learning rule sets from survival data. BMC Bioinform. 2017, 18, 285–297. [Google Scholar] [CrossRef] [PubMed]
- Zhao, L.; Feng, D. DNNSurv: Deep Neural Networks for Survival Analysis Using Pseudo Values. arXiv 2019, arXiv:1908.02337v2. [Google Scholar]
- Zhao, Z.; Zobolas, J.; Zucknick, M.; Aittokallio, T. Tutorial on survival modeling with applications to omics data. Bioinformatics 2024, 40, btae132. [Google Scholar] [CrossRef]
- Marinos, G.; Kyriazis, D. A Survey of Survival Analysis Techniques. In Proceedings of the HEALTHINF, Online, 11–13 February 2021; pp. 716–723. [Google Scholar] [CrossRef]
- Wang, P.; Li, Y.; Reddy, C. Machine Learning for Survival Analysis: A Survey. ACM Comput. Surv. (CSUR) 2019, 51, 1–36. [Google Scholar] [CrossRef]
- Wiegrebe, S.; Kopper, P.; Sonabend, R.; Bischl, B.; Bender, A. Deep learning for survival analysis: A review. Artif. Intell. Rev. 2024, 57, 1–34. [Google Scholar] [CrossRef]
- Ishwaran, H.; Kogalur, U. Random Survival Forests for R. R News 2007, 7, 25–31. [Google Scholar]
- Belle, V.V.; Pelckmans, K.; Suykens, J.; Huffel, S.V. Survival SVM: A practical scalable algorithm. In Proceedings of the ESANN, Bruges, Belgium, 23–25 April 2008; pp. 89–94. [Google Scholar]
- Chen, G.H. An Introduction to Deep Survival Analysis Models for Predicting Time-to-Event Outcomes. Found. Trends Mach. Learn. 2024, 17, 921–1100. [Google Scholar] [CrossRef]
- Arroyo, A.; Cartea, A.; Moreno-Pino, F.; Zohren, S. Deep attentive survival analysis in limit order books: Estimating fill probabilities with convolutional-transformers. Quant. Financ. 2024, 24, 35–57. [Google Scholar] [CrossRef]
- Hu, S.; Fridgeirsson, E.; van Wingen, G.; Welling, M. Transformer-based deep survival analysis. In Proceedings of the Survival Prediction-Algorithms, Challenges and Applications, PMLR, Palo Alto, CA, USA, 22–24 March 2021; pp. 132–148. [Google Scholar]
- Li, C.; Zhu, X.; Yao, J.; Huang, J. Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics. In Proceedings of the 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 21–25 August 2022; IEEE Computer Society: Los Alamitos, CA, USA, 2022; pp. 4256–4262. [Google Scholar] [CrossRef]
- Zhang, X.; Mehta, D.; Hu, Y.; Zhu, C.; Darby, D.; Yu, Z.; Merlo, D.; Gresle, M.; Van Der Walt, A.; Butzkueven, H.; et al. Adaptive transformer modelling of density function for nonparametric survival analysis. Mach. Learn. 2025, 114, 31. [Google Scholar] [CrossRef]
- Kaplan, E.; Meier, P. Nonparametric Estimation from Incomplete Observations. J. Am. Stat. Assoc. 1958, 53, 457–481. [Google Scholar] [CrossRef]
- Beran, R. Nonparametric regression with randomly censored survival data. In Technical Report; University of California: Berkeley, CA, USA, 1981. [Google Scholar]
- Chen, G.H. Survival kernets: Scalable and interpretable deep kernel survival analysis with an accuracy guarantee. J. Mach. Learn. Res. 2024, 25, 1–78. [Google Scholar]
- Evers, L.; Messow, C.M. Sparse kernel methods for high-dimensional survival data. Bioinformatics 2008, 24, 1632–1638. [Google Scholar] [CrossRef] [PubMed]
- Gefeller, O.; Michels, P. A review on smoothing methods for the estimation of the hazard rate based on kernel functions. In Computational Statistics: Volume 1, Proceedings of the 10th Symposium on Computational Statistics, Neuchatel, Switzerland, August 1992; Springer: Berlin/Heidelberg, Germany, 1992; pp. 459–464. [Google Scholar] [CrossRef]
- Cawley, G.; Talbot, N.; Janacek, G.; Peck, M. Bayesian kernel learning methods for parametric accelerated life survival analysis. In Proceedings of the First International Conference on Deterministic and Statistical Methods in Machine Learning, Sheffield, UK, 7–10 September 2004; pp. 37–55. [Google Scholar] [CrossRef]
- Li, H.; Luan, Y. Kernel Cox regression models for linking gene expression profiles to censored survival data. In Proceedings of the Pacific Symposium Biocomputing 2003; World Scientific: Singapore, 2002; pp. 65–76. [Google Scholar] [CrossRef]
- Rong, Y.; Zhao, S.D.; Zheng, X.; Li, Y. Kernel Cox partially linear regression: Building predictive models for cancer patients’ survival. Stat. Med. 2024, 43, 1–15. [Google Scholar] [CrossRef] [PubMed]
- Yang, H.; Zhu, H.; Ahn, M.; Ibrahim, J.G. Weighted functional linear Cox regression model. Stat. Methods Med. Res. 2021, 30, 1917–1931. [Google Scholar] [CrossRef]
- Cox, D. Regression models and life-tables. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 187–220. [Google Scholar] [CrossRef]
- Tutz, G.; Schmid, M. Modeling Discrete Time-to-Event Data; Springer: New York, NY, USA, 2016; Volume 3. [Google Scholar] [CrossRef]
- Suresh, K.; Severn, C.; Ghosh, D. Survival prediction models: An introduction to discrete-time modeling. BMC Med. Res. Methodol. 2022, 22, 207. [Google Scholar] [CrossRef]
- Kvamme, H.; Borgan, Ø. Continuous and discrete-time survival prediction with neural networks. Lifetime Data Anal. 2021, 27, 710–736. [Google Scholar] [CrossRef]
- Zhong, C.; Tibshirani, R. Survival analysis as a classification problem. arXiv 2019, arXiv:1909.11171v2. [Google Scholar] [CrossRef]
- Nadaraya, E. On estimating regression. Theory Probab. Its Appl. 1964, 9, 141–142. [Google Scholar] [CrossRef]
- Watson, G. Smooth regression analysis. Sankhya Indian J. Stat. Ser. A 1964, 26, 359–372. [Google Scholar]
- Luong, T.; Pham, H.; Manning, C. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics: Lisbon, Portugal, 2015; pp. 1412–1421. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
- Kvamme, H.; Borgan, O.; Scheel, I. Time-to-Event Prediction with Neural Networks and Cox Regression. J. Mach. Learn. Res. 2019, 20, 1–30. [Google Scholar]
- Coolen, F. An imprecise Dirichlet model for Bayesian analysis of failure data including right-censored observations. Reliab. Eng. Syst. Saf. 1997, 56, 61–68. [Google Scholar] [CrossRef]
- Coolen, F.; Yan, K. Nonparametric predictive inference with right-censored data. J. Stat. Plan. Inference 2004, 126, 25–54. [Google Scholar] [CrossRef]
- Mangili, F.; Benavoli, A.; de Campos, C.; Zaffalon, M. Reliable survival analysis based on the Dirichlet process. Biom. J. 2015, 57, 1002–1019. [Google Scholar] [CrossRef]
- Tang, Z.; Liu, L.; Chen, Z.; Ma, G.; Dong, J.; Sun, X.; Zhang, X.; Li, C.; Zheng, Q.; Yang, L.; et al. Explainable survival analysis with uncertainty using convolution-involved vision transformer. Comput. Med. Imaging Graph. 2023, 110, 102302. [Google Scholar] [CrossRef]
- Wang, Y.; Kong, X.; Bi, X.; Cui, L.; Yu, H.; Wu, H. ResDeepSurv: A Survival Model for Deep Neural Networks Based on Residual Blocks and Self-attention Mechanism. Interdiscip. Sci. Comput. Life Sci. 2024, 16, 405–417. [Google Scholar] [CrossRef]
- Salerno, S.; Li, Y. High-dimensional survival analysis: Methods and applications. Annu. Rev. Stat. Its Appl. 2023, 10, 25–49. [Google Scholar] [CrossRef]
- Bender, A.; Rügamer, D.; Scheipl, F.; Bischl, B. A general machine learning framework for survival analysis. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Ghent, Belgium, 14–18 September 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 158–173. [Google Scholar] [CrossRef]
- Emmert-Streib, F.; Dehmer, M. Introduction to Survival Analysis in Practice. Mach. Learn. Knowl. Extr. 2019, 1, 1013–1038. [Google Scholar] [CrossRef]
- Chen, G.H. Deep kernel survival analysis and subject-specific survival time prediction intervals. In Proceedings of the Machine Learning for Healthcare Conference, PMLR, Durham, NC, USA, 7–8 August 2020; pp. 537–565. [Google Scholar]
- Yang, X.; Qiu, H. Deep Gated Neural Network With Self-Attention Mechanism for Survival Analysis. IEEE J. Biomed. Health Inform. 2024, 29, 2945–2956. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z.; Sun, J. Survtrace: Transformers for survival analysis with competing events. In Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, Chicago, IL, USA, 7–10 August 2022; pp. 1–9. [Google Scholar] [CrossRef]
- Jiang, S.; Suriawinata, A.A.; Hassanpour, S. MHAttnSurv: Multi-head attention for survival prediction using whole-slide pathology images. Comput. Biol. Med. 2023, 158, 106883. [Google Scholar] [CrossRef] [PubMed]
- Teng, J.; Yang, L.; Wang, S.; Yu, J. A Semi-Supervised Transformer Survival Prediction Model for Lung Cancer. Adv. Funct. Mater. 2025, 35, 2419005. [Google Scholar] [CrossRef]
- Wang, Z.; Gao, Q.; Yi, X.; Zhang, X.; Zhang, Y.; Zhang, D.; Liò, P.; Bain, C.; Bassed, R.; Li, S.; et al. Surformer: An interpretable pattern-perceptive survival transformer for cancer survival prediction from histopathology whole slide images. Comput. Methods Programs Biomed. 2023, 241, 107733. [Google Scholar] [CrossRef] [PubMed]
- Yao, Z.; Chen, T.; Meng, L.; Wong, K.C. A Multi-head Attention Transformer Framework for Oesophageal Cancer Survival Prediction. In Proceedings of the 2024 4th International Conference on Artificial Intelligence, Robotics, and Communication (ICAIRC), Xiamen, China, 27–29 December 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 309–313. [Google Scholar] [CrossRef]
- Hill, B.M. De Finetti's theorem, induction, and A(n), or Bayesian nonparametric predictive inference (with discussion). Bayesian Stat. 1988, 3, 211–241. [Google Scholar]
- Walley, P. Inferences from multinomial data: Learning about a bag of marbles. J. R. Stat. Soc. Ser. B 1996, 58, 3–57. [Google Scholar] [CrossRef]
- Harrell, F.; Califf, R.; Pryor, D.; Lee, K.; Rosati, R. Evaluating the yield of medical tests. J. Am. Med. Assoc. 1982, 247, 2543–2546. [Google Scholar] [CrossRef]
- May, M.; Royston, P.; Egger, M.; Justice, A.; Sterne, J. Development and validation of a prognostic model for survival time data: Application to prognosis of HIV positive patients treated with antiretroviral therapy. Stat. Med. 2004, 23, 2375–2398. [Google Scholar] [CrossRef]
- Uno, H.; Cai, T.; Pencina, M.; D’Agostino, R.; Wei, L.J. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat. Med. 2011, 30, 1105–1117. [Google Scholar] [CrossRef]
- Brier, G. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 1950, 78, 1–3. [Google Scholar] [CrossRef]
- Graf, E.; Schmoor, C.; Sauerbrei, W.; Schumacher, M. Assessment and comparison of prognostic classification schemes for survival data. Stat. Med. 1999, 18, 2529–2545. [Google Scholar] [CrossRef]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
- Rubinstein, R.; Kroese, D. Simulation and the Monte Carlo Method, 2nd ed.; Wiley: Hoboken, NJ, USA, 2008; p. 345. [Google Scholar] [CrossRef]
- Smith, N.; Tromble, R. Sampling uniformly from the unit simplex. In Technical Report 29; Johns Hopkins University: Baltimore, MD, USA, 2004. [Google Scholar]
- Gyorfi, L.; Kohler, M.; Krzyzak, A.; Walk, H. A Distribution-Free Theory of Nonparametric Regression; Springer: Berlin/Heidelberg, Germany, 2002. [Google Scholar] [CrossRef]
- Lei, Y.; Dogan, U.; Zhou, D.X.; Kloft, M. Data-dependent generalization bounds for multi-class classification. IEEE Trans. Inf. Theory 2019, 65, 2995–3021. [Google Scholar] [CrossRef]
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631. [Google Scholar] [CrossRef]
- Demsar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
- Lee, C.; Zame, W.; Yoon, J.; van der Schaar, M. Deephit: A deep learning approach to survival analysis with competing risks. In Proceedings of the 32nd Association for the Advancement of Artificial Intelligence (AAAI) Conference, New Orleans, LA, USA, 2–7 February 2018; pp. 1–8. [Google Scholar] [CrossRef]
- Katzman, J.; Shaham, U.; Cloninger, A.; Bates, J.; Jiang, T.; Kluger, Y. DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med. Res. Methodol. 2018, 18, 1–12. [Google Scholar] [CrossRef]
- Poché, A.; Hervier, L.; Bakkay, M.C. Natural example-based explainability: A survey. In Proceedings of the World Conference on Explainable Artificial Intelligence, Lisbon, Portugal, 26–28 July 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 24–47. [Google Scholar] [CrossRef]
- Destercke, S.; Antoine, V. Combining Imprecise Probability Masses with Maximal Coherent Subsets: Application to Ensemble Classification. In Advances in Intelligent Systems and Computing, Proceedings of the Synergies of Soft Computing and Statistics for Intelligent Data Analysis; Springer: Berlin/Heidelberg, Germany, 2013; Volume 190, pp. 27–35. [Google Scholar] [CrossRef]
Concordance index (C-index) for the Beran estimator and the proposed models on the considered datasets (higher is better; a minimal C-index computation sketch follows the table).

Dataset | Beran | iSurvM | iSurvQ | iSurvJ | iSurvJ(G) |
---|---|---|---|---|---|
Veterans | 0.7040 | 0.7103 | 0.7295 | 0.7196 | 0.7003 |
AIDS | 0.7529 | 0.7319 | 0.7359 | 0.7139 | 0.7483 |
Breast Cancer | 0.6519 | 0.6344 | 0.6240 | 0.6487 | 0.6600 |
WHAS500 | 0.7468 | 0.7632 | 0.7596 | 0.7616 | 0.7575 |
GBSG2 | 0.6730 | 0.6827 | 0.6883 | 0.6863 | 0.6706 |
BLCD | 0.5009 | 0.5068 | 0.4835 | 0.5067 | 0.5080 |
LND | 0.4695 | 0.5264 | 0.5663 | 0.5730 | 0.4229 |
GCD | 0.4535 | 0.5201 | 0.5717 | 0.5690 | 0.4266 |
CML | 0.6410 | 0.6633 | 0.6806 | 0.6917 | 0.6424 |
Rossi | 0.5817 | 0.6088 | 0.6068 | 0.5853 | 0.6186 |
METABRIC | 0.6261 | 0.6410 | 0.6418 | 0.6470 | 0.6407 |
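For reference, the concordance index reported above is the fraction of comparable pairs that a model ranks correctly. The function below is a minimal sketch of Harrell's C-index; it is not the evaluation code used in the experiments, and the toy data are purely illustrative.

```python
import numpy as np

def harrell_c_index(event_time, event_observed, risk_score):
    """Minimal Harrell's C-index: among comparable pairs, count how often
    the subject with the shorter observed event time has the higher
    predicted risk; ties in risk count as 0.5."""
    event_time = np.asarray(event_time, dtype=float)
    event_observed = np.asarray(event_observed, dtype=bool)
    risk_score = np.asarray(risk_score, dtype=float)

    concordant, comparable = 0.0, 0
    n = len(event_time)
    for i in range(n):
        if not event_observed[i]:
            continue                      # comparable pairs are anchored at events
        for j in range(n):
            if event_time[j] > event_time[i]:
                comparable += 1
                if risk_score[i] > risk_score[j]:
                    concordant += 1
                elif risk_score[i] == risk_score[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy example: higher risk should correspond to earlier events (C-index = 1.0).
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1]))
```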
Brier score for the Beran estimator and the proposed models on the same datasets (lower is better; a simplified IPCW Brier score sketch follows the table).

Dataset | Beran | iSurvM | iSurvQ | iSurvJ | iSurvJ(G) |
---|---|---|---|---|---|
Veterans | 0.1373 | 0.1189 | 0.1221 | 0.1229 | 0.1365 |
AIDS | 0.0716 | 0.0791 | 0.0805 | 0.0706 | 0.0678 |
Breast Cancer | 0.1853 | 0.2539 | 0.2377 | 0.2425 | 0.2071 |
WHAS500 | 0.2174 | 0.1896 | 0.1900 | 0.1883 | 0.2186 |
GBSG2 | 0.2074 | 0.2019 | 0.1995 | 0.1998 | 0.2186 |
BLCD | 0.2952 | 0.2701 | 0.2783 | 0.2800 | 0.2716 |
LND | 0.2423 | 0.2332 | 0.2194 | 0.2133 | 0.2180 |
GCD | 0.1996 | 0.1911 | 0.1860 | 0.1850 | 0.1991 |
CML | 0.1456 | 0.1325 | 0.1372 | 0.1339 | 0.1488 |
Rossi | 0.3086 | 0.1068 | 0.1083 | 0.1090 | 0.1084 |
METABRIC | 0.2015 | 0.2002 | 0.1980 | 0.1952 | 0.2078 |
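The Brier score for censored data is usually computed with inverse-probability-of-censoring weights in the spirit of Graf et al. The sketch below is a simplified, single-time-point version of that idea; it is an assumption-laden illustration (including the toy data and the handling of the censoring distribution at observed event times), not the scoring code used to produce the table.

```python
import numpy as np

def km_survival(times, event):
    """Kaplan-Meier survival estimator; `event` marks which observations
    are treated as events. Returns a right-continuous step function."""
    times = np.asarray(times, float)
    event = np.asarray(event, bool)
    uniq = np.unique(times[event])
    surv, s = [], 1.0
    for t in uniq:
        at_risk = np.sum(times >= t)
        d = np.sum((times == t) & event)
        s *= 1.0 - d / at_risk
        surv.append(s)
    def S(t):
        s_vals = np.concatenate(([1.0], surv))
        return s_vals[np.searchsorted(uniq, t, side="right")]
    return S

def ipcw_brier_score(t, surv_pred, times, delta):
    """Simplified Graf-style Brier score at time t with inverse-probability-
    of-censoring weights; surv_pred[i] is the predicted S(t | x_i).
    Simplification: uses G(T_i) instead of the left limit G(T_i-)."""
    times = np.asarray(times, float)
    delta = np.asarray(delta, bool)
    surv_pred = np.asarray(surv_pred, float)
    G = km_survival(times, ~delta)        # censoring survival function
    died = (times <= t) & delta           # event observed before t
    alive = times > t                     # still at risk at t
    score = np.zeros_like(surv_pred)
    score[died] = surv_pred[died] ** 2 / np.maximum(G(times[died]), 1e-12)
    score[alive] = (1.0 - surv_pred[alive]) ** 2 / max(G(t), 1e-12)
    return score.mean()

# Toy example with four subjects and predicted S(t=5 | x).
print(ipcw_brier_score(5.0, [0.2, 0.6, 0.8, 0.9],
                       times=[3, 4, 6, 7], delta=[1, 0, 1, 0]))
```

Observations censored before t contribute zero, which is the standard convention in this weighting scheme; the paper's experiments report the score over a grid of time points rather than a single t.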