Article

Individual Survival Distributions Generated by Multi-Task Logistic Regression Yield a New Perspective on Molecular and Clinical Prognostic Factors in Gastric Adenocarcinoma

1 Department of Surgery, Faculty of Medicine and Dentistry, College of Health Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada
2 Department of Oncology, Faculty of Medicine and Dentistry, College of Health Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada
3 Department of Mathematical and Statistical Sciences, Faculty of Science, College of Natural and Applied Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada
4 Department of Computing Science, Faculty of Science, College of Natural and Applied Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada
5 Alberta Machine Intelligence Institute, Edmonton, AB T5J 3B1, Canada
* Author to whom correspondence should be addressed.
Cancers 2024, 16(4), 786; https://doi.org/10.3390/cancers16040786
Submission received: 6 December 2023 / Revised: 29 January 2024 / Accepted: 12 February 2024 / Published: 15 February 2024
(This article belongs to the Section Cancer Informatics and Big Data)

Simple Summary

Traditional survival models estimate risk across many patients, and thus, translating survival outcomes to individual patients is a difficult task. Individual survival distributions (ISDs) provide an accurate survival curve for each individual patient. In this study, we demonstrate that ISDs established with multi-task logistic regression (MTLR) models provide accurate individual survival predictions for gastric cancer. Because MTLR is not bound by the proportional hazards assumption, it can show how the effect of each variable changes over time. Using this property, we demonstrate that the degree to which a tumour has a favourable immune system interaction has the most relevant long-term survival effects.

Abstract

Recent advances in our understanding of gastric cancer biology have prompted a shift towards more personalized therapy. However, results are based on population-based survival analyses, which evaluate the average survival effects of entire treatment groups or single prognostic variables. This study uses a personalized survival modelling approach called individual survival distributions (ISDs) with the multi-task logistic regression (MTLR) model to provide novel insight into personalized survival in gastric adenocarcinoma. We performed a pooled analysis using 1043 patients from a previously characterized database annotated with molecular subtypes from the Cancer Genome Atlas, Asian Cancer Research Group, and tumour microenvironment (TME) score. The MTLR model achieved a 5-fold cross-validated concordance index of 72.1 ± 3.3%. This model found that the TME score and chemotherapy had similar survival effects over the entire study time. The TME score provided the greatest survival benefit beyond a 5-year follow-up. Stage III and Stage IV disease contributed the greatest negative effect on survival. The MTLR model weights were significantly correlated with the Cox model coefficients (Pearson coefficient = 0.86, p < 0.0001). We illustrate how ISDs can accurately predict the survival time for each patient, which is especially relevant in cases of molecular subtype heterogeneity. This study provides evidence that the TME score is principally associated with long-term survival in gastric adenocarcinoma. Additional external validation and investigation into the clinical utility of this ISD model in gastric cancer are areas of future research.

1. Introduction

Recent advances in our understanding of gastric cancer biology have prompted a shift towards personalized management. Transcriptome- or multi-omics-based molecular classification systems, such as those proposed by the Cancer Genome Atlas (TCGA) and the Asian Cancer Research Group (ACRG), have identified tumours with distinct molecular characteristics with potential implications for personalized anti-cancer therapies [1,2,3]. The identification of microsatellite instability (MSI) due to deficiency in mismatch repair proteins has enabled the use of immune checkpoint inhibitor therapy in metastatic and neoadjuvant settings [4,5,6]. The possible omission of chemotherapy in tumours with MSI is another intriguing and ongoing development [7,8,9,10,11].
Despite these advances, the outcomes of clinical trials and treatment allocation in the clinic still rely on population-based survival analysis, which evaluates the average survival effects of entire treatment groups or prognostic variables. One intriguing method to augment personalized cancer treatment is to use individual survival distributions (ISDs) [12]. ISDs provide survival estimates and visual curves for each patient based on their unique tumour and clinical characteristics. These models offer advantages over other personalized survival estimation methods, such as nomograms, because they can provide survival estimates at every future time point, and they provide an additional visual interpretation of survival probability. Nomograms function as single-time risk estimators that serve to measure the probability of whether a patient survives at, for instance, one year or three years. They do not provide any information regarding the chance of survival outside of the designated periods.
Furthermore, some ISD models, such as multi-task logistic regression, are not bound by the proportional hazards assumption, which assumes that the effect of every variable remains constant over time [13,14]. It is unlikely that every variable related to oncologic outcomes follows this assumption. For example, a chemotherapeutic treatment likely provides a proportionally greater survival effect around the time the treatment is administered, but longer-term survival may be more associated with the disease stage. Because MTLR is not bound by the proportional hazards assumption, the model can predict that patient P1 is more likely to survive 1 year than patient P2, but P2 is more likely to survive 5 years—the survival curves can cross [15]. We can also determine how the relative survival effects of a given variable change over time.
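For reference, the proportional hazards assumption can be written out explicitly. The block below gives the standard textbook form of the Cox model, not notation taken from this study; the symbols h_0 (baseline hazard), β (coefficient vector), S_0 (baseline survival), and x_1, x_2 (covariates of two patients) are introduced here purely for illustration.

```latex
\begin{align*}
h(t \mid x) &= h_0(t)\,\exp(\beta^{\top} x), \\
\frac{h(t \mid x_1)}{h(t \mid x_2)} &= \exp\!\big(\beta^{\top}(x_1 - x_2)\big), \\
S(t \mid x) &= S_0(t)^{\exp(\beta^{\top} x)}.
\end{align*}
```

Because the hazard ratio in the second line does not depend on t, the survival curves of two patients under a Cox model keep the same ordering at every time point and cannot cross. A model with time-specific weights, such as MTLR, carries no such constraint.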
Here, we construct ISDs from a pooled set of gastric cancer patients annotated with integrated molecular classifications from the TCGA, ACRG, and tumour microenvironment (TME) score classification schemes [1,2,16]. Using MTLR, we identify a novel understanding of how clinical and molecular features vary in their survival effects over time. We also provide examples of how ISDs may be presented for individual patients and explain how survival curves can facilitate counterfactual reasoning.

2. Materials and Methods

2.1. Dataset

We used previously characterized data from a pooled integrated molecular classification dataset, which contains 2202 gastric adenocarcinoma patients from 11 publicly available datasets [17]. In this prior work, supervised machine learning models were trained to classify each patient according to their TCGA, ACRG, and TME molecular subtypes. These models were then used to assign molecular subtypes to all 2202 patients. We included the 1043 of 2202 patients who had complete data for our variables of interest, which comprised the TCGA, ACRG, and TME molecular subtype scores and age, stage, sex, Lauren classification, tumour location, and chemotherapy exposure status. Patients with missing data for these variables were excluded.

2.2. Models and Statistical Analysis

Individual survival distribution models for overall survival were implemented as suggested by Haider et al. [18]. The codebase for implementation in R can be found at https://github.com/haiderstats/ISDEvaluation (accessed on 16 June 2021). We tested the performance of Cox with Kalbfleisch–Prentice extensions (Cox-KP), ElasticNet Cox (CoxEN-KP), random survival forest (RSF), accelerated failure time (AFT), and multi-task logistic regression (MTLR) models for our survival prediction task. Model performance and optimization were assessed using 5-fold cross-validation (CV) [12].
The models were selected with consideration of their concordance index (C-Index), 1-calibration, and D-calibration metrics [12]. The concordance index is a measure of a model’s discrimination: the proportion of comparable pairs of patients for which the prediction (i.e., model score/risk) is concordant with the true outcome [18,19]. Calibration is a measure of how well model predictions match the true observed event rate. The 1-calibration metric is simply the Hosmer–Lemeshow goodness-of-fit test, which evaluates whether the predicted rate of an event is statistically similar to the true event rate at a specific time point [20]. Distribution calibration (D-calibration) assesses whether the proportion of patients predicted to experience an event is uniformly distributed in each decile using a Pearson’s χ2 test. This metric provides a means to determine if we should believe the predictions made by an ISD model [18]. We report the p value from these tests, for which the null hypothesis is that the model is calibrated. If p < 0.05, then the model is not calibrated for that calibration metric.
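The pairwise definition of the concordance index can be made concrete with a short sketch. The function below is a minimal, illustrative implementation of Harrell's C-index for right-censored data; it is not the code used in this study (which relied on the R codebase cited above), and the function name, argument names, and the choice to skip pairs with tied follow-up times are assumptions made for the example.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  : observed follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    risks  : model risk score (higher = expected to die sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter follow-up ended in an
        # observed death; tied times are skipped (a simplification).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0      # earlier death had the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5      # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the second one censored.
example = concordance_index(
    times=[5, 10, 12, 20],
    events=[1, 0, 1, 1],
    risks=[0.9, 0.4, 0.1, 0.2],
)
```

In the toy data, pairs whose shorter follow-up belongs to the censored patient are discarded, which is why censoring reduces the number of pairs that contribute to the index.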
The chosen time-to-event models represent some common survival modelling methods but are not exhaustive. Detailed descriptions of these models have been provided in previous studies that used Cox-KP, CoxEN-KP, RSF, AFT, and MTLR models as benchmark comparisons for ISDs [12,18]. We have included brief descriptions of these models below. The Cox model is a semi-parametric model that provides a risk score. To produce an ISD, the Kalbfleisch–Prentice estimator is employed to estimate a baseline hazard function [21]. The CoxEN-KP model uses ElasticNet (EN) regularization of the negative log of the partial likelihood in an effort to improve the model fit [22,23]. Cox-KP and CoxEN-KP provide excellent discrimination (C-Index) but comparatively worse calibration [12]. The RSF is a non-parametric ensemble estimator that does not obey the proportional hazards assumption [24]. RSF is most effective in higher-dimensional datasets with low censor rates [12]. Similar to Cox, RSF provides excellent discrimination but poor calibration compared to MTLR. AFT is a parametric survival model based on the Weibull distribution and is effective in simple, low-dimensional survival prediction tasks [25]. MTLR generates individual patient survival distributions for a specific number of times based on the number of uncensored patients in a dataset [26]. An empirical study demonstrated that MTLR provided equivalent or superior discrimination compared to other survival models and provided excellent model calibration in both low- and high-dimensional datasets [12,15,18].
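To illustrate how MTLR turns time-specific weights into an individual survival curve, the sketch below follows the formulation of Yu et al. [26], in which each time point has its own weight vector and the probability of each "death in interval k" sequence is obtained from a softmax over cumulative scores. It is a simplified illustration, not the MTLR R package: training, regularization, and the selection of time points are omitted, and the function name and the toy weights are hypothetical.

```python
import math


def mtlr_survival_curve(x, weights, biases):
    """Survival probabilities at each of m time points for one patient.

    x       : feature vector of the patient
    weights : m weight vectors, one per time point
    biases  : m intercepts, one per time point
    """
    m = len(weights)
    # Linear score of the patient at each time point.
    scores = [
        sum(w * xi for w, xi in zip(weights[j], x)) + biases[j]
        for j in range(m)
    ]
    # f[k] is the score of the sequence "death in interval k": the sum of
    # the scores from time point k onwards. f[m] = 0 is the sequence
    # "still alive after the last time point".
    f = [0.0] * (m + 1)
    for k in range(m - 1, -1, -1):
        f[k] = f[k + 1] + scores[k]
    # Softmax over the m + 1 sequences (shifted for numerical stability).
    shift = max(f)
    expf = [math.exp(v - shift) for v in f]
    z = sum(expf)
    # S(t_k) is the total probability of dying after time point k.
    survival = []
    tail = z
    for k in range(m):
        tail -= expf[k]
        survival.append(tail / z)
    return survival


# Hypothetical weights for a single binary covariate: harmful early
# (positive weight) and protective late (negative weight).
toy_weights = [[2.5], [1.0], [-2.5]]
toy_biases = [-1.0, -1.0, -1.0]
curve_with = mtlr_survival_curve([1.0], toy_weights, toy_biases)
curve_without = mtlr_survival_curve([0.0], toy_weights, toy_biases)
```

With these toy weights, the patient carrying the covariate has the lower survival probability at the first time point but the higher one at the last, so the two curves cross. Positive weights push probability mass towards earlier death, which matches the sign convention used for the model weights in the Results.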
The survival probability and median survival time were calculated using spline functions in R. A monotonic cubic spline function using Hyman filtering of the Hermite spline method was used to generate a prediction function [27]. The median survival time was calculated using the integral (i.e., the area under the curve) of the monotonic function at a 50% survival probability. For additional details and the codebase regarding these calculations, please see https://github.com/haiderstats/ISDEvaluation (accessed on 16 June 2021).
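As a rough illustration of reading a median survival time off an individual survival curve, the sketch below finds the time at which the curve reaches a 50% survival probability. Note the substitution: it uses piecewise-linear interpolation between predicted time points rather than the Hyman-filtered monotone cubic spline described above, and the function name and toy values are hypothetical.

```python
def median_survival_time(times, survival):
    """Time at which an interpolated survival curve first reaches 50%.

    times    : increasing time points (e.g., months)
    survival : non-increasing survival probabilities at those times
    Returns None if the curve never drops to 0.5 within the given range.
    """
    # Prepend the known starting point S(0) = 1.
    points = [(0.0, 1.0)] + list(zip(times, survival))
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        if s1 <= 0.5 <= s0:
            if s0 == s1:
                return t0
            # Linear interpolation between the two bracketing points
            # (a stand-in for the monotone cubic spline in the text).
            return t0 + (s0 - 0.5) * (t1 - t0) / (s0 - s1)
    return None


# Hypothetical curve evaluated at 6, 12, 24, and 36 months.
toy_median = median_survival_time([6, 12, 24, 36], [0.9, 0.7, 0.4, 0.2])
```

Returning None when the curve stays above 50% makes explicit that the median is undefined within the modelled time range for such patients; how the original codebase handles that case is not covered here.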
The survival effects were assessed using the mean model weights derived from out-of-fold data (i.e., unseen data) in the 5-fold CV. The 5 most influential variables, defined as those with the largest mean absolute MTLR weights, were selected. The survival weights as a function of time were plotted using ggplot2 version 3.4.4 as loess smooth curves, with 95% confidence intervals for the top 5 variables [28]. A forest plot was developed using the mean model weights, and the 95% confidence interval was calculated using 1000 bootstrap resamples (sampling with replacement). We assessed the significance of each covariate using the one-sample Wilcoxon test, with the null hypothesis that the model weight is zero (i.e., no survival effect). The Benjamini–Hochberg method was used to correct for multiple comparisons. Statistical significance was defined as p < 0.05. A Cox proportional hazards model, which was separate from the ISD Cox models, was also developed using the same patient data as the MTLR model. This Cox model was constructed without cross-validation and instead used all 1043 patients to generate the Cox coefficients. The similarity between the Cox regression coefficients and the MTLR model weights was assessed using Pearson’s correlation.
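Two of the steps above, the bootstrap confidence interval and the Benjamini–Hochberg correction, can be sketched in a few lines. This is an illustrative outline only: the text does not state which type of bootstrap interval was used, so the percentile interval here is an assumption, and the function names and toy inputs are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (assumed type)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical weights of one covariate across predicted time points.
toy_weights = [0.020, 0.030, 0.025, 0.035, 0.030]
toy_ci = bootstrap_mean_ci(toy_weights)
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

A covariate would then be reported as significant when its adjusted p value falls below 0.05, in line with the threshold stated above.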

3. Results

We constructed ISDs to expand on the utility of our integrated molecular classification models using continuous model probability scores. Using 1043 patients with available clinicopathologic characteristics (Table 1), we evaluated the performance of several ISD models for our prediction task (Table 2). Multi-task logistic regression provided a better-calibrated model with a nearly identical C-Index compared to CoxEN-KP (MTLR C-Index = 72.1 ± 3.3% versus CoxEN-KP = 72.2 ± 2.9%). The MTLR model was D-calibrated and 1-calibrated for all bins except the 50th percentile (Table 2).
In contrast to Cox, MTLR is not bound by the proportional hazards assumption. We evaluated whether MTLR could provide unique insight into survival effects by modelling the weights derived from out-of-fold patients in the 5-fold CV. In Figure 1A, the loess smooths and their 95% confidence intervals illustrate the relationship of the top five most influential covariates to survival over the entire time course of our model. As opposed to presenting a consistent risk over time, this approach suggests that certain covariates present more prominent effects at different times from disease presentation. For example, early beneficial effects are observed for chemotherapy, which taper off after 24 months. Stage IV disease exerted a constant negative survival effect over 10 years, but Stage III disease was not associated with increased mortality until after one year. Notably, a greater TME score was most associated with long-term survival beyond 5 years and closely mirrored the temporal effects of chemotherapy.
To enhance familiarity with ISD models, we assessed whether the estimates from the MTLR models are similar to the “gold standard” Cox regression coefficients. In Figure 1B, negative weights derived from the MTLR model correspond to improved survival. Here, we present the mean weights for all predicted time points with their respective 95% confidence intervals. Stages III and IV provided the greatest effect size on survival (mean weight of 0.027 (95% CI 0.022, 0.033); p < 0.001, and mean weight of 0.049 (95% CI 0.046, 0.053); p < 0.001, respectively). Survival was also significantly decreased by age, Stage II, and increasing epithelial-to-mesenchymal transition (EMT) and chromosomal instability (CIN) scores. The most beneficial survival effects were observed for chemotherapy and increasing TME scores (mean weight of −0.021 (95% CI −0.027, −0.016); p < 0.001, and mean weight of −0.015 (95% CI −0.020, −0.010); p < 0.001, respectively). Microsatellite instability, as defined by our TCGA classifier, significantly improved survival, whereas ACRG MSI was not significant. With this appealing presentation, MTLR can provide familiar interpretations of survival effects relative to traditional Cox models.
Next, we compared the coefficients of the Cox proportional hazards model to the mean weights derived from the MTLR model. We show that there is a strong correlation (Pearson coefficient = 0.86, p < 0.0001) between the two sets of model effects for the variables of interest (Figure 1C). Thus, an ISD using MTLR provides population-based interpretations similar to those of Cox models when averaged over all predicted time points but also provides advantages over Cox, including better calibration and the absence of the proportional hazards assumption.
We evaluated the ability of our MTLR model to provide insights into personalized medicine for gastric cancer. Three scenarios are presented in Figure 2A. Scenario a (i.e., blue curves) presents the effects of chemotherapy in two Stage II males in their sixties with similar TME scores. Here, the patient who received chemotherapy had a 13.8% and 18.7% greater probability of survival at 24 and 48 months, respectively. Scenario b (i.e., red curves) illustrates the relationship between the TME score and chemotherapy in Stage III chromosomal instability (CIN) tumours. We observed that the survival curves for a patient who did not receive chemotherapy are nearly identical to one who received chemotherapy with a low TME tumour. MTLR provided additional insight into the individual survival effects of other molecular subtypes. In Scenario c (i.e., purple curves), we found that a high epithelial-to-mesenchymal (EMT) score profoundly affected the overall survival in comparable Stage IV, chemo-naïve, and low TME tumours (median survival with high EMT = 9.4 months versus low EMT = 14.5 months).
We investigated potential counterfactual applications of ISD models to facilitate the communication of personalized medicine. In Figure 2B, we demonstrate the survival benefit, as interpreted by our MTLR model, of administering chemotherapy to a 67-year-old female with a Stage IV, high-TME-score gastric cancer. Although additional research is required, counterfactual scenarios could be presented to patients as a visual interpretation of otherwise foreign and abstract statistical estimates of treatment benefits/harms.

4. Discussion

This study presents a survival modelling perspective called individual survival distributions with the MTLR model to provide novel insight into personalized survival predictions of gastric adenocarcinoma. Prior research has demonstrated that ISDs provide a valuable alternative to popular personalized survival modelling strategies, such as nomograms [12,15]. Using MTLR, we identified that the TME score and chemotherapy provide non-proportional but similar survival effects over time. In our analysis, the TME score provided the greatest beneficial survival effect beyond 5 years. Conversely, the disease stage was the most prominent prognostic factor contributing to poor patient survival. Of note, prominent molecular classification subtypes derived from the TCGA and ACRG cohorts were not among the top 5 most influential variables for survival.
Personalized survival interpretation using ISDs provides an intuitive and visually appealing method to investigate and present survival effects. In addition to illustrating individual survival curves, we also demonstrate that the MTLR model can provide population-based interpretations of survival effects similar to Cox proportional hazards when the model weights are averaged over the entire time modelled. In this study, we quantitatively demonstrated the statistically significant similarity between the Cox model coefficients and the MTLR model weights. Our intention for this comparison is to convey a sense of familiarity with the MTLR model for researchers who may typically use Cox regression. The ubiquitous Cox model was designed to be an effective discriminatory tool (as measured by the C-Index). However, it suffers from poor calibration, which motivated us to consider more general models, such as MTLR. In this study, we empirically confirm that MTLR provides nearly equivalent model discrimination but also better calibration. Thus, we propose that ISDs using MTLR are a valuable tool for survival analysis and may be used in addition to or instead of traditional Cox models. It is important to note that the interpretation of the survival effects of MTLR variables is relative to the variables included in that specific model. This interpretation also applies to Cox models and any other survival model generated using observational data.
Specific to gastric cancer, our MTLR model approach builds on previous research that shows that the tumour immune microenvironment is a significant prognostic variable [2,16,17]. The TME score was developed using an in silico analysis of the ACRG microarray-based molecular classification of gastric cancer. The TME score is characterized by immune activation, response to virus, and the interferon-gamma response, as well as enrichment in chemokines, such as CXCL10, CXCL9, and CCL4 [16]. It was also demonstrated to be a predictive biomarker for the immune checkpoint inhibitor response in advanced melanoma and metastatic urothelial cancers. The TME score is strongly associated with EBV-type and MSI gastric cancer, which have been demonstrated to possess immune-rich microenvironments [2,17]. In an era of immunotherapy, a personalized survival model informed by the TME score could provide a valuable tool in allocating immune checkpoint blockade therapy.
Our results provide insight into how gastric cancer survival may be optimized. For example, age, Stage II, and Stage III represent the majority of the top five most influential variables for survival. These variables are unmodifiable at the time of diagnosis for a given patient. The only method to improve these prognostic factors would be to identify disease at an earlier stage and age. Although this strategy is admirable, universal screening in low-incidence populations, such as those in Western nations, is not feasible. However, first-generation immigrants from high-incidence countries comprise one screening population that could be targeted in low-incidence nations [29,30,31]. In contrast to age and stage, the other top five prognostic factors, namely chemotherapy and TME score, are potentially modifiable factors. The choice of an individual patient to forgo chemotherapy is a significant factor for survival. Thus, conveying the importance of chemotherapy using ISDs is one method that may improve adherence to chemotherapy treatment. Of course, the use of chemotherapy must be balanced with the patient’s performance status and goals of care.
The TME score is the most intriguing modifiable risk factor. The efficacy of adjuvant and neoadjuvant chemotherapy/chemoradiotherapy, as well as checkpoint-inhibition therapy, has been demonstrated to be associated with the tumour immune microenvironment [32,33,34,35]. Thus, augmenting the tumour immune microenvironment is an intriguing therapeutic strategy [36,37]. Two potential methods to achieve this goal include cancer vaccines or chemokine/cytokine therapy. In a landmark Phase 1B trial, Rojas et al. demonstrated that personalized mRNA BioNTech vaccines containing unique tumour neoantigens significantly augmented immune microenvironments and survival in pancreatic adenocarcinoma patients [38]. Relevant chemokine/cytokine-directed strategies include combining IL-12 plasmid therapy with immune checkpoint inhibition, or a CXCR4 antagonist and pembrolizumab with chemotherapy, among others [39,40].
There are several limitations of our observational study. The population consisted mainly of patients from Asian countries. Furthermore, there is potential for selection bias, as only patients with publicly available whole-transcriptome data were included. Additionally, the chemotherapy regimens were heterogeneous and consisted only of adjuvant chemotherapy. Thus, the generalizability of our findings must be validated in the context of neoadjuvant chemotherapy, which is the predominant treatment regimen in Western countries [41].
There are currently no data evaluating the utility of ISDs in improving patient outcomes, physician decision making, or patient education. Indeed, despite the exciting prospects of ISDs, their value must be shown in clinical care. Future studies should include patient-reported outcomes and qualitative analyses of the perception of ISDs versus other prognostic tools, such as physician-directed discussions or computer-based tools, such as nomograms. We hypothesize that the visual presentation of survival in an individualized and appealing graphical format may provide an improved perspective on disease severity, decrease patient uncertainty regarding their diagnosis, and enhance confidence in patient-centered decision making. A similar sentiment regarding nomograms is well-characterized. Regardless of model accuracy or calibration performance, personalized survival models are most valuable if they ultimately improve patient and physician satisfaction and survival outcomes [42].
To facilitate the use of ISD models and the MTLR model, we provide links and citations in Section 2 of this manuscript to the R codebase used to implement ISD models, the MTLR R package, and the Python package Survival EVAL [13,43].

5. Conclusions

Individual survival distributions using the MTLR model provide novel insight into gastric cancer prognostic factors in the context of a pooled analysis of publicly available data annotated with integrated molecular subtype classification. Additional external validation and investigation into the clinical utility of this ISD model are areas of future research.

Author Contributions

Conceptualization, D.S., J.S., S.G., R.G., D.E.S. and G.R.R.; methodology, D.S. and R.G.; formal analysis, D.S.; investigation, D.S.; data curation, D.S.; writing—original draft preparation, D.S.; writing—review and editing, D.S., J.S., S.G., R.G., D.E.S. and G.R.R.; visualization, D.S.; supervision, J.S., S.G., R.G., D.E.S. and G.R.R.; project administration, D.E.S. and G.R.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study. The data included in this study are publicly available and adherent to the ethical requirements at Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/, accessed on 1 March 2022) for studies GSE62254, GSE26253, GSE13861, GSE26899, GSE26901, and Genomic Data Commons (https://www.cancer.gov/tcga, accessed on 1 March 2022) for TCGA data.

Informed Consent Statement

All patients gave informed consent, in keeping with the policies of the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/, accessed on 1 March 2022) for studies GSE62254, GSE26253, GSE13861, GSE26899, GSE26901, and Genomic Data Commons (https://www.cancer.gov/tcga, accessed on 1 March 2022) for TCGA data.

Data Availability Statement

All data used to perform the analysis in this study are available publicly at https://github.com/skubleny/ISD_GastricCancer/tree/main (accessed on 25 November 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cancer Genome Atlas Research Network. Comprehensive Molecular Characterization of Gastric Adenocarcinoma. Nature 2014, 513, 202–209.
  2. Kim, S.T.; Cristescu, R.; Bass, A.J.; Kim, K.M.; Odegaard, J.I.; Kim, K.; Liu, X.Q.; Sher, X.; Jung, H.; Lee, M.; et al. Comprehensive Molecular Characterization of Clinical Responses to PD-1 Inhibition in Metastatic Gastric Cancer. Nat. Med. 2018, 24, 1449–1458.
  3. Cristescu, R.; Lee, J.; Nebozhyn, M.; Kim, K.-M.; Ting, J.C.; Wong, S.S.; Liu, J.; Yue, Y.G.; Wang, J.; Yu, K.; et al. Molecular Analysis of Gastric Cancer Identifies Subtypes Associated with Distinct Clinical Outcomes. Nat. Med. 2015, 21, 449–456.
  4. Patel, M.A.; Kratz, J.D.; Lubner, S.J.; Loconte, N.K.; Uboha, N.V. Esophagogastric Cancers: Integrating Immunotherapy Therapy into Current Practice. J. Clin. Oncol. 2022, 40, 2751–2762.
  5. Kelly, R.J.; Ajani, J.A.; Kuzdzal, J.; Zander, T.; van Cutsem, E.; Piessen, G.; Mendez, G.; Feliciano, J.; Motoyama, S.; Lièvre, A.; et al. Adjuvant Nivolumab in Resected Esophageal or Gastroesophageal Junction Cancer. N. Engl. J. Med. 2021, 384, 1191–1203.
  6. André, T.; André, A.; Tougeron, D.; Piessen, G.; De La Fouchardière, C.; Louvet, C.; Adenis, A.; Jary, M.; Tournigand, C.; Aparicio, T.; et al. Neoadjuvant Nivolumab Plus Ipilimumab and Adjuvant Nivolumab in Localized Deficient Mismatch Repair/Microsatellite Instability-High Gastric or Esophagogastric Junction Adenocarcinoma: The GERCOR NEONIPIGA Phase II Study. J. Clin. Oncol. 2022, 41, 255–265.
  7. Smyth, E.C.; Wotherspoon, A.; Peckitt, C.; Gonzalez, D.; Hulkki-Wilson, S.; Eltahir, Z.; Fassan, M.; Rugge, M.; Valeri, N.; Okines, A.; et al. Mismatch Repair Deficiency, Microsatellite Instability, and Survival: An Exploratory Analysis of the Medical Research Council Adjuvant Gastric Infusional Chemotherapy (MAGIC) Trial. JAMA Oncol. 2017, 3, 1197–1203.
  8. Choi, Y.Y.; Kim, H.; Shin, S.-J.; Kim, H.Y.; Lee, J.; Yang, H.-K.; Kim, W.H.; Kim, Y.-W.; Kook, M.-C.; Park, Y.K.; et al. Microsatellite Instability and Programmed Cell Death-Ligand 1 Expression in Stage II/III Gastric Cancer: Post Hoc Analysis of the CLASSIC Randomized Controlled Study. Ann. Surg. 2019, 270, 309–316.
  9. Pietrantonio, F.; Miceli, R.; Raimondi, A.; Kim, Y.W.; Kang, W.K.; Langley, R.E.; Choi, Y.Y.; Kim, K.M.; Nankivell, M.G.; Morano, F.; et al. Individual Patient Data Meta-Analysis of the Value of Microsatellite Instability as a Biomarker in Gastric Cancer. J. Clin. Oncol. 2019, 37, 3392–3400.
  10. Lordick, F. Chemotherapy for Resectable Microsatellite Instability-High Gastric Cancer? Lancet Oncol. 2020, 21, 203.
  11. Smyth, E.C. Chemotherapy for Resectable Microsatellite Instability-High Gastric Cancer? Lancet Oncol. 2020, 21, 204.
  12. Haider, H.; Hoehn, B.; Davis, S.; Greiner, R. Effective Ways to Build and Evaluate Individual Survival Distributions. J. Mach. Learn. Res. 2020, 21, 85.
  13. Haider, H. MTLR: Survival Prediction with Multi-Task Logistic Regression; 2019. Available online: https://github.com/haiderstats/MTLR (accessed on 16 June 2021).
  14. Cheung, C.C.; Vittinghoff, E.; Marcus, G.M.; Gerstenfeld, E.P. Beware of the Hazards: Limitations of the Proportional Hazards Assumption. EP Eur. 2021, 23, 2048.
  15. Kumar, N.; Skubleny, D.; Parkes, M.; Verma, R.; Davis, S.; Kumar, L.; Aissiou, A.; Greiner, R. Learning Individual Survival Models from PanCancer Whole Transcriptome Data. Clin. Cancer Res. 2023, 29, 3924–3936.
  16. Zeng, D.; Li, M.; Zhou, R.; Zhang, J.; Sun, H.; Shi, M.; Bin, J.; Liao, Y.; Rao, J.; Liao, W. Tumor Microenvironment Characterization in Gastric Cancer Identifies Prognostic and Immunotherapeutically Relevant Gene Signatures. Cancer Immunol. Res. 2019, 7, 737–750.
  17. Skubleny, D.; Purich, K.; Williams, T.; Wickware, J.; McLean, D.R.; Martins-Filho, S.N.; Buttenschoen, K.; Haase, E.; McCall, M.; Ghosh, S.; et al. The Tumour Immune Microenvironment Drives Survival Outcomes and Therapeutic Response in an Integrated Molecular Analysis of Gastric Adenocarcinoma. Clin. Cancer Res. 2023; submitted.
  18. Qi, S.-A.; Kumar, N.; Farrokh, M.; Sun, W.; Kuan, L.-H.; Ranganath, R.; Henao, R.; Greiner, R. An Effective Meaningful Way to Evaluate Survival Models. Proc. Mach. Learn. Res. 2023, 202, 28244–28276. Available online: https://nyuscholars.nyu.edu/en/publications/an-effective-meaningful-way-to-evaluate-survival-models (accessed on 6 December 2023).
  19. Harrell, F.E., Jr.; Lee, K.L.; Mark, D.B. Multivariable Prognostic Models: Issues in Developing Models, Evaluating Assumptions and Adequacy, and Measuring and Reducing Errors. Stat. Med. 1996, 15, 361–387.
  20. Hosmer, D.W.; Lemeshow, S. Goodness of Fit Tests for the Multiple Logistic Regression Model. Commun. Stat. Theory Methods 1980, 9, 1043–1069.
  21. Kalbfleisch, J.D.; Prentice, R.L. Marginal Likelihoods Based on Cox’s Regression and Life Model. Biometrika 1973, 60, 267–278.
  22. Simon, N.; Friedman, J.H.; Hastie, T.; Tibshirani, R. Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent. J. Stat. Softw. 2011, 39, 1–13.
  23. Zou, H.; Hastie, T. Regularization and Variable Selection via the Elastic Net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
  24. Ishwaran, H.; Kogalur, U.B.; Blackstone, E.H.; Lauer, M.S. Random Survival Forests. Ann. Appl. Stat. 2008, 2, 841–860.
  25. Stute, W. Consistent Estimation Under Random Censorship When Covariables Are Present. J. Multivar. Anal. 1993, 45, 89–103.
  26. Yu, C.-N.; Greiner, R.; Lin, H.-C.; Baracos, V. Learning Patient-Specific Cancer Survival Distributions as a Sequence of Dependent Regressors. In Advances in Neural Information Processing Systems, Proceedings of the 24th International Conference on Neural Information Processing Systems, Granada, Spain, 12–15 December 2011; Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2011; Volume 24.
  27. Hyman, J.M. Accurate Monotonicity Preserving Cubic Interpolation. SIAM J. Sci. Stat. Comput. 1983, 4, 645–654.
  28. Wickham, H. Ggplot2: Elegant Graphics for Data Analysis; Springer-Verlag: New York, NY, USA, 2016.
  29. Kim, G.H.; Bang, S.J.; Ende, A.R.; Hwang, J.H. Is Screening and Surveillance for Early Detection of Gastric Cancer Needed in Korean Americans? Korean J. Intern. Med. 2015, 30, 747–758.
  30. Shah, S.C.; Canakis, A.; Peek, R.M.; Saumoy, M. Endoscopy for Gastric Cancer Screening Is Cost Effective for Asian Americans in the United States. Clin. Gastroenterol. Hepatol. 2020, 18, 3026–3039.
  31. Kim, G.H.; Liang, P.S.; Bang, S.J.; Hwang, J.H. Screening and Surveillance for Gastric Cancer in the United States: Is It Needed? Gastrointest. Endosc. 2016, 84, 18–28. [Google Scholar] [CrossRef]
  32. Park, Y.H.; Lal, S.; Lee, J.E.; Choi, Y.L.; Wen, J.; Ram, S.; Ding, Y.; Lee, S.H.; Powell, E.; Lee, S.K.; et al. Chemotherapy Induces Dynamic Immune Responses in Breast Cancers That Impact Treatment Outcome. Nat. Commun. 2020, 11, 6175. [Google Scholar] [CrossRef]
  33. Mlecnik, B.; Bindea, G.; Angell, H.K.; Maby, P.; Angelova, M.; Tougeron, D.; Church, S.E.; Lafontaine, L.; Fischer, M.; Fredriksen, T.; et al. Integrative Analyses of Colorectal Cancer Show Immunoscore Is a Stronger Predictor of Patient Survival Than Microsatellite Instability. Immunity 2016, 44, 698–711. [Google Scholar] [CrossRef]
  34. El Sissy, C.; Kirilovsky, A.; Lagorce Pagès, C.; Marliot, F.; Custers, P.A.; Dizdarevic, E.; Sroussi, M.; Castillo-Martin, M.; Haicheur, N.; Dermani, M.; et al. International Validation of the Immunoscore Biopsy in Patients with Rectal Cancer Managed by a Watch-and-Wait Strategy. J. Clin. Oncol. 2023, 42, 70–80. [Google Scholar] [CrossRef]
  35. André, T.; Shiu, K.-K.; Kim, T.W.; Jensen, B.V.; Jensen, L.H.; Punt, C.; Smith, D.; Garcia-Carbonero, R.; Benavides, M.; Gibbs, P.; et al. Pembrolizumab in Microsatellite-Instability–High Advanced Colorectal Cancer. N. Engl. J. Med. 2020, 383, 2207–2218. [Google Scholar] [CrossRef] [PubMed]
  36. Duan, Q.; Zhang, H.; Zheng, J.; Zhang, L. Turning Cold into Hot: Firing up the Tumor Microenvironment. Trends Cancer 2020, 6, 605–618. [Google Scholar] [CrossRef]
  37. Zhang, J.; Huang, D.; Saw, P.E.; Song, E. Turning Cold Tumors Hot: From Molecular Mechanisms to Clinical Applications. Trends Immunol. 2022, 43, 523–545. [Google Scholar] [CrossRef] [PubMed]
  38. Rojas, L.A.; Sethna, Z.; Soares, K.C.; Olcese, C.; Pang, N.; Patterson, E.; Lihm, J.; Ceglia, N.; Guasp, P.; Chu, A.; et al. Personalized RNA Neoantigen Vaccines Stimulate T Cells in Pancreatic Cancer. Nature 2023, 618, 144–150. [Google Scholar] [CrossRef]
  39. Algazi, A.P.; Twitty, C.G.; Tsai, K.K.; Le, M.; Pierce, R.; Browning, E.; Hermiz, R.; Canton, D.A.; Bannavong, D.; Oglesby, A.; et al. Phase II Trial of IL-12 Plasmid Transfection and PD-1 Blockade in Immunologically Quiescent Melanoma. Clin. Cancer Res. 2020, 26, 2827–2837. [Google Scholar] [CrossRef] [PubMed]
  40. Bockorny, B.; Semenisty, V.; Macarulla, T.; Borazanci, E.; Wolpin, B.M.; Stemmer, S.M.; Golan, T.; Geva, R.; Borad, M.J.; Pedersen, K.S.; et al. BL-8040, a CXCR4 Antagonist, in Combination with Pembrolizumab and Chemotherapy for Pancreatic Cancer: The COMBAT Trial. Nat. Med. 2020, 26, 878–885. [Google Scholar] [CrossRef] [PubMed]
  41. Al-Batran, S.-E.; Homann, N.; Pauligk, C.; Goetze, T.O.; Meiler, J.; Kasper, S.; Kopp, H.-G.; Mayer, F.; Haag, G.M.; Luley, K.; et al. Perioperative Chemotherapy with Fluorouracil plus Leucovorin, Oxaliplatin, and Docetaxel versus Fluorouracil or Capecitabine plus Cisplatin and Epirubicin for Locally Advanced, Resectable Gastric or Gastro-Oesophageal Junction Adenocarcinoma (FLOT4): A Randomised, Phase 2/3 Trial. Lancet 2019, 393, 1948–1957. [Google Scholar] [CrossRef]
  42. Balachandran, V.P.; Gonen, M.; Smith, J.J.; DeMatteo, R.P. Nomograms in Oncology: More than Meets the Eye. Lancet Oncol. 2015, 16, e173–e180. [Google Scholar] [CrossRef]
  43. Qi, S.-A.; Sun, W.; Greiner, R. SurvivalEVAL: A Comprehensive Open-Source Python Package for Evaluating Individual Survival Distributions. In Proceedings of the 2023 AAAI Fall Symposia, Arlington, VA, USA, 25–27 October 2023. [Google Scholar] [CrossRef]
Figure 1. Individual survival distributions using multitask logistic regression. (A) Loess smooth curves for the top 5 most influential covariate MTLR weights generated using 5-fold cross-validation. Shaded bands represent 95% confidence intervals. (B) Mean MTLR weights averaged over all time points for each covariate in our model. The point represents the mean, and the semi-transparent box represents the 95% confidence interval estimate from 1000 bootstraps. Weights less than zero favour survival. A one-sample t-test was performed to assess if a covariate was significantly greater than zero, and p-values were corrected using Benjamini–Hochberg. The p-value significance is denoted in the plot legend. (C) Scatter plot of Cox proportional hazards model coefficients versus MTLR weight for each variable. The Pearson correlation coefficient and p-value are denoted in the plot. Acronyms: MSS TP53+ = microsatellite stable TP53 positive; MSS TP53− = microsatellite stable TP53 negative; EMT = epithelial-to-mesenchymal transition.
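The Benjamini–Hochberg correction named in the Figure 1 caption can be sketched in a few lines. The following is a minimal illustration in plain Python, not the authors' code; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Illustrative helper (hypothetical name), assuming independent or
    positively dependent tests as in the standard BH procedure.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values
    # stay monotone: adjusted_(k) = min over j >= k of p_(j) * m / j.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for five covariates (illustration only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(example_p)
print(adjusted)
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values non-decreasing in the rank order, which is what allows a single significance threshold to be applied afterwards.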
Figure 2. Generation of personalized survival curves using ISDs. (A) Individual survival curves for 8 patients using a learned MTLR model. The x-axis represents time in months, and the y-axis represents survival probability. The patient characteristics for each colour are shown in the plot legend below. Scenario a (i.e., blue curves) presents the effects of chemotherapy in two Stage II males in their sixties with similar TME scores. Scenario b (i.e., red curves) illustrates the relationship between the TME score and chemotherapy in Stage III chromosomal instability (CIN) tumours. Scenario c (i.e., purple curves) presents Stage IV, chemo-naïve, low-TME tumours. (B) Example of a counterfactual scenario illustrating the predicted effect of chemotherapy for a given patient.
Table 1. Patient demographics.

| Characteristic | n/N (Missing %) | N = 1043 ¹ |
|---|---|---|
| Age | 1043/1043 (0%) | 59 (49, 67) |
| Stage | 1043/1043 (0%) | |
| I | | 170 (16%) |
| II | | 330 (32%) |
| III | | 339 (33%) |
| IV | | 204 (20%) |
| Sex | 1043/1043 (0%) | |
| Female | | 359 (34%) |
| Male | | 684 (66%) |
| TCGA Subtype | 1043/1043 (0%) | |
| Chromosomal Instability | | 824 (79%) |
| Epstein–Barr Virus Type | | 43 (4.1%) |
| Genomically Stable | | 66 (6.3%) |
| Microsatellite Instability | | 110 (11%) |
| ACRG Subtype | 1043/1043 (0%) | |
| Epithelial-to-Mesenchymal Transition | | 118 (11%) |
| Microsatellite Instability | | 162 (16%) |
| Microsatellite Stable TP53 Negative | | 412 (40%) |
| Microsatellite Stable TP53 Positive | | 351 (34%) |
| TME Subtype | 1043/1043 (0%) | |
| High | | 478 (46%) |
| Low | | 565 (54%) |
| Lauren Classification | 1043/1043 (0%) | |
| Diffuse | | 495 (47%) |
| Intestinal | | 504 (48%) |
| Mixed | | 44 (4.2%) |
| Tumour Location | 1043/1043 (0%) | |
| Distal | | 537 (51%) |
| Proximal | | 482 (46%) |
| Whole | | 24 (2.3%) |
| Treatment | 1043/1043 (0%) | |
| No | | 299 (29%) |
| Yes | | 744 (71%) |
| Study | 1043/1043 (0%) | |
| ACRG | | 219 (21%) |
| Kosin | | 98 (9.4%) |
| KUGH | | 82 (7.9%) |
| Samsung | | 432 (41%) |
| TCGA | | 151 (14%) |
| Yonsei MDACC | | 61 (5.8%) |

¹ Median (IQR); n (%). MDACC = MD Anderson Cancer Center; KUGH = Korea University Guro Hospital.
Table 2. Individual survival distribution results.

| Metric (by model) | AFT | CoxKP | CoxKPEN | MTLR | RSF |
|---|---|---|---|---|---|
| Concordance ¹ | 0.720 ± 0.028 | 0.721 ± 0.028 | 0.722 ± 0.029 | 0.721 ± 0.033 | 0.699 ± 0.048 |
| D-Calibration ² | 0.425 | 0.993 | 0.994 | 0.980 | 0.866 |
| 1-Calibration 10th ² | 0.147 | 0.447 | 0.898 | 0.086 | 0.354 |
| 1-Calibration 25th ² | 0.024 | 0.470 | 0.477 | 0.506 | 0.026 |
| 1-Calibration 50th ² | 0.000 | 0.011 | 0.027 | 0.042 | 0.447 |
| 1-Calibration 75th ² | 0.000 | 0.002 | 0.006 | 0.243 | 0.655 |
| 1-Calibration 90th ² | 0.000 | 0.009 | 0.027 | 0.132 | 0.050 |
| Integrated Brier ¹ | 0.176 ± 0.015 | 0.169 ± 0.011 | 0.169 ± 0.011 | 0.178 ± 0.015 | 0.178 ± 0.021 |

¹ Mean ± standard deviation; ² p-value.
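For readers less familiar with the concordance metric reported in Table 2, the sketch below shows one common form of it, Harrell's concordance index for right-censored data. It is a simplified illustration in plain Python under stated assumptions: the function name `harrell_c_index`, the toy data, and the choice of a single risk score per patient are hypothetical, and the evaluation software used in the study may treat ties and censoring differently.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       : observed follow-up times
    events      : 1 if the event (death) was observed, 0 if censored
    risk_scores : higher score means higher predicted risk
                  (for example, the negative predicted median survival time)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only when patient i had an observed
            # event strictly before patient j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # model ranks the pair correctly
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; concordance is undefined")
    return concordant / comparable


# Toy data (illustration only): four patients, one censored at month 10.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 1]
toy_risk = [0.9, 0.4, 0.1, 0.5]
c_index = harrell_c_index(toy_times, toy_events, toy_risk)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the values near 0.72 in Table 2 indicate that roughly 72% of comparable patient pairs are ordered correctly by each model.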

Skubleny, D.; Spratlin, J.; Ghosh, S.; Greiner, R.; Schiller, D.E.; Rayat, G.R. Individual Survival Distributions Generated by Multi-Task Logistic Regression Yield a New Perspective on Molecular and Clinical Prognostic Factors in Gastric Adenocarcinoma. Cancers 2024, 16, 786. https://doi.org/10.3390/cancers16040786
