Article

Multimodal Machine Learning for Prognosis and Survival Prediction in Renal Cell Carcinoma Patients: A Two-Stage Framework with Model Fusion and Interpretability Analysis

1 Department of Computer and Information Science, University of Macau, Macau SAR 999078, China
2 Chongqing Key Laboratory of Intelligent Perception and Blockchain, Department of Artificial Intelligence, Chongqing Technology and Business University, Chongqing 400067, China
3 Department of Computer Science, North China University of Technology, Beijing 100144, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2024, 14(13), 5686; https://doi.org/10.3390/app14135686
Submission received: 5 May 2024 / Revised: 26 June 2024 / Accepted: 26 June 2024 / Published: 29 June 2024

Abstract

Current medical limitations in predicting cancer survival status and survival time necessitate advancements beyond traditional methods and physical indicators. This research introduces a novel two-stage prognostic framework for renal cell carcinoma that addresses the inadequacies of existing diagnostic approaches. In the first stage, the framework predicts the survival status (alive or deceased), with Accuracy, Precision, Recall, and F1 score used to evaluate the classification results; the second stage forecasts the future survival time of deceased patients, with Root Mean Square Error and Mean Absolute Error used to evaluate the regression results. Leveraging popular machine learning models, such as Adaptive Boosting, Extra Trees, Gradient Boosting, Random Forest, and Extreme Gradient Boosting, along with fusion models such as Voting, Stacking, and Blending, our approach significantly improves prognostic accuracy, as shown in our experiments. The novelty of our research lies in the integration of a logistic regression meta-model for interpreting the Blending model’s predictions, enhancing transparency. Through SHapley Additive exPlanations (SHAP) interpretability, we provide insights into variable contributions at both the global and local levels. By applying modal segmentation and multimodal fusion to raw data from the Surveillance, Epidemiology, and End Results program, we enhance the precision of renal cell carcinoma prognosis. Our proposed model provides an interpretable analysis of model predictions, highlighting the key variables that influence classification and regression decisions in the two-stage renal cell carcinoma prognosis framework. By addressing the black-box problem inherent in machine learning, our proposed model offers healthcare practitioners a more reliable and transparent basis for applying machine learning in cancer prognostication.

1. Introduction

Renal cell carcinoma (RCC) is a malignant tumor that originates in the kidney and mainly affects the epithelial cells of the renal tubules. It is one of the most common and deadly cancers of the urinary system, accounting for 2–3% of adult malignancies, and it is the most lethal cancer of the genitourinary system [1]. According to statistics from the United States, there were 73,750 new cases and 14,830 deaths in 2020. Moreover, up to 16% of patients with RCC have distant metastases to the lungs, bone, brain, adrenal glands, contralateral kidney, and liver at diagnosis, and their 5-year survival rate is only 13% [2,3]. To diagnose RCC, medical professionals usually rely on imaging tests such as ultrasound, computed tomography, and magnetic resonance imaging; sometimes, a biopsy or surgical exploration may be required [4]. The TNM Classification of Malignant Tumors (TNM stage) grading system is used to determine the stage of RCC based on tumor size, lymph node metastasis, and distant metastasis. Treatment options for RCC include surgical resection, targeted drug therapy, immunotherapy, and other programs [5], and the specific treatment plan depends on factors such as the patient’s stage, subtype, gene mutations, and general condition. With the rise of artificial intelligence technologies, machine learning algorithms have become increasingly popular in medical diagnosis and treatment. Machine learning algorithms can analyze large amounts of clinical and bioinformatic data to improve the efficiency and accuracy of diagnosis, prognosis, treatment, and prevention [6]. For RCC, machine learning and data mining methods can be applied to analyze data from RCC patients and predict their survival times.
In this research, we use data collected from the Surveillance, Epidemiology, and End Results (SEER) program, which contains pathology, treatment, and survival data for various types of cancer patients, to investigate machine learning for predicting the survival status and survival time of RCC patients. We extract data from SEER PLUS on RCC patients collected from 2000 to 2017. We use unimodal and multimodal data consisting of each patient’s personal information (gender, age, race, etc.), tumor pathology (tumor size, number of tumors, tumor TNM grade, tumor site of origin, etc.), and treatment (surgery, radiotherapy, chemotherapy, etc.) to compose the original dataset. We then train different types of machine learning models to predict the mortality and survival time of RCC patients through classification and regression. We propose a new approach to RCC prognosis based on machine learning model fusion and multimodal data. The main contributions of this research are as follows:
  • For the data application, we combine unimodal and multimodal data for prediction and prognosis analysis. Few previous studies have focused on how different types of data in the SEER database can affect the performance of machine learning models for RCC. In this research, we conduct a comparative study.
  • For the models, we use not only traditional machine learning models (e.g., Adaptive Boosting, Extra Trees, Gradient Boosting, Random Forest, and Extreme Gradient Boosting) but also fusion models (e.g., Voting, Stacking, and Blending) that combine these traditional models to achieve stable prediction results.
  • In the experimental process, we apply a two-stage framework consisting of classification–regression prediction processes. First, we use unimodal and multimodal data to classify and predict the mortality of RCC patients. Second, among the predicted deaths of RCC, we regress the prognostic survival time of the RCC patients and classify the prognostic accuracy of RCC after 1, 2, 3, 4 and 5 years based on the predicted regression results.
  • We conduct exploratory analyses and discussions of the prediction variables after predicting the survival status and survival time of RCC patients. Using SHapley Additive exPlanations (SHAP) and Uplift Models [7,8], we further quantify and examine the variables that significantly affect the predicted outcomes of survival status and survival time.
This research paper continues with a review of existing research on cancer diagnosis and prognosis utilizing data mining or machine learning algorithms, including studies based on the SEER program. It then describes the data analysis and machine learning approaches used in detail, along with an in-depth explanation of the two-stage survival prognosis framework for RCC patients. Following that, the experimental findings are presented and examined. Finally, the paper concludes with a discussion and summary of the findings.

2. Related Work

Many previous research papers have used the SEER program in various fields of work, and there are currently three main types of research based on it. The first type is often conducted by clinicians and bioinformatics analysts who consolidate data from the SEER program and other official databases such as CDC WONDER [9]. They then statistically analyze the risk of cardiovascular disease death. However, this represents one of the most basic and straightforward uses of the SEER program. The second type of research focuses on survival analysis, using the survival time and survival status variables from the SEER program as dependent variables. Researchers combine the remaining variables to form independent variables used as inputs for statistical Cox regression. This type of research investigates more deeply how different variables in the SEER program affect the length of patient survival [10,11]. The third type of research is currently the most popular and varies based on the variables in the SEER program. It includes prediction of patient survival status, prediction of patient survival time, prediction of cancer metastasis, and so on. A wide range of prediction models is applied in these studies, including not only traditional machine learning algorithms such as Decision Trees, Random Forest, the Naive Bayes classifier, Support Vector Machines, and Adaptive Boosting [12,13,14] but also deep learning models such as Artificial Neural Networks when the dataset contains a sufficient number of cancer patient records [15,16].
Some researchers use survival status as the dependent variable or predictor to determine whether a patient is at risk of dying in the future, based on the prediction accuracy of machine learning models. For example, Jiang et al. applied three machine learning models (Extreme Gradient Boosting, Support Vector Machine, and Bayesian Network) to classify the 5-year survival status of osteosarcoma patients, with Extreme Gradient Boosting being identified as the most effective algorithm for classification with potential clinical applications [17]. Similarly, Huang et al. used several tumor pathology variables to evaluate the survival state (alive or deceased) of 5-year breast cancer-specific survival and overall survival with Decision Tree, Random Forest, Support Vector Machine, and other methods [18]. On the other hand, some researchers have focused on regressing the length of survival time rather than the classification of survival status. Lynch et al. employed six different machine learning algorithms to predict the survival months of lung cancer patients with similar independent variables as in the previous research [19]. Our research extends the prediction task into a two-stage framework, in which the first stage predicts the survival status of RCC patients and the second stage predicts the survival time of RCC patients. The exact process is described in detail in Section 3, and the prediction effect of the two-stage framework is given in Section 4.
In order to make the analysis process more realistic and efficient, Xu et al. [20] focused not only on the accuracy and errors of different model algorithms but also on the pre-processing of variables and features in the raw dataset. They applied the Lasso regression model to select suitable features before the machine learning process, reducing training time and increasing efficiency. Similarly, Li et al. [21] used statistical p-values to select features for predicting lung metastasis. In more specialized research, methods such as ANOVA, Select From Model (SFM), Recursive Feature Elimination (RFE), and Variance (VAR) have been used, individually or in combination, to select features from much larger datasets [22]. However, these methods rest on different assumptions, requiring careful evaluation and comparison by researchers. In our research, we pre-process the data according to the recommendations of healthcare professionals and divide the data into multiple unimodal and multimodal datasets to facilitate predictive analysis; the specific data pre-processing process is described in detail in Section 3.1.
In addition to predicting different problems, researchers have also investigated the importance of the independent variables that contribute to machine learning predictions, ranking them by importance. For instance, Qiu et al. [23] ranked the feature importance of the six machine learning algorithms used in their study and plotted a histogram of variable importance based on the importance scores. In contrast, more advanced interpretability methods such as SHAP have been employed in recent research to make machine learning models more interpretable, rather than treating them as computational black boxes. SHAP treats each feature as a contributor to the prediction, with interpretability based on the Shapley value principle from game theory. SHAP can clearly show and visualize how strongly each variable in the model affects the classification predictions. For example, Sorayaie Azar et al. [24] used SHAP to explain the results of ovarian cancer survival prediction and found that histologic type, chemotherapy recode, year of diagnosis, and age at diagnosis were the most influential factors. Similarly, Alabi et al. [25] applied SHAP to analyze the prognostic results of oropharyngeal cancer and discovered that DFS, HPV, N stage, and age were the most important features. Results from SHAP can be more convincing and easier to understand than feature importance plots. Our two-stage framework also uses SHAP to analyze the importance of variables after each prediction, as shown in Section 4.1 and Section 4.2.
After calculating the SHAP values for a model and ranking the features by importance, we select the important features as intervention behaviors (treatments) and apply Uplift Models to examine the effect of each treatment on the classification labels (outcomes). Uplift Models help identify the variables to which the classification labels are most sensitive; they are a useful technique in marketing strategies, healthcare interventions, and more. They use treatment and control groups to measure the effect on classification. One method for Uplift Models is Lai’s Generalized Weighted Uplift Method (LGWUM) [26,27], which employs a four-quadrant approach to compute uplift scores used as descriptive outcomes to target suitable individuals for a given treatment.

3. Materials and Methods

The data for this research are collected from the SEER program, downloaded from the website (https://seer.cancer.gov/ (accessed on 1 January 2024)), which is one of the most comprehensive and established cancer statistics databases. All data and variables are available to cancer researchers worldwide through the SEER Data Access Agreement on the website. We select data from SEER PLUS on patients diagnosed with RCC from 2000 to 2017. Only cases with positive histology and clinical diagnosis are used as criteria for RCC diagnosis. Moreover, only death due to RCC is counted as death in the survival status; if a patient died from a cause other than RCC, this is not taken into account in this research, ensuring that the predicted survival status and survival time are specific to RCC. After selection, 57,002 RCC patients, with 29,777 alive and 27,225 deceased, are collected for further research.

3.1. Multimodal Data Pre-Processing and Analysis

We select features of RCC patients from the SEER program and divide the features into three categories: personal information, tumor pathology and treatment. Table 1 summarizes these features, including their names, types, values and descriptions. The personal information category contains basic information about the patient, such as gender, race, age, marital status and whether the patient belongs to the Purchased/Referred Care Delivery Area (PRCDA). These features may affect the progression of RCC, so we include them as inputs for machine learning models. The tumor pathology category contains information about the tumor, including the grade and T, N, and M stage of the tumor, as well as the location of the tumor and the number and size of tumors. These features reflect the severity and extent of RCC, so we use them as predictors for survival status and time. The treatment category includes information on the different types of surgery the patient has undergone, as well as whether they have received radiotherapy or chemotherapy. These features indicate the interventions and outcomes of RCC, so we consider them potential modifiers for inputs.
One way to reduce the risk of overfitting machine learning models is to merge some variables. Previous studies have merged the values of certain features, such as the patient’s age, race, and tumor type [28,29,30]. In our research, we merge the following feature variables.
  • Marital status: We classified ‘single (never married)’ and ‘unmarried or domestic partner’ as ‘Unmarried’, ‘married’ as ‘Married’, etc.
  • Surgery_LN: We classified ‘biopsy or aspiration of regional lymph node, NOS’ as ‘No’, while all other values were classified as ‘Yes’.
  • Surgery_Other: We classified ‘non-primary surgical procedure performed’ as ‘No’, while all other values were classified as ‘Yes’.
  • Radiation: We classified any radiotherapy modality as ‘Yes’, while patients who did not undergo radiotherapy were classified as ‘No’.
Before feeding the data into machine learning models, we encoded these categorical variables using the one-hot encoding method, which is often used in papers based on the SEER program and machine learning [31,32]. For example, the categorical values Yes, No, and Unknown are encoded as the binary vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1), respectively. As supervised learning algorithms require a categorical label or a regression-dependent variable for all samples, we used survival status as the categorical label for the stage 1 training and testing process, and survival time as the regression-dependent variable for stage 2. By combining these two stages, we could infer the survival status of RCC patients and predict their survival time simultaneously.
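As a minimal sketch of this encoding step (using pandas; the column names and values below are hypothetical placeholders rather than the exact SEER field names), the merged categorical features can be expanded into binary indicator columns as follows:

```python
import pandas as pd

# Hypothetical toy sample mimicking merged SEER-style categorical features.
df = pd.DataFrame({
    "Radiation": ["Yes", "No", "Unknown", "No"],
    "Marital_status": ["Married", "Unmarried", "Married", "Unmarried"],
})

# One-hot encode every categorical column: 'Yes'/'No'/'Unknown' become three
# binary indicator columns, i.e., (1, 0, 0), (0, 1, 0), (0, 0, 1).
X = pd.get_dummies(df, columns=["Radiation", "Marital_status"], dtype=int)
print(X.head())
```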

3.2. Proposed Machine Learning and Fusion Models

In this research, eight machine learning models are employed to address the classification and regression problems, including five traditional machine learning models: Adaptive Boosting (ADA), Extra Trees (ET), Gradient Boosting (GB), Random Forest (RF), and Extreme Gradient Boosting (XGB). These are ensemble learning models, which are more stable than single models and have been applied for prediction in other fields [33,34]. First, a grid search is used to select the optimal parameters for these five models in a pre-training process, and 5-fold cross-validation is conducted to evaluate the training results (a minimal sketch of this tuning procedure is given after the list below). The candidate parameter values searched for each model are as follows:
  • ADA: algorithm (SAMME, SAMME.R), n_estimators (50, 100, 150, 200), learning_rate (1, 2, 3, 4);
  • ET: criterion (gini, entropy, log_loss), n_estimators (50, 100, 150, 200), max_depth (2, 5, 8, 15), max_features (2, 4, 6, 8), min_samples_split (2, 4, 6, 8);
  • GB: learning_rate (0.01, 0.1, 1), n_estimators (50, 100, 150, 200), max_features (log2, sqrt, None), max_depth (2, 5, 8, 15), min_samples_split (2, 4, 6, 8);
  • RF: n_estimators (50, 100, 150, 200), max_depth (2, 5, 8, 15), max_features (2, 4, 6, 8);
  • XGB: learning_rate (0.05, 0.5, 1), n_estimators (20, 50, 80), reg_alpha (0, 0.5, 1), reg_lambda (0, 0.5, 1).
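The sketch below illustrates how such a grid search with 5-fold cross-validation can be run for one of the base models (Random Forest is shown; the other models are tuned analogously). Synthetic data stand in for the one-hot-encoded RCC features and survival status labels, so the numbers are purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the one-hot-encoded RCC features and survival labels.
X_train, y_train = make_classification(n_samples=1000, n_features=20, random_state=42)

# Candidate values for RF, matching the grid listed above.
param_grid = {
    "n_estimators": [50, 100, 150, 200],
    "max_depth": [2, 5, 8, 15],
    "max_features": [2, 4, 6, 8],
}

# Exhaustive grid search with 5-fold cross-validation on the training set.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```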
The remaining three models use fusion techniques called Voting, Stacking, and Blending. A fusion model is a type of ensemble learning that combines multiple base models to improve overall performance, while ensemble learning is a machine learning technique that employs multiple learners to achieve better results than any single learner. Fusion models generally produce consistent and efficient classification results, and with technological advancement, they have gained popularity in recent years in real-world applications and data mining competitions. They have also demonstrated success in applications outside the medical field. For example, in the financial stock market, Ampomah et al. applied a double ensemble learning approach to improve the prediction accuracy of different stocks in the American stock market [35], and Nti et al. achieved accurate predictions of future stock trends using Stacking and Blending methods [36].
Voting is a fusion model that enhances the accuracy and robustness of classification or regression by combining the prediction results of multiple base models. The model structure of Voting is shown in Figure 1. We use the hard-voting variant, in which the final classification result is determined by the majority vote of the base models. The base models used in the Voting model are the five classical machine learning models mentioned above, with their parameters optimized by grid search.
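A minimal sketch of such a hard-voting ensemble built on the five base models is shown below; default-parameter scikit-learn and XGBoost estimators are used here as placeholders for the grid-search-optimized models, and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from xgboost import XGBClassifier

# Synthetic stand-in for the multimodal RCC features and survival labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hard voting: the final label is the majority vote of the five base models.
voting = VotingClassifier(
    estimators=[
        ("ada", AdaBoostClassifier(random_state=42)),
        ("et", ExtraTreesClassifier(random_state=42)),
        ("gb", GradientBoostingClassifier(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("xgb", XGBClassifier(random_state=42)),
    ],
    voting="hard",
)
voting.fit(X, y)
print(voting.predict(X[:5]))
```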
In contrast to Voting, the Stacking and Blending models are more complex and are constructed as two-layer models. The model structures of Stacking and Blending are shown in Figure 2. As shown in Algorithm 1, the Stacking model takes the predictions of the base models in the first layer as inputs for the meta model in the second layer, which can improve the accuracy and robustness of classification or regression by combining the predictions of the base models. In practice, we train each base model on the training set using cross-validation and obtain its out-of-fold predictions; concatenating the prediction columns of all base models, one column per base model, forms the training data for the meta model. The meta model then uses these new training and test data to produce the final predictions. In this research, we use the five optimized classical machine learning models as the base models and logistic regression (classification) or linear regression (regression) as the meta model.
Algorithm 1 Stacking
  • Input: data $D = \{(x_i, y_i)\}_{i=1}^{N}$, $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$
  • Step 1: Train the first-level machine learning models
  •      for $t \leftarrow 1$ to $T$ do
  •        train the base model $h_t$ on data $D$
  •      end for
  • Step 2: Construct a new dataset from the original data $D$
  •      for $i \leftarrow 1$ to $N$ do
  •        construct a new sample $(x_i', y_i)$ containing the results of the base models, where $x_i' = (h_1(x_i), \ldots, h_T(x_i))$
  •      end for
  • Step 3: Train the second-level machine learning model (meta model)
  •        train the meta model $h'$ on the new data $\{(x_i', y_i)\}_{i=1}^{N}$
  • return  $H(x) = h'(h_1(x), h_2(x), \ldots, h_T(x))$
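A minimal sketch of this stacking scheme using scikit-learn’s StackingClassifier is given below; the five tuned base models are again represented by default-parameter estimators, and logistic regression serves as the meta model as in our classification stage (for the regression stage, the regressor counterparts and linear regression would be used instead).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# Synthetic stand-in for the multimodal RCC features and survival labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# First layer: five base models whose cross-validated predictions become the
# meta-model features. Second layer: logistic regression combines them.
stacking = StackingClassifier(
    estimators=[
        ("ada", AdaBoostClassifier(random_state=42)),
        ("et", ExtraTreesClassifier(random_state=42)),
        ("gb", GradientBoostingClassifier(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("xgb", XGBClassifier(random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # cross-validation used to generate the first-layer predictions
)
stacking.fit(X, y)
print(stacking.predict(X[:5]))
```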
As shown in Figure 2 and Algorithm 2, the Blending model is similar to the Stacking model, which also uses several base models to extract features and feed them to a meta model for prediction. The whole dataset is split into training, validation and test sets. The training set is used to train all the base models individually. Each base model can produce probability matrices on the validation and test sets. The validation matrices from all the base models are used as the new training set and the test matrices from all the base models are used as the new test set. The Blending model does not require cross-validation, so it is simpler and faster than Stacking. However, it also reduces the number of samples and the data utilization in the meta model training, which may affect its performance. For comparison, we use the same base models and meta model as in Stacking.
In Section 4, we use the five machine learning models and three fusion models described above for classification and regression prediction. In the experiments, we identify the best performing models for analysis, as well as the better performing sub-models within the best performing fusion model, and we use the SHAP tool and Uplift Models to measure variable importance for these optimal models.
Algorithm 2 Blending
  • Input: data $D = \{(x_i, y_i)\}_{i=1}^{N}$, $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$; split it into a training set $D_1$ and a validation set $D_2$
  • Step 1: Train the first-level machine learning models
  •      for $t \leftarrow 1$ to $T$ do
  •        train the base model $h_t$ on data $D_1$
  •      end for
  • Step 2: Construct a new dataset from the original data $D_2$
  •      for $i \leftarrow 1$ to $|D_2|$ do
  •        construct a new sample $(x_i', y_i)$ containing the results of the base models, where $x_i' = (h_1(x_i), \ldots, h_T(x_i))$
  •      end for
  • Step 3: Train the second-level machine learning model (meta model)
  •        train the meta model $h'$ on the new data $D_2' = \{(x_i', y_i)\}$
  • return  $H(x) = h'(h_1(x), h_2(x), \ldots, h_T(x))$
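Since scikit-learn has no off-the-shelf blending estimator, the sketch below implements Algorithm 2 manually under the same assumptions as the previous examples: the base models are trained on a training split, their predicted probabilities on a held-out validation split form the meta-model features, and logistic regression is the meta model. Only two base models are shown for brevity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the multimodal RCC features and survival labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

# Two base models shown for brevity; in the paper all five tuned models are used.
base_models = [
    GradientBoostingClassifier(random_state=42),
    RandomForestClassifier(random_state=42),
]

# Step 1: train each base model on the training split only (no cross-validation).
for model in base_models:
    model.fit(X_train, y_train)

# Step 2: base-model probabilities on the validation split become meta features.
meta_features = np.column_stack(
    [model.predict_proba(X_val)[:, 1] for model in base_models]
)

# Step 3: train the logistic regression meta model on the blended features.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(meta_features, y_val)

# At inference time, new samples pass through the base models first.
new_meta = np.column_stack(
    [model.predict_proba(X_val[:5])[:, 1] for model in base_models]
)
print(meta_model.predict(new_meta))
```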

3.3. Evaluation Metrics of the Two-Stage Machine Learning Process

Current research papers using the SEER database have focused on exploring survival status or survival time in cancer, and few studies have combined the prediction of both the survival status and the survival time of patients [37,38]. In the two-stage machine learning framework we design, however, we predict the survival status of RCC patients in the first stage and further predict the survival time of patients in the second stage. Figure 3 illustrates our proposed classification–regression two-stage machine learning framework for RCC survival time prognosis. The details of each stage are described below.
After pre-processing the unimodal and multimodal datasets as described in Section 3.1, we select the survival status (alive or deceased) of the RCC patients as the label for the classification models. We randomly split the whole dataset into an 80% training set and a 20% test set, and we use 5-fold cross-validation during the training process to evaluate model stability. To compare model performance between unimodal and multimodal machine learning, we conduct four sets of experiments: three on unimodal data and one on multimodal data. A similar design was used by Hevia-Montiel et al. to compare model classification performance across modalities [39]. In each set of unimodal or multimodal data, we apply the five classical machine learning models (ADA, ET, GB, RF, and XGB) from Section 3.2 and the three fusion models (Voting, Stacking, and Blending) for prediction. We use four metrics, Accuracy, Precision, Recall, and F1 score, to measure the prediction accuracy of these classification models.
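A minimal sketch of this evaluation protocol (80/20 split, 5-fold cross-validation on the training set, and the four classification metrics) might look as follows; synthetic data again stand in for the real SEER features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the multimodal RCC features and survival labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = GradientBoostingClassifier(random_state=42)

# 5-fold cross-validation on the training set (mean and std of accuracy).
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print(f"CV accuracy: {cv_scores.mean():.4f} +/- {cv_scores.std():.4f}")

# Fit on the full training set and report the four test-set metrics.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
```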
Using these classification models, we predict the survival status of RCC patients and then analyze the interpretability of the model variables with the SHAP tool. The SHAP plots show not only the ranking of independent variable importance but also the positive or negative impact of each variable on the survival prediction. To quantify the extent to which each important variable influences the models’ classification, we apply Uplift Models [40]. First, we split the patients into two groups: the Treatment Group (RCC patients with the important variable) and the Control Group (RCC patients without the important variable).
When a variable has a positive effect on the RCC patient survival, we further divide the Treatment Group and Control Group into four subgroups based on the following criteria:
  • Control Non-responder (CN): RCC patients did not have the important variable and died eventually.
  • Control Responder (CR): RCC patients did not have the important variable and survived.
  • Treated Non-responder (TN): RCC patients had the important variable and died eventually.
  • Treated Responder (TR): RCC patients had the important variable and survived.
When a variable has a negative effect on the RCC patient survival, we further divide the Treatment Group and Control Group into four subgroups based on the following criteria:
  • Control Non-responder (CN): RCC patients did not have the important variable and survived.
  • Control Responder (CR): RCC patients did not have the important variable and died eventually.
  • Treated Non-responder (TN): RCC patients had the important variable and survived.
  • Treated Responder (TR): RCC patients had the important variable and died eventually.
Among the four subgroups, TN and TR are from the Treatment Group; CN and CR are from the Control Group. We are more interested in the patients in the CN and TR subgroups because a higher proportion of them indicates that the independent variable is more effective. In this research, we use different classification algorithms to obtain the distribution probabilities of the four subgroups and use LGWUM to calculate the Uplift Score. The Uplift Score measures how well the independent variable can classify the RCC patients. The formula is as follows:
$$\mathrm{Uplift\ Score} = \frac{P(CN) - P(CR)}{P(CN) + P(CR)} + \frac{P(TR) - P(TN)}{P(TR) + P(TN)}.$$
Moreover, to evaluate how well the classification model can distinguish the target RCC patients in the CN and TR groups, we compute the Qini Coefficient, which is the difference between the Qini Curve of the Uplift Model and the curve of the random model. We consider the Uplift Model valid if the Qini Coefficient is greater than 5%, in which case we regard the important variable as having a significant effect on the survival prediction of RCC patients. The Qini Coefficient is calculated as follows, where a is the proportion of treatment-group RCC patients and N is the total proportion of RCC patients. The random model assigns the same uplift value to all samples, regardless of their positive or negative effects, resulting in a straight line from (0, 0) to (1, uplift(N)):
$$\mathrm{Qini\ Coefficient} = \sum_{n=1}^{N} \left(\mathrm{Uplift\ Model\ Curve} - \mathrm{Random\ Model\ Curve}\right).$$
$$\mathrm{Uplift\ Model\ Curve} = N a \left(\frac{TR}{TR + TN} - \frac{CR}{CR + CN}\right).$$
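As a small illustration of these formulas (not the full LGWUM pipeline), the sketch below computes the Uplift Score and the uplift model curve value from hypothetical counts of the four subgroups; the Qini Coefficient would then sum the gap between this curve and the random baseline over the ranked population. All numbers are made up for demonstration.

```python
# Hypothetical subgroup counts for one important variable (illustrative only).
TR, TN = 320, 180  # treated responders / treated non-responders
CR, CN = 150, 350  # control responders / control non-responders

# Uplift Score as defined above: control and treated contributions combined.
uplift_score = (CN - CR) / (CN + CR) + (TR - TN) / (TR + TN)
print(f"Uplift Score: {uplift_score:.3f}")

# Uplift model curve value for the whole population (fraction a = 1), using
# the formula above with N taken as the total number of patients.
N = TR + TN + CR + CN
a = 1.0
uplift_curve_value = N * a * (TR / (TR + TN) - CR / (CR + CN))
print(f"Uplift model curve at a = 1: {uplift_curve_value:.1f}")
```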
After comparing the results of the four modalities and choosing the one with the best Accuracy for the subsequent modeling analysis, in stage 2 we form a new training set from the patients with the status of death in the previous training set. Similarly, we form a new test set from the patients that the model predicts to be deceased in the previous test set. We select survival time as the dependent variable for the regression models and reuse the eight machine learning models as regressors, evaluating them with two new metrics, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), whose equations are as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(y_n - \hat{y}_n\right)^2}$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} \left|y_n - \hat{y}_n\right|$$
Using the regression results, we apply the regression-to-classification method to convert the predicted survival time of RCC patients into discrete categories: 1 year (12 months), 2 years (24 months), 3 years (36 months), 4 years (48 months), and 5 years (60 months). This results in new Accuracy, Precision, Recall, and F1 score metrics that reflect not only the survival status of the patient but also the prognosis of the patient’s survival time.
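A minimal sketch of this second-stage evaluation is given below, under one plausible reading of the regression-to-classification step: RMSE and MAE are computed on the predicted survival months, and for each horizon of 12, 24, 36, 48, and 60 months a prediction counts as correct when the predicted and true survival times fall on the same side of the threshold. The data are synthetic placeholders.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error

# Synthetic placeholders for true and predicted survival times in months.
rng = np.random.default_rng(42)
y_true = rng.uniform(1, 120, size=200)
y_pred = y_true + rng.normal(0, 15, size=200)

# Regression metrics on survival time.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)
print(f"RMSE: {rmse:.2f} months, MAE: {mae:.2f} months")

# Regression-to-classification: for each horizon, a prediction is counted as
# correct when predicted and true survival fall on the same side of the cut-off.
for months in (12, 24, 36, 48, 60):
    true_labels = (y_true <= months).astype(int)
    pred_labels = (y_pred <= months).astype(int)
    acc = accuracy_score(true_labels, pred_labels)
    print(f"{months // 12}-year prognosis accuracy: {acc:.3f}")
```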

4. Experiment Results and Discussion

In this section, we split the experimental results into two parts, stage 1 and stage 2, and discuss them in detail in Section 4.1 and Section 4.2. In Section 4.1, we use three unimodal datasets and one multimodal dataset, as well as different types of machine learning models, to predict the survival status of RCC patients. We compare the metrics of the models and select the best one for analysis, and we also examine the importance of the different variables by disentangling the model’s input variables. In Section 4.2, we use the optimal modal dataset and select the deceased RCC patients in the training set from the first stage as the new training set, and the RCC patients who are predicted to be deceased by the best machine learning model in the first stage as the new test set. We reuse these sets for training to predict the survival time of the deceased RCC patients.

4.1. Stage I: Classification and Interpretable Analysis

Table 2, Table 3, Table 4 and Table 5 show the classification metrics of all models for the three unimodal datasets and the multimodal dataset. In the training process, we use 5-fold cross-validation for all models on the training set to produce more accurate and stable predictions. The classification metrics for the training set report the mean and standard deviation of the metrics under 5-fold cross-validation, while the classification metrics for the test set report the test results of the models directly. Overall, unimodal personal information has the worst performance: as shown in Table 2, all models have an accuracy of 62% or lower, and even the fusion models fail to exceed 63% accuracy on the test set. On the unimodal tumor pathology data, accuracy improves to about 70%, and on the unimodal treatment data, accuracy reaches 65–66%. Compared with the other two unimodal datasets, the tumor pathology information provides the most useful reference for the classification models, followed by treatment and personal information.
In the final set of experiments at this stage, we combine the three unimodal datasets to form a multimodal dataset as input to the machine learning models. As shown in Table 5, the prediction of survival status for RCC patients is significantly improved, with an overall accuracy of around 76%. Even the worst model, ADA, achieves an accuracy of 74.52%, which is much better than the models based on the individual modalities. Among the classical machine learning models, GB achieves the best result of 77.50% and RF achieves 77.06%; among the fusion models, Blending achieves the best result of 77.35%.
We select the best performing GB, RF, and Blending models for analysis. For the Blending model, the first step is to interpret its meta-model. We use logistic regression as the meta-model and the prediction results of each base model as its inputs to perform the classification task again. In Figure 4, we visualize the feature importance of the five base models; two of them, RF and GB, play more important roles in the classification. We conduct an interpretability analysis on these two foundational models, RF and GB, since healthcare professionals and doctors often find it challenging to trust a predictive model without direct interpretability. We utilize the SHAP package, which helps visualize the contribution of each variable to the predictions, thereby clarifying the final model’s output. The SHAP package offers two interpretative views: global interpretability across all samples and local interpretability for individual samples. Global interpretability delivers a comprehensive model overview, ranking each variable’s importance in descending order. Figure 5 and Figure 6 illustrate the feature importance plots for the RF and GB models, respectively.
Both models produce a similar ranking of important features. M stage and Age_old are the two most important features, with a trend that M1 stage and older age lead to death in RCC patients. Among the remaining variables, histologic type 8317/3, tumor size, surgery, radiotherapy, and chemotherapy are also key factors in the classification. For tumor stages T and N, the higher the tumor stage, the higher the probability of death in RCC patients, which provides an important reference for predicting survival status. For the treatment variables, the different surgical procedures performed on the patients are generally positive for RCC patients: if the tumors are removed, the patient has a better chance of survival. However, for radiotherapy and chemotherapy, the situation is different. As modern techniques for managing tumors, radiotherapy and chemotherapy are often used in the middle and late stages of tumors [41], but for RCC they are not very effective, and the survival status of RCC patients who underwent radiotherapy and chemotherapy tends to be worse. Therefore, early surgical treatment may be more effective than radiotherapy and chemotherapy for RCC. One interesting finding is that marital status plays an important role in the survival of RCC patients, with patients who have families having a survival advantage; according to previous studies, married RCC patients have shown lower mortality rates at all stages [42], and another study found that widowed and divorced/separated male RCC patients had a worse prognosis and higher risk of mortality after treatment [43]. These studies suggest that clinicians should consider patients’ marital status when treating RCC, as such patients may need more attention and support. This information can help healthcare professionals provide patients with a more personalized treatment plan.
Furthermore, the SHAP dependence plot is instrumental in elucidating the influence of individual features on the prediction model’s output. Figure 7 and Figure 8 display the actual values of the top six features with their corresponding SHAP values. The horizontal coordinate in each subplot represents the change in features of the original dataset, and the vertical coordinate represents the change in SHAP values. Features that exhibit SHAP values above zero are indicative of a model’s prediction for the survival of RCC patients. Moreover, a higher SHAP value means an increasing likelihood of patient survival. Take the subplot in Figure 7 with the Age_old variable in the horizontal coordinate and SHAP Value in the vertical coordinate as an example. We can find that when Age_old is positive, the SHAP Value is less than 0, which means that the patient has a higher risk of death; conversely, when Age_old is negative, the SHAP Value is greater than 0, which means that the patient has a higher likelihood of survival. It is evident from Figure 7 and Figure 8 that factors such as younger age, smaller tumor size, and receipt of surgical treatment are associated with improved survival probabilities. These observations are in concordance with the insights derived from Figure 5 and Figure 6.
Local interpretability enables the analysis of input characteristics for individual samples across various models, facilitating specific predictions for each case. The left panels of Figure 9 and Figure 10 depict the survival prediction for a particular sample. As illustrated in the left panel of Figure 9, the variables M stage, Tumor Size, and Surgery Cancer all have a positive impact on this sample’s survival probability, whereas the variable Age_old has a negative effect. In a similar vein, the left panel of Figure 10 shows that Tumor Size, M stage, and Grade_2 positively influence the survival prediction. Consequently, the model predicts survival for this sample patient. Conversely, the right panels of Figure 9 and Figure 10 present the survival prediction plot for a different sample by both models. It is apparent from these figures that Tumor Size and Age_old are significant factors, with their prominence leading the model to predict a non-survival outcome for this patient.
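For readers who wish to reproduce this kind of analysis, a minimal SHAP sketch is shown below; a gradient boosting classifier on synthetic data stands in for our tuned GB model, and the plot calls correspond to the global summary, dependence, and local (single-sample) views discussed above.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the multimodal RCC features and survival labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(10)])

model = GradientBoostingClassifier(random_state=42).fit(X, y)

# TreeExplainer computes SHAP values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view: feature importance ranked across all samples.
shap.summary_plot(shap_values, X)

# Dependence view: how one feature's value relates to its SHAP value.
shap.dependence_plot("feature_0", shap_values, X)

# Local view: contribution of each feature to a single patient's prediction.
base_value = np.ravel(explainer.expected_value)[0]  # scalar regardless of SHAP version
shap.force_plot(base_value, shap_values[0, :], X.iloc[0, :], matplotlib=True)
```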
To quantify the exact impact of the important variables identified by the SHAP plots on the classification results, we use Uplift Models to validate the top-ranked important variables in a further round of experiments. Similar to the previous classification problem, we split the RCC patient dataset into training and test data, and we then combine the important categorical variables with the original labels to form new labels. For example, Age_old and survival status are combined into four new categorical labels: age_old+alive, age_old+deceased, not_age_old+alive, and not_age_old+deceased. We train the new training set using the best performing RF and GB models from the previous stage. Table 6 and Table 7 show the training results of these two models. GB performs better overall in terms of prediction accuracy. However, for the Uplift Models, we focus more on the Uplift Score and the Qini Coefficient. When the Uplift Score is greater than 0, the important variable has a significant effect on the classification. From Table 6 and Table 7, we can see that all important variables have Uplift Scores greater than 0, regardless of whether they are positively or negatively correlated with RCC survival. In terms of the Qini Coefficient, the results differ. According to the evaluation criterion, the Qini Coefficient should be greater than 0.05, indicating a significant difference between the Uplift Model and the random model. However, in the RF model, four variables have negative values, and in the GB model, M stage and Radiation have negative Qini Coefficient values, which means the model performs worse than the random model in classifying these variables. Therefore, in the actual diagnosis of RCC, we can consider focusing on the remaining variables, such as the patient’s age, type of cancerous cells, stage of cancer, and cancer surgery status. These variables have been validated by the Uplift Models to predict the future survival of RCC patients more accurately.

4.2. Stage II: Regression and Interpretable Analysis

After the analysis in the previous stage, we select the data of patients who died of RCC in the multimodal dataset to predict survival time with different types of regression models. Similar to stage 1, we apply the same machine learning algorithms one by one, except that for Stacking and Blending we use linear regression as the meta model instead of logistic regression. The metrics of the predicted survival time results are shown in Table 8. Although the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) show relatively large errors, we are mainly interested in the reclassification results derived from the regression outputs. We categorize the predicted survival times of all models into 1, 2, 3, 4, and 5 years. These reclassification results are shown in Table 9 and visualized in Figure 11 for observation and analysis. The ADA model performs the worst in terms of RCC death prognosis at 1–5 years. In contrast, the ET model outperforms all models in terms of prognosis at 2–5 years. Although RF performs well in stage 1, it fails to achieve higher accuracy than the other models for the 1–3 year prognosis. The Blending model maintains relatively stable and comparatively good accuracy over the 1–5 year prognosis; however, its results at 5 years are very poor, higher only than those of ADA. Overall, the classical machine learning models are more unstable in the 1–5 year prognosis, while the fusion models perform more smoothly overall.
Similar to stage 1, we also visualize the importance of the variables in the linear regression meta model of the Stacking and Blending models. As shown in Figure 12 and Figure 13, among the five base models, GB, XGB, and ET play important roles in the Stacking and Blending regression. We visualize the feature importance of these three regression models using SHAP, and their feature importance plots are shown in Figure 14, Figure 15 and Figure 16. Compared with the importance of the base models in stage 1, features related to the tumor and surgery, such as M stage, tumor size, Grade 2, T stage 1, and surgery of cancer, are the key factors for the deceased RCC patients. Unlike in stage 1, old age is not the top factor, which suggests that age is not a key factor in the prognosis of the deceased patients’ survival time. Surgical and tumor-related factors are important in both stage 1 and stage 2. Therefore, in the prognosis of RCC, we should focus more on the multimodal data that combine surgical and tumor information to improve model performance.

4.3. Discussion

Many research papers have attempted to use the SEER program in combination with machine learning and survival analysis for various prediction tasks, such as tumor metastasis, patients’ disease status, or survival status. Although these papers differ in data sources, features, data cleaning, data modeling, and prediction tasks, they offer useful materials and methods for reference.
Focusing on RCC, Wang et al. used the best performing machine learning algorithm, XGB, to evaluate the risk of RCC with liver metastasis, achieving an AUC of 94.7% [44]. In terms of features, they used only six variables (grade, T stage, N stage, tumor size, bone metastasis, and lung metastasis) as inputs for all models, and they used only the p-value and Lasso regression to filter the variables. In comparison, our research adapts the inputs of the models according to the modalities of the data. Wang et al. also visualized the importance of the variables for their six models, but they relied solely on the importance ranking methods of the models themselves rather than using the SHAP package.
Some other papers have focused on the prognosis of cancer. Yu et al. concentrated on the 5-year survival status prognosis of non-metastatic cervical cancer patients, using features such as age, marital status, race, histology, stage, surgery, and more [45]. Jin et al. conducted a 10-year prognosis for differentiated thyroid cancer with similar features [46]. However, both papers lacked detailed tables of classification metrics, making it challenging to compare them with our study. In the prognosis of lung cancer [47], Wang et al. employed an approach similar to ours: they achieved an accuracy ranging between 70% and 90% with seven machine learning models, classifying the survival probability at 1, 3, and 5 years by predicting the prognosis of lung cancer patients at different time points, and they also predicted the prognosis separately for male and female patients. In terms of variable treatment, Wang et al. used common variables such as age, race, marital status, grade, T stage, N stage, M stage, and surgery. Moreover, their training process required retraining the machine learning model whenever the classification labels changed. In contrast, our research handles the division of prognostic years in the regression stage (stage 2), where we only need to convert the regression results into different prognostic years.

5. Conclusions

In conclusion, our research represents a significant advancement in renal cell carcinoma (RCC) prognosis through multimodal machine learning. We highlight four key contributions that collectively enhance the understanding and utility of predictive models in clinical practice.
Firstly, our innovative approach consolidates features from diverse modalities, departing from conventional methods and providing a holistic input for machine learning models. This empowers healthcare professionals with a comprehensive tool for RCC prediction.
Secondly, our introduction of a distinctive model fusion technique offers a more stable and accurate alternative to traditional methods, improving the reliability of prognostic assessments in real-world scenarios. Within the prediction framework, machine learning based on multimodal data outperforms machine learning based on unimodal data, and the accuracy of predicting the survival status of RCC patients exceeds 75%. Moreover, in terms of model selection, the fusion models perform more stably than the single machine learning models, while their predictions can be disentangled and interpreted in terms of variable importance just as easily as those of single models.
Thirdly, our two-stage model training framework, focusing on survival status prediction and prognosis forecasting, represents a paradigm shift in RCC prognosis. This temporal disaggregation enables unprecedented granularity in survival predictions, enhancing clinical decision-making.
Lastly, our selection of optimal models, coupled with visualization of variable importance using the SHAP package, enables a good understanding of the variables influencing RCC diagnosis and survival prediction. This improves the explainability of machine learning predictions and provides valuable guidance for medical practitioners and RCC patients in real-world applications. In future research, we plan to apply similar logical frameworks and ideas to the multimodal machine learning analysis of other cancers and diseases. We will expand the choice of data sources, consider using deep learning algorithms for feature extraction from medical images, audio, and other kinds of data, and integrate the different multimodal data to further improve the utility of two-stage multimodal machine learning.

Author Contributions

Conceptualization, K.Y. and S.F.; methodology, K.Y.; software, K.Y.; validation, S.F.; writing—original draft preparation, K.Y.; writing—review and editing, T.L. and Q.S.; supervision, S.F.; funding acquisition, S.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Chongqing Technology and Business University 2023 High-level Talent Research Initiation Project (Grant No. 2356013), Natural Science Foundation of Chongqing, China (CSTB2022NSCQ-MSX1571), Guangzhou Development Zone Science and Technology (2021GH10, 2020GH10, EF003/FST-FSJ/2019/GSTIC), Macau FDCT (0032/2022/A, 0091/2020/A2) and University of Macau (MYRG2022-00271-FST, Collaborative Research Grant (MYRG-CRG)—CRG2021-00002-ICI).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare that there are no personal or organizational conflicts of interest with this work.

Abbreviations

The following abbreviations are used in this manuscript:
RCC	renal cell carcinoma
SEER	Surveillance, Epidemiology, and End Results
SHAP	SHapley Additive exPlanations
ADA	Adaptive Boosting
ET	Extra Trees
GB	Gradient Boosting
RF	Random Forest
XGB	Extreme Gradient Boosting

References

  1. Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2018, 68, 394–424.
  2. Umberto, C.; Montorsi, F. Renal cancer. Lancet 2016, 387, 894–906.
  3. Zhi, Y.; Li, X.; Qi, F.; Hu, X.; Xu, W. Association of Tumor Size with Risk of Lymph Node Metastasis in Clear Cell Renal Cell Carcinoma: A Population-Based Study. J. Oncol. 2020, 2020, 8887782.
  4. de Leon, A.D.; Pedrosa, I. Imaging and screening of kidney cancer. Radiol. Clin. 2017, 55, 1235–1250.
  5. Sharma, R.; Kannourakis, G.; Prithviraj, P.; Ahmed, N. Precision medicine: An optimal approach to patient care in renal cell carcinoma. Front. Med. 2022, 9, 766869.
  6. Yan, K.; Li, T.; Marques, J.A.L.; Gao, J.; Fong, S.J. A review on multimodal machine learning in medical diagnostics. Math. Biosci. Eng. 2023, 20, 8708–8726.
  7. Hatami, F.; Rahman, M.M.; Nikparvar, B.; Thill, J.C. Non-linear associations between the urban built environment and commuting modal split: A random forest approach and SHAP evaluation. IEEE Access 2023, 11, 12649–12662.
  8. Devriendt, F.; Van Belle, J.; Guns, T.; Verbeke, W. Learning to rank for uplift modeling. IEEE Trans. Knowl. Data Eng. 2020, 34, 4888–4904.
  9. Miao, J.; Wang, Y.; Gu, X.; Lin, W.; Ouyang, Z.; Wang, M.; Su, J. Risk of Cardiovascular Disease Death in Older Malignant Melanoma Patients: A Population-Based Study. Cancers 2022, 14, 4783.
  10. Li, W.; Huang, G.; Wang, Y.Q. Development a survival prediction model for patients with Paget disease of the breast based on the SEER database. Med. Data Min. 2023, 6, 2.
  11. Pausch, T.M.; Liu, X.; Cui, J.; Wei, J.; Miao, Y.; Heger, U.; Hackert, T. Survival benefit of resection surgery for pancreatic ductal adenocarcinoma with liver metastases: A propensity score-matched SEER database analysis. Cancers 2021, 14, 57.
  12. Alabi, R.O.; Mäkitie, A.A.; Pirinen, M.; Elmusrati, M.; Leivo, I.; Almangush, A. Comparison of nomogram with machine learning techniques for prediction of overall survival in patients with tongue cancer. Int. J. Med. Inform. 2021, 145, 104313.
  13. Li, W.; Zhou, Q.; Liu, W.; Xu, C.; Tang, Z.R.; Dong, S.; Yin, C. A machine learning-based predictive model for predicting lymph node metastasis in patients with ewing’s sarcoma. Front. Med. 2022, 9, 832108.
  14. Tian, H.; Ning, Z.; Zong, Z.; Liu, J.; Hu, C.; Ying, H.; Li, H.L. Application of machine learning algorithms to predict lymph node metastasis in early gastric cancer. Front. Med. 2022, 8, 759013.
  15. Liu, W.C.; Li, M.X.; Qian, W.X.; Luo, Z.W.; Liao, W.J.; Liu, Z.L.; Liu, J.M. Application of machine learning techniques to predict bone metastasis in patients with prostate cancer. Cancer Manag. Res. 2021, 13, 8723–8736.
  16. Li, W.; Liu, W.; Hussain, M.F.; Wang, B.; Xu, C.; Dong, S.; Yin, C. An external-validated prediction model to predict lung metastasis among osteosarcoma: A multicenter analysis based on machine learning. Comput. Intell. Neurosci. 2022, 2022, 2220527.
  17. Jiang, J.; Pan, H.; Li, M.; Qian, B.; Lin, X.; Fan, S. Predictive model for the 5-year survival status of osteosarcoma patients based on the SEER database and XGBoost algorithm. Sci. Rep. 2021, 11, 5542.
  18. Huang, K.; Zhang, J.; Yu, Y.; Lin, Y.; Song, C. The impact of chemotherapy and survival prediction by machine learning in early Elderly Triple Negative Breast Cancer (eTNBC): A population based study from the SEER database. BMC Geriatr. 2022, 22, 268.
  19. Lynch, C.M.; Abdollahi, B.; Fuqua, J.D.; de Carlo, A.R.; Bartholomai, J.A.; Balgemann, R.N.; van Berkel, V.H.; Frieboes, H.B. Prediction of lung cancer patient survival via supervised machine learning classification techniques. Int. J. Med. Inform. 2017, 108, 1–8.
  20. Xu, C.; Liu, W.; Yin, C.; Li, W.; Liu, J.; Sheng, W.; Zhang, Q. Establishment and validation of a machine learning prediction model based on big data for predicting the risk of bone metastasis in renal cell carcinoma patients. Comput. Math. Methods Med. 2022, 2022, 5676570.
  21. Li, W.; Hong, T.; Liu, W.; Dong, S.; Wang, H.; Tang, Z.R.; Yin, C. Development of a machine learning-based predictive model for lung metastasis in patients with ewing sarcoma. Front. Med. 2022, 9, 807382.
  22. Cavalcante, C.H.; Primo, P.E.; Sales, C.A.; Caldas, W.L.; Silva, J.H.; Souza, A.H.; Madeiro, J.P. Sudden cardiac death multiparametric classification system for Chagas heart disease’s patients based on clinical data and 24-hours ECG monitoring. Math. Biosci. Eng. 2023, 20, 9159–9178.
  23. Qiu, B.; Su, X.H.; Qin, X.; Wang, Q. Application of machine learning techniques in real-world research to predict the risk of liver metastasis in rectal cancer. Front. Oncol. 2022, 12, 1065468.
  24. Sorayaie, A.A.; Babaei, R.S.; Naemi, A.; Bagherzadeh, M.J.; Pirnejad, H.; Bagherzadeh, M.M.; Wiil, U.K. Application of machine learning techniques for predicting survival in ovarian cancer. BMC Med. Inform. Decis. Mak. 2022, 22, 345.
  25. Alabi, R.O.; Almangush, A.; Elmusrati, M.; Leivo, I.; Mäkitie, A.A. An interpretable machine learning prognostic system for risk stratification in oropharyngeal cancer. Int. J. Med. Inform. 2022, 168, 104896.
  26. Kane, K.; Lo, V.S.; Zheng, J. Mining for the truly responsive customers and prospects using true-lift modeling: Comparison of new and existing methods. J. Mark. Anal. 2014, 2, 218–238.
  27. Benjamin, D.J.; Berger, J.O.; Johannesson, M.; Nosek, B.A.; Wagenmakers, E.J.; Berk, R.; Johnson, V.E. Redefine statistical significance. Nat. Hum. Behav. 2018, 2, 6–10.
  28. Liao, F.; Wang, W.; Wang, J. A deep learning-based model predicts survival for patients with laryngeal squamous cell carcinoma: A large population-based study. Eur. Arch.-Oto-Rhino-Laryngol. 2023, 280, 789–795.
  29. Ruan, Z.; Quan, Q.; Wang, Q.; Jiang, J.; Peng, R. New staging system and prognostic model for malignant phyllodes tumor patients without distant metastasis: A development and validation study. J. Clin. Med. 2023, 12, 1889.
  30. Yan, P.; Huang, R.; Hu, P.; Liu, F.; Zhu, X.; Hu, P.; Huang, Z. Nomograms for predicting the overall and cause-specific survival in patients with malignant peripheral nerve sheath tumor: A population-based study. J. Neuro-Oncol. 2019, 143, 495–503.
  31. Chen, S.; Li, X.; Liang, Y.; Lu, X.; Huang, Y.; Zhu, J.; Li, J. Short-term prognosis for hepatocellular carcinoma patients with lung metastasis: A retrospective cohort study based on the SEER database. Medicine 2022, 101, e31399.
  32. Sedighi-Maman, Z.; Heath, J.J. An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction. Sensors 2022, 22, 6783.
  33. Wang, Y.; Yan, K. Machine learning-based quantitative trading strategies across different time intervals in the American market. Quant. Financ. Econ. 2023, 7, 569–594.
  34. Li, Y.; Yan, K. Prediction of Barrier Option Price Based on Antithetic Monte Carlo and Machine Learning Methods. Cloud Comput. Data Sci. 2023, 4, 77–86.
  35. Ampomah, E.K.; Qin, Z.; Nyame, G.; Botchey, F.E. Stock market decision support modeling with tree-based AdaBoost ensemble machine learning models. Informatica 2021, 44, 477–489.
  36. Nti, I.K.; Adekoya, A.F.; Weyori, B.A. A comprehensive evaluation of ensemble learning for stock-market prediction. J. Big Data 2020, 7, 20.
  37. Teng, J.; Zhang, H.; Liu, W.; Shu, X.O.; Ye, F. A dynamic Bayesian model for breast cancer survival prediction. IEEE J. Biomed. Health Inform. 2022, 26, 5716–5727.
  38. Liu, P.; Fei, S. Two-stage prediction of comorbid cancer patient survivability based on improved infinite feature selection. IEEE Access 2020, 8, 169559–169567.
  39. Hevia-Montiel, N.; Perez-Gonzalez, J.; Neme, A.; Haro, P. Machine Learning-Based Feature Selection and Classification for the Experimental Diagnosis of Trypanosoma cruzi. Electronics 2022, 11, 785.
  40. Wijaya, D.; Jumri Habbeyb, D.S.; Barus, S.; Pasaribu, B.; Sirbu, L.I.; Dharma, A. Uplift modeling VS conventional predictive model: A reliable machine learning model to solve employee turnover. Int. J. Artif. Intell. Res. 2021, 5, 53–64.
  41. Christensen, M.; Hannan, R. The emerging role of radiation therapy in renal cell carcinoma. Cancers 2022, 14, 4693.
  42. Siech, C.; Morra, S.; Scheipner, L.; Baudo, A.; Jannello, L.M.; de Angelis, M.; Karakiewicz, P.I. Married Status Affects Rates of Treatment and Mortality in Male and Female Renal Cell Carcinoma Patients Across all Stages. Clin. Genitourin. Cancer 2024, 22, 593–598.
  43. Marchioni, M.; Martel, T.B.; Bandini, M.; Pompe, R.S.; Tian, Z.; Kapoor, A.; Karakiewicz, P.I. Marital status and gender affect stage, tumor grade, treatment type and cancer specific mortality in T1–2 N0 M0 renal cell carcinoma. World J. Urol. 2017, 35, 1899–1905.
  44. Wang, Z.; Xu, C.; Liu, W.; Zhang, M.; Zou, J.A.; Shao, M.; Yin, C. A clinical prediction model for predicting the risk of liver metastasis from renal cell carcinoma based on machine learning. Front. Endocrinol. 2023, 13, 1083569. [Google Scholar] [CrossRef] [PubMed]
  45. Yu, W.; Lu, Y.; Shou, H.; Xu, H.E.; Shi, L.; Geng, X.; Song, T. A 5-year survival status prognosis of nonmetastatic cervical cancer patients through machine learning algorithms. Cancer Med. 2023, 12, 6867–6876. [Google Scholar] [CrossRef]
  46. Jin, S.; Yang, X.; Zhong, Q.; Liu, X.; Zheng, T.; Zhu, L.; Yang, J. A predictive model for the 10-year overall survival status of patients with distant metastases from differentiated thyroid cancer using Xgboost algorithm-a population-based analysis. Front. Genet. 2022, 13, 896805. [Google Scholar] [CrossRef]
  47. Wang, Y.; Liu, S.; Wang, Z.; Fan, Y.; Huang, J.; Huang, L.; Zhou, F. A machine learning-based investigation of gender-specific prognosis of lung cancers. Medicina 2021, 57, 99. [Google Scholar] [CrossRef]
Figure 1. Model structure of Voting.
Figure 2. Model structure of Stacking and Blending.
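Figures 1 and 2 summarize the three fusion strategies. The snippet below is a minimal sketch of how such Voting, Stacking, and Blending models can be assembled from the five base learners with scikit-learn and XGBoost; the training data handles `X_train`/`y_train`, the default hyperparameters, and the 80/20 hold-out split for the blending meta-model are illustrative assumptions rather than the exact configuration used in this study.

```python
# Minimal sketch of the Voting, Stacking, and Blending fusion structures in
# Figures 1 and 2. Hyperparameters and the blending hold-out split are
# illustrative assumptions, not the paper's exact settings.
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

base_learners = [
    ("ada", AdaBoostClassifier(random_state=0)),
    ("et", ExtraTreesClassifier(random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("rf", RandomForestClassifier(random_state=0)),
    ("xgb", XGBClassifier(random_state=0)),
]

# Voting: soft vote over the base learners' predicted probabilities (Figure 1).
voting = VotingClassifier(estimators=base_learners, voting="soft")

# Stacking: out-of-fold base predictions feed a logistic regression meta-model (Figure 2).
stacking = StackingClassifier(estimators=base_learners,
                              final_estimator=LogisticRegression(max_iter=1000),
                              cv=5)

def fit_blending(X_train, y_train, holdout_size=0.2):
    """Blending (Figure 2): fit the base learners on one part of the training
    data and a logistic regression meta-model on their predictions for a
    held-out part."""
    X_fit, X_hold, y_fit, y_hold = train_test_split(
        X_train, y_train, test_size=holdout_size, random_state=0, stratify=y_train)
    fitted = [clone(clf).fit(X_fit, y_fit) for _, clf in base_learners]
    meta_X = np.column_stack([clf.predict_proba(X_hold)[:, 1] for clf in fitted])
    meta_model = LogisticRegression(max_iter=1000).fit(meta_X, y_hold)
    return fitted, meta_model

def predict_blending(fitted, meta_model, X_test):
    meta_X = np.column_stack([clf.predict_proba(X_test)[:, 1] for clf in fitted])
    return meta_model.predict(meta_X)
```

A logistic regression meta-model of this kind also exposes interpretable coefficients, which is the basis of the feature-importance view in Figure 4.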
Figure 3. Experimental framework.
Figure 4. Feature importance of the logistic regression meta-model of Blending.
Figure 5. Feature importance of RF in stage 1.
Figure 6. Feature importance of GB in stage 1.
Figure 7. Dependence plots of the top 6 features of GB.
Figure 8. Dependence plots of the top 6 features of RF.
Figure 9. Waterfall plot of GB for an individual RCC patient.
Figure 10. Waterfall plot of RF for an individual RCC patient.
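Figures 5–10 are SHAP-based interpretability views. A minimal sketch of how such global importance, dependence, and per-patient waterfall plots can be produced for a fitted stage-1 tree model is given below; the model handle `gb_model`, the pandas feature frame `X`, and the column name used in the dependence plot are assumptions for illustration, and the exact plotting options in this study may differ.

```python
# Minimal sketch of SHAP explanations for a fitted stage-1 tree model such as
# the gradient boosting classifier. `gb_model`, the pandas feature frame `X`,
# and the column name "Tumor size" are illustrative assumptions.
import shap

explainer = shap.TreeExplainer(gb_model)
shap_values = explainer(X)              # one explanation row per patient

# Global importance: mean |SHAP value| per feature (cf. Figures 5 and 6).
shap.plots.bar(shap_values)

# Dependence: effect of a single feature across the cohort (cf. Figures 7 and 8).
shap.plots.scatter(shap_values[:, "Tumor size"], color=shap_values)

# Local explanation: contribution breakdown for one patient (cf. Figures 9 and 10).
shap.plots.waterfall(shap_values[0])
```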
Figure 11. Survival status prognosis of 1–5 years.
Figure 12. Feature importance of the linear regression meta-model of Stacking.
Figure 13. Feature importance of the linear regression meta-model of Blending.
Figure 14. Feature importance of GB in stage 2.
Figure 15. Feature importance of XGB in stage 2.
Figure 16. Feature importance of ET in stage 2.
Table 1. Detailed description of SEER features.

| Modal | Feature Name | Type | Values | Description |
|---|---|---|---|---|
| Personal information | Sex | Categorical | Female, Male | the sex of the patient |
| Personal information | PRCDA | Categorical | Yes, No, Unknown | whether the patient belongs to PRCDA or not |
| Personal information | Race | Categorical | White, Black, Other, Unknown | the race of the patient |
| Personal information | Age | Categorical | Young (0–30), Middle (30–60), Old (>60) | the age group of the patient |
| Personal information | Marital status | Categorical | Married, Unmarried, Divorced, Widowed | the marital status of the patient |
| Tumor pathology | Grade | Categorical | 1, 2, 3, 4 | the differentiation grade of the tumor |
| Tumor pathology | T stage | Categorical | T0, T1, T2, T3, T4 | the AJCC ‘T’ component |
| Tumor pathology | N stage | Categorical | N0, N1 | the AJCC ‘N’ component |
| Tumor pathology | M stage | Categorical | M0, M1 | the AJCC ‘M’ component |
| Tumor pathology | Tumor size | Numerical | 0–999 | the size of the tumor |
| Tumor pathology | Laterality | Categorical | Left, Right, Paired, Bilateral, Oneside, Not_paired | the side of the body or paired organ on which the reportable tumor originated |
| Tumor pathology | Histologic type | Categorical | 8312/3, 8317/3, 8318/3, 8316/3 | the type of cancerous cells |
| Tumor pathology | Primary Site | Categorical | C64.9, C65.9, C80.9, C66.9, C20.9, C34.1, C61.9, C74.9, C67.5, C67.2, C67.9, C68.9, C25.8, C56.9 | the tumor’s site of origin |
| Tumor pathology | Benign/borderline tumors number | Numerical | 0, 1, 2, 3, 4 | the total number of benign or borderline tumors for the patient |
| Treatment | Surgery_LN | Categorical | Yes, No | surgery of the regional lymph nodes |
| Treatment | Surgery_Other | Categorical | Yes, No | surgery beyond the regional lymph nodes |
| Treatment | Surgery_Cancer | Categorical | Yes, No | whether cancer-directed surgery was performed |
| Treatment | Radiation | Categorical | Yes, No | whether radiation was performed |
| Treatment | Chemotherapy | Categorical | Yes, No | whether chemotherapy was performed |
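Most features in Table 1 are categorical and are typically one-hot encoded before training, which is why derived column names such as Age_old, Grade_2, and T stage_1 appear in Tables 6 and 7. The sketch below shows one plausible per-modality encoding with pandas; the dataframe `df` and the exact column names are assumptions for illustration rather than this study's preprocessing pipeline.

```python
# Minimal sketch of per-modality one-hot encoding for the SEER features in
# Table 1. The raw dataframe `df` and the column names are assumptions.
import pandas as pd

personal_cols = ["Sex", "PRCDA", "Race", "Age", "Marital status"]
pathology_cols = ["Grade", "T stage", "N stage", "M stage", "Tumor size",
                  "Laterality", "Histologic type", "Primary Site",
                  "Benign/borderline tumors number"]
treatment_cols = ["Surgery_LN", "Surgery_Other", "Surgery_Cancer",
                  "Radiation", "Chemotherapy"]

def encode_modality(df, cols):
    """One-hot encode the categorical columns of one modality; numerical
    columns (e.g., tumor size) are passed through unchanged."""
    categorical = [c for c in cols if df[c].dtype == "object"]
    return pd.get_dummies(df[cols], columns=categorical)

# Unimodal datasets (Tables 2-4) and the fused multimodal dataset (Table 5).
X_personal = encode_modality(df, personal_cols)
X_pathology = encode_modality(df, pathology_cols)
X_treatment = encode_modality(df, treatment_cols)
X_multimodal = pd.concat([X_personal, X_pathology, X_treatment], axis=1)
```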
Table 2. Classification metrics based on unimodal personal information dataset.

| Model | Dataset | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| ADA | Training | 61.79 ± 0.19 | 61.88 ± 0.22 | 61.79 ± 0.19 | 61.81 ± 0.19 |
| ADA | Test | 62.34 | 62.45 | 62.34 | 62.36 |
| ET | Training | 61.72 ± 0.25 | 62.40 ± 0.49 | 61.72 ± 0.25 | 61.58 ± 0.24 |
| ET | Test | 61.78 | 62.41 | 61.78 | 61.68 |
| GB | Training | 61.82 ± 0.16 | 62.33 ± 0.28 | 61.82 ± 0.16 | 61.74 ± 0.18 |
| GB | Test | 61.63 | 62.11 | 61.63 | 61.57 |
| RF | Training | 61.71 ± 0.27 | 61.78 ± 0.31 | 61.71 ± 0.27 | 61.73 ± 0.28 |
| RF | Test | 62.33 | 62.42 | 62.33 | 62.35 |
| XGB | Training | 61.71 ± 0.21 | 61.91 ± 0.25 | 61.71 ± 0.21 | 61.71 ± 0.24 |
| XGB | Test | 62.26 | 62.38 | 62.26 | 62.28 |
| Voting | Training | 61.65 ± 0.34 | 61.77 ± 0.50 | 61.65 ± 0.34 | 61.63 ± 0.30 |
| Voting | Test | 62.35 | 62.42 | 62.35 | 62.37 |
| Stacking | Training | 61.77 ± 0.14 | 61.87 ± 0.19 | 61.77 ± 0.14 | 61.79 ± 0.15 |
| Stacking | Test | 62.28 | 62.35 | 62.28 | 62.30 |
| Blending | Training | 61.16 ± 0.21 | 61.49 ± 0.30 | 61.16 ± 0.21 | 61.14 ± 0.21 |
| Blending | Test | 61.74 | 62.15 | 61.74 | 61.71 |
Table 3. Classification metrics based on unimodal tumor pathology dataset.

| Model | Dataset | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| ADA | Training | 70.58 ± 0.33 | 72.47 ± 0.29 | 70.58 ± 0.33 | 69.61 ± 0.49 |
| ADA | Test | 70.38 | 72.26 | 70.38 | 69.35 |
| ET | Training | 70.89 ± 0.52 | 73.60 ± 0.21 | 70.89 ± 0.52 | 69.65 ± 0.67 |
| ET | Test | 70.45 | 73.34 | 70.45 | 69.06 |
| GB | Training | 71.13 ± 0.31 | 71.41 ± 0.33 | 71.13 ± 0.31 | 70.88 ± 0.31 |
| GB | Test | 70.86 | 71.13 | 70.86 | 70.59 |
| RF | Training | 70.88 ± 0.40 | 72.85 ± 0.22 | 70.88 ± 0.40 | 69.90 ± 0.44 |
| RF | Test | 70.89 | 72.84 | 70.89 | 69.88 |
| XGB | Training | 70.97 ± 0.18 | 71.28 ± 0.21 | 70.97 ± 0.18 | 70.76 ± 0.26 |
| XGB | Test | 70.94 | 71.08 | 70.94 | 70.75 |
| Voting | Training | 71.01 ± 0.30 | 73.21 ± 0.33 | 71.01 ± 0.30 | 69.95 ± 0.33 |
| Voting | Test | 70.91 | 73.10 | 70.91 | 69.82 |
| Stacking | Training | 70.71 ± 0.45 | 71.16 ± 0.33 | 70.71 ± 0.45 | 70.37 ± 0.51 |
| Stacking | Test | 70.78 | 71.01 | 70.78 | 70.54 |
| Blending | Training | 70.93 ± 0.86 | 72.01 ± 0.96 | 70.93 ± 0.86 | 70.29 ± 0.85 |
| Blending | Test | 70.84 | 71.96 | 70.84 | 70.15 |
Table 4. Classification metrics based on unimodal treatment dataset.

| Model | Dataset | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| ADA | Training | 66.55 ± 0.46 | 73.81 ± 0.26 | 66.55 ± 0.46 | 63.05 ± 0.48 |
| ADA | Test | 66.99 | 73.94 | 66.99 | 63.61 |
| ET | Training | 66.55 ± 0.46 | 73.81 ± 0.26 | 66.55 ± 0.46 | 63.05 ± 0.48 |
| ET | Test | 66.99 | 73.94 | 66.99 | 63.61 |
| GB | Training | 66.55 ± 0.46 | 73.81 ± 0.26 | 66.55 ± 0.46 | 63.05 ± 0.48 |
| GB | Test | 66.99 | 73.94 | 66.99 | 63.61 |
| RF | Training | 66.55 ± 0.46 | 73.81 ± 0.26 | 66.55 ± 0.46 | 63.05 ± 0.48 |
| RF | Test | 66.99 | 73.94 | 66.99 | 63.61 |
| XGB | Training | 66.55 ± 0.46 | 73.81 ± 0.26 | 66.55 ± 0.46 | 63.05 ± 0.48 |
| XGB | Test | 66.99 | 73.94 | 66.99 | 63.61 |
| Voting | Training | 66.55 ± 0.27 | 73.80 ± 0.11 | 66.55 ± 0.27 | 63.05 ± 0.42 |
| Voting | Test | 66.99 | 73.94 | 66.99 | 63.61 |
| Stacking | Training | 66.55 ± 0.46 | 73.81 ± 0.26 | 66.55 ± 0.46 | 63.05 ± 0.48 |
| Stacking | Test | 66.99 | 73.94 | 66.99 | 63.61 |
| Blending | Training | 66.83 ± 0.16 | 74.04 ± 0.20 | 66.83 ± 0.16 | 63.36 ± 0.15 |
| Blending | Test | 66.99 | 73.94 | 66.99 | 63.61 |
Table 5. Classification metrics based on multimodal dataset.

| Model | Dataset | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| ADA | Training | 74.48 ± 0.23 | 74.58 ± 0.21 | 74.48 ± 0.23 | 74.37 ± 0.24 |
| ADA | Test | 74.52 | 74.59 | 74.52 | 74.42 |
| ET | Training | 74.96 ± 0.19 | 75.03 ± 0.20 | 74.96 ± 0.19 | 74.88 ± 0.19 |
| ET | Test | 76.26 | 76.33 | 76.26 | 76.17 |
| GB | Training | 76.89 ± 0.20 | 76.93 ± 0.17 | 76.89 ± 0.20 | 76.85 ± 0.21 |
| GB | Test | 77.50 | 77.57 | 77.50 | 77.43 |
| RF | Training | 76.56 ± 0.42 | 76.63 ± 0.37 | 76.56 ± 0.42 | 76.48 ± 0.43 |
| RF | Test | 77.06 | 77.16 | 77.06 | 76.98 |
| XGB | Training | 75.74 ± 0.35 | 75.79 ± 0.31 | 75.74 ± 0.35 | 75.67 ± 0.36 |
| XGB | Test | 76.12 | 76.18 | 76.12 | 76.05 |
| Voting | Training | 76.08 ± 0.29 | 76.14 ± 0.31 | 76.08 ± 0.29 | 76.01 ± 0.28 |
| Voting | Test | 76.59 | 76.66 | 76.59 | 76.51 |
| Stacking | Training | 76.72 ± 0.29 | 76.74 ± 0.27 | 76.72 ± 0.29 | 76.68 ± 0.30 |
| Stacking | Test | 77.19 | 77.24 | 77.19 | 77.13 |
| Blending | Training | 77.00 ± 0.60 | 77.04 ± 0.62 | 77.00 ± 0.60 | 76.95 ± 0.58 |
| Blending | Test | 77.35 | 77.40 | 77.35 | 77.29 |
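The Accuracy, Precision, Recall, and F1 values in Tables 2–5 are consistent with class-weighted averaging, under which weighted Recall equals Accuracy by construction (hence the matching columns). A minimal sketch, assuming true labels `y_test` and predicted labels `y_pred` are available and that weighted averaging is indeed the convention used:

```python
# Minimal sketch of the stage-1 classification metrics (Tables 2-5), assuming
# weighted averaging over the classes; y_test and y_pred are assumed to exist.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted")

# With weighted averaging, recall equals accuracy by construction.
print(f"Accuracy {accuracy:.2%}  Precision {precision:.2%}  "
      f"Recall {recall:.2%}  F1 {f1:.2%}")
```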
Table 6. Feature importance ranking of RF Uplift Model.

| Variable | Correlation | Uplift Score | Qini Coefficient | Accuracy |
|---|---|---|---|---|
| M stage | −0.40 | 0.81 | −11.79 | 73.99% |
| Age_old | −0.24 | 0.39 | 6.11 | 76.92% |
| Histologic type_8317/3 | 0.27 | 0.47 | 0.95 | 76.95% |
| Age_middle | 0.23 | 0.35 | −0.22 | 76.38% |
| Surgery_cancer | 0.27 | 0.53 | 4.57 | 74.57% |
| Grade_2 | 0.21 | 0.13 | 6.06 | 76.89% |
| Histologic type_8312/3 | −0.18 | 0.29 | 5.53 | 76.27% |
| T stage_1 | 0.19 | 0.07 | −1.41 | 76.89% |
| Marital status_Widowed | −0.14 | 0.25 | 3.93 | 75.23% |
| Chemotherapy | −0.23 | 0.26 | −8.26 | 72.20% |
Table 7. Feature importance ranking of GB Uplift Model.

| Variable | Correlation | Uplift Score | Qini Coefficient | Accuracy |
|---|---|---|---|---|
| M stage | −0.40 | 1.00 | −7.29 | 75.05% |
| Age_old | −0.24 | 0.42 | 12.74 | 77.31% |
| Histologic type_8317/3 | 0.27 | 0.70 | 16.97 | 77.40% |
| Surgery_cancer | 0.27 | 0.78 | 8.63 | 76.37% |
| Age_middle | 0.23 | 0.34 | 18.04 | 77.12% |
| T stage_1 | 0.19 | 0.39 | 18.04 | 77.13% |
| Grade_2 | 0.21 | 0.28 | 18.22 | 77.82% |
| Marital status_Widowed | −0.14 | 0.36 | 12.11 | 76.21% |
| Histologic type_8312/3 | −0.18 | 0.37 | 16.76 | 76.98% |
| Radiation | −0.23 | 0.68 | −10.92 | 73.53% |
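Tables 6 and 7 rank one-hot encoded features by how strongly toggling them shifts the models' predicted outcome (uplift score) and by the associated Qini coefficient. The sketch below illustrates a simple single-model, toggle-the-feature uplift estimate; it is an illustrative reconstruction under stated assumptions (a fitted classifier `model`, an encoded frame `X`, and the listed column names), not this study's exact uplift or Qini procedure.

```python
# Minimal sketch of a single-model ("toggle the feature") uplift estimate for
# the binary features ranked in Tables 6 and 7. `model`, `X`, and the column
# names are illustrative assumptions, not the paper's exact procedure.
import numpy as np
import pandas as pd

def feature_uplift(model, X: pd.DataFrame, feature: str) -> float:
    """Average change in the predicted death probability when `feature` is
    switched from 0 to 1 for every patient, all else held fixed."""
    X_on, X_off = X.copy(), X.copy()
    X_on[feature], X_off[feature] = 1, 0
    return float(np.mean(model.predict_proba(X_on)[:, 1]
                         - model.predict_proba(X_off)[:, 1]))

candidate_features = ["M stage", "Age_old", "Surgery_cancer", "Chemotherapy"]
ranked = sorted(((f, feature_uplift(model, X, f)) for f in candidate_features),
                key=lambda t: abs(t[1]), reverse=True)
for feature, uplift in ranked:
    print(f"{feature}: uplift {uplift:+.2f}")
```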
Table 8. Survival time prediction metrics.

| Model | RMSE | MAE |
|---|---|---|
| ADA | 52.64 | 37.39 |
| ET | 46.63 | 32.00 |
| GB | 50.64 | 35.25 |
| RF | 50.36 | 34.99 |
| XGB | 51.39 | 35.64 |
| Voting | 50.54 | 35.14 |
| Stacking | 50.40 | 34.98 |
| Blending | 50.35 | 34.84 |
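The stage-2 regression errors in Table 8 are reported as RMSE and MAE over the predicted survival time in months. A minimal sketch of computing both, assuming arrays `y_true` and `y_pred` of survival months for deceased patients:

```python
# Minimal sketch of the stage-2 regression metrics in Table 8; y_true and
# y_pred are assumed to be survival times in months for deceased patients.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)
print(f"RMSE {rmse:.2f} months, MAE {mae:.2f} months")
```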
Table 9. Survival status prognosis metrics of 1–5 years.

| Model | Metrics | 1 Year | 2 Year | 3 Year | 4 Year | 5 Year |
|---|---|---|---|---|---|---|
| ADA | Accuracy | 66.42 | 68.52 | 70.05 | 69.79 | 70.42 |
| ADA | Precision | 69.06 | 73.27 | 72.71 | 73.36 | 72.64 |
| ADA | Recall | 66.42 | 68.52 | 70.05 | 69.79 | 70.42 |
| ADA | F1 score | 63.09 | 68.11 | 70.56 | 70.68 | 71.24 |
| ET | Accuracy | 68.67 | 72.81 | 74.38 | 74.60 | 75.23 |
| ET | Precision | 71.19 | 74.99 | 76.67 | 76.02 | 73.62 |
| ET | Recall | 68.67 | 72.81 | 74.38 | 74.60 | 75.23 |
| ET | F1 score | 65.77 | 72.76 | 74.76 | 75.07 | 74.00 |
| GB | Accuracy | 68.29 | 71.90 | 73.56 | 73.14 | 74.38 |
| GB | Precision | 70.92 | 74.52 | 76.20 | 74.64 | 73.75 |
| GB | Recall | 68.19 | 71.90 | 73.56 | 73.14 | 74.38 |
| GB | F1 score | 65.46 | 71.84 | 73.97 | 73.64 | 74.01 |
| RF | Accuracy | 67.57 | 71.18 | 72.71 | 73.80 | 75.02 |
| RF | Precision | 70.60 | 74.12 | 75.69 | 75.40 | 74.53 |
| RF | Recall | 67.57 | 71.18 | 72.71 | 73.80 | 75.02 |
| RF | F1 score | 64.61 | 71.17 | 73.22 | 74.34 | 74.75 |
| XGB | Accuracy | 68.33 | 71.68 | 72.90 | 73.10 | 74.47 |
| XGB | Precision | 71.20 | 74.34 | 75.63 | 74.55 | 73.51 |
| XGB | Recall | 68.33 | 71.68 | 72.90 | 73.10 | 74.47 |
| XGB | F1 score | 65.46 | 71.66 | 73.37 | 73.60 | 73.88 |
| Voting | Accuracy | 67.56 | 70.86 | 72.84 | 73.75 | 75.14 |
| Voting | Precision | 71.07 | 74.29 | 75.49 | 75.42 | 74.38 |
| Voting | Recall | 67.56 | 70.86 | 72.84 | 73.75 | 75.14 |
| Voting | F1 score | 64.42 | 70.81 | 73.34 | 74.30 | 74.69 |
| Stacking | Accuracy | 68.56 | 71.87 | 73.28 | 73.42 | 74.45 |
| Stacking | Precision | 71.28 | 74.52 | 76.00 | 74.89 | 73.45 |
| Stacking | Recall | 68.56 | 71.87 | 73.28 | 73.42 | 74.45 |
| Stacking | F1 score | 65.83 | 71.83 | 73.72 | 73.91 | 73.82 |
| Blending | Accuracy | 68.98 | 71.65 | 72.82 | 73.61 | 74.25 |
| Blending | Precision | 71.17 | 74.24 | 75.67 | 75.30 | 73.27 |
| Blending | Recall | 68.98 | 71.65 | 72.82 | 73.61 | 74.25 |
| Blending | F1 score | 66.58 | 71.61 | 73.27 | 74.16 | 73.64 |
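Table 9 evaluates fixed 1- to 5-year horizons. One plausible way to derive such horizon-specific survival-status labels from follow-up data is sketched below; the column names `vital_status` and `survival_months` are illustrative assumptions, not the exact SEER field names used in this study.

```python
# Minimal sketch of deriving 1-5 year survival-status labels (Table 9) from
# follow-up data; the column names `vital_status` and `survival_months` are
# illustrative assumptions rather than the exact SEER field names.
import pandas as pd

def survival_label(df: pd.DataFrame, years: int) -> pd.Series:
    """1 if the patient died within `years` years of diagnosis, 0 if the
    patient is known to have survived at least that long; patients censored
    before the horizon are left as <NA> and would be excluded."""
    months = 12 * years
    died_within = (df["vital_status"] == "Dead") & (df["survival_months"] < months)
    survived = df["survival_months"] >= months
    label = pd.Series(pd.NA, index=df.index, dtype="Int64")
    label[died_within] = 1
    label[survived] = 0
    return label

# Horizon-specific labels for the 1-5 year models evaluated in Table 9.
labels = {year: survival_label(df, year) for year in range(1, 6)}
```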
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
