Article

Predicting Overall Survival Time in Glioblastoma Patients Using Gradient Boosting Machines Algorithm and Recursive Feature Elimination Technique

by Golestan Karami 1,2,*, Marco Giuseppe Orlando 1, Andrea Delli Pizzi 1,2, Massimo Caulo 1,2 and Cosimo Del Gratta 1,2
1 Department of Neuroscience, Imaging and Clinical Sciences, Gabriele D’Annunzio University, 66100 Chieti, Italy
2 Institute for Advanced Biomedical Technologies, Gabriele D’Annunzio University, 66100 Chieti, Italy
* Author to whom correspondence should be addressed.
Cancers 2021, 13(19), 4976; https://doi.org/10.3390/cancers13194976
Submission received: 1 August 2021 / Revised: 20 September 2021 / Accepted: 29 September 2021 / Published: 4 October 2021
(This article belongs to the Special Issue Perioperative Imaging and Mapping Methods in Glioma Patients)


Simple Summary

Despite the highly aggressive nature of glioblastoma multiforme (GBM), survival time is in practice highly variable, and some patients remain stable for several years after treatment. The aim of this study was to develop a machine learning method that could precisely predict the survival time of GBM patients. To do so, we integrated multi-modal MRI with unsupervised and supervised machine learning. We first identified compartments of the tumor and then extracted their features. The relevant features were then selected by Random Forest-Recursive Feature Elimination (RF-RFE) and fed into a Gradient Boosting Machine algorithm with the aim of classifying GBM patients. By selecting the most relevant features, multi-modality MRI with tumor segmentation provided valuable, independent, and complete features to feed a machine learning model. Additionally, advanced machine learning methods such as RF-RFE and GBoost are powerful tools for data mining. Hand-crafted feature-based methods have shown promising results, but there is no systematic way to determine survival-related hand-crafted features, and existing methods mostly rely on experience.

Abstract

Despite advances in tumor treatment, inconsistent treatment response remains a major challenge in glioblastoma multiforme (GBM) and leads to different survival times. Our aim was to integrate multimodal MRI with unsupervised and supervised machine learning methods to predict GBM patients’ survival time. To this end, we identified different compartments of the tumor and extracted their features. Next, we applied Random Forest-Recursive Feature Elimination (RF-RFE) to identify the most relevant features to feed into a GBoost machine. This study included 29 GBM patients with known survival time. The RF-RFE GBoost model was evaluated to assess the survival prediction performance using optimal features. Furthermore, overall survival (OS) was analyzed using univariate and multivariate Cox regression analyses to evaluate the effect of ROIs and their features on survival. The results showed that the RF-RFE GBoost machine was able to predict survival time with 75% accuracy. The results also revealed that the rCBV in the low perfusion area was significantly different between groups and had the greatest effect size in terms of the rate of change of the response variable (survival time). In conclusion, both the integration of multi-modality MRI and the feature selection method can enhance classifier performance.

1. Introduction

Glioblastoma multiforme (GBM), a high-grade glioma (HGG), is the most common and aggressive brain malignancy in adults, accounting for 16% of all primary central nervous system neoplasms [1], with a median survival of 15 months [2]. Despite the highly aggressive nature of GBM, some patients remain stable for several years after treatment, and prognosis and survival times differ considerably in practice [3].
Studies have indicated that traditional WHO grading cannot capture the biological characteristics of gliomas and lacks power in prognosticating their clinical course. The 2016 CNS WHO classification presented a major restructuring of the diffuse glioma classification and, for the first time, used molecular parameters in addition to histology [4]. CNS tumors were thus defined by both histology and molecular features, including glioblastoma, IDH-wild type, and glioblastoma, IDH-mutant. The molecular subtypes depend on the presence or absence of mutations in the isocitrate dehydrogenase (IDH) gene. To better predict survival time for glioma patients, emphasis is put on the identification of molecular phenotypes [5,6]. Glioma patients with IDH-mutant tumors survive longer than those with IDH-wild type tumors [7]. The 2021 fifth edition of the WHO classification introduces major changes that advance the role of molecular diagnostics in CNS tumor classification. GBMs are classified as adult-type diffuse gliomas [8].
Healthcare has focused on improving survival in the treatment of brain tumor patients [9,10]. The effectiveness of the treatment procedure depends on the extent of resection, the sensitivity of the tumor to chemo-radiation therapy, the molecular subtype of the tumor, and its grade [7,11]. Despite the advances in treatment, inconsistent response remains a major challenge, which could be related to extensive heterogeneity [12] and to molecular subtype [13]. Currently, the only way to obtain a molecular diagnosis of gliomas is through surgical biopsy or complete resection of the tumor. Because surgical biopsy often provides limited information due to tumor spatial heterogeneity and does not allow real-time monitoring of the tumor, and because GBMs can infiltrate the surrounding brain tissue rapidly, early non-invasive diagnosis is critical for designing individual treatment plans. Moreover, a variety of distinct cells exist in the tumor and display diverse treatment responses, which can be the key to treatment failure [12].
Magnetic resonance imaging (MRI) is a rich source of patient information and is used as part of the procedure for diagnosis and follow-up of brain tumor patients [14]. Moreover, MRI techniques including dynamic susceptibility contrast (DSC) and diffusion tensor imaging (DTI) provide a valuable source of data to evaluate the biology of the tumors more comprehensively. Both diffusion and perfusion differ within the tumor, and differences in patient survival might be caused by differences in this heterogeneity. DSC-MRI allows investigating tumor angiogenesis and perfusion. Owing to their high angiogenesis, GBMs show high perfusion and a heterogeneous intra-tumor vascular pattern with a necrotic area. When the demand and supply of nutrients are mismatched, the sufficiently perfused habitats may lead to progression and proliferation, whereas the insufficiently perfused habitats may induce clones resistant to therapy via hypoxia. Hence, this heterogeneity could cause poor and inconsistent treatment response [15,16]. DTI-MRI provides information about the motion of water protons at the cellular level, reflecting structural data on cellular density and extracellular space [17]. The mean diffusivity (MD) is an MRI biomarker that correlates with cellular packing [18] and glioma grade [19,20]. Fractional anisotropy (FA) values describe the degree of anisotropy in a given voxel. Anisotropy reflects the increased cellularity and restriction of water diffusion in high-grade gliomas [21,22,23].
The parameters derived from DTI and DSC could represent complementary information about GBM patients, but due to the large number of features and the comparatively low number of patients, a preliminary feature selection step is required for better classification. Selecting useful features may not only improve the data’s compatibility with a machine learning model class but also shorten training time.
Hand-crafted features for machine learning techniques have been explored to predict patient outcomes. Vergun et al. [24] used multimodal images including resting-state fMRI, task fMRI, and DTI in combination with clinical and demographic variables fed into a support vector machine (SVM) to predict outcomes. One of the predicted variables was mortality within 18 months after surgery. Their model was able to classify patient mortality with 80.7% accuracy. In another study using multimodal MRI (post-contrast T1 and DTI), Nie et al. [25] proposed a multi-channel architecture of 3D convolution neural network (CNN) to extract high-level features. Then these features along with demographic tumor features were fed into SVM to finally predict overall survival time. They obtained 90.66% accuracy in predicting survival time. Chaddad et al. [26] proposed a novel class of multimodal image features based on the joint intensity matrix (JIM) to model fine-grained texture signatures in the radiomic analysis of low-grade glioma (LGG) tumors to predict gene status and overall survival time of patients. Their results showed that JIM features can predict the mutant or wild-type status of relevant genes for LGG. Classification combining all features including volume, JIM, and gray-level co-occurrence matrix (GLCM) resulted in an AUC value of 86.79% in predicting short and long LGG patient survival outcomes. Moreover, they showed that JIM features were generally the most informative predictors and provided information that is complementary to conventional GLCMs.
Recently, the Recursive Feature Elimination (RFE) algorithm has emerged as one of the most popular feature selection techniques, providing good performance in many applications [27,28,29]. RFE uses a classifier to rank the features and recursively removes the weakest features [30,31]. The process of removing the weakest features continues until the required number of features is reached. The Random Forest (RF) algorithm, used as the classifier, adds randomness by selecting a random subset of predictors, and the output of the random forest is the class selected by most trees. Due to this randomness in the feature selection, the RF classifier works more robustly compared to other classifiers such as SVM, discriminative analysis, and neural networks. RF-RFE estimates which features are more effective in discriminating the classes and eliminates features that are not useful in predicting the class. The performance of RFE depends on the algorithm used to rank the features and on the number of features to select. These hyperparameters can be optimized.
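The elimination loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in this study: the data are synthetic, the helper name `rf_rfe` is hypothetical, and the ranking criterion is assumed to be the impurity-based importance reported by scikit-learn's random forest.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def rf_rfe(X, y, n_keep, step=1, seed=0):
    """Recursively drop the least important features until n_keep remain."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        forest = RandomForestClassifier(n_estimators=100, random_state=seed)
        forest.fit(X[:, remaining], y)
        # Rank the surviving features by impurity-based importance (ascending).
        order = np.argsort(forest.feature_importances_)
        n_drop = min(step, len(remaining) - n_keep)
        weakest = set(order[:n_drop].tolist())
        remaining = [f for i, f in enumerate(remaining) if i not in weakest]
    return remaining


# Synthetic stand-in data (hypothetical, not the study's features).
X, y = make_classification(n_samples=60, n_features=12, n_informative=3,
                           n_redundant=0, random_state=0)
selected = rf_rfe(X, y, n_keep=4, step=2)
print("kept feature indices:", selected)
```

The two tunable choices mentioned in the text appear here as the ranking estimator and `n_keep`; `step` only controls how many features are discarded per iteration.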
Gradient boosting (GBoost) is a powerful ensemble machine learning method [32,33]. The idea of boosting is to use a forward stage-wise strategy to build an additive model. As an ensemble technique, boosting corrects the performance of prior models, with each new model using the residual errors of the previous model. As in neural networks, the learning rate and the loss function can be tuned to optimize classifier performance. Owing to its ease of use and impressive predictive accuracy, GBoost has been shown to perform exceptionally well in different fields. Ogunleye et al. applied the extreme gradient boosting (XGBoost) method to Chronic Kidney Disease (CKD) diagnosis and achieved high accuracy and performance [34]. Zhong et al. applied the XGBoost method and showed that incorporating multivariate features improved the performance of the method in predicting protein [34].
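To make the forward stage-wise idea concrete, the following sketch fits each new tree to the residuals of the current additive model. It is an illustration under simplifying assumptions (squared-error loss, a regression target, synthetic data, hypothetical function names), not the classifier used in this study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_stages=50, learning_rate=0.1):
    """Forward stage-wise additive model for squared-error loss."""
    baseline = float(np.mean(y))               # stage 0: constant prediction
    prediction = np.full(len(y), baseline)
    trees = []
    for _ in range(n_stages):
        residual = y - prediction              # negative gradient of 0.5 * (y - f)^2
        tree = DecisionTreeRegressor(max_depth=2, random_state=0)
        tree.fit(X, residual)
        # Shrink each correction by the learning rate before adding it.
        prediction = prediction + learning_rate * tree.predict(X)
        trees.append(tree)
    return {"baseline": baseline, "trees": trees, "learning_rate": learning_rate}


def predict_boosted(model, X):
    out = np.full(X.shape[0], model["baseline"])
    for tree in model["trees"]:
        out += model["learning_rate"] * tree.predict(X)
    return out


# Synthetic one-dimensional regression problem (hypothetical data).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

model = fit_boosted_trees(X, y)
mse_start = float(np.mean((y - np.mean(y)) ** 2))
mse_end = float(np.mean((y - predict_boosted(model, X)) ** 2))
print(f"training MSE: {mse_start:.3f} -> {mse_end:.3f}")
```

For classification, the same scheme is applied to the gradient of a classification loss rather than to raw residuals.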
Despite comprehensive advanced imaging, there is still uncertainty about GBM survival time. In this study, we exploited the potential of multi-modal MRI and a GBoost machine to address the issue of predicting the survival time of GBM patients. We used multi-modality data (post-contrast T1w, T2-w, DTI, and DSC maps) and determined different compartments of the tumor. We hypothesized that survival differences between patients might be caused by differences in GBM tumor heterogeneity, and that the derived perfusion and diffusion parameters could represent complementary information about GBM patients. Intensity and texture features of different tumor compartments play an important role in the creation of biomarkers for gliomas and might be a specific marker of patient outcome. Moreover, in order to select the most relevant features, we used the RF-RFE method. We expected these features to facilitate and complement the machine learning framework to predict GBM patient survival time.

2. Materials and Methods

2.1. Patients Cohort

Neuroimaging and clinical data of 29 glioblastoma multiforme (GBM) patients with known survival time were included in this study. These patients were selected from a database of individuals who received DSC and DTI as part of presurgical planning for glioma patients at ITAB, Università degli Studi “G. d’Annunzio” Chieti-Pescara between May 2011 and August 2016. This study was approved by the local ethics committee. Patients signed informed consent on use of their data. All patients underwent surgery, and tumors were histologically proven. Exclusion criteria included low-grade glioma, history of previous cranial surgery, and inability to undergo MRI scanning. GBM patients with available DSC data, DTI, and survival data were analyzed.
Table 1 lists the clinical information of the patients. The collected clinical data included age, gender, date of surgery, histology and WHO tumor grade, and date of death or date of last follow-up; in some cases, molecular information was available as well. All patients underwent complete tumor resection surgery. The survival time was defined as the interval between the date of pre-surgery MRI and the date of death or last follow-up.
The prognosis for GBM patients is poor, and median survival is 15 months [1] even with advanced treatment. Therefore, we defined less than 15 months as short-term survival and more than 15 months as long-term survival to classify GBM patients.
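As a small worked example of this labeling rule, the sketch below converts two dates into a survival time in months and assigns the class. The dates are hypothetical, the average month length is an assumption, and assigning a survival of exactly 15 months to the long-term class is also an assumption, since the text does not specify that boundary case.

```python
from datetime import date

THRESHOLD_MONTHS = 15.0
DAYS_PER_MONTH = 365.25 / 12  # assumption: average month length


def survival_months(mri_date, end_date):
    """Months between pre-surgery MRI and death or last follow-up."""
    return (end_date - mri_date).days / DAYS_PER_MONTH


def survival_class(months):
    # Assumption: exactly 15 months is assigned to the long-term class.
    return "short-term" if months < THRESHOLD_MONTHS else "long-term"


# Hypothetical dates, for illustration only.
months = survival_months(date(2012, 3, 1), date(2013, 1, 15))
label = survival_class(months)
print(f"{months:.1f} months -> {label}")
```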

2.2. MR Imaging Protocols

All MR images were obtained preoperatively with a 3-Tesla MR imaging scanner (Achieva, Philips Medical Systems, Amsterdam, the Netherlands) with an 8-channel head coil. The preoperative MRI sequences included post-contrast T1-weighted 3D volumetric sequences (TR ms/TE ms, 7.7/3.7; slice thickness 2 mm; voxel size, 0.97 × 0.97 × 2 mm³; acquisition matrix, 256 × 256 × 70), a T2-weighted turbo spin echo sequence (TR ms/TE ms, 3000/80 ms; slice thickness 4 mm; acquisition matrix, 560 × 560 × 27), and FLAIR imaging without contrast (TR ms/TE ms/IT ms, 11,000/125 ms; slice thickness 5 mm; acquisition matrix, 560 × 560 × 27). DTI scans were acquired using a pulsed gradient spin-echo sequence with a single-shot echo-planar acquisition (TR ms/TE ms, 4.5/0.72; acquisition matrix, 128 × 128 × 60; slice thickness 2 mm; b-value of 1000 s/mm² in 17 uniformly distributed directions), and DSC scans were acquired using a pulsed gradient spin-echo sequence with a single-shot echo-planar acquisition (TR ms/TE ms, 1720/35 ms; flip angle 75 degrees; slice thickness 5 mm; acquisition matrix 128 × 128 × 25; 50 volumes).

2.3. Methodology

The proposed approach consisted of six steps: step 1: pre-processing (DTI and DSC metrics extraction, co-registration, and defining the tumor); step 2: tumor segmentation by k-means clustering; step 3: histogram feature extraction and texture feature extraction using GLCM; step 4: feature reduction using RF-RFE; step 5: statistical analysis; step 6: training a GBoost machine and evaluating the model with a leave-one-out cross-validation approach.
The outline of the proposed approach is illustrated in Figure 1. As shown, the features were extracted from MRIs using histogram and GLCM. After normalization, RF-RFE was applied to derive a reduced, discriminative, and uncorrelated set of features. Finally, the reduced features were used as the input of the GBoost machine. The gradient boosting machine uses decision trees and is a powerful ensemble machine learning algorithm. All steps are explained in detail in the following subsections.

2.3.1. Data Pre-Processing

DTI images were processed with the diffusion toolbox in FMRIB Software Library (https://fsl.fmrib.ox.ac.uk (accessed on 1 March 2019)). Eddy current correction, brain extraction, and diffusion tensor fitting were performed, and the FA and MD maps were calculated for each patient.
DSC images were processed by perfusion mismatch analyzer (PMA) software. The arterial input function was automatically defined. A standard population-based arterial input function was defined in PMA software, and maps of cerebral blood volume (CBV) and cerebral blood flow (CBF) were calculated for each patient.
To ensure the same view of physiological and structural images and to extract all map features at all tumor slices, images were co-registered to the DTI-B0 images using FMRIB’s Linear Image Registration Tool (FLIRT) in the FMRIB Software Library (FMRIB, Oxford, UK).
Contouring of the region of interest was performed manually by a neuroradiologist. Post-contrast T1, FLAIR, and T2-weighted images were used to contour the tumor margins manually in 3D-Slicer (https://www.slicer.org (accessed on 1 March 2019)). Care was taken that the tumor mask entirely enclosed the solid part of the tumor. To do so, the mask of the tumor was drawn on each slice where contrast enhancement on the post-contrast T1-w images and high signal intensity on the T2-w images were visible. The tumor mask was used to extract the tumor region from the post-contrast T1-w, T2-w, MD, FA, rCBV, and CBF maps and to obtain their voxel-wise values.

2.3.2. Tumor Segmentation by K-Means Clustering

Post-contrast T1, MD, and CBV maps of tumor lesions were segmented by the k-means clustering segmentation method separately. A cluster refers to a collection of data points that have certain similarities, and each data point is allocated to one cluster with the closest similarity.
The three clusters extracted within the post-contrast T1 tumor images were the contrast-enhanced (CE), non-enhanced (non-en), and necrotic regions. Two clusters, low mean diffusivity (LMD) and high mean diffusivity (HMD), were extracted on the MD tumor map. The low MD cluster was obtained from the lower MD voxels. The low rCBV (LrCBV) cluster was obtained from the lower rCBV voxels (Figure 2). Based on the overlap of the different regions, the LMD-LrCBV ROI was obtained within the CE region. Moreover, the LMD-LrCBV ROI in the necrosis region was extracted. The common area between high mean diffusivity and low perfusion was defined as HMD-LrCBV within the necrosis ROI.
Iteration of these processes slice by slice was performed on post-contrast T1, the MD, and rCBV tumor maps. Normal-appearing white matter (NAWM) was drawn from the contralateral normal brain tissue individually. This region was far from the tumor location and had no perceivable abnormalities. All these processes were done in MATLAB (R2019b, The MathWorks, Inc., Natick, MA, USA) under the supervision of a neuroradiologist.
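The clustering and overlap steps can be illustrated with a short sketch. The study performed them in MATLAB; the version below uses Python and scikit-learn purely for illustration, on synthetic 2D maps, and the function name `cluster_tumor_voxels` as well as all intensity values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_tumor_voxels(image, mask, n_clusters, seed=0):
    """Cluster intensities inside `mask`; label 0 is the lowest-mean cluster."""
    values = image[mask].reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(values)
    # Relabel so that cluster indices increase with mean intensity.
    rank = np.argsort(np.argsort(km.cluster_centers_.ravel()))
    labels = np.full(image.shape, -1, dtype=int)   # -1 marks voxels outside the mask
    labels[mask] = rank[km.labels_]
    return labels


# Synthetic 2D "maps" and tumor mask (hypothetical values, not patient data).
rng = np.random.default_rng(0)
mask = np.zeros((40, 40), dtype=bool)
mask[10:30, 10:30] = True
md_map = np.where(rng.random((40, 40)) < 0.5, 0.8, 1.6) + rng.normal(0, 0.05, (40, 40))
rcbv_map = np.where(rng.random((40, 40)) < 0.5, 1.0, 4.0) + rng.normal(0, 0.1, (40, 40))

md_labels = cluster_tumor_voxels(md_map, mask, n_clusters=2)
rcbv_labels = cluster_tumor_voxels(rcbv_map, mask, n_clusters=2)

# Overlap of the low-MD and low-rCBV clusters, analogous to the LMD-LrCBV ROI.
lmd_lrcbv = (md_labels == 0) & (rcbv_labels == 0)
print("LMD-LrCBV voxels:", int(lmd_lrcbv.sum()))
```

Restricting the overlap further to the CE or necrotic cluster of the post-contrast T1 image would be one more logical AND with the corresponding label map.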

2.3.3. Histogram and Texture Feature Extraction and Feature Reduction

For complex feature extraction, we utilized multi-modal MRI to determine different compartments (ROIs) of the tumor and then extract their features. For each individual ROI, three different feature types were extracted.
First, ROI-histogram-based statistics features including mean value, standard deviation, skewness, and kurtosis were calculated. Second, ROI-based texture features of the tumor including contrast, correlation, and homogeneity were calculated. We extracted texture features of the tumor using the gray level co-occurrence matrix (GLCM). The GLCM functions characterize the texture of an image by calculating how often pairs of pixels with specific values and in a specified spatial relationship occur in an image [35]. We analyzed the entire tumor volume and relative volumes of ROIs. Lastly, 480 extracted features were fused to be fed into the machine.
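The histogram statistics and GLCM texture measures named above can be written out explicitly. The sketch below is a NumPy-only illustration; the number of gray levels, the pixel offset, the symmetric normalization, and the homogeneity formula (one common definition) are assumptions, since the text does not state the settings used.

```python
import numpy as np


def histogram_features(values):
    """Mean, standard deviation, skewness and excess kurtosis of ROI values."""
    v = np.asarray(values, dtype=float)
    mean, std = v.mean(), v.std()
    z = (v - mean) / std
    return {"mean": mean, "std": std,
            "skewness": np.mean(z ** 3), "kurtosis": np.mean(z ** 4) - 3.0}


def glcm_features(image, levels=8, offset=(0, 1)):
    """Contrast, correlation and homogeneity from a symmetric, normalised GLCM.

    `offset` is (row shift, column shift) with non-negative entries.
    The image must not be constant (the intensity span is used for scaling).
    """
    img = np.asarray(image, dtype=float)
    span = img.max() - img.min()
    # Quantise intensities to integer gray levels 0 .. levels-1.
    q = np.floor((img - img.min()) / span * (levels - 1) + 0.5).astype(int)
    dr, dc = offset
    rows, cols = q.shape
    a = q[:rows - dr, :cols - dc]      # reference pixels
    b = q[dr:, dc:]                    # neighbours at the given offset
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (a.ravel(), b.ravel()), 1)   # count co-occurring pairs
    glcm = glcm + glcm.T                         # make the matrix symmetric
    p = glcm / glcm.sum()
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    return {
        "contrast": ((i - j) ** 2 * p).sum(),
        "correlation": ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j),
        "homogeneity": (p / (1.0 + np.abs(i - j))).sum(),
    }


stripes = np.tile([0.0, 1.0], (6, 3))   # 6 x 6 image of alternating columns
texture = glcm_features(stripes, levels=2)
stats = histogram_features([1, 2, 3, 4, 5])
print(texture)
print(stats)
```

For the striped test image, every horizontal neighbour pair differs by one gray level, which makes the three texture values easy to check by hand.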
Decreasing the dimension of the feature set is a necessary step before training a machine learning model, both to improve the prediction accuracy and the generalization ability and to decrease the computation time. In this paper, we used RFE as the feature selection approach. There are two configuration options: the choice of classification algorithm used to rank the features and recursively remove the lowest-ranked ones, and the choice of the number of features to keep. We applied Random Forest-based recursive feature elimination (RF-RFE) to select the 60 best features.
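With scikit-learn, reducing 480 features to 60 by RF-RFE could look like the sketch below. The feature matrix is synthetic, and the number of trees, the `step` size, and the use of `StandardScaler` for the normalization step are assumptions rather than settings reported in this paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 29 patients x 480 features, 16 short-term and 13 long-term.
rng = np.random.default_rng(0)
X = rng.normal(size=(29, 480))
y = np.array([0] * 16 + [1] * 13)
X[:, :5] += y[:, None] * 1.5            # make a few features informative

X_scaled = StandardScaler().fit_transform(X)
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    n_features_to_select=60,
    step=20,                            # assumption: features dropped per iteration
)
selector.fit(X_scaled, y)
X_reduced = selector.transform(X_scaled)
print(X_reduced.shape)
```

When such a selector is combined with cross-validation, fitting it inside each training fold (for example through a scikit-learn `Pipeline`) avoids leaking information from the held-out sample into the selection.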

2.3.4. Statistical Analysis

All statistical analyses were performed using SPSS (version 25). After a Kolmogorov–Smirnov test, the features were compared between ROIs by the Kruskal–Wallis test, and the statistical significance was evaluated. Features were significantly different between ROIs. Spearman rank correlation was used to model the relation between the features and survival. Results showed a negative correlation of the rCBV and MD mean values with survival (p < 0.01). Cox proportional hazards regression analyses were performed to evaluate the features and their effect on survival time.
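The analyses in this study were run in SPSS; for readers working in Python, the two non-parametric tests have direct counterparts in SciPy. The following sketch uses synthetic values only, and the variable names are hypothetical. Cox regression is not covered here because it requires a dedicated survival analysis package.

```python
import numpy as np
from scipy.stats import kruskal, spearmanr

rng = np.random.default_rng(0)

# Synthetic feature that decreases with survival time, plus noise.
survival_months = rng.uniform(2, 60, size=29)
rcbv_mean = 5.0 - 0.05 * survival_months + rng.normal(0, 0.3, size=29)
rho, p_spearman = spearmanr(rcbv_mean, survival_months)

# Synthetic values of one feature measured in three ROIs.
roi_a = rng.normal(1.0, 0.2, size=29)
roi_b = rng.normal(1.5, 0.2, size=29)
roi_c = rng.normal(2.0, 0.2, size=29)
h_stat, p_kruskal = kruskal(roi_a, roi_b, roi_c)

print(f"Spearman rho = {rho:.2f}, p = {p_spearman:.3g}")
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_kruskal:.3g}")
```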

2.3.5. Gradient Boosting Machine

Several hyperparameters affect model performance, such as the learning rate, the size of the data sample used to train each model, and the depth of the decision trees. We optimized the hyperparameters on the dataset and then trained the model.
We used leave-one-out cross-validation (LOOCV) to verify the predictions and avoid overfitting. The advantage of LOOCV is that every observation is used for testing exactly once and for training in all other folds. LOOCV results were averaged to produce a single estimate of test performance. The LOOCV trial was repeated 50 times. Finally, metrics including accuracy, precision, recall, F1-measure, and Matthews correlation coefficient were calculated and reported. We used Scikit-learn, a machine learning package for the Python programming language, to implement all the code.
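A minimal sketch of this evaluation with Scikit-learn is given below. The data are synthetic, the hyperparameter values are placeholders rather than the tuned values from this study, and a single LOOCV pass is shown instead of the 50 repetitions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score)
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in for the reduced feature set (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(29, 10))
y = np.array([0] * 16 + [1] * 13)       # 0 = short-term, 1 = long-term
X[:, :3] += y[:, None] * 2.0            # make three features informative

model = GradientBoostingClassifier(
    learning_rate=0.1,   # shrinkage applied to each tree
    subsample=0.8,       # fraction of samples used to fit each tree
    max_depth=2,         # depth of the individual trees
    n_estimators=50,
    random_state=0,
)

# Each patient is predicted by a model trained on the other 28 patients.
y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())

metrics = {
    "accuracy": accuracy_score(y, y_pred),
    "precision": precision_score(y, y_pred, zero_division=0),
    "recall": recall_score(y, y_pred, zero_division=0),
    "f1": f1_score(y, y_pred, zero_division=0),
    "mcc": matthews_corrcoef(y, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because `subsample` below 1.0 makes each fit stochastic, repeating the loop with different `random_state` values and averaging the metrics is one way to mirror the repeated trials described above.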

3. Results

In this study, we used presurgical neuroimaging and clinical data of 29 GBM patients (19 men, 10 women; mean age, 57.6 years; age range, 23–80 years), of whom 16 were short-term and 13 were long-term survivors (Table 1). We used post-contrast T1, MD, and CBV maps to extract regions of interest (ROIs). Histogram-based and texture features were extracted at the selected ROIs. After measuring potentially important parameters, we fed them to the prediction model. First, we report the statistical analysis results related to the survival time and the features, and then we report the RF-RFE GBoost machine classifier results.

3.1. Survival Analysis

The survival times were divided into two groups: short-term (less than 15 months) and long-term (more than 15 months) survival. The proportion of patients with short-term survival was 0.55. The average survival time was 27.1 months, and the median survival time was 13.1 months.
Histogram-based and texture features were compared between ROIs and between survival groups. The Kruskal–Wallis test was used to compare features at the ROI level. The mean values of the features were significantly different. Spearman rank correlation was used to model the relation between the features and survival. Post-contrast T1, T2, and rCBV showed a weak negative correlation with survival time (p-value < 0.05).
Univariate and multivariate Cox regression analyses were used to evaluate the features and their effect on survival time. The rCBV in the LMD-LrCBV ROI in the CE region was significant in both univariate and multivariate Cox regression analysis (Table 2) and had the greatest effect size in terms of the rate of change of the response variable (survival time).

3.2. DTI-DSC Correlation at the Contrast-Enhanced Tumor Region

We found only one significant inverse correlation between the metrics derived at the contrast-enhanced region and survival: low rCBV at the CE region (p-value < 0.05). Higher rCBV in the LMD-LrCBV ROI in the CE region was associated with worse survival (HR = 1.433, p-value < 0.018). In the LMD-LrCBV ROI, the short-term survival group showed higher T2w and rCBV values and lower MD values than the long-term survival group.

3.3. DTI-DSC Correlation at the Necrosis Region

At the necrotic region, four derived metrics were significantly associated with survival. Higher rCBV, post-contrast T1w, and T2w values in the necrotic region were associated with worse survival (p-value < 0.05). Higher rCBV in the LMD-LrCBV ROI in the necrotic region was associated with worse survival (HR = 1.028, p-value < 0.05). The short-term survival group showed a higher rCBV than the long-term survival group (HR = 1.212, p < 0.05), and an increased CBF (HR = 0.99, p < 0.05).

3.4. Texture Features Analysis

Texture features differed according to ROI and survival time. Among these features, contrast on post-contrast T1w images and on rCBV in the LMD-LrCBV ROI was significantly different between groups (p-value < 0.05). Higher post-contrast T1w, FA, and rCBV contrast were associated with worse survival.

3.5. Volume Analysis

Table 3 shows the relative volume of the ROIs in the two groups. There was no significant difference in whole tumor volume between the two groups, whereas there was a positive correlation of the necrosis and HMD-LrCBV relative volumes with survival. The results showed that the LMD-LrCBV relative volume in the short-term survival group (3.2 ± 0.73 cm3) was significantly larger than that in the long-term survival group (2.44 ± 0.88 cm3) (p < 0.05). Although the difference was non-significant, the contrast-enhanced (CE) relative volume was larger in the short-term survival group.

3.6. GBoost Survival Classifier Results

In LOOCV, the GBoost classifier outputs a decision value to predict the survival time class for each individual patient. The receiver operating characteristic (ROC) curve was delineated for the testing datasets generated by the LOOCV (Figure 3). To illustrate the diagnostic performance of the predictive model, the area under the curve (AUC) was calculated (Figure 4, Table 4). A GBoost model trained using the best features selected by the RF-RFE method was able to classify patient survival with 75% accuracy. Classification of survival time with all features without RF-RFE achieved 58% accuracy (Table 4).
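An ROC curve can be built from the per-patient outputs of LOOCV as sketched below. The data are synthetic, the hyperparameters are placeholders, and the predicted probability of the long-term class is assumed to serve as the decision value.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in data (hypothetical values, not the study cohort).
rng = np.random.default_rng(1)
X = rng.normal(size=(29, 10))
y = np.array([0] * 16 + [1] * 13)
X[:, :3] += y[:, None] * 2.0

model = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)

# One held-out score per patient: probability of the long-term class.
scores = cross_val_predict(model, X, y, cv=LeaveOneOut(),
                           method="predict_proba")[:, 1]

fpr, tpr, thresholds = roc_curve(y, scores)
auc = roc_auc_score(y, scores)
print(f"LOOCV AUC = {auc:.3f}")
```

Plotting `tpr` against `fpr` gives the curve; pooling the held-out scores across folds, as done here, is one common way to obtain a single ROC curve from LOOCV.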

4. Discussion

Conventional MRI protocols are regularly used as part of the diagnosis and treatment assessment procedure of gliomas. These images provide a rich source of information for clinical evaluation and statistical predictions. Combining non-invasive advanced images with conventional MR images may improve the diagnostic value of MRI. Here we integrated conventional and advanced MRI to identify different diffusion and perfusion compartments of the tumor and then extracted their features. We expected that the derived perfusion and diffusion features could represent complementary information about GBM patients, and that these features would facilitate and complement the machine learning framework. Moreover, we utilized the potential of the RF-RFE GBoost machine to predict the survival time of GBM patients precisely. Advanced machine learning methods such as RF-RFE and GBoost are powerful tools for data mining.
Many studies have previously assessed the clinical importance of advanced MRI markers. The rCBV, as a biomarker of angiogenesis [36], has been positively correlated with tumor vascularity and cellular proliferation [37]. Our findings showed that identifying the tumor components by combining conventional and advanced MRI yielded clinically important diagnostic information. Our results showed that higher rCBV values in the low perfusion ROI were associated with worse survival (p-value < 0.05). According to previous studies [38], rCBV is associated with hypoxia-initiated angiogenesis that could lead to resistance to treatment. Moreover, our findings showed that a higher tumor necrosis volume was associated with better survival, but higher rCBV in the necrosis area predicted worse survival. Tumor necrosis rCBV indicates an increased metabolic rate, which can be imaged by PET and MRS (lactate peak) for further evaluation. Although non-significant, our results showed that higher rCBV values were correlated with worse survival in the high perfusion ROI, which could imply faster tumor growth.
The MD, as a parameter of tissue cellularity [39], has diagnostic value. Our results showed that lower MD values correlated with worse survival, but significantly only in the short-term survival group (p-value < 0.05). Restricted diffusivity is due to higher tumor cellularity and is a sign of higher proliferation. Several studies demonstrated that more aggressive gliomas have lower MD values [23,40]. Although a meta-analysis also showed that the MD value had an inverse correlation with cellularity in gliomas, this correlation was not consistent in all tumor types [19,20]. There was no significant correlation between FA and survival, but the FA values were significantly different in all ROIs.
Tumor contouring and segmentation is a challenging and important step because most work depends on manual delineation and validation slice by slice for each tumor. Although tumor segmentation was done by the k-means algorithm, it still needed to be verified by a neuroradiologist. This step is time-consuming, and inter- or intra-observer disagreement seems to be unavoidable. We proposed a method to determine different compartments in a heterogeneous GBM tumor using multi-modal MRI and investigated their effects on patient survival. The LMD-LrCBV ROI was defined as the combination of low MD and low rCBV values in the CE region. Our results showed that lower MD and higher rCBV values correlated with poor survival in this ROI. Moreover, our results showed that the LMD-LrCBV relative volume in the short-term survival group was significantly larger than that in the long-term survival group (p-value < 0.05), which may be a sign of higher cellularity and proliferation. GBM patients with short-term survival tended to have a higher volume of the CE region and of low perfusion and low diffusion within the CE region. In addition, a larger necrosis relative volume and HMD-LrCBV relative volume can indicate a better prognosis and survival time, provided that rCBV is low. This supports the thesis that if most of the low perfusion compartment features high diffusion, the probability of the presence of other compartments and of an aggressive tumor is smaller.
The great advantage of multimodal MRI analysis with a proper feature selection method is that the most effective features can be found and used to improve the accuracy of the machine learning model. In fact, our findings have clinical significance. The main purpose of the work was to develop an RF-RFE GBoost method that could identify influential features affecting the survival time and predict the survival time of GBM patients with high accuracy. The overall accuracy of the survival time prediction was approximately 75% for the best features selected by RF-RFE and 55% without the feature selection method. This work showed that advanced neuroimaging data in combination with unsupervised and supervised learning machines can provide an accurate result in predicting the survival time of GBM patients. There are patients who are expected to have poor outcomes due to the nature of GBM, but the prediction of a long survival time increases hope for improvement for these patients. Hence, more aggressive treatment is needed to improve survival time.
Another clinically significant aspect of our study was that we included only GBM patients. GBMs are known to have a poor prognosis compared to other gliomas. In a similar study, Vergun et al. [24] used multimodal images in combination with clinical and demographic variables to train an SVM to predict patients’ outcomes. Although their model was able to predict patients’ mortality with 80.7% accuracy, they included gliomas of all grades. Gliomas differ significantly in prognosis. Oligodendrogliomas, as grade II tumors, are associated with a good prognosis, with a median overall survival of 8 years. IDH-wild type diffuse astrocytomas are molecularly and clinically similar to GBM, and they have a poor prognosis (OS less than 2 years). IDH-mutant diffuse astrocytomas show intermediate survival time (OS greater than 2 years) [41]. The difference between their survival times is therefore very large, and classifying all glioma types into only two classes might not be a good idea.
This study has some limitations that need to be considered. The main limitation is the sample size. With a small dataset, the results are harder to interpret, and the performance of the classifier depends on the sample size and on the characteristics of the sample. Nevertheless, conventional machine learning methods can handle small-sample classification problems at the subject level well. Moreover, feature selection is one of the main techniques that can improve classifier performance. Additionally, the integration of structural and advanced MRI could help to better extract the most relevant parameters.
This was a retrospective study, and the data were collected at a single center. Data collection took several years, molecular analysis of CNS tumors was not readily available for all patients, and diagnosis was mostly based on histopathological analysis. Therefore, we were not able to consider molecular GBM information. However, methods to determine a GBM patient’s molecular subtype, or even tumor histopathology, require an invasive biopsy or surgical resection. MRI, as a rich source of patient information, may serve as a noninvasive technique for determining molecular subtypes and predicting the survival time of GBM patients. We aimed to predict GBM patients’ survival time based only on their non-invasive pre-surgery MR images.

5. Conclusions

In conclusion, this work showed that the RF-RFE GBoost machine could identify useful features and predict the survival time of GBM patients with high accuracy. The clinical significance of this work is that it provides a clinical tool for predicting a complementary level of prognosis non-invasively, using only clinical MR images. It may help surgeons and oncologists when deciding on the treatment strategy.

Author Contributions

Conceptualization, G.K., M.C., C.D.G.; methodology, G.K., M.G.O., A.D.P., M.C., C.D.G.; software, G.K.; validation, G.K., M.G.O., A.D.P., M.C., C.D.G.; formal analysis, G.K., M.G.O.; resources, G.K., M.G.O., A.D.P., M.C., C.D.G.; data curation, G.K., M.G.O., A.D.P., M.C., C.D.G.; writing—original draft preparation, G.K.; writing—review and editing, G.K., A.D.P., C.D.G.; visualization, G.K.; supervision, M.C. and C.D.G.; project administration, C.D.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 713645.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of the Department of Neuroscience, Imaging and Clinical Sciences, Gabriele D’Annunzio University (Number 1762, 27/03/2019).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets generated and/or analyzed during the current study are not publicly available due to the clinical and confidential nature of the material but can be made available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Thakkar, J.P.; Dolecek, T.A.; Horbinski, C.; Ostrom, Q.; Lightner, D.D.; Barnholtz-Sloan, J.; Villano, J.L. Epidemiologic and Molecular Prognostic Review of Glioblastoma. Cancer Epidemiol. Biomark. Prev. 2014, 23, 1985–1996.
  2. Davis, M.E. Glioblastoma: Overview of Disease and Treatment. Clin. J. Oncol. Nurs. 2016, 20, S2–S8.
  3. Parsons, D.W.; Jones, S.; Zhang, X.; Lin, J.C.-H.; Leary, R.J.; Angenendt, P.; Mankoo, P.; Carter, H.; Siu, I.-M.; Gallia, G.L.; et al. An Integrated Genomic Analysis of Human Glioblastoma Multiforme. Science 2008, 321, 1807–1812.
  4. Louis, D.N.; Perry, A.; Reifenberger, G.; von Deimling, A.; Figarella-Branger, D.; Cavenee, W.K.; Ohgaki, H.; Wiestler, O.D.; Kleihues, P.; Ellison, D.W. The 2016 World Health Organization Classification of Tumors of the Central Nervous System: A summary. Acta Neuropathol. 2016, 131, 803–820.
  5. Olar, A.; Wani, K.M.; Alfaro, K.; Heathcock, L.E.; Van Thuijl, H.F.; Gilbert, M.R.; Armstrong, T.; Sulman, E.P.; Cahill, D.; Vera-Bolanos, E.; et al. IDH mutation status and role of WHO grade and mitotic index in overall survival in grade II–III diffuse gliomas. Acta Neuropathol. 2015, 129, 585–596.
  6. Qi, S.; Yu, L.; Gui, S.; Ding, Y.; Han, H.; Zhang, X.; Wu, L.; Yao, F. IDH mutations predict longer survival and response to temozolomide in secondary glioblastoma. Cancer Sci. 2012, 103, 269–273.
  7. Hartmann, C.; Hentschel, B.; Wick, W.; Capper, D.; Felsberg, J.; Simon, M.; Westphal, M.; Schackert, G.; Meyermann, R.; Pietsch, T.; et al. Patients with IDH1 wild type anaplastic astrocytomas exhibit worse prognosis than IDH1-mutated glioblastomas, and IDH1 mutation status accounts for the unfavorable prognostic effect of higher age: Implications for classification of gliomas. Acta Neuropathol. 2010, 120, 707–718.
  8. Louis, D.N.; Perry, A.; Wesseling, P.; Brat, D.J.; Cree, A.I.; Figarella-Branger, D.; Hawkins, C.; Ng, H.K.; Pfister, S.M.; Reifenberger, G.; et al. The 2021 WHO Classification of Tumors of the Central Nervous System: A summary. Neuro-Oncology 2021, 23, 1231–1251.
  9. DeAngelis, L.M.; Gutin, P.H.; Leibel, S.A.; Posner, J.B. Intracranial Tumors: Diagnosis and Treatment; CRC Press: Boca Raton, FL, USA, 2001.
  10. Patel, S.; Bansal, A.; Young, E.; Batchala, P.; Patrie, J.; Lopes, M.; Jain, R.; Fadul, C.; Schiff, D. Extent of Surgical Resection in Lower-Grade Gliomas: Differential Impact Based on Molecular Subtype. Am. J. Neuroradiol. 2019, 40, 1149–1155.
  11. Hanahan, D.; Weinberg, R.A. Hallmarks of Cancer: The Next Generation. Cell 2011, 144, 646–674.
  12. Sottoriva, A.; Spiteri, I.; Piccirillo, S.G.M.; Touloumis, A.; Collins, V.P.; Marioni, J.C.; Curtis, C.; Watts, C.; Tavaré, S. Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics. Proc. Natl. Acad. Sci. USA 2013, 110, 4009–4014.
  13. Verhaak, R.G.; Hoadley, K.A.; Purdom, E.; Wang, V.; Qi, Y.; Wilkerson, M.D.; Miller, C.R.; Ding, L.; Golub, T.; Mesirov, J.P.; et al. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 2010, 17, 98–110.
  14. Lundy, P.; Domino, J.; Ryken, T.; Fouke, S.; McCracken, D.J.; Ormond, D.R.; Olson, J.J. The role of imaging for the management of newly diagnosed glioblastoma in adults: A systematic review and evidence-based clinical practice guideline update. J. Neuro-Oncol. 2020, 150, 95–120.
  15. O’Connor, J.P.; Jackson, A.; Asselin, M.-C.; Buckley, D.L.; Parker, G.J.; Jayson, G.C. Quantitative imaging biomarkers in the clinical development of targeted therapeutics: Current and future perspectives. Lancet Oncol. 2008, 9, 766–776.
  16. O’Connor, J.P.B.; Rose, C.; Waterton, J.C.; Carano, R.A.D.; Parker, G.J.M.; Jackson, A. Imaging Intratumor Heterogeneity: Role in Therapy Response, Resistance, and Clinical Outcome. Clin. Cancer Res. 2014, 21, 249–257.
  17. Sadeghi, N.; Camby, I.; Goldman, S.; Gabius, H.J.; Balériaux, D.; Salmon, I.; Decaesteckere, C.; Kiss, R.; Metens, T. Effect of hydrophilic components of the extracellular matrix on quantifiable diffusion-weighted imaging of human gliomas: Preliminary results of correlating apparent diffusion coefficient values and hyaluronan expression level. Am. J. Roentgenol. 2003, 181, 235–241.
  18. Lu, J.; Li, X.; Li, H. Perfusion parameters derived from MRI for preoperative prediction of IDH mutation and MGMT promoter methylation status in glioblastomas. Magn. Reson. Imaging 2021, 83, 189–195.
  19. Shiroishi, M.S.; Boxerman, J.L.; Pope, W.B. Physiologic MRI for assessment of response to therapy and prognosis in glioblastoma. Neuro-Oncology 2015, 18, 467–478.
  20. Surov, A.; Meyer, H.J.; Wienke, A. Correlation between apparent diffusion coefficient (ADC) and cellularity is different in several tumors: A meta-analysis. Oncotarget 2017, 8, 59492–59499.
  21. Gauvain, K.M.; McKinstry, R.C.; Mukherjee, P.; Perry, A.; Neil, J.J.; Kaufman, B.A.; Hayashi, R.J. Evaluating Pediatric Brain Tumor Cellularity with Diffusion-Tensor Imaging. Am. J. Roentgenol. 2001, 177, 449–454.
  22. Beppu, T.; Inoue, T.; Shibata, Y.; Kurose, A.; Arai, H.; Ogasawara, K.; Ogawa, A.; Nakamura, S.; Kabasawa, H. Measurement of fractional anisotropy using diffusion tensor MRI in supratentorial astrocytic tumors. J. Neuro-Oncol. 2003, 63, 109–116.
  23. Inoue, T.; Ogasawara, K.; Beppu, T.; Ogawa, A.; Kabasawa, H. Diffusion tensor imaging for preoperative evaluation of tumor grade in gliomas. Clin. Neurol. Neurosurg. 2005, 107, 174–180.
  24. Vergun, S.; Suhonen, J.I.; Nair, V.A.; Kuo, J.; Baskaya, M.; Garcia-Ramos, C.; Meyerand, E.E.; Prabhakaran, V. Predicting primary outcomes of brain tumor patients with advanced neuroimaging MRI measures. Interdiscip. Neurosurg. 2018, 13, 109–118.
  25. Nie, D.; Lu, J.; Zhang, H.; Adeli, E.; Wang, J.; Yu, Z.; Liu, L.Y.; Wang, Q.; Wu, J.; Shen, D. Multi-channel 3D deep feature learning for survival time prediction of brain tumor patients using multi-modal neuroimages. Sci. Rep. 2019, 9, 1103.
  26. Chaddad, A.; Desrosiers, C.; Abdulkarim, B.; Niazi, T. Predicting the Gene Status and Survival Outcome of Lower Grade Glioma Patients With Multimodal MRI Features. IEEE Access 2019, 7, 75976–75984.
  27. Richhariya, B.; Tanveer, M.; Rashid, A.H. Diagnosis of Alzheimer’s disease using universum support vector machine based recursive feature elimination (USVM-RFE). Biomed. Signal Process. Control. 2020, 59, 101903.
  28. Chen, Q.; Meng, Z.; Su, R. WERFE: A Gene Selection Algorithm Based on Recursive Feature Elimination and Ensemble Strategy. Front. Bioeng. Biotechnol. 2020, 8, 496.
  29. Senan, E.M.; Al-Adhaileh, M.H.; Alsaade, F.W.; Aldhyani, T.H.H.; Alqarni, A.A.; Alsharif, N.; Uddin, M.I.; Alahmadi, A.H.; Jadhav, M.E.; Alzahrani, M.Y. Diagnosis of Chronic Kidney Disease Using Effective Classification Algorithms and Recursive Feature Elimination Techniques. J. Healthc. Eng. 2021, 2021, 1004767.
  30. Ijaz, M.; Rehman, A.U.; Hamdi, M.; Bermak, A. Recursive Feature Elimination with Random Forest Classifier for Compensation of Small Scale Drift in Gas Sensors. In Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain, 12–14 October 2020; pp. 1–5.
  31. Zhang, L.; Zheng, X.; Pang, Q.; Zhou, W. Fast Gaussian kernel support vector machine recursive feature elimination algorithm. Appl. Intell. 2021, 1–14.
  32. Ma, B.; Meng, F.; Yan, G.; Yan, H.; Chai, B.; Song, F. Diagnostic classification of cancers using extreme gradient boosting algorithm and multi-omics data. Comput. Biol. Med. 2020, 121, 103761.
  33. Basha, S.; Vellore Institute of Technology University; Rajput, D.; Vandhan, V. Impact of Gradient Ascent and Boosting Algorithm in Classification. Int. J. Intell. Eng. Syst. 2018, 11, 41–49.
  34. Ogunleye, A.A.; Wang, Q.-G. XGBoost Model for Chronic Kidney Disease Diagnosis. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020, 17, 2131–2140.
  35. Zulpe, N.; Pawar, V. GLCM textural features for brain tumor classification. Int. J. Comput. Sci. Issues 2012, 9, 354.
  36. Wesseling, P.; Ruiter, D.J.; Burger, A.P.C. Angiogenesis in brain tumors; pathobiological and clinical aspects. J. Neuro-Oncol. 1997, 32, 253–265.
  37. Emblem, K.E.; Due-Tonnessen, P.; Hald, J.K.; Bjornerud, A.; Pinho, M.C.; Scheie, D.; Schad, L.R.; Meling, T.R.; Zoellner, F.G. Machine learning in preoperative glioma MRI: Survival associations by perfusion-based support vector machine outperforms traditional MRI. J. Magn. Reson. Imaging 2014, 40, 47–54.
  38. Kickingereder, P.; Sahm, F.; Radbruch, A.; Wick, W.; Heiland, S.; Von Deimling, A.; Bendszus, M.; Wiestler, B. IDH mutation status is associated with a distinct hypoxia/angiogenesis transcriptome signature which is non-invasively predictable with rCBV imaging in human glioma. Sci. Rep. 2015, 5, 16238.
  39. Chenevert, T.L.; Stegman, L.D.; Taylor, J.M.G.; Robertson, P.L.; Greenberg, H.S.; Rehemtulla, A.; Ross, B.D. Diffusion Magnetic Resonance Imaging: An Early Surrogate Marker of Therapeutic Efficacy in Brain Tumors. J. Natl. Cancer Inst. 2000, 92, 2029–2036.
  40. Lee, H.Y.; Na, D.G.; Song, I.C.; Lee, D.H.; Seo, H.S.; Kim, J.H.; Chang, K.H. Diffusion-tensor imaging for glioma grading at 3-T magnetic resonance imaging: Analysis of fractional anisotropy and mean diffusivity. J. Comput. Assist. Tomogr. 2008, 32, 298–303.
  41. Thurnher, M.M. 2007 World Health Organization classification of tumours of the central nervous system. Cancer Imaging 2009, 9, S1–S3.
Figure 1. GBMs survival prediction model outline using RF-RFE GBoost machine.
Figure 2. A sample of different brain MRI images: (a) post-contrast T1-weighted, (b) T2-weighted, (c) MD map, (d) FA map, (e) CBV map, (f) CBF map in a 66-year-old woman with glioblastoma. All images are co-registered to the B0 DTI images. The tumor region was segmented by k-means into contrast-enhanced, non-enhanced, and necrosis regions in the post-contrast image, low and high diffusion regions in the MD map, and low and high perfusion regions in the rCBV map. Sub-regions were obtained based on their overlap.
Figure 3. ROC curve on the dataset.
Figure 4. 50 trials and averaged test accuracies for each LOOCV trial.
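The leave-one-out cross-validation (LOOCV) named in the Figure 4 caption can be illustrated with a minimal sketch. It is not the authors' code: the nearest-class-mean rule is a deliberately simple stand-in for the GBoost classifier, the single feature is synthetic, and `nearest_mean_predict`, `loocv_accuracy`, and `make_toy_cohort` are hypothetical names. Only the class sizes (16 short-term and 13 long-term survivors) are taken from Table 1.

```python
import random
from statistics import mean

def nearest_mean_predict(train, x):
    """Stand-in classifier (NOT GBoost): return the label whose training mean is closest to x."""
    by_label = {}
    for value, label in train:
        by_label.setdefault(label, []).append(value)
    return min(by_label, key=lambda lab: abs(mean(by_label[lab]) - x))

def loocv_accuracy(samples):
    """Hold out each sample once, fit on the remaining ones, and return the fraction predicted correctly."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except the held-out one
        if nearest_mean_predict(train, x) == y:
            hits += 1
    return hits / len(samples)

def make_toy_cohort(rng):
    """Synthetic one-feature cohort; the class sizes follow Table 1, the feature values are invented."""
    short_term = [(rng.gauss(0.0, 1.0), "short") for _ in range(16)]
    long_term = [(rng.gauss(1.5, 1.0), "long") for _ in range(13)]
    return short_term + long_term

cohort = make_toy_cohort(random.Random(0))
toy_accuracy = loocv_accuracy(cohort)  # one LOOCV run: 29 folds, one per patient
```

With a stochastic model, repeating such a run under different random seeds would give one averaged accuracy per trial, which is the kind of quantity a per-trial plot reports.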
Table 1. Patients’ characteristics and clinical information.

| Characteristic | Value |
| --- | --- |
| Age range | 23–80 years |
| Average age | 57.6 years |
| Gender: male | 19 |
| Gender: female | 10 |
| Tumor hemisphere: right | 13 |
| Tumor hemisphere: left and bilateral | 16 |
| Short-term survival (< 15 months) | 16 |
| Long-term survival (> 15 months) | 13 |
Table 2. Relationships between features and survival by univariate Cox regression and multivariate Cox regression.

Relationships between features and survival by univariate Cox regression:

| Variable | Regression Coefficient (β_i) | Hazard Ratio Exp(β_i) | 95% Confidence Interval | p-Value |
| --- | --- | --- | --- | --- |
| Age | 0.000 | 0.773 | (0.947, 1.056) | 0.996 |
| Sex (M) | −0.258 | 1.000 | (0.259, 2.302) | 0.643 |
| T1 in necrosis region | 0.001 | 0.994 | (1.000, 1.001) | 0.04 * |
| T2 in necrosis region | 0.000 | 1.002 | (1.000, 1.001) | 0.021 * |
| CBF in necrosis region | 0.000 | 0.999 | (0.998, 1.000) | 0.013 * |
| rCBV in necrosis region | 0.011 | 1.037 | (1.014, 1.061) | 0.001 * |
| rCBV in LMD-LrCBV in necrosis region | 0.010 | 1.028 | (1.007, 1.049) | 0.007 * |
| rCBV in HMD-LrCBV in necrosis region | 0.078 | 1.212 | (1.041, 1.411) | 0.013 * |
| rCBV in LMD-LrCBV in CE region | 0.152 | 1.433 | (1.064, 1.931) | 0.018 * |

Relationships between features and survival by multivariate Cox regression:

| Variable | Regression Coefficient (β_i) | Hazard Ratio Exp(β_i) | 95% Confidence Interval | p-Value |
| --- | --- | --- | --- | --- |
| T2 in necrotic region | 0.000 | 1.000 | (0.999, 1.001) | 0.828 |
| rCBV in LMD-LrCBV in CE region | 0.001 | 1.001 | (1.000, 1.001) | 0.020 * |
| rCBV in necrosis region | 0.113 | 1.120 | (1.058, 1.186) | 0.05 * |

* The significance level is 0.05.
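As the column heading Exp(β_i) in Table 2 indicates, a hazard ratio is the exponential of the corresponding Cox regression coefficient. The short sketch below illustrates this for the multivariate coefficient reported for rCBV in the necrosis region (β = 0.113); the function and variable names are hypothetical and only the coefficient comes from the table.

```python
import math

def hazard_ratio(beta):
    """Return the hazard ratio exp(beta) for a Cox regression coefficient beta."""
    return math.exp(beta)

# Coefficient reported in Table 2 (multivariate model) for rCBV in the necrosis region.
beta_rcbv_necrosis = 0.113
hr_rcbv_necrosis = hazard_ratio(beta_rcbv_necrosis)

# Relative change in hazard, in percent, per one-unit increase of the feature.
percent_change = (hr_rcbv_necrosis - 1.0) * 100.0
```

Rounded to three decimals, the result agrees with the hazard ratio of 1.120 listed for that row, i.e., roughly a 12% increase in hazard per unit increase of the feature.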
Table 3. Relative volume of ROIs in different groups.

| Group | En (cm³) | Nec (cm³) | LMD-LrCBV (cm³) | HMD-LrCBV (cm³) | En-HrCBV (cm³) | Tumor Volume (cm³) |
| --- | --- | --- | --- | --- | --- | --- |
| Short-term survival group | 1.51 ± 0.45 | 2.07 ± 0.59 | 3.20 ± 0.73 | 1.01 ± 0.49 | 0.65 ± 0.37 | 6.02 ± 5.1 |
| Long-term survival group | 1.26 ± 0.41 | 2.64 ± 0.65 | 2.47 ± 0.88 | 1.95 ± 0.77 | 0.42 ± 0.23 | 6.07 ± 5.4 |
| p-value | 0.190 | 0.049 * | 0.032 * | 0.003 * | 0.178 | 0.536 |

* The significance level is 0.05.
Table 4. Performance of the GBoost classifier on the dataset with and without RF-RFE.

| Classifier | Accuracy | Precision | Recall | F1-Measure | AUC | MCC |
| --- | --- | --- | --- | --- | --- | --- |
| RF-RFE GBoost classifier | 0.75 | 0.75 | 0.74 | 0.75 | 0.741 | 0.48 |
| GBoost classifier (without RF-RFE) | 0.58 | 0.58 | 0.58 | 0.58 | 0.58 | 0.16 |
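For readers less familiar with the metrics in Table 4, the sketch below shows how accuracy, precision, recall, F1-measure, and the Matthews correlation coefficient (MCC) follow from a binary confusion matrix. The counts are hypothetical and chosen only so that they sum to a 29-patient cohort; they are not the study's confusion matrix, and the averaging scheme behind the reported values is not specified here. The function name `binary_metrics` is likewise an assumption.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes are present and predicted).
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }

# Hypothetical counts for illustration only (13 positives, 16 negatives, 29 in total).
metrics = binary_metrics(tp=10, fp=4, fn=3, tn=12)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it a useful complement when the two classes are of unequal size.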
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
