1. Introduction
Head and neck squamous cell carcinoma (HNSCC) is the seventh most prevalent cancer worldwide, with about 890,000 new cases and 450,000 deaths in 2020 [
1]. HNSCCs arise at a wide range of primary sites in the head and neck region, from the oral cavity to the pharynx. Multimodality treatment, including surgery, radiotherapy, chemotherapy, and immunotherapy, is used as first-line therapy for HNSCC patients, depending on the tumor site and stage [
2]. Radiotherapy is the treatment of choice for most HNSCC cases when the cancer is unresectable at a locally advanced stage [
3]. Despite the advancement of multi-modality treatment, the 5-year survival rate in patients with HNSCC is still less than 50% due to late diagnosis and the high risk of disease recurrence [
4].
One important feature of HNSCCs is their high heterogeneity, which spans anatomical, biological, and molecular levels [
5]. This heterogeneity leads to differing treatment outcomes among patients treated with the same standard therapy [
6]. It also undermines the development of effective biomarkers and the effectiveness of conventional tumor biopsy [
6]. To date, only programmed death-ligand 1 (PD-L1) expression and human papillomavirus (HPV) status are considered useful biomarkers in HNSCC [
7]. To tackle this situation, radiomics has been suggested for the further development of personalized treatment of HNSCC [
6].
Radiomics consists of extracting quantitative information from medical images, and associating it with clinical features to construct models for prognosis prediction with different machine-learning algorithms [
8]. Radiomics can potentially identify previously unknown tumor markers to improve prognosis prediction in large datasets [
9]. Due to its ability to detect tumor heterogeneity by extracting and analyzing sub-visual features from various imaging modalities, radiomics has been commonly studied in relation to HNSCC for outcome prediction with promising results [
10,
11,
12].
Different machine-learning algorithms have their own strengths and limitations. It has been suggested that combining the predictions of multiple machine-learning algorithms yields a more reliable prediction, because averaging mitigates their individual limitations [
13]. Long et al. [
14] employed an ensemble machine-learning algorithm to predict survival in patients with hepatocellular carcinoma (HCC) and bone metastasis. Their ensemble model outperformed the individual machine-learning algorithms, with an area under the curve (AUC) of 0.779. This suggests that the ensemble technique can potentially improve prognosis prediction and facilitate clinical decision-making.
Apart from radiomic data, clinical information including tumor staging and patient demographics provides valuable information for prognosis prediction. It is suggested that combining radiomic and clinical features will produce a synergistic effect, which enhances predictive performance and accuracy. Gangil et al. [
15] compared the predictive capabilities of machine-learning algorithms using clinical, radiomic, and radiomic–clinical datasets in HNSCC. They revealed that the model that combined radiomic–clinical datasets exhibited superior predictive power in comparison to the model which relied solely on clinical features. Meanwhile, the model constructed with radiomic features alone demonstrated poor performance in predicting clinical outcomes.
The ensemble technique may be combined with radiomic and clinical information to further enhance predictive performance. Tang et al. [
16] integrated radiomic data obtained from radiotherapy (RT) planning CT with clinical information for prognosis prediction in patients with non-small cell lung cancer (NSCLC). Radiomic and clinical features were first studied by five machine-learning algorithms with a voted ensemble machine-learning (VEML) model. Then, a probability-weighted strategy was used to incorporate radiomic and clinical features. The results showed that the combined model had superior performance compared to the radiomic model. This demonstrated that the combined model possesses the ability to improve prognosis prediction.
Since HPV status has been recognized as an important prognostic biomarker in HNSCC with a strong link to oropharyngeal carcinoma (OPC), several studies have combined radiomic features and HPV status for prognosis prediction. Wang et al. [
17] combined radiomic features and HPV status to perform a risk classification in patients with OPC. Meanwhile, Ou et al. [
18] showed that combining HPV p16 status and radiomics could outperform models using p16 status or radiomics alone in locally advanced HNSCC, with the majority of cases being OPC (68%). Therefore, the addition of HPV status information could be important for prognosis prediction in HNSCC.
Heterogeneity remains a major concern affecting HNSCC prognosis, and HPV status has emerged as an important biomarker in HNSCC. Meanwhile, an ensemble technique that uses both radiomic and clinical information may offer better predictive capability than the approaches used in previous studies. In this study, we aimed to predict 5-year overall survival in HNSCC radiotherapy patients by integrating radiomic and clinical information in machine-learning models.
2. Materials and Methods
2.1. Data Acquisition
The datasets, consisting of treatment planning CT images and radiotherapy (RT) structure sets, were collected from The Cancer Imaging Archive (TCIA). TCIA is an openly accessible database that provides collections of medical images from various imaging modalities and is managed by the Frederick National Laboratory for Cancer Research.
With permission granted from TCIA, a total of 627 datasets of HNSCC patients receiving radiotherapy at MD Anderson Cancer Center were acquired from TCIA’s ‘HNSCC’ collection [
19]. The collection comprised head and neck cancer patients receiving radical radiotherapy from 2003 to 2013, and oropharyngeal cancer patients receiving radiotherapy between 2005 and 2012. Pre-treatment planning CT images, along with RT structures and gross tumor volumes (GTV) contoured by clinical oncologists, were obtained from the datasets in Digital Imaging and Communications in Medicine (DICOM) format. Furthermore, patient demographic and pathological information, including gender, age, smoking status, diagnostic site, tumor stage, HPV status, treatment modality, and 5-year overall survival status, was also collected.
2.2. Study Workflow
Patients who satisfied the following requirements were included in the study: (1) they underwent treatment planning CT with the GTV delineated by clinical oncologists, (2) they had complete pathologic information including HPV status, and (3) they had a definite tumor staging. First, datasets comprising CT images, delineated RT structures, and clinical information were collected from TCIA. Radiomic features were then extracted and subsequently input into the predictive models.
In this study, the primary endpoint was defined as 5-year overall survival (OS). To minimize the potential bias caused by imbalanced data and reduce the risk of overfitting, a balanced sample containing equal numbers of individuals who were alive and dead 5 years after diagnosis was drawn by random selection. The selected sample datasets were then randomized to minimize selection bias and reduce the impact of confounding variables. Finally, the data were analyzed using five machine-learning algorithms. The process, starting from sample balancing, was repeated five times to ensure that all samples were studied at least once. The outcomes of the iterations were then averaged to obtain more reliable results.
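The balancing and repetition steps described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the study's implementation (the analysis was performed in R): the function names `balanced_sample` and `repeated_balanced_samples`, the label coding (0 = alive, 1 = dead at 5 years), and the toy data are all hypothetical. Plain random undersampling, as shown here, does not by itself guarantee that every sample is drawn at least once.

```python
import random


def balanced_sample(labels, rng):
    """Return shuffled indices with equal numbers of each outcome.

    labels: list of 0 (alive at 5 years) or 1 (dead at 5 years); assumed coding.
    """
    alive = [i for i, y in enumerate(labels) if y == 0]
    dead = [i for i, y in enumerate(labels) if y == 1]
    n = min(len(alive), len(dead))      # size of the smaller outcome group
    chosen = rng.sample(alive, n) + rng.sample(dead, n)
    rng.shuffle(chosen)                 # randomize order before any split
    return chosen


def repeated_balanced_samples(labels, repeats=5, seed=0):
    """Draw `repeats` independent balanced samples from the same labels."""
    rng = random.Random(seed)
    return [balanced_sample(labels, rng) for _ in range(repeats)]


# Toy data (made up): 12 survivors and 4 deaths.
labels = [0] * 12 + [1] * 4
samples = repeated_balanced_samples(labels)
```

Each element of `samples` would then be split, modeled, and evaluated separately, with the resulting metrics averaged across the five repetitions.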
2.3. Feature Extraction
Radiomic features were extracted from the GTV using the PyRadiomics extension of 3D Slicer software (v. 4.10.2); PyRadiomics was developed by the Computational Imaging and Bioinformatics Lab at Harvard Medical School [
20,
21]. A total of 107 radiomic features were extracted from the planning CT images for model development. These radiomic features include tumor shape, gray-level co-occurrence matrix, gray-level dependence matrix, first-order statistics, gray-level size zone matrix, gray-level run length matrix, and neighboring gray-tone difference matrix features [
21]. Subsequently, the extracted features were analyzed by 5 machine-learning algorithms utilizing R software (v. 4.1.3) to predict the prognosis of HNSCC.
2.4. Machine Learning
Five common machine-learning algorithms were used in this study: decision tree (DT), extreme boost (EB), random forest (RF), support vector machine (SVM), and generalized linear model (GLM). A brief introduction to the five algorithms is given in
Table 1. For each algorithm, the target population was randomly divided into three cohorts. Seventy percent of the samples were assigned to the training cohort to establish patterns, whereas the validation cohort and the testing cohort each contained 15% of the data. To improve predictive performance, voted ensemble machine learning (VEML) was then applied: it incorporates the probability scores generated by the five algorithms to achieve a more realistic prediction when the models conflict. VEML was used in our previous publication [
16] and its features are summarized in
Figure 1.
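The cohort split and the voting step can be illustrated with a short Python sketch. It is hedged throughout: the study used R, the helper names `split_70_15_15` and `veml_score` are hypothetical, and the voting rule shown (each algorithm casts one vote for death when its probability is at least 0.5, and the score is the share of votes) is an assumption about how the probability scores are combined, not a confirmed description of VEML.

```python
import random


def split_70_15_15(indices, seed=0):
    """Shuffle indices and split them into training, validation, and testing cohorts."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_train = round(0.70 * len(shuffled))
    n_val = round(0.15 * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder, about 15%
    return train, val, test


def veml_score(probabilities, threshold=0.5):
    """Share of algorithms voting for death (assumed hard-voting rule)."""
    votes = [1 if p >= threshold else 0 for p in probabilities]
    return sum(votes) / len(votes)


train, val, test = split_70_15_15(range(100))

# Made-up death probabilities from DT, EB, RF, SVM, and GLM for one patient.
score = veml_score([0.62, 0.48, 0.71, 0.55, 0.39])
```

In this toy case three of the five algorithms vote for death, so the ensemble leans toward a mortality prediction even though two algorithms disagree.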
2.5. Probability-Weighted Enhanced Model (PWEM)
Our previous study indicated that integrating radiomic features with clinical factors could enhance the accuracy of predictive models of NSCLC [
16]. Moreover, patient demographic and pathological information also had a satisfactory performance in prognosis prediction of HNSCC [
18,
23,
24]. To improve the predictive performance of the VEML model, we combined the results of VEML of radiomic and clinical factors by a probability-weighted approach.
The PWEM was illustrated in our previous study [
16]. Briefly, the model comprised both hard voting and soft voting techniques (
Figure 2). For hard voting, a VEML model was used to generate a VEML score for the radiomic model and for the clinical factor model. These VEML scores represent the estimated likelihood of the survival outcome when only radiomic features (VRA) or only clinical factors (VCF) are considered. For soft voting, a probability-weighted approach was used to assign each model a weighting based on its ability to predict prognosis in the validation cohort. The predictive weighting reflects the probability that a model gives the correct prediction when the two models conflict; it is calculated from the proportion of conflicting predictions in which that model was correct. The VEML score of each model is multiplied by its predictive weighting, and the sum of the two products is used as the final score, which ranges from 0 to 1. A score lower than 0.5 suggests that the patient is likely to survive at the study endpoint, whereas a score of 0.5 or higher indicates predicted mortality at the study endpoint.
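The weighting scheme can be sketched in Python as below. This is an illustrative reading of the description, not the published implementation: the function names `predictive_weights` and `pwem_score`, the equal-weight fallback when no conflicts occur, and all numbers are assumptions.

```python
def predictive_weights(vra_val, vcf_val, outcomes, threshold=0.5):
    """Estimate weights from validation cases where the two models disagree.

    vra_val, vcf_val: VEML scores of the radiomic and clinical factor models.
    outcomes: observed status, 0 = alive, 1 = dead (assumed coding).
    """
    ra_correct = cf_correct = 0
    for vra, vcf, y in zip(vra_val, vcf_val, outcomes):
        ra_pred = int(vra >= threshold)
        cf_pred = int(vcf >= threshold)
        if ra_pred == cf_pred:
            continue                    # no conflict, so no information on weighting
        if ra_pred == y:
            ra_correct += 1             # radiomic model was right in this conflict
        else:
            cf_correct += 1             # otherwise the clinical model was right
    conflicts = ra_correct + cf_correct
    if conflicts == 0:
        return 0.5, 0.5                 # assumed fallback: equal weighting
    return ra_correct / conflicts, cf_correct / conflicts


def pwem_score(vra, vcf, w_ra, w_cf):
    """Weighted sum of the two VEML scores."""
    return w_ra * vra + w_cf * vcf


# Made-up validation cohort: four conflicts, one agreement.
vra_val = [0.8, 0.2, 0.6, 0.4, 0.8]
vcf_val = [0.4, 0.6, 0.2, 0.8, 0.6]
outcomes = [0, 1, 1, 1, 1]
w_ra, w_cf = predictive_weights(vra_val, vcf_val, outcomes)

# New patient whose two models conflict.
final = pwem_score(0.8, 0.2, w_ra, w_cf)
prediction = "death" if final >= 0.5 else "survival"
```

Because the outcome is binary, exactly one model is correct in every conflict, so under this reading the two weights sum to 1 and the final score stays between 0 and 1.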
The weighted score was determined by using the following equation [
16]:
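Written out from the description above, the weighted score can be expressed as follows. This is a sketch: $W_{RA}$ and $W_{CF}$ are assumed symbols for the predictive weightings of the radiomic and clinical factor models, and the exact notation used in [16] may differ.

```latex
\text{Weighted score} = W_{RA} \times V_{RA} + W_{CF} \times V_{CF},
\qquad 0 \le \text{Weighted score} \le 1
```

Here $V_{RA}$ and $V_{CF}$ are the VEML scores of the radiomic and clinical factor models, respectively.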
2.6. Data Analysis
Descriptive data are presented as mean ± standard deviation. The predictive performance of the radiomic and clinical factor models built with single machine-learning algorithms, VEML, and PWEM was assessed using receiver operating characteristic (ROC) curve analysis, with area under the curve (AUC), accuracy, sensitivity, and specificity as metrics. Moreover, for the clinical factor model using the random forest algorithm, the mean decrease in accuracy and the mean decrease in Gini were used to assess the importance of each clinical factor in prognosis prediction. Chi-square tests were additionally performed to confirm the significance of clinical factors in HNSCC survival. A p value of less than 0.05 was considered statistically significant.
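For reference, the four performance metrics can be computed as in the following Python sketch. It assumes that death is the positive class and that a score of 0.5 or higher is a mortality prediction; the function names and the toy data are hypothetical, and the study itself carried out this analysis in R.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity, with death (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy testing cohort (made up): three deaths and three survivors.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

The pairwise formulation of the AUC is equivalent to the area under the empirical ROC curve and avoids the need to trace the curve explicitly.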