Article

Clinical Risk Factor Prediction for Second Primary Skin Cancer: A Hospital-Based Cancer Registry Study

1 Department of Computer Science and Information Engineering, National Quemoy University, Kinmen County 892, Taiwan
2 School of Medical Informatics, Chung Shan Medical University & IT Office, Chung Shan Medical University Hospital, Taichung City 40201, Taiwan
3 Department of Information Management, Ming Chuan University, Taoyuan City 33300, Taiwan
4 Department of Nursing, Chung Shan Medical University, Taichung City 40201, Taiwan
5 Department of Nursing, Chung Shan Medical University Hospital, Taichung City 40201, Taiwan
6 Department of Mathematics and Information Engineering, National Taipei University of Education, Taipei 106, Taiwan
7 School of Public Health, Mongolian National University of Medical Sciences, Ulaanbaatar 14210, Mongolia
8 Division of Environmental and Preventive Medicine, Tottori University, Tottori 680-8550, Japan
9 School of Public Health, Loma Linda University, Loma Linda, CA 92350, USA
* Authors to whom correspondence should be addressed.
Appl. Sci. 2022, 12(24), 12520; https://doi.org/10.3390/app122412520
Submission received: 30 September 2022 / Revised: 29 November 2022 / Accepted: 2 December 2022 / Published: 7 December 2022
(This article belongs to the Special Issue Decision Support Systems for Disease Detection and Diagnosis)

Abstract

This study aimed to develop a risk-prediction model of second primary skin cancer (SPSC) for skin cancer survivors, identify the clinical characteristics of SPSC, and raise physicians' awareness when screening high-risk patients among skin cancer survivors. Using data on 1248 skin cancer survivors extracted from five cancer registries, we benchmarked a random forest algorithm against MLP, C4.5, AdaBoost, and bagging algorithms on several metrics. We also leveraged the synthetic minority over-sampling technique (SMOTE) to address the imbalanced dataset, cost-sensitive learning for risk assessment, and SHAP for the analysis of feature importance. The proposed random forest outperformed the other models, with an accuracy of 90.2%, a recall rate of 95.2%, a precision rate of 86.6%, and an F1 value of 90.7% for the SPSC category, based on 10-fold cross-validation on a balanced dataset. Our results suggest that four features, i.e., age, stage, gender, and involvement of regional lymph nodes, significantly affect the output of the prediction model and should be considered in subsequent causal-effect analyses. Together with causal analysis of specific primary sites, these clinical features allow further investigation of second cancers among skin cancer survivors.

1. Introduction

Skin cancer is the most common type of cancer in the United States. Physicians and patients therefore observe second primary cancers (SPCs) with caution [1]. In Taiwan, early detection and diagnosis have become feasible owing to the promotion of cancer screening in recent years.
With improvements in computing capacity, artificial intelligence (AI) has been gaining attention in the medical field. Machine learning (ML), a component of AI, plays a key role in assisting diagnosis and prognosis. In 1994, Ercal et al. [2] utilized artificial neural networks (ANNs) to distinguish melanoma from three benign pigmented skin tumors and achieved promising performance. Moreover, in 1998 [3], they combined a hierarchical neural network and a fuzzy system to diagnose skin cancer with 23 features extracted from raw images and achieved excellent results. Nevertheless, the literature on the prediction of second cancers in skin cancer survivors is still inadequate.
Since then, an increasing number of machine learning and deep learning technologies have emerged in the field of artificial intelligence. Gerald Schaefer et al. used an SVM ensemble classification approach to identify melanoma images [4]. Noel Codella et al. combined deep learning (CNN), sparse coding, and support vector machine (SVM) learning algorithms for melanoma recognition [5]. According to Litjens et al., deep learning techniques have permeated the field of medical image analysis [6]. Tajbakhsh et al. demonstrated that pre-trained and transfer learning models are applicable to medical images [7]. Wang et al. [8] highlighted that including additional factors in their model enables more cost-effective screening strategies for skin cancer. In our study, we use the random forest algorithm [9] to develop a classifier and benchmark it against MLP, C4.5 [10], AdaBoost [11], and bagging [12] algorithms using several metrics. Additionally, we leverage the synthetic minority over-sampling technique (SMOTE) [13] to address the imbalanced dataset, cost-sensitive learning for risk assessment, and Shapley additive explanations (SHAP) [14,15,16] for the analysis of relevant characteristics from global and local perspectives. Random forest is an ensemble machine-learning model based on classification and regression trees (CART), bagging (bootstrap [17] aggregation), and random feature selection. This randomness design prevents the overfitting seen in single decision trees and confers tolerance to noise and outliers. An important feature of random forests is the permutation approach, which measures the decrease in prediction performance as the values of a feature are permuted. We have summarized the current state of cancer diagnostics in Table A1 of Appendix A.
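The benchmarking setup described above can be sketched with scikit-learn. This is an illustrative sketch only: the study's registry data and hyperparameters are not reproduced here, a synthetic dataset stands in for the 10 clinical predictors, scikit-learn's CART tree stands in for C4.5, and 5-fold validation is used for speed where the paper uses 10-fold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              BaggingClassifier)
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Toy stand-in for the registry data (10 integer-coded predictors).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# scikit-learn's CART decision tree stands in for C4.5; AdaBoost and bagging
# use their default tree base learners rather than C4.5.
models = {
    "Random forest": RandomForestClassifier(random_state=0),
    "MLP": MLPClassifier(max_iter=300, random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation here for speed; the paper uses 10-fold.
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    results[name] = scores.mean()
    print(f"{name}: mean F1 = {results[name]:.3f}")
```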
SMOTE is an over-sampling algorithm proposed by Chawla et al. to mitigate class imbalance and the overfitting it induces. The algorithm creates new randomly generated samples between minority-class samples and their neighbors, which balances the number of samples among categories. Cost-sensitive learning [18,19,20] is a training approach that considers assigned costs of misclassification errors during model training and is closely related to imbalanced-dataset studies. Different biases are required, with different monitoring criteria, in different risk-management phases. The computing speed of random forest is efficient; thus, we can easily integrate different cost biases for scenarios of interest to obtain an overall picture. SHAP analysis is an extension based on Shapley values, a game-theoretic method that computes the average of a feature's marginal contributions over all coalitional combinations. In addition to calculating Shapley values, SHAP expresses additive feature attribution as a linear model: for a given feature set, the SHAP values are calculated for each feature and then summed together with a base value to yield the final prediction. TreeSHAP is a variant of SHAP for tree-based machine-learning models, such as decision trees, random forests, and gradient-boosted trees. SHAP feature importance is defined as the average of the absolute SHAP values per feature over all instances. It focuses on the variance of the model output, which differs from permutation feature importance, which is based on performance error. Thus, we can see how manipulating feature values changes the magnitude of the model's predicted likelihood or regression output, without considering performance metrics such as accuracy or loss.

2. Materials and Methods

The process flow used in this study is illustrated in Figure 1. The data were collected from 1 January 2011 to 31 December 2020, giving a total of 1248 effective records, including 1146 SPSC patients and 102 non-SPSC patients. In addition, our data revealed that basal cell neoplasms accounted for 56% of all histology of skin cancer survivors, followed by squamous cell neoplasms (32%); nevi and melanomas (8%); adnexal and skin appendage neoplasms and fibromatous neoplasms (3% each); and adenomas and adenocarcinomas, and cystic, mucinous, and serous neoplasms (1% each). In Step 1, we collected the skin cancer dataset. To better fit our machine-learning algorithms and TreeSHAP analysis, we encoded all the data into categorical variables in Step 2. In Step 3, owing to a significant imbalance between the second primary skin cancer (SPSC) and non-SPSC data, we balanced the dataset with the up-sampling method SMOTE for the minority category. One of our key purposes is to understand the prediction behaviors of our trained model; hence, we used 10-fold cross-validation without a separate dataset split in Step 4. In Step 5, random forest is compared with the other baseline algorithms, and we further use model interpretation to observe feature importance and interactions.

2.1. Step 1: Dataset Source

Based on studies of SPCs in skin cancer survivors and discussions with clinical experts, a total of 10 predictive risk factors were selected (Table 1). The analysis therefore aimed to identify the most important risk characteristics among these 10 predictors. The dataset encoding features and sample-size distribution are shown in Table 1. The original dataset had a significant imbalance issue, i.e., the minority category is ~10% the size of the majority. Figure 2 shows the sample distribution of the target label before balancing the dataset.

2.2. Step 2: Data Preprocessing

For a better fit to our machine-learning algorithms and TreeSHAP analysis, we encoded the data into categorical variables, each assigned an integer. We assigned larger integers to ordinal values that are positive or of higher intensity. A few features had missing or unavailable values, which we encoded as zero to minimize their possible impact on the SHAP values.

2.3. Step 3: Balancing Dataset

Owing to the significantly imbalanced dataset, the model tends to overfit the majority category and ignore the features of the minority. Several approaches can help overcome this learning bias in the target loss function, such as assigning different weights to samples or categories in the loss function, assigning different cost weights to prediction errors, and resampling the data by over- or under-sampling.
In this study, SMOTE was used for over-sampling. We generated synthetic samples for the minority, up to the quantity of the majority, instead of simply duplicating samples. The method selects a base sample of the minority, randomly selects one of its neighbors, and randomly perturbs one feature at a time.
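The interpolation idea behind SMOTE can be sketched in a few lines of NumPy: for each synthetic sample, pick a minority instance, pick one of its k nearest minority neighbours, and interpolate between them. This is a simplified illustration of the algorithm, not the implementation used in the study.

```python
import numpy as np

def smote_sample(X_min, k=5, n_new=100, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen base sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))                  # pick a base minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)  # distances to all minority samples
        neighbours = np.argsort(d)[1:k + 1]           # k nearest, excluding itself
        j = rng.choice(neighbours)                    # pick one neighbour at random
        gap = rng.random()                            # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_min = np.random.default_rng(0).normal(size=(20, 4))  # toy minority class
X_new = smote_sample(X_min, n_new=50, rng=1)
print(X_new.shape)  # (50, 4)
```

Because each synthetic point lies on a segment between two real minority samples, the new data stay inside the minority class's feature ranges.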

2.4. Step 4: 10-Fold Cross-Validation

We used 10-fold cross-validation to prevent bias from a single train-test split of the dataset. The method partitions the entire dataset into 10 equal-sized subgroups; each time, one subgroup is held out for testing, and the remaining 9 subgroups are used for training. At the end of the 10 rounds of training, the average performance of the 10 models is reported.
For the machine-learning algorithm, we benchmarked random forest with MLP, C4.5, AdaBoost, and bagging:
  • MLP is a classifier that uses backpropagation to learn a multi-layer perceptron to classify instances.
  • C4.5 develops a decision tree by splitting the value of the features at each node, including categorical and numerical features. The information gain is calculated, and the feature with the highest gain is used as the splitting rule.
  • AdaBoost with C4.5 belongs to the boosting family of ensemble methods: newly trained models are added in series, with subsequent models focusing on fixing the prediction errors of previous models. In this study, we selected C4.5 as the base classifier.
  • Bagging (bootstrap aggregation) with C4.5 is an ensemble method that uses bootstrap sampling (sampling with replacement) to form different training sets. We used C4.5 as the base classifier to derive the forest.
The random forest classification model builds parallel decision trees that vote on a category for a given instance and outputs the majority vote as the prediction. Cost-sensitive learning is important in risk management, where we pursue flexibility in the trade-offs among metrics. For example, we may favor the recall rate of the SPSC category by loosening the precision rate. In this section, we assign false negative (FN) misclassification costs of 1, 3, and 5 to the SPSC category while keeping the false positive (FP) cost at 1.
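One way to realize this cost biasing in code is via per-class weights, which up-weight errors on the positive class during tree induction. This is a hedged sketch using scikit-learn's `class_weight`; it only approximates an explicit FN/FP cost matrix, and the synthetic data and cost values are illustrative, not the study's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Toy stand-in data; class 1 plays the role of the positive (SPSC) category.
X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

rows = []
for fn_cost in (1, 3, 5):  # penalise a missed positive fn_cost times more than an FP
    clf = RandomForestClassifier(class_weight={0: 1, 1: fn_cost}, random_state=1)
    clf.fit(X_tr, y_tr)
    y_hat = clf.predict(X_te)
    rows.append((fn_cost, recall_score(y_te, y_hat), precision_score(y_te, y_hat)))
    print(f"FN cost {fn_cost}: recall={rows[-1][1]:.3f}, precision={rows[-1][2]:.3f}")
```

As the FN cost grows, recall of the positive class tends to rise at the expense of precision, mirroring the trade-off reported in Table 3.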
In our study, we set the SPSC category as positive and then evaluated several metrics. True positive (TP) is the number of positive instances predicted as positive; true negative (TN) is the number of negative instances predicted as negative; false positive (FP) is the number of negative instances predicted as positive; false negative (FN) is the number of positive instances predicted as negative.
  • Accuracy: (TP + TN)/(TP + TN + FP + FN)
  • Precision: TP/(TP + FP)
  • Recall: TP/(TP + FN)
  • F1-score: 2 × (Recall × Precision)/(Recall + Precision)
  • FP rate: FP/(FP + TN)
  • TP rate: TP/(TP + FN)
The receiver operating characteristic (ROC) curve uses the TP rate as the y-axis and the FP rate as the x-axis and plots points at the corresponding thresholds.
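The metric definitions above can be written out directly as functions of the confusion-matrix counts; the example counts at the end are hypothetical, not the study's results.

```python
# Standard classification metrics computed from confusion-matrix counts.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

def fp_rate(fp, tn):
    return fp / (fp + tn)

# Example confusion counts (hypothetical).
tp, tn, fp, fn = 90, 80, 10, 20
print(round(accuracy(tp, tn, fp, fn), 3))  # 0.85
print(round(recall(tp, fn), 3))            # 0.818
```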

2.5. Step 5: Interpretability

We review feature importance using two approaches, random forest and SHAP. The former is a single-feature permutation approach that observes the impact on model performance, whereas the latter flexibly observes main-feature and interaction effects on the model output.
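The permutation idea behind the first approach can be sketched with scikit-learn's `permutation_importance`: shuffle one feature at a time and measure the drop in the model's score. The dataset here is synthetic and the feature indices are illustrative, not the study's clinical predictors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Toy data with 6 features, 3 of them informative.
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature 10 times and average the resulting score drop.
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: importance {result.importances_mean[i]:.3f}")
```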
For SHAP, we used global interpretability plots. The feature importance plot lists the importance of all the features in descending order and uses color bars to denote positive or negative correlation coefficients.
The summary plot shows the SHAP values and the output-impact direction of the individual points in color, which helps users quickly obtain critical insights. The dependency plot shows correlations and interactions between two features along with SHAP value trends.
In the local interpretability plot, the waterfall plot is designed to demonstrate explanations of individual instances.

3. Results

3.1. Prediction Performance

This study retrospectively collected 1248 records from five cancer registries. Cases were included if the patient was diagnosed with skin carcinoma and treated accordingly. The study group comprised survivors with SPSC, and the control group comprised survivors without SPSC. Records with incomplete fields, incomplete histology data, or incomplete variable data were excluded. This study was approved by the Institutional Review Board of Chung Shan Medical University Hospital (IRB no. CS2-20114), with a waiver of patient consent.
A balanced dataset and 10-fold cross-validation were used for the performance analysis. Table 2 lists the comparisons among the algorithms, including MLP, C4.5, AdaBoost with C4.5, bagging with C4.5, and random forest. Random forest outperformed the others, with an F1 of 90.7%, ROC area of 93.9%, PRC of 92.6%, and accuracy of 90.2% for the SPSC category. In our case, MLP performed worse than C4.5, serving as a benchmark between a numerical-based algorithm and a categorical-based decision tree algorithm. Based on the evidence that random forest surpassed C4.5 and bagging, we believe that the main improvement comes from the randomness design of bootstrap subsampling and the random choice of features for splitting nodes. Compared to AdaBoost, the independent trees of random forest showed better ensemble synergy than the boosting ensemble, which arranges trees in series.
Based on the comparison of the ROC curves illustrated in Figure 3a, random forest covers the largest area of 0.939, which implies that the trade-off between TPR and FPR based on threshold setting is relatively better than the others, while MLP is the worst.
As listed in Table 3, the recall rate increases (from 95.2% to 95.9%) if we increase the penalty cost of the FN error of the SPSC category while maintaining the FP error cost at one, whereas the precision rate (from 86.6% to 77.9%) and overall accuracy (from 90.2% to 84.3%) are compromised accordingly. This policy scenario implies that we would like potential SPSC patients to be flagged as much as possible, accepting the resulting FP misclassifications. As shown in Figure 3b, the ROC curves show little variation across the cost settings.

3.2. Interpretability

Before explaining the features, we need to understand that model interpretability is not always equal to causality. The SHAP values do not provide causality but provide insights into how the model behaves from data learning.
First, we used random forest to observe global feature importance. The features ranked by the random forest method showed that age, stage, and gender were the top three impact features. The SHAP values in Figure 4 provide additional information: age, stage, and gender demonstrate a critical and positive impact, implying that larger feature values bring greater SPSC probability; the involvement of regional lymph nodes ranks fourth, while alcohol falls to fifth. The red bars indicate positive overall correlation coefficients.
Next, we investigated the SHAP distribution over all instances, as illustrated in Figure 5, using a bee swarm plot (Figure 5a) and a heat map (Figure 5b). We observed how age contributes to the SHAP value per instance; however, these plots do not reveal the correlation distribution. Therefore, we plotted a scatter plot for age and observed a positive correlation trend in Figure 8a. We further investigated the SHAP interaction between features, shown in Figure 6, which segments the SHAP value into main values on the diagonal and interaction values off the diagonal. In Figure 7, the heat map of absolute means over all instances shows that the highest point is the main value of age, 0.238. We further checked the dependency graph in Figure 8b and observed a very clear positive correlation between age and SPSC. However, we observed no special trend with respect to interactions, as shown in Figure 8c,d.
In the same way, we saw a similar correlation behavior in Figure 9, with regard to gender; Figure 10, with regard to stage; and Figure 11, with regard to the involvement of the regional lymph node. This indicates that male sex, higher stage, and positive lymph node involvement are risk factors for SPSC. Furthermore, we observed positive and negative correlations between the subgroups shown in Figure 11d,e.
The global interpretation shows the correlations of all the samples between features and predictions that cannot explain predictions with regard to a specific sample. Figure 12a shows the segmentation of the individual SHAP value of the SPSC instance, from which we can clearly see how these feature forces bring the prediction probability f(x) up to 1. In contrast, Figure 12b shows that the reverse forces decrease f(x) to 0.

4. Discussion

Garcovic et al. [21] found that the old (65–79 years) and very old (>80 years) subgroups had the highest increase in the incidence rate of basal cell carcinoma. Several research findings indicate a positive association between aging and skin cancer risk [22,23]. Our findings support that aging is one of the main risk factors for SPSC. Another study, by Yen et al. [24], found that alcohol consumption was significantly associated with an increased risk of basal cell carcinoma and cutaneous squamous cell carcinoma. The results of this study also indicated that alcohol consumption is a risk factor for SPSC. Research shows that men experience more extensive outdoor exposure, less frequent sun protection, and a more positive perception of tanning, which elevate their risk of SPSC [1]. Several studies have shown that men are at an increased risk of SPSC [22,25,26,27,28,29,30,31,32].
In our study, we identified results similar to those of previous studies. We explored data mining using a machine-learning algorithm, i.e., random forest, on an imbalanced dataset. Furthermore, we used SMOTE to over-sample the minority class, balancing the dataset and preventing the model from being biased toward the original majority of non-SPSC instances. With respect to metric performance, random forest showed better prediction capability than the benchmark algorithms. The model achieved an overall accuracy of 90.2%, a recall rate of 95.2%, a precision rate of 86.6%, and an F1 of 90.7% for the SPSC category based on 10-fold cross-validation on a balanced dataset. With regard to cost-sensitive learning, if we increase the FN cost, the recall rate improves from 95.2% to 95.9%, whereas the precision rate decreases from 86.6% to 77.9% and the accuracy decreases from 90.2% to 84.3%. Cost-sensitive learning offers a quick way, within conventional machine learning, to assess risk when switching between different policies. From the SHAP interpretation analysis, we obtained insights into how the model makes decisions based on features and the interactions between correlated features. The top four features are age, stage, gender, and regional lymph node involvement. However, model interpretation serves model-behavior analysis rather than causal-effect analysis: the model learns complicated correlations within the dataset, as shown in Figure 7, and identifies the best choice to optimize its objective function.

Limitations and Future Directions

This study has some limitations. First, the TCR data were collected from different hospitals over various time periods, and different sources may apply different judgment standards to the data. Second, a few features had a significant number of missing values, which may bias the real trends if these features are inappropriately preprocessed and encoded; this must be kept in mind when interpreting them. Third, overfitting can occur if other critical features for the use case are not collected or included. Fourth, although we used SMOTE to balance the sample size, a few input features still exhibited obvious imbalance, which could lead to poor generalization.
In the future, with regard to data collection and enhancement, multimodal data could include more data types for better prediction performance. For instance, additional medical images or a time series of a patient's pathological trajectory would provide additional significant features for prediction. From the perspective of explainable artificial intelligence (XAI) in high-stakes fields, e.g., medical applications, XAI plays an important role in providing insights into how the model makes predictions, which is necessary information for different user roles, such as patients, clinicians, or model architects, to efficiently reach an agreement or fully understand the risks of a wrong prediction. Several mathematical XAI approaches and perspectives are available to balance system design between algorithm-centric and user-centric types. It is also important to develop standards or criteria for XAI evaluation to help identify effective XAI methods per use case. Further investigation into the causal analysis of specific primary versus secondary cancers can be conducted using these clinical features in skin cancer survivors.

5. Conclusions

A risk-prediction model for second primary skin cancer (SPSC) in skin cancer survivors has been built. Random forest was the best classifier for the prediction of SPSC in our study, with an accuracy of 90.2%, a recall rate of 95.2%, a precision rate of 86.6%, and an F1 value of 90.7%. In addition, cost-sensitive learning was utilized, increasing the recall rate from 95.2% to 95.9% at the cost of a precision rate reduced from 86.6% to 77.9% and an accuracy reduced from 90.2% to 84.3%. Our study found that age, stage, gender, and involvement of regional lymph nodes were the most significant impact factors for SPSC. These features will help physicians detect high-risk patients among skin cancer survivors in advance.

Author Contributions

Conceptualization, H.-C.L.; Methodology, H.-C.L.; Validation, B.P.; Formal analysis, H.-C.L. and T.-C.L.; Writing—original draft, T.-C.L., C.-C.C. and Y.-C.A.L.; Writing—review & editing, C.-C.C., Y.-C.A.L. and C.-M.L.; Visualization, B.P.; Supervision, C.-C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Ministry of Science and Technology, Taiwan, grant number MOST 111-2321-B-040-003.

Institutional Review Board Statement

The study was approved by the Chung Shan Medical University Hospital Institutional Review Board (No: CS2-21114).

Informed Consent Statement

The Institutional Review Board of Chung Shan Medical University Hospital approved this study (IRB No. CS2-20114) and waived the requirement for patient consent.

Data Availability Statement

Data are available from the Institutional Review Board of Chung Shan Medical University Hospital for researchers who meet the criteria for access to confidential data. Requests for the data may be sent to the Chung Shan Medical University Hospital Institutional Review Board, Taichung City, Taiwan (e-mail: [email protected]).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Summary of the current state of cancer diagnostics.
Title | Authors | Year | Cancer | Input Data | Proposed/Benchmark Models | Result
Application of DL to predict advanced neoplasia using big clinical data in colorectal cancer screening of asymptomatic adults | Yang, Hyo-Joon, et al. | 2021 | Colorectal | Clinical data | ANN/LR | AUC of LR (0.724), ANN (0.760)
A data-driven approach to a chemotherapy recommendation model based on DL for patients with colorectal cancer in Korea | Park, Jin-Hyeok, et al. | 2020 | Colorectal | Clinical data | ANN | Concordance rates with the NCCN guidelines were 70.5% for Top-1 Accuracy and 84% for Top-2 Accuracy
Preoperative prediction of lymph node metastasis in patients with early-T-stage non-small cell lung cancer by machine learning algorithms | Wu, Yijun, et al. | 2020 | Lung | Clinical data | AdaBoost, ANN, DT, GBDT, LR, MNB, RF, XGBoost | RF is the best model, AUC 0.89
Robust machine learning for colorectal cancer risk prediction and stratification | Nartowt, Bradley J., et al. | 2020 | Colorectal | Clinical data | ANN | Concordance of 0.70 ± 0.02, sensitivity of 0.63 ± 0.06, and specificity of 0.82 ± 0.04
A Machine Learning Approach for Long-Term Prognosis of Bladder Cancer based on Clinical and Molecular Features | Song, Qingyuan, et al. | 2020 | Bladder | Clinical data | LR | AUC 0.77, F1 0.78, sensitivity 0.65, specificity 0.79, accuracy 0.76
Predicting breast cancer in Chinese women using machine learning techniques: algorithm development | Hou, Can, et al. | 2020 | Breast | Clinical data | XGBoost, RF and ANN | AUC of XGBoost (0.742), ANN (0.728), RF (0.728)
Treatment stratification of patients with metastatic castration-resistant prostate cancer by machine learning | Deng, Kaiwen, Hongyang Li, and Yuanfang Guan | 2020 | Prostate | Clinical data | Linear regression, LR, Cox regression, BAG-CART, RF | RF achieved the highest performance
Patient classification of two-week wait referrals for suspected head and neck cancer: a machine learning approach | Moor, J. W., V. Paleri, and J. Edwards | 2019 | Head and neck | Clinical data | ML algorithms | Variational logistic regression was the most clinically useful technique among benchmarks
The performance of different artificial intelligence models in predicting breast cancer among individuals having type 2 diabetes mellitus | Hsieh, Meng-Hsuen, et al. | 2019 | Breast | Clinical data | LR, ANN and RF | AUC of LR (0.834), ANN (0.865), RF (0.959)
Machine learning methods can more efficiently predict prostate cancer compared with prostate-specific antigen density and prostate-specific antigen velocity | Nitta, Satoshi, et al. | 2019 | Prostate | Clinical data | ANN, RF, SVM | AUC of ANN (0.69), RF (0.64), SVM (0.63)
A machine learning-assisted decision-support model to better identify patients with prostate cancer requiring an extended pelvic lymph node dissection | Hou, Ying, et al. | 2019 | Prostate | Clinical data | LR, SVM, RF | AUCs (RFs+/RFs−: 0.906/0.885; SVM+/SVM−: 0.891/0.868; LR+/LR−: 0.886/0.882) with (+) or without (−) MRI-reported LNI
Classifying lung cancer severity with ensemble machine learning in health care claims data | Bergquist, Savannah L., et al. | 2017 | Lung | Clinical data | Ensemble of ML models | Sensitivity 0.93, specificity 0.92, accuracy 0.93
A Novel Chaos-Based Privacy-Preserving Deep Learning Model for Cancer Diagnosis | Mujeeb Ur Rehman, Arslan Shafique, Yazeed Yasin Ghadi et al. | 2022 | Brain | Image | DL | Sensitivity 0.97, accuracy 0.972, F1 0.98
Imbalanced breast cancer classification using transfer learning | R. Singh, T. Ahmed, A. Kumar et al. | 2021 | Breast | Image | DL | Accuracy 0.903
Breast cancer diagnosis using deep belief networks on ROI images | G. Altan et al. | 2021 | Breast | Image | DL | Accuracy 0.963, specificity 0.967, sensitivity 0.96, precision 0.964
Automatic skin cancer detection in dermoscopy imaging based on ensemble lightweight DL network | Wei, Lisheng, Kun Ding, and Huosheng Hu | 2020 | Skin | Image | DL | AUC 0.854, accuracy 0.876
A 3D Probabilistic Deep Learning System for Detection and Diagnosis of Lung Cancer Using Low-Dose CT Scans | O. Ozdemir, R. L. Russell, and A. A. Berlin | 2020 | Lung | Image | DL | AUC 0.869, sensitivity 0.921
Breast lesion classification based on dynamic contrast-enhanced magnetic resonance images sequences with long short-term memory networks | N. Antropova, B. Huynh, H. Li, and M. L. Giger | 2018 | Breast | Image | DL | AUC 0.88, accuracy 0.93
Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images | J. Xu et al. | 2016 | Breast | Image | DL | AUC 0.788, F1 0.85

References

  1. Duarte, A.F.; Sousa-Pinto, B.; Haneke, E.; Correia, O. Risk factors for development of new skin neoplasms in patients with past history of skin cancer: A survival analysis. Sci. Rep. 2018, 8, 15744. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. Ercal, F.; Chawla, A.; Stoecker, W.; Lee, H.-C.; Moss, R. Neural network diagnosis of malignant melanoma from color images. IEEE Trans. Biomed. Eng. 1994, 41, 837–845. [Google Scholar] [CrossRef]
  3. Ercal, F.; Lee, H.C.; Stoecker, W.V.; Moss, R.H. Skin Cancer Classification Using Neural Networks and Fuzzy Systems. Int. J. Smart Eng. Syst. Des. 1998, 1, 273–289. [Google Scholar]
  4. Schaefer, G.; Krawczyk, B.; Celebi, M.E.; Iyatomi, H. An ensemble classification approach for melanoma diagnosis. Memetic Comput. 2014, 6, 233–240. [Google Scholar] [CrossRef]
  5. Codella, N.; Cai, J.; Abedini, M.; Garnavi, R.; Halpern, A.; Smith, J.R. Deep Learning, Sparse Coding, and SVM for Melanoma Recognition in Dermoscopy Images. In Proceedings of the 6th International Workshop on Machine Learning in Medical Imaging, Munish, Germany, 5–9 October 2015; pp. 118–126. [Google Scholar]
  6. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.W.M.; van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [Google Scholar] [CrossRef] [Green Version]
  7. Tajbakhsh, N.; Shin, J.Y.; Gurudu, S.R.; Hurst, R.T.; Kendall, C.B.; Gotway, M.B.; Liang, J. Convolutional neural networks for medical image analysis: “Full training or fine tuning?” . IEEE Trans. Med. Imaging 2016, 35, 1299–1312. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  8. Wang, W.; Jorgenson, E.; Ioannidis, N.M.; Asgari, M.M.; Whittemore, A.S. A Prediction Tool to Facilitate Risk-Stratified Screening for Squamous Cell Skin Cancer. J. Investig. Dermatol. 2018, 138, 2589–2594. [Google Scholar] [CrossRef] [PubMed]
  9. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  10. Salzberg, S.L. C4.5: Programs for Machine Learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993. Mach. Learn. 1994, 16, 235–240. [Google Scholar] [CrossRef] [Green Version]
  11. Freund, Y.; Schapire, R.E. Experiments with a New Boosting Algorithm. In Proceedings of the International Conference on Machine Learning, Bari, Italy, 3–6 July 1996; pp. 148–156. [Google Scholar]
  12. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef] [Green Version]
  13. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  14. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  15. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67. [Google Scholar] [CrossRef] [PubMed]
  16. Lundberg, S.M.; Nair, B.; Vavilala, M.S.; Horibe, M.; Eisses, M.J.; Adams, T.; Liston, D.E.; Low, D.K.-W.; Newman, S.-F.; Kim, J.; et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2018, 2, 749–760. [Google Scholar] [CrossRef] [PubMed]
  17. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; Chapman and Hall: New York, NY, USA, 1993. [Google Scholar]
  18. Krawczyk, B.; Woźniak, M.; Schaefer, G. Cost-sensitive decision tree ensembles for effective imbalanced classification. Appl. Soft Comput. 2014, 14, 554–562. [Google Scholar] [CrossRef] [Green Version]
  19. Seiffert, C.; Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. A Comparative Study of Data Sampling and Cost Sensitive Learning. In Proceedings of the 2008 IEEE International Conference on Data Mining Workshops, Pisa, Italy, 15–19 December 2008; pp. 46–52. [Google Scholar] [CrossRef]
  20. Thai-Nghe, N.; Gantner, Z.; Schmidt-Thieme, L. Cost-sensitive learning methods for imbalanced data. In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010; pp. 1–8. [Google Scholar]
  21. Garcovich, S.; Colloca, G.; Sollena, P.; Andrea, B.; Balducci, L.; Cho, W.C.; Bernabei, R.; Peris, K. Skin Cancer Epidemics in the Elderly as An Emerging Issue in Geriatric Oncology. Aging Dis. 2017, 8, 643–661. [Google Scholar] [CrossRef] [Green Version]
  22. Nosrati, A.; Yu, W.Y.; McGuire, J.; Griffin, A.; de Souza, J.R.; Singh, R.; Linos, E.; Chren, M.M.; Grimes, B.; Jewell, N.P.; et al. Outcomes and Risk Factors in Patients with Multiple Primary Melanomas. J. Investig. Dermatol. 2019, 139, 195–201. [Google Scholar] [CrossRef] [Green Version]
  23. Albert, A.; Knoll, M.A.; Conti, J.A.; Zbar, R.I.S. Non-Melanoma Skin Cancers in the Older Patient. Curr. Oncol. Rep. 2019, 21, 79. [Google Scholar] [CrossRef]
  24. Yen, H.; Dhana, A.; Okhovat, J.; Qureshi, A.; Keum, N.; Cho, E. Alcohol intake and risk of nonmelanoma skin cancer: A systematic review and dose–response meta-analysis. Br. J. Dermatol. 2017, 177, 696–707. [Google Scholar] [CrossRef]
  25. Adams, G.J.; Goldstein, E.K.; Goldstein, B.G.; Jarman, K.L.; Goldstein, A.O. Attitudes and Behaviors That Impact Skin Cancer Risk among Men. Int. J. Environ. Res. Public Health 2021, 18, 9989. [Google Scholar] [CrossRef]
  26. Fontanillas, P.; Alipanahi, B.; Furlotte, N.A.; Johnson, M.; Wilson, C.H.; Pitts, S.J.; Gentleman, R.; Auton, A. Disease risk scores for skin cancers. Nat. Commun. 2021, 12, 160. [Google Scholar] [CrossRef]
  27. Chang, C.-C.; Huang, T.-H.; Shueng, P.-W.; Chen, S.-H.; Chen, C.-C.; Lu, C.-J.; Tseng, Y.-J. Developing a Stacked Ensemble-Based Classification Scheme to Predict Second Primary Cancers in Head and Neck Cancer Survivors. Int. J. Environ. Res. Public Health 2021, 18, 12499. [Google Scholar] [CrossRef] [PubMed]
  28. Chang, C.C.; Chen, C.-C.; Cheewakriangkrai, C.; Chen, Y.C.; Yang, S.F. Risk Prediction of Second Primary Endometrial Cancer in Obese Women: A Hospital-Based Cancer Registry Study. Int. J. Environ. Res. Public Health 2021, 18, 8997. [Google Scholar] [CrossRef] [PubMed]
  29. Dusingize, J.C.; Olsen, C.M.; Pandeya, N.P.; Subramaniam, P.; Thompson, B.S.; Neale, R.E.; Green, A.C.; Whiteman, D.C. Cigarette Smoking and the Risks of Basal Cell Carcinoma and Squamous Cell Carcinoma. J. Investig. Dermatol. 2017, 137, 1700–1708. [Google Scholar] [CrossRef]
  30. Zavattaro, E.; Fava, P.; Veronese, F.; Cavaliere, G.; Ferrante, D.; Cantaluppi, V.; Ranghino, A.; Biancone, L.; Fierro, M.T.; Savoia, P. Identification of Risk Factors for Multiple Non-Melanoma Skin Cancers in Italian Kidney Transplant Recipients. Medicina 2019, 55, 279. [Google Scholar] [CrossRef] [Green Version]
  31. Savoia, P.; Veronese, F.; Camillo, L.; Tarantino, V.; Cremona, O.; Zavattaro, E. Multiple Basal Cell Carcinomas in Immunocompetent Patients. Cancers 2022, 14, 3211. [Google Scholar] [CrossRef] [PubMed]
  32. Wiemels, J.L.; Wiencke, J.K.; Li, Z.; Ramos, C.; Nelson, H.H.; Karagas, M.R. Risk of Squamous Cell Carcinoma of the Skin in Relation to IgE: A Nested Case–Control Study. Cancer Epidemiol. Biomark. Prev. 2011, 20, 2377–2383. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Method process flow.
Figure 2. Sample distribution of each characteristic specific to SPSC (red: SPSC; blue: non-SPSC).
Figure 3. Comparison of the ROC curves of (a) different algorithms and (b) random forest with different FN costs.
Figure 4. SHAP value for each characteristic (red: positive impact; blue: negative impact).
Figure 5. SHAP value plots: (a) bee swarm plot and (b) heat map.
Figure 6. Dependence plot matrix of the SHAP interaction values.
Figure 7. Heat map of the SHAP interaction values between the features.
Figure 8. Dependence plots of the SHAP interaction values between age and other important features: (a) SHAP value of age; (b) SHAP main effect value of age; (c) SHAP interaction value between age and stage; and (d) SHAP interaction value between age and gender.
Figure 9. Dependence plots of the SHAP interaction values between stage and other important features: (a) SHAP value of stage; (b) SHAP main effect value of stage; (c) SHAP interaction value between stage and age; and (d) SHAP interaction value between stage and gender.
Figure 10. Dependence plots of the SHAP interaction values between gender and other important features: (a) SHAP value of gender; (b) SHAP main effect value of gender; (c) SHAP interaction value between gender and age; and (d) SHAP interaction value between gender and stage.
Figure 11. Dependence plots of SHAP interaction values between lymph and other important characteristics: (a) SHAP value of lymph; (b) SHAP main effect value of lymph; (c) SHAP interaction value between lymph and age; (d) SHAP interaction value between lymph and stage; and (e) SHAP interaction value between lymph and gender.
Figure 12. Waterfall plot. Two example instances for local interpretation: (a) SPSC case and (b) non-SPSC case. f(x) is the prediction probability and E[f(x)] is the expectation of probability (i.e., the average predictions of all the instances).
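The waterfall plot in Figure 12 decomposes f(x) − E[f(x)] into per-feature SHAP contributions. For intuition, exact Shapley values for a toy model can be computed by enumerating feature coalitions. The following pure-Python sketch is our own illustration (not the `shap` library used in the paper); the model, feature values, and baseline are hypothetical:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values; features absent from a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):                      # k = size of coalition without i
            for s in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi

# Toy model with an interaction term, loosely mimicking e.g. an age*stage effect.
f = lambda z: 2 * z[0] + z[1] + 0.5 * z[0] * z[2]
x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = shapley_values(f, x, baseline)
# Additivity: the contributions sum to f(x) - f(baseline),
# which is the property the waterfall plot visualizes.
```

Here the interaction term 0.5·z[0]·z[2] is split evenly between features 0 and 2, which is the same sharing behind the SHAP interaction values in Figures 6–11.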
Table 1. Dataset encoding features and sample sizes demonstrated in prediction analysis.
Item | Feature | Rank | Encoding | Sample Size
1 | Gender | Female | 0 | 597
  |  | Male | 1 | 651
2 | Age at diagnosis | 17~98 | Integer | 1248
3 | Tumor size | <5 cm | 0 | 1202
  |  | ≥5 cm | 1 | 46
4 | Regional lymph node involvement | No | 0 | 1140
  |  | Yes | 1 | 108
5 | Cancer stage | <(Stage I) | 0 | 818
  |  | ≥(Stage II) | 1 | 430
6 | Residual tumor on edge of surgery | No | 0 | 1223
  |  | Yes | 1 | 25
7 | Radiation therapy | No | 0 | 1218
  |  | Yes | 1 | 30
8 | Systemic therapy | No | 0 | 1242
  |  | Yes | 1 | 6
9 | Areca | No/NA | 0 | 1200
  |  | Yes | 1 | 48
10 | Alcohol | No/NA | 0 | 1144
  |  | Yes | 1 | 104
Category | Second primary skin cancer | No | 0 | 1146
  |  | Yes | 1 | 102
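The binary encoding in Table 1 can be expressed as a small mapping. The sketch below is illustrative only; field and function names are our own, while the cut-offs (tumor size ≥5 cm, stage ≥II) follow the table:

```python
# Illustrative re-implementation of the Table 1 encoding scheme.
# Field and function names are hypothetical; thresholds follow the table.

def encode_patient(record):
    """Map a raw patient record to the 0/1 features of Table 1."""
    return {
        "gender": 1 if record["gender"] == "male" else 0,
        "age": int(record["age"]),                        # kept as an integer (17~98)
        "tumor_size": 1 if record["tumor_size_cm"] >= 5 else 0,
        "lymph": 1 if record["regional_lymph_nodes"] else 0,
        "stage": 1 if record["stage"] >= 2 else 0,        # >= Stage II -> 1
        "residual_tumor": 1 if record["residual_tumor"] else 0,
        "radiation": 1 if record["radiation_therapy"] else 0,
        "systemic": 1 if record["systemic_therapy"] else 0,
        "areca": 1 if record["areca"] else 0,             # No/NA -> 0
        "alcohol": 1 if record["alcohol"] else 0,         # No/NA -> 0
    }

example = {
    "gender": "male", "age": 63, "tumor_size_cm": 2.1,
    "regional_lymph_nodes": False, "stage": 2, "residual_tumor": False,
    "radiation_therapy": False, "systemic_therapy": False,
    "areca": False, "alcohol": True,
}
features = encode_patient(example)
```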
Table 2. Comparison of the classification performance for different algorithms.
Algorithm | TP Rate | FP Rate | Precision | Recall | F1 Score | ROC Area | PRC Area | Acc. | Category
MLP | 0.814 | 0.078 | 0.913 | 0.814 | 0.861 | 0.902 | 0.893 | 0.868 | non-SPSC
  | 0.922 | 0.186 | 0.832 | 0.922 | 0.875 | 0.902 | 0.875 |  | SPSC
C4.5 | 0.837 | 0.051 | 0.942 | 0.837 | 0.886 | 0.916 | 0.906 | 0.893 | non-SPSC
  | 0.949 | 0.163 | 0.853 | 0.949 | 0.898 | 0.916 | 0.891 |  | SPSC
AdaBoost C4.5 | 0.846 | 0.051 | 0.944 | 0.846 | 0.892 | 0.931 | 0.916 | 0.898 | non-SPSC
  | 0.949 | 0.154 | 0.861 | 0.949 | 0.903 | 0.931 | 0.92 |  | SPSC
Bagging C4.5 | 0.842 | 0.051 | 0.942 | 0.842 | 0.889 | 0.927 | 0.915 | 0.895 | non-SPSC
  | 0.949 | 0.158 | 0.857 | 0.949 | 0.901 | 0.927 | 0.911 |  | SPSC
Random Forest | 0.853 | 0.048 | 0.947 | 0.853 | 0.897 | 0.939 | 0.932 | 0.902 | non-SPSC
  | 0.952 | 0.147 | 0.866 | 0.952 | 0.907 | 0.939 | 0.926 |  | SPSC
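A comparison like Table 2 can be sketched in outline with scikit-learn. This is not the authors' pipeline: the data below are synthetic, C4.5 is approximated by a CART decision tree, and all hyperparameters are assumptions:

```python
# Sketch of the Table 2 algorithm comparison on synthetic, imbalanced data.
# C4.5 is approximated by scikit-learn's CART decision tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 1248-patient registry data (10 encoded features).
X, y = make_classification(n_samples=1248, n_features=10,
                           weights=[0.92, 0.08], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Decision tree (C4.5 approx.)": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
```

On real registry data the metrics in Table 2 (per-class TP/FP rates, precision, recall, F1) would be computed the same way from held-out predictions.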
Table 3. Comparison of the random forest classification performance for different costs.
Cost of FN | TP Rate | FP Rate | Precision | Recall | F1 Score | ROC Area | PRC Area | Acc. | Category
1 | 0.853 | 0.048 | 0.947 | 0.853 | 0.897 | 0.939 | 0.932 | 0.902 | non-SPSC
  | 0.952 | 0.147 | 0.866 | 0.952 | 0.907 | 0.939 | 0.926 |  | SPSC
3 | 0.784 | 0.045 | 0.946 | 0.784 | 0.857 | 0.937 | 0.932 | 0.870 | non-SPSC
  | 0.955 | 0.216 | 0.815 | 0.955 | 0.880 | 0.937 | 0.923 |  | SPSC
5 | 0.728 | 0.041 | 0.947 | 0.728 | 0.823 | 0.938 | 0.932 | 0.843 | non-SPSC
  | 0.959 | 0.272 | 0.779 | 0.959 | 0.860 | 0.938 | 0.924 |  | SPSC
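One way to emulate the false-negative costs in Table 3 is class weighting, which up-weights the minority SPSC class during tree induction; this is an assumption about the cost-sensitive setup, sketched on synthetic data:

```python
# Sketch: cost-sensitive random forest via class weights (our assumption,
# not necessarily the authors' exact cost-matrix mechanism), synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1248, n_features=10,
                           weights=[0.9, 0.1], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

recalls = {}
for fn_cost in (1, 3, 5):
    # Weighting the positive class by the FN cost penalizes missed SPSC cases,
    # trading non-SPSC accuracy for SPSC recall, as in Table 3.
    clf = RandomForestClassifier(class_weight={0: 1, 1: fn_cost}, random_state=1)
    clf.fit(X_tr, y_tr)
    recalls[fn_cost] = recall_score(y_te, clf.predict(X_te))
```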
Share and Cite

Lee, H.-C.; Lin, T.-C.; Chang, C.-C.; Lu, Y.-C.A.; Lee, C.-M.; Purevdorj, B. Clinical Risk Factor Prediction for Second Primary Skin Cancer: A Hospital-Based Cancer Registry Study. Appl. Sci. 2022, 12, 12520. https://doi.org/10.3390/app122412520