1. Introduction
Globally, breast cancer is the most commonly diagnosed cancer and the second leading cause of cancer-related death in women [
1]. Currently, the main clinical approach is surgical treatment assisted with multi-disciplinary methods, such as radiotherapy, chemotherapy, and targeted therapy [
2]. Accurate prediction of recurrence after breast cancer surgery is associated with better coordination of care and with more efficient allocation and use of healthcare resources for these patients.
Machine learning algorithms learn patterns from sample data so that they can make predictions from new data. A recent study developed several novel artificial neural network (ANN) models for diagnosis of human colorectal cancer (CRC) based on data from The Cancer Genome Atlas (TCGA) [
3]. The 10-fold cross-validation results for the training and testing datasets in that study demonstrated the excellent performance of the back propagation (BP) and learning vector quantization models in terms of prediction accuracy, area under the curve (AUC) values, robustness, and sensitivity. Their results inspired the models developed in the current study, which integrate gene expression profiling data and artificial intelligence algorithms in improved diagnostic tools for CRC. In another recent study, accuracy in predicting breast cancer recurrence was compared among conventional and recently developed data mining algorithms [
4]. According to the comparison results, the decision tree C5.0 algorithm may be the best tool for predicting breast cancer recurrence, particularly 3-year recurrence, in patients who are in distant recurrence stage or nonrecurrence stage. In the dataset, the best predictors of breast cancer recurrence were lymph node (LN) involvement, human epidermal growth factor receptor 2 (HER2) value, tumor size, and tumor margin (free versus closed). In another recent study, a naïve Bayesian classifier (NBC) model was used to predict breast cancer recurrence within 5 years after breast cancer surgery. The prediction performance of the proposed NBC model was comparable to that of previous models that have used support vector machine (SVM) (AUC = 0.85) or ANN (AUC = 0.85) [
5]. The nomogram-based approach is attractive because predicting breast cancer recurrence with it requires no computation.
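The two evaluation ideas mentioned above, AUC and 10-fold cross-validation, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `auc_score` and `k_fold_indices` and the toy labels and scores are hypothetical and are not taken from the cited studies or from the present study's pipeline.

```python
from typing import List, Sequence, Tuple


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen recurrence case (label 1)
    receives a higher score than a randomly chosen non-recurrence case (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous folds and return
    one (train_indices, test_indices) pair per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Toy example (hypothetical values): 1 = recurrence, 0 = no recurrence.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_score(toy_labels, toy_scores)
toy_folds = k_fold_indices(25, k=10)
```

In practice, the data would usually be shuffled (and often stratified by outcome) before the folds are formed; the contiguous split here is kept deliberately simple.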
Although many forecasting models for predicting outcomes after breast cancer surgery have been proposed in recent years, models for predicting recurrence within 10 years after breast cancer surgery have had major shortcomings: (1) recently proposed forecasting models have lower prediction accuracy compared to conventional models [
6,
7], (2) proposed forecasting models require use of health insurance claims data, which may be unavailable for real-time use in clinical settings [
8,
9], and (3) predictions of postoperative recurrence after breast surgery do not consider demographic characteristics, clinical characteristics, quality of care and preoperative health-related quality of life [
10,
11]. Successful applications of statistical data mining and machine learning methods have been demonstrated in the medical field [
7,
8,
9,
10,
11]. Clinical and genetic information can be used to improve precision in estimating prognosis and to obtain a comprehensive overview of a disease. Given the rapid accumulation of real-world data, the development of machine learning technologies provides the capability to generate risk stratification models that can efficiently consider numerous predictors.
Few studies of recurrence after breast cancer surgery have used longitudinal data spanning more than 10 years. Moreover, no studies have considered group differences in factors other than outcome, such as demographic characteristics and clinical characteristics. Additionally, no studies have discussed machine learning algorithms for predicting recurrence within 10 years after breast cancer surgery. Health researchers can use the predictive simulation results obtained in this study not only to develop and improve healthcare policies, but also to improve healthcare decision making. The aim of this study was to compare five forecasting models in terms of accuracy in predicting recurrence within 10 years after breast cancer surgery and to identify significant predictors of that recurrence.
4. Discussion
To the best of our knowledge, this study is the first to use forecasting models to analyze recurrence within 10 years after breast cancer surgery. Accuracy in predicting recurrence within 10 years in breast cancer patients after surgery was compared among the five forecasting models. When all models were constructed using a given set of clinical inputs, the ANN model was clearly superior to other forecasting models. Furthermore, unlike previous works in which the analyses were performed using a dataset for a single medical center, our study used prospective and longitudinal data from multiple medical centers, which provides a more accurate depiction of current treatment for breast cancer patients after surgery [
7,
8,
9]. Additionally, in contrast with previous series studies that have used data for a single institution, this study used registry data to obtain a more accurate depiction of breast cancer surgery treatment in large populations. Using registry data also minimizes referral bias or bias caused by the practices of a single physician or a single institution [
24].
Several strengths of this analysis should be noted. To our knowledge, this investigation is the first to compare machine learning algorithms, including a regression-based method, to predict recurrence within 10 years after breast cancer surgery in a large general population. Unlike previously developed machine learning-based prognostic tools in oncology, the forecasting models in this study were trained on data for all patients treated at oncology or hematology/oncology clinics regardless of history of cancer-directed therapy. Furthermore, compared with machine learning algorithms previously applied in oncology, the forecasting models in this study included more predictors, all of which are typically available in structured formats in real-time medical record databases. Thus, these forecasting models are more efficient than previously trained machine learning algorithms in the general oncology setting. The 10-year follow-up period in this prospective cohort study was also longer than that in previous works. Finally, most of the patients that the model classified as high-risk patients would be deemed appropriate for discussion of end-of-life preferences in a clinical setting.
Recent works have repeatedly demonstrated the superior performance of the ANN model compared to other forecasting models [
25,
26,
27]. The advantages offered by the unique characteristics of the ANN model have been confirmed by statistical analyses. For example, using an ANN model enables more appropriate and more accurate processing of inputs that are incomplete or inputs that introduce noise. Another advantage is that linear and non-linear ANN models with good potential for use in large-scale medical databases can be constructed using data that are highly correlated but not normally distributed. Prognosis prediction is only one of the many applications of ANN models in clinical research in the medical field. Furthermore, the comparisons of various forecasting models in this study suggest that, by expanding the number of potential predictors, the ANN model facilitates systematic analysis of various diseases and facilitates comparisons of the effectiveness of research methods. Additionally, the proposed model can be extended to outcome prediction for treatments other than breast cancer surgery.
The global sensitivity analysis of the weights of significant predictors of recurrence within 10 years in breast cancer patients after surgery in this study revealed that the best predictor was surgeon volume, followed by hospital volume. This finding is consistent with earlier reports that, compared to all other breast cancer treatment variables, surgeon volume and hospital volume are the best predictors of breast cancer surgery outcomes, including treatment costs, health-related quality of life, readmission, complications, and recurrence after surgery [
28,
29,
30]. Compared to a low-volume surgeon, a high-volume surgeon working with a well-trained medical team tends to perform better in terms of operating time, quality of surgical procedure, discharge planning, and medical outcomes, all of which can potentially reduce postoperative recurrence. Morche et al. performed a meta-analysis of 32 reviews reporting on 15 surgical procedures to investigate whether surgeon volume is a prognostic predictor of quality of health care [
28]. Their meta-analysis revealed that, in addition to the volume-outcome relationship, surgeon volume is a significant independent predictor of medical outcomes in the general population of cancer patients.
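To make the idea of ranking predictors by sensitivity more concrete, the following Python sketch computes a simple sensitivity ratio: the model error when one predictor is replaced by its mean, divided by the baseline error. This is one common perturbation-based approach and is offered as an assumption; the exact sensitivity procedure used in this study is not described in this section. The function names, the linear `toy_model`, and all numbers are hypothetical and are not the study's data or fitted model.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def mean_squared_error(model: Callable[[Row], float],
                       rows: Sequence[Row],
                       targets: Sequence[float]) -> float:
    """Average squared difference between model outputs and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def sensitivity_ratios(model: Callable[[Row], float],
                       rows: Sequence[Row],
                       targets: Sequence[float],
                       predictors: Sequence[str]) -> Dict[str, float]:
    """For each predictor, replace its values by the sample mean and report
    (error without that predictor's information) / (baseline error).
    Larger ratios suggest the model relies more on that predictor."""
    baseline = mean_squared_error(model, rows, targets)
    if baseline == 0:
        raise ValueError("baseline error is zero; the ratio is undefined")
    ratios = {}
    for name in predictors:
        mean_value = sum(r[name] for r in rows) / len(rows)
        neutralised = [{**r, name: mean_value} for r in rows]
        ratios[name] = mean_squared_error(model, neutralised, targets) / baseline
    return ratios


def toy_model(row: Row) -> float:
    """Hypothetical stand-in for a trained model (NOT the study's ANN)."""
    return 0.6 * row["surgeon_volume"] + 0.1 * row["hospital_volume"]


# Toy inputs coded 0/1 and toy targets; all values are made up for illustration.
toy_rows: List[Row] = [
    {"surgeon_volume": 0.0, "hospital_volume": 0.0},
    {"surgeon_volume": 1.0, "hospital_volume": 0.0},
    {"surgeon_volume": 0.0, "hospital_volume": 1.0},
    {"surgeon_volume": 1.0, "hospital_volume": 1.0},
]
toy_targets = [0.05, 0.55, 0.15, 0.65]

ratios = sensitivity_ratios(toy_model, toy_rows, toy_targets,
                            ["surgeon_volume", "hospital_volume"])
ranking = sorted(ratios, key=ratios.get, reverse=True)
```

A ratio close to 1 indicates that neutralising the predictor barely changes the error, whereas a ratio well above 1 indicates that the model depends heavily on it.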
Shi et al. retrospectively analyzed 97,215 breast cancer surgeries to examine the longitudinal effect of both hospital volume and surgeon volume on medical resource utilization and medical outcomes after surgical resection of breast cancer [
21]. The study concluded that surgeon volume and hospital volume are significant independent predictors of total direct medical costs and postoperative recurrence (
p < 0.001). The likely explanation for this finding is that ‘practice makes perfect’ and high surgical volumes not only improve surgical skills, but also reduce postoperative recurrence. The importance of surgeon volume and hospital volume for predicting outcomes in patients after cancer surgery is now well recognized. For investigators, these assessments enable a more comprehensive depiction of the potential burden on the patient after the envisaged (palliative) treatment in terms of its effects on medical resource utilization and medical outcomes simultaneously. Thus, surgeon volume and hospital volume, in addition to clinical attributes, should be included as standard risk factors or predictors in future randomized controlled trials. Using these variables for stratification would improve the quality of future trials by increasing the homogeneity of treatment groups and would aid understanding of their results.
In agreement with previous studies [
30,
31], the present study found that advanced breast cancer stage was significantly associated with recurrence within 10 years after breast cancer surgery. During the study period, 369 patients (32.4%) had stage I tumors, 456 (40.0%) had stage II tumors, and 315 (27.6%) had stage III tumors. Early diagnosis of breast cancer and curative retreatment are likely to reduce recurrence. After surgery, breast cancer patients are often burdened by multiple cancer-related comorbidities that increase their risk of poor postoperative outcomes, including complications, a long hospital stay, a short survival time, and high treatment costs. As reported by Wu et al., tumor stage is an important predictor of recurrence after cancer surgery [
32]. Our global sensitivity analysis also indicated that recurrence within 10 years after breast cancer surgery tends to increase in patients with late-stage tumors, which is consistent with other works [
30,
31,
32].
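The stage distribution reported above can be checked with simple arithmetic. The short Python sketch below recomputes the percentages from the three counts given in the text; it assumes that these three stages account for all patients in the analysis, so the total of 1,140 is derived here rather than quoted from the study.

```python
# Stage counts as reported in the text.
stage_counts = {"I": 369, "II": 456, "III": 315}

# Assumed denominator: the sum of the three reported stage counts.
total = sum(stage_counts.values())

# Percentage of patients in each stage, rounded to one decimal place.
stage_percent = {stage: round(100 * n / total, 1)
                 for stage, n in stage_counts.items()}

for stage, pct in stage_percent.items():
    print(f"Stage {stage}: {stage_counts[stage]} of {total} patients ({pct}%)")
```

The recomputed values agree with the reported 32.4%, 40.0%, and 27.6%.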
This prospective observational study of a cohort of breast cancer surgery patients in Taiwan analyzed data for patients treated at multiple healthcare institutions. The ANN model developed in this study improves accuracy in identifying factors significantly associated with recurrence within 10 years after breast cancer surgery. However, the proposed forecasting model has many other potential clinical applications. For example, healthcare institutions can improve care quality by using the methods developed in this study to evaluate the effectiveness of medical treatment. Since the proposed ANN model accurately predicts recurrence within 10 years after breast cancer surgery, healthcare administrators and medical professionals at other institutions can use the model to demonstrate the need for prompt and appropriate postsurgical treatment. Broader potential applications of the model in Taiwan include facilitating the formulation and promotion of healthcare policies and the development of decision-support systems, which would ultimately contribute to improved health in all cancer patients. However, further studies are needed to determine the true clinical relevance of the ANN model and to clarify whether the model has practical clinical applications in predicting prognosis and in optimizing medical management for breast cancer patients after surgery.
This study has several limitations inherent in any large database analysis. First, the validity of the comparisons in the study is limited by the exclusion of complications associated with recurrence after surgery. Second, the analysis was limited to recurrence over a 10-year period after surgery, which reduced the subset of breast cancer patients in which the ANN model is clinically applicable. Third, this study only compared individual ANN, KNN, SVM, NBC, and COX models. Future work may consider an alternative study design that selects a balanced sample of surgeons or hospitals at the first level and then randomly selects breast cancer patients at the second level. The relative importance of patient and provider characteristics could then be delineated in multilevel modeling. Another advantage of such a design is that interacting effects of patient and provider characteristics on breast cancer recurrence could be detected. Nevertheless, given their robustness and statistical significance, the results of this study can still be considered valid.
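The two-level design suggested above can be sketched as a two-stage random sample: providers are drawn first, and then the same number of patients is drawn within each selected provider. The Python sketch below is a minimal illustration under stated assumptions; the function name `two_stage_sample`, the provider and patient identifiers, and the sample sizes are all hypothetical.

```python
import random
from typing import Dict, List


def two_stage_sample(patients_by_provider: Dict[str, List[str]],
                     n_providers: int,
                     n_patients: int,
                     seed: int = 0) -> Dict[str, List[str]]:
    """Level 1: randomly select n_providers providers (surgeons or hospitals).
    Level 2: randomly select n_patients patients within each selected provider.
    Providers with fewer than n_patients patients are excluded so that the
    resulting sample is balanced across providers."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    eligible = sorted(name for name, pts in patients_by_provider.items()
                      if len(pts) >= n_patients)
    if len(eligible) < n_providers:
        raise ValueError("not enough providers with sufficient patients")
    chosen = rng.sample(eligible, n_providers)
    return {name: rng.sample(patients_by_provider[name], n_patients)
            for name in chosen}


# Hypothetical registry: provider identifier -> patient identifiers.
toy_registry = {
    "surgeon_A": [f"A{i}" for i in range(10)],
    "surgeon_B": [f"B{i}" for i in range(8)],
    "surgeon_C": [f"C{i}" for i in range(2)],   # too few patients, excluded
    "surgeon_D": [f"D{i}" for i in range(12)],
}

toy_sample = two_stage_sample(toy_registry, n_providers=2, n_patients=3, seed=42)
```

The resulting provider-by-patient structure is the kind of nested data on which a multilevel model could then be fitted; the model fitting itself is outside the scope of this sketch.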