Machine Learning for Recurrence Prediction of Gynecologic Cancers Using Lynch Syndrome-Related Screening Markers
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Collection
2.2. Preprocessing and EDA
2.3. Classification Models
- (1) LR: Logistic regression (LR) is a parametric classification model and a particular form of the generalized linear model. It predicts a dependent variable by modeling its relationship with one or more independent variables.
- (2) SVM: A support vector machine (SVM) is among the most popular machine learning models. It extends a perceptron-based model with a constraint that selects the most stable discriminant boundary, that is, the boundary with the largest margin. An SVM defines its decision boundary with support vectors and classifies new points according to which side of that boundary they fall on. The linear kernel is the most basic kernel function in SVM; the radial basis function (RBF) kernel is a popular alternative used in a variety of kernelized learning algorithms.
- (3) NB: The naive Bayes (NB) algorithm is a probabilistic machine learning algorithm based on Bayes’ theorem. Under the naive assumption that all features are conditionally independent given the class, it makes predictions by inferring posterior probabilities from prior probabilities. The NB classifier is very efficient to train and requires only a small amount of training data to estimate its classification parameters.
- (4) RF: A random forest (RF) predicts an output variable by majority vote over multiple decision trees. A decision tree represents decision rules and their outcomes as a tree structure, splitting the data repeatedly at specific cutoff values of the features. The RF improves on the performance of a single tree by combining decision trees with bagging, in which each tree is trained on a bootstrap sample drawn from the training set.
- (5) GB: Gradient boosting (GB) builds a more accurate, stronger learner by combining simple, weak decision trees. The prediction error of one weak tree is compensated for by the next model, so adding weak trees successively yields a more accurate model than the first. In GB, a weak tree is fitted to the residual, and the prediction is updated by adding the predicted residual to the previous prediction.
- (6) XGBoost: XGBoost is a tree-based ensemble model that has recently been in the spotlight. Although XGBoost is based on GB, it addresses GB’s drawbacks, such as slow execution time and the lack of regularization against overfitting. In particular, XGBoost can train in parallel in a multi-core CPU environment and can therefore complete training faster than the existing GB model.
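The residual-fitting update described for GB in item (5) can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes a squared-error regression setting for simplicity (a classifier would fit pseudo-residuals of a classification loss instead), uses one-split stumps as the weak trees, and runs on hypothetical toy data. The names `fit_stump`, `gradient_boost`, and `mse` are illustrative.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression stump (a weak tree) to the residuals."""
    best = None
    for cut in sorted(set(xs))[:-1]:  # candidate cutoffs; both sides stay non-empty
        left = [r for x, r in zip(xs, residuals) if x <= cut]
        right = [r for x, r in zip(xs, residuals) if x > cut]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, cut, left_mean, right_mean)
    _, cut, left_mean, right_mean = best
    return lambda x: left_mean if x <= cut else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Return a predictor built by repeatedly fitting stumps to residuals."""
    base = mean(ys)  # initial prediction: the mean target
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Update: previous prediction plus the (shrunken) predicted residual.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return mean((y - model(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data (not from the study): one feature, step-like target.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0.0, 0.1, 0.0, 0.2, 0.9, 1.0, 0.8, 1.0]

one_round = gradient_boost(xs, ys, n_rounds=1)
many_rounds = gradient_boost(xs, ys, n_rounds=50)
print(mse(one_round, xs, ys), mse(many_rounds, xs, ys))
```

On this toy data the training error after 50 rounds is lower than after a single round, which is the sense in which successive weak trees compensate for the errors of their predecessors.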
2.4. Performance Evaluation
3. Results
3.1. Classification Results
3.2. Classification Performance on Tree-Based Approaches
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Variable | Value |
---|---|
Age, years | |
Median (range) | 54 (21–86) |
ECOG performance status | |
0–1 | 47 (41.2%) |
2–4 | 67 (58.8%) |
FIGO stage at diagnosis | |
I/II | 22 (19.3%) |
III/IV | 84 (73.7%) |
N/A | 8 (7.0%) |
Origin of cancer | |
Cervix | 37 (32.5%) |
Vulva | 1 (0.9%) |
Ovary/peritoneum/fallopian tube | 43 (37.7%) |
Endometrium | 23 (20.2%) |
Uterine corpus | 8 (7.0%) |
Gestational trophoblast | 2 (1.8%) |
PD-L1 expression * | |
≥1 | 65 (57.1%) |
<1 | 28 (24.6%) |
N/A | 21 (18.4%) |
MMRd | 26 (22.8%) |
MMRp | 88 (77.2%) |
MSI-H | 11 (9.6%) |
MSI-L | 6 (5.2%) |
MSS | 97 (85.1%) |
MMRd or MSI-H | 38 (33.3%) |
MMRp or MSS | 76 (66.7%) |
Target lesion size, mm # | |
Median (range) | 60 (10–1230) |
Number of previous lines of chemotherapy | |
1 | 29 (25.4%) |
2 | 39 (34.2%) |
3 | 18 (15.8%) |
4 | 15 (13.2%) |
≥5 | 13 (11.4%) |
Type of immune checkpoint inhibitor | |
Pembrolizumab | 101 (88.6%) |
Nivolumab | 13 (11.4%) |
Origin | MMRd | MMRp | MSI-H | MSI-L | MSS | Recurrence |
---|---|---|---|---|---|---|
Cervix | 7 (18.9%) | 30 | 3 (8.1%) | 0 | 34 | 25 (67.6%) |
Ovary | 7 (16.3%) | 36 | 2 (4.7%) | 5 | 36 | 36 (83.7%) |
Endometrium | 9 (39.1%) | 14 | 4 (17.4%) | 1 | 18 | 16 (69.6%) |
Others | 3 (33.3%) | 8 | 4 (18.2%) | 0 | 9 | 4 (36.4%) |
Total | 26 (22.8%) | 88 | 11 (9.6%) | 6 (5.2%) | 97 | 81 (71.1%) |
Attribute Notation | Attribute | Raw Data Type | One-Hot Encoded Attributes |
---|---|---|---|
MLH1 | MutL homolog 1 | Text | MLH1_intact, MLH1_loss, MLH1_none |
MSH2 | MutS homolog 2 | Text | MSH2_intact, MSH2_loss, MSH2_none |
MSH6 | MutS homolog 6 | Text | MSH6_intact, MSH6_loss, MSH6_none |
PMS2 | PMS1 homolog 2 | Text | PMS2_intact, PMS2_loss, PMS2_none |
MSI | Microsatellite instability | Text | MSI_high, MSI_low, MSI_stable |
Age 60 | Age greater or less than 60 years | Numeric | Age60 |
Size | Tumor size | Numeric | Size_1, Size_2, Size_3, Size_4 |
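As a minimal sketch of the encoding summarized in the table above, the snippet below turns each text-valued marker into 0/1 indicator columns named `<attribute>_<status>`. The helper `one_hot` and the patient record are hypothetical and only illustrate the idea; the rule used to derive `Age60` and the `Size_1` to `Size_4` bins is not given here, so those attributes are left out.

```python
def one_hot(attribute, value, categories):
    """Return a dict of 0/1 indicator columns for one categorical value."""
    if value not in categories:
        raise ValueError(f"unexpected {attribute} status: {value!r}")
    return {f"{attribute}_{c}": int(c == value) for c in categories}


# Status suffixes as they appear in the table.
MMR_STATUSES = ("intact", "loss", "none")
MSI_STATUSES = ("high", "low", "stable")

# Hypothetical patient record (illustrative values, not study data).
patient = {"MLH1": "loss", "MSH2": "intact", "MSH6": "intact",
           "PMS2": "loss", "MSI": "high"}

encoded = {}
for protein in ("MLH1", "MSH2", "MSH6", "PMS2"):
    encoded.update(one_hot(protein, patient[protein], MMR_STATUSES))
encoded.update(one_hot("MSI", patient["MSI"], MSI_STATUSES))

print(encoded["MLH1_loss"], encoded["MLH1_intact"], encoded["MSI_high"])
```

Each marker contributes exactly one active indicator, so the five markers yield 15 binary columns with five of them set to 1 for any patient.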
Attribute | Fisher Odds Ratio | p-Value | Attribute | Fisher Odds Ratio | p-Value |
---|---|---|---|---|---|
MLH1 intact | 3.1233 | 0.0193 | PMS2 intact | 3.6571 | 0.0147 |
MLH1 loss | 0.8 | 0.7167 | PMS2 loss | 0.9062 | 1.0 |
MSH2 intact | 1.84 | 0.2079 | MSI high | 0.448 | 0.2920 |
MSH2 loss | 1.6623 | 1.0 | MSI low | 2.1052 | 0.6706 |
MSH6 intact | 1.5180 | 0.4018 | MSI stable | 0.7169 | 0.5216 |
MSH6 loss | 1.7846 | 0.4221 | Age 60 | 1.2606 | 0.6610 |
Size 1 | 1.0706 | 1.0 | Size 2 | 0.8172 | 0.8107 |
Size 3 | 1.2392 | 0.8171 | Size 4 | 0.9455 | 1.0 |
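For readers unfamiliar with the statistic in the table above, the following sketch shows one common way to obtain an odds ratio and a two-sided Fisher exact p-value from a 2 × 2 table of a binary attribute against recurrence. It assumes the sample odds ratio ad/bc and the "tables no more likely than the observed one" definition of the two-sided p-value; whether the study used exactly these conventions is not stated. The counts are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Sample odds ratio and two-sided Fisher exact p-value for [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k):
        # Unnormalized hypergeometric probability of a table whose top-left cell is k.
        return comb(row1, k) * comb(row2, col1 - k)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    weights = [weight(k) for k in range(lo, hi + 1)]
    observed = weight(a)
    # Two-sided p-value: total probability of tables no more likely than the observed one.
    p_value = sum(w for w in weights if w <= observed) / sum(weights)
    return odds_ratio, p_value


# Hypothetical counts (not study data): rows = attribute present / absent,
# columns = recurrence / no recurrence.
odds_ratio, p_value = fisher_exact_2x2(8, 2, 1, 5)
print(round(odds_ratio, 4), round(p_value, 4))
```

The weights are kept as integers so that the comparison against the observed table is exact and not subject to floating-point ties.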
Classification Model | Hyperparameter Ranges |
---|---|
Random forest, gradient boosting, XGBoost | Maximum depth of tree = [1, 3, 5, 10, 15, 20, 25, 30]; number of estimators = [10, 50, 100, 200, 500, 1000, 1500, 2000]; learning rate = [1, 0.1, 0.01, 0.001, 0.0001, 0.00001] |
Support vector machine | C = [0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000]; kernel = ["linear", "rbf"]; gamma = [0.5, 0.1, 0.01, 0.001, 0.0001] |
Logistic regression | C = [0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000] |
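To give a sense of the size of the search space in the table above, the sketch below enumerates every combination of the listed values, as an exhaustive grid search would. It assumes the ranges were explored exhaustively and that the shared tree-based grid applies in full to each of the three models; neither point is stated in the table. The parameter names (`max_depth`, `n_estimators`, `learning_rate`, `C`, `kernel`, `gamma`) follow scikit-learn conventions and are an assumption.

```python
from itertools import product

# Ranges copied from the table; key names are assumed, scikit-learn style.
TREE_GRID = {
    "max_depth": [1, 3, 5, 10, 15, 20, 25, 30],
    "n_estimators": [10, 50, 100, 200, 500, 1000, 1500, 2000],
    "learning_rate": [1, 0.1, 0.01, 0.001, 0.0001, 0.00001],
}
SVM_GRID = {
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000],
    "kernel": ["linear", "rbf"],
    "gamma": [0.5, 0.1, 0.01, 0.001, 0.0001],
}
LR_GRID = {"C": [0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000]}


def expand(grid):
    """Yield one dict per combination of the listed hyperparameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


grid_sizes = {
    label: sum(1 for _ in expand(grid))
    for label, grid in [("tree-based", TREE_GRID), ("SVM", SVM_GRID), ("LR", LR_GRID)]
}
print(grid_sizes)
```

Under these assumptions the tree-based grid has 8 × 8 × 6 = 384 candidates per model, the SVM grid 8 × 2 × 5 = 80, and the LR grid 8.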
ML Algorithm | 5-Fold CV | 5-Fold CV, 10 Repetitions |
---|---|---|
RF (train) | 0.972 | 0.978 |
RF (test) | 0.818 | 0.826 |
GB (train) | 0.952 | 0.900 |
GB (test) | 0.779 | 0.782 |
XGBoost (train) | 0.917 | 0.883 |
XGBoost (test) | 0.767 | 0.778 |
LR (train) | 0.872 | 0.876 |
LR (test) | 0.803 | 0.801 |
Linear SVC (train) | 0.871 | 0.875 |
Linear SVC (test) | 0.782 | 0.792 |
Kernel SVC (train) | 0.729 | 0.729 |
Kernel SVC (test) | 0.675 | 0.675 |
NB (train) | 0.855 | 0.855 |
NB (test) | 0.791 | 0.773 |
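The two evaluation schemes compared in the table above can be sketched as index splits. The snippet assumes that "Rep. 10" means the 5-fold procedure was repeated ten times with a fresh shuffle each time, and it uses a plain (unstratified) split; whether the study stratified folds by recurrence status is not stated. The function name `repeated_kfold` and the seed are illustrative.

```python
import random


def repeated_kfold(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (repeat, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle for every repetition
        folds = [order[i::n_splits] for i in range(n_splits)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield repeat, train_idx, test_idx


# 114 is the cohort size implied by the patient-characteristics table.
splits = list(repeated_kfold(n_samples=114))
print(len(splits))
```

Each repetition yields five train/test splits whose test folds together cover every patient exactly once, so ten repetitions give 50 fitted models whose scores are averaged.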
ML Algorithms | Accuracy (%) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) |
---|---|---|---|---|---|
Random forest | 88.88 | 90.12 | 87.65 | 87.95 | 89.87 |
Gradient boosting | 88.27 | 88.88 | 87.65 | 87.8 | 88.75 |
XGBoost | 85.8 | 82.71 | 88.88 | 88.15 | 83.72 |
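The five metrics in the table above all derive from the four cells of a confusion matrix. The sketch below assumes recurrence is the positive class. The counts are hypothetical: the confusion matrices are not reported here, and these values were chosen only because they are consistent with the random forest row.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return the five reported metrics, as percentages, from confusion-matrix counts."""
    return {
        "accuracy": 100 * (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": 100 * tp / (tp + fn),   # recall on the positive class
        "specificity": 100 * tn / (tn + fp),   # recall on the negative class
        "ppv": 100 * tp / (tp + fp),           # positive predictive value
        "npv": 100 * tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts, not reported in the source.
metrics = classification_metrics(tp=73, fn=8, tn=71, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these counts the sensitivity, specificity, PPV, and NPV round to 90.12, 87.65, 87.95, and 89.87, and the accuracy to 88.89, which the table appears to report truncated as 88.88.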
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kim, B.W.; Choi, M.C.; Kim, M.K.; Lee, J.-W.; Kim, M.T.; Noh, J.J.; Park, H.; Jung, S.G.; Joo, W.D.; Song, S.H.; et al. Machine Learning for Recurrence Prediction of Gynecologic Cancers Using Lynch Syndrome-Related Screening Markers. Cancers 2021, 13, 5670. https://doi.org/10.3390/cancers13225670