Machine Learning-Based Risk Stratification for Gestational Diabetes Management
Abstract
1. Introduction
2. System Design
2.1. Participant Inclusion and Exclusion
2.2. Hyperglycaemia Risk Score Definition
3. Methods
3.1. Data Preprocessing
3.2. Model Development and Hyperparameter Optimization
4. Results
4.1. Model Training and Internal Validation
4.2. External Validation
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Software Packages and Implementation
Appendix B. Summary Characteristics of Processed Datasets
Appendix B.1. Oxford University Hospitals NHS Foundation Trust (OUH)
Feature | Count | Mean | SD | 25% | 50% | 75% |
---|---|---|---|---|---|---|
Pre-Breakfast Tag | 840 | 5.335 | 1.531 | 4.8 | 5.200 | 5.648 |
Post-Breakfast Tag | 840 | 7.066 | 2.712 | 6.05 | 6.767 | 7.588 |
Post-Lunch Tag | 840 | 6.664 | 2.185 | 5.881 | 6.467 | 7.150 |
Post-Dinner Tag | 840 | 6.814 | 2.058 | 5.968 | 6.664 | 7.350 |
High-Readings Proportion | 840 | 0.249 | 0.225 | 0.083 | 0.222 | 0.375 |
Pre-Breakfast Gradient | 840 | −0.015 | 0.636 | −0.25 | 0.000 | 0.200 |
Post-Breakfast Gradient | 840 | −0.066 | 1.485 | −0.551 | −0.050 | 0.500 |
Post-Lunch Gradient | 840 | 0.019 | 1.024 | −0.550 | 0.000 | 0.550 |
Post-Dinner Gradient | 840 | −0.044 | 0.994 | −0.550 | 0.000 | 0.462 |
Gestational Age | 840 | 221.830 | 39.966 | 203.083 | 229 | 251.5 |
Medication | 840 | 0.356 | 0.479 | 0 | 0 | 1 |
BMI | 452 | 30.964 | 6.799 | 25.910 | 30.425 | 34.963 |
Appendix B.2. Royal Berkshire Hospitals NHS Foundation Trust (RBH)
Feature | Count | Mean | SD | 25% | 50% | 75% |
---|---|---|---|---|---|---|
Pre-Breakfast Tag | 186 | 5.155 | 0.889 | 4.6 | 5.028 | 5.425 |
Post-Breakfast Tag | 186 | 6.84 | 1.204 | 6.037 | 6.658 | 7.4 |
Post-Lunch Tag | 186 | 9.88 | 44.07 | 5.969 | 6.6 | 7.347 |
Post-Dinner Tag | 186 | 6.86 | 1.164 | 6.153 | 6.709 | 7.296 |
High-Readings Proportion | 186 | 0.207 | 0.187 | 0.067 | 0.167 | 0.3 |
Appendix C. Model Development
Appendix C.1. Final Hyperparameter Values Used for Models
Feature Set | Colsample (Tree) | Gamma | Learning Rate | Max Depth | N Estimators | Sub-Sample |
---|---|---|---|---|---|---|
Tags | 0.5 | 0.9 | 0.05 | 3 | 58 | 0.7 |
Tags, Gradients | 0.7 | 0.9 | 0.05 | 3 | 57 | 0.9 |
Tags, High-Readings | 0.5 | 0.9 | 0.05 | 3 | 59 | 0.9 |
Tags, Gradients, High-Readings | 0.7 | 0.5 | 0.05 | 3 | 58 | 0.9 |
Tags, EHR | 0.5 | 0.9 | 0.05 | 3 | 59 | 0.9 |
Tags, EHR, Gradients | 0.7 | 0.9 | 0.05 | 3 | 59 | 0.9 |
Tags, EHR, High-Readings | 0.5 | 0.5 | 0.05 | 3 | 46 | 0.9 |
Tags, EHR, Gradients, High-Readings | 0.7 | 0.5 | 0.05 | 3 | 47 | 0.9 |
Before Breakfast | After Breakfast | After Lunch | After Dinner |
---|---|---|---|
0.11896724 | 0.02644718 | 0.0061178 | 0.05830037 |
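The hyperparameter columns in the Appendix C.1 table can be turned into keyword arguments for a gradient-boosted regressor. The sketch below is illustrative only: the mapping from column headings to XGBoost parameter names (`colsample_bytree`, `gamma`, `learning_rate`, `max_depth`, `n_estimators`, `subsample`) is an assumption based on the usual XGBoost naming, and the names `COLUMN_TO_XGB_PARAM`, `TABLE_ROWS`, and `xgb_kwargs` are hypothetical helpers, not part of the authors' code. Only two rows of the table are copied in.

```python
# Assumed mapping from the table's column headings to XGBoost keyword names.
COLUMN_TO_XGB_PARAM = {
    "Colsample (Tree)": "colsample_bytree",
    "Gamma": "gamma",
    "Learning Rate": "learning_rate",
    "Max Depth": "max_depth",
    "N Estimators": "n_estimators",
    "Sub-Sample": "subsample",
}

# Two rows copied from the Appendix C.1 table, in column order.
TABLE_ROWS = {
    "Tags": (0.5, 0.9, 0.05, 3, 58, 0.7),
    "Tags, EHR, Gradients, High-Readings": (0.7, 0.5, 0.05, 3, 47, 0.9),
}


def xgb_kwargs(feature_set):
    """Return the tabulated values for one feature set as a keyword dict."""
    return dict(zip(COLUMN_TO_XGB_PARAM.values(), TABLE_ROWS[feature_set]))


params = xgb_kwargs("Tags")
print(params)
```

With the `xgboost` package installed, a dictionary like this could be passed as `xgboost.XGBRegressor(**params)`; whether the authors configured their models this way is not stated here.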
Appendix C.2. SHAP Analysis Results
Feature Sets Used | p-Value |
---|---|
Tags, Gradients | <0.0001 |
Tags, High-Readings | 0.309 |
Tags, Gradients, High-Readings | 0.308 |
Tags, EHR | <0.0001 |
Tags, EHR, Gradients | <0.0001 |
Tags, EHR, High-Readings | 0.232 |
Tags, EHR, Gradients, High-Readings | 0.124 |
Appendix D. Additional Analyses
Appendix D.1. Final Hyperparameter Values Used for Models
Feature Set | Colsample (Tree) | Gamma | Learning Rate | Max Depth | N Estimators | Sub-Sample |
---|---|---|---|---|---|---|
Tags, Gradients, High-readings, EHR (including BMI) | 0.5 | 0.9 | 0.05 | 3 | 51 | 0.9 |
Tags | 0.5 | 0.7 | 0.05 | 3 | 59 | 0.9 |
Appendix D.2. Model Development on a Training Set including BMI as a Predictor Variable
Feature Sets Used | Model | MSE | R² | MAE | Rank Accuracy (Lower) | Rank Accuracy (Middle) | Rank Accuracy (Upper) |
---|---|---|---|---|---|---|---|
Tags, Gradients, High-readings, EHR (including BMI) | MLR | 0.023 (0.020–0.113) | 0.404 (0.000–0.506) | 0.116 (0.109–0.129) | 0.624 (0.588–0.660) | 0.433 (0.396–0.466) | 0.642 (0.605–0.679) |
Tags, Gradients, High-readings, EHR (including BMI) | Random Forest Regression | 0.022 (0.020–0.024) | 0.475 (0.406–0.527) | 0.114 (0.109–0.119) | 0.615 (0.576–0.655) | 0.426 (0.393–0.459) | 0.644 (0.604–0.679) |
Tags, Gradients, High-readings, EHR (including BMI) | XGBoost Regression | 0.022 (0.020–0.025) | 0.470 (0.410–0.513) | 0.113 (0.107–0.119) | 0.612 (0.575–0.654) | 0.421 (0.385–0.455) | 0.643 (0.607–0.679) |
Appendix D.3. Model Development on a Training Set with a Fixed Blood Glucose Range
Feature Sets Used | Model | MSE | R² | MAE | Rank Accuracy (Lower) | Rank Accuracy (Middle) | Rank Accuracy (Upper) |
---|---|---|---|---|---|---|---|
Tags, High-readings, Intervention, Gradients | MLR | 0.022 (0.020–0.024) | 0.445 (0.397–0.481) | 0.114 (0.111–0.118) | 0.613 (0.587–0.639) | 0.418 (0.393–0.444) | 0.651 (0.621–0.678) |
Tags, High-readings, Intervention, Gradients | Random Forest Regression | 0.022 (0.020–0.024) | 0.445 (0.399–0.479) | 0.116 (0.113–0.120) | 0.596 (0.567–0.624) | 0.403 (0.379–0.429) | 0.640 (0.610–0.668) |
Tags, High-readings, Intervention, Gradients | XGBoost Regression | 0.020 (0.019–0.022) | 0.482 (0.437–0.514) | 0.111 (0.108–0.115) | 0.605 (0.580–0.632) | 0.412 (0.388–0.439) | 0.651 (0.625–0.678) |
References
- IDF Atlas, The International Diabetes Federation Atlas Tenth Edition 2021. Available online: https://idf.org/aboutdiabetes/what-is-diabetes/facts-figures.html (accessed on 21 April 2022).
- American Diabetes Association. Report of the expert committee on the diagnosis and classification of diabetes mellitus. Diabetes Care 2003, 26 (Suppl. 1), S5–S20.
- Oskovi-Kaplan, Z.A.; Ozgu-Erdinc, A.S. Management of gestational diabetes mellitus. In Diabetes: From Research to Clinical Practice; Springer: Cham, Switzerland, 2020; pp. 257–272.
- Martis, R.; Crowther, C.A.; Shepherd, E.; Alsweiler, J.; Downie, M.R.; Brown, J. Treatments for women with gestational diabetes mellitus: An overview of Cochrane systematic reviews. Cochrane Database Syst. Rev. 2018, 8, CD012327.
- Turok, D.K.; Ratcliffe, S.; Baxley, E.G. Management of gestational diabetes mellitus. Am. Fam. Physician 2003, 68, 1767–1772.
- Mackillop, L.; Loerup, L.; Bartlett, K.; Farmer, A.; Gibson, O.J.; Hirst, J.E.; Tarassenko, L. Development of a real-time smartphone solution for the management of women with or at high risk of gestational diabetes. J. Diabetes Sci. Technol. 2014, 8, 1105–1114.
- National Institute for Health and Care Excellence. Diabetes in Pregnancy: Management of Diabetes and Its Complications from Preconception to the Postnatal Period. NG3. 15 February 2015. Available online: https://www.nice.org.uk/guidance/ng3 (accessed on 15 January 2022).
- Li, Y.; Ren, X.; He, L.; Li, J.; Zhang, S.; Chen, W. Maternal age and the risk of gestational diabetes mellitus: A systematic review and meta-analysis of over 120 million participants. Diabetes Res. Clin. Pract. 2020, 162, 108044.
- Bochkur Dratver, M.A.; Arenas, J.; Thaweethai, T.; Yu, C.; James, K.; Rosenberg, E.A.; Powe, C.E. Longitudinal changes in glucose during pregnancy in women with gestational diabetes risk factors. Diabetologia 2021, 65, 541–551.
- Finneran, M.M.; Landon, M.B. Oral agents for the treatment of gestational diabetes. Curr. Diabetes Rep. 2018, 18, 119.
Feature Set | Features | Definition |
---|---|---|
Tags | Pre-breakfast reading, Post-breakfast reading, Post-lunch reading, Post-dinner reading | Tags correspond to mean blood glucose measurements for a given time-point (tag), over the three-day observation period |
Gradients | Pre-breakfast gradient, Post-breakfast gradient, Post-lunch gradient, Post-dinner gradient | Gradients correspond to the rate of change in blood glucose for a given time-point (tag), over the three-day observation period |
High-readings | Percentage of high readings | High-readings is the percentage of high readings among all blood glucose measurements within the three-day observation period. The same quantity, calculated for the subsequent three days (the three days following the observation window), is the output that the models predict. |
EHR | Maternal age, Gestational day, Medication | Maternal age is the woman’s age when her pregnancy is confirmed. Gestational day is the mean gestational day (day of pregnancy) over each three-day window. Medication is a binary feature that flags any woman taking Metformin and insulin during her pregnancy. |
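The feature definitions in the table above can be illustrated with a small sketch that turns one three-day window of tagged readings into tags, gradients, and the high-readings proportion. Several details are assumptions made for illustration, since they are not specified here: the gradient is taken as the least-squares slope of glucose against day index, a reading counts as "high" when it exceeds a per-tag threshold, and the threshold values, tag names, and the functions `slope` and `window_features` are hypothetical.

```python
from collections import defaultdict

TAGS = ("pre_breakfast", "post_breakfast", "post_lunch", "post_dinner")


def slope(xs, ys):
    """Least-squares slope of ys against xs; 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def window_features(readings, thresholds):
    """Summarise one three-day window.

    readings: list of (day_index, tag, glucose) tuples.
    thresholds: tag -> value above which a reading counts as high (assumed).
    """
    by_tag = defaultdict(list)
    for day, tag, value in readings:
        by_tag[tag].append((day, value))

    features = {}
    for tag in TAGS:
        points = by_tag.get(tag, [])
        values = [v for _, v in points]
        # Tag feature: mean glucose for this time-point over the window.
        features[f"{tag}_tag"] = sum(values) / len(values) if values else None
        # Gradient feature: rate of change across the window (assumed slope).
        features[f"{tag}_gradient"] = slope([d for d, _ in points], values)

    n_high = sum(1 for _, tag, value in readings if value > thresholds[tag])
    features["high_readings_proportion"] = (
        n_high / len(readings) if readings else None
    )
    return features


# Made-up example window; thresholds are placeholders, not the study's values.
thresholds = {"pre_breakfast": 5.3, "post_breakfast": 7.8,
              "post_lunch": 7.8, "post_dinner": 7.8}
readings = [
    (0, "pre_breakfast", 5.0), (1, "pre_breakfast", 5.2), (2, "pre_breakfast", 5.4),
    (0, "post_breakfast", 8.0), (1, "post_breakfast", 7.0), (2, "post_breakfast", 6.0),
    (0, "post_lunch", 6.5), (2, "post_dinner", 7.9),
]
features = window_features(readings, thresholds)
print(features)
```

Applying the same function to the following three-day window and reading off `high_readings_proportion` would give the prediction target described in the High-readings row.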
Characteristic | OUH Cohort | RBH Cohort |
---|---|---|
Age (years) | 33.6 (5.2) | 33.6 (4.9) * |
BMI | 31.0 (6.8) * | 28.6 (6.9) * |
Ethnicity | Not Stated: 506; White: 231; South Asian: 44; Black: 27; Other: 22; Chinese: 8; Mixed: 2 | Not Stated: 43; White: 63; South Asian: 66; Black: 7; Other: 7 |
Model | MSE | R² | MAE | Rank Accuracy (Lower) | Rank Accuracy (Middle) | Rank Accuracy (Upper) |
---|---|---|---|---|---|---|
MLR | 0.035 (0.031–0.149) | 0.155 (0.000–0.179) | 0.142 (0.127–0.150) | 0.570 (0.516–0.628) | 0.403 (0.372–0.433) | 0.601 (0.537–0.665) |
Random Forest Regression | 0.022 (0.021–0.024) | 0.447 (0.400–0.482) | 0.117 (0.114–0.121) | 0.598 (0.569–0.625) | 0.404 (0.378–0.430) | 0.639 (0.610–0.665) |
XGBoost Regression | 0.021 (0.019–0.023) | 0.482 (0.442–0.516) | 0.112 (0.109–0.116) | 0.609 (0.582–0.633) | 0.413 (0.387–0.438) | 0.650 (0.624–0.675) |
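For readers who want to reproduce the error columns of a table like the one above, the three regression metrics can be computed directly from true and predicted high-readings proportions. The sketch below uses the standard definitions of MSE, MAE, and R². The interval function is an assumption: it uses a percentile bootstrap, which may differ from how the bracketed ranges in the table were obtained, and the example data are made up. Rank accuracy is left out because its definition is not given here.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def bootstrap_interval(metric, y_true, y_pred,
                       n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample prediction/target pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx],
                             [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up proportions for illustration only.
y_true = [0.0, 0.25, 0.5, 0.75]
y_pred = [0.1, 0.25, 0.4, 0.75]
mse_value = mse(y_true, y_pred)
mae_value = mae(y_true, y_pred)
r2_value = r2(y_true, y_pred)
mae_low, mae_high = bootstrap_interval(mae, y_true, y_pred)
print(mse_value, mae_value, r2_value, (mae_low, mae_high))
```

The bootstrap is shown with `mae` because `r2` is undefined for a resample in which every true value is identical, which can happen with very small samples like this one.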
Feature Sets Used | MSE | R² | MAE | Rank Accuracy (Lower) | Rank Accuracy (Middle) | Rank Accuracy (Upper) |
---|---|---|---|---|---|---|
Tags | 0.021 (0.020–0.021) | 0.507 (0.494–0.519) | 0.110 (0.109–0.111) | 0.624 (0.609–0.639) | 0.413 (0.391–0.430) | 0.627 (0.612–0.639) |
Tags, High-Readings | 0.020 (0.020–0.021) | 0.519 (0.505–0.530) | 0.108 (0.107–0.110) | 0.639 (0.624–0.654) | 0.420 (0.398–0.440) | 0.624 (0.607–0.639) |
Feature Sets Used | MSE | R² | MAE | Rank Accuracy (Lower) | Rank Accuracy (Middle) | Rank Accuracy (Upper) |
---|---|---|---|---|---|---|
Tags | 0.021 (0.019–0.023) | 0.482 (0.442–0.516) | 0.112 (0.109–0.116) | 0.609 (0.582–0.633) | 0.413 (0.387–0.438) | 0.650 (0.624–0.675) |
Tags, Gradients | 0.021 (0.019–0.023) | 0.480 (0.442–0.517) | 0.112 (0.108–0.116) | 0.608 (0.580–0.635) | 0.412 (0.385–0.437) | 0.650 (0.626–0.675) |
Tags, High-Readings | 0.021 (0.019–0.022) | 0.488 (0.446–0.521) | 0.112 (0.108–0.116) | 0.616 (0.589–0.643) | 0.418 (0.394–0.444) | 0.655 (0.628–0.681) |
Tags, Gradients, High-Readings | 0.021 (0.019–0.022) | 0.484 (0.444–0.515) | 0.112 (0.108–0.115) | 0.611 (0.586–0.638) | 0.416 (0.391–0.441) | 0.654 (0.628–0.678) |
Tags, EHR | 0.021 (0.019–0.023) | 0.480 (0.440–0.510) | 0.112 (0.108–0.116) | 0.611 (0.583–0.635) | 0.412 (0.387–0.438) | 0.650 (0.622–0.676) |
Tags, EHR, Gradients | 0.021 (0.019–0.023) | 0.480 (0.437–0.514) | 0.112 (0.108–0.116) | 0.610 (0.581–0.635) | 0.414 (0.387–0.440) | 0.650 (0.623–0.676) |
Tags, EHR, High-Readings | 0.021 (0.019–0.022) | 0.486 (0.448–0.518) | 0.112 (0.108–0.116) | 0.615 (0.589–0.641) | 0.418 (0.392–0.442) | 0.655 (0.628–0.681) |
Tags, EHR, Gradients, High-Readings | 0.021 (0.019–0.023) | 0.484 (0.442–0.514) | 0.112 (0.108–0.116) | 0.613 (0.586–0.643) | 0.417 (0.392–0.444) | 0.654 (0.629–0.680) |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, J.; Clifton, D.; Hirst, J.E.; Kavvoura, F.K.; Farah, G.; Mackillop, L.; Lu, H. Machine Learning-Based Risk Stratification for Gestational Diabetes Management. Sensors 2022, 22, 4805. https://doi.org/10.3390/s22134805