Identifying Risk Factors for Premature Birth in the UK Millennium Cohort Using a Random Forest Decision-Tree Approach
Abstract
1. Introduction
2. Materials and Methods
2.1. Population and Sample
2.2. Dependent Variable
2.3. Independent Variables (Features)
2.4. Data Analysis
3. Results
4. Discussion
4.1. Summary of Main Findings
4.2. Algorithm Performance
4.3. Support for the Study Hypotheses
4.4. Similarity and Differences to Past Research
4.5. Study Limitations
5. Conclusions
Supplementary Materials
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Variable | Obs | Yes | No |
---|---|---|---|
Premature birth, before 37 weeks | 18,201 | 1361 | 16,840 |
Very premature birth, before 32 weeks | 18,201 | 194 | 18,007 |
Pregnancy illness | | Yes | No |
Dorsopathies | 18,201 | 382 | 17,819 |
Sciatica | 18,201 | 225 | 17,796 |
Non-trivial infections | 18,201 | 303 | 17,898 |
Anaemia | 18,201 | 362 | 17,839 |
UTI | 18,201 | 509 | 17,692 |
Eclampsia | 18,201 | 994 | 17,207 |
Hyperemesis | 18,201 | 797 | 17,404 |
Bleeding | 18,201 | 1115 | 17,086 |
Any illness reported in pregnancy | 18,196 | 6871 | 11,325 |
Reported longstanding illnesses occurring in more than 0.2% of the sample | | Yes | No |
Endometriosis | 18,201 | 59 | 18,142 |
Arthritis | 18,201 | 72 | 18,129 |
Psoriasis | 18,201 | 41 | 18,160 |
Dermatitis | 18,201 | 71 | 18,130 |
Irritable bowel syndrome | 18,201 | 64 | 18,137 |
Asthma | 18,201 | 805 | 17,396 |
Hypertension | 18,201 | 82 | 18,119 |
Hearing loss | 18,201 | 51 | 18,150 |
Migraine | 18,201 | 56 | 18,145 |
Epilepsy | 18,201 | 91 | 18,110 |
Clinical depression | 18,201 | 282 | 17,919 |
Karyotype 47 (XXX) | 18,201 | 43 | 18,158 |
Diabetes mellitus | 18,201 | 93 | 18,108 |
Thyroid problems | 18,201 | 171 | 18,030 |
Anaemia | 18,201 | 44 | 18,157 |
Mother in paid work while pregnant | 18,183 | 11,364 | 6819 |
Partner in paid work at start of pregnancy | 12,963 | 11,847 | 1116 |
Pregnancy result of fertility treatment | 18,194 | 476 | 17,718 |
Mother reports getting depressed | 18,196 | 4468 | 13,728 |
Mother reports partner gets in a violent rage | 12,584 | 405 | 12,179 |
Mother ever was a smoker | 11,298 | 1767 | 9531 |
Partner has depression | 13,022 | 1208 | 11,814 |
Partner has diabetes | 13,020 | 160 | 12,860 |
Partner has longstanding illness | 13,030 | 2647 | 10,383 |
Home is damp | 18,163 | 2484 | 15,679 |
Grandparents live in household | 18,201 | 1414 | 16,787 |
Father not in household | 18,175 | 3102 | 15,073 |
Infant sex (Yes = male, No = female) | 18,201 | 9337 | 8864 |
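
The binary rows above lend themselves to a quick arithmetic check. The sketch below is not part of the original analysis: it copies a handful of rows from the table, confirms that Yes + No equals Obs, and computes the share of "Yes" responses. The tuple layout and the function names are illustrative assumptions.

```python
# Illustrative check on rows copied from the table above (not the author's code).
# Each tuple is (variable, obs, yes, no); the layout and names are assumptions.
ROWS = [
    ("Premature birth, before 37 weeks", 18201, 1361, 16840),
    ("Very premature birth, before 32 weeks", 18201, 194, 18007),
    ("Dorsopathies", 18201, 382, 17819),
    ("Sciatica", 18201, 225, 17796),
    ("Eclampsia", 18201, 994, 17207),
    ("Bleeding", 18201, 1115, 17086),
]


def inconsistent_rows(rows):
    """Return (variable, obs - (yes + no)) for rows whose counts do not add up."""
    return [
        (name, obs - (yes + no))
        for name, obs, yes, no in rows
        if yes + no != obs
    ]


def prevalence(rows):
    """Map each variable to the proportion of 'Yes' responses among observations."""
    return {name: yes / obs for name, obs, yes, _no in rows}


if __name__ == "__main__":
    for name, share in prevalence(ROWS).items():
        print(f"{name}: {share:.2%}")
    for name, gap in inconsistent_rows(ROWS):
        print(f"Check against source: {name} counts differ from Obs by {gap}")
```

On these rows the only mismatch is Sciatica, where Yes + No sums to 18,021 rather than 18,201. Which of the printed figures is the transcription error cannot be determined from the table alone.
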
Variable Name | Obs | Mean | SD | Min | Max |
---|---|---|---|---|---|
OECD equivalised income | 18,024 | 289.7 | 196.3 | 13.2 | 1282.8 |
Father’s age | 18,165 | 31.9 | 5.7 | 15 | 68 |
Number of children in household | 18,201 | 0.93 | 1.08 | 0 | 9 |
Age mother left full-time education (yrs) | 18,121 | 17.6 | 2.8 | 7 | 36 |
Mother’s ethnic group (8 categories: White, Mixed, Indian, Pakistani, Bangladeshi, Caribbean, African, Other) | 18,172 | 1.6 | 1.6 | 1 | 8 |
Birth interval from last child (months) | 8870 | 42.8 | 27.9 | 9 | 318 |
Mother’s age | 18,199 | 20.1 | 5.9 | 13 | 51 |
Age father left full-time education | 13,001 | 17.6 | 2.9 | 0 | 35 |
Father’s qualification MCS code (1 = highest) | 13,012 | 24.8 | 38.5 | 1 | 96 |
Father’s life satisfaction (10 = highest) | 12,578 | 7.8 | 1.7 | 1 | 10 |
Father feels he can run own life (1 = agree) | 12,579 | 1.3 | 0.6 | 1 | 3 |
Father feels he has control over life (1 = agree) | 12,579 | 1.3 | 0.7 | 1 | 3 |
Father reports mother has used force (1 = Yes, 2 = no, 3 = refusal) | 12,290 | 1.9 | 0.3 | 1 | 3 |
Partner happy with relationship (1 = lowest) | 12,278 | 5.7 | 1.4 | 1 | 7 |
Partner suspects on brink of separation (1 = Yes) | 12,289 | 4.6 | 0.7 | 1 | 6 |
Partner’s cigarettes per day before pregnancy (descriptives shown for smokers only) | 5330 | 13 | 9.4 | 0 | 70 |
Partner’s self-rated general health (1 = healthy) | 13,032 | 1.9 | 0.7 | 1 | 4 |
Neighbourhood vandalism (1 = least) | 18,137 | 3.1 | 0.9 | 1 | 4 |
Neighbourhood pollution, grime (1 = least) | 17,997 | 3.1 | 0.9 | 1 | 4 |
Mother’s satisfaction with area (1 = satisfied) | 18,165 | 1.9 | 1.1 | 1 | 5 |
Housing (1 = house, through to 4 = sharing; 5 = not codable) | 18,179 | 1.4 | 4.1 | 1 | 5 |
Mother suspects on brink of separation (1 = Yes) | 14,241 | 4.7 | 0.7 | 1 | 6 |
Mother happy with relationship (1 = lowest) | 14,234 | 5.7 | 1.4 | 1 | 7 |
Mother reports father has used force (1 = Yes, 2 = no, 3 = refusal) | 14,240 | 2.0 | 0.2 | 1 | 3 |
Mother’s units of alcohol per day before pregnancy (descriptives shown for drinkers only) | 3675 | 1.7 | 1.4 | 0 | 22 |
Mother’s cigarettes per day before pregnancy (descriptives shown for smokers only) | 6877 | 11.8 | 8.1 | 0 | 80 |
Singleton birth = 1, twins = 2, triplets = 3 | 18,201 | 1.0 | 0.1 | 1 | 3 |
Mother’s maths ability: change in shops (1 = able) | 18,172 | 1.1 | 0.3 | 1 | 3 |
Mother’s literacy: filling in forms (1 = able) | 18,172 | 1.1 | 0.4 | 1 | 3 |
Mother’s SES by occupation (SOC2000) | 18,201 | 4856.8 | 2884.6 | 0 | 9259 |
Mother’s life satisfaction (10 = highest) | 17,596 | 7.7 | 1.8 | 1 | 10 |
Mother feels she can run own life (1 = agree) | 17,607 | 1.2 | 0.6 | 1 | 3 |
Mother feels she has control over life (1 = agree) | 17,607 | 1.4 | 0.7 | 1 | 3 |
Mother feels she gets what she wants (1 = agree) | 17,609 | 2.0 | 0.5 | 1 | 3 |
Algorithm | Out-of-Bag Error | Hyperparameter Tuning: No. of Iterations/No. of Variables at Each Split | Sensitivity (No. Correctly Classified Premature/No. Premature) | Specificity (No. Correctly Classified Not Premature/No. Not Premature) |
---|---|---|---|---|
Delivery before 32 weeks, 72 features | 0.0107 | 20/7 | 68% (131/194) | 100% (18,007/18,007) |
Delivery before 32 weeks, algorithm reduced to 6 features | 0.0109 | 25/6 | 93% (180/194) | 100% (18,007/18,007) |
Delivery before 37 weeks, 72 features | 0.0752 | 25/12 | 70% (957/1361) | 100% (16,840/16,840) |
Delivery before 37 weeks, algorithm reduced to 9 features | 0.0745 | 30/3 | 60% (821/1361) | 100% (16,840/16,840) |
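
The sensitivity and specificity columns above follow directly from the counts in parentheses. The sketch below recomputes them as a worked example. It is not the author's code, and the `Counts` type, the dictionary keys, and the function names are illustrative assumptions. As a reference point only, it also computes the share of premature births in the sample, which is the error rate of a rule that always predicts "not premature".

```python
# Illustrative recomputation of the rates in the table above (not the author's code).
# Counts are copied from the table; all names here are assumptions.
from typing import NamedTuple


class Counts(NamedTuple):
    true_pos: int   # premature births correctly classified
    positives: int  # all premature births
    true_neg: int   # non-premature births correctly classified
    negatives: int  # all non-premature births


MODELS = {
    "<32 weeks, 72 features": Counts(131, 194, 18007, 18007),
    "<32 weeks, 6 features": Counts(180, 194, 18007, 18007),
    "<37 weeks, 72 features": Counts(957, 1361, 16840, 16840),
    "<37 weeks, 9 features": Counts(821, 1361, 16840, 16840),
}


def sensitivity(c: Counts) -> float:
    """Proportion of premature births classified as premature."""
    return c.true_pos / c.positives


def specificity(c: Counts) -> float:
    """Proportion of non-premature births classified as not premature."""
    return c.true_neg / c.negatives


def majority_class_error(c: Counts) -> float:
    """Error of always predicting 'not premature' (equals the outcome prevalence)."""
    return c.positives / (c.positives + c.negatives)


if __name__ == "__main__":
    for label, c in MODELS.items():
        print(
            f"{label}: sensitivity {sensitivity(c):.0%}, "
            f"specificity {specificity(c):.0%}, "
            f"majority-class error {majority_class_error(c):.4f}"
        )
```

The rounded sensitivities reproduce the percentages printed in the table. The majority-class error is included only so that the out-of-bag error column can be read against the prevalence of each outcome; the source table does not report it.
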
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Waynforth, D. Identifying Risk Factors for Premature Birth in the UK Millennium Cohort Using a Random Forest Decision-Tree Approach. Reprod. Med. 2022, 3, 320-333. https://doi.org/10.3390/reprodmed3040025