Article

Predicting Future Birth Rates with the Use of an Adaptive Machine Learning Algorithm: A Forecasting Experiment for Scotland

by Maria Tzitiridou-Chatzopoulou 1,*, Georgia Zournatzidou 2 and Michael Kourakos 3
1 School of Healthcare Sciences, Midwifery Department, University of Western Macedonia, 50100 Kozani, Greece
2 Department of Business Administration, University of Western Macedonia, 50100 Kozani, Greece
3 School of Healthcare Sciences, Department of Nursing, University of Ioannina, 45500 Ioannina, Greece
* Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2024, 21(7), 841; https://doi.org/10.3390/ijerph21070841
Submission received: 1 May 2024 / Revised: 12 June 2024 / Accepted: 17 June 2024 / Published: 27 June 2024

Abstract

The total fertility rate is influenced over an extended period of time by shifts in population socioeconomic characteristics and attitudes and values. However, it may be impacted by macroeconomic trends in the short term, although these effects are likely to be minimal when fertility is low. With the objective of forecasting monthly deliveries, this study concentrates on the analysis of registered births in Scotland. Through this approach, we examine the significance of precisely forecasting fertility trends, which can subsequently aid in the anticipation of demand in diverse sectors by allowing policymakers to anticipate changes in population dynamics and customize policies to tackle emerging demographic challenges. Consequently, this has implications for fiscal stability, national economic accounts and the environment. In conducting our analysis, we incorporated non-linear machine learning methods alongside traditional statistical approaches to forecast monthly births in an out-of-sample exercise that occurs one step in advance. The outcomes underscore the efficacy of machine learning in generating precise predictions within this particular domain. In sum, this research will comprehensively demonstrate a cutting-edge model of machine learning that utilizes several attributes to assist in clinical decision-making, predict potential complications during pregnancy and choose the appropriate delivery method, as well as help in medical diagnosis and treatment.

1. Introduction

Human fertility is a multi-faceted and constantly evolving phenomenon shaped by various biological, societal, and economic variables. Factors such as shifts in cultural attitudes towards female education and employment, the accessibility and affordability of childcare services, and broader economic indicators, like wage patterns, disposable income, and employment rates, all play a role (see [1] for a more detailed discussion). This intricate interplay of factors renders the prediction of future birth rates a challenging endeavor [2,3]. Forecasting birth and fertility rates plays a crucial role in policy formulation and long-term planning initiatives. Specifically, accurate predictions of fertility trends help anticipate demand across various sectors, as they enable policymakers to anticipate shifts in population dynamics and tailor their policies to address emerging demographic challenges, thereby impacting national economic accounts and fiscal stability [4].
The existing literature highlights that period fertility rates can vary in response to specific economic or political conditions during certain years [5,6]. Hence, it is crucial to quantify the uncertainty in fertility forecasts to facilitate efficient risk management, empowering policymakers to make well-informed decisions in the face of uncertain future circumstances. Additionally, conducting quantitative evaluations of fertility forecasting techniques yields invaluable insights into their practicality and efficacy. Such assessments provide a thorough comprehension of the strengths and limitations inherent in various forecasting approaches.
In this paper, we consider a time series forecasting strategy for predicting fertility rates in a univariate forecasting exercise. This allows us to evaluate the efficacy of conventional econometric approaches, as well as non-linear machine learning methods, to predict births in order to further enhance our understanding of their applicability and robustness in various demographic contexts. Previous studies in the literature have extensively explored methods for predicting fertility rates and birth rates across diverse populations and temporal contexts [7,8,9]. Methods for forecasting fertility rates include principal component and functional data models [10,11,12], approaches such as time series models and linear extrapolation [5,13], approaches that complete cohort fertility schedules [14,15,16], and Bayesian methods [17,18].
Congdon (1990) utilized a technique to predict fertility rates specifically for London boroughs [7]. Alkema et al. (2011) conducted an extensive evaluation of fertility forecasts, comparing various forecasting methodologies and underscoring the complexities posed by uncertainty and volatility in demographic projections [19]. In a more recent study, De Iaco and Maggio (2016) applied ARIMA methods to forecast the parameters of a gamma function tailored to the fertility trends observed in Italy [20]. Furthermore, they integrated a Markov field model to address correlations within the error structure of this model. Similarly, Mazzuco and Scarpa (2015) forecast the bimodal pattern of fertility by employing a Flexible Generalizable Skew-Normal Distribution [21]. Additionally, Lutz et al. (2014) and Beaujot (2015) explored the demographic drivers behind global fertility decline, emphasizing the significance of education, urbanization, and women’s empowerment [22]. Similarly, Barro and Lee (2015) examined the impact of educational attainment on fertility rates, revealing a negative correlation between education levels and fertility across nations [23]. Collectively, these studies enhance our comprehension of the factors shaping fertility patterns and birth rates, furnishing valuable insights for policymakers and researchers alike.
In the context of our analysis, namely univariate time series forecasting, the adoption of non-linear techniques has progressed at a relatively slow pace [24], particularly within specific domains. For instance, Majumder et al. (2023) focused on predicting Prakriti classes using data from 217 healthy individuals from genetically distinct cohorts in northern and western India, specifically examining three extreme Prakriti types [25]. To address inter-individual variability, eight machine learning (ML) classifiers were employed. The predictive abilities of these ML algorithms were subsequently evaluated to explore the use of artificial intelligence (AI) in enhancing the assessment of Prakriti in Ayurveda, aiming to improve the accuracy and consistency of these assessments and reduce subjective bias. As already mentioned, research in this field has heavily relied on traditional time series forecasting methods, neglecting the potential advantages offered by more sophisticated state-of-the-art machine learning regression approaches [26]. Traditional econometric approaches used in time series forecasting include Holt's linear trend method, which extends simple exponential smoothing to capture linear trends in the data. This method is particularly effective for time series data with a consistent trend but no seasonality. Recent studies have validated its utility in various fields [27,28]. Furthermore, Holt–Winters' seasonal method extends Holt's method by incorporating a seasonal component. Holt–Winters' method is particularly effective for data with both trend and seasonal components and is widely applied in various industries [29,30]. In this study, however, we consider Prophet and other machine learning methods.
Specifically, Prophet is designed to handle time series data with strong seasonal effects and potentially missing data, while Holt's and Holt–Winters' methods are more traditional approaches that can be highly effective depending on the data considered in the analysis. However, Prophet offers advantages in flexibility and ease of handling irregular time series data and incorporating external regressors, as in our case. This makes it particularly useful in scenarios where data patterns are complex. For this reason, we employ tree-based machine learning algorithms and boosting algorithms, as well as traditional econometric approaches, and evaluate their performance and effectiveness in forecasting births. We focus on births in the UK and specifically Scotland, as it has been noted that, since the late 1970s, Scotland has consistently exhibited notably lower fertility rates compared to England and Wales. This difference primarily stems from reduced rates of childbearing among women in their thirties and forties in Scotland relative to England. A related report notes that the substantial population challenges Scotland faces, such as an aging population, a decreasing birth rate, and the evolving repercussions of Brexit, underline the need for a comprehensive national strategy (www.gov.scot, accessed on 3 April 2024). For these reasons, we aim to accurately forecast the number of births in Scotland. To this end, we conduct an out-of-sample forecasting exercise, considering various settings for the forecasting horizons and the accuracy measures used to evaluate the performance of the corresponding regression approaches.
The rest of this paper is structured as follows. Section 2 details the methodology employed. Section 3 outlines the data utilized in the out-of-sample forecasting exercise. Section 4 presents the analysis results. Lastly, Section 5 concludes this paper.

2. Methodology

In this section, we present the methods used to approach the research question (Figure 1). Specifically, in the current study machine learning algorithms were considered in our forecasting experiment to predict births in Scotland. In the related literature, a wide range of approaches have been considered, mainly focusing on modelling fertility rather than forecasting [31,32]. More recently, machine learning approaches have been considered in various forecasting problems across disciplines, reporting significant enhancements in accuracy compared to current methodologies [33]. In our analysis, the machine learning methodologies considered involve tree-based algorithms, namely Random Forest, as well as boosting algorithms and specifically Extreme Gradient Boosting. Additionally, Linear Regression and a conventional econometric time series approach, the Autoregressive Integrated Moving Average (ARIMA), which has been extensively employed for similar purposes [34,35], are employed to compare their effectiveness in order to accurately predict births in a univariate out-of-sample forecasting exercise.

2.1. Machine Learning Models

Facebook Prophet

Prophet is an algorithm developed to forecast time series data, featuring additive components that capture trends and seasonal patterns, as well as holiday effects. Firstly, Prophet models the overall trend in the data using a piecewise linear (or, optionally, saturating logistic growth) trend. Next, it captures periodic fluctuations or seasonal patterns by utilizing Fourier series to model weekly, yearly, and/or any custom seasonalities in the data under analysis. Furthermore, Prophet accounts for holidays and other known events that may influence the time series, enabling users to specify custom holiday effects. Finally, the algorithm sums the abovementioned components to produce forecasts.
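To make the additive structure concrete, the following minimal Python sketch builds Fourier seasonal terms for a monthly series and adds them to a linear trend. It is an illustration only, not the Prophet library or the R pipeline used in this study; the function names, the coefficient values, and the default period of 12 months are assumptions introduced here.

```python
import math

def fourier_features(t, period, order):
    """Return [sin(2*pi*n*t/period), cos(2*pi*n*t/period)] for n = 1..order."""
    feats = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats

def additive_forecast(t, intercept, slope, seasonal_coefs, period=12.0):
    """Trend plus seasonality; holiday effects are omitted in this sketch."""
    order = len(seasonal_coefs) // 2
    trend = intercept + slope * t  # a single linear segment, for simplicity
    feats = fourier_features(t, period, order)
    seasonal = sum(c * f for c, f in zip(seasonal_coefs, feats))
    return trend + seasonal

# Hypothetical coefficients: log-births level of 8.0 with a small annual cycle.
value_at_month_3 = additive_forecast(3, 8.0, 0.0, [0.1, 0.0])
```

In practice the trend and seasonal coefficients are estimated from the data; here they are fixed by hand so that only the way the components combine is shown.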

2.2. Random Forest

Breiman (2001) introduced the Random Forest algorithm, which utilizes a group of decision trees $\{T_1, T_2, \ldots, T_N\}$ to produce results [36]. Decision trees are a machine learning method used for classification and prediction. In a decision tree, the dataset is divided into subsets based on input feature values to create predictable groups related to the target variable. Each decision tree within a Random Forest is constructed independently using a subset of the training data and selected features. The incorporation of randomness at both data and feature levels helps reduce correlations between trees, enhancing the ensemble's resilience and minimizing the risk of overfitting.
Assuming we have a dataset $D$ with $n$ samples and $m$ features, a decision tree $T$ is made up of a series of splits depicted as nodes. At each node, the process selects the feature $j$ and a threshold $t$ that divide the data into two groups so as to minimize the error in each subset. The decision on how to split can be guided by metrics like mean squared error (MSE) or mean absolute error (MAE) for regression tasks. This recursive process continues until certain conditions are met, like reaching a maximum tree depth or a minimum number of samples in each leaf node. The forest's final prediction combines the outputs of the individual trees; in regression tasks, this usually involves averaging the predictions from all trees.
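The split search and the averaging step described above can be sketched in a few lines of Python. This is a simplified, single-feature illustration under our own assumptions (exhaustive midpoint thresholds, sum of squared errors as the criterion); the helper names are hypothetical and do not correspond to the R implementation used in this study.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Scan midpoints between sorted feature values; return (threshold, total_sse)."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [yy for xx, yy in pairs if xx <= threshold]
        right = [yy for xx, yy in pairs if xx > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

def forest_predict(tree_predictions):
    """Regression forests average the predictions of the individual trees."""
    return sum(tree_predictions) / len(tree_predictions)

threshold, error = best_split([1, 2, 3, 10, 11, 12], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
```

A full tree would apply `best_split` recursively to each resulting group, over a random subset of features, until a stopping condition is met.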

2.3. Extra Trees

Extra Trees (Extremely Randomized Trees) Regression is an ensemble machine learning technique that builds a forest of randomized decision trees. In its standard formulation, each tree is grown on the full training sample rather than on a bootstrap resample, although common implementations allow sampling with replacement as an option. At each split point within the trees, a random selection of features is considered, and the cut-point for each candidate feature is drawn at random rather than optimized, further increasing the diversity of the trees. This randomness helps reduce overfitting and can improve the overall accuracy of the predictions. The final prediction is obtained by averaging the predictions from all the trees in the forest.
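The distinguishing step of Extra Trees, drawing the cut-point at random instead of searching for the best one, can be illustrated as follows. This is a minimal sketch under our own assumptions (uniform draw between the minimum and maximum of each candidate feature, squared-error scoring); `extra_trees_split` is a hypothetical helper, not a library function.

```python
import random

def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def extra_trees_split(rows, y, k, rng):
    """Pick k random features, draw one random threshold per feature,
    and keep the candidate with the lowest error.
    Returns (feature_index, threshold, score), or None if no feature varies."""
    n_features = len(rows[0])
    best = None
    for j in rng.sample(range(n_features), k):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant feature: no split possible
        threshold = rng.uniform(lo, hi)  # random cut-point, not optimized
        left = [yy for v, yy in zip(column, y) if v < threshold]
        right = [yy for v, yy in zip(column, y) if v >= threshold]
        score = sse(left) + sse(right)
        if best is None or score < best[2]:
            best = (j, threshold, score)
    return best

rows = [[0.0, 5.0], [1.0, 5.0], [10.0, 5.0], [11.0, 5.0]]
split = extra_trees_split(rows, [1.0, 1.0, 9.0, 9.0], k=2, rng=random.Random(0))
```

Because only a handful of random candidates are scored, individual trees are weaker than optimally split trees, but their errors are less correlated, which is what the averaging step exploits.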

2.4. Extreme Gradient Boosting (XGBoost)

Extreme gradient boosting (XGBoost) is an efficient and scalable algorithm for implementing gradient boosted decision trees. According to Chen and Guestrin (2016), XGBoost is defined as a tree boosting machine learning approach [37]. Its impact has been widely acknowledged across machine learning and data mining challenges, and it is employed in numerous machine learning applications. XGBoost uses $K$ additive functions $f_k(x)$ to approximate the target function, with the ensemble prediction $F_K(x)$ represented as follows:
$$F_K(x) = \sum_{k=1}^{K} f_k(x), \quad f_k \in \mathcal{F}$$
where $K$ is the number of trees, $f_k$ is a function in the family $\mathcal{F}$, and $\mathcal{F}$ is the set of all possible regression trees (CART). XGBoost utilizes a specific form of base learner: each $f_k(x)$ is a CART and can be denoted as $\omega_{q(x)}$, $q(x) \in \{1, 2, \ldots, T\}$, where $T$ represents the number of leaves in the tree, $q$ represents the decision rules of the tree that map a sample to a leaf, and $\omega$ is the vector of leaf weights. Therefore, the loss function of XGBoost is expanded to the objective function by adding a regularization term as follows:
$$L = \sum_{i=1}^{n} \Psi(y_i, F_K(x_i)) + \sum_{k=1}^{K} \Omega(f_k)$$
where $F_K(x_i)$ is the prediction for the $i$-th sample at the $K$-th boost and $\Omega(f) = \gamma T + 0.5 \times \lambda \lVert \omega \rVert^2$. In the regularization term, $\gamma$ and $\lambda$ are fixed coefficients, and $\lVert \omega \rVert^2$ is the squared L2 norm of the leaf weights; $\Omega(\cdot)$ penalizes the complexity of the model. The regularized objective function, which is inspired by the regularized greedy forest, tends to smooth the base learners' contributions to avoid overfitting. $\Psi(\cdot)$ is a specified loss function that measures the difference between the prediction and the observed target value. In XGBoost, to find the $F_K(x)$ that minimizes the objective, trees are added stagewise and the objective is approximated at each step by a second-order Taylor expansion, so that both first- and second-order gradient statistics of the loss are used [37].
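As a small worked illustration of how the regularization coefficients enter the fit, the sketch below computes the optimal leaf weight and the split gain from the gradient statistics, following the closed-form expressions in Chen and Guestrin [37], where both first-order ($g_i$) and second-order ($h_i$) derivatives of the loss appear. The squared-error loss $\Psi = 0.5\,(y - \hat{y})^2$ and all function names are assumptions made for this example; this is not the XGBoost library itself.

```python
def squared_error_grad_hess(y_true, y_pred):
    """For the loss 0.5 * (y - y_hat)^2: g = y_hat - y and h = 1."""
    grads = [p - t for t, p in zip(y_true, y_pred)]
    hess = [1.0] * len(y_true)
    return grads, hess

def leaf_weight(grads, hess, lam):
    """Optimal leaf weight: w* = -G / (H + lambda)."""
    return -sum(grads) / (sum(hess) + lam)

def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Gain = 0.5 * [GL^2/(HL+lam) + GR^2/(HR+lam) - G^2/(H+lam)] - gamma."""
    def score(g, h):
        return sum(g) ** 2 / (sum(h) + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma

g, h = squared_error_grad_hess([1.0, 3.0], [0.0, 0.0])
w_unregularized = leaf_weight(g, h, lam=0.0)  # mean residual
w_regularized = leaf_weight(g, h, lam=2.0)    # shrunk toward zero
```

With $\lambda = 0$ the leaf weight equals the mean residual; increasing $\lambda$ shrinks it toward zero, and a positive $\gamma$ rejects splits whose gain does not justify an extra leaf.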

2.5. Evaluation Metrics

Different measures have been utilized in the related literature to assess the performance of regression models [26]. For the purposes of our analysis, we rely on three evaluation criteria: the Root Mean Square Error (RMSE), the Mean Absolute Error (MAE) and the Symmetric Mean Absolute Percentage Error (SMAPE).
The RMSE metric is defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - Y_i)^2}$$
where $X_i$ stands for the predicted value and $Y_i$ stands for the actual value.
The MAE metric is defined as:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert X_i - Y_i \rvert$$
The SMAPE metric is defined as follows:
$$\mathrm{SMAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \frac{\lvert X_i - Y_i \rvert}{(X_i + Y_i)/2}$$
SMAPE is expressed as a percentage (Flores 1986) and can be used to measure the predictive performance of the regression models [24,38].
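For readers who wish to reproduce the three criteria, a minimal Python sketch is given below, with predictions $X_i$ and actual values $Y_i$ passed as lists. The function names are ours, and the SMAPE form assumes strictly positive values (as is the case for birth counts), so that the denominator never vanishes.

```python
import math

def rmse(pred, actual):
    """Root Mean Square Error."""
    n = len(actual)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pred, actual)) / n)

def mae(pred, actual):
    """Mean Absolute Error."""
    n = len(actual)
    return sum(abs(x - y) for x, y in zip(pred, actual)) / n

def smape(pred, actual):
    """Symmetric MAPE in percent; assumes positive predictions and actuals."""
    n = len(actual)
    return 100.0 / n * sum(abs(x - y) / ((x + y) / 2.0) for x, y in zip(pred, actual))

pred, actual = [3.0, 5.0], [1.0, 5.0]
scores = (rmse(pred, actual), mae(pred, actual), smape(pred, actual))
```

RMSE penalizes large errors more heavily than MAE, while SMAPE is scale-free, which is why the three are often reported together.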

3. Data

In this paper, the proposed machine learning forecasting models are applied to data for Great Britain, and specifically Scotland. We use official country-level data on monthly births, recorded by month of registration and sourced from www.nrscotland.gov.uk (accessed on 3 April 2024), covering the period from January 1998 to December 2022. The logarithmic transformation of the monthly birth variable is used throughout the analysis. Additionally, a nonparametric unit root test was applied to assess whether the variable is stationary. According to the results, the birth series variable can be used in its logarithmic form in the present analysis without further transformation.
Table 1 reports the descriptive statistics for the monthly births series. Specifically, in Table 1 we notice that the mean of the logarithm of monthly births in Scotland is 8.381 and the standard deviation is 0.429. The skewness is −10.526, while the kurtosis value equals 124.054. The skewness value indicates an asymmetric distribution of the birth series. The kurtosis value, being greater than 3, indicates a deviation from the normal distribution, with the series following a leptokurtic distribution. Based on the results of the Jarque–Bera test, we can conclude that the monthly birth series does not follow a normal distribution.
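The skewness, kurtosis, and Jarque–Bera statistic discussed above can be computed as in the following minimal Python sketch. It uses population (biased) moment estimators, which is an assumption on our part, since the exact estimators behind Table 1 are not stated; the toy input is illustrative and is not the Scottish birth series.

```python
def skewness_kurtosis(x):
    """Moment-based skewness and (non-excess) kurtosis."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def jarque_bera(x):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(x)
    s, k = skewness_kurtosis(x)
    return n / 6.0 * (s ** 2 + (k - 3.0) ** 2 / 4.0)

toy = [1.0, 2.0, 3.0, 4.0, 5.0]
s, k = skewness_kurtosis(toy)  # symmetric sample: skewness 0, kurtosis 1.7
jb = jarque_bera(toy)
```

Under the null hypothesis of normality, the statistic is asymptotically chi-squared with two degrees of freedom, so values above roughly 5.99 lead to rejection at the 5% level.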
Figure 2 and Figure 3 show Scotland’s monthly birth numbers and suggest a possible link to the COVID-19 pandemic. The pandemic might have worsened existing worries, especially financial, for young couples planning families. Money is an important factor in family planning, so a national plan to address Scotland’s falling birth rate is needed. This study helps us understand how uncertainty, including that from climate change, can affect birth rates in Scotland. This knowledge can be used to create better policies.
Considering all the above, the proposed forecasting exercise can enhance our understanding of demographic trends in this specific region.

4. Results

In this study, we aim to predict births on a monthly basis with a special focus on Scotland. The importance of our approach can be seen considering the dramatic decline in the birth rate and fertility rate. The results of our forecasting experiment can provide valuable insights and information for policy makers, healthcare providers, and others who are interested in understanding demographic trends and planning for the future.
We next present the results of the one-step-ahead out-of-sample forecasting performance of the proposed univariate machine learning regression methods to predict the monthly series of births in Scotland (Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8). We follow a rolling estimation window approach involving 24 observations [28]. Additionally, the dataset was split into a training set and a test set comprising 80% and 20% of the total observations, respectively. Hyperparameters for each of the machine learning regression methods, as well as the number of lags for the birth-related variable, were tuned based on cross-validation. Window sizes of 1, 3, 6, 9 and 12 months were tried, with 12 being the value chosen. For each of the machine learning models examined, different hyperparameter settings were tried, including the learning rates (0.00001, 0.00005, 0.0001, 0.0005, 0.001), as well as the number of estimators (50, 100, 500). The forecasting was performed in R (version 4.3.0) using the 'timetk' package (version 2.9.0). We utilized the 'tidymodels' package (version 1.1.1) [39], 'lubridate' package (version 1.9.2) [40] and 'modeltime' package (version 1.2.8) [41] in RStudio (version 2023.06.0+421).
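The rolling-window, one-step-ahead scheme can be sketched as follows. This Python illustration is not the R code used in this study: a first-order autoregression fitted by least squares stands in for the actual learners, and the function names are hypothetical. Only the window length of 24 observations is taken from the text.

```python
def fit_ar1(window):
    """Least-squares fit of y_t = a + b * y_{t-1} on one estimation window."""
    x, y = window[:-1], window[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0.0:
        return mean_y, 0.0  # constant window: fall back to the mean
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope

def rolling_one_step(series, window=24):
    """Refit on the most recent `window` points, forecast the next point,
    then slide the window forward by one observation."""
    forecasts = []
    for end in range(window, len(series)):
        train = series[end - window:end]
        intercept, slope = fit_ar1(train)
        forecasts.append(intercept + slope * train[-1])
    return forecasts

series = [float(i) for i in range(30)]  # toy upward-trending series
forecasts = rolling_one_step(series)    # aligned with series[24:]
```

Each forecast uses only observations that precede the target month, which is what makes the exercise out-of-sample; the forecasts can then be scored against `series[window:]` with the accuracy metrics of Section 2.5.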
We also use the Model Confidence Set (MCS) method introduced by [42] to identify the group of models that perform well. This technique allows us to compare models by eliminating those that demonstrate significantly poorer performance, under the null hypothesis of equal forecast accuracy, at a specified confidence level. By conducting these comparisons, we can draw conclusions about statistical significance. For an explanation of the MCS procedure, please refer to [42]. We apply this test to both non-linear methods analyzed in our study.
Based on the corresponding results presented in Table 2, Extreme Gradient Boosting, Random Forest and Prophet appear to be the best-performing models, with the Extreme Gradient Boosting algorithm showing slightly better performance based on the metrics values. Random Forest and Prophet perform reasonably well. The results based on Linear Regression present the poorest performance among all models, with higher error metrics values.

5. Concluding Remarks

In this paper, we predict births in Scotland in a one-step-ahead out-of-sample univariate forecasting exercise. Predicting birth rates holds significant importance across various fields due to its wide-ranging implications. Effectively and accurately predicting future births can affect demography and public health, as it can enable policymakers and healthcare professionals to anticipate population growth or decline, thereby informing decisions regarding resource allocation for healthcare services, education, and social welfare programs. Additionally, in economics and business, projections of birth and fertility rates provide critical insights into future consumer demographics, labor force dynamics, and market trends, influencing investment strategies, workforce planning, and product development. Moreover, in environmental science and sustainability, understanding population growth patterns is essential for assessing the impact on natural resources, biodiversity, and ecosystems, guiding efforts toward sustainable development. Overall, the ability to predict births facilitates informed decision-making and strategic planning across a spectrum of fields, contributing to the well-being and sustainability of societies and ecosystems.
Future research on this topic could focus on the examination of more sophisticated machine learning and deep learning algorithms that can better capture the dynamics of these specific data. Furthermore, additional predictors could be considered that relate to factors that affect birth rate and fertility rate to improve the out-of-sample forecasts of the machine learning approaches.

Author Contributions

Conceptualization, M.T.-C. and M.K.; methodology, G.Z.; software, G.Z.; validation, M.T.-C. and M.K.; formal analysis, G.Z.; investigation, G.Z.; resources, M.T.-C.; data curation, M.K.; writing—original draft preparation, M.T.-C., G.Z. and M.K.; writing—review and editing, M.T.-C., G.Z. and M.K.; visualization, G.Z.; supervision, M.K.; project administration, G.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available on request from the corresponding authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Balbo, N.; Billari, F.C.; Mills, M. Fertility in Advanced Societies: A Review of Research. Eur. J. Popul. 2013, 29, 1–38.
  2. Tzitiridou-Chatzopoulou, M.; Orovou, E.; Zournatzidou, G. Digital Training for Nurses and Midwives to Improve Treatment for Women with Postpartum Depression and Protect Neonates: A Dynamic Bibliometric Review Analysis. Healthcare 2024, 12, 1015.
  3. Russell, A.C.; Santucci, N.R.; Tzitiridou-Chatzopoulou, M.; Kountouras, J.; Zournatzidou, G. The Potential Impact of the Gut Microbiota on Neonatal Brain Development and Adverse Health Outcomes. Children 2024, 11, 552.
  4. Hilton, J.; Dodd, E.; Forster, J.J.; Smith, P.W.F.; Bijak, J. Forecasting Fertility with Parametric Mixture Models. arXiv 2019, arXiv:1909.09545, 1–26.
  5. De Beer, J. A time series model for cohort data. J. Am. Stat. Assoc. 1985, 80, 525–530.
  6. Thompson, P.A.; Bell, W.R.; Long, J.F.; Miller, R.B. Multivariate Time Series Projections of Parameterized Age-Specific Fertility Rates. J. Am. Stat. Assoc. 1989, 84, 689–699.
  7. Congdon, P. Graduation of Fertility Schedules: An Analysis of Fertility Patterns in London in the 1980s and an Application to Fertility Forecasts. Reg. Stud. 1990, 24, 311–326.
  8. Bell, W. Applying Time Series Models in Forcasting Age-Specific Fertility Rates. Statistical Research Division Report Series-US. 1988. Available online: https://www.census.gov/library/working-papers/1988/adrm/rr88-19.html (accessed on 8 June 2024).
  9. Hozik, J.E.; Bell, W.R. Forecasting Age-Specific Fertility Using Principal Components. American Statistical Association, Social Statistics Section, Statistica; (CENSUS/SRD/RR-87/19). 1987; pp. 1–14. Available online: https://www.census.gov/library/working-papers/1987/adrm/rr87-19.html (accessed on 8 June 2024).
  10. Lee, R.D. Modeling and forecasting the time series of US fertility: Age distribution, range, and ultimate level. Int. J. Forecast. 1993, 9, 187–202.
  11. Hyndman, R.J.; Shahid, U. Robust forecasting of mortality and fertility rates: A functional data approach. Comput. Stat. Data Anal. 2007, 51, 4942–4956.
  12. Wiśniowski, A.; Smith, P.W.F.; Bijak, J.; Raymer, J.; Forster, J.J. Bayesian Population Forecasting: Extending the Lee-Carter Method. Demography 2015, 52, 1035–1059.
  13. Myrskylä, M.; Goldstein, J.R.; Cheng, Y.-H.A. New Cohort Fertility Forecasts for the Developed World: Rises, Falls, and Reversals. Popul. Dev. Rev. 2013, 39, 31–56.
  14. Evans, M.D.R. American Fertility Patterns: A Comparison of White and Nonwhite Cohorts Born 1903-56. Popul. Dev. Rev. 1986, 12, 269–293.
  15. Li, N.; Zheng, W. Forecasting cohort incomplete fertility: A method and an application. Popul. Stud. 2003, 57, 303–320.
  16. Peristera, P.; Anastasia, K. Modeling fertility in modern populations. Demogr. Res. 2007, 16, 141–194.
  17. Ševčíková, H.; Nan, L.; Vladimíra, K.; Patrick, G.; Adrian, E.R. Age-Specific Mortality and Fertility Rates for Probabilistic Population Projections. Dyn. Demogr. Anal. 2015, 39, 285–310.
  18. Schmertmann, C.; Zagheni, E.; Goldstein, J.R.; Myrskylä, M. Bayesian Forecasting of Cohort Fertility. J. Am. Stat. Assoc. 2014, 109, 500–513.
  19. Alkema, L.; Raftery, A.E.; Gerland, P.; Clark, S.J.; Pelletier, F.; Buettner, T.; Heilig, G.K. Probabilistic projections of the total fertility rate for all countries. Demography 2011, 48, 815–839.
  20. De Iaco, S.; Sabrina, M. A dynamic model for age-specific fertility rates in Italy. Spat. Stat. 2016, 17, 105–120.
  21. Mazzuco, S.; Bruno, S. Fitting Age-Specific Fertility Rates By a Skew-Symmetric Probability Density Function. J. R. Stat. Soc. Ser. A 2015, 178, 187–203.
  22. Lutz, W.; Butz, W.P.; KC, S. World Population and Human Capital in the Twenty-First Century; Oxford University Press: Oxford, UK, 2014.
  23. Barro, R.; Lee, J.-W. Education Matters: Global Schooling Gains from the 19th to the 21st Century; Oxford University Press: Oxford, UK, 2015; Available online: https://econpapers.repec.org/RePEc:oxp:obooks:9780199379231 (accessed on 8 June 2024).
  24. Makridakis, S. Accuracy measures: Theoretical and practical concerns. Int. J. Forecast. 1993, 9, 527–529.
  25. Majumder, S.; Kutum, R.; Khatua, D.; Sekh, A.A.; Kar, S.; Mukerji, M.; Prasher, B. On intelligent Prakriti assessment in Ayurveda: A comparative study. J. Intell. Fuzzy Syst. 2023, 45, 9827–9844.
  26. Hyndman, R.J.; Koehler, A.B. Another look at measures of forecast accuracy. Int. J. Forecast. 2006, 22, 679–688.
  27. Gardner, E.S. Exponential smoothing: The state of the art—Part II. Int. J. Forecast. 2006, 22, 637–666.
  28. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; OTexts.Com/Fpp2; OTexts: Melbourne, Australia, 2018.
  29. De Livera, A.M.; Hyndman, R.J.; Snyder, R.D. Forecasting time series with complex seasonal patterns using exponential smoothing. J. Am. Stat. Assoc. 2011, 106, 1513–1527.
  30. Taylor, S.J.; Letham, B. Forecasting at scale. Am. Stat. 2018, 72, 37–45.
  31. Booth, H. Demographic Forecasting: 1980 to 2005 in review. Int. J. Forecast. 2006, 22, 547–581.
  32. Bohk-Ewald, C.; Li, P.; Myrskylä, M. Forecast accuracy hardly improves with method complexity when completing cohort fertility. Proc. Natl. Acad. Sci. USA 2018, 115, 9187–9192.
  33. Zournatzidou, G.; Mallidis, I.; Farazakis, D.; Floros, C. Enhancing Bitcoin Price Volatility Estimator Predictions: A Four-Step Methodological Approach Utilizing Elastic Net Regression. Mathematics 2024, 12, 1392.
  34. Miller, R.B.; Hickman, J.C. Time Series Analysis and Forecasting. Transactions of Society of Actuaries 1973. 1973. Available online: https://www.soa.org/4934e6/globalassets/assets/library/research/transactions-of-society-of-actuaries/1973/january/tsa73v25pt1n7314.pdf (accessed on 8 June 2024).
  35. Cantor, D.; Land, K.C. Unemployment and crime rates in the post-World War II United States: A theoretical and empirical analysis. Am. Sociol. Rev. 1985, 50, 317–332.
  36. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  37. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  38. Goodwin, P.; Lawton, R. On the asymmetry of the symmetric MAPE. Int. J. Forecast. 1999, 15, 405–408.
  39. Kuhn, M.; Wickham, H. Tidymodels: Easily Install and Load the ‘Tidymodels’ Packages. 2021. Available online: https://CRAN.R-project.org/package=tidymodels (accessed on 8 June 2024).
  40. Grolemund, G.; Wickham, H. lubridate: Make Dealing with Dates a Little Easier. 2021. Available online: https://CRAN.R-project.org/package=lubridate (accessed on 8 June 2024).
  41. Dancho, M. Modeltime: The Tidymodels Extension for Time Series Modeling (Version 1.2.8). 2023. Available online: https://CRAN.R-project.org/package=modeltime (accessed on 8 June 2024).
  42. Hansen, P.R.; Lunde, A.; Nason, J.M. The Model Confidence Set. Econometrica 2011, 79, 453–497.
Figure 1. Methodology workflow.
Figure 2. Monthly total number of births in Scotland, January 1998 to December 2022.
Figure 3. Logarithmic form of monthly births in Scotland for the period from January 1998 to December 2022.
Figure 4. One-step-ahead forecasts of the number of births in Scotland based on the Random Forest machine learning algorithm. The different lines highlight the trend and seasonality of births in each year.
Figure 5. One-step-ahead forecasts of the number of births in Scotland based on the XGBoost machine learning algorithm. The different lines highlight the trend and seasonality of births in each year.
Figure 6. One-step-ahead forecasts of the number of births in Scotland based on the Prophet method. The different lines highlight the trend and seasonality of births in each year.
Figure 7. One-step-ahead forecasts of the number of births in Scotland based on Linear Regression. The different lines highlight the trend and seasonality of births in each year.
Figure 8. One-step-ahead forecasts of the number of births in Scotland based on the traditional ARIMA time series approach. The different lines highlight the trend and seasonality of births in each year.
Table 1. Descriptive Statistics.
Statistic                  Value
Mean                       8.381
Median                     8.427
Maximum                    9.131
Minimum                    2.944
Standard Deviation         0.429
Skewness                   −10.526
Kurtosis                   124.054
Jarque–Bera                188,717 ***
Jarque–Bera probability    [0.000]
Notes: This table reports descriptive statistics for the logarithm of monthly births in Scotland over the full sample. The Jarque–Bera test evaluates the null hypothesis of normality; its probability is reported in brackets. *** indicates rejection of the null hypothesis of normality at the 1% significance level.
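As a rough illustration of how the Jarque–Bera statistic in Table 1 relates to the reported skewness and kurtosis, the sketch below applies the standard formula JB = n/6 · (S² + (K − 3)²/4). It rests on two assumptions that this section does not state explicitly: that the full sample has n = 300 monthly observations (January 1998 to December 2022, per the figure captions) and that the reported kurtosis is the raw, non-excess value. The function names are hypothetical.

```python
def sample_moments(values):
    """Return (n, skewness, kurtosis) using population (biased) central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # raw kurtosis; equals 3 for a normal distribution
    return n, skewness, kurtosis


def jarque_bera(n, skewness, kurtosis):
    """Jarque-Bera statistic from sample size, skewness, and raw kurtosis."""
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Assumption: 25 years x 12 months = 300 observations in the full sample.
n_obs = 300
jb = jarque_bera(n_obs, skewness=-10.526, kurtosis=124.054)
print(round(jb))  # about 188716, close to the 188,717 reported in Table 1
```

Under these assumptions the result differs from the tabulated value only by rounding in the inputs, which suggests the extreme statistic is driven almost entirely by the very large kurtosis.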
Table 2. Estimation results for the births in Scotland (one-step-ahead out-of-sample).
Model                        MAE     RMSE      SMAPE
ARIMA                        0.44    0.52      0.72
Prophet                      0.37    0.46 *    0.54
Random Forest                0.34    0.44 *    0.57
Extreme Gradient Boosting    0.32    0.41 *    0.54
Linear Regression            0.45    0.62      0.67
Notes: The table reports the out-of-sample metric values for one-step-ahead forecasts of monthly births in Scotland (h = 1 month). * indicates models that are included in the Model Confidence Set at the 1% significance level.
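For readers less familiar with the three accuracy measures in Table 2, the following minimal sketch shows one common way to compute them. The SMAPE variant used here, the mean of 2|y − ŷ| / (|y| + |ŷ|) expressed as a percentage, is an assumption: several definitions are in use, and this section does not say which one the authors applied or on what scale. The function names and the toy numbers are purely illustrative and are not taken from the study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def smape(actual, predicted):
    """Symmetric MAPE in percent (assumed variant; undefined if a pair is 0, 0)."""
    terms = (2.0 * abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted))
    return 100.0 * sum(terms) / len(actual)


# Toy values for illustration only; not data from the paper.
actual = [2.0, 4.0]
predicted = [1.0, 5.0]
print(mae(actual, predicted), rmse(actual, predicted), smape(actual, predicted))
```

Because RMSE squares the errors before averaging, it penalizes occasional large misses more heavily than MAE, which is one reason the two measures can rank models differently.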

Tzitiridou-Chatzopoulou, M.; Zournatzidou, G.; Kourakos, M. Predicting Future Birth Rates with the Use of an Adaptive Machine Learning Algorithm: A Forecasting Experiment for Scotland. Int. J. Environ. Res. Public Health 2024, 21, 841. https://doi.org/10.3390/ijerph21070841