Application of Time-Series Modeling in Forecasting the Doctorate-Level Science and Technology Workforce

Yoon, Ho-Yeol; Choe, Hochull

doi:10.3390/app14199135

by Ho-Yeol Yoon and Hochull Choe *

Strategic Technology Policy Center, Korea Research Institute of Chemical Technology (KRICT), Daejeon 34114, Republic of Korea

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(19), 9135; https://doi.org/10.3390/app14199135

Submission received: 30 August 2024 / Revised: 30 September 2024 / Accepted: 3 October 2024 / Published: 9 October 2024

(This article belongs to the Special Issue State-of-the-Art Dynamical Systems)

Abstract

The science and technology (S&T) workforce plays a crucial role in social development by promoting technological innovation and economic growth, as well as serving as a key indicator of research and development productivity and measure of innovation capability. Therefore, effective S&T workforce policies must be established to enhance national competitiveness. This study proposes a time-series forecasting methodology to predict the scale and structural trends of South Korea’s doctorate-level S&T workforce. Based on earlier research and case data, we applied both the traditional time-series model exponential smoothing and the latest model Prophet, developed by Meta, in this study. Further, public data from South Korea were used to apply the proposed models. To ensure robust model evaluation, we considered multiple metrics. With respect to both forecasting accuracy and sensitivity to data variability, Prophet was found to be the most suitable for predicting the S&T doctorate workforce’s scale. The scenarios derived from the Prophet model can help the government formulate policies based on scientific evidence in the future.

Keywords: doctorate-level workforce; time-series analysis; predictive modeling; exponential smoothing; Prophet model

1. Introduction

As participants of the innovation ecosystem, the science and technology (S&T) workforce plays a significant role in social development by enhancing technological innovation and economic growth [1,2,3,4,5,6]. The workforce is a key S&T indicator in research and development (R&D) productivity analysis and institutional innovation capacity measurements [7,8]. Science and engineering Doctor of Philosophy (PhD) scholars in the S&T workforce are highly skilled and have specialized research training that is applicable in various organizations [1,9,10,11]. To this day, rapid and radical technological changes continue to globally drive the demand for PhD scholars in industrial fields [9,12,13]. However, the policy reports of several countries indicate the future shortage of S&T workers [9,14,15]. The importance of fostering the S&T workforce’s development was emphasized recently with respect to the rapidly changing internal and external environments [16], and the importance of data-driven trend identification in policy formulation [17]. Despite highlighting the significance of this issue, most studies on workforce forecasting predominantly focus on specific industrial sectors [18].

Historically, the United States was the focal point of S&T education, as well as talent development [8]. However, recently, East Asian countries have been focusing on dynamic growth and elevating their education systems to world-class levels to enhance national competitiveness [1]. In particular, South Korea continues to strengthen its S&T human resources by promoting R&D-focused dynamic education and innovation policies [5,19]. Further, the country’s S&T manpower policy achievements have helped increase the numbers of S&T masters- and doctorate-level graduates. However, South Korea faces an imbalance between the industrial demand for and the supply of talent because of the disparities in worker concentrations in certain fields [20]. This supply and demand insecurity is expected to increase in the coming years because of the steady decline in school-age population in the country.

Nevertheless, studies on workforce forecasting in South Korea remain scarce [18]. Most of them focus solely on North America as well [18,21]. Labor supply forecasting is a critical consideration because it aids strategic planning, resource allocation, and policy formulation to create an innovative and skilled workforce [18]. The evidence base generated through social science research effectively supports informed policy decision making [22]. Existing studies on labor supply forecasting in various sectors leverage some population data directly related to the number of workers and weighted indicators in the estimation process, and use methodologies such as the cohort-component method, matrix projection models, and the employment structure [6,23,24]. Often, forecasting methodologies are combined to develop precise workforce forecasts because the use of a single methodology is challenging. Some studies suggest the application of scenario-based sectoral occupational flexibility and the employment structure [23], which limits the use of time-series approaches. This is clarified by the literature review on workforce forecasting models conducted by Safarishahrbijari [18] as well. Accordingly, the use of time-series analysis techniques is less prevalent compared with other research methodologies [18]. Moreover, to the best of our knowledge, no study has hitherto applied a time-series analysis to forecast the scale of the S&T doctorate workforce.

Studies forecasting future trends in the S&T workforce using time-series methods are limited because such forecasts require the consideration of various external conditions [18]. However, a time-series analysis provides a robust framework for scenario forecasting as it mathematically describes the patterns and dependencies that unfold over time [25]. Accordingly, this study addresses the aforementioned gap in existing research by proposing time-series forecasting methods to predict the S&T workforce in South Korea. To clarify the proposed methodology, we apply it to predict the size of the country’s dynamically growing PhD workforce. In this manner, this study makes the following contributions: It devises a reliable time-series method to predict the size of the S&T PhD workforce and, thereby, contributes to the broad field of S&T workforce policy and provides guidance for policymakers and researchers. Specifically, we apply the state-of-the-art time-series analysis model Prophet in conjunction with traditional exponential smoothing methods to identify the superior forecasting methodology through a comparative analysis. This study can be used as a scenario for S&T doctoral manpower policy formulation. By addressing the limitations of a time-series analysis identified by earlier studies, our forecast examines the structural trends of the S&T PhD workforce, rather than examining human resource planning in detail.

Various time-series approaches, such as single exponential smoothing (SES) [26], double exponential smoothing (DES) [27,28], and Prophet [29], clarified in the existing literature, are well known for their high accuracy in forecasting labor and workforce-related indices. Therefore, this study considers three methods—two exponential smoothing techniques and one Prophet technique—that are considered effective in managing seasonal and nonseasonal data. Although the autoregressive integrated moving average (ARIMA) method is an excellent time-series model, the selected models can effectively analyze our limited time-series data.

Each method’s accuracy in predicting the ratio of the doctoral workforce to the general population was assessed by comparing it with 20 years of historical data. In this manner, this study enhances our understanding of the doctoral workforce’s trends and tests the workforce forecasting effectiveness of different time-series models. To measure the accuracy of the forecast, mean absolute percentage error (MAPE), mean absolute scaled error (MASE), root mean squared error (RMSE), and normalized root mean square error (NRMSE) are used as metrics.

The remainder of this paper is organized as follows: Section 2 describes the research methodology; Section 3 presents the findings; Section 4 discusses the findings with respect to earlier studies; and Section 5 presents the study’s conclusions, highlighting the implications of the findings and future research directions.

2. Proposed Methods

As shown in Figure 1, the methods proposed by the study are summarized as follows:

Combines disparate data to estimate the ratio of the number of S&T PhD graduates to the general population. This is represented as follows:

$\text{Ratio} = \frac{\text{PhD graduate}}{\text{Population estimation}}$

(1)

Conducts predictive modeling using exponential smoothing (single and double) and Prophet.
Determines the best model based on MAPE, MASE, RMSE, and NRMSE metrics.
Predicts and interprets the size of South Korea’s S&T PhD workforce by combining the forecast ratio with future projected population data.
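
The first and last steps above can be sketched in a few lines of Python. This is only an illustration of Equation (1) and its inversion; the function names and the numbers in the usage lines are hypothetical and are not taken from the study's data.

```python
def phd_ratio(graduates: float, population: float) -> float:
    """Equation (1): ratio of S&T PhD graduates to the population estimate."""
    if population <= 0:
        raise ValueError("population must be positive")
    return graduates / population


def graduates_from_ratio(forecast_ratio: float, projected_population: float) -> float:
    """Convert a forecast ratio back into a graduate count
    by multiplying it with a projected population figure."""
    return forecast_ratio * projected_population


# Hypothetical illustration values, not figures from the article.
ratio = phd_ratio(graduates=5_000, population=7_000_000)
recovered = graduates_from_ratio(ratio, projected_population=7_000_000)
```

Modeling the ratio rather than the raw count lets the population projection enter the forecast explicitly in the final step.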

2.1. Data Collection and Analysis Environments

This study adopted a quantitative approach to forecast the number of S&T PhD scholars in South Korea by applying exponential smoothing and Prophet using public data, including the Projected Populations for Korea obtained from the Korean Statistical Information Service and the Statistical Yearbook of Education of the Korean Educational Development Institute (KEDI) [30,31]. The data were collected over 20 years, from 2001 to 2020, to match the analysis period. We used the SES, DES, and Prophet models for analysis and MAPE, MASE, RMSE, and NRMSE as model evaluation metrics. Further, we utilized the forecast package for exponential smoothing and the Prophet package for the Prophet model. All the analyses were conducted in the R 4.4.1 environment.

2.2. Analysis Methods

SES is a basic time-series forecasting method that is used in the absence of any trend or seasonality. It is particularly effective in identifying trends in smoothed data [32]. Further, it uses α as a parameter to adjust observation weights and is calculated as follows:

\hat{Y}_{t + 1} = α Y_{t} + (1 - α) \hat{Y}_{t}

(2)

where Ŷ_{t+1} is the predicted value for time t + 1 (the smoothed value calculated at time t), Ŷ_t is the previous predicted value, Y_t is the actual observation made at time t, and α denotes the smoothing constant used in the calculation [33].
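
The recursion in Equation (2) can be illustrated with a minimal Python sketch. It is not the routine used by the R forecast package, which estimates the initial state and α from the data; here α is supplied by the caller and, as an assumption made for simplicity, the recursion starts from the first observation. The function name is hypothetical.

```python
def ses_forecast(y, alpha, initial=None):
    """One-step-ahead SES forecasts following Equation (2).

    Returns a list with len(y) + 1 entries: entry i is the forecast
    for observation i, and the last entry is the forecast for the
    first unobserved period.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not y:
        raise ValueError("y must contain at least one observation")
    # Assumption: start the recursion at the first observation.
    forecast = y[0] if initial is None else initial
    forecasts = [forecast]
    for observation in y:
        forecast = alpha * observation + (1.0 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts
```

Because SES carries no trend term, every forecast beyond the sample equals the last entry of the returned list, so a multi-year projection is a flat line.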

DES, which is an extension of SES, reflects the linear trend in data. It considers both α and β parameters, which represent level and trend, respectively. Further, the “ets” function in the R forecast package utilizes the Holt linear trend model [34]. The parameters are calculated as follows:

L_{t} = α Y_{t} + (1 - α) (L_{t - 1} + T_{t - 1}),

(3)

T_{t} = β (L_{t} - L_{t - 1}) + (1 - β) T_{t - 1},

(4)

{\hat{Y}}_{t + m} = L_{t} + m T_{t},

(5)

where L_t represents the estimated level at time t, α is the smoothing parameter used to adjust the data level, Y_t denotes the actual observations at time t, and T_t refers to the trend estimate at the same time. The parameter β is used to smooth the trend. Finally, the predicted value at time t + m is represented as \hat{Y}_{t + m} [35,36].
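
Equations (3)–(5) translate directly into a short loop. The following Python sketch is illustrative only: the function name is hypothetical, α and β are passed in rather than optimized, and the initial level and trend are set from the first two observations, which is a simplifying assumption and not the initialization used by the R forecast package.

```python
def holt_forecast(y, alpha, beta, horizon):
    """Holt linear trend (DES) forecasts for 1..horizon steps ahead."""
    if len(y) < 2:
        raise ValueError("need at least two observations")
    # Assumption: initial level is the first observation and
    # initial trend is the first difference.
    level = y[0]
    trend = y[1] - y[0]
    for observation in y[1:]:
        previous_level = level
        # Equation (3): update the level.
        level = alpha * observation + (1.0 - alpha) * (previous_level + trend)
        # Equation (4): update the trend.
        trend = beta * (level - previous_level) + (1.0 - beta) * trend
    # Equation (5): extrapolate the final level along the final trend.
    return [level + m * trend for m in range(1, horizon + 1)]
```

On a perfectly linear series the level and trend reproduce the line exactly, so the forecasts simply continue it.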

Prophet is a time-series analysis technique and automated forecasting tool developed by the data analytics team of Meta [37]. It includes a trend model, a seasonality model, and holiday effects; the final forecast model is calculated as follows:

y (t) = T (t) + S (t) + H (t) + ϵ (t),

(6)

where T(t) represents the trend model, which captures the long-term movement in data; S(t) denotes the seasonality model, which accounts for repetitive patterns or cycles over a fixed period; H(t) is the holiday effect, which models the impact of holidays or special events on the data; and, finally, ϵ(t) represents the error term.

MAPE is the mean of the absolute value of the percentage error between predicted and actual values. It quantifies a model’s accuracy by calculating the average of absolute percentage differences to provide a measure of relative prediction accuracy. MAPE is calculated as shown in Equation (7). For all evaluation metrics used in this study, including MAPE, lower values indicate higher accuracy.

MAPE = (\frac{1}{n} \sum_{t = 1}^{n} |\frac{A_{t} - F_{t}}{A_{t}}|),

(7)

where A_t represents the actual observed value at time t, F_t corresponds to the predicted value at the same time, and n indicates the total number of data points available for the analysis.

MASE is the mean absolute forecast error scaled by the mean absolute error of the simple naïve forecast, and it is useful for comparing model forecast performance on time-series data. It is calculated as shown in Equation (8). As with MAPE, a lower MASE indicates higher accuracy.

MASE = \frac{\frac{1}{n} \sum_{t = 1}^{n} |A_{t} - F_{t}|}{\frac{1}{n - 1} \sum_{t = 2}^{n} |A_{t} - A_{t - 1}|} .

(8)

Similarly, RMSE quantifies a model’s accuracy by measuring the square root of the average squared differences between observed and predicted values. It provides an aggregate measure of prediction error in the same units as the original data and, because the errors are squared before averaging, it is sensitive to large errors. RMSE is computed as shown in Equation (9). Lower RMSE values indicate higher predictive accuracy.

RMSE = \sqrt{\frac{1}{n} \sum_{t = 1}^{n} {(A_{t} - F_{t})}^{2}},

(9)

where A_t denotes the actual observed value at time t, F_t represents the predicted value at the same time, and n indicates the total number of observations considered in the analysis.

NRMSE standardizes RMSE to provide a dimensionless metric, facilitating the comparison of different datasets or models with varying scales. It is calculated by dividing RMSE by the range of observed values, as shown in Equation (10). This normalization enables the relative evaluation of model performance, in which lower values indicate better accuracy.

NRMSE = \frac{RMSE}{A_{\max} - A_{\min}},

(10)

where A_max and A_min are the maximum and minimum observed values, respectively. This formulation ensures that NRMSE remains consistent regardless of the data scale, making it suitable for a comparative performance analysis.
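
For reference, Equations (7)–(10) can be written as the following Python functions. This is a minimal sketch using only the standard library; the function names are ours, and MAPE is returned as a fraction exactly as in Equation (7), so it would need to be multiplied by 100 to be read as a percentage.

```python
import math


def mape(actual, forecast):
    """Equation (7): mean absolute percentage error, as a fraction."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast):
    """Equation (8): mean absolute error scaled by the naive forecast error."""
    n = len(actual)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / n
    naive_mae = sum(abs(actual[t] - actual[t - 1]) for t in range(1, n)) / (n - 1)
    return mae / naive_mae


def rmse(actual, forecast):
    """Equation (9): root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def nrmse(actual, forecast):
    """Equation (10): RMSE divided by the range of the observed values."""
    return rmse(actual, forecast) / (max(actual) - min(actual))
```

A MASE below 1 means the model's average error is smaller than that of the naive forecast that repeats the previous observation.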

3. Results

3.1. Descriptive Statistics

Statistics Korea (KOSTAT) collected the relevant future projected population data for 20 years, from 2001 to 2020 [30]. These nationally approved statistics are collected and published in 5-year intervals. The data used in this study were based on the 2022 Population and Housing Census results projected in 5-year intervals. Further, the data conformed to the assumption that the median fertility rate, median life expectancy, and median international net migration were default projections. We used data from the 25- to 34-year age group, since they have the highest proportion of full-time students in the South Korean PhD program. The 2021 Survey of Earned Doctorates was extrapolated to obtain a new potential workforce supply [38]. The number of S&T PhD graduates by year was obtained from the 2002 Statistical Yearbook of Education from KEDI [31]. Our study defined science and engineering majors as S&T domains following the KEDI classification.

Table 1 and Figure 2 depict the data trend for the analysis period. Whereas the number of S&T PhD graduates increased during the analysis period, the estimated population decreased. Both the datasets demonstrated a linear pattern but had opposite trends. We applied a time-series approach to calculate the relevant ratio.

The descriptive statistics reveal several key aspects. First, the steady increase in the number of PhD graduates indicates a positive response to the relevant policies aimed at increasing the number of S&T human resources. However, the simultaneous population decline highlights the potential challenges involved in sustaining this growth. These contrasting trends underline the importance of performing a nuanced analysis to inform policy decisions.

3.2. Single Exponential Smoothing

We selected the optimal SES parameters using the “ets()” function of the forecast package in R 4.4.1. This function automatically identifies the model’s best-fitting parameters. The value of the smoothing constant, α, ranges between 0 and 1, and the α value selected by the algorithm was 0.99, indicating that significant weight was assigned to the most recent observations. Based on the selected parameters, we calculated the model fit criteria: Akaike Information Criterion (AIC) = −332.761, Bayesian Information Criterion (BIC) = −329.774, and Corrected Akaike Information Criterion (AICc) = −331.261.

AIC was computed using the following formula:

AIC = 2k - 2 \ln(L),

(11)

where k represents the number of parameters in the model, and L is the model’s maximum likelihood.

BIC was calculated as follows:

BIC = k \ln(n) - 2 \ln(L),

(12)

where n is the number of observations, k is the number of parameters, and L is the maximum likelihood.

AICc is an adjusted version of AIC that accounts for small sample sizes and is calculated using the following formula:

AICc = AIC + \frac{2k(k + 1)}{n - k - 1},

(13)

where n is the number of observations and k is the number of parameters in the model.

Figure 3 depicts the forecasting results obtained using the SES method. Our data do not demonstrate any significant pattern other than a linear trend, which SES does not model. Therefore, the forecasts for the next 20 years remain at the level of the last observation. As shown in Figure 3, the forecasted values remain lower than the actual ones. The SES model generally underestimates the growth of the S&T workforce, since SES tends to lag behind the observed values when data exhibit an increasing or decreasing trend over time [32]. The SES result also indicates that recent observations strongly influence the forecast, leading to a static future projection. Hence, we applied the DES method to obtain more accurate forecasts.
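
As a consistency check on Equations (11)–(13), the Python sketch below recomputes BIC and AICc from the AIC reported for the SES model. The function names are ours. The value k = 3 is an assumption inferred from the reported figures and is not stated in the text; n = 20 corresponds to the annual observations for 2001–2020.

```python
import math


def aic(k, log_likelihood):
    """Equation (11): AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(k, n, log_likelihood):
    """Equation (12): BIC = k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


def aicc(k, n, log_likelihood):
    """Equation (13): AIC plus the small-sample correction term."""
    return aic(k, log_likelihood) + (2 * k * (k + 1)) / (n - k - 1)


# Assumption: k = 3 parameters; n = 20 annual observations.
k, n = 3, 20
reported_aic = -332.761
# Invert Equation (11) to recover ln(L) from the reported AIC.
log_likelihood = (2 * k - reported_aic) / 2

ses_bic = bic(k, n, log_likelihood)
ses_aicc = aicc(k, n, log_likelihood)
```

Under these assumptions the recomputed values agree with the reported BIC (−329.774) and AICc (−331.261) to three decimal places.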

3.3. Double Exponential Smoothing

As with SES, the optimal DES parameters were selected using the R forecast package. The α value selected by the algorithm was 0.99, which is close to 1, and the β value was 0.05. Accordingly, a significant weight was assigned to recent observations (AIC = −341.943, BIC = −335.969, and AICc = −335.482). Figure 4 depicts the DES forecast results. The model was trained on the period 2001–2020 to project the linear trend over the subsequent 20 years. Compared with the SES results, this method reflects the characteristics of our data more effectively, and its prediction accuracy on the training data surpassed that of SES as well. Following the linear trend of the observed data, a steady increase was predicted for all years until 2040. The DES graph, like that of SES, reflects the linear trend in the data, and the interval between its upper and lower bounds is narrow. The DES predictions align closely with the actual figures, particularly after 2013.

3.4. Prophet Model

The Prophet analysis technique is effective for data with various trends and strong seasonal effects, and it is robust to outliers and shifts in trends [39]. As with the exponential smoothing models, the model was trained on the data for 2001–2020 to forecast the trend for the next 20 years. Figure 5 depicts the results of the Prophet analysis. Between 2008 and 2012, the predicted and actual values were separated by a significant gap; however, after this period, the model demonstrated a stable forecasting trend. Compared with the two earlier models, Prophet's fitted values were closely aligned with the original data. Further, the forecast graph obtained in this case is similar to that of DES and accurately reflects the linear trend in our data. Once again, we observed a steady upward trend after 2020. The interval between the upper and lower prediction bounds is narrower than those obtained from exponential smoothing. Finally, Prophet’s robustness to outliers and ability to incorporate shifts in trends render it particularly useful in dynamic environments characterized by sudden changes arising from policy shifts or external factors.

3.5. Comparison of Methods

We evaluated and compared the accuracy of the models using their MAPE, MASE, RMSE, and NRMSE values, as presented in Table 2. MAPE measures the average percentage difference between predicted and actual values, with low values indicating high accuracy. For this metric, DES has the lowest value and thus the highest prediction accuracy, followed by Prophet and SES, in that order. MASE, in contrast, compares the forecast model’s error with that of a simple naive forecast. As with MAPE, low MASE values indicate high prediction accuracy, and Prophet performs best on this metric. SES performed poorly on both metrics, whereas both DES and Prophet exhibit strong predictive performance. To evaluate the forecasting accuracy from multiple perspectives, we considered RMSE and NRMSE in addition to MAPE and MASE. RMSE is the square root of the mean of the squared differences between predicted and actual values; because the errors are squared, it is highly sensitive to large errors. NRMSE, a normalized version of RMSE, allows a relative assessment of error magnitude and thus a nuanced comparison of prediction accuracy and data variability. Together, these metrics enable a comprehensive evaluation of forecasting performance that accounts for both absolute and relative error.

All the metrics were considered simultaneously to select the best model to forecast South Korea’s S&T PhD workforce. Although DES provides a slightly higher forecast accuracy, Prophet better reflects outliers and trend shifts in the time series. Therefore, Prophet was considered the most appropriate model to forecast the S&T workforce, which often fluctuates according to policy and environmental changes. Section 3.6 discusses the future projections based on the rates predicted by Prophet.

3.6. Forecasting of Science and Technology Doctoral Graduates

Table 3 and Figure 6 present the forecast size of South Korea’s S&T labor supply for 2021 to 2040. Because the estimation is ratio-based, the number of graduates was rounded to a single decimal place. The forecast indicates that the ratio of the number of S&T PhD scholars to the population will rise along with the number of graduates until 2028, decline steadily until 2035, and increase modestly from 2036. The forecasting results indicate a complex trend influenced by demographic changes and policy impacts. The initial increase in the number of graduates, followed by a decline and subsequent recovery, highlights the importance of implementing adaptive policies that can respond to changing demographic realities and ensure a steady supply of S&T PhD graduates.

4. Discussion

Figure 7 presents a graph demonstrating the combined predicted results of this study and existing data trends. A combination of Figure 2 and Figure 6 effectively depicts the findings as well. In South Korea, the number of individuals between the ages of 25 and 34 years pursuing further education is expected to steadily decline after 2024. Conversely, the number of doctoral graduates will increase until 2028, probably due to the country’s S&T labor policy. However, the number of PhD graduates will subsequently decline because of the decline in school-age population. Regardless, from 2036 onward, the proportional increase in S&T PhD scholars will surpass the population decline and show a moderate upward trend.

In summary, the workforce trends in this scenario are linked to the quality of PhD graduates and the quantity and quality of future labor positions. It is difficult to determine whether this scenario’s outcome will lead to over- or understaffing in the future. However, earlier studies have expressed concern regarding the oversupply of PhD scholars, which can limit their utilization [20]. This suggests the necessity of implementing a complex combination of policies. To strengthen national S&T competitiveness, it is necessary to employ a PhD workforce in STEM fields [5]. Appropriate education policies should be implemented to maintain the highly qualified S&T PhD workforce in South Korea, and they must be accompanied by industrial and employment policies relevant to the graduates’ career paths [5,11,12,14,21]. These complex dynamics of quantity and quality should be considered in future studies.

This study is significant because it uses a time-series methodology to forecast the future size of the South Korean S&T PhD workforce. The projected size derived from this analysis can be used in detailed workforce planning by the integration of various methodologies. Additionally, it serves as a foundational basis for policy initiatives considering the demographic structure. Further, our forecasting model provides evidence to support the advancement of such policies. In particular, our forecast scenario is expected to serve as a starting point for future studies on the complexity of S&T labor policy.

The current study did not include comparative analyses using models such as long short-term memory (LSTM) and ARIMA because of overfitting and model complexity issues with limited time-series data. The study’s scope can be extended once additional time-series data become available. Further, because it relies on a population-based predictive model, this study is limited by the assumption that current patterns will continue.

5. Conclusions

This study identified an appropriate time-series methodology to estimate the supply of the S&T workforce. To account for the data’s limitations, we selected exponential smoothing methods and the Prophet model, and used South Korea as a case study to validate the methods. Our forecasts for the S&T doctorate supply for a period of 20 years, from 2021 to 2040, clarified that the Prophet model outperforms traditional methods such as exponential smoothing in workforce supply predictions. Therefore, the Prophet model can serve as a benchmark for future forecasting models focusing on the supply of S&T human resources in Korea. Similarly, our findings indicate that the Prophet model may be effectively applied to workforce forecasting studies in other countries and sectors, despite having potential variations across disciplines and regions. Furthermore, this study indicates that, compared to traditional time-series models, state-of-the-art models like Prophet provide better predictive performance regarding estimating S&T workforce trends.

Our findings predict medium- to long-term workforce supply trends for S&T PhD scholars in South Korea. Earlier studies anticipated an increase in demand for PhD scholars in various industries, including S&T. Although the demand for S&T PhD scholars specializing in R&D planning, evaluation, and policy formulation is growing, the scholars’ roles will not be confined to these areas alone. The S&T doctoral workforce in South Korea is projected to grow at a rate that is consistent with current trends until 2028; subsequently, it will decline and eventually recover after 2035. During this 7-year period, targeted policy measures are needed to support the rising number of doctoral scholars and ensure an upward trend after 2035. Since the global landscape is evolving and technological hegemony is on the rise, it is imperative to foster a robust pipeline of core S&T talent. As previously noted, human resource development is a multifaceted challenge that cannot be addressed by a single policy. A comprehensive and dynamic policy framework, incorporating various initiatives in collaboration with relevant government departments, is essential to bolster national competitiveness. Such efforts must be sustained to ensure evidence-based policy implementation.

Further, we plan to continue refining this forecasting model to ensure its broad applicability in supporting policies while overcoming various constraints. Many of the earlier studies on time-series forecasting suggest combining different types of prediction models. However, owing to data limitations, we could apply only a limited number of models in this study. Therefore, future research will utilize additional time-series data and models such as LSTM and ARIMA to conduct more robust comparative analyses. If these models are found to improve performance, ensemble models combining various configurations can be proposed. Once additional data become available, multivariate time-series models such as the vector error correction model can be explored as well. These future research ideas can be realized only if longer time-series data become available. This study benchmarks existing forecasting models, and future research should prioritize the exploration of novel methodologies. Applying these approaches across countries and sectors will enable comprehensive comparisons of global supply trends to obtain deep insights. This is crucial to the advancement of the field of policy science.

Author Contributions

Conceptualization, H.-Y.Y. and H.C.; Methodology, H.-Y.Y.; Writing—original draft, H.-Y.Y.; Writing—review & editing, H.-Y.Y. and H.C.; Supervision, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data generated or analyzed during this study are included in this published article.

Acknowledgments

All authors would like to thank the reviewers for their valuable contributions that have enhanced the quality of our research.

Conflicts of Interest

The authors declare no competing interests.

References

Atkinson, R.C.; Blanpied, W.A. Research universities: Core of the US science and technology system. Technol. Soc. 2008, 30, 30–48. [Google Scholar] [CrossRef]
De, S. Intangible capital and growth in the “new economy”: Implications of a multi-sector endogenous growth model. Struct. Chang. Econ. Dyn. 2014, 28, 25–42. [Google Scholar] [CrossRef]
Santos, J.M.; Horta, H.; Heitor, M. Too many PhDs? An invalid argument for countries developing their scientific and academic systems: The case of Portugal. Technol. Forecast. Soc. Chang. 2016, 113, 352–362. [Google Scholar] [CrossRef]
Bonilla, K.; Salles-Filho, S.; Bin, A. Building science, technology, and research capacity in developing countries: Evidence from student mobility and international cooperation between Korea and Guatemala. STI Policy Rev. 2018, 9, 99–132. [Google Scholar]
Jung, J. Domestic and overseas doctorates and their academic entry-level jobs in South Korea. Asian Educ. Dev. Stud. 2018, 7, 205–222. [Google Scholar] [CrossRef]
Albert, J.R.G.; Tabunda, A.M.L.; David, C.P.C.; Cuenca, J.S.; Francisco, K.A.; Vizmanos, J.F.V.; Labina, C.S. Future S&T Human Resource Requirements in the Philippines: A Labor Market Analysis. Discussion Paper Series No. 2020–22. 2020. Available online: https://pidswebs.pids.gov.ph/CDN/PUBLICATIONS/pidsdps2022.pdf (accessed on 20 May 2024).
Radosevic, S.; Auriol, L. Patterns of restructuring in research, development and innovation activities in central and eastern European countries: An analysis based on S&T indicators. Res. Policy 1999, 28, 351–376.
Basu, A.; Foland, P.; Holdridge, G.; Shelton, R.D. China’s rising leadership in science and technology: Quantitative and qualitative indicators. Scientometrics 2018, 117, 249–269.
Lee, H.-F.; Miozzo, M.; Laredo, P. Career patterns and competences of PhDs in science and engineering in the knowledge economy: The case of graduates from a UK research-based university. Res. Policy 2010, 39, 869–881.
Bøgelund, P.; de Graaff, E. The road to become a legitimate scholar: A case study of international PhD students in science and engineering. Int. J. Doct. Stud. 2015, 10, 519–533.
Barge-Gil, A.; D’Este, P.; Herrera, L. PhD trained employees and firms’ transitions to upstream R&D activities. Ind. Innov. 2021, 28, 424–455.
Gould, J. How to build a better PhD. Nature 2015, 528, 22–25.
Shmatko, N.; Katchanov, Y.; Volkova, G. The value of PhD in the changing world of work: Traditional and alternative research careers. Technol. Forecast. Soc. Chang. 2020, 152, 119907.
Butz, W.P.; Bloom, G.A.; Gross, M.E.; Kelly, T.K.; Kofner, A.; Rippen, H.E. Is There a Shortage of Scientists and Engineers? How Would We Know? IP-241-OSTP; Rand Corporation: Santa Monica, CA, USA, 2003.
Suzdalova, M.; Politsinskaya, E.; Sushko, A. About the problem of professional personnel shortage in mechanical engineering industry and ways of solving. Procedia Soc. Behav. Sci. 2015, 206, 394–398.
Zweig, D.; Kang, S. America Challenges China’s National Talent Programs. Available online: https://www.jstor.org/stable/resrep24782 (accessed on 20 May 2024).
Athey, S. Beyond prediction: Using big data for policy problems. Science 2017, 355, 483–485.
Safarishahrbijari, A. Workforce forecasting models: A systematic review. J. Forecast. 2018, 37, 739–753.
Shapiro, M.A.; So, M.; Woo Park, H. Quantifying the national innovation system: Inter-regional collaboration networks in South Korea. Technol. Anal. Strateg. Manag. 2010, 22, 845–857.
BAI. Audit on the Demographic Crisis V. 2022. Available online: https://www.bai.go.kr/bai/result/branch/detail?srno=2762 (accessed on 20 May 2024). (In Korean).
Landry, M.D.; Hack, L.M.; Coulson, E.; Freburger, J.; Johnson, M.P.; Katz, R.; Kerwin, J.; Smith, M.H.; Wessman, H.C.B.; Venskus, D.G.; et al. Workforce projections 2010–2020: Annual supply and demand forecasting models for physical therapists across the United States. Phys. Ther. 2016, 96, 71–80.
Stewart, R.; Dayal, H.; Langer, L.; van Rooyen, C. Transforming evidence for policy: Do we have the evidence generation house in order? Humanit. Soc. Sci. Commun. 2022, 9, 116.
Maier, T.; Afentakis, A. Forecasting supply and demand in nursing professions: Impacts of occupational flexibility and employment structure in Germany. Hum. Resour. Health 2013, 11, 24.
Fuchs, J.; Söhnlein, D.; Weber, B.; Weber, E. Stochastic forecasting of labor supply and population: An integrated model. Popul. Res. Policy Rev. 2018, 37, 33–58.
Moniz, A.B. Scenario-building methods as a tool for policy analysis. In Innovative Comparative Methods for Policy Analysis: Beyond the Quantitative-Qualitative Divide; Springer: Boston, MA, USA, 2006; pp. 185–209.
Gustriansyah, R.; Alie, J.; Suhandi, N. Modeling the number of unemployed in South Sumatra province using the exponential smoothing methods. Qual. Quant. 2023, 57, 1725–1737.
Dumičić, K.; Čeh Časni, A.; Žmuk, B. Forecasting unemployment rate in selected European countries using smoothing methods. World Acad. Sci. Eng. Technol. Int. J. Soc. Educ. Econ. Manag. Eng. 2015, 9, 867–872.
Syafwan, H.; Syafwan, M.; Syafwan, E.; Hadi, A.F.; Putri, P. Forecasting unemployment in north Sumatra using double exponential smoothing method. J. Phys. Conf. Ser. 2021, 1783, 012008.
Pontoh, R.S.; Zahroh, S.; Nurahman, H.R.; Aprillion, R.I.; Ramdani, A.; Akmal, D.I. Applied of feed-forward neural network and Facebook prophet model for train passengers forecasting. J. Phys. Conf. Ser. 2021, 1776, 012057.
KOSIS. Projected Population. Available online: https://kosis.kr/statHtml/statHtml.do?orgId=101&tblId=DT_1BPA001&vw_cd=MT_ETITLE&list_id=A41_10&scrId=&language=en&seqNo=&lang_mode=en&obj_var_id=&itm_id=&conn_path=MT_ETITLE&path=%252Feng%252FstatisticsList%252FstatisticsListIndex.do (accessed on 14 December 2023).
KEDI. Statistical Yearbook of Education. 2022. Available online: https://kess.kedi.re.kr/eng/publ/view?survSeq=2022&publSeq=2&menuSeq=0&itemCode=02&language=en (accessed on 20 May 2024).
Voineagu, V.; Pisica, S.; Caragea, N. Forecasting monthly unemployment by econometric smoothing techniques. J. Econ. Comput. Econ. Cybern. Stud. Res. 2012, 46, 255–267.
Brown, R.G. Statistical Forecasting for Inventory Control; McGraw-Hill: New York, NY, USA, 1959; pp. 443–473.
Hyndman, R.; Athanasopoulos, G.; Bergmeir, C.; Caceres, G.; Chhay, L.; O’Hara-Wild, M.; Petropoulos, F.; Razbash, S.; Wang, E.; Yasmeen, F. Forecasting Functions for Time Series and Linear Models, R Package Version 6; 2015. Available online: https://pkg.robjhyndman.com/forecast/ (accessed on 15 June 2024).
Holt, C.C. Forecasting trends and seasonals by exponentially weighted moving averages. ONR Memo. 1957, 52, 5–10.
Holt, C.C. Forecasting seasonals and trends by exponentially weighted moving averages. Int. J. Forecast. 2004, 20, 5–10.
Taylor, S.J.; Letham, B. Forecasting at scale. Am. Stat. 2018, 72, 37–45.
KRIVET. Survey of Earned Doctorate. 2021. Available online: https://www.krivet.re.kr/kor/sub.do?pageIndex=&menuSn=12&pstNo=E120220010&orderType=date&dataSeType=all&fileType=full&searchYear=&searchType=all&searchText=%EB%B0%95%EC%82%AC%EC%A1%B0%EC%82%AC (accessed on 20 May 2024). (In Korean).
Serrano, A.L.M.; Rodrigues, G.A.P.; Martins, P.H.S.; Saiki, G.M.; Filho, G.P.R.; Gonçalves, V.P.; Albuquerque, R.O. Statistical comparison of time series models for forecasting Brazilian monthly energy demand using economic, industrial, and climatic exogenous variables. Appl. Sci. 2024, 14, 5846.

Figure 1. Data analysis procedure.

Figure 2. Annual data trends.

Figure 3. Single exponential smoothing forecasting results.

Figure 4. Double exponential smoothing forecasting results.

Figure 5. Prophet forecasting results.

Figure 6. Annual forecasting trends using predicted data.

Figure 7. Annual trends for the science and technology workforce combining existing and forecasted results.

Table 1. Quantified data for 2001–2020.

Year	PhD Graduates	Population Estimate	Ratio
2001	2804	8,556,683	0.033%
2002	3095	8,485,909	0.036%
2003	3183	8,414,442	0.038%
2004	3516	8,303,905	0.042%
2005	3669	8,204,263	0.045%
2006	3814	8,153,407	0.047%
2007	3619	8,097,282	0.045%
2008	3670	8,011,074	0.046%
2009	3815	7,857,528	0.049%
2010	4138	7,711,889	0.054%
2011	5092	7,632,229	0.067%
2012	5292	7,523,552	0.070%
2013	5414	7,401,510	0.073%
2014	5523	7,304,584	0.076%
2015	5614	7,141,056	0.079%
2016	5978	6,968,066	0.086%
2017	6177	6,859,703	0.090%
2018	6351	6,853,679	0.093%
2019	6713	6,894,238	0.097%
2020	7263	6,953,345	0.104%
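
The Ratio column appears to be the number of PhD graduates divided by the population estimate, expressed as a percentage. That definition is inferred from the table values rather than stated alongside the table, so the following is only a minimal Python sketch under that assumption; the helper name `ratio_percent` is chosen here for illustration.

```python
# Rows copied from Table 1: (year, PhD graduates, population estimate).
rows = [
    (2001, 2804, 8_556_683),
    (2010, 4138, 7_711_889),
    (2020, 7263, 6_953_345),
]


def ratio_percent(graduates: int, population: int) -> float:
    """Graduates as a percentage of the population estimate (assumed definition)."""
    return 100.0 * graduates / population


for year, graduates, population in rows:
    # Three decimals, matching the precision shown in the Ratio column.
    print(year, f"{ratio_percent(graduates, population):.3f}%")
```

For the three sampled years, the values rounded to three decimals agree with the Ratio column (0.033%, 0.054%, and 0.104%).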

Table 2. Comparison of the statistical error measurements across methods.

Metric	SES	DES	Prophet
MAPE	5.979	3.402	3.463
MASE	0.950	0.512	0.470
RMSE	4.693 × 10⁻⁵	3.211 × 10⁻⁵	2.394 × 10⁻⁵
NRMSE_minmax	0.066	0.045	0.033
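
The four measures in Table 2 have standard textbook definitions, sketched below in plain Python. This is an illustration, not the authors' code: the function names are hypothetical, the MASE scaling (in-sample one-step naive forecast) is an assumption, and the toy series is invented for the example.

```python
from math import sqrt


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, in the units of the series."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def nrmse_minmax(actual, forecast):
    """RMSE divided by the range (max - min) of the actual values."""
    return rmse(actual, forecast) / (max(actual) - min(actual))


def mase(actual, forecast, train):
    """MAE scaled by the in-sample MAE of a one-step naive forecast (assumed scaling)."""
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_mae = sum(abs(b - a) for a, b in zip(train, train[1:])) / (len(train) - 1)
    return mae / naive_mae


# Hypothetical toy series, not data from the study.
train = [10.0, 20.0, 30.0]
actual = [10.0, 20.0, 30.0]
forecast = [11.0, 18.0, 33.0]

print("MAPE:", mape(actual, forecast))
print("MASE:", mase(actual, forecast, train))
print("RMSE:", rmse(actual, forecast))
print("NRMSE_minmax:", nrmse_minmax(actual, forecast))
```

Dividing the RMSE row of Table 2 by the range of the Table 1 ratios expressed as fractions (roughly 7.2 × 10⁻⁴) gives values close to the NRMSE_minmax row, which suggests, without confirming, that the errors were computed on the ratio series.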

Table 3. Predicted data for the period from 2021 to 2040.

Year	Population Estimate (Predicted)	Ratio (Predicted)	Number of Doctoral Graduates (Predicted)
2021	6,981,653	0.107%	7462
2022	7,019,218	0.111%	7800
2023	7,119,257	0.116%	8255
2024	7,149,410	0.121%	8676
2025	7,146,556	0.125%	8916
2026	7,110,027	0.129%	9172
2027	6,932,072	0.134%	9277
2028	6,716,604	0.139%	9351
2029	6,511,059	0.143%	9287
2030	6,275,989	0.147%	9218
2031	6,026,943	0.152%	9143
2032	5,817,110	0.157%	9138
2033	5,648,775	0.161%	9066
2034	5,480,099	0.165%	9028
2035	5,296,199	0.170%	8981
2036	5,172,281	0.175%	9050
2037	5,123,004	0.178%	9138
2038	5,095,903	0.183%	9306
2039	5,043,637	0.187%	9454
2040	5,036,726	0.193%	9713
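
The predicted number of doctoral graduates in Table 3 appears to be the predicted population estimate multiplied by the predicted ratio. This relationship is inferred from the column headings and values, so the Python sketch below only checks that it holds to within the rounding of the displayed ratio; the names `rows` and `implied_graduates` are illustrative.

```python
# Rows copied from Table 3: (year, population estimate, ratio in percent, graduates).
rows = [
    (2021, 6_981_653, 0.107, 7462),
    (2028, 6_716_604, 0.139, 9351),
    (2035, 5_296_199, 0.170, 8981),
    (2040, 5_036_726, 0.193, 9713),
]


def implied_graduates(population: int, ratio_pct: float) -> float:
    """Graduates implied by population x ratio (assumed relationship)."""
    return population * ratio_pct / 100.0


for year, population, ratio_pct, listed in rows:
    implied = implied_graduates(population, ratio_pct)
    # The ratio is shown to 0.001%, so it can be off by up to 0.0005% (5e-6 as a fraction).
    tolerance = 5e-6 * population
    print(year, round(implied), listed, abs(implied - listed) <= tolerance)
```

The small gaps between implied and listed counts are what one would expect if the listed counts were derived from unrounded ratios.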

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
