1. Introduction
Monkeypox (Mpox) is a contagious zoonotic disease that has spread significantly worldwide, with fever, fatigue, and rash as key symptoms [1]. The disease has raised concern in Latin America and Chicago due to endemic transmission and low vaccination coverage, highlighting the importance of accessible vaccination strategies and ongoing surveillance to control the disease [2,3]. Outbreaks in various regions have attracted notable interest in recent years, leading to hundreds of research studies [4,5]. Both developing and developed countries have had to prepare response plans and continuously monitor Mpox, which has significantly affected international territories since 2022 [6,7,8]. Genomic surveillance has been emphasized as a means to understand and control the spread of emerging infectious diseases, together with the need for a rapid and coordinated response at national and international levels [9,10]. Similarly, reliable information sources proved important for understanding and addressing this disease effectively [11,12]. Assessing the fear associated with Mpox was also relevant for designing customized education and prevention programs, with implications for strategies that consider psychosocial factors such as epistemic credulity and media perception [13,14,15], as was the critical need to improve awareness strategies and preventive measures to mitigate the risk of Mpox transmission and other potential public health threats, empowering individuals to make informed decisions about their well-being [16,17]. Finally, a rapid and collaborative international response was necessary to control and prevent the spread of the virus, including effective detection strategies, emergency management, and advances in antiviral drugs and vaccines to address this public health threat [18,19].
Short-term forecasts of a disease's trajectory at various geographic levels can assist in developing policy and intervention measures for any fast-spreading new illness. However, there are few opportunities to evaluate predictive performance and improve models during a public health crisis [20]. Fortunately, as of September 2022, cases were rapidly declining globally, with non-endemic nations reporting a total of 90,574 cases and 170 fatalities as of 27 December 2023 [21]. Given the diverse effects of the epidemic on different geographical scales and the dramatic drop in Mpox cases, it is critical to retrospectively review forecasting approaches to better prepare for future public health catastrophes [22]. In response to this concern, research began to examine the Mpox outbreak. One study indicated that since May 2022, 108 countries with Mpox outbreaks have been identified, with the disease primarily affecting homosexual and bisexual men [23]. It was suggested that risk factors for contracting Mpox include being a young man, having sex with other men, having unprotected sex, being HIV positive, and having a history of sexually transmitted infections [24]. Furthermore, in November 2022, over 850,000 English tweets using the keyword “Monkeypox” were analyzed, revealing initially negative emotions towards a new global outbreak. Tweets helped disseminate information such as vaccination locations, global case counts, symptoms, and prevention methods, but they were also prone to spreading misinformation [25]. On the other hand, Mpox carries a significant risk of severe outcomes in terms of hospitalization, with significant differences between the recent outbreak and historical ones, suggesting that disease severity may vary across periods [26].
In 2022, various researchers studied the behavior of Mpox, proposing statistical models that predicted cases of contagion and death with different levels of success. Some suggested linear regression models for forecasting Mpox outbreaks [27], while others proposed a convolutional neural network model to detect and predict Mpox contagion cases [28]. In 2023, a hybrid technique for predicting Mpox infection and death yielded notable results. In this regard, time series models have emerged as a valuable tool for predicting the spread of infectious diseases and improving response capacity to outbreaks. The use of predictive statistical models in healthcare has increased significantly in recent times. These models serve as a crucial link between statistics and medical practice, offering valuable support in decision-making and facilitating the creation of various systems and tools to mitigate uncertainties, improve performance, and establish effective control measures to combat diseases [29].
Other studies addressed the growing threat of Mpox in a post-COVID-19 context, using neural networks to predict its spread in the USA, Germany, the UK, France, and Canada, and showing high accuracy in outbreak prediction. The effectiveness of the artificial neural network (ANN) model compared to other methods such as LSTM and GRU was highlighted, emphasizing the importance of deep learning in predicting and controlling emerging diseases like Mpox [30]. Furthermore, classification systems based on neural networks and explainable AI tools were proposed, trained on a dataset of images and achieving over 98 percent accuracy [31]. Challenges such as data availability and quality, biases in datasets, and interpretability were identified, emphasizing the importance of periodically updating the dataset with new images of infected patients for future research [32]. Similarly, time series models such as ARIMA have been used to understand the dynamics of infectious disease outbreaks and predict their spread, focusing on developing an effective prediction model for short-term behavior [33]. Likewise, a time series analysis that predicted Mpox virus mutation using deep learning models such as LSTM observed a decrease in nucleotide mutation rates, with a balance maintained between bidirectional rates [34]. Other approaches, such as the innovative filtering and combination technique, accurately forecasted cumulative daily confirmed cases of Mpox using time series and machine learning models, demonstrating the forecasting system’s efficiency and accuracy [35]. Similarly, machine learning techniques and time series analysis allowed key patterns and trends in disease spread to be identified, showing that convolutional neural networks perform better in analysis [36,37]. Likewise, the effectiveness of the stacked ensemble learning approach in predicting transmission rates was evidenced, especially in Europe, where the pandemic was severe [38]. Furthermore, other studies highlighted the superiority of machine learning approaches over traditional time series models for predicting Mpox, showing that the multilayer perceptron model outperformed ARIMA with lower mean squared error and recommending methods such as extreme learning machine and support vector machine for better future adaptation [39].
Short-term forecasting of infectious diseases has become vital for health policy-making and for improving the population’s standards in specific or general localities. In this regard, new contributions should be encouraged that propose different forecasting tools, providing an extensive range of forecasting models that can be applied to specific or general areas for analysis and study. Hence, the main aim of this research work is two-fold: first, to propose a new ensemble time series technique, and second, to apply the proposed method to attain precise and efficient short-term Mpox forecasts for the world’s four most affected countries (Brazil, France, Spain, and the USA) and the world. The approach first preprocesses the cumulative confirmed case time series to address variance stabilization, normalization, stationarity, and a nonlinear secular trend component. After that, five single time series models (autoregressive, simple exponential smoothing, autoregressive integrated moving average, nonlinear autoregressive, and the Theta model) and three proposed ensemble models are used to estimate the filtered confirmed case time series, i.e., the clean series free from variance, normalization, stationarity, and seasonality issues. The proposed ensemble models combine the single models through weighting: equal weights for all single models, weights based on in-sample (training) performance, and weights based on out-of-sample (validation) performance. Four accuracy average errors (the mean absolute error, the mean absolute percentage error, the root mean squared error, and the root mean squared logarithmic error) and a statistical test of equal forecast accuracy, the Diebold–Mariano test, are used to check the performance of the proposed novel time series ensemble forecasting technique. Furthermore, the developed novel time series ensemble approach can be used to forecast other diseases in the future.
The remainder of this manuscript is structured as follows: Section 2 outlines the general framework of the proposed time series ensemble approach. In Section 3, the proposed time series forecasting approach is applied to the daily cumulative confirmed cases series from the four countries (Brazil, the USA, Spain, and France) as well as the total cases worldwide. Using the best ensemble model within the proposed forecasting approach, a projection is made for the next twenty-eight days, equivalent to four weeks. A comprehensive discussion of the spread of the disease and the associated risks in the four countries with the highest number of infections, as well as worldwide, is presented in Section 4. Lastly, Section 5 concludes by discussing the study’s limitations and proposing directions for future research.
3. Case Study Results
This work aims to provide a short-term forecast of the cumulative confirmed cases of Mpox for the four most affected countries (Brazil, France, Spain, and the USA) and for the whole world. The Mpox datasets (daily cumulative confirmed cases) were taken from the official website of “Our World in Data” and cover 1 June 2022 to 30 April 2023. The graphical presentation and the descriptive statistics of cumulative confirmed cases for all countries and the world can be seen in Figure 2 and Table 2.
Figure 2 shows the cumulative confirmed Mpox cases, which follow an increasing nonlinear curve in all cases. The world total is the largest, while the USA has the highest number of confirmed cases among the most affected countries. Spain had the most confirmed cases at the start, but after September 2022, Brazil had the second-highest number of confirmed cases and Spain the third-highest. In the same way, France had comparatively high case counts at the start, but after August 2022, Spain and Brazil recorded more confirmed cases. As of 30 April 2023, the USA has the most confirmed cases, Brazil and Spain are in the second and third positions, and France is the fourth most affected country.
In addition to the graphical presentation, the descriptive statistics (minimum, first quartile (25%), median (50%), third quartile (75%), arithmetic mean, variance, standard deviation, skewness, kurtosis, and maximum) for Brazil, France, Spain, the USA, and the entire world, computed on both the original and the natural logarithm cumulative confirmed cases time series, are tabulated in
Table 1. This table confirms that taking the natural logarithm of all considered cumulative time series stabilizes the variance and the standard deviation. For this reason, this work proceeds with the log series in all cases for further analysis. The complete datasets of daily cumulative confirmed cases for all considered countries and the entire world, covering 334 days, were divided into three parts as follows: 1 June 2022 to 20 January 2023 (234 days) was used for model estimation (training part), 21 January 2023 to 11 March 2023 (50 days) was used for model validation (hold-out sample), and 12 March 2023 to 30 April 2023 (50 days) was used for model testing (out-of-sample) with one-day-ahead cumulative confirmed cases forecasts.
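The log transform and the three-way split described above can be sketched as follows. This is a minimal illustration using only the dates quoted in the text; the function name `split_log_series` and the toy input series are hypothetical and are not taken from the study's code or data.

```python
import math
from datetime import date

# Inclusive date ranges quoted in the text.
PERIODS = {
    "train": (date(2022, 6, 1), date(2023, 1, 20)),
    "validation": (date(2023, 1, 21), date(2023, 3, 11)),
    "test": (date(2023, 3, 12), date(2023, 4, 30)),
}
SERIES_START = date(2022, 6, 1)


def split_log_series(cumulative_cases):
    """Log-transform a daily cumulative series and split it by date.

    `cumulative_cases` is assumed to hold one strictly positive value per day,
    starting on SERIES_START (hypothetical helper, for illustration only).
    """
    if any(c <= 0 for c in cumulative_cases):
        raise ValueError("the natural logarithm needs strictly positive counts")
    logged = [math.log(c) for c in cumulative_cases]
    parts = {}
    for name, (first_day, last_day) in PERIODS.items():
        lo = (first_day - SERIES_START).days
        hi = (last_day - SERIES_START).days + 1  # inclusive end date
        parts[name] = logged[lo:hi]
    return parts


# Toy increasing series with 334 daily values (not real Mpox data).
toy_series = [10 + 3 * i for i in range(334)]
parts = split_log_series(toy_series)
print({name: len(values) for name, values in parts.items()})
# prints {'train': 234, 'validation': 50, 'test': 50}
```

The printed lengths reproduce the 234/50/50 day counts stated in the text, which together cover the 334 days of the dataset.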
As confirmed by the previous discussion, all series have an increasing nonlinear trend component. This work extracts the nonlinear curve trend component using the regression spline method. The graphical representation of the nonlinear curve trend component along with the original log cumulative confirmed series is shown in Figure 3. It can be seen that in all cases, namely sky blue (the whole world), blue (the USA), green (Spain), black (Brazil), and red (France), the nonlinear curve trend component is extracted very well. Once the nonlinear curve trend component is removed, this work proceeds with further modeling and forecasting using the clean cumulative confirmed case time series. The remaining filtered series (clean cumulative confirmed case time series) for the four most affected countries and the entire world are shown in
Figure 4.
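To illustrate the detrending step, the sketch below fits a regression spline by ordinary least squares and returns the fitted trend and the residual (filtered) series. The text does not specify the spline degree or knot placement, so the linear truncated-power basis, the equally spaced knots, and the names `fit_spline_trend`, `spline_basis`, and `solve_linear_system` are all assumptions made for illustration, not the study's implementation.

```python
import math


def spline_basis(t, knots):
    """Linear truncated-power basis: 1, t, (t - k)+ for each knot k."""
    return [1.0, t] + [max(0.0, t - k) for k in knots]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_spline_trend(y, n_knots=4):
    """Return (trend, residual) from a least-squares linear spline fit.

    Time is rescaled to [0, 1]; knots are equally spaced (an assumption).
    """
    n = len(y)
    ts = [i / (n - 1) for i in range(n)]
    knots = [(j + 1) / (n_knots + 1) for j in range(n_knots)]
    x = [spline_basis(t, knots) for t in ts]
    p = len(x[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    trend = [sum(b * v for b, v in zip(beta, row)) for row in x]
    residual = [yi - ti for yi, ti in zip(y, trend)]
    return trend, residual


# Toy log-like series (not real Mpox data).
toy_log_series = [math.log(1 + i) for i in range(100)]
trend, clean_series = fit_spline_trend(toy_log_series)
```

Because the basis includes an intercept, the residual series sums to approximately zero, which is the property one would expect of a filtered series passed on to the stationarity check.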
Before modeling and forecasting time series data, it is essential to check the stationarity of the dataset. To do this, this work performed the augmented Dickey–Fuller test and reports the results (statistics and p-values) for the original and clean (log-transformed and detrended) cumulative confirmed cases time series for all considered countries and the entire world in
Table 3. This table indicates that the original cumulative confirmed case time series of the four most affected countries and the world as a whole are all nonstationary. In contrast, the clean cumulative confirmed cases time series (after taking the natural logarithm and removing the trend component) have more negative test statistics, and their p-values are mostly small (less than 0.05), indicating that the series are stationary at the 5% significance level. Once the dataset has been preprocessed, the confirmed cumulative case series are modeled and forecast. To this end, this work uses five single time series models (the autoregressive model, the exponential smoothing model, the autoregressive moving averages model, the nonlinear autoregressive model, and the Theta model) and the three proposed ensemble models (the EnsE, the EnsT, and the EnsV). The proposed time series ensemble forecasting approach therefore compares nine total models in several contexts: among the single models, among the proposed ensemble models, and single versus ensemble models.
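The three ensemble variants can be illustrated with a small weighting sketch. The exact weighting formula is not given in this section, so the code assumes equal weights for EnsE and weights inversely proportional to each single model's error on the training period (EnsT) or the validation period (EnsV). The inverse-error rule, the function names, and the toy numbers are assumptions for illustration only.

```python
def equal_weights(model_names):
    """EnsE-style weights: every single model gets the same weight."""
    names = list(model_names)
    return {name: 1.0 / len(names) for name in names}


def inverse_error_weights(errors):
    """Weights inversely proportional to a positive error measure.

    Passing training errors mimics an EnsT-style ensemble and passing
    validation errors mimics an EnsV-style ensemble (assumed rule).
    """
    if any(e <= 0 for e in errors.values()):
        raise ValueError("errors must be strictly positive")
    inverse = {name: 1.0 / e for name, e in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(forecasts, weights):
    """Weighted average of the single-model forecasts for one day."""
    return sum(weights[name] * forecasts[name] for name in forecasts)


# Toy one-day-ahead forecasts and validation errors (not real results).
forecasts = {"AR": 10.0, "Theta": 14.0}
validation_errors = {"AR": 1.0, "Theta": 3.0}

ens_e = combine(forecasts, equal_weights(forecasts))
ens_v = combine(forecasts, inverse_error_weights(validation_errors))
```

In this toy case the better-performing model on the validation period receives three times the weight of the other, so the validation-weighted forecast is pulled toward it, whereas the equal-weight forecast is the plain average.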
Hence, for all nine models for the four most affected countries and the world case, one-day-ahead out-of-sample forecast outcomes (MAPE, MAE, RMSE, and RMSLE) are listed in Table 4. From Table 4, it is concluded that the EnsV produced the best forecasting results among all nine forecasting models within the proposed time series ensemble forecasting approach in all four most affected countries and the entire world case. For instance, the average accuracy errors for these locations are the following: Brazil (MAPE = 0.0000111, MAE = 0.1681917, RMSLE = 0.0000992, RMSE = 1.1861); France (MAPE = 0.00000019, MAE = 0.00000199, RMSLE = 0.00000191, RMSE = 0.00000553); Spain (MAPE = 0.00003314, MAE = 0.2417941, RMSLE = 0.00019113, RMSE = 1.498126); the USA (MAPE = 0.00010156, MAE = 2.996107, RMSLE = 0.00039027, RMSE = 11.99131); and the entire world (MAPE = 0.00021311, MAE = 0.00079817, RMSLE = 21.92421, RMSE = 70.9131). The EnsT model shows the second-best forecasting results among all nine forecasting models in all four most affected countries and the entire world, while the third-best forecasting accuracy average error results are as follows: Brazil (the Theta model; MAPE = 0.0000156, MAE = 0.1711713, RMSLE = 0.0001092, RMSE = 1.196843); France (the ARMA model; MAPE = 0.00000025, MAE = 0.00000309, RMSLE = 0.00000216, RMSE = 0.00000649); Spain (the ESM model; MAPE = 0.00004015, MAE = 0.3040281, RMSLE = 0.00020946, RMSE = 1.586085); and the entire world case (the Theta model; MAPE = 0.00010858, MAE = 3.306309, RMSLE = 0.00042027, RMSE = 12.79745). Therefore, within all nine forecasting models, the proposed ensemble models (the EnsT and the EnsV models) generally perform better than single models; however, within the single models, different countries have different single best models, as mentioned previously. Note that the best model is an EnsV or equivalent model for all four countries most affected by Mpox and the world. Also, using the proposed ensemble learning leads to a marked reduction in extreme errors (see
Table 1). The proposed ensemble learning approach thus proves to be particularly effective in forecasting cumulative confirmed cases of Mpox.
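For reference, the four accuracy measures can be computed as in the sketch below. The exact formulas used in the study are not reproduced in this section, so the standard textbook definitions are assumed: MAPE is returned as a fraction rather than a percentage, and RMSLE uses log(1 + x). The function names and the toy values are illustrative only.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, as a fraction (assumes actual != 0)."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def rmsle(actual, forecast):
    """Root mean squared logarithmic error, using log(1 + x) (assumed form)."""
    squared = [(math.log1p(f) - math.log1p(a)) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(actual))


# Toy values (not real Mpox data).
actual = [100.0, 200.0]
forecast = [110.0, 190.0]
print(mae(actual, forecast), rmse(actual, forecast))
```

MAE and RMSE are on the scale of the case counts, whereas MAPE and RMSLE are relative measures, which is one reason the reported values differ so much in magnitude across metrics and locations.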
Table 5 gives the p-values for the hypothesis of equal forecast accuracy according to the Diebold and Mariano (DM) test. The DM test has been applied to the series obtained by joining the 50 one-day-ahead forecast errors for each country and each pair of forecasts. Each element of the table is the p-value of a hypothesis test in which the null hypothesis assumes no difference in the accuracy of the forecasts in the row and column, against the alternative that the model in the row is more accurate than the model in the column. Focusing on the EnsV model, in all considered countries and the entire world case, it gives the lowest accuracy average errors (MAPE, MAE, RMSLE, and RMSE; see Table 4) and is statistically not different in terms of the DM test (see
Table 5). On the other hand, if we restrict ourselves to single models for all considered countries and the entire world, the best model varies from country to country. To conclude this section, the accuracy average errors (MAPE, MAE, RMSLE, and RMSE) and the statistical test of equal forecast accuracy (the DM test) indicate that the proposed time series ensemble learning forecasting approach is highly efficient and accurate for one-day-ahead forecasts of cumulative confirmed cases of Mpox for the four most affected countries as well as for the entire world. In addition, within the proposed time series ensemble learning approach, the proposed EnsV model produces more precise forecasts than the alternative ensemble models and the single time series models.
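A one-sided DM test for one-day-ahead forecasts can be sketched as follows. The loss function is not stated in this section, so squared-error loss and the standard normal approximation are assumed; for a one-step horizon no autocovariance correction is applied. The function name `dm_test_pvalue` and the toy error series are hypothetical.

```python
import math
from statistics import NormalDist, mean


def dm_test_pvalue(errors_row, errors_col):
    """One-sided Diebold-Mariano test for one-step-ahead forecasts.

    H0: equal accuracy. H1: the row model is more accurate than the
    column model. Squared-error loss is assumed. Returns (statistic, p-value).
    """
    d = [a * a - b * b for a, b in zip(errors_row, errors_col)]
    n = len(d)
    d_bar = mean(d)
    var_d = sum((x - d_bar) ** 2 for x in d) / n
    dm_stat = d_bar / math.sqrt(var_d / n)
    # Small p-value when the row model's loss is systematically lower.
    return dm_stat, NormalDist().cdf(dm_stat)


# Toy forecast errors (not real results): the row model has smaller errors.
row_errors = [0.1, -0.2, 0.15, -0.1, 0.05, 0.12, -0.08, 0.2]
col_errors = [1.0, -1.2, 0.9, -1.1, 1.3, -0.8, 1.05, -0.95]
stat, p_value = dm_test_pvalue(row_errors, col_errors)
print(p_value < 0.05)
```

Swapping the two arguments flips the sign of the statistic, so the two one-sided p-values for a given pair of models sum to one; this matches the row-versus-column reading of the table described in the text.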
5. Conclusions
This work mainly aimed to forecast the short-term spread of the Mpox infectious disease in the most affected countries (the USA, Brazil, France, and Spain) and the world. To this end, this work proposes a unique time series ensemble approach to analyze and predict the spread of Mpox in the top four countries with high infection rates. This approach first preprocesses the cumulative confirmed case time series to address variance stabilization, normalization, stationarity, and a nonlinear secular trend component. After that, five single time series models and three proposed ensemble models built from them were used to forecast the clean confirmed case time series. The accuracy of the models is evaluated using average accuracy errors (MAE, MAPE, RMSE, and RMSLE) and a statistical test of equal forecasting accuracy (the DM test). Based on the results, the proposed time series ensemble forecasting approach is an efficient and accurate way to forecast the cumulative confirmed cases for the top four countries and the entire world. In addition, using the best ensemble model, a forecast is made for the next 28 days (four weeks), which will help in understanding the spread of the disease and the associated risks. This information can help prevent further spread and enable timely and effective treatment. Furthermore, the developed novel time series ensemble approach can be used to forecast other diseases in the future.
The study only used a cumulative Mpox dataset from the four most affected countries and the whole world. It could be expanded to include other variables, such as the number of new daily cases and daily and cumulative death counts, which would help evaluate the effectiveness of the proposed time series ensemble forecasting approach. Furthermore, the approach could be used to forecast short-term daily and cumulative COVID-19 confirmed cases, death counts, and recovered cases. However, the proposed forecasting methods only employed single time series models. In the future, machine learning models such as random forest, support vector regression, the XGBoost gradient boosting algorithm, etc., will be integrated to enhance the forecasting technique.