1. Introduction
SARS-CoV-2 (COVID-19) is on the rise and is infecting more people every day. Currently, two years after the onset of this pandemic, this ascending trend has not yet stopped and is even accelerating in some countries [1]. When a person in a given country is found to be infected with the disease, there are two possibilities for where the infection occurred:
The first concerns the situation where both the carrier and the recipient of the disease are in the same country. This type of disease transmission is considered “local”;
The second concerns people who are infected in one country and carry the disease to another country through travel. This type of disease transmission is called “imported”.
Communication among nations is one of the main causes of disease transmission and is called disease interaction between countries in this paper. When examining the spreading profile of the disease in a target community, it is therefore important to consider not only its progress within that community but also its prevalence in other countries, particularly those with a high volume of travel [2]. The number of infected cases can thus be treated as a timeseries, with the related statistics recorded as data over time [3].
In this regard, numerous researchers have attempted to use a wide range of statistical tools to predict the future number of COVID-19 cases and thereby guide health care officials in making informed decisions [4]. For example, Shadabfar et al. used a susceptible–exposed–infected–vaccinated–recovered (SEIVR) model combined with the Monte Carlo (MC) sampling method to probabilistically investigate the COVID-19 spreading profile in the United States (USA) [5,6].
In general, different stochastic computations [7,8] and numerical methods [9,10,11,12,13,14] are exploited to assess the various aspects of the COVID-19 outbreak. In this sense, Katoch et al. used the autoregressive integrated moving average (ARIMA) model to forecast the COVID-19 dynamics in India [15]. Kumar Sahai et al. also modeled and predicted this pandemic via the ARIMA model [16]. Using the same model, Malki et al. predicted the second rebound of this disease and projected the end of the pandemic [17]. Chaurasia et al. additionally used ARIMA and a regression model to forecast mortality rates [18]. Furthermore, Kumar et al. employed timeseries methods to analyze the COVID-19 spreading profile in ten affected countries [19]. Using the α-Sutte indicator and ARIMA, Attanayake et al. modeled COVID-19 [20]. Hernandez et al. forecasted COVID-19 per region using the ARIMA model and polynomial functions [21]. Moreover, Yang et al. treated the data as timeseries and predicted the COVID-19 spreading profile in Wuhan, China [22].
Even though these studies have taken advantage of different regression and optimization techniques to obtain the best fit to the data and consequently provide reliable timeseries forecasts, they typically share one limitation: their predictions are independent of the disease spreading profile in other nations in the region. Given the disease interactions among neighboring countries, it is preferable to measure the relationship between the disease spreading profiles of the relevant nations and to account for its impact when predicting the disease timeseries in the target country or region.
To fill this gap, this paper utilizes a network autoregressive (NAR) model. For this purpose, the COVID-19 data are first retrieved from the official online databases of the World Health Organization (WHO) and Johns Hopkins University for seven countries, namely, Iran, Turkey, Iraq, Azerbaijan, Armenia, Afghanistan, and Pakistan [23]. Thereafter, a network is constructed for the region, in which each vertex corresponds to a country and each edge represents the correlation of the total number of currently infected cases, and the correlation matrix of the area is established. After that, the timeseries forecasting for Iran is performed using the NAR model, providing the number of infected cases up to December 2021. Comparing the root mean square error (RMSE) and mean absolute percentage error (MAPE) of the ARIMA and NAR models demonstrates that a better fit to the data is obtained once interactions among neighboring countries are taken into account. The method proposed in this paper can thus be implemented systematically to provide a reference for investigating the disease spreading profile in other countries and regions.
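The two error measures named above follow their standard textbook definitions, which can be sketched as below. This is a minimal illustration only: the function names and the sample numbers are assumptions made for the example and are not taken from the study's data or code.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Every value in `actual` must be nonzero, since it is used as a divisor.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Illustrative numbers only, not data from the study.
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
print("RMSE:", rmse(observed, forecast))
print("MAPE (%):", mape(observed, forecast))
```

A lower value of either measure on the same observed series indicates a closer fit, which is the basis on which two forecasting models can be compared.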
The rest of this study is organized as follows. Section 2 introduces the study area and then reviews the disease progression across the countries in the region concerned since the onset of the COVID-19 pandemic in February 2020. Section 3 details both methods implemented in this study, namely, the ARIMA and the NAR models, and subsequently describes how the disease interactions among neighboring nations are incorporated into the proposed formulation. Next, in Section 4, the ARIMA and the NAR models are fitted to the existing data. In addition, a comparison of both methods shows that considering the disease interactions among neighboring countries can enhance the prediction accuracy. Thus, the NAR model is employed to forecast the number of infected cases in Iran until the end of December 2021, and the results are reported. Then, in Section 5, the criteria for choosing the threshold are clarified in more detail. Finally, the contents are summarized and concluded in Section 6.
2. Target Region and Data Description
To implement this work, the records of the COVID-19 data from the WHO and Johns Hopkins University official websites are used [24]. It should be noted that the data reported by the WHO contain some uncertainty and do not reflect the complete and accurate status of the disease in society [25,26]. However, the approach presented in the current research is implemented based on the disease statistics provided by the WHO as the reference dataset. The authors do not claim that the prediction made in the paper is the real state of the disease in society but rather that it is the projected course of the disease according to WHO data. The data show confirmed cases, daily recovery, and death rates. The total of currently infected patients is accordingly calculated as follows:

$$\text{Currently infected} = \text{Total confirmed} - \text{Total recovered} - \text{Total deaths}$$
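As a minimal sketch of this bookkeeping, the snippet below assumes that confirmed cases are available as a running total while recoveries and deaths are reported per day, so the daily figures are accumulated before being subtracted. The function name, the argument layout, and the sample numbers are illustrative assumptions rather than the authors' data pipeline.

```python
from itertools import accumulate


def currently_infected(total_confirmed, daily_recovered, daily_deaths):
    """Currently infected cases per day.

    total_confirmed: cumulative confirmed cases per day (assumed layout)
    daily_recovered: new recoveries per day (assumed layout)
    daily_deaths:    new deaths per day (assumed layout)
    """
    total_recovered = accumulate(daily_recovered)  # running total of recoveries
    total_deaths = accumulate(daily_deaths)        # running total of deaths
    return [
        confirmed - recovered - deaths
        for confirmed, recovered, deaths in zip(
            total_confirmed, total_recovered, total_deaths
        )
    ]


# Illustrative numbers only, not data from the study.
active = currently_infected(
    total_confirmed=[100, 150, 210],
    daily_recovered=[5, 10, 20],
    daily_deaths=[1, 2, 3],
)
print(active)
```

If the source already reports recoveries and deaths as running totals, the accumulation step would simply be dropped.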
As mentioned earlier, the primary purpose of this study is to address the impact of COVID-19 interactions among neighboring countries on the timeseries forecasting model of the number of infected cases in Iran. Accordingly, Iran's neighbors Turkey, Iraq, Azerbaijan, Armenia, Afghanistan, and Pakistan are taken as the target region here. The COVID-19 data from Turkmenistan are not publicly available, so they are not reflected in this study. The geographic locations of these countries relative to Iran are depicted in Figure 1. The timeseries of the infection rate and of the number of infected cases in these nations as of 10 September 2021 are shown in Figure 2. A closer look at Figure 2 also reveals that different countries have so far experienced similar trends at the same time, which reinforces the hypothesis that the nations located in this region interact in the spread of the disease. For example, Iran and Turkey simultaneously experienced three peaks in March 2020, December 2020, and April 2021.
5. Discussion
In Section 3, it was discussed that the adjacency matrix, representing the disease interaction among nations, is formed by adopting a threshold and comparing it to the correlation matrix. Determining this threshold and explaining how the process is implemented require additional analysis, which is presented in this section.
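The thresholding step can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes Pearson correlation between the countries' series, treats a pair as connected when its correlation is at least the threshold, and leaves the diagonal at zero so that no country is its own neighbor. All function names and sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(series):
    """Pairwise correlations for a list of per-country timeseries."""
    return [[pearson(a, b) for b in series] for a in series]


def adjacency_from_correlation(corr, threshold):
    """1 where two distinct countries correlate at or above the threshold.

    Using '>=' and a zero diagonal are assumptions made for this sketch.
    """
    size = len(corr)
    return [
        [1 if i != j and corr[i][j] >= threshold else 0 for j in range(size)]
        for i in range(size)
    ]


# Illustrative series only, not data from the study.
series = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 5.9, 8.0],
    [4.0, 1.0, 3.0, 2.0],
]
corr = correlation_matrix(series)
for row in adjacency_from_correlation(corr, threshold=0.6):
    print(row)
```

Raising the threshold removes edges, so a sufficiently high value can leave a country with no neighbors at all, which is why the choice of threshold needs separate analysis.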
To explain how the correlation threshold is computed, note that two constraints must be met. First, none of the relevant terms in Equation (7) should be zero; otherwise, an infinite term would appear in this equation. Second, a threshold value that minimizes the RMSE is preferred, as it helps the algorithm achieve better accuracy. To implement an algorithm that satisfies these two conditions, the threshold value is defined as a decision variable. An external loop is then added to the main algorithm to vary the threshold and calculate the corresponding terms of Equation (7) and the RMSE. The results for different threshold values from 0 to 1, in fixed increments, are reported in Table 4.
As seen, threshold values greater than 0.6 give infinite values in Equation (7) and thus cannot be selected as the correlation threshold in the algorithm. Moreover, among the remaining cases, the threshold that gives the minimum RMSE is selected as the optimal case and is utilized in the model implementation.
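The external loop described in this section can be sketched as below. Several parts are assumptions made for illustration: the term that must not be zero is read here as each country's number of neighbors in the adjacency matrix, the grid of candidate thresholds is supplied by the caller, and `rmse_for` is a hypothetical callback standing in for fitting the network model with a given adjacency matrix and returning its RMSE.

```python
def adjacency_from_correlation(corr, threshold):
    """1 where two distinct countries correlate at or above the threshold."""
    size = len(corr)
    return [
        [1 if i != j and corr[i][j] >= threshold else 0 for j in range(size)]
        for i in range(size)
    ]


def select_threshold(corr, thresholds, rmse_for):
    """Return (threshold, rmse) with the lowest RMSE among feasible thresholds.

    A threshold is treated as infeasible when any country is left with zero
    neighbors (an assumed reading of the zero-term constraint). Returns None
    if no candidate is feasible.
    """
    best = None
    for threshold in thresholds:
        adjacency = adjacency_from_correlation(corr, threshold)
        neighbor_counts = [sum(row) for row in adjacency]
        if min(neighbor_counts) == 0:
            continue  # a zero count would produce an infinite term
        score = rmse_for(adjacency)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Illustrative correlation matrix and stand-in scoring, not study results.
toy_corr = [[1.0, 0.9, 0.7], [0.9, 1.0, 0.5], [0.7, 0.5, 1.0]]
candidates = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def toy_rmse(adjacency):
    """Hypothetical stand-in: pretends sparser networks fit better."""
    return float(sum(sum(row) for row in adjacency))


print(select_threshold(toy_corr, candidates, toy_rmse))
```

Keeping the feasibility check separate from the scoring means that infeasible thresholds are never passed to the model fit, so the search only compares RMSE values that are well defined.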
6. Summary and Conclusions
In this paper, the COVID-19 spreading profile in Iran is predicted in view of the influence of the severity and correlation of the disease in neighboring countries. To this end, the timeseries of COVID-19 infection for seven countries in the region, namely, Iran, Turkey, Iraq, Azerbaijan, Armenia, Afghanistan, and Pakistan, are downloaded from the online databases provided by the WHO and Johns Hopkins University. Then, a network is formed in the region to establish the correlation matrix among the countries concerned. Furthermore, by incorporating the correlation matrix into the proposed formula and calculating the model coefficients, the NARI model is used to predict the number of infected cases in Iran up until the end of December 2021, taking into account the impact of the disease in neighboring countries. The main results obtained in this study are as follows:
- 1.
The correlation matrix obtained from the network of the countries in the region shows that the impact of COVID-19 on Iran is greatest from Iraq, followed by Turkey, Pakistan, Azerbaijan, Afghanistan, and Armenia, with correlation coefficients of 0.86, 0.83, 0.64, 0.56, 0.55, and 0.16, respectively. This result can also be seen in the trend of infected cases. The increasing/decreasing trends and the number of disease peaks in Iran, Iraq, and Turkey are very similar and occurred within a short period of one another. This indicates that the proposed correlation criterion is able to capture the similarity between infection data and disease peaks;
- 2.
Timeseries predictions can be made with or without considering disease interactions among countries. Incorporating the disease interaction not only helps the algorithm account for one of the most important components of disease transmission between societies but also significantly increases the accuracy of the timeseries prediction. This can be examined by comparing the ARIMA and NARI models. The RMSE is 5.42 for ARIMA, which ignores the disease interactions among neighboring countries, and 3.06 for NARI, which accounts for them. This means that considering the disease interactions among neighboring countries improves the prediction accuracy considerably. As the model's accuracy in predicting the disease increases, more reliable tools become available for policymakers to make informed control decisions;
- 3.
The point estimate obtained from the NARI model indicates that the number of infected cases in Iran declines after September 2021, so that the total number of currently infected cases will fall below 480,000 by the end of 2021. According to the predictions corresponding to the lower limits of the 20%, 50%, 80%, and 95% quantiles, the total number of infected persons will fall below 390,000, 320,000, 220,000, and 130,000, respectively, by the end of 2021.
This paper examines Iran's close neighbors, which share common borders with it, and their impact on the COVID-19 spreading profile in Iran. Ideally, however, more distant countries in the region that have direct or indirect demographic relationships with Iran could also be considered. Capturing such a high volume of interactions requires the construction of a larger network that covers more countries. Such a model adds complexity to the problem but is expected to make the prediction results more accurate and reliable. Moreover, various factors, such as hospitalization, social distancing, and quarantine, can affect the number of people infected with COVID-19 in a society. The spreading profile of the disease under the effects of these factors is, however, outside the scope of the current research. Simulating the disease spread while taking these factors into account requires establishing a system of differential equations in a so-called compartmental model and solving it incrementally to simulate the future disease profile. This topic is under investigation by the authors.