1. Introduction
Coronaviruses are a family of viruses that are serious pathogens of people. They cause gastrointestinal, hepatic, neurological, and severe respiratory diseases, and they are mainly distributed among humans, bats, mice, livestock, and wild animals [1,2,3]. The last two decades witnessed three coronavirus outbreaks, SARS-CoV, MERS-CoV, and SARS-CoV-2 (COVID-19), in 2003, 2012, and 2019, respectively. These three outbreaks have confirmed human-to-human and animal-to-animal transmission [4].
According to the official numbers of confirmed cases in the three mentioned outbreaks, the new coronavirus, COVID-19, is the most dangerous and the most widespread, with cases recorded in more than 200 countries and territories. The first reported cases of COVID-19 were recorded in Wuhan City, Hubei Province, China [5]. The beginning was linked to several people who visited a local seafood market in Wuhan and suffered from respiratory illness. The number of reported cases increased daily in Wuhan, in Hubei Province, and in other Chinese cities and provinces. After a short time, several other countries, such as Japan and Korea, recorded confirmed cases of COVID-19. Thereafter, a huge outbreak of COVID-19 spread in many countries, especially in European countries, such as Italy, Spain, Germany, France, and others. In Asia, apart from China, the most affected countries are Korea and Iran, whereas in the Americas, the most affected country is the USA. The source of the new coronavirus, COVID-19, is still unconfirmed. Some studies, such as Lu et al. [6], showed that bat-derived coronavirus strains were similar to COVID-19; the authors therefore suggested that bats were the potential source of COVID-19.
The daily confirmed cases globally have sharply increased, even with the strict policies implemented by governments and the lockdown of many cities in the world. The main reason for that is the incubation period of COVID-19, which may be up to 14 days, as described by Chen et al. [7]. During the incubation period, the infection can be transmitted to others even if the infected person does not have symptoms. Furthermore, for some people, the incubation period may reach 24 days, as concluded by Guan et al. [8].
The rapid spread of COVID-19 confirms that it is a terrifying pandemic; therefore, it is necessary to study and analyze the increase in the number of affected cases, the so-called confirmed cases.
Forecasting previous epidemics has received wide attention, and different methods have been proposed. For example, a forecasting model based on Bayesian inference was proposed by Shaman et al. [9] to forecast the outbreaks of Ebola in Guinea, Liberia, and Sierra Leone. An ensemble adjustment Kalman filter based forecasting method was proposed by Shaman et al. [10] to forecast seasonal outbreaks of influenza in New York City. Another Kalman filter based model was also proposed by Shaman et al. [11] to forecast weekly influenza cases. Moreover, different mathematical and statistical methods have been proposed for various epidemics, such as hepatitis A virus infection by True and Kurt [12], West Nile virus (WNV) by Defelice et al. [13], SARS by Massad et al. [14], influenza A (H1N1-2009) by Ong et al. [15], and MERS by Nah et al. [16].
Recently, several studies have addressed different forecasting issues for COVID-19, for example: forecasting of the human-to-human transmission of COVID-19 by Thompson [17], forecasting the number of confirmed cases of COVID-19 by Zhao et al. [18] and Al-qaness et al. [19], forecasting the infection rate of COVID-19 by Nishiura et al. [20], estimating the transmission risk of COVID-19 by Tang et al. [21], and estimating the risk of death of COVID-19 by Jung et al. [22].
On 24 March 2020, the number of confirmed COVID-19 cases reached 24,811, 69,176, 9073, and 53,740 in Iran, Italy, Korea, and the USA, respectively. In this paper, we propose a time-series forecasting approach to forecast confirmed cases of COVID-19 in four countries, Korea, the USA, Italy, and Iran, using an improved adaptive neuro-fuzzy inference system (ANFIS). The ANFIS is a well-known time-series forecasting model, which has received wide attention and has been applied to various prediction and forecasting problems, such as stock prices [23], oil prices [24], energy and oil consumption [25,26,27], and others. One of the main limitations of ANFIS is the estimation of its parameters. Recently, various optimization approaches have been employed to address this challenge, such as the sine-cosine algorithm (SCA) [26], particle swarm optimization (PSO) [28,29,30], and social-spider optimization [31].
In this paper, we present an improved ANFIS version, by enhancing its performance using a new nature-inspired optimization approach, called the marine predators algorithm (MPA). The MPA was proposed by Faramarzi et al. [32]. It is inspired by the foraging strategy of ocean predators, based on two types of strategies, called Lévy and Brownian motion, which are selected by the predators for optimal foraging. Therefore, in this study, we leverage the MPA to optimize the ANFIS parameters.
In our previous study [19], we proposed an enhanced ANFIS forecasting model, called FPASSA-ANFIS, and used it to forecast the number of infected people in China. Although that model showed good performance, its combination of two metaheuristics, the salp swarm algorithm (SSA) and the flower pollination algorithm (FPA), was somewhat complex. It was also found to need further improvement, especially for large-scale datasets, and its exploration ability was less effective than its exploitation ability. Therefore, this study applies a new metaheuristic method called the marine predators algorithm (MPA) [32]. This algorithm simulates the relation between predator and prey in the ocean by using Brownian and Lévy movements. Our developed MPA-ANFIS approach begins by setting the initial values of its parameters. The historical COVID-19 data for the specified country are then split into training and testing sets. Next, we initialize a set of solutions, each of which represents a configuration of the parameters of the ANFIS network. Thereafter, we compute the performance of the ANFIS model on the training set for each configuration/solution, using the root mean squared error (RMSE) as the objective function. The next step is to determine the best configuration of the parameters. We then use the operators of MPA to update the other solutions. After the terminal condition is reached, the best solution is used to build the ANFIS model, and the testing set is used to assess the constructed model. The final step is the forecasting of COVID-19.
The primary contributions and objectives are listed as follows:
We propose a robust time-series model for forecasting the number of infected people (confirmed cases) of SARS-CoV-2 in several countries: Iran, Italy, Korea, and the USA.
We improve the performance of the ANFIS model using a novel optimization method, MPA, which has not been applied in previous studies since the MPA is a new algorithm proposed in recent months.
We evaluate the proposed MPA-ANFIS with official datasets and by comparing it with several previous forecasting methods.
The remaining sections of this study are arranged as follows: Section 2 consists of the preliminaries of ANFIS and MPA. Section 3 presents the MPA-ANFIS method. Experiments and results are described in Section 4. Finally, the conclusion is presented in Section 6.
3. The Proposed Method
This section introduces the proposed method, called MPA-ANFIS. The goal of MPA-ANFIS is to forecast the number of cases of COVID-19 in four countries, namely Italy, the USA, Iran, and Korea.
The proposed method improves ANFIS by optimizing its parameters. The ANFIS model was selected because it is widely used in many forecasting tasks. It also can work effectively with uncertainty, fuzziness, and ambiguity in the problem. MPA is a new optimization algorithm; it shows good performance in selecting the best ANFIS parameters compared to other methods.
MPA-ANFIS is constructed using the five layers of the ANFIS model, where Layer 1 receives the input data and Layer 5 produces the results. The main goal of MPA is to optimize the ANFIS weights that lie between Layers 4 and 5. This process takes place in the training phase.
MPA-ANFIS receives the number of confirmed cases and their dates. Then, the proposed method arranges the input data in a time-series format. Due to the diversity of the data in the four countries, the autocorrelation function (ACF) is applied to perform this step. It searches for patterns in the data and helps select the best number of lags. It is recommended that autocorrelation values greater than 0.2 be considered; therefore, in this study, 6 lags were selected for the USA dataset, 5 lags for both the Korean and Iranian datasets, and 7 lags for the Italian dataset. With these settings, the input data were formed.
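The lag selection and time-series formation described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`autocorrelation`, `select_lags`, `make_lagged`) are hypothetical, and the rule of counting the leading lags whose sample autocorrelation exceeds 0.2 is one possible reading of the threshold mentioned in the text.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation of a 1-D series for lags 1..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)  # assumes the series is not constant
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

def select_lags(series, max_lag=10, threshold=0.2):
    """Count the leading lags whose ACF exceeds the threshold (assumed rule)."""
    n_lags = 0
    for value in autocorrelation(series, max_lag):
        if value <= threshold:
            break
        n_lags += 1
    return n_lags

def make_lagged(series, n_lags):
    """Build inputs of n_lags past values and the next value as target."""
    x = np.asarray(series, dtype=float)
    X = np.array([x[i:i + n_lags] for i in range(len(x) - n_lags)])
    y = x[n_lags:]
    return X, y
```

With `n_lags = 2`, for instance, the series 1, 2, 3, 4, 5, 6 yields the input rows (1, 2), (2, 3), (3, 4), (4, 5) with targets 3, 4, 5, 6.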
The entire dataset was divided into two groups. The first group (i.e., training set) contained 75% of the data, while the rest was used as a testing set. ANFIS applies the fuzzy c-means method, and the cluster number was set to seven.
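A minimal sketch of the 75%/25% division is given below. It assumes that the split is chronological, with no shuffling, so that the testing set contains the most recent observations; the text does not state this explicitly, and the helper name `chronological_split` is hypothetical.

```python
import numpy as np

def chronological_split(X, y, train_fraction=0.75):
    """Split lagged samples in time order: first part trains, the rest tests."""
    n_train = int(len(X) * train_fraction)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]
```

Keeping the time order avoids training on observations that occur later than those used for testing.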
To evaluate the quality of the candidate parameters, the mean squared error (MSE) was applied (as in Equation (18)). The MSE computes the error between the target and the produced data.

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(g_i - d_i\right)^2 \qquad (18)$$

where $g$ indicates the target data and $d$ is the output of the produced data. The size of the population is defined by the variable $N$.
As an optimization method, MPA-ANFIS starts by creating a population (X) of candidate solutions to the problem. After that, the objective function is applied to evaluate the solutions individually. In each iteration, the value of the MSE is checked, and the solution that has the lowest MSE is saved as the best solution. MPA-ANFIS repeats its steps until the stop criterion is met, and the best ANFIS parameters are passed to the testing stage. The optimized ANFIS model is used to compute the final results in the testing stage.
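The loop described above (create a population, evaluate each solution with the MSE, keep the best one, and update until the stop criterion is met) can be sketched as follows. This is only an illustration under stated assumptions: the update rule is a generic placeholder that moves solutions toward the current best with Gaussian noise, and it is not the set of MPA operators; `stand_in_model` is a hypothetical linear substitute for the ANFIS output; all names and default values are assumptions.

```python
import numpy as np

def mse(target, output):
    """Mean squared error between target values g and model outputs d."""
    g = np.asarray(target, dtype=float)
    d = np.asarray(output, dtype=float)
    return float(np.mean((g - d) ** 2))

def stand_in_model(params, X):
    """Hypothetical stand-in for the ANFIS output: weighted sum plus bias."""
    return X @ params[:-1] + params[-1]

def optimize_parameters(X, y, pop_size=20, iterations=100, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    dim = X.shape[1] + 1
    # Create the initial population of candidate parameter vectors.
    population = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
    fitness = np.array([mse(y, stand_in_model(p, X)) for p in population])
    best_idx = int(np.argmin(fitness))
    best = population[best_idx].copy()
    best_fit = float(fitness[best_idx])
    history = [best_fit]
    for _ in range(iterations):  # stop criterion: fixed iteration budget
        # Placeholder update (NOT the MPA operators): pull toward the best
        # solution and add Gaussian noise.
        pull = rng.uniform(0.0, 1.0, size=(pop_size, 1)) * (best - population)
        noise = step * rng.normal(size=(pop_size, dim))
        candidates = population + pull + noise
        cand_fit = np.array([mse(y, stand_in_model(c, X)) for c in candidates])
        # Keep a candidate only if it lowers the MSE of its own solution.
        improved = cand_fit < fitness
        population[improved] = candidates[improved]
        fitness[improved] = cand_fit[improved]
        # Save the solution with the lowest MSE as the best solution.
        best_idx = int(np.argmin(fitness))
        if fitness[best_idx] < best_fit:
            best = population[best_idx].copy()
            best_fit = float(fitness[best_idx])
        history.append(best_fit)
    return best, best_fit, history
```

In an actual MPA-ANFIS run, the placeholder update would be replaced by the MPA operators and `stand_in_model` by the ANFIS network evaluated with the candidate parameters.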
MPA-ANFIS was evaluated using well-known performance measures, namely root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and coefficient of determination ($R^2$). The MPA-ANFIS stages are illustrated in Figure 2.
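The four performance measures can be computed as in the sketch below. The helper name `forecast_metrics` is hypothetical, and $R^2$ is taken here as one minus the ratio of the residual sum of squares to the total sum of squares, which is an assumption; the paper may instead use the squared correlation between target and output.

```python
import numpy as np

def forecast_metrics(target, output):
    """Return RMSE, MAE, MAPE (in percent), and R^2 for a forecast."""
    g = np.asarray(target, dtype=float)
    d = np.asarray(output, dtype=float)
    err = g - d
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / g)) * 100.0  # assumes no zero targets
    r2 = 1.0 - np.sum(err ** 2) / np.sum((g - g.mean()) ** 2)
    return {"RMSE": float(rmse), "MAE": float(mae),
            "MAPE": float(mape), "R2": float(r2)}
```

Lower RMSE, MAE, and MAPE values indicate a better fit, whereas $R^2$ values closer to 1 indicate a better fit.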
5. Discussion
In this paper, we proposed a modified ANFIS model using a new optimization algorithm, called MPA, to forecast the number of confirmed cases of COVID-19 in four different countries, Italy, Iran, Korea, and the USA.
By analyzing the relation of confirmed cases (RCC) between the confirmed cases and the four countries' areas, we could note that there was a positive relation in all countries. The area of Italy was the smallest one among the four countries (301,339 km²), and the RCC was the highest one, equaling 10.29%, whereas the USA had the largest area (9,525,067 km²), and the RCC was the smallest one, equaling 0.44%. The RCC of Korea (100,210 km²) was 4.25%, and the RCC of Iran (1,648,195 km²) was 1.13%.
From the analysis of forecasting confirmed COVID-19 cases for the four countries, it could be observed that the confirmed cases rate increased between 2% and 42% in Italy and between 8% and 40% in the USA, whereas, in Iran and Korea, it increased between 3% and 13% and 0.5% and 3%, respectively.
In this study, we proposed an alternative COVID-19 forecasting model that depended on improving the quality of the ANFIS model using MPA. The proposed MPA-ANFIS was applied to COVID-19 datasets from four countries. The main aim of using those datasets was to test the ability of MPA-ANFIS to work with data collected from different countries, each of which had its own dynamics and internal conditions.
The results of the improved ANFIS using MPA seemed to suggest that the COVID-19 curve for the USA, Iran, and Italy had an exponential form, whereas for Korea, after 13 March, it increased by small numbers. From the previous analysis, it could be concluded that the developed MPA-ANFIS model provided better results than the other models over all the tested datasets. However, the proposed MPA-ANFIS suffered from some limitations; for example, its computational time seemed to be higher than that of other models in some cases. In addition, ANFIS needed some improvement in its structure to avoid the over-fitting problem, in which the model fits the training set well but cannot provide the optimal response when the testing set is applied to the learned model. Furthermore, the traditional MPA still needed more improvement, since an analysis of its behavior showed that its exploitation ability was weaker than its exploration ability.
For more improvement and investigation, the mobility and transportation data between countries and within a country need to be addressed in future work, which may reveal the real reason for this terrifying spread of COVID-19. However, access to these records requires more time.