#### **1. Introduction**

With the intensification of global warming, climate anomalies and natural disasters have become more frequent and severe, and these environmental changes have created favorable conditions for the spread of hand-foot-and-mouth disease (HFMD) [1,2]. Although HFMD is usually not a critical illness, many children still develop serious complications from it. Without timely treatment, complications such as myocarditis and encephalitis can occur, damaging vital organs and even threatening life [3].

The prevalence of HFMD in China has continued unabated and has received great attention from the national health authorities. Prevention and treatment of HFMD should cut off transmission at its source. However, HFMD is caused by many kinds of viruses, each with many types. Research on predicting the number of HFMD cases, on early warning of epidemic trends, and on related factors has therefore become a top priority for the country's HFMD epidemic control [4].

However, in previous studies on HFMD prediction and early-warning models, researchers mainly conducted statistical analysis of factors related to the HFMD epidemic [5]. Using weather and demographic attributes, they explored the correlation between each factor and the incidence of HFMD in different regions at different times, and established a variety of prediction models. However, these methods lack precise targeting and research on HFMD epidemic warning: the data used to build the prediction models are insufficient, the temporal and spatial scales of the data are too coarse, and the modeling methods are imperfect. These shortcomings make HFMD prediction and early-warning models inaccurate and limit them to broad prediction and blind early warning.

**Citation:** Lin, X.; Wang, X.; Wang, Y.; Du, X.; Jin, L.; Wan, M.; Ge, H.; Yang, X. Optimized Neural Network Based on Genetic Algorithm to Construct Hand-Foot-and-Mouth Disease Prediction and Early-Warning Model. *Int. J. Environ. Res. Public Health* **2021**, *18*, 2959. https://doi.org/10.3390/ijerph18062959

Academic Editor: Tim Hulsen

Received: 8 February 2021; Accepted: 8 March 2021; Published: 14 March 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

This paper aims to carry out accurate data analysis and standardized data preprocessing based on HFMD incidence data. It also establishes a more accurate prediction model for the number of HFMD cases and a more reasonable early-warning model. Together, these lay the foundation for early warning of whether HFMD has broken out in a region and whether prevention and control should be strengthened.

#### **2. Related Work**

The World Health Organization (WHO) attaches great importance to establishing early-warning systems for infectious diseases; it has developed an early-warning mechanism for infectious diseases and promoted its important role [6]. The principle of an early-warning system is to judge whether an outbreak or epidemic of an infectious disease will occur, based on clinical information from patients who have already been diagnosed. Its purpose is to detect abnormal health events promptly, quickly notify the relevant health departments and staff, and take prevention and control measures as early as possible. Such an early-warning system for infectious diseases is a symptom monitoring system [7,8].

Since its establishment in 1946, the Centers for Disease Control and Prevention (CDC) has established a national infectious disease surveillance system for epidemic infectious diseases such as malaria and influenza. It was not until 1995 that it began to build a monitoring network for all types of acute infectious diseases, and in 2001 it integrated more than 100 scattered infectious disease surveillance systems. Since then, the monitoring and early-warning system has been renamed the "National Disease Electronic Monitoring System" [9].

The European Union (EU) has developed a group-based infectious disease monitoring and early-warning system built on cooperation among all member states. It provides a collaborative platform for information sharing and for prevention and control among the countries in the group, while also emphasizing international cooperation and exchange [10].

In January 2004, China began trial operation of the direct online reporting system for epidemics and public health emergencies, and the system was officially launched in April of the same year [11]. Afterwards, direct online reporting of various infectious diseases such as tuberculosis, dengue fever, and HFMD was launched on the system, and public health information resources were integrated and shared. At present, China's infectious disease early warning mainly uses the moving percentile method and the spatial scan statistic method [12,13]. However, these two methods rely too heavily on the direct reporting system for infectious diseases, and because the models are simple, many parameters are set manually. This has led to poor early-warning accuracy, repeated warnings, missing warnings during epidemic periods, and chaotic warnings during non-epidemic periods, which has seriously affected early-warning work for HFMD epidemics.

At present, most research on the prevalence of HFMD relies on statistical methods such as multiple linear regression [14], cross-correlation analysis, and correlation analysis. These studies analyzed the correlation between related influencing factors and the incidence of HFMD, and obtained statistically significant results. The results mostly demonstrated a correlation between certain epidemic factors and the number of HFMD cases; the incidence of HFMD was also predicted using the three methods mentioned above.

Yin Ye et al. compiled the daily incidence of HFMD for six years starting in 2011, calculated the correlation coefficient between the daily incidence of HFMD and twelve weather indicators for the same day, and concluded that the daily incidence of HFMD is correlated with certain meteorological factors [15]. Jing Qinlong et al. used cross-correlation analysis to study relevant meteorological factors. They found that the relationship of monthly average temperature and monthly cumulative precipitation with the monthly number of HFMD cases becomes stronger as the lag period decreases [16]. Liu Yamin et al. established several prediction models using monthly incidence data from 2010 to 2015, then used the monthly HFMD incidence data for 2016 as test data for the models [17]. Comparing four objective evaluation indicators, they found that the seasonal autoregressive integrated moving average (SARIMA) model has both excellent fitting and generalization ability and higher prediction accuracy.
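As an illustration of the lagged correlation analysis these studies describe, the following is a minimal sketch that computes a Pearson correlation between a weather factor and case counts at several lags. The function names and all numbers are assumptions made up for this example; they are not data or code from the cited studies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def lagged_correlation(factor, cases, lag):
    """Correlate factor[t - lag] with cases[t] for a non-negative lag."""
    if lag == 0:
        return pearson(factor, cases)
    return pearson(factor[:-lag], cases[lag:])


# Made-up monthly values, for illustration only. The case series is
# constructed so that it follows temperature with a one-month delay.
temperature = [2, 5, 11, 17, 22, 26, 28, 27, 21, 15, 8, 3]
cases = [30, 20, 50, 110, 170, 220, 260, 280, 270, 210, 150, 80]

correlations = {lag: lagged_correlation(temperature, cases, lag) for lag in range(4)}
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

Scanning a small range of lags and keeping the one with the strongest correlation is one simple way to choose a lag period; the cited studies may have used different procedures.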

As we have entered the era of big data, many scholars have begun to design methods that use big data to help build more accurate disease prediction or early-warning models [18,19]. In this paper, we present our effort to construct an HFMD prediction and early-warning model with the help of big data.

#### **3. Construction of the HFMD Prediction Algorithm Model Based on BP Neural Network**

Figure 1 shows the overall process of the HFMD prevalence prediction model based on the back-propagation (BP) neural network constructed in this paper, which is introduced in detail below.

#### *3.1. Data Acquisition and Analysis*

This paper uses big data to build a predictive and early-warning model for HFMD through multi-dimensional data fusion. The data used mainly include two parts: incidence data and environmental data.

First, the incidence data are HFMD case records from Shanxi Province in 2016. The data contain no personal private information; the fields are region (township), date of onset, age group, gender group, and population classification.

For the incidence data, we carried out exploratory data analysis to select appropriate characteristic factors affecting the HFMD epidemic for the model construction process, mainly analyzing indicators such as gender, population type, onset time, and patient age.


**Figure 1.** Flow chart of HFMD epidemic prediction model based on BP neural network.

**Figure 2.** Sex ratio of HFMD patients.

**Figure 3.** HFMD's proportion of each population category.

**Figure 4.** HFMD's proportion of each population category.

**Figure 5.** Age distribution of HFMD epidemic. x-axis is in years.

After the epidemic analysis of the original data, the number of cases per day in each district and county is tallied, and only the date, area, and case count are retained from the original data file.
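The aggregation step can be sketched as follows. This is a minimal illustration only: the record layout, the district names, and the function name `daily_counts` are assumptions, not the authors' actual data format or code.

```python
from collections import Counter

# Hypothetical case records: (district, onset_date, age_group, gender).
records = [
    ("District A", "2016-05-01", "0-6", "M"),
    ("District A", "2016-05-01", "0-6", "F"),
    ("District B", "2016-05-01", "0-6", "M"),
    ("District A", "2016-05-02", "7-12", "M"),
]


def daily_counts(records):
    """Count cases per (date, district), dropping all other fields."""
    return Counter((onset_date, district) for district, onset_date, *_ in records)


counts = daily_counts(records)
for (onset_date, district), n in sorted(counts.items()):
    print(onset_date, district, n)
```

Keying the counter on the (date, district) pair keeps exactly the three retained columns: date, area, and case count.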

Then, according to the results of the epidemic analysis, the daily weather of each district and county was obtained. Because the weather data come from different sources and are acquired in different ways, some weather data (maximum temperature, minimum temperature, wind level) were obtained from lishi.tianqi.com by a web crawler, while the rest (sunshine duration, air humidity, average air pressure) were obtained by file download. Because this second group of weather data is available only for meteorological stations in the province and the indicators are stable, these three variables were downloaded for 18 meteorological stations in Shanxi Province from http://data.sheshiyuanyi.com/WeatherData/ (accessed on 1 March 2020), and each statistical area was assigned the data of its nearest meteorological station. To find the nearest station, the geographical locations (latitude and longitude) of the areas and the meteorological stations were crawled beforehand. To account for the effect of the incubation period (usually 4 days) on daily incidence, the corresponding weather data from 4 days earlier were also obtained in the same way, covering maximum temperature, minimum temperature, wind level, average sunshine duration, average air humidity, and average air pressure.
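The nearest-station assignment described above could look roughly like the sketch below. The text does not say which distance measure was used; the haversine great-circle distance here is an assumption, and the station names and coordinates are placeholders rather than the actual Shanxi stations.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_station(area, stations):
    """Return the name of the station closest to area = (lat, lon)."""
    return min(stations, key=lambda name: haversine_km(*area, *stations[name]))


# Placeholder coordinates (latitude, longitude), for illustration only.
stations = {"S1": (37.9, 112.5), "S2": (40.1, 113.3), "S3": (35.0, 111.0)}
areas = {"Area X": (37.7, 112.7), "Area Y": (35.3, 110.9)}

assignment = {name: nearest_station(coords, stations) for name, coords in areas.items()}
print(assignment)
```

With only 18 stations, a linear scan per area is sufficient; a spatial index would only matter for much larger station sets.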

Finally, according to the results of the HFMD epidemic analysis, population data also need to be summarized, so the internal population data were used to count the number of children aged 0–6 in each region, and the result was integrated into the data file generated in the previous step. In addition, to account for the influence of the previous day's incidence on the current day, the number of cases from the previous day is included as a model feature, yielding the complete data set for establishing the HFMD epidemic prediction model.
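Putting the pieces together for one region, a feature row might be assembled as in the sketch below: weather from 4 days earlier, the count of children aged 0–6, and the previous day's cases. All field names, the helper `build_row`, and the numbers are hypothetical; the actual file layout is not given in the text.

```python
from datetime import date, timedelta

INCUBATION_DAYS = 4  # incubation period stated in the text


def build_row(day, cases, weather, children_0_6):
    """Build one feature row for `day`, or None if a lagged input is missing."""
    weather_day = day - timedelta(days=INCUBATION_DAYS)
    previous_day = day - timedelta(days=1)
    if day not in cases or previous_day not in cases or weather_day not in weather:
        return None
    row = {"date": day.isoformat()}
    # Weather observed INCUBATION_DAYS earlier, suffixed to mark the lag.
    row.update({f"{key}_lag4": value for key, value in weather[weather_day].items()})
    row["children_0_6"] = children_0_6
    row["cases_prev_day"] = cases[previous_day]
    row["cases"] = cases[day]  # prediction target
    return row


# Made-up values for a single region, 1 to 6 May 2016.
days = [date(2016, 5, d) for d in range(1, 7)]
cases = dict(zip(days, [3, 4, 6, 5, 7, 9]))
weather = {d: {"max_temp": 20 + d.day, "min_temp": 10 + d.day} for d in days}

candidate_rows = (build_row(d, cases, weather, 1200) for d in days)
rows = [row for row in candidate_rows if row is not None]
for row in rows:
    print(row)
```

Days whose lagged weather or previous-day count is unavailable are skipped here; how the authors handled such boundary days is not stated.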

#### *3.2. Data Preprocessing*

Data preprocessing greatly influences the results of data analysis [20].

#### 3.2.1. Missing Value Processing

Among the data on factors affecting the spread of HFMD, some values are missing from the daily weather data or population data of districts and counties, so appropriate methods must be used to handle them. First, for variables that were not collected and for records in which most variables are missing, simple deletion is used: the variables or records are removed and excluded from the experiments and data analysis. Then, nearest-neighbor filling is used for attributes whose values are stable and have small variance. Finally, mean filling is used where only a small part of the data is missing.
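The three strategies above can be sketched as follows. Several details are assumptions, since the text does not specify them: the 50% threshold for dropping a variable, reading "nearest neighbor" as the closest non-missing value in the series, and all column names and values. Dropping sparse records (rows) would follow the same pattern as dropping sparse variables.

```python
def drop_sparse_columns(rows, max_missing=0.5):
    """Drop columns whose share of missing (None) values exceeds max_missing."""
    columns = list(rows[0].keys())
    keep = [c for c in columns
            if sum(r[c] is None for r in rows) / len(rows) <= max_missing]
    return [{c: r[c] for c in keep} for r in rows]


def fill_nearest(values):
    """Fill each None with the closest non-missing value by position.

    Ties are resolved in favour of the earlier position.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)
    return [v if v is not None else values[min(known, key=lambda k: abs(k - i))]
            for i, v in enumerate(values)]


def fill_mean(values):
    """Fill each None with the mean of the non-missing values."""
    known = [v for v in values if v is not None]
    if not known:
        return list(values)
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


# Made-up daily records for one area; None marks a missing value.
rows = [
    {"pressure": 1010.0, "humidity": 40.0, "cloud": None},
    {"pressure": None, "humidity": 50.0, "cloud": None},
    {"pressure": 1012.0, "humidity": None, "cloud": None},
    {"pressure": 1013.0, "humidity": 60.0, "cloud": 3.0},
]

cleaned = drop_sparse_columns(rows)                        # "cloud" is removed
pressure = fill_nearest([r["pressure"] for r in cleaned])  # stable attribute
humidity = fill_mean([r["humidity"] for r in cleaned])     # few values missing
print(pressure, humidity)
```

Applying deletion first keeps the two filling methods from inventing values for variables that carry almost no information.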
