*3.4. Construction of HFMD Prediction Model*

Commonly used machine learning regression prediction algorithms include multiple linear regression (LR), support vector regression (SVR), autoregressive integrated moving average (ARIMA) models, and BP neural networks [15]. Based on our analysis, we select the BP neural network to construct an early prediction model for HFMD from big data.

The analysis of related factors yielded seven relevant variables. These seven variables are normalized by Min-Max scaling, and the data are then randomly split into a training set and a test set at a ratio of 7:3. The training set is fed into the prediction model, whose parameters are trained and tuned. After training, an HFMD epidemic prediction model suited to this problem is obtained. The test set is then fed into the model, and the predictions are compared with the expected outputs to evaluate the model.
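The preprocessing described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the fixed random seed, and the toy data are assumptions made for the example, not details taken from the study.

```python
import random

def min_max_normalize(rows):
    """Scale each column of `rows` to [0, 1] using that column's min and max."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0  # a constant column maps to 0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of `rows` and split it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy data: 10 weekly records with 7 variables each.
data = [[float(i + j) for j in range(7)] for i in range(10)]
normalized = min_max_normalize(data)
train, test = train_test_split(normalized)
print(len(train), len(test))  # 7 3
```

The sketch normalizes before splitting, following the order given in the text.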

The structure of the HFMD prediction model based on the machine learning regression algorithm is shown in Figure 6. It comprises six modules: data acquisition and summary, data preprocessing, influencing factor analysis, model learning, epidemic case number prediction, and model evaluation analysis. In the data acquisition and summary module, meteorological and demographic data related to the HFMD epidemic are acquired from multiple sources, and the county-level daily data are aggregated into city-level weekly data. The data preprocessing module mainly cleans dirty data. The influencing factor analysis module carries out univariate, bivariate, and multivariate joint analyses of the correlation between the influencing factors and the number of HFMD cases, and selects the set of influencing-factor features suitable for modeling. The model learning module trains the machine learning regression model to obtain its optimal structure. In the epidemic case number prediction module, the test set is input into the model. In the model evaluation analysis module, the learned optimal model is analyzed with different weights on the training set and the test set, and the resulting evaluation index values are used to judge the quality of the model.

**Figure 6.** Structure diagram of HFMD epidemic prediction model based on machine learning regression algorithm.

Figure 7 shows the three-layer structure of the BP neural network used in this article. The numbers of neurons in the three layers are *m*, *k*, and 1, respectively. The number of hidden layers and the number of neurons in each hidden layer can be adjusted dynamically according to the training effect, whereas the numbers of neurons in the first and last layers are fixed. The training process of the BP neural network is realized by an error back-propagation mechanism [16]. The activation function used is the ReLU function.

**Figure 7.** Three-layer single output BP neural network structure diagram.
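As an illustration of the *m*-*k*-1 structure and of one error back-propagation update, consider the sketch below. It assumes a ReLU hidden layer, a linear output neuron, no bias terms, and a squared-error loss; none of these choices beyond the ReLU activation is stated in the text, and the weights and sample are hypothetical.

```python
def relu(z):
    return z if z > 0.0 else 0.0

def forward(x, w_hidden, w_out):
    """Return (hidden pre-activations, hidden activations, scalar output).

    x: m inputs; w_hidden: k rows of m weights; w_out: k weights.
    """
    z = [sum(w * v for w, v in zip(row, x)) for row in w_hidden]
    h = [relu(v) for v in z]
    y_hat = sum(w * a for w, a in zip(w_out, h))
    return z, h, y_hat

def backprop_step(x, y, w_hidden, w_out, lr=0.01):
    """One gradient-descent update on the squared error 0.5 * (y_hat - y)**2."""
    z, h, y_hat = forward(x, w_hidden, w_out)
    err = y_hat - y
    # Output weights: dL/dw_out[j] = err * h[j]
    new_out = [w - lr * err * a for w, a in zip(w_out, h)]
    # Hidden weights: dL/dw_hidden[j][i] = err * w_out[j] * relu'(z[j]) * x[i]
    new_hidden = [
        [w - lr * err * w_o * (1.0 if z_j > 0.0 else 0.0) * v
         for w, v in zip(row, x)]
        for row, w_o, z_j in zip(w_hidden, w_out, z)
    ]
    return new_hidden, new_out

# Hypothetical network with m = 2 inputs and k = 3 hidden neurons.
x, y = [1.0, 2.0], 2.0
w_hidden = [[0.5, -0.25], [-1.0, 0.5], [1.0, 1.0]]
w_out = [1.0, 1.0, 0.5]
print(forward(x, w_hidden, w_out)[2])  # 1.5
w_hidden2, w_out2 = backprop_step(x, y, w_hidden, w_out)
```

After the update, the prediction for this sample moves from 1.5 toward the target of 2.0, which is the behavior one step of gradient descent with a small learning rate should show.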

The model training process is:


#### **4. Neural Network Parameter Optimization Based on GA**

In the training process of the BP neural network, the connection weights are updated dynamically by gradient descent and error back-propagation, and this training method has several shortcomings [21]. First, it is sensitive to hyperparameters such as the learning rate: a learning rate that is too large or too small degrades the optimization. Second, if the number of training iterations is too large, convergence is inefficient once the error function flattens in the later stages, whereas if the number is too small, training is unlikely to reach a flat point. Finally, because training starts from a randomly initialized set of weights and follows the gradient, it can fall into a local minimum from which it is difficult to escape to the global minimum. In view of these shortcomings, we use a genetic algorithm (GA) to globally optimize the connection weights.

First, the BP network structure is determined, and each individual is encoded with floating-point numbers. All the connection weights in the neural network are arranged in order to form the row vector *Wj* = (*w*1, *w*2, ··· , *wn*) of individual *j*, which represents the genetic code of chromosome *j* in the population. The weight *wi* of connection *i* in the network represents gene *i* on the chromosome, and *n* is the number of connections in the neural network. In this model, if the numbers of neurons in the input layer, the first hidden layer, the second hidden layer, and the output layer are *m*, *k*, *h*, and 1, respectively, then the number of genes *n* is given by Equation (1).

$$n = m \times k + k \times h + h \tag{1}$$
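The encoding can be illustrated with a short sketch that counts the genes as *n* = *m* × *k* + *k* × *h* + *h* and converts between a flat chromosome and per-layer weight matrices. The layer sizes *k* = 5 and *h* = 3, the function names, and the gene ordering (input-to-hidden weights first, output weights last) are assumptions for illustration; like the count above, the sketch has no bias terms.

```python
def gene_count(m, k, h):
    """Number of connection weights in an m-k-h-1 network without biases."""
    return m * k + k * h + h

def decode(chromosome, m, k, h):
    """Split a flat chromosome into three weight blocks (one row per target neuron)."""
    if len(chromosome) != gene_count(m, k, h):
        raise ValueError("chromosome length does not match the network structure")
    genes = iter(chromosome)
    w1 = [[next(genes) for _ in range(m)] for _ in range(k)]  # input -> hidden 1
    w2 = [[next(genes) for _ in range(k)] for _ in range(h)]  # hidden 1 -> hidden 2
    w3 = [next(genes) for _ in range(h)]                      # hidden 2 -> output
    return w1, w2, w3

def encode(w1, w2, w3):
    """Flatten the three weight blocks back into one chromosome."""
    return [w for row in w1 for w in row] + [w for row in w2 for w in row] + list(w3)

# Seven input variables (as in the text); k and h are hypothetical.
m, k, h = 7, 5, 3
n = gene_count(m, k, h)
print(n)  # 53
chromosome = [float(i) for i in range(n)]
w1, w2, w3 = decode(chromosome, m, k, h)
```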

Second, all individuals in the population are initialized randomly. Each individual has a chromosome vector of floating-point numbers that can be decoded back into a BP neural network model. Therefore, before learning, every individual vector is initialized with random real numbers in the range [−1, 1] to generate the first-generation population.
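A minimal sketch of this initialization step is given below. The population size of 20, the chromosome length of 53, and the use of a seeded generator are assumptions chosen for the example.

```python
import random

def init_population(pop_size, n_genes, seed=None):
    """Create `pop_size` chromosomes, each with `n_genes` values drawn from [-1, 1]."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
            for _ in range(pop_size)]

# Hypothetical sizes: 20 individuals, 53 genes each.
population = init_population(pop_size=20, n_genes=53, seed=42)
print(len(population), len(population[0]))  # 20 53
```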

Finally, the fitness of every individual is calculated, and individuals are selected for the genetic operations of replication, crossover, and mutation to generate a new generation of the population [22]. In this paper, fitness is calculated directly from the individual's average error on the samples. Individuals with high fitness, i.e., small errors, are retained, and individuals with low fitness, i.e., large errors, are eliminated, so that poorly fitting neural networks are discarded during training. Uniform mutation and arithmetic crossover are then applied to general individuals to obtain a new generation. When the specified number of generations has been reached, the optimal model is obtained; otherwise, the procedure returns to the fitness calculation step and iterates again. The final individual is decoded to obtain all the connection weights of the neural network, which are used for actual prediction and for computing the quantitative evaluation indicators. Figure 8 shows a detailed flowchart of the GA-BP HFMD prediction model.

**Figure 8.** Flow chart of GA-BP HFMD epidemic prediction model.
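The evolutionary loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of Figure 8: survivors are the individuals with the smallest error, children are produced from pairs of survivors by arithmetic crossover followed by uniform mutation, and the error function is a toy stand-in (the mean absolute error of a two-weight linear predictor) rather than a decoded BP network. All names, sizes, and rates are hypothetical.

```python
import random

def arithmetic_crossover(p1, p2, rng):
    """Child is a random convex combination of the two parents."""
    a = rng.random()
    return [a * g1 + (1.0 - a) * g2 for g1, g2 in zip(p1, p2)]

def uniform_mutation(chrom, rate, rng):
    """Each gene is replaced by a fresh value from [-1, 1] with probability `rate`."""
    return [rng.uniform(-1.0, 1.0) if rng.random() < rate else g for g in chrom]

def evolve(error_fn, n_genes, pop_size=30, n_keep=10, generations=50,
           rate=0.1, seed=0):
    """Minimize `error_fn` over chromosomes; lower error means higher fitness."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=error_fn)            # smallest error first
        history.append(error_fn(population[0]))
        survivors = population[:n_keep]          # replication of the fittest
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            child = arithmetic_crossover(p1, p2, rng)
            children.append(uniform_mutation(child, rate, rng))
        population = survivors + children
    best = min(population, key=error_fn)
    return best, history + [error_fn(best)]

# Toy stand-in for "average error of the individual on the samples".
samples = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.25), ([1.0, 1.0], 0.25)]

def mean_abs_error(chrom):
    return sum(abs(sum(w * x for w, x in zip(chrom, xs)) - y)
               for xs, y in samples) / len(samples)

best, history = evolve(mean_abs_error, n_genes=2)
print(history[0], history[-1])
```

Because the survivors are carried over unchanged, the best error recorded in `history` never increases from one generation to the next. In a GA-BP setting, `error_fn` would instead decode the chromosome into network weights and evaluate the network on the training samples.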
