*2.2. Back Propagation Neural Networks (BPNN)*

BPNNs discover intricate structures in large datasets by using the backpropagation algorithm to indicate how a machine should change its internal parameters [30]. The ARMA model in this paper is a linear time-invariant system, which can be effectively simulated by a BPNN. A BPNN is a multi-layer feedforward neural network trained with the error back propagation algorithm. The complete network structure is composed of a large number of neurons. A typical BPNN consists of three layers: an input layer, a hidden layer, and an output layer. Generally, normalized data are used as the input layer. Unlike the input layer, the neurons in the hidden layer and the output layer perform computations and are defined similarly. For these neurons, each iteration consists of two parts: forward propagation and back propagation. The forward propagation of a single neuron consists of two steps: first, compute *z* from the weights and bias, and then compute *a* through an activation function *g*(*x*), where *a* is the input to the next hidden layer or to the output layer. According to the result of forward propagation, the weights and biases are updated through back propagation, which in a BPNN is computed by the gradient descent method. After many iterations, the neural network can fit the data with small error. It is worth noting that the activation functions of the hidden layer and the output layer can differ. Figure 1 shows the single neuron calculation and the BPNN structure.

**Figure 1.** Single neuron calculation and BPNN structure.
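
To make the two propagation phases concrete, the following is a minimal sketch of one BPNN training step, assuming a single hidden layer with a tanh activation, a linear output layer, and a squared-error loss; all names and array shapes are illustrative rather than taken from the paper.

```python
import numpy as np

# Illustrative shapes: X is (n, batch), y is (1, batch),
# W1 is (m, n), b1 is (m, 1), W2 is (1, m), b2 is (1, 1).

def forward(X, W1, b1, W2, b2):
    # Forward propagation: z = Wx + b, then a = g(z) (here g = tanh).
    Z1 = W1 @ X + b1              # hidden-layer pre-activations
    A1 = np.tanh(Z1)              # hidden-layer activations
    Z2 = W2 @ A1 + b2             # linear output layer
    return Z1, A1, Z2

def train_step(X, y, W1, b1, W2, b2, lr=0.01):
    # Back propagation: one gradient-descent update of all weights and
    # biases for the squared-error loss 0.5 * (Z2 - y)^2.
    _, A1, Z2 = forward(X, W1, b1, W2, b2)
    dZ2 = Z2 - y                            # dL/dZ2
    dW2 = dZ2 @ A1.T
    db2 = dZ2.sum(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * (1.0 - A1**2)      # tanh'(z) = 1 - tanh(z)^2
    dW1 = dZ1 @ X.T
    db1 = dZ1.sum(axis=1, keepdims=True)
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= lr * grad                  # gradient descent, in place
    return 0.5 * float(np.mean(dZ2**2))
```

Repeating `train_step` over the data reproduces the iteration described above: forward propagation computes *z* and *a*, and back propagation updates the weights and biases by gradient descent.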

When using the BPNN to simulate the ARMA model, we expect this method to estimate the parameters of the ARMA model and to yield a method for determining the order of the model. Hossain et al., 2020, studied artificial neural networks (ANN) for determining the order of the ARMA model, but their method did not consider the influence of bias in the derivation of the formulas [20]. Therefore, we re-derive the relevant formulas. Equation (1) can be rewritten in the following form

$$X\_t = \sum\_{i=1}^p \phi\_i X\_{t-i} + \sum\_{j=0}^q \theta\_j Z\_{t-j} \qquad (\theta\_0 = 1) \tag{2}$$

where *X_t* is the time series, *Z_t* is the noise sequence, and *φ_i* and *θ_j* are the coefficients of the ARMA model.
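
For illustration, a time series following Equation (2) can be simulated directly. In this hedged sketch, `phi` and `theta` hold *φ_1*, …, *φ_p* and *θ_1*, …, *θ_q* (the term *θ_0* = 1 is absorbed into the leading noise term), and pre-sample values are initialized to zero.

```python
import numpy as np

def simulate_arma(phi, theta, n, rng=None):
    # Simulate X_t = sum_i phi_i X_{t-i} + sum_j theta_j Z_{t-j}  (Eq. 2),
    # with theta_0 = 1 handled as the leading noise term Z_t.
    rng = rng or np.random.default_rng(0)
    p, q = len(phi), len(theta)
    Z = rng.standard_normal(n + q)   # white-noise sequence Z_t
    X = np.zeros(n + p)              # zero-initialized pre-sample values
    for t in range(n):
        ar = sum(phi[i] * X[p + t - 1 - i] for i in range(p))
        ma = Z[q + t] + sum(theta[j] * Z[q + t - 1 - j] for j in range(q))
        X[p + t] = ar + ma
    return X[p:]

# Example: ARMA(2, 1) with phi = (0.5, -0.3) and theta_1 = 0.4.
x = simulate_arma([0.5, -0.3], [0.4], n=500)
```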

Next, we compare the calculation performed by the BPNN with Equation (2). Figure 1 shows the processing of the input data by a single neuron. Accordingly, the calculation performed by the hidden-layer neurons on the input data is as follows

$$
\begin{bmatrix} z\_1 \\ z\_2 \\ z\_3 \\ \vdots \\ z\_m \end{bmatrix} = \begin{bmatrix} w\_{11} & w\_{12} & w\_{13} & \cdots & w\_{1n} \\ w\_{21} & w\_{22} & w\_{23} & \cdots & w\_{2n} \\ w\_{31} & w\_{32} & w\_{33} & \cdots & w\_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w\_{m1} & w\_{m2} & w\_{m3} & \cdots & w\_{mn} \end{bmatrix} \cdot \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \\ \vdots \\ x\_n \end{bmatrix} + \begin{bmatrix} b\_1 \\ b\_2 \\ b\_3 \\ \vdots \\ b\_m \end{bmatrix} \tag{3}
$$

or

$$Z^{[1]} = W^{[1]}X + b^{[1]}$$

Next,


$$\begin{bmatrix} a\_1 \\ a\_2 \\ a\_3 \\ \vdots \\ a\_m \end{bmatrix} = \begin{bmatrix} g\_1(z\_1) \\ g\_1(z\_2) \\ g\_1(z\_3) \\ \vdots \\ g\_1(z\_m) \end{bmatrix} \tag{4}$$

or

$$a^{[1]} = g\_1(Z^{[1]})$$

where *X* is a column vector composed of the input data, *W*^[1] is the weight matrix whose rows are the weight vectors *w_i*^T, *b*^[1] is a column vector composed of the biases, *g*_1 is the activation function used, and *a*^[1] is a column vector composed of the activation values. The output of the proposed BPNN can be written as follows

$$\begin{bmatrix} z\_1 \end{bmatrix} = \begin{bmatrix} w\_{11} & w\_{12} & w\_{13} & \cdots & w\_{1m} \end{bmatrix} \cdot \begin{bmatrix} a\_1 \\ a\_2 \\ a\_3 \\ \vdots \\ a\_m \end{bmatrix} + \begin{bmatrix} b\_1 \end{bmatrix} \tag{5}$$

or

$$Z^{[2]} = W^{[2]}a^{[1]} + b^{[2]}$$

Next,

$$\hat{y} = a^{[2]} = g\_2(Z^{[2]}) \tag{6}$$

where each symbol is defined analogously to before. In this process, if we neglect the bias and the activation function of the hidden layer, and at the same time set the activation function of the output layer to the linear activation function, we obtain the following result

$$\hat{y} = W^{[2]} W^{[1]} X \tag{7}$$

Equation (7) is consistent with the conclusion derived by Hossain et al., 2020 [20].
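
The collapse to Equation (7) is easy to verify numerically: with the biases and activation functions removed, the two-layer network is a single linear map, and the row vector *W*^[2]*W*^[1] plays the role of the ARMA coefficient vector. A small sketch with illustrative shapes:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 8                        # input length and hidden width (illustrative)
W1 = rng.standard_normal((m, n))   # hidden-layer weights, as in Eq. (3)
W2 = rng.standard_normal((1, m))   # output-layer weights, as in Eq. (5)
X = rng.standard_normal((n, 1))

# Without bias and with linear activations: y = W2 (W1 X) = (W2 W1) X, Eq. (7).
y = W2 @ (W1 @ X)
coeffs = W2 @ W1                   # 1 x n row vector of coefficient estimates
assert np.allclose(y, coeffs @ X)
```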

Although the method described in Equation (7) can easily yield the coefficient estimates of the ARMA model, it does not consider the influence of the bias and the nonlinear activation function on the neural network. The existence of bias is of great significance to the operation of neural networks: it can improve the accuracy of neural network classification and reduce the noise in the evaluation process [31]. When we add bias, although the performance of the neural network iteration improves, the coefficient estimates can no longer be obtained as easily as in Equation (7). This is because, owing to the bias column vector, the ARMA coefficients can no longer be read off directly from the weight matrices of the BPNN. Our other improvement to Equation (7) is the addition of a nonlinear activation function, not only because a nonlinear activation function better exploits the computational capacity of the neural network, but also because the coefficient estimation of the ARMA model is itself a nonlinear process. In the symmetric formwork system, the BPNN can overcome the shortcoming of insufficient randomness in ARMA order estimation. It is worth noting that the coefficients of the model can be better estimated by the least squares method once the order of the model is accurately determined [32,33].
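
As a point of comparison for the least-squares route mentioned above, the following is a minimal conditional least-squares sketch for the pure AR case once the order *p* is fixed; estimating the MA part would additionally require estimated residuals (e.g., via a two-stage regression), which is omitted here.

```python
import numpy as np

def fit_ar_ls(x, p):
    # Conditional least squares for an AR(p) model: regress x_t on its
    # p lagged values x_{t-1}, ..., x_{t-p} and solve for the phi_i.
    X_lag = np.column_stack([x[p - 1 - i : len(x) - 1 - i] for i in range(p)])
    y = x[p:]
    phi_hat, *_ = np.linalg.lstsq(X_lag, y, rcond=None)
    return phi_hat

# Example with the series simulated earlier (illustrative):
# phi_hat = fit_ar_ls(x, p=2)
```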
