*3.2. FFANN-BP*

In the FFANN-BP, the weighted sum of the inputs and the bias term is passed through the transfer function to produce the output of each neuron. The network is trained in an iterative process. The number of hidden layers is limited to one to reduce the network complexity and increase the computational efficiency. Figure 1 shows the architecture of the FFANN-BP [33]. The inputs are fed into the input layer and propagate forward through the activation functions; different layers may perform different transformations on their inputs. The mean squared error between the network outputs and the target values is then backpropagated from the output layer to the input layer, and it is minimized by adapting the connection weights in a supervised manner. The most important design problem is to decide the number of hidden layers and the number of neurons in each hidden layer.

Without loss of generality, let there be *n* neurons in the input layer, *p* neurons in the hidden layer, and *q* neurons in the output layer. The *k*-th input vector is *x*(*k*)=(*x*1(*k*), *x*2(*k*), ... , *xn*(*k*)). The *k*-th input vector of the hidden layer is *hi*(*k*)=(*hi*1(*k*), *hi*2(*k*), ... , *hip*(*k*)), and the *k*-th output vector of the hidden layer is *ho*(*k*)=(*ho*1(*k*), *ho*2(*k*), ... , *hop*(*k*)). The *k*-th input vector of the output layer is *yi*(*k*)=(*yi*1(*k*), *yi*2(*k*), ... , *yiq*(*k*)), and the *k*-th output vector of the output layer is *yo*(*k*)=(*yo*1(*k*), *yo*2(*k*), ... , *yoq*(*k*)). The desired output vector is *do*(*k*)=(*d*1(*k*), *d*2(*k*), ... , *dq*(*k*)). The weight between the *i*-th neuron in the input layer and the *h*-th neuron in the hidden layer is *wih*, and the weight between the *h*-th neuron in the hidden layer and the *o*-th neuron in the output layer is *who*, where *i* = 1, 2, ... , *n*, *h* = 1, 2, ... , *p*, and *o* = 1, 2, ... , *q*. The biases of the hidden layer and the output layer are *bh* and *bo*, respectively. The number of samples is *m*, and *f* is the activation function. The commonly used activation function is the sigmoid function:

$$f(x\_i) = \frac{1}{1 + e^{-x\_i}} \tag{2}$$

Each connection weight is initialized with a random number in the interval (−1, 1). *E*, *ε*, and *M* denote the error function, the required calculation accuracy, and the maximum number of learning iterations, respectively.
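As a minimal sketch of this setup, the following Python snippet implements the sigmoid activation of Eq. (2) and the random weight initialization in (−1, 1); the layer sizes *n*, *p*, *q* are placeholder values chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation f(x) = 1 / (1 + exp(-x)), Eq. (2)."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n, p, q = 4, 6, 2                      # input, hidden, output layer sizes (placeholders)
w_ih = rng.uniform(-1.0, 1.0, (n, p))  # weights between input and hidden layer
w_ho = rng.uniform(-1.0, 1.0, (p, q))  # weights between hidden and output layer
b_h = rng.uniform(-1.0, 1.0, p)        # hidden-layer biases
b_o = rng.uniform(-1.0, 1.0, q)        # output-layer biases
```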

**Figure 1.** The architecture of the FFANN-BP.

The *k*-th input sample *x*(*k*)=(*x*1(*k*), *x*2(*k*), ... , *xn*(*k*)) and its corresponding expected output *do*(*k*)=(*d*1(*k*), *d*2(*k*), ... , *dq*(*k*)) are randomly selected, and the input and output of each neuron in the hidden layer and the output layer are calculated.

$$hi\_h(k) = \sum\_{i=1}^{n} w\_{ih} x\_i(k) - b\_h, \quad h = 1, 2, \dots, p \tag{3}$$

$$ho\_h(k) = f(hi\_h(k)), \quad h = 1, 2, \dots, p \tag{4}$$

$$yi\_o(k) = \sum\_{h=1}^{p} w\_{ho} ho\_h(k) - b\_o, \quad o = 1, 2, \dots, q \tag{5}$$

$$yo\_o(k) = f(yi\_o(k)), \quad o = 1, 2, \dots, q \tag{6}$$
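A minimal sketch of the forward pass of Eqs. (3)–(6) for a single sample *x*(*k*) is given below; the helper name `forward` is illustrative, and `sigmoid`, `w_ih`, `w_ho`, `b_h`, and `b_o` follow the notation defined above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x_k, w_ih, w_ho, b_h, b_o):
    """Forward pass of Eqs. (3)-(6) for one input vector x(k)."""
    hi = x_k @ w_ih - b_h   # Eq. (3): hidden-layer input
    ho = sigmoid(hi)        # Eq. (4): hidden-layer output
    yi = ho @ w_ho - b_o    # Eq. (5): output-layer input
    yo = sigmoid(yi)        # Eq. (6): network output
    return hi, ho, yi, yo
```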

Then the total error is computed,

$$E = \frac{1}{2m} \sum\_{k=1}^{m} \sum\_{o=1}^{q} (d\_o(k) - yo\_o(k))^2 \tag{7}$$
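A sketch of the total error of Eq. (7) is shown below, where `D` holds the desired outputs *do*(*k*) and `YO` the network outputs *yo*(*k*) for all *m* samples; the helper name `total_error` is illustrative.

```python
import numpy as np

def total_error(D, YO):
    """Total error E of Eq. (7): squared error over m samples and q outputs, divided by 2m."""
    m = D.shape[0]
    return np.sum((D - YO) ** 2) / (2.0 * m)
```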

The partial derivatives of the error function with respect to each neuron in the output layer, *δo*(*k*), are calculated from the expected output and the actual output of the network. The partial derivatives of the error function with respect to each neuron in the hidden layer, *δh*(*k*), are then calculated using the connection weights from the hidden layer to the output layer, the output-layer error terms *δo*(*k*), and the output of the hidden layer [33].

$$\frac{\partial E}{\partial w\_{ho}} = \frac{\partial E}{\partial yi\_o(k)} \frac{\partial yi\_o(k)}{\partial w\_{ho}} = -(d\_o(k) - yo\_o(k)) f'(yi\_o(k))\, ho\_h(k) = -\delta\_o(k)\, ho\_h(k) \tag{8}$$

$$\Delta w\_{ho}(k) = -\eta \frac{\partial E}{\partial w\_{ho}} = \eta \delta\_o(k)\, ho\_h(k) \tag{9}$$

$$w\_{ho}^{N+1} = w\_{ho}^{N} + \eta \delta\_o(k)\, ho\_h(k) \tag{10}$$

$$\Delta w\_{ih}(k) = -\eta \frac{\partial E}{\partial w\_{ih}} = -\eta \frac{\partial E}{\partial hi\_h(k)} \frac{\partial hi\_h(k)}{\partial w\_{ih}} = \eta \delta\_h(k)\, x\_i(k) \tag{11}$$

$$w\_{ih}^{N+1} = w\_{ih}^{N} + \eta \delta\_h(k)\, x\_i(k) \tag{12}$$
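The error terms and weight updates of Eqs. (8)–(12) can be sketched as follows, using the sigmoid derivative *f*′(*z*) = *f*(*z*)(1 − *f*(*z*)); the learning rate `eta` and the helper name `backprop_step` are illustrative, and the update is written in place for a single sample.

```python
import numpy as np

def backprop_step(x_k, d_k, ho, yo, w_ih, w_ho, b_h, b_o, eta=0.1):
    """One gradient step of Eqs. (8)-(12) for a single sample (in-place update)."""
    # Output-layer error term delta_o(k) = (d_o - yo_o) * f'(yi_o), with f'(z) = f(z)(1 - f(z))
    delta_o = (d_k - yo) * yo * (1.0 - yo)
    # Hidden-layer error term delta_h(k), backpropagated through w_ho
    delta_h = (w_ho @ delta_o) * ho * (1.0 - ho)
    w_ho += eta * np.outer(ho, delta_o)   # Eq. (10)
    w_ih += eta * np.outer(x_k, delta_h)  # Eq. (12)
    b_o -= eta * delta_o                  # bias updates (the biases enter Eqs. (3) and (5)
    b_h -= eta * delta_h                  # with a minus sign, hence the opposite sign here)
    return w_ih, w_ho, b_h, b_o
```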

The algorithm terminates when the error reaches the preset accuracy or when the number of learning iterations exceeds the prespecified maximum. Otherwise, the next learning sample and its corresponding expected output are selected, and the next round of learning begins.
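Putting the pieces together, a sketch of the overall training loop with this stopping rule might look as follows. It reuses the `forward`, `backprop_step`, and `total_error` sketches above, and `X`, `D`, `eps` (*ε*), and `M` are assumed training inputs, targets, accuracy, and maximum iteration count.

```python
import numpy as np

def train(X, D, w_ih, w_ho, b_h, b_o, eta=0.1, eps=1e-4, M=10000):
    """Train until the total error E drops below eps or M learning steps are reached."""
    rng = np.random.default_rng(0)
    for step in range(M):
        k = rng.integers(X.shape[0])                         # pick a random sample x(k)
        hi, ho, yi, yo = forward(X[k], w_ih, w_ho, b_h, b_o)  # Eqs. (3)-(6)
        backprop_step(X[k], D[k], ho, yo, w_ih, w_ho, b_h, b_o, eta)  # Eqs. (8)-(12)
        # Recompute the total error E of Eq. (7) over all m samples
        YO = np.array([forward(x, w_ih, w_ho, b_h, b_o)[3] for x in X])
        if total_error(D, YO) < eps:                         # preset accuracy reached
            break
    return w_ih, w_ho, b_h, b_o
```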
