### *3.1. Fuzzy Neural Network Structure*

The fuzzy neural network established in this paper has four layers, with adjacent layers connected by weights. The multi-layer feedforward fuzzy neural network has the following advantages and characteristics for automatic control:


The BP algorithm is adopted in the fuzzy neural network; its main feature is the forward transmission of the signal and the backward propagation of the error [22]. Since the transfer functions must be differentiable everywhere, the hidden-layer transfer function and the output-layer activation function are a Gaussian function and a linear function, respectively. A fuzzy neural network mainly uses a neural network structure to implement fuzzy logic reasoning. Compared with a traditional neural network, the second layer gives a specific physical meaning to the tracking error, and the third layer performs the fuzzy logic reasoning. The structure of the fuzzy neural network is shown in Figure 4 [23].

We define the error *e* and the error variation *de* as fuzzy linguistic variables, each of which takes five linguistic values, i.e., NB, NS, ZO, PS, and PB. Thus, there are 5 × 5 = 25 rules in the third layer of the fuzzy neural network. In the following, $x\_i^k$ represents the *i*th input of layer *k*, $o\_j^k$ represents the net input of the *j*th node of layer *k*, and $y\_j^k$ represents the output of the *j*th node of layer *k*. The input–output relationships of the nodes in each layer are as follows [24,25]:

**Figure 4.** Fuzzy neural network structure diagram.

As shown in Figure 4, the fuzzy neural network consists of, from left to right, an input layer, a membership function layer, a rule layer, and an output layer. The input–output relationship of the input layer is

$$o\_i^1 = s(i), \qquad i = 1, 2, \tag{6}$$

$$s = \begin{bmatrix} e & de \end{bmatrix}', \tag{7}$$

$$y\_i^1 = o\_i^1, \qquad i = 1, 2. \tag{8}$$

The membership function layer evaluates the degree to which each input component belongs to the fuzzy sets of each linguistic variable. The input–output relationship of this layer is

$$o\_{ij}^2 = h(y\_i^1), \qquad i = 1, 2, \quad j = 1, 2, \ldots, 5, \tag{9}$$

$$y\_k^2 = f(o\_{ij}^2), \qquad k = 1, 2, \ldots, 10, \tag{10}$$

where $o\_{ij}^2$ represents the net input of the *j*th fuzzy set of the *i*th fuzzy variable. For the same fuzzy concept, the membership functions can be different. Although their forms are not exactly the same, the functions here follow a normal distribution, which reflects the fuzzy information processed by the fuzzy concept when solving problems. For example, if the normal-distribution membership function is adopted, the result is as follows:

$$h(x) = -\frac{(x - m\_j)^2}{\sigma\_j^2}, \tag{11}$$

$$f(x) = \exp(x), \tag{12}$$

where $m\_j$ and $\sigma\_j$ are the parameters of the Gaussian membership function of *x*. Hence, the input–output relationships of the second-layer nodes are

$$o\_{ij}^2 = h(y\_i^1) = -\frac{(y\_i^1 - m\_{ij})^2}{\sigma\_{ij}^2}, \tag{13}$$

$$y\_j^2 = f(o\_{ij}^2) = \exp(o\_{ij}^2). \tag{14}$$
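
As a quick illustration of Equations (13) and (14), the following minimal NumPy sketch evaluates the Gaussian membership degrees of one input against its five fuzzy sets; the center and width values (`centers`, `widths`) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal sketch of the membership function layer, Equations (13) and (14).
# The centers m_j and widths sigma_j below are illustrative placeholders; in
# the paper they are adjustable parameters of the network.
centers = np.linspace(-1.0, 1.0, 5)   # one center per fuzzy set: NB, NS, ZO, PS, PB
widths = np.full(5, 0.5)              # one width per fuzzy set

def membership(y1, m, sigma):
    """Gaussian membership: o2 = -(y1 - m)^2 / sigma^2, y2 = exp(o2)."""
    o2 = -((y1 - m) ** 2) / sigma ** 2   # Eq. (13)
    return np.exp(o2)                    # Eq. (14)

# Degrees to which an error value e = 0.3 belongs to the five fuzzy sets.
print(membership(0.3, centers, widths))
```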

The third layer denotes the fuzzy rule set; each node represents one fuzzy rule, so this layer has 25 nodes. The input–output relationships of the third-layer nodes are shown in Equations (15) and (16):

$$o\_n^3 = y\_i^2 \ast y\_j^2, \qquad i = 1, 2, \ldots, 5, \quad j = 1, 2, \ldots, 5, \tag{15}$$

$$y\_n^3 = o\_n^3, \qquad n = 1, 2, \ldots, 25. \tag{16}$$

The fourth layer is the controller output, i.e., the optimal result obtained by the fuzzy neural network:

$$o\_1^4 = \frac{\sum\_{k=1}^{25} w\_k y\_k^3}{\sum\_{k=1}^{25} y\_k^3}, \qquad k = 1, 2, \ldots, 25, \tag{17}$$

$$y\_1^4 = o\_1^4, \tag{18}$$

where $y\_1^4$ denotes the temperature set-point change *u*.
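
To make the data flow through the four layers concrete, here is a sketch of one forward pass through Equations (6)–(18); the centers, widths, and rule weights are random or illustrative placeholders, and the defuzzification follows the weighted-average form of Equation (17).

```python
import numpy as np

# Sketch of one forward pass through the four-layer network, Equations (6)-(18).
# The centers m_ij, widths sigma_ij, and rule weights w_k are illustrative
# placeholders; in the paper they are adjustable parameters.
m = np.tile(np.linspace(-1.0, 1.0, 5), (2, 1))   # m_ij, shape (2, 5): rows for e and de
sigma = np.full((2, 5), 0.5)                     # sigma_ij, shape (2, 5)
w = np.random.uniform(-1, 1, 25)                 # w_k: weights of the 25 rules

def forward(e, de):
    y1 = np.array([e, de])                       # layer 1: y1 = o1 = s, Eqs. (6)-(8)
    o2 = -((y1[:, None] - m) ** 2) / sigma ** 2  # layer 2 net input, Eq. (13)
    y2 = np.exp(o2)                              # layer 2 output, Eq. (14)
    y3 = np.outer(y2[0], y2[1]).ravel()          # layer 3: 5 x 5 = 25 rules, Eqs. (15)-(16)
    o4 = np.dot(w, y3) / np.sum(y3)              # layer 4: weighted average, Eq. (17)
    return o4                                    # y4 = o4: set-point change u, Eq. (18)

print(forward(e=0.3, de=-0.1))
```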

### *3.2. Fuzzy Neural Network Learning Algorithm*

Compared with other traditional methods, the BP algorithm has better persistence and predictability. In a BP neural network, the error signal is transmitted backward from the output layer to the first layer, so the correction can be computed layer by layer. The performance index function is defined as

$$E = \frac{(d - y\_1^4)^2}{2},\tag{19}$$

where *d* is the desired signal $P\_{\text{target}}$, and $y\_1^4$ denotes $P\_{\text{total}}$ in this paper.

For the output layer, Equation (20) can be derived from Equations (17)–(19):

$$\begin{split} \delta\_1^4 &= -\frac{\partial E}{\partial o\_1^4} \\ &= -\frac{\partial E}{\partial P\_{\text{total}}} \ast \frac{\partial P\_{\text{total}}}{\partial u} \ast \frac{\partial u}{\partial o\_1^4} \\ &\approx e \ast \text{sgn}(\frac{\triangle P\_{\text{total}}}{\triangle u}) \ast \frac{\partial u}{\partial o\_1^4} \end{split} \tag{20}$$

where *δ* represents the local gradient of the BP neural network. Since $\partial P\_{\text{total}} / \partial u$ is unknown, we approximately replace it with the sign operator $\text{sgn}(\triangle P\_{\text{total}} / \triangle u)$, and the learning rate *η* introduced below can compensate for this approximation. The fundamental idea of the BP neural network is to use the gradient descent method to correct the weights, and Equation (21) is obtained from Equation (20):

$$\triangle w\_k = -\frac{\partial E}{\partial o\_1^4} \ast \frac{\partial o\_1^4}{\partial w\_k} = \delta\_1^4 \ast y\_k^3, \tag{21}$$

where *k* = 1, 2, ..., 25.

For the fuzzy rule layer and the membership function layer, the local gradients are denoted as

$$\delta\_k^3 = -\frac{\partial E}{\partial o\_k^3} = \delta\_1^4 \ast w\_k, \tag{22}$$

$$\delta\_j^2 = -\frac{\partial E}{\partial o\_j^2} = \left(\sum\_k \delta\_k^3 \ast y\_i^2\right) \ast y\_j^2, \qquad j = 1, 2, \ldots, 10, \tag{23}$$

where *k* (*k* = 1, 2, ..., 25) stands for the nodes in the third layer that are connected with the *j*th node in the second layer, and *i* denotes the other node in the second layer that is connected to the *k*th node in the third layer.

Therefore, the correction values of the input membership function parameters are as follows:

$$\triangle m\_{ij} = -\frac{\partial E}{\partial m\_{ij}} = \delta\_j^2 \frac{2(y\_i^1 - m\_{ij})}{\sigma\_{ij}^2}, \tag{24}$$

$$\triangle \sigma\_{ij} = -\frac{\partial E}{\partial \sigma\_{ij}} = \delta\_j^2 \frac{2(y\_i^1 - m\_{ij})^2}{\sigma\_{ij}^3}. \tag{25}$$

Finally, the correction algorithm for the adjustable parameters of the fuzzy neural network is expressed by

$$w\_k(n+1) = w\_k(n) + \eta\_1 \triangle w\_k(n) + \alpha\_1 (w\_k(n) - w\_k(n-1)), \tag{26}$$

$$m\_{ij}(n+1) = m\_{ij}(n) + \eta\_2 \triangle m\_{ij}(n) + \alpha\_2 (m\_{ij}(n) - m\_{ij}(n-1)), \tag{27}$$

$$\sigma\_{ij}(n+1) = \sigma\_{ij}(n) + \eta\_3 \triangle \sigma\_{ij}(n) + \alpha\_3 (\sigma\_{ij}(n) - \sigma\_{ij}(n-1)), \tag{28}$$

where $\eta\_1$, $\eta\_2$, $\eta\_3$ are learning rates and $\alpha\_1$, $\alpha\_2$, $\alpha\_3$ are momentum factors for the corresponding adjustable parameters. Well-chosen learning rates and momentum factors can accelerate convergence, reduce oscillation, and effectively suppress local minima; their values are limited to the interval (0, 1).
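
The following NumPy sketch assembles Equations (20)–(28) into a single training step. For illustration it assumes $\text{sgn}(\triangle P\_{\text{total}}/\triangle u) = 1$ and $\partial u / \partial o\_1^4 = 1$; the learning rates, momentum factors, and the `prev` cache of previous increments are our own placeholder choices, not values from the paper.

```python
import numpy as np

# Sketch of one BP correction step, Equations (20)-(28). The unknown plant
# Jacobian dP_total/du is replaced by its sign as in Equation (20); here we
# assume sgn = 1 and du/do4 = 1. Rates and factors are illustrative.
eta1, eta2, eta3 = 0.05, 0.05, 0.05      # learning rates
alpha1, alpha2, alpha3 = 0.1, 0.1, 0.1   # momentum factors

def bp_step(e, y1, y2, y3, w, m, sigma, prev, sgn=1.0):
    """One training step; `prev` caches the previous parameter increments."""
    delta4 = e * sgn                                   # Eq. (20)
    dw = delta4 * y3                                   # Eq. (21)
    delta3 = (delta4 * w).reshape(5, 5)                # Eq. (22), rules on a 5 x 5 grid
    # Eq. (23): each membership node sums over the rules it feeds, weighted
    # by the output of the *other* membership node in each rule.
    delta2 = np.stack([(delta3 * y2[1]).sum(axis=1) * y2[0],
                       (delta3 * y2[0][:, None]).sum(axis=0) * y2[1]])
    dm = delta2 * 2 * (y1[:, None] - m) / sigma**2       # Eq. (24)
    ds = delta2 * 2 * (y1[:, None] - m)**2 / sigma**3    # Eq. (25)
    # Eqs. (26)-(28): gradient term plus momentum on the previous increment.
    for key, p, g, lr, mom in (("w", w, dw, eta1, alpha1),
                               ("m", m, dm, eta2, alpha2),
                               ("sigma", sigma, ds, eta3, alpha3)):
        step = lr * g + mom * prev[key]   # eta * increment + alpha * (p(n) - p(n-1))
        p += step                         # update the parameter array in place
        prev[key] = step

# `prev` starts at zero; shapes match w (25,), m and sigma (2, 5).
prev = {"w": np.zeros(25), "m": np.zeros((2, 5)), "sigma": np.zeros((2, 5))}
```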

### *3.3. Optimization of Initial Value of Adjustable Parameters*

The learning performance of the fuzzy neural network depends strongly on the initial values of the connection weights and membership function parameters. The BP algorithm is suitable for solving complicated nonlinear problems, but it often falls into local minima, resulting in training failure. In [26], the authors proposed a method for estimating the parameters of dynamic models of induction-motor-dominated loads. Based on PSO, the method finds the set of parameters that best fits the sampled measurement data over a period of time, minimizing the error between the model outputs and the active and reactive power demands. Hence, a hybrid algorithm combining the PSO and BP algorithms is proposed here to find the optimal initial parameters. The PSO algorithm maintains a group of particles stochastically distributed in a high-dimensional search space; each particle is a candidate solution, and its position corresponds to a possible solution of the problem. The direction and speed of each particle depend on the historical best position of the particle itself and of the whole swarm, and the quality of a candidate solution is measured by a fitness function determined by the specific problem. The mathematical description is as follows:

$$V\_i = V\_i + c\_1 \ast r\_1 \ast (lbest\_i - L\_i) + c\_2 \ast r\_2 \ast (gbest - L\_i), \tag{29}$$

$$L\_i = L\_i + V\_i, \tag{30}$$

where *Vi*, *Li* and *lbesti* denote the velocity, location, and historical optimal location of the *i*th particle, respectively. *gbest* is the best position of all particles at present. *c*1 and *c*2 are learning rates, and *r*1 and *r*2 are two random numbers between 0 and 1.
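
A minimal sketch of the PSO update loop in Equations (29) and (30) is given below; the swarm size, search dimension, learning factors, and the quadratic placeholder fitness function are all illustrative assumptions (in the paper, the fitness would be derived from the tracking error of the network).

```python
import numpy as np

# Minimal sketch of the PSO update, Equations (29) and (30). Swarm size,
# dimension (e.g., 25 rule weights + 10 centers + 10 widths = 45), learning
# factors, and the quadratic fitness below are illustrative assumptions.
rng = np.random.default_rng(0)
n_particles, dim = 20, 45
c1, c2 = 2.0, 2.0                               # learning factors

def fitness(x):
    return np.sum(x ** 2)                       # placeholder for the tracking error

L = rng.uniform(-1, 1, (n_particles, dim))      # particle locations L_i
V = np.zeros((n_particles, dim))                # particle velocities V_i
lbest = L.copy()                                # historical best location per particle
lbest_fit = np.array([fitness(x) for x in L])
gbest = lbest[np.argmin(lbest_fit)]             # best location of the whole swarm

for _ in range(100):
    r1 = rng.random((n_particles, 1))
    r2 = rng.random((n_particles, 1))
    V = V + c1 * r1 * (lbest - L) + c2 * r2 * (gbest - L)   # Eq. (29)
    L = L + V                                               # Eq. (30)
    fit = np.array([fitness(x) for x in L])
    improved = fit < lbest_fit
    lbest[improved], lbest_fit[improved] = L[improved], fit[improved]
    gbest = lbest[np.argmin(lbest_fit)]
```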

Next, the root-mean-square error (RMSE) is adopted as the indicator for assessing tracking performance; it is defined as follows:

$$RMSE = \sqrt{\frac{\sum\_{k=1}^{N\_s} e\_k^2}{N\_s \left(P\_{\text{target}}^{\max} - P\_{\text{target}}^{\min}\right)^2}}, \tag{31}$$

where $N\_s$ denotes the number of control cycles, and $P\_{\text{target}}^{\max}$ and $P\_{\text{target}}^{\min}$ denote the upper and lower limits of the target signal range, respectively.
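
For illustration, Equation (31) can be computed as in the following short sketch; the error samples and target-range limits below are made-up values.

```python
import numpy as np

# Sketch of the normalized RMSE indicator, Equation (31); the error samples
# and target-range limits below are made-up illustrative values.
def rmse(errors, p_max, p_min):
    ns = len(errors)                    # N_s: number of control cycles
    return np.sqrt(np.sum(np.square(errors)) / (ns * (p_max - p_min) ** 2))

print(rmse(errors=np.array([0.02, -0.01, 0.03]), p_max=1.0, p_min=0.0))
```

The flow chart of the optimization algorithm is shown in Figure 5.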

**Figure 5.** Flow chart of the optimization algorithm.
