#### **1. Introduction**

Transformer oil is an important insulating medium in power transformers, and the water content of the oil is a key factor in determining the insulation life of a transformer [1–3]. Many problems in the insulation system, such as reduced breakdown voltage, increased dielectric loss and accelerated chemical reactions of organic matter, are caused by a high water content in the oil [4–7]. Therefore, detecting the water content in transformer oil is of great significance for ensuring the safe and stable operation of the transformer.

The technology and methods of water content detection in transformer oil have been studied by many scholars at home and abroad [8–12], and both off-line and on-line detection methods have been widely reported. The measurement of water content based on humidity sensors, together with the reduction of temperature error, has been studied in [8]. Martin et al. predicted water content by establishing a mathematical model of the water dynamics [9], but this measurement must be corrected for any difference in oil temperature between the water activity probe location and the insulation hot-spot location. The measurement of water content using direct optical techniques was proposed in [10], but polar molecules in the oil can affect the accuracy of optical measurement. The authors of [11] proposed predicting the water content in oil from the water equilibrium curve; for this method to yield accurate data, the water content and temperature must have reached equilibrium. In conclusion, all of the above methods introduce some error, because the interior of an operating transformer involves extremely complex processes and transformer oil is a very complex mixture [13–15].

Multi-frequency ultrasonic detection is a non-destructive testing technology with many advantages, such as increasing cavitation events, reducing the dead angle caused by standing waves, and improving the sonochemical yield [16–18]. It has been implemented for various applications, e.g., distance measurement [19], medical examination [20], partial discharge localization in power transformers [21] and quality inspection [22]. Multi-frequency ultrasonic detection reflects the internal information of the measured object at the molecular level and can, to a great extent, avoid the interference of external environmental factors such as temperature, thereby realizing high-precision detection.

In this article, an alternative method of measuring the water content in transformer oil, using multi-frequency ultrasonic detection technology and a PCA-GA-BPNN model, is proposed. The multi-frequency ultrasonic data obtained from the experiment were reduced in dimension by principal component analysis (PCA) and fed into a back propagation neural network (BPNN), while the water content measured by the Karl Fischer method served as the BPNN output. The number of neurons in the hidden layer was determined by trial, and the weights and thresholds of the BPNN were optimized by a genetic algorithm (GA), which improved the prediction accuracy. Finally, a case study on the test set was carried out to verify the validity of the proposed empirical model.

The remainder of this paper is organized as follows. The Multi-Frequency Ultrasonic (MFU) testing system is described in Section 2. Experiments are discussed in Section 3. The PCA, GA and BPNN algorithms are presented in Section 4. The prediction model of water content in transformer oil and the prediction results are presented in Section 5. Finally, a conclusion is provided in Section 6.

#### **2. MFU Testing System**

As shown in Figure 1, the multi-frequency ultrasonic detection system consists of three parts: an ultrasonic measurement device (Yucoya Energy Safety GmbH, GER), an ultrasonic sensor, and measurement software (Yucoya Ultrasound Manager, Yucoya Energy Safety GmbH, GER). The ultrasonic parameters obtained by continuous scanning reflect the internal state of the transformer oil at the molecular level.

**Figure 1.** Structure chart of the Multi-frequency ultrasonic device.

The structure of the ultrasonic sensor is shown in Figure 2. During detection, the ultrasonic sensor is dipped into the transformer oil, except for the mounting bracket and the external connectors, so that the measurement chamber fills with oil. The ultrasonic transmitter Tx then emits an ultrasound signal comprising 20 frequencies within the range of 600 kHz–1000 kHz, with a central resonance frequency of about 750 kHz. A part of this signal is first reflected at the interface between the reference medium and the measurement chamber. This reflected signal travels back to ultrasonic receiver Rx1, where it is measured. This signal is called L1. The other part of the signal emitted by ultrasonic transmitter Tx is transmitted through the interface between the reference medium and the measurement chamber. It travels to ultrasonic receiver Rx2, where it is measured. This part of the signal is called L3. Finally, at ultrasonic receiver Rx2, a part of the signal is reflected again and travels back to ultrasonic receiver Rx1. This signal is called L2.

**Figure 2.** Principle structure diagram of ultrasonic sensor.

The received signals which are called L1, L2 and L3 can be written as:

$$x(t) = A\sin(\omega t + \varphi) \tag{1}$$

where *A* is the signal amplitude, *φ* is the signal phase, *ω* = 2*πf* and *f* is the signal frequency.

This signal can be expanded as:

$$x(t) = A\cos(\varphi)\sin(\omega t) + A\sin(\varphi)\cos(\omega t) \tag{2}$$

$$x(t) = C_0 \sin(\omega t) + C_1 \cos(\omega t) \tag{3}$$

where:

$$C_0 = A\cos(\varphi) \tag{4}$$

$$C_1 = A\sin(\varphi) \tag{5}$$

The required amplitude and phase can then be determined as follows:

$$A = \sqrt{C_0^2 + C_1^2} \tag{6}$$

$$\varphi = \arctan\left(\frac{C_1}{C_0}\right) + \left(1 - \text{sgn}(C_0)\right)\frac{\pi}{2} \tag{7}$$
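The relations in Equations (3)–(7) can be illustrated with a short Python sketch. It is not the vendor's processing chain: the function names, the sampling rate, and the way *C*0 and *C*1 are estimated (correlating a sampled record with sine and cosine references over a whole number of periods) are assumptions made for illustration, and Equation (7) is read as arctan(*C*1/*C*0) + (1 − sgn(*C*0))·π/2.

```python
import math


def quadrature_components(samples, freq_hz, sample_rate_hz):
    """Estimate C0 and C1 of Eq. (3) by correlating with sin and cos references.

    Assumption: the record spans a whole number of signal periods, so the
    two references are orthogonal over the record.
    """
    n = len(samples)
    w = 2.0 * math.pi * freq_hz
    c0 = 2.0 / n * sum(x * math.sin(w * k / sample_rate_hz) for k, x in enumerate(samples))
    c1 = 2.0 / n * sum(x * math.cos(w * k / sample_rate_hz) for k, x in enumerate(samples))
    return c0, c1


def sgn(v):
    """Sign function: -1, 0 or +1."""
    return (v > 0) - (v < 0)


def amplitude_phase(c0, c1):
    """Amplitude from Eq. (6) and phase from Eq. (7)."""
    amplitude = math.hypot(c0, c1)
    if c0 == 0.0:
        # Eq. (7) divides by C0; falling back to +/- pi/2 is an assumption.
        phase = math.copysign(math.pi / 2.0, c1)
    else:
        phase = math.atan(c1 / c0) + (1 - sgn(c0)) * math.pi / 2.0
    return amplitude, phase


if __name__ == "__main__":
    # Synthetic 750 kHz tone, sampled at an assumed 15 MHz for 10 periods.
    f, fs = 750e3, 15e6
    record = [2.0 * math.sin(2.0 * math.pi * f * k / fs + 2.5) for k in range(200)]
    print(amplitude_phase(*quadrature_components(record, f, fs)))
```

The correction term in Equation (7) adds π whenever *C*0 is negative, which places the phase in the correct half-plane; `math.atan2(c1, c0)` would give an equivalent angle up to a multiple of 2π.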

#### **3. Experimental Results**

A series of measurements was conducted on 160 transformer mineral oil samples, comprising 10 new oils, 10 dried oils prepared from new oils and 140 service-aged oils. The Karl Fischer method was used to determine the water content of each sample. The same oil samples were then tested with the multi-frequency ultrasonic detection device to obtain their acoustic frequency spectra.

The response spectra of six oil samples are shown in Figure 3, where the legend gives the moisture content of each sample in mg/L. It can be seen that, as the moisture content fell, the amplitude response at each frequency fell, and this trend was maintained in both the L1 phase and the L2 phase. In the L3 phase, there was a "basin" in the amplitude response within the frequency range of 700 kHz to 850 kHz. The oil sample with a moisture content of 1.51 mg/L exhibited the lowest amplitude response in the L1, L2 and L3 phases.

**Figure 3.** Spectrum curve of ultrasonic signal. (**a**) Amplitude response of L1 phase of multi-frequency ultrasonic detection; (**b**) Amplitude response of L2 phase of multi-frequency ultrasonic detection; (**c**) Amplitude response of L3 phase of multi-frequency ultrasonic detection.

#### **4. Artificial Neural Network**

#### *4.1. Principal Component Analysis*

According to [23,24], PCA is a multivariate statistical analysis method that has been used successfully in applications such as image processing and face recognition. As a dimension reduction model, it operates mainly by creating a few new, mutually uncorrelated variables from the original variables, such that these new variables retain as much of the information in the original variables as possible.

The dimension reduction matrix was obtained by PCA, as described below.

Step 1: Standardize the original matrix:

$$x_{ij}^{*} = \frac{x_{ij} - \overline{x}_{j}}{S_{j}} \tag{8}$$

where $x_{ij}^{*}$ is an element of the standardized matrix, $x_{ij}$ is an element of the original matrix, $i$ (= 1, 2, ..., $N$) indexes the samples, $j$ (= 1, 2, ..., $m$) indexes the variables (dimensions) of each sample, and $\overline{x}_{j}$ and $S_{j}$ are the mean and standard deviation of the variable $x_{j}$, respectively.

Step 2: Calculate the correlation matrix and its eigenvalues and eigenvectors:

$$R = \frac{X^{*T} X^{*}}{N-1} \tag{9}$$

$$R u_{j} = \lambda_{j} u_{j} \tag{10}$$

where $R$ is the correlation matrix, $X^{*}$ is the standardized matrix, and $\lambda_{j}$ and $u_{j}$ are the eigenvalues and the eigenvectors of the correlation matrix, respectively.

Step 3: Calculate the variance contribution rate and the cumulative variance contribution rate:

$$\eta_{j} = \frac{\lambda_{j}}{\sum_{k=1}^{m} \lambda_{k}} \times 100\% \tag{11}$$

$$\eta_{\Sigma}(p) = \sum_{j=1}^{p} \eta_{j} \tag{12}$$

where $\eta_{j}$ is the variance contribution rate of the $j$th principal component, and $\eta_{\Sigma}(p)$ is the cumulative variance contribution rate of the first $p$ principal components.

Step 4: Calculate the projection of original matrix:

$$Z_{N \times p} = X_{N \times m}^{*} U_{m \times p} \tag{13}$$

where $Z_{N \times p}$ is the dimension-reduced matrix, and $U_{m \times p} = [u_1, u_2, \ldots, u_p]$.
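Steps 1–4 can be sketched in plain Python as follows. This is an illustrative reading of Equations (8)–(13), not the implementation used in the study: $S_j$ is taken to be the sample standard deviation, the eigenproblem of Equation (10) is solved with a cyclic Jacobi rotation routine chosen here only because it needs no external library, and the function names and the `retention` threshold are hypothetical.

```python
import math


def standardize(X):
    """Eq. (8): centre each column and divide by its sample standard deviation."""
    n, m = len(X), len(X[0])
    mean = [sum(r[j] for r in X) / n for j in range(m)]
    std = [math.sqrt(sum((r[j] - mean[j]) ** 2 for r in X) / (n - 1)) for j in range(m)]
    return [[(r[j] - mean[j]) / std[j] for j in range(m)] for r in X]


def correlation(Xs):
    """Eq. (9): R = X*^T X* / (N - 1)."""
    n, m = len(Xs), len(Xs[0])
    return [[sum(r[a] * r[b] for r in Xs) / (n - 1) for b in range(m)] for a in range(m)]


def jacobi_eigen(R, sweeps=50, tol=1e-20):
    """Eq. (10) for a symmetric matrix, solved by cyclic Jacobi rotations."""
    m = len(R)
    A = [row[:] for row in R]
    V = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(m) for j in range(m) if i != j) < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                d = A[q][q] - A[p][p]
                num = 2.0 * A[p][q] if d >= 0 else -2.0 * A[p][q]
                theta = 0.5 * math.atan2(num, abs(d))  # angle that zeroes A[p][q]
                c, s = math.cos(theta), math.sin(theta)
                for M in (A, V):  # right-multiply A and V by the rotation
                    for k in range(m):
                        mp, mq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(m):  # left-multiply A by the transposed rotation
                    ap, aq = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * ap - s * aq, s * ap + c * aq
    order = sorted(range(m), key=lambda j: A[j][j], reverse=True)
    values = [A[j][j] for j in order]
    vectors = [[V[i][j] for j in order] for i in range(m)]  # column j is u_j
    return values, vectors


def pca(X, retention=0.95):
    """Return the projected data Z, the contribution rates (%) and p."""
    Xs = standardize(X)
    lam, U = jacobi_eigen(correlation(Xs))
    rates = [100.0 * v / sum(lam) for v in lam]  # Eq. (11)
    cumulative, p = 0.0, 0
    while cumulative < 100.0 * retention - 1e-9 and p < len(lam):  # Eq. (12)
        cumulative += rates[p]
        p += 1
    Z = [[sum(r[j] * U[j][c] for j in range(len(lam))) for c in range(p)] for r in Xs]  # Eq. (13)
    return Z, rates, p


if __name__ == "__main__":
    # Made-up 5 x 3 data matrix, for illustration only.
    data = [[1.0, 2.0, 1.1], [2.0, 1.0, 2.1], [3.0, 4.0, 2.9], [4.0, 3.0, 4.2], [5.0, 5.0, 4.8]]
    Z, rates, p = pca(data)
    print(p, rates)
```

Because the eigenvalues are sorted in descending order, `p` is the smallest number of leading components whose cumulative contribution rate reaches the chosen threshold.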

In this study, the original data matrix consists of the multi-frequency ultrasonic detection data, a 242-dimensional space comprising the amplitude and phase at 20 frequencies, the time of flight (TOF) and the velocity, which can be reduced by PCA. The information retention ratio of the original data matrix after dimension reduction is shown in Figure 4. Figure 4 shows that the first principal component accounted for only about 35% of the total variation, that the information retention rate increased with the number of dimensions, and that the first seven principal components retained 90% of the information. The number of neurons in the input layer of the BPNN model was determined by the number of principal components needed for a high information retention rate. To ensure the performance of the BPNN model, the first eight principal components, whose cumulative information retention rate reached 95%, were used as the inputs of the model.

**Figure 4.** Information retention ratio.

#### *4.2. BP Neural Network*

An artificial neural network model was developed to investigate the correlation between the water content in oil samples and their multi-frequency ultrasonic spectrum data. According to [25], when there are enough neurons in the hidden layer, a three-layer BPNN can realize a mapping from an arbitrary I-dimensional input to any k-dimensional output. Therefore, a three-layer BPNN was chosen in this paper: the input variables were the first eight principal components obtained in Section 4.1, and the output layer consisted of one neuron, corresponding to the water content.
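A forward pass through such a three-layer network can be sketched as below. The source does not state the activation functions or the initialization, so the tanh hidden layer, the linear output neuron, the uniform initial weights and the function names are all assumptions, and the five hidden neurons in the usage lines are only a placeholder.

```python
import math
import random


def init_network(n_in, n_hidden, n_out, seed=0):
    """Random weights and thresholds (biases); uniform(-0.5, 0.5) is an assumption."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.5, 0.5)

    return {
        "w1": [[u() for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [u() for _ in range(n_hidden)],
        "w2": [[u() for _ in range(n_hidden)] for _ in range(n_out)],
        "b2": [u() for _ in range(n_out)],
    }


def forward(net, x):
    """Input -> tanh hidden layer -> linear output layer (assumed activations)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w1"], net["b1"])
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(net["w2"], net["b2"])
    ]


def n_parameters(net):
    """Total number of weights and thresholds in the network."""
    return (
        sum(len(r) for r in net["w1"]) + len(net["b1"])
        + sum(len(r) for r in net["w2"]) + len(net["b2"])
    )


if __name__ == "__main__":
    net = init_network(n_in=8, n_hidden=5, n_out=1)
    print(forward(net, [0.1] * 8), n_parameters(net))
```

The parameter count matters for any optimizer that searches over the weights and thresholds directly, since it fixes the length of the vector being optimized.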

The flow chart of model training and learning is shown in Figure 5.

**Figure 5.** Flow chart of back propagation neural network training.

#### *4.3. Genetic Algorithm*

It is difficult for a traditional BPNN to find the global optimum solution in prediction applications [26,27]. A genetic algorithm (GA) is a method for obtaining the global optimum solution of a problem, based on a natural selection process that mimics biological evolution [28–31]. In this article, the GA was used to optimize the weights and thresholds of the BPNN. The flow chart of the GA-BPNN is shown in Figure 6.

**Figure 6.** Flow chart of optimizing the BP neural network parameters with a GA.

#### *4.4. PCA-GA-BPNN Prediction Model*

The working process of the prediction model was divided into three stages.

The first stage: create a database module. The database module not only matched the multi-frequency ultrasonic parameters of each oil sample with its water content, but also divided the data into a training set and a test set in a given proportion.

The second stage: create the prediction model. The prediction model first read the training samples from the database module and applied PCA to them to obtain the input matrix, which was composed of the first eight principal components. Then, using the initial GA-BPNN parameters, the initial forecast model and the initial forecast results were produced. The fitness function of the model was defined as:

$$F(x) = \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{14}$$

where $y_i$ is the predicted value, $\hat{y}_i$ is the observed value, and $n$ is the sample size.

The fitness function was used to calculate the fitness value of each individual in each generation. If the fitness convergence condition was met, the initial forecast model became the final prediction model. Otherwise, the selection, crossover and mutation operations were performed and the new parameters were passed to the GA-BPNN; the second-generation prediction model was then combined with the database module to obtain the second-generation forecast results, and so on, until a prediction model that met the fitness convergence condition was obtained. The roulette method was used for selection in the GA; the selection probability $P_x$ of each individual $x$ was defined as:

$$P_x = \frac{f_x}{\sum_{j=1}^{N} f_j} \tag{15}$$

$$f_x = \frac{k}{F_x} \tag{16}$$

where $F_x$ is the fitness value of individual $x$, $N$ is the number of individuals and $k$ is a coefficient.
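Equations (14)–(16) translate directly into a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, `k = 1.0` is an arbitrary default, and sampling with replacement through `random.Random.choices` is one possible way to spin the roulette wheel.

```python
import random


def fitness(predicted, observed):
    """Eq. (14): sum of absolute errors; a lower value is better."""
    return sum(abs(y - y_hat) for y, y_hat in zip(predicted, observed))


def selection_probabilities(fitness_values, k=1.0):
    """Eqs. (15)-(16): P_x = f_x / sum(f_j), with f_x = k / F_x."""
    f = [k / F for F in fitness_values]  # raises ZeroDivisionError if some F_x is 0
    total = sum(f)
    return [fx / total for fx in f]


def roulette_select(population, fitness_values, rng, k=1.0):
    """Draw a new population of the same size, with replacement."""
    probs = selection_probabilities(fitness_values, k)
    return rng.choices(population, weights=probs, k=len(population))


if __name__ == "__main__":
    individuals = ["a", "b", "c"]  # placeholders for chromosomes
    errors = [1.0, 2.0, 4.0]       # made-up fitness values F_x
    print(selection_probabilities(errors))
    print(roulette_select(individuals, errors, random.Random(0)))
```

Because $f_x$ is inversely proportional to $F_x$, individuals with a smaller total error receive a larger share of the wheel; the coefficient $k$ cancels in Equation (15) and does not change the probabilities.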

The crossover operation between the $k$th chromosome $a_k$ and the $l$th chromosome $a_l$ at position $j$ was defined as:

$$\begin{cases} a_{kj} = a_{kj}(1 - b) + a_{lj}b \\ a_{lj} = a_{lj}(1 - b) + a_{kj}b \end{cases} \tag{17}$$

where $b$ is a random number in the range of 0 to 1.
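A minimal sketch of the crossover in Equation (17) is given below. It assumes that both updates use the pre-crossover gene values (the equation does not say whether they are applied simultaneously or in sequence); the function name and the example chromosomes are hypothetical.

```python
import random


def crossover(a_k, a_l, j, b):
    """Eq. (17): blend gene j of two real-coded chromosomes with weight b.

    Assumption: both new genes are computed from the original values.
    """
    new_k, new_l = list(a_k), list(a_l)
    new_k[j] = a_k[j] * (1.0 - b) + a_l[j] * b
    new_l[j] = a_l[j] * (1.0 - b) + a_k[j] * b
    return new_k, new_l


if __name__ == "__main__":
    rng = random.Random(0)
    parent_k, parent_l = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]  # made-up chromosomes
    position = rng.randrange(len(parent_k))
    print(crossover(parent_k, parent_l, position, rng.random()))
```

Under this reading the sum of the two genes at position $j$ is unchanged by the operation, and both offspring genes stay between the two parent values.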

The mutation operation was determined by Formulas (18) and (19):

$$a_{ij} = \begin{cases} a_{ij} + \left( a_{ij} - a_{\max} \right) \times f(g), & r > 0.5 \\ a_{ij} + \left( a_{\min} - a_{ij} \right) \times f(g), & r \le 0.5 \end{cases} \tag{18}$$

$$f(g) = r_2 \left( 1 - \frac{g}{G_{\max}} \right)^2 \tag{19}$$

where $a_{\max}$ is the upper bound of $a_{ij}$, $a_{\min}$ is the lower bound of $a_{ij}$, $r_2$ is a random number, $g$ is the current iteration number, $G_{\max}$ is the maximum number of evolution generations, and $r$ is a random number in the range of 0–1.
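The mutation rule can be sketched as follows. The code transcribes Equations (18) and (19) exactly as printed; the function names are hypothetical, and the random numbers $r$ and $r_2$ are passed in explicitly so that the behaviour is reproducible. Note that, as printed, both branches give a non-positive step whenever the gene lies within its bounds; some descriptions of non-uniform mutation use $(a_{\max} - a_{ij})$ in one branch instead, which is not assumed here.

```python
import random


def decay(g, g_max, r2):
    """Eq. (19): step scale that shrinks to zero as g approaches g_max."""
    return r2 * (1.0 - g / g_max) ** 2


def mutate_gene(a, a_min, a_max, g, g_max, r, r2):
    """Eq. (18), as printed in the text."""
    step = decay(g, g_max, r2)
    if r > 0.5:
        return a + (a - a_max) * step
    return a + (a_min - a) * step


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up gene value and bounds, early in a 100-generation run.
    print(mutate_gene(0.5, -1.0, 1.0, g=10, g_max=100, r=rng.random(), r2=rng.random()))
```

Since $f(g)$ decreases quadratically with the generation count, mutations are large early in the search and vanish in the final generation.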

The third stage: forecast the water content in oil. The final prediction model is used to predict the water content in oil.

#### **5. Results and Discussion**

The prediction model of water content in transformer oil was built in the Matlab (2014a) environment. The model was trained with 150 randomly selected samples, and its prediction accuracy was tested with the remaining 10 samples. Since the optimal number of hidden layer neurons is not known at the initial modeling stage, its range was determined by the empirical formulas [32–34]:

$$n < \sqrt{m + l} + a \tag{20}$$

$$n < \log_2 l \tag{21}$$

where $l$ is the number of input layer neurons, $n$ is the number of hidden layer neurons, $m$ is the number of output layer neurons, and $a$ is a constant in the range of 1–10. Therefore, in this paper, the number of hidden layer neurons was between 1 and 16. The back-check diagnosis was performed using networks with 1–16 hidden neurons, and the mean square error (MSE) of the recheck was calculated. The variation of the MSE with the number of neurons in the hidden layer is shown in Figure 7.

**Figure 7.** The MSE of BP neural network recheck.

As shown in Figure 7, the MSE of the model was at its minimum when the number of hidden layer neurons was 5. Therefore, following Section 4.2, the topological structure of the BPNN model was determined as "8-5-1" after repeated experiments. To improve the convergence speed and prediction accuracy of the BPNN, its weights and thresholds were optimized by the GA.

The quality of a solution in the genetic algorithm is evaluated by its fitness value. In this model, the individual fitness value was the sum of the absolute errors between the predicted output and the expected output, so a lower fitness value is better. Figure 8 shows that the optimal fitness declined during the evolution process; the final optimal individual fitness value was 9.6.

**Figure 8.** The fitness curve of Genetic algorithms.

The optimal weights and the optimal thresholds obtained from the optimization of GA were assigned to the BPNN. Then, the model was trained with the training set. The regression curve of the model is shown in Figure 9. Figure 9 shows that the correlation coefficient of the PCA-GA-BPNN model was about 0.98, indicating that the model has a good regression fit.

**Figure 9.** The regression fitting curve of GA-BPNN.

The MSE of the training process of the PCA-GA-BPNN is shown in Figure 10. The MSE of this model gradually decreased as the number of training iterations increased. The minimum MSE (8.65 × 10<sup>−5</sup>) was obtained by the PCA-GA-BPNN at the 18th iteration.

**Figure 10.** The training process of GA-BPNN.

To demonstrate the superiority of the PCA-GA-BPNN prediction model of water content in transformer oil, its prediction accuracy was compared with that of the BPNN and the GA-BPNN. The three models were applied to the test set; the prediction results are shown in Table 1 and Figure 11.


**Table 1.** Prediction results of water content.

**Figure 11.** Prediction results.

To quantitatively analyze the predictive performance of the three models, the mean absolute percentage error (MAPE) was used to compare the prediction errors of the BPNN model, the GA-BPNN model and the PCA-GA-BPNN model. According to Table 1, the MAPE of the BPNN was 17.03%, the MAPE of the GA-BPNN was 10.31%, and the MAPE of the PCA-GA-BPNN was 7.07%.
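For reference, the MAPE can be computed as in the sketch below, using the common definition MAPE = (100/*n*) Σ |(*y*<sub>obs</sub> − *y*<sub>pred</sub>) / *y*<sub>obs</sub>|. The paper does not print its formula, so this definition is an assumption, and the numbers in the usage lines are made up rather than taken from Table 1.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every observed value is non-zero.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


if __name__ == "__main__":
    # Made-up water contents in mg/L, for illustration only.
    measured = [10.0, 20.0, 40.0]
    estimated = [9.0, 22.0, 38.0]
    print(mape(measured, estimated))
```

Because each error is normalized by the observed value, the MAPE weights a given absolute error more heavily for samples with a low water content.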
