1. Introduction
The transformer is one of the most important pieces of equipment in the power system, and its operating status directly affects the safety and stability of the whole system. Transformer failure can cause great harm to the power system. Internal failure and reduced service life are usually caused by the degradation of insulation capability, and internal overheating is the main cause of this degradation, so it is important to obtain an accurate internal temperature of the transformer [1]. Hot spot temperature and oil temperature are important indicators of the internal temperature of an oil-immersed transformer: the hot spot temperature is the temperature of the hottest point inside the transformer, and the oil temperature is the temperature of the insulating oil; both are important indexes for assessing the transformer's operational status. Oil temperature is usually measured directly by temperature sensors. The hot spot temperature can be obtained in two ways: (1) measured directly using fibre optic sensors or (2) calculated using an analytical hot spot temperature method. However, fibre optic sensors are costly and difficult to maintain, while calculations based on load guidelines and numerical methods cannot account for environmental, temporal, and other factors and therefore cannot determine the hot spot temperature accurately [2].
Elevated oil temperature is usually caused by transformer overload operation, cooling system abnormalities, equipment ageing, and other factors. A transformer can usually tolerate short periods of overload, which raises the oil temperature; prolonged overload operation, however, may shorten the transformer's service life or damage it. An abnormal cooling system impairs heat dissipation, which also causes the oil temperature to rise. Equipment ageing increases the transformer's magnetic losses and reduces its insulation capacity, again raising the oil temperature. Rising oil temperature in turn degrades the transformer's insulation: it accelerates the ageing of insulating materials, reducing their insulation capability, and high temperature accelerates the oxidation of the insulating oil, whose oxidation products further reduce the transformer's insulating ability. As the temperature continues to rise, the dielectric loss of the insulating oil increases, and breakdown of the insulating oil may even occur, seriously affecting the stable operation of the transformer.
The current mainstream research establishes prediction models for the transformer hot spot temperature. However, accurate hot spot temperature data are difficult to obtain, and oil temperature data have comparable reference value. This paper therefore models and predicts oil temperature using artificial intelligence algorithms trained on historical data, extracting the latent internal characteristics of the data to establish a prediction model. In the context of big data mining and analysis, such a model supports dynamic capacity increase of the transformer, providing predictive and efficient load reduction suggestions and countermeasures [3]. Accurate prediction of transformer oil temperature allows the transformer's operating status to be monitored and adjusted effectively and reduces the energy wasted through overheating or overcooling, thus improving the efficiency of energy utilisation [4].
In previous studies on transformer temperature prediction, deep learning methods were first applied to hot spot temperature prediction by Daponte P et al., although the inputs of their network model considered only environmental factors and load currents. Building on this work, Pradhan MK et al. proposed an improved artificial neural network model that adds the oil temperature parameter as a model input [5]. Li Mengli et al. [6] increased the input feature dimension to five, making the prediction results more accurate. Li Shuqing et al. [7] applied a grey neural network model to transformer temperature prediction, improving the prediction accuracy under a limited amount of data. Jiang Bing et al. [8] used a BP neural network to predict the hot spot temperature of transformer windings, but this approach may converge to local optima, so subsequent researchers established various optimisation algorithms to tune the hyperparameters of the BP neural network and avoid this problem; for example, the literature [9] optimised the BP neural network with an improved PSO algorithm to predict transformer oil temperature.
The support vector machine (SVM) method was initially used to solve classification problems, but given its advantages in handling nonlinear, high-dimensional data, researchers have applied it to transformer temperature prediction. However, SVMs are sensitive to their parameters, and improper parameter selection can significantly degrade the results, so this method requires an optimal solution for its parameters [10]. Chen Weigen et al. [11] used a genetic algorithm to optimise the SVM parameters, and the prediction accuracy was better than that obtained with default parameters. Yu Xi et al. [12] used the particle swarm optimisation (PSO) algorithm to optimise the SVM parameters and, comparing the results with the genetic algorithm, concluded that PSO performs better. Liu Gang et al. [13] overcame the tendency of traditional PSO to fall into local minima and improved its convergence speed by introducing a contraction factor into the particle velocity update of a PSO-optimised LSSVM. Liang Feng et al. [14] used the ant colony optimisation (ACO) algorithm to optimise the SVM parameters and compared it with the PSO algorithm; the hot spot temperature predictions were better than those of the PSO-SVM model, but the selection rules for input features remained vague. Jingke Liu et al. [15] established a PCA-IHHO-LSSVM model to predict the transformer's temperature: the model input features are first determined using PCA, and the LSSVM parameters are then optimised with the Harris hawks algorithm, which makes feature selection more systematic and yields better prediction accuracy. However, the above SVM studies work on small-scale or idealised experimental datasets, so their prediction performance on real large-scale datasets cannot be determined.
The experimental datasets of the above methods are mostly small, with sizes of about several hundred samples, while datasets in actual engineering are considerably larger, so the prediction performance of these methods on real large-scale datasets cannot be determined. In addition, most previous studies used idealised datasets and ignored data processing, whereas actual transformer data are complex and cannot be fed to a model directly; they must first be analysed and processed. Therefore, this paper proposes a transformer oil temperature prediction method based on data-driven multi-model fusion. The method first analyses and processes real transformer operation and inspection data: duplicate data are eliminated, missing data are filled in, and abnormal data are screened, with outlier detection combining theoretical detection methods and practical operation and inspection experience. The model is constructed in two parts. The first part is the base model, built from five different machine learning methods to extract features from different feature dimensions of the dataset. The second part is an improved SSA-BP neural network model: the oil temperature predictions produced by the base models are used as inputs to the neural network, the hyperparameters of the BP neural network are optimised using the improved sparrow search algorithm (TSSA), and the BP neural network performs a second-stage prediction on the initial predicted values to produce the final result. The purpose is to build a high-precision oil temperature prediction model from the transformer's normal operation data; real-time monitoring data are then fed into the model, and the predicted value is compared with the actual value to determine whether the current oil temperature is abnormal. The method is applied to several different datasets in this paper, and it shows good prediction results on all of them.
Section 2 of this paper covers data analysis and data processing, in which the original dataset is subjected to data transformation and classification, model input feature selection, and data cleaning. Section 3 introduces the model algorithms, describing the principles of the base-model algorithms and the application of the TSSA-BP method. Section 4 presents the experimental design and result analysis, describing and analysing the experimental results. Section 5 gives the concluding remarks, summarising the research methodology of this paper and looking ahead. Considering the large number of acronyms in this paper, a list of acronyms is provided in Table 1.
3. Introduction of Model Algorithm
The data-driven multi-model fusion prediction method achieves an accurate prediction of oil temperature by integrating the base model and the secondary prediction model, and the overall process of model construction is shown in Figure 3.
The process is as follows: the feature dataset is obtained through data processing; the feature data are input into the five base models for an initial prediction of oil temperature; the resulting predicted oil temperature values are used as inputs to a BP neural network; the improved SSA optimisation algorithm searches the parameters of the BP neural network; and the BP neural network is trained with the optimal parameter combination to produce the final oil temperature prediction.
3.1. Introduction to Base Modelling Algorithms
The base model stage makes the initial prediction of transformer oil temperature using five machine learning methods: multiple linear regression, ridge regression, support vector regression, decision tree regression, and KNN regression. The principles of these algorithms are described below, followed by a construction sketch after the list.
(1) Multiple linear regression: Multiple linear regression is a statistical method used to analyse the linear relationship between multiple independent variables and a single dependent variable [20]. The best prediction accuracy is obtained by finding a set of best-fit coefficients, and the method usually performs well when there is a strong linear relationship between the inputs and outputs. It is given by the formula

$$f(x) = l_1 x_1 + l_2 x_2 + \cdots + l_n x_n + b,$$

where f(x) represents the dependent variable; l represents the weight coefficients trained by the model; x represents the independent variables, and b is the bias. The coefficients of a multiple linear regression are usually determined by the least squares method;
(2) Ridge regression: Ridge regression is a linear regression method for dealing with multicollinear data. In multivariate linear regression, ordinary least squares estimation produces unstable results when the predictor variables are strongly correlated, and ridge regression introduces a regularisation term to solve this problem [21]. The objective function of linear regression is

$$J(\beta) = \sum_{i=1}^{n} \left( y_i - \boldsymbol{x}_i^{\top}\boldsymbol{\beta} \right)^2.$$

To shrink the regression coefficients β, ridge regression adds the following L2-norm penalty term to the objective function:

$$J(\beta) = \sum_{i=1}^{n} \left( y_i - \boldsymbol{x}_i^{\top}\boldsymbol{\beta} \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2,$$

where λ is a non-negative number; the larger λ is, the smaller the regression coefficients β become in order to minimise J(β). The inclusion of the L2-norm penalty makes the matrix $X^{\top}X + \lambda I$ full rank and ensures invertibility. However, it also means the estimate of the regression coefficients β is no longer unbiased. Regularisation improves model stability and reduces the risk of overfitting;
(3) Support vector regression: The core of the support vector machine method is to map the data from a low-dimensional space to a high-dimensional space through a kernel function and to find a hyperplane in the high-dimensional space that minimises the error between the training samples and the hyperplane; the regression output is then a kernel-weighted combination of the support vectors [22]. Considering that the radial basis kernel function has better generalisation performance, this paper chooses the radial basis kernel, which is mathematically expressed as

$$K(\mu, v) = \exp\left( -\frac{\lVert \mu - v \rVert^2}{2\sigma^2} \right),$$

where µ and v are sample data, and σ is the kernel width parameter.
SVM optimises the model by maximising the margin and minimising the total loss. When using support vector machines for nonlinear regression prediction, initial values must be assigned to the penalty factor C and the kernel parameter σ. Since these initial values affect the prediction performance of the model, bio-inspired algorithms are needed to optimise its hyperparameters; in this paper, a genetic algorithm is used to optimise them;
(4) Decision tree regression: The decision tree algorithm is a basic method in machine learning; regression decision trees mainly refer to the CART algorithm, and their "regression" determines the output value corresponding to a feature vector. A regression tree divides the feature space into several units, each with a specific output. A regression tree is constructed in the input space containing the training data by recursively dividing each region into two sub-regions and deciding the output value on each sub-region, yielding a binary decision tree [23]. The decision tree generating function is

$$f(x) = \sum_{m=1}^{M} c_m \, I(x \in R_m),$$

where M is the number of partitioned regions R_m; c_m is the output value of region R_m, and I is the indicator function.
Decision tree construction typically starts at the root node and recursively creates child nodes based on the best segmentation point for each feature. When selecting segmentation criteria, the decision tree algorithm must evaluate the effect of each candidate split; here, the criterion selects the feature and threshold that minimise the mean square error, with the ultimate goal of determining the optimal features for segmenting the data;
(5) KNN regression: KNN (K-nearest neighbour) regression is based on the idea that each n-dimensional input variable corresponds to a point in the feature space, and the output is the category label or predicted value associated with that feature vector [24]. Regression prediction with the KNN algorithm finds the k nearest neighbours of a new instance and takes the mean of the targets of these k samples as the predicted value, expressed as

$$\hat{y} = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i.$$

Prediction using KNN regression requires a distance metric to measure the similarity between data points; commonly used metrics are the Euclidean distance and the Manhattan distance [25]. In this paper, the Euclidean distance is chosen.
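As a concrete reference, the following is a minimal scikit-learn sketch of the five base models, using the parameter values reported later in Section 4.2; scikit-learn itself and any setting not stated in the paper are assumptions, since the paper only states that the experiments were run in Python.

```python
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# The five base models with the parameter values reported in Section 4.2;
# anything not stated there is left at its scikit-learn default.
base_models = {
    "mlr":   LinearRegression(),                 # least-squares fit
    "ridge": Ridge(alpha=0.5, solver="auto"),    # L2-regularised regression
    "svr":   SVR(kernel="rbf", C=1000, epsilon=0.1, gamma=0.05),
    "tree":  DecisionTreeRegressor(criterion="squared_error",
                                   max_depth=5, max_leaf_nodes=10),
    "knn":   KNeighborsRegressor(n_neighbors=30, weights="distance",
                                 algorithm="brute", metric="euclidean"),
}

def fit_base_models(X_train, y_train):
    """Fit all five base models on the feature dataset."""
    for model in base_models.values():
        model.fit(X_train, y_train)
    return base_models
```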
3.2. Introduction to Multi-Model Fusion Methods
A multi-model fusion method combines multiple models to improve generalisation ability and prediction accuracy; its basic idea is to combine the prediction results of several models to obtain a better overall prediction [26]. A typical multi-model fusion method obtains the final prediction by voting or by a weighted average of the base learners' predictions; in this paper, instead, an improved SSA-BP neural network is trained on the initial prediction results to extract their latent features and improve the prediction accuracy, as sketched below.
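A minimal sketch of this fusion idea, continuing the base_models sketch above: the column-stacked base-model predictions become the meta-features x1 to x5 on which the second-stage network is trained. Variable names are illustrative, not the authors' code.

```python
import numpy as np

def base_predictions(models, X):
    """Column-stack the five base-model predictions: one meta-feature per
    model, i.e. the x1..x5 inputs of the second-stage TSSA-BP network."""
    return np.column_stack([m.predict(X) for m in models.values()])

# Z_train = base_predictions(base_models, X_train)  # meta-features, shape (n, 5)
# Z_test  = base_predictions(base_models, X_test)
```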
3.2.1. Principles of Improved Sparrow Search Algorithm
The sparrow search algorithm (SSA) is a population optimisation method inspired by the foraging behaviour of sparrows. In nature, sparrows naturally divide into two roles when searching for food: discoverers and followers. Discoverers scout for food sources and indicate to the group the location of food and the search path; followers rely on the discoverers' guidance to obtain food. During foraging, sparrows observe the movements of their peers, and when high-intake peers are detected, attackers compete with them to increase their own feeding opportunities. When the population is threatened, the sparrows adopt strategies to escape predators [27].
However, the sparrow search algorithm still tends to fall into local optima and suffers from high randomness [28]. The improved sparrow search algorithm, the adaptive t-distributed sparrow search algorithm (TSSA), reduces the probability of SSA falling into local optima and improves its optimisation ability. Compared with SSA, TSSA applies a t-distributed perturbation to each sparrow's position during the position update and redefines the position updating method, dynamically selecting a probability P that regulates the use of the adaptive t-distribution variation operator. Through this adaptive mechanism, TSSA adjusts the sparrow search to avoid local optima, accelerates convergence, and improves the ability to find globally optimal solutions [29].
In the original SSA framework, the discoverer position update at each iteration is described as

$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left( \dfrac{-i}{\alpha \cdot iter_{max}} \right), & R_2 < ST \\ X_{i,j}^{t} + Q \cdot L, & R_2 \geq ST \end{cases}$$

In the equation, t represents the current iteration number; iter_max is the maximum iteration number; X_{i,j} represents the position of the ith sparrow in the jth dimension; α ∈ (0, 1] is a random number; R_2 represents the warning value, in the range [0, 1]; ST is the safety value, in the range [0.5, 1]; Q is a random number obeying a normal distribution, and L is a 1 × d matrix whose elements are all 1. When R_2 < ST, there is no predator in the surroundings and an extensive search can be performed; when R_2 ≥ ST, a sparrow in the population has found a predator and alerted the others, and all the sparrows must move to other safe areas to forage. The follower's position update is described as
$$X_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\left( \dfrac{X_{W}^{t} - X_{i,j}^{t}}{i^{2}} \right), & i > n/2 \\ X_{P}^{t+1} + \left| X_{i,j}^{t} - X_{P}^{t+1} \right| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases}$$

where X_P represents the global optimal position; X_W represents the global worst position; A is a 1 × d matrix whose elements are randomly assigned 1 or −1, and $A^{+} = A^{\top}(AA^{\top})^{-1}$. When i > n/2, the ith follower did not forage successfully and has a low survival rate, so it must fly to other areas to forage; otherwise, its new position is determined by the position update formula around the optimal position.
When the population is threatened by a natural enemy, the scout sparrows send out an early warning signal and the population engages in anti-predation behaviour; the position update is mathematically described as

$$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta \cdot \left| X_{i,j}^{t} - X_{best}^{t} \right|, & f_i > f_{best} \\ X_{i,j}^{t} + K \cdot \left( \dfrac{\left| X_{i,j}^{t} - X_{W}^{t} \right|}{(f_i - f_w) + \varepsilon} \right), & f_i = f_{best} \end{cases}$$

In the above expression, X_best is the global optimal position; β is a step-size adjustment parameter drawn from a normal distribution; K is a step-size adjustment parameter indicating the direction of the sparrow's movement; f_i is the fitness value of the ith sparrow; f_best is the current global optimal fitness value; f_w is the worst fitness value, and ε is a small constant that avoids division by zero. When f_i > f_best, the sparrow is foraging at the edge of the population, where it is easily detected by predators. When f_i = f_best, the sparrow is located in the centre of the population; having detected predators nearby, it must move to other areas to avoid predation.
TSSA introduces an adaptive t-distribution into the position update, which is described as

$$X_{i}^{t+1} = X_{i}^{t} + X_{i}^{t} \cdot t(iter),$$

where X_i^{t+1} is the sparrow position after perturbation; X_i^t is the current position of the sparrow, and t(iter) is a t-distribution variation operator whose degrees-of-freedom parameter is the iteration number. A dynamic selection probability p regulates the application of the adaptive t-distribution variation operator and is mathematically expressed as
$$p = w_1 - w_2 \cdot \frac{iter_{max} - iter}{iter_{max}},$$

where w_1 is the upper limit of the dynamic selection probability; w_2 is the magnitude of its change; iter_max is the maximum number of iterations, and iter is the current number of iterations.
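The following is a compressed, illustrative sketch of the TSSA update loop built from the equations above. It simplifies the A⁺ follower term, omits the scout (anti-predation) update for brevity, and assumes minimisation; parameter names (ST, discoverer fraction, w1, w2) follow the text, but this is a sketch under those assumptions, not the authors' code.

```python
import numpy as np

def tssa(fitness, dim, n_pop=20, iter_max=30, lb=-1.0, ub=1.0,
         st=0.8, producer_frac=0.2, w1=0.5, w2=0.1, seed=0):
    """Minimal TSSA sketch: SSA updates plus adaptive t-distribution
    mutation. `fitness` maps a position vector to a value to minimise."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_pop, dim))
    fit = np.array([fitness(x) for x in X])
    n_prod = max(1, int(producer_frac * n_pop))
    best, f_best = X[fit.argmin()].copy(), fit.min()

    for it in range(1, iter_max + 1):
        order = fit.argsort()
        X, fit = X[order], fit[order]           # best sparrows first
        r2 = rng.random()                       # warning value R2
        for i in range(n_prod):                 # discoverers
            if r2 < st:                         # no predator: wide search
                X[i] = X[i] * np.exp(-i / (rng.random() * iter_max + 1e-9))
            else:                               # alarm: normal Q*L step
                X[i] = X[i] + rng.normal() * np.ones(dim)
        for i in range(n_prod, n_pop):          # followers
            if i > n_pop / 2:                   # unsuccessful: fly elsewhere
                X[i] = rng.normal() * np.exp((X[-1] - X[i]) / (i ** 2))
            else:                               # forage around the best
                A = rng.choice([-1.0, 1.0], dim)
                X[i] = X[0] + np.abs(X[i] - X[0]) * A / dim
        # adaptive t-distribution mutation, applied with dynamic probability p
        p = w1 - w2 * (iter_max - it) / iter_max
        for i in range(n_pop):
            if rng.random() < p:
                X[i] = X[i] + X[i] * rng.standard_t(it)
        X = np.clip(X, lb, ub)
        fit = np.array([fitness(x) for x in X])
        if fit.min() < f_best:
            f_best, best = fit.min(), X[fit.argmin()].copy()
    return best, f_best
```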
3.2.2. Introduction to BP Neural Networks
Because of their nonlinear modelling ability and data-driven characteristics, multilayer neural networks perform well on regression problems. The BP neural network is a multilayer feed-forward neural network that adjusts the weights and biases in the network through forward signal propagation and error backpropagation [30]. The structure of a BP neural network comprises an input layer, a hidden layer, and an output layer: the number of input neurons is determined by the model's input features; the number of output neurons is determined by the number of model outputs, and the number of hidden neurons is derived from empirical formulas or model tuning. The network structure is shown in Figure 4.
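As an illustration, scikit-learn's MLPRegressor can stand in for a hand-rolled BP network with the 5-10-1 architecture and training settings listed later in Section 4.3.2; this is a sketch under that assumption, not the authors' implementation.

```python
from sklearn.neural_network import MLPRegressor

# 5 inputs (the five base-model predictions), 10 hidden neurons, 1 output,
# ReLU activation, matching the settings listed in Section 4.3.2.
bp_net = MLPRegressor(hidden_layer_sizes=(10,), activation="relu",
                      solver="sgd", learning_rate_init=0.01,
                      max_iter=500, random_state=0)
# bp_net.fit(Z_train, y_train); y_hat = bp_net.predict(Z_test)
```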
3.2.3. Improving the SSA-BP Model
To prevent the BP neural network from falling into local minima and to improve its prediction accuracy, an improved SSA-BP neural network model is established. During the BP network's local search, the improved SSA locates the update positions of the weights and thresholds faster, supplies better hyperparameters for training, and improves both the prediction accuracy and the convergence speed of the BP neural network [31]. The flowchart of the improved SSA optimising the BP neural network is shown in Figure 5.
The process first takes the base-model prediction results as inputs to the BP neural network and initialises the network's parameters. The weights and biases of the BP neural network are then optimised with the TSSA algorithm: the sparrow population positions are initialised; the t-distribution variation operator is added to the position updating process, with discoverer updates simulating the global search and follower updates simulating the local search; and the optimal individual of the population is determined. Once the termination conditions are satisfied, the optimal weights and thresholds are output, the model is trained with these optimal parameters, and the final prediction of transformer oil temperature is completed.
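One common way to couple TSSA to the BP network is sketched below, under the assumption that each sparrow's position vector encodes all weights and biases of the 5-10-1 network and that the fitness is the training MSE; the flowchart in Figure 5 describes this coupling only at a high level, so the decoding here is an illustrative choice rather than the authors' exact scheme.

```python
import numpy as np

def make_fitness(Z_train, y_train, n_in=5, n_hid=10):
    """Fitness for TSSA: decode a flat parameter vector into the weights
    and biases of a 5-10-1 ReLU network and return the training MSE."""
    n_w1, n_b1, n_w2 = n_in * n_hid, n_hid, n_hid      # slice sizes
    def fitness(vec):
        W1 = vec[:n_w1].reshape(n_in, n_hid)
        b1 = vec[n_w1:n_w1 + n_b1]
        W2 = vec[n_w1 + n_b1:n_w1 + n_b1 + n_w2].reshape(n_hid, 1)
        b2 = vec[-1]
        h = np.maximum(Z_train @ W1 + b1, 0.0)         # ReLU hidden layer
        y_hat = (h @ W2).ravel() + b2
        return float(np.mean((y_hat - y_train) ** 2))
    return fitness

# 5*10 + 10 + 10 + 1 = 71 parameters to search with the tssa(...) sketch:
# best_vec, best_mse = tssa(make_fitness(Z_train, y_train), dim=71)
```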
In the multi-model fusion prediction method, the base models include multiple linear regression, ridge regression, support vector regression, KNN nearest-neighbour regression, and decision tree regression. The prediction-time computational complexity of multiple linear regression, support vector regression, ridge regression, and decision tree regression is relatively low and can be ignored in the overall complexity. Brute-force KNN regression, however, must be considered: its prediction time complexity is O(n·f + k·f), where n is the number of training samples, f is the number of features, and k is the number of nearest neighbours. For BP neural networks, the training complexity depends on the number of layers and the number of neurons per layer, usually expressed as O(p·q + (r − 1)·q² + q·s) per sample, where p is the number of input neurons; q is the number of neurons per hidden layer; r is the number of hidden layers, and s is the number of output neurons. When dealing with large-scale datasets, model simplification and approximation methods can reduce the computational cost: the neural network can be simplified by reducing the number of layers and neurons, and KNN regression can use approximate nearest-neighbour search to reduce the computation.
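For example, in scikit-learn the KNN query cost can be reduced by switching from brute-force search to a KD-tree index, as sketched below; note that a KD-tree remains an exact search (true approximate methods such as locality-sensitive hashing go further), and this swap is an assumption rather than the configuration used in the paper.

```python
from sklearn.neighbors import KNeighborsRegressor

# Swapping brute-force search (O(n*f) per query) for a KD-tree index cuts
# the average query cost to roughly O(f * log n) on low-dimensional data.
knn_fast = KNeighborsRegressor(n_neighbors=30, weights="distance",
                               algorithm="kd_tree", metric="euclidean")
```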
4. Experiment Design and Result Analysis
The experimental design and result analysis section is divided into three subsections: dataset division, base model prediction, and the multi-model fusion experiment with result analysis. Dataset division splits the processed data into 8 experimental datasets according to different practical situations. Base model prediction shows the prediction performance of the different base models on the divided datasets. The multi-model fusion experiment and result analysis present the verification and comparison experiments and analyse their results.
4.1. Dataset Partitioning
To verify the feasibility of the multi-model fusion method on real, complex data while considering the effects of season and transformer load state on the prediction results, 8 experimental datasets are constructed: small-scale and larger-scale datasets for each of the four conditions of high load in summer, low load in summer, high load in winter, and low load in winter. Each small-scale dataset consists of 200 sets of experimental data, of which 180 are used for model training and 20 for validation; each larger-scale dataset consists of 2000 sets of experimental data, of which 1800 are used for training and 200 as a test set to validate the model.
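A trivial sketch of this partition follows; the paper does not state whether the split is chronological or random, so a simple sequential head/tail split is assumed here.

```python
def split(data, n_train):
    """Head/tail split of an ordered dataset into train and test parts."""
    return data[:n_train], data[n_train:]

# small-scale dataset:  200 rows  -> train, test = split(data, 180)
# larger-scale dataset: 2000 rows -> train, test = split(data, 1800)
```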
To verify the effectiveness of experiments driven by larger-scale real data, this paper first conducts experiments using the base models.
4.2. Base Model Predictions
The multi-model fusion approach first requires a base-model prediction stage, which produces an initial prediction of the transformer oil temperature through the five machine learning methods. Based on grey correlation analysis, the model inputs are determined to be load current, ambient temperature, load factor, and active power, and the model output is the oil temperature. The summer high-load small-scale and larger-scale datasets are selected to present the base models' prediction results: the small-scale dataset uses 180 sets of data for training and 20 for testing, and the larger-scale dataset uses 1800 sets for training and 200 for testing.
For the base-model parameter settings: alpha for ridge regression is set to 0.5 with the solver in auto mode; the penalty factor C for support vector regression is set to 1000 with the radial basis kernel function, epsilon set to 0.1, and gamma set to 0.05; decision tree regression uses the mean squared error as the split quality function, the default splitting strategy, a maximum tree depth of five, and a maximum of ten leaf nodes; KNN regression uses 30 neighbours, distance-based weights, the brute-force algorithm, and the Euclidean distance metric. The experiments are performed in Python (version 3.13), and the prediction results are shown in Figure 6.
Analysis of the base models' prediction results shows that the decision tree method has the best prediction performance and follows the trend of the real data; multiple linear regression has a large error, although its prediction trend is also consistent with the real data.
The test set for the summer high-load, larger-scale dataset contains 200 sets of data, which are too numerous to visualise clearly on a line graph, so the base-model prediction results are presented in tabular form.
From Table 7, it can be seen that when predicting on larger-scale datasets, factors such as the increased data volume and more complex data reduce the prediction accuracy of the above base models compared with the small-scale datasets; a multi-model fusion method is therefore established to improve the prediction ability on larger-scale datasets.
4.3. Multi-Model Fusion Experiment and Result Analysis
4.3.1. Experimental Design
The base models' prediction results show that their accuracy is lower on the larger datasets, so the prediction performance of the multi-model fusion method is verified through further experiments; to this end, this paper designs the experiments shown in Table 8.
The validation experiments verify the prediction performance of the multi-model fusion method in different practical situations using the eight different datasets. The comparison experiments apply different prediction methods to the same dataset to compare and analyse their performance.
4.3.2. Verification Experiment
The multi-model fusion approach retrains on the base-model prediction results through the TSSA-BP model, which is mathematically expressed as

$$f(x) = \text{TSSA-BP}(x_1, x_2, x_3, x_4, x_5),$$

where f(x) is the prediction result of the integrated learning model; TSSA-BP stands for the BP neural network model optimised by the improved sparrow algorithm, and x_1 to x_5 represent the prediction results of the five base models.
In conducting the validation experiments, the small-scale datasets are validated first, i.e., Experiment 1, Experiment 3, Experiment 5, and Experiment 7 are carried out and their results analysed.
Regarding the parameter settings of the multi-model fusion method, the settings for the small-scale and larger-scale datasets are not identical. For the small-scale datasets, the sparrow population size is set to 20; the number of search iterations to 30; the optimisation parameter dimensionality to two; the warning value ST to 0.8, and the proportion of discoverers to 20%. The BP neural network has five input nodes, ten hidden nodes, and one output node; the activation function is ReLU; the learning rate is 0.01; the number of iterations is 500, and the loss function is the MSE.
The evaluation metrics chosen are the root mean square error and the mean absolute percentage error. The root mean square error (RMSE) measures the difference between the model's predicted values and the true values; the closer the RMSE is to zero, the more accurate the model's predictions. Because it squares the errors, it gives more weight to larger prediction errors. The mean absolute percentage error (MAPE) measures prediction accuracy by computing the absolute difference between the predicted and true values as a percentage of the true value; the smaller the MAPE, the better the model's prediction performance. Both indicators measure the difference between predicted and actual values. The root mean square error is calculated as

$$RMSE = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( h(x_i) - y_i \right)^2}.$$

The mean absolute percentage error is calculated as

$$MAPE = \frac{100\%}{m} \sum_{i=1}^{m} \left| \frac{h(x_i) - y_i}{y_i} \right|,$$

where m denotes the number of samples; h(x_i) denotes the predicted value of the ith sample, and y_i denotes its actual value.
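Both metrics are straightforward to compute; a minimal NumPy sketch follows, with the model output h(x_i) passed as y_pred.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0)
```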
The small-scale datasets with high summer load, low summer load, high winter load, and low winter load are modelled and predicted using the multi-model fusion prediction method to analyse its performance across different situations and data sizes. The prediction results of Experiment 1, Experiment 3, Experiment 5, and Experiment 7 are shown in Figure 7, Figure 8, Figure 9 and Figure 10.
Figure 7 shows the prediction results of Experiment 1, with an RMSE of 0.5581 and an MAPE of 0.81%; the prediction trend is consistent with the actual values, but the prediction is less accurate at some peak points.
Figure 8 shows the prediction results of Experiment 3, with an RMSE of 0.6525 and an MAPE of 0.96%; the trends of the predicted and actual oil temperature values are the same, and the prediction error is larger at the peak points than at the other sample points.
Figure 9 shows the prediction results of Experiment 5, with an RMSE of 0.1567 and an MAPE of 0.38%; the prediction errors at all test sample points are small, indicating that the multi-model fusion method performs particularly well on this dataset.
Figure 10 shows the prediction results of Experiment 7; the RMSE is 0.3692, and the MAPE is 0.67%. From the figure, it can be seen that the trend of the predicted value is basically consistent with the real value, and the model prediction error is small.
As can be seen from Figure 7, Figure 8, Figure 9 and Figure 10, the predicted values of the data-driven multi-model fusion approach on the small-scale datasets are consistent with the trend of the actual values, and the method shows good prediction accuracy for the summer low-load and high-load and winter low-load and high-load small-scale datasets alike.
For the larger datasets, i.e., Experiment 2, Experiment 4, Experiment 6, and Experiment 8, the prediction results are presented in tabular form, given the large number of predicted sample points.
Parameter settings for the larger datasets are determined experimentally: the sparrow population size is set to 50; the number of search iterations to 100; the optimisation parameter dimensionality to two; the warning value ST to 0.8, and the proportion of discoverers to 20%. The BP neural network has ten hidden nodes; the activation function is ReLU; the learning rate is 0.001; the number of iterations is 1000, and the loss function is the MSE.
The prediction results of the multi-model fusion prediction method on the larger datasets are shown in Table 9.
As can be seen from Table 9, the multi-model fusion method also performs well with a data size of 2000 groups, demonstrating its feasibility on larger datasets. The prediction error is smallest for high load in winter, with an RMSE of 0.3694 and an MAPE of 0.71%, and largest for low load in winter, with an RMSE of 1.0877 and an MAPE of 1.58%. The experimental results show that the error between the predicted and actual values of the multi-model fusion method is about one degree; since the purpose of transformer oil temperature prediction is to evaluate the transformer's operating status and to warn of abnormal data based on the predicted value, an error of one degree or less basically meets the practical requirements of the project.
4.4. Comparison Experiment
GA-BP and SSA-BP prediction models are established on the same sample dataset: the GA-BP model is the genetic algorithm-optimised BP neural network [32], and the SSA-BP model is the sparrow search algorithm-optimised BP neural network [33]. The effects of the different prediction methods on the same data are explored through comparative experiments: the three models are trained on the summer high-load small-scale dataset and the summer high-load larger-scale dataset, respectively, and their performance is verified. The prediction results of the three models on the summer high-load small-scale dataset are shown in Figure 11.
Comparing the prediction curves of the different methods in the figure shows that the multi-model fusion method outperforms the other prediction methods. The root mean square errors of the multi-model fusion method, the GA-BP model, and the SSA-BP model on the summer high-load small-scale dataset are 0.5581, 1.8437, and 1.7184, respectively; the multi-model fusion method has the smallest error, and the trend of its prediction curve matches the real values.
The prediction results of the three models on the summer high-load, larger-scale dataset are shown in Table 10, from which it can be seen that the multi-model fusion method used in this paper has a root mean square error of 0.6479 and a mean absolute percentage error of 0.95%, the smallest of the three prediction models, further validating the feasibility of the method.
5. Conclusions
In this paper, a transformer oil temperature prediction method based on data-driven multi-model fusion is proposed, and its innovation has two main points. 1. Data processing and experimental validation are based on larger-scale real operating data. On the one hand, an abnormal data detection method for real operating data is studied: since a data-driven approach presupposes standardised and accurate data, and real data are complex, the normality and accuracy of the datasets are improved through data analysis and processing. On the other hand, for the large volume of varied real operating data under different conditions, eight different types of datasets are constructed, and the prediction performance of the multi-model fusion method is investigated across different seasons, loads, and dataset sizes to validate its generalisation to real data. 2. A multi-model fusion prediction method is established for transformer oil temperature prediction, in which the initial prediction results of different machine learning methods are used as inputs to the TSSA-BP neural network. The method uses different base models for the initial prediction to improve generalisation, uses TSSA to optimise the parameters of the BP neural network, and finally learns the relationship between the base-model predictions and the real values via TSSA-BP, improving both the generalisation ability and the prediction accuracy of the model.
The experimental results show that the maximum root mean square error and mean absolute percentage error of the method across the different datasets are 1.0877 and 1.58%, respectively. The verification experiments show that the oil temperature prediction error of the multi-model fusion method on real transformer operation and inspection data is about one degree, verifying the method's practicality for transformer oil temperature prediction. The comparison experiments show that the prediction accuracy of the proposed multi-model fusion method is better than that of the other prediction methods, verifying its feasibility for transformer oil temperature prediction.
Multi-model fusion prediction methods also have limitations, mainly regarding base-model parameter optimisation and computational cost. Different base models have different parameters, and tuning each of them to avoid overfitting or underfitting is time-consuming. The multi-model fusion approach also increases the computational cost, which must be considered in real-time prediction applications on large-scale data. Although the method proposed in this paper achieves good prediction results on large-scale datasets, the actual modelling data in engineering may be even larger and more complex, so subsequent research will model and predict on larger-scale datasets to improve the generalisation ability and prediction performance of the model as much as possible.