
Multi-Objective Optimization Method for Power Transformer Design Based on Surrogate Modeling and Hybrid Heuristic Algorithm

1 College of Mechanical & Electrical Engineering, Hohai University, Changzhou 213200, China
2 Changzhou Xidian Transformer Co., Ltd., Changzhou 213022, China
3 College of Materials Science & Engineering, Hohai University, Changzhou 213200, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work and are considered as co-first authors.
Electronics 2025, 14(6), 1198; https://doi.org/10.3390/electronics14061198
Submission received: 21 January 2025 / Revised: 3 March 2025 / Accepted: 7 March 2025 / Published: 18 March 2025

Abstract

In response to the increasing demands for energy conservation and pollution reduction, optimizing transformer design to reduce operational losses and minimize raw material usage has become crucial. This paper introduces an innovative methodology that combines ensemble learning models with a hybrid multi-objective heuristic optimization algorithm to optimize the leakage impedance deviation, on-load loss, and raw material consumption of power transformers. The stacking ensemble model uses support vector machines, linear regression, decision tree regression, and K-nearest neighbors as base learners, with the extreme learning machine serving as the meta-learner to re-learn the outputs of the first-level learners. Given the significant impact of hyperparameters on the prediction performance of ensemble learning models, an improved particle swarm optimization method is proposed for effective hyperparameter optimization. To assess the uncertainty of the proposed ensemble learning model, a Kriging surrogate model-based analysis is outlined. Moreover, a powerful multi-objective algorithm that integrates multi-objective grey wolf optimization (MOGWO) and the non-dominated sorting genetic algorithm-III (NSGA3) is presented for model optimization. This approach demonstrates superior performance compared with mainstream multi-objective optimization algorithms. The effectiveness of the method is further validated through tests on two real engineering cases. The proposed algorithm can accommodate various design requirements and, under the given constraints, achieve a multi-objective optimization design for power transformers, ensuring optimal performance in different operational scenarios.

1. Introduction

Power transformers are essential components in electrical systems [1,2,3,4]. A transformer regulates the voltage grade from the primary to the secondary side according to the law of electromagnetic induction, and its transmission efficiency can reach nearly 98%, the remainder being dissipated as operating loss. Minimizing the operating loss is therefore of great significance for both operational costs and environmental protection. In addition, the transformer must satisfy the electrical performance [5,6] and structural strength [7,8] requirements of the technical protocol. However, an acute contradiction exists between manufacturing costs and specification requirements in the increasingly competitive global market. These situations compel manufacturers to adopt optimization strategies that yield better performance at lower costs while satisfying the given constraints and technical requirements.
In the past, power transformer design was single-objective-oriented (SOO) [9,10,11,12,13,14], i.e., one main performance indicator was selected as the objective, and the remaining factors were treated as constraints. The most representative work is [12], where five different algorithms were utilized to address transformer design optimization (TDO) under the given design parameters. In addition, in [15,16,17], the winding oil flow paths, on-load (OL) loss, manufacturing costs, and electric performance were optimized via the geometric topology method, numerical methods, and heuristic algorithms, respectively. Although the above TDO methods [9,10,11,12,13,14,15,16,17] effectively optimize one required objective, their lack of global optimization capability makes it easy to omit potential solutions in engineering cases. TDO is, however, a typical multi-objective (MO) optimization problem, in which minimizing both manufacturing costs and operating loss is meaningful. The above methods are incapable of optimizing manufacturing cost and operating loss simultaneously due to the limitations of SOO algorithms.
Apart from SOO algorithms, multi-objective optimization algorithms (MOAs) have been extensively applied to complex engineering problems to cope with actual needs. Different from single-objective problems, an MOA aims to find a set of non-dominated solutions, namely the Pareto front (PF), based on the theory of Pareto dominance (PD) [18]. Although PD-based MOAs are sometimes time-consuming and complex, they remain appealing and are widely used in various industries [19,20,21,22,23,24,25,26]. NSGA3 has been utilized to decrease energy consumption and carbon emissions and increase thermal comfort during architectural design [18]. The results show that the presented method can best suit design requirements from four different climatic regions and exhibits great robustness compared with other mainstream MOAs. Due to its powerful global searching ability and robustness, NSGA3 has also been applied to optimize economic emission dispatch systems, industrial environment management, and energy-saving systems in manufacturing [19,20,21]. Besides NSGA3, other MOAs, such as MO particle swarm optimization (MOPSO), improved NSGA2, the MO artificial bee colony (MOABC), and MO grey wolf optimization (MOGWO), have also been used to solve engineering MO problems [22,23,24]. Razmi et al. [22] utilized finite element analysis (FEA) in conjunction with MOPSO to optimize the energy absorption and initial peak force of corrugated square tubes, and the results show that the presented methodology can improve both indicators simultaneously. A state-of-the-art method that combines NSGA2 and fuzzy correlation entropy was proposed by Wang et al. [23] to solve the energy-efficient hybrid flow-shop scheduling problem. In [24,25], the MOABC and MOGWO were utilized to solve software next-release problems and software requirement problems, respectively. However, according to the "No free lunch" theory [26], no single MOA can solve all types of MO problems.
The complexity of practical engineering problems has grown exponentially against the background of big data and artificial intelligence. Due to the increasing complexity of design parameters and objectives in the real world, the feasible volume of the objective space is ever-growing, and MOAs can suffer from problems in selection mechanisms and diversity maintenance. Each algorithm has its own specific advantages and drawbacks; for example, high search precision often comes at the cost of a low convergence rate and a time-consuming solving process. Therefore, it is meaningful to improve the performance of an MOA by introducing multiple cooperative mechanisms, constraint handling technology (CHT), and dynamic decay functions. Zou et al. [27] presented an improved multi-population evolutionary algorithm in conjunction with a new cooperative mechanism, and the results showed that the presented method can obtain competitive solutions to MO problems with multiple constraints. In addition, hybrid algorithms have shown encouraging performance in improving diversity and can obtain an evenly distributed PF by introducing multiple MOAs during evolution [28,29], mixing the advantages and weakening the disadvantages of the utilized MOAs. In [28], a hybrid method combining NSGA2 and MOPSO was presented for the design optimization of a shell-and-tube heat exchanger; the results showed that the hybrid method is superior to NSGA2 or MOPSO alone. Shi et al. [29] optimized the operating loss and manufacturing costs of three-phase three-pillar transformers using a hybrid method, namely MOPSO-NSGA3. The key point of hybrid MOAs is the reasonable exploitation of each algorithm's advantages during the iterations: MOPSO features a wide search area and fast convergence, while its search accuracy and solution diversity are unsatisfactory, and these defects can be mitigated by using it in conjunction with NSGA2 [28] or NSGA3 [29].
Inspired by the above techniques [22,23,24,25,26,27,28,29], a hybrid algorithm that combines the multi-objective grey wolf optimizer (MOGWO) [30] and NSGA3 is presented, in which a chaotic searching technique is introduced to enhance search breadth and diversity.
It is worth noting that MOAs are all model-oriented, which means that the validity and accuracy of the model directly determine the final optimization effects. The refined calculation of each indicator during transformer design is therefore of great importance. The transformer operating loss consists of on-load loss (OLL) and no-load loss (NLL). The OLL comprises ohmic loss, stray loss, and eddy-current loss; the NLL is mainly caused by the core. Both can be calculated analytically [31,32,33,34] or numerically [35,36]. The analytical methods [31,32,33,34] usually have explicit expressions that feature low computing complexity and can be directly applied in massively parallel computation. However, the parameters of analytical models are determined according to regression principles, which usually lack accuracy, generality, and generalization. Numerical methods, by contrast, have been widely used in electromagnetic, structural, and thermal fields thanks to fast-growing computer performance, and mature electromagnetic computing software [37] has shown great performance in calculating transformer operating loss and leakage impedance precisely. However, the solving process is usually time-consuming, which makes it hard to apply in the early design stage. Therefore, it would be helpful to present a methodology that can achieve accurate transformer operating loss predictions and support massively parallel computing in the design stage. Due to the "no free lunch" theory, the limitations of each regression model are inevitable, and it is impossible to employ one specific regression model to mine all data features in the sample set. Many scholars have therefore turned to ensemble learning models in pursuit of higher prediction performance in many fields. To the best of our knowledge, no research has applied this state-of-the-art technique to the transformer optimization area.
Therefore, an ensemble model that combines the support vector machine (SVM), K-nearest neighbors (KNN), decision tree regression (DTR), and linear regression (LR) as base learners with the extreme learning machine (ELM) as the meta-learner is developed to calculate transformer leakage impedance and operating loss, in which the learning samples are generated using 3D FEM.
Combining the above contexts, the main contributions of this paper are as follows: (i) a hybrid MO transformer optimization model is presented to evaluate the manufacturing cost, operating loss, and the deviation of the leakage impedance under different design parameters; (ii) a stacking model that integrates the SVM, DTR, LR, KNN, and ELM is developed to calculate the operating loss and the leakage impedance of the transformer; and (iii) a hybrid MOA, namely MOGWO-NSGA3, is proposed to conduct transformer optimization, aiming to accommodate different design requirements.

2. Ensemble Learning Model

In this paper, an ensemble learning model is established to evaluate the overall performance of the transformer. Considering that machine learning models are data-oriented, the sample set is introduced first. The ensemble learning model is then discussed, covering the base learners that make up the stacking model, namely, the SVM, KNN, DTR, and LR, as well as the meta-learner, which utilizes the ELM.

2.1. Data Description

The research object of this study is a three-phase, three-pillar 110 kV class power transformer; its insulation form, winding structure, and fixing structure are relatively universal and widely utilized in the power system. In order to guarantee accuracy, the 3D-FEM is utilized to calculate the load loss and leakage impedance of the power transformer and form the training dataset. The 3D model building and the solving of the electromagnetic field are conducted in Simcenter MagNet software. Apart from the regression methodology, the completeness and sufficiency of the sample data directly determine the upper limit of the model's performance. It is important to design the dataset in a scientific and rational way; as such, the Box–Behnken Design (BBD) is applied to select representative test sites within the whole sample space. Because the representative points are uniformly dispersed and mutually comparable, this design captures the spatial distribution characteristics of the sample space in an efficient, fast, and economical experimental way. The design parameters mainly concern the electrical parameters and the key structural parameters, which are shown in Table 1 and Figure 1, respectively.
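As an illustration of how BBD points are laid out, the sketch below enumerates, for every pair of factors, the four (±1, ±1) corner combinations while all remaining factors sit at their midpoints, plus replicated centre points. The factor bounds used here are hypothetical placeholders, not the actual design ranges of Table 1:

```python
from itertools import combinations

def box_behnken(bounds, n_center=3):
    """Generate Box-Behnken design points for factors given as (low, high) bounds.

    For every pair of factors, the four (+/-1, +/-1) corner combinations are
    taken while all remaining factors stay at their midpoint (coded level 0);
    n_center replicates of the all-midpoint centre point are appended.
    """
    k = len(bounds)
    mid = [(lo + hi) / 2 for lo, hi in bounds]
    half = [(hi - lo) / 2 for lo, hi in bounds]
    points = []
    for i, j in combinations(range(k), 2):
        for si in (-1, 1):
            for sj in (-1, 1):
                p = mid[:]                       # all factors at centre level
                p[i] = mid[i] + si * half[i]     # factor i at its +/- extreme
                p[j] = mid[j] + sj * half[j]     # factor j at its +/- extreme
                points.append(p)
    points.extend([mid[:] for _ in range(n_center)])
    return points

# Three hypothetical design factors (e.g., a winding height, a core diameter,
# and a window width, all in mm); bounds are illustrative only.
design = box_behnken([(900.0, 1300.0), (500.0, 700.0), (300.0, 500.0)], n_center=3)
print(len(design))  # 3 factor pairs * 4 corners + 3 centre points = 15 runs
```

With k factors, this yields 4·C(k, 2) edge-midpoint runs plus the centre replicates, far fewer than a full factorial sweep of the same space.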
In Table 1, since the voltage level is given, the electrical parameters mainly concern the low-voltage (LV) winding and the high-voltage (HV) winding. The main insulation structure of a 110 kV transformer is relatively conventional; the insulation strength can be effectively guaranteed when the distance between the LV and HV windings is acceptable. In addition, the longitudinal insulation margin is usually not a major consideration and does not need to be emphasized in the design. Moreover, in order to decrease computational complexity and guarantee accuracy, several modifications and simplifications had to be applied:
(a) Since the insulation components are non-conductors and generate no eddy-current loss, they have been omitted.
(b) The mesh sizes of the air, the core, and the steel components were set to 80 mm, 20 mm, and 40 mm, respectively.
(c) Considering the skin effect, the skin depth of the structural steel is about 3 mm at an excitation frequency of 50 Hz. To improve computing efficiency and accuracy, the thin steel components (tank, clamps, and pulling plate) were assigned an impedance boundary condition.
(d) The materials of the core and the tank shunts were set as oriented silicon steel sheet; the loss per unit volume was calculated with the software's built-in function.
(e) Considering the symmetry of the transformer structure, the solution domain was reduced to a half model on the high-voltage side. The 3D time-harmonic solver was utilized to solve the electromagnetic field.
The above settings have been validated in the engineering field and can effectively guarantee magnetic-field-solving accuracy, the details of which can be found in [38,39].
In Figure 1, the key sizes of the core and the windings are marked; once these parameters are confirmed, the overall size of the transformer and the amounts of oil, copper wire, and structural steel used can be estimated. In addition, once the dimensional parameters of the core are determined, the sizes of the clamp components and the tank can also be confirmed. Because the model is built in 3D software, the usage of each material can be estimated precisely compared with traditional analytical methods. Figure 2 shows the whole equivalent model under a specific set of parameters.
Figure 2 shows the overall structure under a specific design scheme, in which the sizes of the clamp components and the tank are set as the associated dimensions, which are automatically reshaped according to the core and the winding.
Once the design parameters have been given, the above settings will have automatic association processing. Figure 3 shows the leakage magnetic field and the loss distribution of the calculation model.
In Figure 3, it can be seen that where the leakage flux crosses the iron structural parts, a high loss per unit volume results; the maximum loss density occurs in the clamping parts and can reach 17.784 kW/m³. The load loss, no-load loss, and leakage impedance can also be obtained via the software's built-in functions. The whole setup procedure of the dataset is summarized in Figure 4.
Figure 4 illustrates the process of constructing the finite element sample set. The following should be noted:
  • Impedance deviation f_ld: The finite element model cannot directly compute the impedance. Instead, it calculates the stored energy in the magnetic field domain through integration and then derives the impedance based on the inductive energy storage formula.
  • Operational losses f_ll: The operational losses, including the winding resistance loss, eddy-current loss in the windings, stray losses in structural components (as shown in Figure 3 for the tank and the component parts), and core loss, are obtained by summing the respective losses from the finite element model.
  • Material usage f_cost: This primarily includes steel, silicon steel sheets, and copper wire, which constitute the major components of the transformer manufacturing cost.
  • Transformer volume f_Vol: Due to the use of a parametric finite element model, the oil tank dimensions are correlated with the core and winding parameters, enabling the accurate estimation of the transformer volume.

2.2. Base Learners

Unlike a traditional learning model, the ensemble method integrates multiple regression models (base learners) to obtain more accurate results than a single learner. Generally, it is important to introduce base learners with different structures to form a heterogeneous ensemble, which helps the model learn sample features more comprehensively.

2.2.1. Support Vector Machine

In order to solve regression problems with the SVM, it is necessary to introduce the ε-insensitive loss function into the SVM classification framework, yielding the support vector machine for regression (SVR). The basic idea of SVR is no longer to find an optimal surface that separates different classes of samples but to minimize the deviation of all training samples from the optimal regression surface; the regression expression can be simplified as follows:
$$v_r(x) = w^{T} \varphi(x) + b$$
where φ(x) is the non-linear mapping associated with the kernel function; x and b denote the input variables and the constant adjustment (bias) term, respectively. Solving the SVR is eventually transformed into a quadratic programming problem with constraints, whose optimization objective and constraints are, respectively, the following:
$$\begin{aligned} (w, b) = \arg\min_{w, b, \xi_i} \; & \frac{\|w\|^2}{2} + C \sum_{i=1}^{m} \xi_i \\ \text{s.t.}\quad & y_i \left( w^{T} x_i + b \right) \ge 1 - \xi_i \quad (i = 1, \dots, m) \\ & \xi_i \ge 0 \quad (i = 1, \dots, m) \end{aligned}$$
where m is the number of training samples; C is the regularization parameter, introduced to avoid overfitting on the training dataset; and ξ_i is the slack variable, which is vital for dealing with data indivisibility and sample noise. The optimal values of w and b can be calculated based on convex optimization and the duality principle [40,41]. The selection of C and the kernel function directly influences the prediction performance; these hyperparameters are optimized in the next section.
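To make the ε-insensitive idea concrete, the following sketch trains a linear SVR by subgradient descent on the regularized ε-insensitive loss. This is a deliberate simplification of the constrained quadratic programming formulation above (no kernel, and the toy data, learning rate, and epoch count are illustrative assumptions):

```python
import numpy as np

def linear_svr_sgd(X, y, C=10.0, eps=0.1, lr=1e-3, epochs=3000):
    """Minimal linear epsilon-SVR trained by subgradient descent.

    Minimises ||w||^2 / 2 + C * sum(max(0, |w.x + b - y| - eps)), i.e. the
    regularised epsilon-insensitive loss in primal form: samples inside the
    eps-tube contribute no loss subgradient at all.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        r = X @ w + b - y                                   # residuals
        g = np.where(r > eps, 1.0, np.where(r < -eps, -1.0, 0.0))
        w -= lr * (w + C * X.T @ g / n)                     # subgradient step
        b -= lr * C * g.mean()
    return w, b

# Toy linear data y = 2x + 1 (illustrative only)
X = np.linspace(0, 4, 20).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
w, b = linear_svr_sgd(X, y)
print(w, b)   # should approach slope 2 and intercept 1
```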

2.2.2. K-Nearest Neighbor

Considering the fact that all primary learners are integrated into the stacking method, it is not recommended to apply excessively strong learners in the ensemble model. The K-nearest neighbor (KNN) is a well-known regression and classification methodology that has no real training process, and the model output is related to only three elements: (a) the selection of k, where the model output is the mean value of the k nearest samples; (b) the distance metric method; and (c) the numerical characteristics of the sample itself.
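A minimal sketch of KNN regression reflecting the three elements above, with the Euclidean distance assumed as the metric and illustrative toy data:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """KNN regression: the output is the mean target of the k nearest
    training samples under Euclidean distance (no training phase at all)."""
    dist = np.linalg.norm(X_train - x_query, axis=1)   # (a) distances to all samples
    nearest = np.argsort(dist)[:k]                     # (b) indices of the k nearest
    return y_train[nearest].mean()                     # (c) average their targets

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
print(knn_predict(X, y, np.array([1.9]), k=3))  # mean of targets at x = 1, 2, 3 -> 2.0
```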

2.2.3. Decision Tree Regression

Decision tree regression (DTR) is a well-known supervised learning technique that predicts the target value by inducing straightforward regression rules from the training dataset. In this ensemble method, DTR was chosen as a base learner to predict the electrical performance of the power transformer because of its low computational complexity, high robustness, and accuracy.
Figure 5 shows a simple example of a DTR. The input data are denoted as X = {x_1, x_2, x_3} in the figure. R = {r_1, r_2, …, r_5} relates to the threshold values of the decisions. Y = {y_1, y_2, …, y_5} indicates the outputs of the model. The actual regression rules are much more complex than the workflow in Figure 5; the characteristics of the model mainly depend on the number of samples and the dimension of the input arguments.
It can be noted that every branch of the DTR represents a division of the input variable space at the previous level, and all the branches eventually form a tree structure. If the j-th feature variable x^(j) is set as the segmentation variable, the input space is divided into two regions:
$$R_1(j, s) = \left\{ x \mid x^{(j)} \le s \right\}, \qquad R_2(j, s) = \left\{ x \mid x^{(j)} > s \right\}$$
where s is the split threshold. The optimal j and s need to minimize the sum of the square errors of the two regions; that is, the optimization objective can be expressed as follows:
$$\min_{j, s} \left[ \min_{c_1} \sum_{x_i \in R_1(j, s)} \left( y_i - c_1 \right)^2 + \min_{c_2} \sum_{x_i \in R_2(j, s)} \left( y_i - c_2 \right)^2 \right]$$
where c 1 and c 2 are the output values of the above regions R 1 and R 2 . Based on the convex optimization principle, in order to minimize the output squared error, the optimal values of c1 and c2 are the mean values of Y in the corresponding region, and Equation (4) can be rewritten as follows:
$$\min_{j, s} \left[ \sum_{x_i \in R_1(j, s)} \left( y_i - \hat{c}_1 \right)^2 + \sum_{x_i \in R_2(j, s)} \left( y_i - \hat{c}_2 \right)^2 \right]$$
Then, the above procedure is repeated, further dividing the generated regions R, until the model error reaches the limit. The final output of the DTR is given as follows:
$$f(x) = \sum_{m=1}^{M} \hat{c}_m \, I\left( x \in R_m \right)$$
where M is the number of zones divided and I is the indicator function. From the above steps, it is clear that the maximum tree depth, the sample segmentation threshold ST, and the maximum number of leaf nodes MN codetermine the model performance; the optimal parameters are discussed in the next section.
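The split search of Equations (3)-(5) can be sketched directly: the function below exhaustively scans candidate (j, s) pairs and scores each split by the summed squared error around the two region means. A full DTR would recurse on each region until the stopping criteria (ST, MN) are met; the toy data here are illustrative:

```python
import numpy as np

def best_split(X, y):
    """Exhaustive search for the (feature j, threshold s) pair of Eq. (4):
    the split minimising the summed squared error of the two regions, with
    each region predicted by its mean value (Eq. (5))."""
    best = (None, None, np.inf)
    n, d = X.shape
    for j in range(d):
        for s in np.unique(X[:, j])[:-1]:          # candidate thresholds (keep R2 non-empty)
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, s, sse)
    return best

# Toy data with two clearly separated clusters of targets
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 4.9])
j, s, sse = best_split(X, y)
print(j, s)   # splits at the obvious gap: feature 0, threshold 3.0
```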

2.2.4. Linear Regression

The base learners mentioned above belong to the machine learning field. In this section, statistical linear regression (LR) is introduced to alleviate the information redundancy of similar algorithms. In operational research, LR is often used to measure the complexity of a problem; the optimization objective of this method is the following:
$$\mathrm{Obj}(W) = \arg\min_{W} \sum_{i=1}^{n} \left( W^{T} x_i - y_i \right)^2 + \lambda \sum_{j=1}^{m} W_j^2$$
where W is the weight matrix of the input variable vector x, and y is the label of the corresponding sample. The purpose of this equation is clear, i.e., to minimize the squared error over the given dataset. In addition, the term λ Σ_j W_j² is the regularization term, which effectively balances the magnitudes of the coefficients. Based on the convex optimization principle, Equation (7) has an optimal closed-form solution once the regularization coefficient is determined:
$$W = \left( X^{T} X + \lambda I \right)^{-1} X^{T} y$$
Here, I is the identity matrix. Without the regularization term, the model performance would be determined solely by the given dataset, with no heuristic hyperparameters to optimize. The value of λ can be found via grid search, which traverses all values in the search interval with a specific step; in this methodology, however, λ is treated as a hyperparameter like those of the above three algorithms and is optimized in the next section.
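The closed-form solution above is a one-liner in practice. In this hedged sketch, the generating coefficients [3, −1] and the near-zero λ are illustrative assumptions chosen so that the recovered weights match the ground truth:

```python
import numpy as np

def ridge_fit(X, y, lam=0.1):
    """Closed-form regularised LR solution W = (X^T X + lam*I)^(-1) X^T y,
    computed via a linear solve rather than an explicit matrix inverse."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([3.0, -1.0])        # noiseless synthetic targets
w_hat = ridge_fit(X, y, lam=1e-6)    # tiny lambda -> near-OLS recovery
print(w_hat)                         # close to [3, -1]
```

Using `np.linalg.solve` instead of forming the inverse is the standard numerically safer way to evaluate this expression.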

2.3. Extreme Learning Machine

Similar in structure to a fully connected neural network, the extreme learning machine (ELM) is a two-layer neural network (NN) with random weights, which greatly improves training efficiency without degrading prediction accuracy. The core innovation of the ELM is that the input weights and biases of its hidden nodes are randomly or manually set and remain unchanged throughout the learning process. This is significantly different from conventional neural network algorithms, which continuously optimize all weights through iteration. Unlike an NN trained via the well-known error backpropagation algorithm, the ELM determines its output weights by solving linear equations or applying methods such as least squares in a single pass rather than iterating repeatedly, thus greatly speeding up learning. Many related studies have proven that although the ELM's weight initialization is random, it still shows good generalization ability in many cases; that is, it can make accurate predictions or classifications on unseen data. The output matrix of the hidden layer H with m hidden neurons is as follows:
$$H = \begin{bmatrix} g(w_1 \cdot x_1 + d_1) & \cdots & g(w_m \cdot x_1 + d_m) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + d_1) & \cdots & g(w_m \cdot x_N + d_m) \end{bmatrix}_{N \times m}$$
where g is the activation function. w and d refer, respectively, to the weight and the bias of the hidden neurons. This can be simplified as follows:
H = g ( X W + b )
where X is the input matrix with N samples of d features, so the shape of X is N × d; W is the input weight matrix with the shape d × m; and b is the corresponding bias matrix. It should be noted that W and b are randomly initialized and remain unchanged during learning. The final output of the ELM, T̂, is the following:
$$\hat{T} = H \beta$$
where β is the weight matrix of the output layers. The main computational step of ELM is to determine the weights from the hidden layer to the output layer. By minimizing all sample prediction errors as the optimization objective, the β matrix can be obtained based on the least square method.
$$\beta = \left( H^{T} H \right)^{-1} H^{T} T$$
Because W and b in the hidden layer can be randomly initialized or manually set, they contribute to the performance variance. Therefore, the weight matrix W and the bias matrix b are both optimized in the next section.
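The two-step ELM training just described (random hidden weights, then a single least-squares solve for β) can be sketched as follows; the tanh activation and the sine target are illustrative assumptions:

```python
import numpy as np

def elm_train(X, T, m=40, seed=0):
    """Train an ELM: draw random hidden weights W and biases b (fixed
    thereafter), build the hidden output H = g(XW + b), then solve the
    output weights beta by least squares in one pass."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W = rng.normal(size=(d, m))           # random input weights, never updated
    b = rng.normal(size=m)                # random hidden biases, never updated
    H = np.tanh(X @ W + b)                # hidden-layer output matrix, N x m
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy 1-D regression target (illustrative)
X = np.linspace(-1, 1, 100).reshape(-1, 1)
T = np.sin(3 * X).ravel()
W, b, beta = elm_train(X, T, m=40)
err = np.abs(elm_predict(X, W, b, beta) - T).max()
print(err)   # small training error despite no iterative weight tuning
```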

2.4. Integrated Strategy

Stacking is a well-known method of combining individual learners through an additional training stage; the individual learners and their integrator are named first-level learners and the meta-learner, respectively. The basic idea of the stacking algorithm is as follows: (a) use the raw training set to train the first-level learners; (b) generate a new training dataset that uses the outputs of the first-level learners as the inputs and the labels of the original training dataset as the outputs; and (c) train the meta-learner with the new dataset. It is therefore necessary to integrate heterogeneous first-level learners in order to maximize the model's performance. Because the stacking method adopts a multilevel learner structure, its excellent non-linear fitting ability may cause overfitting and degrade the predictive performance in real engineering applications. K-fold cross-validation is applied as the sample partitioning strategy to evaluate the model performance on the training dataset. In this method, the original dataset is evenly divided into k folds D_1, …, D_k, and the four previously mentioned base learners are trained k times. In each round, each base learner uses k − 1 folds for training and the remaining fold for validation, so every sample has the chance to serve as both training and validation data. Finally, k results are generated for each base learner after the k-fold cross-validation. These results can be analyzed statistically to evaluate the differences and performance of each model. The selection of k has no specific rules; typically, it should achieve a trade-off between computational cost and model performance. Combining Section 2.1, Section 2.2, Section 2.3 and Section 2.4, the workflow of this ensemble learning model is depicted in Figure 6.
Figure 6 systematically depicts the whole workflow of the presented ensemble learning model; the base learner and the meta-learner were introduced in the previous section.
It should be noted that the initial hyperparameters affect the prediction ability and the robustness of machine learning models. Therefore, an improved PSO can be applied to conduct hyperparameter optimization; this algorithm is introduced in the next section.
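Steps (a)-(c) of the stacking procedure, combined with the k-fold partitioning of Section 2.4, can be sketched as below. The two toy base learners are hypothetical stand-ins for the SVR/KNN/DTR/LR base learners, and the least-squares meta-learner stands in for the ELM:

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def stack_train(X, y, base_fits, k=5):
    """Build the level-1 training matrix Z: each column holds one base
    learner's out-of-fold predictions (so no sample is predicted by a model
    that saw it), then fit a least-squares meta-learner on Z."""
    n = len(y)
    Z = np.zeros((n, len(base_fits)))
    for fold in kfold_indices(n, k):
        train = np.setdiff1d(np.arange(n), fold)
        for j, fit in enumerate(base_fits):
            predict = fit(X[train], y[train])
            Z[fold, j] = predict(X[fold])
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)   # meta-learner weights
    return w

# Two toy base learners: a constant (mean) predictor and a linear fit
def fit_mean(Xt, yt):
    return lambda Xq: np.full(len(Xq), yt.mean())

def fit_linear(Xt, yt):
    w, *_ = np.linalg.lstsq(np.c_[Xt, np.ones(len(Xt))], yt, rcond=None)
    return lambda Xq: np.c_[Xq, np.ones(len(Xq))] @ w

X = np.linspace(0, 1, 40).reshape(-1, 1)
y = 2 * X.ravel() + 0.5                      # noiseless linear toy data
w_meta = stack_train(X, y, [fit_mean, fit_linear])
print(w_meta)  # the meta-learner assigns essentially all weight to the linear base learner
```

The out-of-fold construction is the key detail: feeding in-sample base-learner predictions to the meta-learner would leak training labels and encourage exactly the overfitting the section warns about.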

2.5. Surrogate-Based Optimization Strategy

In recent years, mainstream finite element analysis (FEA) software such as ANSYS Workbench 2024R2 and COMSOL 6.3 has integrated machine learning (ML) and deep learning (DL) surrogate models to accelerate finite element evaluations and reduce computational costs. These surrogates enable efficient optimization by minimizing the need for full-order simulations.
However, the built-in regression models in ANSYS and COMSOL rely on outdated methodologies and lack flexibility, adaptability, and predictive accuracy for complex engineering optimizations, especially in transformer design. To address these limitations, this study proposes an ensemble learning-based surrogate modeling framework that integrates multiple base learners with a meta-learning strategy, significantly enhancing both prediction accuracy and generalization.
Based on the finite element sample set (Section 2.1) and the stacking-based ensemble learning framework (Section 2.2, Section 2.3 and Section 2.4), the proposed method constructs a black-box surrogate model for transformer performance evaluation. Although lacking an explicit analytical expression like traditional mathematical formulations, this model can still be represented using a standard mathematical expression, allowing for seamless integration into the optimization framework.
$$\mathrm{stacking}(\theta) = \left\{ f_{ld},\; f_{ll},\; f_{cost},\; f_{Vol} \right\}$$
Here, θ represents the set of design variables for the power transformer, with the electrical parameters referenced in Table 1 and the dimensional and positioning parameters in Figure 1. The outputs of the surrogate model, f_ld, f_ll, f_cost, and f_Vol, correspond to the transformer's short-circuit impedance, operating losses, material cost, and volume, respectively. It is important to note that f_ll represents the total losses of the transformer, including both load loss and no-load loss under a given capacity. Considering the fluctuations in raw material prices, f_cost reflects the actual material consumption, including copper, iron, and silicon steel sheets, rather than absolute cost values. Similarly, f_Vol represents the overall volume of the transformer, which can further be expressed in terms of its length, width, and height dimensions (l × w × h).
In addition to the above optimization objectives, and as in other engineering optimization problems, the transformer design must satisfy multiple constraints, including the induced voltage constraint, the turns ratio constraint, the no-load loss constraint, and other related limitations. The detailed analytical expressions for these constraints can be found in [41] (Section 2.3), where they are formulated as inequality constraints, equality constraints, and real-number constraints. Based on these formulations, the transformer optimization model can be converted into a standard operational research framework, as shown in Equation (14):
$$\begin{aligned} \min \; & F = \left[ f_1(x), f_2(x), \dots, f_M(x) \right] \\ \text{s.t.}\quad & h_j(x) \le 0, \quad j = 1, 2, \dots, J \\ & h_k(x) = 0, \quad k = 1, 2, \dots, K \\ & x_i^{lb} \le x_i \le x_i^{ub}, \quad i = 1, 2, \dots, N \end{aligned}$$
where M is the number of optimization objectives; h_j denotes the j-th inequality constraint, with a total of J constraints; h_k denotes the k-th equality constraint, with a total of K constraints; and x_i^lb and x_i^ub denote the lower and upper bounds of the i-th decision variable, respectively.
In the transformer optimization process, specific objectives are typically selected from cost, loss, volume, and short-circuit impedance based on the project’s technical specifications and requirements. The detailed optimization procedure will be further elaborated in the subsequent optimization section.

3. Model Validation

3.1. Optimization with an Improved PSO

It is necessary to apply an optimizer to search for the optimal hyperparameters of the base learners to improve the model's overall performance. PSO is inspired by the bird predation process: during the iterations, the social leadership and individual behavior of a bird population are simulated. PSO has been widely utilized in engineering optimization due to its strong global search ability and robustness [42,43]. To model this process systematically, the i-th individual is characterized by two physical attributes, namely, the velocity V_i and the position X_i. During each iteration, each particle updates its velocity and position according to its individual historical optimum and the population optimum, as described by the following equations:
V_i^(t+1) = w·V_i^t + c1·r1·(P_i − X_i^t) + c2·r2·(P_gb − X_i^t)
X_i^(t+1) = X_i^t + V_i^(t+1)
where t denotes the current iteration and w is the inertia weight. r1 and r2 are random numbers uniformly distributed in [0,1], which diversify particle behavior throughout the optimization, benefiting exploration and the avoidance of local optima. c1 and c2 refer to the individual (cognitive) acceleration coefficient and the global (social) acceleration coefficient, respectively.
In the traditional version, c1, c2, and w are set as constants, a strategy that effectively reduces computational complexity. However, keeping them constant throughout the whole iteration is suboptimal. In this improved PSO, c1, c2, and w are deliberately designed as linear decay functions in order to emphasize global search ability in the early stage and guarantee convergence in the final stage; they are calculated using the following equations:
w = (w_e − w_s)·(M − t)/M + w_s
c1 = (c1_e − c1_s)·t/M + c1_s
c2 = (c2_e − c2_s)·t/M + c2_s
where M is the maximum number of iterations. w is linearly decreased from the initial coefficient w_e to the final value w_s. The same strategy is used for c1 and c2, whose parameters (c1_e, c1_s, c2_e, and c2_s) are not repeated here. Alternatively, the decay function can be set as an exponential function to make w smooth in the numerical transition phase; details can be found in [44,45]. Combining the abovementioned key steps, the whole workflow of the improved PSO algorithm is described in Table 2.
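As a minimal sketch of the improved PSO described above (uniform initialization; linearly decaying w, c1, and c2), the loop below minimizes a toy sphere function. The coefficient values are illustrative assumptions; in the paper, the fitness is the ensemble model's RMSE rather than this toy function.

```python
import numpy as np

def pso_star(fitness, lb, ub, n_particles=30, max_iter=100,
             w_e=0.9, w_s=0.4, c1_e=0.5, c1_s=2.5, c2_e=2.5, c2_s=0.5, seed=0):
    """Improved PSO with linearly decaying inertia and acceleration
    coefficients (parameter values here are illustrative)."""
    rng = np.random.default_rng(seed)
    dim = len(lb)
    X = rng.uniform(lb, ub, size=(n_particles, dim))           # uniform init
    V = rng.uniform(-(ub - lb), ub - lb, size=(n_particles, dim))
    P = X.copy()                                               # personal bests
    p_fit = np.array([fitness(x) for x in X])
    g = P[np.argmin(p_fit)].copy()                             # global best
    for t in range(max_iter):
        w = (w_e - w_s) * (max_iter - t) / max_iter + w_s      # linear decay
        c1 = (c1_e - c1_s) * t / max_iter + c1_s
        c2 = (c2_e - c2_s) * t / max_iter + c2_s
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)
        X = np.clip(X + V, lb, ub)
        f = np.array([fitness(x) for x in X])
        better = f < p_fit
        P[better], p_fit[better] = X[better], f[better]
        g = P[np.argmin(p_fit)].copy()
    return g, float(p_fit.min())

# Toy usage: minimize the sphere function on [-5, 5]^2
best_x, best_f = pso_star(lambda x: float(np.sum(x ** 2)),
                          np.array([-5.0, -5.0]), np.array([5.0, 5.0]))
```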
Table 2 systematically describes the workflow of the improved PSO (marked as PSO* in the following sections). Step (1.1) initializes the population according to the design-variable intervals, and the generated X and V obey a uniform distribution to guarantee global search ability. The fitness function is the root mean square error (RMSE) of the applied algorithm on the testing dataset, calculated using the following equation:
RMSE(y, ŷ) = √((1/n) Σ_{i=1}^{n} (ŷ_i − y_i)²)
where y_i is the actual output of the dataset, and ŷ_i denotes the output of the utilized model.

3.2. Performance Evaluation Under Different Optimizers

The ensemble learning model was constructed in Section 2. To ensure the accuracy of the model, parametric finite element software was utilized to establish the model training sample set, enabling the precise calculation of transformer loss, short-circuit impedance, and material usage. Since the above models are machine learning algorithms, the selection of their hyperparameters directly affects prediction performance; dedicated hyperparameter optimization is therefore carried out in this section. To demonstrate the effectiveness and superiority of the improved PSO, the whale optimization algorithm (WOA), genetic algorithm (GA), differential evolution (DE) algorithm, and the traditional PSO were also utilized to optimize the hyperparameters of the presented ensemble learning model. Table 3 lists the main parameters of the five heuristic algorithms. It is worth noting that, in the stacking framework presented in this paper, the samples are divided into five folds for cross-validation: the sample set is split into five parts, and five iterations are performed, where in each iteration four parts serve as the training set and the remaining part serves as the validation set to verify the effectiveness of the current learner. Under this strategy, each individual evaluation of a heuristic algorithm contains the results of five sample splits, and the average of the five results is used as the fitness value. Consequently, every sample serves as both training and validation data, which avoids the interference of random errors in the model performance evaluation.
However, under this strategy, evaluating the fitness of each individual requires additional nested loops and machine learning model training, which incurs considerable time and space complexity.
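The five-fold fitness evaluation described above can be sketched with scikit-learn's KFold. The KNN regressor and the synthetic data here are stand-ins for the full stacking ensemble and the FEM-generated sample set:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

def cv_fitness(hyperparams, X, y, n_splits=5, seed=0):
    """Five-fold cross-validated RMSE used as the particle fitness:
    each fold serves once as the validation set, and the five fold
    RMSEs are averaged into a single fitness value."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rmses = []
    for train_idx, val_idx in kf.split(X):
        model = KNeighborsRegressor(n_neighbors=int(hyperparams["k"]))
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[val_idx])
        rmses.append(mean_squared_error(y[val_idx], pred) ** 0.5)
    return float(np.mean(rmses))

# Hypothetical smooth regression data standing in for the FEM sample set
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.01 * rng.standard_normal(200)
fit = cv_fitness({"k": 5}, X, y)
```

A heuristic optimizer would call `cv_fitness` once per particle per iteration, which is exactly the nested-loop cost noted above.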
All experiments were executed using Python 3.10.4 in an Ubuntu 20.04 environment on a desktop computer with an AMD Ryzen(TM) 9 7950X CPU @ 4.7 GHz and 64 GB of DDR5 6400 MHz random access memory (RAM). The population size and the number of iterations of each algorithm were set to 400 and 100, respectively. Figure 7 shows the RMSE curve of each optimizer across iterations; to facilitate visualization of the parameter optimization process, each point represents the mean value of the three normalized regression indicators.
As depicted in Figure 7, the model fitting error continues to decrease as the iterations progress. The selection of hyperparameters greatly influences the ensemble learning model, and proper parameters can effectively enhance its predictive performance. During the initial phase, particularly within the first 50 iterations, each optimization algorithm significantly decreases the fitting deviation of the ensemble learning model. Moreover, magnifying the results of the final iterations shows that the improved PSO obtains the lowest root-mean-square error at the end of the run, i.e., the best regression accuracy on the test set. It can be inferred that the presented PSO* is the most suitable optimizer for searching the hyperparameters.

3.3. Performance Evaluation Under Different Models

To demonstrate the superiority of this model over other machine learning and deep learning models of the same type, we conducted separate training sessions to compare its performance with KNN, DTR, ELM, SVM, LR, random forest (RF), and fully connected network (FCN) models using the established sample dataset. The hyperparameters of each model were optimized using the GA. For the FCN model, the network size specifications were aligned with those of the ELM setting; 200 iterations were performed; neuron initialization followed the Kaiming principle; and parameter training utilized the Adam optimizer. Random forest is a bagging ensemble model that combines multiple decision trees through a homogeneous ensemble method and is known for its strong generalization ability. To mitigate random fluctuation errors, each model was trained 10 times. Table 4 shows the mean RMSE value of the leakage impedance, the operating loss, and the material usage under the different regression models, with the best score marked in bold.
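A minimal sketch of the heterogeneous stacking structure using scikit-learn's StackingRegressor is shown below. Since scikit-learn provides no ELM, an MLPRegressor serves here as a stand-in meta-learner, and the hyperparameter values and training data are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.svm import SVR
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

# Base learners mirror the paper's first level; the meta-learner is an
# MLP used only as a stand-in for the ELM, which scikit-learn lacks.
base_learners = [
    ("svm", SVR(C=10.0)),
    ("lr", Ridge(alpha=1.0)),  # linear regression with a penalty term
    ("dtr", DecisionTreeRegressor(max_depth=6, random_state=0)),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
]
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=MLPRegressor(hidden_layer_sizes=(32,),
                                 max_iter=2000, random_state=0),
    cv=5,  # five-fold stacking, matching the paper's strategy
)

# Hypothetical training data standing in for the FEM-generated sample set
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 4))
y = X[:, 0] * X[:, 1] + np.cos(X[:, 2]) + 0.01 * rng.standard_normal(300)
stack.fit(X, y)
pred = stack.predict(X[:5])
```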
Across the 10 training runs, the stacking model achieved a better average RMSE and better algorithm stability than the other mainstream framework models, indicating superior prediction performance. Because this ensemble adopts a heterogeneous strategy, it is necessary to verify that the strategy contains no redundancy, i.e., that each first-level learner is to some extent irreplaceable and effectively improves the prediction accuracy and generalization ability of the ensemble. To this end, a difference analysis of the prediction capabilities of the different models and their combinations was carried out. For the SVM, DTR, LR, and ELM models involved in this ensemble, predictions were made on the testing set samples, and the Pearson correlation coefficient p_xy was used to evaluate the correlation between models:
p_xy = (n·Σ_{i=1}^{n} x_i·y_i − Σ_{i=1}^{n} x_i · Σ_{i=1}^{n} y_i) / (√(n·Σ_{i=1}^{n} x_i² − (Σ_{i=1}^{n} x_i)²) · √(n·Σ_{i=1}^{n} y_i² − (Σ_{i=1}^{n} y_i)²))
Here, n is the number of samples in the selected testing set, and x and y refer to the predicted values of the two compared models. Pearson correlation coefficients are symmetric under exchange of x and y, so the final correlation coefficient matrix is a symmetric matrix; Figure 8 shows the results.
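The symmetric correlation matrix can be computed directly with NumPy's corrcoef; the model predictions below are synthetic stand-ins for the actual test-set outputs:

```python
import numpy as np

# Hypothetical predictions of four models on the same 50 test samples:
# each prediction is the truth plus model-specific noise.
rng = np.random.default_rng(0)
truth = rng.standard_normal(50)
preds = {name: truth + noise * rng.standard_normal(50)
         for name, noise in [("SVM", 0.1), ("DTR", 0.2),
                             ("LR", 0.6), ("ELM", 0.1)]}

P = np.vstack(list(preds.values()))
corr = np.corrcoef(P)                      # pairwise Pearson coefficients

diag_ok = np.allclose(np.diag(corr), 1.0)  # self-correlation is one
symmetric = np.allclose(corr, corr.T)      # symmetric under exchange
```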
As illustrated in Figure 8, the self-correlation of each individual model is one. Notably, the primary learning models (ELM, SVM, and DTR) exhibit pairwise correlation coefficients greater than 0.84. Although their predictive abilities are similar, the subtle differences among them enhance the effectiveness of heterogeneous integration. Moreover, while the linear regression (LR) model, which uses only the penalty term, demonstrates relatively weaker fitting performance, its lower correlation with the other models helps mitigate overfitting. Finally, the coefficients between the stacking model and the individual base models are all below 0.9, indicating no significant redundancy. Overall, the low inter-model correlation within the stacking framework, along with the incorporation of penalty terms, effectively prevents overfitting and ensures the robustness of the integrated learning approach.

3.4. Uncertainty Quantification

In scientific research and engineering applications, model uncertainty analysis is a critical component for evaluating the performance and reliability of predictive models. Despite the outstanding performance of machine learning models, these models typically provide deterministic predictions without quantifying uncertainty. This limitation can lead to unreliable predictions in regions where training data is sparse or where the target function exhibits complex variations. In critical domains such as engineering optimization, unquantified uncertainties may pose significant decision-making risks. Consequently, model uncertainty analysis not only helps quantify the reliability of predictions but also guides the improvement of models and the optimization of data sampling strategies, thereby enhancing the overall model performance.
Uncertainty in models can generally be categorized into two types: model errors and data noise. Model errors arise from limitations in the model structure or biases introduced during the training process, whereas data noise reflects inherent randomness in the data. In this study, we incorporated a Kriging surrogate model [46,47] (also known as Gaussian Process Regression, GPR) to quantify uncertainty in the predictions of the stacking model. Based on the Gaussian Process assumption, the Kriging model constructs a covariance function over the input space to estimate both the mean and variance in the target function. The predicted mean represents the optimal estimate of the target function, while the predicted variance quantifies the confidence (or uncertainty) of the model.
Specifically, the Kriging model first builds a covariance matrix to represent the relationships between the training points and constructs a global surrogate for the target function. In this study, the Kriging model fits the prediction outputs of the presented stacking model and evaluates the confidence of the predictions across the input space. For the test sample set generated by the output of the stacking model, 75% of the samples are used to train the Kriging surrogate model, while the remaining samples are utilized for output prediction. By analyzing the variance predicted by Kriging, we can identify regions where the stacking model exhibits weak performance. In particular, the simultaneous use of an RBF (Radial Basis Function) kernel and a WhiteKernel (white noise kernel) can comprehensively capture the characteristics of the target function and any inherent bias in the sample data. This entire series of steps is fully integrated into the gaussian_process module of the scikit-learn toolbox. Figure 9 illustrates the output characteristics of the Gaussian Process Regression model for the test set samples across the various objectives under the 95% confidence interval.
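A minimal sketch of this Kriging setup with scikit-learn's gaussian_process module is shown below, assuming a hypothetical one-dimensional stand-in for one stacking-model output; it reproduces the 75/25 split, the RBF-plus-WhiteKernel covariance, and the 95% confidence band:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split

# Hypothetical 1-D stand-in for one stacking-model output
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(120, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(120)

# 75% of the samples train the Kriging surrogate; the rest are predicted
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.75,
                                          random_state=0)

# RBF captures the target function; WhiteKernel captures data noise
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                               random_state=0)
gpr.fit(X_tr, y_tr)

mean, std = gpr.predict(X_te, return_std=True)
lower, upper = mean - 1.96 * std, mean + 1.96 * std  # 95% confidence band
coverage = float(np.mean((y_te >= lower) & (y_te <= upper)))
```

High predicted variance flags input regions where the surrogate (and hence the stacking model it fits) is least trustworthy.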
Figure 9 summarizes the uncertainty deviations for each objective. The predicted mean curve of the Kriging model aligns with the overall trend of the true values, which means that the Kriging surrogate model effectively captures the function values of the optimization objectives. Additionally, most of the true values fall within the 95% confidence interval, indicating that the systematic uncertainty bias of the proposed stacking model is well controlled.

4. Hybrid Optimization Algorithm

Section 3 validated the effectiveness and superiority of the proposed ensemble learning model: on the one hand, it outperforms regression models of the same type; on the other hand, the adopted heterogeneous integration strategy better meets the needs of ensemble learning, achieving higher prediction accuracy through model diversity. In this section, the presented ensemble learning model is optimized to minimize the leakage impedance deviation, operating loss, and material usage of the transformer. To address the premature convergence and the low search accuracy of the final Pareto solution set typical of a single heuristic multi-objective (MO) optimization algorithm, a hybrid MO heuristic algorithm is established to satisfy the actual requirements of engineering.

4.1. MOGWO Algorithm

MOGWO is a meta-heuristic algorithm proposed by Mirjalili et al. [25] in 2016, designed by simulating the social hierarchy and pack hunting strategy of grey wolves. The algorithm introduces a certain degree of randomness and exploration into the optimization process, which helps escape local optima, and it is characterized by wide applicability and fast convergence. The candidate solutions are regarded as a grey wolf population divided into four levels, marked as α, β, γ, and ω, in which the α, β, and γ wolves lead the individuals in ω in updating their positions. In addition to systematically modeling the social behaviors of the grey wolves, the following equations are applied to simulate their encircling behavior:
D = |E·G_p(t) − G(t)|
G(t + 1) = G_p(t) − A·D
A = 2a·r1 − a
E = 2·r2
where D is the distance vector and t is the number of iterations. A and E are coefficient vectors. G_p and G refer to the position of the prey and the position of the grey wolf, respectively. r1 and r2 are random vectors within the interval [0,1]. a is the convergence factor, which, like w in PSO, is set as a linear decay function. Up to this point, the iteration steps match the traditional GWO; in single-objective optimization, α, β, and γ can be selected directly according to their fitness values. In MO optimization, comparing the quality of two individuals within a population is typically not straightforward, and a Pareto dominance relationship must be introduced to evaluate the population. The non-dominated optimal solutions are determined at the end of each generation, and the Archive population is updated by evaluating the fitness of each grey wolf. During the update, it must be checked whether the number of individuals in the Archive exceeds the maximum archive size; if so, the objective space is re-divided according to adaptive grid rules, removing one solution from the most crowded region while inserting the newest solution into the least crowded region. Additionally, the α, β, and γ wolves are selected from the Archive population using a roulette strategy, and their locations serve as references for calculating the other wolves' positions, as follows:
G_1 = G_α − A_1·|E_1·G_α(t) − G(t)|
G_2 = G_β − A_2·|E_2·G_β(t) − G(t)|
G_3 = G_γ − A_3·|E_3·G_γ(t) − G(t)|
G(t + 1) = (G_1 + G_2 + G_3)/3
where G_α, G_β, and G_γ are the current positions of the α, β, and γ wolves at iteration t; G(t) is the position of any other individual in ω at iteration t, and G(t + 1) is its updated position. Combining the above content, the flow chart of the MOGWO algorithm is shown in Figure 10.
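The leader-guided update in the equations above can be sketched as follows; the leader positions and parameter values are illustrative, and the Pareto-based Archive maintenance is omitted for brevity:

```python
import numpy as np

def mogwo_position_update(G, leaders, a, rng):
    """One leader-guided position step: each wolf moves toward the mean
    of three candidate positions derived from the alpha, beta, and gamma
    leaders, following the encircling equations."""
    G_new = np.empty_like(G)
    for i, g in enumerate(G):
        candidates = []
        for leader in leaders:                  # alpha, beta, gamma
            r1, r2 = rng.random(g.shape), rng.random(g.shape)
            A = 2 * a * r1 - a                  # A = 2a*r1 - a
            E = 2 * r2                          # E = 2*r2
            D = np.abs(E * leader - g)          # encircling distance
            candidates.append(leader - A * D)   # G_k = G_leader - A*D
        G_new[i] = np.mean(candidates, axis=0)  # G(t+1) = (G1+G2+G3)/3
    return G_new

# Toy usage: 5 wolves in 2-D, leaders chosen arbitrarily for illustration
rng = np.random.default_rng(0)
wolves = rng.uniform(-1, 1, size=(5, 2))
leaders = [np.zeros(2), np.array([0.1, 0.1]), np.array([-0.1, 0.0])]
updated = mogwo_position_update(wolves, leaders, a=0.5, rng=rng)
```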
Figure 10 systematically depicts the workflow of MOGWO. The algorithm has a series of advantages, such as a simple structure and few parameters: only the number of wolves, the number of iterations, and the Archive size need to be set. Because α, β, and γ are randomly selected from the Archive in each iteration, the global search ability is strengthened and local optima are avoided; however, this strategy also weakens the search accuracy and the convergence ability of the algorithm. In addition, according to the "No Free Lunch" theorem, averaged over all possible problems, no single algorithm performs best on every problem; that is, a single algorithm cannot avoid its limitations. Therefore, to achieve better robustness and universality, MOGWO is utilized as the first-stage algorithm, since its characteristics fit the needs of the initial iterations of a multi-objective evolutionary algorithm and allow the Pareto frontier to be explored in all directions. In summary, MOGWO performs the first-stage Pareto frontier search in this hybrid algorithm, and the population produced by its iterations is used as the initial population of the NSGA3 algorithm for the second-stage fine search. In addition, a "random reinitialization" strategy is applied in boundary handling: if an individual crosses the boundary, it is randomly reinitialized within the feasible space, which introduces diversity, avoids particle clustering near the boundaries, and further favors global search ability.

4.2. NSGA3 Algorithm

Beyond global search ability, the difficulty of MO optimization also lies in maintaining population diversity and solution precision. Compared with traditional heuristic multi-objective algorithms, NSGA3 better maintains population diversity and uniformity during the iterative process by introducing a widely distributed reference-point mechanism. In addition, its main framework resembles the classical genetic algorithm, inheriting its high search accuracy and strong robustness, and can effectively meet the needs of the second stage of this hybrid evolutionary algorithm. The iterative process of round t is shown in Figure 11. A methodology called "adaptive penalty" is applied in boundary handling, ensuring that good-quality solutions are not abandoned merely for overstepping the boundary; this is particularly useful in constrained optimization problems where maintaining feasibility is crucial.
The main structure of NSGA3 is similar to that of NSGA2, differing only in the final population retention. Its core mechanism is still how to select the N optimal individuals from a population of size 2N to enter the next iteration. NSGA3 divides the population P_t into multiple subsets {ND1, ND2, …} with the same dominance level based on a fast non-dominated sorting algorithm; subsets with smaller subscripts have a higher dominance rank and are preferentially selected into the next-generation population. If the size of the next-generation population P_{t+1} reaches exactly N, that iteration round is complete; when it is less than N, individuals are selected from the ND_{l+1} subset based on the reference-point mechanism until the next generation is fully identified. The difference among the NSGA-series algorithms lies in the screening mechanism for individuals within the same dominance-level set; the detailed iterative process and mechanism can be found in the literature [21]. In NSGA3, the reference-point ranking mechanism addresses the poor uniformity of the Pareto frontier solutions and the deterioration of searchability when there are many (three or more) optimization objectives.
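The fast non-dominated sorting step that produces the fronts {ND1, ND2, …} can be sketched as follows (minimization is assumed on every objective):

```python
import numpy as np

def fast_nondominated_sort(F):
    """Split objective vectors F (n x M, minimization) into fronts
    {ND1, ND2, ...}; a smaller front index means a higher dominance rank."""
    n = len(F)
    S = [[] for _ in range(n)]       # solutions dominated by i
    counts = np.zeros(n, dtype=int)  # how many solutions dominate i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if np.all(F[i] <= F[j]) and np.any(F[i] < F[j]):
                S[i].append(j)       # i dominates j
            elif np.all(F[j] <= F[i]) and np.any(F[j] < F[i]):
                counts[i] += 1       # j dominates i
    fronts, current = [], [i for i in range(n) if counts[i] == 0]
    while current:
        fronts.append(current)
        nxt = []
        for i in current:
            for j in S[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    nxt.append(j)
        current = nxt
    return fronts

# Toy bi-objective population: the first three points are mutually
# non-dominated, the last two are dominated.
F = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [3.0, 3.0], [5.0, 5.0]])
fronts = fast_nondominated_sort(F)
```

NSGA3 then fills the next generation front by front, switching to reference-point selection inside the last partially admitted front.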

4.3. Hybrid Algorithm

Two mainstream heuristic multi-objective optimization algorithms, MOGWO and NSGA3, were introduced in Section 4.1 and Section 4.2, respectively. However, according to the "No Free Lunch" theorem, an improvement in one aspect of a single algorithm's performance inevitably leads to a decrease in other indicators or an increase in computational time and space complexity. Therefore, using either algorithm alone inevitably causes a series of problems, such as premature convergence, a low-quality solution set, or a slow convergence speed. Considering the limitations of a single algorithm, this paper proposes a two-stage hybrid algorithm based on the complementary characteristics of the MOGWO and NSGA3 algorithms. The specific algorithm flow can be seen in Figure 12.
The flow of the proposed hybrid multi-objective evolutionary algorithm is illustrated in Figure 12; it effectively integrates the strong global search capability of MOGWO with the strong convergence capability of NSGA3. The two-stage strategy effectively mitigates problems such as the premature convergence and slow convergence rate encountered by a single algorithm. In the first stage, A and E in the MOGWO algorithm are set to maximize exploration of the feasible search domain. After the MOGWO stage completes, its final population is used as the initial population of the NSGA3 algorithm, which then carries out a refined search in the second phase.
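The two-stage handoff can be sketched schematically as below; `stage1` and `stage2` are placeholders for the actual MOGWO and NSGA3 optimizers, and the random-reinitialization boundary rule from Section 4.1 is applied at the handoff:

```python
import numpy as np

def hybrid_optimize(fitness, lb, ub, stage1, stage2,
                    pop_size=200, iters_each=200, seed=0):
    """Two-stage hybrid driver: stage 1 (MOGWO-like) explores broadly,
    and its final population seeds stage 2 (NSGA3-like) for refinement.
    `stage1` and `stage2` stand in for the actual optimizers."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(lb, ub, size=(pop_size, len(lb)))  # uniform init
    pop = stage1(fitness, pop, iters_each, rng)          # global search
    # Random reinitialization: out-of-bounds individuals are redrawn
    # inside the feasible space to preserve diversity at the handoff.
    out = ((pop < lb) | (pop > ub)).any(axis=1)
    pop[out] = rng.uniform(lb, ub, size=(int(out.sum()), len(lb)))
    return stage2(fitness, pop, iters_each, rng)         # refined search

# Trivial stand-in stages that return the population unchanged
identity = lambda fitness, pop, iters, rng: pop
final_pop = hybrid_optimize(lambda x: x.sum(), np.zeros(2), np.ones(2),
                            identity, identity, pop_size=10, iters_each=5)
```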

5. Engineering Optimization

5.1. Background

In the previous chapters, a multi-objective analysis model was established, and an improved hybrid multi-objective optimization algorithm was proposed. This chapter verifies the feasibility of its application in power transformer optimization design through practical projects. Table 5 lists the technical requirements of a real power transformer. This case is from a photovoltaic power station project, which requires ten transformers with identical specifications. Given the availability of extensive project delivery data and the fact that this type of transformer has been routinely manufactured by the company and widely deployed within the power system, a substantial amount of prototype data is available for performance comparison.
Additionally, the overvoltage capacity of the 110 kV power transformer can be reliably guaranteed as long as the winding insulation distance exceeds a certain threshold, considering that the main insulation and vertical insulation forms are relatively fixed. Therefore, there is no safety risk in the series optimization design as long as the margin is ensured. For the optimization scenarios, the following two cases are considered:
  • Case 1: Minimizing operating loss f l l and material usage f c o s t :
This scenario focuses on minimizing operating loss while reducing material usage. The challenge here lies in the trade-off between achieving minimal operating loss and minimizing the materials used in the transformer, as reducing losses often requires optimizing core and winding designs, which may result in increased material consumption. The optimization algorithm aims to find an optimal balance that satisfies both design criteria, ensuring efficient performance while keeping manufacturing costs in check.
  • Case 2: Minimizing impedance deviation f l d and volume f V o l :
This scenario focuses on minimizing leakage deviation while reducing the transformer volume. The primary challenge here is the trade-off between reducing leakage deviation and minimizing the transformer’s physical size. Optimizing leakage often involves improving the transformer’s efficiency, which could lead to an increase in its volume. Reducing the transformer’s volume is critical for transportation and safety, particularly in special use cases such as offshore wind farms and urban islands, where space constraints and transportation logistics are major concerns. The optimization algorithm aims to strike a balance that ensures both operational efficiency and practical feasibility in these demanding environments.

5.2. Optimization Results

To overcome the limitations of relying solely on the MOGWO-NSGA3 algorithm for multi-objective optimization, this study employs a combination of mainstream algorithms, including MOGWO, NSGA3, MOPSO, and MODE, to optimize the two proposed scenarios. A performance comparison of these algorithms is conducted, with the corresponding parameter settings as follows:
(1) Iteration and Population Size: To ensure the comparability of the parameters, the total number of iterations for the proposed MOGWO-NSGA3 hybrid algorithm is set to 400, with MOGWO and NSGA3 each running 200 iterations. For the other comparison algorithms, a total of 400 iterations is performed. Additionally, the population size for all heuristic algorithms is set to 200.
(2) MOGWO and MOPSO Parameter Settings: First, to ensure good global search capability in the early stages and effective convergence in the later stages, the convergence factor in MOGWO and the inertia weight in MOPSO are both set to follow a linear decay strategy. Furthermore, a crowding-distance sorting principle is introduced in both MOPSO and MOGWO, where the most optimal individuals are selected from the Pareto front to guide the remaining population in global optimization.
(3) MODE and NSGA3 Parameter Settings: Since both MODE and NSGA3 are evolutionary algorithms, their structural frameworks are consistent with that of the hybrid algorithm. Therefore, the mutation factor and crossover probability are set to 0.9 and 0.1, respectively, in line with the settings of the hybrid algorithm.
The choice of these parameter settings is primarily driven by the complexity of the problem, as it requires balancing exploration and exploitation effectively, ensuring the algorithms can handle the trade-offs between conflicting objectives and reach optimal solutions efficiently. To further enhance the reliability of the results, each algorithm is run 10 times, and the Pareto non-dominated solutions obtained from each run are combined into a single set. From this combined set, the final Pareto front is identified by selecting all non-dominated solutions. This approach allows for a more robust assessment of each algorithm’s performance by reducing the impact of randomness and ensuring a diverse and representative set of solutions. In the final visualization, different algorithms are represented using distinct markers, allowing for clear comparisons of their ability to explore the solution space and identify optimal trade-offs between the conflicting objectives. Figure 13 and Figure 14 present the Pareto optimal fronts obtained by the mainstream multi-objective optimization algorithms for the two proposed scenarios under the hyperparameter settings.
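Merging the Pareto sets from the 10 runs and retaining only the non-dominated solutions can be sketched as follows; the objective values below are hypothetical:

```python
import numpy as np

def nondominated_filter(F):
    """Keep only the non-dominated rows of F (minimization on every column)."""
    keep = np.ones(len(F), dtype=bool)
    for i in range(len(F)):
        for j in range(len(F)):
            if i != j and np.all(F[j] <= F[i]) and np.any(F[j] < F[i]):
                keep[i] = False   # row i is dominated by row j
                break
    return F[keep]

# Hypothetical fronts from three independent runs, merged into one set
runs = [np.array([[1.0, 5.0], [3.0, 3.0]]),
        np.array([[2.0, 4.0], [5.0, 1.0]]),
        np.array([[3.0, 4.0], [6.0, 6.0]])]
combined = np.vstack(runs)
final_front = nondominated_filter(combined)
```

Only the dominated points (here [3, 4] and [6, 6]) are discarded; the surviving set is the combined final Pareto front plotted per algorithm.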
Figure 13 and Figure 14 summarize the final Pareto optimal solutions obtained by each algorithm in the two optimization scenarios. The proposed MOGWO-NSGA3 algorithm contributed 54.8% and 68.9% of the final non-dominated solution sets, respectively. This highlights the algorithm's superior global search capability, outperforming the mainstream multi-objective optimization algorithms. The use of the "best-of-the-best" strategy ensures that the most promising solutions are identified, demonstrating the algorithm's effectiveness in maintaining diversity and robustness in the solution set. This confirms that MOGWO-NSGA3 is highly effective for engineering optimization problems, providing a reliable tool for capturing high-quality and diverse Pareto front solutions, making it suitable for real-world applications.
The Pareto front shown in Figure 13 effectively highlights the sharp trade-off between manufacturing costs and operational losses in the transformer design process. Measures to reduce losses, such as increasing the winding cross-section, enlarging the core cross-section, and using high-permeability silicon steel sheets, all lead to a significant increase in manufacturing costs. Furthermore, it is important to note that due to constraints such as short-circuit impedance, transportation limitations, and electrical conditions, some potential solutions were excluded from this study; as a result, the final Pareto optimal front obtained is not continuous. This phenomenon is more pronounced in Figure 14. Although there is no direct opposing relationship between the impedance deviation of the transformer and its overall volume, the short-circuit impedance can be adjusted not only by modifying the winding channel distance but also through variations in the number of turns and the asymmetry of ampere-turns. The transformer's volume, however, is primarily determined by the core dimensions. Therefore, the trade-off between minimizing volume and optimizing impedance is mainly realized through indirect constraints such as no-load losses, cost, and electrical performance.
As a result, the final Pareto front, in this case, does not exhibit the same degree of opposition as shown in Figure 13, and the Pareto front demonstrates significant breakpoints, revealing strong non-linear characteristics. The final Pareto optimal solutions obtained in Figure 13 and Figure 14 are further filtered according to the following principles: For Scenario 1, the optimal selection is made from the already obtained Pareto optimal front, with the criteria of impedance deviation being less than 2% alongside the minimization of volume. For Scenario 2, since minimizing the impedance deviation and volume may lead to a significant increase in cost, the filtering criterion is solely based on minimizing volume. Subsequently, the series of transformers were designed and manufactured based on the selected optimal options. Table 6 summarizes the performance indicators of the power transformers under the two different optimization strategies. The “Origin” column in Table 6 represents the measured performance data of power transformers produced using conventional optimization methods before the implementation of this project. This traditional approach is based on the branch and bound method, following a top-down strategy, where the optimization process is conducted sequentially, starting with the core, followed by the primary winding, the secondary winding, and other structural components. Furthermore, during this process, practical constraints from the project are embedded, and if any constraint is not satisfied, the design iteratively backtracks and reoptimizes. As a result, this method primarily yields feasible solutions but does not guarantee global optimality.
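The post-filtering rules described above can be sketched as a simple selection over the front; the column layout and numeric values below are hypothetical, with only the 2% impedance-deviation threshold taken from the text:

```python
import numpy as np

# Hypothetical Pareto solutions: columns = [impedance deviation (%), volume (m^3)]
front = np.array([[1.2, 23.5],
                  [1.8, 22.1],
                  [2.4, 20.9],
                  [0.9, 24.8]])

# Scenario 1 rule: impedance deviation below 2%, then minimum volume
feasible = front[front[:, 0] < 2.0]
chosen = feasible[np.argmin(feasible[:, 1])]
```

Scenario 2 would instead select directly by minimum volume over the whole front, as described above.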
The optimization results demonstrate the effectiveness of the proposed hybrid MOGWO-NSGA3 algorithm in the power transformer design process. The algorithm successfully optimized both the operational losses and material usage (Scenario 1), as well as the impedance deviation and transformer volume (Scenario 2):
  • Scenario 1: The optimization reduced the off-load loss (from 244.7 kW to 237.9 kW) and the on-load loss (from 29.4 kW to 27.1 kW), alongside reductions in copper usage (from 11,205.3 kg to 10,204.3 kg) and silicon steel usage (from 35,774.1 kg to 33,751.1 kg). These improvements indicate that the MOGWO-NSGA3 algorithm found solutions that maintain efficiency while minimizing material consumption. The optimized design was achieved primarily by reducing the axial dimension of the windings and lowering the winding current density, which allowed significant reductions in losses and material usage while ensuring that the electrical performance and constraints were met.
  • Scenario 2: The algorithm optimized the impedance deviation and transformer volume, achieving a significant reduction in volume (from 5.44 × 1.73 × 2.79 m to 5.14 × 1.61 × 2.64 m) while keeping the leakage impedance deviation below the desired threshold of 2%. To achieve this minimized design, the core dimensions were reduced, and the distance between the windings and the tank also decreased. However, this led to an increase in the magnetic flux density, which, in turn, caused a rise in the overall stray losses of the transformer. Although this came with a slight increase in off-load loss and copper usage, the overall trade-off demonstrated the algorithm’s ability to balance competing objectives and satisfy the constraints imposed by engineering requirements. For transformers designed for special applications, such as those required by offshore wind farms or urban islands, these trade-offs are acceptable to meet the specific technical conditions and the needs of the application scenarios.
Thus, the results confirm the effectiveness of the proposed algorithm for real-world engineering applications, where optimization involves complex trade-offs between multiple conflicting objectives. The algorithm’s strong performance in finding optimal solutions ensures that it can be reliably applied in the design and manufacturing of power transformers, helping to meet performance and operational constraints while minimizing costs and improving efficiency.
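The hybrid strategy behind these results, pooling candidate solutions from both optimizers and retaining only the non-dominated set, can be illustrated with a minimal two-objective sketch. This shows the general principle only, not the authors' implementation; the toy populations are invented for illustration.

```python
# Minimal illustration of pooling two optimizers' populations and
# keeping the non-dominated set (both objectives minimized).

def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Return the points not dominated by any other point in the pool."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical candidate solutions from the two stages (objective-1, objective-2):
mogwo_pop = [(1.0, 5.0), (2.0, 3.0)]
nsga3_pop = [(1.5, 3.5), (3.0, 1.0), (2.5, 4.0)]

combined_front = non_dominated(mogwo_pop + nsga3_pop)
```

Here (2.5, 4.0) is dominated by (2.0, 3.0) and drops out, while the other four candidates survive as the combined Pareto front.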

6. Conclusions

This paper utilized state-of-the-art ensemble learning techniques and a multi-objective heuristic algorithm in transformer design to minimize the operating loss, the leakage impedance deviation, and the manufacturing cost. The results show that the presented ensemble learning model can effectively evaluate the overall performance of a power transformer. On this basis, a hybrid multi-objective algorithm named MOGWO-NSGA3 was established to conduct engineering optimization, and its effectiveness was verified by engineering tests. In addition, several related studies were carried out:
(a)
A high-quality transformer design dataset was established based on the parametric finite element model to evaluate the leakage impedance, operating loss, and manufacturing cost under different design parameters, which lays a foundation for the subsequent ensemble learning.
(b)
An improved PSO algorithm was presented to optimize the hyperparameters of the ensemble learning model and showed better performance than mainstream heuristic algorithms.
(c)
The Pearson correlation analysis revealed that the base learners within the stacking ensemble model exhibit low correlation, confirming the benefits of heterogeneous integration in improving prediction accuracy. Additionally, uncertainty analysis was conducted using the Kriging surrogate model, showing that the systematic uncertainty bias in the proposed stacking framework remains within an acceptable margin of 5%, ensuring the robustness of the model.
(d)
The MOGWO-NSGA3 optimization algorithm was further validated by analyzing its capability to handle complex trade-offs between multiple conflicting objectives in the transformer design. The results show that the proposed algorithm effectively reduces operating losses by 2.9% and manufacturing costs by 8.4% while ensuring compliance with engineering constraints. Furthermore, the algorithm demonstrated its ability to balance impedance deviation and transformer volume, making it highly suitable for specialized applications such as offshore wind farms and urban substations. These trade-offs, while leading to a slight 1.7% increase in the off-load loss and an 8.3% increase in copper consumption, remain within acceptable limits under specific technical constraints and operational requirements.
However, the proposed power transformer optimization approach also has certain limitations:
(a)
High computational costs for dataset construction: Building a finite element sample set requires a significant amount of time, making the data generation process computationally expensive. Although the ensemble learning-based surrogate model achieves high accuracy, its computational time and space complexity remain significantly higher than traditional analytical models. This poses challenges for large-scale parallel computations, necessitating future research on lightweight ensemble learning algorithms to improve efficiency.
(b)
Limited applicability to specialized transformers: The current surrogate modeling framework has certain limitations and may not be directly applicable to split transformers, autotransformers, and phase-shifting transformers. To enhance the diversity of future transformer designs, it is essential to develop specialized datasets tailored to these transformer types, enabling the broader applicability of the proposed optimization methodology.

Author Contributions

Conceptualization, W.X., Y.J., T.W. and B.S.; Project administration, L.Z., T.W. and M.L.; Software, B.S., W.X. and Y.J.; Methodology, B.S.; Writing—review and editing, Y.J. and Z.L.; Investigation, L.Z., X.C., T.W. and J.S.; Visualization, Y.J. and W.X.; Writing—original draft, B.S.; Validation, X.C. and L.Z.; Data curation, M.L. and Z.L.; Formal analysis, L.Z. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

The research is supported by the National Science and Technology Major Project under grant 2024ZD0803200, the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX24_0896), the Fundamental Research Funds for the Central Universities (No. B240205008), the National Natural Science Foundation of China (No. 51879089), and the Cooperative Innovational Center for Coastal Development and Protection (for the first group, 2011 Plan of China’s Jiangsu Province, grant no. (2013) 56).

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors express their appreciation for the financial support provided by the National Science and Technology Major Project under grant 2024ZD0803200, the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX24_0896), the Fundamental Research Funds for the Central Universities (No. B240205008), the National Natural Science Foundation of China (No. 51879089), and the Cooperative Innovational Center for Coastal Development and Protection (for the first group, 2011 Plan of China’s Jiangsu Province, grant no. (2013) 56).

Conflicts of Interest

Authors Wei Xiao, Baidi Shi, Liangxian Zhang, Tao Wang, Xinfu Chen, Zixing Li, and Meng Li were employed by the company Changzhou Xidian Transformer Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

Figure 1. Key structure parameters.
Figure 2. Assembly drawing.
Figure 3. Loss distribution.
Figure 4. Workflow of generating a dataset.
Figure 5. Simplex example of the DTR.
Figure 6. Workflow of the methodology.
Figure 7. Iteration curves.
Figure 8. Correlation coefficient matrix.
Figure 9. Sample uncertainty bias visualization: (a) usage of the material; (b) operating loss; and (c) leakage impedance.
Figure 10. Workflow of the MOGWO.
Figure 11. NSGA3 iteration process.
Figure 12. Workflow of the hybrid algorithm.
Figure 13. Pareto solution under Case 1.
Figure 14. Pareto solution under Case 2.
Table 1. Electrical parameters.
Parameter | Name | Description
T_l | Turns of the low-voltage windings | Mainly affects the short-circuit impedance and the manufacturing cost
d_l | Current density of the low-voltage windings | Mainly affects the short-circuit impedance and the operating cost
A_ll | Length of the cross-section of the low-voltage windings | Mainly affects the winding eddy loss and the manufacturing cost
A_wl | Width of the cross-section of the low-voltage windings | Mainly affects the winding eddy loss and the manufacturing cost
d_h | Current density of the high-voltage windings | Mainly affects the short-circuit impedance and the operating cost
A_lh | Length of the cross-section of the high-voltage windings | Mainly affects the winding eddy loss and the manufacturing cost
A_wh | Width of the cross-section of the high-voltage windings | Mainly affects the winding eddy loss and the manufacturing cost
Table 2. Workflow of the improved PSO.
Input parameters: M: maximum number of iterations; N: population size; w_e, w_s, c_1e, c_1s, c_2e, c_2s.
(1) Initialize
  (1.1) Randomly generate the population positions X and velocities V
  (1.2) Find the optimal individual and mark it as pb
(2) For t = 1 to M        # start the iteration
  (2.1) For k = 1 to N    # update the velocity and position of each particle
    (2.1.1) Update the velocity of the k-th particle according to Equation (15)
    (2.1.2) Update the position of the k-th particle according to Equation (16)
    (2.1.3) Boundary judgment and constraint processing
    (2.1.4) Update the historical optimal position P_k and the global best P_gb
(3) Output: the final position of the population P_M
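The workflow in Table 2 can be sketched as follows. Equations (15) and (16) are the standard PSO velocity and position updates; since their exact form is defined in the paper, a linear interpolation between the start and end values of the inertia weight and acceleration coefficients is assumed here purely for illustration, and the decay direction is likewise an assumption.

```python
import random

def improved_pso(f, dim, lb, ub, n=30, iters=100,
                 w_start=0.9, w_end=0.5,
                 c1_start=2.0, c1_end=2.0, c2_start=2.0, c2_end=2.0, seed=0):
    """Minimize f over [lb, ub]^dim with time-varying PSO coefficients."""
    rng = random.Random(seed)
    X = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    V = [[0.0] * dim for _ in range(n)]
    P = [x[:] for x in X]                       # personal best positions
    pbest = [f(x) for x in X]
    g = min(range(n), key=lambda k: pbest[k])   # index of initial global best
    G, gbest = P[g][:], pbest[g]
    for t in range(iters):
        frac = t / (iters - 1)
        w = w_start + (w_end - w_start) * frac          # inertia weight schedule
        c1 = c1_start + (c1_end - c1_start) * frac      # cognitive coefficient
        c2 = c2_start + (c2_end - c2_start) * frac      # social coefficient
        for k in range(n):
            for d in range(dim):
                V[k][d] = (w * V[k][d]
                           + c1 * rng.random() * (P[k][d] - X[k][d])
                           + c2 * rng.random() * (G[d] - X[k][d]))
                # boundary handling: clamp the position to the search box
                X[k][d] = min(max(X[k][d] + V[k][d], lb), ub)
            fx = f(X[k])
            if fx < pbest[k]:                   # update personal best
                pbest[k], P[k] = fx, X[k][:]
                if fx < gbest:                  # update global best
                    gbest, G = fx, X[k][:]
    return G, gbest

# Usage on the sphere function as a smoke test:
best_x, best_f = improved_pso(lambda x: sum(v * v for v in x), dim=3, lb=-5, ub=5)
```

On this simple test function the swarm converges toward the origin; in the paper's setting, f would be the ensemble model's validation error as a function of the hyperparameters.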
Table 3. Parameter settings.
Algorithm | Setup of the key parameters
PSO* (Table 2) | NP = 100, w_e = 0.9, w_s = 0.5, c_1s = c_2s = 2, c_1e = c_2e = 2
PSO | NP = 100, w = 0.9, c_1 = c_2 = 1
WOA | NP = 100, a = 2
GA | NP = 100, p_c = 0.7, p_m = 0.3
DE | NP = 100, F = 0.7
Table 4. Model performance. The best score in each column is that of the stacking model.
Model | Leakage Impedance | Operating Loss | Material Usage
Stacking | 1.743% | 2.591% | 1.452%
RF | 2.150% | 2.971% | 1.574%
FCN | 1.798% | 2.604% | 1.607%
KNN | 2.981% | 2.704% | 1.784%
DTR | 2.166% | 3.065% | 1.684%
ELM | 1.801% | 2.677% | 1.461%
SVM | 2.312% | 2.715% | 1.718%
LR | 3.143% | 3.507% | 2.185%
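The two-level stacking structure compared in Table 4 can be illustrated with a from-scratch sketch. The paper's base learners (SVM, LR, DTR, KNN) and the ELM meta-learner are replaced here with simple stand-ins: a hand-rolled linear regression, a 1-D KNN averager, and an averaging blend in place of the meta-learner's re-learning step. This shows the two-level idea only, not the authors' model.

```python
# Level-1 stand-in learners on a 1-D toy regression problem.

def knn_predict(xq, data, k=3):
    """Average the targets of the k nearest training points (stand-in for KNN)."""
    neigh = sorted(data, key=lambda p: abs(p[0] - xq))[:k]
    return sum(y for _, y in neigh) / k

def linreg_fit(data):
    """Ordinary least-squares fit y = a + b*x (stand-in for LR)."""
    n = len(data)
    sx = sum(x for x, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return lambda xq: a + b * xq

train = [(float(x), 2.0 * x + 1.0) for x in range(10)]  # noiseless toy data
base_learners = [linreg_fit(train), lambda xq: knn_predict(xq, train)]

# Level-2: the meta-learner re-learns from the base outputs; a plain average
# of the base predictions stands in for that step here.
def stack_predict(xq):
    return sum(m(xq) for m in base_learners) / len(base_learners)

pred = stack_predict(4.0)
```

On this noiseless linear data both base learners recover the target, so the blend does too; on real transformer data the meta-learner's job is to weight base learners that disagree.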
Table 5. Technique requirements.
Index | Value
Capacity | 90 MVA
Transformer ratio | 110 kV/10.5 kV
Leakage impedance | 18% ± 5%
Current density | ≤3.8 A/mm²
On-load loss | ≤273 kW
Off-load loss | ≤32.6 kW
Core flux density | ≤1.78 T
Length limit | ≤6 m
Width limit | ≤2.4 m
Height limit | ≤3.5 m
Table 6. Performance comparison.
Index | Origin | Scenario-1 | Scenario-2
Off-load loss (kW) | 244.7 | 237.9 | 249.1
On-load loss (kW) | 29.4 | 27.1 | 26.2
Copper usage (kg) | 11,205.3 | 10,204.3 | 12,217.4
Q235 steel (kg) | 13,974.4 | 13,741.6 | 11,875.0
Silicon steel usage (kg) | 35,774.1 | 33,751.1 | 33,174.1
Dimensions (length × width × height, m) | 5.44 × 1.73 × 2.79 | 5.24 × 1.67 × 2.75 | 5.14 × 1.61 × 2.64
Leakage impedance deviation (%) | 3.1 | 4.3 | 1.9
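As a quick sanity check on the dimensions reported in Table 6, the tank envelope volumes implied by the listed length × width × height values can be computed directly:

```python
# Envelope volumes (m^3) from the Table 6 dimensions.

def volume(dims):
    length, width, height = dims
    return length * width * height

origin = volume((5.44, 1.73, 2.79))     # original design
scenario1 = volume((5.24, 1.67, 2.75))  # loss/material-optimized design
scenario2 = volume((5.14, 1.61, 2.64))  # volume/impedance-optimized design

# Relative envelope reduction achieved in Scenario 2, roughly 17%.
reduction = (origin - scenario2) / origin
```

This confirms that Scenario 2 yields the smallest envelope, consistent with its volume-minimization objective.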