Explainable Warm-Start Point Learning for AC Optimal Power Flow Using a Novel Hybrid Stacked Ensemble Method

Xu, Kaijie; Zhang, Xiaochen; Qiu, Lin

doi:10.3390/su17020438

by Kaijie Xu ¹, Xiaochen Zhang ¹ and Lin Qiu ¹,²,*

¹ Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University, Haining 314400, China

² College of Electrical Engineering, Zhejiang University, Hangzhou 310058, China

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(2), 438; https://doi.org/10.3390/su17020438

Submission received: 4 December 2024 / Revised: 27 December 2024 / Accepted: 29 December 2024 / Published: 8 January 2025

(This article belongs to the Section Energy Sustainability)

Abstract:

With the development of renewable energy, renewable power generation has become an increasingly important component of the power system. However, it also introduces uncertainty into the analysis of the power system, so the optimal power flow (OPF) problem must be solved repeatedly and quickly. To accelerate the solution of the OPF problem, this paper proposes a novel Hybrid Stacked Ensemble Method (HSEM), which incorporates explainable warm-start point learning for AC optimal power flow (AC-OPF). The HSEM integrates conventional machine learning techniques, including regression trees and random forests, with gradient boosting trees. This combination leverages the individual strengths of each algorithm, thereby enhancing the overall generalization capabilities of the model in addressing AC-OPF problems and improving its interpretability. Experimental results indicate that the HSEM model achieves superior accuracy in AC-OPF solutions compared to traditional Deep Neural Network (DNN) approaches. Furthermore, the HSEM demonstrates significant improvements in both the feasibility and constraint satisfaction of control variables. The effectiveness of the proposed HSEM is validated through rigorous testing on the IEEE-30 bus system and the IEEE-118 bus system, demonstrating its ability to provide an explainable warm-start point for solving AC-OPF problems.

Keywords: AC-OPF; warm-start; machine learning

1. Introduction

As global industrialization continues to advance, the demand for energy is growing, and electricity, as one of the most crucial energy carriers, has garnered widespread attention [1,2]. This growing demand, coupled with the rapid proliferation of renewable energy sources, has introduced unprecedented challenges in power system operations. These challenges encompass the integration of renewable energy, the management of load fluctuations, the maintenance of power quality, and the optimization of power resource allocation. In this context, the optimization of power systems has increasingly centered around the Optimal Power Flow (OPF) problem [3,4]. The OPF problem is inherently complex due to its high-dimensional, nonlinear, and non-convex nature, leading to substantial computational challenges and difficulties in achieving the required accuracy. This complexity is particularly acute in real-time AC-OPF applications, such as probabilistic analysis, where real-time decision-making is crucial. Interestingly, as the world transitions to a more sustainable energy mix, the role of OPF becomes even more critical [5]. Thus, it highlights the increasing importance of developing innovative approaches to tackle these challenges, ensuring that power systems can meet the demands of a rapidly changing energy landscape [6].

To address the challenges associated with the OPF problem, numerous methods have been proposed in existing research. Traditional iterative algorithms, such as the Interior Point Method [7], Linear Programming [8], Quadratic Programming (QP) [9], and Newton–Raphson [10] have been widely applied for solving the OPF for a single sample. These algorithms are known for their precision and reliability in finding solutions, though they can be computationally intensive. In addition to traditional methods, heuristic algorithms have been developed to accelerate the OPF solving process by providing approximate solutions. Examples of these algorithms include Particle Swarm Optimization (PSO) [11], Novel Improved Social Spider Optimization (NISSO) [12], Enhanced Coati Optimization Algorithm (ECOA) [13], Crisscross Search–Grey Wolf Optimizer (CS-GWO) [14], and Modified Gorilla Troop Optimizer (MGTO) [15]. These algorithms are designed to explore the solution space more efficiently, often leading to faster convergence times. However, despite their advantages, metaheuristic algorithms often perform well only on individual samples and are generally not suitable for real-time OPF problems. Their reliance on approximations can lead to reduced accuracy and their efficiency often diminishes in dynamic, real-time scenarios. Furthermore, metaheuristic algorithms suffer from the lack of a solid mathematical foundation to support their operational mechanisms. As a result, while heuristic approaches offer valuable insights and speed, they are not always capable of handling the complexities of real-time OPF challenges effectively.

With the advancement of artificial intelligence, researchers have increasingly focused on utilizing artificial intelligence to overcome OPF challenges. For instance, a data-driven approach based on Recurrent Neural Networks (RNNs) was proposed by [16], which learns the relationship between load and generation outputs to quickly predict OPF results. Similarly, ref. [17] introduced an OPF estimation method using One-Dimensional Convolutional Neural Networks (1D-CNNs), which maps system loads to generator outputs for direct OPF prediction. Furthermore, the ConvOPF-DOP model proposed by [18] is a novel data-driven approach that uses Convolutional Neural Networks (CNNs) to solve AC-OPF across various operational patterns, significantly improving computational efficiency while maintaining accuracy. Moreover, ref. [19] employed a Graph Neural Network (GNN) model to approximate the optimal solution of AC-OPF through an imitation learning framework, offering improved computational speed. In a related approach, ref. [20] introduced an unsupervised learning approach using GNNs to solve OPF, which demonstrated computational efficiency and maintained constraint satisfaction through a novel penalty function technique. Additionally, ref. [6] combined the strengths of GNNs and CNNs by proposing a Graph Convolutional Neural Network (GCNN) model, enhancing OPF problem-solving capabilities, while also incorporating a physics-guided loss term to further improve the model’s accuracy and robustness. Deep Neural Networks (DNNs) have also been applied to solve DC OPF [21] and extended to AC-OPF by [22]. Despite the computational efficiency these neural network approaches offer, they often face challenges in maintaining accuracy and ensuring that solutions comply with safety constraints. Moreover, deep learning methods are often referred to as ’black-box’ due to their lack of interpretability. 
Although the aforementioned algorithms can make relatively accurate predictions for OPF solutions, their predictions often exceed safety constraints because they do not take into account the inherent characteristics of the power system. As a result, these solutions are typically not considered by grid operators, as they may cause harm to the overall operation of the power system.

Interpretability plays a crucial role in systems with physical characteristics, and the OPF problem is a prime example of such systems. While existing approaches, such as the deep learning models mentioned above, offer advantages in rapidly predicting OPF problems, models without interpretability and transparency pose significant risks to power systems. Therefore, some tree-based machine learning methods have already been applied to OPF. For example, ref. [23] constructs classification trees to learn the optimal strategy from mixed-integer quadratic problems, with the tree structure providing interpretability. Refs. [24,25] utilize random forests as a warm-start point for OPF solutions under load fluctuations, providing enhanced interpretability in the solution process. In [26], an XGBoost model is used for feature extraction, enhancing the overall interpretability of the model. However, despite the interpretability offered by these models, the inherent complexity of power systems means that a single model is often unable to fully capture the system’s characteristics and provide sufficiently accurate predictions.

To address the aforementioned challenges, this paper presents an explainable warm-start point learning using a Hybrid Stacked Ensemble Model (HSEM). The HSEM model utilizes the concept of stacking [27], which integrates multiple machine learning algorithms to enhance overall performance. Specifically, the model combines Regression Trees [28], Random Forests [29], XGBoost [30], LightGBM [31], and CatBoost [32]. By leveraging the strengths of these diverse algorithms, the HSEM aims to improve predictive accuracy and robustness.

In this paper, we introduce the Hybrid Stacked Ensemble Method (HSEM), which provides an explainable warm-start point for AC-OPF solutions, ensuring both the accuracy and feasibility of the resulting solutions. The main contributions of this paper are summarized as follows:

This paper introduces a novel HSEM model that integrates traditional machine learning algorithms within a stacking framework. This approach is primarily designed to provide an accurate and explainable warm-start point for OPF problems, ensuring both interpretability and solution quality.
The proposed HSEM model has been rigorously tested on the IEEE-30 bus and IEEE-118 bus systems. These experiments validate its effectiveness in predicting OPF problems under different test systems.

The remainder of this research article is organized as follows: Section 2 presents the problem statement that forms the foundation of this research. Section 3 outlines the methodology employed throughout the study. In Section 4, an in-depth analysis and discussion of the results are provided. Finally, Section 5 concludes the article by summarizing the key findings and discussing their implications.

2. Problem Formulation

This paper considers the AC-OPF system, which includes $N_{b}$ buses and $N_{br}$ branches. Among these buses, there are $N_{g}$ generator buses, denoted as G. The AC-OPF problem involves two control variables: the output and voltage of the generator buses. The detailed formulation can be expressed as follows [33]:

\begin{aligned} \min \quad & \sum_{i \in G} C_{i} (P_{gi}) \\ \text{s.t.} \quad & P_{gi}^{\min} \leq P_{gi} \leq P_{gi}^{\max}, \quad \forall i \in G \\ & Q_{gi}^{\min} \leq Q_{gi} \leq Q_{gi}^{\max}, \quad \forall i \in G \\ & V_{i}^{\min} \leq | V_{i} | \leq V_{i}^{\max}, \quad \forall i \in N_{b} \\ & | S_{lm} | \leq S_{lm}^{\max}, \quad \forall (l, m) \in L \\ & P_{gi} - P_{di} = \sum_{l \in N_{b} (i)} \operatorname{Re} \{ V_{i} (V_{i}^{*} - V_{l}^{*}) Y_{il} \} \\ & Q_{gi} - Q_{di} = \sum_{l \in N_{b} (i)} \operatorname{Im} \{ V_{i} (V_{i}^{*} - V_{l}^{*}) Y_{il} \} \end{aligned}

(1)

where g denotes the generator and d denotes the demand. $Y_{il}$ represents the admittance between buses i and l. $P_{i}$ and $Q_{i}$ represent the active power and reactive power at bus i, respectively.

The first line of Equation (1) is the objective function of the AC-OPF problem, which calculates the cost of the generators. As shown in Equation (2), the generation cost is a quadratic function, with the decision variable being the generator output. The coefficients $c_{i2}, c_{i1}, c_{i0}$ of the function are determined by the bus system. The constraints in Equation (1) include both equality and inequality constraints. Among the inequality constraints, the first two are the active and reactive power output limits of the generator nodes, and the last two are the voltage limits of all nodes and the branch flow limits. The equality constraints ensure the balance of both active power flow and reactive power flow within the power system.

C_{i} (P_{gi}) = c_{i2} P_{gi}^{2} + c_{i1} P_{gi} + c_{i0}, \quad \forall i \in G

(2)
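As a minimal sketch of Equation (2), the total generation cost can be evaluated as below. The function name `generation_cost` and the coefficient values are hypothetical and chosen only for illustration; they are not taken from any IEEE test case.

```python
def generation_cost(p_g, coeffs):
    """Sum of quadratic costs c2*P^2 + c1*P + c0 over all generators, as in Eq. (2)."""
    total = 0.0
    for p, (c2, c1, c0) in zip(p_g, coeffs):
        total += c2 * p ** 2 + c1 * p + c0
    return total


# Hypothetical two-generator example (made-up outputs and coefficients).
p_g = [50.0, 30.0]
coeffs = [(0.02, 2.0, 0.0), (0.0175, 1.75, 0.0)]
total_cost = generation_cost(p_g, coeffs)
```

In the full AC-OPF problem this value is minimized subject to the constraints of Equation (1); the sketch only evaluates the objective for a given dispatch.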

3. Materials and Methods

3.1. Regression Tree

The Regression Tree is a classic machine learning method commonly used to address regression problems. Its fundamental concept involves a series of binary decisions: at each node, the optimal feature and the best split point are selected based on the Sum of Squared Errors (SSE), partitioning the feature space into two regions. The same procedure is repeated for each region until a stopping condition is met. This iterative process results in the construction of a binary least-squares regression tree [28].

Let $D = \{(x_{i}, y_{i}) \mid i = 1, 2, \dots, N\}$ be the training dataset, where $x_{i} \in \mathbb{R}^{p}$ is the input vector and $y_{i} \in \mathbb{R}$ is the target variable. The regression tree builds the model through the following steps:

3.1.1. Selecting the Best Split Variable and Split Point

For each input variable $x_{j}$ ($j = 1, 2, \dots, p$) and candidate split point s, compute the sum of squared errors (SSE) after splitting the dataset into two subsets:

SSE (j, s) = \sum_{i : x_{ij} \leq s} {(y_{i} - m_{1})}^{2} + \sum_{i : x_{ij} > s} {(y_{i} - m_{2})}^{2}

(3)

where $x_{ij}$ is the j-th component of $x_{i}$, and $m_{1}$ and $m_{2}$ are the means of the target values on the left ($x_{ij} \leq s$) and right ($x_{ij} > s$) sides of the split point s, respectively. The algorithm selects the j and s that minimize the SSE as the best split for the current node.
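The split search can be sketched as an exhaustive scan over features and thresholds. This is an illustrative sketch, assuming candidate split points are the midpoints between consecutive distinct feature values (the source does not say how candidates are chosen); the names `sse` and `best_split` are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y):
    """Return (j, s, error) minimizing the two-sided SSE over features j and thresholds s."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Assumed candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            s = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[j] <= s]
            right = [t for row, t in zip(X, y) if row[j] > s]
            error = sse(left) + sse(right)
            if best is None or error < best[2]:
                best = (j, s, error)
    return best


# Toy data: the target jumps from 1 to 5 when feature 0 passes 2.5.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
y = [1.0, 1.0, 5.0, 5.0]
j, s, err = best_split(X, y)
```

On this toy data the scan picks feature 0 with threshold 2.5, which separates the two target levels exactly.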

3.1.2. Partition the Feature Space and Determine the Predicted Data

Using the selected $(j, s)$, partition the feature space. For each leaf node, take the average of all target variables within that node as the predicted value.

3.1.3. Recursive Splitting

Recursively apply steps 1 and 2, partitioning the dataset until the termination criteria are satisfied, such as reaching the maximum allowable tree depth or the sample count in each leaf node falling below a specified threshold.

3.1.4. Prediction Output

At each leaf node, the mean of all target variables within that node serves as the prediction value. For a new input, such as a sample from the test set $x_{test}$, the algorithm follows the decision path to locate the corresponding leaf node and returns the prediction value of that node as the output.

3.2. Random Forest

The Random Forest algorithm is a classic ensemble method first proposed by [29]. It utilizes the bootstrapping concept from the bagging algorithm [34]. Bootstrapping entails repeatedly sampling from the original dataset with replacement to generate multiple sub-samples (bootstrap samples). Random Forest extends the bagging algorithm by using regression trees as weak learners and randomly selecting a subset of features to determine the optimal split points of the regression trees. The steps of the Random Forest algorithm are as follows:

3.2.1. Generate Training Subsets

Use bootstrapping to randomly sample multiple subsets from the original training dataset with replacement. Each subset is used to train a decision tree model.

3.2.2. Build and Train Regression Trees

For each generated subset, construct a regression tree. At each node split, randomly select a subset of features and decide the optimal split point based on this subset of features to train each regression tree.

3.2.3. Prediction Output

The final prediction is the average of the outputs from all regression trees.

The Random Forest algorithm, utilizing an ensemble of regression trees, enhances predictive accuracy. The bootstrapping method within the bagging algorithm improves the generalization capability of regression trees and mitigates overfitting. Furthermore, the random selection of feature subsets allows Random Forests to perform effectively in high-dimensional contexts, such as the OPF problem examined in this study.
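The three steps above can be sketched with depth-one trees (stumps) as weak learners. This is a simplified illustration, not the configuration used in the paper: because a stump has a single node, drawing one feature subset per tree coincides with drawing one per split. All function names are hypothetical.

```python
import random


def fit_stump(X, y, feature_ids):
    """Best single split among feature_ids: (feature, threshold, left_mean, right_mean)."""
    best = None
    for j in feature_ids:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            s = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[j] <= s]
            right = [t for row, t in zip(X, y) if row[j] > s]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or err < best[0]:
                best = (err, j, s, ml, mr)
    if best is None:  # no split possible: always predict the sample mean
        mean = sum(y) / len(y)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    j, s, ml, mr = stump
    return ml if row[j] <= s else mr


def fit_forest(X, y, n_trees=25, n_sub_features=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # bootstrap: sample with replacement
        feats = rng.sample(range(p), n_sub_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Average the outputs of all trees."""
    return sum(predict_stump(t, row) for t in forest) / len(forest)


X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [1.0 if i < 10 else 3.0 for i in range(20)]
forest = fit_forest(X, y)
pred = predict_forest(forest, [15.0, 2.0])
```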

3.3. XGBoost

Extreme Gradient Boosting (XGBoost) is a refined algorithm derived from the Gradient Boosting Decision Trees (GBDTs) framework [35], proposed by [30]. The primary difference between XGBoost and original GBDT is the inclusion of a regularization term in the objective function, as shown in Equation (4). Additionally, XGBoost is a parallelizable algorithm.

L (θ) = \sum_{i = 1}^{n} L (y_{i}, {\hat{y}}_{i}) + \sum_{k = 1}^{K} Ω (f_{k})

(4)

Here, ${\hat{y}}_{i}$ and $y_{i}$ are the predicted and label values, respectively, and $L (y_{i}, {\hat{y}}_{i})$ represents the loss function during training. The term $Ω (f_{k})$ is the regularization term of the k-th tree, typically defined as Equation (5):

Ω (f) = γ T + \frac{1}{2} λ {∥w∥}^{2}

(5)

In this equation:

T is the number of leaf nodes in the tree.
w is the vector of leaf weights of the tree.
$γ$ is the penalty coefficient on the number of leaf nodes.
$λ$ is the penalty coefficient on the leaf weights.
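Equations (4) and (5) can be evaluated directly once the leaf weights of each tree are known. The sketch below assumes a squared-error training loss (the source leaves the loss generic) and represents each tree only by its list of leaf weights; `tree_penalty` and `regularized_objective` are hypothetical names, not part of the XGBoost API.

```python
def tree_penalty(leaf_weights, gamma, lam):
    """Eq. (5): gamma * T + 0.5 * lambda * ||w||^2 for one tree."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def regularized_objective(y_true, y_pred, trees, gamma, lam):
    """Eq. (4) with an assumed squared-error loss; trees is a list of leaf-weight lists."""
    loss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return loss + sum(tree_penalty(w, gamma, lam) for w in trees)


# Two hypothetical trees with 2 and 3 leaves.
trees = [[0.5, -0.5], [1.0, 0.0, -1.0]]
objective = regularized_objective([1.0, 2.0], [1.5, 2.0], trees, gamma=1.0, lam=2.0)
```

Here the loss contributes 0.25 and the two trees contribute 2.5 and 5.0, so larger trees or larger leaf weights raise the objective even when the fit is unchanged.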

XGBoost effectively prevents overfitting to the training data and enhances the model’s generalization capability by incorporating regularization for the complexity of the tree and penalties for leaf node weights. Additionally, XGBoost employs parallel computing. Trees are still added one after another, as in the original GBDT, but within the construction of each tree the data are divided into multiple subsets that are processed on different CPU cores, which accelerates the node splitting process. This use of multi-core processors significantly reduces computation time [36].

3.4. LightGBM

LightGBM (Light Gradient Boosting Machine) is an improved algorithm based on the original GBDT, proposed by Microsoft in 2017, primarily aimed at reducing computation time when handling high-dimensional, large-scale data [31].

LightGBM employs a histogram-based algorithm instead of a pre-sorted approach to help find the optimal split points. The fundamental concept behind the histogram algorithm involves discretizing continuous feature values into k integers and constructing a histogram with k bins. By using these histogram bins, LightGBM can efficiently identify the best split points for decision trees [31].
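The histogram idea can be sketched as follows. This is an illustration under stated assumptions, not LightGBM's implementation: bins are equal-width, raw target values stand in for the gradients that LightGBM accumulates, and the split gain is the variance-reduction score. The function names are hypothetical.

```python
def build_histogram(x, y, k):
    """Discretize x into k equal-width bins and accumulate target sums and counts per bin."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 when all values are equal
    sums, counts = [0.0] * k, [0] * k
    for xi, yi in zip(x, y):
        b = min(int((xi - lo) / width), k - 1)
        sums[b] += yi
        counts[b] += 1
    return sums, counts, lo, width


def best_histogram_split(x, y, k=4):
    """Scan the k - 1 bin boundaries instead of every distinct feature value."""
    sums, counts, lo, width = build_histogram(x, y, k)
    total_sum, total_cnt = sum(sums), sum(counts)
    best_gain, best_thr = 0.0, None
    left_sum, left_cnt = 0.0, 0
    for b in range(k - 1):
        left_sum += sums[b]
        left_cnt += counts[b]
        right_cnt = total_cnt - left_cnt
        if left_cnt == 0 or right_cnt == 0:
            continue
        right_sum = total_sum - left_sum
        gain = (left_sum ** 2 / left_cnt + right_sum ** 2 / right_cnt
                - total_sum ** 2 / total_cnt)
        if gain > best_gain:
            best_gain, best_thr = gain, lo + (b + 1) * width
    return best_thr, best_gain  # samples with x < best_thr go to the left child


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0]
y = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
thr, gain = best_histogram_split(x, y, k=4)
```

The cost of the scan depends on the number of bins k rather than on the number of samples, which is the source of the speed-up described above.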

Additionally, LightGBM uses a leaf-wise strategy with depth limitation, rather than the traditional level-wise strategy, to generate leaf nodes. As shown in Figure 1, the traditional level-wise strategy splits all nodes at each level of the tree. However, this approach can lead to splitting nodes with low gain, resulting in lower overall splitting gain. Conversely, the leaf-wise strategy would select the leaf node with the highest splitting gain for splitting at each step. This approach can effectively reduce loss. However, because leaf-wise only splits the node with the highest gain, it often leads to excessively deep paths and overfitting. Therefore, leaf-wise with depth limitation has been proposed to effectively prevent overfitting and improve the algorithm’s accuracy.

3.5. CatBoost

Categorical Boosting (CatBoost) [32] is an improved machine learning algorithm based on GBDTs, designed to address the challenge of handling categorical features in large-scale datasets. Compared to traditional GBDTs, it incorporates several technical improvements and enhancements.

3.5.1. Efficient Handling of Categorical Features

CatBoost excels in handling categorical features. It employs a technique called Target Statistics (TS), which is an effective method for processing categorical features with minimal information loss. In this method, categorical features are replaced by their corresponding average label values. Additionally, CatBoost applies random permutations to the dataset. Given an observation dataset $D = \{(X_{i}, Y_{i})\}_{i = 1}^{n}$, the dataset is randomly permuted according to $σ = (σ_{1}, \dots, σ_{n})$. The specific formula is shown in Equation (6):

\hat{x}_{σ_{i}, k} = \frac{\sum_{j = 1}^{i - 1} [x_{σ_{j}, k} = x_{σ_{i}, k}] \cdot Y_{σ_{j}} + w \cdot p}{\sum_{j = 1}^{i - 1} [x_{σ_{j}, k} = x_{σ_{i}, k}] + w}

(6)

where $x_{σ_{i}, k}$ is the k-th (categorical) feature of the sample at position i of the permutation, $[\cdot]$ equals 1 when the condition inside holds and 0 otherwise, p is the prior value, and w is the weight of the prior value. The prior term smooths the estimate for low-frequency categories, and because each sample is encoded using only the samples that precede it in the permutation, CatBoost can address the data leakage [37] and conditional shift issues inherent in the TS algorithm.
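Equation (6) can be sketched as a single pass over the permuted samples, keeping a running sum and count per category. The function name `ordered_target_statistics` is hypothetical, and the example uses the identity permutation only so that the numbers are easy to follow; in practice the permutation is random.

```python
def ordered_target_statistics(categories, targets, perm, prior, w=1.0):
    """Encode each sample's category from the targets of samples visited before it.

    perm lists sample indices in the order the permutation visits them.
    """
    encoded = [0.0] * len(categories)
    sums, counts = {}, {}
    for idx in perm:
        c = categories[idx]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[idx] = (s + w * prior) / (n + w)  # Eq. (6): only earlier samples contribute
        sums[c] = s + targets[idx]                # the sample's own label is added afterwards
        counts[c] = n + 1
    return encoded


categories = ["a", "b", "a", "a", "b"]
targets = [1.0, 0.0, 1.0, 0.0, 1.0]
encoded = ordered_target_statistics(categories, targets, perm=[0, 1, 2, 3, 4], prior=0.5)
```

The first occurrence of each category receives the prior (0.5 here), and no sample's encoding depends on its own label, which is how the leakage mentioned above is avoided.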

3.5.2. Combining Multiple Categorical Features

CatBoost employs a distinctive greedy approach to handle categorical features by considering feature combinations. When constructing new splits in the decision tree, the algorithm does not consider any combinations during the first split. However, from the second split onwards, CatBoost takes into account all preset feature combinations along with all categorical features in the dataset. For example, if the first split selects feature m, in the second split, CatBoost might consider combinations such as $m \times n$, $m \times l$, or even $n \times l$. This approach enables CatBoost to capture interactions between features effectively, enhancing the model’s generalization capability while also performing well on high-dimensional, nonlinear problems.

3.5.3. Ordered Boosting

In original GBDT algorithms, subsequent weak learners are trained based on the prediction errors or gradients of their predecessors [35]. However, due to the structural issues of the algorithm or noise in the dataset, these gradients may not be entirely accurate. Consequently, each successive weak learner might be trained on inaccurate gradients, leading to substantial bias in the final prediction results. Additionally, differences in the distributions of the training and test sets cause GBDTs to adjust their parameters to better fit the training set during the training process, resulting in overfitting.

To improve the model’s generalization capability, CatBoost introduces an innovative technique known as Ordered Boosting, which can overcome the prediction shift caused by gradient bias. The pseudocode for Ordered Boosting is shown in Algorithm 1.

Algorithm 1 Ordered Boosting

Require: A dataset ${(X_{z}, Y_{z})}_{z = 1}^{n}$ ordered according to $σ$ , number of trees I

$σ$ ← random permutation of $[1, n]$
Initialize $M_{i} = 0$ for $i = 1, \dots, n$
for $t = 1$ to I do
    for $i = 1$ to n do
        Compute residuals $r_{i} = y_{i} - M_{σ (i) - 1} (X_{i})$
    end for
    for $i = 1$ to n do
        Update $Δ M$ by calling LearnModel with data subset $\{(X_{j}, r_{j}) : σ (j) \leq i\}$
        Set $M_{i} = M_{i} + Δ M$
    end for
end for
return Final model $M_{n}$

In Algorithm 1, ${(X_{z}, Y_{z})}_{z = 1}^{n}$ represents the input dataset, where $X_{z}$ is the z-th input and $Y_{z}$ is the corresponding z-th ground truth. The initialization process randomly permutes the index set $[1, n]$ to generate $σ$, ensuring that the sample order is randomized during training. Subsequently, the models $M_{i}$ are set to zero to facilitate subsequent calculations.

Within the algorithm’s loops, the outer loop iterates over the number of trees I. The first inner loop calculates the residual $r_{i}$ for each sample, which is the difference between the actual value $y_{i}$ and the current model prediction $M_{σ (i) - 1} (X_{i})$. The second inner loop updates the models by first learning a new model $Δ M$ from the samples $(X_{j}, r_{j})$ with $σ (j) \leq i$. This is the core part of the Ordered Boosting algorithm: each sample’s residual is computed with a model that was trained only on the samples preceding it in the permutation, rather than with a single model trained on the whole dataset. The inner loop then adds the newly learned model $Δ M$ to the current model $M_{i}$.
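The principle behind Ordered Boosting can be illustrated with a deliberately simplified sketch. It is not a full implementation of Algorithm 1: the "model" is just the running mean of the targets seen so far, and there is a single pass instead of I boosting rounds. The function name `ordered_residuals` is hypothetical.

```python
def ordered_residuals(y, order):
    """Residual of each sample w.r.t. a constant model fitted on the samples before it.

    order lists sample indices in the order given by the permutation.
    """
    residuals = [0.0] * len(y)
    running_sum, count = 0.0, 0
    for j in order:
        prediction = running_sum / count if count else 0.0  # model has not seen y[j]
        residuals[j] = y[j] - prediction
        running_sum += y[j]  # only now does sample j enter the training prefix
        count += 1
    return residuals


y = [2.0, 4.0, 6.0]
res = ordered_residuals(y, order=[0, 1, 2])
```

Changing the target of a later sample leaves the residuals of all earlier samples untouched, which is the property that protects against the prediction shift described above.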

3.6. Ensemble Model Learning Method

Stacking learning is a commonly used ensemble algorithm model, applied in both classification and regression problems [27]. Stacking learning primarily consists of two parts: the base learners and the meta-learner. The base learners consist of various independent machine learning algorithms, while the meta-learner aggregates the outputs from these base learners for training and prediction. In this study, we use Regression Tree (RT), Random Forest (RF), and CatBoost as the base learners, and employ XGBoost and LightGBM as the meta-learners. The detailed algorithm flow is presented in Algorithm 2.

Algorithm 2 Training and testing of HSEM.

Input: Training data $(X_{train}, y_{train})$ ,
    Testing data $(X_{test}, y_{test})$ ,
    Base models $B = \{B_{1}, B_{2}, \dots, B_{k}\}$ ,
    Meta models $M = \{M_{1}, M_{2}\}$
Output: Final predictions ${\hat{y}}_{final}$
Initialize: Meta features $F_{train} = \emptyset$ , $F_{test} = \emptyset$
for each base model $B_{i} \in B$ do
    Train $B_{i}$ on $(X_{train}, y_{train})$
    Predict $B_{i}$ on $X_{validation}$ to get ${\hat{y}}_{train, i}$
    Predict $B_{i}$ on $X_{test}$ to get ${\hat{y}}_{test, i}$
    Append ${\hat{y}}_{train, i}$ to $F_{train}$
    Append ${\hat{y}}_{test, i}$ to $F_{test}$
end for
Train meta model M on $(F_{train}, y_{train})$
Apply ReLU to the meta model and predict M on $F_{test}$ to obtain final predictions ${\hat{y}}_{final}$
Return: ${\hat{y}}_{final}$

In the pseudocode, HSEM first initializes the meta-feature matrices $F_{train}$ and $F_{test}$. Then, the training set is divided into two parts using k-fold (KF) validation: one part is the train set and the other part is the validation set. The train set is used to train each model in the base model ensemble, while the validation set is fed into the trained base models to generate predictions ${\hat{y}}_{train, i}$. These results are appended to $F_{train}$ and used as the training set for the meta-model. Similarly, the inputs of the original test set $(X_{test}, y_{test})$ are fed into the trained base models to generate predictions, which are combined into $F_{test}$ and used as the test set for the meta-model.

The meta-model is first trained on $F_{train}$ and then tested on $F_{test}$. Its output is the final prediction of the entire HSEM algorithm.
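The stacking flow described above can be sketched with toy learners. The sketch makes several assumptions that the source does not confirm: the base learners are a mean predictor and a one-dimensional least-squares line rather than tree models, the meta-learner is a convex blend rather than XGBoost or LightGBM, base models are refitted on the full training set before predicting the test set, and the ReLU step is interpreted as clamping the final predictions at zero. All names are hypothetical.

```python
def fit_mean(X, y):
    mean = sum(y) / len(y)
    return lambda x: mean


def fit_linear(X, y):
    """1-D least squares y = a * x + b."""
    n = len(X)
    mx, my = sum(X) / n, sum(y) / n
    var = sum((x - mx) ** 2 for x in X)
    a = sum((x - mx) * (t - my) for x, t in zip(X, y)) / var if var else 0.0
    b = my - a * mx
    return lambda x: a * x + b


def fit_blend(F, y, steps=20):
    """Toy meta-learner: convex weight on two meta-features minimizing squared error."""
    best_w, best_err = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        err = sum((w * f[0] + (1 - w) * f[1] - t) ** 2 for f, t in zip(F, y))
        if err < best_err:
            best_w, best_err = w, err
    return lambda f: best_w * f[0] + (1 - best_w) * f[1]


def stack_fit_predict(X_train, y_train, X_test, base_fitters, meta_fitter, k=3):
    n = len(X_train)
    F_train = [[0.0] * len(base_fitters) for _ in range(n)]
    F_test = [[] for _ in X_test]
    for b, fit in enumerate(base_fitters):
        # Out-of-fold predictions on each validation fold become the training meta-features.
        for fold in range(k):
            val = [i for i in range(n) if i % k == fold]
            trn = [i for i in range(n) if i % k != fold]
            model = fit([X_train[i] for i in trn], [y_train[i] for i in trn])
            for i in val:
                F_train[i][b] = model(X_train[i])
        # Assumption: refit on the full training set to build the test meta-features.
        full_model = fit(X_train, y_train)
        for row, x in zip(F_test, X_test):
            row.append(full_model(x))
    meta = meta_fitter(F_train, y_train)
    return [max(0.0, meta(f)) for f in F_test]  # ReLU read as clamping outputs at zero


X_train = [float(i) for i in range(9)]
y_train = [2.0 * x + 1.0 for x in X_train]
y_final = stack_fit_predict(X_train, y_train, [10.0], [fit_mean, fit_linear], fit_blend)
```

On this linear toy problem the meta-learner puts all of its weight on the linear base model, so the stacked prediction for x = 10 is 21. The meta-learner never sees the raw inputs, only the base predictions.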

Figure 2 illustrates the algorithm’s flowchart. Similar to the pseudocode, the training set is first divided into two parts: the train set and the validation set. These, along with the test set, are fed into the base models. The features generated by the base models are treated as meta-features and used as inputs for the meta-model, which undergoes further training and testing. The output of the meta-model on the test set serves as the final prediction result of the entire model.

The base models in the HSEM algorithm are optimized towards the ground truth in the training set. The first layer trains each base model to make its outputs as accurate as possible, producing predictions close to the actual values. Although the optimization target for the second-layer meta-model remains the ground truth, the meta-model does not directly accept bus system information as input. Instead, it only receives the concatenated predictions from the base models and the ground truth. Thus, the meta-model essentially maps the base models’ predictions to a more accurate final prediction. In other words, whereas each first-layer model focuses on the accuracy of its own output, the meta-model primarily learns the relationships between the outputs of the different models and integrates them into a more stable and accurate prediction.

The HSEM algorithm integrates the advantages of various algorithms within a stacking framework, resulting in both accurate and explainable solutions for AC-OPF problems. While individual machine learning models can provide relatively accurate outputs for the OPF problem, they still face some issues. Firstly, the outputs of base models can contain errors that cannot be minimized solely through the model’s structure and characteristics. Additionally, base models often perform well on specific datasets or during training but do not generalize well to the test set. The HSEM model, through the stacking of base models and meta-models, comprehensively considers the outputs of multiple base models, thereby enhancing the model’s generalization capability. This results in better performance on both training and test sets.

4. Results

This paper uses the IEEE 30-bus system and the IEEE 118-bus system to test and validate the proposed Hybrid Stacked Ensemble Method (HSEM). The IEEE 30-bus system consists of 6 generator buses, while the IEEE 118-bus system contains 54 generator buses. By leveraging both small- and large-scale test systems, the performance of the algorithm in providing an explainable warm-start point for AC-OPF problems is thoroughly evaluated.

4.1. Simulation Setting

In the simulation settings, we detail the data generation, preprocessing steps, datasets used, and the experimental equipment involved in our simulations.

4.1.1. Data Generation

The required data were generated by introducing random perturbations to the demand of buses, thereby creating different power flow cases. The original $P_{d}$ and $Q_{d}$ values of all load buses in the network serve as reference values, which are then multiplied by a perturbation factor $μ$ to obtain new ${\hat{P}}_{d}$ and ${\hat{Q}}_{d}$ values. The factor $μ$ is sampled uniformly at random from the range [0.6, 1.4] p.u. After generating the corresponding new test systems, MATPOWER [38] was used to solve them. The optimal control variables obtained were used as the ground truth in the dataset.
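The load perturbation can be sketched as below. The source does not state whether one factor is drawn per bus or per scenario, so the sketch assumes an independent factor for each load bus, shared by the active and reactive demand of that bus. The reference values and the function name `perturb_loads` are hypothetical, and solving each scenario with MATPOWER is outside the sketch.

```python
import random


def perturb_loads(p_d, q_d, n_samples, low=0.6, high=1.4, seed=0):
    """Generate load scenarios by scaling reference demands with uniform factors."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Assumption: one independent factor per load bus, applied to both P and Q.
        mu = [rng.uniform(low, high) for _ in p_d]
        p_new = [m * p for m, p in zip(mu, p_d)]
        q_new = [m * q for m, q in zip(mu, q_d)]
        samples.append((p_new, q_new))
    return samples


# Hypothetical reference demands for three load buses (MW, MVAr).
p_ref = [21.7, 2.4, 7.6]
q_ref = [12.7, 1.2, 1.6]
scenarios = perturb_loads(p_ref, q_ref, n_samples=100)
```

Each scenario would then be passed to the OPF solver, and the resulting optimal control variables stored as labels.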

4.1.2. Data Preprocessing

In the control variables of a test system, there can be significant differences in the ranges and magnitudes of different features. For example, in the IEEE 118-bus system, the voltage dimension is around 10, while the generator output can be as high as $10 \times 10^{2}$. If normalization is not applied, the differences in magnitudes between different features can affect the model’s training performance, causing certain features to dominate the loss function calculation. Therefore, normalization is used to process the data by rescaling the features to the $[0, 1]$ range. The normalization formula is given by Equation (7).

D_{norm} = \frac{D_{ori} - lb}{ub - lb}

(7)

where $D_{ori}$ represents the original data value, $D_{norm}$ is the normalized data value, and $lb$ and $ub$ are the lower and upper bounds of the data range, respectively.

The denormalization formula is given by Equation (8):

D_{denorm} = D_{norm} \times (ub - lb) + lb

(8)

Using this method, data can be normalized and denormalized while maintaining numerical stability, thereby improving the model’s training and prediction performance.
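Equations (7) and (8) translate directly into a pair of inverse functions. The bounds in the example are hypothetical generator limits chosen for illustration.

```python
def normalize(d_ori, lb, ub):
    """Eq. (7): rescale a value from [lb, ub] to [0, 1]."""
    return (d_ori - lb) / (ub - lb)


def denormalize(d_norm, lb, ub):
    """Eq. (8): map a normalized value back to [lb, ub]."""
    return d_norm * (ub - lb) + lb


# Hypothetical generator with active power limits of 0 MW and 80 MW.
p_norm = normalize(20.0, lb=0.0, ub=80.0)
p_back = denormalize(p_norm, lb=0.0, ub=80.0)
```

Model outputs are produced in the normalized range and passed through `denormalize` before feasibility and cost are evaluated in physical units.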

Information on the datasets used for model training can be found in Table 1. To evaluate the accuracy of the predicted solutions, the cost comparison in percentage $κ$ as defined in Equation (9) is used [22]. In addition, the feasibility rate and two quantitative metrics, Mean Squared Error (MSE) and Mean Absolute Error (MAE), are also employed to measure the performance of the model.

κ = \frac{cost_{ips} - cost_{model}}{cost_{model}}

(9)

In the case study, we first conduct a longitudinal comparison of the results of the proposed HSEM algorithm with those obtained using the individual base learners within the algorithm. Subsequently, we perform a horizontal comparison with the more popular Deep Neural Network (DNN) algorithms to demonstrate the superior performance of the proposed method. The DNN used in the simulation consists of three layers with 64 neurons per layer. The neural network was developed and tested using PyTorch version 1.10.0 within the Python environment. The simulation tests were conducted on a server equipped with an Intel(R) Core(TM) i5-13500H processor (13th Gen, 2600 MHz, 12 cores, 16 logical processors) (Intel: Santa Clara, CA, USA).

4.2. IEEE 30-Bus System

The IEEE 30-bus system is a classical IEEE benchmark that is frequently used as a small test system. It consists of 41 transmission lines, six generator buses, and 24 load buses [39]. Therefore, in the proposed HSEM model, the input dimension is 48, comprising the active and reactive demand of the 24 load buses. The output dimension of the model is 12, consisting of the active power outputs and voltages of the six generator buses.

Table 2 presents the objective function results for HSEM and the comparative methods on the IEEE 30-bus system. In the table, the feasibility rate is the proportion of predicted solutions that satisfy the control variable limits and OPF security constraints specified in Equation (1). The maximum, minimum, and average $|κ|$ are used to evaluate the different predictive models: the maximum $|κ|$ reflects the worst-case prediction error, the minimum $|κ|$ the best-case error, and the average $|κ|$ the overall prediction accuracy. Together, these metrics characterize model performance in both extreme and ordinary situations. MSE and MAE quantify the squared and absolute errors of the predictions, respectively. It is particularly important to note that $|κ|$, MSE, and MAE are calculated only over the feasible solutions of each algorithm, so that the reported errors describe only usable solutions.
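The evaluation protocol described above, in which the error statistics are computed over feasible predictions only, can be sketched as follows. This is a simplified illustration: it checks only box limits on the control variables, whereas the feasibility rate in the paper also covers the OPF security constraints of Equation (1), which require a power-flow evaluation that is not modeled here. The function names, the $P_{g}$ limit, and the sample values are hypothetical.

```python
def within_limits(values, lower, upper, tol=1e-9):
    """True if every control variable lies inside its [lower, upper] bound."""
    return all(lo - tol <= v <= hi + tol
               for v, lo, hi in zip(values, lower, upper))


def summarize(samples, lower, upper):
    """samples: list of (predicted_controls, abs_kappa_percent) pairs.

    Returns (feasibility rate in %, max, min, mean |kappa|), with the
    |kappa| statistics taken over the feasible samples only.
    """
    feasible = [k for controls, k in samples
                if within_limits(controls, lower, upper)]
    rate = 100.0 * len(feasible) / len(samples)
    if not feasible:
        return rate, None, None, None
    return rate, max(feasible), min(feasible), sum(feasible) / len(feasible)


# Hypothetical data: one generator with P_g in [0, 80] and V_g in [0.94, 1.06].
lower, upper = [0.0, 0.94], [80.0, 1.06]
samples = [
    ([40.0, 1.00], 0.002),   # feasible
    ([85.0, 1.01], 0.500),   # P_g above its upper limit
    ([10.0, 1.05], 0.004),   # feasible
    ([20.0, 0.93], 0.300),   # V_g below its lower limit
]
rate, k_max, k_min, k_avg = summarize(samples, lower, upper)
```

One consequence of this protocol is that a model with a low feasibility rate is scored on a small subset of the test set, so its error statistics should be read together with its feasibility rate.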

In terms of feasibility rate, the base learners, except for CatBoost, satisfy the constraints of the IEEE 30-bus system well: Regression Tree (RT), Random Forest (RF), LightGBM, and XGBoost all achieve a 100% feasibility rate. In contrast, although CatBoost reaches a feasibility rate of only 46.41% (Table 2), it performs exceptionally well in terms of $|κ|$, with an average $|κ|$ lower than that of every other base learner and one to two orders of magnitude lower than that of RT and RF.

The proposed HSEM algorithm effectively predicts the optimal values for the IEEE 30-bus system case. The total generation cost resulting from the HSEM output is very close to that obtained from Matpower, which serves as the ground truth, with an average $|κ|$ as low as 0.000179%. The algorithm also maintains a 99.96% feasibility rate across the 10,000 test samples, meeting the high accuracy requirements of practical applications. Furthermore, the MSE and MAE values indicate that the HSEM algorithm has a low prediction error for the generation cost, demonstrating good consistency and stability.

Although the DNN achieves a lower average $|κ|$ than the base learners (except CatBoost), the HSEM algorithm outperforms the DNN in both $|κ|$ and feasibility rate.

Although Table 2 quantifies the prediction performance for the objective function values, even the most indicative metric, the average $|κ|$, can still be affected by extreme points. A scatter plot of $|κ|$ better reflects the distribution of relative errors in the objective function values. Figure 3 shows the scatter plots of $|κ|$ for the six comparative algorithms, while Figure 4 presents the scatter plot of $|κ|$ for the HSEM algorithm alongside the corresponding frequency distribution.

From these figures, it can be observed that $|κ|$ for RT is mostly within 0.1%, for RF within 0.04%, and for LightGBM and XGBoost primarily within 0.005%. Notably, CatBoost performs the best among the base learners, with $|κ|$ mainly distributed within 0.001%, while the DNN falls within 0.003%. In contrast, as shown in the right panel of Figure 4, $|κ|$ for the HSEM algorithm is mainly within $5 \times 10^{-4}$%. If a few extreme points are excluded, the HSEM algorithm demonstrates the best stability and error performance. The results in Figure 3 and Figure 4 are also consistent with the evaluations presented in Table 2.

The prediction of the OPF problem is essentially a multi-output regression problem, in which the predicted control variables determine the predicted objective function value. Therefore, Table 3 analyzes the performance of the different algorithms in predicting the control variables. Since the two types of control variables, active power output and voltage at generator buses, differ in physical dimension and magnitude, Table 3 reports the MSE and MAE separately for $P_{g}$ and $V_{g}$.

As shown in the table, the proposed HSEM algorithm achieves an MSE of $8.86 \times 10^{-3}$ and an MAE of $5.37 \times 10^{-2}$ in predicting the control variable $P_{g}$. Given that the active power output in the IEEE 30-bus system is on the order of $10^{0}$ to $10^{2}$, the HSEM algorithm performs well in fitting $P_{g}$. For $V_{g}$, which has a range of [0.94, 1.06], the HSEM algorithm achieves an MSE of $6.93 \times 10^{-10}$ and an MAE of $1.27 \times 10^{-5}$, outperforming the other algorithms and giving an accurate regression of the target variables.

Additionally, Figure 5 and Figure 6 analyze the Absolute Percentage Error (APE) between the predicted values and the ground truth for the control variables $P_{g}$ and $V_{g}$, showing scatter plots and frequency distributions of the APE. From Figure 5, it can be observed that for most samples the APE of $P_{g}$ falls within the range of 0 to 0.4%, and the frequency distribution shows that the errors are concentrated near 0. Considering the magnitude of the control variable $P_{g}$, the HSEM algorithm provides good predictions for $P_{g}$.

Furthermore, Figure 6 shows that for most test samples the APE of $V_{g}$ is within 0.002%, with the frequency of errors peaking near 0. Therefore, the APE metric also indicates that the HSEM algorithm performs well in predicting the control variables.
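For reference, the APE analysis can be reproduced with a short sketch such as the one below. It assumes the usual definition of absolute percentage error, |predicted - true| / |true| × 100, and skips entries whose true value is zero (for example, a generator dispatched at zero output); both choices, the function names, and the voltage values are assumptions rather than details confirmed by the paper.

```python
def ape_percent(pred, true):
    """Absolute percentage error per element; entries with true == 0 are skipped."""
    return [abs(p - t) / abs(t) * 100.0
            for p, t in zip(pred, true) if t != 0]


def share_within(errors, threshold):
    """Fraction of errors that do not exceed the given threshold."""
    return sum(e <= threshold for e in errors) / len(errors)


# Hypothetical generator-bus voltages in per unit.
true_v = [1.00, 1.05, 0.98]
pred_v = [1.00001, 1.05, 0.98002]
ape_v = ape_percent(pred_v, true_v)
share = share_within(ape_v, 0.002)   # share of samples with APE <= 0.002 %
```

The share of samples below a threshold is the quantity that statements such as "mostly within 0.002%" summarize, and it is less sensitive to a few extreme points than the mean APE.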

In terms of computation time, the proposed HSEM algorithm requires only 4.63 ms per sample, compared to 87.3 ms per sample when using Matpower. This corresponds to a speedup of approximately 18.8 times, showing that HSEM can provide a warm-start point for AC-OPF problems much faster while maintaining its explainability and accuracy.

4.3. IEEE 118-Bus System

The IEEE 118-bus system is one of the IEEE power system test cases, derived from a portion of the power system of the Midwestern United States. It is widely used for large-scale power system research and optimization algorithm verification. This system consists of 186 transmission lines, 54 generator buses, and 64 load buses [1]. Therefore, for the IEEE 118-bus system, the input dimension of HSEM is 128, comprising the active and reactive demand of the 64 load buses. The output dimension of the algorithm is 108, comprising the active power outputs and voltages of the 54 generator buses. Compared with the IEEE 30-bus system, the IEEE 118-bus system has a much higher output dimension (nine times that of the smaller system). The range of the control variables is also larger, making this system a good testbed for evaluating the performance of algorithms on larger-scale test systems.
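The input and output dimensions quoted for both test systems follow from simple counting: two demand values per load bus and two control variables per generator bus. A small sketch of that arithmetic, with a hypothetical helper name, is:

```python
def model_dimensions(n_load_buses, n_generator_buses):
    """Inputs: active and reactive demand per load bus.
    Outputs: active power output and voltage per generator bus."""
    return 2 * n_load_buses, 2 * n_generator_buses


dims_30 = model_dimensions(24, 6)     # IEEE 30-bus system
dims_118 = model_dimensions(64, 54)   # IEEE 118-bus system
output_ratio = dims_118[1] / dims_30[1]
```

The ratio of the output dimensions is the factor of nine mentioned in the text.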

Table 4 describes the performance of the HSEM algorithm and other comparative algorithms in terms of objective function values when handling the IEEE 118-bus system.

From the perspective of feasibility rate, the higher output dimension and wider variable ranges of the 118-bus system mean that the comparative methods do not perform as well as on the smaller system. In particular, CatBoost achieves a feasibility rate of only 6.35%. Even the highest feasibility rate among the comparative algorithms, achieved by LightGBM, is only 95.57%, which still does not meet the requirements for large-scale or real-time prediction of feasible solutions. The HSEM algorithm, however, performs well even on this more complex, high-dimensional test system, achieving a feasibility rate of 99.22%. In contrast, the DNN achieves a feasibility rate of only 45.46%: when dealing with higher-dimensional nonlinear problems, it frequently violates the security constraints. This highlights one of the advantages of the proposed HSEM algorithm.

Regarding the objective function values, among the comparative algorithms, CatBoost shows the best performance based on the average $|κ|$. Even considering marginal cases, CatBoost has the smallest maximum $|κ|$. For the HSEM algorithm, although its maximum $|κ|$ is larger than that of CatBoost, its average $|κ|$ on the test set is lower than that of all other algorithms, reaching $5.44 \times 10^{-5}$%. The HSEM algorithm also shows the best performance on the other two evaluation metrics, MSE and MAE. It is important to note that although CatBoost performs well on many evaluation measures, its low feasibility rate shows that it struggles with the OPF problem. The HSEM algorithm, on the other hand, uses a stacking method that combines solutions from the RT and RF algorithms and refines them through a meta-model. This allows the overall output to meet the security constraints while also improving accuracy.
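The two-level stacking idea described above can be illustrated with a toy sketch. This is not the authors' implementation: a depth-1 regression tree and an average of bootstrapped stumps stand in for the RT and RF base learners, a single blending weight stands in for the meta-model, and the data are synthetic. All function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Depth-1 regression tree on one feature (toy stand-in for RT)."""
    best = None
    for t in sorted(set(xs))[1:]:                 # candidate split points
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = mean(left), mean(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    if best is None:                              # all x identical: predict the mean
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Average of stumps fitted on bootstrap resamples (toy stand-in for RF)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(tree(x) for tree in trees)


def fit_blend_weight(p1, p2, ys):
    """Toy meta-model: alpha in [0, 1] minimizing the squared error of
    alpha * p1 + (1 - alpha) * p2."""
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if den == 0.0:
        return 0.5
    num = sum((y - b) * (a - b) for a, b, y in zip(p1, p2, ys))
    return min(1.0, max(0.0, num / den))


def sse(pred, ys):
    """Sum of squared errors."""
    return sum((p - y) ** 2 for p, y in zip(pred, ys))


# Synthetic 1-D data: the target is a smooth function of a single feature.
rng = random.Random(1)
xs = [rng.uniform(0.0, 1.0) for _ in range(200)]
ys = [x ** 2 + rng.gauss(0.0, 0.01) for x in xs]
train_x, train_y = xs[:120], ys[:120]   # level 0: fit the base learners
hold_x, hold_y = xs[120:], ys[120:]     # level 1: fit the meta-model on held-out data

stump = fit_stump(train_x, train_y)
forest = fit_bagged_stumps(train_x, train_y)
p_stump = [stump(x) for x in hold_x]
p_forest = [forest(x) for x in hold_x]
alpha = fit_blend_weight(p_stump, p_forest, hold_y)
p_stack = [alpha * a + (1 - alpha) * b for a, b in zip(p_stump, p_forest)]
```

Fitting the meta-model on data that the base learners did not see is the usual way to keep the second level from simply rewarding whichever base learner overfits most; on that held-out set, the blended prediction is by construction no worse in squared error than either base learner alone.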

Figure 7 shows the scatter plots of $|κ|$ for the comparative algorithms on the IEEE 118-bus system, while Figure 8 presents the scatter plot and corresponding frequency distribution of $|κ|$ for the HSEM algorithm.

From these figures, it can be seen that $|κ|$ of the HSEM algorithm mainly falls within $2 \times 10^{-4}$%, whereas for CatBoost it is primarily within $2.5 \times 10^{-4}$%. These two algorithms exhibit much smaller objective function errors than the other comparative algorithms. The similarity between the $|κ|$ scatter plots of CatBoost and HSEM is consistent with their close average $|κ|$ values in Table 4. However, as mentioned earlier, CatBoost’s feasibility rate is significantly lower than that of the other comparative algorithms. In essence, the two-layer stacking approach of HSEM corrects the base models’ prediction outputs so that they meet the constraints.

Table 5 presents the regression errors of the control variables for each algorithm on the IEEE 118-bus system. As shown in the table, the HSEM algorithm has the smallest regression errors. For the control variable $P_{g}$, which has a larger magnitude, the HSEM algorithm achieves an MSE of $4.17 \times 10^{-3}$ and an MAE of $3.36 \times 10^{-2}$. For the control variable $V_{g}$, it achieves an MSE of $6.07 \times 10^{-10}$ and an MAE of $1.40 \times 10^{-5}$. Whether measured by MAE or MSE, CatBoost’s errors are closest to those of HSEM, making it the second-best performing algorithm. The errors in the control variables are thus consistent with the errors in the objective function values reported in Table 4.

Figure 9 and Figure 10 depict the APE of the HSEM algorithm for the control variables $P_{g}$ and $V_{g}$. From Figure 9, it can be seen that the sum of the APE for $P_{g}$ mostly falls within 0.1%. Considering that the widest range of generator active power output in the 118-bus system is 0 to 707 (excluding the slack bus) and the narrowest is 0 to 100, the HSEM algorithm fits the generator output control variables very well. For the generator bus voltages, as shown in Figure 10, the APE of the HSEM algorithm mostly falls within 0.0015%. Therefore, the HSEM algorithm also performs well in fitting the voltage control variables.

For the IEEE 118-bus system, the proposed HSEM algorithm requires only 12.7 ms per sample, whereas Matpower takes 297.6 ms per sample. This results in a speedup of approximately 23.4 times, demonstrating that HSEM significantly outperforms Matpower in terms of computational efficiency, while still providing an explainable and accurate warm-start point for AC-OPF problems.

5. Conclusions

This paper introduces a novel Hybrid Stacked Ensemble Method (HSEM) designed for the efficient solution of AC Optimal Power Flow (AC-OPF) problems, providing an explainable warm-start point for real-time power system operation and control. The algorithm was evaluated on both the IEEE 30-bus system and the IEEE 118-bus system. In terms of the average $|κ|$ metric, HSEM achieved an objective function value error of $1.79 \times 10^{-4}$% on the smaller system and $5.44 \times 10^{-5}$% on the larger system. The feasibility rates of the solutions were also high, at 99.96% for the IEEE 30-bus system and 99.22% for the IEEE 118-bus system. In terms of computational speed, HSEM significantly outperformed Matpower, being approximately 18.8 times faster on the IEEE 30-bus system and 23.4 times faster on the IEEE 118-bus system. These results highlight the accuracy, feasibility, and interpretability of the HSEM algorithm, making it a reliable and explainable tool for providing warm-start points in AC-OPF problems.

Future work could involve further optimization of the proposed algorithm. Potential improvements include the use of parallelization techniques to accelerate the model training process. Additionally, testing the algorithm on larger bus systems could provide insights into its scalability and effectiveness. Moreover, the HSEM model could be extended to more complex scenarios, such as those involving changes in topology.

Author Contributions

Methodology, K.X.; writing—original draft, X.Z.; writing—review and editing, L.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China under Grant 52477202.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sharma, A.; Jain, S. Day-ahead optimal reactive power ancillary service procurement under dynamic multi-objective framework in wind integrated deregulated power system. Energy 2021, 223, 120028.
2. Adhvaryyu, P.; Chattopadhyay, P.; Bhattacharya, A. Dynamic optimal power flow of combined heat and power system with Valve-point effect using Krill Herd algorithm. Energy 2017, 127, 756–767.
3. Pourakbari-Kasmaei, M.; Rider, M.; Mantovani, J. An unequivocal normalization-based paradigm to solve dynamic economic and emission active-reactive OPF (optimal power flow). Energy 2014, 73, 554–566.
4. Rahmani, S.; Amjady, N. A new optimal power flow approach for wind energy integrated power systems. Energy 2017, 134, 349–359.
5. Ghasemi, M.; Ghavidel, S.; Akbari, E.; Vahed, A. Solving non-linear, non-smooth and non-convex optimal power flow problems using chaotic invasive weed optimization algorithms based on chaos. Energy 2014, 73, 340–353.
6. Gao, M.; Yu, J.; Yang, Z.; Zhao, J. A physics-guided graph convolution neural network for optimal power flow. IEEE Trans. Power Syst. 2023, 39, 380–390.
7. Jabr, R.; Coonick, A.; Cory, B. A primal-dual interior point method for optimal power flow dispatching. IEEE Trans. Power Syst. 2002, 17, 654–662.
8. Zehar, K.; Sayah, S. Optimal power flow with environmental constraint using a fast successive linear programming algorithm: Application to the Algerian power system. Energy Convers. Manag. 2008, 49, 3362–3366.
9. Momoh, J.; Adapa, R.; El-Hawary, M. A review of selected optimal power flow literature to 1993. I. Nonlinear and quadratic programming approaches. IEEE Trans. Power Syst. 1999, 14, 96–104.
10. Ambriz-Perez, H.; Acha, E.; Fuerte-Esquivel, C. Advanced SVC models for Newton-Raphson load flow and Newton optimal power flow studies. IEEE Trans. Power Syst. 2000, 15, 129–136.
11. Abido, M. Optimal power flow using particle swarm optimization. Int. J. Electr. Power Energy Syst. 2002, 24, 563–571.
12. Nguyen, T. A high performance social spider optimization algorithm for optimal power flow solution with single objective optimization. Energy 2019, 171, 218–240.
13. Hasanien, H.; Alsaleh, I.; Alassaf, A.; Alateeq, A. Enhanced coati optimization algorithm-based optimal power flow including renewable energy uncertainties and electric vehicles. Energy 2023, 283, 129069.
14. Meng, A.; Zeng, C.; Wang, P.; Chen, D.; Zhou, T.; Zheng, X.; Yin, H. A high-performance crisscross search based grey wolf optimizer for solving optimal power flow problem. Energy 2021, 225, 120211.
15. Jamal, R.; Zhang, J.; Men, B.; Khan, N.; Ebeed, M.; Jamal, T.; Mohamed, E. Chaotic-quasi-oppositional-phasor based multi populations gorilla troop optimizer for optimal power flow solution. Energy 2024, 301, 131684.
16. Zhang, Z.; Yang, S.; Jia, Y.; Xiao, J.; Bai, X. Data-Driven RNN Method for Distribution Network Optimal Power Flow Problems. In Proceedings of the 2023 3rd Power System and Green Energy Conference (PSGEC), Shanghai, China, 24–26 August 2023; pp. 382–387.
17. Yang, K.; Gao, W.; Fan, R. Optimal Power Flow Estimation Using One-Dimensional Convolutional Neural Network. In Proceedings of the 2021 North American Power Symposium (NAPS), College Station, TX, USA, 14–16 November 2021; pp. 1–6.
18. Jia, Y.; Bai, X.; Zheng, L.; Weng, Z.; Li, Y. ConvOPF-DOP: A Data-Driven Method for Solving AC-OPF Based on CNN Considering Different Operation Patterns. IEEE Trans. Power Syst. 2023, 38, 853–860.
19. Owerko, D.; Gama, F.; Ribeiro, A. Optimal power flow using graph neural networks. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 5930–5934.
20. Owerko, D.; Gama, F.; Ribeiro, A. Unsupervised Optimal Power Flow Using Graph Neural Networks. In Proceedings of the ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 6885–6889.
21. Deka, D.; Misra, S. Learning for DC-OPF: Classifying active sets using neural nets. In Proceedings of the 2019 IEEE Milan PowerTech, Milan, Italy, 23–27 June 2019; pp. 1–6.
22. Zamzam, A.; Baker, K. Learning Optimal Solutions for Extremely Fast AC Optimal Power Flow. In Proceedings of the 2020 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Tempe, AZ, USA, 11–13 November 2020; pp. 1–6.
23. Li, M.; Wei, W.; Chen, Y.; Ge, M.F.; Catalão, J.P.S. Learning the Optimal Strategy of Power System Operation With Varying Renewable Generations. IEEE Trans. Sustain. Energy 2021, 12, 2293–2305.
24. Baker, K. Learning Warm-Start Points for AC Optimal Power Flow. 2019. Available online: https://arxiv.org/abs/1905.08860 (accessed on 3 December 2024).
25. Cao, Y.; Zhao, H.; Liang, G.; Zhao, J.; Liao, H.; Yang, C. Fast and explainable warm-start point learning for AC Optimal Power Flow using decision tree. Int. J. Electr. Power Energy Syst. 2023, 153, 109369.
26. Chen, M.; Liu, Q.; Chen, S.; Liu, Y.; Zhang, C.H.; Liu, R. XGBoost-based algorithm interpretation and application on post-fault transient stability status prediction of power system. IEEE Access 2019, 7, 13149–13158.
27. Wolpert, D. Stacked generalization. Neural Netw. 1992, 5, 241–259.
28. Loh, W.Y. Classification and regression trees. Wiley Interdiscip. Rev. Data Mining Knowl. Discov. 2011, 1, 14–23.
29. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
30. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794.
31. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3146–3154.
32. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 31, 6639–6649.
33. Zhou, Y.; Lee, W.J.; Diao, R.; Shi, D. Deep Reinforcement Learning Based Real-time AC Optimal Power Flow Considering Uncertainties. J. Mod. Power Syst. Clean Energy 2022, 10, 1098–1109.
34. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
35. Friedman, J. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378.
36. Nobre, J.; Neves, R. Combining Principal Component Analysis, Discrete Wavelet Transform and XGBoost to trade in the financial markets. Expert Syst. Appl. 2019, 125, 181–194.
37. Zhang, K.; Schölkopf, B.; Muandet, K.; Wang, Z. Domain adaptation under target and conditional shift. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 17–19 June 2013; pp. 819–827.
38. Zimmerman, R.; Murillo-Sánchez, C.; Thomas, R. MATPOWER: Steady-State Operations, Planning, and Analysis Tools for Power Systems Research and Education. IEEE Trans. Power Syst. 2011, 26, 12–19.
39. Li, S.; Gong, W.; Wang, L.; Yan, X.; Hu, C. Optimal power flow by means of improved adaptive differential evolution. Energy 2020, 198, 117314.

Figure 1. Level-wise and leaf-wise growth strategies.

Figure 2. Flow chart of HSEM.

Figure 3. Scatter plot of $|κ|$ for comparative algorithms in the 30-bus system.

Figure 4. Scatter plot and frequency distribution of $|κ|$ for the HSEM algorithm in the 30-bus system.

Figure 5. Scatter plot of absolute percentage errors for control variables in the 30-bus system for $P_{g}$.

Figure 6. Scatter plot of absolute percentage errors for control variables in the 30-bus system for $V_{g}$.

Figure 7. Scatter plot of $|κ|$ for comparative algorithms in the 118-bus system.

Figure 8. Scatter plot and frequency distribution of $|κ|$ for the HSEM algorithm in the 118-bus system.

Figure 9. Scatter plot of absolute percentage errors for control variables in the 118-bus system for $P_{g}$.

Figure 10. Scatter plot of absolute percentage errors for control variables in the 118-bus system for $V_{g}$.

Table 1. Dataset information of test systems.

Test System	Training Dataset	Testing Dataset
30-bus system	40,000	10,000
118-bus system	56,000	14,000

Table 2. Performance of different algorithms on the IEEE 30-bus system: objective function values.

Algorithm	Feasibility (%)	Max $|κ|$ (%)	Min $|κ|$ (%)	Average $|κ|$ (%)	MSE	MAE
DNN	58.45	$2.64 \times 10^{-2}$	$2.72 \times 10^{-9}$	$8.61 \times 10^{-4}$	$2.25 \times 10^{-2}$	$7.31 \times 10^{-2}$
RT	100.00	$3.75 \times 10^{-1}$	$1.15 \times 10^{-6}$	$1.99 \times 10^{-2}$	$9.29$	$1.77$
RF	100.00	$1.22 \times 10^{-1}$	$4.50 \times 10^{-7}$	$6.81 \times 10^{-3}$	$1.03$	$0.60$
CatBoost	46.41	$7.42 \times 10^{-3}$	$1.55 \times 10^{-7}$	$2.88 \times 10^{-4}$	$2.06 \times 10^{-3}$	$2.54 \times 10^{-2}$
LightGBM	100.00	$2.58 \times 10^{-2}$	$3.50 \times 10^{-9}$	$1.16 \times 10^{-3}$	$3.47 \times 10^{-2}$	$1.01 \times 10^{-1}$
XGBoost	100.00	$2.93 \times 10^{-2}$	$1.97 \times 10^{-7}$	$9.85 \times 10^{-4}$	$1.97 \times 10^{-2}$	$8.66 \times 10^{-2}$
HSEM	99.96	$1.88 \times 10^{-2}$	$1.31 \times 10^{-9}$	$1.79 \times 10^{-4}$	$9.29 \times 10^{-4}$	$1.59 \times 10^{-2}$

Table 3. Performance of different algorithms on the IEEE 30-bus system: control variables.

Algorithm	MSE for $P_{g}$	MSE for $V_{g}$	MAE for $P_{g}$	MAE for $V_{g}$
DNN	$3.37 \times 10^{-2}$	$1.51 \times 10^{-8}$	$1.01 \times 10^{-1}$	$9.21 \times 10^{-5}$
RT	$1.76$	$4.07 \times 10^{-7}$	$7.90 \times 10^{-1}$	$2.91 \times 10^{-4}$
RF	$5.98 \times 10^{-1}$	$1.80 \times 10^{-7}$	$4.65 \times 10^{-1}$	$1.93 \times 10^{-4}$
CatBoost	$3.50 \times 10^{-2}$	$1.42 \times 10^{-9}$	$1.01 \times 10^{-1}$	$1.89 \times 10^{-5}$
LightGBM	$1.23 \times 10^{-1}$	$1.05 \times 10^{-8}$	$2.04 \times 10^{-1}$	$5.11 \times 10^{-5}$
XGBoost	$1.50 \times 10^{-1}$	$9.22 \times 10^{-9}$	$2.32 \times 10^{-1}$	$4.76 \times 10^{-5}$
HSEM	$8.86 \times 10^{-3}$	$6.93 \times 10^{-10}$	$5.37 \times 10^{-2}$	$1.27 \times 10^{-5}$

Table 4. Performance of different algorithms on the IEEE 118-bus system: objective function values.

Algorithm	Feasibility (%)	Max $|κ|$ (%)	Min $|κ|$ (%)	Average $|κ|$ (%)	MSE	MAE
DNN	45.46	$2.80 \times 10^{-2}$	$2.48 \times 10^{-6}$	$3.39 \times 10^{-3}$	$2.97 \times 10^{1}$	$4.35$
RT	52.66	$8.55 \times 10^{-2}$	$5.69 \times 10^{-6}$	$5.38 \times 10^{-2}$	$1.35 \times 10^{4}$	$7.00 \times 10^{1}$
RF	66.19	$3.54 \times 10^{-1}$	$1.30 \times 10^{-6}$	$2.42 \times 10^{-2}$	$2.65 \times 10^{3}$	$3.15 \times 10^{1}$
CatBoost	6.35	$8.75 \times 10^{-4}$	$9.00 \times 10^{-8}$	$7.17 \times 10^{-5}$	$1.86 \times 10^{-2}$	$9.33 \times 10^{-2}$
LightGBM	95.57	$6.43 \times 10^{-2}$	$3.19 \times 10^{-8}$	$3.45 \times 10^{-3}$	$6.04 \times 10^{1}$	$4.47$
XGBoost	95.03	$1.15 \times 10^{-2}$	$3.52 \times 10^{-8}$	$8.78 \times 10^{-4}$	$3.17$	$1.14$
HSEM	99.22	$3.03 \times 10^{-3}$	$7.02 \times 10^{-9}$	$5.44 \times 10^{-5}$	$1.34 \times 10^{-2}$	$7.05 \times 10^{-2}$

Table 5. Performance of different algorithms on the IEEE 118-bus system: control variables.

Algorithm	MSE for $P_{g}$	MSE for $V_{g}$	MAE for $P_{g}$	MAE for $V_{g}$
DNN	$6.51 \times 10^{-1}$	$4.17 \times 10^{-7}$	$4.56 \times 10^{-1}$	$4.67 \times 10^{-4}$
RT	$5.02$	$8.23 \times 10^{-7}$	$1.22$	$4.96 \times 10^{-4}$
RF	$2.23$	$3.95 \times 10^{-7}$	$8.22 \times 10^{-1}$	$3.46 \times 10^{-4}$
CatBoost	$1.09 \times 10^{-2}$	$9.87 \times 10^{-10}$	$5.19 \times 10^{-2}$	$1.85 \times 10^{-5}$
LightGBM	$2.01 \times 10^{-1}$	$1.53 \times 10^{-8}$	$2.54 \times 10^{-1}$	$7.55 \times 10^{-5}$
XGBoost	$1.62 \times 10^{-1}$	$1.37 \times 10^{-8}$	$2.28 \times 10^{-1}$	$7.02 \times 10^{-5}$
HSEM	$4.17 \times 10^{-3}$	$6.07 \times 10^{-10}$	$3.36 \times 10^{-2}$	$1.40 \times 10^{-5}$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
