Article

Research on Energy Scheduling Optimization Strategy with Compressed Air Energy Storage

by Rui Wang 1, Zhanqiang Zhang 1,*, Keqilao Meng 2, Pengbing Lei 3, Kuo Wang 1, Wenlu Yang 1, Yong Liu 4 and Zhihua Lin 5,*

1 College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, China
2 College of New Energy, Inner Mongolia University of Technology, Hohhot 010080, China
3 POWERCHINA Hebei Electric Power Engineering Co., Ltd., Shijiazhuang 050031, China
4 Shandong Energy Group Electric Power Group Co., Ltd., Jinan 250014, China
5 Science and Technology Research Institute of China Three Gorges Corporation, Beijing 101100, China
* Authors to whom correspondence should be addressed.
Sustainability 2024, 16(18), 8008; https://doi.org/10.3390/su16188008
Submission received: 31 July 2024 / Revised: 11 September 2024 / Accepted: 11 September 2024 / Published: 13 September 2024
(This article belongs to the Section Energy Sustainability)

Abstract: Due to the volatility and intermittency of renewable energy, the integration of a large amount of renewable energy into the grid can have a significant impact on its stability and security. In this paper, we propose a tiered dispatching strategy for compressed air energy storage (CAES) and utilize it to balance the power output of wind farms, achieving the intelligent dispatching of the source–storage–grid system. The Markov decision process framework is used to describe the energy dispatching problem of CAES through the Actor–Critic (AC) algorithm. To address the stability and low sampling efficiency issues of the AC algorithm in continuous action spaces, we employ the deep deterministic policy gradient (DDPG) algorithm, a model-free deep reinforcement learning algorithm based on a deterministic policy. Furthermore, improving DDPG with Neuroevolution of Augmenting Topologies (NEAT) enhances the adaptability of the algorithm in complex environments and improves its performance. The results show that the scheduling accuracy of the DDPG-NEAT algorithm reached 91.97%, which is 15.43% and 31.5% higher than that of the SAC and DDPG algorithms, respectively. The algorithm exhibits excellent performance and stability in CAES energy dispatching.

1. Introduction

In recent years, the growth of the national population and the continuous development of the economy have caused energy consumption to surge [1]. The extensive use of fossil fuels by humans has unavoidably affected the environment, emitting a large amount of greenhouse gases and causing the greenhouse effect. Opting for renewable energy over fossil fuels can lead to cleaner and more sustainable energy [2]. The large-scale development of renewable energy gradually improves absorption and regulation capabilities for a high proportion of renewable sources, ultimately leading to the creation of a new type of power system predominantly based on renewable energy [3].
However, due to the inherent intermittency and instability of wind and photovoltaic power, integrating these energy sources into the grid faces significant challenges [4]. Energy storage technology is an effective and affordable solution to this issue. By storing energy during off-peak times and providing power during peak hours, it minimizes load fluctuations, creates grid space for renewables, enhances consumption efficiency, and ensures power system stability [5]. CAES is a promising technology with several benefits: it can store a large amount of energy, is affordable, has a long lifespan, and is environmentally friendly [6,7]. When CAES systems are applied to distributed energy systems, the coordinated scheduling of the system faces problems such as energy wastage due to the inherent characteristics of renewable energy sources. Thus, the reliable and cost-effective scheduling of compressed air energy storage devices and controllable equipment is a major concern [8]. It follows that the energy scheduling strategy plays a crucial role in the efficient use of energy [9].
In recent times, many scholars have conducted a series of studies on energy scheduling strategies. At present, energy optimization management problems can be classified into traditional optimization strategies and strategies based on deep reinforcement learning (DRL) [10]. Traditional optimization methods comprise mathematical programming and dynamic programming. Traditional optimization methods, like linear programming [11], dynamic programming [12], and integer programming [13], have been commonly utilized by scholars to analyze energy management problems. Li et al. [14] analyzed the system temperature and created a model for scheduling a tri-generation Advanced Adiabatic Compressed Air Energy Storage (AA-CAES) system. They proposed an improved scheduling strategy for tri-generation microgrids by combining binary technology and piecewise linearization. Reference [12] tackles the optimization scheduling problem of integrated energy systems in campuses. Since this optimization scheduling model is essentially a multi-period non-convex nonlinear programming problem, the researchers proposed an improved approximate dynamic programming algorithm to enhance the solution efficiency. Reference [15] focuses on multi-energy microgrids based on cogeneration. On the premise of improving energy utilization, it proposed a multi-objective mixed-integer optimization planning method that considers various demand-side thermal–electric coordinated responses. For instance, reference [16] established a cogeneration scheduling model containing AA-CAES based on mixed linear programming, significantly reducing wind curtailment and operating costs.
Although the aforementioned traditional optimization algorithms have the advantages of being simple, stable, and reliable, capable of completing simple scheduling optimization tasks, they are unable to address complex power system problems, such as multi-objective optimization, uncertainty issues, etc. [17]. When implementing energy management strategies for CAES, due to the multiple constraints and objective functions involved in energy management, there is a need for more advanced heuristic algorithms for optimization [18]. Some examples of these algorithms are particle swarm optimization [19], Genetic Algorithms [20], Sparrow Search Algorithm [21], etc. Ref. [22] proposed a dual-layer optimization scheduling method based on an enhanced Firework Algorithm. This method uses the Firework Algorithm to solve complex problems, considering reliability and economy. Meanwhile, a scheduling model for integrated energy systems, which considers uncertainty and hybrid energy storage devices, uses a combination of Fuzzy C-Means and an improved particle swarm optimization algorithm to tackle the system’s uncertainties [23]. On the other hand, regarding the optimization configuration of energy storage, ref. [24] solved the optimization configuration problem of the energy storage system through a method based on an improved multi-objective particle swarm optimization algorithm.
When it comes to practical applications, it is important to choose the right optimization method and algorithm. The heuristic algorithms mentioned earlier can help improve energy management efficiency and accuracy to some extent [25]. However, as the big data characteristics of power systems become clearer, traditional solving algorithms face restrictions in scope, which result in challenges such as high computational loads, biased model predictions, reduced accuracy, and slower speeds [26]. At the same time, with the rapid development of artificial intelligence technology, theoretical methods such as deep learning and reinforcement learning have shown significant advantages in data analysis, prediction, and classification, especially in processing massive data [27,28]. Currently, researchers and practitioners are applying DRL algorithms to energy storage scheduling, optimization strategies, operational control, and energy management. Reference [29] proposes a collaborative energy management model for the characteristics of wind and solar energy. The Q-learning algorithm was then used to solve the peak-control energy management optimization problem in coastal residential areas, achieving effective results in reducing load fluctuations, improving system economy, and increasing system self-sufficiency. To solve the economic security scheduling problem caused by a high proportion of renewable energy connected to the power grid, a proximal policy optimization algorithm based on a Kullback–Leibler divergence penalty factor and an importance sampling technique was proposed, and its adaptability and effectiveness were verified by simulation [30]. In [31], aiming at the problem of insufficient peaking capacity of the power system after the grid connection of large-scale renewable energy sources such as wind power and photovoltaics, a two-layer optimized operation strategy for a combined new energy and pumped storage system, based on linearly decreasing inertia weight particle swarm optimization and sequential quadratic programming, was proposed. The simulation results showed that the strategy could effectively reduce the impact of wind energy peaking characteristics on the power grid and improve the utilization rate of renewable energy. Reference [32] proposed an adaptive active power rolling dispatch strategy based on distributed deep reinforcement learning to cope with the uncertainty of a high proportion of renewable energy sources. The experimental results showed that the proposed algorithm could help multiple agents learn effective active power control strategies. To address the risks of CAES operating in concert with solar power systems, a deep reinforcement learning approach was proposed for optimizing CAES energy arbitrage within a forecasting model [33].
Addressing the issues mentioned above, this paper proposes a tiered dispatch strategy for CAES and an improvement in the neural network structure within the DDPG algorithm. In the proposed algorithm, a neural network topology evolves to adapt to the structure suited for energy storage scheduling problems. By using this new network structure, we improve the training outcomes. Finally, through case analysis using real measured data, we show that the algorithm effectively improves the accuracy of scheduling and energy utilization efficiency. The following are the main contributions of this work:
  • A hierarchical scheduling model for CAES systems is constructed and transformed into a Markov decision process. The coordinated scheduling problem of wind farms and energy storage is balanced using the DRL algorithm.
  • In order to achieve the efficient learning of agents, a deterministic policy gradient-based DDPG algorithm is used. The algorithm effectively improves the learning ability of the agent in a continuous action space, allowing it to adapt to the complex environment of the power system.
  • This paper introduces a combined algorithm that merges the NEAT algorithm with the DDPG algorithm to enhance the effectiveness of the algorithm. By utilizing the adaptive network structure of NEAT, the combined approach improves adaptability in complex environments and efficiency in renewable energy utilization.
The remainder of this paper is organized as follows. Section 2 presents the integrated energy framework. Section 3 describes the source–storage–grid cooperative control model. Section 4 introduces the deep reinforcement learning algorithms, including the AC, DDPG, and NEAT algorithms. Section 5 illustrates the case studies and results.

2. Integrated Energy Framework

The AA-CAES integrated energy system is represented by the framework illustrated in Figure 1. The control framework of the system uses a hierarchical strategy to coordinate and optimize energy. Based on these instructions from the control center, the wind farm operates in conjunction with AA-CAES. The control strategy builds on DRL, an approach that utilizes models and data to maximize rewards through dynamic interactions between the system and its environment.

2.1. AA-CAES Structure

The main components of an AA-CAES system include compressors, expanders, air storage chambers, heat exchangers, and a temperature storage device, as shown in Figure 2. During the energy storage phase, the AA-CAES system harnesses electricity generated from renewable sources to drive compressors that compress air into a high-pressure, high-temperature gas. This gas then enters a heat exchanger to undergo thermal moderation with a heat transfer medium, resulting in a cooled high-pressure gas that is stored within air storage chambers; simultaneously, the heat medium, now carrying the absorbed heat, is directed into thermal storage tanks to store the thermal energy. During energy release, the high-pressure, cooled gas from the storage chambers is released and enters a heat exchanger to conduct thermal exchange with heat released from the thermal storage tanks, creating high-pressure, high-temperature gas that powers expanders to generate electricity; the expanded air is subsequently vented into the atmosphere.
In this paper, the operational characteristics of AA-CAES are taken into account to establish appropriate operational constraints, but during the scheduling process it is treated as a large-scale energy storage device without a detailed examination of its internal working process.

2.2. Hierarchical Energy Optimization Strategy

The system control framework uses a tiered strategy to coordinate and control energy. It is divided into a control layer and a physical layer, integrating information, power, and control flows. Figure 3 illustrates the tiered structure of the source–storage–grid system. The control layer includes a control center that sends instructions to the wind farms and energy storage units. The wind farms adjust their power output, and the energy storage units adjust their charging and discharging based on these commands. The physical layer connects the wind farms, energy storage, and the grid at the AA-CAES station and manages the energy flow between the source, storage, and grid.

2.3. DRL Description

Due to the inherent randomness and uncertainty of wind energy, it is challenging to establish a specific and accurate mathematical model for the source–storage–grid optimization decision process. As a result, we achieve a tiered scheduling optimization strategy by utilizing a model-free, data-driven deep reinforcement learning method. We transform the energy scheduling issue of AA-CAES into a Markov decision process (MDP) and gather relevant data from various parts of the system in a data-driven manner. The state space consists of the states of each model within the source–storage–grid system. We derive the optimal scheduling strategy through the iterative process of states and actions, guided by the reward values from the interactions between the agent and the environment. Figure 4 illustrates the interaction between the agent and its environment.

3. Cooperative Control Framework of Source–Storage–Grid System

The control center enhances grid security by coordinating wind farms and the charging/discharging of AA-CAES through an integrated source–storage–grid model. AA-CAES undertakes peak shaving and valley filling, ensuring the stable operation of the power system and supplying the grid with reserve energy.

3.1. Wind Farm Model

The output power variability of wind turbine units can impact the stability of the electrical grid during grid connection. A comprehensive analysis requires creating a wind farm model to study its impact on the electrical system. The active power output of wind farms operates within the following constraints:
$P_{\min} \le P_t^W \le P_{\max}$ (1)
where $P_{\min}$ and $P_{\max}$ denote the minimum and maximum active power output of the wind farm, respectively.

3.2. AA-CAES Model

The fluctuations in the active power output of wind farms must comply with the power system’s requirements for safe and stable operation. Simultaneously, AA-CAES can eliminate the effects of active power fluctuations on the power system. AA-CAES helps the grid meet its peak and valley demands by changing the state of charge and discharge. AA-CAES operates in three modes: storing energy, releasing energy, and being idle. We make assumptions in modeling AA-CAES to simplify the computation, as follows:
  • It is assumed that air is an ideal gas and satisfies the ideal gas equation of state;
  • The reservoir is modeled using an isothermal constant-volume model, where the temperature of the air in the reservoir is equal to the ambient temperature and the volume of the reservoir is assumed to be constant;
  • The compressor and expander are modeled adiabatically;
  • Heat loss from the heat storage tank and heat loss from the heat exchange process are excluded.
Energy storage process: The process involves a compressor pressurizing air into high-pressure storage units and a heat exchanger storing the generated heat in thermal storage tanks. Based on the assumptions made, the instantaneous compression power $P_{charge,t}$ of AA-CAES during the energy storage process can be expressed as follows:
$P_{charge,t} = \frac{1}{\eta_m} \sum_{k=1}^{n_c} \frac{1}{\eta_{c,k}} q_{c,t} C T_{c,k} \left( \pi_{c,k}^{\frac{\lambda - 1}{\lambda}} - 1 \right), \quad t \in \{1, 2, \ldots, T\}$ (2)
where $T$ is a complete scheduling cycle; $n_c$ is the number of compressor stages; $q_{c,t}$ is the compressor inlet air mass flow rate at scheduling time $t$; $C$ is the constant-pressure specific heat capacity of air; $T_{c,k}$ is the inlet air temperature of the stage-$k$ compressor; $\pi_{c,k}$ is the pressure ratio of the stage-$k$ compressor; $\lambda$ is the polytropic index; $\eta_{c,k}$ is the isentropic efficiency of the stage-$k$ compressor; and $\eta_m$ is the efficiency of the electric motor.
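For illustration, the following Python sketch evaluates Equation (2) for a multi-stage compressor train. The numerical values (flow rate, stage temperatures, pressure ratios, efficiencies) are placeholder assumptions chosen for the example and are not taken from the paper.

```python
import numpy as np

def compression_power(q_c, T_c_in, pi_c, eta_c, eta_m=0.95, C_p=1.005, lam=1.4):
    """Instantaneous compression power of a multi-stage AA-CAES train, Equation (2).

    q_c    : inlet air mass flow rate (kg/s)
    T_c_in : per-stage inlet temperatures (K), length n_c
    pi_c   : per-stage pressure ratios, length n_c
    eta_c  : per-stage isentropic efficiencies, length n_c
    eta_m  : motor efficiency
    C_p    : constant-pressure specific heat capacity of air (kJ/(kg*K))
    lam    : polytropic index of air
    Returns the electrical power drawn, in kW.
    """
    T_c_in, pi_c, eta_c = map(np.asarray, (T_c_in, pi_c, eta_c))
    # Per-stage mechanical power, Equation (2) without the motor term
    stage_power = (1.0 / eta_c) * q_c * C_p * T_c_in * (pi_c ** ((lam - 1.0) / lam) - 1.0)
    return stage_power.sum() / eta_m

# Example: a 3-stage compressor with identical stages (illustrative numbers only)
P_charge = compression_power(q_c=50.0, T_c_in=[293.0] * 3, pi_c=[4.0] * 3, eta_c=[0.85] * 3)
```

A corresponding function for the expansion power of Equation (3) would multiply by $\eta_{d,k}$ and $\eta_g$ instead of dividing, and use the bracket $1 - \pi_{d,k}^{(1-\lambda)/\lambda}$.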
Energy release process: The AA-CAES energy release process consists of generating electrical energy through the turbine expansion of high-pressure air that has been reheated by the thermal energy recovered from the thermal storage tanks. Following the same principles as before, the expansion power $P_{discharge,t}$ at any given moment during the electricity generation process can be derived as follows:
$P_{discharge,t} = \eta_g \sum_{k=1}^{n_d} \eta_{d,k} q_{d,t} C T_{d,k} \left( 1 - \pi_{d,k}^{\frac{1 - \lambda}{\lambda}} \right), \quad t \in \{1, 2, \ldots, T\}$ (3)
where $n_d$ is the number of expander stages; $q_{d,t}$ is the expander inlet air mass flow rate at scheduling time $t$; $T_{d,k}$ is the inlet air temperature of the stage-$k$ expander; $\pi_{d,k}$ is the pressure ratio of the stage-$k$ expander; $\eta_{d,k}$ is the isentropic efficiency of the stage-$k$ expander; and $\eta_g$ is the efficiency of the generator. AA-CAES also needs to satisfy constraints on charging and discharging power:
$P_{\min}^C \le P_t^C \le P_{\max}^C$ (4)
where $P_t^C$ is the charging/discharging power at time $t$, and $P_{\min}^C$ and $P_{\max}^C$ are the minimum and maximum charging/discharging power, respectively. The capacity constraints are as follows:
$E_{\min} \le E_t^c \le E_{\max}$ (5)
where $E_t^c$ is the stored energy of the CAES at time $t$, and $E_{\min}$ and $E_{\max}$ are the minimum and maximum capacity, respectively. CAES can store residual power, and its electrical energy storage state $SOC_t$ can be defined as follows:
$SOC_t = \frac{E_t^c}{E_{\max}} \times 100\%$ (6)
To avoid overcharging and discharging, it is also necessary to add constraints on the state of charge:
$SOC_{\min} \le SOC_t \le SOC_{\max}$ (7)
where $SOC_{\min}$ and $SOC_{\max}$ are the minimum and maximum state of charge of the CAES, respectively.
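A minimal sketch of how Equations (4)-(7) could be enforced during scheduling is given below. The sign convention (positive power for charging) and the limit values in the example dictionary are assumptions made for illustration, not parameters reported in the paper.

```python
def soc(E_t, E_max):
    """State of charge, Equation (6): SOC_t = E_t / E_max * 100%."""
    return E_t / E_max * 100.0

def step_storage(E_t, P_t, dt_h, limits):
    """Advance the stored energy by one step while enforcing Equations (4), (5) and (7).

    E_t    : stored energy at time t (MWh)
    P_t    : requested charge (>0) / discharge (<0) power (MW)
    dt_h   : step length in hours (10 min = 1/6 h in the case study)
    limits : dict with P_min, P_max, E_min, E_max, SOC_min, SOC_max
             (the numbers used below are assumptions, not values from the paper)
    """
    # Power constraint, Equation (4)
    P = min(max(P_t, limits["P_min"]), limits["P_max"])
    # Capacity constraint, Equation (5), tightened by the SOC limits of Equation (7)
    E_lo = max(limits["E_min"], limits["SOC_min"] / 100.0 * limits["E_max"])
    E_hi = min(limits["E_max"], limits["SOC_max"] / 100.0 * limits["E_max"])
    E_next = min(max(E_t + P * dt_h, E_lo), E_hi)
    # Power actually absorbed (or released) after clipping
    return E_next, (E_next - E_t) / dt_h

limits = {"P_min": -10.0, "P_max": 10.0, "E_min": 0.0, "E_max": 40.0,
          "SOC_min": 20.0, "SOC_max": 90.0}
E_next, P_applied = step_storage(E_t=24.0, P_t=3.0, dt_h=1/6, limits=limits)
print(soc(E_next, limits["E_max"]))   # SOC after one 10 min charging step
```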

3.3. Energy Scheduling Model

The energy scheduling model comprises the electrical grid, wind farms, and AA-CAES. The model works by analyzing the wind farm output and grid electricity needs. It then sends commands to AA-CAES to control its operations and balance out fluctuations in wind power. The power required for grid dispatch is shown in Equation (8).
$P_{\min} \le P_t^{ref} \le P_{\max}$ (8)
where $P_t^{ref}$ is the grid dispatch active power reference value, and $P_{\min}$ and $P_{\max}$ are the minimum and maximum values of the grid dispatching power. The imbalance between the grid dispatching power and the wind farm output power determines the operation of AA-CAES:
$\Delta P = P_t^{ref} - P_t^W$ (9)
Here, $\Delta P > 0$ indicates that AA-CAES is discharging, $\Delta P < 0$ indicates that AA-CAES is charging, and $\Delta P = 0$ represents AA-CAES in idle mode.
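The mode selection implied by Equation (9) can be written compactly as follows; the optional dead band is an added assumption for illustration, not part of the paper's model.

```python
def storage_mode(P_ref, P_wind, dead_band=0.0):
    """Operating mode of AA-CAES from the dispatch imbalance of Equation (9).

    delta_P > 0: the wind farm falls short of the dispatch command -> discharge
    delta_P < 0: the wind farm exceeds the dispatch command        -> charge
    delta_P = 0: idle
    """
    delta_P = P_ref - P_wind
    if delta_P > dead_band:
        return "discharge", delta_P
    if delta_P < -dead_band:
        return "charge", delta_P
    return "idle", 0.0

mode, imbalance = storage_mode(P_ref=5.0, P_wind=4.0)   # 1 MW shortfall -> discharge
```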

3.4. Markov Model

The aim of DRL is to make optimal decisions by analyzing system behaviors, with the goal of enhancing system performance. The learning process for the agent is realized through an MDP; at each time step, the agent performs an action based on the current environmental state and then receives a reward and information about the subsequent state. To find the optimal strategy, the agent uses state information from previous environments to assess the quality of its actions. The energy scheduling issue can be described as an MDP, which consists of a state space $s$, an action space $a$, and a reward function $r$.

3.4.1. State Space $s_t$

The state space consists of three components. This model conforms to the source–storage–grid model and its constraints. When the dispatching power $P_t^{ref}$ is greater than the active power of the wind farm $P_t^W$, AA-CAES discharges; conversely, AA-CAES is in a charging state. $P_t^C$ represents the charging/discharging power of AA-CAES. The system's state space can be described as follows:
$s_t = \left[ P_t^{ref}, P_t^W, P_t^C \right]$ (10)

3.4.2. Action Space $a_t$

The agent interacts with the environment, generating actions based on the state of the environment and energy scheduling strategies. Based on the system state $s_t$ at time $t$, the charging and discharging actions of AA-CAES are influenced by the grid dispatching state and the output of the wind farm. The operations within the system are described as follows:
$a_t = P_t^C$, (11)
$a_t \in \left[ P_{\min}^C, P_{\max}^C \right]$ (12)

3.4.3. Reward $r_t$

The goal of this paper is to design a reward function that focuses on lowering operational expenses. Therefore, the reward function at time $t$ is defined as the opposite of the operational costs of the system. The reward function $r_t$ is defined as follows:
$r_t = -\left( P_t^C C_L \Delta t + \sum_{i=1}^{n} C_{t,i} \right)$ (13)
where $C_{t,i}$ denotes the penalty value of item $i$ at time $t$, and $C_L$ denotes the electricity price of the grid.
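To make the MDP of Section 3.4 concrete, the following sketch wraps the state of Equation (10), the action bounds of Equations (11) and (12), and the reward of Equation (13) into a minimal environment class. The price, the penalty term, the profiles, and the sign convention (positive action = discharge) are placeholders, since the paper does not specify them.

```python
import numpy as np

class CAESDispatchEnv:
    """Minimal sketch of the MDP of Section 3.4 (state, action, reward)."""

    def __init__(self, P_ref, P_wind, price=0.5, dt_h=1/6, P_max_C=10.0):
        self.P_ref, self.P_wind = np.asarray(P_ref), np.asarray(P_wind)
        self.price, self.dt_h, self.P_max_C = price, dt_h, P_max_C
        self.t = 0
        self.P_C = 0.0

    def state(self):
        # Equation (10): s_t = [P_t^ref, P_t^W, P_t^C]
        return np.array([self.P_ref[self.t], self.P_wind[self.t], self.P_C])

    def step(self, a_t):
        # Equations (11)-(12): the action is the storage power, bounded by its limits
        # (a_t > 0: discharge to the grid, a_t < 0: charge from surplus wind -- assumed convention)
        self.P_C = float(np.clip(a_t, -self.P_max_C, self.P_max_C))
        # Penalty for failing to cover the dispatch imbalance (illustrative choice of C_t,i)
        imbalance = (self.P_ref[self.t] - self.P_wind[self.t]) - self.P_C
        penalty = abs(imbalance)
        # Equation (13): reward = -(operating cost + penalties)
        r_t = -(abs(self.P_C) * self.price * self.dt_h + penalty)
        self.t += 1
        done = self.t >= len(self.P_ref) - 1
        return self.state(), r_t, done

env = CAESDispatchEnv(P_ref=[5.0, 6.0, 4.5, 5.5], P_wind=[4.0, 6.5, 4.0, 5.0])
s0 = env.state()
s1, r1, done = env.step(a_t=1.0)   # discharge 1 MW to cover the 1 MW shortfall
```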

4. Deep Reinforcement Learning Algorithms

Model-free methods in DRL can be divided into value-based learning and policy-based learning. Value-based learning usually employs neural networks to approximate the optimal action-value function (Q-function), while policy-based learning utilizes neural networks to train a policy function, obtaining a probability distribution over actions to maximize returns. Therefore, the AC method, which combines policy-based and value-based approaches, is utilized. To address the slow convergence of the algorithm's networks and its insufficient exploration of the environment, the stochastic policy gradient algorithm is replaced with the deep deterministic policy gradient (DDPG) algorithm. Furthermore, the DDPG algorithm incorporates a genetically evolving neural network topology approach (NEAT) to enhance the neural network, resulting in improved adaptability, increased learning efficiency, and enhanced scheduling accuracy.

4.1. Actor–Critic Algorithm

The AC model has two parts: the actor and the critic. The actor makes decisions, and the critic evaluates whether those decisions are right. The actor and critic interact and influence each other. The actor receives the state $s_t$ from the environment, generates an action $a_t$, and provides feedback to both the critic and the environment. Meanwhile, the critic assesses the action $a_t$ taken in state $s_t$. Finally, the critic updates the value and policy networks based on the Temporal Difference error (TD-error). The value network and the policy network are approximated by two neural networks. The state value function is defined as follows:
$V_\pi(s) = \sum_a \pi(a \mid s) \, Q_\pi(s, a)$ (14)
Within the state value function, $\pi(a \mid s)$ is the policy function that calculates action probabilities and controls the actions of the agent, while $Q_\pi(s, a)$ is the action-value function used to assess the quality of actions. Since $\pi(a \mid s)$ and $Q_\pi(s, a)$ are unknown, they are approximated by two neural networks learned through the AC method. In the policy network, a neural network denoted as $\pi(a \mid s; \theta)$ is used to approximate $\pi(a \mid s)$ and, in the value network, $q(s, a; w)$ is employed to approximate $Q_\pi(s, a)$, with the state value function approximated as $V(s; \theta, w)$, so Equation (14) can be expressed as
$V(s; \theta, w) = \sum_a \pi(a \mid s; \theta) \, q(s, a; w)$ (15)
where $w$ and $\theta$ are the parameters of the value and policy networks, respectively. The parameter $w$ of the value network is updated using Temporal Difference (TD) learning. The TD target combines the reward at time $t$ with the value estimate $q(s_{t+1}, a_{t+1}; w_t)$ at time $t+1$, as shown in the following equation:
$y_t = r_t + \gamma \, q(s_{t+1}, a_{t+1}; w_t)$ (16)
The loss function is the squared difference between the predicted value and the TD target, as shown in the following equation:
$L(w) = \frac{1}{2} \left[ q(s_t, a_t; w_t) - y_t \right]^2$ (17)
The parameter $w$ is updated using gradient descent to reduce the loss function $L$:
$w_{t+1} = w_t - \alpha \left. \frac{\partial L}{\partial w} \right|_{w = w_t}$ (18)
The policy network of the actor updates its parameter $\theta$ along the policy gradient in a similar manner, as shown in the following equation:
$\theta_{t+1} = \theta_t + \beta \, q(s_t, a_t; w_t) \, \nabla_\theta \pi(a_t \mid s_t; \theta_t)$ (19)
The value of $V(s; \theta, w)$ is increased through gradient ascent, improving the actions of the actor, which leads to higher evaluations from the critic. The advantage of the AC algorithm is its superior performance in high-dimensional and continuous action spaces. However, the AC algorithm often converges to local optima and is highly sensitive to hyper-parameter tuning.
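The update rules of Equations (16)-(19) can be sketched in PyTorch as follows. This is a toy discrete-action illustration only; the network sizes, learning rates, and discrete action set are assumptions, and the continuous-action case is handled by DDPG in the next subsection.

```python
import torch
import torch.nn as nn

# Toy discrete-action illustration of Eqs. (16)-(19); sizes and learning rates are assumptions.
state_dim, n_actions, gamma = 3, 11, 0.99
actor  = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions), nn.Softmax(dim=-1))
critic = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))  # q(s, .; w)
opt_actor  = torch.optim.Adam(actor.parameters(),  lr=1e-3)   # step size beta in Eq. (19)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)   # step size alpha in Eq. (18)

def ac_update(s, a, r, s_next, a_next):
    """One Actor-Critic update.
    s, s_next: (batch, state_dim) float tensors; a, a_next: (batch, 1) long tensors; r: (batch, 1)."""
    q_sa = critic(s).gather(-1, a)                                     # q(s_t, a_t; w_t)
    with torch.no_grad():
        y = r + gamma * critic(s_next).gather(-1, a_next)              # TD target, Eq. (16)
    critic_loss = 0.5 * (q_sa - y).pow(2).mean()                       # Eq. (17)
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()  # gradient descent, Eq. (18)

    log_pi = torch.log(actor(s).gather(-1, a) + 1e-8)                  # log pi(a_t | s_t; theta_t)
    actor_loss = -(q_sa.detach() * log_pi).mean()                      # ascent direction of Eq. (19)
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
```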

4.2. Deep Deterministic Policy Gradient

The AC algorithm faces challenges when dealing with continuous action spaces, such as difficulties in sampling action probabilities, unstable training, and insufficient exploration strategies. The DDPG algorithm addresses these challenges by incorporating a deterministic policy, experience replay, target networks, and parameter noise. These enhancements improve its stability, training efficiency, and performance in complex environments, making it well suited to reinforcement learning tasks in continuous action spaces. The main idea of the DDPG algorithm is to compute the gradient of the policy function $\pi(s)$ and use this gradient to update the policy. The key is to utilize the policy gradient theorem to compute the gradient of a deterministic policy. For a deterministic policy $\pi$, the gradient can be expressed as follows:
$\nabla_\theta J(\theta) = \mathbb{E}_{s \sim \rho^\pi(s)} \left[ \nabla_\theta \pi(s; \theta) \cdot \nabla_a Q^\pi(s, a) \big|_{a = \pi(s; \theta)} \right]$ (20)
where $J(\theta)$ is the expected cumulative reward to be maximized, $\rho^\pi(s)$ is the distribution of states under the policy $\pi$, and $Q^\pi(s, a)$ is the action-value function for state $s$ and action $a$ under the policy $\pi$. DDPG approximates the policy and value functions through neural networks.
The DDPG algorithm mainly consists of a policy network for generating actions, a value network for evaluating the value of those actions, and target networks for stabilizing the training process. The policy network $\mu(s \mid \theta^\mu)$ generates actions given the state $s$ and is updated by maximizing the expected cumulative reward with the objective function:
$J_a(\theta^\mu) = \mathbb{E} \left[ Q\left(s, \mu(s \mid \theta^\mu) \mid \theta^Q \right) \right]$ (21)
The value network $Q(s, a \mid \theta^Q)$ evaluates the value of the action. It is updated by minimizing the mean square error:
$L(\theta^Q) = \mathbb{E}_{(s, a, r, s') \sim D} \left[ \left( r + \gamma \, Q'\left(s', \mu'(s' \mid \theta^{\mu'}) \mid \theta^{Q'}\right) - Q(s, a \mid \theta^Q) \right)^2 \right]$ (22)
where $D$ is the experience replay buffer, $Q'(s, a \mid \theta^{Q'})$ is the output of the target value network, and $\mu'(s \mid \theta^{\mu'})$ is the output of the target policy network. Updating the policy network by deterministic policy gradients, Equation (21) can be expressed as follows:
$\nabla_{\theta^\mu} J_a(\theta^\mu) = \mathbb{E}_{s \sim \rho^\pi} \left[ \nabla_a Q(s, a \mid \theta^Q) \big|_{a = \mu(s \mid \theta^\mu)} \cdot \nabla_{\theta^\mu} \mu(s \mid \theta^\mu) \right]$ (23)
To improve the training stability, the target networks are updated using soft updates:
$\theta' \leftarrow \tau \theta + (1 - \tau) \theta'$ (24)
The Actor and Critic networks continuously learn and improve by adjusting their parameters, maximizing the expected return and minimizing the TD-error, and thereby improving action generation and action-value estimation. This process is repeated throughout the training of the DDPG algorithm to enhance the overall performance of the system.
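A compact PyTorch sketch of one DDPG update, following Equations (21)-(24), is shown below. The layer sizes, learning rates, and soft-update factor are illustrative assumptions rather than the exact settings used in the paper.

```python
import copy
import torch
import torch.nn as nn

state_dim, action_dim, gamma, tau = 3, 1, 0.99, 0.005   # assumed sizes; tau is the soft-update factor
actor  = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 128), nn.ReLU(), nn.Linear(128, 1))
actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)
opt_actor  = torch.optim.Adam(actor.parameters(),  lr=1e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s_next):
    """One DDPG step on a sampled mini-batch: Eqs. (21)-(24)."""
    with torch.no_grad():                                          # target from target networks, Eq. (22)
        a_next = actor_target(s_next)
        y = r + gamma * critic_target(torch.cat([s_next, a_next], dim=-1))
    q = critic(torch.cat([s, a], dim=-1))
    critic_loss = (q - y).pow(2).mean()                            # mean square error, Eq. (22)
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()  # ascent on Eq. (21) via Eq. (23)
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

    for net, tgt in ((actor, actor_target), (critic, critic_target)):   # soft update, Eq. (24)
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```

In practice the mini-batch (s, a, r, s_next) would be sampled from the experience replay buffer $D$, and exploration noise would be added to the actor's output when interacting with the environment.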

4.3. Neuroevolution of Augmenting Topologies

NEAT optimizes both the topology and connection weights of neural networks. It is utilized to evolve complex ANNs, aiming to reduce the dimensionality of the parameter search space by progressively refining the ANN structure throughout the evolutionary process.
The evolutionary process begins with a group of small, simple genomes and gradually increases in complexity over generations. Initially, genomes have a very simple topology: they only express input, output, and bias neurons. Hidden neurons are not included in the initial genomes to ensure that the search for solutions (ANN connection weights) begins within the lowest possible dimensional parameter space. With each passing generation, new genes may be introduced, increasing the search space of the solution by adding previously nonexistent dimensions. Thus, evolution starts from searching within a small space that can be easily optimized and adds new dimensions as needed. This approach allows for the gradual discovery of complex solutions, which is far more efficient than initiating the search directly in a multidimensional space that includes the final solution.
The NEAT algorithm aims to significantly decrease the complexity of the genome organization. The evolution of the network topology, by narrowing down the search space, offers considerable performance benefits. Another important aspect of the algorithm is the introduction of an evaluation function, which facilitates the purposeful evolution of the structure of the genome.
In combining DDPG and NEAT, NEAT is used to generate and optimize the topology of the neural networks that serve as Actor networks in DDPG. Specifically, NEAT explores different network structures through an evolutionary algorithm to select and optimize neural network topologies suited to a particular environment. These optimized neural networks are then embedded into the DDPG framework as Actor networks for policy optimization and are evaluated by the Critic network. Through this interaction, NEAT provides the ability to adjust the network structure dynamically, while DDPG optimizes the policy through deep reinforcement learning, allowing the algorithm to adapt better to complex decision-making environments. The advantage of this combination is that NEAT optimizes the topology of the Actor network through an evolutionary algorithm, so the Actor network adapts better to the complex environment and helps DDPG explore the state–action space more effectively, which may lead to higher policy quality and thus better overall performance. The structure of the DDPG-NEAT algorithm is shown in Figure 5. The method of combining DDPG and NEAT is shown in Algorithm 1.
Algorithm 1 DDPG with NEAT
1:  Input: NEAT parameters (crossover_rate, mutation_rate, population_size, num_generations), DDPG parameters ($\alpha$, $\gamma$, $\theta$, $\theta^\pi$, $\theta^Q$)
2:  Initialize NEAT population with population_size individuals
3:  Initialize DDPG networks (actor, critic, target networks)
4:  Initialize experience replay buffer $D$ with capacity $N$
5:  for generation in 1:num_generations do
6:      Evolve NEAT population for one generation with crossover_rate and mutation_rate
7:      Get best NEAT individual
8:      Create actor network based on NEAT topology
9:      for episode in 1:num_episodes do
10:         Initialize episode
11:         for t in 1:T do
12:             Select action $a_t$ according to current policy $\pi(s_t)$ + noise
13:             Execute action $a_t$, observe reward $r_t$ and next state $s_{t+1}$
14:             Store transition $(s_t, a_t, r_t, s_{t+1})$ in $D$
15:             Sample mini-batch of transitions from $D$
16:             Update critic by minimizing the loss: $L = \mathbb{E}_{(s, a, r, s') \sim D} \left[ \left( r + \gamma \, Q'\left(s', \mu'(s' \mid \theta^{\mu'}) \mid \theta^{Q'}\right) - Q(s, a \mid \theta^Q) \right)^2 \right]$
17:             Update actor policy using the sampled policy gradient: $\nabla_{\theta^\mu} J_a(\theta^\mu) = \mathbb{E}_{s \sim \rho^\pi} \left[ \nabla_a Q(s, a \mid \theta^Q) \big|_{a = \mu(s \mid \theta^\mu)} \cdot \nabla_{\theta^\mu} \mu(s \mid \theta^\mu) \right]$
18:             Update target networks: $\theta' \leftarrow \tau \theta + (1 - \tau) \theta'$
19:         end for
20:     end for
21: end for
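As an illustration of how the evolved topology could be plugged in as the actor, the sketch below uses the third-party neat-python package (assumed available). The configuration file name, the input scaling, and the dummy fitness function are assumptions, since the paper does not give these implementation details; in the full DDPG-NEAT loop, the fitness of each genome would come from the return achieved when its network acts as the DDPG actor and is scored by the Critic.

```python
import neat   # third-party "neat-python" package (assumed available)

def eval_genomes(genomes, config):
    """Assign a fitness to each candidate actor topology.
    The fitness here is a placeholder; in the paper's setting it would be the
    episode return obtained when the candidate network acts as the DDPG actor."""
    for genome_id, genome in genomes:
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        # Placeholder: score the net on a single scaled state [P_ref, P_wind, P_C]
        action = net.activate([0.5, 0.4, 0.0])
        genome.fitness = -abs(action[0] - 0.1)   # dummy objective for illustration

config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     "neat_config.ini")          # hypothetical configuration file name
population = neat.Population(config)
best_genome = population.run(eval_genomes, 10)   # evolve for 10 generations
actor_net = neat.nn.FeedForwardNetwork.create(best_genome, config)
# actor_net.activate(state) can then serve as the actor's forward pass,
# with the Critic network of Section 4.2 evaluating the resulting actions.
```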

5. Case Studies and Results

A CAES power station in Inner Mongolia, China, is equipped with an electrical energy storage capacity of 40 MWh and is used to stabilize the active power output of a wind farm. To achieve precise energy scheduling for the source–storage–grid system, historical data on the active power output of the wind farm and the grid dispatching instruction data are utilized, with data collected at 10 min intervals.
To ensure accurate and reliable comparisons, the same parameter conditions are set to demonstrate the differing performances of the three DRL algorithms. For the DDPG algorithm, the learning rate is set to 0.001, the discount factor to 0.99, and the batch size to 128. In the NEAT algorithm, the population size is set to 200 and the number of generations to 10. More detailed hyper-parameters are shown in Table 1, and the neural network parameters are shown in Table 2.
The performance of the DDPG-NEAT algorithm was compared with that of two other DRL algorithms, DDPG and SAC, and the comparison is visualized in Figure 6. As depicted in Figure 6, the curves for all three algorithms exhibit certain characteristics. Because the agents introduce noise while exploring the action space to enrich the sample experience pool and promote rapid learning, all three algorithms exhibit significant fluctuations in the initial phase. The DDPG and DDPG-NEAT algorithms have reward values around −18,000 in the first 50 steps, whereas the SAC algorithm shows reward values around −19,000 in the first 200 steps with more pronounced fluctuations. After initial sample accumulation and policy network updates, the DDPG algorithm undergoes an update at step 50, with reward values reaching around −14,000. However, without learning a better strategy, the curve does not show further improvement. At step 200, the SAC algorithm is updated and the reward value is around −4000, but the instability of the learned strategy leads to large fluctuations in the curve and a lack of convergence. In contrast, the DDPG-NEAT algorithm undergoes updates at steps 50 and 100, with the first update achieving reward values around −6000, albeit with considerable fluctuations, and the second update elevating the performance curve to about −1000. The subsequently reduced fluctuations indicate that the agents have learned a stable strategy, leading to the convergence of the reward function. Furthermore, from step 50 onward, the DDPG-NEAT algorithm learns strategies and shows gradually decreasing fluctuations in the reward function during training. The large fluctuations caused by each update are reduced, which indicates that the algorithm performs stably. Under the same conditions, the SAC algorithm only undergoes one update, reaching reward values around −4000. Although this reduces the fluctuation in the reward function curve, the curve does not converge further and still experiences significant fluctuations. The DDPG algorithm reaches a reward value curve of about −14,000 after sample updates, but there is no obvious upward trend after each update, and the fluctuation is still large, highlighting the limited learning ability and stability of the algorithm. Figure 7 depicts the relationship between the output power of the wind farm and the dispatched power over 480 min.
Figure 8 illustrates the charge–discharge status of the CAES at different time intervals of 10 min. The CAES system stores the excess electrical energy generated by the wind farm when its output power exceeds the dispatching power, particularly during the 10–40 min interval. Conversely, when the output power of the wind farm falls below the dispatching power, for example, during the 50–170 min and 190–220 min intervals, and cannot satisfy the dispatching power requirements, the CAES discharges to compensate for the deficiency in electrical energy, thus adhering to dispatching instructions. The results show that, while the various algorithms are effective in managing scheduling, they produce very different scheduling results. Optimization algorithms strive to align the charging and discharging levels of the energy storage system with the surplus or deficit of power (i.e., the difference between the output power of the wind farm and the dispatching power), signifying a more precise and reliable dispatching strategy. For instance, at the 20 min mark, the output power of the wind farm exceeds the dispatching command by approximately 1.7 MW. From Figure 8, the charging power of the DDPG algorithm is roughly 2.34 MW, the charging power of the SAC algorithm is about 1.98 MW, and the charging power of the DDPG-NEAT algorithm is nearly 1.78 MW. At the 100 min mark, the output power of the wind farm is approximately 1.64 MW lower than the dispatching instruction. The discharging power for the DDPG algorithm is around 2.18 MW, for the SAC algorithm is about 1.81 MW, and for the DDPG-NEAT algorithm is approximately 1.7 MW.
The SOC states under the three algorithms are estimated using Equation (6). As shown in Figure 9, the SOC of DDPG-NEAT ranges from a minimum of 0.5611 to a maximum of 0.6426; the SOC of SAC ranges from 0.5486 to 0.6452; and the SOC of DDPG ranges from 0.5257 to 0.6519. The results show that the DDPG-NEAT algorithm exhibits smaller SOC fluctuations than the other two algorithms, implying more efficient energy management that can dynamically adjust energy storage and release according to demand and reduce energy waste.

6. Discussion

The method proposed in this paper can effectively coordinate the output between wind farms and CAES. During the initial stage of training, DDPG converges slowly and with difficulty because of the initial values of the neural network parameters; this is especially evident when handling a large exploration space. The NEAT algorithm can guide the evolution of the neural network through the Critic to obtain a network that is better suited to complex environments. Combining the DDPG algorithm with the NEAT algorithm improves the adaptability of the neural network, ultimately leading to increased accuracy in reinforcement learning. Therefore, applying this algorithm to other complex power systems for scheduling strategy research is the focus of further work. Various problems may be encountered in such applications, such as increased environmental complexity, adaptability and generalizability issues, and hyper-parameter tuning. To account for the characteristics of different energy types, the algorithm must be specifically modeled and optimized. Because of the diversity and complexity of power systems, retraining or re-tuning of hyper-parameters is required for different system environments.

7. Conclusions

In this paper, we propose an AA-CAES scheduling optimization strategy based on an improved DDPG algorithm. The scheduling optimization strategy adopts a layered model, comprising control and physical layers, to achieve the integration of energy, information, and control flows. A data-driven approach is used to transform the AA-CAES energy scheduling problem into an MDP problem. To address the problems that the AC algorithm easily falls into local optimal solutions and is sensitive to hyper-parameters, the DDPG algorithm based on the deterministic policy gradient is adopted. Meanwhile, in order to improve the robustness of the algorithm and the accuracy of scheduling for better scheduling strategies, DDPG is improved by using the NEAT algorithm, which generates neural networks adapted to complex environments. The improvement significantly enhances algorithmic performance: the DDPG-NEAT algorithm reaches a reward value of around −1000 after strategy learning and reward updating, while the SAC and DDPG algorithms only reach around −4000 and −14,000, respectively. As can be seen from the data in Table 3, DDPG-NEAT improves the accuracy of scheduling and obtains high rewards. During the scheduling process, the deviation of the DDPG-NEAT algorithm from the exact value is only 1.201 MW, while the SAC and DDPG algorithms reach 4.219 MW and 8.997 MW, respectively. In the algorithmic comparisons, the DDPG-NEAT algorithm improves the scheduling accuracy by 15.43% and 31.5%, respectively, compared with the entropy-based SAC algorithm and the DDPG algorithm.
In this paper, the proposed algorithm is validated with real data. The following conclusions are drawn:
  • Deep reinforcement learning algorithms can play an important role in the intelligent scheduling of power systems containing AA-CAES.
  • The effectiveness of the algorithm is verified by analyzing the simulation results. The algorithm realizes the cooperative scheduling in the source–storage network and ensures the safe operation of the power grid. Even in the case of unstable wind power generation, the system operation can be made smoother by scheduling AA-CAES.
  • The experimental results also show the better performance of the improved DDPG algorithm with DDPG-NEAT compared to the other two DRL algorithms. The comparison of the power scheduling data of the three algorithms shows that the DDPG-NEAT algorithm can perform the scheduling task better and improve the energy utilization efficiency.
The results show that the improved algorithm not only accomplishes the scheduling task but also has better performance. However, this paper still has some limitations. When establishing the Markov model, the working state of the AA-CAES system is only briefly described, and its internal working process is not modeled. In our future research, we will start with more complex energy systems, such as those with additional renewable energy inputs (solar, hydro, etc.) and more complex forms of energy storage, for example, combining other advanced energy storage methods into a hybrid energy storage system. For these more complex energy systems, we will verify the applicability of the algorithms and study more efficient energy management systems.

Author Contributions

Conceptualization, R.W.; methodology, Z.Z.; software, R.W.; formal analysis, R.W.; investigation, K.W.; data curation, W.Y.; writing—original draft preparation, R.W.; supervision, K.M.; project administration, Y.L.; resources, P.L.; funding acquisition, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Inner Mongolia Autonomous Region Science and Technology Major Project (grant number 2021ZD0032) and the Inner Mongolia Autonomous Region “Open Competition Mechanism to Select the Best Candidates” Project (grant number 2022JBGS0045).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors greatly appreciate the comments from the reviewers, whose comments helped improve the quality of the paper.

Conflicts of Interest

Pengbing Lei was employed by the Power China Hebei Electric Power Engineering Co., Ltd., Yong Liu was employed by the Shandong Energy Group Electric Power Group Co., Ltd., Zhihua Lin was employed by the Science and Technology Research Institute of China Three Gorges Corporation. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CAES: Compressed Air Energy Storage
AA-CAES: Advanced Adiabatic Compressed Air Energy Storage
AC: Actor–Critic
DDPG: Deep Deterministic Policy Gradient
NEAT: Neuroevolution of Augmenting Topologies
DDPG-NEAT: Deep Deterministic Policy Gradient with Neuroevolution of Augmenting Topologies
MDP: Markov Decision Process
DRL: Deep Reinforcement Learning
TD: Temporal Difference
TD-error: Temporal Difference error
ANN: Artificial Neural Network

References

  1. Karmaker, A.K.; Rahman, M.M.; Hossain, M.A.; Ahmed, M.R. Exploration and Corrective Measures of Greenhouse Gas Emission from Fossil Fuel Power Stations for Bangladesh. J. Clean. Prod. 2020, 244, 118645. [Google Scholar] [CrossRef]
  2. Xie, H.; Yu, Y.; Wang, W.; Liu, Y. The Substitutability of Non-Fossil Energy, Potential Carbon Emission Reduction and Energy Shadow Prices in China. Energy Policy 2017, 107, 63–71. [Google Scholar] [CrossRef]
  3. Ming, Z.; Song, X.; Mingjuan, M.; Xiaoli, Z. New Energy Bases and Sustainable Development in China: A Review. Renew. Sustain. Energy Rev. 2013, 20, 169–185. [Google Scholar] [CrossRef]
  4. Argyrou, M.C.; Christodoulides, P.; Kalogirou, S.A. Energy Storage for Electricity Generation and Related Processes: Technologies Appraisal and Grid Scale Applications. Renew. Sustain. Energy Rev. 2018, 94, 804–821. [Google Scholar] [CrossRef]
  5. Michaelides, E.E. Thermodynamics, Energy Dissipation, and Figures of Merit of Energy Storage Systems—A Critical Review. Energies 2021, 14, 6121. [Google Scholar] [CrossRef]
  6. Huang, Y.; Keatley, P.; Chen, H.S.; Zhang, X.J.; Rolfe, A.; Hewitt, N.J. Techno-Economic Study of Compressed Air Energy Storage Systems for the Grid Integration of Wind Power. Int. J. Energy Res. 2018, 42, 559–569. [Google Scholar] [CrossRef]
  7. Zhang, X.; Li, Y.; Gao, Z.; Chen, S.; Xu, Y.; Chen, H. Overview of Dynamic Operation Strategies for Advanced Compressed Air Energy Storage. J. Energy Storage 2023, 66, 107408. [Google Scholar] [CrossRef]
  8. Xu, W.; Zhang, W.; Hu, Y.; Yin, J.; Wang, J. Multi Energy Flow Optimal Scheduling Model of Advanced Adiabatic Compressed Air Energy Storage. Trans. China Electrotech. Soc. 2022, 37, 5944–5955. [Google Scholar]
  9. Wang, X.; Zhou, J.; Qin, B.; Guo, L. Coordinated Power Smoothing Control Strategy of Multi-Wind Turbines and Energy Storage Systems in Wind Farm Based on MADRL. IEEE Trans. Sustain. Energy 2024, 15, 368–380. [Google Scholar] [CrossRef]
  10. Zhou, X.; Wang, J.; Wang, X.; Chen, S. Optimal Dispatch of Integrated Energy System Based on Deep Reinforcement Learning. Energy Rep. 2023, 9, 373–378. [Google Scholar] [CrossRef]
  11. Sheng, Y.; Yang, J.; Ma, S.; Wang, Y.; Li, H. Research on Optimal Dispatching of Integrated Energy System Based on Demand-supply Interaction. Power Demand Side Manag. 2019, 21, 48–54. [Google Scholar]
  12. Zhuo, Y.; Chen, J.; Zhu, J.; Ye, H.; Wang, Z. Optimal Scheduling of Park-level Integrated Energy Systems Based on Improved Approximate Dynamic Programming. High Volt. Eng. 2022, 51, 2597–2606. [Google Scholar]
  13. Yan, K.; Zhang, J.; He, Y.; Zhang, Y.; Liu, Y.; Li, X. The Optimal Dispatching of Mixed Integer Programming Based on Opportunity Constraint of Microgrid. Electr. Power Sci. Eng. 2021, 37, 17–24. [Google Scholar]
  14. Li, Y.; Yao, F.; Zhang, S.; Liu, Y.; Miao, S. An Optimal Dispatch Model of Adiabatic Compressed Air Energy Storage System Considering Its Temperature Dynamic Behavior for Combined Cooling, Heating and Power Microgrid Dispatch. High Volt. Eng. 2022, 51, 104366. [Google Scholar] [CrossRef]
  15. Lin, J.; Liu, Y.; Chen, B.; Chen, R.; Chen, Y.; Dai, X. Micro-grid Energy Optimization Dispatch of Combined Cold and Heat Power Supply Based on Stochastic Chance-constrained Programming. Electr. Meas. Instrum. 2019, 56, 85–90. [Google Scholar]
  16. Li, Y.; Miao, S.; Yin, B.; Han, J.; Zhang, S.; Wang, J.; Luo, X. Combined Heat and Power Dispatch Considering Advanced Adiabatic Compressed Air Energy Storage for Wind Power Accommodation. Energy Convers. Manag. 2019, 200, 112091. [Google Scholar] [CrossRef]
  17. Naidji, I.; Ben Smida, M.; Khalgui, M.; Bachir, A.; Li, Z.; Wu, N. Efficient Allocation Strategy of Energy Storage Systems in Power Grids Considering Contingencies. IEEE Access 2019, 7, 186378–186392. [Google Scholar] [CrossRef]
  18. Men, J. Bi-Level Optimal Scheduling Strategy of Integrated Energy System Considering Adiabatic Compressed Air Energy Storage and Integrated Demand Response. J. Electr. Eng. Technol. 2024, 19, 97–111. [Google Scholar] [CrossRef]
  19. Long, F.; Jin, B.; Yu, Z.; Xu, H.; Wang, J.; Bhola, J.; Shavkatovich, S.N. Research on Multi-Objective Optimization of Smart Grid Based on Particle Swarm Optimization. Electrica 2023, 23, 222–230. [Google Scholar] [CrossRef]
  20. Torkan, R.; Ilinca, A.; Ghorbanzadeh, M. A Genetic Algorithm Optimization Approach for Smart Energy Management of Microgrid. Renew. Energy 2022, 197, 852–863. [Google Scholar] [CrossRef]
  21. Fathy, A.; Alanazi, T.M.; Rezk, H.; Yousri, D. Optimal Energy Management of Micro-Grid Using Sparrow Search Algorithm. Energy Rep. 2022, 8, 758–773. [Google Scholar] [CrossRef]
  22. Wang, Y.; Zheng, Y.; Xue, H.; Mi, Y. Optimal Dispatch of Mobile Energy Storage for Peak Load Shifting Based on Enhanced Firework Algorithm. Autom. Electr. Power Syst. 2021, 45, 48–56. [Google Scholar]
  23. Ma, Y.; Zhou, J.; Dong, X.; Wang, H.; Zhang, W.; Tan, Z. Multi-objective Optimal Scheduling Model for Multi-energy System Considering Uncertainty and Hybrid Energy Storage Devices. J. Electr. Power Sci. Technol. 2022, 37, 19–32. [Google Scholar]
  24. Lu, L.; Chu, G.; Zhang, T.; Yang, Z. Optimal Configuration of Energy Storage in a Microgrid Based on Improved Multi-objective Particle Swarm Optimization. Power Syst. Prot. Control 2020, 48, 116–124. [Google Scholar]
  25. Liu, X.; Xie, S.; Tian, J.; Wang, P. Two-Stage Scheduling Strategy for Integrated Energy Systems Considering Renewable Energy Consumption. IEEE Access 2022, 10, 83336–83349. [Google Scholar] [CrossRef]
  26. Xu, Z.; Han, G.; Liu, L.; Martínez-García, M.; Wang, Z. Multi-Energy Scheduling of an Industrial Integrated Energy System by Reinforcement Learning-Based Differential Evolution. IEEE Trans. Green Commun. Netw. 2021, 5, 1077–1090. [Google Scholar] [CrossRef]
  27. Li, Y.; Zhang, Z.; Meng, K.; Wei, H. Energy Optimal Dispatch of Microgrid Based on Improved Depth Deterministic Strategy Gradient Algorithm. Electron. Meas. Technol. 2023, 46, 73–80. [Google Scholar]
  28. Wang, B.; Li, Y.; Ming, W.; Wang, S. Deep Reinforcement Learning Method for Demand Response Management of Interruptible Load. IEEE Trans. Smart Grid 2020, 11, 3146–3155. [Google Scholar] [CrossRef]
  29. Chen, L.; Wu, J.; Tang, H.; Jin, F.; Wang, Y. A Q-Learning Based Optimization Method of Energy Management for Peak Load Control of Residential Areas with CCHP Systems. Electr. Power Syst. Res. 2023, 214, 108895. [Google Scholar]
  30. Luo, J.; Zhang, W.; Wang, H.; Wei, W.; He, J. Research on Data-Driven Optimal Scheduling of Power System. Energies 2023, 16, 2926. [Google Scholar] [CrossRef]
  31. Chang, Y.; Liu, S.; Wang, L.; Cong, W.; Zhang, Z.; Qi, S. Research on Low-Carbon Economic Operation Strategy of Renewable Energy-Pumped Storage Combined System. Math. Probl. Eng. 2022, 13, 9202625. [Google Scholar] [CrossRef]
  32. Bai, Y.; Chen, S.; Zhang, J.; Xu, J.; Gao, T.; Wang, X.; Gao, D.W. An Adaptive Active Power Rolling Dispatch Strategy for High Proportion of Renewable Energy Based on Distributed Deep Reinforcement Learning. Appl. Energy 2023, 330, 120294. [Google Scholar] [CrossRef]
  33. Dolatabadi, A.; Abdeltawab, H.; Mohamed, Y.A.-R.I. Deep Reinforcement Learning-Based Self-Scheduling Strategy for a CAES-PV System Using Accurate Sky Images-Based Forecasting. IEEE Trans. Power Syst. 2023, 38, 1608–1618. [Google Scholar] [CrossRef]
Figure 1. Integrated energy system framework.
Figure 2. AA-CAES system model.
Figure 3. Hierarchical control framework of source–storage–grid system.
Figure 4. Intelligent agent with the environment.
Figure 5. Structure of the DDPG-NEAT algorithm.
Figure 6. Average reward curves for three different DRL algorithms.
Figure 7. Relationship diagram of wind turbine output power and scheduling power.
Figure 8. Charging and discharging status of AA-CAES based on DDPG, SAC, and DDPG-NEAT algorithms.
Figure 9. The SOC curves for three different DRL algorithms.
Table 1. Hyper-parameters of the DDPG-NEAT algorithm.

Hyper-Parameter | Value
Learning rate | 0.001
Discount factor | 0.99
Training episodes | 250,000
Steps in each episode | 500
Batch size | 128
Population size | 200
Generation number | 10
Soft update factor | 0.995
Action noise | 0.1
Table 2. Network structure and parameterization.

Network | Input Layer | Hidden Layer 1 | Hidden Layer 2 | Output Layer
Actor network | 3 | 128 | 128 | 3
Critic network | 6 | 128 | 128 | 1
Table 3. Performance comparison of three algorithms.

Algorithm | DDPG-NEAT | SAC | DDPG
Power error (MW) | 1.20105 | 4.21946 | 8.99762
Scheduling accuracy (%) | 91.97 | 76.54 | 60.47
Charging capacity (MWh) | 1.69079 | 2.10285 | 2.93163
Discharging capacity (MWh) | −4.18489 | −5.10002 | −6.72516
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

