An Adaptive Scheduling Method for Standalone Microgrids Based on Deep Q-Network and Particle Swarm Optimization

Zhang, Borui; Liu, Bo

doi:10.3390/en18082133

Open AccessArticle

An Adaptive Scheduling Method for Standalone Microgrids Based on Deep Q-Network and Particle Swarm Optimization

by

Borui Zhang

and

Bo Liu

^*

College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China

^*

Author to whom correspondence should be addressed.

Energies 2025, 18(8), 2133; https://doi.org/10.3390/en18082133

Submission received: 19 March 2025 / Revised: 14 April 2025 / Accepted: 17 April 2025 / Published: 21 April 2025

(This article belongs to the Section A1: Smart Grids and Microgrids)

Download

Browse Figures

Versions Notes

Abstract

:

Standalone wind–solar–diesel–storage microgrids serve as a crucial solution for achieving energy self-sufficiency in remote and off-grid areas, such as rural regions and islands, where conventional power grids are unavailable. Addressing scheduling optimization challenges arising from the intermittent nature of renewable energy generation and the uncertainty of load demand, this paper proposes an adaptive optimization scheduling method (DQN-PSO) that integrates Deep Q-Network (DQN) with Particle Swarm Optimization (PSO). The proposed approach leverages DQN to assess the operational state of the microgrid and dynamically adjust the key parameters of PSO. Additionally, a multi-strategy switching mechanism, incorporating global search, local adjustment, and reliability enhancement, is introduced to jointly optimize both clean energy utilization and power supply reliability. Simulation results demonstrate that, under typical daily, high-volatility, and low-load scenarios, the proposed method improves clean energy utilization by 3.2%, 4.5%, and 10.9%, respectively, compared to conventional PSO algorithms while reducing power supply reliability risks to 0.70%, 1.04%, and 0.30%, respectively. These findings validate the strong adaptability of the proposed algorithm to dynamic environments. Further, a parameter sensitivity analysis underscores the significance of the dynamic adjustment mechanism.

Keywords:

standalone microgrids; adaptive method; reinforcement learning; particle swarm optimization algorithm

1. Introduction

The global renewable energy market is undergoing rapid growth and profound transformation. Under the framework of the “dual carbon” targets, the integration of renewable energy into modern power systems has become increasingly widespread [1]. As a self-sustaining energy system that operates independently of conventional power grids, standalone wind–solar–diesel–storage microgrids integrate photovoltaic (PV) generation, wind power, energy storage systems, and backup diesel generators to facilitate local electricity production and consumption. These microgrids offer a cost-effective and efficient power supply solution for remote areas and regions where grid extension is either impractical or prohibitively expensive [2,3]. However, the operation of standalone microgrids is hindered by the inherent intermittency of distributed energy sources and the uncertainty of load demand. Under the constraint of relying solely on internal resources, achieving energy balance, enhancing the utilization of distributed renewable energy, and ensuring reliable power supply remain critical research challenges in the field of standalone microgrid optimization.

Optimization-based scheduling plays a crucial role in mitigating operational challenges in standalone microgrids. As a high-dimensional, nonlinear, and multi-constraint optimization problem, various studies have employed traditional optimization methods for its solution. Among these, Particle Swarm Optimization (PSO) has been widely applied in microgrid scheduling due to its strong scalability and ease of implementation [4,5,6]. However, the performance of PSO is highly sensitive to its parameter settings, and fixed parameter configurations often struggle to adapt to varying operating conditions, leading to premature convergence and local optima. To overcome this limitation, researchers have introduced adaptive mechanisms to enhance PSO. For instance, Zhang et al. [7] proposed an improved PSO algorithm with adaptive inertia weight and constriction factor for economic scheduling in islanded microgrids, yielding promising results. Ge [8] designed a multi-objective PSO-based power system scheduling model, enhancing convergence and solution quality through dynamic weight adjustment. Guan et al. [9] optimized PSO parameters and particle velocity transformation to improve efficiency and reliability in solving economic dispatch problems in microgrids.

In recent years, reinforcement learning (RL) has gained increasing attention as a promising approach for microgrid scheduling, owing to its capabilities in online learning and dynamic optimization. By leveraging the state–action–reward mechanism, RL effectively addresses the uncertainties associated with distributed energy resources and load fluctuations [10,11,12]. However, conventional RL algorithms often encounter significant challenges when applied to high-dimensional state spaces and complex multi-objective optimization problems, particularly in terms of computational efficiency and convergence stability. To overcome these limitations, deep reinforcement learning (DRL) integrates deep neural networks into RL frameworks, thereby enhancing adaptability in dynamic and uncertain environments. For example, Wen et al. [13] employed Deep Q-Networks (DQN) to optimize the scheduling of electric vehicle (EV)-integrated microgrids under renewable energy uncertainty. Domínguez-Barbero et al. [14] adopted DQN to achieve cost-efficient operation in islanded microgrids. Pan et al. [15] developed a dynamic scheduling framework based on the Soft Actor-Critic (SAC) algorithm to enhance both economic and environmental performance. Liang et al. [16] utilized the Deep Deterministic Policy Gradient (DDPG) algorithm to enable continuous control in off-grid renewable energy systems. These studies collectively demonstrate the significant potential of DRL in managing renewable energy variability and addressing complex scheduling objectives. Nevertheless, several critical challenges remain unresolved, including high data dependency, slow convergence rates, and limited generalization ability under extreme or unseen conditions.

Given these challenges, the fusion of intelligent optimization algorithms and reinforcement learning has attracted growing interest, showcasing significant potential in complex environments. For example, Yin et al. [17] proposed a reinforcement learning-enhanced PSO (RLPSO), which dynamically adjusts PSO parameters to improve convergence and global search capability. Wang et al. [18] developed a reinforcement learning-level PSO (RLLPSO) to effectively address large-scale optimization problems. Huang et al. [19] embedded a reinforcement learning feedback mechanism into PSO for autonomous underwater vehicle path planning, enhancing search efficiency and adaptability. Additionally, Zhang et al. [20] combined reinforcement learning with PSO to optimize wind farm layouts, demonstrating its advantages in handling complex constraints. Gao et al. [21] introduced an improved PSO (IPSO_RL) with Q-learning-based dynamic inertia weight adjustment, exhibiting superior performance in flexible job shop scheduling problems, further validating the potential of combining RL with intelligent optimization algorithms.

Although both Particle Swarm Optimization (PSO) and reinforcement learning (RL) have shown promise in microgrid scheduling, conventional PSO suffers from fixed parameter settings, which limits its ability to handle rapid fluctuations in solar and wind power output and makes it prone to premature convergence. On the other hand, while RL and DRL offer dynamic optimization capabilities, they often face challenges such as slow convergence rates and substantial data requirements. Furthermore, existing hybrid RL-PSO approaches are primarily applied to areas such as path planning and system layout optimization, and a well-established framework for multi-objective scheduling in standalone microgrids—especially under high uncertainty—has yet to emerge. These limitations underscore the pressing need for an efficient hybrid framework that integrates the global search capability of PSO with the adaptive learning capacity of RL.

To address this problem, we propose an adaptive microgrid scheduling optimization approach based on a Deep Q-Network and Particle Swarm Optimization (DQN-PSO) framework. The main contributions of this work are summarized as follows:

(1): We design a DQN-PSO adaptive optimization framework in which a Deep Q-Network module perceives the microgrid’s operational state and dynamically adjusts key PSO parameters, including the inertia weight and acceleration coefficients. This enables the PSO algorithm to respond more flexibly to changing system conditions, thereby mitigating the issue of local optima in dynamic environments and enhancing overall scheduling performance.
(2): We propose three novel adaptive scheduling strategies—Global Search, Local Adjustment, and Reliability Enhancement—designed to improve performance under diverse operating conditions: a. Global Search broadens the solution space to facilitate comprehensive optimization; b. Local Adjustment fine-tunes scheduling decisions to accommodate real-time supply–demand imbalances; and c. Reliability Enhancement prioritizes the dispatch of energy storage to maintain power supply continuity. These strategies collectively improve system flexibility and operational stability.
(3): To evaluate the effectiveness of the proposed method, we develop a representative standalone microgrid simulation model and conduct comparative experiments under multiple scenarios. The results demonstrate that the proposed approach achieves superior performance in terms of clean energy utilization and power supply reliability, thereby validating its practical applicability.

The structure of the paper is as follows: Section 2 presents the system composition and mathematical model of the standalone microgrid. Section 3 elaborates on the theoretical framework and cooperative mechanism of the proposed DQN-PSO method. Section 4 describes the simulation experiments, evaluates performance, and compares the advantages and limitations of DQN-PSO with traditional methods. Section 5 presents the experimental results and provides a detailed analysis of their significance, while Section 6 summarizes the key research contributions and outlines potential directions for future work.

2. Standalone Microgrid System Model

Figure 1 illustrates the typical composition of a standalone microgrid system, which consists of rooftop photovoltaic (PV) panels, wind turbines (WT), diesel generators (DG), energy storage devices (ES), and both residential and industrial loads. In this system, PV and wind power serve as the primary renewable energy sources, but exhibit inherent intermittency and uncertainty. The diesel generator functions as a backup power source to mitigate energy shortages, while energy storage devices are employed for electricity storage and demand–supply balancing, effectively alleviating load fluctuations. The standalone microgrid model constructed in this section is based on a typical wind–solar–diesel–storage architecture and is intended to provide a standardized simulation environment for the subsequent optimization scheduling using the DQN-PSO algorithm.

2.1. Mathematical Model of the Microgrid System

(1): PV Power Generation:

The power output of a PV system primarily depends on solar irradiance and ambient temperature. In this study, a simplified PV model [22] is employed, with its power output given by

P_{P V} = P_{S T C} \frac{G_{t}}{G_{S T C}} [1 + δ (T_{c} - T_{S T C})]

(1)

where

P_{P V}

(kW) is the output power of the PV array,

P_{S T C}

(kW)is the rated power under standard test conditions (STC) (irradiance of 1000 W/m², temperature of 25 °C),

G_{t}

(W/m²) is the actual solar irradiance,

G_{S T C}

is the solar irradiance under STC,

δ

is the temperature coefficient of peak power (−0.004/°C for mainstream crystalline silicon PV modules),

T_{c}

(°C) is the PV module surface temperature, and

T_{s t c}

(°C) is the reference temperature under STC.

(2): Wind Power Generation:

The power output of a wind turbine depends on wind speed, rotor swept area, and power coefficient. To simplify complex dynamic computations, this study considers only wind speed and employs a wind speed power characteristic curve [23] as an approximate mathematical model

P_{W P} = \{\begin{matrix} 0, v < v_{c i} o r v > v_{c o} \\ P_{r} \frac{v^{2} - v_{c i}^{2}}{v_{r}^{2} - v_{c i}^{2}}, v_{c i} \leq v \leq v_{r} \\ P_{r}, v_{r} \leq v \leq v_{c o} \end{matrix}

(2)

where

P_{W P}

(kW) is the wind turbine power output,

v

(m/s) is the current wind speed, and

v_{c i}

,

v_{r}

,

v_{c o}

are the cut-in (3 m/s), rated (12 m/s), and cut-out wind speeds (25 m/s), respectively.

P_{r}

(kW) is the rated power of the wind turbine. Figure 2 illustrates the corresponding wind turbine power characteristic curve.

(3): Energy Storage System:

Integrating an appropriately sized battery storage system effectively mitigates power fluctuations caused by renewable energy variability. The charging and discharging power of the battery is expressed as

P_{E S} (t) = \{\begin{matrix} η_{c h} P_{c h} (t), P_{c h} (t) > 0 \\ \frac{P_{d i s} (t)}{η_{d i s}}, P_{d i s} (t) > 0 \end{matrix}

(3)

where

P_{E S} (t)

(kW) is the output power of the energy storage system at time t,

P_{c h} (t)

and

P_{d i s} (t)

(kW) are the charging and discharging power at time t, and

η_{c h}

and

η_{d i s}

represent the charging and discharging efficiencies, respectively. The State of Charge (SOC) of the energy storage system represents its energy level at a given time and is updated using the following equation [24]

S O C (t) = S O C (t - 1) + \frac{η_{c h} p_{c h} (t) Δ t}{E_{r}} - \frac{P_{d i s} (t) Δ t}{η_{d i s} E_{r}}

(4)

where

S O C (t)

is the SOC of the storage system at time t,

E_{r}

(kWh) is the rated capacity of the storage system, and

∆ t

is the time step.

(4): Diesel Generator:

As a critical backup power source in standalone microgrids, the diesel generator is activated when distributed renewable energy supply is insufficient, or when the energy storage system lacks adequate charge, ensuring the stability and reliability of power supply.

2.2. System Constraints

(1): Power Balance Constraint:

P_{s u p p l y} (t) = P_{P V} (t) + P_{W P} (t) + P_{D G} (t) + P_{d i s} (t) = P_{l o a d} (t) .

(5)

P_{P V} (t), P_{W P} (t), P_{D G} (t)

, and

P_{d i s} (t)

represent the power outputs of the PV system, wind turbine, diesel generator, and energy storage system at time t, respectively, and

P_{l o a d} (t)

is the total power demand at time t.

(2): Power Output Constraints of System Components:

\{\begin{matrix} P_{P V_m i n} \leq P_{P V} (t) \leq P_{P V_m a x} \\ P_{W P_m i n} \leq P_{W P} (t) \leq P_{W P_m a x} \\ P_{D G_m i n} \leq P_{D G} (t) \leq P_{D G_m a x} \end{matrix}

(6)

P_{P V_m i n}

and

P_{P V_m a x}

denote the minimum and maximum power outputs of the PV system,

P_{W P_m i n}

and

P_{W P_m a x}

denote the minimum and maximum power outputs of the wind turbine, and

P_{D G_m i n}

and

P_{D G_m a x}

denote the minimum and maximum power outputs of the diesel generator.

(3): Energy Storage System Constraints:

\{\begin{matrix} P_{c h} (t) \leq P_{c h_m a x}, P_{d i s} (t) \leq P_{d i s_m a x} \\ {S O C}_{m i n} \leq S O C (t) \leq {S O C}_{m a x} \end{matrix} .

(7)

P_{c h_m a x}

and

P_{d i s_m a x}

are the maximum charging and discharging power limits of the energy storage system, and

{S O C}_{m i n}

and

{S O C}_{m a x}

define the allowable SOC range.

2.3. Optimization Objectives

The objective of optimizing the scheduling of an independent wind–solar–hydro-storage–diesel microgrid is to efficiently allocate power generation from wind turbines, PV panels, battery storage, and diesel generators. The optimization aims to maximize clean energy utilization while ensuring demand fulfillment and adhering to system constraints.

(1): Clean Energy Utilization Rate:

Independent microgrids are typically deployed in remote areas with limited access to external energy sources. Maximizing local renewable energy utilization is essential to reducing reliance on external power supplies. The clean energy utilization rate is defined as

η_{c l e a n} = \frac{\sum_{t = 1}^{T} P_{r e n e w} (t)}{\sum_{t = 1}^{T} P_{s u p p l y} (t)}

(8)

where

P_{r e n e w} (t)

represents the real-time output of clean energy sources, including PV and wind power, and

P_{s u p p l y} (t)

denotes the total power supply at time t.

(2): Supply Reliability Risk:

Supply reliability is a critical factor affecting the operational performance and demand fulfillment of independent microgrids. The inherent intermittency of distributed renewable energy, the limited capacity of storage systems, and uncertain load demand contribute to the risk of power shortages. To quantitatively assess and optimize power supply reliability, the supply reliability risk function is expressed as

η_{r i s k} = \frac{\sum_{t \in T} {[P_{l o a d} (t) - P_{s u p p l y} (t)]}^{+}}{\sum_{t \in T} P_{l o a d} (t)}

(9)

where

{[P_{l o a d} (t) - P_{s u p p l y} (t)]}^{+}

denotes the power shortage at time t, which is considered only when the shortage is positive, and

\sum_{t \in T} P_{l o a d} (t)

is the total load demand over the scheduling period, used to normalize the power shortage, ensuring comparability.

3. Scheduling Method

Building on the system model and constraints, this study proposes a hybrid Deep Q-Network and Particle Swarm Optimization (DQN-PSO) approach for optimal scheduling in standalone microgrids. By integrating reinforcement learning with swarm intelligence, this method dynamically adapts to renewable energy fluctuations, enhancing both clean energy utilization and supply reliability.

3.1. Standard PSO Algorithm

Particle Swarm Optimization (PSO) is a heuristic optimization technique inspired by the foraging behavior of bird flocks [25]. Due to its simplicity and efficiency, PSO has been widely applied in optimization problems, including microgrid scheduling. The basic PSO process is illustrated in Figure 3.

Each particle in PSO represents a candidate solution within the search space and moves iteratively to explore the global optimum. The movement of a particle is influenced by its personal best solution and the global best solution of the swarm. The velocity and position update equations are given by the following equations:

Velocity Update:

v_{i}^{t + 1} = ω v_{i}^{t} + c_{1} r a n d (0, 1) ({P b e s t}_{i}^{t} - x_{i}^{t}) + c_{2} r a n d (0, 1) ({G b e s t}^{t} - x_{i}^{t})

(10)

2.: Position Update:

x_{i}^{t + 1} = x_{i}^{t} + v_{i}^{t + 1} .

(11)

ω is the inertia weight (typically ranging from 0.4 to 0.9), controlling the search range and convergence speed;

c_{1}

and

c_{2}

are cognitive and social learning factors, respectively, commonly set to 2;

r a n d (0, 1)

are random numbers in [0, 1];

{P b e s t}_{i}

and

G b e s t

represent the personal and global best positions;

v_{i}

and

x_{i}

denote the velocity and position of particle i, respectively; and t is the iteration index.

In microgrid scheduling, PSO optimizes the output of various system components. A particle’s position represents a candidate scheduling strategy, including photovoltaic (PV) power allocation, wind power allocation, battery charge/discharge scheduling, and diesel generator output. The fitness function is designed to maximize clean energy utilization while ensuring power supply reliability.

However, standard PSO struggles with dynamic environmental changes due to fixed parameter configurations, leading to premature convergence or suboptimal performance. Some examples are as follows:

Sudden load variations: Fixed inertia weights may cause premature convergence, limiting exploration of optimal scheduling strategies.
Weather fluctuations: Fixed learning factors restrict the algorithm’s ability to rapidly adjust to variations in renewable energy generation.

To overcome these limitations, an adaptive mechanism is introduced to dynamically adjust PSO parameters based on real-time system conditions.

3.2. Deep Q-Network Module

Reinforcement learning (RL) enables an agent to interact with its environment and optimize decision-making strategies based on reward feedback. In this study, Deep Q-Network (DQN) is incorporated to dynamically adjust key PSO parameters, enhancing the adaptability of microgrid scheduling. Prior studies [26] have demonstrated that DQN improves convergence speed and global search performance in optimization tasks.

DQN employs deep learning techniques to approximate the Q-value function, which estimates the expected return of an action in a given state. Compared to traditional Q-learning, DQN mitigates the curse of dimensionality and instability issues, making it well-suited for high-dimensional microgrid scheduling problems.

3.2.1. Neural Network Architecture

The DQN architecture consists of an input layer, two hidden layers, and an output layer:

Input layer: Receives the microgrid state vector, including renewable generation, load demand and battery state of charge (SOC).
Hidden layers: Two fully connected layers, each with 32 neurons, optimized for computational efficiency while preserving critical feature representations. The ReLU activation function enhances non-linear mapping capabilities.
Output layer: Generates Q-values for all possible actions, predicting expected rewards using a linear activation function.

DQN is implemented using TensorFlow, with the Adam optimizer fine-tuning network weights. The loss function is defined as

L (θ) = E [{(y - Q (s, a; θ))}^{2}]

(12)

where Q (s, a; θ) is the predicted Q-value for action a in state s, parameterized by θ, and y is the target Q-value, computed as

y = r + γ {m a x}_{a'} Q (s^{'}, a^{'}; θ^{-})

(13)

where r is the immediate reward, reflecting the system’s reward for taking action a in state s, γ is the discount factor (determining the importance of future rewards), s’ is the new state,

Q (s^{'}, a^{'}; θ^{-})

is the Q-value of the next state-action pair, evaluated by the target network, and

θ^{-}

represents the parameters of the target network, which is periodically updated from the main Q-network to stabilize training.

3.2.2. Training Process Based on Markov Decision Process (MDP)

The DQN training process follows an MDP framework, comprising six key steps:

State Observation: Collect real-time microgrid operational data (e.g., renewable generation, load demand, battery SOC).
Action Selection: Apply an ε-greedy policy to balance exploration and exploitation.
Environment Interaction: Execute the selected scheduling action and observe system response.
Experience Replay: Store state–action–reward transitions in a replay buffer and sample randomly during training.
Q-Value Update: Train the Q-network using mini-batch gradient descent.
Target Network Synchronization: Periodically update the target network parameters to reduce training variance and improve convergence stability.

To enhance learning stability, an experience replay mechanism stores past interactions, allowing the model to sample diverse experiences instead of consecutive time steps. This mitigates temporal correlation issues and improves generalization. Additionally, a separate target network helps stabilize Q-value updates by reducing variance, ensuring smoother convergence.

DQN employs an ε-greedy strategy for action selection, where the agent chooses a random action with probability ε (exploration) and selects the action with the highest Q-value with probability 1 − ε (exploitation). The value of ε is gradually decayed during training to transition from exploration to exploitation, enabling more efficient learning.

3.3. DQN-PSO Collaborative Optimization Algorithm

DQN effectively maps state–action relationships in dynamic environments, offering a flexible and adaptive optimization framework for microgrids scheduling. To address the limitations of traditional PSO in wind–solar–diesel–storage microgrids scheduling, this study introduces a DQN-PSO collaborative optimization approach. This approach incorporates three dynamic search strategies and leverages DQN’s reinforcement learning capabilities to adaptively fine-tune key PSO parameters, thereby improving global search efficiency and optimization effectiveness in complex, dynamic environments.

A key innovation of DQN-PSO, compared to existing research, is its adaptive dual-layer optimization architecture, which perceives the real-time operational state of the microgrid and dynamically optimizes PSO parameters, making it particularly effective for high-dimensional, nonlinear, and multi-constraint microgrid scheduling problems. And the existing RL-PSO methods typically use tabular Q-learning or basic reinforcement learning techniques to adjust PSO parameters, these approaches often suffer from the curse of dimensionality in high-dimensional state spaces. In addition, the lack of dynamic strategy switching limits their adaptability in complex microgrid environments. In contrast, the proposed DQN-PSO method employs a deep neural network to approximate the Q-value function, allowing it to effectively handle dynamic and complex microgrid states. Building on this, the study models the PSO parameter adjustment process as a Markov Decision Process (MDP) and employs DQN to derive the optimal parameter adjustment strategy, enabling PSO to adaptively refine its search behavior in response to system dynamics. The MDP framework comprises four core elements: state space, action space, state transitions, and reward function. They are structured as follows:

State Space Design:

The state space is represented by the operational state vector of the microgrid, designed to comprehensively capture the system’s dynamic characteristics and operational constraints. The state vector includes the following key variables

S (t) = {P_{l o a d} (t), P_{P V} (t), P_{W P} (t), P_{D G} (t), S O C (t), ∆ S O C (t)}

(14)

where

P_{l o a d} (t)

represents the current load demand of the microgrid, and

P_{P V} (t), P_{W P} (t)

, and

P_{D G} (t)

denote the power outputs of photovoltaic, wind turbine, and diesel generator, respectively. The variable

S O C (t)

represents the state of charge of the storage system. Additionally, the change rate

∆ S O C (t)

quantifies the real-time regulation capacity of the storage system in response to load fluctuations or renewable energy output variations, enabling DQN to assess system stability and energy balance.

2.: Action Space Design:

The action space primarily involves adjusting key PSO parameters and search strategies to adapt to varying microgrid operating conditions. The specific design is as follows

A (t) = (M o d e, ∆ ω \in [- 0.1, 0.1])

(15)

where

∆ ω

represents the adjustment magnitude of the inertia weight, with its initial value preset by the policy, and Mode denotes different search strategies, including Global Search, Local Adjustment, and Reliability Enhancement.

(1): Global Search Strategy: This strategy is primarily employed in scenarios with significant fluctuations in wind and solar output, encouraging particles to explore new scheduling configurations while maximizing renewable energy utilization. The parameter settings are as follows: $ω$ = 0.7 + $∆ ω$ , $c_{1}$ = 1.0, $c_{2}$ = 2.0. $c_{2}$ > $c_{1}$ emphasizes social learning, expanding the search range to enhance renewable energy utilization.
(2): Local Adjustment Strategy: When the renewable energy utilization rate is high, this strategy fine-tunes the scheduling scheme to optimize local supply–demand balance. Particle positions are adjusted based on supply–demand deviations.

$x_{i}^{t + 1} \leftarrow x_{i}^{t} + β \cdot ∆ P \cdot N (0, 1), β = 0.05$

(16)

where N(0,1) represents a standard normal distribution to control the fine-tuning range. The parameter settings are as follows: $ω$ = 0.5 + $∆ ω$ , $c_{1}$ = 0.5, $c_{2}$ = 2.0.
(3): Reliability Enhancement Strategy: This strategy balances supply and demand through energy storage regulation, prioritizing power supply reliability while maximizing renewable energy output within feasible limits. It is triggered when one of the following conditions is met: SOC(t) < 0.25 or SOC(t) > 0.85 or $η_{r i s k}$ > 0.03. In these cases, gradient descent is applied to adjust the energy storage power, mitigating power supply risks.

3.: State Transition and Reward Function Design:

State transitions are governed by the real-time operational state of the microgrid and PSO parameter adjustments. At time step t, DQN selects an action A(t) based on the ε-greedy strategy to modify PSO parameters. The particle swarm then updates its velocity and position accordingly, generating a new scheduling strategy. The microgrid subsequently executes the scheduling plan based on load variations and renewable energy fluctuations, transitioning to a new state S(t + 1). To ensure training stability, DQN employs an experience replay mechanism to store interaction data and utilizes a target network for stable Q-value updates, preventing policy oscillations.

The reward function is formulated to enhance power supply reliability while optimizing renewable energy utilization. A weighted dual-objective metric is used to guide DQN in learning scheduling strategies that optimize both objectives

R = ω_{1} η_{c l e a n} - ω_{2} η_{r i s k} + ω_{3} M o d e_B o n u s

(17)

where

ω_{1}

,

ω_{2}

, and

ω_{3}

are weight coefficients, set as

ω_{1}

= 0.4,

ω_{2}

= 0.5, and

ω_{3}

= 0.1 in this study, prioritizing renewable energy utilization while ensuring power supply reliability. The weight values are validated through pre-experiments. This study introduces the Mode_Bonus mechanism to overcome the limitations of static weighted-sum reward functions. Without Mode_Bonus, the reward function may become excessively biased toward either objective

η_{c l e a n}

or

η_{r i s k}

, potentially leading to suboptimal policy learning, for example, prioritizing reliability at the expense of exploration. The proposed mechanism allows the DQN to dynamically adapt to evolving microgrids conditions, effectively balancing the trade-off between exploration and risk control. Furthermore, it addresses the inherent limitations of fixed-weight schemes in multi-objective optimization, thereby guiding the algorithm toward global optimality in complex and uncertain environments. The Mode_Bonus is defined as follows: if

η_{c l e a n}

> 0.6, Mode_Bonus = 0.02, if

η_{r i s k}

< 0.03, Mode_Bonus = 0.015. If both conditions are met, Mode_Bonus = 0.02, otherwise Mode_Bonus = 0.

3.4. DQN-PSO Algorithm Architecture and Process

The DQN-PSO algorithm employs a dual-layer optimization architecture, where the DQN decision layer and the PSO optimization layer collaborate to achieve dynamic microgrid scheduling. The DQN decision layer selects actions based on the real-time state of the microgrid, dynamically adjusting PSO parameters and optimizing the Q-value network. It is implemented using a fully connected neural network with ReLU activation functions and the Adam optimizer, while its loss function is based on mean squared error (MSE). The PSO optimization layer fine-tunes the scheduling strategy based on the dynamically adjusted parameters. The particle swarm updates its velocity and position to search for the global optimal solution, ensuring alignment between the fitness function of PSO and the reward function of DQN.

Through a closed-loop feedback mechanism, DQN continuously refines PSO parameters and strategy selection, while PSO generates scheduling strategies and provides feedback for further optimization. This dual-layer framework overcomes the limitations of traditional PSO with fixed parameters, significantly enhancing adaptability to load fluctuations and weather variations. Figure 4 illustrates the collaborative optimization architecture.

In standalone microgrid scheduling, the system’s operational state is highly dynamic and uncertain. Traditional optimization methods often fail to adapt efficiently, leading to suboptimal or unstable scheduling results. By dynamically tuning PSO parameters in response to real-time system feedback, the DQN-PSO algorithm improves scheduling flexibility and robustness.

DQN-PSO Algorithm Steps are as follows:

Initialize the microgrid environment and set up the DQN and PSO parameters.
At each time step t
- DQN acquires the current microgrid state S(t);
- DQN selects an action A(t) using the ε-greedy strategy;
- The selected action modifies PSO parameters based on the corresponding strategy;
- PSO iterates to generate an optimized scheduling solution;
- The microgrid executes the scheduling strategy, updates its state to S(t + 1), and calculates the reward R(t);
- The interaction experience is stored in a replay buffer, and the DQN is trained by sampling from stored experiences;
- The target network is periodically updated to ensure training stability.
Repeat the above steps until the termination condition is met, and output the optimal scheduling strategy.

Figure 5 presents the detailed flowchart of the algorithm.

4. Example Analysis

In this study, a standalone wind–solar–diesel–battery microgrid, as illustrated in Figure 1, is selected as the research subject. A simulation model is developed using the open-source Python microgrid simulator pymgrid [27]. The previously established mathematical models of the microgrid system are employed to characterize the power output of each component and the evolution of the energy storage state. The photovoltaic (PV) output is modeled based on solar radiation intensity and temperature models, and the wind power output follows a Weibull wind speed distribution [28]. Pymgrid provides flexible component modeling and data processing functionalities, enabling the customization of load profiles and renewable energy output curves.

The key parameters of the generation units and the energy storage system are listed in Table 1 and Table 2 [29,30].

The typical daily profiles of load demand, photovoltaic (PV) power generation, and wind power generation for the standalone microgrid are depicted in Figure 6. The PV peak output reaches 30 kW, while the average wind power output is 55.65 kW. The load profile exhibits a dual-peak characteristic, with peaks occurring in the morning and evening.

4.1. Benchmark Experiments

To validate the effectiveness of the DQN-PSO method, it is compared against the following two baseline approaches: Standard Particle Swarm Optimization (PSO) and Random Policy. PSO Utilizes fixed parameter settings, including inertia weight and learning factors (

ω = 0.7

,

c_{1} = c_{2} = 2.0

). Random Policy Serves as a lower-bound baseline, where the diesel generator (DG) and battery power outputs are randomly allocated within feasible limits. Figure 7 provides a visual comparison of the DG power output and battery storage output over 24 h for the three methods.

As observed from Experimental Results, the DQN-PSO method (red solid line) demonstrates adaptive scheduling. During peak load periods, the DG output closely follows the demand, while during low-demand periods, DG utilization is minimized, relying primarily on renewable energy and battery storage. Power shortages (red ‘×’) occur only occasionally, typically during rapid load transitions. The battery storage system (blue solid line) efficiently manages charging and discharging, discharging effectively during peak load and charging up to −20 kW during off-peak hours, ensuring state-of-charge (SOC) balance. In contrast, the standard PSO method (purple dashed line) exhibits more pronounced DG fluctuations with frequent power shortages (purple ‘×’), especially between 0–5 h and 10–15 h. Its battery charging and discharging behavior (cyan dashed line) is irregular, failing to effectively support load demand. The random policy (gray dotted line) results in unpredictable DG output fluctuations, widespread power shortages (gray ‘×’), and battery storage output (light gray dotted line) that randomly oscillates between −20 kW and 20 kW, lacking coordination. These results demonstrate that DQN-PSO optimally schedules DG utilization and battery storage, significantly enhancing renewable energy utilization while reducing fossil fuel dependency.

4.2. Performance Comparison of Indicators

To comprehensively evaluate the dynamic adaptability of the DQN-PSO algorithm, three microgrid operation scenarios are designed. These scenarios introduce different perturbation factors based on typical daily data (Figure 6), simulating complex environments. Scenario 1 serves as the baseline scenario without additional perturbations, representing a typical microgrid operation environment with relatively stable supply–demand balance. Scenario 2 is the high-fluctuation scenario, where Gaussian noise is added to the PV and WT outputs (with a noise standard deviation of 20% and a minimum truncation value of 0) to simulate the uncertainty in power generation caused by extreme weather conditions or equipment fluctuations, while the load remains unchanged. This scenario increases the variability of renewable energy output, making supply–demand matching more challenging. Scenario 3 simulates low electricity demand during nighttime or holidays, where the PV and WT outputs remain the same as in the baseline scenario, but the load is reduced by 20%, increasing the proportion of renewable energy utilization.

All data are generated using pymgrid, with component outputs computed based on the previously established mathematical models, ensuring that simulation results accurately reflect real-world microgrid behavior. The scheduling period is set to 24 h, with a time step of 1 h. In each scenario, the algorithm runs 10 times, and the mean and standard deviation of key performance metrics are recorded, as shown in Table 3.

The results indicate that the DQN-PSO method outperforms the Standard PSO in all three scenarios. For example, in the baseline scenario, DQN-PSO improves renewable energy utilization by approximately 3.2% while significantly reducing power supply risk. In the high-fluctuation scenario, DQN-PSO increases renewable energy utilization by about 4.5% and reduces power supply risk by 3%, demonstrating its adaptability to uncertain environments. In the low-load scenario, DQN-PSO increases renewable energy utilization to 78.09%, showcasing its superior resource utilization capability.

To further analyze the dynamic adaptability of the DQN-PSO algorithm, this study examines the average usage frequency of the three strategies (Global Search, Local Adjustment, and Reliability Enhancement) in the three scenarios, based on statistics from 10 runs (as shown in Figure 8, Figure 9 and Figure 10). These strategies are dynamically selected by DQN for global exploration, local optimization, and risk control, respectively.

In the baseline scenario, Local Adjustment dominates, indicating that the algorithm primarily responds to normal fluctuations in load and renewable energy output through local optimization. Global Search is mainly used during PV output peaks to optimize renewable energy utilization. Reliability Enhancement is used less frequently, reflecting the low demand for risk control in a stable supply–demand environment.

In the high-fluctuation scenario, Local Adjustment frequency significantly increases, suggesting that the algorithm prioritizes local adjustments to handle the substantial variations in PV and WT output. Reliability Enhancement is used at a moderate level to help mitigate power supply risks. Global Search is rarely used, indicating that under high uncertainty conditions, the priority of global exploration decreases.

In the low-load scenario, Local Adjustment remains dominant, indicating that the algorithm adapts to the reduced load through local optimization. Global Search is used more frequently during PV output peaks to optimize renewable energy utilization. Reliability Enhancement is less needed, reflecting the lower risk of power supply issues under low-load conditions.

The strategy usage frequency distributions in Figure 8, Figure 9 and Figure 10 align closely with the performance data in Table 3. For instance, in the high-fluctuation scenario, the significant increase in Reliability Enhancement frequency (0.6–0.9) directly reduces power supply risk (from 4.04% to 1.04%). In the low-load scenario, the moderate use of Global Search (0.1–0.4) contributes to an increase in renewable energy utilization to 78.09%. However, the dominant use of Reliability Enhancement across all scenarios (frequency 0.5–0.9) may indicate an excessive preference for risk control, potentially limiting the effectiveness of Global Search in stable conditions. Future improvements could involve adjusting the reward function weights or mode-switching thresholds to further optimize strategy allocation and enhance overall algorithm performance.

4.3. Parameter Sensitivity Analysis

This study further analyzes the impact of DQN parameters (learning rate α and discount factor γ) and PSO parameters (inertia weight) on the algorithm’s performance. In the baseline scenario (Scenario 1), each parameter combination is tested over 10 runs, with renewable energy utilization and power supply risk recorded to evaluate performance sensitivity to parameter variations.

Figure 11 presents the sensitivity analysis results of the DQN-PSO algorithm with respect to inertia weight and learning rate α. Subfigures (a) and (b) illustrate the effect of inertia weight on renewable energy utilization and power supply risk, respectively, while subfigures (c) and (d) depict the impact of learning rate α. The results indicate that the algorithm achieves optimal performance when the inertia weight is set to 0.7, and the learning rate α is 0.0005.

Figure 12 shows the sensitivity analysis results of the DQN-PSO algorithm concerning the discount factor γ. Subfigures (a) and (b) illustrate the impact of γ on renewable energy utilization and power supply risk, respectively. The analysis reveals that the best balance between the two performance metrics is achieved when γ = 0.98.

5. Discussion

This study proposes a DQN-PSO-based scheduling method for standalone microgrids and validates its effectiveness in enhancing renewable energy utilization and ensuring power supply reliability through simulation experiments. The results demonstrate that the proposed method significantly improves microgrid operational performance under various scenarios. In the typical-day scenario, the clean energy utilization rate increased from 65.28% (standard PSO) to 68.51%, while the power supply reliability risk decreased from 2.45% to 0.70%. Under high-fluctuation conditions, clean energy utilization improved by approximately 5%, and the reliability risk was reduced by about 3%, indicating strong adaptability to environmental variability.

These improvements are primarily attributed to the dynamic parameter tuning and multi-strategy switching mechanisms of the DQN-PSO algorithm. The DQN component leverages deep Q-networks to dynamically optimize PSO parameters, thereby enhancing global search capabilities and effectively mitigating the risk of local optima—a common issue in conventional PSO. Parameter sensitivity analysis suggests that the optimal configuration is w = 0.7, α = 0.0005, and γ = 0.98. Notably, the discount factor γ should be adjusted in accordance with actual microgrid operating conditions to balance multiple optimization objectives. In extreme scenarios, the algorithm’s global search capability may still be constrained by the initial particle distribution, introducing a risk of suboptimal convergence. To counter this, the multi-strategy switching mechanism enhances the robustness of the scheduling process. DQN-PSO can dynamically alternate between distinct search strategies based on real-time operating conditions, thereby effectively addressing renewable energy uncertainties. For instance, in highly volatile scenarios, the algorithm prioritizes exploratory strategies to better accommodate rapid fluctuations in supply and demand.

Compared with traditional PSO, the adaptive nature of DQN-PSO markedly improves scheduling precision, particularly in complex and dynamic environments. Furthermore, the core mechanisms underlying this approach offer strong potential for broader applications in domains requiring dynamic optimization and robust decision-making—such as real-time energy management in intelligent traction power systems [31] and the optimal control of shipboard power systems [32].

6. Conclusions

This paper proposes a DQN-PSO-based scheduling method for standalone microgrids. By dynamically adjusting PSO parameters and introducing a multi-strategy switching mechanism, the proposed approach effectively enhances renewable energy utilization and mitigates power supply risks. These findings offer novel insights and technical support for the advancement of intelligent microgrid scheduling. Future research directions include the integration of multi-agent reinforcement learning (MARL) to facilitate coordinated optimization among multiple interconnected microgrids. Additionally, the application of transfer learning techniques could improve the generalization capability of the algorithm across diverse operating conditions. Moreover, refining the reward function by developing adaptive reward mechanisms or the incorporation of real-time electricity pricing and demand response strategies may further enhance the economic efficiency and real-world applicability of the proposed framework.

Author Contributions

Writing—original draft preparation, B.Z.; writing—review and editing, B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 61972147), the Major Science and Technology Projects in Yunnan Province (Grant No. 202202AE090032), and the Hunan Province College Students’ Innovation and Entrepreneurship Training Program (Grant Nos. s202410537132, s202410537131, s202410537126).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Yang, B.; Li, Y.; Yao, W.; Jiang, L.; Zhang, C.; Duan, C.; Ren, Y. Optimization and Control of New Power Systems under the Dual Carbon Goals: Key Issues, Advanced Techniques, and Perspectives. Energies 2023, 16, 3904. [Google Scholar] [CrossRef]
Hubble, A.H.; Ustun, T.S. Composition, placement, and economics of rural microgrids for ensuring sustainable development. Sustain. Energy Grids Netw. 2018, 13, 1–18. [Google Scholar] [CrossRef]
Kumar, A.; Singh, A.R.; Deng, Y.; He, X.; Kumar, P.; Bansal, R.C. Integrated assessment of a sustainable microgrid for a remote village in hilly region. Energy Convers. Manag. 2019, 180, 442–472. [Google Scholar] [CrossRef]
Pisei, S.; Choi, J.Y.; Lee, W.P.; Won, D.J. Optimal power scheduling in multi-microgrid system using particle swarm optimization. J. Electr. Eng. Technol. 2017, 12, 1329–1339. [Google Scholar] [CrossRef]
Hossain, M.A.; Pota, H.R.; Squartini, S.; Zaman, F.; Guerrero, J.M. Energy scheduling of community microgrid with battery cost using particle swarm optimisation. Appl. Energy 2019, 254, 113723. [Google Scholar] [CrossRef]
Zheng, Z.; Yang, S. Particle swarm optimisation for scheduling electric vehicles with microgrids. In Proceedings of the 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 19–24 July 2020; pp. 1–7. [Google Scholar] [CrossRef]
Zhang, H.; Li, G.; Wang, S. Optimization dispatching of isolated island microgrid based on improved particle swarm optimization algorithm. Energy Rep. 2022, 8, 420–428. [Google Scholar] [CrossRef]
Ge, J. Research on economic environment scheduling optimization of power system based on multi-objective particle swarm optimization. Process. Integr. Optim. Sustain. 2024, 9, 275–290. [Google Scholar] [CrossRef]
Guan, Z.; Wang, H.; Li, Z.; Luo, X.; Yang, X.; Fang, J.; Zhao, Q. Multi-Objective Optimal Scheduling of Microgrids Based on Improved Particle Swarm Algorithm. Energies 2024, 17, 1760. [Google Scholar] [CrossRef]
Lei, L.; Tan, Y.; Dahlenburg, G.; Xiang, W.; Zheng, K. Dynamic energy dispatch based on deep reinforcement learning in IoT-driven smart isolated microgrids. IEEE Internet Things J. 2020, 8, 7938–7953. [Google Scholar] [CrossRef]
Totaro, S.; Boukas, I.; Jonsson, A.; Cornelusse, B. Lifelong control of off-grid microgrid with model-based reinforcement learning. Energy 2021, 232, 121035. [Google Scholar] [CrossRef]
Meng, Q.; Hussain, S.; Luo, F.; Wang, Z.; Jin, X. An online reinforcement learning-based energy management strategy for microgrids with centralized control. IEEE Trans. Ind. Appl. 2024, 61, 1501–1510. [Google Scholar] [CrossRef]
Wen, Y.; Fan, P.; Hu, J.; Ke, S.; Wu, F.; Zhu, X. An Optimal Scheduling Strategy of a Microgrid with V2G Based on Deep Q-Learning. Sustainability 2022, 14, 10351. [Google Scholar] [CrossRef]
Domínguez-Barbero, D.; García-González, J.; Sanz-Bobi, M.A.; Sánchez-Úbeda, E.F. Optimising a Microgrid System by Deep Reinforcement Learning Techniques. Energies 2020, 13, 2830. [Google Scholar] [CrossRef]
Pan, W.; Yu, X.; Guo, Z.; Qian, T.; Li, Y. Online EVs Vehicle-to-Grid Scheduling Coordinated with Multi-Energy Microgrids: A Deep Reinforcement Learning-Based Approach. Energies 2024, 17, 2491. [Google Scholar] [CrossRef]
Liang, T.; Chai, L.; Cao, X.; Tan, J.; Jing, Y.; Lv, L. Real-time optimization of large-scale hydrogen production systems using off-grid renewable energy: Scheduling strategy based on deep reinforcement learning. Renew. Energy 2024, 224, 120177. [Google Scholar] [CrossRef]
Yin, S.; Jin, M.; Lu, H.; Gong, G.; Mao, W.; Chen, G.; Li, W. Reinforcement-learning-based parameter adaptation method for particle swarm optimization. Complex Intell. Syst. 2023, 9, 5585–5609. [Google Scholar] [CrossRef]
Wang, F.; Wang, X.; Sun, S. A reinforcement learning level-based particle swarm optimization algorithm for large-scale optimization. Inf. Sci. 2022, 602, 298–312. [Google Scholar] [CrossRef]
Huang, H.; Jin, C. A novel particle swarm optimization algorithm based on reinforcement learning mechanism for AUV path planning. Complexity 2021, 2021, 8993173. [Google Scholar] [CrossRef]
Zhang, Z.; Li, J.; Lei, Z.; Zhu, Q.; Cheng, J.; Gao, S. Reinforcement learning-based particle swarm optimization for wind farm layout problems. Energy 2024, 313, 134050. [Google Scholar] [CrossRef]
Gao, Y.J.; Shang, Q.X.; Yang, Y.Y.; Hu, R.; Qian, B. Improved particle swarm optimization algorithm combined with reinforcement learning for solving flexible job shop scheduling problem. In Proceedings of the International Conference on Intelligent Computing, Zhengzhou, China, 10–13 August 2023; Springer Nature Singapore: Gateway, Singapore, 2023; Volume 14086, pp. 288–298. [Google Scholar] [CrossRef]
Li, L.; Tao, S.; Xiao, X.; Chen, P. Improved photovoltaic model and its application in reliability evaluation of microgrid. Adv. Technol. Electr. Eng. Energy 2016, 35, 65–71. [Google Scholar] [CrossRef]
Shao, Z.; Liu, Y.; Zhang, Y. Affine modelling method of wind speed-power characteristics in wind farm based on measured data. Electr. Power Autom. Equip. 2019, 39, 96–101. [Google Scholar] [CrossRef]
Liu, H.W.; Wang, S.T.; Liu, G.C. Operation strategy of optimal capacity configuration for microgrid with WTPV-DE-BES. Acta Energiae Solaris Sin. 2022, 43, 453–460. [Google Scholar]
Wang, D.; Tan, D.; Liu, L. Particle swarm optimization algorithm: An overview. Soft Comput. 2018, 22, 387–408. [Google Scholar] [CrossRef]
Zeng, D.; Yan, T.; Zeng, Z.; Liu, H.; Guan, P. A hyperparameter adaptive genetic algorithm based on DQN. J. Circuits Syst. Comput. 2023, 32, 2350062. [Google Scholar] [CrossRef]
Henri, G.; Levent, T.; Halev, A.; Alami, R.; Cordier, P. pymgrid: An open-source python microgrid simulator for applied artificial intelligence research. arXiv 2020, arXiv:2011.08004. [Google Scholar] [CrossRef]
Zhang, Z.; Chen, B.; Wu, M.; Che, B.; Yang, Y.; Liu, T. A stochastic energy optimization of stand-alone electrical microgrid with storage systems and demand management. Intell. Decis. Technol. 2025. [Google Scholar] [CrossRef]
Luo, J.; Zhang, W.; Wang, H.; Deng, L. Research on Optimal Scheduling of Micro-Grid Based on Deep Reinforcement Learning. J. Electr. Power 2023, 38, 54–63. (In Chinese) [Google Scholar] [CrossRef]
Hengan, C.; Lin, G.; Cao, L.U. Multi-objective Optimal Dispatch Model and Its Algorithm in Isolated Microgrid With Renewable Energy Generation as Main Power Supply. Power Syst. Technol. 2020, 44, 664–674. (In Chinese) [Google Scholar] [CrossRef]
Ying, Y.; Tian, Z.; Wu, M.; Liu, Q.; Tricoli, P. A Real-Time Energy Management Strategy of Flexible Smart Traction Power Supply System Based on Deep Q-Learning. IEEE Trans. Intell. Transp. Syst. 2024, 25, 8938–8948. [Google Scholar] [CrossRef]
Fu, J.; Sun, D.; Peyghami, S.; Blaabjerg, F. A Novel Reinforcement-Learning-Based Compensation Strategy for DMPC-Based Day-Ahead Energy Management of Shipboard Power Systems. IEEE Trans. Smart Grid 2024, 15, 4349–4363. [Google Scholar] [CrossRef]

Figure 1. System of standalone microgrid.

Figure 2. Wind turbine power characteristic curve.

Figure 3. PSO flow chart.

Figure 4. DQN-PSO collaborative optimization architecture diagram.

Figure 5. DQN-PSO algorithm flow chart.

Figure 6. Typical daily load and wind and photovoltaic power profiles for standalone microgrid.

Figure 7. Diesel generator power comparison chart and battery output comparison chart.

Figure 8. The average frequency of strategy usage in the baseline scenario.

Figure 9. The average frequency of strategy usage in the high-fluctuation scenario.

Figure 10. The average frequency of strategy usage in the low-load scenario.

Figure 11. Sensitivity analysis of DQN-PSO parameters (α and inertia weight).

Figure 12. Sensitivity analysis of DQN-PSO parameters (γ).

Table 1. Main parameters of the power generation unit.

Equipment Type	Parameter
PV	$P_{S T C}$ = 40 kW
WT	$P_{r}$ = 80 kW
DG	$P_{D G}$ = 65 kW

Table 2. Capacity of generators and parameters of storage.

Parameter	Value	Parameter	Value
$E_{r}$ /kWh	200	$P_{c h_m a x}$ /kW	40
${S O C}_{m a x}$	0.9	$P_{d i s_m a x}$ /kW	40
${S O C}_{m i n}$	0.2	$η_{c h}$	0.95
SOC(0)	0.5	$η_{d i s}$	0.95

Table 3. Performance Comparison Across Scenarios (Mean ± Std).

Scenario	Algorithm	$η_{c l e a n}$ (%)	$η_{r i s k}$ (%)
Scenario 1	Standard PSO	65.28 ± 1.66	2.45 ± 1.84
Scenario 1	DQN-PSO	68.51 ± 1.58	0.70 ± 0.43
Scenario 2	Standard PSO	61.74 ± 1.82	4.04 ± 3.97
Scenario 2	DQN-PSO	66.20 ± 1.71	1.04 ± 0.80
Scenario 3	Standard PSO	67.15 ± 2.82	0.82 ± 1.00
Scenario 3	DQN-PSO	78.09 ± 2.35	0.30 ± 0.23

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, B.; Liu, B. An Adaptive Scheduling Method for Standalone Microgrids Based on Deep Q-Network and Particle Swarm Optimization. Energies 2025, 18, 2133. https://doi.org/10.3390/en18082133

AMA Style

Zhang B, Liu B. An Adaptive Scheduling Method for Standalone Microgrids Based on Deep Q-Network and Particle Swarm Optimization. Energies. 2025; 18(8):2133. https://doi.org/10.3390/en18082133

Chicago/Turabian Style

Zhang, Borui, and Bo Liu. 2025. "An Adaptive Scheduling Method for Standalone Microgrids Based on Deep Q-Network and Particle Swarm Optimization" Energies 18, no. 8: 2133. https://doi.org/10.3390/en18082133

APA Style

Zhang, B., & Liu, B. (2025). An Adaptive Scheduling Method for Standalone Microgrids Based on Deep Q-Network and Particle Swarm Optimization. Energies, 18(8), 2133. https://doi.org/10.3390/en18082133

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

An Adaptive Scheduling Method for Standalone Microgrids Based on Deep Q-Network and Particle Swarm Optimization

Abstract

1. Introduction

2. Standalone Microgrid System Model

2.1. Mathematical Model of the Microgrid System

2.2. System Constraints

2.3. Optimization Objectives

3. Scheduling Method

3.1. Standard PSO Algorithm

3.2. Deep Q-Network Module

3.2.1. Neural Network Architecture

3.2.2. Training Process Based on Markov Decision Process (MDP)

3.3. DQN-PSO Collaborative Optimization Algorithm

3.4. DQN-PSO Algorithm Architecture and Process

4. Example Analysis

4.1. Benchmark Experiments

4.2. Performance Comparison of Indicators

4.3. Parameter Sensitivity Analysis

5. Discussion

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI