1. Introduction
The global renewable energy market is undergoing rapid growth and profound transformation. Under the framework of the “dual carbon” targets, the integration of renewable energy into modern power systems has become increasingly widespread [1]. As a self-sustaining energy system that operates independently of conventional power grids, standalone wind–solar–diesel–storage microgrids integrate photovoltaic (PV) generation, wind power, energy storage systems, and backup diesel generators to facilitate local electricity production and consumption. These microgrids offer a cost-effective and efficient power supply solution for remote areas and regions where grid extension is either impractical or prohibitively expensive [2,3]. However, the operation of standalone microgrids is hindered by the inherent intermittency of distributed energy sources and the uncertainty of load demand. Under the constraint of relying solely on internal resources, achieving energy balance, enhancing the utilization of distributed renewable energy, and ensuring reliable power supply remain critical research challenges in the field of standalone microgrid optimization.
Optimization-based scheduling plays a crucial role in mitigating operational challenges in standalone microgrids. Because scheduling is a high-dimensional, nonlinear, and multi-constraint optimization problem, various studies have employed traditional optimization methods for its solution. Among these, Particle Swarm Optimization (PSO) has been widely applied in microgrid scheduling due to its strong scalability and ease of implementation [4,5,6]. However, the performance of PSO is highly sensitive to its parameter settings, and fixed parameter configurations often struggle to adapt to varying operating conditions, leading to premature convergence and local optima. To overcome this limitation, researchers have introduced adaptive mechanisms to enhance PSO. For instance, Zhang et al. [7] proposed an improved PSO algorithm with adaptive inertia weight and constriction factor for economic scheduling in islanded microgrids, yielding promising results. Ge [8] designed a multi-objective PSO-based power system scheduling model, enhancing convergence and solution quality through dynamic weight adjustment. Guan et al. [9] optimized PSO parameters and particle velocity transformation to improve efficiency and reliability in solving economic dispatch problems in microgrids.
In recent years, reinforcement learning (RL) has gained increasing attention as a promising approach for microgrid scheduling, owing to its capabilities in online learning and dynamic optimization. By leveraging the state–action–reward mechanism, RL effectively addresses the uncertainties associated with distributed energy resources and load fluctuations [10,11,12]. However, conventional RL algorithms often encounter significant challenges when applied to high-dimensional state spaces and complex multi-objective optimization problems, particularly in terms of computational efficiency and convergence stability. To overcome these limitations, deep reinforcement learning (DRL) integrates deep neural networks into RL frameworks, thereby enhancing adaptability in dynamic and uncertain environments. For example, Wen et al. [13] employed Deep Q-Networks (DQN) to optimize the scheduling of electric vehicle (EV)-integrated microgrids under renewable energy uncertainty. Domínguez-Barbero et al. [14] adopted DQN to achieve cost-efficient operation in islanded microgrids. Pan et al. [15] developed a dynamic scheduling framework based on the Soft Actor-Critic (SAC) algorithm to enhance both economic and environmental performance. Liang et al. [16] utilized the Deep Deterministic Policy Gradient (DDPG) algorithm to enable continuous control in off-grid renewable energy systems. These studies collectively demonstrate the significant potential of DRL in managing renewable energy variability and addressing complex scheduling objectives. Nevertheless, several critical challenges remain unresolved, including high data dependency, slow convergence rates, and limited generalization ability under extreme or unseen conditions.
Given these challenges, the fusion of intelligent optimization algorithms and reinforcement learning has attracted growing interest, showcasing significant potential in complex environments. For example, Yin et al. [17] proposed a reinforcement learning-enhanced PSO (RLPSO), which dynamically adjusts PSO parameters to improve convergence and global search capability. Wang et al. [18] developed a reinforcement learning-level PSO (RLLPSO) to effectively address large-scale optimization problems. Huang et al. [19] embedded a reinforcement learning feedback mechanism into PSO for autonomous underwater vehicle path planning, enhancing search efficiency and adaptability. Additionally, Zhang et al. [20] combined reinforcement learning with PSO to optimize wind farm layouts, demonstrating its advantages in handling complex constraints. Gao et al. [21] introduced an improved PSO (IPSO_RL) with Q-learning-based dynamic inertia weight adjustment, exhibiting superior performance in flexible job shop scheduling problems, further validating the potential of combining RL with intelligent optimization algorithms.
Although both Particle Swarm Optimization (PSO) and reinforcement learning (RL) have shown promise in microgrid scheduling, conventional PSO suffers from fixed parameter settings, which limits its ability to handle rapid fluctuations in solar and wind power output and makes it prone to premature convergence. On the other hand, while RL and DRL offer dynamic optimization capabilities, they often face challenges such as slow convergence rates and substantial data requirements. Furthermore, existing hybrid RL-PSO approaches are primarily applied to areas such as path planning and system layout optimization, and a well-established framework for multi-objective scheduling in standalone microgrids—especially under high uncertainty—has yet to emerge. These limitations underscore the pressing need for an efficient hybrid framework that integrates the global search capability of PSO with the adaptive learning capacity of RL.
To address this problem, we propose an adaptive microgrid scheduling optimization approach based on a Deep Q-Network and Particle Swarm Optimization (DQN-PSO) framework. The main contributions of this work are summarized as follows:
- (1)
We design a DQN-PSO adaptive optimization framework in which a Deep Q-Network module perceives the microgrid’s operational state and dynamically adjusts key PSO parameters, including the inertia weight and acceleration coefficients. This enables the PSO algorithm to respond more flexibly to changing system conditions, thereby mitigating the issue of local optima in dynamic environments and enhancing overall scheduling performance.
- (2)
We propose three novel adaptive scheduling strategies—Global Search, Local Adjustment, and Reliability Enhancement—designed to improve performance under diverse operating conditions: a. Global Search broadens the solution space to facilitate comprehensive optimization; b. Local Adjustment fine-tunes scheduling decisions to accommodate real-time supply–demand imbalances; and c. Reliability Enhancement prioritizes the dispatch of energy storage to maintain power supply continuity. These strategies collectively improve system flexibility and operational stability.
- (3)
To evaluate the effectiveness of the proposed method, we develop a representative standalone microgrid simulation model and conduct comparative experiments under multiple scenarios. The results demonstrate that the proposed approach achieves superior performance in terms of clean energy utilization and power supply reliability, thereby validating its practical applicability.
The structure of the paper is as follows:
Section 2 presents the system composition and mathematical model of the standalone microgrid.
Section 3 elaborates on the theoretical framework and cooperative mechanism of the proposed DQN-PSO method.
Section 4 describes the simulation experiments, evaluates performance, and compares the advantages and limitations of DQN-PSO with traditional methods.
Section 5 presents the experimental results and provides a detailed analysis of their significance, while
Section 6 summarizes the key research contributions and outlines potential directions for future work.
3. Scheduling Method
Building on the system model and constraints, this study proposes a hybrid Deep Q-Network and Particle Swarm Optimization (DQN-PSO) approach for optimal scheduling in standalone microgrids. By integrating reinforcement learning with swarm intelligence, this method dynamically adapts to renewable energy fluctuations, enhancing both clean energy utilization and supply reliability.
3.1. Standard PSO Algorithm
Particle Swarm Optimization (PSO) is a heuristic optimization technique inspired by the foraging behavior of bird flocks [25]. Due to its simplicity and efficiency, PSO has been widely applied in optimization problems, including microgrid scheduling. The basic PSO process is illustrated in Figure 3.
Each particle in PSO represents a candidate solution within the search space and moves iteratively to explore the global optimum. The movement of a particle is influenced by its personal best solution and the global best solution of the swarm. The velocity and position update equations are given by the following equations:

$$v_i(t+1) = \omega v_i(t) + c_1 r_1 \left(p_i^{\text{best}} - x_i(t)\right) + c_2 r_2 \left(g^{\text{best}} - x_i(t)\right)$$

$$x_i(t+1) = x_i(t) + v_i(t+1)$$

where $\omega$ is the inertia weight (typically ranging from 0.4 to 0.9), controlling the search range and convergence speed; $c_1$ and $c_2$ are the cognitive and social learning factors, respectively, commonly set to 2; $r_1$ and $r_2$ are random numbers in [0, 1]; $p_i^{\text{best}}$ and $g^{\text{best}}$ represent the personal and global best positions; $v_i$ and $x_i$ denote the velocity and position of particle $i$, respectively; and $t$ is the iteration index.
In microgrid scheduling, PSO optimizes the output of various system components. A particle’s position represents a candidate scheduling strategy, including photovoltaic (PV) power allocation, wind power allocation, battery charge/discharge scheduling, and diesel generator output. The fitness function is designed to maximize clean energy utilization while ensuring power supply reliability.
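As a minimal illustration of the update rules applied to a scheduling particle, the following sketch assumes a position vector that concatenates hourly PV, wind, battery, and diesel set-points, with a placeholder fitness function standing in for the clean-energy and reliability objectives; the variable names are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 24                      # scheduling horizon (hours)
dim = 4 * T                 # [PV, wind, battery, diesel] set-points per hour

omega, c1, c2 = 0.7, 2.0, 2.0   # inertia weight and learning factors

# One particle: position x encodes a candidate 24 h schedule, v its velocity.
x = rng.uniform(-1.0, 1.0, dim)
v = np.zeros(dim)
p_best = x.copy()               # personal best position
g_best = x.copy()               # global best position (shared by the swarm)

def fitness(position):
    """Placeholder fitness: in the paper it rewards clean-energy
    utilization and penalizes supply-reliability violations."""
    return -np.sum(position ** 2)

# Single PSO iteration (velocity and position update equations above)
r1, r2 = rng.random(dim), rng.random(dim)
v = omega * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
x = x + v

if fitness(x) > fitness(p_best):
    p_best = x.copy()
```

In practice, the fitness evaluation would enforce the scheduling constraints of Section 2 and penalize any supply–demand imbalance.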
However, standard PSO struggles with dynamic environmental changes due to fixed parameter configurations, leading to premature convergence or suboptimal performance. Some examples are as follows:
Sudden load variations: Fixed inertia weights may cause premature convergence, limiting exploration of optimal scheduling strategies.
Weather fluctuations: Fixed learning factors restrict the algorithm’s ability to rapidly adjust to variations in renewable energy generation.
To overcome these limitations, an adaptive mechanism is introduced to dynamically adjust PSO parameters based on real-time system conditions.
3.2. Deep Q-Network Module
Reinforcement learning (RL) enables an agent to interact with its environment and optimize decision-making strategies based on reward feedback. In this study, a Deep Q-Network (DQN) is incorporated to dynamically adjust key PSO parameters, enhancing the adaptability of microgrid scheduling. Prior studies [26] have demonstrated that DQN improves convergence speed and global search performance in optimization tasks.
DQN employs deep learning techniques to approximate the Q-value function, which estimates the expected return of an action in a given state. Compared to traditional Q-learning, DQN mitigates the curse of dimensionality and instability issues, making it well-suited for high-dimensional microgrid scheduling problems.
3.2.1. Neural Network Architecture
The DQN architecture consists of an input layer, two hidden layers, and an output layer:
Input layer: Receives the microgrid state vector, including renewable generation, load demand and battery state of charge (SOC).
Hidden layers: Two fully connected layers, each with 32 neurons, optimized for computational efficiency while preserving critical feature representations. The ReLU activation function enhances non-linear mapping capabilities.
Output layer: Generates Q-values for all possible actions, predicting expected rewards using a linear activation function.
DQN is implemented using TensorFlow, with the Adam optimizer fine-tuning network weights. The loss function is defined as

$$L(\theta) = \mathbb{E}\left[\left(y - Q(s, a; \theta)\right)^2\right]$$

where $Q(s, a; \theta)$ is the predicted Q-value for action $a$ in state $s$, parameterized by $\theta$, and $y$ is the target Q-value, computed as

$$y = r + \gamma \max_{a'} Q(s', a'; \theta^{-})$$

where $r$ is the immediate reward, reflecting the system’s return for taking action $a$ in state $s$; $\gamma$ is the discount factor, determining the importance of future rewards; $s'$ is the new state; $Q(s', a'; \theta^{-})$ is the Q-value of the next state–action pair, evaluated by the target network; and $\theta^{-}$ represents the parameters of the target network, which are periodically updated from the main Q-network to stabilize training.
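To make the network structure and loss concrete, the sketch below builds a Q-network matching the description above (state input, two 32-neuron ReLU hidden layers, linear Q-value outputs), a target network, and the TD target $y = r + \gamma \max_{a'} Q(s', a'; \theta^{-})$. The state dimension, number of discrete actions, and variable names are illustrative assumptions, not values taken from the paper's implementation.

```python
import numpy as np
import tensorflow as tf

STATE_DIM = 6      # e.g., load, PV, wind, diesel output, SOC, SOC change rate
NUM_ACTIONS = 9    # assumed: combinations of inertia-weight adjustment and search mode
GAMMA = 0.98       # discount factor reported in the sensitivity analysis

def build_q_network():
    """Q-network: state vector in, one Q-value per discrete action out."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(STATE_DIM,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(NUM_ACTIONS, activation="linear"),
    ])

q_net = build_q_network()
target_net = build_q_network()
target_net.set_weights(q_net.get_weights())   # periodic synchronization
q_net.compile(optimizer=tf.keras.optimizers.Adam(5e-4), loss="mse")

def td_targets(rewards, next_states, dones):
    """y = r + gamma * max_a' Q(s', a'; theta^-), with bootstrapping
    disabled on terminal transitions."""
    next_q = target_net.predict(next_states, verbose=0)
    return rewards + GAMMA * (1.0 - dones) * next_q.max(axis=1)
```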
3.2.2. Training Process Based on Markov Decision Process (MDP)
The DQN training process follows an MDP framework, comprising six key steps:
State Observation: Collect real-time microgrid operational data (e.g., renewable generation, load demand, battery SOC).
Action Selection: Apply an ε-greedy policy to balance exploration and exploitation.
Environment Interaction: Execute the selected scheduling action and observe system response.
Experience Replay: Store state–action–reward transitions in a replay buffer and sample randomly during training.
Q-Value Update: Train the Q-network using mini-batch gradient descent.
Target Network Synchronization: Periodically update the target network parameters to reduce training variance and improve convergence stability.
To enhance learning stability, an experience replay mechanism stores past interactions, allowing the model to sample diverse experiences instead of consecutive time steps. This mitigates temporal correlation issues and improves generalization. Additionally, a separate target network helps stabilize Q-value updates by reducing variance, ensuring smoother convergence.
DQN employs an ε-greedy strategy for action selection, where the agent chooses a random action with probability ε (exploration) and selects the action with the highest Q-value with probability 1 − ε (exploitation). The value of ε is gradually decayed during training to transition from exploration to exploitation, enabling more efficient learning.
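Continuing the previous sketch, the ε-greedy selection and experience replay steps described above might be implemented as follows; the buffer size, batch size, and decay schedule are illustrative choices not specified in the paper.

```python
import random
from collections import deque

import numpy as np

replay_buffer = deque(maxlen=10_000)   # stores (s, a, r, s', done) transitions
BATCH_SIZE = 32
epsilon, EPS_MIN, EPS_DECAY = 1.0, 0.05, 0.995

def select_action(state, q_net, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(NUM_ACTIONS)
    q_values = q_net.predict(state[np.newaxis, :], verbose=0)[0]
    return int(np.argmax(q_values))

def train_step(q_net, target_net):
    """Sample a random mini-batch to break temporal correlation, then fit."""
    if len(replay_buffer) < BATCH_SIZE:
        return
    batch = random.sample(replay_buffer, BATCH_SIZE)
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
    targets = td_targets(rewards, next_states, dones)
    q_values = q_net.predict(states, verbose=0)
    q_values[np.arange(BATCH_SIZE), actions] = targets
    q_net.fit(states, q_values, verbose=0)
```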
3.3. DQN-PSO Collaborative Optimization Algorithm
DQN effectively maps state–action relationships in dynamic environments, offering a flexible and adaptive optimization framework for microgrid scheduling. To address the limitations of traditional PSO in wind–solar–diesel–storage microgrid scheduling, this study introduces a DQN-PSO collaborative optimization approach. This approach incorporates three dynamic search strategies and leverages DQN’s reinforcement learning capabilities to adaptively fine-tune key PSO parameters, thereby improving global search efficiency and optimization effectiveness in complex, dynamic environments.
A key innovation of DQN-PSO, compared with existing research, is its adaptive dual-layer optimization architecture, which perceives the real-time operational state of the microgrid and dynamically optimizes PSO parameters, making it particularly effective for high-dimensional, nonlinear, and multi-constraint microgrid scheduling problems. Existing RL-PSO methods typically use tabular Q-learning or basic reinforcement learning techniques to adjust PSO parameters; these approaches often suffer from the curse of dimensionality in high-dimensional state spaces, and their lack of dynamic strategy switching limits their adaptability in complex microgrid environments. In contrast, the proposed DQN-PSO method employs a deep neural network to approximate the Q-value function, allowing it to effectively handle dynamic and complex microgrid states. Building on this, the study models the PSO parameter adjustment process as a Markov Decision Process (MDP) and employs DQN to derive the optimal parameter adjustment strategy, enabling PSO to adaptively refine its search behavior in response to system dynamics. The MDP framework comprises four core elements: state space, action space, state transitions, and reward function. They are structured as follows:
- 1.
State Space Design:
The state space is represented by the operational state vector of the microgrid, designed to comprehensively capture the system’s dynamic characteristics and operational constraints. The state vector includes the following key variables

$$S(t) = \left[P_{\mathrm{load}}(t),\; P_{\mathrm{pv}}(t),\; P_{\mathrm{wt}}(t),\; P_{\mathrm{dg}}(t),\; SOC(t),\; \Delta SOC(t)\right]$$

where $P_{\mathrm{load}}(t)$ represents the current load demand of the microgrid; $P_{\mathrm{pv}}(t)$, $P_{\mathrm{wt}}(t)$, and $P_{\mathrm{dg}}(t)$ denote the power outputs of the photovoltaic array, wind turbine, and diesel generator, respectively; and $SOC(t)$ represents the state of charge of the storage system. Additionally, the change rate $\Delta SOC(t)$ quantifies the real-time regulation capacity of the storage system in response to load fluctuations or renewable energy output variations, enabling DQN to assess system stability and energy balance.
- 2.
Action Space Design:
The action space primarily involves adjusting key PSO parameters and search strategies to adapt to varying microgrid operating conditions. The specific design is as follows

$$A(t) = \left[\Delta\omega,\; \mathrm{Mode}\right]$$

where $\Delta\omega$ represents the adjustment magnitude of the inertia weight, with its initial value preset by the policy, and Mode denotes different search strategies, including Global Search, Local Adjustment, and Reliability Enhancement.
- (1)
Global Search Strategy: This strategy is primarily employed in scenarios with significant fluctuations in wind and solar output, encouraging particles to explore new scheduling configurations while maximizing renewable energy utilization. The parameter settings are as follows: $\omega = 0.7 + \Delta\omega$, $c_1 = 1.0$, $c_2 = 2.0$. Setting $c_2 > c_1$ emphasizes social learning, expanding the search range to enhance renewable energy utilization.
- (2)
Local Adjustment Strategy: When the renewable energy utilization rate is high, this strategy fine-tunes the scheduling scheme to optimize local supply–demand balance. Particle positions are adjusted based on supply–demand deviations, with a perturbation drawn from the standard normal distribution $N(0,1)$ controlling the fine-tuning range. The parameter settings are as follows: $\omega = 0.5 + \Delta\omega$, $c_1 = 0.5$, $c_2 = 2.0$.
- (3)
Reliability Enhancement Strategy: This strategy balances supply and demand through energy storage regulation, prioritizing power supply reliability while maximizing renewable energy output within feasible limits. It is triggered when one of the following conditions is met: $SOC(t) < 0.25$, $SOC(t) > 0.85$, or the power supply risk exceeds 0.03. In these cases, gradient descent is applied to adjust the energy storage power, mitigating power supply risks.
- 3.
State Transition and Reward Function Design:
State transitions are governed by the real-time operational state of the microgrid and PSO parameter adjustments. At time step t, DQN selects an action A(t) based on the ε-greedy strategy to modify PSO parameters. The particle swarm then updates its velocity and position accordingly, generating a new scheduling strategy. The microgrid subsequently executes the scheduling plan based on load variations and renewable energy fluctuations, transitioning to a new state S(t + 1). To ensure training stability, DQN employs an experience replay mechanism to store interaction data and utilizes a target network for stable Q-value updates, preventing policy oscillations.
The reward function is formulated to enhance power supply reliability while optimizing renewable energy utilization. A weighted dual-objective metric is used to guide DQN in learning scheduling strategies that optimize both objectives

$$R(t) = \lambda_1 R_{\mathrm{clean}}(t) + \lambda_2 R_{\mathrm{rel}}(t) + \lambda_3 \cdot \mathrm{Mode\_Bonus}$$

where $R_{\mathrm{clean}}(t)$ is the renewable energy utilization term, $R_{\mathrm{rel}}(t)$ is the power supply reliability term, and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are weight coefficients, set as $\lambda_1 = 0.4$, $\lambda_2 = 0.5$, and $\lambda_3 = 0.1$ in this study, prioritizing renewable energy utilization while ensuring power supply reliability. The weight values are validated through pre-experiments. This study introduces the Mode_Bonus mechanism to overcome the limitations of static weighted-sum reward functions. Without Mode_Bonus, the reward function may become excessively biased toward either objective, $R_{\mathrm{clean}}$ or $R_{\mathrm{rel}}$, potentially leading to suboptimal policy learning, for example, prioritizing reliability at the expense of exploration. The proposed mechanism allows the DQN to dynamically adapt to evolving microgrid conditions, effectively balancing the trade-off between exploration and risk control. Furthermore, it addresses the inherent limitations of fixed-weight schemes in multi-objective optimization, thereby guiding the algorithm toward global optimality in complex and uncertain environments. The Mode_Bonus is defined as follows: if the renewable energy utilization rate exceeds 0.6, Mode_Bonus = 0.02; if the power supply risk is below 0.03, Mode_Bonus = 0.015; if both conditions are met, Mode_Bonus = 0.02; otherwise, Mode_Bonus = 0.
3.4. DQN-PSO Algorithm Architecture and Process
The DQN-PSO algorithm employs a dual-layer optimization architecture, where the DQN decision layer and the PSO optimization layer collaborate to achieve dynamic microgrid scheduling. The DQN decision layer selects actions based on the real-time state of the microgrid, dynamically adjusting PSO parameters and optimizing the Q-value network. It is implemented using a fully connected neural network with ReLU activation functions and the Adam optimizer, while its loss function is based on mean squared error (MSE). The PSO optimization layer fine-tunes the scheduling strategy based on the dynamically adjusted parameters. The particle swarm updates its velocity and position to search for the global optimal solution, ensuring alignment between the fitness function of PSO and the reward function of DQN.
Through a closed-loop feedback mechanism, DQN continuously refines PSO parameters and strategy selection, while PSO generates scheduling strategies and provides feedback for further optimization. This dual-layer framework overcomes the limitations of traditional PSO with fixed parameters, significantly enhancing adaptability to load fluctuations and weather variations.
Figure 4 illustrates the collaborative optimization architecture.
In standalone microgrid scheduling, the system’s operational state is highly dynamic and uncertain. Traditional optimization methods often fail to adapt efficiently, leading to suboptimal or unstable scheduling results. By dynamically tuning PSO parameters in response to real-time system feedback, the DQN-PSO algorithm improves scheduling flexibility and robustness.
DQN-PSO Algorithm Steps are as follows:
Initialize the microgrid environment and set up the DQN and PSO parameters.
At each time step t
DQN acquires the current microgrid state S(t);
DQN selects an action A(t) using the ε-greedy strategy;
The selected action modifies PSO parameters based on the corresponding strategy;
PSO iterates to generate an optimized scheduling solution;
The microgrid executes the scheduling strategy, updates its state to S(t + 1), and calculates the reward R(t);
The interaction experience is stored in a replay buffer, and the DQN is trained by sampling from stored experiences;
The target network is periodically updated to ensure training stability.
Repeat the above steps until the termination condition is met, and output the optimal scheduling strategy.
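Putting the listed steps together, a condensed sketch of the collaborative loop (reusing the Q-network, replay buffer, and reward helpers from the earlier sketches) could look as follows; `env`, `run_pso`, and `decode_action` are placeholders for the pymgrid-based environment, the PSO optimization layer, and the action decoding, and are not the paper's actual interfaces.

```python
EPISODES, HORIZON = 200, 24        # illustrative training length; 24 h horizon, 1 h steps

def decode_action(a):
    """Hypothetical mapping of a discrete DQN action to (inertia adjustment, search mode)."""
    return (-0.1, 0.0, 0.1)[a % 3], MODES[a // 3]

for episode in range(EPISODES):
    state = env.reset()                                # pymgrid-based environment (placeholder)
    for t in range(HORIZON):
        # DQN decision layer: observe S(t) and choose a parameter-adjustment action A(t)
        action = select_action(state, q_net, epsilon)
        d_omega, mode = decode_action(action)

        # PSO optimization layer: search for a schedule with the adjusted parameters
        schedule = run_pso(state, omega=0.7 + d_omega, mode=mode)   # placeholder

        # Execute the schedule, observe S(t + 1), and compute the reward R(t)
        next_state, clean_util, supply_risk, done = env.step(schedule)
        r = reward(clean_util, supply_risk)

        # Store the transition and train the Q-network from replayed experience
        replay_buffer.append((state, action, r, next_state, float(done)))
        train_step(q_net, target_net)
        state = next_state

    # Periodically synchronize the target network and decay epsilon toward exploitation
    if episode % 10 == 0:
        target_net.set_weights(q_net.get_weights())
    epsilon = max(EPS_MIN, epsilon * EPS_DECAY)
```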
Figure 5 presents the detailed flowchart of the algorithm.
4. Example Analysis
In this study, a standalone wind–solar–diesel–battery microgrid, as illustrated in Figure 1, is selected as the research subject. A simulation model is developed using the open-source Python microgrid simulator pymgrid [27]. The previously established mathematical models of the microgrid system are employed to characterize the power output of each component and the evolution of the energy storage state. The photovoltaic (PV) output is modeled based on solar radiation intensity and temperature models, and the wind power output follows a Weibull wind speed distribution [28]. Pymgrid provides flexible component modeling and data processing functionalities, enabling the customization of load profiles and renewable energy output curves.
The key parameters of the generation units and the energy storage system are listed in Table 1 and Table 2 [29,30].
The typical daily profiles of load demand, photovoltaic (PV) power generation, and wind power generation for the standalone microgrid are depicted in Figure 6. The PV peak output reaches 30 kW, while the average wind power output is 55.65 kW. The load profile exhibits a dual-peak characteristic, with peaks occurring in the morning and evening.
4.1. Benchmark Experiments
To validate the effectiveness of the DQN-PSO method, it is compared against two baseline approaches: Standard Particle Swarm Optimization (PSO) and a Random Policy. The standard PSO utilizes fixed parameter settings, including the inertia weight and learning factors ($c_1$, $c_2$). The Random Policy serves as a lower-bound baseline, where the diesel generator (DG) and battery power outputs are randomly allocated within feasible limits.
Figure 7 provides a visual comparison of the DG power output and battery storage output over 24 h for the three methods.
As observed from the experimental results, the DQN-PSO method (red solid line) demonstrates adaptive scheduling. During peak load periods, the DG output closely follows the demand, while during low-demand periods, DG utilization is minimized, relying primarily on renewable energy and battery storage. Power shortages (red ‘×’) occur only occasionally, typically during rapid load transitions. The battery storage system (blue solid line) efficiently manages charging and discharging, discharging effectively during peak load and charging up to −20 kW during off-peak hours, ensuring state-of-charge (SOC) balance. In contrast, the standard PSO method (purple dashed line) exhibits more pronounced DG fluctuations with frequent power shortages (purple ‘×’), especially between 0–5 h and 10–15 h. Its battery charging and discharging behavior (cyan dashed line) is irregular, failing to effectively support load demand. The random policy (gray dotted line) results in unpredictable DG output fluctuations, widespread power shortages (gray ‘×’), and battery storage output (light gray dotted line) that randomly oscillates between −20 kW and 20 kW, lacking coordination. These results demonstrate that DQN-PSO optimally schedules DG utilization and battery storage, significantly enhancing renewable energy utilization while reducing fossil fuel dependency.
4.2. Performance Comparison of Indicators
To comprehensively evaluate the dynamic adaptability of the DQN-PSO algorithm, three microgrid operation scenarios are designed. These scenarios introduce different perturbation factors based on typical daily data (Figure 6), simulating complex environments. Scenario 1 serves as the baseline scenario without additional perturbations, representing a typical microgrid operation environment with a relatively stable supply–demand balance. Scenario 2 is the high-fluctuation scenario, where Gaussian noise is added to the PV and WT outputs (with a noise standard deviation of 20% and a minimum truncation value of 0) to simulate the uncertainty in power generation caused by extreme weather conditions or equipment fluctuations, while the load remains unchanged. This scenario increases the variability of renewable energy output, making supply–demand matching more challenging. Scenario 3 simulates low electricity demand during nighttime or holidays, where the PV and WT outputs remain the same as in the baseline scenario, but the load is reduced by 20%, increasing the proportion of renewable energy utilization.
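For illustration, the perturbed scenarios could be generated from the typical-day profiles as follows, assuming the 20% noise is applied multiplicatively to the PV and WT series and truncated at zero, as described above; the array names are placeholders for the pymgrid time series.

```python
import numpy as np

rng = np.random.default_rng(42)

def scenario_2(pv, wt):
    """High-fluctuation scenario: 20% Gaussian noise on PV/WT, truncated at 0."""
    pv_noisy = np.maximum(pv * (1.0 + rng.normal(0.0, 0.2, pv.shape)), 0.0)
    wt_noisy = np.maximum(wt * (1.0 + rng.normal(0.0, 0.2, wt.shape)), 0.0)
    return pv_noisy, wt_noisy

def scenario_3(load):
    """Low-load scenario: demand reduced by 20%, generation unchanged."""
    return 0.8 * load
```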
All data are generated using pymgrid, with component outputs computed based on the previously established mathematical models, ensuring that simulation results accurately reflect real-world microgrid behavior. The scheduling period is set to 24 h, with a time step of 1 h. In each scenario, the algorithm runs 10 times, and the mean and standard deviation of key performance metrics are recorded, as shown in Table 3.
The results indicate that the DQN-PSO method outperforms the Standard PSO in all three scenarios. For example, in the baseline scenario, DQN-PSO improves renewable energy utilization by approximately 3.2% while significantly reducing power supply risk. In the high-fluctuation scenario, DQN-PSO increases renewable energy utilization by about 4.5% and reduces power supply risk by 3%, demonstrating its adaptability to uncertain environments. In the low-load scenario, DQN-PSO increases renewable energy utilization to 78.09%, showcasing its superior resource utilization capability.
To further analyze the dynamic adaptability of the DQN-PSO algorithm, this study examines the average usage frequency of the three strategies (Global Search, Local Adjustment, and Reliability Enhancement) in the three scenarios, based on statistics from 10 runs (as shown in Figure 8, Figure 9 and Figure 10). These strategies are dynamically selected by DQN for global exploration, local optimization, and risk control, respectively.
In the baseline scenario, Local Adjustment dominates, indicating that the algorithm primarily responds to normal fluctuations in load and renewable energy output through local optimization. Global Search is mainly used during PV output peaks to optimize renewable energy utilization. Reliability Enhancement is used less frequently, reflecting the low demand for risk control in a stable supply–demand environment.
In the high-fluctuation scenario, Local Adjustment frequency significantly increases, suggesting that the algorithm prioritizes local adjustments to handle the substantial variations in PV and WT output. Reliability Enhancement is used at a moderate level to help mitigate power supply risks. Global Search is rarely used, indicating that under high uncertainty conditions, the priority of global exploration decreases.
In the low-load scenario, Local Adjustment remains dominant, indicating that the algorithm adapts to the reduced load through local optimization. Global Search is used more frequently during PV output peaks to optimize renewable energy utilization. Reliability Enhancement is less needed, reflecting the lower risk of power supply issues under low-load conditions.
The strategy usage frequency distributions in Figure 8, Figure 9 and Figure 10 align closely with the performance data in Table 3. For instance, in the high-fluctuation scenario, the significant increase in Reliability Enhancement frequency (0.6–0.9) directly reduces power supply risk (from 4.04% to 1.04%). In the low-load scenario, the moderate use of Global Search (0.1–0.4) contributes to an increase in renewable energy utilization to 78.09%. However, the dominant use of Reliability Enhancement across all scenarios (frequency 0.5–0.9) may indicate an excessive preference for risk control, potentially limiting the effectiveness of Global Search in stable conditions. Future improvements could involve adjusting the reward function weights or mode-switching thresholds to further optimize strategy allocation and enhance overall algorithm performance.
4.3. Parameter Sensitivity Analysis
This study further analyzes the impact of DQN parameters (learning rate α and discount factor γ) and PSO parameters (inertia weight) on the algorithm’s performance. In the baseline scenario (Scenario 1), each parameter combination is tested over 10 runs, with renewable energy utilization and power supply risk recorded to evaluate performance sensitivity to parameter variations.
Figure 11 presents the sensitivity analysis results of the DQN-PSO algorithm with respect to the inertia weight and learning rate α. Subfigures (a) and (b) illustrate the effect of inertia weight on renewable energy utilization and power supply risk, respectively, while subfigures (c) and (d) depict the impact of learning rate α. The results indicate that the algorithm achieves optimal performance when the inertia weight is set to 0.7 and the learning rate α is 0.0005.
Figure 12 shows the sensitivity analysis results of the DQN-PSO algorithm concerning the discount factor γ. Subfigures (a) and (b) illustrate the impact of γ on renewable energy utilization and power supply risk, respectively. The analysis reveals that the best balance between the two performance metrics is achieved when γ = 0.98.
5. Discussion
This study proposes a DQN-PSO-based scheduling method for standalone microgrids and validates its effectiveness in enhancing renewable energy utilization and ensuring power supply reliability through simulation experiments. The results demonstrate that the proposed method significantly improves microgrid operational performance under various scenarios. In the typical-day scenario, the clean energy utilization rate increased from 65.28% (standard PSO) to 68.51%, while the power supply reliability risk decreased from 2.45% to 0.70%. Under high-fluctuation conditions, clean energy utilization improved by approximately 5%, and the reliability risk was reduced by about 3%, indicating strong adaptability to environmental variability.
These improvements are primarily attributed to the dynamic parameter tuning and multi-strategy switching mechanisms of the DQN-PSO algorithm. The DQN component leverages deep Q-networks to dynamically optimize PSO parameters, thereby enhancing global search capabilities and effectively mitigating the risk of local optima—a common issue in conventional PSO. Parameter sensitivity analysis suggests that the optimal configuration is w = 0.7, α = 0.0005, and γ = 0.98. Notably, the discount factor γ should be adjusted in accordance with actual microgrid operating conditions to balance multiple optimization objectives. In extreme scenarios, the algorithm’s global search capability may still be constrained by the initial particle distribution, introducing a risk of suboptimal convergence. To counter this, the multi-strategy switching mechanism enhances the robustness of the scheduling process. DQN-PSO can dynamically alternate between distinct search strategies based on real-time operating conditions, thereby effectively addressing renewable energy uncertainties. For instance, in highly volatile scenarios, the algorithm prioritizes exploratory strategies to better accommodate rapid fluctuations in supply and demand.
Compared with traditional PSO, the adaptive nature of DQN-PSO markedly improves scheduling precision, particularly in complex and dynamic environments. Furthermore, the core mechanisms underlying this approach offer strong potential for broader applications in domains requiring dynamic optimization and robust decision-making, such as real-time energy management in intelligent traction power systems [31] and the optimal control of shipboard power systems [32].