An Adaptive Electric Vehicle Charging Management Strategy for Multi-Level Travel Demands

Zhang, Shuai; Guo, Dong; Zhou, Bin; Zheng, Chunyan; Li, Zhiqin; Ma, Pengcheng

doi:10.3390/su17062501

Open AccessArticle

An Adaptive Electric Vehicle Charging Management Strategy for Multi-Level Travel Demands

by

Shuai Zhang

¹,

Dong Guo

^1,*,

Bin Zhou

²,

Chunyan Zheng

³,

Zhiqin Li

¹ and

Pengcheng Ma

^1,*

¹

School of Transportation and Vehicle Engineering, Shandong University of Technology, Zibo 255000, China

²

State Key Laboratory of Intelligent Transportation System, Beijing 100088, China

³

School of Management, Shandong University of Technology, Zibo 255000, China

^*

Authors to whom correspondence should be addressed.

Sustainability 2025, 17(6), 2501; https://doi.org/10.3390/su17062501

Submission received: 4 February 2025 / Revised: 2 March 2025 / Accepted: 10 March 2025 / Published: 12 March 2025

Download

Browse Figures

Versions Notes

Abstract

:

As the adoption of electric vehicles (EVs) continues to rise, the pressure on charging station resources has intensified, particularly under high-load conditions, where limited charging infrastructure struggles to meet the growing demand. Issues such as uneven resource allocation, prolonged charging wait times, fairness concerns among different user groups, and inefficient scheduling strategies have significantly impacted the overall operational efficiency of charging infrastructure and the user experience. Against this backdrop, the effective management of charging infrastructure has become increasingly critical, especially in balancing the diverse mobility needs and service expectations of users. Traditional charging scheduling methods often rely on static or rule-based strategies, which lack the flexibility to adapt to dynamic load environments. This rigidity hinders optimal resource allocation, leading to low charging pile utilization and reduced charging efficiency for users. To address this, we propose an Adaptive Charging Priority (ACP) strategy aimed at enhancing charging resource utilization and improving user experience. The key innovations include (1) dynamic adjustment of priority parameters for optimized resource allocation; (2) a dynamic charging station reservation algorithm based on load status and user arrival rates to prioritize high-priority users; (3) a scheduling strategy for low-priority vehicles to minimize waiting times for non-reserved vehicles; and (4) integration of real-time data with the DDPDQN algorithm for dynamic resource allocation and user matching. Simulation results indicate that the ACP strategy outperforms the FIFS and RFWDA strategies under high-load conditions (High-priority vehicle arrival rate: 22 EV/h, random vehicle arrival rate: 13 EV/h, maximum parking duration: 1200 s). Specifically, the ACP strategy reduces charging wait times by 96 s and 28 s, respectively, and charging journey times by 452 s and 73 s. Additionally, charging station utilization increases by 19.5% and 11.3%. For reserved vehicles, the ACP strategy reduces waiting times and journey times by 27 s and 188 s, respectively, while increasing the number of fully charged vehicles by 104. For non-reserved vehicles, waiting and journey times decrease by 213 s and 218 s, respectively, with a 75 s increase in fully charged vehicles. Overall, the ACP strategy outperforms traditional methods across several key metrics, demonstrating its advantages in resource optimization and scheduling.

Keywords:

electric vehicles; EV charging; travel demand; charging priority; charging optimization

1. Introduction

As global environmental issues intensify, electric vehicles (EVs) have become a critical component of green transportation [1], playing a key role in addressing environmental pollution and the energy crisis [2]. Compared to traditional internal combustion engine vehicles, EVs significantly reduce greenhouse gas emissions and decrease dependence on fossil fuels [3]. However, the widespread adoption of EVs still faces challenges such as long charging wait times and unevenly distributed charging stations, which severely impact user experience [4,5].

According to the International Energy Agency (IEA) 2024 report, the adoption of electric vehicles (EVs) is expected to reduce carbon dioxide emissions by approximately 83 million tons in 2024, significantly lowering greenhouse gas emissions in the global transportation sector. Additionally, lifecycle assessment (LCA) studies indicate that, compared to internal combustion engine vehicles, EVs can reduce greenhouse gas emissions by 30% to 50% over their entire lifecycle, with the exact reduction depending on the energy mix and manufacturing processes [6]. A study published in Nature Sustainability systematically evaluated the impact of EV adoption on air quality in major urban clusters in China. The research found that, in a scenario where the electrification rate of passenger vehicles reaches 27%, the average annual population-weighted concentration of PM_2.5 in three major urban clusters is expected to decrease by 0.5 µg/m³ (a reduction of 2% to 3%) by 2030, while the annual average concentration of NO₂ is projected to decrease by a more significant 15% to 20% [7]. Moreover, practical experiences from countries such as Norway further demonstrate that the widespread adoption of EVs plays a positive role in reducing urban air pollution and decreasing reliance on fossil fuels. In 2024, the share of pure electric vehicles in Norway’s new car sales reached 88.9%, up from 82.4% in 2023 [8]. Additionally, with more than 98% of Norway’s electricity sourced from renewable energy (primarily hydropower), the growth of electric vehicles has effectively reduced the country’s dependence on fossil fuels [9].

The congestion in charging services is a core barrier to the widespread use of EVs. In particular, when charging stations are unevenly distributed or charging demand surges, users often face long queues [10]. Therefore, optimizing charging scheduling to improve efficiency and reduce wait times has become an urgent issue.

Existing research primarily focuses on charging scheduling methods based on factors such as energy consumption and time costs [11,12]. For example, ref. [13] proposed a strategy to reduce overall charging time by optimizing charging order. However, these methods fail to adequately address the personalized needs of different users [14,15]. While differentiated charging services have been proposed [16], research in this area remains relatively limited.

In terms of differentiated queue waiting time scheduling, the challenges in current research include how to integrate scheduled charging with priority-based services to meet the needs of different users [17,18], and how to balance fairness and demand for high- and low-priority users, particularly in cases of grid congestion or power outages [19,20].

The charging demand for electric vehicles (EVs) is highly dynamic, especially in driving mode, where EVs need to select charging stations in real-time to extend their driving range. As the number of EVs increases, the charging demand becomes more complex, requiring real-time optimization of charging station selection and effective resource management during peak demand periods to reduce waiting times.

To optimize charging station selection, the central controller (GC) aggregates the status of charging stations and charging requests, making global decisions [21]. Some studies have proposed charging station selection strategies based on minimizing waiting times, but the real-time uncertainty of charging station statuses affects their application [22]. Other research has introduced reservation information to improve the accuracy of charging station predictions, yet challenges remain regarding the EVs’ travel routes and parking durations [23,24].

Charging scheduling research primarily focuses on the parking mode and charging station selection for EVs in transit [25,26]. Most studies adopt the First-In-First-Serve (FIFS) strategy but fail to adequately account for the actual needs of EVs and their waiting times [27,28]. To improve scheduling strategies, models based on waiting times [29] and methods using dynamic programming and game theory [30] have been proposed, though they do not consider charging completion times and parking duration constraints [31].

To address parking duration issues, ref. [32] proposed a parking-duration-driven charging optimization strategy, ref. [33] optimized scheduling for charging duration and energy consumption, and [34] introduced a scheduling strategy for heterogeneous EVs, prioritizing high-priority EV needs. Research on charging station selection has also advanced, with strategies based on expected waiting times performing well in highway scenarios [35,36], and the introduction of pricing strategies to alleviate congestion [37,38], further enhancing the efficiency of charging station selection.

Recently, deep reinforcement learning (DRL) has been introduced to the field of EV charging scheduling, yielding significant progress. Deep Q-Networks (DQN) have been particularly effective in optimizing charging station resource allocation, reducing waiting times, and improving resource utilization [39,40]. The introduction of Double DQN (DDQN) addresses the issue of Q-value overestimation, enhancing the accuracy and stability of scheduling decisions [41]. Furthermore, Prioritized DQN has further improved training efficiency and real-time performance [42]. Recent advances in battery management, particularly the integration of PSO with TCN and attention mechanisms, have provided valuable insights for dynamic resource optimization in EV charging scheduling [43,44]. These models improve SOC estimation accuracy, aiding in peak-period resource allocation and utilization.

Although existing research has yielded positive results in single-charging station scheduling, most studies focus on individual station optimization, lacking joint optimization across multiple charging stations and user groups. Coordination and resource sharing between charging stations are critical in EV charging systems; thus, achieving overall optimization of the charging network is key to improving scheduling efficiency.

In summary, while significant progress has been made in EV charging scheduling, notable limitations remain in addressing the complexity of multi-tier user demands and dynamic resource allocation. Existing methods primarily focus on users with a single priority level or uniform service strategies, often overlooking the in-depth exploration of charging demand heterogeneity and equitable resource distribution. This makes it difficult to balance service guarantees for high-priority users with fairness for low-priority users. The issue analysis of existing research is shown in Figure 1.

To address the gaps in existing research, this paper proposes an Adaptive Charging Priority (ACP) strategy tailored to multi-tier travel demands in electric vehicle (EV) charging management. The ACP strategy integrates an attention-based Long Short-Term Memory (ALSTM) model to predict the arrival rate of high-priority users, dynamically adjusting resource allocation to ensure sufficient resources for high-priority users during peak times, thus minimizing waiting times. Additionally, a Simulated Annealing (SA) algorithm is employed to dynamically adjust the resource allocation for low-priority users, minimizing their waiting times. Finally, a resource allocation and user matching algorithm based on the Dueling Prioritized Deep Q Network (DDPDQN) is used to optimize resource distribution in real time, with the goal of minimizing the waiting times of high-priority users.

Simulation results indicate that the ACP strategy significantly reduces waiting times for high-priority users under high-load conditions, improves charging efficiency for low-priority users, and enhances charging station resource utilization. This research provides an optimized management solution for charging stations, improves user charging experiences, and supports the widespread adoption of electric vehicles and the development of sustainable transportation. The methodology framework is illustrated in Figure 2.

The structure of this paper is organized as follows: Section 1 reviews the relevant literature and analyzes the strengths and weaknesses of existing methods; Section 2 introduces the scheduling strategy for reserved vehicles; Section 3 presents the scheduling strategy for non-reserved vehicles; Section 4 discusses the design and implementation of the DDPDQN algorithm; Section 5 presents the simulation experiments and results analysis; Section 6 concludes the paper by summarizing the contributions and outlining future research directions.

2. Scheduling Strategy for Reserved Vehicles

2.1. Charging Priority Parameter Calculation Model

To optimize EV charging scheduling, we introduce a set of key parameters to characterize charging priority, resource utilization, and system performance. To ensure clarity in defining and understanding these variables, Table 1 provides an overview of the primary symbols used in this study along with their corresponding definitions.

2.1.1. Attention Mechanism

The attention mechanism assigns dynamic weights to time steps, allowing the model to prioritize crucial information and effectively manage long sequences. Unlike traditional encoder–decoder architectures, which rely on fixed-length context vectors and struggle with long sequences, the attention mechanism adapts by emphasizing important features during decoding, thereby improving the representation of long-term information, as illustrated in Figure 3.

2.1.2. Attention-LSTM Model

The arrival rates of high-priority EV users are affected by factors like charging station occupancy, visit days, and traffic flow. While traditional LSTM models can capture long-term trends, they often struggle with accuracy during crucial rate fluctuations due to their fixed weight allocation. To overcome this limitation, the attention-based LSTM (ALSTM) model dynamically adjusts the weights at each time step, concentrating on key periods to enhance both prediction accuracy and resource distribution. The structure of the ALSTM model is depicted in Figure 4.

As shown in Figure 4, the attention model is positioned at the output of the LSTM framework in the ALSTM structure. The input sequence x_t₋₁, x_t₋₂, …, x_t is processed through the LSTM’s forget, input, and output gates to generate the intermediate state ht. The attention layer calculates the weights W_ij of the intermediate state using a fully connected layer. These weights are then applied to the states, and the weighted sum generates the feature vector c, which is used to produce the final prediction. The scoring function S_ij is defined as follows:

S_{i j} = ν \tan h (W \cdot h_{j} + U \cdot h_{i - 1}^{'} + b)

(1)

W_{i j} = \frac{\exp (S_{i j})}{\sum_{k = t - n}^{t} \exp (S_{ik})}

(2)

where

ν

,

W

, and

U

represent the weighting factors, while

b

denotes the bias factor.

During the training phase, input features, such as arrival date, time period, and reservation status, are processed sequentially through the LSTM and attention layers. The LSTM extracts both long-term and short-term features from the time series, while the attention layer dynamically allocates weights to emphasize key time periods. The final output is the predicted arrival rate for high-priority users. The model’s performance is evaluated using the Mean Absolute Error (MAE) and the coefficient of determination (R²), with the formulas as follows:

MAE = \frac{1}{N} \sum_{i = 1}^{N} | {\hat{r}}_{i, h} - {\hat{r}}_{i, h} |

(3)

R^{2} = 1 - \frac{{\sum_{i = 1}^{N} (r_{i, h} - {\hat{r}}_{i, h})}^{2}}{{\sum_{i = 1}^{N} (r_{i, h} - {\bar{r}}_{h})}^{2}}

(4)

where

{\hat{r}}_{i, h}

and

r_{i, h}

represent the predicted and actual values of the i sample, respectively, and N denotes the total number of samples.

This study’s key time periods are identified based on the analysis of the publicly available ACN-Data dataset provided by Caltech, which is a standard dataset for electric vehicle (EV) charging optimization research. The dataset includes approximately 40,000 charging records and is useful for analyzing the temporal patterns of EV charging behavior. More information about this dataset can be found in Section 5.2.1 “Dataset”. By analyzing factors such as EV arrival rates, charging station utilization, and reservation status, we identified the following key time periods:

(1): Late-night high charging demand period (23:00–24:00, 22:00): During the late-night period, especially from 23:00 to 24:00, vehicle arrival rates significantly increase as many users charge their vehicles after parking to ensure their travel needs for the following day. During this period, Attention-LSTM assigns higher attention weights to improve the prediction accuracy of late-night vehicle arrival behavior.
(2): Midday short-term charging peak period (12:00–14:00): During the lunch break, there is a short-term surge in vehicle arrivals. Attention-LSTM focuses on the short-term high-frequency arrival patterns, ensuring accurate prediction of vehicles’ arrival rates during this brief peak period.
(3): Evening rush hour and load fluctuation period (18:00–20:00, 16:00–17:00): During the evening commute, charging station utilization rises dramatically, leading to a sudden spike in charging demand. Attention-LSTM dynamically adjusts its attention allocation during these periods to accurately predict vehicle arrival fluctuations during the evening rush.
(4): Low-load but highly volatile periods (10:00–11:00, 15:00, 21:00): Although the overall arrival rate during these periods is low, there are fluctuations in the arrival rates of some high-priority vehicles, especially around 21:00, when some users prepare for nighttime charging. Attention-LSTM allocates attention weights moderately to avoid overfocusing on non-critical periods while capturing potential abnormal fluctuations.

To verify the rationality of selecting high-priority vehicle arrival rate key time periods, we trained the Attention-LSTM model using the ACN-Data dataset and calculated the attention scores (S_ij) using Formula (7). We then applied Softmax normalization (Formula (8)) to compute the attention weights (W_i) and obtained the attention weight distribution for the simulation period (10:00–24:00).

Figure 5 and Figure 6 display the distribution of high-priority vehicle arrival rate attention weights and their predicted attention weight heatmap. The line chart reflects Attention-LSTM’s focus on different time periods, while the heatmap visually presents the distribution of attention weights across time periods. The results show that Attention-LSTM assigns higher attention weights to 23:00–24:00, 13:00, and 18:00–20:00, indicating that the model effectively identifies the major peak periods for high-priority vehicle arrivals. Specifically, during the late-night period (23:00–24:00), the arrival rate of high-priority EVs is the highest, followed by the midday (12:00–14:00) and evening peak (18:00–20:00), which aligns with the charging behavior patterns of commuting high-priority users. In contrast, during low-load periods (such as 10:00–11:00 and 15:00), attention weights are the lowest, showing that Attention-LSTM appropriately reduces attention to these periods, thereby optimizing resource allocation strategies. The attention weight distribution aligns closely with the key time periods defined and selected from the ACN-Data dataset, demonstrating that Attention-LSTM optimized with attention weights can effectively capture the critical arrival times for high-priority vehicles, improving prediction accuracy and optimizing charging resource allocation.

Experiments with step sizes of 5, 10, 15, and 20 were conducted to optimize the ALSTM model, as shown in Figure 7 and Table 2. A step size of 10 demonstrated the fastest error reduction and the lowest final error, indicating superior training efficiency and stability. Table 1 confirms that a step size of 10 achieves the lowest MAE (0.983) and the highest R² (0.937), highlighting its superior prediction accuracy and fit. Smaller step sizes (e.g., 3) fail to fully utilize time series information, while larger ones (e.g., 20) reduce the model’s ability to capture key features. The selection of the time step directly impacts the prediction accuracy of high-priority vehicle arrival rates in the ALSTM model, particularly during the simulation period (10:00–24:00), where charging demand fluctuates significantly across different time intervals. A shorter time step (e.g., 3 or 5) may result in insufficient temporal dependency retention, preventing the model from effectively capturing variations in high-priority vehicle arrivals during peak demand periods (12:00–14:00, 18:00–20:00, and 23:00–24:00), thereby reducing predictive accuracy. Additionally, a short time step can cause the model to be overly sensitive to short-term local fluctuations (e.g., at 15:00 and 21:00), impairing its ability to fit overall trends. Conversely, a longer time step (e.g., 20) encompasses more historical information, but given the extended time span of 10:00–24:00, the arrival patterns of high-priority vehicles may exhibit substantial variations across different periods. An excessively long time step may introduce irrelevant information from low-load periods into high-load periods, increasing prediction noise. Furthermore, a time step of 20 incurs higher computational costs, requiring the model to process longer data sequences, which reduces both training and inference efficiency. Overall, a time step of 10 achieves the optimal balance between optimization speed, prediction accuracy, and computational efficiency. During peak demand periods (12:00–14:00, 18:00–20:00, and 23:00–24:00), this step length effectively captures high-priority vehicle arrival patterns, enhancing predictive precision. In low-load but highly volatile periods (10:00–11:00, 15:00, and 21:00), it mitigates excessive responsiveness to short-term anomalies, thereby improving model stability. Additionally, compared to a time step of 20, a step of 10 achieves optimal MAE and R² with lower computational costs, ensuring the reliability of the predictions. Therefore, a time step of 10 is determined to be the optimal choice for this experiment.

This study employs a deep learning-based modeling approach. To ensure optimal performance, hyperparameters were systematically evaluated through extensive experimentation. Given that training deep learning models typically demands significant time and computational resources, Adam was selected as the optimizer due to its superior performance in non-convex optimization problems, enabling accelerated convergence and reduced computational overhead. This choice is supported by previous studies [45], which demonstrate that Adam can adaptively adjust the learning rate, improving both training efficiency and stability, making it well-suited for the training requirements of this study.

Beyond the optimizer, this study systematically optimizes several key hyperparameters, including the time step, number of attention layer nodes, number of hidden layers and nodes, learning rate, batch size, number of training epochs, and dropout rate. A grid search method was employed to conduct experimental evaluations across a range of candidate hyperparameter values. The model’s performance was assessed based on mean absolute error (MAE) and the coefficient of determination (R²) to determine the optimal hyperparameter combination. For instance, the selection of the time step was determined by comparing model training outcomes across different step values (3, 5, 10, and 20). The results indicate that when the time step is set to 10, the model achieves the lowest MAE (0.983) and the highest R² (0.937), striking the best balance between training efficiency, predictive accuracy, and computational cost. As shown in Table 3, the selection of other hyperparameters follows a similar experimental optimization approach.

This study uses a single layer of dropout with a dropout rate set to 0.2. This setting is based on the findings of [46], which show that a single layer of dropout (with a rate between 0.2 and 0.5) effectively reduces overfitting. However, using too many dropout layers may lead to information loss, which can negatively impact predictive performance.

Finally, we conducted experimental tests based on different hyperparameter combinations and selected the optimal configuration (see Table 4), which achieved the best performance in terms of error metrics (MAE, R²).

During the training phase, the model uses Mean Squared Error (MSE) as the loss function to evaluate the prediction error. The specific algorithmic flow is as follows:

Step 1: Normalize input features, including historical arrival rates, arrival time, date, reservation status, charging demand, station location, and current utilization, to ensure stable model training. The normalized data are input into the model.

Step 2: Train the model for 100 epochs with 500 samples per epoch. The LSTM layer extracts temporal dependencies and outputs hidden state sequences, which the attention mechanism uses to assign weights and generate a weighted sum representation for predicting high-priority user arrival rates.

Step 3: Calculate the loss using Mean Squared Error (MSE) by comparing predicted and actual arrival rates.

Step 4: Fine-tune the model parameters using the Adam algorithm. Training continues until the loss stabilizes or the maximum number of epochs is reached. Once trained, the Attention-LSTM model is employed for real-time forecasting of high-priority user arrival rates, improving resource allocation efficiency at charging stations.

The algorithm flow is depicted in Figure 8.

To provide a basis for subsequent resource allocation, the predicted values are further converted into charging priority parameters Pc, which are used to reasonably allocate charging station resources and prioritize the charging needs of high-priority users during peak demand periods. The specific process for calculating the charging priority parameters is as follows:

Step 1: Initial Calculation of Charging Priority Parameters. The charging priority parameter P_c quantifies the priority of high-priority users in resource allocation. To meet the smooth demand response requirements, an improved Sigmoid function is used for mapping, ensuring smooth changes in the charging priority parameter:

P_{c} = \frac{1}{10} (\frac{1}{1 + e^{(- α ({\hat{r}}_{h} + β))}})

(5)

where α and β are learned parameters, with α being the standardized arrival rate deviation factor, which ensures that the increment of P_c aligns with the increment of the predicted arrival rate; β is initialized based on the average arrival rate under normal load conditions (non-peak demand). In this study, we set α = 0.15 and β = 14 [7].

Step 2: Resource Allocation Rule. Based on the charging priority parameter P_c, charging station resources are dynamically allocated at each time step. Let

N_{t}

be the total number of charging slots in the station. The number of reserved charging slots

N_{r}

is defined as:

N_{r} = ⌈\min (P_{c} \cdot N_{t}, N_{t})⌉

(6)

This rounding ensures that the number of reserved charging slots is an integer and satisfies the charging demand of high-priority users. The higher the value of P_c, the higher the arrival rate, indicating a greater urgency in the resource demand for high-priority users, thus increasing the number of reserved charging slots.

Step 3: Calculation of minimum and maximum expected waiting times. The minimum expected waiting time E_min represents the expected waiting time for high-priority users when the load is low (high priority, low arrival rate, and sufficient reserved charging slots). It can be calculated using steady-state queuing theory [15]:

E_{\min} = \frac{ρ_{h}}{μ (1 - ρ_{h})}

(7)

ρ_{h} = \frac{r_{h}}{μ \cdot N_{r}}

(8)

where

μ

is the service rate of a single charging slot (the number of vehicles serviced per unit time), and

ρ_{h}

is the load rate of reserved charging slots for high-priority vehicles.

The maximum expected waiting time E_max represents the expected waiting time for high-priority users when the load is high (low priority, high arrival rate, and insufficient reserved charging slots). It can also be calculated using steady-state queuing theory:

E_{\max} = \sum_{k = 0}^{N_{r} - 1} \frac{{(ρ_{h})}^{k}}{k!} \cdot \frac{ρ_{h}}{N_{r} \cdot (1 - ρ_{h})}

(9)

Step 4: Dynamic adjustment of the charging priority parameter. To ensure both resource allocation efficiency and fairness while meeting the service demands of high-priority and random users, the following dynamic adjustment logic for the charging priority parameter P_c is designed:

If E_min <

E_{thr}^{\min}

or E_max >

E_{thr}^{\max}

, adjust P_c:

P_{c}^{'} = P_{c} + Δ P

(10)

where

E_{thr}^{\min}

and

E_{thr}^{\max}

represent the minimum and maximum charging waiting times, respectively. The value of

Δ P

ranges from 0.05 to 0.1, with the optimal increment determined through cross-validation. This ensures that the charging priority parameters respond quickly during peak demand periods while avoiding excessive fluctuations. In this study,

Δ P

is set to 0.05.

The value range of the charging priority parameter

P_{c}

, ensuring that E_min ≥

E_{thr}^{\min}

and E_max ≤

E_{thr}^{\max}

, can be determined based on the minimum expected waiting time (E_min) and maximum expected waiting time (E_max).

To ensure fairness, the average queue waiting time in the region, T_avg, is defined as follows:

T_{avg} = \frac{T_{h} + φ \cdot T_{l}}{1 + φ}

(11)

where

T_{h}

represents the average queue waiting time for high-priority (reserved) vehicles,

T_{l}

represents the average queue waiting time for low-priority (random) vehicles, and

φ

is the weight coefficient for the average queue waiting times of reserved and random vehicles. Once

φ

is determined, the charging priority parameter

P_{c}

is selected based on the minimum value of T_avg within the range of

P_{c}

, which in turn determines the number of reserved charging stations.

To prevent the value of P_c from exceeding reasonable limits during continuous adjustments, upper and lower bounds,

P_{\min}

and

P_{\max}

, are set to constrain the range of the charging priority parameter.

P_{\min} \leq P_{c} \leq P_{\max}

(12)

By setting these reasonable upper and lower limits, extreme fluctuations in the charging priority parameter can be avoided during periods of high or low demand, ensuring the stability of resource allocation.

2.2. Algorithm for Reserving Charging Piles for Selection

In order to prioritize the needs of high-priority charging users and dynamically reallocate non-reserved charging stall resources during fluctuations in demand, the selection of reserved charging stalls must be governed by an adaptive adjustment mechanism. By defining selection variables, creating a profit function, and imposing constraints, the algorithm can optimize scheduling within resource constraints, ultimately improving the overall efficiency of charging resource utilization.

The reserved charging stall selection rules include three core conditions, each of which operates at different load levels (charging station occupancy rates) to adapt to system scheduling requirements:

Condition 1: Priority allocation for high-priority users. During peak charging times, to meet the service demands of high-priority users, available reserved charging stalls are prioritized for allocation to high-priority users in the queue. To implement this allocation strategy, selection variables and a profit function are defined to optimize resource utilization.

A binary selection variable,

ω_{j k} (t)

, is introduced to represent the reservation status of a charging stall:

ω_{j k} (t) = \{\begin{array}{l} 1, & If k is a reserved pile, \\ 0, & otherwise \end{array}

(13)

Here, j represents the charging station number, and k represents the charging stall number.

To assess the service effectiveness of reserved charging stalls, we define a profit function

u_{j k}

, which represents the service gain when charging stall k is allocated as a reserved charging stall:

u_{j k} = v_{i} - τ_{j k} g_{j k}

(14)

where

v_{i}

represents the service value per unit time provided to high-priority users,

τ_{j k}

denotes the time that charging stall k is occupied by high-priority users, and

g_{j k}

represents the resource consumption per unit time of charging stall k.

At time step t, if the number of reserved charging stalls

N_{r}

exceeds the current queue length

Q_{h} (t)

for high-priority users, the number of reserved charging stalls

N_{r}^{'}

allocated to high-priority users can be expressed as:

N_{r}^{'} = ⌈\min (N_{r}, Q_{h} (t))⌉

(15)

where

N_{r}

represents the number of reserved charging stalls, which is dynamically adjusted based on the charging priority parameter P_c. The calculation formula for

N_{r}

is provided in Section 2.1.

N_{r} = ⌈\min (P_{c} \cdot N_{t}, N_{t})⌉

(16)

This condition ensures that, when reserved charging stalls are available, high-priority users can immediately receive charging services, thus avoiding unnecessary waiting when resources are plentiful.

Condition 2: Load (charging station stall occupancy) status monitoring. When all reserved charging stalls are occupied by high-priority users (i.e.,

Q_{h} (t)

≥

N_{r}

) and the number of high-priority users in the queue continues to increase, the system must monitor the load status of non-reserved charging stalls. If there are available resources in non-reserved charging stalls, a portion of these stalls can be temporarily allocated to meet the urgent demand from high-priority users.

When the queue demand for high-priority users exceeds the number of reserved charging stalls, the selection variable

ω_{j k}

is set to 1, indicating that charging stall k is temporarily assigned as a reserved stall. The optimization objective at this stage is to maximize the total benefit of selecting reserved charging stalls:

\max_{ω_{j k}} \sum_{j} \sum_{k} u_{j k} ω_{j k}

(17)

The number of dynamically allocated non-reserved charging stalls cannot exceed the set non-reserved capacity limit

N_{non}^{j}

, ensuring that the resource needs of low-priority users are not affected. The constraint for the capacity limit is as follows:

\sum_{k} ω_{j k} \leq N_{non}^{j} - Q_{l}^{j} (t)

(18)

The number of temporarily allocated non-reserved charging stalls

N_{tem, non}^{j}

can be expressed by the following formula:

N_{tem, non}^{j} = \min (N_{non}^{j} - Q_{l}^{j} (t), Q_{r}^{j} (t) - N_{r}^{j})

(19)

where

Q_{l}^{j} (t)

is the queue length for low-priority vehicles (random vehicles) at charging station j,

Q_{r}^{j} (t)

is the queue length for high-priority vehicles at charging station j,

N_{non}^{j}

is the number of non-reserved charging stalls, calculated as:

N_{non}^{j} = N_{t}^{j} - N_{r}^{j}

(20)

This condition ensures that, when the demand from high-priority users exceeds the capacity of reserved charging stalls, the system can flexibly allocate non-reserved resources to meet the service demands of high-priority users. At the same time, the temporary allocation is constrained by the load state of low-priority users, preventing excessive waiting times for them.

Condition 3: Dynamic Resource Recovery Mechanism. To prioritize the charging needs of high-priority users, the dynamic resource recovery mechanism stipulates that reserved charging stalls at charging station j must not be used by low-priority users during the reservation period. The set of reserved charging stalls is denoted as

C_{r}^{j}

, while the set of temporarily allocated non-reserved charging stalls is denoted as

C_{a}^{j}

. Resource allocation is managed using a binary selection variable

w_{j k} (t)

, with the following specific rules:

When charging stall k belongs to the reserved set

C_{r}^{j}

of charging station j and has been reserved by a high-priority user (i.e.,

w_{j k} (t)

= 1), the usage decision variable

x_{k}^{j, l} (t)

for low-priority users must be set to 0 to ensure exclusive access for high-priority users:

\forall j \in J, \forall k \in C_{r}^{j}, \forall t \in T, if w_{j k} (t) = 1, then x_{k}^{j, l} (t) = 0

(21)

When a high-priority vehicle arrives and reserves charging stall k, the system sets the high-priority usage variable

x_{k}^{j, h} (t)

to 1, while ensuring the low-priority variable

x_{k}^{j, l} (t)

is set to 0.

\forall j \in J, \forall k \in C_{r}^{j}, \forall t \in T, if x_{k}^{j, h} (t) = 1, then x_{k}^{j, l} (t) = 0

(22)

For each charging station j, the status of temporarily allocated non-reserved charging stalls in set

C_{a}^{j}

depends on the current queue of high-priority users, denoted as

Q_{h}^{j} (t)

Specifically, temporarily allocated stalls in set

k \in C_{a}^{j}

are only made available to low-priority users when the high-priority user queue is empty, i.e.,

\forall j \in J, \forall k \in C_{a}^{j}, \forall t \in T, x_{k}^{j, h} (t) = \{\begin{cases} 1, if Q_{h}^{j} (t) = 0, \\ 0, oherwise \end{cases}

(23)

To further regulate the restoration process of temporarily allocated charging stalls, when the queue of high-priority users at charging station j is empty, the temporarily allocated stall set

C_{a}^{j}

will revert from the temporary allocation state to the non-reserved state, i.e.,

\forall j \in J, \forall k \in C_{a}^{j}, \forall t \in T, if Q_{h}^{j} (t) = 0, then k removes C_{a}^{j} from C_{r}^{j}

(24)

Additionally, the corresponding temporary allocation state variable

x_{k}^{j, temp} (t)

will be set to 0:

x_{k}^{j, temp} (t) = \{\begin{array}{l} 1, & if k \in C_{a}^{j} and Q_{h}^{j} (t) > 0, \\ 0, & otherwise \end{array}

(25)

This mechanism ensures the exclusive use of reserved charging stalls by high-priority users during their reservation period, preventing interference from low-priority users. Furthermore, when demand from high-priority users decreases, the system can promptly restore temporarily allocated stalls to a non-reserved state for use by low-priority users. This not only improves charging stall utilization but also achieves dynamic resource allocation balance between users of different priorities.

3. Non-Reservation Vehicle Scheduling Strategy

3.1. Optimization Model of Scheduling Strategy

Let

r_{j}^{h}

represent the arrival rate of high-priority users,

N_{j}^{r}

the number of reserved charging stalls, and

N_{j}^{non} = N_{j}^{t} - N_{j}^{r}

the number of remaining charging stalls at charging station j, while

N_{j}^{t}

denotes the total number of charging stalls at station j. Based on the random vehicle arrival rate during the time interval [T,T + ΔT] and location information Loc (t), the random vehicle arrival rate

r_{r}^{l}

of r sub-region at time T − ΔT can be calculated. The scheduling strategy aims to minimize the total time cost by optimizing the distribution of random vehicles across different charging stations.

Assume the total arrival rate of random vehicles in sub-region r, denoted as

r_{r}^{l}

, follows a Poisson distribution. The proportion of random vehicles assigned to charging station j is

p_{r, j}

, and the random vehicle arrival rate at station j from sub-region r is given by:

r_{j}^{r, l} = r_{r}^{l} \cdot p_{r, j}

(26)

The queueing waiting time

W_{q} (j)

of random vehicles at charging station j can be calculated using the Erlang C queueing model [30]:

W_{q} (j) = \frac{P_{w} (j) \cdot {(\frac{r_{j}^{r, l}}{u_{j}})}^{N_{j}^{non}}}{N_{j}^{non}! \cdot {(1 - \frac{r_{j}^{r, l}}{N_{j}^{non} \cdot μ_{j}})}^{2}}

(27)

where

P_{w} (j)

is the probability that charging station j enters a queueing state (waiting probability), which can be calculated using Equation (28):

P_{w} (j) = \frac{\frac{{(r_{j}^{r, l} / μ_{j})}^{N_{j}^{non}}}{N_{j}^{non}!}}{\sum_{n = 0}^{N_{j}^{non} - l} \frac{1}{n!} {(\frac{r_{j}^{r, l}}{μ_{j}})}^{n} + \frac{N_{j}^{non}!}{{(N_{j}^{non} - λ_{r, j} / μ_{j})}^{2}} {(\frac{r_{j}^{r, l}}{μ_{j}})}^{N_{j}^{non}}}

(28)

where

μ_{j}

is the service rate per charging stall at station j, representing the number of vehicles a single charging stall can process per unit of time.

The travel time

T_{r, j}

represents the time taken by a random vehicle to travel from sub-region r to charging station j, and is calculated as:

T_{r, j} = \frac{d_{r, j}}{v_{r}}

(29)

where

d_{r, j}

is the distance from the sub-region to station j, and

v_{r}

is the average speed within sub-region r.

The objective function established using the waiting time and travel time is as follows:

Z = \min \sum_{r \in R} \sum_{j \in J} (r_{r}^{l} \cdot p_{r, j}) [(\frac{P_{w} (j) \cdot {(\frac{r_{r}^{l} \cdot p_{r, j}}{μ_{j}})}^{N_{j}^{non}}}{N_{j}^{non}! \cdot {(1 - \frac{r_{r}^{l} \cdot p_{r, j}}{N_{j}^{non} \cdot μ_{j}})}^{2}}) + (\frac{d_{r, j}}{v_{r}})]

(30)

st . \sum_{j \in J} p_{r, j} = 1, \forall r \in R

(31)

\sum_{r \in R} r_{r}^{l} p_{r, j} \leq μ_{j} N_{j}^{non}, \forall j \in J

(32)

\frac{\sum_{r \in R} r_{r}^{l} p_{r, j}}{μ_{j}} + N_{j}^{r} \leq N_{j}^{t}, \forall j \in J

(33)

Constraint (31) ensures that each random vehicle is assigned to a specific charging station, preventing both “unassigned” and “over-assigned” vehicles. Constraint (32) limits the total load at all charging stations, ensuring it does not exceed their maximum service capacity. Constraint (33) ensures that the total demand from random vehicles and high-priority users does not exceed the total charging stall capacity at each station.

3.2. Optimization Algorithm

This section employs the Simulated Annealing (SA) algorithm to address the random vehicle scheduling optimization problem described in Section 2.1. The problem involves minimizing total time cost by optimizing the distribution ratio of random vehicles from sub-region r to charging station j, a multi-objective, multi-constraint, non-linear optimization challenge. SA is chosen for its global search capability, excellent convergence, and suitability for complex objective functions and constraints, effectively avoiding local optima.

Figure 9 illustrates the solution process, from generating the initial solution to obtaining the optimal result, where Z represents the current solution, and Z^∗ represents the newly generated solution.

The algorithm operates based on the objective function and constraints (31)–(33) as follows: The vehicle distribution ratio is randomly initialized to generate a feasible initial solution. A new solution is generated using the annealing criterion, and its objective function value difference is calculated. The new solution is accepted with a probability determined by:

P = \{\begin{cases} 1, Δ F > 0, \\ \exp (- \frac{Δ F}{T}), Δ F < 0 \end{cases}

(34)

In this equation,

Δ F

represents the difference in objective function values, and T is the current temperature. The initial temperature T₀ decreases iteratively with k according to a factor

T = α T

, transitioning the search from global to local optimization.

To enhance efficiency, a sensitivity-based neighborhood search strategy is employed. It prioritizes adjustments in regions r and charging stations j that significantly influence the objective function, optimizing charging wait and travel times. This increases the likelihood of sampling high-quality solutions and accelerates convergence.

The algorithm ultimately outputs the optimal distribution ratio, minimizing the objective function globally while satisfying constraints and enhancing system performance.

The optimization performance of the Simulated Annealing (SA) algorithm is highly dependent on the selection of key hyperparameters. To ensure the scientific validity and rationality of these hyperparameters, this study references two seminal works: [47,48]. These studies systematically analyze optimization strategies within the SA framework and validate the impact of hyperparameter settings across various application scenarios.

Kirkpatrick et al., in “Optimization by Simulated Annealing” (Science), were the first to systematically introduce the optimization mechanism of SA and explore the effects of key hyperparameters, including the initial temperature (T₀), cooling factor (α), and termination temperature threshold (T_min), on optimization performance. Their findings indicate that setting the initial temperature T₀ within 10–20% of the objective function’s value range provides a strong global search capability while preventing excessive computational costs [47]. Based on this, our study sets T₀ = 100. The cooling factor α, typically within the range of 0.8–0.99, ensures a balance between convergence speed and search quality; therefore, we adopt α = 0.9. The termination temperature threshold T_min, typically set between 0.1 and 1% of the initial temperature, balances optimization accuracy and computational efficiency. Thus, we select T_min = 10⁻⁴.

Furthermore, van Laarhoven and Aarts, in “Simulated Annealing: Theory and Applications” (Springer), provide a comprehensive mathematical analysis of SA and discuss hyperparameter optimization strategies. Their research demonstrates that setting the maximum number of iterations K_max between 500 and 2000 achieves a balance between computational cost and solution quality; hence, we set K_max = 1000. Additionally, the neighborhood search range δα, which influences search stability, is recommended to be within ±0.005 to ±0.02. Based on this, we set δα = ±0.01.

The hyperparameter settings adopted for the SA algorithm in this study are summarized in Table 5.

4. Reservation Vehicle and Charging Pile Matching Algorithm

4.1. Reservation Vehicle and Charging Pile Matching Model

The matching problem between electric vehicle (EV) users and charging piles can be modeled as a reinforcement learning (RL) problem, described using a Markov Decision Process (MDP). The charging pile allocation environment is treated as the system state, while the scheduling strategy serves as the agent. The primary objective of the matching process is to dynamically allocate charging pile resources, minimizing the waiting time for high-priority users, while balancing the resource utilization efficiency for low-priority users. The overall system performance is optimized by considering both travel costs and queuing costs. This section models the problem from the following aspects.

The system state space S serves as the foundation for the decision model, containing key information that describes the operational status of the charging station and user demand. It includes the following variables: the queue lengths

Q_{h} (t)

and

Q_{l} (t)

for high- and low-priority users, the state-of-charge (SOC) thresholds

{SOC}_{h} (t)

and

{SOC}_{l} (t)

for high- and low-priority vehicles, the charging pile status and the number of reserved charging piles

P_{s} (t)

and

N_{r} (t)

, the arrival rates

r_{h} (t)

and

r_{l} (t)

for high- and low-priority vehicles, and the current time t. These state variables provide the agent with global information, with decisions and optimizations based on these data.

Based on the state space, the agent adjusts the charging pile allocation strategy through a set of actions. The action set includes assigning reserved charging piles to high-priority users, dynamically adjusting the resource allocation ratio for low-priority users, temporarily reallocating resources from non-reserved charging piles to high-priority users, and dynamically releasing non-reserved resources to low-priority users. These actions directly affect the evolution of the system state and reflect the scheduling effectiveness through the immediate reward function.

Given a state

s_{t}

and action, the state transition probability

P (s_{t + 1} | s_{t}, a_{t})

describes the likelihood of the system transitioning to the next state

s_{t + 1}

. Historical data analysis allows the dynamic prediction of the high-priority user’s arrival rate

r_{h} (t)

using the Attention-LSTM model, which, combined with a Poisson distribution, characterizes the user arrival pattern. The state transition probability is given by

P (s_{t + 1} | s_{t}, a_{t}) = \prod_{j} Poisson (r_{j})

(35)

The dynamic parameter

r_{j} = r_{h} (t)

is provided by the Attention-LSTM model, based on the prediction results at time t. The load status of the charging piles (including both reserved and non-reserved piles) is updated dynamically according to the scheduling actions and user demand. The combination of Attention-LSTM and the Poisson distribution effectively captures the randomness and time-varying characteristics of the arrival patterns.

The immediate reward function is central to the optimization process, quantifying the impact of each action on the service quality of high-priority users, resource utilization efficiency, and fairness for low-priority users. It is defined as follows:

r (s_{t}, a_{t}, s_{t + 1}) = - λ_{1} \cdot W (s_{t}, a_{t}) - λ_{2} \cdot T (s_{t}, a_{t}) + λ_{3} \cdot U (s_{t}, a_{t})

(36)

where

W (s_{t}, a_{t})

represents the average charging waiting time for users in the region;

T (s_{t}, a_{t})

is the average charging travel time for users in the region;

U (s_{t}, a_{t})

denotes the charging pile utilization rate in the region,

λ_{1}

,

λ_{2}

, and

λ_{3}

are the weighting factors for waiting time, travel time, and charging pile utilization rate, respectively, used to balance the contributions of each component to the reward. The reward function is designed to minimize the waiting time for high-priority users while maintaining the overall fairness of system resource allocation.

The goal of the agent is to maximize the long-term cumulative reward by finding the optimal scheduling strategy u. The corresponding objective function is

L (θ) = E (δ_{t}^{2})

(37)

During the scheduling process, the agent selects the optimal action based on the current state and action set, then observes the next state and immediate reward. Through continuous interaction and optimization, the agent gradually learns the optimal scheduling strategy under different load conditions.

In the modeling framework outlined above, the system integrates the charging priority parameter calculation model from Section 2.1 to dynamically predict the demand of high-priority users. It also utilizes the reserved charging pile selection algorithm from Section 2.2 to appropriately adjust the allocation of reserved charging pile resources. Additionally, based on the low-priority vehicle scheduling strategy in Section 3.1, the system coordinates the distribution ratio of low-priority vehicles to maintain charging fairness. Ultimately, with the reinforcement learning framework, the agent optimizes the loss function by combining state transitions and immediate rewards, learning the optimal strategy and achieving efficient user-to-charging pile matching.

4.2. Vehicle and Charging Pile Matching Framework

The proposed DDPDQN (Double Dueling Prioritized Deep Q-Network) algorithm consists of several modules: system state generation, prioritized experience replay, action selection, value estimation, target network update mechanism, and loss calculation. The complete architecture of the algorithm is shown in Figure 10.

The green module in the figure represents the system state generation module. In this module, the system (System module) collects real-time data related to electric vehicle charging, including the queue lengths of high- and low-priority vehicles, SOC thresholds, and arrival rates of high- and low-priority vehicles, forming the current system state s_t. Based on this current state, the changes in charging priority parameters (ΔP_c), charging waiting time (ΔW), charging travel time (ΔT), and charging pile utilization (ΔU) are calculated. These variations reflect the difference between the current system state and future system requirements. The state vector and the changes are passed as inputs to the neural network’s feature extraction layer.

The feature extraction stage uses a three-layer fully connected network to progressively reduce the dimensionality from 128 to 32. Each layer employs the ReLU activation function to enhance non-linear expression capabilities. After the feature extraction and fully connected layers, the dueling architecture splits the calculation of the Q-value into two parts: the state value neural network

V (s_{t}; θ, α)

and the advantage function neural network

A (s_{t}, a_{t}; θ, β)

. The network then generates the optimal action selection policy π(s) based on the

ε - greedy

strategy, selecting the action a_t.

Once the action is selected, the system evaluates the reward r_t based on the current state s_t and action a_t, and transitions to the next state s_t₊₁. The electric vehicle then uses the current system state, action, reward, and next state to construct an experience sample (s_t, a_t, r_t, s_t₊₁), which is stored in the experience replay pool.

In the DDPDQN algorithm, to ensure that the policy converges to the optimal, the action selection at time t is made using the following mechanism: With probability

ε

, the action a_t is selected randomly, and with probability

1 - ε

, the action corresponding to the maximum action value for the current state s_t is chosen, i.e.,

a_{t} = \{\begin{cases} random A (s_{t}) λ \leq ε, \\ argmax Q (s_{t}, a_{t} | θ) λ \geq ε \end{cases}

(38)

In the equation,

λ

represents a random number within the range [0, 1];

ε

is the exploration rate, which decreases as the number of training iterations increases, and

ε

is a constant.

The exploration rate ε follows a linear decay, and the exploration rate at the k-th training iteration, ε_k, is given by:

ε_{k} = \{\begin{cases} ε_{s} - \frac{ε_{s} - ε_{e}}{ψ} k \leq ψ, \\ ε_{e} k > ψ \end{cases}

(39)

where k is the number of training iterations, ε_s is the starting exploration rate, ε_e is the ending exploration rate, and

ψ

is the total number of decays.

The blue module in the figure represents the Prioritized Experience Replay (PER) module. This module consists of several key components: the Replay Buffer, Mini-batch Sampling, Prioritized Sampling, and Update mechanisms. In the system state generation module from the previous stage, we obtain an experience sample (s_t, a_t, r_t, s_t₊₁), which is stored in the Replay Buffer as a transition.

In the prioritized experience replay mechanism, the Temporal Difference (TD) error δ_i is first calculated for each stored experience sample. Based on the TD error, a priority is assigned to each experience. Higher-priority samples are more likely to be selected during subsequent sampling. During each training cycle, the system samples a mini-batch from the Replay Buffer based on priority. After mini-batch sampling, the selected experience samples are fed into the dueling network for training. The loss function is computed, and model parameters are updated. After each model update, the priority of samples in the Replay Buffer is recalculated based on the TD error.

In our paper, we adopt the Prioritized Experience Replay (PER) mechanism, and the optimized algorithm combining Double DQN and Dueling DQN is referred to as DDPDQN. The TD error and loss function for the DDPDQN algorithm are given by:

δ_{t} = [r_{t + 1} + γ \max_{a_{t + 1}} Q (s_{t + 1}, a_{t + 1}; ω); ω^{'}] - Q (s_{t}, a_{t}; ω)

(40)

L (ω) = E [r_{t + 1} + γ Q (s_{t + 1}, a_{t + 1}; ω^{'})] - Q (s_{t}, a_{t}; ω)

(41)

where

ω

represents the set of parameters

θ

, α, and β for the training network, while

ω^{'}

denotes the same parameters for the target network.

The charging priority is typically defined as the absolute value of the Temporal Difference (TD) error [38], expressed as

p_{t} = | δ_{t} | + ε

(42)

where δ_t is the TD error for the t-th experience and ε is a small positive constant to prevent zero priority. The sampling probability P(i) for a given experience is defined as

P (t) = \frac{p_{t}^{α}}{\sum_{k} p_{k}^{α}}

(43)

Were, α controls the impact of priority sampling. When α = 0, the sampling degenerates into uniform random sampling. To correct the bias introduced by prioritized sampling, importance sampling weights are introduced:

w_{t} = (\frac{1}{N \cdot P (t)})^{β}

(44)

where N is the size of the replay buffer, and β balances the bias correction, starting from a small value and gradually increasing to 1.

In the architecture, the orange module represents the Action Selection Module. Here, the current state s_t and subsequent state s_t₊₁, obtained from the Prioritized Experience Replay mechanism, are input into the module. The current state s_t and its associated variations are fed into the dueling network to determine the action a_t, while s_t₊₁ and its variations are processed by the target dueling network to determine the target action

a^{'}

.

The beige module represents the Value Estimation Module, which takes inputs from the Prioritized Experience Replay mechanism and the Action Selection Module, including a_t, s_t, target action

a^{'}

, and s_t₊₁. These inputs are processed through the dueling network and the target dueling network to calculate Q Eval and Q Target.

The TD error is then calculated based on Q Eval and Q Target, followed by the computation of the loss function Loss. Parameters for the Value Estimation Module and Action Selection Module networks are updated using gradient descent. Similarly, parameters for the Target Action Selection Module and Target Value Estimation Module networks are updated using the same strategy.

The parameter update formula for the DDPDQN algorithm is as follows:

\nabla_{ω} L (ω) = 2 E [δ_{t} \nabla_{ω} Q (s_{t}, a_{t}; ω)]

(45)

The process of the DDPDQN algorithm is shown in Figure 11. The detailed steps are as follows: Initialization: Initialize network parameters θ and target network parameters θ′. Set up the replay buffer. Initialize the environment and observe the current state s_t. Action Selection: Use an ε-greedy strategy to select an action a_t based on s_t. Environment Interaction: Execute the action a_t, receive the reward r_t, and transition to the next state s_t₊₁. Replay Buffer Update: Calculate the TD error δ_t and store the experience (s_t, a_t, r_t, s_t₊₁) in the replay buffer. Assign priorities to experiences based on their TD error. If the buffer is full, remove the oldest sample. Mini-batch Sampling: Sample a mini-batch of mmm experiences from the replay buffer based on priority and train the Dueling DQN model. Compute the loss function L(θ) and update the network parameters using the optimizer. Optimizer Selection: Use the Adam optimizer for t ≤ 14,000 and switch to SGD for t > 14,000. Target Network Update: Update the target network parameters θ′ with the current network parameters θ after every j training iteration. Termination: Repeat the above process until reaching the maximum training iteration J. The enhanced ε-greedy strategy improves the exploration of “state-action” pairs, accelerating the convergence and precision of the DDPDQN algorithm. Throughout the training process, the Double DQN mechanism mitigates the overestimation of Q-values, while the Dueling DQN structure provides a more accurate estimate of state value and action advantage. The incorporation of Prioritized Experience Replay further accelerates training and improves performance. Together, these enhancements significantly improve the efficiency and robustness of the algorithm in complex decision-making scenarios.

In the optimization process of the DDPDQN (Double Dueling Prioritized Deep Q-Network) algorithm, the selection of hyperparameters directly impacts the model’s training stability, convergence speed, and final decision quality. To ensure the scientific and rational selection of these hyperparameters, this study combines experimental analysis with support from influential literature to determine the optimal configuration.

The network structure is a core component of reinforcement learning models, determining feature extraction capabilities, training stability, and the final cumulative reward. To assess performance, we conducted experimental comparisons of different hidden layer configurations (two layers: 64, 32; three layers: 128, 64, 32; four layers: 256, 128, 64, 32). The experiments show that although the two-layer structure converges quickly, it achieves lower cumulative rewards. The four-layer structure is stable during early training but results in a decrease in the final cumulative reward. The three-layer structure converges after 10,000 iterations and achieves the highest cumulative reward. Therefore, we selected the three-layer configuration (128, 64, 32) for the model, as it strikes the best balance between computational overhead, convergence speed, and cumulative reward. This experimental analysis will be discussed in detail in Section 5.6.

This study employs a dual optimization strategy using Adam (0.001) and SGD (0.0001). Initially, Adam accelerates convergence, while SGD with a lower learning rate enhances training stability in later stages ([49]). The experience pool capacity is set to 10,000, and the TD error threshold ε is set to 200, based on the research of [50], which suggests that an experience pool size between 5000 and 50,000 improves sample diversity, while a TD error threshold between 100 and 300 optimizes training efficiency. The batch size is set to 64, following the recommendation of [51], which indicates that a batch size between 32 and 128 balances computational efficiency and gradient stability. The exploration rate ε-greedy strategy is initialized at 1.0 and decays to 0.01 over 10,000 steps, improving the convergence speed and preventing local optima ([49]). The discount factor γ is set to 0.99, as recommended by [51], to ensure the stability of long-term reward optimization. The training iterations are set to 20,000, with the target network update frequency set to 500, following the guidance of [50,51], with an optimal update frequency range of 200 to 1000 to ensure better policy convergence.

Finally, based on experimental testing and literature support, the optimal hyperparameter configuration for DDPDQN was determined, as shown in Table 6.

The DDPDQN reservation vehicle and charging pile matching algorithm is shown in Algorithm 1.

Algorithm 1: Reservation Vehicle and Charging Pile Matching Algorithm (DDPDQN)
	Input: State space S (system dynamic states, queue lengths, SOC thresholds, etc.), Action space A (actions such as assigning reserved piles, adjusting non-reserved resources), $Reward r_{t}$ (immediate feedback for evaluating the action), Replay buffer D with capacity M, Discount factor $γ$ , exploration rate $ε$ , initial parameters $θ$ $, θ^{'}$ Batch size m, learning rate $η$ Output: Optimized policy for matching vehicles to charging piles
1	Step 1: Initialization
2	Initialize replay buffer D with capacity M;
3	Initialize Dueling DQN network with random parameters $θ$ ;
4	Initialize target network with parameters $θ$ ← $θ^{'}$ ;
5	$Set exploration rate ε \leftarrow ε_{s}$ ;
6	Set learning rate $η$ for gradient descent;
7	Initialize counters for training iterations k ← 0;
8	Note: Ensure input state dimension is derived from system observations based on 2.1.
9	Step 2: System Initialization
10	$Observe initial state s_{0} \in S$ , including:
11	$High - priority queue length s_{1, t}$ $, low - priority queue length s_{2, t}$ ;
12	$SOC thresholds s_{3, t}$ $, s_{4, t}$ for high/low priority vehicles;
13	$Charging pile state s_{5, t}$ $, reserved pile count s_{6, t}$ ;
14	$High / low-priority vehicle arrival rates s_{7, t}$ $, s_{8, t}$ ;
15	$Current timestamp s_{9, t}$ ;
16	$Construct state vector s_{t} = [s_{1, t}, s_{2, t}, s_{3, t}, s_{4, t}, s_{5, t}, s_{6, t}, s_{7, t}, s_{8, t}, s_{9, t}]$ , incorporating inputs from 2.1 and 2.2;
17	Step 3: Interaction with Environment
18	for each time step t do
19	$Select action a_{t} \in A$ using $ε$ -greedy strategy: $a_{t} = \{\begin{cases} random action, with probability ε \\ {argmax}_{a} Q (s_{t}, a; θ), with probability 1 - ε \end{cases}$ . Note: Ensure input state dimension is derived from system observations based on 2.1;
20	$Execute action a_{t}$ $, update system state s_{t + 1}$ ;
21	$Observe immediate reward r_{t}$ $based on the effectiveness of a_{t}$ ;
22	$Store transition (s_{t}, a_{t}, r_{t}, s_{t + 1})$ in replay buffer D;
23	if \|D\| > M then
24	Remove oldest transition from D;
25	end
26	Note: $Reward r_{t}$ incorporates system-level performance metrics such as waiting time reduction (from 3) and load balancing (from 2.2);
27	end
28	Step 4: Prioritized Experience Replay
29	Priority Calculation: $For each sample i in D, compute TD error δ_{t}$ : $δ_{t} = r_{t} + γ Q (s_{t + 1}, a_{t + 1}; θ^{'}) - Q (s_{t}, a_{t}; θ)$ $Assign sampling probability p_{t}$ : $p_{t} = \frac{δ_{t} + ε}{\sum t (δ_{t} + ε)}$ Batch Sampling: $Sample minibatch {\{(s_{t}, a_{t}, r_{t}, s_{t + 1})\}}_{t = 1}^{m}$ $from D based on p_{t}$ , prioritizing samples with higher TD errors from 2.1 and 2.2;
30	Step 5: Model Training
31	Compute the loss function for the minibatch: $L (θ) = E [(r_{t + 1} + γ Q (s_{t + 1}, a_{t + 1}; θ^{'}) - Q (s_{t}, a_{t}; θ)]$ Use gradient descent to update network parameters $θ$ : $θ \leftarrow θ - η \nabla_{θ} L (θ)$ Note: Loss function incorporates feedback from 2.1 and 3 through rewards $r_{t}$ ;
32	Step 6: Dueling DQN Update
33	$Use dueling network structure to estimate Q (s, a)$ : $Q (s, a) = V (s) - [A (s, a) - \frac{1}{\|A\|} \sum_{a^{'}} A (s, a^{'})]$ Every fixed number of steps synchronizes target network parameters: $θ' \leftarrow θ$
34	Step 7: Exploration Rate Decay
35	Update exploration rate ^ε: $ε \leftarrow ε_{e} + (ε_{s} - ε_{e}) \cdot \exp (- k / K_{decay})$ . $where k is the current training iteration, K_{decay}$ is the total decay steps;
36	Increment training counter k←k + 1;
37	Step 8: Convergence Check
38	Repeat Steps 3 to 7 until policy converges to optimal strategy;
39	return Optimized policy for matching vehicles to charging piles

5. Simulation Experiments and Analysis

5.1. Simulation Scenario Setup

To validate the effectiveness of the proposed Adaptive Charging Priority (ACP) strategy, the simulation was conducted in a simulated urban electric vehicle (EV) charging network using the Opportunistic Network Environment (ONE) simulator [39]. The simulated area, measuring 4500 × 3400 m², was modeled based on the actual traffic and charging network distribution in Helsinki, Finland. The simulation scenario includes multiple road nodes and paths to represent dynamic urban traffic conditions.

The EVs in the simulation were modeled on the Nissan Leaf configuration [40], with a maximum battery capacity of 40 kWh, a driving range of 240 km, and an average energy consumption of 0.167 kWh/km. At the start of the simulation, all vehicles were assumed to have a fully charged battery to simulate real-world usage.

This study employs empirical research to appropriately set the State of Charge (SOC) thresholds for electric vehicle (EV) users, ensuring the scientific validity of the model and the reasonableness of the simulation environment. Existing research has shown that EV users tend to actively seek charging opportunities when their SOC reaches certain levels. For instance, ref. [52] used Prospect Theory to model EV users’ charging decisions, finding that drivers typically begin seeking charging when the SOC drops to 20–30%. Additionally, ref. [53] analyzed the charging behavior of EV users in Beijing, based on real-world operational data, and found that users are much more likely to seek charging when the SOC falls below 40%. This indicates a strong correlation between SOC levels and charging wait times. These findings provide a quantitative basis for setting the SOC thresholds. Based on this research, the SOC thresholds for high-priority users are set at 25%, 35%, and 45%. In contrast, low-priority users have more flexible charging time requirements and greater charging flexibility, so their SOC thresholds are set at 45%, 55%, and 65%, representing users who are willing to delay charging and are not in urgent need of finding a charging station. To enhance the realism of the simulation environment and improve the model’s accuracy in simulating real-world EV user charging behavior, this study further classifies high- and low-priority vehicles into three categories based on SOC thresholds. This classification method not only adds diversity to the simulation scenarios but also provides a more comprehensive evaluation of the applicability and fairness of the scheduling strategy across different user groups. Vehicles were randomly distributed across road nodes at the beginning of the simulation, with speeds varying between 25 and 45 km/h to simulate realistic traffic dynamics. Upon reaching a destination, vehicles randomly selected a new endpoint until their SOC dropped below the defined threshold.

The simulation scenario included five charging stations, each equipped with eight charging piles, with each pile offering a charging power of 50 kW. A centralized scheduling controller dynamically managed all charging requests, optimizing resource allocation in real time based on charging station load, user priority levels, travel distance, and priority parameters. The controller prioritized high-priority users’ charging efficiency while dynamically adjusting the scheduling strategy for low-priority users to minimize their waiting and travel times. Once users received their charging station assignments, they traveled along the shortest path, calculated based on the regional road network topology, to complete charging.

The simulation setup in this study not only effectively eliminates the influence of EV hardware parameter variations on experimental results—ensuring that the evaluation of the ACP strategy accurately reflects the impact of scheduling optimization on charging efficiency—but also enhances the experiment’s controllability, reproducibility, and realism. Standardized EV configurations and fixed SOC thresholds ensure consistency in experimental conditions, while dynamic traffic modeling and centralized scheduling control simulate real-world traffic flow and charging demand fluctuations, making the experiment more representative of practical application scenarios.

The simulation ran for 14 h, from 10:00 to 24:00, covering both peak and off-peak EV charging periods. During this time, charging resources were dynamically optimized under the ACP framework, aiming to minimize high-priority users’ waiting times, improve low-priority users’ charging efficiency, and ensure fairness.

To evaluate the performance of the ACP strategy, we simulated and compared three charging management strategies: Adaptive Charging Priority (ACP), First-Come-First-Served (FIFS) [27], and Static Reservation Priority (RFWDA) [30]. These strategies differ significantly in terms of charging resource management and scheduling methods. The key differences in their approaches are as follows:

Adaptive Charging Priority (ACP) Strategy: The ACP strategy integrates dynamic priority adjustment and intelligent resource allocation. First, it uses the Attention-LSTM model to dynamically calculate priority parameters based on real-time user demand and charging station load, optimizing resource allocation. Second, ACP incorporates a dynamic charging pile allocation algorithm, which flexibly adjusts the distribution of charging piles in response to real-time load and demand fluctuations, ensuring that high-priority vehicles are charged promptly. Additionally, ACP introduces a non-reservation vehicle scheduling strategy, balancing load and assigning non-reservation vehicles to charging stations with lighter loads, reducing their waiting time and improving resource utilization. Furthermore, ACP uses the DDPDQN algorithm for dynamic matching of charging piles with reservation vehicles, adjusting allocation strategies automatically based on reinforcement learning.

First-Come-First-Served (FIFS) Strategy: The FIFS strategy allocates charging piles based on the order of vehicle arrival, providing charging resources to vehicles as they arrive. Both reservation and non-reservation vehicles are scheduled according to their arrival sequence. This approach does not consider load fluctuations or changes in charging demand, and charging pile allocation strictly follows the order of arrival. As a result, FIFS lacks dedicated optimization for non-reservation vehicles, and all vehicles follow the same queuing rules without dynamic adjustment mechanisms.

Static Reservation Priority (RFWDA) Strategy: The core of the RFWDA strategy is to prioritize the charging needs of reserved vehicles. Reserved vehicles receive charging resources first. However, the resource allocation in RFWDA is static, and it does not adjust dynamically in response to load fluctuations. Additionally, non-reservation vehicles are not optimized, and their scheduling still follows fixed rules, often leading to lower charging station allocation efficiency during peak demand periods. Similar to the FIFS strategy, RFWDA lacks a dynamic scheduling mechanism for non-reservation vehicles.

Through the analysis of these three strategies, the following key differences emerge:

(1): Dynamic Priority Adjustment vs. Static Scheduling: The ACP strategy, by dynamically adjusting priorities, resource allocation, and non-reservation vehicle scheduling, offers greater flexibility in responding to fluctuations in charging demand compared to the static scheduling methods of FIFS and RFWDA. Both FIFS and RFWDA rely on static rules, which lack dynamic responsiveness to changing charging demand, leading to inefficient resource allocation.
(2): Dynamic Charging Pile Allocation vs. Static Reservation Allocation: The ACP strategy not only prioritizes the needs of reserved vehicles but also includes a dedicated non-reservation vehicle scheduling strategy. This strategy assigns non-reservation vehicles to stations with lighter loads, reducing waiting times and improving resource utilization. In contrast, FIFS and RFWDA feature relatively simple scheduling mechanisms for non-reservation vehicles. FIFS treats all vehicles equally, while RFWDA prioritizes reserved vehicles but applies static scheduling for non-reservation vehicles, lacking dynamic optimization of resource distribution.
(3): Non-Reservation Vehicle Scheduling Strategy: The ACP strategy features a non-reservation vehicle scheduling optimization strategy that dynamically allocates these vehicles to charging stations with lighter loads based on station load and non-reservation vehicle arrival rates, preventing queuing issues during peak demand periods. In comparison, FIFS and RFWDA lack dedicated scheduling mechanisms for non-reservation vehicles, with scheduling relying entirely on arrival order or static rules, unable to flexibly respond to demand fluctuations.
(4): Charging Resource Matching Mechanism: The ACP strategy dynamically matches charging piles with reservation vehicles using the DDPDQN algorithm, which can adjust resource allocation in real time to respond to load fluctuations and demand changes. In contrast, FIFS and RFWDA do not account for the impact of load fluctuations on resource allocation, and their charging pile matching is more rigid and lacks flexibility.

The performance of these strategies was evaluated based on the following four charging efficiency metrics:

(1): Average Charging Waiting Time: The average time users wait from arriving at a charging station to starting to charge, reflecting the timeliness of charging services.
(2): Average Charging Travel Time: The total time from departure to charging completion, including travel, waiting, and charging times, representing overall charging efficiency.
(3): Full Charging Count: The number of instances where vehicles achieved a full charge within the allowed parking time, indicating the efficiency of resource utilization.
(4): Unfinished Charging Count: The number of instances where vehicles failed to fully charge due to parking time constraints, including cases where users were still waiting to charge when the parking limit was reached.

5.2. High-Priority Vehicle Arrival Rate Prediction: Experiments and Analysis

5.2.1. Dataset

The experimental data, sourced from the publicly available ACN-Data dataset by Caltech, includes approximately 40,000 charging records from July 2018 to June 2019. Key attributes such as arrival times, reservation status, SOC, energy demand, and station utilization rates provide a solid foundation for EV charging optimization research.

In this study, we define priority levels based on users’ sensitivity to charging wait times. High-priority users are those who are more sensitive to waiting times and seek to complete charging as quickly as possible, whereas low-priority users are more flexible and can tolerate delayed charging when resources are constrained. Since the “reservation status” in the ACN-Data dataset reflects users’ charging planning and urgency, we use it as the criterion for priority classification. Our analysis of the dataset reveals that reserved users experience significantly shorter charging wait times compared to non-reserved users, confirming that reservation status is a valid indicator of user priority. Therefore, we classify reserved users as high-priority and non-reserved users as low-priority, ensuring that the proposed charging scheduling strategy better aligns with user needs.

This priority classification method differs from conventional approaches that primarily rely on battery status indicators, such as state of charge (SOC) and energy demand. Existing studies, predominantly from the perspective of power grid optimization, define priority levels based on factors such as remaining battery capacity, relaxation time, and the ratio of charging energy to charging time [21,22,23,24]. However, these methods often overlook user experience and real-world charging behavior patterns. In contrast, this study introduces reservation status as a priority criterion, offering a more precise representation of user demand and making the charging scheduling strategy more practical and user oriented.

Seven features relevant to predicting high-priority vehicle arrival rates were selected as inputs for the Attention-LSTM (ALSTM) model, with the arrival rate as the output. The dataset was split chronologically into training (75%) and test (25%) sets for model training and evaluation.

To reduce the impact of feature value range differences, min–max normalization was applied to scale data values to the [0, 1] range using the formula:

x_{n} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

(46)

where x_n represents the normalized value, x is the original value, and x_min and x_max denote the feature’s minimum and maximum values, respectively. This normalization process significantly improved model training efficiency while ensuring balanced contributions from different features.

5.2.2. Training Phase Analysis

To evaluate the performance of different models in predicting high-priority vehicle arrival rates, the experiment compared four neural network models: LSTM, BPLSTM, DRNN, and Attention-LSTM. Mean Squared Error (MSE) was used as the loss function to measure training error. The error trends over different iteration counts were recorded, and the results are presented in Figure 12 and Table 7.

Figure 10 shows the MSE variation curves during training. While all models exhibited decreasing training errors with iterations, their final error levels varied significantly. Attention-LSTM consistently achieved the lowest error, followed by DRNN, with BPLSTM and LSTM showing higher errors, and LSTM performing the worst.

Table 7 highlights the final MSE values: Attention-LSTM achieved an MSE of 0.238, significantly outperforming DRNN (1.173), BPLSTM (2.745), and LSTM (3.876).

In summary, the Attention-LSTM model effectively captured dynamic variations in high-priority user arrival rates, demonstrating the lowest training error and highest prediction accuracy, particularly excelling in complex time-series tasks.

5.2.3. Experimental Analysis in the Testing Phase

To evaluate the practical performance of the models in predicting the arrival rate of high-priority vehicles, the testing dataset was used to compare four models: LSTM, BPLSTM, DRNN, and Attention-LSTM. The testing data were chronologically arranged, and two evaluation metrics were adopted: Mean Absolute Error (MAE) and the coefficient of determination (R²). These metrics assess prediction error and goodness of fit, respectively.

Figure 13 compares predicted and actual arrival rates, with Attention-LSTM showing the closest fit and highest predictive accuracy. DRNN ranked second, while BPLSTM and LSTM displayed larger deviations, with LSTM exhibiting the most fluctuations. Figure 14 highlights prediction error curves, where Attention-LSTM maintained minimal and stable errors throughout, outperforming DRNN, BPLSTM, and LSTM. The blue-shaded area represents the region where prediction errors fluctuate around zero, indicating the general trend and stability of all models in minimizing error deviations.

By leveraging the attention mechanism, Attention-LSTM effectively captured dynamic variations in high-priority vehicle arrival rates, achieving superior MAE and R² metrics. These results confirm its exceptional predictive accuracy and applicability in dynamic scenarios.

Table 8 summarizes the MAE and R² values for each model. The ALSTM model achieved an MAE of 0.341, representing a 62.7% reduction compared to DRNN (0.915), an 83.1% reduction compared to BPLSTM (2.014), and an 89.1% reduction compared to LSTM (3.127). Additionally, ALSTM achieved an R² value of 0.978, which is 1.8%, 4.6%, and 8.3% higher than those of DRNN, BPLSTM, and LSTM, respectively.

5.3. Impact of Reserved Charging Piles: Experiment and Analysis

To further analyze the impact of reserved charging piles on the charging efficiency of high- and low-priority electric vehicles (EVs), a dual-region, multi-charging station simulation scenario was designed based on the simulation network described in Section 4.1. The experiment assumes an even distribution of EVs across subregions, with each charging station equipped with five DC fast-charging piles. Queue length limits were imposed to reflect real-world service capacity constraints. To capture regional load variations, dynamic adjustment parameters

E_{thr}^{\min}

and

E_{thr}^{\max}

were introduced, enabling flexible simulation of resource allocation strategies under different load distributions. The results are based on the average of 10 independent experiments, with key parameters listed in Table 9.

Table 10 outlines the relationship between parameters

E_{thr}^{\min}

,

E_{thr}^{\max}

, T_avg, and r_h. In this experiment, the random vehicle arrival rate was fixed at r_l = 12 EV/h. Analysis shows that, except for the cases r_h = 22 EV/h, P_c = 0.05 and r_h = 24 EV/h, P_c = 0.057, both E_min and E_max satisfy the constraint conditions.

When the constraints for

E_{thr}^{\min}

and

E_{thr}^{\max}

are satisfied, increasing r_h and the priority parameter P_c results in a gradual reduction in the number of reserved piles, while E_min and E_max increase significantly. This indicates that adjusting the priority parameter effectively influences the queuing time for reserved vehicles, thereby optimizing the allocation of charging resources. To balance the efficiency of reserved and random vehicles, all experiments were conducted under the unified condition of ϕ = 10.

Results further indicate that the reserved vehicle arrival rate rhr_hrh significantly impacts the reserved pile configuration. When r_h ≤ 14 EV/h, charging resources can meet demand without the need for reserved piles. Conversely, when r_h ≥ 26 EV/h, setting reserved piles is not recommended to avoid reducing the efficiency of random vehicles. Within the range 14 EV/h < r_h < 26 EV/h, a moderate number of reserved piles can effectively prioritize reserved vehicle needs while maintaining a balanced charging experience for random vehicles.

To evaluate the impact of reserved charging piles on the ACP algorithm across different charging stations, the experiment fixed the random vehicle arrival rate at r_l = 12 EV/h. Figure 15a,b illustrate the inverse relationship between the arrival rates of reserved and random vehicles at the two charging stations. Reserved vehicle allocation is significantly influenced by the availability of idle charging piles. When the number of reserved piles is low, the impact is limited, but as the number increases, the effect on allocation proportions becomes more pronounced. For random vehicles, as r_h increases, the random vehicle arrival rate at the station with reserved piles rises, while it decreases at the station without reserved piles, highlighting the notable role of reserved piles in regulating the distribution of random vehicles.

Figure 15c,d present the average queuing wait times. Reserved vehicle wait times remain stable between 7 and 8 min, showing minimal sensitivity to changes in r_h and r_l. However, the wait times for reserved vehicles at stations with reserved piles are slightly lower than those at stations without reserved piles. In contrast, the queuing times for random vehicles are significantly influenced by arrival rates and resource distribution. As r_h increases, their queuing times grow steadily, reflecting intensified resource competition.

5.4. Reserved Vehicle and Charging Pile Matching: Experiment and Results Analysis

Figure 16, Figure 17 and Figure 18 compare the allocation performance of three strategies in matching reserved vehicles to charging piles. The ACP strategy achieved 97.6% allocation to reserved piles, with only 1.8% to non-reserved piles and 0.6% unallocated, showcasing its superior precision and efficiency.

In contrast, DDQN allocated 84.2% to reserved piles, 12.8% to non-reserved piles, and there were 3.0% unallocated, performing moderately well during off-peak periods but falling short overall. The DQN strategy performed the worst, with 74.6% allocated to reserved piles, 15.9% to non-reserved piles, and 9.5% unallocated, particularly struggling during peak hours (e.g., 18:00–20:00).

The results highlight the DDPDQN strategy’s dynamic scheduling and optimized allocation capabilities, significantly improving reserved pile usage and reducing unallocated vehicles, especially during peak demand. It surpasses DDQN and DQN in accuracy, efficiency, and reliability.

5.5. Validation of Charging Efficiency Results

In this group of simulations, the random vehicle arrival rate is fixed at 10 EV/h. Figure 19 shows results for high-priority vehicle arrival rates with a fixed maximum parking duration of 1800 s. The ACP strategy achieves shorter travel times, more fully charged vehicles, and fewer partially charged ones compared to RFWDA and FIFS, reducing travel times by up to 506 s during peak periods (22 EV/h). For high-priority vehicles, ACP-Reserved outperforms RFWDA-Reserved in wait times, travel times, and fully charged vehicles, with 132 more fully charged vehicles. RFWDA’s lack of dynamic resource allocation limits its efficiency during peak demand. In summary, the ACP strategy offers greater efficiency and reliability.

Figure 20 shows the performance of random vehicles under varying high-priority vehicle arrival rates, with a fixed parking duration of 1800 s. During peak periods (22 EV/h), the ACP-Random strategy reduces average wait times by 244 s and partially charged vehicles by 125, demonstrating improved charging efficiency. By evenly distributing random vehicle arrivals across charging stations, ACP-Random prevents congestion, ensuring efficient charging for both high-priority and random vehicles. In summary, ACP-Random outperforms the baseline strategy, providing efficient and balanced charging services.

In this group of simulations, the appointment vehicle arrival rate is fixed at 20 EV/h. Figure 21 shows the impact of varying random vehicle arrival rates with a fixed parking duration of 1800 s. Both the ACP and RFWDA strategies consistently achieve shorter wait times and trip durations than FIFS. At a 13 EV/h arrival rate, ACP and RFWDA reduce trip durations by 407 and 351 s, respectively, demonstrating effective resource allocation. ACP outperforms RFWDA under limited parking durations, increasing the number of fully charged vehicles by 265 at 13 EV/h. ACP-Reservation further adds 34 fully charged vehicles over RFWDA-Reservation while maintaining shorter wait times and trip durations. In summary, the ACP strategy excels in resource optimization, enhancing charging efficiency, reducing waiting times, and improving station utilization, particularly for reserved vehicles.

Figure 22 highlights the effects of varying random vehicle arrival rates on random vehicle performance, with a fixed parking duration of 1800 s. The ACP-Random strategy consistently outperforms the baseline Random strategy, reducing charging trip durations by 251 s and increasing the number of fully charged vehicles by 157 at a 13 EV/h arrival rate. This improvement is due to ACP-Random’s dynamic allocation of resources, optimizing charging efficiency for random vehicles while still meeting the needs of high-priority vehicles. Overall, the ACP strategy significantly enhances charging efficiency for both high-priority and random vehicles through optimized resource allocation.

In this group of simulations, the appointment vehicle arrival rate is fixed at 20 EV/h, and the random vehicle arrival rate is fixed at 10 EV/h.

Figure 23 shows the impact of varying parking duration limits, with reserved and random vehicle arrival rates fixed at 20 EV/h and 10 EV/h, respectively. Both ACP and RWFAD outperform FIFS across all limits, with ACP and RWFAD reducing charging trip durations by 453 s and 387 s, respectively, under a 1200 s limit. While ACP and RWFAD achieve similar wait times, ACP delivers shorter trip durations, fewer incomplete charges, and more fully charged vehicles. This is due to ACP’s dynamic resource allocation, which enhances charging efficiency for both high-priority and random vehicles, unlike RWFAD’s static approach. In summary, ACP provides superior charging resource management, optimizing efficiency and fairness across varying parking duration limits.

Figure 24 compares random vehicle performance under varying parking duration limits, with high-priority and random vehicle arrival rates fixed at 20 EV/h and 10 EV/h, respectively. ACP-Random outperforms the random strategy, reducing trip durations by 275 s under a 1200 s limit, while increasing fully charged vehicles and reducing incomplete charges. By dynamically reallocating resources, ACP enhances random vehicle charging efficiency without compromising high-priority vehicle performance. In summary, ACP consistently improves random vehicle charging efficiency, especially under shorter parking durations.

In this group of simulations, the appointment vehicle arrival rate is fixed at 24 EV/h, and the random vehicle arrival rate is fixed at 12 EV/h.

Figure 25 compares charging performance under varying parking duration limits, with reservation and random vehicle arrival rates fixed at 24 EV/h. Both ACP and RWFAD outperform FIFS in reducing wait times and trip durations. Under a 1200 s limit, ACP and RWFAD reduce wait times by 96 and 69 s, respectively. While ACP’s wait times are slightly higher than RWFAD’s in some cases, it achieves shorter trip durations, more fully charged vehicles, and fewer incomplete charges, particularly for high-priority vehicles. ACP optimizes charging efficiency through dynamic resource reallocation based on real-time load and demand predictions, whereas RWFAD lacks such adaptability, limiting its performance under high demand. In summary, ACP offers superior efficiency in managing fluctuating charging demands and resource allocation.

Figure 26 compares random vehicle performance under varying parking duration limits, with reservation and random vehicle arrival rates fixed at 24 EV/h and 12 EV/h, respectively. Across all limits, the ACP strategy outperforms the random strategy. At a 3600 s limit, ACP fully charges 75 more random vehicles, effectively reducing congestion and increasing charging opportunities. By dynamically scheduling non-reservation vehicles while prioritizing high-priority vehicles, ACP optimizes wait times and travel durations, enhancing efficiency for both groups. In summary, ACP ensures balanced and efficient resource allocation, delivering high charging efficiency for all vehicle types.

Figure 27 compares the charging pile utilization rates of ACP, FIFS, and RFWDA at Charging Stations 1 and 2. The ACP strategy maintains near-optimal utilization, reaching ~100% during peak hours (12:00–13:00) and consistently exceeding 85% during non-peak hours (e.g., 10:00, 20:00), showcasing its efficiency and stability. In contrast, RFWDA and FIFS show significant fluctuations. RFWDA’s utilization drops to 65–75% at Charging Station 2 during non-peak periods, while FIFS often falls below 70% across various times. ACP also achieves better balance between stations, unlike the larger disparities observed with RFWDA and FIFS. In summary, ACP ensures higher, more balanced charging pile utilization, demonstrating superior resource management.

Figure 19, Figure 20, Figure 21, Figure 22, Figure 23, Figure 24, Figure 25, Figure 26 and Figure 27 demonstrate that the ACP strategy significantly outperforms both the FIFS and RFWDA strategies across key metrics, including charging wait time, charging travel time, the number of fully charged vehicles, and charging station utilization rates. These advantages stem primarily from the dynamic scheduling, flexible priority adjustment, and precise resource allocation embedded in the ACP strategy, all of which effectively optimize charging operations and improve overall system efficiency.

In contrast, the FIFS strategy fails to adapt to demand fluctuations, resulting in longer wait times and inefficient resource utilization during high-demand periods. Since FIFS allocates resources based on vehicle arrival order, it cannot prioritize high-demand users and lacks the ability to adjust according to charging station load changes. As a result, FIFS performs poorly during periods of high demand or heavy station load, failing to effectively enhance charging efficiency.

Although the RFWDA strategy introduces a reservation system, it still relies on static scheduling rules, limiting its ability to handle dynamic charging demand. While it prioritizes reserved vehicles, it does not dynamically adjust resource allocation for non-reserved vehicles. This results in longer wait times for non-reserved vehicles and underutilization of charging station resources, particularly during demand fluctuations. In comparison, the ACP strategy improves charging efficiency by dynamically adjusting priorities and reallocating resources, effectively reducing the number of incomplete charges and increasing charging pile utilization.

The ACP strategy ensures higher charging efficiency and a greater number of fully charged vehicles by continuously adjusting and reallocating resources in real-time. Specifically, the dynamic priority adjustment in ACP allows it to prioritize high-priority vehicles during peak demand periods while optimizing the scheduling of non-reserved vehicles, thus reducing wait times and charging travel time. Furthermore, the dynamic charging pile allocation algorithm ensures the efficient utilization of charging piles, preventing both idle and overloaded piles.

In summary, the ACP strategy clearly outperforms both the FIFS and RFWDA strategies in terms of efficiency, resource utilization, and fairness. By addressing demand fluctuations and optimizing resource allocation, ACP offers a more flexible, scalable, and efficient solution for electric vehicle charging management, significantly enhancing charging efficiency and improving the user experience.

5.6. Model Parameter Impact Analysis

Figure 28 illustrates the cumulative reward changes during training for DDPDQN and two comparison strategies. Initially, all three strategies show low cumulative rewards, which progressively increase as training advances. DDPDQN stabilizes at a cumulative reward of approximately 0.153 after 25,000 iterations, representing improvements of 86.6% and 54.5% compared to DQN (0.082) and DDQN (0.099), respectively. Additionally, DDPDQN achieves faster convergence with reduced volatility. These findings confirm DDPDQN’s superiority in convergence speed and cumulative reward, validating its advantage in dynamic resource allocation.

Figure 29 examines the influence of the charging wait time factor on the average charging wait time and charging pile utilization rate. As the factor increases from 0 to 1, the average wait time decreases while utilization improves. When the factor is 0, the average wait time is approximately 28 min, and utilization is around 60%. At a factor of 0.5, the wait time decreases to 23 min, and utilization increases to 80%. At a factor of 1, the wait time further reduces to 18 min, and utilization exceeds 90%. However, when the factor value exceeds 0.6, the rate of decrease in charging wait time becomes smaller, and the increase in charging pile utilization slows. We recommend setting this factor in the range of 0.6–0.8 to achieve a balance between optimizing wait times and charging pile utilization. These results suggest that appropriately increasing the charging wait time factor can effectively optimize charging efficiency and resource utilization, providing a feasible strategy for balancing user experience and resource allocation.

Figure 30 depicts the effect of the charging travel time factor on average travel time and charging pile utilization. As the factor rises from 0 to 1, the average travel time decreases while utilization improves. At a factor of 0.5, the average travel time is 40 min, with utilization at 80%. When the factor reaches 1, the average travel time decreases to 32 min, and utilization approaches 90%. However, when the factor exceeds 0.7, the decrease in charging travel time becomes less significant, while the increase in charging pile utilization stabilizes. We recommend setting this factor in the range of 0.6–0.75 to optimize charging travel time while avoiding excessive resource allocation optimization that could increase computational costs. The research results indicate that appropriately increasing the charging travel time factor contributes to improved charging efficiency and resource utilization.

Figure 31 shows how the charging pile utilization factor affects average travel time and utilization rates. As the factor increases from 0 to 1, the average travel time decreases while utilization improves steadily. At factors between 0.4 and 0.5, the average travel time reduces to 38 min, and utilization reaches 85%. At a factor of 1, the travel time drops to 28 min, with utilization nearing 95%. However, once the factor exceeds 0.7, the optimization effect stabilizes, and the marginal gains from further increases in the factor are minimal. We recommend setting this factor in the range of 0.6–0.7 to balance efficient charging pile utilization and load balancing. The findings demonstrate that increasing the charging pile utilization factor can significantly optimize charging efficiency, but the improvement becomes marginal when the factor exceeds 0.7.

Figure 32 explores the effect of varying hidden layer configurations—two layers (64, 32), three layers (128, 64, 32), and four layers (256, 128, 64, 32)—on cumulative reward. Results indicate that increasing the number of layers initially enhances the model’s learning capacity. The two-layer structure converges quickly but achieves a final cumulative reward of 0.08. The three-layer structure, converging after 10,000 iterations, achieves the highest cumulative reward of 0.2. The four-layer structure, while stable in early iterations, plateaus at a cumulative reward of 0.1, suggesting that excessive layers may hinder optimization. Overall, the three-layer configuration (128, 64, 32) offers the best trade-off between convergence speed and cumulative reward. The results suggest that increasing the number of hidden layers enhances the model’s learning capacity, but excessive layers may reduce optimization effectiveness.

6. Conclusions

This paper proposes an adaptive charging management strategy (ACP) tailored to meet the multi-tiered demands of electric vehicle (EV) users. The strategy aims to optimize charging resource allocation, enhance utilization efficiency, and minimize user waiting times. Through dynamic scheduling, priority adjustments, and precise resource allocation, ACP effectively addresses issues of resource waste and excessive waiting times at charging stations. The main findings are as follows:

Dynamic Charging Priority Parameter Calculation: Utilizing an Attention-LSTM model, a dynamically adjustable charging priority parameter calculation method is proposed. By adjusting priority levels, this method ensures efficient resource allocation. Compared to traditional static scheduling approaches, the ACP strategy guarantees timely support for high-priority users during peak demand periods.

Dynamic Reserved Charging Pile Allocation Algorithm: A dynamic algorithm for selecting reserved charging piles based on station load and demand fluctuations is introduced. This algorithm prioritizes high-priority users during peak hours while dynamically redistributing non-reserved charging piles. It significantly enhances resource scheduling efficiency.

Optimized Non-Reservation Vehicle Scheduling Strategy: An optimized scheduling strategy is developed for non-reservation vehicles, minimizing their charging wait times and travel times by balancing their distribution across charging stations. When the non-reservation vehicle arrival rate is 12 EV/h and the reservation vehicle arrival rate is 22 EV/h, the ACP strategy reduces charging wait times by 96 s and 28 s and charging travel times by 452 s and 73 s compared to FIFS and RFWDA, respectively. Additionally, the number of fully charged vehicles increases by 78 and 64, respectively.

Reservation Vehicle and Charging Pile Matching Algorithm: Based on charging priority parameter calculation, reserved pile selection, and non-reservation vehicle scheduling, a DDPDQN-based matching algorithm (Double + Dueling + Prioritized Experience Replay + DQN) for reservation vehicles and charging piles is proposed. This algorithm learns real-time patterns of high-priority user arrivals and station load variations, enabling dynamic resource allocation. Experimental results indicate that the DPDQN strategy increases the proportion of reservation vehicles allocated to reserved piles by 22.9% and improves charging pile utilization by 19.5% compared to the DQN strategy.

In summary, the ACP strategy achieves precise resource scheduling and dynamic optimization, effectively addressing uneven resource allocation and low charging efficiency. Compared to traditional fixed-resource allocation methods, ACP demonstrates significant advantages in charging efficiency, resource utilization, and overall user experience. It provides an efficient and flexible solution for EV charging management.

To enhance the adaptability of the ACP strategy under high load or emergency conditions, future research can integrate the “Charging Pile Reservation Selection Algorithm” with multi-level traffic data to optimize the emergency scheduling mechanism. As the scale of charging stations and users expands, the update frequency of the DDPDQN algorithm can be dynamically adjusted to improve computational efficiency and practical application performance. Additionally, the ACP strategy can be integrated with Intelligent Transportation Systems (ITS), utilizing traffic flow prediction and V2X communication to optimize charging scheduling, enabling dynamic guidance and real-time reservations. By combining this with urban energy management systems, charging demand can be predicted based on traffic data, optimizing grid load distribution, alleviating peak pressures, and increasing the utilization of renewable energy.

Author Contributions

Conceptualization, D.G. and S.Z.; methodology, S.Z., C.Z. and D.G.; software, S.Z., C.Z. and Z.L.; validation, P.M., B.Z. and Z.L.; formal analysis, S.Z. and C.Z.; investigation, D.G.; resources, D.G.; data curation, P.M., B.Z. and Z.L.; writing—original draft preparation, S.Z. and B.Z.; writing—review and editing, C.Z. and D.G.; visualization, S.Z.; supervision, D.G.; project administration, D.G.; funding acquisition, D.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the State Key Lab of Intelligent Transportation System under Project No. 2024-B009, the National Natural Science Foundation of China grant number 52102465, and the Shandong Provincial Nature Science Foundation of China grant number ZR2023QF028. The APC was funded by the State Key Lab of Intelligent Transportation System.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Liu, Z.; Song, J.; Kubal, J.; Susarla, N.; Knehr, K.W.; Islam, E.; Nelson, P.; Ahmed, S. Comparing total cost of ownership of battery electric vehicles and internal combustion engine vehicles. Energy Policy 2021, 158, 112564. [Google Scholar] [CrossRef]
Sun, C.X.; Li, T.X.; Tang, X.Y. A Data-Driven Approach for Optimizing Early-Stage Electric Vehicle Charging Station Placement. IEEE Trans. Ind. Inform. 2024, 20, 11500–11510. [Google Scholar] [CrossRef]
Habbal, A.; Alrifaie, M.F. A User-Preference-Based Charging Station Recommendation for Electric Vehicles. IEEE Trans. Intell. Transp. Syst. 2024, 25, 11617–11634. [Google Scholar] [CrossRef]
Infante, W.; Ma, J. Coordinated Management and Ratio Assessment of Electric Vehicle Charging Facilities. IEEE Trans. Ind. Appl. 2020, 56, 5955–5962. [Google Scholar] [CrossRef]
Paudel, A.; Hussain, S.A.; Sadiq, R.; Zareipour, H.; Hewage, K. Decentralized cooperative approach for electric vehicle charging. J. Clean. Prod. 2022, 364, 132590. [Google Scholar] [CrossRef]
International Energy Agency (IEA). Available online: https://www.iea.org/reports/global-ev-outlook-2024 (accessed on 22 February 2025).
Chen, X.; Zhang, H.; Xu, Z.; Nielsen, C.P.; McElroy, M.B.; Lv, J. Impacts of Fleet Types and Charging Modes for Electric Vehicles on Emissions under Different Penetrations of Wind Power. Nat. Energy 2018, 3, 413–421. [Google Scholar] [CrossRef]
Bilsalget i Desember og Hele 2024, Opplysningsrådet for Veitrafikken (OFV). Available online: https://ofv.no/bilsalget/bilsalget-i-desember-2024 (accessed on 22 February 2025).
Sivertsgård, A.; Fjær, K.K.; Gogia, R.; Moum, A.L.; Sivertsgård, A.; Spilde, D.; Syrstad, T.A.; Tveten, Å.G.; Veie, C.A. Norsk og Nordisk Effektbalanse mot 2035; Norges Vassdrags- og Energidirektorat (NVE): Oslo, Norway, 2024; pp. 1–50. [Google Scholar]
Chen, J.; Huang, X.Q.; Cao, Y.J.; Li, L.Y.; Yan, K.; Wu, L.; Liang, K. Electric vehicle charging schedule considering shared charging pile based on Generalized Nash Game. Int. J. Electr. Power Energy Syst. 2022, 136, 107579. [Google Scholar] [CrossRef]
Amini, M.H.; Moghaddam, M.P.; Karabasoglu, O. Simultaneous allocation of electric vehicles’ parking lots and distributed renewable resources in smart power distribution networks. Sustain. Cities Soc. 2017, 28, 332–342. [Google Scholar] [CrossRef]
Bandpey, M.F.; Firouzjah, K.G. Two-stage charging strategy of plug-in electric vehicles based on fuzzy control. Comput. Oper. Res. 2018, 96, 236–243. [Google Scholar] [CrossRef]
Xydas, E.; Marmaras, C.; Cipcigan, L.M. A multi-agent based scheduling algorithm for adaptive electric vehicles charging. Appl. Energy 2016, 177, 354–365. [Google Scholar] [CrossRef]
Abdalrahman, A.; Zhuang, W.H. Dynamic Pricing for Differentiated PEV Charging Services Using Deep Reinforcement Learning. IEEE Trans. Intell. Transp. Syst. 2022, 23, 1415–1427. [Google Scholar] [CrossRef]
Bitar, E.; Xu, Y.J. Deadline Differentiated Pricing of Deferrable Electric Loads. IEEE Trans. Smart Grid 2017, 8, 13–25. [Google Scholar] [CrossRef]
Huang, A.L.; Mao, Y.X.; Chen, X.S.; Xu, Y.H.; Wu, S.X. A multi-timescale energy scheduling model for microgrid embedded with differentiated electric vehicle charging management strategies. Sustain. Cities Soc. 2024, 101, 105123. [Google Scholar] [CrossRef]
Flocea, R.; Hîncu, A.; Robu, A.; Senocico, S.; Traciu, A.; Remus, B.M.; Raboaca, M.S.; Filote, C. Electric Vehicle Smart Charging Reservation Algorithm. Sensors 2022, 22, 2834. [Google Scholar] [CrossRef]
Cao, Y.; Jiang, T.; Kaiwartya, O.; Sun, H.J.; Zhou, H.; Wang, R. Toward Pre-Empted EV Charging Recommendation Through V2V-Based Reservation System. IEEE Trans. Syst. Man Cybern.-Syst. 2021, 51, 3026–3039. [Google Scholar] [CrossRef]
Shen, X.X.; Lv, J.; Du, S.C.; Deng, Y.F.; Liu, M.L.; Zhou, Y.L. Integrated optimization of electric vehicles charging location and allocation for valet charging service. Flex. Serv. Manuf. J. 2024, 36, 1080–1106. [Google Scholar] [CrossRef]
Lai, Z.J.; Li, S. On-demand valet charging for electric vehicles: Economic equilibrium, infrastructure planning and regulatory incentives. Transp. Res. Part C-Emerg. Technol. 2022, 140, 103669. [Google Scholar] [CrossRef]
Shang, Y.T.; Li, Z.K.; Li, S.; Shao, Z.Y.; Jian, L.N. An Information Security Solution for Vehicle-to-Grid Scheduling by Distributed Edge Computing and Federated Deep Learning. IEEE Trans. Ind. Appl. 2024, 60, 4381–4395. [Google Scholar] [CrossRef]
Jin, J.L.; Xu, Y.J. Optimal Policy Characterization Enhanced Actor-Critic Approach for Electric Vehicle Charging Scheduling in a Power Distribution Network. IEEE Trans. Smart Grid 2021, 12, 1416–1428. [Google Scholar] [CrossRef]
Sadreddini, Z.; Guner, S.; Erdinc, O. Design of a Decision-Based Multicriteria Reservation System for the EV Parking Lot. IEEE Trans. Transp. Electrif. 2021, 7, 2429–2438. [Google Scholar] [CrossRef]
Sone, S.P.; Lehtomaki, J.J.; Khan, Z.; Umebayashi, K.; Kim, K.S. Robust EV Scheduling in Charging Stations Under Uncertain Demands and Deadlines. IEEE Trans. Intell. Transp. Syst. 2024, 25, 21484–21499. [Google Scholar] [CrossRef]
Ahmad, A.; Ullah, Z.; Khalid, M.; Ahmad, N. Toward Efficient Mobile Electric Vehicle Charging under Heterogeneous Battery Switching Technology. Appl. Sci. 2022, 12, 904. [Google Scholar] [CrossRef]
Bao, Z.Y.; Xie, C. Optimal station locations for en-route charging of electric vehicles in congested intercity networks: A new problem formulation and exact and approximate partitioning algorithms. Transp. Res. Part C-Emerg. Technol. 2021, 133, 103447. [Google Scholar] [CrossRef]
Madaram, V.; Biswas, P.K.; Sain, C.; Thanikanti, S.B.; Selvarajan, S. Optimal electric vehicle charge scheduling algorithm using war strategy optimization approach. Sci. Rep. 2024, 14, 21795. [Google Scholar] [CrossRef] [PubMed]
Attaianese, C.; Di Pasquale, A.; Franzese, P.; Iannuzzi, D.; Pagano, M.; Ribera, M. A model-based EVs charging scheduling for a multi-slot Ultra-Fast Charging Station. Electr. Power Syst. Res. 2023, 216, 109009. [Google Scholar] [CrossRef]
Deng, X.S.; Zhang, Q.; Li, Y.; Sun, T.; Yue, H.Z. Hierarchical Distributed Frequency Regulation Strategy of Electric Vehicle Cluster Considering Demand Charging Load Optimization. IEEE Trans. Ind. Appl. 2022, 58, 720–731. [Google Scholar] [CrossRef]
Kim, B.; Paik, M.; Kim, Y.; Ko, H.; Pack, S. Distributed Electric Vehicle Charging Mechanism: A Game-Theoretical Approach. IEEE Trans. Veh. Technol. 2022, 71, 8309–8317. [Google Scholar] [CrossRef]
Wang, Z.F.; Jochem, P.; Fichtner, W. A scenario-based stochastic optimization model for charging scheduling of electric vehicles under uncertainties of vehicle availability and charging demand. J. Clean. Prod. 2020, 254, 119886. [Google Scholar] [CrossRef]
Das, S.; Thakur, P.; Singh, A.K.; Singh, S.N. Optimal management of vehicle-to-grid and grid-to-vehicle strategies for load profile improvement in distribution system. J. Energy Storage 2022, 49, 104068. [Google Scholar] [CrossRef]
Wu, X.M.; Feng, Q.J.; Bai, C.C.; Lai, C.S.; Jia, Y.W.; Lai, L.L. A novel fast-charging stations locational planning model for electric bus transit system. Energy 2021, 224, 120106. [Google Scholar] [CrossRef]
Ji, C.L.; Liu, Y.B.; Lyu, L.; Li, X.C.; Liu, C.; Peng, Y.X.; Xiang, Y. A Personalized Fast-Charging Navigation Strategy Based on Mutual Effect of Dynamic Queuing. IEEE Trans. Ind. Appl. 2020, 56, 5729–5740. [Google Scholar] [CrossRef]
Qin, J.X.; Qiu, J.; Chen, Y.T.; Wu, T.; Xiang, L.G. Charging Stations Selection Using a Graph Convolutional Network from Geographic Grid. Sustainability 2022, 14, 16797. [Google Scholar] [CrossRef]
Scarpelli, C.; Ceraolo, M.; Crisostomi, E.; Apicella, V.; Pellegrini, G. Charging Electric Vehicles on Highways: Challenges and Opportunities. IEEE Access 2024, 12, 55814–55823. [Google Scholar] [CrossRef]
Tanwar, S.; Kakkar, R.; Gupta, R.; Raboaca, M.S.; Sharma, R.; Alqahtani, F.; Tolba, A. Blockchain-based electric vehicle charging reservation scheme for optimum pricing. Int. J. Energy Res. 2022, 46, 14994–15007. [Google Scholar] [CrossRef]
Liu, S.H.; Ni, Q.; Cao, Y.; Cui, J.X.; Tian, D.X.; Zhuang, Y. A Reservation-Based Vehicle-to-Vehicle Charging Service Under Constraint of Parking Duration. IEEE Syst. J. 2023, 17, 176–187. [Google Scholar] [CrossRef]
Xiao, Q.; Zhang, R.T.; Wang, Y.C.; Shi, P.; Wang, X.; Chen, B.R.; Fan, C.W.; Chen, G. A deep reinforcement learning based charging and discharging scheduling strategy for electric vehicles. Energy Rep. 2024, 12, 4854–4863. [Google Scholar] [CrossRef]
Liu, L.F.; Huang, Z.; Xu, J. Multi-Agent Deep Reinforcement Learning Based Scheduling Approach for Mobile Charging in Internet of Electric Vehicles. IEEE Trans. Mob. Comput. 2024, 23, 10130–10145. [Google Scholar] [CrossRef]
Zhang, Y.Y.; Rao, X.P.; Liu, C.Y.; Zhang, X.B.; Zhou, Y. A cooperative EV charging scheduling strategy based on double deep Q-network and Prioritized experience replay. Eng. Appl. Artif. Intell. 2023, 118, 105642. [Google Scholar] [CrossRef]
Wang, L.; Hou, L.Y.; Liu, S.X.; Han, Z.; Wu, J. Reinforcement Contract Design for Vehicular-Edge Computing Scheduling and Energy Trading via Deep Q-Network With Hybrid Action Space. IEEE Trans. Mob. Comput. 2024, 23, 6770–6784. [Google Scholar] [CrossRef]
Li, F.; Zuo, W.; Zhou, K.; Li, Q.Q.; Huang, Y.H.; Zhang, G.D. State-of-charge estimation of lithium-ion battery based on second order resistor-capacitance circuit-PSO-TCN model. Energy 2024, 289, 130025. [Google Scholar] [CrossRef]
Li, F.; Zuo, W.; Zhou, K.; Li, Q.Q.; Huang, Y.H. State of charge estimation of lithium-ion batteries based on PSO-TCN-Attention neural network. J. Energy Storage 2024, 84, 110806. [Google Scholar] [CrossRef]
Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
Kirkpatrick, S.; Gelatt, C.D., Jr.; Vecchi, M.P. Optimization by simulated annealing. Science 1983, 220, 671–680. [Google Scholar] [CrossRef] [PubMed]
Van Laarhoven, P.J.M.; Aarts, E.H.L. Simulated Annealing: Theory and Applications; Reidel: Dordrecht, The Netherlands, 1987; pp. 1–187. [Google Scholar]
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized experience replay. In Proceedings of the 4th International Conference on Learning Representations (ICLR), San Juan, PR, USA, 2–4 May 2016. [Google Scholar]
Hessel, M.; Modayil, J.; van Hasselt, H.; Schaul, T.; Ostrovski, G.; Dabney, W.; Horgan, D.; Piot, B.; Azar, M.G.; Silver, D. Rainbow: Combining Improvements in Deep Reinforcement Learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
Hu, L.; Dong, J.; Lin, Z.H. Modeling charging behavior of battery electric vehicle drivers: A cumulative prospect theory based approach. Transp. Res. Part C-Emerg. Technol. 2019, 102, 474–489. [Google Scholar] [CrossRef]
Tian, H.; Sun, Y.J.; Hu, F.F.; Du, J.Y. Charging Behavior Analysis Based on Operation Data of Private BEV Customers in Beijing. Electronics 2023, 12, 373. [Google Scholar] [CrossRef]

Figure 1. Overview of the problem and key challenges in electric vehicle (EV) charging management.

Figure 2. Architecture of the proposed Adaptive Charging Priority (ACP) strategy and its key components.

Figure 3. Coding structure of conventional encoder and encoder with attention mechanism.

Figure 4. Attention−LTSM prediction model structure.

Figure 5. ALSTM-based attention weight distribution over time.

Figure 6. Heatmap of ALSTM-based attention weight distribution.

Figure 7. Plot of variation in mean absolute error for different step sizes.

Figure 8. Attention−LSTM Algorithm Training Flowchart.

Figure 9. Algorithmic flow of Attention-LSTM.

Figure 10. DDPDQN algorithm network model architecture.

Figure 11. DPDQN charging pile matching algorithm network model training process.

Figure 12. Comparison of changes in mean square error across models.

Figure 13. Curves of predicted versus actual values for each model of the algorithm.

Figure 14. Prediction error curves for each model.

Figure 15. Impact analysis of reserved piles.

Figure 16. Matching effect diagram of the matching algorithm between the reservation vehicle and the charging pile under the ACP strategy.

Figure 17. Matching effect diagram of reservation vehicle and charging pile matching algorithm under RFWDA strategy.

Figure 18. Matching effect diagram of the reservation vehicle and the charging pile matching algorithm under the FIFS strategy.

Figure 19. Results under the change in high priority vehicle arrival rate (parking time limit 1800 s).

Figure 20. Random vehicle results under the change in high priority vehicle arrival rate (parking time limit 1800 s).

Figure 21. The result of random vehicle arrival rate change (parking time limit 1800 s).

Figure 22. Random vehicle results under the change in random vehicle arrival rate (parking time limit 1800 s).

Figure 23. The results under the change in parking time (parking time limit 1800 s).

Figure 24. Random vehicle results under the change in parking time (parking time limit 1800 s).

Figure 25. Result 2 under the change in parking time (parking time limit 1800 s).

Figure 26. Random vehicle result 2 under the change in parking duration (parking duration upper limit 1800 s).

Figure 27. Comparison of the utilization rate of charging piles with time under different strategies.

Figure 28. Convergence process of the three algorithms.

Figure 29. Average charging waiting time and charging pile utilization under different charging waiting time factors.

Figure 30. Average charging stroke duration and charging pile utilization rate under different charging stroke duration factors.

Figure 31. Average charging travel time and charging pile utilization rate under different charging pile utilization factors.

Figure 32. Learning curves of different hidden layers.

Table 1. Variable symbols and their definitions used in the study.

Symbol	Variable Definition
W_ij	Attention layer weight allocation factor
c	Feature vector
S_ij	Scoring function
$ν,$ $W,$ $U,$	Weight factor
$b$	Bias factor
x_t	Time step in the input sequence
h_t	Intermediate state computed by LSTM
${\hat{r}}_{i, h}$ $, r_{i, h}$	Predicted and actual arrival rates of high-priority vehicles
N	Total sample size
${\bar{r}}_{h}$	Average value of the actual arrival rate of reserved vehicles
P_c	Charging priority parameters
α	Normalized arrival rate deviation factor
β	Initial average arrival rate value
$N_{t}$	Total number of charging piles
$N_{r}$	Number of reserved charging piles
E_min, E_max	Minimum and maximum expected waiting times
μ	Service rate of a single charging pile
$ρ_{h}$	Load rate of high-priority vehicles at reserved charging piles
$E_{thr}^{\min}$ $, E_{thr}^{\max}$	Minimum and maximum constrained charging wait times
T_avg	Average queue waiting time within the region
$T_{h}$ $, T_{l}$	Average queue waiting times for high-priority (reserved) and low-priority vehicles
$φ$	Weight coefficient for the average queue waiting time of reserved vehicles versus random vehicles
$P_{\max}$ $, P_{\min}$	Upper and lower bounds of charging priority parameters
$ω_{j k} (t)$	Binary decision variable representing the reservation status of a charging pile
j, k	Charging station and charging pile number
$u_{j k}$	Service gain when charging pile k is allocated as a reserved charging pile
$v_{i}$	Unit-time service value contributed by high-priority users
$τ_{j k}$	Time occupied by high-priority users at charging pile k
$g_{j k}$	Unit-time resource consumption of charging pile k
$Q_{h}^{j} (t)$ $, Q_{l}^{j} (t)$	Queue length of high- and low-priority vehicles at charging station j at time t
$N_{tem, non}^{j}$	Number of temporarily allocated non-reserved charging piles at charging station j
$N_{non}^{j}$	Number of non-reserved charging piles at charging station j
$C_{r}^{j}$	Set of reserved charging piles at charging station j
$C_{a}^{j}$	Set of temporarily allocated non-reserved charging piles at charging station j
$x_{k}^{j, l} (t)$ $, x_{k}^{j, l} (t)$	Decision variable for high- and low-priority users’ utilization of charging pile j
$x_{k}^{j, temp} (t)$	Temporary allocation state decision variable for charging pile j
$N_{j}^{non}$	Remaining number of charging piles at charging station j
$r_{r}^{l}$	Arrival rate of random vehicles in subregion r
$p_{r, j}$	Proportion of random vehicles in subregion r assigned to charging station j
$W_{q} (j)$	Queue waiting time of random vehicles at charging station j
$P_{w} (j)$	Probability of entering queue at charging station j
$T_{r, j}$	Travel time from subregion r to charging station j
$d_{r, j}$	Distance from subregion r to charging station j
$v_{r}$	Average travel speed in subregion r
P	Probability of accepting a new solution
$Δ F$	Difference between the objective function value of the new solution and the current solution
T₀, T	Initial temperature and current temperature
α	Cooling factor
${SOC}_{h} (t)$ $, {SOC}_{l} (t)$	SOC threshold for high- and low-priority vehicles
$P_{s} (t)$	Charging pile status
$s_{t}$ $, s_{t + 1}$	Current and next system states
$P (s_{t + 1} \| s_{t}, a_{t})$	State transition probability
$r (s_{t}, a_{t}, s_{t + 1})$	Reward function
$W (s_{t}, a_{t})$ $, T (s_{t}, a_{t})$	Average charging wait time and travel time of users in the region
$U (s_{t}, a_{t})$	Utilization rate of charging piles at charging stations in the region
$λ_{1}$ $, λ_{2}$ $, λ_{3}$	Charging wait time factor, charging travel time factor, and charging pile utilization factor
ΔP_c, ΔW, ΔT, ΔU	Charging priority parameter, changes in charging wait time, charging travel time, and charging pile utilization
$ε_{k}$	Exploration rate at the k-th training iteration
$λ$	Random number in the range [0, 1]
$ε_{s}$ $, ε_{e}$	Initial and final exploration rates
$ψ$	Total decay steps
δ_i	TD error
P(i)	Experience sampling probability
α	Influence factor of prioritized sampling
β	Balancing factor for importance sampling weight
N	Size of the experience replay buffer
$a^{'}$	Target action
Q_Eval, Q_Target	Evaluation and target Q-values
θ, θ′	Target network parameters
m	Mini-batch sample size
L(θ)	Loss function value
J	Maximum training iterations

Table 2. Comparison of evaluation indicators at different step sizes.

Set Size	MAE	R²
3	1.354	0.872
5	1.126	0.905
10	0.983	0.937
20	1.101	0.923

Table 3. Candidate hyperparameter values for grid search.

Hyperparameter	Candidate Values
Time step	{3, 5, 10, 20}
Number of attention layer nodes	{64, 128, 256}
Hidden layer configuration	{Single Layer (128), Double Layer (128, 64), Triple Layer (256, 128, 64)}
Learning rate	{0.0005, 0.001, 0.005}
Batch size	{16, 32, 64}
Number of training epochs	{100, 200, 300}
Dropout rate	{0.2, 0.5}
Optimizer	Adam

Table 4. Attention-LSTM model parameter setting table.

Parameter Name	Value
Number of nodes in the attention layer	128
Time step	10
Training sequence ratio	80%
Test sequence ratio	20%
Batch size	32
Training round	200
Intermediate layer and number of nodes	Two floors (128, 64)
Activation function	ReLU
Optimizer	Adam
Learning rate	0.001
Number of dropout layers and ratio	One floor (0.2)

Table 5. Simulated Annealing algorithm parameter setting table.

Parameter Name	Parameter Value
Initial temperature T₀	100
Cooling factor α	7
Temperature stopping threshold T_min	10⁻⁴
Maximum number of iterations K_max	1000
Neighborhood search range δα	±0.01

Table 6. DDPDQN Algorithm parameter setting table.

Parameter Name	Parameter Value
Number of hidden layer nodes	128, 64, 32
Initial optimizer/learning rate	Adam/0.001
Post-optimizer/learning rate	SGD/0.0001
Experience pool capacity	10,000
Batch size	64
TD error threshold (ε)	200
Initial exploration rate	1.0
Final exploration rate	0.01
Exploration rate decay step	10,000
Discount factor	0.99
Number of training rounds	20,000
Target network update frequency	500

Table 7. Comparison of the final mean square errors of the models.

Algorithm	MSE
LSTM	3.876
BPLSTM	2.745
DRNN	1.173
ALSTM	0.238

Table 8. Comparison of MAE and R² values of different models in testing phase.

Algorithm	MAE	R²
LSTM	3.127	0.903
BPLSTM	2.014	0.935
DRNN	0.915	0.961
ALSTM	0.341	0.978

Table 9. Experimental parameter setting table.

Parameter Name	Parameter Value
Reservation vehicle arrival rate r_h	14/h–26/h
Random vehicle arrival rate r_l	9/h–14/h
Minimum queuing time $E_{thr}^{\min}$	15 min
Maximum queue waiting time $E_{thr}^{\max}$	45 min
waiting time balance factor between reserved and random vehicles φ	10
Individual charging post service rate (number of vehicles that can be served per unit of time) $μ$	0.4/h

Table 10. Parameter setting table for reserve pile selection.

Charge Priority Parameters	Number of Piles Reserved	Minimum Desired Queue Waiting Time	Maximum Desired Queuing Time	Average Waiting Time in Queues in the Region
(1) r_h = 14/h
0.05	1	14	30	22.1
0	0	15	31	23.3
(2) r_h = 16/h
0.057	2	14	30	23.4
0.007	1	15	31	23.4
0	0	16	32	24.2
(3) r_h = 18/h
0.065	2	15	31	25.9
0.015	1	16	32	25.6
0	0	17	33	26.7
(4) r_h = 20/h
0.071	2	16	32	28.7
0.021	1	17	33	28.1
0	0	18	34	28.4
(5) r_h = 22/h
0.077	2	17	33	32.5
0.027	1	18	34	30.3
0	0	19	35	31.7
(6) r_h = 24/h
0.082	2	18	34	38.6
0.032	1	19	35	34.2
0	0	20	36	33.1
(7) r_h = 26/h
0.086	2	19	35	46.7
0.036	1	20	36	39.4
0	0	21	37	36.8

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, S.; Guo, D.; Zhou, B.; Zheng, C.; Li, Z.; Ma, P. An Adaptive Electric Vehicle Charging Management Strategy for Multi-Level Travel Demands. Sustainability 2025, 17, 2501. https://doi.org/10.3390/su17062501

AMA Style

Zhang S, Guo D, Zhou B, Zheng C, Li Z, Ma P. An Adaptive Electric Vehicle Charging Management Strategy for Multi-Level Travel Demands. Sustainability. 2025; 17(6):2501. https://doi.org/10.3390/su17062501

Chicago/Turabian Style

Zhang, Shuai, Dong Guo, Bin Zhou, Chunyan Zheng, Zhiqin Li, and Pengcheng Ma. 2025. "An Adaptive Electric Vehicle Charging Management Strategy for Multi-Level Travel Demands" Sustainability 17, no. 6: 2501. https://doi.org/10.3390/su17062501

APA Style

Zhang, S., Guo, D., Zhou, B., Zheng, C., Li, Z., & Ma, P. (2025). An Adaptive Electric Vehicle Charging Management Strategy for Multi-Level Travel Demands. Sustainability, 17(6), 2501. https://doi.org/10.3390/su17062501

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

An Adaptive Electric Vehicle Charging Management Strategy for Multi-Level Travel Demands

Abstract

1. Introduction

2. Scheduling Strategy for Reserved Vehicles

2.1. Charging Priority Parameter Calculation Model

2.1.1. Attention Mechanism

2.1.2. Attention-LSTM Model

2.2. Algorithm for Reserving Charging Piles for Selection

3. Non-Reservation Vehicle Scheduling Strategy

3.1. Optimization Model of Scheduling Strategy

3.2. Optimization Algorithm

4. Reservation Vehicle and Charging Pile Matching Algorithm

4.1. Reservation Vehicle and Charging Pile Matching Model

4.2. Vehicle and Charging Pile Matching Framework

5. Simulation Experiments and Analysis

5.1. Simulation Scenario Setup

5.2. High-Priority Vehicle Arrival Rate Prediction: Experiments and Analysis

5.2.1. Dataset

5.2.2. Training Phase Analysis

5.2.3. Experimental Analysis in the Testing Phase

5.3. Impact of Reserved Charging Piles: Experiment and Analysis

5.4. Reserved Vehicle and Charging Pile Matching: Experiment and Results Analysis

5.5. Validation of Charging Efficiency Results

5.6. Model Parameter Impact Analysis

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI