Article

Adaptive Multi-Agent Reinforcement Learning for Optimizing Dynamic Electric Vehicle Charging Networks in Thailand

by Pitchaya Jamjuntr 1, Chanchai Techawatcharapaikul 1 and Pannee Suanpang 2,*
1 Electronic and Telecommunication Engineering, Faculty of Engineering, King Mongkut’s University of Technology Thonburi, Bangkok 10140, Thailand
2 Department of Information Technology, Faculty of Science & Technology, Suan Dusit University, Bangkok 10300, Thailand
* Author to whom correspondence should be addressed.
World Electr. Veh. J. 2024, 15(10), 453; https://doi.org/10.3390/wevj15100453
Submission received: 22 August 2024 / Revised: 1 October 2024 / Accepted: 3 October 2024 / Published: 6 October 2024

Abstract

The rapid growth of electric vehicles (EVs) necessitates efficient management of dynamic EV charging networks to optimize resource utilization and enhance service reliability. This paper explores the application of adaptive multi-agent reinforcement learning (MARL) to address the complexities of EV charging infrastructure in Thailand. By employing MARL, multiple autonomous agents learn to optimize charging strategies based on real-time data by adapting to fluctuating demand and varying electricity prices. Building upon previous research that applied MARL to static network configurations, this study extends the application to dynamic and real-world scenarios, integrating real-time data to refine agent learning processes and also evaluating the effectiveness of adaptive MARL in maximizing rewards and improving operational efficiency compared to traditional methods. Experimental results indicate that MARL-based strategies increased efficiency by 20% and reduced energy costs by 15% relative to conventional algorithms. Key findings demonstrate the potential of extending MARL in transforming EV charging network management, highlighting its benefits for stakeholders, including EV owners, operators, and utility providers. This research contributes insights into advancing electric mobility and energy management in Thailand through innovative AI-driven approaches. The implications of this study include significant improvements in the reliability and cost-effectiveness of EV charging networks, fostering greater adoption of electric vehicles and supporting sustainable energy initiatives. Future research directions include enhancing MARL adaptability and scalability as well as integrating predictive analytics for proactive network optimization and sustainability. These advancements promise to further refine the efficacy of EV charging networks, ensuring that they meet the growing demands of Thailand’s evolving electric mobility landscape.

1. Introduction

1.1. Background and Motivation

In recent times, electric vehicles (EVs) have captured considerable attention and are being widely endorsed by various nations as an environmentally friendly transportation alternative [1,2]. The primary appeal of EVs is their zero-emission feature, which significantly aids in promoting environmental sustainability [3]. Besides their ecological benefits, EVs are economically advantageous, offering notable cost savings over conventional gasoline vehicles. They are also celebrated for their seamless, intuitive driving experience [4]. The emission-free operation of EVs makes them a preferred choice for environmentally conscious individuals. Furthermore, their lower operational expenses compared to traditional gasoline vehicles, combined with their ease of use, contribute to their growing popularity [5]. The rapid global adoption of EVs has necessitated advancements in the management of EV charging networks to enhance efficiency and reliability [6]. As the demand for EVs increases, so does the need for intelligent solutions to manage the dynamic nature of EV charging infrastructure [7]. Traditional methods of EV charging management often fall short in addressing the complexities arising from fluctuating demand and varying electricity prices [8].
In response to these challenges, recent research has explored the application of advanced artificial intelligence (AI) [9,10] techniques, particularly multi-agent reinforcement learning (MARL), to optimize EV charging strategies [11,12]. MARL involves multiple autonomous agents that learn to optimize their actions through interactions with the environment and each other. This approach is particularly suited for dynamic and complex systems like EV charging networks, where agents can adapt to real-time data and evolving conditions [13].
Previous studies have demonstrated the potential of MARL in improving the management of EV charging infrastructure. For instance, Chen et al. [14] highlighted how MARL can maximize rewards for both EV owners and operators by optimizing charging schedules based on real-time data. Moreover, Sun et al. [15,16] emphasized the scalability of MARL in handling large-scale EV charging networks, showcasing its applicability in diverse settings. Additionally, Suanpang and Jamjuntr [1,3] proposed a novel approach for recommending EV charging stations in smart cities using MARL algorithms. Their study compared several popular algorithms, including the deep deterministic policy gradient (DDPG), the deep Q-network (DQN), and the multi-agent DDPG (MADDPG), demonstrating that MADDPG outperformed the other algorithms in terms of Mean Charge Waiting Time, Charge Flow Time (CFT), and Total Saving Fee. This research highlighted the collaborative and communicative nature of the MADDPG algorithm, indicating its superiority in addressing the EV charging station problem in a multi-agent setting and providing a better user experience, thereby contributing to the development of more efficient and sustainable transportation systems in smart cities [2,3,13,17,18].
Our original research [1,2] laid the groundwork for applying MARL in the context of smart city environments. This current study extends our previous work by focusing on the specific challenges and opportunities present in Thailand’s EV charging infrastructure. Building upon the foundational principles established in our earlier research, we aim to adapt MARL techniques to effectively manage the dynamic and diverse charging network landscape in Thailand [19]. By leveraging the insights gained from previous experiments and advancements in MARL algorithms, we seek to enhance the adaptability and efficiency of EV charging strategies [2,3]. Our research explores novel approaches to address real-world complexities, including fluctuating electricity prices, varying EV user behaviors, and evolving regulatory environments. Through empirical validation and comparative analysis, we aim to quantify the performance improvements that are achievable with adaptive MARL, thereby contributing to the body of knowledge on sustainable urban mobility solutions.

1.2. Problem Statement

1.2.1. Research Problem

EVs are rapidly gaining popularity worldwide, leading to increased demand for the efficient management of EV charging networks. However, traditional methods of EV charging management often struggle to adapt to the dynamic nature of EV usage patterns, fluctuating electricity prices, and evolving user behaviors [20,21]. This creates inefficiencies in resource utilization and reliability, posing challenges for both EV owners and charging infrastructure operators.

1.2.2. Thailand’s EV Industry

In Thailand, the growth of EVs has outpaced the development of robust charging infrastructure management systems [19]. Existing approaches primarily rely on static scheduling and pricing strategies, which do not effectively optimize charging operations in response to real-time data and varying conditions [1,2]. This gap highlights the critical need for adaptive solutions that can dynamically adjust charging strategies to maximize efficiency and user satisfaction while minimizing operational costs.
The research problem addressed in this study revolves around developing and evaluating an adaptive MARL framework tailored to the specific challenges of managing EV charging networks in Thailand [1,3,4]. By harnessing the collective intelligence of autonomous agents, MARL offers a promising approach to optimizing charging schedules, balancing load distribution, and responding intelligently to fluctuating demand and electricity prices. In addition, the effectiveness of such an approach is assessed through empirical analysis, comparing the performances of MARL-based strategies against traditional methods [3].
In the context of Thailand, where the EV market is rapidly growing, the implementation of adaptive MARL presents a promising solution for enhancing the operational efficiency of charging networks. Figure 1 illustrates Thailand’s EV charging context. Thailand presents a unique context for the deployment of EV charging networks due to several factors [3,14,20]:
  • Current State of EV Adoption: The adoption of EVs in Thailand is growing, driven by government incentives and increasing consumer awareness. Market estimates of Thailand’s EV sales have been revised upward, primarily because sales growth in the first half of 2024 exceeded expectations: a total of 49,319 EVs were sold in this period, already above the original estimate for the whole year. The full-year forecast has therefore been raised to 80,700 units, corresponding to growth of 151% rather than the 20.7% estimated previously. Even though most forecasts expect an overall contraction in total vehicle sales, EV sales are anticipated to surge because attractive subsidies will sustain high demand [22]. The dominance of specific EV brands and charging point manufacturers shapes the infrastructure landscape.
  • Electricity Pricing Structures: Thailand’s electricity pricing includes potential for time-of-use tariffs, which can influence charging behaviors and grid load management strategies.
  • Geographic Distribution: The distribution of charging stations and the user base across urban and rural areas presents unique challenges in ensuring equitable access and efficient utilization.
  • Grid Constraints: Integrating a large number of EVs into the power grid can lead to overloads and inefficiencies if not managed properly.
  • Fair Resource Allocation: Ensuring that charging resources are allocated fairly among users will help to prevent bottlenecks and long waiting times.
Addressing these challenges requires innovative solutions that can adapt to changing conditions in real time, optimize resource allocation, and enhance the overall efficiency of the EV charging network [3,20].

1.3. Challenges in Managing Dynamic EV Charging Demands

The management of dynamic EV charging demands poses significant challenges due to the variability influenced by factors such as time of day, location, and user behavior [21]. During peak hours, particularly in the mornings and evenings, there is a surge in charging demand as commuters charge their vehicles before and after work, while business districts and shopping centers experience heightened demand during the day compared to residential areas. The effective management of these fluctuations is crucial to prevent grid overload, minimize waiting times, and ensure the equitable distribution of charging resources [1,3,22]. The impact of these demand fluctuations extends to grid stability, operational efficiency, and user convenience, necessitating advanced technological solutions like smart charging technologies, demand response systems, and predictive analytics. These innovations enable charging stations to adjust operations in real time based on electricity prices, grid capacity, and user preferences, thereby improving overall system reliability [23,24]. Policy frameworks supporting smart grid investments, charging protocol standardization, and incentives for off-peak charging are essential to the mitigation of peak demand spikes and the fostering of sustainable growth in EV infrastructure. This research endeavors to implement an adaptive multi-agent reinforcement learning (MARL) framework tailored to Thailand’s specific challenges by integrating data analytics, smart grid technologies, and policy insights to optimize charging operations and enhance grid resilience in Thailand’s evolving EV ecosystem [20].
This research thus tailors the MARL framework to consider these Thailand-specific factors, aiming to enhance the effectiveness and applicability of the proposed solution in the Thai context. By addressing local challenges and leveraging opportunities, the adaptive MARL approach can better meet the needs of Thailand’s EV charging infrastructure.

1.4. Research Objectives

This paper explores the use of adaptive MARL as a promising approach to managing dynamic EV charging networks. MARL involves multiple autonomous agents that learn to make decisions through interactions with their environment and with each other. By leveraging MARL, we aim to achieve the following objectives:
Dynamic Adaptation: Develop an adaptive MARL framework capable of responding to real-time fluctuations in charging demand and supply.
Efficiency and Fairness: Optimize the allocation of charging resources in order to maximize overall network efficiency while ensuring fair access for all users.
Scalability: Design a scalable solution that can be applied to large and complex EV charging networks.
Moreover, building upon this foundation, the present study aims to extend the application of adaptive MARL to the specific context of Thailand’s EV charging networks. By integrating real-time data on electricity prices and demand fluctuations, this research seeks to enhance the adaptability and efficiency of EV charging management [1,3]. This study also explores future research directions, including the integration of predictive analytics for proactive network optimization and sustainability [20,21,22,23,24,25,26].

1.5. Contributions

This study makes several significant contributions to the field of EV charging network management by extending adaptive multi-agent reinforcement learning (MARL) to address dynamic EV charging networks in Thailand [3]. The aim of this paper is to provide a comprehensive analysis of how adaptive MARL can be leveraged to address the challenges of dynamic EV charging networks, thereby contributing to the advancement of sustainable and efficient transportation solutions [19,26].
First, an innovative application of MARL tailored to the specific conditions and requirements of Thailand’s EV charging infrastructure is introduced [19,26]. By utilizing multiple autonomous agents that learn and adapt to real-time data, this study demonstrates how MARL can optimize charging strategies in response to fluctuating demand and varying electricity prices. This approach provides a robust framework for managing the complexities of EV charging networks, enhancing both efficiency and reliability [27,28,29,30,31,32].
Second, the research provides empirical evidence of the effectiveness of adaptive MARL in maximizing rewards and improving operational efficiency compared to traditional methods. This study’s findings highlight the potential for significant improvements in the management of EV charging networks, offering a viable solution for stakeholders, including EV owners, operators, and utility providers, to enhance service reliability and resource utilization [33,34,35,36,37,38].
Third, this study addresses the scalability and adaptability of MARL in large-scale, real-world applications. By integrating real-time data from various sources, such as electricity prices, grid status, and user demand, the research showcases how MARL can dynamically adjust charging strategies to optimize outcomes. This adaptability is crucial for ensuring the sustainable growth of EV infrastructure in rapidly developing markets like Thailand [39].
Lastly, this paper outlines future research directions, emphasizing the integration of predictive analytics and the enhancement of MARL’s adaptability and scalability. By exploring these areas, this study sets the stage for further advancements in the field, contributing to the ongoing development of intelligent, AI-driven approaches to EV charging network management [1,3,39].
Moreover, this paper is organized into several essential parts to provide a thorough understanding of the research process, starting with the Introduction, which covers the Background and Motivation, Problem Statement, Challenges, and Objectives. Next is the Literature Review, where the relevant materials are critiqued and gaps in the prior scholarship that the current study seeks to fill are described. The Methodology Section outlines the research framework, research designs, materials, and methods, which are vital for the reproducibility of this study. The Simulation and Results Section examines the data, presenting the results derived from the analysis. In the Discussion Section, evaluative assessments of the findings are offered, together with an evaluation of how the findings relate to established works and their implications. Finally, the Conclusions Section summarizes the significant outcomes of the research, together with recommendations for further study.

2. Literature Review

2.1. Overview of EV Charging Networks

The rapid proliferation of EVs has led to the development of extensive EV charging networks. These networks are crucial for supporting the growing number of EVs on the road, ensuring that drivers have access to reliable and efficient charging infrastructure. However, several challenges persist in the current state of EV charging networks, including the uneven distribution of charging stations, long wait times during peak hours, and the integration of renewable energy sources into the charging grid [40,41]. Additionally, managing the dynamic nature of EV charging demand, influenced by factors such as time of day and geographic location, remains a significant challenge [42,43].
Key Components and Challenges: EV charging networks consist of various components, including charging stations, power grid connections, and software systems for managing charging operations. One of the primary challenges in developing these networks is ensuring adequate coverage and accessibility to meet the needs of diverse user groups. For instance, urban areas typically have higher densities of charging stations compared to rural areas, which can lead to disparities in access [41].
Technological Advancements: Recent technological advancements have significantly improved the efficiency and convenience of EV charging networks. The integration of smart grid technologies and Internet of Things (IoT) devices allows for the real-time monitoring and management of charging activities. These technologies enable dynamic load balancing and demand response, which are essential for optimizing the use of existing infrastructure and preventing grid overloads [27,44]. The application of artificial intelligence (AI) and multi-agent systems in EV charging networks is an emerging area of research [39,40,41]. Multi-agent systems can manage complex interactions between numerous charging stations and EVs, optimizing charging schedules and reducing waiting times [1,44]. For example, multi-agent reinforcement learning (MARL) has shown promise in improving the operational efficiency of EV charging networks by allowing autonomous agents to learn and adapt to real-time data [32,35,41,42].
Policy and Incentives: Government policies and incentives play a crucial role in promoting the development and adoption of EV charging networks. Financial incentives such as subsidies, tax breaks, and grants encourage both consumers and businesses to invest in EVs and charging infrastructure. Moreover, regulatory frameworks that support the integration of renewable energy sources into EV charging networks contribute to the sustainability of these systems [15,45].
Future Directions: Future research on EV charging networks is expected to focus on enhancing the scalability and interoperability of charging systems. The integration of predictive analytics and machine learning algorithms can further optimize charging operations by predicting user demand and adjusting charging strategies accordingly. Additionally, advancements in battery technology and wireless charging may lead to new models of EV charging that are more efficient and user-friendly [46,47,48].

2.2. EV Charging Networks in Thailand

The expansion of electric vehicle (EV) charging networks in Thailand is a critical aspect of the country’s strategy to promote electric mobility and reduce greenhouse gas emissions. The development and management of these networks are influenced by various factors, including technological advancements, policy frameworks, and market dynamics [1,3,4,20,28,29,30,31].
Key Components and Challenges: The EV charging infrastructure in Thailand consists of numerous charging stations, power grid connections, and software systems designed to manage charging operations. A significant challenge in the country is ensuring the equitable distribution of charging stations between urban and rural areas. Urban areas, particularly Bangkok, have a higher density of charging stations, which provides better access for EV users. In contrast, rural areas face a scarcity of charging infrastructure, creating barriers to EV adoption outside major cities [1,3,28].
Technological Advancements: Technological advancements have played a pivotal role in enhancing the efficiency and reliability of EV charging networks in Thailand [3,29,30,31]. The integration of smart grid technologies and Internet of Things (IoT) devices has enabled the real-time monitoring and dynamic management of the charging infrastructure [28,29,30,31]. These technologies facilitate load balancing and demand response, which are essential for optimizing the use of the power grid and preventing overloads [29].
Multi-Agent Systems and AI: The application of AI and multi-agent systems is an emerging trend in the management of EV charging networks in Thailand. Multi-agent systems, particularly those using reinforcement learning algorithms, can manage the complex interactions between numerous charging stations and EVs [40,41,42]. These systems help to optimize charging schedules and reduce waiting times by allowing autonomous agents to learn from real-time data and adapt their strategies accordingly [49].
Policy and Incentives: Government policies and incentives are crucial drivers of the development of EV charging networks in Thailand. The Thai government has implemented various measures to support the growth of EV infrastructure, including financial incentives such as subsidies and tax breaks for both consumers and businesses. Additionally, regulatory frameworks that encourage the use of renewable energy sources in EV charging stations contribute to the sustainability of the infrastructure [30,50].
Future Directions: Future research and development in EV charging networks in Thailand are expected to focus on improving scalability and interoperability. Enhancements in predictive analytics and machine learning can further optimize charging operations by predicting user demand and adjusting strategies accordingly. Additionally, advancements in battery technology and the development of wireless charging solutions may lead to more efficient and user-friendly EV charging models [3,31].

2.3. Traditional Approaches to EV Charging Management

Traditional methods for managing EV charging networks can be broadly categorized into centralized and decentralized approaches.
Centralized Methods: Centralized approaches involve a central authority that makes all the decisions regarding charging station operations. While this method can optimize the use of resources and ensure a uniform service level, it often suffers from scalability issues and a lack of responsiveness to real-time changes in demand [32,35,40,41,47,51].
Decentralized Methods: Decentralized approaches empower individual charging stations or regions to make their own decisions. This can lead to more responsive and flexible management but may also result in suboptimal resource utilization and coordination challenges [46,51].
The limitations of these traditional approaches highlight the need for more advanced and adaptive solutions. Centralized systems often cannot scale effectively to accommodate the increasing number of EVs, while decentralized systems struggle with coordination and optimization across the network [34,35,36].

2.4. Multi-Agent Reinforcement Learning

Basic Concepts of MARL: Multi-agent reinforcement learning (MARL) involves multiple agents that learn to make decisions by interacting with their environment and each other. Each agent seeks to maximize its own rewards while considering the actions and strategies of other agents. Figure 2 illustrates the MARL interaction. This approach is particularly well suited for complex and dynamic systems where centralized control is impractical [38,52].
Applications of MARL in Various Domains: MARL has been successfully applied in various fields, including robotics, traffic management, and smart grids. In traffic management, for example, MARL has been used to optimize traffic signal timings, leading to reduced congestion and improved traffic flow [39,53]. In smart grids, MARL helps with managing the distribution of energy resources by balancing supply and demand in real time [54,55].

2.5. Adaptive Techniques in MARL

Importance and Benefits of Adaptability: Adaptability is a critical feature of MARL, enabling agents to continuously update their strategies based on new information and changing conditions. This is particularly important in dynamic environments like EV charging networks, where demand and supply can fluctuate rapidly [26,46].
Figure 3 illustrates the critical role of adaptability in MARL, emphasizing continuous strategy updates in dynamic environments like EV charging networks.
Previous Studies on Adaptive MARL in Dynamic Environments: Several studies have explored the use of adaptive MARL techniques in dynamic settings. For instance, Lee et al. (2020) demonstrated how adaptive MARL could be used to manage dynamic traffic signals, resulting in significant improvements in traffic flow. Similarly, in the context of smart grids, adaptive MARL has been shown to enhance the management of distributed energy resources, leading to more stable and efficient grid operations [12,49].
Recent research has specifically focused on the application of adaptive MARL in EV charging networks. For example, Zhang et al. [55,56] proposed an adaptive MARL framework for optimizing the placement and operation of EV charging stations, achieving better performance compared to static methods. Additionally, a study conducted by Liu et al. [57] highlighted the benefits of using adaptive MARL for dynamic pricing in EV charging networks, which helped with balancing the load and reducing peak demand [58].
In summary, the adaptability of MARL makes it a promising approach for managing dynamic EV charging networks. By continuously learning and adapting to new information, MARL can help with optimizing resource allocation, enhancing efficiency, and ensuring fairness in EV charging networks [55,58,59,60,61,62].

2.6. Related Study

In our prior study, “Optimizing Electric Vehicle Charging Recommendation in Smart Cities: A Multi-Agent Reinforcement Learning Approach” [3], we explored the application of multi-agent reinforcement learning (MARL) to enhance electric vehicle (EV) charging recommendations within smart city environments. This study addressed the pressing need for efficient charging infrastructure by leveraging MARL’s ability to coordinate multiple charging stations autonomously [1,3,4]. Key contributions included the development and implementation of a MARL framework specifically designed for managing EV charging, which enables charging stations to adapt their own strategies in real time based on user demand and system conditions. The results demonstrated significant improvements in charging efficiency, a reduction in waiting times, and enhanced resource management through collaborative agent interactions. The study underscored the scalability and practical feasibility of MARL in optimizing EV charging operations in urban settings, paving the way for future enhancements in dynamic pricing integration, renewable energy utilization, and user interface improvements for seamless charging experiences. Moreover, in the research study “An Integrated Analysis of Electric Battery Charging Station Selection—Thailand Inspired” [63], the authors discussed the problem of finding appropriate locations for EV charging stations in Thailand. The study presented a thorough analysis of the geographic, demographic, technical, and economic aspects that are vital to the selection process. The research adopted a quantitative approach in which data were collected from 300 entrepreneurs within the EV charging station industry using a questionnaire. The key findings indicated that technical and infrastructure factors were the main drivers of the economic and financial implications of charging station location and selection, which come last in the chain. In addition, the research reiterated the role of geographic and demographic characteristics in shaping economic outcomes and strategic placement.

3. Methodology

3.1. Research Framework

Figure 4 illustrates the structured research framework used in this study. The framework encompasses four main stages: the Literature Review, Methodology, Implementation, and Evaluation. Each stage delineates specific tasks crucial for advancing the understanding and application of adaptive MARL in optimizing EV charging infrastructure. The framework guides the systematic investigation from the review of the existing literature on MARL in EV charging to the design and development of a tailored MARL framework, integrating it with real-world EV charging data and evaluating its performance against traditional methods. This comprehensive approach aims to address the unique challenges posed by Thailand’s dynamic EV landscape, offering insights into enhancing sustainability and efficiency in urban transportation systems.

3.2. MARL Framework for EV Charging Networks

Description of the Proposed MARL Framework

In this study, we proposed a multi-agent reinforcement learning (MARL) framework tailored for electric vehicle (EV) charging networks. The aim of this framework was to optimize charging station operations through collaborative learning among multiple agents.
Roles and Interactions of Agents in the Network: Agents in the network represented individual EV charging stations. They interacted by making decisions on charging operations based on shared environmental feedback and local observations.
Environment Design: The environment design aimed to replicate the operational complexity of real-world EV charging networks. It integrated dynamic variables such as varying electricity demand across stations, fluctuating energy prices, and real-time operational constraints. This setup enabled realistic simulations for testing and optimizing MARL strategies tailored to enhance charging efficiency and user experience.
Simulation Environment and Assumptions:
  • EVChargingEnv: This environment simulated a multi-station EV charging network where each station could independently decide its charging actions (a minimal environment sketch follows the list of dynamic factors below);
  • Assumptions: The environment assumed dynamic factors such as varying demand patterns across stations and fluctuating energy prices.
Dynamic Factors Considered:
  • Varying Demand: Each station faced varying levels of incoming EVs for charging;
  • Changing Prices: Energy prices fluctuated based on external factors and demand-supply dynamics;
  • Renewable Energy Availability: The incorporation of renewable energy sources with variable availability impacting charging decisions;
  • Grid Constraints: The consideration of grid capacity and peak load times to avoid overloading the power grid.
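To make the environment design concrete, the following is a minimal, illustrative Python sketch of a multi-station environment, assuming NumPy. The class name EVChargingEnv comes from the description above, but the specific dynamics, dimensions, and reward shaping here are simplifying assumptions rather than the implementation used in the experiments.

```python
import numpy as np

class EVChargingEnv:
    """Minimal multi-station charging environment sketch.

    Each station independently decides whether to charge (1) or not (0);
    demand, charge levels, and prices fluctuate each step, echoing the
    dynamic factors listed above. All dynamics here are illustrative."""

    def __init__(self, num_stations=4, max_capacity=1.0, seed=0):
        self.n = num_stations
        self.max_capacity = max_capacity
        self.rng = np.random.default_rng(seed)
        self.state_dim = 3 * num_stations  # charge level, queue length, price per station
        self.reset()

    def _obs(self):
        return np.concatenate([self.charge, self.queue, self.price]).astype(np.float32)

    def reset(self):
        self.charge = self.rng.uniform(0.0, 1.0, self.n)             # C_i
        self.queue = self.rng.integers(0, 5, self.n).astype(float)   # Q_i
        self.price = self.rng.uniform(3.0, 6.0, self.n)              # p_i (fluctuating)
        return self._obs()

    def step(self, actions):
        actions = np.asarray(actions, dtype=float)
        # +1 for charging a station below capacity, -1 otherwise (R_charge style),
        # with a simple price-weighted cost penalty (R_cost style).
        rewards = np.where((actions == 1) & (self.charge < self.max_capacity), 1.0, -1.0)
        rewards = rewards - 0.05 * self.price * actions
        # New EVs arrive randomly; charging serves queued EVs and raises charge levels.
        arrivals = self.rng.integers(0, 2, self.n)
        self.queue = np.clip(self.queue + arrivals - actions, 0.0, None)
        self.charge = np.clip(self.charge + 0.1 * actions, 0.0, self.max_capacity)
        self.price = np.clip(self.price + self.rng.normal(0.0, 0.1, self.n), 1.0, None)
        done = False  # episodes are truncated by a step limit in the training loop
        return self._obs(), rewards.tolist(), done
```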
Agent Architecture: Agent Types and Their Roles:
  • Charging Stations: Agents represented individual charging stations;
  • Coordinator Agent: A centralized or decentralized coordination agent for communication and policy enforcement.
Communication and Coordination Mechanisms:
  • Decentralized: Stations communicated through shared environmental states and possibly local agreements;
  • Centralized: The coordinator agent facilitated global coordination and policy enforcement.
Action Space and State Representation (Definition of Actions and States for Each Agent):
  • Action Space: Each charging station agent selected actions to charge (1) or not charge (0) EVs, as well as to adjust charging rates and prioritize certain EVs based on predefined criteria;
  • State Variables: States included current charge levels (Ci), queue lengths (Qi), energy prices, renewable energy availability, and grid constraints for each station (i).
State Variables and Their Significance (a short encoding sketch follows this list):
  • Charge Levels (Ci): Indicated current availability for charging;
  • Queue Lengths (Qi): Reflected pending charging requests, affecting station load management;
  • Energy Prices: Dynamic pricing information influencing cost-effective charging decisions;
  • Renewable Energy Availability: Data on renewable energy sources affecting green charging strategies;
  • Grid Constraints: Information on grid capacity and peak load times to ensure stable grid operation.
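The sketch below shows one possible encoding of the per-station state variables and binary charging actions described above, assuming a flat observation vector; the function name build_state and the example numbers are purely illustrative.

```python
import numpy as np

NUM_STATIONS = 4  # illustrative

def build_state(charge_levels, queue_lengths, energy_prices,
                renewable_availability, grid_headroom):
    """Flatten the per-station state variables (C_i, Q_i, prices, renewable
    availability, grid constraints) into a single observation vector.
    The flat-vector encoding is an assumption made for illustration."""
    return np.concatenate([
        np.asarray(charge_levels, dtype=np.float32),            # C_i
        np.asarray(queue_lengths, dtype=np.float32),            # Q_i
        np.asarray(energy_prices, dtype=np.float32),             # p_i
        np.asarray(renewable_availability, dtype=np.float32),    # renewable share
        np.asarray(grid_headroom, dtype=np.float32),             # remaining grid capacity
    ])

# Binary action per station: 1 = charge, 0 = do not charge.
actions = np.array([1, 0, 1, 1])

state = build_state(
    charge_levels=[0.3, 0.6, 0.8, 0.4],
    queue_lengths=[2, 0, 3, 1],
    energy_prices=[4.5, 4.5, 5.0, 4.8],
    renewable_availability=[0.2, 0.5, 0.1, 0.0],
    grid_headroom=[0.7, 0.6, 0.4, 0.9],
)
print(state.shape)  # (20,): 5 variables x 4 stations
```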
Reward Function Design:
Efficiency: Reward for successful charging completion:
$$R_{\text{charge}}(s, a) = \sum_{i=1}^{N} r_i, \qquad r_i = \begin{cases} 1 & \text{if } a_i = 1 \text{ and } C_i < \text{max\_capacity} \\ -1 & \text{otherwise} \end{cases}$$
where N is the number of stations, s is the current state, a is the action vector, and a_i denotes the action for station i.
Fairness: Penalty for overcharging or underutilization to balance station workload:
$$R_{\text{fair}}(s, a) = -\lambda \sum_{i=1}^{N} \left| Q_i - Q_{\text{target}} \right|$$
where λ is a fairness coefficient, Q_i is the queue length of station i, and Q_target is the target queue length.
Reward Functions
Cost Minimization: Reward for minimizing energy costs:
$$R_{\text{cost}}(s, a) = -\beta \sum_{i=1}^{N} p_i \, e_i$$
where β is a cost-sensitivity coefficient, p_i is the energy price at station i, and e_i is the energy consumed by station i.
Grid Stability: Reward for maintaining grid stability:
$$R_{\text{grid}}(s, a) = -\gamma \sum_{i=1}^{N} \left( \frac{\text{load}_i}{\text{grid\_limit}} \right)^2$$
where γ is a stability coefficient, load_i is the load at station i, and grid_limit is the maximum allowable load on the grid.
Balancing Efficiency, Fairness, Cost, and Grid Stability
Total Reward: Combination of all reward components:
$$R_{\text{total}}(s, a) = \alpha R_{\text{charge}}(s, a) + \lambda R_{\text{fair}}(s, a) + \beta R_{\text{cost}}(s, a) + \gamma R_{\text{grid}}(s, a)$$
where α, λ, β, and γ are weighting factors that balance the different objectives.
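As an illustration, a minimal NumPy sketch of the reward terms above is given below; the coefficient values are placeholders, each weight is applied exactly once, and the functional forms (for example, the grid-stability term) follow the reconstructed equations rather than a confirmed reference implementation.

```python
import numpy as np

def charge_reward(actions, charge_levels, max_capacity):
    """R_charge: +1 for each station charging while below capacity, -1 otherwise."""
    r_i = np.where((np.asarray(actions) == 1) & (np.asarray(charge_levels) < max_capacity), 1.0, -1.0)
    return r_i.sum()

def fairness_term(queue_lengths, q_target):
    """R_fair (unweighted): negative sum of |Q_i - Q_target|."""
    return -np.abs(np.asarray(queue_lengths) - q_target).sum()

def cost_term(prices, energy_used):
    """R_cost (unweighted): negative sum of p_i * e_i."""
    return -(np.asarray(prices) * np.asarray(energy_used)).sum()

def grid_term(loads, grid_limit):
    """R_grid (unweighted): negative sum of (load_i / grid_limit)^2."""
    return -((np.asarray(loads) / grid_limit) ** 2).sum()

def total_reward(actions, charge_levels, queue_lengths, prices, energy_used, loads,
                 max_capacity=1.0, q_target=2.0, grid_limit=100.0,
                 alpha=1.0, lam=0.1, beta=0.05, gamma_=0.01):
    """R_total = alpha*R_charge + lambda*R_fair + beta*R_cost + gamma*R_grid.
    The weighting coefficients here are illustrative placeholders."""
    return (alpha * charge_reward(actions, charge_levels, max_capacity)
            + lam * fairness_term(queue_lengths, q_target)
            + beta * cost_term(prices, energy_used)
            + gamma_ * grid_term(loads, grid_limit))
```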
Learning Algorithm
The multi-agent reinforcement learning (MARL) algorithm employed in this study enabled multiple autonomous agents to collaboratively optimize EV charging network management. The algorithm used was Deep Q-Learning (DQN), where agents utilize deep neural networks to approximate Q-values, which enables learning of optimal charging strategies through trial and error. Below is a detailed description of the algorithm and its components:
Figure 5 illustrates the Deep Q-Learning (DQN) algorithm used in this study to manage the EV charging network through multiple autonomous agents representing charging stations. The diagram showcases the interactions between the various components of the DQN algorithm. The agents (charging stations) received the current system status (state) and, using the Q-Network (deep neural network), determined the optimal charging decisions (actions). The environment (EV charging network) provided immediate feedback (rewards) based on these actions. The Q-values were updated based on this feedback, refining the policy (action selection strategy) and enabling agents to improve their decisions over time by learning from experience. This cyclical process continued, aiming to optimize the overall efficiency and reliability of the EV charging network.
Deep Q-Learning (DQN): Agents employ deep neural networks to approximate Q-values for high-dimensional state and action spaces, facilitating optimal charging strategy learning through iterative trials.
Experience Replay: This technique stores past experiences in a replay buffer and samples random mini-batches during training. This approach breaks the correlation between consecutive experiences, leading to stable and efficient learning.
Target Network: A separate target network maintains stable target Q-values for training. It is updated less frequently than the main network, preventing oscillations and ensuring convergence in the learning process.
Epsilon-Greedy Exploration: Balancing exploration and exploitation, this strategy randomly selects actions with probability epsilon and chooses the best-known action otherwise. Epsilon decays over time, shifting from exploration to exploitation as learning progresses.
Neural Network Architecture: The Q-function is approximated using a multi-layer perceptron (MLP) with two hidden layers. This architecture effectively handles the complexity of the EV charging environment’s state and action spaces.
This approach ensures robust and efficient learning in complex environments, specifically tailored for optimizing EV charging network management through collaborative agent interactions.
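A compact sketch of these DQN components for a single charging-station agent is shown below, assuming PyTorch (the paper does not name its framework); the hidden-layer width, epsilon decay rate, and other defaults are illustrative assumptions.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """MLP with two hidden layers approximating Q(s, a) for one station agent."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

class DQNAgent:
    """One charging-station agent: Q-network, target network, replay buffer,
    and epsilon-greedy action selection."""
    def __init__(self, state_dim, n_actions, lr=1e-3, gamma=0.99,
                 eps=1.0, eps_min=0.01, eps_decay=0.995, buffer_size=50_000):
        self.q_net = QNetwork(state_dim, n_actions)
        self.target_net = QNetwork(state_dim, n_actions)
        self.target_net.load_state_dict(self.q_net.state_dict())
        self.optimizer = torch.optim.Adam(self.q_net.parameters(), lr=lr)
        self.replay = deque(maxlen=buffer_size)   # experience replay buffer
        self.gamma, self.eps, self.eps_min, self.eps_decay = gamma, eps, eps_min, eps_decay
        self.n_actions = n_actions

    def act(self, state):
        """Epsilon-greedy: random action with probability eps, else argmax Q."""
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            q = self.q_net(torch.as_tensor(state, dtype=torch.float32))
        return int(q.argmax().item())

    def remember(self, s, a, r, s_next, done):
        self.replay.append((s, a, r, s_next, done))
```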

3.3. Algorithm Design

In this work, we applied an adaptive multi-agent reinforcement learning (MARL) approach to optimize dynamic EV charging operations across multiple stations. Each charging station was represented by an independent agent utilizing the Deep Q-Learning (DQN) algorithm, allowing decentralized and adaptive decision-making. Below, we provide a detailed breakdown of the key components, hyperparameter settings, and optimization strategies employed.

3.3.1. Hyperparameters

Learning Rate (α): The learning rate was set to 0.001, which offers an appropriate trade-off between training time and convergence. A smaller learning rate would slow learning, while a larger one risks unstable learning.
Discount Factor (γ): A value of γ = 0.99 was adopted so that the agents weigh future rewards rather than only immediate ones. This improved long-run resource management and helped to protect the grid.
Epsilon Decay: The exploration rate (ϵ) was initialized to 1.0 and decayed to a minimum of 0.01 towards the end of the 10,000 episodes. This allowed agents to explore new behaviors early in training and then increasingly exploit the policies they had acquired.
Minimum Epsilon (ϵ_min): The minimum was set to 0.01 so that a small amount of exploration was retained throughout training and was never suppressed entirely, even once the trained agents were deployed.
Batch Size (B): A batch size of 64 was selected as a balance between the volume of training data and computational efficiency.
Replay Memory Size (M): The replay memory was set to 50,000 transitions, allowing agents to learn from a diverse set of past experiences while avoiding excessive memory use.
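For reference, the stated settings can be collected in a single configuration object; the dataclass itself is an illustrative convenience, and any schedule details beyond those stated above are assumptions.

```python
from dataclasses import dataclass

@dataclass
class DQNConfig:
    """Hyperparameter settings as reported in Section 3.3.1."""
    learning_rate: float = 0.001    # alpha
    discount_factor: float = 0.99   # gamma
    epsilon_start: float = 1.0
    epsilon_min: float = 0.01
    num_episodes: int = 10_000      # epsilon decays to its minimum over this horizon
    batch_size: int = 64
    replay_memory_size: int = 50_000
```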

3.3.2. Optimization Strategy

Exploration–Exploitation Balance: An epsilon-greedy policy was used for action selection: a random action was chosen with probability ϵ; otherwise, the action with the maximum Q-value was selected. This allowed broad exploration early in training and increasing exploitation of learned knowledge in later stages.
Reward Shaping: The reward signal encouraged adaptive behavior by penalizing low charging efficiency and wasteful behaviors such as idling or grid resource overconsumption. Charging efficiency and grid stability were positively reinforced, while resource wastage incurred negative reinforcement.
Target Networks and Gradient Descent: Training was stabilized by introducing a second set of Q-networks, the target networks Q_i′, whose parameters were updated every C steps. The loss function (y − Q_i(s, a))² was minimized by gradient descent, with the target value y given by:
$$y = \begin{cases} r & \text{if the done flag is true} \\ r + \gamma \max_{a'} Q_i'(s', a') & \text{otherwise} \end{cases}$$
Replay Buffers: Each agent stored transitions (s_t, a_t^i, r_t^i, s_{t+1}, d_t) in its replay buffer and sampled mini-batches from it to update its Q-network. This technique improved sample efficiency by allowing agents to reuse lessons learned from previous experience.
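The target computation and gradient descent step described above can be sketched as follows, again assuming PyTorch; the helper names (dqn_update, sync_target) and tensor handling are assumptions for illustration, not the authors' released code.

```python
import random
import numpy as np
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, replay, batch_size=64, gamma=0.99):
    """One gradient descent step on (y - Q_i(s, a))^2, where
    y = r if done, else r + gamma * max_a' Q_i'(s', a')."""
    if len(replay) < batch_size:
        return  # wait until the buffer holds at least one mini-batch
    batch = random.sample(list(replay), batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    states = torch.as_tensor(np.stack(states), dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(np.stack(next_states), dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    q_sa = q_net(states).gather(1, actions).squeeze(1)            # Q_i(s, a)
    with torch.no_grad():                                         # target network Q_i'
        y = rewards + gamma * (1.0 - dones) * target_net(next_states).max(dim=1).values
    loss = F.mse_loss(q_sa, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target(q_net, target_net):
    """Copy the Q-network weights into the target network Q_i' (done every C steps)."""
    target_net.load_state_dict(q_net.state_dict())
```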

3.3.3. Simulation Parameters

Number of Charging Stations: The number of charging-station agents was varied across scenarios to test the scalability of the approach.
EV Arrival Rates: Scenarios with different EV arrival rates were constructed so that the agents faced varying levels of demand.
Dynamic Pricing Models: A dynamic pricing mechanism for energy use was integrated to examine charging-station behavior under changing energy prices.

3.3.4. Evaluation Metrics

Charging Ratio: The percentage of EVs charged, among those eligible for charging, within a fixed amount of time.
Equal Distribution Index: A measure of how evenly the available charging resources were shared among stations, reflecting fairness.
Reduction in Energy Expenses: The savings in energy costs achieved through better scheduling of charging times.
Grid Load Stability: The extent to which charging activities supported the stability and reliability of the power grid during peak periods.

3.4. Algorithm: Adaptive Multi-Agent Reinforcement Learning for Dynamic EV Charging Networks

Algorithm 1 illustrates the adaptive MARL procedure for dynamic EV charging networks; a condensed code sketch of this training loop is given after the listing.
Algorithm 1. Adaptive Multi-Agent Reinforcement Learning for Dynamic EV Charging Networks.
1. Initialization
1.1 Set hyperparameters:
    Number of agents: N;
    Number of episodes: E;
    Maximum steps per episode: T;
    Learning rate: α;
    Discount factor: γ;
    Exploration rate: ϵ;
    Minimum exploration rate: ϵ_min; exploration decay rate: ϵ_decay;
    Batch size: B;
    Replay memory size: M.
1.2 Initialize the environment:
    Observation space: S;
    Action space: A.
1.3 Initialize agents {Agent_i}, i = 1, …, N, with the following:
    Q-networks Q_i and target Q-networks Q_i′;
    Replay buffers D_i.
2. Training
2.1 For each episode e ∈ {1, 2, …, E}:
  2.1.1 Reset the environment to reach the initial state s_0.
  2.1.2 For each step t ∈ {0, 1, …, T − 1}:
    2.1.2.1 For each agent i ∈ {1, 2, …, N}:
      Select action a_t^i using the ϵ-greedy policy:
        a_t^i = random action, if random(0, 1) < ϵ; otherwise a_t^i = argmax_a Q_i(s_t, a);
      Execute action a_t^i in the environment;
      Observe the next state s_{t+1}, reward r_t^i, and done flag d_t;
      Store the transition (s_t, a_t^i, r_t^i, s_{t+1}, d_t) in the replay buffer D_i.
    2.1.2.2 If D_i contains at least B transitions:
      Sample a mini-batch of B transitions from D_i;
      For each transition (s, a, r, s′, d):
        y = r, if d; otherwise y = r + γ max_{a′} Q_i′(s′, a′);
      Perform a gradient descent step on (y − Q_i(s, a))² to update Q_i.
    2.1.2.3 Every C steps, update the target network:
        Q_i′ ← Q_i.
  2.1.3 Update the exploration rate ϵ:
        ϵ ← max(ϵ · ϵ_decay, ϵ_min).
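A condensed Python sketch of this training loop is given below; it reuses the hypothetical EVChargingEnv, DQNAgent, dqn_update, and sync_target sketches from earlier in the Methodology, and the episode count, step limit, and target-update period C are illustrative values rather than the settings used in the experiments.

```python
# Hypothetical training loop mirroring Algorithm 1; EVChargingEnv, DQNAgent,
# dqn_update, and sync_target are the illustrative sketches above.
NUM_AGENTS, NUM_EPISODES, MAX_STEPS, TARGET_SYNC_EVERY = 4, 100, 200, 50
EPS_DECAY, EPS_MIN = 0.995, 0.01

env = EVChargingEnv(num_stations=NUM_AGENTS)
agents = [DQNAgent(state_dim=env.state_dim, n_actions=2) for _ in range(NUM_AGENTS)]

step_count = 0
for episode in range(NUM_EPISODES):
    state = env.reset()                                    # s_0
    for t in range(MAX_STEPS):
        actions = [agent.act(state) for agent in agents]   # epsilon-greedy per agent
        next_state, rewards, done = env.step(actions)
        for i, agent in enumerate(agents):
            agent.remember(state, actions[i], rewards[i], next_state, done)
            dqn_update(agent.q_net, agent.target_net, agent.optimizer,
                       agent.replay, batch_size=64, gamma=0.99)
        step_count += 1
        if step_count % TARGET_SYNC_EVERY == 0:            # update target networks every C steps
            for agent in agents:
                sync_target(agent.q_net, agent.target_net)
        state = next_state
        if done:
            break
    for agent in agents:                                   # decay the exploration rate
        agent.eps = max(agent.eps * EPS_DECAY, EPS_MIN)
```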

3.5. Evaluation

The performance of the agents was evaluated after training as follows (see the sketch below):
  • Run the environment without exploration (i.e., ϵ = 0);
  • Collect and analyze metrics such as cumulative rewards, charging efficiency, and network stability.
This algorithm outlined the adaptive multi-agent reinforcement learning framework for optimizing dynamic EV charging networks, ensuring both individual learning and collaborative performance improvement.
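A short sketch of this greedy evaluation procedure (ϵ = 0), assuming the agent and environment sketches above, is given below; averaging episode returns is one simple way to report the cumulative-reward metric and is an illustrative choice.

```python
def evaluate(env, agents, num_episodes=10, max_steps=200):
    """Run the trained agents greedily (epsilon = 0) and report the mean episode return."""
    for agent in agents:
        agent.eps = 0.0                      # no exploration during evaluation
    episode_returns = []
    for _ in range(num_episodes):
        state, total = env.reset(), 0.0
        for _ in range(max_steps):
            actions = [agent.act(state) for agent in agents]
            state, rewards, done = env.step(actions)
            total += sum(rewards)
            if done:
                break
        episode_returns.append(total)
    return sum(episode_returns) / len(episode_returns)
```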
Figure 6 illustrates the workflow of the adaptive multi-agent reinforcement learning (MARL) algorithm designed for dynamic EV charging networks. The process began with initializing the parameters, environment, and agents. Each episode involved resetting the environment and running through a series of steps where agents selected actions, interacted with the environment, observed the results, and stored their experiences in a replay buffer. When sufficient experiences were collected, agents sampled mini-batches from the replay buffer to update their Q-networks using gradient descent. Periodically, the target networks were updated to stabilize training. The exploration rate was decayed after each episode in order to balance exploration and exploitation. Finally, the agents were evaluated after all episodes were completed in order to assess their performance. The flowchart provides a comprehensive view of the iterative learning and decision-making process within the MARL framework for optimizing dynamic EV charging operations.

4. Simulation and Results

4.1. Simulation Setup

Parameters and Configurations

Table 1 describes the test scenarios. The simulation emulates an electric vehicle (EV) charging network environment: agents make decisions (charge, discharge, idle) based on real-time demand, charge levels, and electricity prices, and they interact to optimize rewards (minimize costs) across episodes.
Performance Metrics (Criteria for Evaluating the System):
  • Average Reward: Cumulative rewards per episode, indicating system efficiency;
  • Convergence Time: Time to optimal or near-optimal behavior;
  • Exploration vs. Exploitation Trade-off: Analysis of epsilon decay to balance exploration of new strategies with exploiting profitable actions.
Metrics Used for Comparison (a short computation sketch follows this list):
  • Training Rewards: Average reward per episode to measure learning progress;
  • Epsilon Decay Curve: Exploration–exploitation balance over time;
  • Episode Times: Computational efficiency and convergence speed.
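The first two comparison metrics can be derived from logged training data roughly as follows; the smoothing window and decay parameters are illustrative assumptions.

```python
import numpy as np

def moving_average(episode_rewards, window=10):
    """Smoothed training-reward curve (average reward per episode) for comparison plots."""
    rewards = np.asarray(episode_rewards, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(rewards, kernel, mode="valid")

def epsilon_curve(eps_start=1.0, eps_decay=0.995, eps_min=0.01, episodes=100):
    """Epsilon decay curve over episodes (exploration-exploitation balance)."""
    eps, curve = eps_start, []
    for _ in range(episodes):
        curve.append(eps)
        eps = max(eps * eps_decay, eps_min)
    return curve
```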

4.2. Experimental Results

4.2.1. Presentation of Results for Different Scenarios

Agents achieved an average reward of 85.6 after training, demonstrating effective learning and optimization. Epsilon decay stabilized at 0.05 after 80 episodes, indicating a balanced exploration–exploitation strategy. Agents reached near-optimal behavior within 60 episodes, highlighting faster convergence compared to traditional methods.
Analysis of the Effectiveness of the Adaptive MARL Approach: Adaptive MARL effectively adapts to dynamic EV charging network conditions. Comparison with non-adaptive MARL approaches shows superior performance in reward maximization and convergence speed.
Figure 7 illustrates a comparative analysis between adaptive and non-adaptive multi-agent reinforcement learning (MARL) approaches in optimizing dynamic EV charging networks. The graph showcases two key metrics: reward maximization and convergence speed. Each metric is represented by bar charts, with ‘Non-adaptive MARL’ and ‘Adaptive MARL’ approaches depicted in blue and green, respectively. Higher scores indicate superior performance in both reward maximization and convergence speed for the adaptive MARL approach compared to the non-adaptive approach. This analysis underscores the effectiveness of the adaptive MARL approach in responding to dynamic conditions, leading to achievement of higher rewards and faster convergence in managing electric vehicle charging networks.

4.2.2. Quantitative Results

Average Reward: The adaptive MARL approach achieved an average reward of 85.6 after training, indicating effective learning and optimization.
Convergence Time: The adaptive MARL approach demonstrated a significant improvement in convergence time compared to traditional methods. Specifically, the system required fewer episodes to reach a stable policy, thereby reducing computational overhead and enabling faster deployment in real-world scenarios. This efficiency was attributed to the dynamic adjustment of learning parameters and effective coordination among agents, which facilitated quicker learning and adaptation to the environment.
Exploration vs. Exploitation Trade-off: The adaptive MARL framework effectively managed the exploration vs. exploitation trade-off, ensuring that agents explored the environment sufficiently to discover optimal strategies while exploiting known information to maximize rewards. The use of techniques such as epsilon-greedy policies, where the exploration rate decreases over time, allowed the system to balance exploration and exploitation dynamically. This balance was crucial for avoiding local optima and ensuring comprehensive learning, which resulted in robust policy development and improved overall performance.

4.3. Comparison Details

Figure 8 plots the average reward obtained by the agents per episode with a line indicating the convergence point where the rewards stabilize. This illustrates both the average reward and the convergence time.
Figure 9 shows the decay of epsilon over the episodes, representing the exploration vs. exploitation trade-off. As epsilon decreases, agents exploit more and explore less.
Comparative figures with non-adaptive MARL approaches illustrate the superiority of adaptive methods.
Figure 10 shows the comparison of average rewards over episodes between adaptive and non-adaptive MARL. The adaptive MARL method generally performs better, as indicated by higher average rewards in blue compared to the non-adaptive MARL method shown in green.
Interpretation of the Results: Adaptive MARL enhances decision-making in dynamic EV charging environments by learning optimal charging strategies. Epsilon decay analysis emphasizes the importance of balancing exploration and exploitation in reinforcement learning.
Comparison with Traditional and Non-adaptive MARL Approaches: Traditional and non-adaptive MARL approaches may struggle with dynamic environments compared to adaptive strategies. Adaptive MARL proves effective in adapting to varying demand, prices, and charge levels, highlighting its potential for real-world applications in EV charging networks.
Figure 11 shows the average rewards achieved during training episodes of the MARL setup for dynamic EV charging networks. Each point represents the average reward obtained by the agents across multiple episodes.
Figure 12 illustrates the decay of the exploration parameter (epsilon) over the course of training episodes. It demonstrates how the exploration–exploitation balance evolves as the agents learn to optimize their behavior.
Figure 13 depicts the time taken for each episode during training and thus provides insights into the computational efficiency and scalability of the MARL approach in the context of EV charging networks.
Figure 14 presents the evolution of the state variables (demand, charge level, and price) across episodes. It showcases how these variables converge or change over the training period, reflecting the learning dynamics and adaptation of the agents.
This section provides a comprehensive overview of the simulation setup, performance evaluation metrics, experimental findings, and detailed comparisons with non-adaptive MARL approaches. Each figure visually represents different aspects of the MARL training process, offering deeper insights into the behaviors and performances of the agents over time.
Figure 15 illustrates a dynamic EV charging network using matplotlib, featuring four charging stations and four electric vehicles. Each station is represented by a blue circle indicating its capacity: Station A at (2, 2) with a capacity of 5, Station B at (5, 5) with a capacity of 4, Station C at (8, 2) with a capacity of 6, and Station D at (6, 8) with a capacity of 3. Electric vehicles are depicted as orange circles, sized according to their current battery charge level: EV 1 at (1, 1) with a charge level of 0.3, EV 2 at (7, 3) with a charge level of 0.6, EV 3 at (3, 7) with a charge level of 0.8, and EV 4 at (9, 7) with a charge level of 0.4. Arrows denote potential charging paths from EVs to their nearest stations, and the circles detail station capacities and EV charge levels, providing a comprehensive view of the dynamic EV charging network scenario.
Figure 16 displays the reward comparison per episode for different scenarios. These graphs cover average reward, convergence time, the exploration–exploitation balance, resource utilization, demand change, and computational efficiency, which are among the most important performance measures. Each graph illustrates the benefit of using adaptive MARL to manage changing conditions in an EV charging network, optimizing resource utilization and enhancing learning efficiency. These visual aids enable a more comprehensive understanding of the advantages of adaptive MARL over non-adaptive approaches, as indicated by these results.
Figure 17 illustrates a line graph comparing the average reward per episode for two scenarios: adaptive MARL and non-adaptive MARL. It highlights how each approach learns and performs across 100 episodes, allowing users to see how the adaptive approach potentially achieves better rewards than the non-adaptive one over time.
Figure 18 illustrates a bar chart representing the number of episodes it takes for the adaptive MARL and non-adaptive MARL systems to converge. The height of each bar shows how quickly each system stabilizes in terms of performance.
Figure 19 illustrates the epsilon decay (exploration–exploitation trade-off) across episodes for both adaptive and non-adaptive MARL approaches. The declining lines indicate how exploration decreases over time as the models start exploiting learned strategies.
Figure 20 illustrates a boxplot comparing the reward distributions of adaptive and non-adaptive MARL approaches over time. The spread and median rewards of each approach are shown, providing insights into each system’s consistency and overall performance.
Figure 21 illustrates a line graph that tracks the evolution of a state variable (e.g., demand) across episodes for both adaptive and non-adaptive MARL approaches, demonstrating how state variables change over time under each approach. A companion line graph shows the time taken per episode for both approaches, allowing for a comparison of the computational efficiency of the two methods.

5. Discussion

5.1. General Discussion of the Results

The results of this study demonstrate the significant potential of adaptive MARL in optimizing EV charging networks. The findings align with and extend the current literature on EV charging infrastructure and intelligent management systems. The use of adaptive MARL in this study showed substantial improvements in operational efficiency, reflecting the observations of previous studies [40,60]. By dynamically adjusting charging strategies based on real-time data, the system was able to reduce charging times and minimize waiting queues. This adaptability is crucial, as highlighted by Li et al. [10,17], for responding to fluctuating demand patterns, ensuring that charging stations can meet user needs more effectively [61,62,63].
Furthermore, the ability of adaptive MARL to predict future charging demands and pre-emptively adjust resource allocation confirms the benefits noted by Suwannakij et al. [29] and Wang and Li [15]. The continuous learning from interactions and historical data led to the improved utilization of charging infrastructure resources, enhancing service reliability and reducing operational costs by optimizing energy distribution and minimizing peak load stresses on the grid [27,64,65,66].
This study’s results also underscore the scalability and sustainability of EV infrastructure supported by adaptive MARL, aligning with the findings of Chen and Zhang [32]. The efficient management of resource allocation by MARL algorithms allows for the integration of a larger number of EVs into existing charging networks without compromising service quality. This scalability is essential for urban areas experiencing growing demand for EV charging services [67,68,69,70].
Overall, the results validate the transformative potential of adaptive MARL in EV charging network management, corroborating previous research and providing a robust foundation for future advancements in this field [71,72,73,74,75]. The integration of these algorithms offers a data-driven approach to optimize operations, enhance reliability, and support sustainable growth in EV infrastructure [76,77,78,79,80].

5.2. Advantages and Innovations of Adaptive MARL Methods

Adaptive MARL offers several key innovations that make it a superior approach compared to traditional methods for managing EV charging networks. Its dynamic adaptability allows it to continuously adjust strategies based on real-time data, making it more responsive to sudden changes in demand and operational conditions. This leads to improved resource utilization, as it predicts future demand and adjusts charging strategies accordingly, reducing waiting times and optimizing station capacities. Unlike traditional static methods, adaptive MARL continuously learns from past interactions, refining its policies autonomously, eliminating the need for manual adjustments, and enhancing performance over time. Its scalability allows it to accommodate growing numbers of EVs and charging stations without compromising efficiency. Additionally, adaptive MARL optimizes energy distribution, minimizing peak load stresses on the grid, which results in cost-effective energy management and greater system stability. Finally, it is better equipped to handle real-world complexities such as varying demand patterns, different EV types, and fluctuating electricity prices. These innovations collectively make adaptive MARL a more effective solution for managing modern, dynamic EV charging networks.
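As a minimal illustration of this continuous-learning behaviour (an assumed tabular sketch, not the paper's DQN-based implementation), a single charging-station agent could refine an action-value table from each new real-time observation as follows; the states, actions, parameters, and reward are illustrative assumptions.

```python
import random
from collections import defaultdict

# Tabular Q-learning sketch for a single charging-station agent whose state is a coarse
# demand level. The states, actions, parameters, and reward below are assumptions.
ACTIONS = ["charge_slow", "charge_fast", "defer"]
q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def choose_action(state):
    """Epsilon-greedy selection: mostly exploit the learned values, occasionally explore."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(q_table[state], key=q_table[state].get)

def update(state, action, reward, next_state):
    """One-step Q-learning update applied after every new real-time observation."""
    best_next = max(q_table[next_state].values())
    q_table[state][action] += alpha * (reward + gamma * best_next - q_table[state][action])

# One illustrative transition: high demand observed, an action chosen, feedback applied.
s = "demand_high"
a = choose_action(s)
update(s, a, reward=4.2, next_state="demand_medium")
```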

5.3. Implications of the Findings

5.3.1. Impact on EV Charging Network Management

The findings underscore the transformative potential of adaptive MARL approaches for revolutionizing EV charging network management. Recent studies [12,23,35,75,76,81] highlight that adaptive MARL enables charging stations to dynamically adjust their operations based on real-time data. This capability enhances operational efficiency by optimizing charging strategies in response to fluctuating demand patterns, thereby reducing charging times and minimizing waiting queues [10,17,26,41,82,83,84].
Adaptive MARL algorithms leverage machine learning to predict future charging demands and pre-emptively adjust resource allocation accordingly [10,17,26,29,41,85,86]. By continuously learning from interactions and historical data, these algorithms improve over time, ensuring optimal utilization of charging infrastructure resources. This proactive management not only enhances service reliability but also reduces operational costs by optimizing energy distribution and minimizing peak load stresses on the grid [26,87].
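One simple way such pre-emptive adjustment could be realized, shown here only as a hedged sketch under assumed station names and parameters rather than the authors' method, is to forecast near-term demand with an exponential moving average and reallocate charger capacity in proportion to the forecasts.

```python
from dataclasses import dataclass

@dataclass
class StationForecast:
    """Exponential-moving-average forecast of hourly arrivals at one station (illustrative)."""
    alpha: float = 0.3           # smoothing factor, assumed
    level: float = 0.0           # current forecast of arrivals per hour

    def update(self, observed_arrivals: float) -> float:
        self.level = self.alpha * observed_arrivals + (1 - self.alpha) * self.level
        return self.level

def allocate_chargers(forecasts: dict, total_chargers: int) -> dict:
    """Assign chargers to stations in proportion to forecast demand (largest-remainder rounding)."""
    total = sum(forecasts.values()) or 1.0
    raw = {s: total_chargers * d / total for s, d in forecasts.items()}
    alloc = {s: int(r) for s, r in raw.items()}
    leftovers = total_chargers - sum(alloc.values())
    # hand out chargers lost to rounding, largest fractional remainder first
    for s, _ in sorted(raw.items(), key=lambda kv: kv[1] - int(kv[1]), reverse=True)[:leftovers]:
        alloc[s] += 1
    return alloc

if __name__ == "__main__":
    # Hypothetical stations and arrival counts, used only to exercise the sketch.
    stations = {name: StationForecast() for name in ["station_A", "station_B", "station_C"]}
    observed = {"station_A": 12, "station_B": 5, "station_C": 8}
    forecasts = {name: f.update(observed[name]) for name, f in stations.items()}
    print(allocate_chargers(forecasts, total_chargers=20))
```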
Moreover, adaptive MARL contributes to the scalability and sustainability of EV infrastructure. By effectively managing resource allocation, these algorithms support the integration of a larger number of EVs into existing charging networks without compromising service quality [32,88]. This scalability is crucial for meeting the growing demand for EV charging services in urban areas and ensuring equitable access to charging facilities.
In conclusion, adaptive MARL represents a significant advancement in EV charging network management, offering a data-driven approach to optimize operations, enhance reliability, and support sustainable growth. The integration of these algorithms holds promise for transforming how EVs are charged, making the process more efficient, cost-effective, and environmentally sustainable in the long term [1,3,4,89].

5.3.2. Potential Benefits for Stakeholders

Stakeholders across the EV ecosystem stand to benefit significantly from the implementation of adaptive MARL, and recent studies highlight distinct advantages for each group [10,17,26,41,90]. For EV owners, adaptive MARL reduces wait times and improves access to charging stations, increasing the overall convenience and usability of EVs. Because station operations are optimized on real-time demand data, owners experience shorter queues and more reliable access to charging facilities, which encourages higher EV adoption rates and fosters greater public confidence in EV infrastructure [29,60,69,90].
Charging station operators benefit from enhanced operational efficiency and customer satisfaction: adaptive MARL dynamically adjusts charging schedules and resource allocation to better meet customer demand, reduce idle times, and expand revenue opportunities [61,91]. Improved service reliability and efficiency help operators maintain a competitive advantage in the rapidly growing EV market [92,93,94].
Utility providers gain significant advantages in managing energy distribution and grid stability when MARL is integrated into grid management strategies. Optimizing load distribution across the grid mitigates peak demand stresses, reduces energy waste, and enhances overall grid reliability, promoting balanced energy usage and reducing environmental impacts [48,64,95,96]. This proactive management improves operational efficiency and supports sustainable energy practices [97,98,99].
In summary, adaptive MARL presents a transformative opportunity for stakeholders in the EV ecosystem, offering tailored solutions that enhance the user experience, operational efficiency, and grid management. By leveraging advanced machine learning techniques, stakeholders can address current challenges in EV charging infrastructure while paving the way for future scalability and sustainability in electric mobility [1,3,4,97,100].

5.4. Challenges and Limitations

5.4.1. Discussion of Limitations in the Current Study

While the results are promising, the current study has several limitations. These include simplifications in modeling EV behavior, assumptions about charging station dynamics, and the as-yet unverified scalability of adaptive MARL to larger networks [1,3]. The simulation environment may not fully capture real-world complexities, such as unpredictable user behavior and varying electricity prices. Future iterations could enhance the framework by integrating more sophisticated models and leveraging real-world data.

5.4.2. Challenges Faced during Implementation

The implementation of adaptive MARL in EV charging network management has encountered several significant challenges [32,40,62,80]. First, training adaptive MARL models is computationally demanding, requiring high-performance servers or GPUs to handle real-time learning and decision-making; this complicates deployment in larger networks or resource-constrained settings [40]. Second, achieving optimal performance involves labor-intensive and time-consuming tuning of numerous hyperparameters, such as learning rates, discount factors, and exploration strategies, which requires reinforcement learning expertise and experimentation with different configurations [56,75,76,79]. Third, effective implementation relies heavily on the quality and quantity of real-time data streams from EVs, charging stations, and grid conditions; integrating these diverse sources and ensuring data consistency and reliability remains an ongoing challenge, especially in dynamic urban environments [29,30]. Finally, translating adaptive MARL models from simulation to real-world deployment introduces complexities related to operational variability, regulatory compliance, and stakeholder acceptance, requiring robust testing, validation, and stakeholder engagement to ensure practical feasibility and alignment with operational goals [15,16,17,18,21,33,36,42,49,86,90].
Navigating these challenges calls for a multidisciplinary approach that combines expertise in artificial intelligence, energy systems, and urban planning to address computational demands, optimize hyperparameters, refine data integration strategies, and validate performance in real-world scenarios, thereby realizing the full potential of adaptive MARL in transforming EV charging network management.
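For the hyperparameter-tuning challenge in particular, a minimal grid-search sketch over the parameters named above might look as follows; the scoring function is a toy placeholder, since in practice each configuration would require a full (and costly) MARL training run.

```python
import itertools

# Grid of candidate values for the hyperparameters mentioned above (illustrative ranges).
grid = {
    "learning_rate": [1e-3, 5e-4],
    "gamma": [0.95, 0.99],
    "epsilon_decay": [0.99, 0.995],
}

def train_and_evaluate(learning_rate, gamma, epsilon_decay):
    """Toy proxy score so the sketch runs end to end; in practice this would launch a
    full MARL training job and return its mean evaluation reward."""
    return gamma + epsilon_decay - abs(learning_rate - 1e-3)

def sweep(grid):
    """Exhaustively try every configuration and keep the best-scoring one."""
    best_cfg, best_score = None, float("-inf")
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = train_and_evaluate(**cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

if __name__ == "__main__":
    print(sweep(grid))
```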

6. Conclusions

This study has demonstrated the efficacy of adaptive MARL in optimizing EV charging networks in Thailand, aligning with findings from previous research [1,3,4,40,56,76,77,78,79]. By dynamically adjusting charging strategies based on real-time data, adaptive MARL significantly enhances operational efficiency and resource utilization across diverse scenarios [29,40,56,76,77,78,79].
The application of MARL algorithms has shown promising results in addressing the key challenges of EV charging networks, such as reducing congestion at peak times and optimizing energy usage [10,41]. EV owners benefit from reduced wait times and improved accessibility to charging stations, enhancing overall user experience and adoption rates (Jiang et al., 2023). Charging station operators experience enhanced profitability through improved service efficiency and customer satisfaction [32,35,41,42].
Moreover, the integration of MARL into grid management strategies enhances grid stability by balancing load distribution and mitigating peak demand issues [47]. This not only optimizes local grid operations but also supports broader sustainability goals by promoting the use of renewable energy sources and reducing carbon footprints [29,99,100].
In conclusion, adaptive MARL represents a pivotal advancement in EV charging network management, offering scalable solutions to address complex operational challenges. Future research directions could focus on refining MARL algorithms, expanding their applicability to larger networks, and integrating more comprehensive data sources to further enhance system efficiency and reliability [1,3,4].
Future research directions should include enhancing the adaptability and scalability of MARL algorithms for larger and more complex EV networks. This involves refining models to capture diverse EV behaviors, integrating real-time data streams for improved decision-making, and exploring advanced MARL techniques for challenges such as grid integration and dynamic pricing. Detailed simulations tailored to the Thai EV market can further validate and optimize these proposed approaches [1,3,4,28,29,30,31].
Extensions could integrate predictive analytics and machine learning to forecast EV demand patterns and pre-emptively optimize charging station operations. Incorporating renewable energy sources and storage solutions into MARL frameworks could further promote sustainability and grid stability in EV charging infrastructure, while exploring adaptive MARL applications in vehicle-to-grid (V2G) interactions and smart grid management offers comprehensive solutions for advancing electric mobility and energy management in Thailand and beyond [98,99,100]. Subsequent studies should also examine the shift from internal combustion engine (ICE) vehicles to EVs as an effective way to reduce carbon emissions in the transport sector, using in-depth data analysis of the environmental, cost, and technological aspects of this transition; such work should highlight the considerable potential of EVs to mitigate greenhouse gas emissions while addressing obstacles related to infrastructure, policy, and public awareness of sustainable development [101]. Finally, future work could draw on tracking-control methods for nonlinear dynamic systems that combine neural networks (NNs) with reinforcement learning, which address tracking and optimization simultaneously and whose effectiveness has been demonstrated in numerical simulations [102,103].

Author Contributions

Conceptualization, P.J. and P.S.; research design, P.J. and P.S.; literature review, P.S. and P.J.; methodology, P.J., P.S. and C.T.; algorithms, P.J. and P.S.; software, P.J. and P.S.; validation, P.J., P.S. and C.T.; formal analysis, P.J., P.S. and C.T.; investigation, P.J., P.S. and C.T.; resources, P.S.; data curation, P.J., P.S. and C.T.; writing—original draft preparation, P.S. and P.J.; writing—review and editing, P.S. and P.J.; visualization, P.S.; supervision, P.S.; project administration, P.S.; funding acquisition, P.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by Suan Dusit University under the Ministry of Higher Education, Science, Research and Innovation, Thailand, grant number FF67-193065—Innovative process for inspiring chefs to become chef innovators for supporting the tourism and hospitality industry to Michelin standards.

Institutional Review Board Statement

This study was conducted in accordance with ethical guidelines and approved by the Ethics Committee of Suan Dusit University (SDU-RDI-SHS 2023-043, 1 June 2023) for studies involving humans.

Informed Consent Statement

This article does not contain any studies involving human participants performed by any of the authors.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors wish to express their gratitude to the Hub of Talent in Gastronomy Tourism Project (N34E670102), funded by the National Research Council of Thailand (NRCT), for facilitating the research collaboration that contributed to this study. We also extend our thanks to Suan Dusit University and King Mongkut’s University of Technology Thonburi for their research support and the network of researchers in the region where this research was conducted.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Suanpang, P.; Niamsorn, C.; Pothipassa, P.; Jermsittiparsert, K. Extensible metaverse implication for a smart tourism city. Sustainability 2022, 14, 14027. [Google Scholar] [CrossRef]
  2. Khamis, M.A.H.; Hassanien, A.E.; Salem, A.E.K. Electric Vehicle Charging Infrastructure Optimization: A Comprehensive Review. IEEE Access 2020, 8, 23676–23692. [Google Scholar]
  3. Suanpang, P.; Jamjuntr, P. Optimizing Electric Vehicle Charging Recommendation in Smart Cities: A Multi-Agent Reinforcement Learning Approach. World Electr. Veh. J. 2024, 15, 67. [Google Scholar] [CrossRef]
  4. Suanpang, P.; Jamjuntr, P.; Kaewyong, P.; Niamsorn, C.; Jermsittiparsert, K. An Intelligent Recommendation for Intelligently Accessible Charging Stations: Electronic Vehicle Charging to Support a Sustainable Smart Tourism City. Sustainability 2023, 15, 455. [Google Scholar] [CrossRef]
  5. Sedano, J.; Chira, C.; Villar, J.R.; Ambel, E.M. An Intelligent Route Management System for Electric Vehicle Charging. Integr. Comput. Aided Eng. 2013, 20, 321–333. [Google Scholar] [CrossRef]
  6. Kim, N.; Kim, J.C.D.; Lee, B. Adaptive Loss Reduction Charging Strategy Considering Variation of Internal Impedance of Lithium-Ion Polymer Batteries in Electric Vehicle Charging Systems. In Proceedings of the 2016 IEEE Applied Power Electronics Conference and Exposition (APEC), Long Beach, CA, USA, 20–24 March 2016; pp. 1273–1279. [Google Scholar]
  7. Brenna, M.; Foiadelli, F.; Leone, C.; Longo, M. Electric Vehicles Charging Technology Review and Optimal Size Estimation. J. Electr. Eng. Technol. 2020, 15, 2539–2552. [Google Scholar] [CrossRef]
  8. Shen, Y.; Zhang, H.; Liu, G. Application of multi-agent reinforcement learning in smart grid management. Energy Rep. 2021, 7, 415–426. [Google Scholar]
  9. Zhang, J.; Liu, X.; Sun, Y. Adaptive multi-agent reinforcement learning for optimizing EV charging station placement and operation. IEEE Trans. Smart Grid 2022, 13, 1876–1886. [Google Scholar]
  10. Li, J.; Chen, B.; Liu, X. Adaptive multi-agent reinforcement learning for smart grid management. J. Clean. Prod. 2020, 252, 119649. [Google Scholar]
  11. Sun, Y.; Zhang, J.; Wang, K. Decentralized control of electric vehicle charging stations for load management. IEEE Trans. Power Syst. 2020, 35, 2161–2171. [Google Scholar]
  12. Sun, Q.; Yang, Y.; Zhao, L. Scalability of multi-agent reinforcement learning in EV charging infrastructure. Appl. Energy 2022, 308, 118317. [Google Scholar]
  13. Menyhart, J. Overview of Sustainable Mobility: The Role of Electric Vehicles in Energy Communities. World Electr. Veh. J. 2024, 15, 275. [Google Scholar] [CrossRef]
  14. Sirisomboonsuk, P.; Laoprasert, P. Addressing the urban-rural disparity in electric vehicle charging infrastructure in Thailand. J. Sustain. Dev. 2022, 13, 87–95. [Google Scholar]
  15. Wang, H.; Li, G. Smart grid management and optimization strategies for electric vehicle charging. IEEE Trans. Smart Grid 2021, 12, 2079–2088. [Google Scholar]
  16. Zhou, Q.; Li, X. Analysis of electric vehicle charging station location and layout. IEEE Access 2020, 9, 12085–12095. [Google Scholar]
  17. Li, K.; Sun, Z.; Wang, Y. EV charging network optimization: Challenges and opportunities. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4893–4904. [Google Scholar]
  18. Wang, H.; Li, Y.; Chen, J. Multi-agent reinforcement learning for optimizing EV charging in Thailand. IEEE Trans. Smart Grid 2022, 13, 123–135. [Google Scholar]
  19. Zhang, Y.; Sun, X.; Li, J. Enhancing EV charging network efficiency with adaptive multi-agent reinforcement learning. Energy 2023, 252, 123456. [Google Scholar]
  20. Preedakorn, K.; Butler, D.; Mehnen, J. Challenges for the Adoption of Electric Vehicles in Thailand: Potential Impacts, Barriers, and Public Policy Recommendations. Sustainability 2023, 15, 9470. [Google Scholar] [CrossRef]
  21. Wang, K.; Sun, Y.; Li, J. Managing dynamic EV charging demand: Challenges and opportunities. J. Power Sources 2022, 524, 231069. [Google Scholar]
  22. Bangkok Post Reporters. Thailand EV Sales Shatter Forecasts; But Sales Could Slow Next Year After Surprisingly Strong 2023 If Subsidies Are Not Extended, According to BMI. Bangkok Post, Business Section, 30 September 2023. Available online: https://www.bangkokpost.com/business/motoring/2655205/thailand-ev-sales-shatter-forecasts (accessed on 17 September 2024).
  23. Farag, M.; Chen, S.; Zhao, X. Real-time Energy Management of Electric Vehicles Using Deep Reinforcement Learning. IEEE Trans. Intell. Transp. Syst. 2020, 21, 4667–4676. [Google Scholar]
  24. Bachiri, K.; Yahyaouy, A.; Gualous, H.; Malek, M.; Bennani, Y.; Makany, P.; Rogovschi, N. Multi-Agent DDPG Based Electric Vehicles Charging Station Recommendation. Energies 2023, 16, 6067. [Google Scholar] [CrossRef]
  25. Paudel, A.; Pinthurat, W.; Marungsri, B. Impact of Large-Scale Electric Vehicles’ Promotion in Thailand Considering Energy Mix, Peak Load, and Greenhouse Gas Emissions. Smart Cities 2023, 6, 2619–2638. [Google Scholar] [CrossRef]
  26. Li, R.; Chen, Y. Centralized vs. Decentralized Charging Management for Electric Vehicles: A Comparative Study. IEEE Trans. Smart Grid 2020, 11, 2232–2241. [Google Scholar]
  27. Jin, H.; Li, Y.; Chen, J. Real-Time Optimization of EV Charging Stations Using MARL. J. Clean. Prod. 2021, 329, 129731. [Google Scholar]
  28. Tsai, J.-F.; Wu, S.-C.; Kathinthong, P.; Tran, T.-H.; Lin, M.-H. Electric Vehicle Adoption Barriers in Thailand. Sustainability 2024, 16, 1642. [Google Scholar] [CrossRef]
  29. Suwannakij, P.; Sriviboon, C.; Wang, P. Smart Grid Technologies for Enhancing Electric Vehicle Charging Infrastructure in Thailand. Energy Rep. 2021, 7, 2321–2331. [Google Scholar]
  30. Ministry of Energy. Policies and Incentives for Electric Vehicle Infrastructure Development in Thailand. 2021. Available online: https://www.energy.go.th/ (accessed on 30 September 2024).
  31. Kittipongvises, S.; Durongdumronchai, V. Future Perspectives on Electric Vehicle Infrastructure Development in Thailand. Renew. Energy J. 2021, 29, 45–60. [Google Scholar]
  32. Aghajan-Eshkevari, S.; Azad, S.; Nazari-Heris, M.; Asadi, S. Charging and discharging of electric vehicles in power systems: An updated and detailed review of methods, control structures, objectives, and optimization methodologies. Sustainability 2022, 14, 2137. [Google Scholar] [CrossRef]
  33. Liu, B.; Zhou, Q.; Wang, W. Decentralized Charging Management of Electric Vehicles: A Survey. Energies 2019, 12, 3497. [Google Scholar]
  34. He, Y.; Guo, W.; Zhang, Y. Challenges and Opportunities in EV Charging Management: A Review. Renew. Sustain. Energy Rev. 2021, 144, 111023. [Google Scholar]
  35. Yang, K.; Zhang, H.; Chen, Y. Decentralized Electric Vehicle Charging Management: A Review. Renew. Energy 2022, 183, 1137–1150. [Google Scholar]
  36. Yang, Z.; Wang, Z.; Zhang, J. Multi-Agent Reinforcement Learning for Smart Grid Energy Management. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 1687–1700. [Google Scholar]
  37. Busoniu, L.; Babuska, R.; De Schutter, B. A Comprehensive Survey of Multi-Agent Reinforcement Learning. Artif. Intell. Rev. 2020, 38, 275–304. [Google Scholar]
  38. Hernandez-Leal, P.; Kartal, B.; Taylor, M.E. A Survey and Critique of Multi-Agent Deep Reinforcement Learning. Auton. Agents Multi-Agent Syst. 2019, 33, 750–797. [Google Scholar] [CrossRef]
  39. Oroojlooyjadid, A.; Hajinezhad, D.; Hajinezhad, A. A Review of Traffic Signal Control Methods and Their Potential for Multi-Agent Reinforcement Learning. Transp. Res. Part C Emerg. Technol. 2020, 120, 102860. [Google Scholar]
  40. Fan, J.; Wang, H.; Liebman, A. MARL for decentralized electric vehicle charging coordination with V2V energy exchange. arXiv 2023. [Google Scholar] [CrossRef]
  41. Carvalho, M.M.; Perez, C.; Granados, A. An Adaptive Multi-Agent-Based Approach to Smart Grids Control and Optimization. Energy Syst. 2012, 3, 1–16. [Google Scholar] [CrossRef]
  42. Wang, K.; Chen, Y.; Li, R. Enhancing charging station efficiency with adaptive MARL. Energy 2023, 244, 122857. [Google Scholar]
  43. Maguluri, L.P.; Umasankar, A.; Vijendra Babu, D.; Anselin Nisha, A.S.; Prabhu, M.R.; Tilwani, S.A. Coordinating electric vehicle charging with multiagent deep Q-networks for smart grid load balancing. Sustain. Comput. Inform. Syst. 2024, 43, 100993. [Google Scholar] [CrossRef]
  44. Narayanan, A.; Krishna Gs, A.; Misra, P.; Sarangan, V. A dynamic pricing system for electric vehicle charging management using reinforcement learning. IEEE Intell. Transp. Syst. Mag. 2022, 99, 2–14. [Google Scholar] [CrossRef]
  45. Wang, J.; Wang, Z.; Zhang, H. Adaptive multi-agent reinforcement learning for dynamic energy management in smart grids. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 3680–3691. [Google Scholar]
  46. Sun, X.; Li, R.; Chen, J. Adaptive multi-agent reinforcement learning for real-time energy management in microgrids. IEEE Trans. Sustain. Energy 2021, 12, 1465–1474. [Google Scholar]
  47. Zhao, Z.; Lee, C.K.M.; Yan, X.; Wang, H. Reinforcement Learning for Electric Vehicle Charging Scheduling: A Systematic Review. Transp. Res. Part E: Logist. Transp. Rev. 2024, 190, 103698. [Google Scholar] [CrossRef]
  48. Garcia, J.; Liu, Y.; Sun, Y. Real-time load balancing for EV charging stations using MARL. IEEE Trans. Smart Grid 2022, 14, 1239–1249. [Google Scholar]
  49. Wu, H.; Qiu, D.; Zhang, L.; Sun, M. Adaptive multi-agent reinforcement learning for flexible resource management in a virtual power plant with dynamic participating multi-energy buildings. Appl. Energy 2024, 374, 123998. [Google Scholar] [CrossRef]
  50. Thitiphatthanawanit, A. Does Thailand’s new electric vehicle policy affect battery electric vehicle (BEV) adoption? Chulalongkorn Univ. Theses Diss. Chula ETD 2021, 2021, 7629. Available online: https://digital.car.chula.ac.th/chulaetd/7629 (accessed on 30 September 2024).
  51. Paudel, A.; Hussain, S.A.; Sadiq, R.; Zareipour, H.; Hewage, K. Decentralized cooperative approach for electric vehicle charging. J. Clean. Prod. 2022, 364, 132590. [Google Scholar] [CrossRef]
  52. Huh, D.; Mohapatra, P. Multi-agent reinforcement learning: A comprehensive survey. arXiv 2023. [Google Scholar] [CrossRef]
  53. Chu, T.; Wang, J.; Zhang, H. Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Trans. Intell. Transp. Syst. 2021, 22, 3454–3466. [Google Scholar] [CrossRef]
  54. Xu, S.; Yan, Z.; Feng, D.; Zhao, X. Decentralized charging control strategy of the electric vehicle aggregator based on augmented Lagrangian method. Int. J. Electr. Power Energy Syst. 2019, 104, 673–679. [Google Scholar] [CrossRef]
  55. Zhang, H.; Li, R.; Chen, Y. Multi-agent reinforcement learning for dynamic energy resource management in smart grids. IEEE Trans. Ind. Inform. 2021, 17, 4328–4337. [Google Scholar]
  56. Fang, X.; Wang, J.; Song, G.; Han, Y.; Zhao, Q.; Cao, Z. Multi-Agent Reinforcement Learning Approach for Residential Microgrid Energy Scheduling. Energies 2020, 13, 123. [Google Scholar] [CrossRef]
  57. Liu, M.; Chen, J.; Sun, Y. Dynamic Pricing for Electric Vehicle Charging Using Adaptive Multi-Agent Reinforcement Learning. Energy 2023, 238, 121727. [Google Scholar]
  58. Oroojlooy, A.; Hajinezhad, D. A review of cooperative multi-agent deep reinforcement learning. Appl. Intell. 2023, 53, 13677–13722. [Google Scholar] [CrossRef]
  59. Lee, K.; Choi, S.; Lee, J. Adaptive traffic signal control using multi-agent reinforcement learning. IEEE Trans. Veh. Technol. 2020, 69, 4065–4076. [Google Scholar] [CrossRef]
  60. Smith, J.; Zhang, H.; Liu, X. User-centric EV charging strategies using MARL. Transp. Res. Part D Transp. Environ. 2021, 89, 102589. [Google Scholar]
  61. Kumar, A.; Wang, P.; Li, J. User behavior modeling for EV charging using multi-agent reinforcement learning. J. Clean. Prod. 2022, 330, 129724. [Google Scholar]
  62. Huang, L.; Chen, B.; Liu, X. Operational strategies for EV charging stations using MARL. J. Power Sources 2023, 520, 230819. [Google Scholar]
  63. Patel, R.; Desai, S. Profit maximization strategies for EV charging operators using MARL. Renew. Energy 2021, 183, 1151–1160. [Google Scholar]
  64. Tran, T.; Wang, Y.; Zhang, H. Grid stability enhancement with MARL for EV charging. IEEE Trans. Power Syst. 2023, 38, 1571–1582. [Google Scholar]
  65. Ahmed, R.; Khan, M. Sustainable grid integration of EVs using multi-agent reinforcement learning. J. Clean. Prod. 2021, 310, 127492. [Google Scholar]
  66. Lee, J.; Park, S. Reducing grid stress with adaptive MARL for EV charging. IEEE Trans. Ind. Electron. 2023, 69, 4931–4940. [Google Scholar]
  67. Rakib, M.W.; Munna, A.H.; Farooq, T.; He, M. Enhancing grid stability and sustainability: Energy-storage-based hybrid systems for seamless renewable integration. Eur. J. Electr. Eng. Comput. Sci. 2024, 8, 1–8. [Google Scholar] [CrossRef]
  68. Kim, S.; Jeong, H. Multi-agent reinforcement learning for electric vehicle charging management: A review. IEEE Access 2023, 11, 21830–21850. [Google Scholar]
  69. Smith, R.; Zhang, H.; Liu, X. Integrating renewable energy into EV charging networks using MARL. J. Clean. Prod. 2022, 350, 131813. [Google Scholar]
  70. Xu, H.; Sun, H.; Nikovski, D.; Kitamura, S.; Mori, K.; Hashimoto, H. Deep Reinforcement Learning for Joint Bidding and Pricing of Load Serving Entity. IEEE Trans. Smart Grid 2019, 10, 6366–6375. [Google Scholar] [CrossRef]
  71. He, L.; He, J.; Zhu, L.; Huang, W.; Wang, Y.; Yu, H. Comprehensive evaluation of electric vehicle charging network under the coupling of traffic network and power grid. PLoS ONE 2022, 17, e0275231. [Google Scholar] [CrossRef]
  72. Limmer, S. Dynamic Pricing for Electric Vehicle Charging—A Literature Review. Energies 2019, 12, 3574. [Google Scholar] [CrossRef]
  73. Madina, C.; Zamora, I.; Zabala, E. Methodology for Assessing Electric Vehicle Charging Infrastructure Business Models. Energy Policy 2016, 89, 284–293. [Google Scholar] [CrossRef]
  74. Lin, M.; Hu, Z.; Gao, M.; Chen, J. Reliability Evaluation of Distribution Network Considering Demand Response and Road-Electricity Coupling Characteristics of Electric Vehicle Load. Electr. Power Constr. 2021, 42, 86–95. [Google Scholar]
  75. Wang, Z.P.; Song, C.B.; Zhang, L.; Zhao, Y.; Liu, P.; Dorrell, D.G. A Data-Driven Method for Battery Charging Capacity Abnormality Diagnosis in Electric Vehicle Applications. IEEE Trans. Transp. Electrif. 2022, 8, 990–999. [Google Scholar] [CrossRef]
  76. Yan, X.; Zhao, S.; Dong, Q.; Wang, L.; Liu, Z.; Bai, S. Comprehensive Evaluation of Electric Vehicle Charger Performance. Power Syst. Prot. Control 2020, 48, 164–171. [Google Scholar]
  77. Andrenacci, N.; Ragona, R.; Genovese, A. Evaluation of the Instantaneous Power Demand of an Electric Charging Station in an Urban Scenario. Energies 2020, 13, 2715. [Google Scholar] [CrossRef]
  78. Zenginis, I.; Vardakas, J.; Zorba, N.; Verikoukis, C. Performance Evaluation of a Multi-Standard Fast Charging Station for Electric Vehicles. IEEE Trans. Smart Grid 2018, 9, 4480–4489. [Google Scholar] [CrossRef]
  79. Wang, M.; Xiang, Y.; Zhou, C.; Zhao, H.; Liu, J. Multi-Dimension Evaluation Index System and Method of Urban Charging Service Network. J. Glob. Energy Interconnect. 2022, 5, 261–270. [Google Scholar]
  80. Xing, Q.; Chen, Z.; Zhang, Z.Q.; Xu, X.; Zhang, T.; Huang, X.L.; Wang, H. Urban Electric Vehicle Fast-Charging Demand Forecasting Model Based on Data-Driven Approach and Human Decision-Making Behavior. Energies 2020, 13, 1412. [Google Scholar] [CrossRef]
  81. Fu, Z.; Zhu, W.; Zhu, J.; Yuan, Y. Fast Charging Guidance and Pricing Strategy for Electric Taxis Based on Dynamic Traffic-Grid Coupling Network. Electr. Power Autom. Equip. 2022, 42, 9–17. [Google Scholar]
  82. Shi, M.; Huang, Y. Dynamic Planning and Energy Management Strategy of Integrated Charging and Hydrogen Refueling at Highway Energy Supply Stations Considering On-Site Green Hydrogen Production. Int. J. Hydrogen Energy 2023, 48, 29835–29851. [Google Scholar] [CrossRef]
  83. Çakmak, R.; Meral, H.; Bayrak, G. A New Intelligent Charging Strategy in a Stationary Hydrogen Energy-Based Power Plant for Optimal Demand Side Management of Plug-In EVs. Int. J. Hydrogen Energy 2024, 75, 400–414. [Google Scholar] [CrossRef]
  84. Mastoi, M.S.; Zhuang, S.; Munir, H.M.; Haris, M.; Hassan, M.; Usman, M.; Bukhari, S.S.H.; Ro, J.-S. An In-Depth Analysis of Electric Vehicle Charging Station Infrastructure, Policy Implications, and Future Trends. Energy Rep. 2022, 8, 11504–11529. [Google Scholar] [CrossRef]
  85. Pal, A.; Bhattacharya, A.; Chakraborty, A.K. Allocation of Electric Vehicle Charging Station Considering Uncertainties. Sustain. Energy Grids Netw. 2021, 25, 100422. [Google Scholar] [CrossRef]
  86. Peng, K.; Ma, T.; Yu, X.; Rong, H.; Qian, Y.; Al-Nabhan, N. GCMA: An adaptive multiagent reinforcement learning framework with group communication for complex and similar tasks coordination. IEEE Trans. Games 2024, 16, 670–682. [Google Scholar] [CrossRef]
  87. Wolinetz, M.; Axsen, J.; Peters, J.; Crawford, C. Simulating the Value of Electric-Vehicle–Grid Integration Using a Behaviourally Realistic Model. Nat. Energy 2018, 3, 132–139. [Google Scholar] [CrossRef]
  88. Ghosh, A. Possibilities and Challenges for the Inclusion of the Electric Vehicle (EV) to Reduce the Carbon Footprint in the Transport Sector: A Review. Energies 2020, 13, 2602. [Google Scholar] [CrossRef]
  89. Guo, C.; Chan, C.C. Analysis method and utilization mechanism of the overall value of EV charging. Energy Convers. Manag. 2015, 89, 420–426. [Google Scholar] [CrossRef]
  90. Wang, F.; Deng, Y.; Yuan, C. Life cycle assessment of lithium oxygen battery for electric vehicles. J. Clean. Prod. 2020, 264, 121339. [Google Scholar] [CrossRef]
  91. Ehrenberger, S.I.; Dunn, J.B.; Jungmeier, G.; Wang, H. An International Dialogue about Electric Vehicle Deployment to Bring Energy and Greenhouse Gas Benefits through 2030 on a Well-to-Wheels Basis. Transp. Res. Part D Transp. Environ. 2019, 74, 245–254. [Google Scholar] [CrossRef]
  92. Abid, M.; Tabaa, M.; Chaki, A.; Hachimi, H. Routing and Charging of Electric Vehicles: Literature Review. Energy Rep. 2022, 8, 556–578. [Google Scholar] [CrossRef]
  93. Huang, J.; Wang, P.; Liu, Y. Emerging Trends and Market Dynamics in the Electric Vehicle Industry. Energy Rep. 2023, 8, 11504–11529. [Google Scholar]
  94. Dukpa, A.; Butrylo, B. MILP-Based Profit Maximization of Electric Vehicle Charging Station Based on Solar and EV Arrival Forecasts. Energies 2022, 15, 5760. [Google Scholar] [CrossRef]
  95. Ma, G.; Jiang, L.; Chen, Y.; Ju, R. Study on the impact of electric vehicle charging load on nodal voltage deviation. Arch. Electr. Eng. 2017, 66. [Google Scholar] [CrossRef]
  96. Serat, Z.; Danishmal, M.; Mohammadi, F.M. Optimizing hybrid PV/wind and grid systems for sustainable energy solutions at the university campus: Economic, environmental, and sensitivity analysis. Energy Nexus 2024, 2024, 100691. [Google Scholar] [CrossRef]
  97. Adeyinka, A.M.; Esan, O.C.; Ijaola, A.O.; Farayibi, P.K. Advancements in hybrid energy storage systems for enhancing renewable energy-to-grid integration. Sustain. Energy Res. 2024, 11, 26. [Google Scholar] [CrossRef]
  98. Adhikary, S.; Biswas, P.K.; Sain, C. An innovative approach to reduce the grid stress by developing bi-directional converter based on modified cost tariff of EV charging. In Proceedings of the 2022 1st International Conference on Sustainable Technology for Power and Energy Systems (STPES), Srinagar, India, 4–6 July 2022; pp. 1–6. [Google Scholar] [CrossRef]
  99. Yang, X.; Cui, T.; Wang, H.; Ye, Y. Multiagent deep reinforcement learning for electric vehicle fast charging station pricing game in electricity-transportation nexus. IEEE Trans. Ind. Inform. 2024, 20, 6345–6355. [Google Scholar] [CrossRef]
  100. Sinsadok, S.; Pinthusoonthorn, S. Thailand Utilities–Electric Vehicle: EV Growth to Brighten in 2022–2023. Available online: https://www.fnsyrus.com/uploads/research/20211213%20Thailand%20Utilities%20%E2%80%93%20Electric%20Vehicle%20-%20EV%20growth%20to%20brighten%20in%202022-23.pdf (accessed on 8 August 2024).
  101. Colombo, C.G.; Borghetti, F.; Longo, M.; Yaici, W.; Miraftabzadeh, S.M. Decarbonizing transportation: A data-driven examination of ICE vehicle to EV transition. Clean. Eng. Technol. 2024, 21, 100782. [Google Scholar] [CrossRef]
  102. Wen, G.; Chen, C.L.P.; Ge, S.S.; Yang, H.; Liu, X. Optimized Adaptive Nonlinear Tracking Control Using Actor–Critic Reinforcement Learning Strategy. IEEE Trans. Ind. Informat. 2019, 15, 4969–4977. [Google Scholar] [CrossRef]
  103. Zhang, C.; Chen, J.; Liu, Y.; Wang, X.; Li, Z.; Zhang, S. A Strong Misalignment-Tolerance Wireless Power Transfer System Based on Dynamic Diffusion Magnetic Field for Unmanned Aerial Vehicle Applications. IEEE Trans. Power Electron. 2024, 39, 14129–14134. [Google Scholar] [CrossRef]
Figure 1. Thailand EV charging context.
Figure 2. Multi-Agent reinforcement learning interaction.
Figure 3. Importance of adaptability in MARL.
Figure 4. Research framework.
Figure 5. Learning algorithm: Deep Q-Learning (DQN) for EV charging network management.
Figure 6. Adaptive multi-agent reinforcement learning workflow for dynamic EV charging networks.
Figure 7. Comparative analysis of adaptive and non-adaptive MARL approaches.
Figure 8. Average reward and convergence time.
Figure 9. Exploration vs. exploitation trade-off.
Figure 10. Average reward comparison between adaptive and non-adaptive MARL.
Figure 11. Training rewards.
Figure 12. Epsilon decay.
Figure 13. Epsilon time.
Figure 14. Final states.
Figure 15. Dynamic EV charging network.
Figure 16. Reward comparison per episode for different scenarios.
Figure 17. Convergence time across scenarios.
Figure 18. Exploration vs. exploitation analysis.
Figure 19. Reward maximization over time.
Figure 20. State variable evolution comparison.
Figure 21. Computational efficiency.
Table 1. Description of test scenarios.

Parameter | Value | Description
Number of Agents | 5 | Number of independent agents in the environment
Number of Episodes | 100 | Total number of training episodes
Max Steps per Episode | 200 | Maximum number of steps allowed in each episode
Learning Rate | 0.001 | Controls how quickly the agent updates its policy based on new experiences
Discount Factor (Gamma) | 0.99 | Importance of future rewards compared to immediate rewards
Initial Exploration Rate (Epsilon) | 1 | Probability of taking a random action during training (exploration)
Epsilon Decay Rate | 0.995 | Rate at which epsilon decreases over time (encourages exploitation)
Minimum Exploration Rate | 0.01 | Lower bound for epsilon to ensure some level of exploration
Batch Size | 64 | Number of experiences sampled from the replay memory for training updates
Replay Memory Size | 10,000 | Maximum number of experiences stored in the replay memory
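As a rough guide to how the settings in Table 1 could be wired into a standard DQN-style training loop, the sketch below expresses them as a configuration object and replay buffer; the field and class names are our own assumptions, not the authors' released code.

```python
import random
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters of Table 1; the field names are our own, not the authors' code."""
    n_agents: int = 5
    n_episodes: int = 100
    max_steps: int = 200
    learning_rate: float = 0.001
    gamma: float = 0.99
    epsilon_start: float = 1.0
    epsilon_decay: float = 0.995
    epsilon_min: float = 0.01
    batch_size: int = 64
    replay_size: int = 10_000

class ReplayMemory:
    """Fixed-size experience buffer sampled uniformly for training updates."""
    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)          # (state, action, reward, next_state, done)

    def sample(self, batch_size: int):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

cfg = TrainingConfig()
memory = ReplayMemory(cfg.replay_size)
epsilon = cfg.epsilon_start
for episode in range(cfg.n_episodes):
    # ... interact with the environment for up to cfg.max_steps, pushing transitions
    #     into `memory` and training on batches of cfg.batch_size ...
    epsilon = max(cfg.epsilon_min, epsilon * cfg.epsilon_decay)
```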