Article

Blockchain-Assisted Secure Energy Trading in Electricity Markets: A Tiny Deep Reinforcement Learning-Based Stackelberg Game Approach

1 Electric Power Research Institute of CSG, Guangzhou 510663, China
2 Guangdong Provincial Key Laboratory of Intelligent Measurement and Advanced Metering of Power Grid, Guangzhou 510663, China
3 China Southern Power Grid Co., Ltd., Guangzhou 510663, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(18), 3647; https://doi.org/10.3390/electronics13183647
Submission received: 20 August 2024 / Revised: 6 September 2024 / Accepted: 10 September 2024 / Published: 13 September 2024
(This article belongs to the Special Issue Network Security Management in Heterogeneous Networks)

Abstract
Electricity markets are intricate systems that facilitate efficient energy exchange within interconnected grids. With the rise of low-carbon transportation driven by environmental policies and technological advancements, energy trading has become crucial. The shift toward Electric Vehicles (EVs) is bolstered by EV charging operators, which provide the charging infrastructure and services essential for widespread EV adoption. This paper introduces a blockchain-assisted secure electricity trading framework between EV charging operators and an electricity market with renewable energy sources. We propose a single-leader, multi-follower Stackelberg game between the electricity market and EV charging operators. In this two-stage game, the electricity market acts as the leader, deciding the price of electric energy, while an EV charging aggregator leverages blockchain technology to securely record and verify energy trading transactions. The EV charging operators, acting as followers, then decide their demand for electric energy based on the set price. To find the Stackelberg equilibrium, we employ a Deep Reinforcement Learning (DRL) algorithm that tackles non-stationarity through the formulation of the policy, action space, and reward function. To optimize efficiency, we propose the integration of pruning techniques into DRL, referred to as Tiny DRL. Numerical results demonstrate that our proposed schemes outperform traditional approaches.


1. Introduction

The electricity market is a structured marketplace where electricity is traded, aiming to ensure the efficient allocation and use of electrical resources to maintain a balance between supply and demand. Key participants in this market include generators, transmission companies, distribution companies, Load-Serving Entities (LSEs), and end users [1]. Price signals within the electricity market are essential, as they incentivize the optimal use and dispatch of electricity resources. The market operates through various segments, notably the day-ahead market, where electricity is traded a day in advance based on forecasts, and the real-time market, which addresses immediate imbalances in electricity demand and supply [2]. Moreover, intraday markets offer additional flexibility by allowing market participants to trade electricity closer to the time of delivery, further enhancing the market’s ability to respond to unforeseen changes in demand or supply [3]. These mechanisms collectively contribute to the reliable and cost-effective delivery of electricity to consumers, facilitating the integration of renewable energy sources and supporting the overall stability of the power grid.
Electricity trading is a key strategy for achieving low-carbon transportation and offers additional benefits such as improving urban air quality and reducing environmental pressures [4]. This dual benefit motivates both national and local governments to take more decisive actions. In recent years, stronger environmental protection policies and significant reductions in technology costs have solidified commitments from governments and automakers toward the development of Electric Vehicles (EVs). These developments indicate that EVs are poised to become the mainstream choice for future transportation [5]. EV charging operators play a crucial role in the growing EV ecosystem, offering charging facilities and services that are vital for widespread EV adoption. As EV usage increases, these operators become key players in the electricity market. Beyond providing charging services, they actively participate in energy trading, mainly electricity trading, through the use of smart grid technologies [6]. EVs can function as mobile energy storage units, charging during periods of low electricity demand and prices and discharging back to the grid when demand and prices are high. This bidirectional energy flow helps optimize electricity distribution, enhance grid stability, and maximize economic benefits. The integration of vehicle-to-grid technology further enhances this capability by enabling more efficient energy management and supporting the overall stability and efficiency of smart grids [7].
Currently, power trading faces several key challenges that hinder its efficiency and reliability. First, there are inadequate incentive mechanisms in environments with incomplete information. Without proper incentives, market participants may be reluctant to provide reliable and accurate electricity resources, leading to inefficiencies. This issue is compounded by the lack of transparency and trust between stakeholders, which can further discourage active and honest participation in the market [8]. Second, the dynamic nature of the trading environment presents significant difficulties. Traditional methods often fail to achieve optimal trading strategies in real time due to rapid price fluctuations and the complex requirements of demand response. These methods are generally not equipped to handle the high volatility and swift changes in supply and demand that are characteristic of modern electricity markets. As a result, there is a pressing need for the development of more flexible and efficient technological solutions. Advanced approaches, such as machine learning and Deep Reinforcement Learning (DRL), have shown promise in this regard. These technologies can adapt to changing market conditions and optimize trading strategies in real time, thereby ensuring more effective market operations [9,10].
The electricity market is essential for the optimal allocation of electrical resources, ensuring a balance between supply and demand. EV charging operators enhance this process by integrating smart grid technologies and actively participating in the electricity market, thereby promoting efficient electricity utilization. To overcome the challenges posed by incomplete information and dynamic trading environments, it is crucial to continuously innovate and improve market mechanisms and technological solutions [11]. These advancements are vital for maintaining the efficient operation of the electricity market and supporting the green transition of energy systems. In the context of emerging technologies and market dynamics, continuous innovation and the development of advanced market mechanisms are crucial. For instance, integrating Demand Response (DR) strategies and employing advanced algorithms, such as those based on game theory and DRL, can significantly enhance market efficiency and reliability. By leveraging these technologies, the market can better accommodate the variability of renewable energy sources and ensure a more resilient and adaptive power system.
Therefore, to address the challenge of ensuring that electricity markets provide real and reliable resources, we propose a Stackelberg game. This game-theoretic approach effectively structures interactions between market participants, promoting optimal decision making and efficient resource allocation. Furthermore, we integrate DRL with pruning techniques to solve the model efficiently. This combination enables dynamic adaptation to changing market conditions, significantly enhancing the security and efficiency of electricity resource provision. The main contributions of this paper are summarized as follows:
  • We introduce a blockchain-assisted secure electricity trading framework that facilitates transactions between EV charging operators and the electricity market. Central to this framework is an aggregator that leverages blockchain technology to securely record and manage these transactions. By employing blockchain, we ensure the integrity and security of electricity trading operations.
  • To address the pricing challenges within the electricity market, we propose a single-leader, multi-follower Stackelberg model involving the electricity market and EV charging operators. Here, the electricity market assumes the role of the leader, establishing the selling price of electric energy units. EV charging operators, as followers, adjust their resource demand strategies based on the pricing set by the market leader. This model aims to optimize resource allocation and pricing decisions within the system.
  • Recognizing the computational complexity associated with training traditional DRL models, we present a Tiny DRL algorithm that integrates pruning techniques with DRL methodologies. This novel approach enhances computational efficiency while aiming to achieve Stackelberg equilibrium. By combining pruning techniques with DRL, our algorithm efficiently navigates complex and dynamic environments, ultimately improving performance in reaching the desired equilibrium state.
The rest of this paper is organized as follows: Section 2 reviews the related work and introduces the combination of DRL with pruning techniques. In Section 3, we introduce the system model considering electricity trading between EV charging operators and the electricity market. In Section 4, we introduce the single-leader, multi-follower Stackelberg game model between EV charging operators and the electricity market in detail. In Section 5, we propose a Tiny DRL algorithm to find the Stackelberg equilibrium. The numerical results of the proposed scheme are shown in Section 6. Section 7 concludes the paper.

2. Related Work

In this section, we review several related works, with a focus on reliable energy trading in electricity markets. Ensuring reliable energy trading is crucial for maintaining grid stability and optimizing resource allocation. Therefore, compared to traditional schemes, blockchain technology is employed to enhance the security of transactions in this paper, safeguarding the integrity and transparency of the process. Furthermore, advanced DRL methods incorporating pruning techniques are utilized to optimize strategic bidding, energy trading, and load management. By removing less significant neurons or parameters from the network, these techniques enable more efficient decision making, leading to faster convergence and more robust learning outcomes.

2.1. Reliable Energy Trading in Electricity Markets

EVs possess dual attributes as both electrical loads and power sources. They play a crucial role in creating a safe, economical, and environmentally friendly intelligent power system. EVs contribute significantly to solving transportation, energy, and environmental challenges by reducing greenhouse gas emissions, enhancing energy efficiency, and supporting grid stability through the use of smart charging and vehicle-to-grid technologies [12]. Integrating EVs into smart grids and urban infrastructure not only mitigates pollution but also fosters the development of sustainable and resilient energy systems. Therefore, numerous scholars have undertaken extensive and in-depth research on the integration of EVs into the electricity market [13,14,15,16]. The authors of [13] proposed a joint demand response and energy trading model for electric vehicles in off-grid microgrid systems, optimizing transaction prices through a broker-led Stackelberg game approach. The results demonstrated that this model achieves up to 25.8% lower transaction prices compared to existing markets while maintaining high power reliability, showcasing its suitability for isolated microgrid environments. The authors of [14] presented a Peer-to-Peer (P2P) local electricity market model that integrates both energy and uncertainty trading to enhance the reliability of energy trading in electricity markets, particularly with the incorporation of EVs. The model significantly improves the local balancing of photovoltaic forecast errors by matching forecast power with time-flexible demand and uncertain power with power-flexible demand. The authors of [17] presented a data-driven probabilistic evaluation method for determining the hosting capacity of hydrogen fuel cell vehicles, incorporating a directional mapping approach, a probabilistic model considering high-dimensional uncertainties, and a cross-term decoupled polynomial chaos expansion for efficient computation. The authors of [15] introduced a decentralized Quality of Service (QoS)-based system for P2P energy trading among EVs, leveraging smart contracts to ensure reliable and resilient transactions without a third party. By employing QoS attributes and a fuzzy-based approach, the system effectively matches energy providers and consumers while implementing penalties to maintain contract integrity, thereby enhancing reliability in electricity markets. The authors of [18] presented a comprehensive analysis of the application and evolution of cooperative, non-cooperative, and evolutionary game theory within the electricity market. They examined the effects of these game theory models on the power generation, power sale, and power consumption sectors, with a particular focus on energy trading. Additionally, the study assessed the current status and scale of electricity markets, both domestically and internationally, providing insights and prospects for future research and the application of game theory in this domain.

2.2. Blockchain-Based Energy Trading in the Electricity Market

With the exponential increase in data volume and the inherent value of these data, transactions within the electricity market are encountering a critical demand for enhanced security measures [19,20,21]. For example, the authors of [22] presented FedPT-V2G, a federated transformer learning approach for real-time vehicle-to-grid dispatch that addresses non-IID data issues and data privacy concerns through the use of proximal algorithms and transformer models, achieving performance comparable to that of centralized learning in both balanced and imbalanced datasets. The adoption of blockchain technology also represents a viable solution for the establishment of trustworthiness and ensuring the continuity of secure transactions within the electricity market. By leveraging blockchain for secure storage and management, a decentralized system can be established that guarantees data integrity through encryption protocols, ensuring transparency and robust security throughout the entire process [23]. The authors of [19] reviewed the role of blockchain technology, combined with smart contracts, in facilitating peer-to-peer energy trading among prosumers, highlighting its potential to reshape the energy sector, the challenges it faces, emerging start-ups, and its application in EV charging. The authors of [20] proposed a novel blockchain-based distributed community energy trading mechanism designed to optimize energy trading efficiency and security in the context of shifting from consumers to producers. The authors of [21] pointed out that the security characteristics of blockchain technology can improve the efficiency of energy transactions and establish the basic stability and robustness of the energy market, e.g., the electricity market, and also reviewed the basic characteristics of blockchain and energy markets. In conclusion, the pivotal role of blockchain technology in energy trading, particularly electricity trading, is increasingly recognized by scholars, as evidenced by the growing body of research in this area. Therefore, in this paper, blockchain technology is utilized to enhance the security of the transaction process, underscoring its significance in ensuring the integrity and trustworthiness of energy transactions compared to current electricity trading methods.

2.3. Deep Reinforcement Learning with Pruning Techniques

DRL combines the advantages of deep learning and reinforcement learning, enabling the creation of algorithms that dynamically interact with and adapt to their environment. By employing privacy-preserving techniques, DRL algorithms iteratively learn and optimize decision making while safeguarding sensitive information. In the context of Stackelberg games, which involve leader–follower dynamics and strategic decision making, participants might be hesitant to disclose too much information due to competition or security concerns. DRL is essential for effectively reaching equilibrium solutions in such settings, as it allows agents to learn optimal strategies through interaction without requiring full disclosure of private information [24]. This capability is particularly beneficial in applications like security games, energy trading, and multi-agent systems, where balancing strategic advantage and information privacy is crucial [25].
However, training DRL models is resource-intensive in terms of computing power and storage. To address the need for more efficient DRL models in specific scenarios, researchers have increasingly adopted pruning techniques to optimize and enhance DRL performance [24,26,27,28,29]. For example, the authors of [29] introduced a novel multi-agent deep reinforcement learning method for urban distribution network reconfiguration, incorporating a “switch contribution” concept to reduce the action space, an improved QMIX algorithm for policy enhancement, and a two-stage learning structure with reward sharing to improve learning efficiency, which was validated through numerical results on a 297-node system. Pruning techniques are mainly divided into structured pruning and unstructured pruning [30]. Structured pruning involves the removal of entire components of a neural network, e.g., layers, neurons, or channels [24]. With the pruning of these larger structures, the shape of the model changes, leading to a more streamlined and often faster-to-execute network. Unstructured pruning, also known as magnitude pruning, targets individual parameters or weights within the neural network [30]. It removes weights that have the smallest magnitude, resulting in a sparse network. Pruning techniques have emerged as a promising approach to compress DRL models and improve algorithm efficiency, with an increasing amount of research focused on integrating pruning techniques with DRL. The authors of [26] proposed a novel model compression framework for DRL models using a sparse regularized pruning method and policy-shrinking technology, achieving a balance between high sparsity and compression rate. The authors of [27] proposed a compact DRL algorithm that leverages adaptive pruning and knowledge distillation to achieve high long-term transaction efficiency and lightweight routing for payment channel networks in resource-limited Internet of Things (IoT) devices. Simulation results show that the algorithm significantly outperforms baseline methods. The authors of [24] proposed a Tiny Multi-Agent DRL (Tiny MADRL) algorithm to facilitate the efficient migration of Unmanned Aerial Vehicle Twins (UTs) in Unmanned Aerial Vehicle (UAV) metaverses. By using pruning techniques, the algorithm reduces the network parameters and computational demands, optimizing Roadside Unit (RSU) selection and bandwidth allocation for seamless UT migration.
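To make the distinction between the two pruning families concrete, the following minimal NumPy sketch contrasts them on a single weight matrix; the matrix size and thresholds are illustrative choices rather than values from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))  # weights of one fully connected layer

# Unstructured (magnitude) pruning: zero out individual small weights,
# yielding a sparse but irregularly structured weight matrix.
W_unstructured = np.where(np.abs(W) < 0.1, 0.0, W)

# Structured pruning: remove entire neurons (here, the rows with the
# smallest L1 norms), which shrinks the effective width of the layer.
row_importance = np.abs(W).sum(axis=1)
W_structured = W.copy()
W_structured[row_importance < np.median(row_importance), :] = 0.0
```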

3. System Model

Decarbonizing transportation is crucial for climate change mitigation. With the increasing supply of renewable energy, governments are actively promoting the electrification of vehicle fleets [31]. Figure 1 shows the proposed blockchain-assisted secure electricity trading between EV charging operators and the electricity market with renewable energy sources. We provide more details of the system model as follows:
  • EV Charging Operator: EV charging operators are responsible for managing and operating charging stations where EV owners can recharge their vehicles [32]. They ensure the availability, functionality, and efficiency of charging infrastructure. These charging operators purchase electricity from different kinds of electricity markets to supply their charging stations, maintaining a reliable energy source for EVs.
  • Aggregator: Traditionally, the aggregator purchases time-varying electricity from the power grid and sells it to traditional users [33]. In this paper, we consider the aggregator responsible for managing electric energy trading between the EV charging operators and electricity markets. Specifically, the aggregator utilizes blockchain technology to securely record and verify energy trading transactions [34], which ensures the transparency, traceability, and efficiency of electric energy trading between the EV charging operators and electricity markets. Note that the Practical Byzantine Fault Tolerance (PBFT) consensus algorithm is used in the blockchain system to achieve lightweight consensus. The incorporation of blockchain technology into energy trading enhances security, transparency, and traceability, surpassing the capabilities of traditional electricity market trading mechanisms [21]. This advancement empowers EV charging operators to make well-informed and optimized operational decisions, thereby ensuring the efficiency and reliability of electric energy trading processes.
  • Electricity Markets: Electricity market operators facilitate the buying and selling of electrical energy, with EV charging operators participating by purchasing the electricity needed to supply their stations. These markets—especially those incorporating renewable energy resources—regulate prices by continuously adjusting them based on the supply-and-demand dynamics of EV charging operators.

4. Stackelberg Model for Electric Energy Trading

In this section, we consider that one electricity market and a set $\mathcal{M} = \{1, \ldots, m, \ldots, M\}$ of $M$ EV charging operators participate in electric energy trading.
During electric energy trading, the electricity market is the sole provider of electricity resources, and EV charging operators rely on the electricity it provides to supply energy for EVs. Since electric energy trading between the electricity market and EV charging operators involves incomplete information [35], a monopoly market is formed [36]. Specifically, the electricity market operates as a monopoly with the authority to regulate electricity prices, while market supply and demand drive price adjustments. EV charging operators must decide how much electricity to purchase based on the prevailing prices. If prices are low, EV charging operators may buy more energy to ensure a reliable supply for EVs; conversely, high prices may discourage purchases. Therefore, balancing energy trading is crucial to maximizing the utility of the electricity market while maintaining its monopoly power.
The Stackelberg game, an effective game-theoretic model, has been widely used to strategically regulate prices in oligopoly markets [23,36]. The game has two stages: the leader sets its strategy first, and the followers then respond accordingly. We model the interaction as a single-leader, multi-follower Stackelberg game between the electricity market and EV charging operators. In the first stage, the electricity market, as the leader, sets the selling price to maximize its utility. In the second stage, each EV charging operator, as a follower, determines its energy demand to maximize its own utility. The Stackelberg game model is described in detail as follows.

4.1. Electric Energy Demands of EV Charging Operators in Stage II

We formulate the utility function of EV charging operators as the difference between the profit corresponding to the purchased electric energy and the cost of purchasing that energy. Specifically, for EV charging operator $m$, we define $E_m$ as the electric energy provided by the electricity market. The more electric energy obtained from the electricity market, the more profit the EV charging operator can earn. Thus, motivated by [36], the profit of EV charging operator $m$ is defined as
$$G_m(E_m) = \alpha_m \log(1 + E_m),$$
where $\alpha_m$ is the unit profit for the purchased electric energy of EV charging operator $m$. Thus, the utility function of EV charging operator $m$ is given by
$$U_m(E_m) = G_m(E_m) - P \cdot E_m,$$
where $P > 0$ is the unit selling price of electric energy. In Stage II, each EV charging operator $m$ aims to maximize its utility $U_m(E_m)$ by deciding the optimal electric energy demand to purchase. Therefore, the optimization problem that maximizes the utility of EV charging operator $m$ is formulated as
$$\mathbf{P1}: \ \max_{E_m} \ U_m(E_m) \quad \text{s.t.} \ E_m > 0.$$

4.2. Selling Price of the Electricity Market in Stage I

The electricity market, as the energy provider, ensures that its energy allocation meets the demands of EV charging operators while maximizing its own utility [23]. To achieve this, it formulates a dynamic pricing strategy, adjusting prices based on the energy demands of the EV charging operators. The utility of the electricity market is the difference between the total charges paid by EV charging operators and the cost of energy harvesting and transmission. Thus, the utility of the electricity market is expressed as
$$U_e(P) = \sum_{m=1}^{M} (P \cdot E_m - C \cdot E_m),$$
where $C > 0$ is the unit cost of supplying electric energy to EV charging operators. From (4), we know that the electricity market can obtain profits by providing electric energy to EV charging operators but needs to pay the costs of supplying that energy. Considering that the renewable energy harvested by the electricity market is not unlimited, the energy sold by the electricity market has an upper limit of $E_{max}$, and the energy price also has an upper limit of $P_{max}$. The electricity market aims to maximize its utility by deciding a selling price under the constraints that the total electric energy sales do not exceed $E_{max}$ and the energy price does not exceed $P_{max}$. Hence, the optimization problem of maximizing the utility of the electricity market is given by
$$\mathbf{P2}: \ \max_{P} \ U_e(P) = \sum_{m=1}^{M} (P \cdot E_m - C \cdot E_m) \quad \text{s.t.} \ 0 < \sum_{m=1}^{M} E_m \leq E_{max}, \ E_m > 0, \ \forall m \in \{1, \ldots, M\}, \ 0 < C \leq P \leq P_{max}.$$
Note that no EV charging operator would buy electric energy from the electricity market if the selling price of unit electric energy were to exceed $P_{max}$. Finally, we formulate the Stackelberg game based on (3) and (5).
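As a minimal illustration of the two optimization problems above, the following Python sketch evaluates the follower utility in (2) and the leader utility in (4); the parameter values passed in are illustrative assumptions, not values prescribed by the model.

```python
import numpy as np

def operator_utility(E_m, alpha_m, P):
    """Follower utility U_m(E_m) = alpha_m * log(1 + E_m) - P * E_m, Eq. (2)."""
    return alpha_m * np.log(1.0 + E_m) - P * E_m

def market_utility(P, E, C):
    """Leader utility U_e(P) = sum_m (P * E_m - C * E_m), Eq. (4)."""
    return float(np.sum((P - C) * np.asarray(E)))

# Illustrative setting: five operators with unit profit 50, unit cost 5,
# and a candidate price of 15 per unit of electric energy.
print(operator_utility(E_m=2.0, alpha_m=50.0, P=15.0))  # ~24.93
print(market_utility(P=15.0, E=[2.0] * 5, C=5.0))       # 100.0
```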

4.3. Stackelberg Equilibrium Analysis

In this part, we seek the Stackelberg equilibrium to find the optimal solution for the game. This equilibrium ensures that the electricity market maximizes its utility, while EV charging operators can design energy request policies based on their best response. Both parties maximize their utility by adjusting strategies until they reach equilibrium [23]. The Stackelberg equilibrium is defined as follows:
Definition 1 
(Stackelberg Equilibrium). We denote $\mathbf{E}^* = \{E_m^*\}, \forall m \in \mathcal{M}$, and $P^*$ as the optimal electric energy demands of the EV charging operators and the optimal energy pricing of the electricity market, respectively. The strategy $(\mathbf{E}^*, P^*)$ is a Stackelberg equilibrium if and only if the following set of inequalities is strictly satisfied [23,36]:
$$U_e(P^*, \mathbf{E}^*) \geq U_e(P, \mathbf{E}^*), \qquad U_m(E_m^*, \mathbf{E}_{-m}^*, P^*) \geq U_m(E_m, \mathbf{E}_{-m}^*, P^*), \quad \forall m \in \mathcal{M},$$
where $\mathbf{E}_{-m}^*$ denotes the optimal demand strategies of all EV charging operators other than $m$.
In the following, we utilize the backward induction method to analyze the Stackelberg equilibrium [23,36].

4.3.1. EV Charging Operators’ Optimal Strategies as Equilibrium in Stage II

In the Stackelberg game, EV charging operators act as followers, each determining its optimal electric energy demand strategy based on the unit selling price of electric energy $P$, thereby maximizing its profit.
Theorem 1. 
The perfect equilibrium in the EV charging operators’ subgame is unique.
Proof. 
We derive the first-order and second-order derivatives of $U_m(E_m)$ with respect to $E_m$ as follows:
$$\frac{\partial U_m(E_m)}{\partial E_m} = \frac{\alpha_m}{1 + E_m} - P, \qquad \frac{\partial^2 U_m(E_m)}{\partial E_m^2} = -\frac{\alpha_m}{(1 + E_m)^2} < 0.$$
Since the first-order derivative of $U_m(E_m)$ has a unique zero point and the second-order derivative of $U_m(E_m)$ is negative, the utility function $U_m(E_m)$ of EV charging operators is strictly concave with respect to the electric energy demand strategy $E_m$. Based on the first-order optimality condition, i.e., $\partial U_m(E_m) / \partial E_m = 0$, we can obtain the best response function of EV charging operator $m$, which is given by
$$E_m^* = \frac{\alpha_m}{P} - 1.$$
Therefore, perfect equilibrium in the subgame of EV charging operators is unique.    □

4.3.2. The Electricity Market’s Optimal Strategy as Equilibrium in Stage I

In this part, we focus on studying the concavity of the utility function of the electricity market, proving the existence and uniqueness of the Stackelberg equilibrium. In Stage I, the electricity market acts as the leader in maximizing its utility by predicting the strategies of EV charging operators.
Theorem 2. 
The uniqueness of the Stackelberg equilibrium ( E * , P * ) can be guaranteed in the formulated Stackelberg game.
Proof. 
According to Theorem 1, there exists a unique Nash equilibrium among EV charging operators under any given value of $P$. Thus, the electricity market can maximize its utility by choosing the optimal value of $P$. Based on the optimal electric energy demand strategies of EV charging operators, the utility function of the electricity market is given by
$$U_e(P) = \sum_{m=1}^{M} (P - C)\left(\frac{\alpha_m}{P} - 1\right).$$
By taking the first-order and second-order derivatives of $U_e(P)$ with respect to $P$, we can obtain
$$\frac{\partial U_e(P)}{\partial P} = \sum_{m=1}^{M} \left(\frac{\alpha_m C}{P^2} - 1\right), \qquad \frac{\partial^2 U_e(P)}{\partial P^2} = -\sum_{m=1}^{M} \frac{2 \alpha_m C}{P^3} < 0.$$
Since the first-order derivative of $U_e(P)$ has a unique zero point, we can obtain $P^* = \sqrt{C \sum_{m=1}^{M} \alpha_m / M}$, and since the second-order derivative of $U_e(P)$ is negative, $U_e(P)$ is strictly concave, which indicates that the electricity market has a unique optimal solution to the formulated game [36]. Based on the optimal strategy of the electricity market, the optimal strategies of EV charging operators can be obtained [37]. Therefore, the uniqueness of the Stackelberg game’s equilibrium is proven.    □
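Since both stages admit closed-form solutions, the equilibrium of Theorems 1 and 2 can be computed directly by backward induction. The sketch below does so under the assumption that the constraints $E_{max}$ and $P_{max}$ in problem P2 are non-binding; the parameter values mirror the setting used in Section 6.

```python
import numpy as np

def stackelberg_equilibrium(alpha, C):
    """Closed-form Stackelberg equilibrium: the leader price from
    Theorem 2 and the follower best responses from Eq. (8)."""
    alpha = np.asarray(alpha, dtype=float)
    M = alpha.size
    P_star = np.sqrt(C * alpha.sum() / M)  # P* = sqrt(C * sum(alpha_m) / M)
    E_star = alpha / P_star - 1.0          # E_m* = alpha_m / P* - 1
    return P_star, E_star

# With M = 5, alpha_m = 50, and C = 5 (the Section 6 setting),
# P* = sqrt(250) ≈ 15.81 and each E_m* ≈ 2.16.
P_star, E_star = stackelberg_equilibrium(alpha=[50.0] * 5, C=5.0)
```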
Due to the dynamic nature of the energy trading environment between the electricity market and EV charging operators [38], traditional methods may struggle to adapt to these dynamics and may not find the Stackelberg equilibrium efficiently. Since DRL agents can learn to adapt their behavior to environmental dynamics [39], we utilize a DRL algorithm to find the Stackelberg equilibrium. Furthermore, we innovatively add dynamic structured pruning techniques to the DRL algorithm for efficient implementation in energy trading.

5. Tiny Deep Reinforcement Learning for an Optimal Pricing Strategy

In intricate decision-making contexts, sophisticated AI methodologies such as DRL [40,41] represent promising approaches for the development of incentive mechanisms while addressing privacy concerns [42,43]. In this section, we model the formulated Stackelberg game between the electricity market and EV charging operators as a Partially Observable Markov Decision Process (POMDP) [36,44]. To address the challenge posed by incomplete information and enhance the efficiency of finding the Stackelberg equilibrium, we propose a Tiny DRL algorithm. The Tiny DRL algorithm is designed to find the Stackelberg equilibrium by identifying the optimal solutions of the Stackelberg game, enabling the electricity market to quickly converge to near-optimal decisions. Unlike traditional DRL approaches that focus on estimating fixed policies or single-step models, the proposed method leverages Markov properties to effectively decompose the problem.

5.1. POMDP for the Stackelberg Game between the Electricity Market and EV Charging Operators

Because of the competition between the electricity market and EV charging operators, each EV charging operator has local incomplete information in the Stackelberg game and determines its electric energy strategy in a completely non-cooperative manner. To train the DRL agent, an energy trading environment following a POMDP is needed, which is formulated by conceptualizing the dynamic relationship between the electricity market and EV charging operators as a Stackelberg game. Let $\mathcal{F} = \{\mathcal{S}, \mathcal{O}, \mathcal{A}, \mathcal{R}, \gamma\}$ represent a POMDP [45], where $\mathcal{S}$, $\mathcal{O}$, $\mathcal{A}$, $\mathcal{R}$, and $\gamma$ represent the state space, the partially observable policy, the action space, the reward function, and the discount factor for the electricity market, respectively [36,45].
In each time step $t$, where $t \in \mathcal{T} = \{0, \ldots, T\}$, the electricity market interacts with the environment to determine its current state, denoted as $S(t)$. During the training process, the electricity market, acting as the DRL agent, engages in interactions with the environment. At each time step, when the electricity market executes an action $P(t)$ according to the current state $S(t)$, the environment provides an immediate reward $R(t)$ [46]. In the realm of electric energy trading, the electricity market functions as the game leader, responsible for selecting the action, i.e., the pricing policy $P(t)$. After that, the EV charging operators, acting as followers, identify their optimal strategic decisions based on (8). Following this, the environment provides a reward $R(t)$ to the electricity market by considering the strategies decided by all EV charging operators. The system contains a finite replay buffer, denoted as $\mathcal{D}$, which stores historical operation data; its capacity is defined as $D$. Relevant data of the electricity market can be extracted from the replay buffer to create new states, triggering the subsequent time step [24].

5.1.1. State Space

In each time step $t \in \mathcal{T} = \{0, \ldots, T\}$, the state space is defined as the union of the current pricing strategy of the electricity market and the electric energy demand strategies of EV charging operators, which is denoted as
$$S(t) \triangleq \{P(t), \mathbf{E}(t)\},$$
where $P(t)$ and $\mathbf{E}(t)$ are the price of the electricity market and the electric energy demand vector of EV charging operators at time step $t$, respectively.

5.1.2. Partially Observable Policy

We formulate the partially observable space for energy trading between the electricity market and EV charging operators, tackling the non-stationarity problem in the DRL system. Throughout the POMDP, the electricity market agent can base its decisions solely on local environmental observations. We define the observation space of the electricity market at time step $t$ as $O(t)$, which is the union of its historical pricing strategies and the electric energy demand strategies of EV charging operators over the previous $L$ games between the electricity market and all EV charging operators. Consequently, the observation space $O(t)$ of the electricity market at time step $t$ is represented as
$$O(t) \triangleq \{P(t-L), \mathbf{E}(t-L), P(t-L+1), \mathbf{E}(t-L+1), \ldots, P(t-1), \mathbf{E}(t-1)\},$$
where $P(t-L)$ and $\mathbf{E}(t-L)$ can be generated randomly during the initial stage when $t < L$. By considering historical information, the electricity market agent can learn how changes in its strategy impact the game result in the current time slot [36]. Upon receiving an observation $O(t)$ from the environment, the electricity market agent designs the selling price $P(t)$ of electric energy to maximize its utility.
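A minimal sketch of how such an observation could be assembled is given below; the history length L, the random padding for t < L, and the flattening order are implementation assumptions consistent with the description above.

```python
import numpy as np
from collections import deque

L = 4   # number of past games kept in each observation (assumed value)
M = 5   # number of EV charging operators (assumed value)
history = deque(maxlen=L)  # (P, E) pairs from completed games

def observe(history):
    """Flatten the last L (price, demand-vector) pairs into O(t),
    padding with random entries while fewer than L games exist."""
    obs = []
    for _ in range(L - len(history)):        # t < L: random placeholders
        obs.extend(np.random.rand(1 + M))
    for P, E in history:                     # ordered oldest to newest
        obs.append(P)
        obs.extend(E)
    return np.asarray(obs, dtype=float)

# After each game, record the outcome with history.append((P_t, E_t)).
```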

5.1.3. Action Space

$\mathcal{A} \triangleq \{P\}$ denotes the action space of the electricity market. Given the lower-bound cost $C$ and the upper-bound price $P_{max}$ for the pricing action, the electricity market decides its action $P(t)$ at each time step $t$, where $P(t) \in [C, P_{max}]$. This decision-making process relies on the information encapsulated in the observation space $O(t)$.

5.1.4. Reward Function

$\mathcal{R} \triangleq \{R\}$ denotes the reward function of the electricity market. Following the state transition, the electricity market acquires an immediate reward based on the current state $S(t)$ and the corresponding action $P(t)$ [36]. The reward function is defined as the utility function of the electricity market constructed in the Stackelberg game. At time step $t$, the reward for the electricity market is represented as $R(t) = U_e(t)$.
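Putting the POMDP elements together, one trading round can be sketched as follows: the leader posts a price, the followers respond with their best-response demands from Eq. (8), and the reward equals the market utility. Clipping demands at zero is an added safeguard for prices exceeding $\alpha_m$, not part of the original formulation.

```python
def env_step(P, alpha, C):
    """One energy trading round: action P -> follower demands -> reward."""
    E = [max(a / P - 1.0, 0.0) for a in alpha]  # best responses, Eq. (8)
    reward = sum((P - C) * e for e in E)        # R(t) = U_e(t)
    next_state = (P, E)                         # forms S(t+1) as in Eq. (11)
    return next_state, reward
```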
In the actor–critic network framework, the system consists of two crucial elements, i.e., the actor network and the critic network [24]. Proximal Policy Optimization (PPO) is a DRL algorithm based on policy gradients [47]. By employing proximal optimization techniques on the policy, the stability and convergence of agent learning can be enhanced, ensuring more reliable and efficient learning processes. In the proposed Tiny DRL framework, we denote the actor–critic network as $(\theta, \omega)$; both the actor and critic networks are neural networks. The actor network essentially functions as a policy function $\pi_\theta(P|S)$ with parameters $\theta$, which generates the action of the electricity market, namely the pricing strategy $P$, and facilitates interactions with the environment. Conversely, the critic network, characterized by the value function $V_\omega(S)$ parameterized by $\omega$, evaluates the performance of the electricity market agent and guides the actions of the agent in subsequent phases, which is defined as
$$V_\omega(S) \triangleq \hat{\mathbb{E}}_{\pi_\theta}\left[\sum_{t=0}^{T} \gamma^t R(S(t), P(t)) \,\Big|\, S_0 = S\right],$$
where $\hat{\mathbb{E}}_{\pi_\theta}(\cdot)$ is the expected value of a random variable, given that the electricity market agent follows the policy $\pi_\theta$.
The primary objective of the critic network is to minimize the Temporal Difference (TD) error, which is expressed as
$$d = R(t) + \gamma V_\omega(S(t+1)) - V_\omega(S(t)),$$
where $V_\omega(S(t))$ and $V_\omega(S(t+1))$ represent the value functions associated with the current state $S(t)$ and the subsequent state $S(t+1)$, respectively. Therefore, the loss function of the critic network is derived by minimizing the expected squared TD error, which is given by [27]
$$\min_\omega L_c(\omega) = \min_\omega \mathbb{E}\left[\left(R(t) + \gamma V_\omega(S(t+1)) - V_\omega(S(t))\right)^2\right].$$
Furthermore, the objective of the actor network is specifically defined as
$$\max_\theta J_a(\theta) = \max_\theta \mathbb{E}\left[\min\left(\zeta(\theta) \hat{A}_{\pi_\theta}(S, P), \; I_{\iota, \zeta(\theta)} \hat{A}_{\pi_\theta}(S, P)\right)\right],$$
where $\zeta(\theta) = \frac{\pi_\theta(P|S)}{\pi_{\hat{\theta}}(P|S)}$ represents the importance ratio between the new policy and the old policy, $\hat{\theta}$ represents the parameters of the strategy used for sampling $P$, and $\pi_{\hat{\theta}}(P|S)$ denotes the policy employed for importance sampling [48]. $I_{\iota, \zeta(\theta)}$ is a piece-wise function with intervals, which is given by [48]
$$I_{\iota, \zeta(\theta)} = \begin{cases} 1 + \iota, & \zeta(\theta) > 1 + \iota, \\ \zeta(\theta), & 1 - \iota \leq \zeta(\theta) \leq 1 + \iota, \\ 1 - \iota, & \zeta(\theta) < 1 - \iota, \end{cases}$$
where $\iota$ represents an adjustable hyper-parameter. $\hat{A}_{\pi_\theta}(S, P)$ denotes the estimator of the advantage function that utilizes $V_\omega(S)$, which is expressed as
$$\hat{A}_{\pi_\theta}(S(t), P(t)) = \gamma^{T-t} V_\omega(S(T)) - V_\omega(S(t)) + \sum_{x=t}^{T-1} \gamma^{x-t} R(S(x), P(x)).$$
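For concreteness, a minimal PyTorch sketch of the critic loss and the clipped actor objective above is given below; the tensors `values`, `returns`, `ratio`, and `advantage` are assumed to be gathered from rollouts, and $\iota = 0.2$ is an illustrative default rather than a value specified in the paper.

```python
import torch

def critic_loss(values, returns):
    """Squared-TD critic loss: E[(TD target - V_w(S))^2]."""
    return torch.mean((returns - values) ** 2)

def actor_objective(ratio, advantage, iota=0.2):
    """Clipped surrogate: E[min(zeta * A, clip(zeta, 1 - iota, 1 + iota) * A)]."""
    clipped = torch.clamp(ratio, 1.0 - iota, 1.0 + iota)
    return torch.mean(torch.min(ratio * advantage, clipped * advantage))
```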

5.2. Dynamic Structured Pruning

In the DRL algorithm, the actor and critic networks are essentially deep neural networks [49], which typically consist of an input layer, multiple hidden layers, and an output layer [24]. These layers contain numerous parameters, such as neurons and weights. Without loss of generality, we consider an actor network with $K$ layers and denote the weights of the $k$-th fully connected layer as $\theta^{(k)}$, where $k \in \{1, \ldots, K\}$. By feeding the state $S(t)$ at time step $t$ into the first layer, the output of the first layer is calculated as
$$h^{(1)} = \sigma^{(1)}\left(\theta^{(1)} S(t) + b^{(1)}\right),$$
where $\sigma^{(1)}$ represents the nonlinear activation of the first layer, which is typically set to the ReLU function, and $b^{(1)}$ is the bias of the first layer. The output of each layer is fed to the subsequent layer as its input. Therefore, the output of the $k$-th layer is expressed as
$$h^{(k)} = \sigma^{(k)}\left(\theta^{(k)} h^{(k-1)} + b^{(k)}\right).$$
Finally, at time step $t$, the actor network outputs the action, i.e., the pricing strategy $P(t)$, which is expressed as
$$P(t) = \sigma^{(K)}\left(\theta^{(K)} h^{(K-1)}\right).$$
To achieve the Tiny DRL algorithm, we incorporate dynamic structured pruning techniques into the actor network. This helps eliminate neurons and weights that do not significantly contribute to the performance of the actor network [49]. Unlike unstructured pruning, which accelerates DRL training but often results in irregular network structures [24], structured pruning reduces model complexity by strategically eliminating entire redundant neurons or connections [50].
To indicate the pruning status of neurons, a binary mask $m^{(k)}$ is employed. Specifically, $m_i^{(k)} = 1$ denotes a non-pruned neuron $o_i^{(k)}$, and $m_i^{(k)} = 0$ denotes a pruned neuron $o_i^{(k)}$ [24]. Thus, the action output of the actor network is expressed as
$$P(t) = \sigma^{(K)}\left(\theta^{(K)} h^{(K-1)} \odot m^{(K)}\right),$$
where $\odot$ denotes the element-wise multiplication of two matrices. Based on the above analysis, the loss function of the actor network is rewritten as [24,50]
$$J_a(\theta, m) = \mathbb{E}\left[\min\left(\zeta(\theta, m) \hat{A}_{\pi_\theta}(S, P), \; I_{\iota, \zeta(\theta, m)} \hat{A}_{\pi_\theta}(S, P)\right)\right].$$
As $D$ records accumulate in the replay buffer, the actor and critic networks are updated. Specifically, the electricity market updates the parameters of the actor network by using the gradient ascent method, which is given by
$$\theta'^{(k)} = \theta^{(k)} + \varepsilon \frac{\partial J_a(\theta, m)}{\partial \left(h^{(k)} \odot m^{(k)}\right)} \cdot \frac{\partial \left(h^{(k)} \odot m^{(k)}\right)}{\partial \theta^{(k)}},$$
where $\varepsilon$ represents the learning rate employed in the training process of the actor network and $\theta'^{(k)}$ represents the updated parameters of the actor network. The parameters of the critic network are updated through the gradient descent method as follows [24,50]:
$$\omega'^{(k)} = \omega^{(k)} - \epsilon \frac{\partial L_c(\omega)}{\partial \omega^{(k)}},$$
where $\epsilon$ represents the learning rate employed in the training process of the critic network and $\omega'^{(k)}$ represents the updated parameters of the critic network.
The dynamic structured pruning of non-essential neurons consists of two key steps, namely, determining the pruning threshold and updating the binary mask used for pruning [24,50]. The pruning threshold plays a crucial role in identifying and eliminating unnecessary parameters or connections during the pruning process. Motivated by [24,50], we formulate a dynamic pruning threshold, which is given by
$$\chi(t) = \sum_{n=1}^{N} \sum_{k=1}^{K} \rho_n^{(k)} \cdot \tau(t),$$
$$\tau(t) = \check{\tau} + (\hat{\tau} - \check{\tau}) \left(1 - \frac{t}{Y \Delta t}\right)^3,$$
where $\rho_n^{(k)}$ and $Y$ represent the neuronal importance of the $n$-th neuron of layer $k$ and the total number of pruning steps, respectively, and $\Delta t$ represents the pruning frequency. $\tau(t)$, $\hat{\tau}$, and $\check{\tau}$ represent the current sparsity in epoch $t$, the initial sparsity, and the target sparsity, respectively. This dynamic pruning method adaptively increases the sparsity of the model as iterations proceed, providing a more refined and effective approach to structured pruning. Neurons are ranked according to their importance, from least to most important, and neurons whose ranks fall below the threshold are pruned to improve the overall sparsity of the model. The mask of the $n$-th neuron of layer $k$ is updated as
$$m_n^{(k)} = \begin{cases} 1, & \text{if } \mathrm{abs}\left(m_n^{(k)} \theta_n^{(k)}\right) \geq \psi, \\ 0, & \text{otherwise}, \end{cases}$$
where $\psi$ denotes the pruning threshold derived above.
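The following NumPy sketch illustrates one pruning step under this schedule; the schedule constants and the use of an L1-norm importance score for $\rho_n^{(k)}$ are assumptions consistent with the description above, not an exact reproduction of the authors' implementation.

```python
import numpy as np

def sparsity(t, tau_init, tau_target, Y, delta_t):
    """Cubic sparsity schedule tau(t): rises from tau_init to tau_target
    over Y pruning steps taken every delta_t epochs."""
    frac = min(t / (Y * delta_t), 1.0)
    return tau_target + (tau_init - tau_target) * (1.0 - frac) ** 3

def update_mask(theta_k, current_sparsity):
    """Rank neurons (rows of theta_k) by L1 importance and prune the
    lowest-ranked fraction, returning the updated binary mask m^(k)."""
    importance = np.abs(theta_k).sum(axis=1)   # rho_n^(k), assumed L1 norm
    n_prune = int(current_sparsity * importance.size)
    mask = np.ones(importance.size)
    if n_prune:
        mask[np.argsort(importance)[:n_prune]] = 0.0
    return mask
```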
The above process of dynamic structured pruning is shown in Algorithm 1. In the Tiny DRL model, we adopt a fully connected deep neural network architecture for the actor network, which consists of $K$ layers. Algorithm 1 consists of a two-step process, namely, initially training the DRL model and then using the dynamic pruning threshold to remove unimportant neurons. Note that the complexity of Algorithm 1 over $T$ episodes is $O(T|\mathcal{S}|) + O\big(T \sum_{k=1}^{K-1} u(k)\big)$, where $u(k)$ is the number of neurons in each hidden layer $k$ up to the penultimate layer [24,50].
Algorithm 1: Tiny DRL algorithm with dynamic structured pruning for Stackelberg equilibrium.
Input: State S .
Output: The optimal strategy ( E * , P * ) .

6. Numerical Results

In this section, we present numerical results to demonstrate the effectiveness of the proposed Tiny DRL algorithm and analyze the proposed Stackelberg game model.
Figure 2 presents a performance comparison between the proposed Tiny PPO algorithm and the PPO algorithm. We set the pruning rate, the learning rate of the actor and critic networks, the discount factor, the number of training epochs, and the batch size to 0.05, $1 \times 10^{-4}$, 0.95, 400, and 512, respectively. From Figure 2, we can observe that the proposed Tiny PPO algorithm is more stable than the PPO algorithm and obtains higher test rewards. The Tiny PPO algorithm also promotes a higher utility of the electricity market and higher sum utilities of all EV charging operators, demonstrating the superior performance of the proposed Tiny PPO algorithm.
Figure 3 shows the utilities and optimal strategies of the electricity market and EV charging operators under different costs (C), with M = 5 corresponding to the number of EV charging operators and a unit profit of α = 50 . From Figure 3, we can observe that as the unit cost (C) increases, the selling price of a unit of electric energy (P) set by the electricity market also rises. Concurrently, the electric energy demands ( E m ) determined by the EV charging operators decrease. The underlying reason for this trend is that an increase in the unit cost (C) compels the electricity market to raise prices to maintain stable and increasing profits. Then, this price hike discourages EV charging operators from purchasing large amounts of electricity, leading to a reduction in electric energy demands. Moreover, the utilities of the electricity market and EV charging operators decrease as the unit cost (C) increases. That is because the electric energy demands of EV charging operators decrease, while the selling price of a unit of electric energy (P) increases. Specifically, the reduction in electric energy demand has a more substantial negative impact on the utility of the electricity market than the positive impact of the increased selling price, resulting in a net decrease in the utility of the electricity market. Similarly, for EV charging operators, the adverse effect of a higher selling price per unit of electric energy outweighs the effect of reduced electric energy demands, leading to a decrease in their utility as well.
Figure 4 illustrates the utilities and strategies of the electricity market and EV charging operators under different numbers of EV charging operators, with a unit cost of C = 5 and a unit profit of α = 50. From Figure 4, it is evident that the total electric energy demand increases as the number of EV charging operators M rises, while the selling price of a unit of electric energy P remains stable regardless of changes in the number of EV charging operators. According to the equation $P^* = \sqrt{C \sum_{m=1}^{M} \alpha_m / M}$, since C and α are constant, P does not change. Specifically, the stability of the selling price P amidst increasing demand can be attributed to the constancy of the unit cost C and unit profit α, ensuring that the price equilibrium is maintained. Additionally, we can observe that the utilities of both the electricity market and the EV charging operators increase as the number of EV charging operators grows. This is because the increased energy demands of EV charging operators positively impact the utilities of both parties.
Figure 5 shows the utilities and strategies of the electricity market and EV charging operators under different unit profits ( α ), with M = 5 corresponding to the number of EV charging operators and a cost of C = 5 . It is observed that as the unit profit ( α ) increases, both the electric energy demands and the selling price per unit of electric energy rise. This is because a higher unit profit ( α ) incentivizes EV charging operators to purchase more electricity resources. The increased demand for electric energy enables the electricity market to set higher prices to maximize its profit. Furthermore, we can observe that the utilities of the electricity market and EV charging operators increase as the unit profit ( α ) increases. It is obvious that the simultaneous growth in selling price and electric energy demands boosts the utility of the electricity market. For EV charging operators, the increase in utility may be attributed to the fact that the positive impact of higher electric energy demands outweighs the negative impact of rising selling prices.
Figure 6 shows the security performance of the PBFT consensus algorithm in the proposed blockchain system for electricity trading. From Figure 6, we can see that, for any probability $p_m$ of a delegate being malicious, the security probability increases as the number of miners increases. The PBFT algorithm requires agreement from more than two-thirds of the participating nodes to reach consensus. Therefore, as the number of miners increases, the proportion of honest nodes involved in the consensus process also grows, which enhances the overall robustness of the system and makes it increasingly difficult for malicious attackers to compromise its integrity [51]. Hence, the proposed blockchain system utilizing the PBFT consensus algorithm ensures reliable and secure electricity trading by guaranteeing trustworthy block verification.
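As a hedged sketch of this behavior, the snippet below computes the probability that a committee of n miners stays within the classical PBFT fault bound (at most f = ⌊(n − 1)/3⌋ malicious nodes), assuming each miner is malicious independently with probability $p_m$; the committee sizes are illustrative and the independence assumption is ours, not the paper's.

```python
from math import comb

def security_probability(n, p_m):
    """P(system secure) = P(at most f of n miners are malicious),
    with the PBFT fault bound f = (n - 1) // 3."""
    f = (n - 1) // 3
    return sum(comb(n, k) * p_m**k * (1 - p_m)**(n - k) for k in range(f + 1))

# Security improves with committee size for a fixed p_m:
for n in (4, 16, 64):
    print(n, round(security_probability(n, p_m=0.1), 6))
```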

7. Conclusions

In this paper, we proposed a blockchain-assisted secure energy trading framework. Specifically, we utilized blockchain technology to securely manage energy trading between the electricity market and EV charging operators. Then, we proposed a single-leader, multi-follower Stackelberg game model to address the electricity trading problem between the electricity market and EV charging operators. In this model, the electricity market acts as the leader, setting the price of a unit of electric energy. The EV charging operators, as followers, determine their electricity demand based on the price set by the electricity market. During the trading process, blockchain technology is utilized by EV charging aggregators to securely record and verify energy transactions. To find the Stackelberg equilibrium, we employed a DRL algorithm. Given the resource-intensive nature of training DRL models, we introduced pruning techniques into the DRL framework, referred to as Tiny DRL, to enhance the efficiency of the algorithm in terms of computing power and storage requirements. In future work, we will consider formulating a multi-leader, multi-follower Stackelberg game between electricity markets and EV charging operators. Our focus will be on enhancing the verification of our model through rigorous testing and validation procedures. Furthermore, we will aim to enhance consensus mechanisms, optimize smart contract functionalities, and explore interoperability with other blockchain networks to improve security, scalability, and efficiency within the energy trading ecosystem.

Author Contributions

Conceptualization, Y.X.; Methodology, Y.L.; Software, F.Z.; Validation, X.L.; Formal analysis, J.T.; Investigation, B.Q.; Resources, F.Z.; Data curation, X.L.; Writing—original draft preparation, X.L.; Writing—review and editing, X.L.; Visualization, F.Z.; Supervision, Y.G.; Project administration, Y.X.; Funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Major Science and Technology Project of China Southern Power Grid Co., Ltd. (ZBKJXM20232456).

Data Availability Statement

All data underlying the results are available as part of the article and no additional source data are required.

Conflicts of Interest

Author Yiyong Lei and Yanzhang Gu were employed by the company China Southern Power Grid Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Liu, P.; Ding, T.; Zou, Z.; Yang, Y. Integrated demand response for a load serving entity in multi-energy market considering network constraints. Appl. Energy 2019, 250, 512–529. [Google Scholar] [CrossRef]
  2. Krishnamurthy, D.; Uckun, C.; Zhou, Z.; Thimmapuram, P.R.; Botterud, A. Energy storage arbitrage under day-ahead and real-time price uncertainty. IEEE Trans. Power Syst. 2017, 33, 84–93.
  3. Shah, D.; Chatterjee, S. A comprehensive review on day-ahead electricity market and important features of world’s major electric power exchanges. Int. Trans. Electr. Energy Syst. 2020, 30, e12360.
  4. Xie, D.; Gou, Z.; Gui, X. How electric vehicles benefit urban air quality improvement: A study in Wuhan. Sci. Total Environ. 2024, 906, 167584.
  5. LaMonaca, S.; Ryan, L. The state of play in electric vehicle charging services–A review of infrastructure provision, players, and policies. Renew. Sustain. Energy Rev. 2022, 154, 111733.
  6. Sultan, V.; Aryal, A.; Chang, H.; Kral, J. Integration of EVs into the smart grid: A systematic literature review. Energy Inform. 2022, 5, 65.
  7. Sovacool, B.K.; Kester, J.; Noel, L.; de Rubens, G.Z. Actors, business models, and innovation activity systems for vehicle-to-grid (V2G) technology: A comprehensive review. Renew. Sustain. Energy Rev. 2020, 131, 109963.
  8. Silva, C.; Faria, P.; Vale, Z.; Corchado, J. Demand response performance and uncertainty: A systematic literature review. Energy Strategy Rev. 2022, 41, 100857.
  9. Motalleb, M.; Annaswamy, A.; Ghorbani, R. A real-time demand response market through a repeated incomplete-information game. Energy 2018, 143, 424–438.
  10. Wen, J.; Nie, J.; Kang, J.; Niyato, D.; Du, H.; Zhang, Y.; Guizani, M. From generative AI to generative Internet of Things: Fundamentals, framework, and outlooks. IEEE Internet Things Mag. 2024, 7, 30–37.
  11. Parker, G.G.; Tan, B.; Kazan, O. Electric power industry: Operational and public policy challenges and opportunities. Prod. Oper. Manag. 2019, 28, 2738–2777.
  12. Rauf, M.; Kumar, L.; Zulkifli, S.A.; Jamil, A. Aspects of artificial intelligence in future electric vehicle technology for sustainable environmental impact. Environ. Chall. 2024, 14, 100854.
  13. Kim, J.; Lee, J.; Choi, J.K. Joint demand response and energy trading for electric vehicles in off-grid system. IEEE Access 2020, 8, 130576–130587.
  14. Zhang, Z.; Li, R.; Li, F. A novel peer-to-peer local electricity market for joint trading of energy and uncertainty. IEEE Trans. Smart Grid 2019, 11, 1205–1215.
  15. Al-Obaidi, A.A.; Farag, H.E. Decentralized quality of service based system for energy trading among electric vehicles. IEEE Trans. Intell. Transp. Syst. 2021, 23, 6586–6595.
  16. Salmani, H.; Rezazadeh, A.; Sedighizadeh, M. Robust stochastic blockchain model for peer-to-peer energy trading among charging stations of electric vehicles. J. Oper. Autom. Power Eng. 2024, 12, 54–68.
  17. Xia, W.; Ren, Z.; Li, H.; Pan, Z. A data-driven probabilistic evaluation method of hydrogen fuel cell vehicles hosting capacity for integrated hydrogen-electricity network. Appl. Energy 2024, 376, 123895.
  18. Huang, W.; Li, H. Game theory applications in the electricity market and renewable energy trading: A critical survey. Front. Energy Res. 2022, 10, 1009217.
  19. Thukral, M.K. Emergence of blockchain-technology application in peer-to-peer electrical-energy trading: A review. Clean Energy 2021, 5, 104–123.
  20. Wang, B.; Xu, J.; Ke, J.; Chen, C.P.; Wang, J.; Wang, N.; Li, X.; Zhang, F.; Li, L. CE-SDT: A new blockchain-based distributed community energy trading mechanism. Front. Energy Res. 2023, 10, 1091350.
  21. Jiang, T.; Luo, H.; Yang, K.; Sun, G.; Yu, H.; Huang, Q.; Vasilakos, A.V. Blockchain for energy market: A comprehensive survey. arXiv 2024, arXiv:2403.20045.
  22. Shang, Y.; Li, S. FedPT-V2G: Security enhanced federated transformer learning for real-time V2G dispatch with non-IID data. Appl. Energy 2024, 358, 122626.
  23. Zhong, Y.; Wen, J.; Zhang, J.; Kang, J.; Jiang, Y.; Zhang, Y.; Cheng, Y.; Tong, Y. Blockchain-assisted twin migration for vehicular metaverses: A game theory approach. Trans. Emerg. Telecommun. Technol. 2023, 34, e4856.
  24. Kang, J.; Zhong, Y.; Xu, M.; Nie, J.; Wen, J.; Du, H.; Ye, D.; Huang, X.; Niyato, D.; Xie, S. Tiny multi-agent DRL for twins migration in UAV metaverses: A multi-leader multi-follower Stackelberg game approach. IEEE Internet Things J. 2024, 11, 21021–21036.
  25. Zulfiqar, M.; Kamran, M.; Rasheed, M. A blockchain-enabled trust aware energy trading framework using games theory and multi-agent system in smart grid. Energy 2022, 255, 124450.
  26. Su, W.; Li, Z.; Yang, Z.; Lu, J. Deep reinforcement learning with sparse regularized pruning and compressing. In Proceedings of the 2021 China Automation Congress (CAC), Beijing, China, 22–24 October 2021; IEEE: New York, NY, USA, 2021; pp. 8041–8046.
  27. Li, Z.; Su, W.; Xu, M.; Yu, R.; Niyato, D.; Xie, S. Compact learning model for dynamic off-chain routing in blockchain-based IoT. IEEE J. Sel. Areas Commun. 2022, 40, 3615–3630.
  28. Livne, D.; Cohen, K. PoPS: Policy pruning and shrinking for deep reinforcement learning. IEEE J. Sel. Top. Signal Process. 2020, 14, 789–801.
  29. Gao, H.; Jiang, S.; Li, Z.; Wang, R.; Liu, Y.; Liu, J. A two-stage multi-agent deep reinforcement learning method for urban distribution network reconfiguration considering switch contribution. IEEE Trans. Power Syst. 2024, 1–12.
  30. He, Y.; Xiao, L. Structured pruning for deep convolutional neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 2900–2919.
  31. Camilleri, S.F.; Montgomery, A.; Visa, M.A.; Schnell, J.L.; Adelman, Z.E.; Janssen, M.; Grubert, E.A.; Anenberg, S.C.; Horton, D.E. Air quality, health and equity implications of electrifying heavy-duty vehicles. Nat. Sustain. 2023, 6, 1643–1653.
  32. Jin, C.; Tang, J.; Ghosh, P. Optimizing electric vehicle charging with energy storage in the electricity market. IEEE Trans. Smart Grid 2013, 4, 311–320.
  33. Amin, U.; Hossain, M.J.; Tushar, W.; Mahmud, K. Energy trading in local electricity market with renewables—A contract theoretic approach. IEEE Trans. Ind. Inform. 2020, 17, 3717–3730.
  34. Kang, J.; Wen, J.; Ye, D.; Lai, B.; Wu, T.; Xiong, Z.; Nie, J.; Niyato, D.; Zhang, Y.; Xie, S. Blockchain-empowered federated learning for healthcare metaverses: User-centric incentive mechanism with optimal data freshness. IEEE Trans. Cogn. Commun. Netw. 2023, 10, 348–362.
  35. Liu, Z.; Huang, B.; Li, Y.; Sun, Q.; Pedersen, T.B.; Gao, D.W. Pricing game and blockchain for electricity data trading in low-carbon smart energy systems. IEEE Trans. Ind. Inform. 2024, 20, 6446–6456.
  36. Zhang, J.; Nie, J.; Wen, J.; Kang, J.; Xu, M.; Luo, X.; Niyato, D. Learning-based incentive mechanism for task freshness-aware vehicular twin migration. In Proceedings of the 2023 IEEE 43rd International Conference on Distributed Computing Systems Workshops (ICDCSW), Hong Kong, China, 18–21 July 2023; IEEE: New York, NY, USA, 2023; pp. 103–108.
  37. Jiang, Y.; Kang, J.; Niyato, D.; Ge, X.; Xiong, Z.; Miao, C.; Shen, X. Reliable distributed computing for metaverse: A hierarchical game-theoretic approach. IEEE Trans. Veh. Technol. 2022, 72, 1084–1100.
  38. Kiran, P.; Vijaya Chandrakala, K.; Balamurugan, S.; Nambiar, T.; Rahmani-Andebili, M. A new agent-based machine learning strategic electricity market modelling approach towards efficient smart grid operation. In Applications of Artificial Intelligence in Planning and Operation of Smart Grids; Springer: Berlin/Heidelberg, Germany, 2022; pp. 1–29.
  39. Zhang, T.; Xu, C.; Shen, J.; Kuang, X.; Grieco, L.A. How to disturb network reconnaissance: A moving target defense approach based on deep reinforcement learning. IEEE Trans. Inf. Forensics Secur. 2023, 18, 5735–5748.
  40. Zhang, T.; Xu, C.; Lian, Y.; Tian, H.; Kang, J.; Kuang, X.; Niyato, D. When moving target defense meets attack prediction in digital twins: A convolutional and hierarchical reinforcement learning approach. IEEE J. Sel. Areas Commun. 2023, 41, 3293–3305.
  41. Wen, J.; Nie, J.; Zhong, Y.; Yi, C.; Li, X.; Jin, J.; Zhang, Y.; Niyato, D. Diffusion model-based incentive mechanism with prospect theory for edge AIGC services in 6G IoT. IEEE Internet Things J. 2024, 1.
  42. Huang, X.; Li, P.; Yu, R.; Wu, Y.; Xie, K.; Xie, S. FedParking: A federated learning based parking space estimation with parked vehicle assisted edge computing. IEEE Trans. Veh. Technol. 2021, 70, 9355–9368.
  43. Ning, Z.; Sun, S.; Wang, X.; Guo, L.; Guo, S.; Hu, X.; Hu, B.; Kwok, R.Y. Blockchain-enabled intelligent transportation systems: A distributed crowdsensing framework. IEEE Trans. Mob. Comput. 2021, 21, 4201–4217.
  44. Zhang, T.; Xu, C.; Zhang, B.; Li, X.; Kuang, X.; Grieco, L.A. Towards attack-resistant service function chain migration: A model-based adaptive proximal policy optimization approach. IEEE Trans. Dependable Secur. Comput. 2023, 20, 4913–4927.
  45. Liang, H.; Zhang, W. Stochastic Stackelberg game based edge service selection for massive IoT networks. IEEE Internet Things J. 2023, 10, 22080–22095.
  46. Dewa, C.K.; Miura, J. A framework for DRL navigation with state transition checking and velocity increment scheduling. IEEE Access 2020, 8, 191826–191838.
  47. Wen, J.; Zhang, Y.; Chen, Y.; Zhong, W.; Huang, X.; Liu, L.; Niyato, D. Learning-based big data sharing incentive in mobile AIGC networks. arXiv 2024, arXiv:2407.10980.
  48. Zhang, R.; Xiong, K.; Lu, Y.; Fan, P.; Ng, D.W.K.; Letaief, K.B. Energy efficiency maximization in RIS-assisted SWIPT networks with RSMA: A PPO-based approach. IEEE J. Sel. Areas Commun. 2023, 41, 1413–1430.
  49. Wen, J.; Kang, J.; Niyato, D.; Zhang, Y.; Mao, S. Sustainable diffusion-based incentive mechanism for generative AI-driven digital twins in industrial cyber-physical systems. arXiv 2024, arXiv:2408.01173.
  50. Su, W.; Li, Z.; Xu, M.; Kang, J.; Niyato, D.; Xie, S. Compressing deep reinforcement learning networks with a dynamic structured pruning method for autonomous driving. arXiv 2024, arXiv:2402.05146.
  51. Sameera, K.; Nicolazzo, S.; Arazzi, M.; Nocera, A.; KA, R.R.; Vinod, P.; Conti, M. Privacy-preserving in blockchain-based federated learning systems. Comput. Commun. 2024, 222, 38–67.
Figure 1. A blockchain-assisted secure electricity trading framework between EV charging operators and the electricity market.
Figure 2. Performance comparison between the proposed Tiny PPO algorithm and the PPO algorithm.
Figure 3. Utilities and strategies of the electricity market and EV charging operators under different costs, with M = 5 EV charging operators and a unit profit of α = 50.
Figure 4. Utilities and strategies of the electricity market and EV charging operators under different numbers of EV charging operators, with a cost of C = 5 and a unit profit of α = 50.
Figure 5. Utilities and strategies of the electricity market and EV charging operators under different unit profits α, with M = 5 EV charging operators and a cost of C = 5.
Figure 6. Security probability under different numbers of miners.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.