Coalitional Demand Response Management in Community Energy Management Systems

Kemp, Nicholas; Siraj, Md Sadman; Tsiropoulou, Eirini Eleni

doi:10.3390/en16176363

Open Access Article


Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131-0001, USA

Eirini Eleni Tsiropoulou is the author to whom correspondence should be addressed.

Energies 2023, 16(17), 6363; https://doi.org/10.3390/en16176363

Submission received: 24 July 2023 / Revised: 22 August 2023 / Accepted: 28 August 2023 / Published: 1 September 2023

(This article belongs to the Section K: State-of-the-Art Energy Related Technologies)


Abstract

With the advent of Distributed Energy Resources (DERs) within smart grid systems, traditional demand response management (DRM) models need to be redesigned to capture prosumers’ energy consumption requests and dynamic behavior within the energy market. In this paper, a coalitional DRM model is introduced based on the principles of game theory and reinforcement learning to dynamically determine how prosumers form local energy trading communities, as well as their optimal energy consumption. A hedonic game-theoretic model is introduced to enable prosumers to autonomously and dynamically select an energy trading community, considering the partially available information about prosumers’ energy generation and consumption characteristics and the rewards that utility companies provide per community. Then, a log-linear reinforcement learning model is proposed to enable each prosumer to determine its optimal energy consumption in a distributed manner. A detailed evaluation of the proposed coalitional DRM model is performed based on real data collected from the southwest area of the USA.

Keywords:

smart grid systems; demand response management; hedonic games; reinforcement learning; distributed energy resources

1. Introduction

In recent years, technological advancements have provided consumers with a wide variety of appliances, improving their living standards. However, the ever-increasing number of appliances used in homes in consumers’ everyday lives has resulted in unprecedented energy demand [1]. The traditional retail energy market struggles to support consumers’ energy demand in real time, and a large portion of total energy consumption stems from household appliances, e.g., 42% in the U.S. [2]. Towards addressing this supply–demand challenge, Distributed Energy Resources (DERs), such as electric vehicles (EVs), residential rooftop solar photovoltaic (PV) panels, and energy storage systems (ESSs), have transformed traditional consumers into prosumers who are capable of generating and consuming energy and of exchanging or selling energy with other prosumers in a peer-to-peer manner or with the main grid [3]. Demand response management (DRM) models [4,5] address the joint problem of satisfying the energy demand of residential users with widely varying energy consumption patterns while avoiding high energy demand peaks.

1.1. Background and Motivation

Several sophisticated DRM models have been introduced in the recent literature, such as time-of-use (TOU) pricing, price-based home management systems, critical peak pricing (CPP), and real-time pricing (RTP). These DRM models have been applied in energy markets consisting of both consumers and prosumers, considering their energy consumption and generation characteristics [6]. Focusing on efficiently exploiting prosumers’ renewable energy production and DERs’ flexibility, a new energy market paradigm has recently been introduced based on the principles of peer-to-peer (P2P) transactive energy trading. In this paradigm, prosumers are organized in communities forming P2P energy trading systems, which are coordinated following either system-centric or prosumer-centric approaches [7]. However, existing prosumer community creation approaches mainly group prosumers in communities based on their physical proximity, without exploiting their energy generation and consumption characteristics to reveal the full potential of the communities in balancing customers’ energy needs in a smart grid system [8].

In this paper, towards addressing the above challenges, we introduce a novel coalitional demand response management model to reveal the full potential of prosumer communities in addressing their energy demands. The proposed coalitional DRM model consists of two main components: (i) the hedonic game-theoretic community formation framework and (ii) the reinforcement-learning-based DRM mechanism. Specifically, prosumers’ energy generation and consumption characteristics are exploited, along with the rewards provided by the smart grid operator/utility company and the information availability in the overall smart grid system, in order to form communities among prosumers following the principles of hedonic games [9]. Then, a reinforcement-learning-based mechanism is designed to determine each residential prosumer’s optimal consumption in order to optimize its perceived satisfaction while accounting for its energy demand characteristics. A detailed evaluation of the proposed coalitional DRM model is performed based on real data from the southwest area of the USA.

1.2. Related Work

Several artificial-intelligence-inspired DRM models have been introduced in the recent literature, focusing on residential consumer use case scenarios [10,11]. A time-of-use DRM model is introduced in [12], enabled by a model-free deep reinforcement learning mechanism with a dueling deep Q-network structure, to optimize the management of the interruptible load considering a variety of consumers’ energy consumption patterns. A similar reinforcement-learning-enabled time-of-use DRM model is proposed in [13], focusing on the utility companies’ perspective in terms of predicting the optimal energy prices and discounts offered to consumers. Focusing on home energy management systems as part of DRM models, a double deep Q-learning mechanism is proposed in [14] to schedule home energy appliances considering consumers’ energy needs and utility companies’ announced prices. A similar mechanism is designed in [15] following a deep Q-network approach and focusing on time-shiftable appliances, such as electric vehicles, lighting systems, and air conditioners. A residential DRM model based on reinforcement learning and fuzzy reasoning is discussed in [16], which considers consumers’ preferences in order to schedule the operation of smart home appliances, also as a function of energy price. An hour-ahead DRM model is analyzed in [17], based on an artificial neural network approach to predict energy price and a multiagent reinforcement learning mechanism to schedule home appliances. A federated learning DRM model is introduced in [18] by aggregating local models from multiple utility companies to train a global smart grid model that performs energy price prediction.

Several recent approaches have also focused on privacy issues related to the design of DRM models targeted at consumers or prosumers [19]. A modified vector homomorphic encryption scheme is analyzed in [20] in order to perform secure load profiling of consumers based on encrypted meter data. Aiming at protecting consumers’ privacy, a DRM model is designed in [21] to learn an intelligent multi-microgrid system’s energy price response by implementing a deep neural network without direct access to consumers’ private energy consumption information.

Focusing on DRM models specifically designed to address residential prosumers’ energy needs, great attention has been given to the analysis of their energy generation and consumption characteristics [22]. A deep reinforcement learning model is designed in [23] to optimize the energy consumption of a household equipped with several DERs, reducing prosumers’ energy cost while accounting for their comfort-level characteristics. The retail and wholesale energy markets are analyzed in [24] by introducing a reinforcement-learning-based, price-based DRM mechanism that enables the energy management system to determine its optimal retail market energy price and prosumers’ energy consumption to jointly maximize profit and prosumer utility. A reinforcement-learning-based DRM model is introduced in [25] to schedule the energy storage systems of smart homes in order to minimize energy cost given announced energy prices.

The P2P transactive energy trading concept has recently attracted the interest of the research community, enabling prosumers to be organized in groups and to exchange energy directly with each other through the grid [26]. A P2P transaction model is designed in [27], including a participant model, an equipment model, and a price model. A community energy management system is introduced in [28] based on a multiagent reinforcement learning approach to handle uncertainty in renewable energy and minimize energy cost. A deep reinforcement learning approach is proposed in [29] to minimize the energy costs experienced by prosumers in P2P energy trading, considering the dynamic nature of their energy availability. The users’ preferences and their level of engagement in a P2P energy trading market are studied in [30] based on a reinforcement learning approach that optimizes the system’s performance in terms of matching the energy resources.

1.3. Contributions and Outline

Though a tremendous amount of research has been performed in the fields of DRM and community energy management systems, the problem of forming prosumers’ communities based on their energy generation and consumption characteristics, while also accounting for the rewards provided by the smart grid operator, remains largely unexplored. Even more challenging is the problem of optimally forming prosumers’ communities while jointly determining prosumers’ energy consumption according to their preferences and characteristics.

Towards addressing the above challenges, in this paper, a coalitional demand response management model is introduced to support the operation of community energy management systems and optimize prosumers’ payoff via optimally forming communities among each other and determining their optimal energy consumption. To the best of our knowledge, this is the first research work in the existing literature combining the theory of hedonic games with the principles of reinforcement learning in order to create prosumer coalitions and determine optimal energy consumption, respectively. Specifically, the main contributions of this manuscript are summarized as follows:

A community energy management system is introduced, consisting of prosumers who autonomously organize themselves in communities. The community formation process accounts for prosumers’ energy generation and consumption characteristics, the available information about the prosumers’ characteristics in the overall system, and the reward allocated by the smart grid operator/utility company to prosumers for participating in the DRM process. The community formation process is formulated as a hedonic community formation game, and the existence of a Nash-stable and individual-stable partition is proven.
A reinforcement-learning-based framework is introduced to determine prosumers’ energy consumption towards fulfilling their energy needs and maximizing their payoff from purchasing their remaining energy demand from the smart grid. Two log-linear reinforcement learning algorithms, i.e., B-logit and Max-logit, are adopted in order to evaluate their accuracy, performance, and complexity in determining prosumers’ optimal energy consumption.
A detailed evaluation of the proposed coalitional DRM model is performed based on real data collected from the southwest area of the USA. The performance evaluation demonstrates the operational characteristics of the proposed DRM model along with its superiority compared with the state of the art in terms of optimizing prosumer payoff.

The remainder of this paper is organized as follows. Section 2 and Section 3 present the prosumers’ and their communities’ models, respectively. Section 4 introduces the hedonic communities formation game-theoretic framework. Section 5 analyzes the reinforcement-learning-based DRM model to determine prosumers’ optimal energy consumption. The performance evaluation is demonstrated in Section 6, and Section 7 concludes this paper.

2. Prosumer Characteristics

In this section, prosumers’ energy generation and consumption characteristics are analyzed. We consider a set of residential prosumers $\mathcal{N} = \{1, \dots, n, \dots, N\}$ residing in a geographical area and able to exchange energy with each other through the infrastructure of the smart grid network. Each prosumer is equipped with several DERs that enable them to generate energy, e.g., residential rooftop solar photovoltaic panels, and to store energy, e.g., electric vehicles and energy storage systems. Based on the available DERs, each prosumer can generate $g_{n}^{t}$ (kWh) of energy per timeslot $t$, e.g., one hour. We study the overall system over a set of timeslots $\mathcal{T} = \{1, \dots, t, \dots, T\}$, e.g., $T = 24$ hourly timeslots covering a one-day horizon.

Each prosumer has a set of appliances $\mathcal{A}_{n}$ that consume energy $a_{n}^{t}$ (kWh) when they operate continuously. Some appliances have shiftable operation, e.g., charging electric vehicles and running dishwashers, while others have nonshiftable operation, e.g., refrigerators. Based on the energy demand characteristics of their shiftable and nonshiftable appliances, prosumers can adapt their energy consumption following an intelligent DRM model. The total energy demand of a prosumer is the sum of the energy consumption of its appliances that operate in timeslot $t$:

$$d_{n}^{t} = \sum_{a_{n} \in \mathcal{A}_{n}} \delta_{a_{n}}^{t} \, a_{n}^{t} \tag{1}$$

where $d_{n}^{t}$ (kWh) denotes the prosumer’s energy demand in timeslot $t$, $a_{n} \in \mathcal{A}_{n}$ denotes one of the prosumer’s appliances, and $\delta_{a_{n}}^{t}$ is a binary indicator with $\delta_{a_{n}}^{t} = 1$ if appliance $a_{n}$ operates during timeslot $t$ and $\delta_{a_{n}}^{t} = 0$ otherwise. The prosumers’ energy demand vector in timeslot $t$ in the examined area is denoted as $D^{t} = [d_{1}^{t}, \dots, d_{n}^{t}, \dots, d_{N}^{t}]$, and their corresponding energy generation vector as $G^{t} = [g_{1}^{t}, \dots, g_{n}^{t}, \dots, g_{N}^{t}]$.
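As a minimal illustration of Equation (1), the sketch below computes $d_n^t$ from per-appliance energy values $a_n^t$ and on/off indicators $\delta_{a_n}^t$, and stacks the results into $D^t$. The dictionary-based representation, the function names, and the appliance values are hypothetical choices for illustration, not part of the original model.

```python
from typing import Dict, List


def prosumer_demand(appliance_energy: Dict[str, float],
                    operating: Dict[str, bool]) -> float:
    """Equation (1): d_n^t = sum over appliances of delta_{a_n}^t * a_n^t.

    appliance_energy maps each appliance a_n to its consumption a_n^t (kWh);
    operating maps an appliance to True when delta_{a_n}^t = 1.
    Appliances missing from `operating` are treated as off (delta = 0).
    """
    return sum(energy * (1 if operating.get(name, False) else 0)
               for name, energy in appliance_energy.items())


def demand_vector(appliances: List[Dict[str, float]],
                  operating: List[Dict[str, bool]]) -> List[float]:
    """Builds D^t = [d_1^t, ..., d_N^t], one entry per prosumer."""
    return [prosumer_demand(a, o) for a, o in zip(appliances, operating)]


if __name__ == "__main__":
    # Hypothetical appliance energy values (kWh per timeslot).
    home = {"refrigerator": 0.25, "ev_charger": 7.0, "dishwasher": 1.5}
    print(prosumer_demand(home, {"refrigerator": True, "ev_charger": True}))  # 7.25
```

Representing $\delta_{a_n}^t$ as a per-timeslot dictionary keeps the shiftable/nonshiftable distinction outside this function: a scheduler would simply toggle the indicators of shiftable appliances.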

A prosumer is characterized as self-sufficient in timeslot $t$ if $g_{n}^{t} \geq d_{n}^{t}$. In this case, the prosumer can satisfy their own energy needs without purchasing energy from other prosumers or the utility company. The potential energy surplus can be stored by the prosumer’s energy management system, i.e., $b_{n}^{t + 1} = b_{n}^{t} + (g_{n}^{t} - l_{n}^{t})$, where $b_{n}^{t}$ (kWh) denotes the prosumer’s battery availability in timeslot $t$, and $l_{n}^{t}$ (kWh) captures the prosumer’s actual energy consumption in timeslot $t$, where in general $l_{n}^{t} \leq d_{n}^{t}$ given appliances with shiftable operation. The prosumer can use the energy surplus to cover their energy needs in a future timeslot or sell it to other prosumers or to the utility company; the latter option is out of the scope of this research work.

A prosumer is characterized as non-self-sufficient if $g_{n}^{t} + b_{n}^{t - 1} < d_{n}^{t}$. In this case, the prosumer cannot cover their energy needs in timeslot $t$ and buys energy from the utility company. The utility company’s energy price is controlled by state/country-level regulations [31] and is denoted as $c$ (USD/kWh). The prosumer decides to buy an amount $l_{n}^{t} \leq d_{n}^{t} - g_{n}^{t} - b_{n}^{t - 1}$ from the utility company in timeslot $t$ in order to cover their energy needs, considering the energy price and the flexibility of the energy consumption stemming from their appliances with shiftable operation. In the rest of our analysis, we focus on the non-self-sufficient prosumers, who need to join a community in order to cover their energy needs by purchasing energy.
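The self-sufficiency conditions, the purchase bound, and the battery update above can be sketched as follows. This is a minimal sketch assuming scalar per-timeslot values in kWh; the class and function names are hypothetical, and clipping the purchase bound at zero is an added assumption. Both conditions are coded exactly as stated in the text (the self-sufficiency test omits the battery term, while the non-self-sufficiency test includes it), so they are not complements of each other.

```python
from dataclasses import dataclass


@dataclass
class ProsumerSlot:
    """Hypothetical per-timeslot state of prosumer n (all values in kWh)."""
    generation: float    # g_n^t
    demand: float        # d_n^t
    battery_prev: float  # b_n^{t-1}


def is_self_sufficient(s: ProsumerSlot) -> bool:
    # g_n^t >= d_n^t
    return s.generation >= s.demand


def is_non_self_sufficient(s: ProsumerSlot) -> bool:
    # g_n^t + b_n^{t-1} < d_n^t
    return s.generation + s.battery_prev < s.demand


def max_purchase(s: ProsumerSlot) -> float:
    # Upper bound on l_n^t: d_n^t - g_n^t - b_n^{t-1} (clipping at zero is an assumption).
    return max(0.0, s.demand - s.generation - s.battery_prev)


def next_battery(battery: float, generation: float, consumption: float) -> float:
    # b_n^{t+1} = b_n^t + (g_n^t - l_n^t)
    return battery + (generation - consumption)


if __name__ == "__main__":
    s = ProsumerSlot(generation=2.0, demand=5.0, battery_prev=1.0)
    print(is_non_self_sufficient(s), max_purchase(s))  # True 2.0
```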

3. Communities Model

The goal of the smart grid operator is to efficiently handle the prosumers’ energy demand in order to avoid a brownout or even a blackout, while at the same time maximizing its profit. A brownout occurs when the voltage supplied by a utility company drops below the normal level for a brief period, often because of high electricity demand or grid instability. Unlike a blackout, where power is completely lost, a brownout involves a partial reduction in voltage levels, which can impact the performance of electrical devices. Thus, if the prosumers’ energy demand from a utility company is not handled by intelligent demand response management mechanisms, the smart grid can experience a brownout or even a blackout. Similarly, the goal of the prosumers is to organize themselves in communities in order to obtain the maximum possible benefit from the energy purchase process, e.g., discounts from the utility company. We consider that $N$ prosumers can be organized into $M$ communities, where the set of communities is denoted as $\mathcal{M} = \{1, \dots, m, \dots, M\}$. The prosumers are organized in communities by considering their energy generation and consumption characteristics, the rewards provided by the utility company in the form of energy price discounts, and the information available to make their decision, as analyzed below.

A community $m$ is characterized by its value, which is defined as follows:

$$V(\mathcal{N}_{m}^{t}) = \sum_{n \in \mathcal{N}_{m}^{t}} l_{n}^{t} \tag{2}$$

where $\mathcal{N}_{m}^{t} = \{1, \dots, n, \dots, N_{m}^{t}\}$ is the set of prosumers that joined community $m$ in timeslot $t$. Physically, the community’s value $V(\mathcal{N}_{m}^{t})$ captures the energy consumption volume requested by the prosumers of this community in timeslot $t$.

The utility company allocates a reward $r_{m}$ (USD) in the form of energy price discounts to each community $m$ to incentivize the prosumers to purchase energy directly from the utility company. The utility company may decide the allocation of the rewards $r_{m}$ per community based on multiple factors, such as the energy demand in a geographical area, the distribution cost of energy in specific geographical areas, the service priority of specific areas (e.g., hospitals, schools, and critical infrastructure in general), and others. The reward is allocated among the prosumers of the community following the principles of proportional fairness, i.e., prosumer $n$ receives:

$$\frac{l_{n}^{t}}{V(\mathcal{N}_{m}^{t})} \cdot r_{m}^{t} = \frac{l_{n}^{t}}{\sum_{n' \in \mathcal{N}_{m}^{t}} l_{n'}^{t}} \cdot r_{m}^{t} \tag{3}$$
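Equations (2) and (3) can be illustrated with a short sketch that computes a community’s value and splits its reward proportionally to each member’s consumption. The mapping from prosumer identifiers to $l_n^t$, the function names, and the zero-consumption fallback are assumptions for illustration (Equation (3) is undefined when $V(\mathcal{N}_m^t) = 0$).

```python
from typing import Dict


def community_value(consumption: Dict[str, float]) -> float:
    """Equation (2): V(N_m^t) = sum of l_n^t over the community's members."""
    return sum(consumption.values())


def reward_shares(consumption: Dict[str, float], reward: float) -> Dict[str, float]:
    """Equation (3): proportional-fairness split of the community reward r_m^t."""
    total = community_value(consumption)
    if total == 0:
        # Assumption: with no consumption, nobody receives a share.
        return {n: 0.0 for n in consumption}
    return {n: l / total * reward for n, l in consumption.items()}


if __name__ == "__main__":
    shares = reward_shares({"p1": 2.0, "p2": 6.0}, reward=10.0)
    print(shares)  # {'p1': 2.5, 'p2': 7.5}
```

Because the shares are proportional, they add up to $r_m^t$ whenever the community consumes a positive amount of energy.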

However, the prosumers that have selected a community $m$ in timeslot $t$ are not aware of the reward that other prosumers received by joining another community $m'$ based on their energy consumption characteristics $l_{n'}^{t}, \forall n' \in \mathcal{N}_{m'}^{t}$. Thus, we define the energy consumption uncertainty among the prosumers’ communities as $u_{m', m}^{t} \in \mathbb{R}^{+}$. The lower the value of $u_{m', m}^{t}$, the noisier the information; in this case, the prosumers have very vague information about the energy consumption characteristics and experienced rewards in other communities. Based on the consumption uncertainty at each community $m' \in \mathcal{M} \setminus \{m\}$, the prosumers belonging to each community $m'$ are informed in a noisy manner about the energy consumption characteristics and experienced rewards in community $m$. Thus, we define the noisy energy consumption as follows:

$$y_{\mathcal{M}' \to m}^{t} = \sum_{m' \in \mathcal{M} \setminus \{m\}} u_{m', m}^{t} \frac{\sum_{n \in \mathcal{N}_{m}^{t}} l_{n}^{t}}{\sum_{k = 1}^{M} \sum_{n \in \mathcal{N}_{k}^{t}} l_{n}^{t}} \tag{4}$$

where $\mathcal{M}' = \mathcal{M} \setminus \{m\}$. The noisy energy consumption parameter captures the level of information incompleteness experienced by the prosumers belonging to a community $m' \in \mathcal{M}'$ regarding the energy consumption characteristics and rewards of the prosumers belonging to community $m$.
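To make Equation (4) concrete, the sketch below evaluates $y_{\mathcal{M}' \to m}^t$ from per-community consumption lists and a dictionary of uncertainty values $u_{m', m}^t$. The data layout and the function name are hypothetical. Since the fraction in Equation (4) does not depend on $m'$, the result equals the share of community $m$ in the total consumption multiplied by $\sum_{m'} u_{m', m}^t$.

```python
from typing import Dict, List, Tuple


def noisy_consumption(m: int,
                      consumption: Dict[int, List[float]],
                      uncertainty: Dict[Tuple[int, int], float]) -> float:
    """Equation (4): y_{M' -> m}^t.

    consumption[k] lists l_n^t for the members of community k;
    uncertainty[(m_prime, m)] holds u_{m', m}^t.
    """
    totals = {k: sum(ls) for k, ls in consumption.items()}
    grand_total = sum(totals.values())
    share_m = totals[m] / grand_total  # consumption share of community m
    return sum(uncertainty[(m_prime, m)] * share_m
               for m_prime in consumption if m_prime != m)


if __name__ == "__main__":
    cons = {1: [2.0, 2.0], 2: [3.0], 3: [1.0]}
    u = {(2, 1): 0.5, (3, 1): 1.5}
    print(noisy_consumption(1, cons, u))  # 1.0
```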

Moreover, the most recent information regarding the energy consumption characteristics and rewards experienced by the prosumers in a community $m$ carries more weight in other prosumers’ decisions to join this community. Thus, we define the freshness fading function, which weighs the most recent information more heavily, as follows:

$$\theta_{\tau} = z^{t - \tau} \tag{5}$$

where $z \in (0, 1]$, $t$ denotes the current timeslot, and $\tau \in \{1, \dots, t\}$ denotes an individual past timeslot up to the current timeslot $t$.

Based on the freshness fading function, we define the fading-aware consumption uncertainty as follows:

$$\tilde{u}_{m', m}^{t} = \frac{\sum_{m' \in \mathcal{M}'} \sum_{i = 1}^{t} \theta_{i} u_{m, m'}^{i}}{\sum_{i = 1}^{t} \theta_{i}} \tag{6}$$

which captures the level of uncertainty regarding prosumers’ energy consumption and rewards in a community $m'$ experienced by the prosumers belonging to community $m$.

Thus, we introduce the fading-aware noisy energy consumption as follows:

$$\tilde{y}_{\mathcal{M}' \to m}^{t} = \frac{\sum_{i = 1}^{t} \theta_{i} y_{\mathcal{M}' \to m}^{i}}{\sum_{i = 1}^{t} \theta_{i}} \tag{7}$$

which captures the freshness of information within prosumers’ noisy energy consumption parameter.
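A minimal sketch of the freshness fading function (Equation (5)) and the fading-aware averaging of Equation (7) follows. It assumes the history is stored as a list whose entry `history[i - 1]` is the value at timeslot $i$; the function names are hypothetical. Because every term in Equation (6) shares the same weights $\theta_i$, Equation (6) can be evaluated by applying the same averaging to the per-timeslot sums $\sum_{m' \in \mathcal{M}'} u_{m, m'}^{i}$.

```python
from typing import List


def freshness_weights(t: int, z: float) -> List[float]:
    """Equation (5): theta_tau = z^(t - tau) for tau = 1, ..., t."""
    if not 0 < z <= 1:
        raise ValueError("z must lie in (0, 1]")
    return [z ** (t - tau) for tau in range(1, t + 1)]


def fading_average(history: List[float], z: float) -> float:
    """Fading-aware average as in Equation (7):
    sum_i theta_i * x^i / sum_i theta_i, with history[i - 1] = x^i."""
    t = len(history)
    weights = freshness_weights(t, z)
    return sum(w * x for w, x in zip(weights, history)) / sum(weights)


if __name__ == "__main__":
    y_history = [1.0, 0.0, 2.0]            # y^1, y^2, y^3 (hypothetical values)
    print(freshness_weights(3, 0.5))       # [0.25, 0.5, 1.0]
    print(fading_average(y_history, 1.0))  # 1.0 (z = 1 gives the plain mean)
```

With $z = 1$ all past timeslots count equally, while smaller values of $z$ discount older observations geometrically.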

By combining the fading-aware consumption uncertainty (Equation (6)) and the fading-aware noisy energy consumption (Equation (7)), we introduce the captivation parameter of community $m$ as follows:

$$c_{m}^{t} = w_{1} \tilde{y}_{\mathcal{M}' \to m}^{t} + w_{2} \tilde{u}_{m', m}^{t} \tag{8}$$

where $w_{1}, w_{2} \in [0, 1]$, with $w_{1} + w_{2} = 1$. The captivation parameter captures how attractive a community is to prosumers considering joining it, given that they expect to receive high rewards and to have the opportunity to exchange energy with other prosumers who may have large energy surpluses. The captivation parameter (Equation (8)) depends on the fading-aware noisy energy consumption $\tilde{y}_{\mathcal{M}' \to m}^{t}$ and the fading-aware consumption uncertainty $\tilde{u}_{m', m}^{t}$, both of which depend on the energy consumption uncertainty $u_{m', m}^{t} \in \mathbb{R}^{+}$. Thus, its value is determined by the prosumers’ energy consumption characteristics and the information availability within the examined smart grid system.

By combining the prosumer’s experienced reward and the captivation parameter of community $m$, we define the prosumer’s pure payoff from belonging to the community as follows:

$$P_{n}^{\mathcal{N}_{m}^{t}} = \frac{l_{n}^{t}}{\sum_{n' \in \mathcal{N}_{m}^{t}} l_{n'}^{t}} \cdot r_{m}^{t} + \alpha c_{m}^{t} \tag{9}$$

where $\alpha = 1$ (USD) captures the equivalent revenue benefit of the prosumer from joining a community.

Also, the prosumer incurs a cost from purchasing energy $l_{n}^{t}$ within the community that it belongs to, defined as follows:

$$c_{n}^{\mathcal{N}_{m}^{t}} = c \cdot l_{n}^{t} \tag{10}$$

Thus, the prosumer’s payoff from joining community $m$ is derived as follows:

$$U_{n}^{\mathcal{N}_{m}^{t}} = P_{n}^{\mathcal{N}_{m}^{t}} - c_{n}^{\mathcal{N}_{m}^{t}} \tag{11}$$
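Equations (8) through (11) combine into a single payoff evaluation. The sketch below assumes that the fading-aware quantities $\tilde{y}$ and $\tilde{u}$ are already available as numbers, uses $\alpha = 1$ as in the text, and passes the community’s consumption as a list that includes the prosumer’s own $l_n^t$; all function names and numeric values are hypothetical.

```python
from typing import List


def captivation(y_tilde: float, u_tilde: float, w1: float) -> float:
    """Equation (8): c_m^t = w1 * y~ + w2 * u~, with w2 = 1 - w1."""
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1]")
    return w1 * y_tilde + (1.0 - w1) * u_tilde


def community_payoff(l_n: float, community_consumption: List[float],
                     reward: float, captivation_value: float,
                     price: float, alpha: float = 1.0) -> float:
    """Equation (11): U_n = P_n - c_n for prosumer n in community m.

    community_consumption lists l^t of every member, including l_n itself.
    """
    share = l_n / sum(community_consumption) * reward  # reward term of Eq. (9)
    pure_payoff = share + alpha * captivation_value     # Eq. (9)
    cost = price * l_n                                  # Eq. (10)
    return pure_payoff - cost


if __name__ == "__main__":
    c_m = captivation(y_tilde=0.75, u_tilde=0.25, w1=0.5)  # 0.5
    print(community_payoff(2.0, [2.0, 6.0], reward=10.0,
                           captivation_value=c_m, price=0.25))  # 2.5
```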

4. Hedonic Communities Formation

In this section, the theory of hedonic games is exploited in order to autonomously organize prosumers into communities, thus forming a community energy management system. The prosumers’ energy generation and consumption characteristics, along with the rewards provided by the utility company and the information availability in the overall smart grid system, are jointly considered in the community formation process, which is autonomously performed by the prosumers.

Definition 1.

(Community): A community of prosumers is denoted as $\mathcal{N}_{m}^{t} \subseteq \mathcal{N}$, where $m$ is the community’s index.

Definition 2.

(Prosumers’ Partition): Considering the total number of communities $M$, with $\mathcal{N}_{m}^{t} \cap \mathcal{N}_{m'}^{t} = \emptyset, \forall m \neq m', m, m' \in \mathcal{M}$, and $\bigcup_{m = 1}^{M} \mathcal{N}_{m}^{t} = \mathcal{N}$, the partition of the prosumers in $\mathcal{N}$ into the $M$ communities is defined as $\Pi = \{\mathcal{N}_{1}^{t}, \dots, \mathcal{N}_{m}^{t}, \dots, \mathcal{N}_{M}^{t}\}$.

The available prosumers’ partitions can be categorized into the following special cases.

Definition 3.

(Grand Community): If all the prosumers are organized in only one community, then this community is called the grand community.

Definition 4.

(Singleton Community): If a prosumer forms its own community without any other prosumers belonging to it, then this community is called a singleton community.

Definition 5.

(Empty Community): If no prosumers belong to a community, then this community is called an empty community.
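Definitions 2 through 5 can be checked mechanically. The following sketch, under the assumption that communities are stored as a dictionary from community index $m$ to a set of prosumer identifiers, verifies the two partition conditions of Definition 2 and labels each community as grand, singleton, or empty; the `"regular"` label for any other community and the function names are illustrative only.

```python
from typing import Dict, Hashable, Iterable, Set


def is_partition(communities: Dict[int, Set[Hashable]],
                 prosumers: Iterable[Hashable]) -> bool:
    """Definition 2: communities are pairwise disjoint and their union is N.
    Empty communities are allowed (Definition 5)."""
    seen: Set[Hashable] = set()
    for members in communities.values():
        if seen & members:          # overlap violates disjointness
            return False
        seen |= members
    return seen == set(prosumers)   # union must cover every prosumer


def community_kind(members: Set[Hashable], prosumers: Iterable[Hashable]) -> str:
    """Labels a community per Definitions 3-5 ('regular' otherwise)."""
    if not members:
        return "empty"
    if members == set(prosumers):
        return "grand"              # checked first, so N = {n} yields "grand"
    if len(members) == 1:
        return "singleton"
    return "regular"


if __name__ == "__main__":
    N = {"a", "b", "c"}
    part = {1: {"a", "b"}, 2: {"c"}, 3: set()}
    print(is_partition(part, N))                               # True
    print([community_kind(part[m], N) for m in sorted(part)])  # ['regular', 'singleton', 'empty']
```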

Each prosumer $n \in \mathcal{N}$ has its own preferences over all the communities that it can possibly join, as determined by its experienced payoff (Equation (11)). Thus, the prosumers compare the potential payoffs that they can enjoy by joining different communities and rank these communities according to their preferences.

Definition 6.

(Preference Order): A preference relation $\geq_{n}$ is defined for each prosumer $n \in \mathcal{N}$ as a reflexive and complete binary relation over all the potential communities that the prosumer can join, as follows:

$$\mathcal{N}_{m}^{t} \geq_{n} \mathcal{N}_{m'}^{t} \Leftrightarrow U_{n}^{\mathcal{N}_{m}^{t}} \geq U_{n}^{\mathcal{N}_{m'}^{t}} \tag{12}$$

where $U_{n}^{\mathcal{N}_{m}^{t}}$ is given by Equation (11). If a prosumer strictly prefers to join community $\mathcal{N}_{m}^{t}$ over $\mathcal{N}_{m'}^{t}$, then the following expression holds:

$$\mathcal{N}_{m}^{t} >_{n} \mathcal{N}_{m'}^{t} \Leftrightarrow U_{n}^{\mathcal{N}_{m}^{t}} > U_{n}^{\mathcal{N}_{m'}^{t}} \tag{13}$$

The prosumers participate in a non-cooperative game in order to determine their optimal community choice, as quantified by the corresponding payoff that they experience, as captured by Equation (11).

Definition 7.

(Hedonic Game): A hedonic game is defined by the pair $(\mathcal{N}, >)$, where $\mathcal{N}$ denotes the set of prosumers, and $> = [>_{1}, \dots, >_{n}, \dots, >_{N}]$ is the preference order vector of the prosumers. The prosumer’s payoff $U_{n}^{\mathcal{N}_{m}^{t}}$ depends only on the members of the community $\mathcal{N}_{m}^{t}$ that prosumer $n$ belongs to, and the community partition $\Pi = \{\mathcal{N}_{1}^{t}, \dots, \mathcal{N}_{m}^{t}, \dots, \mathcal{N}_{M}^{t}\}$ results from the prosumers’ preferences over all the potential communities in $\mathcal{M}$.

During the execution of the hedonic game, the prosumers dynamically switch communities until they converge to a stable partition.

Definition 8.

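(Switching Operation): Given a prosumers’ partition $\Pi = \{\mathcal{N}_{1}^{t}, \dots, \mathcal{N}_{m}^{t}, \dots, \mathcal{N}_{M}^{t}\}$, a prosumer $n \in \mathcal{N}_{m}^{t}$ switches from community $\mathcal{N}_{m}^{t}$ to $\mathcal{N}_{m'}^{t}$ if and only if $\mathcal{N}_{m'}^{t} \cup \{n\} >_{n} \mathcal{N}_{m}^{t}$; thus, $\mathcal{N}_{m}^{t} \to \mathcal{N}_{m}^{t} \setminus \{n\}$ and $\mathcal{N}_{m'}^{t} \to \mathcal{N}_{m'}^{t} \cup \{n\}$.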
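The switching operation can be sketched as a simple loop in which each prosumer, in turn, moves to the community that gives it the strictly highest payoff, until no prosumer switches. This is only an illustration under stated assumptions: moving to the best improving community is one particular choice among the switches that Definition 8 allows, the `payoff(n, m, members)` callback and the toy reward values are hypothetical, and the round limit is a safeguard because the loop by itself does not guarantee convergence. It is not presented as the authors’ algorithm.

```python
from typing import Callable, Dict, Set

Partition = Dict[int, Set[str]]
# payoff(n, m, members) returns U_n when prosumer n is in community m with `members`.
Payoff = Callable[[str, int, Set[str]], float]


def community_of(partition: Partition, n: str) -> int:
    for m, members in partition.items():
        if n in members:
            return m
    raise KeyError(n)


def switching_dynamics(partition: Partition, payoff: Payoff,
                       max_rounds: int = 100) -> Partition:
    part = {m: set(members) for m, members in partition.items()}  # work on a copy
    prosumers = sorted(set().union(*part.values()))
    for _ in range(max_rounds):
        switched = False
        for n in prosumers:
            m = community_of(part, n)
            best_m, best_u = m, payoff(n, m, part[m])
            for m2, members in part.items():
                if m2 != m:
                    u = payoff(n, m2, members | {n})  # N_m' union {n}
                    if u > best_u:                    # strict preference >_n
                        best_m, best_u = m2, u
            if best_m != m:                           # perform the switch
                part[m].discard(n)
                part[best_m].add(n)
                switched = True
        if not switched:
            break
    return part


# Toy payoff (hypothetical numbers) following Eqs. (9)-(11) with zero captivation.
L = {"a": 1.0, "b": 1.0}  # l_n^t
R = {1: 4.0, 2: 1.0}      # r_m^t
PRICE = 0.25              # c


def toy_payoff(n: str, m: int, members: Set[str]) -> float:
    total = sum(L[k] for k in members)
    return L[n] / total * R[m] - PRICE * L[n]


if __name__ == "__main__":
    result = switching_dynamics({1: {"a"}, 2: {"b"}}, toy_payoff)
    print({m: sorted(s) for m, s in result.items()})  # {1: ['a', 'b'], 2: []}
```

In the toy run, prosumer b leaves community 2 because half of the larger reward of community 1 exceeds the whole reward of community 2, after which neither prosumer has an improving switch.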

The goal of the prosumers is to converge to a Nash-stable and individual-stable partition, in which no prosumer can further improve its payoff given the community choices of the rest of the prosumers; thus, each prosumer achieves the maximum personal payoff with its own community choice.

Definition 9.

(Nash-stable Partition): A partition $\Pi = \{\mathcal{N}_{1}^{t}, \dots, \mathcal{N}_{m}^{t}, \dots, \mathcal{N}_{M}^{t}\}$ is Nash-stable if $\mathcal{N}_{m}^{t} \geq_{n} \mathcal{N}_{m'}^{t} \cup \{n\}, \forall m' \neq m, \forall n \in \mathcal{N}_{m}^{t}, \forall \mathcal{N}_{m}^{t} \in \Pi$.

The physical meaning of a Nash-stable partition is that no prosumer wants to change community, given the community choices of the rest of the prosumers, as it would not receive a higher payoff (Equation (11)).

Definition 10.

(Individual-stable Partition): A partition $\Pi = \{\mathcal{N}_{1}^{t}, \dots, \mathcal{N}_{m}^{t}, \dots, \mathcal{N}_{M}^{t}\}$ is individual-stable if and only if there exist no prosumer $n \in \mathcal{N}_{m}^{t}$ and community $\mathcal{N}_{m'}^{t}$, $m' \neq m$, for which both of the following conditions hold:

(i): $\mathcal{N}_{m'}^{t} \cup \{n\} >_{n} \mathcal{N}_{m}^{t}$, meaning that prosumer $n$ strictly prefers joining community $\mathcal{N}_{m'}^{t}$;
(ii): $\mathcal{N}_{m'}^{t} \cup \{n\} \geq_{n'} \mathcal{N}_{m'}^{t}, \forall n' \in \mathcal{N}_{m'}^{t}$, meaning that admitting $n$ into $\mathcal{N}_{m'}^{t}$ does not reduce the payoff of any existing member of $\mathcal{N}_{m'}^{t}$.

Based on Definitions 9 and 10, it is easily observed that a Nash-stable partition is also an individual-stable partition, while the converse does not always hold.
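Definitions 9 and 10 translate directly into checks over a given partition. The sketch below assumes a hypothetical payoff callback `payoff(n, m, members)` returning $U_n$ when prosumer $n$ is in community $m$ with the given members, and uses toy numbers chosen only for illustration. In the example, the partition in which each prosumer is alone is individual-stable but not Nash-stable: prosumer b would gain by joining a’s community, but a’s payoff would drop, so condition (ii) blocks the move.

```python
from typing import Callable, Dict, Set

Partition = Dict[int, Set[str]]
Payoff = Callable[[str, int, Set[str]], float]


def is_nash_stable(part: Partition, payoff: Payoff) -> bool:
    """Definition 9: no prosumer strictly gains by moving to another community."""
    for m, members in part.items():
        for n in members:
            current = payoff(n, m, members)
            for m2, others in part.items():
                if m2 != m and payoff(n, m2, others | {n}) > current:
                    return False
    return True


def is_individually_stable(part: Partition, payoff: Payoff) -> bool:
    """Definition 10: no move that (i) strictly benefits the mover and
    (ii) does not hurt any existing member of the target community."""
    for m, members in part.items():
        for n in members:
            current = payoff(n, m, members)
            for m2, others in part.items():
                if m2 == m:
                    continue
                joined = others | {n}
                mover_gains = payoff(n, m2, joined) > current              # (i)
                members_ok = all(payoff(k, m2, joined) >= payoff(k, m2, others)
                                 for k in others)                         # (ii)
                if mover_gains and members_ok:
                    return False
    return True


# Toy payoff (hypothetical numbers): proportional reward share minus cost.
L = {"a": 1.0, "b": 1.0}
R = {1: 4.0, 2: 1.0}
PRICE = 0.25


def toy_payoff(n: str, m: int, members: Set[str]) -> float:
    return L[n] / sum(L[k] for k in members) * R[m] - PRICE * L[n]


if __name__ == "__main__":
    alone = {1: {"a"}, 2: {"b"}}
    together = {1: {"a", "b"}, 2: set()}
    print(is_nash_stable(alone, toy_payoff), is_individually_stable(alone, toy_payoff))        # False True
    print(is_nash_stable(together, toy_payoff), is_individually_stable(together, toy_payoff))  # True True
```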

Theorem 1.

(Existence of a Nash-stable and Individual-stable Partition): Given a random initial partition $\Pi = \{\mathcal{N}_{1}^{t}, \dots, \mathcal{N}_{m}^{t}, \dots, \mathcal{N}_{M}^{t}\}$, the proposed hedonic game $(\mathcal{N}, >)$ always converges to a Nash-stable and individual-stable partition $\Pi^{*} = \{\mathcal{N}_{1}^{t*}, \dots, \mathcal{N}_{m}^{t*}, \dots, \mathcal{N}_{M}^{t*}\}$.

Proof.

The proof of Theorem 1 follows by contradiction (reductio ad absurdum). Suppose that the final partition $\Pi^{*}$ is not Nash-stable. Then, some of the prosumers have an incentive to switch communities in order to experience a higher payoff. Thus, some prosumers will follow the switching operation (Definition 8) and join other communities where they can enjoy a higher payoff. Given that switching operations still take place, the partition $\Pi^{*}$ is not final, which contradicts our assumption that $\Pi^{*}$ is the final partition. Therefore, the hedonic game always converges to a final partition $\Pi^{*}$ that is Nash-stable and, thus, also individual-stable. □

The hedonic game’s algorithm in order to determine the Nash-stable and individual-stable partition is presented in the next section, along with the reinforcement learning algorithm to determine prosumers’ energy consumption.

5. Reinforcement-Learning-Based Demand Response Management

In this section, a reinforcement-learning-based mechanism is introduced to enable prosumers to determine their optimal energy consumption $l_{n}^{t}$ to maximize their utility from purchasing and consuming energy, while also considering the energy tariff announced by the utility company.

The prosumer’s utility from purchasing and consuming energy is defined as follows:

U_{n} (l_{n}^{t}, l_{- n}^{t}) = β f (l_{n}^{t}, l_{- n}^{t}) g (l_{n}^{t}) - c (l_{n}^{t}) + \frac{l_{n}^{t - 1}}{\sum_{\forall n \in N_{m}^{t}} l_{n}^{t - 1}} r_{m}

(14)

where $l_{-n}^{t} = [l_{1}^{t}, \dots, l_{n-1}^{t}, l_{n+1}^{t}, \dots, l_{N}^{t}]$ denotes the energy consumption vector of all prosumers excluding prosumer $n$, and $β = 1\ [\$/\mathrm{kWh}^{3/2}]$ is a constant unit-mapping parameter. The function $f(l_{n}^{t}, l_{-n}^{t})$ is the prosumer’s pure utility from consuming the purchased energy; it is strictly increasing and concave with respect to the prosumer’s energy consumption $l_{n}^{t}$, e.g.,

f(l_{n}^{t}, l_{-n}^{t}) = \frac{\sqrt{l_{n}^{t}}}{\sum_{n' \in N, n' \neq n} l_{n'}^{t}}

Physically, the pure utility increases as the prosumer purchases more energy to cover its nonshiftable energy demand, while its concavity captures the diminishing marginal benefit as the prosumer approaches its maximum energy demand $d_{n}^{t} - g_{n}^{t} - b_{n}^{t-1}$ at timeslot $t$. The pure utility is also decreasing with respect to the total energy consumption of the other prosumers in the examined smart grid system. This formulation is reasonable because, if the overall energy consumption in the smart grid system increases, the utility company will eventually raise the energy price, which decreases the prosumer’s utility. The function $g(l_{n}^{t})$ captures the discount dissatisfaction; it is strictly decreasing and concave with respect to $l_{n}^{t}$, e.g., $g(l_{n}^{t}) = -(l_{n}^{t})^{2} + a$, $a > 0$. It reflects the fact that the community’s reward $r_{m}$ is shared among its members (Equation (3)): if the prosumers in the same community increase their energy consumption, each of them finally enjoys a smaller percentage of the allocated reward. The function $c(l_{n}^{t})$ captures the prosumer’s cost of purchasing energy at a price $c\ [\$/\mathrm{kWh}]$, e.g., $c(l_{n}^{t}) = c \cdot l_{n}^{t}$. The last term of Equation (14) quantifies the reward that the prosumer receives from participating in community $m$, as determined in Section 4.
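To make Equation (14) concrete, the following minimal Python sketch evaluates a prosumer’s utility using the example forms of $f$, $g$ and $c$ given in the text. The function name `prosumer_utility`, the default `a = 4.0` (chosen only so that $g$ stays positive for consumption up to 1.5 kWh), and the assumption that all monetary terms share one unit are illustrative choices, not values from the paper.

```python
import math
from typing import Sequence


def prosumer_utility(n: int, l: Sequence[float], l_prev: Sequence[float],
                     members: Sequence[int], r_m: float, c: float,
                     a: float = 4.0, beta: float = 1.0) -> float:
    """Sketch of Equation (14) with the example f, g and c(.) from the text.

    l       -- current consumption of all prosumers, l^t
    l_prev  -- consumption in the previous timeslot, l^{t-1}
    members -- indices of the prosumers in n's community N_m^t
    a       -- hypothetical constant of g (the text only requires a > 0)
    """
    # Total consumption of all other prosumers in the system (denominator of f).
    others = sum(x for k, x in enumerate(l) if k != n)
    f = math.sqrt(l[n]) / others           # pure utility: increasing, concave in l_n
    g = -l[n] ** 2 + a                     # discount dissatisfaction
    cost = c * l[n]                        # linear purchasing cost c(l_n) = c * l_n
    # Proportional share of the community reward, based on previous consumption.
    share = l_prev[n] / sum(l_prev[k] for k in members)
    return beta * f * g - cost + share * r_m


# Two prosumers in the same community, both consuming 1 kWh now and before.
print(prosumer_utility(0, [1.0, 1.0], [1.0, 1.0], [0, 1], r_m=10.0, c=0.5))  # 7.5
```

Here $f = 1$, $g = 3$, the cost is 0.5 and the reward share is half of 10, giving 7.5. The division by the other prosumers’ total consumption assumes that at least one other prosumer consumes a positive amount; the example form of $f$ is undefined otherwise.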

The goal of each prosumer is to maximize its experienced utility by purchasing an optimal amount of energy, while considering its energy demand characteristics. Thus, the corresponding optimization problem is formulated as follows:

\max_{l_{n}^{t}} \; U_{n}(l_{n}^{t}, l_{-n}^{t})

(15a)

\text{s.t.} \quad 0 \leq l_{n}^{t} \leq d_{n}^{t} - g_{n}^{t} - b_{n}^{t-1}

(15b)

Given the decentralized nature of the smart grid system, and respecting the prosumers’ privacy concerns, there is no centralized entity in the system to solve the optimization problems (15a)–(15b) of all prosumers and determine their optimal energy consumption $l^{t*} = [l_{1}^{t*}, \dots, l_{n}^{t*}, \dots, l_{N}^{t*}]$ at each timeslot $t$. Thus, a reinforcement-learning-based mechanism is a natural choice to determine $l^{t*}$ at each timeslot $t$ in an autonomous and distributed manner. The theory of log-linear reinforcement learning is adopted through the B-logit and Max-logit algorithms. Both algorithms enable each prosumer to select the optimal strategy $l_{n}^{t*}, \forall n \in N$, that maximizes its utility (Equation (14)) by autonomously performing the exploration and exploitation processes. The benefit of the Max-logit algorithm is that it can determine the Pareto-optimal solution $l^{t*}$, if it exists.

The probability update rules for selecting a strategy $l_{n}^{t}$ under the Max-logit and B-logit algorithms are presented in Equations (16a)–(16b) and (17a)–(17b), respectively, where $β \in \mathbb{R}^{+}$ denotes the learning parameter (not to be confused with the unit-mapping constant $β$ of Equation (14)) and $i$ denotes the iteration of the reinforcement learning algorithm until it converges to $l^{t*}$.

P\big(l_{n}^{t}|_{i} = l_{n}^{t}|_{i-1}\big) = \frac{e^{β U_{n}(l_{n}^{t}|_{i-1})}}{\max\{e^{β U_{n}(l_{n}^{t}|_{i-1})}, e^{β U_{n}(l_{n}^{t'}|_{i})}\}}

(16a)

P\big(l_{n}^{t}|_{i} = l_{n}^{t'}|_{i}\big) = \frac{e^{β U_{n}(l_{n}^{t'}|_{i})}}{\max\{e^{β U_{n}(l_{n}^{t}|_{i-1})}, e^{β U_{n}(l_{n}^{t'}|_{i})}\}}

(16b)

P\big(l_{n}^{t}|_{i} = l_{n}^{t}|_{i-1}\big) = \frac{e^{β U_{n}(l_{n}^{t}|_{i-1})}}{e^{β U_{n}(l_{n}^{t}|_{i-1})} + e^{β U_{n}(l_{n}^{t'}|_{i})}}

(17a)

P\big(l_{n}^{t}|_{i} = l_{n}^{t'}|_{i}\big) = \frac{e^{β U_{n}(l_{n}^{t'}|_{i})}}{e^{β U_{n}(l_{n}^{t}|_{i-1})} + e^{β U_{n}(l_{n}^{t'}|_{i})}}

(17b)
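As a small numerical illustration of how the two rules differ, assume the hypothetical values $β = 1$, $U_{n}(l_{n}^{t}|_{i-1}) = 2$ and $U_{n}(l_{n}^{t'}|_{i}) = 3$ (these numbers are not taken from the evaluation):

```latex
\text{B-logit (17b):}\quad P\big(l_{n}^{t}|_{i} = l_{n}^{t'}|_{i}\big) = \frac{e^{3}}{e^{2} + e^{3}} = \frac{1}{1 + e^{-1}} \approx 0.731, \qquad \text{(17a)} \approx 0.269
\text{Max-logit (16b):}\quad P\big(l_{n}^{t}|_{i} = l_{n}^{t'}|_{i}\big) = \frac{e^{3}}{\max\{e^{2}, e^{3}\}} = 1
```

In other words, the Max-logit rule always accepts a trial strategy that improves the prosumer’s utility, whereas the B-logit rule accepts it only with a probability of about 0.73 in this example.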

The physical meaning of the above probability update rules is as follows. During the exploration phase, each prosumer selects an alternative strategy $l_{n}^{t'}|_{i}$ with equal probability and receives the corresponding payoff $U_{n}(l_{n}^{t'}|_{i})$. Then, during the exploitation phase, it uses the probabilistic rules (16a)–(16b) for the Max-logit algorithm or (17a)–(17b) for the B-logit algorithm to learn which strategy $l_{n}^{t}$ to select in the next iteration of the reinforcement learning algorithm. Both the Max-logit and B-logit algorithms converge to the optimal solution $l^{t*}$. Detailed results are presented in Section 6.
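The exploration/exploitation loop described above can be sketched in Python as follows. The sketch discretizes each prosumer’s feasible interval (15b) into a finite strategy list, runs a fixed number of iterations instead of the ε-based stopping test of Algorithm 1, and takes the utility as a user-supplied callable; all names (`learn`, `b_logit_accept`, `max_logit_accept`) are hypothetical. Because the two expressions in (16a)–(16b) do not sum to one, the Max-logit step follows the usual max-logit convention: the trial strategy is accepted with the probability in (16b), and otherwise the previous strategy is kept.

```python
import math
import random
from typing import Callable, List, Sequence

Utility = Callable[[int, List[float]], float]  # (prosumer index, joint consumption) -> U_n


def b_logit_accept(u_old: float, u_new: float, beta: float) -> float:
    """Probability of moving to the trial strategy, Equation (17b), in overflow-safe form."""
    z = beta * (u_new - u_old)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def max_logit_accept(u_old: float, u_new: float, beta: float) -> float:
    """Probability of moving to the trial strategy, Equation (16b): min(1, e^{beta (U' - U)})."""
    return math.exp(min(0.0, beta * (u_new - u_old)))


def learn(utility: Utility, strategies: Sequence[Sequence[float]], l0: Sequence[float],
          beta: float, rule: str = "max-logit", iters: int = 200, seed: int = 0) -> List[float]:
    """Log-linear learning over discretized feasible strategies (hypothetical helper)."""
    rng = random.Random(seed)
    accept = max_logit_accept if rule == "max-logit" else b_logit_accept
    l = list(l0)
    for _ in range(iters):
        for n in range(len(l)):                    # one prosumer revises at a time
            trial = list(l)                        # the others keep l_{-n}|_{i-1}
            trial[n] = rng.choice(strategies[n])   # exploration: uniform alternative
            u_old, u_new = utility(n, l), utility(n, trial)
            if rng.random() < accept(u_old, u_new, beta):  # exploitation
                l = trial
    return l


# Toy run: each prosumer's (hypothetical) utility peaks at 0.5 kWh.
grid = [[0.0, 0.25, 0.5, 0.75, 1.0]] * 2
print(learn(lambda n, l: -(l[n] - 0.5) ** 2, grid, [0.0, 1.0], beta=1000.0))
```

With a large learning parameter, both rules settle on the best point of the grid in this toy case; smaller values of β keep more exploration alive.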

Algorithm 1 presents the coalitional DRM algorithm, which determines both the Nash-stable and individually stable partition $Π^{*}$ of the prosumers into communities, following the principles of hedonic games, and the prosumers’ optimal energy consumption $l^{t*}$, based on the proposed reinforcement learning mechanism. An overview of the overall proposed model is presented in Figure 1.

Algorithm 1 Coalitional DRM Algorithm

1: Input: $N, M, l^{t}, t$
2: Output: $Π^{*} = \{N_{1}^{t*}, \dots, N_{m}^{t*}, \dots, N_{M}^{t*}\}$, $l^{t*}$
3: Initialization: $Π^{*} = \emptyset$; create an initial partition $Π$ by randomly allocating the prosumers $N$ to the communities $M$; $δ_{n} = 1$ if prosumer $n$ switches communities and $δ_{n} = 0$ otherwise, initialized as $δ_{n} = 1$, $\forall n \in N$; $i = 0$, $Convergence = 0$, $l^{t}|_{i=0}$.
4: while $Convergence == 0$ do
5:   while $\sum_{n \in N} δ_{n} \neq 0$ do
6:     for $n = 1$ to $N$ do
7:       Set $δ_{n} = 0$ and determine $U_{n}^{N_{m}^{t}}$;
8:       for each community $N_{m'}^{t} \in Π$, $m' \neq m$ do
9:         Prosumer $n$ joins $N_{m'}^{t}$: $N_{m'}^{t} \leftarrow N_{m'}^{t} \cup \{n\}$;
10:         Update $V(N_{m'}^{t})$ and determine $U_{n}^{N_{m'}^{t}}$;
11:         if $U_{n}^{N_{m'}^{t}} > U_{n}^{N_{m}^{t}}$ and $N_{m'}^{t} \notin h(n)$ then
12:           Prosumer $n$ switches from $N_{m}^{t}$, $N_{m}^{t} \leftarrow N_{m}^{t} \setminus \{n\}$, and joins $N_{m'}^{t}$; $m \leftarrow m'$, $δ_{n} = 1$;
13:         else
14:           Prosumer $n$ does not switch to $N_{m'}^{t}$: $N_{m'}^{t} \leftarrow N_{m'}^{t} \setminus \{n\}$, and remains in $N_{m}^{t}$;
15:         end if
16:       end for
17:     end for
18:     Update $Π^{*} = Π$ and determine $U_{n}^{N_{m}^{t}}$, $\forall n \in N$;
19:   end while
20:   for $n = 1$ to $N$ do
21:     Prosumer $n$ selects $l_{n}^{t'}|_{i}$ with equal probability among all the possible energy consumption strategies, and the rest of the prosumers keep their previous consumption, i.e., $l_{-n}^{t}|_{i} = l_{-n}^{t}|_{i-1}$;
22:     Prosumer $n$ receives a utility $U_{n}(l_{n}^{t'}|_{i}, l_{-n}^{t}|_{i})$ and updates $l_{n}^{t}|_{i}$ based on Equations (16a)–(16b) for Max-logit (Equations (17a)–(17b) for B-logit);
23:   end for
24:   if $\left| \frac{1}{T} \sum_{k=0}^{T} \sum_{n \in N} U_{n}|_{k} - \sum_{n \in N} U_{n}|_{i} \right| \leq ϵ$, with $ϵ$ a small positive number, then
25:     $Convergence = 1$
26:   end if
27:   $i = i + 1$
28: end while
29: return $Π^{*}$, $l^{t*}$
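The switch-operation phase of Algorithm 1 (lines 5–19) can be sketched in Python as follows. Because the payoff of Equation (11) is defined in Section 4 and not reproduced here, the sketch uses a hypothetical stand-in, `make_toy_payoff`, that gives each prosumer a demand-proportional share of its community’s reward. It also interprets the history set $h(n)$ as the set of communities that prosumer $n$ has already left, which is an assumption; under that interpretation each prosumer switches at most $M - 1$ times, so the loop terminates.

```python
from typing import Callable, List, Sequence, Set

# payoff(n, members, m) -> prosumer n's payoff if community m has the given members
Payoff = Callable[[int, Set[int], int], float]


def make_toy_payoff(demand: Sequence[float], reward: Sequence[float]) -> Payoff:
    """Hypothetical stand-in for Equation (11): demand-proportional share of r_m."""
    def payoff(n: int, members: Set[int], m: int) -> float:
        return reward[m] * demand[n] / sum(demand[k] for k in members)
    return payoff


def hedonic_switching(payoff: Payoff, initial: Sequence[int], num_communities: int) -> List[int]:
    """Switch operations in the spirit of lines 5-19 of Algorithm 1 (sketch)."""
    comm = list(initial)                              # comm[n]: index of n's community
    members: List[Set[int]] = [set() for _ in range(num_communities)]
    for n, m in enumerate(comm):
        members[m].add(n)
    history: List[Set[int]] = [set() for _ in comm]  # h(n): communities n has left
    switched = True
    while switched:                                   # repeat until no prosumer switches
        switched = False
        for n in range(len(comm)):
            for m_new in range(num_communities):
                m = comm[n]
                if m_new == m or m_new in history[n]:
                    continue
                # Tentatively join m_new and compare with the current payoff.
                if payoff(n, members[m_new] | {n}, m_new) > payoff(n, members[m], m):
                    members[m].discard(n)
                    members[m_new].add(n)
                    history[n].add(m)
                    comm[n] = m_new
                    switched = True
    return comm


demand = [1.0, 1.2, 1.5, 1.1]
print(hedonic_switching(make_toy_payoff(demand, [1000.0, 3000.0]), [0, 0, 0, 0], 2))
# [1, 1, 1, 0]
```

In this toy instance the first three prosumers move to the richer community, while the fourth keeps the smaller reward for itself because joining would dilute its share.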

6. Numerical Evaluation

In this section, a detailed numerical evaluation of the proposed coalitional DRM model in community energy management systems is presented. Specifically, the pure operation and performance of the proposed model are presented in Section 6.1, while the impact of the utility company’s provided reward and the captivation parameter on the community creation process is analyzed in Section 6.2. A scalability analysis is provided in Section 6.3 to demonstrate the efficiency and robustness of the proposed model, and a detailed comparative evaluation against the state of the art is performed in Section 6.4.

The following simulation parameters have been adopted for the numerical evaluation, following a real data analysis in the southwest area of the USA [32]: $M = 5$, $N = 20$, and the energy demand vector of the non-self-sufficient prosumers, i.e., $d_{n}^{t} - g_{n}^{t} - b_{n}^{t-1}$, in an indicative timeslot $t$ is [1.00, 1.03, 1.05, 1.08, 1.11, 1.13, 1.16, 1.18, 1.21, 1.24, 1.26, 1.29, 1.32, 1.34, 1.37, 1.39, 1.42, 1.45, 1.47, 1.5] kWh; furthermore, $w_{1} = 0.7$, $w_{2} = 0.3$, $c = 0.1495\ [\$/\mathrm{kWh}]$, $r^{t} = [1, 2, 3, 4, 5] \cdot 10^{3}$ [¢], and $c^{t} = [3.5, 4.0, 4.5, 5.0, 5.5] \cdot 10^{2}$. In the rest of the numerical evaluation, prosumers’ payoff and utility are presented in [¢]. The proposed framework was evaluated on a Dell XPS desktop with an 11th Gen Intel Core i9-11900K 5.3 GHz processor and 64 GB of available RAM.
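For readers who want to reproduce the setup, the parameters above can be collected as in the following Python sketch; the variable names are hypothetical. One observation, checked against the listed values: the demand vector coincides with 20 evenly spaced points in [1.0, 1.5] kWh rounded to two decimals, although the text does not state how the vector was generated.

```python
# Simulation parameters of Section 6 (values copied from the text; names are hypothetical).
M, N = 5, 20
# The listed demand vector d_n - g_n - b_n matches N evenly spaced points in [1.0, 1.5] kWh,
# rounded to two decimals (an observation, not the authors' stated procedure).
demand_kwh = [round(1.0 + 0.5 * k / (N - 1), 2) for k in range(N)]
w1, w2 = 0.7, 0.3
price_usd_per_kwh = 0.1495                      # c
price_cents_per_kwh = price_usd_per_kwh * 100   # same price in cents, the unit used for payoffs
rewards_cents = [x * 1e3 for x in (1, 2, 3, 4, 5)]            # r^t
captivation = [x * 1e2 for x in (3.5, 4.0, 4.5, 5.0, 5.5)]    # c^t

print(demand_kwh)
print(round(price_cents_per_kwh, 2), sum(rewards_cents))  # 14.95 15000.0
```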

6.1. Pure Operation and Performance

In this section, the pure performance and operation of the proposed coalitional DRM model are presented. Figure 2a,b demonstrate prosumers’ initial and final payoff (Equation (11)) and utility (Equation (14)), respectively, under the B-logit and Max-logit DRM algorithms as a function of the prosumer’s ID. A higher prosumer ID corresponds to a higher energy demand on the prosumer’s side. Figure 2c presents prosumers’ initial and final utility (Equation (14)) under the B-logit and Max-logit algorithms as a function of the prosumers’ optimal energy consumption.

The results reveal that prosumers intelligently select the communities they belong to by following the principles of hedonic games (Section 4), and they ultimately achieve a higher payoff (Equation (11)) under both the B-logit and Max-logit algorithms (Figure 2a). The resulting prosumers’ payoff under the hedonic game is, on average, very similar under the B-logit and Max-logit algorithms. Furthermore, prosumers with higher energy consumption achieve a higher payoff, as they better exploit the rewards provided by the utility company (Equation (3)).

Focusing on the RL-based DRM model (Section 5) that determines prosumers’ optimal energy consumption, the results show that the Max-logit algorithm approaches the Pareto-optimal solution more closely than the B-logit algorithm, thus resulting in a higher prosumer utility (Equation (14)). Both RL-based algorithms converge to a solution that yields a higher prosumer utility than the initial one (Figure 2b). Furthermore, the higher the prosumers’ energy demand (i.e., the higher the prosumer’s ID), the higher the achieved utility, as the prosumers cover a larger portion and absolute amount of their energy needs. A closer look at Figure 2c shows that the Max-logit DRM algorithm achieves, on average, a higher utility than the B-logit algorithm, as it converges to a Pareto-optimal solution. Notably, the Max-logit DRM algorithm achieves this higher utility with a lower energy consumption, thus covering prosumers’ energy needs in a manner more beneficial to them (Figure 2c).

Figure 3a–c present the average prosumers’ utility (Equation (14)), their average energy consumption, and the execution time of the RL-based DRM model under the B-logit and Max-logit algorithms, respectively. Each box plot shows the 0th percentile (i.e., the lowest data point), the 25th percentile (the bottom of the box), the median (the horizontal line inside the box), the 75th percentile (the top of the box), the 100th percentile (the highest data point), and the mean value (the circle inside the box). The results show that the Max-logit DRM algorithm achieves a higher prosumer utility (Figure 3a) with a relatively lower average energy consumption (Figure 3b) and a shorter execution time (Figure 3c). Thus, the Max-logit DRM algorithm outperforms the B-logit algorithm in terms of both computational efficiency and the results achieved for prosumers.
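The summary statistics drawn by each box plot can be computed as in the following minimal Python sketch. The quartile method (linear interpolation, `method="inclusive"` in the standard `statistics` module) is an assumption, since the text does not state which method was used, and the input values are placeholders rather than data from Figure 3.

```python
import statistics
from typing import Dict, Sequence


def box_plot_summary(values: Sequence[float]) -> Dict[str, float]:
    """Statistics shown by the box plots described above (quartiles by linear interpolation)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),                  # 0th percentile, lowest data point
        "q1": q1,                            # 25th percentile, bottom of the box
        "median": median,                    # horizontal line inside the box
        "q3": q3,                            # 75th percentile, top of the box
        "max": max(values),                  # 100th percentile, highest data point
        "mean": statistics.fmean(values),    # circle marker
    }


print(box_plot_summary([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Other quartile conventions (e.g. `method="exclusive"`) can shift the box edges slightly for small samples, so the choice matters when comparing against plotted figures.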

6.2. Impact of Reward and Captivation Parameter

In this section, an analysis quantifying the impact of the utility company’s provided rewards and the captivation parameter on the community formation process is performed. Figure 4 presents the average size (in number of prosumers) of the five examined communities as a function of the utility company’s allocated reward per community and the captivation parameter, for the B-logit and Max-logit DRM algorithms. The results show that as the allocated reward and the captivation parameter increase, more prosumers are attracted to join the community, as they have the potential to receive a higher reward (in the form of an energy price discount) and to exchange energy with more prosumers in the same community. The results also clearly show the prosumers’ sensitivity to these two parameters, as even a small increase in either of them drives more prosumers to join the corresponding community. Finally, the choice of the RL-based DRM algorithm that determines prosumers’ optimal energy consumption does not noticeably affect the community formation process, because the resulting energy consumption differs only slightly between the two algorithms, which is not enough to drive prosumers to join a different community.

6.3. Scalability Analysis

In this section, a scalability analysis is conducted to quantify the efficiency and robustness of the proposed coalitional DRM model. We consider a large smart grid system consisting of up to 100 prosumers who purchase energy from a utility company. All prosumers are considered non-self-sufficient and purchase energy, while their energy needs are evenly distributed in the interval $l_{n}^{t} \in [1, 1.5]$ kWh. Figure 5a illustrates the prosumers’ average switching operations, their average payoff, and the hedonic game’s execution time as the number of prosumers increases. Figure 5b presents the average prosumers’ utility and the Max-logit DRM algorithm’s execution time as a function of the number of prosumers. Figure 5c shows the overall execution time of the proposed coalitional DRM model adopting the Max-logit or B-logit RL-based algorithm for an increasing number of prosumers, as well as the corresponding average values for the maximum number of prosumers (N = 100).

The results show that as the number of prosumers increases, the execution time of the hedonic game for the community formation process (Figure 5a), the execution time of the Max-logit DRM algorithm (Figure 5b), and the overall execution time of the proposed coalitional DRM model (Figure 5c) increase. However, the execution time of the coalitional DRM model remains low, i.e., a few seconds, even for a large population of prosumers (N = 100). The results also confirm that the Max-logit algorithm converges to an optimal solution faster than the B-logit algorithm (Figure 5c). Regarding prosumers’ achieved payoff (Equation (11)) and utility (Equation (14)), both decrease as the number of prosumers increases. Specifically, the decreasing payoff stems from the sharing of the utility company’s rewards among a larger number of prosumers (Figure 5a). It is noted that the reward is kept constant per community, namely $r^{t} = [1, 2, 3, 4, 5] \times 10^{3}$ [¢], in order to isolate the impact of the increasing number of prosumers on the average number of switch operations, the average prosumers’ payoff and utility, and the execution time of the proposed coalitional demand response management model. The decreasing utility stems from the increase in the energy demand from the utility company due to the growing number of prosumers, which results in sharing the utility company’s energy supply among a larger number of prosumers (Figure 5b).
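As a back-of-the-envelope check of the reward-sharing effect, consider only the reward term of Equation (14). Assuming every community has at least one member, the proportional shares within each community sum to one, so the total reward distributed is fixed; writing $m(n)$ for the community of prosumer $n$ (notation introduced here for illustration):

```latex
\sum_{n \in N} \frac{l_{n}^{t-1}}{\sum_{n' \in N_{m(n)}^{t}} l_{n'}^{t-1}}\, r_{m(n)} = \sum_{m=1}^{M} r_{m} = (1 + 2 + 3 + 4 + 5) \times 10^{3} = 15 \times 10^{3}\ [¢]
\bar{r}(N) = \frac{15 \times 10^{3}}{N}\ [¢], \qquad \bar{r}(20) = 750\ [¢], \qquad \bar{r}(100) = 150\ [¢]
```

This only accounts for the reward-sharing part of the decreasing payoff trend; the payoff of Equation (11) includes further terms.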

6.4. Comparative Evaluation

In this section, a comparative evaluation against the state of the art is performed, regarding both the community formation process and the RL-based DRM framework. Figure 6 presents the prosumers’ achieved average payoff (Equation (11)) for four comparative scenarios: (i) the proposed coalitional DRM model, (ii) a random community formation process, and (iii)–(iv) reward-based and captivation-parameter-based community selection, where the prosumers select the community with the highest reward and the highest captivation parameter, respectively. To isolate the impact of each individual parameter, the captivation parameter is set equal across communities in the reward-based scenario, and the allocated reward is set equal across communities in the captivation-based scenario. The results reveal that the proposed coalitional DRM model achieves the highest average prosumers’ payoff among all comparative scenarios for an increasing number of prosumers (Figure 6). In contrast, the reward-based community formation scenario results in the worst payoff for the prosumers, as all of them try to share the provided reward within one community. The captivation-based community formation process yields a better prosumer payoff than the reward-based scenario, as the captivation parameter has the same impact on all prosumers and, unlike the utility company’s provided reward, is not shared among them. The random community formation scenario yields intermediate results between the best- and worst-case scenarios, as expected. Specifically, the proposed coalitional DRM model achieves a 7% increase in the average prosumers’ payoff compared with the random community formation scenario in the considered Monte Carlo scalability analysis, which nevertheless translates into a substantial improvement in the prosumers’ monetary profit (USD).
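The three baseline community formation rules can be sketched in Python as follows; the function names are hypothetical and the sketch only covers community selection, not the payoff computation. It makes explicit why the reward-based baseline performs worst: every prosumer picks the same highest-reward community, so that single reward is split among all of them.

```python
import random
from collections import Counter
from typing import List, Sequence


def random_formation(num_prosumers: int, num_communities: int, seed: int = 0) -> List[int]:
    """Scenario (ii): each prosumer joins a uniformly random community."""
    rng = random.Random(seed)
    return [rng.randrange(num_communities) for _ in range(num_prosumers)]


def greedy_formation(num_prosumers: int, scores: Sequence[float]) -> List[int]:
    """Scenarios (iii)/(iv): every prosumer joins the community with the highest score
    (reward r_m for the reward-based, captivation c_m for the captivation-based scheme)."""
    best = max(range(len(scores)), key=lambda m: scores[m])
    return [best] * num_prosumers


rewards = [1e3, 2e3, 3e3, 4e3, 5e3]
print(Counter(greedy_formation(20, rewards)))   # all 20 prosumers crowd into community index 4
print(Counter(random_formation(20, len(rewards))))
```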

Figure 7 presents the prosumers’ energy consumption as a function of the prosumers’ ID for two comparative scenarios: (i) the proposed coalitional DRM model and (ii) the no-discount dissatisfaction model, in which the prosumers are not sensitive to sharing the communities’ rewards with other prosumers (as captured by the $g(l_{n}^{t})$ function in Equation (14)). In both scenarios, the rewards are shared among the prosumers of the same community following the principles of proportional fairness, as presented in Equation (3). The results reveal that the proposed coalitional DRM model leads to lower prosumers’ energy consumption than the no-discount dissatisfaction model, as it explicitly captures the impact of the utility company’s allocated reward per community in the prosumers’ decision-making process.

7. Conclusions

In this paper, a novel coalitional demand response management model for community energy management systems is introduced based on the principles of game theory and reinforcement learning. Specifically, the principles of hedonic games and log-linear reinforcement learning are adopted to enable prosumers to organize themselves into communities and to determine their optimal energy consumption, respectively. A community energy management system is introduced, consisting of prosumers who intelligently select a community to participate in, accounting for the partially available information on their own and other prosumers’ energy generation and consumption characteristics, as well as the rewards provided by the utility company to each community in terms of energy price discount. The prosumers’ community formation process is based on the theory of hedonic games, and the existence of a Nash-stable and individually stable partition is proven. Then, two log-linear reinforcement learning algorithms, named B-logit and Max-logit, are introduced to enable each prosumer to determine its optimal energy consumption in a distributed and autonomous manner, and they are compared with each other in terms of their accuracy and computational complexity. A detailed evaluation of the proposed coalitional DRM model is performed based on real data collected from the southwest area of the USA.

Part of our current and future work is the extension of this model to capture prosumers’ risk-aware decision-making characteristics, taking into account utility companies’ capacity to serve prosumers’ energy demands. Towards this direction, a multivariable prospect-theoretic model will be developed to quantify prosumers’ risk-aware behavior in scenarios of limited energy supply that can drive the system to brownout or even blackout [33]. Another direction of our current and future work is the extension of the proposed coalitional demand response management model to capture the energy exchange among prosumers who can buy and steal energy from their peers.

Author Contributions

Conceptualization and writing, N.K. and M.S.S.; methodology and supervision, E.E.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

[1] Lu, R.; Jiang, Z.; Wu, H.; Ding, Y.; Wang, D.; Zhang, H.T. Reward Shaping-Based Actor–Critic Deep Reinforcement Learning for Residential Energy Management. IEEE Trans. Ind. Inform. 2023, 19, 2662–2673.
[2] Mathew, A.; Roy, A.; Mathew, J. Intelligent Residential Energy Management System Using Deep Reinforcement Learning. IEEE Syst. J. 2020, 14, 5362–5372.
[3] Yan, L.; Chen, X.; Chen, Y.; Wen, J. A Hierarchical Deep Reinforcement Learning-Based Community Energy Trading Scheme for a Neighborhood of Smart Households. IEEE Trans. Smart Grid 2022, 13, 4747–4758.
[4] Xu, X.; Jia, Y.; Xu, Y.; Xu, Z.; Chai, S.; Lai, C.S. A Multi-Agent Reinforcement Learning-Based Data-Driven Method for Home Energy Management. IEEE Trans. Smart Grid 2020, 11, 3201–3211.
[5] Apostolopoulos, P.A.; Tsiropoulou, E.E.; Papavassiliou, S. Demand response management in smart grid networks: A two-stage game-theoretic learning-based approach. Mob. Netw. Appl. 2021, 26, 548–561.
[6] Remani, T.; Jasmin, E.; Ahamed, T.I. Residential Load Scheduling with Renewable Generation in the Smart Grid: A Reinforcement Learning Approach. IEEE Syst. J. 2019, 13, 3283–3294.
[7] Ye, Y.; Tang, Y.; Wang, H.; Zhang, X.P.; Strbac, G. A Scalable Privacy-Preserving Multi-Agent Deep Reinforcement Learning Approach for Large-Scale Peer-to-Peer Transactive Energy Trading. IEEE Trans. Smart Grid 2021, 12, 5185–5200.
[8] Song, H.; Liu, Y.; Zhao, J.; Liu, J.; Wu, G. Prioritized Replay Dueling DDQN Based Grid-Edge Control of Community Energy Storage System. IEEE Trans. Smart Grid 2021, 12, 4950–4961.
[9] Zhang, Y.; Guizani, M. Game Theory for Wireless Communications and Networking; CRC Press: Boca Raton, FL, USA, 2011.
[10] Aladdin, S.; El-Tantawy, S.; Fouda, M.M.; Tag Eldien, A.S. MARLA-SG: Multi-Agent Reinforcement Learning Algorithm for Efficient Demand Response in Smart Grid. IEEE Access 2020, 8, 210626–210639.
[11] Irtija, N.; Sangoleye, F.; Tsiropoulou, E.E. Contract-theoretic demand response management in smart grid systems. IEEE Access 2020, 8, 184976–184987.
[12] Wang, B.; Li, Y.; Ming, W.; Wang, S. Deep Reinforcement Learning Method for Demand Response Management of Interruptible Load. IEEE Trans. Smart Grid 2020, 11, 3146–3155.
[13] Fraija, A.; Agbossou, K.; Henao, N.; Kelouwani, S.; Fournier, M.; Hosseini, S.S. A Discount-Based Time-of-Use Electricity Pricing Strategy for Demand Response with Minimum Information Using Reinforcement Learning. IEEE Access 2022, 10, 54018–54028.
[14] Liu, Y.; Zhang, D.; Gooi, H.B. Optimization strategy based on deep reinforcement learning for home energy management. CSEE J. Power Energy Syst. 2020, 6, 572–582.
[15] Forootani, A.; Rastegar, M.; Jooshaki, M. An Advanced Satisfaction-Based Home Energy Management System Using Deep Reinforcement Learning. IEEE Access 2022, 10, 47896–47905.
[16] Alfaverh, F.; Denaï, M.; Sun, Y. Demand Response Strategy Based on Reinforcement Learning and Fuzzy Reasoning for Home Energy Management. IEEE Access 2020, 8, 39310–39321.
[17] Lu, R.; Hong, S.H.; Yu, M. Demand Response for Home Energy Management Using Reinforcement Learning and Artificial Neural Network. IEEE Trans. Smart Grid 2019, 10, 6629–6639.
[18] Huang, C.; Chen, W.; Wang, X.; Hong, F.; Yang, S.; Chen, Y.; Bu, S.; Jiang, C.; Zhou, Y.; Zhang, Y. DearFSAC: A DRL-based Robust Design for Power Demand Forecasting in Federated Smart Grid. In Proceedings of the GLOBECOM 2022–2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022; pp. 5279–5284.
[19] Ebell, N.; Pruckner, M. Benchmarking a Decentralized Reinforcement Learning Control Strategy for an Energy Community. In Proceedings of the 2021 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Aachen, Germany, 25–28 October 2021; pp. 385–390.
[20] Yang, H.; Liang, S.; Zhou, Q.; Li, H. Privacy-preserving HE-based clustering for load profiling over encrypted smart meter data. In Proceedings of the ICC 2020–2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–6.
[21] Du, Y.; Li, F. Intelligent Multi-Microgrid Energy Management Based on Deep Neural Network and Model-Free Reinforcement Learning. IEEE Trans. Smart Grid 2020, 11, 1066–1076.
[22] Sadeghi, M.; Erol-Kantarci, M. Power Loss Minimization in Microgrids Using Bayesian Reinforcement Learning with Coalition Formation. In Proceedings of the 2019 IEEE 30th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Istanbul, Turkey, 8–11 September 2019; pp. 1–6.
[23] Amer, A.A.; Shaban, K.; Massoud, A.M. DRL-HEMS: Deep Reinforcement Learning Agent for Demand Response in Home Energy Management Systems Considering Customers and Operators Perspectives. IEEE Trans. Smart Grid 2023, 14, 239–250.
[24] Sangoleye, F.; Jao, J.; Faris, K.; Tsiropoulou, E.E.; Papavassiliou, S. Reinforcement Learning-Based Demand Response Management in Smart Grid Systems with Prosumers. IEEE Syst. J. 2023, 17, 1797–1807.
[25] Zenginis, I.; Vardakas, J.; Ramantas, K.; Verikoukis, C. Smart home’s energy management applying the deep deterministic policy gradient and clustering. In Proceedings of the 2022 IEEE 27th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), Paris, France, 2–3 November 2022; pp. 94–99.
[26] Prasad, A.; Dusparic, I. Multi-agent Deep Reinforcement Learning for Zero Energy Communities. In Proceedings of the 2019 IEEE PES Innovative Smart Grid Technologies Europe (ISGT-Europe), Bucharest, Romania, 29 September–2 October 2019; pp. 1–5.
[27] Wang, D.; Liu, B.; Jia, H.; Zhang, Z.; Chen, J.; Huang, D. Peer-to-peer Electricity Transaction Decisions of the User-side Smart Energy System Based on the SARSA Reinforcement Learning. CSEE J. Power Energy Syst. 2022, 8, 826–837.
[28] Lai, B.C.; Chiu, W.Y.; Tsai, Y.P. Multiagent Reinforcement Learning for Community Energy Management to Mitigate Peak Rebounds Under Renewable Energy Uncertainty. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 6, 568–579.
[29] Sadeghi, M.; Erol-Kantarci, M. Deep Reinforcement Learning Based Coalition Formation for Energy Trading in Smart Grid. In Proceedings of the 2021 IEEE 4th 5G World Forum (5GWF), Montreal, QC, Canada, 13–15 October 2021; pp. 200–205.
[30] Agate, V.; Khamesi, A.R.; Silvestri, S.; Gaglio, S. Enabling peer-to-peer User-Preference-Aware Energy Sharing Through Reinforcement Learning. In Proceedings of the ICC 2020–2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–7.
[31] Patrizi, N.; LaTouf, S.K.; Tsiropoulou, E.E.; Papavassiliou, S. Prosumer-Centric Self-Sustained Smart Grid Systems. IEEE Syst. J. 2022, 16, 6042–6053.
[32] USA EIA. U.S. Energy Information Administration. 2021. Available online: https://www.eia.gov/ (accessed on 30 August 2023).
[33] Kemp, N.; Siraj, M.S.; Tsiropoulou, E.E.; Papavassiliou, S. Community-based Load Balancing and Prosumers Incentivization in Smart Grid Systems. In Proceedings of the IEEE Global Communications Conference, Kuala Lumpur, Malaysia, 4–8 December 2023; pp. 1–6.

Figure 1. Overview of the proposed coalitional demand response management model.

Figure 2. (a) Prosumers’ payoff, (b) prosumers’ utility as a function of the prosumers’ ID, and (c) prosumers’ utility as a function of the prosumers’ energy consumption.

Figure 3. (a) Average prosumers’ utility, (b) average energy consumption, and (c) execution time of the RL-based DRM model under the B-logit and Max-logit algorithms.

Figure 4. Impact of utility company’s allocated reward per community and captivation parameter for the B-logit and Max-logit DRM algorithms.

Figure 5. Scalability Analysis: (a) prosumers’ average switching operations, prosumers’ average payoff, and hedonic game’s execution time as the number of prosumers increases; (b) average prosumers’ utility and the Max-logit DRM algorithm’s execution time as a function of the number of prosumers; and (c) overall execution time of the proposed coalitional DRM model adopting the Max-logit or B-logit RL-based algorithms for an increasing number of prosumers.

Figure 6. Comparative evaluation: Average prosumers’ payoff as a function of increasing number of prosumers.

Figure 7. Comparative evaluation: Prosumers’ consumption as a function of the prosumers’ ID.


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


Kemp, N.; Siraj, M.S.; Tsiropoulou, E.E. Coalitional Demand Response Management in Community Energy Management Systems. Energies 2023, 16, 6363. https://doi.org/10.3390/en16176363

