1. Introduction
In recent years, technological advancements have provided consumers with a wide variety of appliances, improving their living standards. However, the ever-increasing number of appliances used in homes in consumers’ everyday lives has resulted in an unprecedented energy demand [1]. The traditional retail energy market struggles to support consumers’ energy demand in real time, and a large portion of total energy consumption stems from household appliances, e.g., 42% in the U.S. [2]. To address this supply–demand challenge, Distributed Energy Resources (DERs), such as electric vehicles (EVs), residential rooftop solar photovoltaic (PV) panels, and energy storage systems (ESSs), have transformed traditional consumers into prosumers who are capable of generating and consuming energy and exchanging/selling energy with other prosumers in a peer-to-peer manner or with the main grid [3]. The joint problem of satisfying the energy demand of residential users with a wide variety of energy consumption patterns in order to avoid high energy demand peaks is addressed by demand response management (DRM) models [4,5].
1.1. Background and Motivation
Several sophisticated DRM models have been introduced in the recent literature, such as time-of-use (TOU), price-based home management systems, critical peak pricing (CPP), and real-time pricing (RTP). These DRM models have been applied in energy markets consisting of both consumers and prosumers, considering their energy consumption and generation characteristics [6]. Focusing on efficiently exploiting prosumers’ renewable energy production and DERs’ flexibility, a new energy market paradigm has recently been introduced based on the principles of peer-to-peer (P2P) transactive energy trading. In this new energy market paradigm, prosumers are organized in communities forming P2P energy trading systems, which are coordinated following either system-centric or prosumer-centric approaches [7]. However, the existing prosumer community creation approaches mainly group prosumers into communities based on their physical proximity, without exploiting their energy generation and consumption characteristics to reveal the full potential of the communities in terms of balancing customers’ energy needs in a smart grid system [8].
In this paper, towards addressing the above challenges, we introduce a novel coalitional demand response management model to reveal the full potential of prosumer communities in terms of addressing their energy demands. The proposed coalitional DRM model consists of two main components: (i) the hedonic game-theoretic communities formation framework and (ii) the reinforcement-learning-based DRM mechanism. Specifically, prosumers’ energy generation and consumption characteristics are exploited along with the smart grid operator’s/utility company’s provided rewards and the information availability in the overall smart grid system in order to form communities among prosumers following the principles of hedonic games [9]. Then, a reinforcement-learning-based mechanism is designed to determine each residential prosumer’s optimal consumption in order to optimize its perceived satisfaction while accounting for its energy demand characteristics. A detailed evaluation of the proposed coalitional DRM model is performed based on real data analysis in the southwest area of the USA.
1.2. Related Work
Several artificial-intelligence-inspired DRM models have been introduced in the recent literature focusing on residential consumer use case scenarios [10,11]. A time-of-use DRM model is introduced in [12], enabled by a model-free deep reinforcement learning mechanism with a dueling deep Q-network structure, to optimize the management of the interruptible load considering a variety of consumers’ energy consumption patterns. A similar reinforcement-learning-enabled time-of-use DRM model is proposed in [13] focusing on the utility companies’ perspective in terms of predicting the optimal energy prices and discounts offered to consumers. Focusing on the home energy management systems as part of the DRM models, a double deep-Q-learning mechanism is proposed in [14] to perform scheduling of home energy appliances considering consumers’ energy needs and utility companies’ announced prices. A similar mechanism is designed in [15] following a deep-Q-network approach and focusing on the time-shiftable appliances, such as electric vehicles, lighting systems, and air conditioners. A residential DRM model based on reinforcement learning and fuzzy reasoning is discussed in [16] considering consumers’ preferences in order to schedule the operation of smart home appliances, also as a function of energy price. An hour-ahead DRM model is analyzed in [17] based on an artificial neural network approach to predict energy price and a multiagent reinforcement learning mechanism in order to perform scheduling of home appliances. A federated learning DRM model is introduced in [18] by aggregating local models from multiple utility companies to train a global smart grid model that performs energy price prediction.
Several recent approaches have also focused on privacy issues related to the design of DRM models targeted at consumers or prosumers [19]. A modified vector homomorphic encryption is analyzed in [20] in order to perform a secure load profiling of consumers based on encrypted meter data. Aiming at protecting consumers’ privacy, a DRM model is designed in [21] to learn an intelligent multi-microgrid system’s energy price response by implementing a deep neural network without direct access to consumers’ private energy consumption information.
Focusing on the DRM models specifically designed to address residential prosumers’ energy needs, great attention has been given to the analysis of their energy generation and consumption characteristics [22]. A deep reinforcement learning model is designed in [23] to optimize the energy consumption of a household equipped with several DERs towards reducing prosumers’ energy cost while accounting for their comfort-level characteristics. The retail and wholesale energy markets are analyzed in [24] by introducing a reinforcement-learning-based, price-based DRM mechanism that enables the energy management system to determine its optimal retail market energy price and prosumers’ energy consumption to jointly maximize profit and prosumer utility. A reinforcement-learning-based DRM model is introduced in [25] to perform energy scheduling of smart homes’ energy storage systems in order to minimize energy cost given announced energy prices.
The P2P transactive energy trading concept has recently attracted the interest of the research community, enabling prosumers to be organized in groups and directly exchange energy with each other through the grid [26]. A P2P transaction model is designed in [27], including a participant model, an equipment model, and a price model. A community energy management system is introduced in [28] based on a multiagent reinforcement learning approach to handle uncertainty in renewable energy and minimize energy cost. A deep reinforcement learning approach is proposed in [29] to minimize energy costs experienced by prosumers in P2P energy trading considering the dynamic nature of their energy availability. The users’ preferences and their level of engagement in a P2P energy trading market are studied in [30] based on a reinforcement learning approach that optimizes the system’s performance in terms of matching the energy resources.
1.3. Contributions and Outline
Although a substantial amount of research has been performed in the fields of DRM and community energy management systems, the problem of forming prosumers’ communities based on their energy generation and consumption characteristics, as well as accounting for the smart grid operator’s provided rewards, remains largely unexplored. Even more complicated is the problem of optimally forming prosumers’ communities while jointly determining prosumers’ energy consumption accounting for their preferences and characteristics.
Towards addressing the above challenges, in this paper, a coalitional demand response management model is introduced to support the operation of community energy management systems and optimize prosumers’ payoff via optimally forming communities among each other and determining their optimal energy consumption. To the best of our knowledge, this is the first research work in the existing literature combining the theory of hedonic games with the principles of reinforcement learning in order to create prosumer coalitions and determine optimal energy consumption, respectively. Specifically, the main contributions of this manuscript are summarized as follows:
A community energy management system is introduced consisting of prosumers who are autonomously organized in communities. The community formation process accounts for prosumers’ energy generation and consumption characteristics, the available information about the prosumers’ characteristics in the overall system, and the smart grid operator’s/utility company’s allocated reward to consumers for participating in the DRM process. The community formation process is formulated as a hedonic community formation game, and the existence of a Nash-stable and individual-stable partition is proven.
A reinforcement-learning-based framework is introduced to determine prosumers’ energy consumption towards fulfilling their energy needs and maximizing their payoff from purchasing their remaining energy demand from the smart grid. Two log-linear reinforcement learning algorithms, i.e., the B-logit and the Max-logit algorithms, are adopted in order to compare their accuracy, performance, and complexity in terms of determining prosumers’ optimal energy consumption.
A detailed evaluation of the proposed coalitional DRM model is performed based on real data collected from the southwest area of the USA. The performance evaluation demonstrates the operational characteristics of the proposed DRM model along with its superiority compared with the state of the art in terms of optimizing prosumer payoff.
The remainder of this paper is organized as follows. Section 2 and Section 3 present the prosumers’ and their communities’ models, respectively. Section 4 introduces the hedonic communities formation game-theoretic framework. Section 5 analyzes the reinforcement-learning-based DRM model to determine prosumers’ optimal energy consumption. The performance evaluation is demonstrated in Section 6, and Section 7 concludes this paper.
2. Prosumer Characteristics
In this section, prosumers’ energy generation and consumption characteristics are analyzed. We consider a set of residential prosumers residing in a geographical area and being able to exchange energy with each other through the infrastructure of the smart grid network. Each prosumer is equipped with several DERs that enable them to generate, e.g., residential rooftop solar photovoltaic panels, and store energy, e.g., electric vehicles and energy storage systems. Based on the available DERs, each prosumer can generate (kWh) energy per timeslot t, e.g., one hour. We study the overall system for a total set of timeslots , e.g., h one-day duration.
Each prosumer has a set of appliances
that consume energy
(kWh) when they continuously operate. Some of the appliances can be characterized by shiftable operation, e.g., charging of electric vehicles and running dishwashers, and some of them by nonshiftable operation, e.g., refrigerators. Based on each prosumer’s shiftable and nonshiftable appliances’ energy demand characteristics, they can adapt their energy consumption following an intelligent DRM model. The total energy demand of a prosumer is determined by the energy consumption of their appliances that operate in the timeslot, as follows:
where
(kWh) denotes the prosumer’s energy demand in timeslot
t,
denotes the prosumer’s appliance with
, and
denotes an operator, where
if the appliance
operates during the timeslot
t; otherwise,
. The prosumers’ energy demand vector in a timeslot
t in the examined area is denoted as
and their corresponding energy generation vector as
.
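As a minimal sketch of the demand relation described above, the following Python snippet sums the energy of the appliances whose operation indicator equals 1 in a timeslot. The function name, the argument names, and the sample appliance values are illustrative assumptions, not notation or data from this paper.

```python
from typing import Sequence


def total_demand_kwh(appliance_energy_kwh: Sequence[float],
                     is_operating: Sequence[int]) -> float:
    """Sum the energy (kWh) of the appliances that operate in a timeslot.

    is_operating[a] is 1 if appliance a runs in the timeslot and 0 otherwise,
    mirroring the operator described in the text.
    """
    if len(appliance_energy_kwh) != len(is_operating):
        raise ValueError("one operating flag is required per appliance")
    return sum(e * x for e, x in zip(appliance_energy_kwh, is_operating))


# Hypothetical household: refrigerator, dishwasher, EV charger (kWh per timeslot).
energy = [0.5, 1.5, 7.0]
# The dishwasher (shiftable) is idle in this timeslot.
demand = total_demand_kwh(energy, [1, 0, 1])
```

Shifting an appliance to another timeslot corresponds to flipping its flag from 1 to 0 in one timeslot and from 0 to 1 in another.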
A prosumer is characterized as self-sufficient in timeslot t, if . In this case, the prosumer can satisfy their own energy needs without purchasing energy from other prosumers or the utility company. The potential surplus of energy can be stored in the prosumer’s energy management system , where (kWh) denotes the prosumer’s battery availability in timeslot t, and (kWh) captures the prosumer’s actual energy consumption in timeslot t, where in general given appliances with shiftable operation. The prosumer can use the energy surplus to cover their energy needs in a future timeslot or sell it to other prosumers or to the utility company, although the latter option is outside the scope of this research work.
A prosumer is characterized as
non-self-sufficient if
. In this case, the prosumer cannot cover their energy needs in timeslot
t and buys energy from the utility company. The utility company’s price of the energy is controlled by state/country-level regulations [31] and is denoted as
. The prosumer decides to buy an amount
from the utility company in timeslot
t in order to cover their energy needs considering the energy price and flexibility with respect to the energy consumption stemming from their appliances with shiftable operation. In the rest of our analysis, we focus our study on the
non-self-sufficient prosumers who need to join a community in order to cover their energy needs by purchasing energy.
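The distinction between self-sufficient and non-self-sufficient prosumers can be illustrated with a short Python sketch. It assumes that a prosumer is self-sufficient when its generation plus any stored energy covers its demand in the timeslot; this condition, the function names, and the numbers are assumptions made for illustration. The returned shortfall is only an upper bound on the purchase, since the prosumer may buy less by shifting flexible appliances.

```python
def energy_shortfall_kwh(generation_kwh: float,
                         demand_kwh: float,
                         stored_kwh: float = 0.0) -> float:
    """Energy (kWh) that local generation and storage cannot cover."""
    available = generation_kwh + stored_kwh
    return max(0.0, demand_kwh - available)


def is_self_sufficient(generation_kwh: float,
                       demand_kwh: float,
                       stored_kwh: float = 0.0) -> bool:
    """Assumed test: no shortfall means the prosumer is self-sufficient."""
    return energy_shortfall_kwh(generation_kwh, demand_kwh, stored_kwh) == 0.0


# Hypothetical prosumer generating 2.0 kWh against a 3.5 kWh demand.
shortfall = energy_shortfall_kwh(generation_kwh=2.0, demand_kwh=3.5)
```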
3. Communities Model
The goal of the smart grid operator is to efficiently handle the prosumers’ energy demand in order not to experience a brownout or even a blackout, while at the same time maximizing its profit. A brownout occurs when the voltage supplied by a utility company drops below the normal level for a brief period. Brownouts are often caused by factors such as high demand for electricity or grid instability. Unlike a blackout, where power is completely lost, a brownout involves a partial reduction in voltage levels, which can impact the performance of electrical devices. Thus, if the prosumers’ energy demand from a utility company is not handled by intelligent demand response management mechanisms, the smart grid can experience a brownout or even a blackout. Similarly, the goal of the prosumers is to organize themselves in communities in order to experience the maximum possible benefit from the energy purchase process, e.g., discounts from the utility company. We consider that N prosumers can be organized into M communities, where the set of communities is denoted as . The prosumers are organized in communities by considering their energy generation and consumption characteristics, the rewards provided by the utility company in the form of energy price discounts, and the available information to make their decision, as analyzed below.
A community
m is characterized by its value, which is defined as follows:
where
is the set of prosumers that joined the community
m in timeslot
t. The physical meaning of the community’s value
captures the energy consumption volume that is requested by the prosumers of this community in timeslot
t.
The utility company allocates a reward
(USD) in the form of energy price discounts at each community
m to incentivize the prosumers to purchase energy directly from the utility company. The utility company can sophisticatedly decide the allocation of the rewards
per community based on multiple factors, such as energy demand in a geographical area, distribution cost of the energy in specific geographical areas, service priority of specific areas (e.g., hospitals, schools, and in general critical infrastructure), and others. The reward is allocated among the prosumers of the community following the principles of proportional fairness as follows:
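One possible reading of this proportional-fair allocation, sketched in Python, is that each prosumer receives a fraction of the community reward equal to its share of the community value, i.e., of the total energy requested by the community. The function names, the prosumer labels, and the numbers are hypothetical, and the exact expression used in this paper may differ.

```python
from typing import Dict


def community_value(consumption_kwh: Dict[str, float]) -> float:
    """Community value: total energy (kWh) requested by its members."""
    return sum(consumption_kwh.values())


def split_reward(reward_usd: float,
                 consumption_kwh: Dict[str, float]) -> Dict[str, float]:
    """Split the community reward in proportion to each member's consumption."""
    value = community_value(consumption_kwh)
    if value <= 0:
        raise ValueError("community value must be positive")
    return {n: reward_usd * e / value for n, e in consumption_kwh.items()}


# Hypothetical community of two prosumers sharing a 10 USD reward.
shares = split_reward(10.0, {"p1": 2.0, "p2": 6.0})
```

Under this reading, a prosumer's share shrinks when the other members of its community increase their consumption, because the community value in the denominator grows.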
However, the prosumers that have selected a community
m in timeslot
t are not aware of the reward that other prosumers received by joining another community
based on their energy consumption characteristics
. Thus, we define the energy consumption uncertainty among the prosumers’ communities as
. The lower the value of
, the noisier the information; thus, the prosumers have very vague information about the energy consumption characteristics and experienced rewards in other communities. Based on the consumption uncertainty at each community
, the prosumers belonging to each community
are informed in a noisy manner about the energy consumption characteristics and experienced rewards in community
m. Thus, we define the noisy energy consumption as follows:
where
. The noisy energy consumption parameter captures the level of information incompleteness experienced by the prosumers belonging to a community
regarding the energy consumption characteristics and rewards of the prosumer belonging to community
m.
Also, the most recent information regarding the energy consumption characteristics and rewards experienced by the prosumers in a community
m carries more weight in other prosumers’ decision to join this community. Thus, we define the freshness fading function to weigh the most recent information more as follows:
where
,
t denotes all the timeslots until the current timeslot that the system had studied, and
denotes the individual passed timeslot up to the current timeslot
t.
Based on the freshness fading function, we define the fading-aware consumption uncertainty as follows:
which captures the level of uncertainty regarding prosumers’ energy consumption and rewards in a community
experienced by the prosumers belonging to community
m.
Thus, we introduce the fading-aware noisy energy consumption as follows:
that captures the freshness of information within prosumers’ noisy energy consumption parameter.
By combining the fading-aware consumption uncertainty (Equation (6)) and the fading-aware noisy energy consumption (Equation (7)), we introduce the captivation parameter of community m as follows:
where
, with
. The captivation parameter captures the attractiveness of a community in terms of attracting prosumers to join it, given that they will experience high rewards and they have the potential of exchanging energy with other prosumers who have the potential of high energy surpluses. The captivation parameter (Equation (
8)) depends on the fading-aware noisy energy consumption
and the fading-aware consumption uncertainty
, which, respectively, depend on the energy consumption uncertainty
. Thus, its value is determined based on the prosumers’ energy consumption characteristics and the information availability within the examined smart grid system.
By combining the prosumer’s experienced reward and the captivation parameter from joining a community
m, we can define the prosumer’s corresponding pure payoff from belonging to a community as follows:
where
captures the equivalent revenue benefit of the prosumer by joining a community.
Also, the prosumer has a corresponding cost from purchasing energy
regarding the community that it belongs to, and it is defined as follows:
Thus, the prosumer’s payoff from joining a community
m is derived as follows:
4. Hedonic Communities Formation
In this section, the theory of hedonic games is exploited in order to autonomously organize prosumers into communities, thus forming a community energy management system. The prosumers’ energy generation and consumption characteristics, along with the utility company’s provided rewards and information availability in the overall smart grid system, are jointly considered in the communities formation process, which is autonomously performed by the prosumers.
Definition 1. (Community): A community of prosumers is denoted as , where m is the community’s index.
Definition 2. (Prosumers’ Partition): Considering the total number of communities M, with , and , the partition of that spans all the prosumers in is defined as .
The available prosumers’ partitions can be categorized into the following special cases.
Definition 3. (Grand Community): If all the prosumers are organized in only one community, then this community is called a grand community.
Definition 4. (Singleton Community): If each prosumer creates its own community without any other prosumers belonging to it, then this community is called a singleton community.
Definition 5. (Empty Community): If no prosumers belong to a community, then this community is called an empty community.
Each prosumer
has its own preferences over all the communities that they can possibly join, as derived by their experienced payoff (Equation (
11)). Thus, the prosumers compare the potential payoffs that they can enjoy by joining different communities and order the latter ones in terms of their preferences.
Definition 6. (Preference Order): A preference relation is defined for each prosumer n, , as a reflexive, complete, and binary relation over all the potential communities that each prosumer can join as follows: where is given by Equation (11). If a prosumer strictly prefers to join community over , then the following expression should hold true:
The prosumers participate in a non-cooperative game in order to determine their optimal community choice, as quantified by the corresponding payoff that they experience, as captured by Equation (11).
Definition 7. (Hedonic Game): A hedonic game is defined by the pair , where denotes the set of prosumers, and is the preference order vector of the prosumers. The prosumers’ payoff depends only on the prosumers of the community that the prosumer n belongs to, and the communities partition results from the prosumers’ preferences over all the potential communities M.
During the execution of the hedonic game, the prosumers dynamically switch communities until they converge to a stable partition.
Definition 8. (Switching Operation): Given a prosumers’ partition , a prosumer switches communities from to if and only if: thus, and .
The goal of the prosumers is to converge to a Nash-stable and individual-stable partition so they cannot further improve their payoff given the communities’ choices that the rest of the prosumers have selected, thus achieving the maximum personal payoff by following their own community choice.
Definition 9. (Nash-stable Partition): A partition is Nash-stable if .
The physical meaning of a Nash-stable partition is that no prosumer wants to change community, given the communities’ choices of the rest of the prosumers, as they will not receive a higher payoff (Equation (
11)).
Definition 10. (Individual-stable Partition): A partition is individual-stable if and only if the following conditions do not hold true:
- (i)
, meaning there does not exist a prosumer n in community who strictly prefers another community ;
- (ii)
, meaning that the formation of a new community does not reduce the preference payoffs of the members of the new community .
Based on Definitions 9 and 10, it is easily observed that a Nash-stable partition is also an individual-stable partition, while the converse does not always hold true.
Theorem 1. (Existence of a Nash-stable and Individual-stable Partition): Given a random initial partition , the proposed hedonic game always converges to a Nash-stable and individual-stable partition .
Proof. The proof of Theorem 1 proceeds by reductio ad absurdum. Suppose that the final partition is not Nash-stable. Then, at least one prosumer has an incentive to switch communities in order to experience a higher payoff. Such a prosumer will follow the switching operation process (Definition 8) and join another community, where they have the potential of enjoying a higher payoff. Given that switching operations still go on, the partition is not final, which contradicts the assumption that it is the final partition. Therefore, the hedonic game always converges to a final partition , which is Nash-stable; thus, it is also individual-stable. □
The hedonic game’s algorithm in order to determine the Nash-stable and individual-stable partition is presented in the next section, along with the reinforcement learning algorithm to determine prosumers’ energy consumption.
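The switching dynamics behind Definitions 8 and 9 can be sketched in Python as follows. This is a simplified illustration, not Algorithm 1: it assumes that a prosumer switches whenever another community offers a strictly higher payoff, and it replaces the payoff of Equation (11) with a hypothetical proportional reward share. All names, the `max_rounds` cap, and the numbers are assumptions.

```python
from typing import Callable, List, Sequence

Payoff = Callable[[int, int, List[int]], float]


def proportional_share_payoff(rewards: Sequence[float],
                              consumption: Sequence[float]) -> Payoff:
    """Hypothetical payoff: prosumer n's share of community m's reward."""
    def payoff(n: int, m: int, assignment: List[int]) -> float:
        # Members of m if prosumer n were (or already is) part of it.
        members = [k for k, c in enumerate(assignment) if c == m and k != n]
        volume = consumption[n] + sum(consumption[k] for k in members)
        return rewards[m] * consumption[n] / volume
    return payoff


def best_community(n: int, assignment: List[int],
                   num_communities: int, payoff: Payoff) -> int:
    """Community with the strictly highest payoff for n (current one on ties)."""
    best_m = assignment[n]
    best_val = payoff(n, best_m, assignment)
    for m in range(num_communities):
        val = payoff(n, m, assignment)
        if val > best_val:
            best_m, best_val = m, val
    return best_m


def is_nash_stable(assignment: List[int], num_communities: int,
                   payoff: Payoff) -> bool:
    """True if no prosumer strictly gains by unilaterally switching."""
    return all(best_community(n, assignment, num_communities, payoff) == c
               for n, c in enumerate(assignment))


def form_communities(initial: Sequence[int], num_communities: int,
                     payoff: Payoff, max_rounds: int = 100) -> List[int]:
    """Let prosumers switch in turn until a full round has no switch."""
    assignment = list(initial)
    for _ in range(max_rounds):  # safety cap, an assumption of this sketch
        switched = False
        for n in range(len(assignment)):
            target = best_community(n, assignment, num_communities, payoff)
            if target != assignment[n]:
                assignment[n] = target
                switched = True
        if not switched:
            break
    return assignment


# Four identical prosumers, two communities with rewards of 6 and 2 USD.
payoff = proportional_share_payoff([6.0, 2.0], [1.0, 1.0, 1.0, 1.0])
partition = form_communities([0, 0, 0, 0], 2, payoff)
```

In this toy instance, one prosumer leaves the crowded community for the smaller reward, after which no prosumer can gain by moving, so the resulting partition is Nash-stable.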
5. Reinforcement-Learning-Based Demand Response Management
In this section, a reinforcement-learning-based mechanism is introduced to enable prosumers to determine their optimal energy consumption to maximize their utility from purchasing and consuming energy, while also considering the energy tariff announced by the utility company.
The prosumer’s utility from purchasing and consuming energy is defined as follows:
where
denotes the energy consumption vector of all prosumers, excluding prosumer
n, and
is used as a constant unit-mapping parameter. The function
is the prosumer’s pure utility from consuming the purchased energy, and it is a strictly increasing and concave function with respect to the prosumer’s energy consumption
, e.g.,
. The physical meaning of the prosumer’s pure utility function is that it increases as more energy is purchased by the prosumer to cover its nonshiftable energy demand, and the curve is also concave as the prosumer reaches its maximum energy demand
at timeslot
t. Also, the prosumer’s pure utility from consuming energy is decreasing with respect to the total energy consumption of the rest of the prosumers in the examined smart grid system. This formulation is reasonable given that if the overall energy consumption in the smart grid system increases, then the utility company will eventually increase the energy price, which will decrease the prosumer’s utility. The function
captures the discount dissatisfaction, and it is a strictly decreasing and concave function with respect to the prosumer’s energy consumption
, e.g.,
,
. The physical meaning of the discount dissatisfaction function captures the phenomenon of having the communities reward
shared in a larger portion among the prosumers, who will finally enjoy a smaller percentage of the allocated reward if the prosumers in the same community increase their energy consumption (Equation (
3)). The function
captures prosumer’s cost to purchase energy at a price
, e.g.,
. The last term of Equation (
14) quantifies the prosumer’s received reward from participating in community
m, as determined in
Section 4.
The goal of each prosumer is to maximize its experienced utility by purchasing an optimal amount of energy, while considering its energy demand characteristics. Thus, the corresponding optimization problem is formulated as follows:
Given the decentralized nature of the smart grid system and respecting the prosumers’ privacy concerns, there is no centralized entity in the system to address the optimization problems (
15a) and (
15b) and determine prosumers’ optimal energy consumption
at each timeslot
t. Thus, a reinforcement-learning-based mechanism is a natural choice in order to determine in an autonomous and distributed manner prosumers’ optimal energy consumption
at each timeslot
t. The theory of log-linear reinforcement learning is adopted by exploiting the B-logit and the Max-logit algorithms. Both of them enable prosumers to select the optimal strategy
that maximizes their utility (Equation (14)) by autonomously performing the exploration and exploitation processes. The benefit of the Max-logit algorithm is that it can determine the Pareto-optimal solution
, if it exists. The probability update rules of selecting a strategy
for the Max-logit and the B-logit algorithms are presented in Equations (
16a), (
16b), (
17a) and (
17b), respectively, while
denotes the learning parameter, and
i denotes the reinforcement learning algorithm’s iteration until it converges to
.
The physical meaning of the above probability update rules of selecting a strategy
is that prosumers select with equal probability an alternative strategy
during the exploration phase and receive a corresponding payoff
. Then, during the exploitation phase, they learn through the probabilistic rules (
16a) and (
16b) for the Max-logit and (
17a) and (
17b) for the B-logit, which is the best strategy
to probabilistically select in the next iteration of the reinforcement learning algorithm. The Max-logit and B-logit algorithms both converge to the optimal solution
. Detailed results are presented in
Section 6.
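To make the exploration and exploitation steps more concrete, the following Python sketch implements the acceptance probabilities of binary log-linear learning (B-logit) and Max-logit in the forms commonly found in the literature. These forms are an assumption and may differ in detail from Equations (16a), (16b), (17a) and (17b); the function names, the parameter `beta` (standing in for the learning parameter), and the `utility` callable are hypothetical.

```python
import math
import random
from typing import Callable, Sequence


def b_logit_accept(u_current: float, u_trial: float, beta: float) -> float:
    """B-logit: exp(b*u_trial) / (exp(b*u_current) + exp(b*u_trial))."""
    diff = beta * (u_trial - u_current)
    # Evaluate the logistic function without overflowing math.exp.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def max_logit_accept(u_current: float, u_trial: float, beta: float) -> float:
    """Max-logit: exp(b*u_trial) / max(exp(b*u_current), exp(b*u_trial))."""
    return math.exp(min(0.0, beta * (u_trial - u_current)))


def learning_step(current: float,
                  strategies: Sequence[float],
                  utility: Callable[[float], float],
                  beta: float,
                  accept: Callable[[float, float, float], float],
                  rng: random.Random) -> float:
    """One iteration: explore a uniformly drawn strategy, then keep or drop it."""
    trial = rng.choice(strategies)  # exploration with equal probability
    p = accept(utility(current), utility(trial), beta)
    return trial if rng.random() < p else current  # exploitation
```

Under these forms, Max-logit always adopts a trial strategy that is at least as good as the current one, whereas B-logit adopts it only with a probability that grows with the utility gap; a larger `beta` makes both rules greedier.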
The coalitional DRM algorithm that determines both the Nash-stable and individual-stable prosumers’ partition
to communities following the principles of hedonic games and the prosumers’ optimal energy consumption
based on the proposed reinforcement learning mechanism is presented in Algorithm 1. An overview of the overall proposed model is presented in
Figure 1.
Algorithm 1 Coalitional DRM Algorithm
1: Input:
2: Output: ,
3: Initialization: , create an initial partition by randomly allocating prosumers N to communities M. if prosumer n switches communities, and if prosumer n does not switch communities. , , .
4: while do
5:   while do
6:     for to N do
7:       Determine ;
8:       for each community , do
9:         Prosumer n joins ,
10:         Update and determine ;
11:         if and then
12:           Prosumer n switches from and joins , ;
13:         else
14:           Prosumer n does not switch to , and remains at , ;
15:         end if
16:       end for
17:     end for
18:     Update , and determine , ;
19:   end while
20:   for N do
21:     Prosumer n selects with equal probability among all the possible energy consumption strategies and the rest of the prosumers keep their previous consumption, i.e., .
22:     Prosumer n receives a utility and updates based on Equations (16a) and (16b) for the Max-logit algorithm (Equations (17a) and (17b) for the B-logit algorithm).
23:   end for
24:   if , small positive number then
25:
26:   end if
27:
28: end while
29: return ,