Review

Review and Evaluation of Multi-Agent Control Applications for Energy Management in Buildings

by Panagiotis Michailidis 1,2,*,†, Iakovos Michailidis 1,2,† and Elias Kosmatopoulos 1,2,*,†

1 Center for Research and Technology Hellas, Thermi, 57001 Thessaloniki, Greece
2 Department of Electrical and Computer Engineering, Democritus University of Thrace, 67100 Xanthi, Greece
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Energies 2024, 17(19), 4835; https://doi.org/10.3390/en17194835
Submission received: 20 August 2024 / Revised: 22 September 2024 / Accepted: 23 September 2024 / Published: 26 September 2024

Abstract:
The current paper presents a comprehensive review analysis of multi-agent control methodologies for Integrated Building Energy Management Systems (IBEMSs), considering combinations of multi-diverse equipment such as Heating, Ventilation, and Air Conditioning (HVAC), domestic hot water (DHW), lighting systems (LS), renewable energy sources (RES), energy storage systems (ESS), as well as electric vehicles (EVs), integrated at the building level. Grounded in the evaluation of key control methodologies—such as Model Predictive Control (MPC) and reinforcement learning (RL), along with their synergistic hybrid integration—the current study integrates a large number of impactful applications of the last decade and evaluates their contribution to the field of energy management in buildings. To this end, over seventy key scholarly papers from the 2014–2024 period have been integrated and analyzed to provide a holistic evaluation of different areas of interest, including the utilized algorithms, agent interactions, energy system types, building typologies, application types, and simulation tools. Moreover, by analyzing the latest advancements in the field, a fruitful trend identification is conducted in the realm of multi-agent control for IBEMS frameworks, highlighting the most prominent solutions to achieve sustainability and energy efficiency.

1. Introduction

1.1. Motivation

According to the literature, buildings are responsible for a substantial 36% of worldwide energy usage [1]. In this context, the management of integrated building energy systems—such as HVAC, DHW, LS, RES, ESS, and EVs—is critical for enhancing energy efficiency and occupant comfort while addressing environmental concerns [2,3,4,5]. Initially, Building Energy Management Systems (BEMSs) relied on manual controls; however, as building complexity and energy demands escalated, the limitations of these methods became apparent and automated systems became necessary to achieve greater efficiency, precision, and scalability in managing energy utilization [6,7,8]. To this end, the primary objective for engineers toward automated BEMSs was to create tools able to dynamically adapt to fluctuations in environmental conditions and occupancy levels, thereby minimizing inefficiencies and optimizing energy performance—in other words, the goal was to evolve automated BEMSs into fully autonomous systems, capable of making independent decisions based on real-time data and dynamically responding to changing conditions [9,10,11]. Consequently, numerous control methodologies have emerged to achieve this transformation, including machine learning algorithms. Such advanced techniques enable BEMSs not only to follow predefined rules but also to learn from historical data, predict future conditions, and make real-time adjustments [6,10].
However, the performance requirements of building occupants gradually evolved, necessitating the integration of multiple energy systems at the building level. This led to the development of Integrated Building Energy Management Systems (IBEMSs), which incorporate multiple heterogeneous energy systems (HVAC, RES, ESS, LS, DHW, and EV), forming a System of Systems (SoS) of high complexity [12,13,14,15]. Managing such a complex and interconnected SoS required more advanced control strategies to effectively handle the intricate energy dynamics and ensure optimal performance; as traditional centralized control methods struggled to manage the diverse and dynamic interactions within SoS frameworks, the need for more flexible and adaptive strategies became clear.
Inspired by concepts from distributed computing and artificial intelligence, multi-agent control emerged as a fruitful solution to the increasing complexity of IBEMS management and other SoS frameworks such as industrial automation [16,17,18], traffic management [19,20], swarm navigation of robots [21,22,23,24], smart grid balance [25,26,27,28,29], etc. Such approaches leverage a network of autonomous agents, each responsible for managing a specific energy subsystem, such as HVAC, lighting, or renewable energy sources. By enabling these agents to make local decisions while coordinating with one another, multi-agent control provides a more scalable and resilient solution for optimizing energy performance in complex building environments. This decentralized philosophy not only enhances system responsiveness but also improves fault tolerance and adaptability, making it particularly effective in managing the intricate and variable energy demands of modern buildings.
Multi-agent control strategies are primarily classified into model-based and model-free control, each with distinct methodologies and applications [30,31,32]. Model-based control relies on detailed mathematical models to represent the behavior and interactions of energy subsystems. Techniques like Model Predictive Control (MPC) are commonly used, where agents utilize these models to predict future states and optimize decision-making over time [31,32]. This approach provides high precision and predictive capability but requires accurate models and substantial computational power, making it well suited for environments with stable and well-understood dynamics. On the other hand, model-free control bypasses the need for explicit models, instead using adaptive, data-driven methodologies like reinforcement learning (RL) [30,31,32,33,34]. In this approach, agents learn optimal behaviors by interacting with the environment and receiving feedback in the form of rewards or penalties. This makes model-free control highly flexible and capable of adapting to complex or changing conditions, though it may involve a longer learning process and potentially less precision during the initial phase [30,31,32,35,36]. These complementary approaches allow multi-agent systems to handle the diverse and dynamic challenges of modern IBEMS.
However, it is important to note that the choice between model-based and model-free control in multi-agent frameworks is guided by the nature of the energy systems and the specific goals of the IBEMS. For instance, model-based control is favored when the system dynamics are well understood and can be accurately modeled, enabling precise and predictive management [30,31,32,35,37]. Such an approach is highly effective in stable and predictable environments, where accurate models may be leveraged to optimize system performance. However, it should be noted that it commonly requires significant computational resources and relies heavily on the accuracy of the underlying models. On the other hand, model-free control is more suitable for environments characterized by complexity, uncertainty, or frequent changes. The model-free approach allows agents to learn optimal behaviors directly from data through adaptive techniques, making it highly flexible and capable of responding to dynamic conditions. However, achieving optimal performance with such adaptive control schemes commonly requires a significant amount of time, especially during the initial learning stage [36,37].
It is important to note that between 2014 and 2024, and especially in recent years, multi-agent control approaches comprise a noticeable share of the overall control approaches in the literature [30]. Such a tendency is grounded in numerous reasons, such as the following:
  • The sophistication that multi-agent control for Integrated BEMS (IBEMS) may offer: the need for scalable and decentralized control solutions has become more pressing as building systems have grown in complexity, and multi-agent systems naturally address this requirement by allowing distributed decision-making across large or networked equipment.
  • Advancements in computational power and edge computing: the rise of edge computing and increased computational power have made it feasible to implement complex multi-agent control in real-world settings. Contrary to old-fashioned practices, agents can now process and analyze data locally, leading to faster decision-making and reducing the reliance on centralized control systems.
  • The emergence of novel multi-agent control schemes: approaches such as RL and Deep Learning have made it possible to design more sophisticated and adaptive multi-agent systems able to handle complex tasks, learn from dynamic environments, and optimize energy management more efficiently.
  • The proliferation of IoT and sensor networks: the widespread deployment of Internet of Things (IoT) devices and sensor networks in buildings has provided real-time data, helping multi-agent methodologies make informed, data-driven decisions about energy management and enhancing the ability to respond to changing conditions and user behaviors.
  • The increased demand for energy efficiency and sustainability: growing concerns about climate change and the need for sustainable energy practices have pushed the building industry to adopt more efficient energy management systems. Multi-agent control offers a scalable and flexible approach to optimizing energy use, integrating RES and other equipment more efficiently into the energy mix to reduce the overall carbon footprint of buildings.
In exploring multi-agent control, this review aims to delve into the most prominent model-based and model-free control strategies in IBEMS, examining their applications, effectiveness, and associated challenges. By analyzing the most impactful literature works from the past decade (2014–2024), the current paper has as its primary aim to provide insights into the utilization of these novel methodologies and their potential impact on future energy management in buildings.

1.2. Literature Analysis Approach

The current review aims to meticulously explore the influential studies on model-free and model-based multi-agent control approaches towards multifunctional IBEMS frameworks, seeking to draw out significant trends, conclusions, and future research directions in this field. It methodically delves into a diverse range of research works of the last decade (2014–2024), illustrating their foundational concepts, management strategies, applied algorithms, and specific implementations toward IBEMSs. By categorizing studies based on (a) the multi-agent control strategies; (b) the various combinations of energy systems; (c) their testbed building characteristics; and (d) the simulation tools that have been utilized, this study provides a comprehensive overview of established trends in the field. The systematic approach of this review ensures a detailed analysis and evaluation of each selected study.
  • Article Selection: Primarily utilizing databases like Scopus and Google Scholar, this research involved a preliminary review of over 500 papers through abstracts, from which the most pertinent were chosen for in-depth examination.
  • Keyword Research: Extensive keyword research was conducted, encompassing phrases like “Model-free Decentralized control in IBEMS”, “Adaptive Decentralized control in IBEMS”, “Multi-agent Control in IBEMS”, “Distributed Control in BEMS”, and specific terms related to each subsystem. This strategy ensures the capture of the diverse challenges and dynamics in optimizing IBEMS performance.
  • Data Collection: Data from each publication were categorized, focusing on the multi-agent control techniques used for IBEMS management and their application context. Considerations included the advantages, limitations, and practical implications, especially in optimal IBEMS management scenarios.
  • Quality Assessment: Each selected study underwent a thorough quality evaluation based on citation count, the academic contribution of the authors, and the research methodologies used. This helped determine the relative significance and impact of each piece of research. It should be noted that the papers from the last decade that have been selected scored more than 10 citations according to Scopus. This threshold ensured a large pool of integrated papers and directed the evaluation and final conclusions towards the field of multi-agent control in IBEMSs.
  • Data Synthesis: Finally, the gathered insights were organized into distinct categories, facilitating straightforward comparisons and a clear understanding of the research landscape in model-based and model-free multi-agent control for IBEMS.

1.3. Previous Work

The literature exhibits numerous review works concerning the control of different BEMSs, evaluating model-based and model-free approaches in order to embrace autonomy and intelligence. In [38], Mahela et al. studied multi-agent systems (MAS) for controlling smart grids, examining their applications in energy management, pricing, scheduling, and reliability. The review highlighted the advantages of MAS in enhancing smart grid operations, including communication, security, and integration with electric vehicles and building energy systems. In [39], Naylor et al. described the gap between designed and actual building energy use, emphasizing the need to consider occupants’ influence on energy consumption. This effort reviewed the existing building control systems and explored research on advanced occupant-centric controls, comparing different methods from simple presence-based lighting to complex predictive control. The findings indicated that a balance is required between system complexity and energy-saving potential. Merabet et al. [40] evaluated different AI techniques in HVAC systems to improve energy efficiency and comfort, noting AI’s potential despite current limitations like data quality. Covering a wide range of studies, the review concluded that AI techniques contributed to significant energy and comfort improvements but also outlined challenges and future research directions for optimizing building energy use and occupant comfort with AI.
An important work by Kathirgamanathan et al. [41] explored how data-driven control, enhanced by Internet of Things technology, may improve energy management in buildings for better smart grid integration, addressing challenges in modeling diverse building stocks and emphasizing unexplored areas in model integration and energy flexibility. This work also identified research gaps and future directions for efficient grid integration. Also, Michailidis et al. [30] provided a detailed examination of the recent progress in HVAC control for energy savings and comfort in buildings, focusing on model-free methods like reinforcement learning, neural networks, and fuzzy logic for HVAC management. By analyzing key studies from recent years, the paper provided an evaluation of different fields, including HVAC use and building zoning, and identified trends in HVAC control.

1.4. Novelty and Contribution

The current review stands out in several key ways from the existing literature on multi-agent control in IBEMS. First, it is the only one specifically focused on multi-agent control frameworks for managing various BEMS components: HVAC, DHW, LS, RES, ESS, EVs, and more. Unlike previous studies, the current approach involves a large-scale comprehensive examination and analysis of applications—notably, more than 70 research works from 2014 to 2024 have been analyzed and evaluated in order to extract fruitful conclusions for multi-agent control in the field of energy management in buildings. It should be noted that almost every work from 2014 to 2024 that scored above ten citations (>10) in the field of multi-agent control in IBEMS has been integrated in this paper, in order to support the conclusions in Section 5.
By providing a detailed summary of the most impactful works from the past decade, the current effort highlights the impact of each intelligent control method in the field. Through summarized tables that cover the most influential works of the last decade, potential readers may easily locate research relevant to their interests and distinguish between different studies. To this end, a thorough evaluation of different aspects of research is given: primary control strategies and their subsets, IBEMS types, application type, building type, simulation tools, etc.

1.5. Paper Structure

This paper is structured as follows: Section 1 presents the motivation of the current review, the literature analysis approach, the contribution and novelty, as well as the paper structure. Section 2 illustrates a general description of IBEMS frameworks, the classification of multi-agent types considering the interaction between the agents, as well as a description of the generalized process of model-based and model-free multi-agent control strategies for IBEMS. In Section 3, the general mathematical concept of the primary control strategies and algorithmic methodologies are given in detail. Next, in Section 4, a review of the most highly cited research works between 2014 and 2024 is illustrated per control strategy along with summarized tables integrating the primary characteristics of the research works. Section 5 concerns the evaluation of the different research aspects, integrating comparisons between the prevailing methodologies, the agent interactions, the IBEMS elements, the application types, and the simulation tools that have been primarily utilized. Last but not least, the Conclusions (Section 6) summarize the most significant outcomes of this work. Figure 1 illustrates graphically the structure of this paper.

2. Integrated Building Energy Management Systems

2.1. Primary IBEMS Subsystems

Managing energy at the building level through IBEMS involves coordinating various subsystems like HVAC, DHW, LSs, RES, ESS, and EVs. These subsystems are integrated to optimize energy use, comfort, and sustainability [7,30,42].
  • HVACs: Essential for maintaining indoor climate by regulating temperature, humidity, and air quality, using components like furnaces, air conditioners, and heat pumps.
  • DHW: Provides hot water for household and commercial use, typically involving water heaters and storage tanks to ensure a steady supply.
  • LS: Lighting systems, including LEDs and smart controls, provide appropriate illumination, enhancing safety and comfort.
  • RES: Systems such as solar panels and wind turbines generate power from sustainable sources, offering a clean energy alternative.
  • ESS: Store excess energy from RES for later use, ensuring a consistent energy supply during high demand or low generation periods.
  • EVs: Integrated into a BEMS as mobile energy storage, EVs can be charged during low-demand periods and supply power during peak times, aiding in load management.
Figure 2 shows the interactions among energy subsystems within an IBEMS. The complexity and stochastic nature of these interactions require advanced control algorithms, real-time data processing, and robust communication networks [43]. Multi-agent control is effective in this context, allowing decentralized decision-making and optimizing each subsystem’s performance while maintaining overall coordination. The main challenge is to balance diverse energy demands and ensure the optimal performance of each subsystem without compromising the building’s energy efficiency and reliability.

2.2. Multi-Agent Control Types for IBEMS

According to the literature [44], multi-agent control in IBEMS can be segmented into four key approaches: centralized, decentralized, cooperative, and non-cooperative, each with distinct advantages and challenges. More specifically,
  • Centralized Multi-Agent Control (CE): In this approach, a single central agent makes decisions based on information from all other agents, enabling globally optimal solutions and highly efficient energy management. However, it is prone to scalability issues and risks a single point of failure.
  • Decentralized Multi-Agent Control (DE): Here, multiple agents operate independently, making decisions based on local observations. This offers robustness and scalability, as the system is resilient to individual agent failures. The main challenge is the potential for suboptimal global performance due to limited coordination.
  • Cooperative Multi-Agent Control (CO): Agents in this approach share information and coordinate actions to achieve common goals, balancing local and global optimization. While this enhances overall efficiency, it requires reliable communication networks and complex algorithms, which can add cost and complexity.
  • Non-Cooperative Multi-Agent Control (Non-CO): This approach is characterized by agents operating independently without coordination, offering simplicity and ease of implementation. However, it can lead to inefficiencies and conflicts, as agents may work at cross-purposes, reducing overall system performance.
It should be noted that in some cases the interaction between the agents may follow more than one scheme from the above classification. For instance, algorithmic approaches may integrate cooperative as well as decentralized schemes (DE/CO) or even centralized and cooperative schemes (CE/CO).
Figure 3 illustrates graphically the four primary multi-agent types considering the behavior of multiple agents in BEMS applications.

2.3. General Description of Multi-Agent Control Processes in IBEMS Applications

In a multi-agent IBEMS control process, both model-based and model-free approaches start with data acquisition, but they differ in execution: Model-based control uses the acquired data to refine mathematical models, predict future states, and optimize decision-making. Feedback is used to adjust optimization parameters and refine existing models, ensuring precise and efficient energy management. On the other hand, Model-free control skips detailed modeling and continuously adapts the control strategies based on real-time feedback. This approach is more dynamic, learning optimal behaviors over time without relying on predefined models. More specifically, Figure 4 and Figure 5 portray the generalized control processes for Model-based and Model-free control in IBEMS, respectively (a schematic code sketch contrasting the two loops is provided after Figure 5):
  • Model-based Control
    1. Data Acquisition: Collects real-time and historical data to capture the current state and improve model accuracy.
    2. System Modeling: Develops and refines models based on data, crucial for predicting future states.
    3. Future State Prediction: Uses models to predict future states and optimize control actions across subsystems.
    4. Control Strategy Execution: Executes optimized actions and makes real-time adjustments based on feedback.
    5. Performance Evaluation and Feedback: Monitors performance and refines models for continuous improvement.
Figure 4. The generalized model-based multi-agent control process for IBEMS.
  • Model-free Control
    1. Data Acquisition: Collects real-time and historical data for pattern recognition and learning.
    2. Learning Algorithm Initialization: Selects and initializes model-free algorithms like reinforcement learning.
    3. Real-Time Learning and Adaptation: Continuously adapts decisions based on new data and learning outcomes.
    4. Control Strategy Execution: Executes control actions independently in each subsystem based on learning outcomes.
    5. Performance Evaluation and Feedback: Monitors performance and refines algorithms for ongoing improvement.
Figure 5. The generalized model-free multi-agent control process for IBEMS.
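To make the distinction concrete, the following minimal Python sketch contrasts one cycle of each loop; every function and parameter here (the sensor stub, the toy proportional rule, the learned weight) is a schematic placeholder assumed for illustration, not part of any specific BEMS toolkit.

```python
import random

def acquire_data():
    """Stub sensor reading (assumed placeholder, not a real BEMS interface)."""
    return 20.0 + random.random()          # zone temperature in degrees C

def model_based_step(temp, model_gain, setpoint=21.0):
    """Model-based cycle: predict via an explicit model, optimize against it."""
    predicted = model_gain * temp          # steps 2-3: model-based prediction
    return setpoint - predicted            # step 4: optimize (toy proportional rule)

def model_free_step(temp, weight, setpoint=21.0, lr=0.05):
    """Model-free cycle: act from a learned weight, adapt it from feedback."""
    action = weight * (setpoint - temp)    # act without an explicit model
    feedback = setpoint - (temp + action)  # step 5: observed error after acting
    weight += lr * feedback                # steps 2-3: learn directly from feedback
    return action, weight

weight = 0.5
for step in range(3):
    temp = acquire_data()                  # step 1: data acquisition (both loops)
    mb_action = model_based_step(temp, model_gain=1.0)
    mf_action, weight = model_free_step(temp, weight)
    print(f"t={step}: model-based u={mb_action:+.2f}, model-free u={mf_action:+.2f}")
```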

3. Mathematical Concepts of Multi-Agent Control Methodologies for IBEMS

According to the recent literature, various control algorithms are commonly employed in multi-agent control for IBEMS, in both model-based and model-free approaches. Model-based approaches such as MPC are frequently used for their ability to leverage detailed system models for predictive optimization and coordinated decision-making [45,46]. On the other hand, model-free algorithms such as reinforcement learning (RL) are widely adopted for their flexibility and adaptability, allowing systems to learn and optimize control strategies without relying on predefined models [33]. Other common methodologies that may be implemented in either manner include the Alternating Direction Method of Multipliers (ADMM) [47] and evolutionary algorithms such as Particle Swarm Optimization (PSO) [48] and Genetic Algorithms (GAs) [49]. Such control strategies are often tailored to the specific requirements of the building’s energy management systems, providing a balance between precision and adaptability.

3.1. Reinforcement Learning

Reinforcement learning (RL) methodologies applied to multi-agent control in Integrated Building Energy Management Systems (IBEMS) are typically modeled using Markov Decision Processes (MDPs) [50]. An MDP is a mathematical framework used to describe environments in which agents make decisions, with the goal of optimizing a long-term objective, such as energy efficiency or occupant comfort [50]. An MDP is defined by the tuple $(S, A, P, R, \gamma)$, where
  • $S$ is the set of possible states. Each state $s \in S$ represents a specific configuration of the building environment, such as temperature levels, energy consumption, and occupant activities.
  • $A$ is the set of actions available to the agent. An action $a \in A$ could represent decisions like adjusting the HVAC system or controlling lighting levels.
  • $P(s' \mid s, a)$ is the state transition probability function, which describes the likelihood of moving to a new state $s'$ after taking action $a$ in state $s$. This probability models the dynamics of the environment.
  • $R(s, a)$ is the reward function, which assigns a numerical value to each state–action pair. The reward serves as feedback to the agent, encouraging actions that lead to desirable states (e.g., high energy efficiency or occupant comfort) and discouraging undesirable ones.
  • $\gamma \in [0, 1]$ is the discount factor, which determines the weight given to future rewards. A lower $\gamma$ prioritizes immediate rewards, while a higher $\gamma$ encourages the agent to plan for long-term benefits.
The agent’s goal is to learn a policy $\pi: S \to A$, which dictates the best action to take in each state to maximize the expected cumulative reward over time [50]. The cumulative reward, also known as the return, is defined as follows:

$$G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$$

where $G_t$ is the total reward from time step $t$ onward, and $r_{t+k+1}$ is the immediate reward received at each future time step. This formulation emphasizes the balance between short-term and long-term rewards, controlled by the discount factor $\gamma$. In multi-agent systems, each agent $i$ operates with its own policy $\pi_i(a_i \mid s)$, where $a_i \in A_i$ is the action chosen by agent $i$ based on the observed state $s$. The interaction of multiple agents introduces complexity because the optimal policy for one agent may depend on the actions taken by others. Thus, agents must learn to cooperate or compete, depending on the system’s objectives [50]. One common approach for learning the optimal policy is Q-learning, a value-based method where agents estimate the action value function $Q(s, a)$. The function $Q(s, a)$ represents the expected cumulative reward for taking action $a$ in state $s$ and following the optimal policy thereafter. The Q-learning update rule is based on the Bellman equation [50]:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R(s, a) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

Here, $\alpha$ is the learning rate, controlling how quickly the agent updates its knowledge. The term $R(s, a) + \gamma \max_{a'} Q(s', a')$ represents the target value, which combines the immediate reward $R(s, a)$ with the estimated value of the best future action $\max_{a'} Q(s', a')$. The agent iteratively adjusts its estimate of $Q(s, a)$ to minimize the difference between the current estimate and the target value (a minimal code sketch of this update follows the classification list below). By following this iterative process, agents in multi-agent systems gradually learn optimal or near-optimal policies that improve system performance over time. In IBEMS, these policies typically focus on optimizing energy consumption, reducing costs, and maintaining occupant comfort [50]. In more advanced approaches, actor–critic methods are used, where the policy (actor) and the value function (critic) are updated simultaneously. This approach helps agents converge more reliably and efficiently in large and complex environments, as it decouples the process of policy improvement from the process of value estimation [50]. In multi-agent settings, agents coordinate their actions, optimizing overall system performance through centralized or decentralized approaches. According to the literature, RL methodologies are classified into three approaches [36]:
  • Value-based: Methods like Q-learning and Deep Q-Networks (DQNs) estimate action–state values, suitable for discrete actions but limited in complex, continuous tasks.
  • Policy-based: Methods like Proximal Policy Optimization (PPO) optimize policies directly, effective for continuous control but computationally intensive.
  • Actor–critic: Combining value- and policy-based methods, techniques like Soft Actor–Critic (SAC) balance exploration and exploitation, ideal for complex, multi-agent environments but requiring careful tuning.
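To ground the tabular update rule above, the following minimal Python sketch applies value-based Q-learning to a toy single-zone heating task. The state discretization, action set, reward shaping, and hyperparameters are illustrative assumptions rather than details drawn from any reviewed work.

```python
import numpy as np

# Toy setting (assumed): states are discretized zone-temperature bins,
# actions are heater off (0) / on (1); comfort band is around bin 6.
n_states, n_actions = 10, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1       # learning rate, discount, exploration
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))         # tabular action-value estimates

def step(s, a):
    """Assumed toy dynamics: heating raises the temperature bin by one."""
    s_next = min(n_states - 1, s + 1) if a == 1 else max(0, s - 1)
    comfort = -abs(s_next - 6)              # penalize distance from set-point bin
    energy = -0.5 * a                       # penalize energy use while heating
    return s_next, comfort + energy

s = int(rng.integers(n_states))
for t in range(5000):
    # epsilon-greedy action selection
    a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    # Q-learning update: Q(s,a) += alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next

print(np.argmax(Q, axis=1))                 # learned greedy policy per bin
```

In a multi-agent deployment, each agent would maintain its own table (or network) over its local state and action spaces, with coordination handled through shared state signals or reward design.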

3.2. Model Predictive Control

Model Predictive Control (MPC) is a powerful approach commonly used in multi-agent control for Integrated Building Energy Management Systems (IBEMS), where agents optimize control actions over a finite future time horizon based on a predictive model of their subsystems [45]. MPC’s advantage lies in its ability to forecast future states and adjust control actions to minimize a predefined cost function while considering system constraints, such as energy consumption limits and occupant comfort levels. In the MPC framework, at each time step, each agent $i$ solves an optimization problem to determine the sequence of control actions $u_i(t)$ for a future time horizon $T$ [45,51]. The control strategy is designed to minimize a cost function $f_i$, which encapsulates the objectives of the agent, such as minimizing energy consumption, reducing operational costs, or maintaining indoor comfort. The general optimization problem for each agent $i$ can be expressed as follows:

$$\min_{u_i} \sum_{t=0}^{T} f_i(x_i(t), u_i(t))$$

where $u_i(t)$ represents the control inputs (e.g., HVAC settings and lighting levels), $x_i(t)$ denotes the state variables (e.g., indoor temperature and energy storage levels), and $f_i(x_i(t), u_i(t))$ is the cost function that penalizes deviations from desired performance levels [45,51].
System Dynamics and Constraints: The optimization is subject to the system dynamics, typically modeled as a discrete-time linear system [45]:

$$x_i(t+1) = A_i x_i(t) + B_i u_i(t)$$

where
  • $x_i(t) \in \mathbb{R}^{n_i}$ is the state vector at time $t$, representing the internal state of agent $i$’s subsystem (e.g., temperature in a zone or energy usage of a system);
  • $u_i(t) \in \mathbb{R}^{m_i}$ is the control input vector;
  • $A_i \in \mathbb{R}^{n_i \times n_i}$ is the state transition matrix, governing how the state evolves over time in the absence of control inputs;
  • $B_i \in \mathbb{R}^{n_i \times m_i}$ is the control input matrix, which defines how control actions influence the state evolution.
The cost function $f_i(x_i(t), u_i(t))$ is often quadratic in practice, as this allows for efficient optimization [45,51]. A common form is as follows:

$$f_i(x_i(t), u_i(t)) = (x_i(t) - x_i^{\mathrm{ref}})^{T} Q_i (x_i(t) - x_i^{\mathrm{ref}}) + u_i(t)^{T} R_i u_i(t)$$

where $x_i^{\mathrm{ref}}$ is the reference state (desired temperature, for example) and $Q_i$ and $R_i$ are weight matrices that penalize deviations from the reference and large control actions, respectively. The quadratic nature of $f_i$ makes it easier to solve the optimization problem using standard convex optimization techniques, such as quadratic programming (QP).
MPC Optimization Procedure: The MPC optimization is solved over a receding horizon; although a sequence of control actions $\{u_i(0), u_i(1), \ldots, u_i(T)\}$ is computed, only the first control action $u_i(0)$ is implemented. The process then repeats at the next time step with updated state information $x_i(t+1)$, making MPC an adaptive control strategy. This iterative process ensures that the agents can continuously adjust their actions based on real-time data, providing robustness to uncertainties and changes in the building environment [45,51].
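As an illustration of this receding-horizon loop, the sketch below solves the quadratic tracking problem for a single agent with the CVXPY modeling library; the two-state thermal model, weights, horizon, and actuator limits are assumed for demonstration and do not correspond to any reviewed case study.

```python
import cvxpy as cp
import numpy as np

# Assumed toy subsystem: states = (zone temp, wall temp), input = heating power.
A = np.array([[0.9, 0.1], [0.05, 0.95]])
B = np.array([[0.5], [0.0]])
Q = np.diag([10.0, 0.0])            # penalize zone-temperature deviation only
R = np.array([[1.0]])               # penalize control effort
x_ref = np.array([21.0, 21.0])      # desired set-point (assumed)
T = 12                              # prediction horizon (assumed)

def mpc_step(x0):
    """Solve the finite-horizon QP and return only the first control action."""
    x = cp.Variable((2, T + 1))
    u = cp.Variable((1, T))
    cost, constraints = 0, [x[:, 0] == x0]
    for t in range(T):
        cost += cp.quad_form(x[:, t] - x_ref, Q) + cp.quad_form(u[:, t], R)
        constraints += [x[:, t + 1] == A @ x[:, t] + B @ u[:, t],
                        u[:, t] >= 0, u[:, t] <= 5]     # actuator limits (assumed)
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return u.value[:, 0]            # receding horizon: apply u(0), then re-solve

x = np.array([17.0, 16.0])          # initial state
for _ in range(3):                  # closed-loop simulation of a few steps
    u0 = mpc_step(x)
    x = A @ x + B @ u0
    print(f"applied u0={u0[0]:.2f}, zone temp={x[0]:.2f}")
```

In a multi-agent configuration, each agent would run such a loop over its own subsystem model, exchanging coupling variables (e.g., shared power budgets) with its neighbors or a coordinator.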

3.3. Evolutionary Algorithms

Evolutionary algorithms (EAs) are a class of optimization methods inspired by natural evolution and are widely applied in multi-agent control for Integrated Building Energy Management Systems (IBEMS). These algorithms are particularly effective for exploring large, complex search spaces, where traditional gradient-based methods may struggle due to the non-differentiable nature of the objective functions. Two prominent evolutionary approaches used in IBEMS are Particle Swarm Optimization (PSO) and Genetic Algorithms (GAs).
Particle Swarm Optimization (PSO): PSO is a population-based optimization method where agents, known as particles, explore the search space to optimize control parameters (e.g., HVAC settings or renewable energy source (RES) integration) in a collaborative manner [48]. Each particle adjusts its position based on its own experience and that of its neighbors, making PSO particularly suitable for dynamic and uncertain environments like building energy management. The position $x_i$ and velocity $v_i$ of each particle $i$ are updated using the following equations:

$$v_i(t+1) = w\, v_i(t) + c_1 r_1 (p_i - x_i(t)) + c_2 r_2 (g - x_i(t))$$
$$x_i(t+1) = x_i(t) + v_i(t+1)$$
where
  • $v_i(t)$ is the velocity of particle $i$ at time $t$;
  • $w$ is the inertia weight that controls the exploration–exploitation trade-off;
  • $c_1$ and $c_2$ are cognitive and social coefficients, respectively;
  • $r_1$ and $r_2$ are random numbers drawn from the uniform distribution in $[0, 1]$;
  • $p_i$ is the particle’s personal best position;
  • $g$ is the global best position found by the swarm.
PSO is simple, fast, and adaptive but can suffer from premature convergence, especially when the problem landscape contains multiple local optima [48].
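A minimal sketch of the two update equations is shown below; the objective (a quadratic cost around assumed comfort set-points) and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def cost(x):
    """Assumed toy objective: deviation of two set-points from comfort targets."""
    return np.sum((x - np.array([21.0, 45.0])) ** 2)   # temp [C], humidity [%]

rng = np.random.default_rng(1)
n_particles, dim, iters = 20, 2, 100
w, c1, c2 = 0.7, 1.5, 1.5                  # inertia, cognitive, social weights

x = rng.uniform([15, 30], [28, 60], (n_particles, dim))   # positions
v = np.zeros((n_particles, dim))                           # velocities
p = x.copy()                               # personal best positions
p_cost = np.apply_along_axis(cost, 1, x)
g = p[np.argmin(p_cost)]                   # global best position

for _ in range(iters):
    r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
    v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)      # velocity update
    x = x + v                                              # position update
    c = np.apply_along_axis(cost, 1, x)
    improved = c < p_cost                                  # refresh personal bests
    p[improved], p_cost[improved] = x[improved], c[improved]
    g = p[np.argmin(p_cost)]                               # refresh global best

print(g)   # converges toward the assumed comfort targets [21, 45]
```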
Genetic Algorithms (GA): Genetic Algorithms simulate the process of natural selection, where a population of potential solutions evolves over time to optimize building energy management tasks such as scheduling energy storage systems (ESS) and optimizing domestic hot water (DHW) usage [49]. The evolutionary process follows these steps [52]:
1. Selection: Individuals are selected based on their fitness, with higher-fitness individuals more likely to reproduce.
2. Crossover: Selected individuals are paired to exchange genetic material, combining their chromosomes (solution encodings) to produce offspring.
3. Mutation: Random changes are introduced to offspring to maintain diversity and prevent premature convergence.
The overall process can be represented by the following equation:
$$\text{New Population} = \text{Select}(\text{Mutate}(\text{Crossover}(\text{Population})))$$
where the new population is formed after selection, crossover, and mutation operations are applied to the existing population. GAs are robust and flexible but can be computationally intensive, particularly in large-scale IBEMS, as the search space grows with system complexity. These evolutionary operations enable the system to converge towards optimal energy management strategies, ensuring an integrated and efficient approach to building energy management [49].
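The sketch below runs this select–crossover–mutate cycle on a toy ESS charging schedule encoded as a 24-bit string; the tariff profile, fitness function, and rates are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
pop_size, genes, p_mut = 30, 24, 0.02     # chromosome: hourly ESS charge flags

def fitness(ind):
    """Assumed toy fitness: reward charging during cheap off-peak hours 0-7."""
    price = np.where(np.arange(24) < 8, 1.0, 3.0)   # assumed tariff profile
    return -np.sum(ind * price)                     # lower cost -> higher fitness

pop = rng.integers(0, 2, (pop_size, genes))
for generation in range(50):
    fit = np.array([fitness(ind) for ind in pop])
    # Selection: binary tournaments, the fitter individual reproduces
    idx = rng.integers(pop_size, size=(pop_size, 2))
    parents = pop[np.where(fit[idx[:, 0]] > fit[idx[:, 1]], idx[:, 0], idx[:, 1])]
    # Crossover: single-point exchange between consecutive parent pairs
    children = parents.copy()
    for i in range(0, pop_size - 1, 2):
        cut = int(rng.integers(1, genes))
        children[i, cut:], children[i + 1, cut:] = (
            parents[i + 1, cut:].copy(), parents[i, cut:].copy())
    # Mutation: flip each bit with a small probability
    flips = rng.random(children.shape) < p_mut
    pop = np.where(flips, 1 - children, children)

best = pop[np.argmax([fitness(ind) for ind in pop])]
print(best)   # charging concentrated in the assumed off-peak hours
```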

3.4. Alternating Direction Method of Multipliers

The Alternating Direction Method of Multipliers (ADMM) decomposes complex optimization problems into smaller sub-problems, each managed by individual agents controlling different building systems like HVAC, RES, and ESS [53]. Agents solve these sub-problems independently and then iteratively coordinate to find a global solution by updating local variables $x_i$, global variables $z$, and dual variables $\lambda_i$ through a series of minimization steps. This iterative process continues until convergence, allowing ADMM to efficiently handle large-scale, distributed optimization tasks in IBEMS. Although ADMM is communication-intensive, requiring frequent data exchanges, its decomposition strategy is particularly suited for real-time applications in dynamic environments [53]. The general form of the optimization problem can be written as follows [54]:

$$\min_{x, z} \sum_{i=1}^{N} f_i(x_i) + g(z), \quad \text{subject to} \quad A_i x_i + B_i z = c_i$$
where
  • $x_i$ represents the local decision variables of agent $i$;
  • $z$ is a global variable shared among the agents;
  • $f_i(x_i)$ is the local cost function for agent $i$;
  • $g(z)$ is a global cost function;
  • $A_i$ and $B_i$ are matrices that define the coupling between local and global variables;
  • $c_i$ is a constant vector representing constraints.
ADMM operates by splitting the optimization problem into two parts: one for the local variables $x_i$ and another for the global variable $z$. This decomposition allows each agent to solve its own sub-problem independently, which can be computed in parallel, while a global coordination step ensures that the solutions remain consistent [54]. The iterative update steps in ADMM are as follows:
1. Update $x_i$: Each agent solves its local optimization problem based on its current state and the global variable $z$:

$$x_i^{k+1} = \arg\min_{x_i} \left( f_i(x_i) + \frac{\rho}{2} \left\| A_i x_i + B_i z^k - c_i + \lambda_i^k \right\|_2^2 \right)$$

Here, $\rho$ is a penalty parameter that controls the convergence speed, and $\lambda_i$ is the Lagrange multiplier associated with the constraints.
2. Update $z$: A central coordinator updates the global variable $z$ by solving

$$z^{k+1} = \arg\min_{z} \left( g(z) + \sum_{i=1}^{N} \frac{\rho}{2} \left\| A_i x_i^{k+1} + B_i z - c_i + \lambda_i^k \right\|_2^2 \right)$$

3. Update Lagrange multipliers $\lambda_i$: Each agent updates its local multiplier:

$$\lambda_i^{k+1} = \lambda_i^k + \rho \left( A_i x_i^{k+1} + B_i z^{k+1} - c_i \right)$$
These steps are repeated iteratively until the variables $x_i$, $z$, and $\lambda_i$ converge within a predefined threshold. The method’s ability to decompose large optimization problems into smaller, parallel tasks makes it highly efficient for large-scale multi-agent systems such as IBEMS, especially when coordination and computational scalability are critical [54]. ADMM’s decomposition allows for parallel computation of local updates and centralized updating of global variables, making it well suited to managing large-scale, diverse subsystems in IBEMS [53].
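For a concrete feel of the three update steps, the sketch below applies them to a toy consensus problem in which $N$ agents agree on a shared set-point $z$; the quadratic local and global costs, the coupling $A_i = 1$, $B_i = -1$, $c_i = 0$, and the use of scaled dual variables are all assumptions made so that each arg min has a closed form.

```python
import numpy as np

# Assumed toy problem: f_i(x_i) = (x_i - d_i)^2, g(z) = mu * z^2,
# consensus constraint x_i - z = 0 for every agent i.
N, rho, mu = 5, 1.0, 0.1
d = np.array([19.0, 20.0, 21.0, 22.0, 23.0])   # assumed local preferences
x = np.zeros(N)            # local variables
z = 0.0                    # global variable
lam = np.zeros(N)          # scaled dual variables (lambda / rho)

for k in range(100):
    # 1. Local x-updates (closed form of the quadratic arg min, run in parallel)
    x = (2 * d + rho * (z - lam)) / (2 + rho)
    # 2. Global z-update performed by the coordinator
    z = rho * np.sum(x + lam) / (2 * mu + N * rho)
    # 3. Dual updates on the consensus residual x_i - z
    lam += x - z
    if np.max(np.abs(x - z)) < 1e-6:           # convergence check
        break

print(f"consensus set-point z = {z:.3f} after {k + 1} iterations")
```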

4. Review on Multi-Agent Control for Energy Management Systems

This section presents numerous highly cited multi-agent research applications related to IBEMS control optimization, classifying them into the aforementioned sub-fields: Model-free (RL); Model-based (MPC, as well as Model-based approaches with Evolutionary, ADMM, or Lyapunov optimization); Hybrid strategies, which integrate multiple approaches; and Other model-free and model-based applications that do not fall under any of the aforementioned control strategies.
Along with analyzing the concerned highly cited applications of the last decade (2014–2024), this section also integrates summarized tables for each control sub-field. To this end, Table 1, Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 contain the following features:
  • Ref.: illustrating the reference application in the first column;
  • Year: illustrating the publication year of each research application;
  • Type: defining whether the applied multi-agent control strategy concerns the centralized (CE); decentralized (DE); cooperative (CO); or non-cooperative (Non-CO) interaction type between the agents;
  • Method: illustrating the specific RL; MPC; ADMM; evolutionary algorithms; Lyapunov optimization; hybrid; or other multi-agent algorithmic methodologies applied in each work;
  • IBEMS: listing the specific equipment that each IBEMS integrates, considering the most common devices in the building setting: HVAC, RES, ESS, LS, EVs, DHW, and Other;
  • Residential: defining if the testbed application concerns a Residential Building Control Application with an “x”;
  • Commercial: defining if the testbed application concerns a Commercial Building Control Application with an “x”;
  • Simulation: defining if the testbed application concerns a Simulative Building Control Application with an “x”;
  • Real-life: defining if the testbed application concerns a Real-world or Real-life Building Control Application with an “x”;
  • Citations: portraying the number of citations—according to Scopus—of each work.
The abbreviation “N/A” represents the “not identified” elements in tables and figures. In the following subsections, the integrated research applications are described in detail regarding their motivation, their conceptual control methodology, as well as their final outcome.
Table 1. Summarized table of Model-free control applications using RL strategies (2014–2024).

Ref. | Year | Type | Method | IBEMS | Residential | Commercial | Simulation | Real-Life | Citations
[55] | 2015 | CO | Q-learning | RES/ESS | x | | x | | 37
[56] | 2017 | CO | BL | RES/DHW/ESS | x | | x | | 218
[57] | 2017 | DE/CO | eJAL | HVAC/LS/Other | | x | x | | 50
[58] | 2018 | DE/CO | Q-learning | RES/ESS/Other | x | | | x | 179
[59] | 2019 | DE/CO | DQN | RES/ESS/Other | x | | x | | 34
[60] | 2020 | DE/CO | DQN | HVAC | x | x | x | | 77
[61] | 2020 | CO | DQN | HVAC | | x | x | | 42
[62] | 2020 | CO | A2C | HVAC/RES/ESS | x | | x | | 112
[63] | 2020 | DE | Q-learning | HVAC/RES/EVs | x | | x | | 209
[64] | 2020 | DE | ISA | HVAC/RES/Other | x | | x | | 43
[65] | 2021 | CO | MAAC | HVAC | | x | x | | 175
[66] | 2022 | CO | DQN | HVAC | | x | x | | 21
[67] | 2022 | CO | MAAC | HVAC | | x | x | | 38
[68] | 2022 | CO | D3QN | HVAC/RES/ESS | | x | x | | 50
[69] | 2022 | CO | Q-learning | HVAC | x | | x | | 23
[70] | 2022 | CE/CO | BDQ | HVAC | | x | x | x | 48
[71] | 2022 | CO | SAC | RES/ESS | x | x | x | | 21
[72] | 2022 | DE | ACKTR | RES/ESS/Other | x | | x | | 8
[73] | 2022 | CO | TD3 | RES/ESS/Other | | x | x | x | 17
[74] | 2023 | DE/CO | PPO | RES/ESS | x | x | x | | 11
[75] | 2023 | DE/CO | DDPG | HVAC/RES/ESS | x | x | x | | 30
[76] | 2023 | DE/CO | MAAC | HVAC | | x | x | | 18
[77] | 2023 | Non-CO/DE | SAC | RES/ESS | x | | x | | 11
[78] | 2023 | CO | DQN | HVAC | x | | x | | 15
Table 2. Summarized table of Model-based control applications using MPC strategies (2014–2024).

Ref. | Year | Agent | Method | IBEMS | Residential | Commercial | Simulation | Real-Life | Citations
[79] | 2014 | DE | N/A | HVAC | | x | x | | 16
[80] | 2017 | DE | E-MPC | HVAC | x | | x | | 66
[81] | 2017 | DE | Grey-Box | HVAC | x | | x | | 12
[82] | 2018 | DE/CO | Game | HVAC | x | x | x | | 13
[83] | 2018 | CO | N/A | HVAC | | x | x | | 34
[84] | 2018 | DE | EBDC | RES/EVs | x | x | x | | 71
[85] | 2019 | DE | DP | HVAC/RES/EVs | x | | x | | 11
[86] | 2020 | DE | QP | HVAC/ESS | x | | x | | 12
[87] | 2020 | DE | Naive | HVAC | x | | x | | 44
[88] | 2020 | DE | Hierarchical | HVAC/RES/ESS | x | x | x | | 18
[89] | 2020 | DE | DP | HVAC | | x | x | | 31
[90] | 2021 | DE | N/A | HVAC | | x | x | x | 28
[91] | 2022 | CO | DD | HVAC/ESS/Other | | x | x | x | 29
Table 3. Summarized table of Model-based control applications with Evolutionary optimization (2014–2024).

Ref. | Year | Agent | Method | IBEMS | Residential | Commercial | Simulation | Real-Life | Citations
[92] | 2015 | CO | PSO | HVAC/LS/ESS | | x | x | | 71
[93] | 2016 | CO | GA | HVAC/LS/RES/ESS | | x | x | | 69
[94] | 2017 | CO | PSO | HVAC/LS/RES/ESS | | x | x | | 20
[95] | 2018 | CO | GA | HVAC/LS/RES/ESS | | x | x | | 43
[96] | 2023 | DE/CO | PSO | HVAC/RES/ESS/EVs | x | | x | | 16
Table 4. Summarized table of Model-based control applications with ADMM optimization (2014–2024).

Ref. | Year | Agent | Method | IBEMS | Residential | Commercial | Simulation | Real-Life | Citations
[97] | 2016 | CO | ADMM | HVAC | | x | x | | 72
[98] | 2017 | CO | J-ADMM | HVAC | | x | x | | 34
[99] | 2019 | CO | NC-ADMM | RES | x | | x | | 73
[100] | 2020 | CO | J-ADMM | HVAC | | x | x | | 54
[101] | 2021 | DE/CO | J-ADMM | HVAC | | x | x | | 21
[102] | 2021 | DE/CO | DC-ADMM | HVAC/RES/ESS/EVs | x | x | x | | 80
[103] | 2022 | CO | H-ADMM | HVAC | | x | x | | 11
Table 5. Summarized table of Model-based control applications with Lyapunov optimization (2014–2024).

Ref. | Year | Agent | Method | IBEMS | Residential | Commercial | Simulation | Real-Life | Citations
[104] | 2014 | Non-CO | LCMA | RES/ESS/Other | x | | x | | 147
[105] | 2014 | Non-CO | Lyapunov | HVAC | x | | x | | 87
[106] | 2017 | Non-CO | Lyapunov | HVAC | | x | x | | 50
Table 6. Summarized table of Hybrid control applications (2014–2024).

Ref. | Year | Agent | Method | IBEMS | Residential | Commercial | Simulation | Real-Life | Citations
[107] | 2014 | CO | ULDC/FLC | HVAC | | x | | x | 162
[108] | 2016 | DE | PSO/ANN | HVAC | | x | | x | 73
[109] | 2018 | CO | CBR/ANN | HVAC | | x | | x | 105
[110] | 2018 | CO | ADMM/MPC | HVAC | | x | x | x | 31
[111] | 2020 | CE | RL/ANN | HVAC | x | | x | | 60
[112] | 2020 | CE | RL/ANN | HVAC | | x | x | | 96
[113] | 2021 | CO | PSO/MPC | HVAC/RES/ESS | x | | x | x | 30
[114] | 2021 | CO | PSO/MPC | HVAC | | x | x | | 19
[115] | 2022 | CO | RL/FLC | HVAC/LS/Other | x | | x | | 29
[116] | 2022 | CO | ADMM/MPC | HVAC | | x | x | | 19
Table 7. Summarized table of Other control applications (2014–2024).

Ref. | Year | Agent | Method | IBEMS | Residential | Commercial | Simulation | Real-Life | Citations
[117] | 2014 | DE | Consensus | HVAC/EVs/Other | x | | | x | 225
[118] | 2016 | CO | N/A | HVAC | | x | x | x | 46
[119] | 2016 | CO | SoC | RES/ESS | x | | x | x | 58
[120] | 2018 | CE/CO | L4GPCAO | HVAC | | x | | x | 51
[121] | 2019 | DE/CO | TBSA | HVAC | | x | | x | 85
[122] | 2019 | DE | CHPD | HVAC/RES/ESS | | x | x | | 44
[123] | 2020 | CO | DAC | HVAC | | x | x | | 41
[124] | 2021 | DE/CO | Consensus | RES/ESS/Other | x | x | x | | 49
[125] | 2021 | CE/DE/CO | RPC/APT | HVAC/DHW/RES/ESS | x | | x | | 10
[126] | 2021 | CO | Hierarchical | RES/ESS/Other | x | x | x | | 25
[127] | 2021 | DE/CO | Blockchain | HVAC/LS/Other | x | | x | | 35
[128] | 2022 | CO | N/A | HVAC/RES/ESS | | x | x | | 22

4.1. Model-Free Control Strategies

Reinforcement Learning

In 2015, Raju et al. [55] implemented a novel value-based approach—namely, the Coordinated Q-learning (CQL) approach—in a multi-agent framework for grid-connected solar microgrid battery scheduling. According to the evaluation, CQL was able to reduce grid power consumption by 15% and enhance solar power utility by 10–12%. In 2017, Avari et al. [56] proposed a policy-based RL approach in the form of an ontology-driven EMS, achieving a 5% reduction in microgrid costs and maintaining a 90% user satisfaction rate. In the same year, Hurtado et al. [57] introduced another policy-based approach—namely, eJAL—which concerned a cooperative and decentralized algorithm. The algorithm managed to achieve the highest fairness index and reduced overload duration by 16.3%, compared to 22.6% with Q-learning and 5.2% with the n-player game. According to the simulation results, the comfort loss using eJAL was minimal, averaging 0.76%. The next year, Kofinas et al. [58] developed a Hybrid Q-learning multi-agent control framework for IBEMS, integrating MPC to manage different zones and HVAC components. This model-based RL approach achieved around 20% energy savings while maintaining thermal comfort at the desired levels. In 2019, Prasad et al. [59] utilized a DQN framework for energy sharing, modeling each building as an agent, achieving a 40–60 kWh improvement over non-sharing strategies.
In 2020, Gupta et al. [60] compared the performance of a novel DQN-based heating controller with traditional thermostats. Simulative experiments demonstrated significant enhancements in thermal comfort (15–30%) and energy cost reduction (5–12%). In the same year, Nagarathinam et al. [61] employed multiple cooperative DQL agents to dynamically control both AHU and chiller set-points for optimizing HVAC operations towards energy efficiency and occupant comfort. According to the evaluation, the MARCO framework achieved a 17% reduction in energy consumption compared to traditional seasonal set-point strategies, and performed comparably to an ideal MPC with only 2% higher energy use while maintaining absolute comfort compliance. In an actor–critic RL approach of 2020, Lee et al. [62] introduced a federated RL framework using the A2C algorithm. According to the results, the A2C framework reduced electricity consumption by 20%, costs by 30%, and emissions by 20%. In the same year, Xu et al. [63] applied Q-learning for HVAC control, based on historical and real-time data, resulting in approximately 15% energy savings and reduced operational costs. Also in 2020, Vazquez et al. [64] developed the MARLISA framework, which considers a MARL controller with an iterative sequential action selection algorithm, designed for scalable, decentralized load shaping in urban energy systems. The approach reduced the daily peak load by 15% and ramping by 35% in contrast to RBC in the concerned building. In a 2021 work, Yu et al. [65] proposed a Multi-Agent Actor–Critic (MAAC) approach to minimize energy costs while considering random zone occupancy, thermal comfort, and indoor air quality. The novel control framework achieved a significant energy cost reduction of 56.50–75.25% in contrast to baseline control in simulations with 30 zones.
In 2022, Fu et al. [66] proposed a DQN framework to optimize HVAC system operation by distributing the action space across five agents—each responsible for different system components like chillers, cooling towers, and water pumps. The novel framework managed to achieve 11.1% improved energy efficiency and faster convergence. In the same year, Yu et al. [67] used a MAAC framework for coordinating HVAC and personal comfort systems in office spaces, reducing energy consumption by 0.7–4.18% and improving thermal comfort by 64.13–72.08%. Additionally, Shen et al. [68] proposed a novel multi-agent D3QN framework for energy system optimization. Such a novel approach achieved an 84% reduction in uncomfortable duration and a 43% decrease in unconsumed renewable energy. Blad et al. [69] introduced an offline Q-learning framework for HVACs, comparing Multi-Layer Perceptron (MLP) and Long Short-Term Memory (LSTM) networks. According to the final outcomes, the LSTM models reduced prediction errors by 45% and achieved a 19.4% reduction in heating costs. Also in the same year, Lei et al. [70] utilized a Branching Dueling Q-network for high-dimensional control action spaces, incorporating a tabular-based personal comfort modeling method. In a real-world office deployment, the approach achieved a 14% cooling energy reduction and an 11% improvement in thermal acceptability. Pinto et al. [71] utilized a SAC algorithm for energy management in buildings with thermal storage and PV systems, leading to a 7% cost reduction and a 14% peak demand reduction. Similarly, Chu et al. [72] employed another AC methodology—namely, the ACKTR methodology—for optimizing home IBEMS, achieving a 25.37% cost reduction on a typical test day. Moreover, in 2022, Gao et al. [73] developed a hierarchical RL framework for optimizing off-grid operations, improving performance by 64.93% and reducing unsafe battery runtime by 84%. In the same year, Pigott et al. [74] introduced GridLearn, using the PPO algorithm to manage voltage regulation, resulting in a 34% reduction in overvoltages.
In 2023, a DDPG-based algorithm, termed Fed-JPC, was proposed by Qiu et al. [75]. Such a novel algorithmic scheme incorporated federated learning to maintain privacy while optimizing energy trading and emissions reduction. Empirical results showed that Fed-JPC outperformed traditional methods, reducing total energy and environmental costs by 5.87% and 8.02%, respectively. In the same year, Xie et al. [76] applied a MAAC methodology for demand response in grid-responsive buildings, achieving over a 6% reduction in net load demand. Also in 2023, Nweye et al. [77] introduced the MERLIN framework using SAC agents for battery control, improving electricity consumption by 20%, cost by 30%, and emissions by 20%. More recently, Homod et al. [78] introduced a DCCMARL framework for multi-chiller control, reducing energy consumption by up to 49% and improving overall building energy efficiency by 44.5%.
The classification of MARL approaches into value-based, policy-based, and actor–critic may be found in Table 8, along with the primary achievement of each RL research work.

4.2. Model-Based Control Strategies

4.2.1. Model Predictive Control (MPC)

In 2014, Lauro et al. [79] proposed an MPC strategy for a commercial office building’s HVAC system, comparing decentralized and distributed configurations. Their adaptive strategy reduced energy consumption by 49% and achieved a lower average temperature error (0.035 °C vs. 0.073 °C) compared to non-adaptive methods, with a 29% improvement in comfort. In 2017, Pedersen et al. [80] introduced an Economic–MPC (E-MPC) scheme for multi-story residential buildings, achieving up to 6% cost savings and 3% CO2 reduction. According to the evaluation, the centralized approach slightly outperformed the decentralized MPC, shifting 2 kWh/m² of energy from peak load periods. In a related study by Pedersen et al. [81], a decentralized MPC achieved cost savings (11.1%) and comfort levels similar to centralized control, particularly in retrofitted buildings with insulated walls. Abobakr et al. [82] proposed a decentralized MPC strategy for thermal appliances in smart buildings, combining local MPC with a game-theoretic supervisory control. This approach reduced peak power consumption by 36% and maintained temperature regulation within limits. In 2018, Sangi et al. [83] introduced a hybrid agent-based MPC approach for large-scale commercial buildings, achieving a 2% reduction in primary energy consumption using exergy as a cost function. In the same year, Yang et al. [84] developed a decentralized MPC algorithm for coordinating EV charging with wind power in microgrids, reducing grid dependency by 68.4%. Zhuo et al. [85] presented a multi-agent control strategy for microgrids, optimizing RES and loads. Their method reduced energy costs by 10% and enhanced computational efficiency. The next year, Lyons et al. [86] implemented MPC for a communal heating system in residential estates, achieving significant improvements in thermal comfort (2.19 Kh vs. 126.45 Kh) and cost efficiency (13.40 p vs. 13.43 p per kWh) compared to PI control. El Geneidy et al. [87] investigated energy flexibility in residential communities using MPC, showing that centralized control offered better coordination and energy efficiency, despite increased overall energy consumption (3.31–25.11%) due to preheating. Also in 2020, Wang et al. [88] proposed a multi-agent MPC scheme for optimizing renewable resource allocation and load scheduling in mixed-use communities, minimizing unserved load to 0.27% in optimal scenarios. Saletti et al. [89] enhanced MPC with dynamic programming to optimize thermal energy distribution in a school complex, reducing fuel consumption by 7% and avoiding comfort failures. The next year, Wu et al. [90] applied a multi-agent MPC approach for district heating in commercial buildings, achieving up to 4.8% annual cost savings, with a trade-off between energy savings and occupant comfort. More recently, Lefebure et al. [91] examined distributed MPC using dual decomposition, achieving near-centralized performance with 16% lower heating costs compared to decentralized control, confirming the approach’s scalability and efficiency.

4.2.2. Model-Based Control with Evolutionary-Based Optimization

The literature from 2014 to 2024 presents several multi-agent evolutionary approaches for optimizing IBEMS, with Particle Swarm Optimization (PSO) and Genetic Algorithms (GAs) being the most predominant.
Particle Swarm Optimization: Hurtado et al. [92] in 2015 integrated PSO with a multi-agent system (MAS) to optimize energy use and comfort in smart buildings. Focusing on HVAC systems, RES, and ESS, their approach dynamically adjusts building operations to support local grid voltage control. The approach achieved up to 25% energy savings while improving grid support. Altayeva et al. [94] in 2017 developed a MAS using PSO for IBEMS, dynamically allocating power based on comfort priorities. PSO achieved the required temperature values and overall comfort even with a reduction in available power, effectively handling a 30% decrease in power availability. In 2023, Ghazimirsaeid et al. [96] optimized home microgrids in green buildings using PSO, achieving a 34% reduction in the Market Clearing Price during peak times and shifting 26% of total DR+ load to non-peak intervals.
Genetic Algorithms: Shaikh et al. [93] in 2016 proposed a multi-agent system using a hybrid multi-objective genetic algorithm (HMOGA) for building energy management, including eight solar panels (each 4.5 kW) and six wind turbine generators (each 5 kW), along with batteries totaling 45 kWh storage capacity. Such a framework improved energy efficiency by 31.6% and the comfort index by 8.1%. In another GA study of 2018, Rehman et al. [95] proposed a multi-objective GA—namely, the Non-dominated Sorting Genetic Algorithm II (NSGA-II)—to optimize and compare centralized and semi-decentralized solar district heating systems in Finnish conditions. The specific IBEMS integrated solar collectors, heat pumps, thermal energy storage, and decentralized high-temperature tanks. According to the evaluation, the semi-decentralized approach achieved up to 35% lower life cycle costs and a renewable energy fraction close to 90%, outperforming the centralized system in both economic and energy performance.

4.2.3. Model-Based Control with ADMM-Based Optimization

The literature from 2014 to 2024 highlights several multi-agent ADMM approaches for optimizing IBEMS, focusing on distributed control strategies: In 2016, Cai et al. [97] proposed a multi-agent control approach utilizing ADMM that automated controller design and reduced engineering effort. Such a framework achieved a 12.2% improvement in energy efficiency and a 42.7% potential energy saving in a DX air conditioning system. Hou et al. [98] in 2017 introduced a cooperative multi-agent control strategy using Proximal Jacobian ADMM within a DMPC framework, optimizing HVAC energy consumption across multiple zones. According to the evaluation, the approach exhibited a 10.15% reduction in energy consumption. In 2019, Carli et al. [99] proposed a Non-Convex ADMM (NC-ADMM) methodology for energy scheduling in microgrids. According to the simulation results, the approach significantly reduced the overall energy costs compared to scenarios without RES exchange, indicating the algorithm’s effectiveness in practical applications. In 2020, Li et al. [100] proposed a Jacobian ADMM (J-ADMM) for decentralized HVAC control in IoT-enabled smart buildings. The study compared the novel strategy to a baseline control, which considered a time-driven centralized optimal control with a 1-min interval, achieving up to 11.34% energy savings and a 96.44% reduction in the excessive pollutant index (EPI). The following year, Li et al. [101] extended this approach, reducing CO2 levels and energy consumption, and extending sensor battery life from 6.23 days to 84.03 days. In 2021, Lyu et al. [102] developed a fully decentralized algorithm based on Dual-Consensus ADMM (DC-ADMM) for peer-to-peer energy sharing among smart buildings, achieving cost reductions up to USD 110.57 per building while enhancing privacy and reducing information exchange. More recently, Wang et al. [103] proposed a Hierarchical ADMM (H-ADMM) for HVAC systems, using a multi-layer architecture to represent the tree-like structure of BEMS components and zones. Such an approach divided the overall optimization task into smaller local tasks, solved recursively using a hierarchical ADMM with Nesterov acceleration to improve convergence. According to the evaluation, the proposed methodology achieved 4.05–5.33% energy savings and a 69.55% reduction in computation time compared to conventional methods.

4.2.4. Model-Based Control with Lyapunov-Based Optimization

The literature from 2014 to 2024 includes a few notable multi-agent Lyapunov optimization approaches, which have shown substantial results in IBEMS optimization: Guo et al. [104] in 2013 utilized a Lyapunov-based cost minimization algorithm (LCMA) for coordinating energy consumption in smart grid neighborhoods. Such an approach was able to handle stochastic processes without requiring future statistical knowledge and provided a decentralized implementation to preserve household privacy, achieving a 20% reduction in total energy costs by optimizing renewable generation, energy storage, and smart appliances. In 2014, Zheng et al. [105] applied Lyapunov optimization for distributed demand response (DR) in HVAC systems, minimizing power demand fluctuations while maintaining user comfort. The study validated its approach with practical data, including wind speed and temperature, reducing demand variability effectively. Yu et al. [106] in 2017 extended Lyapunov optimization to a real-time distributed control strategy for commercial building HVAC systems including air handling units (AHUs), variable air volume (VAV) boxes, and sensors. According to the evaluation, which considered real-world data sets like electricity prices and outdoor temperatures, the methodology managed to reduce energy costs by 26.6% and maintain temperature deviations within 1 °C.

4.3. Hybrid Strategies

From 2014 to 2024, numerous hybrid multi-agent optimization approaches have been developed, combining RL, ANNs, FLC, and MPC to enhance IBEMS efficiency. These approaches demonstrate substantial improvements over basic methods: In 2014, Jazizadeh [107] introduced a hybrid HVAC control framework combining participatory sensing, FLC, and a decentralized multi-agent strategy. The system dynamically adjusted set-points based on user comfort profiles, achieving a 39% reduction in daily airflow and increasing user satisfaction from 4.7 to 8.4 out of 10. Javed et al. [108] in 2016 proposed a decentralized multi-agent control strategy using a Random Neural Network and PSO-SQP optimization for HVAC systems in commercial buildings. The approach reduced energy consumption by 27.12% and achieved 88% occupancy estimation accuracy. Gonzalez et al. [109] in 2018 employed a hybrid approach combining a multi-agent system, case-based reasoning, and an ANN to optimize HVAC settings, resulting in a 41% energy saving in office buildings. That same year, Joe et al. [110] introduced a novel MPC strategy using PJ-ADMM for decentralized HVAC optimization—specifically, radiant floor heating and cooling—within a commercial office building environment. The methodology was evaluated in both real-life implementation and simulations, showing a 27% reduction in electricity consumption during the cooling season compared to baseline feedback control. In 2020, Lork et al. [111] proposed a hybrid control system using Bayesian Convolutional Neural Networks (BCNNs) and Q-learning, achieving a 15–20% improvement in energy savings. Zou et al. [112] combined LSTM networks with Deep Reinforcement Learning for HVAC control, reducing energy consumption by 27–30% while maintaining thermal comfort within ASHRAE standards. Rochd et al. [113] in 2021 introduced a PSO-MPC hybrid algorithm for energy management in smart homes, achieving 90% self-consumption and an 85% reduction in daily electricity costs. Li et al. [114] proposed using Improved Particle Swarm Optimization (IPSO) and Newton–Raphson (NR) methods for multi-zone heating systems. Such a hybrid algorithm enhanced multi-agent control by leveraging Nash optimization for global equilibrium and sharing information between subsystems, significantly improving control accuracy. Numerical results illustrated a 19.05% improvement in control accuracy and a 35.71% reduction in mean absolute error (MAE) compared to quadratic programming for single-zone buildings and decentralized MPC for multi-zone buildings. In 2022, Homod et al. [115] introduced the DCCMARL (Deep Clustering of Cooperative Multi-Agent Reinforcement Learning) algorithm, which integrated RL and fuzzy logic control (FLC) to effectively handle large state and action spaces and improve learning efficiency. Such a framework was able to improve energy savings by 32% and thermal comfort by 21% compared to PID control. In the same year, Mork et al. [116] developed a multi-agent control strategy using distributed MPC for energy optimization. The approach integrated the cooperative ADMM algorithm for hydraulic coupling and non-cooperative Nash optimization for thermal coupling, applied to HVAC systems with detailed Modelica models, reducing energy consumption by 7.6% and thermal discomfort by 65.0% compared to decentralized methods.

4.4. Other Model-Free and Model-Based Strategies

Apart from RL, evolutionary algorithms, ADMM, and Lyapunov optimization, the literature from 2014 to 2024 also features various independent multi-agent algorithms. These approaches excel in handling complex, nonlinear, and stochastic environments without requiring detailed system models. While computationally demanding and requiring substantial training data, their adaptability and learning capabilities make them powerful tools for efficient energy management in smart buildings: In 2014, Chen et al. [117] proposed a Distributed Direct Load Control (DDLC) algorithm using an average consensus algorithm to coordinate Energy Management Controllers (EMCs) within buildings. This system dynamically adjusted local power consumption based on real-time data, reducing demand–supply mismatches by 54.6% compared to unmanaged scenarios. In 2016, Dai et al. [118] introduced a decentralized control algorithm for HVAC systems, achieving a 6.8% reduction in power consumption and a 9.93% improvement in chiller efficiency through collaborative optimization. Hollinger et al. [119] in 2016 developed a State of Charge (SoC) decentralized control algorithm for distributed solar battery systems, ensuring local self-sufficiency while providing grid services. Their approach achieved a 58.6% self-sufficiency rate and significantly reduced the need for corrective energy. Michailidis et al. [120] evaluated a novel decentralized, agent-based, model-free Building Optimization and Control (BOC) methodology, referred to as Local-for-Global Parameterized Cognitive Adaptive Optimization (L4GPCAO), applied to a real-world building. The building, located at RWTH Aachen University, was equipped with smart HVAC systems and renewable energy sources, integrated through the existing Building Management System (BMS). According to the evaluation, the L4GPCAO approach demonstrated a 34.7% reduction in non-renewable energy consumption (NREC) compared to the baseline control strategy, achieving significant energy savings and improved comfort. A follow-up simulative study in 2020 [129] demonstrated further improvements, reducing non-renewable energy consumption by up to 8.96% and improving indoor comfort by up to 56.82%. Png et al. [121] in 2019 introduced the Smart-Token-Based Scheduling Algorithm (Smart-TBSA) using IoT devices to optimize HVAC systems in commercial buildings, achieving up to 20% energy savings. The same year, Xie et al. [122] proposed a decentralized control strategy for combined heat and power dispatch in integrated energy systems, reducing costs by 6.1% and wind curtailment by 51.8%. In a notable 2020 study, Lymperopoulos et al. [123] evaluated a Distributed Adaptive Control (DAC) scheme for multi-zone HVAC systems using online learning and information exchange between neighboring zones. The control system was applied to a six-zone building and a large school, achieving up to a 36.80% improvement in temperature tracking and a 13.68% reduction in energy consumption compared to baseline constant-gain control. The next year, Alhasnawi et al. [124] proposed a decentralized control strategy for multi-agent microgrids, enhancing frequency and voltage regulation by 30% and improving power-sharing accuracy by up to 25%. In 2021, Ahrens et al. [125] compared centralized and decentralized strategies for grid resilience, achieving a 99.91% reduction in voltage deviations and a 91.52% decrease in line congestion using decentralized Reactive Power Control (RPC). The same year, Jonban et al. [126] introduced a model-based EMS for DC microgrids in green buildings, maintaining a stable DC bus voltage within 2% of the nominal value. Kolahan et al. [127] proposed a blockchain-integrated control with smart contracts to enable homes to share real-time data (Probability of the Next Hour, PNH) and collaboratively optimize energy consumption. The study simulated a neighborhood of 2000 homes with various occupancy patterns, utilizing a validated physical model of heating, illumination, and appliance systems, reducing peak power load by 15% and total energy consumption by 11%. Lastly, Gupta et al. [128] in 2022 introduced a coordinated control strategy for ESS and IBDR programs. Each component (ESS and HVAC systems in buildings) operates based on local control signals and algorithms (PI and PD controllers) rather than a centralized coordinator, reducing the frequency deviation to within 0.1 Hz and improving the regulation capacity by 50% compared to BES alone.

5. Evaluation

5.1. Evaluation per Multi-Agent Algorithmic Methodology

Model-based approaches are predominant in multi-agent control applications due to their predictive capabilities, which are crucial for coordinating complex systems (Figure 6). However, according to the evaluation, the most common single algorithmic approach in multi-agent control remains RL, which is predominantly a model-free methodology. The following subsections elaborate on the most dominant multi-agent control methodologies for both model-based and model-free strategies.

5.1.1. Reinforcement Learning

The exploration of model-free algorithms and especially RL in multi-agent control systems, particularly within IBEMS, has led to a diverse set of approaches. Some algorithms have emerged as more prevalent due to their ability to handle specific challenges in multi-agent environments (Figure 7 (Left) and Figure 7 (Right)), such as coordination among agents, learning efficiency, and the handling of large state or action spaces. More specifically, Q-learning remains one of the fundamental approaches in MARL. Numerous applications [58,63,69] applied Q-learning frameworks to optimize energy consumption in BEMS, demonstrating the algorithm’s strength in handling discrete actions and providing stable, incremental improvements in control policies across different building zones and HVAC systems. Q-learning’s robustness in such structured environments has made it a popular choice, particularly when enhanced with MPC and deep learning techniques. DQNs have also become increasingly prominent, especially in complex multi-agent environments where state spaces are continuous. Such a tendency is evident in [59,60,61], which showcase the efficiency of DQNs in decentralized and cooperative settings, where agents manage their control strategies independently but contribute to a common goal. Additionally, in [66,78], the DQN multi-agent methodology was enhanced by integrating deep clustering and distributed action spaces, thereby improving learning efficiency and coordination among agents in HVAC systems. The prevalence of DQNs among all RL methodologies (Figure 7 (Right)), especially in cooperative scenarios, highlights their flexibility and effectiveness in balancing energy efficiency and occupant comfort. Other advanced value-based methods, such as the Dueling Double Deep Q-Network (D3QN), have also been utilized in [68] to optimize renewable energy integration in building systems, leveraging its improved learning stability and coordination capabilities among agents. Moreover, the Coordinated Q-learning approach in [55] and the Branching Dueling Q-network (BDQ) in [70] further illustrated the adaptability of value-based methods in high-dimensional action spaces, enabling efficient and reliable control in complex environments.
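To ground the mechanics discussed above, the following minimal sketch shows independent tabular Q-learning agents, one per building zone. It is a sketch under stated assumptions: the state discretization, the placeholder transition, and the energy/comfort reward weighting are illustrative and do not reproduce any reviewed framework.

```python
import numpy as np

# Minimal sketch: independent tabular Q-learning agents, one per building zone.
# State discretization, transitions, and reward shaping are illustrative
# assumptions, not taken from any specific reviewed study.

N_ZONES, N_STATES, N_ACTIONS = 3, 10, 5   # zones, discretized temperatures, setpoint levels
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1    # learning rate, discount factor, exploration rate

Q = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(N_ZONES)]
rng = np.random.default_rng(0)

def select_action(q_table, state):
    """Epsilon-greedy action selection over the zone's Q-table."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_table[state]))

def q_update(q_table, s, a, r, s_next):
    """Standard one-step Q-learning (temporal-difference) update."""
    td_target = r + GAMMA * np.max(q_table[s_next])
    q_table[s, a] += ALPHA * (td_target - q_table[s, a])

states = [int(rng.integers(N_STATES)) for _ in range(N_ZONES)]
for step in range(10_000):
    for z in range(N_ZONES):
        a = select_action(Q[z], states[z])
        # Placeholder environment response: a random next state and a reward
        # trading off energy use against deviation from a mid-range comfort band.
        s_next = int(rng.integers(N_STATES))
        energy_cost = a / (N_ACTIONS - 1)
        comfort_penalty = abs(s_next - N_STATES // 2) / N_STATES
        q_update(Q[z], states[z], a, -(energy_cost + comfort_penalty), s_next)
        states[z] = s_next
```

In a deployed system, the placeholder transition would be replaced by actual zone measurements, and agents could additionally exchange state information to obtain the cooperative behavior discussed above.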
Policy-based methods—which present a lower occurrence among RL methodologies (Figure 7 (Left))—have demonstrated significant advantages in environments requiring continuous control and direct policy optimization. In [56,57], a policy-based multi-agent control was deployed, highlighting the effectiveness of policy-based approaches—like eJAL—in balancing grid support and occupant comfort. Similarly, policy-based methodologies [73,74] were applied to optimize off-grid operations and voltage regulation in power systems, demonstrating their scalability and robustness in real-time applications.
Regarding actor–critic methods, Multi-Actor Attention–Critic (MAAC) and Soft Actor–Critic (SAC) in particular have become increasingly prevalent in MARL due to their ability to handle continuous actions and large state spaces. Such AC methodologies are employed in [65,67,71,76,77] to manage energy consumption and demand response in grid-interactive buildings, showcasing their effectiveness in coordinating multiple agents to achieve energy efficiency and thermal comfort. It is evident that such frameworks illustrate enhanced robustness and adaptability, particularly when extended with attention mechanisms [62] and federated learning [72], rendering them preferable choices in complex multi-agent scenarios.
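To contrast with the value-based sketch above, the following is a minimal actor–critic update with a Gaussian policy over a continuous action (e.g., a temperature setpoint). The network sizes, learning rates, and one-step TD target are illustrative assumptions; the attention and entropy-regularization machinery of MAAC and SAC is deliberately omitted.

```python
import torch
import torch.nn as nn

# Minimal actor-critic sketch for a continuous action. Dimensions, learning
# rates, and the one-step TD update are illustrative assumptions.
obs_dim, act_dim = 8, 1
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 2 * act_dim))
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt_actor = torch.optim.Adam(actor.parameters(), lr=3e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

def actor_critic_update(obs, reward, next_obs, gamma=0.99):
    """One-step TD actor-critic update on a batch of transitions."""
    mean, log_std = actor(obs).chunk(2, dim=-1)   # policy outputs mean and log-std
    dist = torch.distributions.Normal(mean, log_std.exp())
    action = dist.rsample()                       # reparameterized sample
    log_prob = dist.log_prob(action).sum(-1)

    with torch.no_grad():                         # bootstrapped target for the critic
        td_target = reward + gamma * critic(next_obs).squeeze(-1)
    value = critic(obs).squeeze(-1)
    advantage = (td_target - value).detach()

    critic_loss = (td_target - value).pow(2).mean()   # value regression
    actor_loss = -(log_prob * advantage).mean()       # policy-gradient step

    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()
    opt_actor.zero_grad()
    actor_loss.backward()
    opt_actor.step()

# Example call on dummy transitions (batch of 4).
actor_critic_update(torch.randn(4, obs_dim), torch.randn(4), torch.randn(4, obs_dim))
```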
Across these RL approaches, DQN, D3QN, and actor–critic methods, particularly SAC and MAAC, have emerged as the most prevalent algorithms due to their ability to effectively handle large, complex environments and continuous action spaces (Figure 7 (Left)). These algorithms have been chosen for their robustness and versatility in optimizing multi-agent interactions, managing state and action spaces, and ensuring stable learning processes in dynamic environments. The similarities in the adoption of these algorithms across different studies underline their effectiveness in addressing the unique challenges of multi-agent control in BEMS and related applications.
Last but not least, it should be mentioned that, according to the evaluation, non-RL model-free multi-agent control applications are limited in number. In the current study, such cases may be found in [117,120,123,124], where the algorithms resemble RL in that they learn directly from the outcomes of actions in the environment rather than solving predefined equations or optimization problems. It should also be noted that there are cases where an RL-based approach may be considered model-based, since the focus is not on RL itself but on energy management using predefined optimization models, as in [56] or [115]. Such models include explicit mathematical formulations and optimization criteria that guide decision-making.

5.1.2. Model Predictive Control

The evaluation of algorithmic methodologies in multi-agent control for intelligent IBEMS reveals a nuanced landscape where each approach is selected based on its inherent computational strengths and optimization capabilities. MPC, EAs, and ADMM have emerged as the primary methodologies, each chosen for its specific algorithmic advantages in managing the complexities of multi-agent systems (Figure 8 (Left, Right)). Additionally, Lyapunov optimization and hybrid approaches provide distinct advantages that complement these primary methodologies. MPC is particularly favored for its capability to handle constrained optimization problems over a finite time horizon, making it well suited to real-time control in dynamic environments. The algorithmic foundation of MPC involves solving a sequence of optimization problems where the current control actions are derived by predicting future system states. This predictive control framework is highly effective in scenarios requiring anticipatory adjustments, such as the work in [80,81], where E-MPC is utilized to optimize energy cost and demand response. The flexibility of MPC is further enhanced by its integration with various optimization techniques, such as quadratic programming (QP) in the work of [86], and its application in decentralized settings as seen in [85]. The adaptive nature of MPC, as explored in [79], highlights its ability to adjust dynamically based on real-time data, optimizing both energy efficiency and occupant comfort. A common thread across these studies is the focus on balancing computational efficiency with real-time applicability, where MPC’s ability to iteratively refine predictions and control actions is crucial.
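To make the receding-horizon mechanics explicit, the sketch below solves a small finite-horizon problem for a single thermal zone with a first-order linear model and applies only the first control action. The model coefficients, comfort band, and price forecast are illustrative assumptions, not parameters drawn from the cited studies.

```python
import cvxpy as cp
import numpy as np

# Receding-horizon MPC sketch for one zone with the assumed linear model
# T[k+1] = a*T[k] + b*u[k] + d*T_out[k]; all coefficients are illustrative.
a, b, d = 0.9, 0.3, 0.1              # thermal inertia, heating gain, outdoor coupling
H = 12                               # prediction horizon (e.g., hours)
T_out = np.full(H, 5.0)              # outdoor temperature forecast
price = np.ones(H)                   # energy price forecast
T_min, T_max, u_max = 20.0, 24.0, 5.0

def mpc_step(T_now):
    """Solve the finite-horizon problem; return only the first input."""
    u = cp.Variable(H)               # heating power trajectory
    T = cp.Variable(H + 1)           # predicted indoor temperature trajectory
    constraints = [T[0] == T_now, u >= 0, u <= u_max,
                   T[1:] >= T_min, T[1:] <= T_max]
    for k in range(H):
        constraints.append(T[k + 1] == a * T[k] + b * u[k] + d * T_out[k])
    cp.Problem(cp.Minimize(price @ u), constraints).solve()
    return float(u.value[0])         # receding horizon: apply the first action only

T_room = 21.0
for step in range(24):
    u0 = mpc_step(T_room)
    # Plant update (here the same model; a real plant would differ).
    T_room = a * T_room + b * u0 + d * T_out[0]
```

In a multi-agent variant, each zone would solve such a local problem while a coordination layer (e.g., dual decomposition, as in [91]) reconciles shared resources.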

5.1.3. Model-Based Control with Evolutionary-Based Optimization

EAs, including PSO and GAs, are algorithmically robust in exploring and exploiting large and complex search spaces. The stochastic nature of these algorithms allows them to escape local optima, making them particularly useful in non-convex optimization problems commonly encountered in IBEMS. PSO, as employed in [92,94], leverages the collective behavior of particles to converge towards an optimal solution, with each particle adjusting its position based on its own experience and that of neighboring particles. Such an approach is computationally efficient and can handle multi-objective optimization, which is crucial in balancing energy use and occupant comfort. Similarly, GAs, as applied in [93,95], utilize principles of natural selection to iteratively evolve a population of solutions, providing a powerful mechanism for optimizing complex systems with multiple conflicting objectives. These studies highlight the adaptability of EA methods in multi-agent systems, particularly in environments where the search space is vast and the optimization landscape is rugged.
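As a concrete illustration of the particle dynamics described above, the minimal PSO sketch below optimizes zone setpoints against a scalarized energy/comfort cost. The cost function, weights, and bounds are illustrative assumptions.

```python
import numpy as np

# Minimal PSO sketch over zone setpoints; cost and bounds are illustrative.
def cost(x):
    """Weighted trade-off: heating effort above 18 C vs. deviation from 22 C."""
    energy = np.sum((x - 18.0) ** 2)
    comfort = np.sum((x - 22.0) ** 2)
    return 0.4 * energy + 0.6 * comfort

rng = np.random.default_rng(1)
n_particles, dim, iters = 30, 5, 200
w, c1, c2 = 0.7, 1.5, 1.5                    # inertia, cognitive, social weights
lo, hi = 16.0, 26.0                          # setpoint bounds (deg C)

x = rng.uniform(lo, hi, (n_particles, dim))  # particle positions
v = np.zeros_like(x)                         # particle velocities
pbest = x.copy()
pbest_val = np.array([cost(p) for p in x])
gbest = pbest[np.argmin(pbest_val)]

for _ in range(iters):
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    # Each particle is pulled toward its own best and the swarm's best position.
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x = np.clip(x + v, lo, hi)
    vals = np.array([cost(p) for p in x])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = x[improved], vals[improved]
    gbest = pbest[np.argmin(pbest_val)]

print("best setpoints:", np.round(gbest, 2))  # converges near 20.4 for this cost
```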

5.1.4. Model-Based Control with ADMM-Based Optimization

ADMM is algorithmically distinct in its ability to decompose complex optimization problems into smaller, more manageable sub-problems. This decomposition is particularly advantageous in distributed systems, where computational tasks can be parallelized across multiple agents. The core of ADMM’s algorithmic appeal lies in its iterative approach, where each agent solves a local sub-problem and updates a global variable to ensure consistency across the entire system. This method is effectively demonstrated in [97], where ADMM is used to coordinate multiple HVAC units by breaking down the global optimization problem into simpler sub-problems. The proximal Jacobian variant of ADMM, as utilized in [98,110], further enhanced the algorithm’s ability to converge quickly, even in non-convex settings.
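The decomposition pattern can be made concrete with a minimal scalar consensus-ADMM sketch, in which each agent holds a local quadratic cost over a shared quantity (e.g., a common supply temperature) and the agents iterate to agreement. The local costs and penalty parameter are illustrative assumptions.

```python
import numpy as np

# Consensus-ADMM sketch: N agents minimize sum_i 0.5*a_i*(x - b_i)^2 over a
# shared scalar. Local curvatures a_i and preferences b_i are illustrative.
a = np.array([1.0, 2.0, 4.0])     # local curvature (zone sensitivity)
b = np.array([20.0, 22.0, 21.0])  # each agent's locally preferred value
rho = 1.0                         # ADMM penalty parameter

x = np.zeros_like(b)              # local copies of the shared variable
z = 0.0                           # consensus (global) variable
u = np.zeros_like(b)              # scaled dual variables

for _ in range(100):
    # x-update: each agent minimizes f_i(x) + (rho/2)(x - z + u_i)^2 locally;
    # closed form here because f_i is quadratic. Parallelizable across agents.
    x = (a * b + rho * (z - u)) / (a + rho)
    # z-update: averaging step that enforces consistency across agents.
    z = np.mean(x + u)
    # Dual update: accumulate each agent's disagreement with the consensus.
    u = u + x - z

# The fixed point matches the analytic minimizer of the summed cost.
print(round(z, 4), round(float(np.sum(a * b) / np.sum(a)), 4))
```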

5.1.5. Model-Based Control with Lyapunov-Based Optimization

Such an optimization strategy is less frequently applied; however, it offers a mathematically rigorous approach to ensuring system stability while optimizing performance over time. The algorithm’s strength lies in its ability to handle systems with time-varying constraints and uncertainties, making it particularly suitable for applications where maintaining stability is as critical as optimizing performance. The Lyapunov-based approach, as employed in [104,105,106], uses a drift-plus-penalty framework to balance system performance with stability, ensuring that the system remains stable while optimizing energy costs. This method does not require future statistical information, making it robust against uncertainty and highly adaptable to real-time applications.
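For reference, the generic drift-plus-penalty construction takes the following standard textbook form; it is not the exact formulation of any cited controller. Here Q_i(t) are virtual queues encoding time-coupled constraints (e.g., accumulated comfort violation), p(t) is the per-slot cost (e.g., energy expenditure), and V >= 0 trades cost reduction against queue stability:

```latex
\begin{align*}
  L(t) &= \tfrac{1}{2} \sum_i Q_i(t)^2
    && \text{(quadratic Lyapunov function)} \\
  \Delta(t) &= \mathbb{E}\left[ L(t+1) - L(t) \mid \mathbf{Q}(t) \right]
    && \text{(conditional Lyapunov drift)} \\
  \alpha(t) &\in \arg\min_{\alpha \in \mathcal{A}}
    \left\{ \Delta(t) + V \, \mathbb{E}\left[ p(t) \mid \mathbf{Q}(t) \right] \right\}
    && \text{(drift-plus-penalty action)}
\end{align*}
```

Minimizing this bound slot by slot requires no forecast of future statistics, which is the property exploited in [104,105,106].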

5.1.6. Hybrid Strategies

Such schemes combine the strengths of multiple algorithms, creating a versatile framework that can address the limitations of individual methodologies. According to the evaluation, such approaches often integrate machine learning techniques, such as RL, evolutionary algorithms, and ADMM, with traditional control algorithms like MPC, ANNs, and FLC. In particular, ANNs and FLC are commonly utilized in hybrid schemes since they represent a straightforward approach to model the behavior of specific frameworks based on historical data [7,130,131]. In general, the hybridization process, as seen in [107,109], leverages the adaptive learning capabilities of RL and the predictive power of ANNs to enhance the decision-making process in multi-agent systems. Hybrid multi-agent control schemes are particularly efficient at dynamically adjusting control strategies based on real-time data, offering a high degree of flexibility and adaptability in managing complex, non-linear systems. The studies in [111,112] further demonstrate the effectiveness of combining RL with other algorithms to improve both energy efficiency and occupant comfort in dynamically changing environments.
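One common hybridization pattern, pairing a learned predictive model with model-free RL, can be sketched as a Dyna-style loop in which a simple transition memory stands in for an ANN surrogate. The plant response, discretization, and reward below are illustrative assumptions rather than any reviewed design.

```python
import numpy as np

# Dyna-style hybrid sketch: a learned surrogate of the zone dynamics supplies
# synthetic transitions that accelerate tabular Q-learning. The plant model,
# discretization, and reward are illustrative assumptions.
N_STATES, N_ACTIONS = 10, 3
ALPHA, GAMMA = 0.1, 0.95
Q = np.zeros((N_STATES, N_ACTIONS))
model = {}                                   # (s, a) -> (r, s') surrogate memory
rng = np.random.default_rng(2)

def real_step(s, a):
    """Placeholder plant response standing in for the true building."""
    s_next = int(np.clip(s + a - 1 + rng.integers(-1, 2), 0, N_STATES - 1))
    r = -abs(s_next - N_STATES // 2) - 0.1 * a   # comfort deviation + energy use
    return r, s_next

s = int(rng.integers(N_STATES))
for t in range(5000):
    a = int(np.argmax(Q[s])) if rng.random() > 0.1 else int(rng.integers(N_ACTIONS))
    r, s_next = real_step(s, a)                  # one real interaction
    Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])
    model[(s, a)] = (r, s_next)                  # refit the surrogate from data
    # Planning: replay surrogate-predicted transitions (the model-based half).
    keys = list(model.keys())
    for _ in range(5):
        ps, pa = keys[int(rng.integers(len(keys)))]
        pr, ps_next = model[(ps, pa)]
        Q[ps, pa] += ALPHA * (pr + GAMMA * np.max(Q[ps_next]) - Q[ps, pa])
    s = s_next
```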
In conclusion, the selection of MPC, evolutionary algorithms, and ADMM in multi-agent control for IBEMS is driven by their distinct algorithmic capabilities. MPC’s predictive control and adaptability, EA’s robust search mechanisms, and ADMM’s ability to decompose and parallelize optimization tasks provide a comprehensive toolkit for managing the complexities of smart building environments. Lyapunov optimization and hybrid approaches further complement these methodologies by ensuring stability and enhancing adaptability, making them essential components in the ongoing development of intelligent energy management systems. The commonalities across these methodologies include their focus on optimization under uncertainty, scalability in distributed systems, and adaptability to real-time data, all of which are critical for the successful implementation of multi-agent control in IBEMS.

5.2. Evaluation per Agent Type

In the review of multi-agent control applications, the prevalence of cooperative agents reflects the advantages of shared information and collaboration in optimizing complex environments (Figure 9). RL naturally aligns with cooperative strategies, as agents sharing observations can collectively learn to maximize overall system rewards [55,56,61,62,65,66,67,68,69,71,73,78]. Such an approach is particularly effective in scenarios where agents’ actions are interdependent, making cooperation essential for achieving optimal outcomes. Numerous RL applications, though, exhibit both decentralized and cooperative (DE/CO) characteristics, where agents retain local decision-making autonomy (decentralization) while still engaging in limited cooperation by sharing critical information [57,58,59,60,74,75,76]. Such a combined approach allows agents to benefit from both localized control and the collective knowledge of the group, leading to more robust and flexible solutions in dynamic environments. On the other hand, purely decentralized RL applications focus on scenarios where agents operate independently, relying solely on their local observations without sharing information [63,64,72]. These applications are particularly suited to environments where agents must function autonomously, such as in large-scale systems where communication overhead is a concern or in scenarios where agents operate in completely separate domains. Decentralized RL allows for scalable and flexible solutions, enabling each agent to adapt to its specific environment without the need for coordination with others. This approach is beneficial in situations where global cooperation is either unnecessary or impractical, emphasizing the autonomy and adaptability of individual agents. The distinction between these approaches in RL highlights the adaptability of the algorithm to different multi-agent types, where the degree of cooperation and decentralization can be tuned to match the specific requirements and constraints of the application. This versatility makes RL a powerful tool in multi-agent systems, capable of addressing a wide range of control challenges through varying levels of agent interaction and autonomy.
Similarly, evolutionary-based approaches, which involve population-based search processes, inherently leverage cooperation to explore and exploit the solution space more effectively, leading to the dominance of cooperative strategies [92,93,94,95,96]. More specifically, evolutionary-based approaches, such as GAs [93,95] and PSO [92,94,96], rely on a population of solutions that evolve over time through processes like selection, crossover, and mutation. Cooperation among agents is integral to these methods, as it allows for the sharing of information and traits, which enhances the exploration and exploitation of the solution space. For example, in GAs, the crossover operation enables agents to combine their best features, while in PSO, particles share information about their best positions, guiding the swarm towards optimal solutions. This inherent cooperation helps prevent the algorithm from getting stuck in local optima and ensures a more effective search for global optima, making cooperative strategies dominant in these approaches.
MPC, on the other hand, often employs decentralized strategies, adjusting its design to operate with agents that make decisions based on local information. Such a tendency suits applications where agents function independently or in loosely coupled environments, allowing for flexibility while still optimizing individual objectives [80,81,82,84,85,86,87,88,89,90]. The combination of decentralized and cooperative (DE/CO) strategies in some applications suggests a combined approach, where agents maintain autonomy but still engage in limited collaboration to enhance overall system performance [82].
In ADMM approaches, the algorithm inherently requires cooperation among agents to solve decomposed sub-problems iteratively, leading to a strong preference for cooperative strategies [97,98,99,100,103,110,116]. However, due to its distributed nature, some ADMM applications may also exhibit decentralized characteristics, where agents work on different parts of the problem while coordinating with others to ensure convergence [101,102].
Lyapunov optimization, with its focus on stability and performance under constraints, tends to favor non-cooperative strategies, where agents prioritize their individual goals to maintain system stability. This approach is consistent with the algorithm’s need to enforce strict performance criteria, often requiring agents to operate independently rather than cooperatively [104,105,106]. These tendencies illustrate how the core architecture and operational principles of each algorithm influence the choice of multi-agent optimization strategies in various control scenarios.

5.3. Evaluation per BEMS Type

According to the evaluation, HVAC equipment is the primary concern of multi-agent control for IBEMS, since it comprises a major energy-consuming element in buildings, often accounting for a significant portion of energy costs (Figure 10). Additionally, the complexity and variability of heating and cooling needs make HVAC systems ideal for the advanced optimization capabilities offered by multi-agent control systems. RES—such as PV, solar heating, or wind turbines—are also a frequent concern of multi-agent control, since such equipment is commonly integrated into the energy mix of a building. The growing emphasis on sustainability and energy efficiency drives the adoption of such systems, which may be dynamically managed by multi-agent control frameworks to optimize energy generation and consumption. Additionally, the variability of RES aligns well with multi-agent systems’ ability to adapt to changing conditions and optimize the use of available resources. Likewise, ESS are frequently used in the energy mix of IBEMS, since they are able to buffer the variability of RES by storing excess energy for use during periods of low generation. Such integration enhances energy reliability and efficiency, allowing buildings to maximize the benefits of RES and maintain a stable energy supply. EVs exhibit a lower occurrence in the IBEMS mix; however, it is noticeable that such energy systems are often integrated with RES and ESS to optimize energy usage and support sustainable transportation solutions. The combination of EVs with RES and ESS enables the use of renewable energy for charging, reducing the carbon footprint of EV operations. LS are an inseparable part of buildings, and thus numerous applications concern the adaptability of such systems for enhancing energy efficiency and creating more comfortable and productive environments. What is noticeable in the evaluation is that applications concerning LS are common in evolutionary algorithm implementations along with other energy systems such as HVAC, RES, and ESS [92,93,94,95]. Last but not least, DHW systems are frequently coupled with RES, utilizing renewable energy to meet water heating demands efficiently. Such trends demonstrate the ability of multi-agent control systems to manage complex energy interactions and promote sustainability across various building functions. The following Figure 10 illustrates the occurrence of the different energy systems in the IBEMS frameworks as denoted in the literature.
A notable trend emerging from the evaluation pertains to the combinations of different kinds of equipment within IBEMS frameworks. The most frequent combination involves HVAC/RES/ESS, observed in seven instances across the reviewed literature [62,68,75,88,113,122,128]. Similarly, the integration of RES/ESS appears five times [55,71,74,77,119], while more intricate interactions, such as those involving RES/ESS/other equipment, are noted in six cases [58,59,72,73,124,126]. It is evident that as the complexity of integration increases, the occurrence diminishes. For example, combinations involving four or more energy systems, such as HVAC/LS/RES/ESS, are documented three times [93,94,95], while HVAC/RES/ESS/EVs appears just twice in the evaluated studies [96,102]. Such a trend suggests a gap in the research on more complex, multi-system IBEMS configurations. Addressing this gap may provide valuable insights into optimizing energy efficiency and system coordination in buildings where multiple subsystems (HVAC, RES, ESS, LS, DHW, and EVs) are required to operate in an integrated and cohesive manner. Research in this area would benefit from exploring challenges such as interoperability, system dynamics, and control strategies for more holistic energy management solutions.

5.4. Evaluation per Building Type

According to the evaluation, multi-agent control applications for IBEMS are more prevalent in commercial buildings, such as offices [67,70,79,84,90,98,109,110,116] and university research centers [83,120,121,129], due to the complex energy demands and potential for energy savings in such environments (Figure 11 (Left) and Figure 11 (Right)). Since commercial buildings often exhibit diverse and dynamic occupancy patterns, they require more sophisticated management strategies to optimize energy efficiency. Multi-agent control frameworks may effectively manage and coordinate various subsystems and their combination in the energy mix, adapting to changing conditions and user needs. Furthermore, the larger scale and higher energy consumption of commercial buildings offer significant opportunities for research into advanced control strategies, making them a prime focus for studies on improving energy efficiency and sustainability.
It should be noted, though, that multi-agent control in residential settings mostly involves multi-zone, multi-storey residential ecosystems [80,81,85,86,87,88,96], such as communities or green districts. Such settings imply the integration of various heterogeneous kinds of energy system equipment and thus retain significant complexity in comparison to a conventional single-storey residential apartment. Last but not least, there are numerous cases in which multi-agent control applications concerned both commercial and residential structures [60,71,74,75,82,84,88,102,124]. Such a tendency highlights the versatility of multi-agent control systems in managing diverse building types by addressing the distinct energy needs and occupancy patterns of mixed-use developments, thereby enhancing overall energy efficiency.
The following Figure 11 illustrates the occurrence of residential and commercial buildings found in the literature (2014–2024).

5.5. Evaluation per Application Type

According to the evaluation, multi-agent control applications for IBEMS are limited in real-life implementations primarily due to the complexity and cost of deployment [58,70,73,90,91,107,109,110,113,117,118,119,120,121]. Such a tendency is reasonable considering the investment and effort required for infrastructure, sensors, communication networks, and control systems. Moreover, the unpredictable nature of building environments, such as varying user behaviors and external factors, poses challenges for the reliability and scalability of such control frameworks. Consequently, the vast majority of applications rely on simulations to test and refine multi-agent strategies, as they offer a controlled environment to explore theoretical concepts without the logistical and financial constraints of physical implementation. Many of them, though, integrate real-world historical data in an effort to enhance the accuracy and relevance of their models, allowing researchers to validate their control strategies under realistic conditions [63,104,106]. This approach helps ensure that simulations account for actual energy consumption patterns and environmental variables, improving the potential for successful real-world application. The following Figure 12 portrays the occurrence and percentage (%) of simulative and real-life application types over the last decade (2014–2024).
What is evident, though, is that other/independent algorithmic approaches exhibit a higher prevalence in real-world applications (Table 6 and Table 7). Such a tendency may be justified, since such multi-agent approaches may be tailored to specific building conditions and constraints, providing more flexibility than standard methods like RL or MPC. Such bespoke solutions often address unique challenges and limitations that common algorithms might not handle effectively, increasing their suitability for practical real-life deployment [117,118,119,120,121].

5.6. Evaluation per Simulation Tool

In the framework of IBEMS, advanced simulation tools play a crucial role in developing and testing multi-agent control strategies. Such tools allow researchers to model complex building environments, simulate various control scenarios, and optimize energy management systems using a combination of traditional modeling techniques and advanced machine learning approaches. This combination of simulation tools and machine learning paves the way for intelligent and sustainable energy management systems, reducing energy consumption and enhancing building performance.
More specifically, platforms like OpenAI Gym have been widely utilized to support the development and testing of multi-agent control strategies for IBEMS. Since OpenAI Gym supports the implementation of RL algorithms, it has helped researchers create simulated environments that mimic real-world building operations [71,72,74,75,76]. In particular, the utilization of the CityLearn open-source environment, as denoted in [77,132], leveraged OpenAI Gym’s ability to provide a standardized setting for developing and testing RL algorithms, facilitating the exploration of decentralized and cooperative control strategies.
EnergyPlus is another building energy simulation tool, commonly utilized to model heating, cooling, lighting, ventilation, and other energy flows within a building, allowing researchers to simulate and analyze the energy performance of building systems [66,86,123]. According to the evaluation, EnergyPlus is often used in combination with other tools to simulate building environments where agents may learn and adapt their control strategies. For instance, Xie et al. [76] utilized EnergyPlus along with OpenAI Gym for multi-agent experiments, while others [61,80,81] combined the tool with the Building Controls Virtual Test Bed (BCVTB) for space modeling and energy performance analysis.
Modelica is another open-source, object-oriented modeling language commonly used for simulating complex multi-domain systems, capturing interactions between mechanical, electrical, and thermal components [70]. The language does not focus exclusively on building energy systems but possesses more general-purpose features than EnergyPlus. In particular, the use of Dymola—a powerful simulation tool that provides a graphical user interface and advanced simulation algorithms for developing and analyzing models written in Modelica—has also been commonly denoted in the literature. More specifically, Blad et al. [69] utilized Modelica/Dymola for simulating data-driven models in building control, while Michailidis et al. [129] and Sangi et al. [83] used the tools to apply novel multi-agent control scenarios where complex system interactions were required to be modeled, such as integrating HVAC, lighting, and control systems.
Similar to Dymola, TRNSYS is often used for studying the dynamic performance of systems and optimizing energy flows in BEMS through control strategies, including those implemented by multi-agent systems [95,100,101]. Such a tool was specifically designed for simulating transient energy systems and building performance, focusing on dynamic interactions over time. It excels in modeling RES, HVACs, and building energy consumption.
Last but not least, the Java Agent Development Framework (JADE), a software framework that simplifies the development of multi-agent systems, is also commonly denoted in the literature. It is widely used in academic research for implementing agent-based models due to its flexibility and ease of integration with other software environments. For instance, Hurtado et al. [57,92] utilized JADE with MATLAB to enable smart building systems, while Kofinas et al. [58] used the tool for developing FLC in multi-agent systems.
According to the evaluation, it is evident that OpenAI Gym and CityLearn are particularly favored for reinforcement learning experiments, while EnergyPlus and Modelica/Dymola are preferred for detailed energy simulations and dynamic system modeling. The following Figure 13 portrays the occurrence of each simulation tool as denoted in the literature for multi-agent control simulation experiments.

6. Conclusions

This review critically examines the most significant applications of multi-agent control strategies for IBEMS over the past decade (2014–2024). By drawing fruitful conclusions from the evaluation of various algorithmic methodologies, the evaluation underscores the critical importance of selecting appropriate methodologies to effectively address the unique challenges of smart buildings. The most significant conclusions drawn from the evaluation are as follows:
  • Prevalence of Algorithmic Control Methodologies: RL remains the most prominent algorithmic approach for multi-agent control in IBEMS, with methods such as Q-learning, Deep Q-Networks, and actor–critic approaches (e.g., MAAC) being the most frequently used. Such strategies excelled at handling complex environments with large state spaces, and their effectiveness in optimizing energy management and occupant comfort makes them a top choice for multi-agent scenarios. However, it is observed that, overall, model-based control strategies are more commonly used in the literature than model-free ones. In particular, MPC and other model-based approaches, such as ADMM-based and PSO-based, have been shown to offer strong predictive capabilities and real-time adaptability, making them essential for managing dynamic environments where anticipatory control is required. The dominance of model-based control methodologies, however, highlights an opportunity for modern research to delve deeper into model-free methodologies, which offer more flexible and adaptive solutions that are potentially less reliant on predefined system models. This trend could also be supported by the rise of IoT technologies, enabling more data-driven, real-time control without the need for complex system modeling.
  • Rising Adoption of Hybrid Methodologies: Hybrid control schemes, integrating methodologies like RL, MPC, EAs, and ANN/FLC, are becoming increasingly popular due to their ability to leverage the strengths of multiple techniques. Such methods offer enhanced flexibility and adaptability, particularly in dynamically changing environments where multiple control objectives must be optimized simultaneously. According to the evaluation, hybrid methodologies hold strong potential for the future, not only due to their ability to synergize strengths from different algorithmic paradigms but also because they provide a framework for adaptive control in increasingly complex and uncertain environments. To this end, as the complexity of IBEMS frameworks continues to grow—requiring robust coordination and real-time adaptability—hybrid methodologies will play a crucial role in optimizing energy management, improving system efficiency, and enhancing occupant comfort.
  • Prevalence of Cooperative Agent Type Strategies: According to the evaluation, cooperative agents represent a particularly advantageous scheme in multi-agent control systems, especially in RL and evolutionary algorithms. Cooperation enables agents to share critical information and align their objectives, which is crucial in interdependent systems like HVAC, RES, and ESS. By coordinating actions, cooperative agents can optimize energy management more effectively than decentralized agents, which operate independently and may struggle in complex environments with high interdependencies. The ability to exchange experiences accelerates learning in RL, leading to better collective outcomes. Similarly, in evolutionary algorithms, cooperation prevents premature convergence by promoting diverse exploration of the solution space. In contrast, purely decentralized approaches may underperform in scenarios requiring high coordination, highlighting the superiority of cooperative methods in managing complex, distributed energy systems.
  • Constraints on Multi-Equipment Integration in IBEMS: The evaluation highlights HVAC systems as the predominant focus of multi-agent control in IBEMS due to their high energy consumption. RES, often combined with ESS, also feature prominently, contributing to sustainable building energy management. A key trend identified concerns the relatively low frequency of applications involving more than four types of systems. While multi-system interactions involving HVAC, RES, ESS, and EVs remain limited, such integrations are essential in real-world applications for achieving comprehensive energy efficiency. As IBEMS grow more complex with diverse subsystems, the need for advanced multi-agent control methods capable of managing this complexity, ensuring real-time adaptability, and facilitating effective inter-agent coordination, becomes increasingly critical.
  • Limited Real-life Implementations: Despite the potential of multi-agent control systems, real-world applications are still limited due to the high cost, complexity, and logistical challenges of deployment. However, it is noticeable that real-life implementations tend to favor novel approaches—like Hybrid or Other approaches—because they offer greater flexibility and customization to specific building conditions. Unlike conventional approaches, customized approaches may provide better handling of real-world complexities, uncertainties, and system-specific constraints, making them more practical for deployment in diverse and dynamic environments. Such a fact is also supported by numerous literature works, where comparisons often highlight that novel approaches outperform conventional methods. Another key observation is that real-life implementations predominantly focus on commercial buildings, underscoring the need for more extensive research and deployment at the residential level. Given the unique energy demands and dynamics of residential environments, there is a clear opportunity to expand multi-agent control systems to optimize energy use and sustainability in homes, which remains underexplored in comparison to commercial real-world applications.
In conclusion, the ongoing advancements in multi-agent control for energy management in buildings reflect a strong commitment to developing more efficient, adaptable, and intelligent energy management systems. By leveraging the complementary strengths of model-free and model-based approaches, future research and practical applications will be well equipped to meet the evolving demands of smart buildings, contributing to the creation of more sustainable and responsive urban environments.

Author Contributions

Conceptualization, P.M.; methodology, P.M. and I.M.; software, P.M.; validation, all authors; formal analysis, all authors; investigation, all authors; resources, all authors; writing—original draft preparation, P.M.; writing—review and editing, all authors; visualization, all authors; supervision, P.M. and E.K. All authors have read and agreed to the published version of the manuscript.

Funding

The research leading to these results was partially funded by the European Commission HORIZON-CL5-2023-D4-01-05—Innovative solutions for cost-effective decarbonisation of buildings through energy efficiency and electrification (IA): SEEDS—Cost-effective and replicable RES-integrated electrified heating and cooling systems for improved energy efficiency and demand response (Grant agreement ID: 101138211), https://project-seeds.eu/; EC signature date: 12 December 2023; start date: 1 January 2024; end date: 31 December 2027.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

AC	Actor–critic
ACKTR	Actor–critic using Kronecker-Factored Trust Region
ADMM	Alternating Direction Method of Multipliers
ANN	Artificial Neural Network
BDQ	Branching Dueling Q-network
BOC	Building Optimization and Control
CQL	Coordinated Q-learning
CO	Cooperative
D3QN	Dueling Double Deep Q-Network
DCCMARL	Deep Clustering of Cooperative Multi-agent Reinforcement Learning
DE	Decentralized
DHW	Domestic Hot Water systems
DQN	Deep Q-Networks
DR	Demand Response
EA	Evolutionary Algorithm
E-MPC	Economic-MPC
ESS	Energy Storage Systems
EVs	Electric Vehicles
FLC	Fuzzy logic control
GA	Genetic Algorithm
HVAC	Heating, Ventilation, and Air conditioning
IBEMS	Integrated Building Energy Management Systems
IoT	Internet of Things
J-ADMM	Jacobian ADMM
L4GPCAO	Local-for-Global Parameterized Cognitive Adaptive Optimization
LCMA	Lyapunov-based Cost Minimization Algorithm
LSTM	Long short-term memory networks
MAAC	Multi-Actor Attention–Critic
MARL	Multi-agent reinforcement learning
MDP	Markov Decision Process
MLP	Multi-Layer Perceptron
MPC	Model Predictive Control
N/A	Not identified
Non-CO	Non-cooperative
NR	Newton–Raphson
PPO	Proximal Policy Optimization
PSO	Particle Swarm Optimization
QP	Quadratic Programming
RES	Renewable Energy Sources
SAC	Soft Actor–Critic
Smart-TBSA	Smart-Token-Based Scheduling Algorithm
SoC	State of Charge
SoS	System of Systems
SQP	Sequential Quadratic Programming
TD3	Twin-Delayed Deep Deterministic

References

  1. Santamouris, M.; Feng, J. Recent progress in daytime radiative cooling: Is it the air conditioner of the future? Buildings 2018, 8, 168. [Google Scholar] [CrossRef]
  2. Zhang, W.; Wu, Y.; Calautit, J.K. A review on occupancy prediction through machine learning for enhancing energy efficiency, air quality and thermal comfort in the built environment. Renew. Sustain. Energy Rev. 2022, 167, 112704. [Google Scholar] [CrossRef]
  3. Zhou, Y.; Liu, J. Advances in emerging digital technologies for energy efficiency and energy integration in smart cities. Energy Build. 2024, 315, 114289. [Google Scholar] [CrossRef]
  4. Papada, L.; Balaskas, A.; Katsoulakos, N.; Kaliampakos, D.; Damigos, D. Fighting energy poverty using user-driven approaches in mountainous Greece: Lessons learnt from a living lab. Energies 2021, 14, 1525. [Google Scholar] [CrossRef]
  5. Minelli, F.; Ciriello, I.; Minichiello, F.; D’Agostino, D. From Net Zero Energy Buildings to an energy sharing model-The role of NZEBs in Renewable Energy Communities. Renew. Energy 2024, 223, 120110. [Google Scholar] [CrossRef]
  6. Dhabliya, D.; Gopalakrishnan, S.; Mudigonda, A.; Omirbayevna, T.G.; Rajalakshmi, K.; Kulshreshtha, K.; Shnain, A.H.; KrishnaBhargavi, Y. Utilizing Big Data and environmentally-focused innovations to create smart, sustainable cities by integrating energy management, energy-efficient buildings, pollution mitigation, and urban circulation. In Proceedings of the 2023 International Conference for Technological Engineering and Its Applications in Sustainable Development (ICTEASD), Al-Najaf, Iraq, 14–15 November 2023; IEEE: New York, NY, USA, 2023; pp. 402–408. [Google Scholar]
  7. Michailidis, P.; Michailidis, I.; Gkelios, S.; Kosmatopoulos, E. Artificial Neural Network Applications for Energy Management in Buildings: Current Trends and Future Directions. Energies 2024, 17, 570. [Google Scholar] [CrossRef]
  8. Hannan, M.A.; Faisal, M.; Ker, P.J.; Mun, L.H.; Parvin, K.; Mahlia, T.M.I.; Blaabjerg, F. A review of internet of energy based building energy management systems: Issues and recommendations. IEEE Access 2018, 6, 38997–39014. [Google Scholar] [CrossRef]
  9. Ahmed, I.; Asif, M.; Alhelou, H.H.; Khalid, M.; Khalid, M. A review on enhancing energy efficiency and adaptability through system integration for smart buildings. J. Build. Eng. 2024, 89, 109354. [Google Scholar]
  10. Michailidis, P.; Pelitaris, P.; Korkas, C.; Michailidis, I.; Baldi, S.; Kosmatopoulos, E. Enabling optimal energy management with minimal IoT requirements: A legacy A/C case study. Energies 2021, 14, 7910. [Google Scholar] [CrossRef]
  11. Ahn, J. An Adaptive Control Model for Thermal Environmental Factors to Supplement the Sustainability of a Small-Sized Factory. Sustainability 2023, 15, 16619. [Google Scholar] [CrossRef]
  12. Minelli, F.; D’Agostino, D.; Migliozzi, M.; Minichiello, F.; D’Agostino, P. PhloVer: A Modular and integrated tracking photovoltaic shading Device for sustainable large urban spaces—preliminary Study and prototyping. Energies 2023, 16, 5786. [Google Scholar] [CrossRef]
  13. Fotopoulou, M.C.; Drosatos, P.; Petridis, S.; Rakopoulos, D.; Stergiopoulos, F.; Nikolopoulos, N. Model predictive control for the energy Management in a District of buildings equipped with building integrated photovoltaic systems and batteries. Energies 2021, 14, 3369. [Google Scholar] [CrossRef]
  14. Papantoniou, S.; Mangili, S.; Mangialenti, I. Using intelligent building energy management system for the integration of several systems to one overall monitoring and management system. Energy Procedia 2017, 111, 639–647. [Google Scholar] [CrossRef]
  15. Tsaousoglou, G.; Efthymiopoulos, N.; Makris, P.; Varvarigos, E. Multistage energy management of coordinated smart buildings: A multiagent Markov decision process approach. IEEE Trans. Smart Grid 2022, 13, 2788–2797. [Google Scholar] [CrossRef]
  16. Xia, W.; Goh, J.; Cortes, C.A.; Lu, Y.; Xu, X. Decentralized coordination of autonomous AGVs for flexible factory automation in the context of Industry 4.0. In Proceedings of the 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE), Hong Kong, China, 20–21 August 2020; IEEE: New York, NY, USA, 2020; pp. 488–493. [Google Scholar]
  17. Tan, K.K.; Putra, A.S. Drives and Control for Industrial Automation; Springer Science & Business Media: London, UK, 2010. [Google Scholar]
  18. Tsintotas, K.A.; Kansizoglou, I.; Konstantinidis, F.K.; Mouroutsos, S.G.; Syrakoulis, G.C.; Psarommatis, F.; Aloimonos, Y.; Gasteratos, A. Active vision: A promising technology for achieving zero-defect manufacturing. Procedia Comput. Sci. 2024, 232, 2821–2830. [Google Scholar] [CrossRef]
  19. Michailidis, I.T.; Manolis, D.; Michailidis, P.; Diakaki, C.; Kosmatopoulos, E.B. Autonomous self-regulating intersections in large-scale urban traffic networks: A Chania City case study. In Proceedings of the 2018 5th International Conference on Control, Decision and Information Technologies (CoDIT), Thessaloniki, Greece, 10–13 April 2018; IEEE: New York, NY, USA, 2018; pp. 853–858. [Google Scholar]
  20. Michailidis, I.T.; Manolis, D.; Michailidis, P.; Diakaki, C.; Kosmatopoulos, E.B. A decentralized optimization approach employing cooperative cycle-regulation in an intersection-centric manner: A complex urban simulative case study. Transp. Res. Interdiscip. Perspect. 2020, 8, 100232. [Google Scholar] [CrossRef]
  21. Baskar, L.; De Schutter, B.; Hellendoorn, H. Decentralized traffic control and management with intelligent vehicles. In Proceedings of the 9th TRAIL Congress, Delft, The Netherlands, 21 November 2006; Volume 1, p. 3. [Google Scholar]
  22. Karatzinis, G.D.; Michailidis, P.; Michailidis, I.T.; Kapoutsis, A.C.; Kosmatopoulos, E.B.; Boutalis, Y.S. Coordinating heterogeneous mobile sensing platforms for effectively monitoring a dispersed gas plume. Integr. Comput.-Aided Eng. 2022, 29, 411–429. [Google Scholar] [CrossRef]
23. Figetakis, E.; Bello, Y.; Refaey, A.; Shami, A. Decentralized semantic traffic control in AVs using RL and DQN for dynamic roadblocks. arXiv 2024, arXiv:2406.18741. [Google Scholar]
  24. Cervo, A.; Goldoni, G.; Fantuzzi, C.; Borsari, R. Decentralized line equipment detection and production control by multi-agent technology. In Proceedings of the IECON 2019-45th Annual Conference of the IEEE Industrial Electronics Society, Lisbon, Portugal, 14–17 October 2019; IEEE: New York, NY, USA, 2019; Volume 1, pp. 2940–2945. [Google Scholar]
  25. Ramadan, R.; Huang, Q.; Zalhaf, A.S.; Bamisile, O.; Li, J.; Mansour, D.E.A.; Lin, X.; Yehia, D.M. Energy Management in Residential Microgrid Based on Non-Intrusive Load Monitoring and Internet of Things. Smart Cities 2024, 7, 1907–1935. [Google Scholar] [CrossRef]
  26. Hamid, M.N.A.; Banakhr, F.A.; Mohamed, T.H.; Ali, S.M.; Mahmoud, M.M.; Mosaad, M.I.; Albla, A.A.H.; Hussein, M.M. Adaptive Frequency Control of an Isolated Microgrids Implementing Different Recent Optimization Techniques. Int. J. Robot. Control Syst. 2024, 4, 1000–1012. [Google Scholar] [CrossRef]
  27. Mussi, M.; Pellegrino, L.; Pindaro, O.F.; Restelli, M.; Trovò, F. A Reinforcement Learning controller optimizing costs and battery State of Health in smart grids. J. Energy Storage 2024, 82, 110572. [Google Scholar] [CrossRef]
  28. Rajani, D.; Gopal, J.V.; Saxena, A.; Salman, Z.N.; Jain, A. Enhancing Electric Vehicle Charging with Dynamic Adaptation: A Machine Learning Approach to Improving Grid Alignment Precision. In Proceedings of the 2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE), Gautam Buddha Nagar, India, 9–11 May 2024; IEEE: New York, NY, USA, 2024; pp. 702–707. [Google Scholar]
  29. Sivakumar, S.; Yadav, M.R.; Bharathi, A.; Akila, D.; Dineshkumar, P.; Banupriya, V. Reinforcement Learning Driven Smart Charging Algorithms to Enhance Battery Lifespan and Grid Sustainability. In Proceedings of the 2024 International Conference on Advances in Modern Age Technologies for Health and Engineering Science (AMATHE), Shivamogga, India, 16–17 May 2024; IEEE: New York, NY, USA, 2024; pp. 1–7. [Google Scholar]
  30. Michailidis, P.; Michailidis, I.; Vamvakas, D.; Kosmatopoulos, E. Model-Free HVAC Control in Buildings: A Review. Energies 2023, 16, 7124. [Google Scholar] [CrossRef]
  31. Gao, C.; Wang, D. Comparative study of model-based and model-free reinforcement learning control performance in HVAC systems. J. Build. Eng. 2023, 74, 106852. [Google Scholar] [CrossRef]
  32. Biagioni, D.; Zhang, X.; Adcock, C.; Sinner, M.; Graf, P.; King, J. From model-based to model-free: Learning building control for demand response. arXiv 2022, arXiv:2210.10203. [Google Scholar]
  33. Lazaridis, C.R.; Michailidis, I.; Karatzinis, G.; Michailidis, P.; Kosmatopoulos, E. Evaluating Reinforcement Learning Algorithms in Residential Energy Saving and Comfort Management. Energies 2024, 17, 581. [Google Scholar] [CrossRef]
  34. Gautam, M. Deep Reinforcement learning for resilient power and energy systems: Progress, prospects, and future avenues. Electricity 2023, 4, 336–380. [Google Scholar] [CrossRef]
  35. Xin, X.; Zhang, Z.; Zhou, Y.; Liu, Y.; Wang, D.; Nan, S. A comprehensive review of predictive control strategies in Heating, Ventilation, and Air-conditioning (HVAC): Model-free VS Model. J. Build. Eng. 2024, 94, 110013. [Google Scholar] [CrossRef]
  36. Vamvakas, D.; Michailidis, P.; Korkas, C.; Kosmatopoulos, E. Review and evaluation of reinforcement learning frameworks on smart grid applications. Energies 2023, 16, 5326. [Google Scholar] [CrossRef]
  37. Kurte, K.; Amasyali, K.; Munk, J.; Zandi, H. Comparative analysis of model-free and model-based HVAC control for residential demand response. In Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Coimbra, Portugal, 17–18 November 2021; pp. 309–313. [Google Scholar]
  38. Mahela, O.P.; Khosravy, M.; Gupta, N.; Khan, B.; Alhelou, H.H.; Mahla, R.; Patel, N.; Siano, P. Comprehensive overview of multi-agent systems for controlling smart grids. CSEE J. Power Energy Syst. 2020, 8, 115–131. [Google Scholar]
  39. Naylor, S.; Gillott, M.; Lau, T. A review of occupant-centric building control strategies to reduce building energy use. Renew. Sustain. Energy Rev. 2018, 96, 1–10. [Google Scholar] [CrossRef]
  40. Merabet, G.H.; Essaaidi, M.; Haddou, M.B.; Qolomany, B.; Qadir, J.; Anan, M.; Al-Fuqaha, A.; Abid, M.R.; Benhaddou, D. Intelligent building control systems for thermal comfort and energy-efficiency: A systematic review of artificial intelligence-assisted techniques. Renew. Sustain. Energy Rev. 2021, 144, 110969. [Google Scholar] [CrossRef]
  41. Kathirgamanathan, A.; De Rosa, M.; Mangina, E.; Finn, D.P. Data-driven predictive control for unlocking building energy flexibility: A review. Renew. Sustain. Energy Rev. 2021, 135, 110120. [Google Scholar] [CrossRef]
  42. Al Dakheel, J.; Del Pero, C.; Aste, N.; Leonforte, F. Smart buildings features and key performance indicators: A review. Sustain. Cities Soc. 2020, 61, 102328. [Google Scholar] [CrossRef]
  43. Latif, M.; Nasir, A. Decentralized stochastic control for building energy and comfort management. J. Build. Eng. 2019, 24, 100739. [Google Scholar] [CrossRef]
  44. Wang, Z.; Hong, T. Reinforcement learning for building controls: The opportunities and challenges. Appl. Energy 2020, 269, 115036. [Google Scholar] [CrossRef]
45. Váňa, Z.; Cigler, J.; Široký, J.; Žáčeková, E.; Ferkl, L. Model-based energy efficient control applied to an office building. J. Process Control 2014, 24, 790–797. [Google Scholar] [CrossRef]
  46. Moroşan, P.D.; Bourdais, R.; Dumur, D.; Buisson, J. Building temperature regulation using a distributed model predictive control. Energy Build. 2010, 42, 1445–1452. [Google Scholar] [CrossRef]
  47. Eser, S.; Stoffel, P.; Kümpel, A.; Müller, D. Distributed model predictive control of a nonlinear building energy system using consensus ADMM. In Proceedings of the 2022 30th Mediterranean Conference on Control and Automation (MED), Athens, Greece, 28 June–1 July 2022; pp. 902–907. [Google Scholar]
  48. Yang, R.; Wang, L. Multi-zone building energy management using intelligent control and optimization. Sustain. Cities Soc. 2013, 6, 16–21. [Google Scholar] [CrossRef]
49. Smitha, S.; Chacko, F.M. Intelligent energy management in smart and sustainable buildings with multi-agent control system. In Proceedings of the 2013 International Multi-Conference on Automation, Computing, Communication, Control and Compressed Sensing (iMac4s), Kottayam, India, 22–23 March 2013; pp. 190–195. [Google Scholar]
  50. Buşoniu, L.; Babuška, R.; De Schutter, B. Multi-agent reinforcement learning: An overview. In Innovations in Multi-Agent Systems and Applications-1; Springer: Berlin/Heidelberg, Germany, 2010; pp. 183–221. [Google Scholar]
  51. Morari, M.; Garcia, C.E.; Prett, D.M. Model predictive control: Theory and practice. IFAC Proc. Vol. 1988, 21, 1–12. [Google Scholar] [CrossRef]
  52. Doukas, H.; Patlitzianas, K.D.; Iatropoulos, K.; Psarras, J. Intelligent building energy management system using rule sets. Build. Environ. 2007, 42, 3562–3569. [Google Scholar] [CrossRef]
  53. Yang, Y.; Hu, G.; Spanos, C.J. HVAC energy cost optimization for a multizone building via a decentralized approach. IEEE Trans. Autom. Sci. Eng. 2020, 17, 1950–1960. [Google Scholar] [CrossRef]
54. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122. [Google Scholar]
  55. Raju, L.; Sankar, S.; Milton, R. Distributed optimization of solar micro-grid using multi agent reinforcement learning. Procedia Comput. Sci. 2015, 46, 231–239. [Google Scholar] [CrossRef]
  56. Anvari-Moghaddam, A.; Rahimi-Kian, A.; Mirian, M.S.; Guerrero, J.M. A multi-agent based energy management solution for integrated buildings and microgrid system. Appl. Energy 2017, 203, 41–56. [Google Scholar] [CrossRef]
  57. Hurtado, L.A.; Mocanu, E.; Nguyen, P.H.; Gibescu, M.; Kamphuis, R.I. Enabling cooperative behavior for building demand response based on extended joint action learning. IEEE Trans. Ind. Inform. 2017, 14, 127–136. [Google Scholar] [CrossRef]
  58. Kofinas, P.; Dounis, A.I.; Vouros, G.A. Fuzzy Q-Learning for multi-agent decentralized energy management in microgrids. Appl. Energy 2018, 219, 53–67. [Google Scholar] [CrossRef]
  59. Prasad, A.; Dusparic, I. Multi-agent deep reinforcement learning for zero energy communities. In Proceedings of the 2019 IEEE PES Innovative Smart Grid Technologies Europe (ISGT-Europe), Bucharest, Romania, 29 September–2 October 2019; pp. 1–5. [Google Scholar]
  60. Gupta, A.; Badr, Y.; Negahban, A.; Qiu, R.G. Energy-efficient heating control for smart buildings with deep reinforcement learning. J. Build. Eng. 2021, 34, 101739. [Google Scholar] [CrossRef]
61. Nagarathinam, S.; Menon, V.; Vasan, A.; Sivasubramaniam, A. MARCO: Multi-agent reinforcement learning based control of building HVAC systems. In Proceedings of the Eleventh ACM International Conference on Future Energy Systems, Virtual, 22–26 June 2020; pp. 57–67. [Google Scholar]
  62. Lee, S.; Choi, D.H. Federated reinforcement learning for energy management of multiple smart homes with distributed energy resources. IEEE Trans. Ind. Inform. 2020, 18, 488–497. [Google Scholar] [CrossRef]
  63. Xu, X.; Jia, Y.; Xu, Y.; Xu, Z.; Chai, S.; Lai, C.S. A multi-agent reinforcement learning-based data-driven method for home energy management. IEEE Trans. Smart Grid 2020, 11, 3201–3211. [Google Scholar] [CrossRef]
  64. Vazquez-Canteli, J.R.; Henze, G.; Nagy, Z. MARLISA: Multi-agent reinforcement learning with iterative sequential action selection for load shaping of grid-interactive connected buildings. In Proceedings of the 7th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Virtual, 18–20 November 2020; pp. 170–179. [Google Scholar]
  65. Yu, L.; Sun, Y.; Xu, Z.; Shen, C.; Yue, D.; Jiang, T.; Guan, X. Multi-agent deep reinforcement learning for HVAC control in commercial buildings. IEEE Trans. Smart Grid 2020, 12, 407–419. [Google Scholar] [CrossRef]
  66. Fu, Q.; Chen, X.; Ma, S.; Fang, N.; Xing, B.; Chen, J. Optimal control method of HVAC based on multi-agent deep reinforcement learning. Energy Build. 2022, 270, 112284. [Google Scholar] [CrossRef]
  67. Yu, L.; Xu, Z.; Zhang, T.; Guan, X.; Yue, D. Energy-efficient personalized thermal comfort control in office buildings based on multi-agent deep reinforcement learning. Build. Environ. 2022, 223, 109458. [Google Scholar] [CrossRef]
  68. Shen, R.; Zhong, S.; Wen, X.; An, Q.; Zheng, R.; Li, Y.; Zhao, J. Multi-agent deep reinforcement learning optimization framework for building energy system with renewable energy. Appl. Energy 2022, 312, 118724. [Google Scholar] [CrossRef]
  69. Blad, C.; Bøgh, S.; Kallesøe, C.S. Data-driven offline reinforcement learning for HVAC-systems. Energy 2022, 261, 125290. [Google Scholar] [CrossRef]
  70. Lei, Y.; Zhan, S.; Ono, E.; Peng, Y.; Zhang, Z.; Hasama, T.; Chong, A. A practical deep reinforcement learning framework for multivariate occupant-centric control in buildings. Appl. Energy 2022, 324, 119742. [Google Scholar] [CrossRef]
  71. Pinto, G.; Kathirgamanathan, A.; Mangina, E.; Finn, D.P.; Capozzoli, A. Enhancing energy management in grid-interactive buildings: A comparison among cooperative and coordinated architectures. Appl. Energy 2022, 310, 118497. [Google Scholar] [CrossRef]
  72. Chu, Y.; Wei, Z.; Sun, G.; Zang, H.; Chen, S.; Zhou, Y. Optimal home energy management strategy: A reinforcement learning method with actor-critic using Kronecker-factored trust region. Electr. Power Syst. Res. 2022, 212, 108617. [Google Scholar] [CrossRef]
  73. Gao, Y.; Matsunami, Y.; Miyata, S.; Akashi, Y. Multi-agent reinforcement learning dealing with hybrid action spaces: A case study for off-grid oriented renewable building energy system. Appl. Energy 2022, 326, 120021. [Google Scholar] [CrossRef]
  74. Pigott, A.; Crozier, C.; Baker, K.; Nagy, Z. Gridlearn: Multiagent reinforcement learning for grid-aware building energy management. Electr. Power Syst. Res. 2022, 213, 108521. [Google Scholar] [CrossRef]
  75. Qiu, D.; Xue, J.; Zhang, T.; Wang, J.; Sun, M. Federated reinforcement learning for smart building joint peer-to-peer energy and carbon allowance trading. Appl. Energy 2023, 333, 120526. [Google Scholar] [CrossRef]
  76. Xie, J.; Ajagekar, A.; You, F. Multi-agent attention-based deep reinforcement learning for demand response in grid-responsive buildings. Appl. Energy 2023, 342, 121162. [Google Scholar] [CrossRef]
  77. Nweye, K.; Sankaranarayanan, S.; Nagy, Z. MERLIN: Multi-agent offline and transfer learning for occupant-centric operation of grid-interactive communities. Appl. Energy 2023, 346, 121323. [Google Scholar] [CrossRef]
  78. Homod, R.Z.; Yaseen, Z.M.; Hussein, A.K.; Almusaed, A.; Alawi, O.A.; Falah, M.W.; Abdelrazek, A.H.; Ahmed, W.; Eltaweel, M. Deep clustering of cooperative multi-agent reinforcement learning to optimize multi chiller HVAC systems for smart buildings energy management. J. Build. Eng. 2023, 65, 105689. [Google Scholar] [CrossRef]
  79. Lauro, F.; Longobardi, L.; Panzieri, S. An adaptive distributed predictive control strategy for temperature regulation in a multizone office building. In Proceedings of the 2014 IEEE International Workshop on Intelligent Energy Systems (IWIES), San Diego, CA, USA, 8 October 2014; IEEE: New York, NY, USA, 2014; pp. 32–37. [Google Scholar]
  80. Pedersen, T.H.; Hedegaard, R.E.; Petersen, S. Space heating demand response potential of retrofitted residential apartment blocks. Energy Build. 2017, 141, 158–166. [Google Scholar] [CrossRef]
  81. Pedersen, T.H.; Hedegaard, R.E.; Knudsen, M.D.; Petersen, S. Comparison of centralized and decentralized model predictive control in a building retrofit scenario. Energy Procedia 2017, 122, 979–984. [Google Scholar] [CrossRef]
  82. Abobakr, S.A.; Sadid, W.H.; Zhu, G. A game-theoretic decentralized model predictive control of thermal appliances in discrete-event systems framework. IEEE Trans. Ind. Electron. 2018, 65, 6446–6456. [Google Scholar] [CrossRef]
  83. Sangi, R.; Müller, D. A novel hybrid agent-based model predictive control for advanced building energy systems. Energy Convers. Manag. 2018, 178, 415–427. [Google Scholar] [CrossRef]
  84. Yang, Y.; Jia, Q.S.; Guan, X.; Zhang, X.; Qiu, Z.; Deconinck, G. Decentralized EV-based charging optimization with building integrated wind energy. IEEE Trans. Autom. Sci. Eng. 2018, 16, 1002–1017. [Google Scholar] [CrossRef]
  85. Zhuo, W.; Savkin, A.V.; Meng, K. Decentralized optimal control of a microgrid with solar PV, BESS and thermostatically controlled loads. Energies 2019, 12, 2111. [Google Scholar] [CrossRef]
  86. Lyons, B.; O’Dwyer, E.; Shah, N. Model reduction for Model Predictive Control of district and communal heating systems within cooperative energy systems. Energy 2020, 197, 117178. [Google Scholar] [CrossRef]
  87. El Geneidy, R.; Howard, B. Contracted energy flexibility characteristics of communities: Analysis of a control strategy for demand response. Appl. Energy 2020, 263, 114600. [Google Scholar] [CrossRef]
  88. Wang, J.; Garifi, K.; Baker, K.; Zuo, W.; Zhang, Y.; Huang, S.; Vrabie, D. Optimal renewable resource allocation and load scheduling of resilient communities. Energies 2020, 13, 5683. [Google Scholar] [CrossRef]
  89. Saletti, C.; Gambarotta, A.; Morini, M. Development, analysis and application of a predictive controller to a small-scale district heating system. Appl. Therm. Eng. 2020, 165, 114558. [Google Scholar] [CrossRef]
  90. Wu, Y.; Mäki, A.; Jokisalo, J.; Kosonen, R.; Kilpeläinen, S.; Salo, S.; Liu, H.; Li, B. Demand response of district heating using model predictive control to prevent the draught risk of cold window in an office building. J. Build. Eng. 2021, 33, 101855. [Google Scholar] [CrossRef]
  91. Lefebure, N.; Khosravi, M.; de Badyn, M.H.; Bünning, F.; Lygeros, J.; Jones, C.; Smith, R.S. Distributed model predictive control of buildings and energy hubs. Energy Build. 2022, 259, 111806. [Google Scholar] [CrossRef]
92. Hurtado, L.; Nguyen, P.; Kling, W. Smart grid and smart building inter-operation using agent-based particle swarm optimization. Sustain. Energy Grids Netw. 2015, 2, 32–40. [Google Scholar] [CrossRef]
  93. Shaikh, P.H.; Nor, N.B.M.; Nallagownden, P.; Elamvazuthi, I.; Ibrahim, T. Intelligent multi-objective control and management for smart energy efficient buildings. Int. J. Electr. Power Energy Syst. 2016, 74, 403–409. [Google Scholar] [CrossRef]
  94. Altayeva, A.; Omarov, B.; Suleimenov, Z.; Im Cho, Y. Application of multi-agent control systems in energy-efficient intelligent building. In Proceedings of the 2017 Joint 17th World Congress of International Fuzzy Systems Association and 9th International Conference on Soft Computing and Intelligent Systems (IFSA-SCIS), Otsu, Japan, 27–30 June 2017; IEEE: New York, NY, USA, 2017; pp. 1–5. [Google Scholar]
  95. ur Rehman, H.; Hirvonen, J.; Sirén, K. Performance comparison between optimized design of a centralized and semi-decentralized community size solar district heating system. Appl. Energy 2018, 229, 1072–1094. [Google Scholar] [CrossRef]
  96. Ghazimirsaeid, S.S.; Jonban, M.S.; Mudiyanselage, M.W.; Marzband, M.; Martinez, J.L.R.; Abusorrah, A. Multi-agent-based energy management of multiple grid-connected green buildings. J. Build. Eng. 2023, 74, 106866. [Google Scholar] [CrossRef]
  97. Cai, J.; Kim, D.; Jaramillo, R.; Braun, J.E.; Hu, J. A general multi-agent control approach for building energy system optimization. Energy Build. 2016, 127, 337–351. [Google Scholar] [CrossRef]
98. Hou, X.; Xiao, Y.; Cai, J.; Hu, J.; Braun, J.E. Distributed model predictive control via proximal Jacobian ADMM for building control applications. In Proceedings of the 2017 American Control Conference (ACC), Seattle, WA, USA, 24–26 May 2017; IEEE: New York, NY, USA, 2017; pp. 37–43. [Google Scholar]
  99. Carli, R.; Dotoli, M. Decentralized control for residential energy management of a smart users’ microgrid with renewable energy exchange. IEEE/CAA J. Autom. Sin. 2019, 6, 641–656. [Google Scholar] [CrossRef]
  100. Li, W.; Wang, S. A multi-agent based distributed approach for optimal control of multi-zone ventilation systems considering indoor air quality and energy use. Appl. Energy 2020, 275, 115371. [Google Scholar] [CrossRef]
  101. Li, W.; Li, H.; Wang, S. An event-driven multi-agent based distributed optimal control strategy for HVAC systems in IoT-enabled smart buildings. Autom. Constr. 2021, 132, 103919. [Google Scholar] [CrossRef]
  102. Lyu, C.; Jia, Y.; Xu, Z. Fully decentralized peer-to-peer energy sharing framework for smart buildings with local battery system and aggregated electric vehicles. Appl. Energy 2021, 299, 117243. [Google Scholar] [CrossRef]
  103. Wang, Z.; Zhao, Y.; Zhang, C.; Ma, P.; Liu, X. A general multi agent-based distributed framework for optimal control of building HVAC systems. J. Build. Eng. 2022, 52, 104498. [Google Scholar] [CrossRef]
  104. Guo, Y.; Pan, M.; Fang, Y.; Khargonekar, P.P. Decentralized coordination of energy utilization for residential households in the smart grid. IEEE Trans. Smart Grid 2013, 4, 1341–1350. [Google Scholar] [CrossRef]
  105. Zheng, L.; Cai, L. A distributed demand response control strategy using Lyapunov optimization. IEEE Trans. Smart Grid 2014, 5, 2075–2083. [Google Scholar] [CrossRef]
  106. Yu, L.; Xie, D.; Jiang, T.; Zou, Y.; Wang, K. Distributed real-time HVAC control for cost-efficient commercial buildings under smart grid environment. IEEE Internet Things J. 2017, 5, 44–55. [Google Scholar] [CrossRef]
  107. Jazizadeh, F.; Ghahramani, A.; Becerik-Gerber, B.; Kichkaylo, T.; Orosz, M. User-led decentralized thermal comfort driven HVAC operations for improved efficiency in office buildings. Energy Build. 2014, 70, 398–410. [Google Scholar] [CrossRef]
  108. Javed, A.; Larijani, H.; Ahmadinia, A.; Emmanuel, R.; Mannion, M.; Gibson, D. Design and implementation of a cloud enabled random neural network-based decentralized smart controller with intelligent sensor nodes for HVAC. IEEE Internet Things J. 2016, 4, 393–403. [Google Scholar] [CrossRef]
  109. González-Briones, A.; Prieto, J.; De La Prieta, F.; Herrera-Viedma, E.; Corchado, J.M. Energy optimization using a case-based reasoning strategy. Sensors 2018, 18, 865. [Google Scholar] [CrossRef] [PubMed]
  110. Joe, J.; Karava, P.; Hou, X.; Xiao, Y.; Hu, J. A distributed approach to model-predictive control of radiant comfort delivery systems in office spaces with localized thermal environments. Energy Build. 2018, 175, 173–188. [Google Scholar] [CrossRef]
  111. Lork, C.; Li, W.T.; Qin, Y.; Zhou, Y.; Yuen, C.; Tushar, W.; Saha, T.K. An uncertainty-aware deep reinforcement learning framework for residential air conditioning energy management. Appl. Energy 2020, 276, 115426. [Google Scholar] [CrossRef]
  112. Zou, Z.; Yu, X.; Ergan, S. Towards optimal control of air handling units using deep reinforcement learning and recurrent neural network. Build. Environ. 2020, 168, 106535. [Google Scholar] [CrossRef]
  113. Rochd, A.; Benazzouz, A.; Abdelmoula, I.A.; Raihani, A.; Ghennioui, A.; Naimi, Z.; Ikken, B. Design and implementation of an AI-based & IoT-enabled Home Energy Management System: A case study in Benguerir—Morocco. Energy Rep. 2021, 7, 699–719. [Google Scholar]
  114. Li, Z.; Zhang, J. Study on the distributed model predictive control for multi-zone buildings in personalized heating. Energy Build. 2021, 231, 110627. [Google Scholar] [CrossRef]
  115. Homod, R.Z.; Togun, H.; Hussein, A.K.; Al-Mousawi, F.N.; Yaseen, Z.M.; Al-Kouz, W.; Abd, H.J.; Alawi, O.A.; Goodarzi, M.; Hussein, O.A. Dynamics analysis of a novel hybrid deep clustering for unsupervised learning by reinforcement of multi-agent to energy saving in intelligent buildings. Appl. Energy 2022, 313, 118863. [Google Scholar] [CrossRef]
  116. Mork, M.; Xhonneux, A.; Müller, D. Nonlinear Distributed Model Predictive Control for multi-zone building energy systems. Energy Build. 2022, 264, 112066. [Google Scholar] [CrossRef]
  117. Chen, C.; Wang, J.; Kishore, S. A distributed direct load control approach for large-scale residential demand response. IEEE Trans. Power Syst. 2014, 29, 2219–2228. [Google Scholar] [CrossRef]
  118. Dai, Y.; Jiang, Z.; Shen, Q.; Chen, P.; Wang, S.; Jiang, Y. A decentralized algorithm for optimal distribution in HVAC systems. Build. Environ. 2016, 95, 21–31. [Google Scholar] [CrossRef]
  119. Hollinger, R.; Diazgranados, L.M.; Braam, F.; Erge, T.; Bopp, G.; Engel, B. Distributed solar battery systems providing primary control reserve. IET Renew. Power Gener. 2016, 10, 63–70. [Google Scholar] [CrossRef]
  120. Michailidis, I.T.; Schild, T.; Sangi, R.; Michailidis, P.; Korkas, C.; Fütterer, J.; Müller, D.; Kosmatopoulos, E.B. Energy-efficient HVAC management using cooperative, self-trained, control agents: A real-life German building case study. Appl. Energy 2018, 211, 113–125. [Google Scholar] [CrossRef]
  121. Png, E.; Srinivasan, S.; Bekiroglu, K.; Chaoyang, J.; Su, R.; Poolla, K. An internet of things upgrade for smart and scalable heating, ventilation and air-conditioning control in commercial buildings. Appl. Energy 2019, 239, 408–424. [Google Scholar] [CrossRef]
  122. Xue, Y.; Li, Z.; Lin, C.; Guo, Q.; Sun, H. Coordinated dispatch of integrated electric and district heating systems using heterogeneous decomposition. IEEE Trans. Sustain. Energy 2019, 11, 1495–1507. [Google Scholar] [CrossRef]
  123. Lymperopoulos, G.; Ioannou, P. Building temperature regulation in a multi-zone HVAC system using distributed adaptive control. Energy Build. 2020, 215, 109825. [Google Scholar] [CrossRef]
  124. Alhasnawi, B.N.; Jasim, B.H.; Sedhom, B.E.; Hossain, E.; Guerrero, J.M. A new decentralized control strategy of microgrids in the internet of energy paradigm. Energies 2021, 14, 2183. [Google Scholar] [CrossRef]
  125. Ahrens, M.; Kern, F.; Schmeck, H. Strategies for an adaptive control system to improve power grid resilience with smart buildings. Energies 2021, 14, 4472. [Google Scholar] [CrossRef]
  126. Jonban, M.S.; Romeral, L.; Akbarimajd, A.; Ali, Z.; Ghazimirsaeid, S.S.; Marzband, M.; Putrus, G. Autonomous energy management system with self-healing capabilities for green buildings (microgrids). J. Build. Eng. 2021, 34, 101604. [Google Scholar] [CrossRef]
  127. Kolahan, A.; Maadi, S.R.; Teymouri, Z.; Schenone, C. Blockchain-based solution for energy demand-side management of residential buildings. Sustain. Cities Soc. 2021, 75, 103316. [Google Scholar] [CrossRef]
  128. Gupta, S.K.; Ghose, T.; Chatterjee, K. Coordinated control of Incentive-Based Demand Response Program and BESS for frequency regulation in low inertia isolated grid. Electr. Power Syst. Res. 2022, 209, 108037. [Google Scholar] [CrossRef]
  129. Michailidis, I.T.; Sangi, R.; Michailidis, P.; Schild, T.; Fuetterer, J.; Mueller, D.; Kosmatopoulos, E.B. Balancing energy efficiency with indoor comfort using smart control agents: A simulative case study. Energies 2020, 13, 6228. [Google Scholar] [CrossRef]
130. Michailidis, P.; Michailidis, I.T.; Gkelios, S.; Karatzinis, G.; Kosmatopoulos, E.B. Neuro-distributed cognitive adaptive optimization for training neural networks in a parallel and asynchronous manner. Integr. Comput.-Aided Eng. 2024, 31, 1–23. [Google Scholar] [CrossRef]
  131. Tran, N.; Nguyen, T. Artificial Neural Networks for Modeling Pollutant Removal in Wastewater Treatment: A Review. Galore Int. J. Appl. Sci. Humanit. 2024, 8, 88–98. [Google Scholar]
  132. Nguyen, D.H.; Funabashi, T. Decentralized control design for user comfort and energy saving in multi-zone buildings. Energy Procedia 2019, 156, 172–176. [Google Scholar] [CrossRef]
Figure 1. Paper structure.
Figure 2. Primary interactions between energy subsystems in the IBEMS framework.
Figure 3. Multi-agent control types classification.
Figure 6. Model-based vs. model-free methodology occurrence (Left) and share (%) (Right) in multi-agent control applications for IBEMS (2014–2024).
Figure 7. RL methodology occurrence (Left) and RL type share (%) (Right) in multi-agent control applications for IBEMS (2014–2024).
Figure 8. Model-based methodology occurrence (Left) and share (%) (Right) in multi-agent control applications for IBEMS (2014–2024).
Figure 9. Multi-agent type occurrence in multi-agent control applications for IBEMS (2014–2024).
Figure 10. Energy system occurrence in multi-agent control applications for IBEMS (2014–2024).
Figure 11. Building type occurrence (Left) and percentage (%) (Right) in multi-agent control applications for IBEMS (2014–2024).
Figure 12. Application type occurrence (Left) and percentage (%) (Right) in multi-agent control applications for IBEMS (2014–2024).
Figure 13. Simulation tool occurrence in multi-agent control applications for IBEMS (2014–2024).
Table 8. Classification into value-based, policy-based, and actor–critic for RL applications.

Approach     | Ref. | Achievement
Value-based  | [55] | CQL reduced grid power consumption by 15% and increased solar utility by 10–12%
             | [58] | Hybrid Q-learning achieved 20% energy savings
             | [63] | Q-learning for HVAC control achieved 15% energy savings
             | [69] | Q-learning with LSTM reduced heating costs by 19.4%
             | [59] | DQN achieved a 40–60 kWh improvement in energy sharing
             | [60] | DQN-based controller improved thermal comfort by 15–30%
             | [61] | DQL agents reduced energy consumption by 17%
             | [66] | DQN optimized HVACs, improving energy efficiency by 11.1%
             | [78] | DCCMARL reduced energy consumption by 49%
             | [68] | D3QN reduced uncomfortable duration by 84%
             | [70] | BDQ reduced cooling energy by 14%
Policy-based | [56] | Ontology-driven EMS reduced microgrid costs by 5%
             | [57] | eJAL algorithm reduced overload duration by 16.3%
             | [64] | MARLISA reduced peak load by 15% and ramping by 35%
             | [73] | Hierarchical RL improved off-grid operations by 64.93%
             | [74] | PPO reduced overvoltages by 34%
Actor–critic | [65] | MAAC reduced energy costs by 56.50–75.25%
             | [67] | MAAC reduced energy consumption by 0.7–4.18%
             | [75] | Fed-JPC algorithm reduced costs and emissions by 5.87% and 8.02%
             | [76] | MAAC reduced net load demand by 6%
             | [71] | SAC reduced costs by 7% and peak demand by 14%
             | [77] | SAC reduced electricity consumption by 20% and costs by 30%
             | [62] | A2C reduced electricity consumption by 20% and emissions by 20%
             | [72] | ACKTR reduced energy cost by 25.37%
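
To make the taxonomy in Table 8 concrete, the following minimal Python sketch contrasts the three update rules on a hypothetical two-state comfort task. The toy environment, reward weights, and all identifiers are illustrative assumptions introduced here for exposition only; none are drawn from the cited applications, which rely on far richer state spaces, deep function approximators, and multi-agent coordination.

```python
import math
import random

# Illustrative sketch of the three RL families in Table 8 (not taken from any
# cited work). Toy task: state 0 = comfortable, state 1 = too cold;
# action 0 = idle, action 1 = heat (heating restores comfort but costs energy).

STATES, ACTIONS = 2, 2
ALPHA, GAMMA = 0.1, 0.95

def toy_env(state, action):
    """Assumed toy dynamics: heating always yields comfort, idling loses it."""
    next_state = 0 if action == 1 else 1
    reward = (1.0 if next_state == 0 else 0.0) - 0.3 * action  # comfort minus energy cost
    return next_state, reward

def softmax_probs(prefs, state):
    """Action probabilities from per-state softmax preferences."""
    exps = [math.exp(p) for p in prefs[state]]
    z = sum(exps)
    return [e / z for e in exps]

def sample(probs):
    """Draw an action index from a probability vector."""
    r, acc = random.random(), 0.0
    for a, p in enumerate(probs):
        acc += p
        if r <= acc:
            return a
    return len(probs) - 1

# 1) Value-based (cf. tabular Q-learning in [58,63]): learn Q(s,a), act greedily.
Q = [[0.0] * ACTIONS for _ in range(STATES)]
def value_based_step(s):
    a = random.randrange(ACTIONS) if random.random() < 0.1 \
        else max(range(ACTIONS), key=lambda x: Q[s][x])   # epsilon-greedy
    s2, r = toy_env(s, a)
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])  # TD(0) update
    return s2

# 2) Policy-based (cf. direct policy optimization as in [74]): adjust the policy itself.
prefs_pg = [[0.0] * ACTIONS for _ in range(STATES)]
def policy_based_step(s):
    probs = softmax_probs(prefs_pg, s)
    a = sample(probs)
    s2, r = toy_env(s, a)
    # One-sample policy-gradient update with the immediate reward as return:
    # d log pi(a|s) / d pref(s,b) = 1{b == a} - pi(b|s)
    for b in range(ACTIONS):
        prefs_pg[s][b] += ALPHA * r * ((1.0 if b == a else 0.0) - probs[b])
    return s2

# 3) Actor-critic (cf. [62,65,71]): a critic's TD error scales the actor update.
V = [0.0] * STATES
prefs_ac = [[0.0] * ACTIONS for _ in range(STATES)]
def actor_critic_step(s):
    probs = softmax_probs(prefs_ac, s)
    a = sample(probs)
    s2, r = toy_env(s, a)
    delta = r + GAMMA * V[s2] - V[s]   # critic's TD error
    V[s] += ALPHA * delta              # critic update
    for b in range(ACTIONS):           # actor update, scaled by the TD error
        prefs_ac[s][b] += ALPHA * delta * ((1.0 if b == a else 0.0) - probs[b])
    return s2

if __name__ == "__main__":
    s_q = s_pg = s_ac = 0
    for _ in range(2000):
        s_q = value_based_step(s_q)
        s_pg = policy_based_step(s_pg)
        s_ac = actor_critic_step(s_ac)
    print("Q-table:", Q)
    print("Policy-gradient preferences:", prefs_pg)
    print("Actor-critic preferences:", prefs_ac, "V:", V)
```

The sketch isolates the structural distinction the classification rests on: value-based agents act greedily on learned action values, policy-based agents adjust action probabilities directly from reward feedback, and actor–critic agents combine both, using a learned value baseline (the TD error) to stabilize the policy update.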