Review

Applications of Deep Reinforcement Learning for Home Energy Management Systems: A Review

Department of Power Electronics and Energy Control Systems, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, AGH University of Krakow, 30-059 Krakow, Poland
* Authors to whom correspondence should be addressed.
Energies 2024, 17(24), 6420; https://doi.org/10.3390/en17246420
Submission received: 8 November 2024 / Revised: 16 December 2024 / Accepted: 18 December 2024 / Published: 20 December 2024
(This article belongs to the Section K: State-of-the-Art Energy Related Technologies)

Abstract

In the context of the increasing integration of renewable energy sources (RES) and smart devices in domestic applications, the implementation of Home Energy Management Systems (HEMS) is becoming a pivotal factor in optimizing energy usage and reducing costs. This review examines the role of reinforcement learning (RL) in the advancement of HEMS, presenting it as a powerful tool for the adaptive management of complex, real-time energy demands. This review is notable for its comprehensive examination of the applications of RL-based methods and tools in HEMS, which encompasses demand response, load scheduling, and renewable energy integration. Furthermore, the integration of RL within distributed automation and Internet of Things (IoT) frameworks is emphasized in the review as a means of facilitating autonomous, data-driven control. Despite the considerable potential of this approach, the authors identify a number of challenges that require further investigation, including the need for robust data security and scalable solutions. It is recommended that future research place greater emphasis on real applications and case studies, with the objective of bridging the gap between theoretical models and practical implementations. The objective is to achieve resilient and secure energy management in residential and prosumer buildings, particularly within local microgrids.

1. Introduction

The transition towards sustainable energy practices has led to an increased emphasis on Home Energy Management Systems (HEMS) as integral components of energy-efficient residential buildings. These systems are designed to monitor, control and optimize energy usage within residences, thereby aligning with broader goals of reduced environmental impact and operational cost savings [1,2]. As residential energy demands increase, driven by the growing utilization of smart devices, electric vehicles and distributed energy resources (DERs) such as solar panels and battery storage, HEMS are becoming indispensable components of modern homes [3,4,5,6]. The importance of HEMS is further emphasized by the advent of smart grids, as they not only facilitate the management of energy within the context of individual households but also provide support for grid stability and efficiency through the implementation of demand side management (DSM) and demand side response (DSR) mechanisms [6,7,8]. In conjunction with these technological developments, new regulatory frameworks, policies and standards are fostering the evolution of HEMS, thereby rendering energy management functions obligatory for residential and prosumer premises. The revised Energy Performance of Buildings Directive (EPBD 2024) [9] places an emphasis on energy efficiency within the context of the building sector and mandates improvements in energy performance across EU member states. The directive’s objective is to achieve a highly energy-efficient and decarbonized building stock by 2050, which will necessitate updates to existing HEMS to meet the requisite energy-saving and emission-reducing targets [10]. 
Furthermore, the Smart Readiness Indicator (SRI), recently introduced in EPBD 2024, assesses a building’s capacity to support a range of smart services, including the integration of renewable energy sources (RES) and storage solutions, the enabling of vehicle-to-grid (V2G) connections for electric vehicle (EV) services, and effective interfacing with the smart grid. The SRI is therefore of great importance in the promotion of buildings that are not only energy-efficient but also adaptable to future energy requirements [11,12,13]. New technical standards, such as ISO 52120 [14] for Building Automation and Control Systems (BACS), provide further specification of the functional and performance requirements for energy management. This standard provides comprehensive guidance for optimizing energy usage within residential and commercial buildings, requiring the integration of BACS with HEMS to achieve enhanced energy efficiency and responsiveness to grid signals. In combination, these regulatory changes are propelling HEMS towards more sophisticated functionalities that not only facilitate energy management at the household level but also actively contribute to grid stability [11,15,16].

1.1. Modern and Future Home Energy Management Systems—Complexity and Advancements

It is expected that modern and future HEMS will manage a wide array of interconnected devices, including HVAC systems, lighting and appliances, as well as RESs and local prosumer microgrids [8,17]. These devices are often integrated with distributed control networks and Internet of Things (IoT) technologies that enable real-time monitoring and automated control [18,19,20]. The IoT supplies smart homes with abundant data but also introduces considerable complexity: the system must handle diverse data streams from many sources, process these data in real time and make optimal control decisions under varying conditions [21,22,23,24]. The heterogeneity of the data, which encompass weather forecasts, energy prices, user habits and the operational status of devices, gives rise to a highly complex problem space. The complexity is further amplified in interconnected environments, where homes operate as nodes within a smart grid; adaptive and coordinated responses are therefore essential to ensure grid stability and efficiency [25,26,27,28].
Given these aspects, conventional HEMS optimization techniques, such as rule-based algorithms, static scheduling and linear programming, frequently fall short in addressing the dynamic characteristics of smart homes with interconnected devices and fluctuating demand profiles. Reinforcement Learning (RL) offers a promising approach for dynamically managing energy resources in such complex environments, because it learns and improves control policies through interaction with the environment and the rewards it receives. In particular, RL enables HEMS to adapt continuously to changing conditions by adjusting control policies in response to real-time feedback [29,30,31,32]. As HEMS applications become increasingly complex and diverse, Deep Reinforcement Learning (DRL), a subfield of RL that employs deep neural networks to process high-dimensional data, has emerged as a powerful tool for managing these systems. The ability of DRL algorithms to handle extensive data inputs and learn intricate relationships between variables makes them particularly well suited to smart home environments with numerous IoT devices and diverse energy sources [32,33,34,35].
The implementation of RL and DRL in HEMS presents a number of benefits with regard to DSM and load management. To illustrate, DRL algorithms are capable of anticipating peak load periods based on historical data and current usage patterns, thereby enabling systems to preemptively shift loads, control appliance usage, or draw on stored energy to reduce strain on the grid [6,15,36,37]. Furthermore, the adaptability of RL-based methods enables HEMS to autonomously respond to DSM signals from utility providers, thereby adjusting loads in a manner that optimizes both homeowner cost savings and grid efficiency. This adaptability is crucial, as HEMS are increasingly expected to balance energy consumption and storage within the household while synchronizing with smart grid demands. This will contribute to broader goals of energy resilience and sustainability as well as support transactive energy models [38,39,40,41,42].
BACS and smart home systems play a pivotal role within HEMS by orchestrating the control of an array of home subsystems, including lighting, climate and security, through centralized or distributed control platforms [43,44,45]. Integrating advanced BACS with RL and DRL models enables more unified and effective HEMS operation, because BACS provide a comprehensive view of user activity, the home’s energy demand and environmental conditions, so that the building’s diverse systems can be managed in a coordinated manner. For instance, Heating, Ventilation and Air Conditioning (HVAC) systems can then adjust their operation based on occupancy patterns detected by IoT sensors or on energy pricing data, thereby reducing operational costs [5,22,46,47].
Despite the considerable potential of RL and DRL in the field of HEMS, a number of challenges remain. One significant constraint is the data-intensive nature of DRL models, which require extensive training data to learn optimal policies and often incur high computational overheads. Furthermore, DRL-based control must be safe and reliable, as suboptimal decisions could increase energy costs or compromise user comfort. Moreover, integrating DRL with IoT-based HEMS raises concerns regarding data security and user privacy, given that these systems process sensitive personal data from smart home devices [26,48,49,50,51].

1.2. Fundamentals of RL and DRL

RL is a learning process whose objective is to maximize rewards by mapping situations to actions. In contrast to other learning paradigms, RL does not provide explicit instructions regarding which actions to select. Instead, the algorithm develops policies through agent–environment interactions, with the objective of maximizing the cumulative reward obtained by taking actions in specific states. The resulting policy identifies the actions yielding the highest rewards, considering not only the current state but also subsequent future states. Key features distinguishing RL from other forms of learning include trial-and-error exploration and delayed rewards, which are formalized using the theory of dynamical systems [52,53]. Classical RL methods frequently employ tabular representations or linear function approximation, which are effective in low-dimensional settings but struggle with high-dimensional state and action spaces and cannot represent complex features effectively [52,54]. The RL process involves continuous interaction between an agent and the environment. The agent performs an action, the environment responds by changing its state and returning a reward, and the agent uses this information to improve its future decisions. The objective is to develop a policy, that is, a strategy that maximizes the cumulative future reward that the agent can achieve. RL employs the formal framework of the Markov Decision Process (MDP), a mathematical model of decision-making in environments with stochastic outcomes, to define the interaction between a learning agent and its environment in terms of states, actions and rewards [52].
In its formal definition, an MDP is represented by a 5-tuple $(S, A, P, R, \gamma)$, where
  • $S$: The set of states representing the environment’s possible configurations.
  • $A$: The set of actions available to the agent.
  • $P(s' \mid s, a)$: The transition probability, indicating the likelihood of moving from state $s$ to state $s'$ after taking action $a$.
  • $R(s, a, s')$: The reward function, providing feedback for transitioning between states due to an action.
  • $\gamma$: The discount factor, which balances the importance of immediate rewards versus future rewards ($0 < \gamma < 1$).
In an MDP, the next state $s'$ depends only on the current state $s$ and the action $a$, a property known as the Markov property. The agent aims to find an optimal policy $\pi(a \mid s)$, a mapping from states to actions (or to probability distributions over actions), that maximizes the cumulative discounted reward over time. This cumulative reward, or return, is often denoted as $G_t = \sum_{i=t}^{\infty} \gamma^{i-t} R(s_i, a_i, s_{i+1})$, where $t$ is the current time step. MDPs are widely used in RL, as they enable systematic exploration and exploitation of an environment through iterative updates of the policy based on the received rewards [55].
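As a minimal sketch of the return defined above (not taken from the cited works), the function below computes $G_t$ for every step of a finite, recorded episode using the equivalent backward recursion $G_t = r_t + \gamma G_{t+1}$, where $r_t = R(s_t, a_t, s_{t+1})$. It assumes that the episode terminates after its last reward, so nothing is bootstrapped beyond it; in a HEMS setting the rewards might, for instance, be negative electricity costs per time slot.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return [G_0, G_1, ..., G_{T-1}] for a finite episode.

    rewards[t] holds r_t = R(s_t, a_t, s_{t+1}); the episode is assumed to
    end after the last reward, so no value is bootstrapped beyond it.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards through the episode: G_t = r_t + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

Computing all returns backwards costs one pass over the episode, instead of re-summing the discounted tail for each $t$.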
Considering this information, the fundamental elements of RL include
  • Observations: Information about the environment’s state that helps determine the agent’s next action. In some cases, the agent may not have access to all state variables, which can complicate the learning process.
  • Actions: The choices the agent makes to influence the environment’s state. Actions can be discrete or continuous, depending on the problem domain.
  • Reward: A scalar value returned by the environment after an action, indicating the desirability of that action. The agent aims to maximize the total reward over its interactions with the environment.
  • Trajectory: A sequence of state-action-reward tuples, representing the agent’s interaction with the environment, e.g., $s_0, a_0, r_1, s_1, a_1, r_2, s_2, a_2, r_3, \ldots$
  • Replay Buffer: A memory mechanism used to store past experiences $(s_t, a_t, r_t, s_{t+1})$, which helps to break the correlation between consecutive samples and enables more effective optimization, especially in deep RL approaches [53].
  • Policy: A function mapping observations to actions, defining the agent’s behavior. The policy can be deterministic or stochastic, depending on the application.
  • Discount Factor: A parameter that balances the trade-off between immediate and future rewards, shaping the agent’s long-term strategy.
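To make the replay buffer concrete, the sketch below stores transitions in a fixed-capacity first-in, first-out queue and samples minibatches uniformly at random. The `Transition` and `ReplayBuffer` names are illustrative. The extra `done` flag (marking episode ends) is an assumption added to the $(s_t, a_t, r_t, s_{t+1})$ tuple, and prioritized variants would replace the uniform sampling step.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Optional


class Transition(NamedTuple):
    state: object
    action: object
    reward: float
    next_state: object
    done: bool  # assumption: flags the last transition of an episode


class ReplayBuffer:
    """Fixed-capacity FIFO store of past transitions with uniform sampling."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque(maxlen=...) silently discards the oldest item once full
        self._storage: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Drawing without replacement from the whole buffer breaks the
        # temporal correlation between consecutive transitions
        if batch_size > len(self._storage):
            raise ValueError("not enough transitions stored")
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)
```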
A common approach to categorizing reinforcement learning methods is based on the degree of knowledge that the agent possesses regarding the environment’s model. Model-based RL utilizes an environment model to forecast future states and rewards based on the current state and selected actions. By simulating the environment’s behavior, these methods permit agents to plan their actions, considering potential future scenarios before they occur. This approach is particularly useful for complex tasks where forward planning can improve decision-making [52]. In contrast to model-based RL, model-free RL does not require an environment model. Instead, it learns through trial-and-error interactions, directly optimizing actions based on feedback received from the environment. While simpler and less computationally intensive, model-free methods lack the ability to plan ahead, focusing solely on experiential data. Modern RL systems often integrate both model-based and model-free strategies for improved performance [52,54].
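Model-based planning can be illustrated with value iteration, which requires exactly the transition and reward model $(P, R)$ of the MDP. The sketch below applies the Bellman optimality update until the values converge and then reads off a greedy policy. The two-state battery model `TOY_BATTERY` is purely hypothetical: charging costs 1 unit, discharging during a peak avoids a purchase worth 2 units, and all transitions are deterministic.

```python
from typing import Dict, List, Tuple

# P[state][action] -> list of (probability, next_state, reward) outcomes
Model = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

# Hypothetical toy HEMS model: a battery that is either empty or full
TOY_BATTERY: Model = {
    "empty": {"charge": [(1.0, "full", -1.0)],     # buy energy off-peak
              "wait": [(1.0, "empty", 0.0)]},
    "full": {"discharge": [(1.0, "empty", 2.0)],  # avoid a peak-price purchase
             "wait": [(1.0, "full", 0.0)]},
}


def q_value(outcomes, values: Dict[str, float], gamma: float) -> float:
    """Expected one-step return: sum over s' of P(s'|s,a) * (R(s,a,s') + gamma * V(s'))."""
    return sum(p * (r + gamma * values[s_next]) for p, s_next, r in outcomes)


def value_iteration(model: Model, gamma: float, tol: float = 1e-10, max_iter: int = 10_000):
    """Plan with a known model by repeating the Bellman optimality update in place."""
    values = {s: 0.0 for s in model}
    for _ in range(max_iter):
        delta = 0.0
        for s, actions in model.items():
            best = max(q_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function
    policy = {}
    for s, actions in model.items():
        policy[s] = max(actions, key=lambda a: q_value(actions[a], values, gamma))
    return values, policy
```

With $\gamma = 0.9$ the planner chooses to cycle the battery (charge when empty, discharge when full). A model-free agent would have to discover the same behavior from sampled interactions, without access to `P` and `R`.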
A further classification concerns how the policy being learned relates to the policy used to collect experience. On-policy methods optimize a single policy that is also deployed for interaction with the environment, updating that same policy based on the experiences it gathers during training [52]. In contrast, off-policy methods maintain two distinct policies: one for data generation (the behavior policy) and another for optimization (the target policy). The experiences amassed by the behavior policy are often stored in a replay buffer, which is then used to train the target policy. This separation enhances sample efficiency and allows for robust learning, even in stochastic environments. Actor–critic methods represent a further evolution in RL methodology, combining policy-based and value-based approaches; depending on the algorithm, they can be trained on-policy (e.g., A2C) or off-policy (e.g., DDPG). In this framework, the actor represents the policy that selects actions, while the critic evaluates these actions using a value function. The feedback from the critic helps improve the actor’s policy over time. This hybrid approach leverages the strengths of policy-based and value-based methods, making it versatile for a wide range of RL applications [52,54].
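The on-policy/off-policy distinction is easiest to see in the tabular update rules of SARSA and Q-learning. These are two classic methods that the text does not name, and they are used here only as an illustration. Both move $Q(s, a)$ towards a bootstrapped target. Q-learning (off-policy) bootstraps from the greedy action of the target policy, whereas SARSA (on-policy) bootstraps from the action $a'$ that the $\varepsilon$-greedy behavior policy actually took. Terminal-state handling is omitted for brevity.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def epsilon_greedy(q: QTable, state, actions: Sequence, epsilon: float,
                   rng: random.Random):
    """Behavior policy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(list(actions))
    return max(actions, key=lambda a: q[(state, a)])


def q_learning_update(q: QTable, s, a, r: float, s_next, next_actions: Sequence,
                      alpha: float, gamma: float) -> None:
    """Off-policy: bootstrap from the greedy (target-policy) action in s_next."""
    target = r + gamma * max(q[(s_next, b)] for b in next_actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def sarsa_update(q: QTable, s, a, r: float, s_next, a_next,
                 alpha: float, gamma: float) -> None:
    """On-policy: bootstrap from the action a_next the behavior policy actually chose."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
```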
DRL represents an extension of RL, incorporating deep learning through the use of neural networks to approximate policies or value functions. This enables DRL to process high-dimensional inputs, such as images or time-series data, and to address complex tasks involving continuous action spaces. To illustrate, the Deep Q-Network (DQN) algorithm employs convolutional neural networks to extract features from raw pixel data, thereby enabling agents to learn directly from high-dimensional observations, a capability that extends beyond the scope of traditional RL methods. DRL has significantly expanded the scope of RL, making it suitable for advanced domains such as robotics, autonomous vehicles and strategy games [52,54]. The principal DRL algorithms employed in contemporary solutions are the following:
  • Deep Q-Network (DQN): The DQN extends Q-learning by employing convolutional neural networks (CNNs) to approximate the Q-value function. Innovations such as experience replay and target networks stabilize learning in high-dimensional tasks. Experience replay stores agent–environment interactions and samples them randomly to reduce the correlation between training samples. Target networks keep the learning targets stable because their parameters are updated only periodically, by copying the parameters of the online network. These mechanisms enabled DQN to reach human-level performance across a suite of Atari 2600 games, thereby demonstrating the potential of DRL [56].
  • Proximal Policy Optimization (PPO): PPO simplifies policy optimization by replacing complex constraints with a clipped surrogate objective, thereby ensuring stable updates and avoiding overly aggressive policy changes. The process alternates between sampling data and optimizing policies using minibatch gradient descent. The robustness and computational efficiency of PPO make it a preferred choice for robotics and scalable DRL applications [57].
  • Advantage Actor–Critic (A2C): The A2C algorithm employs an actor–critic architecture, which combines policy optimization through the actor network with value estimation through the critic network. The actor network generates a probability distribution over actions for each state, while the critic evaluates these actions to refine the policy gradients. A2C uses an advantage function, which quantifies how much better an action is than the average action in that state, to reduce the variance of the gradient estimates and stabilize training. For example, in a hydrocracking optimization study, A2C was integrated with a deep neural network (DNN) surrogate model that served as the agent’s environment, allowing the agent to adapt quickly to changing targets and to determine optimal operating conditions with enhanced computational efficiency [58].
  • Asynchronous Advantage Actor–Critic (A3C): A3C introduces the concept of asynchronous parallel agents interacting with separate environment instances, which serves to decorrelate data and stabilize the training process. The combination of policy-based (actor) and value-based (critic) methods allows for the efficient training of policies using n-step returns and entropy regularization, which encourages exploration. The versatility of this approach has made it an effective method for navigation, robotics, and continuous control tasks [59].
  • Deep Deterministic Policy Gradient (DDPG): The DDPG algorithm extends reinforcement learning to continuous control tasks by employing an actor–critic architecture with two neural networks: an actor that generates deterministic actions and a critic that estimates Q-values. Experience replay and target networks are key stabilizing mechanisms, breaking the correlation between consecutive experiences and maintaining consistent targets for updates. An enhanced variant introduces prioritized experience replay, which ranks experiences by their temporal-difference errors so that the most informative experiences are replayed more often during training. This prioritization accelerates learning, improves stability and reduces sensitivity to hyperparameter changes [60].
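The stabilizing mechanisms of DQN and DDPG can be sketched in a few lines of plain Python. The sketch assumes hypothetical Q-functions that map a state to a list of Q-values, one per discrete action. It computes DQN targets $y = r + \gamma \max_{a'} Q_{\text{target}}(s', a')$ and the absolute TD errors used as priorities in the proportional variant of prioritized replay (before any priority exponent is applied). It also shows the two common ways of refreshing target-network parameters: a periodic hard copy (DQN) and Polyak soft updates (DDPG). Parameters are flat lists of floats purely for illustration. In DDPG, the `max` over actions would be replaced by the critic’s value of the target actor’s action.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a Q-function maps a state to one Q-value per discrete action
QFunction = Callable[[object], Sequence[float]]
# Each transition is (state, action_index, reward, next_state, done)
Batch = List[Tuple[object, int, float, object, bool]]


def dqn_targets(batch: Batch, q_target: QFunction, gamma: float) -> List[float]:
    """y = r + gamma * max_a' Q_target(s', a'); no bootstrapping on terminal transitions."""
    return [r + (0.0 if done else gamma * max(q_target(s_next)))
            for _, _, r, s_next, done in batch]


def td_priorities(batch: Batch, q_online: QFunction, targets: Sequence[float],
                  eps: float = 1e-3) -> List[float]:
    """Proportional priorities p_i = |y_i - Q_online(s_i, a_i)| + eps (eps keeps every p_i > 0)."""
    return [abs(y - q_online(s)[a]) + eps for (s, a, *_), y in zip(batch, targets)]


def hard_update(online_params: Sequence[float]) -> List[float]:
    """DQN: copy the online parameters into the target network every C steps."""
    return list(online_params)


def soft_update(target_params: Sequence[float], online_params: Sequence[float],
                tau: float) -> List[float]:
    """DDPG: Polyak averaging theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target_params, online_params)]
```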
The combination of neural networks and RL principles enables efficient learning from high-dimensional inputs, transforming RL into a tool for solving complex, real-world problems. Algorithms such as DQN, A3C, and PPO exemplify this progress, demonstrating the capabilities of DRL across diverse domains. The strengths of DRL algorithms are diverse and tailored to specific challenges. The DQN algorithm demonstrates particular efficacy in high-dimensional tasks, employing techniques such as experience replay and target networks to enhance the stability of the learning process. This illustrates its potential for applications involving discrete action spaces. PPO simplifies policy optimization through the use of a clipped surrogate objective, thereby facilitating robust and efficient updates that are well-suited to scalable applications such as robotics. A2C and its asynchronous counterpart, A3C, employ actor–critic architectures. A3C incorporates parallel agents, which enhance data decorrelation and exploration. These methods are particularly well-suited to the domain of continuous and dynamic control tasks. Meanwhile, DDPG extends DRL to continuous action spaces with deterministic policies, enhanced by mechanisms such as prioritized experience replay, which facilitate accelerated learning and enhanced stability. Collectively, these algorithms illustrate the versatility of DRL, encompassing discrete and continuous control, as well as other domains.
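The clipped surrogate objective of PPO and the advantage used by A2C-style critics can be written compactly. The sketch below is a plain-Python illustration. It assumes that probability ratios $\rho = \pi_{\text{new}}(a \mid s) / \pi_{\text{old}}(a \mid s)$ and state values are already available from some policy and value network. It uses the simple one-step estimate $A(s, a) \approx r + \gamma V(s') - V(s)$ rather than the n-step or generalized advantage estimators that practical implementations often use, and the clipping parameter 0.2 is only a common default.

```python
from typing import Sequence


def one_step_advantage(r: float, v_s: float, v_next: float, gamma: float,
                       done: bool = False) -> float:
    """TD-error estimate of the advantage: A(s, a) ~ r + gamma * V(s') - V(s)."""
    return r + (0.0 if done else gamma * v_next) - v_s


def ppo_clip_objective(ratios: Sequence[float], advantages: Sequence[float],
                       clip_eps: float = 0.2) -> float:
    """Mean of min(rho * A, clip(rho, 1 - eps, 1 + eps) * A) over a minibatch.

    The objective is maximized, so a training loop would minimize its
    negative with respect to the policy parameters.
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("need equally sized, non-empty sequences")
    total = 0.0
    for rho, adv in zip(ratios, advantages):
        clipped = min(max(rho, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum keeps the pessimistic (lower) estimate
        total += min(rho * adv, clipped * adv)
    return total / len(ratios)
```

When the advantage is positive, clipping caps the benefit of raising $\rho$ above $1 + \epsilon$. When it is negative, the minimum keeps the lower value, so pushing $\rho$ below $1 - \epsilon$ brings no further gain. This is how the objective discourages overly aggressive policy changes.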

1.3. Deploying Artificial Intelligence/Machine Learning/RL in Building Automation—Trends and Significant Challenges

It is important to note that in the context of contemporary BACS, there is a growing reliance on cutting-edge Artificial Intelligence (AI)/Machine Learning (ML)/RL tools and algorithms across a range of applications. Consequently, a number of challenges emerge. While they are not central to the focus of this review, the authors highlight the most significant issues that may directly impact the topics addressed in this paper.

1.3.1. Data Collection Architectures

The efficient collection of data constitutes the foundation of RL and DRL model training for HEMS. IoT devices enable the real-time acquisition of diverse, high-frequency data streams, including energy consumption, temperature and occupancy patterns [57,59]. Such frameworks provide the granularity required to train robust and adaptive models. Nevertheless, handling incomplete or noisy data remains an open issue. Preprocessing techniques such as data smoothing, imputation algorithms and outlier detection are vital for ensuring that datasets are reliable and usable [61,62]. Moreover, offline meta-learning approaches have been introduced to enhance model robustness. These methods allow RL models to learn generalizable policies from heterogeneous datasets, thereby facilitating adaptation to varying household configurations without extensive retraining. Combining these techniques with data augmentation methods has the potential to improve model performance across a range of scenarios. Standardized frameworks for data representation, such as those based on the Industry Foundation Classes (IFC) schema, ensure compatibility across diverse systems and promote scalability [63,64,65].
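As an illustration of the preprocessing step, the sketch below fills gaps in a sensor series by linear interpolation and then replaces spikes using a Hampel filter, i.e., a rolling-median outlier detector based on the median absolute deviation (MAD). Both are generic techniques chosen here as examples. The window size and the 3-sigma threshold are assumed defaults, and missing readings are assumed to be encoded as `None`.

```python
from statistics import median
from typing import List, Optional, Sequence


def interpolate_gaps(values: Sequence[Optional[float]]) -> List[float]:
    """Fill missing readings (None) linearly; copy the nearest valid value at the edges."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series contains no valid readings")
    out: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            out.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out.append(float(values[right]))
        elif right is None:
            out.append(float(values[left]))
        else:
            w = (i - left) / (right - left)
            out.append(values[left] + w * (values[right] - values[left]))
    return out


def hampel_filter(values: Sequence[float], half_window: int = 3,
                  n_sigmas: float = 3.0) -> List[float]:
    """Replace points far from the local median by that median (Hampel identifier)."""
    k = 1.4826  # scales the MAD to a standard-deviation estimate under Gaussian noise
    out = list(values)
    for i in range(len(values)):
        window = values[max(0, i - half_window): i + half_window + 1]
        m = median(window)
        mad = median(abs(x - m) for x in window)
        # Note: if the window is constant (MAD = 0), any deviation is flagged
        if abs(values[i] - m) > n_sigmas * k * mad:
            out[i] = m
    return out
```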

1.3.2. Scalability Considerations

Scaling RL/DRL models to multiple HEMS requires addressing a number of challenges, including those related to computation, communication and integration. A hybrid approach that combines edge computing with cloud platforms has emerged as a promising solution in this context. Edge computing enables prompt decision-making by processing essential data where they are generated, thereby alleviating the burden on centralized servers [62,66]. Concurrently, cloud platforms take on computationally demanding tasks, such as extensive model optimization and global policy updates, thereby facilitating collaborative learning across multiple HEMS units. Scalable RL frameworks such as SEED RL enable distributed training by leveraging asynchronous updates and efficient hardware utilization, allowing training across thousands of environments in parallel [62]. This approach achieves high throughput while maintaining model fidelity, thereby enabling faster deployment. Moreover, integrating AutoML pipelines into cloud platforms automates model selection and hyperparameter tuning, reducing the time and expertise required for deployment [61]. Interoperability also remains a significant factor for scalability. Compliance with standardized data models and communication protocols, such as IFC, ensures the seamless integration of RL/DRL models into existing building management systems, IoT devices and energy grids [63]. Such standards facilitate compatibility and reduce costs by minimizing the need for custom integration solutions.

1.3.3. Data Security and Privacy

Ensuring data security and privacy is of paramount importance when adopting RL/DRL-based HEMS, particularly given the sensitive nature of energy consumption patterns and user behavior data. Differential privacy techniques provide a layer of protection by introducing calibrated noise into datasets or query results, which limits what can be inferred about any individual data point while preserving the overall utility of the data [67,68]. Federated learning further mitigates risks by enabling decentralized training, whereby data remain on local devices and only model updates are shared with a central server. Robust encryption methods secure data during transmission, while adversarial defenses protect RL models from attacks designed to manipulate policy behavior. Combining these techniques with other privacy-preserving technologies, such as secure multi-party computation, enhances the resilience of the system against sophisticated cyber threats. In addition, adherence to privacy regulations, including the General Data Protection Regulation (GDPR), is crucial for cultivating user confidence and ensuring legal compliance. Transparent data handling practices, such as giving users control over their data through consent mechanisms, can further enhance the acceptance of such systems [68,69].
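Two of the privacy mechanisms mentioned above can be sketched with the Python standard library alone. The first function applies the Laplace mechanism to a neighborhood-level energy total. Each household’s reading is clipped to an assumed cap, so adding or removing one household changes the sum by at most `cap_kwh`, and noise with scale `cap_kwh / epsilon` then gives $\varepsilon$-differential privacy for that single query. The second function performs FedAvg-style aggregation, averaging model parameters received from HEMS units weighted by their local sample counts. Function names, the cap and the flat-list parameter format are illustrative assumptions.

```python
import random
from typing import List, Sequence


def private_total(readings_kwh: Sequence[float], cap_kwh: float, epsilon: float,
                  rng: random.Random) -> float:
    """Laplace mechanism for the sum of per-household readings.

    Clipping each reading to [0, cap_kwh] bounds the sensitivity of the sum:
    adding or removing one household changes it by at most cap_kwh.
    """
    if epsilon <= 0 or cap_kwh <= 0:
        raise ValueError("epsilon and cap_kwh must be positive")
    true_sum = sum(min(max(x, 0.0), cap_kwh) for x in readings_kwh)
    scale = cap_kwh / epsilon  # Laplace scale b = sensitivity / epsilon
    # The difference of two i.i.d. Exp(1) variables is Laplace(0, 1)
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return true_sum + noise


def federated_average(client_params: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """FedAvg-style aggregation: weight each client's parameters by its sample count."""
    total = sum(client_sizes)
    if total <= 0 or len(client_params) != len(client_sizes):
        raise ValueError("invalid client sizes")
    dim = len(client_params[0])
    return [sum(n * p[j] for p, n in zip(client_params, client_sizes)) / total
            for j in range(dim)]
```

Only the noisy total and the averaged parameters would leave the aggregation point; the raw readings and local datasets never do.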

1.4. An Original Contribution and the Paper Structure

This review provides a comprehensive examination of the current applications of RL and DRL in HEMS, with a particular focus on the analysis of their benefits, limitations, and the state-of-the-art methodologies. By outlining the contributions of RL and DRL to energy demand forecasting, load scheduling, and peak load management, as well as their role in smart grid integration and DSM, this review aims to demonstrate the value of these technologies in advancing smart home capabilities. Furthermore, it discusses ongoing research and future directions, with particular emphasis on areas where RL and DRL have the potential to facilitate the development of autonomous, efficient, and sustainable home energy systems that will benefit both individual households and the broader energy grid.
Moreover, the review makes a novel contribution by exploring the intersection of RL with IoT technologies, DSR and DSM strategies, scheduling optimization, and the integration of RESs with energy storage systems. The focus on these interrelated domains represents a cutting-edge direction in smart building technology development, especially in smart home automation and energy management, which remains underdeveloped in the existing literature. The novelty of this review lies in the following key aspects:
  • Synergy between RL and IoT for real-time smart home systems. While RL and IoT have individually shown promise in home and building automation [43,70,71,72], this review is among the first to extensively analyze how RL can be leveraged with IoT networks to achieve real-time monitoring and adaptive control in energy management. Moreover, it demonstrates the potential for more efficient and autonomous building operations through the utilization of IoT sensors to feed RL systems with real-time data on energy usage, occupancy, and environmental conditions;
  • Innovative approaches to DSR optimization. This review identifies a novel application of RL in enhancing DSR programs, enabling homes, particularly prosumers, to dynamically respond to fluctuations in energy prices and grid conditions. By utilizing RL, homes and buildings can autonomously learn optimal strategies for shifting or reducing energy loads, contributing to grid stability and energy cost savings, particularly in the context of peak demand periods. The ability of RL to adapt to varying demand response (DR) signals and building-specific constraints presents a significant advancement over traditional rule-based approaches;
  • Advanced scheduling for energy and resource optimization. A unique focus of this review is the application of RL in scheduling algorithms for home automation systems, particularly in relation to energy consumption, occupancy prediction, and appliance usage. This review explores how RL and DRL can optimize multiobjective scheduling problems, balancing comfort, energy efficiency, and operational costs. Such applications are critical for ensuring flexible home and prosumer systems, capable of responding to dynamic energy demands and varying occupant needs;
  • Integration of RL and DRL with RES and energy storage systems. One of the most novel aspects of this review is the examination of how RL and DRL techniques can be used to manage RESs, such as solar and wind, in conjunction with energy storage systems, especially important for modern and future prosumer applications. By enabling intelligent decision-making about when to store, use, or sell generated energy, RL and DRL algorithms can help maximize the self-consumption of renewables and ensure grid or microgrid independence. This is particularly important in homes and buildings aiming for net-zero energy performance, as RL-driven strategies can optimize the use of intermittent RES in real time;
  • Bridging the gap between theory and practice. While much of the existing research on RL in building automation remains theoretical or simulation-based, this review uniquely emphasizes the need for practical case studies and real-world implementations. It identifies key challenges such as scalability, data availability, and heterogeneous system integration, offering insights into how these challenges can be overcome when deploying RL-based systems in operational environments.
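The multiobjective trade-offs described in the points above (operating cost, comfort and the use of locally generated energy) are typically presented to an RL agent as a single scalar reward. The sketch below assumes one common choice, a weighted sum of the net energy cost per time slot and a penalty for leaving a thermal comfort band. The `StepOutcome` fields, the prices, the 20 to 24 °C band and the weight are all hypothetical and would have to be tuned for a real household.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StepOutcome:
    grid_import_kwh: float  # energy bought from the grid in this time slot
    grid_export_kwh: float  # surplus PV energy sold back in this time slot
    indoor_temp_c: float    # indoor temperature at the end of the slot


def hems_reward(o: StepOutcome, import_price: float, export_price: float,
                comfort_band: Tuple[float, float] = (20.0, 24.0),
                comfort_weight: float = 0.5) -> float:
    """Hypothetical scalarized reward: -(net energy cost) - weight * comfort violation."""
    energy_cost = import_price * o.grid_import_kwh - export_price * o.grid_export_kwh
    low, high = comfort_band
    # Degrees outside the comfort band (zero when the temperature is inside it)
    violation = max(low - o.indoor_temp_c, 0.0) + max(o.indoor_temp_c - high, 0.0)
    return -energy_cost - comfort_weight * violation
```

Changing `comfort_weight` shifts the learned policy along the cost/comfort trade-off, which is why such weights are usually treated as design parameters rather than fixed constants.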
Section 2 sets forth the principal assumptions and methodology of the review, whereas Section 3 contains a synthetic analysis of selected publications. The following Section 4 provides detailed information and characterization of the application areas of RL and DRL techniques and algorithms in homes, along with the identification of gaps and challenges. On this basis, in Section 5, the authors identify potential opportunities and development trends, with particular attention to the role of RL and DRL in the organization of advanced energy management systems for homes and prosumers in local microgrids. In the final Section 6, the conclusions are presented, along with an outline of future work.

2. Methodology of the Review

The methodology of organizing this review is informed by the latest developments in home and building automation systems, including an analysis of the potential for incorporating RL and DRL algorithms into the various functional areas of energy management, and takes into account the guidelines for the SRI [73], which was first introduced by the EPBD 2018 [74]. Consequently, the initial stage of the selection process entailed verifying the number of publications addressing building automation, home automation and reinforcement learning in the primary bibliographic databases of scientific and technical literature. The outcomes of this preliminary selection process are presented in Table 1.
Notably, few scientific publications combine the two threads of home and building automation and RL. In particular, the number of articles and review studies on the use of RL in home automation and smart homes is limited, indicating relatively low interest among researchers in this application area for RL. Conversely, a substantial number of publications on these topics were identified in the Google Scholar database, which covers a broad range of academic materials, including some that may not meet traditional bibliographic and scientific standards. This suggests that RL-based solutions for building and home automation are under active technological development.
However, the factors driving this development are relatively recent. They stem largely from the need to align building infrastructure with new standards for the smart grid and with the integration of RES, energy storage units and charging stations for electric vehicles. Furthermore, in Europe, the need to develop and implement new, sophisticated energy and demand management mechanisms in residential and commercial buildings is driven by the requirements of the aforementioned EPBD 2018 directive and the SRI indicator [11,13,75,76]. Consequently, in the next phase of the verification and selection process, the analysis was limited to publications published after 2018. To gain insight into the current state of knowledge and the latest developments in RL solutions and algorithms for home and building automation, a literature review was conducted by selecting relevant publications from technical science publishers: Springer, ScienceDirect, MDPI, IEEE Xplore (journals and conferences) and, additionally, Taylor and Francis, the ACM Digital Library, and the Wiley Online Library. The findings are presented in Table 2.
A detailed examination of the data presented in Table 2 demonstrates that the implementation of RL in the functional structures of home and building automation systems is a developing area of research that is being addressed in publications by leading publishers. However, most of these publications are not indexed in the Web of Science and Scopus databases. Furthermore, the authors of this review, who have worked for several years on Building Automation and Control Systems (BACS) and on the impact of automation functions on the energy efficiency of buildings [41,77,78], highlight the need to develop and implement new, efficient mechanisms for processing large amounts of data in such applications. This need is a direct consequence of the continual expansion of the infrastructure of both residential and non-residential buildings in response to the evolving requirements of their users and managers. Buildings are equipped with a growing number and variety of sensors, monitoring modules, and energy and media meters that generate ever-increasing volumes of data, as well as fieldbus-level controllers that process these data and use them to implement usage scenarios for rooms, buildings or entire building campuses, including the public spaces around them.
BACS and Building Management Systems (BMS) have long operated effectively in this area of application, having been installed in larger public, commercial and office buildings, among other locations. However, the growing development of local energy microgrids and smart grid solutions requires the infrastructure of these facilities, both commercial and residential, to be actively included in energy management. This, in turn, calls for a new approach to organizing monitoring and automation functions together with data processing. One solution in this domain is the use of external data servers and cloud computing resources [44,46,79]. Another essential aspect is the provision of data services and effective processing at the level of the local automation network (fieldbus) [80,81], with the potential to implement procedures directly in network nodes (IoT) [70,82] or in local automation servers (edge computing) [44,50,83]. It is in this application area that machine learning mechanisms and algorithms, including RL and DRL with various learning models and data analysis methods, appear particularly promising. Accordingly, the authors selected several dozen scientific papers directly concerning simulation and application studies of such algorithms in building automation, with a particular focus on home automation. This focus was motivated by the limited number of publications, particularly reviews, that simultaneously address RL and DRL and home automation, and it was further reinforced by the growing trend of this application field for RL and DRL.

3. State of the Art and Practice

In analyzing the existing state of knowledge on the use of RL and DRL techniques and algorithms in home automation systems, this review focused on threads supporting the implementation of energy management functions. This is a specific area of application in the context of home automation, where the focus so far has mainly been on ensuring comfort, convenience and security in the use of the home infrastructure [18,84,85,86,87]. On the other hand, the trends and changes identified in Section 1 and Section 2 indicate the need for research and development work on integrating mechanisms into home automation systems that support effective energy management and the operation of these facilities as prosumers [8,26,88,89,90,91,92,93,94]. The authors selected five key application areas that support integration, energy and device management, and the operation of prosumer installations in the home: the Internet of Things, Demand Response, Scheduling, RES + Storage and EV. Table 3 presents a collection of several dozen publications, in the form of journal articles and conference papers (excluding reviews) from the 2019–2024 period, that form the core of this review, considering the issues indicated previously. In addition, the graph in Figure 1 shows the relationship between the research and development work on RL and DRL methods discussed in these publications and the main publishers. The developmental nature of this work is evidenced by the large number of publications from IEEE Xplore, particularly conference proceedings and individual scientific articles.
The publications collected in Table 3 have been divided into five groups according to the areas of application of RL and DRL algorithms identified in them. For each application group, the authors of these publications analyzed the possibilities of using different RL and DRL algorithms and techniques. They also adopted slightly different objectives and methods for verifying the effectiveness of the algorithm implementation, which can be summarized as follows:
  • IoT applications
    • Algorithms used: DRL, Deep Q-learning, Q-learning, DDPG
    • Objectives: Focuses on optimizing cost and comfort, with additional considerations for autonomy, personalization, and privacy
    • Verification: All experiments and models are verified through simulations
  • DSR applications
    • Algorithms used: A variety including MORL, Q-learning (and its variations with Fuzzy Reasoning), DQN, MARL, PPO, Actor–Critic methods, among others
    • Objectives: Primarily target cost and comfort optimization
    • Verification: Both simulations and some evaluations using real-world data or physical testing setups (e.g., MATLAB and Arduino Uno)
  • Scheduling applications
    • Algorithms used: Q-learning, DQN, PPO, MADDPG, among others
    • Objectives: Focus on cost and comfort optimization, with several entries solely targeting cost
    • Verification: Predominantly simulations, with some studies using practical data from real-world networks
  • Data security and privacy
    • Algorithms used: TRPO, SAC, Q-learning, PPO, DDPG, and others
    • Objectives: Aimed at optimizing cost and comfort, with a specific focus on energy systems integrating renewable sources and storage
    • Verification: All studies verify their findings through simulations, with some using real-world data from energy markets and PV profiles.
  • Electric vehicles
    • Algorithms used: Q-learning, DDPG, MDP, DQN, and others
    • Objectives: Aimed at optimizing cost and comfort, with a specific focus on grid stability and integrating renewable sources
    • Verification: All studies verify their findings through simulations with real-world data.
The efficacy of RL and DRL algorithms has been evaluated through simulations in a development environment for each analyzed case. Only a limited number of these simulations used data sets derived from real-world objects. Furthermore, detailed descriptions of studies conducted on real-world objects are notably absent from the literature. There is also a lack of case studies that would provide insight into the tangible impact of RL and DRL techniques on the efficacy of automation systems, particularly for energy management in residential and commercial buildings. The authors of this review identify this as a gap and a challenge for further research and development work. They also conducted a comprehensive analysis of the characteristics of the solutions and application areas for RL and DRL proposed in the literature, with a particular focus on smaller home automation applications.

4. Applications of Reinforcement Learning for Home Automation

As previously stated, the architectural framework of HEMS can be based on IoT modules, as they facilitate the remote, real-time collection of data on energy consumption, weather conditions or user presence, enabling the automatic control of loads and energy flows [97]. The straightforward integration of IoT devices and the relatively simple control algorithms implemented in them prompt the consideration of more effective control methods, such as DRL [34,139]. In HEMS applications, DRL algorithms are employed to rationalize energy consumption in households. These techniques are particularly beneficial for new home infrastructure related to RES, electric vehicle charging systems and energy storage installations. In such circumstances, DRL mechanisms support intelligent energy management decisions that take into account variables such as energy costs, environmental conditions and user preferences [95,96]. DRL agents learn through interactions with their environment, receiving rewards or penalties for their actions, which enables them to gradually improve their real-time decision-making [95]. Moreover, DRL models are employed to forecast future energy availability, consumption patterns and storage levels by analyzing historical data, weather forecasts and real-time sensor data. This enables the automation system and users to make informed decisions, maximizing the use of renewable energy sources and minimizing dependence on external energy sources. RL-based HEMS can adapt energy management strategies to individual user preferences and changing system conditions, including tariffs and DSM/DSR signals, which enables greater energy savings and user comfort.
Furthermore, these systems can provide users with recommendations regarding the optimal management of energy consumption [96].
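To make the reward-driven learning loop described above concrete, the following minimal sketch uses tabular Q-learning (not a deep network) on a hypothetical toy problem: a single shiftable appliance must run once within an 8-hour window, and the agent learns only from rewards, without any model of the tariff, when to run it. The tariff, energy use and missed-run penalty are invented for illustration; a DRL agent would replace the Q-table with a neural network so that it can cope with the large, continuous states of a real HEMS.

```python
import random

# Hypothetical hourly tariff (per kWh) over an 8-hour planning window.
PRICES = [0.30, 0.25, 0.20, 0.10, 0.15, 0.22, 0.35, 0.40]
HORIZON = len(PRICES)
ENERGY_KWH = 2.0       # energy drawn by one run of the shiftable appliance (assumed)
MISSED_PENALTY = 5.0   # penalty if the appliance never runs (assumed)
WAIT, RUN = 0, 1


def step(hour, action):
    """Toy environment: returns (reward, done) for taking `action` at `hour`."""
    if action == RUN:
        return -PRICES[hour] * ENERGY_KWH, True
    if hour == HORIZON - 1:
        return -MISSED_PENALTY, True  # deadline reached without running
    return 0.0, False


def train_q_learning(episodes=20000, alpha=0.1, gamma=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(HORIZON)]  # q[hour][action]
    for _ in range(episodes):
        hour, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = RUN if q[hour][RUN] > q[hour][WAIT] else WAIT
            reward, done = step(hour, action)
            # Temporal-difference target: no bootstrapping after a terminal step.
            target = reward if done else reward + gamma * max(q[hour + 1])
            q[hour][action] += alpha * (target - q[hour][action])
            hour += 1
    return q


def learned_run_hour(q):
    """Follow the greedy policy and return the hour at which the appliance runs."""
    for hour in range(HORIZON):
        if q[hour][RUN] > q[hour][WAIT]:
            return hour
    return None


q_table = train_q_learning()
print(learned_run_hour(q_table))
```

The agent never sees `PRICES` directly; it only receives rewards, yet its greedy policy should end up waiting for the cheapest hour (index 3) because waiting longer risks more expensive hours and, ultimately, the missed-run penalty.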
From a technical and functional point of view, a primary task of modern HEMS is energy consumption management through optimal operation planning (i.e., switching devices on and off or changing their mode), particularly for devices in the technical infrastructure of the building (e.g., heating, ventilation, air conditioning). Control algorithms employing DRL enable HEMS to incorporate dynamic planning (scheduling) functions. DRL is important for optimizing scheduling strategies in HEMS because DRL algorithms learn optimal energy management strategies through interactions with the environment, receiving rewards for actions that result in energy and cost savings. This is supported by numerous studies in the literature [3,6,112,113,114,115,116,117,118,119,120,121]. The adaptability and flexibility of DRL algorithms enable management systems based on monitoring and control functions to adjust their operation in response to changing market and environmental conditions, including fluctuations in energy prices and changes in demand. At the same time, they facilitate the optimal use of renewable energy and of energy held in storage units, thereby rationalizing energy storage management. As previously stated, a key consideration for implementing scheduling in HEMS is the incorporation of user preferences, so that energy management strategies are aligned with individual requirements. This is essential for users to accept such a mode of operation. Preferences can be collected through user interfaces that allow the system to be configured according to user specifications [6,112,113,116,117,119,121].
Alternatively, they can be employed to facilitate the adaptive realignment of the energy management strategy, thereby reducing instances of user dissatisfaction and enhancing overall system efficiency [112,114,116,117,120].
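The trade-off between cost and user preferences in scheduling is usually encoded in the reward. The sketch below assumes a hypothetical reward of the form r = -(cost + w * |start - preferred_start|) for one appliance with a fixed run time, and finds the best start hour by exhaustive search, i.e., the answer an RL scheduler should converge to in this tiny case. The tariff, power, run time, preferred hour and weights are illustrative assumptions, not values from the cited studies.

```python
# Hypothetical hourly tariff and appliance data (not taken from the cited studies).
PRICES = [0.40, 0.35, 0.30, 0.12, 0.10, 0.15, 0.38, 0.42]
DURATION_H = 2   # appliance runs for two consecutive hours
POWER_KW = 1.5   # constant power draw while running


def schedule_reward(start, preferred_start, comfort_weight):
    """Reward for starting at `start`: -(energy cost + weighted discomfort)."""
    cost = sum(PRICES[h] * POWER_KW for h in range(start, start + DURATION_H))
    discomfort = abs(start - preferred_start)  # hours away from the user's preference
    return -(cost + comfort_weight * discomfort)


def best_start(preferred_start, comfort_weight):
    """Exhaustive search over feasible start hours (a baseline an RL agent should match)."""
    feasible = range(len(PRICES) - DURATION_H + 1)
    return max(feasible, key=lambda s: schedule_reward(s, preferred_start, comfort_weight))


for w in (0.0, 0.2, 1.0):
    print(w, best_start(preferred_start=6, comfort_weight=w))
```

With w = 0 the cheapest window (start at hour 3) wins, w = 0.2 moves the start to hour 4, and w = 1.0 moves it to the preferred hour 6. Exhaustive search is only feasible for such toy problems; RL becomes attractive when many appliances, uncertain prices and occupant behavior make the decision space too large to enumerate.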
RES are increasingly integrated with HEMS. These sources include photovoltaic (PV) panels and small wind turbines, together with the associated energy storage devices. The primary challenge for HEMS is to rationalize the energy supplied to buildings by disparate sources and to integrate PV systems and energy storage devices effectively. Energy management strategies must therefore be adjusted dynamically to ensure the safety and efficiency of the systems. Many research and technical teams have indicated that DRL may offer a solution to these issues. In energy management optimization, DRL-based algorithms are being developed and implemented to manage the charging and discharging of energy storage devices in response to changing market conditions and energy demand. Algorithms such as PPO and DDPG are employed, in some cases combined with LSTM networks, to forecast prospective energy conditions and adapt energy management strategies in real time. The primary objectives of optimization in such procedures are to minimize operating costs and maximize savings and energy efficiency [122,126,127,128,129].
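The storage-management problem described above is typically cast as an environment with which a DRL agent (for example, one based on PPO or DDPG) interacts. The following sketch shows a hypothetical single-step model of a home battery with PV: the action is a continuous charge/discharge fraction, the state includes the state of charge, and the reward is the negative electricity cost, with exports paid at a lower feed-in price. Capacity, power rating, efficiency and prices are assumptions, and the class is not taken from any specific library or cited work.

```python
from dataclasses import dataclass


@dataclass
class HomeBatteryEnv:
    """Minimal one-step battery model for a PV-equipped home (all parameters assumed)."""
    capacity_kwh: float = 10.0
    max_power_kw: float = 3.0
    efficiency: float = 0.95  # one-way charge/discharge efficiency
    soc_kwh: float = 5.0      # current state of charge

    def step(self, action, pv_kw, load_kw, buy_price, sell_price, dt_h=1.0):
        """action in [-1, 1]: fraction of max power (>0 charge, <0 discharge).
        Returns (new state of charge, reward), where reward = -electricity cost."""
        action = max(-1.0, min(1.0, action))
        power_kw = action * self.max_power_kw
        if power_kw >= 0:
            # Energy actually stored is limited by efficiency and remaining capacity.
            stored = min(power_kw * dt_h * self.efficiency,
                         self.capacity_kwh - self.soc_kwh)
            battery_kw = stored / (self.efficiency * dt_h)  # drawn from the home bus
            self.soc_kwh += stored
        else:
            # Energy removed from storage is limited by the current state of charge.
            removed = min(-power_kw * dt_h, self.soc_kwh)
            battery_kw = -removed * self.efficiency / dt_h  # delivered to the home bus
            self.soc_kwh -= removed
        grid_kw = load_kw + battery_kw - pv_kw  # > 0 import, < 0 export
        price = buy_price if grid_kw >= 0 else sell_price
        cost = grid_kw * dt_h * price           # negative cost means export revenue
        return self.soc_kwh, -cost


env = HomeBatteryEnv()
print(env.step(1.0, pv_kw=4.0, load_kw=1.0, buy_price=0.30, sell_price=0.10))
```

In this call the PV surplus of 3 kW is absorbed by charging at full power, so almost nothing is imported or exported; a learning agent would have to discover when storing surplus PV beats exporting it at the feed-in price.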
With the evolving energy storage market and its integration into residential settings, dynamically adjusting the management strategies for the energy stored in these systems is a pivotal challenge for HEMS. DRL-based systems can anticipate future price conditions and make optimal decisions on managing energy storage and RES accordingly. These approaches enable flexible and efficient energy management, which is particularly important under evolving market conditions and energy demand [123,124,125,128]. The implementation of advanced DRL algorithms depends on ensuring user data protection and safeguarding systems against cyber threats. Furthermore, challenges related to the scalability and interoperability of these systems require standards and communication protocols that facilitate the integration of diverse devices and platforms. The issues of security and scalability were addressed in [129]. The use of DRL to manage energy storage and RES in HEMS offers substantial benefits in terms of energy consumption optimization, dynamic adjustment of energy management strategies and integration of renewable energy sources. DRL in HEMS also facilitates the implementation of the DSR concept, a promising solution for dynamic energy management. DR is implemented in HEMS through a variety of techniques, including load shifting, peak load reduction and energy storage management [94,98,99,103,105,109]. Such systems also gather data on energy consumption, energy prices and environmental conditions, which supports the optimal management of energy consumption while maximizing user comfort [100,109,110].
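Load shifting and peak load reduction, mentioned above as DR techniques, can be quantified with the peak-to-average ratio (PAR). The sketch below is a simple greedy rule, not an RL agent: it moves flexible energy out of hypothetical DR event hours into the least-loaded remaining hours, which is the kind of action a DR-oriented agent would be expected to learn. The load profile, event hours and 0.5 kWh shifting step are invented for illustration.

```python
def peak_to_average(load):
    """Peak-to-average ratio (PAR) of an hourly load profile."""
    return max(load) / (sum(load) / len(load))


def shift_out_of_event(base_kwh, flexible_kwh, event_hours, chunk_kwh=0.5):
    """Greedy DR load shifting: move flexible energy out of DR event hours,
    chunk by chunk, into the non-event hour with the lowest current load."""
    total = [b + f for b, f in zip(base_kwh, flexible_kwh)]
    allowed = [h for h in range(len(total)) if h not in event_hours]
    for h in event_hours:
        remaining = flexible_kwh[h]
        while remaining > 1e-9:
            moved = min(chunk_kwh, remaining)
            target = min(allowed, key=lambda k: total[k])
            total[h] -= moved
            total[target] += moved
            remaining -= moved
    return total


base = [1.0, 1.0, 1.0, 2.0, 3.0, 3.0]       # hypothetical inflexible load (kWh per hour)
flexible = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]   # hypothetical shiftable load
before = [b + f for b, f in zip(base, flexible)]
after = shift_out_of_event(base, flexible, event_hours=(4, 5))
print(before, round(peak_to_average(before), 3))
print(after, round(peak_to_average(after), 3))
```

In this example the peak falls from 4.0 to 3.0 kWh while total energy is unchanged, so the PAR drops accordingly.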
MARL has emerged as a transformative approach in building automation systems, addressing a number of critical challenges, including energy management, privacy protection and adaptability to varying operational contexts. A significant advantage of MARL is its decentralized structure, in which agents operate autonomously while exchanging only minimal information, which helps protect the security and privacy of sensitive user data. This characteristic reduces the need for centralized data collection systems, which are often susceptible to security breaches and data misuse. Furthermore, MARL facilitates the development of scalable solutions for managing distributed smart building systems, as agents learn locally optimal policies that promote energy efficiency and user satisfaction. Nevertheless, significant challenges must be overcome before MARL can be practically deployed. One of the most significant is the limited transferability of models trained in one system to another, owing to the heterogeneous configurations, energy demand patterns and environmental dynamics of different buildings or systems. For example, a MARL model optimized for a residential grid with specific appliance usage patterns may underperform in a commercial building with different energy requirements and occupant behaviors. Consequently, deploying MARL solutions across diverse systems requires environment-specific tuning or retraining, which can be both costly and time-consuming [140,141,142].
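A minimal way to illustrate the decentralized structure of MARL is independent learning: each household agent keeps its own Q-values private and observes only the price of the time slot it chose, which depends on how many households chose the same slot (the only aggregate information in the system). The sketch below uses stateless independent Q-learners with a hypothetical base price per slot and a congestion charge; it is a toy illustration, not one of the MARL algorithms in [140,141,142], and independent learners carry no general convergence guarantee.

```python
import random

# Hypothetical two-slot setting: base price per slot plus a congestion charge
# that grows with the number of households using the same slot.
BASE_PRICE = [0.10, 0.25]
CONGESTION = 0.20


class IndependentAgent:
    """One household agent; its Q-values stay local and are never shared."""

    def __init__(self, rng, n_actions, alpha=0.1, epsilon=0.1):
        self.rng, self.alpha, self.epsilon = rng, alpha, epsilon
        self.q = [0.0] * n_actions

    def greedy(self):
        return max(range(len(self.q)), key=lambda a: self.q[a])

    def act(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        return self.greedy()

    def learn(self, action, reward):
        # Stateless Q-learning update (a running estimate of the action's reward).
        self.q[action] += self.alpha * (reward - self.q[action])


def slot_counts(choices):
    return [choices.count(s) for s in range(len(BASE_PRICE))]


def train(n_agents=3, steps=20000, seed=1):
    rng = random.Random(seed)
    agents = [IndependentAgent(rng, len(BASE_PRICE)) for _ in range(n_agents)]
    for _ in range(steps):
        choices = [agent.act() for agent in agents]
        counts = slot_counts(choices)  # aggregate slot usage determines prices
        for agent, slot in zip(agents, choices):
            price = BASE_PRICE[slot] + CONGESTION * counts[slot]
            agent.learn(slot, -price)  # each agent sees only the price of its own slot
    return agents


agents = train()
print(slot_counts([a.greedy() for a in agents]))
```

With these numbers, the only allocation in which no household can lower its own price by switching slots is two agents in slot 0 and one in slot 1, and this is the allocation the learners are expected to settle on.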
Transfer learning (TL) represents a promising avenue for enhancing the adaptability of RL and MARL in the context of building automation. TL permits the reuse of models that have been trained on data-rich buildings or systems (the source domain) for the purpose of developing or fine-tuning models for data-scarce environments (the target domain). This approach is especially beneficial for transferring insights about energy consumption, occupancy dynamics, or HVAC operations across buildings with comparable attributes, such as usage type or climatic conditions. The application of TL techniques has the potential to reduce the necessity for extensive retraining, thereby improving the efficiency of model deployment and performance. It is noteworthy that TL has been successfully applied to a number of tasks, including building energy load prediction, occupancy detection and the development of energy management control systems. For instance, the utilization of pre-trained models has been shown to result in a reduction of up to 78% in the error rate associated with energy demand prediction tasks for target buildings with limited data [143,144,145].
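Parameter-based transfer learning can be illustrated as pre-training on the source domain followed by fine-tuning on the scarce target data. In the sketch below, a one-variable linear model (a stand-in for the deep models used in the cited works) relates a hypothetical standardized outdoor temperature to heating load. It is pre-trained on a data-rich source building and then fine-tuned for ten gradient steps on only three samples from a similar target building; the same ten steps started from scratch serve as the baseline. All data are synthetic, and the 78% figure reported above is not reproduced by this example.

```python
def fit_linear(xs, ys, w=0.0, b=0.0, lr=0.1, steps=100):
    """Gradient descent on mean squared error for y = w*x + b, starting from (w, b)."""
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def mse(w, b, xs, ys):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Source building: plenty of hypothetical data. Target building: similar, only three samples.
source_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
source_y = [2.0 * x + 1.0 for x in source_x]
target_x = [-1.0, 0.0, 1.0]
target_y = [2.2 * x + 1.5 for x in target_x]

w_src, b_src = fit_linear(source_x, source_y, steps=500)             # pre-training
w_ft, b_ft = fit_linear(target_x, target_y, w_src, b_src, steps=10)  # fine-tuning
w_sc, b_sc = fit_linear(target_x, target_y, steps=10)                # from scratch

print(mse(w_ft, b_ft, target_x, target_y), mse(w_sc, b_sc, target_x, target_y))
```

Because the warm start begins close to the target relationship, ten steps bring the fine-tuned model much closer to the target data than the model trained from scratch; the benefit shrinks when the source and target buildings differ substantially.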
A further crucial aspect is the incorporation of privacy-preserving techniques, notably where centralized algorithms require access to comprehensive user data. The decentralized learning approach of MARL inherently limits the exposure of private information, as agents rely on local observations to make decisions. However, these privacy-preserving properties must be maintained throughout model training and adaptation. Advanced techniques such as federated learning or privacy-aware reward shaping could further enhance the potential of MARL in scenarios with stringent data privacy requirements. MARL algorithms must also comply with safety constraints and effectively balance multiple objectives, including energy efficiency, cost reduction and user comfort. The inability to transfer pre-trained models without a loss of performance highlights the need for research into adaptive learning frameworks and methods that generalize across heterogeneous environments while maintaining data integrity and operational safety [140,141,142].
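Federated learning, mentioned above as a privacy-enhancing option, can be sketched with federated averaging (FedAvg): each household trains locally on its own data, and only model parameters are sent to an aggregator, which averages them weighted by local sample counts. The linear model, data and hyperparameters below are hypothetical. Plain FedAvg does not by itself guarantee privacy, since shared updates can still leak information; in practice it is often combined with secure aggregation or differential privacy.

```python
def local_train(w, b, xs, ys, lr=0.1, steps=5):
    """A few local gradient-descent steps on one household's private data."""
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        w, b = (w - lr * 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n,
                b - lr * 2.0 * sum(residuals) / n)
    return w, b


def federated_average(client_params, client_sizes):
    """FedAvg aggregation: average parameters weighted by local sample counts."""
    total = sum(client_sizes)
    return tuple(
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(len(client_params[0]))
    )


def run_federated(clients, rounds=50):
    w, b = 0.0, 0.0
    for _ in range(rounds):
        # Each client starts from the current global model; raw data never leave it.
        updates = [local_train(w, b, xs, ys) for xs, ys in clients]
        w, b = federated_average(updates, [len(xs) for xs, _ in clients])
    return w, b


# Hypothetical private datasets of two households following the same relation y = 2x + 1.
clients = [
    ([-1.0, 0.0, 1.0], [-1.0, 1.0, 3.0]),
    ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]),
]
print(run_federated(clients))
```

Only the tuples returned by `local_train` cross the household boundary; the aggregator never sees the measurements themselves.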
The integration of EVs with building automation systems through RL, particularly DRL, is transforming energy management in smart buildings. DRL enables dynamic scheduling and intelligent control of energy flows, optimizing EV charging and discharging while ensuring efficient use of RES such as PV panels, together with energy storage systems. By leveraging IoT devices, DRL-based systems collect real-time data on energy consumption, user behavior and weather conditions, enabling predictive and adaptive energy strategies. For example, algorithms such as Charging Control Deep Deterministic Policy Gradient (CDDPG) dynamically adjust EV charging schedules based on real-time energy prices and demand, reducing costs and enhancing system efficiency [130,132,133]. Furthermore, DRL facilitates Vehicle-to-Home (V2H) and Vehicle-to-Grid (V2G) operations, enabling EVs to act as energy storage units that transfer surplus energy to buildings or the grid during periods of peak demand [131,134,135]. These systems also align energy management strategies with user preferences via customizable interfaces, enhancing user satisfaction and acceptance [131,138]. Overall, integrating EVs with building automation through DRL not only reduces energy costs but also maximizes renewable energy use, supporting smarter, more sustainable energy management [130,133,136,137].
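DRL agents such as those based on DDPG learn charging policies by maximizing an episode reward. The sketch below does not implement CDDPG; it only shows a hypothetical reward for one plug-in session with hourly steps, combining charging cost, V2H discharge valued at the hourly price, and a penalty for missing the required state of charge at departure. Charger limits, capacity, prices and the penalty weight are assumptions, and conversion losses are ignored.

```python
from dataclasses import dataclass


@dataclass
class EVSession:
    """One plug-in session with hourly steps (all values hypothetical)."""
    arrival_soc_kwh: float
    target_soc_kwh: float
    capacity_kwh: float = 40.0
    max_power_kw: float = 7.4


def evaluate_schedule(session, prices, powers_kw, shortfall_penalty=1.0):
    """Episode reward of an hourly EV schedule and the final state of charge.
    powers_kw[t] > 0 charges the EV; < 0 discharges it to the home (V2H).
    Assumption: V2H energy is valued at the same hourly price because it
    displaces grid purchases. Conversion losses are ignored for brevity."""
    soc, cost = session.arrival_soc_kwh, 0.0
    for price, p in zip(prices, powers_kw):
        p = max(-session.max_power_kw, min(session.max_power_kw, p))  # charger limit
        p = max(-soc, min(session.capacity_kwh - soc, p))            # battery limits
        soc += p  # one-hour step, so kW equals kWh
        cost += price * p
    shortfall = max(0.0, session.target_soc_kwh - soc)
    return -(cost + shortfall_penalty * shortfall), soc


session = EVSession(arrival_soc_kwh=10.0, target_soc_kwh=20.0)
prices = [0.40, 0.10, 0.10, 0.30]
charge_now = [7.4, 2.6, 0.0, 0.0]
price_aware = [-2.0, 7.4, 4.6, 0.0]  # V2H in the expensive hour, charge when cheap
print(evaluate_schedule(session, prices, charge_now))
print(evaluate_schedule(session, prices, price_aware))
```

Both schedules reach 20 kWh by departure, but the price-aware one supplies the home in the expensive first hour and charges in cheap hours, so its reward is higher (-0.40 versus -3.22). A DRL agent would have to discover such schedules from experience rather than having them given.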
Considering user preferences in DR systems integrated with HEMS is essential for user acceptance and satisfaction. Such preferences can be gathered via user interfaces that allow the system to be configured according to user requirements [92,98,100,103,106,110]. Collecting user feedback on satisfaction with the energy management strategy is equally important, as it enables adaptive adjustment of the strategy and minimizes user dissatisfaction, thus improving system efficiency. RL algorithms, including Q-learning, DRL, PPO and Primal-Dual DDPG, play a key role in optimizing DR strategies in HEMS. These algorithms learn optimal energy management strategies through interactions with the environment, receiving rewards for actions that result in energy and cost savings [94,99,101,103,105,107,110]. This enables the systems to modify their operation in response to changing conditions, such as fluctuations in energy prices and demand [35,92,99,100,106,108]. Mobile applications and other user interfaces allow users to interact with the system, configure preferences and receive recommendations for optimal energy management. Owing to their adaptability, DRL algorithms can be used effectively under changing market and environmental conditions, such as fluctuating energy prices and demand. These algorithms optimize renewable energy utilization and energy storage management, thereby minimizing costs and maximizing savings [31,101,102,105,108,111]. Furthermore, DRL models can be employed to forecast future energy requirements and to plan energy consumption optimally, supporting more effective resource management and cost reduction [35,92,100,106,107].
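Constrained methods such as Primal-Dual DDPG rest on a Lagrangian relaxation: the agent maximizes reward minus λ times a constraint cost, while the multiplier λ is raised whenever the constraint is violated and lowered otherwise. The sketch below shows only this primal-dual logic on a hypothetical one-parameter problem (a thermostat setback with concave savings and a discomfort limit), replacing the actor-critic update with a closed-form best response. It is not the cited algorithm, and the savings function, limit and step size are assumptions.

```python
import math


def savings(setback):
    """Hypothetical concave cost saving from a thermostat setback (degrees C)."""
    return 2.0 * math.log(1.0 + setback)


def best_response(lam, max_setback=5.0):
    """Primal step: setback maximizing savings - lam * discomfort, with discomfort
    taken equal to the setback. Closed form for this toy model: 2/lam - 1."""
    if lam <= 0.0:
        return max_setback
    return min(max_setback, max(0.0, 2.0 / lam - 1.0))


def primal_dual(comfort_limit=1.5, eta=0.1, iterations=200):
    lam = 0.0  # Lagrange multiplier: the "price" of discomfort
    for _ in range(iterations):
        setback = best_response(lam)
        # Dual step: raise lam if the comfort limit is exceeded, lower it otherwise.
        lam = max(0.0, lam + eta * (setback - comfort_limit))
    return lam, best_response(lam)


lam, setback = primal_dual()
print(lam, setback, savings(setback))
```

Setting the derivative of 2 ln(1 + a) - λa to zero gives a = 2/λ - 1, so the discomfort limit of 1.5 is met exactly at λ = 0.8, which the dual iteration approaches.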

Problems, Gaps and Challenges

As mentioned previously, the conjunction of IoT and DRL technologies in HEMS exhibits considerable promise. Nevertheless, numerous challenges and deficiencies necessitate attention. Further research is required to ascertain the feasibility of implementing these technologies in real-world settings, with particular attention paid to data security, scalability, and integration with existing energy infrastructure. Overcoming these challenges will facilitate the development of more efficient, safe, and widely available energy management systems in households.
The results of the application analyses indicate that HEMS implementing DRL-based scheduling achieve good results, particularly in simulations. However, several significant gaps and challenges need to be addressed. Firstly, there is a paucity of empirical research on the implementation of DRL applications in HEMS. Tests under real conditions are necessary to assess how these systems cope with changing conditions and user preferences. Furthermore, additional research is required on data security, scalability and integration with existing infrastructure, as well as realistic economic analyses. The authors of numerous publications on the application of DR technology and advanced DRL algorithms in HEMS have highlighted these issues. In particular, the papers [31,98,99,100,103,106,108,110,111] indicate a need for economic analyses and for more cost-effective solutions to facilitate scaling the technology to a larger number of households. In this context, research and technical teams also identify challenges from a technological and system security perspective. Collecting and processing large data sets on energy consumption and user preferences carries an inherent risk of privacy violations and vulnerability to cyber-attacks. Accordingly, as the authors of the papers [31,35,99,101,105,107,109,111] indicate, advanced security methods should be developed to protect user data and secure systems against potential threats. This remains a current and significant challenge, particularly considering the monotony of available communication technologies and the trend towards data processing in cloud applications. A further significant limitation is the scalability of DR systems and their interoperability with a range of devices and platforms.
It is imperative to develop standards and communication protocols that will facilitate the seamless integration of diverse devices and systems [31,98,100,101,106,108,146].
RL typically operates by interacting directly with an environment, which enables agents to learn optimal policies without pre-labelled datasets. However, when the environment is a self-developed digital twin (DT), for example one based on LSTM predictive models, training imposes specific data requirements. Building such a DT requires detailed, high-frequency time-series data, including energy consumption, system performance metrics and external variables such as weather or occupancy patterns. These data sets are essential for constructing a realistic and dynamic virtual environment in which the DT accurately simulates the physical system’s behavior and predicts outcomes under various scenarios. The efficacy of RL models trained in such DT environments depends on the accuracy and completeness of the data. For example, the DT must capture device-level energy consumption profiles and integrate contextual variables such as seasonal weather variations or user schedules. Without such data, there is a considerable risk that the RL agent will be trained on a model that diverges significantly from real-world conditions. The data also enable the DT to assess the long-term consequences of RL strategies, improving their robustness and suitability for deployment in physical systems. Common platforms for developing building-specific DTs include EnergyPlus, TRNSYS and Modelica. These tools allow users to model and simulate building energy systems, HVAC dynamics and occupant behavior, providing a comprehensive environment for testing RL algorithms and refining control strategies [147,148,149].
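The DT-based training setup can be sketched as an environment class that wraps any one-step predictive model behind a reset/step interface in the style of Gym. In the works discussed, the predictor would be a trained model such as an LSTM; here a first-order RC thermal model with hypothetical resistance and capacitance stands in so that the example is self-contained, and the prices, setpoint and comfort weight are also assumptions. This is not an EnergyPlus, TRNSYS or Modelica interface.

```python
from typing import Callable, Sequence, Tuple

# Any one-step predictor (t_in, t_out, heat_kw) -> next t_in; in the works
# discussed, this role would be played by a trained model such as an LSTM.
Predictor = Callable[[float, float, float], float]


def rc_predictor(t_in, t_out, heat_kw, r=2.0, c=5.0, dt_h=1.0):
    """Hypothetical first-order RC thermal model used as a stand-in predictor.
    r: thermal resistance (K/kW), c: thermal capacitance (kWh/K)."""
    return t_in + dt_h / c * ((t_out - t_in) / r + heat_kw)


class DigitalTwinEnv:
    """Gym-style reset/step environment built around a predictive model."""

    def __init__(self, predictor: Predictor, outdoor_temps: Sequence[float],
                 prices: Sequence[float], setpoint=21.0, comfort_weight=0.5,
                 t_init=20.0):
        self.predictor, self.outdoor, self.prices = predictor, outdoor_temps, prices
        self.setpoint, self.comfort_weight, self.t_init = setpoint, comfort_weight, t_init

    def reset(self) -> Tuple[float, float]:
        self.t, self.t_in = 0, self.t_init
        return self.t_in, self.outdoor[0]

    def step(self, heat_kw: float):
        t_out = self.outdoor[self.t]
        self.t_in = self.predictor(self.t_in, t_out, heat_kw)
        cost = self.prices[self.t] * heat_kw  # one-hour step, so kW equals kWh
        discomfort = abs(self.t_in - self.setpoint)
        reward = -(cost + self.comfort_weight * discomfort)
        self.t += 1
        done = self.t >= len(self.outdoor)
        next_out = t_out if done else self.outdoor[self.t]
        return (self.t_in, next_out), reward, done


env = DigitalTwinEnv(rc_predictor, outdoor_temps=[0.0, 0.0], prices=[0.2, 0.2])
env.reset()
print(env.step(3.0))
```

Swapping `rc_predictor` for a trained forecaster changes nothing on the agent side, which is the point made above: the quality of the learned policy is bounded by how faithfully the predictor reproduces the real building.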
Benchmarking RL and DRL algorithms in building control aims to evaluate their performance in achieving energy efficiency, maintaining thermal comfort and meeting cost objectives under controlled conditions. Simulation-based benchmarking is particularly favored because it is cost-effective and allows test conditions to be standardized across diverse scenarios. The process entails defining key performance indicators (KPIs), including total energy consumption, peak energy demand, occupant comfort metrics (e.g., predicted mean vote or temperature deviations from setpoints) and energy cost savings. These KPIs provide a quantitative basis for evaluating the effectiveness of the algorithms. Virtual environments such as CityLearn offer a platform in which RL/DRL agents optimize building energy management policies and are compared against traditional control methods, such as rule-based control or model predictive control. KPIs related to flexibility, such as load-shifting potential or PV self-consumption, are also becoming increasingly relevant in benchmarking studies. Nevertheless, a significant challenge remains: ensuring comprehensive test coverage that reflects variations in climate, building designs and occupant behaviors, while maintaining high-fidelity simulations that avoid discrepancies between virtual and real-world performance [150].
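The KPIs listed above can be computed from simulated hourly series. The function below uses simple, illustrative definitions (grid import without a battery, PV self-consumption as the share of PV energy consumed on site, discomfort as degree-hours outside a comfort band); CityLearn and other benchmarks define their own KPIs, which this sketch does not reproduce. All input values are hypothetical.

```python
def compute_kpis(load_kw, pv_kw, prices, indoor_temps, setpoint, band=1.0, dt_h=1.0):
    """Simple building-control KPIs from hourly series (definitions are illustrative)."""
    grid_import = [max(load - pv, 0.0) for load, pv in zip(load_kw, pv_kw)]
    pv_used = [min(load, pv) for load, pv in zip(load_kw, pv_kw)]  # PV consumed on site
    pv_total = sum(pv_kw) * dt_h
    return {
        "total_energy_kwh": sum(load_kw) * dt_h,
        "peak_demand_kw": max(load_kw),
        "energy_cost": sum(g * c * dt_h for g, c in zip(grid_import, prices)),
        "pv_self_consumption": sum(pv_used) * dt_h / pv_total if pv_total > 0 else 0.0,
        # Degree-hours outside the comfort band [setpoint - band, setpoint + band].
        "discomfort_degree_hours": sum(
            max(abs(t - setpoint) - band, 0.0) for t in indoor_temps) * dt_h,
    }


print(compute_kpis(load_kw=[1.0, 2.0, 3.0], pv_kw=[0.0, 3.0, 1.0],
                   prices=[0.3, 0.2, 0.4], indoor_temps=[20.0, 21.0, 23.5],
                   setpoint=21.0))
```

Computing the same KPI dictionary for an RL controller and for a rule-based baseline on identical input series gives the kind of side-by-side comparison described above.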
Furthermore, the efficacy of RL/DRL algorithms can be assessed through the utilization of established statistical metrics that ascertain the precision and efficacy of predictive outcomes. Such metrics include the following:
  • Mean Absolute Error (MAE): the mean of the absolute differences between the predicted and actual values. It provides a straightforward measure of model accuracy, with lower values indicating better performance.
  • Mean Squared Error (MSE): the mean of the squared differences between the predicted and actual values. Squaring emphasizes larger errors, which makes this metric particularly useful for identifying models prone to occasional large errors.
  • Root Mean Squared Error (RMSE): the square root of the MSE, which expresses the prediction error in the same units as the target variable. A lower RMSE indicates better performance.
  • Median Absolute Error (MedAE): the median of the absolute differences between the predicted and actual values. Because it is robust to outliers, it provides a reliable measure of typical prediction error.
  • Mean Absolute Percentage Error (MAPE): the absolute prediction error expressed as a percentage of the actual values and averaged over all samples. This normalized metric is suitable when the target variable varies in scale, although it is undefined when an actual value equals zero.
  • Coefficient of determination (R²): the proportion of the variance in the actual data that is explained by the model. Values approaching 1 indicate a good fit, whereas values near 0 indicate that the model explains little more than the mean of the data; negative values indicate a fit worse than simply predicting the mean.
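The six metrics above can be written out directly, which also makes their differences explicit (for example, that MAPE is undefined when an actual value is zero and that R² becomes negative for a model worse than the mean). The following is a minimal, dependency-free sketch; the function name and dictionary keys are illustrative choices, and libraries such as scikit-learn provide equivalent, well-tested implementations.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, MedAE, MAPE (%) and R^2 for paired sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    ss_res = sum(e * e for e in errors)  # residual sum of squares

    mae = sum(abs_errors) / n
    mse = ss_res / n
    rmse = math.sqrt(mse)                  # same units as the target
    medae = statistics.median(abs_errors)  # robust to outliers

    # MAPE is undefined if any actual value is zero
    if any(t == 0 for t in y_true):
        mape = math.nan
    else:
        mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    # R^2 = 1 - SS_res / SS_tot; undefined for a constant target series
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan

    return {"mae": mae, "mse": mse, "rmse": rmse,
            "medae": medae, "mape": mape, "r2": r2}
```

For example, with actual values [2, 4, 6, 8] and predictions [2.5, 3.5, 6, 9], the single error of 1.0 raises the MSE (0.375) relative to the MAE (0.5) less than proportionally, while the MedAE stays at 0.5 and R² is 0.925.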
These metrics facilitate a detailed examination of the performance of RL/DRL algorithms, allowing comprehensive comparisons between models and the refinement of control strategies.
The integration of EVs with building automation systems raises several unique challenges that require further investigation. While studies [130,132,134,135,137] highlight the potential of RL in managing energy flows between EVs and HEMS, real-world implementation studies are lacking. Most existing research relies on simulations, which do not fully reflect the complexity of integrating EVs with V2H and V2G technologies into existing energy infrastructure. Additionally, ensuring scalability and interoperability with diverse devices and platforms remains a significant challenge [136,138], and robust communication standards and protocols are critical for seamless integration across systems. Furthermore, RL-based systems are highly dependent on data quality: incomplete or inaccurate datasets can hinder decision-making, as highlighted in [133,137]. Privacy and cybersecurity concerns are also heightened by the extensive data collection required to optimize energy use, necessitating advanced security methods to protect user data and prevent breaches [135,138]. Addressing these gaps will be pivotal in advancing the practical application of RL in EV and HEMS integration.
Two further issues emerge from the analysis of the literature. The first concerns the integration of sophisticated HEMS that incorporate DRL-based DR mechanisms with the existing energy infrastructure and with smart grid monitoring and control systems. The authors of [31,99,102,104,106,108,110,111] indicate that, under current technical and standardization conditions, this process may prove both complicated and expensive. Further investigation is therefore needed into methodologies that facilitate the seamless integration of novel technologies with existing energy management systems and enable consistent standards to be established in this domain. The second challenge is that the efficacy of DRL algorithms depends on the quality and quantity of the data collected by DR systems; insufficiently accurate data may lead to suboptimal decisions and system actions. Consequently, research is required to develop methods that enhance data quality, as well as algorithms that address missing or incomplete data, as suggested in [35,92,100,101,104,107,108,111].
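As one simple illustration of handling incomplete data before training or DT calibration, the sketch below fills only short gaps in a regularly sampled time series by linear interpolation and deliberately leaves longer gaps, and gaps at either end of the series, marked as missing. It is not the method of the cited papers; the function name and the `max_gap` threshold are hypothetical choices, and model-based imputation may be preferable for long outages.

```python
def fill_short_gaps(series, max_gap=2):
    """Linearly interpolate runs of None no longer than max_gap samples.

    Longer gaps, and gaps without a known value on both sides, are left as
    None so they can be flagged or excluded rather than silently invented.
    """
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first known value after the gap (or n)
        gap_len = end - start
        if start > 0 and end < n and gap_len <= max_gap:
            left, right = filled[start - 1], filled[end]
            step = (right - left) / (gap_len + 1)
            for k in range(gap_len):
                filled[start + k] = left + step * (k + 1)
    return filled
```

For instance, `fill_short_gaps([1.0, None, 3.0, None, None, None, 7.0, None])` fills the single missing sample with 2.0 but keeps the three-sample gap and the trailing gap as `None`.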
The application of Large Language Models (LLMs) is a rapidly growing research topic across various domains, including the life sciences and engineering. In building automation, only a limited number of studies explore the combination of RL with LLMs, for example in virtual assistants or energy modeling. However, no literature currently provides a comprehensive treatment of the combined application of LLMs and RL/DRL in Home Energy Management Systems.

5. Opportunities and Future Perspectives in the Application of Reinforcement Learning in Home Automation and Home and Building Energy Management

Given the identified gaps and challenges in implementing RL algorithms in buildings, the diverse nature of such facilities must be taken into account when considering prospective directions for their advancement. While the principal objective in both cases is energy efficiency, several fundamental differences between smart home automation systems and larger building automation systems affect the application and development of RL algorithms. Home automation systems often focus on managing energy-consuming appliances, lighting, HVAC systems, and RES such as rooftop solar PV panels. In some cases, they also include home energy storage systems, i.e., batteries that store energy generated from renewable sources. Compared with large commercial or industrial buildings, these systems are generally smaller in scale, involve fewer interconnected devices, and have more predictable occupant behavior [112,117]. In contrast, building automation systems, particularly in commercial or multi-dwelling environments, manage a more complex array of equipment, including centralized HVAC systems, elevators, large-scale lighting networks, security systems, and RES integrated into microgrids [32,151,152].
Furthermore, recent scientific and technical publications have identified several key application areas that create new opportunities for developing the aforementioned RL and DRL mechanisms in the energy management and infrastructure of homes and buildings. Table 4 summarizes the most important opportunities in these areas, divided into applications in homes and in buildings.
DRL approaches show considerable potential for enhancing the efficiency of HEMS and, thereby, the energy efficiency of the residential buildings they monitor and control. Another crucial domain is the integration of such buildings with smart grid platform functions and with DSM/DSR mechanisms operated by distribution system operators (DSOs).
Considering the aforementioned opportunities, the authors have identified key applications and future perspectives for RL and DRL in HEMS that facilitate effective energy management while maintaining the comfort and safety of residential buildings:
  • Adaptability and optimization
    DRL models in HEMS enable energy management strategies (storage and consumption) to be dynamically adjusted to fluctuating market and weather conditions and to evolving user preferences. This approach has the potential to significantly increase energy and cost savings [172,173];
  • Integration with renewable energy sources
    RL-based HEMS facilitate the integration of renewable energy sources, such as photovoltaic panels and wind turbines, thereby enhancing energy independence and reducing reliance on the power grid [172];
  • HEMS as a part of the Smart Grid
    The implementation of DRL models in HEMS can support demand management by enabling more flexible and responsive control of energy consumption, which is of paramount importance for dynamic tariffs and demand response programs [91,162];
  • Data security and privacy
    Implementing DRL in HEMS requires sophisticated data protection and security methods to ensure the confidentiality of user data and the integrity of the system against cyberattacks [29].
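To illustrate how an RL agent can adapt consumption to a dynamic tariff, the following sketch trains a tabular Q-learning agent to decide when to start a single one-hour deferrable load within a planning horizon. It is a deliberately small stand-in: DRL methods such as DQN replace the Q-table with a neural network to handle much larger state spaces. The tariff values, the 1.5 kWh energy demand, the forced start in the last hour (a simple deadline constraint), and all hyperparameters are assumptions, not values from the cited studies.

```python
import random


def train_deferrable_load_agent(prices, energy_kwh=1.5, episodes=2000,
                                alpha=0.1, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning for one deferrable load under a time-of-use tariff.

    State: hour index h. Actions: 0 = wait, 1 = run now (episode ends).
    If the load has not run by the last hour, it is forced to run then.
    """
    rng = random.Random(seed)
    horizon = len(prices)
    # Zero init is optimistic because every reward is <= 0, which encourages
    # the agent to explore waiting before settling on a start hour.
    q = [[0.0, 0.0] for _ in range(horizon)]
    for _ in range(episodes):
        h = 0
        while True:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = 0 if q[h][0] >= q[h][1] else 1  # exploit
            runs = action == 1 or h == horizon - 1  # deadline forces a run
            if runs:
                target = -prices[h] * energy_kwh  # terminal: pay for the energy
            else:
                target = gamma * max(q[h + 1])  # waiting costs nothing now
            q[h][action] += alpha * (target - q[h][action])
            if runs:
                break
            h += 1
    return q


def greedy_start_hour(q):
    """Follow the learned greedy policy and return the hour the load starts."""
    for h, (q_wait, q_run) in enumerate(q):
        if q_run > q_wait or h == len(q) - 1:
            return h
```

With the hypothetical tariff [0.30, 0.25, 0.10, 0.20, 0.35, 0.40] per kWh, the greedy policy learned by this sketch starts the load in the cheapest hour (index 2), i.e., it shifts consumption in response to the price signal without being told the tariff structure in advance.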

6. Conclusions

This review has detailed the significant potential of RL and DRL-based algorithms and methods for improving the efficiency of HEMS, particularly in light of the growing complexities of home infrastructure and energy systems as well as their integration with smart grids. The capacity of RL and DRL for real-time, adaptive decision-making introduces novel approaches to energy management in smart homes, offering benefits over conventional, static control methods. The authors have made a distinctive contribution by examining the applications of RL and DRL methods across several key areas, including load scheduling, DSM/DSR, integration with IoT networks, and energy storage management. Furthermore, the authors emphasize that RL and DRL, through their capacity for continuous learning and data-driven control, can enhance the flexibility and responsiveness of HEMS to factors such as fluctuating energy tariffs, renewable energy availability, and user preferences [174,175,176].
One of the key contributions of this review is the synthesis of RL and DRL techniques in the context of BACS and IoT frameworks, which demonstrates how interconnected systems can utilize real-time data to autonomously balance energy loads, optimize renewable energy use and adjust consumption patterns for cost savings [31,116]. This integration represents a future-oriented direction for HEMS development, supporting not only household energy optimization but also contributing to grid stability through DSR mechanisms.
Notwithstanding these opportunities, the review also identifies challenges that require future research. The most notable is the scalability of DRL solutions and their integration within diverse residential infrastructures. A crucial next stage in the development of DRL-based HEMS is validation through case studies in real-world applications and scenarios, which will enable a practical evaluation of the algorithms' performance, economic impact, and user satisfaction. Future research should also address data analysis, processing, and security concerns, and investigate methods to improve algorithmic efficiency when processing high volumes of real-time data, as well as the integration and availability of such data. Such studies will facilitate the transition from theoretical models to practical implementations, advancing RL and DRL-based HEMS towards scalable, resilient, and secure applications in residential energy management.

Author Contributions

Conceptualization, J.G. and A.O.; methodology, D.L.; validation, D.L. and J.G.; formal analysis, D.L. and J.G.; investigation, D.L. and J.G.; resources, D.L.; data curation, J.G.; writing—original draft preparation, J.G.; writing—review and editing, A.O.; supervision, J.G. and A.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering of the AGH University of Krakow as part of a research subsidy for young scientists (Dean’s grants) for 2024. Application number 10.16.120.79990.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

A2C: Advantage Actor–Critic
A3C: Asynchronous Advantage Actor–Critic
AI: Artificial Intelligence
ANN: Artificial Neural Networks
BACS: Building Automation and Control Systems
BMS: Building Management Systems
CDDPG: Charging Control Deep Deterministic Policy Gradient
CNN: Convolutional Neural Networks
DDPG: Deep Deterministic Policy Gradient
DDQN: Double Deep Q-Network
DERs: Distributed Energy Resources
DQN: Deep Q-Network
DRL: Deep Reinforcement Learning
DSM: Demand Side Management
DSO: Distribution System Operator
DSR: Demand Side Response
DT: Digital Twin
DTA: Dual Targeting Algorithm
EPBD: Energy Performance of Buildings Directive
EV: Electric Vehicle
GDPR: General Data Protection Regulation
HEMS: Home Energy Management Systems
HVAC: Heating, Ventilation and Air Conditioning
IFC: Industry Foundation Classes
IoT: Internet of Things
LLM: Large Language Model
LSTM: Long Short-Term Memory
MADDPG: Multi-Agent Deep Deterministic Policy Gradient
MARL: Multi-Agent Reinforcement Learning
MDP: Markov Decision Process
ML: Machine Learning
MORL: Multi-Objective Reinforcement Learning
PPO: Proximal Policy Optimization
PV: Photovoltaic
RES: Renewable Energy Sources
RL: Reinforcement Learning
SAC: Soft Actor–Critic
SRI: Smart Readiness Indicator
TD3: Twin Delayed Deep Deterministic Policy Gradient
TL: Transfer Learning
TRPO: Trust Region Policy Optimization
V2G: Vehicle-to-Grid
V2H: Vehicle-to-Home

References

  1. Filho, G.P.R.; Villas, L.A.; Gonçalves, V.P.; Pessin, G.; Loureiro, A.A.F.; Ueyama, J. Energy-Efficient Smart Home Systems: Infrastructure and Decision-Making Process. Internet Things 2019, 5, 153–167.
  2. Pratt, A.; Krishnamurthy, D.; Ruth, M.; Wu, H.; Lunacek, M.; Vaynshenk, P. Transactive Home Energy Management Systems: The Impact of Their Proliferation on the Electric Grid. IEEE Electrif. Mag. 2016, 4, 8–14.
  3. Diyan, M.; Silva, B.N.; Han, K. A Multi-Objective Approach for Optimal Energy Management in Smart Home Using the Reinforcement Learning. Sensors 2020, 20, 3450.
  4. Pau, G.; Collotta, M.; Ruano, A.; Qin, J. Smart Home Energy Management. Energies 2017, 10, 382.
  5. Umair, M.; Cheema, M.A.; Afzal, B.; Shah, G. Energy Management of Smart Homes over Fog-Based IoT Architecture. Sustain. Comput. Inform. Syst. 2023, 39, 100898.
  6. Deanseekeaw, A.; Khortsriwong, N.; Boonraksa, P.; Boonraksa, T.; Marungsri, B. Optimal Load Scheduling for Smart Home Energy Management Using Deep Reinforcement Learning. In Proceedings of the 2024 12th International Electrical Engineering Congress (iEECON), Pattaya, Thailand, 6–8 March 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–4.
  7. Ożadowicz, A.; Grela, J. An Event-Driven Building Energy Management System Enabling Active Demand Side Management. In Proceedings of the 2016 Second International Conference on Event-Based Control, Communication, and Signal Processing (EBCCSP), Krakow, Poland, 13–15 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–8.
  8. Verschae, R.; Kato, T.; Matsuyama, T. Energy Management in Prosumer Communities: A Coordinated Approach. Energies 2016, 9, 562.
  9. European Parliament Directive (EU) 2024/1275 of the European Parliament and the Council on the Energy Performance of Buildings; The European Parliament and the Council of the European Union: Strasbourg, France, 2024.
  10. European Commission. Energy Roadmap 2050; European Commission: Brussels, Belgium, 2012.
  11. Fokaides, P.A.; Panteli, C.; Panayidou, A. How Are the Smart Readiness Indicators Expected to Affect the Energy Performance of Buildings: First Evidence and Perspectives. Sustainability 2020, 12, 9496.
  12. Märzinger, T.; Österreicher, D. Extending the Application of the Smart Readiness Indicator—A Methodology for the Quantitative Assessment of the Load Shifting Potential of Smart Districts. Energies 2020, 13, 3507.
  13. Ożadowicz, A. A Hybrid Approach in Design of Building Energy Management System with Smart Readiness Indicator and Building as a Service Concept. Energies 2022, 15, 1432.
  14. ISO 52120-1:2021; I. 205 T.C. Energy Performance of Buildings Contribution of Building Automation, Controls and Building Management. International Organization for Standardization: Geneva, Switzerland, 2021.
  15. Favuzza, S.; Ippolito, M.; Massaro, F.; Musca, R.; Riva Sanseverino, E.; Schillaci, G.; Zizzo, G. Building Automation and Control Systems and Electrical Distribution Grids: A Study on the Effects of Loads Control Logics on Power Losses and Peaks. Energies 2018, 11, 667.
  16. Mahmood, A.; Baig, F.; Alrajeh, N.; Qasim, U.; Khan, Z.; Javaid, N. An Enhanced System Architecture for Optimized Demand Side Management in Smart Grid. Appl. Sci. 2016, 6, 122.
  17. Hou, P.; Yang, G.; Hu, J.; Douglass, P.J.; Xue, Y. A Distributed Transactive Energy Mechanism for Integrating PV and Storage Prosumers in Market Operation. Engineering 2022, 12, 171–182.
  18. Kato, T.; Ishikawa, N.; Yoshida, N. Distributed Autonomous Control of Home Appliances Based on Event Driven Architecture. In Proceedings of the 2017 IEEE 6th Global Conference on Consumer Electronics (GCCE), Nagoya, Japan, 24–27 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–2.
  19. Charbonnier, F.; Morstyn, T.; McCulloch, M.D. Scalable Multi-Agent Reinforcement Learning for Distributed Control of Residential Energy Flexibility. Appl. Energy 2022, 314, 118825.
  20. Delsing, J. Local Cloud Internet of Things Automation: Technology and Business Model Features of Distributed Internet of Things Automation Solutions. IEEE Ind. Electron. Mag. 2017, 11, 8–21.
  21. Yassine, A.; Singh, S.; Hossain, M.S.; Muhammad, G. IoT Big Data Analytics for Smart Homes with Fog and Cloud Computing. Future Gener. Comput. Syst. 2019, 91, 563–573.
  22. Machorro-Cano, I.; Alor-Hernández, G.; Paredes-Valverde, M.A.; Rodríguez-Mazahua, L.; Sánchez-Cervantes, J.L.; Olmedo-Aguirre, J.O. HEMS-IoT: A Big Data and Machine Learning-Based Smart Home System for Energy Saving. Energies 2020, 13, 1097.
  23. Bawa, M.; Caganova, D.; Szilva, I.; Spirkova, D. Importance of Internet of Things and Big Data in Building Smart City and What Would Be Its Challenges. In Smart City 360°; Leon-Garcia, A., Lenort, R., Holman, D., Staš, D., Krutilova, V., Wicher, P., Cagáňová, D., Špirková, D., Golej, J., Nguyen, K., Eds.; Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering; Springer International Publishing: Cham, Switzerland, 2016; Volume 166, pp. 605–616. ISBN 978-3-319-33680-0.
  24. Lawal, K.N.; Olaniyi, T.K.; Gibson, R.M. Leveraging Real-World Data from IoT Devices in a Fog–Cloud Architecture for Resource Optimisation within a Smart Building. Appl. Sci. 2023, 14, 316.
  25. Akter, M.N.; Mahmud, M.A.; Oo, A.M.T. A Hierarchical Transactive Energy Management System for Microgrids. In Proceedings of the 2016 IEEE Power and Energy Society General Meeting (PESGM), Boston, MA, USA, 17–21 July 2016; IEEE: Piscataway, NJ, USA, 2016; Volume 2016, pp. 1–5.
  26. Taghizad-Tavana, K.; Ghanbari-Ghalehjoughi, M.; Razzaghi-Asl, N.; Nojavan, S.; Alizadeh, A. An Overview of the Architecture of Home Energy Management System as Microgrids, Automation Systems, Communication Protocols, Security, and Cyber Challenges. Sustainability 2022, 14, 15938.
  27. Kiehbadroudinezhad, M.; Merabet, A.; Abo-Khalil, A.G.; Salameh, T.; Ghenai, C. Intelligent and Optimized Microgrids for Future Supply Power from Renewable Energy Resources: A Review. Energies 2022, 15, 3359.
  28. Chamana, M.; Schmitt, K.E.K.; Bhatta, R.; Liyanage, S.; Osman, I.; Murshed, M.; Bayne, S.; MacFie, J. Buildings Participation in Resilience Enhancement of Community Microgrids: Synergy Between Microgrid and Building Management Systems. IEEE Access 2022, 10, 100922–100938.
  29. Al-Ani, O.; Das, S. Reinforcement Learning: Theory and Applications in HEMS. Energies 2022, 15, 6392.
  30. Wang, Z.; Hong, T. Reinforcement Learning for Building Controls: The Opportunities and Challenges. Appl. Energy 2020, 269, 115036.
  31. Benjamin, A.; Badar, A.Q.H. Reinforcement Learning Based Cost-Effective Smart Home Energy Management. In Proceedings of the 2023 IEEE 3rd International Conference on Sustainable Energy and Future Electric Transportation (SEFET), Bhubaneswar, India, 9–12 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–5.
  32. Yu, L.; Qin, S.; Zhang, M.; Shen, C.; Jiang, T.; Guan, X. A Review of Deep Reinforcement Learning for Smart Building Energy Management. IEEE Internet Things J. 2021, 8, 12046–12063.
  33. Wei, T.; Wang, Y.; Zhu, Q. Deep Reinforcement Learning for Building HVAC Control. In Proceedings of the 54th Annual Design Automation Conference 2017, Austin, TX, USA, 18–22 June 2017; ACM: New York, NY, USA, 2017; pp. 1–6.
  34. Yu, L.; Xie, W.; Xie, D.; Zou, Y.; Zhang, D.; Sun, Z.; Zhang, L.; Zhang, Y.; Jiang, T. Deep Reinforcement Learning for Smart Home Energy Management. IEEE Internet Things J. 2020, 7, 2751–2762.
  35. Kodama, N.; Harada, T.; Miyazaki, K. Home Energy Management Algorithm Based on Deep Reinforcement Learning Using Multistep Prediction. IEEE Access 2021, 9, 153108–153115.
  36. Perez, K.X.; Baldea, M.; Edgar, T.F. Integrated Smart Appliance Scheduling and HVAC Control for Peak Residential Load Management. In Proceedings of the 2016 American Control Conference (ACC), Boston, MA, USA, 6–8 July 2016; IEEE: Piscataway, NJ, USA, 2016; Volume 2016, pp. 1458–1463.
  37. Tekler, Z.D.; Low, R.; Yuen, C.; Blessing, L. Plug-Mate: An IoT-Based Occupancy-Driven Plug Load Management System in Smart Buildings. Build. Environ. 2022, 223, 109472.
  38. Fambri, G.; Badami, M.; Tsagkrasoulis, D.; Katsiki, V.; Giannakis, G.; Papanikolaou, A. Demand Flexibility Enabled by Virtual Energy Storage to Improve Renewable Energy Penetration. Energies 2020, 13, 5128.
  39. Mancini, F.; Lo Basso, G.; de Santoli, L. Energy Use in Residential Buildings: Impact of Building Automation Control Systems on Energy Performance and Flexibility. Energies 2019, 12, 2896.
  40. Liu, Z.; Zhang, X.; Sun, Y.; Zhou, Y. Advanced Controls on Energy Reliability, Flexibility and Occupant-Centric Control for Smart and Energy-Efficient Buildings. Energy Build. 2023, 297, 113436.
  41. Babar, M.; Grela, J.; Ożadowicz, A.; Nguyen, P.; Hanzelka, Z.; Kamphuis, I. Energy Flexometer: Transactive Energy-Based Internet of Things Technology. Energies 2018, 11, 568.
  42. Chen, Y.; Yang, Y.; Xu, X. Towards Transactive Energy: An Analysis of Information-related Practical Issues. Energy Convers. Econ. 2022, 3, 112–121.
  43. Sheshalani, B.; Zapiee, M.K.; Mohana, D. Smart Home Automation System Using IOT. Int. J. Recent Technol. Appl. Sci. 2022, 4, 44–53.
  44. Yar, H.; Imran, A.S.; Khan, Z.A.; Sajjad, M.; Kastrati, Z. Towards Smart Home Automation Using IoT-Enabled Edge-Computing Paradigm. Sensors 2021, 21, 4932.
  45. Almusaylim, Z.A.; Zaman, N. A Review on Smart Home Present State and Challenges: Linked to Context-Awareness Internet of Things (IoT). Wirel. Netw. 2019, 25, 3193–3204.
  46. Sun, H.; Yu, H.; Fan, G.; Chen, L. Energy and Time Efficient Task Offloading and Resource Allocation on the Generic IoT-Fog-Cloud Architecture. Peer Peer Netw. Appl. 2020, 13, 548–563.
  47. García-Monge, M.; Zalba, B.; Casas, R.; Cano, E.; Guillén-Lambea, S.; López-Mesa, B.; Martínez, I. Is IoT Monitoring Key to Improve Building Energy Efficiency? Case Study of a Smart Campus in Spain. Energy Build. 2023, 285, 112882.
  48. Arif, S.; Khan, M.A.; Rehman, S.U.; Kabir, M.A.; Imran, M. Investigating Smart Home Security: Is Blockchain the Answer? IEEE Access 2020, 8, 117802–117816.
  49. Graveto, V.; Cruz, T.; Simões, P. Security of Building Automation and Control Systems: Survey and Future Research Directions. Comput. Secur. 2022, 112, 102527.
  50. Parikh, S.; Dave, D.; Patel, R.; Doshi, N. Security and Privacy Issues in Cloud, Fog and Edge Computing. Procedia Comput. Sci. 2019, 160, 734–739.
  51. Abed, S.; Jaffal, R.; Mohd, B.J. A Review on Blockchain and IoT Integration from Energy, Security and Hardware Perspectives. Wirel. Pers. Commun. 2023, 129, 2079–2122.
  52. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; Adaptive Computation and Machine Learning; The MIT Press: Cambridge, MA, USA, 2018; ISBN 9780262039246.
  53. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep Reinforcement Learning. arXiv 2015, arXiv:1509.02971.
  54. Wang, X.; Wang, S.; Liang, X.; Zhao, D.; Huang, J.; Xu, X.; Dai, B.; Miao, Q. Deep Reinforcement Learning: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 5064–5078.
  55. Liu, X.; Zhang, J.; Hou, Z.; Yang, Y.I.; Gao, Y.Q. From Predicting to Decision Making: Reinforcement Learning in Biomedicine. WIREs Comput. Mol. Sci. 2024, 14, e1723.
  56. Roderick, M.; MacGlashan, J.; Tellex, S. Implementing the Deep Q-Network. arXiv 2017, arXiv:1711.07478.
  57. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347.
  58. Oh, D.-H.; Adams, D.; Vo, N.D.; Gbadago, D.Q.; Lee, C.-H.; Oh, M. Actor-Critic Reinforcement Learning to Estimate the Optimal Operating Conditions of the Hydrocracking Process. Comput. Chem. Eng. 2021, 149, 107280.
  59. Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.P.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous Methods for Deep Reinforcement Learning. arXiv 2016, arXiv:1602.01783.
  60. Hou, Y.; Liu, L.; Wei, Q.; Xu, X.; Chen, C. A Novel DDPG Method with Prioritized Experience Replay. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 316–321.
  61. Kumar Rachakatla, S.; Ravichandran, P.; Reddy Machireddy, J. Scalable Machine Learning Workflows in Data Warehousing: Automating Model Training and Deployment with AI. Aust. J. Mach. Learn. Res. Appl. 2022, 2, 262–286.
  62. Stone, G.B.; Talbert, D.A.; Eberle, W. A Survey of Scalable Reinforcement Learning. Int. J. Intell. Comput. Res. 2022, 13, 1118–1124.
  63. Sanz-Jimeno, R.; Álvarez-Díaz, S. A Tool Based on the Industry Foundation Classes Standard for Dynamic Data Collection and Automatic Generation of Building Automation Control Networks. J. Build. Eng. 2023, 78, 107625.
  64. Ruiz-Zafra, A.; Benghazi, K.; Noguera, M. IFC+: Towards the Integration of IoT into Early Stages of Building Design. Autom. Constr. 2022, 136, 104129.
  65. Tang, S.; Shelden, D.R.; Eastman, C.M.; Pishdad-Bozorgi, P.; Gao, X. BIM Assisted Building Automation System Information Exchange Using BACnet and IFC. Autom. Constr. 2020, 110, 103049.
  66. Sadeghi Eshkevari, S.; Tang, X.; Qin, Z.; Mei, J.; Zhang, C.; Meng, Q.; Xu, J. Reinforcement Learning in the Wild: Scalable RL Dispatching Algorithm Deployed in Ridehailing Marketplace. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; ACM: New York, NY, USA, 2022; pp. 3838–3848.
  67. Mo, K.; Ye, P.; Ren, X.; Wang, S.; Li, W.; Li, J. Security and Privacy Issues in Deep Reinforcement Learning: Threats and Countermeasures. ACM Comput. Surv. 2024, 56, 1–39.
  68. Papernot, N.; McDaniel, P.; Sinha, A.; Wellman, M.P. SoK: Security and Privacy in Machine Learning. In Proceedings of the 2018 IEEE European Symposium on Security and Privacy (EuroS&P), London, UK, 24–26 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 399–414.
  69. Benyahya, M.; Kechagia, S.; Collen, A.; Nijdam, N.A. The Interface of Privacy and Data Security in Automated City Shuttles: The GDPR Analysis. Appl. Sci. 2022, 12, 4413.
  70. Ożadowicz, A. Generic IoT for Smart Buildings and Field-Level Automation—Challenges, Threats, Approaches, and Solutions. Computers 2024, 13, 45.
  71. Yu, J.; Kim, M.; Bang, H.C.; Bae, S.H.; Kim, S.J. IoT as a Applications: Cloud-Based Building Management Systems for the Internet of Things. Multimed. Tools Appl. 2016, 75, 14583–14596.
  72. Kastner, W.; Kofler, M.; Jung, M.; Gridling, G.; Weidinger, J. Building Automation Systems Integration into the Internet of Things. The IoT6 Approach, Its Realization and Validation. In Proceedings of the Emerging Technology and Factory Automation (ETFA), Barcelona, Spain, 16–19 September 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1–9.
  73. Stijin, V.; Dorien, A.; Glenn, R.; Yixiao, M. Waide Paul Final Report on The Technical Support to The Development of a Smart Readiness Indicator for Buildings; European Commission: Brussels, Belgium, 2020.
  74. European Parliament Directive (EU) 2018/844 of the European Parliament and the Council on the Energy Performance of Buildings; The European Parliament and the Council of the European Union: Strasbourg, France, 2018.
  75. Ramezani, B.; da Silva, M.C.G.; Simões, N. Application of Smart Readiness Indicator for Mediterranean Buildings in Retrofitting Actions. Energy Build. 2021, 249, 111173.
  76. Janhunen, E.; Pulkka, L.; Säynäjoki, A.; Junnila, S. Applicability of the Smart Readiness Indicator for Cold Climate Countries. Buildings 2019, 9, 102.
  77. Ożadowicz, A.; Grela, J. Impact of Building Automation Control Systems on Energy Efficiency—University Building Case Study. In Proceedings of the 2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Limassol, Cyprus, 12–15 September 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–8.
  78. Ożadowicz, A.; Grela, J. Energy Saving in the Street Lighting Control System—A New Approach Based on the EN-15232 Standard. Energy Effic. 2017, 10, 563–576.
  79. Laroui, M.; Nour, B.; Moungla, H.; Cherif, M.A.; Afifi, H.; Guizani, M. Edge and Fog Computing for IoT: A Survey on Current Research Activities & Future Directions. Comput. Commun. 2021, 180, 210–231.
  80. Genkin, M.; McArthur, J.J. B-SMART: A Reference Architecture for Artificially Intelligent Autonomic Smart Buildings. Eng. Appl. Artif. Intell. 2023, 121, 106063.
  81. Seitz, A.; Johanssen, J.O.; Bruegge, B.; Loftness, V.; Hartkopf, V.; Sturm, M. A Fog Architecture for Decentralized Decision Making in Smart Buildings. In Proceedings of the 2017 2nd International Workshop on Science of Smart City Operations and Platforms Engineering, in Partnership with Global City Teams Challenge, SCOPE 2017, Pittsburgh, PA, USA, 21 April 2017; Association for Computing Machinery, Inc.: New York, NY, USA, 2017; pp. 34–39.
  82. Mansour, M.; Gamal, A.; Ahmed, A.I.; Said, L.A.; Elbaz, A.; Herencsar, N.; Soltan, A. Internet of Things: A Comprehensive Overview on Protocols, Architectures, Technologies, Simulation Tools, and Future Directions. Energies 2023, 16, 3465.
  83. Yousefpour, A.; Fung, C.; Nguyen, T.; Kadiyala, K.; Jalali, F.; Niakanlahiji, A.; Kong, J.; Jue, J.P. All One Needs to Know about Fog Computing and Related Edge Computing Paradigms: A Complete Survey. J. Syst. Archit. 2019, 98, 289–330.
  84. Kastner, W.; Jung, M.; Krammer, L. Future Trends in Smart Homes and Buildings. In Industrial Communication Technology Handbook, 2nd ed.; Zurawski, R., Ed.; CRC Press Taylor & Francis Group: Boca Raton, FL, USA, 2015; pp. 59-1–59-20. ISBN 978-1-4822-0732-3.
  85. Lobaccaro, G.; Carlucci, S.; Löfström, E. A Review of Systems and Technologies for Smart Homes and Smart Grids. Energies 2016, 9, 348.
  86. Bouchabou, D.; Nguyen, S.M.; Lohr, C.; LeDuc, B.; Kanellos, I. A Survey of Human Activity Recognition in Smart Homes Based on IoT Sensors Algorithms: Taxonomies, Challenges, and Opportunities with Deep Learning. Sensors 2021, 21, 6037.
  87. Grela, J.; Ożadowicz, A. Building Automation Planning and Design Tool Implementing EN 15 232 BACS Efficiency Classes. In Proceedings of the 2016 IEEE 21st International Conference on Emerging Technologies and Factory Automation (ETFA), Berlin, Germany, 6–9 September 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–4.
  88. Sharda, S.; Sharma, K.; Singh, M. A Real-Time Automated Scheduling Algorithm with PV Integration for Smart Home Prosumers. J. Build. Eng. 2021, 44, 102828.
  89. Sangoleye, F.; Jao, J.; Faris, K.; Tsiropoulou, E.E.; Papavassiliou, S. Reinforcement Learning-Based Demand Response Management in Smart Grid Systems With Prosumers. IEEE Syst. J. 2023, 17, 1797–1807.
  90. Ożadowicz, A. A New Concept of Active Demand Side Management for Energy Efficient Prosumer Microgrids with Smart Building Technologies. Energies 2017, 10, 1771.
  91. Sierla, S.; Pourakbari-Kasmaei, M.; Vyatkin, V. A Taxonomy of Machine Learning Applications for Virtual Power Plants and Home/Building Energy Management Systems. Autom. Constr. 2022, 136, 104174.
  92. Razghandi, M.; Zhou, H.; Erol-Kantarci, M.; Turgut, D. Smart Home Energy Management: Sequence-to-Sequence Load Forecasting and Q-Learning. In Proceedings of the 2021 IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 7–11 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6. [Google Scholar]
  93. Zhang, H.; Wu, D.; Boulet, B. A Review of Recent Advances on Reinforcement Learning for Smart Home Energy Management. In Proceedings of the 2020 IEEE Electric Power and Energy Conference (EPEC), Piscataway, NJ, USA, 9–10 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6. [Google Scholar]
  94. Lu, R.; Hong, S.H.; Yu, M. Demand Response for Home Energy Management Using Reinforcement Learning and Artificial Neural Network. IEEE Trans. Smart Grid 2019, 10, 6629–6639. [Google Scholar] [CrossRef]
  95. Radhamani, R.; Karthick, S.; Kishore Kumar, S.; Gokulraj, M. Deployment of an IoT-Integrated Home Energy Management System Employing Deep Reinforcement Learning. In Proceedings of the 2024 2nd International Conference on Artificial Intelligence and Machine Learning Applications Theme: Healthcare and Internet of Things (AIMLA), Namakkal, India, 15–16 March 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–4. [Google Scholar]
  96. Dhayalan, V.; Raman, R.; Kalaivani, N.; Shrirvastava, A.; Reddy, R.S.; Meenakshi, B. Smart Renewable Energy Management Using Internet of Things and Reinforcement Learning. In Proceedings of the 2024 2nd International Conference on Computer, Communication and Control (IC4), Indore, India, 8–10 February 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–5. [Google Scholar]
  97. Wang, Y.; Xiao, R.; Wang, X.; Liu, A. Constructing Autonomous, Personalized, and Private Working Management of Smart Home Products Based on Deep Reinforcement Learning. Procedia CIRP 2023, 119, 72–77. [Google Scholar] [CrossRef]
  98. Chen, S.-J.; Chiu, W.-Y.; Liu, W.-J. User Preference-Based Demand Response for Smart Home Energy Management Using Multiobjective Reinforcement Learning. IEEE Access 2021, 9, 161627–161637. [Google Scholar] [CrossRef]
  99. Angano, W.; Musau, P.; Wekesa, C.W. Design and Testing of a Demand Response Q-Learning Algorithm for a Smart Home Energy Management System. In Proceedings of the 2021 IEEE PES/IAS PowerAfrica, Virtual Conference, 23–27 August 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–5. [Google Scholar]
  100. Amer, A.A.; Shaban, K.; Massoud, A.M. DRL-HEMS: Deep Reinforcement Learning Agent for Demand Response in Home Energy Management Systems Considering Customers and Operators Perspectives. IEEE Trans. Smart Grid 2023, 14, 239–250. [Google Scholar] [CrossRef]
  101. Liu, W.; Wang, Y.; Jiang, F.; Cheng, Y.; Rong, J.; Wang, C.; Peng, J. A Real-Time Demand Response Strategy of Home Energy Management by Using Distributed Deep Reinforcement Learning. In Proceedings of the 2021 IEEE 23rd International Conference on High Performance Computing & Communications; 7th International Conference on Data Science & Systems; 19th International Conference on Smart City; 7th International Conference on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), Haikou, China, 20–22 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 988–995. [Google Scholar]
  102. Alfaverh, F.; Denai, M.; Sun, Y. Demand Response Strategy Based on Reinforcement Learning and Fuzzy Reasoning for Home Energy Management. IEEE Access 2020, 8, 39310–39321. [Google Scholar] [CrossRef]
  103. Li, H.; Wan, Z.; He, H. A Deep Reinforcement Learning Based Approach for Home Energy Management System. In Proceedings of the 2020 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, USA, 17–20 February 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5. [Google Scholar]
  104. Mathew, A.; Roy, A.; Mathew, J. Intelligent Residential Energy Management System Using Deep Reinforcement Learning. IEEE Syst. J. 2020, 14, 5362–5372. [Google Scholar] [CrossRef]
  105. Ding, H.; Xu, Y.; Chew Si Hao, B.; Li, Q.; Lentzakis, A. A Safe Reinforcement Learning Approach for Multi-Energy Management of Smart Home. Electr. Power Syst. Res. 2022, 210, 108120. [Google Scholar] [CrossRef]
  106. Chu, Y.; Wei, Z.; Sun, G.; Zang, H.; Chen, S.; Zhou, Y. Optimal Home Energy Management Strategy: A Reinforcement Learning Method with Actor-Critic Using Kronecker-Factored Trust Region. Electr. Power Syst. Res. 2022, 212, 108617. [Google Scholar] [CrossRef]
  107. Lissa, P.; Deane, C.; Schukat, M.; Seri, F.; Keane, M.; Barrett, E. Deep Reinforcement Learning for Home Energy Management System Control. Energy AI 2021, 3, 100043. [Google Scholar] [CrossRef]
  108. Liu, Y.; Zhang, D.; Gooi, H.B. Optimization Strategy Based on Deep Reinforcement Learning for Home Energy Management. CSEE J. Power Energy Syst. 2020, 6, 572–582. [Google Scholar] [CrossRef]
  109. Kumari, A.; Tanwar, S. Reinforcement Learning for Multiagent-Based Residential Energy Management System. In Proceedings of the 2021 IEEE Globecom Workshops (GC Wkshps), Madrid, Spain, 7–11 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6. [Google Scholar]
  110. Kumari, A.; Kakkar, R.; Tanwar, S.; Garg, D.; Polkowski, Z.; Alqahtani, F.; Tolba, A. Multi-Agent-Based Decentralized Residential Energy Management Using Deep Reinforcement Learning. J. Build. Eng. 2024, 87, 109031. [Google Scholar] [CrossRef]
  111. Amer, A.; Shaban, K.; Massoud, A. Demand Response in HEMSs Using DRL and the Impact of Its Various Configurations and Environmental Changes. Energies 2022, 15, 8235. [Google Scholar] [CrossRef]
  112. Roslann, A.; Asuhaimi, F.A.; Ariffin, K.N.Z. Energy Efficient Scheduling in Smart Home Using Deep Reinforcement Learning. In Proceedings of the 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), Kota Kinabalu, Malaysia, 13–15 September 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6. [Google Scholar]
  113. Xiong, L.; Tang, Y.; Liu, C.; Mao, S.; Meng, K.; Dong, Z.; Qian, F. Meta-Reinforcement Learning-Based Transferable Scheduling Strategy for Energy Management. IEEE Trans. Circuits Syst. I Regul. Pap. 2023, 70, 1685–1695. [Google Scholar] [CrossRef]
  114. Kahraman, A.; Yang, G. Home Energy Management System Based on Deep Reinforcement Learning Algorithms. In Proceedings of the 2022 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), Novi Sad, Serbia, 10–12 October 2022; IEEE: Piscataway, NJ, USA, 2022; Volume 2022, pp. 1–5. [Google Scholar]
  115. Aldahmashi, J.; Ma, X. Real-Time Energy Management in Smart Homes Through Deep Reinforcement Learning. IEEE Access 2024, 12, 43155–43172. [Google Scholar] [CrossRef]
  116. Seveiche-Maury, Z.; Arrubla-Hoyos, W. Proposal of a Decision-Making Model for Home Energy Saving through Artificial Intelligence Applied to a HEMS. In Proceedings of the 2023 IEEE Colombian Caribbean Conference (C3), Barranquilla, Colombia, 22–25 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar]
  117. Wei, G.; Chi, M.; Liu, Z.-W.; Ge, M.; Li, C.; Liu, X. Deep Reinforcement Learning for Real-Time Energy Management in Smart Home. IEEE Syst. J. 2023, 17, 2489–2499. [Google Scholar] [CrossRef]
  118. Jiang, F.; Zheng, C.; Gao, D.; Zhang, X.; Liu, W.; Cheng, Y.; Hu, C.; Peng, J. A Novel Multi-Agent Cooperative Reinforcement Learning Method for Home Energy Management under a Peak Power-Limiting. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 350–355. [Google Scholar]
  119. Diyan, M.; Khan, M.; Zhenbo, C.; Silva, B.N.; Han, J.; Han, K.J. Intelligent Home Energy Management System Based on Bi-Directional Long-Short Term Memory and Reinforcement Learning. In Proceedings of the 2021 International Conference on Information Networking (ICOIN), Jeju Island, Republic of Korea, 13–16 January 2021; IEEE: Piscataway, NJ, USA, 2021; Volume 2021, pp. 782–787. [Google Scholar]
  120. Zenginis, I.; Vardakas, J.; Koltsaklis, N.E.; Verikoukis, C. Smart Home’s Energy Management Through a Clustering-Based Reinforcement Learning Approach. IEEE Internet Things J. 2022, 9, 16363–16371. [Google Scholar] [CrossRef]
  121. Haq, E.U.; Lyu, C.; Xie, P.; Yan, S.; Ahmad, F.; Jia, Y. Implementation of Home Energy Management System Based on Reinforcement Learning. Energy Rep. 2022, 8, 560–566. [Google Scholar] [CrossRef]
  122. Thattai, K.; Ravishankar, J.; Li, C. Consumer-Centric Home Energy Management System Using Trust Region Policy Optimization-Based Multi-Agent Deep Reinforcement Learning. In Proceedings of the 2023 IEEE Belgrade PowerTech, Belgrade, Serbia, 25–29 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar]
  123. Langer, L.; Volling, T. A Reinforcement Learning Approach to Home Energy Management for Modulating Heat Pumps and Photovoltaic Systems. Appl. Energy 2022, 327, 120020. [Google Scholar] [CrossRef]
  124. Xiong, S.; Liu, D.; Chen, Y.; Zhang, Y.; Cai, X. A Deep Reinforcement Learning Approach Based Energy Management Strategy for Home Energy System Considering the Time-of-Use Price and Real-Time Control of Energy Storage System. Energy Rep. 2024, 11, 3501–3508. [Google Scholar] [CrossRef]
  125. Lee, S.; Choi, D.-H. Reinforcement Learning-Based Energy Management of Smart Home with Rooftop Solar Photovoltaic System, Energy Storage System, and Home Appliances. Sensors 2019, 19, 3937. [Google Scholar] [CrossRef] [PubMed]
  126. Abedi, S.; Yoon, S.W.; Kwon, S. Battery Energy Storage Control Using a Reinforcement Learning Approach with Cyclic Time-Dependent Markov Process. Int. J. Electr. Power Energy Syst. 2022, 134, 107368. [Google Scholar] [CrossRef]
  127. Härtel, F.; Bocklisch, T. Minimizing Energy Cost in PV Battery Storage Systems Using Reinforcement Learning. IEEE Access 2023, 11, 39855–39865. [Google Scholar] [CrossRef]
  128. Xu, G.; Shi, J.; Wu, J.; Lu, C.; Wu, C.; Wang, D.; Han, Z. An Optimal Solutions-Guided Deep Reinforcement Learning Approach for Online Energy Storage Control. Appl. Energy 2024, 361, 122915. [Google Scholar] [CrossRef]
  129. Wang, B.; Zha, Z.; Zhang, L.; Liu, L.; Fan, H. Deep Reinforcement Learning-Based Security-Constrained Battery Scheduling in Home Energy System. IEEE Trans. Consum. Electron. 2024, 70, 3548–3561. [Google Scholar] [CrossRef]
  130. Kumar, P.P.; Nuvvula, R.S.S.; Tan, C.C.; Al-Salman, G.A.; Guntreddi, V.; Raj, V.A.; Khan, B. Energy-Aware Vehicle-to-Grid (V2G) Scheduling with Reinforcement Learning for Renewable Energy Integration. In Proceedings of the 2024 12th International Conference on Smart Grid (icSmartGrid), Setubal, Portugal, 27–29 May 2024; pp. 345–349. [Google Scholar]
  131. Almughram, O.; Abdullah ben Slama, S.; Zafar, B.A. A Reinforcement Learning Approach for Integrating an Intelligent Home Energy Management System with a Vehicle-to-Home Unit. Appl. Sci. 2023, 13, 5539. [Google Scholar] [CrossRef]
  132. Zhang, F.; Yang, Q.; An, D. CDDPG: A Deep-Reinforcement-Learning-Based Approach for Electric Vehicle Charging Control. IEEE Internet Things J. 2021, 8, 3075–3087. [Google Scholar] [CrossRef]
  133. Li, S.; Hu, W.; Cao, D.; Dragicevic, T.; Huang, Q.; Chen, Z.; Blaabjerg, F. Electric Vehicle Charging Management Based on Deep Reinforcement Learning. J. Mod. Power Syst. Clean Energy 2022, 10, 719–730. [Google Scholar] [CrossRef]
  134. Alfaverh, F.; Denaï, M.; Sun, Y. Electrical Vehicle Grid Integration for Demand Response in Distribution Networks Using Reinforcement Learning. IET Electr. Syst. Transp. 2021, 11, 348–361. [Google Scholar] [CrossRef]
  135. Maeng, J.; Min, D.; Kang, Y. Intelligent Charging and Discharging of Electric Vehicles in a Vehicle-to-Grid System Using a Reinforcement Learning-Based Approach. Sustain. Energy Grids Netw. 2023, 36, 101224. [Google Scholar] [CrossRef]
  136. Ding, T.; Zeng, Z.; Bai, J.; Qin, B.; Yang, Y.; Shahidehpour, M. Optimal Electric Vehicle Charging Strategy With Markov Decision Process and Reinforcement Learning Technique. IEEE Trans. Ind. Appl. 2020, 56, 5811–5823. [Google Scholar] [CrossRef]
  137. Kaewdornhan, N.; Srithapon, C.; Liemthong, R.; Chatthaworn, R. Real-Time Multi-Home Energy Management with EV Charging Scheduling Using Multi-Agent Deep Reinforcement Learning Optimization. Energies 2023, 16, 2357. [Google Scholar] [CrossRef]
  138. Suleman, A.; Amin, M.A.; Fatima, M.; Asad, B.; Menghwar, M.; Hashmi, M.A. Smart Scheduling of EVs Through Intelligent Home Energy Management Using Deep Reinforcement Learning. In Proceedings of the 2022 17th International Conference on Emerging Technologies (ICET), Swabi, Pakistan, 29–30 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 18–24. [Google Scholar]
  139. Markiewicz, M.; Skała, A.; Grela, J.; Janusz, S.; Stasiak, T.; Latoń, D.; Bielecki, A.; Bańczyk, K. The Architecture for Testing Central Heating Control Algorithms with Feedback from Wireless Temperature Sensors. Energies 2023, 16, 5584. [Google Scholar] [CrossRef]
  140. van Tilburg, J.; Siebert, L.C.; Cremer, J.L. MARL-IDR: Multi-Agent Reinforcement Learning for Incentive-Based Residential Demand Response. In Proceedings of the 2023 IEEE Belgrade PowerTech, Belgrade, Serbia, 25–29 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–8. [Google Scholar]
  141. Sun, Y.; Zhang, S.; Liu, M.; Zheng, R.; Dong, S. Energy Management Based on Safe Multi-Agent Reinforcement Learning for Smart Buildings in Distribution Networks. Energy Build. 2024, 318, 114410. [Google Scholar] [CrossRef]
  142. Liu, J.; Liu, P.; Feng, L.; Wu, W.; Li, D.; Chen, Y.F. Automated Clash Resolution for Reinforcement Steel Design in Concrete Frames via Q-Learning and Building Information Modeling. Autom. Constr. 2020, 112, 103062. [Google Scholar] [CrossRef]
  143. Li, A.; Xiao, F.; Fan, C.; Hu, M. Development of an ANN-Based Building Energy Model for Information-Poor Buildings Using Transfer Learning. Build. Simul. 2021, 14, 89–101. [Google Scholar] [CrossRef]
  144. Pinto, G.; Wang, Z.; Roy, A.; Hong, T.; Capozzoli, A. Transfer Learning for Smart Buildings: A Critical Review of Algorithms, Applications, and Future Perspectives. Adv. Appl. Energy 2022, 5, 100084. [Google Scholar] [CrossRef]
  145. Ali, S.M.M.; Augusto, J.C.; Windridge, D. A Survey of User-Centred Approaches for Smart Home Transfer Learning and New User Home Automation Adaptation. Appl. Artif. Intell. 2019, 33, 747–774. [Google Scholar] [CrossRef]
  146. Arun, S.L.; Selvan, M.P. Intelligent Residential Energy Management System for Dynamic Demand Response in Smart Buildings. IEEE Syst. J. 2017, 12, 1329–1340. [Google Scholar] [CrossRef]
  147. Ghenai, C.; Husein, L.A.; Al Nahlawi, M.; Hamid, A.K.; Bettayeb, M. Recent Trends of Digital Twin Technologies in the Energy Sector: A Comprehensive Review. Sustain. Energy Technol. Assess. 2022, 54, 102837. [Google Scholar] [CrossRef]
  148. Cheng, N.; Wang, X.; Li, Z.; Yin, Z.; Luan, T.; Shen, X.S. Toward Enhanced Reinforcement Learning-Based Resource Management via Digital Twin: Opportunities, Applications, and Challenges; IEEE Network: New York, NY, USA, 2024; p. 1. [Google Scholar] [CrossRef]
  149. Henzel, J.; Wróbel, Ł.; Fice, M.; Sikora, M. Energy Consumption Forecasting for the Digital-Twin Model of the Building. Energies 2022, 15, 4318. [Google Scholar] [CrossRef]
  150. Ceccolini, C.; Sangi, R. Benchmarking Approaches for Assessing the Performance of Building Control Strategies: A Review. Energies 2022, 15, 1270. [Google Scholar] [CrossRef]
  151. Pan, Y.; Shen, Y.; Qin, J.; Zhang, L. Deep Reinforcement Learning for Multi-Objective Optimization in BIM-Based Green Building Design. Autom. Constr. 2024, 166, 105598. [Google Scholar] [CrossRef]
  152. Shaqour, A.; Hagishima, A. Systematic Review on Deep Reinforcement Learning-Based Energy Management for Different Building Types. Energies 2022, 15, 8663. [Google Scholar] [CrossRef]
  153. Qi, T.; Ye, C.; Zhao, Y.; Li, L.; Ding, Y. Deep Reinforcement Learning Based Charging Scheduling for Household Electric Vehicles in Active Distribution Network. J. Mod. Power Syst. Clean Energy 2023, 11, 1890–1901. [Google Scholar] [CrossRef]
  154. Jendoubi, I.; Bouffard, F. Multi-Agent Hierarchical Reinforcement Learning for Energy Management. Appl. Energy 2023, 332, 120500. [Google Scholar] [CrossRef]
  155. Qin, Y.; Ke, J.; Wang, B.; Filaretov, G.F. Energy Optimization for Regional Buildings Based on Distributed Reinforcement Learning. Sustain. Cities Soc. 2022, 78, 103625. [Google Scholar] [CrossRef]
  156. Anvari-Moghaddam, A.; Rahimi-Kian, A.; Mirian, M.S.; Guerrero, J.M. A Multi-Agent Based Energy Management Solution for Integrated Buildings and Microgrid System. Appl. Energy 2017, 203, 41–56. [Google Scholar] [CrossRef]
  157. Kumar Nunna, H.S.V.S.; Srinivasan, D. Multi-Agent Based Transactive Energy Framework for Distribution Systems with Smart Microgrids. IEEE Trans. Ind. Inform. 2017, 13, 2241–2250. [Google Scholar] [CrossRef]
  158. Vamvakas, D.; Michailidis, P.; Korkas, C.; Kosmatopoulos, E. Review and Evaluation of Reinforcement Learning Frameworks on Smart Grid Applications. Energies 2023, 16, 5326. [Google Scholar] [CrossRef]
  159. Zhang, L.; Gao, Y.; Zhu, H.; Tao, L. A Distributed Real-Time Pricing Strategy Based on Reinforcement Learning Approach for Smart Grid. Expert Syst. Appl. 2022, 191, 116285. [Google Scholar] [CrossRef]
  160. Huang, X.; Zhang, D.; Zhang, X. Energy Management of Intelligent Building Based on Deep Reinforced Learning. Alex. Eng. J. 2021, 60, 1509–1517. [Google Scholar] [CrossRef]
  161. Wang, Z.; Xiao, F.; Ran, Y.; Li, Y.; Xu, Y. Scalable Energy Management Approach of Residential Hybrid Energy System Using Multi-Agent Deep Reinforcement Learning. Appl. Energy 2024, 367, 123414. [Google Scholar] [CrossRef]
  162. Knap, P.; Gerding, E. Energy Storage in the Smart Grid: A Multi-Agent Deep Reinforcement Learning Approach. In Trends in Clean Energy Research: Proceedings of the 9th International Conference on Advances on Clean Energy Research (ICACER 2024), Lille, France, 27–29 April 2024; Chen, L., Ed.; Springer Nature: Cham, Switzerland, 2024; pp. 221–235. [Google Scholar]
  163. Sobhani, A.; Khorshidi, F.; Fakhredanesh, M. DeePLS: Personalize Lighting in Smart Home by Human Detection, Recognition, and Tracking. SN Comput. Sci. 2023, 4, 773. [Google Scholar] [CrossRef]
  164. Safaei, D.; Sobhani, A.; Kiaei, A.A. DeePLT: Personalized Lighting Facilitates by Trajectory Prediction of Recognized Residents in the Smart Home. Int. J. Inf. Technol. 2024, 16, 2987–2999. [Google Scholar] [CrossRef]
  165. Manganelli, M.; Consalvi, R. Design and Energy Performance Assessment of High-Efficiency Lighting Systems. In Proceedings of the 2015 IEEE 15th International Conference on Environment and Electrical Engineering (EEEIC), Rome, Italy, 10–13 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1035–1040. [Google Scholar]
  166. Liu, J.; Chen, H.-M.; Li, S.; Lin, S. Adaptive and Energy-Saving Smart Lighting Control Based on Deep Q-Network Algorithm. In Proceedings of the 2021 6th International Conference on Control, Robotics and Cybernetics (CRC), Shanghai, China, 9–11 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 207–211. [Google Scholar]
  167. Suman, S.; Rivest, F.; Etemad, A. Toward Personalization of User Preferences in Partially Observable Smart Home Environments. IEEE Trans. Artif. Intell. 2023, 4, 549–561. [Google Scholar] [CrossRef]
  168. Almilaify, Y.; Nweye, K.; Nagy, Z. SCALEX: Scalability Exploration of Multi-Agent Reinforcement Learning Agents in Grid-Interactive Efficient Buildings. In Proceedings of the 10th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Istanbul, Turkey, 15–16 November 2023; ACM: New York, NY, USA, 2023; pp. 261–264. [Google Scholar]
  169. Khan, M.A.; Saleh, A.M.; Waseem, M.; Sajjad, I.A. Artificial Intelligence Enabled Demand Response: Prospects and Challenges in Smart Grid Environment. IEEE Access 2023, 11, 1477–1505. [Google Scholar] [CrossRef]
  170. Gao, Y.; Li, S.; Xiao, Y.; Dong, W.; Fairbank, M.; Lu, B. An Iterative Optimization and Learning-Based IoT System for Energy Management of Connected Buildings. IEEE Internet Things J. 2022, 9, 21246–21259. [Google Scholar] [CrossRef]
  171. Malagnino, A.; Montanaro, T.; Lazoi, M.; Sergi, I.; Corallo, A.; Patrono, L. Building Information Modeling and Internet of Things Integration for Smart and Sustainable Environments: A Review. J. Clean. Prod. 2021, 312, 127716. [Google Scholar] [CrossRef]
  172. Pinthurat, W.; Surinkaew, T.; Hredzak, B. An Overview of Reinforcement Learning-Based Approaches for Smart Home Energy Management Systems with Energy Storages. Renew. Sustain. Energy Rev. 2024, 202, 114648. [Google Scholar] [CrossRef]
  173. Sheng, R.; Mu, C.; Zhang, X.; Ding, Z.; Sun, C. Review of Home Energy Management Systems Based on Deep Reinforcement Learning. In Proceedings of the 2023 38th Youth Academic Annual Conference of Chinese Association of Automation, YAC 2023, Hefei, China, 27–29 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1239–1244. [Google Scholar]
  174. Daneshvar, M.; Pesaran, M.; Mohammadi-ivatloo, B. Transactive Energy in Future Smart Homes. In The Energy Internet; Su, W., Huang, A.Q., Eds.; Elsevier: Amsterdam, The Netherlands, 2019; pp. 153–179. [Google Scholar]
  175. Rodrigues, S.D.; Garcia, V.J. Transactive Energy in Microgrid Communities: A Systematic Review. Renew. Sustain. Energy Rev. 2023, 171, 112999. [Google Scholar] [CrossRef]
  176. Nizami, S.; Tushar, W.; Hossain, M.J.; Yuen, C.; Saha, T.; Poor, H.V. Transactive Energy for Low Voltage Residential Networks: A Review. Appl. Energy 2022, 323, 119556. [Google Scholar] [CrossRef]
Figure 1. The number of selected publications in four key areas of RL and DRL applications for four major publishers (2019–2024 period).
Table 1. The literature review results from bibliometric databases.
| Database | Publication Type | Building Automation | Home Automation | Reinforcement Learning | Building Automation + Reinforcement Learning | Home Automation + Reinforcement Learning |
|---|---|---|---|---|---|---|
| Web of Science | Articles | 13,770 | 2368 | 46,764 | 164 | 20 |
| | Reviews | 888 | 179 | 2007 | 13 | 3 |
| Scopus | Articles | 11,628 | 8481 | 51,883 | 103 | 101 |
| | Reviews | 967 | 622 | 3206 | 9 | 3 |
| Google Scholar | Any type | 3,150,000 | 3,170,000 | 4,680,000 | 250,000 | 204,000 |
| | Reviews | 172,000 | 191,000 | 63,200 | 24,600 | 21,500 |
Table 2. The literature review results from publisher databases.
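| Database | Publication Type | Building Automation | Home Automation | Reinforcement Learning | Building Automation + Reinforcement Learning | Home Automation + Reinforcement Learning |
|---|---|---|---|---|---|---|
| Springer | Articles | 36,848 | 15,339 | 64,648 | 2434 | 1073 |
| | Reviews | 2898 | 1297 | 5232 | 462 | 173 |
| Science Direct | Articles | 70,247 | 25,541 | 83,149 | 3760 | 1312 |
| | Reviews | 8619 | 4046 | 12,964 | 6132 | 3579 |
| MDPI | Articles | 1346 | 379 | 3797 | 17 | 2 |
| | Reviews | 133 | 47 | 261 | 7 | 1 |
| IEEE Xplore | Conferences | 28,815 | 8194 | 31,831 | 430 | 65 |
| | Journals | 5861 | 1032 | 876 | 202 | 24 |
| Taylor and Francis | Articles | 154,033 | 54,445 | 310,108 | 16,794 | 9081 |
| | Reviews | 4498 | 1737 | 5397 | 512 | 228 |
| ACM Digital Library | All types | 149,403 | 28,663 | 47,165 | 13,237 | 4325 |
| | Reviews | 201 | 43 | 62 | 18 | 4 |
| Wiley Online Library | Journals | 236,565 | 71,939 | 223,645 | 18,307 | 10,196 |
| | Books | 45,956 | 18,855 | 36,651 | 5294 | 3001 |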
Table 3. Recent application of RL and DRL algorithms in HEMS.
| Reference/Year | Application | Algorithm/Method | Objectives | Verification |
|---|---|---|---|---|
| [95] 2024 | IoT | Deep Reinforcement Learning (DRL) | Cost and Comfort | Simulation |
| [96] 2024 | IoT | Deep Q-learning | Cost and Comfort | Simulation |
| [97] 2023 | IoT | Q-learning | Other (Autonomy, Personalization, and Privacy) | Simulation |
| [34] 2020 | IoT | Deep Deterministic Policy Gradient (DDPG) | Cost and Comfort | Simulation |
| [98] 2021 | Demand Response | Multi-Objective Reinforcement Learning (MORL) | Cost and Comfort | Simulation |
| [99] 2021 | Demand Response | Q-learning | Cost and Comfort | Real (physical system testing using MATLAB and Arduino Uno) |
| [100] 2023 | Demand Response | Deep Q-network (DQN) | Cost and Comfort | Simulation (evaluated using real-world data) |
| [101] 2021 | Demand Response | MATD3 (Multi-Agent Twin Delayed Deep Deterministic Policy Gradient) | Cost and Comfort | Simulation (evaluated using real-world data) |
| [102] 2020 | Demand Response | Q-learning combined with Fuzzy Reasoning | Cost | Simulation |
| [94] 2019 | Demand Response | Multi-Agent Reinforcement Learning (MARL) combined with Artificial Neural Networks (ANN) | Cost and Comfort | Simulation |
| [103] 2020 | Demand Response | Proximal Policy Optimization (PPO) | Cost | Simulation |
| [31] 2023 | Demand Response | Q-learning combined with Fuzzy Reasoning | Cost and Comfort | Simulation |
| [35] 2021 | Demand Response | DDPG with Dual Targeting Algorithm (DTA) | Cost and Comfort | Simulation |
| [104] 2020 | Demand Response | DQN | Cost | Simulation |
| [105] 2022 | Demand Response | Primal-Dual Deep Deterministic Policy Gradient (PD-DDPG) | Cost | Simulation |
| [106] 2022 | Demand Response | Actor–Critic using Kronecker-Factored Trust Region (ACKTR) | Cost and Comfort | Simulation (evaluated using real-world data) |
| [107] 2021 | Demand Response | DRL | Cost and Comfort | Simulation |
| [108] 2020 | Demand Response | DQN and Double Deep Q-learning (DDQN) | Cost and Comfort | Simulation (validated using a real-world database combined with the household energy storage model) |
| [109] 2021 | Demand Response | Q-learning | Cost | Simulation |
| [110] 2024 | Demand Response | DQN | Cost and Comfort | Simulation |
| [92] 2021 | Demand Response | Q-learning | Cost | Simulation |
| [111] 2022 | Demand Response | DQN | Cost and Comfort | Simulation |
| [6] 2024 | Scheduling | DQN, Advantage Actor–Critic (A2C), and Proximal Policy Optimization (PPO) | Cost | Simulation |
| [112] 2022 | Scheduling | Q-learning | Cost and Comfort | Simulation |
| [113] 2023 | Scheduling | Meta-Reinforcement Learning (Meta-RL) with Long Short-Term Memory (LSTM) | Cost | Simulation (using practical data from Australia’s electricity network) |
| [114] 2022 | Scheduling | DQN, DDPG, and Twin Delayed Deep Deterministic Policy Gradient (TD3) | Cost | Simulation |
| [115] 2024 | Scheduling | PPO | Cost | Simulation (using real-world datasets) |
| [116] 2023 | Scheduling | DQN | Cost and Comfort | Real (using real-time data from a test bench with household devices) |
| [117] 2023 | Scheduling | PPO | Cost and Comfort | Simulation (based on real-world data) |
| [118] 2020 | Scheduling | Multi-Agent Deep Deterministic Policy Gradient (MADDPG) | Cost | Simulation |
| [119] 2021 | Scheduling | Q-learning | Cost and Comfort | Simulation |
| [120] 2022 | Scheduling | DDPG | Cost and Comfort | Simulation |
| [121] 2022 | Scheduling | Q-learning | Cost and Comfort | Simulation |
| [3] 2020 | Scheduling | Q-learning | Cost and Comfort | Simulation |
| [122] 2023 | RES + Storage | Trust Region Policy Optimization (TRPO)-based Multi-Agent Deep Reinforcement Learning (DRL) | Cost and Comfort | Simulation (using real-world data from the Australian National Electricity Market and PV profiles) |
| [123] 2022 | RES + Storage | DDPG | Cost and Comfort | Simulation |
| [124] 2024 | RES + Storage | Soft Actor–Critic (SAC) | Cost | Simulation |
| [125] 2019 | RES + Storage | Q-learning | Cost and Comfort | Simulation |
| [126] 2022 | RES + Storage | Q-learning | Cost | Simulation |
| [127] 2023 | RES + Storage | PPO with LSTM networks | Cost | Simulation |
| [128] 2024 | RES + Storage | DRL, specifically DDPG and PPO | Cost | Simulation |
| [129] 2024 | RES + Storage | Actor–Critic-based RL with Distributional Critic Net | Cost | Simulation |
| [130] 2024 | EV (V2G) | Deep Q-Learning, MDP | Cost and grid stability | Simulation (based on real-world data) |
| [131] 2023 | EV (V2G) | Q-Learning, RL-HCPV | Cost and Comfort | Simulation (based on real-world data) |
| [132] 2021 | EV | Charging Control Deep Deterministic Policy Gradient (CDDPG) | Cost | Simulation (based on real-world data) |
| [133] 2022 | EV | Deep Reinforcement Learning (DRL), LSTM | Cost | Simulation (based on real-world data) |
| [134] 2021 | EV (V2G) | Q-Learning | Cost and Comfort | Simulation (based on real-world data) |
| [135] 2023 | EV (V2G) | Model-Free RL | Cost | Simulation (based on real-world data) |
| [136] 2020 | EV | DDPG, MDP | Cost and grid stability | Simulation (based on real-world data) |
| [137] 2023 | EV + Scheduling | Multi-Agent Deep Reinforcement Learning (MADRL) | Cost and Comfort | Simulation (based on real-world data) |
| [138] 2022 | EV + Scheduling | Deep Q-Network (DQN), Double DQN, Dueling DQN | Cost and Comfort | Simulation |
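Table 3 shows that tabular Q-learning and DQN dominate the demand-response and scheduling entries, and that most entries use a cost or cost-and-comfort objective. The following is a minimal, hypothetical illustration of that pattern, not the method of any cited study. A tabular Q-learning agent decides, hour by hour, whether to start a deferrable appliance now or to wait, under an assumed time-of-use tariff. The reward is the negative sum of energy cost and a discomfort penalty for delay, which mirrors the "Cost and Comfort" objective. The tariff, cycle energy, request hour, deadline, and delay weight (`PRICE`, `ENERGY_KWH`, `REQUEST_HOUR`, `DEADLINE_HOUR`, `DELAY_WEIGHT`) are all illustrative assumptions.

```python
import random

# Hypothetical time-of-use tariff (currency units per kWh) for hours 0-23.
# Illustrative assumption only, not data from any reviewed study.
PRICE = [0.10] * 7 + [0.25] * 10 + [0.40] * 4 + [0.10] * 3
ENERGY_KWH = 2.0      # assumed energy of one appliance cycle lasting one hour
REQUEST_HOUR = 17     # assumed hour at which the user requests the cycle
DEADLINE_HOUR = 23    # assumed latest permitted start hour
DELAY_WEIGHT = 0.05   # assumed discomfort penalty per hour of delay
WAIT, START = 0, 1


def valid_actions(hour):
    """The appliance may wait until the deadline hour, where it must start."""
    return [START] if hour == DEADLINE_HOUR else [WAIT, START]


def step(hour, action):
    """Return (reward, next_hour, done) with reward = -(energy cost + discomfort)."""
    if action == START:
        cost = PRICE[hour] * ENERGY_KWH
        discomfort = DELAY_WEIGHT * (hour - REQUEST_HOUR)
        return -(cost + discomfort), None, True
    return 0.0, hour + 1, False


def train(episodes=5000, alpha=0.1, gamma=1.0, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over hours REQUEST..DEADLINE."""
    rng = random.Random(seed)
    q = {h: {a: 0.0 for a in valid_actions(h)}
         for h in range(REQUEST_HOUR, DEADLINE_HOUR + 1)}
    for _ in range(episodes):
        hour, done = REQUEST_HOUR, False
        while not done:
            actions = valid_actions(hour)
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[hour][a])
            reward, next_hour, done = step(hour, action)
            # Bootstrapped target; there is no future value once the appliance has started.
            target = reward if done else reward + gamma * max(q[next_hour].values())
            q[hour][action] += alpha * (target - q[hour][action])
            hour = next_hour
    return q


def greedy_start_hour(q):
    """Follow the learned greedy policy and return the hour at which it starts the cycle."""
    hour = REQUEST_HOUR
    while hour < DEADLINE_HOUR and q[hour][WAIT] > q[hour][START]:
        hour += 1
    return hour


q_table = train()
print("learned start hour:", greedy_start_hour(q_table))
```

With these assumed numbers, the optimal policy waits for the 21:00 off-peak hour. Starting at 21:00 costs 0.2 for energy plus a 0.2 delay penalty. Starting immediately at 17:00 costs 0.8, and waiting until 22:00 costs 0.45. The deep variants in Table 3 (DQN, DDQN) replace the lookup table `q` with a neural network once the state also carries prices, forecasts, or device states.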
Table 4. Comparison of opportunities: home vs. building applications of RL and DRL.
| Opportunity | Home Automation | Building Automation |
| --- | --- | --- |
| Demand Response and Load Shifting | RL is used to shift energy-intensive activities to off-peak hours based on dynamic pricing or renewable energy availability [32]. Methods such as PPO and A2C are used to optimize the timing of energy use in home devices [6,153]. | RL enables buildings to participate in demand response programs by shifting large loads (e.g., elevators, HVAC) to off-peak periods or times of high renewable generation [154]. The larger scale calls for more complex energy-balancing strategies [30,155]. |
| Integration with Renewable Energy | RL can optimize the use of rooftop solar panels and home batteries by learning when to store energy and when to sell it back to the grid. A key opportunity lies in coordinating solar generation with storage to maximize efficiency [125]. | RL manages larger-scale renewable energy systems (e.g., building-integrated PV, wind turbines), optimizing when to use, store, or sell energy to the grid [156,157]. RL models also handle interactions with smart grids and microgrids [158]. |
| Energy Storage Management | RL optimizes home battery usage by learning when to store solar energy and when to discharge it during peak demand periods [34,107,124]. Future opportunities include real-time adaptation to energy pricing and household consumption patterns [159,160]. | Large buildings with energy storage systems require RL to balance stored energy against grid demand, renewable generation, and internal consumption [152,161]. RL agents coordinate across multiple storage units and energy systems [154,157,162]. |
| Smart Lighting and Occupancy-based Control | RL-based lighting systems learn from occupancy sensors and adjust lighting schedules to save energy while maintaining comfort. Personalized lighting control based on user habits is a key development area [163,164]. | Adaptive RL-based lighting in large buildings reduces energy waste by adjusting lighting across zones according to occupancy [165]. Deep Q-learning has been applied to energy-efficient lighting control in commercial spaces [166]. |
| Scalability and Complexity | Home automation systems involve fewer devices and simpler control systems, which makes it easier to deploy RL models and obtain optimization results quickly. Future work is expected to focus on personalization and on adapting RL to individual preferences [3,167]. | Building automation systems are more complex and require multi-agent RL systems to handle diverse, multi-zone environments [168]. Scaling RL models to multi-objective optimization across large buildings remains an open research challenge [40]. |
| Integration with Smart Grids and IoT | IoT devices in smart homes provide real-time data to RL systems for better energy optimization and appliance control [169]. RL agents can integrate with home microgrids, managing energy flows between renewable sources, storage, and consumption [19,125]. | In large buildings, RL facilitates participation in smart grids by managing energy exchange, load balancing, and interactions with external energy markets [158,159]. Enhanced IoT connectivity improves RL performance in coordinating various building subsystems [170,171]. |
| Renewable Energy Prosumers | Homes with solar panels and energy storage can act as “prosumers”, with RL optimizing energy generation, consumption, and the sale of surplus energy back to the grid [88,89,90]. | Buildings with integrated renewable systems participate as prosumers in energy markets, and RL manages each building’s contribution to local energy grids and microgrids [28,156]. |
| Integration of EVs | RL enables intelligent scheduling of EV charging and discharging based on renewable energy availability, grid demand, and user preferences. EVs also act as energy storage units, supporting V2H and V2G operations to optimize home energy management [131,132,135]. | In building automation, RL manages fleets of EVs, optimizing charging schedules for energy cost reduction, grid stability, and peak load shaving. EV integration supports building-wide V2G operations, enabling energy trading and balancing [130,133,137]. |
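The storage-management opportunity in Table 4 describes an agent that learns when to charge a battery and when to discharge it. The following is a minimal tabular Q-learning sketch of that idea under deliberately toy assumptions: a hypothetical four-step time-of-use price profile (`PRICES`), a battery with three state-of-charge levels, and no efficiency losses, PV generation, or household demand. The names `PRICES`, `step`, `train`, and `greedy_schedule` are illustrative only. The reviewed works mostly use deep RL on far larger state spaces.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy setting: 4 time steps, cheap early and expensive later.
PRICES = [1.0, 1.0, 5.0, 5.0]
MAX_SOC = 2               # battery holds 0, 1, or 2 energy units
ACTIONS = (-1, 0, 1)      # -1 = discharge one unit, 0 = idle, +1 = charge one unit

QTable = Dict[Tuple[int, int, int], float]  # key: (time step, state of charge, action)


def step(t: int, soc: int, action: int) -> Tuple[int, float]:
    """Apply an action; infeasible moves (charging when full, discharging when empty) act as idle."""
    new_soc = min(MAX_SOC, max(0, soc + action))
    delta = new_soc - soc            # energy actually bought (+) or sold (-)
    reward = -PRICES[t] * delta      # pay to charge, earn when discharging
    return new_soc, reward


def train(episodes: int = 20000, alpha: float = 0.1, gamma: float = 1.0,
          epsilon: float = 0.3, seed: int = 0) -> QTable:
    """Epsilon-greedy tabular Q-learning over a finite horizon, starting from an empty battery."""
    rng = random.Random(seed)
    q: QTable = {}
    horizon = len(PRICES)
    for _ in range(episodes):
        soc = 0
        for t in range(horizon):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q.get((t, soc, a), 0.0))  # exploit
            new_soc, reward = step(t, soc, action)
            if t + 1 < horizon:
                future = max(q.get((t + 1, new_soc, a), 0.0) for a in ACTIONS)
            else:
                future = 0.0  # episode ends after the last time step
            old = q.get((t, soc, action), 0.0)
            q[(t, soc, action)] = old + alpha * (reward + gamma * future - old)
            soc = new_soc
    return q


def greedy_schedule(q: QTable) -> List[int]:
    """Roll out the greedy policy from an empty battery and return the chosen actions."""
    soc, plan = 0, []
    for t in range(len(PRICES)):
        action = max(ACTIONS, key=lambda a: q.get((t, soc, a), 0.0))
        soc, _ = step(t, soc, action)
        plan.append(action)
    return plan


if __name__ == "__main__":
    print(greedy_schedule(train()))
```

With these settings, the learned greedy policy should charge during the two cheap steps and discharge during the two expensive ones, i.e., `[1, 1, -1, -1]`. This is the store-then-discharge behavior described in the table, reduced to a problem small enough to inspect by hand.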
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Latoń, D.; Grela, J.; Ożadowicz, A. Applications of Deep Reinforcement Learning for Home Energy Management Systems: A Review. Energies 2024, 17, 6420. https://doi.org/10.3390/en17246420

AMA Style

Latoń D, Grela J, Ożadowicz A. Applications of Deep Reinforcement Learning for Home Energy Management Systems: A Review. Energies. 2024; 17(24):6420. https://doi.org/10.3390/en17246420

Chicago/Turabian Style

Latoń, Dominik, Jakub Grela, and Andrzej Ożadowicz. 2024. "Applications of Deep Reinforcement Learning for Home Energy Management Systems: A Review" Energies 17, no. 24: 6420. https://doi.org/10.3390/en17246420

APA Style

Latoń, D., Grela, J., & Ożadowicz, A. (2024). Applications of Deep Reinforcement Learning for Home Energy Management Systems: A Review. Energies, 17(24), 6420. https://doi.org/10.3390/en17246420
