1. Introduction
In line with the EU commitment to global climate action under the Paris Agreement, the strategic long-term vision for a prosperous and climate-neutral European economy determined that GHG emissions must be drastically reduced by 2050 [1]. Accordingly, the European Green Deal (EGD) set a reduction target of 50–55% by 2030 [2]. EU-wide, buildings account for 40% of energy consumption and 36% of GHG emissions, so a highly significant share of the potential actions for eliminating GHG emissions lies in the building sector [3]. Currently, primary energy consumption in the EU building stock is decreasing at a rate of about 1% per year [4], meaning that meeting the 2030 targets will require a significant effort to manage building energy demand. Energy scenarios currently indicate that the share of renewable electricity in the European countries will range from 48% to 70% by 2050, compared to 31% today [5].
Furthermore, the political and economic situation, shaped by the war in Ukraine and several years of the COVID-19 pandemic, has created additional pressure and major energy security and energy poverty risks worldwide. Many European countries are facing a deepening energy crisis as they prepare for a cold winter. Addressing the climate neutrality needs while securing affordable energy for all calls for more radical and dynamic approaches to optimizing energy usage: minimizing the overall energy consumption of building systems, optimizing the hourly usage of energy based on energy prices and the availability of clean energy sources, and directing energy usage away from peak consumption hours.
For example, in Finland, a majority of energy operators offer contracts in which the price follows the hourly changes in the Nord Pool [6] spot prices. Factors that affect the prices include available production capacity, fuel prices, emission rights, and electricity consumption [7]. The most common reason for price fluctuations is the prevailing weather in Finland, as well as in the countries from which Finland buys electricity. For example, abundant rains, especially in Norway, increase the hydropower reservoir levels and thus lower the price. Similarly, strong winds increase the production of wind turbines. The weather also has an impact on demand: in cold winters, when there is a greater need for heating, the price of electricity remains clearly higher than usual. In the summer, on the other hand, the price of electricity is typically lower, although power plant maintenance is often carried out during the summer. Therefore, consideration of electricity spot prices, combined with weather forecasts, has the potential to optimize the energy consumption of building systems, lower electricity bills, and at the same time reduce the CO2 emissions caused by energy production.
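The potential of such price-based shifting can be illustrated with simple arithmetic: delivering the same daily heating energy, but concentrated in the cheapest hours, lowers the bill. The hourly prices, the daily heating demand, and the heater power limit below are invented values for this sketch, not Nord Pool data.

```python
# Illustrative only: prices and demand are invented, not Nord Pool data.
prices = [4.2, 3.8, 3.5, 3.1, 3.0, 3.4, 5.9, 8.7,   # c/kWh, hours 00-07
          9.5, 8.1, 7.2, 6.8, 6.5, 6.3, 6.6, 7.4,   # hours 08-15
          9.9, 12.0, 11.4, 9.0, 7.5, 6.1, 5.2, 4.6] # hours 16-23

daily_heating_kwh = 48.0   # total heating energy to deliver over the day
max_kw = 4.0               # heater power limit per hour

# Baseline: heat evenly across all 24 hours.
flat_cost = sum(p * daily_heating_kwh / 24 for p in prices)

# Shifted: deliver the same energy during the cheapest hours, at full power.
cheap_hours = sorted(range(24), key=lambda h: prices[h])
remaining, shifted_cost = daily_heating_kwh, 0.0
for h in cheap_hours:
    kwh = min(max_kw, remaining)
    shifted_cost += prices[h] * kwh
    remaining -= kwh
    if remaining <= 0:
        break

print(f"flat: {flat_cost:.1f} c, shifted: {shifted_cost:.1f} c")
```

In practice, comfort constraints and the thermal dynamics of the building limit how far heating can be shifted, which is exactly where learning-based control becomes useful.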
As modern buildings become increasingly smart, integrated with sensors, smart control systems, networking capabilities, and data analytics platforms, the data collected from sensors, combined with artificial intelligence (AI) and machine learning (ML) algorithms, can support achieving this goal. The electricity cost-based optimization of building energy consumption, while ensuring building occupants’ comfort, is the main motivation behind this research.
Heating, ventilation, and air conditioning (HVAC) systems are among the most extensively used and most energy-consuming systems in buildings. Accordingly, the optimal control of HVAC systems can improve electricity usage, lower electricity bills, and at the same time reduce greenhouse gas emissions. The optimization of HVAC functions is not a new area of research. It has been extensively studied as a part of demand response (DR) management, which also includes approaches to shifting electricity usage and dynamic pricing control. Existing methods for improving building HVAC energy efficiency can be broadly categorized as traditional rule-based, model-based, and data-driven (AI) approaches. Rule-based controls are simple heuristic methods. They are usually based on known data and rely on monitoring a specific “trigger” parameter (e.g., room temperature) for which a threshold value is fixed to control the system according to a predefined strategy. For example, studies by Alimohammadisagvand et al. [8] investigated rule-based DR control algorithms in several types of buildings in Finland that use electricity prices (the real-time hourly electricity price and the previous-/next-hour forecast electricity price) to control the temperature set point of space heating. It was reported that the control algorithm based on the previous hourly electricity prices was the most effective in most of the studied cases. Compared with the reference case (a constant indoor heating temperature set point of 21.0 °C), the maximum total delivered energy and cost saved using the control algorithms were around 3% and 6–14%, respectively, depending on the house type, heat distribution system, and parameters used by the algorithms. Although rule-based DR strategies have the advantage of being simple, they have several shortcomings, usually concerning their poor dynamics. Rule-based models can also be hard to maintain due to changes during the building’s life. Despite this lack of adaptation, dynamicity, and predictability, rule-based DR strategies account for the majority of commercial DR implementations [9,10].
In model-based control algorithms, some of the parameters are predicted, which results in a more reliable but more complex control strategy. For example, model-based HVAC control algorithms minimizing total energy costs for end-users were studied by Avci et al. [11]. However, model-based approaches have limited practical adoption due to the complexity of their predictive models and the memory footprint required for online optimization. The computational complexity increases exponentially with the complexity of the building and the structure of the energy network [12,13]. Several studies pointed out that model-based approaches overcome the limitations encountered by simpler rule-based controls and outperform them [14,15].
In contrast, AI-based data-driven methods have been demonstrated to be more flexible [16] and able to influence HVAC system operations by adjusting the control parameters (e.g., temperature), leveraging the historical operational and occupancy data of the building as well as environmental data (e.g., weather). The flexibility comes from the ability of machine learning algorithms to learn from the historical operational data of the building and adjust the functions of HVAC systems accordingly. Additionally, compared to traditional rule-based models, data-driven approaches require less domain expert knowledge and no description of the building’s physical dynamics.
Many data-driven studies utilize supervised machine learning methods. For example, Liu et al. applied the deep deterministic policy gradient (DDPG) to the short-term energy consumption of HVAC systems for heating and cooling in small office environments [17]. It was reported that the proposed model produced more accurate results than common supervised learning models, such as the support vector machine (SVM) and neural network (NN). Large commercial buildings were studied by Reena [6], where structural equation modelling (SEM) was proposed to improve the prediction of temperature within a zone in order to build energy-efficient HVAC systems.
Analyzing occupant behavior and occupants’ interaction with HVAC systems can also help in better meeting the thermal comfort of occupants while saving energy at the same time. Raza et al. developed a machine learning model for space heating that can determine the occupants’ behavior, which generally results in wasted energy in the operation of HVAC systems [18].
The impact of different occupancy prediction models using ML techniques was analyzed by Esrafilian-Najafabadi [19]. Several ML techniques (decision trees, k-nearest neighbor, multilayer perceptron, and gated recurrent units) were deployed to predict occupancy types and patterns and to provide an accurate and reliable evaluation of the performance of the occupancy model when coupled with HVAC control systems. Several supervised machine learning models, namely support vector machines (SVM), artificial neural networks (ANN), logistic regression (LR), linear discriminant analysis (LDA), k-nearest neighbour (KNN), and classification trees (CT), were proposed by Chaudhury to predict the comfort levels of occupants [20].
Evolutionary algorithms have also been used to learn the optimal control parameters from historical data. For example, Kusiak [21] used an evolutionary algorithm to find the optimal control settings (i.e., supply air temperature and supply air static pressure) of an HVAC system based on a data-driven model built for system performance.
Nassif [22] proposed the cooling optimization of HVAC systems based on genetic algorithms for controller optimization and supervised machine learning methods for HVAC modelling. Optimal price-based control of HVAC systems in multizone office buildings for demand response is reported in [23]. Occupants’ varying thermal preferences, represented as a coefficient of a bidding price (chosen by the occupants) in response to price signals, are modeled using an ANN and integrated into the optimal HVAC scheduling. Furthermore, a control mechanism is developed to determine the varying HVAC thermostat settings in different zones based on the results of the ANN prediction model.
Optimizations based on supervised machine learning algorithms may require a vast amount of labeled data. Accordingly, the performance of supervised ML approaches depends on the quality of the building’s historical data, which might not be available. In addition, if the equipment or users change, this data becomes obsolete and the performance of the trained machine learning algorithms can decrease.
To address these challenges, a data-driven approach is needed that can learn the optimal control parameters online and optimize HVAC operations. Reinforcement learning (RL) seems promising for this type of problem, where a software agent needs to learn an optimal or near-optimal policy that maximizes a user-defined reinforcement signal (i.e., reward). Furthermore, RL-based approaches for heating and cooling control optimize decision-making in real time with minimal dependency on historical data.
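The core RL loop, in which an agent picks heating actions and learns from a reward combining electricity cost and comfort, can be sketched in tabular form. This is not the deep RL model used in this research; the toy thermal model, price pattern, and parameters below are invented purely to illustrate the Q-learning mechanism.

```python
import random

# Minimal tabular Q-learning sketch of price-aware heating control.
# NOT the paper's deep RL agent: the thermal model, prices, and
# parameters are invented for illustration only.

ACTIONS = [0.0, 1.0, 2.0]              # heating power levels, kW
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1     # learning rate, discount, exploration

Q = {}                                 # (state, action index) -> value

def choose(state):
    """Epsilon-greedy action selection."""
    if random.random() < EPS:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q.get((state, a), 0.0))

def update(state, a, reward, next_state):
    """Standard Q-learning update rule."""
    best_next = max(Q.get((next_state, b), 0.0) for b in range(len(ACTIONS)))
    old = Q.get((state, a), 0.0)
    Q[(state, a)] = old + ALPHA * (reward + GAMMA * best_next - old)

def step(temp, a, price):
    """Toy thermal model: heating warms the room, the building slowly cools.
    Reward penalizes electricity cost and deviation from 21 degC."""
    new_temp = temp + 0.5 * ACTIONS[a] - 0.4
    reward = -(price * ACTIONS[a] + abs(new_temp - 21.0))
    return new_temp, reward

random.seed(0)
prices = [3.0, 3.0, 10.0, 10.0] * 6    # alternating cheap/expensive hours
for episode in range(300):
    temp = 21.0
    for hour, price in enumerate(prices):
        state = (price > 5.0, round(temp))   # cheap/expensive flag, temp bucket
        a = choose(state)
        temp, reward = step(temp, a, price)
        next_price = prices[(hour + 1) % len(prices)]
        update(state, a, reward, (next_price > 5.0, round(temp)))
```

Deep RL replaces the table `Q` with a neural network so that continuous temperature, price, and weather inputs can be handled without manual discretization.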
Several studies have applied RL control strategies to the operational optimization of building HVAC systems [24]. The application of discrete and continuous reinforcement learning-based supervisory control approaches, which actively learn how to appropriately schedule thermostat temperature setpoints based on the occupants’ comfort profiles, was studied by Fazenda et al. [25]. Liu and Henze [26] used RL, specifically Q-learning, to optimize the operation of active and passive building thermal storage inventory. Intelligent temperature control in the controlled areas of a building, achieved by learning the characteristics of the HVAC equipment and occupant habits, was studied by Barrett and Linder [27]. Costanzo et al. [28] applied RL control strategies to building demand response, achieving 90% of the mathematical optimum solution. Ruelens et al. [29] applied RL algorithms to an HVAC system with a heat pump, achieving significant energy savings. Li and Xia [30] proposed multi-scale RL to accelerate the process of solving optimal control strategies. Wei et al. [31] proposed a deep RL-based control method for an HVAC system; the researchers pointed out that the long learning time of the deep RL controller requires improvement. An RL architecture for the efficient scheduling and control of an HVAC system in a commercial building, while harnessing its demand response (DR) potential, was proposed in [32]. Simulations demonstrated a weekly energy reduction of up to 22% compared to a baseline controller.
An RL-based energy optimization model applied in a real-time factory environment (with a reported learning time of several weeks) and able to provide around 25% energy savings on top of a baseline controller was proposed by Biswas [33]. The HVAC optimization goal was to keep the temperature and (relative) humidity within the prescribed manufacturing tolerance ranges while balancing this against energy savings and CO2 emission reductions.
A deep reinforcement learning (DRL) approach for building heating control, automating decision-making in real time with minimal dependency on historical data, was proposed by Gupta et al. [34]. As input, the simulation experiments used real-world outside temperature data but a constant electricity price. It was reported that the DRL-based smart controller outperforms a traditional thermostat controller, improving thermal comfort by 15–30% and reducing energy costs by 5–12% in the simulated environment.
In contrast, this research presents a deep reinforcement learning-based model for HVAC control and optimization that considers dynamic electricity costs and weather information to minimize the occupant’s energy bill while securing thermal comfort. The results indicate that in situations with highly fluctuating electricity prices, it is possible to reach significant cost savings, whereas the savings in energy usage remain marginal. The method is tested in simulations with typical buildings of different ages to evaluate the adaptability and scalability of the proposed approach.
The remainder of this paper is organized as follows. Section 2 presents the methods used to design and develop the cost optimization support; more specifically, the architecture, data analytics, and algorithms that enable the optimization and control features are discussed there. Section 3 focuses on the obtained results. The strengths of the developed solution and aspects of future work are concluded in Section 4.
4. Conclusions
This paper proposes a reinforcement learning-based electricity cost saving method that increases the heating of an electrically heated building when electricity is cheap and reduces electricity use when it is expensive, in such a way that the resident does not perceive the result as thermally uncomfortable.
The results indicate that it is possible to lower heating costs significantly with RL. Depending on the fluctuations of the electricity price, the savings can reach the same level as reducing the stable indoor temperature setpoint by two degrees, or can be even higher. In this study, the algorithm was less successful during the first two years and performed considerably better during 2021 and 2022. There are two reasons for this. First, at the beginning of 2019, the agent started training its deep neural networks from scratch, and as experience was gained, its operation began to improve. Second, the electricity price level and variability from the end of 2021 onwards are radically different from those of the first simulation years, which results in higher savings from optimizing the heating times.
Here, it is assumed that the occupants prefer indoor temperatures close to 21 °C, but in real cases, the end users might also suffer from temperature changes. Transitions in the temperature should be rather small and slow to keep the user experience positive. Presumably, the lags in the heating system and the heat capacity of the building and furnishings help here, but this would require further analysis, e.g., by integrating thermal sensation calculations with the human thermal model [45] into the simulations.
The agent does not aim to minimize the total delivered electricity; consequently, the consumed electricity is only slightly less than in the 21 °C reference case, and higher energy savings could be reached with a stable 19 °C indoor temperature. However, the electricity price also depends on the production type. Usually, the price is lower when the share of renewable energy, e.g., from wind power, is high. Thus, it would be interesting to include an estimate of the emissions based on the production types.
Selecting the right reward function has a high impact on the results. By changing it, the algorithm can focus on different targets, e.g., energy savings or minimizing emissions. However, it must balance the savings against indoor thermal comfort, not only to keep residents contented, but also to be able to utilize the heat capacity of the building and retain the controllability of the indoor temperature.
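This balancing can be expressed as a weighted reward in which the comfort penalty activates only outside a tolerance band around the target temperature. The weights, the 21 °C target, and the band width below are illustrative assumptions for this sketch, not the values used in this study.

```python
def reward(energy_cost_eur, indoor_temp_c,
           target_c=21.0, comfort_band_c=1.0,
           w_cost=1.0, w_comfort=2.0):
    """Illustrative reward: penalize electricity cost, and penalize the
    indoor temperature only once it leaves the comfort band.
    Weights, target, and band width are invented for this sketch."""
    deviation = max(0.0, abs(indoor_temp_c - target_c) - comfort_band_c)
    return -(w_cost * energy_cost_eur + w_comfort * deviation)
```

Raising `w_comfort` relative to `w_cost` makes the agent hold the temperature tighter at the expense of savings; swapping the cost term for an emission estimate would redirect the same agent toward emission minimization.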
For future work, approaches for fine-tuning the energy cost saving agent for more complex building energy systems should be investigated, e.g., by also taking hot water boilers, heat pumps, local energy production such as PV panels, and energy storage into account.
In addition, the presented method is tested with building simulation models that represent typical Finnish one-family houses constructed in 1961–1970, 1971–1980, 1981–1990, 1991–2000, 2001–2010, and 2011–2017. Based on these tests, the energy cost saving agent can be scaled to different Finnish one-family houses. However, the tested method performs better with newer buildings. The oldest buildings typically have less insulation and lack heat recovery systems. This means that they react faster to heating power reductions, which results in a smaller dynamic margin for the indoor temperature control. It is also important to note that the RL parameters were calibrated with the 2001–2010 building, and it has not been tested how the much older buildings would behave with a different configuration. Furthermore, as future work, testing with different types of buildings in various climatic conditions should be performed.
From a practical deployment point of view, the system faces several challenges. The first challenge is related to the initialization of the agent. More specifically, in real cases, the controller cannot behave randomly for a long time, so it should be studied whether the algorithm can adapt to a new building fast enough, or whether the agent should be pretrained with a simulator beforehand. Furthermore, after major renovations, the system should be able to readjust to the new consumption and to forget the old behavior within a reasonable time.
The second challenge is related to the fact that many buildings do not have an existing building automation and control system (BACS), IoT-connected room temperature controllers or smart thermostats, or a related secure REST API for daily communication with a cloud-based electricity cost saving agent. This means that integrating the presented approach into real use would require some physical installations.
Overall, scaling up this kind of solution could increase flexibility in the electricity market, which is also important from the point of view of the electricity network balance and the related electricity prices.