#### **1. Introduction**

Buildings account for 40% of the primary energy consumption (EC) in the European Union [1], so reducing their EC has become a necessity. In response to increasing urbanization and climate change, the European Union set the objective of reducing EC by 32.5% by 2030, relative to the 2007 baseline, as a key priority of its strategy and the Green Deal [2], with the aim of increasing energy efficiency (EE) and improving the energy performance (EP) of existing buildings [2–4]. This goal is aligned with the United Nations' seventh Sustainable Development Goal (SDG): "Ensure access to affordable, reliable, sustainable and modern energy for all" [5].

Buildings are responsible for the second largest portion of the final EC in the European Union [1,6,7], with households at 26.3% and public buildings at 28.8%, just behind the transport sector (30.9%). Their refurbishment and energy-efficient retrofitting are a priority for many countries seeking to reduce EC and improve the EP of existing buildings as part of the EU Green Deal [2,8]. Data science and machine learning can now be used to analyze, predict, and improve energy efficiency (EE) in buildings. Such approaches can forecast and minimize energy consumption, support the design of energy-efficient buildings, define strategies for mitigating impacts on the environment and climate, and predict and propose useful, cost-effective retrofit measures that increase the EE of buildings while providing a comfortable indoor living environment [9,10]. By measuring, monitoring, and improving the EE of buildings, we can reduce the amount of energy consumed while maintaining or even enhancing the quality of the services those buildings provide, in line with SDG target 7.3, "double the global rate of improvement in energy efficiency" [5,11].

**Citation:** Anastasiadou, M.; Santos, V.; Dias, M.S. Machine Learning Techniques Focusing on the Energy Performance of Buildings: A Dimensions and Methods Analysis. *Buildings* **2022**, *12*, 28. https://doi.org/10.3390/buildings12010028

Academic Editors: Roberto Alonso González-Lezcano, Francesco Nocera and Rosa Giuseppina Caponetto

Received: 3 December 2021 Accepted: 23 December 2021 Published: 31 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

This paper proposes a conceptual and theoretical framework for analyzing published studies that address the energy performance of buildings (EPB) with machine learning or statistical methods. More specifically, this work aims to contribute to improving the EP of existing buildings, one of the core goals of the EU Green Deal [2], by identifying and analyzing the latest and most appropriate machine learning and statistical techniques. The framework, built on a systematic literature review that follows the PRISMA guidelines, is intended as a baseline for future research. Our approach helps researchers find which methods are most used and most appropriate for analyzing the EP of different types of buildings.

Moreover, our framework addresses the dimensions and factors extracted from available data sources such as building energy certification data, EC data, weather and climate data, and others. Our proposal will help the community foster innovation in enhancing the energy performance (EP) of buildings and in predicting energy-efficient retrofit measures (EERM).

In this context, our study adopts a well-established systematic literature review (SLR) method, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA [12]), to identify the most relevant literature contributions on the energy performance of buildings (EPB) and the prediction of EERM using machine learning (ML) or statistical methods. Furthermore, we used a bibliometric visualization tool, VOSviewer [13], to find the terms most used in the literature on the EPB with machine learning or statistical methods.

Some literature review papers tackle similar problems, mostly related to EC [14–18]. The main novelty of our study lies in how we present and group the data: we focus on building types and address the dimensions and methods used for each type. We believe that our study will help the community foster innovation in enhancing the EPB and in predicting energy-efficient retrofit measures. We present and visualize our results using the bibliometric network software tool VOSviewer, which creates and visualizes bibliometric networks based on text data, such as keyword co-occurrence and co-authorship networks. This allows us to identify the most important terms and co-authorship relations for quantitative analysis.

Considering the stated intentions of this paper, we raised the following research questions:


The research questions focus on two objectives: (1) automatic clustering and classification of the energy performance certificate (EPC) of a building, and (2) prediction of energy-efficient retrofit measures using ML and EPC data. Additionally, as mentioned, our approach contributes directly to the EU Green Deal and to SDG 7 of the United Nations [5].

Our paper is organized as follows. Section 2 presents the adopted systematic literature review technique (PRISMA) and our overall methodology. Section 3 describes the application of PRISMA and details the collected data from the survey, whereas in Section 4, we present and analyze such results using the visualization and bibliometric tool. Section 5 discusses our findings, aligned with our research questions, while in Section 6, we present our conclusions.

#### **2. Methodology**

The SLR was performed by adopting a well-established systematic literature review and meta-analysis method (PRISMA). We combined this method with data visualization techniques, resulting in four main phases: (1) data selection; (2) results and analysis, comprising survey results, categorization and dimensions analysis, and visualization and bibliometric analysis; (3) discussion; and (4) conclusions [19], as depicted in Figure 1.

**Figure 1.** Methodology.

Phase 1—Data Collection: We conducted an evidence-based systematic review following the PRISMA guidelines [12], in line with the literature trend of using this method as a basis for reporting systematic reviews, especially evaluations of interventions [12]. The PRISMA guidelines consist of a flow diagram and a checklist. The flow diagram has four phases: identification, screening, eligibility, and inclusion, as depicted in Figure 2. The checklist proposes a pre-defined structure for a survey, organized into sections. The guidelines we followed are described in more detail in Section 3 [12]. As mentioned, we focused our analysis on ML or statistical approaches applied to public, residential, and office buildings.
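To make the four-phase flow concrete, the following is a minimal Python sketch of how records could be counted as they pass through identification, screening, eligibility, and inclusion. It is an illustration only: the `Record` fields, the three filtering criteria, and the example DOIs are hypothetical and are not the criteria applied in this review, which are described in Section 3.

```python
from dataclasses import dataclass

# Hypothetical focus set, taken from the building types named in the text.
FOCUS_BUILDING_TYPES = {"public", "residential", "office"}


@dataclass(frozen=True)
class Record:
    title: str
    doi: str
    uses_ml_or_stats: bool      # hypothetical screening criterion
    full_text_available: bool   # hypothetical eligibility criterion
    building_type: str          # hypothetical inclusion criterion


def prisma_flow(records):
    """Apply the four PRISMA phases in order and count the surviving records."""
    counts = {"identification": len(records)}

    # Screening: drop duplicate DOIs, then records that use neither ML nor statistics.
    seen, screened = set(), []
    for record in records:
        key = record.doi.strip().lower()
        if key in seen:
            continue
        seen.add(key)
        if record.uses_ml_or_stats:
            screened.append(record)
    counts["screening"] = len(screened)

    # Eligibility: keep only records whose full text can be assessed.
    eligible = [r for r in screened if r.full_text_available]
    counts["eligibility"] = len(eligible)

    # Inclusion: keep only records about the building types in focus.
    included = [r for r in eligible if r.building_type in FOCUS_BUILDING_TYPES]
    counts["inclusion"] = len(included)
    return included, counts


# Made-up records, for illustration only.
records = [
    Record("Study A", "10.0000/a", True, True, "residential"),
    Record("Study A (duplicate)", "10.0000/A", True, True, "residential"),
    Record("Study B", "10.0000/b", False, True, "office"),
    Record("Study C", "10.0000/c", True, False, "public"),
    Record("Study D", "10.0000/d", True, True, "industrial"),
    Record("Study E", "10.0000/e", True, True, "office"),
]
included, counts = prisma_flow(records)
print(counts)
```

Recording the count after each phase mirrors the boxes of a PRISMA flow diagram, so the numbers reported in such a diagram can be traced back to explicit filtering steps.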

Phase 2—Results and Analysis: In this phase, we present the analysis of our PRISMA results. We analyze the main journals and conferences, the keyword co-occurrence, and the authors' co-authorship. We present and visualize our results using the bibliometric network software tool VOSviewer, which creates and visualizes bibliometric networks based on text data, particularly keyword co-occurrence and co-authorship networks. This analysis illustrates the relationships between the network's elements (nodes), which correspond to the most used terms, and allows the identification of network characteristics such as node and cluster centrality. VOSviewer calculates the links and weight of each node, indicating its importance in the network. This allows us to visualize and detect the most important terms and co-authorship relations for quantitative analysis. The size of a node represents its degree of centrality: the larger the node, the more times it is reported in the text data. The thickness of an edge represents the number of times the two linked nodes are reported together, showing the significance of their relation; by default, the networks are allocated from the largest to the smallest [13]. With this approach, we could summarize and critically analyze the most used dimensions, clustering and classification techniques, EP retrofitting prediction techniques, and the most used building types in each study. This method allowed us to identify, accurately and efficiently, the best modeling practices and techniques in the literature for achieving enhanced EP.
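The node and edge quantities described above can be illustrated with a short Python sketch that counts how often each keyword occurs (node size) and how often two keywords appear in the same paper (edge thickness). This is a simplified illustration under stated assumptions, not VOSviewer's implementation: the function name `cooccurrence` and the keyword lists are hypothetical, and VOSviewer additionally applies its own normalization, clustering, and layout, which this sketch does not attempt to reproduce.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists):
    """Return (node_weight, edge_weight) counters for a keyword co-occurrence network.

    node_weight[k]      = number of papers that list keyword k
    edge_weight[(a, b)] = number of papers that list both a and b (with a < b)
    """
    node_weight = Counter()
    edge_weight = Counter()
    for keywords in keyword_lists:
        # Normalize case and whitespace, and count each keyword once per paper.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        node_weight.update(unique)
        edge_weight.update(combinations(unique, 2))
    return node_weight, edge_weight


# Made-up keyword lists, for illustration only.
papers = [
    ["Machine Learning", "energy performance", "retrofit"],
    ["machine learning", "energy consumption"],
    ["energy performance", "machine learning"],
]
nodes, edges = cooccurrence(papers)

for term, weight in nodes.most_common():
    print(f"{term}: appears in {weight} paper(s)")
for (a, b), weight in edges.most_common():
    print(f"{a} <-> {b}: co-occur in {weight} paper(s)")
```

Sorting each paper's keywords before pairing them ensures that every pair is stored under a single key regardless of the order in which the keywords were listed.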

Phase 3—Discussion: In this phase, we discuss the previous phases' findings by following the research questions. We specifically address the identified knowledge gaps and our study limitations.

Phase 4—Conclusion: We sum up and present the conclusion of our study.

**Figure 2.** PRISMA Flowchart.
