#### **3. Data Collection**

#### *3.1. PRISMA Method*

Following the PRISMA guidelines, the SLR was performed as follows. First, a search was conducted to identify publications whose titles, abstracts, or keywords match the following Boolean expression:

*("energy retrofit\*" OR "energy performance" OR "energy analysis" AND ("artificial intelligence" OR "artificial neural networks" OR "machine learning" OR "genetic algorithms" OR "classification" OR "clustering analysis") OR "certificat\*" OR "hypercube" OR "k-means")*

The literature search was performed in April 2021 using the following data repositories: Science Direct, Web of Science, and Scopus. Combining the search terms with 'OR' and 'AND' operators, we included all papers published between 1 January 2016 and 27 April 2021. The topics covered were interdisciplinary, including computer science, mathematics, engineering, environmental science, and data science. While all three sources were used, the analysis indicated that most of the publications from Science Direct were also indexed in Web of Science and Scopus.

The final set of SLR papers for qualitative and quantitative analysis was organized using the Mendeley reference manager [20]. This step permitted us to extract metadata, remove duplicates, and obtain precise figures on the relative importance of a particular author or keyword. The extracted metadata were: authors, publication metadata, references, and citations.

#### *3.2. PRISMA Results*

The PRISMA flow diagram (Figure 2) presents the SLR data collection process for our quantitative and qualitative analyses. The initial step identified published papers through a database search, resulting in 1292 publications (Web of Science 374; Science Direct 338; Scopus 580). The inclusion criteria were original research papers written in English and published in Q1–Q2 peer-reviewed journals (based on the SCImago rank) and related conferences in the said period. We focused only on studies within the EU, given the applicability of EU directives and regulations and of building energy certification, which differs for countries outside the EU. Moreover, even within the EU, there is variation in the methods used to identify and assess energy consumption (EC) and building energy certification [21]. Additionally, review papers, position papers, and reports were excluded.

Subsequently, we removed duplicates (e = 415). Then, we performed title and abstract screening. Step 1 excluded all the papers whose title was not relevant to the scope and objectives of this study (e = 503). Step 2 excluded all the papers without an abstract or whose abstract was not relevant to the scope and objectives of this study (e = 162). Finally, step 3 excluded all the papers that did not meet the inclusion and exclusion criteria outlined in the previous paragraph (e = 146). Next, the full texts of the remaining 66 papers were read and assessed for their fit with the scope of the research. Thirty-one papers were excluded because they did not use ML or statistical techniques. The remaining 35 papers were considered eligible for further analysis. Thirty-three were published in scientific journals, whereas two were published in conference proceedings.
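The screening arithmetic above can be checked with a short script. The following is a minimal sketch, not part of the original study: the counts are those reported in this section, while the stage labels and the `prisma_flow` helper are illustrative names introduced here.

```python
# Counts reported in Section 3.2; stage labels are paraphrased for illustration.
identified = {"Web of Science": 374, "Science Direct": 338, "Scopus": 580}

exclusions = [
    ("duplicates removed", 415),
    ("title not relevant", 503),
    ("abstract missing or not relevant", 162),
    ("inclusion/exclusion criteria", 146),
    ("full text: no ML or statistical technique", 31),
]

def prisma_flow(identified, exclusions):
    """Return (stage, excluded, remaining) rows for each screening stage."""
    remaining = sum(identified.values())
    rows = [("records identified", 0, remaining)]
    for stage, excluded in exclusions:
        remaining -= excluded
        if remaining < 0:
            raise ValueError(f"more records excluded than available at: {stage}")
        rows.append((stage, excluded, remaining))
    return rows

flow = prisma_flow(identified, exclusions)
for stage, excluded, remaining in flow:
    print(f"{stage:<45} e = {excluded:>4}  remaining = {remaining:>4}")
```

Running the sketch reproduces the figures in the text: 1292 records identified, 66 papers retained for full-text reading, and 35 papers eligible for analysis.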

#### **4. Results and Analysis**

#### *4.1. Journals and Conferences Analysis*

The 33 journal papers were published in 13 journals: Applied Energy (9), Energy & Buildings (7), Sustainable Cities & Society (4), Energies (3), Energy (2), Sustainability (1), IEEE Transactions on Automation Science & Engineering (1), Renewable & Sustainable Energy Reviews (1), Measurement (1), Croatian Review of Economic, Business & Social Statistics (1), Journal Electronics (1), Energy Policy (1) and Neural Computing & Applications (1). Table 1 summarizes the journals and their details: most journals are Q1-ranked (9), representing 90%; two are Q2-ranked; and the remaining one is not yet classified by quartile [22]. Note that the quartile rank can change over time.

The five major research areas found in the analysis were energy, engineering, environmental science, mathematics, and social sciences. The publishers of the 33 selected papers originate from six countries, most of them from the United Kingdom (6), followed by The Netherlands (2), Switzerland (2), the United States of America (1), Croatia (1), and China (1). The top publishers are Elsevier BV (5), Elsevier Ltd. (4), Taylor and Francis Ltd. (2), MDPI Multidisciplinary Digital Publishing Institute (1), MDPI AG (1), Institute of Electrical and Electronics Engineers Inc (1), Croatian Statistical Association (1), Science Press (1) and Springer London (1).


**Table 1.** Journals details.

The conferences found in this study were the IEEE International Conference on Internet of Things and Green Computing & Communications and Cyber, Physical & Social Computing and Smart Data (2017), and the IOP Conference Series: Earth & Environmental Science (2019). Table 2 shows that the major research areas of the conferences are computer science and environmental science, in the United Kingdom and Indonesia.

**Table 2.** Conferences details.


#### *4.2. Keyword Co-Occurrence Analysis*

Term co-occurrence analysis was conducted with VOSviewer, the text mining tool for network analysis mentioned earlier. The analysis used the full counting method and encompassed 143 screened terms, with a minimum threshold of two co-occurrences. Of these 143 terms, only 21 were chosen for the analysis (Table 3).
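To make the counting step concrete, the following is a minimal sketch of full counting with a minimum threshold. It is not VOSviewer's implementation: the keyword lists are hypothetical, the function name `cooccurrence` is introduced here, and the threshold is assumed to apply to the number of papers in which a keyword occurs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword lists, one per paper (not the SLR data).
papers = [
    ["energy efficiency", "machine learning", "office buildings"],
    ["energy efficiency", "building energy retrofit", "office buildings"],
    ["energy efficiency", "machine learning"],
    ["energy simulation", "k-means"],
]

def cooccurrence(papers, min_occurrences=2):
    """Full counting: each paper in which two keywords co-occur adds 1 to their link."""
    # Number of papers in which each keyword appears.
    occurrences = Counter(kw for kws in papers for kw in set(kws))
    # Assumption: keywords below the threshold are dropped before links are counted.
    kept = {kw for kw, n in occurrences.items() if n >= min_occurrences}
    links = Counter()
    for kws in papers:
        for a, b in combinations(sorted(set(kws) & kept), 2):
            links[(a, b)] += 1
    # Total link strength of a keyword: sum of the weights of all its links.
    strength = Counter()
    for (a, b), weight in links.items():
        strength[a] += weight
        strength[b] += weight
    return occurrences, links, strength

occurrences, links, strength = cooccurrence(papers)
for keyword, total in strength.most_common():
    print(keyword, occurrences[keyword], total)
```

Each printed row gives a keyword with its number of occurrences and its total link strength, the two quantities used to rank keywords in this section.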


**Table 3.** Keywords co-occurrence ranked by the link strength.

Most of the analyzed keywords were related to energy efficiency (EE), building energy retrofit, ML, and building energy performance (EP). The top five keywords were EE (7 occurrences, 10 total link strength), building energy retrofit (4 occurrences, 8 total link strength), ML (4 occurrences, 8 total link strength), office buildings (3 occurrences, 8 total link strength) and building EP (5 occurrences, 7 total link strength).

In the keyword co-occurrence analysis, four clusters (Figure 3) were found, with 21 keywords and 50 links. The biggest nodes of each cluster were EE (blue), building EP and EPC (red), office buildings (green), and energy simulation (yellow).

Focusing on the interrelated network of Figure 3 (21 items, 4 clusters, and 50 links), the energy simulation term (yellow cluster) is connected only with the term energy efficiency (EE) (blue cluster). The building energy performance term (red cluster) is connected only with the term EE (blue cluster), and the energy performance certificate (EPC) term (red cluster) is connected with the terms building retrofit (blue cluster) and energy retrofitting (green cluster). The office buildings term (green cluster) relates to all the clusters, namely with the terms sensitivity analysis (yellow cluster), building energy retrofit and artificial neural networks (ANN) (red cluster), and EE and building retrofit (blue cluster). Finally, the EE term (blue cluster) also relates to all the clusters, namely with the terms energy simulation (yellow cluster), ANN, building energy performance, and ML (red cluster), and office buildings and clustering analysis (green cluster).

The keywords and groups of keywords form an extensive, connected network and occur in individual articles mostly between 2018 and 2020 (Figure 4). The keyword analysis indicated research fields emphasizing ML and EE and identified techniques such as clustering analysis, energy simulation, and ANN.

**Figure 4.** Keyword co-occurrence by year overlay visualization.

#### *4.3. Authors' Co-Authorship Analysis*

An authors' co-authorship analysis was conducted using VOSviewer, the text mining tool for network and bibliometric analysis reported above. The analysis applied the full counting method, with a maximum of 15 authors per document and a minimum threshold of 2, resulting in a total of 154 authors meeting this threshold, of which 28 authors were analyzed (Figure 5).

**Figure 5.** Authors' co-authorship network visualization analysis.

The top 10 authors were Ascione Fabrizio, Bianco Nicola, Mauro Gerardo Maria, and Vanoli Giuseppe Peter, each with a link strength of 21 [23–27]; Ali Usman, Hoare Cathal, Mangina Eleni, O'Donnell James, and Shamsi Mohammad Haris, each with a link strength of 16 [28–30]; and Bohacek Mark with a link strength of 12 [28,29].

In the authors' co-authorship analysis, seven clusters were found with 28 items and 54 links. Cluster 1 (green) contains the top four authors ranked by link strength, together with De Stasio Claudio with a link strength of 9 [23–25] and De Masi Rosa Francesca with a link strength of 8 [26,27] (Table 4). Cluster 2 (red) has seven items: Ali Usman, Bohacek Mark, Hoare Cathal, Mangina Eleni, O'Donnell James, Purcell Karl, and Shamsi Mohammad Haris [28–30]. Cluster 3 (yellow) has four items: Casals Miquel, Ferré-Bigorra Jaume, Gangolells Marta, and Macarulla Marcel [31,32]. Cluster 4 (blue) has three items: Capozzoli Alfonso, Cerquitelli Tania, and Piscitelli Marco Savino [33–35]. Cluster 5 (cyan) has three items: Ciancio Virgilio, Dell'olmo Jacopo, and Salata Ferdinando [36,37]. Cluster 6 (purple) has two items: Fernández Bandera C and Ramos Ruiz G [38].
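The grouping of authors into co-authorship clusters can be illustrated with a deliberately simplified sketch. It groups authors by connected components of the co-authorship network, which is an assumption made here for illustration and not the clustering technique used by VOSviewer. The author names are placeholders, and the minimum threshold is assumed to mean the number of documents per author.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author lists, one per document (placeholder names).
documents = [
    ["author_a", "author_b", "author_c"],
    ["author_a", "author_b"],
    ["author_c", "author_d"],
    ["author_e", "author_f"],
    ["author_e", "author_f", "author_g"],
]

def coauthor_groups(documents, max_authors=15, min_documents=2):
    """Group authors that are linked, directly or indirectly, by shared documents."""
    # Ignore documents with more authors than the chosen maximum.
    docs = [set(d) for d in documents if len(d) <= max_authors]
    n_docs = Counter(a for d in docs for a in d)
    kept = {a for a, n in n_docs.items() if n >= min_documents}

    # Union-find structure over the retained authors.
    parent = {a: a for a in kept}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for d in docs:
        for a, b in combinations(sorted(d & kept), 2):
            parent[find(a)] = find(b)

    groups = {}
    for a in kept:
        groups.setdefault(find(a), []).append(a)
    return sorted(sorted(g) for g in groups.values())

groups = coauthor_groups(documents)
print(groups)
```

In this toy input, `author_d` and `author_g` appear in only one document each and are therefore filtered out before grouping, which mirrors how a threshold reduces the number of authors that enter the network.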


**Table 4.** Authors' co-authorship ranked by link strength.

Clusters 2, 3, and 5 relate to authors who published articles in 2020–2021. Cluster 4 relates to authors with publications in 2019, and cluster 7 corresponds to authors with publications in 2018; the remaining authors published their articles in 2017. Figure 6 indicates that the top 10 co-authorships were published in 2017, suggesting that the academic community was strongly connected in that year. Finally, the most relevant papers were published from 2017 to 2020, suggesting that the academic community has grown.

**Figure 6.** Authors' co-authorship visualization by year.

#### *4.4. Most Cited Publications*

Analysis of the most-cited publications helped us to detect the important research topics in the literature. The citation counts of the chosen publications were retrieved from the Science Direct, Scopus, and Web of Science datasets. The selected publications have been cited between 0 and 84 times. Table 5 shows the resulting conceptual and theoretical framework, with each paper's dimensions, intelligent computing methods, and type of buildings.

The top five publications are from the following authors: Ascione, Bianco, Stasio et al. [23] with 84 citations (the most cited), followed by Ramos Ruiz et al. [38] with 44 citations, Ascione, Bianco, De Masi et al. [27] with 44 citations, Niemelä, Kosonen, and Jokisalo [39] with 39 citations, and Beccali et al. [40] with 37 citations. These results (Table 5) are consistent with the analyses described above. These papers are the most cited and present the central concepts in the field.

The top five cited papers in Table 5 were published in Q1-ranked journals, mostly in Energy and Buildings and Applied Energy. Furthermore, and consistent with the analysis, the most cited article also stands out in the authors' co-authorship analysis (Section 4.3). Cluster 1 (green) in Figure 5 groups the co-authors of the most cited article, Ascione, Bianco, Stasio et al. [23], and cluster 6 (purple) groups most of the co-authors of the second most-cited article, Ramos Ruiz et al. [38]. In the keyword co-occurrence analysis (Section 4.2), the term ANN was prominent, and ANN is one of the techniques used in the highly cited publication of Beccali et al. [40].


**Table 5.** Publications ranked by the number of citations.







Likewise, several ML and statistical methods were used for energy applications in the SLR papers. The 10 most-cited papers used a combination of methods, namely simulation techniques, Pareto front, the NSGA-II genetic algorithm, and ANN (Table 5). As inputs to those methods, these papers used the following dimensions extracted from the data: climate and weather, building thermo-physical characteristics, building envelope, building geometry, HVAC systems, EC, and building typology. Their case studies covered mostly residential buildings (6), followed by offices (1), universities (2), schools (1), and hospitals (1) [23,27,37–44]. The remaining SLR papers used similar dimensions: building geometry, building envelope, other building properties, climate and weather, HVAC systems, and energy consumption (EC) [16,18,20,21,28–48]. A total of 19 out of 35 papers used a residential building as the case study [28–30,33–37,39,41–43,45–47,53–57]. Some papers combined different types of buildings: five combined residential and commercial buildings (offices and schools) [25,32,44,52,58], two addressed offices [31,49], and six analyzed schools and universities [27,38,40,50,51,59]. Only one addressed a hospital [23] and one a care home [48].

Furthermore, only the most recent papers utilized building EPC data for their analysis [28,29,31,34,50–54,56,57,60,61]. This is surprising, since the first directive on building energy performance, the Energy Performance of Buildings Directive (EPBD), was introduced by the European Parliament in 2002 and revised in 2010 [60,61]. The remaining papers use building energy audit analysis and reference buildings for their research.

The most-used techniques for predicting EP and retrofitting were energy performance simulation techniques, statistical approaches, genetic algorithms, and ANN. Only 13 studies used ML methods exclusively [28,29,31,33–35,40,45,48,50,52,56,59]. The most common clustering and classification techniques were K-means (7), statistical methods (6), Latin hypercube sampling (2), other manual groupings (2), decision tree (2), and probability density function (1) [23,25,28–34,47,50–52,54,55,57,59].

#### *4.5. Type of Buildings, Dimensions, and Methods Analysis*

A conceptual and theoretical framework was built to evaluate this survey's building types, dimensions, and computational intelligence methods in more detail; see Tables 6 and 7. This framework seeks to identify the most-used ML and statistical approaches according to the dimensions and building types of each SLR study resulting from the previous analysis (Table 5). It focuses on research inputs, goals, and outcomes to create the basis for our research evaluation criteria.


**Table 6.** Analysis of the used Dimensions by Type of Buildings.


Table 6 presents our findings on dimensions by building type. These findings contribute new knowledge that helps energy experts learn and use the most critical dimensions for particular building types in their modeling and research work.

The SLR analysis suggests that the dimensions extracted from the data sources can be grouped in the following way:


Table 7 presents our inference which may help data scientists understand the right method to employ for further research.

**Table 7.** Methods by Dimensional Analysis.



The above analysis allows us to use the most common dimension categories of buildings to find an adequate method for evaluating energy performance according to the building type of interest. As the results demonstrated, most studies share common dimensions regardless of building type and method.

#### **5. Discussion**

Our research aimed to identify and highlight the literature on machine learning (ML) and statistical techniques that tackle the EPB and to create a systematic, organized view of those studies.

In the following, we discuss how our study answers the posed research questions, namely:


#### *5.1. Research Questions Discussion*

Our analysis indicates that the two problems discussed by the proposed machine learning or statistical approaches are clustering (classification) and prediction in the energy performance of buildings.

Regarding the first question (RQ1), 13 studies used the EPC dataset [30,32–35,52,58], as explained in Sections 4.4 and 4.5. This kind of data is multi-dimensional, given that each energy certificate has many attributes. Applying a data mining algorithm (such as cluster analysis) to such data is challenging due to the high variability and dimensionality of the data [33]. As for data classification and clustering techniques, most studies applied the K-means clustering algorithm to characterize cluster sets with a given energy performance, as explained in Sections 4.4 and 4.5. Some studies used the density-based spatial clustering of applications with noise (DBSCAN) algorithm to handle outliers and correlation analysis to identify the best input dimensions for their clustering analysis [32–34,52]. A few studies referring to RQ1 used GIS and geospatial maps to visualize their clustering results [30,58]. Finally, five papers of similar studies answered RQ1, namely [30,33–35,58].
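As an illustration of how K-means can group buildings with similar energy performance, the following is a minimal, self-contained sketch of Lloyd's algorithm. It is not the pipeline of any reviewed study: the EPC-like records, the two chosen attributes, the min-max scaling step, and the deterministic initialisation with the first `k` points are all assumptions made for this example.

```python
import math

def min_max_scale(rows):
    """Rescale every column to [0, 1] so that no attribute dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical EPC-like records: (primary energy in kWh/m2 per year, wall U-value in W/m2K).
records = [(60, 0.30), (70, 0.35), (65, 0.32), (220, 1.40), (240, 1.50), (230, 1.45)]
labels, centroids = kmeans(min_max_scale(records), k=2)
print(labels)
```

With this toy input, the three low-consumption records and the three high-consumption records end up in separate clusters. On real EPC data, the choice of `k`, the initialisation, and the treatment of outliers (for example with DBSCAN, as in the studies cited above) would need separate consideration.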

Regarding RQ2, most approaches to predicting energy-efficient retrofit measures used simulation tools such as EnergyPlus [62] or TRNSYS [63] to model the energy consumption (EC) of a 3D model of the building under study and then used GA to perform multi-objective optimization, obtaining a good solution for the different criteria defined as important in each study [46]. The strategy of using precomputed 3D models requires a large database of models, and the accuracy depends on how closely those models match real-world buildings. Although the most common algorithm for multi-objective optimization is the non-dominated sorting GA II (NSGA-II), it is possible to improve the algorithm by customizing it for energy retrofitting scenarios [64]. Since NSGA-II is a general-purpose GA, customizing it for the specific field of energy retrofit could yield more efficient computations. Additionally, the more recent NSGA-III is not used by researchers [65]. The improved version is expected to be more computationally efficient when finding optimal solutions. The simulation's quality depends on having a good model representation of the building and on including environmental factors such as weather data and the orientation and solar exposure of the building [44].
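The notion of a multi-objective, cost-optimal solution can be illustrated with the non-dominated comparison that NSGA-II builds on. The following is a minimal sketch of extracting the first Pareto front only. It omits the rest of NSGA-II (crowding distance, selection, crossover, and mutation), and the retrofit packages and their objective values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one.

    Both objectives are minimised.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(options):
    """Return the names of the options that no other option dominates."""
    return [
        name
        for name, objectives in options.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in options.items()
            if other_name != name
        )
    ]

# Hypothetical retrofit packages:
# (investment cost in EUR/m2, primary energy in kWh/m2 per year).
retrofits = {
    "baseline": (0, 250),
    "windows": (40, 210),
    "wall insulation": (60, 170),
    "windows + boiler": (90, 180),
    "deep retrofit": (150, 90),
}

front = pareto_front(retrofits)
print(front)
```

In this toy example, "windows + boiler" is dominated by "wall insulation", which is both cheaper and less energy-intensive, so it is left out of the front. The remaining packages represent different trade-offs between cost and energy use, among which a decision maker would choose.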

The environmental characteristics that impact the building EP are also important criteria for determining which retrofitting measures are cost-effective. It is also essential to describe the building elements, namely roof, wall, floor, ceiling, and window, in terms of their thickness and heat loss/gain rating (U-value), as well as to identify the type of heating and cooling systems, the renewable energy systems in use, the occupation density, and other factors that might affect the building's energy consumption. The more extreme the weather conditions in the region of the building, the more critical it is to include them in the modeling of EP [44].
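To show why the U-value and the local climate both enter such models, the standard steady-state relations can be written as follows. This is textbook building physics rather than a formula taken from the reviewed papers, and the symbols and the numerical values in the worked example are illustrative assumptions.

$$
U = \frac{1}{R_{si} + \sum_i \dfrac{d_i}{\lambda_i} + R_{se}}, \qquad Q = U \, A \, (T_{in} - T_{out})
$$

Here $d_i$ is the thickness of layer $i$ (m), $\lambda_i$ its thermal conductivity (W/(m K)), $R_{si}$ and $R_{se}$ the internal and external surface resistances (m² K/W), $A$ the element area (m²), and $Q$ the heat flow through the element (W). For a single insulation layer with $d = 0.10$ m and $\lambda = 0.04$ W/(m K), and assuming $R_{si} = 0.13$ and $R_{se} = 0.04$ m² K/W:

$$
U = \frac{1}{0.13 + 2.5 + 0.04} \approx 0.37 \ \mathrm{W/(m^2\,K)}, \qquad Q \approx 0.37 \times 10 \times 20 \approx 75 \ \mathrm{W}
$$

for an assumed area of 10 m² and a temperature difference of 20 K. A larger indoor-outdoor temperature difference increases $Q$ proportionally, which is consistent with the observation that weather data matter more in extreme climates.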

Moreover, regarding RQ2, several authors (7) used GA [23,27,38,39,42,48] to predict cost-optimal energy retrofit solutions. Some approaches used artificial neural networks (ANN) [35,40,48,50,52,56]. Most papers in this category are case studies that use a single building or a representative building sample to collect the data needed for their experiments. No study referring to RQ2 used GIS and geospatial maps to visualize its results. Finally, (15) papers of similar studies answered RQ2, namely [23,25,27,29,36–39,41,42,44,46,48,55].

Some studies (8) answer both research questions; two such approaches are excellent examples of using K-means clustering and ANN with public EPC databases to predict EERM [50,52]. Other approaches, focusing only on predicting energy consumption (EC), show that it is possible to use a data-driven urban energy simulation to predict hourly, daily, and monthly energy consumption. In these approaches, models are used as a baseline for EC, and a residual network ML model is then applied to predict the EC at the various scales [43].

The primary objectives of the studies in this category (8 studies) are the prediction of EP, the potential for energy savings, and cost-optimal retrofitting solutions [40,43,45,47,49,50,52,59]. For data classification and clustering, some studies (6) adopted K-means [28,32,50,52,57,59], and two applied manual classification [47,49]. For the prediction of EP and cost-optimal retrofit solutions, some approaches (7) employed ANN and GA [40,47,49,50,52,56,57]. Others implemented different ML algorithms, such as random forest (RF) [59]. Some of the approaches used simulations and mathematical techniques, such as multiple linear regression, Pearson's correlation, principal component analysis, Monte Carlo, Gaussian process regression, Gaussian mixture regression, and deep learning algorithms [28,49,56]. Finally, some studies (3) used geographical information systems (GIS) and geospatial maps to visualize their results [28,50,56].
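Among the mathematical techniques listed above, Pearson's correlation is the simplest way to screen candidate input dimensions. The following is a minimal sketch under stated assumptions: the building records are invented for illustration, and ranking inputs by the absolute correlation with energy use is one possible screening rule, not the procedure of a specific reviewed paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical building records; all values are invented for illustration.
energy_use = [95, 120, 150, 180, 210]  # kWh/m2 per year (target)
inputs = {
    "wall_u_value": [0.3, 0.5, 0.8, 1.1, 1.4],  # W/m2K
    "floor_area": [120, 80, 150, 95, 110],  # m2
    "construction_year": [2012, 2001, 1985, 1972, 1960],
}

# Rank inputs by the strength (absolute value) of their linear association.
ranking = sorted(
    inputs,
    key=lambda name: abs(pearson(inputs[name], energy_use)),
    reverse=True,
)
for name in ranking:
    print(f"{name:<18} r = {pearson(inputs[name], energy_use):+.3f}")
```

In this toy data, floor area is only weakly correlated with energy use and ranks last. Note that Pearson's coefficient captures linear association only, so a low value does not rule out a non-linear dependence.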

#### *5.2. Knowledge Gap*

Our analysis concluded that the research gap lies in identifying and testing the ML approaches that are best suited to, and perform best in, the automatic evaluation of buildings' energy performance using EPC data. Most of the studies use statistical and audit approaches at a multilevel scope [15,17,19,22,24,25,27–41,45,48,49]. However, some studies (13) use the EPC dataset for their analysis [28,29,31,34,50–54,56,57,60,61]. Furthermore, most studies apply simulation techniques and GA for prediction, targeting multi-objective cost-optimal solutions, which is a promising approach.

We conclude that more research is needed to validate and improve future modeling strategies using EPC datasets and different features. These gaps have shown an opportunity for future research.

#### *5.3. Study Limitations*

Although we tried to guarantee the quality of this review and, particularly, the data selection, this study has limitations. Specifically, we would like to highlight the dependency on the keywords and the selected data repositories, since additional data repositories could be used and only English papers were included, neglecting publications written in other languages. Finally, another important limitation of this study is the time frame, given that we focused on papers published in the last five years, between early 2016 and April 2021.

#### **6. Conclusions**

Using the PRISMA methodology, the SLR analysis generated a systematic view of ML and statistical approaches applied to improving the EPB, which can be used for future research. This study showed that after 2019, most studies used, processed, and analyzed EPC datasets, adopting ML or statistical approaches. Clustering analysis is applied to find similar patterns in buildings' EPC data. Simulation techniques and K-means clustering are the most used approaches to group buildings with similar characteristics. Box plot statistical analysis and DBSCAN are robust techniques used to eliminate outliers and noise due to their ability to deal with complex and high-dimensional data. Correlation analysis proved to be the preferred approach for estimating the importance of each analyzed input dimension. Additionally, the literature indicated that the most used method for evaluating the performance of a proposed algorithm was the accuracy of the ML-based solution.

Our research findings aim to fill the identified knowledge gaps and to open a methodological agenda that will help the reader identify effective combinations of ML and statistical approaches for addressing EPB and EERM in the future, providing a good starting point for further research.

**Author Contributions:** M.A.: Conceptualization, Methodology, Formal analysis, Literature Review, Investigation, Data curation, Writing—Original Draft, Writing—Review and Editing, Visualization, Project administration. V.S.: Conceptualization, Methodology, Formal analysis, Writing—Review and Editing, Supervision, Project administration, Funding acquisition. M.S.D.: Conceptualization, Methodology, Formal analysis, Writing—Review and Editing, Supervision, Project administration, Funding acquisition. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by a Ph.D. Scholarship of NOVA IMS supported by project POCI-05-5762-FSE 000223, and its scope lies in the context of Simplex #109 "Consumo SMART". This work is partially funded by national funds through FCT—Foundation for Science and Technology, I.P., under the project FCT UIDB/04466/2020.

**Informed Consent Statement:** Not applicable.

**Acknowledgments:** We would like to thank Ricardo Pinto and Vitória Albuquerque [66] for their help reviewing the paper. The authors would also like to thank the reviewers and the editorial team who offered edifying and useful remarks to enhance the quality of the paper.

**Conflicts of Interest:** The authors declare no conflict of interest. The sponsors had no role in the design, collection, analysis, or interpretation of data, or writing of the study, nor in the decision to publish the results.

#### **References**

