Prediction of Energy Efficiency for Residential Buildings Using Supervised Machine Learning Algorithms

Mahmood, Tahir; Asif, Muhammad

doi:10.3390/en17194965

by Tahir Mahmood ¹ and Muhammad Asif ²,³,*

¹ School of Computing, Engineering and Physical Sciences, University of the West of Scotland, Paisley PA1 2BE, UK

² Architectural Engineering and Construction Management, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia

³ IRC Sustainable Energy Systems, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia

* Author to whom correspondence should be addressed.

Energies 2024, 17(19), 4965; https://doi.org/10.3390/en17194965

Submission received: 26 July 2024 / Revised: 16 September 2024 / Accepted: 25 September 2024 / Published: 4 October 2024

(This article belongs to the Special Issue Climate Change and Sustainable Energy Transition)

Abstract:

In the era of digitalization, the large availability of data and innovations in machine learning algorithms provide new potential to improve the prediction of energy efficiency in buildings. The building sector research in the Kingdom of Saudi Arabia (KSA) lacks actual/measured data-based studies as the existing studies are predominantly modeling-based. The results of simulation-based studies can deviate from the actual energy performance of buildings due to several factors. A clearer understanding of building energy performance can be better established through actual data-based analysis. This study aims to predict the energy efficiency of residential buildings in the KSA using supervised machine learning algorithms. It analyzes residential energy trends through data collected from an energy audit of 200 homes. It predicts energy efficiency using five supervised machine learning algorithms: ridge regression, least absolute shrinkage and selection operator (LASSO) regression, a least angle regression (LARS) model, a Lasso-LARS model, and an elastic net regression (ENR) model. It also explores the most significant explanatory energy efficiency variables. The results reveal that the ENR model outperforms other models in predicting energy consumption. This study offers a new and prolific avenue for the research community and other building sector stakeholders, especially regulators and policymakers.

Keywords:

buildings; energy efficiency; machine learning; renewable energy; energy management

1. Introduction

Global warming is considered to be the most daunting challenge facing the planet [1]. Global warming and the ensuing changes in the climate are bringing about wide-ranging weather anomalies and more frequent and severe natural catastrophes, such as storms, flooding, droughts, and forest fires. To combat climate change and achieve sustainability goals, the global community requires major changes across all spheres of life. The building sector is an essential and critical part of modern societies. Buildings influence not only the infrastructure but also the socio-economic and technological fabric of society. The building sector, accounting for 36% and 40% of the total energy and natural resource consumption, respectively, while emitting over one-third of greenhouse gas (GHG) emissions, is a significant stakeholder in the global energy and environmental scenarios [2]. Buildings’ role in energy and environmental outlooks worldwide is projected to become more significant owing to factors like growing population and urbanization. Compared to 2018, the global building stock is expected to grow by approximately 2.5 trillion ft² in floor area by 2050 [3]. Improvement in the building sector, due to its mounting energy and natural resource consumption, is imperative to address the energy and environmental challenges [4,5,6]. Buildings’ leading role in this respect is reflected in the United Nations’ Sustainable Development Goals (SDGs): while SDG 11 aims to promote sustainable cities and communities, a number of other SDGs also support a sustainably built environment [7]. The findings of the Intergovernmental Panel on Climate Change (IPCC) suggest that to satisfy the Paris Climate Agreement’s goals, the world needs major changes across four main global systems (energy, land use, cities, and industry) [8,9], all of which are closely related to the building sector.

Energy is used in modern buildings throughout their life cycle including construction, usage, maintenance, and decommissioning.

Buildings’ energy use can be classified as direct and indirect; the former encompasses features like construction, operation and maintenance, and renovation/demolition, while the latter mainly covers the embodied energy of materials and equipment and installation processes [10]. Of all the stages in this life cycle, the operational phase accounts for most of the energy used by buildings; its share can range from 40% to 90% depending on factors like the nature of the building, user behavior, and weather conditions [11]. Against the backdrop of the fight against climate change, sustainable buildings are considered an integral part of the global drive for sustainability. Sustainable building initiatives are actively being realized on the technological and policy fronts. While building performance regulations and standards are being made more stringent through robust policy frameworks, technological advancements are providing effective solutions to help the cause. Improving buildings’ energy performance through energy conservation and management is considered to be the most viable and cost-effective sustainability initiative for buildings. The starting point of any energy efficiency program is a robust understanding of the baseline in terms of existing energy consumption trends and practices. Energy efficiency is an established yet actively evolving subject in terms of technological and policy developments.

The Kingdom of Saudi Arabia (KSA) is one of the largest countries in the Middle East in terms of population, economy, and land area. It also has rich fossil fuel reserves and is experiencing a fast-growing building sector [12]. Buildings in this country have a high energy consumption compared to global standards [13,14,15]. Energy performance improvement is an active area of research [5,6,16], and policy frameworks are being increasingly developed. Insulation, for example, has been made mandatory in new buildings and declared a prerequisite for electricity connection. There is a realization in this country that the energy and environmental performance of buildings needs to be improved to make the building sector more sustainable [8,12]. The existing studies in the KSA context are mainly simulation-based [17,18]. There is a critical shortage of actual/monitored data to undertake more practical building performance analysis.

The effective implementation of any energy efficiency program critically depends on the robustness of the required data. In the wake of smart and digital technologies, the amount of available data is rapidly growing. The analysis and interpretation of the vastly available data are also important. The need for improved monitoring and reporting from large volumes of data presents huge opportunities to employ modern analysis techniques and tools, including machine learning [19]. In the wake of digitalization, there is great drive across various sectors to explore ML’s potential [20]. ML includes a variety of powerful tools, facilitating the extraction of valuable insights from raw data (e.g., class prediction, pattern recognition), which can then be used to assist enterprises in improving their operations and strategic decisions.

Machine learning is increasingly being used in building energy performance analysis as well. Researchers have studied the application of machine learning from different perspectives. Several studies have examined the application of machine learning in energy analysis and renewables’ applications in buildings and building indoor air quality [21,22,23,24]. However, owing to factors like the growing recognition of the need for an improved energy performance of buildings around the world, the developed nations’ strive to enhance their existing energy efficiency standards, advancements in technology, and the availability of new datasets, machine learning will continue to find new and more impactful applications in buildings. The present study has geographic significance as well. In the KSA’s context, there is a clear gap when it comes to actual data-based studies, and currently, no such study exists. For example, Ahmed et al. [13] examined buildings’ energy performance based on a simulation model. Ahmed and Asif [18] studied building information modeling (BIM)-based retrofitting for buildings. Syed and Abdou [17] also studied building energy performance with the help of modeling. It is noted that there is no study on the prediction of energy efficiency using the supervised machine learning method nor on the findings of significant variables for the energy efficiency of buildings in the KSA. The KSA and other countries in the Gulf Cooperation Council (GCC) region are initiating energy efficiency programs [4,5,9]. Therefore, the present study offers a new and prolific avenue to the research community and other building sector stakeholders, especially regulators and policymakers. The present study aims to conduct the following:

Analyze residential energy trends through data collected from an energy audit of 200 homes in the KSA.
Predict energy efficiency using five supervised machine learning algorithms: ridge regression, LASSO regression, a least angle regression (LARS) model, a Lasso-LARS model, and an elastic net regression (ENR) model.
Explore the most significant explanatory variables of energy efficiency for KSA buildings.

The rest of this article is structured as follows: An overview of the current energy efficiency practices in the KSA is provided in Section 2; studies on energy efficiency using supervised machine learning models are reviewed in Section 3; a data description and preprocessing are provided in Section 4; the mathematical details of supervised machine learning algorithms and performance evaluation criteria are described in Section 5. The results and discussion are provided in Section 6, and the main conclusions and recommendations are presented in Section 7.

2. Energy Efficiency in KSA’s Building Sector

Saudi Arabia is one of the leading nations in the world in terms of per capita energy consumption [25]. One of the major factors in the extensive use of energy in this country is its extreme climatic conditions. Its harsh climate is characterized by hot, dry, and humid weather, which exposes buildings to the sun for an extended period. Around 2000 kWh/m² of solar radiation is received annually on average, ranging from 1630 kWh/m² in Tabuk to 2560 kWh/m² in Bisha [25]. To deal with the high temperatures, buildings rely heavily on air conditioning, which typically accounts for around 70% of a building’s electricity consumption. The building sector is thus significantly contributing to the energy and environmental footprint of the country. The nation’s electricity needs are almost entirely satisfied by fossil fuels, mainly oil and gas. The country has recently started to diversify its energy mix by setting up solar and wind power projects. In recent decades, domestic oil consumption has rapidly increased for several reasons, such as growth in population, low energy prices, modernization, and economic and infrastructure growth. The annual growth rates in energy consumption and customer numbers have been estimated to be up to 8% and 5%, respectively. In a business-as-usual scenario, the demand is expected to continue to increase rapidly in the future, as some studies estimate a 100% increase by 2025 compared to the level in 2009 [10].

Low electricity tariffs have been one of the factors contributing to the traditionally high energy usage in the KSA’s building sector. Customers’ interest in energy-efficient products is observed to be low. In the country’s Eastern Province, the Energy Usage Index (EUI) values for apartments, traditional homes, and villas are found to be 196.5 kWh/m²/year, 156.5 kWh/m²/year, and 150 kWh/m²/year, respectively [25]. The situation, however, is changing, as electricity tariffs have risen by over 300% since 2015. Also, Saudi Arabia has started several projects to address the rising energy demand, particularly in the residential and industrial sectors. Energy efficiency has become an area of priority in the energy sector. There is, however, a lack of understanding of energy conservation in society as a whole. In Saudi Arabia and the GCC region, little research has been conducted to quantify the influence of demand-side factors on building electricity consumption [26]. There is a greater need to understand energy demand dynamics to make effective energy efficiency and conservation frameworks.

Despite the importance of energy efficiency in buildings, the construction industry places sustainability quite low on its list of priorities, as shown by the fact that approximately 70% of the nation’s buildings lack thermal insulation [27,28]. The retrofitting of existing buildings can significantly reduce their energy requirements. The energy retrofitting of existing buildings is still in the very early stages in the country. To make the building sector sustainable, not only are new buildings required to be made more energy-efficient, but the existing ones also need to be retrofitted. The KSA needs to follow global energy efficiency trends and best practices.

It is noteworthy that in comparison to other sectors such as industry, agriculture, and transport, the building sector has the highest potential to offer substantial, long-term, and economically viable solutions to improve energy efficiency and lower GHG emissions. With techno-economically mature and viable energy efficiency solutions, the use of energy in both new and existing buildings can be curtailed by up to 80%, potentially resulting in a net profit for the building’s life cycle [29]. Between 2017 and 2040, through technology and policy advancements, the energy efficiency of buildings is likely to improve by 40%. Figure 1 highlights the several ways in which buildings can save energy [30].

3. Review of Machine Learning Applications for Buildings’ Energy Efficiency

In the past, many studies used regression models to predict building energy consumption. Many of them relied on the strong assumption of linearity between the explained and explanatory variables. In practice, however, this assumption is often violated, and a linear model can then lead to wrong decisions. Hence, machine learning models that do not depend on such statistical assumptions have been developed. In the last decade, relatively few studies have been designed to predict energy use intensity (EUI) using machine learning algorithms. A few of them are as follows: Kontokosta and Tull [31] implemented linear regression (OLS), random forest (RF), and support vector regression (SVR) models to fit New York City’s energy data. Lim and Zhai [32] utilized OLS, SVR, artificial neural network (ANN) [33], multivariate adaptive regression spline (MARS), and Gaussian process emulator (GPE) models to predict the EUI of the simulated data generated through EnergyPlus. Deng et al. [34] adopted OLS, least absolute shrinkage and selection operator (LASSO) regression, SVR, ANN, RF, and extreme gradient boosted decision tree (XGBoost) models to forecast the energy efficiency reported in a survey (CBECS 2012) of commercial buildings in the United States. Papadopoulos and Kontokosta [35] used XGBoost to predict the building EUI of New York City’s multifamily residential building stock of 7500 buildings.

Wang et al. [36] utilized the long short-term memory (LSTM), ANN, and SVR methods to predict the energy efficiency of buildings on the campus of Southeast University, Nanjing, China. Abbasabadi et al. [37] implemented several statistical and machine learning algorithms (i.e., OLS, RF, ANNs, nonlinear regression (NLR), classification and regression trees (CART), and k-nearest neighbors (k-NN)) to predict the energy efficiency of Chicago’s buildings. Xu et al. [38] used an integrated social network analysis-based artificial neural network (SNA-ANN) algorithm on 17 buildings at Southeast University, Nanjing, China, to forecast their energy efficiency. Mohammadiziazi and Bilec [39] applied the OLS, RF, XGBoost, and decision tree (DT) methods to a USA commercial building energy consumption survey dataset to predict the EUI. Jia et al. [40] used OLS, SVR, ANN, and XGBoost models to forecast the EUI of simulated building data. Li and Yao [41] implemented SVR with Gaussian, linear, and polynomial kernels, as well as XGBoost, RF, OLS, ridge regression, ANNs, LASSO regression, and elastic net, to predict heating and cooling energy using a case study in Chongqing, China. Ding et al. [42] used OLS, ridge regression, SVR with a linear kernel, DT, RF, and XGBoost models on a dataset of 2370 public buildings in Chongqing, China. Jin et al. [43] used XGBoost, Extra Trees, LASSO, elastic net, and Bayesian regression models to predict New York City’s energy efficiency data. Jiang et al. [44] implemented a dynamically updated multi-fold semi-supervised learning method based on deep neural networks (DUMSL-DNN) to predict the buildings’ EUI in the Manhattan borough of New York. Jin et al. [45] used the SMOTE algorithm to correct the imbalance in the data, while RF, AdaBoost, and local interpretable model-agnostic explanations (LIME) were used to predict the EUI. Singh et al. [46] applied a convolutional neural network (CNN) approach to predict the EUI of the simulated data generated through EnergyPlus. Seyedzadeh et al. [47] and Sun et al. [48] provided detailed reviews of the machine learning models used in the prediction of buildings’ energy consumption.

Recently, Rau et al. [49] described a systematic approach to energy efficiency utilizing neural networks, with OS-ELM demonstrating good accuracy and 12.2% potential savings. Egwim et al. [50] revealed that ensemble strategies, particularly stacking, outperform others in UK building energy estimates. Sapnken et al. [51] showed that deep neural networks attain 96% accuracy in predicting building energy usage. Aderibigbe et al. [52] emphasized deep learning’s advantage in energy forecasting and suggested increasing AI-driven management. Nur-E-Alam et al. [53] integrated solar technologies to optimize energy systems, demonstrating that Middle Eastern cities exceed Toronto in electricity generation. Ali et al. [54] used machine learning to forecast urban building energy performance with 91% accuracy, which aids in sustainable planning. Das et al. [55] examined the impact of machine learning in increasing building energy efficiency and identified areas for future research. Boutahri and Tilioua [56] used random forest and XGBOOST to optimize HVAC systems, resulting in excellent accuracy. Christensen et al. [57] illustrated that data-driven strategies enhance retrofit impact forecasts, resulting in higher net benefits. Cui et al. [58] discovered that LightGBM and CatBoost excel at estimating energy use for US apartments and households and providing energy-saving recommendations. Yussuf and Asfour [59] discussed AI’s role in improving energy efficiency throughout a building’s lifetime. Arowoiya et al. [60] investigated digital twin technology for building efficiency, highlighting its rise and the need for improved sensors and occupant behavior studies.

In conclusion, machine learning algorithms are rapidly being utilized to forecast building energy consumption, beyond the limitations of conventional linear regression methods. Recent research shows that machine learning approaches like LASSO regression, ridge regression, elastic net regression, deep neural networks, and ensemble methods improve accuracy while also saving energy. The advantages of machine learning include the following: (a) ML models, particularly ensemble and deep learning techniques, exceed standard linear models in prediction accuracy and (b) advanced ML models deliver significant energy savings and better building system optimization. However, several limitations include the following: (a) ML models can be complex and require significant data and processing power. (b) Limited sensor acceptance and difficulty interpreting occupant behavior restrict broader applicability and efficacy.

4. Data Description and Processing

In this study, a dataset of several key energy performance parameters of residential buildings was used. The data were collected through an energy audit process. The level of the energy audit was equivalent to a typical preliminary or ASHRAE Level 1 energy audit. The audit process considered the local cultural sensitivities and had limited access to the audited buildings. Where direct observation by the audit team could not be managed, the required information was acquired through interviews with occupants. Data were recorded for a total of 200 buildings on the following variables: dwelling type, covered area, occupancy status, construction year, exterior wall insulation, parapet wall (the wall along the perimeter of the roof, also serving as the safety fence), window frame, glazing type, air conditioning system, lighting, electricity bill, and energy use intensity (EUI). This study is focused on finding significant contributors to the EUI of Al Khobar’s buildings and finding appropriate machine learning algorithms to reveal some future impacts on energy efficiency.

Figure 2 shows the research structure that was used for this investigation. Data preparation is the first step in this process, which involves carefully examining the building data to look for outliers, missing information, and inconsistencies. Resident information was found to be missing for six buildings during this phase, and these data points were eliminated. The interquartile range rule was also used to identify outliers in the Energy Usage Index (EUI) variable. Ten outliers were found and eliminated from the dataset. Consequently, a dataset of 184 buildings was used to start the modeling phase. Table 1 provides descriptive characteristics for this dataset. There are nine explanatory factors in this dataset, as Table 1 shows. Among them, the binary (0/1) variables for dwelling type, occupancy status, parapet wall, and window frame were left unaltered. For the remaining variables, we used one-hot encoding to turn the categorical levels into dummy variables. With the EUI acting as the explained variable and the remaining 30 variables acting as explanatory variables, the processed dataset has 31 variables and 184 observations (buildings). Preprocessing ensures that the data are prepared correctly for further analysis and modeling.
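The preprocessing steps described above can be illustrated with a minimal sketch. It is not the authors’ code: the toy EUI values, the glazing labels, the helper names `iqr_bounds` and `one_hot`, and the conventional 1.5 × IQR multiplier are all assumptions made for illustration, since the study does not state which multiplier or software was used.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences of the interquartile range rule.

    k = 1.5 is the conventional multiplier (an assumption; the study
    does not report the value it used).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def one_hot(labels):
    """Encode categorical labels as 0/1 dummy columns, one per level."""
    levels = sorted(set(labels))
    rows = [[1 if label == level else 0 for level in levels] for label in labels]
    return levels, rows

# Hypothetical records: EUI in kWh/m²/year and a categorical glazing type.
eui = [100, 110, 120, 130, 140, 150, 160, 900]
glazing = ["single", "double", "double", "single",
           "triple", "double", "single", "single"]

# Step 1: drop buildings whose EUI falls outside the IQR fences.
lower, upper = iqr_bounds(eui)
keep = [lower <= value <= upper for value in eui]
eui_clean = [v for v, flag in zip(eui, keep) if flag]
glazing_clean = [g for g, flag in zip(glazing, keep) if flag]

# Step 2: turn the categorical variable into dummy variables.
levels, dummies = one_hot(glazing_clean)
print("IQR fences:", lower, upper)
print("dummy columns:", levels)
print("first retained building:", dummies[0])
```

In this toy example only the last record lies outside the fences, so seven records remain and each is encoded by exactly one active dummy column.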

In the second step, we divided the complete data into two sets based on the 70:30 rule to evaluate the supervised machine learning models. A random 70% of the data (129 observations) formed the training set, while the remaining 30% (55 observations) formed the testing set. The models discussed in Sections 5.1 to 5.4 were developed on the training set, while the test set was used to test the models’ predictions. The models were evaluated using common performance measures, namely the Mean Square Error (MSE), Root Mean Square Error (RMSE), coefficient of determination (R^{2}) [61], Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and correlation coefficient (r), which are briefly discussed in Section 5.5.
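The 70:30 split can be sketched in a few lines. The function name `train_test_split_indices` and the seed are hypothetical, and rounding 0.7 × 184 to the nearest integer is an assumption that happens to reproduce the 129/55 split reported in the text.

```python
import random

def train_test_split_indices(n_obs, train_fraction=0.7, seed=0):
    """Randomly partition row indices into training and testing sets."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    n_train = round(train_fraction * n_obs)    # 0.7 * 184 = 128.8, rounded to 129
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = train_test_split_indices(184)
print(len(train_idx), len(test_idx))
```

The two index lists are disjoint and together cover all 184 buildings, so no observation is used for both fitting and evaluation.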

5. Supervised Machine Learning Models

As mentioned above, the final processed data consisted of 30 explanatory variables, and the EUI was considered an explained variable. Hence, to model or predict the EUI based on 30 explanatory variables, supervised machine learning models such as the ridge regression, LASSO regression, LARS, LASSO-LARS, and elastic net regression (ENR) models were implemented due to their interpretability, efficiency, and suitability for the dataset and problem context. These regression-based methods were chosen not only for their ability to handle multicollinearity and perform variable selection but also because they can identify the most significant explanatory variables from the final selected model, thereby enhancing the model’s predictive accuracy and providing valuable insights into the factors driving energy consumption. Hence, this section discusses the supervised machine learning methods adopted to model the EUI against 30 explanatory variables using a training set. Moreover, performance measures are also presented in this section.

5.1. Ridge Regression

Ridge regression is an extension of ordinary least squares (OLS) regression in which a linear function is fitted with multiple explanatory variables. Assume we have n data points, Y is an explained variable, and X represents p explanatory variables; then, the OLS model can be expressed as follows:

Y_{(n \times 1)} = X_{(n \times (p + 1))} β_{((p + 1) \times 1)} + ε_{(n \times 1)},

(1)

where ε represents the error in the model. In OLS, the βs are estimated by minimizing the error function given as follows:

\frac{1}{n} \sum_{i = 1}^{n} {(X β - y)}^{2} .

(2)

However, in ridge regression [62], the βs are estimated by minimizing the penalized error function expressed as follows:

\frac{1}{n} \sum_{i = 1}^{n} {(X β - y)}^{2} + α \sum_{j = 0}^{p} β_{j}^{2} ,

(3)

where α > 0 is a complexity parameter that controls the amount of shrinkage: the larger the value of α, the more the β estimates are shrunk toward zero. Moreover, ridge regression is useful when there is significant correlation among explanatory variables, termed multicollinearity.
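As an illustration of the shrinkage effect, the penalized criterion in Equation (3) has a closed-form minimizer, sketched below with NumPy. The synthetic data, the function name `ridge_fit`, and the choice α = 1 are assumptions for demonstration only; X is treated as the full design matrix, so every coefficient is penalized, as in Equation (3).

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Minimize (1/n) * ||X b - y||^2 + alpha * ||b||^2.

    Setting the gradient to zero gives (X^T X + n * alpha * I) b = X^T y.
    """
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * alpha * np.eye(p), X.T @ y)

# Synthetic data with two nearly collinear explanatory variables.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2, rng.normal(size=n)])
y = X @ np.array([1.0, 1.0, 0.5]) + 0.5 * rng.normal(size=n)

beta_ols = ridge_fit(X, y, alpha=0.0)    # alpha = 0 reduces to OLS
beta_ridge = ridge_fit(X, y, alpha=1.0)  # alpha > 0 shrinks the estimates

print("OLS coefficients:  ", beta_ols)
print("ridge coefficients:", beta_ridge)
```

The ridge coefficient vector always has a smaller Euclidean norm than the OLS one, and none of its entries is forced to be exactly zero.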

5.2. Least Absolute Shrinkage and Selection Operator (LASSO) Regression

Least absolute shrinkage and selection operator (LASSO) regression is also an extension of OLS regression used to fit a linear function with multiple explanatory variables (cf. Equation (1)). Similar to ridge regression, in LASSO regression [63], the βs are estimated by minimizing the penalized error function expressed as follows:

\frac{1}{n} \sum_{i = 1}^{n} {(X β - y)}^{2} + α \sum_{j = 0}^{p} |β_{j}| .

(4)

However, in LASSO regression, the penalty function is the sum of the absolute values of the regression coefficients, while in ridge regression, the penalty function is the sum of the squares of the regression coefficients. In ridge regression, coefficients are shrunk to near zero, but in LASSO, the coefficients of insignificant variables are estimated at exactly zero. Therefore, LASSO regression is also considered a dimension reduction technique.
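The exact-zero behavior can be seen in a small coordinate descent sketch that minimizes the criterion of Equation (4). Coordinate descent is one common way to solve the LASSO problem and is not necessarily the solver used in the study; the synthetic data, the helper names, and α = 0.5 are assumptions for illustration.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning 0 if it is within it."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_fit(X, y, alpha, n_sweeps=200):
    """Coordinate descent for (1/n) * ||X b - y||^2 + alpha * sum(|b_j|)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with the contribution of variable j added back in.
            partial_residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_residual
            # One-dimensional minimizer: soft-threshold at n * alpha / 2.
            beta[j] = soft_threshold(rho, n * alpha / 2.0) / (X[:, j] @ X[:, j])
    return beta

# Synthetic data: only variables 0 and 3 truly influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=100)

beta_lasso = lasso_fit(X, y, alpha=0.5)
print("LASSO coefficients:", beta_lasso)
```

The coefficients of the three irrelevant variables come out as exact zeros, while the two relevant ones are retained with some shrinkage, which is the variable selection property described above.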

5.3. Least Angle Regression (LARS) and LASSO-LARS Model

Least angle regression (LARS) is a supervised machine learning algorithm, similar to forward stepwise regression, used for high-dimensional data [64]. The LARS algorithm starts by finding the explanatory variable that is most highly correlated with the residuals and moves the fit in the direction of that variable until another explanatory variable is equally correlated with the residuals or the data are exhausted. When more than one explanatory variable has the same correlation with the residuals, LARS moves in the direction that is equiangular to these explanatory variables. Due to this characteristic, the algorithm is named least angle regression. LARS is computationally fast compared to LASSO and forward stepwise regression. Moreover, because it produces the complete piecewise linear solution path, it is more efficient than LASSO regression.

The LASSO-LARS model is the LASSO regression method in which the estimates are obtained through the LARS algorithm [65]. This method is also computationally fast and provides efficient estimates compared to ordinary LASSO regression [66].
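Both estimators are available in scikit-learn as `Lars` and `LassoLars`; the sketch below assumes that library is installed and uses synthetic data, so it illustrates the behavior described above rather than reproducing the study’s setup. Note that scikit-learn scales the squared-error term by 1/(2n), so its `alpha` is not numerically identical to α in Equation (4).

```python
import numpy as np
from sklearn.linear_model import Lars, LassoLars

# Synthetic data: only variables 0 and 3 truly influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=100)

# LARS stopped after two variables have entered the active set.
lars = Lars(n_nonzero_coefs=2).fit(X, y)

# LASSO solved along the LARS path (illustrative penalty value).
lasso_lars = LassoLars(alpha=0.1).fit(X, y)

print("LARS active set:        ", lars.active_)
print("LASSO-LARS coefficients:", lasso_lars.coef_)
```

In this example the two variables most correlated with the response enter the LARS active set first, and the LASSO-LARS fit sets the remaining coefficients to exactly zero.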

5.4. Elastic-Net Regression (ENR) Model

In the LASSO regression model, variable selection depends on the data, which raises the issue of instability. Further, when a group of highly correlated explanatory variables exists, LASSO uses one of them and ignores the others. Moreover, for high-dimensional data (p > n), LASSO selects at most n explanatory variables before it saturates. To overcome these issues, elastic net regression (ENR) was proposed by Zou and Hastie [67]. In the ENR model, the penalties of both ridge and LASSO regressions are used, and the βs are estimated by minimizing the penalized error function given below:

(\sum_{i = 1}^{n} {(X β - y)}^{2} / 2 n) + λ (\frac{1 - α}{2} \sum_{j = 0}^{p} β_{j}^{2} + α \sum_{j = 0}^{p} |β_{j}|),

(5)

where λ > 0 controls the overall strength of the penalty and α is the mixing parameter between ridge (α = 0) and LASSO (α = 1). Hence, the ENR model is also a regularized regression method based on the linear combination of ridge and LASSO regressions, and it is also useful for the situation of multicollinearity. Moreover, the main difference between ENR and LASSO is that, given two highly correlated explanatory variables, LASSO tends to choose one of them arbitrarily, while the ENR model is likely to choose both at once.
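The grouping behavior can be illustrated with a coordinate descent sketch that minimizes the criterion of Equation (5). The solver choice, the synthetic data with two nearly identical explanatory variables, the function name `elastic_net_fit`, and the values λ = 0.1 and α = 0.5 are assumptions for demonstration, not the study’s settings.

```python
import numpy as np

def elastic_net_fit(X, y, lam, alpha, n_sweeps=500):
    """Coordinate descent for Equation (5):

    ||X b - y||^2 / (2n) + lam * ((1 - alpha) / 2 * sum(b_j^2) + alpha * sum(|b_j|)).
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with the contribution of variable j added back in.
            partial_residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_residual / n
            # LASSO part: soft-threshold at lam * alpha.
            shrunk = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)
            # Ridge part: extra lam * (1 - alpha) in the denominator.
            beta[j] = shrunk / (X[:, j] @ X[:, j] / n + lam * (1.0 - alpha))
    return beta

# Synthetic data: x1 and x2 are almost identical; x3 is irrelevant.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2, rng.normal(size=n)])
y = 2.0 * x1 + 2.0 * x2 + 0.1 * rng.normal(size=n)

beta_enr = elastic_net_fit(X, y, lam=0.1, alpha=0.5)
print("elastic net coefficients:", beta_enr)
```

With the ridge component active, the two correlated variables receive similar, clearly nonzero coefficients rather than one being dropped. In scikit-learn’s `ElasticNet`, λ and α of Equation (5) correspond to the `alpha` and `l1_ratio` arguments, respectively.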

5.5. Performance Measures

The test set was used to evaluate the efficiency of the above-stated supervised machine learning models. We evaluated the performance from two perspectives: (a) sufficiency (the model fitted sufficient information) and (b) prediction (the model predicted the actual test observations). So, we selected the Mean Square Error (MSE), Root Mean Square Error (RMSE), coefficient of determination (R^{2}), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and correlation coefficient (r), and their mathematical expressions are given below:

M S E = \sum_{i = 1}^{n} \frac{{(y_{i} - {\hat{y}}_{i})}^{2}}{n},

(6)

R M S E = \sqrt{\sum_{i = 1}^{n} \frac{{(y_{i} - {\hat{y}}_{i})}^{2}}{n}},

(7)

R^{2} = \frac{\sum_{i = 1}^{n} {({\hat{y}}_{i} - \bar{y})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}},

(8)

M A E = \sum_{i = 1}^{n} \frac{|y_{i} - {\hat{y}}_{i}|}{n},

(9)

M A P E = \frac{1}{n} \sum_{i = 1}^{n} |\frac{y_{i} - {\hat{y}}_{i}}{y_{i}}| \times 100,

(10)

r = \frac{\sum_{i = 1}^{n} (y_{i} - \bar{y}) ({\hat{y}}_{i} - \bar{\hat{y}})}{\sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2} \sum_{i = 1}^{n} {({\hat{y}}_{i} - \bar{\hat{y}})}^{2}}},

(11)

where n is the number of data points, y_{i} is the i^{th} observation of the explained variable, {\hat{y}}_{i} is the i^{th} predicted observation, and \bar{y} and \bar{\hat{y}} are the means of the observed and predicted values, respectively. The MSE measures the model’s prediction error as the average squared difference between the actual and predicted values, whereas the RMSE is the square root of the MSE. The MAE measures the average absolute difference between predicted and actual values, and its percentage form is expressed as the MAPE. In contrast to the MSE and RMSE, the MAE and MAPE do not square the errors, which makes them easier to interpret and less sensitive to outliers. The r value lies between -1 and +1. If r = -1 or +1, it reveals a perfect (negative or positive) correlation between predicted and actual observations, while r = 0 shows no correlation. Additionally, R^{2} values range from zero to one, with values closer to one indicating a better model fit. Thus, the model with the highest R^{2} is considered the most accurate. The decision criterion is that the best model has the minimum values of the MSE, RMSE, MAE, and MAPE and the maximum values of R^{2} and r [68,69].

It should be noted that the MSE and RMSE are commonly used because they penalize larger errors more heavily, providing useful insight into models where large deviations are critical. $R^{2}$ helps assess the model’s explanatory power, though it can be less reliable for nonlinear models. The MAE and MAPE are more interpretable as average errors, but the MAPE may be problematic for near-zero true values. The correlation coefficient highlights the strength of linear relationships but does not account for overall model fit. These metrics, despite their limitations, were chosen for their robustness and relevance in the context of our study.
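As a minimal sketch, Equations (6)–(11) can be written out in plain Python as follows. The function name `regression_metrics` and the sample numbers are illustrative assumptions, not part of the study; $R^{2}$ is computed exactly as Equation (8) states it (explained over total sum of squares), which need not coincide with the "one minus residual ratio" form on held-out data.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute the six measures of Eqs. (6)-(11) for paired sequences."""
    n = len(y_true)
    y_bar = sum(y_true) / n          # mean of the observed values
    p_bar = sum(y_pred) / n          # mean of the predicted values
    errors = [a - p for a, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n                                 # Eq. (6)
    rmse = math.sqrt(mse)                                                # Eq. (7)
    ss_tot = sum((a - y_bar) ** 2 for a in y_true)
    r2 = sum((p - y_bar) ** 2 for p in y_pred) / ss_tot                  # Eq. (8)
    mae = sum(abs(e) for e in errors) / n                                # Eq. (9)
    # Eq. (10): undefined when any observed value is zero.
    mape = 100.0 / n * sum(abs(e / a) for e, a in zip(errors, y_true))
    cov = sum((a - y_bar) * (p - p_bar) for a, p in zip(y_true, y_pred))
    ss_pred = sum((p - p_bar) ** 2 for p in y_pred)
    r = cov / math.sqrt(ss_tot * ss_pred)                                # Eq. (11)

    return {"MSE": mse, "RMSE": rmse, "R2": r2, "MAE": mae, "MAPE": mape, "r": r}

# Made-up EUI-like values, used only to exercise the function.
example = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])
```

Because the MAPE divides by each observed value, a guard or an alternative measure would be needed for data containing zeros; the EUI range reported in Table 1 (16 to 524) avoids that case.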

6. Results and Analysis

As discussed earlier, we split the data into two sets in a 70:30 ratio in order to assess the supervised machine learning models. The training set consists of a random 70% of the data (129 observations), while the testing set consists of the remaining 30% (55 observations). The five above-stated supervised machine learning models were adopted to model the energy efficiency of the buildings. These models have hyper-parameters that can affect their performance; the selected settings are reported in Table 2. For each model, we defined a grid of candidate values for the key hyper-parameters (e.g., regularization parameters for ridge and LASSO, mixing parameters for elastic net) and performed a systematic search to identify the optimal configuration based on k-fold cross-validation. Rodriguez et al. [70] advocate 10-fold cross-validation since it provides a good compromise between bias and variance, offering more consistent and robust model performance estimates, particularly for small datasets. Therefore, to find the best tuning parameter(s) of each model, we adopted a 10-fold cross-validation procedure, which was repeated three times with different randomization. Further, the performance measures were calculated for each model and are presented in Table 2, which shows that the ENR model is the best among the five: it attains the minimum values of the MSE, RMSE, MAE, and MAPE and the maximum values of $R^{2}$ and $r$ (the latter shared with the LARS model).
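The tuning procedure described above (70:30 split, grid search, 10-fold cross-validation repeated three times) can be sketched in plain Python. Everything below is an assumption made for illustration: the data are synthetic, the helper names are hypothetical, and a one-predictor ridge fit with a closed-form solution stands in for the five models of the study.

```python
import random

def train_test_split_indices(n, train_frac=0.7, seed=0):
    """Shuffle the indices 0..n-1 and cut them into train and test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_frac)
    return indices[:n_train], indices[n_train:]

def repeated_kfold(indices, k=10, repeats=3, seed=0):
    """Yield (train, validation) index lists: k folds, reshuffled each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, folds[i]

def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit for one predictor with an unpenalized intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / (sxx + alpha)
    return slope, my - slope * mx

def mse_on(x, y, idx, slope, intercept):
    return sum((y[i] - (intercept + slope * x[i])) ** 2 for i in idx) / len(idx)

def cv_mse(x, y, train_idx, alpha, k=10, repeats=3, seed=0):
    """Average validation MSE over all repeated folds for one alpha."""
    scores = []
    for tr, va in repeated_kfold(train_idx, k, repeats, seed):
        slope, intercept = fit_ridge_1d([x[i] for i in tr], [y[i] for i in tr], alpha)
        scores.append(mse_on(x, y, va, slope, intercept))
    return sum(scores) / len(scores)

# Synthetic data with 184 rows, matching only the sample size of the study.
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(184)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]

train_idx, test_idx = train_test_split_indices(len(x))
alpha_grid = [0.01, 1.0, 100.0, 1000.0]          # hypothetical candidate grid
best_alpha = min(alpha_grid, key=lambda a: cv_mse(x, y, train_idx, a))

# Refit on the full training set and score once on the held-out test set.
slope, intercept = fit_ridge_1d([x[i] for i in train_idx],
                                [y[i] for i in train_idx], best_alpha)
test_mse = mse_on(x, y, test_idx, slope, intercept)
```

Reusing the same seed for every candidate value means all candidates are compared on identical folds, so differences in the cross-validated error reflect the hyper-parameter rather than the random partition. The test set is touched only once, after the hyper-parameter has been fixed.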

Radar plots were created to compare the actual energy efficiency values (green line) with the values predicted by each supervised machine learning model (dotted red line) and are presented in Figure 3. Figure 3a shows the radar plot of the actual and predicted values for ridge regression, which reveals that many predicted values deviate from the actual values. Similar findings are observed in the radar plots in Figure 3b–d, which show the actual and predicted values for LASSO regression, the LARS model, and the LASSO-LARS model, respectively. However, very few predicted values of the ENR model deviate from the actual values (cf. Figure 3e); therefore, the ENR model is considered the best of the five for predicting energy efficiency.

Furthermore, to discuss the contribution or relationship of each explanatory variable to the prediction of energy efficiency, a radar chart based on the regression coefficients of the ENR model (best-selected model) is presented in Figure 4. The prime findings are listed below:

Negative coefficients are observed for the variables covered area (CA), dwelling type (villa; DT1), occupancy status (owner; OS1), building year (pre-2000 and 2010–2021; CY1 and CY3), exterior wall insulation (yes; EWI1), parapet wall (yes; PW1), glazing (double clear; G1), air conditioning (split; AC1), and lights (incandescent; L4). This reveals that the predicted EUI has an inverse relation with all these variables.
The coefficients of exterior wall insulation (not known; EWI3), air conditioning (central; AC2), window frame (aluminum and steel; WF1 and WF2), and lights (fluorescent; L3) are zero, which means that they play no role in the EUI prediction of the buildings.
All remaining variables, i.e., occupants (OC), electricity bill (EB), dwelling type (apartment; DT2), occupancy status (rented; OS2), building year (2000–2010; CY2), exterior wall insulation (no; EWI2), parapet wall (no; PW2), glazing (single clear and single tinted; G2 and G3), air conditioning (window units; AC3), and lights (LED and CFL; L1 and L2), have a direct relation with the EUI prediction.

The five most significant variables identified in our study are building year (2000–2010 and 2010–2021; CY2 and CY3), dwelling type (apartment and villa; DT2 and DT1), and glazing (double clear; G1). Our analysis reveals that buildings categorized as villas, constructed between 2010 and 2021, with double clear glazing, generally exhibit lower energy use intensity (EUI) values. Conversely, apartment buildings tend to have higher EUI values. These findings are consistent with existing studies on the energy performance of buildings in Saudi Arabia [16,45], which suggest that newer buildings with better glazing tend to be more energy-efficient. In practical terms, these results highlight the importance of considering building age, type, and glazing when assessing energy efficiency.
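The reading of Figure 4 described above (negative, zero, and positive coefficients, followed by the most influential variables) can be sketched as below. The coefficient values are invented placeholders, not the values behind Figure 4, and ranking by absolute magnitude is an assumption about how "most significant" is determined; such a ranking is meaningful only when the predictors are on comparable scales.

```python
# Placeholder coefficients: invented for illustration, NOT the fitted ENR values.
coefficients = {
    "CY3": -30.0, "DT1": -25.0, "G1": -12.0, "CA": -0.05,   # negative sign
    "AC2": 0.0, "WF1": 0.0,                                 # shrunk to zero
    "CY2": 28.0, "DT2": 24.0, "OC": 3.0,                    # positive sign
}

def group_by_sign(coefs, tol=1e-12):
    """Split variables into inverse, dropped (zero), and direct relations."""
    groups = {"inverse": [], "dropped": [], "direct": []}
    for name, value in coefs.items():
        if abs(value) <= tol:
            groups["dropped"].append(name)
        elif value < 0:
            groups["inverse"].append(name)
        else:
            groups["direct"].append(name)
    return groups

def top_k_by_magnitude(coefs, k=5):
    """Return the k variable codes with the largest absolute coefficient."""
    return sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)[:k]

groups = group_by_sign(coefficients)
top_five = top_k_by_magnitude(coefficients)
```

Variables in the "dropped" group are those the L1 part of the elastic net penalty has removed from the model, which is why they contribute nothing to the predicted EUI.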

7. Conclusions and Recommendations

Energy efficiency has become critical to the sustainability of buildings. The use of energy in new as well as existing buildings can be curtailed by 30 to 80% with the help of commercially available energy efficiency technologies. Energy efficiency strategies for buildings, however, may vary from region to region depending on several factors, including climate conditions, construction details, technologies used, occupancy profile, and consumer behavior. The availability and understanding of data around these parameters are important for devising effective energy efficiency strategies. This study was designed to predict the energy efficiency of buildings in the KSA using the EUI as an index. It is based on ASHRAE level-1 energy audit data gathered from 200 residential buildings. Along with the EUI as the explained variable, several explanatory variables were also recorded, such as dwelling type, covered area, occupancy status, construction year, exterior wall insulation, parapet wall, window frame, glazing type, air conditioning system, lighting, and electricity bill. In the data preprocessing stage, responses containing outliers or missing information were removed, and the data of the remaining 184 buildings were considered for further analysis. Statistical models usually predict energy efficiency under the strong assumptions of normality, linearity, and no autocorrelation; when these assumptions are violated, machine learning models are better candidates for energy efficiency prediction. Hence, this study predicted the energy efficiency of Saudi residential buildings using five supervised machine learning algorithms: ridge regression, LASSO regression, the least angle regression (LARS) model, the Lasso-LARS model, and the elastic net regression (ENR) model. The performance of these models was evaluated through well-known performance measures, namely the MSE, RMSE, $R^{2}$, MAE, MAPE, and $r$. The findings reveal that the ENR model is the best as it shows the minimum values of the MSE, RMSE, MAE, and MAPE (i.e., 4070.93, 63.80, 44.73, and 36.15, respectively) and the maximum values of $R^{2}$ and $r$ (i.e., 0.68 and 0.83, respectively). Furthermore, based on the coefficients, building year (2000–2010 and 2010–2021; CY2 and CY3), dwelling type (apartment and villa; DT2 and DT1), and glazing (double clear; G1) are the most significant variables for the prediction of energy efficiency. Covered area (CA), dwelling type (villa; DT1), occupancy status (owner; OS1), building year (pre-2000 and 2010–2021; CY1 and CY3), exterior wall insulation (yes; EWI1), parapet wall (yes; PW1), glazing (double clear; G1), air conditioning (split; AC1), and lights (incandescent; L4) have an inverse relationship with the predicted EUI.

The building sector is being rapidly digitalized. Technologies like sensors, smart meters, building information modeling, building management systems, 3D printing, and robotics are becoming common. Effective monitoring and data analytics are imperative for the optimum utilization of these technologies. Machine learning will, therefore, continue to find new and more impactful applications in the energy performance of buildings, driven by factors such as the growing recognition of the need for improved building energy performance around the world, especially in developing countries like the KSA where it is a relatively new topic; the efforts of developed nations to enhance their existing energy efficiency standards; advancements in technology; and the availability of new datasets.

The building sector research in the KSA lacks actual/measured data-based studies, as the existing studies are predominantly simulation-based. The results of modeling/simulation-based studies tend to deviate from the actual performance of buildings due to a number of different factors, i.e., modeling parameters, quality of materials and equipment, construction practices, and user behaviors. Despite their usefulness, simulation-based studies have the issue of performance gaps. A clearer understanding of building energy performance can only be established by actual data-based studies. The present study, therefore, offers a new and prolific avenue not only to the research community but also to other building sector stakeholders, especially regulators and policymakers. Similar studies can help the development and refinement of relevant building regulations.

For future studies, larger datasets are expected to considerably improve the findings of this machine learning investigation. Larger datasets would allow the models to capture more complicated linkages and differences in energy performance across different building types and settings. Furthermore, using more advanced machine learning models, such as the Fully Adaptive Regression Model (FARM) [71], may improve prediction accuracy. With its enhanced adaptive capabilities, the FARM has the potential to handle nonlinearities and interactions in data, resulting in more exact and trustworthy predictions of building energy efficiency. This approach may also make it easier to incorporate new characteristics and variables, enhancing the models and broadening their application to a wider range of building scenarios and geographic areas. Enhanced models and more detailed data will provide more actionable insights for improving energy efficiency and guiding successful energy management methods. Concerned authorities can develop a repository of actual/measured energy data from buildings to make more practical policy frameworks. This will reduce the performance gaps between predicted and actual energy performance, helping craft more effective energy policies tailored to real-world conditions. In this respect, the widespread use of digital technologies, including smart meters, sensors, and building management systems, will be helpful. It will enable the real-time monitoring of energy use and improve the ability to manage building energy performance.

Author Contributions

Conceptualization, M.A. and T.M.; methodology, M.A. and T.M.; software, T.M.; validation, M.A. and T.M.; formal analysis, M.A. and T.M.; writing – original draft preparation, M.A. and T.M.; writing – review and editing, M.A.; visualization, T.M.; supervision, M.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge the funding from KFUPM under the project INRE2319.

Data Availability Statement

The data that support the findings of this study are available upon appropriate request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Amaripadath, D.; Rahif, R.; Zuo, W.; Velickovic, M.; Voglaire, C.; Attia, S. Climate change sensitive sizing and design for nearly zero-energy office building systems in Brussels. Energy Build. 2023, 286, 112971.
2. Sayadi, S.; Akander, J.; Hayati, A.; Cehlin, M. Analyzing the climate-driven energy demand and carbon emission for a prototype residential nZEB in central Sweden. Energy Build. 2022, 261, 111960.
3. Phillips, R.; Fannon, D.; Eckelman, M.J. Dynamic modeling of future climatic and technological trends on life cycle global warming impacts and occupant satisfaction in US office buildings. Energy Build. 2022, 256, 111705.
4. Alazazmeh, A.; Ahmed, A.; Siddiqui, M.; Asif, M. Real-time data-based performance analysis of a large-scale building applied PV system. Energy Rep. 2022, 8, 15408–15420.
5. Ahmed, W.; Alazazmeh, A.; Asif, M. Energy and Water Saving Potential in Commercial Buildings: A Retrofit Case Study. Sustainability 2023, 15, 518.
6. Asif, M.; Ahmed, W.; Alazazmeh, A. Energy performance assessment of a post-retrofit office building using measurement and verification protocol: A case study from KSA. Energy Rep. 2023, 9, 1366–1379.
7. Pandey, A.; Asif, M. Assessment of energy and environmental sustainability in South Asia in the perspective of the Sustainable Development Goals. Renew. Sustain. Energy Rev. 2022, 165, 112492.
8. Asif, M. The 4Ds of Energy Transition: Decarbonization, Decentralization, Decreasing Use, and Digitalization; John Wiley & Sons: Hoboken, NJ, USA, 2022.
9. Ahmed, W.; Asif, M. A critical review of energy retrofitting trends in residential buildings with particular focus on the GCC countries. Renew. Sustain. Energy Rev. 2021, 144, 111000.
10. Alazazmeh, A.; Asif, M. Commercial building retrofitting: Assessment of improvements in energy performance and indoor air quality. Case Stud. Therm. Eng. 2021, 26, 100946.
11. Asif, M. Handbook of Energy Transitions; CRC Press: Boca Raton, FL, USA, 2022.
12. Asif, M. Growth and sustainability trends in the buildings sector in the GCC region with particular reference to the KSA and UAE. Renew. Sustain. Energy Rev. 2016, 55, 1267–1273.
13. Ahmed, W.; Asif, M.; Alrashed, F. Application of building performance simulation to design energy-efficient homes: Case study from Saudi Arabia. Sustainability 2019, 11, 6048.
14. Alrashed, F.; Asif, M. Analysis of critical climate related factors for the application of zero-energy homes in Saudi Arabia. Renew. Sustain. Energy Rev. 2015, 41, 1395–1403.
15. Alrashed, F.; Asif, M.; Burek, S. The role of vernacular construction techniques and materials for developing zero-energy homes in various desert climates. Buildings 2017, 7, 17.
16. Al-Homoud, M.S.; Krarti, M. Energy efficiency of residential buildings in the kingdom of Saudi Arabia: Review of status and future roadmap. J. Build. Eng. 2021, 36, 102143.
17. Syed, A.T.; Abdou, A.A. A model of a near-zero energy home (NZEH) using passive design strategies and PV technology in hot climates. J. Green Build. 2016, 11, 38–70.
18. Ahmed, W.; Asif, M. BIM-based techno-economic assessment of energy retrofitting residential buildings in hot humid climate. Energy Build. 2020, 227, 110406.
19. Olu-Ajayi, R.; Alaka, H.; Sulaimon, I.; Sunmola, F.; Ajayi, S. Machine learning for energy performance prediction at the design stage of buildings. Energy Sustain. Dev. 2022, 66, 12–25.
20. Alcácer, V.; Cruz-Machado, V. Scanning the industry 4.0: A literature review on technologies for manufacturing systems. Eng. Sci. Technol. Int. J. 2019, 22, 899–919.
21. Liu, W.; Shen, Y.; Aungkulanon, P.; Ghalandari, M.; Le, B.N.; Alviz-Meza, A.; Cárdenas-Escrocia, Y. Machine learning applications for photovoltaic system optimization in zero green energy buildings. Energy Rep. 2023, 9, 2787–2796.
22. Tien, P.W.; Wei, S.; Darkwa, J.; Wood, C.; Calautit, J.K. Machine learning and deep learning methods for enhancing building energy efficiency and indoor environmental quality – A review. Energy AI 2022, 10, 100198.
23. Yang, S.; Chen, W.; Wan, M.P. A machine-learning-based event-triggered model predictive control for building energy management. Build. Environ. 2023, 233, 110101.
24. Ali, A.; Jayaraman, R.; Mayyas, A.; Alaifan, B.; Azar, E. Machine learning as a surrogate to building performance simulation: Predicting energy consumption under different operational settings. Energy Build. 2023, 286, 112940.
25. Alrashed, F.; Asif, M. Trends in residential energy consumption in Saudi Arabia with particular reference to the Eastern Province. J. Sustain. Dev. Energy Water Environ. Syst. 2014, 2, 376–387.
26. Nahiduzaman, K.; Al-Dosary, A.; Abdallah, A.; Asif, M.; Kua, H.; Alqadhib, A. Change-agents driven interventions for energy conservation at the Saudi households: Lessons learnt. J. Clean. Prod. 2018, 185, 998–1014.
27. Sahin, A.Z.; Rehman, S. Economical feasibility of utilizing photovoltaics for water pumping in Saudi Arabia. Int. J. Photoenergy 2012, 2012, 542416.
28. Mujeebu, M.A.; Alshamrani, O.S. Prospects of energy conservation and management in buildings–The Saudi Arabian scenario versus global trends. Renew. Sustain. Energy Rev. 2016, 58, 1647–1663.
29. UNEP. Buildings and Climate Change: Summary for Decision Makers. 2009. Available online: https://europa.eu/capacity4dev/unep/document/buildings-and-climate-change-summary-decision-makers (accessed on 20 March 2023).
30. IEA. Energy Efficiency. 2019. Available online: https://www.iea.org/reports/energy-efficiency-2019 (accessed on 20 March 2023).
31. Kontokosta, C.E.; Tull, C. A data-driven predictive model of city-scale energy use in buildings. Appl. Energy 2017, 197, 303–317.
32. Lim, H.; Zhai, Z.J. Comprehensive evaluation of the influence of meta-models on Bayesian calibration. Energy Build. 2017, 155, 66–75.
33. Ibrahim, M.; Zhang, C.; Mahmood, T. Surveillance of high-yield processes using deep learning models. Qual. Reliab. Eng. Int. 2024.
34. Deng, H.; Fannon, D.; Eckelman, M.J. Predictive modeling for US commercial building energy use: A comparison of existing statistical and machine learning algorithms using CBECS microdata. Energy Build. 2018, 163, 34–43.
35. Papadopoulos, S.; Kontokosta, C.E. Grading buildings on energy performance using city benchmarking data. Appl. Energy 2019, 233, 244–253.
36. Wang, W.; Hong, T.; Xu, X.; Chen, J.; Liu, Z.; Xu, N. Forecasting district-scale energy dynamics through integrating building network and long short-term memory learning algorithm. Appl. Energy 2019, 248, 217–230.
37. Abbasabadi, N.; Ashayeri, M.; Azari, R.; Stephens, B.; Heidarinejad, M. An integrated data-driven framework for urban energy use modeling (UEUM). Appl. Energy 2019, 253, 113550.
38. Xu, X.; Wang, W.; Hong, T.; Chen, J. Incorporating machine learning with building network analysis to predict multi-building energy use. Energy Build. 2019, 186, 80–97.
39. Mohammadiziazi, R.; Bilec, M.M. Application of machine learning for predicting building energy use at different temporal and spatial resolution under climate change in USA. Buildings 2020, 10, 139.
40. Jia, B.; Hou, D.; Kamal, A.; Hassan, I.; Wang, L. Developing machine-learning meta-models for high-rise residential district cooling in hot and humid climate. J. Build. Perform. Simul. 2022, 15, 553–573.
41. Li, X.; Yao, R. Modelling heating and cooling energy demand for building stock using a hybrid approach. Energy Build. 2021, 235, 110740.
42. Ding, Y.; Fan, L.; Liu, X. Analysis of feature matrix in machine learning algorithms to predict energy consumption of public buildings. Energy Build. 2021, 249, 111208.
43. Jin, X.; Xiao, F.; Zhang, C.; Chen, Z. Semi-supervised learning based framework for urban level building electricity consumption prediction. Appl. Energy 2022, 328, 120210.
44. Jiang, F.; Ma, J.; Li, Z.; Ding, Y. Prediction of energy use intensity of urban buildings using the semi-supervised deep learning model. Energy 2022, 249, 123631.
45. Jin, X.; Xiao, F.; Zhang, C.; Li, A. GEIN: An interpretable benchmarking framework towards all building types based on machine learning. Energy Build. 2022, 260, 111909.
46. Singh, M.M.; Deb, C.; Geyer, P. Early-stage design support combining machine learning and building information modelling. Autom. Constr. 2022, 136, 104147.
47. Seyedzadeh, S.; Rahimian, F.P.; Glesk, I.; Roper, M. Machine learning for estimation of building energy consumption and performance: A review. Vis. Eng. 2018, 6, 5.
48. Sun, Y.; Haghighat, F.; Fung, B.C. A review of the-state-of-the-art in data-driven approaches for building energy prediction. Energy Build. 2020, 221, 110022.
49. Rau, F.; Soto, I.; Zabala-Blanco, D.; Azurdia-Meza, C.; Ijaz, M.; Ekpo, S.; Gutierrez, S. A novel traffic prediction method using machine learning for energy efficiency in service provider networks. Sensors 2023, 23, 4997.
50. Egwim, C.N.; Alaka, H.; Egunjobi, O.O.; Gomes, A.; Mporas, I. Comparison of machine learning algorithms for evaluating building energy efficiency using big data analytics. J. Eng. Des. Technol. 2024, 22, 1325–1350.
51. Sapnken, F.E.; Hamed, M.M.; Soldo, B.; Tamba, J.G. Modeling energy-efficient building loads using machine-learning algorithms for the design phase. Energy Build. 2023, 283, 112807.
52. Aderibigbe, A.O.; Ani, E.C.; Ohenhen, P.E.; Ohalete, N.C.; Daraojimba, D.O. Enhancing energy efficiency with ai: A review of machine learning models in electricity demand forecasting. Eng. Sci. Technol. J. 2023, 4, 341–356.
53. Nur-E-Alam, M.; Mostofa, K.Z.; Yap, B.K.; Basher, M.K.; Islam, M.A.; Vasiliev, M.; Soudagar, M.E.M.; Das, N.; Kiong, T.S. Machine learning-enhanced all-photovoltaic blended systems for energy-efficient sustainable buildings. Sustain. Energy Technol. Assess. 2024, 62, 103636.
54. Ali, U.; Bano, S.; Shamsi, M.H.; Sood, D.; Hoare, C.; Zuo, W.; Hewitt, N.; O’Donnell, J. Urban building energy performance prediction and retrofit analysis using data-driven machine learning approach. Energy Build. 2024, 303, 113768.
55. Das, H.P.; Lin, Y.-W.; Agwan, U.; Spangher, L.; Devonport, A.; Yang, Y.; Drgoňa, J.; Chong, A.; Schiavon, S.; Spanos, C.J. Machine learning for smart and energy-efficient buildings. Environ. Data Sci. 2024, 3, e1.
56. Boutahri, Y.; Tilioua, A. Machine learning-based predictive model for thermal comfort and energy optimization in smart buildings. Results Eng. 2024, 22, 102148.
57. Christensen, P.; Francisco, P.; Myers, E.; Shao, H.; Souza, M. Energy efficiency can deliver for climate policy: Evidence from machine learning-based targeting. J. Public Econ. 2024, 234, 105098.
58. Cui, X.; Lee, M.; Koo, C.; Hong, T. Energy consumption prediction and household feature analysis for different residential building types using machine learning and SHAP: Toward energy-efficient buildings. Energy Build. 2024, 309, 113997.
59. Yussuf, R.O.; Asfour, O.S. Applications of artificial intelligence for energy efficiency throughout the building lifecycle: An overview. Energy Build. 2024, 305, 113903.
60. Arowoiya, V.A.; Moehler, R.C.; Fang, Y. Digital twin technology for thermal comfort and energy efficiency in buildings: A state-of-the-art and future directions. Energy Built Environ. 2024, 5, 641–656.
61. Riaz, M.; Alshammari, H.; Abbas, N.; Mahmood, T. Navigating process drift: The power of CUSUM in monitoring air quality processes and maintenance operations. Arab. J. Sci. Eng. 2024.
62. Marquardt, D.W.; Snee, R.D. Ridge regression in practice. Am. Stat. 1975, 29, 3–20.
63. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
64. Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Stat. 2004, 32, 407–499.
65. Januaviani, T.M.A.; Gusriani, N.; Joebaedi, K.; Supian, S.; Subiyanto, S. The best model of LASSO with the LARS (least angle regression and shrinkage) algorithm using Mallow’s Cp. World Sci. News 2019, 116, 245–252.
66. Zhang, L.; Li, K. Forward and backward least angle regression for nonlinear system identification. Automatica 2015, 53, 94–102.
67. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2005, 67, 301–320.
68. Mahmood, T.; Ahmed, F.; Riaz, M.; Abbas, N. High-dimensional control charts with application to surveillance of grease damage in bearings of wind turbines. Prod. Manuf. Res. 2024, 12, 2377739.
69. Ahmed, F.; Mahmood, T.; Riaz, M.; Abbas, N. Comprehensive review of high-dimensional monitoring methods: Trends, insights, and interconnections. Qual. Technol. Quant. Manag. 2024.
70. Rodriguez, J.D.; Perez, A.; Lozano, J.A. Sensitivity analysis of k-fold cross validation in prediction error estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 569–575.
71. Calik, N.; Güneş, F.; Koziel, S.; Pietrenko-Dabrowska, A.; Belen, M.A.; Mahouti, P. Deep-learning-based precise characterization of microwave transistors using fully-automated regression surrogates. Sci. Rep. 2023, 13, 1445.

Figure 1. Energy saving potential in buildings.

Figure 2. A paradigm of the processes used to implement predictive modeling.

Figure 3. Radar plots of actual (presented by the green line) and predicted values (shown by the dotted red line) from (a) ridge regression, (b) LASSO regression, (c) LARS model, (d) LASSO-LARS model, and (e) ENR model.

Figure 4. A radar chart based on the coefficients of the ENR model. Zero is marked by the green line, and the regression coefficients are shown by the dotted red line.

Table 1. Descriptive statistics of dataset deployed for model development.

Categorical Variable	Code	Category	Frequency	Percentage
Dwelling type	DT1	Villa	144	78.26%
Dwelling type	DT2	Apartment	40	21.74%
Occupancy status	OS1	Owner	143	77.72%
Occupancy status	OS2	Rented	41	22.28%
Construction year	CY1	Unknown	68	36.96%
	CY2	2000–2010	43	23.37%
	CY3	2010–2021	73	39.67%
Exterior wall insulation	EWI1	Yes	109	59.24%
	EWI2	No	45	24.46%
	EWI3	Unknown	30	16.30%
Parapet wall	PW1	Yes	146	79.35%
Parapet wall	PW2	No	38	20.65%
Window frame	WF1	Aluminum	173	94.02%
Window frame	WF2	Steel	11	5.98%
Glazing	G1	Double Clear	122	66.30%
	G2	Single Clear	37	20.11%
	G3	Single Tinted	25	13.59%
Air conditioning (AC) system	AC1	Split	125	67.93%
	AC2	Central	31	16.85%
	AC3	Window Units	28	15.22%
Lighting	L1	LED	149	80.98%
	L2	CFL	13	7.07%
	L3	Fluorescent	12	6.52%
	L4	Incandescent	10	5.43%
Numeric Variable	Code	Min	Max	Mean ± S.D
Covered area (m²)	CA	100	2000	$472.53 \pm 320.10$
Occupants (count)	OC	2	35	$7.03 \pm 4.21$
Electricity bill (SAR)	EB	55	3000	$888.66 \pm 552.23$
Energy use intensity (EUI) (kWh/m²)	EUI	16	524	$219.98 \pm 115.07$

Table 2. Selected hyper-parameters and performance evaluation of models.

Model	Hyper-Parameter	Tuning Values	$M S E$	$R M S E$	$R^{2}$	$M A E$	$M A P E$	$r$
Ridge regression	Regularization parameter $(α)$	$α = 1000$	4593.80	67.78	0.64	47.79	38.45	0.80
LASSO regression	Regularization parameter $(α)$	$α = 100$	4636.62	68.09	0.64	48.08	38.56	0.80
LARS model	Regularization parameter $(α)$	$α = 1.687$	4393.76	66.29	0.66	48.67	37.68	0.83
Lasso-LARS model	Regularization parameter $(α)$	$α = 1.081$	4207.14	64.86	0.67	47.14	36.87	0.82
ENR model	Compromise between l1 and l2 penalization ( $α$ ). Amount of penalization chosen ( $λ$ ).	$α = 0.88$ $λ = 1$	4070.93	63.80	0.68	44.73	36.15	0.83
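The entries of Table 2 can be cross-checked with a short script: the RMSE column should equal the square root of the MSE column up to rounding, and the model selection rule (minimum error measures, maximum $R^{2}$ and $r$) can be applied mechanically. The dictionary below copies the numbers from Table 2; the variable names are illustrative.

```python
import math

# Columns: (MSE, RMSE, R2, MAE, MAPE, r), copied from Table 2.
table2 = {
    "Ridge":      (4593.80, 67.78, 0.64, 47.79, 38.45, 0.80),
    "LASSO":      (4636.62, 68.09, 0.64, 48.08, 38.56, 0.80),
    "LARS":       (4393.76, 66.29, 0.66, 48.67, 37.68, 0.83),
    "Lasso-LARS": (4207.14, 64.86, 0.67, 47.14, 36.87, 0.82),
    "ENR":        (4070.93, 63.80, 0.68, 44.73, 36.15, 0.83),
}

# Gap between sqrt(MSE) and the reported RMSE for each model.
rmse_gap = {name: abs(math.sqrt(row[0]) - row[1]) for name, row in table2.items()}

# Model with the smallest value of each error measure.
error_columns = {"MSE": 0, "RMSE": 1, "MAE": 3, "MAPE": 4}
best_by_error = {
    metric: min(table2, key=lambda name: table2[name][col])
    for metric, col in error_columns.items()
}

# Model with the largest R2, and all models sharing the largest r.
best_r2 = max(table2, key=lambda name: table2[name][2])
max_r = max(row[5] for row in table2.values())
top_r = [name for name, row in table2.items() if row[5] == max_r]
```

Running the selection rule this way makes ties visible: the ENR model is the unique minimizer of all four error measures and the unique maximizer of $R^{2}$, while its $r$ of 0.83 is matched by the LARS model.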

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
