Energy Demand Forecasting Scenarios for Buildings Using Six AI Models

Salem, Khaled M.; Rey-Martínez, Francisco J.; Elgharib, A. O.; Rey-Hernández, Javier M.

doi:10.3390/app15158238


by Khaled M. Salem ^1,2,3, Francisco J. Rey-Martínez ^1,3,4, A. O. Elgharib ^1,2 and Javier M. Rey-Hernández ^1,5,6,*

¹ GIRTER Research Group, Consolidated Research Unit (UIC053) of Castile and Leon, 47002 Valladolid, Spain
² Department of Basic and Applied Science Engineering, Arab Academy for Science, Technology and Maritime Transport (Smart Village Campus), Smart Village, Giza 12577, Egypt
³ Department of Energy and Fluid Mechanics, Engineering School (EII), University of Valladolid (UVa), 47002 Valladolid, Spain
⁴ Institute of Advanced Production Technologies (ITAP), University of Valladolid (UVa), 47002 Valladolid, Spain
⁵ Department of Mechanical Engineering, Fluid Mechanics and Thermal Engines, Engineering School, University of Malaga (UMa), 29016 Málaga, Spain
⁶ RE+ Research Group (TEP1003), University of Málaga (UMa), 29010 Málaga, Spain
^* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(15), 8238; https://doi.org/10.3390/app15158238

Submission received: 7 July 2025 / Revised: 21 July 2025 / Accepted: 23 July 2025 / Published: 24 July 2025


Abstract

Understanding and forecasting energy consumption patterns is crucial for improving energy efficiency and human well-being, especially in countries with diverse building infrastructure such as Spain. This research addresses a significant gap in energy demand forecasting across three building types by comparing six machine learning algorithms: Artificial Neural Networks, Random Forest, XGBoost, Radial Basis Function Network, Autoencoder, and Decision Trees. The primary aim is to identify the most effective model for predicting energy consumption based on historical data, contributing to the understanding of the relationship between energy systems and urban well-being. The study emphasizes challenges in energy use and advocates for sustainable management practices. By forecasting energy demand over the next three years using linear regression, it provides actionable insights for energy providers, enhancing resilience in urban environments impacted by climate change. The findings deepen our understanding of energy dynamics across various building types and promote a sustainable energy future. Stakeholders will receive targeted recommendations for aligning energy production with consumption trends while meeting environmental responsibilities. Model performance is rigorously evaluated using metrics such as Root Mean Squared Percentage Error (RMSPE) and Coefficient of Determination (R²), ensuring robust analysis. Training times for models in the LUCIA building ranged from 2 to 19 s, with the Decision Tree model showing the shortest times, highlighting the need to balance computational efficiency with model performance.

Keywords:

sustainability; nZEB; AI integration; energy demand; built environment; smart building

1. Introduction

Energy demand is one of the key determinants of a nation’s economic stability and environmental sustainability. Spain’s diverse climate, which ranges from scorching summers to frigid winters, significantly influences patterns of energy use, mostly in the residential and commercial sectors. Heating demand rises in winter and cooling demand rises in summer as temperatures swing between cold and hot extremes, placing a significant strain on energy supplies. Understanding these dynamics makes it possible to manage energy resources effectively, securing an adequate supply in advance with the least environmental damage. Furthermore, because Spain has committed to using renewable energy sources, accurate energy demand forecasting can improve grid integration and advance sustainability [1,2,3,4].

Energy demand forecasting for the next three years will be key in the development of strategies that can balance energy supply with environmental sustainability. It will also, by accurately projecting consumption trends, enable energy providers to plan effectively for renewable energy integration, reducing dependence on fossil fuels and minimizing carbon emissions. This foresight enables optimal resource allocation to ensure energy systems meet peak demands without overloading the environment. In addition, effective demand forecasting of energy will provide economic stability, credible supply, and thus a cleaner and more sustainable energy future [5,6,7].

1.1. Literature Review

Kialashaki and Reisel [8] developed predictive models for energy demand in the residential sector of the United States using Artificial Neural Networks (ANNs) and multiple linear regression (MLR). They sought to forecast energy consumption trends up to 2030 based on various socio-economic factors, compared the effectiveness of ANN and MLR models in capturing energy demand dynamics, and aimed to provide insights for energy planning and policy-making. Ekici and Aksoy [9] studied the prediction of heating energy requirements for buildings using Artificial Neural Networks. Their paper modeled energy demand with a backpropagation neural network using key driving factors such as orientation, insulation thickness, and transparency ratio. It assessed the precision of ANN forecasts against the traditional method, reported that the ANN provides reliable energy consumption forecasts, and outlined the role of ANNs in promoting energy efficiency in building design. ANNs thus show potential as an effective instrument for improving the energy efficiency of building design [10,11,12]. Caceres et al. [13] developed a predictive model for household energy demand using the Random Forest algorithm within a Big Data framework. They sought to combine socioeconomic and meteorological data to enhance the accuracy of energy consumption forecasts at different temporal resolutions (weekly, daily, and hourly). The study emphasized the importance of socioeconomic factors in understanding consumer behavior and aimed to provide a scalable solution for real-time energy management in smart grid systems. Socioeconomic segmentation improves the accuracy of energy demand forecasts and consequently enables resource conservation and well-informed energy planning [14,15,16]. Zhang et al.
[17] proposed the development of a unified long-term forecasting model for energy consumption and peak power demand utilizing the Sequential-XGBoost algorithm. This study addressed the complexities of conducting separate forecasts by integrating macroeconomic and climatic factors, thereby enhancing accuracy. The study also explored different sequential configurations to optimize predictions for both energy and peak demand. Ultimately, it aimed to improve power system planning and management through more effective forecasting methodologies. Ghods and Kalantar [18] discussed the use of Radial Basis Function Neural Networks (RBFNs) for long-term peak demand forecasting in Iran, targeting predictions from 2007 to 2011. The study emphasized the impact of economic factors over weather conditions in long-term forecasting, highlighting the importance of variables like Gross Domestic Product (GDP), population, and electricity prices. The study demonstrated that RBFNs can provide more accurate predictions compared to traditional methods, with an average annual load growth rate of about 5.35%. Overall, it aimed to improve the reliability of load forecasts for better power system planning. Long-term forecasting is mostly influenced by economic factors, while short-term forecasting is predominantly influenced by weather occurrences [19,20]. Fan et al. [21] investigated the use of Autoencoders for unsupervised anomaly detection in building energy data, addressing the challenges of identifying operational faults without labeled data. They proposed an ensemble method combining various Autoencoder architectures and training schemes, emphasizing their ability to capture intrinsic characteristics of energy consumption patterns. The study evaluated performance through reconstruction residuals and the quality of high-level features extracted. The results indicate that Autoencoders can effectively identify anomalies, offering a data-driven solution for building energy management. 
Ramos et al. [22] explored the use of Decision Trees to select the most suitable forecasting algorithm, either Artificial Neural Networks or K-Nearest Neighbors, for electricity consumption in an office building, based on five-minute data intervals. They evaluated the accuracy of these selections by analyzing various parameters, including sensor data. The findings indicate that Decision Tree depth influences selection accuracy, with K-Nearest Neighbors being favored during weekdays due to more reliable sensor data. Different modeling approaches for electricity use prediction have also been presented [23,24,25,26,27]. Tso and Yau [28] compared three modeling techniques for electricity consumption forecasting in Hong Kong: regression analysis, Decision Trees, and neural networks. The authors showed that both Decision Trees and neural networks are feasible alternatives to classical regression models, especially in capturing energy consumption patterns across seasons. The three models performed comparably overall, with Decision Trees slightly better in summer and neural networks slightly better in winter. The main factors found to influence energy consumption were household size, appliance ownership, and housing type. The considerable interest in energy management thus underlines the importance of diverse modeling approaches. Forecasting energy demand with linear regression involves collecting historical energy consumption data along with relevant predictors, fitting a linear regression model to the training data, evaluating its performance using metrics such as R-squared and Mean Squared Error, and then using the model to make future predictions from new input data [29,30,31,32]. Table 1 summarizes the literature review on energy demand forecasting models. The literature on energy demand forecasting reveals several key research gaps.
First, many models do not demonstrate real-time applicability, limiting their ability to respond immediately to changes in energy demand, which is necessary for operational HVAC management. Second, socio-economic variables, which broadly influence energy usage including HVAC usage and efficiency, receive little consideration in energy demand forecasting. Third, the potential advantage of using multiple forecasting methods simultaneously has not been explored; doing so may improve forecast accuracy and produce a robust model that better captures energy demand and its interactions with HVAC systems. Closing these gaps would support a more holistic approach and more effective methodologies that can inform future energy planning and policy development and help improve HVAC system performance within contemporary energy management.

1.2. Contributions

This paper aims to create a reliable energy demand forecasting system for Spain by employing six advanced machine learning algorithms to identify the most effective model for predicting energy consumption patterns. To determine the most suitable approach for current demand trends, the study evaluates the performance of Artificial Neural Networks, Random Forest, XGBoost, Radial Basis Network, Autoencoder, and Decision Trees using historical energy data. The selected model will then be utilized to project energy demand for the next three years based on input data and will provide practical recommendations to energy providers regarding infrastructure planning and resource management. The main contributions of this paper can be summarized as follows:

The aim of this research is to develop a reliable energy demand forecasting system for Spain.
Six advanced machine learning algorithms—Artificial Neural Networks, Random Forest, XGBoost, Radial Basis Network, Autoencoder, and Decision Trees—are evaluated.
The models are tested using historical energy consumption data to identify the most accurate one.
The best model will be used to forecast energy demand for the next three years.
The study aims to provide strategic recommendations to energy providers for resource management and infrastructure development.

2. Methodology

2.1. Data Collection

The dataset from LUCIA Valladolid includes all variations in energy demand for HVAC systems over a year and corresponds to 8760 h as shown in Figure 1a. This dataset will be very useful in showing the variation in energy consumption due to seasonal changes, daily patterns, and other operational conditions. With this broad dataset, researchers and energy managers will be able to find peak demand periods, evaluate the efficiency of systems, and create strategies for energy optimization in HVAC systems. This could also be helpful in predictive modeling and implementation of demand response strategies. The FUHEM building dataset, containing 100 data points collected randomly, refers to heating energy demand in Madrid as shown in Figure 1b. This dataset gives insight into how various factors influence energy consumption for heating, such as building design, occupancy patterns, and external weather conditions. With the analysis of these data points, stakeholders will have a better understanding of the thermal performance of buildings in urban environments—something quite critical to enhancing energy efficiency. The findings deduced from this dataset will provide insightful suggestions for retrofitting strategies and designing more sustainable buildings. Complementing the analysis of the FUHEM building, the EII dataset includes 260 random data points related to both heating and cooling demands as shown in Figure 1c. This dataset provides a broader perspective on energy consumption across different types of buildings and usage scenarios. By examining both heating and cooling demands, researchers can identify trends and correlations that are essential for developing integrated energy management solutions. The insights gained from the EII dataset can support the design of heating and cooling systems that are more adaptable to varying climate conditions and user requirements [33].

2.2. Data Preprocessing

In the data preprocessing phase, since there are no missing values or outliers in the datasets, the focus is on preparing the data for analysis by arranging it into a suitable format. The datasets—comprising hourly energy demand data from the LUCIA dataset, the limited data from FUHEM, and the combined heating and cooling demands from the EII building—are organized to ensure consistency across different sources. This involves aligning the timeframes, ensuring that timestamps are correctly synchronized, and verifying that all relevant variables are correctly labeled and structured for further analysis. Once the data is properly organized, the next step involves exploring the datasets to understand inherent patterns and relationships. This includes visualizing the demand profiles over time, identifying peak usage periods, and examining seasonal or operational trends. By carefully preparing and analyzing the data in this way, meaningful insights can be gained regarding the interactions between heating and cooling demands, which can inform strategies for improving energy efficiency and optimizing building management systems.

Using deep learning models on very small datasets like FUHEM (100 points) and EII (260 points) is a significant challenge, typically addressed through strategic methodological choices. Justification centers on the specific deep learning approach, often involving a simple architecture and, crucially, transfer learning from a larger, relevant pre-trained model to leverage existing learned features. To prevent severe overfitting, aggressive regularization techniques such as dropout, L1/L2 penalties, and early stopping are essential. Furthermore, the definition of random sampling is critical: it must clarify how the chosen points, despite their scarcity, were ensured to be representative of the full data’s diversity, particularly in addressing inherent seasonality by distributing samples across different months and time periods, potentially augmented by feature engineering for time-based attributes. Finally, rigorous evaluation using methods like K-Fold Cross-Validation and comparison against traditional machine learning baselines is vital to demonstrate the model’s performance and assess its limited, but justifiable, generalization capabilities [34,35,36].

Data Normalization

The LUCIA dataset from Valladolid consists of 8760 hourly measurements of energy demand from an HVAC system throughout the year, so its values must be comparable over time and under different conditions. Applying min–max scaling standardizes the values of energy demand, allowing for clear observation of trends, peak usage times, and consumption patterns. This process is crucial for understanding how the HVAC system performs under various circumstances and can inform strategies for energy efficiency improvements. The FUHEM building in Madrid provides a dataset of 100 random data points related to heating demands. As with the LUCIA dataset, normalization through min–max scaling is essential to account for variations arising from occupancy and outdoor temperatures. This technique adjusts the values to make them more comparable, enabling better analysis of heating performance over time. Consequently, insights drawn from the data are more reliable and relevant for optimizing the building’s heating processes. In the EII building, the dataset includes 260 random combined data points for heating and cooling needs. Normalization using min–max scaling facilitates the analysis of interactions between heating and cooling demands across various seasons. By standardizing the measurements, it becomes easier to identify relationships between the two demands, which further assists in uncovering trends and efficiencies [37,38,39]. This holistic approach to normalization supports better decision-making for energy management and contributes to more sustainable building operations. The energy demand forecasting workflow is shown in Figure 2.
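
As an illustration of the min–max scaling described above, the following minimal Python sketch rescales a demand series onto [0, 1] and maps it back to the original units. The function names and the sample values are hypothetical and are not taken from the study's datasets or its MATLAB code.

```python
def min_max_scale(values):
    """Rescale a sequence of numbers linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant series: there is no spread to rescale, so map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def min_max_inverse(scaled, lo, hi):
    """Map scaled values back to the original units (e.g. kWh/m²)."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical demand values (illustrative only); negative values stand for cooling
demand = [-40.0, 0.0, 60.0, 160.0]
scaled = min_max_scale(demand)
restored = min_max_inverse(scaled, min(demand), max(demand))
```

In practice the minimum and maximum would be taken from the training set only and reused for the validation and test sets, so that no information from held-out data leaks into the scaling.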

2.3. Model Development

Applying AI to energy demand forecasting requires robust predictive models that can estimate demand. ANNs are very effective here because they capture the complicated nonlinear relations inherent in the data. Training ANN models on historical energy consumption data reveals patterns that provide useful insights into future demand. In addition, ensemble methods such as Random Forest and XGBoost enhance predictive accuracy by combining the results of a large number of Decision Trees. These models handle large datasets with many features, which makes them suitable for the variety of factors that may influence energy demand, such as temperature fluctuations, occupancy rates, and building characteristics. Other useful approaches include Radial Basis Networks and Autoencoders, which have their own advantages in energy demand modeling. RBF networks use radial basis functions for activation, which are very effective in capturing local patterns in the data and, hence, useful in specific energy consumption scenarios. Autoencoders are especially useful for feature extraction and dimensionality reduction. By training an Autoencoder on energy demand data, we can find the latent features contributing to fluctuations in demand, which can then be fed into other models to improve forecasting accuracy [40,41,42]. Decision Trees, including advanced variants such as Gradient Boosted Trees, remain highly relevant in energy demand forecasting due to their interpretability and ease of use. By splitting the data into subsets based on feature values, Decision Trees show clearly how various factors contribute to energy demand. Linear Regression models can make simple predictions, which are easy to interpret and present, by considering historical trends and integrating external factors.
While linear regression does not capture complex nonlinear relationships as well as more advanced AI techniques, it is a good benchmark and has often been used in conjunction with more sophisticated models to validate results [43,44].
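
To make the linear-regression benchmark concrete, the sketch below fits a straight line to a short annual series by ordinary least squares and extrapolates it three years ahead. The years and demand values are invented for illustration, and the helper name `fit_line` is hypothetical; the study's actual regression inputs are not shown in this section.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope: change in demand per year
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical annual demand totals (illustrative numbers only)
years = [2021, 2022, 2023, 2024]
demand = [100.0, 104.0, 108.0, 112.0]

a, b = fit_line(years, demand)
# Extrapolate the fitted trend over the next three years
forecast = {yr: a + b * yr for yr in (2025, 2026, 2027)}
```

A trend line of this kind assumes that the historical rate of change continues unchanged, which is why it serves as a baseline rather than as a replacement for the nonlinear models.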

2.3.1. Mathematical Model (ANN)

Artificial neural networks forecast energy demand by learning complex relationships between input characteristics and energy use. The design consists of interconnected layers, beginning with the input layer, which simply receives raw data such as past energy consumption, temperature, and economic indicators. The mathematical model for this process, described in Equations (1)–(16), shows how data pass through the hidden layers, whose neurons are interconnected by weighted connections and activation functions that can capture even fine patterns. Finally, an estimate of future energy demand is produced. Table 2 also shows how the ANN is trained on historical data, with its weights adjusted to minimize prediction error and generalize well to new data, thereby providing significant insights to energy suppliers and policy makers [45].
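
As a reading aid, the textbook form of a feedforward pass with a mean squared error objective is sketched here. The notation (weights W, biases b, activation f, L layers, N samples) is an assumption and may differ from the symbols used in Equations (1)–(16) and Table 2.

```latex
\begin{aligned}
\mathbf{a}^{(0)} &= \mathbf{x}, \\
\mathbf{a}^{(l)} &= f\!\left(\mathbf{W}^{(l)}\mathbf{a}^{(l-1)} + \mathbf{b}^{(l)}\right), \quad l = 1,\dots,L-1, \\
\hat{y} &= \mathbf{W}^{(L)}\mathbf{a}^{(L-1)} + \mathbf{b}^{(L)}, \\
\mathrm{MSE} &= \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 .
\end{aligned}
```

Training adjusts the weights and biases to reduce the MSE on the training set, while the validation set is used to decide when to stop.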

2.3.2. Mathematical Model Random Forest (RF)

Random Forest is an ensemble learning approach for forecasting energy demand that builds several Decision Trees from random selections of the training data and variables. Each tree independently makes its own forecast from inputs such as weather, socioeconomic factors, and past consumption patterns. The forecasts of the individual trees are then averaged to give the final Random Forest forecast, which enhances accuracy and reduces overfitting; the mathematical representation of this averaging process is given by Equations (17)–(22). Moreover, as seen in Table 3, Random Forest can also give insight into feature relevance, which can help an energy provider determine which factors influence demand most [46].
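
The averaging idea can be sketched in a few lines of plain Python. This is a deliberately reduced illustration, not the study's implementation: each "tree" is a single-split regression stump, each stump is fitted on a bootstrap resample, and the forest prediction is the mean of the stump predictions. Because the toy data have only one feature, the random feature selection of a full Random Forest is omitted, so the sketch amounts to bagged stumps. All names and data values are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single feature (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:
        # Resample contains a single distinct x: predict the mean everywhere
        m = sum(ys) / len(ys)
        return (xs[0], m, m)
    return best[1:]  # (threshold, left mean, right mean)


def predict_stump(stump, x):
    t, ml, mr = stump
    return ml if x <= t else mr


def fit_forest(xs, ys, n_trees=100, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Forest forecast = average of the individual tree forecasts."""
    return sum(predict_stump(s, x) for s in forest) / len(forest)


# Hypothetical data: outdoor temperature (°C) vs heating demand
temps = [0, 2, 4, 6, 8, 10, 12, 14]
heat = [9.0, 8.5, 8.0, 7.0, 3.0, 2.5, 2.0, 1.0]
forest = fit_forest(temps, heat)
cold, mild = predict_forest(forest, 1), predict_forest(forest, 13)
```

Averaging over many resampled trees smooths out the hard step of any single stump, which is the variance-reduction effect the paragraph above refers to.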

2.3.3. Mathematical Model Extreme Gradient Boosting (XGBoost)

XGBoost is an advanced gradient boosting algorithm well-suited for energy prediction on large datasets with complicated interactions. It builds Decision Trees sequentially, such that each successive tree tries to correct the errors of the previous ones; the energy demand equations are given by Equations (23)–(28) and shown in Table 4. Each boosting step minimizes a loss function, which improves performance. The model captures nonlinear relationships among variables such as historical energy consumption, weather conditions, and temporal parameters. Its built-in regularization limits overfitting, which gives energy companies good grounds to trust its estimates [47].
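
For orientation, the general form of the additive update and regularized objective used in gradient tree boosting of this kind is sketched here. The symbols (learning rate η, tree f_t, number of leaves T, leaf weights w, penalties γ and λ) follow the common presentation of XGBoost and are assumptions with respect to the notation of Equations (23)–(28).

```latex
\begin{aligned}
\hat{y}_i^{(t)} &= \hat{y}_i^{(t-1)} + \eta\, f_t(\mathbf{x}_i), \\
\mathcal{L}^{(t)} &= \sum_{i=1}^{N} l\!\left(y_i,\ \hat{y}_i^{(t)}\right) + \sum_{k=1}^{t}\Omega(f_k), \\
\Omega(f) &= \gamma T + \tfrac{1}{2}\lambda \lVert \mathbf{w} \rVert^2 .
\end{aligned}
```

The first line expresses the sequential error correction, and the penalty term Ω is the built-in regularization that discourages overly complex trees.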

2.3.4. Mathematical Model Radial Basis Function Network (RBF)

Radial Basis Function (RBF) networks are a type of neural network that is very good at modeling the nonlinear relationships present in energy demand forecasting. They use radial basis functions as activation functions, usually Gaussian functions that respond to the distance from a center point, which enables the network to recognize intricate patterns in the input. In the context of energy demand, RBF networks can learn from a variety of input elements, with the underlying mathematical relationships described by Equations (29)–(32). These elements include historical consumption data, seasonal effects, temperature fluctuations, and economic indicators, all of which are further illustrated in Table 5 [48].
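
The distance-based activation can be illustrated with a short Python sketch. It assumes the common Gaussian form exp(-d²/(2σ²)), with σ playing the role of the spread; the exact parametrization in Equations (29)–(32), and in the MATLAB routines the study used, may differ. The centers, weights, and inputs are hypothetical.

```python
import math


def gaussian_rbf(x, center, spread):
    """Gaussian basis response: 1 at the center, decaying with distance."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * spread ** 2))


def rbf_network(x, centers, weights, bias, spread):
    """Network output = bias + weighted sum of the Gaussian responses."""
    return bias + sum(w * gaussian_rbf(x, c, spread) for w, c in zip(weights, centers))


# Hypothetical two-center network on (normalized temperature, normalized hour)
centers = [(0.2, 0.5), (0.8, 0.5)]
weights = [1.0, -1.0]

at_center = gaussian_rbf((0.2, 0.5), centers[0], spread=0.5)
far_away = gaussian_rbf((5.0, 5.0), centers[0], spread=0.5)
midpoint = rbf_network((0.5, 0.5), centers, weights, bias=0.0, spread=0.5)
```

A small spread makes each unit respond only to inputs very close to its center, while a large spread blends neighboring units together, which is why the spread is the main quantity to tune.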

2.3.5. Mathematical Model Autoencoder

By identifying intricate patterns in data, Autoencoder algorithms—neural network topologies intended for unsupervised learning—are useful for predicting energy consumption. They are composed of a decoder that reconstructs the original input and an encoder that compresses input data into a lower-dimensional representation. The mathematical framework by which Autoencoders learn from past energy usage data and pertinent characteristics such as temperature and time of day to recognize typical consumption patterns and spot irregularities is presented in Equations (33)–(36). By filtering out noise and comparing incoming inputs to learnt representations, this capacity improves future demand estimates, increasing their accuracy and dependability. Table 6 displays all the Autoencoder’s equations [43].
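
A generic encoder, decoder, and reconstruction loss can be written as follows. This is the standard single-hidden-layer form, given under assumed notation (encoder weights W_e, decoder weights W_d, activations f and g, latent code z); it is not a transcription of Equations (33)–(36) or Table 6.

```latex
\begin{aligned}
\mathbf{z} &= f\!\left(\mathbf{W}_e \mathbf{x} + \mathbf{b}_e\right), \\
\hat{\mathbf{x}} &= g\!\left(\mathbf{W}_d \mathbf{z} + \mathbf{b}_d\right), \\
\mathcal{L}_{\text{rec}} &= \frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{x}_i - \hat{\mathbf{x}}_i \rVert^2 .
\end{aligned}
```

Inputs that the trained network reconstructs poorly (large residual) are candidates for irregular consumption, while the code z serves as the compressed feature set.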

2.3.6. Mathematical Model Decision Trees

Decision Tree algorithms, which use a tree-like structure to map input information to projected outcomes, are useful tools for energy demand forecasting. They operate by recursively dividing the dataset according to the feature that yields the most information gain; the mathematical basis is provided in Equations (37)–(41). This process considers variables like temperature, time of day, and past usage trends. The result is a set of comprehensible rules that highlight the main drivers of energy usage. Decision Trees can handle both continuous and categorical variables, and their use in ensemble techniques such as Random Forests improves the predictive accuracy and robustness of energy demand forecasts. As demonstrated in Table 7, these equations are fundamental to their operation [49].
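
One step of the recursive splitting can be sketched as follows. Since energy demand is a continuous target, the sketch scores candidate splits by variance reduction, which is the usual regression counterpart of information gain; whether Equations (37)–(41) use exactly this criterion is an assumption. Function names and data are hypothetical.

```python
def variance(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)


def best_split(xs, ys):
    """Return (threshold, gain) maximizing variance reduction on one feature."""
    parent = variance(ys)
    n = len(ys)
    best_t, best_gain = None, 0.0
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Size-weighted variance of the two child nodes
        child = (len(left) * variance(left) + len(right) * variance(right)) / n
        gain = parent - child
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Hypothetical data: hour index vs demand, with a clear jump after hour 3
hours = [1, 2, 3, 4, 5, 6]
load = [2.0, 2.0, 2.0, 8.0, 8.0, 8.0]
threshold, gain = best_split(hours, load)
```

A full tree repeats this search on each resulting subset, over every available feature, until a stopping rule such as a minimum leaf size or maximum depth is reached.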

2.4. Evaluation Metrics

In regression analysis and predictive modeling, evaluation metrics characterize how well a model fits the system under study. Kling–Gupta Efficiency (KGE), Nash–Sutcliffe Efficiency (NSE), mean absolute percentage error (MAPE), and root mean squared percentage error (RMSPE) are commonly employed metrics. The calculation of these metrics is described by Equations (42)–(46). Each metric highlights a distinct facet of a model’s predictive ability, so together they help researchers and practitioners understand how well their models are producing the intended results. Developing a successful model depends on a solid grasp of these measures [45]. RMSPE and MAPE are frequently used to quantify the error between predicted and actual values. Because RMSPE squares the differences, it places greater weight on large errors and is therefore sensitive to outliers, which makes it useful in situations where large errors must be avoided. MAPE, by contrast, averages the absolute percentage errors, so it is a linear measure and is less sensitive to outliers. KGE and NSE are widely used for performance analysis, especially for water and environmental models. Kling–Gupta Efficiency is a composite measure that combines bias, correlation, and variability differences between estimated and observed series. R² measures the degree to which the predicted values capture the variation in the observed data; values close to 1 indicate strong explanatory power. These metrics, shown in Table 8, provide a sound basis for evaluating and improving model performance, and therefore for better insight and decision-making in modeling [50].
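
The four error and efficiency measures can be written compactly in Python. The definitions below are the commonly used ones (KGE in its original three-component form) and are assumed to correspond to Equations (42)–(46); the observed and predicted values are invented for illustration.

```python
import math


def mape(obs, pred):
    """Mean absolute percentage error, in percent (undefined if any obs is 0)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def rmspe(obs, pred):
    """Root mean squared percentage error, in percent (undefined if any obs is 0)."""
    return 100.0 * math.sqrt(sum(((o - p) / o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe Efficiency: 1 minus residual sum of squares over total sum of squares."""
    mean_o = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, pred):
    """Kling-Gupta Efficiency from correlation r, variability ratio, and bias ratio."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    r = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / (n * so * sp)
    alpha = sp / so  # variability ratio
    beta = mp / mo   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical observed and predicted demand (illustrative only)
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 33.0, 38.0]
scores = {
    "MAPE": mape(observed, predicted),
    "RMSPE": rmspe(observed, predicted),
    "NSE": nse(observed, predicted),
    "KGE": kge(observed, predicted),
}
```

For this toy series RMSPE comes out slightly above MAPE, reflecting the extra weight that squaring gives to the larger percentage errors. Both percentage metrics break down when an observed value is zero, which matters for demand series that cross zero.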

2.5. Optimization Procedures

The different machine learning models are optimized for predicting energy usage. The data are systematically separated into training, validation, and test sets using the MATLAB software. First, the data are separated into features and a target variable; the training set takes up 70% of the entire data and the remaining 30% is used for validation and testing. Printing the set sizes keeps the dataset proportions clear, which is crucial for model evaluation. To balance computational efficiency and model complexity, we used a two-hidden-layer architecture for the Artificial Neural Network model, with five neurons in each layer chosen through iterative testing. Training used the Levenberg–Marquardt backpropagation algorithm with a strict performance goal (mean squared error ≤1 × 10⁻³), a maximum of 2000 epochs, and a learning rate of 0.01. To avoid overfitting and promote convergence, the training procedure included an early stopping mechanism that was activated after 20 consecutive validation failures. The spread factor of the Radial Basis Function network was optimized through an exhaustive grid search across the range of 0.1 to 2.0, with performance evaluated using k-fold cross-validation. For the Autoencoder model, we optimized the latent space by systematically testing dimensions from 2 to 10 nodes to achieve the best balance between dimensionality reduction and feature preservation, measured by reconstruction error, with training for up to 2000 epochs. For the tree-based methods (Random Forest, XGBoost, and Decision Tree), we used Bayesian optimization to simultaneously tune several hyperparameters, such as the number of features to consider at each split, the minimum leaf size, the maximum tree depth, and, for the ensemble methods, the number of trees in the forest (e.g., 100 trees for Random Forest).
This strategy worked especially well for handling the bias–variance tradeoff while accounting for the varied operating conditions found in the different campus buildings. The block diagram in Figure 3 shows the relationship between these elements and their collective impact on energy demand trends. The steps of the optimization procedure are as follows:

1. Split the dataset into three parts: 70% for training, 15% for validation, and 15% for testing.
2. Choose the network architecture and set the training parameters.
3. Train the model with the training dataset.
4. Validate the model’s performance using the validation dataset.
5. Iterate steps 2 to 4, experimenting with various architectures and training settings.
6. Identify the optimal network architecture based on validation results.
7. Evaluate the selected final model using the test dataset to assess its performance.
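
The data split and the selection step listed above can be sketched in Python as follows. This is an illustration only: the study performed these steps in MATLAB, and the function names, the random seed, and the placeholder validation errors are all hypothetical.

```python
import random


def split_70_15_15(n_samples, seed=0):
    """Shuffle sample indices and cut them into train/validation/test lists."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = n_samples * 70 // 100  # integer arithmetic avoids float rounding
    n_val = n_samples * 15 // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def select_best(candidates, validation_error):
    """Return the candidate configuration with the lowest validation error."""
    return min(candidates, key=validation_error)


# 8760 hourly samples, as in the LUCIA dataset
train, val, test = split_70_15_15(8760)

# Placeholder validation errors per hidden-layer size (made-up numbers)
fake_errors = {3: 0.021, 5: 0.012, 8: 0.015}
best_size = select_best(fake_errors, lambda h: fake_errors[h])
```

The test indices are used only once, after the configuration has been chosen, so that the reported performance is not biased by the selection itself. For hourly series, a chronological split is a common alternative to shuffling when the goal is to forecast future periods.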

3. Results and Discussions

Energy consumption by HVAC systems for both heating and cooling is essential to consider because the climate varies across the different parts of Spain and building designs also differ. Comparing the three buildings reveals variation driven mainly by occupancy, insulation, and HVAC system efficiency. Results are presented in four categories, including actual versus predicted energy use, evaluation metrics, and predictions for the next three years. Three-year forecasts based on historical data and trends enable proactive adjustments to HVAC heating and cooling strategies that are consistent with sustainability goals and Spain’s energy efficiency targets. The following figures illustrate energy demand across specific zones within each building, allowing for a detailed assessment of energy consumption patterns. This zonal analysis highlights the areas with the highest energy usage, providing insight into their contributions to the overall carbon footprint.

3.1. Actual vs. Predicted Energy (kWh/m²)

The LUCIA Building, built as a Nearly Zero Energy Building (nZEB), has an energy-efficient HVAC system that balances energy consumption against environmental impact requirements. The box plot in Figure 4a covers demand values ranging from very low to very high. The building guarantees adequate indoor comfort levels at minimum energy usage. The variance in the data indicates periods of increased demand, which may correspond to seasonal variations or occupancy levels. In the figures, positive values denote heating and negative values denote cooling. The box plot for HVAC demand shows a median close to 100 kWh/m² for actual demand, with predictions from the various models (ANN, RF, XG, etc.) displaying significant variability, some even reaching above 200 kWh/m². Outliers are present, indicating occasional extreme demands, but overall, the predictions tend to cluster around the actual values. In Figure 4b, the box plot represents the heating performance of the FUHEM building. It indicates that the heating system maintains target temperatures during cold months with very little fluctuation, pointing to its effectiveness at regulating temperatures. The central tendency of the heating data reflects how good design can meet occupants’ energy needs. It also bears on energy economy, since efficient building systems reduce total energy costs. In the heating demand box plot, the actual values are centered around 5 kWh/m², with most prediction models showing similar median values. The variability is low, indicating consistent heating demands, while outliers suggest occasional spikes that the models may struggle to predict accurately. In the EII building, the cooling system’s performance is depicted through a box plot that illustrates temperature management during warmer months, as shown in Figure 4c.
The data reveal a stable cooling output, ensuring a comfortable indoor environment despite external heat fluctuations. The range of values indicates periods of increased demand, likely to coincide with peak occupancy or extreme weather conditions. This cooling system is designed to efficiently utilize energy, contributing to the building’s overall sustainability objectives and occupant satisfaction. The box plot for cooling demand indicates that actual values cluster around 55 kWh/m², with some outliers above 90 kWh/m². The heating performance of the EII building is similarly illustrated through a box plot in Figure 4d, showcasing its capability to maintain comfortable indoor temperatures during colder periods. The tight clustering of data points suggests effective thermal management, with the system responding well to changes in external conditions. The heating data reflect a commitment to energy efficiency, aligning with modern sustainability practices. This ensures that the building remains a comfortable space for occupants while minimizing energy consumption and environmental impact. In the heating demand box plot, actual values are concentrated around 3 kWh/m².

Figure 5a shows the scatter plot of predicted versus actual HVAC demand for the LUCIA building, designed as a Nearly Zero Energy Building (nZEB). The close alignment of the points with the diagonal line indicates that the prediction models (ANN, RF, etc.) estimate actual energy use effectively, showing high accuracy for HVAC demand. The models nearest to the actual values for the LUCIA building HVAC (nZEB) are ANN and RF, whose points lie closest to the diagonal line.

Figure 5b focuses on the heating demand of the FUHEM building. The scatter of predicted against actual heating demand is comparatively narrow around the diagonal line, which suggests that the prediction models function effectively in this case; points farther from the diagonal line would signify a need for improvement in the models. The model nearest to the actual values for the FUHEM building is the Autoencoder.

Figure 5c refers to the cooling demand of the EII building, where the scatter plot is used to assess how well the prediction models estimate actual cooling needs. The ANN, XGBoost, and RBF models are nearest to the actual values for EII cooling.

Figure 5d presents the predictions of the six AI models (ANN, Random Forest, XGBoost, RBF network, Autoencoder, and Decision Tree) against actual values for heating in the EII building. For each model, it shows how close the predicted energy demand is to the actual values. Some models, such as ANN, XGBoost, and Autoencoder, cluster more closely around the line, reflecting higher accuracy, while others show more dispersion, reflecting lower predictive precision. This provides a visual comparison of each model’s accuracy in forecasting heating demand for the EII building.

The box plot and scatter line illustrate the comparison between actual and predicted values generated by the AI models for HVAC performance in three distinct building cases, specifically focusing on heating and cooling scenarios. The box plot effectively summarizes the distribution of errors for each model by showcasing the central tendency and variability of the residuals, highlighting the median and interquartile ranges, which help identify how well the models predict actual performance. In contrast, the scatter line offers a visual representation of the correlation between actual and predicted values, revealing any trends or discrepancies that may exist. This combination of visual data is further supported by the accompanying Table 9, which provides specific numerical insights and detailed performance metrics for each model. By analyzing these visual and tabular representations, one can evaluate the accuracy of the models and determine which one performs effectively in predicting HVAC needs across the three different building scenarios, thereby informing better design and operational strategies.
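The quantities a box plot summarizes (median, interquartile range, and outliers) can be computed directly. The following is a minimal Python sketch, assuming the common convention that points beyond 1.5 times the interquartile range from the quartiles count as outliers; the function name `box_plot_summary` and the demand values are hypothetical and are not taken from the study.

```python
import numpy as np

def box_plot_summary(values):
    """Return the statistics a conventional box plot draws for one series."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    # Assumed outlier rule: beyond 1.5 * IQR from the quartiles.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = v[(v < lower_fence) | (v > upper_fence)]
    return {
        "median": float(median),
        "q1": float(q1),
        "q3": float(q3),
        "iqr": float(iqr),
        "outliers": outliers.tolist(),
    }

# Hypothetical demand values in kWh/m², for illustration only.
actual = [2.0, 4.0, 5.0, 6.0, 8.0, 30.0]
summary = box_plot_summary(actual)
print(summary)
```

Applying the same function to the actual series and to each model's predictions gives the numbers that the box plots compare visually.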

3.2. Evaluation Matrix

Figure 6 presents the performance metrics for HVAC, heating, and cooling in the three buildings in Spain. For the LUCIA building in Figure 6a, the metrics are associated with a high-efficiency HVAC system designed for near-zero energy consumption, with consumption significantly lower than in conventional buildings. As shown in Figure 6b, the FUHEM building presents a strong heating performance, with indicators showing low energy consumption during the peak winter months, which points to good insulation and appropriate system design. Cooling performance in the EII building, shown in Figure 6c, indicates that energy consumption is higher during the summer period, with values higher than those in Figure 6d. These differences underline the energy needs arising from seasonal fluctuations and the nature of the building. Based on the dataset, the best-suited models vary between buildings: ANN and Autoencoder are appropriate for the LUCIA building, ANN and Autoencoder for the FUHEM building, the Radial Basis Function network for the cooling system in the EII building, and ANN and Autoencoder for EII heating. All metric values are shown in Table 9.
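The metrics named in the abbreviation list (RMSPE, MAPE, R², NSE, and KGE) can be sketched in a few lines of Python. The definitions below are the commonly used ones and are an assumption here; the exact formulas applied in the study are given in its methodology, so this is illustrative rather than a reproduction. R² is taken as the squared Pearson correlation, and the function name `evaluation_metrics` and the sample values are hypothetical.

```python
import numpy as np

def evaluation_metrics(actual, predicted):
    """Compute common regression metrics for one model (assumed definitions)."""
    y = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)

    rel_err = (y - p) / y  # undefined where actual demand is zero
    rmspe = 100.0 * np.sqrt(np.mean(rel_err ** 2))
    mape = 100.0 * np.mean(np.abs(rel_err))

    # Nash-Sutcliffe Efficiency: 1 minus residual variance over observed variance.
    nse = 1.0 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2)

    # Pearson correlation, used for R² and for KGE.
    r = np.corrcoef(y, p)[0, 1]
    r2 = r ** 2

    # Kling-Gupta Efficiency from correlation, variability ratio, and bias ratio.
    alpha = p.std() / y.std()
    beta = p.mean() / y.mean()
    kge = 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)

    return {"RMSPE": rmspe, "MAPE": mape, "R2": r2, "NSE": nse, "KGE": kge}

# Hypothetical values: every prediction is 10% above the actual demand.
actual = np.array([10.0, 20.0, 30.0, 40.0])
metrics = evaluation_metrics(actual, 1.1 * actual)
print(metrics)
```

In this example the uniform 10% overestimate gives an RMSPE and MAPE of 10% while R² stays at 1, which illustrates why several metrics are reported together rather than relying on correlation alone.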

3.3. Training Time for AI Models

The training time of AI models is an important factor in their development and implementation. Training time impacts the efficiency of the model development cycle, which impacts how quickly the model can be iterated through, optimized, or put into use in practice. Longer training times can result in greater computational costs, more energy used, and slower research and development. Additionally, in situations that involve re-training a model frequently or adapting in quick succession according to real-world stimuli, longer training times can greatly limit model response and practicality. Therefore, AI research is concerned with reducing training times while maintaining accuracy in their models, which provides more nimble and economical solutions.

Figure 7 shows the training times in seconds for the six AI models across four building scenarios: (a) LUCIA building HVAC (nZEB), (b) FUHEM building (heating), (c) EII building (cooling), and (d) EII building (heating). For the LUCIA building HVAC (nZEB), the approximate training times are 5.5 s for ANN, 2 s for Random Forest, 8 s for XGBoost, 19 s for RBFN, 11 s for Autoencoder, and 7 s for Decision Tree. For the FUHEM building (heating), they are about 1.3 s for ANN, 1.4 s for Random Forest, 0.7 s for XGBoost, 2.1 s for RBFN, 1.1 s for Autoencoder, and 0.2 s for Decision Tree. For the EII building (cooling), they are about 1.8 s for ANN, 1.7 s for Random Forest, 0.8 s for XGBoost, 3.2 s for RBFN, 2 s for Autoencoder, and 0.1 s for Decision Tree. Finally, for the EII building (heating), they are about 2.1 s for ANN, 1.3 s for Random Forest, 0.9 s for XGBoost, 2.2 s for RBFN, 5.8 s for Autoencoder, and 0.3 s for Decision Tree. Based on these values, the RBFN model exhibits the longest training time in three of the four scenarios (the exception is EII heating, where the Autoencoder takes about 5.8 s), while the Decision Tree model has the shortest training time in three of the four (the exception is LUCIA, where Random Forest takes about 2 s). The variability in training times across models and scenarios highlights the need to consider computational efficiency alongside performance when selecting an AI model for specific building applications.
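Tabulating the values quoted above makes it straightforward to check which model is fastest and slowest in each scenario. The following Python sketch uses the approximate times given in the text; the dictionary layout and the helper name `fastest_and_slowest` are illustrative choices, not part of the study.

```python
# Approximate training times in seconds, as quoted in the text for Figure 7.
training_times = {
    "LUCIA HVAC (nZEB)": {"ANN": 5.5, "Random Forest": 2.0, "XGBoost": 8.0,
                          "RBFN": 19.0, "Autoencoder": 11.0, "Decision Tree": 7.0},
    "FUHEM heating": {"ANN": 1.3, "Random Forest": 1.4, "XGBoost": 0.7,
                      "RBFN": 2.1, "Autoencoder": 1.1, "Decision Tree": 0.2},
    "EII cooling": {"ANN": 1.8, "Random Forest": 1.7, "XGBoost": 0.8,
                    "RBFN": 3.2, "Autoencoder": 2.0, "Decision Tree": 0.1},
    "EII heating": {"ANN": 2.1, "Random Forest": 1.3, "XGBoost": 0.9,
                    "RBFN": 2.2, "Autoencoder": 5.8, "Decision Tree": 0.3},
}

def fastest_and_slowest(times):
    """Map each scenario to its (fastest model, slowest model) pair."""
    return {
        scenario: (min(models, key=models.get), max(models, key=models.get))
        for scenario, models in times.items()
    }

extremes = fastest_and_slowest(training_times)
for scenario, (fastest, slowest) in extremes.items():
    print(f"{scenario}: fastest = {fastest}, slowest = {slowest}")
```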

3.4. Short-Term AI Prediction

Linear regression serves as a fundamental tool for short-term energy consumption forecasting across various building types, as demonstrated by the examples of the LUCIA, FUHEM, and EII buildings. By establishing a linear relationship between historical energy consumption data and time or other relevant variables, this statistical method allows for the projection of future energy demands, typically for a period of up to three years [36].
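A linear trend forecast of this kind can be sketched as follows. This is a minimal Python illustration, assuming a least-squares line fitted against a time index and extrapolated forward; the function name `linear_trend_forecast`, the yearly time step, and the demand values are hypothetical, and the study's own regression inputs may differ.

```python
import numpy as np

def linear_trend_forecast(history, horizon):
    """Fit y = slope * t + intercept to past demand and extrapolate `horizon` steps."""
    y = np.asarray(history, dtype=float)
    t = np.arange(len(y))
    # np.polyfit returns coefficients from the highest degree down.
    slope, intercept = np.polyfit(t, y, deg=1)
    future_t = np.arange(len(y), len(y) + horizon)
    return slope, intercept, slope * future_t + intercept

# Hypothetical yearly demand in kWh/m², for illustration only.
history = [50.0, 52.0, 54.0, 56.0]
slope, intercept, forecast = linear_trend_forecast(history, horizon=3)
peak = forecast.max()
print(forecast, peak)
```

Because the model is a straight line, the forecast peak over the horizon always falls at one end of the window, which is worth keeping in mind when reading peak values from such projections.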

Figure 8a illustrates the HVAC energy consumption forecast for the LUCIA building, designed as a Nearly Zero Energy Building (nZEB). The blue bars depict historical energy consumption data, showing fluctuations in HVAC usage. The red line forecasts energy consumption for heating, expected to increase during colder periods, while the green line forecasts cooling energy needs, projected to rise during warmer months. A linear regression represents the trend in energy consumption, with a marked maximum point indicating the peak value expected over three years. The maximum peak after three years is around 800 kWh/m² for heating and around 300 kWh/m² for cooling.

Figure 8b illustrates the heating energy consumption forecast for the FUHEM building over the next three years. The x-axis, representing time in hours, spans from 0 to approximately 300 h, with the energy scale running from 0 to 70 kWh/m². The maximum heating peak after three years is around 70 kWh/m².

Figure 8c and Figure 8d show the forecast cooling and heating demand for the EII building over the next three years. Figure 8c shows some fluctuations and a peak at approximately 70 kWh/m². The green prediction line shows an upward trend in cooling energy consumption, meaning that demand increases as warmer months approach. In Figure 8d, significant peaks close to 120 kWh/m² can be observed. The heating forecast given by the red line increases steadily, reflecting rising demand during colder periods. Together, these graphs point to the need for effective forecasting in both cooling and heating energy management in the EII building.

3.5. Practical Application

The energy demand forecasting models have direct implications for real-world energy management decisions. Because the Autoencoder model consistently demonstrated the best performance by a considerable margin in regression metrics such as Mean Absolute Percentage Error (MAPE) and Kling–Gupta Efficiency (KGE), stakeholders across building types can use its predicted energy usage with confidence to optimize energy consumption practices. For example, with a MAPE of only 0.1% at the FUHEM building, energy managers can estimate heating needs with high accuracy and schedule energy resources to maximize operational efficiency. Such accuracy not only aids current energy management practices but also supports future energy management and infrastructure planning, which can ultimately increase sustainability in urban settings.

Moreover, understanding the training times of different models provides essential insights into their deployment in real-world applications. The variability in training times emphasizes the importance of balancing computational efficiency with predictive performance. For example, while the Random Forest and Decision Tree models offer faster training times, the superior accuracy of the Autoencoder may justify the additional computational time for critical applications where precision is paramount. This knowledge enables energy managers to make informed decisions about which models to implement based on specific operational needs, ensuring that energy consumption is managed effectively while also preparing for future challenges associated with climate change and urban growth.

4. Conclusions

The evaluation of the LUCIA building was based on the nZEB concept; as a consequence, its HVAC system had to be highly developed in balancing energy consumption and environmental impacts with indoor comfort. The box plots present a median HVAC demand close to 100 kWh/m². Predictions made by the different AI models reach high values, with some even over 200 kWh/m². This deviation shows that the models generally agreed with the real values, yet further refinement is needed for extreme demand. The performance of the FUHEM building heating system is highlighted by the stable temperatures it achieves during colder months with minimal fluctuation, showing effective design and energy economy. The heating demand box plot gives an actual value of about 5 kWh/m² with low variability in the forecasts, reinforcing how reliable the system is. The EII building also shows strong performance metrics for its cooling and heating systems, with box plots indicating that the outputs are quite stable even in fluctuating temperatures.

Analysis through scatter plots of predicted versus actual HVAC performance for all buildings further reinforces the efficacy of the AI models, especially the ANN and Autoencoder models, in forecasting energy demand. The forecast demands are expected to be reached within the next three years, with heating peaking in colder months and cooling in warmer months. This underlines the importance of an accurate energy management strategy; hence, advanced modeling techniques have become highly relevant in optimizing HVAC, heating, and cooling performance in modern buildings with sustainability goals.

Maximum peak energy demand forecasts for a three-year horizon highlight diverse building needs. For the Nearly Zero Energy Building (nZEB) LUCIA, heating demand is projected to peak around 800 kWh/m², while cooling is expected to reach 300 kWh/m². The FUHEM building’s heating consumption is anticipated to peak at approximately 70 kWh/m². The EII building’s forecasts indicate cooling demands peaking near 70 kWh/m² and heating demands reaching significant peaks close to 120 kWh/m². These projections underscore the varying energy requirements across different building types and seasons.

Current energy demand models can capture trends across various building geometries and a range of weather data, but they struggle to model extremes accurately. The prediction deviations observed at peak demand suggest either that the models need to be improved or that the data were not detailed enough to account for peaks. Going forward, more robust models should include real-time weather forecasts and detailed occupant behavior modeling to improve responsive demand prediction. The models can also be improved with datasets capturing internal heat gains, real-time HVAC sensor data, and energy price data for a more robust energy management solution.

Author Contributions

Conceptualization, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; methodology, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; software, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; validation, K.M.S.; formal analysis, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; investigation, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; resources, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; data curation, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; writing—original draft preparation, K.M.S.; writing—review and editing, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; visualization, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; supervision, A.O.E. and J.M.R.-H.; project administration, F.J.R.-M. and J.M.R.-H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We would like to acknowledge the support received from “LIFE23-CET-Re-Energize” European Project by University of Málaga, Spain, “EUSUVa4.0” Project by University of Valladolid; “Lime4Health” National Project by Technical University of Madrid (UPM); RED-“TRAPECIO” IberAmerican Project by CYTED (Ibero-American Program of Science and Technology for Development); and ITAP Research Institute at University of Valladolid. We would like to acknowledge the use of MATLAB (Version R2018a, MathWorks, https://www.mathworks.com) for data analysis and visualization in this study. Additionally, the images included in this document were created by the authors and are original works.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

List of Symbols
Variable	Description
X	Input vector: represents the input features to the neural network, where $n$ is the number of input parameters
$z$	Weighted Sum
$W^{1}$	Weight matrix for connections from input to first hidden layer
$b^{1}$	Bias vector for the first hidden layer
$L$	Loss value
$N$	Number of samples
$Y$	Output variable
$\hat{y}$	Predicted demand
$g$	Activation function of the output layer
$E_{actual}$	Actual demand
$δ$	Error term for layers
$f'(z)$	Derivative of the activation function
$η$	Learning rate controlling the step size
$h^{1}$	Activations from the first hidden layer
$f_{i}(X)$	Prediction from each tree in the forest
$\arg\max_{j \in J} \mathrm{Gain}(j)$	Node splitting
$\hat{Y}_{leaf}$	Leaf predictions
$m$	Number of samples
$\mathrm{Importance}(X_{k})$	Feature importance
$X_{k}$	Each feature
OOB	Out-of-bag observations, which are instances not included in a tree’s bootstrap sample, used for performance estimation
$K$	Sum of the predictions from all Decision Trees in the ensemble
$\mathrm{Loss}(Y_{i}, \hat{Y}_{i})$	The error between the actual values $Y_{i}$ and the predicted values $\hat{Y}_{i}$
$Ω(f_{k})$	Regularization term
$T$	Number of leaves in the tree
$λ$	Weight of the leaf
$\mathrm{Gain}(X_{j}, split)$	Objective function
$ϕ_{j} (x)$	Activation of the $j$ -th neuron in the hidden layer
$c_{j}$	Center of the $j$ -th RBF
$σ_{j}$	Spread (width) of the RBF
$M$	Number of training samples
$W_{e}$	Weight matrix of the encoder
$b_{e}$	Bias vector of the encoder
$σ$	Activation function (e.g., sigmoid, ReLU)
$W_{d}$	Weight matrix of the decoder
$b_{d}$	Bias vector of the decoder
$\hat{x}$	Reconstructed output
$\| \cdot \|$	Norm (typically the L2 norm)
$A$	The attribute being tested
$D_{v}$	The subset of data for value v
$|D|$	Size of dataset $D$
$p_{k}$	The fraction of class k in dataset $D$
List of abbreviations
ANNs	Artificial Neural Networks
RFs	Random Forests
XGBoost	Extreme Gradient Boosting
RBF	Radial Basis Function
RMSPE	Root Mean Square Percentage Error
MAPE	Mean Absolute Percentage Error
KGE	Kling–Gupta Efficiency
NSE	Nash–Sutcliffe Efficiency
R²	Coefficient of Determination
nZEB	Nearly Zero Energy Building
AI	Artificial intelligence
MLR	Multiple Linear Regression
GDP	Gross Domestic Product
HVAC	Heating, Ventilation, and Air Conditioning
KNN	K-Nearest Neighbor algorithm

References

1. Rey-Hernández, J.; Velasco-Gómez, E.; San José-Alonso, J.; Tejero-González, A.; Rey-Martínez, F. Energy Analysis at a Near Zero Energy Building. A Case-Study in Spain. Energies 2018, 11, 857.
2. Nagai, T. Optimization Method for Minimizing Annual Energy, Peak Energy Demand, and Annual Energy Cost through Use of Building Thermal Storage/Discussion. ASHRAE Trans. 2002, 108, 43.
3. Antonopoulos, I.; Robu, V.; Couraud, B.; Kirli, D.; Norbu, S.; Kiprakis, A.; Flynn, D.; Elizondo-Gonzalez, S.; Wattam, S. Artificial Intelligence and Machine Learning Approaches to Energy Demand-Side Response: A Systematic Review. Renew. Sustain. Energy Rev. 2020, 130, 109899.
4. Ghalehkhondabi, I.; Ardjmand, E.; Weckman, G.R.; Young, W.A. An Overview of Energy Demand Forecasting Methods Published in 2005–2015. Energy Syst. 2017, 8, 411–447.
5. Wang, Q.; Li, S.; Li, R. Forecasting Energy Demand in China and India: Using Single-Linear, Hybrid-Linear, and Non-Linear Time Series Forecast Techniques. Energy 2018, 161, 821–831.
6. Raza, M.Q.; Khosravi, A. A Review on Artificial Intelligence Based Load Demand Forecasting Techniques for Smart Grid and Buildings. Renew. Sustain. Energy Rev. 2015, 50, 1352–1372.
7. Bianco, V.; Manca, O.; Nardini, S. Electricity Consumption Forecasting in Italy Using Linear Regression Models. Energy 2009, 34, 1413–1421.
8. Kialashaki, A.; Reisel, J.R. Modeling of the Energy Demand of the Residential Sector in the United States Using Regression Models and Artificial Neural Networks. Appl. Energy 2013, 108, 271–280.
9. Ekici, B.B.; Aksoy, U.T. Prediction of Building Energy Consumption by Using Artificial Neural Networks. Adv. Eng. Softw. 2009, 40, 356–362.
10. Román-Portabales, A.; López-Nores, M.; Pazos-Arias, J.J. Systematic Review of Electricity Demand Forecast Using ANN-Based Machine Learning Algorithms. Sensors 2021, 21, 4544.
11. Li, Z.; Dai, J.; Chen, H.; Lin, B. An ANN-Based Fast Building Energy Consumption Prediction Method for Complex Architectural Form at the Early Design Stage. In Proceedings of the Building Simulation; Springer: Berlin/Heidelberg, Germany, 2019; Volume 12, pp. 665–681.
12. Verma, A.; Prakash, S.; Kumar, A. ANN-based Energy Consumption Prediction Model up to 2050 for a Residential Building: Towards Sustainable Decision Making. Environ. Prog. Sustain. Energy 2021, 40, e13544.
13. Cáceres, L.; Merino, J.I.; Díaz-Díaz, N. A Computational Intelligence Approach to Predict Energy Demand Using Random Forest in a Cloudera Cluster. Appl. Sci. 2021, 11, 8635.
14. Naghibi, S.A.; Ahmadi, K.; Daneshi, A. Application of Support Vector Machine, Random Forest, and Genetic Algorithm Optimized Random Forest Models in Groundwater Potential Mapping. Water Resour. Manag. 2017, 31, 2761–2775.
15. Wang, Z.; Wang, Y.; Zeng, R.; Srinivasan, R.S.; Ahrentzen, S. Random Forest Based Hourly Building Energy Prediction. Energy Build. 2018, 171, 11–25.
16. Pham, A.D.; Ngo, N.T.; Truong, T.T.H.; Huynh, N.T.; Truong, N.S. Predicting energy consumption in multiple buildings using machine learning for improving energy efficiency and sustainability. J. Clean. Prod. 2020, 260, 121082.
17. Zhang, T.; Zhang, X.; Rubasinghe, O.; Liu, Y.; Chow, Y.H.; Iu, H.H.C.; Fernando, T. Long-Term Energy and Peak Power Demand Forecasting Based on Sequential-XGBoost. IEEE Trans. Power Syst. 2024, 39, 3088–3104.
18. Ghods, L.; Kalantar, M. Long-Term Peak Demand Forecasting by Using Radial Basis Function Neural Networks. DOAJ 2010, 6, 175–182.
19. de Jesús Rubio, J.; Garcia, D.; Sossa, H.; Garcia, I.; Zacarias, A.; Mujica-Vargas, D. Energy Processes Prediction by a Convolutional Radial Basis Function Network. Energy 2023, 284, 128470.
20. Lin, W.-M.; Gow, H.-J.; Tsai, M.-T. An Enhanced Radial Basis Function Network for Short-Term Electricity Price Forecasting. Appl. Energy 2010, 87, 3226–3234.
21. Fan, C.; Xiao, F.; Zhao, Y.; Wang, J. Analytical Investigation of Autoencoder-Based Methods for Unsupervised Anomaly Detection in Building Energy Data. Appl. Energy 2018, 211, 1123–1135.
22. Ramos, D.; Faria, P.; Morais, A.; Vale, Z. Using Decision Tree to Select Forecasting Algorithms in Distinct Electricity Consumption Context of an Office Building. Energy Rep. 2022, 8, 417–422.
23. Wu, W.; Dandy, G.C.; Maier, H.R. Protocol for Developing ANN Models and Its Application to the Assessment of the Quality of the ANN Model Development Process in Drinking Water Quality Modelling. Environ. Model. Softw. 2014, 54, 108–127.
24. Yu, S.; Wang, K.; Wei, Y.-M. A Hybrid Self-Adaptive Particle Swarm Optimization–Genetic Algorithm–Radial Basis Function Model for Annual Electricity Demand Prediction. Energy Convers. Manag. 2015, 91, 176–185.
25. Fast, M.; Assadi, M.; De, S. Development and Multi-Utility of an ANN Model for an Industrial Gas Turbine. Appl. Energy 2009, 86, 9–17.
26. Ahmad, T.; Chen, H. Nonlinear Autoregressive and Random Forest Approaches to Forecasting Electricity Load for Utility Energy Management Systems. Sustain. Cities Soc. 2019, 45, 460–473.
27. Mohan, R.; Pachauri, N. An ensemble model for the energy consumption prediction of residential buildings. Energy 2025, 314, 134255.
28. Tso, G.K.F.; Yau, K.K.W. Predicting Electricity Energy Consumption: A Comparison of Regression Analysis, Decision Tree and Neural Networks. Energy 2007, 32, 1761–1768.
29. Pino-Mejías, R.; Pérez-Fargallo, A.; Rubio-Bellido, C.; Pulido-Arcas, J.A. Comparison of Linear Regression and Artificial Neural Networks Models to Predict Heating and Cooling Energy Demand, Energy Consumption and CO₂ Emissions. Energy 2017, 118, 24–36.
30. Mądziel, M. Future Cities Carbon Emission Models: Hybrid Vehicle Emission Modelling for Low-Emission Zones. Energies 2023, 16, 6928.
31. Salem, K.M.; Elreafay, A.M.; Abumandour, R.M.; Dawood, A.S. Modeling Two-Phase Gas-Solid Flow in Axisymmetric Diffusers Using Cut Cell Technique: An Eulerian-Eulerian Approach. Bound. Value Probl. 2024, 2024, 150.
32. Abumandour, R.M.; El-Reafay, A.M.; Salem, K.M.; Dawood, A.S. Numerical Investigation by Cut-Cell Approach for Turbulent Flow through an Expanded Wall Channel. Axioms 2023, 12, 442.
33. Salem, K.M.; Rey-Hernández, J.M.; Elgharib, A.O.; Rey-Martínez, F.J. Optimizing Energy Forecasting Using ANN and RF Models for HVAC and Heating Predictions. Appl. Sci. 2025, 15, 6806.
34. Ali, D.M.T.E.; Motuzienė, V.; Džiugaitė-Tumėnienė, R. Ai-Driven Innovations in Building Energy Management Systems: A Review of Potential Applications and Energy Savings. Energies 2024, 17, 4277.
35. Piersigilli, P.; Citroni, R.; Mangini, F.; Frezza, F. Electromagnetic Techniques Applied to Cultural Heritage Diagnosis. State of the Art and Future Prospective. A Comprehensive Review. Appl. Sci. 2025, 15, 6402.
36. Salem, K.M.; Rey-Hernández, J.M.; Rey-Martínez, F.J.; Elgharib, A.O. Assessing the Accuracy of AI Approaches for CO2 Emission Predictions in Buildings. J. Clean. Prod. 2025, 513, 145692.
37. Chen, Y.-H.; Li, Y.-Z.; Jiang, H.; Huang, Z. Research on Household Energy Demand Patterns, Data Acquisition and Influencing Factors: A Review. Sustain. Cities Soc. 2023, 99, 104916.
38. Elreafay, A.M.; Salem, K.M.; Abumandour, R.M.; Dawood, A.S.; Al Nuaimi, S. Effect of Particle Diameter and Void Fraction on Gas–Solid Two-Phase Flow: A Numerical Investigation Using the Eulerian–Eulerian Approach. Comput. Part Mech. 2024, 12, 289–311.
39. D’Agostino, D.; Minelli, F.; Minichiello, F. An Innovative Multi-Stakeholder Decision Methodology for the Optimal Energy Retrofit of Shopping Mall Buildings. Energy Build. 2025, 115958.
40. Vieri, A.; Gambarotta, A.; Morini, M.; Saletti, C. An Integrated Artificial Intelligence Approach for Building Energy Demand Forecasting. Energies 2024, 17, 4920.
41. Abbasimehr, H.; Paki, R.; Bahrini, A. A Novel XGBoost-Based Featurization Approach to Forecast Renewable Energy Consumption with Deep Learning Models. Sustain. Comput. Inform. Syst. 2023, 38, 100863.
42. Elhabyb, K.; Baina, A.; Bellafkih, M.; Deifalla, A.F. Machine learning algorithms for predicting energy consumption in educational buildings. Int. J. Energy Res. 2024, 2024, 6812425.
43. Chen, S.; Guo, W. Auto-Encoders in Deep Learning—A Review with New Perspectives. Mathematics 2023, 11, 1777.
44. Salem, K.M.; Rady, M.; Aly, H.; Elshimy, H. Design and Implementation of a Six-Degrees-of-Freedom Underwater Remotely Operated Vehicle. Appl. Sci. 2023, 13, 6870.
45. Betiku, E.; Omilakin, O.R.; Ajala, S.O.; Okeleye, A.A.; Taiwo, A.E.; Solomon, B.O. Mathematical Modeling and Process Parameters Optimization Studies by Artificial Neural Network and Response Surface Methodology: A Case of Non-Edible Neem (Azadirachta Indica) Seed Oil Biodiesel Synthesis. Energy 2014, 72, 266–273.
46. Farnaaz, N.; Jabbar, M.A. Random Forest Modeling for Network Intrusion Detection System. Procedia Comput. Sci. 2016, 89, 213–217.
47. Lee, S.; Park, J.; Kim, N.; Lee, T.; Quagliato, L. Extreme Gradient Boosting-Inspired Process Optimization Algorithm for Manufacturing Engineering Applications. Mater. Des. 2023, 226, 111625.
48. Liu, J. Radial Basis Function (RBF) Neural Network Control for Mechanical Systems: Design, Analysis and Matlab Simulation; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013; ISBN 3642348165.
49. Tom, E.; Schulman, K.A. Mathematical Models in Decision Analysis. Infect. Control Hosp. Epidemiol. 1997, 18, 65–73.
50. Robinson, C.; Dilkina, B.; Hubbs, J.; Zhang, W.; Guhathakurta, S.; Brown, M.A.; Pendyala, R.M. Machine Learning Approaches for Estimating Commercial Building Energy Consumption. Appl. Energy 2017, 208, 889–904.

Figure 1. Scheme for the buildings in Spain: (a) LUCIA building (nZEB), (b) FUHEM building and (c) EII building.

Figure 2. Energy demand forecasting workflow.

Figure 3. Block diagram for analyzing energy demand.

Figure 4. Box plot for 3 different buildings: (a) LUCIA building HVAC (nZEB), (b) FUHEM building (heating), (c) EII building (cooling) and (d) EII building (heating).

Figure 5. Scatter line for 3 different buildings: (a) LUCIA building HVAC (nZEB), (b) FUHEM building (heating), (c) EII building (cooling) and (d) EII building (heating).

Figure 6. Evaluation matrix for 3 different buildings: (a) LUCIA building HVAC (nZEB), (b) FUHEM building (heating), (c) EII building (cooling) and (d) EII building (heating).

Figure 7. Training time for AI models: (a) LUCIA building HVAC (nZEB), (b) FUHEM building (heating), (c) EII building (cooling) and (d) EII building (heating).

Figure 8. The energy demand after 3 years using linear regression: (a) LUCIA building HVAC (nZEB), (b) FUHEM building (heating), (c) EII building (cooling) and (d) EII building (heating).

Table 1. Summary of literature review.

Reference	Mathematical Model	Purpose	Accuracy
Kialashaki and Reisel [8]	ANN, MLR	To forecast residential energy demand up to 2030 based on socioeconomic factors	ANN and MLR models are at a similar level during the test period, though ANN shows sensitivity to recent economic fluctuations.
Ekici and Aksoy [9]	ANN	To predict heating energy requirements and improve building design efficiency	The ANN model achieved a high prediction accuracy with a deviation of 3.43% and a success rate of 94.8–98.5% for estimating building energy needs.
Cáceres et al. [13]	Random Forest	To enhance accuracy of household energy demand forecasts using socio-economic and meteorological data	The Random Forest model in a Big Data architecture achieves high-resolution energy demand forecasts (weekly, daily, hourly) with consistent accuracy, though prediction error increases over longer time gaps.
Zhang et al. [17]	XGBoost	To develop a unified model for energy consumption and peak power demand	The XGBoost-based model achieves superior accuracy, measured using MAE and MAPE, in predicting energy 1–3 years ahead and peak power by integrating macroeconomic, climatic, and consumption data, outperforming state-of-the-art methods.
Ghods and Kalantar [18]	Radial Basis Function Neural Networks (RBFN)	To forecast long-term peak demand and improve reliability of load forecasts	The RBFN model predicts Iran’s peak load growth (37,138 MW to 45,749 MW, 2007–2011) with 5.35% annual growth, driven primarily by economic factors.
Fan et al. [21]	Autoencoders	To detect anomalies in building energy data without labeled data	The Autoencoder-based ensemble method enables unsupervised anomaly detection in building energy data with interpretable scores (0–1), identifying faults, inefficiencies, and atypical events while reducing preprocessing needs through robust feature extraction.
Ramos et al. [22]	Decision Trees	To select the most suitable forecasting algorithm for electricity consumption	This study optimizes 5 min building energy forecasts by using Decision Trees to dynamically switch between ANN and k-NN, achieving near-100% accuracy on weekdays (Mon–Fri), while sensor data validate k-NN as the preferred choice.
Tso and Yau [28]	Regression, Decision Trees, Neural Networks	To compare forecasting techniques for electricity consumption in Hong Kong	Decision Trees (RASE: 0.15) surpass ANN (0.17) and regression (0.18) in summer by prioritizing flat size, household size, and AC use (59% load), while winter models converge (RASE: 0.16–0.18) with housing type and appliances as dominant factors.

Table 2. Mathematical model for ANN.

Layer	Equation Description	Equation	Equation No.
Input Layer	Input Features	$x = [x_{1}, x_{2}, \dots, x_{n}]$	(1)
Hidden Layer 1	Weighted Sum	$z^{1} = W^{1} x + b^{1}$	(2)
Hidden Layer 2	Weighted Sum	$z^{2} = W^{2} h^{1} + b^{2}$	(3)
Output Layer	Weighted Sum	$z^{3} = W^{3} h^{2} + b^{3}$	(4)
	Final Output (Prediction)	$\hat{y} = g(z^{3})$	(5)
Loss Function	Mean Squared Error	$L = \frac{1}{N} \sum_{i = 1}^{N} {(\hat{y} - E_{actual})}^{2}$	(6)
Backpropagation	Gradient of Loss with respect to Output	$\frac{\partial L}{\partial \hat{y}} = - \frac{2}{N} (E_{actual} - \hat{y})$	(7)
	Gradient with respect to Last Layer	$δ^{3} = \frac{\partial L}{\partial z^{3}} = \frac{\partial L}{\partial \hat{y}} g'(z^{3})$	(8)
	Gradient with respect to Hidden Layer 2	$δ^{2} = δ^{3} \cdot W^{3} \cdot f'(z^{2})$	(9)
	Gradient with respect to Hidden Layer 1	$δ^{1} = δ^{2} \cdot W^{2} \cdot f'(z^{1})$	(10)
Weight Updates	Update Rule for Weights (Layer 1)	$W^{1} \leftarrow W^{1} - η \cdot δ^{1} \cdot x^{T}$	(11)
	Update Rule for Weights (Layer 2)	$W^{2} \leftarrow W^{2} - η \cdot δ^{2} \cdot (h^{1})^{T}$	(12)
	Update Rule for Weights (Output Layer)	$W^{3} \leftarrow W^{3} - η \cdot δ^{3} \cdot (h^{2})^{T}$	(13)
	Update Rule for Biases (Hidden Layer 1)	$b^{1} \leftarrow b^{1} - η \cdot δ^{1}$	(14)
	Update Rule for Biases (Hidden Layer 2)	$b^{2} \leftarrow b^{2} - η \cdot δ^{2}$	(15)
	Update Rule for Biases (Output Layer)	$b^{3} \leftarrow b^{3} - η \cdot δ^{3}$	(16)
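
The forward pass, loss gradient, and update rules of Table 2 can be traced in a few lines of plain Python. This is only a minimal sketch under stated assumptions, not the authors' implementation: the hidden activation f is taken to be tanh, the output activation g is the identity, a single training sample is used (N = 1), and the backward pass uses the transposed weight matrices, which is the usual dimensionally consistent reading of Eqs. (9) and (10). All function names and the toy numbers are hypothetical.

```python
import math

def matvec(W, v):
    """Matrix-vector product W v for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(params, x):
    """Eqs. (2)-(5) with f = tanh (assumed) and g = identity (assumed)."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = [math.tanh(z + b) for z, b in zip(matvec(W1, x), b1)]
    h2 = [math.tanh(z + b) for z, b in zip(matvec(W2, h1), b2)]
    y_hat = matvec(W3, h2)[0] + b3[0]
    return h1, h2, y_hat

def backprop_to(W_next, delta_next, h):
    """Eqs. (9)-(10): (W_next)^T delta_next, times tanh'(z) = 1 - h^2."""
    return [
        sum(W_next[k][j] * delta_next[k] for k in range(len(delta_next)))
        * (1.0 - h[j] ** 2)
        for j in range(len(h))
    ]

def sgd_update(W, b, delta, layer_input, eta):
    """Eqs. (11)-(16): W <- W - eta * delta * input^T, b <- b - eta * delta."""
    for i, d in enumerate(delta):
        for j, a in enumerate(layer_input):
            W[i][j] -= eta * d * a
        b[i] -= eta * d

def train_step(params, x, e_actual, eta=0.1):
    (W1, b1), (W2, b2), (W3, b3) = params
    h1, h2, y_hat = forward(params, x)
    delta3 = [-2.0 * (e_actual - y_hat)]   # Eqs. (7)-(8) with N = 1, g' = 1
    delta2 = backprop_to(W3, delta3, h2)   # all deltas are computed first,
    delta1 = backprop_to(W2, delta2, h1)   # then the weights are updated
    sgd_update(W3, b3, delta3, h2, eta)
    sgd_update(W2, b2, delta2, h1, eta)
    sgd_update(W1, b1, delta1, x, eta)
    return (y_hat - e_actual) ** 2         # Eq. (6) for one sample

# Toy example: two normalized inputs, two units per hidden layer.
params = [
    ([[0.1, -0.2], [0.3, 0.4]], [0.0, 0.0]),
    ([[0.2, -0.1], [-0.3, 0.5]], [0.0, 0.0]),
    ([[0.4, -0.6]], [0.0]),
]
losses = [train_step(params, [0.5, 0.8], 0.7) for _ in range(200)]
```

Repeating `train_step` on the same sample should drive the squared error toward zero; a real training loop would average the gradients over a batch, as the 1/N factor in Eq. (6) implies.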

Table 3. Mathematical model for RF.

Component	Equation	Equation No.
Input Variables	$X = [X_{1}, X_{2}, X_{3}, X_{4}, X_{5}]$	(17)
Ensemble Prediction	$\hat{Y} = \frac{1}{N} \sum_{i = 1}^{N} f_{i} (X)$	(18)
Tree Structure	Each tree $f_{i} (X)$ is constructed using random samples of features and instances
Node Splitting	$\arg\max_{j \in J} \mathrm{Gain}(j)$	(19)
Leaf Prediction	${\hat{Y}}_{leaf} = \frac{1}{m} \sum_{j = 1}^{m} Y_{j}$	(20)
Feature Importance	$\mathrm{Importance}(X_{k}) = \frac{1}{N} \sum_{i = 1}^{N} \mathrm{Gain}(X_{k})$	(21)
Error Estimation	$\mathrm{OOB\ Error} = \frac{1}{N} \sum_{i = 1}^{N} I (Y_{i} \neq {\hat{Y}}_{i})$	(22)
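
To make the ensemble average of Eq. (18) and the leaf mean of Eq. (20) concrete, the following sketch builds a deliberately tiny forest of depth-1 regression trees. It is an illustration under simplifying assumptions rather than the configuration used in the study: each tree splits one randomly chosen feature at the median of its bootstrap sample instead of searching for the gain-maximizing split of Eq. (19), and the function names and toy data are hypothetical.

```python
import random
from statistics import mean

def fit_stump(X, y, feature):
    """Depth-1 regression tree: split one feature at its median and
    store the mean target of each leaf (Eq. 20)."""
    values = sorted(row[feature] for row in X)
    threshold = values[len(values) // 2]
    left = [t for row, t in zip(X, y) if row[feature] < threshold]
    right = [t for row, t in zip(X, y) if row[feature] >= threshold]
    fallback = mean(y)  # used only if the left leaf happens to be empty
    return feature, threshold, (mean(left) if left else fallback), mean(right)

def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] < threshold else right_mean

def fit_forest(X, y, n_trees=25, seed=0):
    """Each tree sees a bootstrap sample of instances and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        feature = rng.randrange(len(X[0]))
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Eq. (18): the forest prediction is the mean of the tree predictions."""
    return mean(predict_stump(stump, row) for stump in forest)

# Toy data: the target grows with feature 0; feature 1 is uninformative.
X = [[i, (i * 7) % 5] for i in range(20)]
y = [2.0 * i for i in range(20)]
forest = fit_forest(X, y)
low, high = predict_forest(forest, [0, 0]), predict_forest(forest, [19, 0])
```

Because every leaf value is an average of observed targets, the forest prediction always stays within the range of the training targets, which is one reason tree ensembles extrapolate poorly beyond the observed demand range.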

Table 4. Mathematical model for XGBoost.

Component	Equation	Equation No.
Input Variables	$X = [X_{1}, X_{2}, X_{3}, X_{4}, X_{5}]$	(23)
Model Equation	$\hat{Y} = \sum_{k = 1}^{K} f_{k} (X)$	(24)
Objective Function	$L = \sum_{i = 1}^{N} \mathrm{Loss}(Y_{i}, {\hat{Y}}_{i}) + \sum_{k = 1}^{K} Ω (f_{k})$	(25)
Regularization Term	$Ω (f_{k}) = γ T + \frac{1}{2} λ \sum_{j = 1}^{T} w_{j}^{2}$	(26)
Tree Splitting Gain	$\mathrm{Gain}(X_{j}, \mathrm{split}) = \frac{1}{2} \left( \frac{{(\sum_{i \in L} g_{i})}^{2}}{\sum_{i \in L} h_{i} + λ} + \frac{{(\sum_{i \in R} g_{i})}^{2}}{\sum_{i \in R} h_{i} + λ} - \frac{{(\sum_{i} g_{i})}^{2}}{\sum_{i} h_{i} + λ} \right)$	(27)
Final Prediction	$\hat{Y} = \mathrm{base\_score} + \sum_{k = 1}^{K} f_{k} (X)$	(28)
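
The split gain of Eq. (27) is easy to evaluate by hand for a small example. The sketch below assumes a squared-error loss of the form ½(y − ŷ)², for which the first- and second-order gradients are g = ŷ − y and h = 1; that loss choice, the function name, and the toy targets are assumptions made for illustration. The per-leaf penalty γ from Eq. (26) is not subtracted, following Eq. (27) as printed.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0):
    """Eq. (27): gain of splitting a node into a left and a right child.

    g_* and h_* are lists of first- and second-order gradients of the loss
    for the samples falling into each child; lam is the L2 penalty (lambda).
    """
    def score(g, h):
        return sum(g) ** 2 / (sum(h) + lam)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent)

# Targets [1, 1, 5, 5] with an initial prediction of 0, so g = 0 - y and h = 1.
good = split_gain([-1, -1], [1, 1], [-5, -5], [1, 1])  # separates low from high
bad = split_gain([-1, -5], [1, 1], [-1, -5], [1, 1])   # mixes low and high
```

The split that separates the low targets from the high ones yields a positive gain (about 2.93 here), while the split that mixes them yields a negative one (−2.4), so only the first would be accepted.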

Table 5. Mathematical model for RBF.

Component	Equation	Equation No.
Input variables	$X = [X_{1}, X_{2}, X_{3}, X_{4}, X_{5}]$	(29)
Activation of the j-th neuron in the hidden layer	$ϕ_{j} (x) = e^{- \frac{{∥ x - c_{j} ∥}^{2}}{2 σ_{j}^{2}}}$	(30)
Output of the RBF	$y (x) = \sum_{j = 1}^{N} w_{j} ϕ_{j} (x)$	(31)
Error can be computed	$E = \frac{1}{2} \sum_{i = 1}^{M} {(y_{i} - {\hat{y}}_{i})}^{2}$	(32)
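
A Gaussian RBF network of the kind summarized in Table 5 reduces to two small functions. The sketch below uses the Euclidean distance between the input x and each centre c_j, as is standard for Gaussian radial basis functions; the centres, widths, and weights are hypothetical values chosen for illustration, not parameters from the study.

```python
import math

def rbf_activation(x, center, sigma):
    """Eq. (30): Gaussian response of one hidden neuron."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def rbf_output(x, centers, sigmas, weights):
    """Eq. (31): weighted sum of the hidden-layer activations."""
    return sum(
        w * rbf_activation(x, c, s)
        for w, c, s in zip(weights, centers, sigmas)
    )

# Two hidden neurons in a two-dimensional input space (hypothetical values).
centers = [[0.0, 0.0], [1.0, 0.0]]
sigmas = [1.0, 1.0]
weights = [2.0, -1.0]
y = rbf_output([0.0, 0.0], centers, sigmas, weights)
```

An input that coincides with a centre activates that neuron fully (ϕ = 1), and the response decays with distance at a rate set by σ_j, so each neuron only influences predictions near its own centre.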

Table 6. Mathematical model for Autoencoder.

Component	Equation	Equation No.
Input Variables	$X = [X_{1}, X_{2}, X_{3}, X_{4}, X_{5}]$	(33)
Encoding Process	$z = f (x) = σ (W_{e} x + b_{e})$	(34)
Decoding Process	$\hat{x} = g (z) = σ (W_{d} z + b_{d})$	(35)
Loss Function	$L = \frac{1}{N} \sum_{i = 1}^{N} {∥ x_{i} - {\hat{x}}_{i} ∥}^{2}$	(36)
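
The encoding, decoding, and reconstruction loss of Table 6 can be sketched as a forward pass in plain Python. The code assumes that σ is the logistic sigmoid and compresses the five inputs to a two-dimensional code; the layer sizes, the all-zero weights, and the function names are hypothetical and serve only to make the arithmetic easy to check.

```python
import math

def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def encode(x, W_e, b_e):
    """Eq. (34): z = sigma(W_e x + b_e)."""
    return sigmoid(affine(W_e, b_e, x))

def decode(z, W_d, b_d):
    """Eq. (35): x_hat = sigma(W_d z + b_d)."""
    return sigmoid(affine(W_d, b_d, z))

def reconstruction_loss(batch, W_e, b_e, W_d, b_d):
    """Eq. (36): mean squared reconstruction error over the batch."""
    total = 0.0
    for x in batch:
        x_hat = decode(encode(x, W_e, b_e), W_d, b_d)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / len(batch)

# Untrained network with zero weights: every reconstruction is 0.5.
W_e, b_e = [[0.0] * 5 for _ in range(2)], [0.0, 0.0]
W_d, b_d = [[0.0] * 2 for _ in range(5)], [0.0] * 5
z = encode([1.0, 0.0, 1.0, 0.0, 1.0], W_e, b_e)
loss = reconstruction_loss([[1.0, 0.0, 1.0, 0.0, 1.0]], W_e, b_e, W_d, b_d)
```

With zero weights each of the five reconstructed values is 0.5, so the loss is 5 × 0.25 = 1.25; training would adjust the weights to reduce this value. Note that a sigmoid decoder can only output values in (0, 1), which presupposes inputs normalized to that range.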

Table 7. Mathematical model for Decision Trees.

Component	Equation	Equation No.
Input Variables	$X = [X_{1}, X_{2}, X_{3}, X_{4}, X_{5}]$	(37)
Gini Impurity	$\mathrm{Gini}(D) = 1 - \sum_{k = 1}^{K} p_{k}^{2}$	(38)
Entropy	$\mathrm{Entropy}(D) = - \sum_{k = 1}^{K} p_{k} \log_{2} (p_{k})$	(39)
Mean Squared Error (MSE)	$\mathrm{MSE} = \frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - \hat{y})}^{2}$	(40)
Information Gain	$IG (D, A) = \mathrm{Entropy}(D) - \sum_{v \in \mathrm{Values}(A)} \frac{| D_{v} |}{| D |} \mathrm{Entropy}(D_{v})$	(41)
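
The impurity measures in Table 7 are short enough to compute directly. The sketch below is illustrative only, with hypothetical function names and toy data. Gini impurity, entropy, and information gain operate on class labels, whereas the MSE criterion operates on continuous targets and is therefore the one that applies to a regression task such as energy demand; in Eq. (40), ŷ is taken to be the mean target of the node.

```python
import math
from collections import Counter

def gini(labels):
    """Eq. (38): 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Eq. (39): Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Eq. (41): parent entropy minus the size-weighted child entropies."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)

def node_mse(targets):
    """Eq. (40): mean squared deviation from the node's mean target."""
    y_hat = sum(targets) / len(targets)
    return sum((t - y_hat) ** 2 for t in targets) / len(targets)

labels = ["low", "low", "high", "high"]
perfect = information_gain(labels, [["low", "low"], ["high", "high"]])
useless = information_gain(labels, [["low", "high"], ["low", "high"]])
```

For the balanced two-class node the Gini impurity is 0.5 and the entropy is 1 bit; a split that isolates each class recovers the full bit of information, while a split that leaves both children mixed gains nothing.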

Table 8. Evaluation metric equations.

Component	Equation	Equation No.
Root Mean Square Percentage Error (RMSPE)	$\mathrm{RMSPE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}} \times 100$	(42)
Mean Absolute Percentage Error (MAPE)	$\mathrm{MAPE} = \frac{1}{N} \sum_{i = 1}^{N} | y_{i} - {\hat{y}}_{i} | \times 100$	(43)
Kling–Gupta Efficiency (KGE)	$\mathrm{KGE} = 1 - \sqrt{(r - 1)^{2} + {(\frac{σ_{model}}{σ_{obs}} - 1)}^{2} + {(\frac{μ_{model}}{μ_{obs}} - 1)}^{2}}$	(44)
Nash–Sutcliffe Efficiency (NSE)	$\mathrm{NSE} = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \overline{y})}^{2}}$	(45)
Coefficient of Determination (R²)	$R^{2} = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \overline{y})}^{2}}$	(46)
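
The metrics of Table 8 can be computed with the standard library alone. Two points in this sketch are assumptions rather than statements from the paper: the percentage errors divide each residual by the observed value y_i, which is the usual convention for RMSPE and MAPE, whereas Table 8 as printed shows the raw residual; and the KGE uses the Pearson correlation and population standard deviations. Since Eqs. (45) and (46) are the same expression, a single function covers both NSE and R². The function names and sample values are hypothetical.

```python
import math
from statistics import mean, pstdev

def rmspe(y, y_hat):
    """Root mean square percentage error (relative residuals, assumed)."""
    return math.sqrt(mean(((a - p) / a) ** 2 for a, p in zip(y, y_hat))) * 100

def mape(y, y_hat):
    """Mean absolute percentage error (relative residuals, assumed)."""
    return mean(abs((a - p) / a) for a, p in zip(y, y_hat)) * 100

def nse(y, y_hat):
    """Eqs. (45)-(46): 1 - SS_res / SS_tot; identical to R^2 as defined."""
    y_bar = mean(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def kge(y, y_hat):
    """Eq. (44): combines correlation, variability ratio, and bias ratio."""
    mu_obs, mu_model = mean(y), mean(y_hat)
    sd_obs, sd_model = pstdev(y), pstdev(y_hat)
    cov = mean((a - mu_obs) * (p - mu_model) for a, p in zip(y, y_hat))
    r = cov / (sd_obs * sd_model)
    return 1.0 - math.sqrt(
        (r - 1) ** 2 + (sd_model / sd_obs - 1) ** 2 + (mu_model / mu_obs - 1) ** 2
    )

observed = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 18.0, 33.0, 36.0]  # every prediction is off by 10%
scores = {
    "RMSPE": rmspe(observed, predicted),
    "MAPE": mape(observed, predicted),
    "NSE": nse(observed, predicted),
    "KGE": kge(observed, predicted),
}
```

In this example every prediction deviates by exactly 10% of the observed value, so both percentage errors are 10%, while NSE (and hence R²) is 1 − 30/500 = 0.94. A perfect prediction gives 0% for the error metrics and 1 for NSE and KGE.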

Table 9. AI models with evaluation matrix.

Buildings	ANN	RF	XGBoost	RBF	Autoencoder	Decision Trees
LUCIA Building	MAPE = 3.62%	MAPE = 4.37%	MAPE = 3.4%	MAPE = 10.6%	MAPE = 4.95%	MAPE = 4.01%
	RMSPE = 11.43%	RMSPE = 14.13%	RMSPE = 16%	RMSPE = 28%	RMSPE = 12.48%	RMSPE = 18.50%
	KGE = 0.9373	KGE = 0.7501	KGE = 0.65	KGE = 0.58	KGE = 0.8237	KGE = 0.8338
	NSE = 0.9267	NSE = 0.8870	NSE = 0.88	NSE = 0.57	NSE = 0.9126	NSE = 0.8081
	R² = 0.9267	R² = 0.8870	R² = 0.88	R² = 0.57	R² = 0.9126	R² = 0.8081
FUHEM Building	MAPE = 0.32%	MAPE = 0.71%	MAPE = 0.71%	MAPE = 3.2%	MAPE = 0.1%	MAPE = 0.5%
	RMSPE = 1.4%	RMSPE = 2.9%	RMSPE = 2.1%	RMSPE = 7.4%	RMSPE = 0.3%	RMSPE = 1.5%
	KGE = 0.81	KGE = 0.6	KGE = 0.71	KGE = 0.35	KGE = 0.96	KGE = 0.8
	NSE = 0.96	NSE = 0.81	NSE = 0.9	NSE = 0.23	NSE = 0.9	NSE = 0.95
	R² = 0.96	R² = 0.81	R² = 0.9	R² = 0.23	R² = 0.99	R² = 0.95
EII Building (Cooling)	MAPE = 1%	MAPE = 0.7%	MAPE = 1.0%	MAPE = 0.1%	MAPE = 0.1%	MAPE = 0.3%
	RMSPE = 1%	RMSPE = 1.6%	RMSPE = 2%	RMSPE = 0.9%	RMSPE = 0.3%	RMSPE = 0.7%
	KGE = 0.9	KGE = 0.81	KGE = 0.86	KGE = 0.9	KGE = 0.97	KGE = 0.97
	NSE = 0.94	NSE = 0.92	NSE = 0.77	NSE = 0.96	NSE = 0.99	NSE = 0.97
	R² = 0.94	R² = 0.92	R² = 0.77	R² = 0.96	R² = 0.99	R² = 0.93
EII Building (Heating)	MAPE = 0.1%	MAPE = 1.09%	MAPE = 0.34%	MAPE = 0.1%	MAPE = 0.01%	MAPE = 0.2%
	RMSPE = 0.2%	RMSPE = 2.5%	RMSPE = 1.1%	RMSPE = 0.1%	RMSPE = 0.015%	RMSPE = 0.8%
	KGE = 0.97	KGE = 0.6	KGE = 0.92	KGE = 0.99	KGE = 0.99	KGE = 0.95
	NSE = 0.99	NSE = 0.74	NSE = 0.94	NSE = 0.99	NSE = 1	NSE = 0.97
	R² = 0.99	R² = 0.74	R² = 0.94	R² = 0.99	R² = 1	R² = 0.97

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

3.1. Actual vs. Prediction Energy (kWh/m²)