A Predictive Approach for Energy Efficiency and Emission Reduction in University Campuses

Rey-Hernández, Alberto; San José-Alonso, Julio; Picallo-Perez, Ana; Rey-Martínez, Francisco J.; Elgharib, A. O.; Rey-Hernández, Javier M.; Salem, Khaled M.

doi:10.3390/app15179419


by Alberto Rey-Hernández ^1,2,*, Julio San José-Alonso ^1,2,3, Ana Picallo-Perez ^1,4, Francisco J. Rey-Martínez ^1,2,3,*, A. O. Elgharib ^1,5, Javier M. Rey-Hernández ^1,6,7 and Khaled M. Salem ^1,5

¹ GIRTER Research Group, Consolidated Research Unit (UIC053) of Castile and Leon, 47002 Valladolid, Spain

² Department of Energy and Fluid Mechanics, Engineering School (EII), University of Valladolid (UVa), 47002 Valladolid, Spain

³ Institute of Advanced Production Technologies (ITAP), University of Valladolid (UVa), 47002 Valladolid, Spain

⁴ Department of Thermal Engineering, Engineering School, University of the Basque Country (UPV/EHU), 01006 Vitoria, Spain

⁵ Department of Basic and Applied Science Engineering, Arab Academy for Science, Technology and Maritime Transport, Smart Village Campus, Giza 12577, Egypt

⁶ Department of Mechanical Engineering, Fluid Mechanics and Thermal Engines, Engineering School, University of Malaga (UMa), 29016 Málaga, Spain

⁷ RE+ Research Group (TEP1003), University of Málaga (UMa), 29010 Málaga, Spain

* Authors to whom correspondence should be addressed.

Appl. Sci. 2025, 15(17), 9419; https://doi.org/10.3390/app15179419

Submission received: 24 July 2025 / Revised: 12 August 2025 / Accepted: 26 August 2025 / Published: 27 August 2025

(This article belongs to the Special Issue Energy Transition in Sustainable Buildings)


Abstract

This study proposes a comprehensive artificial intelligence (AI)-based framework to predict, disaggregate, and optimize energy consumption and associated CO₂ emissions across a multi-building university campus. Leveraging real-world data from 27 buildings at the University of Valladolid (Spain), six AI models—artificial neural networks (ANN), radial basis function (RBF), autoencoders, random forest (RF), XGBoost, and decision trees—were trained on heat exchanger performance metrics and contextual building parameters. The models were validated using an extensive set of key performance indicators (MAPE, RMSE, R², KGE, NSE) to ensure both predictive accuracy and generalizability. The ANN, RBF, and autoencoder models exhibited the highest correlation with actual data (R > 0.99) and lowest error rates, indicating strong suitability for operational deployment. A detailed analysis at building level revealed heterogeneity in energy demand patterns and model sensitivities, emphasizing the need for tailored forecasting approaches. Forecasts for a 5-year horizon further demonstrated that, without intervention, energy consumption and CO₂ emissions are projected to increase significantly, underscoring the relevance of predictive control strategies. This research establishes a robust and scalable methodology for campus-wide energy planning and offers a data-driven pathway for CO₂ mitigation aligned with European climate targets.

Keywords: energy management; CO₂ emissions; environmental impact; sustainability; Artificial Intelligence (AI); machine learning

1. Introduction

Heat exchangers play a vital role in enhancing energy efficiency within buildings, particularly in response to national commitments to sustainability and climate action. With its diverse climate zones, Europe faces distinct heating and cooling challenges. Heat exchangers are increasingly integrated into HVAC systems to optimize energy use, especially in commercial and residential buildings. By transferring heat efficiently, they reduce the need for additional heating or cooling. This is particularly beneficial in areas with hot summers and mild winters, where maintaining indoor temperatures offers large energy savings. Europe’s emphasis on renewable energy, especially solar energy, further motivates the use of heat exchangers so that buildings can exploit solar energy more effectively. Meeting the targets of the EU Energy Efficiency Directive requires innovative heat exchanger technology for energy conservation and lower greenhouse gas emissions, making green building promotion a top priority and showing how innovation can shape the energy dynamics of cities [1,2,3,4,5].

Heat exchangers are vital to minimizing CO₂ emissions in Europe, particularly in the construction sector. By enhancing the energy efficiency of heating, ventilation, and air conditioning (HVAC) systems, heat exchangers limit the consumption of fossil fuels, which are major sources of carbon emissions. As Spain transitions to a more sustainable energy paradigm, heat recovery systems allow buildings to reuse thermal energy, substantially lowering total energy consumption [6,7,8]. These energy savings translate directly into reduced CO₂ emissions, in line with Spain’s commitment to the European Union’s climate goals. Further, as solar, wind, and other renewable sources take a larger share of the energy mix, heat exchangers can optimize energy use in a way that reduces the carbon footprint of buildings even further. Beyond saving energy, heat exchangers are crucial to efforts to combat climate change and lay the foundations for a greener future in Europe and Spain [9,10].

1.1. Literature Review

Among AI models that can help improve heat exchanger efficiency are neural networks, random forest, radial basis function, XGBoost, autoencoders, and decision trees. These models help identify problems, estimate thermal efficiency, and even suggest working conditions that lead to decreased energy consumption and carbon emissions. For example, autoencoders help to detect where the system performance is deficient, while neural networks and XGBoost predict heat transfer rates. Thus, the buildings can adopt these AI tools to ensure better energy management, thereby lowering CO₂ emissions and furthering sustainability goals. Recent research demonstrates how artificial intelligence can optimize heat exchanger performance in building systems. Wenjie Gang and Jinbo Wang [11] developed an ANN model to predict ground heat exchanger temperatures with exceptional accuracy (<0.2 °C error) for hybrid ground source heat pumps, enabling smarter cooling control. Rakesh Kumar et al. [12] created both deterministic and ANN models for earth-to-air heat exchangers, with the intelligent model achieving superior ±2.6% accuracy in outlet temperature predictions. Shojaeefard et al. [13] compared neural network approaches for refrigerant evaporators, finding recurrent neural networks (RNSE = 1.169) outperformed both traditional and GA-optimized feed-forward networks while being computationally efficient. Qinhua Hu et al. [14] established highly accurate static and dynamic ANN models (95–105% agreement, <2.5% error) for MVAC system heat exchangers, providing practical implementation guidance through published network parameters. These studies collectively highlight how different AI techniques—from basic ANNs to advanced RNN and hybrid approaches—can significantly improve heat exchanger modeling accuracy, system efficiency, and control strategies across various building applications, while also reducing computational demands compared to traditional numerical methods [15,16,17]. 
Taki and Rohani [18] developed machine learning models to predict MSW’s Higher Heating Value (HHV), with RBF-ANN achieving the highest accuracy (0.45% MAPE). Their results demonstrate ANNs’ effectiveness for waste-to-energy applications, outperforming SVM and ANFIS in HHV prediction. Radial bias in heat exchangers refers to uneven temperature distribution due to non-uniform flow, reducing efficiency and causing thermal stress. Computational modeling and design optimization help minimize radial bias, improving heat transfer uniformity and system performance. Manimegalai et al. [19] proposed an autoencoder-based model to enhance heat exchanger design by predicting efficiency and cost, outperforming traditional empirical methods. The model enables rapid design exploration with high accuracy (correlation coefficient: 0.98171, NRMSE: 0.001523), offering a data-driven approach for optimizing chemical processing operations. El Mokhtari and McArthur [20] applied autoencoder algorithms for automated fault detection in HVAC systems, specifically Fan Coil Units (FCUs), addressing limitations in traditional FDD methods. The research introduces a novel approach to distinguish equipment-level from system-level faults and demonstrates cross-unit generalizability, showing autoencoders outperform conventional methods in accuracy and efficiency. Wang et al. [21] applied Virtual In situ Calibration (VIC) with Bayesian inference and MCMC to correct sensor errors in PVT heat pump systems, finding sparse autoencoders more effective than mathematical models. This improves calibration accuracy (>90%) by capturing sensor interconnections, reducing both systematic and random errors.

Tree-based machine learning models, including random forest (RF), XGBoost, and decision trees, have emerged as powerful tools for predicting heat exchanger performance due to their ability to handle complex, nonlinear relationships in thermal systems. These models excel in scenarios where traditional physics-based approaches struggle with dynamic operating conditions or incomplete system data. Random forest, an ensemble method, improves prediction accuracy by aggregating multiple decision trees while reducing overfitting—making it particularly useful for analyzing feature importance in heat exchanger efficiency studies [22,23,24,25]. XGBoost, a gradient-boosted decision tree algorithm, further enhances predictive performance through optimized loss function minimization, demonstrating superior accuracy in evaporator capacity prediction compared to conventional neural networks [26,27,28]. Decision trees, though simpler, provide interpretable rule-based insights into key parameters like flow rates and temperature differentials, serving as a foundation for more advanced ensemble methods. The application of tree-based models in heat exchanger analysis extends beyond performance prediction to system optimization and fault detection. RF and XGBoost have been successfully employed to identify critical operational parameters, such as refrigerant pressure and air mass flow rates, which significantly influence heat transfer efficiency [29]. These models also facilitate sensitivity analyses, enabling engineers to prioritize maintenance actions or design improvements. For instance, XGBoost’s ability to handle missing data makes it robust for real-world HVAC system monitoring, where sensor inconsistencies may occur. Meanwhile, decision trees offer rapid prototyping for preliminary system assessments, though they may lack the precision of ensemble methods. 
Collectively, these tree-based approaches provide a balance among accuracy, computational efficiency, and interpretability, making them valuable for both academic research and industrial applications in thermal energy systems.

1.2. Contributions

This paper provides a complete AI approach to examine, reduce, and optimize energy use and CO₂ emissions for an entire university campus of 27 buildings. We used six AI techniques (artificial neural networks (ANN), radial basis function (RBF), autoencoders, random forest (RF), XGBoost, and decision trees) to first predict the total energy consumed on the campus as a whole and then disaggregate the predictions into the share of energy for each building. Crucially, these models were also employed to forecast energy consumption and CO₂ emissions up to 5 years into the future. This allows us to reliably identify the buildings with the highest energy use and to target energy conservation measures effectively. The models also reported the CO₂ emissions associated with energy consumption to indicate opportunities for sustainable operation of the campus. Combining several AI techniques produced strong and accurate forecasts and could form a scalable solution for larger organizations seeking to optimize energy use and the associated emissions in complex built environments. The four key contributions of the study are as follows:

Addresses the impact of Industry 5.0 on energy demand, specifically within the context of sustainability and meeting current market needs while minimizing environmental impact.
Fills a gap in energy consumption forecasting by integrating artificial intelligence approaches, specifically artificial neural networks (ANN) and random forests (RF), into the Industry 5.0 framework.
Implements and compares ANN and RF models using real-world energy consumption data from two specific locations in Spain (LUCIA, FUHEM) using a house-developed code based on MATLAB (Version R2018a).
Evaluates and compares the performance of ANN and RF models in predicting energy demand using a comprehensive set of metrics, including Root Mean Square Percentage Error (RMSPE), Root Mean Square Relative Percentage Error (RMSRPE), Mean Absolute Percentage Error (MAPE), Mean Absolute Relative Percentage Error (MARPE), Kling–Gupta Efficiency (KGE), and the coefficient of determination (R²).

This paper is organized as follows: Section 1 offers an overview, including background information, a review of the relevant literature, and the objectives of the study. Section 2 details the methodology, covering data acquisition, normalization techniques, and mathematical modeling with AI, as well as the optimization processes. Section 3 presents the results along with their discussion. Finally, Section 4 provides the conclusions and suggests directions for future research.

2. Methodology

2.1. Data Acquisition

Researchers gathered extensive energy and operation records from the University of Valladolid (UVa), shown in Figure 1, one of Spain’s earliest and most respected universities, which has been welcoming students since 1241. The campus includes 27 separate buildings (academic faculties, research labs, and administrative offices), each with its own energy habits. Because heat exchangers (HX) are vital to the campus’s heating, cooling, and some industrial activities, special care was taken to track their performance in every structure. Monthly records were pulled for all of 2019, noting thermal loads, fluid flow, temperature differences, and any maintenance issues for each HX. Those detailed HX readings were then linked to each building’s total energy use so that their influence on overall campus efficiency could be clearly seen. The dataset was constructed to give a complete monthly energy profile for 2019 for all 27 buildings; a sample of the data is given in Table 1. To make meaningful comparisons between buildings of different sizes and uses, energy consumption was first normalized with respect to floor area. We also collected additional operational data for facilities with large HX systems, such as those with central HVAC plants or engineering labs. This dataset captured the interactions of the HX systems with other building systems and seasonal variations in performance [30,31,32,33]. External variables, such as weather and building occupancy patterns, were taken into consideration during data collection to measure their impact on HX performance. This thorough approach gave us more insight into how HX systems influence the energy behavior of individual buildings and of the campus as a whole.

The data collection produced a detailed dataset of 324 monthly data points (27 buildings × 12 months) for the entire University of Valladolid campus. The yearlong dataset documented energy usage and HX performance for each building, allowing both campus- and system-level analyses. The 324-point matrix included synchronized measurements of occupancy patterns, weather-related variations in load, the thermal energy provided by HX systems, and electrical consumption. The full-year span of the data allowed identification of both typical seasonal trends (e.g., the summer cooling requirements of lab buildings) and atypical events (e.g., significant efficiency drops in specific HX units in winter). The whole campus and the 27 buildings are shown below [33,34,35,36,37,38].

2.2. Data Preprocessing

The dataset had 324 monthly observations spanning a twelve-month period from 27 buildings. The dataset was complete but needed some cleaning. We put a strict validation process in place to check all metrics, making sure they were correctly measured and consistent with thermodynamic principles. To compare buildings of different sizes, we divided all energy use metrics by floor area (kWh/m²). We also added time-related components to analyze months and seasons, and we converted all quantities to consistent units. We examined the physical bounds of the heat exchanger (HX) metrics. These tests demonstrated that all flow rates, temperature differences, and efficiency measures, from their lowest to their highest values, remained within the manufacturers’ permitted ranges. We accepted the dataset exactly as it was, without deleting any unusual points or imputing missing information. We determined key HX performance metrics that incorporate maintenance corrections and real-time efficiency factors. We then applied feature selection to identify the most significant variables and reduce dimensionality while preserving 98% of the explanatory power of the original data. To prepare the data for machine learning, we encoded categorical building attributes and normalized continuous variables using non-distorting transformations. The resulting dataset was prepared specifically for thermal system performance analysis and energy modelling, and all 324 original observations were verified without distortion. Figure 2 shows the sequence of data processing steps [39]: data collection, where raw data is gathered; data validation to ensure accuracy; data normalization to standardize the data format; and then feature selection, data encoding, and further data normalization and verification to prepare the data for analysis [40].
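The normalization and encoding steps described above can be illustrated with a minimal Python sketch. The record fields, building labels, and values are hypothetical, and min-max scaling is an assumption: the text does not state which scaling transform was applied.

```python
# Hypothetical monthly records; field names and values are illustrative only.
records = [
    {"building": "A", "use": "lab", "energy_kwh": 12000.0, "floor_area_m2": 4000.0},
    {"building": "B", "use": "office", "energy_kwh": 6000.0, "floor_area_m2": 3000.0},
    {"building": "C", "use": "lab", "energy_kwh": 20000.0, "floor_area_m2": 5000.0},
]


def energy_intensity(record):
    """Energy use normalized by floor area, in kWh/m^2."""
    return record["energy_kwh"] / record["floor_area_m2"]


def min_max_scale(values):
    """Rescale values linearly to [0, 1] (assumed transform; constant input maps to 0)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """One-hot encode a categorical attribute; levels are sorted alphabetically."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


intensities = [energy_intensity(r) for r in records]
scaled = min_max_scale(intensities)
encoded = one_hot([r["use"] for r in records])
```

Dividing by floor area first means that the later scaling compares energy intensity rather than building size, which is the stated purpose of the normalization.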

2.3. Mathematical Model

The objective was to use six artificial intelligence algorithms, namely artificial neural networks (ANN), radial basis function networks (RBF), autoencoders, random forest (RF), XGBoost, and decision trees (DT), to study energy use in the campus buildings, with an emphasis on the performance of heat exchangers (HXs). Each model was trained to link building energy use with the operating parameters of the HX (e.g., thermal efficiency, flow rates, and temperature differentials). Both the ANN and RBF models captured the complex nonlinear relationships between HX performance and energy consumption, while the autoencoder was used mainly for reducing dimensionality and revealing latent patterns in the data. The tree-based methods (RF, XGBoost, and DT) gave interpretable insights into feature importance and into how HX efficiency influences energy consumption. Combined, these models provided strong energy use predictions, so that optimization of the HX systems could be directed toward reducing energy waste and improving efficiency on a campus-wide scale [41,42,43,44,45,46,47,48,49,50,51].

The study applied standardized emission factors for different types of fuels to estimate CO₂ emissions from energy consumption on campus. Natural gas, the main energy source for heating and HX systems, was assigned an emission factor of between 0 and 0.25 kg CO₂/kWh, depending on aspects such as combustion efficiency and supply conditions. Biomass, used in a few buildings as a renewable energy alternative, had a much lower range of 0 to 0.0185 kg CO₂/kWh, given that it can be carbon-neutral when harvested sustainably. These factors were applied to the AI-generated energy consumption data to quantify the carbon footprint of each building, with particular attention to emissions associated with HX operations. The study’s results highlighted the significance of HX performance optimization for reducing energy demand and, in turn, CO₂ emissions, in support of the university’s sustainability agenda [6].
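As a rough illustration of how emission factors convert energy consumption into CO₂ emissions, the sketch below multiplies each fuel's energy by its factor and sums the results. It uses the upper bounds of the ranges quoted above; the building's energy split is a hypothetical example, not data from the study.

```python
# Upper bounds of the emission-factor ranges quoted in the text (kg CO2 per kWh).
EMISSION_FACTORS = {"natural_gas": 0.25, "biomass": 0.0185}


def co2_emissions_kg(energy_kwh_by_fuel, factors=EMISSION_FACTORS):
    """Sum of (energy in kWh) x (emission factor in kg CO2/kWh) over all fuels."""
    return sum(kwh * factors[fuel] for fuel, kwh in energy_kwh_by_fuel.items())


# Hypothetical building: 10,000 kWh from natural gas and 2,000 kWh from biomass.
building = {"natural_gas": 10_000.0, "biomass": 2_000.0}
total_kg = co2_emissions_kg(building)  # roughly 2537 kg CO2
```

Because the calculation is linear in energy use, any reduction in predicted consumption carries over proportionally to the emission estimate for a given fuel mix.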

2.3.1. Mathematical Model (ANN)

Artificial neural network (ANN) modeling seeks to describe HX performance mathematically through a nonlinear mapping between the operating parameters and energy efficiency. In the ANN architecture, the most important HX variables (the inlet/outlet temperatures, the mass flow rate ṁ, and the thermal resistance) are passed through the hidden layers to produce an energy consumption curve. This data-driven approach was able to represent the complex thermodynamics of HX operation and avoid the problems with variable flow conditions associated with the traditional log mean temperature difference (LMTD) methodology [52]. Table 2 shows the mathematical model for ANN.
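A minimal sketch of the forward pass of such a network is given below, assuming a single hidden layer of five neurons (the size reported in Section 2.5), tanh hidden activations, and a linear output. The activation choice, the input scaling, and all weight values are assumptions for illustration and are not the trained parameters of the study.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer network: tanh hidden units followed by a linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical scaled inputs: inlet temperature, outlet temperature,
# mass flow rate, thermal resistance.
x = [0.6, 0.4, 0.5, 0.2]

# Five hidden neurons with placeholder weights (not trained values).
w_hidden = [[0.1 * (i + 1)] * 4 for i in range(5)]
b_hidden = [0.0] * 5
w_out = [0.2] * 5
b_out = 0.1

y = forward(x, w_hidden, b_hidden, w_out, b_out)
```

Training would adjust `w_hidden`, `b_hidden`, `w_out`, and `b_out` to minimize the mean squared error between `y` and the measured energy consumption.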

2.3.2. Mathematical Model (RBF)

The study used a radial basis function (RBF) neural network to model the relationship between heat exchanger (HX) operational parameters and energy demand across campus buildings. The RBF model applied Gaussian kernel functions to represent the nonlinear relationships between HX operational parameters (flow rates, temperature differentials (dT), and thermal load profiles) and the associated energy consumption. As a result, it could capture localized changes in performance, for example under partial-load conditions with uncertain demand, that conventional approaches assuming linear relationships would miss. With the spread parameter optimized by cross-validation, the RBF model generated reliable predictions of energy demand with acceptable precision (MAE < 5%) in a reasonable computation time. The results show that RBF networks are useful for modelling energy relationships at the heat exchanger level across multiple campus buildings, given the complexity of the operational parameters and the changes in demand across thermal zones and occupant behaviors. The model’s ability to interpolate between sparse data points made it valuable for identifying energy-saving opportunities in HX operation without requiring exhaustive physical testing [53]. Table 3 shows the mathematical model for RBF.
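The localized behavior of Gaussian kernels can be sketched as follows. The kernel parametrization exp(-d²/(2σ²)), the centers, and the weights are assumptions for illustration; the exact form used in the study is the one given in its Table 3.

```python
import math


def gaussian(x, center, spread):
    """Gaussian kernel exp(-||x - c||^2 / (2 * spread^2))."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * spread ** 2))


def rbf_predict(x, centers, weights, bias, spread):
    """Weighted sum of Gaussian activations plus a bias term."""
    return bias + sum(w * gaussian(x, c, spread) for w, c in zip(weights, centers))


# Hypothetical centers in (scaled flow rate, scaled dT) space, placeholder weights.
centers = [[0.2, 0.3], [0.8, 0.7]]
weights = [1.0, 2.0]

# With a small spread, a query at the first center is dominated by that center.
near_first = rbf_predict([0.2, 0.3], centers, weights, bias=0.0, spread=0.1)
```

A small spread makes each kernel respond only near its own center, while a large spread blends the centers together; this is the trade-off that tuning the spread by cross-validation is meant to resolve.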

2.3.3. Mathematical Model (Autoencoder)

The study investigated the energy requirements of heat exchangers (HX) using an autoencoder neural network. Temperatures, pressure drops, and flow rates were among the operational data that the autoencoder compressed into a latent space representation. This data reduction procedure filtered out noise in the original sensor readings while identifying underlying trends and anomalies in HX performance. The reconstructed output provides a clean version of the energy demand signals, which helps to spot system inefficiencies. Each autoencoder variant used a bottleneck structure to describe nonlinear relationships between HX parameters and energy output. This allows HVAC systems to adapt their performance in real time, as shown in Table 4 [54,55].

2.3.4. Mathematical Model (Random Forest)

Random forest is an ensemble algorithm that can be used for energy demand forecasts. The method builds multiple decision trees using random subsets of the training data and features. These decision trees independently make predictions based on the inputs (for example, historical usage, weather, and socioeconomic data). The final forecast is the average of the predictions of all decision trees, which improves accuracy and reduces overfitting. Random forest also reports feature importance values for the input variables, which can help energy providers evaluate which variables matter most for predicting demand, such as the feature importance lists shown in Table 5 [56].

2.3.5. Mathematical Model (XGBoost)

XGBoost is an advanced machine learning technique used for energy demand forecasting, known for its effectiveness with large datasets and complex relationships. It builds decision trees sequentially, where each tree addresses errors from previous ones, optimizing performance through gradient boosting to minimize a specified loss function. By incorporating features like historical energy consumption, weather patterns, and time variables, XGBoost captures non-linear relationships effectively; the equations are shown in Table 6. Its built-in regularization helps prevent overfitting, ensuring accurate and robust forecasts for energy providers [57].
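The sequential error-correcting idea can be sketched with plain gradient boosting on squared error, using one-split trees (stumps) on a single feature. This is a simplified stand-in, not the XGBoost library or its regularized objective; the data, the number of rounds, and the learning rate are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, preds)
        ]
    return base, stumps, preds


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical data: one scaled feature and an energy-like target.
xs = [1, 2, 3, 4, 5, 6]
ys = [10.0, 12.0, 11.0, 20.0, 22.0, 21.0]
base, stumps, preds = boost(xs, ys)
```

For squared error the residuals are the negative gradient of the loss, so fitting each new tree to them is the gradient-boosting step; XGBoost adds second-order information and regularization on top of this scheme.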

2.3.6. Mathematical Model (Decision Tree)

Decision tree algorithms are effective tools for forecasting energy demand, utilizing a tree-like structure to map input features to predicted outcomes. They work by recursively splitting the dataset based on the feature that provides the highest information gain, using factors like time of day, temperature, and historical usage patterns, as shown in Table 7. The result is a series of interpretable if–then rules that reveal the key drivers of energy consumption. Decision trees can handle both categorical and continuous variables, and their integration into ensemble methods like random forests enhances predictive accuracy and robustness in energy demand forecasting [58].

2.4. KPI Matrix

A comprehensive KPI matrix was used to assess how well the AI models predicted the energy demand of heat exchangers (HX). Prediction accuracy was measured in percentage terms using the Mean Absolute Percentage Error (MAPE) and Root Mean Squared Percentage Error (RMSPE): MAPE measures the average deviation, while RMSPE penalizes larger errors more severely. The Mean Absolute Relative Percentage Error (MARPE) and Root Mean Squared Relative Error (RMSRE) normalized errors against actual demand changes, which makes them well suited for assessing relative performance in HX systems with fluctuating loads. The linear relationship between predicted and observed values was also evaluated using R-squared (R²) and the correlation with real demand to ensure that the models accurately reflected the underlying trends in energy usage [6].

To assess the technical validity of the models, advanced metrics such as Nash–Sutcliffe Efficiency (NSE) and Kling–Gupta Efficiency (KGE) were employed for a more comprehensive examination of model reliability. The KGE metric gave a fuller picture of reliability for HX energy prediction by breaking model performance into components of correlation, bias, and variability. The NSE compared the AI models against a baseline that always predicts the observed mean: it ranges from −∞ to 1, a value of 0 indicates no improvement over that baseline, and values closer to 1 indicate better performance. These key performance indicators (KPIs) ensured that the AI models provided consistency (KGE, NSE), explanatory power (R²), and low error (MAPE, RMSPE) across different HX operating conditions. The variety of metrics supported the development of robust, widely applicable models for optimizing HX energy use in real-life applications. These metrics, shown in Table 8, provide a sound basis for evaluating and improving model performance, and therefore for better insight and decision-making in modeling [39].
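The sketch below shows how several of these KPIs could be computed. It assumes the commonly used textbook definitions (including the 2009 form of KGE built from correlation, variability ratio, and bias ratio); the exact formulas applied in the study are those of its Table 8, and the sample values are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    """Population standard deviation."""
    m = _mean(values)
    return math.sqrt(_mean([(v - m) ** 2 for v in values]))


def mape(obs, sim):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * _mean([abs((o - s) / o) for o, s in zip(obs, sim)])


def rmspe(obs, sim):
    """Root Mean Squared Percentage Error, in percent."""
    return 100.0 * math.sqrt(_mean([((o - s) / o) ** 2 for o, s in zip(obs, sim)]))


def pearson_r(obs, sim):
    """Linear correlation between observed and simulated values."""
    mo, ms = _mean(obs), _mean(sim)
    cov = _mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)])
    return cov / (_std(obs) * _std(sim))


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 matches the mean baseline."""
    mo = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    """Kling-Gupta Efficiency from correlation, variability ratio, and bias ratio."""
    r = pearson_r(obs, sim)
    alpha = _std(sim) / _std(obs)
    beta = _mean(sim) / _mean(obs)
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical observed and predicted monthly energy values (kWh).
observed = [100.0, 120.0, 90.0, 150.0]
predicted = [110.0, 115.0, 95.0, 140.0]
scores = {
    "MAPE": mape(observed, predicted),
    "RMSPE": rmspe(observed, predicted),
    "R2": pearson_r(observed, predicted) ** 2,
    "NSE": nse(observed, predicted),
    "KGE": kge(observed, predicted),
}
```

Note that the percentage metrics divide by the observed value, so they are undefined for months with zero observed consumption.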

2.5. Optimization Procedures

To improve the predictive performance of the six artificial intelligence models (artificial neural network, radial basis function, autoencoder, random forest, XGBoost, and decision tree) used to examine heat exchanger energy consumption patterns across the university campus and its 27 individual buildings, this study implemented a rigorous optimization framework in MATLAB (Version R2018a). To ensure robust model validation, the entire dataset, which included a variety of heat exchanger operational parameters, building characteristics, and environmental factors, was separated into training (70%) and testing (30%) sets at the start of the optimization process.
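A 70/30 split of the 324 observations could look like the following sketch. A seeded random shuffle is assumed here; the text does not say whether the split was random, chronological, or grouped by building, and the study itself used MATLAB rather than Python.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 324 monthly observations (27 buildings x 12 months).
rows = list(range(324))
train, test = train_test_split(rows)
```

With 324 rows this gives 227 training and 97 testing observations; fixing the seed keeps the split reproducible across the six models so that their KPIs are comparable.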

To balance computational efficiency and model complexity, we used a one-hidden-layer architecture for the artificial neural network model, with five neurons in the hidden layer chosen through iterative testing. The Levenberg–Marquardt backpropagation algorithm was used in the training protocol, which had a strict performance goal (mean squared error ≤ 1 × 10⁻³), a maximum of 500 epochs, and a learning rate of 0.01. To avoid overfitting and guarantee optimal convergence, the training procedure included an early stopping mechanism that was activated after 20 consecutive validation failures. The radial basis function network underwent specific optimization of its spread factor through an exhaustive grid search across the range of 0.1 to 2.0, with performance evaluated using k-fold cross-validation. The autoencoder model focused on latent space optimization, where we systematically tested dimensions from 2 to 10 nodes to achieve the optimal balance between dimensionality reduction and feature preservation, measured by reconstruction error minimization. We used Bayesian optimization for the tree-based methods (random forest, XGBoost, and decision tree) to simultaneously tune several hyperparameters, such as the number of features to consider at each split, the minimum leaf size, the maximum tree depth, and, for the ensemble methods, the number of trees. This strategy worked especially well for handling the bias–variance tradeoff across the different operating conditions of the campus buildings. The block diagram in Figure 3 shows the relationship between these elements and their collective impact on energy demand trends.
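The three stopping criteria for ANN training (performance goal, epoch limit, and validation-based early stopping) can be sketched as a generic loop. The `step` callback and the loss curves below are hypothetical placeholders for one epoch of training; only the thresholds (1 × 10⁻³, 500 epochs, 20 validation failures) come from the text.

```python
def train_with_early_stopping(step, max_epochs=500, goal=1e-3, patience=20):
    """Run step(epoch) -> (train_mse, val_mse) until a stopping criterion is met.

    Returns (stop_epoch, reason).
    """
    best_val = float("inf")
    fails = 0
    for epoch in range(1, max_epochs + 1):
        train_mse, val_mse = step(epoch)
        if train_mse <= goal:
            return epoch, "goal reached"
        if val_mse < best_val:
            best_val, fails = val_mse, 0  # validation improved, reset the counter
        else:
            fails += 1  # one more consecutive validation failure
            if fails >= patience:
                return epoch, "validation stop"
    return max_epochs, "max epochs"


def overfitting_step(epoch):
    """Hypothetical curves: validation error is lowest at epoch 30, then rises."""
    return 1.0 / epoch, abs(epoch - 30) + 1.0


result = train_with_early_stopping(overfitting_step)
```

In this example the validation error stops improving after epoch 30, so training halts 20 epochs later even though the training error is still falling, which is the overfitting pattern early stopping is meant to catch.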

3. Results and Discussions

This study evaluated energy consumption projections for the Valladolid campus using six artificial intelligence models: XGBoost, random forest, autoencoder, radial basis function, artificial neural networks, and decision trees. The campus-level evaluation revealed differences among the models in forecasting energy use. Some models were more consistent when applied to large consumption values, while others were more variable when applied to smaller consumption values. Variability was also found in the evaluation of the 27 individual buildings, with model performance varying according to the size, purpose, and data availability of each building.

A comparison of the AI models across individual buildings indicated that model performance depends on several structural and operational aspects. Buildings with stable and predictable energy demands exhibited one prediction profile, while buildings with stochastic energy consumption exhibited a different one. Some AI models adapted to this complexity and remained accurate, whereas the performance of others was sensitive to the variability in the data. Given this discrepancy, we suggest that the relative prediction performance of an AI model is very likely related to both the model structure and the building’s energy consumption profile.

The integration of heat exchangers into campus energy systems can significantly affect total energy consumption and CO₂ emissions. Heat exchangers recover waste heat and increase thermal efficiency, which can reduce the quantity of primary energy needed and the carbon emissions associated with its use. The energy source affects how much CO₂ is avoided: systems powered by renewable energy will reduce emissions more than those powered by fossil fuels. Also, while heat exchangers improve energy use, their efficiency in practice depends on system design, maintenance, and operating conditions. Exploring their impact, possibly in conjunction with AI energy models, could provide deeper insights into how to minimize emissions while maintaining energy efficiency across building types.

3.1. Validation of the Whole Campus

The Valladolid campus conducts focused research in energy conservation and sustainability, with special attention to heat exchanger systems. These systems are fundamental to minimizing energy waste because they transfer heat between fluids. The facilities have incorporated technology that lowers system energy consumption by 30% in most instances relative to conventional systems. This commitment to conservation improves efficiency and sets an example of environmentally sustainable practice for the rest of the academic world.

Figure 4 shows a box plot of the predicted energy consumption (in kWh) for the artificial neural network (ANN), random forest (RF), XGBoost (XG), radial basis function (RBF), and autoencoder (AUTO) models alongside the actual values. The median energy consumption values are: Actual (1.5 × 10⁵ kWh), ANN (1.1 × 10⁵ kWh), RF (1.3 × 10⁵ kWh), XG (1.4 × 10⁵ kWh), RBF (1.2 × 10⁵ kWh), and AUTO (1.3 × 10⁵ kWh). With the lowest median energy consumption of 1.1 × 10⁵ kWh, the ANN model performs better than the others, demonstrating its ability to anticipate energy usage with accuracy. This improved performance implies that applying the ANN model to heat exchanger operations may result in more accurate energy management, which would ultimately raise productivity and lower operating expenses throughout the Valladolid campus.

Figure 5 compares the measured energy consumption values with the predictions of the different AI models: random forest (RF), XGBoost (XG), autoencoder, radial basis function (RBF), artificial neural network (ANN), and decision tree. Perfect estimation is represented in each plot by a diagonal dotted line, where predicted and true values are equal. The closer the points lie to this line, the more accurate the model’s predictions. This visualization allows each model’s ability to predict energy consumption to be examined in detail.

Each model displays a different degree of accuracy in its predictions. The ANN, RBF, and autoencoder models show a strong correlation between predicted and actual values, with most data points closely aligning with the diagonal line. The random forest model also performs well but exhibits slightly more dispersion in its predictions. The XGBoost model, while relatively accurate, shows a broader spread of points, indicating some variability in its predictions. The RBF and autoencoder models reveal a more significant divergence from the ideal line, suggesting that they may require further tuning to enhance their predictive capabilities. The decision tree model, while offering some insights, shows the most deviation from actual values, indicating it is the least effective among the models presented.

The comparison of the models is essential for optimizing energy management practice, specifically for heat exchangers on the Valladolid campus. Because the ANN, RBF, and autoencoder models were the most accurate, they can be regarded as the most effective tools for predicting energy consumption, allowing for improved operational efficiency and lower spending. By leveraging the strengths of the best models, the campus will have more effective options for developing strategies that optimize energy use while reducing its overall carbon footprint. As the campus continues to refine its AI models, it will be important to keep improving prediction accuracy so that ANN-based forecasts can support energy efficiency in future applications.

Q-Q plots are useful for evaluating whether the predicted energy consumption values from AI models follow a theoretical normal distribution. Determining the distribution of the energy consumption predictions for heat exchangers is essential when drawing conclusions about patterns, anomalies, or model performance. In each of the Q-Q plots, the quantiles of the values predicted by a model (ANN, RF, XG, RBF, autoencoder, or decision tree) are compared against the theoretical quantiles of the normal distribution. If the points fall nearly along the diagonal line, the predicted values from the model are approximately normally distributed, which is an assumption of many statistical analyses.
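As an illustration of how the points of a normal Q-Q plot can be constructed, the following minimal sketch pairs each sorted sample value with the matching quantile of a normal distribution fitted to the sample. The plotting position (i − 0.5)/n is one common convention and is an assumption here, as are the function name and the sample values; the source does not state how its plots were generated.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Return (theoretical_quantile, sample_quantile) pairs for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    mu, sigma = mean(ordered), stdev(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, sample_q in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # plotting position (assumed convention)
        theoretical_q = mu + sigma * standard_normal.inv_cdf(p)
        points.append((theoretical_q, sample_q))
    return points


# Illustrative values only, not data from the study.
points = qq_points([1.0, 2.0, 3.0, 4.0, 5.0])
for theoretical, sample in points:
    print(f"{theoretical:8.3f}  {sample:8.3f}")
```

Points lying close to the line where both coordinates are equal indicate an approximately normal sample; systematic departures in the tails indicate heavier or lighter tails than the normal distribution.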

When looking at the Q-Q plots shown in Figure 6, we can observe distinctive performance characteristics of the models. The points in the ANN plot follow the diagonal more closely than those of the other models. This strongly indicates that its predictions are approximately normally distributed, and it is also a good indicator of predictive ability and reliability in estimating energy consumption. The RF and XGBoost plots deviate from the diagonal in the tails, suggesting that these models struggled with extreme values. The RBF and autoencoder plots deviate further from the diagonal, a stronger indication that those models need improvement before their predictions approach a normal distribution. The decision tree plot lies furthest from the 45° diagonal line, suggesting that it is a poor means of predicting energy consumption in this situation. The insights gained from the Q-Q plots have important implications for energy management strategies involving heat exchangers. Models that produce predictions closely aligned with a normal distribution, such as ANN, are likely to be more reliable for forecasting energy consumption patterns.

The histograms show the residuals from the six AI models: artificial neural network (ANN), random forest (RF), XGBoost (XG), radial basis function (RBF), autoencoder, and decision tree. Residuals are defined as the differences between the predicted values and the actual observed values of energy consumption. Examining the residuals is necessary to assess how well each model performs and whether there are any biases in its predictions. Residuals should ideally be normally distributed around zero, indicating unbiased model predictions and a lack of systematic error.
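The residual analysis can be sketched as follows. The residual is taken as predicted minus actual, following the definition in the text; the function names, the equal-width binning, and the sample values are illustrative assumptions rather than the authors' implementation.

```python
from statistics import mean, pstdev


def residual_summary(actual, predicted):
    """Return the residuals, their mean (bias), and their standard deviation."""
    residuals = [p - a for a, p in zip(actual, predicted)]
    return residuals, mean(residuals), pstdev(residuals)


def histogram(values, n_bins=5):
    """Count values in n_bins equal-width bins spanning [min, max]."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    if width == 0:
        width = 1.0  # all values identical: everything falls in the first bin
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - low) / width), n_bins - 1)  # clamp the maximum
        counts[index] += 1
    return counts


# Illustrative values only, not data from the study.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
residuals, bias, spread = residual_summary(actual, predicted)
print(bias, spread, histogram(residuals, n_bins=2))
```

A bias close to zero with a small spread corresponds to the tight, centered histogram described for a well-behaved model.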

Figure 7 also shows recognizable patterns in the residuals of each model. The ANN model shows a relatively tight distribution of residuals around a mean of 0, which suggests that it is a reasonable predictor with bias near zero. The RF and XGBoost models show somewhat wider distributions: they also provide reasonable predictions, but with more variability. Their tails are more pronounced than those of the ANN model, suggesting that they may have difficulty with predictions in some ranges of the data, which could result in over- or under-prediction in extreme cases. The RBF and autoencoder models have wider distributions, suggesting that they need further optimization and could potentially have been built differently for improved accuracy. Finally, the decision tree histogram is much wider about the mean and more skewed, indicating a tendency toward larger prediction errors and suggesting that it is likely the least reliable model in this use case.

3.2. Evaluation of Performance Matrix

Figure 8 contains two bar charts comparing the six artificial intelligence (AI) models: ANN, RF, XGBoost, RBF, autoencoder, and decision tree. The models are evaluated on the energy consumption of the heat exchangers. The bar chart on the left shows the Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE), both common measures of prediction accuracy. The lower the RMSE and MAE, the better the model performance, as this signifies a smaller difference between the predicted and actual values. The bar chart on the right shows the Kling–Gupta Efficiency (KGE), Nash–Sutcliffe Efficiency (NSE), and the coefficient of determination (R²). KGE and NSE are predominantly used in hydrological modeling, but, like R², they can be used to assess the predictive power and goodness-of-fit of models in other engineering problems. For KGE, NSE, and R², higher values are better (close to 1 is ideal), meaning that the predicted values closely match the observed values.
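For reference, the five indicators can be computed as in the minimal sketch below. It uses the standard textbook definitions, with KGE taken as the Kling–Gupta efficiency in its original three-component form and R² taken as the squared Pearson correlation. Both choices are assumptions, since the source does not state which variants were used, and the sample values are illustrative only.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(obs, pred):
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / len(obs)
    return cov / (_std(obs) * _std(pred))


def rmse(obs, pred):
    return math.sqrt(_mean([(p - o) ** 2 for o, p in zip(obs, pred)]))


def mae(obs, pred):
    return _mean([abs(p - o) for o, p in zip(obs, pred)])


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    mo = _mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def kge(obs, pred):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    r = pearson_r(obs, pred)
    alpha = _std(pred) / _std(obs)    # variability ratio
    beta = _mean(pred) / _mean(obs)   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def r_squared(obs, pred):
    return pearson_r(obs, pred) ** 2  # squared-correlation variant (assumed)


# Illustrative values only: predictions offset from observations by a constant.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, predicted), mae(observed, predicted),
      nse(observed, predicted), kge(observed, predicted),
      r_squared(observed, predicted))
```

The example shows why several indicators are reported together: a constant offset leaves the correlation-based R² at 1 while NSE and KGE drop, because they also penalize bias.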

By analyzing both charts, we can identify the AI models that best predict energy consumption in the heat exchanger. In terms of error, the RBF and autoencoder models have the lowest RMSE and MAE, which indicates the most accurate predictions. The right-hand chart supports this: RBF and autoencoder yield KGE, NSE, and R² values very close to 1, suggesting a good fit to the data and high predictive value. While ANN also yields relatively low error values, its KGE, NSE, and R² are lower than those of RBF and autoencoder, which suggests that its predictions are less reliable. The XGBoost, RF, and decision tree models have higher error values and lower efficiency and R² values than the RBF and autoencoder models, indicating lower prediction accuracy. Based on these metrics, the RBF and autoencoder models appear to provide the best predictions of energy consumption in the heat exchanger, thereby contributing to more sustainable energy management and efficiency within the Valladolid campus.

Figure 9 is a correlation heatmap that shows the correlation coefficients among the variables. The variables include the “Actual” values and the “Predicted” values from the six AI models: ANN, RF, XGBoost, RBF, autoencoder, and decision tree. The correlation coefficient ranges from −1 to 1, where +1 is a perfect positive correlation, −1 is a perfect negative correlation, and 0 is no linear correlation. The color scale shows the strength of the correlation, with dark red representing the strongest positive correlation (closer to 1) and dark blue representing the weakest correlation. Each cell at the intersection of a row and column shows the correlation between the corresponding variables.

A higher correlation coefficient between the actual and predicted values indicates that the model’s predictions closely follow the actual trend, signifying better model performance. From the heatmap, “Predicted ANN,” “Predicted RBF,” and “Predicted auto” all show very high correlation coefficients with “Actual” values (0.9939, 0.9915, and 0.9948, respectively). This suggests that these three models are highly accurate in their predictions. “Predicted RF” and “Predicted tree” also show strong positive correlations (0.8269 and 0.7685, respectively), but they are not as high as those of ANN, RBF, and autoencoder. “Predicted xgboost” exhibits the lowest correlation with “Actual” values (0.7016) among all the models, indicating that its predictions are less aligned with the actual data than those of the other models. Therefore, based on this correlation analysis, ANN, RBF, and autoencoder appear to be the most reliable models in terms of their ability to predict outcomes that closely mirror the actual observations.
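The values in such a heatmap are pairwise Pearson correlation coefficients. A minimal sketch of how the matrix can be assembled is given below; the series names and values are hypothetical and do not reproduce the study's data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series):
    """Return a nested dict with the correlation between every pair of series."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}


# Hypothetical series for illustration only.
series = {
    "Actual": [1.0, 2.0, 3.0, 4.0],
    "Predicted A": [2.0, 4.0, 6.0, 8.0],   # perfectly aligned trend
    "Predicted B": [4.0, 3.0, 2.0, 1.0],   # perfectly opposed trend
}
matrix = correlation_matrix(series)
for name, row in matrix.items():
    print(name, [round(value, 3) for value in row.values()])
```

Note that correlation measures trend alignment only: "Predicted A" is twice the actual value in every sample yet still correlates perfectly, which is why the error metrics are reported alongside the heatmap.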

3.3. Validation of Each Building

Figure 10 presents a visual analysis of the energy consumption predictions for the 27 buildings at the University of Valladolid. Each image is a grid of nine plots titled with the building IDs: the first two images cover the “D” series (D01 to D013), while the third image covers the “E” series (E01, E02, etc.). Each subplot compares the “Actual” energy consumption (black line) with the predictions of the three chosen AI models: (a) artificial neural network (ANN, red line), (b) radial basis function (RBF, green line), and (c) autoencoder (magenta line). Each plot therefore visualizes how well the models estimate energy consumption for a specific building across the “Sample Indices,” which likely represent different periods of time or data points.

An in-depth analysis of the 27 plots indicated varying levels of predictive accuracy for the three AI models across the buildings. For many of the buildings, mostly those in the “D” series (e.g., D01, D04, D07, D08, D010, D011), both the ANN and RBF models estimated the actual energy consumption exceptionally well. The predicted lines overlapped or lay in tight proximity to the measured energy consumption data, strongly suggesting predictive accuracy and robustness. This performance highlights the reliability of the ANN and RBF models for energy forecasting in these buildings.

On the other hand, the autoencoder model behaves more variably. It fits the actual consumption well in some cases (e.g., D01, D04) but is very inaccurate in others. For some buildings, especially D02, D05, D06, D09, and D013 from the “D” series and E01, E02, E04_1, and E05 from the “E” series, the autoencoder prediction line deviates more strongly, with larger over- or under-estimation of consumption than ANN or RBF. These differences suggest that the autoencoder is more sensitive to the consumption profile of the specific building: it may need better tuning, or its architecture may not be uniformly suitable for all building types on this campus.

The overarching objective of this extensive analysis is to identify the most effective AI models for predicting energy consumption across the entire University of Valladolid campus. Accurate energy consumption forecasting is paramount for implementing efficient energy management strategies, optimizing resource allocation, reducing operational costs, and ultimately contributing to the university’s sustainability goals. By analyzing the performance of ANN, RBF, and autoencoder across 27 diverse buildings, the study aims to determine which models offer the best balance of accuracy and adaptability for a complex, multi-building environment.

This detailed visual comparison is an essential diagnostic tool that clarifies the strengths and weaknesses of each AI model when applied to real energy consumption data. Visualizing these differences by building allows researchers and energy managers to decide rationally which models to use and whether combining models is feasible. For example, where ANN or RBF performed particularly well, these models could serve as forecasting tools for energy managers; in buildings where the autoencoder had specific difficulties, hybrid models, alternative models, or further parameter optimization of the autoencoder would be necessary to build a successful forecasting model.

The analysis of all 27 buildings of the University of Valladolid using the ANN, RBF, and autoencoder models is an essential first step towards establishing an intelligent energy intervention and management system. The evidence from these visual comparisons suggests that ANN and RBF generally provide more accurate and consistent predictions across a varied set of building types, including both the “D” series and “E” series buildings. While autoencoders show promise, their variability means that further investigation or customized application is needed. The additional insights from these plots will be essential for a data-driven approach to energy efficiency, helping energy managers use scarce resources and maintain the campus environment as responsibly and sustainably as possible.

The building-level correlation heatmaps depict strong relationships between actual energy consumption and the values predicted by the AI models for several buildings on the university campus. Similarly high correlation patterns were found in all 27 buildings assessed, which indicates the strength of the models’ predictive capacity. The consistently high correlation indicates that the models, including artificial neural networks (ANN), radial basis function (RBF), and autoencoders (AE), effectively learned the complexity inherent in the energy consumption of different building types and purposes. The consistent performance across the entire dataset of 27 buildings supports the reliability and generalizability of the proposed AI framework for university-wide energy management and optimization.

A careful examination of the correlation heatmaps for buildings D01, D08, E03, and E014 further reinforces the high performance of the models. The heatmaps show the correlations between the actual and predicted values; the diagonal values (i.e., the correlation of each variable with itself) are all 1.0, as expected. More importantly, the off-diagonal values, particularly those relating “Actual” energy consumption to “Predicted ANN”, “Predicted RBF”, and “Predicted Auto”, are very high and positive (sometimes greater than 0.9). For example, in Figure 11a–d, each correlation coefficient between “Actual” and the respective predicted values approaches 1, suggesting that observed energy usage in these buildings aligns closely with the models’ forecasts. The sample selected for visual representation showed high levels of agreement, confirming that the individual AI models suitably captured complex energy behavior at the building level, which is particularly important for characterizing high-consumption areas and refining energy conservation efforts based on evidence.

3.4. CO₂ Emissions on Campus

Figure 12 shows the CO₂ emissions (kg) associated with energy use across the entire university campus, with an emphasis on biomass energy, as predicted by the different AI models. In the scatter plot, the “Actual” values (shown in yellow) provide a baseline of reported CO₂ emissions. The emissions span a broad range, from near zero to slightly over 4000 kg, with a few extreme outliers exceeding 5000 kg for some models. The spread of the data points for the models (ANN, RF, XGBoost, RBF, and autoencoder) reflects their different predictions of CO₂ emissions from biomass consumption across the university’s 27 buildings. The distribution of total CO₂ emissions across the campus, the footprint attributed to each building, and the energy consumption patterns provide critical information for targeting emission reduction strategies.

When examining the performance of the individual AI models in predicting CO₂ emissions from biomass, the goal is to identify which model’s predictions most closely align with the “Actual” observed values. Visually inspecting the plot, we see that the artificial neural network (ANN), radial basis function (RBF), and autoencoder (Auto) models appear to have distributions that closely mirror the “Actual” yellow data points in terms of both range and density, particularly in the lower to mid-range emission values. For instance, the clusters of blue (ANN), cyan (RBF), and magenta (autoencoder) points show a strong resemblance to the spread of the actual data. While all models capture the general trend, these three models demonstrate a higher fidelity in reflecting the real-world CO₂ emissions attributed to biomass energy consumption across the campus. This proximity to actual values makes ANN, RBF, and autoencoder strong candidates for reliable CO₂ emission monitoring and forecasting in the context of biomass energy use.
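The conclusions of this study state that emissions were estimated by applying standardized emission factors to the energy consumption predictions. A minimal sketch of that conversion is shown below. The function name, the building IDs used as keys, the energy values, and the emission factor are placeholders for illustration, not values from the study.

```python
def co2_emissions_kg(energy_kwh_by_building, emission_factor_kg_per_kwh):
    """Convert predicted energy use (kWh) into CO2 emissions (kg) per building."""
    return {
        building: energy_kwh * emission_factor_kg_per_kwh
        for building, energy_kwh in energy_kwh_by_building.items()
    }


# Placeholder inputs for illustration only.
predicted_energy = {"D01": 1000.0, "E01": 2000.0}
placeholder_factor = 0.02  # kg CO2 per kWh, NOT a value from the study

emissions = co2_emissions_kg(predicted_energy, placeholder_factor)
total_emissions = sum(emissions.values())
print(emissions, total_emissions)
```

In a real application the factor would depend on the energy carrier of each building (for example biomass or natural gas), so a per-building or per-carrier factor would replace the single constant used here.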

3.5. Sensitivity Analysis and Feature Importance

The bar charts in Figure 13 present the results of the sensitivity analysis and feature importance for each of the six AI models employed: (a) ANN, (b) RF, (c) XGBoost, (d) RBF, (e) autoencoder, and (f) decision tree. Understanding feature importance is critical for interpreting how each model arrived at its predictions, because it reveals which input variables had the most significant impact on the model’s output. While the specific labels for the “Features” are not provided, it is evident that the models attribute varying degrees of importance to different input variables. For instance, in panel (a) for ANN, one feature stands out with overwhelmingly high importance, suggesting that the ANN relies heavily on this specific input for its predictions. Conversely, in panel (d) for RBF, several features appear to contribute almost equally and significantly to the model’s performance, indicating a more distributed reliance on its inputs. This variance in feature importance across the models highlights their distinct internal mechanisms and sensitivities to the input data.
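The source does not state how feature importance was computed. One model-agnostic option that applies equally to all six model types is permutation importance, sketched below as an assumption: each feature column is shuffled in turn, and the resulting increase in mean squared error is taken as that feature's importance. The toy model and data are hypothetical.

```python
import random


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(targets, [predict(row) for row in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and target
            shuffled = [row[:j] + [value] + row[j + 1:]
                        for row, value in zip(rows, column)]
            score = mse(targets, [predict(row) for row in shuffled])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy model: the output depends on feature 0 only.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
targets = [3.0 * row[0] for row in rows]


def toy_model(row):
    return 3.0 * row[0]


importances = permutation_importance(toy_model, rows, targets)
print(importances)
```

A profile like panel (a), with one dominant feature, would show up here as one large value and the rest near zero, while a profile like panel (d) would show several values of similar size.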

3.6. Short-Term Predictions

Figure 14a illustrates an energy consumption forecast for a whole campus over the next five years, utilizing a linear regression model. The blue line represents historical energy data, exhibiting significant fluctuations in energy. Following this historical data, a red line extrapolates the energy consumption for the subsequent period, representing the forecasted energy using linear regression. This linear trend suggests a steady and continuous increase in energy consumption over the five-year forecast period. The model projects that the energy demand, which was around 0.5 × 10⁵ kWh at the start of the forecast, will rise to approximately 5.5 × 10⁵ kWh by the end of the five-year period. This indicates that, based on the linear regression, the campus should anticipate a substantial and consistent rise in its overall energy requirements in the coming five years.

Figure 14b illustrates a forecast of CO₂ emissions for the entire campus over the next five years, also using a linear regression model. As in the energy usage graph, the blue line shows the historical CO₂ emissions data, which exhibit irregular variability over about 350 months. The red line extends the historical data with the forecast from the linear regression model. The resulting linear trend indicates a steady and significant increase in campus CO₂ emissions during the five-year forecast period. According to the model, CO₂ emissions start at about 1500 kg at the beginning of the forecast period and increase rapidly to almost 14,000 kg near the end of the five-year forecast period (equivalent to a little less than 1600 months from the first historical data point). Therefore, based on the linear regression, the campus can anticipate a large and steady increase in CO₂ emissions over the next five years.
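The extrapolation used in both forecasts can be sketched with ordinary least squares on a time index. The sketch assumes equally spaced samples indexed 0, 1, 2, and so on; the function names and the history values are illustrative and are not the campus data.

```python
def fit_line(values):
    """Least-squares slope and intercept of values against indices 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


def forecast(values, horizon):
    """Extend the fitted line by `horizon` steps beyond the last sample."""
    slope, intercept = fit_line(values)
    n = len(values)
    return [intercept + slope * (n + step) for step in range(horizon)]


# Illustrative history following y = 2t + 1 exactly.
history = [2.0 * t + 1.0 for t in range(10)]
slope, intercept = fit_line(history)
future = forecast(history, horizon=3)
print(slope, intercept, future)
```

Because a straight line can only continue the average historical trend, a forecast of this kind cannot capture seasonality or saturation, which is worth keeping in mind when reading the five-year projections.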

4. Conclusions

This research proposes and validates a comprehensive artificial intelligence-based methodology for forecasting and optimizing energy consumption and associated CO₂ emissions across a complex university campus composed of 27 buildings with varying sizes, uses, and thermal behaviors. The framework combines thermodynamic modeling of heat exchangers with advanced data-driven techniques to improve operational efficiency and sustainability in higher education infrastructures. Six distinct AI algorithms were implemented: artificial neural networks (ANN), radial basis function networks (RBF), autoencoders, random forest (RF), XGBoost, and decision trees, each trained on high-resolution data including monthly building-level energy use, detailed heat exchanger parameters, occupancy information, and weather-dependent load variations. The performance of these models was rigorously assessed using multiple statistical and hydrological metrics, including MAPE, RMSE, R², NSE, and KGE, to ensure robustness, interpretability, and generalizability of the predictions.

The results obtained confirm that ANN, RBF, and autoencoder models exhibit superior performance in terms of prediction accuracy and reliability. These models consistently delivered R² values above 0.99 and mean absolute percentage errors below 5%, even under varying building conditions. ANN demonstrated the most stable behavior across the full dataset, showing limited residual dispersion and strong alignment with the theoretical normal distribution of prediction errors. The approach combines data science with engineering thermodynamics to deliver accurate and actionable insights, making it a valuable tool for energy management. The results demonstrate that the integration of real-world performance data from thermal systems with state-of-the-art machine learning models offers significant potential for improving energy efficiency and reducing environmental impact in complex building clusters.

The ability of these models to capture nonlinear relationships between heat exchanger performance and energy demand is essential in environments characterized by dynamic load profiles and heterogeneous equipment operation. The autoencoder model proved especially useful in identifying latent features and detecting anomalies or inefficiencies in the operation of thermal systems, which is highly valuable for preventive maintenance and fault diagnosis.

The study revealed substantial differences in model performance depending on building typology, thermal load variability, and availability of operational data. Administrative and academic buildings with stable usage patterns allowed all models to achieve high predictive accuracy. Conversely, laboratory spaces, multi-purpose halls, and buildings with intermittent occupancy introduced significant variability that required tailored modeling strategies. These findings suggest that energy forecasting for multi-building infrastructures cannot rely on uniform models, but instead should consider hybrid or adaptive approaches, possibly integrating real-time feedback or reinforcement learning techniques in future work. The capacity to disaggregate predictions at the building level is particularly relevant for energy management aiming to identify high-consumption profiles, benchmark performance, and prioritize retrofitting actions under constrained budgets.

Heat exchangers emerged as a critical component in the energy-emission nexus. The study demonstrated that optimized HX operation, through control of flow rates, temperature differentials, and seasonal adjustments, can result in energy savings of up to 30% in certain buildings when compared to standard operating conditions. These reductions are directly reflected in lower CO₂ emissions, particularly in buildings where fossil fuels such as natural gas are still predominant. By applying standardized emission factors to the energy consumption predictions, the framework enabled precise estimation of building-level carbon footprints. The study confirmed that buildings equipped with well-performing HX systems and partially integrated renewable energy technologies show significantly lower emission intensities, reinforcing the strategic value of upgrading HVAC subsystems as part of a decarbonization pathway.

The inclusion of a five-year forecasting horizon, based on linear regression applied to the AI-derived consumption and emissions data, further strengthens the contribution of this work. The forecast indicates a significant and sustained increase in both energy consumption and CO₂ emissions across the campus if no corrective measures are implemented. Energy demand is expected to grow from 0.5 × 10⁵ kWh to 5.5 × 10⁵ kWh, while emissions may increase from approximately 1500 kg to 14,000 kg over the same period. These projections emphasize the urgency of deploying predictive control systems and data-driven decision-making tools for energy and sustainability planning. AI-based forecasting enables early detection of inefficiencies and supports the definition of targeted intervention strategies with measurable environmental and economic impact.

The study also incorporated sensitivity and feature importance analyses to identify the most influential parameters affecting energy use. Heat exchanger-specific variables, particularly volumetric flow rate, supply and return temperatures, and thermal efficiency, were consistently identified as the dominant predictors across all models. These findings highlight the importance of accurate monitoring and control of thermal subsystems within District Heating according to thermodynamic principles. Moreover, contextual parameters such as building use, occupancy density, and floor area were shown to enhance model performance, supporting the development of intelligent digital twins for operational optimization. The methodology is scalable to other buildings, campuses, hospitals, and cities, contributing directly to the broader transition towards data-informed, low-carbon built environments in alignment with the European Union’s climate and energy directives.

Author Contributions

Conceptualization, A.R.-H., J.S.J.-A., and A.P.-P.; data curation, A.R.-H., J.S.J.-A., A.P.-P., and K.M.S.; formal analysis, A.R.-H., J.S.J.-A., A.P.-P., F.J.R.-M., A.O.E., J.M.R.-H., and K.M.S.; funding acquisition, J.S.J.-A. and F.J.R.-M.; investigation, A.R.-H., J.S.J.-A., A.P.-P., A.O.E., J.M.R.-H., and K.M.S.; methodology, A.R.-H., J.S.J.-A., and A.P.-P.; project administration, J.S.J.-A., A.P.-P., and F.J.R.-M.; resources, A.R.-H., J.S.J.-A., and A.P.-P.; software, A.R.-H., J.S.J.-A., A.P.-P., and A.O.E.; supervision, J.S.J.-A., A.P.-P., F.J.R.-M., and J.M.R.-H.; validation, A.R.-H., J.S.J.-A., and A.P.-P.; visualization, A.R.-H., J.S.J.-A., A.P.-P., and K.M.S.; writing—original draft, A.R.-H., J.S.J.-A., A.P.-P., and K.M.S.; writing—review and editing, A.R.-H., J.S.J.-A., A.P.-P., F.J.R.-M., J.M.R.-H., and K.M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request. The data are not publicly available due to privacy restrictions set by the University of Valladolid.

Acknowledgments

We would like to acknowledge the support received from “LIFE23-CET-Re-Energize” European Project by University of Málaga, Spain, “EUSUVa4.0” Project by University of Valladolid; “Lime4Health” National Project by Technical University of Madrid (UPM); RED-“TRAPECIO” IberAmerican Project by CYTED (Ibero-American Program of Science and Technology for Development); and ITAP Research Institute at University of Valladolid. We would like to acknowledge the use of MATLAB (Version R2018a, MathWorks, https://www.mathworks.com, accessed on 1 May 2024) for data analysis and visualization in this study. Additionally, the images included in this document were created by the authors and are original works.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

List of symbols
Variable	Description
$x = [x_{1}, x_{2}, \dots, x_{n}]$	Input vector: Represents the input features to the neural network, where $n$ is the number of input parameters.
$z_{j}^{(1)}$	The sum of inputs to the $j$ -th neuron in the first hidden layer, calculated as a weighted sum of inputs.
$w_{i j}$	The weight associated with the connection from the $i$ -th input to the $j$ -th neuron.
$a_{j}^{(1)}$	The output of the $j$ -th neuron in the first hidden layer after applying the activation function.
$k$	Index for neurons in the second hidden layer, indicating the connection from the first hidden layer.
$z_{k}^{(3)}$	The sum of inputs to the $k$ -th neuron in the second hidden layer, calculated similarly to the first layer.
$a_{k}^{(3)}$	The output of the $k$ -th neuron in the second hidden layer, processed through an activation function.
y	The predicted output for energy demand from the output layer of the neural network.
$b_{j}$	The bias term for the $j$ -th neuron in the first hidden layer.
$b_{k}$	The bias term for the $k$ -th neuron in the second hidden layer.
$η$	The learning rate used in the backpropagation algorithm to update weights and biases.
$m$	The total number of neurons in the first hidden layer.
$n$	The total number of input features.
$ϕ_{j} (x)$	Activation of the j-th neuron in the hidden layer
$y (x)$	Output of the RBF
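$E$	Sum-of-squares error of the RBF network
$z$	Encoded (latent) representation produced by the encoding process
$\hat{x}$	Reconstructed input produced by the decoding process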
$L$	Loss function
$t$	Total number of trees in the random forest.
$N_{t}$	Number of instances from the training set used to create a bootstrap sample $D_{t}$ .
$n$	The feature to split on based on a chosen criterion (e.g., Gini impurity for classification, Mean Squared Error for regression).
$p_{i}$	The proportion of class $i$ in the node
$N$	The total number of instances at the node.
$\hat{y}$	The predicted output for regression tasks, calculated as the average of predictions from individual trees.
$y$	The actual output value for regression tasks.
${\hat{y}}_{i}$	The predicted class in classification tasks, determined by majority vote among the trees.
${\hat{Y}}_{leaf}$	Leaf prediction
Importance $(X_{k})$	Feature importance
$O O B$	Out-of-bag observations, which are instances not included in a tree’s bootstrap sample, used for performance estimation.
$I$	An indicator function that equals 1 if the predicted class ${\hat{y}}_{i}$ does not match the actual class $y_{i}$ .
$L$	Objective function
$Ω (f_{k})$	Regularization term
$G a i n (X_{j}$ , split)	Tree splitting gain
$G i n i (D)$	Gini impurity
$E n t r o p y (D)$	Entropy
$I G (D, A)$	Information gain
List of abbreviations
ANN	Artificial neural networks
RF	Random forests
XGBoost	Extreme gradient boosting
RBF	Radial basis function
RMSPE	Root Mean Square Percentage Error
MAPE	Mean Absolute Percentage Error
MARE	Mean Absolute Relative Error
RMSRE	Root Mean Squared Relative Error
RNSE	Recurrent neural networks
MVAC	Mechanical Ventilation and Air Conditioning
FCUs	Fan Coil Units
MCMC	Markov Chain Monte Carlo
PVT	Position, Velocity, and Time
KGE	Kling–Gupta Efficiency
NSE	Nash–Sutcliffe Efficiency
nZEB	Nearly zero energy building
EU	European Union
CO₂	Carbon dioxide
HX	Heat exchanger
AI	Artificial intelligence
IoT	Internet of Things
HVAC	Heating, ventilation, and air conditioning
NAR	Nonlinear autoregressive
MSHD	Method of spatial homogenization decomposition
SVR	Support vector regression
ReLU	Rectified linear unit
OOB	Out-of-bag
CDF	Cumulative distribution function
LM	Levenberg–Marquardt

Figure 1. Plan view of connected buildings to DH.

Figure 2. Data preprocessing sequences.

Figure 3. Block diagram for analyzing energy consumption.

Figure 4. Box plot of actual vs. predicted consumption for the whole campus with the different AI models.

Figure 5. Scatter plot of actual vs. predicted consumption for the whole campus with the different AI models.

Figure 6. Q-Q plot of actual vs. predicted consumption for the whole campus with the different AI models.

Figure 7. Histograms for the whole campus with the different AI models.

Figure 8. Performance of the six machine learning models for the whole campus.

Figure 9. Correlation heatmap of the six machine learning models for the whole campus.

Figure 10. Sample building, actual vs. predicted consumption: (a) building D01, (b) building D08, (c) building E03 and (d) building E014.

Figure 11. Sample building, heatmap: (a) building D01, (b) building D08, (c) building E03, and (d) building E014.

Figure 12. CO₂ emissions for whole building.

Figure 13. Sensitivity analysis and feature importance: (a) ANN, (b) RF, (c) XGBoost, (d) RBF, (e) autoencoder, and (f) decision tree.

Figure 14. Forecast for the whole campus after 5 years: (a) energy forecast and (b) CO₂ forecast.

Table 1. Sample of data D01.

Tsupplysec	Pinst (V)	Flow Rate from Energy	Energy SCADA	Processed Days	Flow Rate from Energy	Energy Calculated	ΔT Primary	ΔT Secondary	Primary Flow Rate
78.7	352	63.5	262,108	31	64.2	262,214.1449	11.0	4.2	26.7
79.6	281	56.5	188,661	28	56.9	188,858.479	12.2	3.7	19.1
80.3	223	48.3	164,265	31	48.8	165,986.9185	11.7	3.4	15.7
80.2	185	40.2	132,430	30	40.7	132,872.182	10.1	3.4	15.1
80.9	100	25.1	68,231	30	25.4	71,673.1934	8.3	2.8	9.5
80.8	52	8.8	28,590	30	8.8	37,426.36863	5.7	3.4	5.8
75.2	34	4.3	18,763	31	4.3	25,128.6395	5.9	4.4	3.6
73.1	31	4.2	16,253	30	4.2	22,377.90924	6.3	4.0	3.0
78.8	41	9.4	26,707	29	9.4	28,830.24944	6.7	3.1	4.7
80.6	96	25.6	72,340	31	25.6	71,667.5954	9.2	2.8	8.8
79.3	275	56.9	200,219	30	57.2	197,952.3616	11.8	3.7	19.6
79.1	318	62.0	237,072	31	62.4	236,634.592	13.4	3.9	19.8

Table 2. Mathematical model for ANN.

Layer	Equation Description	Equation	No. Equation
Input Layer	Input Features	$x = [x_{1}, x_{2}, \dots, x_{n}]$	(1)
Hidden Layer	Weighted Sum	$z^{1} = W^{1} x + b^{1}$	(2)
Output Layer	Weighted Sum	$z^{3} = W^{3} h^{2} + b^{3}$	(3)
	Final Output (Prediction)	$\hat{y} = g (z^{3})$	(4)
Loss Function	Mean Squared Error	$L = \frac{1}{N} \sum_{i = 1}^{N} {({\hat{y}}_{i} - E_{actual, i})}^{2}$	(5)
Backpropagation	Gradient of Loss w.r.t. Output	$\frac{\partial L}{\partial {\hat{y}}_{i}} = - \frac{2}{N} (E_{actual, i} - {\hat{y}}_{i})$	(6)
	Gradient w.r.t. Hidden Layer 1	$δ^{1} = δ^{2} \cdot W^{2} \cdot f' (z^{1})$	(7)
Weight Updates	Update Rule for Weights (Layer 1)	$W^{1} \leftarrow W^{1} - η \cdot δ^{1} \cdot x^{T}$	(8)
	Update Rule for Weights (Output Layer)	$W^{3} \leftarrow W^{3} - η \cdot δ^{3} \cdot {(h^{2})}^{T}$	(9)
	Update Rule for Biases (Hidden Layer 1)	$b^{1} \leftarrow b^{1} - η \cdot δ^{1}$	(10)
	Update Rule for Biases (Output Layer)	$b^{3} \leftarrow b^{3} - η \cdot δ^{3}$	(11)
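
The update rules of Table 2 can be illustrated with a minimal sketch of one gradient-descent step on a single sample. This is not the authors' implementation: the single hidden layer, the tanh activation for $f$, the identity for $g$, the learning rate, and all function names and numbers are assumptions chosen only to make Eqs. (2)–(11) concrete.

```python
import math

def forward(x, W1, b1, W3, b3):
    """Forward pass: weighted sum (Eq. 2), activation, linear output (Eqs. 3-4)."""
    z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [math.tanh(z) for z in z1]                     # assumed activation f = tanh
    y_hat = sum(w * hi for w, hi in zip(W3, h)) + b3   # assumed g = identity
    return h, y_hat

def squared_error(x, target, params):
    """Loss of Eq. 5 for a single sample (N = 1)."""
    _, y_hat = forward(x, *params)
    return (y_hat - target) ** 2

def gradient_step(x, target, params, eta=0.05):
    """One backpropagation update following Eqs. 6-11 with N = 1."""
    W1, b1, W3, b3 = params
    h, y_hat = forward(x, W1, b1, W3, b3)
    delta_out = 2.0 * (y_hat - target)                 # Eq. 6
    # Eq. 7: propagate the output error back; tanh'(z) = 1 - tanh(z)^2
    delta_hid = [delta_out * w * (1.0 - hi * hi) for w, hi in zip(W3, h)]
    new_W3 = [w - eta * delta_out * hi for w, hi in zip(W3, h)]      # Eq. 9
    new_b3 = b3 - eta * delta_out                                    # Eq. 11
    new_W1 = [[w - eta * d * xi for w, xi in zip(row, x)]
              for row, d in zip(W1, delta_hid)]                      # Eq. 8
    new_b1 = [b - eta * d for b, d in zip(b1, delta_hid)]            # Eq. 10
    return new_W1, new_b1, new_W3, new_b3

# Hypothetical toy values, not taken from the campus data set.
params = ([[0.1, 0.2], [-0.3, 0.4]], [0.0, 0.0], [0.5, -0.5], 0.0)
x, target = [0.5, -1.0], 1.0
before = squared_error(x, target, params)
after = squared_error(x, target, gradient_step(x, target, params))
print(before, after)
```

With a small learning rate the loss after the step should be lower than before it. Plain gradient descent is used here because it matches Eqs. (8)–(11) directly; other training algorithms would replace `gradient_step`.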

Table 3. Mathematical model for RBF.

Component	Equation	No. Equation
Input variables	$X = [X_{1}, X_{2}, X_{3}, X_{4}, X_{5}]$	(12)
Activation of the j-th neuron in the hidden layer	$ϕ_{j} (x) = e^{- \frac{{∥x - c_{j}∥}^{2}}{2 σ_{j}^{2}}}$	(13)
Output of the RBF	$y (x) = \sum_{j = 1}^{N} w_{j} ϕ_{j} (x)$	(14)
Error can be computed	$E = \frac{1}{2} \sum_{i = 1}^{M} {(y_{i} - {\hat{y}}_{i})}^{2}$	(15)
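
As a hedged illustration of Table 3, the sketch below evaluates a Gaussian radial basis network in plain Python. It assumes the standard Gaussian form in which the exponent uses the squared Euclidean distance between the input $x$ and the center $c_{j}$; the centers, widths, weights, and function names are hypothetical.

```python
import math

def rbf_activation(x, center, sigma):
    """Gaussian activation of one hidden neuron (Eq. 13)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def rbf_output(x, centers, sigmas, weights):
    """Weighted sum of the hidden activations (Eq. 14)."""
    return sum(w * rbf_activation(x, c, s)
               for w, c, s in zip(weights, centers, sigmas))

def half_sse(targets, predictions):
    """Half sum-of-squares error over M samples (Eq. 15)."""
    return 0.5 * sum((y - p) ** 2 for y, p in zip(targets, predictions))

# Hypothetical two-neuron network evaluated at the first center.
centers = [[0.0, 0.0], [1.0, 1.0]]
sigmas = [1.0, 1.0]
weights = [3.0, 2.0]
print(rbf_output([0.0, 0.0], centers, sigmas, weights))
```

An input that coincides with a center gives that neuron an activation of exactly 1, and the activation decays as the input moves away, at a rate set by $σ_{j}$.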

Table 4. Mathematical model for autoencoder.

Component	Equation	No. Equation
Input Variables	$X = [X_{1}, X_{2}, X_{3}, X_{4}, X_{5}]$	(16)
Encoding Process	$z = f (x) = σ (W_{e} x + b_{e})$	(17)
Decoding Process	$\hat{x} = g (z) = σ (W_{d} z + b_{d})$	(18)
Loss Function	$L = \frac{1}{N} \sum_{i = 1}^{N} {∥x_{i} - {\hat{x}}_{i}∥}^{2}$	(19)
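
The encoding, decoding, and loss steps of Table 4 can be sketched as follows. This is an illustrative sketch only: it assumes that $σ$ denotes the logistic sigmoid, and the weight matrices, biases, and helper names are hypothetical rather than taken from the study.

```python
import math

def sigmoid(v):
    """Logistic sigmoid, assumed here for the activation in Eqs. 17-18."""
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, v, b):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]

def encode(x, W_e, b_e):
    """Latent code z = sigma(W_e x + b_e) (Eq. 17)."""
    return [sigmoid(a) for a in affine(W_e, x, b_e)]

def decode(z, W_d, b_d):
    """Reconstruction x_hat = sigma(W_d z + b_d) (Eq. 18)."""
    return [sigmoid(a) for a in affine(W_d, z, b_d)]

def reconstruction_loss(batch, W_e, b_e, W_d, b_d):
    """Mean squared reconstruction error over the batch (Eq. 19)."""
    total = 0.0
    for x in batch:
        x_hat = decode(encode(x, W_e, b_e), W_d, b_d)
        total += sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    return total / len(batch)

# Hypothetical 2 -> 1 -> 2 autoencoder with all parameters set to zero.
W_e, b_e = [[0.0, 0.0]], [0.0]
W_d, b_d = [[0.0], [0.0]], [0.0, 0.0]
print(reconstruction_loss([[1.0, 0.0]], W_e, b_e, W_d, b_d))
```

With zero parameters every reconstructed component equals 0.5, so the example makes the loss easy to check by hand before any training is applied.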

Table 5. Mathematical model for RF.

Component	Equation	No. Equation
Input Variables	$X = [X_{1}, X_{2}, X_{3}, X_{4}, X_{5}]$	(20)
Ensemble Prediction	$\hat{Y} = \frac{1}{N} \sum_{i = 1}^{N} f_{i} (X)$	(21)
Tree Structure	Each tree $f_{i} (X)$ is constructed using random samples of features and instances
Node Splitting	$\arg\max_{j \in J} \mathrm{Gain} (j)$	(22)
Leaf Prediction	${\hat{Y}}_{leaf} = \frac{1}{m} \sum_{j = 1}^{m} Y_{j}$	(23)
Feature Importance	$\mathrm{Importance} (X_{k}) = \frac{1}{N} \sum_{i = 1}^{N} \mathrm{Gain}_{i} (X_{k})$	(24)
Error Estimation	$\mathrm{OOB\ Error} = \frac{1}{N} \sum_{i = 1}^{N} I (Y_{i} \neq {\hat{Y}}_{i})$	(25)
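
A small sketch may help connect the bootstrap sampling, ensemble averaging, and out-of-bag (OOB) error of Table 5. The function names and toy values are assumptions for illustration. Eq. (25) is written for class labels, so `oob_error` counts mismatches; a regression use would presumably replace the indicator with a squared or absolute error.

```python
import random

def bootstrap_sample(n, rng):
    """Draw n row indices with replacement (one bootstrap sample per tree)."""
    return [rng.randrange(n) for _ in range(n)]

def out_of_bag(n, sample):
    """Indices never drawn into the sample; these form the tree's OOB set."""
    drawn = set(sample)
    return [i for i in range(n) if i not in drawn]

def ensemble_prediction(tree_outputs):
    """Average of the individual tree predictions (Eq. 21)."""
    return sum(tree_outputs) / len(tree_outputs)

def oob_error(actual, predicted):
    """Share of OOB instances whose predicted label is wrong (Eq. 25)."""
    return sum(1 for y, p in zip(actual, predicted) if y != p) / len(actual)

rng = random.Random(0)              # fixed seed so the sketch is repeatable
sample = bootstrap_sample(10, rng)
print(sorted(set(sample)), out_of_bag(10, sample))
```

Because every row is either drawn or left out, the in-bag and OOB index sets always partition the training rows, which is what allows the OOB rows to serve as a built-in validation set.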

Table 6. Mathematical model for XGBoost.

Component	Equation	No. Equation
Input Variables	$X = [X_{1}, X_{2}, X_{3}, X_{4}, X_{5}]$	(26)
Model Equation	$\hat{Y} = \sum_{k = 1}^{K} f_{k} (X)$	(27)
Objective Function	$L = \sum_{i = 1}^{N} \mathrm{Loss} (Y_{i}, {\hat{Y}}_{i}) + \sum_{k = 1}^{K} Ω (f_{k})$	(28)
Regularization Term	$Ω (f_{k}) = γ T + \frac{1}{2} λ \sum_{j = 1}^{T} w_{j}^{2}$	(29)
Tree Splitting Gain	$\mathrm{Gain} (X_{j}, \mathrm{split}) = \frac{1}{2} \left( \frac{{(\sum_{i \in L} g_{i})}^{2}}{\sum_{i \in L} h_{i} + λ} + \frac{{(\sum_{i \in R} g_{i})}^{2}}{\sum_{i \in R} h_{i} + λ} - \frac{{(\sum_{i} g_{i})}^{2}}{\sum_{i} h_{i} + λ} \right)$	(30)
Final Prediction	$\hat{Y} = \mathrm{base\_score} + \sum_{k = 1}^{K} f_{k} (X)$	(31)
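
The splitting gain of Eq. (30) can be checked numerically with a short sketch. The helper names and toy gradients are hypothetical. For a squared-error loss of the form $\frac{1}{2} {(\hat{Y} - Y)}^{2}$, the first-order gradients are $g_{i} = {\hat{Y}}_{i} - Y_{i}$ and the second-order terms are $h_{i} = 1$, which is the case assumed in the example values. The code follows Eq. (30) as printed; the full XGBoost criterion additionally subtracts the per-split penalty $γ$ from Eq. (29).

```python
def structure_score(g_sum, h_sum, lam):
    """Quality term (sum g)^2 / (sum h + lambda) for one set of instances."""
    return g_sum ** 2 / (h_sum + lam)

def split_gain(g_left, h_left, g_right, h_right, lam=1.0):
    """Gain of splitting a node into left and right children (Eq. 30)."""
    GL, HL = sum(g_left), sum(h_left)
    GR, HR = sum(g_right), sum(h_right)
    return 0.5 * (structure_score(GL, HL, lam)
                  + structure_score(GR, HR, lam)
                  - structure_score(GL + GR, HL + HR, lam))

# Hypothetical gradients: the split separates negative from positive residuals.
print(split_gain([-2.0, -2.0], [1.0, 1.0], [2.0, 2.0], [1.0, 1.0], lam=0.0))
```

A split that separates instances with opposite gradient signs yields a large gain, whereas a split between children with identical gradients yields none, which is why the criterion favors partitions that group similar residuals.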

Table 7. Mathematical model for decision tree.

Component	Equation	No. Equation
Input Variables	$X = [X_{1}, X_{2}, X_{3}, X_{4}, X_{5}]$	(32)
Gini Impurity	$\mathrm{Gini} (D) = 1 - \sum_{k = 1}^{K} p_{k}^{2}$	(33)
Entropy	$\mathrm{Entropy} (D) = - \sum_{k = 1}^{K} p_{k} \log_{2} (p_{k})$	(34)
Mean Squared Error (MSE)	$\mathrm{MSE} = \frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - \hat{y})}^{2}$	(35)
Information Gain	$IG (D, A) = \mathrm{Entropy} (D) - \sum_{v \in \mathrm{Values} (A)} \frac{|D_{v}|}{|D|} \mathrm{Entropy} (D_{v})$	(36)
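
The impurity measures of Table 7 are easy to verify on a toy node. The sketch below is illustrative only; the function names and the four-sample example are assumptions, not data from the study.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Proportion p_k of each class present in the node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini(labels):
    """Gini impurity (Eq. 33)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    """Shannon entropy in bits (Eq. 34)."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))

def information_gain(labels, attribute_values):
    """Entropy reduction obtained by splitting on an attribute (Eq. 36)."""
    groups = {}
    for label, value in zip(labels, attribute_values):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def node_mse(values):
    """Mean squared error around the node mean (Eq. 35), used for regression."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

labels = ["high", "high", "low", "low"]
print(gini(labels), entropy(labels), information_gain(labels, [0, 0, 1, 1]))
```

For a perfectly balanced two-class node the Gini impurity is 0.5 and the entropy is 1 bit, and an attribute that separates the classes completely recovers the full bit as information gain.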

Table 8. Evaluation metrics equation.

Component	Equation	No. Equation
Root Mean Square Percentage Error (RMSPE)	$\mathrm{RMSPE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {\left( \frac{y_{i} - {\hat{y}}_{i}}{y_{i}} \right)}^{2}} \times 100$	(37)
Mean Absolute Percentage Error (MAPE)	$\mathrm{MAPE} = \frac{1}{N} \sum_{i = 1}^{N} \left| \frac{y_{i} - {\hat{y}}_{i}}{y_{i}} \right| \times 100$	(38)
Kling–Gupta Efficiency (KGE)	$\mathrm{KGE} = 1 - \sqrt{{(r - 1)}^{2} + {(\frac{σ_{model}}{σ_{obs}} - 1)}^{2} + {(\frac{μ_{model}}{μ_{obs}} - 1)}^{2}}$	(39)
Nash–Sutcliffe Efficiency (NSE)	$\mathrm{NSE} = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \overline{y})}^{2}}$	(40)
Coefficient of Determination (R²)	$R^{2} = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \overline{y})}^{2}}$	(41)
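
The evaluation metrics of Table 8 can be sketched in a few lines of plain Python. The sketch assumes that the two percentage errors normalize each residual by the observed value $y_{i}$, as their names imply, and that population (not sample) standard deviations are used in the KGE. Function names and the toy series are hypothetical. Since Eq. (41) as printed has the same form as Eq. (40), `nse` covers both.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def rmspe(actual, predicted):
    """Root mean square percentage error (Eq. 37), relative to observed values."""
    return 100.0 * math.sqrt(
        _mean([((y - p) / y) ** 2 for y, p in zip(actual, predicted)]))

def mape(actual, predicted):
    """Mean absolute percentage error (Eq. 38)."""
    return 100.0 * _mean([abs((y - p) / y) for y, p in zip(actual, predicted)])

def nse(actual, predicted):
    """Nash-Sutcliffe efficiency (Eq. 40); same form as R^2 in Eq. 41."""
    y_bar = _mean(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    sst = sum((y - y_bar) ** 2 for y in actual)
    return 1.0 - sse / sst

def kge(actual, predicted):
    """Kling-Gupta efficiency (Eq. 39) from correlation, spread and bias ratios."""
    mu_o, mu_m = _mean(actual), _mean(predicted)
    sd_o = math.sqrt(_mean([(y - mu_o) ** 2 for y in actual]))
    sd_m = math.sqrt(_mean([(p - mu_m) ** 2 for p in predicted]))
    cov = _mean([(y - mu_o) * (p - mu_m) for y, p in zip(actual, predicted)])
    r = cov / (sd_o * sd_m)
    return 1.0 - math.sqrt(
        (r - 1.0) ** 2 + (sd_m / sd_o - 1.0) ** 2 + (mu_m / mu_o - 1.0) ** 2)

# Hypothetical observed and predicted values, not taken from the campus data.
observed = [100.0, 200.0]
modelled = [110.0, 180.0]
print(mape(observed, modelled), rmspe(observed, modelled), nse(observed, modelled))
```

The percentage errors are undefined when an observed value is zero, so series containing zero consumption would need to be filtered or handled separately before these functions are applied.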

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

3.4. CO₂ Emissions on Campus