1. Introduction
In response to growing environmental concerns such as unpredictable climate change, cities are shifting towards sustainability. Among urban components, buildings play a critical role due to their significant energy consumption. Buildings account for nearly 40% of global energy use and contribute to one-third of greenhouse gas emissions [1]. To mitigate this impact, there is an urgent demand for smart buildings that optimize energy efficiency through intelligent control of mechanical systems. As a result, researchers have been actively exploring Building Energy Management Systems (BEMSs), which leverage advanced prediction models to enhance energy distribution, conservation, and generation strategies. Furthermore, advancements in Internet-of-Things (IoT) sensors and data-driven methodologies have enabled energy forecasting models to better capture the nonlinear relationships in building operation data [2].
Despite the effectiveness of these models, their performance varies significantly depending on the quantity, quality, and type of training data, even when tested under identical experimental conditions [
2,
3]. In real-world applications, many buildings lack reliable measurement systems, leading to insufficient and inconsistent historical data. Moreover, variations in sensor configurations result in heterogeneous data structures across different buildings, affecting model accuracy. For instance, newly constructed buildings may collect diverse sensor data using Advanced Metering Infrastructure (AMI) but suffer from insufficient historical records due to short operational periods. Conversely, older buildings may have extensive historical data but lack the diversity of operational sensor data. These challenges hinder the scalability and practicality of conventional prediction models, which are often tailored to individual buildings rather than generalized for multiple environments [
4,
5].
To address this, knowledge-sharing models have emerged as a promising approach in deep learning. Transfer learning (TL) has been widely applied to building energy prediction, where knowledge from data-rich buildings is transferred to those with limited data. However, TL follows a unidirectional knowledge transfer paradigm that requires centralized data aggregation, which raises privacy concerns and limits adaptability in environments with heterogeneous sensor setups. Additionally, TL models often require frequent updates to maintain stable performance, and their effectiveness heavily depends on the pre-trained model’s quality.
Federated Learning (FL) offers a bi-directional knowledge-sharing approach by enabling buildings to train models locally without sharing raw data. FL facilitates the collaborative development of a global model while maintaining data privacy. Unlike TL, FL allows individual buildings to use their local data while contributing to a collectively optimized model. One of the most recognized FL methods is FedAvg, which aggregates local models by averaging their parameters [
6,
7,
8,
9]. Initially, FL research focused on optimizing model aggregation techniques, but more recent studies have explored Personalized Federated Learning (pFL) as a solution to data heterogeneity. PFL aims to create client-specific models by balancing global knowledge with local adaptability. However, challenges remain in preserving model performance across diverse sensor configurations.
This study proposes a Personalized Federated Learning (pFL) approach to predict building energy consumption in environments with limited data availability and diverse sensing modalities. Our method integrates model ensemble techniques with feature masking strategies and knowledge transfer, ensuring generalized and personalized energy prediction across different buildings. While prior studies have examined FL applications in electric load forecasting, they primarily focused on aggregated system-level forecasting [
9,
10,
11]. In contrast, this work aims to demonstrate the advantages of PFL-based models over traditional customized building energy prediction methods.
To evaluate the proposed method, we employ Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE) across multiple experimental settings. The effectiveness of our approach is validated using a campus energy dataset, highlighting improvements in prediction performance. This study contributes to the advancement of knowledge-sharing models in FL by addressing data scarcity and sensor heterogeneity. Additionally, we introduce a feature masking-based ensemble approach and federated transfer strategies to enhance PFL’s capability of learning from incomplete and inconsistent sensor data.
The main contributions of this study are as follows:
A novel Personalized Federated Learning (pFL) framework is proposed for short-term building energy prediction under heterogeneous sensing environments.
Model ensemble with multi-level masking strategies is introduced to reduce variance and improve stability in local training.
Transfer learning is integrated with FL to strengthen personalization and local adaptation.
The proposed method is validated through real-world datasets from campus buildings, demonstrating superior performance to baseline FL approaches.
The rest of this study is organized as follows: Section 2 reviews the literature on building energy forecasting and FL with personalization. Section 3 presents the overall research flow, including the development of prediction models based on several neural networks and the application of PFL. Section 4 evaluates the energy prediction results of the proposed methods and compares them with different predictive models in terms of performance metrics. Lastly, the conclusions and contributions of this work, including further research, are summarized in Section 5.
2. Literature Review
2.1. Building Energy Forecasting
Building energy consumption prediction deals with sequential data that exhibit varying patterns across different time horizons—short-, medium-, and long-term [
12]. Accurate forecasting is crucial for decision-makers to implement cost-effective and energy-efficient strategies. Consequently, developing robust predictive models has gained significant research attention, particularly in data-driven approaches. Many recent studies have focused on machine learning (ML) techniques, leveraging Advanced Metering Infrastructure (AMI) to collect operational building data. Among conventional ML methods, random forest (RF) and support vector regression (SVR) have been widely used for energy prediction. Jurado et al. [
13] examined ML models for hourly electricity consumption and found fuzzy inductive reasoning (FIR) and RF to be the most accurate. Candanedo et al. [
3] compared multiple linear regression (MLR), SVR, RF, and gradient boosting machines (GBM), highlighting RF and GBM as superior models. Wang et al. [14] demonstrated RF’s advantages over regression tree (RT) and SVR, achieving 14–25% and 5–5.5% better performance indices, respectively. RF was also found to be robust to the number of input variables, while SVR excelled in short-term predictions with minimal hyperparameter tuning. Zhang et al. [15] optimized nu-SVR and epsilon-SVR parameters using evolutionary algorithms, achieving MAPE as low as 3.77% for half-hour predictions. Li et al. [16] found support vector machines (SVM) to be superior to back-propagation neural networks (BPNN), radial basis function neural networks (RBFNN), and general regression neural networks (GRNN) in predicting residential energy consumption in China.
Recently, deep learning (DL) models have been widely applied, especially for short-term energy and building environment prediction [
17,
18,
19,
20]. Among them, long short-term memory (LSTM), a recurrent neural network (RNN) variant designed to address the vanishing gradient problem, has gained popularity [
21]. Jin et al. [
21] integrated singular spectrum analysis (SSA) with parallel LSTM, effectively capturing long-term dependencies and sudden fluctuations, achieving high R-squared values close to 1. Additionally, researchers have incorporated occupancy behavior patterns into energy models. Anand et al. [
22] analyzed the impact of occupancy profiles on energy usage using SVR, RF, GBM, ANN-FF, and ANN-DN, finding ANN-DN to be highly accurate but computationally expensive.
2.2. Concept of Federated Learning and Its Personalization
Federated Learning (FL) is a distributed machine learning framework that enables multiple clients to train a global model without sharing raw data, ensuring privacy preservation [
7]. In FL, clients locally update the model using their own data, and a central server aggregates these updates to refine the global model. Since its introduction by Google in 2016, FL has been widely adopted in mobile systems, finance, and healthcare [
6,
11,
23,
24]. It has been used to reduce communication overhead in mobile edge computing, improve fraud detection, and monitor healthcare data while maintaining privacy [
9]. Two representative methods for FL are FedAvg and FedProx [25,26]. The former is a simple and communication-efficient algorithm suitable for distributed learning scenarios. It performs well when client data are independent and identically distributed (IID) and system conditions are stable. However, it struggles with non-independent and identically distributed (non-IID) data and client variability, often failing to converge reliably. This lack of robustness limits the effectiveness of FedAvg in real-world heterogeneous federated settings. FedProx, on the other hand, enhances FedAvg by adding a proximal term to the local objective to handle heterogeneity across clients. It supports variable local updates and partial participation, and provides convergence guarantees, making it more stable and reliable in non-IID environments. However, it introduces an additional hyperparameter that must be tuned, which adds complexity.
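As a rough illustration of the proximal term mentioned above, the sketch below performs one local gradient step on a FedProx-style objective, i.e., the local loss plus (mu / 2) times the squared distance to the global parameters. The function name `fedprox_step`, the coefficient `mu`, and the toy numbers are assumptions made for illustration and are not taken from the cited works.

```python
def fedprox_step(w, w_global, grad_loss, lr=0.1, mu=0.01):
    """One gradient step on: local_loss(w) + (mu / 2) * ||w - w_global||^2.

    w, w_global, and grad_loss are flat lists of floats of equal length;
    grad_loss is the gradient of the local loss evaluated at w.
    """
    return [
        wi - lr * (gi + mu * (wi - wgi))
        for wi, wgi, gi in zip(w, w_global, grad_loss)
    ]


# With a zero loss gradient, only the proximal term acts and pulls the
# local parameters toward the global ones.
w_next = fedprox_step([1.0, -1.0], [0.0, 0.0], [0.0, 0.0], lr=0.5, mu=1.0)
print(w_next)
```

Setting `mu=0.0` reduces the step to plain local gradient descent, which corresponds to the local update used in FedAvg.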
These standard FL models still struggle with data heterogeneity across clients, leading to significant performance degradation. A major challenge is data scarcity, where many buildings lack comprehensive historical records due to limited sensing infrastructure. Additionally, sensor heterogeneity leads to inconsistencies in data formats, feature distributions, and measurement frequencies, making it difficult to train a globally optimized model. These issues undermine FL’s ability to generalize across diverse environments.
To address this, Personalized Federated Learning (pFL) has been developed to customize models for individual clients, particularly for non-IID data [
10]. Various strategies have been explored, including data augmentation using generative autoencoders, client selection via reinforcement learning, and model clustering [
6].
Existing PFL methods can be classified into two main approaches: Global Model Personalization, which enhances the generalization of the global model for all clients [27,28,29], and Learning Personalized Models, where clients develop customized models tailored to their unique data distributions. Federated Augmentation (FAug) addresses data heterogeneity by using Generative Adversarial Networks (GANs) to create synthetic data, making originally non-IID data more uniform across clients [30]. Other approaches, such as pFedMe, introduce Moreau envelope optimization to regulate local-global model differences [31]. FedFomo employs a weighting strategy to allow clients to selectively adopt models suited to their tasks, while pFedSD enhances local models through knowledge distillation across rounds [32]. Although these methods improve FL’s adaptability, they still involve some degree of data sharing or require complex model architectures, raising concerns about privacy and deployment feasibility. To provide a clearer comparison, the accuracy of representative algorithms employed in previous studies is summarized in Table 1. Unlike traditional FL methods such as FedAvg and FedProx, which aggregate models uniformly, our framework incorporates model ensemble with multi-level masking and transfer learning-based personalization to adapt to local sensor heterogeneity and reduce training variance.
3. Methodology
Accurately predicting building energy consumption is essential for intelligent energy management, facilitating optimized energy usage and reducing operational costs. Traditional centralized learning and transfer learning methods rely on aggregating data from multiple buildings, which is impractical in heterogeneous sensing environments due to concerns regarding privacy, scalability, and communication overhead. Federated Learning (FL) offers a decentralized alternative, enabling local training without sharing raw data. However, conventional FL approaches assume homogeneous data distributions across clients, which is unrealistic for energy prediction given sensor heterogeneity, fluctuating energy consumption patterns, and environmental variations. A significant challenge in FL for energy forecasting lies in handling non-independent and identically distributed (non-IID) data, as each building operates with distinct sensor types and configurations [
28].
To address these challenges, this study proposes a Personalized Federated Learning (pFL) approach for building energy prediction. The proposed method integrates model ensemble techniques with multi-level feature masking strategies and personalization via knowledge transfer through fine-tuning specific layers. This research focuses on three key questions:
Can FL provide robust energy consumption predictions in heterogeneous sensing environments while addressing data limitations?
How does FL compare to standalone models trained independently on each building’s local dataset in terms of prediction accuracy and generalization?
What techniques can enhance FL’s performance when participating buildings have different numbers and types of sensors?
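To achieve personalized FL, this study introduces adaptive feature masking using the AutoCorrelation Function (ACF). ACF helps in selecting high-impact time-series features, ensuring that each building’s local model is optimized for its unique consumption patterns. The Federated Learning objective is to minimize the overall prediction error across all clients as follows:
$$\min_{w} \; \frac{1}{K} \sum_{k=1}^{K} \mathcal{L}(w; D_k)$$
where $\mathcal{L}$ represents the objective loss function, $D_k$ is the dataset of the $k$-th building, $w$ denotes the shared model parameters, and $K$ is the number of participating buildings. Standard FL updates the global model using the following expression:
$$w^{t+1} = \frac{1}{K} \sum_{k=1}^{K} w_k^{t+1}$$
where $w_k^{t+1}$ denotes the parameters returned by the $k$-th building after local training in round $t$.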
However, this simple uniform aggregation fails to account for differences in sensor availability and variations in energy consumption patterns across buildings [
10]. To address this, the proposed pFL-based prediction model dynamically selects time-series features for each client, enhancing learning efficiency while preserving model robustness.
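The uniform aggregation discussed above can be sketched in a few lines. The example treats each client model as a flat list of parameters; the function names `uniform_average` and `weighted_average` are hypothetical, and the sample-count weighting is shown only as a common alternative, not as the aggregation rule used in this study.

```python
def uniform_average(client_weights):
    """Average equally shaped flat parameter vectors, one per client."""
    n_clients = len(client_weights)
    return [sum(values) / n_clients for values in zip(*client_weights)]


def weighted_average(client_weights, n_samples):
    """Average parameter vectors, weighting each client by its sample count."""
    total = sum(n_samples)
    return [
        sum(w * n for w, n in zip(values, n_samples)) / total
        for values in zip(*client_weights)
    ]


# Two hypothetical buildings with two parameters each
global_uniform = uniform_average([[1.0, 2.0], [3.0, 4.0]])
global_weighted = weighted_average([[0.0], [4.0]], n_samples=[1, 3])
print(global_uniform, global_weighted)
```

Neither variant looks at which sensors a client actually has, which is the limitation the masking-based approach is meant to address.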
Figure 1 illustrates the overall workflow of the proposed approach, designed to improve both scalability and adaptability in heterogeneous sensing environments. The step-by-step description of the proposed method is as follows:
Step 1: Global model initialization and distribution
The process begins with the initialization of a global model, which is then distributed from the federated cloud server to each building. Each client (building) receives a copy of the model and prepares to train it using locally collected energy consumption data.
Step 2: Autocorrelation analysis on local time-series data
Each building performs an autocorrelation analysis on its time-series energy data to evaluate the temporal dependencies. This step identifies how much past energy consumption influences current values.
Step 3: Multi-level feature masking and sub-model generation
Based on the autocorrelation results, three levels of feature masking are applied to emphasize the most relevant temporal features. These masked feature sets are used to train decomposed sub-models on each client, representing different temporal views.
Step 4: Ensemble of local sub-models
The generated sub-models are ensembled on each client. This ensemble incorporates the strengths of different temporal views, thereby increasing model robustness to data irregularity.
Step 5: Global model aggregation and redistribution for personalization
After local training, the cloud server collects the ensembled models from each building and performs weighted and mean aggregation to update the global model. During redistribution, the updated global model is transmitted back to each client. At this point, transfer learning is applied: only the fully connected (prediction) layers are fine-tuned using local data, while the shared feature extractor remains fixed. This selective adaptation enables effective personalization for each building without sacrificing generalizability.
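The selective adaptation in Step 5 can be illustrated with a deliberately small sketch: a fixed feature extractor (standing in for the shared CONV/LSTM blocks) and a linear prediction head that is the only part updated on local data. The single tanh layer, the function names, the learning rate, and the toy data are all assumptions chosen to keep the example self-contained; they do not reproduce the architecture used in this study.

```python
import math


def extract_features(x, w_shared):
    """Frozen shared extractor: one tanh unit per row of w_shared."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w_shared]


def mse(head, feature_rows, targets):
    """Mean squared error of a linear head applied to precomputed features."""
    preds = [sum(h * f for h, f in zip(head, fs)) for fs in feature_rows]
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)


def fine_tune_head(head, w_shared, xs, ys, lr=0.1, epochs=200):
    """Gradient descent on the linear head only; w_shared is never modified."""
    feats = [extract_features(x, w_shared) for x in xs]  # extractor is fixed
    head = list(head)  # work on a copy so the global head stays untouched
    n = len(ys)
    for _ in range(epochs):
        grad = [0.0] * len(head)
        for fs, y in zip(feats, ys):
            err = sum(h * f for h, f in zip(head, fs)) - y
            for j, f in enumerate(fs):
                grad[j] += 2.0 * err * f / n
        head = [h - lr * g for h, g in zip(head, grad)]
    return head


# Hypothetical global extractor weights and local data for one building
w_shared = [[0.5, -0.2], [0.1, 0.3]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
ys = [0.8, 0.2, 1.0, 0.4]
head_before = [0.0, 0.0]
head_after = fine_tune_head(head_before, w_shared, xs, ys)
```

Because the extractor is fixed, its features are computed once and only the head parameters are optimized, which keeps the local computation small.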
3.1. Data Description and Preprocessing
In data-driven approaches, both the quantity and quality of experimental datasets significantly impact model performance. Additionally, it is crucial to determine which features influence target values and how many should be utilized for predictive model development [
22]. This study incorporates various outdoor environmental datasets, including temperature, relative humidity, wind speed/direction, atmospheric and sea-level pressure, dew temperature, solar radiation, and sunshine duration, as shown in
Table 2. These features are collected from a meteorological center near the campus in Daejeon, Republic of Korea, located approximately 1.04 km away. For building operational data, eight individual buildings within the campus community were selected, each exhibiting distinct operational characteristics and consumption patterns.
Table 3 provides detailed specifications of experimental buildings, including their usage purpose, building area, total floor area, and number of floors. These physical differences across buildings result in varying patterns of energy consumption. For instance, larger or multi-story buildings tend to have higher heating and cooling demands, while usage type affects occupancy-related loads. Such diversity highlights the need to account for building-specific characteristics in energy modeling. The experiments in this study only consider environmental and temporal variables to reflect real-world dynamics.
We collected the experimental datasets during the summer season (June to August 2019). Each building contributes 2208 hourly samples, resulting in a total of 17,664 samples across all buildings. To maintain the continuity of the time-series data, missing weather and energy consumption values are imputed using the mean values from the same sequence seven days prior, as this minimizes fluctuations and preserves dataset consistency. Once missing values are addressed, continuous and categorical features are transformed through standard normalization and one-hot encoding, respectively, to ensure compatibility with the predictive model. This preprocessing step enables the model to accurately interpret relationships between variables, eliminating inconsistencies caused by differing feature formats. Each data point is represented as a 12-dimensional vector corresponding to various time and environmental features. To capture sequential dependencies for short-term energy prediction, the dataset is restructured into time-series sequences using historical windows of varying lengths. For a window length t, the resulting sequential dataset consists of 2208 − t + 1 samples and is used to train and test deep learning models. The historical time lag is chosen from [28, 48, 72], meaning input sequences span at least 24 h to account for short-term dependencies in energy consumption patterns. Training and testing data are split in an 80:20 ratio, with the PFL-based prediction model used to forecast one-hour-ahead energy consumption for each building.
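The windowing and split described above can be sketched as follows. The example assumes a univariate series for brevity (the study uses 12-dimensional vectors), and the helper names `make_windows` and `chronological_split` are hypothetical. Pairing each window with the next value as a one-step-ahead target yields `len(series) - lag` pairs, one fewer than the number of raw windows.

```python
def make_windows(series, lag):
    """Pair each window of `lag` consecutive values with the next value."""
    inputs, targets = [], []
    for start in range(len(series) - lag):
        inputs.append(series[start:start + lag])
        targets.append(series[start + lag])
    return inputs, targets


def chronological_split(inputs, targets, train_ratio=0.8):
    """Split without shuffling so the test set follows the training set in time."""
    cut = int(len(inputs) * train_ratio)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


# Toy series standing in for hourly energy readings
series = list(range(10))
X, y = make_windows(series, lag=3)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
print(len(X), X[0], y[0], len(train_X), len(test_X))
```

Keeping the split chronological avoids leaking future observations into training, which matters for time-series evaluation.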
3.2. Personalized Federated Learning for Short-Term Building Energy Prediction
3.2.1. Selection of Building Energy Prediction Model Architecture
FL relies on a shared model that bridges global and local models, necessitating the selection of optimal predictive models. These models must be capable of effectively representing data through ensemble approaches and knowledge transfer strategies. The shared model developed in this study is designed to meet two practical considerations. First, the input features must be easily obtainable yet rich enough to ensure reliable and accurate predictions. Second, the model architecture must effectively capture high-level features and nonlinear spatial–temporal relationships within the time-series data. To address these requirements, two deep learning architectures were employed: a 1D Convolutional Neural Network (1D-CNN) with dilated convolutions and long short-term memory (LSTM) networks. The 1D-CNN was originally designed for grid-structured data and has been extensively used for both visual data (e.g., images, videos) and time-series data. By applying the weight-sharing concept, it excels at handling nonlinear problems. Its hierarchical learning process enables the network to gradually abstract features through multiple layers. Typically, a 1D-CNN includes three main components: convolution layers, which extract primitive feature representations; pooling layers, which capture intermediate abstract patterns; and fully connected layers, which identify high-level patterns by compressing and integrating features from previous layers.
LSTM, a specialized form of recurrent neural network (RNN), processes sequential data through connections between units that form a directed graph. While traditional RNNs suffer from the vanishing gradient problem, which diminishes model accuracy, LSTMs address this issue by introducing gates (input, forget, output) and memory cells to control the flow of information through the network [
21]. This enables LSTM networks to effectively model long-term dependencies in sequential datasets.
Both models were applied to each client’s dataset. Hyperparameter tuning was performed for all models using a grid search approach, and the specific ranges and intervals for tuning are outlined in
Table 4. For both CNN and LSTM models, several hyperparameters were commonly tuned, including activation functions, optimizers, dropout rates, and batch sizes. Dropout, in particular, served as a regularization method to mitigate overfitting by randomly omitting neurons from certain layers during training. Meanwhile, adjusting the batch size helped generalize the models by dividing the dataset into smaller, manageable portions, thus reducing memory costs. The one-dimensional convolution layers in CNN were specifically used to extract spatial features hidden in the source building’s dataset. This approach has proven effective in reducing data loss and improving model performance.
3.2.2. Dynamic Model Ensemble via Multi-Level Feature Masking Strategies
As previously mentioned, this study focuses on solving regression problems for predicting short-term building energy consumption, despite the varying characteristics of sensing data across different buildings. By reusing collaboratively learned knowledge, the goal is to develop scalable, generalized predictive models. To achieve this, the study employs multi-level masking and dynamic ensemble methods during model aggregation to effectively extract knowledge from all participating buildings. The Auto-Correlation Function (ACF) measures the relationship between a time series and its past values. It is a valuable tool for identifying patterns and periodic trends, such as seasonal cycles, in time-series data [
36]. ACF is instrumental in choosing suitable time-series models, such as AR, MA, and ARIMA, as high autocorrelation suggests that past values significantly influence future observations, improving forecast accuracy. Additionally, ACF aids feature engineering in machine learning by pinpointing relevant lags, enhances residual diagnostics by ensuring that model errors are not autocorrelated, and ultimately boosts predictive performance by highlighting key data dependencies. By providing insights into patterns, ACF supports better model selection and more accurate forecasts. In this study, multi-level masking is achieved by calculating autocorrelation values to determine the degree of influence past feature values have on current observations. As shown in
Figure 2, it was found that states closer to the current values had a stronger impact, with autocorrelation values increasing again at points corresponding to the same time on the previous day. Conversely, ranges highlighted in blue were identified as having minimal influence on current energy consumption, making them less relevant for prediction.
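A minimal sketch of the autocorrelation computation is given below, using the standard biased sample estimator and a synthetic series with a 24-hour cycle. The function name `autocorrelation` and the sine-wave data are assumptions for illustration; the sketch is not the analysis code behind Figure 2.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]


# Two weeks of synthetic hourly data with a daily period
hourly = [math.sin(2 * math.pi * h / 24) for h in range(24 * 14)]
acf = autocorrelation(hourly, max_lag=48)
print(round(acf[12], 3), round(acf[24], 3))
```

On this synthetic series the value at lag 24 is strongly positive while the value at lag 12 is negative, mirroring the observation that correlation rises again at the same hour of the previous day.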
With these calculated values, the different levels of masking strategies are applied to decompose each model to represent the unique patterns in the individual buildings. Features with higher ACF values contribute significantly to temporal dependencies in energy consumption of buildings, while features with low ACF values have weak correlations with past values and can be masked without major performance loss. Thus, we apply three different masking strategies, which categorize features into high-impact, low-impact, and random masking groups by using pre-defined threshold values as follows:
The threshold values were empirically adjusted based on feature distribution. To determine them, we analyzed autocorrelation distributions across time lags. Features exceeding the upper threshold were considered high-impact and retained, while those below the lower threshold were masked to remove redundancy. This classification enables tailored feature selection per building, optimizing training under diverse temporal characteristics. Buildings that participated in this study independently applied their assigned feature masking strategies to modify local datasets. Masking high-impact features prevents over-reliance on dominant patterns and improves model generalization, but potentially slows early learning. Masking low-impact features removes weakly correlated features to eliminate noise and speed up training, ensuring the model focuses on more informative patterns. Lastly, random feature masking enhances model robustness. In addition, we designed flexible masking ratios across all training rounds to reduce excessive information loss. This adaptive masking ratio gradually increases during communication rounds in the federated process and is controlled by an exponential growth function as follows:
where M_max is the maximum masking ratio, which is the upper bound for the masking ratio, and λ is the scaling factor controlling how quickly the masking ratio increases. To prevent overly aggressive feature masking at early training rounds, we initialized the masking ratio in the range of [0.1, 0.5] based on empirical performance. The maximum masking ratio M_max was capped at 50%, ensuring that at least half of the most informative features are preserved throughout training. The scaling factor λ, which controls the speed at which the masking ratio approaches M_max, was selected within the range [1.0, 2.5]. A smaller λ (e.g., 1.0) results in a slower and more gradual masking increase, which favors stability during early model convergence. In contrast, a larger λ (e.g., 2.5) accelerates masking growth, encouraging faster generalization by reducing redundancy more quickly. These values were empirically chosen through grid search, balancing convergence stability and training efficiency across heterogeneous building datasets.
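To make the roles of M_max and λ concrete, the sketch below uses one plausible schedule in which the ratio rises from zero and saturates toward M_max as communication rounds progress. The functional form `m_max * (1 - exp(-lam * t / T))`, the function name `masking_ratio`, and the normalization by the total number of rounds are assumptions; the exact expression used in this study may differ.

```python
import math


def masking_ratio(round_idx, total_rounds, m_max=0.5, lam=1.0):
    """Assumed schedule: grows with the round index and stays below m_max."""
    progress = round_idx / total_rounds
    return m_max * (1.0 - math.exp(-lam * progress))


# Compare a slow (lam=1.0) and a fast (lam=2.5) schedule over 10 rounds
slow = [masking_ratio(t, 10, lam=1.0) for t in range(11)]
fast = [masking_ratio(t, 10, lam=2.5) for t in range(11)]
print([round(v, 3) for v in slow])
print([round(v, 3) for v in fast])
```

Under this assumed form, a larger λ reaches a given masking level in fewer rounds, while M_max bounds how much of the input can ever be hidden.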
Next, we constructed an ensemble model tailored to each building. First, we selected either 1D-CNN or LSTM as the base model architecture based on validation performance on each building’s local data. The chosen model was then trained three times, once with each of the masked time-series inputs (i.e., high impact, low impact, random). These models yield distinct parameterizations due to their different feature subsets. Instead of averaging the predictions from the three sub-models, we aggregate the three parameterized models into one by computing a weighted sum of their parameters (i.e., a parameter-level model ensemble). The weights are derived from a softmax function applied to the inverse of each model’s validation loss, giving greater influence to the better-performing models. This function can be calculated as follows:
$$\alpha_i = \frac{\exp(1/\ell_i)}{\sum_{j=1}^{3} \exp(1/\ell_j)}, \qquad \theta_{\mathrm{ens}} = \sum_{i=1}^{3} \alpha_i \theta_i$$
where $\ell_i$ and $\theta_i$ denote the validation loss and the parameters of the $i$-th sub-model, respectively.
This allows the ensemble model to capture a richer representation of the building’s unique energy usage pattern by integrating knowledge from different masked feature distributions. The resulting ensemble model replaces the individual sub-models as the final local model used for global federated aggregation. This approach improves robustness and generalization, especially in heterogeneous sensing environments, and may significantly reduce variance in local training, improving FL stability [4]. The entire procedure for model ensemble via multi-level feature masking strategies is described in Figure 3.
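The parameter-level ensemble can be sketched as follows: a softmax over the inverse validation losses produces the weights, which are then used in a weighted sum of the sub-model parameters. Sub-models are represented as flat lists of floats, and the function names and example losses are hypothetical; strictly positive validation losses are assumed.

```python
import math


def inverse_loss_softmax(val_losses):
    """Softmax over 1 / loss, so a lower validation loss gets a larger weight."""
    scores = [1.0 / loss for loss in val_losses]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_parameters(sub_models, val_losses):
    """Weighted sum of equally shaped flat parameter vectors."""
    weights = inverse_loss_softmax(val_losses)
    return [
        sum(w * p for w, p in zip(weights, params))
        for params in zip(*sub_models)
    ]


# Hypothetical losses for the high-impact, low-impact, and random sub-models
weights = inverse_loss_softmax([0.5, 1.0, 2.0])
merged = ensemble_parameters(
    [[0.0, 3.0], [3.0, 0.0], [3.0, 3.0]], [1.0, 1.0, 1.0]
)
print([round(w, 3) for w in weights], merged)
```

When all three losses are equal the weights collapse to one third each, so the ensemble reduces to a plain parameter average.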
3.2.3. Personalization of Knowledge with Federated Transfer Learning
While FL successfully facilitates decentralized energy consumption prediction, the unique data distributions of individual buildings often lead to heterogeneous feature representations. A single global model trained across multiple buildings struggles to capture these building-specific differences, which can negatively impact prediction accuracy. To address this issue, transfer learning is utilized for the personalization of FL with Feature Extractor Updates, enabling each building to adapt the global model while preserving shared knowledge. Once the FL process is complete, the server distributes the final global model to all participating buildings, as depicted in
Figure 4. However, due to privacy constraints, local building data cannot be centrally aggregated for model retraining. Instead, each building uses the global model as a pre-trained baseline, leveraging knowledge from various building datasets while ensuring data privacy. While Federated Learning inherently offers privacy benefits by keeping raw data on local devices and only sharing model parameters, this approach alone is not sufficient to fully address modern privacy risks. Techniques such as parameter sharing reduce direct exposure of sensitive data, but may still be susceptible to inference attacks or gradient leakage. Therefore, more advanced privacy-preserving mechanisms—such as capsule-based representations or encrypted model updates—are required to further encapsulate sensitive information during training.
The global model’s parameters encapsulate shared representations that can be selectively fine-tuned locally to improve building-specific predictions. To achieve personalized performance without sacrificing scalability, we employ a network-based transfer learning approach. Research indicates that early layers of deep neural networks capture general features, while later layers learn task-specific details. In our PFL framework, the feature extraction layers (e.g., CONV/LSTM blocks) are continuously updated to refine temporal dependencies, while the fully connected layers (MLPs) are fine-tuned using each building’s local data. This method ensures the model evolves to incorporate both global insights and local knowledge through multiple FL rounds. By allowing feature extraction layers to update rather than freezing them, our approach gradually adapts to changing building-specific patterns while retaining global knowledge gained during federated training. This dynamic fine-tuning improves the global model’s robustness and ensures each client’s model is tailored to its unique energy consumption characteristics. Additionally, since only a subset of model parameters requires optimization, this approach remains computationally efficient, reducing both the local data burden and hardware requirements.
3.3. Evaluation Metrics
To evaluate prediction accuracy, the mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE) are used as performance metrics. MAE is the mean of the absolute differences between predicted and actual values. RMSE, a quadratic metric, measures the average error magnitude by taking the square root of the mean of squared differences. Because the errors are squared before averaging, RMSE weights large errors more heavily, which makes it more informative when significant prediction discrepancies occur. For all three metrics, lower values indicate better prediction performance, as they reflect a smaller gap between actual and predicted outcomes. MAE and RMSE are widely used, but they are expressed in the units of the target variable and can therefore be hard to compare across buildings of different scale. MAPE is included because it expresses the error as a percentage, making the model’s performance easier to interpret and communicate. These three metrics are employed to assess the accuracy of short-term building energy consumption predictions. The calculations for each metric are presented in Equations (8)–(10).
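As a concrete illustration, the three metrics can be computed as follows. This is a minimal sketch using the standard textbook definitions of MAE, RMSE, and MAPE; the function names and the sample values are illustrative and are not taken from the study's data.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error; squaring emphasizes large errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes y_true contains no zeros; otherwise the ratio is undefined.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Illustrative hourly consumption values (not from the paper).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 360.0]

mae_value = mae(actual, predicted)    # mean of |10|, |10|, |40|
rmse_value = rmse(actual, predicted)  # sqrt of mean of 100, 100, 1600
mape_value = mape(actual, predicted)  # mean of 10%, 5%, 10%
print(mae_value, rmse_value, mape_value)
```

In this example the single 40-unit miss pulls RMSE above MAE, which is the sensitivity to large errors described above.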
4. Experiment and Results
4.1. Preliminary Analysis of Experimental Data
This study analyzes the total electricity consumption across various building facilities, including heating, cooling, ventilation, and lighting. The electricity consumption data were collected hourly from the BEMS of each target building. A core objective of this research is to examine how the unique characteristics of each building, such as its occupancy schedule and usage type, influence its energy consumption patterns. As shown in Figure 5, different buildings on campus exhibit distinct energy consumption patterns. Educational buildings, for example, consume more energy on weekdays due to classroom usage and lab activities, whereas dormitories maintain a relatively stable consumption pattern regardless of the day. These variations highlight the complexity of energy demand across campus, making it impractical to rely on a single centralized model for energy prediction.
Among the buildings, Buildings A and D exhibit the highest electricity consumption, with a clear distinction between weekday and weekend usage. Notably, electricity consumption in August is higher than in June and July, which coincides with the average outdoor temperature rising to 27.18 °C, approaching the cooling system’s setpoint of 28 °C. Similarly, Buildings E and G demonstrate relatively high electricity consumption with distinct weekday and weekend usage patterns. Buildings B and F follow a similar trend, ranking next in energy consumption. Conversely, Buildings C and H consume the least electricity. Building C, a student dormitory, exhibits minimal variation between weekdays and weekends, as occupants remain in the building regardless of the day.
Figure 6 presents two correlation analyses: (a) electricity consumption across eight campus buildings and (b) the relationship between mean energy usage and various environmental factors. The left figure shows that Buildings A, D, E, F, and G have high inter-correlations, suggesting similar temporal energy usage patterns—likely driven by shared building functions or occupant behavior. In contrast, Buildings B and H exhibit moderate correlations with the rest, potentially due to their unique roles as a campus clinic and a cultural graduate school, respectively. These functions differ from the typical research-focused usage of the other buildings. Building C, which serves as a student dormitory, displays minimal correlation with other buildings, reflecting its residential consumption characteristics rather than academic or laboratory-based patterns. Meanwhile, the right figure reveals that outdoor temperature has a strong positive correlation (0.70) with overall energy use, indicating that higher temperatures are associated with increased cooling demand. On the other hand, relative humidity and atmospheric pressure show moderate negative correlations, likely reflecting inverse relationships with cooling needs. Solar radiation also demonstrates a notable positive association with energy consumption, reinforcing the role of thermal and solar-related variables as primary drivers of electricity demand on campus.
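A correlation analysis of this kind can be sketched with Pearson coefficients. The example below uses synthetic series generated on the spot, not the campus dataset; the variable names and the way the series are constructed are assumptions made only so that the signs of the correlations are known in advance.

```python
import numpy as np

# Synthetic two-week hourly series (illustrative only, not the study's data).
rng = np.random.default_rng(42)
hours = 24 * 14
temperature = 25 + 3 * rng.normal(size=hours)
# Humidity is built to fall as temperature rises, plus noise.
humidity = 60 - 2 * (temperature - 25) + rng.normal(size=hours)
# Load is built to rise with temperature, plus noise.
load = 50 * temperature + rng.normal(size=hours)

# np.corrcoef treats each row as one variable and returns the
# symmetric matrix of pairwise Pearson correlation coefficients.
variables = np.vstack([load, temperature, humidity])
corr = np.corrcoef(variables)

labels = ["load", "temperature", "humidity"]
for name, row in zip(labels, corr):
    print(name, np.round(row, 2))
```

Applied to real measurements, the first row of such a matrix corresponds to the relationship between mean energy usage and each environmental factor reported in panel (b).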
4.2. Selection of Predictive Model
This section compares the performance of different neural network models to determine the most suitable architecture for federated training. The entire dataset from all experimental buildings was used for training, and the best-performing model was selected based on the evaluation metrics described in Section 3.3. To identify optimal hyperparameters, a grid search was applied to each neural network model. For the 1D-CNN model, three 1D convolution layers with max pooling were used for feature extraction, with filter sizes of 128, 64, and 32. The stride was fixed at 1 to control movement across the sequential data. For the LSTM model, three recurrent layers with 50, 30, and 10 neurons were employed. Unlike in the CNN, hyperbolic tangent (tanh) activation was used for the recurrent layers, while ReLU activation was applied in the fully connected (FC) layers. After the hyperparameters of each feature extractor were set, the same three FC layers were appended to every model. These layers consisted of 100, 80, and 50 neurons, each using a ReLU activation function. The models were trained using the RMSprop optimizer with a learning rate of 0.01, while the dropout rate and batch size were set to 20% and 32, respectively.
To ensure model reliability, ten experimental trials were conducted. Table 5 presents the performance comparison, demonstrating that prediction accuracy was maximized when using a 24 h historical sequence window. Among all models, LSTM achieved the best predictive performance, with an MAE of 112.46, an RMSE of 174.13, and a MAPE of 11.24%. Additionally, LSTM maintained consistent accuracy across different sequence lengths, outperforming 1D-CNN and the other models in handling longer input sequences. However, as the sequence length increased beyond 24 h, the prediction accuracy of all models slightly declined.
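To make the notion of a 24 h historical sequence window concrete, the sketch below turns an hourly series into supervised input/target pairs. The function name `make_windows` and the one-step-ahead horizon are assumptions for illustration; the text does not spell out how the windows were built.

```python
import numpy as np

def make_windows(series, window=24, horizon=1):
    """Slice an hourly series into (input window, target) pairs.

    Each input holds `window` consecutive hourly values; the target is the
    value `horizon` steps after the end of that window.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window")
    inputs = np.stack([series[i:i + window] for i in range(n_samples)])
    targets = np.array(
        [series[i + window + horizon - 1] for i in range(n_samples)]
    )
    return inputs, targets

# Two days of placeholder hourly readings: 0, 1, ..., 47.
hourly = np.arange(48)
X, y = make_windows(hourly, window=24, horizon=1)
print(X.shape, y.shape)  # (24, 24) (24,)
print(X[0][:3], y[0])    # first window starts at hour 0, target is hour 24
```

Changing `window` to 48 or 72 reproduces the longer input sequences that the comparison evaluates against the 24 h setting.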
4.3. Comparison of Prediction Performance
To evaluate the effectiveness of the proposed pFL model in handling data heterogeneity, we designed experiments in which each building had a different set of input variables for training, simulating a heterogeneous sensing environment. As shown in Table 6, outdoor temperature and relative humidity were used consistently across all buildings, whereas other input variables such as wind speed, solar radiation, pressure, and dew point temperature varied between buildings. This setup forces the model to learn from diverse data distributions and adapt to varying sensor configurations, which tests the robustness and generalization capability of pFL in real-world scenarios.
The proposed model was compared with three other approaches: local prediction, standard Federated Learning A (FedAvg), and standard Federated Learning B (FedProx). Local prediction trains a customized model for each building directly on its local data, without any knowledge sharing. The evaluation is based on RMSE, MAE, and MAPE, where lower values indicate better performance.
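For readers unfamiliar with the two federated baselines, the sketch below shows their core ideas as commonly defined in the literature: FedAvg averages client parameters weighted by local sample counts, and FedProx adds a proximal penalty to each client's local objective to keep its weights near the global model. The function names, the flat parameter vectors, and the numbers are illustrative assumptions, not the configuration used in the experiments.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Weighted average of client parameter vectors (FedAvg aggregation)."""
    weights = np.asarray(client_sizes, dtype=float)
    weights = weights / weights.sum()
    stacked = np.stack([np.asarray(p, dtype=float) for p in client_params])
    return weights @ stacked

def fedprox_objective(local_loss, local_params, global_params, mu):
    """Local loss plus the FedProx proximal term (mu / 2) * ||w - w_global||^2."""
    diff = np.asarray(local_params, float) - np.asarray(global_params, float)
    return float(local_loss + 0.5 * mu * np.sum(diff ** 2))

# Two hypothetical clients with 100 and 300 local samples.
client_a = np.array([1.0, 2.0])
client_b = np.array([3.0, 4.0])
global_params = fedavg([client_a, client_b], [100, 300])
print(global_params)  # weights 0.25 and 0.75 give [2.5, 3.5]

# Proximal penalty seen by client A when it drifts from the global model.
penalized = fedprox_objective(
    local_loss=1.0, local_params=client_a, global_params=global_params, mu=0.1
)
print(penalized)  # 1.0 + 0.5 * 0.1 * (1.5**2 + 1.5**2) = 1.225
```

Neither baseline keeps building-specific parameters after aggregation, which is the gap the personalized approach is meant to close.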
As depicted in Figure 7, the proposed pFL approach consistently outperforms the other methods, reducing the prediction errors (RMSE, MAE, and MAPE) of all experimental buildings. Unlike FedAvg and FedProx, which aggregate models without accounting for data heterogeneity, pFL integrates ensemble learning with ACF-based multi-level masking and transfer learning, allowing personalized adaptation to each building’s unique energy consumption patterns in a heterogeneous sensing environment with limited training data. ACF-based multi-level masking enhances feature selection by identifying and prioritizing the most relevant time-series features while reducing the impact of irrelevant or redundant information, which improves predictive accuracy. This ensures that the model captures building-specific consumption trends despite variations in sensor configurations and energy usage behaviors. In addition, ensemble learning combines knowledge from multiple masking strategies, increasing model robustness and generalization, which is particularly beneficial where buildings exhibit distinct consumption profiles. Furthermore, transfer learning enables effective personalization by fine-tuning the global model for each building, allowing it to retain shared insights while adapting to local variations in energy consumption. Unlike Local Training, which lacks knowledge sharing across buildings, pFL leverages federated knowledge transfer and therefore achieves higher accuracy even when local data are limited. In contrast, FedAvg and FedProx assume a more uniform data distribution across clients, making them less effective in non-IID settings and leading to higher RMSE, MAE, and MAPE across all buildings.
By integrating ensemble learning with multi-level masking and transfer learning, pFL successfully balances global model generalization with local personalization, ensuring that the predictive model remains scalable while maintaining high accuracy in diverse energy consumption scenarios. Moreover, the proposed framework can be extended beyond campus buildings to residential and industrial domains. We also aim to explore cross-domain transfer learning between different types of energy systems. The results clearly indicate that pFL significantly reduces prediction errors by better accounting for sensor heterogeneity, data scarcity, and varying building usage patterns, making it a more effective and practical solution for intelligent energy management in decentralized environments.
Figure 8 illustrates the performance comparison of different energy prediction models, highlighting the superiority of the proposed pFL approach over FedAvg, FedProx, and Local Training. Unlike standard FL methods, which struggle with non-IID data and fail to personalize predictions, pFL effectively adapts to building-specific consumption patterns through ACF-based multi-level masking, ensemble learning, and transfer learning. These techniques allow pFL to prioritize relevant time-series features, improve model robustness, and fine-tune predictions locally while benefiting from global knowledge. As a result, pFL achieves lower RMSE, MAE, and MAPE, especially in buildings with diverse occupancy and sensor configurations. Compared to FedAvg and FedProx, which apply uniform model aggregation, pFL dynamically optimizes each building’s model, ensuring better peak consumption forecasting and improved generalization. Local models, while capturing individual trends, suffer from data limitations, reinforcing the need for a federated approach. The enhanced adaptability and predictive accuracy of pFL demonstrate its effectiveness in handling decentralized energy forecasting, making it a scalable and privacy-preserving solution for heterogeneous building environments.
4.4. Personalization Performance
The personalization score quantifies how well a personalized model adapts to an individual client compared with the shared global model, that is, the improvement in prediction accuracy obtained when local fine-tuning is applied. In Personalized Federated Learning (pFL), it represents the relative performance gain of the personalized model over the global model, computed from the ratio of their validation losses:
$$P_i = 1 - \frac{\mathcal{L}_{p,i}}{\mathcal{L}_{g,i}}$$
where $P_i$ represents the personalization score for building $i$, $\mathcal{L}_{g,i}$ is the loss of the global model on the local test dataset of building $i$, and $\mathcal{L}_{p,i}$ is the loss of the personalized model on the same dataset. A higher $P_i$ value indicates a greater improvement achieved through knowledge transfer in the personalization layer. If $P_i = 1$, the local model perfectly fits the test data with zero loss, while there is no improvement over the global model if $P_i = 0$ (i.e., personalization has no effect). A negative score suggests that personalization has degraded performance. This metric is particularly useful in heterogeneous environments, where each client may have a distinct local data distribution. Buildings with highly varying energy consumption patterns may benefit significantly from personalization, whereas others with stable patterns may not see much improvement. Moreover, smaller buildings with low energy consumption might exhibit higher absolute percentage errors (APE), making their personalization scores appear lower due to sensitivity to small absolute losses. By analyzing $P_i$ across multiple buildings, we can determine the effectiveness of Personalized Federated Learning strategies. This insight helps optimize adaptive model aggregation, fine-tuning strategies, and client-specific adjustments to improve overall system performance.
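The score can be computed directly from the two losses. The sketch below assumes the form "one minus the ratio of personalized loss to global loss", which is the reading consistent with the properties stated above (1 at zero personalized loss, 0 with no improvement, negative when personalization hurts); the function name and the loss values are illustrative.

```python
def personalization_score(global_loss, personalized_loss):
    """Relative gain of the personalized model over the global model.

    Assumed form: 1 - personalized_loss / global_loss, with both losses
    measured on the same local test data.
    """
    if global_loss <= 0:
        raise ValueError("global_loss must be positive")
    return 1.0 - personalized_loss / global_loss

# Illustrative loss values (not results from the paper).
improved = personalization_score(global_loss=0.20, personalized_loss=0.15)
perfect = personalization_score(global_loss=0.20, personalized_loss=0.0)
unchanged = personalization_score(global_loss=0.20, personalized_loss=0.20)
degraded = personalization_score(global_loss=0.20, personalized_loss=0.30)
print(improved, perfect, unchanged, degraded)
```

Because the score is a ratio, a building whose global loss is already very small can show a large swing in score from a tiny absolute change in loss, which is worth keeping in mind when comparing buildings of different size.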
Figure 9 illustrates the personalization scores of different Federated Learning approaches (pFL, FedProx, and FedAvg) across the buildings. The proposed pFL method achieves the highest personalization score in every building, demonstrating its superior ability to adapt to individual building characteristics in heterogeneous environments. Buildings C, D, and F exhibit the largest performance gaps between pFL and the other methods, indicating that these buildings benefit the most from pFL’s ensemble learning with multi-level feature masking and transfer learning-based adaptation strategies. In contrast, for Buildings A, B, E, G, and H, the differences between pFL and the other methods are smaller, suggesting that standard Federated Learning methods (FedProx and FedAvg) can still capture useful patterns but are less effective than pFL in highly diverse settings. Building C shows the most pronounced improvement with pFL, likely due to its distinct occupancy patterns and energy consumption behaviors, which require a more refined and personalized approach. Building D also benefits significantly, reinforcing the importance of Personalized Federated Learning for energy consumption prediction in buildings with fluctuating operational schedules. Building F exhibits another substantial gap, suggesting that conventional FL methods struggle to generalize well in this case, while pFL successfully learns building-specific characteristics. Overall, the results confirm that pFL offers substantial advantages over traditional Federated Learning approaches by leveraging feature masking, model ensemble techniques, and adaptive learning strategies. The strong performance across all buildings, particularly in C, D, and F, highlights its robustness and scalability, making it a more effective solution for federated energy consumption prediction in diverse and data-scarce environments.
5. Conclusions
To improve short-term building energy forecasting and address generalization challenges in heterogeneous sensing environments with limited data, this study proposes a Personalized Federated Learning (pFL) framework. Our approach integrates multi-level feature masking based on autocorrelation of energy data in each building, parameter-level ensemble modeling, and knowledge transfer to enable robust collaboration among buildings with diverse sensor configurations. Experimental results on a real-world campus community dataset demonstrate that the proposed method consistently outperforms conventional FL baselines such as FedAvg and FedProx, particularly in buildings with highly variable energy consumption patterns. The results highlight pFL’s superior adaptability, with each local model effectively personalized through selective knowledge sharing and masked feature representations. The ensemble strategy, combined with feature masking, allows for robust temporal dependency extraction, while transfer learning mitigates the effects of sensor heterogeneity and data scarcity.
Although ablation results are not included in this version, future work will isolate the effects of each core component—feature masking, ensemble modeling, and transfer learning—to better understand their individual contributions. Additionally, we plan to incorporate incentive mechanisms to encourage broader participation in federated systems and extend the model for multi-horizon energy prediction using uncertainty-aware methods such as quantile regression.
From a practical perspective, we acknowledge that the proposed framework introduces additional computational complexity due to multi-stage training and ensemble processing. However, these steps significantly enhance generalization and robustness, justifying the trade-off in real-world deployment scenarios. Future work will explore lightweight optimization strategies to improve training efficiency.
While Federated Learning improves privacy by avoiding raw data transmission, it may still be vulnerable to indirect attacks such as gradient leakage or model inversion. To further enhance data protection, we intend to investigate advanced privacy-preserving mechanisms, including capsule-based encapsulation and secure aggregation techniques.
In summary, the proposed pFL framework demonstrates strong potential for reliable and scalable deployment in distributed energy management systems, especially in environments characterized by data heterogeneity and limited sensing fidelity.
Author Contributions
H.K., H.P. and S.L. conceptualized and designed the experiments; H.K. and S.D. designed and implemented the system; H.K., H.P., S.D. and S.L. wrote the paper. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by “Capacity Building Project for School of Information and Communication Technology at Mongolian University of Science and Technology in Mongolia” (Contract No. P2019-00124) funded by KOICA (Korea International Cooperation Agency).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
Conflicts of Interest
Author Hakjae Kim was employed by the company Class Act Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. All authors have read the final manuscript, have approved the submission to the journal, and have accepted full responsibility for the manuscript’s delivery and contents.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).