Article

Affinity-Driven Transfer Learning for Load Forecasting

by Ahmed Rebei, Manar Amayri and Nizar Bouguila *
Concordia Institute for Information Systems Engineering, Montreal, QC H3G 1M8, Canada
* Author to whom correspondence should be addressed.
Sensors 2024, 24(17), 5802; https://doi.org/10.3390/s24175802
Submission received: 17 August 2024 / Revised: 31 August 2024 / Accepted: 5 September 2024 / Published: 6 September 2024
(This article belongs to the Special Issue Sensors Technology and Data Analytics Applied in Smart Grid)

Abstract

In this study, we introduce an innovative method for load forecasting that capitalizes on the task affinity score to measure the similarity between tasks. The task affinity score emerges as a superior technique for assessing task similarity within the realm of transfer learning. Through an empirical evaluation on a synthetic dataset, we establish the superiority of the task affinity score over traditional metrics in task selection scenarios. To operationalize this method, we propose the Affinity-Driven Transfer Learning (ADTL) algorithm to enhance load forecasting precision. The ADTL algorithm enriches the transfer learning framework by incorporating insights from both pre-trained models and datasets, thereby augmenting the accuracy of load forecasting for new and unseen datasets. The robustness of the ADTL algorithm is further evidenced through its application to two empirical datasets, namely the dataset provided by the Australian Energy Market Operator (AEMO) and the Australian Smart-Grid Smart-City dataset. In conclusion, our research underscores the important role of the task affinity score in refining transfer learning methodologies for load forecasting applications.

1. Introduction

Load forecasting is the process of predicting future electricity demand. It is an essential task for electricity grids and power system operators to allow for proper functioning and maintain a reliable supply of electricity. Therefore, accurate forecasting is imperative to plan and manage electricity generation, transmission, storage, and distribution. In addition, it is crucial to ensure the stability and reliability of power grids and make informed decisions about electricity generation and transmission capacity [1,2,3]. In recent years, the electricity demand has been increasing due to the emergence of new technologies. The complexity and variability of electricity data are increasing, making it more challenging to address the load forecasting problem.
Many factors can impact the accuracy of load forecasting. For example, weather patterns, economic conditions, changes in electricity consumption patterns, extreme weather events such as heatwaves or cold spells, changes in economic activity, and the adoption of energy-efficient technologies can significantly alter electricity demand. Additionally, the consumption patterns of individual customers can vary considerably over time, making it difficult to predict the overall load on the grid accurately.
Various statistical algorithms have been developed for load forecasting to address these challenges. Before deep learning models became popular, several statistical methods were used to predict future values, such as the autoregressive integrated moving average (ARIMA) [4] and exponential smoothing [5]. However, these methods assume that the data are stationary, which is not valid for electricity demand. Electricity demand is often volatile and exhibits complex trends and seasonal patterns that traditional statistical models cannot capture. Therefore, many machine learning methods have been proposed to mitigate this issue [6], such as support vector regressors [7], fuzzy logic [8], artificial neural networks (ANNs) [9,10,11], radial basis function networks (RBFNs) [12], and hybrid methods [13,14,15]. In recent years, neural networks, especially recurrent neural networks (RNNs), have become popular for forecasting because they can model nonlinear features and take into account the temporal structure of the data, making them more effective at capturing the evolution of load data [16,17]. More complex approaches have also been proposed in the literature, combining different types of neural networks. In [18], for example, the authors proposed a new forecasting approach that considers both temporal and spatial features using a graph convolutional network (GCN) and a multiresolution convolutional neural network (CNN) for short-term wind power forecasting. Similarly, Jiang [19] focused on developing a new learning mechanism to enhance the mapping capability of multi-step demand in building energy forecasting. He also proposed a deep-chain echo state network (DCESN) that effectively prevents error accumulation compared to sliding-window echo state networks and LSTM models.
In addition to the forecasting problem itself, the context of load forecasting presents other issues, mainly data scarcity and source volatility. One promising approach for addressing these issues is transfer learning. This machine learning technique allows a model to quickly adapt to new tasks by learning from past experiences. Transfer learning has been successfully applied to a variety of tasks in various fields, including computer vision, natural language processing, and robotics [20,21,22,23,24]. In the context of load forecasting, transfer learning can potentially improve load prediction efficiency by allowing the model to learn from a diverse set of past forecasting tasks and adapt to new ones more quickly.
In this paper, we propose the use of transfer learning to solve the load forecasting problem. We begin by reviewing the existing literature on load forecasting and transfer learning, highlighting the challenges and limitations of traditional load forecasting approaches and the potential benefits of using transfer learning. Indeed, the core contribution of our work lies in the mathematical framework we propose, which addresses specific limitations of traditional load forecasting and transfer learning methods. Traditional approaches often lack a rigorous mechanism for selecting the most relevant pre-trained models for new tasks, leading to suboptimal performance. Our work fills this gap by introducing a novel task distance metric grounded in mathematical theory, which enables more precise and effective model selection. We then describe our proposed transfer learning-based approach for load forecasting, including a review of the task affinity score and its use in transfer learning. Finally, we present our experimental evaluation results, demonstrating the effectiveness of our approach.
Overall, this work’s contributions can be summarized as follows:
  • In an empirical study, we demonstrate the usefulness of using the task affinity score as a measure of task distance for selection of the nearest source task from which to transfer knowledge (see Section 3.1).
  • We propose a transfer learning approach integrating the task affinity score as a distance metric for source task selection (see Section 3.2).
  • We improve the efficacy of load forecasting deep learning models in terms of training time and prediction score (see Section 4).
This paper is structured as follows. Section 2 comprehensively reviews the relevant literature related to transfer learning in load forecasting. Section 3 outlines the methodology used in this study, including the formulation of the task affinity score and the algorithm used in the experimental section. In Section 4, we present two case studies that illustrate the practical application of our methodology. Section 5 outlines future work that could build upon our findings and concludes the paper by summarizing our main findings and outlining their implications for future research and practice.

2. Literature Review

Accurate electric load forecasting is crucial for the safe and efficient operation of modern electric power systems, and various methods have been proposed to improve it. Several studies have suggested using transfer learning techniques to address the challenge of limited training data. For example, in [25], the authors proposed two deep learning models and a transfer learning framework to improve energy consumption prediction accuracy for buildings with limited data and demonstrated the effectiveness of the models through a case study of three office buildings. The proposed models, a sequence-to-sequence (seq2seq) model and a two-dimensional convolutional neural network with an attention layer, showed improved forecast accuracy over a long short-term memory network under a poor information state.
Similarly, in [26], the authors proposed a transfer learning-based artificial neural network model for one-hour-ahead building energy prediction to address the challenge of insufficient data for the training of data-driven predictive models for new buildings and existing buildings without advanced building automation systems. The study used data from 400 non-residential buildings from the open-source Building Genome Project to test the proposed method and found that transfer learning can effectively improve the accuracy of Back Propagation Neural Network (BPNN)-based building energy models for information-poor buildings with limited training data. The research also identified the most influential building features that influence the effectiveness of transfer learning, particularly in selecting appropriate source buildings and datasets.
In [27], Fang et al. proposed a novel hybrid deep transfer learning strategy to improve the accuracy of energy predictions in buildings with limited historical measurements. The approach combines long short-term memory and a domain adversarial neural network to extract temporal and domain-invariant features between source and target buildings. Experiments showed that this strategy significantly enhances building energy prediction performance compared to models trained on target or source-only data without transfer learning. The results can guide the effective use of existing building data resources. Another approach intended to be applied effectively to intelligent energy management in smart buildings was introduced in [28]. Using transfer learning and long short-term memory models, the authors proposed a new MEC-TLL framework for forecasting electric energy consumption in smart buildings. The framework uses a k-means clustering algorithm to group the daily load demand of many profiles in the training set. Then, it applies transfer learning to LSTM models to reduce computational time. The proposed approach was tested on two smart buildings in South Korea, and the results showed that it can reduce computational time while achieving superior performance compared to other models.
Zhou et al. proposed an integrated load forecasting model for an Integrated Energy System (IES) to improve energy scheduling [29]. The model addresses the problem of insufficient data for new users in the IES by combining Bidirectional Generative Adversarial Networks (BiGANs), data augmentation, and transfer learning techniques. The proposed model was compared to ten other data-driven models for two different types of users, namely residential and commercial, and found to be more accurate, on average, for each user type. The study also analyzed the impact of sample size, showing that the proposed model can improve the efficiency of other predictive models and can be used for load forecasting even when data are lacking. Peng et al. used a multi-source transfer learning-guided ensemble LSTM method (MTE-LSTM) to address the problem of insufficient energy data [30]. The process uses a two-stage source-domain building-matching method to find similar buildings and an LSTM modeling strategy that combines transfer learning and fine tuning to generate basic load forecasting models for the target building. An ensemble strategy is then used to weigh the output results of the basic forecasting models. The method was applied to multiple actual buildings and achieved high-precision load forecasting results when the target building data were relatively limited.
In another work [31], the authors found that a two-layer transfer learning-based architecture for short-term load forecasting (STLF) can improve the forecasting accuracy of load in a target zone. The architecture utilizes load data from source zones and includes an inner layer where latent parameters are introduced to represent the differences in electricity consumption behavior between zones. An iterative algorithm is developed in the outer layer to assign variant weights to datasets according to their fitness relative to the latent parameter-assisted model. Results from case studies showed that the proposed STLF architecture can improve the forecasting accuracy of classic STLF algorithms, mainly when the load data of the target zone are limited. Another work [32] presented a solution to the problem of developing predictive models for energy assets, such as electricity loads and PV power generation, using limited data. The authors proposed an energy-predictive model based on convolutional neural networks (CNNs) to capture time-series patterns, trends, and seasonalities in energy assets. They then proposed a transfer learning strategy to improve the model’s performance with limited training data. The approach was demonstrated in a case of daily electricity demand forecasting, and the results showed that the transfer learning strategy improves existing forecasting methods. The authors of [33] addressed the problem of insufficient data by training graph neural network (GNN)-based models in newly built residential neighborhoods. They proposed a transfer learning framework that uses knowledge from other areas with abundant data to assist the model in learning in areas with limited data. Specifically, the authors proposed an “attentive transfer framework” that ensembles GNN models trained from source domains and a GNN model trained on the target domain. The framework assigns dynamic weights to different GNN models based on the input data. The proposed framework was tested on real-world datasets, and the results showed that it is effective in various scenarios.
While existing research in load forecasting has explored various methodologies, including machine learning models and transfer learning techniques, these approaches often lack the adaptability needed for dynamic task environments. Most studies have focused on improving accuracy within a specific context, without addressing the broader challenge of model generalization across diverse tasks. Our work advances this field by introducing the ADTL framework, which uniquely incorporates a task distance metric to guide the selection of pre-trained models. Unlike traditional methods that apply a one-size-fits-all approach, our method dynamically adapts to the specific characteristics of each task, offering significant improvements in both accuracy and flexibility. This innovation positions our work as a critical contribution to the ongoing evolution of load forecasting methodologies.

3. Methodology

This section presents the task affinity score as a task-distance metric. We provide empirical evidence of the effectiveness of the TAS as a distance between two tasks. In this context, we define a task as a model–dataset pair for simplicity. For a source dataset X_a, we train a model f_a using L_a(θ) as a loss function. The investigated distance is then computed with respect to a target dataset X_b using the same model f_a and the same loss function L_a(θ).

3.1. Task Affinity Score

3.1.1. Theoretical Formulation

The task affinity score is a measure based on Fisher information to approximate the similarity between two tasks. It determines how easily one task can gain knowledge from another task. It can help identify which tasks are most closely related and, therefore, most likely to benefit from shared knowledge [34].
First, to calculate the task affinity score, we must define the Fisher information matrix. This matrix measures the amount of information that the data provide about the model parameters after training. It is computed as the expectation of the outer product of the gradient of the log-likelihood loss with respect to the parameters, which is equivalent to the expected Hessian of the loss. Once the Fisher information matrix is defined, we use it to calculate the task affinity score by comparing the Fisher information matrices of the source and target tasks, yielding a pseudo-distance between them. The lower the task affinity score, the closer the two tasks and the more likely it is that knowledge gained from the source task will help in learning the target task. For a neural network f_θ with weights θ and a negative log-likelihood loss function L(θ), we define the Fisher information matrix as follows:
F(\theta) = \mathbb{E}\left[ \nabla_{\theta} L(\theta)\, \nabla_{\theta} L(\theta)^{\top} \right] = \mathbb{E}\left[ H\left( L(\theta) \right) \right]        (1)
where H is the Hessian matrix.
To calculate the Fisher information matrix in practice, we use an empirical approach, as shown in Equation (2).
\hat{F}(\theta) = \frac{1}{|X|} \sum_{i \in X} \nabla_{\theta} L_i(\theta)\, \nabla_{\theta} L_i(\theta)^{\top}        (2)
where, for a dataset X, L_i(θ) is the loss at the i-th data point.
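For illustration, the following minimal sketch (assuming PyTorch, and a model, a dataset of (x, y) pairs, and a loss function defined elsewhere) accumulates the diagonal of the empirical Fisher matrix in Equation (2) from per-sample gradients and normalizes it to unit trace; the function name diagonal_fisher is ours, not from the paper.

```python
# Sketch only: diagonal empirical Fisher of Equation (2), assuming PyTorch.
import torch

def diagonal_fisher(model, dataset, loss_fn):
    """Return a flat, unit-trace vector holding the diagonal empirical Fisher."""
    params = [p for p in model.parameters() if p.requires_grad]
    fisher = [torch.zeros_like(p) for p in params]
    n = 0
    for x, y in dataset:                               # iterate over individual data points
        loss = loss_fn(model(x), y)                    # per-sample loss L_i(theta)
        grads = torch.autograd.grad(loss, params)      # gradient of L_i w.r.t. theta
        for f, g in zip(fisher, grads):
            f += g.detach() ** 2                       # diagonal of the outer product
        n += 1
    flat = torch.cat([f.flatten() for f in fisher]) / n
    return flat / flat.sum()                           # normalize to unit trace (Section 3.1.1)
```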
The task affinity score between the source dataset X_a and the target dataset X_b is calculated as the Fréchet distance between the Fisher information matrices of the network f_θ trained on the dataset X_a. Specifically, the TAS is defined as follows:
s[a, b] = \frac{1}{\sqrt{2}}\, \mathrm{Trace}\left( F_{a,a} + F_{a,b} - 2 \left( F_{a,a} F_{a,b} \right)^{\frac{1}{2}} \right)^{\frac{1}{2}}        (3)
where F_{a,a} is the Fisher information matrix of f_θ evaluated on the source dataset X_a and F_{a,b} is the Fisher information matrix of f_θ evaluated on the target dataset X_b.
The full Fisher information matrix is not used because it is computationally expensive to calculate over the large parameter space of a neural network. Instead, we calculate the diagonal approximation of the Fisher information matrix. These matrices are also normalized to have a unit trace. As a result, the TAS formula in Equation (3) simplifies to the following form:
s[a, b] = \frac{1}{\sqrt{2}} \left\| F_{a,a}^{\frac{1}{2}} - F_{a,b}^{\frac{1}{2}} \right\|_{F} = \frac{1}{\sqrt{2}} \left[ \sum_{i} \left( \left( F_{a,a}^{ii} \right)^{\frac{1}{2}} - \left( F_{a,b}^{ii} \right)^{\frac{1}{2}} \right)^{2} \right]^{\frac{1}{2}}        (4)
where F_{a,a}^{ii} and F_{a,b}^{ii} denote the diagonal elements of F_{a,a} and F_{a,b}, respectively. The value of the TAS ranges from 0 to 1, where a score of 0 indicates perfect similarity and a score of 1 indicates complete dissimilarity.
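A minimal sketch of Equation (4), assuming the two diagonal Fisher approximations are already available as unit-trace one-dimensional arrays (for instance, as produced by the sketch above):

```python
# Sketch only: TAS of Equation (4) between two unit-trace diagonal Fisher vectors.
import numpy as np

def task_affinity_score(f_aa: np.ndarray, f_ab: np.ndarray) -> float:
    # Frechet distance between diagonal matrices: l2 distance of the element-wise
    # square roots, scaled by 1/sqrt(2); 0 for identical tasks, up to 1 for disjoint ones.
    return float(np.linalg.norm(np.sqrt(f_aa) - np.sqrt(f_ab)) / np.sqrt(2))
```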

3.1.2. Empirical Justification

We conducted a simulation using synthetic data to investigate the usefulness of the task affinity score as a measure of similarity between tasks. In this simulation, we recursively added a linear trend and Gaussian noise to a pseudo-sine function to simulate an increasing difference between the datasets. The pseudo-sine function, described in Equation (5), attempts to mimic the behavior of load data from week to week. Figure 1 shows the difference between some of the datasets used in this simulation.
y_i(t) = G(t) \cdot \left( 1 + \sin\left( 2\pi t - \frac{\pi}{2} \right) \right) + \tau_i(t) + \epsilon_i(t)        (5)
G(t) = \mathbb{1}_{\mathrm{weekdays}}(t) + \alpha \cdot \mathbb{1}_{\overline{\mathrm{weekdays}}}(t), \quad \alpha = 1.7
\tau_i(t) = k_i t, \quad k_i \in [0, 1]
\epsilon_i(t) \sim \mathcal{N}(0, \sigma_i^{2}), \quad \sigma_i \in [0.1, 1]
where t, k_i, and σ_i are linearly spread over their respective intervals. We used 48 data points to simulate one day (this is not strictly necessary, but it mimics the 30 min sampling rate of the datasets used in the experimental section), and we generated ten datasets.
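A possible numpy implementation of this generator is sketched below; the weekday pattern, the random seeding, and the exact spreading of k_i and σ_i across the ten datasets are illustrative assumptions, since only the intervals are specified above.

```python
# Sketch only: synthetic pseudo-sine load of Equation (5) for dataset i in {0, ..., 9}.
import numpy as np

def synthetic_load(i: int, n_days: int = 28, points_per_day: int = 48) -> np.ndarray:
    t = np.arange(n_days * points_per_day) / points_per_day      # time measured in days
    weekday = (np.floor(t).astype(int) % 7) < 5                  # assumed weekday pattern
    g = np.where(weekday, 1.0, 1.7)                              # G(t) with alpha = 1.7
    base = g * (1.0 + np.sin(2.0 * np.pi * t - np.pi / 2.0))     # daily pseudo-sine shape
    k_i = i / 9.0                                                # linear-trend slope in [0, 1]
    sigma_i = 0.1 + 0.9 * i / 9.0                                # noise level in [0.1, 1]
    noise = np.random.default_rng(i).normal(0.0, sigma_i, size=t.shape)
    return base + k_i * t + noise                                # y_i(t) of Equation (5)
```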
Our results showed that as the difference between the datasets increased, the task affinity score consistently increased, indicating that the task affinity score effectively captured the increasing dissimilarity between the tasks. This is clear from Figure 2, where we trained two models for a few epochs on dataset 1 and dataset 5. The task affinity score increased as we strayed further away from the primary dataset. This simulation supports the intuition behind using the task affinity score as a measure of similarity between tasks, as it demonstrates that the metric is sensitive to changes in the differences between the datasets. Overall, our simulation results support the use of the task affinity score as a reliable and valid measure of the dissimilarity between tasks.

3.1.3. TAS vs. MSE

This section compares the task affinity score (TAS) and the mean squared error (MSE) as metrics to measure the distance between tasks. We trained a fully connected neural network and a long short-term memory (LSTM) model on the same datasets from the previous experiment (first and sixth datasets). We compared the distances between all tasks to these target datasets. To evaluate the performance of the TAS and MSE metrics, we computed the distance between every task (trained models and respective datasets) and the target task using both metrics and compared the results.
The results in Figure 3 show that the TAS metric outperforms the MSE loss in selecting the appropriate initial task from which to transfer knowledge. Comparing Figure 2 and Figure 3, we can see that the loss function does not show the same pattern as the TAS distance. In particular, the TAS was more reliable at identifying the nearest tasks and distinguishing dissimilar tasks, allowing the model to converge faster while requiring fewer training epochs. In contrast, the MSE loss was inconsistent, leading to less efficient model selection. In addition, we trained fully connected neural networks and LSTM models from random weights and calculated the number of epochs required to match the performance of a model pre-trained for one epoch on the nearest (and the second nearest) dataset. We present the results in Table 1.
In conclusion, our empirical study demonstrates that the TAS metric is a superior choice for measuring the distance between tasks compared to the MSE loss. The TAS metric is more accurate and consistent in identifying similar tasks and distinguishing dissimilar tasks.
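To make this comparison concrete, the following sketch ranks candidate source tasks for one target dataset both by TAS and by the average loss of the frozen source model on the target data; it reuses the diagonal_fisher and task_affinity_score sketches above, and the dictionary layout of the sources is an illustrative assumption.

```python
# Sketch only: rank source tasks by TAS (Eq. (4)) and by the average loss of the
# frozen source model on the target data, as in the comparison of Section 3.1.3.
import numpy as np

def rank_sources(sources, target_dataset, loss_fn):
    """sources: dict mapping a task name to (trained_model, unit_trace_diag_fisher)."""
    tas, mse = {}, {}
    for name, (model, fisher_src) in sources.items():
        fisher_tgt = diagonal_fisher(model, target_dataset, loss_fn)        # F_{a,b}
        tas[name] = task_affinity_score(fisher_src.numpy(), fisher_tgt.numpy())
        mse[name] = float(np.mean([loss_fn(model(x), y).item() for x, y in target_dataset]))
    return sorted(tas, key=tas.get), sorted(mse, key=mse.get)               # nearest first
```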

3.2. Affinity-Driven Transfer Learning

This section presents the methodology behind the affinity-driven transfer learning algorithm. As shown in Figure 4 and detailed in Algorithm 1, the algorithm has two steps, namely a learning step and a transfer learning step. In the first step, we train different models on different elements of a particular grid, where an element is typically the historical electricity consumption of a household. The trained models are then stored for future use. In the second step, when a new element is added to the grid, we select a suitable model from which to transfer knowledge by identifying the nearest source task in terms of the task affinity score.
Algorithm 1 Affinity-Driven Transfer Learning
Input: electricity demand of the old grid elements (OGTS)
Input: electricity demand of the new grid element (NGTS)
Output: h*
I - Pretraining:
 1: train models h_i, i ∈ {1, …, m}, one on each dataset from OGTS
II - Transfer learning:
 2: h* = argmin_{h_i, i ∈ {1, …, m}} TAS(τ_i, τ_new)
 3: train h* on NGTS for a few epochs
 4: return h*
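Under the same assumptions as the sketches in Section 3.1 (PyTorch models, the hypothetical diagonal_fisher and task_affinity_score helpers, and a dictionary of pre-trained models keyed by old grid element), Algorithm 1 could be realized roughly as follows; the optimizer choice and learning rate are illustrative.

```python
# Sketch only: Affinity-Driven Transfer Learning (Algorithm 1).
# `pretrained` maps each old grid element to (trained_model, unit_trace_diag_fisher).
import torch

def adtl(pretrained, new_dataset, loss_fn, epochs=5, lr=1e-3):
    # Step 2: pick the source model whose task is nearest to the new task in TAS terms.
    def tas_to_new(entry):
        model, fisher_src = entry
        fisher_tgt = diagonal_fisher(model, new_dataset, loss_fn)            # F_{a,b}
        return task_affinity_score(fisher_src.numpy(), fisher_tgt.numpy())   # Eq. (4)

    _, (h_star, _) = min(pretrained.items(), key=lambda kv: tas_to_new(kv[1]))

    # Step 3: fine-tune h* on the new grid element for a few epochs only.
    optimizer = torch.optim.Adam(h_star.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in new_dataset:
            optimizer.zero_grad()
            loss_fn(h_star(x), y).backward()
            optimizer.step()
    return h_star
```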

3.3. Models and Metrics

3.3.1. Fully Connected Neural Networks

A fully connected network, also known as a fully connected layer or a dense layer, is an artificial neural network in which each neuron in a layer is connected to every neuron in the previous layer. In other words, each neuron in a fully connected layer receives input from all the neurons in the previous layer [35].
Let us consider a fully connected layer with n inputs and m outputs [36], where the inputs are represented by a vector x of size n and the outputs are represented by a vector y of size m. Each output y_i is computed as a weighted sum of the inputs x_j plus a bias term b_i, then passed through an activation function g, as shown in Equation (6).
y_i = g\left( \sum_{j=1}^{n} w_{ij} x_j + b_i \right)        (6)
where w_{ij} represents the weight of the connection between the j-th input and the i-th output, and b_i represents the bias term for the i-th output.
This equation can also be represented in matrix form as follows:
\mathbf{y} = \sigma\left( \mathbf{W} \mathbf{x} + \mathbf{b} \right)
where W is a weight matrix of size (m × n), b is a bias vector of size m, x is the input vector of size n, and σ is the activation function (here, the sigmoid) applied element-wise to produce the output vector y.
In a multilayer neural network, the output of one fully connected layer is typically fed as input to the next fully connected layer, and so on, until the final output layer is reached.
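As a small illustration, the matrix form above can be written as a single function (numpy, sigmoid activation assumed):

```python
# Sketch only: forward pass of one fully connected layer, y = sigma(W x + b).
import numpy as np

def dense_forward(W: np.ndarray, b: np.ndarray, x: np.ndarray) -> np.ndarray:
    z = W @ x + b                      # W: (m, n), x: (n,), b: (m,)
    return 1.0 / (1.0 + np.exp(-z))    # element-wise sigmoid, y: (m,)
```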

3.3.2. LSTMs

LSTM models are gated recurrent neural networks [37] used in various areas, such as image generation [38], speech recognition [39], natural language processing [40], and time-series forecasting [41].
An LSTM model uses the same weights at every time step. It takes the input sequence element by element and carries hidden information from one time step to the next, as shown in Figure 5. The classic recurrent neural network (RNN) fails to carry information from the distant past because of the vanishing gradient problem. In contrast, an LSTM model uses gates to control which information is carried along while scanning the input sequence. So, instead of one simple activation function, an LSTM model uses Equations (7)–(12) as follows:
i_t = \sigma\left( W_i \cdot h_{t-1} + U_i \cdot x_t + P_i \cdot C_{t-1} + b_i \right)        (7)
f_t = \sigma\left( W_f \cdot h_{t-1} + U_f \cdot x_t + P_f \cdot C_{t-1} + b_f \right)        (8)
\tilde{C}_t = \psi\left( W_c \cdot h_{t-1} + U_c \cdot x_t + b_c \right)        (9)
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t        (10)
o_t = \sigma\left( W_o \cdot h_{t-1} + U_o \cdot x_t + P_o \cdot C_{t-1} + b_o \right)        (11)
h_t = o_t \odot \psi\left( C_t \right)        (12)
where subscripts i, f, and o denote the input, forget, and output gates, respectively; h denotes the hidden state vector; C is the long-term (cell) state vector; W_i, W_f, W_o, and W_c are the weight matrices applied to the hidden state from the previous time step; U_i, U_f, U_o, and U_c are the weight matrices applied to the input; and P_i, P_f, and P_o are the weight matrices applied to the long-term state C. The terms b_i, b_f, b_c, and b_o are the bias vectors. σ is the sigmoid operator, ψ is the hyperbolic tangent function (tanh), and ⊙ is the element-wise multiplication operator.
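A single LSTM step implementing Equations (7)–(12) can be sketched as follows (numpy; applying the peephole weights P_i, P_f, and P_o element-wise is a simplifying assumption):

```python
# Sketch only: one LSTM time step following Equations (7)-(12).
import numpy as np

def lstm_step(x_t, h_prev, c_prev, p):
    """p: dict of weights W_*, U_*, P_*, b_* for gates i, f, o and the candidate c."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    i_t = sig(p["W_i"] @ h_prev + p["U_i"] @ x_t + p["P_i"] * c_prev + p["b_i"])   # (7)
    f_t = sig(p["W_f"] @ h_prev + p["U_f"] @ x_t + p["P_f"] * c_prev + p["b_f"])   # (8)
    c_hat = np.tanh(p["W_c"] @ h_prev + p["U_c"] @ x_t + p["b_c"])                 # (9)
    c_t = f_t * c_prev + i_t * c_hat                                               # (10)
    o_t = sig(p["W_o"] @ h_prev + p["U_o"] @ x_t + p["P_o"] * c_prev + p["b_o"])   # (11)
    h_t = o_t * np.tanh(c_t)                                                       # (12)
    return h_t, c_t
```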

3.3.3. Metrics

To evaluate the accuracy of our proposed model, we relied on commonly used evaluation metrics, namely the Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE), as represented in Equations (13) and (14), respectively.
\mathrm{RMSE}(y, \hat{y}) = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^{2} }        (13)
\mathrm{MAPE}(y, \hat{y}) = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|        (14)
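A direct numpy implementation of these two metrics, for reference:

```python
# Sketch only: evaluation metrics of Equations (13) and (14).
import numpy as np

def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mape(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.mean(np.abs((y - y_hat) / y)))   # assumes no zero targets
```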

4. Case Study

In this experimental section, we present two case studies. In the first, we compare forecasting performance on the AEMO dataset using the ADTL algorithm and a randomly initialized network. In the second case study, we extend our analysis to investigate the performance of the nearest model and a random network over different forecast horizons. By doing so, we aim to identify whether the relative performance of each model remains consistent across various time horizons. The selected datasets are highly relevant to electricity consumption, where transfer learning significantly enhances predictive accuracy and model efficiency. By focusing on open-source datasets, we ensure that our results are replicable and accessible to the broader research community, fostering transparency and further innovation. Additionally, the datasets are relatively clean, allowing us to concentrate on the core contributions of our paper rather than data preprocessing challenges, thereby providing a clearer demonstration of the effectiveness of our proposed methods.

4.1. Case Study 1: Australian Dataset

The Australian Energy Market Operator (AEMO) dataset [42] is a collection of energy market data from the National Electricity Market (NEM) in Australia. The NEM is a wholesale electricity market that covers the eastern and southeastern parts of Australia. The AEMO dataset includes data on electricity demand and supply in the NEM. These data are collected in real time from market participants, including electricity generators, retailers, and transmission network service providers, and are used to manage the operation of the electricity grid and ensure a reliable supply of electricity to consumers.
In this work, we selected a few datasets from different parts of the market. We covered different years and locations to obtain the spatial and temporal differences between the datasets. The time series were selected from Queensland, New South Wales, Victoria, South Australia, and Tasmania, as detailed in Table 2.
In this study, we split our datasets into a meta subset and a query subset. The meta subset was used to train multiple models, and the query subset was used to evaluate their transferability. Specifically, we trained five different models on the five datasets from the meta subset and evaluated their performance when used on the datasets from the query subset.
To evaluate the transferability of knowledge from the meta tasks to new tasks, we calculated the task affinity score distance between the meta tasks and each of the query tasks. We then measured the average performance after transferring knowledge from the meta models to each task from the query set. We found a high correlation between the MAPE performance and the task affinity score distance. Figure 6 contains two subfigures, Figure 6a and Figure 6b. Figure 6a illustrates the task affinity score (TAS) between each dataset from the meta set and the QLD18 dataset.
Figure 6b displays the mean absolute percentage error of each meta-set model when evaluated on the QLD18 dataset from the query set. This figure shows a very high correlation between the performance of the models and the distance between tasks calculated by the TAS. We list the correlations obtained when repeating the same experiment with the other datasets from the query set in Table 3. The correlation between the proposed TAS distance and the MAPE performance is consistently high, with a mean value of 88.78%. To ensure the stability of the TAS metric, we repeated the experiment ten times for each query dataset and averaged the results. These results indicate that the similarity between the meta tasks and the query tasks influences the transferability of knowledge from the meta models to new models. In particular, the closer the task affinity score between the two sets of tasks, the better the transferability of knowledge.
In addition to the results discussed above, we trained the nearest and the second nearest models (selected using the TAS) and compared them with a model trained from scratch. In Table 4, we list the MAPE obtained on the query datasets by the models pre-trained on the different datasets of the meta set. We performed the experiment for zero, one, and five epochs of additional training. The results indicate that the knowledge gained from the nearest task made the models converge in fewer epochs. The results from the second nearest task indicate that transferring knowledge even from the second nearest task is still a better starting point for training neural networks than initializing them with random weights. The average MAPE of the nearest model is 0.29 without any additional training, compared to an average of 0.95 when a random FCN is trained for five epochs.
Furthermore, we downsampled the datasets from the query set to 10%, 20%, and 50% of their original sizes. Downsampling simulates data scarcity in this context. Table 5 shows the MAPE performance of the nearest model and the average performance of random neural networks when trained on the downsampled datasets. The results demonstrate that even with a reduction in data due to downsampling, transfer learning from the nearest dataset can result in good performance. In all cases, the nearest model performed well compared to the models trained on the whole dataset, indicating that the knowledge transferred from the nearest task overcame the data scarcity. For example, the average MAPE was 32% better when using the pre-trained model than when using a random FCN. In Figure 7, we show the performance of the different networks in this experiment. The figure clearly shows that the nearest model achieves better forecasting ability than the random network.

4.2. Case Study 2: Smart Grid Dataset

“The Smart-Grid Smart-City Customer Trial Data” dataset was collected as part of a trial conducted by the Australian Government Department of Climate Change, Energy, the Environment, and Water [43]. The dataset contains electricity consumption data from around 1300 households in New South Wales, Australia, collected over a period of 12 months. The data include half-hourly electricity consumption readings.
In the context of transfer learning for load forecasting in an electricity grid, the meta set can be seen as a set of pre-trained models that have already learned to forecast the load for some subset of elements in the grid. These models were trained on historical data and can be thought of as “experts” in predicting the load for those specific elements. The query set, in this case, refers to the new models that need to be trained for forecasting of load for previously unseen or new elements in the same electricity grid. These new models need to be trained on a smaller amount of data as compared to the pre-trained models in the meta set. The goal of transfer learning in this scenario is to leverage the knowledge learned by the pre-trained models on the elements they have already forecasted to improve the accuracy and efficiency of training of the new models for the new elements.
The primary goal of our experiments on this dataset was to evaluate the effectiveness of our approach in transferring knowledge from a set of known source apartments to forecast the electricity consumption in a target apartment. Specifically, we sought to determine whether the TAS could identify the most relevant source task for a given target task and whether this approach could speed up the transfer learning process. We selected a sample of 20 random apartments as the meta set and a sample of 30 apartments as the query set. Then, we trained 20 different LSTM models on the apartments’ data from the meta set, serving as the different possible sources for the query set of apartment data.
To evaluate the performance of ADTL, we used the mean absolute percentage error (MAPE) and the root mean square error (RMSE). Overall, our experiments on this dataset demonstrate the potential of ADTL as an effective transfer learning approach when the source and target tasks are closely related. In Table 6, we present a comparison between a random LSTM model and the LSTM model selected using the ADTL approach. We evaluated the algorithm's performance by comparing the results of a randomly initialized LSTM model with the results of the nearest LSTM model in terms of task distance. After training for five epochs, the random network's performance could not be improved and remained very poor for all prediction windows and horizons (1 day, 3 days, and 7 days), with an average mean absolute percentage error (MAPE) of 1.83 and an average root mean squared error (RMSE) of 0.233. In contrast, the nearest LSTM model showed significant improvement in performance after being trained for five epochs, achieving high accuracy for all prediction horizons, with an average MAPE of 0.73 and an average RMSE of 0.196. These results highlight the limitations of traditional deep learning models in learning from new and unseen data and demonstrate the potential of our transfer learning approach to improve their performance.
As shown in Table 7, we conducted experiments with reduced data samples from each apartment to simulate data scarcity issues by downsampling the data from the query set. We present the average performance of a randomly initialized LSTM network and the nearest LSTM network in terms of TAS. Our results demonstrate that the nearest LSTM network outperformed the randomly initialized network. Notably, after only five epochs of training, the nearest LSTM network achieved performance comparable to that achieved following training on the complete dataset without downsampling.

5. Future Work and Conclusions

Our results suggest that the TAS metric should be given more consideration as a metric for evaluating the transferability of machine learning models in real-world applications, particularly in scenarios where efficiency and convergence speed are important considerations. This finding has important implications for the design of meta-learning algorithms and suggests that incorporating task similarity information can improve their performance. In particular, MAML [44] underlies the transferable model-agnostic meta-learning (T-MAML) approach [45], which targets load forecasting for single households: multiple households collaboratively train a generic artificial neural network (ANN) model, which is then further trained at each target household node for the purpose of STLF. We propose using the TAS distance, instead of a random selection, to choose the subset of the dataset used in the meta-learning phase and thereby minimize the number of learning steps.
In this paper, we present empirical evidence to support the use of the task affinity score (TAS) as a reliable and effective distance measure for task similarity in transfer learning. Our study also introduces a new transfer learning algorithm called affinity-driven transfer learning (ADTL), which leverages the TAS to select the most appropriate source task for knowledge transfer. To evaluate the effectiveness of ADTL, we conducted experiments on two datasets, namely the AEMO dataset and the smart apartment dataset. Our results demonstrate that ADTL outperformed traditional transfer learning approaches in terms of mean absolute percentage error (MAPE) across both datasets. These findings suggest that ADTL can successfully identify the most relevant source task for knowledge transfer based on the task affinity score. Furthermore, we investigated the effectiveness of ADTL under data scarcity conditions. To simulate this scenario, we downsampled the datasets to 10%, 20%, and 50% of their original sizes. Our experiments showed that even under these data-scarce conditions, ADTL continued to perform significantly better than randomly initialized models. This demonstrates the potential of ADTL to address the data scarcity problem in transfer learning and suggests that it is a promising approach for real-world applications. Overall, our study provides important insights into the use of the TAS as a distance measure for task similarity in transfer learning and highlights the potential of ADTL as an effective transfer learning approach. Future research in this area could further refine the use of the TAS in transfer learning and explore additional approaches to leverage task similarity to improve knowledge transfer.

Author Contributions

Conceptualization, A.R.; methodology, A.R.; software, A.R.; validation, A.R.; formal analysis, A.R.; investigation, A.R.; data curation, A.R.; writing—original draft preparation, A.R.; writing—review and editing, M.A. and N.B.; supervision, M.A. and N.B.; project administration, M.A. and N.B.; funding acquisition, M.A. and N.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) under grant number 6656-2017.

Data Availability Statement

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yu, Z.; Niu, Z.; Tang, W.; Wu, Q. Deep learning for daily peak load forecasting—A novel gated recurrent neural network combining dynamic time warping. IEEE Access 2019, 7, 17184–17194. [Google Scholar] [CrossRef]
  2. Barkous, H.; Amayri, M.; Bouguila, N. A Comprehensive Analysis of a Hybrid Deep Learning Model for Midterm Electric Load Forecasting. In Proceedings of the 2023 IEEE International Conference on High Performance Computing and Communications, Data Science and Systems, Smart City and Dependability in Sensor, Cloud and Big Data Systems and Application (HPCC/DSS/SmartCity/DependSys), Melbourne, Australia, 17–21 December 2023; pp. 795–800. [Google Scholar]
  3. Bouzid, M.; Amayri, M.; Bouguila, N. Addressing Load Forecasting Challenges in Industrial Environments Using Time Series Deep Models. In Proceedings of the 2023 6th International Conference on Computational Intelligence and Intelligent Systems, CIIS 2023, Tokyo, Japan, 25–27 November 2023; ACM: New York, NY, USA, 2023; pp. 52–58. [Google Scholar]
  4. Pappas, S.; Ekonomou, L.; Karamousantas, D.; Chatzarakis, G.; Katsikas, S.; Liatsis, P. Electricity demand loads modeling using AutoRegressive Moving Average (ARMA) models. Energy 2008, 33, 1353–1360. [Google Scholar] [CrossRef]
  5. Christiaanse, W. Short-term load forecasting using general exponential smoothing. IEEE Trans. Power Appar. Syst. 1971, PAS-90, 900–911. [Google Scholar] [CrossRef]
  6. Hong, T.; Pinson, P.; Wang, Y.; Weron, R.; Yang, D.; Zareipour, H. Energy forecasting: A review and outlook. IEEE Open Access J. Power Energy 2020, 7, 376–388. [Google Scholar] [CrossRef]
  7. Hong, W.C. Electric load forecasting by support vector model. Appl. Math. Model. 2009, 33, 2444–2454. [Google Scholar] [CrossRef]
  8. Ranaweera, D.; Hubele, N.; Karady, G. Fuzzy logic for short term load forecasting. Int. J. Electr. Power Energy Syst. 1996, 18, 215–222. [Google Scholar] [CrossRef]
  9. Park, D.C.; El-Sharkawi, M.; Marks, R.; Atlas, L.; Damborg, M. Electric load forecasting using an artificial neural network. IEEE Trans. Power Syst. 1991, 6, 442–449. [Google Scholar] [CrossRef]
  10. Chen, H.; Canizares, C.A.; Singh, A. ANN-based short-term load forecasting in electricity markets. In Proceedings of the 2001 IEEE Power Engineering Society Winter Meeting. Conference Proceedings (Cat. No. 01CH37194), Columbus, OH, USA, 28 January–1 February 2001; Volume 2, pp. 411–415. [Google Scholar]
  11. Lu, C.N.; Wu, H.T.; Vemuri, S. Neural network based short term load forecasting. IEEE Trans. Power Syst. 1993, 8, 336–342. [Google Scholar] [CrossRef]
  12. Xia, C.; Wang, J.; McMenemy, K. Short, medium and long term load forecasting model and virtual load forecaster based on radial basis function neural networks. Int. J. Electr. Power Energy Syst. 2010, 32, 743–750. [Google Scholar] [CrossRef]
  13. Lv, L.; Wu, Z.; Zhang, J.; Zhang, L.; Tan, Z.; Tian, Z. A VMD and LSTM based hybrid model of load forecasting for power grid security. IEEE Trans. Ind. Inform. 2021, 18, 6474–6482. [Google Scholar] [CrossRef]
  14. Alhussein, M.; Aurangzeb, K.; Haider, S.I. Hybrid CNN-LSTM model for short-term individual household load forecasting. IEEE Access 2020, 8, 180544–180557. [Google Scholar] [CrossRef]
  15. Liao, W.; Yang, Z.; Chen, X.; Li, Y. WindGMMN: Scenario Forecasting for Wind Power Using Generative Moment Matching Networks. IEEE Trans. Artif. Intell. 2021, 3, 843–850. [Google Scholar] [CrossRef]
  16. Mansouri, V.; Akbari, M.E. Efficient Short-Term Electricity Load Forecasting Using Recurrent Neural Networks. J. Artif. Intell. Electr. Eng. 2014, 3, 46–53. [Google Scholar]
  17. Rebei, A.; Amayri, M.; Bouguila, N. FSNet: A Hybrid Model for Seasonal Forecasting. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 1167–1180. [Google Scholar] [CrossRef]
  18. Song, Y.; Tang, D.; Yu, J.; Yu, Z.; Li, X. Short-Term Forecasting Based on Graph Convolution Networks and Multiresolution Convolution Neural Networks for Wind Power. IEEE Trans. Ind. Inform. 2022, 19, 1691–1702. [Google Scholar] [CrossRef]
  19. Jiang, R.; Zeng, S.; Song, Q.; Wu, Z. Deep-Chain Echo State Network with Explainable Temporal Dependence for Complex Building Energy Prediction. IEEE Trans. Ind. Inform. 2022, 19, 426–435. [Google Scholar] [CrossRef]
  20. Niu, S.; Liu, Y.; Wang, J.; Song, H. A decade survey of transfer learning (2010–2020). IEEE Trans. Artif. Intell. 2020, 1, 151–166. [Google Scholar] [CrossRef]
  21. Weiss, K.; Khoshgoftaar, T.M.; Wang, D. A survey of transfer learning. J. Big Data 2016, 3, 9. [Google Scholar] [CrossRef]
  22. Gopalakrishnan, K.; Khaitan, S.K.; Choudhary, A.; Agrawal, A. Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection. Constr. Build. Mater. 2017, 157, 322–330. [Google Scholar] [CrossRef]
  23. Zoph, B.; Yuret, D.; May, J.; Knight, K. Transfer learning for low-resource neural machine translation. arXiv 2016, arXiv:1604.02201. [Google Scholar]
  24. Hua, J.; Zeng, L.; Li, G.; Ju, Z. Learning for a robot: Deep reinforcement learning, imitation learning, transfer learning. Sensors 2021, 21, 1278. [Google Scholar] [CrossRef] [PubMed]
  25. Gao, Y.; Ruan, Y.; Fang, C.; Yin, S. Deep learning and transfer learning models of energy consumption forecasting for a building with poor information data. Energy Build. 2020, 223, 110156. [Google Scholar] [CrossRef]
  26. Li, A.; Xiao, F.; Fan, C.; Hu, M. Development of an ANN-based building energy model for information-poor buildings using transfer learning. In Proceedings of the Building Simulation; Springer: Berlin/Heidelberg, Germany, 2021; Volume 14, pp. 89–101. [Google Scholar]
  27. Fang, X.; Gong, G.; Li, G.; Chun, L.; Li, W.; Peng, P. A hybrid deep transfer learning strategy for short term cross-building energy prediction. Energy 2021, 215, 119208. [Google Scholar] [CrossRef]
  28. Le, T.; Vo, M.T.; Kieu, T.; Hwang, E.; Rho, S.; Baik, S.W. Multiple electric energy consumption forecasting using a cluster-based strategy for transfer learning in smart building. Sensors 2020, 20, 2668. [Google Scholar] [CrossRef] [PubMed]
  29. Zhou, D.; Ma, S.; Hao, J.; Han, D.; Huang, D.; Yan, S.; Li, T. An electricity load forecasting model for Integrated Energy System based on BiGAN and transfer learning. Energy Rep. 2020, 6, 3446–3461. [Google Scholar] [CrossRef]
  30. Peng, C.; Tao, Y.; Chen, Z.; Zhang, Y.; Sun, X. Multi-source transfer learning guided ensemble LSTM for building multi-load forecasting. Expert Syst. Appl. 2022, 202, 117194. [Google Scholar] [CrossRef]
  31. Cai, L.; Gu, J.; Jin, Z. Two-layer transfer-learning-based architecture for short-term load forecasting. IEEE Trans. Ind. Inform. 2019, 16, 1722–1732. [Google Scholar] [CrossRef]
  32. Hooshmand, A.; Sharma, R. Energy predictive models with limited data using transfer learning. In Proceedings of the Tenth ACM International Conference on Future Energy Systems, Phoenix, AZ, USA, 25–28 June 2019; pp. 12–16. [Google Scholar]
  33. Wu, D.; Lin, W. Efficient residential electric load forecasting via transfer learning and graph neural networks. IEEE Trans. Smart Grid 2022, 14, 2423–2431. [Google Scholar] [CrossRef]
  34. Le, C.P.; Dong, J.; Soltani, M.; Tarokh, V. Task Affinity with Maximum Bipartite Matching in Few-Shot Learning. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022. [Google Scholar]
  35. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  36. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  37. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  38. Gregor, K.; Danihelka, I.; Graves, A.; Rezende, D.; Wierstra, D. Draw: A recurrent neural network for image generation. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1462–1471. [Google Scholar]
  39. Graves, A.; Mohamed, A.R.; Hinton, G. Speech recognition with deep recurrent neural networks. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 6645–6649. [Google Scholar]
  40. Mikolov, T.; Karafiát, M.; Burget, L.; Cernockỳ, J.; Khudanpur, S. Recurrent neural network based language model. In Proceedings of the Interspeech. Makuhari, Chiba, Japan, 26–30 September 2010; Volume 2, pp. 1045–1048. [Google Scholar]
  41. Taieb, S.B.; Atiya, A.F. A bias and variance analysis for multistep-ahead time series forecasting. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 62–76. [Google Scholar] [CrossRef] [PubMed]
  42. Australian Energy Market Operator (AEMO). Aggregated Price and Demand Data. Available online: https://aemo.com.au/energy-systems/electricity/national-electricity-market-nem/data-nem/aggregated-data (accessed on 16 August 2024).
  43. Australian Government Department of Climate Change, Energy, the Environment and Water. Smart Grid, Smart City. 2014. Available online: https://data.gov.au/dataset/ds-dga-4e21dea3-9b87-4610-94c7-15a8a77907ef/details?q=smart%20grid%20smart%20city (accessed on 6 February 2023).
  44. Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1126–1135. [Google Scholar]
  45. He, Y.; Luo, F.; Ranzi, G. Transferrable Model-Agnostic Meta-learning for Short-Term Household Load Forecasting With Limited Training Data. IEEE Trans. Power Syst. 2022, 37, 3177–3180. [Google Scholar] [CrossRef]
Figure 1. Synthetic datasets 2, 6, and 10.
Figure 2. Task affinity scores between the different data and tasks 0 and 5.
Figure 3. Task affinity score between the different datasets on tasks 0 and 5.
Figure 4. Affinity-driven transfer learning diagram.
Figure 5. A long short-term memory cell.
Figure 6. Average performance on different query sets.
Figure 7. Forecasting samples at different learning levels.
Table 1. Number of epochs an LSTM (a fully connected ANN) needs to match the MAPE performance of a one-epoch pre-trained model of the nearest dataset and the second nearest dataset.

Source      Nearest Dataset Performance    Second Nearest Dataset Performance
            LSTM    FCN                    LSTM    FCN
dataset 0   7       3                      7       5
dataset 1   7       3                      7       5
dataset 2   7       3                      8       4
dataset 3   8       6                      5       5
dataset 4   5       4                      5       3
dataset 5   6       3                      6       5
dataset 6   8       3                      7       4
dataset 7   8       3                      7       4
dataset 8   5       5                      7       6
dataset 9   7       4                      7       6
Table 2. Dataset locations and years used in experiment 1.

Dataset Name    Location           Year
NSW03           New South Wales    2003
NSW18           New South Wales    2018
QLD03           Queensland         2003
QLD18           Queensland         2018
SA03            South Australia    2003
SA18            South Australia    2018
TAS07           Tasmania           2007
TAS18           Tasmania           2018
VIC04           Victoria           2004
VIC18           Victoria           2018
Table 3. Correlations between the TAS distance and the MAPE on the query datasets.

Dataset Name    Correlation Value (%)
NSW03           89.10
NSW18           88.45
QLD03           86.89
QLD18           86.82
SA03            87.94
Table 4. MAPE performance of the selected pre-trained neural network in comparison with a random neural network.

Source             Nearest Task          Second Nearest Task    Random FCN
Number of Epochs   0     1     5         0     1     5          0     1     5
NSW03              0.33  0.27  0.14      0.55  0.49  0.41       0.98  0.92  0.72
NSW18              0.23  0.22  0.15      0.54  0.47  0.32       1.18  1.08  0.85
QLD03              0.37  0.29  0.24      0.63  0.58  0.37       1.78  1.40  1.22
QLD18              0.26  0.21  0.20      0.62  0.49  0.46       1.33  1.29  0.94
SA03               0.29  0.20  0.18      0.50  0.48  0.43       1.54  1.49  1.02
Table 5. Performance of the selected pre-trained neural network and a random neural network on downsampled datasets.

         Nearest Task Neural Network       Average of Random Neural
         Performance (MAPE)                Networks Performance (MAPE)
Source   10%     20%     50%               10%     20%     50%
NSW03    0.28    0.28    0.27              0.65    0.61    0.58
NSW18    0.34    0.33    0.24              0.52    0.52    0.48
QLD03    0.51    0.46    0.40              0.44    0.43    0.41
QLD18    0.37    0.34    0.27              0.69    0.63    0.59
SA03     0.48    0.47    0.29              0.68    0.57    0.53
Table 6. Performance of the selected pre-trained neural network in comparison with a random neural network.

                                       1-Day Data       3-Day Data       7-Day Data
                                       MAPE    RMSE     MAPE    RMSE     MAPE    RMSE
Random LSTM                            2.25    0.340    1.98    0.313    2.25    0.349
Random LSTM (trained for 5 epochs)     1.75    0.211    1.72    0.237    2.01    0.251
Nearest LSTM                           0.81    0.201    0.77    0.199    0.85    0.208
Nearest LSTM (trained for 5 epochs)    0.73    0.195    0.72    0.195    0.75    0.198
Table 7. Performance of the selected pre-trained neural network and a random neural network on downsampled datasets.

                                       10% Downsampling    20% Downsampling    50% Downsampling
                                       MAPE    RMSE        MAPE    RMSE        MAPE    RMSE
Random LSTM                            3.73    0.537       3.35    0.552       3.64    0.532
Random LSTM (trained for 5 epochs)     3.58    0.530       3.11    0.415       3.51    0.304
Nearest LSTM                           0.92    0.238       0.83    0.225       0.79    0.182
Nearest LSTM (trained for 5 epochs)    0.82    0.199       0.80    0.202       0.75    0.197
