Article

Enhanced Sequence-to-Sequence Deep Transfer Learning for Day-Ahead Electricity Load Forecasting

by
Vasileios Laitsos
1,
Georgios Vontzos
1,
Apostolos Tsiovoulos
1,
Dimitrios Bargiotas
1,* and
Lefteri H. Tsoukalas
2
1
Department of Electrical and Computer Engineering, University of Thessaly, 383 34 Volos, Greece
2
Center for Intelligent Energy Systems (CiENS), School of Nuclear Engineering, Purdue University, West Lafayette, IN 47906, USA
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(10), 1996; https://doi.org/10.3390/electronics13101996
Submission received: 17 April 2024 / Revised: 16 May 2024 / Accepted: 17 May 2024 / Published: 20 May 2024

Abstract
Electricity load forecasting is a crucial undertaking in all deregulated markets globally. Among the research challenges on a global scale, the investigation of deep transfer learning (DTL) in the field of electricity load forecasting represents a fundamental effort that can inform artificial intelligence applications in general. In this paper, a comprehensive study of day-ahead electricity load forecasting is reported. For this purpose, three sequence-to-sequence (Seq2seq) deep learning (DL) models are used, namely the multilayer perceptron (MLP), the convolutional neural network (CNN), and the ensemble learning model (ELM), which consists of the weighted combination of the outputs of the MLP and CNN models. The study also focuses on the development of different forecasting strategies based on DTL, emphasizing the way the datasets are trained and fine-tuned for higher forecasting accuracy. To implement the forecasting strategies using the deep learning models, load datasets from three Greek islands, Rhodes, Lesvos, and Chios, are used. The main purpose is to apply DTL for day-ahead predictions (1–24 h) for each month of the year for the Chios dataset after training and fine-tuning the models using the datasets of the three islands in various combinations. Four DTL strategies are illustrated. In the first strategy (DTL Case 1), each of the three DL models is trained using only the Lesvos dataset, while fine-tuning is performed on the dataset of Chios island, in order to create day-ahead predictions for the Chios load. In the second strategy (DTL Case 2), data from both Lesvos and Rhodes are used concurrently during the DL model training period, and fine-tuning is performed on the data from Chios. The third strategy (DTL Case 3) involves training the DL models using the Lesvos dataset, with testing performed directly on the Chios dataset without fine-tuning. The fourth strategy is a multi-task deep learning (MTDL) approach, which has been extensively studied in recent years. In MTDL, the three DL models are trained simultaneously on all three datasets and the final predictions are made on the unknown part of the Chios dataset. The results obtained demonstrate that DTL can be applied with high efficiency for day-ahead load forecasting. Specifically, DTL Cases 1 and 2 outperformed MTDL in terms of load prediction accuracy. Regarding the DL models, all three exhibit very high prediction accuracy, especially in the two cases with fine-tuning, and the ELM outperforms the single models. More specifically, for the first two cases, respectively, the MLP model achieves its best monthly forecasts with MAPE values of 6.24% and 6.01%, the CNN model with 5.57% and 5.60%, and the ELM model with 5.29% and 5.31%, indicating the very high accuracy that can be achieved.

1. Introduction

Research interest in the electricity sector has been growing for several key reasons. First, there is a global shift toward using sustainable and renewable energy sources like solar and wind. This shift has led researchers to find ways to better incorporate these technologies into existing power grids. Second, the increasing need for electricity due to population growth and industrialization requires new and efficient solutions for the transmission, distribution, and consumption of electricity. Third, the development of smart grids and improvements in energy storage technologies have opened up opportunities to improve the resilience, reliability, and responsiveness of power grids. Additionally, concerns about the environment and climate change have motivated researchers to explore cleaner and more eco-friendly energy options. The digitization of the electricity sector, thanks to advances in data analytics, machine learning, and Internet of Things (IoT) technologies, has also spurred research to create smarter and more efficient energy systems. In general, these factors have created a dynamic landscape, encouraging researchers to explore new approaches and technologies to tackle the changing challenges and opportunities in the electricity sector.
Deep transfer learning (DTL) refers to the use of deep neural networks (DNNs) in the domain of transfer learning. DTL utilizes the knowledge learned from one task to improve the performance of another related task. Usually, in DTL, a pre-trained DNN model is fine-tuned for a different task. This approach improves data efficiency, reduces training time, and allows models to generalize well, capturing the underlying patterns in electricity consumption. The ability to adapt to dynamic conditions and the improved accuracy stemming from pre-trained models make transfer learning a valuable tool for addressing the challenges of forecasting in the energy sector. With limited and heterogeneous datasets, transfer learning enables models trained in one domain to be adapted to another, addressing the challenges of varying temporal scales and spatial characteristics. This approach proves essential for optimizing electricity forecasting models, particularly in the face of emerging technologies, changing infrastructures, and the need for resource-efficient solutions. By reusing pre-trained models and enhancing adaptability, transfer learning contributes significantly to robust and accurate predictions, ultimately supporting more effective energy management in the dynamically evolving landscape of the electricity sector.
More generally, in recent years, transfer learning has attracted increasing scientific interest. In this regard, an extensive literature review follows, aiming to highlight the most relevant papers that have addressed this specific field. Meng et al. in [1] propose a transfer learning-based method for abnormal electricity consumption detection, where a pre-trained model is fine-tuned using a small amount of data from the target domain. Antoniadis et al. in [2] discuss the use of transfer learning techniques for electricity load forecasting, specifically in the context of leveraging information from finer scales to improve forecasts at wider scales. Yang et al. in [3] discuss the implementation of a transfer learning strategy to address the multi-parameter coupling problem in the design of water-flow-enabled generators. Dong et al. in [4] propose a transfer learning model based on the Xception neural network for electrical load prediction, which is trained using pre-trained models and fine-tuned in the training process. Li et al. in [5] explore a transfer learning scheme for non-intrusive load monitoring (NILM) in smart buildings, which involves transferring a well-trained model to estimate power consumption in another dataset for all appliances. Peirelinck et al. in [6] discuss the use of transfer learning techniques in the context of demand response in the electricity sector; their study shows that transfer learning can improve performance by over 30% in various tasks. Laitsos et al. in [7] use a transfer learning technique with several deep learning models to predict the energy consumption of Greek islands, and the pre-trained model demonstrates outstanding flexibility when adapting to a new and unknown dataset. Wu et al. in [8] propose an attentive transfer framework for efficient residential electric load forecasting using transfer learning and graph neural networks. Kamalov et al. in [9] introduce an NBEATS model in order to test its effectiveness in medium-term electricity forecasting for zero-shot transfer learning. Syed et al. in [10] propose a reliable inductive transfer learning (ITL) method for load forecasting in electrical networks, which uses knowledge from existing deep learning models to develop accurate ITL models at other distribution nodes. Laitsos et al. in [11] propose an automated deep learning application for electricity load forecasting. Santos et al. in [12] propose a novel methodology that combines transfer learning and deep learning techniques to enhance short-term load forecasting for buildings with limited electricity data. Arvanitidis et al. in [13] propose clustering MLP models for short-term load forecasting. Luo et al. in [14] discuss the use of transfer learning techniques for load, solar, and wind power predictions, but their study does not specifically address the application of transfer learning to load prediction. Li et al. in [15] discuss a short-term load forecasting framework that adopts transfer learning, where transfer learning is used to train learnable parameters based on trend components and then to transfer them to the load forecasting model. Chan et al. in [16] introduce a hybridized modeling approach, using a convolutional neural network (CNN) and a support vector machine (SVM), for short-term load forecasting. Gontijo et al. in [17] examine the hourly power generation data in Brazil from 2018 to 2020, categorized based on the different electrical subsystems and their corresponding energy sources; the aim was to assess the precision of key methods for combining and splitting forecasts generated by the autoregressive integrated moving average (ARIMA) and the error, trend, seasonal (ETS) models.
Jung et al. in [18] propose a monthly electricity load forecasting framework for smart cities, using transfer learning techniques. Collecting data from multiple districts, they selected similar data based on correlation coefficients, and fine-tuned the model using target data. Al-Hajj et al. in [19] report a survey of transfer learning in renewable energy systems, specifically in the prediction of solar and wind power, the prediction of load, and the diagnosis of faults. Nivarthi et al. in [20] discuss the use of transfer learning in renewable energy systems, specifically in power forecasting and anomaly detection. The authors propose a transfer learning framework and a feature embedding approach to handle missing sensor data. Miraftabzadeh et al. in [21] present a framework based on transfer learning and deep neural networks for the prediction of day-ahead photovoltaic power. Dakovic et al. in [22] report an extensive review of machine learning applications aimed at addressing energy-related issues through the examination of various energy types and opportunities for energy reduction. Vontzos et al. in [23] propose a data-driven short-term forecasting method for electricity consumption in airports. Yang et al. in [24] propose an innovative monthly DNN approach for load forecasting in urban and regional areas. In order to draw more secure conclusions, an extended comparison with other machine learning models was performed. Li et al. in [25] propose a building electricity load forecasting method based on the maximum mean discrepancy (MMD) and an improved TrAdaBoost algorithm (iTrAdaBoost). Gontijo et al. in [26] introduce a dynamic time scan forecasting (DTSF) technique as a novel approach to predict hourly energy prices in Brazil. By identifying similarity patterns in time series data, DTSF demonstrated competitive advantages over traditional forecasting methods presented in prior research.
The relentless growth of electricity demand, coupled with the dynamic and often unpredictable nature of energy consumption patterns, requires advanced forecasting methods for effective grid management. In contrast to the aforementioned studies, which investigate transfer learning over horizons of only a few hours, the forecasting strategies developed in this paper target day-ahead prediction, thereby enhancing the effectiveness of the models. Additionally, none of the aforementioned studies explore ensemble deep learning, as is done in this study. This deep learning domain includes models that have attracted significant scientific interest due to their high performance, as well as the potential they offer for further investigation and improvement. In this context, transfer learning emerges as a promising paradigm to address the challenges associated with limited and disparate data sources. This research paper delves into the application of deep transfer learning techniques in the domain of electricity forecasting, with the aim of exploiting the knowledge gained from a source domain to improve predictive accuracy in a target domain. By leveraging preexisting models trained on related datasets or domains, transfer learning seeks to enhance the adaptability and robustness of forecasting models, ultimately contributing to more accurate and reliable predictions in the complex and ever-evolving landscape of electricity demand. This paper explores the theoretical foundations, methodologies, and practical implications of transfer learning in the specific context of electricity forecasting, shedding light on its potential to revolutionize the field and pave the way for more resilient and efficient energy management systems.
With respect to the contributions of this paper, the following points are emphasized:
  • For the first time, a sequence-to-sequence (Seq2seq) ensemble deep transfer learning implementation achieving high-accuracy day-ahead (1–24 h) forecasts is conducted on three distinct datasets from islands of the Greek power system. Although the Rhodes training dataset exhibits somewhat different behavior compared to the other two datasets, the proposed algorithms provide very satisfactory results, which further strengthens the proposed strategies and models. The characteristics of the Rhodes dataset lead to a more robust and comprehensive evaluation of the models, as it introduces variability and challenges that may not be present in the other datasets. This diversity in behavior across datasets provides a more realistic and thorough assessment of the models’ capabilities.
  • The results obtained indicate that deep transfer learning (DTL) could be of particular value to both transmission system operators (TSO) and distribution system operators (DSO) within various regions of the Greek system.
  • The application of the models is performed on actual load data with minimal data preprocessing, a fact that leads to optimistic conclusions regarding their applicability under real-time conditions.
This paper is organized as follows: First, in Section 2, the exploratory dataset analysis and feature creation are presented. Then, in Section 3, the forecasting strategies are analyzed. In Section 4, the results for each algorithm are presented, along with a discussion of their performance. Finally, in Section 5, the main conclusions are drawn and future study proposals are presented.

2. Materials and Methods

2.1. Dataset Analysis

In this section, all the features, behaviors, and correlations of the three datasets used are investigated and analyzed. Initially, the three time series under study are presented, and then their monthly and daily average fluctuations are considered.
Figure 1 illustrates the three power time series fluctuations for each of the three datasets at an hourly resolution. What is noteworthy is that Rhodes, especially during the summer months, experiences a substantial increase in demand. The other two islands exhibit several similarities between them, with both the average and extreme values behaving relatively similarly. This consistency in behavior across the two datasets suggests common characteristics or patterns in the energy-related dynamics of these islands. The shared trends in both average and extreme values contribute to a more coherent and comparable analysis between the two datasets, helping to develop and evaluate models for these specific island environments.
Figure 2 presents the monthly boxplots for each of the three datasets in hourly resolution.
In addition, Figure 3 visualizes the average daily electricity consumption (1–24 h) for each island.
This figure clearly illustrates that the average hourly values for Rhodes are significantly higher than those of the other two islands. However, it should be noted that the three patterns exhibit strong similarities among them, a fact demonstrated by the common peak and off-peak demand hours among the three islands.

2.2. Data Preprocessing

In order to shape the raw data into a suitable format capable of being used as input in deep learning models, the preprocessing involved the following four stages:
  • Anomaly Detection: Anomaly detection in time series involves establishing a baseline of normal behavior through statistical methods or machine learning algorithms, extracting relevant features, and training a model on labeled data to distinguish normal patterns from anomalies. Due to instances of zero consumption during specific hours, probably caused by network faults, these particular values were set equal to the corresponding values from one week prior. This adjustment was made to address the challenge of unexpected situations in the data, as algorithms may struggle to account for such anomalies. The goal is to ensure optimal training for each model by handling these irregularities in the dataset.
  • Filling Missing Values: Filling missing values in time series data is a crucial preprocessing step for anomaly detection. Since anomalies are often identified based on patterns and trends in the data, it is essential to address gaps caused by missing values.
  • Min-Max Scaling: The preprocessing method applied to all datasets in this paper involves min-max scaling, which normalizes the data points to a range between 0 and 1. To achieve this, two distinct scalers were employed, one for the input and another for the output dataset. The primary rationale behind utilizing min-max scaling is its ability to enhance the efficiency of training deep learning models during the training phase, facilitating faster convergence to the optimal solution of the loss function.
  • Cyclical Encoding: With this process, numerical calendar features are transformed into cyclical features through trigonometric functions. In this study, the day of the week, hour of the day, and month of the year were converted to sine and cosine representations. Figure 4 shows the day of the week transformed into sine and cosine format. The periodicity introduced by this encoding helps the models to better capture the patterns present in each time series studied; a brief implementation sketch follows this list.
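As an illustration of the scaling and encoding steps above, the following minimal sketch (in Python, using pandas and scikit-learn) applies min-max scaling with separate input and output scalers and converts the calendar features to sine/cosine form. The column names and the synthetic load values are illustrative assumptions, not the actual schema of the island datasets.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical hourly load frame indexed by timestamp (names are illustrative).
idx = pd.date_range("2019-01-01 01:00", periods=24 * 7, freq="H")
df = pd.DataFrame({"load": np.random.uniform(20, 80, len(idx))}, index=idx)

def encode_cyclical(values: pd.Series, period: int) -> pd.DataFrame:
    """Map a calendar feature onto the unit circle so that, e.g.,
    hour 23 and hour 0 end up close together."""
    angle = 2 * np.pi * values / period
    return pd.DataFrame({f"{values.name}_sin": np.sin(angle),
                         f"{values.name}_cos": np.cos(angle)})

df = df.join(encode_cyclical(pd.Series(idx.hour, index=idx, name="hour"), 24))
df = df.join(encode_cyclical(pd.Series(idx.dayofweek, index=idx, name="dow"), 7))
df = df.join(encode_cyclical(pd.Series(idx.month, index=idx, name="month"), 12))
df["is_weekend"] = (idx.dayofweek >= 5).astype(int)  # holidays would need a calendar lookup

# Two separate min-max scalers, one for the inputs and one for the target,
# so forecasts can later be inverted back to MW independently.
x_scaler, y_scaler = MinMaxScaler(), MinMaxScaler()
X = x_scaler.fit_transform(df.values)
y = y_scaler.fit_transform(df[["load"]].values)
```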

2.3. Feature Creation

In this subsection, the generation of the parameters used as inputs for the models is described. For this purpose, several input characteristics were studied and evaluated in order to identify those most significant for predicting electricity demand. After several trials, the combination of features providing the most accurate prediction results was determined. Thus, eight input characteristics were selected to forecast the day-ahead electricity demand. More specifically, variables 2–8 involve the cyclical encoding methodology introduced earlier, which helps the DL models to better understand and adapt to time series exhibiting seasonal patterns, such as those studied in this paper. The input variables, which remain consistent across all the deep learning models, are presented in detail below.
  • Power in hourly resolution: the sequence of 168 hourly load values covering the preceding 7 days (one week).
  • Cos of Day of Week: the sequence of 168 values of the day of the week (0–6), converted to cosine form by cyclical encoding.
  • Sin of Day of Week: the sequence of 168 values of the day of the week (0–6), converted to sine form by cyclical encoding.
  • Cos of Hour of Day: the sequence of 168 values of the hour of the day (0–23), converted to cosine form by cyclical encoding.
  • Sin of Hour of Day: the sequence of 168 values of the hour of the day (0–23), converted to sine form by cyclical encoding.
  • Cos of Month of Year: the sequence of 168 values of the month of the year (1–12), converted to cosine form by cyclical encoding.
  • Sin of Month of Year: the sequence of 168 values of the month of the year (1–12), converted to sine form by cyclical encoding.
  • IsWeekend: the sequence of 168 values of a dummy variable named “Is Weekend”, equal to 0 for working days and 1 for weekends and holidays.
Our target was to utilize a historical sequence of 168 h of the aforementioned eight features and to create day-ahead predictions for the load, i.e., a sequence of 24 power values. Figure 5 visualizes the Seq2seq prediction technique.
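A minimal sketch of how such 168-hour-input/24-hour-output Seq2seq samples can be cut from an hourly series is given below; here `X` and `y` stand for the scaled feature matrix and load vector of the preprocessing step, filled with random placeholders.

```python
import numpy as np

def make_seq2seq_samples(X: np.ndarray, y: np.ndarray,
                         history: int = 168, horizon: int = 24):
    """Pair each 168-hour input window with the following 24-hour load block."""
    inputs, targets = [], []
    for start in range(len(X) - history - horizon + 1):
        inputs.append(X[start:start + history])                           # (168, n_features)
        targets.append(y[start + history:start + history + horizon, 0])  # (24,)
    return np.stack(inputs), np.stack(targets)

# Placeholder data: two years of hourly values with eight features per hour.
X = np.random.rand(17520, 8)
y = np.random.rand(17520, 1)
X_seq, y_seq = make_seq2seq_samples(X, y)
print(X_seq.shape, y_seq.shape)  # (17329, 168, 8) (17329, 24)
```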
Finally, a correlation heatmap is presented in Figure 6 in order to highlight the relationships between the power time series for each of the three islands.
What is noteworthy is that the target time series of Chios shows a correlation of 0.37 with that of Rhodes and 0.94 with that of Lesvos. This particular characteristic further enhances the reliability of the implemented applications, demonstrating generality and robustness through the results that will be presented below. The correlation between the target time series of different islands suggests some level of interdependence or shared patterns, which can contribute to the generalizability and effectiveness of the models developed for these islands.
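Such pairwise correlations can be computed and visualized with a few lines of pandas and Seaborn, as in the hedged sketch below; the island columns are synthetic stand-ins for the actual load series.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
base = rng.random(8760)
# Hypothetical hourly load columns, one per island (synthetic stand-ins).
loads = pd.DataFrame({
    "Rhodes": base + 2.0 * rng.random(8760),  # weakly related series
    "Lesvos": base + 0.2 * rng.random(8760),  # strongly related series
    "Chios": base + 0.2 * rng.random(8760),
})
sns.heatmap(loads.corr(), annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Pearson correlation of island load series")
plt.show()
```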

3. Methodology

In this section, the fundamental methodology followed in the paper is presented. Initially, an introduction to the three DL models is presented. Then, the forecasting strategies are analyzed in depth, emphasizing all the details of each, highlighting the way datasets are utilized for each strategy. Subsequently, reference is made to the optimization framework which was used for enhancing the performance of the DL models. Additionally, the evaluation metrics which were used to evaluate and compare the performance of the algorithms of each strategy are presented. Finally, the software tools that were used are analyzed.

3.1. Deep Learning Models

In this subsection, the functionality and architectures of the three DL models used in the paper, MLP, CNN, and ELM, are analyzed. These specific models were chosen both because they are powerful, advanced deep learning architectures for time series forecasting and because they attract greater scientific interest than simpler machine learning algorithms.

3.1.1. Multilayer Perceptron

A multilayer perceptron (MLP), as presented in Figure 7, is a versatile artificial neural network architecture employed for learning and modeling complex relationships within data. Comprising an input layer ($X$), one or more hidden layers ($H_i$), and an output layer ($Y$), an MLP is characterized by its capacity to capture intricate non-linear patterns. During forward propagation, the input data are transformed through weighted connections and activation functions ($\sigma$) in the hidden layers, generating progressively more abstract representations. The hidden layer outputs ($H_i$) can be mathematically expressed as:
$H_i = \sigma(W_i H_{i-1} + b_i),$
where $W_i$ denotes the weight matrix connecting layer $i-1$ to layer $i$, $H_{i-1}$ is the output of the previous layer, and $b_i$ represents the bias term for layer $i$. The activation function introduces non-linearity, enabling the network to capture complex mappings.
The final output ($Y$) is computed through similar operations in the output layer:
$Y = \sigma(W_{\mathrm{out}} H_{\mathrm{last}} + b_{\mathrm{out}})$
During training, the network adjusts its weights to minimize a defined loss function ($L$), which quantifies the disparity between the predicted output and the actual target values. The weights are updated using an optimization algorithm, typically gradient descent, with the weight update rule expressed as:
$W_{\mathrm{new}} = W_{\mathrm{old}} - \eta \frac{\partial L}{\partial W},$
where $\eta$ is the learning rate. This iterative process, known as backpropagation, refines the model’s weights to improve its predictive accuracy.
In the domain of time series forecasting, MLPs exhibit efficiency owing to their inherent ability to capture temporal dependencies and non-linear patterns. The adaptability of the model enables it to discern and model various temporal structures, including trends and seasonality. The hidden layers serve as dynamic feature extractors, automatically learning relevant temporal features from the time series data. This feature learning capability, coupled with the tunability of the model parameters, positions MLPs as robust and effective tools for a wide array of time series forecasting tasks.
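As a concrete, hedged illustration, a Seq2seq MLP of this kind can be expressed in Keras as follows; the layer widths and activation choices are placeholders, since in this work the architectures are tuned with Bayesian optimization (Section 3.3).

```python
import tensorflow as tf

def build_mlp(history: int = 168, n_features: int = 8, horizon: int = 24) -> tf.keras.Model:
    """Flatten the 168 x 8 input window and map it to the 24 output hours."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(history, n_features)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),  # hidden layer H1
        tf.keras.layers.Dense(128, activation="relu"),  # hidden layer H2
        tf.keras.layers.Dense(horizon),                 # 24 load values
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

mlp = build_mlp()
mlp.summary()
```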

3.1.2. Convolutional Neural Network

Convolutional neural networks (CNNs), the architecture of which is presented in Figure 8, constitute a class of sophisticated deep learning architectures originally designed for the analysis and processing of visual data. The principal structure of CNNs encompasses multiple layers, notably convolutional layers, pooling layers, and fully connected layers. Convolutional layers play a pivotal role in feature extraction from input images through the application of convolutional operations utilizing trainable filters. These filters identify patterns and features at various spatial scales, enabling the network to discern intricate details within the data. Accompanying pooling layers reduce the spatial dimensions of the feature maps, thereby lowering computational complexity and improving the model’s capacity for generalization. These operations culminate in fully connected layers positioned at the end of the network, where the aggregated features produce the final predictions. The applicability of CNNs extends across diverse computer vision domains, with notable success in tasks such as image classification, object detection, and image segmentation.
In time series forecasting, CNNs adapt to sequential data using 1D convolutional layers. These layers analyze temporal patterns, aided by pooling layers for downsizing. CNNs efficiently capture short- and long-term dependencies, making them valuable for tasks such as stock price prediction and weather forecasting, showcasing their versatility across diverse data types.
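A hedged sketch of a 1D-CNN with the same Seq2seq input/output interface is given below; the filter counts and kernel sizes are illustrative rather than the tuned values.

```python
import tensorflow as tf

def build_cnn(history: int = 168, n_features: int = 8, horizon: int = 24) -> tf.keras.Model:
    """1D convolutions slide along the time axis to extract local temporal patterns."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(history, n_features)),
        tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),
        tf.keras.layers.MaxPooling1D(pool_size=2),      # downsample the feature maps
        tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(horizon),                 # 24 load values
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```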

3.1.3. Ensemble Learning Model

The ensemble learning model (ELM), comprising a multilayer perceptron (MLP) and a convolutional neural network (CNN), operates by independently training both models on a given dataset and then combining their predictions through weighted averaging. The MLP focuses on learning non-linear relationships, while the CNN excels at extracting hierarchical features. The weights assigned to each model in the ensemble are determined based on their performance, enhancing the contribution of the better-performing model. The final prediction is generated by summing the weighted predictions, aiming to capitalize on the complementary strengths of the MLP and CNN for improved predictive accuracy and robustness across diverse data patterns. Figure 9 visualizes the main body of the ELM created in this paper.
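A minimal sketch of such a weighted ensemble follows; since the weights are determined by each model's performance, the inverse-validation-error rule used here is an illustrative assumption rather than the exact weighting scheme of this work.

```python
import numpy as np

def ensemble_predict(mlp, cnn, X, mae_mlp: float, mae_cnn: float) -> np.ndarray:
    """Weighted average of the two models' day-ahead predictions;
    a lower validation MAE yields a larger weight (inverse-error weighting)."""
    w_mlp, w_cnn = 1.0 / mae_mlp, 1.0 / mae_cnn
    return (w_mlp * mlp.predict(X) + w_cnn * cnn.predict(X)) / (w_mlp + w_cnn)

# Usage: ensemble_predict(mlp, cnn, X_test, mae_mlp=6.2, mae_cnn=5.6)
```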

3.2. Deep Transfer Learning Forecasting Strategies

The basic idea behind the four implemented strategies is the extended application and exploration of deep transfer learning across three different datasets, in order to identify the best method and to create powerful DTL forecasting strategies and tools with robust generalization capabilities. The first forecasting strategy, named Deep Transfer Learning Case 1 (DTL Case 1), involves training each of the three DL models exclusively on the Lesvos dataset, with fine-tuning carried out using the Chios dataset. In the second strategy, Deep Transfer Learning Case 2 (DTL Case 2), both the Lesvos and Rhodes datasets are used concurrently during the DL model training phase, followed by fine-tuning using the Chios dataset. The third strategy, Deep Transfer Learning Case 3 (DTL Case 3), entails training the DL models solely on the Lesvos dataset, with the testing phase conducted directly on the Chios dataset, without any fine-tuning. Lastly, in the Multi-task Deep Learning (MTDL) application strategy, each of the three DL models is trained simultaneously on all three datasets, with final predictions made on the unused portion of the Chios dataset.
The way in which the available data were used in order to implement the DTL strategies and the MTDL strategy is described below:
  • For DTL Case 1, only the dataset of Lesvos was used for the first training phase of the models, specifically for the time period from 2019-01-01 01:00:00 to 2021-12-31 23:00:00. Then, for the second phase, i.e., fine-tuning, the time period from 2019-01-01 01:00:00 to 2021-12-31 23:00:00 of the Chios dataset was utilized.
  • For DTL Case 2, the first training phase of the models was based on the datasets from Lesvos and Rhodes, and more specifically, for the time period from 2019-01-01 01:00:00 to 2021-12-31 23:00:00 for each dataset. Then, for the second phase, i.e., fine-tuning, the same time period as in DTL Case 1 was used, from 2019-01-01 01:00:00 to 2021-12-31 23:00:00 of the Chios dataset.
  • For DTL Case 3, the training dataset covers the time period from 2019-01-01 01:00:00 to 2022-12-31 23:00:00 from the Lesvos dataset, without fine-tuning. Lesvos alone was chosen because it exhibits a higher correlation and similarity with the Chios time series compared to Rhodes.
  • For MTDL, the datasets from all three islands were used simultaneously for training. Specifically, the datasets of Rhodes and Lesvos for the time period 2019-01-01 01:00:00 to 2022-12-31 23:00:00, and the Chios dataset for the time period 2019-01-01 01:00:00 to 2021-12-31 23:00:00, were used.
Deep transfer learning is a field of transfer learning that entails utilizing knowledge gained from solving one task to make predictions on another related task, employing deep neural networks. This field often consists of pre-training a neural network on a source task with abundant labeled data and subsequently applying the acquired knowledge to a target task characterized by a foreign dataset. Two prevalent scenarios in transfer learning include domain adaptation, where the source and target tasks have the same input space but differ in output spaces, and feature extraction, where the source and target tasks share similar input and output spaces.
In the realm of time series approaches, deep transfer learning is valuable for several reasons. Time series data often exhibit complex patterns, trends, and seasonality, and acquiring labeled data for training deep models can be challenging due to limited availability. Deep transfer learning allows a model pre-trained on a source time series task to capture generic temporal features and representations that can be beneficial for a target task. The learned features can serve as a useful initialization for the target task, reducing the need for extensive training data and potentially enhancing the model’s ability to generalize to new patterns. Additionally, transfer learning is particularly advantageous when the source and target tasks share similar temporal characteristics, enabling the model to transfer knowledge effectively and improve its performance on the target task. This approach is especially relevant in situations where collecting large amounts of labeled data for every specific task is impractical or costly.
In this paper, DTL strategies with and without fine-tuning are developed. Initially, the two cases of DTL with fine-tuning (DTL Cases 1 and 2) are examined and analyzed below. Subsequently, the scenario where the pre-trained model is used as configured, without fine-tuning, is examined (DTL Case 3). Finally, the MTDL strategy, involving the simultaneous training of each DL model on the datasets of the three islands, is presented. The four proposed forecasting strategies are analyzed in detail below.

3.2.1. Deep Transfer Learning Case 1

In this methodology, a DNN pre-trained on a source task is adjusted to perform a related target task. Initially trained on a large dataset for a general task, such as load forecasting, the pre-trained model captures broad features. This knowledge is then transferred to a target task involving a previously unseen dataset. During fine-tuning, the model’s weights, especially in the deeper layers, are adjusted based on the target task’s data, allowing the model to adapt its learned representations to task-specific characteristics. This approach is particularly advantageous when the target task has limited labeled data, enabling the model to leverage the knowledge gained from the source task and enhance its performance on the target task.
In DTL Case 1, the testing dataset of Chios is used, and the training dataset consists only of the time series of Lesvos island. This approach is followed due to the higher correlation of Chios with Lesvos compared to Rhodes, which guided the selection of the training data for better model performance. After the training period, fine-tuning of the trainable part of each model is performed on the dataset of Chios, creating the final fine-tuned models. Finally, these models are used for Chios day-ahead load forecasting. Figure 10 graphically visualizes the DTL Case 1 strategy.
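In Keras terms, DTL Case 1 amounts to something like the following hedged sketch; which layers are frozen, the number of epochs, and the reduced fine-tuning learning rate are assumptions rather than the tuned settings of this work.

```python
import numpy as np
import tensorflow as tf

# Placeholder windowed arrays (168-hour inputs, 24-hour targets) for two islands.
X_lesvos, y_lesvos = np.random.rand(500, 168, 8), np.random.rand(500, 24)
X_chios, y_chios = np.random.rand(500, 168, 8), np.random.rand(500, 24)

# Phase 1: pre-train on the source island (Lesvos).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(168, 8)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(24),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_lesvos, y_lesvos, epochs=5, verbose=0)
model.save("pretrained_lesvos.h5")               # models are stored in h5 format

# Phase 2: freeze the early layers and fine-tune the rest on the target island (Chios).
tuned = tf.keras.models.load_model("pretrained_lesvos.h5")
for layer in tuned.layers[:-1]:                  # keep the generic feature extractor fixed
    layer.trainable = False
tuned.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")  # smaller LR for fine-tuning
tuned.fit(X_chios, y_chios, epochs=5, verbose=0)

# Day-ahead (24-hour) forecast for one Chios input window.
day_ahead = tuned.predict(X_chios[:1])           # shape: (1, 24)
```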

3.2.2. Deep Transfer Learning Case 2

Similarly to DTL Case 1, in DTL Case 2 the two datasets from Rhodes and Lesvos are merged, and the three models are trained on the combined training dataset. The trainable part of the pre-trained model is then fine-tuned on the dataset of Chios. Finally, predictions are made on the unused dataset from Chios island. Figure 11 visualizes this strategy.

3.2.3. Deep Transfer Learning Case 3

Deep transfer learning without fine-tuning involves a two-step process. Initially, a deep neural network is trained on a source task using a substantial dataset, learning hierarchical features relevant to that task. Subsequently, in the transfer phase, the pre-trained model is utilized with the exact structure formed during the training period in order to create predictions for a target task. The learned features are extracted without further adjusting the model’s weights, and these fixed representations serve as input to a new task-specific regressor trained on the target task’s dataset.
This approach proves advantageous when the target task possesses limited labeled data, as it allows for knowledge transfer from a source task without the computational overhead of fine-tuning the entire model. By utilizing the pre-trained model as a feature extractor, the knowledge encapsulated in the generic representations can be harnessed for tasks that share similar low-level features and structures, promoting effective knowledge transfer across related tasks while mitigating the need for task-specific fine-tuning.
In DTL Case 3 strategy, the three DL models are trained on the dataset of Lesvos, and subsequently, predictions are made directly on the Chios day-ahead load, without a fine-tuning process. This strategy is used in order to draw secure conclusions regarding the approaches with and without fine-tuning. This strategy is presented in Figure 12.

3.2.4. Multi-Task Deep Learning

Multi-task deep learning (MTDL) is a multi-task learning (MTL) methodology for deep neural networks (DNNs), where a DNN model is trained simultaneously on multiple datasets, leveraging the shared knowledge across the different datasets to improve the model’s generalization performance. The underlying principle of MTDL is to exploit the relationships and commonalities among distinct but related tasks, allowing the model to learn a shared representation that captures the inherent structure present in the data. Essentially, a unified dataset is created as a concatenation of the different datasets, where each of them corresponds to a specific output. During training, the model optimizes its parameters by jointly minimizing the loss across all tasks. The shared representation learned across tasks facilitates the transfer of knowledge between them, leading to enhanced generalization performance, particularly in scenarios where individual tasks lack sufficient data for robust learning. The success of MTDL lies in its ability to induce a form of regularization, encouraging the model to discover and focus on relevant features that are beneficial for multiple tasks simultaneously.
Each of the training tasks has its own objective function, and the model learns to jointly optimize these objective functions. The general idea is to share information between tasks to improve overall performance. Mathematically, the following applies:
  • N is the total number of tasks.
  • X are the input data.
  • Y i is the output for task i.
  • θ are the parameters of the neural network model.
For each task i, there is an associated loss function L i ( θ ) that measures the error between the predicted output and the true output for that task. The overall loss function for all tasks can be defined as a combination of the individual task loss functions, often using a weighted sum.
$L(\theta) = \sum_{i=1}^{N} \alpha_i L_i(\theta),$
where $\alpha_i$ are optional weighting factors and $L_i$ represents the loss for the $i$-th task.
The goal is to minimize this overall loss function with respect to the model parameters $\theta$. The optimal parameter vector $\theta^*$ is given by:
$\theta^* = \arg\min_{\theta} L(\theta)$
The model parameters are then updated using gradient descent or other optimization techniques to minimize this combined loss. The shared representation in the intermediate layers allows the model to discover commonalities and relationships among tasks, promoting a more generalized feature extraction process. By training on diverse datasets simultaneously, MTDL facilitates the development of a model that not only excels in individual tasks but also demonstrates improved performance on new, foreign data.
In the strategy employed in this paper, which is presented in Figure 13, training is conducted simultaneously on all three island datasets, Rhodes, Chios, and Lesvos. The objective is for the model to acquire high generalization capabilities and make predictions on the selected testing dataset of Chios. Due to the distinct variations in the three time series, this approach proves more robust than cases involving training on a single dataset, imparting generality in performance across the models.
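Under the concatenation formulation described above, a hedged sketch of MTDL training with equal task weights ($\alpha_i = 1$) could look as follows; the array shapes are placeholders for the actual windowed island datasets.

```python
import numpy as np
import tensorflow as tf

# Placeholder windowed samples per island (one forecasting task each).
tasks = {
    "Rhodes": (np.random.rand(500, 168, 8), np.random.rand(500, 24)),
    "Lesvos": (np.random.rand(500, 168, 8), np.random.rand(500, 24)),
    "Chios": (np.random.rand(500, 168, 8), np.random.rand(500, 24)),
}

# Unified dataset: concatenating the tasks and minimizing a single MSE
# corresponds to L(theta) = sum_i alpha_i * L_i(theta) with alpha_i = 1.
X_all = np.concatenate([X for X, _ in tasks.values()])
y_all = np.concatenate([y for _, y in tasks.values()])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(168, 8)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(24),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_all, y_all, epochs=5, verbose=0)  # joint training on all three islands
```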

3.3. Optimization Framework

The Bayesian optimization algorithm (BOA) is utilized for each training period and for each model. It is a probabilistic optimization approach designed to tackle complex and computationally expensive objective functions. Central to BOA is the use of a surrogate model, typically a Gaussian process, which provides a probabilistic representation of the unknown objective function. This surrogate model captures both the mean and uncertainty associated with the objective function across the parameter space. The algorithm iteratively refines its understanding of the objective function by selecting points for evaluation based on an acquisition function that balances exploration and exploitation. The chosen points are then used to update the surrogate model through Bayesian inference, adjusting the model’s predictions in light of the new information. This iterative process allows BOA to systematically explore the parameter space, adapt to the underlying structure of the objective function, and efficiently converge towards optimal solutions.
BOA excels in making informed decisions using the uncertainty quantified by its surrogate model. An acquisition function guides the algorithm to explore areas where the objective function is uncertain or is likely to attain optimal values. As the optimization progresses, the surrogate model of BOA improves continuously, enhancing its understanding of the objective function and focusing the search on regions most likely to contain the global optimum. This principled approach makes it particularly well suited for optimization problems in scientific and engineering domains where objective function evaluations are resource-intensive or subject to noise, as it efficiently identifies optimal parameter configurations in such scenarios. In this study, the architecture of each DL model was optimized and configured through BOA, and for each case, each model was saved in h5 format. Regarding the hyperparameters of each DL model, Table 1 presents in detail each hyperparameter whose value was optimized with the aim of obtaining the most accurate predictions.
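One common way to run such a search within the TensorFlow/Keras stack is KerasTuner's BayesianOptimization tuner; the sketch below is a hedged example, as the paper does not state which BO implementation was used, and the hyperparameter ranges shown are illustrative (the actual search space is listed in Table 1).

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp: kt.HyperParameters) -> tf.keras.Model:
    """Model builder whose hyperparameters the tuner samples."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(168, 8)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(hp.Int("units", 64, 512, step=64), activation="relu"),
        tf.keras.layers.Dropout(hp.Float("dropout", 0.0, 0.5, step=0.1)),
        tf.keras.layers.Dense(24),
    ])
    lr = hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr), loss="mse", metrics=["mae"])
    return model

tuner = kt.BayesianOptimization(
    build_model,
    objective="val_loss",  # the acquisition function balances exploration and exploitation
    max_trials=30,
    directory="bo_search",
    project_name="load_forecasting",
)
# tuner.search(X_train, y_train, epochs=20, validation_split=0.1)
# best_model = tuner.get_best_models(num_models=1)[0]
```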

3.4. Evaluation Metrics

For this paper, the following four error prediction metrics are used:
Mean Absolute Error (MAE): In this metric, the average of the absolute differences between the forecast and true values is calculated.
Root Mean Squared Error (RMSE): This metric calculates the square root of the average of the squared differences between the forecast and true values.
Mean Absolute Percentage Error (MAPE): This metric computes the average of the absolute percentage differences between the predicted and actual values.
R-squared ($R^2$): This is a statistical metric that measures how well the independent variable(s) in a forecasting model explain the variation in the dependent variable. It takes values between 0 and 1. A value of 1 implies a perfect fit, meaning all variation in the dependent variable is explained by the independent variable(s); 0 indicates no relationship between the variables.
The above metrics are defined as follows:
$\mathrm{MAE} = \frac{\sum_{i=1}^{n} |y_i - x_i|}{n}$
$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - x_i)^2}{n}}$
$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - x_i}{x_i} \right|$
$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{x}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$
where $x_i$ and $\hat{x}_i$ denote the forecast values, $y_i$ the actual values, and $\bar{y}$ the mean of the actual values.
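For completeness, the four metrics translate directly into NumPy, as in the sketch below; note that it follows the standard MAPE convention of dividing by the actual values, which must be non-zero.

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute the four evaluation metrics; y_true holds the actual and
    y_pred the forecast values (both in MW)."""
    err = y_true - y_pred
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        # Standard MAPE convention: divide by the actual values (non-zero).
        "MAPE": float(100 * np.mean(np.abs(err / y_true))),
        "R2": float(1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)),
    }

print(evaluate(np.array([50.0, 60.0, 55.0]), np.array([48.0, 63.0, 54.0])))
```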

3.5. Software Environment

All the algorithms in this work were developed using the Python 3.10 programming language. The open source software library TensorFlow 2.15.0 and the Keras 2.15.0 high-level API were used to train and test the deep learning algorithms. Furthermore, the Pandas 2.1.0 and NumPy 1.26.0 libraries were used for data analysis. For visualization purposes in the exploratory analysis and prediction results, the Seaborn 0.13.2, Plotly 5.19.0, Matplotlib 3.8.3, graphviz 0.20.2, and torchviz 0.0.2 libraries were incorporated. Also, the official Calendar 1.0.1 library was used to identify the weekends and the Greek holidays. The research was carried out in the Google Colab Pro environment. For the MLP model, a Tesla T4 GPU (NVIDIA-SMI 535.104.05, Driver Version 535.104.05, CUDA Version 12.2) with 12.7 GB of RAM and 78.2 GB of disk space was used. For the CNN and ensemble models, a cloud TPU with 28.6 GB of RAM and 107.7 GB of disk space was employed.

4. Results Analysis

In this section, the experimental results obtained are presented. First, only the monthly variation in the mean absolute error (MAE) for the MLP, CNN, and ELM, respectively, is presented, for economy of space. Then, the MAPE (mean absolute percentage error), the RMSE (root mean square error), and the aggregated results are visualized, analyzed, and compared for each strategy and each model on a monthly basis. Finally, the tables detailing $R^2$ are presented in the aggregated results.

4.1. Multilayer Perceptron Results

Figure 14 visualizes the variation in MAE on a monthly basis for each of the four strategies for the MLP model. It is observed that May and September exhibit better prediction results, while January, July, and August are the months with poorer performance.

4.2. Convolutional Neural Networks Results

Figure 15 presents the variation in MAE on a monthly basis for each of the four strategies for CNN. It is clear that April and May exhibit better average prediction results, while June, July, and August are the months with the worst performance.

4.3. Ensemble Learning Model Results

Figure 16 visualizes the variation in MAE on a monthly basis for each of the four strategies for the ensemble learning model. It is observed that June, July, and August exhibit better average prediction results, while May, September, and October are the months with the lowest prediction accuracy.

4.4. Results Comparison

In order to evaluate the performance of each model, the results are consolidated and compared for each of the four strategies in the following radial graphs. This type of graph was chosen because it provides a clear, illustrative way to present results for each of the twelve months of the year: the closer a point is to the center of the circle, the lower the prediction error achieved.

4.4.1. Variation in MAE for Each Strategy

Figure 17 presents a comparison month-by-month of the mean absolute error (MAE) of the DL models for each strategy.
Based on the comparative results, the following observations emerge:
  • For the DTL strategy with fine-tuning and training data from the Lesvos time series, DTL Case 1, it is observed that the MLP and ensemble models outperform the CNN model.
  • In the forecasting strategy with fine-tuning and training data from both the Rhodes and Lesvos time series, DTL Case 2, it seems that the MLP and ensemble models exhibit comparable performance, except for October, where MLP outperforms.
  • Regarding the use of pre-trained models without fine-tuning, DTL Case 3, the CNN significantly lags behind the other two models, with MLP consistently exhibiting the highest prediction accuracy.
  • Finally, regarding the Multi-task Deep Learning application strategy, MTDL, it is observed that the MLP model consistently shows inferior results compared to the other two models for all months. Here, CNN and ensemble achieve similar accuracy.
For additional analysis and understanding of the behavior of the algorithms, Figure 18 and Figure 19 plot the variation in the mean absolute percentage error (MAPE) and the root mean squared error (RMSE) for each month, respectively.

4.4.2. Variation in MAPE for Each Strategy

Figure 18 presents a comparison of the mean absolute percentage error (MAPE) for each model and strategy on a monthly basis.
It is clearly observed that for the DTL Cases 1 and 2, June exhibits the best prediction accuracy, with the ensemble model achieving a MAPE of 5.29% for Case 1 and the MLP achieving a MAPE of 6.01% for Case 2. In the case of MTDL, the best predictions are observed in the months of June and July, with the ELM presenting predictions of 6.33% and 6.86%, respectively. Finally, for the DTL Case 3, the best prediction is observed in January, corresponding to a MAPE of 7.85%, achieved by the ELM model.

4.4.3. Variation in RMSE for Each Strategy

Figure 19 presents a comparison of the root mean square error (RMSE) for each model and strategy, monthly.

4.5. Aggregated Results

This subsection provides a detailed presentation of the results obtained for each forecasting strategy and each DL model, on a monthly basis. Table 2, Table 3, Table 4 and Table 5 present all the aggregated results for each forecasting strategy to provide a complete view of the performance of each algorithm for each forecasting month. The MAE and RMSE metrics are given in MW, MAPE is given in percent (%), and $R^2$ takes values between 0 and 1.
Based on the results shown in the Tables, the following summarizations apply:
  • In DTL Case 1, it is observed that the ensemble model achieves the best prediction for the month of June, presenting a MAPE of 5.29%.
  • In DTL Case 2, again, the ensemble model achieves the best prediction, which pertains to the month of February, exhibiting a MAPE of 5.31%.
  • Regarding DTL Case 3, the ensemble model achieves the best prediction in January with a MAPE of 7.85%.
  • In MTDL, the CNN model manages the best prediction in January, corresponding to a MAPE of 5.62%.
Generally, based on Table 2, Table 3, Table 4 and Table 5, as well as Figure 17, Figure 18 and Figure 19, a notable difference is observed between the best and worst prediction months. The best corresponds to the ELM prediction for May, with a MAPE of 5.36%, and the worst corresponds to the CNN prediction for December, with a MAPE of 22.62%, a difference of 17.26 percentage points. This fact further reinforces the high performance of the ELM in the cases with fine-tuning, where it benefits from the predictions of the two individual models. On the other hand, the CNN, as evidenced by the worst-performing month, exhibits general instability in the case without fine-tuning, which relates to the inability of its trainable parameters to adapt to foreign data. This may reveal a general difficulty for models using convolution mechanisms to adapt to unknown datasets.

4.6. Results Discussion

Based on the above results, it becomes evident that the application of deep learning algorithms in the domain of deep transfer learning (DTL) can yield satisfactory outcomes, reducing the computational power requirements and model training times, due to the fact that, after the initial training of the model, only the fine-tuning needs to take place each time the DL model is applied in a different area. The time required for a DL model to be trained during the fine-tuning period is significantly shorter compared to the time needed for direct training, as in the case of MTDL. Also, the variation in results for each month indicates that the ELM improves predictions for the majority of the forecast months.
In general, it is observed that the two strategies of deep transfer learning with fine-tuning (DTL Case 1 and 2) significantly outperform DTL Case 3 and MTDL. Specifically, in the comparison between fine-tuning strategies and multi-task deep learning, the differences suggest that the utilized models can adapt better when trained separately on different datasets, as opposed to parallel and simultaneous training on multiple datasets together. Both of these cases involve efforts to create models capable of efficiently generalizing to unknown and differently behaving time series.
Additionally, in the case of the direct use of a pre-trained model (DTL Case 3), a poorer performance is achieved compared to other cases. For the ELM, which is influenced by both the MLP and CNN models, the poor performance of the CNN negatively affects the accuracy for most months, with exceptions in January and July.
The variation in results clearly demonstrates that the employment of more than one model in an ensemble combination significantly improves the performance compared to individual algorithms. The reason behind this improvement lies in the weighted average learning, which takes into account the best predictions from both models, MLP and CNN separately. As a result, the final day-ahead load prediction is considerably improved, demonstrating the effectiveness of combining multiple models.
Finally, it is worth noting that the adaptability of the algorithms to the three examined time series relies on both the trainable parameters of each model and the different features and patterns exhibited by each case. Seasonality, the peak demand periods, and the average values of each dataset are some of the characteristics that influence the algorithmic functionalities.

5. Conclusions and Future Study Proposals

In this study, an extensive investigation was conducted regarding Seq2seq deep transfer learning on time series data. For this purpose, a month-by-month case study was employed with the aim of day-ahead forecasting of the electricity load on three islands of the Greek power system. The results obtained provide valuable information regarding the application of such methods and their effectiveness. The first major conclusion is that transfer learning outperforms simple learning, even in the case of multi-task deep learning, which is utilized for better model generalization.
Furthermore, another conclusion is that deep transfer learning using ensemble models outperforms simple DL models, as evidenced by the results obtained. More specifically, in the strategies DTL Case 1, DTL Case 2, as well as MTDL, it is observed that, for the majority of months, the ELM enhances the predictions achieved by the two individual DL models. This fact enables particularly optimistic conclusions to be drawn regarding further exploration of ensemble models in the field of prediction in power systems.
DTL strategies are cost-effective, requiring significantly less computational power and time compared to simple prediction methods, because the DL models are trained once on a source dataset, saved in an appropriate format, and subsequently only their final part is fine-tuned for each specific task. Therefore, DTL minimizes the computational resources required and speeds up training for a specific target task, such as forecasting the day-ahead electrical load. By leveraging the knowledge stored in pre-trained models, DTL efficiently utilizes resources, facilitating the swift deployment of effective models across different domains. Thus, it becomes evident that the application of DTL to real-time data can be achieved with high performance due to the aforementioned conditions, as well as the flexibility provided by this specific domain compared to simple transfer learning applications.
More specifically, deploying the models studied in this work in real-time conditions is really challenging. Adapting to fast-changing data streams, maintaining low latency, and balancing model complexity with real-time requirements are key hurdles. Continuous updates of the model to address concept drift and ensure interpretability in real-time settings are also critical. Success requires tailored optimization and infrastructure to handle real-time processing demands effectively.
In the context of future study proposals, it should be emphasized initially that DTL could be applied to other branches of power systems, such as fault prediction in electric power transmission and distribution networks. Furthermore, beyond energy systems, it could be highlighted that a significant challenge lies in the application of DTL to areas like healthcare, where numerous research studies are conducted globally.
As already mentioned, the study of DTL using ELM, both in the field of power systems and in other domains, can help to further improve the results of transfer learning.
Finally, the combination of DTL with reinforcement learning holds promise for future research and offers potential advancements. This could be explored to improve the efficiency of demand forecasting and load management systems. For instance, a pre-trained reinforcement learning agent could learn general patterns and behaviors from historical data across different regions or time periods. This pre-trained agent could then be fine-tuned on a specific locality or timeframe in order to adapt to unique characteristics and changes in electricity demand. This approach may lead to more accurate and adaptable models for load prediction, contributing to improved resource planning and energy efficiency in the electricity grid.

Author Contributions

Conceptualization, V.L. and G.V.; methodology, V.L.; software, V.L. and G.V.; validation, V.L., G.V., A.T., D.B. and L.H.T.; formal analysis, V.L. and G.V.; investigation, V.L. and G.V.; resources, V.L. and G.V.; data curation, V.L. and G.V.; writing—original draft preparation, V.L. and G.V.; writing—review and editing, V.L., G.V., A.T., D.B. and L.H.T.; visualization, V.L. and G.V.; supervision, D.B. and L.H.T.; project administration, D.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The load data used in this study are available from the HEDNO (Greece) portal [27].

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ARIMA    Autoregressive Integrated Moving Average
BO       Bayesian Optimization
CNN      Convolutional Neural Network
DNN      Deep Neural Network
DTL      Deep Transfer Learning
DTSF     Dynamic Time Scan Forecasting
EDA      Exploratory Data Analysis
EDL      Ensemble Deep Learning
ELM      Ensemble Learning Model
MAE      Mean Absolute Error
MAPE     Mean Absolute Percentage Error
MLP      Multilayer Perceptron
MMD      Maximum Mean Discrepancy
MSE      Mean Squared Error
MTDL     Multi-task Deep Learning
NBEATS   Neural Basis Expansion Analysis for Time Series
NILM     Non-Intrusive Load Monitoring
R²       R-Squared
RMSE     Root Mean Squared Error
Seq2Seq  Sequence-to-Sequence
SVM      Support Vector Machine
TL       Transfer Learning

References

1. Meng, S.; Li, C.; Tian, C.; Peng, W.; Tian, C. Transfer learning based graph convolutional network with self-attention mechanism for abnormal electricity consumption detection. Energy Rep. 2023, 9, 5647–5658.
2. Antoniadis, A.; Gaucher, S.; Goude, Y. Hierarchical transfer learning with applications to electricity load forecasting. Int. J. Forecast. 2023, 40, 641–660.
3. Yang, C.; Wang, H.; Bai, J.; He, T.; Cheng, H.; Guang, T.; Yao, H.; Qu, L. Transfer learning enhanced water-enabled electricity generation in highly oriented graphene oxide nanochannels. Nat. Commun. 2022, 13, 6819.
4. Dong, Y.; Xiao, L. A Transfer Learning Based Deep Model for Electrical Load Prediction. In Proceedings of the 2022 IEEE 8th International Conference on Computer and Communications (ICCC), Chengdu, China, 9–12 December 2022; pp. 2251–2255.
5. Li, D.; Li, J.; Zeng, X.; Stankovic, V.; Stankovic, L.; Xiao, C.; Shi, Q. Transfer learning for multi-objective non-intrusive load monitoring in smart building. Appl. Energy 2023, 329, 120223.
6. Peirelinck, T.; Kazmi, H.; Mbuwir, B.V.; Hermans, C.; Spiessens, F.; Suykens, J.; Deconinck, G. Transfer learning in demand response: A review of algorithms for data-efficient modelling and control. Energy AI 2022, 7, 100126.
7. Laitsos, V.; Vontzos, G.; Bargiotas, D. Investigation of Transfer Learning for Electricity Load Forecasting. In Proceedings of the 2023 14th International Conference on Information, Intelligence, Systems & Applications (IISA), Volos, Greece, 10–12 July 2023; pp. 1–7.
8. Wu, D.; Lin, W. Efficient Residential Electric Load Forecasting via Transfer Learning and Graph Neural Networks. IEEE Trans. Smart Grid 2023, 14, 2423–2431.
9. Kamalov, F.; Sulieman, H.; Moussa, S.; Avante Reyes, J.; Safaraliev, M. Powering Electricity Forecasting with Transfer Learning. Energies 2024, 17, 626.
10. Syed, D.; Zainab, A.; Refaat, S.S.; Abu-Rub, H.; Bouhali, O.; Ghrayeb, A.; Houchati, M.; Bañales, S. Inductive Transfer and Deep Neural Network Learning-Based Cross-Model Method for Short-Term Load Forecasting in Smarts Grids. IEEE Can. J. Electr. Comput. Eng. 2023, 46, 157–169.
11. Laitsos, V.; Vontzos, G.; Bargiotas, D.; Daskalopulu, A.; Tsoukalas, L.H. Enhanced Automated Deep Learning Application for Short-Term Load Forecasting. Mathematics 2023, 11, 2912.
12. Santos, M.L.; García, S.D.; García-Santiago, X.; Ogando-Martínez, A.; Camarero, F.E.; Gil, G.B.; Ortega, P.C. Deep learning and transfer learning techniques applied to short-term load forecasting of data-poor buildings in local energy communities. Energy Build. 2023, 292, 113164.
13. Arvanitidis, A.I.; Bargiotas, D.; Daskalopulu, A.; Kontogiannis, D.; Panapakidis, I.P.; Tsoukalas, L.H. Clustering informed MLP models for fast and accurate short-term load forecasting. Energies 2022, 15, 1295.
14. Luo, T.; Tang, Z.; Liu, J.; Zhou, B. A Review of Transfer Learning Approaches for Load, Solar and Wind Power Predictions. In Proceedings of the 2023 Panda Forum on Power and Energy (PandaFPE), Chengdu, China, 27–30 April 2023; pp. 1580–1584.
15. Li, S.; Wu, H.; Wang, X.; Xu, B.; Yang, L.; Bi, R. Short-term load forecasting based on AM-CIF-LSTM method adopting transfer learning. Front. Energy Res. 2023, 11, 1162040.
16. Chan, S.; Oktavianti, I.; Puspita, V. A deep learning CNN and AI-tuned SVM for electricity consumption forecasting: Multivariate time series data. In Proceedings of the 2019 IEEE 10th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 17–19 October 2019; pp. 488–494.
17. Silveira Gontijo, T.; Azevedo Costa, M. Forecasting Hierarchical Time Series in Power Generation. Energies 2020, 13, 3722.
18. Jung, S.M.; Park, S.; Jung, S.W.; Hwang, E. Monthly electric load forecasting using transfer learning for smart cities. Sustainability 2020, 12, 6364.
19. Al-Hajj, R.; Assi, A.; Neji, B.; Ghandour, R.; Al Barakeh, Z. Transfer Learning for Renewable Energy Systems: A Survey. Sustainability 2023, 15, 9131.
20. Nivarthi, C.P. Transfer Learning as an Essential Tool for Digital Twins in Renewable Energy Systems. arXiv 2022, arXiv:2203.05026.
21. Miraftabzadeh, S.M.; Colombo, C.G.; Longo, M.; Foiadelli, F. A Day-Ahead Photovoltaic Power Prediction via Transfer Learning and Deep Neural Networks. Forecasting 2023, 5, 213–228.
22. Đaković, D.; Kljajić, M.; Milivojević, N.; Doder, Đ.; Anđelković, A.S. Review of Energy-Related Machine Learning Applications in Drying Processes. Energies 2024, 17, 224.
23. Vontzos, G.; Laitsos, V.; Bargiotas, D. Data-Driven Airport Multi-Step Very Short-Term Load Forecasting. In Proceedings of the 2023 14th International Conference on Information, Intelligence, Systems & Applications (IISA), Volos, Greece, 10–12 July 2023; pp. 1–6.
24. Yang, M.; Liu, Y.; Liu, Q. Nonintrusive residential electricity load decomposition based on transfer learning. Sustainability 2021, 13, 6546.
25. Li, K.; Wei, B.; Tang, Q.; Liu, Y. A Data-Efficient Building Electricity Load Forecasting Method Based on Maximum Mean Discrepancy and Improved TrAdaBoost Algorithm. Energies 2022, 15, 8780.
26. Silveira Gontijo, T.; Barbosa de Santis, R.; Azevedo Costa, M. Application of a data-driven DTSF and benchmark models for the prediction of electricity prices in Brazil: A time-series case. J. Renew. Sustain. Energy 2023, 15, 036101.
27. Publication of NII Daily Energy Planning Data | HEDNO. Available online: https://deddie.gr/en/themata-tou-diaxeiristi-mi-diasundedemenwn-nisiwn/leitourgia-mdn/dimosieusi-imerisiou-energeiakou-programmatismou/ (accessed on 10 January 2023).
Figure 1. Hourly consumption for Lesvos, Rhodes and Chios.
Figure 2. Monthly average consumption for every island.
Figure 3. Hourly average consumption per day for each island.
Figure 4. Day of the week in sine and cosine formulation.
Figure 5. Sequence-to-sequence forecasting technique.
Figure 6. Correlation heatmap.
Figure 7. Multilayer perceptron architecture.
Figure 8. CNN model architecture.
Figure 9. Ensemble deep learning model architecture.
Figure 10. Deep Transfer Learning Case 1.
Figure 11. Deep Transfer Learning Case 2.
Figure 12. Deep Transfer Learning Case 3.
Figure 13. Multi-task Deep Learning.
Figure 14. Variation in MAE for MLP model.
Figure 15. Variation in MAE for CNN model.
Figure 16. Variation in MAE for ELM.
Figure 17. Comparison of monthly MAE for each forecasting strategy.
Figure 18. Comparison of monthly MAPE for each forecasting strategy.
Figure 19. Comparison of monthly RMSE for each forecasting strategy.
Table 1. Model optimization hyperparameters.

MLP (input sequence length: 168 h; optimization function: validation loss of MSE)
  • Dense layer with search space of neurons: min_value = 16, max_value = 512, step = 16
  • Dropout layer with search space: min_value = 0, max_value = 0.25, step = 0.05
  • Optimizer = Adam
  • Learning rate with search space: min_value = 0.0010, max_value = 0.010, sampling = "log"

CNN (input sequence length: 168 h; optimization function: validation loss of MSE)
  • Filters of CNN with range: min_value = 64, max_value = 128, step = 16
  • CNN kernel size with search space: min_value = 4, max_value = 8, step = 2
  • Neurons of dense layer with search space: min_value = 24, max_value = 120, step = 12
  • Optimizer = Adam
  • Learning rate with search space: min_value = 0.0010, max_value = 0.010, sampling = "log"

ELM (input sequence length: 168 h; optimization function: validation loss of MSE)
  • Dense layer with search space of neurons: min_value = 16, max_value = 512, step = 16
  • Dropout layer with search space: min_value = 0, max_value = 0.25, step = 0.05
  • Filters of CNN with range: min_value = 64, max_value = 128, step = 16
  • CNN kernel size with search space: min_value = 4, max_value = 8, step = 2
  • Neurons of final dense layer with search space: min_value = 24, max_value = 48, step = 4
  • Optimizer = Adam
  • Learning rate with search space: min_value = 0.0010, max_value = 0.010, sampling = "log"
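The min_value/max_value/step notation above matches the search-space convention of Keras Tuner, so the MLP row of Table 1 might be declared roughly as in the following sketch; the model topology and trial count are assumptions, since the table specifies only the tuned ranges and the Bayesian optimization (BO) objective.

```python
import keras_tuner as kt
import tensorflow as tf

def build_mlp(hp):
    """Builds an MLP whose tuned ranges mirror the MLP row of Table 1."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(168,)),          # 168 h input sequence
        tf.keras.layers.Dense(
            hp.Int("neurons", min_value=16, max_value=512, step=16),
            activation="relu"),
        tf.keras.layers.Dropout(
            hp.Float("dropout", min_value=0.0, max_value=0.25, step=0.05)),
        tf.keras.layers.Dense(24),                    # 24 h output sequence
    ])
    lr = hp.Float("lr", min_value=0.0010, max_value=0.010, sampling="log")
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="mse")                         # validation loss of MSE
    return model

# Bayesian optimization over the space; the trial count is an assumption.
tuner = kt.BayesianOptimization(build_mlp, objective="val_loss",
                                max_trials=20)
# tuner.search(X_train, y_train, validation_split=0.2) would run the search.
```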
Table 2. Deep Transfer Learning Case 1.

Model  Metric  Jan    Feb    Mar    Apr    May    June   July   Aug    Sep    Oct    Nov    Dec
MLP    MAE     2.16   1.56   2.01   1.38   1.12   1.12   1.57   2.12   1.40   1.39   1.64   1.42
MLP    MAPE    7.38   6.24   7.68   7.98   6.97   6.05   7.01   9.23   7.66   9.04   9.39   7.00
MLP    RMSE    2.79   1.98   2.59   1.78   1.41   1.48   2.15   2.76   1.78   1.70   2.06   1.79
MLP    R²      0.83   0.87   0.82   0.67   0.73   0.74   0.63   0.08   0.75   0.53   0.70   0.82
CNN    MAE     1.69   1.52   1.73   1.31   1.06   1.24   1.63   2.52   1.43   1.75   1.54   1.38
CNN    MAPE    5.57   6.07   6.60   7.57   6.55   6.69   7.46   10.95  7.79   11.39  8.85   6.79
CNN    RMSE    2.37   2.03   2.28   1.78   1.38   1.74   2.25   3.19   1.90   2.20   1.98   1.79
CNN    R²      0.88   0.87   0.86   0.68   0.74   0.64   0.59   0.11   0.70   0.21   0.73   0.82
ELM    MAE     1.76   1.42   1.77   1.16   0.86   0.98   1.38   2.20   1.22   1.23   1.37   1.14
ELM    MAPE    5.87   5.72   6.77   6.73   5.36   5.29   6.24   9.57   6.67   8.04   7.82   5.62
ELM    RMSE    2.42   1.84   2.31   1.57   1.12   1.36   1.91   2.80   1.58   1.57   1.73   1.51
ELM    R²      0.88   0.89   0.86   0.75   0.83   0.78   0.71   6.25   0.80   0.61   0.80   0.87
Table 3. Deep Transfer Learning Case 2.

Model  Metric  Jan    Feb    Mar    Apr    May    June   July   Aug    Sep    Oct    Nov    Dec
MLP    MAE     1.91   1.51   1.91   1.34   0.98   1.11   1.44   2.04   1.28   1.28   1.51   1.32
MLP    MAPE    6.50   6.07   7.29   7.79   6.24   6.01   6.61   8.87   6.95   8.38   8.66   6.51
MLP    RMSE    2.44   1.94   2.47   1.80   1.29   1.50   1.98   2.67   1.64   1.63   1.89   1.71
MLP    R²      0.87   0.88   0.84   0.67   0.77   0.73   0.69   0.15   0.78   0.57   0.75   0.84
CNN    MAE     1.85   1.40   1.91   1.43   1.17   1.37   2.14   2.40   1.66   2.39   1.90   1.46
CNN    MAPE    6.26   5.60   7.29   8.30   7.27   7.38   9.85   10.44  9.02   15.51  10.90  7.23
CNN    RMSE    2.56   1.83   2.43   1.80   1.46   1.79   2.76   2.96   2.06   2.96   2.36   1.90
CNN    R²      0.86   0.89   0.84   0.67   0.71   0.62   0.39   0.02   0.66   0.08   0.61   0.80
ELM    MAE     1.76   1.32   1.82   1.20   0.95   1.14   1.57   2.10   1.30   1.71   1.51   1.27
ELM    MAPE    6.03   5.31   6.94   6.95   5.92   6.17   7.23   9.10   7.09   11.17  8.68   6.26
ELM    RMSE    2.38   1.72   2.33   1.57   1.23   1.52   2.12   2.66   1.65   2.12   1.89   1.65
ELM    R²      0.88   0.91   0.85   0.75   0.80   0.72   0.64   0.16   0.78   0.27   0.75   0.85
Table 4. Deep Transfer Learning Case 3.

Model  Metric  Jan    Feb    Mar    Apr    May    June   July   Aug    Sep    Oct    Nov    Dec
MLP    MAE     2.56   2.65   2.58   2.51   2.12   2.01   2.12   2.42   2.03   2.10   2.34   2.37
MLP    MAPE    8.75   10.60  9.87   14.54  13.12  10.86  9.74   10.51  11.05  13.71  13.43  11.69
MLP    RMSE    3.14   3.32   3.38   3.06   2.56   2.48   2.71   3.03   2.44   2.62   2.91   2.94
MLP    R²      0.79   0.64   0.69   0.05   0.10   0.27   0.41   −0.10  0.52   −0.11  0.41   0.51
CNN    MAE     3.02   3.69   3.53   4.81   3.62   2.71   2.49   2.95   2.96   4.55   4.45   4.59
CNN    MAPE    10.34  14.79  13.48  27.82  22.40  14.63  11.45  12.82  16.09  29.63  25.47  22.62
CNN    RMSE    4.39   5.37   5.10   6.25   4.75   3.70   3.27   3.67   3.58   5.66   5.93   6.59
CNN    R²      0.58   0.05   0.29   −2.96  −2.08  −0.64  0.14   −0.61  −0.05  −4.18  −1.45  −1.45
ELM    MAE     2.29   2.80   2.68   2.93   2.41   1.98   1.92   2.44   2.23   3.09   3.18   3.21
ELM    MAPE    7.85   11.22  10.22  16.96  14.91  10.68  8.84   10.58  12.16  20.10  18.18  15.84
ELM    RMSE    3.07   3.76   3.65   3.82   3.06   2.61   2.55   3.09   2.67   3.78   4.05   4.27
ELM    R²      0.80   0.54   0.64   −0.48  −0.28  0.19   0.48   −0.14  0.42   −1.30  −0.14  −0.03
Table 5. Multi-task Deep Learning.

Model  Metric  Jan    Feb    Mar    Apr    May    June   July   Aug    Sep    Oct    Nov    Dec
MLP    MAE     3.21   2.74   2.81   1.98   1.66   1.69   1.98   2.67   1.88   2.22   2.75   2.80
MLP    MAPE    10.97  10.99  10.73  11.45  10.31  9.10   9.06   11.50  10.23  14.45  15.75  13.80
MLP    RMSE    3.75   3.24   3.36   2.47   2.00   2.16   2.61   3.43   2.32   2.64   3.31   3.27
MLP    R²      0.70   0.66   0.69   0.38   0.46   0.44   0.45   −0.40  0.56   −0.12  0.24   0.40
CNN    MAE     1.67   1.69   0.80   1.49   1.28   1.20   1.46   2.14   1.22   1.60   1.75   1.57
CNN    MAPE    5.62   6.78   6.86   8.63   7.97   6.47   6.58   9.30   6.66   10.46  10.03  7.75
CNN    RMSE    2.30   2.22   2.45   1.99   1.62   1.67   1.98   2.87   1.51   1.88   2.04   2.02
CNN    R²      0.89   0.84   0.84   0.60   0.64   0.67   0.69   0.01   0.82   0.43   0.70   0.77
ELM    MAE     2.03   1.85   2.03   1.35   0.96   1.17   1.49   2.28   1.14   1.00   1.38   1.56
ELM    MAPE    6.86   7.42   7.77   7.80   5.96   6.33   6.86   9.89   6.22   6.56   7.92   7.69
ELM    RMSE    2.61   2.28   2.56   1.80   1.21   1.57   2.05   2.96   1.46   1.30   1.84   1.95
ELM    R²      0.85   0.83   0.82   0.67   0.80   0.71   0.66   −0.05  0.83   0.73   0.77   0.79
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
