Article

Feature Transfer and Rapid Adaptation for Few-Shot Solar Power Forecasting

Xin Ren, Yimei Wang, Zhi Cao, Fuhao Chen, Yujia Li and Jie Yan
1 China Huaneng Clean Energy Research Institute, Beijing 102209, China
2 China Huaneng Group Co., Ltd., Beijing 100031, China
3 School of New Energy, North China Electric Power University, Beijing 102206, China
* Author to whom correspondence should be addressed.
Energies 2023, 16(17), 6211; https://doi.org/10.3390/en16176211
Submission received: 14 July 2023 / Revised: 5 August 2023 / Accepted: 24 August 2023 / Published: 26 August 2023
(This article belongs to the Section A3: Wind, Wave and Tidal Energy)

Abstract

A common dilemma with deep-learning-based solar power forecasting models is their heavy dependence on large amounts of training data. Few-Shot Solar Power Forecasting (FSSPF), which aims to obtain accurate forecasting models with limited training data, is investigated in this paper. Integrating Transfer Learning and Meta-Learning, an approach of Feature Transfer and Rapid Adaptation (FTRA) is proposed for FSSPF. Specifically, the adopted model is divided into a Transferable Learner and an Adaptive Learner, which are pre-trained on massive training data from source solar plants through a Transfer Learning and a Meta-Learning algorithm, respectively. Ultimately, the parameters of the Adaptive Learner are fine-tuned using the limited training data obtained directly from the target solar plant. Three open solar power forecasting datasets (GEFCom2014) were utilized to conduct 24-h-ahead FSSPF experiments. The results illustrate that the proposed FTRA outperforms other FSSPF approaches under various amounts of training data as well as different deep-learning models. Notably, with only 10 days of training data, the proposed FTRA achieves an RMSE of 8.42%, approximately 0.5% lower than that of the best state-of-the-art approaches.

1. Introduction

On the one hand, owing to their environmental pollution, greenhouse gas emissions, and non-renewable nature, traditional fossil energy sources are increasingly falling short of meeting the world's requirements for sustainable development [1]. On the other hand, renewable energy sources, such as wind and solar power, are gaining significant global attention due to their clean, low-carbon, and renewable attributes [2]. Benefiting from technological advances as well as policy support, Solar Power (SP) has developed rapidly in recent years and become an important part of the new power system [3]. Nevertheless, SP output is susceptible to weather conditions, such as irradiance, resulting in significant volatility and randomness [4]. This poses a great challenge to the stability and security of the whole power system [5]. Solar Power Forecasting (SPF) is designed to forecast the SP for a desired future period, which can provide references for dispatch and control in power systems [6].
Deep-learning-based SPF has gained prominent attention in current research, benefiting from its ability to learn complex nonlinear features and its adaptability to various types of SP datasets [7,8,9]. Long Short-Term Memory Neural Networks (LSTM) [10], Gated Recurrent Units (GRU) [11], and Transformers [12] have been widely adopted in SPF to accurately capture the changing patterns of SP and the complex relationships between SP and meteorological factors, thus improving the forecasting accuracy. However, the performance of these deep-learning models heavily relies on the availability of a substantial amount of training data [13,14].
For newly built or expanded solar plants, it poses a challenge to gather an adequate amount of training data due to their limited operating time [15]. To reduce the losses caused by data scarcity, Few-Shot Solar Power Forecasting (FSSPF) is investigated in this paper. The objective of FSSPF is to obtain accurate SPF models using a limited amount of training data.
Currently, there are three approaches to implementing FSSPF: Data Augmentation, Metric Learning, and Transfer Learning:
Data Augmentation refers to the utilization of auxiliary data or information to expand the limited number of original samples or enhance their features [16]. A dual-dimensional time series adversarial neural network has been proposed in [17] to enhance the low-value-density SP data by two dimensions (time dimension and feature dimension) and obtain high-value-density feature data.
Metric learning refers to the process of selecting an appropriate distance function for calculating the similarity between different datasets. This similarity measurement serves as the foundation for expanding training data or weighting models [18]. The Mahalanobis Distance Similarity metric has been adopted in [19]. Firstly, the gray correlation between each meteorological factor and the output power of the solar plant is calculated. Secondly, several similar days are selected to expand the training set using the Mahalanobis distance.
Although the above two methods can enhance the accuracy of FSSPF to some extent, the robustness and effectiveness of these two approaches on deep-learning models can hardly be guaranteed, due to the low diversity of the original few-shot dataset [20]. In contrast, Transfer Learning methods dominate the current deep-learning-based FSSPF.
Transfer Learning (TL), which aims to apply knowledge learned from a source domain to a target domain, is now accepted as the dominant approach in the field of Few-Shot Learning (FSL) [21]. Correspondingly, in FSSPF, we refer to solar plants that contain only a small amount of operational data as Target Solar Plants (TSP), while solar plants with a large amount of operational data are referred to as Source Solar Plants (SSP). A TL-based FSSPF consists of two stages: pre-training and fine-tuning. The former optimizes all the parameters of the model with massive training data from the SSPs, while the latter uses the limited training data from the TSP to selectively fine-tune the model parameters. A digital twin model for FSSPF based on TL and LSTM was proposed in [22], which freezes the first layer of the model during the fine-tuning stage and then fine-tunes the other layers using a small amount of training data from the TSP. Additionally, the authors of [23] chose to freeze the earlier layers and only fine-tune the weights of the last layer of the model in the fine-tuning stage.
Existing TL-based FSSPF methods directly use the parameters pre-trained on the SSPs as the initialization for the fine-tuning stage. However, this implementation may make it difficult for the pre-trained model to rapidly adapt to the TSP, owing to the large differences between the SSPs and the TSP. To address this problem, Meta-Learning is employed in this paper.
Meta-Learning [24], widely known as "learning to learn", is the process of distilling knowledge from multiple learning tasks and using this knowledge to improve future learning performance; it has excelled in many FSL tasks [25,26,27,28]. With its excellent adaptability and scalability to deep neural networks, Reptile [29] has greatly facilitated the development of related fields [30]. Compared with TL, pre-trained models obtained through Reptile can more accurately identify the underlying data characteristics of the target task and thus achieve faster adaptation.
Through the integration of TL and Reptile, an approach to Feature Transfer and Rapid Adaptation (FTRA) is proposed for FSSPF in this paper. Compared with previous studies, the contributions of this paper are summarized as follows:
  • In the proposed FTRA, the adopted deep-learning-based SPF model will be reasonably divided into Transferable Learner and Adaptive Learner, which are responsible for Feature Transfer and Rapid Adaptation, respectively.
  • TL and Reptile were integrated to develop different pre-training and fine-tuning strategies for parameters in different parts of the model, so as to extract valuable knowledge from the SSP to the TSP and adapt the pre-trained model to TSP rapidly.
In addition, to validate the proposed FTRA on FSSPF, three open solar power forecasting datasets from GEFCom2014 [31] were utilized. 24-h-ahead SPF is conducted, using Numerical Weather Prediction (NWP) to forecast the SP at the corresponding times. Three SPF models (LSTM-based, GRU-based, and Transformer-based) are adopted to examine the generalizability of the proposed FTRA to various types of deep-learning models. Three different sizes of training data (10-day, 20-day, and 30-day) and a cross-validation method are used to comprehensively compare the performance of different approaches on FSSPF.
This paper is organized as follows: Section 2 is the preliminary for the proposed approach, which will introduce the detailed structure of the adopted SPF models. The implementation of the proposed FTRA will be explained in Section 3. A case study for FSSPF will be presented in Section 4. Section 5 provides conclusions and outlooks for future works.

2. Solar Power Forecasting Models

The diagram of the adopted SPF models is shown in Figure 1. The input of each SPF model is the 24-h-ahead NWP data, while the output is the 24-h-ahead SPF results.
Each SPF model contains three components: the NWP Embedding Layer (NWPEL), the Meteorological Encoder (ME), and the Power Output Layer (POL). The NWPEL embeds NWP vectors into meteorological feature vectors, the ME maps meteorological feature vectors into output vectors, and the POL maps output vectors into the final SPF results. For the three deep-learning SPF models adopted in this paper, the NWPEL and POL are identical, each composed of a single-layer Fully Connected Neural Network (FCNN). The main difference between the three models lies in the ME, as described below (a minimal code sketch follows the list):
(1) Transformer-based: the encoder layer in [32] is used as the ME, which contains a multi-head self-attention mechanism, residual connections, layer normalization, and a position-wise FCNN;
(2) LSTM-based: a single-layer LSTM [33] is used as the ME, which comprises a forget gate, an input gate, an update gate, and an output gate;
(3) GRU-based: a single-layer GRU [34] is used as the ME, which is made up of a reset gate and an update gate.
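To make this structure concrete, the following is a minimal PyTorch sketch of the three-component SPF model. The class name, argument names, input dimension, and the `nhead` value are illustrative assumptions; the paper only specifies a hidden size of 64 (Section 4.6) and the component types above.

```python
import torch
import torch.nn as nn

class SPFModel(nn.Module):
    """Sketch of the adopted SPF model: NWPEL -> ME -> POL."""
    def __init__(self, nwp_dim=12, hidden_dim=64, me_type="gru"):
        super().__init__()
        # NWP Embedding Layer (single-layer FCNN): NWP vectors -> meteorological features.
        self.nwpel = nn.Linear(nwp_dim, hidden_dim)
        # Meteorological Encoder: one of the three variants described above.
        if me_type == "lstm":
            self.me = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        elif me_type == "gru":
            self.me = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        else:  # Transformer encoder layer
            self.me = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4,
                                                 batch_first=True)
        self.me_type = me_type
        # Power Output Layer (single-layer FCNN): output vectors -> SPF results.
        self.pol = nn.Linear(hidden_dim, 1)

    def forward(self, nwp):                  # nwp: (batch, 24, nwp_dim)
        feat = self.nwpel(nwp)               # (batch, 24, hidden_dim)
        out = self.me(feat)
        if self.me_type in ("lstm", "gru"):  # recurrent MEs also return hidden states
            out = out[0]
        return self.pol(out).squeeze(-1)     # (batch, 24) forecasted power
```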

3. Methodology: FTRA

In this section, the approach of Feature Transfer and Rapid Adaptation (FTRA) for FSSPF is proposed. As shown in Figure 2, the proposed FTRA consists of four steps: Division of the SPF Model, Transfer-Pre-Training, Meta-Pre-Training, and Fine-Tuning. The detailed implementation is described as follows.

3.1. Division of SPF Model

In FTRA, each adopted SPF model will first be divided into a Transferable Learner and an Adaptive Learner. In particular, the Transferable Learner consists of the ME and the POL, while the Adaptive Learner is made up of the NWPEL only. Formally, we consider an SPF model represented by a parametrized function $f_{\theta}$ with parameters $\theta$. Further, the Transferable Learner is represented by a function $f_{\theta^{TL}}$ with parameters $\theta^{TL}$, while the Adaptive Learner is represented by a function $f_{\theta^{AL}}$ with parameters $\theta^{AL}$.
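In code, this division amounts to a grouping of the model's parameters. Continuing the illustrative `SPFModel` sketch from Section 2 (its attribute names are assumptions, not the authors' implementation):

```python
# Hypothetical parameter split for the SPFModel sketch in Section 2:
model = SPFModel(me_type="gru")
theta_AL = list(model.nwpel.parameters())   # Adaptive Learner: NWPEL only
theta_TL = (list(model.me.parameters())     # Transferable Learner: ME + POL
            + list(model.pol.parameters()))
```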

3.2. Transfer-Pre-Training

In Transfer-Pre-Training, massive training data from the SSPs will be used to update all parameters of the adopted SPF model. The optimization objective of Transfer-Pre-Training is to minimize the forecasting error of the model on all the SSPs [20]; its detailed implementation is given in Algorithm 1.
Algorithm 1 Transfer-Pre-Training Algorithm
Require: $N$, $l_{TPT}$, $\mathcal{L}$, $U$
1. Randomly initialize $\theta$
2. while not done do
3.   Randomly select $N$ SSPs as $ssp_i$
4.   for all $ssp_i$ do
5.     $\theta_i \leftarrow \theta$
6.     Sample one batch of datapoints $(nwp, p)$ from $ssp_i$
7.     Update $\theta_i \leftarrow U(\mathcal{L}(f_{\theta_i}(nwp), p), l_{TPT})$
8.   end for
9.   Update $\theta \leftarrow \frac{1}{N} \sum_{i=1}^{N} \theta_i$
10. end while
Within Algorithm 1, $N$ denotes the number of SSPs selected per updating epoch, and $(nwp, p)$ represents the values of the NWP data and the corresponding SP. Each batch contains $bs_{TPT}$ training samples. $l_{TPT}$, $\mathcal{L}$, and $U$ denote the learning rate, the loss function, and the optimizer used in updating $\theta$, respectively.
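A minimal PyTorch sketch of Algorithm 1 is given below, reusing `SPFModel` from the sketch in Section 2. The `sample_batch()` interface on each source plant is a hypothetical helper returning a `(nwp, p)` tensor pair; Adam and the L2 loss are the choices named in Section 4.6.

```python
import copy
import random

def transfer_pre_train(model, ssps, n=2, lr_tpt=1e-4, n_epochs=50):
    """Sketch of Algorithm 1: per-SSP updates, then parameter averaging."""
    loss_fn = nn.MSELoss()                    # L2 loss (Section 4.6)
    for _ in range(n_epochs):                 # "while not done"
        thetas = []
        for ssp in random.sample(ssps, n):    # randomly select N SSPs
            clone = copy.deepcopy(model)      # theta_i <- theta
            opt = torch.optim.Adam(clone.parameters(), lr=lr_tpt)
            nwp, p = ssp.sample_batch()       # one batch of (nwp, p) datapoints
            loss = loss_fn(clone(nwp), p)
            opt.zero_grad(); loss.backward(); opt.step()
            thetas.append({k: v.detach().clone()
                           for k, v in clone.state_dict().items()})
        # theta <- (1/N) * sum_i theta_i
        avg = {k: torch.stack([t[k] for t in thetas]).mean(0) for k in thetas[0]}
        model.load_state_dict(avg)
```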

3.3. Meta-Pre-Training

After the pre-training with Algorithm 1, the parameters of the Adaptive Learner, $\theta^{AL}$, will first be re-initialized randomly, and then be pre-trained with the Reptile [29] algorithm in Meta-Pre-Training. During this process, the parameters of the Transferable Learner, $\theta^{TL}$, will remain fixed.
The optimization objective of Meta-Pre-Training is not only to minimize the forecasting error of the model on all SSPs, but also to maximize the gradient similarity of samples from the same SSP [29]. On this basis, the Adaptive Learner is able to distinguish the respective gradients using the limited available samples, and then make a rapid adaptation to the TSP. The detailed implementation of Meta-Pre-Training is given in Algorithm 2.
Algorithm 2 Meta-Pre-Training Algorithm
Require: $N$, $S$, $l_{inner}$, $l_{outer}$, $\mathcal{L}$, $U$
1. Randomly initialize $\theta^{AL}$
2. while not done do
3.   Randomly select $N$ SSPs as $ssp_i$
4.   for all $ssp_i$ do
5.     $\theta_i^{AL} \leftarrow \theta^{AL}$
6.     Sample $S$ batches of datapoints $(nwp_j, p_j)$ from $ssp_i$
     // Inner Loop
7.     for all $(nwp_j, p_j)$ do
8.       Update $\theta_i^{AL} \leftarrow U(\mathcal{L}(f_{[\theta^{TL}, \theta_i^{AL}]}(nwp_j), p_j), l_{inner})$
9.     end for
10.  end for
   // Outer Loop
11.  Update $\theta^{AL} \leftarrow \theta^{AL} + \frac{l_{outer}}{N} \sum_{i=1}^{N} (\theta_i^{AL} - \theta^{AL})$
12. end while
As shown in Algorithm 2, the "Inner–Outer Loop" mechanism is introduced to achieve the above bilevel optimization [29]. $S$ denotes the number of Inner Loops inside each Outer Loop, while $l_{inner}$ and $l_{outer}$ represent the learning rates of the Inner Loop and the Outer Loop, respectively. $(nwp_j, p_j)$ represents the values of the NWP data and the corresponding solar power in the $j$-th Inner Loop. Each batch contains $bs_{MPT}$ training samples. The other notations have the same meaning as in Algorithm 1.
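The following sketch mirrors Algorithm 2 for the illustrative `SPFModel`, again assuming the hypothetical `sample_batch()` helper. Only the NWPEL (the Adaptive Learner) is updated in the Inner Loop, and the Outer Loop applies the Reptile step.

```python
def meta_pre_train(model, ssps, n=2, s=8, lr_inner=1e-3, lr_outer=0.7, n_epochs=50):
    """Sketch of Algorithm 2: Reptile applied to the Adaptive Learner only."""
    loss_fn = nn.MSELoss()
    for q in list(model.me.parameters()) + list(model.pol.parameters()):
        q.requires_grad_(False)                   # theta_TL stays fixed
    for _ in range(n_epochs):
        al0 = {k: v.detach().clone()              # current theta_AL
               for k, v in model.nwpel.state_dict().items()}
        deltas = []
        for ssp in random.sample(ssps, n):        # randomly select N SSPs
            model.nwpel.load_state_dict(al0)      # theta_i_AL <- theta_AL
            opt = torch.optim.Adam(model.nwpel.parameters(), lr=lr_inner)
            for _ in range(s):                    # Inner Loop over S batches
                nwp, p = ssp.sample_batch()
                loss = loss_fn(model(nwp), p)
                opt.zero_grad(); loss.backward(); opt.step()
            deltas.append({k: v.detach() - al0[k]
                           for k, v in model.nwpel.state_dict().items()})
        # Outer Loop: theta_AL <- theta_AL + (l_outer / N) * sum_i (theta_i_AL - theta_AL)
        new_al = {k: al0[k] + lr_outer / n * torch.stack([d[k] for d in deltas]).sum(0)
                  for k in al0}
        model.nwpel.load_state_dict(new_al)
```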

3.4. Fine-Tuning

After the pre-training with Algorithms 1 and 2, the Adaptive Learner will further be fine-tuned using the limited training data from the TSP itself. During this process, the parameters of the Transferable Learner will remain fixed. After fine-tuning, the final SPF model for the TSP is obtained.
The optimization objective of fine-tuning is to minimize the forecasting error of the model on the TSP; its detailed implementation is given in Algorithm 3, where $l_{FT}$ denotes the learning rate used in updating $\theta^{AL}$ and each batch contains $bs_{FT}$ training samples. The other notations have the same meaning as in Algorithm 1.
Algorithm 3 Fine-Tuning Algorithm
Require: $l_{FT}$, $\mathcal{L}$, $U$
1. while not done do
2.   Sample one batch of datapoints $(nwp, p)$ from the TSP
3.   Update $\theta^{AL} \leftarrow U(\mathcal{L}(f_{[\theta^{TL}, \theta^{AL}]}(nwp), p), l_{FT})$
4. end while
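A corresponding sketch of Algorithm 3, again assuming the hypothetical `sample_batch()` helper on the target plant:

```python
def fine_tune(model, tsp, lr_ft=1e-3, n_epochs=10):
    """Sketch of Algorithm 3: fine-tune only theta_AL on the TSP."""
    for q in list(model.me.parameters()) + list(model.pol.parameters()):
        q.requires_grad_(False)       # theta_TL remains fixed
    opt = torch.optim.Adam(model.nwpel.parameters(), lr=lr_ft)
    loss_fn = nn.MSELoss()
    for _ in range(n_epochs):         # "while not done" (Early Stopping in practice)
        nwp, p = tsp.sample_batch()   # one batch from the TSP
        loss = loss_fn(model(nwp), p)
        opt.zero_grad(); loss.backward(); opt.step()
```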

4. Case Study

4.1. Data Description

This paper utilizes the publicly available dataset of three SPPs from the Global Energy Forecasting Competition 2014 (GEFCom2014) [31] for the case study. The three SPPs are located in a region of Australia whose exact location is undisclosed. Each SPP contains NWP data and SP data with a time resolution of 1 h. Two years of data (from 1 April 2012 to 1 April 2014) are adopted in this case.
As for the NWP data, 12 meteorological variables are included: total column liquid water, total column ice water, surface pressure, relative humidity at 1000 mbar, total cloud cover, 10-metre U wind component, 10-metre V wind component, 2-metre temperature, surface solar rad down, surface thermal rad down, top net solar rad, and total precipitation. The forecasting horizon of the NWP is 24 h (1:00 today to 0:00 tomorrow). Before being used for model training, the original NWP data need to be normalized. Specifically, we use the Maximum–Minimum normalization method to pre-process the different NWP items. It is worth noting that, unlike the NWP items from the SSPs, which are normalized according to their own maximum and minimum values, the NWP items of the TSP are normalized according to the maximum and minimum values of the SSPs.
In addition, the SP data have been normalized to values between 0 and 1.
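A small NumPy sketch of this normalization scheme (the array layout is an illustrative assumption: rows are time steps, columns are the 12 NWP variables):

```python
import numpy as np

def normalize_nwp(nwp_tsp, nwp_ssps, eps=1e-8):
    """Max-min normalization; TSP items are scaled by the SSPs' extremes."""
    ssp_all = np.concatenate(nwp_ssps, axis=0)  # stack all SSP NWP records
    lo = ssp_all.min(axis=0)                    # per-variable minimum over the SSPs
    hi = ssp_all.max(axis=0)                    # per-variable maximum over the SSPs
    return (nwp_tsp - lo) / (hi - lo + eps)     # TSP scaled by SSP statistics
```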

4.2. Settings of SSP and TSP

The three SPPs in GEFCom2014 are marked as SPP1, SPP2, and SPP3. Each SPP is picked out once to be the TSP, and the remaining two SPPs are treated as SSPs, corresponding to one FSSPF setting. There are thus a total of three FSSPF settings in this case, detailed in Table 1.

4.3. Evaluation Metric

In order to evaluate the FSSPF performance of the proposed approach, the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE) are adopted as evaluation metrics, calculated as in (1) and (2), respectively.
$$\mathrm{RMSE} = \frac{1}{P_R}\sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(P_M^t - P_F^t\right)^2} \times 100\% \quad (1)$$
$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\frac{\left|P_M^t - P_F^t\right|}{P_R} \times 100\% \quad (2)$$
$P_R$ is the rated capacity of the SPP, whose value is 1 in this case. $P_M^t$ and $P_F^t$ are the measured power and the forecasted power at the $t$-th time step, respectively. $T$ is the total number of steps in the forecasting horizon, which is 24 in this case.
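For reference, a direct NumPy transcription of (1) and (2); `p_m` and `p_f` are assumed to be arrays of measured and forecasted power over the 24-step horizon.

```python
def rmse_percent(p_m, p_f, p_r=1.0):
    """Normalized RMSE in percent, Equation (1)."""
    return np.sqrt(np.mean(((p_m - p_f) / p_r) ** 2)) * 100.0

def mae_percent(p_m, p_f, p_r=1.0):
    """Normalized MAE in percent, Equation (2)."""
    return np.mean(np.abs(p_m - p_f) / p_r) * 100.0
```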

4.4. Evaluation Method

To comprehensively evaluate the forecasting performance of the proposed approach in different scenarios, the K-Fold Cross-Validation (KFCV) method is introduced. The schematic diagram of KFCV for FSSPF is shown in Figure 3.
Given the limited amount of training data, the total dataset is divided into K sub-datasets in chronological order, corresponding to different operational scenarios. Each sub-dataset is given one chance to be viewed as the training dataset, while the others are treated as testing datasets. There are a total of K FSSPF experiments, and the final evaluation results are the average over these experiments.
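A minimal sketch of this chronological K-fold scheme, assuming `samples` is a list of day-level samples in time order:

```python
def kfold_chrono_splits(samples, k):
    """Chronological K-fold for FSSPF: each fold serves once as the few-shot
    training set, and the remaining folds form the testing set."""
    fold = len(samples) // k
    for i in range(k):
        train = samples[i * fold:(i + 1) * fold]
        test = samples[:i * fold] + samples[(i + 1) * fold:]
        yield train, test
```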

4.5. Comparison Methods

In order to demonstrate the advantages of the proposed FTRA, we compare it with six other typical methods:
Baseline [23]: a supervised learning method. Models are trained from randomly initialized parameters using the limited training data from the TSP.
Upper-Bound: a supervised learning method. Models are trained from randomly initialized parameters using massive training data from the TSP.
TL-FT-Out [23]: a TL method. Models are first pre-trained with Algorithm 1, and then only the parameters of the POL are fine-tuned using the limited training data from the TSP.
TL-FT-All [22]: a TL method. Models are first pre-trained with Algorithm 1, and then all parameters are fine-tuned using the limited training data from the TSP.
TL-FT-Inp: a TL method. Models are first pre-trained with Algorithm 1, and then only the parameters of the NWPEL are fine-tuned using the limited training data from the TSP.
Reptile [29]: a Meta-Learning method. Models are first pre-trained with the Reptile algorithm using massive training data from the SSPs, and then all parameters are fine-tuned using the limited training data from the TSP.

4.6. Hyperparameters

The detailed hyperparameters of the proposed FTRA are described below:
Firstly, for the three adopted SPF models, the dimension of the hidden layer is 64. Secondly, $\mathcal{L}$ and $U$ are set to the L2 loss and Adam, respectively. Thirdly, Early Stopping [35] is adopted to terminate the training process in time; in particular, in Algorithms 1–3, 20% of the samples are randomly selected from the training dataset to act as the validation dataset, and the maximum tolerance to overfitting is $E$ epochs. The other hyperparameters vary with the respective conditions, as listed below:
| Model | Setting | Algorithm 1 ($N$, $l_{TPT}$, $bs_{TPT}$, $E$) | Algorithm 2 ($N$, $S$, $l_{inner}$, $l_{outer}$, $bs_{MPT}$, $E$) | Algorithm 3 ($l_{FT}$, $bs_{FT}$, $E$) |
|---|---|---|---|---|
| Transformer-based | S-1 | 2, 0.0001, 64, 50 | 2, 10, 0.001, 0.7, 16, 50 | 0.001, 4, 10 |
| Transformer-based | S-2 | 2, 0.0001, 64, 50 | 2, 6, 0.001, 0.7, 32, 50 | 0.01, 4, 10 |
| Transformer-based | S-3 | 2, 0.001, 64, 50 | 2, 8, 0.001, 1.0, 32, 50 | 0.00001, 4, 10 |
| LSTM-based | S-1 | 2, 0.0001, 64, 50 | 2, 10, 0.001, 0.7, 32, 50 | 0.001, 4, 10 |
| LSTM-based | S-2 | 2, 0.0001, 16, 50 | 2, 10, 0.001, 0.5, 32, 50 | 0.00001, 4, 10 |
| LSTM-based | S-3 | 2, 0.001, 64, 50 | 2, 8, 0.01, 0.7, 32, 50 | 0.00001, 4, 10 |
| GRU-based | S-1 | 2, 0.001, 64, 50 | 2, 4, 0.01, 0.7, 32, 50 | 0.0001, 4, 10 |
| GRU-based | S-2 | 2, 0.0001, 16, 50 | 2, 2, 0.01, 0.7, 16, 50 | 0.0001, 4, 10 |
| GRU-based | S-3 | 2, 0.0001, 16, 50 | 2, 8, 0.001, 0.5, 32, 50 | 0.00001, 8, 10 |

4.7. FSSPF Results

Using 10-day, 20-day, and 30-day training data, the FSSPF results of the proposed FTRA and the other comparison approaches are presented in Table 2 and Table 3.
From Table 2 and Table 3, it can be deduced that: (1) under the same amount of training data from the TSP, the proposed FTRA significantly improves the accuracy of FSSPF compared with the Baseline, illustrating the effectiveness of the proposed FTRA; (2) the forecasting accuracy of the proposed FTRA is, to varying degrees, better than that of the other comparison approaches across different models and different amounts of training data, demonstrating the effect of Feature Transfer and Rapid Adaptation; (3) as the length of the training data grows from 10-day to 20-day and 30-day, the forecasting accuracy of FTRA steadily improves, and its gap to the Upper-Bound is quite small: the RMSE of FTRA is only approximately 1.16% larger on average, and the MAE only about 0.94% larger, further illustrating the efficiency and superiority of the proposed FTRA.
The solar power curves forecasted by the different approaches over a specific period are shown in Figure 4. On the one hand, the forecasting performance of the Baseline is clearly subpar in the "valley" of the forecasted curve, and the performance of FTRA is noticeably superior to that of the Reptile algorithm in the "peak" of the curve, where solar power fluctuates greatly and the training samples are limited. On the other hand, the forecasting accuracy of FTRA is marginally higher than that of TL-FT-Inp when the power is falling or low, which confirms the results in Table 2 and Table 3. The forecasting curves illustrate that FTRA enhances the models' capacity to adapt to changing scenarios and makes it possible to build an accurate mapping from NWP to solar power, which raises the forecasting accuracy of FSSPF.

4.8. Computational Costs

This case is implemented with the PyTorch deep-learning library. The simulation computer is configured with an Intel Core i7-12700H processor and an NVIDIA GeForce RTX 3060 Laptop GPU, running the Windows 11 operating system.
The computational costs of neural networks can be divided into two parts: training and inference. Once the neural networks are well trained, their computational time in the inference phase is negligible, at less than 1 s. Hence, the main computational cost of the proposed FTRA lies in the training of the SPF models. The specific computation time of each algorithm in FTRA for the different scenarios is shown in Table 4. It can be concluded that the computation time of the proposed FTRA is small enough for practical applications.

5. Conclusions

For the purpose of developing accurate solar power forecasting models for newly built solar power plants with only a small amount of training data available, an approach to Feature Transfer and Rapid Adaptation (FTRA) is proposed in this paper. Building on the existing TL methods, the contributions of the proposed FTRA are reflected in two aspects:
(1) FTRA divides the adopted deep-learning-based SPF model into the Transferable Learner and the Adaptive Learner, which take charge of Feature Transfer and Rapid Adaptation, respectively.
(2) Through integrating TL and Reptile, the parameters of the Transferable Learner and the Adaptive Learner are assigned different pre-training and fine-tuning strategies.
By doing so, FTRA can transfer valuable knowledge from the source solar plants, while simultaneously achieving rapid adjustment to the designated target solar plant.
One publicly available solar power dataset (GEFCom2014) and three deep-learning SPF models have been adopted in our case study to validate the proposed FTRA. The results illustrate that the proposed FTRA is able to outperform the other state-of-the-art methods under different amounts of training data, different SPF models, and different "SSP–TSP" settings.
To further improve the proposed FTRA, some work can be done in the future. Firstly, meteorological factors such as solar radiation and air humidity are closely related to the changes in months and seasons; to further enhance the adaptability of the pre-trained models to different meteorological conditions, the segmentation of the original datasets deserves to be improved. Secondly, in the pre-training stage, the correlation of features between the SSPs and the TSP should be analyzed and incorporated into the parameter-updating process.

Author Contributions

Conceptualization, X.R. and Y.W.; methodology, Z.C. and F.C.; software, F.C.; validation, X.R. and J.Y.; formal analysis, Y.W., F.C. and Z.C.; investigation, Y.L., F.C. and J.Y.; data curation, Y.L. and F.C.; writing—original draft preparation, Y.L. and F.C.; writing—review and editing, F.C., J.Y. and Z.C.; visualization, F.C. and J.Y.; supervision, J.Y.; project administration, X.R., Y.W. and Z.C.; funding acquisition, X.R., Y.W. and Z.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the China Huaneng Group Technology Project "Key technology research and system development for the construction of a group-level intelligent operation and maintenance platform", Grant Number HNKJ21-H52.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available with reference number [31]. They can be downloaded through this link: https://www.sciencedirect.com/science/article/abs/pii/S0169207016000133.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

FSSPF	Few-Shot Solar Power Forecasting
FTRA	Feature Transfer and Rapid Adaptation
SP	Solar Power
SSP	Source Solar Plant
TSP	Target Solar Plant
SPF	Solar Power Forecasting
LSTM	Long Short-Term Memory Neural Network
GRU	Gated Recurrent Unit
TL	Transfer Learning
FSL	Few-Shot Learning
NWP	Numerical Weather Prediction
NWPEL	NWP Embedding Layer
ME	Meteorological Encoder
POL	Power Output Layer
FCNN	Fully Connected Neural Network
GEFCom2014	Global Energy Forecasting Competition 2014
RMSE	Root Mean Square Error
MAE	Mean Absolute Error
KFCV	K-Fold Cross-Validation

References

  1. Zhang, J.; Hao, Y.; Fan, R.; Wang, Z. An Ultra-Short-Term PV Power Forecasting Method for Changeable Weather Based on Clustering and Signal Decomposition. Energies 2023, 16, 3092.
  2. Wu, K.; Peng, X.; Li, Z.; Cui, W.; Yuan, H.; Lai, C.S.; Lai, L.L. A Short-Term Photovoltaic Power Forecasting Method Combining a Deep Learning Model with Trend Feature Extraction and Feature Selection. Energies 2022, 15, 5410.
  3. Marweni, M.; Hajji, M.; Mansouri, M.; Mimouni, M.F. Photovoltaic Power Forecasting Using Multiscale-Model-Based Machine Learning Techniques. Energies 2023, 16, 4696.
  4. Cantillo-Luna, S.; Moreno-Chuquen, R.; Celeita, D.; Anders, G. Deep and Machine Learning Models to Forecast Photovoltaic Power Generation. Energies 2023, 16, 4097.
  5. Wang, M.; Wang, P.; Zhang, T. Evidential Extreme Learning Machine Algorithm-Based Day-Ahead Photovoltaic Power Forecasting. Energies 2022, 15, 3882.
  6. Huang, H.; Zhu, Q.; Zhu, X.; Zhang, J. An Adaptive, Data-Driven Stacking Ensemble Learning Framework for the Short-Term Forecasting of Renewable Energy Generation. Energies 2023, 16, 1963.
  7. Alkhayat, G.; Mehmood, R. A Review and Taxonomy of Wind and Solar Energy Forecasting Methods Based on Deep Learning. Energy AI 2021, 4, 100060.
  8. Zhao, S.; Wu, Q.; Zhang, Y.; Wu, J.; Li, X.-A. An Asymmetric Bisquare Regression for Mixed Cyberattack-Resilient Load Forecasting. Expert Syst. Appl. 2022, 210, 118467.
  9. Yang, Y.; Zhou, H.; Wu, J.; Liu, C.-J.; Wang, Y.-G. A Novel Decompose-Cluster-Feedback Algorithm for Load Forecasting with Hierarchical Structure. Int. J. Electr. Power Energy Syst. 2022, 142, 108249.
  10. Sareen, K.; Panigrahi, B.K.; Shikhola, T.; Sharma, R. An Imputation and Decomposition Algorithms Based Integrated Approach with Bidirectional LSTM Neural Network for Wind Speed Prediction. Energy 2023, 278, 127799.
  11. Ji, L.; Fu, C.; Ju, Z.; Shi, Y.; Wu, S.; Tao, L. Short-Term Canyon Wind Speed Prediction Based on CNN—GRU Transfer Learning. Atmosphere 2022, 13, 813.
  12. Al-Ali, E.M.; Hajji, Y.; Said, Y.; Hleili, M.; Alanzi, A.M.; Laatar, A.H.; Atri, M. Solar Energy Production Forecasting Based on a Hybrid CNN-LSTM-Transformer Model. Mathematics 2023, 11, 676.
  13. Hu, J.; Li, H. A Transfer Learning-Based Scenario Generation Method for Stochastic Optimal Scheduling of Microgrid with Newly-Built Wind Farm. Renew. Energy 2022, 185, 1139–1151.
  14. Yang, Y.; Wang, Z.; Zhao, S.; Wu, J. An Integrated Federated Learning Algorithm for Short-Term Load Forecasting. Electr. Power Syst. Res. 2023, 214, 108830.
  15. Luo, X.; Zhang, D.; Zhu, X. Combining Transfer Learning and Constrained Long Short-Term Memory for Power Generation Forecasting of Newly-Constructed Photovoltaic Plants. Renew. Energy 2022, 185, 1062–1077.
  16. Wen, Q.; Sun, L.; Yang, F.; Song, X.; Gao, J.; Wang, X.; Xu, H. Time Series Data Augmentation for Deep Learning: A Survey. arXiv 2020, arXiv:2002.12478.
  17. Liu, L.-M.; Ren, X.-Y.; Zhang, F.; Gao, L.; Hao, B. Dual-Dimension Time-GGAN Data Augmentation Method for Improving the Performance of Deep Learning Models for PV Power Forecasting. Energy Rep. 2023, 9, 6419–6433.
  18. Kaya, M.; Bilge, H. Deep Metric Learning: A Survey. Symmetry 2019, 11, 1066.
  19. Mao, Y.; Fan, F. Ultra-short-term prediction of PV power based on similar days of Mahalanobis distance. Renew. Energy Resour. 2021, 2, 175–181.
  20. Sun, Q.; Liu, Y.; Chen, Z.; Chua, T.-S.; Schiele, B. Meta-Transfer Learning Through Hard Tasks. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1443–1456.
  21. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
  22. Yang, H.; Wang, W. Prediction of photovoltaic power generation based on LSTM and transfer learning digital twin. J. Phys. Conf. Ser. 2023, 2467, 012015.
  23. Miraftabzadeh, S.M.; Colombo, C.G.; Longo, M.; Foiadelli, F. A Day-Ahead Photovoltaic Power Prediction via Transfer Learning and Deep Neural Networks. Forecasting 2023, 5, 213–228.
  24. Hospedales, T.M.; Antoniou, A.; Micaelli, P.; Storkey, A.J. Meta-Learning in Neural Networks: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 1.
  25. Finn, C.; Abbeel, P.; Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
  26. Feng, Y.; Chen, J.; Xie, J.; Zhang, T.; Lv, H.; Pan, T. Meta-Learning as a Promising Approach for Few-Shot Cross-Domain Fault Diagnosis: Algorithms, Applications, and Prospects. Knowl.-Based Syst. 2022, 235, 107646.
  27. Liu, T.; Ma, X.; Li, S.; Li, X.; Zhang, C. A Stock Price Prediction Method Based on Meta-Learning and Variational Mode Decomposition. Knowl.-Based Syst. 2022, 252, 109324.
  28. Li, Y.; Zhang, S.; Hu, R.; Lu, N. A Meta-Learning Based Distribution System Load Forecasting Model Selection Framework. Appl. Energy 2021, 294, 116991.
  29. Nichol, A.; Achiam, J.; Schulman, J. On First-Order Meta-Learning Algorithms. arXiv 2018, arXiv:1803.02999.
  30. Yan, M.; Pan, Y. Meta-Learning for Compressed Language Model: A Multiple Choice Question Answering Study. Neurocomputing 2022, 487, 181–189.
  31. Hong, T.; Pinson, P.; Fan, S.; Zareipour, H.; Troccoli, A.; Hyndman, R.J. Probabilistic Energy Forecasting: Global Energy Forecasting Competition 2014 and Beyond. Int. J. Forecast. 2016, 32, 896–913.
  32. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 6000–6010.
  33. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
  34. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. In Proceedings of the NIPS 2014 Workshop on Deep Learning, Montreal, QC, Canada, 8–13 December 2014.
  35. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Online, 6–14 December 2021; pp. 1–12.
Figure 1. The diagram of the SPF models.
Figure 2. The pipeline of the proposed FTRA.
Figure 3. The diagram of the KFCV for FSSPF.
Figure 4. Solar power curves forecasted by different approaches. FSSPF setting: S-1; SPF model: Transformer-based; training set: 1 April 2012–10 April 2012 (20% of samples randomly selected as the validation set); testing set: 13 May 2012–17 May 2012.
Table 1. Detailed information on each FSSPF setting.

| Names | TSP | SSP |
|---|---|---|
| S-1 | SPP1 | SPP2, SPP3 |
| S-2 | SPP2 | SPP1, SPP3 |
| S-3 | SPP3 | SPP1, SPP2 |
Table 2. RMSE of the proposed FTRA and other comparison approaches.

| Models | Approaches | 10-Day (S-1/S-2/S-3/AVE +) | 20-Day (S-1/S-2/S-3/AVE) | 30-Day (S-1/S-2/S-3/AVE) |
|---|---|---|---|---|
| Transformer-based | Baseline [23] | 12.69/12.99/16.86/14.18 | 11.36/10.97/11.26/11.20 | 10.58/10.28/11.09/10.65 |
| | TL-FT-Out [23] | 9.78/9.16/9.82/9.59 | 9.26/8.62/9.35/9.08 | 9.09/8.54/9.27/8.97 |
| | TL-FT-All [22] | 8.79/8.93/9.03/8.92 | 8.16/8.17/8.56/8.30 | 7.94/8.14/8.34/8.14 |
| | TL-FT-Inp | 8.22/9.06/9.54/8.94 | 7.61/8.35/9.12/8.36 | 7.40/8.24/8.95/8.20 |
| | Reptile [29] | 8.61/9.23/8.97/8.94 | 8.28/8.53/8.48/8.43 | 8.22/8.32/8.31/8.28 |
| | FTRA (Ours) | 8.01/8.81/8.43/8.42 | 7.74/8.07/8.21/8.01 | 7.49/8.08/8.16/7.91 |
| | Upper-Bound * | 6.57/7.01/7.38/6.99 | 6.57/7.01/7.38/6.99 | 6.57/7.01/7.38/6.99 |
| LSTM-based | Baseline [23] | 16.50/16.53/17.04/16.69 | 13.50/12.98/13.97/13.48 | 12.42/12.14/12.89/12.48 |
| | TL-FT-Out [23] | 9.56/9.46/8.90/9.31 | 8.91/8.80/10.06/9.26 | 8.67/8.64/9.79/9.03 |
| | TL-FT-All [22] | 8.98/8.85/9.28/9.04 | 8.40/8.18/8.71/8.43 | 8.23/8.11/8.44/8.26 |
| | TL-FT-Inp | 8.61/8.90/9.49/9.00 | 7.98/8.23/8.80/8.34 | 7.76/8.22/8.60/8.19 |
| | Reptile [29] | 8.27/9.23/8.80/8.77 | 7.89/8.83/8.65/8.46 | 7.70/8.66/8.30/8.22 |
| | FTRA (Ours) | 8.28/8.70/8.38/8.45 | 8.02/8.09/8.13/8.08 | 7.92/7.98/7.95/7.95 |
| | Upper-Bound | 6.52/6.95/7.38/6.95 | 6.52/6.95/7.38/6.95 | 6.52/6.95/7.38/6.95 |
| GRU-based | Baseline [23] | 15.97/16.44/16.86/16.42 | 12.79/12.55/13.68/13.01 | 11.73/11.41/12.58/11.91 |
| | TL-FT-Out [23] | 9.89/9.41/11.34/10.21 | 9.31/8.71/10.26/9.43 | 9.09/8.55/9.72/9.12 |
| | TL-FT-All [22] | 9.04/8.92/9.41/9.12 | 8.57/8.31/8.78/8.55 | 8.41/8.20/8.51/8.37 |
| | TL-FT-Inp | 8.40/8.77/9.68/8.95 | 7.63/8.29/9.15/8.36 | 7.51/8.19/8.91/8.20 |
| | Reptile [29] | 8.27/8.72/8.73/8.57 | 7.80/8.23/8.43/8.15 | 7.67/8.12/8.25/8.01 |
| | FTRA (Ours) | 8.05/8.58/8.44/8.36 | 7.61/7.99/8.23/7.94 | 7.52/7.99/8.18/7.90 |
| | Upper-Bound | 6.51/6.96/7.28/6.92 | 6.51/6.96/7.28/6.92 | 6.51/6.96/7.28/6.92 |

* In Upper-Bound, the amount of training data is 23 months. + AVE represents the average results of the three FSSPF settings.
Table 3. MAE of the proposed FTRA and other comparison approaches.

| Models | Approaches | 10-Day (S-1/S-2/S-3/AVE +) | 20-Day (S-1/S-2/S-3/AVE) | 30-Day (S-1/S-2/S-3/AVE) |
|---|---|---|---|---|
| Transformer-based | Baseline [23] | 8.19/8.78/9.10/8.69 | 7.10/7.20/7.27/7.19 | 6.39/6.55/6.82/6.59 |
| | TL-FT-Out [23] | 5.97/5.75/6.11/5.94 | 5.47/5.39/5.82/5.56 | 5.31/5.13/5.75/5.40 |
| | TL-FT-All [22] | 5.29/5.34/5.29/5.31 | 4.81/4.76/4.95/4.84 | 4.54/4.68/4.81/4.68 |
| | TL-FT-Inp | 4.55/5.50/5.71/5.25 | 4.14/4.94/5.31/4.80 | 3.98/4.82/5.20/4.67 |
| | Reptile [29] | 4.89/5.51/5.19/5.20 | 4.68/5.04/4.84/4.85 | 4.67/4.82/4.78/4.76 |
| | FTRA (Ours) | 4.36/5.37/4.98/4.90 | 4.23/4.79/4.84/4.62 | 4.04/4.75/4.80/4.53 |
| | Upper-Bound * | 3.41/3.83/4.07/3.77 | 3.41/3.83/4.07/3.77 | 3.41/3.83/4.07/3.77 |
| LSTM-based | Baseline [23] | 11.77/12.13/12.64/12.18 | 9.52/9.09/10.17/9.59 | 8.12/8.31/9.07/8.50 |
| | TL-FT-Out [23] | 5.57/5.82/5.76/5.72 | 5.33/5.52/6.41/5.75 | 5.08/5.28/6.22/5.53 |
| | TL-FT-All [22] | 5.06/5.24/5.55/5.28 | 4.68/4.75/5.11/4.85 | 4.59/4.67/4.95/4.74 |
| | TL-FT-Inp | 4.83/5.30/5.69/5.27 | 4.42/4.96/5.25/4.88 | 4.30/4.80/5.12/4.74 |
| | Reptile [29] | 4.65/5.63/5.20/5.16 | 4.41/5.28/5.10/4.93 | 4.29/5.13/4.89/4.77 |
| | FTRA (Ours) | 4.69/5.28/4.78/4.92 | 4.50/4.98/4.66/4.71 | 4.43/4.75/4.58/4.59 |
| | Upper-Bound | 3.45/3.82/4.05/3.77 | 3.45/3.82/4.05/3.77 | 3.45/3.82/4.05/3.77 |
| GRU-based | Baseline [23] | 10.80/12.17/12.44/11.80 | 8.28/9.04/10.01/9.11 | 7.34/7.79/8.68/7.94 |
| | TL-FT-Out [23] | 6.16/5.87/7.12/6.38 | 5.45/5.31/6.29/5.68 | 5.31/5.16/5.87/5.45 |
| | TL-FT-All [22] | 5.24/5.35/5.61/5.40 | 4.93/4.86/5.15/4.98 | 4.82/4.73/5.12/4.89 |
| | TL-FT-Inp | 4.68/5.23/5.87/5.26 | 4.19/4.85/5.45/4.83 | 4.11/4.72/5.33/4.72 |
| | Reptile [29] | 4.71/5.22/5.32/5.08 | 4.40/4.87/5.10/4.79 | 4.32/4.75/4.93/4.67 |
| | FTRA (Ours) | 4.55/5.27/4.87/4.90 | 4.23/4.84/4.78/4.62 | 4.18/4.80/4.76/4.58 |
| | Upper-Bound | 3.42/3.83/3.99/3.75 | 3.42/3.83/3.99/3.75 | 3.42/3.83/3.99/3.75 |

* In Upper-Bound, the amount of training data is 23 months. + AVE represents the average results of the three FSSPF settings.
Table 4. Computation time of the different steps in the proposed FTRA (unit: s).

| Settings | Models | Algorithm 1 | Algorithm 2 | Algorithm 3 * (10-Day/20-Day/30-Day) |
|---|---|---|---|---|
| S-1 | Transformer-based | 54 | 16 | 6/10/15 |
| S-1 | LSTM-based | 75 | 9 | 12/14/18 |
| S-1 | GRU-based | 14 | 7 | 8/14/15 |
| S-2 | Transformer-based | 75 | 17 | 15/20/23 |
| S-2 | LSTM-based | 100 | 41 | 22/28/31 |
| S-2 | GRU-based | 86 | 3 | 18/21/32 |
| S-3 | Transformer-based | 18 | 22 | 4/9/13 |
| S-3 | LSTM-based | 12 | 9 | 6/15/21 |
| S-3 | GRU-based | 118 | 13 | 10/13/17 |

* The computation time of Algorithm 3 is the average over one fold of the KFCV.
