Article

TransPVP: A Transformer-Based Method for Ultra-Short-Term Photovoltaic Power Forecasting

1 Electric Power Science Research Institute, Guangdong Power Grid Limited Liability Company, Guangzhou 510062, China
2 School of Power and Mechanical Engineering, Wuhan University, Wuhan 430072, China
3 Qingyuan Yingde Power Supply Bureau, Guangdong Power Grid Limited Liability Company, Guangzhou 513000, China
4 Foshan Shunde Power Supply Bureau, Guangdong Power Grid Limited Liability Company, Guangzhou 528300, China
* Author to whom correspondence should be addressed.
Energies 2024, 17(17), 4426; https://doi.org/10.3390/en17174426
Submission received: 15 July 2024 / Revised: 11 August 2024 / Accepted: 27 August 2024 / Published: 4 September 2024
(This article belongs to the Section A2: Solar Energy and Photovoltaic Systems)

Abstract

The increasing adoption of renewable energy, particularly photovoltaic (PV) power, has highlighted the importance of accurate PV power forecasting. Despite advances driven by deep learning (DL), significant challenges remain, particularly in capturing the long-term dependencies essential for accurate forecasting. This study presents TransPVP, a novel transformer-based methodology that addresses these challenges and advances PV power forecasting. TransPVP employs a deep fusion technique alongside a multi-task joint learning framework, effectively integrating heterogeneous data sources and capturing long-term dependencies. This innovative approach enhances the model’s ability to detect patterns of PV power variation, surpassing the capabilities of traditional models. The effectiveness of TransPVP was rigorously evaluated using real data from a PV power plant. Experimental results showed that TransPVP significantly outperformed established baseline models on key performance metrics including RMSE, R2, and CC, underscoring its accuracy, predictive power, and reliability in practical forecasting scenarios.

1. Introduction

The International Energy Agency (IEA) reports that PV energy is growing rapidly, with global installed PV capacity reaching 1177.3 GW in 2022 [1]. Figure 1 illustrates the historical and current trends of various renewable energy sources. The emergence of solar photovoltaic (PV) power generation is a significant development in the global energy landscape, offering a sustainable, silent, and cost-effective alternative to traditional electricity sources. In recent years, there has been a significant increase in PV production. However, the rapid growth of solar energy presents operational challenges due to its high variability and low short-term predictability [2]. This variability is caused by fluctuations in meteorological conditions, such as solar irradiance and temperature, and poses complex challenges for balancing supply and demand in real time and ensuring grid stability.
In the context of smart grids, PV forecasts are necessary to manage distribution networks, microgrids, or smart homes. As a result, a large amount of research has appeared in the literature. According to their specific forecast horizon, these tasks can be classified as belonging to one of four categories [3,4]: ultra-short-term prediction (few seconds to 30 min), short-term prediction (30 min to 6 h), medium-term prediction (6 h to 1 day), and long-term prediction (longer than 1 day).
Ultra-short-term photovoltaic (PV) production forecasting is crucial in this context, especially for maintaining grid stability and ensuring efficient operations [5]. Ultra-short-term forecasting, like the 5-min interval used in this paper, provides the high prediction accuracy required for immediate adjustments. This type of forecasting is particularly useful for grids with highly flexible generation resources such as fast-ramping natural gas plants and battery storage systems. For grids with these resources, a 5-min forecasting interval is sufficient to make quick adjustments based on frequent updates, ensuring an efficient and stable operation.
Recognizing the crucial importance of accurate short-term PV forecasting, the last decade has seen a surge in research efforts dedicated to this field. Earlier approaches to PV output forecasting were typically based on shallow artificial neural networks (ANNs) that had limited predictive accuracy due to their simplistic architecture. In recent years, there has been growing interest among researchers and academics in using deep learning (DL) to improve the prediction accuracy. This is due to the availability of large amounts of data collected worldwide and the advent of supercomputers.
In a recent work [6], the authors reviewed the topic of PV forecasting. These methods can broadly be classified into three categories: methods based on long short-term memory (LSTM), methods based on the gated recurrent unit (GRU), and methods based on convolutional neural networks (CNNs).
In a large portion of the literature, PV power forecasting is approached as a regression problem based on time series. To this end, LSTM networks, a special class of recurrent neural networks (RNNs), are extensively employed. Wen et al. [7] used an LSTM model to predict hourly PV production based on schedule, weather, and timescale variables. Abdel-Nasser and Mahmoud [8] proposed a photovoltaic power forecasting model based on LSTM that considered the temporal changes, while Hossain, Mohammad, and Hisham [9] proposed multi-task prediction models for joint weather forecasting and PV power prediction based on LSTM.
The gated recurrent unit (GRU), which is a simplified version of the LSTM with fewer gates, has shown comparable performance in PV power forecasting tasks. Abdel-Basset et al. [10] enhanced the standard GRU cell architecture by using dilated convolutions and residual connections. This modification allowed the network to more effectively capture the spatial-temporal dynamics inherent in the photovoltaic data, thus improving its predictive accuracy for solar energy output. Rai et al. [11] proposed a GRU-based auto-encoder, while Pengyun Jia et al. [12] proposed a hybrid model (VMD-ISSA-GRU) based on the GRU. Yeming Dai et al. [13] introduced the repeat vector layer and time distributed layer into the GRU model. Xiuli Xiang et al. [14] introduced channel attention into the GRU model to improve the prediction accuracy.
Convolutional neural networks (CNNs) are a category of deep learning algorithms particularly well-suited for processing data with a grid-like topology such as images (which are two-dimensional grids of pixels) and time-series data (one-dimensional grids). Based on CNNs and the encoder–decoder, Jurado et al. [15] proposed an improved model for probabilistic PV forecasting, while Suresh et al. [16] used CNNs with a sliding window algorithm for PV output forecasting.
With the development of modeling studies, PV forecasting tends to use hybrid models. Lee et al. [17] proposed a model that combined CNN and LSTM networks for day-ahead PV production forecasting using the previous day’s PV power observations. Wang et al. [18] also proposed a hybrid CNN and LSTM model and compared its performance over pure LSTM and CNN models. Ali Agga et al. [19] introduced two more LSTM layers, which improved the accuracy of the overall predictions. Wencheng Liu and Zhizhong Mao [20] introduced a hybrid model that integrated an attention mechanism with the convolutional neural network (CNN) and bidirectional long short-term memory network (BiLSTM). Zheng et al. [21] applied LSTM and a particle swarm optimization algorithm to successfully predict the PV power. Agga et al. [22] proposed a method based on hybrid CNN-LSTM and ConvLSTM models to enhance the accuracy of power production forecasts.
Despite the advancements facilitated by DL, challenges persist, particularly in capturing long-term dependencies that are crucial for accurate forecasting. This limitation is illustrated by the LSTM-based model proposed by Shi et al. [23]. Although the model innovatively uses LSTM for solar forecasting, it struggles with long-term data sequences. This gap underscores the necessity for novel approaches that can effectively capture these dependencies.
As a result, Phan et al. [24] proposed a transformer-based solution. Unfortunately, the details of the input and output data were not elaborated in their paper, making the proposed method and experimental conclusions difficult to replicate. Tian et al. [25] and Guo et al. [26] incorporated prior knowledge that the output data should be positive within the transformer framework. However, they did not fully utilize historical photovoltaic power data. Furthermore, they adopted Vaswani’s position encoding method [27] to represent the position of weather features (or historical energy yield) in the input vector. Unlike tokens in a sentence, the input in our task consists of structured data, where the position of an observation in the input vector is less important than the time of the observation.
In order to further improve the PV power prediction performance, this study introduces TransPVP, a transformer-based method for removing the limitations of traditional DL models. TransPVP integrates a deep fusion technique for amalgamating heterogeneous data sources, inspired by Vaswani et al.’s work on transformers [27], with a multi-task joint learning framework. This paper proposes a novel approach to enhance the model’s comprehension of PV power variation patterns, addressing the crucial gap in capturing long-term dependencies from historical data.
The main contributions of this paper can be summarized as follows:
(1)
A deep fusion method of heterogeneous information from multiple sources is proposed for photovoltaic power forecasting.
(2)
A transformer-based multi-task joint learning framework is proposed to deepen the model’s understanding of photovoltaic power change patterns.
(3)
Rigorous evaluations were conducted using real-world datasets. The exhaustive experiments on the datasets demonstrate that TransPVP outperforms the baselines in terms of prediction accuracy.
The remainder of this paper is organized as follows. In Section 2, the proposed method is described. In Section 3, the experiments are conducted, and the results are discussed. Finally, the conclusions and future work are given in Section 4.

2. Proposed Methodology

This section introduces the TransPVP model, which fuses multiple sources of information, and hence enhances the prediction performance. Figure 2 provides a schematic overview of the model architecture. This framework consists of two main modules, encoding and decoding. Their core components are both multi-head attention models. However, due to the differences in input data, the kernels used in the multi-head attention models in different modules are different. In the figure, A represents the multi-head self-attention model [27], and B represents the multi-head cross-attention model [28].

2.1. Training Data

Table 1 shows the specific observations and their units. In Table 1, GHR, A-Power, Humidity, and W-Speed are abbreviations for global horizontal radiation, active power, weather relative humidity, and wind speed, respectively. Although rain can have a significant positive effect on the energy output of photovoltaic systems, the benefits are mainly due to thermal and optical factors [29]. It is important to note that rainfall has little correlation with PV power generation for clean solar panels and can be considered as noise [30]; adding it to the training data can therefore degrade the model’s performance. Time is measured as the number of minutes elapsed since 00:00 on 1 January 1970 (the Unix epoch).
The task of predicting PV power generation can be informed by two principal categories of data: observational values of determinant factors and historical electricity generation data. The first category encompasses measured or forecasted environmental and operational parameters that directly influence the PV output such as solar irradiance and temperature. The second category comprises historical records of electricity produced by the PV system, which are essential for understanding the system’s performance over time and under varying conditions. By integrating these two data streams, predictive models can more accurately forecast PV power generation, leveraging historical trends and real-time or predicted environmental conditions.
The architecture adopts a multi-input and multi-output model. At time t, the input data consist of two parts, one is the observational values v of determinant factors before time t, and the other is the historical electricity generation data p before time t. The output data are the PV power p after moment t. The input–output relationship of the data can be formulated as follows:
p_{t+1}, p_{t+2}, \ldots, p_{t+m_1} = f\!\left(p_t, p_{t-1}, \ldots, p_{t-m_2};\; v_t, v_{t-1}, \ldots, v_{t-m_3}\right).
Here, m_1 and m_2 are the sizes of the observation windows of the output and input power, respectively, and m_3 is the size of the observation window of the observational values v. Note that v is a vector, not a scalar.
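As an illustration, the windowing relation above can be sketched in Python. The helper name, the window sizes, and the toy series below are illustrative choices, not the paper's settings:

```python
import numpy as np

def make_windows(p, v, m1=3, m2=6, m3=6):
    """Build (input, target) pairs: predict the next m1 power values
    from the last m2 power values and the last m3 observation vectors.
    Window sizes here are illustrative, not the paper's settings."""
    X_p, X_v, Y = [], [], []
    start = max(m2, m3) - 1
    for t in range(start, len(p) - m1):
        X_p.append(p[t - m2 + 1 : t + 1])   # last m2 power values up to p_t
        X_v.append(v[t - m3 + 1 : t + 1])   # last m3 observation vectors up to v_t
        Y.append(p[t + 1 : t + 1 + m1])     # p_{t+1}, ..., p_{t+m1}
    return np.array(X_p), np.array(X_v), np.array(Y)

# toy series: scalar power and 2-dimensional observations per time step
p = np.arange(20, dtype=float)
v = np.stack([np.arange(20), np.arange(20) * 0.1], axis=1)
X_p, X_v, Y = make_windows(p, v)
```

Each training sample thus pairs a history of power and observations with the future power values the model must predict.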

2.2. Data Preprocessing

(1)
NORMALIZATION
The normalization of data is crucial to prevent memory overflow when input and output data have different physical quantities with large differences in their value domains. The MinMaxScaler method is used for normalization, which is calculated as follows:
X_N = \frac{X - X_{\min}}{X_{\max} - X_{\min}},
where X_{\min} and X_{\max} are the minimum and maximum of X, respectively. Its inverse transformation can be expressed as
X = X_N \left( X_{\max} - X_{\min} \right) + X_{\min}.
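A minimal sketch of the min–max normalization and its inverse described above, using illustrative data:

```python
import numpy as np

def minmax_scale(X):
    """Scale X to [0, 1]; also return (min, max) for the inverse transform."""
    x_min, x_max = X.min(), X.max()
    return (X - x_min) / (x_max - x_min), x_min, x_max

def minmax_inverse(X_n, x_min, x_max):
    """Undo the scaling, recovering the original physical quantities."""
    return X_n * (x_max - x_min) + x_min

X = np.array([2.0, 4.0, 6.0, 10.0])   # toy data, arbitrary units
X_n, lo, hi = minmax_scale(X)
X_back = minmax_inverse(X_n, lo, hi)
```

The inverse transform is applied to the model's outputs to map predictions back to physical power values.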
(2)
OUTLIER REMOVAL
To improve the model’s performance, outliers were removed using the local outlier factor (LOF) method [31], which is based on the concept of local density. The density is estimated using the distance of the k nearest neighbors. Regions of similar density and outliers can be identified by comparing the local density of an object to the local densities of its neighbors.
Specifically, a LOF score of approximately 1 indicates that a data point has a similar density compared to its k nearest neighbors, suggesting that the point is located in a region of uniform density and is therefore likely to be considered normal within its local context.
A LOF score less than 1 indicates that the data point is situated in a higher density region compared to its neighbors. Such points are typically regarded as inliers, as they reside within denser clusters of the data distribution.
Conversely, a LOF score greater than 1 suggests that the data point is in a region of lower density relative to its neighbors, marking it as an outlier. These points are of particular interest as they can signify anomalies or exceptions in the dataset, deviating significantly from the local densities of their surrounding points.
Here, the LOF of an object p is defined as
LOF(p) = \frac{\sum_{o \in K_N(p)} d_l(o)}{\left| N_k(p) \right| \, d_l(p)},
where \left| N_k(p) \right| denotes the cardinality of the k-nearest-neighbor set K_N(p) of the object p. It is worth noting that this cardinality may not equal k when multiple points lie at the same distance as the k-th nearest neighbor, or when there are fewer than k neighbors. Here, d_l(p) is the local reachability density of an object p, defined as
d_l(p) = \frac{\left| N_k(p) \right|}{\sum_{o \in K_N(p)} D_r(p, o)}.
Here, D_r(p, o) is the reachability distance of object p with respect to object o, defined as
D_r(p, o) = \max\!\left( d_k(o),\; d(p, o) \right),
where d_k(o) is the distance from object o to its k-th nearest neighbor and d(p, o) is the distance between objects p and o.
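The LOF computation above can be sketched in plain NumPy. The function name, the choice k = 3, and the toy points are illustrative; for simplicity, this sketch takes |N_k(p)| = k and does not implement the special handling of distance ties noted above:

```python
import numpy as np

def lof_scores(X, k=3):
    """Local outlier factor: a score near 1 is normal, well above 1
    flags an outlier. Plain O(n^2) sketch without tie handling."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                  # exclude self-distance
    knn = np.argsort(D, axis=1)[:, :k]           # k nearest neighbors of each point
    d_k = np.take_along_axis(D, knn[:, -1:], axis=1).ravel()  # k-th NN distance
    lrd = np.empty(n)
    for p in range(n):
        # reachability distance D_r(p, o) = max(d_k(o), d(p, o))
        reach = np.maximum(d_k[knn[p]], D[p, knn[p]])
        lrd[p] = k / reach.sum()                 # local reachability density
    # LOF(p): average neighbor density relative to p's own density
    return np.array([lrd[knn[p]].sum() / (k * lrd[p]) for p in range(n)])

# tight cluster of four points plus one far-away point
X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.1, 0.1], [5.0, 5.0]])
scores = lof_scores(X, k=3)
```

On this toy data, the four clustered points score close to 1 while the isolated point scores far above 1 and would be removed as an outlier.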

2.3. Multi-Head Attention Model

To leverage the sequential nature of the data, it is essential to provide the model with information regarding the relative or absolute positions of elements within the sequence. This is effectively accomplished by employing positional encoding using sine and cosine functions [27]. The formula used is
TE(p, i) =
\begin{cases}
\sin\!\left(\dfrac{p}{10000^{\,i/d_m}}\right) & \text{if } i \bmod 2 = 0, \\
\cos\!\left(\dfrac{p}{10000^{\,(i-1)/d_m}}\right) & \text{if } i \bmod 2 = 1.
\end{cases}
Note that, unlike natural language, in PV power generation the time of an observation matters more than its position in the input sequence. For this reason, p here represents the time of a node within a year rather than a position, while i denotes the position of an element within the encoding. The encoding length is denoted by d_m.
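A minimal sketch of this time-based sinusoidal encoding, with an illustrative encoding length and an arbitrary time value:

```python
import numpy as np

def time_encoding(minute_of_year, d_m=8):
    """Sinusoidal encoding of a time index (not a sequence position):
    even dimensions use sin, odd dimensions use cos at the frequency
    of the preceding even dimension. d_m is illustrative."""
    i = np.arange(d_m)
    # exponent is i for even dimensions and i - 1 for odd dimensions
    freq = 10000.0 ** (np.where(i % 2 == 0, i, i - 1) / d_m)
    angle = minute_of_year / freq
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

te = time_encoding(1440.0)   # e.g. end of the first day of the year
```

Because the argument is a time value rather than a sequence index, two samples taken at the same time of year receive the same encoding regardless of where they sit in the input window.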
The multi-head attention (MHA) mechanism enhances the model’s capability to process information from distinct representational subspaces at various positions. This is achieved by calculating attention scores within each attention head through a scaled dot-product operation of queries and keys. This mechanism allows the model to focus on relevant parts of the input sequence, adapting dynamically to the different aspects of the data it processes. The detailed computational procedure is depicted in Figure 3. Mathematically, MHA can be expressed as:
H = \mathrm{concat}\!\left(h_1, h_2, \ldots, h_k\right) W^O, \quad k \in \{8, 9, \ldots, 12\},
h_i = \mathrm{softmax}\!\left(\frac{(\alpha + TE)\, W_i^Q \left((\alpha + TE)\, W_i^K\right)^{T}}{\sqrt{d_k}}\right) (\alpha + TE)\, W_i^V.
Following the MHA, a feed-forward network (FFN) is employed. The FFN comprises two linear transformations with a ReLU activation function; in other words,
\gamma = \max\!\left(0,\; H W_A + b_A\right) W_B + b_B.
The loss function is defined by
L = \sum_{i=1}^{M} \left( p_i - \hat{p}_i \right)^2,
where \hat{p}_i and p_i denote the predicted power value and the corresponding ground truth, respectively. Note that p_i refers to the i-th element of the vector p.

3. Experiment Setups

3.1. Dataset

The experimental data for predicting PV power generation came from observations of power generation from solar panels over a period of up to one year. The data, consisting of irradiance measurements and meteorological predictions, were collected at 5-min intervals. After removing the outliers, the irradiance data were downsampled to 1-h intervals for display in Figure 4. To increase the generalizability of the conclusions of this study, electric power was measured as the amount generated per unit area of the solar panel. The data were divided into two parts: 80% for training and 20% for testing.
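For time series, the 80/20 split described above is taken chronologically rather than at random, so the test set follows the training set in time. A minimal sketch (the series length is illustrative):

```python
import numpy as np

# stand-in for one year of 5-min samples; real data would be loaded here
series = np.arange(100, dtype=float)

# chronological 80/20 split: no shuffling, test data strictly after training data
cut = int(0.8 * len(series))
train, test = series[:cut], series[cut:]
```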
High temperatures increase the recombination rate of charge carriers in the cell, which reduces the voltage output of the cell and hence decreases the efficiency of PV cells [32]. Therefore, temperature has an important effect on the performance of photovoltaic cells. The temperature after the removal of outliers is shown in Figure 5.
High atmospheric pressure typically leads to clearer skies and increased solar irradiance [33], which can enhance the performance of solar panels by providing more solar energy for conversion into electricity. Figure 6 displays the atmospheric pressure after removing the outliers.
High relative humidity is often associated with increased cloud cover [34], which reduces solar irradiance and thus the output of solar panels. Moreover, higher relative humidity typically correlates with cooler temperatures [35]. For this reason, relative humidity provides important clues for PV power prediction. The relative humidity after the removal of outliers is shown in Figure 7.

3.2. Baselines

(1)
LSTM
Long short-term memory (LSTM) networks, a more advanced type of RNN, were specifically developed to tackle the issues of vanishing and exploding gradients that are inherent in traditional RNNs. Introduced by Hochreiter and Schmidhuber in 1997 [15], LSTMs include specialized units called memory cells, which allow the network to learn when to forget previous information and when to update the hidden state with new information. LSTMs have been demonstrated to be effective in learning long-term dependencies and have been shown to outperform traditional neural networks in terms of the prediction accuracy of PV power [4].
(2)
Bidirectional LSTM (BiLSTM)
Bidirectional long short-term memory (BiLSTM) networks extend the LSTM architecture by processing the data in both forward and backward directions [36]. This approach allows the network to have both past and future context at any point in the sequence, thereby providing a richer representation of the sequence data. BiLSTMs achieve this by maintaining two separate hidden layers, one for the forward pass and one for the backward pass, whose outputs are combined at each time step. BiLSTMs are particularly useful in applications where the entire sequence is essential for making accurate predictions. It has been proved that BiLSTM outperforms LSTM in PV prediction performance [37].
(3)
Gated Recurrent Unit (GRU)
Gated recurrent units (GRUs) are a variant of the traditional RNN architecture, which simplify the LSTM design by combining the forget and input gates into a single “update gate” while also merging the cell state and hidden state [38]. This results in a more streamlined architecture that requires fewer parameters and computational resources compared to LSTMs. GRUs comprise two gates: the update gate, which determines the extent to which past information should be conveyed to the future, and the reset gate, which determines the extent to which information from the past is forgotten. This architecture has been proven to be effective in various sequence learning tasks.
(4)
Bidirectional GRU (BiGRU)
The Bidirectional GRU (BiGRU) extends the GRU architecture by applying it in both forward and reverse directions over the input sequence [36]. Like BiLSTMs, BiGRUs process data with two separate GRUs, one operating in a forward direction and the other in a reverse direction. This allows the network to capture information from both the past and future contexts at every point in the sequence. The outputs of the two GRUs are typically concatenated at each time step, enhancing the model’s capacity to understand complex patterns in the data. BiGRUs are particularly advantageous in tasks where understanding the entire sequence improves the performance.
(5)
Stacked LSTM
A stacked LSTM model is an advanced version of the basic LSTM model. In a stacked LSTM model, multiple layers of LSTM units are “stacked” on top of each other, with the output sequence of one layer forming the input sequence for the next [39]. This architecture allows the model to learn at various levels of abstraction and complexity. Stacked LSTMs offer a powerful tool for sequence learning tasks, providing a way to model complex dependencies and relationships in data over long sequences. Stacked LSTMs outperform BiLSTM in PV prediction performance [37].
(6)
CNN-LSTM
The CNN-LSTM architecture combines the strengths of CNN and LSTM networks to process data that contain both spatial features and temporal sequences. The CNN-LSTM architecture is a powerful tool for complex tasks, providing a way to capture intricate patterns and dynamics that many models struggle with [40]. As with Lim’s model [41], a one-dimensional convolutional network was employed in the experiment, which was then connected to an LSTM network. As Bi-LSTM performed the best among all models [37], LSTM was substituted with Bi-LSTM in the experiment.
(7)
Transformer
The transformer model represents a significant advancement in sequence learning tasks, particularly in its ability to handle long-range dependencies with greater efficiency compared to traditional models like LSTM or BiLSTM. The transformer model was originally proposed for natural language processing tasks but has been adapted for photovoltaic (PV) power prediction by researchers such as Phan et al. [24] and Tian et al. [25]. However, since Phan et al. [24] did not provide specific implementation details, we implemented the transformer model in accordance with the methodology described in the work by Tian et al. [25].
A more detailed description of the above-mentioned methods can be found in Refs. [4,37].

3.3. Evaluation Metrics

There are many indicators for the evaluation of PV forecasts, and different combinations of indicators have been chosen in different studies [42,43,44]. Some of the indicators evaluate the same content, only in different forms of expression. In this experiment, five evaluation metrics were selected from among them with a view to reflecting different aspects of model performance. Let y and ŷ denote the measured and predicted values; the metrics are specified as follows.
(1)
MAE (Mean Absolute Error)
MAE is the average magnitude of the absolute differences between the predicted values and the actual observations. The MAE is an intuitive metric because it provides a simple average of the error magnitudes across all predictions [45]. A lower MAE value indicates a better fit. The MAE can be calculated by
MAE = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|.
(2)
MAPE (Mean Absolute Percentage Error)
MAPE measures the average of the absolute percentage errors, offering a view of prediction accuracy in a percentage term [46]. This is easy to interpret since it provides errors in percentage terms. However, it can be infinite or undefined if there are actual values equal to zero and is sensitive to small denominators. MAPE can be calculated by
MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%.
(3)
MBE (Mean Bias Error)
MBE calculates the average bias in the predictions by subtracting the actual values from the predicted values and averaging the result. It assesses the average tendency of the predicted values to be larger or smaller than their actual counterparts. MBE can indicate the model’s systematic error. A positive value suggests a tendency to overestimate, while a negative value indicates a tendency to underestimate [47]. However, it does not reflect the magnitude of the error. MBE can be calculated by
MBE = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right).
(4)
RMSE (Root Mean Squared Error)
RMSE measures the standard deviation of the prediction errors or residuals, providing a sense of how much error the system typically makes in its predictions. RMSE is in the same units as the target variable, making it more interpretable. It penalizes larger errors more, but its scale is easier to understand relative to the target variable [48]. RMSE can be calculated by
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}.
(5)
R2 (R-squared)
R-squared, also known as the coefficient of determination, measures the proportion of the variance in the dependent variable that is predictable from the independent variable [49]. R-squared values range from 0 to 1, where 1 indicates a perfect fit. It is more informative than MAE, MSE, and RMSE in regression analysis evaluation [50]. R2 is formulated by
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}.
(6)
CC (Correlation Coefficient)
The correlation coefficient measures the strength and direction of a linear relationship between two variables [51]. It is a statistical measure that indicates how closely two variables move in relation to one another. Values range from −1 to 1, where 1 means a perfect positive linear correlation, −1 indicates a perfect negative linear correlation, and 0 means no linear correlation. It is useful for examining the relationship between variables but does not imply causation [52]. The correlation coefficient can be calculated by
CC = \frac{n \sum \hat{y} y - \left( \sum \hat{y} \right) \left( \sum y \right)}{\sqrt{n \sum \hat{y}^2 - \left( \sum \hat{y} \right)^2} \, \sqrt{n \sum y^2 - \left( \sum y \right)^2}}.
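For concreteness, all six metrics can be computed directly from their definitions above; the toy vectors below are illustrative:

```python
import numpy as np

def metrics(y, y_hat):
    """Compute MAE, MAPE, MBE, RMSE, R^2, and CC from their definitions."""
    n = len(y)
    mae = np.abs(y_hat - y).mean()
    mape = np.abs((y - y_hat) / y).mean() * 100.0       # undefined if any y == 0
    mbe = (y_hat - y).mean()
    rmse = np.sqrt(((y_hat - y) ** 2).mean())
    r2 = 1.0 - ((y_hat - y) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    cc = (n * (y_hat * y).sum() - y_hat.sum() * y.sum()) / np.sqrt(
        (n * (y_hat ** 2).sum() - y_hat.sum() ** 2)
        * (n * (y ** 2).sum() - y.sum() ** 2))
    return mae, mape, mbe, rmse, r2, cc

y = np.array([1.0, 2.0, 3.0, 4.0])       # measured values (toy data)
y_hat = np.array([1.1, 1.9, 3.2, 4.0])   # predicted values (toy data)
mae, mape, mbe, rmse, r2, cc = metrics(y, y_hat)
```

Note the MAPE caveat mentioned above: it is undefined whenever a measured value is zero, which is common for PV power at night.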

3.4. Experimental Results and Discussion

The experiments were conducted on a computer platform equipped with an NVIDIA GeForce RTX3080Ti, an Intel i9-10920X processor, and 64 GB of memory. For a fair comparison, all algorithms in the experiment were configured with a batch size of 64 and trained for 50 epochs. In the implementation of all models, the Adam optimizer was employed due to its robust performance across various types of neural architectures. The learning rate was uniformly set to 0.001, balancing the need for rapid convergence with the risk of overshooting the minimal loss values.
To ensure a fair comparison, all baselines were trained for the full 50 epochs without early stopping, a technique whereby training is halted once a predefined success criterion is reached.
The performance comparison between TransPVP and the baselines is presented in Table 2. In the table, the letters ‘P’, ‘SLSTM’, ‘CLSTM’, and ‘TrsF’ stand for ‘Input Data Containing Historical Output Power’, ‘Stacked LSTM’, ‘CNN_LSTM’, and ‘Transformer’, respectively. The bold red numbers in the table denote the best performance, while the bold black numbers denote the second best performance.
Table 2 shows that the proposed method, TransPVP, demonstrated superior performance in PV power prediction compared to the baseline models. The effectiveness of the model was evaluated based on the key performance metrics including the RMSE, R2, and CC. The following section discusses these results and the implications for practical applications.
(1)
RMSE (Root Mean Squared Error)
TransPVP achieved the lowest RMSE among the models, indicating that it had the smallest average error magnitude. This suggests that TransPVP is highly effective in reducing large prediction errors, which is crucial for reliable PV power forecasting.
(2)
R2 (Coefficient of Determination)
The high R2 value of TransPVP indicates a strong correlation between the predicted and actual PV power values. This metric confirms that the model accurately captures the variability in the data, making it a robust tool for forecasting.
(3)
CC (Correlation Coefficient)
The CC values further support the model’s strong predictive ability. A high CC indicates that when the model predicts high PV power values, the actual values are also likely to be high, demonstrating the consistency of the model’s predictions.
(4)
MAE (Mean Absolute Error) and MAPE (Mean Absolute Percentage Error)
While the RMSE of TransPVP was relatively low, the MAE and MAPE were higher than some of the other models. This discrepancy suggests that although TransPVP is generally accurate, it tends to have small deviations in its predictions. These metrics highlight the need for further refinement of the model to improve accuracy.
(5)
MBE (Mean Bias Error)
The positive MBE value of TransPVP indicates a slight overestimation in the PV output predictions. This bias needs to be addressed to ensure the model’s accuracy in real-world applications. However, it is important to note that a low MBE does not necessarily equate to high accuracy, as the RMSE remains a critical indicator of fewer large prediction errors.
(6)
Impact of Historical Data
The integration of historical output power data into the TransPVP model significantly improved its prediction accuracy. Historical data provide context, which helps the model understand long-term dependencies and trends in PV power generation. This integration is particularly beneficial for capturing the temporal dynamics of PV output, leading to more accurate and reliable forecasts.
(7)
Loss Value Analysis
Figure 8 shows the loss value decrease during the training process for various models based on our experimental results. The introduction of historical PV power data resulted in a smoother decrease in the loss value, indicating improved model performance during training. Notably, the loss values for TransPVP_P and CLSTM_P were significantly lower than those of other methods. Among them, CLSTM_P exhibited the lowest loss value at the same epoch. However, it is important to note that lower loss values in the training set do not always guarantee better performance in the test set.
(8)
Visual Comparison of Predictions
Figure 9 and Figure 10 present a comparison of the prediction results in our experiment for each model against the true values over one day; however, their input data differed. In Figure 9, the input data were devoid of historical output power, whereas in Figure 10, the input data contained historical output power. TransPVP’s predictions closely followed the actual values, demonstrating its ability to effectively capture the daily variations in PV power output.
Figure 11 and Figure 12 illustrate the corresponding prediction errors, with TransPVP showing lower errors than the other models. In Figure 11, the input data exclude historical output power, whereas in Figure 12 they include it. The largest estimation errors were produced by BiGRU, followed by the stacked LSTM and CNN_LSTM.
(9)
Discussion of Results
The experimental results underscore the effectiveness of TransPVP in PV power forecasting. The model’s ability to integrate heterogeneous data sources and capture long-term dependencies sets it apart from traditional deep learning models. The high R2 and CC values, along with the low RMSE, highlight TransPVP’s potential for practical applications in energy management and planning.
However, the higher MAE and MAPE values indicate room for improvement in the prediction accuracy. Future work should focus on refining the model to reduce these small deviations and address the slight overestimation bias reflected in the MBE. Additionally, incorporating more diverse sources of input data, such as satellite cloud imagery or ground-based cloud photographs, could further enhance the model’s predictive accuracy under varying weather conditions.
In conclusion, TransPVP represents a significant advancement in PV power forecasting, offering a reliable and accurate tool for predicting energy production. Its innovative approach to integrating historical data and capturing long-term dependencies makes it a valuable asset for the renewable energy sector, contributing to more efficient and effective energy management practices.

4. Conclusions

This paper introduces TransPVP, a novel transformer-based approach that significantly enhances the accuracy of PV power prediction. By integrating additional prior knowledge, TransPVP reduces the complexity of the relationships the network must learn, which improves performance, particularly on limited datasets. Specifically, a multi-head attention mechanism captures the different operating modes of PV power generation under varying conditions, and time encoding emphasizes the importance of temporal information. A ReLU function in the output layer encodes the prior knowledge that PV power cannot be negative. Additionally, using historical power generation data as an input allows the model to capture the continuity of PV power generation patterns.
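Two of the components named above can be sketched in a few lines of numpy: the scaled dot-product attention follows the standard transformer formulation (Figure 3), and the ReLU output head clamps predictions to be non-negative. The layer sizes and weights here are placeholders, not the paper's actual configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    return softmax(scores) @ V

def non_negative_head(h, W, b):
    """Output projection with ReLU: PV power predictions cannot be negative."""
    return np.maximum(h @ W + b, 0.0)
```

In a multi-head layer, several such attention computations run in parallel on learned projections of the inputs; the ReLU head is applied only at the final output so negative power values are never emitted.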
Extensive experiments were conducted using real-world data from a photovoltaic power plant. The results demonstrate that TransPVP significantly outperformed conventional baseline methods, achieving superior metrics in terms of RMSE, R2, and CC. These metrics are critical as they reflect the accuracy, predictive power, and overall reliability of the forecasting model in operational settings.
The findings affirm that the inclusion of historical power output data significantly improves the predictive accuracy, confirming the relevance of historical context in power generation forecasting. TransPVP’s superior performance demonstrates its capability to excel in power generation forecasting tasks, making it a valuable tool for energy management and planning in the crucial sector of renewable energies.
The TransPVP model achieved the lowest RMSE but had a higher MAE compared to the others. This suggests that while TransPVP effectively reduces large errors, it may still have small, consistent errors. Addressing these small errors will be an important focus for future work. To address this problem, it would be beneficial to consider the use of more diverse sources of input data to provide richer information. This could include satellite cloud imagery or ground-based cloud photographs, which could enhance the model’s ability to predict the photovoltaic power output under varying weather conditions. Additionally, further refinement of the model to reduce small deviations and address the overestimation bias will be crucial to improve its precision and applicability in real-world scenarios.
In conclusion, TransPVP enhances the transformer for PV power prediction by integrating additional prior knowledge and reducing the complexity of the relationships to be learned, which improves performance, particularly under limited dataset conditions. TransPVP represents a significant advancement in PV power forecasting, offering a reliable and accurate tool for predicting energy production. Its approach to integrating historical data and capturing long-term dependencies makes it a valuable asset for the renewable energy sector, contributing to more efficient and effective energy management practices.

Author Contributions

Conceptualization, J.W.; Methodology, L.X.; Software, L.X.; Formal analysis, W.H.; Investigation, W.H.; Data curation, J.W.; Writing—original draft, F.H.; Writing—review & editing, G.G.; Supervision, J.W.; Project administration, C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by South Power Grid Network-level Science and Technology Project grant number GDKJXM20222474.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Jinfeng Wang was employed by the Electric Power Science Research Institute, Guangdong Power Grid Limited Liability Company. Authors Lingfeng Xuan, Feiwu He and Chaojie Zhong were employed by the Qingyuan Yingde Power Supply Bureau, Guangdong Power Grid Limited Liability Company. Author Guowei Guo was employed by the Foshan Shunde Power Supply Bureau, Guangdong Power Grid Limited Liability Company. Author Wenshan Hu declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Nomenclature

ANN: Artificial neural network
A-Power: Active power
BiLSTM: Bidirectional LSTM
CC: Correlation coefficient
CNN: Convolutional neural network
DL: Deep learning
FFN: Feed-forward network
GHR: Global horizontal radiation
GRU: Gated recurrent unit
LOF: Local outlier factor
LSTM: Long short-term memory
MAE: Mean absolute error
MAPE: Mean absolute percentage error
MBE: Mean bias error
MHA: Multi-head attention
R-Humidity: Relative humidity
RMSE: Root mean squared error
RNN: Recurrent neural network
W-Speed: Wind speed
d_k: Distance to the k-th nearest neighbor
d_l: Local reachability density
D_r: Reachability distance
m_1: Observation window size of output power
m_2: Observation window size of input power
m_3: Observation window size
N_k: Cardinality of k-nearest-neighbor sets
p: PV power
R2: R-squared
t: Time
v: Observational values

Figure 1. The history and trend of changes in various renewable energy sources. The image is from IEA.
Figure 2. Architecture of the TransPVP model. The input data include observational values of determinant factors, historical electricity generation data, and their corresponding time. In the diagram, N represents the number of elements in the observational values vector v, which refers to the collected data points or measurements. P denotes the prediction results, representing the forecasted sequence of future power generation values.
Figure 3. Architecture of the scaled dot-product attention.
Figure 4. The irradiance variation over a year at 1-h intervals after removing the outliers.
Figure 5. The temperature variation over a year at 1-h intervals after the removal of outliers.
Figure 6. The atmospheric pressure variation over a year at 1-h intervals after the removal of outliers.
Figure 7. The relative humidity variation over a year at 1-h intervals after the removal of outliers.
Figure 8. Loss change with epoch number during the training process.
Figure 9. PV power prediction over one day without historical output power in the input data.
Figure 10. PV power prediction over one day with historical output power in the input data.
Figure 11. Prediction error in one day without historical output power in the input data.
Figure 12. Prediction error in one day with historical output power in the input data.
Table 1. Observations and their units for photovoltaic power forecasting.

| Parameter | Time | A-Power | GHR | Temperature | Humidity | W-Speed |
| --- | --- | --- | --- | --- | --- | --- |
| Unit | min | kW | W/m² | °C | % | m/s |
| Measuring instrument | UTC | DC energy meter | Thermopile pyranometer | Thermometer | Hygrometer | Vane anemometer |
| Uncertainty | ±1–10 ns | ±1% | ±2% | ±0.45 °C | ±2% | ±1% |
Table 2. Comparison of model performance under different evaluation metrics.

| Model | MAE | MAPE | MBE | RMSE | R2 | CC |
| --- | --- | --- | --- | --- | --- | --- |
| LSTM | 0.186155 | 314.011908 | −0.00602 | 0.388174 | 0.984294 | 0.992131 |
| LSTM_P | 0.195905 | 231.321192 | 0.098914 | 0.360774 | 0.986433 | 0.995273 |
| BiLSTM | 0.190848 | 468.690777 | 0.015479 | 0.375903 | 0.985271 | 0.992657 |
| BiLSTM_P | 0.139434 | 92.355293 | −0.05036 | 0.298369 | 0.99072 | 0.995651 |
| SLSTM | 0.179162 | 230.70800 | −0.00643 | 0.411663 | 0.982335 | 0.991167 |
| SLSTM_P | 0.143087 | 119.037521 | 0.015665 | 0.315641 | 0.989615 | 0.99487 |
| CLSTM | 0.175775 | 216.945815 | 0.005124 | 0.401153 | 0.983226 | 0.991595 |
| CLSTM_P | 0.119932 | 44.404364 | 0.03429 | 0.285214 | 0.991521 | 0.995814 |
| GRU | 0.185243 | 340.92245 | 0.004308 | 0.398642 | 0.983435 | 0.991686 |
| GRU_P | 0.145815 | 239.695311 | 0.050259 | 0.29379 | 0.991003 | 0.995789 |
| BiGRU | 0.191552 | 207.974887 | −0.04307 | 0.411354 | 0.982362 | 0.991448 |
| BiGRU_P | 0.148156 | 120.255649 | −0.07482 | 0.304496 | 0.990335 | 0.995716 |
| TrsF | 0.142578 | 319.17021 | −0.04898 | 0.349862 | 0.987241 | 0.993739 |
| TrsF_P | 0.139847 | 316.4061 | 0.03608 | 0.295644 | 0.990889 | 0.995686 |
| TransPVP | 0.144838 | 403.38254 | 0.042599 | 0.278392 | 0.991921 | 0.996168 |
| TransPVP_P | 0.143808 | 331.934595 | 0.064755 | 0.266043 | 0.992622 | 0.996524 |