Improving Short-Term Photovoltaic Power Generation Forecasting with a Bidirectional Temporal Convolutional Network Enhanced by Temporal Bottlenecks and Attention Mechanisms

Gan, Jianhong; Lin, Xi; Chen, Tinghui; Fan, Changyuan; Wei, Peiyang; Li, Zhibin; Huo, Yaoran; Zhang, Fan; Liu, Jia; He, Tongli

doi:10.3390/electronics14020214

by Jianhong Gan ¹, Xi Lin ¹,*, Tinghui Chen ¹, Changyuan Fan ¹,*, Peiyang Wei ¹, Zhibin Li ¹,², Yaoran Huo ³, Fan Zhang ¹, Jia Liu ¹ and Tongli He ¹

¹ College of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China

² Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China

³ Information & Communication Company, State Grid Sichuan Electric Power Company, Chengdu 610041, China

* Authors to whom correspondence should be addressed.

Electronics 2025, 14(2), 214; https://doi.org/10.3390/electronics14020214

Submission received: 14 November 2024 / Revised: 19 December 2024 / Accepted: 30 December 2024 / Published: 7 January 2025

Abstract: Accurate photovoltaic (PV) power forecasting is crucial for effective smart grid management, given the intermittent nature of PV generation. To address these challenges, this paper proposes the Temporal Bottleneck-enhanced Bidirectional Temporal Convolutional Network with Multi-Head Attention and Autoregressive (TB-BTCGA) model. It introduces a temporal bottleneck structure and Deep Residual Shrinkage Network (DRSN) into the Temporal Convolutional Network (TCN), improving feature extraction and reducing redundancy. Additionally, the model transforms the traditional TCN into a bidirectional TCN (BiTCN), allowing it to capture both past and future dependencies while expanding the receptive field with fewer layers. The integration of an autoregressive (AR) model optimizes the linear extraction of features, while the inclusion of multi-head attention and the Bidirectional Gated Recurrent Unit (BiGRU) further strengthens the model’s ability to capture both short-term and long-term dependencies in the data. Experiments on complex datasets, including weather forecast data, station meteorological data, and power data, demonstrate that the proposed TB-BTCGA model outperforms several state-of-the-art deep learning models in prediction accuracy. Specifically, in single-step forecasting using data from three PV stations in Hebei, China, the model reduces Mean Absolute Error (MAE) by 38.53% and Root Mean Square Error (RMSE) by 33.12% and increases the coefficient of determination ($R^{2}$) by 7.01% compared to the baseline TCN model. Additionally, in multi-step forecasting, the model achieves a reduction of 54.26% in the best MAE and 52.64% in the best RMSE across various time horizons. These results underscore the TB-BTCGA model’s effectiveness and its strong potential for real-time photovoltaic power forecasting in smart grids.

Keywords: photovoltaic power generation prediction; BiTCN; temporal bottleneck structure; deep residual shrinkage networks

1. Introduction

The continuous rise in global energy demand, coupled with resource depletion and climate change, has posed significant challenges for humanity. By 2050, the share of renewable energy in the total primary energy supply is anticipated to rise from 14% in 2015 to 63% [1,2]. Currently, distributed resources like wind and solar power are spreading worldwide and playing a vital role in energy systems. As technology progresses, the integration of solar photovoltaics (PV) into smart grids is becoming increasingly common. Accurate and reliable PV forecasting can provide significant benefits to contemporary smart grid management. However, the PV system is influenced by the environment, weather, and solar radiation, making its output power unstable and fluctuating [3]. Hence, minimizing the randomness and intermittency of solar power generation is crucial to enhancing the accuracy of PV power prediction.

Recently, researchers have conducted extensive studies on PV power generation forecasting, employing a wide range of methods to build models, e.g., physical methods, statistical and probabilistic methods, artificial intelligence methods, and hybrid methods. However, physical methods are susceptible to environmental changes, leading to reduced prediction accuracy [4]. Moreover, the complex, multidimensional, nonlinear relationship between input and output data is difficult for existing statistical and probabilistic methods to capture, resulting in inaccurate fits [5,6].

Artificial Intelligence (AI) methods have become integral to photovoltaic power prediction. Their robust nonlinear mapping and feature extraction capabilities, coupled with rapid advances in computing and data mining technologies, have made them indispensable [7], and single neural network prediction models are now used extensively in photovoltaic power forecasting. Lee and Kim [8] developed three PV prediction models based on neural networks, i.e., Artificial Neural Networks (ANNs), Deep Neural Networks (DNNs), and Long Short-Term Memory (LSTM) networks, and validated their performance. The results show that all three models surpass non-AI methods in accuracy and better accommodate the nonlinear characteristics of photovoltaic power generation. Sarmas et al. [9] studied the enhancement of short-term PV power prediction by integrating meta-learning and LSTM models independent of numerical weather prediction. Nonetheless, the traditional LSTM model only captures forward temporal information, neglecting backward information [10]. To solve this problem, Graves and Schmidhuber [11] proposed the Bidirectional Long Short-Term Memory Network (BiLSTM), which processes a time series in both directions and achieves higher predictive accuracy than LSTM. Ma et al. [12] proposed an integrated framework that uses a Gated Recurrent Unit (GRU) combined with an improved sine and cosine algorithm to optimize prediction accuracy. Despite the high prediction accuracy achieved by recurrent neural network (RNN)-based models and their variants, their training processes are often time-consuming and memory-intensive, and they frequently suffer from exploding or vanishing gradients. Bai et al. [13] introduced the Temporal Convolutional Network (TCN), which processes a lengthy input sequence as a whole rather than step by step, allowing for faster data processing and robust parallel computing capabilities. Notably, by utilizing residual blocks and dilated causal convolution, the TCN maintains strong predictive performance for long time series, surpassing LSTM in certain cases [14]. Wang et al. [15] proposed an efficient contract TCN model, which used an improved Deep Residual Shrinkage Network (DRSN) to replace the original TCN residuals to improve the accuracy of PV power prediction. Zhang et al. [16] proposed the Attention-based Multivariate Temporal Convolutional Network (AMTCN), integrating dilated convolutions with multi-head attention to enhance the accuracy and robustness of electricity consumption forecasting.

Due to the inherent randomness, volatility, and instability of photovoltaic electricity, a single forecast model frequently fails to meet engineering practice standards [17,18]. In order to solve this problem, Convolutional Neural Networks (CNNs) combined with deep learning techniques can provide significant advantages, especially in processing time-series data and extracting features. Li et al. [19] proposed a hybrid PV prediction framework based on a TCN, designed to improve utility-scale PV forecasting several hours ahead. Qu et al. [20] proposed a CNN-LSTM neural network combined with an attention mechanism for day-by-hour photovoltaic power prediction. By embedding a variety of prediction models of relevant and target variables, the method employs an attention mechanism to improve the ability of the model to capture important features, thereby improving the prediction accuracy. Lai et al. [21] introduced a Long- and Short-Term Net (LSTNet) model to integrate CNN, LSTM, and AR models, enabling the effective analysis and forecasting of PV data and power. Zhen et al. [22] developed a hybrid model that merges BiLSTM with a genetic algorithm to improve forecasting accuracy. Abdel-Basset et al. [23] introduced a novel data-driven PV-Net for short-term PV power forecasting, which redesigns the gates of the GRU with convolutional layers. Yu et al. [24] proposed a short-term PV power prediction method that combines double-layer decomposition, the whale optimization algorithm, and a BiLSTM network. Empirical studies on weather datasets showed that their proposed method could significantly improve the accuracy and reliability of the prediction. Limouni et al. [25] developed a hybrid PV prediction model that combines weather factors and a TCN-LSTM network, demonstrating superior performance to that of the TCN or LSTM alone. Pu et al. [26] introduced an interactive behavior-learning method based on the TCN-GRU model, which exhibited excellent coupling ability in scenarios with incomplete information. Fu et al. [27] proposed a hybrid framework combining improved Variational Mode Decomposition (VMD), a TCN, a GRU, and multi-head attention (MA), optimized by the Sparrow Search Algorithm, to enhance the accuracy and robustness of photovoltaic power forecasting under nonlinear and fluctuating conditions.

While TCNs have broad applications in time-series analysis, prior research has primarily utilized unidirectional TCNs, which only extract forward information and neglect the impact of backward information on predictions [28]. In addition, redundant secondary feature information can interfere with the TCN’s feature extraction, affecting final predictions. Furthermore, the receptive field size in the TCN depends on the network parameters, including the number of layers and convolution kernel size, which directly influences feature extraction performance, prediction accuracy, and memory usage. Traditional convolution operations, especially with large strides, reduce the temporal dimension and lead to the loss of temporal information. To address these limitations and the unidirectionality of TCNs, this paper proposes a combined model based on a BiTCN and BiGRU, which improves the residual block of the BiTCN with a temporal bottleneck structure. This model also introduces a multi-head attention mechanism and a linear AR model to improve its linear extraction capabilities and prediction accuracy. The temporal bottleneck structure enhances feature extraction by reducing redundancy and focusing on critical temporal information, while the bidirectional networks, including the BiTCN and BiGRU, comprehensively capture both past and future dependencies in the sequence. These innovations are designed to mitigate the shortcomings of traditional methods, enabling more accurate and robust PV power forecasting. This paper makes the following specific contributions:

(1) To enhance the information extraction capability of a single network, a new hybrid model, BiTCN–Multi-Head Attention–BiGRU, called the Temporal Bottleneck-enhanced Bidirectional Temporal Convolutional Network with Multi-Head Attention and Autoregressive (TB-BTCGA) model, is developed. The BiTCN extracts the implicit relationships between PV power and meteorological factors, such as solar irradiance and temperature, which are then fed into the BiGRU for prediction. The multi-head attention mechanism amplifies the influence of crucial information on BiGRU output, thereby improving prediction accuracy.

(2) Considering the various meteorological inputs for photovoltaic power generation prediction, the multi-head attention mechanism enables each attention unit to calculate its weight in parallel for each time step. After aggregation, it selects the most relevant time step for prediction.

(3) The residual blocks in the BiTCN have been enhanced to address the vanishing or attenuating gradients that arise when the number of network layers increases significantly, thereby improving feature extraction capabilities. First, the residual block of the TCN is enhanced with a temporal bottleneck structure to improve the model’s temporal feature extraction ability. Second, the soft thresholding mechanism of the DRSN is used to further refine the residual blocks in the TCN, enhancing the model’s adaptability and interpretability for complex time-series data and reducing redundant features.

(4) Additionally, the TB-BTCGA model effectively captures linear features by incorporating the AR model. Weighting and combining the outputs of other models further enhance the model’s ability to extract linear features and significantly improve its overall prediction accuracy.

Section 2 introduces the primary methods used in the TB-BTCGA model, including the core modules and their functionalities. Section 3 focuses on the improvements made to the TCN residual block, detailing the structural enhancements and their impact on the model’s performance. Section 4 discusses the data sources, data analysis, experimental settings, and implementation details. Finally, Section 5 concludes the paper, summarizing the key findings and outlining directions for future research.

2. Methodology

2.1. The Proposed TB-BTCGA Model

The TB-BTCGA model is composed of three main modules: the BiTCN, the BiGRU, and the multi-head attention mechanism. The BiTCN takes into account both past and future information of the sequence, capturing bidirectional dependencies more comprehensively than the traditional TCN. Meanwhile, the multi-layer temporal convolutional network implemented by the BiTCN, with its increasing dilation rate, can adapt to feature learning at different time scales. As a highly efficient bidirectional RNN, the BiGRU can effectively capture long-term dependencies in sequences while keeping both the number of model parameters and the training time low. The multi-head attention mechanism allows the model to focus on different aspects of the sequence information in parallel and to consider the feature representations of multiple subspaces jointly, thereby improving the efficiency of important feature extraction. By enhancing the model’s focus on key information, it improves both performance and interpretability when dealing with data that have complex internal structures. In addition, the outputs of the AR model and the BiTCN-MA-BiGRU model are combined by weighting to produce the final prediction. The flowchart of the model structure is shown in Figure 1.

The TB-BTCGA model processes data through a structured flow designed to extract and refine temporal features for accurate PV power forecasting. First, the model begins with data preprocessing, where input data, including meteorological data (e.g., global irradiance, temperature, humidity, and wind speed) and historical PV power generation data, are cleaned to handle abnormal or missing values. The cleaned data are then normalized to ensure numerical stability, and correlation analysis is performed to select the most relevant features for the prediction task.

Secondly, the preprocessed data are fed into the core modules of the TB-BTCGA model. The first core component is the BiTCN module, which processes the sequence data by capturing both past and future dependencies. Through its residual blocks with increasing dilation rates, the BiTCN extracts temporal features at multiple time scales while preserving critical information across layers. Then, the outputs of the BiTCN are further refined by the BiGRU module, a bidirectional recurrent neural network. The BiGRU effectively captures long-term dependencies in the sequence while maintaining computational efficiency through its reduced parameter complexity. At the same time, a multi-head attention mechanism is applied to enhance the model’s focus on key features. By assigning varying weights to different parts of the sequence, the attention mechanism improves the interpretability and accuracy of the model, particularly for complex datasets.

After feature extraction, the outputs of the BiTCN, BiGRU, and attention mechanism are integrated into a unified representation. Then, this combined representation is passed through a fully connected layer to reduce the dimensionality and align the features.

Finally, the AR model plays a critical role in refining the final predictions. By leveraging its strength in capturing linear relationships within sequential data, the AR model effectively complements the nonlinear patterns learned by the BiTCN, BiGRU, and multi-head attention mechanisms. It combines and weights the outputs of these modules to create a balanced and comprehensive representation that incorporates both linear and nonlinear dependencies. This synergy ensures that the final predictions are not only precise but also robust across varying scenarios.

2.2. TCN Model

The TCN is composed of three core components, i.e., causal convolution, dilated convolution, and residual connections, and combines the advantages of a CNN and an RNN [29]. It effectively avoids the vanishing and exploding gradient problems that often occur in recurrent neural networks. Moreover, compared to traditional RNN and LSTM models, it offers parallel computation, lower memory usage, improved network performance, and the ability to capture both long- and short-term features in the input sequence. The TCN enforces temporal causality in time-series prediction: the value at time step t in a given layer depends only on the values at time step t and earlier in the previous layer. Given the input sequence $X^{τ + 1} = (x_{0}, x_{1}, \dots, x_{τ})$ and the corresponding target output sequence $Y^{τ + 1} = (y_{0}, y_{1}, \dots, y_{τ})$, the predictive output $\hat{y}_{t}$ for time step t is constrained to depend only on the inputs up to and including time t, i.e., $x_{0}, x_{1}, \dots, x_{t}$. This ensures that each prediction $\hat{y}_{t}$ is made based solely on the data observed up to that point, and the predictive output sequence can be expressed as

$$\hat{y}_{t} = F_{θ}(x_{0}, x_{1}, \dots, x_{t}), \quad \forall t \in [0, τ] \tag{1}$$

where $F_{θ}(\cdot)$ represents the forward propagation process of the neural network, and $θ$ represents the parameters of the network.

The structure shown in Figure 2a ensures that future data do not leak into the past. The TCN convolutional layer samples its input at fixed intervals during convolution, skipping the steps in between, so that a larger receptive field and longer time-series dependencies can be obtained for the same output size. Given a 1D input sequence $x \in R^{n}$ and a convolution filter $f : \{0, \dots, k - 1\} \to R$, the dilated convolution for element s of the sequence is defined as follows:

$$F(s) = \sum_{j = 0}^{k - 1} f(j) \cdot x_{s - d \cdot j} \tag{2}$$

where k represents the size of the convolution kernel, d is the dilation factor, and the index $s - d \cdot j$ points to past elements of the sequence. The dilation factor d sets the spacing between adjacent filter taps, which is equivalent to inserting d − 1 zeros between adjacent kernel weights. The dilation factor grows exponentially with the depth of the network, so the receptive field expands rapidly as convolutional layers are stacked.
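As a minimal sketch of Equation (2), the following NumPy code evaluates a dilated causal convolution directly from the definition and computes the receptive field of a stack of such layers. The function names, the toy input, and the doubling dilation schedule are illustrative assumptions, not the authors' implementation; positions before the start of the sequence are treated as zero.

```python
import numpy as np

def dilated_causal_conv(x, f, d):
    """Compute F(s) = sum_j f[j] * x[s - d*j], treating x[i] as 0 for i < 0."""
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    out = np.zeros(len(x))
    for s in range(len(x)):
        for j in range(len(f)):
            idx = s - d * j
            if idx >= 0:  # implicit left zero-padding keeps the layer causal
                out[s] += f[j] * x[idx]
    return out

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated causal layers: 1 + sum((k - 1) * d)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

x = np.arange(8, dtype=float)        # toy input sequence 0, 1, ..., 7
f = np.array([1.0, 1.0])             # kernel of size k = 2
y = dilated_causal_conv(x, f, d=2)   # y[s] = x[s] + x[s - 2]
rf = receptive_field(2, [1, 2, 4, 8])  # assumed doubling dilation schedule
```

With a kernel of size 2 and dilations 1, 2, 4, 8, four layers already cover 16 time steps, which illustrates why exponentially growing dilation enlarges the receptive field without a deep stack.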

However, as the number of network layers grows substantially, gradients can attenuate or even vanish, particularly in deep networks processing complex time-series data. To address these challenges, a carefully designed residual block is incorporated into the TCN, as illustrated in Figure 2b. This block leverages dilated causal convolutions to expand the receptive field while maintaining the computational efficiency and causality required for sequence modeling. Each residual block includes multiple layers of dilated convolutions, complemented by weight normalization, Rectified Linear Unit (ReLU) activation functions, and dropout for enhanced regularization and stability. Additionally, the residual block integrates forward and backward residual units, enabling the model to effectively capture bidirectional temporal dependencies. Skip connections are implemented using 1 × 1 convolutions, allowing the direct mapping of input features to the output while adjusting the dimensionality as needed. In the residual block, let x be the output of the previous layer, and let $F(x)$ be the result of the operations performed by the current layer. The sum of $F(x)$ and x is passed through the ReLU activation function to obtain the final output y:

$$y = \mathrm{ReLU}(x + F(x)) \tag{3}$$

The residual connection ensures that the output of the block is the sum of the transformed features and the original input features, allowing the model to learn identity mappings when needed.

Nevertheless, since the traditional TCN only extracts the forward information and ignores the backward information, the TCN has been modified into a bidirectional version to expand the receptive field with fewer layers while maintaining the feature mapping dimension, as illustrated in Figure 3.

In Figure 4, it can be seen that after modifying the dilated causal convolutions in the TCN to a bidirectional version to expand the receptive field with fewer layers, the corresponding residual blocks also need to be modified to a bidirectional structure. This ensures that the residual connections, which combine the input and output of each block, match the bidirectional processing of the modified convolutions. Such a modification is necessary to maintain the integrity of the feature flow and to effectively capture both past and future contexts in the network.
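One common way to obtain a bidirectional convolution is to run a causal convolution on the sequence and a second one on the time-reversed sequence, then stack the two results. The sketch below follows that pattern under stated assumptions: the helper names, kernels, and the stacking of the two branches are illustrative, and the exact wiring of the BiTCN in Figures 3 and 4 may differ.

```python
import numpy as np

def causal_conv(x, f, d):
    """Dilated causal convolution: out[s] = sum_j f[j] * x[s - d*j]."""
    out = np.zeros(len(x))
    for j, w in enumerate(f):
        shift = d * j
        if shift >= len(x):  # this tap reaches beyond the start of the sequence
            break
        out[shift:] += w * x[:len(x) - shift]
    return out

def bidirectional_conv(x, f_fwd, f_bwd, d):
    """Forward branch sees the past; backward branch sees the future."""
    fwd = causal_conv(x, f_fwd, d)
    bwd = causal_conv(x[::-1], f_bwd, d)[::-1]  # reverse, convolve, reverse back
    return np.stack([fwd, bwd], axis=0)          # shape (2, T)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
out = bidirectional_conv(x, f_fwd=[1.0, 1.0], f_bwd=[1.0, 1.0], d=1)
# out[0][s] = x[s] + x[s - 1] (past context)
# out[1][s] = x[s] + x[s + 1] (future context)
```

Because each branch already covers one direction, the combined output at step s sees context on both sides with the same number of layers.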

2.3. BiGRU

GRU networks perform well when processing time-series tasks. Compared to LSTM, the GRU has a more streamlined design and demonstrates superior performance in convergence speed, parameter updates, and generalization. It effectively captures the dependency relationships within time-series data. The essential parts of the GRU are the reset gate and the update gate. The BiGRU splits the traditional GRU neural unit into forward and reverse transmission states, each corresponding to updating the hidden state based on historical and future data, respectively. The structures of the GRU and BiGRU are shown in Figure 5.

The GRU updates its hidden state as follows:

$$R_{t} = σ(W_{r} \cdot [h_{t - 1}, x_{t}] + b_{r}) \tag{4}$$

$$Z_{t} = σ(W_{z} \cdot [h_{t - 1}, x_{t}] + b_{z}) \tag{5}$$

$$\tilde{h}_{t} = \tanh(W_{h} \cdot [R_{t} * h_{t - 1}, x_{t}] + b_{h}) \tag{6}$$

$$h_{t} = (1 - Z_{t}) * h_{t - 1} + Z_{t} * \tilde{h}_{t} \tag{7}$$

where $Z_{t}$, $R_{t}$, $\tilde{h}_{t}$, and $h_{t}$ represent the update gate, reset gate, candidate hidden state, and final hidden state, respectively; $W_{r}$, $W_{z}$, and $W_{h}$ denote the weight matrices; $b_{r}$, $b_{z}$, and $b_{h}$ are the bias vectors of the reset gate, update gate, and candidate hidden state; $σ$ is the sigmoid activation function; tanh is the hyperbolic tangent function; $*$ denotes element-wise multiplication; $h_{t - 1}$ is the hidden state from the previous time step; and $x_{t}$ is the input at the current time step.
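Equations (4) to (7) translate almost line by line into code. The following NumPy sketch implements one GRU step and a bidirectional wrapper that runs a second GRU over the reversed sequence and concatenates the hidden states. The parameter dictionary, the random initialisation, and the layer sizes are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(h_prev, x_t, p):
    """One GRU update following Equations (4)-(7)."""
    hx = np.concatenate([h_prev, x_t])              # [h_{t-1}, x_t]
    r = sigmoid(p["W_r"] @ hx + p["b_r"])           # reset gate, Eq. (4)
    z = sigmoid(p["W_z"] @ hx + p["b_z"])           # update gate, Eq. (5)
    rhx = np.concatenate([r * h_prev, x_t])         # [R_t * h_{t-1}, x_t]
    h_tilde = np.tanh(p["W_h"] @ rhx + p["b_h"])    # candidate state, Eq. (6)
    return (1.0 - z) * h_prev + z * h_tilde         # new hidden state, Eq. (7)

def run_gru(xs, p, hidden):
    """Apply gru_step along the time axis, starting from a zero state."""
    h = np.zeros(hidden)
    states = []
    for x_t in xs:
        h = gru_step(h, x_t, p)
        states.append(h)
    return np.stack(states)                         # shape (T, hidden)

def init_params(n_in, hidden, rng):
    """Small random weights and zero biases (an arbitrary illustrative choice)."""
    shape = (hidden, hidden + n_in)
    params = {}
    for gate in ("r", "z", "h"):
        params["W_" + gate] = rng.normal(0.0, 0.1, shape)
        params["b_" + gate] = np.zeros(hidden)
    return params

def bigru(xs, p_fwd, p_bwd, hidden):
    """Concatenate forward states with backward states realigned in time."""
    fwd = run_gru(xs, p_fwd, hidden)
    bwd = run_gru(xs[::-1], p_bwd, hidden)[::-1]
    return np.concatenate([fwd, bwd], axis=1)       # shape (T, 2 * hidden)

rng = np.random.default_rng(0)
xs = rng.normal(size=(6, 3))                        # 6 time steps, 3 features
out = bigru(xs, init_params(3, 4, rng), init_params(3, 4, rng), hidden=4)
```

Each time step therefore receives a representation built from both the history before it and the data after it, which is the property the BiGRU contributes to the model.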

2.4. Multi-Head Attention Mechanism (MA)

In this study, a multi-head attention mechanism is employed, allowing the model to learn information from different representation subspaces by processing multiple attention heads in parallel. Each head performs a self-attention calculation by first passing the input sequence X through three different linear transformations to obtain the query (Q), key (K), and value (V) matrices. The inner product of the query and key is then computed and divided by $\sqrt{d_{k}}$ to stabilize the training process. The softmax function converts the scaled scores into a probability distribution used for the weighted summation of the value vectors, and finally, the outputs of the different heads are integrated by a linear transformation to form the final attention output. This mechanism not only enhances the model’s ability to capture complex dependencies in sequence data but also improves the diversity and efficiency of information processing. The expressions for Q, K, V, and the attention output are shown in (8) to (11):

$$Q = X W^{Q} \tag{8}$$

$$K = X W^{K} \tag{9}$$

$$V = X W^{V} \tag{10}$$

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V \tag{11}$$

where $W^{Q}$, $W^{K}$, and $W^{V}$ are parameter matrices, $d_{k}$ is the dimension of the key vector, and the value $\sqrt{d_{k}}$ is employed to scale the magnitude of the dot product.
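The computation in Equations (8) to (11) can be sketched in a few lines of NumPy. The dimensions, the random projection matrices, and the output projection `W_o` are illustrative assumptions; the text above only states that the heads are merged by a linear transformation.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention, Eq. (11); also returns the weights."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # one row of weights per query step
    return weights @ V, weights

def multi_head_attention(X, heads, W_o):
    """Run each head on its own projections, concatenate, then project."""
    outputs = []
    for W_q, W_k, W_v in heads:
        out, _ = attention(X @ W_q, X @ W_k, X @ W_v)  # Eqs. (8)-(10)
        outputs.append(out)
    return np.concatenate(outputs, axis=-1) @ W_o

rng = np.random.default_rng(0)
T, d_model, n_heads, d_k = 5, 8, 2, 4          # assumed toy dimensions
X = rng.normal(size=(T, d_model))
heads = [tuple(rng.normal(size=(d_model, d_k)) for _ in range(3))
         for _ in range(n_heads)]
W_o = rng.normal(size=(n_heads * d_k, d_model))
Y = multi_head_attention(X, heads, W_o)
_, w = attention(X @ heads[0][0], X @ heads[0][1], X @ heads[0][2])
```

Each row of `w` is a probability distribution over time steps, so a large entry marks a time step that the head treats as important for the corresponding query.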

2.5. Autoregressive Model

The autoregressive (AR) model regresses the series on its own past values, using the p previous observations $x_{t - 1}, \dots, x_{t - p}$ to predict the value of $x_{t}$ at time step t:

$$x_{t} = \sum_{i = 1}^{p} ε_{i} x_{t - i} + β_{t} \tag{12}$$

where $ε_{i}$ is a constant coefficient, and $β_{t}$ is a random error term (distinct from the weight $β$ in Equation (13)). The prediction result of the AR model, denoted $y_{α}$, and the output $y_{t - 1 + Δ}$ of the BiTCN-MA-BiGRU model are combined by linear weighting to form the final prediction $y^{*}$:

$$y^{*} = α y_{α} + β y_{t - 1 + Δ} \tag{13}$$

where $α$ and $β$ are weight coefficients satisfying $α + β = 1$.
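As a rough illustration of Equations (12) and (13), the sketch below fits AR coefficients by ordinary least squares and blends the AR forecast with a placeholder network output. The least-squares fit, the synthetic series, the value of the network output, and the weight 0.3 are all assumptions made for the example; the source does not state how the coefficients or weights are obtained.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares estimate of the AR(p) coefficients eps_1..eps_p."""
    series = np.asarray(series, dtype=float)
    # The row for target x_t holds its lags [x_{t-1}, ..., x_{t-p}].
    rows = [series[t - p:t][::-1] for t in range(p, len(series))]
    coeffs, *_ = np.linalg.lstsq(np.array(rows), series[p:], rcond=None)
    return coeffs

def predict_ar(history, coeffs):
    """One-step-ahead AR forecast from the last p observations."""
    p = len(coeffs)
    lags = np.asarray(history[-p:], dtype=float)[::-1]
    return float(coeffs @ lags)

def combine(y_ar, y_nn, alpha):
    """Eq. (13) with beta = 1 - alpha, so the weights sum to one."""
    return alpha * y_ar + (1.0 - alpha) * y_nn

# Noiseless synthetic AR(2) series: x_t = 0.6 x_{t-1} + 0.3 x_{t-2}.
series = [1.0, 2.0]
for _ in range(20):
    series.append(0.6 * series[-1] + 0.3 * series[-2])

coeffs = fit_ar(series, p=2)
y_ar = predict_ar(series, coeffs)
y_nn = 1.0                      # placeholder for the BiTCN-MA-BiGRU output
y_star = combine(y_ar, y_nn, alpha=0.3)
```

On the noiseless series the fit recovers the generating coefficients, and the blended forecast lies between the two component forecasts because the weights are non-negative and sum to one.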

3. Improved TCN Residual Block

To address the issue in the TCN where convolution operations, especially with larger strides, reduce the temporal dimension and cause information loss, a temporal bottleneck residual structure was introduced into the TCN’s residual framework. The temporal bottleneck residual structure, proposed by Choi et al. [30], incorporates a transposed convolution that enriches temporal information. It comprises two components, $G_{1}$ and $G_{2}$, along with a skip connection:

$$y = \mathrm{ReLU}(G_{2}(G_{1}(x)) + I(x)) \tag{14}$$

where $G_{1}$ is a standard convolution used to reduce the temporal dimension, $G_{2}$ is a transposed convolution used to restore the temporal dimension, and $I(x)$ represents the skip connection. These components are expressed as follows:

$$G_{1}(x) = γ_{1} \frac{\mathrm{Conv1D}(x, \mathrm{stride}, \mathrm{kernel\_size}) - μ_{B}}{\sqrt{σ_{B}^{2} + ϵ}} + β_{1} \tag{15}$$

$$G_{2}(x) = γ_{2} \frac{\mathrm{Conv1DTranspose}(x, \mathrm{stride}, \mathrm{kernel\_size}) - μ_{B}}{\sqrt{σ_{B}^{2} + ϵ}} + β_{2} \tag{16}$$

$$\mathrm{Conv1D}(x) = \sum_{i = 0}^{k - 1} x(t + i) \cdot ω_{1}(i) \tag{17}$$

$$\mathrm{Conv1DTranspose}(x) = \sum_{i = 0}^{k_{s} - 1} x(t + i) \cdot ω_{2}(i) \tag{18}$$

$$I(x) = \sum_{i = 0}^{k_{s} - 1} x(t + i) \cdot ω_{s}(i) \tag{19}$$

where $\mathrm{Conv1D}(x)$ is a 1D convolution, $\mathrm{Conv1DTranspose}(x)$ is a transposed convolution, stride is the stride size, and $\mathrm{kernel\_size}$ is the size of the convolution kernel. $γ_{1}$ and $γ_{2}$ are the learnable scaling parameters of batch normalization, used to adjust the amplitude of the output. $β_{1}$ and $β_{2}$ are the learnable offset parameters of batch normalization. $μ_{B}$ is the mini-batch mean, and $σ_{B}^{2}$ is the mini-batch variance. $ϵ$ is a very small constant for numerical stability, preventing division by zero. $ω_{1}(i)$, $ω_{2}(i)$, and $ω_{s}(i)$ are the convolution kernel weights.
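The shape bookkeeping behind the temporal bottleneck is easiest to see in code. The sketch below is a single-channel NumPy illustration under several assumptions: the kernel size and stride are chosen so that the transposed convolution restores the original length, normalisation uses the statistics of one sequence rather than a mini-batch, and the skip connection is reduced to a size-1 kernel. It is not the authors' implementation.

```python
import numpy as np

def conv1d(x, w, stride):
    """Strided 1D convolution: out[t] = sum_i x[t*stride + i] * w[i]."""
    k = len(w)
    n_out = (len(x) - k) // stride + 1
    return np.array([np.dot(x[t * stride:t * stride + k], w)
                     for t in range(n_out)])

def conv1d_transpose(x, w, stride):
    """Transposed 1D convolution: each input sample scatters a scaled copy of w."""
    k = len(w)
    out = np.zeros((len(x) - 1) * stride + k)
    for t, value in enumerate(x):
        out[t * stride:t * stride + k] += value * w
    return out

def normalise(a, gamma=1.0, beta=0.0, eps=1e-5):
    """Scale-and-shift normalisation; uses per-sequence statistics for simplicity."""
    return gamma * (a - a.mean()) / np.sqrt(a.var() + eps) + beta

def temporal_bottleneck(x, w1, w2, w_skip, stride):
    g1 = normalise(conv1d(x, w1, stride))             # G_1: shortens the sequence
    g2 = normalise(conv1d_transpose(g1, w2, stride))  # G_2: restores the length
    skip = w_skip * x                                 # I(x) with a size-1 kernel
    return np.maximum(g2 + skip, 0.0)                 # ReLU(G_2(G_1(x)) + I(x))

rng = np.random.default_rng(0)
x = rng.normal(size=16)                               # toy sequence of length 16
w1, w2 = rng.normal(size=4), rng.normal(size=4)       # kernel size 4 (assumed)
reduced = conv1d(x, w1, stride=2)                     # length (16 - 4) // 2 + 1 = 7
y = temporal_bottleneck(x, w1, w2, w_skip=1.0, stride=2)
```

With kernel size 4 and stride 2, the convolution compresses 16 steps to 7 and the transposed convolution expands them back to (7 − 1) · 2 + 4 = 16, so the residual sum in Equation (14) is well defined.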

By incorporating the concept of the temporal bottleneck residual structure, the original residuals in the TCN are improved, as shown in part A of Figure 6. Only the residual structure of the unidirectional TCN is shown here. The bidirectional structure is similar to that in Figure 4, with the original structure replaced by the improved structure. In the original TCN residual structure, a transposed convolution is added to create the temporal bottleneck structure. This approach enhances the model’s ability to capture temporal features, ensuring that key temporal information is not lost during feature extraction. By maintaining the integrity of the temporal dimension, the statistical pooling layer can more effectively aggregate temporal information, thereby improving the quality and accuracy of the aggregated features. This improvement significantly enhances the accuracy and stability of photovoltaic power generation predictions.

In addition to the above improvements, this paper also introduces the soft thresholding mechanism from the DRSN, as shown in parts B and C of Figure 6.

The DRSN is an enhanced residual network that integrates an attention mechanism with soft thresholding so that the thresholds are learned automatically. Compared with traditional wavelet thresholding, it is more efficient and accurate, avoids the inconvenience and arbitrariness of manually set thresholds, and reduces the influence of secondary, redundant features on the network [31]. Soft thresholding is used for signal denoising: features whose absolute values fall below the threshold are set to zero, while the remaining features are shrunk toward zero. The output of the soft thresholding function and its derivative are as follows:

$$f(x) = \begin{cases} x - τ, & x > τ \\ 0, & -τ \leq x \leq τ \\ x + τ, & x < -τ \end{cases} \tag{20}$$

$$\frac{\partial f(x)}{\partial x} = \begin{cases} 1, & x > τ \\ 0, & -τ \leq x \leq τ \\ 1, & x < -τ \end{cases} \tag{21}$$

where x is the input value, $f(x)$ is the output after soft thresholding, and $τ$ is the threshold. This approach efficiently lessens the model’s load in handling redundant information while enhancing its emphasis on significant features. Consequently, it boosts overall prediction accuracy and enhances the model’s robustness. Through this method, the DRSN not only optimizes the feature extraction process but also enhances the adaptability and interpretability of the model for complex time-series data. Additionally, the ReLU activation function was replaced with the Gaussian Error Linear Unit (GELU) activation function, resulting in less information loss and a smoother model, thereby improving the model’s generalization capability.
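Equations (20) and (21) can be checked with a small scalar sketch. The function names and sample values are illustrative; in the DRSN the threshold is learned per channel, whereas here it is simply passed in as a number.

```python
def soft_threshold(x, tau):
    """Eq. (20): zero out small values and shrink the rest toward zero by tau."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

def soft_threshold_grad(x, tau):
    """Eq. (21): the derivative is 1 outside [-tau, tau] and 0 inside."""
    return 1.0 if abs(x) > tau else 0.0

features = [-3.0, -0.5, 0.0, 0.4, 2.5]          # made-up feature values
shrunk = [soft_threshold(v, tau=1.0) for v in features]
grads = [soft_threshold_grad(v, tau=1.0) for v in features]
```

Values inside the band are removed entirely, while larger values keep their sign and lose exactly the threshold amount, which is how weak, noise-like features are suppressed.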

4. Case Study

4.1. Dataset Analysis

The dataset used in this study consists of one year of data from a power station in Hebei, China, with records from three monitoring stations: Station 0, Station 1, and Station 2, representing the central, northern, and southern regions of Hebei Province, respectively. Each station’s dataset includes power generation, forecasted total irradiance, forecasted direct irradiance, forecasted temperature, humidity, wind speed, and local meteorological data, as detailed in Table 1. The forecast data were provided by Numerical Weather Prediction (NWP), while the local meteorological data were derived from Local Meteorological Data (LMD). The data cover the period from August 2018 to September 2019, with records collected every 15 min. Taking Station 0 as an example, Table 2 presents a statistical analysis of the mean, maximum, and minimum values for each data component. The dataset is divided into training, validation, and test sets in a 7:1.5:1.5 ratio.
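The 7:1.5:1.5 division can be sketched as follows. The source does not say whether the split is chronological or shuffled; this example assumes a chronological split, which is a common choice for time series, and uses placeholder indices in place of the real records.

```python
def chronological_split(records, percents=(70, 15, 15)):
    """Split in time order using integer percentages (train, validation, test)."""
    n = len(records)
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    train = records[:n_train]
    val = records[n_train:n_train + n_val]
    test = records[n_train + n_val:]
    return train, val, test

samples_per_day = 24 * 60 // 15                 # 96 records per day at 15 min resolution
records = list(range(samples_per_day * 100))    # 100 days of placeholder indices
train, val, test = chronological_split(records)
```

Keeping the order intact means every validation and test record comes after all training records, so the evaluation does not see data from the training period.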

The three stations differ significantly in terms of their weather patterns. Station 0 (central Hebei) experiences a moderate, temperate climate with relatively balanced seasonal variations, making it suitable for assessing standard operational conditions for PV generation. Station 1 (northern Hebei) faces harsher, colder winters and more extreme temperature variations, often influenced by the Siberian cold air, which can impact the efficiency of PV systems during winter. Station 2 (southern Hebei) enjoys a milder climate with higher humidity and more frequent rainfall in summer, resulting in a different set of challenges, such as higher humidity levels and seasonal storms, which can affect PV performance. These variations in climate across the three stations allow for a comprehensive evaluation of the model’s generalization ability under different meteorological conditions.

4.2. Feature Analysis Method

To minimize the computational complexity of the prediction model, the Spearman Correlation Coefficient (SCC) and the Pearson Correlation Coefficient (PCC) were employed to assess the relationships among input variables [32,33]. The PCC (22) and SCC (23) are calculated as follows:

r_{Q K} = \frac{\sum_{i = 1}^{n} (q_{i} - \bar{Q}) (k_{i} - \bar{K})}{\sqrt{\sum_{i = 1}^{n} {(q_{i} - \bar{Q})}^{2}} \sqrt{\sum_{i = 1}^{n} {(k_{i} - \bar{K})}^{2}}}

(22)

ρ = 1 - \frac{6 \sum d_{i}^{2}}{n (n^{2} - 1)}

(23)

where $r_{Q K}$ is the Pearson correlation coefficient between the variables Q and K, and $\bar{Q}$ and $\bar{K}$ are their sample means. It lies in the range [−1, 1], and the larger its absolute value, the stronger the correlation between Q and K. $ρ$ is Spearman’s rank correlation coefficient, $d_{i}$ is the difference between the ranks of the i-th pair of observations after each variable has been sorted, and n is the number of observations.
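Equations (22) and (23) can be sketched in a few lines of plain Python. The function names and the sample values are hypothetical, and the rank helper assumes there are no tied values, which is the case in which the closed form of Equation (23) is exact.

```python
import math


def pearson(q, k):
    """Pearson correlation coefficient, Equation (22)."""
    n = len(q)
    q_bar = sum(q) / n
    k_bar = sum(k) / n
    cov = sum((qi - q_bar) * (ki - k_bar) for qi, ki in zip(q, k))
    ss_q = sum((qi - q_bar) ** 2 for qi in q)
    ss_k = sum((ki - k_bar) ** 2 for ki in k)
    return cov / (math.sqrt(ss_q) * math.sqrt(ss_k))


def ranks(values):
    """Ranks 1..n in ascending order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(q, k):
    """Spearman rank correlation coefficient, Equation (23)."""
    n = len(q)
    d_squared = sum((rq - rk) ** 2 for rq, rk in zip(ranks(q), ranks(k)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Made-up values for illustration only (not taken from the dataset).
irradiance = [0.0, 120.0, 480.0, 760.0, 910.0]
power = [0.0, 0.4, 2.1, 3.9, 4.2]
print(pearson(irradiance, power), spearman(irradiance, power))
```

Because the Spearman coefficient depends only on ranks, any strictly increasing relationship yields a value of 1 even when the Pearson coefficient is below 1.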

In their study, Gherboudj and Ghedira [34] demonstrated that effective feature selection for PV power generation models relies on meteorological factors such as irradiance, temperature, and humidity, which significantly influence the solar energy potential and performance. Building on this, the correlation coefficients between the meteorological variables and power generation at Station 0 were calculated using the above method, as shown in Table 3. Figure 7 visualizes this information in the form of a heat map. The analysis shows that irradiance is the meteorological factor most relevant to PV power generation: both forecast and ground-observed irradiance correlate strongly with PV power output (coefficients between 0.77 and 0.97). Temperature and humidity exhibit a moderate correlation (absolute values of roughly 0.36 to 0.46). Forecast wind speed correlates only weakly with power (about 0.18), whereas ground-observed wind speed shows a higher, moderate correlation (0.35 to 0.38). Therefore, the final model includes ground-observed wind speed together with irradiance, temperature, and humidity, each of which shows a strong to moderate correlation.

4.3. Performance Evaluation Metrics

Equations (24)–(26) define the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination ($R^{2}$), which are used to evaluate the prediction model’s accuracy in this experiment [35]. The smaller the MAE and RMSE values, the more accurate the prediction result. Additionally, an $R^{2}$ value closer to 1 indicates higher prediction accuracy.

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}

(24)

\mathrm{MAE} = \frac{1}{N} \sum_{i = 1}^{N} | y_{i} - {\hat{y}}_{i} |

(25)

R^{2} = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}

(26)

In this context, $y_{i}$ stands for the actual PV output power, ${\hat{y}}_{i}$ signifies the predicted PV output power, $\bar{y}$ is the mean of the actual PV output power over the test set, and N denotes the number of test samples.
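A minimal sketch of the three metrics in plain Python follows. The function names and the toy values are illustrative assumptions and are not taken from the experimental code.

```python
import math


def rmse(y, y_hat):
    """Root Mean Square Error, Equation (24)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Mean Absolute Error, Equation (25)."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)


def r2(y, y_hat):
    """Coefficient of determination, Equation (26)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((a - y_bar) ** 2 for a in y)             # total sum of squares
    return 1 - ss_res / ss_tot


# Toy values in MW, made up for illustration.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(mae(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```

For these toy values the residual sum of squares is 0.5 and the total sum of squares is 5.0, so the coefficient of determination is 0.9, while the MAE is 0.25.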

4.4. Parameter Setting

The experimental code described in this paper is implemented using the PyTorch 2.1.1 framework. Selecting hyperparameters is essential for effectively training a model, and the hyperparameters chosen in this study are shown in Table 4. The model’s configuration includes an input dimension of 10 and an output dimension of 1. The training process is configured to end after 200 epochs, with a dropout rate set at 0.05, and the optimizer is Adam. Since PV power generation depends mainly on solar irradiance, the power generated at night is zero. Therefore, only the 13 h of daytime data from 6 AM to 7 PM are used, which gives 52 data points per day at the 15 min sampling interval, and the time step is set to 52.
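The relationship between the sampling interval, the daytime window, and the time step can be made concrete with a small windowing sketch. The helper `make_windows`, its `horizon` argument, and the univariate toy series are assumptions for illustration; the actual inputs have 10 features per step, and the text does not describe how the windows were built.

```python
def make_windows(series, time_steps=52, horizon=1):
    """Build (inputs, targets) pairs from a sequence.

    Each input holds `time_steps` consecutive values; each target is the value
    `horizon` steps after the last value of its input window.
    """
    inputs, targets = [], []
    last_start = len(series) - time_steps - horizon
    for start in range(last_start + 1):
        end = start + time_steps
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


steps_per_hour = 60 // 15          # records every 15 min
time_steps = 13 * steps_per_hour   # 6 AM to 7 PM gives 52 steps

toy_series = list(range(60))       # placeholder values, not real power data
x, y = make_windows(toy_series, time_steps=time_steps, horizon=1)
print(time_steps, len(x), y[0])
```

With `horizon=1` the target is the next 15 min record; larger values of `horizon` would target points further ahead under the same assumed windowing scheme.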

4.5. Results

This paper evaluates the performance of the proposed model through a single-step prediction test, where the goal is to predict a data point 15 min ahead. Table 5 summarizes the evaluation metrics for six models on the single-step forecasting test set. The proposed TB-BTCGA model is compared with the TCN, TCN-LSTM, BiTCN, BiTCN-BiGRU, and BiTCN-BiGRU-MA models based on MAE, RMSE, and $R^{2}$. The results in Table 5 show that the proposed model achieves an MAE of 0.193, RMSE of 0.325, and $R^{2}$ of 0.946 for single-step prediction.

By comparing the standalone TCN and BiTCN models, it is evident that the bidirectional TCN outperforms the unidirectional TCN in terms of prediction accuracy. This demonstrates that incorporating a bidirectional structure effectively enhances the model’s predictive capability. Specifically, the bidirectional TCN captures more time-series features by utilizing both past and future contexts, which helps avoid the loss of important information that a unidirectional model might miss.

In the comparison of hybrid models, both TCN-LSTM and BiTCN-BiGRU show improvements in $R^{2}$ over their standalone counterparts (0.894 vs. 0.884 for the TCN, and 0.924 vs. 0.906 for the BiTCN). This suggests that combining the TCN with the LSTM network or BiGRU allows the model to effectively integrate local and global time-series features, thereby improving prediction accuracy. In terms of evaluation metrics, the proposed model outperforms the BiTCN-BiGRU-MA model, reducing MAE by 0.032 (from 0.225 to 0.193), decreasing RMSE by 0.051 (from 0.376 to 0.325), and increasing $R^{2}$ by 0.013 (from 0.933 to 0.946). These results demonstrate that the proposed method improves accuracy compared to the unimproved model, increasing $R^{2}$ while reducing the errors.
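The gaps between models can be expressed either as absolute differences or as relative changes, and the two readings give quite different numbers. The sketch below computes both from the Table 5 values; the dictionary layout and the helper name `compare` are illustrative assumptions.

```python
# (MAE, RMSE, R^2) for Station 0, copied from Table 5.
TABLE5 = {
    "TCN": (0.314, 0.486, 0.884),
    "BiTCN-BiGRU-MA": (0.225, 0.376, 0.933),
    "TB-BTCGA": (0.193, 0.325, 0.946),
}


def compare(baseline, proposed="TB-BTCGA"):
    """Return {metric: (absolute change, relative change in %)} versus a baseline."""
    result = {}
    names = ("MAE", "RMSE", "R2")
    for name, base, new in zip(names, TABLE5[baseline], TABLE5[proposed]):
        result[name] = (new - base, 100.0 * (new - base) / base)
    return result


for baseline in ("BiTCN-BiGRU-MA", "TCN"):
    for metric, (abs_change, rel_change) in compare(baseline).items():
        print(f"{baseline:>15} {metric:<4} {abs_change:+.3f} ({rel_change:+.1f}%)")
```

Against BiTCN-BiGRU-MA, the absolute MAE and RMSE gaps of 0.032 and 0.051 correspond to relative reductions of roughly 14%. Against the TCN, the relative reductions are roughly 38.5% and 33.1%, in line with the figures quoted in the abstract.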

To further verify the effectiveness of the proposed hybrid photovoltaic power generation prediction model, the six aforementioned models are applied to forecast the data of the region containing the power station in the test set, covering the period from 6:00 to 19:00 on both clear and cloudy days. The outcomes of the experiment are shown in Figure 8 and Figure 9. Specifically, Figure 8 presents the forecasting outcomes for clear days, while Figure 9 illustrates the prediction outcomes for overcast conditions. In the single-step prediction, the model introduced in this study has a greater advantage in fitting the true values compared to other models and has a better power prediction trend when there are large fluctuations in power generation.

In summary, the TB-BTCGA model proposed in this paper, which integrates the improved BiTCN, BiGRU, multi-head attention mechanism, and AR model, demonstrates superior accuracy in single-step prediction on the Station 0 dataset compared to other models. However, the model’s generalization ability and stability in real-world applications, particularly under conditions of poor data quality or significant data fluctuations, still require further validation. Therefore, the model’s predictive accuracy was further evaluated using the Station 1 and Station 2 datasets.

Figure 10 presents the 3-day PV power prediction results for Station 1 under cold winter conditions in the northern region. Due to the region’s climate characteristics, the PV power generation data remain relatively stable. Consequently, the predicted values in the figure exhibit smooth variations with relatively small errors, highlighting the model’s strong adaptability to stable environmental conditions. The curves in the figure illustrate the differences between actual PV power generation and the predicted values, showcasing the model’s ability to forecast such stable seasonal fluctuations. Despite the limited sunlight during winter, the model effectively predicts the power generation trend and accurately captures short-term variations. This demonstrates that the proposed model is well suited for stable environments, exhibiting both strong adaptability and high prediction accuracy for power generation patterns under typical climatic conditions.

Figure 11 presents the 3-day PV power prediction results for Station 2 under summer rainy conditions. Compared to Station 1, the data from Station 2 are more affected by weather fluctuations, particularly during the rainy season, which increases the volatility of photovoltaic power generation. The instability in light intensity and duration during rainy weather leads to significant short-term fluctuations in power generation, making forecasting more challenging. The prediction curves in the figure illustrate the model’s ability to handle such unstable weather conditions. Despite the increased uncertainty, the model still effectively predicts the short-term power generation trend. Although the prediction error is slightly higher compared to Station 1, the model still delivers relatively stable results. In particular, during frequent rainy conditions, the model successfully captures the variations in features and adjusts for errors, enabling it to track power generation changes accurately and maintain high prediction precision even amid short-term fluctuations.

In addition to single-step prediction, experiments were also conducted on multi-step forecasting, where power generation for 3, 6, and 12 time steps (corresponding to 45 min, 1.5 h, and 3 h into the future) is predicted. The results of these experiments are shown in Table 6. While the proposed model outperforms all other models in every multi-step prediction, its decreasing performance with increasing time steps warrants further investigation.

As expected, the prediction accuracy tends to degrade as the forecast horizon lengthens, a common challenge in time-series forecasting. The temporal bottleneck structure partially mitigates error propagation in multi-step predictions by emphasizing critical features, resulting in a controlled increase in MAE and RMSE. The main reason for this decline is the accumulation of prediction errors over multiple time steps, which is inherent to sequential forecasting tasks. In particular, the model’s ability to maintain prediction accuracy decreases as the forecast horizon extends, making it more challenging to predict short-term fluctuations accurately. However, despite this decline, the proposed model still maintains a clear edge over other models, indicating that the enhancements—such as the bidirectional TCN, BiGRU, and attention mechanisms—significantly improve the model’s performance in multi-step forecasting. These improvements allow the model to handle the increasing complexity of multi-step predictions better than other methods, although further work is needed to address the challenges associated with long-term forecasting.

4.6. Analysis of Computational Complexity

Table 7 presents the computational complexity of the proposed and benchmark frameworks. Computational cost is measured here by the number of floating-point operations (FLOPs) and by the training time. As shown, the cost increases progressively with the inclusion of advanced modules such as the BiGRU, multi-head attention, the DRSN, and the temporal bottleneck structure in the TB-BTCGA model. These additions enhance the model’s ability to capture complex temporal dependencies and refine feature extraction, and they give TB-BTCGA the highest FLOPs (6.88 G) and the longest training time (65.96 s per epoch, 219.87 min in total). While the increased complexity demands more computational resources, it improves the model’s predictive accuracy and robustness, which justifies the trade-off for better forecasting performance.
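The columns of Table 7 are related by simple arithmetic: the total training time matches the per-epoch time multiplied by the 200 training epochs. The short sketch below reproduces this and the cost ratio between the simplest and the proposed model; the helper name `total_minutes` is an illustrative assumption.

```python
EPOCHS = 200  # number of training epochs from the parameter settings

# (seconds per epoch, GFLOPs) copied from Table 7.
TABLE7 = {"TCN": (12.67, 1.88), "TB-BTCGA": (65.96, 6.88)}


def total_minutes(seconds_per_epoch, epochs=EPOCHS):
    """Total training time in minutes for a given per-epoch time."""
    return seconds_per_epoch * epochs / 60.0


for model, (sec_per_epoch, gflops) in TABLE7.items():
    print(f"{model}: {total_minutes(sec_per_epoch):.2f} min, {gflops} GFLOPs")

time_ratio = TABLE7["TB-BTCGA"][0] / TABLE7["TCN"][0]
flops_ratio = TABLE7["TB-BTCGA"][1] / TABLE7["TCN"][1]
print(f"time x{time_ratio:.2f}, FLOPs x{flops_ratio:.2f}")
```

Relative to the plain TCN, the proposed model needs roughly 3.7 times the FLOPs and 5.2 times the training time per epoch.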

4.7. Discussion

The TB-BTCGA model significantly improves PV power prediction, particularly in short-term forecasting. By incorporating the BiTCN, BiGRU, multi-head attention mechanism, and AR components, the model can better capture temporal dependencies in PV power generation data, leading to more accurate forecasts.

However, challenges in long-term forecasting remain, as error accumulation across time steps continues to affect predictive accuracy. The temporal bottleneck structure mitigates this issue by emphasizing critical features across time steps, reducing redundancy and focusing on key patterns. Meanwhile, bidirectional networks address the limitation of unidirectional models by capturing both historical and future dependencies. Despite these advancements, further improvements are needed, such as exploring ensemble learning or Transformer-based long-term memory structures, to better handle multi-step forecasting challenges.

A major strength of the TB-BTCGA model lies in its adaptability to varying environmental conditions. The temporal bottleneck ensures efficient feature extraction, which benefits both stable climates, such as cold winters (Station 1), and volatile climates, like summer rainy conditions (Station 2). Additionally, the bidirectional networks allow the model to dynamically adjust to diverse climatic scenarios, improving robustness and accuracy.

Nevertheless, the model’s performance under extreme weather conditions, such as sudden storms or rapid weather changes, remains untested. Future research should focus on these scenarios to ensure the model’s reliability in real-world applications, where extreme conditions are increasingly common due to climate change.

The TB-BTCGA model demonstrates significant potential for real-world applications, particularly in smart grids and PV power management systems. By minimizing redundant computations, the temporal bottleneck structure enhances computational efficiency, enabling the real-time processing of data from multiple PV systems. This feature is critical for real-time energy management, as it allows grid operators to make informed decisions under fluctuating energy production conditions. The bidirectional networks further improve decision-making by capturing both historical trends and future dependencies, ensuring accurate and dynamic predictions even in rapidly changing grid environments.

For practical implementation, the model must seamlessly integrate with existing grid management systems, accommodating inputs such as grid load, energy storage, and demand fluctuations. For example, accurate short-term predictions can optimize energy storage operations by ensuring that storage systems charge during periods of surplus energy and discharge during peak demand. Furthermore, during extreme weather events, the model’s ability to process real-time data from Internet of Things (IoT) devices, such as weather sensors and grid monitors, enables it to provide timely and reliable forecasts. This adaptability is crucial for maintaining grid stability and supporting intelligent energy management.

In summary, while the TB-BTCGA model achieves strong short-term forecasting performance and demonstrates adaptability to diverse conditions, several areas require further exploration. These include the following:

(1) Improving multi-step forecasting: Future work should focus on reducing error accumulation through advanced methods like long-term memory networks or error correction mechanisms.

(2) Handling data uncertainty: Techniques such as data imputation and anomaly detection can improve the model’s robustness when faced with noisy or incomplete data.

(3) Cross-regional validation: Testing on diverse datasets will help evaluate the model’s generalizability to different regions and environmental conditions.

(4) Enhancing computational efficiency: Optimizing the model for large-scale deployment without compromising predictive accuracy will be critical for its real-time applications.

5. Conclusions

This paper proposes a novel TB-BTCGA model for PV power prediction that integrates an improved BiTCN, a multi-head attention mechanism, and a BiGRU. To enhance the model’s feature extraction capacity and mitigate the interference of redundant features, the DRSN is incorporated to improve the residual module of the BiTCN. Additionally, a temporal bottleneck structure is introduced to optimize the original TCN residual block. The multi-head attention mechanism facilitates more effective extraction of critical features. Furthermore, the BiGRU network captures long-term dependencies within the time series, while the AR model refines the model’s ability to extract linear features.

Experiments conducted on various datasets demonstrate that the proposed TB-BTCGA model outperforms several state-of-the-art models in terms of PV power prediction accuracy. Specifically, compared to the single TCN model on Station 0, the proposed model reduces the MAE and RMSE by 0.121 and 0.161, respectively (relative reductions of 38.53% and 33.12%), while the $R^{2}$ improves by 0.062 (from 0.884 to 0.946). These results highlight the model’s robust performance across different environmental conditions and its ability to generalize effectively, suggesting its potential for practical applications in diverse scenarios.

However, the model’s forecasting performance during extreme weather events, such as rapid temperature changes or sudden changes in cloud cover, still needs improvement. Any resulting lag or drop in prediction accuracy could impact the usability of forecast results in real-time applications. Future work should focus on addressing these issues, possibly through enhanced weather condition modeling or real-time data integration. While this study demonstrates the model’s effectiveness in PV power prediction under typical environmental conditions, its generalizability across a wider range of climates and geographical areas requires further investigation. Case studies in other regions with varying weather patterns should be conducted to better understand the model’s scalability and robustness. Additionally, future research should explore the model’s performance under different operational scenarios, such as varying levels of solar panel efficiency or grid management requirements. By addressing these areas, the TB-BTCGA model could provide an even more effective solution for photovoltaic power management in smart grids.

Author Contributions

Conceptualization, J.G., X.L. and F.Z.; writing—review and editing, J.G., T.C. and X.L.; methodology, X.L. and J.G.; writing—original draft, X.L. and C.F.; formal analysis, Y.H., T.C. and J.L.; supervision, P.W. and F.Z.; data curation, Z.L., T.H. and Y.H.; funding acquisition, Z.L. and C.F.; visualization, J.L. and P.W.; validation, T.C., C.F., T.H. and J.L. All authors reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Sichuan Science and Technology Plan for the Research and Application of Key Technologies for Command and Equipment in Hail Suppression in Xinjiang (2024YFHZ0151), the Second Comprehensive Scientific Investigation of the Tibetan Plateau on Extreme Weather, Climate Events, and Disaster Risk (2019QZKK0104) funded by the Ministry of Science and Technology, the National Funded Postdoctoral Research Program (GZC20241900), the Natural Science Foundation Program of Xinjiang Uygur Autonomous Region (2024D01A141), and the Tianchi Talents Program of Xinjiang Uygur Autonomous Region. It is also supported by the Postdoctoral Fund of Xinjiang Uygur Autonomous Region, as well as the Key Projects of Open Fund (ZSAQ202401, ZSAQ202423, ZSAQ202424).

Data Availability Statement

The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.

Acknowledgments

Special thanks to each of the authors for their contributions to this article.

Conflicts of Interest

Author Yaoran Huo was employed by the Information & Communication Company, State Grid Sichuan Electric Power Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

PV	Photovoltaic
TCN	Temporal Convolutional Network
DRSN	Deep Residual Shrinkage Networks
BiTCN	Bidirectional Temporal Convolutional Network
AR	Autoregressive
TB-BTCGA	Temporal Bottleneck-enhanced Bidirectional Temporal Convolutional Network with Multi-Head Attention and Autoregressive
AI	Artificial Intelligence
ANN	Artificial Neural Network
DNN	Deep Neural Networks
LSTM	Long Short-Term Memory Network
GRU	Gated Recurrent Unit
RNN	Recurrent neural network
BiLSTM	Bidirectional Long Short-Term Memory Network
CNN	Convolutional Neural Networks
LSTNet	Long- and Short-Term Net
AMTCN	Attention-based Multivariate Temporal Convolutional Network
BiGRU	Bidirectional Gated Recurrent Unit
VMD	Variational Mode Decomposition
ReLu	Rectified Linear Unit
GeLu	Gaussian Error Linear Unit
NWP	Numerical Weather Prediction
LMD	Local Meteorological Data
FC	Full Connection
PCC	Pearson Correlation Coefficient
SCC	Spearman Correlation Coefficient
irrad	Irradiance
MA	Multi-head attention
RMSE	Root Mean Square Error
MAE	Mean Absolute Error
$R^{2}$	Coefficient of determination
FLOPs	Floating-point operations
IoT	Internet of Things

References

1. Gielen, D.; Boshell, F.; Saygin, D.; Bazilian, M.D.; Wagner, N.; Gorini, R. The role of renewable energy in the global energy transformation. Energy Strategy Rev. 2019, 24, 38–50.
2. International Energy Agency. Global Energy Review 2021. 2021. Available online: https://www.iea.org/reports/global-energy-review-2021 (accessed on 5 June 2024).
3. Li, Z.; Rahman, S.M.; Vega, R.; Dong, B. A hierarchical approach using machine learning methods in solar photovoltaic energy production forecasting. Energies 2016, 9, 55.
4. Moreira, M.; Balestrassi, P.; Paiva, A.; Ribeiro, P.; Bonatto, B. Design of experiments using artificial neural network ensemble for photovoltaic generation forecasting. Renew. Sustain. Energy Rev. 2021, 135, 110450.
5. Wolff, B.; Kühnert, J.; Lorenz, E.; Kramer, O.; Heinemann, D. Comparing support vector regression for PV power forecasting to a physical modeling approach using measurement, numerical weather prediction, and cloud motion data. Sol. Energy 2016, 135, 197–208.
6. Li, P.; Zhou, K.; Lu, X.; Yang, S. A hybrid deep learning model for short-term PV power forecasting. Appl. Energy 2020, 259, 114216.
7. Wang, H.; Liu, Y.; Zhou, B.; Li, C.; Cao, G.; Voropai, N.; Barakhtenko, E. Taxonomy research of artificial intelligence for deterministic solar power forecasting. Energy Convers. Manag. 2020, 214, 112909.
8. Lee, D.; Kim, K. Recurrent neural network-based hourly prediction of photovoltaic power output using meteorological information. Energies 2019, 12, 215.
9. Sarmas, E.; Spiliotis, E.; Stamatopoulos, E.; Marinakis, V.; Doukas, H. Short-term photovoltaic power forecasting using meta-learning and numerical weather prediction independent Long Short-Term Memory models. Renew. Energy 2023, 216, 118997.
10. Joseph, L.P.; Deo, R.C.; Prasad, R.; Salcedo-Sanz, S.; Raj, N.; Soar, J. Near real-time wind speed forecast model with bidirectional LSTM networks. Renew. Energy 2023, 204, 39–58.
11. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610.
12. Ma, H.; Zhang, C.; Peng, T.; Nazir, M.S.; Li, Y. An integrated framework of gated recurrent unit based on improved sine cosine algorithm for photovoltaic power forecasting. Energy 2022, 256, 124650.
13. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271.
14. Xiang, L.; Liu, J.; Yang, X.; Hu, A.; Su, H. Ultra-short term wind power prediction applying a novel model named SATCN-LSTM. Energy Convers. Manag. 2022, 252, 115036.
15. Wang, M.; Rao, C.; Xiao, X.; Hu, Z.; Goh, M. Efficient shrinkage temporal convolutional network model for photovoltaic power prediction. Energy 2024, 297, 131295.
16. Zhang, W.; Liu, J.; Deng, W.; Tang, S.; Yang, F.; Han, Y.; Liu, M.; Wan, R. AMTCN: An Attention-Based Multivariate Temporal Convolutional Network for Electricity Consumption Prediction. Electronics 2024, 13, 4080.
17. Heo, J.; Song, K.; Han, S.; Lee, D.E. Multi-channel convolutional neural network for integration of meteorological and geographical features in solar power forecasting. Appl. Energy 2021, 295, 117083.
18. Netsanet, S.; Zheng, D.; Zhang, W.; Teshager, G. Short-term PV power forecasting using variational mode decomposition integrated with Ant colony optimization and neural network. Energy Rep. 2022, 8, 2022–2035.
19. Li, Y.; Song, L.; Zhang, S.; Kraus, L.; Adcox, T.; Willardson, R.; Komandur, A.; Lu, N. A TCN-based hybrid forecasting framework for hours-ahead utility-scale PV forecasting. IEEE Trans. Smart Grid 2023, 14, 4073–4085.
20. Qu, J.; Qian, Z.; Pei, Y. Day-ahead hourly photovoltaic power forecasting using attention-based CNN-LSTM neural network embedded with multiple relevant and target variables prediction pattern. Energy 2021, 232, 120996.
21. Lai, G.; Chang, W.C.; Yang, Y.; Liu, H. Modeling long-and short-term temporal patterns with deep neural networks. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 95–104.
22. Zhen, H.; Niu, D.; Wang, K.; Shi, Y.; Ji, Z.; Xu, X. Photovoltaic power forecasting based on GA improved Bi-LSTM in microgrid without meteorological information. Energy 2021, 231, 120908.
23. Abdel-Basset, M.; Hawash, H.; Chakrabortty, R.K.; Ryan, M. PV-Net: An innovative deep learning approach for efficient forecasting of short-term photovoltaic energy production. J. Clean. Prod. 2021, 303, 127037.
24. Yu, M.; Niu, D.; Wang, K.; Du, R.; Yu, X.; Sun, L.; Wang, F. Short-term photovoltaic power point-interval forecasting based on double-layer decomposition and WOA-BiLSTM-Attention and considering weather classification. Energy 2023, 275, 127348.
25. Limouni, T.; Yaagoubi, R.; Bouziane, K.; Guissi, K.; Baali, E.H. Accurate one step and multistep forecasting of very short-term PV power using LSTM-TCN model. Renew. Energy 2023, 205, 1010–1024.
26. Pu, X.; Hao, X.; Jiarui, W.; Pei, W.; Yang, J.; Zhang, J. A novel GRU-TCN network based Interactive Behavior Learning of multi-energy Microgrid under incomplete information. Energy Rep. 2023, 9, 608–616.
27. Fu, H.; Zhang, J.; Xie, S. A Novel Improved Variational Mode Decomposition-Temporal Convolutional Network-Gated Recurrent Unit with Multi-Head Attention Mechanism for Enhanced Photovoltaic Power Forecasting. Electronics 2024, 13, 1837.
28. Zhang, D.; Chen, B.; Zhu, H.; Goh, H.H.; Dong, Y.; Wu, T. Short-term wind power prediction based on two-layer decomposition and BiTCN-BiLSTM-attention model. Energy 2023, 285, 128762.
29. Samal, K.K.R.; Panda, A.K.; Babu, K.S.; Das, S.K. Multi-output TCN autoencoder for long-term pollution forecasting for multiple sites. Urban Clim. 2021, 39, 100943.
30. Choi, S.; Chung, S.; Lee, S.; Han, S.; Kang, T.; Seo, J.; Kwak, I.Y.; Oh, S. TB-ResNet: Bridging the Gap from TDNN to ResNet in Automatic Speaker Verification with Temporal-Bottleneck Enhancement. In Proceedings of the ICASSP 2024—2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 10291–10295.
31. Zhao, M.; Zhong, S.; Fu, X.; Tang, B.; Pecht, M. Deep residual shrinkage networks for fault diagnosis. IEEE Trans. Ind. Inform. 2019, 16, 4681–4690.
32. Huang, C.; Yang, M. Memory long and short term time series network for ultra-short-term photovoltaic power forecasting. Energy 2023, 279, 127961.
33. Zhang, M.; Li, W.; Zhang, L.; Jin, H.; Mu, Y.; Wang, L. A Pearson correlation-based adaptive variable grouping method for large-scale multi-objective optimization. Inf. Sci. 2023, 639, 118737.
34. Gherboudj, I.; Ghedira, H. Assessment of solar energy potential over the United Arab Emirates using remote sensing and weather forecast data. Renew. Sustain. Energy Rev. 2016, 55, 1210–1224.
35. Rao, C.; Xu, Y.; Xiao, X.; Hu, F.; Goh, M. Imbalanced customer churn classification using a new multi-strategy collaborative processing method. Expert Syst. Appl. 2024, 247, 123251.

Figure 1. The block diagram for the framework of the proposed model.

Figure 2. The structure of the dilated causal convolutional network and residual block.

Figure 3. The structure of the bidirectional dilated causal convolutional network.

Figure 4. The structure of the residual block in the BiTCN.

Figure 5. The structures of the GRU and BiGRU.

Figure 6. Improved TCN residual structure diagram.

Figure 7. Pearson and Spearman correlation heat maps.

Figure 8. Comparison of PV power prediction on clear days.

Figure 9. Comparison of PV power prediction in cloudy weather.

Figure 10. Prediction and comparison of 3-day photovoltaic power generation at Station 1.

Figure 11. Prediction and comparison of 3-day photovoltaic power generation at Station 2.

Table 1. Detailed description of dataset.

Name	Description	Units
Nwp_globalirrad	Global irradiance of NWP	W/m²
Nwp_directirrad	Direct irradiance of NWP	W/m²
Nwp_temperature	10 m dry-bulb temperature of NWP	°C
Nwp_humidity	10 m relative humidity of NWP	%
Nwp_windspeed	10 m wind speed of NWP	m/s
Lmd_totalirrad	Global irradiance of LMD	W/m²
Lmd_diffuseirrad	Diffuse irradiance of LMD	W/m²
Lmd_temperature	Temperature of LMD	°C
Lmd_windspeed	Wind speed of LMD	m/s
Power	PV output of the station	MW

Table 2. Dataset statistics table for Station 0.

Variable	Average	Maximum	Minimum
Nwp_globalirrad (W/m²)	304.84	304.84	0
Nwp_directirrad (W/m²)	267.52	885.62	0
Nwp_temperature (°C)	13.56	41.09	−14.01
Nwp_humidity (%)	34.58	99.81	5.07
Nwp_windspeed (m/s)	3.80	15.98	0.08
Lmd_totalirrad (W/m²)	303.11	1122	0
Lmd_diffuseirrad (W/m²)	174.54	927	0
Lmd_temperature (°C)	13.17	36.80	−13.50
Lmd_windspeed (m/s)	1.81	12.10	0
Power (MW)	1.50	5.52	0

Table 3. Correlation coefficient analysis.

Variable	SCC	PCC
Nwp_globalirrad (W/m²)	0.906	0.886
Nwp_directirrad (W/m²)	0.891	0.880
Nwp_temperature (°C)	0.462	0.451
Nwp_humidity (%)	−0.362	−0.365
Nwp_windspeed (m/s)	0.182	0.184
Lmd_totalirrad (W/m²)	0.968	0.966
Lmd_diffuseirrad (W/m²)	0.869	0.773
Lmd_temperature (°C)	0.448	0.448
Lmd_windspeed (m/s)	0.381	0.354

Table 4. Parameter settings.

Parameters	Value
Model dim	[16, 32, 64, 128, 256]
Batch size	16
Learning rate	0.001
Kernel size	4
Dilations	1, 2, 4, 8, 16
Time steps	52
Dropout	0.5
Epochs	200

Table 5. Prediction model evaluation index table for Station 0.

Model	MAE	RMSE	$R^{2}$
TCN	0.314	0.486	0.884
TCN-LSTM	0.329	0.465	0.894
BiTCN	0.270	0.437	0.906
BiTCN-BiGRU	0.243	0.393	0.924
BiTCN-BiGRU-MA	0.225	0.376	0.933
TB-BTCGA	0.193	0.325	0.946

Table 6. Table of multi-step prediction model evaluation indices.

Dataset	Model	MAE (45 min)	MAE (90 min)	MAE (180 min)	RMSE (45 min)	RMSE (90 min)	RMSE (180 min)
Station 0	TCN	0.395	0.535	0.707	0.596	0.735	1.003
	BiTCN	0.368	0.445	0.590	0.574	0.634	0.855
	TCN-LSTM	0.377	0.523	0.667	0.579	0.649	0.909
	BiTCN-BiGRU	0.303	0.378	0.519	0.468	0.570	0.734
	BiTCN-BiGRU-MA	0.293	0.329	0.413	0.453	0.497	0.586
	TB-BTCGA	0.255	0.291	0.346	0.400	0.441	0.496
Station 1	TCN	0.374	0.514	0.693	0.562	0.716	0.984
	BiTCN	0.364	0.428	0.557	0.563	0.615	0.815
	TCN-LSTM	0.362	0.501	0.631	0.541	0.622	0.862
	BiTCN-BiGRU	0.299	0.346	0.488	0.463	0.548	0.708
	BiTCN-BiGRU-MA	0.284	0.303	0.395	0.431	0.462	0.562
	TB-BTCGA	0.234	0.272	0.317	0.369	0.425	0.466
Station 2	TCN	0.422	0.564	0.731	0.653	0.746	1.032
	BiTCN	0.421	0.453	0.592	0.631	0.669	0.875
	TCN-LSTM	0.422	0.541	0.682	0.598	0.672	0.901
	BiTCN-BiGRU	0.355	0.393	0.512	0.496	0.631	0.788
	BiTCN-BiGRU-MA	0.327	0.353	0.431	0.499	0.517	0.608
	TB-BTCGA	0.251	0.341	0.393	0.399	0.477	0.592
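
One way to read Table 6 is to ask how quickly each model's error grows as the forecast horizon lengthens. The sketch below computes the ratio of the 180 min MAE to the 45 min MAE for Station 0. This ratio is an illustrative summary chosen here, not a metric defined in the paper.

```python
# Station 0 MAE at the 45, 90 and 180 min horizons (Table 6).
station0_mae = {
    "TCN": (0.395, 0.535, 0.707),
    "BiTCN": (0.368, 0.445, 0.590),
    "TCN-LSTM": (0.377, 0.523, 0.667),
    "BiTCN-BiGRU": (0.303, 0.378, 0.519),
    "BiTCN-BiGRU-MA": (0.293, 0.329, 0.413),
    "TB-BTCGA": (0.255, 0.291, 0.346),
}


def horizon_growth(errors):
    """Ratio of the longest-horizon error to the shortest-horizon error."""
    return errors[-1] / errors[0]


growth = {model: horizon_growth(mae) for model, mae in station0_mae.items()}
slowest_growth = min(growth, key=growth.get)

for model, ratio in growth.items():
    print(f"{model:15s} {ratio:.2f}x")
```

For Station 0, the MAE of TB-BTCGA grows by a factor of about 1.36 between the 45 min and 180 min horizons, compared with about 1.79 for the plain TCN.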

Table 7. Comparison of computational complexity of different models.

Model	Training Time/Epoch (s)	Total Training Time (min)	FLOPs (G)
TCN	12.67	42.23	1.88
TCN-LSTM	28.32	94.4	2.21
BiTCN	25.47	84.9	3.76
BiTCN-BiGRU	45.53	151.77	4.75
BiTCN-BiGRU-MA	50.74	169.13	5.30
TB-BTCGA	65.96	219.87	6.88
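
The columns of Table 7 can be cross-checked against the 200 epochs listed in Table 4. The sketch below assumes that the total training time is simply the per-epoch time multiplied by the number of epochs. It also computes the cost of TB-BTCGA relative to the TCN baseline.

```python
EPOCHS = 200  # from Table 4

# model: (seconds per epoch, reported total minutes, FLOPs in G), from Table 7
table7 = {
    "TCN": (12.67, 42.23, 1.88),
    "TCN-LSTM": (28.32, 94.4, 2.21),
    "BiTCN": (25.47, 84.9, 3.76),
    "BiTCN-BiGRU": (45.53, 151.77, 4.75),
    "BiTCN-BiGRU-MA": (50.74, 169.13, 5.30),
    "TB-BTCGA": (65.96, 219.87, 6.88),
}


def total_minutes(seconds_per_epoch, epochs=EPOCHS):
    """Total training time in minutes under the per-epoch x epochs assumption."""
    return seconds_per_epoch * epochs / 60.0


# Models whose reported total differs from the recomputed one by more than 0.01 min.
mismatches = [
    model
    for model, (sec, reported, _) in table7.items()
    if abs(total_minutes(sec) - reported) > 0.01
]

time_ratio = table7["TB-BTCGA"][0] / table7["TCN"][0]
flops_ratio = table7["TB-BTCGA"][2] / table7["TCN"][2]
```

All six rows are consistent with this assumption. Relative to the TCN, TB-BTCGA needs about 5.2 times the training time per epoch and about 3.7 times the FLOPs, which puts the accuracy gains reported in Tables 5 and 6 in context.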

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
