Review

Deep Learning Models for PV Power Forecasting: Review

by Junfeng Yu 1,†, Xiaodong Li 1,†, Lei Yang 2, Linze Li 2, Zhichao Huang 1, Keyan Shen 3, Xu Yang 3, Xu Yang 1, Zhikang Xu 1, Dongying Zhang 1,4,* and Shuai Du 1

1 School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
2 CTG Wuhan Science and Technology Innovation Park, China Three Gorges Corporation, Wuhan 430074, China
3 Hubei Key Laboratory of Intelligent Yangtze and Hydroelectric Science, China Yangtze Power Co., Ltd., Yichang 443000, China
4 Hubei Key Laboratory of Digital Watershed Science and Technology, Wuhan 430074, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Energies 2024, 17(16), 3973; https://doi.org/10.3390/en17163973
Submission received: 1 June 2024 / Revised: 25 July 2024 / Accepted: 1 August 2024 / Published: 10 August 2024
(This article belongs to the Topic Clean and Low Carbon Energy, 2nd Volume)

Abstract:
Accurate forecasting of photovoltaic (PV) power is essential for grid scheduling and energy management. In recent years, deep learning technology has made significant progress in time-series forecasting, offering new solutions for PV power forecasting. This study provides a systematic review of deep learning models for PV power forecasting, concentrating on comparisons of the features, advantages, and limitations of different model architectures. First, we analyze the commonly used datasets for PV power forecasting. Additionally, we provide an overview of mainstream deep learning model architectures, including multilayer perceptron (MLP), recurrent neural networks (RNN), convolutional neural networks (CNN), and graph neural networks (GNN), and explain their fundamental principles and technical features. Moreover, we systematically organize the research progress of deep learning models based on different architectures for PV power forecasting. This study indicates that different deep learning model architectures have their own advantages in PV power forecasting. MLP models have strong nonlinear fitting capabilities, RNN models can capture long-term dependencies, CNN models can automatically extract local features, and GNN models have unique advantages for modeling spatiotemporal characteristics. This manuscript provides a comprehensive research survey for PV power forecasting using deep learning models, helping researchers and practitioners to gain a deeper understanding of the current applications, challenges, and opportunities of deep learning technology in this area.

1. Introduction

Solar energy, as a clean, safe, and inexhaustible renewable energy source, is playing an increasingly important role in the energy structure adjustments of various countries. However, PV power generation is influenced by multiple factors such as weather conditions and geographic location, leading to intermittent and fluctuating power output. This poses a significant challenge for grid scheduling and management. Therefore, accurate PV power forecasting is crucial to ensure the safe and stable operation of the grid and improve the efficiency of solar energy utilization. Traditional PV power forecasting methods mainly include physical modeling and statistical learning. The physical modeling approach is based on meteorological elements, such as solar radiation and temperature, to establish mathematical and physical models, but it is difficult to comprehensively consider all influencing factors and has high computational complexity. Statistical learning methods, such as autoregressive moving average (ARMA) [1] models and support vector machines (SVM) [2], attempt to uncover inherent patterns by analyzing historical power data. However, these methods have limited capability for modeling nonlinear and nonstationary time-series data.
In recent years, deep learning methods have shown tremendous advantages and potential in the field of time-series forecasting because of their powerful feature extraction and nonlinear mapping capabilities, attracting widespread attention from both academia and industry. Various robust model architectures, such as multilayer perceptrons (MLP) [3], recurrent neural networks (RNN) [4], convolutional neural networks (CNN) [5], and graph neural networks (GNN) [6], have been applied to PV power forecasting. MLPs use their multi-layer structure to learn complex nonlinear patterns by transforming input data through multiple hidden layers, extracting high-level features, and generating predictive outputs. RNNs and their variants, long short-term memory (LSTM) [7] and the gated recurrent unit (GRU), capture long-term dependencies in sequences by maintaining a state vector that retains historical information at each time step, enabling the memory and utilization of past information. CNNs extract local features through convolution and pooling operations; by applying convolutional kernels over a sliding window to the input data, they automatically learn and extract local features [8]. GNNs can model the spatiotemporal correlations of data from various regional solar stations by treating each solar station as a node in a graph and establishing edge connections based on their geographic location and temporal relationships, using the graph's structural information for forecasting [9]. These deep learning models provide powerful tools for PV power forecasting, capable of automatically learning and extracting meaningful features from complex time-series data and generating accurate predictive results.
However, owing to the complex mechanisms and dynamic characteristics of PV power generation, forecasting time-series data for PV power generation remains challenging. PV power generation is influenced by numerous factors such as solar radiation intensity, temperature, humidity, and cloud cover, which exhibit complex, interactive, and nonlinear relationships. Additionally, PV power generation has significant temporal and spatial dependencies, with generation patterns varying significantly across different times and geographic locations. Moreover, models need to balance forecasting accuracy and computational efficiency to meet the requirements of practical applications. Therefore, time-series forecasting of PV power generation remains a challenging research problem, requiring continuous exploration and innovation by researchers.
Given the importance and challenges of forecasting time-series data for PV power generation, this study aims to systematically review the current research status and advancements in deep learning for forecasting in this area. We introduce the main forecasting model architectures, analyze their advantages and disadvantages, and compare their performances on public datasets. We delve into representative models for univariate probabilistic, multivariate, and spatiotemporal forecasting. The remainder of this paper is divided into four sections.
Section 1 explains the research background and significance of PV power forecasting and provides an overview of the research content and structure of this paper.
Section 2 introduces the commonly used datasets and evaluation metrics for PV power forecasting and provides an overview of basic deep learning model architectures such as MLP, RNN, and CNN.
Section 3 provides a categorized review of deep-learning-based PV power forecasting methods according to model architecture, focusing on analyzing and comparing the principles, characteristics, and performance of various models. Section 3.1 introduces the development of MLP-based forecasting models for univariate, multivariate, and frequency-domain forecasting. Section 3.2 discusses improvements in RNN-based forecasting models, particularly in aspects such as long short-term memory and attention mechanisms. Section 3.3 introduces innovations in CNN-based forecasting models, particularly in cross-scale and cross-variable relationship extraction. Section 3.4 describes how GNN-based forecasting models effectively capture complex dependencies among variables by combining temporal dependencies with spatial correlations.
Section 4 summarizes the entire paper, investigates the limitations of existing research, and forecasts future research directions.

2. Fundamental Deep Learning Models for Time-Series Forecasting and PV Datasets

In the field of energy, there is a strong demand for time-series data forecasting. The power generation and power output of renewable energy sources such as solar, wind, and hydropower will affect the operation and configuration of large-scale power grids; therefore, accurate forecasting systems are very important in guiding power generation. Taking PV power forecasting as an example, time-series data forecasting usually includes the following steps:
1. Data collection and preprocessing
Collect historical time-series data and perform data cleaning, missing-value processing, and outlier processing. Apply stationarity processing to the data as needed, such as differencing or logarithmic transformation, to eliminate trends and seasonal influences.
2. Model selection and training
Select an appropriate forecasting model based on the characteristics of the time-series and the requirements of the forecasting task. Divide the historical data into training and test sets, train the model on the training set, and adjust the model's hyperparameters.
3. Model evaluation and validation
Evaluate the trained model on the test set, calculate forecasting errors, and compute evaluation metrics, including MSE, MAE, MBE, MAPE, RMSE, SDE, RSE, RRSE, and R² (these metrics are used to evaluate the performance of the models discussed in this paper; a short computational sketch follows this list). Perform cross-validation or time-series cross-validation to assess the generalization ability and stability of the model.
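To make the evaluation step concrete, the following is a minimal NumPy sketch of several of the point-forecast metrics listed above; the function name, the dummy power values, and the choice to mask zero-power points when computing MAPE are illustrative assumptions rather than part of any reviewed method.

```python
import numpy as np

def evaluation_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute a few of the point-forecast metrics discussed above."""
    err = y_pred - y_true
    mse = np.mean(err ** 2)                      # mean squared error
    mae = np.mean(np.abs(err))                   # mean absolute error
    mbe = np.mean(err)                           # mean bias error
    rmse = np.sqrt(mse)                          # root mean squared error
    # MAPE is undefined at zero power (e.g., night hours), so mask those points.
    nonzero = y_true != 0
    mape = np.mean(np.abs(err[nonzero] / y_true[nonzero])) * 100
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot                     # coefficient of determination
    return {"MSE": mse, "MAE": mae, "MBE": mbe,
            "RMSE": rmse, "MAPE": mape, "R2": r2}

# Example with dummy PV power values (kW).
y_true = np.array([0.0, 1.2, 3.5, 4.8, 3.1, 0.9])
y_pred = np.array([0.1, 1.0, 3.8, 4.5, 3.3, 0.7])
print(evaluation_metrics(y_true, y_pred))
```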
Table 1 presents the details of the evaluation metrics used to assess the accuracy of the PV power forecasting models mentioned in this paper. Table 2 summarizes the datasets employed in the referenced literature to evaluate the performance of PV power forecasting models, together with brief descriptions.
Subsequently, we provide an overview of mainstream deep learning model architectures, including multilayer perceptron (MLP), recurrent neural networks (RNN), convolutional neural networks (CNN), and graph neural networks (GNN), and explain their fundamental principles and technical features.

2.1. MLP (Multilayer Perceptron)

Multilayer perceptron (MLP) is a type of feedforward neural network that consists of an input layer, an output layer, and one or more hidden layers in between. The MLP has evolved from a single-layer perceptron, which only has an input layer and an output layer. In a single-layer perceptron, the input is multiplied by the corresponding weights, added to the bias, and then passed through an activation function to obtain the output. The model is relatively simple, and single-layer perceptrons are only suitable for handling linearly separable classification problems. Figure 1 illustrates the structure of a multilayer perceptron [3].
Compared with single-layer perceptrons, MLPs are able to better handle complex pattern-classification problems. Each layer in the MLP contains multiple neurons which are fully connected. During the computation process, each neuron in a layer generates an output based on the output values from the previous layer and the corresponding weights using an activation function. This forms the feedforward process. The output of a neuron is given by the following equation:
$$y_i = \sigma\left(\sum_{j=1}^{n} w_{ij} x_j + b_i\right)$$
where $y_i$ is the output of neuron $i$; $x_j$ is the input value from unit $j$ in the previous layer; $w_{ij}$ is the weight from unit $j$ in the previous layer to the current neuron; and $b_i$ is the bias term.
The training process of an MLP network typically employs a back-propagation algorithm. The error generated by the difference between the output and desired output is propagated back through the network, updating the corresponding weights and bias terms. The network output results are continuously adjusted and optimized through this process.
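As a concrete illustration of the feedforward and backpropagation process described above, the following is a minimal PyTorch sketch of an MLP forecaster that maps a window of past PV power values to a single-step forecast; the window length of 24, the hidden sizes, and the random data are illustrative assumptions, not values taken from the reviewed studies.

```python
import torch
import torch.nn as nn

class MLPForecaster(nn.Module):
    """Minimal MLP: a window of past PV power values -> next-step forecast."""
    def __init__(self, input_len: int = 24, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_len, hidden), nn.ReLU(),   # hidden layer 1
            nn.Linear(hidden, hidden), nn.ReLU(),      # hidden layer 2
            nn.Linear(hidden, 1),                      # single-step output
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = MLPForecaster()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One illustrative training step on random data shaped (batch, window).
x = torch.randn(32, 24)          # past 24 observations per sample
y = torch.randn(32, 1)           # next observation to predict
loss = loss_fn(model(x), y)      # feedforward pass and error computation
loss.backward()                  # propagate the error back through the network
optimizer.step()                 # update weights and bias terms
print(float(loss))
```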

2.2. RNN (Recurrent Neural Networks)

RNNs are capable of processing sequential data and capturing temporal dependencies within the data through recurrent connections. By training an RNN model, forecasting can be made for future time steps [4].
As shown in Figure 2, an RNN is a network containing a loop, in which each neural network module A reads an input $x_t$ and outputs a value $h_t$, and the process then repeats. This loop allows information to be passed from the current step to the next. RNNs are inherently related to sequences and lists, making them the most natural neural network architecture for this type of data.
One of the key features of RNNs is their ability to connect previous information to the current task. However, as the interval increases, RNNs lose their ability to learn and connect information from the distant past. The vanishing gradient problem in RNNs means that the gradient is dominated by short-term gradients, making it difficult for the model to learn long-term dependencies (it can be considered that as the time-series progresses, the model gradually forgets the previously learned dependencies). Therefore, RNNs are not suitable for processing long sequences.
Common variants of RNNs include long short-term memory (LSTM) networks and gated recurrent units (GRU), which can alleviate the vanishing and exploding gradient problems. The structure of LSTM is similar to that of a standard RNN; however, it uses different functions to compute the hidden state. The “memory” of LSTM is usually referred to as cells, and the input to these cells is the previous state and the current input. These cells determine which previous information and states need to be retained and which should be erased. In practical applications, it has been found that this approach can effectively preserve the relevant information for a long time. In simple terms, owing to its limited memory capacity, it remembers important information and forgets irrelevant information. Figure 3 depicts the architecture of an LSTM model [7].
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
$$C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \ast \tanh(C_t)$$
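The following NumPy sketch implements a single LSTM cell step following the gate equations above; the weight shapes, the random initialization, and the short input sequence are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following the gate equations above.
    W and b hold the parameters of the forget (f), input (i),
    candidate (c), and output (o) transformations."""
    concat = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ concat + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ concat + b["i"])       # input gate
    c_hat = np.tanh(W["c"] @ concat + b["c"])     # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat              # new cell state
    o_t = sigmoid(W["o"] @ concat + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                      # new hidden state
    return h_t, c_t

# Illustrative sizes: 1 input feature, hidden size 4.
n_in, n_hidden = 1, 4
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_hidden, n_hidden + n_in)) for k in "fico"}
b = {k: np.zeros(n_hidden) for k in "fico"}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in [0.2, 0.5, 0.9]:                         # a short PV power sequence
    h, c = lstm_step(np.array([x]), h, c, W, b)
print(h)
```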
LSTM controls the transmission of states through gated states, remembering important information and forgetting unimportant information. Unlike standard RNNs, which only have a “naive” way of accumulating memory, LSTM is particularly useful for tasks that require “long-term memory”. However, because it introduces many components, the number of parameters increases, making the training process much more difficult. Therefore, in many cases, GRU, which has a similar performance to LSTM but fewer parameters, is often used to build models with large training volumes. Figure 4 presents the structure of a GRU unit [20].
$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$$
$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$$
$$\tilde{h}_t = \tanh(W \cdot [r_t \ast h_{t-1}, x_t])$$
$$h_t = (1 - z_t) \ast h_{t-1} + z_t \ast \tilde{h}_t$$
GRU combines the forget gate and the input gate into a single update gate and mixes the cell state and hidden state. GRU has one less gate than LSTM, making its training faster and more convenient for building more complex networks. It takes a current input $x_t$ and a hidden state $h_{t-1}$ passed down from the previous node, which contains relevant information from the previous nodes. Combining $x_t$ and $h_{t-1}$, GRU produces the output $y_t$ of the current hidden node and the hidden state $h_t$ passed to the next node. The gating signal $z_t$ ranges from 0 to 1, with values closer to 1 indicating that more information is "remembered" and values closer to 0 indicating that more is "forgotten". One clever aspect of GRU is that it uses the same gate to perform forgetting and remembering simultaneously (whereas LSTM requires multiple gates): $z_t$ controls what is remembered and, conversely, $1 - z_t$ controls what is forgotten. The reset gate $r_t$ determines how much of $h_{t-1}$ is retained when forming the candidate state, before the $z_t$ and $1 - z_t$ gating is applied.
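Analogously, a single GRU cell step following the equations above can be sketched in NumPy as follows; the weight shapes and the input sequence are again illustrative, and biases are omitted, as in the equations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step following the equations above (biases omitted, as there)."""
    concat = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    z_t = sigmoid(W_z @ concat)                       # update gate
    r_t = sigmoid(W_r @ concat)                       # reset gate
    concat_reset = np.concatenate([r_t * h_prev, x_t])
    h_hat = np.tanh(W_h @ concat_reset)               # candidate hidden state
    return (1 - z_t) * h_prev + z_t * h_hat           # new hidden state

# Illustrative sizes: 1 input feature, hidden size 4.
n_in, n_hidden = 1, 4
rng = np.random.default_rng(0)
W_z, W_r, W_h = (rng.standard_normal((n_hidden, n_hidden + n_in)) for _ in range(3))
h = np.zeros(n_hidden)
for x in [0.2, 0.5, 0.9]:                             # a short PV power sequence
    h = gru_step(np.array([x]), h, W_z, W_r, W_h)
print(h)
```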

2.3. CNN (Convolutional Neural Networks)

CNNs extract local features of time-series data through convolution and pooling operations, although they are primarily used for image processing. CNNs fully consider the structure of the data when processing images, with neurons arranged in three dimensions: width (W), height (H), and depth (D). Each neuron in the current layer is connected to a small block of the output from the previous layer, whereas each neuron in the fully connected layer is connected to all neurons in the previous layer, enabling effective feature extraction from images. CNNs typically use the following types of layers: input layer, convolutional layer, pooling layer, fully connected layer, and output layer. Figure 5 demonstrates the basic structure of a convolutional neural network [5].
Input layer: Input information such as images.
Convolutional layer: Extracts low-level features of the images.
Pooling layer: Prevents overfitting and reduces data dimensionality.
Fully connected layer: Summarizes the low-level features and information obtained from the convolutional and pooling layers.
Output layer: Obtains the result with the highest probability based on information from the fully connected layer.
Convolutional neural networks can also be used for time-series forecasting and can automatically learn multi-scale representations of a time-series. In many tasks, convolutional networks can achieve better performance than RNNs while avoiding common pitfalls of recurrent models, such as exploding/vanishing gradients or lack of memory retention. Furthermore, convolutional networks can improve performance because they allow outputs to be computed in parallel. Temporal convolutional networks (TCNs) employ causal convolutions to ensure that the model only uses past information at each time step. Figure 6 illustrates the method of information propagation in the TCN model [21].
Causal convolution is intuitively represented in the figure above: the value at time t in the upper layer depends only on the values at time t and earlier in the layer below. The difference from traditional convolutional neural networks is that causal convolution cannot see future data; it has a unidirectional rather than a bidirectional structure. In other words, the effect comes after the cause, making it a strictly time-constrained model; hence the name causal convolution.
TCNs can handle sequences of arbitrary length, have strong parallelization capabilities and fast training speed, and exhibit excellent performance on many time-series tasks. In a TCN implementation, the input tensor has shape (batch_size, input_length, input_size) and the output tensor has shape (batch_size, input_length, output_size). Since each layer in a TCN has the same input and output length, only the third dimension of the input and output tensors differs. In the univariate case, both input_size and output_size are equal to 1. In the more general multivariate case, input_size and output_size may differ because we typically do not predict every feature of the input sequence.
A single 1D convolutional layer receives an input tensor with shape (batch_size, input_length, nr_input_channels) and outputs a tensor with shape (batch_size, input_length, nr_output_channels). Figure 7 below illustrates how an element of the output tensor is computed.
As shown, to compute one element of the output, we first examine a series of consecutive elements of length kernel_size in the input. In the example above, the chosen kernel_size is 3, and the dot product of the selected input subsequence with a learned weight vector of the same length yields the output. The next output element is obtained by the same operation, with the kernel_size window shifted one position forward. Figure 8 below shows two consecutive output elements and their respective input subsequences.
Figure 9 below illustrates the case with multiple input channels, i.e., nr_input_channels greater than 1. In this case, the process described above is repeated for each individual input channel, each time using a different kernel. In a sense, this is equivalent to performing a 2D convolution with an input tensor of shape (input_size, nr_input_channels) and a kernel tensor of shape (kernel_size, nr_input_channels), as shown in the figure below.
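To make the causal-convolution idea concrete, the following is a minimal PyTorch sketch of a causal 1D convolution implemented by left-padding the input; the channel counts, kernel size, and dilation are illustrative assumptions. Note that PyTorch's Conv1d expects tensors of shape (batch_size, channels, length), whereas the text above uses the (batch_size, input_length, channels) convention.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1D convolution made causal by left-padding, so the output at time t
    depends only on inputs at time t and earlier."""
    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation        # left padding only
        self.conv = nn.Conv1d(in_channels, out_channels,
                              kernel_size, dilation=dilation)

    def forward(self, x):
        # x: (batch_size, nr_input_channels, input_length)
        x = nn.functional.pad(x, (self.pad, 0))        # pad on the left (the past)
        return self.conv(x)

layer = CausalConv1d(in_channels=1, out_channels=8, kernel_size=3, dilation=2)
x = torch.randn(4, 1, 96)          # 4 univariate series of 96 time steps
y = layer(x)
print(y.shape)                     # torch.Size([4, 8, 96]): same length as input
```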

2.4. GNN (Graph Neural Networks)

Graph neural networks (GNNs) are a type of deep learning model used to process graph-structured data. In recent years, GNNs have also been applied to time-series forecasting tasks through the construction of time-series graphs. If time-series data has a graph structure, GNNs can be used for modeling. GNNs can capture spatiotemporal dependencies while considering the topological connection information of nodes, improving forecasting accuracy. Time-series analysis is crucial for uncovering the inherent information richness in available data. With the recent advances in graph neural networks (GNNs), research on GNN-based time-series analysis methods has increased. Common GNN models include graph convolutional networks (GCNs), graph attention networks (GATs), etc. These methods can explicitly model the relationships between time and variables, which is difficult for traditional and other deep neural network-based methods [6,22]. Figure 10 illustrates a classic graph neural network.
In the application of GNNs for time-series forecasting tasks, the time-series data are converted into a graph structure, where each time step is represented as a node in the graph, and edges between nodes represent relationships between time steps, such as edges between adjacent time steps or similarity edges. The weight of an edge can represent the correlation or influence strength between time steps. In simple terms, each time step in the spatial-temporal graph neural network (ST-GNN) is represented as a graph, which is passed through a GCN/GAT network to obtain an encoded graph that embeds the inter-dependencies in the data space. These encoded graphs can then be modeled like time-series data, as long as the integrity of the graph structure of the data at each time step is preserved. Figure 11 illustrates these two steps, where the temporal model can be any sequence model, ranging from ARIMA or a simple recurrent neural network to Transformers.
When a GNN is used for time-series forecasting, the node representation of the last time step can be used directly for single-step forecasting. For multi-step forecasting, an autoregressive approach can be employed, where predicted values are fed back as input for the next step; alternatively, an encoder-decoder structure can be used, where the encoder is a GNN and the decoder is another GNN or another sequence model, such as an RNN. Figure 12 illustrates the use of a simple recurrent neural network as a component of the ST-GNN.
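As a minimal illustration of the graph-convolution step that such ST-GNN approaches build on, the following NumPy sketch propagates per-station features over a symmetrically normalized station graph at a single time step; the three-station graph, the feature values, and the weight shapes are invented for illustration only.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution step: add self-loops, normalize the adjacency,
    propagate node features, and apply a learnable transform.
    A: (N, N) adjacency of the PV-station graph, X: (N, F) node features,
    W: (F, F_out) weight matrix."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt       # symmetric normalization
    return np.maximum(A_norm @ X @ W, 0.0)         # propagate + ReLU

# Three PV stations; edges connect geographically close stations.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[3.2, 25.0],       # per-station features at one time step,
              [2.9, 24.5],       # e.g., power output and temperature
              [3.8, 26.1]])
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 4))
H = gcn_layer(A, X, W)           # encoded node representations, which would be
print(H.shape)                   # fed to a temporal model (e.g., an RNN) in an ST-GNN
```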
The advantages of using GNNs for time-series forecasting include: (1) Capturing structural information of time-series: GNNs can explicitly model the relationships and dependencies between time steps. (2) Handling irregular time-series data: GNNs can handle time-series data with varying lengths or missing values. (3) Fusing multivariate time-series: GNNs can naturally process multiple related time-series by modeling their interactions through edges in the graph. (4) Capturing long-term dependencies: By stacking multiple layers of GNNs, long-term dependencies in time-series can be captured. However, using GNNs for time-series forecasting also poses some challenges: (1) Graph construction: How to construct an appropriate time-series graph is a key issue, requiring consideration of the relationships and correlations between time steps. (2) Computational complexity: The computational complexity of GNNs can be high, especially for large-scale time-series graphs. (3) Hyperparameter tuning: GNNs have multiple hyperparameters that need to be tuned, such as the number of graph convolutional layers and the choice of aggregation functions.

3. Variations of the Baseline Models for PV Power Forecasting

According to Section 2, we first divided 55 variations of the baseline models into four categories: MLP-based models, RNN-based models, CNN-based models, and GNN-based models; in addition, we selected 15 references that apply these models to PV power forecasting (see Figure 13; for details, please see Appendix A), and analyzed the performance of each type of deep learning model using the PV datasets and model evaluation metrics described in Section 2.

3.1. MLP-Based Model

Rumelhart et al. [3] made a significant breakthrough in deep learning. This seminal work introduced a method to train multilayer perceptrons (MLP) with the backpropagation algorithm. With its multi-layer architecture, the MLP adeptly learns intricate nonlinear relationships, capturing the nonlinear dynamics inherent in time-series data.
Expanding on the foundations laid by MLP, Huang et al. [11] innovatively replaced the conventional loss function in the network architecture with the pseudo-Huber loss function. This new approach combines the advantageous characteristics of both the squared loss and the absolute loss, making it particularly suitable for day-ahead forecasting of hourly PV power generation. The findings underscored the superiority of this method over conventional MLP networks, showing notable improvements in metrics such as root mean square error (RMSE) and mean absolute error (MAE) (for details, please see Table 3).
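For reference, the pseudo-Huber loss referred to above can be sketched as follows; the delta value and the example arrays are illustrative choices, not values from the cited study.

```python
import numpy as np

def pseudo_huber(y_true, y_pred, delta=1.0):
    """Pseudo-Huber loss: behaves like the squared loss for small errors
    and like the absolute loss for large errors, controlled by delta."""
    residual = y_pred - y_true
    return np.mean(delta**2 * (np.sqrt(1.0 + (residual / delta)**2) - 1.0))

# Small residuals are penalized roughly quadratically, large ones roughly linearly.
y_true = np.array([1.0, 2.0, 3.0])
print(pseudo_huber(y_true, np.array([1.1, 2.1, 3.1])))   # small residuals
print(pseudo_huber(y_true, np.array([4.0, 5.0, 6.0])))   # large residuals
```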
In recent years, the rapid progression of deep learning has spurred a surge in the development of time-series models based on the MLP architecture. Researchers are increasingly shifting their focus from mere parameter enhancements to the integration of diverse fusion models that combine various scales, incorporating advanced time-series feature extraction methods, such as sequence decomposition and multi-periodicity decomposition. This forward-thinking approach has paved the way for new advancements in time-series forecasting models.

3.1.1. Univariate Forecasting Models

Tackling the task of forecasting univariate time-series, Oreshkin et al. [23] introduced the N-BEATS neural network architecture. This model integrates backward and forward residual links to form a deep stack of fully connected layers. Through the incorporation of function decomposition, N-BEATS can derive interpretable elements, such as trend and seasonal terms, offering versatility across diverse domains. Anwar et al. [12] based their research on data from the NSRDB dataset. Their method involved identifying pivotal features for radiation forecasting, conducting regression analyses based on these features, and applying the N-BEATS framework to analyze clear sky GHI, clear sky DHI, and clear sky DNI time-series. The findings reveal that N-BEATS outperforms other prevalent approaches, such as LSTM, showing minimal errors with MAPE scores below 4% for clear sky GHI, below 12% for clear sky DHI, and below 14% for clear sky DNI (for details, please see Table 4).
Olivares et al. [24] innovatively enhanced the N-BEATS model by integrating exogenous variables as supplementary inputs, fortifying the model’s capacity to navigate intricate environments. Their proposed strategy aims to strike a balance between versatility and interpretability, enabling specific patches to refine accuracy, whereas others focus on generating informative components. Empirical evidence underscores the substantial enhancement in predictive accuracy achieved by the newly introduced N-BEATSx model in comparison to the original N-BEATS model. Remarkable progress has been made in the field of neural forecasting, significantly increasing the overall efficacy of large-scale predictive systems. Nonetheless, long-term forecasting remains a formidable challenge because of the volatility of forecasting and the complexities associated with computational processes. Challu et al. [25] introduced the N-HiTS architecture as an extension of the N-BEATS framework, incorporating a pioneering method involving cross-patch hierarchical synchronization of input sampling rates and output interpolation scales. This approach empowers each patch to forecast its unique time-series signal frequency band autonomously. The findings underscore a notable augmentation in the average accuracy with N-HiTS, particularly in scenarios requiring long-term forecasting, where the model efficiently curtails computational expenses without compromising predictive precision. Despite the integration of multi-rate sampling and hierarchical interpolation, N-HiTS encountered limitations in enhancing the performance of ultra-long-term periodicity forecasting. In response to this challenge, Fan et al. [26] introduced DEPTS, a tailored deep learning framework. The DEPTS leverages residual learning to model intricate cyclic dependencies and employs parameterized periodic functions to capture a diverse array of cyclic combinations, thereby enhancing the forecasting of complex cyclic dependencies in periodic time-series. Notably, DEPTS surpassed N-BEATS in periodicity modeling by directly capturing varying periods over time and emulating inherent global periodicity. Experimental findings revealed the dynamic nature of the forecasting capabilities and complexities associated with cyclic effects, showing fluctuations in performance disparities between DEPTS and N-BEATS over time. Nevertheless, the DEPTS consistently delivers stable and substantial enhancements in performance for forecasting periodic time-series in most scenarios.
In the field of univariate time-series forecasting, remarkable advancements have been made in the domains of irradiance and PV power forecasting. This success can be attributed to the inherent simplicity, intuitive nature, computational efficiency, and robust interpretability of the method. Nevertheless, the singular focus of univariate forecasting neglects the impact of other pertinent variables, thereby impeding the ability to capture the intricate relationships among multiple factors under certain conditions. Consequently, this constraint can lead to diminished forecasting accuracy and heightened susceptibility to outliers, potentially introducing substantial bias into the projected outcomes. Hence, in practical scenarios, the judicious selection of suitable forecasting methodologies tailored to specific contexts is imperative. Such an approach entails comprehensive consideration of the multifaceted influences of multivariate variables, progressively enhancing the performance of the forecasting results.

3.1.2. Multivariate Forecasting Models

In various practical domains, multivariate time-series forecasting has demonstrated superior performance over single-variable models. Recent research has pointed out that excessive emphasis on capturing time dependencies is not essential; instead, the risk of over-capturing entanglements and redundancies in time and channel interactions can hinder predictive performance. Therefore, understanding the mapping between the input sequences and the time dependencies of the predictive sequences is important for forecasting accuracy. Addressing these insights, Li et al. [27] introduced a versatile framework named MTS-Mixers. This framework aims to capture time and channel dependencies separately by decomposing time and channel mixtures, thereby leveraging the low-rank properties inherent in existing time-series data. Using this approach, the framework achieves enhanced predictive accuracy and efficiency in multivariate time-series forecasting tasks.
Although multivariate models exhibit robustness in multivariate time-series forecasting, they encounter overfitting issues, particularly when the target time-series lacks correlation with other covariates. In response to this challenge, Chen et al. [28] integrated linear and nonlinear components within a time-series framework and introduced cross-variable feedforward layers. This integration led to the development of the TSMixer model, which cyclically applies multilayer perceptrons (MLPs) across time and feature dimensions. Conceptually, these operations correspond to time and feature mixing, facilitating the effective capture of time patterns and cross-variable information. Despite its strengths, TSMixer may face constraints in embedding dimensions when handling exceptionally long sequences. Consequently, the design of a computationally efficient MLP architecture that maximizes the utilization of the inherent characteristics of a time-series has emerged as a pivotal inquiry. In response to the previously discussed issue, Vijay et al. [29] introduced a novel hybrid channel modeling approach named CI-TSMixer. Similar to the TSMixer framework proposed by Chen et al. [28], CI-TSMixer uses a mixer architecture based on multilayer perceptrons (MLPs) as its core network structure. However, the CI-TSMixer distinguishes itself by dividing the input multivariate time-series into fixed-sized patches, with each block containing data from all variables within a specific time window. This patch-based approach enhances the capture of local time dependencies more effectively. Moreover, CI-TSMixer integrates a straightforward gate attention mechanism within the core network to prioritize important features. By incorporating these lightweight components, the learning capacity of the fundamental MLP structure was notably strengthened. Consequently, CI-TSMixer outperforms complex Transformer models with minimal computational overhead.
The presence of trends, seasonality, and irregular fluctuations often leads to nonstationarity in time-series data. This nonstationarity poses challenges by impeding the stable propagation of deep features, disrupting feature distributions, and complicating the learning of data distribution changes. In response to this issue, Yi et al. [30] introduced a model named U-Mixer, which effectively captures local time dependencies between different blocks and channels by integrating Unet and Mixer architectures. U-Mixer is designed to alleviate the impact of distribution shifts between channels, merging low-level and high-level features to create comprehensive data representations. Through extensive experimentation on real-world time-series data, U-Mixer has been shown to be effective and robust. Real-world time-series analysis often grapples with intricate temporal changes that pose significant challenges in forecasting tasks, surpassing the boundaries of traditional approaches that focus on simple decomposition and multiple periodicity analysis. Wang et al. [16] introduced a fresh perspective on dissecting time variations through a new multiscale hybrid method. This viewpoint is rooted in the recognition that time-series show unique patterns across different sampling scales, where micro and macro information manifest at fine and coarse levels, thereby enabling a profound comprehension of complex variations. Wang et al. [16] introduced the TimeMixer model, structured entirely on multilayer perceptrons (MLPs). TimeMixer adopts a multiscale mixing architecture, leveraging two critical modules, past-decomposable-mixing (PDM) and future-multipredictor-mixing (FMM), to effectively harness distinctive change information and complementary predictive abilities across a spectrum of time-series scales. The PDM module employs a decomposed design to address the unique attributes of seasonal and trend components, while the FMM module integrates multiple predictors of varying scales to forecast future sequences. Overall, TimeMixer exploits the multiscale nature of time-series data by combining information from diverse scales using the PDM and FMM modules. This approach, characterized by its simplicity and efficiency, captures intricate temporal changes and heralds a fresh paradigm in time-series forecasting. The evaluation of TimeMixer's performance on the Solar-1 dataset across varying forecasting lengths (96, 192, 336, 720) showed remarkable predictive advantages in PV forecasting, particularly in short-term forecasting endeavors (for details, please see Table 5). Comparative assessments highlighted TimeMixer's consistent superiority over methods such as DLinear and Informer in terms of overall performance.
In multivariate PV forecasting, the incorporation of diverse variables linked to solar energy production, containing aspects such as weather conditions, light intensity, and temperature, offers a holistic framework that enhances the predictive precision. This inclusive method not only fosters a deeper comprehension of the interrelations and influences among variables but also improves the efficacy of forecasting models. However, this comprehensive approach presents inherent challenges in terms of data acquisition, model intricacy, and computational overhead, thereby diminishing the interpretability of models.

3.1.3. Frequency Domain-Based Forecasting Models

The presence of noise and measurement errors raises concerns regarding the viability of accurate long-term forecasting, suggesting there are limitations to current methods for extended forecasting tasks. Sun and Boning [15] conducted a mathematical analysis demonstrating that complex models may not surpass simple linear models in long-term forecasting because of error accumulation issues. They introduced a straightforward linear model named AverageTile, which exploits data periodicity by averaging historical periods and extrapolating them to generate forecasts. Empirical findings show that this simplistic model achieves performance on par with cutting-edge Transformer models, emphasizing the significance of periodicity as a key feature in time-series data that can provide valuable prior knowledge for forecasting tasks. Furthermore, they proposed the FreDo model. FreDo integrated a frequency-domain-based iterative correction mechanism, enabling learning in the frequency domain rather than the time domain. Initially employing the relatively simple AverageTile module to generate initial forecasts in the frequency domain, the model iteratively refined these forecasts using a mixer module. The Mixer module, resembling a Transformer architecture, includes causality masks to ensure causal relationships. FreDo merged the simplicity of AverageTile with the potent learning capacities of frequency-domain modeling and neural networks, offering a fresh perspective for uncovering inherent patterns in time-series data. The evaluation of the FreDo model on the Solar-2 dataset demonstrated its superior performance in long-term forecasting tasks compared to Autoformer, affirming the benefits of frequency-domain modeling. Nonetheless, being designed for long-term forecasting, the model exhibited weaker performance in the short term and, as a univariate model, did not consider inter-sequence interactions. Future investigations could explore multidimensional frequency-domain models to enhance short-term performance and delve more deeply into the advantages of frequency-domain learning (for details, please see Table 6).
The majority of MLP-based forecasting methods face challenges owing to point-wise mapping and information bottlenecks, leading to a notable hindrance in predictive efficacy. In response to this issue, Yi et al. [30] employed MLPs in the frequency domain for time-series forecasting. Their research delved into the learning patterns of frequency-domain MLPs, revealing two inherent advantages conducive to predictive tasks: (1) Global View: the utilization of the spectrum equips MLPs with a holistic perspective of signals, facilitating the comprehension of global dependency relationships. (2) Energy Compaction: Frequency-domain MLPs concentrate on smaller frequency components characterized by compact signal energy. Building on these characteristics, Yi et al. [30] introduced FreTS, a straightforward yet efficient time-series forecasting framework grounded in frequency-domain MLPs. Notably, this model stands out for its utilization of two parallel frequency-domain MLPs to model the real and imaginary components of spectral data. Furthermore, it conducts frequency-domain transformations and learning across two scales, inter-sequence and intra-sequence, to capture dependencies of varying granularities. Empirical findings underscore the substantial potential of this model in achieving commendable efficiency and predictive performance. Long-term time-series forecasting poses a significant challenge, as existing methods often focus on capturing patterns from a singular domain (such as time or frequency), neglecting a comprehensive treatment of long-term time-series across both time and frequency domains. In response to this gap, Luo et al. [31] introduced the time-frequency enhanced decomposition network (TFDNet) to uncover enduring underlying patterns and temporal periodicity within the time-frequency domain. By leveraging the short-time Fourier transform (STFT), this study transforms extended time-series into time-frequency matrices. The researchers devised a multi-scale time-frequency enhanced encoder featuring distinct time-frequency blocks (TFBs) tailored for trend and seasonal components, aiming at extracting diverse foundational patterns within these elements. Insights into various channel-related patterns are incorporated into the kernel operations within the TFB, exploring two kernel learning strategies (separate and shared) to address diverse channel-related patterns within the seasonal components. Empirical findings validate the effectiveness and efficiency of TFDNet in handling long-term time-series forecasting tasks.
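As a minimal illustration of the time-frequency representation that STFT-based approaches such as TFDNet build on, the following SciPy sketch converts a synthetic daily PV-like series into a time-frequency magnitude matrix; the synthetic series and the 48-hour window length are illustrative assumptions, not settings from the cited work.

```python
import numpy as np
from scipy.signal import stft

# Synthetic "PV power" series: a clipped daily cycle plus noise, hourly for 30 days.
hours = np.arange(30 * 24)
power = np.clip(np.sin(2 * np.pi * hours / 24), 0, None) \
        + 0.05 * np.random.randn(hours.size)

# Short-time Fourier transform with an illustrative 48-hour window.
freqs, times, Z = stft(power, fs=1.0, nperseg=48)
time_freq_matrix = np.abs(Z)          # magnitude spectrogram, shape (freq, time)
print(time_freq_matrix.shape)         # a time-frequency matrix of the kind that
                                      # TFDNet-style encoders operate on
```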
Time-frequency domain time-series forecasting represents an approach that combines temporal and spectral information for predictive tasks. This method facilitates the extraction of long-term dependencies within time-series data and underlying characteristics in the frequency domain, consequently improving forecasting accuracy. Nonetheless, its elevated model complexity relative to those of univariate and multivariate models typically results in heightened computational and storage demands. Furthermore, the transformation of time-series data (e.g., via STFT) in time-frequency domain forecasting may introduce certain information loss. Moreover, challenges such as limited domain adaptability are also present in this framework.
This section emphasizes the unique developmental properties of MLP-based models for univariate, multivariate, and frequency domain forecasting. MLP-based models have demonstrated strong applicability and predictive performance in PV energy forecasting.

3.2. RNN-Based Models

Recurrent neural network (RNN) is a specialized neural architecture adept at processing sequential data, such as speech and text. Unlike MLP, RNN possesses internal states that enable it to harness prior inputs and hidden states for forecasting at the present time step, rendering it particularly effective for time-series forecasting tasks. The seminal work on long short-term memory (LSTM), penned by Hochreiter and Schmidhuber [7], ushered in a groundbreaking recurrent neural network variant. LSTM was created to rectify the limitations of RNNs in managing prolonged dependencies. LSTM introduces a trio of gating mechanisms, namely the forget gate, input gate, and output gate, that deftly regulate hidden state updates, thereby enhancing the capture of enduring dependencies. Moreover, LSTM introduces a cell state as a supplementary element to the hidden state, streamlining the preservation of long-term information and effectively mitigating the vanishing gradient predicament endemic in conventional RNNs. In a bold fusion of innovation, Wang et al. [13] combined LSTM with convolutional layers to form an LSTM-convolutional neural network, which was tested on the DKASC dataset. The results underscore the superiority of this hybrid model over standalone models in terms of predictive accuracy and performance (for details, please see Table 7).
The key work by Cho et al. [32] introduced the gated recurrent unit (GRU) as a distinct variant of the recurrent neural network (RNN). In contrast to the intricate architecture of LSTM, GRU presents a streamlined design including only two gating mechanisms, the reset gate and the update gate. GRU adeptly governs the updates of hidden states and long-term dependencies within data sequences. One notable departure from LSTM lies in GRU's omission of a dedicated cell state; instead, it seamlessly employs the hidden state as the output, fostering a more concise network structure. Moreover, with fewer parameters than LSTM, GRU facilitates more expedient training. To improve on the standalone GRU model, Jia et al. [33] proposed the VMD-ISSA-GRU hybrid model, which combines a data decomposition method, variational mode decomposition (VMD), and a parameter optimization method, the improved sparrow search algorithm (ISSA), with GRU. Table 8 shows a comparison of LSTM, GRU, and VMD-ISSA-GRU on the DKASC dataset.
RNN-based models have followed their own developmental path. We summarize them in two sections: probabilistic forecasting models in Section 3.2.1 and multivariate forecasting models in Section 3.2.2.

3.2.1. Probabilistic Forecasting Models

To capture long-term temporal dependencies accurately and discern vital driving sequences for time-series forecasting, Qin et al. [34] proposed the dual-stage attention-based recurrent neural network (DA-RNN). The architecture of DA-RNN comprises an encoder endowed with an input attention mechanism and a decoder featuring a temporal attention mechanism. The innovation lies in the adaptive selection process of the input attention mechanism, which can identify pertinent driving sequences, while the temporal attention mechanism inherently absorbs extensive temporal information from the encoded input. By harnessing these dual attention mechanisms, DA-RNN not only adeptly selects the most important input features but also effectively captures the intricate long-term temporal dependencies. Despite its significant capabilities, DA-RNN's exclusive focus on single-step forward forecasting poses limitations for researchers aspiring to forecast extensive arrays of interconnected time-series data. To address this constraint, Salinas et al. [35] introduced the DeepAR model, which distinguishes itself by its ability to learn global models from correlated time-series data. Through rescaling and velocity-based sampling, it produces accurate probabilistic forecasts and can learn intricate patterns from the data, such as seasonality and uncertainty that increases over time. Yemane [14] evaluated the performance of DeepAR for PV power forecasting using the Solar-7 dataset. The findings show DeepAR's superiority over traditional baseline models across both the 36 h and 1 h forecasting horizons, marking a significant improvement in PV power forecasting accuracy (for more details, please see Table 9).
The training of DeepAR on discrete or mixed datasets can result in error accumulation, which affects model accuracy. To address this issue, Bergsma et al. [36] introduced the coarse-to-fine autoregressive network (C2FAR), a new method tailored for modeling the probability distribution of univariate numerical random variables. Empirical analysis has underscored the capability of C2FAR to effectively emulate intricate, multimodal, authentic, or discrete data patterns without relying on prior information, rendering it highly versatile for applications in time-series forecasting. Subsequently, Bergsma et al. [37] proposed SutraNets, a model that reframes lengthy univariate forecasting as multivariate forecasts of low-frequency subsequences. Based on C2FAR, SutraNets facilitate the efficient representation of time-series amplitudes and afford cost-effective encoding of covariate subsequences. Notably, SutraNets demonstrate commendable predictive accuracy while maintaining performance levels similar to those of standard sequence models, and address the challenges associated with error accumulation in DeepAR.
Later, Li et al. [27] introduced an impactful training method that combines sequential neural networks and multi-level forecasting, termed "forking-sequences". Furthermore, they developed a versatile framework (MQ-RNN) tailored for probabilistic multi-step time-series regression. MQ-RNN shows a remarkable ability to adapt to multiple interrelated time-series enriched with static attribute information. For newly emerging sequences devoid of historical data, MQ-RNN leverages its static characteristics to obtain valuable insights from analogous sequences, thereby enhancing the forecasting accuracy.
Although some models are beginning to indirectly consider temporal frequency information, the majority still reside within the realm of time-domain methods. Wang et al. [38] proposed a new wavelet-based neural network architecture known as the multi-level wavelet decomposition network (mWDN) to develop a frequency-aware deep learning model tailored for time-series analysis. This model is seamlessly integrated into prominent deep learning frameworks, such as LSTM, boasting commendable interpretability. In addressing the challenges posed by long retrospective windows and forecasting horizons, RNN-based methods encounter problems in long-term time-series forecasting (LTSF) stemming from the many recurrent iterations within RNNs. To handle this issue, Lin et al. [39] proposed a segmentation technique called SegRNN, which replaces the conventional point-wise iterations in LTSF with segment-wise iterations in a more efficient architecture. SegRNN significantly reduces the requisite recurrent iterations in LTSF, thereby substantially enhancing forecasting accuracy and inference speed. Empirical findings underscore that SegRNN not only surpasses models relying on state-of-the-art Transformer architectures but also reduces runtime and memory consumption by over 78%. Capturing sequential features is important for precise long-term time-series forecasting, necessitating the modeling of both global and local correlations. In response to the challenge of capturing diverse semantic information, Jia et al. [40] presented a cutting-edge framework called WITRAN. By leveraging horizontal-vertical gate selection units (HVGSU), WITRAN adeptly captures semantic nuances ranging from local to global within sequences, all while preserving the integrity of both long-term and short-term periodic features. Experimental outcomes demonstrate that WITRAN enhances forecasting performance by 5.80% and 14.28% in long-term and super long-term time-series forecasting undertakings, respectively. The above three models improve the efficiency of time-series forecasting models in terms of interpretability, computational efficiency, and data dependence, respectively, thus addressing certain shortcomings of probabilistic forecasting models. In PV power forecasting, the relevant models can be used based on the significant characteristics of PV power generation time-series data in different regions. However, it is important to emphasize that these models mainly focus on univariate forecasting. In regions where PV operations are strongly influenced by climatic conditions, the dynamic interactions of multiple variables must be considered to improve forecast accuracy.

3.2.2. Multivariate Forecasting Models

Lai et al. [41] introduced a new deep learning framework, LSTNet, tailored for multivariate time-series forecasting tasks. Leveraging convolutional neural networks (CNNs) and recurrent neural networks (RNNs), LSTNet adeptly extracts short-term local dependency patterns among variables while discerning long-term trends within the time-series. Lai et al. [41] evaluated the performance of the LSTNet model in PV power generation forecasting tasks using the Solar-2 dataset. Experimentation across various forecasting horizons (30 min, 1 h, 2 h, and 4 h) showed LSTNet’s pronounced superiority over traditional RNN models, particularly evident when forecasting over extended time spans. In contrast to LSTNet’s incorporation of time attention layers to select pertinent time steps, Chang et al. [42] proposed an end-to-end memory network named MTNet. Comprising a large memory component, three independent encoders, and an autoregressive component for joint training, MTNet excels in capturing long-term dependency relationships, and its interpretability is further enhanced by the attention mechanism design. Experimental results using the Solar-2 dataset reveal that MTNet outperforms other models in terms of the RSE and CORR metrics (for details, please see Table 10).
In essence, multivariate models excel in learning and modeling the interrelations among different variables, facilitating a better comprehension and forecasting of dynamic time-series data. By leveraging multiple related variables as inputs, these models enhance their resilience against noise and outliers, thereby elevating the robustness of forecasting. However, challenges still exist in this approach, such as difficulties in data acquisition, complex feature engineering, high model complexity, strong cross-variable dependencies, and weak generalization capabilities.

3.3. CNN-Based Models

In 1989, LeCun et al. [43] introduced a revolutionary architecture founded on the multilayer perceptron, famously known as the convolutional neural network (CNN). This pioneering work not only laid the cornerstone for the proliferation of convolutional neural networks within the realm of computer vision but also had a profound influence on subsequent advancements in the domain of deep learning. Although convolutional neural networks are relatively weaker than recurrent neural networks at modeling prolonged dependencies, they excel at handling spatially correlated data and remain versatile for processing time-series data. Nevertheless, propelled by the progress in temporal deep learning, a plethora of researchers have discerned the distinctive advantages of convolutional neural networks in cross-variable modeling, a quintessential attribute crucial for intricate tasks such as PV forecasting and other multifaceted predictive undertakings. Fast forward to 2018, when Bai et al. [21] proposed an innovative temporal modeling paradigm christened TCN (temporal convolutional network), anchored on convolutional neural networks. The primary objective of this paradigm is to overcome certain constraints inherent in traditional recurrent neural networks and LSTMs when handling extensive sequence data. TCN integrates dilated convolutions to broaden the receptive field, thereby empowering it to capture extended dependencies. Furthermore, the integration of residual connections assuages the predicament of vanishing gradients, thereby augmenting the stability and efficacy of the model. TCN shines brightly in terms of parallel computing efficiency and proficiency in modeling protracted sequences, thereby assuming a key role in propelling the evolution of convolution-based temporal modeling methods.

3.3.1. Cross-Time-Scale Models

Time-series data exhibits distinctive characteristics, including trends, periodicity, and seasonality, which imbue it with inherent predictability—a quality often absent in general sequential data. However, prevailing time-series forecasting models encounter challenges in handling ultra-high-dimensional time-series data. These models tend to either prioritize global patterns at the expense of local nuances or neglect the broader context. In response to these limitations, Sen et al. [44] introduced an innovative model named DeepGLO that adeptly captures both global properties and local features. DeepGLO leverages a fusion of temporal convolutional networks (TCN) and matrix factorization, culminating in a regularized matrix factorization model (TCN-MF). By integrating TCN’s expertise in deciphering nonlinear temporal dependencies with the knowledge of matrix factorization in capturing overarching trends, the TCN-MF model harnesses global insights for more informed forecasting. Notably, the TCN-MF model’s outputs can intelligently guide another local TCN model, enabling DeepGLO to navigate the intricacies of ultra-high-dimensional time-series data while harmonizing local and global temporal patterns for enhanced predictive accuracy. Despite strides in mitigating the long-term dependency challenges associated with recurrent neural networks (RNNs), DeepGLO may still fall short in modeling exceedingly prolonged dependencies, particularly in datasets showing dynamic periodic or non-periodic patterns. In pursuit of precise and robust multivariate time-series forecasting, Huang et al. [45] introduced DSANet—an innovative dual self-attention network combining the strengths of convolutional neural networks (CNNs), self-attention mechanisms, and autoregressive models. DSANet deftly captures time-series intricacies from a dual vantage point, encompassing both global and local temporal perspectives. By carefully modeling dependencies between sequences and exhibiting scale adaptability, DSANet emerges as a promising solution, especially well-suited for scenarios characterized by dynamic temporal patterns, offering the potential for high-precision and resilient forecasts. Notwithstanding these advancements, the integration of prior knowledge remains a challenge for DSANet, particularly in effectively leveraging such insights for time-series data marked by distinct periodic or seasonal patterns. To tackle this issue, Liu et al. [46] proposed SCINet—a recursive downsampling, convolution, interaction framework primed to extract unique temporal features from downsampled subsequences using multiple convolution filters. Through combining and synthesizing a wealth of features across different resolutions, SCINet skillfully learns trend, periodicity, and seasonality patterns at varying time scales within time-series data, effectively capturing both local intricacies and global temporal dependencies. Employing such comprehensive representations not only enriches the predictive information but also significantly enhances the accuracy of time-series forecasting by mapping sequences adeptly. Notably, the evaluation of SCINet on the Solar-2 dataset for predicting PV power at different intervals shows superior performance with RSE values of 0.1775, 0.2301, 0.2997, and 0.4081, respectively (for details, please see Table 11).
Compared with some traditional time-series models, however, SCINet has a more intricate internal mechanism, and the temporal features it extracts can lack interpretability. Drawing inspiration from conventional time-series decomposition algorithms, Wang et al. [47] introduced the MICN (multi-scale isometric convolution network) model. MICN integrates a multi-scale hybrid decomposition module that dissects input sequences into trend-cyclical and seasonal components; by fusing local feature extraction with global correlation modeling, it forecasts long-term time-series patterns efficiently while keeping computational complexity low. In a departure from MICN's multi-scale, multi-branch CNNs, Gong et al. [48] proposed PatchMixer, a convolutional neural network (CNN) architecture that relies entirely on depthwise separable convolutions and uses a convolutional structure designed to preserve temporal ordering information. This design allows PatchMixer to capture local features and global correlations within a single-scale framework. In addition, PatchMixer employs dual forecasting heads, one linear and one nonlinear, to better reproduce both the overall trend and the finer details of future curves.
DeepGLO, DSANet, SCINet, MICN, and PatchMixer each advance local feature extraction, global correlation modeling, the handling of short-term fluctuations and long-term trends, and the management of trend, periodicity, and seasonality characteristics in different ways. Nevertheless, real-world time series often contain far more intricate temporal patterns. Building on the multi-periodicity of time-series data, Wu et al. [49] introduced TimesNet, which disentangles complex temporal variations into intra-period and inter-period changes. TimesNet extends the analysis of temporal variation into two dimensions: it transforms 1D time-series data into a 2D space according to the periods detected in the data, so that intra-period and inter-period changes can both be captured. Its modular architecture supports flexible modeling of complex time series while maintaining high parameter efficiency and good generalization (for more details, please see Table 12).
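The 1D-to-2D transformation that TimesNet relies on can be sketched as follows: a dominant period is estimated from the FFT amplitude spectrum, and the series is folded into a (number of periods × period length) matrix. The function name and shapes are illustrative assumptions, not the authors' implementation.

```python
import math
import torch

def fold_by_dominant_period(x):                 # x: (batch, time)
    amp = torch.fft.rfft(x, dim=-1).abs().mean(0)
    amp[0] = 0                                  # ignore the zero-frequency (mean) component
    freq = int(amp.argmax())                    # index of the strongest periodicity
    period = x.shape[-1] // max(freq, 1)
    n = x.shape[-1] // period
    x2d = x[:, : n * period].reshape(x.shape[0], n, period)
    return x2d                                  # (batch, inter-period, intra-period)

x = torch.sin(torch.linspace(0, 8 * math.pi, 96)).repeat(4, 1)  # 4 toy periodic PV profiles
print(fold_by_dominant_period(x).shape)          # torch.Size([4, 4, 24])
```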
In addition to temporal and regular pattern variations, the presence of uncertainty and noise in time-series data poses additional technical challenges when incorporating trends and seasonal components. Drawing inspiration from TimesNet, Ou et al. [50] introduced WinNet, an architecture where a single convolutional layer acts as the foundational structure of the forecasting network. WinNet innovatively employs a method of setting periodic windows to convert a one-dimensional sequence into a two-dimensional tensor featuring both long and short periodic characteristics. This tensor is subsequently decomposed into period trend and oscillation components, using their correlation to bolster the predictive capabilities of the CNN. In comparison to various baseline methods, WinNet shows enhanced performance in long-term time-series forecasting tasks while maintaining lower computational complexity.
In addressing the integration of local and global modeling capabilities, Cheng et al. [51] highlight a key limitation in existing methods: they predominantly predict future time-step values without considering the interactions between forecasts made at different distances into the future. This oversight can prevent the model from gathering adequate information about the future and thereby reduce forecasting accuracy. Drawing on construal level theory from psychology, Cheng et al. [51] introduced MLCNN (multi-level construal neural network), a deep learning framework that fuses information from multiple views of future scenarios. MLCNN employs a multi-layer convolutional neural network to generate multi-level abstract representations for different future time steps. By integrating forecast information from multiple future instances through a fusion-encoder-main-decoder architecture and improving robustness with autoregressive (AR) models, MLCNN effectively enhances the forecasting accuracy of multi-dimensional time-series data.
Extracting global and local trends, periodicity, and seasonality from time-series data remains an ongoing objective that models in this domain continue to pursue. In PV power forecasting, the time series exhibits diverse characteristics: an overall trend of power rising during daylight hours and flattening at night; local traits such as pronounced intraday fluctuations driven by cloud cover; and seasonal attributes such as lower power in winter and higher power in summer. Selecting an appropriate model allows these different feature scales in PV data to be assimilated more precisely, ultimately improving forecasting accuracy. Notably, in some regions, PV time series are influenced by factors such as cloud cover, wind speed, and surface temperature, which produce pronounced fluctuations, less discernible features, and greater learning difficulty, making precise forecasting challenging for these models.

3.3.2. Cross-Variate-Dependence Models

PV time-series data display intricate temporal dependencies as well as cross-variable connections. However, many models focus predominantly on cross-time correlations and neglect cross-variable relationships. To address this gap, Luo and Wang [52] proposed a convolutional model called cross-LKTCN. Cross-LKTCN uses patch-style embedding to enrich the local context and inject more semantic information, depth-wise large-kernel convolution to capture long-range dependencies across time, and two successive point-wise group convolution feedforward networks to capture cross-variable connections. Its purely convolutional structure and linear complexity make it an efficient and accurate modeling framework, and it has shown superior accuracy compared with existing convolutional and cross-variable modeling techniques. Cheng et al. [53] proposed ConvTimeNet as an alternative to cross-LKTCN: a deep hierarchical fully convolutional network with slightly higher computational complexity. ConvTimeNet integrates adaptive deformable patches to alleviate the loss of semantic information and fully convolutional blocks to capture temporal and cross-variable dependencies; its deep hierarchical structure also extracts multi-scale representations of time-series data, effectively capturing complex features and dependencies within the data.
Models such as cross-LKTCN and ConvTimeNet have demonstrated the value of CNNs for exploring cross-variate dependencies in time series. A natural question follows: can convolutional networks be pushed further by broadening the receptive field for convolutional time-series forecasting? Luo and Wang [54] redesigned the conventional temporal convolutional network (TCN) specifically for time-series tasks. Their model, ModernTCN, introduces depth-wise convolution (DWConv) and two convolutional feedforward networks (ConvFFN). DWConv captures dependencies along the temporal dimension and, instead of stacking many layers, uses larger convolutional kernels to enlarge the receptive field; the two ConvFFNs learn new feature representations for each variable and capture inter-variable dependencies. With this architecture and its expansive receptive field, ModernTCN has shown strong performance across a range of time-series analysis tasks. Taking a different route, Wang et al. [55] argued that rather than relying on deep networks to attain a large receptive field, features could be transformed into a domain that inherently possesses one. Based on this idea, they proposed TLNets (transformation learning networks), designed explicitly to strengthen the receptive field. Drawing on receptive field learning (RFL), TLNets incorporate several transformations: the Fourier transform (FT), singular value decomposition (SVD), matrix multiplication, and convolutional blocks (Conv), yielding four model variants: FT-Matrix, FT-SVD, FT-Conv, and Conv-SVD. The FT and SVD modules assimilate global information, the Conv module focuses on local information, and the matrix module bridges the two. Through this combination, TLNets balance local and global receptive fields and improve performance, especially in long-term predictive tasks.
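A simplified sketch of a ModernTCN-style block is given below: a large-kernel depthwise convolution mixes information along time for each variable, and point-wise (kernel-size-1) convolutions act as ConvFFNs that mix across variables. This compresses the paper's variable-independent embedding and dual ConvFFN design into one illustrative module; the names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class ModernTCNBlock(nn.Module):
    """Illustrative block: depthwise temporal conv + point-wise cross-variable ConvFFN."""
    def __init__(self, n_vars, kernel_size=51, hidden=4):
        super().__init__()
        # Large-kernel depthwise conv: one filter per variable, wide receptive field in time.
        self.dwconv = nn.Conv1d(n_vars, n_vars, kernel_size,
                                padding=kernel_size // 2, groups=n_vars)
        # Point-wise ConvFFN: per-time-step mixing across the variable axis.
        self.ffn = nn.Sequential(
            nn.Conv1d(n_vars, hidden * n_vars, 1), nn.GELU(),
            nn.Conv1d(hidden * n_vars, n_vars, 1),
        )

    def forward(self, x):            # x: (batch, variables, time)
        x = x + self.dwconv(x)       # temporal mixing with a residual connection
        x = x + self.ffn(x)          # cross-variable mixing with a residual connection
        return x

block = ModernTCNBlock(n_vars=5)     # e.g., PV power, GHI, temperature, wind speed, humidity
out = block(torch.randn(8, 5, 288))  # 288 = one day at 5-minute resolution
```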
In PV power forecasting, influential factors such as cloud cover, wind speed, and surface temperature shape the daily trends of PV power generation. Cross-variable modeling is an effective strategy for capturing the inter-dependencies among these factors, revealing the underlying relationships between variables and yielding deeper insight into the temporal dynamics of the data. Expanding the model's receptive field lets it discern time-dependent features over longer spans while tracking the temporal evolution of multiple variables simultaneously, which not only captures long-term dependencies but also produces richer and more meaningful representations of the temporal features, strengthening the model's robustness and performance. It should be noted, however, that the computational complexity of such models grows rapidly as the number of variables increases; judiciously selecting the influential factors most closely linked to PV power fluctuations and including only those in training is therefore a prerequisite for these models to work effectively.

3.3.3. Other Models

In PV time-series analysis, the goal remains to predict future behavior by establishing links between the multi-scale features of historical data, whether by learning global and cyclical features or by capturing cross-variate features. While the basic modeling ideas are consistent, scholars approach the modeling problem from different perspectives.
The essence of time-series forecasting lies in revealing the relationship between past and future series, rather than merely emphasizing the internal correlations or global features of the past series. Accordingly, Shen et al. [56] proposed FDNet (focal decomposed network), a model that focuses on localized segments of the input series. FDNet employs a decomposition forecasting strategy that breaks the forecasting process into independent components: the input sequence is decomposed focally, features are extracted only from localized sub-sequences, and the forecasts from each segment are aggregated into the final predicted sequence. By discarding global coarse-grained feature mapping, FDNet mitigates the distributional bias problem caused by dynamic changes in time-series data and outperforms 13 state-of-the-art baseline models, reducing the mean squared error (MSE) by 38.4% on average, which highlights its ability to improve forecast accuracy.
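The focal decomposition strategy can be illustrated with a deliberately simple sketch in which each local segment of the input window is mapped to a partial forecast by its own linear head and the partial forecasts are aggregated. FDNet itself uses deeper projection and convolutional layers, so this shows only the decomposition idea, with illustrative names and sizes.

```python
import torch
import torch.nn as nn

class FocalLinearForecaster(nn.Module):
    """Illustrative focal decomposition: per-segment heads, summed partial forecasts."""
    def __init__(self, input_len=96, horizon=24, n_segments=4):
        super().__init__()
        self.seg_len = input_len // n_segments
        self.heads = nn.ModuleList(
            nn.Linear(self.seg_len, horizon) for _ in range(n_segments)
        )

    def forward(self, x):                          # x: (batch, input_len)
        segments = x.split(self.seg_len, dim=-1)   # local sub-sequences, no global mapping
        return sum(head(seg) for head, seg in zip(self.heads, segments))

model = FocalLinearForecaster()
y_hat = model(torch.randn(16, 96))                 # -> (16, 24) forecast
```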
Cross-variable forecasting methods often overlook the unique characteristics of individual variables, which can hurt forecasting accuracy. To better capture the inherent patterns within time-series data and enhance long-term forecasting performance, Wang et al. [67] proposed the multi-resolution periodic pattern network (MPPN). MPPN constructs context-aware multi-resolution semantic units from the time series and uses multi-period pattern mining to identify its key patterns. A channel-adaptive module then captures the intrinsic characteristics of each variable in multivariate time-series data across the different patterns, thereby improving long-term forecasting performance.
To substantially reduce computational and storage overheads while maintaining high forecasting accuracy, and thereby broaden the applicability of multi-dimensional time-series forecasting techniques, Lai et al. [57] introduced LightCTS, a lightweight framework for correlated time-series forecasting. LightCTS balances computational efficiency and predictive performance through a plain stacking architecture, lightweight temporal convolutional networks (L-TCN), and a last-shot compression scheme, offering a new pathway for constructing lightweight forecasting models. Evaluated on the Solar-2 dataset for predicting future electricity generation over various horizons, LightCTS achieved relative squared error (RSE) values better than those of the baseline models while requiring only about one-tenth of their floating-point operations and model parameters, with correspondingly lower inference latency and memory usage. These findings position LightCTS as an efficient time-series forecasting model that minimizes computational and storage costs while achieving high predictive accuracy (for more details, please see Table 13).
FDNet, MPPN, and LightCTS have collectively brought new perspectives to the field of PV time-series forecasting, highlighting the intricate relationship between historical and future data, the unique attributes of individual variables in cross-variable analysis, and the innovation of lightweight architectures. These models show their respective unique strengths, but they also come with certain limitations. Therefore, in practical applications, it is important to choose the appropriate model based on the specific requirements and characteristics of the PV time-series data.

3.4. GNN-Based Models

Li et al. [58] proposed gated graph sequence neural networks, an influential architecture that helped shape the landscape of graph neural networks (GNNs). In time-series forecasting, GNNs can represent time-series data as graph structures, capturing intricate dependencies among variables, combining temporal dependencies with spatial correlations, and accommodating heterogeneous data sources. For example, to model the spatial attributes of neighboring PV systems, each PV system can be treated as a node in the graph (with node features such as position coordinates, tilt angle, and azimuth angle), while edges encode spatial relationships between systems, for instance based on distance. Historical PV power serves as the primary input feature, supplemented by the power of surrounding systems and irradiance data (GHI, DHI, and DNI). Woschitz [17] applied this methodology to assess the predictive accuracy of a GNN model on the Solar-5 dataset, focusing on periods with substantial variations in PV power output under extensive cloud cover. The GNN model improved forecasting accuracy markedly, with a 39% improvement in RMSE and a 37% improvement in MAE over a traditional linear regression model. This underscores the GNN model's ability to exploit spatial information from neighboring PV systems, capturing factors such as the impact of cloud cover on PV power generation more precisely and thereby boosting forecasting performance (for more details, please see Table 14). This work opens a new modeling paradigm for tackling time-series forecasting challenges.
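A minimal sketch of this graph construction is shown below, assuming only that each PV system has planar coordinates and a short feature vector of recent power/irradiance readings: edge weights follow a Gaussian kernel on pairwise distance, and one normalized graph-convolution step mixes each node's features with those of nearby systems. All names and parameter values are illustrative.

```python
import numpy as np

coords = np.random.rand(6, 2) * 10     # 6 hypothetical PV systems on a km grid
features = np.random.rand(6, 12)       # e.g., last 12 power/irradiance readings per system

dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
adj = np.exp(-(dist ** 2) / (2 * 3.0 ** 2))          # closer systems -> stronger edges
adj[dist > 5.0] = 0.0                                # sparsify beyond 5 km

adj_hat = adj + np.eye(len(adj))                     # add self-loops
d_inv_sqrt = 1.0 / np.sqrt(adj_hat.sum(1))
norm_adj = d_inv_sqrt[:, None] * adj_hat * d_inv_sqrt[None, :]

weight = np.random.rand(12, 8)                       # learnable in a real model
hidden = np.maximum(norm_adj @ features @ weight, 0) # one GCN layer: D^-1/2 A D^-1/2 X W
print(hidden.shape)                                  # (6, 8) spatially smoothed node features
```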
Wu et al. [59] introduced MTGNN, an end-to-end framework that approaches multivariate time-series forecasting from the perspective of a learned static graph and jointly performs graph-structure learning and spatiotemporal modeling. Graph learning layers dynamically uncover hidden relationships between variables and generate the graph adjacency matrix; graph convolution modules then use the learned structure to capture the spatial dependencies among variables, while temporal convolution modules model the temporal dependencies within the series, equipping the framework to address complex multivariate forecasting problems. Evaluating MTGNN on the Solar-1 dataset, Wu et al. [59] reported relative squared error (RSE) values of 0.1778, 0.2348, 0.3109, and 0.4270 for forecasting horizons of 3, 6, 12, and 24 steps, with correlation coefficients (CORR) of 0.9825, 0.9726, 0.9509, and 0.9031, respectively. These results show the robust performance of MTGNN, particularly its advantage at short forecasting horizons on the Solar-1 dataset, and underscore its ability to capture the spatiotemporal dependencies inherent in solar energy generation data. Although its performance edge narrows at longer horizons, MTGNN still surpassed the other conventional benchmark models.
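The graph learning layer, in which the adjacency matrix is derived from learnable node embeddings rather than given a priori, can be sketched as follows. This is a simplified, illustrative version of the mechanism described for MTGNN, not the authors' code.

```python
import torch
import torch.nn as nn

class GraphLearner(nn.Module):
    """Illustrative adaptive graph learning from two sets of node embeddings."""
    def __init__(self, n_nodes, emb_dim=16, k=3):
        super().__init__()
        self.e1 = nn.Parameter(torch.randn(n_nodes, emb_dim))
        self.e2 = nn.Parameter(torch.randn(n_nodes, emb_dim))
        self.k = k                                     # keep only the k strongest neighbours

    def forward(self):
        scores = torch.tanh(self.e1 @ self.e2.T) - torch.tanh(self.e2 @ self.e1.T)
        adj = torch.relu(torch.tanh(scores))           # asymmetric, non-negative adjacency
        topk = adj.topk(self.k, dim=-1).indices
        mask = torch.zeros_like(adj).scatter_(1, topk, 1.0)
        return adj * mask                               # sparse learned graph

adj = GraphLearner(n_nodes=20)()                        # e.g., 20 PV stations
print(adj.shape)                                        # torch.Size([20, 20])
```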
Liu et al. [60] observed that static graphs cannot capture correlations among variables that evolve over time. Building on this insight, they introduced TPGNN (temporal polynomial graph neural network), which uses temporal matrix polynomials to represent these dynamic correlations. At its core, the TPG module sits within an encoder-decoder architecture and models the dynamic interplay among variables through temporal matrix polynomials; cyclic timestamp embeddings further allow forecasting on time-series data of varying lengths. Together, these components let TPGNN capture the intricate spatiotemporal dependencies in multivariate time-series (MTS) data and produce more precise forecasts. Evaluated on the Solar-1 dataset, TPGNN showed no substantial advantage over MTGNN at forecasting horizons of 3, 6, and 12 steps, but clearly outperformed it at the 24-step horizon, underscoring its strength in long-term forecasting (for more details, please see Table 15).
Cao et al. [61] introduced StemGNN (spectral temporal graph neural network), a framework that models temporal patterns and correlations across multiple time series jointly in the frequency domain. StemGNN exploits the spatiotemporal relationships inherent in PV systems to analyze PV power sequences across both time and space, and uses sliding-window techniques to keep the computational complexity manageable. The resulting spatiotemporal attention matrix serves as the adjacency matrix fed into the graph neural network for correlation extraction. In addition, StemGNN applies the graph Fourier transform and the discrete Fourier transform to move the data into the spectral and frequency domains, making trend features more visible and improving forecasting accuracy. Zhang et al. [62] evaluated StemGNN on the DKASC dataset and reported substantial improvements in metrics such as MAE and RMSE compared with standard models like LSTM and GRU (for more details, please see Table 16 and Table 17), highlighting StemGNN's ability to raise predictive performance by capturing intricate temporal relationships and exploiting the frequency domain.
Cai et al. [63] introduced MSGNet, a model that combines frequency-domain analysis with adaptive graph convolution to capture both intra-sequence and cross-sequence correlations. MSGNet first uses frequency-domain analysis to identify the relevant time scales, then applies adaptive graph convolution to learn cross-sequence relationships. With a multi-head attention mechanism to model dependencies within each sequence and a scheme for combining feature representations across time scales, MSGNet captures the intricate multi-scale dependencies present in multivariate time-series data.
In contrast to methods such as StemGNN and MTGNN, which stack graph and temporal networks to treat spatial and temporal dependencies separately, Yi et al. [18] re-examined multivariate time-series forecasting from a pure graph perspective, asking whether spatial and temporal dependencies could be captured simultaneously by graph networks alone. They proposed FourierGNN, which is built on a hypervariate graph data structure and on efficient graph convolution operators formulated with Fourier transforms to model spatiotemporal correlations jointly. This approach converts the time-series data into a fully connected graph, improving the effectiveness of multivariate forecasting. On the Solar-3 dataset, Yi et al. [18] evaluated FourierGNN on predicting the next 12 time steps from the preceding 12 steps; it achieved an MAE of 0.120, an RMSE of 0.162, and a MAPE of 116.48, outperforming models such as StemGNN and MTGNN (for more details, please see Table 18). This indicates that FourierGNN, anchored in hypervariate-graph modeling, captures the interplay between spatial and temporal dependencies better than methods that treat them separately. Moreover, the Fourier graph operator module used in FourierGNN has lower computational complexity, enabling more efficient prediction.
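The frequency-domain graph convolution idea can be sketched briefly: every (variable, time step) pair is treated as a node of one fully connected graph, and mixing over that graph is approximated by an FFT across the node axis followed by a learnable complex-valued transformation. The snippet is an illustrative simplification, not the published FourierGNN implementation.

```python
import torch

B, N, T, D = 8, 5, 12, 16                   # batch, variables, time steps, embedding size
x = torch.randn(B, N * T, D)                # each (variable, time) pair is one graph node

xf = torch.fft.fft(x, dim=1)                # to the spectral domain over the node axis
weight = torch.randn(D, D, dtype=torch.cfloat) * 0.1   # learnable "Fourier graph operator"
yf = xf @ weight                            # point-wise mixing in the frequency domain
y = torch.fft.ifft(yf, dim=1).real          # back to the node domain
print(y.shape)                              # torch.Size([8, 60, 16])
```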
To capture the intricate spatiotemporal dependencies among regional PV stations, Zhang et al. [19] introduced STGCN, a spatiotemporal graph neural network that incorporates weather-condition recognition. Drawing on data from 20 PV stations in Jilin Province and cloud imagery from the Fengyun-4G (FY-4G) satellite, STGCN follows a structured methodology: spatial features are first extracted with graph convolutional neural networks (Graph CNN), temporal characteristics are then captured with gated convolutional neural networks (Gated CNN), and a spatiotemporal convolution block (ST-Conv Block) blends the spatial and temporal features. In addition, the satellite cloud imagery is used to classify weather conditions into three categories (clear, partly cloudy, and cloudy) (for more details, please see Table 19). The results show that, compared with other baseline models, STGCN delivers superior predictions across the evaluation metrics and forecasting horizons. Unlike models that focus solely on spatial features with a GCN or on temporal features with an LSTM, STGCN captures the spatial and temporal attributes of the power data simultaneously, significantly improving forecasting accuracy.
Whether it is through graph-structured modeling for forecasting (such as MTGNN, TPGNN, and FourierGNN), examining the frequency domain (like StemGNN and MSGNet), or analyzing spatial relationships (such as STGCN), these varied approaches offer diverse and viable solutions for PV time-series forecasting. These innovative methods underscore the significance of a thorough understanding of data features and relationships, paving the way for enhanced accuracy and efficiency in forecasting. By integrating graph structures, frequency domain insights, and spatial relationships holistically, we can grasp the nuances of time-series data more comprehensively, delivering more precise and dependable forecasting and decision support for the PV industry and energy management sector.

4. Conclusions

Accurate PV power forecasting is crucial for achieving high levels of PV operation coverage, reducing costs, and supporting national new-energy strategies. However, PV power forecasting involves a high degree of uncertainty: climates differ between countries, and even between regions of the same country, and in regions where cloud cover changes dramatically the weather parameters are extremely unstable.
Therefore, it is impractical to predict PV power in all cases with a single-structure deep learning model.
This paper studies the accuracy, performance, and application of neural networks for PV power forecasting under the basic structures of MLP, RNN, CNN, and GNN, as well as the advantages and disadvantages of different model methods within each basic structure.
This paper finds that models with different basic architectures reflect different advantages in forecasting. For example:
(1)
MLP-based models, with their relatively simple feedforward structure, are often the most efficient for PV power forecasting and are well suited to scenarios in which the PV power series varies smoothly in the time-frequency domain;
(2)
RNN-based models, particularly gated variants such as LSTM and GRU, alleviate the vanishing gradient problem to some extent and therefore generally outperform MLP-based models in long-term PV power forecasting;
(3)
CNN-based models exploit the advantages of convolution, capturing cross-variable temporal features better than MLP-based models, and also show lower errors when forecasting PV power under cloudy, rapidly changing weather conditions;
(4)
GNN-based models inherit the characteristics of spatial convolution and account not only for the large uncertainty introduced by dynamically changing cloud cover but also for the spatial variability across large-scale centralized PV power stations. In such stations, or in scenarios with dynamically changing cloud cover, GNN-based models therefore hold a significant advantage over MLP-based and RNN-based models for PV power forecasting.
By comparing and analyzing the accuracy, performance, and application of deep neural network models built on the MLP, RNN, CNN, and GNN basic structures for forecasting the output power of PV generation systems, this paper helps PV power forecasting practitioners and researchers match an appropriate model structure to their specific situation and build suitable PV power forecasting methods.

Author Contributions

Writing draft, X.L. and J.Y.; resources, L.Y., L.L., K.S., X.Y. ([email protected]); data curation, L.L. and K.S.; methodology, Z.H.; literature investigation, L.Y., X.Y. and S.D.; visualization, X.Y.; writing—review and editing, D.Z. and Z.X.; supervision, D.Z. and X.Y. ([email protected]). All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Key Project of the Chinese Water Resources Ministry (SKS-2022120), China Yangtze Power Co., Ltd. (contract no. Z242302044), and the Natural Science Foundation of Hubei Province (2022CFD027).

Data Availability Statement

Not applicable.

Acknowledgments

The authors thank all the editors and reviewers for their comments.

Conflicts of Interest

Authors Lei Yang and Linze Li were employed by China Three Gorges Corporation. Authors Keyan Shen and Xu Yang ([email protected]) were employed by China Yangtze Power Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The funders were not involved in the study design; the collection, analysis, or interpretation of data; the writing of this article; or the decision to submit it for publication.

Appendix A

Reference | Based Frame | Models | Main Structure and Features | Already Used in PV Forecasting
[23] | MLP | N-Beats | Block Input, Block Layers (FC layers), Backcast Output, Forecast Output, Doubly Residual Stacking | √
[24] | MLP | N-BeatsX | Block Input, Block Layers (FC layers), Backcast Output, Forecast Output, Convolutional Layer, Doubly Residual Stacking, Interpretable Time-Series Signal Decomposition | -
[25] | MLP | N-HiTS | Multi-Rate Signal Sampling, Hierarchical Interpolation, Cross-block Synchronization of Input Sample Rate, Output Interpolation Scale | -
[26] | MLP | DEPTS | Periodicity Module, Discrete Cosine Transform (DCT), Triply Residual Expansions | -
[15] | MLP | FreDo | Mixer, Discrete Fourier Transform (DFT), AverageTile, Inverse DFT | √
[27] | MLP | MTS-Mixers | Temporal MLP, Factorized Channel MLP, Optional Embedding, Linear Projection Layer, Attention-based MTS-Mixer, Random Matrix-based MTS-Mixer, MLP-based MTS-Mixer | -
[28] | MLP | TSMixer | Time-mixing MLP, Feature-mixing MLP, Temporal Projection, Align and Mixing | -
[29] | MLP | CI-TSMixer | Linear Patch Embedding Layer, Mixer Layers, Inter-Patch Mixer Module, Intra-Patch Mixer Module, Inter-Channel Mixer Module; Gated Attention Block, Online Forecast Reconciliation Heads | -
[31] | MLP | TFDNet | Multi-Scale Window Mechanism, Trend Time-frequency Block, Seasonal Time-frequency Block, Frequency-FFN, Mixture Loss | -
[30] | MLP | FreTS | Domain Conversion, Inversion Stages, Frequency-domain MLPs, Frequency Channel Learner, Frequency Temporal Learner, Dimension Extension Block | √
[30] | MLP | U-Mixer | Mixer, Normalization and Patch Embedding, Unet Encoder-Decoder, Stationarity Correction | -
[16] | MLP | TimeMixer | Multiscale Time Series Downsampling, Past-Decomposable-Mixing Block, Future-Multipredictor-Mixing Block | √
[41] | RNN | LSTNet | Convolutional Component, Recurrent Component, Recurrent-skip Component, Dense Layer, Autoregressive Linear Model, Final Forecasting | √
[34] | RNN | DA-RNN | Input Attention Mechanism, Encoder (LSTM), Temporal Attention Mechanism, Decoder (LSTM), Output Layer (LSTM) | -
[35] | RNN | DeepAR | Input Layer, Encoder (LSTM), Autoregressive Mechanism, Decoder (LSTM), Output Layer (LSTM), Probabilistic Forecasting | √
[64] | RNN | MQRNN | Encoder (LSTM), Decoder (Global MLP and Local MLP), Forking-Sequences Training Scheme, Target Masking Strategy | -
[38] | RNN | mWDN | Multilevel Discrete Wavelet Decomposition, Residual Classification Flow, multi-frequency Long Short-Term Memory | -
[42] | RNN | MTNet | Large Memory Component, Three Separate Encoders, Attention Mechanism, Convolutional Layer, Autoregressive Component | √
[65] | RNN | ESLSTM | Deseasonalization and Adaptive Normalization, Generation of Forecasts, Ensembling, Dilated LSTM-based Stacks, Linear Adapter Layer, Residual Connections, Attention Mechanism | -
[66] | RNN | MH-TAL | Encoder (LSTM), Decoder (BiLSTM), Temporal Attention Mechanism, Multimodal Fusion, Fully Connected Layer | -
[36] | RNN | C2FAR | Hierarchical Generation, Neural Network Parameterization, Negative Log-Likelihood Minimization, RNN-based Forecasting, Multi-Level C2FAR Models | -
[39] | RNN | SegRNN | Segment-wise Iterations, Parallel Multi-step Forecasting (PMF), GRU Cell, Encoding Phase, Decoding Phase, Channel Independent (CI) Strategy, Channel Identifier | -
[40] | RNN | WITRAN | Water-wave Information Transmission, Horizontal Vertical Gated Selective Unit, Recurrent Acceleration Network | -
[37] | RNN | SutraNets | Sub-series Decomposition, Autoregressive Model, Parallel Training, C2FAR-LSTMs, Low2HighFreq Approach, Backfill-alt Strategy, Monte Carlo Sampling | -
[44] | CNN | DeepGLO | Global Matrix Factorization Model, Local Temporal Convolution Network, Hybrid Model, Handling Scale Variations | √
[45] | CNN | DSANet | Global Temporal Convolution, Local Temporal Convolution, Self-Attention Module, Autoregressive Component, Parallel Computing and Long Sequence Modeling | √
[51] | CNN | MLCNN | Convolutional Component, Sharing Mechanism, Fusion Encoder (LSTM), Main Decoder (LSTM), Autoregressive Component, Multi-Task Learning Framework | -
[46] | CNN | SCINet | Interactive Learning, Hierarchical Structure, Residual Connection, Decoder (Fully Connected Network), Multiple Layers of SCINet, Intermediate Supervision | √
[47] | CNN | MICN | Multiple Branches of Convolution Kernels, Local Features Extraction, Global Correlations Modeling, Merge Operation, Seasonal Forecasting Block, Trend-cyclical Forecasting Block | √
[49] | CNN | TimesNet | TimesBlock, Multi-scale 2D Kernels, Residual Connection, Various Vision Backbones | √
[57] | CNN | LightCTS | Plain Stacking Architecture, Light-TCN, Global-Local TransFormer, Last-shot Compression Scheme, Embedding Module, Aggregation and Output Module | √
[55] | CNN | TLNets | Fourier Transform, Singular Value Decomposition, Matrix Multiplication, Convolutional Block, Receptive Field Learning | -
[52] | CNN | Cross-LKTCN | Patch-Style Embedding Strategy, Depth-Wise Large Kernel Convolution, Feed Forward Networks, Multiple Cross-LKTCN Block Stacking, Linear Head with a Flatten Layer | -
[67] | CNN | MPPN | Multi-Resolution Patching, Multi-Periodic Pattern Mining, Channel Adaptive Module, Output Layer | -
[56] | CNN | FDNet | Decomposed Forecasting Formula, Basic Linear Projection Layers, 2D Convolutional Layers, Focal Input Sequence Decomposition, Final Output Design | -
[48] | CNN | PatchMixer | Single-scale Depthwise Separable Convolutional Block, MLP, Patch Embedding, Patch-Mixing, Instance Normalization | -
[50] | CNN | WinNet | Inter-Intra Period Encoder, Two-Dimensional Period Decomposition, Decomposition Correlation Block, Series Decoder | -
[54] | CNN | ModernTCN | Variable-Independent Embedding, Depthwise Convolution, ConvFFN1, ConvFFN2, Fully-Convolutional Structure | -
[53] | CNN | ConvTimeNet | Deformable Patch Embedding, Fully Convolutional Blocks, Hierarchical and Multi-Scale Representations, Linear Layer | -
[19] | GNN | STGCN | Spatiotemporal Convolutional Blocks, Graph Convolutional Layers, Gated Temporal Convolutional Layers, Residual Connections and Bottleneck Strategy, Fully-Connected Output Layer | √
[59] | GNN | MTGNN | Graph Learning Layer, Graph Convolution Modules, Temporal Convolution Modules, Residual and Skip Connections, Output Module, Curriculum Learning Strategy | √
[61] | GNN | StemGNN | Latent Correlation Layer, Graph Fourier Transform (GFT), Discrete Fourier Transform (DFT), 1D Convolution and GLU, Graph Convolution and Inverse GFT, Residual Connections, Inverse Discrete Fourier Transform (IDFT) | √
[60] | GNN | TPGNN | Encoder-Decoder, Temporal Polynomial Graph (TPG), Diffusion Graph Convolution Layer, Adaptive Graph Construction | √
[18] | GNN | FourierGNN | Hypervariate Graph, Fourier Graph Operator (FGO), Stacking FGO Layers in Fourier Space | √
[63] | GNN | MSGNet | Scale Learning and Transforming Layer, Multiple Graph Convolution Module, Temporal Multi-Head Attention Module, ScaleGraph Block, Input Embedding and Residual Connection, Multi-Scale Adaptive Graph Convolution, Multi-Head Attention Mechanism, Integrating Representations from Different Scales | -

References

  1. Patterson, K. An Introduction to ARMA Models. In Unit Root Tests in Time Series: Key Concepts and Problems; Palgrave Macmillan: London, UK, 2011; pp. 68–122. [Google Scholar]
  2. Cao, L.J. Support vector machines experts for time series forecasting. Neurocomputing 2003, 51, 321–339. [Google Scholar] [CrossRef]
  3. Rumelhart, D.E.; Hinton, G.E.; Williams, R. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  4. Elman, J. Finding structure in time. Cogn. Sci. 1990, 14, 179–211. [Google Scholar] [CrossRef]
  5. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  6. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80. [Google Scholar] [CrossRef] [PubMed]
  7. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  8. Yang, R.; Zha, X.; Liu, K.; Xu, S. A CNN model embedded with local feature knowledge and its application to time-varying signal classification. Neural Netw. 2021, 142, 564–572. [Google Scholar] [CrossRef] [PubMed]
  9. Karimi, A.M.; Wu, Y.; Koyuturk, M.; French, R. Spatiotemporal graph neural network for performance prediction of photovoltaic power systems. Proc. AAAI Conf. Artif. Intell. 2021, 35, 15323–15330. [Google Scholar] [CrossRef]
  10. Liu, C.; Li, M.; Yu, Y.; Wu, Z.; Gong, H.; Cheng, F. A review of multitemporal and multispatial scales photovoltaic forecasting methods. IEEE Access 2022, 10, 35073–35093. [Google Scholar] [CrossRef]
  11. Huang, C.; Cao, L.; Peng, N.; Li, S.; Zhang, J.; Wang, L.; Luo, X.; Wang, J.-H. Day-ahead forecasting of hourly photovoltaic power based on robust multilayer perception. Sustainability 2018, 10, 4863. [Google Scholar] [CrossRef]
  12. Anwar, M.T.; Islam, M.F.; Alam, M.G.R. Forecasting Meteorological Solar Irradiation Using Machine Learning and N-BEATS Architecture. In Proceedings of the 2023 8th International Conference on Machine Learning Technologies, Stockholm, Sweden, 10–12 March 2023; pp. 46–53. [Google Scholar]
  13. Wang, K.; Qi, X.; Liu, H. Photovoltaic power forecasting based LSTM-Convolutional Network. Energy 2019, 189, 116225. [Google Scholar] [CrossRef]
  14. Yemane, S. Deep Forecasting of Renewable Energy Production with Numerical Weather Predictions. Master’s Thesis, LUT University, Lappeenranta, Finland, 2021. [Google Scholar]
  15. Sun, F.-K.; Boning, D. Fredo: Frequency domain-based long-term time series forecasting. arXiv 2022, arXiv:2205.12301. [Google Scholar]
  16. Wang, S.; Wu, H.; Shi, X.; Hu, T.; Luo, H.; Ma, L.; Zhang, J.Y.; Zhou, J. TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting. In Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024. [Google Scholar]
  17. Woschitz, M. Spatio-Temporal PV Forecasting with (Graph) Neural Networks. Master’s Thesis, Technische Universität Wien, Vienna, Austria, 2023. [Google Scholar]
  18. Yi, K.; Zhang, Q.; Fan, W.; He, H.; Hu, L.; Wang, P.; An, N.; Cao, L.; Niu, Z. FourierGNN: Rethinking multivariate time series forecasting from a pure graph perspective. arXiv 2024, arXiv:2311.06190v1. [Google Scholar]
  19. Zhang, M.; Tao, P.; Ren, P.; Zhen, Z.; Wang, F.; Wang, G. Spatial-Temporal Graph Neural Network for Regional Photovoltaic Power Forecasting Based on Weather Condition Recognition. In Proceedings of the 10th Renewable Power Generation Conference (RPG 2021), Online, 14–15 October 2021; pp. 361–368. [Google Scholar]
  20. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar]
  21. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar]
  22. Zhang, H.; Lu, G.; Zhan, M.; Zhang, B. Semi-supervised classification of graph convolutional networks with Laplacian rank constraints. Neural Process. Lett. 2022, 54, 2645–2656. [Google Scholar] [CrossRef]
  23. Oreshkin, B.; Carpov, D.; Chapados, N.; Bengio, Y. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. arXiv 2019, arXiv:1905.10437. [Google Scholar]
  24. Olivares, K.G.; Challu, C.; Marcjasz, G.; Weron, R.; Dubrawski, A. Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx. Int. J. Forecast. 2023, 39, 884–900. [Google Scholar] [CrossRef]
  25. Challu, C.; Olivares, K.G.; Oreshkin, B.N.; Ramirez, F.G.; Canseco, M.M.; Dubrawski, A. Nhits: Neural hierarchical interpolation for time series forecasting. Proc. AAAI Conf. Artif. Intell. 2023, 37, 6989–6997. [Google Scholar] [CrossRef]
  26. Fan, W.; Zheng, S.; Yi, X.; Cao, W.; Fu, Y.; Bian, J.; Liu, T.-Y. DEPTS: Deep expansion learning for periodic time series forecasting. arXiv 2022, arXiv:2203.07681. [Google Scholar]
  27. Li, Z.; Rao, Z.; Pan, L.; Xu, Z. Mts-mixers: Multivariate time series forecasting via factorized temporal and channel mixing. arXiv 2023, arXiv:2302.04501. [Google Scholar]
  28. Chen, S.-A.; Li, C.-L.; Yoder, N.; Arik, S.O.; Pfister, T. Tsmixer: An all-mlp architecture for time series forecasting. arXiv 2023, arXiv:2303.06053. [Google Scholar]
  29. Vijay, E.; Jati, A.; Nguyen, N.; Sinthong, G.; Kalagnanam, J. TSMixer: Lightweight MLP-mixer model for multivariate time series forecasting. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024. [Google Scholar]
  30. Yi, K.; Zhang, Q.; Fan, W.; Wang, S.; Wang, P.; He, H.; An, N.; Lian, D.; Cao, L.; Niu, Z. Frequency-domain MLPs are more effective learners in time series forecasting. arXiv 2024, arXiv:2311.06184. [Google Scholar]
  31. Luo, Y.; Lyu, Z.; Huang, X. TFDNet: Time-Frequency Enhanced Decomposed Network for Long-term Time Series Forecasting. arXiv 2023, arXiv:2308.13386. [Google Scholar]
  32. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
  33. Jia, P.; Zhang, H.; Liu, X.; Gong, X. Short-term photovoltaic power forecasting based on VMD and ISSA-GRU. IEEE Access 2021, 9, 105939–105950. [Google Scholar] [CrossRef]
  34. Qin, Y.; Song, D.; Chen, H.; Cheng, W.; Jiang, G.; Cottrell, G. A dual-stage attention-based recurrent neural network for time series prediction. arXiv 2017, arXiv:1704.02971. [Google Scholar]
  35. Salinas, D.; Flunkert, V.; Gasthaus, J.; Januschowski, T. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. 2020, 36, 1181–1191. [Google Scholar] [CrossRef]
  36. Bergsma, S.; Zeyl, T.; Rahimipour Anaraki, J.; Guo, L. C2far: Coarse-to-fine autoregressive networks for precise probabilistic forecasting. Adv. Neural Inf. Process. Syst. 2022, 35, 21900–21915. [Google Scholar]
  37. Bergsma, S.; Zeyl, T.; Guo, L. SutraNets: Sub-series Autoregressive Networks for Long-Sequence, Probabilistic Forecasting. Adv. Neural Inf. Process. Syst. 2023, 36, 30518–30533. [Google Scholar]
  38. Wang, J.; Wang, Z.; Li, J.; Wu, J. Multilevel wavelet decomposition network for interpretable time series analysis. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 2437–2446. [Google Scholar]
  39. Lin, S.; Lin, W.; Wu, W.; Zhao, F.; Mo, R.; Zhang, H. SegRNN: Segment Recurrent Neural Network for Long-Term Time Series Forecasting. arXiv 2023, arXiv:2308.11200. [Google Scholar]
  40. Jia, Y.; Lin, Y.; Hao, X.; Lin, Y.; Guo, S.; Wan, H. Witran: Water-wave information transmission and recurrent acceleration network for long-range time series forecasting. Adv. Neural Inf. Process. Syst. 2024, 36, 12389–12456. [Google Scholar]
  41. Lai, G.; Chang, W.-C.; Yang, Y.; Liu, H. Modeling long-and short-term temporal patterns with deep neural networks. In Proceedings of the 41st international ACM SIGIR conference on research & development in information retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 95–104. [Google Scholar]
  42. Chang, Y.-Y.; Sun, F.-Y.; Wu, Y.-H.; Lin, S.-D. A memory-network based solution for multivariate time-series forecasting. arXiv 2018, arXiv:1809.02105. [Google Scholar]
  43. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551. [Google Scholar] [CrossRef]
  44. Sen, R.; Yu, H.-F.; Dhillon, I. Think globally, act locally: A deep neural network approach to high-dimensional time series forecasting. NIPS’19 2019, 32, 4837–4846. [Google Scholar]
  45. Huang, S.; Wang, D.; Wu, X.; Tang, A. Dsanet: Dual self-attention network for multivariate time series forecasting. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 2129–2132. [Google Scholar]
  46. Liu, M.; Zeng, A.; Chen, M.; Xu, Z.; Lai, Q.; Ma, L.; Xu, Q. Scinet: Time series modeling and forecasting with sample convolution and interaction. Adv. Neural Inf. Process. Syst. 2022, 35, 5816–5828. [Google Scholar]
  47. Wang, H.; Peng, J.; Huang, F.; Wang, J.; Chen, J.; Xiao, Y. Micn: Multi-scale local and global context modeling for long-term series forecasting. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2022. [Google Scholar]
  48. Gong, Z.; Tang, Y.; Liang, J. Patchmixer: A patch-mixing architecture for long-term time series forecasting. arXiv 2023, arXiv:2310.00655. [Google Scholar]
  49. Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-variation modeling for general time series analysis. In Proceedings of the Eleventh International Conference on Learning Representations. arXiv 2022, arXiv:2210.02186. [Google Scholar]
  50. Ou, W.; Guo, D.; Zhang, Z.; Zhao, Z.; Lin, Y. WinNet: Time series forecasting with a window-enhanced period extracting and interacting. arXiv 2023, arXiv:2311.00214. [Google Scholar]
  51. Cheng, J.; Huang, K.; Zheng, Z. Towards better forecasting by fusing near and distant future visions. Proc. AAAI Conf. Artif. Intell. 2020, 34, 3593–3600. [Google Scholar] [CrossRef]
  52. Luo, D.; Wang, X. Cross-LKTCN: Modern Convolution Utilizing Cross-Variable Dependency for Multivariate Time Series Forecasting. arXiv 2023, arXiv:2306.02326. [Google Scholar]
  53. Cheng, M.; Yang, J.; Pan, T.; Liu, Q.; Li, Z. Convtimenet: A deep hierarchical fully convolutional model for multivariate time series analysis. arXiv 2024, arXiv:2403.01493. [Google Scholar]
  54. Luo, D.; Wang, X. Moderntcn: A modern pure convolution structure for general time series analysis. In Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024. [Google Scholar]
  55. Wang, W.; Liu, Y.; Sun, H. Tlnets: Transformation learning networks for long-range time-series prediction. arXiv 2023, arXiv:2305.15770. [Google Scholar]
  56. Shen, L.; Wei, Y.; Wang, Y.; Qiu, H. FDNet: Focal Decomposed Network for efficient, robust and practical time series forecasting. Knowl.Based Syst. 2023, 275, 110666. [Google Scholar] [CrossRef]
  57. Lai, Z.; Zhang, D.; Li, H.; Jensen, C.S.; Lu, H.; Zhao, Y. Lightcts: A lightweight framework for correlated time series forecasting. Proc. ACM Manag. Data 2023, 1, 1–26. [Google Scholar] [CrossRef]
  58. Li, Y.; Tarlow, D.; Brockschmidt, M.; Zemel, R. Gated graph sequence neural networks. arXiv 2015, arXiv:1511.05493. [Google Scholar]
  59. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; Zhang, C. Connecting the dots: Multivariate time series forecasting with graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, CA, USA, 6–10 July 2020; pp. 753–763. [Google Scholar]
  60. Liu, Y.; Liu, Q.; Zhang, J.-W.; Feng, H.; Wang, Z.; Zhou, Z.; Chen, W. Multivariate time-series forecasting with temporal polynomial graph neural networks. Adv. Neural Inf. Process. Syst. 2022, 35, 19414–19426. [Google Scholar]
  61. Cao, D.; Wang, Y.; Duan, J.; Zhang, C.; Zhu, X.; Huang, C.; Tong, Y.; Xu, B.; Bai, J.; Tong, J. Spectral temporal graph neural network for multivariate time-series forecasting. Adv. Neural Inf. Process. Syst. 2020, 33, 17766–17778. [Google Scholar]
  62. Zhang, S.; Gong, S.; Ren, Z.; Zhang, Z. Photovoltaic Power Prediction Based on Time-Space-Attention Mechanism and Spectral Temporal Graph. 2021. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4547760 (accessed on 22 October 2023).
  63. Cai, W.; Liang, Y.; Liu, X.; Feng, J.; Wu, Y. Msgnet: Learning multi-scale inter-series correlations for multivariate time series forecasting. Proc. AAAI Conf. Artif. Intell. 2024, 38, 11141–11149. [Google Scholar] [CrossRef]
  64. Wen, R.; Torkkola, K.; Narayanaswamy, B.; Madeka, D. A multi-horizon quantile recurrent forecaster. arXiv 2017, arXiv:1711.11053. [Google Scholar]
  65. Smyl, S. A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting. Int. J. Forecast. 2020, 36, 75–85. [Google Scholar] [CrossRef]
  66. Fan, C.; Zhang, Y.; Pan, Y.; Li, X.; Zhang, C.; Yuan, R.; Wu, D.; Wang, W.; Pei, J.; Huang, H. Multi-horizon time series forecasting with temporal attention learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2527–2535. [Google Scholar]
  67. Wang, X.; Wang, Z.; Yang, K.; Feng, J.; Song, Z.; Deng, C.; Zhu, L. MPPN: Multi-Resolution Periodic Pattern Network For Long-Term Time Series Forecasting. arXiv 2023, arXiv:2306.06895. [Google Scholar]
Figure 1. Multilayer perceptron with a single hidden layer containing five hidden units.
Figure 2. A network module for RNNs.
Figure 3. The architecture of an LSTM model.
Figure 4. Unit structure of GRU network.
Figure 5. The basic structure of a convolutional neural network.
Figure 6. The method of network information propagation in the TCN model.
Figure 7. The approach for calculating an element of the output tensor.
Figure 8. The diagram showing the relationship between two consecutive output elements and their respective input subsequences.
Figure 9. The case of multiple input channels.
Figure 10. A classic graph neural network.
Figure 11. The modeling process of the spatial-temporal graph neural network (ST-GNN).
Figure 12. Using RNN as components of ST-GNN.
Figure 13. All references and PV power forecasting references.
Table 1. Summary of model evaluation indicators [10].
Metrics | Equation | Description
MSE | $\mathrm{MSE}(p,\hat{p})=\frac{1}{n_{\mathrm{samples}}}\sum_{i=1}^{n_{\mathrm{samples}}}(p_i-\hat{p}_i)^2$ | Measures the expected value of the squared difference between the predicted and actual values, to evaluate the degree of variation in the forecasting.
MAE | $\mathrm{MAE}(p,\hat{p})=\frac{1}{n_{\mathrm{samples}}}\sum_{i=1}^{n_{\mathrm{samples}}}\left|p_i-\hat{p}_i\right|$ | Calculates the average of the absolute errors between the predicted and actual values.
MBE | $\mathrm{MBE}(p,\hat{p})=\frac{1}{n}\sum_{i=1}^{n}(\hat{p}_i-p_i)$ | Computes the average deviation between the predicted and true values.
MAPE | $\mathrm{MAPE}(p,\hat{p})=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{p}_i-p_i}{p_i}\right|\times 100\%$ | Assesses the average of the percentage errors between the predicted and actual values.
RMSE | $\mathrm{RMSE}(p,\hat{p})=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(p_i-\hat{p}_i)^2}$ | Evaluates the average of the square root of the squared errors between the predicted and actual values.
SDE | $\mathrm{SDE}=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(e_i-\bar{e})^2}$ | Measures the degree of dispersion between the predicted values and the actual values.
RSE | $\mathrm{RSE}=\frac{\sum_{i=1}^{n}(p_i-\hat{p}_i)^2}{\sum_{i=1}^{n}(p_i-\bar{p})^2}$ | Employed to evaluate the performance improvement of the predictive model compared to a simple average model.
RRSE | $\mathrm{RRSE}=\sqrt{\frac{\sum_{i=1}^{n}(p_i-\hat{p}_i)^2}{\sum_{i=1}^{n}(p_i-\bar{p})^2}}$ | Represents the square root of the performance improvement of the predictive model relative to a simple average model.
R2 | $R^2=1-\frac{\sum_{i=1}^{n}(\hat{p}_i-p_i)^2}{\sum_{i=1}^{n}(p_i-\bar{p})^2}$ | Signifies the correlation between the predicted values and the actual values.
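As a practical illustration (not taken from any of the reviewed papers), the metrics defined in Table 1 can be computed for a PV forecast in a few lines of NumPy; note that MAPE is undefined where the actual power is zero (e.g., at night), so those points are masked in this sketch.

```python
import numpy as np

p = np.array([0.0, 1.2, 3.5, 4.1, 2.8])       # actual PV power (e.g., MW)
p_hat = np.array([0.1, 1.0, 3.9, 3.8, 2.5])   # predicted PV power

mse = np.mean((p - p_hat) ** 2)
mae = np.mean(np.abs(p - p_hat))
mbe = np.mean(p_hat - p)
rmse = np.sqrt(mse)
mask = p != 0                                  # MAPE is undefined for zero actual power
mape = np.mean(np.abs((p_hat[mask] - p[mask]) / p[mask])) * 100
rse = np.sum((p - p_hat) ** 2) / np.sum((p - p.mean()) ** 2)
r2 = 1.0 - rse                                 # per Table 1, R2 = 1 - RSE
print(f"MSE={mse:.4f} MAE={mae:.4f} RMSE={rmse:.4f} MAPE={mape:.2f}% RSE={rse:.4f} R2={r2:.4f}")
```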
Table 2. Dataset retrieval table.
Reference | Dataset | Description
[11] | Solar-6 | Hourly PV power generation data from the Andre Agassi Preparatory Academy Building B PV power station (36.19 N, 115.16 W, elevation 620 m) in the United States, spanning from 1 January 2012 to 31 December 2017. The data can be obtained from https://maps.nrel.gov/pvdaq/ (accessed on 1 July 2024).
[12] | NSRDB | The NSRDB is a database containing global horizontal irradiance (GHI), direct normal irradiance (DNI), and diffuse horizontal irradiance (DHI) data. The authors selected data from four regions in Bangladesh (Khulna, Chittagong, Rajshahi, and Sylhet) covering 1 January 2018 to 31 December 2020. The data can be obtained from https://nsrdb.nrel.gov/ (accessed on 1 July 2024).
[13] | DKASC | A publicly available PV system dataset provided by the Desert Knowledge Australia Solar Centre (DKASC). It includes features related to the PV system, such as current phase average, active power, wind speed, air temperature, relative humidity, global horizontal radiation, diffuse horizontal radiation, and wind direction, with a sampling frequency of 5 min. The data can be obtained from http://dkasolarcentre.com.au/download?location=alice-springs (accessed on 1 July 2024).
[14] | Solar-7 | PV power generation data from the Finnish Meteorological Institute (FMI) station located in Helsinki, Finland. The data span 26 August 2015 to 31 December 2020 and contain four PV power generation time series with a sampling frequency of 1 min, together with global horizontal radiation, diffuse radiation, direct normal irradiance, global radiation on the tilted PV surface, air temperature, and PV module temperature. The data can be obtained from https://en.ilmatieteenlaitos.fi/download-observations/ (accessed on 1 July 2024).
[15] | Solar-2 | Data from the National Renewable Energy Laboratory (NREL) in the United States, recording PV power generation from 137 PV power stations in Alabama in 2006. The data have a sampling frequency of 10 min, for a total of 52,560 time points. The data can be obtained from https://www.nrel.gov/grid/solar-power-data.html/ (accessed on 1 July 2024).
[16] | Solar-1 | Data from the National Renewable Energy Laboratory (NREL) in the United States, recording PV power generation from 137 PV power stations in Alabama in 2007. The data have a sampling frequency of 10 min, for a total of 52,560 time points.
[17] | Solar-5 | Data collected by the Energy Intranets project in the Netherlands and provided by the Netherlands Research Council (NWO). The data consist of power measurements recorded at a sampling frequency of 0.5 Hz from 175 private residential rooftop PV systems in the province of Utrecht, spanning January 2014 to December 2017, and include the geographic location (latitude and longitude), tilt angle, azimuth angle, and estimated maximum power output of each PV system. The data can be obtained from https://zenodo.org/records/10953360 (accessed on 1 July 2024).
[18] | Solar-3 | Data from a PV power plant in Florida, collected by the National Renewable Energy Laboratory (NREL) in the United States. The dataset contains 593 data points, spanning 1 January 2006 to 31 December 2016, with a sampling interval of 1 h. The data can be obtained from https://www.nrel.gov/grid/solar-power-data.html/ (accessed on 1 July 2024).
[19] | Solar-4 | Actual power generation data from 20 PV power stations in Jilin Province, China, covering 13 March 2018 to 30 June 2019, with a sampling frequency of 15 min. The data include the latitude and longitude of each power station and cloud total amount (CTA) data provided by the Fengyun-4G (FY-4G) satellite.
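Most of the datasets in Table 2 are distributed as timestamped CSV files at sub-hourly resolution. The following pandas sketch illustrates one possible way to load and resample such a file to hourly means; the file name and column names are assumptions and will differ depending on the export actually downloaded (e.g., from the DKASC site).

```python
import pandas as pd

# Hypothetical file and column names -- the actual export may differ.
df = pd.read_csv(
    "alice_springs_site1.csv",     # CSV saved locally after download
    parse_dates=["timestamp"],      # assumed name of the time column
    index_col="timestamp",
)

# 5-min active power resampled to hourly means, with short gaps interpolated.
hourly_power = (
    df["Active_Power"]              # assumed name of the power column (kW)
    .resample("1h")
    .mean()
    .interpolate(limit=2)
)
print(hourly_power.head())
```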
Table 3. Daily forecasting performance of the Solar-6 dataset [11].
Method | RMSE | MAE
Robust-MLP | 0.6508 | 0.4370
Generic-MLP | 0.6635 | 0.4511
Table 4. Performance comparison of the NSRDB dataset [12].
Features | Models | CTG RMSE | CTG MAPE | KHU RMSE | KHU MAPE | SYL RMSE | SYL MAPE | RAJ RMSE | RAJ MAPE
clear sky GHI | N-BEATS | 29.03 | 2.34% | 37.89 | 3.29% | 31.76 | 2.53% | 35.77 | 3.15%
clear sky GHI | LSTM | 424.39 | 19.79% | 421.20 | 15.73% | 434.22 | 22.04% | 424.39 | 20.23%
clear sky DHI | N-BEATS | 73.001 | 6.28% | 70.81 | 5.93% | 102.39 | 11.79% | 91.33 | 8.11%
clear sky DHI | LSTM | 136.21 | 14.09% | 120.81 | 13.74% | 125.49 | 18.69% | 156.73 | 16.89%
clear sky DNI | N-BEATS | 103.33 | 10.32% | 119.10 | 13.02% | 98.39 | 9.44% | 85.63 | 7.39%
clear sky DNI | LSTM | 256.49 | 20.33% | 254.40 | 21.79% | 260.11 | 19.04% | 252.39 | 18.34%
Table 5. Experimental results of TimeMixer, DLinear, and Informer using Solar-1 dataset [16].
Methods | Metric | Forecasting length 96 | 192 | 336 | 720 | Avg
TimeMixer | MSE | 0.189 | 0.222 | 0.231 | 0.223 | 0.216
TimeMixer | MAE | 0.259 | 0.283 | 0.292 | 0.285 | 0.280
DLinear | MSE | 0.290 | 0.320 | 0.353 | 0.357 | 0.330
DLinear | MAE | 0.378 | 0.398 | 0.415 | 0.413 | 0.401
Informer | MSE | 0.287 | 0.297 | 0.367 | 0.374 | 0.331
Informer | MAE | 0.323 | 0.341 | 0.429 | 0.431 | 0.381
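The forecasting lengths reported in Tables 5, 6, and 12 refer to the number of future time steps predicted from a fixed look-back window. The sketch below shows one common way of constructing such input/target windows; the look-back of 96 steps is an illustrative assumption rather than the setting used in the cited benchmarks.

```python
import numpy as np

def make_windows(series: np.ndarray, lookback: int = 96, horizon: int = 96):
    """Slice a 1-D series into (input, target) pairs for multi-step forecasting.

    Illustrative helper; the lookback and horizon values are assumptions,
    not the configurations used in the cited papers.
    """
    inputs, targets = [], []
    for start in range(len(series) - lookback - horizon + 1):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback:start + lookback + horizon])
    return np.stack(inputs), np.stack(targets)

# Example: dummy PV series, horizons matching the table columns.
pv = np.random.rand(5000)
for horizon in (96, 192, 336, 720):
    X, y = make_windows(pv, lookback=96, horizon=horizon)
    print(horizon, X.shape, y.shape)
```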
Table 6. Results of FreDo and Autoformer models using the Solar-2 dataset [15].
Methods | Metric | Forecasting length 96 | 192 | 336 | 720 | Avg
FreDo | MSE | 0.176 | 0.193 | 0.202 | 0.207 | 0.195
FreDo | MAE | 0.234 | 0.248 | 0.255 | 0.260 | 0.249
Autoformer | MSE | 0.466 | 0.761 | 0.820 | 0.834 | 0.720
Autoformer | MAE | 0.467 | 0.618 | 0.690 | 0.653 | 0.607
Table 7. Performance comparison of LSTM, CNN, and LSTM-CNN models using the DKASC dataset [13].
Models | MAE | RMSE | MAPE | SDE
LSTM | 0.327 | 0.709 | 0.062 | 0.689
CNN | 0.304 | 0.822 | 0.058 | 0.790
LSTM-CNN | 0.221 | 0.621 | 0.042 | 0.635
Table 8. Comparison of LSTM, GRU, and VMD-ISSA-GRU using the DKASC dataset [33]. Results are grouped by weather condition: relatively stable, obviously fluctuating, and violently fluctuating.
Models | Stable RMSE (kW) | Stable MAE (kW) | Stable $R^2_{adj}$ | Fluctuating RMSE (kW) | Fluctuating MAE (kW) | Fluctuating $R^2_{adj}$ | Violent RMSE (kW) | Violent MAE (kW) | Violent $R^2_{adj}$
LSTM | 4.9669 | 3.8476 | 0.9924 | 9.2025 | 7.6064 | 0.9725 | 17.4440 | 11.9388 | 0.8513
GRU | 4.9791 | 3.9001 | 0.9923 | 10.5367 | 8.3149 | 0.9640 | 18.8791 | 12.7621 | 0.8258
VMD-ISSA-GRU | 0.6898 | 0.5409 | 0.9999 | 1.9933 | 1.4711 | 0.9987 | 3.7858 | 2.8565 | 0.9930
Table 9. Comparison of DeepAR and traditional baseline models using the Solar-7 dataset [14].
 | DeepAR | Naïve | Seasonal Naïve Predictor | Constant Predictor
36 h forecasting horizon
Best sample110150
Average sample | 1400 | 4289 | 1119 | 3078
Worst sample | 4424 | 9989 | 5716 | 8216
Average error | 1178 | 3781 | 2148 | 4091
1 h forecasting horizon
Best sample | 0 | 0 | 0 | 0
Average sample | 5 | 0 | 0 | 0
Worst sample | 169 | 194 | 163 | 194
Average error | 22 | 19 | 20 | 19
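For context, the baseline predictors compared against DeepAR in Table 9 can be stated in a few lines of code. The sketch below reflects a common interpretation (last observed value for the naïve predictor, the value from one day earlier for the seasonal naïve predictor, and a fixed value such as the historical mean for the constant predictor); the original study may define them somewhat differently.

```python
import numpy as np

def naive_forecast(history: np.ndarray, horizon: int) -> np.ndarray:
    """Repeat the last observed value over the forecast horizon."""
    return np.full(horizon, history[-1])

def seasonal_naive_forecast(history: np.ndarray, horizon: int, season: int) -> np.ndarray:
    """Copy the values observed one season (e.g., one day) earlier."""
    last_season = history[-season:]
    reps = int(np.ceil(horizon / season))
    return np.tile(last_season, reps)[:horizon]

def constant_forecast(history: np.ndarray, horizon: int) -> np.ndarray:
    """Predict a fixed value; here the historical mean (an assumption)."""
    return np.full(horizon, history.mean())

# Example with hourly data and a daily season of 24 steps.
history = np.random.rand(24 * 30)       # 30 days of dummy hourly PV power
print(seasonal_naive_forecast(history, horizon=36, season=24).shape)
```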
Table 10. The comparison results on the Solar-2 dataset [42].
Models | Metric | Horizon 3 | Horizon 6 | Horizon 12 | Horizon 24
RNN-GRU | RSE | 0.1932 | 0.2628 | 0.4163 | 0.4852
RNN-GRU | CORR | 0.9823 | 0.9675 | 0.9150 | 0.8823
LSTNet | RSE | 0.1916 | 0.2475 | 0.3449 | 0.4521
LSTNet | CORR | 0.9820 | 0.9698 | 0.9394 | 0.8911
MTNet | RSE | 0.1847 | 0.2398 | 0.3251 | 0.4285
MTNet | CORR | 0.9840 | 0.9723 | 0.9462 | 0.9013
Table 11. The comparison results on the Solar-2 dataset [46].
Models | Metric | Horizon 3 | Horizon 6 | Horizon 12 | Horizon 24
TCN | RSE | 0.1940 | 0.2581 | 0.3512 | 0.4732
TCN | CORR | 0.9835 | 0.9602 | 0.9321 | 0.8812
LSTNet | RSE | 0.1843 | 0.2559 | 0.3254 | 0.4643
LSTNet | CORR | 0.9843 | 0.9690 | 0.9467 | 0.8870
SCINet | RSE | 0.1775 | 0.2301 | 0.2997 | 0.4081
SCINet | CORR | 0.9853 | 0.9739 | 0.9550 | 0.9112
Table 12. Unified hyperparameter results for Solar-1 dataset [49].
Methods | Metric | Forecasting length 96 | 192 | 336 | 720 | Avg
MICN | MSE | 0.257 | 0.278 | 0.298 | 0.299 | 0.283
MICN | MAE | 0.325 | 0.354 | 0.375 | 0.379 | 0.358
TimesNet | MSE | 0.373 | 0.397 | 0.420 | 0.420 | 0.430
TimesNet | MAE | 0.358 | 0.376 | 0.380 | 0.381 | 0.374
Table 13. Accuracy and lightness comparison for single-step CTS forecasting on the Solar-2 dataset [57].
Methods | Metric | Horizon 3 | Horizon 6 | Horizon 12 | Horizon 24 | FLOPs (M) | Params (K) | Latency (s) | Peak Mem (MB)
DSANet | RRSE | 0.1822 | 0.2450 | 0.3287 | 0.4389 | 914 | 6377 | 0.8 | 32.5
DSANet | CORR | 0.9842 | 0.9701 | 0.9444 | 0.8943
MTGNN | RRSE | 0.1778 | 0.2348 | 0.3109 | 0.4270 | 1090 | 348 | 0.5 | 9.9
MTGNN | CORR | 0.9852 | 0.9726 | 0.9509 | 0.9031
LightCTS | RRSE | 0.1714 | 0.2202 | 0.2955 | 0.4129 | 169 | 38 | 0.2 | 8.6
LightCTS | CORR | 0.9864 | 0.9765 | 0.9568 | 0.9084
Table 14. The comparison results on the Solar-5 dataset [17].
Metric | PV s432 GNN | PV s432 LinReg | PV s499 GNN | PV s499 LinReg | PV s353 GNN | PV s353 LinReg | PV s192 GNN | PV s192 LinReg
RMSE | 85.99 | 112.74 | 89.14 | 146.63 | 97.25 | 105.77 | 91.08 | 132.83
MAE | 57.59 | 78.04 | 65.93 | 104.82 | 69.14 | 75.55 | 72.54 | 98.54
Table 15. The comparison results on the Solar-1 dataset [60].
Models | Metric | Horizon 3 | Horizon 6 | Horizon 12 | Horizon 24
RNN-GRU | RSE | 0.1932 | 0.2628 | 0.4163 | 0.4852
RNN-GRU | CORR | 0.9823 | 0.9675 | 0.9150 | 0.8823
LSTNet | RSE | 0.1843 | 0.2559 | 0.3254 | 0.4643
LSTNet | CORR | 0.9843 | 0.9690 | 0.9467 | 0.8870
MTGNN | RSE | 0.1778 | 0.2348 | 0.3109 | 0.4270
MTGNN | CORR | 0.9852 | 0.9726 | 0.9509 | 0.9031
TPGNN | RSE | 0.1850 | 0.2412 | 0.3059 | 0.3498
TPGNN | CORR | 0.9840 | 0.9716 | 0.9529 | 0.9710
Table 16. The comparison results on the DKASC dataset [62].
Models | MAE (kW) | RMSE (kW) | $\delta_{max}$ (kW) | $R^2$
LSTM | 0.36 | 0.60 | 2.48 | 0.90
GRU | 0.39 | 0.67 | 2.63 | 0.85
StemGNN | 0.35 | 0.59 | 2.92 | 0.88
Table 17. The comparison results on the Solar-1 dataset [61].
Models | MAE | RMSE | MAPE (%)
N-BEATS | 0.09 | 0.15 | 23.53
LSTNet | 0.07 | 0.19 | 19.13
TCN | 0.06 | 0.06 | 21.1
DeepGLO | 0.09 | 0.14 | 21.6
StemGNN | 0.03 | 0.07 | 11.55
Table 18. The comparison results on the Solar-3 dataset [18].
Models | MAE | RMSE | MAPE (%)
LSTNet | 0.148 | 0.200 | 132.95
TCN | 0.176 | 0.222 | 142.23
DeepGLO | 0.178 | 0.400 | 346.78
StemGNN | 0.176 | 0.222 | 128.39
MTGNN | 0.151 | 0.207 | 507.91
FourierGNN | 0.120 | 0.162 | 116.48
Table 19. The comparison results on the Solar-4 dataset [19].
Forecasting Length | Metric | STGCN-Classified | STGCN-Unclassified | GCN | LSTM
15 min | RMSE | 2.242% | 2.297% | 2.921% | 3.323%
15 min | MAE | 1.282% | 1.909% | 1.645% | 2.146%
1 h | RMSE | 3.457% | 3.709% | 4.226% | 4.261%
1 h | MAE | 2.141% | 2.246% | 2.410% | 2.573%
2 h | RMSE | 4.870% | 5.028% | 5.497% | 5.309%
2 h | MAE | 2.712% | 2.986% | 3.071% | 3.031%
3 h | RMSE | 6.143% | 6.468% | 6.978% | 6.676%
3 h | MAE | 3.467% | 3.702% | 3.758% | 3.991%
4 h | RMSE | 7.342% | 7.402% | 8.395% | 8.535%
4 h | MAE | 4.075% | 4.342% | 4.427% | 5.291%