1. Introduction
The global pursuit of sustainable development, driven by the urgent need to mitigate climate change and transition away from fossil fuels, has positioned renewable energy at the center of the world’s energy strategy. Among these sources, photovoltaic (PV) systems are pivotal, experiencing exponential growth [1,2]. Highlighting this trend, the International Energy Agency reports that global solar PV capacity additions reached a record 425 GW in 2023 alone, an 80% increase from the previous year, underscoring the accelerating pace of deployment [3]. This rapid expansion is fundamental to achieving sustainability goals, but it also introduces significant challenges to the stability and reliability of existing power grids. Consequently, the ability to accurately forecast PV power generation has become a critical enabling technology, essential for ensuring the seamless and sustainable integration of solar energy into the broader energy system [4,5].
The importance of accurately forecasting PV generation extends beyond simple output estimates. Precise predictions underpin numerous grid operations, including economic dispatch, ancillary service scheduling, real-time balancing, and peak demand management [6]. In liberalized electricity markets, forecast accuracy directly impacts financial outcomes by minimizing imbalance costs and enabling optimized trading strategies [7]. Furthermore, as grid penetration of solar energy increases, the need for sophisticated forecasting becomes more acute to mitigate grid instability and ensure a reliable power supply [8].
However, PV power forecasting presents significant technical challenges that distinguish it from conventional load forecasting. PV power output exhibits high variability and intermittency, primarily due to fluctuating meteorological conditions such as solar irradiance, cloud cover, temperature, and atmospheric turbidity [9,10]. The non-linear relationship between weather variables and power output, combined with the stochastic nature of weather patterns, creates a complex forecasting environment [11]. Additionally, PV power data exhibits multi-scale temporal dependencies, including diurnal cycles, seasonal variations, and weather-induced short-term fluctuations, which traditional forecasting methods struggle to capture adequately [12].
The development of PV forecasting methods has progressed through several distinct stages. Early approaches relied primarily on physical models that utilize numerical weather prediction (NWP) and satellite imagery to estimate solar irradiance and subsequent power output [13]. While these methods provide valuable meteorological insights, they are computationally intensive, require extensive domain expertise, and often exhibit limited accuracy for short-term forecasting horizons [14]. Simultaneously, statistical approaches such as autoregressive integrated moving average (ARIMA), seasonal decomposition methods, and exponential smoothing gained popularity for their mathematical rigor and interpretability [15]. However, these linear models fundamentally struggle with the non-linear dynamics inherent in PV power generation, limiting their effectiveness in capturing complex weather–power relationships [16].
The emergence of machine learning marked a paradigmatic shift in forecasting methodology. Support vector machines (SVMs) and random forests demonstrated improved capability in handling non-linear relationships and multivariate inputs [17]. Neural networks, particularly multilayer perceptrons (MLPs), showed promise in capturing complex patterns but were often constrained by their shallow architectures and a susceptibility to overfitting on complex time series data [18]. The advent of ensemble methods, combining multiple algorithms to leverage their complementary strengths, further advanced the field by improving robustness and reducing prediction variance.
The deep learning revolution has fundamentally changed time series forecasting across many fields. Recurrent neural networks (RNNs) and their sophisticated variants, long short-term memory (LSTM) networks and gated recurrent units (GRUs), became the de facto standard for sequential data modeling due to their ability to capture long-term dependencies and temporal patterns [19]. In PV forecasting specifically, LSTM networks have demonstrated remarkable success in modeling complex temporal dynamics, handling multivariate inputs, and adapting to varying weather conditions [20]. Bidirectional LSTMs and stacked architectures have further enhanced performance by processing sequences in both directions and learning hierarchical representations [21]. More recently, Wang et al. proposed a GA-AMODE-BiLSTM model that integrates a genetic algorithm and adaptive multi-objective differential evolution for BiLSTM hyperparameter optimization, achieving superior stability and generalization in short-term PV power forecasting [22].
Concurrent with recurrent architectures, temporal convolutional networks (TCNs) emerged as a compelling alternative, leveraging dilated convolutions to extract local and global features from sequential data efficiently [23]. TCNs offer advantages in parallel processing and gradient flow stability, making them attractive for real-time forecasting applications. Building on convolutional principles, SCINet introduced a novel architecture that uses sample convolution and interaction to explicitly model downsampled sub-sequences, enhancing its ability to capture features at multiple temporal resolutions [24]. Hybrid architectures combining CNNs and RNNs have also shown promise in capturing both spatial and temporal patterns in multivariate time series.
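To make the dilated-convolution idea concrete, the following is a minimal, illustrative NumPy sketch of a single causal dilated convolution, the basic building block of a TCN. The function name, weights, and left-zero-padding scheme are our assumptions for illustration, not taken from any cited implementation:

```python
import numpy as np

def causal_dilated_conv(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """One causal dilated convolution: output[t] depends only on
    x[t], x[t - d], x[t - 2d], ... (the past), never on future values."""
    k = len(w)
    pad = (k - 1) * dilation
    # Left-pad with zeros so the output keeps the input length and stays causal.
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])

x = np.arange(8, dtype=float)
# With kernel [1, 1] and dilation 2, each output sums the current value
# and the value two steps back.
y = causal_dilated_conv(x, w=np.array([1.0, 1.0]), dilation=2)
```

Stacking such layers with exponentially growing dilations (1, 2, 4, ...) is what gives a TCN its large receptive field with few parameters.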
The introduction of the transformer architecture marked another watershed moment in sequence modeling [25]. The self-attention mechanism enables models to directly capture long-range dependencies without the sequential processing limitations of RNNs. Early applications to time series forecasting, such as the temporal fusion transformer, demonstrated competitive performance [26]. Subsequent innovations like Informer addressed the computational complexity of standard attention for long sequences by introducing sparse attention mechanisms, though its focus on long-range global dependencies comes at the cost of structured temporal modeling, making it less effective for signals with strong periodicities [27]. Autoformer further advanced the field by embedding a decomposition block within its transformer layers, using a moving average to perform a residual separation of trend and seasonal components [28]. However, this reliance on a static, predefined function such as a moving average limits its adaptability to complex, non-stationary data. Moreover, the repetitive decomposition at each layer risks compressing and losing important signal details. These pioneering models laid the groundwork for decomposition-based forecasting but highlighted the need for more flexible and adaptive mechanisms.
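The moving-average residual decomposition described above can be illustrated with a short NumPy sketch. The kernel size and edge-replication padding here are illustrative assumptions, not the exact settings of any cited model:

```python
import numpy as np

def moving_average_decompose(x: np.ndarray, kernel: int = 25):
    """Residual decomposition via moving average: the smoothed series is
    taken as the trend, and the remainder as the seasonal component."""
    pad = kernel // 2
    # Replicate the edge values so the smoothed trend keeps the input length.
    padded = np.concatenate([np.full(pad, x[0]), x, np.full(pad, x[-1])])
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")
    seasonal = x - trend  # residual carries the seasonal/high-frequency content
    return trend, seasonal

# Toy example: a 24-step periodic signal riding on a linear trend
t = np.arange(200)
x = 0.05 * t + np.sin(2 * np.pi * t / 24)
trend, seasonal = moving_average_decompose(x, kernel=25)
```

Because the kernel is fixed, such a filter cannot adapt to regime changes in the data, which is exactly the rigidity criticized above.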
Recent developments have seen the emergence of specialized transformer variants optimized for forecasting. The iTransformer represents a significant innovation by inverting the traditional transformer approach, treating individual time series variables as tokens rather than time steps [29]. This inversion enables better capture of multivariate correlations and has shown remarkable performance across diverse forecasting benchmarks. Hybrid approaches combining iTransformer with other architectures, such as the iTransformer_LSTM_CA_KAN model, have demonstrated enhanced capabilities by integrating cross-attention mechanisms and Kolmogorov–Arnold networks (KANs) for improved temporal and covariate interaction modeling [30].
The paradigm of patch-based methods represents the latest frontier in time series forecasting. Inspired by computer vision’s success with patch-based processing, these approaches segment time series into patches and treat them as tokens for transformer processing. PatchTST pioneered this concept, demonstrating that patch-based tokenization can significantly improve forecasting performance while reducing computational complexity [31]. The approach excels at capturing local temporal patterns efficiently and has shown particular promise for long-term forecasting scenarios. Building upon this foundation, the xPatch framework introduced a sophisticated dual-stream architecture that processes seasonal and trend components separately after statistical decomposition [32]. This decomposition-based approach aligns with the fundamental principle that time series often comprise multiple underlying components with distinct characteristics.
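Patch-based tokenization itself is simple to illustrate. The sketch below splits a univariate series into overlapping patches in the spirit of PatchTST; the `patch_len` and `stride` values are illustrative, not the settings used in this paper:

```python
import numpy as np

def patchify(series: np.ndarray, patch_len: int = 16, stride: int = 8) -> np.ndarray:
    """Split a univariate series into (possibly overlapping) patches that a
    transformer can treat as tokens instead of individual time steps."""
    n_patches = (len(series) - patch_len) // stride + 1
    return np.stack([series[i * stride : i * stride + patch_len]
                     for i in range(n_patches)])

x = np.arange(96, dtype=float)            # e.g. a 96-step look-back window
tokens = patchify(x, patch_len=16, stride=8)
# 11 patch tokens of length 16 instead of 96 step tokens,
# shrinking the attention map the transformer must compute
```

Each patch token summarizes a local temporal neighborhood, which is why the method captures local patterns well while cutting attention cost.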
Despite these advances, effectively separating the complex, non-stationary components inherent in PV power time series remains a significant challenge for existing methods. Many decomposition-based models still rely on traditional statistical filters (e.g., moving averages) or shallow network structures, which often struggle to adapt to the diverse and dynamic data-generating processes underlying PV power [33]. As recent reviews on deep learning for time series forecasting have highlighted, while decomposition is a powerful paradigm, its effectiveness is contingent on the quality and adaptability of the separation method itself [34]. An inflexible decomposition can lead to information leakage between components, where the trend retains high-frequency noise or the seasonal part contains residual trend patterns, ultimately limiting the performance of specialized downstream predictors [35]. Furthermore, the common practice of using a fixed patch size in patch-based transformers limits the model’s ability to adapt to the multi-scale temporal dynamics of PV power, where patterns of interest can manifest across various time scales simultaneously. This rigidity prevents the model from dynamically focusing on short-term fluctuations during volatile periods or long-term trends during stable conditions.
To address these fundamental limitations, the main contributions of this paper are summarized as follows:
An enhanced xPatch framework is proposed, featuring two key innovations. First, a neural network-based decomposition module (NNDecomp) replaces traditional statistical methods, allowing for a more adaptive and data-driven separation of trend and seasonal components. Second, an adaptive patching mechanism dynamically processes the time series at multiple temporal scales and fuses them with an attention mechanism, overcoming the limitations of fixed patch sizes.
The proposed NNDecomp-AdaptivePatch-xPatch model, integrating these data-driven and adaptive components, demonstrates state-of-the-art performance in short-term PV power forecasting. By effectively capturing the complex, non-stationary characteristics of PV power, the model achieves high accuracy, with an R2 value exceeding 0.98 for 1-h-ahead forecasts on the test data.
The superiority of the proposed model is validated through extensive experiments on five real-world datasets from two different countries (Australia and China), demonstrating its generalizability across diverse geographical and climatic conditions. Performance is benchmarked against a wide range of models, from classic LSTMs to modern transformers, using MAE, RMSE, R2, and MBE as evaluation metrics. Furthermore, an ablation study is conducted to systematically verify the individual contributions of the NNDecomp and adaptive patching modules.
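For reference, the four evaluation metrics named above can be computed as in the following NumPy sketch (function name and toy data are ours, purely for illustration):

```python
import numpy as np

def forecast_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MAE, RMSE, coefficient of determination (R2), and mean bias error (MBE)."""
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    mbe = np.mean(err)  # signed bias: positive means systematic over-forecasting
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MBE": mbe}

y_true = np.array([0.0, 1.0, 2.0, 3.0])
y_pred = np.array([0.1, 0.9, 2.1, 3.1])
m = forecast_metrics(y_true, y_pred)
```

MAE and RMSE quantify error magnitude (RMSE penalizing large errors more), R2 measures explained variance, and MBE exposes systematic over- or under-forecasting that the absolute metrics hide.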
The remainder of this paper is organized as follows. Section 2 describes our comprehensive data preprocessing methodology. Section 3 presents the detailed architecture of our proposed framework. Section 4 outlines the case study, experimental setup, and comprehensive results analysis. Finally, Section 5 concludes the paper.