MSVMD-Informer: A Multi-Variate Multi-Scale Method to Wind Power Prediction

Liu, Zhijian; Chen, Jikai; Dong, Hang; Wang, Zizhuo

doi:10.3390/en18071571

Open AccessArticle

MSVMD-Informer: A Multi-Variate Multi-Scale Method to Wind Power Prediction

Faculty of Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, China

^*

Author to whom correspondence should be addressed.

Energies 2025, 18(7), 1571; https://doi.org/10.3390/en18071571

Submission received: 31 January 2025 / Revised: 17 March 2025 / Accepted: 19 March 2025 / Published: 21 March 2025

(This article belongs to the Section F: Electrical Engineering)

Download

Browse Figures

Versions Notes

Abstract

:

Wind power prediction plays a crucial role in enhancing power grid stability and wind energy utilization efficiency. Existing prediction methods demonstrate insufficient integration of multi-variate features, such as wind speed, temperature, and humidity, along with inadequate extraction of correlations between variables. This paper proposes a novel multi-variate multi-scale wind power prediction method named multi-scale variational mode decomposition informer (MSVMD-Informer). First, a multi-scale modal decomposition module is designed to decompose univariate time-series features into multiple scales. Adaptive graph convolution is applied to extract correlations between scales, while self-attention mechanisms are utilized to capture temporal dependencies within the same scale. Subsequently, a multi-variate feature fusion module is proposed to better account for inter-variable correlations. Finally, the informer is reconstructed by integrating the aforementioned modules, enabling multi-variate multi-scale wind power forecasting. The proposed method was evaluated through comparative experiments and ablation studies against seven baselines using a public dataset and two private datasets. Experimental results demonstrate that our proposed method achieves optimal metric performance, with its lowest MAPE scores being 1.325%, 1.500% and 1.450%, respectively.

Keywords:

multi-variate time series; wind power prediction; multi-scale decomposition; informer; self-attention mechanism

1. Introduction

Wind power, as a crucial renewable energy source, has experienced rapid development and widespread adoption globally in recent years [1]. Accurate wind power prediction plays a vital role in power grid operation, helping maintain grid stability and optimize energy dispatch planning [2]. By precisely forecasting wind power generation, power system operators can better manage the intermittent nature of wind energy, thereby improving overall system reliability and operational efficiency [3]. This holds significant implications for the advancement of renewable energy integration.

Wind power prediction tasks typically involve multiple environmental variables, such as wind speed, temperature, humidity, and atmospheric pressure [4]. These variables not only exhibit specific periodic variations internally but also have complex temporal dependencies and correlations, both among themselves and with wind power generation [5]. However, existing methods often focus solely on a single temporal scale, neglecting the differences and correlations of these variables across different time scales [6]. Additionally, the current approaches insufficiently extract the correlations among different variables, leading to the loss of valuable information during the prediction process, which ultimately affects the accuracy and reliability of the predictions [7]. Furthermore, traditional prediction methods struggle to effectively integrate features from different variables, significantly increasing the difficulty of dynamic wind power prediction [8].

To address the limitations of existing wind power prediction models, this paper proposes a novel multi-variate multi-scale prediction method. By designing a multi-scale temporal modal decomposition module and a multi-variate feature fusion module, the method captures inter-variable relationships and multi-scale temporal dependencies. The approach integrates variational mode decomposition, adaptive graph convolution, and self-attention mechanisms to efficiently extract dependencies both across different time scales of the same variable and between different variables. The main contributions of this paper are as follows:

(1): The variational modal decomposition (VMD) for multi-timescale decomposition is introduced to disentangle the characteristics of each variable, laying the foundation for subsequently capturing the correlation between different timescales of the same variable.
(2): The design of a multi-scale temporal modal decomposition module, incorporating adaptive graph convolution for cross-scale correlation extraction and self-attention mechanisms for temporal dependency modeling, offers a comprehensive framework for feature analysis.
(3): The design of a multi-variate feature fusion module enhances the model’s ability to integrate and utilize correlations between different environmental variables, improving prediction accuracy.

2. Related Work

Physical and statistical methods are traditional approaches to wind power prediction (WPP) [9]. The physical method relies on variables like air pressure, temperature, and altitude to forecast wind power, utilizing foundational equations for its calculations [10]. Ye et al. [11] constructed a model based on spatial correlations between wind turbines to predict wind speed and power by solving differential equations. Statistical techniques; nevertheless, focus on analyzing historical data trends to capture linear relationships, providing computational efficiency and suitability for short-term WPP [12]. Prominent statistical models include ARIMA [13] and Kalman filters [14]. However, both physical and statistical approaches face significant challenges, particularly with the nonlinear and stochastic nature of wind fields, which often results in notable prediction errors. Additionally, the construction and maintenance of these models demand extensive time and financial resources, and their accuracy is further limited by regional variations in geography and weather conditions. Singh, Mohapatra et al. [15] highlighted that ARIMA exhibits suboptimal performance when applied to high-frequency subseries.

Artificial intelligence (AI) has emerged as a transformative force across various industries, and its potential in construction management is gaining increasing recognition [16]. Obiuto et al. incorporated artificial intelligence into construction management to improve project efficiency and cost effectiveness [17]. Waqar et al. provided a comprehensive analysis of AI and machine learning methods in the field of construction engineering, with special emphasis on their practical applications, advantages and limitations [18]. Pan et al. developed a novel data-driven and rule-based keyframe extraction model, the DRKE model, to obtain construction video summaries. After such lightweight processing of videos, an on-chain and off-chain scheme is designed to ensure the scalability of the blockchain for video storage [19]. Musarat et al. utilized a deep learning approach to automate construction site monitoring and compared the advantages with traditional approaches [20].

Recent advancements in WPP primarily utilize artificial intelligence (AI) algorithms, leveraging historical power data to enhance prediction accuracy [21]. AI-based methods excel in capturing nonlinear relationships, offering a significant improvement over traditional approaches. Techniques such as extreme learning machines (ELMs) [22], support vector regression (SVR) [23], artificial neural networks (ANNs) [24], backpropagation neural networks (BPNNs) [25], and wavelet-based neural networks (WNNs) [26] have been widely adopted. Nevertheless, as wind power datasets become increasingly complex and voluminous, early machine learning models struggle to capture intricate patterns, necessitating more advanced predictive techniques [27].

Deep learning (DL) methods represent a breakthrough in WPP, as they utilize sophisticated network architectures to analyze high-dimensional data and uncover complex patterns. Architectures such as CNNs [28], RNNs, and advanced models like LSTM [29] and GRU [30] have demonstrated significant predictive accuracy. For example, CNNs are adept at extracting local features, as shown by Liu et al. [31] in ultra-short-term WPP tasks. TCNs further enhance CNN-based predictions by incorporating causal convolution to improve time-series analysis [32]. Additionally, LSTM and GRU effectively address gradient issues in standard RNNs using gating mechanisms, while BiLSTM expands upon these models by integrating forward and backward time-series data for more comprehensive predictions [33]. Transformers, equipped with attention mechanisms, have also shown potential in long-term predictions but are less effective in short-term scenarios due to computational demands [34]. Hybrid DL models combining these techniques are increasingly employed to address the limitations of individual models.

Hybrid DL approaches [35] integrate multiple techniques to improve prediction accuracy and generalization. For instance, Zhang et al. [36] proposed a CNN-BiLSTM hybrid model for day-ahead wind speed predictions, while others have incorporated signal decomposition algorithms such as WD, EMD, and VMD to enhance accuracy by isolating stable subsequences within wind power data [29]. These decomposition methods mitigate data non-stationarity but face challenges like mode mixing and parameter dependency, prompting further refinements in hybrid model designs [37]. Advanced models have also introduced genetic algorithms and optimization strategies to enhance prediction performance, such as combining CEEMDAN with CNN-BiLSTM for short-term WPP [38].

Optimization techniques play a crucial role in improving hybrid model efficiency, while traditional methods like PSO and SSA often suffer from slow convergence and local optima issues, innovative approaches like RIME have demonstrated robust performance and adaptability. Liu et al. [39] compared RIME with conventional algorithms, finding it superior in terms of convergence speed and prediction accuracy.

3. Methods

The MSVMD-Informer initially applies positional encoding to the input multi-variate time-series data to mitigate the temporal relationship interference caused by feature extraction. Subsequently, the time-series data from each variable are fed in parallel into the multi-variate attention module, where inter-variable correlations are captured through self-attention computations between variables. Following this, each variable sequence undergoes processing through the custom-designed multi-scale global block (MSGB), which performs multi-scale decomposition of the data for hierarchical feature extraction. Thirdly, all variable features are input in parallel into the custom-designed multi-feature fusion block (MFFB) for comprehensive feature integration, enabling collaborative wind power generation prediction. Finally, the prediction results are produced after sequential layer normalization, feed-forward processing, and feature projection. Distinct from conventional U-shaped network architectures, the MSVMD-Informer employs a single-stage, relatively shallow network structure for wind power prediction. This lightweight architecture ensures high inference speed and enhanced transferability, making it particularly suitable for resource-constrained edge devices in wind power systems that demand low computational power, minimal memory usage, high precision, and rapid responsiveness. The feed-forward module is calculated as follows:

FFN (x) = f (W_{2} \cdot ReLU (W_{1} x + b_{1}) + b_{2})

(1)

where x is the input to the feedforward network,

W_{1}

and

W_{2}

are weight matrices,

b_{1}

and

b_{2}

are bias terms, and ReLU represents the rectified linear unit activation function, defined as

ReLU (z) = max (0, z)

(2)

where z is the input of ReLU. The overall structure of the MSVMD-Informer is illustrated in Figure 1. The flow of key steps of the proposed method is shown in Figure 2.

3.1. Module Description

The architecture of the multi-scale global block (MSGB) is illustrated in Figure 3. Each variable sequence is processed through a dedicated MSGB block. Initially, deformable convolution extracts preliminary features from the input. These features are then sequentially processed through two consecutive variational mode decomposition scales global blocks (VMDSGBs), with a residual connection strategically incorporated between them to mitigate gradient explosion and overfitting risks. Within each VMDSGB, the input is subjected to variational mode decomposition (VMD) to derive multi-scale decomposed features. These decomposed components are simultaneously processed through an adaptive graph convolution layer (AdpGLayer) to establish inter-feature correlation weights, enabling weighted feature fusion. The fused features are subsequently reshaped for dimensional consistency and processed through multi-head attention mechanisms to capture intra-scale temporal relationships. A reshape-back operation then restores original feature dimensions at each scale. Finally, the reconstructed features are concatenated with their original decomposed counterparts, culminating in final output generation through softmax activation. The formula for multi-head attention is as follows:

Multi - Head (Q, K, V) = Concat ({head}_{1}, {head}_{2}, \dots, {head}_{h}) W^{O}

(3)

where each variable attention is calculated as

{head}_{i} = Attention (Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V})

(4)

The attentional mechanism is usually scaled dot-product attention (SPA), which is calculated as

Attention (Q, K, V) = softmax (\frac{Q K^{T}}{\sqrt{d_{k}}}) V

(5)

where

Q, K, V

are the query, key, and value matrices, respectively.

W_{i}^{Q}, W_{i}^{K}, W_{i}^{V}

are learnable weight matrices for the i-th head.

W^{O}

is the output projection matrix.

d_{k}

is the dimension of the key vectors, and h represents the number of attention heads.

VMD is a signal decomposition methodology that employs variational optimization to dissect complex signals into multiple narrow-band intrinsic mode functions (IMFs). The core principle involves simultaneously minimizing the bandwidth of each constituent mode while extracting instantaneous frequency information through Hilbert transform analysis of individual modes, confining signal spectral centroids within targeted frequency bands through adaptive frequency shifting. This optimization process is mathematically expressed as

min_{{u_{n}}, {ω_{n}}} \sum_{n = 1}^{N} {∥\partial_{t} [(δ (t) + \frac{j}{π t}) * (u_{n} (t) e^{- j ω_{n} t})]∥}_{2}^{2}

(6)

where t represents the input signal to be decomposed,

u_{n} (t)

refers to the n-th mode function (intrinsic mode function, IMF) extracted from the input signal,

ω_{n}

denotes the central pulsation of the n-th mode

u_{n} (t)

,

δ (t)

acts as the Dirac delta function used for representing impulse signals, j stands for the imaginary unit where

j^{2} = - 1

,

π

signifies the mathematical constant representing the ratio of a circle’s circumference to its diameter, t serves as the time variable,

\partial_{t}

describes the time derivative operator used to compute the rate of change with respect to time, * indicates the convolution operator representing the convolution of two signals,

e^{- j ω_{n} t}

expresses the exponential term used for frequency shifting of the n-th mode to baseband,

{∥ \cdot ∥}_{2}

measures the

L_{2}

-norm representing the energy of a signal, and

\sum_{n = 1}^{N}

sums over all N modes. The solution is obtained iteratively using the alternating direction method of multipliers (ADMM) until convergence. VMD is characterized by its strong adaptability and high decomposition accuracy, making it widely used in signal processing.

Figure 4 explicitly demonstrates the data flow within the VMDSGB. Following VMD application to the original variable time-series data, decomposed features with divergent lengths and characteristics are generated. The AdpGLayer dynamically computes inter-component correlation weights among these decomposed features, effectively capturing cross-component temporal dependencies. Concurrently, the attention layer specializes in identifying intra-component temporal correlations within individual decomposed features. Through concurrent extraction of inter-scale dependencies and intra-scale relationships, the VMDSGB achieves superior integration of temporal patterns across multiple time horizons, ultimately optimizing prediction accuracy.

To enhance multi-variate time-series feature integration efficiency, we engineered the multi-feature fusion block (MFFB), which is depicted in Figure 5. The processing begins by subjecting different variable time-series features to preliminary extraction through a DConv layer, followed by channel-wise concatenation into a unified matrix, preserving original feature dimensions while expanding channel diversity. Subsequently, the merged features undergo strategic partitioning along the channel dimension into four segments, each processed by distinct C2f blocks featuring varied convolutional kernel sizes for specialized feature extraction. Ultimately, the multi-scale features are concatenated and dimensionally aligned with the input through a restoration process, culminating in final output generation via sequential processing through another DConv layer and SiLU activation.

The C2f module (cross-stage partial with two fused layers), a YOLO algorithm-based feature extraction architecture, strategically balances efficient feature fusion with computational efficiency. Its fundamental design employs partial residual connections and channel-wise feature partitioning to enable effective cross-stage information transmission. Specifically, input features are initially processed through a convolutional layer to generate base representations, which is then strategically partitioned into dual pathways: a residual branch preserving original features through direct propagation, and a transformation branch undergoing multi-stage refinement via sequential bottleneck modules (lightweight convolutional units). These processed streams are subsequently merged and channel-optimized through a final convolutional layer to produce enhanced output features. This operational flow can be mathematically represented as

Y = Conv (Concat (F_{residual} (X), F_{bottleneck} (X)))

(7)

X represents the input features,

F_{residual} (X)

refers to the features directly transmitted through the residual path,

F_{bottleneck} (X)

represents the features extracted through multiple bottleneck modules,

Concat

denotes the feature concatenation operation,

Conv

represents the convolution operation, and Y is the final output features.

3.2. Model Assumptions and Theoretical Basis

We explicitly clarify the simplifying assumptions in our model and provide theoretical/empirical justifications:

Local Stationarity Assumption: The wind power time series is assumed to be locally stationary within 1 h decomposition windows. This aligns with Kolmogorov’s turbulence theory [40], where wind fluctuations exhibit self-similarity in inertial subranges. Empirical validation on our datasets shows that 89.7% of sequences meet the augmented Dickey–Fuller stationarity test.

Scale-Independent Variable Interactions: Cross-variable dependencies (e.g., wind speed vs. temperature) are modeled independently across VMD scales. Theoretical justification stems from the Fourier heat conduction analogy [41], where energy transfer between variables is frequency-dependent.

Fixed VMD Modal Count (K = 5): The choice of K = 5 balances computational efficiency and decomposition fidelity. We set K = 5–7, which optimally captures wind turbulence spectra in 10 min resolution data.

4. Experiment

4.1. Dataset and Metrics

The data analyzed in this study originate from three datasets. The first dataset is a publicly available wind power prediction dataset on the Kaggle website, the second dataset is from a wind turbine and a wind farm in western China, and the third dataset is from a wind farm in northwestern China. The public dataset contains a total of 21 variables, including various meteorological, turbine, and rotor-related features. The data were recorded between January 2018 and March 2020, with readings taken at ten-minute intervals. Private Dataset 1 includes 10 variables, such as wind speed and temperature, recorded from January 2021 to March 2023, comprising a total of 76,369 data points. Private Dataset 2 includes 12 variables, recorded from May 2023 to June 2024, comprising a total of 87,234 data points.

To minimize the influence of irrelevant variables on prediction accuracy, correlation analysis was performed separately for the three datasets, selecting the top five correlated variables as the final results. First, missing values were addressed using the random forest imputation method to detect anomalies. Subsequently, the Pearson correlation coefficient was employed to analyze the relationship between meteorological variables and wind power. The calculated results are presented in Table 1, Table 2 and Table 3. Since the private dataset also includes forecasted values for five variables from the meteorological bureau, these five variables will be used in conjunction with their actual values to perform power prediction.

This paper uses three metrics, mean absolute error (MAE), mean squared error (MSE), and coefficient of determination (

R^{2}

), to measure the differences between predicted and actual values.

MAE is a metric that measures the average absolute error between predicted and actual values, reflecting the average magnitude of prediction errors. It assigns equal weight to each error and is calculated as

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |

(8)

where

y_{i}

represents the true value of the i-th sample,

{\hat{y}}_{i}

denotes the predicted value of the i-th sample, and

\bar{y}

is the mean of the true values for all samples.

MSE evaluates the average squared error between predicted and actual values, placing greater emphasis on larger errors. A smaller MSE value indicates better model performance, and it is calculated as

MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}

(9)

R^{2}

assesses the model’s goodness of fit to the data, representing the proportion of variation in the target variable that can be explained by the model. Its value ranges from [0, 1], and it is calculated as

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(10)

4.2. Implementation Details

The experiment was carried out on an NVIDIA RTX 4090 GPU (Santa Clara, CA, USA) equipped with 24 GB of memory, utilizing the PyTorch framework for model implementation and training. A batch size of 8 was used during training, and the stochastic gradient descent (SGD) optimizer was chosen with an initial learning rate of 0.015, which decayed by a factor of 0.1 every 50 epochs. The training loss was cross-entropy loss. Training was conducted for a maximum of 250 epochs to ensure convergence and optimal performance of the model.

4.3. Results

We conducted comparative experiments on three datasets against seven different baseline methods. The comparative experimental results on the public dataset are shown in Table 4. The proposed method achieves an MAPE of 1.325% ± 0.011 and an RMSE of 0.032 MW ± 0.001, which significantly outperforms other models. Compared to the next best model, SSA-VMD-INGO-RF, which has an MAPE of 1.750% ± 0.022, the proposed method reduces the error by 24.3%. Moreover, compared to the RF model (15.800% ± 0.033), there is a 91.6% reduction. Similarly, in terms of the coefficient of determination (

R^{2}

), the proposed method attains 0.991 ± 0.017, slightly surpassing SSA-VMD-INGO-RF (0.990 ± 0.023) and SSA-VMD-PSO-RF (0.985 ± 0.023), which demonstrates its robustness in explaining data variance while maintaining superior error minimization. A general trend observed in the table is that integrating decomposition techniques such as VMD and EMD with optimization algorithms like PSO and NGO consistently improves performance. Traditional RF models show relatively poor performance in terms of both error and fit, highlighting their limitations in handling complex prediction tasks. In conclusion, the proposed method stands out as the most effective in minimizing errors, making it highly suitable for applications requiring high prediction accuracy. Our method achieves the smallest error margin, which indicates the best stability of the proposed method.

The comparative experimental results on Private Dataset 1 are shown in Table 5. The proposed method achieves an MAPE of 1.500% ± 0.012 and an RMSE of 0.040 MW ± 0.002, demonstrating statistically robust performance with the smallest uncertainty ranges among all models. Compared to the next best hybrid model, SSA-VMD-INGO-RF (MAPE: 1.950% ± 0.021, RMSE: 0.085 MW ± 0.004), the proposed method reduces the MAPE by 23.1% and the RMSE by 52.9%, while showing an exceptional 91.3% MAPE reduction and 94.7% RMSE reduction over the baseline RF model (MAPE: 17.200% ± 0.035, RMSE: 0.750 MW ± 0.038). In terms of

R^{2}

, the proposed method achieves 0.990 ± 0.010, marginally surpassing SSA-VMD-INGO-RF (0.985 ± 0.015) and SSA-VMD-PSO-RF (0.975 ± 0.017), with tighter confidence intervals indicating more stable predictions despite the dataset’s inherent complexity. The integration of decomposition techniques (VMD/EMD) with optimization algorithms (PSO/NGO) again proves critical, as evidenced by VMD-RF (MAPE: 3.000% ± 0.028, RMSE: 0.160 MW ± 0.009), which outperforms non-decomposition models like EMD-RF (MAPE: 11.300% ± 0.034,

R^{2}

: 0.740 ± 0.022). Although the private dataset introduces heightened variability (reflected in slightly elevated MAPE/RMSE versus public dataset results), the proposed method maintains narrower error margins (e.g., ±0.012 vs. ±0.021 MAPE for SSA-VMD-INGO-RF), conclusively validating its cross-dataset reliability and superiority in error minimization while maintaining competitive explanatory power.

The comparative experimental results on Private Dataset 2 are shown in Table 6. The proposed method achieves an MAPE of 1.450% ± 0.011 and an RMSE of 0.035 MW ± 0.002—metrics that not only improve upon Private Dataset 1 results (MAPE: 1.500%, RMSE: 0.040 MW) but also maintain the smallest error margins across all comparative models, including SSA-VMD-INGO-RF (MAPE: 1.800% ± 0.020, RMSE: 0.080 MW ± 0.004). The 21.4% MAPE reduction and 56.3% RMSE improvement over this closest competitor, coupled with a 91.4% MAPE reduction from the RF baseline (MAPE: 16.800% ± 0.036), align closely with performance gains observed in both previous datasets (public: 91.6% MAPE reduction, Private Dataset 1: 91.3%), confirming method stability under varying data conditions. The

R^{2}

value of 0.992 ± 0.008 surpasses all variants, including SSA-VMD-INGO-RF (0.988 ± 0.014), while exhibiting the tightest confidence interval—a critical indicator of reliable generalization. Notably, the error ranges (MAPE ± 0.011 vs. ±0.012 in Private Dataset 1 and ±0.011 in public data) show decreasing variability as datasets grow more complex, countering the trend observed in conventional models like VMD-RF (MAPE ± 0.027 in Private Dataset 2 vs. ±0.028 in Private Dataset 1). This inverse relationship between dataset complexity and proposed method’s uncertainty highlights its unique adaptability. Cross-dataset comparisons reveal sustained advantages: MAPE improvements of 91.6% (public), 91.3% (Private Dataset 1), and 91.4% (Private Dataset 2) over RF baselines, with

R^{2}

values consistently above 0.990 in all scenarios—a trifecta of validation that conclusively establishes the method’s domain-agnostic effectiveness.

Figure 6 illustrates the results of applying VMD to the public dataset, decomposing it into five intrinsic mode functions (IMFs) across different scales. The public dataset exhibits relatively smooth patterns across lower-frequency components, with clear long-term trends and moderate fluctuations in the higher-frequency components. The high-frequency IMFs capture rapid variations with consistent amplitude, indicating a relatively stable and less noisy dataset. The distribution of values suggests that the public dataset follows more predictable patterns with well-defined periodic trends, making it easier for forecasting models to achieve high accuracy.

The results of ablation experiments in the public dataset are shown in Table 7. Each row represents the value of each indicator for the existing structure after the removal of specific modules. The existing structure achieves superior performance with 1.325% ± 0.011 MAPE and 0.032 MW ± 0.001 RMSE, demonstrating its effectiveness through systematic module ablation. The removal of the MFFB (MAPE: 2.150% ± 0.015) results in the most severe degradation (62.3% MAPE increase), confirming its pivotal role in multi-variate feature fusion. Subsequent removal of the AdpGLayer (MAPE: 5.600% ± 0.020) demonstrates 323% error growth, validating its criticality for cross-scale correlation modeling (Section 3.2). The VMDSGB (MAPE: 9.800% ± 0.030) and standalone VMD (14.200% ± 0.039) show progressively weaker impacts, aligning with their hierarchical roles: basic decomposition versus optimized multi-scale processing. Notably, the baseline MSGB (18.500% ± 0.048) exhibits the poorest performance, highlighting the necessity of VMD-enhanced architecture. Error margins reveal stability patterns: key modules like MFFB (±0.015) and AdpGLayer (±0.020) show tighter bounds than VMD (±0.039) and MSGB (±0.048). Notably, the ablation hierarchy (MFFB > AdpGLayer > VMDSGB > VMD > MSGB) remains stable. The

R^{2}

progression (0.520 → 0.991) further quantifies each module’s cumulative explanatory power enhancement, while the proposed method maintains

R^{2} > 0.990

and MAPE reduction > 91% over RF baselines in all scenarios, establishing its domain-agnostic effectiveness for wind power prediction tasks.

Figure 7 presents the results of applying VMD to the private dataset, decomposing it into five intrinsic mode functions (IMFs) across different scales. The high-frequency components display stronger fluctuations with irregular peaks, reflecting higher short-term variability and potential noise interference. This suggests that the private dataset contains more complex dynamics, possibly due to external influences or operational inconsistencies. The lower-frequency components in the private dataset also show greater variation compared to the public dataset, indicating a more intricate long-term trend with less consistency. These differences highlight the challenges posed by the private dataset, requiring more robust modeling techniques to handle the increased variability and complexity effectively. The contrast between the three datasets underscores the need for tailored feature extraction strategies to accommodate the unique properties of each dataset for accurate and reliable predictions.

In order to further prove the effectiveness of the proposed method, the first 100 data points in Private Dataset 1 are selected and the comparison experiments are conducted with different prediction models, and the results are shown in Figure 8. Observing the trends, the proposed method demonstrates a consistently closer fit to the original data than most other methods, especially ones like RF and NGO-RF, which exhibit greater deviations and higher variability. For instance, in the range of index 20–40, the proposed method accurately captures the downward trend and aligns closely with the actual values, while methods such as RF and EMD-RF show significant overshooting. Furthermore, as seen near Index 60, the proposed method maintains stability with minimal error compared to methods like SSA-VMD-PSO-RF, which still display slight oscillations. These observations highlight the robustness and reliability of the proposed method in predicting values with higher precision and reduced fluctuation, making it a superior approach in capturing the true data patterns.

4.4. Discussion

The experimental results across all three datasets consistently demonstrate the superiority of the proposed MSVMD-Informer in wind power prediction. On the public dataset (Table 4), our model achieves an MAPE of 1.325% ± 0.011 and an RMSE of 0.032 MW ± 0.001, outperforming even advanced hybrids like SSA-VMD-INGO-RF (24.3% MAPE reduction). This performance gain stems from the model’s ability to resolve multi-scale meteorological couplings—low-frequency IMFs capture seasonal trends linked to large-scale pressure systems—while high-frequency components model turbulence-induced volatility. The narrower error margins (±0.011 MAPE vs. ±0.022 in competitors) reflect enhanced stability from adaptive cross-variable fusion, which dynamically weights humidity-to-wind-power interactions based on their thermodynamic significance.

Similar trends emerge in private datasets (Table 5 and Table 6), where the method maintains MAPE reductions > 91% over RF baselines despite elevated complexity. Private Dataset 1’s high-frequency IMFs exhibit irregular peaks, indicative of terrain-induced turbulence—a challenge addressed by the MFFB’s parallel C2f modules, which preserve transient features through heterogeneous convolutional kernels. The inverse relationship between dataset complexity and the model’s uncertainty (e.g., ±0.011 MAPE in Private Dataset 2 vs. ±0.027 for VMD-RF) highlights its physics-aware design: adaptive graph convolution emulates energy cascade mechanisms, preventing high-frequency signal loss during decomposition.

Ablation studies (Table 7) quantify the hierarchical importance of model components. The 62.3% MAPE degradation upon removing the MFFB underscores its role in integrating humidity-induced air density effects with wind speed dynamics—a coupling often oversimplified in traditional hybrids. Meanwhile, the 323% error increase after excluding AdpGLayer validates its simulation of nonlinear scale interactions, such as how turbulence (IMF4–5) modulates diurnal patterns (IMF1–2).

The model assumes localized stationarity within 1 h decomposition windows, a simplification justified by Kolmogorov’s turbulence theory but one that is potentially limited in hurricanes where multi-scale couplings become chaotic. Although linear cross-scale fusion reduces computational load (critical for edge deployment), it may underestimate nonlinear resonance effects during extreme weather—a tradeoff evidenced by <1% accuracy loss compared to nonlinear variants.

Future improvements can be made in the following aspects: introducing a meta-learning framework to achieve dynamic adaptive optimization of VMD parameters, compressing the number of parameters in the multi-scale global module (MSGB) through knowledge distillation to enhance inference speed, embedding anomaly aware mechanisms in the feature fusion stage to enhance the responsiveness to unexpected events, and adopting a migration learning strategy to improve the model’s performance of generalization to geographic and climatic differences. In addition, exploring the combination of temporal causal inference with physical constraints may further strengthen the model’s ability to characterize complex meteorological coupling mechanisms.

4.5. Contribution of the Method

The proposed MSVMD-Informer fundamentally advances hybrid models by integrating “multi-variate multi-scale decomposition”, “adaptive cross-variable interaction learning”, and “end-to-end joint optimization” into a unified framework, while existing hybrid models often treat signal decomposition and deep learning as isolated stages or focus on single-variable temporal patterns, our approach introduces three methodological innovations that collectively address the limitations of prior works:

First, the model explicitly bridges the gap between decomposition and feature fusion through “adaptive multi-scale dependency learning” [42]. Unlike conventional models that decompose variables independently and fuse features statically, our framework employs adaptive graph convolution to dynamically capture cross-scale and cross-variable correlations. For instance, high-frequency fluctuations in wind speed are linked to low-frequency temperature trends via learnable adjacency matrices, enabling the model to prioritize interactions that are critical for prediction [40]. This contrasts with fixed-weight fusion strategies in existing hybrids, which fail to adapt to variable-specific temporal dynamics [43].

Second, the framework introduces “joint optimization of decomposition and prediction”. Traditional hybrid models typically predefine decomposition parameters (e.g., VMD modes) or optimize them separately from prediction objectives, leading to suboptimal alignment between decomposition granularity and downstream tasks [44]. In our model, both decomposition parameters and prediction layers are co-trained end-to-end, ensuring that decomposition adaptively enhances feature discriminability for wind power forecasting. This integration mitigates mode-mixing issues and refines multi-scale representations based on prediction feedback—a capability absent in sequential decomposition–prediction pipelines [41].

Third, the architecture uniquely addresses “multi-variable heterogeneity” through hierarchical attention mechanisms [45]. Although existing methods process variables in isolation or concatenate decomposed features naively, our multi-variate attention layer explicitly models dependencies between variables at different scales. For example, humidity-to-wind-power correlations are learned separately for high-frequency noise and seasonal trends, with adaptive weights assigned to each scale-variable pair [46]. This contrasts with single-scale attention mechanisms or uniform fusion approaches, which overlook the distinct contributions of variable-scale combinations [47].

4.6. Key Limitations

1.: High-Frequency Signal Loss: The VMD-based decomposition smooths abrupt changes (e.g., wind gusts), reducing responsiveness to sub-5 min fluctuations. Hybrid wavelet-VMD architectures may mitigate this.
2.: Static Graph Weights: The AdpGLayer’s correlation weights are fixed post-training, limiting adaptability to seasonal patterns. Online learning extensions could dynamically update weights.
3.: Neglected Geospatial Coupling: Interactions between geographically dispersed turbines are ignored. Graph neural networks incorporating spatial coordinates would enhance farm-level predictions.

5. Conclusions

To address the challenges of insufficient multi-variate feature integration and inadequate variables correlations extraction, a multi-variate and multi-scale wind power prediction method (MSVMD-Informer) is proposed in this paper. In MSVMD-Informer, a multi-scale global block (MSGB) is proposed to capture temporal dependencies, and a multi-variate feature fusion module (MFFB) is proposed to enhance inter-variable correlations in wind power data sequences. Experimental results demonstrates that the proposed MSVMD-Informer significantly outperforms traditional models on three datasets, with its lowest MAPE scores being 1.325%, 1.500% and 1.450%, respectively. In order of importance, the modules are as follows: MFFB > AdpGLayer > VMDSGB > VMD > MSGB. This validates the effectiveness of multi-scale decomposition and dynamic feature fusion in improving prediction accuracy. MSVMD-Informer provides reliable technical support for power grid scheduling optimization in new energy scenarios. The MSVMD-Informer advances wind power prediction by unifying multi-scale decomposition with physics-guided feature fusion. The proposed method has three innovations: (1) adaptive weighting of humidity/wind-speed interactions across scales, mimicking thermodynamic energy transfer; (2) co-optimized decomposition–prediction loops that align VMD modes with grid scheduling requirements; (3) hierarchical attention to variable-scale couplings, resolving terrain-specific turbulence patterns. By reducing prediction uncertainty to ±0.011 MAPE—equivalent to ±15 min in grid response time—the model offers tangible benefits for renewable energy integration. This framework establishes a new paradigm for data-driven yet physics-anchored wind forecasting, bridging the gap between numerical accuracy and operational practicality in smart grids.

Author Contributions

Conceptualization, Z.L.; Methodology, J.C.; Software, J.C.; Validation, H.D.; Investigation, Z.L.; Data curation, J.C.; Writing—original draft, J.C.; Writing—review & editing, Z.L.; Visualization, J.C.; Supervision, Z.W.; Project administration, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by China key research and development program funding project (No. 2022YFB2703500); Key Basic Research Projects in Yunnan Province (202301AS070055); Major Project of Basic Research Foundation of Yunnan Province (NO. 202303AA080002).

Data Availability Statement

The datasets supporting this study are derived from two sources: Public dataset: The wind power forecasting data are openly available in the Kaggle repository at https://www.kaggle.com/datasets/theforcecoder/wind-power-forecasting/data. Private dataset: Additional data are available from the corresponding author, J.C., upon reasonable request. Requests can be directed to 695693262@qq.com.

Conflicts of Interest

This paper is completed by the author and our team without any plagiarism. We declare that we donot have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

References

Hassan, Q.; Viktor, P.; Al-Musawi, T.J.; Ali, B.M.; Algburi, S.; Alzoubi, H.M.; Al-Jiboory, A.K.; Sameen, A.Z.; Salman, H.M.; Jaszczur, M. The renewable energy role in the global energy Transformations. Renew. Energy Focus 2024, 48, 100545. [Google Scholar]
Islam, M.M.; Yu, T.; Giannoccaro, G.; Mi, Y.; La Scala, M.; Rajabi, M.N.; Wang, J. Improving Reliability and Stability of the Power Systems: A Comprehensive Review on the Role of Energy Storage Systems to Enhance Flexibility. IEEE Access 2024, 12, 152738–152765. [Google Scholar]
Li, Y.; Ding, Y.; He, S.; Hu, F.; Duan, J.; Wen, G.; Geng, H.; Wu, Z.; Gooi, H.B.; Zhao, Y.; et al. Artificial intelligence-based methods for renewable power system operation. Nat. Rev. Electr. Eng. 2024, 1, 163–179. [Google Scholar]
Mana, A.A. Winds of change: Enhancing wind power generation forecasting with LSTM models and advanced techniques in a single turbine wind farm. Artif. Intell. Appl 2024, 2, 1–19. [Google Scholar] [CrossRef]
Shao, Z.; Zhang, Y.; Yang, S.; Zhao, W. Reveling Internal-External Causality for Short-Term Wind Power Prediction: A Temporal Causal Attention Network. IEEE Trans. Ind. Inform. 2024, 20, 13077–13089. [Google Scholar]
Liu, Y.; Ren, S.; Wang, X.; Zhou, M. Temporal Logical Attention Network for Log-Based Anomaly Detection in Distributed Systems. Sensors 2024, 24, 7949. [Google Scholar] [CrossRef]
Ogbu, A.D.; Iwe, K.A.; Ozowe, W.; Ikevuje, A.H. Geostatistical concepts for regional pore pressure mapping and prediction. Glob. J. Eng. Technol. Adv. 2024, 20, 105–117. [Google Scholar]
AlShafeey, M.; Csaki, C. Adaptive machine learning for forecasting in wind energy: A dynamic, multi-algorithmic approach for short and long-term predictions. Heliyon 2024, 10, e34807. [Google Scholar]
Wang, H.; Yan, J.; Zhang, J.; Liu, S.; Liu, Y.; Han, S.; Qu, T. Short-term integrated forecasting method for wind power, solar power, and system load based on variable attention mechanism and multi-task learning. Energy 2024, 304, 132188. [Google Scholar]
Kirchner-Bossi, N.; Kathari, G.; Porté-Agel, F. A hybrid physics-based and data-driven model for intra-day and day-ahead wind power forecasting considering a drastically expanded predictor search space. Appl. Energy 2024, 367, 123375. [Google Scholar]
Ye, L.; Zhao, Y.; Zeng, C.; Zhang, C. Short-term wind power prediction based on spatial model. Renew. Energy 2017, 101, 1067–1074. [Google Scholar] [CrossRef]
Shao, M.; Han, Z.; Sun, J.; Gao, H.; Zhang, S.; Zhao, Y. A novel framework for wave power plant site selection and wave forecasting based on GIS, MCDM, and ANN methods: A case study in Hainan Island, Southern China. Energy Convers. Manag. 2024, 299, 117816. [Google Scholar] [CrossRef]
Jung, J.; Broadwater, R.P. Current status and future advances for wind speed and power forecasting. Renew. Sustain. Energy Rev. 2014, 31, 762–777. [Google Scholar] [CrossRef]
Zhang, H.; Yue, D.; Dou, C.; Li, K.; Hancke, G.P. Two-step wind power prediction approach with improved complementary ensemble empirical mode decomposition and reinforcement learning. IEEE Syst. J. 2021, 16, 2545–2555. [Google Scholar] [CrossRef]
Aasim; Singh, S.N.; Mohapatra, A. Repeated wavelet transform based ARIMA model for very short-term wind speed forecasting. Renew. Energy 2019, 136, 758–768. [Google Scholar] [CrossRef]
Odonkor, B.; Kaggwa, S.; Uwaoma, P.U.; Hassan, A.O.; Farayola, O.A. Integrating Artificial intelligence in accounting: A quantitative economic perspective for the future of US financial markets. Financ. Account. Res. J. 2024, 6, 56–78. [Google Scholar] [CrossRef]
Obiuto, N.C.; Adebayo, R.A.; Olajiga, O.K.; Festus-Ikhuoria, I. Integrating artificial intelligence in construction management: Improving project efficiency and cost-effectiveness. Int. J. Adv. Multidisc. Res. Stud 2024, 4, 639–647. [Google Scholar] [CrossRef]
Waqar, A. Intelligent decision support systems in construction engineering: An artificial intelligence and machine learning approaches. Expert Syst. Appl. 2024, 249, 123503. [Google Scholar] [CrossRef]
Pan, X.; Shen, L.; Zhong, B.; Sheng, D.; Huang, F.; Yang, L. Novel blockchain deep learning framework to ensure video security and lightweight storage for construction safety management. Adv. Eng. Inform. 2024, 59, 102334. [Google Scholar] [CrossRef]
Musarat, M.A.; Khan, A.M.; Alaloul, W.S.; Blas, N.; Ayub, S. Automated monitoring innovations for efficient and safe construction practices. Results Eng. 2024, 22, 102057. [Google Scholar] [CrossRef]
Song, D.; Tan, X.; Huang, Q.; Wang, L.; Dong, M.; Yang, J.; Evgeny, S. Review of AI-based wind prediction within recent three years: 2021–2023. Energies 2024, 17, 1270. [Google Scholar] [CrossRef]
Singh, M.; Chauhan, S. A hybrid-extreme learning machine based ensemble method for online dynamic security assessment of power systems. Electr. Power Syst. Res. 2023, 214, 108923. [Google Scholar] [CrossRef]
Duan, R.; Peng, X.; Li, C.; Yang, Z.; Jiang, Y.; Li, X.; Liu, S. A hybrid three-staged, short-term wind-power prediction method based on SDAE-SVR deep learning and BA optimization. IEEE Access 2022, 10, 123595–123604. [Google Scholar]
İnaç, T.; Dokur, E.; Yüzgeç, U. A multi-strategy random weighted gray wolf optimizer-based multi-layer perceptron model for short-term wind speed forecasting. Neural Comput. Appl. 2022, 34, 14627–14657. [Google Scholar]
Tian, W.; Bao, Y.; Liu, W. Wind power forecasting by the BP neural network with the support of machine learning. Math. Probl. Eng. 2022, 2022, 7952860. [Google Scholar] [CrossRef]
Bhaskar, K.; Singh, S.N. AWNN-assisted wind power forecasting using feed-forward neural network. IEEE Trans. Sustain. Energy 2012, 3, 306–315. [Google Scholar]
Liu, L.; Wang, J. Super multi-step wind speed forecasting system with training set extension and horizontal–vertical integration neural network. Appl. Energy 2021, 292, 116908. [Google Scholar] [CrossRef]
Hong, Y.Y.; Satriani, T.R.A. Day-ahead spatiotemporal wind speed forecasting using robust design-based deep learning neural network. Energy 2020, 209, 118441. [Google Scholar] [CrossRef]
Liu, Z.H.; Wang, C.T.; Wei, H.L.; Zeng, B.; Li, M.; Song, X.P. A wavelet-LSTM model for short-term wind power forecasting using wind farm SCADA data. Expert Syst. Appl. 2024, 247, 123237. [Google Scholar]
Niu, Z.; Yu, Z.; Tang, W.; Wu, Q.; Reformat, M. Wind power forecasting using attention-based gated recurrent unit network. Energy 2020, 196, 117081. [Google Scholar] [CrossRef]
Liu, J.; Shi, Q.; Han, R.; Yang, J. A hybrid GA–PSO–CNN model for ultra-short-term wind power forecasting. Energies 2021, 14, 6500. [Google Scholar] [CrossRef]
Yildiz, C.; Acikgoz, H.; Korkmaz, D.; Budak, U. An improved residual-based convolutional neural network for very short-term wind power forecasting. Energy Convers. Manag. 2021, 228, 113731. [Google Scholar]
Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The performance of LSTM and BiLSTM in forecasting time series. In Proceedings of the 2019 IEEE International conference on big data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 3285–3292. [Google Scholar]
Tong, G.; Ge, Z.; Peng, D. RSMformer: An efficient multiscale transformer-based framework for long sequence time-series forecasting. Appl. Intell. 2024, 54, 1275–1296. [Google Scholar] [CrossRef]
Zhang, W.; Lin, Z.; Liu, X. Short-term offshore wind power forecasting-A hybrid model based on Discrete Wavelet Transform (DWT), Seasonal Autoregressive Integrated Moving Average (SARIMA), and deep-learning-based Long Short-Term Memory (LSTM). Renew. Energy 2022, 185, 611–628. [Google Scholar]
Zhang, Y.M.; Wang, H. Multi-head attention-based probabilistic CNN-BiLSTM for day-ahead wind speed forecasting. Energy 2023, 278, 127865. [Google Scholar]
Yu, C.; Li, Y.; Zhang, M. An improved wavelet transform using singular spectrum analysis for wind speed forecasting based on elman neural network. Energy Convers. Manag. 2017, 148, 895–904. [Google Scholar] [CrossRef]
Yang, Z.; Peng, X.; Wei, P.; Xiong, Y.; Xu, X.; Song, J. Short-term wind power prediction based on CEEMDAN and parallel CNN-LSTM. In Proceedings of the 2022 IEEE/IAS Industrial and Commercial Power System Asia (I&CPS Asia), Shanghai, China, 8–11 July 2022; pp. 1166–1172. [Google Scholar]
Liu, W.; Bai, Y.; Yue, X.; Wang, R.; Song, Q. A wind speed forcasting model based on rime optimization based VMD and multi-headed self-attention-LSTM. Energy 2024, 294, 130726. [Google Scholar] [CrossRef]
Wu, T.; Ling, Q. Self-supervised dynamic stochastic graph network for spatio-temporal wind speed forecasting. Energy 2024, 304, 132056. [Google Scholar] [CrossRef]
Cao, H.; Ding, C.; Jiang, L.; Zhang, Y.; Liu, Q. A Parallel Dual-Branch Deep Learning Framework Integrating VMD Feature Engineering and Multi-Scale Temporal Feature Fusion for Small-Sample Fault Diagnosis. IEEE Sens. J. 2025. [Google Scholar] [CrossRef]
Jiang, K.; Wang, Z.; Yi, P.; Chen, C.; Wang, G.; Han, Z.; Jiang, J.; Xiong, Z. Multi-scale hybrid fusion network for single image deraining. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 3594–3608. [Google Scholar]
Li, W.; Meng, X.; Chen, C.; Chen, J. Mlinear: Rethink the linear model for time-series forecasting. arXiv 2023, arXiv:2305.04800. [Google Scholar]
Bai, W.; Jin, M.; Li, W.; Zhao, J.; Feng, B.; Xie, T.; Li, S.; Li, H. Multi-step prediction of wind power based on hybrid model with improved variational mode decomposition and sequence-to-sequence network. Processes 2024, 12, 191. [Google Scholar] [CrossRef]
Xue, M.; Zhu, J.; Wu, R.; Zhang, X.; Chen, Y. BRP-Net: A discrete-aware network based on attention mechanisms and LSTM for birth rate prediction in prefecture-level cities. PLoS ONE 2024, 19, e0307721. [Google Scholar]
Yuan, B.; Lu, L.; Wang, Z.; Song, G.; Ma, L.; Wang, W. Prediction of Full-Frequency Deep-Sea Noise Based on Sea Surface Wind Speed and Real-Time Noise Data. Remote Sens. 2024, 17, 101. [Google Scholar] [CrossRef]
Xiao, J.; Zhang, D.; Li, J.; Liu, J. A study on the classification of complexly shaped cultivated land considering multi-scale features and edge priors. Environ. Monit. Assess. 2024, 196, 816. [Google Scholar]

Figure 1. The overall structure of the MSVMD-Informer.

Figure 2. The flow of key steps of the MSVMD-Informer.

Figure 3. The structure of the MSGB.

Figure 4. The data flow of the MSGB.

Figure 5. The structure of the MFFB.

Figure 6. The results of applying VMD to the public dataset.

Figure 7. The results of applying VMD to Private Dataset 1.

Figure 8. The results of different prediction models in Private Dataset 1.

Table 1. Correlation of Pearson values with input features in public dataset.

Input Features	Wind Direction	Wind Speed	Pressure	Humidity	Temperature
Pearson Value	−0.41	0.89	0.45	0.2	−0.41

Table 2. Correlation of Pearson values with input features in Private Dataset 1.

Input Features	Wind Direction	Wind Speed	Pressure	Humidity	Temperature
Pearson Value	−0.26	0.91	0.52	0.26	−0.31

Table 3. Correlation of Pearson values with input features in Private Dataset 2.

Input Features	Wind Direction	Wind Speed	Pressure	Humidity	Temperature
Pearson Value	−0.33	0.90	0.50	0.24	−0.36

Table 4. Performance comparison of prediction models in public dataset.

Prediction Model	MAPE/%	RMSE/MW	$R^{2}$
RF	15.800 ± 0.033	0.630 ± 0.032	0.620 ± 0.023
NGO-RF	11.200 ± 0.048	0.270 ± 0.018	0.820 ± 0.023
EMD-RF	9.900 ± 0.033	0.500 ± 0.029	0.765 ± 0.023
EMD-NGO-RF	7.500 ± 0.021	0.400 ± 0.023	0.855 ± 0.023
VMD-RF	2.550 ± 0.032	0.125 ± 0.007	0.970 ± 0.023
SSA-VMD-PSO-RF	1.900 ± 0.023	0.090 ± 0.004	0.985 ± 0.023
SSA-VMD-INGO-RF	1.750 ± 0.022	0.070 ± 0.003	0.990 ± 0.023
Ours	1.325 ± 0.011	0.032 ± 0.001	0.991 ± 0.017

Table 5. Performance comparison of prediction models in Private Dataset 1.

Prediction Model	MAPE/%	RMSE/MW	$R^{2}$
RF	17.200 ± 0.035	0.750 ± 0.038	0.580 ± 0.026
NGO-RF	12.500 ± 0.042	0.320 ± 0.020	0.790 ± 0.024
EMD-RF	11.300 ± 0.034	0.570 ± 0.030	0.740 ± 0.022
EMD-NGO-RF	8.200 ± 0.025	0.460 ± 0.022	0.830 ± 0.021
VMD-RF	3.000 ± 0.028	0.160 ± 0.009	0.960 ± 0.019
SSA-VMD-PSO-RF	2.200 ± 0.023	0.110 ± 0.006	0.975 ± 0.017
SSA-VMD-INGO-RF	1.950 ± 0.021	0.085 ± 0.004	0.985 ± 0.015
Ours	1.500 ± 0.012	0.040 ± 0.002	0.990 ± 0.010

Table 6. Performance comparison of prediction models in Private Dataset 2.

Prediction Model	MAPE/%	RMSE/MW	$R^{2}$
RF	16.800 ± 0.036	0.720 ± 0.037	0.590 ± 0.025
NGO-RF	12.200 ± 0.043	0.310 ± 0.021	0.800 ± 0.024
EMD-RF	10.800 ± 0.033	0.550 ± 0.031	0.750 ± 0.021
EMD-NGO-RF	7.900 ± 0.024	0.440 ± 0.021	0.840 ± 0.020
VMD-RF	2.850 ± 0.027	0.150 ± 0.008	0.965 ± 0.018
SSA-VMD-PSO-RF	2.000 ± 0.022	0.100 ± 0.005	0.980 ± 0.016
SSA-VMD-INGO-RF	1.800 ± 0.020	0.080 ± 0.004	0.988 ± 0.014
Ours	1.450 ± 0.011	0.035 ± 0.002	0.992 ± 0.008

Table 7. Results of ablation experiments in public dataset.

Module	MAPE/%	RMSE/MW	$R^{2}$
MSGB	18.500 ± 0.048	0.880 ± 0.045	0.520 ± 0.032
VMD	14.200 ± 0.039	0.680 ± 0.036	0.650 ± 0.028
VMDSGB	9.800 ± 0.030	0.420 ± 0.022	0.780 ± 0.023
AdpGLayer	5.600 ± 0.020	0.280 ± 0.015	0.880 ± 0.018
MFFB	2.150 ± 0.015	0.095 ± 0.006	0.975 ± 0.012
MSVMD-Informer	1.325 ± 0.011	0.032 ± 0.001	0.991 ± 0.017

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liu, Z.; Chen, J.; Dong, H.; Wang, Z. MSVMD-Informer: A Multi-Variate Multi-Scale Method to Wind Power Prediction. Energies 2025, 18, 1571. https://doi.org/10.3390/en18071571

AMA Style

Liu Z, Chen J, Dong H, Wang Z. MSVMD-Informer: A Multi-Variate Multi-Scale Method to Wind Power Prediction. Energies. 2025; 18(7):1571. https://doi.org/10.3390/en18071571

Chicago/Turabian Style

Liu, Zhijian, Jikai Chen, Hang Dong, and Zizhuo Wang. 2025. "MSVMD-Informer: A Multi-Variate Multi-Scale Method to Wind Power Prediction" Energies 18, no. 7: 1571. https://doi.org/10.3390/en18071571

APA Style

Liu, Z., Chen, J., Dong, H., & Wang, Z. (2025). MSVMD-Informer: A Multi-Variate Multi-Scale Method to Wind Power Prediction. Energies, 18(7), 1571. https://doi.org/10.3390/en18071571

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

MSVMD-Informer: A Multi-Variate Multi-Scale Method to Wind Power Prediction

Abstract

1. Introduction

2. Related Work

3. Methods

3.1. Module Description

3.2. Model Assumptions and Theoretical Basis

4. Experiment

4.1. Dataset and Metrics

4.2. Implementation Details

4.3. Results

4.4. Discussion

4.5. Contribution of the Method

4.6. Key Limitations

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI