Article

Neural Networks with Transfer Learning and Frequency Decomposition for Wind Speed Prediction with Missing Data

Xiaoou Li and Yingqin Zhu
1 Departamento de Computacion, CINVESTAV-IPN (National Polytechnic Institute), Mexico City 07360, Mexico
2 Departamento de Control Automatico, CINVESTAV-IPN (National Polytechnic Institute), Mexico City 07360, Mexico
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(8), 1137; https://doi.org/10.3390/math12081137
Submission received: 15 March 2024 / Revised: 5 April 2024 / Accepted: 7 April 2024 / Published: 10 April 2024
(This article belongs to the Special Issue Advanced Computational Intelligence)

Abstract

This paper presents a novel data-driven approach for enhancing time series forecasting accuracy when faced with missing data. Our proposed method integrates an Echo State Network (ESN) with ARIMA (Autoregressive Integrated Moving Average) modeling, frequency decomposition, and online transfer learning. This combination specifically addresses the challenges that missing data introduce in time series prediction. By using the strengths of each technique, our framework offers a robust solution for handling missing data and achieving superior forecasting accuracy in real-world applications. We demonstrate the effectiveness of the proposed model through a wind speed prediction case study. Compared to existing methods, our approach achieves a significant improvement in prediction accuracy, paving the way for more reliable decision-making in wind energy operations and management.

1. Introduction

Neural networks, popular nonlinear models, have demonstrated success in wind speed forecasting. These models typically rely on nonlinear activation functions and optimizers such as Adam. Multilayer Perceptron and Radial Basis Function networks are examples of such applications [1]. Additionally, studies have explored Bayesian regularization, Levenberg–Marquardt, and recurrent neural networks [2,3]. Beyond neural networks, various nonlinear models can be used for long-term time series prediction, including reinforcement learning [4], fuzzy logic [5], and Support Vector Machines [6]. In recent years, Long Short-Term Memory (LSTM) networks have emerged as a popular alternative due to their ability to capture long-term dependencies and nonlinear relationships in time series data [7]. However, their complex architecture can hinder interpretability, and existing variants have not shown significant advantages over the standard architecture, according to Greff et al. [8].
An alternative approach is Echo State Network (ESN), a recurrent network similar to LSTM. ESNs require only one training phase, making them faster to train [9]. They excel at handling large, complex datasets and are robust to noise in the input data. These characteristics make ESNs a suitable choice for long-term forecasting and dealing with missing values. Additionally, research has explored deep and multilayer neural networks for time series forecasting [10], with Bai et al. [11] proposing a double-layer staged training ESN.
The ARIMA (Autoregressive Integrated Moving Average) model is a widely used and effective tool for time series forecasting, demonstrating its value in wind power generation forecasting, as shown by Yatiyana et al. [12]. The model’s structure can be determined using the autocorrelation function, as described by Elsaraiti et al. [13]. Eldali et al. [14] further improved ARIMA’s accuracy by incorporating aerodynamic atmospheric models. However, Zhang and Zhao [15] found limitations in long-term prediction when ARIMA is applied to multivariate time series. Numerical weather prediction (NWP) models forecast future weather conditions using complex mathematical equations based on fundamental physical laws; they are fed massive amounts of current weather data and solved on supercomputers. Running high-resolution NWP models can be computationally expensive [16]. NWP models can predict large-scale weather patterns over longer timeframes (days in advance). Because NWP computations are resource-intensive, the end user (e.g., a wind power company) typically accesses forecasts through weather service providers who handle the heavy lifting. However, our focus is on the limitations inherent in NWP data themselves, particularly for short-term wind speed prediction: NWP accuracy can decrease significantly at the short horizons crucial for wind farm operations. This is where our proposed method with transfer learning comes in. By using historical data and incorporating missing data handling techniques [17], our approach aims to improve the accuracy of short-term wind speed forecasts, even with potentially incomplete NWP data from traditional sources.
Time series forecasting involves predicting future values over extended periods. Accuracy depends on data quality, method selection, and the incorporation of relevant external factors. The presence of missing data in the time series adds another layer of challenge, impacting both data quality and model selection, as noted by Wang et al. [18]. One limitation of the ARIMA model is its linear nature, which can contribute to lower prediction accuracy. To address this, various methods, categorized as physical and statistical, have been developed [19]. Statistical methods are generally suited for short-term forecasting, while physical methods are typically used for long-term forecasting, as indicated by [20].
Missing values in time series data significantly impact prediction accuracy, making careful handling essential for reliable results [21,22,23]. Two techniques that address this are Domain Adaptation Extreme Learning Machines (DA-ELM) and Transfer Learning combined with Ensemble Learning (TL-EL). DA-ELM provides robust classification from limited labeled data in E-nose systems while maintaining ELM’s efficiency [24]. TL-EL tackles data variability in time series prediction, leveraging older data to enhance the network’s memory for current predictions [25]. DA-ELM focuses specifically on drift in gas recognition, while TL-EL offers a broader approach for enhancing time series prediction models.
A primary challenge in time series forecasting lies in the presence of complex patterns like seasonality and trends. Frequency decomposition is a powerful technique to identify and isolate these patterns within the data. Decomposing a time series into its frequency components allows for focused analysis of each component, providing insights into their behavior and enabling more accurate forecasting. Often, data preprocessing techniques play a crucial role in improving data quality. Methods like high-frequency low-frequency decomposition [26], variational mode decomposition [27], and wavelet transform [20] are widely used. The ARIMA model, specifically, leverages this separation of high and low frequencies to extract temporal correlations and probability distributions within the time series [26].
Several studies have explored hybrid approaches to improve forecasting accuracy [28]. In [29], the authors propose combining Empirical Mode Decomposition (EMD) and Local Mean Decomposition (LMD) to reduce decomposition error and enhance the efficiency and accuracy of a Stochastic Configuration Network for large datasets. Liu et al. [30] present four hybrid methods for multi-step wind speed prediction using Adaboost and Multilayer Perceptron (MLP) neural networks with different training algorithms. While both approaches aim to improve accuracy, they face tradeoffs: the EMD-LMD method heavily relies on data availability, while the Adaboost-MLP method requires significant execution time.
The aim of this paper is to design an effective method for predicting time series with missing values. We propose a model named Echo State ARIMA (ES-ARIMA). This model integrates the ARIMA model into the recurrent layer of an Echo State Network (ESN), introducing an error feedback mechanism that enhances performance. Unlike traditional fusion methods [31] that use the ESN solely to compensate for the ARIMA model, our approach leverages the strengths of both models.
To further improve prediction accuracy, we employ frequency decomposition to separate high-frequency signals from low-frequency ones, allowing for a focused “fine-tuning” phase in the short-term data. Additionally, we incorporate online transfer learning, guided by a specialized performance index, to effectively leverage information from other time series during training. This work offers the following key contributions:
  • Addressing forecasting with missing data: We propose a novel method to address the challenge of low accuracy in forecasting with missing data. By integrating an ESN into the ARIMA model, we significantly improve prediction accuracy. To our knowledge, this is the first time an ARIMA model has been enhanced by an ESN.
  • Leveraging frequency decomposition and transfer learning: We utilize frequency decomposition to account for the specific characteristics of forecasts. Furthermore, we incorporate online transfer learning to tackle the accuracy issues caused by missing data.
  • Successful application in wind speed prediction: The proposed ES-ARIMA model demonstrates its effectiveness through successful application in wind speed prediction.

2. Echo State ARIMA Model

The ARIMA model is an extended version of the ARMA (Auto Regressive Moving Average) model that incorporates an integration component. This model consists of three stages [26]. Echo State Networks (ESNs) are a type of recurrent neural network. They differ from traditional ones by utilizing a fixed random connection pattern between neurons. Only a small fraction of neurons are connected, creating a sparse network. The network input is added to the activity of each neuron, and the output is a linear combination of their activities. These activities update with each time step, forming a dynamic system suitable for tasks like time-series forecasting and signal processing.
Neither ARIMA nor Echo State Networks (ESNs) on their own are sufficient for effectively modeling time series data with missing values. To address this challenge, we introduce Echo State ARIMA (ES-ARIMA), a novel method that combines the strengths of both techniques. ARIMA is powerful at capturing linear trends in data, while ESNs excel at modeling nonlinear relationships and the dynamic behavior of systems. By incorporating ARIMA’s predictions into the ESN framework, we provide the network with additional information, ultimately enhancing the overall accuracy of the forecasts.

2.1. ARIMA Model

(1) Autoregressive (AR). It expresses the current value of the time series $y_t$ in terms of its past values:
$$y_t = a_0 + a_1 y_{t-1} + \cdots + a_p y_{t-p} + \epsilon_t \quad (1)$$
where $a_i$ are the coefficients of the linear AR model and $\epsilon_t$ is white noise with zero mean, independent and identically distributed (i.i.d.).
(2) Moving Average (MA). It expresses $y_t$ in terms of the past values of the noise $\epsilon_t$:
$$y_t = \epsilon_t + b_1 \epsilon_{t-1} + \cdots + b_q \epsilon_{t-q} \quad (2)$$
where $b_i$ are the coefficients of the MA model.
(3) Integration (I). Stationarity is a critical factor in time series forecasting and a key parameter for designing the ARIMA model. We calculate the difference as
$$\Delta y_t = y_t - y_{t-1} \quad (3)$$
Differencing smooths the time series and brings it closer to stationarity.
The zero-mean $(p, q)$-order ARMA model (i.e., (1) with $a_0 = 0$) is
$$y_t = a_1 y_{t-1} + \cdots + a_p y_{t-p} + \epsilon_t + b_1 \epsilon_{t-1} + \cdots + b_q \epsilon_{t-q} \quad (4)$$
Let $z$ be the lag operator,
$$z^{-1} y_t = y_{t-1}, \qquad z^{-2} y_t = y_{t-2}, \ldots$$
The $d$-order integration can be defined as
$$y_t = \Delta^d x_t = \left(1 - z^{-1}\right)^d x_t$$
The $(p, q, d)$-order ARIMA model is
$$\left(1 - \sum_{k=1}^{p} a_k z^{-k}\right) \left(1 - z^{-1}\right)^d y_t = \left(1 + \sum_{k=1}^{q} b_k z^{-k}\right) \epsilon_t \quad (5)$$
The parameters of the ARIMA model are collected in the vector $\theta$,
$$\theta = \left[ a_1 \cdots a_p,\; b_1 \cdots b_q \right] \quad (6)$$
We can use the least squares method to estimate the parameter $\theta$. For the $i$-th data point,
$$\theta^i = \left[ a_1^i \cdots a_p^i,\; b_1^i \cdots b_q^i \right], \quad i = 1, \ldots, N \quad (7)$$
where $N$ is the data size. The ARIMA model (5) in vector form is
$$Y = \theta \bar{Y} + E \quad (8)$$
The objective of the parameter identification is
$$\min_{\theta} \left\| Y - \theta \bar{Y} \right\|^2 \quad (9)$$
Because (8) is a linear-in-parameter model, the optimal solution of $\theta$ is
$$\theta = \left( \bar{Y} \bar{Y}^T \right)^{-1} \bar{Y}^T Y \quad (10)$$
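As a concrete illustration of Equations (5)–(10), the order selection and fit can be reproduced with standard tooling; the sketch below uses the statsmodels estimator and a synthetic series as stand-ins, not the paper’s implementation.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic stand-in for a wind speed series (placeholder data).
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=500))  # integrated noise, so d = 1 is appropriate

# Fit an ARIMA(p, d, q); the order (1, 1, 2) mirrors the Kaggle
# configuration reported later in Section 4 (Table 2).
fitted = ARIMA(y, order=(1, 1, 2)).fit()

print(fitted.params)             # estimated coefficients, i.e., theta in (6)
print(fitted.forecast(steps=2))  # two-step-ahead forecast
```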

2.2. Echo State Network

Figure 1 illustrates an ESN structure. An ESN architecture consists of three layers: (1) Input Layer: Receives external data and feeds them into the reservoir layer. (2) Reservoir Layer: This large layer contains interconnected neurons with fixed weights (a key ESN feature). These neurons process the input data through a complex nonlinear transformation and forward the result to the output layer. (3) Output Layer: A linear layer that maps the transformed data to the desired output.
The mathematical expression of the ESN is
$$x_t = \phi\left( W^{in} u_t + W^{res} x_{t-1} \right), \qquad y_t = W^{out} x_t \quad (11)$$
where the input to the ESN at time step $t$ is denoted by $u_t \in \mathbb{R}^n$, with $n$ the dimensionality of the input. The output of the network at time step $t$ is denoted by $y_t \in \mathbb{R}^m$, with $m$ the dimensionality of the output. The state of the reservoir layer at time step $t$ is given by $x_t \in \mathbb{R}^N$. The reservoir layer consists of $N$ neurons, $W^{in} \in \mathbb{R}^{N \times n}$ is the input weight matrix, $W^{res} \in \mathbb{R}^{N \times N}$ is the recurrent weight matrix, $W^{out} \in \mathbb{R}^{m \times N}$ is the output weight matrix, and $\phi$ is an element-wise activation function.
The leaky recurrent layer update is
$$x_t = (1 - \alpha)\, x_{t-1} + \alpha\, \tilde{x}_t \quad (12)$$
where $\alpha$ is the leaking rate and $\tilde{x}_t$ is the candidate state computed by the activation in (11).
The weights $W^{in}$ and $W^{res}$ are randomly initialized and kept fixed. The weights of the output layer $W^{out}$ are adjusted using the least squares algorithm
$$W^{out} = \left( X^T X + \beta I \right)^{-1} X^T Y \quad (13)$$
where $X$ is the matrix of reservoir states $x_t$ for all time steps $t$, $Y$ is the matrix of desired outputs $y_t$ for all time steps $t$, $\beta$ is a regularization parameter, and $I$ is the identity matrix.
For time series modeling, Y is the target vector, and X is historical data. This approach greatly simplifies the training process and reduces the risk of overfitting since the complexity of the model is largely determined by the size and connectivity of the reservoir layer.
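The sketch below shows Equations (11)–(13) in code; the hyperparameters (reservoir size, leaking rate, spectral radius) are illustrative assumptions, and this is a minimal reference implementation rather than the authors’ code.

```python
import numpy as np

def train_esn(u, y, N=200, alpha=0.3, rho=0.9, beta=1e-6, seed=1):
    """Minimal leaky ESN with a ridge-regression readout, Eqs. (11)-(13)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, (N, u.shape[1]))       # fixed input weights
    W_res = rng.uniform(-0.5, 0.5, (N, N))               # fixed reservoir weights
    W_res *= rho / np.max(np.abs(np.linalg.eigvals(W_res)))  # set spectral radius
    x = np.zeros(N)
    X = np.zeros((len(u), N))
    for t in range(len(u)):
        x_new = np.tanh(W_in @ u[t] + W_res @ x)         # Eq. (11)
        x = (1 - alpha) * x + alpha * x_new              # leaky update, Eq. (12)
        X[t] = x
    # Regularized least-squares readout, Eq. (13).
    W_out = np.linalg.solve(X.T @ X + beta * np.eye(N), X.T @ y)
    return W_in, W_res, W_out

# One-step-ahead usage on a toy series:
series = np.sin(np.linspace(0, 50, 1000))
u, y = series[:-1, None], series[1:]
W_in, W_res, W_out = train_esn(u, y)
```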

2.3. Echo State ARIMA Model (ES-ARIMA)

To address forecast lag, where predictions trail behind actual values, the AR and “I” components described in Equation (5) above are commonly used. AR models can also correct other errors, such as over- or under-prediction, by capturing trends, seasonality, and other patterns in the data through linear relationships between observations. Additionally, the error feature layer, inspired by the MA model in Equation (2) above, tackles the issue of random errors or noise in the time series data.
In this paper, we integrate the ARIMA model into the recurrent part of the ESN, resulting in the ES-ARIMA model. Figure 2 illustrates its structure. ES-ARIMA combines an ESN with an ARIMA model. ES-ARIMA has two main blocks:
  • ESN section: This section includes elements similar to the ESN diagram, such as reservoir neurons, input layer, and output layer. It also shows connections between these elements.
  • ARIMA section: This section depicts elements representing the ARIMA model’s components, such as Autoregressive (AR) and Moving Average (MA) components, with arrows indicating the flow of information.
The ES-ARIMA model includes three parts: linear features $\Omega_{lin}$, error features $\Omega_{error}$, and nonlinear features $\Omega_{nonlinear}$. They are:
  • The linear features $\Omega_{lin,t}$ at time step $t$ are composed of observations from the input vector $u_t$; in Figure 1, the input is the previous time series value $y_{t-1}$ in dataset $D$. At the current time $t$ and previous times $t-s, t-2s, \ldots$,
$$\Omega_{lin,t} = \left[ U_t,\; U_{t-s},\; \ldots,\; U_{t-(k-1)s} \right] \quad (14)$$
where $U_t = [u_{1,t}, \ldots, u_{l,t}]^T$ is an $l$-dimensional vector, $\Omega_{lin,t}$ is an $l \times k$ matrix, and $s$ is the number of skipped steps between consecutive observations. $\Omega_{lin,t}$ is used in the ARIMA model in Figure 2.
  • The error features $\Omega_{error,t}$ are
$$\Omega_{error,t} = \left[ \epsilon_t,\; \epsilon_{t-1},\; \ldots,\; \epsilon_{t-q} \right] \quad (15)$$
where $\epsilon$ and $q$ are defined in (2). This corresponds to the MA model in Figure 1. In time series forecasting, random errors can have a significant impact on the accuracy of predictions, causing them to deviate from the actual values. By modeling the noise as in (15), the accuracy of predictions can be improved. Additionally, MA models help the ES-ARIMA model identify short-term fluctuations in the data.
  • The nonlinear features $\Omega_{nonlinear}$ are obtained from the echo state network,
$$\Omega_{nonlinear} = \left[ x_1,\; \ldots,\; x_n \right] \quad (16)$$
where $x_i$ is defined in (12). Obviously, $\Omega_{nonlinear}$ is the state vector of the classical ESN.
The state of the ES-ARIMA model, $X_t$, is
$$X_t = \left[ c,\; \Omega_{lin,t},\; \Omega_{error,t},\; \Omega_{nonlinear} \right] \quad (17)$$
where $c$ is a constant.
The output signal of ES-ARIMA, $\hat{y}_t$, can be expressed as
$$\hat{y}_t = \bar{W}^{out} X_t \quad (18)$$
The training of ES-ARIMA is the same as for ARIMA (10) and the ESN (13), and the parameter vector $\bar{W}^{out}$ is the combination of $\theta$ defined in (6) and $W^{out}$ defined in (11), i.e.,
$$\bar{W}^{out} = \left[ W^{out},\; \theta \right] \quad (19)$$
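A sketch of how the combined state (17) and readout (18) might be assembled is shown below; the function names and the constant $c = 1$ are illustrative assumptions, not the authors’ implementation.

```python
import numpy as np

def es_arima_state(omega_lin, omega_error, omega_nonlinear, c=1.0):
    """Assemble the ES-ARIMA state X_t of Eq. (17).

    omega_lin       : lagged inputs, Eq. (14)
    omega_error     : past residuals, Eq. (15)
    omega_nonlinear : ESN reservoir state, Eq. (16)
    """
    return np.concatenate(([c], np.ravel(omega_lin),
                           np.ravel(omega_error), np.ravel(omega_nonlinear)))

def es_arima_output(W_bar_out, X_t):
    """Linear readout of Eq. (18) with the stacked weights of Eq. (19)."""
    return W_bar_out @ X_t
```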
ES-ARIMA models, like many other statistical models, are susceptible to overfitting. We use the following method to prevent overfitting:
  • Information Criteria: When choosing the ARIMA order ( p , d , q ) , rely on information criteria like AIC or BIC instead of simply picking the model with the lowest in-sample error on the training data. These criteria penalize models for complexity, favoring simpler models that perform well on unseen data.
  • Limit Model Complexity: Avoid excessively high orders ( p , d , q ) for the ES-ARIMA model. Start with a simpler model and increase complexity only if the information criteria or diagnostics on the residuals suggest a need for more parameters.

3. Training of ES-ARIMA Model

3.1. Frequency Decomposition

Time series forecasting involves predicting the future values of a series over time, essentially anticipating its behavior several steps ahead. Frequency decomposition is a valuable technique that separates a time series into its constituent frequencies. This decomposition reveals hidden patterns within the data, which are particularly beneficial for prediction tasks.
High-frequency components capture short-term fluctuations, while low-frequency components reflect long-term trends. By decomposing the signal, the neural network can effectively distinguish between these timescales. It can then focus on the relevant component, whether it is the underlying trend (low frequency) or the short-term volatility (high frequency). For long-term forecasts, the model prioritizes the information contained in the low-frequency components.
Predicting the original time series $y_t$ directly with models like ESN and ARIMA can be challenging. However, decomposing the signal into its constituent frequencies makes each component more suitable for a specific prediction method. We use the filter
$$G(y) = \frac{y + a}{1 + a\, y} \quad (20)$$
where $a \leq 1$ corresponds to the cutoff frequency, which can be obtained by applying the more general lowpass-to-highpass transformation. The high-frequency component is emphasized for short-term predictions; if long-term forecasting is required, the weight of the low-frequency component is higher than that of the high-frequency one.
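As an illustration, the decomposition step can be approximated with an off-the-shelf low-pass filter; the Butterworth filter below is a stand-in for the transfer function (20), and the 0.17 normalized cutoff anticipates the value reported in Section 4.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def split_frequencies(y, cutoff=0.17, order=4):
    """Split a series into low- and high-frequency components."""
    b, a = butter(order, cutoff, btype="low")  # normalized cutoff in (0, 1)
    y_low = filtfilt(b, a, y)                  # long-term trend component
    y_high = y - y_low                         # short-term fluctuation component
    return y_low, y_high
```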
Low-frequency components are well-suited for linear prediction using ARIMA, while high-frequency components benefit from nonlinear prediction methods like ESN:
$$X_t = F \times X_{t-1} + \tanh\left( W^{in}\, \Omega_{lin,t} + W X_{t-1} \right) \quad (21)$$
Here, $F \in \mathbb{R}^{1 \times l}$ represents the forget weight, allowing the network to combine high- and low-frequency signals at different times.
Low-frequency signals facilitate long-term memory in ESN. By incorporating the ARIMA model’s weights as the forgetting rate, the network can selectively choose relevant information from the ESN, improving its ability to adapt to changes over time. This essentially allows the model to “forget” outdated information and focus on the most relevant data for prediction.
As shown in Figure 3, high-frequency signals capture fine details, while low-frequency signals provide the overall form and structure. We utilize another neural network to fuse the high-frequency and low-frequency components obtained after decomposition:
$$\hat{y}_h = \mathrm{ES\text{-}ARIMA}_h(y_h), \qquad \hat{y}_l = \mathrm{ES\text{-}ARIMA}_l(y_l) \quad (22)$$
where $y_h$ is the high-frequency signal, $y_l$ is the low-frequency signal, and $\hat{y}_{h,t}$ and $\hat{y}_{l,t}$ are the outputs of the high- and low-frequency ES-ARIMA models, respectively.
Finally, a two-layer Multilayer Perceptron (MLP) combines $\hat{y}_{h,t}$ and $\hat{y}_{l,t}$:
$$\hat{y}_{t+1} = W_f\, \sigma\left( V_l\, \hat{y}_{l,t} + V_h\, \hat{y}_{h,t} \right) \quad (23)$$
where $V_l$, $V_h$, and $W_f$ are the weights, $\hat{y}_{t+1}$ is the final prediction, and $\sigma$ is a nonlinear activation function.
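A possible realization of the fusion step (23) is sketched below, using scikit-learn’s MLP as a stand-in for the paper’s two-layer network; the hidden-layer size and activation are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_fusion(yhat_l, yhat_h, y_next):
    """Learn the fusion of Eq. (23) from component forecasts.

    yhat_l, yhat_h : low-/high-frequency ES-ARIMA predictions (training window)
    y_next         : observed next values aligned with the predictions
    """
    fusion = MLPRegressor(hidden_layer_sizes=(8,), activation="tanh",
                          max_iter=2000, random_state=0)
    fusion.fit(np.column_stack([yhat_l, yhat_h]), y_next)
    return fusion

# Fused one-step forecast:
# y_pred = fusion.predict(np.column_stack([new_yhat_l, new_yhat_h]))
```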

3.2. Transfer Learning

Missing values in time series data can be handled in two ways: dropping them or imputing them (replacing them with estimated values). While methods like linear interpolation, spline interpolation, and k-nearest neighbors exist for imputation, they may not work well for non-randomly distributed missing values. Recent research suggests that deep learning methods like Long Short-Term Memory (LSTM) networks can handle such scenarios more effectively.
Many existing prediction methods require sufficient and evenly distributed data. Transfer learning can be used to address this challenge by transferring knowledge from one time series (e.g., with complete data) to another (e.g., with missing values). This can improve the accuracy of machine learning algorithms for time series data with missing values.
Traditional ARIMA models struggle when dealing with uncertainties in the training data, such as missing values. Let us assume the training data $\Lambda_a$ have missing values, while datasets $\Lambda_b$ and $\Lambda_c$ share similar distributions with $\Lambda_a$. Our goal is to recover $\Lambda_a$ using data from $\Lambda_b$ and $\Lambda_c$. If a relationship $\Omega$ exists between $\Lambda_b$ and $\Lambda_c$, we can express $\Lambda_a$ as a function of them: $\Lambda_a = F[\Lambda_b, \Lambda_c]$. This section proposes a transfer learning approach in which the ES-ARIMA_a model can utilize the combined dataset $\Lambda = \Lambda_a \cup \Lambda_b \cup \Lambda_c$.
For the ES-ARIMA_a model, the training objective is to minimize both the training error and the size of the parameters, as shown in the following equation:
$$J_a = \min_{W_a} \left( \lambda \left\| W_a \right\|^2 + \left\| x_t^a W_a - y_a^{target} \right\|^2 \right) \quad (24)$$
where $W_a$ and $x_t^a$ are the weight and the state of the ES-ARIMA model, $y_a^{target} \in \Lambda_a$ is the desired value, and $0 < \lambda < 1$ is the regularization parameter.
When there are missing values in $\Lambda_a$, we use $\Lambda_b$ to help improve the ES-ARIMA_a model.
The source task $\Lambda_b$ uses an Echo State Network (ESN) pre-trained on a different but related time series forecasting problem. The pre-trained ESN captures general knowledge in $\Lambda_b$ that can be beneficial for the target task $\Lambda_a$. Transferring knowledge from the source task $\Lambda_b$ to the target task $\Lambda_a$ involves using the weights or activations learned by the pre-trained ESN as a starting point for training the ESN component within the ES-ARIMA model. This transfer learning is shown in Figure 4.
The training index is defined as
$$J_{ab} = \min_{W_a, W_b, \xi_a, \xi_b} \left( \frac{1}{2} \left\| W_b \right\|^2 + \frac{C_a}{2} \sum_{i=1}^{l_a} \xi_a^2 + \frac{C_b}{2} \sum_{j=1}^{l_b} \xi_b^2 \right) \quad (25)$$
where $\xi_a = x_t^a W_a - Y_a^{target}$ and $\xi_b = x_t^b W_b - Y_b^{target}$ are the prediction errors, $C_a$ and $C_b$ are positive constants (the penalty coefficients on the prediction errors from the target and source domains), and $l_a$ and $l_b$ are the lengths of datasets $\Lambda_a$ and $\Lambda_b$. $W_b$ denotes the output weight vector of the domain $\Lambda_b$, and $Y_b^{target}$ is the target of dataset $\Lambda_b$. From (24), we obtain the local solution $W_a$. From (25), we learn $W_b$ using all data from the source domain $\Lambda_b$ while leveraging the limited number of samples available in the target domain.
For the ES-ARIMA model, $x_t^a \in \mathbb{R}^{1 \times N}$, $\xi_a \in \mathbb{R}^{1 \times l_a}$, and $Y_t^a \in \mathbb{R}^{1 \times l_a}$ denote the reservoir states, the prediction error, and the target value in the domain $\Lambda_a$; $x_t^b \in \mathbb{R}^{1 \times N}$, $\xi_b \in \mathbb{R}^{1 \times l_b}$, and $Y_t^b \in \mathbb{R}^{1 \times l_b}$ denote the corresponding quantities in the domain $\Lambda_b$. The corresponding Lagrangian of problem (25) is
$$L(W_b, \xi_a, \xi_b, \alpha_a, \alpha_b) = \frac{1}{2} \left\| W_b \right\|^2 + \frac{C_a}{2} \sum_{i=1}^{l_a} \xi_a^2 + \frac{C_b}{2} \sum_{j=1}^{l_b} \xi_b^2 - \alpha_a \left( x_t^a W_b - Y_a + \xi_a \right) - \alpha_b \left( x_t^b W_b - Y_b + \xi_b \right) \quad (26)$$
where $\alpha_a$ and $\alpha_b$ are Lagrange multiplier vectors. Problem (26) can be solved from the optimality conditions
$$\frac{\partial L}{\partial W_b} = 0 \Rightarrow W_b = x_t^a \alpha_a + x_t^b \alpha_b, \qquad \frac{\partial L}{\partial \xi_a} = 0 \Rightarrow \alpha_a = C_a \xi_a^T, \qquad \frac{\partial L}{\partial \xi_b} = 0 \Rightarrow \alpha_b = C_b \xi_b^T,$$
$$\frac{\partial L}{\partial \alpha_a} = 0 \Rightarrow x_t^a W_b - Y_a + \xi_a = 0, \qquad \frac{\partial L}{\partial \alpha_b} = 0 \Rightarrow x_t^b W_b - Y_b + \xi_b = 0 \quad (27)$$
where $x_t^a$ and $x_t^b$ are the output matrices of the reservoir layer with respect to the data from the target domain $\Lambda_a$ and the source domain $\Lambda_b$, respectively.
The solution of (27) is
$$W_b = \left( I + C_b\, x_t^{bT} x_t^b + C_a\, x_t^{aT} x_t^a \right)^{-1} \left( C_b\, x_t^{bT} Y_b + C_a\, x_t^{aT} Y_a \right) \quad (28)$$
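In code, the closed-form solution (28) is a single regularized solve; the sketch below assumes row-stacked reservoir state matrices and illustrative penalty coefficients.

```python
import numpy as np

def solve_w_b(X_a, Y_a, X_b, Y_b, C_a=1.0, C_b=1.0):
    """Closed-form transfer-learning readout, Eq. (28).

    X_a, X_b : reservoir state matrices (l_a x N and l_b x N)
    Y_a, Y_b : target values for the two domains
    """
    N = X_a.shape[1]
    R = np.eye(N) + C_b * X_b.T @ X_b + C_a * X_a.T @ X_a
    Z = C_b * X_b.T @ Y_b + C_a * X_a.T @ Y_a
    return np.linalg.solve(R, Z)  # solve R w = Z instead of forming R^{-1}
```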
Calculating the matrix inverse, as shown in Equation (28), can be computationally expensive and resource-intensive, especially for real-time applications and large matrices. This high demand for computational power and memory limits its practical use. To address this challenge, this paper proposes a novel recursive method for finding the inverse. This method offers several benefits, including
  • Fast convergence: It reaches the solution quickly, making it suitable for real-time applications.
  • Low computational complexity: It requires fewer calculations compared to traditional methods, reducing computational burden.
  • Improved numerical stability: It produces more accurate results, especially when dealing with ill-conditioned matrices.
To implement the recursive method, we require the following matrix inversion lemma:
$$(A + BCD)^{-1} = A^{-1} - A^{-1} B \left( C^{-1} + D A^{-1} B \right)^{-1} D A^{-1} \quad (29)$$
where $A \in \mathbb{R}^{n \times n}$, $C \in \mathbb{R}^{m \times m}$, $B \in \mathbb{R}^{n \times m}$, and $D \in \mathbb{R}^{m \times n}$. We define $R_t = C_a\, x_t^{aT} x_t^a + C_b\, x_t^{bT} x_t^b + I$, $Z_t = C_b\, x_t^{bT} Y_b + C_a\, x_t^{aT} Y_a$, and $W_b = R_t^{-1} Z_t$. Then,
$$R_t = R_{t-1} + C_a\, x_t^{aT} x_t^a + C_b\, x_t^{bT} x_t^b = R_{t-1} + \left[ x_t^a,\; x_t^b \right]^T \mathrm{diag}(C_a, C_b) \left[ x_t^a,\; x_t^b \right] \quad (30)$$
Applying (29) to (30) with $A = R_{t-1}$ and the regressor block $C = \left[ x_t^a,\; x_t^b \right]^T \in \mathbb{R}^{N \times 2}$ gives
$$R_t^{-1} = R_{t-1}^{-1} - R_{t-1}^{-1} C \left( \mathrm{diag}(C_a, C_b)^{-1} + C^T R_{t-1}^{-1} C \right)^{-1} C^T R_{t-1}^{-1} \quad (31)$$
If we write $P_t = R_t^{-1}$ and define the gain matrix $K_t = P_{t-1} C \left( I + C^T P_{t-1} C \right)^{-1}$, then
$$P_t = P_{t-1} - K_t C^T P_{t-1} = P_{t-1} - K_t \left[ x_t^a,\; x_t^b \right] P_{t-1} \quad (32)$$
So, the output weight updates can be performed using
$$W_t = P_t Z_t = P_t \left[ Z_{t-1} + x_t^{aT} y_t^a + x_t^{bT} y_t^b \right] = \left[ P_{t-1} - K_t C^T P_{t-1} \right] Z_{t-1} + P_t x_t^{aT} y_t^a + P_t x_t^{bT} y_t^b = W_{t-1} - K_t \left( C^T W_{t-1} - \begin{bmatrix} y_t^a \\ y_t^b \end{bmatrix} \right) = W_{t-1} - K_t \begin{bmatrix} e_a \\ e_b \end{bmatrix} \quad (33)$$
where the errors $e_a = x_t^a W_{t-1} - y_t^a$ and $e_b = x_t^b W_{t-1} - y_t^b$ can be easily computed. The scheme of the long-term prediction of time series with missing values using the echo state ARIMA model is shown in Figure 5.
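A sketch of one recursive update of (32)–(33) follows; it interprets the penalty block as $\mathrm{diag}(C_a, C_b)$ (which reduces to $I$ when $C_a = C_b = 1$) and assumes scalar outputs, so it is an illustration of the technique rather than the authors’ exact routine.

```python
import numpy as np

def rls_transfer_update(P, W, x_a, x_b, y_a, y_b, C_a=1.0, C_b=1.0):
    """One recursive update of the output weights, Eqs. (32)-(33).

    P        : inverse correlation matrix P_{t-1} (N x N)
    W        : output weights W_{t-1} (N,)
    x_a, x_b : reservoir states for the target/source samples (N,)
    """
    C = np.column_stack([x_a, x_b])                    # N x 2 regressor block
    S = np.diag([1.0 / C_a, 1.0 / C_b]) + C.T @ P @ C  # 2 x 2 innovation term
    K = P @ C @ np.linalg.inv(S)                       # gain K_t
    P_new = P - K @ C.T @ P                            # Eq. (32)
    e = C.T @ W - np.array([y_a, y_b])                 # errors e_a, e_b
    return P_new, W - K @ e                            # Eq. (33)
```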
For implementation, the ES-ARIMA algorithm is summarized in Algorithm 1.
Algorithm 1 ES-ARIMA
1: Input: the time series $y_a(k)$, $y_b(k)$ from datasets $\Lambda_a$, $\Lambda_b$; $\Lambda_a$ is the target domain and $\Lambda_b$ is the source domain.
2: Initialize two ESN networks with $N$ hidden neurons, random input weights $W^{in}$, and reservoir matrix $W$, for the target and source domains.
3: Use the RLS algorithm to obtain the AR model weights used as the ESN forgetting rate.
4: Calculate the hidden matrices $X_a$ and $X_b$ according to (21).
5: Compute the output weights $W^{out}$ using (33).
6: Return the output weights $W^{out}$ and the predicted output $Y_a$.
7: Output: the predicted values $y_a(k+m)$ and $y_b(k+m)$.
Here, several remarks on the proposed method are presented:
  • While the proposed method offers significant advantages in handling missing data and improving forecasting accuracy, it is important to acknowledge the potential increase in computational complexity compared to simpler approaches. This is particularly relevant for large-scale datasets.
  • The potential increase in computational cost needs to be weighed against the significant benefits gained in terms of accuracy and robustness, especially for applications where high-fidelity wind speed prediction is crucial. Further research can explore optimization techniques to further enhance the scalability of the proposed method for even larger datasets.
  • The effectiveness of the method can be influenced by the quality of the data, particularly the extent and pattern of missing data. A high percentage of missing values, especially if randomly distributed, could hinder the model’s ability to learn the underlying patterns in the data. An ideal approach might combine the method with data preprocessing techniques specifically tailored to the characteristics of the missing data.
  • While the proposed model offers significant improvements in accuracy by combining multiple techniques (ESN, ARIMA, frequency decomposition, and transfer learning), this complexity might come at the cost of interpretability. Understanding the exact contributions of each component to the final prediction can be challenging. Further research can explore additional methods for enhancing interpretability, such as visualization techniques or model-agnostic interpretable machine learning approaches. This will enable deeper understanding of the internal workings of the model and potentially lead to further improvements in its performance.
  • We acknowledge that the proposed method’s success is tied to the availability of sufficient historical data. However, the framework incorporates techniques that can use even smaller datasets effectively. Additionally, the transfer learning component allows the model to potentially benefit from knowledge learned from related domains, even if the target domain has limited data. While the ideal scenario involves abundant high-quality data, our approach offers advantages over existing methods by achieving a good baseline level of accuracy and capturing essential trends even with limited data availability.

4. Applications

To demonstrate the effectiveness of our Echo State ARIMA (ES-ARIMA) model for handling such data, we apply the proposed approach to real-world wind speed systems.
In recent decades, wind energy has emerged as a key solution to address the challenges of climate change and the growing demand for electricity. Its cleaner and more sustainable nature has propelled it to become one of the most important clean energy sources globally [32,33,34]. However, the efficiency of wind power generation is hampered by the inherent variability and uncertainty of wind speed and direction [35].
Wind speed forecasting, a crucial application of time series prediction, plays a vital role in mitigating these challenges. The existing prediction models can be broadly categorized into four types: physics-based, statistical, neural network, and fusion models. Each type is suitable for different time scales, with statistical models commonly employed for short-term forecasting and physical models favored for long-term predictions [20]. These time scales range from very short-term (seconds to 30 min) to very long-term (beyond 72 h).
This study utilizes three datasets from diverse geographical regions: Kaggle [36], Germany [37], and California [38]. We employ 75% of each dataset for training multiple ARIMA models and the remaining 25% for testing. Each dataset contains 18,000 records; we use the first 15,000 records for training the model and the remaining 3000 records for testing.
The dataset details are (1) Time Period: the date range covered by the data is one year. (2) Features: the datasets include wind speed, wind direction, temperature, pressure, and humidity. (3) Target variable: wind speed. (4) Time Step: hourly. We aggregate these hourly values to create a daily time series. (5) Missing Values: approximately 30 % .
  • Kaggle Dataset: It contains hourly values from 7 wind farms spanning from July 2009 to June 2012. We use data from the first wind farm between 1 July 2009 and 31 December 2010.
  • California Dataset: It records hourly energy production in California from renewable sources, including geothermal, biomass, biogas, mini-hydraulic, total wind, solar photovoltaic, and solar thermal. We use data from 1 September 2011 to 31 August 2012.
  • Germany Dataset: This dataset encompasses data from four wind farms: Tennet, 50 Hertz, TransnetBW, and Amprion. The data are from 1 January 2011 to 31 December 2011.
We use the following min–max normalization to scale the data to a similar range, leading to faster training and convergence of the neural networks:
$$y_{norm}(i) = \frac{y_i - \min(y)}{\max(y) - \min(y)} \quad (34)$$
where $y_i$ is each datum of the time series, and $\min(y)$ and $\max(y)$ are the minimum and maximum of the data $y$.
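In code, the scaling (34) and its inverse might look as follows (a small sketch; the NaN-aware extrema are an assumption for handling missing entries):

```python
import numpy as np

def min_max_normalize(y):
    """Min-max scaling of Eq. (34); returns the scaled series and the
    (min, max) pair needed to invert the transform after prediction."""
    y = np.asarray(y, dtype=float)
    y_min, y_max = np.nanmin(y), np.nanmax(y)  # NaN-aware for missing values
    return (y - y_min) / (y_max - y_min), (y_min, y_max)

def denormalize(y_norm, y_min, y_max):
    return y_norm * (y_max - y_min) + y_min
```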
For the long-term prediction of a time series,
$$y_{t+m} = f\left( y_{t-1}, y_{t-2}, \ldots \right) \quad (35)$$
where $m \geq 1$. In this paper, we select $m = 2$; since the records are daily, this corresponds to 48 h ahead. We consider 48 h forecasts in the simulations because
  • While short-term forecasts are most valuable for real-time operations, including 48 h forecasts, this showcases the proposed method’s ability to handle a wider range of prediction horizons.
  • Research suggests that incorporating information from longer time frames can sometimes improve the accuracy of shorter-term forecasts. By including 48 h data in the training process, we might be capturing underlying patterns that benefit the overall performance, even for the crucial short-term predictions.
The model is
$$y_{t+m} = ARIMA\left( y_{t-1}, y_{t-2}, \ldots \right) \quad (36)$$
Time series forecasting then proceeds recursively:
$$y_{t+m+1} = ARIMA\left( y_t, y_{t-1}, \ldots \right) \quad (37)$$
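For the recursive forecasts (36)–(37), predictions are fed back as inputs; the sketch below assumes a generic one-step predictor callable (e.g., a trained ES-ARIMA model).

```python
import numpy as np

def iterative_forecast(predict_one_step, history, m=2):
    """Iterated multi-step forecasting, Eqs. (36)-(37).

    predict_one_step : callable mapping the observed window to y_{t+1}
    history          : observed series up to time t (1-D array)
    m                : horizon; m = 2 corresponds to the 48 h forecasts
    """
    window = list(history)
    forecasts = []
    for _ in range(m):
        y_next = predict_one_step(np.asarray(window))
        forecasts.append(y_next)
        window.append(y_next)  # the prediction becomes the next input
    return forecasts
```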
Wind speed plays a critical role in determining electricity generation by wind turbines. However, wind speed datasets often exhibit limitations. High wind speeds, crucial for accurate forecasting, occur only for a small fraction of the year (around 0.1 % ). This imbalance can negatively impact prediction accuracy for high wind speeds compared to regular wind speeds.
To demonstrate the effectiveness of our transfer learning approach, we utilize the Kaggle dataset. It contains hourly wind speed data from 7 wind farms between July 2009 and June 2012. Despite the missing data, the wind speed data across the different farms exhibit substantial similarity due to their geographic proximity. This similarity is ideal for applying transfer learning techniques. These techniques can leverage knowledge gained from one dataset (source domain) to improve predictions for another (target domain). The histogram of wind speed data from different farms is illustrated in Figure 6.
The equation for the ES-ARIMA model is shown in (17). To determine the appropriate ARIMA model structure, we need to analyze the stationarity of the data and obtain the value of d. We use the Augmented Dickey–Fuller (ADF) test [39] for stationarity analysis. The results are presented in Table 1.
Critical values at the 1%, 5%, and 10% levels are used for the ADF hypothesis test. If the hypothesis is accepted, the series contains a unit root, indicating non-stationarity. For the Kaggle dataset, the initial test with zero differencing produces a statistic of −2.153, which is greater than all critical values, so the unit-root hypothesis cannot be rejected and the time series needs differencing. After one differencing, the hypothesis is rejected, making the Kaggle series stationary with d = 1.
ACF (Autocorrelation Function) and PAF (Partial Autocorrelation Function) are crucial tools for selecting the appropriate ARIMA model order (p, d, q). By analyzing their patterns, we can identify and address autocorrelations in our time series data, leading to a more accurate model. Figure 7 shows the ACF/PAF plots for the three datasets. Table 2 displays the ARIMA parameters obtained from ACF/PAF analysis and the ADF test. So, the optimal ARIMA models for the datasets are Kaggle—ARIMA(1,1,2), Germany—ARIMA(1,0,4), and California—ARIMA(0,1,2).
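The ADF-based choice of $d$ can be automated with statsmodels; the helper below is a sketch (the 0.05 significance level and the differencing cap are assumptions).

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, acf, pacf

def choose_d(y, alpha=0.05, max_d=2):
    """Difference the series until the ADF test rejects the unit root."""
    for d in range(max_d + 1):
        pvalue = adfuller(y)[1]
        if pvalue < alpha:  # stationary at this differencing order
            return d
        y = np.diff(y)
    return max_d

# ACF and PAF values then guide the choice of q and p, respectively:
# acf(y, nlags=20), pacf(y, nlags=20)
```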
The prediction results using the ES-ARIMA model are shown in Figure 8.
In the frequency decomposition, the transfer function of the filters is
$$G(y) = \frac{y + 0.17}{1 + 0.17\, y} \quad (38)$$
where 0.17 corresponds to the cutoff frequency.
Figure 9 depicts the prediction results with frequency decomposition and ES-ARIMA. While these results are better than using only ES-ARIMA, missing data still affect the accuracy.
We aim to enhance prediction accuracy by leveraging transfer learning. This technique allows the model to identify patterns and relationships common across different datasets, leading to more accurate and robust predictions. Our objective is to forecast data in Farm 6 by utilizing information from other farms (source domain).
As shown in Figure 10, transfer learning generally delivers favorable results when source and target domains exhibit high similarity. However, low similarity can lead to lower performance, as illustrated in Figure 9.
Despite the limitations in this case, transfer learning still outperforms directly supplementing Farm 6 data with data from Farm 5. This highlights the potential of transfer learning to improve predictions even with low similarity between source and target domains. By utilizing knowledge from related datasets, transfer learning can help to mitigate the impact of missing data in the target domain.

Comparisons

ARIMA Models vs. Neural Networks:
ARIMA Models: These models are widely used for time series forecasting and excel at identifying patterns and trends in historical data. However, they may struggle with complex relationships or nonlinear data.
Neural Networks: These powerful machine learning techniques can learn complex patterns and relationships, even in nonlinear data. They often outperform ARIMA models in terms of accuracy but require more computational power and data for training.
Comparison with Other Forecasting Methods:
We compared our ES-ARIMA model with several classical methods:
  • Multilayer Perceptron (MLP): A type of artificial neural network.
  • Classical Echo State Network (ESN): A type of recurrent neural network.
  • ARIMA: The standard ARIMA model.
  • T-ARIMA: ARIMA with classical transfer learning.
Evaluation Metrics:
The following metrics were used to compare forecasting errors:
$$MAE = \frac{1}{n} \sum_{i}^{n} \left| y_i - \hat{y}_i \right|, \qquad SMAPE = \frac{1}{n} \sum_{i}^{n} \frac{\left| y_i - \hat{y}_i \right|}{\left( \left| y_i \right| + \left| \hat{y}_i \right| \right)/2}$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i}^{n} \left( y_i - \hat{y}_i \right)^2}, \qquad R^2 = 1 - \frac{\sum_{i}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i}^{n} \left( y_i - \bar{y} \right)^2}, \quad \bar{y} = \frac{1}{n} \sum_{i}^{n} y_i$$
where $y_i$ and $\hat{y}_i$ denote the actual and predicted values. Mean Absolute Error (MAE): the average absolute difference between predicted and actual values. Symmetric Mean Absolute Percentage Error (SMAPE): an error metric less sensitive to outliers than other measures. Root Mean Squared Error (RMSE): the square root of the average squared difference between predicted and actual values. R-squared ($R^2$): the proportion of variance in the actual values explained by the model (closer to 1 indicates a better fit); note, however, that a high $R^2$ may reflect overfitting to the training data.
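For reference, the four metrics can be computed directly as follows (a plain NumPy sketch):

```python
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def smape(y, yhat):
    return np.mean(np.abs(y - yhat) / ((np.abs(y) + np.abs(yhat)) / 2))

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def r_squared(y, yhat):
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot
```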
Results and Discussion:
Table 3, Table 4 and Table 5 present the comparison results for the three datasets (California, Germany, and Kaggle).
  • California: The proposed ES-ARIMA model significantly outperforms the single ARIMA model, with improvements exceeding 50% in all metrics (e.g., a 74.37% improvement in MAE).
  • Germany: The proposed ES-ARIMA model exhibits an average improvement of 89% compared to the single ARIMA model.
  • Kaggle: The ES-ARIMA model achieves an average improvement of 35.68 % in SMAPE, RMSE, and R 2 compared to the other models.
The proposed ES-ARIMA models with frequency decomposition and transfer learning outperform the other classical models for long-term prediction with missing data. The ES-ARIMA model achieves significant improvements over ESN and classical ARIMA models, particularly for the Germany dataset (nearly 90 % improvement).
The comparative tables show that the proposed ES-ARIMA models significantly reduce prediction errors compared to the other methods. These models achieve lower MAE, SMAPE, and RMSE values, while exhibiting R 2 values closer to 1, indicating a superior fit to the data.

5. Conclusions

This paper proposes a novel approach to improve the accuracy of time series forecasting with missing data (an average improvement of 50%). By combining an Echo State Network (ESN) with ARIMA modeling, frequency decomposition, and transfer learning, we demonstrate significant improvements in prediction accuracy. Frequency decomposition addresses the specific characteristics of a time series, while online transfer learning mitigates the impact of missing data. The proposed model’s effectiveness is demonstrated in a wind speed forecasting case study, highlighting its practical value.
While the proposed model demonstrates effectiveness in the wind speed prediction case study, we are currently conducting further validation efforts to assess its generalizability to other domains and datasets. These efforts involve applying the model to diverse time series forecasting problems, analyzing its performance across different data characteristics. We may also investigate techniques from explainable AI to make the neural network predictions more interpretable, allowing for better understanding of the factors influencing wind speed forecasts.

Author Contributions

Methodology, X.L.; Software, Y.Z.; Writing—original draft, X.L.; Writing—review & editing, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Mexican CONAHCYT (Consejo Nacional de Humanidades, Ciencias y Tecnologias) grant CF-2023-I-2614.

Data Availability Statement

The data presented in this study are openly available in three repositories, see the links in references [36,37,38], accessed on 26 January 2024.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this article.

Abbreviations

The following abbreviations are used in this manuscript:
ESN: Echo State Network
ARIMA: Autoregressive Integrated Moving Average
LSTM: Long Short-Term Memory
NWP: Numerical Weather Prediction
DA: Domain Adaptation
ELM: Extreme Learning Machines
TL: Transfer Learning
EL: Ensemble Learning
EMD: Empirical Mode Decomposition
LMD: Local Mean Decomposition
MLP: Multilayer Perceptron
ES-ARIMA: Echo State ARIMA
AR: Autoregressive
MA: Moving Average
ADF: Augmented Dickey–Fuller
MAE: Mean Absolute Error
SMAPE: Symmetric Mean Absolute Percentage Error
RMSE: Root Mean Squared Error
$R^2$: R-squared

References

  1. Navas, R.K.B.; Prakash, S.; Sasipraba, T. Artificial Neural Network based computing model for wind speed prediction: A case study of Coimbatore, Tamil Nadu, India. Phys. A Stat. Mech. Its Appl. 2020, 542, 123383. [Google Scholar] [CrossRef]
  2. Ahadi, A.; Liang, X. Wind Speed Time Series Predicted by Neural Network. In Proceedings of the 2018 IEEE Canadian Conference on Electrical Computer Engineering (CCECE), Quebec, QC, Canada, 13–16 May 2018; pp. 1–4. [Google Scholar] [CrossRef]
  3. Madhiarasan, M. Accurate prediction of different forecast horizons wind speed using a recursive radial basis function neural network. Prot. Control Mod. Power Syst. 2020, 5, 22. [Google Scholar] [CrossRef]
  4. Liu, F.; Li, R.; Dreglea, A. Wind Speed and Power Ultra Short-Term Robust Forecasting Based on Takagi–Sugeno Fuzzy Model. Energies 2019, 12, 3551. [Google Scholar] [CrossRef]
  5. Dhunny, A.; Doorga, J.; Allam, Z.; Lollchund, M.; Boojhawon, R. Identification of optimal wind, solar and hybrid wind-solar farming sites using fuzzy logic modelling. Energy 2019, 188, 116056. [Google Scholar] [CrossRef]
  6. Shabbir, N.; AhmadiAhangar, R.; Katt, L.; Iqbal, M.N.; Rosin, A. Forecasting Short Term Wind Energy Generation using Machine Learning. In Proceedings of the 2019 IEEE 60th International Scientific Conference on Power and Electrical Engineering of Riga Technical University (RTUCON), Riga, Latvia, 7–9 October 2019; pp. 1–4. [Google Scholar] [CrossRef]
  7. Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; Wang, Y. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transp. Res. Part C Emerg. Technol. 2015, 54, 187–197. [Google Scholar] [CrossRef]
  8. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A Search Space Odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2222–2232. [Google Scholar] [CrossRef] [PubMed]
  9. Zhang, Z.; Zhu, Y.; Wang, X.; Yu, W. Optimal echo state network parameters based on behavioural spaces. Neurocomputing 2022, 503, 299–313. [Google Scholar] [CrossRef]
  10. Hu, H.; Wang, L.; Lv, S.X. Forecasting energy consumption and wind power generation using deep echo state network. Renew. Energy 2020, 154, 598–613. [Google Scholar] [CrossRef]
  11. Bai, Y.; Liu, M.D.; Ding, L.; Ma, Y.J. Double-layer staged training echo-state networks for wind speed prediction using variational mode decomposition. Appl. Energy 2021, 301, 117461. [Google Scholar] [CrossRef]
  12. Yatiyana, E.; Rajakaruna, S.; Ghosh, A. Wind speed and direction forecasting for wind power generation using ARIMA model. In Proceedings of the 2017 Australasian Universities Power Engineering Conference (AUPEC), Melbourne, VIC, Australia, 19–22 November 2017; pp. 1–6. [Google Scholar] [CrossRef]
  13. Elsaraiti, M.; Merabet, A.; Al-Durra, A. Time Series Analysis and Forecasting of Wind Speed Data. In Proceedings of the 2019 IEEE Industry Applications Society Annual Meeting, Baltimore, MD, USA, 29 September–3 October 2019; pp. 1–5. [Google Scholar] [CrossRef]
  14. Eldali, F.A.; Hansen, T.M.; Suryanarayanan, S.; Chong, E.K.P. Employing ARIMA models to improve wind power forecasts: A case study in ERCOT. In Proceedings of the 2016 North American Power Symposium (NAPS), Denver, CO, USA, 18–20 September 2016; pp. 1–6. [Google Scholar] [CrossRef]
  15. Zhang, Y.; Zhao, Y. Research on Wind Power Prediction Based on Time Series. In Proceedings of the 2021 IEEE International Conference on Artificial Intelligence and Industrial Design (AIID), Guangzhou, China, 28–30 May 2021; pp. 78–82. [Google Scholar] [CrossRef]
  16. Ren, X. Deep Learning-Based Weather Prediction: A Survey. Big Data Res. 2021, 23, 100178. [Google Scholar]
  17. Backhus, J.; Rao, A.R.; Venkatraman, C.; Padmanabhan, A.; Kumar, A.V.; Gupta, C. Equipment Health Assessment: Time Series Analysis for Wind Turbine Performance. arXiv 2024, arXiv:2403.00975. [Google Scholar]
  18. Wang, W.; Pedrycz, W.; Liu, X. Time series long-term forecasting model based on information granules and fuzzy clustering. Eng. Appl. Artif. Intell. 2015, 41, 17–24. [Google Scholar] [CrossRef]
  19. Lu, P.; Ye, L.; Tang, Y.; Zhao, Y.; Zhong, W.; Qu, Y.; Zhai, B. Ultra-short-term combined prediction approach based on kernel function switch mechanism. Renew. Energy 2021, 164, 842–866. [Google Scholar] [CrossRef]
  20. Singh, S.N.; Mohapatra, A. Repeated wavelet transform based ARIMA model for very short-term wind speed forecasting. Renew. Energy 2019, 136, 758–768. [Google Scholar]
  21. Maya, M.; Yu, W.; Li, X. Time series forecasting with missing data using neural network and meta-transfer learning. In Proceedings of the 2021 IEEE Symposium Series on Computational Intelligence (SSCI 2021), Orlando, FL, USA, 5–7 December 2021; pp. 1–6. [Google Scholar]
  22. Liu, T.; Wei, H.; Zhang, K. Wind power prediction with missing data using Gaussian process regression and multiple imputation. Appl. Soft Comput. 2018, 71, 905–916. [Google Scholar] [CrossRef]
  23. Wen, H.; Pinson, P.; Gu, J.; Jin, Z. Wind energy forecasting with missing values within a fully conditional specification framework. Int. J. Forecast. 2023, 40, 77–95. [Google Scholar] [CrossRef]
  24. Zhang, L.; Zhang, D. Domain Adaptation Extreme Learning Machines for Drift Compensation in E-Nose Systems. IEEE Trans. Instrum. Meas. 2015, 64, 1790–1801. [Google Scholar] [CrossRef]
  25. Ye, R.; Dai, Q. A novel transfer learning framework for time series forecasting. Knowl.-Based Syst. 2018, 156, 74–99. [Google Scholar] [CrossRef]
  26. Yunus, K.; Thiringer, T.; Chen, P. ARIMA-Based Frequency-Decomposed Modeling of Wind Speed Time Series. IEEE Trans. Power Syst. 2016, 31, 2546–2556. [Google Scholar] [CrossRef]
  27. Hu, H.; Wang, L.; Tao, R. Wind speed forecasting based on variational mode decomposition and improved echo state network. Renew. Energy 2021, 164, 729–751. [Google Scholar] [CrossRef]
  28. Rao, A.; Reimherr, M. Modern multiple imputation with functional data. Stat 2020, 10, e331. [Google Scholar] [CrossRef]
  29. Tian, Z.; Chen, H. Multi-step short-term wind speed prediction based on integrated multi-model fusion. Appl. Energy 2021, 298, 117248. [Google Scholar] [CrossRef]
  30. Liu, H.; Tian, H.-Q.; Li, Y.-F.; Zhang, L. Comparison of four Adaboost algorithm based artificial neural networks in wind speed predictions. Energy Convers. Manag. 2015, 92, 67–81. [Google Scholar] [CrossRef]
  31. Peng, Y.; Lei, M.; Li, J.B.; Peng, X.Y. A novel hybridization of echo state networks and multiplicative seasonal ARIMA model for mobile communication traffic series forecasting. Neural Comput. Appl. 2014, 24, 883–890. [Google Scholar] [CrossRef]
  32. Olabi, A. Renewable Energy and Energy Storage Systems. Energy 2017, 136, 1–6. [Google Scholar] [CrossRef]
  33. GWEC. Global Wind Report 2021; Technical report; Global Wind Energy Council: Lisbon, Portugal, 2021. [Google Scholar]
  34. Zuluaga, C.D.; Álvarez, M.A.; Giraldo, E. Short-term wind speed prediction based on robust Kalman filtering: An experimental comparison. Appl. Energy 2015, 156, 321–330. [Google Scholar] [CrossRef]
  35. Tang, Z.; Zhao, G.; Ouyang, T. Two-phase deep learning model for short-term wind direction forecasting. Renew. Energy 2021, 173, 1005–1016. [Google Scholar] [CrossRef]
  36. Kaggle. Kaggle-Global Energy Forecasting Competition 2012. 2012. Available online: https://www.kaggle.com/c/GEF2012-wind-forecasting/ (accessed on 26 January 2024).
  37. Germany. Netztransparenz-Informationsplattform der Deutschen Übertragungsnetzbetreiber. 2020. Available online: https://www.netztransparenz.de/en/ (accessed on 26 January 2024).
  38. California. California ISO-Renewables and Emissions Reports. 2013. Available online: https://www.caiso.com/market/Pages/ReportsBulletins/DailyRenewablesWatch.aspx (accessed on 26 January 2024).
  39. Dickey, D.A.; Fuller, W.A. Likelihood ratio statistics for autoregressive time series with a unit root. Econometrica 1981, 49, 1057–1072. [Google Scholar] [CrossRef]
Figure 1. ESN structure.
Figure 2. Echo state ARIMA model.
Figure 3. Frequency decomposition for echo state ARIMA model.
Figure 4. Transfer learning to improve the model ES-ARIMA.
Figure 5. Scheme of the long-term prediction of time series with missing values using echo state ARIMA model.
Figure 6. The histogram of wind speed.
Figure 7. ACF/PAF analysis of the three datasets. (a) ACF analysis of Kaggle Farm 1 data; (b) PAF analysis of Kaggle Farm 1 data; (c) ACF analysis of California data; (d) PAF analysis of California data; (e) ACF analysis of Germany TenneTTSO data; (f) PAF analysis of Germany TenneTTSO data.
Figure 8. The prediction of Kaggle using ES-ARIMA: Farm 6.
Figure 9. The prediction of Kaggle using frequency decomposition: Farm 6.
Figure 10. The prediction of using transfer learning: Farm 5.
Table 1. Dickey–Fuller testing results of the datasets.

Serie       | ADF Statistic | p-Value | 1%   | 5%   | 10%  | H0             | d
Kaggle1     | −2.2          | 0.2     | −3.4 | −2.9 | −2.6 | not stationary | 0
Kaggle2     | −7.9          | 0.041   | −3.4 | −2.9 | −2.6 | stationary     | 1
Germany     | −2.9          | 0.045   | −3.4 | −2.8 | −2.5 | stationary     | 0
California1 | −1.2          | 0.66    | −3.4 | −2.8 | −2.5 | not stationary | 0
California2 | −9.7          | 0.086   | −3.4 | −2.8 | −2.5 | stationary     | 1
Table 2. ARIMA model parameters obtained with the tests of ACF, PAF, and the augmented Dickey–Fuller test.

Serie      | p           | d | q
Germany    | 1,2         | 0 | 1,2,3,4,5
California | 1,2,3,4,5,6 | 1 | 0
Kaggle     | 1,2         | 1 | 1,2,3
Table 3. Comparison of forecasting errors with California data.

Models   | MAE  | SMAPE | RMSE | R-Squared
MLP      | 1.23 | 0.42  | 4.84 | 0.82
ESN      | 2.46 | 1.63  | 5.37 | 0.79
T-ARIMA  | 1.67 | 0.87  | 2.26 | 0.89
ARIMA    | 1.92 | 1.21  | 3.65 | 0.83
ES-ARIMA | 0.61 | 0.07  | 1.26 | 0.93
Table 4. Comparison of forecasting errors with Germany data.

Models   | MAE   | SMAPE | RMSE  | R-Squared
MLP      | 0.26  | 0.72  | 0.39  | 0.91
ESN      | 0.487 | 0.73  | 0.763 | 0.85
T-ARIMA  | 0.73  | 0.58  | 0.81  | 0.81
ARIMA    | 0.82  | 0.71  | 0.57  | 0.89
ES-ARIMA | 0.073 | 0.14  | 0.069 | 0.95
Table 5. Comparison of forecasting errors with Kaggle data.

Models   | MAE  | SMAPE | RMSE | R-Squared
MLP      | 1.82 | 0.95  | 3.78 | 0.83
ESN      | 1.45 | 1.64  | 2.97 | 0.87
T-ARIMA  | 1.82 | 0.92  | 2.93 | 0.88
ARIMA    | 1.51 | 1.14  | 2.87 | 0.92
ES-ARIMA | 1.24 | 0.26  | 2.73 | 0.91
