Article

Multiple-Reservoir Hierarchical Echo State Network

School of Control Science and Engineering, Bohai University, Jinzhou 121013, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2023, 11(18), 3961; https://doi.org/10.3390/math11183961
Submission received: 8 August 2023 / Revised: 6 September 2023 / Accepted: 6 September 2023 / Published: 18 September 2023

Abstract

The Leaky Integrator Echo State Network (Leaky-ESN) is an effective model for time series prediction problems. However, the coupling of all neurons within its single reservoir makes the Leaky-ESN less effective for sophisticated learning tasks. In this paper, we propose an improvement to the Leaky-ESN model called the Multiple-Reservoir Hierarchical Echo State Network (MH-ESN). By introducing a new mechanism for constructing the reservoir, the efficiency of the network in handling training tasks is improved. The reservoir of the MH-ESN is built with a hierarchical structure: the network consists of multiple layers, each comprising a multi-reservoir echo state network model. The sub-reservoirs within each layer are linked via principal neurons, which mimics the functioning of a biological neural network. As a result, the coupling among neurons in the reservoir is decreased, and the internal dynamics of the reservoir are enriched. Simulation results show that the MH-ESN achieves significantly better prediction accuracy than the Leaky-ESN on complex time series prediction tasks.

1. Introduction

Both pattern recognition and time series prediction rely on first building an accurate model of the underlying system. Although it has been proven theoretically that fuzzy models can approximate any nonlinear system to arbitrary accuracy, their ability to map nonlinearities is considerably limited by the linear functions used in the consequent parts of the fuzzy rules. For systems with strong nonlinearity, a very large number of rules is required to model the system accurately.
The echo state network (ESN) is a simplified variant of the recurrent neural network (RNN) [1] that replaces the hidden layer with a reservoir consisting of a large number of sparsely connected neurons. It memorizes data by modifying the states of the neurons within the reservoir. The main feature of the ESN is that only the connection weights from the reservoir to the output layer are trained; all other weights remain unchanged, which avoids the local minima that tend to occur when training traditional RNNs. This simplified training method enables the ESN to achieve good performance when the echo state requirements are met [2]. ESNs have been used effectively in fields such as time series forecasting [3], dynamic pattern recognition [4], speech recognition [5], and nonlinear signal processing [6].
However, conventional ESN reservoirs consist of randomly connected neurons, and the main tunable parameters are the number of reservoir neurons and the spectral radius of the internal connection matrix. In fact, ESNs with the same size and spectral radius can show significant differences in training performance [7]. Therefore, many scholars have proposed improvements to the ESN, e.g., modifying the topology of the reservoir [8,9], refining the state update equation of the network, and choosing better algorithms for the training process [10,11].
To enhance both prediction accuracy and speed, wavelet neurons were introduced into the internal structure of the reservoir of the conventional ESN [12]. Minimum complexity reservoirs and simple cycle reservoirs (SCR), which have properties similar to those of stochastic reservoirs, are discussed in [13]. In [14], a novel small-world ESN is proposed, which dynamically corrects the random sparse network of the reservoir with an improved Newman–Watts (NW) model to increase the prediction accuracy and convergence speed of the ESN. A new optimization method constrains the optimization of the ESN using the interior-point penalty function method and then further optimizes the parameters of the ESN [15]. The double activation functions echo state network (DAF-ESN) replaces the single activation function of the reservoir with a weighted superposition of several activation functions; the modified state transition equations make the network model more flexible and responsive to distinct input signals [16]. An optimization procedure that evaluates the output weights based on a fixed number of training patterns is designed in [17], and the evaluation results demonstrate that it can increase the prediction performance of the ESN. An echo state network that clusters the neurons of a growing reservoir to improve prediction accuracy is proposed in [18]. In [19], an improved gravitational search algorithm is applied to the hyperparameter optimization of the echo state network to improve its prediction performance.
Since the ESN reservoir has a stochastic network topology, it is difficult for it to meet the demands of predicting time series with different characteristics. Furthermore, an ESN with only a single reservoir has significant limitations: it was previously shown that an ESN cannot be trained to operate as a Multiple Superimposed Oscillator (MSO) [20]. Wierstra states that the MSO problem fails because all reservoir neurons are coupled to one another, whereas the solution requires multiple decoupled dynamics within the reservoir [21].
In this study, we introduce a new ESN model, termed the Multiple-Reservoir Hierarchical Echo State Network (MH-ESN), to overcome the above-mentioned limitations. The model joins several sub-reservoirs of different complexities into a new total reservoir, which reduces the coupling between neurons present in the single reservoir of the conventional ESN. Furthermore, the Xwish function [22] is used as the new activation function to mitigate gradient vanishing during network training. The proposed model is compared with the GRU and Leaky-ESN models on two synthetic time series and a real dataset. The results show that the MH-ESN greatly improves time series prediction performance, which proves the effectiveness of the proposed model.

2. Leaky-ESN and Xwish Function

2.1. Leaky-ESN

The leaky integrator echo state network (Leaky-ESN) is a modified ESN model [23]. It replaces the normal neurons in the ESN reservoir with leaky integrator neurons, giving the sparse network the ability to learn continuous, slowly varying dynamics. Leaky integrator neurons act as low-pass filters on the reservoir neurons, thereby improving the short-term memory capacity of the ESN and enabling it to adapt to the temporal characteristics of different learning tasks. The Leaky-ESN and the regular ESN have identical network structures, comprising three primary components: the input layer, the reservoir, and the output layer. The reservoir replaces the fixed hidden layer of the RNN and functions as the core of the ESN. During training, the reservoir maps the input data from a low-dimensional input space to a nonlinear, high-dimensional state variable, which is then linearly fitted to the desired output, simplifying the network training process. The classic layout of the Leaky-ESN is shown in Figure 1.
In Figure 1, $u(n)$ denotes the input variable at time $n$, $x(n)$ the internal state variable of the reservoir, and $y(n)$ the output vector of the output layer. The input connection weight matrix of the Leaky-ESN is $W^{in}$, whose elements are randomly generated numbers between 0 and 1. $W$ denotes the randomly generated weight matrix for the internal connections within the reservoir, which must satisfy a sparsity requirement to ensure the stability of the internal states. $W^{fb}$ represents the output feedback connection weight matrix of the network, and its generation process is similar to that of $W^{in}$. $W^{out}$ is the output connection weight matrix, which is obtained through training. The formulas for updating the state can be expressed as follows:
$x(n+1) = (1-a)\,x(n) + f\left(W^{in}u(n+1) + Wx(n) + W^{fb}y(n)\right)$, (1)
$y(n) = f^{out}\left(W^{out}\left[x(n); u(n)\right]\right)$, (2)
where $a \in (0, 1]$ is the leakage rate; when the leakage rate equals 1, the leaky integrator neuron reduces to a regular neuron. $s^{in}$ is the input scaling factor, and $\rho$ is the spectral radius of the internal connection matrix of the reservoir. The activation function $f$ employed by the internal neurons within the reservoir typically takes a sigmoidal form, such as the Sigmoid or Tanh function, while the output activation function $f^{out}$ is typically defined to be the identity function. Throughout training, $W$ and $W^{in}$ are generated randomly during the initialization of the network, and $W^{out}$ is calculated via the network training.
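For illustration, a minimal NumPy sketch of the reservoir update in Equation (1) is given below, with output feedback omitted ($W^{fb} = 0$). This is not the implementation used in the paper; the matrix initialization, the sparsity level, and the tanh nonlinearity are assumptions chosen only to make the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and weights (a = 0.8 and spectral radius 0.66 follow Table 1).
n_in, n_res = 1, 60
W_in = rng.uniform(0.0, 1.0, (n_res, n_in))           # input weights in [0, 1]
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W[rng.random((n_res, n_res)) > 0.05] = 0.0             # enforce a sparse reservoir
W *= 0.66 / max(abs(np.linalg.eigvals(W)))             # rescale to the desired spectral radius
a = 0.8                                                # leakage rate

def leaky_esn_step(x, u, f=np.tanh):
    """One leaky-integrator reservoir update, Eq. (1) with W_fb = 0."""
    return (1.0 - a) * x + f(W_in @ u + W @ x)

# Example: drive the reservoir with a constant input for a few steps.
x = np.zeros(n_res)
for _ in range(10):
    x = leaky_esn_step(x, np.array([0.5]))
```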
Different from the gradient-based training algorithms adopted by RNNs, the ESN usually uses linear regression to train the output weights $W^{out}$. This paper adopts the ridge regression algorithm to solve for $W^{out}$, with the following equation:
$W^{out} = \left(X^{T}X + \theta I\right)^{-1} X^{T} Y$, (3)
where $\theta$ is a regularization hyperparameter, $I$ is the identity matrix, $X$ collects the reservoir states over the training samples, and $Y$ collects the corresponding desired outputs.
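The readout training of Equation (3) amounts to a few lines of linear algebra. The sketch below is not the authors' code; the variable names and the use of NumPy are assumptions.

```python
import numpy as np

def train_readout(states: np.ndarray, targets: np.ndarray, theta: float = 1e-6) -> np.ndarray:
    """Ridge-regression readout, Eq. (3): W_out = (X^T X + theta*I)^(-1) X^T Y.

    states  -- X, shape (n_samples, n_state): collected reservoir states (optionally
               concatenated with the inputs, as in Eq. (2))
    targets -- Y, shape (n_samples, n_outputs): desired outputs
    """
    n_state = states.shape[1]
    gram = states.T @ states + theta * np.eye(n_state)   # X^T X + theta * I
    w_out = np.linalg.solve(gram, states.T @ targets)    # solve instead of explicit inversion
    return w_out                                          # shape (n_state, n_outputs)
```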
The normalized root mean square error (NRMSE) is utilized both as an assessment tool for evaluating the training performance of Leaky-ESN and as a benchmark for gauging the effectiveness of the improved network model that follows. The expression is as follows:
$\mathrm{NRMSE}(y, d) = \dfrac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left\|y(i) - d(i)\right\|^{2}}}{\sigma(d)}$, (4)
where $y(i)$ denotes the $i$th value of the actual output, $d(i)$ denotes the $i$th value of the desired output, $\|\cdot\|$ denotes the Euclidean distance, and $\sigma(d)$ denotes the standard deviation of the desired output.
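For a scalar output, Equation (4) reduces to the following short function (a sketch; NumPy is an assumption):

```python
import numpy as np

def nrmse(y: np.ndarray, d: np.ndarray) -> float:
    """Normalized root mean square error, Eq. (4), for 1-D output sequences."""
    return float(np.sqrt(np.mean((y - d) ** 2)) / np.std(d))
```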
The stability of an ESN during training depends on whether the network model satisfies the echo state property (ESP) [24]. The echo state property means that once the network reaches a stable state, the initial state has almost no impact on it, and the states of the reservoir neurons are determined solely by the current inputs and outputs. At this point, the network gradually approaches a balanced state and exhibits the echo state property. The echo state property is the foundation for ensuring the stability of ESN training. In practical training, to guarantee the echo state property of the ESN, the spectral radius of the reservoir connection weight matrix is kept below 1. To reduce the complexity of echo state network prediction, the ESN model typically ignores the influence of output feedback on the internal state by setting the output feedback matrix to zero.

2.2. Xwish Function

The reservoir state update equation shows that the activation function plays a central role in how the reservoir state evolves. Traditional ESN models generally choose a sigmoidal function as the activation function.
In order to mitigate the effect of gradient vanishing caused by S-shaped functions on the prediction accuracy during network training, the Xwish function [22] is used as a new activation function for the echo state network in this paper. Its formula is as follows:
$\mathrm{Xwish}(x) = \dfrac{x}{\pi}\left(\arctan(\beta x) + 0.5\pi\right)$, (5)
where $\beta$ is a tunable parameter. Figure 2 depicts the Xwish function for various values of $\beta$; it is smooth over the whole real line. Figure 3 shows the first-order derivative of the Xwish function for different values of $\beta$, which is also smooth over the real line.
The Xwish function slows down gradient vanishing, which improves convergence accuracy, and the function is centered at 0, which improves the efficiency of the weight updates. In addition, its weights can be continuously updated without affecting the next training step, which maintains the diversity of the input data and results in better convergence.
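A short sketch of Equation (5) and its first derivative (used later in Equations (15)–(17)) is given below; the choice of NumPy and the default $\beta = 1$ are assumptions for illustration only.

```python
import numpy as np

def xwish(x: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Xwish activation, Eq. (5): x * (arctan(beta*x) + 0.5*pi) / pi."""
    return x * (np.arctan(beta * x) + 0.5 * np.pi) / np.pi

def xwish_grad(x: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """First derivative of Xwish, obtained with the product rule (the curve in Figure 3)."""
    sig = (np.arctan(beta * x) + 0.5 * np.pi) / np.pi
    return sig + x * beta / (np.pi * (1.0 + (beta * x) ** 2))
```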

3. Multiple Reservoirs Hierarchical Echo State Network

3.1. MH-ESN Model

Because the neurons inside a conventional reservoir are connected randomly and sparsely, its topology is unclear. We therefore conjecture that a reservoir with a clearer topology may have better prediction performance. Some studies have shown that a hierarchical reservoir topology can improve the prediction performance of the ESN [25]. By integrating a hierarchical topology with the reservoir design of the ESN, a multi-reservoir hierarchical echo state network (MH-ESN) is proposed based on the Leaky-ESN model. The MH-ESN comprises multiple layers, with each layer consisting of a multi-reservoir echo state network model. Its topology is shown in Figure 4. The sub-reservoirs of each layer are connected through main neurons, each of which represents the state of its sub-reservoir by averaging the states of the neurons within it. The connections between the main neurons form a virtual sub-reservoir, shown by the red dashed box, which differs from the actual sub-reservoirs in that the states of its neurons can vary greatly. In this paper, a two-layer network structure is chosen, with each layer having three sub-reservoirs.
In Figure 4, the input signal is denoted by $u(n)$, the state of the whole reservoir by $x(n)$, and the output signal by $y(n)$. The numbers of neurons in the two layers are $N_1$ and $N_2$, respectively. The first layer consists of sub-reservoirs of sizes $N_{1i}$ ($i \in \{1, 2, 3\}$), and the second layer of sub-reservoirs of sizes $N_{2i}$ ($i \in \{1, 2, 3\}$). The size $N$ of the total reservoir equals the sum of the sizes of all sub-reservoirs. The connection weight matrix of each sub-reservoir is generated in the same way as the connection weight matrix of a single reservoir. The weight matrix $W$ of the total reservoir is given in block matrix form as follows:
$W = \begin{bmatrix} W_A & W_{AB} \\ W_{AB}^{T} & W_B \end{bmatrix}$, (6)
where $W_A$ represents the connections within the two layers of actual sub-reservoirs, $W_B$ corresponds to the interconnections within the virtual reservoir, and $W_{AB}$ contains the connections between the main neurons and the other neurons within each sub-reservoir. The expression of $W_A$ is as follows:
$W_A = \begin{bmatrix} W_1 & 0 \\ 0 & W_2 \end{bmatrix}$, (7)
where $W_1$ and $W_2$ are the connection weight matrices of the first and second layers, respectively. Using the Xwish function as the new activation function, the normalized state update equations of the discretized MH-ESN model are expressed as follows:
$x(n+1) = (1-a)\begin{bmatrix} x_A(n) \\ x_B(n) \end{bmatrix} + \Phi\left(s^{in}\begin{bmatrix} W_A^{in} \\ W_B^{in} \end{bmatrix} u(n+1) + \rho W \begin{bmatrix} x_A(n) \\ x_B(n) \end{bmatrix} + s^{fb} W^{fb} y(n)\right)$, (8)
$y(n) = f^{out}\left(W^{out}\left[x(n); u(n)\right]\right)$, (9)
where $x(n) = [x_A(n), x_B(n)]^T$ is the total reservoir state, $x_B(n)$ is the state of the main neurons in the virtual reservoir, $x_A(n) = [x_1(n), x_2(n)]^T$ is the state of the two layers of actual sub-reservoirs, $W_A^{in}$ is the input weight matrix of the two layers, $W_B^{in}$ is the input weight matrix of the virtual sub-reservoir, $W^{fb}$ is the feedback weight matrix, $s^{fb}$ is the feedback scaling factor, and $\Phi$ is the Xwish function.
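The block structure of Equations (6) and (7) can be sketched in code as follows. This is only a structural illustration, not the authors' construction: the random value ranges, the sparsity level, one main neuron per sub-reservoir, and the use of NumPy/SciPy are all assumptions.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)

def sparse_reservoir(n, sparsity=0.1):
    """Random sparse sub-reservoir weight matrix (illustrative values)."""
    w = rng.uniform(-0.5, 0.5, (n, n))
    w[rng.random((n, n)) > sparsity] = 0.0
    return w

# Two layers with three sub-reservoirs each, sized 12, 10, and 8 as in Section 4.1.
sub_sizes = [12, 10, 8, 12, 10, 8]
W_A = block_diag(*[sparse_reservoir(n) for n in sub_sizes])   # Eq. (7): block-diagonal layers
n_A, n_B = W_A.shape[0], len(sub_sizes)                       # one main neuron per sub-reservoir
W_B = rng.uniform(-0.5, 0.5, (n_B, n_B))                      # fully connected virtual reservoir
W_AB = rng.uniform(-0.5, 0.5, (n_A, n_B))                     # main-neuron <-> sub-reservoir links
W = np.block([[W_A, W_AB], [W_AB.T, W_B]])                    # Eq. (6): total reservoir matrix
```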

3.2. Optimizing the Global Parameters of MH-ESN

In order for the MH-ESN model to have the echo state property and to ensure the stability of the network during training, we set $W^{fb} = 0$, so that the model does not contain output feedback. The reservoir state update equation of the MH-ESN model then becomes:
$x(n+1) = (1-a)\,x(n) + \Phi\left(s^{in}W^{in}u(n+1) + \rho W x(n)\right)$. (10)
In Equation (10), the parameters $a$, $s^{in}$, and $\rho$ should be optimized. The training objective of the MH-ESN model is to minimize the error: the output weight matrix $W^{out}$ is trained so that the predicted output $y(n)$ is as close as possible to the true value $d(n)$. The error $e(n)$ is given in Equation (11), and to facilitate the computation of the gradient, the error function is defined as in Equation (12).
$e(n) = d(n) - y(n)$, (11)
$E(n) = \frac{1}{2}e^{2}(n)$. (12)
Let $q \in \{a, s^{in}, \rho\}$ denote a global parameter, and apply stochastic gradient descent to optimize the global parameters. The partial derivative of the error function with respect to $q$ can be expressed as
$\dfrac{\partial E(n+1)}{\partial q} = -e(n+1)\,W^{out}\left[\dfrac{\partial x(n+1)}{\partial q}; \dfrac{\partial u(n+1)}{\partial q}\right]$. (13)
Since the partial derivative of the input variable u ( n ) with respect to the global parameter q is 0, the simplified expression for the partial derivative of all global parameters is shown below:
$\dfrac{\partial E(n+1)}{\partial q} = -e(n+1)\,W^{out}\left[\dfrac{\partial x(n+1)}{\partial q}; 0\right]$. (14)
In the above equation, it is also necessary to calculate the partial derivative of the reservoir state with respect to the global parameter $q$. To simplify the calculation, the identity function is chosen as the output activation function, and we set $X(n+1) = s^{in}W^{in}u(n+1) + \rho W x(n)$. The partial derivatives of the state with respect to the global parameters $a$, $\rho$, and $s^{in}$ can then be obtained as
$\dfrac{\partial x(n+1)}{\partial a} = (1-a)\dfrac{\partial x(n)}{\partial a} - x(n) + \Phi'\left(X(n+1)\right)\rho W \dfrac{\partial x(n)}{\partial a}$, (15)
$\dfrac{\partial x(n+1)}{\partial \rho} = (1-a)\dfrac{\partial x(n)}{\partial \rho} + \Phi'\left(X(n+1)\right)\left(\rho W \dfrac{\partial x(n)}{\partial \rho} + W x(n)\right)$, (16)
$\dfrac{\partial x(n+1)}{\partial s^{in}} = (1-a)\dfrac{\partial x(n)}{\partial s^{in}} + \Phi'\left(X(n+1)\right)\left(W^{in}u(n+1) + \rho W \dfrac{\partial x(n)}{\partial s^{in}}\right)$. (17)
Substituting $\partial x(n+1)/\partial q$ back into Equation (14) yields the iterative formula for optimizing the parameters of the MH-ESN model during training:
$q(n+1) = q(n) - \mu\dfrac{\partial E(n+1)}{\partial q}$, (18)
where $\mu$ is the learning rate for $q$ in the parameter optimization process. The corrected values obtained from the optimization search must still ensure that the MH-ESN model retains the echo state property.
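As an illustration of the update loop in Equations (14), (16), and (18), the sketch below optimizes only the spectral radius $\rho$; the derivatives with respect to $a$ and $s^{in}$ in Equations (15) and (17) follow the same pattern. It is a sketch under assumptions, not the authors' code: argument names are invented, and tanh stands in for $\Phi$ (the Xwish functions from the earlier sketch could be used instead).

```python
import numpy as np

def train_rho(u_seq, d_seq, x0, W, W_in, W_out, a, rho, s_in, mu=0.01,
              phi=np.tanh, dphi=lambda z: 1.0 - np.tanh(z) ** 2):
    """Online gradient update of rho via Eqs. (14), (16), and (18)."""
    x, dx_drho = x0, np.zeros_like(x0)
    for n in range(len(u_seq) - 1):
        z = s_in * (W_in @ u_seq[n + 1]) + rho * (W @ x)               # X(n+1)
        # Eq. (16): recursive derivative of the state with respect to rho
        dx_drho = (1.0 - a) * dx_drho + dphi(z) * (rho * (W @ dx_drho) + W @ x)
        x = (1.0 - a) * x + phi(z)                                      # Eq. (10)
        e = d_seq[n + 1] - W_out @ np.concatenate([x, u_seq[n + 1]])    # Eq. (11)
        # Eq. (14): dE/drho = -e * W_out [dx/drho; 0], then the update of Eq. (18)
        grad = -e @ (W_out @ np.concatenate([dx_drho, np.zeros_like(u_seq[n + 1])]))
        rho = rho - mu * grad
    return rho
```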

4. Simulation and Analysis

To evaluate the feasibility and effectiveness of the proposed MH-ESN model, two synthetic time series are selected for prediction in this section, namely the MSO time series and the Mackey–Glass chaotic time series, together with a real ECG dataset. The global parameters of the ESN models are trained using gradient descent. The normalized root mean square error (NRMSE) of the predictions is used as the performance metric to compare the MH-ESN, gated recurrent unit (GRU) [26], and Leaky-ESN models. The GRU is generated with MATLAB R2022b built-in functions. Because the reservoir is generated randomly, the prediction results of the network differ even when the hyperparameters are identical. The initial parameters were chosen empirically to give good results, and on this basis each experiment was run 30 times and the test results were averaged. The initial parameters of each network are shown in Table 1. In the table, the leakage rate $a$ indicates the degree to which the previous state is retained when the reservoir state is updated; the spectral radius $\rho$ is an important parameter for guaranteeing the ESP; the input scaling factor $s^{in}$ scales the input data; the learning rate $\mu$ is the step size at which the model parameters are updated in each iteration; and the reservoir size $N$ is the number of neurons in the reservoir.

4.1. MSO Time Series

The MSO time series is created using the following equation:
$u(n) = \sin(n) + \sin(0.51n) + \sin(0.22n) + \sin(0.1002n) + \sin(0.05343n)$. (19)
The expected output is $d(n) = u(n-5)$. In this experiment, the numbers of neurons in the two layers of the MH-ESN model were set equal, i.e., $N_1 = N_2 = 30$. Each layer has three sub-reservoirs with 12, 10, and 8 neurons, and each sub-reservoir is assigned a different sparsity according to its size. The MH-ESN model thus has six sub-reservoirs in total, so the virtual sub-reservoir contains six main neurons. The neurons inside each sub-reservoir are sparsely connected. Using the same input sequence and parameters for excitation, the MSO time series is trained with each model separately, and the NRMSE is selected as the performance evaluation index.
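For reference, the MSO input of Equation (19) and its 5-step-delayed target can be generated as follows (a sketch; the sequence length is an assumption, and the first five wrapped samples fall inside the washout segment anyway):

```python
import numpy as np

n = np.arange(2000)
u = (np.sin(n) + np.sin(0.51 * n) + np.sin(0.22 * n)
     + np.sin(0.1002 * n) + np.sin(0.05343 * n))      # Eq. (19)
d = np.roll(u, 5)                                     # d(n) = u(n - 5)
```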
Figure 5 shows the predicted versus expected values of the MH-ESN model for the MSO time series; it can be seen that the MH-ESN model fits the expected output curve well after the washout segment. Table 2 shows the prediction results of the three models trained for MSO time series prediction; the MH-ESN achieves better prediction accuracy than the GRU and Leaky-ESN models on the MSO problem. This indicates that the topology of the MH-ESN can enhance the predictive performance of the ESN on the MSO time series.

4.2. Mackey-Glass Chaotic Time Series

The Mackey–Glass chaotic time series is extensively employed to assess the ability of the ESN to identify chaotic nonlinear systems. The series is generated by a first-order time-delay differential equation, given as follows:
$\dfrac{du(t)}{dt} = \dfrac{\alpha u(t-\tau)}{1 + u^{\beta}(t-\tau)} - \gamma u(t)$, (20)
where the chaos parameters are $\alpha = 0.2$, $\beta = 10$, and $\gamma = 0.1$. The time series exhibits chaotic behavior when $\tau > 16.8$. In this section, we set $\tau = 17$ and $u(0) = 1.2$. Using the same input vector, initialization parameters, and activation function, the NRMSE is used as the measure of performance.
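Equation (20) can be integrated numerically to generate the series, for example with a simple Euler scheme as sketched below. The unit step size, the series length, and the integration scheme are assumptions for illustration; the paper does not specify how the series was generated.

```python
import numpy as np

def mackey_glass(n_steps=3000, tau=17, alpha=0.2, beta=10.0, gamma=0.1, u0=1.2, dt=1.0):
    """Euler integration of the Mackey-Glass delay equation, Eq. (20)."""
    history = int(tau / dt)                      # number of delayed samples to keep
    u = np.full(n_steps + history, u0)           # constant initial history u(0) = 1.2
    for t in range(history, n_steps + history - 1):
        delayed = u[t - history]                 # u(t - tau)
        u[t + 1] = u[t] + dt * (alpha * delayed / (1.0 + delayed ** beta) - gamma * u[t])
    return u[history:]
```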
Figure 6 shows the predicted versus expected values of the MH-ESN model for the MG chaotic time series; it can be seen that after more than 1000 steps, the predicted curve of the MH-ESN model fits the expected output curve well. Table 3 shows the prediction results of the three models trained for MG time series prediction. The MH-ESN model achieves higher prediction accuracy than the GRU and Leaky-ESN models, and the simulation results illustrate that the MH-ESN model gives better predictions for the MG chaotic time series.

4.3. ECG

Electrocardiogram (ECG) data reflect the physiological activity of the human heart and are often used as one of the diagnostic criteria for heart conditions in medicine. With the continuous improvement of detection equipment, acquiring ECG signals has become easy, which also facilitates their analysis. In this paper, the ECG signal is taken from the MIT-BIH arrhythmia database.
Figure 7 shows the predicted versus expected values of the MH-ESN model for the ECG time series; it can be seen that the MH-ESN model fits the expected output curve well after the washout segment. Table 4 shows the prediction results of the three models trained for ECG time series prediction. In this experiment, the prediction results of the Leaky-ESN are not as good as those of the GRU, but the MH-ESN model greatly improves the prediction accuracy and performs the best among the three.

5. Conclusions

This paper proposes a multi-reservoir hierarchical echo state network (MH-ESN). Compared with the reservoir structure of the traditional ESN, the reservoir topology of the MH-ESN model organizes the neurons in a hierarchical manner. Each layer is composed of multiple sub-reservoirs of echo state network models, and the sub-reservoirs in each layer are connected through main neurons. This design more closely simulates the hierarchical structure of a real biological neural network and improves the stability and prediction accuracy of the network. In the MH-ESN model, the connections between the main neurons are treated as a virtual sub-reservoir, and the virtual sub-reservoir is fully connected among the main neurons. The performance of the MH-ESN is verified by predicting the MSO time series, the Mackey–Glass chaotic time series, and the ECG time series. The simulation results show that the MH-ESN can further improve the prediction accuracy of the ESN and is a more reliable time series prediction model.

Author Contributions

Methodology, Z.S.; Software, Z.S.; Validation, Z.S.; Data curation, Z.S.; Writing—original draft preparation, Z.S.; Writing—review and editing, S.L., M.L. and L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Jaeger, H. The “Echo State” Approach to Analysing and Training Recurrent Neural Networks-with an Erratum Note; Technical Report; German National Research Center for Information Technology GMD: Bonn, Germany, 2001; Volume 148, p. 13. [Google Scholar]
  2. Zhang, B.; Miller, D.J.; Wang, Y. Nonlinear system modeling with random matrices: Echo state networks revisited. IEEE Trans. Neural Netw. Learn. Syst. 2011, 23, 175–182. [Google Scholar] [CrossRef] [PubMed]
  3. Li, D.; Han, M.; Wang, J. Chaotic time series prediction based on a novel robust echo state network. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 787–799. [Google Scholar] [CrossRef] [PubMed]
  4. Skowronski, M.D.; Harris, J.G. Noise-robust automatic speech recognition using a predictive echo state network. IEEE Trans. Audio Speech Lang. Process. 2007, 15, 1724–1730. [Google Scholar] [CrossRef]
  5. Skowronski, M.D.; Harris, J.G. Automatic speech recognition using a predictive echo state network classifier. Neural Netw. 2007, 20, 414–423. [Google Scholar] [CrossRef]
  6. Xia, Y.; Jelfs, B.; Van Hulle, M.M.; Príncipe, J.C.; Mandic, D.P. An augmented echo state network for nonlinear adaptive filtering of complex noncircular signals. IEEE Trans. Neural Netw. 2010, 22, 74–83. [Google Scholar]
  7. Jaeger, H.; Haas, H. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 2004, 304, 78–80. [Google Scholar] [CrossRef]
  8. Liu, H.; Yang, H.; Wang, D. Robust speed prediction of high-speed trains based on improved echo state networks. Neural Comput. Appl. 2021, 33, 2351–2367. [Google Scholar] [CrossRef]
  9. Li, Q.; Wu, Z.; Zhang, H. Spatio-temporal modeling with enhanced flexibility and robustness of solar irradiance prediction: A chain-structure echo state network approach. J. Clean. Prod. 2020, 261, 121151. [Google Scholar] [CrossRef]
  10. Liu, J.; Sun, T.; Luo, Y.; Yang, S.; Cao, Y.; Zhai, J. Echo state network optimization using binary grey wolf algorithm. Neurocomputing 2020, 385, 310–318. [Google Scholar] [CrossRef]
  11. Han, M.; Xu, M. Predicting multivariate time series using subspace echo state network. Neural Process. Lett. 2015, 41, 201–209. [Google Scholar] [CrossRef]
  12. Wang, Z.Q.; Sun, Z.G. Method for prediction of multi-scale time series with WDESN. J. Electron. Meas. Instrum. 2010, 24, 947–952. [Google Scholar] [CrossRef]
  13. Rodan, A.; Tino, P. Minimum complexity echo state network. IEEE Trans. Neural Netw. 2010, 22, 131–144. [Google Scholar] [CrossRef] [PubMed]
  14. Lun, X.; Lin, J.; Yao, X. Time series prediction with an improved echo state network using small world network. Science 2015, 41, 1669–1679. [Google Scholar]
  15. Lun, S.X.; Hu, H.F. Parameter optimization of leak integrator echo state network with internal-point penalty function method. Acta Autom. Sin. 2017, 43, 1160–1168. [Google Scholar]
  16. Lun, S.X.; Yao, X.S.; Qi, H.Y.; Hu, H.F. A novel model of leaky integrator echo state network for time-series prediction. Neurocomputing 2015, 159, 58–66. [Google Scholar] [CrossRef]
  17. Li, G.; Niu, P.; Zhang, W.; Zhang, Y. Control of discrete chaotic systems based on echo state network modeling with an adaptive noise canceler. Knowl.-Based Syst. 2012, 35, 35–40. [Google Scholar] [CrossRef]
  18. Jing, Z.; Lun, S.; Liu, C.; Sun, Z. SOC Estimation of Lithium Batteries Based on Cluster-Growing Leaky Intergrator Echo State Network. In Proceedings of the 2022 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS), Dalian, China, 11–12 December 2022; pp. 68–71. [Google Scholar]
  19. Lun, S.; Zhang, Z.; Li, M.; Lu, X. Parameter Optimization in a Leaky Integrator Echo State Network with an Improved Gravitational Search Algorithm. Mathematics 2023, 11, 1514. [Google Scholar] [CrossRef]
  20. Xue, Y.; Yang, L.; Haykin, S. Decoupled echo state networks with lateral inhibition. Neural Netw. 2007, 20, 365–376. [Google Scholar] [CrossRef]
  21. Schmidhuber, J.; Gagliolo, M.; Wierstra, D.; Gomez, F. Evolino for recurrent support vector machines. arXiv 2005, arXiv:cs/0512062. [Google Scholar]
  22. Liu, Y.Q.; Wang, T.H.; Xu, X. A novel adaptive activation function for deep learning neural networks. J. Jilin Univ. Sci. Ed. 2019, 57, 857–859. [Google Scholar]
  23. Jaeger, H.; Lukoševičius, M.; Popovici, D.; Siewert, U. Optimization and applications of echo state networks with leaky-integrator neurons. Neural Netw. 2007, 20, 335–352. [Google Scholar] [CrossRef] [PubMed]
  24. Yildiz, I.B.; Jaeger, H.; Kiebel, S.J. Re-visiting the echo state property. Neural Netw. 2012, 35, 1–9. [Google Scholar] [CrossRef] [PubMed]
  25. Na, X.; Ren, W.; Xu, X. Hierarchical delay-memory echo state network: A model designed for multi-step chaotic time series prediction. Eng. Appl. Artif. Intell. 2021, 102, 104229. [Google Scholar] [CrossRef]
  26. Dey, R.; Salem, F.M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; pp. 1597–1600. [Google Scholar]
Figure 1. Structure of Leaky-ESN.
Figure 2. Image of the Xwish function.
Figure 3. The derivative of the Xwish function.
Figure 4. MH-ESN topology structure.
Figure 5. Comparison of predicted and expected values of MSO time series.
Figure 6. Comparison between predicted value and expected value of Mackey–Glass chaotic time series.
Figure 7. Comparison between predicted value and expected value of ECG time series.
Table 1. Initial network parameters.

Network              Parameter                       Value
Leaky-ESN, MH-ESN    Leakage rate a                  0.8
                     Spectral radius ρ               0.66
                     Input scaling factor s^in       0.3
                     Learning rate μ                 0.01
                     Reservoir size N                60
GRU                  Learning rate μ                 0.01
                     N                               60
Table 2. Comparison of the performance of the three models against MSO time series.

Network      NRMSE
GRU          2.16 × 10⁻³
Leaky-ESN    8.86 × 10⁻⁵
MH-ESN       6.06 × 10⁻⁷

Table 3. Comparison of the performance of the three models against Mackey–Glass time series.

Network      NRMSE
GRU          3.49 × 10⁻³
Leaky-ESN    5.02 × 10⁻⁴
MH-ESN       6.59 × 10⁻⁶

Table 4. Comparison of the performance of the three models against ECG time series.

Network      NRMSE
GRU          8.37 × 10⁻³
Leaky-ESN    7.16 × 10⁻²
MH-ESN       4.66 × 10⁻⁵