*3.2. Neural Network Architectures with Additional Features*

A deep recurrent neural network was created for forecasting. It consists of two recurrent layers followed by several dense layers (see Figure 1).

**Figure 1.** Architecture of SFC processing with a neural network.
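
As an illustration, a minimal Keras sketch of this base architecture might look as follows. The layer type (LSTM) and all sizes are assumptions for demonstration, not the tuned configuration from the hyperparameter search:

```python
from tensorflow.keras import layers, models

def build_forecaster(window_size=64, rnn_units=32, dense_units=(16, 8)):
    """Two stacked recurrent layers followed by several dense layers.
    All sizes here are illustrative placeholders, not the tuned values."""
    inputs = layers.Input(shape=(window_size, 1))              # one observation per time step
    x = layers.LSTM(rnn_units, return_sequences=True)(inputs)  # first recurrent layer
    x = layers.LSTM(rnn_units)(x)                              # second recurrent layer
    for units in dense_units:                                  # several dense layers
        x = layers.Dense(units, activation="relu")(x)
    outputs = layers.Dense(1)(x)                               # one-step-ahead forecast
    return models.Model(inputs, outputs)
```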

While the general architecture of the network remained the same, the number of layers and the number of neurons in each layer varied according to the hyperparameter optimization process.

Hyperparameter optimization may improve the performance of neural networks and can be used to adapt a commonly used architecture to a specific domain [51–53]. In this research, the following hyperparameters are varied:

- the number of recurrent layers;
- the number of dense layers;
- the number of neurons in each layer.
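
A minimal grid search over such hyperparameters could look like the sketch below. It reuses the hypothetical `build_forecaster` constructor from above; the grid values and the dummy arrays standing in for the real preprocessed windows are assumptions:

```python
import itertools
import numpy as np

# Illustrative grid; the actual ranges and search strategy are not specified here.
grid = {
    "rnn_units": [16, 32, 64],
    "dense_units": [(16,), (16, 8), (32, 16)],
}

# Dummy data standing in for the real preprocessed windows.
x_train = np.random.rand(256, 64, 1); y_train = np.random.rand(256, 1)
x_val   = np.random.rand(64, 64, 1);  y_val   = np.random.rand(64, 1)

best_loss, best_cfg = np.inf, None
for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    model = build_forecaster(**cfg)            # hypothetical constructor from the sketch above
    model.compile(optimizer="adam", loss="mse")
    history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                        epochs=5, verbose=0)
    val_loss = min(history.history["val_loss"])
    if val_loss < best_loss:                   # keep the best configuration found
        best_loss, best_cfg = val_loss, cfg
```
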
Several recurrent layers were used in the neural network architecture. Deep recurrent neural networks allow for greater flexibility than one-layer networks and serve as a powerful model for chaotic sequential data. Deep RNNs have been used for forecasting tasks and achieved better performance than shallow recurrent architectures [59,60]. Neural networks of similar architecture have been applied to the analysis of indoor navigation [61], climate data [62], human activity classification [63], and health assessment [64]. These results, combined with the differences in the analyzed data, motivated the choice of a deep RNN architecture. Such a combination of deep RNNs and MSM algorithms had not been used to process climate and physics data prior to this paper.

The enrichment process occurs between data processing and neural network construction. We should note that the statistical model created on the window **X** is a characteristic of that entire window, not a time-dependent characteristic of any specific observation contained in it. This also applies to the features based on that model.
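
For illustration, the sketch below computes one feature vector per window; the statistics used here are placeholders, not the actual SFC features from the paper. The point is that the result has one row per window, not one per time step:

```python
import numpy as np

def window_features(window):
    """Window-level statistics: one K-sized vector per window.
    Placeholder statistics, not the actual SFC features."""
    return np.array([window.mean(), window.std(),
                     window.min(), window.max()])

windows = np.random.rand(256, 64)                            # 256 windows of N = 64 observations
features = np.stack([window_features(w) for w in windows])   # shape (256, K = 4)
```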

There are several methods of passing the features to the neural network. The simplest is to create a multi-input model by adding the statistical features to the data flow after the recurrent layers (see Figure 2a). Unfortunately, this also means that the recurrent layers are trained without any information derived from the SFC.

**Figure 2.** Methods of passing features to the neural network: (**a**) multi-input model; (**b**) adding features to window; (**c**) hidden state initialization.
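
A minimal Keras sketch of the multi-input variant (Figure 2a) might be the following; the layer sizes and input names are hypothetical:

```python
from tensorflow.keras import layers, models

def build_multi_input(window_size=64, n_features=4, rnn_units=32):
    """Figure 2a sketch: SFC features join the data flow only after
    the recurrent layers, so those layers never see them."""
    seq_in  = layers.Input(shape=(window_size, 1), name="window")
    feat_in = layers.Input(shape=(n_features,), name="sfc_features")
    x = layers.LSTM(rnn_units, return_sequences=True)(seq_in)
    x = layers.LSTM(rnn_units)(x)
    x = layers.Concatenate()([x, feat_in])      # features enter here, after the RNN
    x = layers.Dense(16, activation="relu")(x)
    out = layers.Dense(1)(x)
    return models.Model([seq_in, feat_in], out)
```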

In the second approach (see Figure 2b), the additional data are added to the window itself. The input vector for the neural network consists of the original *N* window observations, with the additional *K* SFC features appended to the end or the beginning of the data vector. Adding time-independent data to a vector of time observations may create a harder learning task for the neural network. This approach has been used before but proved to give worse accuracy than hidden state initialization [65,66].
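
In NumPy terms, this enrichment might look as follows; the array shapes are illustrative:

```python
import numpy as np

# windows: (batch, N) observations; features: (batch, K) SFC features
windows  = np.random.rand(256, 64)
features = np.random.rand(256, 4)

# Append the K time-independent features to the end of each N-observation
# vector, producing inputs of length N + K (Figure 2b).
enriched = np.concatenate([windows, features], axis=1)   # shape (256, 68)
model_input = enriched[..., np.newaxis]                  # add a channel dim for the RNN
```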

Finally, the approach presented in Figure 2c directly affects the hidden state of the recurrent layers. The additional features for each training sample are transformed into a *K*-sized vector **v** that defines the internal state of the recurrent layer:

$$\mathbf{v} = \mathbf{W}\mathbf{x} + \mathbf{b},$$

where **x** is the vector of features, and **W** and **b** are trainable weights. These weights can be obtained with a single additional dense layer placed before the recurrent layers of the neural network as part of the enrichment process. For the first time step, the resulting tensor is added to the hidden state of the RNN. This allows the RNN to be conditioned on the additional features while avoiding an increase in the complexity of model training.
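
One common way to realize this scheme (Figure 2c) in Keras is to pass the dense layer's output as the `initial_state` of the first recurrent layer; the sketch below does this with a GRU, whose single hidden state matches the size of **v**. This framing and all sizes are assumptions, not the authors' exact implementation:

```python
from tensorflow.keras import layers, models

def build_state_init(window_size=64, n_features=4, rnn_units=32):
    """Figure 2c sketch: a single dense layer maps the feature vector x
    to v = Wx + b, which initializes the hidden state of the first
    recurrent layer at the first time step."""
    seq_in  = layers.Input(shape=(window_size, 1))
    feat_in = layers.Input(shape=(n_features,))
    v = layers.Dense(rnn_units)(feat_in)       # v = Wx + b with trainable W, b
    # GRU keeps a single hidden state; its initial value is set to v.
    x = layers.GRU(rnn_units, return_sequences=True)(seq_in, initial_state=v)
    x = layers.GRU(rnn_units)(x)
    x = layers.Dense(16, activation="relu")(x)
    out = layers.Dense(1)(x)
    return models.Model([seq_in, feat_in], out)
```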
