*3.2. LSTM*

Traditional feedforward neural networks only accept information from input nodes and do not "remember" inputs across time steps [31]. Thus, they cannot extract hidden features with long-term dependencies from raw data. LSTM was proposed to overcome this shortcoming through its long-term memory capability [16]. It is a special kind of recurrent neural network (RNN) that implements a memory function through gate structures within each cell, as shown in Figure 1. The key component of the LSTM cell is the upper horizontal line, which works like a conveyor belt: the cell state passes along it largely unchanged. The cell deletes old information or adds new information through three gate structures: the forget gate, the input gate, and the output gate. The outputs of the three gates and the candidate cell state are denoted by $f_t$, $i_t$, $o_t$, and $\hat{C}_t$, as expressed in the following formulas:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{7}$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{8}$$

$$\hat{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \tag{9}$$

$$C_t = f_t \ast C_{t-1} + i_t \ast \hat{C}_t \tag{10}$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{11}$$

$$h_t = o_t \ast \tanh(C_t) \tag{12}$$

where $C_t$ represents the memory cell state, which retains the old useful information $f_t \ast C_{t-1}$ and adds the new information $i_t \ast \hat{C}_t$; $W_f$, $W_i$, $W_C$, $W_o$ and $b_f$, $b_i$, $b_C$, $b_o$ are the weight matrices and bias vectors of the corresponding gates; $\sigma$ is the *sigmoid* activation function; $h_{t-1}$ is the LSTM output of the previous time step; and $x_t$ is the input data.
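To make the data flow through Eqs. (7)–(12) concrete, the following is a minimal NumPy sketch of one forward step of an LSTM cell. All names here (`lstm_step`, the parameter dictionary layout, the toy dimensions) are illustrative assumptions for this sketch, not from the paper or any particular library.

```python
import numpy as np

# Minimal sketch of one LSTM cell step following Eqs. (7)-(12).
# Names and shapes are illustrative assumptions, not from the paper.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    """One forward step of an LSTM cell.

    x_t:    input vector at time t, shape (n_x,)
    h_prev: previous hidden state h_{t-1}, shape (n_h,)
    C_prev: previous cell state C_{t-1}, shape (n_h,)
    params: weights W_* of shape (n_h, n_h + n_x), biases b_* of shape (n_h,)
    """
    z = np.concatenate([h_prev, x_t])                   # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])    # forget gate, Eq. (7)
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])    # input gate, Eq. (8)
    C_hat = np.tanh(params["W_C"] @ z + params["b_C"])  # candidate state, Eq. (9)
    C_t = f_t * C_prev + i_t * C_hat                    # cell state update, Eq. (10)
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])    # output gate, Eq. (11)
    h_t = o_t * np.tanh(C_t)                            # hidden state, Eq. (12)
    return h_t, C_t

# Toy usage with random parameters: n_x = 4 inputs, n_h = 3 hidden units.
rng = np.random.default_rng(0)
n_x, n_h = 4, 3
params = {f"W_{g}": rng.standard_normal((n_h, n_h + n_x)) * 0.1 for g in "fiCo"}
params.update({f"b_{g}": np.zeros(n_h) for g in "fiCo"})
h_t, C_t = lstm_step(rng.standard_normal(n_x), np.zeros(n_h), np.zeros(n_h), params)
```

Note that the elementwise products in Eqs. (10) and (12) let the forget and output gates scale each component of the cell state independently, which is what allows the cell to retain some features over long horizons while discarding others.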

**Figure 1.** The structure of Long Short-Term Memory (LSTM) cells.
