*2.5. Reversible Aging Model Based on LSTM*

#### Long Short-Term Memory Networks

Through the previous voltage decomposition, we can obtain the reversible aging voltage *Vr*, which is a time-series sequence. Recurrent neural networks (RNNs) have strong non-linear modeling ability for time-series data and have achieved great success and wide application in natural language processing (NLP) [43] and time-series problems [44]. With the novel construction of the input gate, the forget gate, and the output gate, the LSTM network can overcome the gradient vanishing or exploding problem from which traditional RNNs suffer [25,35]. In this paper, the LSTM network is applied to capture the voltage recovery information based on the reversible aging components. Figure 4a,b illustrates the LSTM architecture and a single LSTM cell, respectively.

**Figure 4.** (**a**) LSTM architecture. (**b**) The single cell of LSTM.

At every time step, the LSTM unit receives the current input *xt* and the previous hidden state *ht*−1, as Figure 4b shows. The expression of the input gate *it* can be written as:

$$i\_t = \sigma(W\_{xi} x\_t + W\_{hi} h\_{t-1} + b\_i) \tag{14}$$

The forget gate *ft* determines which input information should be ignored from the history memory and it is defined as:

$$f\_t = \sigma(W\_{xf} x\_t + W\_{hf} h\_{t-1} + b\_f) \tag{15}$$

Meanwhile, the candidate value of the memory state $\tilde{C}\_t$ is defined as:

$$\tilde{C}\_t = \tanh(W\_{xc} x\_t + W\_{hc} h\_{t-1} + b\_c) \tag{16}$$

Combining Equations (14)–(16), we can obtain the expression to update the cell state:

$$C\_t = f\_t \odot C\_{t-1} + i\_t \odot \tilde{C}\_t \tag{17}$$

The output gate *ot* is responsible for the final output and it is used to update the hidden state *ht* based on the current cell state *Ct*. They can be written as follows:

$$o\_t = \sigma(W\_{xo} x\_t + W\_{ho} h\_{t-1} + b\_o) \tag{18}$$

$$h\_t = o\_t \odot \tanh(C\_t) \tag{19}$$

where *σ* is the activation function, for which the sigmoid function is chosen; *Wxi*, *Whi*, *Wxf*, *Whf*, *Wxc*, *Whc*, *Wxo*, and *Who* are the weight matrices of each gate; *bi*, *bf*, *bc*, and *bo* are the bias vectors; and ⊙ denotes element-wise multiplication.
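
For clarity, a minimal NumPy sketch of a single LSTM cell update implementing Equations (14)–(19) is given below. The weight matrices, bias vectors, and dimensions (`n_in`, `n_hidden`) are illustrative placeholders, not the configuration used in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM cell update following Equations (14)-(19).

    W is a dict of weight matrices (W_xi, W_hi, ..., W_ho);
    b is a dict of bias vectors (b_i, b_f, b_c, b_o)."""
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + b["i"])      # input gate, Eq. (14)
    f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + b["f"])      # forget gate, Eq. (15)
    C_tilde = np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])  # candidate state, Eq. (16)
    C_t = f_t * C_prev + i_t * C_tilde                            # cell state update, Eq. (17)
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + b["o"])      # output gate, Eq. (18)
    h_t = o_t * np.tanh(C_t)                                      # hidden state, Eq. (19)
    return h_t, C_t

# Illustrative dimensions and random parameters (assumptions, not the paper's setup)
n_in, n_hidden = 2, 8
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_hidden, n_in if k.startswith("x") else n_hidden)) * 0.1
     for k in ["xi", "hi", "xf", "hf", "xc", "hc", "xo", "ho"]}
b = {k: np.zeros(n_hidden) for k in ["i", "f", "c", "o"]}
h, C = np.zeros(n_hidden), np.zeros(n_hidden)
h, C = lstm_step(rng.standard_normal(n_in), h, C, W, b)
```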

The residual components of the voltage data were smoothed again by the LOESS algorithm with a window size of 20 to remove random noise and spikes before being sent to the LSTM network. After smoothing, the reversible aging voltage data of the PEMFCs and the time information of the characterization tests were input into the LSTM network as features. The network structure consists of four parts: a sequence input layer, an LSTM layer with a maximum of 300 neurons in the hidden layer, a fully connected layer with one response, and a regression layer. The maximum sliding window size is 300; the loss function is the RMSE; the optimizer is Adam; the epoch size is 200; and the initial learning rate is 0.005.
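
A minimal sketch of this network structure and training configuration is shown below, written here with PyTorch. The original implementation framework is not specified in the text, so the framework choice, class and variable names, and the synthetic training tensors are assumptions; the hidden size of 300, single-response fully connected layer, Adam optimizer, 200 epochs, initial learning rate of 0.005, and RMSE loss follow the description above.

```python
import torch
import torch.nn as nn

class ReversibleVoltageLSTM(nn.Module):
    """Sequence input -> LSTM layer -> fully connected layer with one response."""
    def __init__(self, n_features=2, hidden_size=300):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)    # regression output: reversible voltage

    def forward(self, x):                      # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])          # predict from the last step of the window

model = ReversibleVoltageLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)   # initial learning rate 0.005
mse = nn.MSELoss()

# X_train, y_train: sliding-window samples built from the smoothed reversible voltage
# and characterization-time features (placeholder random tensors for illustration).
X_train = torch.randn(64, 300, 2)
y_train = torch.randn(64, 1)

for epoch in range(200):                       # 200 training epochs
    optimizer.zero_grad()
    loss = torch.sqrt(mse(model(X_train), y_train))   # RMSE loss
    loss.backward()
    optimizer.step()
```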

We used the reversible voltage data and the characterization-time information from FC1 and FC2 to build samples for training the network. We selected 50%, 70%, and 80% of the sample data as the training set, with the remainder used as the test set. The network's output is the reversible voltage at the next time step. Moreover, as shown in Figure 5, we adopted a sliding-window strategy during the training process of the LSTM; a minimal sketch of this sample construction follows Figure 5. By setting the sliding window size reasonably, we can feed the information from multiple time steps together as the feature input of the LSTM, improving the model's prediction ability for time-series data.

**Figure 5.** Sliding-window strategy during LSTM training.
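
The sliding-window sample construction can be sketched as follows (a minimal NumPy illustration; the window length of 300 follows the description above, while the function name, array names, and the synthetic series are placeholders):

```python
import numpy as np

def make_windows(series, window=300):
    """Build (input window, next-step target) pairs from a 1-D time series,
    so that several past time steps are fed to the LSTM as one feature input."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])   # window of past reversible-voltage values
        y.append(series[i + window])     # reversible voltage at the next time step
    return np.asarray(X), np.asarray(y)

# Placeholder series standing in for the smoothed reversible aging voltage
v_r = np.sin(np.linspace(0, 20, 2000)) * 0.01 + 3.3
X, y = make_windows(v_r, window=300)
print(X.shape, y.shape)   # (1700, 300) (1700,)
```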
