*2.7. Training*

The model has a large number of unknown parameters that need to be identified by mathematical optimisation: the four learnable parameters *ω*<sub>0</sub><sup>∗</sup> to *ω*<sub>3</sub><sup>∗</sup>, and 4 · *n* + 1 parameters each in the parameter vectors *θ*<sub>*f*∗</sub> and *θ*<sub>*g*∗</sub> of the two learnable functions *f*<sup>∗</sup> and *g*<sup>∗</sup>, with *n* the number of hidden neurons.
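The count of 4 · *n* + 1 parameters per function is consistent with, for instance, a single-hidden-layer network with two inputs, *n* hidden neurons, and one scalar output. A minimal sketch, assuming this hypothetical architecture (the actual inputs of *f*<sup>∗</sup> and *g*<sup>∗</sup> follow from the model equations):

```python
import torch.nn as nn

# Hypothetical single-hidden-layer network with two inputs (e.g. SOC and
# battery current), n hidden neurons, and one scalar output.
n = 100
f = nn.Sequential(nn.Linear(2, n), nn.Tanh(), nn.Linear(n, 1))

# 2n input weights + n hidden biases + n output weights + 1 output bias:
num_params = sum(p.numel() for p in f.parameters())
assert num_params == 4 * n + 1  # 401 for n = 100
```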

Due to the small amount of available training data, we split the training into two consecutive steps: first, we trained a static network with the CCCV data; afterwards, we used the pulsed data to take the battery dynamics into account. One has to keep in mind that at steady state all current flows through the charge-transfer resistance *R*<sub>1</sub> of the RC circuit, while the double-layer capacitance *C*<sub>1</sub> captures transient phenomena.

In detail, in the first step we neglected the double-layer capacitance. Therefore, the differential Equation (12) was converted into the algebraic equation

$$
v\_{\rm RC1} = R\_1(\mathrm{SOC}, i\_{\rm bat}) \cdot i\_{\rm bat}.\tag{15}
$$
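In this static limit the RC pair reduces to a purely resistive drop. A minimal sketch of Equation (15), where the resistance lookup `r1` is a hypothetical placeholder for the learned *R*<sub>1</sub>(SOC, *i*<sub>bat</sub>):

```python
def v_rc1_static(soc, i_bat, r1):
    """Steady-state voltage drop across the RC pair, Equation (15).

    `r1` is a hypothetical callable returning the charge-transfer
    resistance R1(SOC, i_bat) in ohms.
    """
    return r1(soc, i_bat) * i_bat

# Illustrative use with a constant 5 mOhm resistance (made-up value):
v = v_rc1_static(0.5, 2.0, lambda soc, i: 0.005)  # 0.01 V
```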

We trained the resulting simplified GB model using the data covering the six CCCV charging and discharging processes with different C-rates. We initialised the learnable parameters *ω*<sub>0</sub><sup>∗</sup>, *ω*<sub>2</sub><sup>∗</sup>, and *ω*<sub>3</sub><sup>∗</sup> and the learnable functions *f*<sup>∗</sup> and *g*<sup>∗</sup> of the simplified model as discussed above. As we chose a constant hysteresis voltage for non-zero battery currents, it is important to provide appropriate values for low currents. We decided to set currents with an absolute value |*i*<sub>bat</sub>| < 0.25 A to zero. Additionally, we had to provide the initial SOC value. As there was a rest phase before the start of each data set, we assumed that the battery was initially at equilibrium and therefore represented by the OCV curve. We inverted the OCV(SOC) curve to determine the respective SOC value from the initial voltage. As mentioned above, the Dopri8 method was used to solve Equation (11) with an absolute tolerance of 10<sup>−5</sup> and a relative tolerance of 10<sup>−3</sup>. We performed backpropagation with the standard odeint method from torchdiffeq. An Adam optimiser with a learning rate decaying from 10<sup>−2</sup> to 10<sup>−3</sup> minimised the loss function. The loss function was defined as the sum of the root mean squared error (RMSE) between the simulated and the measured battery voltage and an additional penalisation term: approximated SOC values lower than 0 or higher than 1 were penalised by one hundred times their absolute deviation from 0 or 1. As we had already initialised the other learnable parameters according to the insights from the measurement data, we only optimised *θ*<sub>*f*∗</sub> and *θ*<sub>*g*∗</sub> during the first 50 training epochs. The total number of training epochs was varied; it is a hyperparameter of the training process that controls the number of complete passes through the training data set.
During each training epoch, the six data sets were presented to the model in random order, and all time series were used in full. The optimisation steps were carried out with stochastic gradient descent. The parameters were stored whenever the total training loss of an epoch decreased.
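The loss described above might be sketched as follows; whether the out-of-range SOC deviations are summed or averaged over the trajectory is our assumption:

```python
import torch

def training_loss(v_sim, v_meas, soc_sim):
    """RMSE between simulated and measured voltage plus a penalty of one
    hundred times the absolute SOC deviation from the range [0, 1].
    Summing the deviations (rather than averaging) is an assumption."""
    rmse = torch.sqrt(torch.mean((v_sim - v_meas) ** 2))
    below = torch.clamp(-soc_sim, min=0.0)       # deviation below 0
    above = torch.clamp(soc_sim - 1.0, min=0.0)  # deviation above 1
    return rmse + 100.0 * (below + above).sum()

# Perfect voltage fit, one SOC sample overshooting 1 by 0.1
# -> loss of roughly 100 * 0.1 = 10:
loss = training_loss(torch.tensor([3.6, 3.7]),
                     torch.tensor([3.6, 3.7]),
                     torch.tensor([0.5, 1.1]))
```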

In the second step, we used the complete GB model according to Equations (11) to (14) for further training. Therefore, we initialised *ω*<sub>1</sub><sup>∗</sup> as stated previously; the other parameters were taken from the pre-trained model. The initial SOC was determined as before. Additionally, we had to provide an initial value for the voltage drop *v*<sub>RC1</sub> across the RC circuit. Due to the preceding rest phase, we assumed *v*<sub>RC1</sub>(*t* = 0) = 0 V. The standard odeint backpropagation was used again. We chose Dopri8 as the differential equation solver with an absolute tolerance of 10<sup>−5</sup> and a relative tolerance of 10<sup>−3</sup>. As before, the loss function was defined as the sum of the RMSE loss of the model output compared to the measured voltage and the penalisation term. The training loss was minimised by an Adam optimiser with a learning rate of 10<sup>−3</sup>. During the first ten training epochs, we only considered the data from the charging and discharging processes with a pulsed battery current; afterwards, we also considered the data from charging and discharging with the CCCV protocol. Additionally, we froze all learnable parameters except *ω*<sub>1</sub><sup>∗</sup> during the first 20 training epochs. Overall, we carried out 30 training epochs with batch gradient descent.
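Freezing all learnable parameters except *ω*<sub>1</sub><sup>∗</sup> during the early epochs amounts to toggling their gradient flags. A sketch with hypothetical stand-in variables (the names are illustrative, not the authors' actual ones):

```python
import torch

# Hypothetical stand-ins for the learnable quantities of the GB model.
omega_1 = torch.nn.Parameter(torch.tensor(0.1))
other_params = [torch.nn.Parameter(torch.randn(3)) for _ in range(4)]

def set_trainable(epoch, freeze_until=20):
    """Freeze everything except omega_1 for the first `freeze_until` epochs."""
    for p in other_params:
        p.requires_grad_(epoch >= freeze_until)
    omega_1.requires_grad_(True)

set_trainable(epoch=5)   # only omega_1 receives gradients
set_trainable(epoch=25)  # all parameters trainable again
```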

To further test our approach, we investigated GB models with different numbers of neurons in *f*<sup>∗</sup> and *g*<sup>∗</sup>. Furthermore, we varied the number of training epochs in the first training step between 100 and 1000, leaving training step two unchanged. The results of this study will be discussed in Section 3. We decided to take the trained model with 100 hidden neurons in *f*<sup>∗</sup> and *g*<sup>∗</sup> and 300 training epochs in training step one as the final version.
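Such a study can be organised as a simple grid over the two hyperparameters. A sketch with assumed candidate values (the text only fixes the epoch range 100–1000 and the final choice):

```python
from itertools import product

# Assumed candidate values for the study; only the 100-1000 epoch range
# and the final configuration are stated in the text.
hidden_neurons = [25, 50, 100]
epochs_step1 = [100, 300, 1000]

configs = list(product(hidden_neurons, epochs_step1))
final_choice = (100, 300)  # configuration selected as the final model
assert final_choice in configs
```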
