*2.1. Background: Neural Ordinary Differential Equations*

Besides the standard feedforward network, a number of other neural network architectures have been developed for different areas of application. The interested reader is referred to Ref. [22] for a detailed overview of neural networks.

Recurrent neural networks (RNNs) are used for time series prediction. In contrast to feedforward networks, RNNs have recurrent connections: the outputs of a neuron can serve as inputs to a neuron in the same or a previous layer. In Ref. [23], RNNs learn multivariate time series with missing values. The authors of Ref. [24] include external variables in RNNs.

The authors of Ref. [25] introduce residual neural networks (ResNets) to overcome the degradation of the training loss that occurs in deep neural networks as the number of hidden layers increases. ResNets have additional short-cut connections, which allow the input of a neuron to be added directly to its output.
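The short-cut connection can be sketched in a few lines. The one-layer block and the placeholder weights below are illustrative assumptions, not the architecture of Ref. [25]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights for a one-layer block; in practice these are trained.
W = rng.normal(scale=0.1, size=(4, 4))
b = np.zeros(4)

def residual_block(x):
    """Residual block: the input x is added directly to the block's
    output via the short-cut connection."""
    return x + np.tanh(W @ x + b)

x = rng.normal(size=4)
y = residual_block(x)
```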

In Ref. [26] the connection between ResNets with shared weights (the same weights are used in each layer of the neural network) and special forms of RNNs is established. ResNets can be used for time series prediction as well.

The following recursive formula applies to the state transformation from layer *t* to layer *t* + 1 in a ResNet [25]:

$$
\vec{z}\_{t+1} = \vec{z}\_t + \vec{f}\left(\vec{z}\_t, \vec{\theta}\_t\right), \quad t = 0, \dots, T - 1 \tag{1}
$$

where $\vec{z}_t \in \mathbb{R}^d$ is the vector of the hidden states at layer $t$, $\vec{\theta}_t$ the learned parameters of layer $t$, and $\vec{f} \colon \mathbb{R}^d \to \mathbb{R}^d$ a learnable function. The vector $\vec{\theta}_t$ of learned parameters summarises the learned weights and biases. Parameter sharing across the layers ($\vec{\theta}_t = \vec{\theta}$ for $t = 0, \dots, T-1$) results in the explicit Euler discretisation of the initial value problem [15,27–32],

$$\frac{\mathrm{d}\vec{z}(t)}{\mathrm{d}t} = \vec{f}\left(\vec{z}(t), t, \vec{\theta}\right), \quad \vec{z}(0) = \vec{z}\_0. \tag{2}$$

Herein, the continuous change in the states *z*(*t*) is given by the learnable function *f*, which represents a neural network. Therefore, the differential equation in Equation (2) is called a NODE. Starting from the initial state *z*(0), a differential equation solver can calculate the output state *z*(*T*) [15,29,30,32].
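The two views above can be sketched side by side. The fixed-weight right-hand side below is an illustrative stand-in for a trained network *f*, and `scipy.integrate.solve_ivp` stands in for the differential equation solver:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative stand-in for the trained network f(z, t, theta) of Eq. (2):
# a fixed linear map followed by tanh (weights are placeholders).
W = np.array([[0.0, 1.0],
              [-1.0, -0.1]])

def f(t, z):
    return np.tanh(W @ z)

z0 = np.array([1.0, 0.0])          # initial state z(0)

# A differential equation solver computes the output state z(T).
sol = solve_ivp(f, (0.0, 5.0), y0=z0)
zT = sol.y[:, -1]

# The explicit Euler discretisation with shared parameters recovers the
# ResNet update of Eq. (1): z_{t+1} = z_t + dt * f(z_t); with dt = 1 the
# two are identical.
dt, t_end = 0.05, 5.0
z = z0.copy()
for _ in range(int(t_end / dt)):
    z = z + dt * f(0.0, z)
```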

Originally, NODEs were developed for initial value problems. The authors of Ref. [14] extended the approach to differential equations with constraints. In our previous work [17], we showed, based on a simple application example, how to consider external variables *u*(*t*) (here, the dynamic battery current as input variable) directly. The differential equation of Equation (2) is generalised:

$$\frac{\mathrm{d}\vec{z}(t)}{\mathrm{d}t} = \vec{f}\left(\vec{z}(t), \vec{u}(t), t, \vec{\theta}\right), \quad \vec{z}(0) = \vec{z}\_0. \tag{3}$$

The external variables are inputs of the NODE. Therefore, we have to provide a function describing the change in the external variables over time; for example, we could interpolate the measured data [17]. Figure 1 schematically illustrates how to use NODEs with external variables.
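The interpolation approach can be sketched as follows; the measured values, the linear interpolation, and the simple fixed-parameter right-hand side are illustrative assumptions standing in for real battery data and a trained network:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Placeholder measurements of the external variable u (e.g. the battery
# current) on a coarse time grid.
t_meas = np.linspace(0.0, 10.0, 11)
u_meas = np.sin(t_meas)

def u(t):
    """Continuous-time external variable via linear interpolation of the
    measured data, so the solver can evaluate u at arbitrary t."""
    return np.interp(t, t_meas, u_meas)

def f(t, z):
    """Stand-in for the trained right-hand side f(z, u(t), t, theta) of
    Eq. (3); here a simple first-order response driven by u(t)."""
    return np.array([-0.5 * z[0] + u(t)])

sol = solve_ivp(f, (0.0, 10.0), y0=np.array([0.0]))
```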

**Figure 1.** NODEs with external variables; *z<sub>t</sub>* represents the state variables at time *t* and *u<sub>t</sub>* represents the respective external variables. Adapted from Figure 1 in [17], which is licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/, accessed on 23 February 2022).

As stated in Refs. [14,17], NODEs can be used for GB modelling. Differential equations derived from physical insights into the system and NODEs can be combined in one equation system. A WB model serves as the basis for GB modelling: single dependencies or entire equations in the differential equation system are replaced with learnable parameters and neural networks, and the respective ODEs are thereby transformed into NODEs. Additional assumptions going beyond the physical insights into the system can be added. A differential equation solver delivers the corresponding values of the state variables at the considered time points. Additional algebraic model equations can also be modified using learnable parameters and neural networks.
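The replacement of a single dependency in a WB equation can be sketched as follows. The first-order relaxation model, the replaced equilibrium dependency, and the placeholder network weights are illustrative assumptions, not the model of Refs. [14,17]:

```python
import numpy as np
from scipy.integrate import solve_ivp

# White-box part: first-order relaxation dz/dt = -(z - z_eq) / tau with a
# known time constant tau from physical insight.
tau = 2.0

# Grey-box part: the unknown equilibrium dependency z_eq(z) is replaced by
# a small neural network; the weights are placeholders for trained values.
W1 = np.array([[0.5], [-0.3]])
b1 = np.zeros(2)
W2 = np.array([[0.8, -0.2]])
b2 = np.zeros(1)

def nn(z):
    h = np.tanh(W1 @ z + b1)
    return W2 @ h + b2

def grey_box_rhs(t, z):
    """Combined equation: physical structure with a learnable term."""
    return -(z - nn(z)) / tau

# The solver delivers the state variables at the considered time points.
sol = solve_ivp(grey_box_rhs, (0.0, 5.0), y0=np.array([1.0]),
                t_eval=np.linspace(0.0, 5.0, 21))
```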
