Article

Learning Data-Driven Stable Corrections of Dynamical Systems—Application to the Simulation of the Top-Oil Temperature Evolution of a Power Transformer

by Chady Ghnatios 1,†, Xavier Kestelyn 2,†, Guillaume Denis 3,†, Victor Champaney 4,† and Francisco Chinesta 5,6,*,†

1 SKF Chair, PIMM Lab, Arts et Metiers Institute of Technology, 151 Boulevard de l’Hôpital, 75013 Paris, France
2 ULR 2697-L2EP, Centrale Lille, Junia ISEN Lille, Arts et Metiers Institute of Technology, University of Lille, 59000 Lille, France
3 RTE R&D, 7C Place du Dôme, 92073 Paris La Defense, CEDEX, France
4 ESI Chair, PIMM Lab, Arts et Métiers Institute of Technology, 151 Boulevard de l’Hôpital, 75013 Paris, France
5 RTE Chair, PIMM Lab, Arts et Métiers Institute of Technology, 151 Boulevard de l’Hôpital, 75013 Paris, France
6 CNRS@CREATE, 1 Create Way, 04-05 Create Tower, Singapore 138602, Singapore
* Author to whom correspondence should be addressed.
† All authors contributed equally to this work.
Energies 2023, 16(15), 5790; https://doi.org/10.3390/en16155790
Submission received: 25 June 2023 / Revised: 26 July 2023 / Accepted: 2 August 2023 / Published: 4 August 2023
(This article belongs to the Special Issue Advanced Artificial Intelligence Application for Power Systems)

Abstract

Many engineering systems can be described by differential models whose solutions, generally obtained after discretization, can exhibit a noticeable deviation with respect to the response of the physical systems those models are expected to represent. In such circumstances, one possibility consists of enriching the model in order to reproduce the physical system behavior. The present paper considers a dynamical system and proposes enriching the model solution by learning the dynamical model of the gap between the system response and the model-based prediction, while ensuring that the time integration of the learned model remains stable. The proposed methodology was applied to the simulation of the top-oil temperature evolution of a power transformer, for which experimental data provided by RTE, the French electricity transmission system operator, were used to construct the model enrichment with the hybrid rationale, ensuring more accurate predictions.

1. Introduction

The two most common approaches in simulation-based engineering (SBE), one based mainly on physics and a more recent one based on data processed by advanced machine learning techniques, both have inherent limitations.
In general, the physics-based modeling framework produces responses that approximate the real ones quite well as long as an accurate enough model exists. The main difficulties when considering such a physics-based approach are: (i) the fidelity of the model itself, which is assumed to be calibrated; (ii) the impact that variability and uncertainty induce; and (iii) the computing time required for solving the complex and intricate mathematical models.
On the other hand, the data-driven framework is not fully satisfactory because, even if the data (assumed to be noise-free) represent the reality well, extrapolating or interpolating in space and time away from the collected data (in particular, away from the sampled locations and time instants) usually entails a noticeable accuracy loss.
Of course, by increasing the amount of collected data, one could expect to approximate the real solution with more fidelity; however, data are not always simple to collect or even possible to access, and in all cases, collecting data is expensive (cost of sensors, cost of communication and analysis, etc.). Equipping a very large industrial or civil infrastructure with millions of sensors to cover its entire spatial extent is simply unreasonable.
Moreover, even when the solution is properly approximated, two difficulties persist: (i) the solution explainability, compulsory to certify solutions and decisions; and (ii) the domain of validity when extrapolating far from the domain where data were collected.
In view of the limitations of both existing procedures, one gateway consists of allying both to conciliate agility and fidelity. The hybrid paradigm seems a valuable and appealing option. It considers the reality expressible from the addition of two contributions: the existing knowledge (the state-of-the-art physics-based model or any other kind of knowledge-based model) and the part of the reality that the model ignores, the so-called ignorance (also called the deviation, gap, or discrepancy).

1.1. The Three Main Simulation-Based Engineering Methodologies Revisited

To introduce and discuss different simulation-based engineering (SBE) frameworks, we consider a certain field $u(\mathbf{x})$ describing the evolution of the variable $u$ along the space defined by the coordinates $\mathbf{x} \in \Omega \subset \mathbb{R}^D$.
We assume that a certain knowledge of the addressed physics exists, described by the model $\mathcal{M}$, which, in general, consists of a system of algebraic equations or a system of differential equations complemented with the appropriate boundary and initial conditions ensuring the problem's solvability. The solution provided by this model, which, as indicated, represents the existing knowledge of the problem at hand, is denoted by $u^{\mathcal{M}}(\mathbf{x})$.
Due to the partial knowledge of the addressed physical phenomenon, the calculated solution $u^{\mathcal{M}}(\mathbf{x})$ is expected to differ from the reference one $u(\mathbf{x})$, which is intended to be represented with maximum fidelity.
Thus, we define the residual $R^{\mathcal{M}}(\mathbf{x})$ according to
$$R^{\mathcal{M}}(\mathbf{x}) = u(\mathbf{x}) - u^{\mathcal{M}}(\mathbf{x}), \tag{1}$$
where the error can be computed from its norm, $E^{\mathcal{M}} = \| R^{\mathcal{M}}(\mathbf{x}) \|$, for which the L2 norm is usually employed.
For reducing that error, different possibilities exist:
  • Physics-based model improvement. This approach consists of refining the modeling by enriching the model itself, $\mathcal{M} \rightarrow \hat{\mathcal{M}}$, such that its solution $u^{\hat{\mathcal{M}}}$ exhibits a smaller error; i.e., $E^{\hat{\mathcal{M}}} \leq E^{\mathcal{M}}$;
  • Fully data-driven description. The data-driven route consists of widely sampling the space $\Omega$ at points $\mathbf{x}_1, \ldots, \mathbf{x}_P$, with $P$ large enough and with the locations of the points $\mathbf{x}_i$, $i = 1, \ldots, P$, maximizing the coverage of the domain $\Omega$. These points are grouped into the set $\mathcal{S}$.
    The coverage is defined by the convex hull $\omega(\mathcal{S})$ of the set $\mathcal{S} = \{\mathbf{x}_1, \ldots, \mathbf{x}_P\}$, ensuring interpolation for $\mathbf{x} \in \omega(\mathcal{S})$ and limiting the risky extrapolation to the region of $\Omega$ outside the convex hull $\omega(\mathcal{S})$.
    Factorial samplings try to maximize the coverage; however, factorial samplings, or those based on the use of Gauss–Lobatto quadratures, related to approximations making use of orthogonal polynomials [1], fail when the dimensionality $D$ of the space increases.
    When $D \gg 1$, sparse sampling is preferred, Latin hypercube sampling, for instance. Samplings based on Gaussian processes (GPs) aim at placing the points at the locations where the uncertainty is maximum (with respect to the predictions inferred from the previously collected data).
    Finally, the so-called active learning techniques drive the sampling with the aim of maximizing the representation of a certain goal-oriented quantity of interest [2].
    In what follows, we assume a generic sampling giving access to the reference solution $u(\mathbf{x}_i)$, $i = 1, \ldots, P$, and perfect measurability.
    Now, to infer the solution at $\mathbf{x} \in \omega(\mathcal{S})$, it suffices to construct an interpolation or approximation; more generally, an adequate regression $u^{\mathcal{S}}(\mathbf{x})$:
    $$u^{\mathcal{S}}(\mathbf{x}) \approx u(\mathbf{x}; u_1, \ldots, u_P), \tag{2}$$
    where $u_1 \equiv u(\mathbf{x}_1), \ldots, u_P \equiv u(\mathbf{x}_P)$.
    Different possibilities exist, including regularized polynomial regressions [3], neural networks (NNs) [4,5], support vector regression (SVR) [6], decision trees, and their random forest counterparts [7,8], to name a few.
    The trickiest issue concerns the error evaluation, which is quantified from a part of the data kept outside the training set, the so-called test set, used to quantify the performance of the trained regression.
    The main challenges of such a general procedure, particularly exacerbated in the multi-dimensional case ($D \gg 1$), are the following:
    - The ability to explain the regression $u^{\mathcal{S}}(\mathbf{x})$;
    - The size $P$ of the dataset, which scales with the problem dimensionality $D$;
    - The optimal sampling to cover $\Omega$ while guaranteeing the accuracy of $u^{\mathcal{S}}(\mathbf{x})$ or that of the goal-oriented quantities of interest;
  • Hybrid approach. The hybrid approach proceeds by embracing both the physics-based and data-driven approaches. As described in the next section, it can improve the physics-based accuracy (while profiting from the physics-based explanatory capabilities) through a data-driven enrichment, which, for its part, and under certain conditions, needs less data than the fully data-driven approach just discussed [9].

1.2. Paper Organization

The present paper aims at addressing the methodologies involved in the construction of data-driven model corrections and then proving their potential with a problem of practical relevance.
For that purpose, the paper is organized as follows. After this brief introduction that emphasizes the different ways of enriching models to improve their ability to represent the observed reality, Section 2 describes and discusses the hybrid approach where physics-based and data-driven models are combined to improve each of them.
Then, Section 3 focuses on the methodologies involved in the procedure of learning stable dynamical systems, where the role of memory is widely discussed.
Section 4 proposes a simple procedure that, combined with ResNet architectures, enables the construction of stable integrators. The procedure is validated with some simple dynamical systems.
Finally, Section 5 addresses a problem of practical relevance, that of predicting the oil temperature in an electric power transformer. More importantly, it proves the superiority of the hybrid approach with respect to the physics-based or the data-driven modeling approaches. The paper ends with some final concluding remarks.

2. Illustrating the Hybrid Approach

2.1. A Simple Linear Regression Reasoning

In the hybrid approach, we first consider the contribution of a model expected to reasonably represent the physics at hand, whose solution is the previously introduced $u^{\mathcal{M}}(\mathbf{x})$, with $\mathbf{x} \in \Omega \subset \mathbb{R}^D$.
As discussed before, it entails a residual $R^{\mathcal{M}}(\mathbf{x})$ and the associated error $E^{\mathcal{M}}$.
Imagine for a while the simplest (and cheapest) regression of that residual, a linear regression involving $P$ data associated with the locations $\mathbf{x}_i \in \Omega$, $i = 1, \ldots, P$.
Thus, the linear regression involving a linear approximation of the residual, denoted by $\Delta u^L(\mathbf{x})$, reads:
$$\Delta u^L(\mathbf{x}; a_0^{\Delta}, \ldots, a_D^{\Delta}) = a_0^{\Delta} + a_1^{\Delta} x_1 + \cdots + a_D^{\Delta} x_D, \tag{3}$$
and involves the unknown coefficients $a_0^{\Delta}, a_1^{\Delta}, \ldots, a_D^{\Delta}$.
By using the L2 norm, the linear regression results from the least-squares minimization problem:
$$\{a_0^{\Delta}, \ldots, a_D^{\Delta}\} = \underset{a_0^*, \ldots, a_D^*}{\operatorname{argmin}} \sum_{i=1}^{P} \left( \Delta u^L(\mathbf{x}_i; a_0^*, \ldots, a_D^*) - R^{\mathcal{M}}(\mathbf{x}_i) \right)^2, \tag{4}$$
which defines an interpolation in the case where $P = D + 1$ and an approximation when $P \gg D + 1$.
If we assume for a while that the reference solution $u(\mathbf{x})$ is fully available, and so is the previously defined residual $R^{\mathcal{M}}(\mathbf{x})$, the previous expression can be rewritten in a continuous form
$$\{a_0^{\Delta}, \ldots, a_D^{\Delta}\} = \underset{a_0^*, \ldots, a_D^*}{\operatorname{argmin}} \left\| \Delta u^L(\mathbf{x}; a_0^*, \ldots, a_D^*) - R^{\mathcal{M}}(\mathbf{x}) \right\|_2. \tag{5}$$
The linear regression of the reference solution $u(\mathbf{x})$, denoted by $u^L(\mathbf{x}; a_0^u, \ldots, a_D^u)$, reads
$$u^L(\mathbf{x}; a_0^u, \ldots, a_D^u) = a_0^u + a_1^u x_1 + \cdots + a_D^u x_D, \tag{6}$$
where the coefficients $a_0^u, \ldots, a_D^u$ result again from the least-squares minimization problem
$$\{a_0^u, \ldots, a_D^u\} = \underset{a_0^*, \ldots, a_D^*}{\operatorname{argmin}} \sum_{i=1}^{P} \left( u^L(\mathbf{x}_i; a_0^*, \ldots, a_D^*) - u(\mathbf{x}_i) \right)^2, \tag{7}$$
whose continuous expression reads
$$\{a_0^u, \ldots, a_D^u\} = \underset{a_0^*, \ldots, a_D^*}{\operatorname{argmin}} \left\| u^L(\mathbf{x}; a_0^*, \ldots, a_D^*) - u(\mathbf{x}) \right\|_2. \tag{8}$$
Equation (5) implies
$$\left\| \Delta u^L(\mathbf{x}; a_0^{\Delta}, \ldots, a_D^{\Delta}) - R^{\mathcal{M}}(\mathbf{x}) \right\|_2 \leq \left\| R^{\mathcal{M}}(\mathbf{x}) \right\|_2. \tag{9}$$
Thus, as soon as the error related to the residual becomes smaller than the one related to the linear approximation of the reference solution, i.e., if
$$E^{\mathcal{M}} = \left\| R^{\mathcal{M}}(\mathbf{x}) \right\|_2 \leq \left\| u^L(\mathbf{x}; a_0^u, \ldots, a_D^u) - u(\mathbf{x}) \right\|_2, \tag{10}$$
then
$$\left\| \Delta u^L(\mathbf{x}; a_0^{\Delta}, \ldots, a_D^{\Delta}) - R^{\mathcal{M}}(\mathbf{x}) \right\|_2 \leq \left\| R^{\mathcal{M}}(\mathbf{x}) \right\|_2 \leq \left\| u^L(\mathbf{x}; a_0^u, \ldots, a_D^u) - u(\mathbf{x}) \right\|_2, \tag{11}$$
which proves the higher accuracy of the hybrid approximation as soon as the solution provided by the model $\mathcal{M}$ represents a better approximation of the reference solution $u(\mathbf{x})$ than a linear regression could attain.
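To make this reasoning concrete, the following minimal numpy sketch compares the two routes on a toy problem; the target function, the approximate model, and the sample count are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy reference solution and an approximate "physics-based" model (assumed forms).
u = lambda x: np.sin(2 * x) + 0.5 * x      # reference solution u(x)
u_M = lambda x: np.sin(2 * x)              # model capturing the nonlinear part

x_i = rng.uniform(0.0, 3.0, size=50)       # P = 50 sampled locations

def linreg(x, y):
    """Least-squares linear regression y ~ a0 + a1*x, returned as a callable."""
    A = np.column_stack([np.ones_like(x), x])
    a, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda xx: a[0] + a[1] * xx

u_L = linreg(x_i, u(x_i))                  # direct regression of the reference solution
du_L = linreg(x_i, u(x_i) - u_M(x_i))      # regression of the residual R^M(x)

xt = np.linspace(0.0, 3.0, 200)
print(np.linalg.norm(u_L(xt) - u(xt)))             # fully data-driven (linear) error
print(np.linalg.norm(u_M(xt) + du_L(xt) - u(xt)))  # hybrid error (much smaller here)
```

In this example the residual is exactly linear, so the hybrid construction recovers the reference solution almost perfectly, while the direct linear regression cannot represent the nonlinearity: precisely the situation described by Equation (11).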

2.2. General Remarks

The analysis just discussed considers a simple linear regression involving a linear approximation; however, it remains valid when considering regressions involving richer nonlinear approximations.
The first part of Equation (11),
$$\left\| \Delta u^L(\mathbf{x}; a_0^{\Delta}, \ldots, a_D^{\Delta}) - R^{\mathcal{M}}(\mathbf{x}) \right\|_2 \leq \left\| R^{\mathcal{M}}(\mathbf{x}) \right\|_2, \tag{12}$$
affirms that the simulated model solution is enriched in an L2-norm sense and under the constraint of having enough data to evaluate the considered norms. It is important to note that, even if the simulated model solution is enriched in the L2-norm sense, locally, the accuracy of the enriched solution could be degraded.
The second part of Equation (11),
$$\left\| R^{\mathcal{M}}(\mathbf{x}) \right\|_2 \leq \left\| u^L(\mathbf{x}; a_0^u, \ldots, a_D^u) - u(\mathbf{x}) \right\|_2, \tag{13}$$
holds under some conditions involving the data and the model (i.e., involving $u(\mathbf{x})$ and $u^{\mathcal{M}}(\mathbf{x})$) that must be checked, and again needs enough data to enable the calculation of the L2 norms.

2.3. On the Domain of Application of the Hybrid Modeling Approach

The hybrid approach can be applied in different settings. It can be viewed as a sort of transfer learning, where rich behaviors are approached from others that are close enough and well established.
This hybrid approach seems particularly appealing in different situations, such as the ones reported below:
  • When the model captures most of the solution features, the correction must describe a discrepancy that exhibits smaller nonlinearities, as was the case in [10,11], where the same amount of data performed better within the hybrid framework than within the fully data-driven framework;
  • Sometimes, the physics-based model operates very accurately in a part of the domain, whereas strong nonlinearities localize in a small region, which can, in that case, be captured by a data-driven learning model, as considered in [12] to address the inelastic behavior of spot-welds;
  • When considering the constitutive modeling of materials, the augmented rationale (or hybrid paradigm) expresses the real behavior from first-order behavior (calibrated from the available data) complemented by enrichment (or correction) filling the gap between the collected data and the predictions obtained from the assumed model [13];
  • When addressing plates (or shells) with noticeable 3D behaviors (deviating from the usual shell theory) a valuable solution consists of using an enriched kinematics consisting of two contributions: the first-order one (usual shell kinematics) enriched with a second-order contribution [14] that can be learned from data;
  • The hybrid modeling can also transfer existing knowledge slightly outside its domain of applicability with small amounts of collected data, as performed in [15] to correct state-of-the-art structural beam models;
  • Sometimes, the discrepancy concerns an imperfect alignment between the prediction and the measurements. That discrepancy may seem very high when evaluated pointwise at each location; however, a small transport allows aligning both solutions. Optimal transport is very suitable in these situations, where the hybrid model consists of the usual nominal model enriched with a parametric correction formulated in an optimal transport setting, as described in [16,17];
  • In [18], a correction of a Mises yield function was constructed from the deviation between the results predicted by using it and the measurements obtained at the structure level;
  • Finally, when addressing processing and performances, the hybrid formulation, which can be used with different granularities, can exhibit advantages (amount of data, ability to explain, knowledge transfer, etc.) that pertain to the heart of digital twin developments [9,19,20,21,22,23,24].

3. Methods

In what follows, a system characterized by a state $x$ that evolves in time, i.e., $x(t)$, is considered, and the stability of its numerical integration is discussed [25]. When the state is multi-valued, it is denoted by the vector $\mathbf{x}$.

3.1. On the Integration Stability

We consider a simple linear dynamical system expressed by
$$\frac{dx}{dt} = a x, \quad a \in \mathbb{R}. \tag{14}$$
The integration reads
$$\frac{dx}{x} = a\, dt \;\Rightarrow\; \ln(x) = C + a t \;\Rightarrow\; x(t) = \hat{C} e^{a t}, \tag{15}$$
with $\hat{C} = e^C$ and $\hat{C} = x(t=0)$.
It can be noted that the existence of a bounded solution requires that $a \leq 0$ because, if $a > 0$, $x(t \to \infty) = \infty$. Thus, the stability condition is $a \leq 0$.
Next, we consider the discrete case, in which the first-order derivative is discretized by a first-order finite difference
$$\frac{dx}{dt} \approx \frac{x_n - x_{n-1}}{\Delta t}, \tag{16}$$
where $x_n = x(t = t_n)$, with $t_n = n \Delta t$.
In that case, the discrete time evolution reads
$$x_n = a x_{n-1} \Delta t + x_{n-1} = (a \Delta t + 1)\, x_{n-1} = b\, x_{n-1}, \tag{17}$$
with $b = a \Delta t + 1$.
Now, because $x_{n-1} = b\, x_{n-2}$ and so on, we finally obtain
$$x_n = b^n x_0, \tag{18}$$
from which it can be concluded that bounded solutions are subjected to the constraint $|b| \leq 1$.
Similar results can be obtained in the case of multi-valued states. In what follows, we consider the differential system
$$\frac{d\mathbf{x}}{dt} = \mathbf{A} \mathbf{x}. \tag{19}$$
For the sake of simplicity, we assume that the matrix $\mathbf{P}$ diagonalizes $\mathbf{A}$, with $\mathbf{P}^T \mathbf{P} = \mathbf{I}$ ($\mathbf{I}$ being the unit matrix). Thus, if we define $\mathbf{x} = \mathbf{P} \mathbf{y}$, we get
$$\mathbf{P} \frac{d\mathbf{y}}{dt} = \mathbf{A} \mathbf{P} \mathbf{y},$$
and multiplying by $\mathbf{P}^T$ results in
$$\mathbf{P}^T \mathbf{P} \frac{d\mathbf{y}}{dt} = \mathbf{P}^T \mathbf{A} \mathbf{P} \mathbf{y},$$
which, taking into account that $\mathbf{P}^T \mathbf{P} = \mathbf{I}$, can be rewritten as
$$\frac{d\mathbf{y}}{dt} = \mathbf{D} \mathbf{y}, \tag{20}$$
with $\mathbf{D}$ being diagonal.
The fact that $\mathbf{D}$ is diagonal allows the decoupling of the solution of Equation (20). Thus, we have
$$\frac{dy_i}{dt} = D_i y_i, \quad \forall i, \tag{21}$$
with $y_i$ being the $i$-component of $\mathbf{y}$ and $D_i$ the $(i,i)$-diagonal component of $\mathbf{D}$.
Thus, using the previous results, the stability condition reads
$$\max_i D_i \leq 0. \tag{22}$$
By discretizing Equation (20), we get
$$\mathbf{y}_n = (\Delta t\, \mathbf{D} + \mathbf{I})\, \mathbf{y}_{n-1}, \tag{23}$$
or, noting $\hat{\mathbf{D}} = \Delta t\, \mathbf{D} + \mathbf{I}$,
$$\mathbf{y}_n = \hat{\mathbf{D}}\, \mathbf{y}_{n-1}, \tag{24}$$
which, using the recurrence, results in
$$\mathbf{y}_n = \hat{\mathbf{D}}^n \mathbf{y}_0. \tag{25}$$
Thus, stability requires that the spectral radius $\rho$ of $\hat{\mathbf{D}}$ is not larger than unity; i.e., $\rho(\hat{\mathbf{D}}) \leq 1$.
These conditions should be satisfied by the learned dynamical models, as discussed later.
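These conditions are straightforward to check numerically. The following sketch, with a toy stable matrix of our own choosing, evaluates the spectral radius of $\hat{\mathbf{D}} = \Delta t\, \mathbf{D} + \mathbf{I}$ for an admissible and an inadmissible time step:

```python
import numpy as np

A = np.array([[-2.0, 1.0],
              [1.0, -3.0]])               # symmetric with negative eigenvalues: stable dynamics

eigvals, P = np.linalg.eigh(A)             # P orthonormal (P^T P = I), D = diag(eigvals)

for dt in (0.1, 1.5):                      # one admissible and one too-large time step
    D_hat = dt * np.diag(eigvals) + np.eye(2)
    rho = np.max(np.abs(np.diag(D_hat)))   # spectral radius of a diagonal matrix
    print(f"dt = {dt}: rho(D_hat) = {rho:.3f} ->",
          "stable" if rho <= 1.0 else "unstable")
```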

3.2. Learning Integrators

When the dynamical system is known, different numerical integration schemes (explicit, implicit, or semi-implicit) of different convergence orders can be applied; for instance, the Euler scheme or the more accurate Runge–Kutta schemes, among many other choices. The discretization time step $\Delta t$ must then be chosen to ensure stability and to guarantee convergence; that is, a numerical solution close enough to the reference solution of the problem.
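For reference, a forward (explicit) Euler integrator for a known scalar dynamical system, the simplest of the schemes just mentioned, can be sketched as follows (the right-hand side and the numerical values are illustrative):

```python
import numpy as np

def euler_integrate(f, x0, dt, n_steps):
    """Explicit (forward) Euler integration of dx/dt = f(x)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for n in range(n_steps):
        x[n + 1] = x[n] + dt * f(x[n])
    return x

# Example: dx/dt = -x; here b = 1 - dt, so stability requires dt <= 2.
x = euler_integrate(lambda x: -x, x0=1.0, dt=0.1, n_steps=100)
```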
In what follows, we revisit and discuss different machine learning techniques that enable the learning of dynamical systems and apply them for the integration of dynamical systems, as performed by recurrent NN (rNN) and long short-term memory (LSTM) techniques [26,27].
In this section, $\mathbf{h}$ refers to the hidden state, whereas $\mathbf{l}$ and $\mathbf{o}$ refer, respectively, to the inputs (loading) and the observable outputs.

3.2.1. Recurrent Neural Network

If $\mathbf{W}_{\bullet}$ represents the dense matrix associated with the variable $\bullet$, $\sigma(\cdot)$ the activation function, $\mathbf{h}_n$ the internal state at the $n$-th time step $t_n$, and $\mathbf{l}_n$ and $\mathbf{o}_n$ the associated input and output at the present time $t_n$, then the rNN proceeds from:
$$\begin{cases} \mathbf{h}_n = \sigma(\mathbf{W}_h \mathbf{h}_{n-1} + \mathbf{W}_l \mathbf{l}_n) \\ \mathbf{o}_n = \sigma(\mathbf{W}_o \mathbf{h}_n) \end{cases} \tag{26}$$
the architecture of which is depicted in Figure 1.
To implement the time evolution, it suffices to use the rNN and re-inject the rNN output h n as the input at the next time step.
A temporal sequence of inputs and outputs, which is assumed to be available, allows calculating the different weights composing the different W matrices.
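A minimal numpy sketch of the update in Equation (26) follows; the sizes, the random weights, and the tanh activation are illustrative assumptions (in practice, the $\mathbf{W}$ matrices are learned from the available input/output sequences):

```python
import numpy as np

rng = np.random.default_rng(1)
n_h, n_l, n_o = 8, 2, 1                          # hidden, input, output sizes (assumed)

W_h = rng.normal(scale=0.3, size=(n_h, n_h))     # learned in practice, random here
W_l = rng.normal(scale=0.3, size=(n_h, n_l))
W_o = rng.normal(scale=0.3, size=(n_o, n_h))

def rnn_step(h_prev, l_n, sigma=np.tanh):
    """One rNN update: h_n = sigma(W_h h_{n-1} + W_l l_n), o_n = sigma(W_o h_n)."""
    h_n = sigma(W_h @ h_prev + W_l @ l_n)
    return h_n, sigma(W_o @ h_n)

# Time integration: re-inject the hidden state at every step.
h = np.zeros(n_h)
for l in rng.normal(size=(20, n_l)):             # a sequence of 20 inputs
    h, o = rnn_step(h, l)
```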
To highlight the connection between the rNN and the usual dynamical system integrators, we consider the finite element semi-discretized form of a linear parabolic differential equation
$$\mathbf{M} \frac{d\mathbf{h}}{dt} = \mathbf{K} \mathbf{h} + \mathbf{l}, \tag{27}$$
where $\mathbf{M}$ and $\mathbf{K}$ are two matrices that result from the spatial discretization. Here, $\mathbf{h}$ refers to the state and $\mathbf{l}$ to the system loading.
Then, the explicit discretization of Equation (27) reads
$$\mathbf{M} \mathbf{h}_n = \Delta t\, \mathbf{K} \mathbf{h}_{n-1} + \Delta t\, \mathbf{l}_n + \mathbf{M} \mathbf{h}_{n-1} = (\Delta t\, \mathbf{K} + \mathbf{M})\, \mathbf{h}_{n-1} + \Delta t\, \mathbf{l}_n, \tag{28}$$
from which the state updating results in
$$\mathbf{h}_n = \mathbf{M}^{-1} (\Delta t\, \mathbf{K} + \mathbf{M})\, \mathbf{h}_{n-1} + \Delta t\, \mathbf{M}^{-1} \mathbf{l}_n, \tag{29}$$
which can be rewritten as
$$\mathbf{h}_n = \mathbf{W}_h \mathbf{h}_{n-1} + \mathbf{W}_l \mathbf{l}_n. \tag{30}$$
This corresponds to the particularization of Equation (26) to the linear case. Thus, the intimate connection between the rNN and the usual discretization techniques becomes explicit: the latter operate with a known model, whereas the former learns the model itself (the different $\mathbf{W}$ matrices).
Remark 1.
In the linear model just discussed, when the whole state is observed (i.e., $\mathbf{o}_n = \mathbf{h}_n$), $\mathbf{W}_o = \mathbf{I}$.

3.2.2. On the Model Memory

When considering a first-order dynamical system as just discussed, the solution at a particular time can be computed from only the knowledge of the previous solution (at the previous time step) and the present loading (also known as action or input). Thus, one could imagine that, for learning first-order dynamical systems, a short memory, such as, for instance, the rNN just described, would suffice.
However, sometimes, larger memory is needed even when addressing first-order models, as described in this section.
For the sake of simplicity, we consider the system state given by $\mathbf{h}(t) = (h_1(t), h_2(t))^T$, whose evolution is governed by a simple linear first-order dynamical system
$$\begin{cases} \dot{h}_1(t) = K_{11} h_1(t) + K_{12} h_2(t) + l_1(t) \\ \dot{h}_2(t) = K_{21} h_1(t) + K_{22} h_2(t) + l_2(t) \end{cases} \tag{31}$$
where, without loss of generality, it is assumed that $K_{22} = 0$ and $l_2(t) = 0$; i.e.,
$$\begin{cases} \dot{h}_1(t) = K_{11} h_1(t) + K_{12} h_2(t) + l_1(t) \\ \dot{h}_2(t) = K_{21} h_1(t) \end{cases} \tag{32}$$
In what follows, it is assumed that $h_1(t)$ and $l_1(t)$ are accessible but that nothing, not even the existence of $l_2(t)$, is known. In these circumstances, the key question is: can we learn a model relating $l_1(t)$ and $h_1(t)$, both accessible and measurable, while knowing that the state $h_1(t)$ depends on another one, $h_2(t)$, that evolves in time and remains inaccessible and consequently unknown?
To facilitate the model manipulation, the Fourier transform, here denoted by the superscript $*$, can be applied to the different time-dependent variables. Thus, Equation (32) can be written as
$$\begin{cases} i \omega h_1^* = K_{11} h_1^* + K_{12} h_2^* + l_1^* \\ i \omega h_2^* = K_{21} h_1^* \end{cases} \tag{33}$$
From the second part of Equation (33), we obtain
$$h_2^* = \frac{K_{21} h_1^*}{i \omega}, \tag{34}$$
which, inserted into the first part of Equation (33), leads to
$$i \omega h_1^* = K_{11} h_1^* + K_{12} K_{21} \frac{h_1^*}{i \omega} + l_1^*. \tag{35}$$
Coming back to the time domain results in
$$\frac{d h_1(t)}{dt} = K_{11} h_1(t) + K_{12} K_{21} \int_0^t h_1(\tau)\, d\tau + l_1(t), \tag{36}$$
which proves that ignoring components of the state manifests itself as a dependence on the measurable state history, here symbolized by the integral. In a certain way, this result can be interpreted as a consequence of Takens' delay embedding theorem.
Thus, memory is not only a consequence of the order of the subjacent dynamics; the memory needed in the learning procedure also depends on the ignored states. Thus, sometimes, when addressing poorly described physical systems or partially accessible ones, a larger memory than that offered by recurrent NNs seems compulsory.
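The equivalence between the two-state system (32) and the memory formulation (36) can be verified numerically, as in the following sketch (the coefficients and the loading are toy choices of ours); both explicit-Euler recurrences produce the same $h_1$ trajectory, although the second one never stores $h_2$:

```python
import numpy as np

# Check of Equation (36): the inaccessible state h2 reappears as a memory term,
# the integral of the h1 history.
K11, K12, K21 = -1.0, 0.5, -0.3
dt, N = 1e-3, 20000
l1 = lambda t: np.sin(t)

h1_a, h2 = 0.0, 0.0      # full two-state system, explicit Euler
h1_b, I = 0.0, 0.0       # one-state form with memory, I ~ integral of h1

for n in range(N):
    t = n * dt
    # simultaneous update of the two-state system (32)
    h1_a, h2 = h1_a + dt * (K11 * h1_a + K12 * h2 + l1(t)), h2 + dt * K21 * h1_a
    # equivalent integro-differential form (36)
    h1_b, I = h1_b + dt * (K11 * h1_b + K12 * K21 * I + l1(t)), I + dt * h1_b

print(abs(h1_a - h1_b))  # agreement up to round-off
```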

3.2.3. Learners with Larger Memory

Long short-term memory (LSTM) [28] is an appealing technique for learning time-dependent problems while ensuring larger memory. LSTM combines short- and long-term memory units, with an evanescent memory for the long-term path and a combination of long and short leads for the short memory response.
A particular architecture is depicted in Figure 2 that includes not only memory but also the previous inputs, as well as the forgetting of the initial conditions of the long ($\mathbf{c}_0$) and short ($\mathbf{h}_0$) memory channels, to facilitate its use as a generic time integrator.
Recurrent NNs (rNNs) and LSTM learn the state evolution, and special care must be paid to ensure time integration stability.
Sometimes, it seems preferable to learn the forcing term of the dynamical system, as in residual nets and neural differential equation techniques. The next section revisits residual nets and discusses stability issues.

3.2.4. Residual Nets

Residual neural networks (ResNets) [29,30,31] aim at emulating the explicit time integration of a dynamical system. For the dynamical system
$$\frac{dx}{dt} = f(x), \quad x(t=0) = x_0, \tag{37}$$
the ResNet looks for the function $f(x)$ that allows reproducing the state time evolution $x(t)$ from its discrete knowledge; that is, from $x_n = x(t_n)$, with $t_n = n \Delta t$.
By considering the simplest time stepping, the time derivative can be approximated by $dx/dt \approx (x_{n+1} - x_n)/\Delta t$. Then, one can use the following updating
$$x_{n+1} = x_n + \Delta t\, f(x_n), \tag{38}$$
and learn the forcing term $f(x)$ that induced the state evolution by using an appropriate regression.
For this purpose, ResNet creates an NN trained from $x_n$ and $x_{n+1} - x_n$, $n = 1, \ldots$. Then, as soon as the NN is trained, the output $x_{n+1}$ is calculated by applying to $x_n$ the NN just constructed, adding the identity operator applied to the input.
Now, the dynamical system integration consists in applying the ResNet in a recursive way, with the output at time n becoming the input of the next time step. The ResNet architecture and the integration procedure are illustrated in Figure 3.
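As a concrete illustration of this loop, the following Keras sketch trains a small network on the pairs $(x_n, x_{n+1} - x_n)$ and then integrates recursively, feeding each prediction back as the next input; the layer sizes, the number of epochs, and the training trajectory are our own choices:

```python
import numpy as np
import tensorflow as tf

dt = 0.01
t = np.arange(0.0, 5.0, dt)
x = 1.0 - np.exp(-t)                        # a known trajectory used as training data

x_n = x[:-1, None]                          # inputs: states x_n
dx = x[1:, None] - x[:-1, None]             # targets: increments x_{n+1} - x_n

# Small dense network playing the role of dt * f(x) (sizes are assumptions).
net = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="tanh", input_shape=(1,)),
    tf.keras.layers.Dense(1),
])
net.compile(optimizer="adam", loss="mse")
net.fit(x_n, dx, epochs=200, verbose=0)

# Recursive integration: the output at step n becomes the input of step n + 1.
x_hat = [float(x[0])]
for _ in range(len(t) - 1):
    x_hat.append(x_hat[-1] + float(net(np.array([[x_hat[-1]]]))))
```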

4. Simple Procedures for Stabilizing ResNet-Based Integration

In what follows, the stabilization of the time integration performed with ResNet is discussed. First, a simple stabilization is accomplished by linearizing the dynamical system while using the results just discussed.
Then, inspired by that accomplishment, a more general stabilization able to operate in nonlinear settings is proposed.
Finally, the performances of both strategies are compared.

4.1. Learning Stable Linear Dynamical Systems

In what follows, we consider a parameterized dynamical system:
$$\frac{dx}{dt} = f(x; \mathbf{z}), \tag{39}$$
where the state at time $t_n$, $x_n$, depends on the previous state $x_{n-1}$ and on the loading $f(\cdot)$ driving the state change. The latter is assumed to depend not only on the state but also on a series of parameters (input features), here grouped into a vector $\mathbf{z}$. Thus, the learning procedure aims at computing the regression $f(x; \mathbf{z})$.
However, to obtain a stable integrator, certain constraints must be fulfilled. As these constraints have a simple form in the linear case, as previously discussed, we here proceed to linearize the forcing term as follows
$$f(x; \mathbf{z}) \approx g(\mathbf{z})\, x + h(\mathbf{z}). \tag{40}$$
Now, stability is ensured as soon as $g(\mathbf{z}) \leq 0$, a constraint that can be easily enforced by employing a penalty term in the loss function of the NN associated with $g(\mathbf{z})$.
The linearized ResNet resembles dynamic mode decomposition (DMD) [32], while arguably allowing an easier introduction of the parametric dimension.

4.2. Learning Stable Nonlinear Dynamical Systems

Inspired by the rationale just discussed, a possible route for enforcing stability without detriment to the nonlinear behavior consists of writing
$$f(x; \mathbf{z}) \approx g(x, \mathbf{z})\, x + h(\mathbf{z}), \tag{41}$$
while enforcing, in the construction of the regression $g(x; \mathbf{z})$, the negativity constraint; that is, $g(x; \mathbf{z}) \leq 0$.
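Besides the penalty route mentioned for the linear case, the negativity of $g$ can also be enforced structurally, for instance by ending the network with a non-negative activation followed by a sign flip; the sketch below uses assumed layer sizes (the same ReLU + Lambda($-1\times$) combination appears in the surrogate of Table 1 in Section 5):

```python
import tensorflow as tf

# g(x; z) <= 0 enforced by construction: non-negative output, then sign flip.
# The input size (the state x concatenated with the features z) is an assumption.
g_net = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="tanh", input_shape=(3,)),
    tf.keras.layers.Dense(1, activation="relu"),   # output >= 0
    tf.keras.layers.Lambda(lambda v: -v),          # output <= 0, hence stability
])
```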

4.3. Numerical Examples and Discussion

To prove the expected performances, we considered two cases: a linear one, where both methodologies just described were expected to work, and a nonlinear one, where the former methodology was expected to fail and the latter (the nonlinear counterpart) was expected to perform correctly.
This section only addresses the stability concerns without giving details on the neural architectures employed for learning the functions $g(\mathbf{z})$, $g(x; \mathbf{z})$, and $h(\mathbf{z})$, which are described in detail in Section 5. Moreover, when the learned functions are employed for integrating the dynamical system incrementally by using a standard first-order explicit time-marching scheme, the computed solution will be denoted with the hat symbol $\hat{\cdot}$.

4.3.1. Linear Dynamical System

We consider data generated from the integration of the dynamical system
$$\frac{dx}{dt} = 1 - x, \tag{42}$$
from the initial condition $x(t=0) = 0$.
The long-term solution $x = 1$ is stable because, if $x < 1$, the solution increases, whereas if $x > 1$, it decreases.
The solution reads
$$x(t) = 1 - e^{-t}, \tag{43}$$
and is used for generating the synthetic data that will be used for learning the dynamical system with the two procedures previously described.
Figure 4 compares the solution obtained by integrating numerically the models learned by employing the linear and nonlinear learning procedures. The problem being linear, both procedures were expected to perform identically, as Figure 4 proves.

4.3.2. Nonlinear Dynamical System

Next, a nonlinear dynamical system is considered, expressed by
$$\frac{dx}{dt} = (1 - x)^2, \tag{44}$$
which is integrated from the initial condition $x(t=0) = 0$.
Using the same rationale as before, we can reach a conclusion about the stability of the associated long-term solution $x = 1$.
Figure 5 compares the solutions obtained with the linear and nonlinear learning procedures. As expected, the nonlinear learning procedure ensures stability and accurately represents the nonlinear behavior, whereas the learned linear model remains stable but inaccurate.

4.4. Final Remarks

Another potential neural architecture with improved robustness with respect to stability is the NeuralODE [33,34,35], which learns a model while considering many integration steps instead of learning in one-step time increments as ResNet does. Even if stability is significantly enhanced, the NeuralODE’s training efforts are greater than those required by the ResNet, mainly due to the fact that the back-propagation in the loss function minimization needs the solution of an adjoint problem, which, in some cases, exhibits poor convergence.

5. Application to the Evaluation of the Top-Oil Temperature of an Electric Power Transformer

This section addresses an application case of industrial relevance. Transformer aging correlates with the oil temperature throughout the transformer's service life. Moreover, the oil temperature seems to be an appealing feature for anticipating faults and, consequently, for use in predictive maintenance.
Some simplified models exist to evaluate the oil temperature depending on the transformer’s delivered power and the ambient temperature. Standards propose different simplified models for that purpose, such as the one considered later.
Due to the complexity of a transformer, which is large in size and embraces numerous types of physics and scales, an appealing modeling route consists of using a simplified model and then enriching it with the available collected data [36].
The correction just referred to comes from the time integration of a dynamical system involving parametric loading (delivered power and ambient temperature) learned from the available data under the stability constraints.
To illustrate the construction of the enriched model for the top-oil temperature prediction $\Theta$, we consider as inputs the ambient temperature $T^{amb}$ and the transformer load $K^{load}$, both measured every hour. Thus, the model parameters read $\mathbf{z} = (T^{amb}; K^{load})^T$.
The transformer oil temperature can be obtained from
$$\Theta(t) = \Theta^{TO}(t) + d(t), \tag{45}$$
where $d(t)$ represents the deviation, and $\Theta^{TO}$ the temperature predicted by a state-of-the-art simplified model [37] (a schematic numerical integration of this model is sketched after the nomenclature list below):
$$\begin{cases} P(K^{load}, \Theta^W, X^{tp}) = C_{th} \dfrac{d \Delta\Theta^{TO}}{dt} + \dfrac{\Delta\Theta^{TO}}{R_{th}} \\ \Theta^{TO} = T^{amb} + \Delta\Theta^{TO} \\ \Theta^W = \Theta^{TO} + \Delta\Theta^{OW} \end{cases} \tag{46}$$
where
  • $X^{tp}$ is the position of the tap-changer;
  • $K^{load}$ is the load factor, the ratio between the actual load current and the nominal (rated) current;
  • $P$ represents the iron, copper, and supplementary losses. The power that heats the oil is composed of the losses that do not depend on the transformer load (iron losses, assumed constant) and the losses that do depend on the transformer load (copper and supplementary losses), which depend on the average winding temperature and the load factor according to $P = P_0 + P_{load}$, with $P_{load} = (K^{load})^2 P^r_{load}$ and $P^r_{load} = k\, P^r_{Joule} + P^r_{supp}/k$ ($k$ being a correction factor related to the material resistivity);
  • $T^{amb}$ is the ambient temperature;
  • $\Theta^{TO}$ is the simulated top-oil temperature;
  • $\Delta\Theta^{TO}$ is the temperature difference between the simulated top-oil temperature and the ambient temperature;
  • $R_{th}$ and $C_{th}$ are the thermal resistance and thermal capacitance of the equivalent transformer thermal circuit;
  • $\Theta^W$ is the average winding temperature;
  • $\Delta\Theta^{OW}$ is the difference between the average winding temperature and the simulated oil temperature $\Theta^{TO}$; it is assumed to be constant and determined during the commissioning test (standards).
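For illustration, the thermal model of Equation (46) can be integrated step by step as in the following sketch; it is schematic, assumes constant $R_{th}$ and $C_{th}$, neglects the winding-temperature correction of the losses, and all parameter names are ours:

```python
import numpy as np

def top_oil_temperature(T_amb, K_load, P0, P_load_r, R_th, C_th, dt=3600.0):
    """Forward-Euler integration of Equation (46) for the top-oil temperature.

    T_amb, K_load: hourly series; P0: no-load (iron) losses; P_load_r: rated
    load losses; R_th, C_th: thermal resistance and capacitance (constant here).
    """
    d_theta = 0.0                           # Delta Theta_TO at t = 0
    theta_to = np.empty(len(T_amb))
    for n, (Ta, K) in enumerate(zip(T_amb, K_load)):
        P = P0 + K**2 * P_load_r            # losses heating the oil
        d_theta += dt * (P - d_theta / R_th) / C_th
        theta_to[n] = Ta + d_theta
    return theta_to
```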
The physics-based model (Equation (46)) is calibrated from the available experimental data, from which the parameters $C_{th}$ and $R_{th}$ are obtained. Two physics-based models are used: a linear one, where $C_{th}$ and $R_{th}$ are assumed constant, and a nonlinear one, where $C_{th}$ and $R_{th}$ are temperature-dependent; that is, both coefficients depend on the top-oil temperature $\Theta^{TO}$.
The available experimental data $\Theta(t)$, the prediction $\Theta^{TO}(t)$ of the calibrated nonlinear physics-based model (Equation (46)), and the deviation between them, $d(t) = \Theta(t) - \Theta^{TO}(t)$, are all depicted in Figure 6.
The model correction (the deviation model $d(t)$) can also be obtained using two different approaches, the linearized ResNet:
$$\frac{dd}{dt} = g(\tilde{\mathbf{z}})\, d + h(\tilde{\mathbf{z}}), \tag{47}$$
and its nonlinear ResNet counterpart:
$$\frac{dd}{dt} = g(\tilde{\mathbf{z}}, d)\, d + h(\tilde{\mathbf{z}}), \tag{48}$$
where $\tilde{\mathbf{z}}$ represents the augmented feature set that contains the physical features $\mathbf{z}$ augmented with the model prediction $\Theta^{TO}$.
The functions $g(\tilde{\mathbf{z}})$ and $h(\tilde{\mathbf{z}})$ are described by two LSTM architectures (detailed in Table 1 and Table 2, respectively, for the linearized ResNet) that consider the extended features involved in $\tilde{\mathbf{z}}$ at the present time, as well as at the previous four time steps.
Thus, the linearized-correction dynamical model reads
$$d_n = \Delta t\, g(\tilde{\mathbf{z}})\, d_{n-1} + \Delta t\, h(\tilde{\mathbf{z}}) + d_{n-1}, \qquad \tilde{\mathbf{z}} = \begin{pmatrix} T^{amb}_{n-4} & K^{load}_{n-4} & \Theta^{TO}_{n-4} \\ T^{amb}_{n-3} & K^{load}_{n-3} & \Theta^{TO}_{n-3} \\ \vdots & \vdots & \vdots \\ T^{amb}_{n} & K^{load}_{n} & \Theta^{TO}_{n} \end{pmatrix}. \tag{49}$$
The neural network architectures considered for describing the functions $g(\tilde{\mathbf{z}})$ and $h(\tilde{\mathbf{z}})$ are both based on LSTM layers combined with a deep dense neural network layer, as described in Table 1 and Table 2 for the linearized ResNet. They were built using the TensorFlow Keras libraries. The inputs involved in Equation (49) are shown in Figure 7.
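A possible Keras reading of Table 1 and Table 2 is sketched below; the input windows of five time steps with three extended features follow Equation (49), while any detail not fixed by the tables (for instance, the default LSTM gate activations) is an assumption:

```python
import tensorflow as tf

# Surrogate of g(z~) following Table 1: sequences of shape (5, 3), i.e., the
# extended features (T_amb, K_load, Theta_TO) at the present and previous
# four time steps.
g_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(5, return_sequences=True, input_shape=(5, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="relu"),
    tf.keras.layers.Lambda(lambda v: -v),   # enforces g(z~) <= 0 for stability
])

# Surrogate of h(z~) following Table 2: same trunk, linear output, no sign flip.
h_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(5, return_sequences=True, input_shape=(5, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="linear"),
])
```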
The training was performed with the initial 80% of the available measurements, while the testing was performed with the remaining 20%.
The prediction performed with the corrected (enriched) model is depicted in Figure 8 for the testing interval. It is important to note that the learned models $g(\tilde{\mathbf{z}})$ and $h(\tilde{\mathbf{z}})$ were used to integrate the dynamical problem that governs the time evolution of the deviation (Equation (49)); that is, the deviation at time $t_i$ was computed from that at time $t_{i-1}$, and then it was used to compute the deviation at the next time step $t_{i+1}$, and so on.
To distinguish between the known deviation $d(t)$ and the one computed from the integrator based on the learned dynamics $g(\tilde{\mathbf{z}})$ and $h(\tilde{\mathbf{z}})$, the latter is denoted with the hat symbol; i.e.,
$$\hat{d}_{n+1} = \Delta t\, g(\tilde{\mathbf{z}})\, \hat{d}_n + \Delta t\, h(\tilde{\mathbf{z}}) + \hat{d}_n. \tag{50}$$
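The recursive integration of Equation (50) can then be sketched as follows, using the hypothetical `g_model` and `h_model` surrogates introduced above and a sliding-window feature array:

```python
import numpy as np

def integrate_deviation(g_model, h_model, z_seq, d0, dt=1.0):
    """Recursive integration of Equation (50): d_hat at step n feeds step n + 1.

    z_seq: array of shape (N, 5, 3) holding the sliding windows of the extended
    features z~ (a layout assumed to match the surrogates sketched above).
    """
    d_hat = [d0]
    for z in z_seq:
        g = float(g_model(z[None, ...]))    # g(z~) <= 0 by construction
        h = float(h_model(z[None, ...]))
        d_hat.append(dt * g * d_hat[-1] + dt * h + d_hat[-1])
    return np.array(d_hat)
```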
From the computed correction, the model was enriched, exhibiting excellent accuracy and stability.
The fact that the linearized dynamical system operating on the solution correction exhibited good performances proves that most of the problem nonlinearities were captured by the first-order simplified model. For the nonlinear version of the correction, the dynamical model is written in a manner similar to Equation (49), with the integration form:
$$\hat{d}_{n+1} = \Delta t\, g(\tilde{\mathbf{z}}, \hat{d}_n)\, \hat{d}_n + \Delta t\, h(\tilde{\mathbf{z}}) + \hat{d}_n. \tag{51}$$
To assess the performances of the hybrid model (the data-driven enrichment of the simplified physics-based model), the model governing the evolution of the experimental data was also learned using the same rationale, but now applied directly to the data, as follows:
$$\Theta_{n+1} = \Delta t\, g_{\theta}(\mathbf{z})\, \Theta_n + \Delta t\, h_{\theta}(\mathbf{z}) + \Theta_n, \qquad \mathbf{z} = \begin{pmatrix} T^{amb}_{n-4} & K^{load}_{n-4} \\ T^{amb}_{n-3} & K^{load}_{n-3} \\ \vdots & \vdots \\ T^{amb}_{n} & K^{load}_{n} \end{pmatrix}, \tag{52}$$
where $g_{\theta}(\mathbf{z})$ and $h_{\theta}(\mathbf{z})$ refer to the parametric functions related to the measured oil temperature, both depending exclusively on the input features $\mathbf{z}$.
Again, the training was performed using the same architectures presented in Table 1 and Table 2, with the same 80%/20% training and test datasets.
Figure 9 depicts the results obtained from the integration, where again the hat symbol refers to the integrated temperature:
$$\hat{\Theta}_{n+1} = \Delta t\, g_{\theta}(\mathbf{z})\, \hat{\Theta}_n + \Delta t\, h_{\theta}(\mathbf{z}) + \hat{\Theta}_n. \tag{53}$$
Figure 9 depicts the data-driven model integration, proving its stability. However, when comparing the residual error (with respect to the experimental data) of the fully data-driven prediction with that of the physics-based solution enriched by the data-driven deviation model, both shown in Figure 10, the hybrid modeling framework performed better, ensuring higher accuracy.
Table 3 compares the mean values of the errors associated with the different tested models. From Table 3, one can infer that:
  • The hybrid approach improves the physics-based model performances;
  • Enriching the richer nonlinear physics-based model produced better results than enriching the linear counterpart of the simplified physics-based model;
  • When the considered physics-based models were too far from the reference solution (experimental data), the data-driven model could outperform the hybrid modeling.
To show the effect of the stabilization, a ResNet was trained without enforcing the stability constraints previously proposed, by using the formulation:
$$\frac{dd}{dt} = g(\tilde{\mathbf{z}}, d) + h(\tilde{\mathbf{z}}), \tag{54}$$
with no conditions imposed during the calculation of $g$ and $h$.
The discrete form
$$\hat{d}_{n+1} = \Delta t\, g(\tilde{\mathbf{z}}, d_n) + \Delta t\, h(\tilde{\mathbf{z}}) + d_n \tag{55}$$
provided excellent predictions. It is important to note that, here, the solution at time $t_{n+1}$, $\hat{d}_{n+1}$, is computed from the exact deviation at time $t_n$, $d_n$.
However, a full integration, where $\hat{d}_{n+1}$ is computed from the previously computed $\hat{d}_n$,
$$\hat{d}_{n+1} = \Delta t\, g(\tilde{\mathbf{z}}, \hat{d}_n) + \Delta t\, h(\tilde{\mathbf{z}}) + \hat{d}_n, \tag{56}$$
produces extremely bad predictions, a direct consequence of the lack of stability.
Figure 11 proves the importance of using stable formulations when the learned model is expected to serve for performing integrations from an initial condition, as is always the case in prognosis applications.

6. Conclusions

The present paper proposed a hybrid framework where the data-driven model serves to enrich a physics-based model considered as a first approximation of the addressed physics.
We proved that the hybrid framework enhances the prediction accuracy with respect to the physics-based model; however, the hybrid approach is superior to a fully data-driven model only under certain conditions.
The learning technique employed to model the time evolution of the deviation (or that of the data) must ensure the stability of the integration. A stabilization procedure was proposed and its performance demonstrated.
An application to a problem of practical relevance proved the excellent performances of the proposed methodology.

Author Contributions

Conceptualization, F.C.; methodology, C.G. and V.C.; software, C.G. and V.C.; validation, X.K. and G.D.; resources, G.D.; writing—original draft preparation, C.G. and F.C.; writing—review and editing, C.G., F.C. and X.K.; supervision, X.K. and G.D.; project administration, G.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are available from the authors upon request, subject to the agreement of RTE.

Acknowledgments

The authors acknowledge the support of the RTE, SKF, and ESI research chairs at Arts et Métiers Institute of Technology.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Borzacchiello, D.; Aguado, J.V.; Chinesta, F. Non-intrusive sparse subspace learning for parametrized problems. Arch. Comput. Methods Eng. 2019, 26, 303–326.
  2. Settles, B. Active Learning Literature Survey; Computer Sciences Technical Report 1648; University of Wisconsin-Madison: Madison, WI, USA, 2009.
  3. Sancarlos, A.; Champaney, V.; Cueto, E.; Chinesta, F. Regularized regressions for parametric models based on separated representations. Adv. Model. Simul. Eng. Sci. 2023, 10, 4.
  4. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
  5. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
  6. Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods; Cambridge University Press: New York, NY, USA, 2000.
  7. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  8. Kirkwood, C.W. Decision Tree Primer. 2002. Available online: https://www.public.asu.edu/~kirkwood/DAStuff/refs/decisiontrees/index.html (accessed on 24 June 2023).
  9. Chinesta, F.; Cueto, E.; Abisset-Chavanne, E.; Duval, J.L.; Khaldi, F.E. Virtual, Digital and Hybrid Twins: A New Paradigm in Data-Based Engineering and Engineered Data. Arch. Comput. Methods Eng. 2020, 27, 105–134.
  10. Sancarlos, A.; Cameron, M.; Abel, A.; Cueto, E.; Duval, J.L.; Chinesta, F. From ROM of electrochemistry to AI-based battery digital and hybrid twin. Arch. Comput. Methods Eng. 2021, 28, 979–1015.
  11. Sancarlos, A.; Cameron, M.; Peuvedic, J.M.L.; Groulier, J.; Duval, J.L.; Cueto, E.; Chinesta, F. Learning stable reduced-order models for hybrid twins. Data Centric Eng. 2021, 2, e10.
  12. Reille, A.; Champaney, V.; Daim, F.; Tourbier, Y.; Hascoet, N.; Gonzalez, D.; Cueto, E.; Duval, J.L.; Chinesta, F. Learning data-driven reduced elastic and inelastic models of spot-welded patches. Mech. Ind. 2021, 22, 32.
  13. Gonzalez, D.; Chinesta, F.; Cueto, E. Learning corrections for hyper-elastic models from data. Front. Mater.-Sect. Comput. Mater. Sci. 2019, 6, 14.
  14. Quaranta, G.; Ziane, M.; Haug, E.; Duval, J.L.; Chinesta, F. A minimally-intrusive fully 3D separated plate formulation in computational structural mechanics. Adv. Model. Simul. Eng. Sci. 2019, 6, 11.
  15. Moya, B.; Badias, A.; Alfaro, I.; Chinesta, F.; Cueto, E. Digital twins that learn and correct themselves. Int. J. Numer. Methods Eng. 2022, 123, 3034–3044.
  16. Torregrosa, S.; Champaney, V.; Ammar, A.; Hebert, V.; Chinesta, F. Surrogate Parametric Metamodel based on Optimal Transport. Math. Comput. Simul. 2022, 194, 36–63.
  17. Torregrosa, S.; Champaney, V.; Ammar, A.; Herbert, V.; Chinesta, F. Hybrid Twins based on Optimal Transport. Comput. Math. Appl. 2022, 127, 12–24.
  18. Ibanez, R.; Abisset-Chavanne, E.; Gonzalez, D.; Duval, J.L.; Cueto, E.; Chinesta, F. Hybrid Constitutive Modeling: Data-driven learning of corrections to plasticity models. Int. J. Mater. Form. 2019, 12, 717–725.
  19. Argerich, C.; Carazo, A.; Sainges, O.; Petiot, E.; Barasinski, A.; Piana, M.; Ratier, L.; Chinesta, F. Empowering Design Based on Hybrid Twin: Application to Acoustic Resonators. Designs 2020, 4, 44.
  20. Casteran, F.; Delage, K.; Cassagnau, P.; Ibanez, R.; Argerich, C.; Chinesta, F. Application of Machine Learning tools for the improvement of reactive extrusion simulation. Macromol. Mater. Eng. 2020, 305, 2000375.
  21. Ghanem, R.; Soize, C.; Mehrez, L.; Aitharaju, V. Probabilistic learning and updating of a digital twin for composite material systems. Int. J. Numer. Methods Eng. 2022, 123, 3004–3020.
  22. Ghnatios, C.; Gérard, P.; Barasinski, A. An advanced resin reaction modeling using data-driven and digital twin techniques. Int. J. Mater. Form. 2023, 16, 5.
  23. Kapteyn, M.G.; Willcox, K.E. From Physics-Based Models to Predictive Digital Twins via Interpretable Machine Learning. arXiv 2020, arXiv:2004.11356v3.
  24. Tuegel, E.J.; Ingraffea, A.R.; Eason, T.G.; Spottswood, S.M. Reengineering Aircraft Structural Life Prediction Using a Digital Twin. Int. J. Aerosp. Eng. 2011, 2011, 154798.
  25. Distefano, G.P. Stability of numerical integration techniques. AIChE J. 1968, 14, 946–955.
  26. Dar, S.H.; Chen, W.; Zheng, F.; Gao, S.; Hu, K. An LSTM with Differential Structure and Its Application in Action Recognition. Math. Probl. Eng. 2022, 2022, 7316396.
  27. Zhou, G.B.; Wu, J.; Zhang, C.L.; Zhou, Z.H. Minimal gated unit for recurrent neural networks. Int. J. Autom. Comput. 2016, 13, 226–234.
  28. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
  29. Blaud, P.C.; Chevrel, P.; Claveau, F.; Haurant, P.; Mouraud, A. ResNet and PolyNet based identification and (MPC) control of dynamical systems: A promising way. IEEE Access 2022, 11, 20657–20672.
  30. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  31. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity Mappings in Deep Residual Networks. In Computer Vision—ECCV 2016; Lecture Notes in Computer Science; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Cham, Switzerland, 2016; Volume 9908.
  32. Schmid, P.J. Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech. 2010, 656, 5–28.
  33. Chen, R.T.Q.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D. Neural ordinary differential equations. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal, QC, Canada, 2–8 December 2018; Volume 32, pp. 1–18.
  34. Chen, R.T.Q.; Amos, B.; Nickel, M. Learning Neural Event Functions for Ordinary Differential Equations. arXiv 2020, arXiv:2011.03902.
  35. Enciso-Salas, L.; Perez-Zuniga, G.; Sotomayor-Moriano, J. Fault Detection and Isolation for UAVs using Neural Ordinary Differential Equations. IFAC-PapersOnLine 2022, 55, 643–648.
  36. Kestelyn, X.; Denis, G.; Champaney, V.; Hascoet, N.; Ghnatios, C.; Chinesta, F. Towards a hybrid twin for infrastructure asset management: Investigation on power transformer asset maintenance management. In Proceedings of the 7th International Advanced Research Workshop on Transformers (ARWtr), Baiona, Spain, 23–26 October 2022; pp. 109–114.
  37. IEC 60076-7:2018; Loading Guide for Mineral-Oil-Immersed Power Transformers. International Electrotechnical Commission: Geneva, Switzerland, 2018.
Figure 1. Recurrent neural network architecture.
Figure 2. LSTM block architecture: the output $\mathbf{o}_n$ results from $\mathbf{l}_n$ and the previous $m-1$ inputs, while the model almost totally forgets the initial hidden long and short memory states $\mathbf{h}_0$ and $\mathbf{c}_0$, which can be initialized with zero values.
Figure 3. ResNet global architecture and integration mode (dotted line).
Figure 4. Linear dynamical system: solution computed by integrating the models obtained with the linear (top) and nonlinear (bottom) learning procedures.
Figure 5. Nonlinear dynamical system: solution computed by integrating the models obtained with the linear (top) and nonlinear (bottom) learning procedures.
Figure 6. Experimental data $\Theta(t)$, simulated solution $\Theta^{TO}(t)$, and deviation $d(t) = \Theta(t) - \Theta^{TO}(t)$.
Figure 7. Network considered to model $g(\tilde{\mathbf{z}})$. The variables $v_i$ are intermediate variables involved in the construction of $g(\tilde{\mathbf{z}})$. A similar architecture was considered for modeling $h(\tilde{\mathbf{z}})$.
Figure 8. Physics-based simplified model correction from a stabilized linearized ResNet model, here illustrated with the test dataset.
Figure 9. Fully data-driven model consisting of a stabilized ResNet model, here illustrated with the test dataset.
Figure 10. Comparing the errors of the data-only linearized ResNet and the linearized ResNet hybrid model with the test dataset.
Figure 11. Integration performed with a ResNet learned without enforcing stability constraints.
Table 1. The building blocks of the LSTM-based surrogate of $g(\tilde{\mathbf{z}})$.

| Layer | Building Block | Activation |
|-------|----------------|------------|
| 1 | LSTM layer with five outputs, return sequence true | sigmoid + tanh |
| 2 | Flatten | no activation |
| 3 | Dense connection with one output | relu |
| 4 | Lambda layer returning −1 × inputs | no activation |

Table 2. The building blocks of the LSTM-based surrogate of $h(\tilde{\mathbf{z}})$.

| Layer | Building Block | Activation |
|-------|----------------|------------|
| 1 | LSTM layer with five outputs, return sequence true | sigmoid + tanh |
| 2 | Flatten | no activation |
| 3 | Dense connection with one output | linear |

Table 3. Comparing the different models built: mean error (in °C) on the testing set; HT denotes the hybrid twin. For reference, the mean error of the nonlinear physics-based model used alone on the same testing set was 3.91 °C, and that of the linear physics-based model was 3.25 °C.

| ResNet | Fully Data-Driven | HT from a Linear Physical Model | HT from a Nonlinear Physical Model |
|--------|-------------------|---------------------------------|-------------------------------------|
| Linear stabilized | 2.173 | 3.143 | 1.620 |
| Nonlinear stabilized | 1.716 | 1.516 | 1.439 |
