Article

The Finite-Time Turnpike Property in Machine Learning

by
Martin Gugat
Department Mathematik, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Cauerstr. 11, 91058 Erlangen, Germany
Machines 2024, 12(10), 705; https://doi.org/10.3390/machines12100705
Submission received: 14 August 2024 / Revised: 23 September 2024 / Accepted: 2 October 2024 / Published: 4 October 2024
(This article belongs to the Section Robotics, Mechatronics and Intelligent Machines)

Abstract

The finite-time turnpike property describes the situation in an optimal control problem where an optimal trajectory reaches the desired state before the end of the time interval and remains there. We consider a machine learning problem with a neural ordinary differential equation that can be seen as a homogenization of a deep ResNet. We show that with an appropriate scaling of the quadratic control cost and the non-smooth tracking term, the optimal control problem has the finite-time turnpike property; that is, the desired state is reached within the time interval and the optimal state remains there until the terminal time $T$. The time $t_0$ where the optimal trajectories reach the desired state can serve as an additional design parameter. Since ResNets can be viewed as discretizations of neural ODEs, the choice of $t_0$ corresponds to the choice of the number of layers, that is, the depth of the neural network. The choice of $t_0$ allows us to achieve a compromise between the depth of the network and the size of the optimal system parameters, which we hope will be useful for determining optimal depths of neural network architectures in the future.

1. Introduction

We study a system that is governed by a neural ODE that can be considered a continuous-time ResNet. Before we can outline the system, some notation is necessary.
The activation function $\sigma$ is assumed to be continuously differentiable and Lipschitz continuous with a Lipschitz constant that is less than or equal to 1, for example
$$ \sigma(z) = \tanh(z) $$
or $\sigma(z) = \frac{1}{1 + \exp(-z)}$, where for $z \in \mathbb{R}^d$ the function $\sigma$ acts component-wise; that is, $\sigma(z) \in \mathbb{R}^d$ with the $i$-th component given by, e.g., $(\sigma(z))_i = \tanh(z_i)$ ($i \in \{1, \ldots, d\}$).
Let a real number $T > 0$ and natural numbers $d$ and $p$ in $\{1, 2, 3, \ldots\}$ be given. For $i \in \{1, \ldots, p\}$ and almost every $t \in [0, T]$, let $w_i(t) \in \mathbb{R}^d$ and $a_i(t) \in \mathbb{R}^d$ be given. The $w_i(t)$ are the columns of the matrix $W(t) \in \mathbb{R}^{d \times p}$ and the $a_i(t)$ are the columns of the matrix $A(t) \in \mathbb{R}^{d \times p}$. For almost every $t \in [0, T]$, let the bias vector $b(t) \in \mathbb{R}^p$ with the components $b_i(t)$ ($i \in \{1, \ldots, p\}$) be given. In order to state the required regularity assumptions, we introduce the space
$$ X(T) = \Big\{ \text{measurable functions } (W(t), A(t), b(t)) \text{ defined on } (0, T) \ \text{such that} \ \int_0^T \|W(t)\|^2 + \|A(t)\|^2 + \|b(t)\|^2 \, dt < \infty \Big\}. $$
For parameters $(W, A, b) \in X(T)$, the system $\mathbf{S}$ is defined as follows:
$$ \mathbf{S}: \qquad x(0) = x_0 \in \mathbb{R}^d, \qquad x'(t) = \sum_{i=1}^p \sigma\big( a_i(t)^\top x(t) + b_i(t) \big)\, w_i(t) \tag{1} $$
(see for example [1,2]).
The motivation to study (1) is that a time-discrete version can be considered as a residual neural network (ResNet), an architecture that has been very useful in many applications; see [3] for identification problems with physics-informed neural ordinary differential equations, [4] for applications in image classification for the detection of colorectal cancer, and [5] for examples in image registration and classification problems. A time-discrete version can be obtained, for example, by an explicit Euler discretization of (1); see (28) in Section 4. The fact that ’ResNet, PolyNet, FractalNet and RevNet, can be interpreted as different numerical discretizations of differential equations’ has been discussed in detail in [6].
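To make the connection to ResNets concrete, the following sketch (Python with NumPy; the step size, the tanh activation and all variable names are illustrative assumptions, not taken from the paper) implements one explicit Euler step of (1). The resulting update has the form of a ResNet layer: the identity (skip connection) plus a parameterized correction.

```python
import numpy as np

def neural_ode_rhs(x, W, A, b):
    """Right-hand side of (1): sum_i sigma(a_i . x + b_i) * w_i = W @ sigma(A^T x + b).

    W and A are d-by-p matrices whose columns are w_i and a_i; b is a vector in R^p.
    """
    return W @ np.tanh(A.T @ x + b)          # tanh as an example activation

def euler_step(x, W, A, b, h):
    """One explicit Euler step: x_k = x_{k-1} + h * f(x_{k-1}), i.e., a ResNet layer."""
    return x + h * neural_ode_rhs(x, W, A, b)

# Illustrative forward pass with random parameters (d = 3 states, p = 3 hidden units).
rng = np.random.default_rng(0)
d, p, h = 3, 3, 0.1
x = np.array([1.0, 1.0, 1.0])
for _ in range(10):                          # 10 Euler steps correspond to 10 layers
    W, A, b = rng.normal(size=(d, p)), rng.normal(size=(d, p)), rng.normal(size=p)
    x = euler_step(x, W, A, b, h)
```

In this picture, the depth of the ResNet corresponds to the number of Euler steps, which is why the time $t_0$ at which the turnpike is reached can later be read as a choice of the network depth.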
For the given time horizon $T > 0$, we study an optimal control problem on the time interval $[0, T]$. The desired state is given by $x_T \in \mathbb{R}^d$; that is, $x_T$ denotes the desired output of the system. Let $t_0 \in (0, T)$ be given. For the training of the system, we study the loss function with a tracking term
$$ Q(W, A, b) = \int_{t_0}^{T} |x(t) - x_T| + |x'(t)| \, dt \tag{2} $$
with the non-smooth norm $|z| = \sum_{i=1}^d |z_i|$. For our result, the inclusion of the derivative $x'$ in the definition of $Q$ in (2) is essential, since due to this inclusion the loss function multiplied by the factor $\frac{1}{T - t_0} + 1$ is an upper bound for the maximum norm of $x - x_T$ on $[t_0, T]$; see inequality (9) in Lemma 1. This allows us to prove the finite-time turnpike property in Theorem 1. We explain the turnpike phenomenon below.
We define the control cost (regularization term)
$$ R(W, A, b) = \int_0^T \tfrac{1}{2}\, \|W(t)\|^2 + \tfrac{1}{2}\, \|A(t)\|^2 + \tfrac{1}{2}\, \|b(t)\|^2 \, dt. \tag{3} $$
Here, $\|W(t)\|$ denotes the Frobenius norm of $W(t)$.
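As a reading aid, here is a small sketch (Python/NumPy; the trapezoidal quadrature, the finite-difference derivative and all variable names are illustrative assumptions, not part of the paper) of how the tracking term (2) and the control cost (3) could be evaluated for a trajectory and parameters sampled on a time grid.

```python
import numpy as np

def tracking_term_Q(t, x, x_T, t0):
    """Approximation of Q in (2): integral over [t0, T] of |x - x_T|_1 + |x'|_1."""
    dx_dt = np.gradient(x, t, axis=0)                  # finite-difference estimate of x'
    integrand = np.abs(x - x_T).sum(axis=1) + np.abs(dx_dt).sum(axis=1)
    mask = t >= t0
    return np.trapz(integrand[mask], t[mask])

def control_cost_R(t, W, A, b):
    """Approximation of R in (3): integral of 0.5 * (||W||_F^2 + ||A||_F^2 + |b|^2)."""
    integrand = 0.5 * ((W ** 2).sum(axis=(1, 2)) + (A ** 2).sum(axis=(1, 2)) + (b ** 2).sum(axis=1))
    return np.trapz(integrand, t)
```

Here `t` is a grid of time points, `x` has shape `(len(t), d)`, and `W`, `A`, `b` hold the parameter values on the grid with shapes `(len(t), d, p)`, `(len(t), d, p)` and `(len(t), p)`.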
Lemma 10 in [7] states that system (1) is exactly controllable; that is, the terminal condition
$$ x(t_0) = z \tag{4} $$
can be satisfied for all $t_0 > 0$. To be precise, for all $t_0 > 0$ there exists a constant $C_e > 0$ such that for all $z \in \mathbb{R}^d$ we can find a control $u_{exact} = (W_{exact}, A_{exact}, b_{exact})$ such that for the state $\tilde{x}$ that is generated by (1) with the initial condition $\tilde{x}(0) = x_0$, we have $\tilde{x}(t_0) = z$ and
$$ \|u_{exact}\|_{L^2(0, t_0)} \le C_e\, \|z - x_0\|. \tag{5} $$
Also, the linearized system is exactly controllable in the sense that for all $t_0 > 0$ there exists a constant $C_e > 0$ such that for all $z \in \mathbb{R}^d$ we can find a control $\delta u$ such that for the state $\delta x$ that is generated by the linearized system stated below with the initial condition $\delta x(0) = 0$, we have $\delta x(t_0) = z$ and
$$ \|\delta u\|_{L^2(0, t_0)} \le C_e\, \|z\|. \tag{6} $$
The linearized system at a given $u = (W, A, b)$, for the variation $\delta x$ of the state that is generated by a variation $\delta u = (\delta W, \delta A, \delta b)$ of the control, is
$$ \delta x'(t) = \sum_{i=1}^p \sigma\big( a_i(t)^\top x(t) + b_i(t) \big)\, \delta w_i(t) + \sum_{i=1}^p \sigma'\big( a_i(t)^\top x(t) + b_i(t) \big)\, w_i(t)\, x(t)^\top \delta a_i(t) $$
$$ \qquad + \sum_{i=1}^p \sigma'\big( a_i(t)^\top x(t) + b_i(t) \big)\, w_i(t)\, \delta b_i(t) + \sum_{i=1}^p \sigma'\big( a_i(t)^\top x(t) + b_i(t) \big)\, w_i(t)\, a_i(t)^\top \delta x(t) $$
with the initial condition $\delta x(0) = 0$.
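Since the linearized system is the main tool in the descent argument of Section 2, the following sketch (Python/NumPy; the explicit Euler time stepping, the tanh activation and all names are illustrative assumptions) checks it against a finite-difference directional derivative of the nonlinear flow (1).

```python
import numpy as np

def rhs(x, W, A, b):
    """Right-hand side of (1) with sigma = tanh: W @ sigma(A^T x + b)."""
    return W @ np.tanh(A.T @ x + b)

def rhs_linearized(x, dx, W, A, b, dW, dA, db):
    """Right-hand side of the linearized system at (W, A, b) along (dW, dA, db)."""
    z = A.T @ x + b
    s, ds = np.tanh(z), 1.0 - np.tanh(z) ** 2          # sigma(z) and sigma'(z)
    return dW @ s + W @ (ds * (dA.T @ x + A.T @ dx + db))

rng = np.random.default_rng(1)
d, p, T, n = 3, 3, 1.0, 200
h = T / n
W, A, b = rng.normal(size=(d, p)), rng.normal(size=(d, p)), rng.normal(size=p)
dW, dA, db = rng.normal(size=(d, p)), rng.normal(size=(d, p)), rng.normal(size=p)

eps = 1e-6
x, x_pert, dx = np.ones(d), np.ones(d), np.zeros(d)    # x(0) = x_0, delta x(0) = 0
for _ in range(n):                                     # explicit Euler in time
    dx = dx + h * rhs_linearized(x, dx, W, A, b, dW, dA, db)
    x_pert = x_pert + h * rhs(x_pert, W + eps * dW, A + eps * dA, b + eps * db)
    x = x + h * rhs(x, W, A, b)

print(np.max(np.abs((x_pert - x) / eps - dx)))         # small: linearization is consistent
```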
A universal approximation theorem for the corresponding time-discrete case with recurrent neural networks can be found in the seminal paper [8] by Cybenko; see also [9,10,11,12].
For a parameter $\gamma > 0$, define
$$ J(W, A, b) = \gamma\, Q(W, A, b) + R(W, A, b) \tag{7} $$
with $Q$ as defined in (2) and $R$ as defined in (3). We study the minimization (training) problem
$$ \mathbf{P}(T, \gamma): \qquad \min_{(W, A, b) \in X(T)} J(W, A, b). $$
Our main result is that if $\gamma$ is chosen sufficiently large, the optimal control problem $\mathbf{P}(T, \gamma)$ has the finite-time turnpike property; that is, the desired state is already reached within the time interval $[0, T]$ and the optimal state remains there until the end of the time interval.
Figure 1 shows an example of the graph of the function $t \mapsto |x(t)|$ ($t \in [0, T]$), where $x$ is an optimal trajectory that has the finite-time turnpike property. Our main result in Theorem 1 states that the optimal trajectories of $\mathbf{P}(T, \gamma)$ vanish on the interval $[t_0, T]$.
The finite-time turnpike property has been studied for example in [13,14,15]. In the first two references, the finite-time turnpike property is achieved by the non-smoothness of the objective functional. In this paper, we use a similar approach adapted to the framework of neural ordinary differential equations.
The finite-time turnpike property is an extremal case of the celebrated turnpike property that has originally been studied in economics. Turnpike analysis investigates how the solutions of dynamic optimal control problems with a time evolution are related to the solutions of the corresponding static problems, where the time derivatives are set to zero and the initial conditions are cancelled. It turns out that, often for large time horizons, on large parts of the time interval the solution of the dynamic problem is very close to the solution of the corresponding static problem. For an overview of the turnpike property, see [16,17,18,19] and the numerous references therein.
In the case of the finite-time turnpike property, after a finite time, the solution of the dynamic problem coincides with the solution of the static problem. The exponential turnpike property for ResNets and beyond has been studied for example in [20], but not the finite-time turnpike property.
Our approach yields an optimization problem with a non-smooth objective functional without terminal constraints. For the numerical solution of this type of problem, a relaxation approach can be used; see, for example, [21]. Due to the finite-time turnpike property, we obtain learning problems without terminal conditions (that are easier to solve than problems with terminal constraints), where the optimal trajectories still attain the desired terminal states exactly.
Since the objective functional contains a tracking term of $L^1$-norm type, our problem is related to studies in compressed sensing, where this type of objective functional is used to enhance sparsity; see [22]. For a study about sparsity in Bregman machine learning, see [23].
In Section 3, we discuss the well-posedness of $\mathbf{P}(T, \gamma)$. We present a result about the existence of solutions of $\mathbf{P}(T, \gamma)$ for a fixed matrix $A$. This implies the existence of a solution for the problem where the feasible set contains only constant matrices $A$ that are independent of $t$; see Remark 2.
In Section 4, numerical examples are presented that illustrate that the finite-time turnpike property is also visible in the time-discrete case.

2. The Finite-Time Turnpike Property

The following Theorem contains our main result, which states that the control cost entails the finite-time turnpike property.
Theorem 1.
For each sufficiently large $\gamma > 0$, each optimal trajectory for $\mathbf{P}(T, \gamma)$ satisfies
$$ x(t) = x_T, \qquad t \in [t_0, T]; \tag{8} $$
that is, $\mathbf{P}(T, \gamma)$ has the finite-time turnpike property. For $t \ge t_0$, the optimal parameters satisfy $W(t) = 0$, $A(t) = 0$ and $b(t) = 0$. The optimal parameters remain unchanged if $\gamma$ is enlarged further or if $T$ is enlarged further.
Remark 1.
The condition in Theorem 1 that $\gamma$ is sufficiently large is satisfied if inequality (19), which is stated below, holds. This requires a large value of $\gamma$ if $t_0$ is chosen very close to $T$. In fact, as $t_0 \in (0, T)$ approaches $T$, the lower bound for $\gamma$ converges to infinity.
For the proof of Theorem 1, we need a result about the embedding of the Sobolev space $W^{1,1}$ into the space of continuous functions; this space is related to our learning problem due to the definition of $Q$ in (2). Let
$$ L^1(0, T) = \Big\{ f : [0, T] \to \mathbb{R} \ : \ f \ \text{is measurable and} \ \int_0^T |f(t)| \, dt < \infty \Big\} $$
and consider the Sobolev space
$$ W^{1,1}(0, T) = \big\{ f \in L^1(0, T) : f' \in L^1(0, T) \big\}. $$
Lemma 1.
Let $t_0 \in [0, T)$. For all $x \in W^{1,1}(t_0, T)$, we have the inequality
$$ \max_{t \in [t_0, T]} |x(t)| \le \Big( \frac{1}{T - t_0} + 1 \Big) \int_{t_0}^{T} |x(t)| + |x'(t)| \, dt. \tag{9} $$
Proof of Lemma 1.
For $t_1, t_2 \in [t_0, T]$ with $t_1 \le t_2$, we have
$$ |x(t_1) - x(t_2)| = \Big| \int_{t_1}^{t_2} x'(t) \, dt \Big| \le \int_{t_1}^{t_2} |x'(t)| \, dt. $$
Thus, $x$ is continuous on $[t_0, T]$. Hence, there exists a point $t_\ast \in [t_0, T]$ with
$$ |x(t_\ast)| = \min_{t \in [t_0, T]} |x(t)| \le \frac{1}{T - t_0} \int_{t_0}^{T} |x(t)| \, dt. $$
Thus, for all $\tau \in [t_0, T]$ the following inequality holds:
$$ |x(\tau)| \le |x(t_\ast)| + |x(t_\ast) - x(\tau)| \le \frac{1}{T - t_0} \int_{t_0}^{T} |x(t)| \, dt + \int_{t_0}^{T} |x'(t)| \, dt \le \Big( \frac{1}{T - t_0} + 1 \Big) \int_{t_0}^{T} |x(t)| + |x'(t)| \, dt. \qquad \square $$
Now, we are prepared to detail the proof of Theorem 1.
Proof of Theorem 1.
  • Case 1: If $x_0 = x_T$, the parameters $u = (W, A, b) = (0, 0, 0)$ generate the constant state $x(t) = x_T$. Hence, $u = 0$ solves $\mathbf{P}(T, \gamma)$ and the assertion follows.
  • Case 2: Now, we assume that $x_0 \ne x_T$. For $u = (W, A, b) \in X(T)$, define the cost
$$ C_{(0, t_0)}(u) = \int_0^{t_0} \tfrac{1}{2}\, \|W(t)\|^2 + \tfrac{1}{2}\, \|A(t)\|^2 + \tfrac{1}{2}\, \|b(t)\|^2 \, dt. \tag{10} $$
Consider the non-smooth tracking term $Q(u)$ as defined in (2). For $u \in X(T)$, define the objective functional
$$ K_T(u) = C_{(0, t_0)}(u) + \gamma\, Q(u). \tag{11} $$
We consider the auxiliary problem
$$ \mathbf{Q}(T): \qquad \min_{u \in X(T)} K_T(u). $$
We show by an indirect proof that for each solution $u^\ast$ of $\mathbf{Q}(T)$, we have
$$ Q(u^\ast) = 0. $$
Suppose that there exists a solution $u^\ast = (W^\ast, A^\ast, b^\ast)$ of $\mathbf{Q}(T)$ such that $Q(u^\ast) > 0$. Then, for the corresponding optimal state $x^\ast$ that is generated by system (1), we have $x^\ast(t_0) \ne x_T$; otherwise (that is, if $x^\ast(t_0) = x_T$), we could proceed as follows: switch off the control at $t_0$ and continue with the zero control $(0, 0, 0)$ for $t \in (t_0, T]$, which generates the constant state $x_T$ on $(t_0, T]$. This approach would strictly improve the performance, since for the concatenated control $u_M(t) = u^\ast(t)$ for $t \in [0, t_0]$, $u_M(t) = 0$ for $t \in (t_0, T]$, we have $C_{(t_0, T)}(u_M) = 0$, $Q(u_M) = 0$ and thus $K_T(u_M) = C_{(0, t_0)}(u^\ast) < C_{(0, t_0)}(u^\ast) + \gamma\, Q(u^\ast) = K_T(u^\ast)$, which is a contradiction, since $u^\ast$ is an optimal control for $\mathbf{Q}(T)$.
Define the auxiliary state
$$ \tilde{x}(t_0) = x_T + \frac{1}{Q(u^\ast)} \big( x^\ast(t_0) - x_T \big). \tag{12} $$
The exact controllability of the linearized system implies that we can find a control $\tilde{v} \in L^2(0, t_0)$ that, due to (6) and (12), satisfies the inequality
$$ \|\tilde{v}\|_{L^2(0, t_0)} \le C_e\, \|\tilde{x}(t_0) - x_T\| = C_e\, \frac{1}{Q(u^\ast)}\, \|x^\ast(t_0) - x_T\| \tag{13} $$
and that generates the state $\tilde{V}$ with $\tilde{V}(0) = 0$ and $\tilde{V}(t_0) = \tilde{x}(t_0) - x_T$.
Due to (9), we have
$$ \|x^\ast(t_0) - x_T\| \le \Big( \frac{1}{T - t_0} + 1 \Big) \int_{t_0}^{T} |x^\ast(t) - x_T| + |{x^\ast}'(t)| \, dt = \Big( \frac{1}{T - t_0} + 1 \Big) Q(u^\ast). \tag{14} $$
Thus, (13) implies
$$ \|\tilde{v}\|_{L^2(0, t_0)} \le C_e \Big( \frac{1}{T - t_0} + 1 \Big). $$
For a step size $\varepsilon \in (0, Q(u^\ast))$, define
$$ \lambda = 1 - \frac{\varepsilon}{Q(u^\ast)} \in (0, 1). \tag{15} $$
Consider the control $u$ that, for $t \in (0, t_0]$, is defined by the equation
$$ u(t) = u^\ast(t) - \varepsilon\, \tilde{v}(t) \tag{16} $$
and, for $t \in (t_0, T)$, we define $\tilde{v} = (\delta W, \delta A, \delta b)$ with
$$ \delta W(t) = \frac{1}{Q(u^\ast)}\, W^\ast(t), \qquad \delta A(t) = \frac{1}{Q(u^\ast)}\, A^\ast(t), \qquad \delta b(t) = \frac{1}{Q(u^\ast)}\, b^\ast(t). $$
We will show that if $\gamma > 0$ is sufficiently large, $\tilde{v}$ yields a descent direction, in the sense that a small step of the form $u = u^\ast - \varepsilon\, \tilde{v}$ improves the performance of the control $u^\ast$. This can be seen as follows.
For the state $x = x^\ast + \delta x$ that is generated with the solution $\delta x$ of the linearized system with the initial condition $\delta x(0) = 0$ and the control variation $-\varepsilon\, \tilde{v}$, we have, at $t_0$,
$$ \delta x(t_0) = -\varepsilon\, \big( \tilde{x}(t_0) - x_T \big) = -\frac{\varepsilon}{Q(u^\ast)} \big( x^\ast(t_0) - x_T \big), $$
where the last step follows from (12).
On the following time interval $[t_0, T]$, for the corresponding state $x = x^\ast + \delta x$ that is generated with the solution $\delta x$ of the linearized system starting with
$$ \delta x(t_0) = -\frac{\varepsilon}{Q(u^\ast)} \big( x^\ast(t_0) - x_T \big) $$
and the control variation $-\varepsilon\, \tilde{v}$, we have
$$ \delta x(t) = -\frac{\varepsilon}{Q(u^\ast)} \big( x^\ast(t) - x_T \big) + O\big( \varepsilon^2 \|\tilde{v}\|^2 \big). $$
For $t \in [t_0, T]$, due to the definition of $\lambda$ in (15), this yields
$$ x(t) - x_T = \lambda\, \big( x^\ast(t) - x_T \big) + O\big( \varepsilon^2 \|\tilde{v}\|^2 \big). $$
Thus, for the tracking term with $u$ defined in (16), we have the bound
$$ Q(u) = \lambda\, Q(u^\ast) + O(\varepsilon^2) = \Big( 1 - \frac{\varepsilon}{Q(u^\ast)} \Big)\, Q(u^\ast) + O(\varepsilon^2). $$
For the control cost defined in (10), we have
$$ C_{(0, t_0)}(u) = \langle u^\ast - \varepsilon\, \tilde{v},\, u^\ast - \varepsilon\, \tilde{v} \rangle_{L^2(0, t_0)} = C_{(0, t_0)}(u^\ast) - 2\, \varepsilon\, \langle u^\ast, \tilde{v} \rangle_{L^2(0, t_0)} + \varepsilon^2\, C_{(0, t_0)}(\tilde{v}). $$
With $K_T$ as defined in (11), consider the function
$$ p(\varepsilon) = K_T(u^\ast - \varepsilon\, \tilde{v}) = C_{(0, t_0)}(u^\ast) - 2\, \varepsilon\, \langle u^\ast, \tilde{v} \rangle_{L^2(0, t_0)} + \varepsilon^2\, C_{(0, t_0)}(\tilde{v}) + \gamma \Big[ \Big( 1 - \frac{\varepsilon}{Q(u^\ast)} \Big)\, Q(u^\ast) + O(\varepsilon^2) \Big]. $$
Then, we have
$$ p'(\varepsilon) = -2\, \langle u^\ast, \tilde{v} \rangle_{L^2(0, t_0)} + 2\, \varepsilon\, C_{(0, t_0)}(\tilde{v}) - \gamma + O(\varepsilon). $$
This yields
$$ p'(0) = -2\, \langle u^\ast, \tilde{v} \rangle_{L^2(0, t_0)} - \gamma. \tag{17} $$
The exact controllability of (1) implies that there is a control $u_{exact} \in L^2(0, t_0)$ with (due to (5))
$$ \|u_{exact}\|_{L^2(0, t_0)} \le C_e\, \|x_0 - x_T\| $$
that generates the state $V_{exact}$ with $V_{exact}(0) = x_0$ and $V_{exact}(t_0) = x_T$. For $t > t_0$, let $u_{exact}(t) = 0$. Since $u_{exact}$ is feasible for $\mathbf{Q}(T)$, this yields the inequality
$$ C_{(0, t_0)}(u^\ast) \le K_T(u^\ast) \le K_T(u_{exact}) = \|u_{exact}\|_{L^2(0, t_0)}^2 \le C_e^2\, \|x_0 - x_T\|^2. $$
Hence, we have
$$ \big| \langle u^\ast, \tilde{v} \rangle_{L^2(0, t_0)} \big| \le C_e\, \|x_0 - x_T\|\, \|\tilde{v}\|_{L^2(0, t_0)} \le \|x_0 - x_T\|\, C_e^2\, \Big( \frac{1}{T - t_0} + 1 \Big). \tag{18} $$
Thus, if
$$ \gamma > 2\, \|x_0 - x_T\|\, C_e^2\, \Big( \frac{1}{T - t_0} + 1 \Big), \tag{19} $$
then, due to (17) and (18), we have $p'(0) \le -\gamma + 2\, \|x_0 - x_T\|\, C_e^2\, \big( \frac{1}{T - t_0} + 1 \big) < 0$. This implies that for $\varepsilon > 0$ sufficiently small, we have
$$ K_T(u^\ast - \varepsilon\, \tilde{v}) < K_T(u^\ast), $$
which is a contradiction to the optimality of $u^\ast$.
Hence, for any optimal control $u^\ast$ of $\mathbf{Q}(T)$, we have
$$ Q(u^\ast) = 0. \tag{20} $$
With inequality (14), this implies that for each optimal state, we have
$$ x^\ast(t_0) = x_T. \tag{21} $$
Now, we come back to the problem
$$ \mathbf{P}(T, \gamma): \qquad \min_{u \in X(T)} J(u) $$
with $J$ defined in (7). Let $v_P(T)$ denote the optimal value of $\mathbf{P}(T, \gamma)$ and $v_Q(T)$ denote the optimal value of $\mathbf{Q}(T)$. Since $K_T(u) \le J(u)$, we have $v_Q(T) \le v_P(T)$.
Moreover, any optimal control $u^\ast$ for $\mathbf{Q}(T)$ is feasible for $\mathbf{P}(T, \gamma)$. Since (21) holds, i.e., $x^\ast(t_0) = x_T$, we have $C_{(t_0, T)}(u^\ast) = 0$. Hence, $v_P(T) \le J(u^\ast) = K_T(u^\ast) = v_Q(T)$, and thus $v_P(T) \le v_Q(T)$. Therefore, we have
$$ v_P(T) = v_Q(T). \tag{22} $$
Equation (22) implies that the parameters that are optimal for $\mathbf{P}(T, \gamma)$ are also optimal for $\mathbf{Q}(T)$. Hence, for each optimal state, we have (21) and (20), which implies (8), and the assertion follows. Thus, we have proved Theorem 1. □

3. The Existence of Solutions of P(T, γ) for Fixed A

For the sake of completeness of the analysis, we also state an existence result. However, we can only prove the existence of a solution for the problem where the matrix $A$ is fixed and not an optimization parameter of $\mathbf{P}(T, \gamma)$. Thus, for a given matrix-valued function $A(t)$, we consider the problem
$$ \mathbf{P}(T, \gamma, A): \qquad \min_{(\,\cdot\,,\, A,\, \cdot\,) \in X(T)} J(\,\cdot\,,\, A,\, \cdot\,). $$
In order to show the existence of a solution of $\mathbf{P}(T, \gamma, A)$, we assume that there exists a number $M > 0$ such that for almost every $t \in [0, T]$ we have $\max_{i \in \{1, \ldots, p\}} \|a_i(t)\| \le M$. This is the case if the $a_i$ are elements of the function space $L^\infty(0, T)$, for example, if they are step functions on $(0, T)$.
Theorem 2.
Assume that $\sup_x |\sigma(x)| \le 1$ and that the Lipschitz constant of $\sigma$ is less than or equal to 1. Assume that $A(t)$ is given such that we have
$$ \operatorname*{ess\,sup}_{i \in \{1, \ldots, p\},\ s \in [0, T]} \|a_i(s)\| < \infty. $$
Then, for each $T > 0$ and $\gamma > 0$, problem $\mathbf{P}(T, \gamma, A)$ has a solution $(W, b)$ such that $(W, A, b) \in X(T)$.
If $A(t) = 0$ for $t \ge t_0$, then for sufficiently large $\gamma$, each solution of $\mathbf{P}(T, \gamma, A)$ has the finite-time turnpike property stated in Theorem 1.
The proof of Theorem 2 uses Gronwall’s Lemma (see, for example, [24]). For the convenience of the reader, we state it here:
Lemma 2
(Gronwall’s Lemma). Let $L > 0$, $U_0 \ge 0$, $\varepsilon \ge 0$ and an integrable function $U$ on $[0, T]$ be given.
Assume that for almost every $t \in [0, T]$, the integral inequality
$$ 0 \le U(t) \le U_0 + \int_0^t L\, U(\tau) + \varepsilon \, d\tau \tag{23} $$
holds. Then, for almost every $t \in [0, T]$, the function $U$ satisfies the inequality
$$ U(t) \le U_0\, e^{L t} + \varepsilon\, \frac{e^{L t} - 1}{L}. \tag{24} $$
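As a quick plausibility check (not part of the paper), the bound (24) is attained by the extremal case $U'(t) = L\,U(t) + \varepsilon$, $U(0) = U_0$, for which the integral inequality (23) holds with equality. The following sketch (Python/NumPy; all constants are illustrative) compares a forward Euler solution of that ODE with the closed-form right-hand side of (24).

```python
import numpy as np

L_const, U0, eps, T = 2.0, 0.5, 0.1, 1.0
t = np.linspace(0.0, T, 1001)
dt = t[1] - t[0]

# Forward Euler for the extremal case U'(t) = L * U(t) + eps, U(0) = U0.
U = np.empty_like(t)
U[0] = U0
for k in range(len(t) - 1):
    U[k + 1] = U[k] + dt * (L_const * U[k] + eps)

# Closed-form right-hand side of the Gronwall bound (24).
bound = U0 * np.exp(L_const * t) + eps * (np.exp(L_const * t) - 1.0) / L_const

print(np.max(U - bound))   # close to zero (forward Euler slightly underestimates the bound)
```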
Now, we present the proof of Theorem 2.
Proof of Theorem 2.
Consider a minimizing sequence $(u_n)_{n=1}^{\infty}$ with $u_n = (W_n, A, b_n) \in X(T)$ for all $n \in \{1, 2, 3, \ldots\}$. Define the norm
$$ \|u_n\|_{X(T)}^2 = \int_0^T \|W_n(t)\|^2 + \|A(t)\|^2 + \|b_n(t)\|^2 \, dt $$
and the corresponding inner product that gives a Hilbert space structure to $X(T)$. Due to the definition of $J$, there exists a number $M > 0$ such that for all $n \in \{1, 2, \ldots\}$ we have
$$ \|u_n\|_{X(T)} \le M; \tag{25} $$
that is, the sequence is bounded in $X(T)$.
Hence, there exists a weakly converging subsequence with a limit
$$ u^\ast = (W^\ast, A, b^\ast) \in X(T). $$
Let $x^\ast$ denote the state generated by $u^\ast$. For the states $x_n$ generated by the $u_n$ as solutions of $\mathbf{S}$ defined in (1), we can assume, by increasing $M$ if necessary, that we have
$$ \sup_{s \in [0, T],\ n \in \{1, 2, 3, \ldots\}} \|x_n(s)\| \le M. $$
Due to Mazur’s Lemma (see, for example, [25,26]), there exists a sequence of convex combinations that converges strongly. To be precise, there exist convex combinations
$$ v_k = \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, u_m, \qquad \text{with} \quad \lambda_m^{(k)} \ge 0, \quad k \le m \le N(k) \quad \text{and} \quad \sum_{m=k}^{N(k)} \lambda_m^{(k)} = 1, $$
such that
$$ \lim_{k \to \infty} \|v_k - u^\ast\|_{X(T)} = 0. $$
This implies
$$ \lim_{k \to \infty} \int_0^T \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, W_m(t) - W^\ast(t) \Big\| + \Big| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, b_m(t) - b^\ast(t) \Big| \, dt = 0. $$
Since $\sigma$ is Lipschitz continuous with a Lipschitz constant that is less than or equal to 1, this implies, for $i \in \{1, \ldots, p\}$,
$$ \Big| \sigma\Big( \sum_{m=k}^{N(k)} \lambda_m^{(k)} \big[ a_i(t)^\top x_m(t) + (b_i)_m(t) \big] \Big) - \sigma\big( a_i(t)^\top x^\ast(t) + b_i^\ast(t) \big) \Big| \le \Big| \sum_{m=k}^{N(k)} \lambda_m^{(k)} \big[ a_i(t)^\top x_m(t) + (b_i)_m(t) \big] - \big( a_i(t)^\top x^\ast(t) + b_i^\ast(t) \big) \Big|. \tag{26} $$
Thus, for almost every $t \in [0, T]$, we have
$$ \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, x_m(t) - x^\ast(t) \Big\| \le \sum_{i=1}^p \int_0^t \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, (w_i)_m(s) - w_i^\ast(s) \Big\|\, \Big| \sigma\big( a_i(s)^\top x^\ast(s) + b_i^\ast(s) \big) \Big| $$
$$ \qquad + \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, (w_i)_m(s) \Big\|\, \Big| \sigma\Big( \sum_{m=k}^{N(k)} \lambda_m^{(k)} \big[ a_i(s)^\top x_m(s) + (b_i)_m(s) \big] \Big) - \sigma\big( a_i(s)^\top x^\ast(s) + b_i^\ast(s) \big) \Big| \, ds. $$
Then, the fact that $\sup_x |\sigma(x)| \le 1$, the Cauchy–Schwarz inequality, (25) and (26) yield
$$ \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, x_m(t) - x^\ast(t) \Big\| \le \sum_{i=1}^p \Bigg[ \int_0^t \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, (w_i)_m(s) - w_i^\ast(s) \Big\| \, ds $$
$$ \qquad + \bigg( \int_0^t \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, (w_i)_m(s) \Big\|^2 ds \bigg)^{1/2} \bigg( \int_0^t \Big| \sigma\Big( \sum_{m=k}^{N(k)} \lambda_m^{(k)} \big[ a_i(s)^\top x_m(s) + (b_i)_m(s) \big] \Big) - \sigma\big( a_i(s)^\top x^\ast(s) + b_i^\ast(s) \big) \Big|^2 ds \bigg)^{1/2} \Bigg] $$
$$ \le \sum_{i=1}^p \Bigg[ \int_0^t \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, (w_i)_m(s) - w_i^\ast(s) \Big\| \, ds + M \bigg( \int_0^t \Big| \sum_{m=k}^{N(k)} \lambda_m^{(k)} \big[ a_i(s)^\top x_m(s) + (b_i)_m(s) \big] - \big( a_i(s)^\top x^\ast(s) + b_i^\ast(s) \big) \Big|^2 ds \bigg)^{1/2} \Bigg] $$
$$ \le \sum_{i=1}^p \Bigg[ \int_0^t \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, (w_i)_m(s) - w_i^\ast(s) \Big\| \, ds + M \bigg( \int_0^t \|a_i(s)\|^2\, \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, x_m(s) - x^\ast(s) \Big\|^2 ds \bigg)^{1/2} + M \bigg( \int_0^t \Big| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, (b_i)_m(s) - b_i^\ast(s) \Big|^2 ds \bigg)^{1/2} \Bigg]. $$
Due to Mazur’s Lemma, this yields the existence of a sequence $(\varepsilon_k)_k$ with $\varepsilon_k \ge 0$ and $\lim_{k \to \infty} \varepsilon_k = 0$, such that
$$ \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, x_m(t) - x^\ast(t) \Big\| \le \varepsilon_k + \sum_{i=1}^p M \bigg( \int_0^t \operatorname*{ess\,sup}_{s \in (0, T)} \|a_i(s)\|^2\, \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, x_m(\tau) - x^\ast(\tau) \Big\|^2 d\tau \bigg)^{1/2}. $$
Thus, by increasing the value of $M$ if necessary, we obtain, for almost every $t \in [0, T]$,
$$ \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, x_m(t) - x^\ast(t) \Big\| \le \varepsilon_k + \sum_{i=1}^p M \bigg( \int_0^t M^2\, \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, x_m(s) - x^\ast(s) \Big\|^2 ds \bigg)^{1/2}. $$
Since $(|u| + |v|)^2 \le 2\, |u|^2 + 2\, |v|^2$ and
$\big( \sum_{i=1}^p |z_i| \big)^2 \le p \sum_{i=1}^p |z_i|^2$, this yields the integral inequality
$$ \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, x_m(t) - x^\ast(t) \Big\|^2 \le 2\, (\varepsilon_k)^2 + 2\, p\, M^4 \sum_{i=1}^p \int_0^t \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, x_m(s) - x^\ast(s) \Big\|^2 ds. \tag{27} $$
The integral inequality (27) has the form of (23) in Lemma 2. Hence, (24) from Gronwall’s Lemma yields, for almost every $t \in [0, T]$,
$$ \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, x_m(t) - x^\ast(t) \Big\| = O(\varepsilon_k). $$
This yields
$$ \lim_{k \to \infty} \max_{t \in [0, T]} \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, x_m(t) - x^\ast(t) \Big\| = 0. $$
For the time derivatives we obtain, again by increasing the value of $M$ if necessary,
$$ \int_0^T \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, x_m'(t) - {x^\ast}'(t) \Big\| \, dt $$
$$ \le \sum_{i=1}^p \int_0^T \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, (w_i)_m(t) - w_i^\ast(t) \Big\|\, \Big| \sigma\big( a_i(t)^\top x^\ast(t) + b_i^\ast(t) \big) \Big| + \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, (w_i)_m(t) \Big\|\, \Big| \sigma\Big( \sum_{m=k}^{N(k)} \lambda_m^{(k)} \big[ a_i(t)^\top x_m(t) + (b_i)_m(t) \big] \Big) - \sigma\big( a_i(t)^\top x^\ast(t) + b_i^\ast(t) \big) \Big| \, dt $$
$$ \le \sum_{i=1}^p \Bigg[ \int_0^T \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, (w_i)_m(t) - w_i^\ast(t) \Big\| \, dt + \bigg( \int_0^T \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, (w_i)_m(t) \Big\|^2 dt \bigg)^{1/2} \bigg( \int_0^T \Big| \sum_{m=k}^{N(k)} \lambda_m^{(k)} \big[ a_i(t)^\top x_m(t) + (b_i)_m(t) \big] - \big( a_i(t)^\top x^\ast(t) + b_i^\ast(t) \big) \Big|^2 dt \bigg)^{1/2} \Bigg] $$
$$ \le \varepsilon_k + \sum_{i=1}^p \Bigg[ M \bigg( \int_0^T \Big| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, a_i(t)^\top x_m(t) - a_i(t)^\top x^\ast(t) \Big|^2 dt \bigg)^{1/2} + M \bigg( \int_0^T \Big| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, (b_i)_m(t) - b_i^\ast(t) \Big|^2 dt \bigg)^{1/2} \Bigg] $$
$$ \le \varepsilon_k\, (1 + M) + M \sum_{i=1}^p \bigg( \int_0^T \|a_i(t)\|^2\, \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, x_m(t) - x^\ast(t) \Big\|^2 dt \bigg)^{1/2} $$
$$ \le \varepsilon_k\, (1 + M) + M \sum_{i=1}^p \operatorname*{ess\,sup}_{s \in [0, T]} \|a_i(s)\| \bigg( \int_0^T \Big\| \sum_{m=k}^{N(k)} \lambda_m^{(k)}\, x_m(t) - x^\ast(t) \Big\|^2 dt \bigg)^{1/2} $$
$$ \le \varepsilon_k\, (1 + M) + M \sum_{i=1}^p \operatorname*{ess\,sup}_{s \in [0, T]} \|a_i(s)\|\; \varepsilon_k = O(\varepsilon_k). $$
Thus, we have
$$ \liminf_{k \to \infty} Q(v_k) \ge Q(u^\ast) \qquad \text{and} \qquad \liminf_{k \to \infty} R(v_k) \ge R(u^\ast). $$
This yields
$$ \liminf_{k \to \infty} J(u_k) \ge J(u^\ast). $$
Hence, $u^\ast$ is a solution of $\mathbf{P}(T, \gamma, A)$. This shows that solutions of $\mathbf{P}(T, \gamma, A)$ exist.
The exact controllability properties that have been used for the construction in the proof of Theorem 1 still hold if the matrix $A$ is fixed. Hence, the assertion about the finite-time turnpike property also follows. □
Remark 2.
The results in Theorem 1 can be adapted to the case where the feasible set only contains constant (that is, time-independent) matrices A and constant vectors b, since the exact controllability results that we have used in the proof still hold in this case. Since in this case A and b are in finite-dimensional spaces, Theorem 2 implies the existence of optimal parameters ( W , A , b ) . We consider a problem of this type in the numerical example that is presented in the next section.

4. Numerical Examples

To illustrate our findings, we present numerical experiments. Let a natural number $k_{\max}$ and $k_0 \in \{1, \ldots, k_{\max}\}$ be given. For $k \in \{1, 2, \ldots, k_{\max}\}$, we consider the time-discrete system
$$ \mathbf{S}_{dis}: \qquad x(0) = x_0 \in \mathbb{R}^d, \qquad x(k) = x(k-1) + \sum_{i=1}^p \sigma\big( a_i^\top x(k-1) + b_i \big)\, w_i(k). \tag{28} $$
Here, the $a_i \in \mathbb{R}^d$ and the real numbers $b_i$ are independent of the time step. Our results can be adapted to the case of a constant matrix $A$ and a constant vector $b$, since the exact controllability results that we have used in the proofs still hold. Since in this case $A$ and $b$ lie in finite-dimensional spaces, we also obtain the existence of optimal parameters in this case.
The matrices $W(k)$ depend on the current time step. To define the objective functional for the time-discrete case, let
$$ Q_{dis}(W, A, b) = \sum_{k=k_0}^{k_{\max}} |x(k) - x_T| + \sum_{k=k_0}^{k_{\max}-1} |x(k+1) - x(k)|, $$
where $x$ is the solution of (28),
$$ R_{dis}(W, A, b) = \tfrac{1}{2}\, \|A\|^2 + \tfrac{1}{2}\, \|b\|^2 + \sum_{k=1}^{k_{\max}} \tfrac{1}{2}\, \|W(k)\|^2 $$
and
$$ J_{dis}(W, A, b) = \gamma\, Q_{dis}(W, A, b) + R_{dis}(W, A, b). $$
We consider the minimization problem
$$ \min_{W(k) \in \mathbb{R}^{d \times p}\ (k \in \{1, \ldots, k_{\max}\}),\ A \in \mathbb{R}^{d \times p},\ b \in \mathbb{R}^p} J_{dis}(W, A, b). $$
For the numerical example, we have chosen $d = p = 3$, $k_{\max} = 10$, $\gamma = 10^5$, $x_0 = (1, 1, 1)^\top$ and $x_T = 0$. For the training, we have used the Global Optimization Toolbox in MATLAB.
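For readers who want to reproduce the qualitative behavior, the following sketch formulates the discrete problem in Python with NumPy and SciPy instead of the MATLAB Global Optimization Toolbox used in the paper; the local, derivative-free optimizer, the initialization and all variable names are illustrative assumptions, so it only approximates the computations reported below.

```python
import numpy as np
from scipy.optimize import minimize

d, p, k_max, k0, gamma = 3, 3, 10, 5, 1e5
x0, xT = np.ones(d), np.zeros(d)

def unpack(theta):
    """Split the flat parameter vector into W(1), ..., W(k_max), A and b."""
    W = theta[: k_max * d * p].reshape(k_max, d, p)
    A = theta[k_max * d * p : (k_max + 1) * d * p].reshape(d, p)
    b = theta[(k_max + 1) * d * p :]
    return W, A, b

def J_dis(theta):
    W, A, b = unpack(theta)
    x, traj = x0.copy(), [x0.copy()]
    for k in range(k_max):                               # forward pass of (28)
        x = x + W[k] @ np.tanh(A.T @ x + b)
        traj.append(x)
    traj = np.array(traj)                                # traj[k] = x(k), k = 0, ..., k_max
    Q = np.abs(traj[k0:] - xT).sum() + np.abs(np.diff(traj[k0:], axis=0)).sum()
    R = 0.5 * (W ** 2).sum() + 0.5 * (A ** 2).sum() + 0.5 * (b ** 2).sum()
    return gamma * Q + R

theta0 = 0.1 * np.random.default_rng(0).normal(size=(k_max + 1) * d * p + p)
result = minimize(J_dis, theta0, method="Powell", options={"maxiter": 200000})
W_opt, A_opt, b_opt = unpack(result.x)
```

The non-smooth $L^1$ terms in $Q_{dis}$ and the non-convexity of the problem make such a simple local search sensitive to the initialization; the experiments reported below were obtained with a global optimization method, as stated above.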
Table 1 contains the evolution of the norms $|x(k)|$ ($k \in \{1, \ldots, k_{\max}\}$) along the computed approximations of the optimal trajectories for different values of $k_0$. It clearly illustrates the finite-time turnpike behavior that is predicted by Theorem 1. The zeros in Table 1 represent numerical values smaller than $5 \times 10^{-5}$.
For $\gamma = 10^4$ and $k_0 = 5$, we have obtained the numerical result presented in Table 2. Also for this smaller value of $\gamma$, the turnpike structure is still visible. Here, we have given the size of the small norms (the numerical approximations of zero) in more detail.

5. Conclusions

We have shown that with a suitable non-smooth loss function, each solution of a learning problem has the finite-time turnpike property, which means that it reaches the desired state exactly after a finite time. Since the finite time $t_0$ can be considered a problem parameter, this situation allows us to choose $t_0$ in a convenient way. Thus, $t_0$ arises as an additional design parameter in the design of neural networks, and it corresponds to the number of layers. Since for $t \in [t_0, T]$ the optimal parameters are zero, system (1) has a constant state on $[t_0, T]$, and thus the time horizon can be cut off at $t_0$.
Therefore, in the setting of neural ODEs, the problem of finding the optimal number of layers in a neural network corresponds to a problem of time-optimal control (see, for example, [27]), which is defined as follows. Let a number $\rho > 0$ be given, which serves as a problem parameter. The optimization problem is to
$$ \text{find a minimal value of } t_0 $$
subject to (1), the terminal constraint $x(t_0) = x_T$ and, for $u = (W, A, b) \in X(t_0)$, the inequality constraint $C_{(0, t_0)}(u) \le \rho$.
Here, $C_{(0, t_0)}$ is as defined in (10). The solution of the time-optimal control problem is closely related to the solution of $\mathbf{P}(T, \gamma)$. This can be seen as follows. Let $\omega(T, \gamma)$ denote the optimal value of $\mathbf{P}(T, \gamma)$. Then, Theorem 1 implies that (if $\gamma$ is sufficiently large) for optimal parameters $u^\ast$ that solve $\mathbf{P}(T, \gamma)$, we have $Q(u^\ast) = 0$ and $C_{(0, t_0)}(u^\ast) = \omega(T, \gamma)$, and for the optimal state we have $x(t_0) = x_T$. Hence, we conclude that the optimal parameters for $\mathbf{P}(T, \gamma)$ also solve the time-optimal control problem with parameter $\rho = \omega(T, \gamma)$, and the optimal time is $t_0$.
This relation allows us to adapt the choice of $t_0$ to the desired value of $C_{(0, t_0)}(u^\ast)$. If $t_0$ is enlarged, this value can be decreased for the optimal parameters.
We have shown the existence of a solution of the nonlinear optimization problem for the case that one of the parameters, namely the matrix $A(t)$, is fixed. In order to show that a solution also exists with $A(t)$ as an additional (time-dependent) optimization parameter, we expect that an additional regularization term in the objective functional (for example, $\int_0^T \|A'(t)\|^2 \, dt$) is necessary. This is a topic for future research.
We expect that the finite-time turnpike property also holds in the case t 0 = 0 . However, the proof that is presented here does not apply to this case, so this is another topic for future research. As a possible application of our results, we have the numerical solution of shape inverse problems in mind, as described in [28]. Studying the finite-time turnpike phenomenon in a practical machine learning scenario will be of high value for future research.
Moreover, it would be interesting to combine the dynamics with an approach that allows for data assimilation, such as nudging induced neural networks (NINNs) that have been introduced recently in [29].

Funding

This research was funded by DFG in the framework of the Collaborative Research Centre CRC/Transregio 154, Mathematical Modelling, Simulation and Optimization Using the Example of Gas Networks, project C03 and project C05, Projektnummer 239904186 and the Bundesministerium für Bildung und Forschung (BMBF) and the Croatian Ministry of Science and Education under DAAD grant 57654073 ’Uncertain data in control of PDE systems’.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Marion, P. Generalization bounds for neural ordinary differential equations and deep residual networks. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Curran Associates, Inc.: New York, NY, USA, 2024; Volume 36. [Google Scholar]
  2. Dupont, E.; Doucet, A.; Teh, Y.W. Augmented Neural ODEs. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2019; Volume 32. [Google Scholar]
  3. Lai, Z.; Mylonas, C.; Nagarajaiah, S.; Chatzi, E. Structural identification with physics-informed neural ordinary differential equations. J. Sound Vib. 2021, 508, 116196. [Google Scholar] [CrossRef]
  4. Sarwinda, D.; Paradisa, R.H.; Bustamam, A.; Anggia, P. Deep learning in image classification using residual network (ResNet) variants for detection of colorectal cancer. Procedia Comput. Sci. 2021, 179, 423–431. [Google Scholar] [CrossRef]
  5. Thorpe, M.; van Gennip, Y. Deep limits of residual neural networks. Res. Math. Sci. 2023, 10, 6. [Google Scholar] [CrossRef]
  6. Lu, Y.; Zhong, A.; Li, Q.; Dong, B. Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 3276–3285. [Google Scholar]
  7. Álvarez López, A.; Slimane, A.H.; Zuazua, E. Interplay between depth and width for interpolation in neural ODEs. Neural Netw. 2024, 180, 106640. [Google Scholar] [CrossRef] [PubMed]
  8. Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control. Signals Syst. 1989, 2, 303–314. [Google Scholar] [CrossRef]
  9. Pinkus, A. Approximation theory of the MLP model in neural networks. Acta Numer. 1999, 8, 143–195. [Google Scholar] [CrossRef]
  10. Schäfer, A.M.; Zimmermann, H.G. Recurrent neural networks are universal approximators. In Proceedings of the Artificial Neural Networks–ICANN 2006: 16th International Conference, Athens, Greece, 10–14 September 2006; Proceedings, Part I 16. Springer: Berlin/Heidelberg, Germany, 2006; pp. 632–640. [Google Scholar]
  11. Schäfer, A.M.; Udluft, S.; Zimmermann, H.G. Learning long term dependencies with recurrent neural networks. In Proceedings of the Artificial Neural Networks–ICANN 2006: 16th International Conference, Athens, Greece, 10–14 September 2006; Proceedings, Part I 16. Springer: Berlin/Heidelberg, Germany, 2006; pp. 71–80. [Google Scholar]
  12. Schaefer, A.M.; Udluft, S.; Zimmermann, H.G. Learning long-term dependencies with recurrent neural networks. Neurocomputing 2008, 71, 2481–2488. [Google Scholar] [CrossRef]
  13. Gugat, M.; Schuster, M.; Zuazua, E. The finite-time turnpike phenomenon for optimal control problems: Stabilization by non-smooth tracking terms. In Proceedings of the Stabilization of Distributed Parameter Systems: Design Methods And Applications; Springer: Cham, Switzerland, 2021; pp. 17–41. [Google Scholar] [CrossRef]
  14. Gugat, M.; Schuster, M. Optimal Neumann control of the wave equation with L 1-control cost: The finite-time turnpike property. Optimization 2024, 1–28. [Google Scholar] [CrossRef]
  15. Gugat, M. Optimal Boundary Control of the Wave Equation: The Finite-Time Turnpike Phenomenon. In Mathematical Reports; Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU): Erlangen, Germany, 2022. [Google Scholar]
  16. Zaslavski, A.J. Turnpike Phenomenon in Metric Spaces; Springer Nature: Berlin/Heidelberg, Germany, 2023; Volume 201. [Google Scholar]
  17. Grüne, L.; Faulwasser, T. Turnpike properties in optimal control: An overview of discrete-time and continuous-time results. In Handbook of Numerical Analysis; Trelat, E., Zuazua, E., Eds.; Elsevier: Amsterdam, The Netherlands, 2022. [Google Scholar] [CrossRef]
  18. Grüne, L.; Guglielmi, R. Turnpike properties and strict dissipativity for discrete time linear quadratic optimal control problems. SIAM J. Control Optim. 2018, 56, 1282–1302. [Google Scholar] [CrossRef]
  19. Trélat, E.; Zuazua, E. The turnpike property in finite-dimensional nonlinear optimal control. J. Differ. Equ. 2015, 258, 81–114. [Google Scholar] [CrossRef]
  20. Geshkovski, B.; Zuazua, E. Turnpike in optimal control of PDEs, ResNets, and beyond. Acta Numer. 2022, 31, 135–263. [Google Scholar] [CrossRef]
  21. Steffensen, S. Relaxation approaches for nonlinear sparse optimization problems. Optimization 2023, 73, 3237–3258. [Google Scholar] [CrossRef]
  22. Burger, M.; Föcke, J.; Nickel, L.; Jung, P.; Augustin, S. Reconstruction methods in thz single-pixel imaging. In Proceedings of the Compressed Sensing and Its Applications: Third International MATHEON Conference 2017, Berlin, Germany, 4–8 December 2017; Springer: Cham, Switzerland, 2019; pp. 263–290. [Google Scholar]
  23. Bungert, L.; Roith, T.; Tenbrinck, D.; Burger, M. A Bregman learning framework for sparse neural networks. J. Mach. Learn. Res. 2022, 23, 1–43. [Google Scholar]
  24. Gugat, M. Optimal Boundary Control and Boundary Stabilization of Hyperbolic Systems; Birkhäuser: Basel, Switzerland, 2015. [Google Scholar] [CrossRef]
  25. Ciarlet, P.G. Mathematical Elasticity: Three-Dimensional Elasticity; SIAM: Philadelphia, PA, USA, 2021. [Google Scholar]
  26. Heuser, H.G. Functional Analysis; Horvath, J., Translator; A Wiley-Interscience Publication; Chichester etc.; John Wiley & Sons: Hoboken, NJ, USA, 1982. [Google Scholar]
  27. LaSalle, J. Time optimal control systems. Proc. Natl. Acad. Sci. USA 1959, 45, 573–577. [Google Scholar] [CrossRef] [PubMed]
  28. Jackowska-Strumillo, L.; Sokolowski, J.; Żochowski, A.; Henrot, A. On numerical solution of shape inverse problems. Comput. Optim. Appl. 2002, 23, 231–255. [Google Scholar] [CrossRef]
  29. Antil, H.; Löhner, R.; Price, R. NINNs: Nudging induced neural networks. Phys. Nonlinear Phenom. 2024, 470, 134364. [Google Scholar] [CrossRef]
Figure 1. A typical graph of $|x(t)|$, where $x$ is an optimal trajectory of $\mathbf{P}(T, \gamma)$ that satisfies the finite-time turnpike property. Here, $t_0 = 1$ and $T = 5$.
Table 1. The evolution of the norm of the state along the approximations of the optimal trajectories. The structure of the numerical solutions shows the finite-time turnpike structure that is indicated in Theorem 1. Here, $\gamma = 10^5$.

 γ = 10^5   |x(1)|  |x(2)|  |x(3)|  |x(4)|  |x(5)|  |x(6)|  |x(7)|  |x(8)|  |x(9)|  |x(10)|
 k_0 = 1    3       0       0       0       0       0       0       0       0       0
 k_0 = 2    3       2.2178  0       0       0       0       0       0       0       0
 k_0 = 3    3       3.2343  3.3124  0       0       0       0       0       0       0
 k_0 = 4    3       2.3303  0.9089  0.7560  0       0       0       0       0       0
 k_0 = 5    3       2.8638  2.6989  2.2098  0.0396  0       0       0       0       0
 k_0 = 6    3       1.8557  1.0519  1.2795  1.1156  0.7379  0.0002  0       0       0
Table 2. For $\gamma = 10^4$, the finite-time turnpike structure for $k_0 = 5$ is still visible.

 γ = 10^4   |x(1)|  |x(2)|  |x(3)|  |x(4)|  |x(5)|  |x(6)|   |x(7)|   |x(8)|   |x(9)|   |x(10)|
 k_0 = 5    3       3.2622  2.7861  2.0809  0.2694  3×10⁻⁸   2×10⁻⁸   3×10⁻⁸   2×10⁻⁸   2×10⁻⁸
