An Approximate Bayesian Approach to Optimal Input Signal Design for System Identification

Bania, Piotr; Wójcik, Anna

doi:10.3390/e27101041

Open AccessArticle

An Approximate Bayesian Approach to Optimal Input Signal Design for System Identification

by

Piotr Bania

^*

and

Anna Wójcik

Department of Automatic Control and Robotics, Faculty of Electrical Engineering, Automatics, Computer Science, and Biomedical Engineering, AGH University of Krakow, al. A. Mickiewicza 30, 30-059 Krakow, Poland

^*

Author to whom correspondence should be addressed.

Entropy 2025, 27(10), 1041; https://doi.org/10.3390/e27101041

Submission received: 11 September 2025 / Revised: 2 October 2025 / Accepted: 3 October 2025 / Published: 7 October 2025

(This article belongs to the Section Signal and Data Analysis)

Download

Browse Figures

Versions Notes

Abstract

The design of informatively rich input signals is essential for accurate system identification, yet classical Fisher-information-based methods are inherently local and often inadequate in the presence of significant model uncertainty and non-linearity. This paper develops a Bayesian approach that uses the mutual information (MI) between observations and parameters as the utility function. To address the computational intractability of the MI, we maximize a tractable MI lower bound. The method is then applied to the design of an input signal for the identification of quasi-linear stochastic dynamical systems. Evaluating the MI lower bound requires the inversion of large covariance matrices whose dimensions scale with the number of data points N. To overcome this problem, an algorithm that reduces the dimension of the matrices to be inverted by a factor of N is developed, making the approach feasible for long experiments. The proposed Bayesian method is compared with the average D-optimal design method, a semi-Bayesian approach, and its advantages are demonstrated. The effectiveness of the proposed method is further illustrated through four examples, including atomic sensor models, where input signals that generate a large amount of MI are especially important for reducing the estimation error.

Keywords:

design of experiment; Bayesian experimental design; optimal input signal design; system identification; entropy; information

1. Introduction

The design of informative input signals is the cornerstone of modern system identification. Without properly chosen excitation, even advanced estimation algorithms may fail to provide accurate parameter estimates, leading to unreliable prediction and control. Classical references such as [1,2,3] and the reviews by [4,5,6] emphasize that identification is not only a matter of statistical estimation but also of experimental design, where the input signal determines the achievable information content. Optimal experimental design (OED) methods therefore play a crucial role in practical applications. Traditionally, OED has relied on the Fisher information matrix (FIM), with criteria such as D or A optimality widely used due to their computational efficiency and asymptotic guarantees [4,7]. However, FIM-based approaches are inherently local, relying on linearization and asymptotic normality. They may thus be fragile in scenarios with large model uncertainty or strongly non-linear stochastic dynamics.

A natural alternative is the Bayesian approach, which evaluates an experiment through its expected information gain, typically quantified by the mutual information between model parameters and observations [5,6,8,9,10,11]. Bayesian design has several advantages: it is globally valid over the parameter space, it naturally incorporates prior information, and it is applicable to non-linear, stochastic models. Its main drawback is computational intractability since mutual information requires high-dimensional integration over parameters and observations. Since the general case is challenging and difficult to solve, in this article, we focus on models that can be represented in the form

Y = F (θ, U) + Z,

(1)

where

θ \in {θ_{1}, \dots, θ_{r}}

is a parameter with the prior distribution

P (θ = θ_{j}) = p_{0, j}

. The noise

Z

is conditionally normal, that is,

p (Z | θ) = N (Z, 0, S (θ, U))

, and

U

is a design variable. For this class of models, the density of the observations

Y

is a finite Gaussian mixture of the form

p (Y) = \sum_{j = 1}^{r} p_{0, j} N (Y, F (θ_{j}, U), S (θ, U))

. Within these Gaussian mixtures, the mutual information between

Y

and

θ

can be estimated from below using the effective and tractable pairwise-distance-based lower bound given by Kolchinsky and Tracey [12,13]. We maximize this bound to achieve an approximate optimal design parameter

U

and then generalize the method to a parameter space of continuum cardinality. In particular, we show how to treat the Gaussian prior

p_{0} (θ) = N (θ, m_{θ}, S_{θ})

and prior distributions with compact support.

In this work, we focus on quasi-linear systems, namely stochastic dynamical systems that are linear in the state variables but non-linear in the control variables. Such systems occur ubiquitously in science and engineering. In quantum mechanics, Hamiltonians and Lindblad dissipators depend on external control fields such as laser intensities, magnetic fields, or gate voltages, leading to a non-linear dependence on the control [14,15]. In chemical processes, flow rates directly determine reaction speeds in continuous stirred tank reactors [16,17]. In thermal plants, convection coefficients scale non-linearly with flow, giving rise to quasi-linear heat transfer dynamics [16]. This broad applicability makes quasi-linear systems a natural and important class for advanced input design. However, there is a notable lack of Bayesian design methods and software tools tailored to this class of systems. Therefore, in this paper, we address this gap by developing such a method. Specifically, we show that a finite sequence of observations generated by a quasi-linear system can always be expressed in the form of the model (1), and we provide an effective algorithm for calculating the lower bound on mutual information.

The study reported here constitutes a substantial and far-reaching extension of the initial results presented in [18], as well as related research reported in [19,20,21]. The article’s main contributions can be summarized as follows. We first introduce an Information-Theoretic Lower Bound (ITB) on the estimation error of any estimator [22,23] and briefly discuss its relation to the Bayesian Cramér–Rao Bound (BCRB) [23,24,25,26]. We conclude that maximizing mutual information is superior to maximizing Bayesian or classical Fisher information, which is consistent with the arguments presented in [5,8,10,11]. Building on this result, we introduce a novel Bayesian design method for model (1). To address the intractability and computational complexity of direct mutual information evaluation, we discretize the parameter space and maximize the Kolchinsky–Tracey lower bound [12,13] and subsequently extend this approach to a parameter space of continuum cardinality. We then focus on the application to linear and quasi-linear system identification. Since the information-theoretic bound requires the inversion of large covariance matrices of the observations, whose dimensions grow linearly with the number of data points N (with N∼

10^{3}

–

10^{6}

in applications), direct inversion is computationally problematic. To overcome this challenge, we develop an algorithm that reduces the dimension of the matrices needed for inversion by a factor of N, thus making the approach feasible for long experiments. The proposed Bayesian method is compared with the average D-optimal design [1,4,27], a semi-Bayesian method, and its advantages are demonstrated. The effectiveness of the method is further illustrated by four examples. The first two, intentionally elementary, highlight the effectiveness of our approach. The third and fourth examples are drawn from atomic sensor models, a domain where optimal input design is particularly critical. We analyze a controlled harmonic oscillator with stochastic disturbances as a paradigmatic atomic sensor model [28] and a complex magnetometer model with non-linear dependence of the system matrices on the input [29]. In the latter case, we provide a simplified model, derive the optimal input, and demonstrate that it significantly outperforms the harmonic signal, which might otherwise be presumed to be optimal. Moreover, we show that the estimation error of the MAP estimator achieves the theoretical lower bound.

This article is organized as follows. Section 2 formulates the problem. Section 3 develops the approximate Bayesian solution for finite and infinite parameter spaces. Section 4 applies the method to quasi-linear systems. Section 5 compares the approach with classical design methods. Section 6 presents examples. Section 7 provides discussion and conclusions.

2. Formulation of the Problem

Let us consider a family of models

Y = F (θ, U) + Z,

(2)

where

Y, Z \in R^{n_{Y}}

,

U \in R^{n_{U}}

,

θ \in Θ \subset R^{n_{θ}}

. The set

Θ

will be called the parameter space. Parameter

θ

is unknown. The prior distribution of

θ

is denoted by

p_{0}

. The random variable

Z

is conditionally normal, i.e.,

p (Z | θ) = N (Z, 0, S (θ, U))

, where

S (θ, U) \in S^{+} (n_{Y})

, for all

θ \in Θ

,

U \in R^{n_{U}}

. Functions

F

and

S

are smooth. The variable

U

is called the design parameter or, in the context of dynamical systems, the input signal. The set of admissible signals is given by

U_{a d} = {U \in R^{n_{U}}; | U - \tilde{U} | ⩽ ϱ},

(3)

where

\tilde{U}

is the given vector, and

ϱ

is the maximal norm of the signal. We will also consider an alternative, and useful in some applications, definition of

U_{a d}

:

U_{a d} = {U \in R^{n_{U}}; U_{m i n} ⩽ U ⩽ U_{m a x}},

(4)

where

U_{m i n}, U_{m a x} \in R^{n_{U}}

are fixed vectors. Under these assumptions and after applying the Bayes rule, we acquire the likelihood, evidence, and posterior distribution of

θ

:

p (Y | θ, U) = N (Y, F (θ, U), S (θ, U)),

(5)

p (Y | U) = \int p_{0} (θ) N (Y, F (θ, U), S (θ, U)) d θ,

(6)

p (θ | Y, U) = \frac{p_{0} (θ) p (Y | θ, U)}{p (Y | U)} .

(7)

The Minimum Mean Squared Error (MMSE) estimator of

θ

is then given by

\hat{θ} (Y, U) = \int θ p (θ | Y, U) d θ .

(8)

To avoid the difficulties involved in calculating the integral (8), instead of the MSE, the Maximum a Posteriori (MAP) estimator is typically used. Taking the negative logarithm of both sides of (7) and omitting the terms independent on

θ

, we get the following:

L (θ, Y, U) = \frac{1}{2} {| Y - F (θ, U) |}_{S^{- 1} (θ, U)}^{2} + \frac{1}{2} \ln | S (θ, U) | - \ln p_{0} (θ) .

(9)

Thus, the MAP estimator of

θ

is given by

\hat{θ} (Y, U) = \arg \min_{θ \in Θ} L (θ, Y, U) .

(10)

Estimators (8) or (10) may be biased, so the Cramér–Rao Bound (CRB) cannot be applied directly to them. However, a Bayesian version of the CRB exists and can be used to estimate the error of any, also biased, estimator [23,24,25,26]. Let us introduce Bayesian information (BI):

J_{B} = E_{p (θ, Y | U)} [\nabla_{θ} L (θ, Y, U) (\nabla_{θ} L {(θ, Y, U)}^{T})] = J_{D} + J_{P},

(11)

where

J_{P} = E_{p_{0} (θ)} [\nabla_{θ} \ln p_{0} (θ) {(\nabla_{θ} \ln p_{0} (θ))}^{T}],

(12)

is the fraction of BI associated with a prior, and

J_{D} (U) = E_{p (θ, Y | U)} [\nabla_{θ} \ln p (Y | θ, U) {(\nabla_{θ} \ln p (Y | θ, U))}^{T}],

(13)

is part of the BI provided by observations

Y

. The matrix

J_{D}

is the Bayesian equivalent of the Fisher information matrix. The Fisher information matrix can be recovered from (13) assuming

p_{0} (θ) = δ (θ - θ^{0})

, where

θ^{0}

is the true value of the parameter. Let

\hat{θ} (Y, U)

be any estimator of

θ

. Assuming a sufficiently regular prior, it can be proven that

E | θ - \hat{θ} {(Y, U) |}^{2} ⩾ \frac{n_{θ}}{| J_{P} + J_{D} {(U) |}^{1 / n_{θ}}},

(14)

which is usually known as the Van Trees inequality or Bayesian Cramér–Rao Bound (BCRB) [23,24,30] [inequality (2.9), p. 17]. Formulas (11)–(13) are well defined only under rather restrictive assumptions. In particular, the joint density

p (Y, θ)

must be differentiable with respect to

θ

, and it must satisfy the regularity condition

\int \nabla_{θ} p (Y, θ) d Y = 0

. Moreover, the prior and likelihood distributions must guarantee the existence of the expectations in (12) and (13). This excludes the uniform and many other useful prior distributions and considerably limits the applicability of inequality (14) (see [23] for details). Beyond inequality (14), a large class of Bayesian bounds exists, reported in [26]. Many of these bounds can serve as a utility function. Probably one of the best design criteria is the Ziv–Zakai lower bound [31]. However, to compute this estimate, an additional, and rather complex, optimization sub-problem must be solved, as shown in [31]. Therefore, due to the high computational complexity of the multivariate Ziv–Zakai bound, we will not consider it here. Given the application-oriented focus of this article and following the arguments presented in [5,6], we conclude that the entropy-based lower bound [22] [p. 255] [23] [Section 2.2, pp. 16–17] is a reasonable optimality criterion and provides slightly tighter estimates than the BCRB (cf. [31] [Section V.D]). To proceed, let us define the entropies of

Y

and

θ

and the corresponding conditional entropies:

\begin{matrix} H_{Y} (U) = E (- \ln p (Y | U)), \end{matrix}

(15)

\begin{matrix} H_{θ} = E (- \ln p_{0} (θ)), \end{matrix}

(16)

\begin{matrix} H_{Y | θ} (U) = E (- \ln p (Y | θ, U)) = \frac{1}{2} \int p_{0} (θ) \ln ({(2 π e)}^{n_{Y}} | S (θ, U) |) d θ, \end{matrix}

(17)

\begin{matrix} H_{θ | Y} (U) = E (- \ln p (θ | Y, U)) . \end{matrix}

(18)

The mutual information (MI) between

θ

and

Y

is defined as

I_{θ; Y} (U) = H_{θ} - H_{θ | Y} (U) = H_{Y} (U) - H_{Y | θ} (U) .

(19)

The following theorem establishes the ultimate limit of the estimation error expressed in terms of mutual information, demonstrating that the Bayesian Cramér–Rao Bound (BCRB) does not constitute a fundamental limit.

Theorem 1.

Let

\hat{θ} (Y, U)

be any estimator of

θ

. Then, the following inequalities hold:

E | θ - \hat{θ} {(Y, U) |}^{2} ⩾ n_{θ} {(2 π e)}^{- 1} e^{2 n_{θ}^{- 1} (H_{θ} - I_{θ; Y} (U))} ⩾ \frac{n_{θ}}{| J_{P} + J_{D} {(U) |}^{1 / n_{θ}}} .

(20)

The proof is given in Appendix A. The first of the inequalities (20) will be called the Information-Theoretic Lower Bound (ITB). The last part of (20) is known as the Efroimovich inequality [23,25] [inequality (2.7), Ch. 2.2, p. 16].

To determine the optimal signal, one may maximize the MI, the determinant of

J_{B}

, or the determinant of FIM. The latter corresponds to the classical design methods outlined in Section 5. However, the right-hand side of (14) generally underestimates the estimation error, and large values of

J_{B}

do not necessarily guarantee a small error. We illustrate this problem in Appendix B. In contrast, the ITB (20), which plays a central role in our subsequent analysis, shows that maximizing

I_{θ; Y} (U)

is essential for reducing the estimation error and provides a more fundamental criterion than maximizing either the Bayesian or classical Fisher information. In particular, in the context of optimal experimental design and input signal design for system identification, maximizing the mutual information between the parameters and the observations constitutes the most principled optimality criterion, as it directly quantifies the amount of knowledge gained about the parameters [5]. Accordingly, we define the optimal signal as the solution of the following optimization problem:

U^{*} = \arg \max_{U \in U_{a d}} I_{θ; Y} (U) .

(21)

As

I_{θ; Y}

is smooth and

U_{a d}

is compact, then (21) is well defined. After solving the task, the MMSE, MAP, or any other estimator can be used to determine

θ

.

Computing mutual information (MI) or its lower bound remains a significant challenge. The analyses presented in the literature [5,6,10,11] show that this can be undertaken in three main ways: (1) using Monte Carlo or nested Monte Carlo (MC) simulations [5] [Section 3.1] [6]; (2) applying variational lower bound (VLB) estimates of the MI [5] [Section 3.3.1] [6]; or (3) utilizing existing, easily computable estimates of conditional entropy or the MI. Since we aim to numerically maximize the MI, which also depends on the design parameter U, the procedure for calculating MI will be called by the optimization solver millions of times and must therefore be sufficiently fast. Consequently, although MC methods provide good estimates of MI, they are of limited use here. The VLB methods require simultaneous optimization of the variational distribution with respect to its parameters and the signal U [5] [Sections 3.3.1 and 4.3.4] [6]. In addition, stochastic simulations are also used to compute the expected values in the VLB. As the goal of this article is to develop a simple design method that does not require hours of computation, we focus on the third option and use the existing, easily computable lower bounds of entropy or MI provided in [12] or [32].

3. Approximate Solutions

The optimization problem (21) becomes considerably more tractable when the parameter space

Θ

is finite. Consequently, we first derive an approximate solution for a finite set of parameters and subsequently extend this result to obtain an approximate solution for the case where

Θ

is an uncountable subset of

R^{n_{θ}}

.

3.1. Finite Parameter Space

Let

Θ = {θ_{1}, \dots, θ_{r}}, θ_{i} \in R^{n_{θ}}

,

θ_{i} \neq θ_{j}

and assume that

p_{0}

is a discrete distribution of the form

p_{0} (θ) = \sum_{j = 1}^{r} p_{0, j} δ (θ - θ_{j}),

(22)

where

p_{0, j} = P (θ = θ_{j})

. Then, on the basis of (6) and (22), the density of

Y

becomes a Gaussian mixture:

p (Y | U) = \sum_{j = 1}^{r} p_{0, j} N (Y, F (θ_{j}, U), S (θ_{j}, U)) .

(23)

The application of the Bayes rule gives the posterior

p (θ_{j} | Y, U) = \frac{p_{0, j} N (Y, F (θ_{j}, U), S (θ_{j}, U))}{p (Y | U)} .

(24)

The discrete counterpart of Formulas (15)–(19) takes the form

\begin{matrix} H_{Y} (U) = - \int p (Y | U) \ln p (Y | U) d Y, \end{matrix}

(25)

\begin{matrix} H_{θ} = - \sum_{j = 1}^{r} p_{0, j} \ln p_{0, j}, \end{matrix}

(26)

\begin{matrix} H_{Y | θ} (U) = \frac{1}{2} \sum_{j = 1}^{r} p_{0, j} \ln ({(2 π e)}^{n_{Y}} | S (θ_{j}, U) |), \end{matrix}

(27)

\begin{matrix} H_{θ | Y} (U) = \int p (Y | U) (- \sum_{j = 1}^{r} p (θ_{j} | Y, U) \ln p (θ_{j} | Y, U)) d Y, \end{matrix}

(28)

I_{θ; Y} (U) = H_{θ} - H_{θ | Y} (U) = H_{Y} (U) - H_{Y | θ} (U) .

(29)

Direct computation of mutual information (29) remains difficult and, in many cases, intractable. Hence, our central idea is to overcome this difficulty by replacing mutual information (29) with a computationally tractable and non-trivial lower bound. In particular, we observe that (23) is a finite Gaussian mixture. For such mixtures, one of the most effective lower bounds on

I_{θ | Y}

is the inequality introduced in [12].

Lemma 1

(Information bounds [12]). For the Gaussian mixture (23) with

p_{0, j} = P (θ = θ_{j})

, the following inequality holds:

I_{l} (U) ⩽ I_{θ; Y} (U) ⩽ H_{θ},

(30)

where

\begin{matrix} I_{l} (U) = - \sum_{i = 1}^{r} p_{0, i} \ln (\sum_{j = 1}^{r} p_{0, j} e^{- d_{i, j} (U)}), \end{matrix}

(31)

\begin{matrix} d_{i, j} (U) = \frac{1}{8} Δ_{i, j}^{T} {(\frac{1}{2} (S_{i} + S_{j}))}^{- 1} Δ_{i, j} + \frac{1}{2} \ln | \frac{1}{2} (S_{i} + S_{j}) | - \frac{1}{4} \ln (| S_{i} | | S_{j} |), \end{matrix}

(32)

\begin{matrix} Δ_{i, j} = F (θ_{i}, U) - F (θ_{j}, U), S_{i} = S (θ_{i}, U), S_{j} = S (θ_{j}, U) . \end{matrix}

(33)

Detailed proof is given in [13] [Section IIIb, inequality (11), and Section IV, Formula (15) with

α = 0.5

] and also in [12]. Now, the approximate solution of (21) is given by

U^{*} = \arg \max_{U \in U_{a d}} I_{l} (U) .

(34)

Since

U_{a d}

is compact and

I_{l}

is smooth and bounded, the solution of (34) exists. We also note that in the case of two alternatives, that is, when

r = 2

in (31), we get

e^{- I_{l} (U)} = {(p_{0, 1} + p_{0, 2} e^{- d_{1, 2} (U)})}^{p_{0, 1}} {(p_{0, 1} e^{- d_{1, 2} (U)} + p_{0, 2})}^{p_{0, 2}} .

(35)

Accordingly, the optimal signal in this case arises as the solution of a somewhat simplified optimization problem:

\max_{U \in U_{a d}} d_{1, 2} (U) .

(36)

If the function

F

in (2) is affine with respect to

U

and the covariance

S

does not depend on

U

, then it follows from Lemma 1, that

d_{1, 2}

is a positive (semi-) definite quadratic form with respect to

U

. For constraints (4), we thus obtain a convex quadratic programming problem. In the case of constraints (3), one needs to find the minimum of

d_{1, 2}

on a closed ball in

R^{n_{U}}

. This is also a convex problem, and it can be reduced to finding zeros of a scalar function [33] [Theorems 4.1, p. 70 and Section 4.3]. Furthermore, if

F (θ_{i}, U) = F_{i} U, i = 1, 2

, and the constraints are defined by (3), then the solution of (36) is the eigenvector of the matrix

Q = {(F_{1} - F_{2})}^{T} {(S_{1} + S_{2})}^{- 1} (F_{1} - F_{2})

, corresponding to its largest eigenvalue (see [18] [Section 2.1] for details).

3.2. Infinite Parameter Space

Let us assume that

Θ = R^{n_{θ}}

and consider the Gaussian prior

p_{0} (θ) = N (θ, m_{θ}, S_{θ}), S_{θ} > 0 .

(37)

Then, the integral in (6) can be approximated with a finite Gaussian mixture

p (Y | U) = \int p_{0} (θ) N (Y, F (θ, U), S (θ, U)) d θ \approx \sum_{j = 1}^{N_{a}} p_{0, j} N (Y, F (θ_{j}, U), S (θ_{j}, U)),

(38)

where

p_{0, j} ⩾ 0

and

\sum_{j = 1}^{N_{a}} p_{0, j} = 1

. The weights

p_{0, j}

and the nodes

θ_{j}

in (38) can be calculated by using the multidimensional Gauss–Hermite quadrature rule or any other suitable method. The Gauss–Hermite quadrature of the order p is exact for polynomials of a degree of at most

2 p - 1

. The approximation error of the integral (38) using the Gauss–Hermite quadrature of the order p depends on the

2 p

-th derivatives of the functions F and S. In the single-parameter case, with the Gaussian prior

N (θ, m_{θ}, σ_{θ})

, the error estimate is given by the formula

e ⩽ σ_{θ} \frac{4^{p} C}{p!} \sup_{θ, U, Y} \frac{d^{2 p}}{d θ^{2 p}} N (Y, F (θ, U), S (θ, U)),

(39)

where C is constant. The error tends to zero as

p \to \infty

or

σ_{θ} \to 0

. Therefore, the Gauss–Hermite approximation of the integral (38) is especially useful when the prior is narrow (small

σ_{θ}

) or when the integrand, in the neighborhood of the point

m_{θ}

, can be well approximated using low-degree polynomials. To illustrate the method, we will show only a very simple second-order Gaussian quadrature rule with

2 n_{θ}

points.

Lemma 2.

The approximate value of the integral

J (f) = \int N (θ, m_{θ}, S_{θ}) f (θ) d θ

is given by

J (f) \approx \frac{1}{2 n_{θ}} \sum_{j = 1}^{2 n_{θ}} f (θ_{j}),

(40)

where

θ_{2 i - 1} = m_{θ} - S_{θ}^{0.5} \sqrt{n_{θ}} e_{i}, θ_{2 i} = m_{θ} + S_{θ}^{0.5} \sqrt{n_{θ}} e_{i}, i = 1, \dots, n_{θ}

(41)

and

e_{i}

is

i^{t h}

basis vector in

R^{n_{θ}}

. If

f (θ) = \frac{1}{2} θ^{T} A θ + b^{T} θ + c

, then the equality holds in (40).

Proof.

Direct calculation. □

An analogous method can be used for prior distributions defined on compact subsets of

R^{n_{θ}}

(e.g., an n-dimensional hypercube), but the formulas for the nodes and weights in (38) will then change. For example, if

θ

is a scalar parameter and the prior distribution is uniform, that is,

p_{0} (θ) = U [a, b]

, then, using a second-order Gauss–Legendre quadrature, the approximate value of the integral

\int_{a}^{b} p_{0} (θ) f (θ) d θ

is computed using the formula

\int_{a}^{b} p_{0} (θ) f (θ) d θ \approx p_{0, 1} f (θ_{1}) + p_{0, 2} f (θ_{2}),

(42)

where

p_{0, 1} = p_{0, 2} = 0.5

and

θ_{1} = \frac{1}{2} (a + b - \frac{b - a}{\sqrt{3}}), θ_{2} = \frac{1}{2} (a + b + \frac{b - a}{\sqrt{3}}) .

(43)

Formula (42) is exact for polynomials of degree 3. The error estimate is analogous to (39) and tends to zero when

b - a \to 0

. More general multidimensional formulas, integration methods, and error estimates are given in [34,35,36].

The application of Lemma 2 to (38) gives

N_{a} = 2 n_{θ}

,

p_{0, j} = {(2 n_{θ})}^{- 1}

. Now, since (38) is approximated by a Gaussian mixture, the results of Section 3.1 can be used. Based on Equations (31)–(33) and (38) and Lemma 1, the information’s lower bound takes the form

I_{l} (U) = - \frac{1}{2 n_{θ}} \sum_{i = 1}^{2 n_{θ}} \ln (\frac{1}{2 n_{θ}} \sum_{j = 1}^{2 n_{θ}} e^{- d_{i, j} (U)}),

(44)

where

d_{i, j}

and

θ_{j}

are given by Equations (31)–(33) and (41) or (43). The approximate solution of (21) can be found by maximizing (44) with constraints (3) or (4).

4. Bayesian Input Signal Design in Quasi-Linear Control Systems

Consider the family of quasi-linear systems

x_{k + 1} = A (θ, u_{k}) x_{k} + B (θ, u_{k}) + G (θ, u_{k}) w_{k},

(45)

y_{k} = C x_{k} + v_{k},

(46)

where

k = 0, 1, \dots, N

,

N ⩾ 1

,

x_{k} \in R^{n}, y_{k} \in R^{n_{y}}, w_{k} \in R^{n_{w}}, v_{k} \in R^{n_{y}}

,

w_{k} \sim N (0, I)

,

v_{k} \sim N (0, S_{v}), S_{v} > 0

. Variables

x_{0}, w_{0}, \dots, w_{N - 1}, v_{0}, \dots, v_{N}

are mutually independent. The initial state

x_{0}

is conditionally normal, i.e.,

p (x_{0} | θ) = N (x_{0}, m_{0}^{-} (θ), S_{0}^{-} (θ))

, where

m_{0}^{-}

,

S_{0}^{-}

are smooth and

S_{0}^{-} (θ) > 0

, for all

θ \in Θ

. The joint prior distribution of the initial state

x_{0}

and the parameter

θ

is given by

p_{0} (x_{0}, θ) = p_{0} (θ) N (x_{0}, m_{0}^{-} (θ), S_{0}^{-} (θ))

. Let us define

A_{k} = A (θ, u_{k})

,

B_{k} = B (θ, u_{k})

,

G_{k} = G (θ, u_{k})

. Then, the solution of (45) has the form

\begin{matrix} x_{0} = I x_{0}, \end{matrix}

(47)

\begin{matrix} x_{1} = A_{0} x_{0} + B_{0} + G_{0} w_{0}, \end{matrix}

(48)

\begin{matrix} x_{2} = A_{1} x_{1} + B_{1} + G_{1} w_{1} = A_{1} A_{0} x_{0} + A_{1} B_{0} + B_{1} + A_{1} G_{0} w_{0} + G_{1} w_{1}, \end{matrix}

(49)

\begin{matrix} ⋮ \end{matrix}

(50)

\begin{matrix} x_{N} = Φ (N, 0) x_{0} + \sum_{j = 0}^{N - 1} Φ (N, j + 1) B_{j} + \sum_{j = 0}^{N - 1} Φ (N, j + 1) G_{j} w_{j}, \end{matrix}

(51)

where

Φ (n, n) = I

and

Φ (n, j) = \prod_{i = 1}^{n - j} A_{n - i}, j < n .

(52)

Now, if we denote

X = col (x_{0}, \dots, x_{N})

,

Y = col (y_{0}, \dots, y_{N})

,

U = col (u_{0}, \dots, u_{N - 1})

,

W = col (w_{0}, \dots, w_{N - 1})

,

V = col (v_{0}, \dots, v_{N})

, we can rewrite Equations (46)–(51), in matrix-vector form:

X = A (θ, U) x_{0} + B (θ, U) + G (θ, U) W,

(53)

Y = C X + V,

(54)

where the matrices

A

,

B

,

G

,

C = I_{N + 1} \otimes C

follow directly from Equations (46)–(51), and

W \sim N (0, I_{N n_{w}})

,

V \sim N (0, I_{N + 1} \otimes S_{v})

. Substituting (53) into (54) and taking into account that

p (x_{0} | θ) = N (x_{0}, m_{0}^{-} (θ), S_{0}^{-} (θ))

, we get

\begin{matrix} Y = C A (θ, U) m_{0}^{-} (θ) + C B (θ, U) + Z, \end{matrix}

(55)

\begin{matrix} Z = C A (θ, U) (x_{0} - m_{0}^{-} (θ)) + C G (θ, U) W + V . \end{matrix}

(56)

The conditional density of variable

Z

has the form

p (Z | θ) = N (Z, 0, S (θ, U))

, where the covariance matrix

S

is given by

S (θ, U) = C (A (θ, U) S_{0}^{-} (θ) A {(θ, U)}^{T} + G (θ, U) G {(θ, U)}^{T}) C^{T} + I_{N + 1} \otimes S_{v} .

(57)

Finally, if we define

F (θ, U) = C A (θ, U) m_{0} (θ) + C B (θ, U),

(58)

we can rewrite (55) in the form

Y = F (θ, U) + Z

, which is exactly the model (2). To find the optimal input signal, we maximize one of the criteria (31) or (44) with constraints (3) or (4).

With a large number of data (large N), calculating the inverse and determinant of a very large matrix

S (θ, U)

in (9) and calculating the quantities

d_{i, j} (U)

in Equations (31)–(33) is numerically ill conditioned and requires special treatment. The algorithms below reduce the size of the matrices necessary to invert by a factor of

N + 1

.

Lemma 3.

Efficient computation of log-likelihood. The following identities hold:

p (Y | θ, U) = \prod_{k = 0}^{N} N (y_{k}, C m_{k}^{-} (θ), Σ_{k} (θ)),

(59)

{| Y - F (θ, U) |}_{S {(θ, U)}^{- 1}}^{2} = \sum_{k = 0}^{N} {| y_{k} - C m_{k}^{-} (θ) |}_{Σ_{k}^{- 1} (θ)}^{2},

(60)

| S (θ, U) | = \prod_{k = 0}^{N} | Σ_{k} (θ) |,

(61)

L (θ, Y, U) = \frac{1}{2} \sum_{k = 0}^{N} (| y_{k} - C m_{k}^{-} {(θ) |}_{Σ_{k}^{- 1} (θ)}^{2} + \ln | Σ_{k} (θ) |) - \ln p_{0} (θ),

(62)

where

L

is given by (9) and

m_{k}^{-}

,

Σ_{k}

, are calculated recursively by the Kalman filter

\begin{matrix} Σ_{k} (θ) = S_{v} + C S_{k}^{-} (θ) C^{T}, \end{matrix}

(63)

\begin{matrix} L_{k} (θ) = S_{k}^{-} (θ) C^{T} Σ_{k}^{- 1} (θ), \end{matrix}

(64)

\begin{matrix} m_{k} (θ) = m_{k}^{-} (θ) + L_{k} (θ) (y_{k} - C m_{k}^{-} (θ)), \end{matrix}

(65)

\begin{matrix} S_{k} (θ) = S_{k}^{-} (θ) - L_{k} (θ) Σ_{k} (θ) L_{k} {(θ)}^{T}, \end{matrix}

(66)

\begin{matrix} m_{k + 1}^{-} (θ) = A_{k} m_{k} (θ) + B_{k}, \end{matrix}

(67)

\begin{matrix} S_{k + 1}^{-} (θ) = A_{k} S_{k} (θ) A_{k}^{T} + G_{k} G_{k}^{T}, k = 0, 1 \dots, N, \end{matrix}

(68)

with initial conditions

m_{0}^{-} (θ)

,

S_{0}^{-} (θ)

.

The proof is given in Appendix A. The Equations (63)–(68), are, in fact, a family of discrete-time Kalman filters indexed by

θ

. The first four formulas describe the correction step. The prediction step is given by the last two equations. The matrix

L_{k}

is the Kalman gain, and

Σ_{k}

is the covariance matrix of the output prediction error

ϵ_{k} = y_{k} - C m_{k}^{-}

.

Lemma 4.

Efficient computation of

d_{i, j}

. Let us define

{\tilde{A}}_{k} = [\begin{matrix} A (θ_{i}, u_{k}) & 0 \\ 0 & A (θ_{j}, u_{k}) \end{matrix}], {\tilde{B}}_{k} = [\begin{matrix} B (θ_{i}, u_{k}) \\ B (θ_{j}, u_{k}) \end{matrix}],

(69)

{\tilde{G}}_{k} = [\begin{matrix} G (θ_{i}, u_{k}) & 0 \\ 0 & G (θ_{j}, u_{k}) \end{matrix}], \tilde{C} = \frac{1}{\sqrt{2}} [\begin{matrix} C & - C \end{matrix}]

(70)

and let

\begin{matrix} {\tilde{Σ}}_{k} = S_{v} + \tilde{C} {\tilde{S}}_{k}^{-} {\tilde{C}}^{T}, \end{matrix}

(71)

\begin{matrix} {\tilde{L}}_{k} = {\tilde{S}}_{k}^{-} {\tilde{C}}^{T} {\tilde{Σ}}_{k}^{- 1}, \end{matrix}

(72)

\begin{matrix} {\tilde{S}}_{k} = {\tilde{S}}_{k}^{-} - {\tilde{L}}_{k} {\tilde{Σ}}_{k} {\tilde{L}}_{k}^{T}, \end{matrix}

(73)

\begin{matrix} {\tilde{m}}_{k + 1}^{-} = {\tilde{A}}_{k} (I - {\tilde{L}}_{k} \tilde{C}) {\tilde{m}}_{k}^{-} + {\tilde{B}}_{k}, \end{matrix}

(74)

\begin{matrix} {\tilde{S}}_{k + 1}^{-} = {\tilde{A}}_{k} {\tilde{S}}_{k} {\tilde{A}}_{k}^{T} + {\tilde{G}}_{k} {\tilde{G}}_{k}^{T}, k = 0, 1, \dots, N, \end{matrix}

(75)

with initial conditions

{\tilde{m}}_{0}^{-} = [\begin{matrix} m_{0}^{-} (θ_{i}) \\ m_{0}^{-} (θ_{j}) \end{matrix}], {\tilde{S}}_{0}^{-} = [\begin{matrix} S_{0}^{-} (θ_{i}) & 0 \\ 0 & S_{0}^{-} (θ_{j}) \end{matrix}] .

(76)

Then, the quantity

d_{i, j} (U)

in Formula (31) is given by

d_{i, j} (U) = \frac{1}{4} \sum_{k = 0}^{N} | \tilde{C} {\tilde{m}}_{k}^{-} |_{{\tilde{Σ}}_{k}^{- 1}}^{2} + \frac{1}{2} \sum_{k = 0}^{N} \ln | {\tilde{Σ}}_{k} | - \frac{1}{4} \ln (| S_{i} | | S_{j} |),

(77)

where

| S_{i} | = | S (θ_{i}, U) |

,

| S_{j} | = | S (θ_{j}, U) |

are calculated according to Lemma 3, Equation (61).

The proof is given in Appendix A. Let us observe that instead of calculating the inverse and determinant of the large matrices

S_{i}

,

S_{j}

,

\frac{1}{2} (S_{i} + S_{j})

, of dimension

(N + 1) n_{y}

, we only need to calculate the determinants and inverses of the much smaller matrices

Σ_{k}

,

{\tilde{Σ}}_{k}

, whose dimension is

n_{y}

, which is usually a small number.

5. Comparison with Classical Methods of Input Signal Design

Classical methods for input signal design in system identification are primarily concerned with LTI state space or transfer function models (such as ARMAX) and are usually based on maximizing some functions of error covariance or the Fisher information matrix. For the prediction error method (PEM) estimator, the asymptotic form of the error covariance matrix (or Fisher information) is well known, both in the time and frequency domains. In the time domain, the solution corresponds to a specific input signal, whereas in the frequency domain, the solution yields the optimal power spectral density of the input signal. Below, we provide a brief overview of these methods, following the methodology presented in [1,2] [Chapter 9. Sections 9.3 and 9.4] and [4] [Section 6.1].

Consider the LTI, SISO system

\begin{matrix} x_{k + 1} = A (θ) x_{k} + B (θ) u_{k} + G (θ) w_{k}, \end{matrix}

(78)

\begin{matrix} y_{k} = C x_{k} + v_{k}, \end{matrix}

(79)

under the assumptions stated in Section 4. System (78), (79) is equivalent to the transfer function model

y_{k} = G (θ, z) u_{k} + H (θ, z) e_{k},

(80)

where

e_{k} \sim N (0, σ_{e}^{2})

is a sequence of mutually independent Gaussian variables. The filters G and H are determined by the formulas

G (θ, z) = C {(z I - A (θ))}^{- 1} B (θ), H (θ, z) = 1 + C {(z I - A (θ))}^{- 1} K (θ),

(81)

where the Kalman gain

K (θ)

is given by

K (θ) = A (θ) S (θ) C^{T} {(C S (θ) C^{T} + σ_{v}^{2})}^{- 1},

(82)

with a non-negative matrix

S

being a solution of the Riccati equation (cf. [2])

S = A S A^{T} + G G^{T} - A S C^{T} {(C S C^{T} + σ_{v}^{2})}^{- 1} C S A^{T} .

(83)

The prediction errors are given by the recurrence

ϵ_{k} (θ, Y, U) = H^{- 1} (θ, z) (y_{k} - G (θ, z) u_{k}) .

(84)

The cost function used in the prediction error method (PEM) is expressed as

V (θ, Y, U) = \frac{1}{2 N σ_{e}^{2}} \sum_{k = 1}^{N} ϵ_{k}^{2} (θ, U) .

(85)

Minimization of (85) with reference to

θ

yields the PEM estimator

\hat{θ} (Y, U) = \arg \min_{θ \in Θ} V (θ, Y, U) .

(86)

The above estimator, under rather weak identifiability conditions, is consistent, asymptotically normal, and efficient, i.e., it achieves the Cramér–Rao lower bound. Following the reasoning presented in [1,2] [Chapter 9, Section 9.3 and 9.4] or [4] [Section 6.1], we divide the parameter vector into two groups related to the parameters appearing in G and H, that is,

θ = col (θ_{H}, θ_{G})

. The sensitivity of

ϵ_{k}

to changes in

θ_{G}

is calculated recursively according to the following equations:

ψ_{k} (θ, U) = H^{- 1} (θ, z) (\nabla_{θ_{G}} G (θ, z)) u_{k} = F_{z} (θ, z) u_{k},

(87)

where

\nabla_{θ_{G}}

means differentiation only with respect to the parameters that occur in G. The information matrix, which is also the inverse of the error covariance

P_{θ_{G}}

, is given by

M (θ, U) = P_{θ_{G}}^{- 1} (θ, U) = R_{e} (θ) + \frac{1}{N σ_{e}^{2}} \sum_{k = 1}^{N} ψ_{k} (θ, U) ψ_{k} {(θ, U)}^{T},

(88)

where

R_{e}

does not depend on

U

. Using the D-optimal criterion, the optimal signal is given through maximization of

\det (M (θ^{0}, U))

, where

θ^{0}

is the true value of the parameter. Since

θ^{0}

is unknown, one can use the prior distribution and maximize the average D-optimal criterion:

Q (U) = E_{p_{0} (θ)} \det (M (θ, U)),

(89)

with constraints (3) or (4). The asymptotic error covariance can also be expressed in terms of the power spectral density of the input signal

u_{k}

. Let

Φ_{u}

denote the spectral density of

u_{k}

. As was shown in [2] [p. 291] and [4] [Section 6.2], we have

M (Φ_{u}, θ^{0}) = P_{θ_{G}}^{- 1} (Φ_{u}, θ^{0}) = \frac{N}{2 π σ_{e}^{2}} \int_{- π}^{π} F_{z} (e^{i ω}, θ^{0}) F_{z} {(e^{- i ω}, θ^{0})}^{T} Φ_{u} (ω) d ω + R_{e} (θ^{0}),

(90)

where

F_{z}

is defined by (87), and the term

R_{e}

in (90) does not depend on

Φ_{u}

. Similarly to before, the parameter-averaged determinant of the matrix

M

is maximized with respect to

Φ_{u}

, subject to the signal power and frequency constraints. Typically, the spectrum

Φ_{u}

is parametrized by a finite number of coefficients

c_{k}

, so that the resulting optimization problem is convex; see [37] for details. After performing spectral factorization of

Φ_{u}

, a filter is obtained, whose input is white noise and whose output yields the optimal signal

u_{k}

. This has been implemented in the MOOSE-2 solver [38]. Unfortunately, MOOSE-2 does not allow for averaging over the prior and involves unknown value of the parameter.

Numerous variants of the aforementioned methods can be found in the literature. For example, instead of the D-optimality criterion, one may also consider maximizing

tr (M)

or

λ_{\min} (M)

. However, the vast majority of methods are based on the principles stated above (see, e.g., [4]), that is, maximization of some functions of the Fisher information matrix. Finally, we note that the above methods employ the classical optimality criterion, averaged only over the prior distribution. Consequently, they are not fully Bayesian and, following the terminology of [11], should rather be referred to as pseudo-Bayesian methods.

6. Examples of Input Signal Design

In the following, we present four examples of optimal input signal design using both Bayesian and classical methods. Examples 1–3 are classical in nature and concern time-invariant linear stochastic systems. Examples 1 and 2 are elementary, while Example 3, taken from [28], addresses the design of a control signal for a paradigmatic model of the atomic sensor. The sensor is modeled as a harmonic oscillator with the natural frequency being the parameter of interest. In Examples 1–3, the Bayesian approach is compared with classical methods. Maximization of the spectral criterion (90) was performed using the MOOSE-2 solver [38], evaluated at

θ = m_{θ}

with default parameters, that is, the input spectrum was FIR-type with 20 lags and the spectrum power constraint was set to 1 (prob.spectrum.signal.power.ub = 1). There were no additional constraints on the shape of the spectrum.

Example 4, adapted from [29], is more advanced and considers the design of the pump laser control signal in an optically pumped magnetometer. The magnetometer is modeled as a quasi-linear stochastic system, where the matrices

A

,

B

, and

G

depend non-linearly on the control signal u. For this system, classical methods cannot be applied. Therefore, estimation errors are compared with the Information-Theoretic Lower Bound (ITB) provided in Theorem 1 and with the errors obtained by using an appropriately selected harmonic input signal.

In all examples, the errors were computed using the Monte Carlo method. The parameter

θ

and the initial conditions

x_{0}

were sampled from the prior distribution

p_{0} (x_{0}, θ)

. Observations

y_{0}, \dots, y_{N}

corresponding to the sampled parameters and initial conditions were then generated, and the error of the MAP estimator was calculated. This error was subsequently averaged over many repetitions of the procedure.

6.1. Elementary Example

We begin with a very simple first-order system

\begin{matrix} x_{k + 1} = θ_{1} x_{k} + θ_{2} u_{k} + g w_{k}, \end{matrix}

(91)

\begin{matrix} y_{k} = x_{k} + σ_{v} v_{k}, k = 0, 1, \dots, N, \end{matrix}

(92)

with

σ_{v} = 0.1

,

g = 0.01

. The parameter vector

θ = {[θ_{1}, θ_{2}]}^{T}

has a prior distribution

p_{0} (θ) = N (θ, m_{θ}, S_{θ})

, where

m_{θ} = {[\begin{matrix} 0.8 & 0.2 \end{matrix}]}^{T}

,

S_{θ} = 10^{- 2} I

. As assumed in Section 4, the initial condition

x_{0}

is conditionally Gaussian; that is,

p (x_{0} | θ) = N (x_{0}, m_{0}^{-} (θ), s_{0}^{-} (θ))

, with

m_{0}^{-} (θ) = 0

,

s_{0}^{-} (θ) = 0.01

. The length of the signal

N = 100

and the set of admissible signals is given by (3) with

\tilde{U} = 0

; that is, the norm of the signal cannot be greater than

ϱ

. To minimize the averaged D-optimal criterion (89), we need to calculate the sensitivity of the prediction error. The sensitivity Equation (87) now take the form

\begin{matrix} (1 + (K (θ_{1}) - θ_{1}) z^{- 1}) (1 - θ_{1} z^{- 1}) ψ_{1, k} = θ_{2} z^{- 2} u_{k}, \end{matrix}

(93)

\begin{matrix} (1 + (K (θ_{1}) - θ_{1}) z^{- 1}) ψ_{2, k} = z^{- 1} u_{k}, \end{matrix}

(94)

where the Kalman gain

K (θ_{1})

is given by (82), (83) with

A = θ_{1}

,

G = g

, and

C = 1

.

The optimal input signals were designed by maximizing the Bayesian criterion (44), the averaged D-optimal criterion (89), and the spectral criterion (90), subject to the constraint (3) with

\tilde{U} = 0

.

The optimal signals and the corresponding estimation errors of

θ_{1}

and

θ_{2}

are shown in Figure 1 and Figure 2. In Figure 2, we also calculate the estimation errors for the constant (step) signal, which is certainly not optimal. The constant signal and the MOOSE signal were always assigned a norm equal to

ϱ

.

6.2. Example with a Non-Gaussian Prior Distribution

Consider the following system:

d x = (A_{c} (θ) + B_{c} (θ) u) d t + G_{c} (θ) d w,

(95)

y_{k} = C x_{k} + s_{v} v_{k},

(96)

where

A_{c} (θ) = [\begin{matrix} 0 & 1 \\ 0 & - θ \end{matrix}], B_{c} = [\begin{matrix} 0 \\ θ \end{matrix}], G_{c} (θ) = [\begin{matrix} 0 \\ \sqrt{d_{c} θ} \end{matrix}], C = [\begin{matrix} 1 & 0 \end{matrix}],

(97)

x_{k} = x (t_{k})

,

t_{k} = k Δ

,

Δ = 0.05 \cdot 10^{- 3}

,

d_{c} = 0.01

,

s_{v} = 0.1

. This system can be considered controlled Brownian motion or a DC motor with stochastic disturbances. The parameter

θ

is the unknown damping rate of the system. Assuming that

u (t) = u_{k}, t \in [t_{k}, t_{k + 1}]

, the discrete-time system corresponding to (95) has the form

x_{k + 1} = A (θ) x_{k} + B (θ) u_{k} + G (θ) w_{k},

(98)

where, according to the procedure given in Appendix C,

\begin{matrix} A (θ) = [\begin{matrix} 1 & \frac{1 - e^{- θ Δ}}{θ} \\ 0 & e^{- θ Δ} \end{matrix}], B (θ) = [\begin{matrix} Δ - \frac{1 - e^{- θ Δ}}{θ} \\ 1 - e^{- θ Δ} \end{matrix}], \\ G (θ) = \sqrt{d_{c} θ} [\begin{matrix} \sqrt{D_{1, 1} (θ)} & 0 \\ \frac{D_{1, 2} (θ)}{\sqrt{D_{1, 1} (θ)}} & \sqrt{\frac{D_{1, 1} (θ) D_{2, 2} (θ) - D_{1, 2} {(θ)}^{2}}{D_{1, 1} (θ)}} \end{matrix}], \end{matrix}

(99)

where

\begin{matrix} D_{1, 1} (θ) = \frac{4 e^{- θ Δ} - e^{- 2 θ Δ} + 2 θ Δ - 3}{2 θ^{3}}, \end{matrix}

(100)

\begin{matrix} D_{1, 2} (θ) = \frac{1 - 2 e^{- θ Δ} + e^{- 2 θ Δ}}{2 θ^{2}}, D_{2, 2} (θ) = \frac{1 - e^{- 2 θ Δ}}{2 θ} . \end{matrix}

(101)

The initial condition is Gaussian with

m_{0}^{-} = 0

,

S_{0}^{-} = diag [0.001, 0.005]

. Unlike in the previous example, here, we assume that the prior distribution of

θ

is uniform, that is,

p_{0} (θ) = U [a, b]

with

a = 0.05

,

b = 2

. Following the Gauss–Legendre Formula (43), we get

θ_{1} = 0.5 (a + b - (b - a) / \sqrt{3}) \approx 0.462

,

θ_{2} = 0.5 (a + b + (b - a) / \sqrt{3}) \approx 1.588

,

p_{0, 1} = p_{0, 2} = 0.5

. Thus,

r = 2

in (31), and according to (35), the Bayesian optimal signal is a solution of the simplified and convex optimization problem (36) with

d_{1, 2}

defined by Lemmas 3 and 4. Moreover, since the matrices in (99) do not depend on

u_{k}

, the last two terms in (77) can be omitted. The set of admissible signals is given by (3) with

\tilde{U} = 0

; that is, the signal norm cannot be greater than

ϱ

.

In order to employ the classical methods described in Section 5, it is necessary to first evaluate the sensitivity of the prediction error. The transfer functions G and H in (80) have the form

G (θ, z) = \frac{B (θ, z)}{A (θ, z)} z^{- 1}, H (θ, z) = \frac{C (θ, z)}{A (θ, z)},

(102)

where

\begin{matrix} A (θ, z) = 1 - (1 + e^{θ Δ})) z^{- 1} + e^{- θ Δ} z^{- 2}, \end{matrix}

(103)

\begin{matrix} B (θ, z) = Δ - \frac{1 - e^{- θ Δ}}{θ} + (\frac{1}{θ} - (\frac{1}{θ} + Δ) e^{- θ Δ}) z^{- 1}, \end{matrix}

(104)

\begin{matrix} C (θ, z) = 1 + (K_{1} (θ) - (1 - e^{- θ Δ})) z^{- 1} + (\frac{1 - e^{- θ Δ}}{θ} K_{2} (θ) + (1 - K_{1} (θ)) e^{- θ Δ}) z^{- 2}, \end{matrix}

(105)

and the Kalman gain

K

is given by (82). Since we only have one parameter, the sensitivity

ψ_{k}

is a number, and the sensitivity Equation (87) now takes the form

A (θ, z) C (θ, z) ψ_{k} (θ, U) = (\frac{\partial B (θ, z)}{\partial θ} A (θ, z) - B (θ, z) \frac{\partial A (θ, z)}{\partial θ}) z^{- 1} u_{k} .

(106)

The D-optimal signal is then obtained through maximization of the averaged D-optimal criterion

Q (U) = E_{p_{0} (θ)} [\frac{1}{N} \sum_{k = 1}^{N} ψ_{k}^{2} (θ, U)],

(107)

with constraints (3).

The optimal input signals were designed by maximizing the Bayesian criterion (44) and the averaged D-optimal criterion (107), subject to the constraint (3) with

\tilde{U} = 0

. The results are presented in Figure 3 and Figure 4. Figure 4 also shows the estimation error for the step signal (constant) and the PRBS signal. The constant and PRBS signals were always assigned a norm equal to

ϱ

.

6.3. Optimal Input Design for the Atomic Sensor Model

In [28], a simplified paradigmatic model of an atomic sensor (an atomic magnetometer [39,40]) is introduced, in which the dynamics is governed by oscillations of the collective spin of an atomic ensemble subjected to an external magnetic field. The system is driven by circularly polarized light from a pump laser, whose frequency acts as the input signal. A linearly polarized probe laser illuminates the atoms, and upon transmission through the medium, its polarization undergoes a Faraday rotation. The

J_{z}

component of the collective spin is inferred from the measurement of the probe laser’s polarization angle. The model presented in [28] describes the dynamics of the spin components

J = {[J_{y}, J_{z}]}^{T}

and has the form

d J = [\begin{matrix} - \frac{1}{T_{2}} & ω_{L} \\ - ω_{L} & - \frac{1}{T_{2}} \end{matrix}] J d t + [\begin{matrix} 0 \\ 1 \end{matrix}] E (t) d t + d w^{(J)},

(108)

where

ω_{L}

is the Larmor frequency,

T_{2} = 0.87

ms is the relaxation time,

E

is the pumping laser frequency, and

w^{(J)}

is the Wiener process with known covariance

q I

. The observation has the form

I_{k} = g_{D} J_{z} (k Δ) + ξ_{k}

,

k = 0, 1, \dots

, where

I_{k}

is the photocurrent,

Δ = 5

μ

s is the sampling time, and

ξ_{k} \sim N (0, σ_{ξ}^{2})

and

g_{D}

,

σ_{ξ}

are known parameters. The Larmor frequency and the external magnetic field B are related to each other by the formula

ω_{L} = γ_{e} B

, where

γ_{e}

is the gyromagnetic ratio. Hence, by measuring

ω_{L}

, one can determine the field B. Taking

T_{2}

as the time unit and after rescaling the time, state variables, observations, and the input signal

E

, we get the following, equivalent to (108), stochastic system:

d x = (A_{c} (θ) + B_{c} u) d t + G_{c} d w,

(109)

y_{k} = C x_{k} + s_{v} v_{k},

(110)

where

A_{c} (θ) = [\begin{matrix} - 1 & θ \\ - θ & - 1 \end{matrix}], B_{c} = [\begin{matrix} 0 \\ b_{c} \end{matrix}], G_{c} = \sqrt{2} I, C = [\begin{matrix} 0 & 1 \end{matrix}],

(111)

x_{k} = x (t_{k})

,

t_{k} = k Δ

,

Δ = 5.7471 \cdot 10^{- 3}

,

s_{v} = 11.85

, and

b_{c} = 10^{5}

. The input signal u in (109) corresponds to

E

in (108). The parameter

θ

in (109) is related to the Larmor frequency

ω_{L}

in (108), by formula

θ = ω_{L} T_{2}

. Since the estimation error of the parameter

θ

depends on the input signal u, a natural question arises as to what form this signal should take. To solve this problem, we will go to discrete time and apply the methodology described in Section 4 and Section 5. Assuming that

u (t) = u_{k}, t \in [t_{k}, t_{k + 1}]

, the discrete-time system corresponding to (109) has the form

x_{k + 1} = A (θ) x_{k} + B (θ) u_{k} + G w_{k},

(112)

where, according to the procedure given in Appendix C,

\begin{matrix} A (θ) = e^{- Δ} [\begin{matrix} \cos θ Δ & \sin θ Δ \\ - \sin θ Δ & \cos θ Δ \end{matrix}], B (θ) = \frac{b_{c}}{1 + θ^{2}} [\begin{matrix} θ - e^{- Δ} (θ \cos θ Δ + \sin θ Δ) \\ 1 - e^{- Δ} (\cos θ Δ - θ \sin θ Δ) \end{matrix}], \end{matrix}

(113)

\begin{matrix} G = \sqrt{1 - e^{- 2 Δ}} I . \end{matrix}

(114)

We assume that the prior distribution of

θ

is Gaussian, that is,

p_{0} (θ) = N (θ, m_{θ}, s_{θ})

with

m_{θ} = 54.6637

,

s_{θ} = 10.76

, which corresponds to the Larmor frequency of 10 kHz and an initial uncertainty in the order of 600 Hz (

3 σ

). At the beginning of the process, the system is in thermal equilibrium, that is,

p (x_{0} | θ) = N (x_{0}, 0, I)

. Following Lemma 2, we get

θ_{1} = m_{θ} - \sqrt{s_{θ}} \approx 51

,

θ_{2} = m_{θ} + \sqrt{s_{θ}} \approx 58

,

p_{0, 1} = p_{0, 2} = 0.5

. Thus,

r = 2

in (31), and according to (35), the Bayesian optimal signal is a solution of the simplified and convex optimization problem (36) with

d_{1, 2}

defined by Lemmas 3 and 4. Moreover, since the matrices in (113), (114) do not depend on

u_{k}

, the last two terms in (77) can be omitted. The set of admissible signals is given by (3) with

\tilde{U} = 0

; that is, the signal norm cannot be greater than

ϱ

.

In order to employ the classical methods described in Section 5, it is necessary to first evaluate the sensitivity of the prediction error. Similarly to in the previous example, the polynomials

A, B, C

have the form

\begin{matrix} A (θ, z) = 1 - 2 e^{- Δ} \cos (θ Δ) z^{- 1} + e^{- 2 Δ} z^{- 2}, \end{matrix}

(115)

\begin{matrix} B (θ, z) = B_{2} (θ) - e^{- Δ} (B_{1} (θ) \sin (θ Δ) + B_{2} (θ) \cos (θ Δ)) z^{- 1}, \end{matrix}

(116)

\begin{matrix} \begin{matrix} C (θ, z) = 1 + (K_{2} (θ) - 2 e^{- Δ} \cos (θ Δ)) z^{- 1} + \\ + e^{- Δ} (e^{- Δ} - K_{1} (θ) \sin (θ Δ) - K_{2} (θ) \cos (θ Δ)) z^{- 2}, \end{matrix} \end{matrix}

(117)

and the Kalman gain

K

and the vector

B

are given by (82) and (113), respectively. The sensitivity Equation (87) is given by (106). The D-optimal signal is then obtained through maximization of the averaged D-optimal criterion (107) with constraints (3).

The optimal input signals were designed by maximizing the Bayesian criterion (44), the averaged D-optimal criterion (107), and the spectral criterion (90), subject to the constraint (3) with

\tilde{U} = 0

. The results are presented in Figure 5 and Figure 6. Figure 6 also shows the estimation error for the step (constant) signal and the harmonic signal

u (t) = a \cos (m_{θ} t)

. The frequency of the harmonic signal was equal to the expected value of the a priori distribution of the parameter

θ

. The constant, MOOSE, and harmonic signals were always assigned a norm equal to

ϱ

.

6.4. Bayesian Input Signal Design for the Pump Laser in an Optically Pumped Magnetometer

Optically pumped magnetometers operate by aligning atomic spins with a circularly polarized pump laser, after which the spins precess around the external magnetic field at the Larmor frequency. The probe laser measures this precession via polarization rotation (the Faraday effect), linking the detected signal to the magnetic field [39,40]. The pump laser’s frequency strongly affect spin polarization and coherence time, making precise laser control central to minimizing the estimation error. The advanced control strategies can then suppress the noise and enhance sensitivity. Consequently, accurate control of the pumping laser is a key factor in achieving a high-resolution and low-error magnetometer. We consider here the magnetometer model given by Equation (S9) in the article [29]:

\frac{d F}{d τ} = (- γ_{e} B + G S_{3} \hat{z}) \times F - γ F + P (τ) (\hat{z} F_{m a x} - F) + G_{0} (P (τ)) w,

(118)

where

F = {(F_{x}, F_{y}, F_{z})}^{T}

is the collective atomic spin,

γ_{e}

is the electron gyromagnetic ratio,

B = {(B_{x}, B_{y}, B_{z})}^{T}

is a constant magnetic field vector, G is a known positive constant, and

G S_{3} \hat{z}

is the effective field produced by ac-Stark shifts due to the probe laser, where

S_{3}

is white Gaussian noise with the variance

σ_{3}^{2}

. The optical pumping rate

P (τ) ⩾ 0

is an input signal. The atomic spin noise

G_{0} (P (τ)) w

is modeled as a white Gaussian where

w = {(w_{x}, w_{y}, w_{z})}^{T}

is a vector of standard and mutually independent Wiener increments. The

G_{0}

matrix is diagonal and is given by

G_{0} (P (τ)) = \sqrt{\frac{2}{3} F (F + 1) N_{A} (γ + P (τ))} I,

(119)

where

N_{A}

is the number of atoms, and F is a known atomic spin number. Parameter

F_{m a x} = N_{A} F

is the maximum possible polarization. The transverse relaxation rate

γ

depends on the number of atoms and is given by

γ (N_{A}) = T_{2} {(N_{A})}^{- 1} = γ_{0} + 10^{- 12} α N_{A}

, where

γ_{0}

and

α

are known positive constants and

T_{2}

is the effective coherence time. The observation equation has the form

S_{2} = F_{z} + N_{S_{2}},

(120)

where

N_{S_{2}}

denotes the measurement noise with the variance

σ_{2}^{2}

. In the experiment, the

S_{2}

component of the Stokes vector is measured at discrete time moments

t_{k} = k Δ

, where

Δ

is the sampling period. The realistic parameters of the model are given in Table 1.

In what follows, Equation (118) will be interpreted in the Itö sense. Moreover, we assume that the noise $G S_{3}$ in (118) is small and can be omitted.

By introducing the state variables

ξ = F \sqrt{\frac{3}{F (F + 1) N_{A}}}

, the control variable

u = P / γ

, and non-dimensional time

t = γ τ

and after multiplying both sides of (120) by

\sqrt{\frac{3}{F (F + 1) N_{A}}}

, we get the following model:

d ξ = (A_{c} (θ, u) ξ + B_{c} u) d t + G_{c} (u) d η, y_{k} = ξ_{3} (t_{k}) + σ_{v} v_{k},

(121)

where

η

is the three-dimensional standard Wiener process with unit covariance and

A_{c} (θ, u) = [\begin{matrix} - (1 + u) & θ_{3} & - θ_{2} \\ - θ_{3} & - (1 + u) & θ_{1} \\ θ_{2} & - θ_{1} & - (1 + u) \end{matrix}], B_{c} = [\begin{matrix} 0 \\ 0 \\ b_{c} \end{matrix}], G_{c} (u) = \sqrt{2 (1 + u)} I,

(122)

with

b_{c} = \sqrt{\frac{3 N_{A} F}{F + 1}}

,

σ_{v} = σ_{2} \sqrt{\frac{3}{F (F + 1) N_{A}}}

. Taking the parameters from Table 1, we have

b_{c} = 1.22 \cdot 10^{6}

,

σ_{v} = 11.85

. The parameter vector

θ = {(θ_{1}, θ_{2}, θ_{3})}^{T}

represents the external magnetic field due to the relation

θ = γ_{e} T_{2} B

. If u is a constant signal, then system (121) approaches thermodynamic equilibrium, with

E x (t) = - A_{c} {(θ, u)}^{- 1} B_{c} u

and

cov (x (t)) = I

.

A closer examination of Equation (121) shows that the component

ξ_{3} (t, θ)

of its solution remains invariant under rotations of the vector

θ

about the z-axis. As a result,

θ

, and hence the field

B

, cannot be uniquely identified from the observations

y_{0}, \dots, y_{N}

. The only quantities that can be uniquely identified in this model are the magnitude of the vector

B

and the angle

η

between

B

and one of the coordinate axes, say, the

\hat{z}

axis. However, to simplify the problem as much as possible, we introduce here the additional assumption that the field

B

always lies in the x–y plane, that is,

B = {(B_{x}, B_{y}, 0)}^{T}

. With this assumption, the change in variables

x_{1} = ξ_{1} \sin φ - ξ_{2} \cos φ, x_{2} = ξ_{3},

(123)

\cos φ = \frac{θ_{1}}{\sqrt{θ_{1}^{2} + θ_{2}^{2}}}, \sin φ = \frac{θ_{2}}{\sqrt{θ_{1}^{2} + θ_{2}^{2}}},

(124)

reduces model (121) to a two-dimensional system:

d x = (A_{c} (θ, u) x + B_{c} u) d t + G_{c} (u) d w, y_{k} = C x_{k} + σ_{v} v_{k},

(125)

where

x = {(x_{1}, x_{2})}^{T}

,

θ = γ_{e} T_{2} \sqrt{B_{x}^{2} + B_{y}^{2}}

,

w

is a two-dimensional standard Wiener process with unit covariance, and

A_{c} (θ, u) = [\begin{matrix} - (1 + u) & - θ \\ θ & - (1 + u) \end{matrix}], B_{c} = [\begin{matrix} 0 \\ b_{c} \end{matrix}], C = [\begin{matrix} 0 & 1 \end{matrix}], G_{c} (u) = \sqrt{2 (1 + u)} I .

(126)

Hence, under the assumption

B_{z} = 0

, the observations

y_{0}, \dots, y_{N}

, the variable

ξ_{3}

, and the

F_{z}

component of the collective spin are fully characterized by the reduced model (125). Furthermore, within this reduced model, it can be readily verified that

θ

is uniquely identifiable. Naturally, the accuracy of estimating

θ

depends on the choice of input u. To determine an input u that maximizes the information about

θ

, we now turn to the discrete-time formulation of (125) and apply the methods described in Section 3 and Section 4. Assuming the control signal is piecewise constant, that is,

u (t) = u_{k}, t \in [t_{k}, t_{k + 1}]

, the process

x_{k} = x (t_{k})

satisfies the difference equation

x_{k + 1} = A (θ, u_{k}) x_{k} + B (θ, u_{k}) + G (u_{k}) w_{k},

(127)

where

w_{k} \sim N (0, I)

, and the matrices

A

,

B

,

G

can be calculated following the procedure given in Appendix C. Upon the completion of straightforward calculations, we get

\begin{matrix} A (θ, u_{k}) = e^{- (1 + u_{k}) Δ} [\begin{matrix} \cos θ Δ & - \sin θ Δ \\ \sin θ Δ & \cos θ Δ \end{matrix}], G (u_{k}) = \sqrt{1 - e^{- 2 (1 + u_{k}) Δ}} I, \end{matrix}

(128)

\begin{matrix} B (θ, u_{k}) = \frac{b_{c} u_{k}}{(1 + u_{k}) + θ^{2}} [\begin{matrix} e^{- (1 + u_{k}) Δ} (θ \cos (θ Δ) + (1 + u_{k}) \sin (θ Δ)) - θ \\ e^{- (1 + u_{k}) Δ} (θ \sin (θ) - (1 + u_{k}) \cos (θ Δ)) + (1 + u_{k}) \end{matrix}] . \end{matrix}

(129)

At the beginning of the process, the system is in a thermal equilibrium corresponding to

u \equiv 0

. Hence,

p (x_{0} | θ) = N (x_{0}, 0, I)

. We also assume that the prior distribution of

θ

is Gaussian, that is,

p_{0} (θ) = N (θ, m_{θ}, s_{θ})

with

m_{θ} = 54.6637

,

s_{θ} = 3 \cdot 10^{- 3}

, which corresponds to the Larmor frequency of 10 kHz and its initial uncertainty in the order of 30 Hz (

3 σ

). Similarly to in Section 6.3, we get the following from Lemma 2:

θ_{1} = m_{θ} - \sqrt{s_{θ}} \approx 54.61

,

θ_{2} = m_{θ} + \sqrt{s_{θ}} \approx 54.72

,

p_{0, 1} = p_{0, 2} = 0.5

. Since

r = 2

in (31), then according to (35), the Bayesian optimal signal is a solution of the simplified optimization problem (36) with

d_{1, 2}

calculated using Lemmas 3 and 4. Unlike in the previous examples, in this problem, we maximize criterion (36) with constraints on the signal amplitude, that is,

0 ⩽ u_{k} ⩽ u_{m a x}

, which is preferable in realistic scenarios.

The results are presented in Figure 7, Figure 8 and Figure 9. The optimal input signal consistently lies on the boundary of the admissible set. For a small value of

u_{\max}

, the optimal signal is rectangular, with a frequency close to the a priori Larmor frequency. Since large values of

u (t)

strongly damp spin oscillations and increase the noise, the optimal signals for a large

u_{\max}

value consist of short pulses at the maximum admissible amplitude. Once the oscillations decay, the system should be re-excited using a new sequence of short pulses, repeated periodically, as illustrated in Figure 9. The harmonic signal

u (t) = 0.5 u_{\max} (1 + \cos (m_{θ} t))

is nearly optimal for a small

u_{\max}

value but becomes ineffective for a large

u_{\max}

value, as it strongly damps the oscillations (see the lower-right panel of Figure 8). As a result, the measurements carry less information about the Larmor frequency, and the estimation error increases despite the higher signal amplitude. Analogous behavior is observed for rectangular signals. More generally, let

s (t) \in [0, 1]

, be any signal, and define

u (t) = a s (t)

with

a ⩾ 0

. Then, as illustrated in Figure 7, the estimation error reaches a minimum for some non-zero value of the parameter a.

Extending the experimental duration from 2 to 5 ms reduces the estimation error by a factor of 2 compared to the case shown in Figure 7. For

u_{\max} = 200

and an experimental duration of 5 ms, the harmonic input signal yields an estimation error of 7 mHz, while the optimal signal, shown in the lower-left panel of Figure 9, reduces the error to

0.48

mHz, that is, approximately 14 times smaller. Finally, the estimation error attains the Information-Theoretic Lower Bound (20), demonstrating that in this case the MAP estimator (10) achieves the optimal performance.

It should be noted that the above models assume the Markovian environment, and this condition should be checked in an experiment. To this end, one can use the criterion given in [41]. Non-Markovian models are much more complicated (see, e.g., [42]), and one would need to employ a noise model with long memory. To model long-memory noise, fractional-order stochastic equations can be used instead of (118). Such models capture long-memory effects, and their noise correlation function decays slowly, for example, as

t^{- 1 / 2}

.

To implement the proposed method in real time, one can proceed as follows. First, observe that the pump signal has a simple structure, consisting of short pulses at the maximum admissible amplitude, each lasting approximately 5

μ

s. These pulses should be repeated with a period of roughly

2 T_{2}

, and each pulse should be triggered when the vector

[F_{y}, F_{z}]

forms an angle of about 30° with the z-axis (i.e., 30° before the maximum of

F_{z}

). To estimate the unknown vector

F

and the Larmor frequency, the MAP estimator is too slow for real-time applications. Instead, an Extended Kalman Filter (EKF) can be employed in a manner roughly similar to that described in [43,44]. This approach is considered feasible for implementation in an experimental setup.

7. Discussion and Conclusions

This paper has developed a Bayesian framework for optimal input signal design in the identification of quasi-linear stochastic dynamical systems. Using an Information-Theoretic Lower Bound on the estimation error and its connection to the Bayesian Cramér–Rao Bound, we showed that maximizing mutual information provides a principled alternative to Fisher-information-based criteria. The proposed method relies on the maximization of the MI lower bound (30), which produces a tractable surrogate objective for both finite parameter sets and parameter spaces of continuum cardinality. A key contribution is the algorithmic reduction in the dimension of the covariance matrices required for inversion by a factor of N, making the method feasible for long-term experiments.

The comparison with the average D-optimal design highlights the practical benefits of the Bayesian approach. While classical methods are computationally efficient, they require complex differentiations to evaluate parameter sensitivities and may yield suboptimal results when the parameter uncertainty is large or when the system exhibits significant non-linearities. In contrast, the proposed Bayesian method requires only the system matrices

A

,

B

,

C

,

G

, together with the prior distributions of the parameter and the initial conditions, without the need to calculate derivatives of the prediction errors. This makes the method applicable to a much broader class of systems while also enabling it to handle large initial parameter uncertainty.

The method also has certain limitations. The lower bound of the MI involves exponential terms that can vanish when the pairwise distance factors

d_{i, j} (U)

are large, which can cause numerical problems. However, this drawback can be mitigated through appropriate scaling of the optimization problem. If we consider the simplified optimization problem (36), with only two candidate parameter values, these numerical problems never occur. The discrete approximation of the MI (29) is a potential source of problems, and the weights and nodes in (38) should be carefully selected to achieve a sufficient approximation accuracy. The third limitation arises from the fact that the maximized criterion is only a lower bound on the MI and is generally not tight. Consequently, a class of problems certainly exists for which maximizing this lower bound is inefficient and may generate signals that are far from optimality in the sense of maximizing the MI (19)

In all analyzed examples, the proposed Bayesian approach, although approximate, generated signals no worse, or even better (see Figure 2 and Figure 4), than the classical methods. The second example illustrates that a non-Gaussian prior distribution leads to increased errors in the average D-optimal method. For a Gaussian prior distribution, it was confirmed that both the average D-optimal and the proposed Bayesian method yield identical results. This observation underscores the sensitivity of the classical approach to the form of the prior distribution and highlights the necessity of employing estimation techniques that are robust to non-Gaussian priors. In the third example, the D-optimal method produces results almost identical to those of the Bayesian approach. To explain this, note that in this problem the prior distribution of the parameter

θ

is relatively narrow. Then, the function

d_{1, 2} (U)

, which we minimize in this task, is approximately proportional to the sensitivity of the output to changes in

θ

. Thus,

d_{1, 2} (U)

can be interpreted as a quantity proportional to the Fisher information. Consequently, the resulting input signals and the corresponding estimation errors are nearly identical.

The study of atomic sensor models further demonstrates the practical relevance of the approach. The optimal signals in these examples are always better than the harmonic signal, with a frequency equal to the expected natural frequency of the oscillator. The fourth example, a seemingly minor modification of the oscillator from the third example, shows that the dependence of the system matrices on the control signal is significant and leads to completely different optimal signals. In the analyzed examples, the MAP estimator achieves an Information-Theoretic Lower Bound (20), but this is not always the case, and depending on the task, there are better estimators. Unfortunately, finding them is difficult.

Since the method produces the posterior distribution

p (θ, x_{k} | Y_{k})

, it can be easily converted into a sequential Bayesian Adaptive Design (BAD) algorithm [5,6]. Then, the optimal strategy is a functional of the posterior, that is,

u_{k} = ϕ_{k} (p (θ | Y_{k}), m_{k} (θ), S_{k} (θ))

. In the simplest case, when the matrices

A

,

B

,

G

do not depend on

u_{k}

and

Θ = {θ_{1}, θ_{2}}

, the optimal strategy

ϕ_{k}

can be determined by maximizing (30) on the trajectories of the system (71)–(75). This problem is deterministic and therefore relatively simple and can be solved using deterministic optimal control methods.

From a broader perspective, quasi-linear systems arise naturally in quantum mechanics, chemical engineering, and thermal processes, making the proposed method widely applicable. In conclusion, this work provides both theoretical justification and practical tools for Bayesian input design in quasi-linear stochastic systems. By bridging information-theoretic principles with efficient computational methods, it establishes a foundation for robust experimental design in a wide range of applications. The results reported here should stimulate further research at the intersection of Bayesian inference, control, and the identification of non-linear systems.

Author Contributions

The article concept, the proofs of Theorem 1 and Lemmas 2–4, Appendix A, Appendix B and Appendix C, the MATLAB, R2018b codes and the implementation of Bayesian and classical signal selection methods, the development of all examples, comparison with the classical methods, all formula derivations, and text writing: P.B. Determination of the optimal spectrum and signal generation using the MOOSE-2 toolbox in Section 6.1 and Section 6.3; verification of the correctness of formulas describing discrete systems in Section 6.1, Section 6.3, and Section 6.4; and verification of the proofs of Lemmas 3 and 4: A.W. Text proofreading, literature review, introduction, discussion, and conclusions: P.B. and A.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the statutory subsidy of the AGH University of Science and Technology, No. 16.16.120.773, and by the Initiative of Excellence-Research University (IDUB) program.

Data Availability Statement

The MATLAB codes, in particular the functions for calculating the lower bound (31) and

d_{i, j}

in (32) and (36), are available in the repository at https://github.com/Jhiqo/Bay_design_ql_sys (accessed on 29 September 2025).

Acknowledgments

The authors gratefully acknowledge Morgan Mitchell, Jan Kołodyński, Klaudia Dilcher, Aleksandra Sierant, Julia Amorós-Binefa, and Diana Méndez-Ávalos for the insightful discussions on quantum control and atomic magnetometers.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript: MI—Mutual Information FIM—Fisher Information Matrix; CRB—Cramér–Rao Bound; BCRB—Bayesian Cramér–Rao Bound; ITB—Information-Theoretic Bound; DOE—Design of Experiment; SDE—Stochastic Differential Equation. The norm of the vector

x \in R^{n}

is denoted by |x|. For any square matrix Q, the quadratic form

x^{T} Q x

is denoted by

{| x |}_{Q}^{2}

. The trace and determinant of the matrix A are denoted by trA, |A|, or det(A). The set of symmetric, positive definite matrices of dimension n is denoted by S⁺ (n). The symbol col (a₁, a₂,…, a_n) denotes the column vector. ξ ∼

N

(m,S) means that ξ has a normal distribution with mean m and covariance S. The density of the Gaussian variable is denoted by

N (x, m, S) = {2 π}^{- \frac{n}{2}} {| S |}^{- \frac{1}{2}} {\exp (- 0.5 (x - m)}^{T} S^{- 1} (x - m))

.

Appendix A. Proofs

Proof of Theorem 1.

{\hat{θ}}_{M} (Y, U) = E_{p (θ | Y, U)} (θ | Y, U)

be the Minimum Mean Squared Error estimator of

θ

. The conditional covariance of

{\hat{θ}}_{M}

is given by

C (Y, U) = \int p (θ | Y, U) (θ - {\hat{θ}}_{M} (Y, U)) {(θ - {\hat{θ}}_{M} (Y, U))}^{T} d θ

. Since the Gaussian distribution maximizes entropy over all distributions with the same covariance, it can be proven (see [22] [Theorem 8.6.5, p. 255]) that

H_{θ | Y} (U) = E (- \ln p (θ | Y, U)) ⩽ \frac{1}{2} {E \ln ((2 π e)}^{n_{θ}} | C (Y, U) |) .

(A1)

Any covariance matrix

C

satisfies the inequality

| C | ⩽ {(n_{θ}^{- 1} tr (C))}^{n_{θ}}

(see [22], Theorem 17.9.4, p. 680). Hence,

\ln | C (Y, U) | ⩽ n_{θ} \ln n_{θ}^{- 1} tr C (Y, U) .

(A2)

Taking into account (19) and Equations (A1) and (A2), we have

\begin{matrix} H_{θ} - I_{θ; Y} (U) = H_{θ | Y} (U) = E (- \ln p (θ | Y, U)) ⩽ \\ \frac{1}{2} {E \ln ((2 π e)}^{n_{θ}} | C (Y, U) |) ⩽ \frac{n_{θ}}{2} E \ln (2 π e n_{θ}^{- 1} tr C (Y, U)) . \end{matrix}

According to the concavity of the logarithm and from Jensen’s inequality,

H_{θ} - I_{θ; Y} (U) ⩽ \frac{n_{θ}}{2} \ln (2 π e n_{θ}^{- 1} E tr C (Y, U)) .

Using the equality

E tr C (Y, U) = E | θ - {\hat{θ}}_{M} {(Y, U) |}^{2}

yields

H_{θ} - I_{θ; Y} (U) ⩽ \frac{n_{θ}}{2} \ln (2 π e n_{θ}^{- 1} E | θ - {\hat{θ}}_{M} (Y, U) |^{2}) .

Since

{\hat{θ}}_{M}

is the MMSE, then

E | θ - {\hat{θ}}_{M} {(Y, U) |}^{2} ⩽ E {| θ - \hat{θ} (Y, U) |}^{2}

, and

H_{θ} - I_{θ; Y} (U) ⩽ \frac{n_{θ}}{2} \ln (2 π e n_{θ}^{- 1} E | θ - \hat{θ} (Y, U) |^{2}),

which is equivalent to the first inequality in (20). The proof of the Efroimovitch inequality, that is, the second inequality in (20), is given in [23] [Cor. 3, Chapter 2.2, p. 16]. □

Proof of Lemma 3.

The proof of (59), (63)–(68) is well documented in the literature and can be found in [19] and [45] [Theorem 12.3, p. 187]. However, for completeness and the convenience of the reader, we reproduce it here in its entirety. Let us denote

X_{k} = col (x_{0}, \dots, x_{k})

,

Y_{k} = col (x_{0}, \dots, x_{k})

. We begin by recalling the filtering equations. If

θ

is a fixed parameter, then the solution of Equation (45) is a Gauss–Markov process with the transition density

p (x_{k} | x_{k - 1}, θ) = N (x_{k}, A_{k - 1} x_{k - 1} + B_{k - 1}, A_{k - 1} S_{k - 1} A_{k - 1}^{T} + G_{k - 1} G_{k - 1}^{T}),

(A3)

where we use the notation of Section 4 and we omit the arguments

U

and

u_{k}

in all the formulas below. It follows from (46) that the density of the observations

y_{k}

, conditioned on

X_{k}

,

Y_{k - 1}

,

θ

, has the form

p (y_{k} | X_{k}, Y_{k - 1}, θ) = p (y_{k} | x_{k}, θ) = N (y_{k}, C x_{k}, S_{v} (θ)) .

(A4)

According to the assumptions given at the beginning of Section 4, the initial distribution of

x_{0}

is given by

p (x_{0} | θ) = N (x_{0}, m_{0}^{-} (θ), S_{0}^{-} (θ)),

(A5)

where

m_{0}^{-}

,

S_{0}^{-}

are smooth functions, and

S_{0}^{-} (θ) > 0

, for all

θ \in Θ

. To find

p (Y_{k} | θ)

, we proceed as follows:

\begin{matrix} p (x_{k}, Y_{k} | θ) = \int p (y_{k}, x_{k}, x_{k - 1}, Y_{k - 1} | θ) d x_{k - 1} \\ = \int p (y_{k} | x_{k}, x_{k - 1}, Y_{k - 1}, θ) p (x_{k}, x_{k - 1}, Y_{k - 1} | θ) d x_{k - 1} \\ = p (y_{k} | x_{k}, θ) \int p (x_{k} | x_{k - 1}, Y_{k - 1}, θ) p (x_{k - 1}, Y_{k - 1}, θ) d x_{k - 1} \\ = p (y_{k} | x_{k}, θ) \int p (x_{k} | x_{k - 1}, θ) p (x_{k - 1} | Y_{k - 1}, θ) d x_{k - 1} p (Y_{k - 1} | θ) \\ = p (y_{k} | x_{k}, θ) p (x_{k} | Y_{k - 1}, θ) p (Y_{k - 1} | θ), \end{matrix}

(A6)

where

p (x_{k} | Y_{k - 1}, θ) = \int p (x_{k} | x_{k - 1}, θ) p (x_{k - 1} | Y_{k - 1}, θ) d x_{k - 1},

(A7)

is the so-called predictive distribution or prediction step. Integration of both sides of (A6) over

x_{k}

yields

p (Y_{k} | θ) = p (y_{k} | Y_{k - 1}, θ) p (Y_{k - 1} | θ),

(A8)

where

p (y_{k} | Y_{k - 1}, θ) = \int p (y_{k} | x_{k}, θ) p (x_{k} | Y_{k - 1}, θ) d x_{k},

(A9)

is the predictive distribution of

y_{k}

. Dividing (A6) by (A8) gives the correction step:

p (x_{k} | Y_{k}, θ) = \frac{p (y_{k} | x_{k}, θ) p (x_{k} | Y_{k - 1}, θ)}{p (y_{k} | Y_{k - 1}, θ)} .

(A10)

Summarizing the above considerations, we have the following algorithm.

(1): Set the initial conditions:

p (x_{0} | Y_{- 1}, θ) = p (x_{0} | θ), p (Y_{- 1} | θ) = 1 .

(A11)

(2): For $k = 0, 1, \dots, N$ , calculate

\begin{matrix} p (y_{k} | Y_{k - 1}, θ) = \int p (y_{k} | x_{k}, θ) p (x_{k} | Y_{k - 1}, θ) d x_{k}, \end{matrix}

(A12)

\begin{matrix} p (Y_{k} | θ) = p (y_{k} | Y_{k - 1}, θ) p (Y_{k - 1} | θ), \end{matrix}

(A13)

\begin{matrix} p (x_{k} | Y_{k}, θ) = \frac{p (y_{k} | x_{k}, θ) p (x_{k} | Y_{k - 1}, θ)}{p (y_{k} | Y_{k - 1}, θ) d x_{k}}, \end{matrix}

(A14)

\begin{matrix} p (x_{k + 1} | Y_{k}, θ) = \int p (x_{k + 1} | x_{k}, θ) p (x_{k} | Y_{k}, θ) d x_{k} . \end{matrix}

(A15)

By substituting (A3)–(A5) into (A11)–(A15), and after somewhat tedious calculations, we obtain

\begin{matrix} p (x_{k} | Y_{k}, θ) = N (x_{k}, m_{k} (θ), S_{k} (θ)), \end{matrix}

(A16)

\begin{matrix} p (y_{k} | Y_{k - 1}, θ) = N (y_{k}, C m_{k}^{-} (θ), Σ_{k} (θ)), \end{matrix}

(A17)

where

m_{k} (θ)

,

S_{k} (θ)

,

m_{k}^{-} (θ)

,

Σ_{k} (θ)

,

k = 0, 1, \dots

, are given by the Kalman filtering Equations (63)–(68). Then, by using the recursive formula (A13) and (A17), we get

p (Y | θ) = \prod_{k = 0}^{N} N (y_{k}, C m_{k}^{-} (θ), Σ_{k} (θ)),

(A18)

where

Y = col (y_{0}, \dots, y_{N})

. On the other side, according to (55)–(58), we have

p (Y | θ) = N (Y, F (θ), S (θ)) = \prod_{k = 0}^{N} N (y_{k}, C m_{k}^{-} (θ), Σ_{k} (θ)) .

(A19)

Taking the logarithm of both sides and calculating its expected value, we get

\begin{matrix} \int p (Y | θ) (- \ln p (Y | θ)) d Y = \\ = \frac{1}{2} \ln ({(2 π e)}^{n_{y} (N + 1)} | S (θ) |) = \frac{1}{2} \ln ({(2 π e)}^{n_{y} (N + 1)} \prod_{k = 0}^{N} | Σ_{k} (θ) |) . \end{matrix}

(A20)

Hence,

| S | = \prod_{k = 0}^{N} | Σ_{k} |

, which proves (61). Now, taking the logarithm of (A19), we have

\frac{1}{2} \ln | S | + \frac{1}{2} {| Y - F |}_{S^{- 1}}^{2} = \frac{1}{2} \ln (\prod_{k = 0}^{N} | Σ_{k} |) + \frac{1}{2} \sum_{k = 0}^{N} {| y_{k} - C m_{k}^{-} |}_{Σ_{k}^{- 1}}^{2},

(A21)

where the arguments have been omitted for convenience. Since

| S | = \prod_{k = 0}^{N} | Σ_{k} |

, we get (60). Putting (60) and (61) into (9) gives (62). □

Proof of Lemma 4.

Let

Θ = {θ_{1}, \dots, θ_{r}}, θ_{i} \in R^{n_{θ}}

. We are interested in calculation of the quantity

d_{i, j} (U) = \frac{1}{8} Δ_{i, j}^{T} {(\frac{1}{2} (S_{i} + S_{j}))}^{- 1} Δ_{i, j} + \frac{1}{2} \ln | \frac{1}{2} (S_{i} + S_{j}) | - \frac{1}{4} \ln (| S_{i} | | S_{j} |),

(A22)

where

Δ_{i, j} = F (θ_{i}, U) - F (θ_{j}, U), S_{i} = S (θ_{i}, U), S_{j} = S (θ_{j}, U)

(A23)

and

F (θ, U)

,

S (θ, U)

are defined by Equations (55)–(58) of Section 4. Let us define

\begin{matrix} Y^{(i)} = F (θ_{i}, U) + Z^{(i)}, Y^{(j)} = F (θ_{j}, U) + Z^{(j)}, \end{matrix}

(A24)

\begin{matrix} \tilde{Y} = \frac{1}{\sqrt{2}} (Y^{(i)} - Y^{(j)}) = \frac{1}{\sqrt{2}} (Δ_{i, j} + Z^{(i)} - Z^{(j)}), \end{matrix}

(A25)

where

Z^{(i)} \sim N (0, S_{i})

,

Z^{(j)} \sim N (0, S_{j})

. The density of variable

\tilde{Y}

, given

\tilde{θ} = col (θ_{i}, θ_{j})

, has the form:

p (\tilde{Y} | \tilde{θ}) = N (\tilde{Y}, \frac{1}{\sqrt{2}} Δ_{i, j}, \frac{1}{2} (S_{i} + S_{j})) .

(A26)

Now, let us consider the following two systems:

\begin{matrix} x_{k + 1}^{(i)} = A (θ_{i}, u_{k}) x_{k}^{(i)} + B (θ_{i}, u_{k}) + G (θ_{i}, u_{k}) w_{k}^{(i)}, y_{k}^{(i)} = C x_{k}^{(i)} + v_{k}^{(i)}, \end{matrix}

(A27)

\begin{matrix} x_{k + 1}^{(j)} = A (θ_{j}, u_{k}) x_{k}^{(j)} + B (θ_{j}, u_{k}) + G (θ_{j}, u_{k}) w_{k}^{(j)}, y_{k}^{(j)} = C x_{k}^{(j)} + v_{k}^{(j)}, \end{matrix}

(A28)

where

w_{k}^{(i)}, w_{k}^{(j)} \sim N (0, I)

and

v_{k}^{(i)}, v_{k}^{(j)} \sim N (0, S_{v})

are mutually independent. The components of the vectors

Y^{(i)}

and

Y^{(i)}

in (A24) correspond to the outputs of the systems (A27) and (A28), that is,

Y^{(i)} = col (y_{0}^{(i)}, \dots, y_{N}^{(i)})

,

Y^{(j)} = col ({y^{(j)}}_{0}, \dots, y_{N}^{(j)})

. Then, on the basis of (A25), we get

\tilde{Y} = col ({\tilde{y}}_{0}, \dots, {\tilde{y}}_{0})

, where

{\tilde{y}}_{k} = \frac{1}{\sqrt{2}} (y_{k}^{(i)} - y_{k}^{(j)}) = \frac{1}{\sqrt{2}} (C (x_{k}^{(i)} - x_{k}^{(j)}) + (v_{k}^{(i)} - v_{k}^{(j)})) .

(A29)

Defining the matrices

{\tilde{A}}_{k}

,

{\tilde{B}}_{k}

,

{\tilde{G}}_{k}

,

\tilde{C}

, according to (69) and (70), and taking into account that

\frac{1}{\sqrt{2}} (v_{k}^{(i)} - v_{k}^{(j)}) \sim N (0, S_{v})

, we can replace Equations (A27)–(A29) with a single,

2 n

-dimensional system with

n_{y}

outputs:

{\tilde{x}}_{k + 1} = {\tilde{A}}_{k} (\tilde{θ}) {\tilde{x}}_{k} + {\tilde{B}}_{k} (\tilde{θ}) + {\tilde{G}}_{k} (\tilde{θ}) {\tilde{w}}_{k}, {\tilde{y}}_{k} = \tilde{C} {\tilde{x}}_{k} + v_{k},

(A30)

where

{\tilde{x}}_{k} = col (x_{k}^{(i)}, x_{k}^{(j)})

,

{\tilde{w}}_{k} = col (w_{k}^{(i)}, w_{k}^{(j)})

, and

v_{k} \sim N (0, S_{v})

. Proceeding analogously to the proof of Lemma 3, we infer that the conditional density of variable

\tilde{Y}

is given by

p (\tilde{Y} | \tilde{θ}) = \prod_{k = 0}^{N} N ({\tilde{y}}_{k}, \tilde{C} {\tilde{m}}_{k}^{-} (\tilde{θ}), {\tilde{Σ}}_{k} (\tilde{θ})),

(A31)

where

{\tilde{m}}_{k}^{-} (\tilde{θ})

,

{\tilde{Σ}}_{k} (\tilde{θ})

, are calculated recursively using the Kalman filter equations

\begin{matrix} {\tilde{Σ}}_{k} = S_{v} + \tilde{C} {\tilde{S}}_{k}^{-} {\tilde{C}}^{T}, \end{matrix}

(A32)

\begin{matrix} {\tilde{L}}_{k} = {\tilde{S}}_{k}^{-} {\tilde{C}}^{T} {\tilde{Σ}}_{k}^{- 1}, \end{matrix}

(A33)

\begin{matrix} {\tilde{m}}_{k} = (I - {\tilde{L}}_{k} \tilde{C}) {\tilde{m}}_{k}^{-} + {\tilde{L}}_{k} {\tilde{y}}_{k}, \end{matrix}

(A34)

\begin{matrix} {\tilde{S}}_{k} = {\tilde{S}}_{k}^{-} - {\tilde{L}}_{k} {\tilde{Σ}}_{k} {\tilde{L}}_{k}^{T}, \end{matrix}

(A35)

\begin{matrix} {\tilde{m}}_{k + 1}^{-} = {\tilde{A}}_{k} {\tilde{m}}_{k} + {\tilde{B}}_{k}, \end{matrix}

(A36)

\begin{matrix} {\tilde{S}}_{k + 1}^{-} = {\tilde{A}}_{k} {\tilde{S}}_{k} {\tilde{A}}_{k}^{T} + {\tilde{G}}_{k} {\tilde{G}}_{k}^{T}, k = 0, 1 \dots, N, \end{matrix}

(A37)

with initial conditions (76). Comparing (A26) with (A31) gives:

p (\tilde{Y} | \tilde{θ}) = N (\tilde{Y}, \frac{1}{\sqrt{2}} Δ_{i, j}, \frac{1}{2} (S_{i} + S_{j})) = \prod_{k = 0}^{N} N ({\tilde{y}}_{k}, \tilde{C} {\tilde{m}}_{k}^{-} (\tilde{θ}), {\tilde{Σ}}_{k} (\tilde{θ})) .

(A38)

Taking the logarithm of both sides and calculating its expected value, we get:

\begin{matrix} \int p (\tilde{Y} | \tilde{θ}) (- \ln p (\tilde{Y} | \tilde{θ})) d \tilde{Y} = \\ = \frac{1}{2} \ln ({(2 π e)}^{n_{y} (N + 1)} | \frac{1}{2} (S_{i} + S_{j}) |) = \frac{1}{2} \ln ({(2 π e)}^{n_{y} (N + 1)} \prod_{k = 0}^{N} | {\tilde{Σ}}_{k} (\tilde{θ}) |) . \end{matrix}

(A39)

Hence,

\ln | \frac{1}{2} (S_{i} + S_{j}) | = \sum_{k = 0}^{N} \ln | {\tilde{Σ}}_{k} | .

(A40)

Taking the logarithm of (A38) and applying (A40) yields:

\frac{1}{2} {(\tilde{Y} - \frac{1}{\sqrt{2}} Δ_{i, j})}^{T} {(\frac{1}{2} (S_{i} + S_{j}))}^{- 1} (\tilde{Y} - \frac{1}{\sqrt{2}} Δ_{i, j}) = \frac{1}{2} \sum_{k = 0}^{N} {| {\tilde{y}}_{k} - \tilde{C} {\tilde{m}}_{k}^{-} |}_{{\tilde{Σ}}_{k}^{- 1}}^{2} .

(A41)

Finally, since

\tilde{Y} = col ({\tilde{y}}_{0}, \dots, {\tilde{y}}_{0})

, then substituting

\tilde{Y} = 0

,

{\tilde{y}}_{k} = 0

in (A41), (A34), we conclude that

\frac{1}{8} Δ_{i, j}^{T} {(\frac{1}{2} (S_{i} + S_{j}))}^{- 1} Δ_{i, j} = \frac{1}{4} \sum_{k = 0}^{N} {| \tilde{C} {\tilde{m}}_{k}^{-} |}_{{\tilde{Σ}}_{k}^{- 1}}^{2},

(A42)

where

{\tilde{m}}_{k}^{-}

,

{\tilde{Σ}}_{k}

fulfils Equations (71)–(75). The last term in (A22) is calculated according to Lemma 3. □

Appendix B. An Example of the Gap Between the ITB and BCRB

The difference between the ITB (20) and BCRB (14) can be significant. To see this, let us consider the model

y = θ + v,

(A43)

where

v \sim N (0, 1)

. The elementary calculation yields

J_{D} = 1

. If the prior is Gaussian, that is,

p_{0} (θ) = N (θ, m_{θ}, σ_{θ}^{2})

, then

J_{P} = σ_{θ}^{- 2}

,

2 (H_{θ} - I_{θ; y}) = \ln 2 π e - \ln (σ_{θ}^{- 2} + 1)

, and both the BCRB (14) and ITB (20) yield the same error estimate. Now, let us assume that

p_{0} (θ) = \frac{1}{2} (Φ (α (1 + θ)) + Φ (α (1 - θ)) - 1),

(A44)

where

α > 0

is a parameter, and

Φ (t) = \int_{- \infty}^{t} N (t, 0, 1) d t

. The prior (A44) is an analytic function which, in the limit

α \to \infty

, tends to the uniform distribution

U [- 1, 1]

. Since

H (y | θ) = \frac{1}{2} \ln (2 π e)

,

H (y) ⩽ \frac{1}{2} \ln (2 π e (var (θ) + 1))

,

var (θ) = \frac{1}{3} + O_{1} (α^{- 1})

,

H (θ) = \ln 2 + O_{2} (α^{- 1})

and

H_{θ} - I_{θ; y} = H (y | θ) + H (θ) - H (y)

, after elementary calculations, we get the ITB:

E {(θ - \hat{θ} (y))}^{2} ⩾ \frac{e^{2 (H_{θ} - I_{θ; y})}}{2 π e} ⩾ \frac{3 \sqrt{2}}{8 π e} + O (α^{- 1}) .

(A45)

On the other hand, according to (12), we have

J_{P} = \frac{α^{2}}{2} \int_{- \infty}^{\infty} \frac{{(N (α (1 + θ), 0, 1) - N (α (1 - θ), 0, 1))}^{2}}{Φ (α (1 + θ)) + Φ (α (1 - θ)) - 1} d θ ⩾ \frac{α}{2 \sqrt{π}} \underset{α \to \infty}{\to} \infty .

(A46)

Hence, the BCRB (14) becomes trivial, but the ITB still gives a reasonable error estimate. If the likelihood is non-Gaussian, a similar effect occurs. Therefore, the BCRB generally underestimates the minimum possible estimation error.

Appendix C. Discretization of Linear SDE

Consider the continuous-time SDE

d x = (A_{c} x + B_{c} u) d t + G_{c} d w,

(A47)

where

x (t) \in R^{n}

,

w (t) \in R^{n_{w}}

is a vector of mutually independent standard Wiener processes. Let

Δ

denote the discretization period, and let

u (t) = u_{k}

,

t \in [t_{k}, t_{k + 1}]

,

t_{k} = k Δ

. Then, process

x_{k} = x (t_{k})

fulfils the difference equation:

x_{k + 1} = A x_{k} + B u_{k} + G w_{k},

(A48)

where

w_{k} \sim N (0, I_{n_{w}})

and

A = e^{A_{c} Δ}, B = \int_{0}^{Δ} e^{A_{c} τ} B_{c} d τ, D = G G^{T} = \int_{0}^{Δ} e^{A_{c} τ} G_{c} G_{c}^{T} e^{A_{c}^{T} τ} d τ .

(A49)

If

D > 0

, then

G

can be determined using the Cholesky factorization of

D

. In the general case, we use spectral decomposition

D = Q Λ Q^{T}

, and then

G = Q Λ^{0.5}

.

References

Goodwin, G.C.; Payne, R.L. Dynamic System Identification: Experiment Design and Data Analysis; Academic Press: New York, NY, USA, 1977. [Google Scholar]
Ljung, L. System Identification: Theory for the User, 2nd ed.; Prentice Hall PTR: Saddle River, NJ, USA, 1999. [Google Scholar]
Söderström, T.; Stoica, P. System Identification; Prentice-Hall International Series in Systems and Control Engineering; Prentice-Hall: Saddle River, NJ, USA, 1989. [Google Scholar]
Pronzato, L. Optimal experimental design and some related control problems. Automatica 2008, 44, 303–325. [Google Scholar] [CrossRef]
Huan, X.; Jagalur, J.; Marzouk, Y. Optimal experimental design: Formulations and computations. Acta Numer. 2024, 33, 715–840. [Google Scholar] [CrossRef]
Rainforth, T.; Foster, A.; Ivanova, D.R.; Bickford Smith, F. Modern Bayesian Experimental Design. Stat. Sci. 2024, 39, 100–114. [Google Scholar] [CrossRef]
Fedorov, V.V.; Hackl, P. Model-Oriented Design of Experiments; Springer: Berlin/Heidelberg, Germany, 1997; Volume 125. [Google Scholar]
Lindley, D.V. On a Measure of the Information Provided by an Experiment. Ann. Math. Stat. 1956, 27, 986–1005. [Google Scholar] [CrossRef]
Arimoto, S.; Kimura, H. Optimum input test signals for system identification—An information-theoretical approach. Int. J. Syst. Sci. 1971, 1, 279–290. [Google Scholar] [CrossRef]
Chaloner, K.; Verdinelli, I. Bayesian Experimental Design: A Review. Stat. Sci. 1995, 10, 273–304. [Google Scholar] [CrossRef]
Ryan, E.; Drovandi, C.; McGree, J.; Pettitt, A. A Review of Modern Computational Algorithms for Bayesian Optimal Design. Int. Stat. Rev. 2015, 84, 128–154. [Google Scholar] [CrossRef]
Kolchinsky, A.; Tracey, B.D. Estimating Mixture Entropy with Pairwise Distances. Entropy 2017, 19, 361. [Google Scholar] [CrossRef]
Kolchinsky, A.; Tracey, B.D. Estimating Mixture Entropy with Pairwise Distances. arXiv 2017, arXiv:1706.02419. [Google Scholar]
Altafini, C.; Ticozzi, F. Modeling and Control of Quantum Systems: An Introduction. IEEE Trans. Autom. Control 2012, 57, 1898–1917. [Google Scholar] [CrossRef]
Dong, D.; Petersen, I.R. Quantum control theory and applications: A survey. IET Control Theory Appl. 2010, 4, 2651–2671. [Google Scholar] [CrossRef]
Friedly, J.C. Dynamic Behavior of Processes; Prentice-Hall International Series in the Physical and Chemical Engineering Sciences; Prentice-Hall: Englewood Cliffs, NJ, USA, 1972; p. 590. [Google Scholar]
Lorenz, S.; Diederichs, E.; Telgmann, R.; Schütte, C. Discrimination of Dynamical System Models for Biological and Chemical Processes. J. Comput. Chem. 2007, 28, 1384–1399. [Google Scholar] [CrossRef]
Bania, P. Bayesian Input Design for Linear Dynamical Model Discrimination. Entropy 2019, 21, 351. [Google Scholar] [CrossRef] [PubMed]
Bania, P.; Baranowski, J. Field Kalman Filter and its approximation. In Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC), Las Vegas, NV, USA, 12–14 December 2016; pp. 2875–2880. [Google Scholar] [CrossRef]
Bania, P. Example for equivalence of dual and information based optimal control. Int. J. Control 2018, 92, 2339–2348. [Google Scholar] [CrossRef]
Baranowski, J.; Bania, P.; Prasad, I.; Cong, T. Bayesian fault detection and isolation using Field Kalman Filter. EURASIP J. Adv. Signal Process. 2017, 2017, 79. [Google Scholar] [CrossRef]
Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2006. [Google Scholar]
Lee, K.Y. New Information Inequalities with Applications to Statistics. Ph.D. Thesis, EECS Department, University of California, Berkeley, CA, USA, 2022. UC Berkeley Technical Report. [Google Scholar]
Van Trees, H.L. Detection, Estimation and Modulation Theory; Wiley: Hoboken, NJ, USA, 1968; Volume I. [Google Scholar]
Efroimovich, S.Y. Information Contained in a Sequence of Observations. Probl. Peredachi Informatsii 1979, 15, 24–39. [Google Scholar]
Van Trees, H.L.; Bell, K.L. (Eds.) Bayesian Bounds for Parameter Estimation and Nonlinear Filtering/Tracking; Wiley-IEEE Press: Hoboken, NJ, USA, 2007. [Google Scholar]
Jakowluk, W. Optimal Input Signal Design in Control Systems Identification; Oficyna Wydawnicza Politechniki Białostockiej: Białystok, Poland, 2024; Available online: https://pb.edu.pl/oficyna-wydawnicza/wp-content/uploads/sites/4/2024/06/Optimal-input-signal-design-in-control-systems-identification.pdf (accessed on 29 September 2025).
Jiménez-Martínez, R.; Kołodyński, J.; Troullinou, C.; Lucivero, V.G.; Kong, J.; Mitchell, M.W. Signal Tracking Beyond the Time Resolution of an Atomic Sensor by Kalman Filtering. Phys. Rev. Lett. 2018, 120, 040503. [Google Scholar] [CrossRef]
Troullinou, C.; Shah, V.; Lucivero, V.G.; Mitchell, M.W. Squeezed-Light Enhancement and Backaction Evasion in a High-Sensitivity Optically Pumped Magnetometer. Phys. Rev. Lett. 2021, 127, 193601. [Google Scholar] [CrossRef]
Bobrovsky, B.Z.; Mayer-Wolf, E.; Zakai, M. Some Classes of Global Cramér–Rao Bounds. Ann. Stat. 1987, 15, 1421–1438. [Google Scholar] [CrossRef]
Jeong, M.; Dytso, A.; Cardone, M. A Comprehensive Study on Ziv-Zakai Lower Bounds on the MMSE. IEEE Trans. Inf. Theory 2025, 71, 3214–3236. [Google Scholar] [CrossRef]
Huber, M.F.; Bailey, T.; Durrant-Whyte, H.; Hanebeck, U.D. On entropy approximation for Gaussian mixture random vectors. In Proceedings of the 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, Seoul, Republic of Korea, 20–22 August 2008; pp. 181–188. [Google Scholar] [CrossRef]
Nocedal, J.; Wright, S.J. Numerical Optimization, 2nd ed.; Springer Series in Operations Research and Financial Engineering; Springer: New York, NY, USA, 2006. [Google Scholar] [CrossRef]
Davis, P.J.; Rabinowitz, P. Methods of Numerical Integration, 2nd ed.; Academic Press: Orlando, FL, USA, 1984. [Google Scholar]
Stroud, A.H. Approximate Calculation of Multiple Integrals; Prentice Hall: Englewood Cliffs, NJ, USA, 1971. [Google Scholar]
Smolyak, S.A. Quadrature and interpolation formulas for tensor products of certain classes of functions. Sov. Math. Dokl. 1963, 4, 240–243. [Google Scholar]
Jansson, H. Experiment Design with Applications in Identification for Control; Royal Institute of Technology (KTH): Stockholm, Sweden, 2004. [Google Scholar]
Annergren, M.; Larsson, C.A. MOOSE2—A toolbox for least-costly application-oriented input design. SoftwareX 2016, 5, 96–100. [Google Scholar] [CrossRef]
Fabricant, A.; Novikova, I.; Bison, G. How to build a magnetometer with thermal atomic vapor: A tutorial. New J. Phys. 2023, 25, 025001. [Google Scholar] [CrossRef]
Budker, D.; Jackson Kimball, D.F. (Eds.) Optical Magnetometry; Cambridge University Press: Cambridge, UK, 2013. [Google Scholar] [CrossRef]
Breuer, H.P.; Laine, E.M.; Piilo, J. Measure for the Degree of Non-Markovian Behavior of Quantum Processes in Open Systems. Phys. Rev. Lett. 2009, 103, 210401. [Google Scholar] [CrossRef]
Shen, H.Z.; Shang, C.; Zhou, Y.H.; Yi, X.X. Unconventional single-photon blockade in non-Markovian systems. Phys. Rev. A 2018, 98, 023856. [Google Scholar] [CrossRef]
Magrini, L.; Rosenzweig, P.; Bach, C.; Deutschmann-Olek, A.; Hofer, S.G.; Hong, S.; Kiesel, N.; Kugi, A.; Aspelmeyer, M. Real-time optimal quantum control of mechanical motion at room temperature. Nature 2021, 595, 373–377. [Google Scholar] [CrossRef]
Amorós-Binefa, J.; Kołodyński, J. Noisy Atomic Magnetometry with Kalman Filtering and Measurement-Based Feedback. PRX Quantum 2025, 6, 030331. [Google Scholar] [CrossRef]
Särkkä, S. Bayesian Filtering and Smoothing; Institute of Mathematical Statistics Textbooks; Cambridge University Press: Cambridge, UK, 2013; Volume 3. [Google Scholar] [CrossRef]

Figure 1. Optimal input signals resulting from the maximization of the Bayesian criterion (44) (top) and of the averaged D-optimal criterion (89) (bottom), shown for several values of

ϱ

.

Figure 1. Optimal input signals resulting from the maximization of the Bayesian criterion (44) (top) and of the averaged D-optimal criterion (89) (bottom), shown for several values of

ϱ

.

Figure 2. Mean estimation errors of the parameters

θ_{1}

and

θ_{2}

, obtained using the MAP estimator (10), as functions of the maximum admissible signal norm

ϱ

. The results are based on a Monte Carlo simulation with 3000 repetitions. The constant (step) signal and the MOOSE signal were always assigned a norm equal to

ϱ

.

Figure 2. Mean estimation errors of the parameters

θ_{1}

and

θ_{2}

, obtained using the MAP estimator (10), as functions of the maximum admissible signal norm

ϱ

. The results are based on a Monte Carlo simulation with 3000 repetitions. The constant (step) signal and the MOOSE signal were always assigned a norm equal to

ϱ

.

Figure 3. Optimal input signals obtained by maximizing the Bayesian criterion (36) (top) and the averaged D-optimal criterion (107) (bottom) subject to the constraint (3) with

\tilde{U} = 0

.

Figure 3. Optimal input signals obtained by maximizing the Bayesian criterion (36) (top) and the averaged D-optimal criterion (107) (bottom) subject to the constraint (3) with

\tilde{U} = 0

.

Figure 4. Mean estimation errors of the parameter

θ

, obtained using the MAP estimator (10), as functions of the maximum admissible signal norm

ϱ

for different signals. The results are based on a Monte Carlo simulation with 6000 repetitions. Error bars show that the difference between the D-optimal and Bayesian methods is statistically significant.

Figure 4. Mean estimation errors of the parameter

θ

, obtained using the MAP estimator (10), as functions of the maximum admissible signal norm

ϱ

for different signals. The results are based on a Monte Carlo simulation with 6000 repetitions. Error bars show that the difference between the D-optimal and Bayesian methods is statistically significant.

Figure 5. Optimal input signals (left) and corresponding system outputs (right) obtained by maximizing the Bayesian criterion (36) (top), the averaged D-optimal criterion (107) (middle), and the spectral criterion (90) (bottom), subject to the constraint (3) with

\tilde{U} = 0

. Maximization of the spectral criterion (90) was performed using the MOOSE-2 solver evaluated at

θ = m_{θ}

. The norm of all signals is equal to 1, and the scale is consistent across all plots.

Figure 5. Optimal input signals (left) and corresponding system outputs (right) obtained by maximizing the Bayesian criterion (36) (top), the averaged D-optimal criterion (107) (middle), and the spectral criterion (90) (bottom), subject to the constraint (3) with

\tilde{U} = 0

. Maximization of the spectral criterion (90) was performed using the MOOSE-2 solver evaluated at

θ = m_{θ}

. The norm of all signals is equal to 1, and the scale is consistent across all plots.

Figure 6. Mean estimation errors of the Larmor frequency

f_{L} = \frac{ω_{L}}{2 π}

, obtained using the MAP estimator (10), as functions of the maximum admissible signal norm

ϱ

for different signals. The results are based on a Monte Carlo simulation with 3000 repetitions.

Figure 6. Mean estimation errors of the Larmor frequency

f_{L} = \frac{ω_{L}}{2 π}

, obtained using the MAP estimator (10), as functions of the maximum admissible signal norm

ϱ

for different signals. The results are based on a Monte Carlo simulation with 3000 repetitions.

Figure 7. The estimation error of the Larmor frequency

f_{L} = \frac{θ}{2 π T_{2}}

and the Information-Theoretic Bound (ITB) (20) as a function of the maximum admissible signal amplitude

u_{\max}

. The errors were computed using the MAP estimator (10). Both the errors and the ITB were calculated for two cases: (i) the optimal input signal and (ii) the harmonic input

u (t) = 0.5 u_{\max} (1 + \cos (m_{θ} t))

. Results are based on a Monte Carlo simulation with 2000 repetitions. The prior was Gaussian with a mean Larmor frequency

f_{L} = 10

kHz and with its initial uncertainty

σ_{f_{L}} = 10

Hz.

Figure 7. The estimation error of the Larmor frequency

f_{L} = \frac{θ}{2 π T_{2}}

and the Information-Theoretic Bound (ITB) (20) as a function of the maximum admissible signal amplitude

u_{\max}

. The errors were computed using the MAP estimator (10). Both the errors and the ITB were calculated for two cases: (i) the optimal input signal and (ii) the harmonic input

u (t) = 0.5 u_{\max} (1 + \cos (m_{θ} t))

. Results are based on a Monte Carlo simulation with 2000 repetitions. The prior was Gaussian with a mean Larmor frequency

f_{L} = 10

kHz and with its initial uncertainty

σ_{f_{L}} = 10

Hz.

Figure 8. The optimal signals with small, medium, and large amplitudes and the corresponding system outputs. The figure in the lower-right panel also shows the system output for the harmonic signal

u (t) = 0.5 u_{m a x} (1 + \cos (m_{θ} t))

. The prior was Gaussian with a mean Larmor frequency

f_{L} = 10

kHz and with its initial uncertainty

σ_{f_{L}} = 10

Hz.

Figure 8. The optimal signals with small, medium, and large amplitudes and the corresponding system outputs. The figure in the lower-right panel also shows the system output for the harmonic signal

u (t) = 0.5 u_{m a x} (1 + \cos (m_{θ} t))

. The prior was Gaussian with a mean Larmor frequency

f_{L} = 10

kHz and with its initial uncertainty

σ_{f_{L}} = 10

Hz.

Figure 9. Optimal input signals of small, medium, and large amplitudes with corresponding system outputs for a 5 ms experiment. The prior was Gaussian with a mean Larmor frequency

f_{L} = 10

kHz, and with its initial uncertainty

σ_{f_{L}} = 10

Hz.

Figure 9. Optimal input signals of small, medium, and large amplitudes with corresponding system outputs for a 5 ms experiment. The prior was Gaussian with a mean Larmor frequency

f_{L} = 10

kHz, and with its initial uncertainty

σ_{f_{L}} = 10

Hz.

Table 1. Typical parameters.

Parameter	Abbreviation	Typical Value
Number of atoms	$N_{A}$	$10^{12}$
Spin number	F	1
Larmor frequencies	$γ_{e} B$	$2 π [- 50, 50]$ kHz
Parameter	$γ_{0}$	600 Hz
Parameter	$α$	550 Hz
Typical relaxation time	$T_{2}$	0.87 ms
Typical relaxation rate	$γ$	1149 Hz
Pumping rate	P	0–200 kHz
Measurement noise level	$σ_{2}$	$9.6755 \times 10^{6}$
Sampling time	$Δ$	5 $μ$ s

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bania, P.; Wójcik, A. An Approximate Bayesian Approach to Optimal Input Signal Design for System Identification. Entropy 2025, 27, 1041. https://doi.org/10.3390/e27101041

AMA Style

Bania P, Wójcik A. An Approximate Bayesian Approach to Optimal Input Signal Design for System Identification. Entropy. 2025; 27(10):1041. https://doi.org/10.3390/e27101041

Chicago/Turabian Style

Bania, Piotr, and Anna Wójcik. 2025. "An Approximate Bayesian Approach to Optimal Input Signal Design for System Identification" Entropy 27, no. 10: 1041. https://doi.org/10.3390/e27101041

APA Style

Bania, P., & Wójcik, A. (2025). An Approximate Bayesian Approach to Optimal Input Signal Design for System Identification. Entropy, 27(10), 1041. https://doi.org/10.3390/e27101041

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

An Approximate Bayesian Approach to Optimal Input Signal Design for System Identification

Abstract

1. Introduction

2. Formulation of the Problem

3. Approximate Solutions

3.1. Finite Parameter Space

3.2. Infinite Parameter Space

4. Bayesian Input Signal Design in Quasi-Linear Control Systems

5. Comparison with Classical Methods of Input Signal Design

6. Examples of Input Signal Design

6.1. Elementary Example

6.2. Example with a Non-Gaussian Prior Distribution

6.3. Optimal Input Design for the Atomic Sensor Model

6.4. Bayesian Input Signal Design for the Pump Laser in an Optically Pumped Magnetometer

7. Discussion and Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

Appendix A. Proofs

Appendix B. An Example of the Gap Between the ITB and BCRB

Appendix C. Discretization of Linear SDE

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI