
An Approximate Bayesian Approach to Optimal Input Signal Design for System Identification

Department of Automatic Control and Robotics, Faculty of Electrical Engineering, Automatics, Computer Science, and Biomedical Engineering, AGH University of Krakow, al. A. Mickiewicza 30, 30-059 Krakow, Poland
*
Author to whom correspondence should be addressed.
Entropy 2025, 27(10), 1041; https://doi.org/10.3390/e27101041
Submission received: 11 September 2025 / Revised: 2 October 2025 / Accepted: 3 October 2025 / Published: 7 October 2025
(This article belongs to the Section Signal and Data Analysis)

Abstract

The design of informatively rich input signals is essential for accurate system identification, yet classical Fisher-information-based methods are inherently local and often inadequate in the presence of significant model uncertainty and non-linearity. This paper develops a Bayesian approach that uses the mutual information (MI) between observations and parameters as the utility function. To address the computational intractability of the MI, we maximize a tractable MI lower bound. The method is then applied to the design of an input signal for the identification of quasi-linear stochastic dynamical systems. Evaluating the MI lower bound requires the inversion of large covariance matrices whose dimensions scale with the number of data points N. To overcome this problem, an algorithm that reduces the dimension of the matrices to be inverted by a factor of N is developed, making the approach feasible for long experiments. The proposed Bayesian method is compared with the average D-optimal design method, a semi-Bayesian approach, and its advantages are demonstrated. The effectiveness of the proposed method is further illustrated through four examples, including atomic sensor models, where input signals that generate a large amount of MI are especially important for reducing the estimation error.

1. Introduction

The design of informative input signals is the cornerstone of modern system identification. Without properly chosen excitation, even advanced estimation algorithms may fail to provide accurate parameter estimates, leading to unreliable prediction and control. Classical references such as [1,2,3] and the reviews by [4,5,6] emphasize that identification is not only a matter of statistical estimation but also of experimental design, where the input signal determines the achievable information content. Optimal experimental design (OED) methods therefore play a crucial role in practical applications. Traditionally, OED has relied on the Fisher information matrix (FIM), with criteria such as D or A optimality widely used due to their computational efficiency and asymptotic guarantees [4,7]. However, FIM-based approaches are inherently local, relying on linearization and asymptotic normality. They may thus be fragile in scenarios with large model uncertainty or strongly non-linear stochastic dynamics.
A natural alternative is the Bayesian approach, which evaluates an experiment through its expected information gain, typically quantified by the mutual information between model parameters and observations [5,6,8,9,10,11]. Bayesian design has several advantages: it is globally valid over the parameter space, it naturally incorporates prior information, and it is applicable to non-linear, stochastic models. Its main drawback is computational intractability since mutual information requires high-dimensional integration over parameters and observations. Since the general case is challenging and difficult to solve, in this article, we focus on models that can be represented in the form
Y = F ( θ , U ) + Z ,
where θ ∈ {θ_1, …, θ_r} is a parameter with the prior distribution P(θ = θ_j) = p_{0,j}. The noise Z is conditionally normal, that is, p(Z|θ) = N(Z, 0, S(θ, U)), and U is a design variable. For this class of models, the density of the observations Y is a finite Gaussian mixture of the form p(Y) = \sum_{j=1}^{r} p_{0,j} N(Y, F(θ_j, U), S(θ_j, U)). Within these Gaussian mixtures, the mutual information between Y and θ can be estimated from below using the effective and tractable pairwise-distance-based lower bound given by Kolchinsky and Tracey [12,13]. We maximize this bound to obtain an approximately optimal design parameter U and then generalize the method to a parameter space of continuum cardinality. In particular, we show how to treat the Gaussian prior p_0(θ) = N(θ, m_θ, S_θ) and prior distributions with compact support.
In this work, we focus on quasi-linear systems, namely stochastic dynamical systems that are linear in the state variables but non-linear in the control variables. Such systems occur ubiquitously in science and engineering. In quantum mechanics, Hamiltonians and Lindblad dissipators depend on external control fields such as laser intensities, magnetic fields, or gate voltages, leading to a non-linear dependence on the control [14,15]. In chemical processes, flow rates directly determine reaction speeds in continuous stirred tank reactors [16,17]. In thermal plants, convection coefficients scale non-linearly with flow, giving rise to quasi-linear heat transfer dynamics [16]. This broad applicability makes quasi-linear systems a natural and important class for advanced input design. However, there is a notable lack of Bayesian design methods and software tools tailored to this class of systems. Therefore, in this paper, we address this gap by developing such a method. Specifically, we show that a finite sequence of observations generated by a quasi-linear system can always be expressed in the form of the model (1), and we provide an effective algorithm for calculating the lower bound on mutual information.
The study reported here constitutes a substantial and far-reaching extension of the initial results presented in [18], as well as related research reported in [19,20,21]. The article’s main contributions can be summarized as follows. We first introduce an Information-Theoretic Lower Bound (ITB) on the estimation error of any estimator [22,23] and briefly discuss its relation to the Bayesian Cramér–Rao Bound (BCRB) [23,24,25,26]. We conclude that maximizing mutual information is superior to maximizing Bayesian or classical Fisher information, which is consistent with the arguments presented in [5,8,10,11]. Building on this result, we introduce a novel Bayesian design method for model (1). To address the intractability and computational complexity of direct mutual information evaluation, we discretize the parameter space and maximize the Kolchinsky–Tracey lower bound [12,13] and subsequently extend this approach to a parameter space of continuum cardinality. We then focus on the application to linear and quasi-linear system identification. Since the information-theoretic bound requires the inversion of large covariance matrices of the observations, whose dimensions grow linearly with the number of data points N (with N 10 3 10 6 in applications), direct inversion is computationally problematic. To overcome this challenge, we develop an algorithm that reduces the dimension of the matrices needed for inversion by a factor of N, thus making the approach feasible for long experiments. The proposed Bayesian method is compared with the average D-optimal design [1,4,27], a semi-Bayesian method, and its advantages are demonstrated. The effectiveness of the method is further illustrated by four examples. The first two, intentionally elementary, highlight the effectiveness of our approach. The third and fourth examples are drawn from atomic sensor models, a domain where optimal input design is particularly critical. We analyze a controlled harmonic oscillator with stochastic disturbances as a paradigmatic atomic sensor model [28] and a complex magnetometer model with non-linear dependence of the system matrices on the input [29]. In the latter case, we provide a simplified model, derive the optimal input, and demonstrate that it significantly outperforms the harmonic signal, which might otherwise be presumed to be optimal. Moreover, we show that the estimation error of the MAP estimator achieves the theoretical lower bound.
This article is organized as follows. Section 2 formulates the problem. Section 3 develops the approximate Bayesian solution for finite and infinite parameter spaces. Section 4 applies the method to quasi-linear systems. Section 5 compares the approach with classical design methods. Section 6 presents examples. Section 7 provides discussion and conclusions.

2. Formulation of the Problem

Let us consider a family of models
Y = F(\theta, U) + Z,    (2)
where Y, Z ∈ R^{n_Y}, U ∈ R^{n_U}, and θ ∈ Θ ⊆ R^{n_θ}. The set Θ will be called the parameter space. The parameter θ is unknown. The prior distribution of θ is denoted by p_0. The random variable Z is conditionally normal, i.e., p(Z|θ) = N(Z, 0, S(θ, U)), where S(θ, U) ∈ S_+(n_Y) for all θ ∈ Θ, U ∈ R^{n_U}. The functions F and S are smooth. The variable U is called the design parameter or, in the context of dynamical systems, the input signal. The set of admissible signals is given by
U_{ad} = \{ U \in R^{n_U};\ |U - \tilde{U}| \le \varrho \},    (3)
where Ũ is a given vector and ϱ is the maximal norm of the signal. We will also consider an alternative definition of U_ad, useful in some applications:
U_{ad} = \{ U \in R^{n_U};\ U_{min} \le U \le U_{max} \},    (4)
where U_min, U_max ∈ R^{n_U} are fixed vectors. Under these assumptions and after applying the Bayes rule, we obtain the likelihood, evidence, and posterior distribution of θ:
p(Y \mid \theta, U) = N(Y, F(\theta, U), S(\theta, U)),    (5)
p(Y \mid U) = \int p_0(\theta)\, N(Y, F(\theta, U), S(\theta, U))\, d\theta,    (6)
p(\theta \mid Y, U) = \frac{p_0(\theta)\, p(Y \mid \theta, U)}{p(Y \mid U)}.    (7)
The Minimum Mean Squared Error (MMSE) estimator of θ is then given by
\hat{\theta}(Y, U) = \int \theta\, p(\theta \mid Y, U)\, d\theta.    (8)
To avoid the difficulties involved in calculating the integral (8), the Maximum a Posteriori (MAP) estimator is typically used instead of the MMSE estimator. Taking the negative logarithm of both sides of (7) and omitting the terms independent of θ, we get the following:
L(\theta, Y, U) = \tfrac{1}{2}\, |Y - F(\theta, U)|^2_{S^{-1}(\theta, U)} + \tfrac{1}{2} \ln |S(\theta, U)| - \ln p_0(\theta).    (9)
Thus, the MAP estimator of θ is given by
\hat{\theta}(Y, U) = \arg\min_{\theta \in \Theta} L(\theta, Y, U).    (10)
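For concreteness, the sketch below (not part of the original study; the model functions F, S and the log-prior are user-supplied placeholders) shows how the MAP estimate (10) can be obtained by numerically minimizing the negative log-posterior (9).

```python
# Sketch only: MAP estimation by minimizing the negative log-posterior (9).
# F, S, and log_prior are user-supplied placeholders for the model at hand.
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(theta, Y, U, F, S, log_prior):
    """L(theta, Y, U) = 0.5*|Y - F|^2_{S^-1} + 0.5*ln|S| - ln p0(theta)."""
    theta = np.atleast_1d(theta)
    r = Y - F(theta, U)
    Smat = S(theta, U)
    quad = r @ np.linalg.solve(Smat, r)
    logdet = np.linalg.slogdet(Smat)[1]
    return 0.5 * (quad + logdet) - log_prior(theta)

def map_estimate(Y, U, F, S, log_prior, theta0):
    """MAP estimator (10), started e.g. from the prior mean theta0."""
    res = minimize(neg_log_posterior, theta0, args=(Y, U, F, S, log_prior),
                   method="Nelder-Mead")
    return res.x
```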
Estimators (8) or (10) may be biased, so the Cramér–Rao Bound (CRB) cannot be applied directly to them. However, a Bayesian version of the CRB exists and can be used to bound the error of any estimator, including biased ones [23,24,25,26]. Let us introduce the Bayesian information (BI):
J_B = E_{p(\theta, Y \mid U)}\big[ \nabla_\theta L(\theta, Y, U)\, (\nabla_\theta L(\theta, Y, U))^T \big] = J_D + J_P,    (11)
where
J_P = E_{p_0(\theta)}\big[ \nabla_\theta \ln p_0(\theta)\, (\nabla_\theta \ln p_0(\theta))^T \big],    (12)
is the fraction of BI associated with a prior, and
J_D(U) = E_{p(\theta, Y \mid U)}\big[ \nabla_\theta \ln p(Y \mid \theta, U)\, (\nabla_\theta \ln p(Y \mid \theta, U))^T \big],    (13)
is the part of the BI provided by the observations Y. The matrix J_D is the Bayesian equivalent of the Fisher information matrix. The Fisher information matrix can be recovered from (13) by assuming p_0(θ) = δ(θ − θ_0), where θ_0 is the true value of the parameter. Let θ̂(Y, U) be any estimator of θ. Assuming a sufficiently regular prior, it can be proven that
E\,|\theta - \hat{\theta}(Y, U)|^2 \;\ge\; n_\theta\, |J_P + J_D(U)|^{-1/n_\theta},    (14)
which is usually known as the Van Trees inequality or Bayesian Cramér–Rao Bound (BCRB) [23,24,30] [inequality (2.9), p. 17]. Formulas (11)–(13) are well defined only under rather restrictive assumptions. In particular, the joint density p(Y, θ) must be differentiable with respect to θ, and it must satisfy the regularity condition \int \nabla_\theta\, p(Y, \theta)\, dY = 0. Moreover, the prior and likelihood distributions must guarantee the existence of the expectations in (12) and (13). This excludes the uniform and many other useful prior distributions and considerably limits the applicability of inequality (14) (see [23] for details). Beyond inequality (14), a large class of Bayesian bounds exists, reported in [26]. Many of these bounds can serve as a utility function. Probably one of the best design criteria is the Ziv–Zakai lower bound [31]. However, to compute this estimate, an additional, and rather complex, optimization sub-problem must be solved, as shown in [31]. Therefore, due to the high computational complexity of the multivariate Ziv–Zakai bound, we will not consider it here. Given the application-oriented focus of this article and following the arguments presented in [5,6], we conclude that the entropy-based lower bound [22] [p. 255] [23] [Section 2.2, pp. 16–17] is a reasonable optimality criterion and provides slightly tighter estimates than the BCRB (cf. [31] [Section V.D]). To proceed, let us define the entropies of Y and θ and the corresponding conditional entropies:
H_Y(U) = -E\big(\ln p(Y \mid U)\big),    (15)
H_\theta = -E\big(\ln p_0(\theta)\big),    (16)
H_{Y \mid \theta}(U) = -E\big(\ln p(Y \mid \theta, U)\big) = \tfrac{1}{2} \int p_0(\theta) \ln\big[(2\pi e)^{n_Y} |S(\theta, U)|\big]\, d\theta,    (17)
H_{\theta \mid Y}(U) = -E\big(\ln p(\theta \mid Y, U)\big).    (18)
The mutual information (MI) between θ and Y is defined as
I_{\theta;Y}(U) = H_\theta - H_{\theta \mid Y}(U) = H_Y(U) - H_{Y \mid \theta}(U).    (19)
The following theorem establishes the ultimate limit of the estimation error expressed in terms of mutual information, demonstrating that the Bayesian Cramér–Rao Bound (BCRB) does not constitute a fundamental limit.
Theorem 1.
Let θ ^ ( Y , U ) be any estimator of θ . Then, the following inequalities hold:
E\,|\theta - \hat{\theta}(Y, U)|^2 \;\ge\; n_\theta\, (2\pi e)^{-1}\, e^{\,2 n_\theta^{-1} \left(H_\theta - I_{\theta;Y}(U)\right)} \;\ge\; n_\theta\, |J_P + J_D(U)|^{-1/n_\theta}.    (20)
The proof is given in Appendix A. The first of the inequalities (20) will be called the Information-Theoretic Lower Bound (ITB). The last part of (20) is known as the Efroimovich inequality [23,25] [inequality (2.7), Ch. 2.2, p. 16].
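As a quick numerical illustration of how the ITB is used as a utility (an assumption-free helper, not taken from the paper), the sketch below evaluates the first bound in (20) for given values of the prior entropy and the mutual information, both in nats.

```python
# Sketch only: the error floor implied by the first inequality in (20),
# given the prior entropy H_theta and the mutual information I (both in nats).
import numpy as np

def itb_error_floor(H_theta, I, n_theta):
    return n_theta / (2.0 * np.pi * np.e) * np.exp(2.0 / n_theta * (H_theta - I))

# For a scalar Gaussian prior N(0, 1), H_theta = 0.5*ln(2*pi*e); I = 2 nats
# then gives a mean-squared-error floor of exp(-4), about 0.018.
print(itb_error_floor(0.5 * np.log(2.0 * np.pi * np.e), 2.0, 1))
```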
To determine the optimal signal, one may maximize the MI, the determinant of J B , or the determinant of FIM. The latter corresponds to the classical design methods outlined in Section 5. However, the right-hand side of (14) generally underestimates the estimation error, and large values of J B do not necessarily guarantee a small error. We illustrate this problem in Appendix B. In contrast, the ITB (20), which plays a central role in our subsequent analysis, shows that maximizing I θ ; Y ( U ) is essential for reducing the estimation error and provides a more fundamental criterion than maximizing either the Bayesian or classical Fisher information. In particular, in the context of optimal experimental design and input signal design for system identification, maximizing the mutual information between the parameters and the observations constitutes the most principled optimality criterion, as it directly quantifies the amount of knowledge gained about the parameters [5]. Accordingly, we define the optimal signal as the solution of the following optimization problem:
U^{\star} = \arg\max_{U \in U_{ad}} I_{\theta;Y}(U).    (21)
Since I_{θ;Y} is smooth and U_ad is compact, (21) is well defined. After solving this task, the MMSE, MAP, or any other estimator can be used to determine θ.
Computing mutual information (MI) or its lower bound remains a significant challenge. The analyses presented in the literature [5,6,10,11] show that this can be undertaken in three main ways: (1) using Monte Carlo or nested Monte Carlo (MC) simulations [5] [Section 3.1] [6]; (2) applying variational lower bound (VLB) estimates of the MI [5] [Section 3.3.1] [6]; or (3) utilizing existing, easily computable estimates of conditional entropy or the MI. Since we aim to numerically maximize the MI, which also depends on the design parameter U, the procedure for calculating MI will be called by the optimization solver millions of times and must therefore be sufficiently fast. Consequently, although MC methods provide good estimates of MI, they are of limited use here. The VLB methods require simultaneous optimization of the variational distribution with respect to its parameters and the signal U [5] [Sections 3.3.1 and 4.3.4] [6]. In addition, stochastic simulations are also used to compute the expected values in the VLB. As the goal of this article is to develop a simple design method that does not require hours of computation, we focus on the third option and use the existing, easily computable lower bounds of entropy or MI provided in [12] or [32].

3. Approximate Solutions

The optimization problem (21) becomes considerably more tractable when the parameter space Θ is finite. Consequently, we first derive an approximate solution for a finite set of parameters and subsequently extend this result to obtain an approximate solution for the case where Θ is an uncountable subset of R n θ .

3.1. Finite Parameter Space

Let Θ = {θ_1, …, θ_r}, θ_i ∈ R^{n_θ}, θ_i ≠ θ_j for i ≠ j, and assume that p_0 is a discrete distribution of the form
p_0(\theta) = \sum_{j=1}^{r} p_{0,j}\, \delta(\theta - \theta_j),    (22)
where p 0 , j = P ( θ = θ j ) . Then, on the basis of (6) and (22), the density of Y becomes a Gaussian mixture:
p(Y \mid U) = \sum_{j=1}^{r} p_{0,j}\, N(Y, F(\theta_j, U), S(\theta_j, U)).    (23)
The application of the Bayes rule gives the posterior
p(\theta_j \mid Y, U) = \frac{p_{0,j}\, N(Y, F(\theta_j, U), S(\theta_j, U))}{p(Y \mid U)}.    (24)
The discrete counterpart of Formulas (15)–(19) takes the form
H_Y(U) = -\int p(Y \mid U) \ln p(Y \mid U)\, dY,    (25)
H_\theta = -\sum_{j=1}^{r} p_{0,j} \ln p_{0,j},    (26)
H_{Y \mid \theta}(U) = \tfrac{1}{2} \sum_{j=1}^{r} p_{0,j} \ln\big[(2\pi e)^{n_Y} |S(\theta_j, U)|\big],    (27)
H_{\theta \mid Y}(U) = -\int p(Y \mid U) \sum_{j=1}^{r} p(\theta_j \mid Y, U) \ln p(\theta_j \mid Y, U)\, dY,    (28)
I_{\theta;Y}(U) = H_\theta - H_{\theta \mid Y}(U) = H_Y(U) - H_{Y \mid \theta}(U).    (29)
Direct computation of the mutual information (29) remains difficult and, in many cases, intractable. Hence, our central idea is to overcome this difficulty by replacing the mutual information (29) with a computationally tractable and non-trivial lower bound. In particular, we observe that (23) is a finite Gaussian mixture. For such mixtures, one of the most effective lower bounds on I_{θ;Y} is the inequality introduced in [12].
Lemma 1
(Information bounds [12]). For the Gaussian mixture (23) with p 0 , j = P ( θ = θ j ) , the following inequality holds:
I_l(U) \le I_{\theta;Y}(U) \le H_\theta,    (30)
where
I_l(U) = -\sum_{i=1}^{r} p_{0,i} \ln \sum_{j=1}^{r} p_{0,j}\, e^{-d_{i,j}(U)},    (31)
d_{i,j}(U) = \tfrac{1}{8}\, \Delta_{i,j}^T \left[\tfrac{1}{2}(S_i + S_j)\right]^{-1} \Delta_{i,j} + \tfrac{1}{2} \ln\left|\tfrac{1}{2}(S_i + S_j)\right| - \tfrac{1}{4} \ln\big(|S_i|\,|S_j|\big),    (32)
\Delta_{i,j} = F(\theta_i, U) - F(\theta_j, U), \qquad S_i = S(\theta_i, U), \qquad S_j = S(\theta_j, U).    (33)
Detailed proof is given in [13] [Section IIIb, inequality (11), and Section IV, Formula (15) with α = 0.5 ] and also in [12]. Now, the approximate solution of (21) is given by
U^{\star} = \arg\max_{U \in U_{ad}} I_l(U).    (34)
Since U a d is compact and I l is smooth and bounded, the solution of (34) exists. We also note that in the case of two alternatives, that is, when r = 2 in (31), we get
e^{-I_l(U)} = \big(p_{0,1} + p_{0,2}\, e^{-d_{1,2}(U)}\big)^{p_{0,1}}\, \big(p_{0,1}\, e^{-d_{1,2}(U)} + p_{0,2}\big)^{p_{0,2}}.    (35)
Accordingly, the optimal signal in this case arises as the solution of a somewhat simplified optimization problem:
\max_{U \in U_{ad}} d_{1,2}(U).    (36)
If the function F in (2) is affine with respect to U and the covariance S does not depend on U, then it follows from Lemma 1 that d_{1,2} is a positive (semi-)definite quadratic form with respect to U. For constraints (4), we thus obtain a convex quadratic programming problem. In the case of constraints (3), one needs to find the minimum of −d_{1,2} on a closed ball in R^{n_U}. This is also a convex problem, and it can be reduced to finding zeros of a scalar function [33] [Theorems 4.1, p. 70 and Section 4.3]. Furthermore, if F(θ_i, U) = F_i U, i = 1, 2, and the constraints are defined by (3), then the solution of (36) is the eigenvector of the matrix Q = (F_1 − F_2)^T (S_1 + S_2)^{−1} (F_1 − F_2) corresponding to its largest eigenvalue (see [18] [Section 2.1] for details).
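A minimal sketch of these computations is given below (illustrative placeholders only): the pairwise distances (32), the lower bound (31), and, for the special case F(θ_i, U) = F_i U with U-independent covariances and the ball constraint (3) with Ũ = 0, the eigenvector solution of (36) mentioned above.

```python
# Sketch only (illustrative matrices): the pairwise distances (32), the lower
# bound (31), and the closed-form maximizer of (36) for F(theta_i, U) = F_i U
# with U-independent covariances under the ball constraint (3) with U~ = 0.
import numpy as np

def bhattacharyya_d(mu_i, mu_j, S_i, S_j):
    """d_{i,j} of Eq. (32)."""
    Sm = 0.5 * (S_i + S_j)
    diff = mu_i - mu_j
    quad = 0.125 * diff @ np.linalg.solve(Sm, diff)
    logs = 0.5 * np.linalg.slogdet(Sm)[1] \
           - 0.25 * (np.linalg.slogdet(S_i)[1] + np.linalg.slogdet(S_j)[1])
    return quad + logs

def mi_lower_bound(p0, means, covs):
    """I_l(U) of Eq. (31) for given mixture weights, means and covariances."""
    r = len(p0)
    d = np.array([[bhattacharyya_d(means[i], means[j], covs[i], covs[j])
                   for j in range(r)] for i in range(r)])
    return -np.sum(p0 * np.log(np.exp(-d) @ p0))

def optimal_U_two_linear(F1, F2, S1, S2, rho):
    """For r = 2 and F(theta_i, U) = F_i U: the eigenvector of
    Q = (F1 - F2)^T (S1 + S2)^{-1} (F1 - F2) with the largest eigenvalue,
    scaled to the admissible norm rho."""
    dF = F1 - F2
    Q = dF.T @ np.linalg.solve(S1 + S2, dF)
    _, V = np.linalg.eigh(Q)
    return rho * V[:, -1]
```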

3.2. Infinite Parameter Space

Let us assume that Θ = R n θ and consider the Gaussian prior
p 0 ( θ ) = N ( θ , m θ , S θ ) , S θ > 0 .
Then, the integral in (6) can be approximated with a finite Gaussian mixture
p(Y \mid U) = \int p_0(\theta)\, N(Y, F(\theta, U), S(\theta, U))\, d\theta \approx \sum_{j=1}^{N_a} p_{0,j}\, N(Y, F(\theta_j, U), S(\theta_j, U)),    (38)
where p_{0,j} ≥ 0 and \sum_{j=1}^{N_a} p_{0,j} = 1. The weights p_{0,j} and the nodes θ_j in (38) can be calculated by using the multidimensional Gauss–Hermite quadrature rule or any other suitable method. The Gauss–Hermite quadrature of order p is exact for polynomials of degree at most 2p − 1. The approximation error of the integral (38) using the Gauss–Hermite quadrature of order p depends on the 2p-th derivatives of the functions F and S. In the single-parameter case, with the Gaussian prior N(θ, m_θ, σ_θ), the error estimate is given by the formula
e \le \sigma_\theta^{4p}\, \frac{C}{p!}\, \sup_{\theta, U, Y} \left| \frac{d^{2p}}{d\theta^{2p}}\, N(Y, F(\theta, U), S(\theta, U)) \right|,    (39)
where C is a constant. The error tends to zero as p → ∞ or σ_θ → 0. Therefore, the Gauss–Hermite approximation of the integral (38) is especially useful when the prior is narrow (small σ_θ) or when the integrand, in the neighborhood of the point m_θ, can be well approximated using low-degree polynomials. To illustrate the method, we will show only a very simple second-order Gaussian quadrature rule with 2n_θ points.
Lemma 2.
The approximate value of the integral J(f) = \int N(\theta, m_\theta, S_\theta)\, f(\theta)\, d\theta is given by
J(f) \approx \frac{1}{2 n_\theta} \sum_{j=1}^{2 n_\theta} f(\theta_j),    (40)
where
\theta_{2i-1} = m_\theta - S_\theta^{0.5} \sqrt{n_\theta}\; e_i, \qquad \theta_{2i} = m_\theta + S_\theta^{0.5} \sqrt{n_\theta}\; e_i, \qquad i = 1, \dots, n_\theta,    (41)
and e_i is the i-th basis vector in R^{n_θ}. If f(θ) = ½ θ^T A θ + b^T θ + c, then equality holds in (40).
Proof. 
Direct calculation.  □
An analogous method can be used for prior distributions defined on compact subsets of R^{n_θ} (e.g., an n-dimensional hypercube), but the formulas for the nodes and weights in (38) will then change. For example, if θ is a scalar parameter and the prior distribution is uniform, that is, p_0(θ) = U[a, b], then, using a second-order Gauss–Legendre quadrature, the approximate value of the integral \int_a^b p_0(\theta) f(\theta)\, d\theta is computed using the formula
\int_a^b p_0(\theta)\, f(\theta)\, d\theta \approx p_{0,1} f(\theta_1) + p_{0,2} f(\theta_2),    (42)
where p_{0,1} = p_{0,2} = 0.5 and
\theta_1 = \frac{1}{2}\left(a + b - \frac{b-a}{\sqrt{3}}\right), \qquad \theta_2 = \frac{1}{2}\left(a + b + \frac{b-a}{\sqrt{3}}\right).    (43)
Formula (42) is exact for polynomials of degree at most 3. The error estimate is analogous to (39) and tends to zero as b − a → 0. More general multidimensional formulas, integration methods, and error estimates are given in [34,35,36].
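The following sketch generates the nodes and weights used in (38): the 2n_θ-point rule of Lemma 2 for a Gaussian prior and the two-point Gauss–Legendre rule (43) for a scalar uniform prior (a hypothetical helper; any symmetric square root of S_θ may be used in Lemma 2).

```python
# Sketch only: nodes and weights for the mixture approximation (38).
import numpy as np
from scipy.linalg import sqrtm

def gaussian_second_order_nodes(m_theta, S_theta):
    """2*n_theta nodes m_theta +/- sqrt(n_theta)*S_theta^{1/2} e_i of (41),
    with equal weights 1/(2*n_theta)."""
    n = len(m_theta)
    root = np.real(sqrtm(S_theta))
    nodes = []
    for i in range(n):
        step = np.sqrt(n) * root[:, i]
        nodes += [m_theta - step, m_theta + step]
    return np.array(nodes), np.full(2 * n, 0.5 / n)

def uniform_gauss_legendre_nodes(a, b):
    """Two-point Gauss-Legendre nodes (43) for a scalar prior U[a, b]."""
    h = (b - a) / np.sqrt(3.0)
    return np.array([0.5 * (a + b - h), 0.5 * (a + b + h)]), np.array([0.5, 0.5])

# For a = 0.05, b = 2 (Section 6.2) the nodes are approximately 0.462 and 1.588.
```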
The application of Lemma 2 to (38) gives N_a = 2n_θ, p_{0,j} = (2n_θ)^{−1}. Now, since (38) is approximated by a Gaussian mixture, the results of Section 3.1 can be used. Based on Equations (31)–(33) and (38) and Lemma 1, the lower bound on the mutual information takes the form
I_l(U) = -\frac{1}{2 n_\theta} \sum_{i=1}^{2 n_\theta} \ln \frac{1}{2 n_\theta} \sum_{j=1}^{2 n_\theta} e^{-d_{i,j}(U)},    (44)
where d i , j and θ j are given by Equations (31)–(33) and (41) or (43). The approximate solution of (21) can be found by maximizing (44) with constraints (3) or (4).

4. Bayesian Input Signal Design in Quasi-Linear Control Systems

Consider the family of quasi-linear systems
x k + 1 = A ( θ , u k ) x k + B ( θ , u k ) + G ( θ , u k ) w k ,
y k = C x k + v k ,
where k = 0, 1, …, N, N ≥ 1, x_k ∈ R^n, y_k ∈ R^{n_y}, w_k ∈ R^{n_w}, v_k ∈ R^{n_y}, w_k ∼ N(0, I), v_k ∼ N(0, S_v), S_v > 0. The variables x_0, w_0, …, w_{N−1}, v_0, …, v_N are mutually independent. The initial state x_0 is conditionally normal, i.e., p(x_0|θ) = N(x_0, m_0(θ), S_0(θ)), where m_0, S_0 are smooth and S_0(θ) > 0 for all θ ∈ Θ. The joint prior distribution of the initial state x_0 and the parameter θ is given by p_0(x_0, θ) = p_0(θ) N(x_0, m_0(θ), S_0(θ)). Let us define A_k = A(θ, u_k), B_k = B(θ, u_k), G_k = G(θ, u_k). Then, the solution of (45) has the form
x_0 = I x_0,    (47)
x_1 = A_0 x_0 + B_0 + G_0 w_0,    (48)
x_2 = A_1 x_1 + B_1 + G_1 w_1 = A_1 A_0 x_0 + A_1 B_0 + B_1 + A_1 G_0 w_0 + G_1 w_1,    (49)
⋮
x_N = \Phi(N, 0)\, x_0 + \sum_{j=0}^{N-1} \Phi(N, j+1)\, B_j + \sum_{j=0}^{N-1} \Phi(N, j+1)\, G_j w_j,    (50)
where \Phi(n, n) = I and
\Phi(n, j) = \prod_{i=1}^{n-j} A_{n-i}, \qquad j < n.    (51)
Now, if we denote X = col(x_0, …, x_N), Y = col(y_0, …, y_N), U = col(u_0, …, u_{N−1}), W = col(w_0, …, w_{N−1}), V = col(v_0, …, v_N), we can rewrite Equations (46)–(51) in matrix-vector form:
X = \mathcal{A}(\theta, U)\, x_0 + \mathcal{B}(\theta, U) + \mathcal{G}(\theta, U)\, W,    (53)
Y = \mathcal{C} X + V,    (54)
where the matrices \mathcal{A}, \mathcal{B}, \mathcal{G}, \mathcal{C} = I_{N+1} \otimes C follow directly from Equations (46)–(51), and W ∼ N(0, I_{N n_w}), V ∼ N(0, I_{N+1} \otimes S_v). Substituting (53) into (54) and taking into account that p(x_0|θ) = N(x_0, m_0(θ), S_0(θ)), we get
Y = \mathcal{C}\mathcal{A}(\theta, U)\, m_0(\theta) + \mathcal{C}\mathcal{B}(\theta, U) + Z,    (55)
Z = \mathcal{C}\mathcal{A}(\theta, U)\,\big(x_0 - m_0(\theta)\big) + \mathcal{C}\mathcal{G}(\theta, U)\, W + V.    (56)
The conditional density of variable Z has the form p ( Z | θ ) = N ( Z , 0 , S ( θ , U ) ) , where the covariance matrix S is given by
S(\theta, U) = \mathcal{C}\big(\mathcal{A}(\theta, U)\, S_0(\theta)\, \mathcal{A}(\theta, U)^T + \mathcal{G}(\theta, U)\, \mathcal{G}(\theta, U)^T\big)\mathcal{C}^T + I_{N+1} \otimes S_v.    (57)
Finally, if we define
F(\theta, U) = \mathcal{C}\mathcal{A}(\theta, U)\, m_0(\theta) + \mathcal{C}\mathcal{B}(\theta, U),    (58)
we can rewrite (55) in the form Y = F ( θ , U ) + Z , which is exactly the model (2). To find the optimal input signal, we maximize one of the criteria (31) or (44) with constraints (3) or (4).
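For small N, the model (2) can be assembled directly from (47)–(58), as in the sketch below (A_fun, B_fun, G_fun are user-supplied placeholders returning A(θ, u_k), B(θ, u_k), G(θ, u_k); m0 and S0 are the prior mean and covariance of x_0 for the given θ). For the long experiments considered next, the Kalman-filter recursions of Lemmas 3 and 4 should be used instead.

```python
# Sketch only: direct construction of F(theta, U) and S(theta, U) of (58) and
# (57) for small N; impractical for large N, which motivates Lemmas 3 and 4.
import numpy as np

def stacked_model(theta, U, A_fun, B_fun, G_fun, C, Sv, m0, S0):
    N, n = len(U), len(m0)
    means = [m0]
    for k in range(N):
        means.append(A_fun(theta, U[k]) @ means[-1] + B_fun(theta, U[k]))
    F = np.concatenate([C @ mk for mk in means])          # Eq. (58)
    Sx = np.zeros(((N + 1) * n, (N + 1) * n))             # covariance of (x_0,...,x_N)
    Sx[:n, :n] = S0
    for k in range(N):
        Ak, Gk = A_fun(theta, U[k]), G_fun(theta, U[k])
        for j in range(k + 1):                            # cov(x_{k+1}, x_j) = A_k cov(x_k, x_j)
            blk = Ak @ Sx[k*n:(k+1)*n, j*n:(j+1)*n]
            Sx[(k+1)*n:(k+2)*n, j*n:(j+1)*n] = blk
            Sx[j*n:(j+1)*n, (k+1)*n:(k+2)*n] = blk.T
        Sx[(k+1)*n:(k+2)*n, (k+1)*n:(k+2)*n] = (
            Ak @ Sx[k*n:(k+1)*n, k*n:(k+1)*n] @ Ak.T + Gk @ Gk.T)
    Cbig = np.kron(np.eye(N + 1), C)
    S = Cbig @ Sx @ Cbig.T + np.kron(np.eye(N + 1), Sv)   # Eq. (57)
    return F, S
```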
With a large number of data (large N), calculating the inverse and determinant of a very large matrix S ( θ , U ) in (9) and calculating the quantities d i , j ( U ) in Equations (31)–(33) is numerically ill conditioned and requires special treatment. The algorithms below reduce the size of the matrices necessary to invert by a factor of N + 1 .
Lemma 3.
Efficient computation of log-likelihood. The following identities hold:
p(Y \mid \theta, U) = \prod_{k=0}^{N} N\big(y_k,\, C m_k^{-}(\theta),\, \Sigma_k(\theta)\big),    (59)
|Y - F(\theta, U)|^2_{S^{-1}(\theta, U)} = \sum_{k=0}^{N} |y_k - C m_k^{-}(\theta)|^2_{\Sigma_k^{-1}(\theta)},    (60)
|S(\theta, U)| = \prod_{k=0}^{N} |\Sigma_k(\theta)|,    (61)
L(\theta, Y, U) = \tfrac{1}{2} \sum_{k=0}^{N} \Big[ |y_k - C m_k^{-}(\theta)|^2_{\Sigma_k^{-1}(\theta)} + \ln|\Sigma_k(\theta)| \Big] - \ln p_0(\theta),    (62)
where L is given by (9), and the predicted means m_k^{-}(θ) and the innovation covariances Σ_k(θ) are calculated recursively by the Kalman filter
\Sigma_k(\theta) = S_v + C S_k^{-}(\theta) C^T,    (63)
L_k(\theta) = S_k^{-}(\theta)\, C^T \Sigma_k^{-1}(\theta),    (64)
m_k(\theta) = m_k^{-}(\theta) + L_k(\theta)\big(y_k - C m_k^{-}(\theta)\big),    (65)
S_k(\theta) = S_k^{-}(\theta) - L_k(\theta)\, \Sigma_k(\theta)\, L_k(\theta)^T,    (66)
m_{k+1}^{-}(\theta) = A_k m_k(\theta) + B_k,    (67)
S_{k+1}^{-}(\theta) = A_k S_k(\theta) A_k^T + G_k G_k^T, \qquad k = 0, 1, \dots, N,    (68)
with initial conditions m_0^{-}(\theta) = m_0(\theta), S_0^{-}(\theta) = S_0(\theta).
The proof is given in Appendix A. The Equations (63)–(68) are, in fact, a family of discrete-time Kalman filters indexed by θ. The first four formulas describe the correction step. The prediction step is given by the last two equations. The matrix L_k is the Kalman gain, and Σ_k is the covariance matrix of the output prediction error ϵ_k = y_k − C m_k^{-}.
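A compact sketch of Lemma 3 (illustrative, with user-supplied model functions) is given below; it returns L(θ, Y, U) of (62) with a single filter pass instead of inverting the (N+1)·n_y covariance matrix in (9).

```python
# Sketch only: the recursion (63)-(68) evaluating the negative log-posterior (62).
import numpy as np

def neg_log_posterior_kf(theta, Y, U, A_fun, B_fun, G_fun, C, Sv, m0, S0,
                         log_prior):
    m, S = m0(theta), S0(theta)          # predicted mean m_k^- and covariance S_k^-
    val = 0.0
    N = len(Y) - 1
    for k in range(N + 1):
        Sig = Sv + C @ S @ C.T                           # (63) innovation covariance
        eps = Y[k] - C @ m                               # output prediction error
        val += eps @ np.linalg.solve(Sig, eps) + np.linalg.slogdet(Sig)[1]   # (62)
        L = S @ C.T @ np.linalg.inv(Sig)                 # (64) Kalman gain
        m = m + L @ eps                                  # (65) corrected mean
        S = S - L @ Sig @ L.T                            # (66) corrected covariance
        if k < N:
            A, B, G = A_fun(theta, U[k]), B_fun(theta, U[k]), G_fun(theta, U[k])
            m = A @ m + B                                # (67) predicted mean
            S = A @ S @ A.T + G @ G.T                    # (68) predicted covariance
    return 0.5 * val - log_prior(theta)
```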
Lemma 4.
Efficient computation of d i , j . Let us define
\tilde{A}_k = \begin{pmatrix} A(\theta_i, u_k) & 0 \\ 0 & A(\theta_j, u_k) \end{pmatrix}, \qquad \tilde{B}_k = \begin{pmatrix} B(\theta_i, u_k) \\ B(\theta_j, u_k) \end{pmatrix},    (69)
\tilde{G}_k = \begin{pmatrix} G(\theta_i, u_k) & 0 \\ 0 & G(\theta_j, u_k) \end{pmatrix}, \qquad \tilde{C} = \frac{1}{\sqrt{2}} \begin{pmatrix} C & -C \end{pmatrix}    (70)
and let
\tilde{\Sigma}_k = S_v + \tilde{C}\, \tilde{S}_k^{-}\, \tilde{C}^T,    (71)
\tilde{L}_k = \tilde{S}_k^{-}\, \tilde{C}^T \tilde{\Sigma}_k^{-1},    (72)
\tilde{S}_k = \tilde{S}_k^{-} - \tilde{L}_k\, \tilde{\Sigma}_k\, \tilde{L}_k^T,    (73)
\tilde{m}_{k+1}^{-} = \tilde{A}_k\big(I - \tilde{L}_k \tilde{C}\big)\, \tilde{m}_k^{-} + \tilde{B}_k,    (74)
\tilde{S}_{k+1}^{-} = \tilde{A}_k\, \tilde{S}_k\, \tilde{A}_k^T + \tilde{G}_k \tilde{G}_k^T, \qquad k = 0, 1, \dots, N,    (75)
with initial conditions
\tilde{m}_0^{-} = \begin{pmatrix} m_0(\theta_i) \\ m_0(\theta_j) \end{pmatrix}, \qquad \tilde{S}_0^{-} = \begin{pmatrix} S_0(\theta_i) & 0 \\ 0 & S_0(\theta_j) \end{pmatrix}.    (76)
Then, the quantity d i , j ( U ) in Formula (31) is given by
d_{i,j}(U) = \frac{1}{4} \sum_{k=0}^{N} |\tilde{C}\, \tilde{m}_k^{-}|^2_{\tilde{\Sigma}_k^{-1}} + \frac{1}{2} \sum_{k=0}^{N} \ln|\tilde{\Sigma}_k| - \frac{1}{4} \ln\big(|S_i|\,|S_j|\big),    (77)
where | S i | = | S ( θ i , U ) | , | S j | = | S ( θ j , U ) | are calculated according to Lemma 3, Equation (61).
The proof is given in Appendix A. Let us observe that instead of calculating the inverse and determinant of the large matrices S i , S j , 1 2 ( S i + S j ) , of dimension ( N + 1 ) n y , we only need to calculate the determinants and inverses of the much smaller matrices Σ k , Σ ˜ k , whose dimension is n y , which is usually a small number.
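The corresponding sketch of Lemma 4 is given below. The augmented output matrix is written here as C̃ = (1/√2)(C  −C); the placement of the sign and scale factor is a reconstruction from the garbled source and should be checked against the repository code. The terms ln|S_i| and ln|S_j| are assumed to be precomputed with Lemma 3, Equation (61).

```python
# Sketch only: the recursion (69)-(77) for d_{i,j}(U).  The form of C_tilde is
# an assumption of this sketch (see the note above).
import numpy as np
from scipy.linalg import block_diag

def d_ij(theta_i, theta_j, U, A_fun, B_fun, G_fun, C, Sv, m0, S0,
         log_det_Si, log_det_Sj):
    Ct = np.hstack([C, -C]) / np.sqrt(2.0)                       # (70)
    m = np.concatenate([m0(theta_i), m0(theta_j)])               # (76)
    S = block_diag(S0(theta_i), S0(theta_j))                     # (76)
    quad, logdet = 0.0, 0.0
    N = len(U)
    for k in range(N + 1):
        Sig = Sv + Ct @ S @ Ct.T                                 # (71)
        z = Ct @ m
        quad += z @ np.linalg.solve(Sig, z)
        logdet += np.linalg.slogdet(Sig)[1]
        L = S @ Ct.T @ np.linalg.inv(Sig)                        # (72)
        S = S - L @ Sig @ L.T                                    # (73)
        if k < N:
            A = block_diag(A_fun(theta_i, U[k]), A_fun(theta_j, U[k]))   # (69)
            B = np.concatenate([B_fun(theta_i, U[k]), B_fun(theta_j, U[k])])
            G = block_diag(G_fun(theta_i, U[k]), G_fun(theta_j, U[k]))
            m = A @ (np.eye(len(m)) - L @ Ct) @ m + B            # (74)
            S = A @ S @ A.T + G @ G.T                            # (75)
    return 0.25 * quad + 0.5 * logdet - 0.25 * (log_det_Si + log_det_Sj)  # (77)
```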

5. Comparison with Classical Methods of Input Signal Design

Classical methods for input signal design in system identification are primarily concerned with LTI state space or transfer function models (such as ARMAX) and are usually based on maximizing some functions of error covariance or the Fisher information matrix. For the prediction error method (PEM) estimator, the asymptotic form of the error covariance matrix (or Fisher information) is well known, both in the time and frequency domains. In the time domain, the solution corresponds to a specific input signal, whereas in the frequency domain, the solution yields the optimal power spectral density of the input signal. Below, we provide a brief overview of these methods, following the methodology presented in [1,2] [Chapter 9. Sections 9.3 and 9.4] and [4] [Section 6.1].
Consider the LTI, SISO system
x k + 1 = A ( θ ) x k + B ( θ ) u k + G ( θ ) w k ,
y k = C x k + v k ,
under the assumptions stated in Section 4. System (78), (79) is equivalent to the transfer function model
y k = G ( θ , z ) u k + H ( θ , z ) e k ,
where e k N ( 0 , σ e 2 ) is a sequence of mutually independent Gaussian variables. The filters G and H are determined by the formulas
G(\theta, z) = C\big(zI - A(\theta)\big)^{-1} B(\theta), \qquad H(\theta, z) = 1 + C\big(zI - A(\theta)\big)^{-1} K(\theta),    (81)
where the Kalman gain K ( θ ) is given by
K(\theta) = A(\theta)\, S(\theta)\, C^T \big(C S(\theta) C^T + \sigma_v^2\big)^{-1},    (82)
with a non-negative matrix S being a solution of the Riccati equation (cf. [2])
S = A S A^T + G G^T - A S C^T \big(C S C^T + \sigma_v^2\big)^{-1} C S A^T.    (83)
The prediction errors are given by the recurrence
\epsilon_k(\theta, Y, U) = H^{-1}(\theta, z)\big[ y_k - G(\theta, z)\, u_k \big].    (84)
The cost function used in the prediction error method (PEM) is expressed as
V(\theta, Y, U) = \frac{1}{2 N \sigma_e^2} \sum_{k=1}^{N} \epsilon_k^2(\theta, U).    (85)
Minimization of (85) with respect to θ yields the PEM estimator
\hat{\theta}(Y, U) = \arg\min_{\theta \in \Theta} V(\theta, Y, U).    (86)
The above estimator, under rather weak identifiability conditions, is consistent, asymptotically normal, and efficient, i.e., it achieves the Cramér–Rao lower bound. Following the reasoning presented in [1,2] [Chapter 9, Section 9.3 and 9.4] or [4] [Section 6.1], we divide the parameter vector into two groups related to the parameters appearing in G and H, that is, θ = col ( θ H , θ G ) . The sensitivity of ϵ k to changes in θ G is calculated recursively according to the following equations:
\psi_k(\theta, U) = H^{-1}(\theta, z)\, \nabla_{\theta_G} G(\theta, z)\, u_k = F_z(\theta, z)\, u_k,    (87)
where \nabla_{\theta_G} denotes differentiation only with respect to the parameters that occur in G. The information matrix, which is also the inverse of the error covariance P_{\theta_G}, is given by
M(\theta, U) = P_{\theta_G}^{-1}(\theta, U) = R_e(\theta) + \frac{1}{N \sigma_e^2} \sum_{k=1}^{N} \psi_k(\theta, U)\, \psi_k(\theta, U)^T,    (88)
where R e does not depend on U . Using the D-optimal criterion, the optimal signal is given through maximization of det M ( θ 0 , U ) , where θ 0 is the true value of the parameter. Since θ 0 is unknown, one can use the prior distribution and maximize the average D-optimal criterion:
Q ( U ) = E p 0 ( θ ) det M ( θ , U ) ,
with constraints (3) or (4). The asymptotic error covariance can also be expressed in terms of the power spectral density of the input signal u k . Let Φ u denote the spectral density of u k . As was shown in [2] [p. 291] and [4] [Section 6.2], we have
M(\Phi_u, \theta_0) = P_{\theta_G}^{-1}(\Phi_u, \theta_0) = \frac{N}{2\pi \sigma_e^2} \int_{-\pi}^{\pi} F_z(e^{i\omega}, \theta_0)\, F_z(e^{-i\omega}, \theta_0)^T\, \Phi_u(\omega)\, d\omega + R_e(\theta_0),    (90)
where F z is defined by (87), and the term R e in (90) does not depend on Φ u . Similarly to before, the parameter-averaged determinant of the matrix M is maximized with respect to Φ u , subject to the signal power and frequency constraints. Typically, the spectrum Φ u is parametrized by a finite number of coefficients c k , so that the resulting optimization problem is convex; see [37] for details. After performing spectral factorization of Φ u , a filter is obtained, whose input is white noise and whose output yields the optimal signal u k . This has been implemented in the MOOSE-2 solver [38]. Unfortunately, MOOSE-2 does not allow for averaging over the prior and involves unknown value of the parameter.
Numerous variants of the aforementioned methods can be found in the literature. For example, instead of the D-optimality criterion, one may also consider maximizing tr ( M ) or λ min ( M ) . However, the vast majority of methods are based on the principles stated above (see, e.g., [4]), that is, maximization of some functions of the Fisher information matrix. Finally, we note that the above methods employ the classical optimality criterion, averaged only over the prior distribution. Consequently, they are not fully Bayesian and, following the terminology of [11], should rather be referred to as pseudo-Bayesian methods.
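A sketch of the averaged D-optimal criterion (89) for a SISO model is shown below; it assumes that the sensitivity filter F_z(θ, z) of (87) is available through a user-supplied routine (a hypothetical interface), ignores the U-independent term R_e, and replaces the prior expectation by quadrature nodes.

```python
# Sketch only: the averaged D-optimal criterion (89).  sensitivities(theta, U)
# is a placeholder returning the matrix of psi_k(theta, U) of (87),
# one column per time step.
import numpy as np

def average_d_criterion(U, theta_nodes, theta_weights, sensitivities, sigma_e2):
    Q = 0.0
    N = len(U)
    for th, w in zip(theta_nodes, theta_weights):
        psi = sensitivities(th, U)            # shape: (n_params, N)
        M = psi @ psi.T / (N * sigma_e2)      # information matrix (88) less R_e
        Q += w * np.linalg.det(M)
    return Q
```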

6. Examples of Input Signal Design

In the following, we present four examples of optimal input signal design using both Bayesian and classical methods. Examples 1–3 are classical in nature and concern time-invariant linear stochastic systems. Examples 1 and 2 are elementary, while Example 3, taken from [28], addresses the design of a control signal for a paradigmatic model of the atomic sensor. The sensor is modeled as a harmonic oscillator with the natural frequency being the parameter of interest. In Examples 1–3, the Bayesian approach is compared with classical methods. Maximization of the spectral criterion (90) was performed using the MOOSE-2 solver [38], evaluated at θ = m θ with default parameters, that is, the input spectrum was FIR-type with 20 lags and the spectrum power constraint was set to 1 (prob.spectrum.signal.power.ub = 1). There were no additional constraints on the shape of the spectrum.
Example 4, adapted from [29], is more advanced and considers the design of the pump laser control signal in an optically pumped magnetometer. The magnetometer is modeled as a quasi-linear stochastic system, where the matrices A , B , and G depend non-linearly on the control signal u. For this system, classical methods cannot be applied. Therefore, estimation errors are compared with the Information-Theoretic Lower Bound (ITB) provided in Theorem 1 and with the errors obtained by using an appropriately selected harmonic input signal.
In all examples, the errors were computed using the Monte Carlo method. The parameter θ and the initial conditions x 0 were sampled from the prior distribution p 0 ( x 0 , θ ) . Observations y 0 , , y N corresponding to the sampled parameters and initial conditions were then generated, and the error of the MAP estimator was calculated. This error was subsequently averaged over many repetitions of the procedure.
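A sketch of this Monte Carlo procedure (with simulate() and map_estimator() as model-specific placeholders) is given below.

```python
# Sketch only: Monte Carlo evaluation of the MAP estimation error for a given
# input signal U; simulate() and map_estimator() are placeholders.
import numpy as np

def mc_map_error(U, sample_prior, simulate, map_estimator, n_runs=500, seed=0):
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_runs):
        theta, x0 = sample_prior(rng)                 # draw from p0(x0, theta)
        Y = simulate(theta, x0, U, rng)               # generate y_0, ..., y_N
        theta_hat = map_estimator(Y, U)
        errors.append(np.sum((np.atleast_1d(theta) - np.atleast_1d(theta_hat))**2))
    return np.mean(errors)
```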

6.1. Elementary Example

We begin with a very simple first-order system
x k + 1 = θ 1 x k + θ 2 u k + g w k ,
y k = x k + σ v v k , k = 0 , 1 , , N ,
with σ_v = 0.1, g = 0.01. The parameter vector θ = [θ_1, θ_2]^T has a prior distribution p_0(θ) = N(θ, m_θ, S_θ), where m_θ = [0.8, 0.2]^T, S_θ = 10^{−2} I. As assumed in Section 4, the initial condition x_0 is conditionally Gaussian; that is, p(x_0|θ) = N(x_0, m_0(θ), s_0(θ)), with m_0(θ) = 0, s_0(θ) = 0.01. The length of the signal is N = 100, and the set of admissible signals is given by (3) with Ũ = 0; that is, the norm of the signal cannot be greater than ϱ. To maximize the averaged D-optimal criterion (89), we need to calculate the sensitivity of the prediction error. The sensitivity Equation (87) now takes the form
\big(1 + (K(\theta_1) - \theta_1) z^{-1}\big)\big(1 - \theta_1 z^{-1}\big)\, \psi_{1,k} = \theta_2\, z^{-2}\, u_k,    (93)
\big(1 + (K(\theta_1) - \theta_1) z^{-1}\big)\, \psi_{2,k} = z^{-1}\, u_k,    (94)
where the Kalman gain K ( θ 1 ) is given by (82), (83) with A = θ 1 , G = g , and C = 1 .
The optimal input signals were designed by maximizing the Bayesian criterion (44), the averaged D-optimal criterion (89), and the spectral criterion (90), subject to the constraint (3) with U ˜ = 0 .
The optimal signals and the corresponding estimation errors of θ 1 and θ 2 are shown in Figure 1 and Figure 2. In Figure 2, we also calculate the estimation errors for the constant (step) signal, which is certainly not optimal. The constant signal and the MOOSE signal were always assigned a norm equal to ϱ .
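For reference, the constrained maximization used to obtain such signals can be sketched as follows (an illustrative SLSQP-based routine, not the authors' MATLAB implementation); criterion(U) stands for any of the objectives (44), (36), or (89).

```python
# Sketch only: maximizing a design criterion over the ball constraint (3)
# with U~ = 0.
import numpy as np
from scipy.optimize import minimize

def design_input(criterion, N, rho, seed=0):
    rng = np.random.default_rng(seed)
    U0 = rng.standard_normal(N)
    U0 *= rho / np.linalg.norm(U0)            # start on the constraint boundary
    cons = {"type": "ineq", "fun": lambda U: rho - np.linalg.norm(U)}
    res = minimize(lambda U: -criterion(U), U0, method="SLSQP",
                   constraints=[cons])
    return res.x
```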

6.2. Example with a Non-Gaussian Prior Distribution

Consider the following system:
dx = \big(A_c(\theta)\, x + B_c(\theta)\, u\big)\, dt + G_c(\theta)\, dw,    (95)
y k = C x k + s v v k ,
where
A_c(\theta) = \begin{pmatrix} 0 & 1 \\ 0 & -\theta \end{pmatrix}, \qquad B_c = \begin{pmatrix} 0 \\ \theta \end{pmatrix}, \qquad G_c(\theta) = \begin{pmatrix} 0 \\ d_c \theta \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & 0 \end{pmatrix},    (97)
x k = x ( t k ) , t k = k Δ , Δ = 0.05 · 10 3 , d c = 0.01 , s v = 0.1 . This system can be considered controlled Brownian motion or a DC motor with stochastic disturbances. The parameter θ is the unknown damping rate of the system. Assuming that u ( t ) = u k , t [ t k , t k + 1 ] , the discrete-time system corresponding to (95) has the form
x k + 1 = A ( θ ) x k + B ( θ ) u k + G ( θ ) w k ,
where, according to the procedure given in Appendix C,
A(\theta) = \begin{pmatrix} 1 & \dfrac{1 - e^{-\theta\Delta}}{\theta} \\ 0 & e^{-\theta\Delta} \end{pmatrix}, \qquad B(\theta) = \begin{pmatrix} \Delta - \dfrac{1 - e^{-\theta\Delta}}{\theta} \\ 1 - e^{-\theta\Delta} \end{pmatrix}, \qquad G(\theta) = d_c \theta \begin{pmatrix} \sqrt{D_{1,1}(\theta)} & 0 \\ \dfrac{D_{1,2}(\theta)}{\sqrt{D_{1,1}(\theta)}} & \sqrt{\dfrac{D_{1,1}(\theta) D_{2,2}(\theta) - D_{1,2}(\theta)^2}{D_{1,1}(\theta)}} \end{pmatrix},    (99)
where
D_{1,1}(\theta) = \frac{4 e^{-\theta\Delta} - e^{-2\theta\Delta} + 2\theta\Delta - 3}{2\theta^3},    (100)
D_{1,2}(\theta) = \frac{1 - 2 e^{-\theta\Delta} + e^{-2\theta\Delta}}{2\theta^2}, \qquad D_{2,2}(\theta) = \frac{1 - e^{-2\theta\Delta}}{2\theta},    (101)
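The closed forms above (and those in the later examples) can be cross-checked numerically with a generic zero-order-hold discretization. The sketch below uses the matrix-exponential (van Loan) construction and is not the procedure of Appendix C itself; B_c and G_c must be passed as column matrices, and the discrete noise covariance is assumed positive definite so that a Cholesky factor exists.

```python
# Sketch only: generic zero-order-hold discretization, usable to cross-check
# (99)-(101); not the procedure of Appendix C.
import numpy as np
from scipy.linalg import expm, cholesky

def discretize(Ac, Bc, Gc, dt):
    n, m = Ac.shape[0], Bc.shape[1]
    M = np.zeros((n + m, n + m))
    M[:n, :n], M[:n, n:] = Ac, Bc
    E = expm(M * dt)
    A, B = E[:n, :n], E[:n, n:]          # A = e^{Ac dt}, B = int_0^dt e^{Ac s} ds Bc
    V = np.zeros((2 * n, 2 * n))
    V[:n, :n], V[:n, n:], V[n:, n:] = -Ac, Gc @ Gc.T, Ac.T
    F = expm(V * dt)
    Qd = F[n:, n:].T @ F[:n, n:]         # int_0^dt e^{Ac s} Gc Gc^T e^{Ac^T s} ds
    return A, B, cholesky(Qd, lower=True)   # G with G G^T = Qd
```

For Example 2, calling discretize with the matrices of (97) and the sampling period Δ should reproduce (99)–(101) up to the choice of the square-root factor.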
The initial condition is Gaussian with m_0 = 0, S_0 = diag[0.001, 0.005]. Unlike in the previous example, here, we assume that the prior distribution of θ is uniform, that is, p_0(θ) = U[a, b] with a = 0.05, b = 2. Following the Gauss–Legendre Formula (43), we get θ_1 = 0.5(a + b − (b − a)/√3) ≈ 0.462, θ_2 = 0.5(a + b + (b − a)/√3) ≈ 1.588, p_{0,1} = p_{0,2} = 0.5. Thus, r = 2 in (31), and according to (35), the Bayesian optimal signal is a solution of the simplified and convex optimization problem (36) with d_{1,2} defined by Lemmas 3 and 4. Moreover, since the matrices in (99) do not depend on u_k, the last two terms in (77) can be omitted. The set of admissible signals is given by (3) with Ũ = 0; that is, the signal norm cannot be greater than ϱ.
In order to employ the classical methods described in Section 5, it is necessary to first evaluate the sensitivity of the prediction error. The transfer functions G and H in (80) have the form
G(\theta, z) = \frac{B(\theta, z)}{A(\theta, z)}\, z^{-1}, \qquad H(\theta, z) = \frac{C(\theta, z)}{A(\theta, z)},    (102)
where
A(\theta, z) = 1 - \big(1 + e^{-\theta\Delta}\big) z^{-1} + e^{-\theta\Delta} z^{-2},    (103)
B(\theta, z) = \Big(\Delta - \frac{1 - e^{-\theta\Delta}}{\theta}\Big) + \Big(\frac{1}{\theta} - \big(\tfrac{1}{\theta} + \Delta\big) e^{-\theta\Delta}\Big) z^{-1},    (104)
C(\theta, z) = 1 + \big(K_1(\theta) - 1 - e^{-\theta\Delta}\big) z^{-1} + \Big(\frac{1 - e^{-\theta\Delta}}{\theta}\, K_2(\theta) + \big(1 - K_1(\theta)\big)\, e^{-\theta\Delta}\Big) z^{-2},    (105)
and the Kalman gain K is given by (82). Since we only have one parameter, the sensitivity ψ k is a number, and the sensitivity Equation (87) now takes the form
A(\theta, z)\, C(\theta, z)\, \psi_k(\theta, U) = \Big(\frac{\partial B(\theta, z)}{\partial \theta}\, A(\theta, z) - B(\theta, z)\, \frac{\partial A(\theta, z)}{\partial \theta}\Big)\, z^{-1}\, u_k.    (106)
The D-optimal signal is then obtained through maximization of the averaged D-optimal criterion
Q(U) = E_{p_0(\theta)}\, \frac{1}{N} \sum_{k=1}^{N} \psi_k^2(\theta, U),    (107)
with constraints (3).
The optimal input signals were designed by maximizing the Bayesian criterion (44) and the averaged D-optimal criterion (107), subject to the constraint (3) with U ˜ = 0 . The results are presented in Figure 3 and Figure 4. Figure 4 also shows the estimation error for the step signal (constant) and the PRBS signal. The constant and PRBS signals were always assigned a norm equal to ϱ .

6.3. Optimal Input Design for the Atomic Sensor Model

In [28], a simplified paradigmatic model of an atomic sensor (an atomic magnetometer [39,40]) is introduced, in which the dynamics is governed by oscillations of the collective spin of an atomic ensemble subjected to an external magnetic field. The system is driven by circularly polarized light from a pump laser, whose frequency acts as the input signal. A linearly polarized probe laser illuminates the atoms, and upon transmission through the medium, its polarization undergoes a Faraday rotation. The J z component of the collective spin is inferred from the measurement of the probe laser’s polarization angle. The model presented in [28] describes the dynamics of the spin components J = [ J y , J z ] T and has the form
dJ = \begin{pmatrix} -\frac{1}{T_2} & \omega_L \\ -\omega_L & -\frac{1}{T_2} \end{pmatrix} J\, dt + \begin{pmatrix} 0 \\ 1 \end{pmatrix} E(t)\, dt + dw^{(J)},    (108)
where ω_L is the Larmor frequency, T_2 = 0.87 ms is the relaxation time, E is the pumping laser frequency, and w^{(J)} is a Wiener process with known covariance qI. The observation has the form I_k = g_D J_z(kΔ) + ξ_k, k = 0, 1, …, where I_k is the photocurrent, Δ = 5 μs is the sampling time, ξ_k ∼ N(0, σ_ξ²), and g_D, σ_ξ are known parameters. The Larmor frequency and the external magnetic field B are related by the formula ω_L = γ_e B, where γ_e is the gyromagnetic ratio. Hence, by measuring ω_L, one can determine the field B. Taking T_2 as the time unit and rescaling the time, state variables, observations, and the input signal E, we obtain the following stochastic system, equivalent to (108):
dx = \big(A_c(\theta)\, x + B_c\, u\big)\, dt + G_c\, dw,    (109)
y k = C x k + s v v k ,
where
A_c(\theta) = \begin{pmatrix} -1 & \theta \\ -\theta & -1 \end{pmatrix}, \qquad B_c = \begin{pmatrix} 0 \\ b_c \end{pmatrix}, \qquad G_c = \sqrt{2}\, I, \qquad C = \begin{pmatrix} 0 & 1 \end{pmatrix},    (111)
x_k = x(t_k), t_k = kΔ, Δ = 5.7471·10^{−3}, s_v = 11.85, and b_c = 10^5. The input signal u in (109) corresponds to E in (108). The parameter θ in (109) is related to the Larmor frequency ω_L in (108) by the formula θ = ω_L T_2. Since the estimation error of the parameter θ depends on the input signal u, a natural question arises as to what form this signal should take. To solve this problem, we will go to discrete time and apply the methodology described in Section 4 and Section 5. Assuming that u(t) = u_k, t ∈ [t_k, t_{k+1}], the discrete-time system corresponding to (109) has the form
x k + 1 = A ( θ ) x k + B ( θ ) u k + G w k ,
where, according to the procedure given in Appendix C,
A(\theta) = e^{-\Delta} \begin{pmatrix} \cos\theta\Delta & \sin\theta\Delta \\ -\sin\theta\Delta & \cos\theta\Delta \end{pmatrix}, \qquad B(\theta) = \frac{b_c}{1 + \theta^2} \begin{pmatrix} \theta - e^{-\Delta}\big(\theta\cos\theta\Delta + \sin\theta\Delta\big) \\ 1 - e^{-\Delta}\big(\cos\theta\Delta - \theta\sin\theta\Delta\big) \end{pmatrix},    (113)
G = \sqrt{1 - e^{-2\Delta}}\; I.    (114)
We assume that the prior distribution of θ is Gaussian, that is, p_0(θ) = N(θ, m_θ, s_θ) with m_θ = 54.6637, s_θ = 10.76, which corresponds to the Larmor frequency of 10 kHz and an initial uncertainty on the order of 600 Hz (3σ). At the beginning of the process, the system is in thermal equilibrium, that is, p(x_0|θ) = N(x_0, 0, I). Following Lemma 2, we get θ_1 = m_θ − √(s_θ) ≈ 51, θ_2 = m_θ + √(s_θ) ≈ 58, p_{0,1} = p_{0,2} = 0.5. Thus, r = 2 in (31), and according to (35), the Bayesian optimal signal is a solution of the simplified and convex optimization problem (36) with d_{1,2} defined by Lemmas 3 and 4. Moreover, since the matrices in (113), (114) do not depend on u_k, the last two terms in (77) can be omitted. The set of admissible signals is given by (3) with Ũ = 0; that is, the signal norm cannot be greater than ϱ.
In order to employ the classical methods described in Section 5, it is necessary to first evaluate the sensitivity of the prediction error. Similarly to in the previous example, the polynomials A , B , C have the form
A(\theta, z) = 1 - 2 e^{-\Delta} \cos(\theta\Delta)\, z^{-1} + e^{-2\Delta} z^{-2},    (115)
B(\theta, z) = B_2(\theta) - e^{-\Delta}\big(B_1(\theta)\sin(\theta\Delta) + B_2(\theta)\cos(\theta\Delta)\big)\, z^{-1},    (116)
C(\theta, z) = 1 + \big(K_2(\theta) - 2 e^{-\Delta}\cos(\theta\Delta)\big)\, z^{-1} + e^{-\Delta}\big(e^{-\Delta} - K_1(\theta)\sin(\theta\Delta) - K_2(\theta)\cos(\theta\Delta)\big)\, z^{-2},    (117)
and the Kalman gain K and the vector B are given by (82) and (113), respectively. The sensitivity Equation (87) is given by (106). The D-optimal signal is then obtained through maximization of the averaged D-optimal criterion (107) with constraints (3).
The optimal input signals were designed by maximizing the Bayesian criterion (44), the averaged D-optimal criterion (107), and the spectral criterion (90), subject to the constraint (3) with U ˜ = 0 . The results are presented in Figure 5 and Figure 6. Figure 6 also shows the estimation error for the step (constant) signal and the harmonic signal u ( t ) = a cos ( m θ t ) . The frequency of the harmonic signal was equal to the expected value of the a priori distribution of the parameter θ . The constant, MOOSE, and harmonic signals were always assigned a norm equal to ϱ .

6.4. Bayesian Input Signal Design for the Pump Laser in an Optically Pumped Magnetometer

Optically pumped magnetometers operate by aligning atomic spins with a circularly polarized pump laser, after which the spins precess around the external magnetic field at the Larmor frequency. The probe laser measures this precession via polarization rotation (the Faraday effect), linking the detected signal to the magnetic field [39,40]. The pump laser's frequency strongly affects spin polarization and coherence time, making precise laser control central to minimizing the estimation error. Advanced control strategies can then suppress noise and enhance sensitivity. Consequently, accurate control of the pumping laser is a key factor in achieving a high-resolution and low-error magnetometer. We consider here the magnetometer model given by Equation (S9) in the article [29]:
\frac{dF}{d\tau} = \big(\gamma_e B + G S_3 \hat{z}\big) \times F - \gamma F + P(\tau)\big(\hat{z}\, F_{max} - F\big) + G_0(P(\tau))\, w,    (118)
where F = ( F x , F y , F z ) T is the collective atomic spin, γ e is the electron gyromagnetic ratio, B = ( B x , B y , B z ) T is a constant magnetic field vector, G is a known positive constant, and G S 3 z ^ is the effective field produced by ac-Stark shifts due to the probe laser, where S 3 is white Gaussian noise with the variance σ 3 2 . The optical pumping rate P ( τ ) 0 is an input signal. The atomic spin noise G 0 ( P ( τ ) ) w is modeled as a white Gaussian where w = ( w x , w y , w z ) T is a vector of standard and mutually independent Wiener increments. The G 0 matrix is diagonal and is given by
G_0(P(\tau)) = \sqrt{\tfrac{2}{3}\, F(F+1)\, N_A\, \big(\gamma + P(\tau)\big)}\; I,    (119)
where N_A is the number of atoms, and F is a known atomic spin number. The parameter F_max = N_A F is the maximum possible polarization. The transverse relaxation rate γ depends on the number of atoms and is given by γ(N_A) = T_2(N_A)^{−1} = γ_0 + 10^{−12} α N_A, where γ_0 and α are known positive constants and T_2 is the effective coherence time. The observation equation has the form
S 2 = F z + N S 2 ,
where N S 2 denotes the measurement noise with the variance σ 2 2 . In the experiment, the S 2 component of the Stokes vector is measured at discrete time moments t k = k Δ , where Δ is the sampling period. The realistic parameters of the model are given in Table 1.
In what follows, Equation (118) will be interpreted in the Itô sense. Moreover, we assume that the noise G S_3 in (118) is small and can be omitted.
By introducing the state variables ξ = √(3/(F(F+1)N_A))·F, the control variable u = P/γ, and the non-dimensional time t = γτ, and after multiplying both sides of (120) by √(3/(F(F+1)N_A)), we get the following model:
d ξ = ( A c ( θ , u ) ξ + B c u ) d t + G c ( u ) d η , y k = ξ 3 ( t k ) + σ v v k ,
where η is the three-dimensional standard Wiener process with unit covariance and
A_c(\theta, u) = \begin{pmatrix} -(1+u) & -\theta_3 & \theta_2 \\ \theta_3 & -(1+u) & -\theta_1 \\ -\theta_2 & \theta_1 & -(1+u) \end{pmatrix}, \qquad B_c = \begin{pmatrix} 0 \\ 0 \\ b_c \end{pmatrix}, \qquad G_c(u) = \sqrt{2(1+u)}\; I,    (122)
with b_c = √(3 N_A F/(F+1)), σ_v = σ_2 √(3/(F(F+1)N_A)). Taking the parameters from Table 1, we have b_c = 1.22·10^6, σ_v = 11.85. The parameter vector θ = (θ_1, θ_2, θ_3)^T represents the external magnetic field through the relation θ = γ_e T_2 B. If u is a constant signal, then system (121) approaches thermodynamic equilibrium, with E x(t) = −A_c(θ, u)^{−1} B_c u and cov(x(t)) = I.
A closer examination of Equation (121) shows that the component ξ 3 ( t , θ ) of its solution remains invariant under rotations of the vector θ about the z-axis. As a result, θ , and hence the field B , cannot be uniquely identified from the observations y 0 , , y N . The only quantities that can be uniquely identified in this model are the magnitude of the vector B and the angle η between B and one of the coordinate axes, say, the z ^ axis. However, to simplify the problem as much as possible, we introduce here the additional assumption that the field B always lies in the xy plane, that is, B = ( B x , B y , 0 ) T . With this assumption, the change in variables
x_1 = \xi_1 \sin\varphi - \xi_2 \cos\varphi, \qquad x_2 = \xi_3,    (123)
\cos\varphi = \frac{\theta_1}{\sqrt{\theta_1^2 + \theta_2^2}}, \qquad \sin\varphi = \frac{\theta_2}{\sqrt{\theta_1^2 + \theta_2^2}},    (124)
reduces model (121) to a two-dimensional system:
d x = ( A c ( θ , u ) x + B c u ) d t + G c ( u ) d w , y k = C x k + σ v v k ,
where x = (x_1, x_2)^T, θ = γ_e T_2 √(B_x² + B_y²), w is a two-dimensional standard Wiener process with unit covariance, and
A_c(\theta, u) = \begin{pmatrix} -(1+u) & \theta \\ -\theta & -(1+u) \end{pmatrix}, \qquad B_c = \begin{pmatrix} 0 \\ b_c \end{pmatrix}, \qquad C = \begin{pmatrix} 0 & 1 \end{pmatrix}, \qquad G_c(u) = \sqrt{2(1+u)}\; I.    (126)
Hence, under the assumption B z = 0 , the observations y 0 , , y N , the variable ξ 3 , and the F z component of the collective spin are fully characterized by the reduced model (125). Furthermore, within this reduced model, it can be readily verified that θ is uniquely identifiable. Naturally, the accuracy of estimating θ depends on the choice of input u. To determine an input u that maximizes the information about θ , we now turn to the discrete-time formulation of (125) and apply the methods described in Section 3 and Section 4. Assuming the control signal is piecewise constant, that is, u ( t ) = u k , t [ t k , t k + 1 ] , the process x k = x ( t k ) satisfies the difference equation
x k + 1 = A ( θ , u k ) x k + B ( θ , u k ) + G ( u k ) w k ,
where w k N ( 0 , I ) , and the matrices A , B , G can be calculated following the procedure given in Appendix C. Upon the completion of straightforward calculations, we get
A(\theta, u_k) = e^{-(1+u_k)\Delta} \begin{pmatrix} \cos\theta\Delta & \sin\theta\Delta \\ -\sin\theta\Delta & \cos\theta\Delta \end{pmatrix}, \qquad G(u_k) = \sqrt{1 - e^{-2(1+u_k)\Delta}}\; I,    (128)
B(\theta, u_k) = \frac{b_c u_k}{(1+u_k)^2 + \theta^2} \begin{pmatrix} \theta - e^{-(1+u_k)\Delta}\big(\theta\cos(\theta\Delta) + (1+u_k)\sin(\theta\Delta)\big) \\ (1+u_k) - e^{-(1+u_k)\Delta}\big((1+u_k)\cos(\theta\Delta) - \theta\sin(\theta\Delta)\big) \end{pmatrix}.    (129)
At the beginning of the process, the system is in a thermal equilibrium corresponding to u = 0. Hence, p(x_0|θ) = N(x_0, 0, I). We also assume that the prior distribution of θ is Gaussian, that is, p_0(θ) = N(θ, m_θ, s_θ) with m_θ = 54.6637, s_θ = 3·10^{−3}, which corresponds to the Larmor frequency of 10 kHz and its initial uncertainty on the order of 30 Hz (3σ). Similarly to Section 6.3, we get the following from Lemma 2: θ_1 = m_θ − √(s_θ) ≈ 54.61, θ_2 = m_θ + √(s_θ) ≈ 54.72, p_{0,1} = p_{0,2} = 0.5. Since r = 2 in (31), then according to (35), the Bayesian optimal signal is a solution of the simplified optimization problem (36) with d_{1,2} calculated using Lemmas 3 and 4. Unlike in the previous examples, in this problem, we maximize criterion (36) with constraints on the signal amplitude, that is, 0 ≤ u_k ≤ u_max, which is preferable in realistic scenarios.
The results are presented in Figure 7, Figure 8 and Figure 9. The optimal input signal consistently lies on the boundary of the admissible set. For a small value of u_max, the optimal signal is rectangular, with a frequency close to the a priori Larmor frequency. Since large values of u(t) strongly damp spin oscillations and increase the noise, the optimal signals for a large u_max value consist of short pulses at the maximum admissible amplitude. Once the oscillations decay, the system should be re-excited using a new sequence of short pulses, repeated periodically, as illustrated in Figure 9. The harmonic signal u(t) = 0.5 u_max (1 + cos(m_θ t)) is nearly optimal for a small u_max value but becomes ineffective for a large u_max value, as it strongly damps the oscillations (see the lower-right panel of Figure 8). As a result, the measurements carry less information about the Larmor frequency, and the estimation error increases despite the higher signal amplitude. Analogous behavior is observed for rectangular signals. More generally, let s(t) ∈ [0, 1] be any signal, and define u(t) = a·s(t) with a ≥ 0. Then, as illustrated in Figure 7, the estimation error reaches a minimum for some non-zero value of the parameter a.
Extending the experimental duration from 2 to 5 ms reduces the estimation error by a factor of 2 compared to the case shown in Figure 7. For u max = 200 and an experimental duration of 5 ms, the harmonic input signal yields an estimation error of 7 mHz, while the optimal signal, shown in the lower-left panel of Figure 9, reduces the error to 0.48 mHz, that is, approximately 14 times smaller. Finally, the estimation error attains the Information-Theoretic Lower Bound (20), demonstrating that in this case the MAP estimator (10) achieves the optimal performance.
It should be noted that the above models assume a Markovian environment, and this condition should be checked in an experiment. To this end, one can use the criterion given in [41]. Non-Markovian models are much more complicated (see, e.g., [42]), and one would need to employ a noise model with long memory. To model long-memory noise, fractional-order stochastic equations can be used instead of (118). Such models capture long-memory effects, and their noise correlation function decays slowly, for example, as t^{−1/2}.
To implement the proposed method in real time, one can proceed as follows. First, observe that the pump signal has a simple structure, consisting of short pulses at the maximum admissible amplitude, each lasting approximately 5 μ s. These pulses should be repeated with a period of roughly 2 T 2 , and each pulse should be triggered when the vector [ F y , F z ] forms an angle of about 30° with the z-axis (i.e., 30° before the maximum of F z ). To estimate the unknown vector F and the Larmor frequency, the MAP estimator is too slow for real-time applications. Instead, an Extended Kalman Filter (EKF) can be employed in a manner roughly similar to that described in [43,44]. This approach is considered feasible for implementation in an experimental setup.

7. Discussion and Conclusions

This paper has developed a Bayesian framework for optimal input signal design in the identification of quasi-linear stochastic dynamical systems. Using an Information-Theoretic Lower Bound on the estimation error and its connection to the Bayesian Cramér–Rao Bound, we showed that maximizing mutual information provides a principled alternative to Fisher-information-based criteria. The proposed method relies on the maximization of the MI lower bound (30), which produces a tractable surrogate objective for both finite parameter sets and parameter spaces of continuum cardinality. A key contribution is the algorithmic reduction in the dimension of the covariance matrices required for inversion by a factor of N, making the method feasible for long-term experiments.
The comparison with the average D-optimal design highlights the practical benefits of the Bayesian approach. While classical methods are computationally efficient, they require complex differentiations to evaluate parameter sensitivities and may yield suboptimal results when the parameter uncertainty is large or when the system exhibits significant non-linearities. In contrast, the proposed Bayesian method requires only the system matrices A , B , C , G , together with the prior distributions of the parameter and the initial conditions, without the need to calculate derivatives of the prediction errors. This makes the method applicable to a much broader class of systems while also enabling it to handle large initial parameter uncertainty.
The method also has certain limitations. The lower bound of the MI involves exponential terms that can vanish when the pairwise distance factors d i , j ( U ) are large, which can cause numerical problems. However, this drawback can be mitigated through appropriate scaling of the optimization problem. If we consider the simplified optimization problem (36), with only two candidate parameter values, these numerical problems never occur. The discrete approximation of the MI (29) is a potential source of problems, and the weights and nodes in (38) should be carefully selected to achieve a sufficient approximation accuracy. The third limitation arises from the fact that the maximized criterion is only a lower bound on the MI and is generally not tight. Consequently, a class of problems certainly exists for which maximizing this lower bound is inefficient and may generate signals that are far from optimality in the sense of maximizing the MI (19)
In all analyzed examples, the proposed Bayesian approach, although approximate, generated signals no worse, or even better (see Figure 2 and Figure 4), than the classical methods. The second example illustrates that a non-Gaussian prior distribution leads to increased errors in the average D-optimal method. For a Gaussian prior distribution, it was confirmed that both the average D-optimal and the proposed Bayesian method yield identical results. This observation underscores the sensitivity of the classical approach to the form of the prior distribution and highlights the necessity of employing estimation techniques that are robust to non-Gaussian priors. In the third example, the D-optimal method produces results almost identical to those of the Bayesian approach. To explain this, note that in this problem the prior distribution of the parameter θ is relatively narrow. Then, the function d 1 , 2 ( U ) , which we minimize in this task, is approximately proportional to the sensitivity of the output to changes in θ . Thus, d 1 , 2 ( U ) can be interpreted as a quantity proportional to the Fisher information. Consequently, the resulting input signals and the corresponding estimation errors are nearly identical.
The study of atomic sensor models further demonstrates the practical relevance of the approach. In these examples, the optimal signals always outperform the harmonic signal whose frequency equals the expected natural frequency of the oscillator. The fourth example, a seemingly minor modification of the oscillator from the third example, shows that the dependence of the system matrices on the control signal is significant and leads to completely different optimal signals. In the analyzed examples, the MAP estimator attains the Information-Theoretic Bound (20), but this is not always the case; depending on the task, better estimators may exist, although finding them is difficult.
Since the method produces the posterior distribution p(θ, x_k | Y_k), it can easily be converted into a sequential Bayesian Adaptive Design (BAD) algorithm [5,6]. The optimal strategy is then a functional of the posterior, that is, u_k = ϕ_k(p(θ | Y_k), m_k(θ), S_k(θ)). In the simplest case, when the matrices A, B, G do not depend on u_k and Θ = {θ_1, θ_2}, the optimal strategy ϕ_k can be determined by maximizing (30) along the trajectories of the system (71)–(75). This problem is deterministic and therefore relatively simple; it can be solved using deterministic optimal control methods.
From a broader perspective, quasi-linear systems arise naturally in quantum mechanics, chemical engineering, and thermal processes, making the proposed method widely applicable. In conclusion, this work provides both theoretical justification and practical tools for Bayesian input design in quasi-linear stochastic systems. By bridging information-theoretic principles with efficient computational methods, it establishes a foundation for robust experimental design in a wide range of applications. The results reported here should stimulate further research at the intersection of Bayesian inference, control, and the identification of non-linear systems.

Author Contributions

The article concept, the proofs of Theorem 1 and Lemmas 2–4, Appendix A, Appendix B and Appendix C, the MATLAB (R2018b) code and the implementation of the Bayesian and classical signal selection methods, the development of all examples, the comparison with the classical methods, all formula derivations, and text writing: P.B. Determination of the optimal spectrum and signal generation using the MOOSE-2 toolbox in Section 6.1 and Section 6.3; verification of the correctness of formulas describing discrete systems in Section 6.1, Section 6.3, and Section 6.4; and verification of the proofs of Lemmas 3 and 4: A.W. Text proofreading, literature review, introduction, discussion, and conclusions: P.B. and A.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the statutory subsidy of the AGH University of Science and Technology, No. 16.16.120.773, and by the Initiative of Excellence-Research University (IDUB) program.

Data Availability Statement

The MATLAB codes, in particular the functions for calculating the lower bound (31) and d_{i,j} in (32) and (36), are available in the repository at https://github.com/Jhiqo/Bay_design_ql_sys (accessed on 29 September 2025).

Acknowledgments

The authors gratefully acknowledge Morgan Mitchell, Jan Kołodyński, Klaudia Dilcher, Aleksandra Sierant, Julia Amorós-Binefa, and Diana Méndez-Ávalos for the insightful discussions on quantum control and atomic magnetometers.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript: MI, Mutual Information; FIM, Fisher Information Matrix; CRB, Cramér–Rao Bound; BCRB, Bayesian Cramér–Rao Bound; ITB, Information-Theoretic Bound; DOE, Design of Experiment; SDE, Stochastic Differential Equation. The norm of the vector x ∈ R^n is denoted by |x|. For any square matrix Q, the quadratic form x^T Q x is denoted by |x|²_Q. The trace of a matrix A is denoted by tr A, and its determinant by |A| or det(A). The set of symmetric, positive definite matrices of dimension n is denoted by S_+(n). The symbol col(a_1, a_2, …, a_n) denotes the column vector with components a_1, …, a_n. ξ ∼ N(m, S) means that ξ has a normal distribution with mean m and covariance S. The density of a Gaussian variable is denoted by N(x, m, S) = (2π)^{−n/2} |S|^{−1/2} exp(−0.5 (x − m)^T S^{−1} (x − m)).

Appendix A. Proofs

Proof of Theorem 1.
Let θ̂_M(Y, U) = E_{p(θ|Y,U)}(θ | Y, U) be the Minimum Mean Squared Error (MMSE) estimator of θ. The conditional covariance of θ̂_M is given by C(Y, U) = ∫ p(θ | Y, U) (θ − θ̂_M(Y, U)) (θ − θ̂_M(Y, U))^T dθ. Since the Gaussian distribution maximizes entropy over all distributions with the same covariance, it can be proven (see [22] [Theorem 8.6.5, p. 255]) that
H_{θ|Y}(U) = −E( ln p(θ | Y, U) ) ≤ (1/2) E ln( (2πe)^{n_θ} |C(Y, U)| ).
Any covariance matrix C satisfies the inequality |C| ≤ ( n_θ^{−1} tr(C) )^{n_θ} (see [22], Theorem 17.9.4, p. 680). Hence,
ln |C(Y, U)| ≤ n_θ ln( n_θ^{−1} tr C(Y, U) ).
Taking into account (19) and Equations (A1) and (A2), we have
H_θ − I_{θ;Y}(U) = H_{θ|Y}(U) = −E( ln p(θ | Y, U) ) ≤ (1/2) E ln( (2πe)^{n_θ} |C(Y, U)| ) ≤ (n_θ/2) E ln( 2πe n_θ^{−1} tr C(Y, U) ).
According to the concavity of the logarithm and from Jensen’s inequality,
H_θ − I_{θ;Y}(U) ≤ (n_θ/2) ln( 2πe n_θ^{−1} E tr C(Y, U) ).
Using the equality E tr C(Y, U) = E |θ − θ̂_M(Y, U)|² yields
H_θ − I_{θ;Y}(U) ≤ (n_θ/2) ln( 2πe n_θ^{−1} E |θ − θ̂_M(Y, U)|² ).
Since θ̂_M is the MMSE estimator, E |θ − θ̂_M(Y, U)|² ≤ E |θ − θ̂(Y, U)|² for any estimator θ̂, and
H_θ − I_{θ;Y}(U) ≤ (n_θ/2) ln( 2πe n_θ^{−1} E |θ − θ̂(Y, U)|² ),
which is equivalent to the first inequality in (20). The proof of the Efroimovich inequality, that is, the second inequality in (20), is given in [23] [Cor. 3, Chapter 2.2, p. 16].  □
Proof of Lemma 3.
The proof of (59) and (63)–(68) is well documented in the literature and can be found in [19] and [45] [Theorem 12.3, p. 187]. However, for completeness and the convenience of the reader, we reproduce it here in its entirety. Let us denote X_k = col(x_0, …, x_k), Y_k = col(y_0, …, y_k). We begin by recalling the filtering equations. If θ is a fixed parameter, then the solution of Equation (45) is a Gauss–Markov process with the transition density
p(x_k | x_{k−1}, θ) = N( x_k, A_{k−1} x_{k−1} + B_{k−1}, G_{k−1} G_{k−1}^T ),
where we use the notation of Section 4 and we omit the arguments U and u k in all the formulas below. It follows from (46) that the density of the observations y k , conditioned on X k , Y k 1 , θ , has the form
p ( y k | X k , Y k 1 , θ ) = p ( y k | x k , θ ) = N ( y k , C x k , S v ( θ ) ) .
According to the assumptions given at the beginning of Section 4, the initial distribution of x 0 is given by
p ( x 0 | θ ) = N ( x 0 , m 0 ( θ ) , S 0 ( θ ) ) ,
where m 0 , S 0 are smooth functions, and S 0 ( θ ) > 0 , for all θ Θ . To find p ( Y k | θ ) , we proceed as follows:
p(x_k, Y_k | θ) = ∫ p(y_k, x_k, x_{k−1}, Y_{k−1} | θ) dx_{k−1} = ∫ p(y_k | x_k, x_{k−1}, Y_{k−1}, θ) p(x_k, x_{k−1}, Y_{k−1} | θ) dx_{k−1} = ∫ p(y_k | x_k, θ) p(x_k | x_{k−1}, Y_{k−1}, θ) p(x_{k−1}, Y_{k−1} | θ) dx_{k−1} = p(y_k | x_k, θ) [ ∫ p(x_k | x_{k−1}, θ) p(x_{k−1} | Y_{k−1}, θ) dx_{k−1} ] p(Y_{k−1} | θ) = p(y_k | x_k, θ) p(x_k | Y_{k−1}, θ) p(Y_{k−1} | θ),
where
p(x_k | Y_{k−1}, θ) = ∫ p(x_k | x_{k−1}, θ) p(x_{k−1} | Y_{k−1}, θ) dx_{k−1},
is the so-called predictive distribution or prediction step. Integration of both sides of (A6) over x k yields
p ( Y k | θ ) = p ( y k | Y k 1 , θ ) p ( Y k 1 | θ ) ,
where
p(y_k | Y_{k−1}, θ) = ∫ p(y_k | x_k, θ) p(x_k | Y_{k−1}, θ) dx_k,
is the predictive distribution of y k . Dividing (A6) by (A8) gives the correction step:
p(x_k | Y_k, θ) = p(y_k | x_k, θ) p(x_k | Y_{k−1}, θ) / p(y_k | Y_{k−1}, θ).
Summarizing the above considerations, we have the following algorithm.
(1)
Set the initial conditions:
p(x_0 | Y_{−1}, θ) = p(x_0 | θ),  p(Y_{−1} | θ) = 1.
(2)
For k = 0, 1, …, N, calculate
p(y_k | Y_{k−1}, θ) = ∫ p(y_k | x_k, θ) p(x_k | Y_{k−1}, θ) dx_k,
p ( Y k | θ ) = p ( y k | Y k 1 , θ ) p ( Y k 1 | θ ) ,
p(x_k | Y_k, θ) = p(y_k | x_k, θ) p(x_k | Y_{k−1}, θ) / p(y_k | Y_{k−1}, θ),
p(x_{k+1} | Y_k, θ) = ∫ p(x_{k+1} | x_k, θ) p(x_k | Y_k, θ) dx_k.
By substituting (A3)–(A5) into (A11)–(A15), and after somewhat tedious calculations, we obtain
p ( x k | Y k , θ ) = N ( x k , m k ( θ ) , S k ( θ ) ) ,
p(y_k | Y_{k−1}, θ) = N( y_k, C m_k^−(θ), Σ_k(θ) ),
where m_k(θ), S_k(θ), m_k^−(θ), Σ_k(θ), k = 0, 1, …, are given by the Kalman filtering Equations (63)–(68). Then, by using the recursive formula (A13) and (A17), we get
p(Y | θ) = ∏_{k=0}^{N} N( y_k, C m_k^−(θ), Σ_k(θ) ),
where Y = col(y_0, …, y_N). On the other hand, according to (55)–(58), we have
p(Y | θ) = N( Y, F(θ), S(θ) ) = ∏_{k=0}^{N} N( y_k, C m_k^−(θ), Σ_k(θ) ).
Taking the logarithm of both sides and calculating its expected value, we get
∫ p(Y | θ) ln p(Y | θ) dY = −(1/2) ln[ (2πe)^{n_y(N+1)} |S(θ)| ] = −(1/2) ln[ (2πe)^{n_y(N+1)} ∏_{k=0}^{N} |Σ_k(θ)| ].
Hence, |S| = ∏_{k=0}^{N} |Σ_k|, which proves (61). Now, taking the logarithm of (A19), we have
(1/2) ln |S| + (1/2) |Y − F|²_{S^{−1}} = (1/2) ln ∏_{k=0}^{N} |Σ_k| + (1/2) ∑_{k=0}^{N} |y_k − C m_k^−|²_{Σ_k^{−1}},
where the arguments have been omitted for convenience. Since |S| = ∏_{k=0}^{N} |Σ_k|, we get (60). Putting (60) and (61) into (9) gives (62).  □
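For readers who prefer an algorithmic view, the above factorization can be summarized in a few lines of MATLAB. The following fragment is an illustrative sketch rather than the published repository code: the function name is ours, and the matrices A, B, C, G are assumed constant and input-independent (in the quasi-linear case they would simply be re-evaluated inside the loop from θ and u_k). The sketch evaluates ln p(Y | θ) by inverting only the n_y × n_y innovation covariances Σ_k rather than the full covariance matrix S(θ).

function logL = log_lik_kalman(Y, A, B, C, G, Sv, m0, S0)
% Illustrative sketch: log-likelihood ln p(Y | theta) via the Kalman recursion of
% Lemma 3. Y(:,k) stores the k-th measurement; m0, S0 are the prior mean and
% covariance of x_0 for the given theta; Sv is the measurement noise covariance.
    N1 = size(Y, 2);                 % number of measurements, N + 1
    m  = m0;  S = S0;                % predictive moments of x_0
    logL = 0;
    for k = 1:N1
        Sig  = Sv + C*S*C';                          % innovation covariance Sigma_k
        e    = Y(:,k) - C*m;                         % innovation y_k - C*m_k^-
        logL = logL - 0.5*(log(det(2*pi*Sig)) + e'*(Sig\e));
        L    = S*C'/Sig;                             % Kalman gain
        mf   = m + L*e;   Sf = S - L*Sig*L';         % correction step
        m    = A*mf + B;  S  = A*Sf*A' + G*G';       % prediction step
    end
end

The cost grows linearly with N while the matrices handled at each step keep a fixed size, which is precisely the reduction by a factor of N discussed in the main text.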
Proof of Lemma 4.
Let Θ = {θ_1, …, θ_r}, θ_i ∈ R^{n_θ}. We are interested in the calculation of the quantity
d_{i,j}(U) = (1/8) Δ_{i,j}^T [ (1/2)(S_i + S_j) ]^{−1} Δ_{i,j} + (1/2) ln | (1/2)(S_i + S_j) | − (1/4) ln( |S_i| |S_j| ),
where
Δ_{i,j} = F(θ_i, U) − F(θ_j, U),  S_i = S(θ_i, U),  S_j = S(θ_j, U),
and F ( θ , U ) , S ( θ , U ) are defined by Equations (55)–(58) of Section 4. Let us define
Y^(i) = F(θ_i, U) + Z^(i),  Y^(j) = F(θ_j, U) + Z^(j),
Ỹ = (1/√2)( Y^(i) − Y^(j) ) = (1/√2)( Δ_{i,j} + Z^(i) − Z^(j) ),
where Z^(i) ∼ N(0, S_i), Z^(j) ∼ N(0, S_j). The density of the variable Ỹ, given θ̃ = col(θ_i, θ_j), has the form:
p(Ỹ | θ̃) = N( Ỹ, (1/√2) Δ_{i,j}, (1/2)(S_i + S_j) ).
Now, let us consider the following two systems:
x k + 1 ( i ) = A ( θ i , u k ) x k ( i ) + B ( θ i , u k ) + G ( θ i , u k ) w k ( i ) , y k ( i ) = C x k ( i ) + v k ( i ) ,
x k + 1 ( j ) = A ( θ j , u k ) x k ( j ) + B ( θ j , u k ) + G ( θ j , u k ) w k ( j ) , y k ( j ) = C x k ( j ) + v k ( j ) ,
where w_k^(i), w_k^(j) ∼ N(0, I) and v_k^(i), v_k^(j) ∼ N(0, S_v) are mutually independent. The components of the vectors Y^(i) and Y^(j) in (A24) correspond to the outputs of the systems (A27) and (A28), that is, Y^(i) = col(y_0^(i), …, y_N^(i)), Y^(j) = col(y_0^(j), …, y_N^(j)). Then, on the basis of (A25), we get Ỹ = col(ỹ_0, …, ỹ_N), where
ỹ_k = (1/√2)( y_k^(i) − y_k^(j) ) = (1/√2)[ C( x_k^(i) − x_k^(j) ) + v_k^(i) − v_k^(j) ].
Defining the matrices Ã_k, B̃_k, G̃_k, C̃ according to (69) and (70), and taking into account that (1/√2)( v_k^(i) − v_k^(j) ) ∼ N(0, S_v), we can replace Equations (A27)–(A29) with a single, 2n-dimensional system with n_y outputs:
x̃_{k+1} = Ã_k(θ̃) x̃_k + B̃_k(θ̃) + G̃_k(θ̃) w̃_k,  ỹ_k = C̃ x̃_k + v_k,
where x̃_k = col(x_k^(i), x_k^(j)), w̃_k = col(w_k^(i), w_k^(j)), and v_k ∼ N(0, S_v). Proceeding analogously to the proof of Lemma 3, we infer that the conditional density of the variable Ỹ is given by
p(Ỹ | θ̃) = ∏_{k=0}^{N} N( ỹ_k, C̃ m̃_k^−(θ̃), Σ̃_k(θ̃) ),
where m ˜ k ( θ ˜ ) , Σ ˜ k ( θ ˜ ) , are calculated recursively using the Kalman filter equations
Σ̃_k = S_v + C̃ S̃_k^− C̃^T,
L̃_k = S̃_k^− C̃^T Σ̃_k^{−1},
m̃_k = ( I − L̃_k C̃ ) m̃_k^− + L̃_k ỹ_k,
S̃_k = S̃_k^− − L̃_k Σ̃_k L̃_k^T,
m̃_{k+1}^− = Ã_k m̃_k + B̃_k,
S̃_{k+1}^− = Ã_k S̃_k Ã_k^T + G̃_k G̃_k^T,  k = 0, 1, …, N,
with initial conditions (76). Comparing (A26) with (A31) gives:
p(Ỹ | θ̃) = N( Ỹ, (1/√2) Δ_{i,j}, (1/2)(S_i + S_j) ) = ∏_{k=0}^{N} N( ỹ_k, C̃ m̃_k^−(θ̃), Σ̃_k(θ̃) ).
Taking the logarithm of both sides and calculating its expected value, we get:
∫ p(Ỹ | θ̃) ln p(Ỹ | θ̃) dỸ = −(1/2) ln[ (2πe)^{n_y(N+1)} | (1/2)(S_i + S_j) | ] = −(1/2) ln[ (2πe)^{n_y(N+1)} ∏_{k=0}^{N} | Σ̃_k(θ̃) | ].
Hence,
ln | (1/2)(S_i + S_j) | = ∑_{k=0}^{N} ln | Σ̃_k |.
Taking the logarithm of (A38) and applying (A40) yields:
(1/2) ( Ỹ − (1/√2) Δ_{i,j} )^T [ (1/2)(S_i + S_j) ]^{−1} ( Ỹ − (1/√2) Δ_{i,j} ) = (1/2) ∑_{k=0}^{N} | ỹ_k − C̃ m̃_k^− |²_{Σ̃_k^{−1}}.
Finally, since Ỹ = col(ỹ_0, …, ỹ_N), substituting Ỹ = 0 and ỹ_k = 0 in (A41) and (A34), we conclude that
(1/8) Δ_{i,j}^T [ (1/2)(S_i + S_j) ]^{−1} Δ_{i,j} = (1/4) ∑_{k=0}^{N} | C̃ m̃_k^− |²_{Σ̃_k^{−1}},
where m̃_k^− and Σ̃_k fulfil Equations (71)–(75). The last term in (A22) is calculated according to Lemma 3.  □
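A direct implementation of Lemma 4 can be sketched in the same spirit. The fragment below is illustrative and not the repository code: the function name and the assumption of constant, input-independent matrices are ours, and the block matrices Ã, B̃, G̃, C̃ are formed as in the construction (A27)–(A30). The routine accumulates the quadratic term of (A42) by running the augmented filter with ỹ_k = 0 and obtains the log-determinant terms from the innovation covariances of Lemma 3.

function d = pairwise_distance(Ai, Bi, Gi, m0i, S0i, Aj, Bj, Gj, m0j, S0j, C, Sv, N)
% Illustrative sketch of d_{i,j}(U): matrices are already evaluated at theta_i,
% theta_j and at the chosen input; (m0i, S0i), (m0j, S0j) are the prior moments of x_0.
    At = blkdiag(Ai, Aj);  Bt = [Bi; Bj];  Gt = blkdiag(Gi, Gj);
    Ct = (1/sqrt(2))*[C, -C];
    mt = [m0i; m0j];  St = blkdiag(S0i, S0j);     % augmented predictive moments
    Si = S0i;  Sj = S0j;                          % individual predictive covariances
    quad = 0;  ldS = 0;  ldSi = 0;  ldSj = 0;
    for k = 0:N
        Sig  = Sv + Ct*St*Ct';                    % augmented innovation covariance
        quad = quad + 0.25*(Ct*mt)'*(Sig\(Ct*mt));     % accumulates (A42)
        ldS  = ldS + log(det(Sig));                    % accumulates (A40)
        Lk   = St*Ct'/Sig;
        mt   = At*((eye(numel(mt)) - Lk*Ct)*mt) + Bt;  % filter run with y_tilde_k = 0
        St   = At*(St - Lk*Sig*Lk')*At' + Gt*Gt';
        Sgi  = Sv + C*Si*C';   ldSi = ldSi + log(det(Sgi));   % |Sigma_k(theta_i)|
        Sgj  = Sv + C*Sj*C';   ldSj = ldSj + log(det(Sgj));   % |Sigma_k(theta_j)|
        Li   = Si*C'/Sgi;      Si = Ai*(Si - Li*Sgi*Li')*Ai' + Gi*Gi';
        Lj   = Sj*C'/Sgj;      Sj = Aj*(Sj - Lj*Sgj*Lj')*Aj' + Gj*Gj';
    end
    d = quad + 0.5*ldS - 0.25*(ldSi + ldSj);      % Equation (A22)
end

With two candidate parameter values, maximizing this quantity over the admissible inputs corresponds to the simplified problem (36); for a larger set Θ, the same routine supplies all pairwise terms entering the MI lower bound.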

Appendix B. An Example of the Gap Between the ITB and BCRB

The difference between the ITB (20) and BCRB (14) can be significant. To see this, let us consider the model
y = θ + v ,
where v ∼ N(0, 1). An elementary calculation yields J_D = 1. If the prior is Gaussian, that is, p_0(θ) = N(θ, m_θ, σ_θ²), then J_P = σ_θ^{−2}, 2(H_θ − I_{θ;y}) = ln(2πe) − ln(σ_θ^{−2} + 1), and both the BCRB (14) and the ITB (20) yield the same error estimate. Now, let us assume that
p_0(θ) = (1/2)[ Φ(α(1 + θ)) + Φ(α(1 − θ)) − 1 ],
where α > 0 is a parameter, and Φ(t) = ∫_{−∞}^{t} N(s, 0, 1) ds. The prior (A44) is an analytic function which, in the limit α → ∞, tends to the uniform distribution U[−1, 1]. Since H(y|θ) = (1/2) ln(2πe), H(y) ≤ (1/2) ln( 2πe (var(θ) + 1) ), var(θ) = 1/3 + O_1(α^{−1}), H(θ) = ln 2 + O_2(α^{−1}), and H_θ − I_{θ;y} = H(y|θ) + H(θ) − H(y), after elementary calculations, we get the ITB:
E( θ − θ̂(y) )² ≥ e^{2(H_θ − I_{θ;y})} / (2πe) ≥ 3/(2πe) + O(α^{−1}).
On the other hand, according to (12), we have
J_P = (α²/2) ∫ [ N(α(1 + θ), 0, 1) − N(α(1 − θ), 0, 1) ]² / [ Φ(α(1 + θ)) + Φ(α(1 − θ)) − 1 ] dθ ≈ α/(2√π) → ∞ as α → ∞.
Hence, the BCRB (14) becomes trivial, whereas the ITB still gives a reasonable error estimate. A similar effect occurs when the likelihood is non-Gaussian. Therefore, the BCRB can substantially underestimate the minimum possible estimation error.
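The gap can also be illustrated numerically. The script below is a minimal sketch under our own assumptions (grid ranges, the value of α, and the scalar form e^{2(H_θ − I_{θ;y})}/(2πe) of the ITB taken from the proof of Theorem 1); it evaluates J_P by quadrature, the corresponding BCRB 1/(J_D + J_P) with J_D = 1, the ITB, and the exact MMSE for the model (A43) with the prior (A44).

% Numerical comparison of the BCRB and the ITB for the prior (A44); illustrative only.
alpha = 20;                                         % sharpness parameter of (A44)
Phi = @(t) 0.5*erfc(-t/sqrt(2));                    % standard normal cdf
phi = @(t) exp(-0.5*t.^2)/sqrt(2*pi);               % standard normal pdf
th  = linspace(-3, 3, 2401);   dth = th(2) - th(1);
y   = linspace(-6, 6, 1201);   dy  = y(2) - y(1);
p0  = max(0.5*(Phi(alpha*(1+th)) + Phi(alpha*(1-th)) - 1), 1e-300);
dp0 = 0.5*alpha*(phi(alpha*(1+th)) - phi(alpha*(1-th)));
JP   = sum((dp0.^2)./p0)*dth;                       % prior Fisher information, cf. (A46)
BCRB = 1/(1 + JP);                                  % J_D = 1 for y = theta + v
lik  = phi(y' - th);                                % p(y|theta) on the grid
py   = (lik*p0')*dth;                               % marginal density p(y)
post = (lik.*p0)./py;                               % posterior p(theta|y), rows indexed by y
mpost = (post*th')*dth;                             % posterior means
vpost = (post*(th.^2)')*dth - mpost.^2;             % posterior variances
MMSE  = sum(vpost.*py)*dy;                          % exact minimum mean squared error
Hth = -sum(p0.*log(p0))*dth;                        % H(theta)
Hy  = -sum(py.*log(py))*dy;                         % H(y)
Hyt = 0.5*log(2*pi*exp(1));                         % H(y|theta) for unit noise variance
ITB = exp(2*(Hth - (Hy - Hyt)))/(2*pi*exp(1));      % scalar form of the bound (20)
fprintf('BCRB = %.4f   ITB = %.4f   MMSE = %.4f\n', BCRB, ITB, MMSE);

For larger values of α the printed BCRB collapses toward zero, while the ITB and the exact MMSE remain of comparable magnitude, which is the behavior discussed above.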

Appendix C. Discretization of Linear SDE

Consider the continuous-time SDE
d x = ( A c x + B c u ) d t + G c d w ,
where x(t) ∈ R^n and w(t) ∈ R^{n_w} is a vector of mutually independent standard Wiener processes. Let Δ denote the discretization period, and let u(t) = u_k for t ∈ [t_k, t_{k+1}), t_k = kΔ. Then the process x_k = x(t_k) satisfies the difference equation:
x k + 1 = A x k + B u k + G w k ,
where w_k ∼ N(0, I_{n_w}) and
A = e^{A_c Δ},  B = ∫_0^Δ e^{A_c τ} B_c dτ,  D = G G^T = ∫_0^Δ e^{A_c τ} G_c G_c^T e^{A_c^T τ} dτ.
If D > 0 , then G can be determined using the Cholesky factorization of D . In the general case, we use spectral decomposition D = Q Λ Q T , and then G = Q Λ 0.5 .
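The matrices A, B, and D above can be computed without numerical quadrature by means of matrix exponentials (the block-exponential construction of Van Loan for D). The following MATLAB fragment is an illustrative sketch; the function name and the fallback branch for a singular D are our own choices.

function [A, B, G] = discretize_sde(Ac, Bc, Gc, Delta)
% Illustrative sketch: zero-order-hold discretization of dx = (Ac*x + Bc*u)dt + Gc*dw.
    n  = size(Ac, 1);   nu = size(Bc, 2);
    M  = expm([Ac, Bc; zeros(nu, n + nu)]*Delta);   % top blocks give [A, B]
    A  = M(1:n, 1:n);
    B  = M(1:n, n+1:end);
    V  = expm([-Ac, Gc*Gc'; zeros(n), Ac']*Delta);  % Van Loan block exponential
    D  = A*V(1:n, n+1:end);                         % D = int_0^Delta e^(Ac*t) Gc Gc' e^(Ac'*t) dt
    D  = (D + D')/2;                                % symmetrize against round-off
    [R, p] = chol(D);
    if p == 0
        G = R';                                     % D > 0: Cholesky factor, G*G' = D
    else
        [Q, Lam] = eig(D);                          % general case: spectral decomposition
        G = Q*sqrt(max(Lam, 0));                    % G = Q*Lambda^(1/2)
    end
end

In the quasi-linear setting the discretization is simply repeated for every candidate value of θ and, where the continuous-time matrices depend on the input, for every admissible value of u_k.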

References

  1. Goodwin, G.C.; Payne, R.L. Dynamic System Identification: Experiment Design and Data Analysis; Academic Press: New York, NY, USA, 1977. [Google Scholar]
  2. Ljung, L. System Identification: Theory for the User, 2nd ed.; Prentice Hall PTR: Saddle River, NJ, USA, 1999. [Google Scholar]
  3. Söderström, T.; Stoica, P. System Identification; Prentice-Hall International Series in Systems and Control Engineering; Prentice-Hall: Saddle River, NJ, USA, 1989. [Google Scholar]
  4. Pronzato, L. Optimal experimental design and some related control problems. Automatica 2008, 44, 303–325. [Google Scholar] [CrossRef]
  5. Huan, X.; Jagalur, J.; Marzouk, Y. Optimal experimental design: Formulations and computations. Acta Numer. 2024, 33, 715–840. [Google Scholar] [CrossRef]
  6. Rainforth, T.; Foster, A.; Ivanova, D.R.; Bickford Smith, F. Modern Bayesian Experimental Design. Stat. Sci. 2024, 39, 100–114. [Google Scholar] [CrossRef]
  7. Fedorov, V.V.; Hackl, P. Model-Oriented Design of Experiments; Springer: Berlin/Heidelberg, Germany, 1997; Volume 125. [Google Scholar]
  8. Lindley, D.V. On a Measure of the Information Provided by an Experiment. Ann. Math. Stat. 1956, 27, 986–1005. [Google Scholar] [CrossRef]
  9. Arimoto, S.; Kimura, H. Optimum input test signals for system identification—An information-theoretical approach. Int. J. Syst. Sci. 1971, 1, 279–290. [Google Scholar] [CrossRef]
  10. Chaloner, K.; Verdinelli, I. Bayesian Experimental Design: A Review. Stat. Sci. 1995, 10, 273–304. [Google Scholar] [CrossRef]
  11. Ryan, E.; Drovandi, C.; McGree, J.; Pettitt, A. A Review of Modern Computational Algorithms for Bayesian Optimal Design. Int. Stat. Rev. 2015, 84, 128–154. [Google Scholar] [CrossRef]
  12. Kolchinsky, A.; Tracey, B.D. Estimating Mixture Entropy with Pairwise Distances. Entropy 2017, 19, 361. [Google Scholar] [CrossRef]
  13. Kolchinsky, A.; Tracey, B.D. Estimating Mixture Entropy with Pairwise Distances. arXiv 2017, arXiv:1706.02419. [Google Scholar]
  14. Altafini, C.; Ticozzi, F. Modeling and Control of Quantum Systems: An Introduction. IEEE Trans. Autom. Control 2012, 57, 1898–1917. [Google Scholar] [CrossRef]
  15. Dong, D.; Petersen, I.R. Quantum control theory and applications: A survey. IET Control Theory Appl. 2010, 4, 2651–2671. [Google Scholar] [CrossRef]
  16. Friedly, J.C. Dynamic Behavior of Processes; Prentice-Hall International Series in the Physical and Chemical Engineering Sciences; Prentice-Hall: Englewood Cliffs, NJ, USA, 1972; p. 590. [Google Scholar]
  17. Lorenz, S.; Diederichs, E.; Telgmann, R.; Schütte, C. Discrimination of Dynamical System Models for Biological and Chemical Processes. J. Comput. Chem. 2007, 28, 1384–1399. [Google Scholar] [CrossRef]
  18. Bania, P. Bayesian Input Design for Linear Dynamical Model Discrimination. Entropy 2019, 21, 351. [Google Scholar] [CrossRef] [PubMed]
  19. Bania, P.; Baranowski, J. Field Kalman Filter and its approximation. In Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC), Las Vegas, NV, USA, 12–14 December 2016; pp. 2875–2880. [Google Scholar] [CrossRef]
  20. Bania, P. Example for equivalence of dual and information based optimal control. Int. J. Control 2018, 92, 2339–2348. [Google Scholar] [CrossRef]
  21. Baranowski, J.; Bania, P.; Prasad, I.; Cong, T. Bayesian fault detection and isolation using Field Kalman Filter. EURASIP J. Adv. Signal Process. 2017, 2017, 79. [Google Scholar] [CrossRef]
  22. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2006. [Google Scholar]
  23. Lee, K.Y. New Information Inequalities with Applications to Statistics. Ph.D. Thesis, EECS Department, University of California, Berkeley, CA, USA, 2022. UC Berkeley Technical Report. [Google Scholar]
  24. Van Trees, H.L. Detection, Estimation and Modulation Theory; Wiley: Hoboken, NJ, USA, 1968; Volume I. [Google Scholar]
  25. Efroimovich, S.Y. Information Contained in a Sequence of Observations. Probl. Peredachi Informatsii 1979, 15, 24–39. [Google Scholar]
  26. Van Trees, H.L.; Bell, K.L. (Eds.) Bayesian Bounds for Parameter Estimation and Nonlinear Filtering/Tracking; Wiley-IEEE Press: Hoboken, NJ, USA, 2007. [Google Scholar]
  27. Jakowluk, W. Optimal Input Signal Design in Control Systems Identification; Oficyna Wydawnicza Politechniki Białostockiej: Białystok, Poland, 2024; Available online: https://pb.edu.pl/oficyna-wydawnicza/wp-content/uploads/sites/4/2024/06/Optimal-input-signal-design-in-control-systems-identification.pdf (accessed on 29 September 2025).
  28. Jiménez-Martínez, R.; Kołodyński, J.; Troullinou, C.; Lucivero, V.G.; Kong, J.; Mitchell, M.W. Signal Tracking Beyond the Time Resolution of an Atomic Sensor by Kalman Filtering. Phys. Rev. Lett. 2018, 120, 040503. [Google Scholar] [CrossRef]
  29. Troullinou, C.; Shah, V.; Lucivero, V.G.; Mitchell, M.W. Squeezed-Light Enhancement and Backaction Evasion in a High-Sensitivity Optically Pumped Magnetometer. Phys. Rev. Lett. 2021, 127, 193601. [Google Scholar] [CrossRef]
  30. Bobrovsky, B.Z.; Mayer-Wolf, E.; Zakai, M. Some Classes of Global Cramér–Rao Bounds. Ann. Stat. 1987, 15, 1421–1438. [Google Scholar] [CrossRef]
  31. Jeong, M.; Dytso, A.; Cardone, M. A Comprehensive Study on Ziv-Zakai Lower Bounds on the MMSE. IEEE Trans. Inf. Theory 2025, 71, 3214–3236. [Google Scholar] [CrossRef]
  32. Huber, M.F.; Bailey, T.; Durrant-Whyte, H.; Hanebeck, U.D. On entropy approximation for Gaussian mixture random vectors. In Proceedings of the 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, Seoul, Republic of Korea, 20–22 August 2008; pp. 181–188. [Google Scholar] [CrossRef]
  33. Nocedal, J.; Wright, S.J. Numerical Optimization, 2nd ed.; Springer Series in Operations Research and Financial Engineering; Springer: New York, NY, USA, 2006. [Google Scholar] [CrossRef]
  34. Davis, P.J.; Rabinowitz, P. Methods of Numerical Integration, 2nd ed.; Academic Press: Orlando, FL, USA, 1984. [Google Scholar]
  35. Stroud, A.H. Approximate Calculation of Multiple Integrals; Prentice Hall: Englewood Cliffs, NJ, USA, 1971. [Google Scholar]
  36. Smolyak, S.A. Quadrature and interpolation formulas for tensor products of certain classes of functions. Sov. Math. Dokl. 1963, 4, 240–243. [Google Scholar]
  37. Jansson, H. Experiment Design with Applications in Identification for Control; Royal Institute of Technology (KTH): Stockholm, Sweden, 2004. [Google Scholar]
  38. Annergren, M.; Larsson, C.A. MOOSE2—A toolbox for least-costly application-oriented input design. SoftwareX 2016, 5, 96–100. [Google Scholar] [CrossRef]
  39. Fabricant, A.; Novikova, I.; Bison, G. How to build a magnetometer with thermal atomic vapor: A tutorial. New J. Phys. 2023, 25, 025001. [Google Scholar] [CrossRef]
  40. Budker, D.; Jackson Kimball, D.F. (Eds.) Optical Magnetometry; Cambridge University Press: Cambridge, UK, 2013. [Google Scholar] [CrossRef]
  41. Breuer, H.P.; Laine, E.M.; Piilo, J. Measure for the Degree of Non-Markovian Behavior of Quantum Processes in Open Systems. Phys. Rev. Lett. 2009, 103, 210401. [Google Scholar] [CrossRef]
  42. Shen, H.Z.; Shang, C.; Zhou, Y.H.; Yi, X.X. Unconventional single-photon blockade in non-Markovian systems. Phys. Rev. A 2018, 98, 023856. [Google Scholar] [CrossRef]
  43. Magrini, L.; Rosenzweig, P.; Bach, C.; Deutschmann-Olek, A.; Hofer, S.G.; Hong, S.; Kiesel, N.; Kugi, A.; Aspelmeyer, M. Real-time optimal quantum control of mechanical motion at room temperature. Nature 2021, 595, 373–377. [Google Scholar] [CrossRef]
  44. Amorós-Binefa, J.; Kołodyński, J. Noisy Atomic Magnetometry with Kalman Filtering and Measurement-Based Feedback. PRX Quantum 2025, 6, 030331. [Google Scholar] [CrossRef]
  45. Särkkä, S. Bayesian Filtering and Smoothing; Institute of Mathematical Statistics Textbooks; Cambridge University Press: Cambridge, UK, 2013; Volume 3. [Google Scholar] [CrossRef]
Figure 1. Optimal input signals resulting from the maximization of the Bayesian criterion (44) (top) and of the averaged D-optimal criterion (89) (bottom), shown for several values of ϱ .
Figure 2. Mean estimation errors of the parameters θ 1 and θ 2 , obtained using the MAP estimator (10), as functions of the maximum admissible signal norm ϱ . The results are based on a Monte Carlo simulation with 3000 repetitions. The constant (step) signal and the MOOSE signal were always assigned a norm equal to ϱ .
Figure 3. Optimal input signals obtained by maximizing the Bayesian criterion (36) (top) and the averaged D-optimal criterion (107) (bottom) subject to the constraint (3) with U ˜ = 0 .
Figure 4. Mean estimation errors of the parameter θ , obtained using the MAP estimator (10), as functions of the maximum admissible signal norm ϱ for different signals. The results are based on a Monte Carlo simulation with 6000 repetitions. Error bars show that the difference between the D-optimal and Bayesian methods is statistically significant.
Figure 5. Optimal input signals (left) and corresponding system outputs (right) obtained by maximizing the Bayesian criterion (36) (top), the averaged D-optimal criterion (107) (middle), and the spectral criterion (90) (bottom), subject to the constraint (3) with U ˜ = 0 . Maximization of the spectral criterion (90) was performed using the MOOSE-2 solver evaluated at θ = m θ . The norm of all signals is equal to 1, and the scale is consistent across all plots.
Figure 6. Mean estimation errors of the Larmor frequency f L = ω L 2 π , obtained using the MAP estimator (10), as functions of the maximum admissible signal norm ϱ for different signals. The results are based on a Monte Carlo simulation with 3000 repetitions.
Figure 7. The estimation error of the Larmor frequency f L = θ 2 π T 2 and the Information-Theoretic Bound (ITB) (20) as a function of the maximum admissible signal amplitude u max . The errors were computed using the MAP estimator (10). Both the errors and the ITB were calculated for two cases: (i) the optimal input signal and (ii) the harmonic input u ( t ) = 0.5 u max ( 1 + cos ( m θ t ) ) . Results are based on a Monte Carlo simulation with 2000 repetitions. The prior was Gaussian with a mean Larmor frequency f L = 10 kHz and with its initial uncertainty σ f L = 10 Hz.
Figure 8. The optimal signals with small, medium, and large amplitudes and the corresponding system outputs. The figure in the lower-right panel also shows the system output for the harmonic signal u ( t ) = 0.5 u m a x ( 1 + cos ( m θ t ) ) . The prior was Gaussian with a mean Larmor frequency f L = 10 kHz and with its initial uncertainty σ f L = 10 Hz.
Figure 9. Optimal input signals of small, medium, and large amplitudes with corresponding system outputs for a 5 ms experiment. The prior was Gaussian with a mean Larmor frequency f L = 10 kHz, and with its initial uncertainty σ f L = 10 Hz.
Table 1. Typical parameters.
Parameter | Abbreviation | Typical Value
Number of atoms | N_A | 10^12
Spin number | F | 1
Larmor frequencies | γ_e B / (2π) | [−50, 50] kHz
Parameter | γ_0 | 600 Hz
Parameter | α | 550 Hz
Typical relaxation time | T_2 | 0.87 ms
Typical relaxation rate | γ | 1149 Hz
Pumping rate | P | 0–200 kHz
Measurement noise level | σ² | 9.6755 × 10^6
Sampling time | Δ | 5 μs
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
