**1. Introduction**

The well-known least-squares problem [1], very often used to estimate the parameters of a mathematical model, assumes an equivalence between a matrix-vector product $Ax$ on the left and a vector $b$ on the right-hand side: the matrix $A$ is produced by the true model equations, evaluated at some operating conditions, the vector $x$ contains the unknown parameters, and the vector $b$ contains the measurements, corrupted by white Gaussian noise. This equivalence cannot be satisfied exactly, but the least-squares solution yields a minimum-variance, maximum-likelihood estimate of the parameters $x$, with a nice geometric interpretation: the resulting predictions $Ax$ are at the minimum Euclidean distance from the true measurements $b$, and the vector of residuals is orthogonal to the subspace of all possible predictions.

Unfortunately, each violation of these assumptions in general produces a bias in the estimates. Various modifications have been introduced in the literature to cope with some of them: mainly, colored noise on $b$ and/or $A$ due to model error and/or colored measurement noise. The model error is often assumed to be an additive stochastic term in the model, e.g., in error-in-variables settings [2,3], with consequent solution methods like Total Least-Squares [4] and Extended Least-Squares [5], to cite a few. All these techniques allow the model to be modified to describe, in some sense, the model error.

Here, instead, we assume that the model error depends on deterministic variables in a way that has not been included in the model, i.e., we suppose that we use a reduced model of the real system, as is often the case in applications. In this paper we propose a method to cope with the bias in the parameter estimates of the approximate model by exploiting the geometric properties of least squares and using little additional a-priori information about the norms of the modelled and un-modelled components of the system response, available with some approximation in most applications. To eliminate the bias on the parameter estimates we perturb the right-hand side without modifying the reduced model, since we assume that the reduced model accurately describes one part of the true model.

### **2. Model Problem**

In applied mathematics, physical models are often available, usually rather precise at describing quantitatively the main phenomena, but not satisfactory at the level of detail required by the application at hand. Here we refer to models described by differential equations, with ordinary and/or partial

derivatives, commonly used in engineering and applied sciences. We assume, therefore, that there are two models at hand: a true, unknown model $M$ and an approximate, known model $M_a$. These models are usually parametric and must be tuned to describe a specific physical system, using a-priori knowledge about the application and experimental measurements. Model tuning, and in particular parameter estimation, is usually done with a prediction-error minimization criterion that makes the model response a good approximation of the dynamics shown by the measured variables used in the estimation process. Assuming that the true model $M$ is linear in the parameters that must be estimated, the application of this criterion leads to a linear least-squares problem:

$$\bar{x} = \operatorname*{argmin}_{x' \in \mathbb{R}^n} \|Ax' - \bar{f}\|^2, \tag{1}$$

where, from here on, $\|\cdot\|$ is the Euclidean norm, $A \in \mathbb{R}^{m\times n}$ is supposed full rank, $\mathrm{rank}(A) = n$, $m \geq n$, $\bar{x} \in \mathbb{R}^{n\times 1}$, $Ax$ are the model response values and $\bar{f}$ is the vector of experimental measurements. Usually the measured data contain noise, i.e., we measure $f = \bar{f} + \epsilon$, with $\epsilon$ a certain kind of additive noise (e.g., white Gaussian). Since we are interested here in the algebraic and geometric aspects of the problem, we suppose $\epsilon = 0$ and set $f = \bar{f}$. Moreover, we assume ideally that $\bar{f} = A\bar{x}$ holds exactly. Let us consider also the estimation problem for the approximate model $M_a$:

$$x^{\parallel} = \operatorname*{argmin}_{x' \in \mathbb{R}^{n_a}} \|A_a x' - \bar{f}\|^2, \tag{2}$$

where $A_a \in \mathbb{R}^{m\times n_a}$, $x^{\parallel} \in \mathbb{R}^{n_a\times 1}$, with $n_a < n$. The notation for $x^{\parallel}$ is meant to recall that the least-squares solution satisfies $A_a x^{\parallel} = P_{\mathcal{A}_a}(\bar{f}) =: f^{\parallel}$, where $f^{\parallel}$ is the orthogonal projection of $\bar{f}$ on the subspace generated by the columns of $A_a$, and the residual $A_a x^{\parallel} - \bar{f}$ is orthogonal to this subspace. Let us suppose that $A_a$ corresponds to the first $n_a$ columns of $A$, which means that the approximate model $M_a$ is exactly one part of the true model $M$, i.e., $A = [A_a, A_u]$, and so the solution $\bar{x}$ of (1) can be decomposed in two parts such that

$$A\bar{x} = [A_a, A_u]\begin{bmatrix}\bar{x}_a\\ \bar{x}_u\end{bmatrix} = A_a\bar{x}_a + A_u\bar{x}_u = \bar{f}. \tag{3}$$

This means that the model error corresponds to an additive term $A_u\bar{x}_u$ in the estimation problem. Note that the columns of $A_a$ are linearly independent since $A$ is supposed to be of full rank. We do not consider the case in which $A_a$ is rank-deficient, because it would mean that the model is not well parametrized. Moreover, even a small amount of noise in the data is sufficient to make the matrix full rank.
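As a minimal numerical sketch of problem (1) (synthetic, illustrative data; `numpy` is assumed available, and all sizes and values below are hypothetical), the noiseless, full-rank, consistent case is solved exactly by least squares:

```python
import numpy as np

# Problem (1) on synthetic data: A is full column rank and f_bar = A x_bar
# holds exactly, so least squares recovers x_bar (all values illustrative).
rng = np.random.default_rng(0)
m, n = 20, 4
A = rng.standard_normal((m, n))      # full rank with probability 1
x_bar = rng.standard_normal(n)       # "true" parameters
f_bar = A @ x_bar                    # noiseless measurements

# x_hat = argmin_x ||A x - f_bar||^2
x_hat, *_ = np.linalg.lstsq(A, f_bar, rcond=None)
print(np.allclose(x_hat, x_bar))     # True: the consistent system is solved exactly
```

The residual $A\hat{x} - \bar{f}$ is zero here precisely because $\bar{f}$ lies in the range of $A$; the rest of the paper concerns what happens when only a subset of the columns of $A$ is available.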

For brevity, we will call $\mathcal{A}$ the subspace generated by the columns of $A$, and $\mathcal{A}_a$, $\mathcal{A}_u$ the subspaces generated by the columns of $A_a$, $A_u$, respectively. Note that if $\mathcal{A}_a$ and $\mathcal{A}_u$ were orthogonal, decomposition (3) would be orthogonal. However, in the following we will consider the case in which the two subspaces are not orthogonal, as commonly happens in practice. Oblique projections, even if not as common as orthogonal ones, have a large literature, e.g., [6,7].

Now, it is well known and easy to demonstrate that, when we solve problem (2) and $\mathcal{A}_u$ is not orthogonal to $\mathcal{A}_a$, we get a biased solution, i.e., $x^{\parallel} \neq \bar{x}_a$:

**Lemma 1.** *Given $A \in \mathbb{R}^{m\times n}$ with $n \geq 2$ and $A = [A_a, A_u]$, and given $\bar{f} \in \mathbb{R}^{m\times 1}$ with $\bar{f} \notin \mathrm{Im}(A_a)$, call $x^{\parallel}$ the least-squares solution of* (2) *and $\bar{x} = [\bar{x}_a, \bar{x}_u]$ the solution of* (1) *decomposed as in* (3)*. Then*

$$x^{\parallel} = \bar{x}_a \iff \mathcal{A}_u \perp \mathcal{A}_a.$$
**Proof.** The least-squares problem (2) boils down to finding $x^{\parallel}$ such that $A_a x^{\parallel} = P_{\mathcal{A}_a}(\bar{f})$. Let us consider the unique decomposition of $\bar{f}$ on $\mathcal{A}_a$ and $\mathcal{A}_a^{\perp}$ as $\bar{f} = f^{\parallel} + f^{\perp}$, with $f^{\parallel} = P_{\mathcal{A}_a}(\bar{f})$ and $f^{\perp} = P_{\mathcal{A}_a^{\perp}}(\bar{f})$. Call $\bar{f} = f_a + f_u$ the decomposition on $\mathcal{A}_a$ and $\mathcal{A}_u$; hence there exist two vectors $\bar{x}_a \in \mathbb{R}^{n_a}$, $\bar{x}_u \in \mathbb{R}^{n-n_a}$ such that $f_a = A_a\bar{x}_a$ and $f_u = A_u\bar{x}_u$. If $\mathcal{A}_u \perp \mathcal{A}_a$ then the two decompositions coincide, hence $f^{\parallel} = f_a$ and so $x^{\parallel} = \bar{x}_a$. Otherwise, by the definition of orthogonal projection ([6], third point of the definition at page 429), it must hold $x^{\parallel} \neq \bar{x}_a$. □
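Lemma 1 can be illustrated numerically; the following sketch uses illustrative sizes and values (none of them come from the paper) and contrasts a correlated $A_u$ with an orthogonal one:

```python
import numpy as np

# Numerical illustration of Lemma 1: fitting only the reduced model A_a
# biases the estimate of x_bar_a exactly when the columns of A_u are not
# orthogonal to those of A_a (illustrative data throughout).
rng = np.random.default_rng(1)
m, n_a = 30, 2
A_a = rng.standard_normal((m, n_a))
A_u = rng.standard_normal((m, 1)) + A_a @ np.array([[0.5], [0.5]])  # correlated with A_a
A = np.hstack([A_a, A_u])
x_bar = np.array([1.0, -2.0, 0.7])                    # [x_bar_a; x_bar_u]
f_bar = A @ x_bar

x_par, *_ = np.linalg.lstsq(A_a, f_bar, rcond=None)   # solution of (2)
print(np.allclose(x_par, x_bar[:n_a]))                # False: biased

# If instead A_u is orthogonal to A_a, the bias disappears.
v = rng.standard_normal((m, 1))
P = A_a @ np.linalg.pinv(A_a)                         # orthogonal projector on A_a
A_u_orth = v - P @ v                                  # component orthogonal to A_a
f2 = A_a @ x_bar[:n_a] + 0.7 * A_u_orth.ravel()
x_par2, *_ = np.linalg.lstsq(A_a, f2, rcond=None)
print(np.allclose(x_par2, x_bar[:n_a]))               # True: unbiased
```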

### **3. Analysis of the Parameter Estimation Error**

The aim of this paper is to propose a method that substantially decreases the bias of the solution of the approximated problem (2), using minimal additional information about the norms of the model error and of the modelled part of the response.

In this section we will introduce sufficient conditions to remove the bias and retrieve the true solution in a unique way, as summarized in Lemma 4. Let us start with a definition.

**Definition 1** (Intensity Ratio)**.** *The intensity ratio $I_f$ between modelled and un-modelled dynamics is defined as*

$$I_f = \frac{\|A_a \bar{x}_a\|}{\|A_u \bar{x}_u\|}.$$

In the following we assume that a good approximation of this intensity ratio is available and that its magnitude is sufficiently large, i.e., we have an approximate model that is quite accurate. This information about the model error will be used to reduce the bias, as shown in the following sections. Moreover, we will also consider the norm $N_f = \|A_a\bar{x}_a\|$ (or, equivalently, the norm $\|A_u\bar{x}_u\|$).
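The two quantities $N_f$ and $I_f$ are straightforward to compute when the decomposition is known; the sketch below uses purely illustrative synthetic data, with a deliberately small un-modelled component:

```python
import numpy as np

# Computing N_f and the intensity ratio I_f of Definition 1 for a synthetic
# decomposition f_bar = A_a x_a + A_u x_u (all values illustrative).
rng = np.random.default_rng(2)
m = 50
A_a = rng.standard_normal((m, 3))
A_u = rng.standard_normal((m, 2))
x_a = np.array([2.0, -1.0, 0.5])
x_u = np.array([0.05, -0.02])            # small un-modelled component

N_f = np.linalg.norm(A_a @ x_a)          # norm of the modelled response
I_f = N_f / np.linalg.norm(A_u @ x_u)    # intensity ratio
print(I_f > 1.0)                         # True: the reduced model is accurate
```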

#### *3.1. The Case of Exact Knowledge about $I_f$ and $N_f$*

Here we assume, initially, to know the exact values of $I_f$ and $N_f$, i.e.,

$$\begin{cases} N_f = \|A_a \bar{x}_a\|,\\ I_f = \dfrac{\|A_a \bar{x}_a\|}{\|A_u \bar{x}_u\|}. \end{cases} \tag{4}$$

This ideal setting is important for understanding the problem before moving to more practical assumptions. First of all, let us show a nice geometric property that relates $x_a$ and $f_a$ under a condition like (4).

**Lemma 2.** *The problem of finding the set of $x_a \in \mathbb{R}^{n_a}$ that give constant, prescribed values for $I_f$ and $N_f$ is equivalent to that of finding the set of $f_a = A_a x_a \in \mathcal{A}_a$ of the decomposition $\bar{f} = f_a + f_u$ (see the proof of Lemma 1) lying on the intersection of $\mathcal{A}_a$ and the boundaries of two $n$-dimensional balls in $\mathbb{R}^n$. In fact, it holds:*

$$\begin{cases} N_f = \|A_a x_a\| \\ I_f = \dfrac{\|A_a x_a\|}{\|A_u x_u\|} \end{cases} \iff \begin{cases} f_a \in \partial B_n(0, N_f) \\ f_a \in \partial B_n(f^{\parallel}, T_f) \end{cases} \qquad \text{with} \quad T_f := \sqrt{\left(\frac{N_f}{I_f}\right)^2 - \|f^{\perp}\|^2}. \tag{5}$$

**Proof.** For every $x_a \in \mathbb{R}^{n_a}$ it holds

$$\begin{cases} N_f = \|f_a\| = \|A_a x_a\|\\ I_f = \dfrac{\|f_a\|}{\|f_u\|} = \dfrac{N_f}{\|f_u^{\parallel} + f_u^{\perp}\|} = \dfrac{N_f}{\sqrt{\|f^{\perp}\|^2 + \|f^{\parallel} - A_a x_a\|^2}} = \dfrac{N_f}{\sqrt{\|f^{\perp}\|^2 + \|f^{\parallel} - f_a\|^2}} \end{cases} \tag{6}$$

$$\iff \begin{cases} \|f_a\| = N_f \\ \|f^{\parallel} - f_a\| = \sqrt{\left(\dfrac{N_f}{I_f}\right)^2 - \|f^{\perp}\|^2} =: T_f, \end{cases} \tag{7}$$

where we used the fact that $f_u = f_u^{\parallel} + f_u^{\perp}$ with $f_u^{\perp} := P_{\mathcal{A}_a^{\perp}}(f_u) = f^{\perp}$, $f_u^{\parallel} := P_{\mathcal{A}_a}(f_u) = A_a\,\delta x_a = f^{\parallel} - A_a x_a$, and $\delta x_a = (x^{\parallel} - x_a)$. Hence the equivalence (5) is proved. □

Given $I_f$ and $N_f$, we call the *feasible set of accurate model responses* the set of all $f_a$ that satisfy relations (5). Lemma 2 allows us to reformulate problem (2) as the problem of finding a feasible $f_a$ that, substituted for $\bar{f}$ in (2), gives as solution an unbiased estimate of $\bar{x}_a$. Indeed, it is easy to note that $A_a\bar{x}_a$ belongs to this feasible set. Moreover, since $f_a \in \mathcal{A}_a$, we can reduce the dimensionality of the problem and work on the subspace $\mathcal{A}_a$, which has dimension $n_a$, instead of the global space $\mathcal{A}$ of dimension $n$. To this aim, let us consider $U_a$, the left matrix of the SVD decomposition of $A_a$, $A_a = U_a S_a V_a^T$, and complete its columns to an orthonormal basis to obtain a matrix $U$. Since the vectors $f_a$, $f^{\parallel}$ belong to the subspace $\mathcal{A}_a$, the vectors $\tilde{f}_a$, $\tilde{f}^{\parallel}$ defined such that $f_a = U\tilde{f}_a$ and $f^{\parallel} = U\tilde{f}^{\parallel}$ must have zeros beyond the first $n_a$ components. Since $U$ has orthonormal columns, it preserves norms, and so $\|f^{\parallel}\| = \|\tilde{f}^{\parallel}\|$ and $\|f_a\| = \|\tilde{f}_a\|$. If we call $\hat{f}_a$, $\hat{f}^{\parallel} \in \mathbb{R}^{n_a}$ the first $n_a$ components of the vectors $\tilde{f}_a$, $\tilde{f}^{\parallel}$ (which again have the same norms as the full vectors), respectively, we have

$$\begin{cases} \hat{f}_a \in \partial B_{n_a}(0, N_f), \\ \hat{f}_a \in \partial B_{n_a}(\hat{f}^{\parallel}, T_f). \end{cases} \tag{8}$$

In this way the problem depends only on the dimension of the known subspace, i.e., the value of $n_a$, and does not depend on the dimensions $m \geq n_a$ and $n > n_a$. From (8) we can deduce the equation of the $(n_a - 2)$-dimensional boundary of an $(n_a - 1)$-ball to which the vector $\hat{f}_a$ must belong. In the following we discuss the various cases.
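The dimensionality reduction above can be sketched numerically; the following example (illustrative sizes, working in the ambient space of the columns of $A_a$) completes $U_a$ to an orthonormal basis and checks that subspace vectors have vanishing trailing coordinates and preserved norms:

```python
import numpy as np

# Complete U_a from the SVD of A_a to an orthonormal basis U; vectors of the
# subspace A_a then have zeros past the first n_a coordinates, and norms are
# preserved (illustrative data).
rng = np.random.default_rng(3)
m, n_a = 6, 2
A_a = rng.standard_normal((m, n_a))
U_a, S_a, VT_a = np.linalg.svd(A_a, full_matrices=False)   # U_a: m x n_a

# Complete U_a to an orthonormal basis via QR of [U_a | random block].
M = np.hstack([U_a, rng.standard_normal((m, m - n_a))])
U, _ = np.linalg.qr(M)                   # first n_a columns still span A_a

f_a = A_a @ np.array([1.0, -0.5])        # any vector of the subspace A_a
f_tilde = U.T @ f_a                      # coordinates in the completed basis
print(np.allclose(f_tilde[n_a:], 0.0))   # True: trailing coordinates vanish
print(np.isclose(np.linalg.norm(f_tilde), np.linalg.norm(f_a)))  # True
```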

#### 3.1.1. Case $n_a = 1$

In this case, we have a unique solution when both conditions on $I_f$ and $N_f$ are imposed. When only one of the two is imposed, two solutions are found, as shown in Figure 1a,c. Figure 1b shows the intensity ratio $I_f$.


**Figure 1.** Case $n_a = 1$. (**a**): Case $n_a = 1$, $m = n = 2$. Solutions with the condition on $N_f$. In the figure: the true decomposition obtained imposing both conditions (blue), the orthogonal decomposition (red), and another possible decomposition (green) that satisfies the same norm condition $N_f$ but a different $I_f$; (**b**): Case $n_a = 1$. Intensity ratio value w.r.t. the norm of the vector $A_a x_a$: for a fixed value of the intensity ratio there can be two solutions, i.e., two possible decompositions of $\bar{f}$ as the sum of two vectors with the same intensity ratio; (**c**): Case $n_a = 1$, $m = n = 2$. Solutions with the condition on $I_f$. In the figure: the true decomposition obtained imposing both conditions (blue), the orthogonal decomposition (red), and another possible decomposition (green) with the same intensity ratio $I_f$ but a different $N_f$.

#### 3.1.2. Case $n_a = 2$

Consider the vectors $\hat{f}_a$, $\hat{f}^{\parallel} \in \mathbb{R}^{n_a=2}$ as defined previously; in particular, we are looking for $\hat{f}_a = [\xi_1, \xi_2] \in \mathbb{R}^2$. Hence, conditions (8) can be written as

$$\begin{cases} \xi_1^2 + \xi_2^2 = N_f^2\\ (\xi_1 - \hat{f}^{\parallel}_{\xi_1})^2 + (\xi_2 - \hat{f}^{\parallel}_{\xi_2})^2 = T_f^2 \end{cases} \quad \longrightarrow \quad \mathcal{F}: \quad (\hat{f}^{\parallel}_{\xi_1})^2 - 2\hat{f}^{\parallel}_{\xi_1}\xi_1 + (\hat{f}^{\parallel}_{\xi_2})^2 - 2\hat{f}^{\parallel}_{\xi_2}\xi_2 = T_f^2 - N_f^2,\tag{9}$$

where the right equation is the $(n_a - 1) = 1$-dimensional subspace (line) $\mathcal{F}$ obtained by subtracting the first equation from the second. This subspace has to be intersected with one of the starting circumferences to obtain the feasible vectors $\hat{f}_a$, as can be seen in Figure 2a and its projection on $\mathcal{A}_a$ in Figure 2b. The intersection of the two circumferences (5) can have a different number of solutions depending on the value of $(N_f - \|f^{\parallel}\|) - T_f$. When this value is strictly positive there are zero solutions; this means that the estimates of $I_f$ and $N_f$ are not correct, and we are not interested in this case because we suppose the two values to be sufficiently well estimated. When the value is strictly negative there are two solutions, which coincide when the value is zero.
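For $n_a = 2$ the two candidate vectors $\hat{f}_a$ can also be computed in closed form as the intersection of the two circumferences; the helper below is a small sketch with illustrative numbers (it is not a routine from the paper, just the standard two-circle intersection formulas):

```python
import numpy as np

# Intersection of the circle ||f_a|| = N_f with the circle ||f_a - f_par|| = T_f
# in the plane (case n_a = 2).
def circle_intersections(N_f, f_par, T_f):
    d = np.linalg.norm(f_par)              # distance between the two centres
    a = (N_f**2 - T_f**2 + d**2) / (2*d)   # foot of the chord along f_par
    h2 = N_f**2 - a**2                     # squared half-length of the chord
    if h2 < 0:
        return []                          # inconsistent N_f, T_f estimates
    e = f_par / d
    e_perp = np.array([-e[1], e[0]])       # unit normal to f_par in the plane
    h = np.sqrt(h2)
    return [a*e + h*e_perp, a*e - h*e_perp]

f_par = np.array([3.0, 1.0])               # illustrative projection vector
sols = circle_intersections(N_f=2.5, f_par=f_par, T_f=1.5)
for s in sols:                             # both candidates satisfy (9)
    print(np.isclose(np.linalg.norm(s), 2.5),
          np.isclose(np.linalg.norm(s - f_par), 1.5))
```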

**Figure 2.** Case $n_a = 2$. (**a**): Case $n_a = 2$, $m = n = 3$, with $A_a x_a = [A_a(1)\ A_a(2)][x_a(1)\ x_a(2)]^T$. In the figure: the true decomposition (blue), the orthogonal decomposition (red), and another possible decomposition among the infinitely many (green); (**b**): Case $n_a = 2$, $m = n = 3$. Projection of the two circumferences on the subspace $\mathcal{A}_a$, and projections of the possible decompositions of $\bar{f}$ (red, blue and green).

When there are two solutions, we do not have sufficient information to determine which of the two is the true one, i.e., the one that gives $f_a = A_a\bar{x}_a$: we can choose neither the one with minimum residual nor the vector $f_a$ with the minimum angle with $f^{\parallel}$, because both solutions have the same values of these two quantities. However, since we suppose the linear system to originate from an input/output system, where the matrix $A_a$ is a function also of the input and $\bar{f}$ collects the measurements of the output, we can take two tests with different inputs. Since all the solution sets contain the true parameter vector, we can determine the true solution from their intersection, unless the solutions of the two tests coincide. The condition for coincidence is expressed in Lemma 3.

Let us call $A_{a,i} \in \mathbb{R}^{n\times n_a}$ the matrix of test $i = 1, 2$, to which corresponds a vector $f_i$. The line on which the two feasible vectors $f_a$ of test $i$ lie is $\mathcal{F}_i$, and $\mathcal{S}_i = A_{a,i}^{\dagger}\mathcal{F}_i$ is the line through the two solution points. To have two tests with non-coincident solutions, we need these two lines $\mathcal{S}_1, \mathcal{S}_2$ not to have more than one common point, which in the case $n_a = 2$ is equivalent to $\mathcal{S}_1 \neq \mathcal{S}_2$, i.e., $A_{a,1}^{\dagger}\mathcal{F}_1 \neq A_{a,2}^{\dagger}\mathcal{F}_2$, i.e., $\mathcal{F}_1 \neq A_{a,1}A_{a,2}^{\dagger}\mathcal{F}_2 =: \mathcal{F}_{12}$. We represent the lines $\mathcal{F}_i$ by means of their orthogonal vector from the origin $f^{ort,i} = l_{ort,i}\frac{f_i^{\parallel}}{\|f_i^{\parallel}\|}$. We introduce the matrices $C_a$, $C_f$, $C_{fp}$ such that $A_{a,2} = C_a A_{a,1}$, $f_2 = C_f f_1$, $f_2^{\parallel} = C_{fp} f_1^{\parallel}$, and the scalar $k_f$ such that $\|f_2^{\parallel}\| = k_f \|f_1^{\parallel}\|$.

**Lemma 3.** *Consider two tests $i = 1, 2$ from the same system with $n_a = 2$, with the above notation. Then it holds $\mathcal{F}_1 = \mathcal{F}_{12}$ if and only if $C_a = C_{fp}$.*

**Proof.** From the relation $f_i^{\parallel} = P_{\mathcal{A}_{a,i}}(f_i) = A_{a,i}(A_{a,i}^T A_{a,i})^{-1}A_{a,i}^T f_i$, we have

$$f_2^{\parallel} = A_{a,2}(A_{a,2}^T A_{a,2})^{-1}A_{a,2}^T f_2 = C_a A_{a,1}(A_{a,1}^T C_a^T C_a A_{a,1})^{-1}A_{a,1}^T C_a^T C_f f_1. \tag{10}$$

It holds $\mathcal{F}_1 = \mathcal{F}_{12} \iff f^{ort,1} = f^{ort,12} := A_{a,1}A_{a,2}^{\dagger}f^{ort,2}$, hence we will show this second equivalence. We note that $l_{ort,2} = k_f\, l_{ort,1}$ and calculate

$$f^{ort,12} = A_{a,1}A_{a,2}^{\dagger}f^{ort,2} = A_{a,1}A_{a,1}^{\dagger}C_a^{\dagger}\left(l_{ort,2}\frac{f_2^{\parallel}}{\|f_2^{\parallel}\|}\right) = A_{a,1}A_{a,1}^{\dagger}C_a^{\dagger}\left(k_f\, l_{ort,1}\frac{C_{fp}f_1^{\parallel}}{k_f\|f_1^{\parallel}\|}\right) = A_{a,1}A_{a,1}^{\dagger}C_a^{\dagger}C_{fp}f^{ort,1}.\tag{11}$$

Now let us call $s^{ort,1}$ the vector such that $f^{ort,1} = A_{a,1}s^{ort,1}$; then, using the fact that $C_a = C_{fp}$, we obtain

$$f^{ort,12} = A_{a,1}A_{a,1}^{\dagger}C_a^{\dagger}C_{fp}A_{a,1}s^{ort,1} = A_{a,1}\big(A_{a,1}^{\dagger}A_{a,1}\big)s^{ort,1} = A_{a,1}s^{ort,1} = f^{ort,1}, \tag{12}$$

since $A_{a,1}^{\dagger}A_{a,1} = I_{\mathbb{R}^{n_a}}$.

Hence we have $\mathcal{F}_{12} = \mathcal{F}_1 \iff A_{a,1}A_{a,1}^{\dagger}C_a^{\dagger}C_{fp}f^{ort,1} = f^{ort,1} \iff C_a^{\dagger}C_{fp} = I$. □

#### 3.1.3. Case $n_a \geq 3$

More generally, for the case $n_a \geq 3$, consider the vectors $\hat{f}_a$, $\hat{f}^{\parallel} \in \mathbb{R}^{n_a}$ as defined previously; in particular, we are looking for $\hat{f}_a = [\xi_1, \ldots, \xi_{n_a}] \in \mathbb{R}^{n_a}$. Conditions (8) can be written as

$$\begin{cases} \sum_{i=1}^{n_a}\xi_i^2 = N_f^2\\ \sum_{i=1}^{n_a}(\xi_i - \hat{f}^{\parallel}_{\xi_i})^2 = T_f^2 \end{cases} \quad \longrightarrow \quad \mathcal{F}: \quad \sum_{i=1}^{n_a}\left((\hat{f}^{\parallel}_{\xi_i})^2 - 2\hat{f}^{\parallel}_{\xi_i}\xi_i\right) = T_f^2 - N_f^2, \tag{13}$$

where the two equations on the left are two $(n_a - 1)$-spheres, i.e., the boundaries of two $n_a$-dimensional balls. Analogously to the case $n_a = 2$, the intersection of these equations can be empty, one point, or the boundary of an $(n_a - 1)$-dimensional ball (with the same conditions on $(N_f - \|f^{\parallel}\|) - T_f$). The equation on the right of (13) is the $(n_a - 1)$-dimensional subspace $\mathcal{F}$ on which the boundary of the $(n_a - 1)$-dimensional ball of the feasible vectors $\hat{f}_a$ lies, and is obtained by subtracting the first equation from the second one. Figure 3a shows a graphical representation of the decomposition $\bar{f} = f_a + f_u$ for the case $n_a = 3$, and Figure 3b the solution ellipsoids of three tests whose intersection is one point. Figure 4a shows the solution hyperellipsoids of four tests whose intersection is one point, in the case $n_a = 4$.

We note that, to obtain one unique solution $x_a$, we must intersect the solutions of at least two tests. Let us give a more precise idea of what happens in general. Given $i = 1, \ldots, n_a$ tests we call, as in the previous case, $f^{ort,i}$ the vector orthogonal to the $(n_a - 1)$-dimensional subspace $\mathcal{F}_i$ that contains the feasible $\hat{f}_a$, and $\mathcal{S}_i = A_{a,i}^{\dagger}\mathcal{F}_i$. We project this subspace on $\mathcal{A}_{a,1}$ and obtain $\mathcal{F}_{1i} = A_{a,1}A_{a,i}^{\dagger}\mathcal{F}_i$, which we describe through its orthogonal vector $f^{ort,1i} = A_{a,1}A_{a,i}^{\dagger}f^{ort,i}$. If the vectors $f^{ort,1}, f^{ort,12}, \ldots, f^{ort,1n_a}$ are linearly independent, the $(n_a - 1)$-dimensional subspaces $\mathcal{F}_1, \mathcal{F}_{12}, \ldots, \mathcal{F}_{1n_a}$ intersect in one point. Figure 4b shows an example in which, in the case $n_a = 3$, the vectors $f^{ort,1}, f^{ort,12}, f^{ort,13}$ are not linearly independent. The three solution sets of this example intersect in two points; hence, for $n_a = 3$, three tests are not always sufficient to determine a unique solution.

**Figure 3.** Case $n_a = 3$. (**a**): Case $n_a = 3$, $m = n = 4$, $n - n_a = 1$: in the picture, $f^{\parallel}$ is the projection of $\bar{f}$ on $\mathcal{A}_a$. The decompositions that satisfy the conditions on $I_f$ and $N_f$ are the ones with $f_a$ lying on the red circumference on the left. The spheres determined by the conditions are shown in yellow for the vector $f_a$ and in blue for the vector $\bar{f} - f_a$. Two feasible decompositions are shown in blue and green; (**b**): Case $n_a = 3$. Intersection of three hyperellipsoids, the solution sets of $x_a$ for three different tests, in the space $\mathbb{R}^{n_a=3}$.

**Figure 4.** Case $n_a \geq 3$. (**a**): Case $n_a = 4$. Intersection of four hyperellipsoids, the solution sets of $x_a$ for four different tests, in the space $\mathbb{R}^{n_a=4}$; (**b**): Case $n_a = 3$. Example of three tests for which the solution sets have an intersection bigger than one single point. The three $(n_a - 1)$-dimensional subspaces $\mathcal{F}_1, \mathcal{F}_{12}, \mathcal{F}_{13}$ in the space generated by $A_{a,1}$ intersect in a line, and their three orthogonal vectors are not linearly independent.

**Lemma 4.** *For all $n_a > 1$, the condition that, given $i = 1, \ldots, n_a$ tests, the $n_a$ hyperplanes $\mathcal{S}_i = A_{a,i}^{\dagger}\mathcal{F}_i$ previously defined have linearly independent normal vectors is sufficient to determine one unique intersection, i.e., one unique solution vector $\bar{x}_a$ that satisfies the system of conditions* (4) *for each test.*

**Proof.** The intersection of $n_a$ independent hyperplanes in $\mathbb{R}^{n_a}$ is a point. Given a test $i$, let $\mathcal{S}_i = A_{a,i}^{\dagger}\mathcal{F}_i$ be the affine subspace of that test:

$$\mathcal{S}_i = v_i + \mathcal{W}_i = \{v_i + w \in \mathbb{R}^{n_a} \,:\, w \cdot \mathbf{n}_i = 0\} = \{x \in \mathbb{R}^{n_a} \,:\, \mathbf{n}_i^T(x - v_i) = 0\},$$

where $\mathbf{n}_i$ is the normal vector of the linear subspace $\mathcal{W}_i$ and $v_i$ is the translation with respect to the origin.

The conditions on $\mathcal{S}_i$ relative to the $n_a$ tests correspond to a linear system $Ax = b$, where $\mathbf{n}_i$ is the $i$-th row of $A$ and each component of the vector $b$ is given by $b_i = \mathbf{n}_i^T v_i$. The matrix $A$ has full rank because of the linear independence of the vectors $\mathbf{n}_i$; hence the solution of the linear system is unique.

The uniqueness of the intersection relies on the hypothesis of full column rank of the matrices $A_{a,i}$: this condition implies that the pseudoinverses $A_{a,i}^{\dagger}$ map the surfaces $\mathcal{F}_i$ to hyperplanes $\mathcal{S}_i = A_{a,i}^{\dagger}\mathcal{F}_i$. □

For example, with $n_a = 2$ (Lemma 3) this condition amounts to considering two tests with non-coincident lines $\mathcal{S}_1, \mathcal{S}_2$, i.e., two non-coincident $\mathcal{F}_1, \mathcal{F}_{12}$.
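The linear system in the proof of Lemma 4 can be sketched directly; the data below are illustrative, with the translations $v_i$ chosen so that a known point lies on every hyperplane:

```python
import numpy as np

# n_a hyperplanes n_i^T (x - v_i) = 0 with linearly independent normals meet
# in exactly one point, recovered by solving A x = b with b_i = n_i^T v_i.
rng = np.random.default_rng(4)
n_a = 3
x_true = np.array([1.0, 2.0, -1.0])        # common point of the hyperplanes
normals = rng.standard_normal((n_a, n_a))  # independent with probability 1
b = normals @ x_true                       # b_i = n_i^T v_i, with x_true on S_i

x_sol = np.linalg.solve(normals, b)
print(np.allclose(x_sol, x_true))          # True: unique intersection point
```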

#### *3.2. The Case of Approximate Knowledge of $I_f$ and $N_f$ Values*

Let us consider $N$ tests and call $I_{f,i}$, $N_{f,i}$ and $T_{f,i}$ the values defined in Lemma 2, relative to test $i$. Since the two systems of conditions

$$\begin{cases} N_{f,i} = \|A_{a,i}x_a\| \\ I_{f,i} = \dfrac{\|A_{a,i}x_a\|}{\|\bar{f}_i - A_{a,i}x_a\|} \end{cases} \quad \text{and} \quad \begin{cases} N_{f,i} = \|A_{a,i}x_a\| \\ T_{f,i} = \|f_i^{\parallel} - A_{a,i}x_a\| \end{cases} \tag{14}$$

are equivalent, as shown in Lemma 2, we will work with the system on the right for its simplicity: the equation on $T_{f,i}$ represents a hyperellipsoid, translated with respect to the origin.

In a real application, we can assume to know only an interval in which the true value of $I_f$ is contained and, analogously, an interval for the $N_f$ values. Supposing we know the bounds on $I_f$ and $N_f$, the bounds on $T_f$ can be easily computed. Calling these extreme values $N_f^{\max}$, $N_f^{\min}$, $T_f^{\max}$, $T_f^{\min}$, we will assume it always holds

$$ \begin{cases} N\_f^{\max} \ge \max\_i (N\_{f,i}), \\ N\_f^{\min} \le \min\_i (N\_{f,i}), \end{cases} \quad \text{and} \quad \begin{cases} T\_f^{\max} \ge \max\_i (T\_{f,i}), \\ T\_f^{\min} \le \min\_i (T\_{f,i}), \end{cases} \tag{15} $$

for each test of the considered set $i = 1, \ldots, N$.

Condition (4) is now relaxed as follows: the true solution $\bar{x}_a$ satisfies

$$\begin{cases} \|A_{a,i}\bar{x}_a\| \leq N_f^{\max},\\ \|A_{a,i}\bar{x}_a\| \geq N_f^{\min}, \end{cases} \quad \text{and} \quad \begin{cases} \|A_{a,i}\bar{x}_a - f_i^{\parallel}\| \leq T_f^{\max},\\ \|A_{a,i}\bar{x}_a - f_i^{\parallel}\| \geq T_f^{\min}, \end{cases} \tag{16}$$

for each test of the considered set $i = 1, \ldots, N$.

Assuming the extremes to be non-coincident ($N_f^{\min} \neq N_f^{\max}$ and $T_f^{\min} \neq T_f^{\max}$), these conditions do not define a single point, i.e., the unique solution $\bar{x}_a$ (as in (4) of Section 3.1), but an entire closed region of the space, possibly not even connected, which contains infinitely many possible solutions $x$ different from $\bar{x}_a$.

In Figure 5, two examples of the conditions for a single test with $n_a = 2$ are shown: on the left in the case of exact knowledge of the $N_{f,i}$ and $T_{f,i}$ values, and on the right with the knowledge of two intervals containing the true values.

Given a single test, the conditions (16) on a point $x$ can be easily characterized. Given the condition

$$\|f_a\| = \|A_a x_a\| = N_f,$$

we write $x_a = \sum_i \chi_i v_i$, with $v_i$ the vectors of the orthogonal basis given by the columns of $V$ of the SVD decomposition $A_a = USV^T$. Then

$$f_a = A_a x_a = USV^T\Big(\sum_i \chi_i v_i\Big) = US\Big(\sum_i \chi_i e_i\Big) = U\Big(\sum_i s_i\chi_i e_i\Big) = \sum_i s_i\chi_i u_i.$$

Since the norm condition $\|f_a\|^2 = \sum_i (s_i\chi_i)^2 = N_f^2$ holds, we obtain the equation of the hyperellipsoid for $x_a$ as:

$$\sum_i (s_i\chi_i)^2 = \sum_i \frac{\chi_i^2}{(1/s_i)^2} = N_f^2. \tag{17}$$
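Equation (17) can be checked numerically; the sketch below uses illustrative data and the SVD coordinates just introduced:

```python
import numpy as np

# Expanding x_a on the right singular vectors of A_a, ||A_a x_a||^2 equals
# sum_i (s_i chi_i)^2, i.e., equation (17) (illustrative data).
rng = np.random.default_rng(5)
A_a = rng.standard_normal((8, 3))
U, s, VT = np.linalg.svd(A_a, full_matrices=False)

chi = np.array([0.3, -1.2, 0.8])     # coordinates of x_a in the basis V
x_a = VT.T @ chi                     # x_a = sum_i chi_i v_i
lhs = np.linalg.norm(A_a @ x_a)**2
rhs = np.sum((s * chi)**2)
print(np.isclose(lhs, rhs))          # True
```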

The bounded conditions hence give the region between the two hyperellipsoids centered at the origin:

$$(N_f^{\min})^2 \leq \sum_i \frac{\chi_i^2}{(1/s_i)^2} \leq (N_f^{\max})^2. \tag{18}$$

Analogously, the $I_f$ condition gives the region between the two translated hyperellipsoids:

$$(T_f^{\min})^2 \leq \sum_i \left(s_i\chi_i - u_i^T f^{\parallel}\right)^2 \leq (T_f^{\max})^2. \tag{19}$$

Given a test $i$, each of the conditions (18) and (19) constrains $\bar{x}_a$ to lie inside a thick hyperellipsoid, i.e., the region between two concentric hyperellipsoids. The intersection of these two conditions for test $i$ is a zero-residual region that we call $Z_{r_i}$:

$$Z_{r_i} = \{x \in \mathbb{R}^{n_a} \mid \text{(18) and (19) hold}\}. \tag{20}$$

It is easy to verify that if $N_{f,i}$ is equal to the assumed $N_f^{\min}$ or $N_f^{\max}$, or $T_{f,i}$ is equal to the assumed $T_f^{\min}$ or $T_f^{\max}$, the true solution lies on a boundary of the region $Z_{r_i}$, and if this holds for both $N_{f,i}$ and $T_{f,i}$, it lies on a vertex.
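A membership test for $Z_{r_i}$ can be sketched directly from the norm conditions (16), which are equivalent to (18) and (19); the data and bounds below are illustrative, and the helper `in_Zri` is ours:

```python
import numpy as np

# x belongs to Z_ri when both the N_f bounds and the T_f bounds hold.
def in_Zri(x, A_a, f_par, N_min, N_max, T_min, T_max):
    N = np.linalg.norm(A_a @ x)            # modelled-response norm at x
    T = np.linalg.norm(A_a @ x - f_par)    # distance from the projection
    return N_min <= N <= N_max and T_min <= T <= T_max

rng = np.random.default_rng(6)
A_a = rng.standard_normal((10, 2))
x_bar_a = np.array([1.0, -0.5])
f = A_a @ x_bar_a + rng.standard_normal(10)    # response plus model error
f_par = A_a @ np.linalg.pinv(A_a) @ f          # projection on the subspace
N_true = np.linalg.norm(A_a @ x_bar_a)
T_true = np.linalg.norm(A_a @ x_bar_a - f_par)

# Bounds bracketing the true values: x_bar_a is inside, a far point is not.
inside = in_Zri(x_bar_a, A_a, f_par, 0.9*N_true, 1.1*N_true, 0.9*T_true, 1.1*T_true)
outside = in_Zri(10*x_bar_a, A_a, f_par, 0.9*N_true, 1.1*N_true, 0.9*T_true, 1.1*T_true)
print(inside, outside)    # True False
```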

**Figure 5.** Examples of the exact and approximate conditions on a test with $n_a = 2$. In the left figure the two black ellipsoids are the two constraints of the right system of (14), while in the right figure the two pairs of concentric ellipsoids are the boundaries of the thick ellipsoids defined by (16), and the blue region $Z_{r_i}$ is the intersection of (18) and (19). The black dot in both figures is the true solution. (**a**): Exact conditions on $N_f$ and $T_f$; (**b**): Approximate conditions on $N_f$ and $T_f$.

When more tests $i = 1, \ldots, N$ are put together, we have to consider the points that belong to the intersection of all the regions $Z_{r_i}$, i.e.,

$$I_{zr} = \bigcap_{i=1,\ldots,N} Z_{r_i}. \tag{21}$$

These points minimize, with zero residual, the following optimization problem:

$$\begin{split} \min_{x}\ & \sum_{i=1}^{N}\min\big(0, \|A_{a,i}x\| - N_f^{\min}\big)^2 + \sum_{i=1}^{N}\max\big(0, \|A_{a,i}x\| - N_f^{\max}\big)^2 + {} \\ & + \sum_{i=1}^{N}\min\big(0, \|A_{a,i}x - f_i^{\parallel}\| - T_f^{\min}\big)^2 + \sum_{i=1}^{N}\max\big(0, \|A_{a,i}x - f_i^{\parallel}\| - T_f^{\max}\big)^2. \end{split} \tag{22}$$
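The objective (22) can be written as a plain penalty function; the following sketch (illustrative data, with bounds chosen to satisfy (15)) verifies that the true solution attains a zero residual:

```python
import numpy as np

# The penalty objective (22): it vanishes exactly on the intersection I_zr
# of the regions Z_ri (all data illustrative).
def objective(x, tests, N_min, N_max, T_min, T_max):
    total = 0.0
    for A_ai, f_par_i in tests:
        N = np.linalg.norm(A_ai @ x)
        T = np.linalg.norm(A_ai @ x - f_par_i)
        total += min(0.0, N - N_min)**2 + max(0.0, N - N_max)**2
        total += min(0.0, T - T_min)**2 + max(0.0, T - T_max)**2
    return total

rng = np.random.default_rng(7)
x_bar_a = np.array([0.5, 1.5])
tests = []
for _ in range(3):                         # three tests with model error
    A_ai = rng.standard_normal((12, 2))
    f_i = A_ai @ x_bar_a + rng.standard_normal(12)
    tests.append((A_ai, A_ai @ np.linalg.pinv(A_ai) @ f_i))

Ns = [np.linalg.norm(A @ x_bar_a) for A, _ in tests]
Ts = [np.linalg.norm(A @ x_bar_a - fp) for A, fp in tests]
N_lo, N_hi = 0.9*min(Ns), 1.1*max(Ns)      # bounds satisfying (15)
T_lo, T_hi = 0.9*min(Ts), 1.1*max(Ts)
val = objective(x_bar_a, tests, N_lo, N_hi, T_lo, T_hi)
print(val == 0.0)                          # True: zero residual at x_bar_a
```

Any generic nonlinear optimizer could then be applied to this function; points outside the bounds receive a strictly positive penalty.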

It is also easy to verify that, if the true solution lies on an edge/vertex of one of the regions *Zri* , it will lie on an edge/vertex of their intersection.

The intersected region $I_{zr}$ tends to shrink monotonically in a way that depends on the properties of the added tests. We are interested in studying the conditions that make it reduce to a point, or at least to a small region. A sufficient condition to obtain a point is given in Theorem 1.

Let us first consider the function that, given a point in the space $\mathbb{R}^{n_a}$, returns the squared norm of its image through the matrix $A_a$:

$$\begin{aligned} N_f^2(x) = \|A_a x\|_2^2 &= \|U\Sigma V^T x\|_2^2 = \|\Sigma V^T x\|_2^2 = (\Sigma V^T x)^T(\Sigma V^T x) = x^T(V\Sigma^T\Sigma V^T)x \\ &= \left\|\begin{bmatrix}\sigma_1 v_1^T x\\ \sigma_2 v_2^T x\\ \vdots\end{bmatrix}\right\|_2^2 = \sigma_1^2(v_1^T x)^2 + \sigma_2^2(v_2^T x)^2 + \dots \end{aligned} \tag{23}$$

where $v_i$ are the columns of $V$ and $x = [x(1), x(2), \ldots, x(n_a)]$.

The direction of maximum increase of this function is given by its gradient

$$\nabla N_f^2(x) = 2(V\Sigma^2 V^T)x = \begin{bmatrix} 2\sigma_1^2 v_1^T x\, v_1(1) + 2\sigma_2^2 v_2^T x\, v_2(1) + \dots + 2\sigma_{n_a}^2 v_{n_a}^T x\, v_{n_a}(1) \\ 2\sigma_1^2 v_1^T x\, v_1(2) + 2\sigma_2^2 v_2^T x\, v_2(2) + \dots + 2\sigma_{n_a}^2 v_{n_a}^T x\, v_{n_a}(2) \\ \vdots \end{bmatrix}. \tag{24}$$

Analogously, we define the function $T_f^2(x)$ as

$$\begin{aligned} T_f^2(x) &= \|A_a x - f^{\parallel}\|_2^2 = \|U\Sigma V^T x - f^{\parallel}\|_2^2 = \|\Sigma V^T x - U^T f^{\parallel}\|_2^2 \\ &= (\Sigma V^T x)^T(\Sigma V^T x) - 2(U\Sigma V^T x)^T f^{\parallel} + (f^{\parallel})^T(f^{\parallel}) \\ &= x^T(V\Sigma^2 V^T)x - 2\,x^T V\Sigma U^T f^{\parallel} + (f^{\parallel})^T(f^{\parallel}) = \left\|\begin{bmatrix}\sigma_1 v_1^T x\\ \sigma_2 v_2^T x\\ \vdots\end{bmatrix} - U^T f^{\parallel}\right\|_2^2 \end{aligned} \tag{25}$$

with gradient

$$\begin{aligned} \nabla T\_f^2(\mathbf{x}) &= 2(V\Sigma^2 V^T)\mathbf{x} - 2V\Sigma f^\parallel = \\ &= \begin{bmatrix} 2\sigma\_1^2 \mathbf{v}\_1^T \mathbf{x}\, \mathbf{v}\_1(1) + 2\sigma\_2^2 \mathbf{v}\_2^T \mathbf{x}\, \mathbf{v}\_2(1) + \dots + 2\sigma\_{n\_a}^2 \mathbf{v}\_{n\_a}^T \mathbf{x}\, \mathbf{v}\_{n\_a}(1) \\ \vdots \\ 2\sigma\_1^2 \mathbf{v}\_1^T \mathbf{x}\, \mathbf{v}\_1(j) + 2\sigma\_2^2 \mathbf{v}\_2^T \mathbf{x}\, \mathbf{v}\_2(j) + \dots + 2\sigma\_{n\_a}^2 \mathbf{v}\_{n\_a}^T \mathbf{x}\, \mathbf{v}\_{n\_a}(j) \\ \vdots \end{bmatrix} - \begin{bmatrix} 2\sum\_i \sigma\_i f^\parallel(i)\, \mathbf{v}\_i(1) \\ \vdots \\ 2\sum\_i \sigma\_i f^\parallel(i)\, \mathbf{v}\_i(j) \\ \vdots \end{bmatrix} \end{aligned} \tag{26}$$

**Definition 2.** *(Upward/Downward Outgoing Gradients) Take a test $i$, and the functions $N\_f^2(\mathbf{x})$ and $T\_f^2(\mathbf{x})$ as in* (23) *and* (25)*, with the gradient vectors $\nabla N\_{f,i}(\mathbf{x})$, $\nabla T\_{f,i}(\mathbf{x})$ of these two functions as in* (24) *and* (26)*. Given the two extreme values $N\_f^{\min/\max}$ and $T\_f^{\min/\max}$ for each test, let us define*

• *the downward outgoing gradients as the set of negative gradients calculated at the points on the minimum hyperellipsoid*

$$\{-\nabla N\_{f,i}(\mathbf{x}) \mid N\_{f,i}(\mathbf{x}) = N\_f^{\min} \} \quad \text{and} \quad \{-\nabla T\_{f,i}(\mathbf{x}) \mid T\_{f,i}(\mathbf{x}) = T\_f^{\min} \} \tag{27}$$

*they point inward, toward the region of the thick hyperellipsoid.*

• *the upward outgoing gradients as the set of gradients calculated at the points on the maximum hyperellipsoid*

$$\{\nabla N\_{f,i}(\mathbf{x}) \mid N\_{f,i}(\mathbf{x}) = N\_f^{\max}\} \quad \text{and} \quad \{\nabla T\_{f,i}(\mathbf{x}) \mid T\_{f,i}(\mathbf{x}) = T\_f^{\max}\} \tag{28}$$

*they point outward from the region.*

Note that the upward/downward outgoing gradient of the function $N\_f^2(\mathbf{x})$ (or $T\_f^2(\mathbf{x})$) at a point $\mathbf{x}$ is the normal vector to the plane tangent to the hyperellipsoid on which the point lies. Moreover, these vectors point outward from the region defined by Equation (18) (respectively, (19)). Figure 6 shows an example of some upward/downward outgoing gradients of the function $N\_f^2(\mathbf{x})$.
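The identities (23) and (24) are easy to verify numerically. The sketch below (an illustrative check, with a random matrix standing in for $A\_a$) compares the singular-vector expansion of $N\_f^2$ with a direct norm computation, and the closed-form gradient $2V\Sigma^2 V^T\mathbf{x}$ with central finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_a = 10, 3
A = rng.standard_normal((n, n_a))          # stands in for A_a
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x = rng.standard_normal(n_a)

# Eq. (23): ||A x||^2 equals sum_i sigma_i^2 (v_i^T x)^2.
N2 = np.linalg.norm(A @ x) ** 2
N2_svd = np.sum(s**2 * (Vt @ x) ** 2)
print(abs(N2 - N2_svd))                    # ~ 0 (roundoff level)

# Eq. (24): gradient of ||A x||^2 is 2 V Sigma^2 V^T x.
grad = 2 * Vt.T @ (s**2 * (Vt @ x))
eps = 1e-6
num = np.array([(np.linalg.norm(A @ (x + eps * e)) ** 2
                 - np.linalg.norm(A @ (x - eps * e)) ** 2) / (2 * eps)
                for e in np.eye(n_a)])
print(np.max(np.abs(grad - num)))          # ~ 0 (roundoff level)
```

Since $N\_f^2$ is quadratic, the central difference is exact up to floating-point roundoff.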

**Figure 6.** Some upward/downward outgoing gradients: the internal blue ones are downward outgoing gradients calculated at points $\mathbf{x}$ on the internal ellipsoid with $N\_{f,i}(\mathbf{x}) = N\_f^{\min}$, while the external red ones are upward outgoing gradients calculated at points $\mathbf{x}$ on the external ellipsoid with $N\_{f,i}(\mathbf{x}) = N\_f^{\max}$.

**Theorem 1.** *Given $N$ tests with values $I\_{f,i}$ and $N\_{f,i}$ in the closed intervals $[I\_f^{\min}, I\_f^{\max}]$ and $[N\_f^{\min}, N\_f^{\max}]$, take the set of all the upward/downward outgoing gradients of the functions $N\_{f,i}^2(\mathbf{x})$ and $T\_{f,i}^2(\mathbf{x})$ calculated at the true solution $\bar{\mathbf{x}}\_a$, i.e.,*

$$\begin{aligned} &\{\nabla N\_{f,i}(\bar{\mathbf{x}}\_{a}) \text{ for } i = 1, \dots, N \mid N\_{f,i}(\bar{\mathbf{x}}\_{a}) = N\_{f}^{\max}\} \cup \{-\nabla N\_{f,i}(\bar{\mathbf{x}}\_{a}) \text{ for } i = 1, \dots, N \mid N\_{f,i}(\bar{\mathbf{x}}\_{a}) = N\_{f}^{\min}\} \cup {} \\ &\{\nabla T\_{f,i}(\bar{\mathbf{x}}\_{a}) \text{ for } i = 1, \dots, N \mid T\_{f,i}(\bar{\mathbf{x}}\_{a}) = T\_{f}^{\max}\} \cup \{-\nabla T\_{f,i}(\bar{\mathbf{x}}\_{a}) \text{ for } i = 1, \dots, N \mid T\_{f,i}(\bar{\mathbf{x}}\_{a}) = T\_{f}^{\min}\} .\end{aligned} \tag{29}$$

*If there is at least one outgoing gradient of this set in each orthant of $\mathbb{R}^{n\_a}$, then the intersection region $I\_{zr}$ of Equation* (21) *reduces to a point.*

**Proof.** We want to show that, given any perturbation $\delta\_x$ of the true solution $\bar{\mathbf{x}}\_a$, there exists at least one condition among (18) and (19) that is not satisfied by the perturbed point $\bar{\mathbf{x}}\_a + \delta\_x$.

Any sufficiently small perturbation $\delta\_x$ lying in an orthant that contains an upward/downward outgoing gradient (from now on, "Gradient") produces an increase/decrease in the value of the hyperellipsoid function associated with that Gradient, which makes the corresponding condition unsatisfied.

Hence, if the Gradient in the considered orthant is upward, it satisfies $N\_{f,i}(\bar{\mathbf{x}}\_a) = N\_f^{\max}$ (or analogously with $T\_{f,i}$) and for each perturbation $\delta\_x$ in the same orthant we obtain

$$N\_{f,i}(\bar{\mathbf{x}}\_a + \delta\_x) > N\_{f,i}(\bar{\mathbf{x}}\_a) = N\_f^{\max}$$

(or analogously with $T\_{f,i}$). In the same way, if the Gradient is downward we obtain

$$N\_{f,i}(\bar{\mathbf{x}}\_a + \delta\_x) < N\_{f,i}(\bar{\mathbf{x}}\_a) = N\_f^{\min}$$

(or analogously with $T\_{f,i}$).

When there is more than one Gradient in an orthant, more than one condition will be unsatisfied by the perturbed point $\bar{\mathbf{x}}\_a + \delta\_x$, for a sufficiently small $\delta\_x$ in that orthant.
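The orthant condition of Theorem 1 can be checked numerically. The helper below (an illustrative sketch, not part of the authors' algorithm) tests whether a set of gradient vectors touches every orthant of $\mathbb{R}^{n\_a}$ by enumerating their sign patterns; a component close to zero lies on an orthant boundary and is counted for both signs:

```python
import itertools
import numpy as np

def covers_all_orthants(gradients, tol=1e-12):
    """Check whether at least one vector lies in each (closed) orthant."""
    n = len(gradients[0])
    covered = set()
    for g in gradients:
        # A component with |g_j| <= tol is compatible with both signs.
        choices = [(1,) if g[j] > tol else (-1,) if g[j] < -tol else (1, -1)
                   for j in range(n)]
        covered.update(itertools.product(*choices))
    return len(covered) == 2 ** n

# Example in R^2: four vectors with sign patterns (+,+), (-,+), (+,-), (-,-).
grads = [np.array(v, dtype=float)
         for v in [(1, 2), (-3, 1), (2, -1), (-1, -2)]]
print(covers_all_orthants(grads))          # True
print(covers_all_orthants(grads[:3]))      # False: orthant (-,-) is missing
```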

#### **4. Problem Solution**

The theory presented above allows us to build a solution algorithm that can deal with different kinds of a-priori information. We start in Section 4.1 with the ideal case, i.e., exact knowledge of $I\_f$ and $N\_f$. Then we generalize to a more practical setting, where we assume to know an interval that contains the $T\_f$ values of all the considered experiments and an interval for the $N\_f$ values; hence, the estimated solution will satisfy Equations (18) and (19). For this case we describe an algorithm that computes an estimate of the solution, which we test in Section 5 on a toy model.

#### *4.1. Exact Knowledge of $I\_f$ and $N\_f$*

When the information about $I\_f$ and $N\_f$ is exact, with the minimum number of experiments indicated in Section 3 we can find the unbiased parameter estimate as the intersection $I\_{zr}$ of the zero-residual sets $Z\_{r\_i}$ corresponding to each experiment. In principle this could also be done by following the proof of Lemma 4, but the computation of the $\mathbf{v}\_i$ vectors is quite cumbersome. Since this is an ideal case, we solve it by simply imposing the satisfaction of the various $N\_f$ and $T\_f$ conditions (Equation (14)) as an optimization problem:

$$\min\_{\mathbf{x}} F(\mathbf{x}) \quad \text{with} \quad F(\mathbf{x}) = \sum\_{i=1}^{N} (\|A\_{a,i}\mathbf{x}\| - N\_{f,i})^2 + \sum\_{i=1}^{N} (\|A\_{a,i}\mathbf{x} - f\_i^{\parallel}\| - T\_{f,i})^2. \tag{30}$$

The solution of this problem is unique when the tests are sufficiently many and satisfy the conditions of Lemma 4.

This nonlinear least-squares problem can be solved using a general nonlinear optimization algorithm, such as the Gauss–Newton or Levenberg–Marquardt methods [8].
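As a sketch of how (30) can be set up in practice, the snippet below hands the residuals to `scipy.optimize.least_squares` (whose `method="lm"` is a Levenberg–Marquardt implementation). The test matrices $A\_{a,i}$, the vectors $f\_i^\parallel$ and the "true" parameter vector are illustrative synthetic data, not the paper's experiment; the exact $N\_{f,i}$, $T\_{f,i}$ values are generated from the known solution, as in the ideal case of this section:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
n, n_a, N = 20, 3, 6                 # samples per test, parameters, tests

x_true = np.array([0.5, -1.2, 2.0])  # illustrative "true" parameters
A = [rng.standard_normal((n, n_a)) for _ in range(N)]
f_par = [0.1 * rng.standard_normal(n) for _ in range(N)]

# Exact a-priori information for each test.
N_f = [np.linalg.norm(A_i @ x_true) for A_i in A]
T_f = [np.linalg.norm(A_i @ x_true - f_i) for A_i, f_i in zip(A, f_par)]

def residuals(x):
    """Stacked residuals whose sum of squares is F(x) in (30)."""
    r1 = [np.linalg.norm(A_i @ x) - N_i for A_i, N_i in zip(A, N_f)]
    r2 = [np.linalg.norm(A_i @ x - f_i) - T_i
          for A_i, f_i, T_i in zip(A, f_par, T_f)]
    return np.array(r1 + r2)

# The problem is nonconvex, so only local convergence is guaranteed;
# we start near the solution.
sol = least_squares(residuals, x0=x_true + 0.3, method="lm")
print(sol.x)                         # close to x_true
```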

#### *4.2. Approximate Knowledge of $I\_f$ and $N\_f$*

In practice, as already pointed out in Section 3.2, it is more realistic to know the two intervals that contain all the $N\_{f,i}$ and $I\_{f,i}$ values of each test $i$. Then we know that the region $I\_{zr}$ also contains the exact unbiased parameter solution $\bar{\mathbf{x}}\_a$, which we want at least to approximate. We introduce here an Unbiased Least-Squares (ULS) algorithm (Algorithm 1) for the computation of this estimate.

#### **Algorithm 1** An Unbiased Least-Squares (ULS) algorithm.


In general, the zero-residual region $Z\_{r\_i}$ of each test contains the true parameter vector, while the iterates of the local optimization usually start from a point outside this region and converge to a point on its boundary.

The ULS estimate can converge to the true solution in two cases:


The size of the intersection $I\_{zr}$ of the zero-residual regions $Z\_{r\_i}$ is estimated in the following way.

Let us define an index, which we call the region shrinkage estimate, as follows:

$$\mathrm{sh}(\mathbf{x}) = \min \{\, n \mid \sum\_{\delta \in \mathcal{P}} \Delta\_{I\_{zr}}(\mathbf{x} + \mu^{-n}\delta) > 0 \,\},\tag{31}$$

where we used $\mu = 1.5$ in the experiments below, $\mathcal{P} = \{\delta \in \mathbb{R}^{n\_a} \mid \delta(i) \in \{-1, 0, 1\}\ \forall i = 1, \dots, n\_a\}$ and $\Delta\_{I\_{zr}}$ is the indicator function of the set $I\_{zr}$.
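A direct implementation of (31) only needs a membership test for $I\_{zr}$. The sketch below is illustrative: it excludes the zero perturbation $\delta = 0$ (which would otherwise make the index trivial whenever $\mathbf{x} \in I\_{zr}$), caps the search at an assumed `n_max`, and uses simple balls as stand-in regions:

```python
import itertools
import numpy as np

def shrinkage(x, in_region, mu=1.5, n_max=50):
    """Region shrinkage estimate (31): smallest n such that at least one
    perturbed point x + mu**(-n) * delta, delta in {-1,0,1}^n_a, delta != 0,
    falls inside the region."""
    x = np.asarray(x, dtype=float)
    deltas = [np.array(d, dtype=float)
              for d in itertools.product((-1, 0, 1), repeat=x.size)
              if any(d)]                      # exclude delta = 0
    for n in range(n_max + 1):
        if any(in_region(x + mu**(-n) * d) for d in deltas):
            return n
    return n_max                              # region below the resolution

# Illustrative membership tests: balls of different radii around the origin.
big   = lambda p: np.linalg.norm(p) <= 1.0
small = lambda p: np.linalg.norm(p) <= 0.01

x0 = np.zeros(3)
print(shrinkage(x0, big))    # 0  : a large region is hit at the coarsest step
print(shrinkage(x0, small))  # 12 : a small region needs much finer steps
```

A larger index thus signals a smaller region $I\_{zr}$, which is how it is used in Figure 12.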

#### **5. Numerical Examples**

Let us consider a classical application example, the equations of a DC motor with a mechanical load, where the electrical variables are governed by the following ordinary differential equation

$$\begin{cases} L\dot{I}(t) &= -K\omega(t) - RI(t) + V(t) - f\_u(t) \\ I(t\_0) &= I\_0, \end{cases} \tag{32}$$

where *I* is the motor current, *ω* the motor angular speed, *V* the applied voltage, and *fu*(*t*) a possible unmodelled component

$$f\_u(t) = -m\_{err} \cos(n\_{poles}\,\theta(t)),\tag{33}$$

where $n\_{poles}$ is the number of poles of the motor, i.e., the number of windings or magnets [9], $m\_{err}$ the magnitude of the model error and $\theta$ the angle, given by the system

$$\begin{cases} \dot{\theta}(t) &= \omega(t) \\ \theta(t\_0) &= \theta\_0 . \end{cases} \tag{34}$$


Note that the unknown component $f\_u$ of this example can be seen as a difference of potential that is not described by the approximate model. We are interested in the estimation of the parameters $[L, K, R]$. In our test the true values were the constants $[L = 0.0035, K = 0.14, R = 0.53]$.

We suppose to know the measurements of $I$ and $\omega$ at equally spaced times $t\_0, \dots, t\_{\bar{N}}$ with step $h$, such that $t\_k = t\_0 + kh$ and $t\_{k+1} = t\_k + h$. In Figure 7 we see the plots of the motor speed $\omega$ and of the unknown component $f\_u$ for this experiment.

**Figure 7.** The plots of (**a**) *ω*(*t*) and (**b**) *fu*(*t*) in the experiment.

We compute the approximation of the derivative $\dot{I}(t\_k)$ of the current signal with the first-order (backward) finite difference formula

$$\hat{\dot{I}}(t\_k) = \frac{I(t\_k) - I(t\_{k-1})}{h}, \quad \text{for} \quad t\_k = t\_1, \dots, t\_{\bar{N}},$$

with a step $h = 4 \times 10^{-4}$. The applied voltage is held constant at the value $V(t) = 30.0$.

To obtain a more accurate estimate, or to allow larger step sizes $h$, finite differences of higher order can be used, for example the fourth-order central difference formula

$$\hat{\dot{I}}(t\_k) = \frac{I(t\_k - 2h) - 8I(t\_k - h) + 8I(t\_k + h) - I(t\_k + 2h)}{12h}, \qquad \text{for} \quad t\_k = t\_2, \dots, t\_{\bar{N}-2}.$$
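The accuracy gain of the higher-order formula is easy to check on any smooth signal; the sketch below (with an illustrative test signal whose derivative is known, not the motor current) compares the two formulas above:

```python
import numpy as np

h = 4e-4
t = np.arange(0.0, 1.0, h)
I = np.sin(t)                      # test signal with known derivative cos(t)

# First-order difference (I(t_k) - I(t_{k-1})) / h, valid for k = 1, ..., N.
d1 = (I[1:] - I[:-1]) / h
err1 = np.max(np.abs(d1 - np.cos(t[1:])))

# Fourth-order central difference, valid for k = 2, ..., N - 2.
d4 = (I[:-4] - 8 * I[1:-3] + 8 * I[3:-1] - I[4:]) / (12 * h)
err4 = np.max(np.abs(d4 - np.cos(t[2:-2])))

print(err1, err4)                  # err4 is smaller by several orders of magnitude
```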

With the choice of the finite difference formula, we obtain the discretized equations

$$L\hat{\dot{I}}(t\_k) = -K\omega(t\_k) - RI(t\_k) + V(t\_k) - f\_u(t\_k), \qquad \text{for} \quad t\_k = t\_1, \dots, t\_{\bar{N}}.\tag{35}$$

We will show a possible implementation of the method explained in the previous sections, and the results we get with this toy-model example. The comparison is made against standard least-squares. In particular, we will show that when the information about $I\_f$ and $N\_f$ is exact, the bias is removed exactly. When this information is only approximate, which is common in real applications, we will show how the bias asymptotically disappears as the number of experiments increases.

We build each test by taking Equation (35) for $n$ samples in the range $t\_1, \dots, t\_{\bar{N}}$, obtaining the linear system

$$
\begin{bmatrix}
\hat{\dot{I}}(t\_k) & \omega(t\_k) & I(t\_k) \\
\hat{\dot{I}}(t\_{k+1}) & \omega(t\_{k+1}) & I(t\_{k+1}) \\
\vdots & \vdots & \vdots \\
\hat{\dot{I}}(t\_{k+n}) & \omega(t\_{k+n}) & I(t\_{k+n})
\end{bmatrix}
\begin{bmatrix}
L \\ K \\ R
\end{bmatrix} + 
\begin{bmatrix}
f\_u(t\_k) \\ f\_u(t\_{k+1}) \\ \vdots \\ f\_u(t\_{k+n})
\end{bmatrix} = 
\begin{bmatrix}
V(t\_k) \\ V(t\_{k+1}) \\ \vdots \\ V(t\_{k+n})
\end{bmatrix} \tag{36}
$$

so that the first matrix in the equation is $A\_a \in \mathbb{R}^{n \times n\_a}$ with $n\_a = 3$, the number of parameters to be estimated.
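To make the construction concrete, the sketch below simulates Equations (32)–(34) with a simple forward Euler scheme and builds the system (36) for a standard least-squares solve. This is an illustrative reimplementation, not the authors' code: the speed profile $\omega(t)$, the number of poles and the number of samples are assumed values. With $m\_{err} = 0$ the parameters are recovered; with $m\_{err} > 0$ the unmodelled term $f\_u$, which also contaminates the regressor $\hat{\dot{I}}$, biases the estimates:

```python
import numpy as np

# True parameters of the toy model (Section 5); n_poles is illustrative.
L_, K_, R_ = 0.0035, 0.14, 0.53
n_poles, V0, h = 4, 30.0, 4e-4

def simulate(m_err, steps=2000):
    """Forward Euler on (32), with a prescribed smooth speed profile omega(t)
    (an assumed choice) and theta integrated from theta' = omega, Eq. (34)."""
    t = np.arange(steps + 1) * h
    omega = 100.0 * (1.0 - np.exp(-20.0 * t))       # assumed speed profile
    theta = np.concatenate(([0.0], np.cumsum(omega[:-1]) * h))
    f_u = -m_err * np.cos(n_poles * theta)          # unmodelled term (33)
    I = np.zeros(steps + 1)
    for k in range(steps):
        I[k + 1] = I[k] + h / L_ * (-K_ * omega[k] - R_ * I[k] + V0 - f_u[k])
    return I, omega

def lstsq_estimate(m_err):
    I, omega = simulate(m_err)
    dI = (I[1:] - I[:-1]) / h                       # first-order difference
    A = np.column_stack([dI, omega[:-1], I[:-1]])   # matrix of (36)
    b = np.full(len(dI), V0)                        # f_u is unknown to the model
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

print(lstsq_estimate(0.0))    # close to [0.0035, 0.14, 0.53]
print(lstsq_estimate(19.0))   # visibly biased estimates
```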

To measure the relative estimation error $\hat{e}\_{rel}$ we will use the following formula, where $\hat{\mathbf{x}}\_a$ is the parameter estimate and $\bar{\mathbf{x}}\_a$ the true value:

$$\hat{e}\_{rel} = \frac{1}{n\_a} \sum\_{i=1}^{n\_a} \frac{|\hat{\mathbf{x}}\_a(i) - \bar{\mathbf{x}}\_a(i)|}{|\bar{\mathbf{x}}\_a(i)|}. \tag{37}$$
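In code, (37) is a one-liner (a sketch; the example values are illustrative, with a 10% error on the first component only):

```python
import numpy as np

def e_rel(x_hat, x_true):
    """Mean component-wise relative error, Equation (37)."""
    x_hat, x_true = np.asarray(x_hat, float), np.asarray(x_true, float)
    return np.mean(np.abs(x_hat - x_true) / np.abs(x_true))

print(e_rel([0.00385, 0.14, 0.53], [0.0035, 0.14, 0.53]))  # 0.1 / 3 = 0.0333...
```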

Note that the tests that we built in the numerical experiments below are simply small chunks of consecutive data, taken from one single simulation for each experiment.

The results have been obtained with a Python code developed by the authors, using NumPy for linear algebra computations and scipy.optimize for the nonlinear least-squares optimization.

#### *5.1. Exact Knowledge of $I\_f$ and $N\_f$*

As analyzed in Section 4.1, the solution of the minimization problem (30) is computed with a local optimization algorithm.

Here the obtained results show an error $\hat{e}\_{rel}$ of the order of $10^{-7}$ in every test we made. Note that it is also possible to construct the solution geometrically, with exact results.

#### *5.2. Approximate Knowledge of $I\_f$ and $N\_f$*

When $I\_f$ and $N\_f$ are known only approximately, i.e., we know only an interval that contains all the $I\_f$ values and an interval that contains all the $N\_f$ values, we lose the unique intersection of Lemma 4, which would require only $n\_a$ tests. Moreover, with a finite number of tests we cannot in general guarantee that the exact hypotheses of Theorem 1 are satisfied. As a consequence, various issues open up. Let us start by showing in Figure 8 that when all four conditions of (15) hold with equality, the true solution lies on the boundary of the region $I\_{zr}$, as already mentioned in Section 3.2. If this happens, then under the conditions of Theorem 1 on the upward/downward outgoing gradients, the region $I\_{zr}$ is a point. When all four conditions of (15) hold with strict inequalities, the true solution lies inside the region $I\_{zr}$ (Figure 8b). From a theoretical point of view this distinction is very important, since it determines whether the zero-residual region can be reduced to a single point. From a practical point of view it is, for the moment, less important, since we cannot guarantee that the available tests will reduce $I\_{zr}$ exactly to a single point, and most of the time we will arrive at an approximate estimate. The accuracy of this estimate depends on the specific application, which is out of the scope of the present work.

**Figure 8.** Two examples of (zero-residual) intersection regions *Izr* ⊂ R<sup>3</sup> with different location of the true solution: inside the region or on its border. For graphical reasons the region has been discretized and the dots are the grid nodes; the bigger ball (thick point) is the true solution. (**a**): The true solution (ball) is on the border of *Izr*; (**b**): The true solution (ball) is internal to *Izr*.


To be more precise, when the conditions of Theorem 1 are not satisfied, there is an entire region of the parameter space that satisfies problem (30) exactly, but only one point of this region is the true solution $\bar{\mathbf{x}}\_a$. As more tests are added and intersected, the zero-residual region $I\_{zr}$ tends to shrink, simply because it must satisfy an increasing number of inequalities. In Figure 9 we can see four iterations taken from an example, precisely with 3, 5, 9 and 20 tests intersected and $m\_{err} = 19$. With only three tests (Figure 9a) there is a large region $I\_{zr}$ (described by the mesh of small dots), and here we see that the true solution (thick point) and the current estimate (star) happen to lie on opposite sides of the region. With five tests (Figure 9b) the region has shrunk considerably and the estimate is approaching the boundary (in the plot it is still half-way), and even more so with nine tests (Figure 9c). Convergence occurs here before the region collapses to a single point, because by chance the estimate has reached the region boundary at the same point where the true solution is located.

In general, the zero-residual region $Z\_{r\_i}$ (20) of each test contains the true solution, while the estimate arrives from outside the region and stops when it hits the border of the intersection region $I\_{zr}$ (21). For this reason we have convergence when the region that contains the true solution is reduced to a single point, provided the current estimate $\hat{\mathbf{x}}\_a$ does not lie in a disconnected sub-region of $I\_{zr}$ different from the one in which the true solution lies. Figure 10 shows an example of an intersection region $I\_{zr}$ which is the union of two closed disconnected regions: this case creates a local minimum in problem (30).

**Figure 9.** The intersection region $I\_{zr} \subset \mathbb{R}^3$ for different numbers of tests involved. For graphical reasons the region has been discretized and the dots are the grid nodes; the bigger ball is the true solution and the star is the current estimate in the experiment. (**a**) 3 tests; (**b**) 5 tests; (**c**) 9 tests; (**d**) 20 tests.

**Figure 10.** The intersection region $I\_{zr} \subset \mathbb{R}^3$ for different numbers of tests involved. On the left, a few tests have created a single connected region while, on the right, adding more tests has split it into two sub-regions. For graphical reasons the region has been discretized and the dots are the grid nodes; the bigger ball is the true solution and the star is the current estimate in the experiment. (**a**) A (portion of a) connected region $I\_{zr}$; (**b**) A region $I\_{zr}$ split into two disconnected sub-regions.

In Figure 11 we see the differences $N\_f^{\max} - N\_f^{\min}$ and $T\_f^{\max} - T\_f^{\min}$ vs. $m\_{err}$. The differences are larger for higher values of the model error, and this seems to be the cause of a more frequent creation of local minima.

**Figure 11.** The three plots show the extreme values (15) as a function of $m\_{err}$. (**a**): $\{I\_f^{\min}, I\_f^{\max}\}$ vs. $m\_{err}$; (**b**): $\{N\_f^{\min}, N\_f^{\max}\}$ vs. $m\_{err}$; (**c**): $\{T\_f^{\min}, T\_f^{\max}\}$ vs. $m\_{err}$.

Figure 12 synthesizes the main results obtained with this new approach. Globally it shows a great reduction of the bias contained in the standard least-squares estimates; indeed, we had to use a logarithmic scale to highlight the differences in the behaviour of the proposed method while varying $m\_{err}$. In particular,


It is evident from these results that for lower values of the modelling error $m\_{err}$ it is much easier to produce tests that reduce the zero-residual region to a quite small subset of $\mathbb{R}^{n\_a}$, while for high values of $m\_{err}$ it is much more difficult, and the region $I\_{zr}$ can even fall to pieces, thus creating local minima. It is also evident that a simple estimate of the size of the region $I\_{zr}$, like (31), can reliably assess the quality of the estimate produced by the proposed approach, as summarized in Figure 12c.

**Figure 12.** The plots summarize the results obtained by the ULS approach to parameter estimation on the model problem explained at the beginning of this section. (**a**): The relative estimation error (37) vs. $m\_{err}$; (**b**): The $I\_{zr}$ region shrinkage estimate (31) vs. $m\_{err}$; (**c**): The relative estimation error (37) vs. the estimate of the $I\_{zr}$ region shrinkage, considering the experiments with $m\_{err} \in [2, 20]$; (**d**): A three-dimensional view of the outgoing gradients at the last iteration of the experiment with $m\_{err} = 18$.

## **6. Conclusions**

In this paper we have analyzed the bias that commonly arises in parameter estimation problems where the model lacks some deterministic part of the system. This result is useful in applications where an accurate estimation of parameters is important, e.g., in the physical (grey-box) modelling typically arising in the model-based design of multi-physical systems; see, e.g., the motivations that the authors experienced in the design of digital twins of controlled systems [10–12] for virtual prototyping, among a vast literature.

At this point, the method should be tested in a variety of applications, since the ULS approach proposed here is not applicable as a black box in the way least-squares is: it requires some additional a-priori information. Moreover, since the computational complexity of the method presented here is significant, efficient computational methods must be considered and will be a major issue in future investigations.

Another aspect worth deepening is the possibility of designing tests that contribute optimally to the reduction of the zero-residual region.

**Author Contributions:** Conceptualization, methodology, validation, formal analysis, investigation, software, resources, data curation, writing—original draft preparation, writing—review and editing, visualization: M.G. and F.M.; supervision, project administration, funding acquisition: F.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the project DOR1983079/19 from the University of Padova and by the doctoral grant "Calcolo ad alte prestazioni per il Model Based Design" from Electrolux Italia s.p.a.

**Conflicts of Interest:** The authors declare no conflict of interest.
