In this section we will introduce sufficient conditions to remove the bias and retrieve the true solution in a unique way, as summarized in Lemma 4. Let us start with a definition.
In the following we assume that a good approximation of this intensity ratio is available and that its magnitude is sufficiently large, i.e., we have an approximate model that is quite accurate. This information about the model error will be used to reduce the bias, as shown in the following sections. Moreover, we will also consider the two norms that enter this ratio.
3.1. The Case of Exact Knowledge of the Two Norm Values
Here we assume, initially, that the exact values of both quantities are known, i.e., that the equalities in condition (4) hold exactly. This ideal setting is important to understand the problem before moving to more practical assumptions. First of all, let us show a useful geometric property that relates the two quantities under a condition like (4).
Lemma 2. The problem of finding the set of parameter vectors that give a constant, prescribed value for each of the two norms is equivalent to the problem of finding the set of vectors of the decomposition (see the proof of Lemma 1) lying on the intersection of the known subspace and the boundaries of two n-dimensional balls in R^n. In fact, the chain of equalities (5) holds.

Proof. For every admissible parameter vector, the equalities in (5) hold, where we used the decomposition introduced in the proof of Lemma 1 together with the definitions of its terms. Hence the equivalence (5) is proved. □
Given the two prescribed values, we call the feasible set of accurate model responses the set of all vectors that satisfy relations (5). We will now see that Lemma 2 allows us to reformulate problem (2) as the problem of finding a feasible vector that, substituted in (2) in place of the data, gives as solution an unbiased estimate of the true parameter vector. Indeed, it is easy to note that the exact model response belongs to this feasible set. Moreover, since the feasible vectors lie in a known subspace, we can reduce the dimensionality of the problem and work on that subspace, whose dimension we denote by k, instead of the global space R^n of dimension n. To this aim, let us consider the matrix whose orthonormal columns span the known subspace, obtained from the SVD decomposition of the model matrix, and complete its columns to an orthonormal basis of R^n to obtain a matrix U. Since the feasible vectors belong to the known subspace, their coordinate vectors with respect to the basis U must have zeros in the last n − k components. Since U has orthonormal columns, it preserves norms, so the coordinate vectors have the same norms as the original ones. If we call reduced vectors the first k components of the coordinate vectors (which again have the same norms as the full vectors in R^n), conditions (5) become the reduced conditions (8).
In this way the problem depends only on the dimension k of the known subspace and not on the dimensions of the original spaces. From (8) we can deduce the equation of the (k − 1)-dimensional boundary of a k-dimensional ball to which the reduced vector must belong. In the following we discuss the various cases.
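To make the above change of basis concrete, here is a minimal numerical sketch in Python (the names A_tilde and f_tilde and all data are illustrative, not the paper's): it builds a rank-k matrix, uses the SVD to complete an orthonormal basis of R^n, and checks that a vector of the known subspace has zeros in its last n − k coordinates and an unchanged norm.

import numpy as np

# Illustrative dimension reduction via SVD (hypothetical names and data).
rng = np.random.default_rng(0)
n, k = 6, 3
A_tilde = rng.standard_normal((n, k)) @ rng.standard_normal((k, k))  # rank k

# The first k columns of U_full span the range of A_tilde; the remaining
# n - k columns complete them to an orthonormal basis of R^n.
U_full, s, Vt = np.linalg.svd(A_tilde)
f_tilde = A_tilde @ rng.standard_normal(k)      # a vector of the subspace

coords = U_full.T @ f_tilde                     # coordinates in the new basis
print(np.allclose(coords[k:], 0.0))             # last n - k components vanish
print(np.isclose(np.linalg.norm(coords[:k]),    # the norm is preserved
                 np.linalg.norm(f_tilde)))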
3.1.1. Case k = 1
In this case, we have one unique solution when both norm conditions are imposed. When only one of the two is imposed, two solutions are found, as shown in Figure 1a,c. Figure 1b shows the corresponding intensity ratio.
3.1.2. Case k = 2
Consider the reduced vectors as defined previously; in particular, we are looking for the feasible vectors in R^2. Hence, conditions (8) can be written as two circumference equations, where the additional equation on the right is the 1-dimensional subspace (a line) obtained by subtracting the first equation from the second. This subspace has to be intersected with one of the initial circumferences to obtain the feasible vectors, as can be seen in Figure 2a and its projection in Figure 2b. The intersection of the two circumferences (5) can have a different number of solutions depending on the sign of a discriminant-type quantity. When this value is strictly positive there are zero solutions; this means that the estimates of the two norms are not correct, and we are not interested in this case because we suppose the two values to be sufficiently well estimated. When the value is strictly negative there are two solutions, which coincide when the value is zero.
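The following Python sketch mirrors this construction with made-up values (the center c and the radii r1, r2 are illustrative, and the sign convention of the discriminant below need not match the quantity mentioned above): subtracting the two circle equations gives a line, which is then intersected with the first circle.

import numpy as np

# Feasible vectors for k = 2: intersection of two circles (illustrative data).
c = np.array([1.0, 0.5])   # center of the translated circle (hypothetical)
r1, r2 = 2.0, 1.8          # the two prescribed radii (norm values)

# Subtracting |z|^2 = r1^2 from |z - c|^2 = r2^2 gives the line
#   2 c . z = r1^2 - r2^2 + |c|^2.
rhs = (r1**2 - r2**2 + c @ c) / 2.0
p0 = rhs * c / (c @ c)                       # point of the line closest to 0
d = np.array([-c[1], c[0]]) / np.linalg.norm(c)   # unit direction of the line

# Intersect with the first circle: |p0 + t d|^2 = r1^2, a quadratic in t.
disc = r1**2 - p0 @ p0       # here disc > 0 means two solutions, < 0 means none
if disc < 0:
    print("no intersection: the prescribed values are inconsistent")
else:
    for t in (np.sqrt(disc), -np.sqrt(disc)):
        z = p0 + t * d
        print(z, np.linalg.norm(z), np.linalg.norm(z - c))  # norms = r1, r2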
When there are two solutions, we do not have sufficient information to determine which of the two is the true one, i.e., the one that corresponds to the true parameter vector: we can choose neither the solution with the minimum residual nor the vector with the minimum angle with f, because both solutions have the same values of these two quantities. However, since we suppose the linear system to originate from an input/output system, where the matrix is a function also of the input and f collects the measurements of the output, we can perform two tests with different inputs. Since all the solution sets contain the true parameter vector, we can determine the true solution from their intersection, unless the solutions of the two tests coincide. The condition for coincidence is expressed in Lemma 3.
Let us call the matrix of test i the model matrix of that test, to which there corresponds a data vector. The line on which the two feasible vectors of the same test i lie is the line through the two solution points. To have two tests with non-coincident solutions, we need these two lines to have no more than one common point, which in the case k = 2 is equivalent to requiring that the two lines are not coincident. We represent each line by means of its orthogonal vector from the origin, and we introduce the matrices, defined through the corresponding relations, that are used in the statement and proof of Lemma 3.
Lemma 3. Consider two tests from the same system, with the above notation. Then the two lines coincide if and only if the corresponding condition on the two test matrices holds.

Proof. From the relation defining the orthogonal vectors of the two lines, the claimed equivalence reduces to a second, simpler one, which we now show. We compute the relevant products, introduce the auxiliary vector defined by the corresponding linear relation and, using the properties of the matrices involved, obtain the desired equality by direct calculation. Hence the claim follows. □
3.1.3. Case k ≥ 3
More generally, for the case k ≥ 3, consider the reduced vectors as defined previously; in particular, we are looking for the feasible vectors in R^k. Conditions (8) can be written as in (13), where the two equations on the left are two (k − 1)-spheres, i.e., the boundaries of two k-dimensional balls. Analogously to the case k = 2, the intersection of these equations can be empty, a single point, or the boundary of a (k − 1)-dimensional ball (with the same conditions as before on the sign of the discriminant-type quantity). The equation on the right of (13) is the (k − 1)-dimensional subspace on which the boundary of the (k − 1)-dimensional ball of feasible vectors lies, and it is obtained by subtracting the first equation from the second one. Figure 3a shows the graphical representation of the decomposition for the case k = 3, and Figure 3b the solution ellipsoids of 3 tests whose intersection is one point. Figure 4a shows the solution hyperellipsoids of 4 tests whose intersection is one point, in the case k = 4.
We note that, to obtain one unique solution, we must intersect the solutions of at least two tests. Let us give a more precise idea of what happens in general. Given N tests we call, as in the previous case, the orthogonal vector of test i the vector orthogonal to the (k − 1)-dimensional subspace that contains the feasible vectors of that test. We project this subspace on the known subspace and describe the projected subspace through its own orthogonal vector. If these orthogonal vectors are linearly independent, the (k − 1)-dimensional subspaces intersect in exactly one point. Figure 4b shows an example in which, in the case k = 3, the orthogonal vectors are not linearly independent. The three solution sets of this example intersect in two points; hence, for k = 3, three tests are not always sufficient to determine a unique solution.
Lemma 4. For all k, the condition that, given k tests, the previously defined hyperplanes have linearly independent normal vectors is sufficient to determine one unique intersection, i.e., one unique solution vector that satisfies the system of conditions (4) for each test.

Proof. The intersection of k independent hyperplanes in R^k is a point. Given a test i, the affine subspace of that test is described by its normal vector and by its translation with respect to the origin.

The conditions relative to the k tests correspond to a linear system in which the i-th row of the coefficient matrix is the normal vector of the i-th hyperplane and the i-th component of the right-hand side is the corresponding translation. The coefficient matrix has full rank because of the linear independence of the normal vectors; hence the solution of the linear system is unique.

The uniqueness of the intersection also relies on the hypothesis of full column rank of the test matrices: this condition implies that the matrices map the subspaces containing the feasible vectors to hyperplanes. □
For example, with k = 2 (Lemma 3) this condition amounts to considering two tests with non-coincident lines, i.e., two non-coincident orthogonal vectors.
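A minimal numerical sketch of the argument of Lemma 4 (illustrative data, not the paper's tests): with k linearly independent normal vectors, the hyperplane conditions form a square linear system with a unique solution.

import numpy as np

# k hyperplanes in R^k with independent normals intersect in exactly one point.
k = 3
rng = np.random.default_rng(1)
normals = rng.standard_normal((k, k))   # i-th row: normal vector of test i
offsets = rng.standard_normal(k)        # i-th entry: translation of test i

# The intersection solves the linear system  normals @ x = offsets.
if np.linalg.matrix_rank(normals) == k:  # linear independence of the normals
    x_star = np.linalg.solve(normals, offsets)
    print("unique intersection:", x_star)
else:
    print("normals are dependent: the intersection is not a single point")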
3.2. The Case of Approximate Knowledge of the Two Norm Values
Let us consider N tests and call the two values defined in Lemma 2 those relative to test i. Since the two systems of conditions are equivalent, as shown in Lemma 2, we will take into account the system on the right for its simplicity: its second equation represents a hyperellipsoid translated with respect to the origin.
In a real application, we can assume to know only an interval in which the true value of the first quantity is contained and, analogously, an interval for the second. Supposing we know bounds on the underlying error terms, the bounds on the two values can be easily computed. Let us call these extreme values the lower and upper bounds; we will assume that the true value always lies between them for each i-th test of the considered set.
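As an illustration of how such bounds could be computed (the decomposition below is an assumption made only for this example, not necessarily the paper's exact setting): if the data decompose as f = f_tilde + delta_f with a known bound ||delta_f|| <= delta_plus, the triangle inequality immediately gives the computable extremes

    ||f|| - delta_plus <= ||f_tilde|| <= ||f|| + delta_plus.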
Condition (4) is now relaxed as follows: the true solution satisfies the interval conditions (16) for each i-th test of the considered set.

Assuming the extremes to be non-coincident (strict inequality between the lower and upper bounds), these conditions do not define a single point, i.e., the unique solution (as in (4) of Section 3.1), but an entire closed region of the space that may even be disconnected and that contains infinitely many possible solutions x different from the true one.
In Figure 5, two two-dimensional examples of the conditions for a single test are shown: on the left, the case of exact knowledge of the two norm values; on the right, the case in which only two intervals containing the true values are known.
Given a single test, the conditions (16) on a point x can be easily characterized. Given the first norm condition, we write x in terms of the vectors of the orthogonal basis given by the columns of V in the SVD decomposition of the test matrix. Then, expanding the squared norm, since the norm condition holds we obtain the equation of the hyperellipsoid for x.
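Assuming, to fix notation only, that the test matrix A has the SVD A = U Σ V^T with singular values σ_j, the computation just described plausibly reads: writing x = Σ_j ξ_j v_j in the basis of the columns of V, one gets

    ||A x||² = Σ_j σ_j² ξ_j²,

so a prescribed value r for ||A x|| yields the hyperellipsoid Σ_j σ_j² ξ_j² = r² in the coordinates ξ.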
The bounded conditions hence give the region between the two hyperellipsoids centered at the origin, as in (18). Analogously, the second condition gives the region between the two translated hyperellipsoids, as in (19).
Given a test i, each of the conditions (18) and (19) constrains the true solution to lie inside a thick hyperellipsoid, i.e., in the region between two concentric hyperellipsoids. The intersection of these two conditions for test i is a zero-residual region associated with the test. It is easy to verify that if the true value of either quantity is equal to one of its assumed extremes, the true solution lies on a border of this region, and if this holds for both quantities it lies on a vertex.
When more tests are put together, we have to consider the points that belong to the intersection of all these regions, i.e., the set (21). These points minimize, with zero residual, the associated optimization problem.
It is also easy to verify that, if the true solution lies on an edge or vertex of one of these regions, it lies on an edge or vertex of their intersection as well.
The intersected region tends to shrink monotonically as tests are added, in a way that depends on the properties of the added tests. We are interested in studying the conditions that make it reduce to a point, or at least to a small region. A sufficient condition to obtain a point is given in Theorem 1.
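A small Python sketch of these membership tests, under the same illustrative assumption as in the sketch above, i.e., that the two conditions of each test bound ||A x|| and ||A x − f||; all names and bounds are hypothetical.

import numpy as np

# Membership in the zero-residual region of one test and in the intersection
# (21) over several tests (A, f, and the bounds are hypothetical stand-ins).
def in_test_region(A, f, x, r_lo, r_hi, s_lo, s_hi):
    return (r_lo <= np.linalg.norm(A @ x) <= r_hi
            and s_lo <= np.linalg.norm(A @ x - f) <= s_hi)

def in_intersection(tests, x):
    # tests: list of tuples (A_i, f_i, r_lo, r_hi, s_lo, s_hi)
    return all(in_test_region(A, f, x, *b) for (A, f, *b) in tests)

# Tiny usage example: the true solution belongs to every region by construction.
rng = np.random.default_rng(2)
x_bar = rng.standard_normal(3)
tests = []
for A in (rng.standard_normal((4, 3)), rng.standard_normal((4, 3))):
    r = np.linalg.norm(A @ x_bar)
    tests.append((A, A @ x_bar, 0.9 * r, 1.1 * r, 0.0, 0.1))
print(in_intersection(tests, x_bar))   # True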
Let us first consider the function that, given a point of the space, returns the squared norm of its image through the test matrix, as in (23), where the expansion uses the columns of V and the singular values. The direction of maximum increase of this function is given by its gradient (24). Analogously, we define the translated function (25), with gradient (26).
Definition 2. (Upward/Downward Outgoing Gradients) Take a test i and the two functions defined in (23) and (25), with the gradient formulas (24) and (26). Given the two extreme values of each test, define the upward outgoing gradient as the gradient of the function when its value equals the upper extreme, and the downward outgoing gradient as the opposite of the gradient when its value equals the lower extreme.

Note that the upward/downward outgoing gradient of either function at a point x is the normal vector to the tangent plane of the hyperellipsoid on which the point lies. Moreover, these vectors point outward from the region defined by Equation (18) (and (19), respectively). In Figure 6, an example of some upward/downward outgoing gradients of the first function is shown.
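A small numerical sketch of these gradients, assuming (as in the earlier sketches) that the hyperellipsoid functions are F(x) = ||A x||² and G(x) = ||A x − f||², whose gradients are 2 AᵀA x and 2 Aᵀ(A x − f); the matrix and data are illustrative.

import numpy as np

# Gradients of the two hyperellipsoid functions (hypothetical A and f).
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
f = rng.standard_normal(4)
x = rng.standard_normal(3)

grad_F = 2.0 * A.T @ (A @ x)          # gradient of F(x) = ||A x||^2
grad_G = 2.0 * A.T @ (A @ x - f)      # gradient of G(x) = ||A x - f||^2

# Finite-difference check of the first component of grad_F.
eps = 1e-6
e0 = np.eye(3)[0]
fd = (np.linalg.norm(A @ (x + eps * e0))**2
      - np.linalg.norm(A @ (x - eps * e0))**2) / (2.0 * eps)
print(np.isclose(fd, grad_F[0], atol=1e-5))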
Theorem 1. Given N tests with true values lying in the closed intervals between the assumed extremes, take the set of all the upward/downward outgoing gradients of the functions (23) and (25) calculated at the true solution. If there is at least one outgoing gradient of this set in each orthant of the space, then the intersection region of Equation (21) reduces to a point.

Proof. We want to show that, given any perturbation of the true solution, there exists at least one condition among (18) and (19) that is not satisfied by the perturbed point.
Any sufficiently small perturbation lying in an orthant that contains an upward/downward outgoing gradient (from now on, a "Gradient") determines an increase/decrease in the value of the hyperellipsoid function relative to that Gradient, which makes the corresponding condition unsatisfied.
Hence, if the Gradient in the considered orthant is upward, the corresponding function attains its upper extreme at the true solution (and analogously for the translated function), and for each perturbation in the same orthant the perturbed value exceeds that extreme, violating the upper bound. In the same way, if the Gradient is downward, the perturbed value falls below the lower extreme, violating the lower bound.

When one orthant contains more than one Gradient, more than one condition will be unsatisfied by the perturbed point, for a sufficiently small perturbation in that orthant. □
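The orthant condition of Theorem 1 is easy to check numerically. The sketch below (illustrative gradient values; the treatment of zero components as compatible with either side is a choice made here, not taken from the paper) tests whether a set of outgoing gradients hits every orthant.

import numpy as np
from itertools import product

# Check whether the rows of G (outgoing gradients) cover every orthant of R^k.
def covers_all_orthants(G, tol=1e-12):
    k = G.shape[1]
    signs = np.sign(np.where(np.abs(G) < tol, 0.0, G))
    for orthant in product((-1.0, 1.0), repeat=k):
        # A gradient lies in this orthant if none of its components points
        # strictly against it (zeros are treated as compatible with either side).
        if not any(all(s == 0 or s == o for s, o in zip(row, orthant))
                   for row in signs):
            return False
    return True

G = np.array([[ 1.0,  1.0], [-1.0,  1.0],
              [ 1.0, -1.0], [-1.0, -1.0]])
print(covers_all_orthants(G))   # True: one gradient per orthant of R^2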