Article

The Algorithm of Gu and Eisenstat and D-Optimal Design of Experiments

National Physical Laboratory, Teddington TW11 0LW, UK
Algorithms 2024, 17(5), 193; https://doi.org/10.3390/a17050193
Submission received: 27 March 2024 / Revised: 25 April 2024 / Accepted: 30 April 2024 / Published: 2 May 2024
(This article belongs to the Special Issue Numerical Optimization and Algorithms: 2nd Edition)

Abstract

This paper addresses the following problem: given m potential observations to determine n parameters, m > n, what is the best choice of n observations? The problem can be formulated as finding the n × n submatrix of the complete m × n observation matrix that has maximum determinant. An algorithm by Gu and Eisenstat for determining a strongly rank-revealing QR factorisation of a matrix can be adapted to address this latter formulation. The algorithm starts with an initial selection of n rows of the observation matrix and then performs a sequence of row interchanges, with the determinant of the current submatrix strictly increasing at each step until no further improvement can be made. The algorithm implements rank-one updating strategies, which lead to a compact and efficient implementation. The algorithm does not necessarily determine the global optimum but provides a practical approach to designing an effective measurement strategy. In this paper, we describe how the Gu–Eisenstat algorithm can be adapted to address the problem of optimal experimental design and used with the QR algorithm with column pivoting to provide effective designs. We also describe implementations of sequential algorithms to add further measurements that optimise the information gain at each step. We illustrate performance on several metrology examples.

1. Introduction

While machine learning paradigms are based on access to large sets of data to extract the information of interest, many practical problems in science and technology arise in a context in which data are sparse and potentially expensive to generate. Given limited experimental resources, it is important that these resources are used effectively. The design of experiment (DoE) is an important area of study in applied mathematics and statistics. Many DoE studies relate to problems in which a response variable of interest, e.g., the yield of a crop, is possibly influenced by a number of control or stimulus variables, e.g., crop variety, soil type, irrigation regime, fertiliser treatment, etc. The form of the response function is usually unknown and the DoE problem is to design a number of experiments with the control variables set to different values, in order to estimate the response function from a set of candidate response functions and consequently choose control values that optimise the response; see, e.g., [1,2,3,4,5,6,7,8]. The DoE problem has been addressed in applications across many scientific disciplines, including the placement of sensors in an electrical power grid [9,10] and the measurement of thermal properties of materials [11,12], for example.
In this paper, we look at a particular aspect of the design problem: given a known response model, how do we choose a minimal number of measurements, taking into account the associated uncertainties, in order to estimate all the parameters of the system under study as accurately as possible? Thus, the problem can be posed as: given m potential observations to determine n parameters, m > n, what is the best choice of n observations? In the context of (linear) least squares estimation, a version of the problem can be formulated as finding the n × n submatrix of the complete m × n observation matrix that has maximum determinant. The problem can be posed as an example of the following general optimisation problem. Suppose $A = \{a_i,\; i = 1, \ldots, m\}$ is a set with m elements. Represent the set $P(A)$ of all subsets of A as the set of characteristic functions $\chi : A \to \{0, 1\}$: if $B \subseteq A$, then the corresponding function $\chi_B \in P(A)$ is such that $\chi_B(a_i) = 1$ if $a_i \in B$ and is 0 otherwise. Let $D : P(A) \to \mathbb{R}$ be a real-valued function defined on $P(A)$. The optimisation problem is
$$\min_{\chi \in P(A)} D(\chi) \quad \text{subject to} \quad \sum_{i=1}^{m} \chi(a_i) = n.$$
In theory, the global solution to this DoE problem can be found by a brute force assessment of all choices of subsets, a total of $m!/(n!(m-n)!)$ possibilities. For all but small m and n, this approach will be computationally infeasible, even if the functional D is cheap to evaluate. Two other related approaches are commonly used to partially solve this type of combinatorial optimisation problem.
The first approach, with a long history, see e.g., [13,14,15], is to use a sequential approach that at each stage constructs a new candidate subset by removing one element from the subset, replacing it by a new element that leads to a strict improvement in D. The computational cost per step for such an algorithm will be given by the cost of evaluating D and the cost of searching for a suitable exchange from the $n(m-n)$ possibilities. Ideally, the computational cost of an exchange should be no worse than $O(m)$ or $O(m \log m)$ as a function of m. The second approach is based on convex relaxation, replacing a combinatorial problem by a convex optimisation problem [16]. Let $0 \le w_i = w(a_i) \le 1$, where we think of $w_i$ as specifying the degree to which the ith element is in the optimal subset. If $W(A)$ is the set of functions $A \to [0, 1]$, then the optimisation problem is defined by
$$\min_{w \in W(A)} D(w) \quad \text{subject to} \quad \sum_{i=1}^{m} w(a_i) = n.$$
In this approach, the combinatorial constraint $\chi(a_i) \in \{0, 1\}$ is relaxed to the convex constraint $0 \le w(a_i) \le 1$. Often, the optimal w is in fact a characteristic function associated with an n-element subset. The optimisation algorithms for such problems are generally iterative and often also have a sequential element, such as in sequential quadratic programming [17] and semidefinite programming [18]. Convex relaxation approaches to DoE problems were considered in [4,9,10,16], for example.
In this paper, we show how two algorithms, both following a sequential approach and developed for numerical linear algebra applications, can be repurposed to address the design problem of choosing n observations from m possible observations.
The remainder of this paper is organised as follows. In Section 2, we give an overview of linear least squares estimation, while in Section 3, we describe commonly used aggregate measures of uncertainty and the corresponding criteria used in optimal experimental design. In Section 4, we describe two algorithms that address the DoE problem under consideration, as well as sequential algorithms for choosing additional observations in a step-wise optimal way. Example applications of the algorithms to design problems arising in metrology  [19,20,21,22] are given in Section 5. Our concluding remarks are given in Section 6.

2. Least Squares Problems

For an m × n observation matrix C and an m × 1 vector of measurements y, the least squares estimate a is the solution of
$$\min_{\alpha} \; (y - C\alpha)^{\top} (y - C\alpha).$$
The QR factorisation of C expresses C = Q R as a product of an m × m orthogonal matrix Q and an upper triangular matrix R. If
$$C = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = Q_1 R_1,$$
where $Q_1$ is the submatrix constructed from the first n columns of Q and $Q_2$ is constructed from the remaining $m - n$ columns, then
$$a = (C^{\top} C)^{-1} C^{\top} y = R_1^{-1} Q_1^{\top} y,$$
$$\hat{y} = C a = C (C^{\top} C)^{-1} C^{\top} y = Q_1 Q_1^{\top} y,$$
$$r = y - C a = (I - Q_1 Q_1^{\top})\, y = Q_2 Q_2^{\top} y,$$
are the least squares solution, model approximant, and associated residual vector. The QR factorisation enables a to be given by the solution of $R_1 a = Q_1^{\top} y$, and this solution can be constructed using back substitution, solving for $a_n$ first, then $a_{n-1}$, etc., exploiting the upper-triangular form of $R_1$. The QR factorisation also avoids the numerical loss of accuracy that can occur in forming $C^{\top} C$ and its inverse in the case in which C is ill-conditioned [23] (Chapter 5). The computational cost of evaluating the least-squares solution is $O(mn^2)$.
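As an illustration of these formulae, the following is a minimal sketch (assuming Python with NumPy, which is not part of the original paper) of the least-squares computation via the thin QR factorisation.

```python
import numpy as np

def lsq_via_qr(C, y):
    """Least-squares estimate for y ~ N(C a, I) via the thin QR factorisation C = Q1 R1.

    Returns the estimate a, the model approximant y_hat and the residual r.
    Illustrative sketch only; production code would use a dedicated solver.
    """
    Q1, R1 = np.linalg.qr(C, mode="reduced")   # Q1 is m x n with orthonormal columns, R1 is n x n
    a = np.linalg.solve(R1, Q1.T @ y)          # solve the triangular system R1 a = Q1^T y
    y_hat = Q1 @ (Q1.T @ y)                    # orthogonal projection of y onto the range of C
    r = y - y_hat                              # residual vector
    return a, y_hat, r
```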
The n × m matrix $S = R_1^{-1} Q_1^{\top}$ is the sensitivity matrix of the solution parameters a with respect to the data y, with $s_{ji} = \partial a_j / \partial y_i$. If the variance matrix associated with y is $V_y$, then the variance matrix associated with the solution a is
$$V_a = S V_y S^{\top}.$$
For the case V y = I ,
$$V_a = S S^{\top} = (R_1^{\top} R_1)^{-1} = (C^{\top} C)^{-1}.$$
For this case, the trace of the variance matrix is given by
$$\operatorname{Tr}(V_a) = \sum_{j} \sum_{i} s_{ji}^2,$$
the variance associated with $a_j$ is $u^2(a_j) = \sum_i s_{ji}^2$, and $\sum_j s_{ji}^2$ is the contribution of the uncertainty associated with the ith observation to the trace of $V_a$.
The least-squares solution a is the maximum likelihood estimate of α for the model $y \sim N(C\alpha, I)$. If y is associated with the variance matrix $V_y$, i.e., $y \sim N(C\alpha, V_y)$, and $V_y$ is full rank and has Cholesky decomposition $V_y = L L^{\top}$, where L is lower triangular [23], then the maximum likelihood estimate a of α solves the Gauss–Markov problem
$$\min_{\alpha} \; (y - C\alpha)^{\top} V_y^{-1} (y - C\alpha).$$
If
$$\tilde{C} = L^{-1} C, \qquad \tilde{y} = L^{-1} y,$$
then the solution of (3) is the least-squares solution of $\tilde{C} \alpha \approx \tilde{y}$:
$$a = \bigl(\tilde{C}^{\top} \tilde{C}\bigr)^{-1} \tilde{C}^{\top} \tilde{y}.$$
In particular, if $V_y$ is a diagonal matrix with $u_i^2 > 0$ in the ith diagonal element, then $\tilde{C}$ is constructed from the weighted rows of C, and similarly for $\tilde{y}$:
$$\tilde{c}_i = w_i c_i, \qquad \tilde{y}_i = w_i y_i, \qquad w_i = 1/u_i.$$
In general, the cost of evaluating the Cholesky factorisation of V y is O ( m 3 ) , which could be problematic for large m. Often, V y has a structure that enables the factorisation to be performed in O ( m ) , as in the simple but common case of a diagonal variance matrix.
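For the common diagonal case, the weighting and the Gauss–Markov estimate can be computed in a few lines; the following is a minimal sketch under the same NumPy assumption as above.

```python
import numpy as np

def gauss_markov_lsq(C, y, u):
    """Gauss-Markov estimate for a diagonal variance matrix V_y = diag(u_i^2), u_i > 0.

    The rows of C and the elements of y are weighted by w_i = 1/u_i, after which
    ordinary least squares applies.  Returns the estimate and its variance matrix.
    """
    w = 1.0 / np.asarray(u, dtype=float)
    C_w = C * w[:, None]                       # weighted observation matrix, c~_i = w_i c_i
    y_w = y * w                                # weighted data vector
    Q1, R1 = np.linalg.qr(C_w, mode="reduced")
    a = np.linalg.solve(R1, Q1.T @ y_w)
    V_a = np.linalg.inv(R1.T @ R1)             # V_a = (C~^T C~)^{-1}
    return a, V_a
```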

3. Aggregate Measures of Uncertainty

A central issue in design of experiment analysis is to maximise some measure of the information gain subject to constraints on resources. In a metrology context, in particular, it is natural to express the information gain in terms of uncertainties associated with the fitted parameters; i.e., related to the n × n variance matrix V a . Given an aggregate measure of uncertainty derived from V a , one experiment will be judged to be more informative than another if it is associated with a lower aggregate uncertainty. There are a number of aggregate measures that are commonly used. In this paper, we are concerned with two: A-optimality and D-optimality.
A full rank variance matrix V is a symmetric positive definite matrix and has an eigenvalue decomposition with eigenvalues $\lambda_j$, $j = 1, \ldots, n$. The A-measure is the trace $\operatorname{Tr}(V)$ of V, the sum of the diagonal elements of V, as well as the sum of the eigenvalues of V:
$$\operatorname{Tr}(V) = \sum_{j=1}^{n} \lambda_j.$$
We note that $\operatorname{Tr}(\sigma^2 V) = \sigma^2 \operatorname{Tr}(V)$. The D-measure is the determinant $|V|$ of V, and also the product of the eigenvalues of V:
$$|V| = \prod_{j=1}^{n} \lambda_j.$$
We also evaluate
$$\bar{d}(V) = |V|^{1/n} = \Bigl(\prod_{j=1}^{n} \lambda_j\Bigr)^{1/n},$$
the geometric mean of the eigenvalues, noting that $\bar{d}(\sigma^2 V) = \sigma^2 \bar{d}(V)$.
Arguments can be made for and against using each of these two measures. Given V, the A-measure can be evaluated in $O(n)$ steps, while the D-measure will usually involve $O(n^3)$ steps, using the Cholesky factorisation of the variance matrix to evaluate its determinant. The A-measure has the apparent advantage of being most easily interpreted as an aggregate uncertainty, being the sum of the variances $\sum_j u^2(a_j)$ associated with each of the parameter estimates $a_j$ or, more precisely, the numerical values of these estimates. From a dimensional analysis point of view [24], the A-measure can only be meaningful if all the parameters in a have the same dimension, length, mass, etc. Otherwise, the trace involves adding quantities of different dimensions. The D-measure is consistent with a dimensional analysis.
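The two measures, and the geometric-mean measure used later, can be evaluated as in the following sketch (NumPy assumed, as before); the Cholesky factor is used for the determinant, as suggested above.

```python
import numpy as np

def aggregate_measures(V):
    """A-measure, D-measure and geometric-mean measure for a positive definite variance matrix V."""
    n = V.shape[0]
    a_measure = float(np.trace(V))                 # sum of the eigenvalues, O(n) given V
    L = np.linalg.cholesky(V)                      # O(n^3)
    d_measure = float(np.prod(np.diag(L)) ** 2)    # |V| = |L|^2, product of the eigenvalues
    d_bar = d_measure ** (1.0 / n)                 # geometric mean of the eigenvalues
    return a_measure, d_measure, d_bar
```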
The A-measure can also be regarded as the variance associated with the sum $\sum_j a_j$ of parameter estimates, assuming these estimates are statistically independent. In general, they will not be independent, and one disadvantage of the A-measure is that it takes no account of correlation, being defined solely in terms of the diagonal elements of V. The D-measure does take into account correlation. Consider the 2 × 2 variance matrix
$$V = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, \qquad \operatorname{Tr}(V) = 2, \qquad |V| = 1 - \rho^2,$$
associated with parameters $a_1$ and $a_2$. The A-measure is independent of the correlation coefficient ρ while the D-measure depends directly on $\rho^2$. If the result of an experiment is to change ρ from 0 to near 1, there is a gain of information, since $a_1 - a_2$ is now much more accurately estimated. The D-measure reflects this information gain; the A-measure does not.
The D-measure is invariant to re-parametrizations of the model, while the A-measure is not. Suppose $V_a = (C^{\top} C)^{-1}$ and let $b = B a$, where B is n × n and full rank, so that b can be regarded as an alternative parametrization of the problem. For the same experimental design, the variance matrix associated with the estimates b is given by
$$V_b = B V_a B^{\top}.$$
We have
$$|V_b| = |B V_a B^{\top}| = |B|^2\, |V_a|,$$
so that a strategy that is optimal for the D-measure for parametrization a will also be optimal for parametrization b. For the A-measure,
$$\operatorname{Tr}(V_b) = \operatorname{Tr}(B V_a B^{\top}) = \operatorname{Tr}(B^{\top} B V_a) = \operatorname{Tr}(V_a B^{\top} B),$$
and, in general, there is no constant K such that $\operatorname{Tr}(B V_a B^{\top}) = K \operatorname{Tr}(V_a)$, unless $B^{\top} B$ is a multiple of the identity matrix. In a similar vein, the D-measure is invariant with respect to changes in units; the A-measure is not.
The eigenvalues $\lambda_j$ of $V_a$ are related to the size of the confidence ellipses centred about the parameter estimates a. In fact, for a given confidence level, the lengths of the semi-axes are proportional to $\sqrt{\lambda_j}$. Hence, the D-measure is related to the square of the volume of the confidence ellipse, while the A-measure is related to the sum of the squares of the semi-axis lengths.

4. Choosing a D-Optimal Subset of n Measurements

Let C be an m × n observation matrix, m > n , representing m measurements that could potentially be made. If we are constrained only to use n measurements, which n should we choose, in order to minimise an aggregate measure of the uncertainties associated with the fitted parameters? It is assumed that there is at least one n × n submatrix constructed from the rows of C that is full rank, allowing all the parameters a to be estimated. In this section, we consider two algorithms described below in Section 4.1 and Section 4.2, for addressing the issue of subset selection with a view to providing a D-optimal solution. Both algorithms were designed to address other problems but can be repurposed for design of experiment applications.
Let $C_k$ be a full rank n × n submatrix of C defined by a subset of the rows of C indexed by row indices $I_k = \{i_1, \ldots, i_n\}$, and set $V_k = H_k^{-1}$, where $H_k = C_k^{\top} C_k$. Then,
$$|V_k| = |H_k|^{-1} = |C_k|^{-2},$$
so that choosing $I_k$ to minimise $|V_k|$ amounts to choosing $I_k$ to maximise $|C_k|$. For any square n × n matrix A with rows $a_i$, $|A|$ is the volume of the parallelepiped spanned by the vectors $a_i$ in $\mathbb{R}^n$. This volume will depend on the Euclidean lengths $\|a_i\|_2$ of the vectors and also their mutual independence. For vectors of equal length, $|A|$ is maximised if the vectors $a_i$ are orthogonal to each other (and can therefore be rotated or reflected to align with the coordinate axes). In terms of the observation matrix C, the lengths of the row vectors $c_i$ reflect the accuracy associated with the measurement $y_i$—more accurate measurements will be accorded a greater weight and therefore a larger row vector $c_i$ in terms of Euclidean length. The orthogonality of the row vectors relates to how independent the sets of information represented by the measurements are from each other. Consider
$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = C \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \end{bmatrix}, \qquad C = C(\theta) = \begin{bmatrix} 1 & 0 \\ \cos\theta & \sin\theta \end{bmatrix}, \qquad \epsilon_1, \epsilon_2 \sim N(0, 1).$$
Both measurements $y_1$ and $y_2$ are associated with equal information gain, since $\|c_1\| = \|c_2\| = 1$. The data point $y_1$ provides information about $a_1$ alone, while the data point $y_2$ potentially provides information about both $a_1$ and $a_2$. For $\theta \approx 0$, the observation equation involving $y_2$ only serves to provide more information about $a_1$, and $a_2$ remains poorly estimated with $|C(\theta)| = |\sin\theta| \approx 0$. For $\theta \approx \pi/2$, the information provided by $y_2$ provides significant new information about $a_2$ and $|C(\theta)| = |\sin\theta| \approx 1$. In this way, amongst a set of measurements of comparable accuracy, we look for subsets that are mutually maximally independent. Both of the algorithms described below involve trying to select a set of measurements that are sufficiently independent.

4.1. Subset Selection Using the QR Factorisation with Column Pivoting

The QR factorisation of an m × n matrix $C = QR$ has the following geometrical interpretation. The column vectors $c_j$ of C define an n-dimensional subspace $\mathcal{C}$ of $\mathbb{R}^m$. The orthogonal matrix Q defines an axis system for $\mathbb{R}^m$, such that the n columns of $Q_1$ define an axis system for $\mathcal{C}$ and the $m - n$ columns of $Q_2$ define an axis system for the space of vectors $\mathcal{C}^{\perp}$ orthogonal to $\mathcal{C}$. The columns of $Q_1$ are constructed so that $q_1$ is aligned with $c_1$, so that there is an $r_{11}$ such that $c_1 = r_{11} q_1$. The vector $q_2$ is chosen to lie in the plane defined by $c_1$ and $c_2$, and so there are scalars $r_{12}$ and $r_{22}$ such that $c_2 = r_{12} q_1 + r_{22} q_2$, etc. This gives the factorisation
$$\begin{bmatrix} c_1 & \cdots & c_n \end{bmatrix} = \begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & r_{nn} \end{bmatrix},$$
i.e., in matrix notation C = Q 1 R 1 .
An equivalent geometrical interpretation is that the orthogonal matrix $Q^{\top}$, a combination of rotations and reflections, transforms C so that its first column is aligned with the first axis in $\mathbb{R}^m$, the second column lies in the plane defined by the first and second axes in $\mathbb{R}^m$, the third column lies in the three-dimensional space defined by the first three axes in $\mathbb{R}^m$, and so on: $Q^{\top} C = R$.
For the standard QR algorithm, the columns of C are processed in the order they appear in the matrix C. For the QR algorithm with column pivoting [23], the order of the columns is changed to improve the numerical properties of the algorithm. The column of largest Euclidean length is aligned with the first axis. Of the remaining columns, the column that is furthest from the space defined by the first axis is then rotated to the plane defined by the first and second axes. At the k + 1 th stage, it is the column furthest from the subspace defined by the first k axes directions that is chosen to be transformed to lie in the space defined by the first k + 1 axes directions. The output of the algorithm includes the specification of a permutation matrix P that permutes the columns of C, along with an orthogonal Q and upper triangular R such that C P = Q R .
The first algorithm considered here is based on a subset selection algorithm described in [23] (Section 5.5). The issue addressed by the subset selection algorithm is that of trying to explain a measured response vector y in terms of basis vectors, usually representing covariates x and functions of covariates, stored in an m × p matrix C. It is assumed that y lies close to an n-dimensional subspace defined by a subset of the columns of C. The problem is to estimate n and choose n columns of C that best explain the response, using QR with column pivoting to select columns that are optimally mutually independent. This information can then be used to predict other, possibly future, responses based on knowledge of the variates represented by the selected columns. The algorithm uses the singular value decomposition (SVD) to estimate n and the QR algorithm with column pivoting to select an appropriate set of n columns.
Our design of experiment problem is different. The observation matrix C of all potential measurements is assumed to be full rank, so estimating the rank of C is not required. More importantly, our selection problem is to select from the rows of C, not the columns. However, the selection of a set of vectors that are maximally mutually independent is directly relevant. Suppose C has QR factorisation $C = Q_1 R_1$, where $Q_1$ is m × n and $R_1$ is n × n, upper-triangular and full rank. The matrix $R_1$ can be used to re-parametrize the problem in terms of $b = R_1 a$. The observation matrix associated with parameters b is $Q_1$. Since D-optimal designs are independent of parametrization, it is sufficient to work with $Q_1$ as follows. We use column pivoting to find the QR factorisation of $Q_1^{\top}$ so that $Q_1^{\top} P = U T$, where U is an n × n orthogonal matrix and T is upper-triangular. The first n columns selected by the QR decomposition of $Q_1^{\top}$ specified by P also specify the n rows of C that are a good choice for the D-optimal design. This algorithm is summarised below (Algorithm 1). Its computational complexity is $O(mn^2)$.
Algorithm 1: SSQR: subset selection algorithm using QR with column pivoting to determine a favourable subset of an observation matrix in terms of the D-measure
Data: An m × n observation matrix C.
Result: Index set $I \subseteq \{1, \ldots, m\}$ whose first n elements specify a submatrix $C_n$ that represents a good selection of the rows of C in terms of maximising the determinant.
Steps:
Determine the QR factorization of C = Q 1 R 1 , where Q 1 is an m × n orthogonal matrix.
Use column pivoting to find the QR factorisation of $Q_1^{\top}$ so that $Q_1^{\top} P = U T$, where U is an n × n orthogonal matrix and T is upper-triangular.
Assign the index set I by setting $I(k) = i_k$, where $i_k$ specifies the row index of the kth column of P such that $P(i_k, k) = 1$.
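A minimal sketch of these steps, assuming Python with NumPy and SciPy (the function name ssqr is illustrative and not part of the paper), is as follows.

```python
import numpy as np
from scipy.linalg import qr

def ssqr(C):
    """SSQR subset selection (a sketch of Algorithm 1).

    Returns a permutation of the row indices of the m x n observation matrix C;
    the first n entries indicate rows giving, heuristically, a large determinant.
    """
    Q1, _ = np.linalg.qr(C, mode="reduced")        # C = Q1 R1
    _, _, piv = qr(Q1.T, pivoting=True)            # QR with column pivoting: Q1^T P = U T
    return piv                                     # piv[k] is the row of C selected at step k
```

For example, `C_n = C[ssqr(C)[:C.shape[1]], :]` extracts the selected n × n submatrix.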
The SSQR algorithm is not targeted directly at determining a D-optimal design but is an approach that, at least heuristically, has elements that make it likely to produce a design that is good in terms of the D-measure. The algorithm is an example of a sequential 'greedy algorithm', in that at each step, a choice is made that maximises some measure given the choices already made, taking no account of what choices could arise in a future step. Once a row has been selected, there is no opportunity for the row to be deselected at a future iteration. The algorithm is largely independent of the ordering of the rows of the observation matrix C and, other than for reasons of rounding error effects and symmetries in the observation matrix, will arrive at the same solution for different row orderings. As has been mentioned, there is no guarantee that this solution is globally optimal. Example calculations using the algorithm are discussed in Section 5.

4.2. Exchange Approach Based on the Gu and Eisenstat Algorithm

The second algorithm described in this paper is directly targeted at determining a D-optimal design. It is based on the iterative algorithm of Gu and Eisenstat [25] for determining a strongly rank-revealing QR decomposition. Below, we show how it can be adapted to provide a reduction in the D-measure at each iteration through exchanging rows and to stop when no reduction is possible. It starts with a selection C 0 of n rows of the m × n observation matrix C, where C 0 is already full rank; e.g., that provided by the SSQR subset selection algorithm of Section 4.1.
We recall that minimising the D-measure is equivalent to maximising the absolute value of the determinant of the upper-triangular factor of the observation matrix. The idea of the algorithm is as follows. Partition $C^{\top}$ as $C^{\top} = \begin{bmatrix} A & B \end{bmatrix}$, where A is n × n and B is n × (m − n), so that the columns of A correspond to the currently selected rows of C. We assume that $A = C_0^{\top}$ is full rank. The Gu–Eisenstat (GE) algorithm uses the following result:
Proposition 1.
For an n × n full rank matrix A and an n × p matrix B, let $\tilde{A}$ be the matrix constructed by replacing the ith column of A, $1 \le i \le n$, by the jth column of B, $1 \le j \le p$. Then,
$$|\tilde{A}| = F_{i,j}\, |A|,$$
where $F = A^{-1} B$.
Proof. 
For n-vectors u and v,
$$|A + u v^{\top}| = |A|\, (1 + v^{\top} A^{-1} u);$$
see Appendix A.2. The result follows for $v = e_i$, the ith column of the identity matrix, and $u = b_j - a_i$, noting that $A^{-1} a_i = e_i$.    □
This suggests the following approach, assuming that the first n rows of C are full rank (Algorithm 2).
Algorithm 2: Basic steps to determine a D-optimal subset of an observation matrix
Algorithms 17 00193 i001
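The following is a minimal NumPy sketch of these basic steps, forming F explicitly at each pass rather than using the rank-one updates described next; the function and threshold names are illustrative, not the paper's.

```python
import numpy as np

def ge_exchange(C, sel, f=1.0 + 1e-10):
    """Row-exchange loop of Algorithm 2 (sketch; F is recomputed from scratch each pass).

    C   : m x n observation matrix.
    sel : list of n row indices forming a full-rank starting selection (e.g. from SSQR).
    f   : threshold; an exchange is accepted only if it multiplies |det| by more than f.
    """
    sel = list(sel)
    rest = [i for i in range(C.shape[0]) if i not in sel]
    while True:
        A = C[sel, :].T                            # columns of A are the selected rows of C
        B = C[rest, :].T                           # columns of B are the unselected rows
        F = np.linalg.solve(A, B)                  # F = A^{-1} B; |F[i, j]| is the gain factor
        i, j = np.unravel_index(np.argmax(np.abs(F)), F.shape)
        if abs(F[i, j]) <= f:                      # no exchange increases the determinant: stop
            return sel
        sel[i], rest[j] = rest[j], sel[i]          # swap the ith selected and jth unselected rows
```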
The main computational element of this algorithm is forming $F = A^{-1} B$ at each iteration. Forming F explicitly in this way involves $O(n^3)$ steps to evaluate $A^{-1}$ and a further $O(mn^2)$ steps to evaluate the matrix product. Let Q be orthogonal such that $\tilde{A} = QA$ is upper-triangular and set
$$Q \begin{bmatrix} A & B \end{bmatrix} = \begin{bmatrix} \tilde{A} & \tilde{B} \end{bmatrix}.$$
Then, $F = A^{-1} B = \tilde{A}^{-1} \tilde{B}$, and the latter can be computed efficiently using back-substitution. Thus, step 4 in Algorithm 2 can be replaced by
4.1
Determine orthogonal Q such that Q A is upper-triangular.
4.2
Set
$$\begin{bmatrix} A & B \end{bmatrix} := Q \begin{bmatrix} A & B \end{bmatrix}, \qquad F = A^{-1} B.$$
Since $|QA| = |A|$, up to sign, the modified algorithm still determines the optimal index set. As described in the Gu and Eisenstat paper, these computations can be made much more efficient by exploiting the fact that at each iteration we are performing a rank-one modification to update an upper triangular matrix in step 2 in Algorithm 2 above. Suppose we have determined to replace the ith column of the upper-triangular matrix A by the jth column of B. First, we move the ith column of A to the nth column, as shown schematically in (7).
(Schematic (7): the ith column of the upper-triangular matrix is moved to the last column position, with columns i + 1 to n shifted left by one; the result is upper-triangular apart from sub-diagonal elements in the shifted columns.)
This step can be written as
$$\begin{bmatrix} A_1 & a_i & A_2 \end{bmatrix} \;\longrightarrow\; \begin{bmatrix} A_1 & A_2 & a_i \end{bmatrix} = A\, \Pi_{i,n},$$
where $\Pi_{i,n}$ is the appropriate permutation matrix. The right-hand matrix is upper-triangular except for the sub-diagonal elements stored in $A_2$. Let Q be the orthogonal matrix where $\tilde{A} = Q A \Pi_{i,n}$ is upper-triangular and set $\tilde{B} = Q B$. Note that Q only acts on rows and columns i to n of $A \Pi_{i,n}$ and rows i to n of B. We have that
$$\tilde{F} = \tilde{A}^{-1} \tilde{B} = \Pi_{i,n}^{\top} F.$$
We can now replace the nth column a ˜ n of A ˜ by the jth column b ˜ j of B ˜ , forming matrices
$$\bar{A} = \tilde{A} + u\, e_n^{\top}, \qquad \bar{B} = \tilde{B} - u\, e_j^{\top}, \qquad u = \tilde{b}_j - \tilde{a}_n.$$
Note that the matrix $\bar{A}$ on the left remains upper-triangular. For the next iteration it remains to calculate $\bar{F} = \bar{A}^{-1} \bar{B}$. Using the Sherman–Morrison formula [26] (Appendix A.1), we have
$$\bar{A}^{-1} = \bigl(\tilde{A} + u\, e_n^{\top}\bigr)^{-1} = \tilde{A}^{-1} - \frac{\tilde{u}\, \tilde{v}^{\top}}{1 + e_n^{\top} \tilde{u}}, \qquad \tilde{u} = \tilde{A}^{-1} u, \qquad \tilde{v} = \tilde{A}^{-\top} e_n,$$
so that
$$\bar{A}^{-1} \bar{B} = \tilde{A}^{-1} \tilde{B} - \tilde{A}^{-1} u\, e_j^{\top} - \frac{\tilde{u}\, (\tilde{v}^{\top} \tilde{B})}{1 + e_n^{\top} \tilde{u}} + \frac{\tilde{u}\, (\tilde{v}^{\top} u)\, e_j^{\top}}{1 + e_n^{\top} \tilde{u}} = \tilde{F} + \tilde{u}\, w^{\top},$$
where
$$w^{\top} = \Bigl(\frac{\tilde{v}^{\top} u}{1 + e_n^{\top} \tilde{u}} - 1\Bigr) e_j^{\top} - \frac{\tilde{v}^{\top} \tilde{B}}{1 + e_n^{\top} \tilde{u}}.$$
Thus, F ¯ is a rank-one update of F ˜ . The expressions above can be considerably simplified. We note that
$$\tilde{u} = \tilde{F}(:,j) - e_n, \qquad \frac{\tilde{v}^{\top} u}{1 + e_n^{\top} \tilde{u}} - 1 = -\frac{1}{\tilde{F}(n,j)}, \qquad \frac{\tilde{v}^{\top} \tilde{B}}{1 + e_n^{\top} \tilde{u}} = \frac{\tilde{F}(n,:)}{\tilde{F}(n,j)},$$
so that
$$w^{\top} = -\frac{1}{\tilde{F}(n,j)} \bigl(\tilde{F}(n,:) + e_j^{\top}\bigr).$$
Recalling that $\tilde{F} = \Pi_{i,n}^{\top} A^{-1} B$, we see from (8) that $\bar{A}^{-1} \bar{B}$ can be constructed as a rank-one update of the previously calculated $F = A^{-1} B$. We also note that $|\tilde{F}(n,j)| = |F(i,j)| > 1$ (since the update is only invoked to increase $|A|$) so that the rank-one update of F is numerically stable. The formation of $\tilde{A}$ and $\bar{A}$ requires $O(n^2)$ steps and the formation of $\bar{F}$ requires $O(mn)$ steps. These calculations lead to a computationally efficient and numerically stable version (Algorithm 3) of Algorithm 2:
Algorithm 3: GE: efficient algorithm to determine a D-optimal subset of an observation matrix
Algorithms 17 00193 i002
The algorithm is a sequential descent algorithm, with each exchange producing a decrease in the determinant of the variance matrix by at least a factor of $1/f^2 < 1$, ensuring that it stops in a finite number of steps. However, there is no guarantee that the algorithm will determine the globally D-optimal subset, only a locally D-optimal solution. A different ordering of the rows of the observation matrix is quite likely to produce a different solution. Consider the following matrix $C(a)$:
$$C(a) = \begin{bmatrix} 1.00 & 0.00 & 0.00 & 0.00 \\ 0.00 & 1.00 & 0.00 & 0.00 \\ 0.00 & 0.00 & 1.00 & 0.00 \\ 0.00 & 0.00 & 0.00 & a \\ 0.50 & 0.50 & 0.50 & 0.50 \\ -0.17 & 0.83 & -0.17 & -0.50 \\ -0.17 & -0.17 & 0.83 & -0.50 \\ 0.83 & -0.17 & -0.17 & -0.50 \end{bmatrix}.$$
The bottom four rows form an orthogonal matrix Q with $|Q| = 1$. However, for $a > 0.5$, there is no row interchange that will allow the determinant of the first four rows to be increased, and the GE algorithm stops with the first four rows with determinant a, sub-optimal if $a < 1.0$. For $0.5 < a < 1.0$, the SSQR algorithm, by contrast, selects the bottom four rows, in this case, the globally optimal solution.

4.3. Combined SSQR-GE Algorithm

An effective approach in practice is to combine the SSQR and GE algorithms, using the SSQR algorithm to find an initial selection of rows of the observation matrix, and then using the GE interchange algorithm to improve, if possible, on this selection. This combined algorithm has the advantages of being largely independent of the row ordering and will deliver a local minimum based on the SSQR algorithm starting point. Again, there is no guarantee that the solution will be a global optimum, but the examples discussed in Section 5 show that the combined algorithm works well.
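Using the two sketches given earlier (the hypothetical ssqr and ge_exchange functions, which are not part of the paper), the combined strategy can be expressed in a few lines.

```python
import numpy as np

def ssqr_ge(C):
    """Combined SSQR-GE selection: SSQR provides the starting point, GE exchanges improve it."""
    n = C.shape[1]
    sel0 = list(ssqr(C)[:n])          # initial selection from QR with column pivoting
    return ge_exchange(C, sel0)       # locally D-optimal selection of n rows

# Example: D-measure of the resulting design
# sel = ssqr_ge(C)
# Cn = C[sel, :]
# d_measure = 1.0 / np.linalg.det(Cn.T @ Cn)    # |V| = |C_n^T C_n|^{-1}
```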

4.4. Sequential Approach for Choosing a Set of Additional Observations

The algorithms described in Section 4.1 and Section 4.2 are designed to determine a minimal set of observations that produce D-optimal estimates of the model parameters a. We now consider the case in which estimates of a have already been established. Suppose the (full rank) variance matrix $V = L L^{\top} = (R^{\top} R)^{-1}$ represents the current knowledge about parameters a and up to m (new) measurements are potentially available, modelled according to
$$y_i \sim N(c_i^{\top} a,\, 1), \qquad i = 1, \ldots, m.$$
(As before, we assume that, for each i, y i and c i have been weighted so that the uncertainty associated with y i is 1.) Which measurements should we make in order to best improve our knowledge about a? We suppose that resources exist to make p more observations. An exhaustive search of all combinations is only feasible for small p and modest m. The sequential algorithms below add the observations one at a time, with the choice at each stage designed to maximise the information. These are further examples of so-called ‘greedy algorithms’.
Let C be the m × n observation matrix whose ith row is c i and V i the variance matrix associated with a given that the ith measurement (and only the ith measurement) is made. Then,
$$V_i = (R^{\top} R + c_i c_i^{\top})^{-1} = V - \frac{(V c_i)(V c_i)^{\top}}{1 + c_i^{\top} V c_i},$$
using the Sherman–Morrison formula [26] (Appendix A.1). Let
$$f_i = V c_i, \qquad g_i^2 = c_i^{\top} f_i = c_i^{\top} V c_i.$$
Then,
$$V_i = V - \frac{f_i f_i^{\top}}{1 + g_i^2} = V - u_i u_i^{\top}, \qquad u_i = \frac{1}{\sqrt{1 + g_i^2}}\, f_i,$$
so that V i is a symmetric rank-one update of V.
The eigenvalues λ j , i of V i can be related to the eigenvalues λ j of V through the following result [23] (Section 8.1), [27] (p. 94):
Proposition 2.
Suppose $W = V + \rho\, u u^{\top}$, where V is an n × n symmetric matrix, u is an n-vector with $u^{\top} u = 1$ and ρ is a constant. Suppose V and W have eigenvalues
$$\lambda_1(V) \ge \lambda_2(V) \ge \cdots \ge \lambda_n(V),$$
and
$$\lambda_1(W) \ge \lambda_2(W) \ge \cdots \ge \lambda_n(W).$$
If $\rho \ge 0$, then
$$\lambda_i(W) \in [\lambda_i(V),\, \lambda_{i-1}(V)], \qquad i = 2, \ldots, n,$$
while if $\rho \le 0$, then
$$\lambda_i(W) \in [\lambda_{i+1}(V),\, \lambda_i(V)], \qquad i = 1, \ldots, n-1.$$
In either case, there exist n constants $\gamma_i$, $\gamma_i \ge 0$, with $\gamma_1 + \cdots + \gamma_n = 1$, such that
$$\lambda_i(W) = \lambda_i(V) + \gamma_i \rho.$$

4.4.1. Sequential D-Optimality Algorithm

We use the relationship (5) to write
$$|V_i| = \bigl|R^{\top} R + c_i c_i^{\top}\bigr|^{-1} = \bigl|R^{\top} R\bigr|^{-1} \bigl(1 + c_i^{\top} (R^{\top} R)^{-1} c_i\bigr)^{-1} = |V| \bigl(1 + c_i^{\top} V c_i\bigr)^{-1} = \frac{1}{1 + g_i^2}\, |V|, \qquad g_i^2 = c_i^{\top} V c_i.$$
A sequential algorithm can therefore be designed which at each stage chooses the i that maximises $g_i^2$. The algorithm can be run in two modes: the first in which the algorithm can choose any row at most one time, the second in which there is no limit on the number of times any row can be selected. Having a row repeated k times is equivalent to having a single more accurate observation with its uncertainty reduced by a factor of $1/\sqrt{k}$. In the second mode, it is often the case that additional measurements correspond to repeat measurements chosen from the D-optimal set of n points. In the algorithm below (Algorithm 4), the mode is controlled by the flag $i_R$:
Algorithm 4: Sequential approach to optimising the D-measure in choosing p observations from a possible m > p observations
Algorithms 17 00193 i003
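A minimal sketch of this sequential procedure, written directly against the update formulae above, is given below (NumPy assumed; the argument allow_repeats plays the role of the flag $i_R$, and the names are illustrative).

```python
import numpy as np

def sequential_d_optimal(C, V, p, allow_repeats=True):
    """Sequentially choose p additional observations to reduce the D-measure (sketch of Algorithm 4).

    C : m x n matrix of candidate (weighted) observation rows c_i.
    V : n x n current variance matrix of the parameter estimates.
    Returns the chosen row indices and the factors t_q = |V_new| / |V_old| at each step.
    """
    V = V.copy()
    available = np.ones(C.shape[0], dtype=bool)
    chosen, t = [], []
    for _ in range(p):
        g2 = np.einsum('ij,jk,ik->i', C, V, C)        # g_i^2 = c_i^T V c_i for all candidate rows
        k = int(np.argmax(np.where(available, g2, -np.inf)))
        f = V @ C[k]                                  # f_k = V c_k
        V -= np.outer(f, f) / (1.0 + g2[k])           # rank-one downdate of the variance matrix
        chosen.append(k)
        t.append(1.0 / (1.0 + g2[k]))                 # |V_k| = |V| / (1 + g_k^2)
        if not allow_repeats:
            available[k] = False                      # each row used at most once in this mode
    return chosen, t
```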
For each q, the only operations required are matrix–vector multiplications and similar, requiring at most O ( m n ) steps. This compares with O ( m n 3 ) steps for an algorithm not exploiting rank-one updates.

4.4.2. Expected Information Gain Calculations

The quantities t q calculated at each stage give the reduction in determinant of the variance matrix at the qth step. The following argument can be used to provide an expected value for t q , under the assumption that the rows of C each provide approximately the same amount of information. Suppose C consists of rows with a 1 in one column and zeros elsewhere. A D-optimal design would select rows so that the ones are distributed equally amongst the columns. If a total of q rows are selected, the variance matrix V q is such that
$$V_q \approx \frac{n}{q}\, I, \qquad |V_q| \approx \Bigl(\frac{n}{q}\Bigr)^{n}.$$
From this argument, the expected value of t q , q > n , is such that
$$t_q \approx \tilde{t}(q) = \Bigl(\frac{q-1}{q}\Bigr)^{n}.$$
In terms of the geometric mean d ¯ of the eigenvalues of the variance matrices, we have
$$\bar{d}(V_q) = \frac{q-1}{q}\, \bar{d}(V_{q-1}).$$
For q = n + 1 , we have
$$\frac{1}{2} \;\ge\; \Bigl(\frac{n}{n+1}\Bigr)^{n} \;\ge\; \frac{1}{e} \approx 0.37,$$
where the left-hand value corresponds to the case n = 1, and $1/e$ is the limiting value as $n \to \infty$.

4.4.3. Sequential A-Optimality Algorithm

We can also construct a counterpart of Algorithm 4 for A-optimality. For any v, $\operatorname{Tr}(v v^{\top}) = v^{\top} v = \|v\|^2$. With $f_i$ and $g_i$ as in (10), if
$$\tau_i^2 = \|f_i\|^2 / (1 + g_i^2), \qquad \|f_i\|^2 = f_i^{\top} f_i,$$
then
$$\operatorname{Tr}(V_i) = \operatorname{Tr}(V) - \tau_i^2.$$
This last expression gives the change in the A-measure for a given choice of one additional measurement. A sequential algorithm can therefore be designed which at each stage chooses the i that maximises $\tau_i^2$. Each stage requires the calculation of the quantities $\|f_i\|^2$ and $g_i^2$, $i = 1, \ldots, m$. If V is the current variance matrix, then the $\|f_i\|^2$ are given by the sums of squares of the columns of $F = V C^{\top}$. Suppose measurement k maximises $\tau_i^2$. Then, the update of F is given by
$$F := (V - u_k u_k^{\top})\, C^{\top} = F - u_k w^{\top}, \qquad w = C u_k,$$
and the update of $g_i^2$ is given by $g_i^2 - w_i^2$, with the latter evaluated as $(g_i - w_i)(g_i + w_i)$ for better numerical accuracy. These steps are summarised below (Algorithm 5):
Algorithm 5: Sequential approach to optimising the A-measure in choosing p observations from a possible m > p observations
Algorithms 17 00193 i004
As for Algorithm 4, the computational requirement for each step is O ( m n ) .
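A corresponding sketch of the A-optimal sequence (again NumPy, with illustrative names) follows directly from the expressions above.

```python
import numpy as np

def sequential_a_optimal(C, V, p, allow_repeats=True):
    """Sequentially choose p additional observations to reduce the A-measure (sketch of Algorithm 5).

    At each step the row maximising tau_i^2 = ||V c_i||^2 / (1 + c_i^T V c_i) is chosen,
    and Tr(V) decreases by exactly tau_i^2.
    """
    V = V.copy()
    available = np.ones(C.shape[0], dtype=bool)
    chosen = []
    for _ in range(p):
        F = C @ V                                         # ith row of F is (V c_i)^T
        g2 = np.einsum('ij,ij->i', F, C)                  # g_i^2 = c_i^T V c_i
        tau2 = np.einsum('ij,ij->i', F, F) / (1.0 + g2)   # tau_i^2 = ||f_i||^2 / (1 + g_i^2)
        k = int(np.argmax(np.where(available, tau2, -np.inf)))
        u = F[k] / np.sqrt(1.0 + g2[k])                   # u_k = f_k / sqrt(1 + g_k^2)
        V -= np.outer(u, u)                               # Tr(V) decreases by tau_k^2
        chosen.append(k)
        if not allow_repeats:
            available[k] = False
    return chosen
```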

5. Numerical Examples

5.1. Polynomial Calibration Curves

Many instrument response functions are modelled as polynomial curves, most commonly of order 2 (degree 1), i.e., modelling a linear response, but higher orders arise, e.g., in the calibration of platinum resistance thermometers [28], or in the calibration of a stage motion [29] where polynomials of order 10 are involved. (Design problems for polynomial models augmented by Gaussian process models are considered in [4].) Suppose a polynomial response function of order n is defined on the interval [−1, 1], and that it is desired to calibrate the response function using n calibration points $x_i \in [-1, 1]$. Which choice of n points is D-optimal? For a linear (quadratic) response $n = 2$ ($n = 3$), it is intuitively clear that the end points $x = \pm 1$ (and $x = 0$) are optimal. For higher orders, the optimal choice is not obvious. In fact, the D-optimal set [30] for order n is given by solutions of
$$(1 - x^2)\, \dot{L}_n(x) = 0,$$
where $\dot{L}_n(x)$ is the derivative of the Legendre polynomial of order n (degree n − 1). These solutions can be approximated by the so-called 'arcsine' points
$$x_i = \cos\Bigl(\pi\, \frac{n - 1 - i}{n - 1}\Bigr), \qquad i = 0, 1, \ldots, n - 1.$$
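As an illustration (assuming NumPy, and the hypothetical ssqr_ge sketch of Section 4), the observation matrix for this experiment and the arcsine points can be set up as follows.

```python
import numpy as np
from numpy.polynomial import chebyshev

n = 6                                          # polynomial order (degree n - 1)
x = np.linspace(-1.0, 1.0, 2001)               # m = 2001 candidate points in steps of 0.001
C = chebyshev.chebvander(x, n - 1)             # m x n observation matrix in the Chebyshev basis

i = np.arange(n)
x_arcsine = np.cos(np.pi * (n - 1 - i) / (n - 1))   # arcsine approximations to the D-optimal points

# Candidate D-optimal points via the SSQR-GE sketch of Section 4 (hypothetical helper):
# x_opt = np.sort(x[ssqr_ge(C)])
```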
Table 1 gives the results of applying the SSQR algorithm and the combined SSQR-GE algorithm to estimate the D-optimal calibration points for the cases $n = 4, 5, \ldots, 11$, choosing from m = 2001 possible points equally spaced in the interval [−1, 1] in steps of 0.001. For these calculations, we have used Chebyshev polynomial basis functions [31] for numerical stability (but the optimal choice of points is independent of the choice of basis functions). The arcsine estimates are also given in the table, along with the D-optimal points derived from (13); see Appendix B. The results show that, in terms of closeness to the D-optimal points, the SSQR algorithm improves on the arcsine estimates, while the GE algorithm improves the SSQR solution and is accurate to three decimals in almost all cases (there is one exception for the case n = 11). Table 2 gives the computed geometric mean measure $\bar{d}(V_a)$ given in (4) for evenly spaced calibration points, the arcsine solutions, the SSQR solutions, and the SSQR-GE solutions given in Table 1. Also shown is the number of GE exchanges undertaken to improve the SSQR solutions. The table reflects the improvements provided by the SSQR and SSQR-GE algorithms. The number of GE exchanges is small compared to m = 2001. The evenly spaced calibration points are markedly worse than the other solutions for larger n.
Figure 1 graphs the values of $t_q$, $q = 1, \ldots, 100$, and the analytical approximation $\tilde{t}(q)$ given by (12), determined by the sequential D-optimal Algorithm 4, without repeat measurements, for the cases n = 4, upper curve, and n = 11, lower curve, starting with the SSQR-GE solution. The values of $t_q$ are batched, with a significant jump after every n additional measurements. Without repeat measurements, the algorithm tends to select available points closest to the D-optimal n points. If the algorithm is run with repeat measurements allowed, the algorithm selects the same set of D-optimal n points again and again. This behaviour is reflected in the graphs of $t_q$ for this case; see Figure 2.

5.2. Tensor Product of Polynomials

Our second example arose from a problem relating to the estimation of the heat distribution on a plate [32]. The application involved modelling the heat distribution as a Gaussian process model [33] involving tensor product polynomials, along with spatially correlated effects. A sub-problem is related to choosing a set of calibration points from a 10 × 14 grid of possible sensor locations in order to determine the tensor product mean function. The tensor product model is of the form
$$f(x, y, a) = \sum_{i=1}^{n_x} \sum_{j=1}^{n_y} a_{ij}\, f_i(x)\, g_j(y),$$
where $f_i(x)$ and $g_j(y)$ are Chebyshev polynomial basis functions. Figure 3 shows the solution calibration points for $n_x = n_y = 5$ constructed from polynomial basis functions up to degree 4. Two solutions are shown: the first in which the calibration points are selected from a 131 × 91 finer grid of 11,921 points, the second in which the calibration points are selected from a 14 × 10 grid of 140 points. The solutions found for the fine grid coincide with a 5 × 5 grid of points $(x_k^*, y_l^*)$, where the $x_k^*$, $k = 1, \ldots, 5$, are the D-optimal solutions derived from the solution of (13) on the interval [0, 20] and the $y_l^*$ are the D-optimal solutions on the interval [0, 10]. For this arrangement, the observation matrix C associated with the polynomial fit is the tensor product $C = C_x \otimes C_y$. For square, full rank matrices A and B of size $n_A \times n_A$ and $n_B \times n_B$,
$$|A \otimes B| = |A|^{n_B}\, |B|^{n_A}.$$
Hence, it is consistent that the D-optimal points for the tensor product are constructed from the D-optimal points for the x- and y-basis functions. The solution points for the coarse grid are seen to approximate the solution for the fine grid, given the constraints imposed by the coarse grid. The coarse grid solution points do not represent a regular grid in this case. For the case of the fine grid, the GE algorithm took 16 iterations to reach a local optimum, starting with the SSQR solution. For the coarse grid, the GE algorithm confirmed that the SSQR solution is locally optimal.
Figure 3. Tensor product polynomial calibration points found by the SSQR-GE algorithm. The points marked with a '+' are those selected from a fine grid, those marked 'o' from a coarse grid.
Algorithms 17 00193 g003
Figure 4 and Figure 5 plot the values of t q , q = 1 , , 200 , and its analytical approximation t ˜ ( q ) given by (12), determined by the sequential D-optimal Algorithm 4, for the fine and coarse grids, respectively, with and without repeat measurements. For a coarse grid and no repeat measurements, the additional calibration points that are permitted consistently provide less information compared to the expected information gain given by t ˜ in (12). For a coarse grid of just 140 points, later additional measurements have to be selected from locations that are less informative in terms of improving the D-measure. With repeat measurements on the coarse grid, the actual information gain is in line with the expected information gain.

5.3. Coordinate Metrology

In coordinate metrology, the size and shape of a geometric surface S is determined from the measurement of the coordinates $x_i$ of points lying on the surface. If the geometric surface $S = S(a)$ is parametrized by $a = (a_1, \ldots, a_n)^{\top}$ and $d(x, a)$ is a measure of the distance from a point x to $S(a)$, then for measurement points $x_{1:m} = \{x_i,\; i = 1, \ldots, m\}$, $m \ge n$, estimates of a are determined by solving the least squares optimisation problem
$$\min_{a} \; \sum_{i=1}^{m} d^2(x_i, a).$$
If the variance matrix associated with the measured point coordinates is $\sigma^2 I$, then the variance matrix $V_a$ associated with the fitted parameters is approximated by
$$V_a = \sigma^2 \bigl(J^{\top} J\bigr)^{-1}, \qquad J_{ij} = \frac{\partial d}{\partial a_j}(x_i, a),$$
involving the Jacobian matrix J. Thus, it is possible to determine optimal measurement strategies, i.e., where to measure on the surface S , in terms of the determinant of the Jacobian matrix J. Often, the case of choosing a minimal set of points with m = n is of interest. For simple geometries, an optimal set of points is straightforward to arrive at, e.g., three points equally spaced around a circle, but for more complex geometries, the choice of an optimal set may not be obvious, as considered below.
Figure 6 shows a reference artefact designed and calibrated by the Czech Metrology Institute (CMI) [34,35]. The form of the geometric surface is a hyperbolic paraboloid given by the equation
$$z = 24 + (x/8 - 1)(y/8 - 1).$$
A paraboloid with its axis approximately parallel to the z-axis is parametrized in terms of two rotation angles α and β and six further parameters b such that
$$\hat{z} = b_1 \hat{x}^2 + b_2 \hat{y}^2 + b_3 \hat{x}\hat{y} + b_4 \hat{x} + b_5 \hat{y} + b_6, \qquad \hat{x} = R(\alpha, \beta)\, x,$$
where R ( α , β ) = R x ( α ) R y ( β ) is the rotation matrix constructed from the plane rotation matrices
$$R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}, \qquad R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}.$$
For the surface (15), nominally $b = (0, 0, 1/64, -1/8, -1/8, 25)$. Figure 7, Figure 8 and Figure 9 show the distribution of the x- and y-coordinates for three sets $X_k$, $k = 1, 2, 3$, of eight calibration points. The first two are given by 'expert judgement'; that is, possible choices for a measurement strategy based on engineering practice. The first expert choice, $X_1$, Figure 7, represents an approximately uniform distribution, a first guess at a good distribution. This distribution of points is sufficient to determine all the parameters, and the associated 8 × 8 Jacobian matrix is full rank. (If the x- and y-coordinates of the data points are rotated through 45 degrees about the z-axis, the associated Jacobian matrix is rank deficient by three.) The second set $X_2$ came about through experimenting with moving the inner four points in dataset $X_1$ closer to or further from the origin of the xy-plane. It turned out that moving the four points further from the origin was better, resulting in dataset $X_2$, Figure 8. At first glance, the distribution of points in $X_2$ does not look like a good choice. The third dataset is that obtained using the SSQR-GE algorithm. The GE algorithm performed 11 interchanges starting from the SSQR solution. The optimal choice has seven points along the boundary in the xy-plane and one point in the interior. Any interior point not too far from the origin would also result in a good selection. In terms of the aggregate measure $\bar{d}_k$ defined in (4) associated with the three datasets, we have $\bar{d}_1 = 6.0\, \bar{d}_3$ and $\bar{d}_2 = 2.2\, \bar{d}_3$.
Table 3 gives the square roots $v_j$ of the diagonal elements of $(J_k^{\top} J_k)^{-1}$, where $J_k$ is the Jacobian matrix associated with the kth dataset. If the variance matrix associated with the measurements is $\sigma^2 I$, then $v_j = u(a_j)/\sigma$, where $u(a_j)$ is the estimate of the standard uncertainty associated with the jth parameter [36,37]. In terms of standard uncertainties, the D-optimal dataset $X_3$ is far better for estimating the surface parameters than the other two datasets.
Figure 10 plots the values of t q , q = 1 , , 100 , and its analytical approximation t ˜ ( q ) given by (12), determined by the sequential D-optimal Algorithm 4, with and without repeat measurements. For higher values of q, there is good agreement between t q and t ˜ q .

5.4. Calibration of a Network of Standards Using a Comparator

In this section, we consider the calibration of a number of standards, starting with one calibrated standard and using a comparator to calibrate the other standards relative to the calibrated standard. An example application is in mass calibration using a mass balance [38,39,40]. Given a calibrated standard A 1 , nominally of 1 kg, and two uncalibrated masses A 2 and A 3 each of nominal mass 0.5 kg, A 2 and A 3 can be calibrated using two measurements, the first comparing A 1 with the combined mass of A 2 and A 3 , and the second comparing A 2 and A 3 . A second example from length metrology relates to the calibration of gauge blocks [41], where it is possible to align two gauge blocks end to end, a process called ‘wringing’, to define a combined length. This discussion is relevant to the calibration of any set of artefacts defining an extensive quantity where the artefacts can be grouped to define a quantity that is (nominally) the sum of the individual quantities. We use the term ‘network’ to reflect the fact that the estimates of all the quantities are statistically correlated and an experiment that updates the information about one artefact will also update information about the other artefacts.
Let $a = (a_1, \ldots, a_n)^{\top}$ represent the values of the quantities associated with the artefacts, where the nominal values of the quantities are such that the artefacts can be grouped into pairs of groups, with the two groups in each pair associated with the same nominal total value. Thus, there are disjoint subsets $L_i, R_i \subset I = \{1, 2, \ldots, n\}$, $L_i \cap R_i = \emptyset$, $i = 1, \ldots, m$, such that
$$\sum_{q \in L_i} a_q \approx \sum_{q \in R_i} a_q.$$
An example considered further below involves a set of artefacts having nominal values 1.00, 0.50, 0.50, 0.20, 0.20, 0.10, 0.10, 0.05, and 0.05. The fact that the two subsets are associated with the same nominal value allows a comparator measurement to be made, with the associated observation equation
$$0 \approx y_i = \sum_{q \in L_i} a_q - \sum_{q \in R_i} a_q + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma_i^2),$$
where ϵ i represents a random effect associated with the comparator measurement. The design problem is to define measurement strategies encoded by L i and R i , i = 1 , , n , that provide the most accurate calibration of the standards. For the example considered here, there are almost 400 viable experiments.
Table 4 and Table 5 show five designs for the calibration of the n = 9 standards. A '1' in column j indicates that $j \in L_i$, while a '−1' indicates that $j \in R_i$. Each array of 9 × 9 elements is in fact the unweighted observation matrix C associated with the particular design. The first row in each design represents the observation associated with the calibration of the first standard using an absolute measurement system. This first observation is the only source of information about the first standard and must be included in the optimal solution. The uncertainties associated with the remaining standards scale with the uncertainty associated with the first observation.
The first design, given in Table 4, is one defined by ‘expert judgement’, and represents a typical set of experiments that would be performed in practice. (In fact, for this set of standards, it is not entirely obvious how to choose a design that will calibrate all of the standards). The next design, indicated by the first nine rows in Table 5, is that determined by the SSQR-GE algorithm with σ 1 = σ C = 1.0 and σ i = σ R = 0.5 , i = 2 , , n . The most striking and, at first sight, surprising feature of this second design is that almost all the artefacts are involved in experiments 2 to n. The reason for this is that, in some sense, the comparator uncertainties represented by σ i = σ R are partitioned amongst the artefacts involved, and more artefacts being included in an experiment means that each artefact is assigned a smaller uncertainty.
The two designs considered so far are based on the assumption that all the comparator measurements are associated with the same uncertainty σ R , irrespective of the number of artefacts and the nominal values of the artefacts. A more plausible assignment of uncertainties would take into account both the number and nominal values. For a mass balance, it would be usual for the associated uncertainty to have a component that varied in proportion to the mass. Similarly, a length measuring device will usually have one or more influencing factors, e.g., refractive index effects for the case of a laser interferometric comparator [41], that vary in proportion to length. The use of multiple artefacts in each experiment will also likely introduce effects that will add to the uncertainties.
For each experiment, let n i be the total number of artefacts involved and let
$$v_i = \sum_{q \in L_i \cup R_i} a_q,$$
be a measure of the total value of the quantities involved in the ith experiment. Given σ R , σ V and σ N , we can assign σ i according to
$$\sigma_i^2 = \sigma_R^2 + n_{i,2}\, \sigma_N^2 + v_i^2\, \sigma_V^2, \qquad n_{i,2} = \max(n_i - 2,\, 0), \qquad i = 2, \ldots, n.$$
Table 5 presents the optimal designs calculated using the SSQR-GE algorithm with $\sigma_i$ calculated as in (16) for the four different values of $\sigma_R$, $\sigma_N$ and $\sigma_V$ shown in Table 6. The third design in Table 5 is the result of penalising experiments with a large number of artefacts, while the fourth design in the table is the result of penalising experiments with a larger value of $v_i$.
Table 6 gives the uncertainties $u(a_i)$ associated with $a_i$ for four different assignments of $\sigma_i$, $i = 2, \ldots, n = 9$, for the designs in Table 4 and Table 5. Also shown in the table is the aggregate measure $\bar{d} = |V_a|^{1/n}$ of uncertainty, the total number $\sum_i n_i$ of artefacts and a measure $\sum_i v_i$ of the total nominal values involved in each set of experiments. The last three rows give the values of $\sigma_R$, $\sigma_N$ and $\sigma_V$ used to calculate $\sigma_i$, $i = 2, \ldots, n$. In all cases, the SSQR-GE algorithm leads to significant improvements over the expert judgement design.

6. Concluding Remarks

In this paper, we have looked at the experimental design problem: given m potential observations to determine n parameters, m > n, what is the best choice of n observations? In the context of least squares estimation, the problem was formulated as finding the n × n submatrix of the complete m × n observation matrix that has maximum determinant, corresponding to the D-optimality criterion. We described two algorithms, the SSQR and GE algorithms, to address this problem. Both were adapted from numerical linear algebra algorithms associated with the QR factorisation of a matrix. We also described algorithms for updating estimates of the problem parameters that are sequentially optimal with respect to the D-optimality and A-optimality criteria. We illustrated the behaviour of the algorithms on a number of applications drawn from the field of metrology. These algorithms are straightforward to implement. The SSQR algorithm, in particular, is a minor adaptation of the standard QR factorisation algorithm with pivoting. The algorithms enable computationally efficient implementations, exploiting rank-one matrix updating techniques. The algorithms are descent-type algorithms that will converge to a local minimum that is not necessarily a global minimum. In the examples considered, the algorithms determined effective experimental designs, often with much better performance than 'hand-crafted' designs based on expert judgement.

Funding

This work was supported by the UK’s National Measurement System programme for Data Science.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The author is grateful to the anonymous referees for their helpful comments, and to the Czech Metrology Institute for permission to use their photograph of the hyperbolic paraboloid, Figure 6.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Numerical Linear Algebra

Appendix A.1. Sherman-Morrison Formula

See [23]. For n × n full rank matrix A and n-vectors u and v
$$\bigl(A + u v^{\top}\bigr)^{-1} = A^{-1} - \frac{A^{-1} u\, v^{\top} A^{-1}}{1 + v^{\top} A^{-1} u}.$$
If $\tilde{u} = A^{-1} u$ and $\tilde{v} = A^{-\top} v$, then
$$\bigl(A + u v^{\top}\bigr)^{-1} = A^{-1} - \frac{\tilde{u} \tilde{v}^{\top}}{1 + v^{\top} \tilde{u}} = A^{-1} - \frac{\tilde{u} \tilde{v}^{\top}}{1 + \tilde{v}^{\top} u}.$$

Appendix A.2. Determinant of a Rank One Update of a Matrix

See, e.g., [42,43]. For a square matrix A and n-vectors u and v,
$$|A + u v^{\top}| = |A|\, (1 + v^{\top} A^{-1} u).$$
Writing
$$A + u v^{\top} = A \bigl(I + (A^{-1} u)\, v^{\top}\bigr),$$
it is sufficient to consider the case $A = I$: $|I + u v^{\top}| = 1 + v^{\top} u$. From the matrix equation
$$\begin{bmatrix} I & 0 \\ v^{\top} & 1 \end{bmatrix} \begin{bmatrix} I + u v^{\top} & u \\ 0 & 1 \end{bmatrix} \begin{bmatrix} I & 0 \\ -v^{\top} & 1 \end{bmatrix} = \begin{bmatrix} I & u \\ 0 & 1 + v^{\top} u \end{bmatrix}$$
it is seen that the determinant of the matrix on the right-hand side is equal to the determinant of the matrix in the middle of the left-hand side, establishing the result for the case A = I .
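The two identities of this appendix can be checked numerically in a few lines; the following is an illustrative sketch (NumPy assumed, random test data only).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
u, v = rng.standard_normal(n), rng.standard_normal(n)
Ainv = np.linalg.inv(A)

# Sherman-Morrison formula (Appendix A.1)
lhs_inv = np.linalg.inv(A + np.outer(u, v))
rhs_inv = Ainv - np.outer(Ainv @ u, v @ Ainv) / (1.0 + v @ Ainv @ u)
assert np.allclose(lhs_inv, rhs_inv)

# Rank-one determinant lemma (Appendix A.2)
lhs_det = np.linalg.det(A + np.outer(u, v))
rhs_det = np.linalg.det(A) * (1.0 + v @ Ainv @ u)
assert np.allclose(lhs_det, rhs_det)
```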

Appendix B. Legendre Polynomials

See, e.g., [44]. Note: in this section, the parameter vector $a = (a_0, \ldots, a_n)^{\top}$ is an (n + 1)-vector with indices starting from 0, not 1.
The D-optimal points for calibration experiments involving a polynomial response function of degree n are defined in terms of solutions of the polynomial equation
$$F(x) = (1 - x^2)\, \dot{L}_{n-1}(x) = 0,$$
where $\dot{L}_{n-1}(x)$ is the derivative of the Legendre polynomial of degree n − 1. Given an estimate x of a solution, an updated solution is given by Newton's method [17,45] according to
$$x := x - F(x)/\dot{F}(x), \qquad \dot{F}(x) = -2x\, \dot{L}_{n-1}(x) + (1 - x^2)\, \ddot{L}_{n-1}(x),$$
where L ¨ denotes the second derivative of L ( x ) with respect to x. Starting estimates for the n + 1 solutions are given by the arcsine points:
$$x_i = \cos\Bigl(\pi\, \frac{n - i}{n}\Bigr), \qquad i = 0, 1, \ldots, n.$$
Legendre polynomials $L_j(x)$ are orthogonal polynomial basis functions defined on the interval [−1, 1] and can be evaluated using the three-term recurrence relationship, starting with $L_0(x) = 1$, $L_1(x) = x$, and for $j \ge 2$,
$$j\, L_j(x) = (2j - 1)\, x\, L_{j-1}(x) - (j - 1)\, L_{j-2}(x).$$
If f ( x ) is a degree n polynomial given by a sum of Legendre polynomial basis functions
$$f(x, a) = \sum_{j=0}^{n} a_j L_j(x),$$
then $\dot{f}(x)$, the derivative of f with respect to x, is a degree n − 1 polynomial and can be written as
$$\dot{f}(x, \dot{a}) = \sum_{j=0}^{n-1} \dot{a}_j L_j(x),$$
where $\dot{a} = \dot{A}\, (a_1, \ldots, a_n)^{\top}$ is defined in terms of the latter n elements of the (n + 1)-vector $a = (a_0, a_1, \ldots, a_n)^{\top}$, and $\dot{A}$ is the n × n matrix constructed from diagonals formed from the sequence $1, 3, 5, \ldots, 2n - 1$. The matrix $\dot{A}$ for the case n = 10 is
$$\dot{A} = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 3 & 0 & 3 & 0 & 3 & 0 & 3 & 0 & 3 \\ 0 & 0 & 5 & 0 & 5 & 0 & 5 & 0 & 5 & 0 \\ 0 & 0 & 0 & 7 & 0 & 7 & 0 & 7 & 0 & 7 \\ 0 & 0 & 0 & 0 & 9 & 0 & 9 & 0 & 9 & 0 \\ 0 & 0 & 0 & 0 & 0 & 11 & 0 & 11 & 0 & 11 \\ 0 & 0 & 0 & 0 & 0 & 0 & 13 & 0 & 13 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 15 & 0 & 15 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 17 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 19 \end{bmatrix}.$$
Similarly, the second derivative f ¨ ( x ) can be written as
$$\ddot{f}(x, \ddot{a}) = \sum_{j=0}^{n-2} \ddot{a}_j L_j(x),$$
where $\ddot{a} = \ddot{A}\, (a_2, \ldots, a_n)^{\top}$ is defined in terms of the latter n − 1 elements of $a = (a_0, a_1, \ldots, a_n)^{\top}$, and $\ddot{A}$ is the (n − 1) × (n − 1) matrix with
$$\ddot{A} = \dot{A}(1{:}n{-}1,\, 1{:}n{-}1)\; \dot{A}(2{:}n,\, 2{:}n).$$
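The Newton iteration of this appendix can also be sketched using NumPy's Legendre utilities in place of the explicit recurrence and coefficient matrices; the sketch below follows the degree convention of Section 5.1 (n calibration points for a response of order n, i.e., degree n − 1), and the function name is illustrative.

```python
import numpy as np
from numpy.polynomial import legendre

def d_optimal_points(n, iters=20):
    """D-optimal calibration points on [-1, 1] for a polynomial of order n (degree n - 1).

    Newton iteration on F(x) = (1 - x^2) L'(x), with L the Legendre polynomial of
    degree n - 1, started from the arcsine points.
    """
    c = np.zeros(n)
    c[-1] = 1.0                                   # Legendre-series coefficients of L_{n-1}
    dc = legendre.legder(c)                       # L'
    d2c = legendre.legder(c, 2)                   # L''
    i = np.arange(n)
    x = np.cos(np.pi * (n - 1 - i) / (n - 1))     # arcsine starting points
    for _ in range(iters):
        F = (1 - x**2) * legendre.legval(x, dc)
        dF = -2 * x * legendre.legval(x, dc) + (1 - x**2) * legendre.legval(x, d2c)
        x = x - F / dF                            # Newton update; the endpoints +-1 are fixed points
    return np.sort(x)

# Example: d_optimal_points(5) returns the five D-optimal points for an order-5 (quartic) response.
```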

References

1. Atkinson, A.C.; Donev, A.N.; Tobias, R.D. Optimum Experimental Designs, with SAS; Oxford University Press: Oxford, UK, 2007.
2. Box, G.E.P.; Hunter, W.G.; Hunter, J.S. Statistics for Experimenters: Design, Innovation and Discovery, 2nd ed.; Wiley: Hoboken, NJ, USA, 2005.
3. Chaloner, K. Optimal Bayesian experimental design for linear models. Ann. Stat. 1984, 12, 283–300.
4. Forbes, A.B.; Minh, H.D. Design of linear calibration experiments. Measurement 2013, 46, 3730–3736.
5. Goos, P. The Optimal Design of Blocked and Split-Plot Experiments; Springer: New York, NY, USA, 2002.
6. Goos, P.; Jones, B. Optimal Design of Experiments: A Case Study Approach; John Wiley & Sons: New York, NY, USA, 2011.
7. Jones, B.; Nachtsheim, C.J. Effective Model Selection for Definitive Screening Designs. Technometrics 2017, 59, 319–329.
8. Montgomery, D.C. Design and Analysis of Experiments, 8th ed.; John Wiley & Sons: New York, NY, USA, 2013.
9. Chretien, S.; Clarkson, P. A fast algorithm for the semi-definite relaxation of the state estimation problem in power grids. J. Ind. Manag. Optim. 2020, 16, 431–443.
10. Kekatos, V.; Giannakis, G.B. A convex relaxation approach to optimal placement of phasor measurement units. In Proceedings of the 2011 4th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), San Juan, PR, USA, 13–16 December 2011; pp. 145–148.
11. Berger, J.; Busser, T.; Dutykh, D.; Mendes, N. An efficient method to estimate sorption isotherm curve coefficients. Inverse Probl. Sci. Eng. 2019, 27, 735–772.
12. Berger, J.; Dutykh, D.; Mendes, N. On the optimal experiment design for heat and moisture parameter estimation. Exp. Therm. Fluid Sci. 2017, 81, 109–122.
13. Chernoff, H. Sequential Analysis and Optimal Design; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1972.
14. Meyer, R.K.; Nachtsheim, C.J. The Coordinate Exchange Algorithm for Constructing Exact Optimal Designs. Technometrics 1995, 37, 60–69.
15. Wald, A. Sequential Tests of Statistical Hypotheses. Ann. Math. Stat. 1945, 16, 117–186.
16. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
17. Gill, P.E.; Murray, W.; Wright, M.H. Practical Optimization; Academic Press: London, UK, 1981.
18. Vandenberghe, L.; Boyd, S. Semidefinite Programming. SIAM Rev. 1996, 38, 49–95.
19. Barker, R.M.; Cox, M.G.; Forbes, A.B.; Harris, P.M. Software Support for Metrology Best Practice Guide No. 4: Modelling Discrete Data and Experimental Data Analysis; Technical Report DEM-ES 018; National Physical Laboratory: Teddington, UK, 2007.
20. BIPM. The International System of Units (SI Brochure (EN)), 9th ed.; BIPM: Sèvres, France, 2019.
21. BIPM; IEC; IFCC; ILAC; ISO; IUPAC; IUPAP; OIML. Evaluation of Measurement Data—Guide to the Expression of Uncertainty in Measurement; JCGM 100:2008; Joint Committee for Guides in Metrology: Sèvres, France, 2008.
22. BIPM; IEC; IFCC; ILAC; ISO; IUPAC; IUPAP; OIML. Evaluation of Measurement Data—Supplement 2 to the “Guide to the Expression of Uncertainty in Measurement”—Extension to Any Number of Output Quantities; JCGM 102:2011; Joint Committee for Guides in Metrology: Sèvres, France, 2011.
23. Golub, G.; Van Loan, C. Matrix Computations, 4th ed.; Johns Hopkins University Press: Baltimore, MD, USA, 2013.
24. Hart, G.W. Multidimensional Analysis: Algebras and Systems for Science and Engineering; Springer: Berlin, Germany, 1995.
25. Gu, M.; Eisenstat, S.C. Efficient Algorithms for Computing a Strong Rank-Revealing QR Factorization. SIAM J. Sci. Comput. 1996, 17, 848–869.
26. Sherman, J.; Morrison, W.J. Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix. Ann. Math. Stat. 1950, 21, 124–127.
27. Wilkinson, J.H. The Algebraic Eigenvalue Problem; Oxford University Press, Inc.: New York, NY, USA, 1988.
28. Preston-Thomas, H. The International Temperature Scale of 1990 (ITS-90). Metrologia 1990, 27, 3–10.
29. Bartlett, G.; Forbes, A.; Heaps, E.; Raby, A.C.; Yacoot, A. Spatial positioning correction for multi-axis nanopositioning stages. In Proceedings of the ASPE Convention and Expo, Indianapolis, IN, USA, 16–21 September 2022.
30. Pukelsheim, F. Optimal Design of Experiments; SIAM: Philadelphia, PA, USA, 2006; reproduction of the 1993 book published by John Wiley and Sons, New York.
31. Handscomb, D.C.; Mason, J.C. Chebyshev Polynomials; Chapman & Hall/CRC Press: London, UK, 2003.
32. Forbes, A.B.; Jagan, K.; Dunlevy, J.; Sousa, J.A. Optimization of sensor distribution using Gaussian processes. Meas. Sens. 2021, 18, 100128.
33. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006.
34. Linkeová, I.; Zelený, V. Application of ruled surfaces in freeform and gear metrology. Acta Polytech. 2021, 61, 99–109.
35. Zelený, V.; Linkeová, I.; Skalnik, P. Calibration of freeform standard. In Proceedings of the 15th International Conference of the European Society for Precision Engineering and Nanotechnology, EUSPEN 2015, Leuven, Belgium, 1–5 June 2015; pp. 147–148.
36. Forbes, A.B. Parameter estimation based on least squares methods. In Data Modeling for Metrology and Testing in Measurement Science; Pavese, F., Forbes, A.B., Eds.; Birkhäuser-Boston: New York, NY, USA, 2009; pp. 147–176.
37. Forbes, A.B. Sensitivity analysis for Gaussian associated features. Appl. Sci. 2022, 12, 2808.
38. Grabe, M. Note on the Application of the Method of Least Squares. Metrologia 1978, 14, 143.
39. Hotelling, H. Some Improvements in Weighing and Other Experimental Techniques. Ann. Math. Stat. 1944, 15, 297–306.
40. Nielsen, L. Evaluation of mass measurements in accordance with the GUM. Metrologia 2014, 51, S183.
41. Lewis, A.J.; Hughes, E.B.; Aldred, P.J.E. Long term study of gauge block interferometer performance and gauge blocks stability. Metrologia 2010, 47, 473–486.
42. Ding, J.; Zhou, A. Eigenvalues of rank-one updated matrices with some applications. Appl. Math. Lett. 2007, 20, 1223–1226.
43. Hogben, L. (Ed.) Handbook of Linear Algebra; Chapman & Hall/CRC: Boca Raton, FL, USA, 2007.
44. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions; Dover: New York, NY, USA, 1964.
45. Fletcher, R. Practical Methods of Optimization, 2nd ed.; John Wiley and Sons: Chichester, UK, 1987.
Figure 1. Polynomial calibration points: values of t_q, q = 1, …, 100, and the analytical approximation t̃(q) given by (12), determined by the sequential D-optimal Algorithm 4, for the cases n = 4 (upper curve) and n = 11 (lower curve).
Figure 2. As Figure 1, but for the case where repeated measurements are permitted.
Figure 4. Tensor product polynomial calibration points: values of t_q, q = 1, …, 100, and the analytical approximation t̃(q) given by (12), determined by the sequential D-optimal Algorithm 4, for the fine grid, with no repeat measurements (upper curve) and with repeat measurements (lower curve).
Figure 5. Tensor product polynomial calibration points: values of t_q, q = 1, …, 100, and the analytical approximation t̃(q) given by (12), determined by the sequential D-optimal Algorithm 4, for the coarse grid, with no repeat measurements (upper curve) and with repeat measurements (lower curve).
Figure 6. The CMI hyperbolic paraboloid reference artefact.
Figure 7. Hyperbolic paraboloid calibration points: ‘expert’ guess at a good set of calibration points.
Figure 8. Hyperbolic paraboloid calibration points: second ‘expert’ guess at a good set of calibration points, a modification of X 1 , Figure 7.
Figure 9. Hyperbolic paraboloid calibration points: D-optimal points determined by the SSQR-GE algorithm.
Figure 10. Hyperbolic paraboloid calibration points: values of t_q, q = 1, …, 100, and the analytical approximation t̃(q) given by (12), determined by the sequential D-optimal Algorithm 4, with no repeat measurements (upper curve) and with repeat measurements (lower curve).
Table 1. D-optimal calibration points for polynomial calibration curves of order n = 4, …, 11 defined in the interval [−1, 1]. For each set of four columns, the first column, labelled x_0, gives the arcsine estimates of the calibration points in (14); the second, labelled x_QR, gives the calibration points determined by the SSQR algorithm; the third, labelled x_GE, gives the estimates determined by the combined SSQR-GE algorithm; and the fourth, labelled x_*, gives the optimal calibration points derived from (13).
   x_0      x_QR     x_GE     x_*     |    x_0      x_QR     x_GE     x_*
                 n = 4                |                  n = 5
 −1.000   −1.000   −1.000   −1.000    |  −1.000   −1.000   −1.000   −1.000
 −0.500   −0.488   −0.447   −0.447    |  −0.707   −0.669   −0.655   −0.655
  0.500    0.437    0.447    0.447    |   0.000    0.006    0.000    0.000
  1.000    1.000    1.000    1.000    |   0.707    0.686    0.655    0.655
                                      |   1.000    1.000    1.000    0.000
                 n = 6                |                  n = 7
 −1.000   −1.000   −1.000   −1.000    |  −1.000   −1.000   −1.000   −1.000
 −0.809   −0.786   −0.765   −0.765    |  −0.866   −0.845   −0.830   −0.830
 −0.309   −0.286   −0.285   −0.285    |  −0.500   −0.484   −0.469   −0.469
  0.309    0.308    0.285    0.285    |   0.000    0.002    0.000    0.000
  0.809    0.779    0.765    0.765    |   0.500    0.493    0.469    0.469
  1.000    1.000    1.000    1.000    |   0.866    0.841    0.830    0.830
                                      |   1.000    1.000    1.000    0.000
                 n = 8                |                  n = 9
 −1.000   −1.000   −1.000   −1.000    |  −1.000   −1.000   −1.000   −1.000
 −0.901   −0.880   −0.872   −0.872    |  −0.924   −0.908   −0.900   −0.900
 −0.623   −0.613   −0.592   −0.592    |  −0.707   −0.692   −0.677   −0.677
 −0.223   −0.211   −0.209   −0.209    |  −0.383   −0.383   −0.363   −0.363
  0.223    0.225    0.210    0.209    |   0.000   −0.002    0.000    0.000
  0.623    0.608    0.592    0.592    |   0.383    0.376    0.363    0.363
  0.901    0.882    0.872    0.872    |   0.707    0.695    0.677    0.677
  1.000    1.000    1.000    1.000    |   0.924    0.906    0.900    0.900
                                      |   1.000    1.000    1.000    0.000
                 n = 10               |                  n = 11
 −1.000   −1.000   −1.000   −1.000    |  −1.000   −1.000   −1.000   −1.000
 −0.940   −0.925   −0.920   −0.920    |  −0.951   −0.938   −0.934   −0.934
 −0.766   −0.753   −0.739   −0.739    |  −0.809   −0.796   −0.784   −0.784
 −0.500   −0.493   −0.478   −0.478    |  −0.588   −0.580   −0.565   −0.565
 −0.174   −0.177   −0.165   −0.165    |  −0.309   −0.311   −0.296   −0.296
  0.174    0.168    0.165    0.165    |   0.000   −0.001    0.000    0.000
  0.500    0.497    0.478    0.478    |   0.309    0.307    0.296    0.296
  0.766    0.751    0.739    0.739    |   0.588    0.582    0.566    0.565
  0.940    0.925    0.920    0.920    |   0.809    0.795    0.785    0.784
  1.000    1.000    1.000    1.000    |   0.951    0.939    0.934    0.934
                                      |   1.000    1.000    1.000    0.000
Table 2. For n = 4, 5, …, 11, rows two to five give the computed geometric mean measure d̄(V_a) given in (4) for evenly spaced calibration points, the arcsine solutions, the SSQR solutions, and the SSQR-GE solutions given in Table 1. The final row gives the number of GE exchanges undertaken to improve the SSQR solutions.
 n        4        5        6        7        8        9        10       11
 d̄_0      0.4871   0.4152   0.3748   0.3511   0.3379   0.3316   0.3304   0.3332
 d̄_AS     0.4714   0.3789   0.3175   0.2734   0.2403   0.2143   0.1935   0.1763
 d̄_QR     0.4682   0.3746   0.3130   0.2691   0.2362   0.2107   0.1901   0.1733
 d̄_GE     0.4673   0.3735   0.3119   0.2682   0.2354   0.2099   0.1894   0.1726
 n_GE     3        6        7        9        12       17       19       21
Table 3. Standard uncertainty factors u ( a j ) / σ associated with the hyperbolic paraboloid parameters a = ( α , β , b ) for three eight-point datasets X k , k = 1 , 2 , 3 .
 u(a_j)/σ      α        β        b_1      b_2      b_3      b_4      b_5       b_6
 X_1         2.3867   2.3867   0.0049   0.0049   0.0139   2.6219   2.6219   10.5328
 X_2         0.7259   0.7259   0.0042   0.0042   0.0035   0.4999   0.4999    2.8188
 X_3         0.2049   0.2413   0.0012   0.0014   0.0014   0.2883   0.2520    1.4690
Table 4. Design of a calibration experiment given by ‘expert judgement’.
 1.00   0.50   0.50   0.20   0.20   1.0   1.0   0.05   0.05
  1      0      0      0      0      0     0     0      0
  1     −1     −1      0      0      0     0     0      0
  0      1     −1      0      0      0     0     0      0
  0      1      0     −1     −1     −1     0     0      0
  0      0      1     −1     −1      0    −1     0      0
  0      0      0      1     −1      0     0     0      0
  0      0      0      1      0      0     0    −1     −1
  0      0      0      0      0      1     0    −1     −1
  0      0      0      0      0      0     0     1     −1
Table 5. Designs of four calibration experiments calculated using the SSQR-GE algorithm with uncertainties σ i calculated according to (16) for the different values of σ R , σ N , and σ V given in Table 6.
 1.00   0.50   0.50   0.20   0.20   1.0   1.0   0.05   0.05
  1      0      0      0      0      0     0     0      0
  1      0     −1     −1     −1     −1    −1     1      1
  1     −1     −1      1     −1      1    −1     1     −1
  1     −1     −1      1     −1     −1     1    −1      1
  1     −1     −1     −1      1      1    −1    −1      1
  1     −1     −1     −1      1     −1     1     1     −1
  0      1      0     −1     −1      0     0    −1     −1
  0      1     −1      0      1     −1     0    −1     −1
  0      0      1     −1      0     −1    −1    −1     −1

  1      0      0      0      0      0     0     0      0
  1     −1     −1      0      0      0     0     0      0
  0      1      0     −1     −1     −1    −1     1      1
  0      1     −1      1     −1      1    −1    −1      1
  0      1     −1      0      1     −1     0    −1     −1
  0      1     −1      0     −1      1     1     1     −1
  0      0      1     −1     −1      0     0    −1     −1
  0      0      0      1      0     −1    −1     1     −1
  0      0      0      1     −1     −1     1    −1      1

  1      0      0      0      0      0     0     0      0
  1     −1     −1      0      0      0     0     0      0
  0      1      0     −1     −1      0    −1     0      0
  0      1     −1      0      0      0     0     0      0
  0      0      0      1      0     −1    −1     0      0
  0      0      0      1     −1      0     0     0      0
  0      0      0      0      0      1     0    −1     −1
  0      0      0      0      0      1    −1     0      0
  0      0      0      0      0      0     0     1     −1

  1      0      0      0      0      0     0     0      0
  1     −1     −1      0      0      0     0     0      0
  0      1      0     −1     −1      0    −1     0      0
  0      1     −1      0      0      0     0     0      0
  0      0      0      1      0     −1    −1     0      0
  0      0      0      1     −1      0     0     0      0
  0      0      0      0      0      1     0    −1     −1
  0      0      0      0      0      1    −1     0      0
  0      0      0      0      0      0     0     1     −1
Table 6. Uncertainties associated with a_i for four different assignments of σ_i, i = 2, …, n = 9. The columns labelled E give the uncertainties associated with the ‘expert judgement’ design given in Table 4, while the columns labelled O give those associated with the optimal designs given in Table 5, calculated by the SSQR-GE algorithm. Also shown in the table are the aggregate measure d̄ = |V_a|^{1/n} of uncertainty, the total number Σ_i n_i of artefacts, and a measure Σ_i v_i of the total nominal values involved in each set of experiments. The last three rows give the values of σ_R, σ_N, and σ_V used to calculate σ_i, i = 2, …, n.
 a_i        E       O1       E       O2       E       O3       E       O4
 1.00      1.00    1.00     1.00    1.00     1.00    1.00     1.00    1.00
 0.50      0.61    0.56     0.66    0.64     0.69    0.69     1.04    1.04
 0.50      0.61    0.55     0.66    0.64     0.69    0.69     1.04    1.04
 0.20      0.39    0.31     0.43    0.40     0.60    0.58     0.50    0.57
 0.20      0.49    0.29     0.52    0.39     0.61    0.58     0.54    0.60
 0.10      0.57    0.26     0.61    0.32     0.90    0.44     0.57    0.36
 0.10      0.91    0.27     1.03    0.36     1.64    0.45     1.34    0.34
 0.05      0.35    0.20     0.36    0.28     0.40    0.48     0.29    0.27
 0.05      0.35    0.20     0.36    0.28     0.40    0.48     0.29    0.27
 d̄         0.17    0.06     0.21    0.12     0.21    0.13     0.21    0.15
 Σ_i v_i   7.0    17.4      7.0    11.0      7.0     6.3      7.0     6.3
 Σ_i n_i   24      62       24      48       24      22       24      22
 σ_R       0.50    0.50     0.50    0.50     0.20    0.20     0.20    0.20
 σ_N       0.00    0.00     0.20    0.20     0.80    0.80     0.20    0.20
 σ_V       0.00    0.00     0.20    0.20     0.20    0.20     0.80    0.80
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
