Article

The Algorithm of Gu and Eisenstat and D-Optimal Design of Experiments

National Physical Laboratory, Teddington TW11 0LW, UK
Algorithms 2024, 17(5), 193; https://doi.org/10.3390/a17050193
Submission received: 27 March 2024 / Revised: 25 April 2024 / Accepted: 30 April 2024 / Published: 2 May 2024
(This article belongs to the Special Issue Numerical Optimization and Algorithms: 2nd Edition)

Abstract

This paper addresses the following problem: given m potential observations to determine n parameters, m > n, what is the best choice of n observations? The problem can be formulated as finding the n × n submatrix of the complete m × n observation matrix that has maximum determinant. An algorithm by Gu and Eisenstat for determining a strongly rank-revealing QR factorisation of a matrix can be adapted to address this latter formulation. The algorithm starts with an initial selection of n rows of the observation matrix and then performs a sequence of row interchanges, with the determinant of the current submatrix strictly increasing at each step until no further improvement can be made. The algorithm implements rank-one updating strategies, which lead to a compact and efficient implementation. The algorithm does not necessarily determine the global optimum but provides a practical approach to designing an effective measurement strategy. In this paper, we describe how the Gu–Eisenstat algorithm can be adapted to address the problem of optimal experimental design and used with the QR algorithm with column pivoting to provide effective designs. We also describe implementations of sequential algorithms to add further measurements that optimise the information gain at each step. We illustrate performance on several metrology examples.

1. Introduction

While machine learning paradigms are based on access to large sets of data to extract the information of interest, many practical problems in science and technology arise in a context in which data are sparse and potentially expensive to generate. Given limited experimental resources, it is important that these resources are used effectively. The design of experiment (DoE) is an important area of study in applied mathematics and statistics. Many DoE studies relate to problems in which a response variable of interest, e.g., the yield of a crop, is possibly influenced by a number of control or stimulus variables, e.g., crop variety, soil type, irrigation regime, fertiliser treatment, etc. The form of the response function is usually unknown and the DoE problem is to design a number of experiments with the control variables set to different values, in order to estimate the response function from a set of candidate response functions and consequently choose control values that optimise the response; see, e.g., [1,2,3,4,5,6,7,8]. The DoE problem has been addressed in applications across many scientific disciplines, including the placement of sensors in an electrical power grid [9,10] and the measurement of thermal properties of materials [11,12], for example.
In this paper, we look at a particular aspect of the design problem: given a known response model, how do we choose a minimal number of measurements, taking into account the associated uncertainties, in order to estimate all the parameters of the system under study as accurately as possible? Thus, the problem can be posed as: given m potential observations to determine n parameters, m > n, what is the best choice of n observations? In the context of (linear) least squares estimation, a version of the problem can be formulated as finding the n × n submatrix of the complete m × n observation matrix that has maximum determinant. The problem can be posed as an example of the following general optimisation problem. Suppose $A = \{a_i,\; i = 1, \ldots, m\}$ is a set with m elements. Represent the set $P(A)$ of all subsets of A as the set of characteristic functions $\chi : A \to \{0, 1\}$: if $B \subseteq A$, then the corresponding function $\chi_B \in P(A)$ is such that $\chi_B(a_i) = 1$ if $a_i \in B$ and is 0 otherwise. Let $D : P(A) \to \mathbb{R}$ be a real-valued function defined on $P(A)$. The optimisation problem is
$$\min_{\chi \in P(A)} D(\chi) \quad \text{subject to} \quad \sum_{i=1}^{m} \chi(a_i) = n.$$
In theory, the global solution to this DoE problem can be found by a brute force assessment of all choices of subsets, a total of $m!/(n!(m-n)!)$ possibilities. For all but small m and n, this approach will be computationally infeasible, even if the functional D is cheap to evaluate. Two other related approaches are commonly used to partially solve this type of combinatorial optimisation problem.
The first approach, with a long history, see e.g., [13,14,15], is to use a sequential approach that at each stage constructs a new candidate subset by removing one element from the subset, replacing it by a new element that leads to a strict improvement in D. The computational cost per step for such an algorithm will be given by the cost of evaluating D and the cost of searching for a suitable exchange from the $n(m-n)$ possibilities. Ideally, the computational cost of an exchange should be no worse than $O(m)$ or $O(m \log m)$ as a function of m. The second approach is based on convex relaxation, replacing a combinatorial problem by a convex optimisation problem [16]. Let $0 \le w_i = w(a_i) \le 1$, where we think of $w_i$ as specifying the degree to which the ith element is in the optimal subset. If $W(A)$ is the set of functions $A \to [0, 1]$, then the optimisation problem is defined by
$$\min_{w \in W(A)} D(w) \quad \text{subject to} \quad \sum_{i=1}^{m} w(a_i) = n.$$
In this approach, the combinatorial constraint $\chi(a_i) \in \{0, 1\}$ is relaxed to the convex constraint $0 \le w(a_i) \le 1$. Often, the optimal w is in fact a characteristic function associated with an n-element subset. The optimisation algorithms for such problems are generally iterative and often also have a sequential element, such as in sequential quadratic programming [17] and semidefinite programming [18]. Convex relaxation approaches to DoE problems were considered in [4,9,10,16], for example.
In this paper, we show how two algorithms, both following a sequential approach and developed for numerical linear algebra applications, can be repurposed to address the design problem of choosing n observations from m possible observations.
The remainder of this paper is organised as follows. In Section 2, we give an overview of linear least squares estimation, while in Section 3, we describe commonly used aggregate measures of uncertainty and the corresponding criteria used in optimal experimental design. In Section 4, we describe two algorithms that address the DoE problem under consideration, as well as sequential algorithms for choosing additional observations in a step-wise optimal way. Example applications of the algorithms to design problems arising in metrology  [19,20,21,22] are given in Section 5. Our concluding remarks are given in Section 6.

2. Least Squares Problems

For an m × n observation matrix C and an m × 1 vector of measurements y, the least squares estimate a is the solution of
$$\min_{\alpha} \; (y - C\alpha)^{\top} (y - C\alpha).$$
The QR factorisation of C expresses C = Q R as a product of an m × m orthogonal matrix Q and an upper triangular matrix R. If
$$C = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = Q_1 R_1,$$
where $Q_1$ is the submatrix constructed from the first n columns of Q and $Q_2$ is constructed from the remaining $m - n$ columns, then
$$a = (C^{\top} C)^{-1} C^{\top} y = R_1^{-1} Q_1^{\top} y,$$
$$\hat{y} = C a = C (C^{\top} C)^{-1} C^{\top} y = Q_1 Q_1^{\top} y,$$
$$r = y - C a = (I - Q_1 Q_1^{\top})\, y = Q_2 Q_2^{\top} y,$$
are the least squares solution, model approximant, and associated residual vector. The QR factorisation enables a to be given by the solution of $R_1 a = Q_1^{\top} y$, and this solution can be constructed using back substitution, solving for $a_n$ first, then $a_{n-1}$, etc., exploiting the upper-triangular form of $R_1$. The QR factorisation also avoids the numerical loss of accuracy that can occur in forming $C^{\top} C$ and its inverse in the case in which C is ill-conditioned [23] (Chapter 5). The computational cost of evaluating the least-squares solution is $O(mn^2)$.
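As an illustration of these formulae, the following is a minimal sketch (assuming Python with NumPy, which is not part of the original paper) of the least-squares computation via the thin QR factorisation.

```python
import numpy as np

def lsq_via_qr(C, y):
    """Least-squares estimate for y ~ N(C a, I) via the thin QR factorisation C = Q1 R1.

    Returns the estimate a, the model approximant y_hat and the residual r.
    Illustrative sketch only; production code would use a dedicated solver.
    """
    Q1, R1 = np.linalg.qr(C, mode="reduced")   # Q1 is m x n with orthonormal columns, R1 is n x n
    a = np.linalg.solve(R1, Q1.T @ y)          # solve the triangular system R1 a = Q1^T y
    y_hat = Q1 @ (Q1.T @ y)                    # orthogonal projection of y onto the range of C
    r = y - y_hat                              # residual vector
    return a, y_hat, r
```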
The n × m matrix $S = R_1^{-1} Q_1^{\top}$ is the sensitivity matrix of the solution parameters a with respect to the data y, with $s_{ji} = \partial a_j / \partial y_i$. If the variance matrix associated with y is $V_y$, then the variance matrix associated with the solution a is
$$V_a = S V_y S^{\top}.$$
For the case V y = I ,
$$V_a = S S^{\top} = (R_1^{\top} R_1)^{-1} = (C^{\top} C)^{-1}.$$
For this case, the trace of the variance matrix is given by
$$\operatorname{Tr}(V_a) = \sum_{j} \sum_{i} s_{ji}^2,$$
the variance associated with $a_j$ is $u^2(a_j) = \sum_i s_{ji}^2$, and $\sum_j s_{ji}^2$ is the contribution of the uncertainty associated with the ith observation to the trace of $V_a$.
The least-squares solution a is the maximum likelihood estimate of α for the model $y \sim N(C\alpha, I)$. If y is associated with the variance matrix $V_y$, i.e., $y \sim N(C\alpha, V_y)$, and $V_y$ is full rank and has Cholesky decomposition $V_y = L L^{\top}$, where L is lower triangular [23], then the maximum likelihood estimate a of α solves the Gauss–Markov problem
$$\min_{\alpha} \; (y - C\alpha)^{\top} V_y^{-1} (y - C\alpha).$$
If
$$\tilde{C} = L^{-1} C, \qquad \tilde{y} = L^{-1} y,$$
then the solution of (3) is the least-squares solution of $\tilde{C} \alpha \approx \tilde{y}$:
$$a = \bigl(\tilde{C}^{\top} \tilde{C}\bigr)^{-1} \tilde{C}^{\top} \tilde{y}.$$
In particular, if $V_y$ is a diagonal matrix with $u_i^2 > 0$ in the ith diagonal element, then $\tilde{C}$ is constructed from the weighted rows of C, and similarly for $\tilde{y}$:
$$\tilde{c}_i = w_i c_i, \qquad \tilde{y}_i = w_i y_i, \qquad w_i = 1/u_i.$$
In general, the cost of evaluating the Cholesky factorisation of V y is O ( m 3 ) , which could be problematic for large m. Often, V y has a structure that enables the factorisation to be performed in O ( m ) , as in the simple but common case of a diagonal variance matrix.
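For the common diagonal case, the weighting and the Gauss–Markov estimate can be computed in a few lines; the following is a minimal sketch under the same NumPy assumption as above.

```python
import numpy as np

def gauss_markov_lsq(C, y, u):
    """Gauss-Markov estimate for a diagonal variance matrix V_y = diag(u_i^2), u_i > 0.

    The rows of C and the elements of y are weighted by w_i = 1/u_i, after which
    ordinary least squares applies.  Returns the estimate and its variance matrix.
    """
    w = 1.0 / np.asarray(u, dtype=float)
    C_w = C * w[:, None]                       # weighted observation matrix, c~_i = w_i c_i
    y_w = y * w                                # weighted data vector
    Q1, R1 = np.linalg.qr(C_w, mode="reduced")
    a = np.linalg.solve(R1, Q1.T @ y_w)
    V_a = np.linalg.inv(R1.T @ R1)             # V_a = (C~^T C~)^{-1}
    return a, V_a
```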

3. Aggregate Measures of Uncertainty

A central issue in design of experiment analysis is to maximise some measure of the information gain subject to constraints on resources. In a metrology context, in particular, it is natural to express the information gain in terms of uncertainties associated with the fitted parameters; i.e., related to the n × n variance matrix V a . Given an aggregate measure of uncertainty derived from V a , one experiment will be judged to be more informative than another if it is associated with a lower aggregate uncertainty. There are a number of aggregate measures that are commonly used. In this paper, we are concerned with two: A-optimality and D-optimality.
A full rank variance matrix V is a symmetric positive definite matrix and has an eigenvalue decomposition with eigenvalues $\lambda_j$, $j = 1, \ldots, n$. The A-measure is the trace $\operatorname{Tr}(V)$ of V, the sum of the diagonal elements of V, as well as the sum of the eigenvalues of V:
$$\operatorname{Tr}(V) = \sum_{j=1}^{n} \lambda_j.$$
We note that $\operatorname{Tr}(\sigma^2 V) = \sigma^2 \operatorname{Tr}(V)$. The D-measure is the determinant $|V|$ of V, and also the product of the eigenvalues of V:
$$|V| = \prod_{j=1}^{n} \lambda_j.$$
We also evaluate
$$\bar{d}(V) = |V|^{1/n} = \Bigl(\prod_{j=1}^{n} \lambda_j\Bigr)^{1/n},$$
the geometric mean of the eigenvalues, noting that $\bar{d}(\sigma^2 V) = \sigma^2 \bar{d}(V)$.
Arguments can be made for and against using each of these two measures. Given V, the A-measure can be evaluated in $O(n)$ steps, while the D-measure will usually involve $O(n^3)$ steps, using the Cholesky factorisation of the variance matrix to evaluate its determinant. The A-measure has the apparent advantage of being most easily interpreted as an aggregate uncertainty, being the sum of the variances $\sum_j u^2(a_j)$ associated with each of the parameter estimates $a_j$ or, more precisely, the numerical values of these estimates. From a dimensional analysis point of view [24], the A-measure can only be meaningful if all the parameters in a have the same dimension, length, mass, etc. Otherwise, the trace involves adding quantities of different dimensions. The D-measure is consistent with a dimensional analysis.
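The two measures, and the geometric-mean measure used later, can be evaluated as in the following sketch (NumPy assumed, as before); the Cholesky factor is used for the determinant, as suggested above.

```python
import numpy as np

def aggregate_measures(V):
    """A-measure, D-measure and geometric-mean measure for a positive definite variance matrix V."""
    n = V.shape[0]
    a_measure = float(np.trace(V))                 # sum of the eigenvalues, O(n) given V
    L = np.linalg.cholesky(V)                      # O(n^3)
    d_measure = float(np.prod(np.diag(L)) ** 2)    # |V| = |L|^2, product of the eigenvalues
    d_bar = d_measure ** (1.0 / n)                 # geometric mean of the eigenvalues
    return a_measure, d_measure, d_bar
```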
The A-measure can also be regarded as the variance associated with the sum $\sum_j a_j$ of parameter estimates, assuming these estimates are statistically independent. In general, they will not be independent, and one disadvantage of the A-measure is that it takes no account of correlation, being defined solely in terms of the diagonal elements of V. The D-measure does take into account correlation. Consider the 2 × 2 variance matrix
$$V = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, \qquad \operatorname{Tr}(V) = 2, \qquad |V| = 1 - \rho^2,$$
associated with parameters $a_1$ and $a_2$. The A-measure is independent of the correlation coefficient ρ while the D-measure depends directly on $\rho^2$. If the result of an experiment is to change ρ from 0 to near 1, there is a gain of information, since $a_1 - a_2$ is now much more accurately estimated. The D-measure reflects this information gain; the A-measure does not.
The D-measure is invariant to re-parametrizations of the model, while the A-measure is not. Suppose $V_a = (C^{\top} C)^{-1}$ and let $b = B a$, where B is n × n and full rank, so that b can be regarded as an alternative parametrization of the problem. For the same experimental design, the variance matrix associated with the estimates b is given by
$$V_b = B V_a B^{\top}.$$
We have
$$|V_b| = |B V_a B^{\top}| = |B|^2\, |V_a|,$$
so that a strategy that is optimal for the D-measure for parametrization a will also be optimal for parametrization b. For the A-measure,
$$\operatorname{Tr}(V_b) = \operatorname{Tr}(B V_a B^{\top}) = \operatorname{Tr}(B^{\top} B V_a) = \operatorname{Tr}(V_a B^{\top} B),$$
and, in general, there is no constant K such that $\operatorname{Tr}(B V_a B^{\top}) = K \operatorname{Tr}(V_a)$, unless $B^{\top} B$ is a multiple of the identity matrix. In a similar vein, the D-measure is invariant with respect to changes in units; the A-measure is not.
The eigenvalues $\lambda_j$ of $V_a$ are related to the size of the confidence ellipses centred about the parameter estimates a. In fact, for a given confidence level, the lengths of the semi-axes are proportional to $\sqrt{\lambda_j}$. Hence, the D-measure is related to the square of the volume of the confidence ellipse, while the A-measure is related to the sum of the squares of the semi-axis lengths.

4. Choosing a D-Optimal Subset of n Measurements

Let C be an m × n observation matrix, m > n , representing m measurements that could potentially be made. If we are constrained only to use n measurements, which n should we choose, in order to minimise an aggregate measure of the uncertainties associated with the fitted parameters? It is assumed that there is at least one n × n submatrix constructed from the rows of C that is full rank, allowing all the parameters a to be estimated. In this section, we consider two algorithms described below in Section 4.1 and Section 4.2, for addressing the issue of subset selection with a view to providing a D-optimal solution. Both algorithms were designed to address other problems but can be repurposed for design of experiment applications.
Let $C_k$ be a full rank n × n submatrix of C defined by a subset of the rows of C indexed by row indices $I_k = \{i_1, \ldots, i_n\}$, and set $V_k = H_k^{-1}$, where $H_k = C_k^{\top} C_k$. Then,
$$|V_k| = |H_k|^{-1} = |C_k|^{-2},$$
so that choosing $I_k$ to minimise $|V_k|$ amounts to choosing $I_k$ to maximise $|C_k|$. For any square n × n matrix A with rows $a_i$, $|A|$ is the volume of the parallelepiped spanned by the vectors $a_i$ in $\mathbb{R}^n$. This volume will depend on the Euclidean lengths $\|a_i\|_2$ of the vectors and also their mutual independence. For vectors of equal length, $|A|$ is maximised if the vectors $a_i$ are orthogonal to each other (and can therefore be rotated or reflected to align with the coordinate axes). In terms of the observation matrix C, the lengths of the row vectors $c_i$ reflect the accuracy associated with the measurement $y_i$—more accurate measurements will be accorded a greater weight and therefore a larger row vector $c_i$ in terms of Euclidean length. The orthogonality of the row vectors relates to how independent the sets of information represented by the measurements are from each other. Consider
$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = C \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \end{bmatrix}, \qquad C = C(\theta) = \begin{bmatrix} 1 & 0 \\ \cos\theta & \sin\theta \end{bmatrix}, \qquad \epsilon_1, \epsilon_2 \sim N(0, 1).$$
Both measurements $y_1$ and $y_2$ are associated with equal information gain, since $\|c_1\| = \|c_2\| = 1$. The data point $y_1$ provides information about $a_1$ alone, while the data point $y_2$ potentially provides information about both $a_1$ and $a_2$. For $\theta \approx 0$, the observation equation involving $y_2$ only serves to provide more information about $a_1$, and $a_2$ remains poorly estimated with $|C(\theta)| = |\sin\theta| \approx 0$. For $\theta \approx \pi/2$, the information provided by $y_2$ provides significant new information about $a_2$ and $|C(\theta)| = |\sin\theta| \approx 1$. In this way, amongst a set of measurements of comparable accuracy, we look for subsets that are mutually maximally independent. Both of the algorithms described below involve trying to select a set of measurements that are sufficiently independent.

4.1. Subset Selection Using the QR Factorisation with Column Pivoting

The QR factorisation of an m × n matrix $C = QR$ has the following geometrical interpretation. The column vectors $c_j$ of C define an n-dimensional subspace $\mathcal{C}$ of $\mathbb{R}^m$. The orthogonal matrix Q defines an axis system for $\mathbb{R}^m$, such that the n columns of $Q_1$ define an axis system for $\mathcal{C}$ and the $m - n$ columns of $Q_2$ define an axis system for the space of vectors $\mathcal{C}^{\perp}$ orthogonal to $\mathcal{C}$. The columns of $Q_1$ are constructed so that $q_1$ is aligned with $c_1$, so that there is an $r_{11}$ such that $c_1 = r_{11} q_1$. The vector $q_2$ is chosen to lie in the plane defined by $c_1$ and $c_2$, and so there are scalars $r_{12}$ and $r_{22}$ such that $c_2 = r_{12} q_1 + r_{22} q_2$, etc. This gives the factorisation
$$\begin{bmatrix} c_1 & \cdots & c_n \end{bmatrix} = \begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & r_{nn} \end{bmatrix},$$
i.e., in matrix notation C = Q 1 R 1 .
An equivalent geometrical interpretation is that the orthogonal matrix $Q^{\top}$, a combination of rotations and reflections, transforms C so that its first column is aligned with the first axis in $\mathbb{R}^m$, the second column lies in the plane defined by the first and second axes in $\mathbb{R}^m$, the third column lies in the three-dimensional space defined by the first three axes in $\mathbb{R}^m$, and so on: $Q^{\top} C = R$.
For the standard QR algorithm, the columns of C are processed in the order they appear in the matrix C. For the QR algorithm with column pivoting [23], the order of the columns is changed to improve the numerical properties of the algorithm. The column of largest Euclidean length is aligned with the first axis. Of the remaining columns, the column that is furthest from the space defined by the first axis is then rotated to the plane defined by the first and second axes. At the k + 1 th stage, it is the column furthest from the subspace defined by the first k axes directions that is chosen to be transformed to lie in the space defined by the first k + 1 axes directions. The output of the algorithm includes the specification of a permutation matrix P that permutes the columns of C, along with an orthogonal Q and upper triangular R such that C P = Q R .
The first algorithm considered here is based on a subset selection algorithm described in [23] (Section 5.5). The issue addressed by the subset selection algorithm is that of trying to explain a measured response vector y in terms of basis vectors, usually representing covariates x and functions of covariates, stored in an m × p matrix C. It is assumed that y lies close to an n-dimensional subspace defined by a subset of the columns of C. The problem is to estimate n and choose n columns of C that best explain the response, using QR with column pivoting to select columns that are optimally mutually independent. This information can then be used to predict other, possibly future, responses based on knowledge of the variates represented by the selected columns. The algorithm uses the singular value decomposition (SVD) to estimate n and the QR algorithm with column pivoting to select an appropriate set of n columns.
Our design of experiment problem is different. The observation matrix C of all potential measurements is assumed to be full rank, so estimating the rank of C is not required. More importantly, our selection problem is to select from the rows of C, not the columns. However, the selection of a set of vectors that are maximally mutually independent is directly relevant. Suppose C has QR factorisation $C = Q_1 R_1$, where $Q_1$ is m × n and $R_1$ is n × n, upper-triangular and full rank. The matrix $R_1$ can be used to re-parametrize the problem in terms of $b = R_1 a$. The observation matrix associated with parameters b is $Q_1$. Since D-optimal designs are independent of parametrization, it is sufficient to work with $Q_1$ as follows. We use column pivoting to find the QR factorisation of $Q_1^{\top}$ so that $Q_1^{\top} P = U T$, where U is an n × n orthogonal matrix and T is upper-triangular. The first n columns selected by the QR decomposition of $Q_1^{\top}$ specified by P also specify the n rows of C that are a good choice for the D-optimal design. This algorithm is summarised below (Algorithm 1). Its computational complexity is $O(mn^2)$.
Algorithm 1: SSQR: subset selection algorithm using QR with column pivoting to determine a favourable subset of an observation matrix in terms of the D-measure
Data: An m × n observation matrix C.
Result: Index set $I \subseteq \{1, \ldots, m\}$ whose first n elements specify a submatrix $C_n$ that represents a good selection of the rows of C in terms of maximising the determinant.
Steps:
Determine the QR factorization of C = Q 1 R 1 , where Q 1 is an m × n orthogonal matrix.
Use column pivoting to find the QR factorisation of $Q_1^{\top}$ so that $Q_1^{\top} P = U T$, where U is an n × n orthogonal matrix and T is upper-triangular.
Assign the index set I by setting $I(k) = i_k$, where $i_k$ specifies the row index of the kth column of P such that $P(i_k, k) = 1$.
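A minimal sketch of these steps, assuming Python with NumPy and SciPy (the function name ssqr is illustrative and not part of the paper), is as follows.

```python
import numpy as np
from scipy.linalg import qr

def ssqr(C):
    """SSQR subset selection (a sketch of Algorithm 1).

    Returns a permutation of the row indices of the m x n observation matrix C;
    the first n entries indicate rows giving, heuristically, a large determinant.
    """
    Q1, _ = np.linalg.qr(C, mode="reduced")        # C = Q1 R1
    _, _, piv = qr(Q1.T, pivoting=True)            # QR with column pivoting: Q1^T P = U T
    return piv                                     # piv[k] is the row of C selected at step k
```

For example, `C_n = C[ssqr(C)[:C.shape[1]], :]` extracts the selected n × n submatrix.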
The SSQR algorithm is not targeted directly at determining a D-optimal design but is an approach that, at least heuristically, has elements that make it likely to produce a design that is good in terms of the D-measure. The algorithm is an example of a sequential 'greedy algorithm', in that at each step, a choice is made that maximises some measure given the choices already made, taking no account of what choices could arise in a future step. Once a row has been selected, there is no opportunity for the row to be deselected at a future iteration. The algorithm is largely independent of the ordering of the rows of the observation matrix C and, other than for reasons of rounding error effects and symmetries in the observation matrix, will arrive at the same solution for different row orderings. As has been mentioned, there is no guarantee that this solution is globally optimal. Example calculations using the algorithm are discussed in Section 5.

4.2. Exchange Approach Based on the Gu and Eisenstat Algorithm

The second algorithm described in this paper is directly targeted at determining a D-optimal design. It is based on the iterative algorithm of Gu and Eisenstat [25] for determining a strongly rank-revealing QR decomposition. Below, we show how it can be adapted to provide a reduction in the D-measure at each iteration through exchanging rows and to stop when no reduction is possible. It starts with a selection C 0 of n rows of the m × n observation matrix C, where C 0 is already full rank; e.g., that provided by the SSQR subset selection algorithm of Section 4.1.
We recall that minimising the D-measure is equivalent to maximising the absolute value of the determinant of the upper-triangular factor of the observation matrix. The idea of the algorithm is as follows. Partition $C^{\top}$ as $C^{\top} = \begin{bmatrix} A & B \end{bmatrix}$, where A is n × n and B is n × (m − n), so that the columns of A correspond to the currently selected rows of C. We assume that $A = C_0^{\top}$ is full rank. The Gu–Eisenstat (GE) algorithm uses the following result:
Proposition 1.
For an n × n full rank matrix A and an n × p matrix B, let $\tilde{A}$ be the matrix constructed by replacing the ith column of A, $1 \le i \le n$, by the jth column of B, $1 \le j \le p$. Then,
$$|\tilde{A}| = F_{i,j}\, |A|,$$
where $F = A^{-1} B$.
Proof. 
For n-vectors u and v,
$$|A + u v^{\top}| = |A|\, (1 + v^{\top} A^{-1} u);$$
see Appendix A.2. The result follows for $v = e_i$, the ith column of the identity matrix, and $u = b_j - a_i$, noting that $A^{-1} a_i = e_i$.    □
This suggests the following approach, assuming that the first n rows of C are full rank (Algorithm 2).
Algorithm 2: Basic steps to determine a D-optimal subset of an observation matrix
Algorithms 17 00193 i001
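The following is a minimal NumPy sketch of these basic steps, forming F explicitly at each pass rather than using the rank-one updates described next; the function and threshold names are illustrative, not the paper's.

```python
import numpy as np

def ge_exchange(C, sel, f=1.0 + 1e-10):
    """Row-exchange loop of Algorithm 2 (sketch; F is recomputed from scratch each pass).

    C   : m x n observation matrix.
    sel : list of n row indices forming a full-rank starting selection (e.g. from SSQR).
    f   : threshold; an exchange is accepted only if it multiplies |det| by more than f.
    """
    sel = list(sel)
    rest = [i for i in range(C.shape[0]) if i not in sel]
    while True:
        A = C[sel, :].T                            # columns of A are the selected rows of C
        B = C[rest, :].T                           # columns of B are the unselected rows
        F = np.linalg.solve(A, B)                  # F = A^{-1} B; |F[i, j]| is the gain factor
        i, j = np.unravel_index(np.argmax(np.abs(F)), F.shape)
        if abs(F[i, j]) <= f:                      # no exchange increases the determinant: stop
            return sel
        sel[i], rest[j] = rest[j], sel[i]          # swap the ith selected and jth unselected rows
```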
The main computational element of this algorithm is forming $F = A^{-1} B$ at each iteration. Forming F explicitly in this way involves $O(n^3)$ steps to evaluate $A^{-1}$ and a further $O(mn^2)$ steps to evaluate the matrix product. Let Q be orthogonal such that $\tilde{A} = QA$ is upper-triangular and set
$$Q \begin{bmatrix} A & B \end{bmatrix} = \begin{bmatrix} \tilde{A} & \tilde{B} \end{bmatrix}.$$
Then, $F = A^{-1} B = \tilde{A}^{-1} \tilde{B}$, and the latter can be computed efficiently using back-substitution. Thus, step 4 in Algorithm 2 can be replaced by
4.1
Determine orthogonal Q such that Q A is upper-triangular.
4.2
Set
$$\begin{bmatrix} A & B \end{bmatrix} := Q \begin{bmatrix} A & B \end{bmatrix}, \qquad F = A^{-1} B.$$
Since $|QA| = |A|$, up to sign, the modified algorithm still determines the optimal index set. As described in the Gu and Eisenstat paper, these computations can be made much more efficient by exploiting the fact that at each iteration we are performing a rank-one modification to update an upper triangular matrix in step 2 in Algorithm 2 above. Suppose we have determined to replace the ith column of the upper-triangular matrix A by the jth column of B. First, we move the ith column of A to the nth column, as shown schematically in (7).
(Schematic (7): the ith column of the upper-triangular matrix is moved to the last column position, with columns i + 1 to n shifted left by one; the result is upper-triangular apart from sub-diagonal elements in the shifted columns.)
This step can be written as
$$\begin{bmatrix} A_1 & a_i & A_2 \end{bmatrix} \;\longrightarrow\; \begin{bmatrix} A_1 & A_2 & a_i \end{bmatrix} = A\, \Pi_{i,n},$$
where $\Pi_{i,n}$ is the appropriate permutation matrix. The right-hand matrix is upper-triangular except for the sub-diagonal elements stored in $A_2$. Let Q be the orthogonal matrix where $\tilde{A} = Q A \Pi_{i,n}$ is upper-triangular and set $\tilde{B} = Q B$. Note that Q only acts on rows and columns i to n of $A \Pi_{i,n}$ and rows i to n of B. We have that
$$\tilde{F} = \tilde{A}^{-1} \tilde{B} = \Pi_{i,n}^{\top} F.$$
We can now replace the nth column a ˜ n of A ˜ by the jth column b ˜ j of B ˜ , forming matrices
$$\bar{A} = \tilde{A} + u\, e_n^{\top}, \qquad \bar{B} = \tilde{B} - u\, e_j^{\top}, \qquad u = \tilde{b}_j - \tilde{a}_n.$$
Note that the matrix $\bar{A}$ on the left remains upper-triangular. For the next iteration it remains to calculate $\bar{F} = \bar{A}^{-1} \bar{B}$. Using the Sherman–Morrison formula [26] (Appendix A.1), we have
$$\bar{A}^{-1} = \bigl(\tilde{A} + u\, e_n^{\top}\bigr)^{-1} = \tilde{A}^{-1} - \frac{\tilde{u}\, \tilde{v}^{\top}}{1 + e_n^{\top} \tilde{u}}, \qquad \tilde{u} = \tilde{A}^{-1} u, \qquad \tilde{v} = \tilde{A}^{-\top} e_n,$$
so that
$$\bar{A}^{-1} \bar{B} = \tilde{A}^{-1} \tilde{B} - \tilde{A}^{-1} u\, e_j^{\top} - \frac{\tilde{u}\, (\tilde{v}^{\top} \tilde{B})}{1 + e_n^{\top} \tilde{u}} + \frac{\tilde{u}\, (\tilde{v}^{\top} u)\, e_j^{\top}}{1 + e_n^{\top} \tilde{u}} = \tilde{F} + \tilde{u}\, w^{\top},$$
where
$$w^{\top} = \Bigl(\frac{\tilde{v}^{\top} u}{1 + e_n^{\top} \tilde{u}} - 1\Bigr) e_j^{\top} - \frac{\tilde{v}^{\top} \tilde{B}}{1 + e_n^{\top} \tilde{u}}.$$
Thus, F ¯ is a rank-one update of F ˜ . The expressions above can be considerably simplified. We note that
$$\tilde{u} = \tilde{F}(:,j) - e_n, \qquad \frac{\tilde{v}^{\top} u}{1 + e_n^{\top} \tilde{u}} - 1 = -\frac{1}{\tilde{F}(n,j)}, \qquad \frac{\tilde{v}^{\top} \tilde{B}}{1 + e_n^{\top} \tilde{u}} = \frac{\tilde{F}(n,:)}{\tilde{F}(n,j)},$$
so that
$$w^{\top} = -\frac{1}{\tilde{F}(n,j)} \bigl(\tilde{F}(n,:) + e_j^{\top}\bigr).$$
Recalling that $\tilde{F} = \Pi_{i,n}^{\top} A^{-1} B$, we see from (8) that $\bar{A}^{-1} \bar{B}$ can be constructed as a rank-one update of the previously calculated $F = A^{-1} B$. We also note that $|\tilde{F}(n,j)| = |F(i,j)| > 1$ (since the update is only invoked to increase $|A|$) so that the rank-one update of F is numerically stable. The formation of $\tilde{A}$ and $\bar{A}$ requires $O(n^2)$ steps and the formation of $\bar{F}$ requires $O(mn)$ steps. These calculations lead to a computationally efficient and numerically stable version (Algorithm 3) of Algorithm 2:
Algorithm 3: GE: efficient algorithm to determine a D-optimal subset of an observation matrix
Algorithms 17 00193 i002
The algorithm is a sequential descent algorithm, with each exchange producing a decrease in the determinant of the variance matrix by at least a factor of $1/f^2 < 1$, ensuring that it stops in a finite number of steps. However, there is no guarantee that the algorithm will determine the globally D-optimal subset, only a locally D-optimal solution. A different ordering of the rows of the observation matrix is quite likely to produce a different solution. Consider the following matrix $C(a)$:
$$C(a) = \begin{bmatrix} 1.00 & 0.00 & 0.00 & 0.00 \\ 0.00 & 1.00 & 0.00 & 0.00 \\ 0.00 & 0.00 & 1.00 & 0.00 \\ 0.00 & 0.00 & 0.00 & a \\ 0.50 & 0.50 & 0.50 & 0.50 \\ -0.17 & 0.83 & -0.17 & -0.50 \\ -0.17 & -0.17 & 0.83 & -0.50 \\ 0.83 & -0.17 & -0.17 & -0.50 \end{bmatrix}.$$
The bottom four rows form an orthogonal matrix Q with $|Q| = 1$. However, for $a > 0.5$, there is no row interchange that will allow the determinant of the first four rows to be increased, and the GE algorithm stops with the first four rows with determinant a, sub-optimal if $a < 1.0$. For $0.5 < a < 1.0$, the SSQR algorithm, by contrast, selects the bottom four rows, in this case, the globally optimal solution.

4.3. Combined SSQR-GE Algorithm

An effective approach in practice is to combine the SSQR and GE algorithms, using the SSQR algorithm to find an initial selection of rows of the observation matrix, and then using the GE interchange algorithm to improve, if possible, on this selection. This combined algorithm has the advantages of being largely independent of the row ordering and will deliver a local minimum based on the SSQR algorithm starting point. Again, there is no guarantee that the solution will be a global optimum, but the examples discussed in Section 5 show that the combined algorithm works well.
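Using the two sketches given earlier (the hypothetical ssqr and ge_exchange functions, which are not part of the paper), the combined strategy can be expressed in a few lines.

```python
import numpy as np

def ssqr_ge(C):
    """Combined SSQR-GE selection: SSQR provides the starting point, GE exchanges improve it."""
    n = C.shape[1]
    sel0 = list(ssqr(C)[:n])          # initial selection from QR with column pivoting
    return ge_exchange(C, sel0)       # locally D-optimal selection of n rows

# Example: D-measure of the resulting design
# sel = ssqr_ge(C)
# Cn = C[sel, :]
# d_measure = 1.0 / np.linalg.det(Cn.T @ Cn)    # |V| = |C_n^T C_n|^{-1}
```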

4.4. Sequential Approach for Choosing a Set of Additional Observations

The algorithms described in Section 4.1 and Section 4.2 are designed to determine a minimal set of observations that produce D-optimal estimates of the model parameters a. We now consider the case in which estimates of a have already been established. Suppose the (full rank) variance matrix $V = L L^{\top} = (R^{\top} R)^{-1}$ represents the current knowledge about parameters a and up to m (new) measurements are potentially available, modelled according to
$$y_i \sim N(c_i^{\top} a,\, 1), \qquad i = 1, \ldots, m.$$
(As before, we assume that, for each i, y i and c i have been weighted so that the uncertainty associated with y i is 1.) Which measurements should we make in order to best improve our knowledge about a? We suppose that resources exist to make p more observations. An exhaustive search of all combinations is only feasible for small p and modest m. The sequential algorithms below add the observations one at a time, with the choice at each stage designed to maximise the information. These are further examples of so-called ‘greedy algorithms’.
Let C be the m × n observation matrix whose ith row is c i and V i the variance matrix associated with a given that the ith measurement (and only the ith measurement) is made. Then,
$$V_i = (R^{\top} R + c_i c_i^{\top})^{-1} = V - \frac{(V c_i)(V c_i)^{\top}}{1 + c_i^{\top} V c_i},$$
using the Sherman–Morrison formula [26] (Appendix A.1). Let
$$f_i = V c_i, \qquad g_i^2 = c_i^{\top} f_i = c_i^{\top} V c_i.$$
Then,
$$V_i = V - \frac{f_i f_i^{\top}}{1 + g_i^2} = V - u_i u_i^{\top}, \qquad u_i = \frac{1}{\sqrt{1 + g_i^2}}\, f_i,$$
so that V i is a symmetric rank-one update of V.
The eigenvalues λ j , i of V i can be related to the eigenvalues λ j of V through the following result [23] (Section 8.1), [27] (p. 94):
Proposition 2.
Suppose $W = V + \rho\, u u^{\top}$, where V is an n × n symmetric matrix, u is an n-vector with $u^{\top} u = 1$ and ρ is a constant. Suppose V and W have eigenvalues
$$\lambda_1(V) \ge \lambda_2(V) \ge \cdots \ge \lambda_n(V),$$
and
$$\lambda_1(W) \ge \lambda_2(W) \ge \cdots \ge \lambda_n(W).$$
If $\rho \ge 0$, then
$$\lambda_i(W) \in [\lambda_i(V),\, \lambda_{i-1}(V)], \qquad i = 2, \ldots, n,$$
while if $\rho \le 0$, then
$$\lambda_i(W) \in [\lambda_{i+1}(V),\, \lambda_i(V)], \qquad i = 1, \ldots, n-1.$$
In either case, there exist n constants $\gamma_i$, $\gamma_i \ge 0$, with $\gamma_1 + \cdots + \gamma_n = 1$, such that
$$\lambda_i(W) = \lambda_i(V) + \gamma_i \rho.$$

4.4.1. Sequential D-Optimality Algorithm

We use the relationship (5) to write
$$|V_i| = \bigl|R^{\top} R + c_i c_i^{\top}\bigr|^{-1} = \bigl|R^{\top} R\bigr|^{-1} \bigl(1 + c_i^{\top} (R^{\top} R)^{-1} c_i\bigr)^{-1} = |V| \bigl(1 + c_i^{\top} V c_i\bigr)^{-1} = \frac{1}{1 + g_i^2}\, |V|, \qquad g_i^2 = c_i^{\top} V c_i.$$
A sequential algorithm can therefore be designed which at each stage chooses the i that maximises $g_i^2$. The algorithm can be run in two modes: the first in which the algorithm can choose any row at most one time, the second in which there is no limit on the number of times any row can be selected. Having a row repeated k times is equivalent to having a single more accurate observation with its uncertainty reduced by a factor of $1/\sqrt{k}$. In the second mode, it is often the case that additional measurements correspond to repeat measurements chosen from the D-optimal set of n points. In the algorithm below (Algorithm 4), the mode is controlled by the flag $i_R$:
Algorithm 4: Sequential approach to optimising the D-measure in choosing p observations from a possible m > p observations
Algorithms 17 00193 i003
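A minimal sketch of this sequential procedure, written directly against the update formulae above, is given below (NumPy assumed; the argument allow_repeats plays the role of the flag $i_R$, and the names are illustrative).

```python
import numpy as np

def sequential_d_optimal(C, V, p, allow_repeats=True):
    """Sequentially choose p additional observations to reduce the D-measure (sketch of Algorithm 4).

    C : m x n matrix of candidate (weighted) observation rows c_i.
    V : n x n current variance matrix of the parameter estimates.
    Returns the chosen row indices and the factors t_q = |V_new| / |V_old| at each step.
    """
    V = V.copy()
    available = np.ones(C.shape[0], dtype=bool)
    chosen, t = [], []
    for _ in range(p):
        g2 = np.einsum('ij,jk,ik->i', C, V, C)        # g_i^2 = c_i^T V c_i for all candidate rows
        k = int(np.argmax(np.where(available, g2, -np.inf)))
        f = V @ C[k]                                  # f_k = V c_k
        V -= np.outer(f, f) / (1.0 + g2[k])           # rank-one downdate of the variance matrix
        chosen.append(k)
        t.append(1.0 / (1.0 + g2[k]))                 # |V_k| = |V| / (1 + g_k^2)
        if not allow_repeats:
            available[k] = False                      # each row used at most once in this mode
    return chosen, t
```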
For each q, the only operations required are matrix–vector multiplications and similar, requiring at most O ( m n ) steps. This compares with O ( m n 3 ) steps for an algorithm not exploiting rank-one updates.

4.4.2. Expected Information Gain Calculations

The quantities t q calculated at each stage give the reduction in determinant of the variance matrix at the qth step. The following argument can be used to provide an expected value for t q , under the assumption that the rows of C each provide approximately the same amount of information. Suppose C consists of rows with a 1 in one column and zeros elsewhere. A D-optimal design would select rows so that the ones are distributed equally amongst the columns. If a total of q rows are selected, the variance matrix V q is such that
$$V_q \approx \frac{n}{q}\, I, \qquad |V_q| \approx \Bigl(\frac{n}{q}\Bigr)^{n}.$$
From this argument, the expected value of t q , q > n , is such that
$$t_q \approx \tilde{t}(q) = \Bigl(\frac{q-1}{q}\Bigr)^{n}.$$
In terms of the geometric mean d ¯ of the eigenvalues of the variance matrices, we have
$$\bar{d}(V_q) = \frac{q-1}{q}\, \bar{d}(V_{q-1}).$$
For q = n + 1 , we have
$$\frac{1}{2} \;\ge\; \Bigl(\frac{n}{n+1}\Bigr)^{n} \;\ge\; \frac{1}{e} \approx 0.37,$$
where the left-hand value corresponds to the case n = 1, and $1/e$ is the limiting value as $n \to \infty$.

4.4.3. Sequential A-Optimality Algorithm

We can also construct a counterpart of Algorithm 4 for A-optimality. For any v, $\operatorname{Tr}(v v^{\top}) = v^{\top} v = \|v\|^2$. With $f_i$ and $g_i$ as in (10), if
$$\tau_i^2 = \|f_i\|^2 / (1 + g_i^2), \qquad \|f_i\|^2 = f_i^{\top} f_i,$$
then
$$\operatorname{Tr}(V_i) = \operatorname{Tr}(V) - \tau_i^2.$$
This last expression gives the change in the A-measure for a given choice of one additional measurement. A sequential algorithm can therefore be designed which at each stage chooses the i that maximises $\tau_i^2$. Each stage requires the calculation of the quantities $\|f_i\|^2$ and $g_i^2$, $i = 1, \ldots, m$. If V is the current variance matrix, then the $\|f_i\|^2$ are given by the sums of squares of the columns of $F = V C^{\top}$. Suppose measurement k maximises $\tau_i^2$. Then, the update of F is given by
$$F := (V - u_k u_k^{\top})\, C^{\top} = F - u_k w^{\top}, \qquad w = C u_k,$$
and the update of $g_i^2$ is given by $g_i^2 - w_i^2$, with the latter evaluated as $(g_i - w_i)(g_i + w_i)$ for better numerical accuracy. These steps are summarised below (Algorithm 5):
Algorithm 5: Sequential approach to optimising the A-measure in choosing p observations from a possible m > p observations
Algorithms 17 00193 i004
As for Algorithm 4, the computational requirement for each step is O ( m n ) .
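A corresponding sketch of the A-optimal sequence (again NumPy, with illustrative names) follows directly from the expressions above.

```python
import numpy as np

def sequential_a_optimal(C, V, p, allow_repeats=True):
    """Sequentially choose p additional observations to reduce the A-measure (sketch of Algorithm 5).

    At each step the row maximising tau_i^2 = ||V c_i||^2 / (1 + c_i^T V c_i) is chosen,
    and Tr(V) decreases by exactly tau_i^2.
    """
    V = V.copy()
    available = np.ones(C.shape[0], dtype=bool)
    chosen = []
    for _ in range(p):
        F = C @ V                                         # ith row of F is (V c_i)^T
        g2 = np.einsum('ij,ij->i', F, C)                  # g_i^2 = c_i^T V c_i
        tau2 = np.einsum('ij,ij->i', F, F) / (1.0 + g2)   # tau_i^2 = ||f_i||^2 / (1 + g_i^2)
        k = int(np.argmax(np.where(available, tau2, -np.inf)))
        u = F[k] / np.sqrt(1.0 + g2[k])                   # u_k = f_k / sqrt(1 + g_k^2)
        V -= np.outer(u, u)                               # Tr(V) decreases by tau_k^2
        chosen.append(k)
        if not allow_repeats:
            available[k] = False
    return chosen
```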

5. Numerical Examples

5.1. Polynomial Calibration Curves

Many instrument response functions are modelled as polynomial curves, most commonly of order 2 (degree 1), i.e., modelling a linear response, but higher orders arise, e.g., in the calibration of platinum resistance thermometers [28], or in the calibration of a stage motion [29] where polynomials of order 10 are involved. (Design problems for polynomial models augmented by Gaussian process models are considered in [4].) Suppose a polynomial response function of order n is defined on the interval [−1, 1], and that it is desired to calibrate the response function using n calibration points $x_i \in [-1, 1]$. Which choice of n points is D-optimal? For a linear (quadratic) response $n = 2$ ($n = 3$), it is intuitively clear that the end points $x = \pm 1$ (and $x = 0$) are optimal. For higher orders, the optimal choice is not obvious. In fact, the D-optimal set [30] for order n is given by solutions of
$$(1 - x^2)\, \dot{L}_n(x) = 0,$$
where $\dot{L}_n(x)$ is the derivative of the Legendre polynomial of order n (degree n − 1). These solutions can be approximated by the so-called 'arcsine' points
$$x_i = \cos\Bigl(\pi\, \frac{n - 1 - i}{n - 1}\Bigr), \qquad i = 0, 1, \ldots, n - 1.$$
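As an illustration (assuming NumPy, and the hypothetical ssqr_ge sketch of Section 4), the observation matrix for this experiment and the arcsine points can be set up as follows.

```python
import numpy as np
from numpy.polynomial import chebyshev

n = 6                                          # polynomial order (degree n - 1)
x = np.linspace(-1.0, 1.0, 2001)               # m = 2001 candidate points in steps of 0.001
C = chebyshev.chebvander(x, n - 1)             # m x n observation matrix in the Chebyshev basis

i = np.arange(n)
x_arcsine = np.cos(np.pi * (n - 1 - i) / (n - 1))   # arcsine approximations to the D-optimal points

# Candidate D-optimal points via the SSQR-GE sketch of Section 4 (hypothetical helper):
# x_opt = np.sort(x[ssqr_ge(C)])
```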
Table 1 gives the results of applying the SSQR algorithm and the combined SSQR-GE algorithm to estimate the D-optimal calibration points for the cases $n = 4, 5, \ldots, 11$, choosing from m = 2001 possible points equally spaced in the interval [−1, 1] in steps of 0.001. For these calculations, we have used Chebyshev polynomial basis functions [31] for numerical stability (but the optimal choice of points is independent of the choice of basis functions). The arcsine estimates are also given in the table, along with the D-optimal points derived from (13); see Appendix B. The results show that, in terms of closeness to the D-optimal points, the SSQR algorithm improves on the arcsine estimates, while the GE algorithm improves the SSQR solution and is accurate to three decimals in almost all cases (there is one exception for the case n = 11). Table 2 gives the computed geometric mean measure $\bar{d}(V_a)$ given in (4) for evenly spaced calibration points, the arcsine solutions, the SSQR solutions, and the SSQR-GE solutions given in Table 1. Also shown is the number of GE exchanges undertaken to improve the SSQR solutions. The table reflects the improvements provided by the SSQR and SSQR-GE algorithms. The number of GE exchanges is small compared to m = 2001. The evenly spaced calibration points are markedly worse than the other solutions for larger n.
Figure 1 graphs the values of $t_q$, $q = 1, \ldots, 100$, and the analytical approximation $\tilde{t}(q)$ given by (12), determined by the sequential D-optimal Algorithm 4, without repeat measurements, for the cases n = 4, upper curve, and n = 11, lower curve, starting with the SSQR-GE solution. The values of $t_q$ are batched, with a significant jump after every n additional measurements. Without repeat measurements, the algorithm tends to select available points closest to the D-optimal n points. If the algorithm is run with repeat measurements allowed, the algorithm selects the same set of D-optimal n points again and again. This behaviour is reflected in the graphs of $t_q$ for this case; see Figure 2.

5.2. Tensor Product of Polynomials

Our second example arose from a problem relating to the estimation of the heat distribution on a plate [32]. The application involved modelling the heat distribution as a Gaussian process model [33] involving tensor product polynomials, along with spatially correlated effects. A sub-problem is related to choosing a set of calibration points from a 10 × 14 grid of possible sensor locations in order to determine the tensor product mean function. The tensor product model is of the form
$$f(x, y, a) = \sum_{i=1}^{n_x} \sum_{j=1}^{n_y} a_{ij}\, f_i(x)\, g_j(y),$$
where $f_i(x)$ and $g_j(y)$ are Chebyshev polynomial basis functions. Figure 3 shows the solution calibration points for $n_x = n_y = 5$ constructed from polynomial basis functions up to degree 4. Two solutions are shown: the first in which the calibration points are selected from a 131 × 91 finer grid of 11,921 points, the second in which the calibration points are selected from a 14 × 10 grid of 140 points. The solutions found for the fine grid coincide with a 5 × 5 grid of points $(x_k^*, y_l^*)$, where the $x_k^*$, $k = 1, \ldots, 5$, are the D-optimal solutions derived from the solution of (13) on the interval [0, 20] and the $y_l^*$ are the D-optimal solutions on the interval [0, 10]. For this arrangement, the observation matrix C associated with the polynomial fit is the tensor product $C = C_x \otimes C_y$. For square, full rank matrices A and B of size $n_A \times n_A$ and $n_B \times n_B$,
$$|A \otimes B| = |A|^{n_B}\, |B|^{n_A}.$$
Hence, it is consistent that the D-optimal points for the tensor product are constructed from the D-optimal points for the x- and y-basis functions. The solution points for the coarse grid are seen to approximate the solution for the fine grid, given the constraints imposed by the coarse grid. The coarse grid solution points do not represent a regular grid in this case. For the case of the fine grid, the GE algorithm took 16 iterations to reach a local optimum, starting with the SSQR solution. For the coarse grid, the GE algorithm confirmed that the SSQR solution is locally optimal.
Figure 3. Tensor product polynomial calibration points found by the SSQR-GE algorithm. The points marked with a '+' are those selected from a fine grid, those marked 'o' from a coarse grid.
Algorithms 17 00193 g003
Figure 4 and Figure 5 plot the values of t q , q = 1 , , 200 , and its analytical approximation t ˜ ( q ) given by (12), determined by the sequential D-optimal Algorithm 4, for the fine and coarse grids, respectively, with and without repeat measurements. For a coarse grid and no repeat measurements, the additional calibration points that are permitted consistently provide less information compared to the expected information gain given by t ˜ in (12). For a coarse grid of just 140 points, later additional measurements have to be selected from locations that are less informative in terms of improving the D-measure. With repeat measurements on the coarse grid, the actual information gain is in line with the expected information gain.

5.3. Coordinate Metrology

In coordinate metrology, the size and shape of a geometric surface S is determined from the measurement of the coordinates $x_i$ of points lying on the surface. If the geometric surface $S = S(a)$ is parametrized by $a = (a_1, \ldots, a_n)^{\top}$ and $d(x, a)$ is a measure of the distance from a point x to $S(a)$, then for measurement points $x_{1:m} = \{x_i,\; i = 1, \ldots, m\}$, $m \ge n$, estimates of a are determined by solving the least squares optimisation problem
$$\min_{a} \; \sum_{i=1}^{m} d^2(x_i, a).$$
If the variance matrix associated with the measured point coordinates is $\sigma^2 I$, then the variance matrix $V_a$ associated with the fitted parameters is approximated by
$$V_a = \sigma^2 \bigl(J^{\top} J\bigr)^{-1}, \qquad J_{ij} = \frac{\partial d}{\partial a_j}(x_i, a),$$
involving the Jacobian matrix J. Thus, it is possible to determine optimal measurement strategies, i.e., where to measure on the surface S , in terms of the determinant of the Jacobian matrix J. Often, the case of choosing a minimal set of points with m = n is of interest. For simple geometries, an optimal set of points is straightforward to arrive at, e.g., three points equally spaced around a circle, but for more complex geometries, the choice of an optimal set may not be obvious, as considered below.
Figure 6 shows a reference artefact designed and calibrated by the Czech Metrology Institute (CMI) [34,35]. The form of the geometric surface is a hyperbolic paraboloid given by the equation
$$z = 24 + (x/8 - 1)(y/8 - 1).$$
A paraboloid with its axis approximately parallel to the z-axis is parametrized in terms of two rotation angles α and β and six further parameters b such that
$$\hat{z} = b_1 \hat{x}^2 + b_2 \hat{y}^2 + b_3 \hat{x}\hat{y} + b_4 \hat{x} + b_5 \hat{y} + b_6, \qquad \hat{x} = R(\alpha, \beta)\, x,$$
where R ( α , β ) = R x ( α ) R y ( β ) is the rotation matrix constructed from the plane rotation matrices
$$R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}, \qquad R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}.$$
For the surface (15), nominally $b = (0, 0, 1/64, -1/8, -1/8, 25)$. Figure 7, Figure 8 and Figure 9 show the distribution of the x- and y-coordinates for three sets $X_k$, $k = 1, 2, 3$, of eight calibration points. The first two are given by 'expert judgement'; that is, possible choices for a measurement strategy based on engineering practice. The first expert choice, $X_1$, Figure 7, represents an approximately uniform distribution, a first guess at a good distribution. This distribution of points is sufficient to determine all the parameters, and the associated 8 × 8 Jacobian matrix is full rank. (If the x- and y-coordinates of the data points are rotated through 45 degrees about the z-axis, the associated Jacobian matrix is rank deficient by three.) The second set $X_2$ came about through experimenting with moving the inner four points in dataset $X_1$ closer to or further from the origin of the xy-plane. It turned out that moving the four points further from the origin was better, resulting in dataset $X_2$, Figure 8. At first glance, the distribution of points in $X_2$ does not look like a good choice. The third dataset is that obtained using the SSQR-GE algorithm. The GE algorithm performed 11 interchanges starting from the SSQR solution. The optimal choice has seven points along the boundary in the xy-plane and one point in the interior. Any interior point not too far from the origin would also result in a good selection. In terms of the aggregate measure $\bar{d}_k$ defined in (4) associated with the three datasets, we have $\bar{d}_1 = 6.0\, \bar{d}_3$ and $\bar{d}_2 = 2.2\, \bar{d}_3$.
Table 3 gives the square roots $v_j$ of the diagonal elements of $(J_k^{\top} J_k)^{-1}$, where $J_k$ is the Jacobian matrix associated with the kth dataset. If the variance matrix associated with the measurements is $\sigma^2 I$, then $v_j = u(a_j)/\sigma$, where $u(a_j)$ is the estimate of the standard uncertainty associated with the jth parameter [36,37]. In terms of standard uncertainties, the D-optimal dataset $X_3$ is far better for estimating the surface parameters than the other two datasets.
Figure 10 plots the values of t q , q = 1 , , 100 , and its analytical approximation t ˜ ( q ) given by (12), determined by the sequential D-optimal Algorithm 4, with and without repeat measurements. For higher values of q, there is good agreement between t q and t ˜ q .

5.4. Calibration of a Network of Standards Using a Comparator

In this section, we consider the calibration of a number of standards, starting with one calibrated standard and using a comparator to calibrate the other standards relative to the calibrated standard. An example application is in mass calibration using a mass balance [38,39,40]. Given a calibrated standard A 1 , nominally of 1 kg, and two uncalibrated masses A 2 and A 3 each of nominal mass 0.5 kg, A 2 and A 3 can be calibrated using two measurements, the first comparing A 1 with the combined mass of A 2 and A 3 , and the second comparing A 2 and A 3 . A second example from length metrology relates to the calibration of gauge blocks [41], where it is possible to align two gauge blocks end to end, a process called ‘wringing’, to define a combined length. This discussion is relevant to the calibration of any set of artefacts defining an extensive quantity where the artefacts can be grouped to define a quantity that is (nominally) the sum of the individual quantities. We use the term ‘network’ to reflect the fact that the estimates of all the quantities are statistically correlated and an experiment that updates the information about one artefact will also update information about the other artefacts.
Let $a = (a_1, \ldots, a_n)^{\top}$ represent the values of the quantities associated with the artefacts, where the nominal values of the quantities are such that the artefacts can be grouped into pairs of groups, with the two groups in each pair associated with the same nominal total value. Thus, there are disjoint subsets $L_i, R_i \subset I = \{1, 2, \ldots, n\}$, $L_i \cap R_i = \emptyset$, $i = 1, \ldots, m$, such that
$$\sum_{q \in L_i} a_q \approx \sum_{q \in R_i} a_q.$$
An example considered further below involves a set of artefacts having nominal values 1.00, 0.50, 0.50, 0.20, 0.20, 0.10, 0.10, 0.05, and 0.05. The fact that the two subsets are associated with the same nominal value allows a comparator measurement to be made, with the associated observation equation
$$0 \approx y_i = \sum_{q \in L_i} a_q - \sum_{q \in R_i} a_q + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma_i^2),$$
where ϵ i represents a random effect associated with the comparator measurement. The design problem is to define measurement strategies encoded by L i and R i , i = 1 , , n , that provide the most accurate calibration of the standards. For the example considered here, there are almost 400 viable experiments.
Table 4 and Table 5 show five designs for the calibration of the n = 9 standards. A '1' in column j indicates that $j \in L_i$, while a '−1' indicates that $j \in R_i$. Each array of 9 × 9 elements is in fact the unweighted observation matrix C associated with the particular design. The first row in each design represents the observation associated with the calibration of the first standard using an absolute measurement system. This first observation is the only source of information about the first standard and must be included in the optimal solution. The uncertainties associated with the remaining standards scale with the uncertainty associated with the first observation.
The first design, given in Table 4, is one defined by ‘expert judgement’, and represents a typical set of experiments that would be performed in practice. (In fact, for this set of standards, it is not entirely obvious how to choose a design that will calibrate all of the standards). The next design, indicated by the first nine rows in Table 5, is that determined by the SSQR-GE algorithm with σ 1 = σ C = 1.0 and σ i = σ R = 0.5 , i = 2 , , n . The most striking and, at first sight, surprising feature of this second design is that almost all the artefacts are involved in experiments 2 to n. The reason for this is that, in some sense, the comparator uncertainties represented by σ i = σ R are partitioned amongst the artefacts involved, and more artefacts being included in an experiment means that each artefact is assigned a smaller uncertainty.
The two designs considered so far are based on the assumption that all the comparator measurements are associated with the same uncertainty σ R , irrespective of the number of artefacts and the nominal values of the artefacts. A more plausible assignment of uncertainties would take into account both the number and nominal values. For a mass balance, it would be usual for the associated uncertainty to have a component that varied in proportion to the mass. Similarly, a length measuring device will usually have one or more influencing factors, e.g., refractive index effects for the case of a laser interferometric comparator [41], that vary in proportion to length. The use of multiple artefacts in each experiment will also likely introduce effects that will add to the uncertainties.
For each experiment, let n i be the total number of artefacts involved and let
$$v_i = \sum_{q \in L_i \cup R_i} a_q,$$
be a measure of the total value of the quantities involved in the ith experiment. Given σ R , σ V and σ N , we can assign σ i according to
$$\sigma_i^2 = \sigma_R^2 + n_{i,2}\, \sigma_N^2 + v_i^2\, \sigma_V^2, \qquad n_{i,2} = \max(n_i - 2,\, 0), \qquad i = 2, \ldots, n.$$
Table 5 presents the optimal designs calculated using the SSQR-GE algorithm with $\sigma_i$ calculated as in (16) for the four different values of $\sigma_R$, $\sigma_N$ and $\sigma_V$ shown in Table 6. The third design in Table 5 is the result of penalising experiments with a large number of artefacts, while the fourth design in the table is the result of penalising experiments with a larger value of $v_i$.
Table 6 gives the uncertainties $u(a_i)$ associated with $a_i$ for four different assignments of $\sigma_i$, $i = 2, \ldots, n = 9$, for the designs in Table 4 and Table 5. Also shown in the table is the aggregate measure $\bar{d} = |V_a|^{1/n}$ of uncertainty, the total number $\sum_i n_i$ of artefacts and a measure $\sum_i v_i$ of the total nominal values involved in each set of experiments. The last three rows give the values of $\sigma_R$, $\sigma_N$ and $\sigma_V$ used to calculate $\sigma_i$, $i = 2, \ldots, n$. In all cases, the SSQR-GE algorithm leads to significant improvements over the expert judgement design.

6. Concluding Remarks

In this paper, we have looked at the experimental design problem: given m potential observations to determine n parameters, m > n, what is the best choice of n observations? In the context of least squares estimation, the problem was formulated as finding the n × n submatrix of the complete m × n observation matrix that has maximum determinant, corresponding to the D-optimality criterion. We described two algorithms, the SSQR and GE algorithms, to address this problem. Both were adapted from numerical linear algebra algorithms associated with the QR factorisation of a matrix. We also described algorithms for updating estimates of the problem parameters that are sequentially optimal with respect to the D-optimality and A-optimality criteria. We illustrated the behaviour of the algorithms on a number of applications drawn from the field of metrology. These algorithms are straightforward to implement. The SSQR algorithm, in particular, is a minor adaptation of the standard QR factorisation algorithm with pivoting. The algorithms enable computationally efficient implementations, exploiting rank-one matrix updating techniques. The algorithms are descent-type algorithms that will converge to a local minimum that is not necessarily a global minimum. In the examples considered, the algorithms determined effective experimental designs, often with much better performance than 'hand-crafted' designs based on expert judgement.

Funding

This work was supported by the UK’s National Measurement System programme for Data Science.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The author is grateful to the anonymous referees for their helpful comments, and to the Czech Metrology Institute for permission to use their photograph of the hyperbolic paraboloid, Figure 6.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Numerical Linear Algebra

Appendix A.1. Sherman-Morrison Formula

See [23]. For n × n full rank matrix A and n-vectors u and v
$$\bigl(A + u v^{\top}\bigr)^{-1} = A^{-1} - \frac{A^{-1} u\, v^{\top} A^{-1}}{1 + v^{\top} A^{-1} u}.$$
If $\tilde{u} = A^{-1} u$ and $\tilde{v} = A^{-\top} v$, then
$$\bigl(A + u v^{\top}\bigr)^{-1} = A^{-1} - \frac{\tilde{u} \tilde{v}^{\top}}{1 + v^{\top} \tilde{u}} = A^{-1} - \frac{\tilde{u} \tilde{v}^{\top}}{1 + \tilde{v}^{\top} u}.$$

Appendix A.2. Determinant of a Rank One Update of a Matrix

See, e.g., [42,43]. For a square matrix A and n-vectors u and v,
$$|A + u v^{\top}| = |A|\, (1 + v^{\top} A^{-1} u).$$
Writing
$$A + u v^{\top} = A \bigl(I + (A^{-1} u)\, v^{\top}\bigr),$$
it is sufficient to consider the case $A = I$: $|I + u v^{\top}| = 1 + v^{\top} u$. From the matrix equation
$$\begin{bmatrix} I & 0 \\ v^{\top} & 1 \end{bmatrix} \begin{bmatrix} I + u v^{\top} & u \\ 0 & 1 \end{bmatrix} \begin{bmatrix} I & 0 \\ -v^{\top} & 1 \end{bmatrix} = \begin{bmatrix} I & u \\ 0 & 1 + v^{\top} u \end{bmatrix}$$
it is seen that the determinant of the matrix on the right-hand side is equal to the determinant of the matrix in the middle of the left-hand side, establishing the result for the case A = I .
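The two identities of this appendix can be checked numerically in a few lines; the following is an illustrative sketch (NumPy assumed, random test data only).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
u, v = rng.standard_normal(n), rng.standard_normal(n)
Ainv = np.linalg.inv(A)

# Sherman-Morrison formula (Appendix A.1)
lhs_inv = np.linalg.inv(A + np.outer(u, v))
rhs_inv = Ainv - np.outer(Ainv @ u, v @ Ainv) / (1.0 + v @ Ainv @ u)
assert np.allclose(lhs_inv, rhs_inv)

# Rank-one determinant lemma (Appendix A.2)
lhs_det = np.linalg.det(A + np.outer(u, v))
rhs_det = np.linalg.det(A) * (1.0 + v @ Ainv @ u)
assert np.allclose(lhs_det, rhs_det)
```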

Appendix B. Legendre Polynomials

See, e.g., [44]. Note: in this section, the parameter vector $a = (a_0, \ldots, a_n)^{\top}$ is an (n + 1)-vector with indices starting from 0, not 1.
The D-optimal points for calibration experiments involving a polynomial response function of degree n are defined in terms of solutions of the polynomial equation
$$F(x) = (1 - x^2)\, \dot{L}_{n-1}(x) = 0,$$
where $\dot{L}_{n-1}(x)$ is the derivative of the Legendre polynomial of degree n − 1. Given an estimate x of a solution, an updated solution is given by Newton's method [17,45] according to
$$x := x - F(x)/\dot{F}(x), \qquad \dot{F}(x) = -2x\, \dot{L}_{n-1}(x) + (1 - x^2)\, \ddot{L}_{n-1}(x),$$
where L ¨ denotes the second derivative of L ( x ) with respect to x. Starting estimates for the n + 1 solutions are given by the arcsine points:
$$x_i = \cos\Bigl(\pi\, \frac{n - i}{n}\Bigr), \qquad i = 0, 1, \ldots, n.$$
Legendre polynomials $L_j(x)$ are orthogonal polynomial basis functions defined on the interval [−1, 1] and can be evaluated using the three-term recurrence relationship, starting with $L_0(x) = 1$, $L_1(x) = x$, and for $j \ge 2$,
$$j\, L_j(x) = (2j - 1)\, x\, L_{j-1}(x) - (j - 1)\, L_{j-2}(x).$$
If f ( x ) is a degree n polynomial given by a sum of Legendre polynomial basis functions
$$f(x, a) = \sum_{j=0}^{n} a_j L_j(x),$$
then $\dot{f}(x)$, the derivative of f with respect to x, is a degree n − 1 polynomial and can be written as
$$\dot{f}(x, \dot{a}) = \sum_{j=0}^{n-1} \dot{a}_j L_j(x),$$
where $\dot{a} = \dot{A}\, (a_1, \ldots, a_n)^{\top}$ is defined in terms of the latter n elements of the (n + 1)-vector $a = (a_0, a_1, \ldots, a_n)^{\top}$, and $\dot{A}$ is the n × n matrix constructed from diagonals formed from the sequence $1, 3, 5, \ldots, 2n - 1$. The matrix $\dot{A}$ for the case n = 10 is
$$\dot{A} = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 3 & 0 & 3 & 0 & 3 & 0 & 3 & 0 & 3 \\ 0 & 0 & 5 & 0 & 5 & 0 & 5 & 0 & 5 & 0 \\ 0 & 0 & 0 & 7 & 0 & 7 & 0 & 7 & 0 & 7 \\ 0 & 0 & 0 & 0 & 9 & 0 & 9 & 0 & 9 & 0 \\ 0 & 0 & 0 & 0 & 0 & 11 & 0 & 11 & 0 & 11 \\ 0 & 0 & 0 & 0 & 0 & 0 & 13 & 0 & 13 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 15 & 0 & 15 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 17 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 19 \end{bmatrix}.$$
Similarly, the second derivative f ¨ ( x ) can be written as
$$\ddot{f}(x, \ddot{a}) = \sum_{j=0}^{n-2} \ddot{a}_j L_j(x),$$
where $\ddot{a} = \ddot{A}\, (a_2, \ldots, a_n)^{\top}$ is defined in terms of the latter n − 1 elements of $a = (a_0, a_1, \ldots, a_n)^{\top}$, and $\ddot{A}$ is the (n − 1) × (n − 1) matrix with
$$\ddot{A} = \dot{A}(1{:}n{-}1,\, 1{:}n{-}1)\; \dot{A}(2{:}n,\, 2{:}n).$$
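The Newton iteration of this appendix can also be sketched using NumPy's Legendre utilities in place of the explicit recurrence and coefficient matrices; the sketch below follows the degree convention of Section 5.1 (n calibration points for a response of order n, i.e., degree n − 1), and the function name is illustrative.

```python
import numpy as np
from numpy.polynomial import legendre

def d_optimal_points(n, iters=20):
    """D-optimal calibration points on [-1, 1] for a polynomial of order n (degree n - 1).

    Newton iteration on F(x) = (1 - x^2) L'(x), with L the Legendre polynomial of
    degree n - 1, started from the arcsine points.
    """
    c = np.zeros(n)
    c[-1] = 1.0                                   # Legendre-series coefficients of L_{n-1}
    dc = legendre.legder(c)                       # L'
    d2c = legendre.legder(c, 2)                   # L''
    i = np.arange(n)
    x = np.cos(np.pi * (n - 1 - i) / (n - 1))     # arcsine starting points
    for _ in range(iters):
        F = (1 - x**2) * legendre.legval(x, dc)
        dF = -2 * x * legendre.legval(x, dc) + (1 - x**2) * legendre.legval(x, d2c)
        x = x - F / dF                            # Newton update; the endpoints +-1 are fixed points
    return np.sort(x)

# Example: d_optimal_points(5) returns the five D-optimal points for an order-5 (quartic) response.
```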

References

1. Atkinson, A.C.; Donev, A.N.; Tobias, R.D. Optimum Experimental Designs, with SAS; Oxford University Press: Oxford, UK, 2007.
2. Box, G.E.P.; Hunter, W.G.; Hunter, J.S. Statistics for Experimenters: Design, Innovation and Discovery, 2nd ed.; Wiley: Hoboken, NJ, USA, 2005.
3. Chaloner, K. Optimal Bayesian experimental design for linear models. Ann. Stat. 1984, 12, 283–300.
4. Forbes, A.B.; Minh, H.D. Design of linear calibration experiments. Measurement 2013, 46, 3730–3736.
5. Goos, P. The Optimal Design of Blocked and Split-Plot Experiments; Springer: New York, NY, USA, 2002.
6. Goos, P.; Jones, B. Optimal Design of Experiments: A Case Study Approach; John Wiley & Sons: New York, NY, USA, 2011.
7. Jones, B.; Nachtsheim, C.J. Effective Model Selection for Definitive Screening Designs. Technometrics 2017, 59, 319–329.
8. Montgomery, D.C. Design and Analysis of Experiments, 8th ed.; John Wiley & Sons: New York, NY, USA, 2013.
9. Chretien, S.; Clarkson, P. A fast algorithm for the semi-definite relaxation of the state estimation problem in power grids. J. Ind. Manag. Optim. 2020, 16, 431–443.
10. Kekatos, V.; Giannakis, G.B. A convex relaxation approach to optimal placement of phasor measurement units. In Proceedings of the 2011 4th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), San Juan, PR, USA, 13–16 December 2011; pp. 145–148.
11. Berger, J.; Busser, T.; Dutykh, D.; Mendes, N. An efficient method to estimate sorption isotherm curve coefficients. Inverse Probl. Sci. Eng. 2019, 27, 735–772.
12. Berger, J.; Dutykh, D.; Mendes, N. On the optimal experiment design for heat and moisture parameter estimation. Exp. Therm. Fluid Sci. 2017, 81, 109–122.
13. Chernoff, H. Sequential Analysis and Optimal Design; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1972.
14. Meyer, R.K.; Nachtsheim, C.J. The Coordinate Exchange Algorithm for Constructing Exact Optimal Designs. Technometrics 1995, 37, 60–69.
15. Wald, A. Sequential Tests of Statistical Hypotheses. Ann. Math. Stat. 1945, 16, 117–186.
16. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
17. Gill, P.E.; Murray, W.; Wright, M.H. Practical Optimization; Academic Press: London, UK, 1981.
18. Vandenberghe, L.; Boyd, S. Semidefinite Programming. SIAM Rev. 1996, 38, 49–95.
19. Barker, R.M.; Cox, M.G.; Forbes, A.B.; Harris, P.M. Software Support for Metrology Best Practice Guide No. 4: Modelling Discrete Data and Experimental Data Analysis; Technical Report DEM-ES 018; National Physical Laboratory: Teddington, UK, 2007.
20. BIPM. The International System of Units (SI Brochure (EN)), 9th ed.; BIPM: Sèvres, France, 2019.
21. BIPM; IEC; IFCC; ILAC; ISO; IUPAC; IUPAP; OIML. Evaluation of Measurement Data—Guide to the Expression of Uncertainty in Measurement; JCGM 100:2008; Joint Committee for Guides in Metrology: Sèvres, France, 2008.
22. BIPM; IEC; IFCC; ILAC; ISO; IUPAC; IUPAP; OIML. Evaluation of Measurement Data—Supplement 2 to the “Guide to the Expression of Uncertainty in Measurement”—Extension to Any Number of Output Quantities; JCGM 102:2011; Joint Committee for Guides in Metrology: Sèvres, France, 2011.
23. Golub, G.; Van Loan, C. Matrix Computations, 4th ed.; Johns Hopkins University Press: Baltimore, MD, USA, 2013.
24. Hart, G.W. Multidimensional Analysis: Algebras and Systems for Science and Engineering; Springer: Berlin, Germany, 1995.
25. Gu, M.; Eisenstat, S.C. Efficient Algorithms for Computing a Strong Rank-Revealing QR Factorization. SIAM J. Sci. Comput. 1996, 17, 848–869.
26. Sherman, J.; Morrison, W.J. Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix. Ann. Math. Stat. 1950, 21, 124–127.
27. Wilkinson, J.H. The Algebraic Eigenvalue Problem; Oxford University Press, Inc.: New York, NY, USA, 1988.
28. Preston-Thomas, H. The International Temperature Scale of 1990 (ITS-90). Metrologia 1990, 27, 3–10.
29. Bartlett, G.; Forbes, A.; Heaps, E.; Raby, A.C.; Yacoot, A. Spatial positioning correction for multi-axis nanopositioning stages. In Proceedings of the ASPE Convention and Expo, Indianapolis, IN, USA, 16–21 September 2022.
30. Pukelsheim, F. Optimal Design of Experiments; SIAM: Philadelphia, PA, USA, 2006; reproduction of the 1993 book published by John Wiley and Sons, New York.
31. Handscomb, D.C.; Mason, J.C. Chebyshev Polynomials; Chapman & Hall/CRC Press: London, UK, 2003.
32. Forbes, A.B.; Jagan, K.; Dunlevy, J.; Sousa, J.A. Optimization of sensor distribution using Gaussian processes. Meas. Sens. 2021, 18, 100128.
33. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006.
34. Linkeová, I.; Zelený, V. Application of ruled surfaces in freeform and gear metrology. Acta Polytech. 2021, 61, 99–109.
35. Zelený, V.; Linkeová, I.; Skalnik, P. Calibration of freeform standard. In Proceedings of the 15th International Conference of the European Society for Precision Engineering and Nanotechnology, EUSPEN 2015, Leuven, Belgium, 1–5 June 2015; pp. 147–148.
36. Forbes, A.B. Parameter estimation based on least squares methods. In Data Modeling for Metrology and Testing in Measurement Science; Pavese, F., Forbes, A.B., Eds.; Birkhäuser-Boston: New York, NY, USA, 2009; pp. 147–176.
37. Forbes, A.B. Sensitivity analysis for Gaussian associated features. Appl. Sci. 2022, 12, 2808.
38. Grabe, M. Note on the Application of the Method of Least Squares. Metrologia 1978, 14, 143.
39. Hotelling, H. Some Improvements in Weighing and Other Experimental Techniques. Ann. Math. Stat. 1944, 15, 297–306.
40. Nielsen, L. Evaluation of mass measurements in accordance with the GUM. Metrologia 2014, 51, S183.
41. Lewis, A.J.; Hughes, E.B.; Aldred, P.J.E. Long term study of gauge block interferometer performance and gauge blocks stability. Metrologia 2010, 47, 473–486.
42. Ding, J.; Zhou, A. Eigenvalues of rank-one updated matrices with some applications. Appl. Math. Lett. 2007, 20, 1223–1226.
43. Hogben, L. (Ed.) Handbook of Linear Algebra; Chapman & Hall/CRC: Boca Raton, FL, USA, 2007.
44. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions; Dover: New York, NY, USA, 1964.
45. Fletcher, R. Practical Methods of Optimization, 2nd ed.; John Wiley and Sons: Chichester, UK, 1987.
Figure 1. Polynomial calibration points: values of t_q, q = 1, …, 100, and the analytical approximation t̃(q) given by (12), determined by the sequential D-optimal Algorithm 4, for the cases n = 4 (upper curve) and n = 11 (lower curve).
Figure 2. As Figure 1, but for the case where repeated measurements are permitted.
Figure 4. Tensor product polynomial calibration points: values of t_q, q = 1, …, 100, and the analytical approximation t̃(q) given by (12), determined by the sequential D-optimal Algorithm 4, for the fine grid, with no repeat measurements (upper curve) and with repeat measurements (lower curve).
Figure 5. Tensor product polynomial calibration points: values of t_q, q = 1, …, 100, and the analytical approximation t̃(q) given by (12), determined by the sequential D-optimal Algorithm 4, for the coarse grid, with no repeat measurements (upper curve) and with repeat measurements (lower curve).
Figure 6. The CMI hyperbolic paraboloid reference artefact.
Figure 7. Hyperbolic paraboloid calibration points: ‘expert’ guess at a good set of calibration points.
Figure 8. Hyperbolic paraboloid calibration points: second ‘expert’ guess at a good set of calibration points, a modification of X 1 , Figure 7.
Figure 9. Hyperbolic paraboloid calibration points: D-optimal points determined by the SSQR-GE algorithm.
Figure 10. Hyperbolic paraboloid calibration points: values of t_q, q = 1, …, 100, and the analytical approximation t̃(q) given by (12), determined by the sequential D-optimal Algorithm 4, with no repeat measurements (upper curve) and with repeat measurements (lower curve).
Table 1. D-optimal calibration points for polynomial calibration curves of order n = 4, …, 11 defined in the interval [−1, 1]. For each set of four columns, the first column, labelled x_0, gives the arcsine estimates of the calibration points in (14); the second, labelled x_QR, gives the calibration points determined by the SSQR algorithm; the third, labelled x_GE, gives the estimates determined by the combined SSQR-GE algorithm; and the fourth, labelled x_*, gives the optimal calibration points derived from (13).
   x_0      x_QR     x_GE     x_*     |    x_0      x_QR     x_GE     x_*
                 n = 4                |                  n = 5
 −1.000   −1.000   −1.000   −1.000    |  −1.000   −1.000   −1.000   −1.000
 −0.500   −0.488   −0.447   −0.447    |  −0.707   −0.669   −0.655   −0.655
  0.500    0.437    0.447    0.447    |   0.000    0.006    0.000    0.000
  1.000    1.000    1.000    1.000    |   0.707    0.686    0.655    0.655
                                      |   1.000    1.000    1.000    0.000
                 n = 6                |                  n = 7
 −1.000   −1.000   −1.000   −1.000    |  −1.000   −1.000   −1.000   −1.000
 −0.809   −0.786   −0.765   −0.765    |  −0.866   −0.845   −0.830   −0.830
 −0.309   −0.286   −0.285   −0.285    |  −0.500   −0.484   −0.469   −0.469
  0.309    0.308    0.285    0.285    |   0.000    0.002    0.000    0.000
  0.809    0.779    0.765    0.765    |   0.500    0.493    0.469    0.469
  1.000    1.000    1.000    1.000    |   0.866    0.841    0.830    0.830
                                      |   1.000    1.000    1.000    0.000
                 n = 8                |                  n = 9
 −1.000   −1.000   −1.000   −1.000    |  −1.000   −1.000   −1.000   −1.000
 −0.901   −0.880   −0.872   −0.872    |  −0.924   −0.908   −0.900   −0.900
 −0.623   −0.613   −0.592   −0.592    |  −0.707   −0.692   −0.677   −0.677
 −0.223   −0.211   −0.209   −0.209    |  −0.383   −0.383   −0.363   −0.363
  0.223    0.225    0.210    0.209    |   0.000   −0.002    0.000    0.000
  0.623    0.608    0.592    0.592    |   0.383    0.376    0.363    0.363
  0.901    0.882    0.872    0.872    |   0.707    0.695    0.677    0.677
  1.000    1.000    1.000    1.000    |   0.924    0.906    0.900    0.900
                                      |   1.000    1.000    1.000    0.000
                 n = 10               |                  n = 11
 −1.000   −1.000   −1.000   −1.000    |  −1.000   −1.000   −1.000   −1.000
 −0.940   −0.925   −0.920   −0.920    |  −0.951   −0.938   −0.934   −0.934
 −0.766   −0.753   −0.739   −0.739    |  −0.809   −0.796   −0.784   −0.784
 −0.500   −0.493   −0.478   −0.478    |  −0.588   −0.580   −0.565   −0.565
 −0.174   −0.177   −0.165   −0.165    |  −0.309   −0.311   −0.296   −0.296
  0.174    0.168    0.165    0.165    |   0.000   −0.001    0.000    0.000
  0.500    0.497    0.478    0.478    |   0.309    0.307    0.296    0.296
  0.766    0.751    0.739    0.739    |   0.588    0.582    0.566    0.565
  0.940    0.925    0.920    0.920    |   0.809    0.795    0.785    0.784
  1.000    1.000    1.000    1.000    |   0.951    0.939    0.934    0.934
                                      |   1.000    1.000    1.000    0.000
Table 2. For n = 4, 5, …, 11, rows two to five give the computed geometric mean measure d̄(V_a) given in (4) for evenly spaced calibration points, the arcsine solutions, the SSQR solutions, and the SSQR-GE solutions given in Table 1. The final row gives the number of GE exchanges undertaken to improve the SSQR solutions.
 n        4        5        6        7        8        9        10       11
 d̄_0      0.4871   0.4152   0.3748   0.3511   0.3379   0.3316   0.3304   0.3332
 d̄_AS     0.4714   0.3789   0.3175   0.2734   0.2403   0.2143   0.1935   0.1763
 d̄_QR     0.4682   0.3746   0.3130   0.2691   0.2362   0.2107   0.1901   0.1733
 d̄_GE     0.4673   0.3735   0.3119   0.2682   0.2354   0.2099   0.1894   0.1726
 n_GE     3        6        7        9        12       17       19       21
Table 3. Standard uncertainty factors u ( a j ) / σ associated with the hyperbolic paraboloid parameters a = ( α , β , b ) for three eight-point datasets X k , k = 1 , 2 , 3 .
 u(a_j)/σ      α        β        b_1      b_2      b_3      b_4      b_5       b_6
 X_1         2.3867   2.3867   0.0049   0.0049   0.0139   2.6219   2.6219   10.5328
 X_2         0.7259   0.7259   0.0042   0.0042   0.0035   0.4999   0.4999    2.8188
 X_3         0.2049   0.2413   0.0012   0.0014   0.0014   0.2883   0.2520    1.4690
Table 4. Design of a calibration experiment given by ‘expert judgement’.
 1.00   0.50   0.50   0.20   0.20   1.0   1.0   0.05   0.05
  1      0      0      0      0      0     0     0      0
  1     −1     −1      0      0      0     0     0      0
  0      1     −1      0      0      0     0     0      0
  0      1      0     −1     −1     −1     0     0      0
  0      0      1     −1     −1      0    −1     0      0
  0      0      0      1     −1      0     0     0      0
  0      0      0      1      0      0     0    −1     −1
  0      0      0      0      0      1     0    −1     −1
  0      0      0      0      0      0     0     1     −1
Table 5. Designs of four calibration experiments calculated using the SSQR-GE algorithm with uncertainties σ i calculated according to (16) for the different values of σ R , σ N , and σ V given in Table 6.
 1.00   0.50   0.50   0.20   0.20   1.0   1.0   0.05   0.05
  1      0      0      0      0      0     0     0      0
  1      0     −1     −1     −1     −1    −1     1      1
  1     −1     −1      1     −1      1    −1     1     −1
  1     −1     −1      1     −1     −1     1    −1      1
  1     −1     −1     −1      1      1    −1    −1      1
  1     −1     −1     −1      1     −1     1     1     −1
  0      1      0     −1     −1      0     0    −1     −1
  0      1     −1      0      1     −1     0    −1     −1
  0      0      1     −1      0     −1    −1    −1     −1

  1      0      0      0      0      0     0     0      0
  1     −1     −1      0      0      0     0     0      0
  0      1      0     −1     −1     −1    −1     1      1
  0      1     −1      1     −1      1    −1    −1      1
  0      1     −1      0      1     −1     0    −1     −1
  0      1     −1      0     −1      1     1     1     −1
  0      0      1     −1     −1      0     0    −1     −1
  0      0      0      1      0     −1    −1     1     −1
  0      0      0      1     −1     −1     1    −1      1

  1      0      0      0      0      0     0     0      0
  1     −1     −1      0      0      0     0     0      0
  0      1      0     −1     −1      0    −1     0      0
  0      1     −1      0      0      0     0     0      0
  0      0      0      1      0     −1    −1     0      0
  0      0      0      1     −1      0     0     0      0
  0      0      0      0      0      1     0    −1     −1
  0      0      0      0      0      1    −1     0      0
  0      0      0      0      0      0     0     1     −1

  1      0      0      0      0      0     0     0      0
  1     −1     −1      0      0      0     0     0      0
  0      1      0     −1     −1      0    −1     0      0
  0      1     −1      0      0      0     0     0      0
  0      0      0      1      0     −1    −1     0      0
  0      0      0      1     −1      0     0     0      0
  0      0      0      0      0      1     0    −1     −1
  0      0      0      0      0      1    −1     0      0
  0      0      0      0      0      0     0     1     −1
Table 6. Uncertainties associated with a_i for four different assignments of σ_i, i = 2, …, n = 9. The columns labelled E give the uncertainties associated with the ‘expert judgement’ design given in Table 4, while the columns labelled O give those associated with the optimal designs given in Table 5, calculated by the SSQR-GE algorithm. Also shown in the table are the aggregate measure d̄ = |V_a|^{1/n} of uncertainty, the total number Σ_i n_i of artefacts, and a measure Σ_i v_i of the total nominal values involved in each set of experiments. The last three rows give the values of σ_R, σ_N, and σ_V used to calculate σ_i, i = 2, …, n.
 a_i        E       O1       E       O2       E       O3       E       O4
 1.00      1.00    1.00     1.00    1.00     1.00    1.00     1.00    1.00
 0.50      0.61    0.56     0.66    0.64     0.69    0.69     1.04    1.04
 0.50      0.61    0.55     0.66    0.64     0.69    0.69     1.04    1.04
 0.20      0.39    0.31     0.43    0.40     0.60    0.58     0.50    0.57
 0.20      0.49    0.29     0.52    0.39     0.61    0.58     0.54    0.60
 0.10      0.57    0.26     0.61    0.32     0.90    0.44     0.57    0.36
 0.10      0.91    0.27     1.03    0.36     1.64    0.45     1.34    0.34
 0.05      0.35    0.20     0.36    0.28     0.40    0.48     0.29    0.27
 0.05      0.35    0.20     0.36    0.28     0.40    0.48     0.29    0.27
 d̄         0.17    0.06     0.21    0.12     0.21    0.13     0.21    0.15
 Σ_i v_i   7.0    17.4      7.0    11.0      7.0     6.3      7.0     6.3
 Σ_i n_i   24      62       24      48       24      22       24      22
 σ_R       0.50    0.50     0.50    0.50     0.20    0.20     0.20    0.20
 σ_N       0.00    0.00     0.20    0.20     0.80    0.80     0.20    0.20
 σ_V       0.00    0.00     0.20    0.20     0.20    0.20     0.80    0.80
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
