*2.1. Evaluating DoEs*

The calculation of DoEs can be expressed succinctly in the notation adopted in [6]. For the pilot, identified by the letter "Q" (with a superscript "∗" indicating a CIPM comparison), the DoE, $D^*_{\mathrm{Q}}$, is a weighted sum over all participants ([6], Equation (18)). The weighting factors $w_j$ are explained in Appendix A, and the notation $\langle \cdot \rangle_{A_j}$ denotes the mean of measurements of the artifact associated with participant *j*:

$$D_{\mathrm{Q}}^{*} = -\sum_{j} w_j \left\langle \overline{Y_j^{*}} - \overline{Y_{\mathrm{Q}}^{*}} \right\rangle_{A_j} \,. \tag{1}$$

For any other participant *i*, the DoE is ([6], Equation (19)):

$$D_i^{*} = \left\langle \overline{Y_i^{*}} - \overline{Y_{\mathrm{Q}}^{*}} \right\rangle_{A_i} + D_{\mathrm{Q}}^{*} \,. \tag{2}$$

The bar in these expressions indicates the simple weighted mean of a series of measurements for one artifact ($Y_j^{*}[1], Y_j^{*}[2], \cdots$), obtained at different stages:

$$\overline{Y_j^{*}} = \sum_{n=1}^{N} \eta_n \, Y_j^{*}[n] \,,$$

where:

$$\eta_n = \frac{\left(u(y_j^{*}[n])\right)^{-2}}{\sum_{k=1}^{N} \left(u(y_j^{*}[k])\right)^{-2}}$$

and $u(y_j^{*}[n])$ is the standard uncertainty in the value of the *n*th result, $y_j^{*}[n]$, from participant *j*. There is only one artifact per participant in this scenario, so $\langle \overline{Y_j^{*}} - \overline{Y_{\mathrm{Q}}^{*}} \rangle_{A_j}$ indicates the difference between the mean of participant *j*'s measurements and the mean of the pilot's measurements of the same artifact.
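The inverse-variance weighting defined above can be illustrated with a minimal plain-Python sketch. It works on ordinary floats rather than uncertain numbers, so it reproduces only the value of the weighted mean, not its uncertainty. The function names and the numerical data are hypothetical and are not taken from [6] or from GTC.

```python
def inverse_variance_weights(uncertainties):
    """Return the weights eta_n = u_n**-2 / sum_k u_k**-2."""
    inv_var = [u ** -2 for u in uncertainties]
    total = sum(inv_var)
    return [v / total for v in inv_var]


def weighted_mean(values, uncertainties):
    """Return the weighted mean of `values` using inverse-variance weights."""
    eta = inverse_variance_weights(uncertainties)
    return sum(e * y for e, y in zip(eta, values))


# Hypothetical results from one participant at two stages
# (illustrative numbers only, not comparison data).
y = [10.0, 10.4]   # measured values y_j[n]
u = [0.1, 0.2]     # standard uncertainties u(y_j[n])

eta = inverse_variance_weights(u)
y_bar = weighted_mean(y, u)
```

With these illustrative inputs, the result with the smaller standard uncertainty receives the larger weight, and the weights sum to one by construction.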

Equation (1) was implemented in the GTC software, as shown below. This code obtains an uncertain number representing $D^*_{\mathrm{Q}}$:

```
# DoE of the pilot, Equation (1)
d_Q = -sum(
    w[l_j] * ( mean(r_j.lab) - mean(r_j.pilot) )
        for l_j,r_j in kc_results.items()
)
```
The function `mean()` evaluates the mean of a sequence of uncertain numbers; `r_j.lab` and `r_j.pilot` contain, respectively, a sequence of results from participant *j* and the corresponding sequence of pilot measurements of the same artifact; `kc_results` is a container holding one object like `r_j` for each participant, keyed by the participant label `l_j`; and `w[l_j]` is the weighting factor for that participant. Following Equation (2), a DoE is evaluated for each of the other comparison participants:

`d_i = mean(r_i.lab) - mean(r_i.pilot) + d_Q`
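To make the data flow through Equations (1) and (2) concrete, the following self-contained sketch mimics the structure of the snippets above with plain floats. It rests on several assumptions: `statistics.fmean` (an unweighted arithmetic mean) stands in for the `mean()` function, the `Results` container and all numerical values are hypothetical, and the weights are placeholders rather than the factors defined in Appendix A. Because no uncertain numbers are involved, no uncertainties are propagated.

```python
from collections import namedtuple
from statistics import fmean

# Hypothetical container: one artifact per participant, measured by the
# participant (lab) and by the pilot (pilot).
Results = namedtuple("Results", ["lab", "pilot"])

kc_results = {
    "A": Results(lab=[1.2, 1.4], pilot=[1.0, 1.0]),
    "B": Results(lab=[0.9, 0.9], pilot=[1.0, 1.0]),
}

# Placeholder weighting factors for the non-pilot participants
# (the actual factors are defined in Appendix A).
w = {"A": 0.25, "B": 0.25}

# Equation (1): DoE of the pilot.
d_Q = -sum(
    w[l_j] * (fmean(r_j.lab) - fmean(r_j.pilot))
    for l_j, r_j in kc_results.items()
)

# Equation (2): DoE of each of the other participants.
d = {
    l_i: fmean(r_i.lab) - fmean(r_i.pilot) + d_Q
    for l_i, r_i in kc_results.items()
}
```

In the GTC implementation the same expressions operate on uncertain numbers, so each DoE would also carry an associated standard uncertainty; this sketch returns values only.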

The results, with associated standard uncertainties, may be displayed as:

