### *2.2. DoE Uncertainty Budgets*

The DoEs are influenced by factors in the participants' measurements, each factor giving rise to one component of uncertainty. Because Equations (1) and (2) combine results from all participants, the budget of each DoE in this scenario contains a large number of components (278). Listing 3 shows the DoE for Participant A with an abridged uncertainty budget, in which only the more significant components are shown, namely those with magnitudes greater than 10 % of the largest component. These factors can be identified as influences from A's own measurements and from the pilot's measurements of the same artifact.

**Listing 3.** The DoE for Participant A is shown at the top, with the combined standard uncertainty in parentheses. An abridged uncertainty budget follows. Only the components with a magnitude greater than trim times the largest component are shown. The components are listed in decreasing order of magnitude.

```
D_kc[A] = 0.000249(0.000278)
Uncertainty budget (trim=0.1):
        A:Beam Size & Position (sys): 0.00018602
     A(2):Beam Size & Position (ran): 0.00013449
                   A(4):Type-A (ran): 0.00010139
                   A(2):Type-A (ran): 0.00005375
     A(4):Beam Size & Position (ran): 0.00005153
              A(2):Instability (ran): 0.00004686
               A:Non-linearity (sys): 0.00002859
    Q(A):Non-Parallel Surfaces (sys): 0.00002217
    Q_A(1):Drift & Instability (ran): 0.00002101
    Q_A(3):Drift & Instability (ran): 0.00002101
    Q_A(5):Drift & Instability (ran): 0.00002101
```
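The abridging rule used in these listings can be sketched in a few lines of Python. This is a hypothetical helper written for illustration, not the code used to generate the listings: components whose magnitude exceeds `trim` times the largest component are kept and reported in decreasing order.

```python
def abridge_budget(components, trim=0.1):
    """Keep components with magnitude > trim * largest, sorted descending.

    components: dict mapping an influence-factor label to the magnitude
    of its uncertainty component (all labels and values are illustrative).
    """
    if not components:
        return []
    largest = max(abs(u) for u in components.values())
    kept = [(label, u) for label, u in components.items()
            if abs(u) > trim * largest]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical components, loosely modeled on Listing 3
budget = {
    "A:Beam Size & Position (sys)": 0.00018602,
    "A(2):Type-A (ran)": 0.00005375,
    "A:Bandwidth (sys)": 0.00000900,   # below the 10 % threshold
}
for label, u in abridge_budget(budget, trim=0.1):
    print(f"{label:>35}: {u:.8f}")
```

With `trim=0.1`, the third component falls below 10 % of the largest and is dropped from the report.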
The structure of the uncertainty budgets varies considerably among the participants. Listing 4 shows the DoEs obtained for Participants B and C. Participant B's result has a much larger combined standard uncertainty than Participant C's, and its uncertainty budget is dominated by factors associated with B's own measurements. In contrast, the DoE for Participant C has the lowest uncertainty of all participants, and the corresponding uncertainty budget has many more significant influence factors. The largest of these are from C's own measurements and the corresponding pilot measurements. However, we also see components associated with measurements by Participants A, E, F, H, I, J, and K. These participants were weighted more heavily than B, D, and G during the DoE calculation (see Appendix A).

**Listing 4.** The DoEs for Participants B and C. See the caption to Listing 3 for further details.

```
D_kc[B] = 0.000305(0.001225)
Uncertainty budget (trim=0.1):
                B:Inter-reflection (sys): 0.00109561
   B(2):Source Drift & Fluctuation (ran): 0.00036611
   B(4):Source Drift & Fluctuation (ran): 0.00031325
            B:Beam Size & Position (sys): 0.00018227
                       B:Bandwidth (sys): 0.00015936
D_kc[C] = 0.000008(0.000097)
Uncertainty budget (trim=0.1):
                 C:Stray light (sys): 0.00004288
                      C(2):SFK (ran): 0.00002973
               C:Non-linearity (sys): 0.00002751
                   C(4):Type-A (ran): 0.00002727
             C:Inter-reflexion (sys): 0.00002185
    Q(C):Non-Parallel Surfaces (sys): 0.00002099
    Q_C(1):Drift & Instability (ran): 0.00001784
    Q_C(3):Drift & Instability (ran): 0.00001784
    Q_C(5):Drift & Instability (ran): 0.00001784
                 H:Stray Light (sys): 0.00001754
            H:Inter-reflection (sys): 0.00001754
            J:Inter-reflection (sys): 0.00001523
                   I(2):Type-A (ran): 0.00001441
        I:Inter-reflection (sys): 0.00001426
        J:Prismatic effect (sys): 0.00001362
             Q_C(1):Type-A (ran): 0.00001133
             Q_C(3):Type-A (ran): 0.00001133
             Q_C(5):Type-A (ran): 0.00001133
           F:Non-linearity (sys): 0.00001065
    A:Beam Size & Position (sys): 0.00000914
                  C(4):SFK (ran): 0.00000863
        E:Inter-reflection (sys): 0.00000767
            H:Polarization (sys): 0.00000760
E:Detector reproducibility (sys): 0.00000738
               C(2):Type-A (ran): 0.00000690
        F:Inter-reflection (sys): 0.00000666
            F:Polarization (sys): 0.00000666
   H(4):Filter instability (ran): 0.00000666
            K:Polarization (sys): 0.00000662
 A(2):Beam Size & Position (ran): 0.00000661
   H(2):Filter instability (ran): 0.00000634
Q_I(1):Drift & Instability (ran): 0.00000631
Q_I(3):Drift & Instability (ran): 0.00000631
     F(2):Filter Stability (ran): 0.00000614
    K:Beam Size & Position (sys): 0.00000597
H(4):System reproducibility (ran): 0.00000586
     F(4):Filter Stability (ran): 0.00000585
 D(2):Beam Size & Position (ran): 0.00000568
H(2):System reproducibility (ran): 0.00000559
               A(4):Type-A (ran): 0.00000498
               F(4):Type-A (ran): 0.00000455
               D(2):Type-A (ran): 0.00000443
        B:Inter-reflection (sys): 0.00000439
```
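The heavier weighting of some participants noted above is defined in Appendix A. A common choice in comparison analysis, assumed here purely for illustration, is inverse-variance weighting, in which participants with smaller standard uncertainties receive larger weights:

```python
def inverse_variance_weights(uncertainties):
    """Weights proportional to 1/u**2, normalized to sum to 1.

    This is a generic illustration; the actual weighting used in the
    comparison is defined in Appendix A.
    """
    w = {lab: 1.0 / u**2 for lab, u in uncertainties.items()}
    total = sum(w.values())
    return {lab: wi / total for lab, wi in w.items()}

# Hypothetical standard uncertainties for three participants
u_labs = {"B": 0.0012, "C": 0.0001, "D": 0.0006}
weights = inverse_variance_weights(u_labs)

# C, having the smallest uncertainty, receives the largest weight
assert max(weights, key=weights.get) == "C"
```

Under such a scheme, a participant like C would contribute strongly to the reference value, which is consistent with C's factors appearing in the budgets of other participants' DoEs.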
The detail about individual influence factors shown in the listings above is more than the minimum required to analyze and link comparisons. Only the net systematic and random components are needed for that purpose. This is what is used at present, and the reduction in complexity makes the analysis tractable without digitalization. However, the physical origins of influence factors are obscured. For example, Listing 5 shows the budgets of Participants B and C in terms of systematic and random components. Compared to the information shown in Listing 4, this offers little insight into the origins beyond participant and stage.

**Listing 5.** The DoEs for Participants B and C showing the total systematic and random effects as components of uncertainty. These budgets are equivalent to those in Listing 4; however, only the net random and systematic contributions at each stage are shown.

```
D_kc[B] = 0.000305(0.001225)
Uncertainty budget (trim=0.1):
         B (sys): 0.00061183
      B(2) (ran): 0.00036717
      B(4) (ran): 0.00031562
D_kc[C] = 0.000008(0.000097)
Uncertainty budget (trim=0.1):
      C(2) (ran): 0.00003052
      C(4) (ran): 0.00002860
    Q_C(1) (ran): 0.00002142
    Q_C(3) (ran): 0.00002142
```
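The reduction from a detailed budget to net per-stage components can be sketched as a root-sum-square over all components sharing the same stage and type, assuming the factors within each group are independent. The labels and values below are hypothetical:

```python
import math
from collections import defaultdict

def net_components(detailed):
    """Combine components by (stage, type) label via root-sum-square.

    detailed: iterable of (stage, kind, magnitude) tuples, where kind is
    'sys' or 'ran'. Assumes independence of the factors within a group.
    """
    groups = defaultdict(float)
    for stage, kind, u in detailed:
        groups[(stage, kind)] += u**2
    return {key: math.sqrt(s) for key, s in groups.items()}

# Hypothetical detailed components for one stage of B's measurement
detailed = [
    ("B(2)", "ran", 0.00036611),  # Source Drift & Fluctuation
    ("B(2)", "ran", 0.00000882),  # a smaller factor, below the trim level
]
net = net_components(detailed)
```

The net component for each stage is slightly larger than its biggest contributor, as in Listing 5, but the identity of the underlying factors is lost.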


## **3. The RMO Key Comparison**

Seven NMIs (identified by the letters T, U, V, . . . , Z) and a pilot laboratory (P) participated in the subsequent RMO key comparison. The pilot and Participant Z had both taken part in the initial CIPM comparison, so their results were used to link the two comparisons. Each participant measured a different artifact, and the pilot measured all seven artifacts. The comparison was carried out in three stages: first, the pilot measured the artifacts; then each participant reported a measurement; finally, the pilot measured all the artifacts again.

### *3.1. Evaluating DoEs*

The possibility of slight shifts in the scales of the linking participants since the initial CIPM comparison must be accounted for when linking. Therefore, linking participants provide information on the stability of their scales as part of their report during the RMO comparison. Formally, in the analysis, a quantity that includes a term $E\_{D \cdot l}$ representing scale movement is used for the DoE of each linking participant:

$$D'\_l = D^\*\_l + E\_{D \cdot l} \,. \tag{3}$$

$E\_{D \cdot l}$ can be thought of as a residual error in the scale that contributes to uncertainty in the DoE. To provide a link to the RMO comparison, we then evaluate ([6], Equation (46)):

$$\begin{split} D\_{\rm P} &= -\sum\_{l} \nu\_{l} \left( \langle \overline{Y\_{l}} - \overline{Y\_{\rm P}} \rangle\_{A\_{l}} - D\_{l}^{\prime} \right) \\ &= -\nu\_{\rm Z} \left( \langle \overline{Y\_{\rm Z}} - \overline{Y\_{\rm P}} \rangle\_{A\_{\rm Z}} - D\_{\rm Z}^{\prime} \right) + \nu\_{\rm P} D\_{\rm P}^{\prime} \end{split} \tag{4}$$

where $\nu\_l$ are the weight factors for the linking participants (see Appendix B). Finally, the DoEs of non-linking participants are ([6], Equation (45)):

$$D\_i = \left\langle \overline{Y\_i} - \overline{Y\_{\rm P}} \right\rangle\_{A\_i} + D\_{\rm P} \;. \tag{5}$$

Our data processing uses a Python dictionary to hold the uncertain numbers for each DoE evaluated according to Equation (5):

```
D_rc = dict()
for l_i in lab_IDs[:-1]:
    r_i = rc_results[l_i]
    # Equation (5): the DoE for each non-linking participant
    D_rc[l_i] = mean(r_i.lab) - mean(r_i.pilot) + D_P
```
A link to the initial comparison is obtained, following Equation (4), from:

```
D_P = -( nu_Z*M_Z + nu_P*M_P )
```
where nu\_Z and nu\_P correspond to $\nu\_{\rm Z}$ and $\nu\_{\rm P}$, respectively, given by Equations (A3) and (A4):

```
M_Z = mean(rc_results['Z'].lab) - mean(rc_results['Z'].pilot) - rc_link_doe['Z']
M_P = -rc_link_doe['P']
```
where rc\_results['Z'].lab is the sequence of measurements submitted by Participant Z and rc\_results['Z'].pilot is the corresponding sequence of pilot measurements. Following Equation (3), the uncertain numbers rc\_link\_doe['Z'] and rc\_link\_doe['P'] were calculated by adding an uncertain number for the participant's scale stability to the participant's DoE obtained in the CIPM comparison (see the dataset for further details [14]).
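As a sanity check on this linking arithmetic, Equations (3)-(5) can be reproduced with plain floats. All values and variable names below are hypothetical, and the correlations that uncertain numbers would propagate are ignored:

```python
# Hypothetical values (plain floats; an uncertain-number type would
# propagate uncertainties and correlations through the same arithmetic).
mean_Y_Z, mean_Y_P_Z = 0.10120, 0.10050  # Z's results; pilot results on Z's artifact
D_prime_Z = 0.00045  # Z's CIPM DoE plus a scale-stability term, Eq. (3)
D_prime_P = 0.00012  # likewise for the pilot
nu_Z, nu_P = 0.6, 0.4  # hypothetical linking weight factors

# Equation (4)
M_Z = (mean_Y_Z - mean_Y_P_Z) - D_prime_Z
M_P = -D_prime_P
D_P = -(nu_Z * M_Z + nu_P * M_P)

# Equation (5) for a hypothetical non-linking participant T
mean_Y_T, mean_Y_P_T = 0.20310, 0.20190
D_T = (mean_Y_T - mean_Y_P_T) + D_P
```

The same expressions applied to uncertain numbers yield the DoEs and uncertainty budgets reported below.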

The resulting DoEs, with standard uncertainties in parentheses, are:

```
DoE[T] = 0.00136 (0.00203)
DoE[U] = -0.00137 (0.00076)
DoE[V] = 0.00182 (0.00094)
DoE[W] = -0.00138 (0.00095)
DoE[X] = 0.00032 (0.00042)
DoE[Y] = 0.00297 (0.00314)
```
### *3.2. DoE Uncertainty Budgets*

In the linked RMO comparison, the DoEs are each influenced by 302 factors (these factors were identified by participants when submitting their results and, as explained above, the influences from all participants contribute to the uncertainty). Again, there is diversity in the uncertainty budgets of different participants. For example, the uncertainty budget in Listing 6 shows that the most important components of uncertainty for Participant Y, the participant with the largest DoE uncertainty, are all related to Y's own measurements.

**Listing 6.** The DoE for participant Y with an abridged uncertainty budget.

```
D[Y] = 0.002972(0.003139)
Uncertainty budget (trim=0.1):
                   Y(2):Scale bias (ran): 0.002910
            Y:Beam Size & Position (sys): 0.000691
                   Y:Non-linearity (sys): 0.000600
   Y(2):Source Drift & Fluctuation (ran): 0.000536
                       Y(2):Type-A (ran): 0.000346
```
In contrast, Listing 7 shows the budget for X, the participant with the smallest DoE uncertainty, which is influenced most by measurements performed by others: the pilot's measurements of the artifacts used by X and by the other linking participant, Z. This budget also includes important components from some factors in the initial comparison.

**Listing 7.** The DoE for Participant X with an abridged uncertainty budget. Note that Participant Z in the RMO comparison was I in the initial comparison. The component of uncertainty labeled Z-I:Scale Instability accounts for the stability of Z's measurement scale.

```
D[X] = 0.000319(0.000420)
Uncertainty budget (trim=0.1):
           P_X(3):Inter-reflection (ran): 0.000155
           P_X(1):Inter-reflection (ran): 0.000153
           P_Z(3):Inter-reflection (ran): 0.000144
           P_Z(1):Inter-reflection (ran): 0.000142
                       X(2):Type-A (ran): 0.000132
                     X:Stray Light (sys): 0.000130
                       Z(2):Type-A (ran): 0.000127
             Z-I:Scale Instability (sys): 0.000106
            X:Beam Size & Position (sys): 0.000076
             P-E:Scale Instability (sys): 0.000074
   X(2):Source Drift & Fluctuation (ran): 0.000058
                       I(2):Type-A (ran): 0.000056
                     P_X(1):Type-A (ran): 0.000033
                     P_Z(1):Type-A (ran): 0.000030
                   X:Non-linearity (sys): 0.000025
        Q_I(1):Drift & Instability (ran): 0.000024
        Q_I(3):Drift & Instability (ran): 0.000024
                       X:Obliquity (sys): 0.000023
                    X(2):Obliquity (ran): 0.000023
                     H:Stray Light (sys): 0.000018
                H:Inter-reflection (sys): 0.000018
                     Q_I(1):Type-A (ran): 0.000016
                     Q_I(3):Type-A (ran): 0.000016
```
## **4. Discussion**

This study examined the use of uncertain numbers as a digital format for reporting measurement data. The context of the study is a specialized area, but the underlying concern is a more general problem: the presence of fixed (systematic) influence factors at different stages of a traceability chain gives rise to correlations in data that affect the uncertainty at the end of the chain but are difficult to account for. Uncertain numbers address this issue and support a simple and intuitive form of data processing. The method is fully compliant with the recommendations in the GUM [7].

Comparison analysis can be a rather laborious and error-prone task at present, because there is a large amount of data to be manipulated. The task could be greatly simplified if something such as the uncertain-number format were adopted. Digital records could then include information about any common factors that lead to correlations. Algorithms would use that information to streamline the data processing and produce more informative results. This is an interesting possibility in the context of the CIPM MRA, because comparison results are used to support NMI claims of competency (CMCs), which are a matter of considerable importance. Greater transparency in the composition of uncertainty budgets for degrees of equivalence would surely be welcome. For example, the largest components of uncertainty in Listing 7 are associated with influence factors for the pilot measurements, not those of Participant X. This shows that the weight of evidence provided by a DoE and its uncertainty to support a CMC claim may be limited by the performance of the pilot and/or linking participants in the comparisons.

Situations where common factors may give rise to correlations in measurement data are not infrequent, but conventional calibration certificate formats do not allow an accurate evaluation of uncertainty in such cases (as is possible in comparison analysis and comparison linking) [9,10]. Our study therefore draws attention to an informal decision, made decades ago, not to report information about influence factors, that is, the uncertainty budget. This decision was surely made for pragmatic reasons: additional effort would have been required to curate uncertainty-budget data in paper-based systems. However, the policy should be reviewed now as part of the digital transformation process. The currently favored DCC formats report only uncertainty intervals or expanded uncertainties [17]. If these formats are ultimately adopted, DCCs will not contain enough information about common upstream influences to handle correlations in downstream data: the scenario envisaged in this article would not be realized.
