1. Introduction
Nuclear safeguards involve measures designed to deter and detect the diversion of nuclear material from the fuel cycle for illicit purposes [1]. The monitoring for possible data falsification by operators that could mask diversion is based on the statistical analysis of paired (operator, inspector) verification measurements. Significant deviations between the operator declarations and the corresponding inspector measurements may be the consequence of problems with their measurement systems. However, they could also be the result of the operator falsifying the data. In this context, the fundamental elements for a trustworthy analysis are
(i) a proper measurement error model and
(ii) an efficient and unbiased uncertainty quantification (UQ), i.e., the estimation of the associated uncertainty components.
Concerning this second point, two approaches are available: the bottom-up and the empirical top-down approach [2]. The former consists in quantifying and then propagating the uncertainty in all key steps of the assay, in order to derive the resulting uncertainty of the measured values. The latter, instead, relies fundamentally on modeling the measurement data as a one-way random effect model with two variance components, one accounting for the variation within and the other for that between groups. A group is typically defined as an inspection period, within which the measurements are assumed to have the same random error. Between inspection periods, instead, the systematic error captures the changes that may occur in the metrological conditions, such as instrument calibration, change of inspectors, change of background radiation, and so on [3]. Depending on the scope of the analysis, the two variance components are estimated from past inspection data through the classical analysis of variance (ANOVA) [4] or the Grubbs estimator [5].
In empirical applications, the outcomes of the bottom-up and the empirical top-down approach can diverge significantly. Specifically, bottom-up estimates are often larger than their top-down counterparts [6]. The need to increase the consistency between these two results has been stressed more than once (see, for example, [7,8]) and has encouraged the study of potential improvements for both methods [9,10]. However, the proposed developments of top-down methods have never taken into consideration the possible presence of anomalous data in the estimation sample, even though, as previously mentioned, significant deviations between operator and inspector measurements may occur for multiple reasons.
The presence of outliers in the data may in fact seriously affect the reliability of the estimates. In random effect models, even a single anomalous observation may greatly influence the properties of the statistics [11]. The bias introduced by a single anomalous value will also affect the results of subsequent analyses based on the between and within group standard deviations, such as the setting of alarm thresholds for measurements (see, for example, [12]) or the material balance evaluation [1]. Consider, for example, the simulated case study in [12] (see Figure 1a). The 30 observations, equally split into 3 groups, are simulated using a random effect model with between and within group standard deviations both equal to 0.010. They mimic 30 possible relative differences between the measurements provided by the operator and the inspector in three different inspection periods. Classical analysis-of-variance expressions provide respective estimates of 0.009 and 0.007. Suppose now that one single value is changed and converted into an outlier, as done with the observation highlighted in red in Figure 1b. This small change yields a remarkable decrease of the sample average of the 10 observations of group 1, and the classical ANOVA estimates of the between and within group standard deviations become 0.003 and 0.012, respectively. It is important to stress that the 30 simulated observations cannot be considered a time series, even if their plots may suggest it. This is because they mimic a use case in the nuclear safeguards context, where the time lag between two single inspections, and even between two groups of inspections, is neither fixed nor constant. Therefore, methods from the time series literature cannot be employed in this context.
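To make the mechanics of this example concrete, the following minimal Python sketch simulates a balanced sample with the same design (3 groups of 10 observations, both standard deviations equal to 0.010), computes the classical balanced ANOVA estimates, and then recomputes them after converting a single observation into an outlier. The seed, the outlier size and all names are arbitrary choices of ours, so the resulting numbers will not coincide with those reported above; the sketch only illustrates how one anomalous value propagates into both variance component estimates.

```python
import numpy as np

def anova_balanced(y):
    """Classical ANOVA estimates (sigma_S^2, sigma_R^2) for a balanced
    one-way random effect model; y has shape (g, n)."""
    g, n = y.shape
    group_means = y.mean(axis=1)
    msw = ((y - group_means[:, None]) ** 2).sum() / (g * (n - 1))  # within-group mean square
    msb = n * ((group_means - y.mean()) ** 2).sum() / (g - 1)      # between-group mean square
    return max((msb - msw) / n, 0.0), msw                          # (sigma_S^2, sigma_R^2)

rng = np.random.default_rng(1)                     # arbitrary seed, illustrative only
g, n, sigma_S, sigma_R = 3, 10, 0.010, 0.010
y = sigma_S * rng.standard_normal((g, 1)) + sigma_R * rng.standard_normal((g, n))

print("clean sample     :", anova_balanced(y))
y_out = y.copy()
y_out[0, 0] += 0.05                                # turn one observation into an outlier (assumed size)
print("with one outlier :", anova_balanced(y_out))
```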
The aim of this article is to propose an alternative approach to the UQ problem through the introduction of robust estimators of the between and within group variance components. The robust analysis is expected to limit the influence of outliers present in past data and to improve the quality and reliability of the resulting statistical outcomes. For this purpose, we will compare in a simulation exercise the accuracy of the classical estimator with that of one of the many methods available for robustly estimating the variance components of a random effect model. In particular, we focus on the robust estimators proposed by [13], which are based on the Q-estimator of one-sample scale introduced by [14], and whose main strengths are:
- Suitable breakdown point: they can handle a substantial proportion of outliers in the data before returning an incorrect result;
- High efficiency: they provide estimates with desirable properties even when there are no outliers in the data;
- Closed expression: they do not depend on optimization algorithms and are rather easy to calculate.
Thanks to these properties, the proposed estimators are extremely flexible and may represent a step forward in the still ongoing development of reliable UQ methods for error variance estimation [9].
This paper proceeds as follows. In the next section, the measurement error model and the classical estimation of its variance components are introduced. Then, in Section 3, the alternative estimation method based on robust statistics is described. In Section 4, the results of a simulation experiment that analyzes the behavior of the two estimators under different contamination scenarios are presented. An empirical application of UQ to nuclear safeguards measurements through classical and robust estimates is described in Section 5. Finally, Section 6 concludes.
2. Measurement Error Model and the Classical Estimation of Its Variance Components
Given a set of data from past inspections and assuming the same number of measurements in each inspection (i.e., a balanced design), a typical model for representing the inspector ($I$) and operator ($O$) measurements of item $k$ during inspection $j$ is the following:

$$I_{jk} = \mu_{jk} + S_{Ij} + R_{Ijk}, \qquad O_{jk} = \mu_{jk} + S_{Oj} + R_{Ojk},$$

where $\mu_{jk}$ is the (unknown) true value of the measurand, $S_{Ij}$ and $S_{Oj}$ are the short-term systematic errors and $R_{Ijk}$ and $R_{Ojk}$ are the random errors. Therefore, the difference between operator and inspector measurements is given by:

$$D_{jk} = O_{jk} - I_{jk} = (S_{Oj} - S_{Ij}) + (R_{Ojk} - R_{Ijk}) = S_j + R_{jk}, \tag{1}$$

which is a random effect model with variance components equal to $\sigma^2_S = \mathrm{Var}(S_j)$ and $\sigma^2_R = \mathrm{Var}(R_{jk})$.
If the error is expected to scale with the true value of the measurand, the starting models become:

$$I_{jk} = \mu_{jk}(1 + S_{Ij} + R_{Ijk}), \qquad O_{jk} = \mu_{jk}(1 + S_{Oj} + R_{Ojk}),$$

where the errors $S_{Ij}$, $S_{Oj}$, $R_{Ijk}$ and $R_{Ojk}$ are now expressed in relative terms. In this case, the next step is defining the relative difference, given by:

$$D_{jk} = \frac{O_{jk} - I_{jk}}{\mu_{jk}} = (S_{Oj} - S_{Ij}) + (R_{Ojk} - R_{Ijk}) = S_j + R_{jk}. \tag{2}$$

This expression is again a random effect model, with variance components equal to $\sigma^2_S$ and $\sigma^2_R$. Since the measurand $\mu_{jk}$ is unknown, a reasonable solution is to substitute it with $O_{jk}$, because the operator measurement is typically more accurate and precise than the inspector's measurements [7]. Ref. [12] shows that (i) assuming a truncated normal distribution that guarantees finite moments for the ratio $(O_{jk} - I_{jk})/O_{jk}$, and (ii) considering that typically the relative error standard deviations are small, we have that:
- - the distribution of the ratio is extremely close to a normal distribution;
- - it provides an accurate approximation of the target variable $(O_{jk} - I_{jk})/\mu_{jk}$ (a quick numerical check of this approximation is sketched below).
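The following is a quick Monte Carlo check of the two points above, under assumed (and deliberately small) relative error standard deviations; it is not the truncated-normal analysis of [12], only an illustration of why replacing the unknown measurand with the operator value is harmless when the relative errors are small. For simplicity, the group structure is ignored and all error terms are drawn independently; all names and values are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
mu = 1.0                                 # true value; the comparison is scale-free
sig_S, sig_R = 0.005, 0.010              # assumed small relative error SDs (illustrative)

e_I = sig_S * rng.standard_normal(n) + sig_R * rng.standard_normal(n)   # inspector relative error
e_O = sig_S * rng.standard_normal(n) + sig_R * rng.standard_normal(n)   # operator relative error
I = mu * (1.0 + e_I)
O = mu * (1.0 + e_O)

target = (O - I) / mu                    # relative difference w.r.t. the (unknown) true value
proxy = (O - I) / O                      # computable version using the operator measurement

print("SD of target          :", target.std())
print("SD of proxy           :", proxy.std())
print("max |proxy - target|  :", np.abs(proxy - target).max())
z = (proxy - proxy.mean()) / proxy.std()
print("skewness of the proxy :", (z ** 3).mean())   # close to 0 for an (almost) normal ratio
```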
Expressions (1) and (2) both represent balanced one-way random effect models. The former assumes no relation between the magnitude of the measurand $\mu_{jk}$ and its corresponding error, whereas in the latter the error is proportional to the measurand. Since nuclear measurements often have larger uncertainty at larger true values, a multiplicative rather than an additive model is employed [9]. It is also possible to transform a multiplicative model into an additive one through the well-known logarithmic approximation $\log(1 + x) \approx x$. However, this solution is not recommended due to the inaccuracies it could generate in many situations [2].
The parameters that need to be estimated in both (1) and (2) are the two variance components: the short-term systematic one and the random one. In order to extend our focus also to unbalanced cases, in what follows we will refer to the general formulation of the unbalanced one-way random effect model:

$$D_{jk} = \mu + S_j + R_{jk}, \qquad j = 1, \dots, g, \quad k = 1, \dots, n_j, \tag{3}$$

where $S_j \sim N(0, \sigma^2_S)$ and $R_{jk} \sim N(0, \sigma^2_R)$ are mutually independent and $\mu$ is the overall mean. Therefore, we have that:

$$\mathrm{Var}(D_{jk}) = \sigma^2_S + \sigma^2_R, \qquad \mathrm{Cov}(D_{jk}, D_{jk'}) = \sigma^2_S \;\; (k \neq k'). \tag{4}$$

Classical estimators for $\sigma^2_S$ and $\sigma^2_R$ are given by [15]:

$$\hat\sigma^2_R = \frac{1}{N - g} \sum_{j=1}^{g} \sum_{k=1}^{n_j} \left(D_{jk} - \bar{D}_{j\cdot}\right)^2, \qquad \hat\sigma^2_S = \frac{1}{n_0} \left[\frac{1}{g - 1} \sum_{j=1}^{g} n_j \left(\bar{D}_{j\cdot} - \bar{D}_{\cdot\cdot}\right)^2 - \hat\sigma^2_R\right], \tag{5}$$

where:

$$\bar{D}_{j\cdot} = \frac{1}{n_j} \sum_{k=1}^{n_j} D_{jk}, \qquad \bar{D}_{\cdot\cdot} = \frac{1}{N} \sum_{j=1}^{g} \sum_{k=1}^{n_j} D_{jk}, \qquad n_0 = \frac{1}{g - 1}\left(N - \frac{\sum_{j=1}^{g} n_j^2}{N}\right), \qquad N = \sum_{j=1}^{g} n_j.$$

When $n_j = n$ for all $j = 1, \dots, g$, expressions (5) simplify to the well-known ANOVA estimators for the balanced design.

Finally, it could happen that $\hat\sigma^2_S < 0$. In this case, the usual practice is to set $\hat\sigma^2_S = 0$, which implies that the estimated total variance reduces to $\hat\sigma^2_R$. In nuclear safeguards contexts, this case represents the situation where there are no significant differences between groups of measurements, and all the variability is due to the within-group component.
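As an illustration, a minimal Python sketch of the classical estimators in (5), as written above, is reported below; the function and variable names are ours, and the final truncation at zero implements the convention just described.

```python
import numpy as np

def anova_unbalanced(groups):
    """Classical ANOVA-type estimates (sigma_S^2, sigma_R^2) for an unbalanced
    one-way random effect model; `groups` is a list of 1-D arrays, one per inspection period."""
    n_j = np.array([len(gr) for gr in groups], dtype=float)
    g, N = len(groups), n_j.sum()
    means = np.array([np.mean(gr) for gr in groups])
    grand_mean = np.concatenate(groups).mean()

    sigma2_R = sum(((gr - m) ** 2).sum() for gr, m in zip(groups, means)) / (N - g)
    msb = (n_j * (means - grand_mean) ** 2).sum() / (g - 1)         # between-group mean square
    n0 = (N - (n_j ** 2).sum() / N) / (g - 1)                       # effective group size
    sigma2_S = max((msb - sigma2_R) / n0, 0.0)                      # set to 0 when negative
    return sigma2_S, sigma2_R

# toy usage: three inspection periods of unequal size (values are arbitrary)
rng = np.random.default_rng(2)
data = [0.01 * rng.standard_normal() + 0.01 * rng.standard_normal(k) for k in (4, 10, 16)]
print(anova_unbalanced(data))
```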
3. Robust Estimation of Unbalanced One-Way Random Effect Models
Classical estimation of model (3) through expressions (5) yields results with desirable statistical properties whenever no anomalous observations are present in the data. When this assumption is not reliable, the use of robust statistical methods makes it possible to avoid the bias introduced by the contaminated observations. In particular, two different kinds of outlier may affect one-way random effect models:
- (a)
Anomalous values of the group error, also called outlying blocks;
- (b)
Anomalous values of the individual measurement error, also called outlying measurements within blocks.
This double nature of contamination is also reflected in a double definition of the breakdown point, defined in general as the smallest proportion of outlying observations that can lead to estimated values which tend to 0 or ∞. Following the definitions provided by [16], we will refer to the block breakdown point for contamination (a) and to the measurement breakdown point for contamination (b). In addition, [17] proposed a third outlier category, represented by an extremely small or large variation within one group, which will not be considered in detail here.
In statistical terms, the price to pay when using robust estimators is a loss of efficiency, given that they are in general less efficient than their classical counterparts. However, a careful choice of the robust method can considerably mitigate this drawback, by selecting the robust estimator that offers the most convenient trade-off between breakdown point and efficiency. To this end, several options are available for robustly estimating $\sigma^2_S$ and $\sigma^2_R$: from the M-estimators described in [18,19,20,21] to the S-estimator proposed by [22] and the GM-estimator developed by [23]. Among all these options, we decided to consider the estimators proposed by [13], which are based on the Q-estimator of one-sample scale introduced by [14]. Their definition requires the introduction of two sets of differences: the set $\mathcal{A}$ of the absolute intragroup differences $|D_{jk} - D_{jk'}|$, computed within each group, and the set $\mathcal{B}$ of the absolute second-order differences, obtained as the union of the intergroup difference sets of each group pair. The calculation of the estimators is based on the first or second quartile of these two numerical sets. The robust estimator for $\sigma_R$ is given by:

$$\hat\sigma_{R,Q} = 2.2191 \; Q_{0.25}(\mathcal{A}), \tag{6}$$

whereas the two alternative proposals (7) for estimating $\sigma_S$, denoted here by $\hat\sigma^2_{S,Q1}$ and $\hat\sigma^2_{S,Q2}$, are based on the quartiles of the two difference sets and on the constants 1.0484 and 1.5692 (their explicit expressions can be found in [13]). Here, $Q_p(\cdot)$ represents the $p$-percentile of a set. The constant factors 2.2191, 1.0484 and 1.5692 guarantee Fisher consistency of the estimators when the data are normally distributed. Therefore, in applications with real data, a normality test for assessing the Gaussian assumption is recommended. Finally, as for the classical estimates, there is the possibility of obtaining incoherent estimates, that is,

$$\hat\sigma^2_{S,Q1} < 0 \quad \text{or} \quad \hat\sigma^2_{S,Q2} < 0, \tag{8}$$

and the solution in this case is again to set the corresponding estimate equal to 0.
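A minimal sketch of the within-group part is given below: it computes estimator (6) as written above, i.e., the first quartile of the absolute intragroup differences rescaled by 2.2191. The percentile convention (here NumPy's default interpolation) and all names are implementation choices of ours; the two proposals for $\sigma_S$ are not reproduced, since their explicit expressions follow [13].

```python
import numpy as np

def robust_sigma_R(groups):
    """Q-type robust estimate of sigma_R in the spirit of (6): 2.2191 times the
    first quartile of the absolute intragroup (pairwise) differences."""
    diffs = []
    for gr in groups:
        gr = np.asarray(gr, dtype=float)
        i, j = np.triu_indices(len(gr), k=1)      # all pairs within the group
        diffs.append(np.abs(gr[i] - gr[j]))
    A = np.concatenate(diffs)                     # the set of absolute intragroup differences
    return 2.2191 * np.percentile(A, 25)

# toy usage: three groups of ten observations, with one gross outlier in the first group
rng = np.random.default_rng(3)
groups = [0.01 * rng.standard_normal(10) for _ in range(3)]
groups[0][0] += 0.1
classical = np.sqrt(sum(((gr - gr.mean()) ** 2).sum() for gr in groups) / (30 - 3))
print("robust sigma_R   :", robust_sigma_R(groups))
print("classical sigma_R:", classical)
```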
The reasons that motivated the choice of these particular estimators are the following:
- Suitable breakdown point: [16] proved that, in the case of a balanced model (i.e., $n_j = n$ for all $j$ in expression (3)), the three estimators share the same block breakdown point, whereas their measurement breakdown points are asymptotically larger than or equal to 0.25.
- High asymptotic efficiency: defining the asymptotic efficiency as the ratio between the asymptotic variances of the classical estimators and those of their robust counterparts, [16] proved that the asymptotic efficiency of two of the three estimators in a balanced experiment is at least 82%, and can even exceed 90% depending on the values of $n$ and $g$. The asymptotic efficiency of the third, instead, starts at 37% for the smallest group sizes and increases up to 86% as $n$ increases.
- Closed expression: $\hat\sigma_{R,Q}$, $\hat\sigma^2_{S,Q1}$ and $\hat\sigma^2_{S,Q2}$ are the direct outcome of an explicit formula applied to a set of data. This means that they are not the result of iterative procedures and/or optimization algorithms that may be difficult to implement and, more importantly, rather time consuming. The only feature of the proposed estimators that might lead to a sensible increase in the execution time of the quartile calculation is the growth of the cardinality of $\mathcal{A}$ and $\mathcal{B}$ when $N$ and $g$ increase. However, for moderate values of $N$ and $g$ such as the ones considered in this article, this does not seem to be an issue. Moreover, $\hat\sigma_{R,Q}$, $\hat\sigma^2_{S,Q1}$ and $\hat\sigma^2_{S,Q2}$ do not depend on the estimation of any location parameter.
4. Small Sample Comparison of Classical and Robust Estimates
In this section, we study and compare, through a simulation exercise, the behavior of the classical and robust variance estimators defined in (5)–(7). To this end, we first simulate random effect models according to (3) without any kind of contamination. The objective of this first experiment is twofold. On the one hand, we want to compare the behavior of classical and robust estimates in an ideal empirical environment. On the other hand, it represents a benchmark for a fair assessment of the effects of outliers on both estimates in the second set of simulations, which involves contaminated samples.
In order to have an exhaustive view of the main patterns that characterize the estimators in a finite-sample environment, the contaminated and uncontaminated simulations consider different combinations of settings regarding:
- (i) the total number of observations $N$ (equal to 30, 60, 100 and 200; see Table 1);
- (ii) the number of groups $g$, which in the nuclear safeguards context corresponds to the number of calibration or inspection periods;
- (iii) balanced and unbalanced samples. The values of $n_j$ that describe the unbalanced cases are provided in Table 1;
- (iv) the ratio between the systematic variance component and the total variance, $\rho = \sigma^2_S/(\sigma^2_S + \sigma^2_R)$. Actually, looking at expression (4), the value of $\rho$ represents the correlation between observations in the same group. Fixing the value of the total variance to 1 (that is, $\sigma^2_S + \sigma^2_R = 1$), we will consider three different cases:
- - a value of $\rho$ smaller than 0.5, which implies $\sigma^2_S < \sigma^2_R$;
- - $\rho = 0.5$, which implies $\sigma^2_S = \sigma^2_R = 0.5$;
- - a value of $\rho$ larger than 0.5, which implies $\sigma^2_S > \sigma^2_R$.
Similar to [23], in each simulated sample we considered three different contamination schemes (a schematic implementation is sketched after this list):
- (A)
10% of the random errors $R_{jk}$, located in different groups, are replaced by outlying values;
- (B)
10% of the short-term systematic errors $S_j$ are replaced by outlying values. For this scheme, we skip the case $g = 3$ and, when $g = 6$, we contaminate one systematic error in each simulation, i.e., slightly more than 10%;
- (C)
10% of the random errors $R_{jk}$, located in different groups, are replaced by outlying values and, in addition, at least 10% of the short-term systematic errors $S_j$ whose random errors are not contaminated are replaced by outlying values. Whenever possible, the two contaminations do not involve the same group. As before, we skip the case $g = 3$ and, when $g = 6$, we contaminate one systematic error in each simulation, i.e., slightly more than 10%.
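The sketch below shows one possible way to implement the mechanics of schemes (A) and (B) for an unbalanced sample generated according to (3) with total variance 1. The contaminating values are drawn here from a normal distribution with an inflated standard deviation (the factor `scale = 5` is purely our assumption; the actual contaminating distribution follows [23] and is not reproduced here), and scheme (C) would simply combine the two functions.

```python
import numpy as np

def simulate_sample(n_j, rho, rng):
    """Unbalanced one-way random effect sample with total variance 1
    (sigma_S^2 = rho, sigma_R^2 = 1 - rho). Returns the group effects and the data."""
    S = np.sqrt(rho) * rng.standard_normal(len(n_j))
    groups = [S[j] + np.sqrt(1.0 - rho) * rng.standard_normal(k) for j, k in enumerate(n_j)]
    return S, groups

def contaminate_A(S, groups, rng, frac=0.10, scale=5.0):
    """Scheme (A): replace about 10% of the random errors, spread over different groups,
    by draws with an inflated standard deviation (the inflation factor is an assumption)."""
    out = [gr.copy() for gr in groups]
    N = sum(len(gr) for gr in out)
    for m in range(max(1, round(frac * N))):
        j = m % len(out)                                   # spread the outliers over different groups
        out[j][rng.integers(len(out[j]))] = S[j] + scale * rng.standard_normal()
    return out

def contaminate_B(S, groups, rng, frac=0.10, scale=5.0):
    """Scheme (B): replace the systematic error of about 10% of the groups (at least one)
    by a draw with an inflated standard deviation (again, an assumed contamination)."""
    out = [gr.copy() for gr in groups]
    for j in rng.choice(len(out), size=max(1, round(frac * len(out))), replace=False):
        out[j] = out[j] - S[j] + scale * rng.standard_normal()   # swap the group effect
    return out

rng = np.random.default_rng(4)
S, clean = simulate_sample((4, 10, 16), rho=0.3, rng=rng)
dirty_A, dirty_B = contaminate_A(S, clean, rng), contaminate_B(S, clean, rng)
```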
The programming environment used for the simulation exercise is MATLAB (release R2021a). The codes for running and replicating our experiment are available upon request; they are described in Appendix A, where we also provide all the implementation details.
Before presenting the results, it is important to notice that condition (8) may lead to two sets of estimates of the variance components, one connected to $\hat\sigma^2_{S,Q1}$ and a second connected to $\hat\sigma^2_{S,Q2}$. However, the simulation outcomes did not highlight relevant differences in the average behavior of the two sets. Therefore, only the results obtained with the first will be presented.
Table 1. List of the $n_j$ values in the unbalanced samples.

N | g | $n_j$
---|---|---
30 | 3 | 4,10,16
30 | 6 | 3,3,5,5,7,7
60 | 6 | 4,4,8,12,16,16
60 | 10 | 4,4,5,5,6,6,7,7,8,8
100 | 10 | 4,4,7,7,10,10,13,13,16,16
100 | 20 | 3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7
200 | 20 | 4,4,4,4,7,7,7,7,10,10,10,10,13,13,13,13,16,16,16,16
200 | 40 | 3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7
4.1. Simulation Results—No Contamination
Figure 2 presents the values of the Mean Absolute Bias (MAB), multiplied by 100, of the classical and robust variance component estimates obtained in the 100,000 simulations. As expected, in all cases the absolute biases decrease considerably on average when $N$ and $g$ increase. Similar patterns characterize both the classical and the robust estimates, with the only difference that the classical estimates always perform better than the robust ones. There are only marginal differences between the MAB in balanced and unbalanced samples, with the former slightly outperforming the latter. The value of $\rho$ has, instead, a remarkable effect on the outcomes: increasing $\rho$ always yields worse estimates of $\sigma^2_S$ and a more accurate estimation of $\sigma^2_R$. The pattern of the MAB is also affected by the values of $N$ and $g$: for one of the two variance components, increasing $g$ (for a fixed $N$) worsens the accuracy of both classical and robust estimators, whereas the opposite happens if we increase $N$ (for a fixed $g$); the estimates of the other component, instead, seem to be only marginally influenced by the combination of $N$ and $g$. Finally, it is worth pointing out that one of the two robust proposals for $\sigma^2_S$ always provides more accurate estimates on average than the other.
Figure 3 shows the estimated efficiencies of the three robust estimators $\hat\sigma_{R,Q}$, $\hat\sigma^2_{S,Q1}$ and $\hat\sigma^2_{S,Q2}$. Following [24], in order to obtain a natural measure of the accuracy of an estimator, the calculation of the efficiency is based on the standardized variance which, for a generic estimator $T$, can be calculated as:

$$\mathrm{SV}(T) = \frac{N \, \mathrm{Var}(T)}{\mathrm{Mean}(T)^{2}},$$

where $\mathrm{Mean}(T)$ and $\mathrm{Var}(T)$ refer to the values obtained from the simulations. The ratio between the standardized variances of the classical and the robust estimator then provides an estimate of the efficiency. Coherently with what is proved in [16], the panels of Figure 3 confirm that one of the robust estimators is less efficient than another and that, for both of them, the efficiency does not depend in general on the ratio $\rho$ or on the values of $N$ and $g$: even in small samples, the estimated efficiency of the more efficient of the two is about 95% in balanced samples and 90% in unbalanced samples, whereas that of the other oscillates around 65% in both cases. The estimated efficiency of the third estimator, instead, varies from 30% (when $N$ and $g$ are small) to 85% (for larger values of $N$ and $g$). The value of $\rho$ has a marginal effect, whereas remarkable differences between balanced and unbalanced samples characterize the estimated efficiency of this last estimator.
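For completeness, the sketch below shows how the standardized variance and the resulting efficiency can be computed from the simulated estimates, using the formula written above (itself a reconstruction consistent with the definition attributed to [24]); the function names and the toy numbers are ours.

```python
import numpy as np

def standardized_variance(estimates, N):
    """SV(T) = N * Var(T) / Mean(T)^2, computed over the simulated estimates."""
    est = np.asarray(estimates, dtype=float)
    return N * est.var() / est.mean() ** 2

def efficiency(classical_estimates, robust_estimates, N):
    """Efficiency of the robust estimator: classical-to-robust ratio of standardized variances."""
    return standardized_variance(classical_estimates, N) / standardized_variance(robust_estimates, N)

# toy usage with artificial sampling distributions (illustrative numbers only)
rng = np.random.default_rng(5)
cls = 1.0 + 0.10 * rng.standard_normal(10_000)
rob = 1.0 + 0.11 * rng.standard_normal(10_000)
print(efficiency(cls, rob, N=60))          # close to (0.10 / 0.11)^2 ~ 0.83
```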
In conclusion, the simulation results on non-contaminated data provide a comprehensive description of the finite-sample properties of the classical and robust estimators of the variance components. Even though the classical estimates are more accurate on average, the gap with respect to their robust counterparts is limited. Also in terms of efficiency, the choice of the robust alternative does not imply a remarkable loss, especially for the most efficient of the three robust estimators.
4.2. Simulation Results—Contaminated Samples
Figure 4 shows the effects of contamination (A) on the classical and robust estimates in terms of MAB, and allows these results to be compared with the benchmark case of no contamination. As expected, both panels highlight a remarkable increase of the estimated bias of all estimators. However, the robust estimators are able to limit the distortion produced by the outlying observations, showing greater accuracy than the classical estimators. This is particularly evident for the estimates of $\sigma^2_R$, independently of the simulation settings. As expected, the greater efficiency of some of the robust estimators highlighted in Figure 3 is counterbalanced by a smaller accuracy in contaminated contexts. Concerning the estimates of $\sigma^2_S$, the robust estimators clearly show more accurate outcomes for the smaller values of $\rho$, whereas for the largest value of $\rho$ they are slightly outperformed by the classical estimator only in small samples (i.e., for the smallest values of $N$). However, in empirical applications on nuclear measurement data, $\rho$ is usually small, given that $\sigma_S$ tends to be smaller than $\sigma_R$ [10].
Results concerning contamination (B) are displayed in Figure 5. Since the contamination involves only the systematic component of model (2), both the classical and the robust estimates of $\sigma^2_R$ are not affected by the anomalous data and confirm the results previously observed in Figure 2b. Their MAB values are almost the same, and this explains why in Figure 5b the grey lines (representing the results on contaminated samples) are indistinguishable from the black lines (representing the results on samples without contamination). The classical and robust estimates of $\sigma^2_S$, instead, worsen their performance due to the contamination. However, once again, the robust version of the estimator limits the negative effects of the outliers and provides more accurate estimates. This is particularly true when the samples are balanced and, in unbalanced designs, for a subset of the considered settings.
Finally, Figure 6 presents the simulation results obtained for contamination (C). The better results in terms of MAB offered by the robust estimators are quite evident. The two robust estimators of $\sigma^2_S$ always provide remarkably smaller values of MAB than the classical one, with, again, one of the two proposals outperforming the other. The same can be said for $\sigma^2_R$, with the only exception of small unbalanced samples, where the two estimators yield approximately the same outcome.
Figure 7 shows in detail the simulated observations for a balanced sample divided into $g = 6$ groups under contamination (A). These data may mimic the situation where an inspector has to assess the measurements obtained in the sixth inspection period. The assessment is usually based on the outcomes obtained in the previous five inspections and on a threshold rule, based on the estimated standard deviations, for the declaration of the outliers. This common practice implicitly assumes that the previous measurements are free from anomalies. The plot clearly shows that, if this assumption does not hold, the outlier in the period of interest could remain undetected. Robust estimators, instead, are calculated using also the values of the inspection period that we aim to assess and, being only partially affected by past anomalous values, allow a correct identification of the outlier in the last inspection.
5. Empirical Application
Figure 8 presents the relative differences $D_{jk}$ effectively registered in 6 inspection periods where impure plutonium items were measured in attended mode. The 52 values form an unbalanced sample, since the numbers of measurements in the 6 inspection periods are 12, 8, 10, 10, 7 and 5. There are 4 values (highlighted in red) that are clearly far from the general pattern. A posterior cross-check between the operator and inspector measurements confirmed that those 4 values were inconsistent with the declarations and that the corresponding measurements were actually outliers.
Classical estimation gives a total standard deviation of 0.1315, with the two variance components estimated (in standard deviation terms) as 0.1133 and 0.0667. The robust method, instead, leads to smaller values and, for both proposals $\hat\sigma^2_{S,Q1}$ and $\hat\sigma^2_{S,Q2}$, the robust estimates satisfy condition (8); we then need to set the between-group component to 0, which in turn implies that the estimated variability is entirely attributed to the within-group component. Therefore, the presence of the 4 anomalous data had a twofold effect on the outcomes:
- the classical estimates of the variance components are always larger than the robust ones;
- differently from the classical estimation, the robust estimation does not detect any relevant short-term systematic error component.
Coherently with what we observed in the simulations, the presence of outliers in the sample yields an overestimation of the variance components that the robust method is capable of limiting. The thresholds represented in the figure provide a further confirmation. The classical ones, based on the sample mean and on the classical estimates of the variance components, include almost all the outliers. This is due to the effect that the 4 anomalies have on the calculation of both the sample mean and the variance components. The robust ones, based on the median and on the robust estimates, are tighter and more centered on the good measurements, given that the median is a robust estimator of central tendency whose value is not affected by the anomalous data. As a result, the four outliers all fall outside the robust thresholds.
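As an illustration of the threshold comparison just described, the sketch below computes classical thresholds around the sample mean and robust thresholds around the median, and flags the measurements falling outside them. The multiplier k = 3 and the way the two components are combined into a total standard deviation are our assumptions, since the exact rule used for Figure 8 is not spelled out in the text; the measurement data themselves are not reproduced here.

```python
import numpy as np

def thresholds(center, sigma_S, sigma_R, k=3.0):
    """Lower and upper limits: center +/- k times the total standard deviation."""
    total_sd = np.sqrt(sigma_S ** 2 + sigma_R ** 2)
    return center - k * total_sd, center + k * total_sd

def flag_outliers(d, lo, hi):
    """Indices of the measurements falling outside the interval [lo, hi]."""
    d = np.asarray(d, dtype=float)
    return np.where((d < lo) | (d > hi))[0]

# d = array with the 52 relative differences of Figure 8 (not listed in the text)
# classical thresholds, using the estimates reported above (total SD ~ 0.1315):
#   lo_c, hi_c = thresholds(np.mean(d), 0.1133, 0.0667)
# robust thresholds, with the robust between-group component set to 0 by condition (8):
#   lo_r, hi_r = thresholds(np.median(d), 0.0, sigma_R_robust)
#   print(flag_outliers(d, lo_r, hi_r))
```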
6. Conclusions
The availability of an efficient and unbiased estimation of the uncertainty components in nuclear safeguards measurements is a fundamental element for a reliable detection of possible data falsification by operators that could mask diversion. Top-down methods for UQ rely fundamentally on modeling measurement data as a one-way random effect model with within and between group variance components, which are usually estimated through the classical ANOVA expressions. However, the reliability of these estimates could be seriously affected by the presence of outliers in the data.
The aim of this article was to introduce the use of robust approaches in the UQ of nuclear safeguards measurements, and to analyze the general benefits in terms of estimation accuracy and bias reduction. For this purpose, we proposed robust estimators of the variance components that combine several desirable statistical properties, i.e., a suitable breakdown point, high efficiency and a closed expression. The simulation results and the empirical application showed that the presence of even one single outlier may significantly affect the reliability of the classical estimates. The robust estimators, instead, have the capacity to handle anomalous data and to remarkably limit their negative effects on the final estimates.
Robust statistics offers many other methods for estimating the within and between group variances. The promising outcomes obtained through the simulation exercise and the empirical application may be considered a starting point for a broader investigation aimed at comparing different robust estimators for one-way random effect models in simulated and empirical contexts, and at identifying the most suitable one. Such an analysis will undoubtedly represent a solid contribution to the still ongoing development of reliable UQ methods for error variance estimation.