*3.4. Examination of Pixel Quality*

A closer examination of the homogeneity of pixels reveals some insights into their statistical quality. Figure 7 shows the homogeneity of 2000 pixels from the 50 × 50 km-square area in the 11 June 2016 event of Aqua MODIS B4 versus SNPP VIIRS M4, corresponding to Figure 6b, ranked from best to worst. The two vertical lines mark 500 and 1000 samples. The first 500 samples have homogeneity better than 3.5%, but the next set of 500 samples, from number 501 to 1000, ranges from 3.5% to 4.7%. It is clear that the first 500 ranked samples will generate smaller variability than the next 500 samples, and so on. This is consistent with Figure 6, which shows the 500-sample case is actually more precise than the 1000-sample case. The ranking of homogeneity as in Figure 7 shows that including more pixels can bring in pixels of greater variability and make the statistics worse. While obvious as presented, this runs counter to the common expectation that a larger sample size would generate better, not worse, statistics. The continually rising pattern of homogeneity of ranked pixels indicates that variability differs pixel by pixel; thus a sampling analysis over SNO scenes does not conform to standard sampling, where every data point is drawn from the same variability. This property is neither obvious nor trivial to anticipate, but it is consistent with physical reality in hindsight. Therefore, cleaning processes based on physical conditions, such as cloud removal, that focus on a subset of pixels with specific physical attributes do nothing for this pixel-based variability and will not stabilize the statistics. The issue is not whether the pixels have been cleansed of certain physical attributes but whether too many pixels of higher variability have been sampled. The real physical conditions of Earth scene data can vary and cannot be expressed by a single well-defined distribution. Including more samples to improve statistics is inherently erroneous and can end up broadening the distribution and worsening the error bar. Therefore, containing the worsening statistics, for example by limiting the sample size and using only the lowest-variability pixels, is the necessary remedy.
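The ranking described above can be sketched in a few lines. The following is a minimal illustration on synthetic data, assuming homogeneity is computed per pixel as the local relative standard deviation (%) over a small window; the paper's exact homogeneity metric and window size are not specified here, so `pixel_homogeneity` and its 3 × 3 window are illustrative assumptions.

```python
import numpy as np

def pixel_homogeneity(radiance, half=1):
    """Per-pixel homogeneity as the local relative standard deviation (%)
    over a (2*half+1)^2 window; lower values mean more uniform scenes.
    (Assumed definition for illustration; the study's metric may differ.)"""
    ny, nx = radiance.shape
    hom = np.full((ny, nx), np.nan)
    for j in range(half, ny - half):
        for i in range(half, nx - half):
            win = radiance[j - half:j + half + 1, i - half:i + half + 1]
            hom[j, i] = 100.0 * win.std() / win.mean()
    return hom

# Synthetic scene standing in for a 50 x 50 pixel SNO area
rng = np.random.default_rng(0)
scene = 100.0 + rng.normal(0.0, 2.0, (50, 50))

hom = pixel_homogeneity(scene)
ranked = np.sort(hom[~np.isnan(hom)])  # best-to-worst ranking, as in Figure 7a
```

Plotting `ranked` against its index would reproduce the continually rising pattern of Figure 7a: the first N samples are always at least as homogeneous as any sample added after them.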

**Figure 7.** (**a**) Homogeneity versus ranked pixels for Aqua MODIS B4 versus SNPP VIIRS M4 for the 11 June 2016 SNO event and (**b**) precision (top panel) and homogeneity (bottom panel) with respect to sample size constraint for the 11 June 2016 event for the 36-km (red triangles), 50-km (blue squares), and 80-km (green diamonds) scales.

Figure 7b demonstrates how average precision and average homogeneity increase with the number of samples at three area sizes: 36-km (red triangles), 50-km (blue squares), and 80-km (green diamonds). For each of the three test area sizes, average precision and average homogeneity are computed for each given number of the homogeneity-ranked pixels. For example, for the 36 × 36 km-square area, which has 1296 pixels, the 100 most homogeneous pixels are used to compute the statistics for the first point, then the best 101 pixels, and so on. As expected, the average homogeneity and precision worsen with the inclusion of more pixels. The three cases also show that statistics improve with larger area under the sample constraint. The 11 June 2016 event is a marginal case, and its 36-km, 1000-sample precision result of 2.2% would have been excluded by a 2% precision requirement for the time series; but its 50-km, 500-sample result shows that a different set of criteria can improve pixel selection, yielding a significantly better precision of 1.7%. In summary, a large sample-size constraint can worsen the statistics, but using a larger area can improve them.
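The sweep just described can be sketched as follows. This is a hedged illustration on synthetic data: per-pixel ratios are modeled with scatter proportional to homogeneity (an assumption standing in for real SNO data), and precision is taken as the relative standard deviation (%) of the selected ratios.

```python
import numpy as np

def precision_vs_sample_size(ratios, homogeneity, n_values):
    """For each constraint n, compute the precision (relative standard
    deviation, %) over the n most homogeneous pixels, mirroring the
    homogeneity-ranked sweep of Figure 7b."""
    order = np.argsort(homogeneity)  # best (lowest) homogeneity first
    out = []
    for n in n_values:
        subset = ratios[order[:n]]
        out.append(100.0 * subset.std() / subset.mean())
    return np.array(out)

rng = np.random.default_rng(1)
n_pix = 36 * 36                            # e.g., a 36 x 36 km area at 1-km pixels
hom = rng.uniform(1.0, 8.0, n_pix)         # per-pixel homogeneity, %
rat = 1.0 + rng.normal(0.0, hom / 300.0)   # assumed: scatter grows with homogeneity

ns = np.arange(100, n_pix + 1, 100)
prec = precision_vs_sample_size(rat, hom, ns)
# prec worsens (increases) as more ranked pixels are included, as in Figure 7b
```

Under this toy model the computed precision rises with the sample-size constraint, which is the qualitative behavior the figure reports.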

It is natural to want to find the optimal scale and sample-size choice, but the answer does not require another thorough study; it rests simply on the caution of keeping the area size small enough to avoid potential hidden bias. While Figure 7 shows that the 80-km scale (green diamonds) generates the best statistics, the overall findings, including those of the unconstrained cases in Figure 6, also suggest the presence of some underlying bias over larger areas. For the 1-km regime, the 50-km scale is an acceptable balance between an area small enough to minimize the large-area bias and one large enough for good, but not necessarily the best, statistics. The results also show a sample-size range of 400 to 600 to be reasonable.

The distributions of qualified pixel-based ratios for each SNO event are examined for the three different sample-size conditions. Figure 8 shows the three distributions of the 29 May 2016 event at the 70-km scale to be normal-like, indicating that the samples as a set are well behaved in each case. The key point is that the 500-sample case has the tightest distribution, followed by the 1000-sample case and finally the unconstrained case. This is consistent with the result of Figure 6 showing the 500-sample cases having lower error bars. Other scales were checked and show the same behavior. The broadening of the distribution from the 500-sample to the unconstrained case is the most direct demonstration of the lack of an underlying stable distribution, showing that the sampling in intercomparison involves physical data of differing variability. As more samples in the homogeneity-ranked scheme are included, pixels of increasingly worse statistics enter and broaden the distribution.
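The progressive broadening of the distribution can be demonstrated with the same toy model used above, where per-pixel ratio scatter grows with homogeneity (an illustrative assumption, not the study's data). Comparing the standard deviation of the ratios under the three sampling conditions reproduces the ordering seen in Figure 8.

```python
import numpy as np

rng = np.random.default_rng(2)
hom = rng.uniform(1.0, 8.0, 2500)            # e.g., a 50 x 50 pixel SNO area, %
ratios = 1.0 + rng.normal(0.0, hom / 300.0)  # assumed: scatter grows with homogeneity
order = np.argsort(hom)                      # homogeneity-ranked, best first

# Distribution width (std of ratios) under the three sampling conditions
width_500 = ratios[order[:500]].std()    # constrained at 500 samples
width_1000 = ratios[order[:1000]].std()  # constrained at 1000 samples
width_all = ratios.std()                 # unconstrained
```

With scatter tied to homogeneity, the 500-sample distribution is the tightest and the unconstrained one the broadest, which is the qualitative behavior the figure reports.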

**Figure 8.** The three ratio distributions of Aqua MODIS B5 versus SNPP VIIRS M8 of the 29 May 2016 event, taken at the 70-km scale, for the sample-unconstrained condition (red stars), the constrained size of 1000 samples (green diamonds), and the constrained size of 500 samples (blue squares).

It is worth clarifying that the impact of homogeneity on the error bar is neither direct nor absolute. Homogeneity as applied in this study has been shown to be a beneficial metric for helping stabilize the statistics, but pursuing greater detail is not necessary at the 1% precision level. Examination shows that slight variations around ~4.5% lead to only the slightest differences in a few SNO events. The sample-size limitation and the selection procedure as described thus far are the main factors affecting the error bar result.
