*Article* **Field Intercomparison of Radiometers Used for Satellite Validation in the 400–900 nm Range**

**Viktor Vabson 1,\*, Joel Kuusk 1, Ilmar Ansko 1, Riho Vendt 1, Krista Alikas 1, Kevin Ruddick 2, Ave Ansper 1, Mariano Bresciani 3, Henning Burmester 4, Maycira Costa 5, Davide D'Alimonte 6, Giorgio Dall'Olmo 7,8, Bahaiddin Damiri 9, Tilman Dinter 10, Claudia Giardino 3, Kersti Kangro 1, Martin Ligi 1, Birgot Paavel 11, Gavin Tilstone 7, Ronnie Van Dommelen 12, Sonja Wiegmann 10, Astrid Bracher 10, Craig Donlon <sup>13</sup> and Tânia Casal <sup>13</sup>**


Received: 26 March 2019; Accepted: 8 May 2019; Published: 11 May 2019

**Abstract:** An intercomparison of radiance and irradiance ocean color radiometers (the second laboratory comparison exercise, LCE-2) was organized within the frame of the European Space Agency funded project Fiducial Reference Measurements for Satellite Ocean Color (FRM4SOC) on May 8–13, 2017 at Tartu Observatory, Estonia. LCE-2 consisted of three sub-tasks: (1) SI-traceable radiometric calibration of all the participating radiance and irradiance radiometers at the Tartu Observatory just before the comparisons; (2) indoor, laboratory intercomparison using stable radiance and irradiance sources in a controlled environment; (3) outdoor, field intercomparison of natural radiation sources over a natural water surface. The aim of the experiment was to provide a link in the chain of traceability from field measurements of water reflectance to the uniform SI-traceable calibration, and after calibration to verify whether different instruments measuring the same object provide results consistent within the expected uncertainty limits. This paper describes the third phase of LCE-2: the results of the field experiment. The calibration of radiometers and laboratory comparison experiment are presented in a related paper of the same journal issue. Compared to the laboratory comparison, the field intercomparison has demonstrated substantially larger variability between freshly calibrated sensors, because the targets and environmental conditions during radiometric calibration were different, both spectrally and spatially. Major differences were found for radiance sensors measuring a sunlit water target at viewing zenith angle of 139° because of the different fields of view. Major differences were found for irradiance sensors because of imperfect cosine response of diffusers. Variability between individual radiometers also depended significantly on the type of the sensor and on the specific measurement target. Uniform SI-traceable radiometric calibration ensuring fairly good consistency for indoor, laboratory measurements is insufficient for outdoor, field measurements, mainly due to the different angular variability of illumination. More stringent specifications and individual testing of radiometers for all relevant systematic effects (temperature, nonlinearity, spectral stray light, etc.) are needed to reduce biases between instruments and better quantify measurement uncertainties.

**Keywords:** ocean color radiometers; radiometric calibration; field intercomparison measurement; agreement between sensors; measurement uncertainty

#### **1. Introduction**

The FRM4SOC project aimed to support the consistency of ground-based validation measurements for "ocean color (OC)", or water reflectance, with the SI units, and thus to contribute to higher quality and accuracy of Sentinel-2 Multispectral Instrument (MSI) and Sentinel-3 Ocean and Land Color Instrument (OLCI) products. To this end, the second laboratory comparison exercise (LCE-2) was organized in the frame of the FRM4SOC project. A stepwise approach was chosen for the LCE-2: first, calibration of the sensors; second, indoor laboratory comparisons using various levels of radiance or irradiance in stable conditions similar to those during radiometric calibration; and third, outdoor field measurements of natural radiation sources in an environment significantly different from laboratory conditions. This paper describes only the field experiment, whilst the radiometric calibration and indoor exercise are covered in a related paper in the same journal issue [1].

Intercomparison of data produced by a number of independent radiometric sensors measuring simultaneously the same object allows assessment of the consistency of different results and their estimated uncertainties depending on the type of the sensor, the spectral composition, intensity and angular variability of the measured radiation, environmental temperature, and the particular method used for collecting and handling the measurement data [2,3]. This information can serve also for further elaboration of uncertainty estimation. Compared to the indoor experiment [1], much larger variability between radiometric sensors is expected in the outdoor experiment, due to much larger differences in target signal and environmental temperature with respect to the radiometric calibration conditions.

The analysis of field measurements is more complicated than for the indoor case. The main differences between the field and laboratory measurements of LCE-2, which cause a substantial increase in the uncertainty of the field measurements, are shown in Figure 1. The spectral composition and intensity of radiation from the target being measured (sky, water) are significantly different from those of the incandescent source used as the radiometric calibration standard. The angular distribution of downwelling irradiance also differs from the nearly collimated radiation source used during radiometric calibration. Ambient temperature in the field can differ from the stable laboratory temperature during the radiometric calibration by more than 15 °C. The stray light effect may be an order of magnitude larger, due to the different shapes of the calibration and field spectra. Strong autocorrelation in the recorded time series data implies that the statistical analysis of intercomparison results should be adapted accordingly.

Due to non-ideal performance of radiometers (temperature dependence, deviation from ideal cosine response for irradiance sensors, nonlinearity, spectral stray light, etc.), all the differences between conditions during radiometric calibration and field measurements can contribute to the bias between radiometers and increase the measurement uncertainty. The known measurement errors should be corrected and the unknown or residual errors have to be assessed and accounted for in the uncertainty budget. Unfortunately, the information needed for these corrections is often available only through highly time- and resource-consuming tests of individual radiometers, and it is often necessary to make such corrections based on the characterization of an instrument from the same family.

**Figure 1.** Main differences between the field and laboratory measurements of the second laboratory comparison exercise (LCE-2) causing a substantial increase in uncertainty of the field measurements.

This study aims to evaluate the effectiveness of SI-traceable radiometric calibration for consistency of OC field measurements, presents LCE-2 data processing results, and discusses techniques and procedures for improving traceability of OC field measurements.

#### **2. Material and Methods**

### *2.1. Participants of the LCE-2*

In total 11 institutes or companies were involved in the LCE-2, see Table 1. Altogether 44 radiometric sensors from five different manufacturers were involved, as shown in Table 2.


**Table 1.** Institutes and instruments participating in the LCE-2 intercomparison.



#### *2.2. Venue and Measurement Setup*

The outdoor exercise took place at Lake Kääriku, Estonia (58°0′5″N, 26°23′55″E) on May 11–12, 2017. Lake Kääriku is a small eutrophic lake with a surface area of 0.2 km<sup>2</sup>. The maximum depth is 5.9 m, with an average of 2.6 m. The water color is greenish-yellow with a measured transparency (Secchi disk depth) of 2.6 m. The average chlorophyll content was Chl\_a = 7.3 mg m<sup>−3</sup>, total suspended matter content TSM = 3.9 g m<sup>−3</sup>, absorption of the colored dissolved organic matter *a*<sub>CDOM</sub>(442 nm) = 1.7 m<sup>−1</sup>, and diffuse attenuation coefficient of downwelling irradiance *K*<sub>d</sub>(PAR) = 1.3 m<sup>−1</sup>. The bottom is muddy. Lake Kääriku has a 50 m long pier and a diving platform on the southern coast. The diving platform has two levels. During LCE-2 the upper level was used for the instruments; computers and instrument operators were located on the lower level and on the pier below the tower (Figure 2).

**Figure 2.** Pier and diving platform at the southern coast of Lake Kääriku.

The instruments were located roughly 7.5 m above the water surface. The depth of water around the diving platform was 2.6 m to 3.6 m and the bottom was not visible to observers. The closest trees were about 65 m south of the platform, and the treetops were less than 20° above the horizon when viewed from the upper level of the platform. Purpose-built frames were used for mounting and aligning the participating radiometers (Figures 3 and 4). The irradiance sensors were mounted in a fixed frame ensuring the levelling of the cosine collectors. The front surfaces of all the cosine collectors were set at the same height so that the illumination conditions were equal and the instruments were not shadowing each other.

**Figure 3.** 3D CAD (computer-aided design) drawings of the frames for mounting irradiance (left) and radiance (right) sensors during the outdoor experiment.

**Figure 4.** All the radiance and irradiance radiometers were mounted in common frames during the LCE-2 outdoor experiment. Left frame—irradiance sensors; right frame—radiance sensors.

#### *2.3. Environmental Conditions and Selection of Casts*

The environmental conditions during the outdoor experiment were not ideal, mainly due to the presence of scattered cumulus clouds. The aerosol content was low: the average daily aerosol optical depth at 500 nm (AOD500) was 0.077 on May 11 and 0.071 on May 12 (measured at Tõravere AERONET station, 30 km north of Lake Kääriku [4]). The air temperature was rather low, between 5 °C and 9 °C; water temperature was around 11 °C. Wind speed was mainly between 0.5 m s<sup>−1</sup> and 4 m s<sup>−1</sup> with occasional gusts of up to 7 m s<sup>−1</sup>.

The outdoor measurements were performed in 5-minute casts, with the exception of the 25-minute irradiance cast no. 14. The beginning and end times of casts were announced and during the casts all the participants recorded the radiance and irradiance data at their usual fieldwork data acquisition rate. A total of 30 casts were recorded, but only seven of them were included in the intercomparison. The selection of casts was based on the time series of the 550 nm spectral band. The coordinating laboratory received the 550 nm time series data for 16 radiance and 10 irradiance sensors. Only the casts with the most stable signal and least missing data were selected for further analysis. All the selected casts were measured on May 12, the second day of the outdoor experiment. The all-sky camera images captured in the middle of the selected casts can be seen in Figure 5.
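The selection criterion described above (most stable signal, least missing data) can be illustrated with a minimal sketch. The function names, the use of the coefficient of variation as the stability measure, and both thresholds are assumptions made for illustration only; the paper does not state the numerical criteria that were applied.

```python
import numpy as np

def cast_stability(signal_550):
    """Return (coefficient of variation, fraction of missing samples) for one cast."""
    signal = np.asarray(signal_550, dtype=float)
    missing_fraction = float(np.mean(np.isnan(signal)))
    valid = signal[~np.isnan(signal)]
    cv = float(np.std(valid, ddof=1) / np.mean(valid))
    return cv, missing_fraction

def select_casts(casts, max_cv=0.02, max_missing=0.05):
    """Keep the casts whose 550 nm series is stable and nearly complete.

    max_cv and max_missing are hypothetical thresholds, not values from the paper.
    """
    selected = []
    for name, series in casts.items():
        cv, missing = cast_stability(series)
        if cv <= max_cv and missing <= max_missing:
            selected.append(name)
    return selected

# Synthetic 550 nm time series: one steady cast, one disturbed by passing clouds.
rng = np.random.default_rng(0)
casts = {
    "stable": 100.0 + rng.normal(0.0, 0.5, 300),
    "cloudy": 100.0 + 30.0 * np.sin(np.linspace(0.0, 6.0, 300)),
}
print(select_casts(casts))
```

In practice the same screening would be applied to the series of every reporting sensor, and a cast would be retained only if it passes for all of them.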

**Figure 5.** All-sky camera images captured in the middle of the casts used in the intercomparison analysis. Irradiance—C10, C12, C13, C14; blue sky radiance—C8, C12, C13; water radiance—C17, C23. Red dots in C8, C12, C13 indicate approximate view direction of the radiance sensors.

The casts used in the analysis of LCE-2 intercomparison are listed in Table 3. Four casts (C10, C12, C13, and C14) were chosen for irradiance, all recorded with direct sunlight, although with some clouds in the sky away from the sun. Five casts were chosen for radiance: three casts (C8, C12, and C13) recorded with blue sky as a target, one (C17) measurement of the water surface in cloud shadow, and one (C23) measurement of sunlit water. Measurement C17 was made at a zenith angle suggested in the protocols for above-water radiometry, while measurement C23 was made at a slightly more oblique angle. These measurements were made at azimuth angles of 107° and 143° with respect to the sun, in order to avoid sunglint and direct shadow from the platform. The 550 nm time series of one irradiance (RAMSES SAM\_8329) and one radiance (RAMSES SAM\_81B0) sensor for all the radiance and irradiance casts used for intercomparison are plotted in Figure 6. The initial cast start and stop times were adjusted based on Figure 6 to exclude the intervals with high temporal variability. Photographs of the radiance targets can be seen in Figure 7. Approximate field-of-view (FOV) footprints for WISP-3 (3°), RAMSES (7°), and HyperOCR (23°) are shown in Figure 7 as well. The images were taken with a handheld Nikon D40X digital single-lens reflex (DSLR) camera equipped with a Nikkor 18–200 mm zoom lens. According to the Exchangeable image file format (EXIF) metadata of the images, the lens was completely zoomed out to 18 mm for C8, C12, C13, and C23. Considering the parameters of the lens and the camera, the horizontal FOV of these images is 67°. The lens was zoomed to 32 mm for C17, which corresponds to a 41° horizontal FOV of the image. As the camera was not fixed to the frame in line with the radiometers, its collinearity with the radiometers is uncertain and the actual FOVs of the radiometers may differ slightly from the circles shown in Figure 7.
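The relation between focal length and image FOV used above can be checked with the pinhole (rectilinear lens) formula FOV = 2 arctan(w / 2f). The sketch below assumes a sensor width of 23.6 mm, which is not stated in the text; with that assumption the computed angles agree with the quoted 67° and 41° to within about one degree. The helper `footprint_fraction` is a hypothetical addition that estimates how wide a centred radiometer FOV circle would appear relative to the image width.

```python
import math

SENSOR_WIDTH_MM = 23.6  # assumed sensor width of the camera; not given in the text

def horizontal_fov_deg(focal_length_mm, sensor_width_mm=SENSOR_WIDTH_MM):
    """Horizontal field of view of a rectilinear lens, in degrees."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))

def footprint_fraction(radiometer_fov_deg, focal_length_mm,
                       sensor_width_mm=SENSOR_WIDTH_MM):
    """Diameter of a centred radiometer FOV circle as a fraction of the image width."""
    half_angle = math.radians(radiometer_fov_deg / 2.0)
    diameter_mm = 2.0 * focal_length_mm * math.tan(half_angle)
    return diameter_mm / sensor_width_mm

for focal_length in (18.0, 32.0):
    print(focal_length, round(horizontal_fov_deg(focal_length), 1))

for name, fov in (("WISP-3", 3.0), ("RAMSES", 7.0), ("HyperOCR", 23.0)):
    print(name, round(footprint_fraction(fov, 18.0), 3))
```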


**Table 3.** Casts used in the analysis.

UTC—coordinated universal time; NA—not applicable; SZA—solar zenith angle; *Ld*—downwelling sky radiance; SAA—solar azimuth angle; *Lu*—total upwelling water radiance; VAA—view azimuth angle; *Ed*—downwelling irradiance; VZA—view zenith angle.

**Figure 6.** Relative variation of 550 nm signal of one RAMSES sensor during irradiance (**left**) and radiance (**right**; C8, C12, C13 blue sky; C17 water in cloud shadow; C23 sunlit water) casts selected for intercomparison analysis.

**Figure 7.** Photographs of radiance targets used in the intercomparison analysis. The circles denote approximate FOV of WISP-3 (smallest), RAMSES, and HyperOCR (largest).

#### *2.4. Outdoor Experiment of the LCE-2*

The initially planned outdoor intercomparison [5] comprised two phases: (1) direct intercomparison of the downwelling irradiance *Ed*, the downwelling sky radiance *Ld*, and the total upwelling water radiance *Lu*; (2) intercomparison of the remote sensing reflectance *Rrs* and the water-leaving radiance *Lw* derived from simultaneously measured *Ed*, *Ld*, and *Lu*. The radiance sensors were mounted on the frame in two groups which could be moved independently in the zenith direction. Additionally, the relative zenith angle between the two groups could be fixed, and both groups tilted together. The selected setting was to fix the relative azimuth angle between the two groups of sensors to 0° and move all the radiance sensors simultaneously in the azimuth direction. The design of the radiance frame allowed mounting the *Lu* radiometers to one group and the *Ld* radiometers to another group for measuring *Lw* and *Rrs* in a typical 3-radiometer above-water configuration [6].
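For orientation, the derivation of *Lw* and *Rrs* from the three measured quantities can be sketched with the commonly used above-water formulation *Lw* = *Lu* − ρ·*Ld* and *Rrs* = *Lw*/*Ed*. This is an illustrative sketch, not the processing applied in LCE-2: the sea-surface reflectance factor ρ = 0.028 is an assumed textbook value that in practice depends on viewing geometry, wind speed, and sky conditions, and the input numbers are synthetic.

```python
import numpy as np

def water_leaving_radiance(lu, ld, rho=0.028):
    """Lw = Lu - rho * Ld; rho is an assumed surface reflectance factor."""
    return np.asarray(lu, dtype=float) - rho * np.asarray(ld, dtype=float)

def remote_sensing_reflectance(lu, ld, ed, rho=0.028):
    """Rrs = Lw / Ed, in sr^-1 when Lu, Ld are radiances and Ed is irradiance."""
    return water_leaving_radiance(lu, ld, rho) / np.asarray(ed, dtype=float)

# Synthetic values for three spectral bands (arbitrary but mutually consistent units).
lu = np.array([2.0, 3.0, 1.5])
ld = np.array([50.0, 40.0, 20.0])
ed = np.array([1000.0, 1200.0, 1100.0])
print(remote_sensing_reflectance(lu, ld, ed))
```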

On the first day of the outdoor measurements, seven casts of simultaneous *Ed*, *Ld*, and *Lu* measurements in the typical above-water 3-radiometer configuration were recorded. However, none of the casts was considered suitable for the analysis, due to cumulus clouds causing rather unsteady illumination conditions. On the second day of the outdoor experiment, priority was given to the phase (1) measurements and all the radiance sensors were simultaneously measuring either *Lu* or *Ld*.

#### *2.5. Data Processing*

In total, data for 40 out of 44 radiometers were reported back to the pilot. For the rest, the pilot carried out the data handling using the provided raw files. The data processing details are described in Sections 3.1 and 3.2 of the related paper [1]. The outdoor data processing chain contained the following steps:


#### *2.6. Consensus Value Used for the Analysis*

The group median was used as the consensus value. Compared to the indoor measurements, the outdoor variability between radiance sensors was on average about twice as large, and for irradiance sensors more than five times as large. Two irradiance sensors and one radiance sensor were not accounted for in the variability estimate, because they had extremely large deviations from the group median.
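A minimal sketch of the consensus calculation is given below: the per-band median over all sensors serves as the consensus, each sensor is expressed as a relative deviation from it, and the between-sensor variability is the standard deviation of those deviations. The function names, the synthetic spectra, and the outlier threshold are assumptions for illustration; the paper does not give the numerical criterion used to exclude sensors.

```python
import numpy as np

def deviation_from_consensus(spectra):
    """Return (consensus, relative deviation) for spectra of shape (n_sensors, n_bands)."""
    spectra = np.asarray(spectra, dtype=float)
    consensus = np.median(spectra, axis=0)
    return consensus, spectra / consensus - 1.0

def between_sensor_variability(relative_deviation, max_abs_deviation=None):
    """Standard deviation across sensors per band.

    If max_abs_deviation is given (hypothetical threshold), sensors exceeding it
    in any band are left out of the variability estimate.
    """
    dev = np.asarray(relative_deviation, dtype=float)
    if max_abs_deviation is not None:
        keep = np.max(np.abs(dev), axis=1) <= max_abs_deviation
        dev = dev[keep]
    return np.std(dev, axis=0, ddof=1)

# Four synthetic sensors, two bands; the last sensor is an obvious outlier.
spectra = [[1.00, 2.00], [1.02, 2.04], [0.98, 1.96], [1.50, 3.00]]
consensus, deviation = deviation_from_consensus(spectra)
full = between_sensor_variability(deviation)
trimmed = between_sensor_variability(deviation, max_abs_deviation=0.10)
print(consensus, full, trimmed)
```

The median is preferred over the mean here because a single strongly deviating sensor shifts the consensus very little.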

#### *2.7. Accuracy of Sensor Adjustment*

The collinearity of groups of radiance sensors on the left and right frame was set by visual observation from the side of the frame and was better than ±1°. Due to the flexibility of the plastic clamps used to fix the HyperOCR radiometers, a slight deviation from collinearity of HyperOCR and RAMSES sensors within the groups was noticed during the experiment (visually much larger than the misalignment between the groups). Using Figure 8, the angle between HyperOCR and RAMSES sensors was measured to be 1.3°, with the HyperOCR sensors pointing lower than the RAMSES instruments. An image taken from the other side of the frame revealed that the HyperOCR sensors in the other group were pointing about 1.1° higher than the RAMSES instruments. The left and right radiance frames were visually aligned by the topmost RAMSES instruments; thus, the maximum angle between the HyperOCR instruments on the frames could have been about 2.5°. Although this is about ten times smaller than the FOV of a standard HyperOCR instrument, it can have a significant impact when measuring spatially heterogeneous targets.

**Figure 8.** The angle between red lines marking the directions of HyperOCR and RAMSES sensors was measured to be 1.3° from this image.

#### **3. Results**

#### *3.1. Results of Outdoor Comparison*

The consensus spectra for the irradiance and radiance targets are presented in Figure 9. The difference between the casts of radiance sensors measuring the sky and water is evident. Radiation from the water with blue sky gave the smallest signal.

**Figure 9.** Irradiance and radiance consensus values in the outdoor experiment. C8, C10, C12, C13, C14—blue sky (**radiance**) or direct sunshine (**irradiance**); C17—water in cloud shadow; C23—sunlit water.

The measurement results for the field casts are presented in Figures 10 and 11 as the deviation from the consensus value. The different behavior of the RAMSES and HyperOCR sensor groups became evident. For the irradiance measurements, the deviation of HyperOCR sensors from the consensus value was very small, and the group of RAMSES sensors caused the increase of mean variability, see Figure 10. Conversely, the variability of the radiance sensors during the indoor and outdoor exercises was almost at the same level for the RAMSES group, and the increase of the outdoor variability was caused largely by the HyperOCR sensors, see Figure 11.

**Figure 10.** Irradiance sensors compared to the consensus value. Solid lines—RAMSES sensors; dashed lines—HyperOCR sensors; double line—SR-3500.

All the irradiance casts in Figure 10 were measured with direct sunshine and no big difference between casts can be observed for the consensus irradiance spectra (Figure 9). The group of HyperOCR sensors, shown in Figure 10 with dashed lines, is more consistent with the consensus value than the sensors of the RAMSES group shown with solid lines. The much higher variability across sensors of the RAMSES group is remarkable. Interestingly, the intra-sensor variability of irradiance is almost wavelength-independent, except at 400 nm.

The comparison of different radiance sensors (Figure 11) showed very good agreement, to within 1.2% across the full spectrum, for all RAMSES sensors for casts C12 and C13, the most homogeneous blue sky targets. Higher variability between all sensors, and particularly the HyperOCR radiance sensors, is seen for the obliquely viewed water target C23 (Figure 11). This is probably caused by spatial heterogeneity of the target (C23 in Figure 7), and by a slight deviation from collinearity of the sensors (Figure 8). This assumption is supported by the fact that radiometers 151, 222, and 444, which are below the consensus value in Figure 11, were mounted on the left frame and radiometers 152, 223, and 445, which all remain above the consensus value in Figure 11, were mounted on the right frame. The water-viewing measurement C17 has better spatial homogeneity and is more representative, because its zenith angle is the one normally used for water reflectance measurements: the angular variability of the Fresnel reflection coefficient for a 41° angle of incidence (cast C17) is smaller than for 50° (cast C23), and hence gives less spatial variability of skylight reflection.
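The statement about the Fresnel reflection coefficient can be illustrated numerically. The sketch below evaluates the unpolarized Fresnel reflectance of a flat air-water interface and its slope with respect to the angle of incidence at 41° and 50°. The refractive index of 1.34 and the flat-surface assumption are simplifications introduced here, not values taken from the paper.

```python
import math

def fresnel_unpolarized(incidence_deg, n_water=1.34):
    """Unpolarized Fresnel reflectance of a flat air-water interface."""
    theta_i = math.radians(incidence_deg)
    if theta_i == 0.0:
        return ((n_water - 1.0) / (n_water + 1.0)) ** 2
    theta_t = math.asin(math.sin(theta_i) / n_water)  # Snell's law
    r_s = (math.sin(theta_i - theta_t) / math.sin(theta_i + theta_t)) ** 2
    r_p = (math.tan(theta_i - theta_t) / math.tan(theta_i + theta_t)) ** 2
    return 0.5 * (r_s + r_p)

def slope_per_degree(incidence_deg, step_deg=0.5, n_water=1.34):
    """Central-difference slope of the reflectance per degree of incidence angle."""
    upper = fresnel_unpolarized(incidence_deg + step_deg, n_water)
    lower = fresnel_unpolarized(incidence_deg - step_deg, n_water)
    return (upper - lower) / (2.0 * step_deg)

for angle in (41.0, 50.0):
    print(angle, fresnel_unpolarized(angle), slope_per_degree(angle))
```

Under these assumptions both the reflectance and its angular slope are larger at 50° than at 41°, so the same pointing difference between sensors translates into a larger difference in reflected skylight for cast C23.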

In Figure 11, the SeaPRISM shows fairly good agreement with the consensus value of LCE-2, while SR-3500 is biased to somewhat smaller values in all casts. WISP-3 sensors show above-average scatter of results, partly because their alignment to the frame in line with the other radiometers was difficult, due to the ergonomic shape of these handheld instruments and the lack of suitable reference surfaces for alignment. It is not possible to conclude which sensor(s) showed the best agreement with SI, because no well-characterized SI-traceable reference radiometer took part simultaneously in the comparison.

**Figure 11.** Radiance sensors compared to the consensus value in the outdoor experiment. C8, C12, C13—blue sky; C17—water in cloud shadow at 139◦ VZA; C23—sunlit water at 130◦ VZA. Solid lines—RAMSES sensors; dashed lines—HyperOCR sensors; double lines—SeaPRISM (SP) and SR-3500; dotted lines—WISP-3.

The variability of irradiance and radiance results in the LCE-2, in comparison with the differences between sensors due to their calibration state before the experiment, is summarized in Figure 12. All standard deviations of laboratory measurements are smaller than 1%. Standard deviations of the field results are substantially higher (1–5%), but still much smaller than the variability due to the calibration state of sensors before the experiment (5–10%), i.e., the calibration that each participant would have used if the radiometers had not been freshly calibrated just before the start of the LCE-2 intercomparison exercise. It must be noted, however, that some instruments had not been used for fieldwork in recent years; thus, the previous calibration coefficients were several years old.

**Figure 12.** Variability between irradiance and radiance sensors. E\_cal and L\_cal—due to calibration state; E(Lab), L(Low) and L(High)—variability in laboratory intercomparison; E(Sun), L(BlueSky) and L(Water) variability in the field.

#### *3.2. Measurements after the End of LCE-2 Comparison*

Large variability between irradiance sensors of the RAMSES group during the outdoor exercise cannot be fully explained by poor stability of sensors, or by factors such as temperature dependence (which is rather similar for the whole RAMSES group [7]), nonlinearity (which would be stronger for wavelengths with high digital counts), and stray light (which would show more spectral features). Most likely, the main reason for differences between RAMSES and between HyperOCR irradiance sensors comes from different properties of the entrance optics (angular response). The results of [8] for six RAMSES irradiance sensors suggest a cosine error within ±2% for sun zenith angles lower than 50° when radiometric calibration is conducted with the sensor tilted by 20° with respect to the incident irradiance. For the "conventional" calibration procedure at normal illumination, a somewhat larger cosine error may be expected. Therefore, after the end of LCE-2, in January 2019 the in-air cosine response error of five RAMSES irradiance sensors was measured, see Figure 13. One of the measured sensors, number 8598, was new and was not involved in LCE-2.

The dependence of the cosine error on the zenith angle varies significantly from radiometer to radiometer, with values ranging from −16% up to +9% at ±65°. The deviation from the ideal cosine response is irregular and does not always show a monotonic increase with the incidence angle. This is in agreement with the results of [8]. For one sensor, 8329, significant asymmetry is evident. The best of the characterized sensors, 81A8, demonstrated irradiance results very close to the consensus value in the outdoor experiment (Figure 10), whereas the sensor 81EA, with the largest cosine error, deviated from the consensus value by about −10% to −15%, depending on wavelength.

Following the 20° "offsetting" calibration method suggested in Reference [8], the comparison data of Figure 10 were recalculated for two sensors by using the cosine response characterization results. The effect of calibrating with the sensor tilted by 20° with respect to the incident irradiance is shown in Figure 14. Improvement is evident for both sensors, but for 81EA the residual error is still large.

**Figure 14.** Effect of calibration with the sensor tilted by 20° with respect to the incident irradiance.

The manufacturer's specification of the HyperOCR [9] states that the cosine root mean square (RMS) error is within 3% at 0–60°, and within 10% at 60–85° incidence angles. For RAMSES [10], accuracy is stated to be better than 6–10% depending on spectral range. The respective specification in Reference [11] is: for *E*<sub>d</sub> measurement, the response to a collimated source should vary as cos θ within less than 2% for angles 0° < θ < 65° and 10% for angles 65° < θ < 90°. For easier comparison of different sensors, the deviation from ideal cosine response was quantified as the integral of azimuth-independent absolute values of the cosine error for θ in the 0° to 85° interval, the index *f*<sub>c</sub> in Reference [8] or cosine error *f*<sub>2</sub> in Reference [12], see Figure 15.
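As an illustration of such an integrated index, the sketch below computes the cosine error at each angle and integrates its absolute value from 0° to 85°. The sin 2θ weighting follows the usual CIE-style definition of *f*<sub>2</sub> and is an assumption here, since the exact weighting used for Figure 15 is not reproduced in this section; the angular response is synthetic.

```python
import numpy as np

def cosine_error(theta_deg, response, response_at_normal):
    """Relative deviation of the measured angular response from the ideal cos(theta)."""
    theta = np.radians(np.asarray(theta_deg, dtype=float))
    ideal = response_at_normal * np.cos(theta)
    return np.asarray(response, dtype=float) / ideal - 1.0

def integrated_cosine_error(theta_deg, error):
    """Integral of |error| * sin(2*theta) over theta (assumed CIE-style weighting)."""
    theta = np.radians(np.asarray(theta_deg, dtype=float))
    integrand = np.abs(np.asarray(error, dtype=float)) * np.sin(2.0 * theta)
    # Trapezoidal rule written out explicitly to avoid version-specific NumPy helpers.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(theta)))

# Synthetic response: under-reading that grows to 10% at 85 degrees.
angles = np.arange(0.0, 86.0, 1.0)
ideal = np.cos(np.radians(angles))
measured = ideal * (1.0 - 0.10 * (angles / 85.0) ** 2)
error = cosine_error(angles, measured, response_at_normal=1.0)
print(integrated_cosine_error(angles, error))
```

With this weighting a constant relative error of 5% over the whole range integrates to just under 0.05, so the index can be read roughly as a mean cosine error.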

**Figure 15.** Integrated cosine error of the five RAMSES radiometers.

Increased variability between the RAMSES sensors in comparison with HyperOCR sensors presented in Figure 10 can be reasonably explained by a too tolerant specification of the cosine error, as departures from cos θ imply analogous errors in *E*<sub>d</sub> in the case of direct sunlight [11]. Although the majority of the RAMSES sensors meet the present specification, differences revealed during the field measurements may render the specification unsatisfactory for the users, unless laboratory characterization data and an indication of the angular variation of the downwelling radiance field, e.g., direct/diffuse ratio, are available to correct for imperfect cosine response.

Thus, rather large cosine errors of RAMSES irradiance sensors can be considered to be the main reason for the differences between irradiance sensors during the LCE-2 outdoor measurements.

#### **4. Uncertainty Budgets of Outdoor Comparisons**

An uncertainty analysis according to References [13,14] is undertaken for the outdoor measurements to understand the contribution of different factors to the observed variability between sensors. The outdoor downwelling irradiance uncertainty estimates are presented in Table 4; Table 5 corresponds to the blue sky radiance, and Table 6 to the radiance of sunlit water. All the uncertainty estimations in Tables 4–6 are based on experimental variability data of TriOS RAMSES sensors and information from References [2,6,15–19]. For the other radiometer models that took part in the intercomparison, very little publicly available information can be found regarding the various instrument characteristics that influence the measurement results [20]. In addition, the RAMSES was the only sensor model that was represented in sufficiently large numbers for statistical analysis.

**Table 4.** Relative uncertainty budget for the downwelling irradiance (in percent), based on the spread of individual sensors measuring the same target during the outdoor comparison. Data highlighted in green are not used for combined and expanded uncertainties. Last row: Relative experimental variability of sensors evaluated from the results of field comparisons.


**Table 5.** Relative uncertainty budget for the radiance of blue sky (in percent), based on the spread of individual sensors pointing to the same target during the outdoor comparison. Data highlighted in green are not used for combined and expanded uncertainties. Last row: Relative experimental variability of sensors evaluated from the results of field comparisons.


**Table 6.** Relative uncertainty budget for the radiance of sunlit water (in percent), based on the spread of individual sensors pointing to the same target during the outdoor comparison. Data highlighted in green are not used for combined and expanded uncertainties. Last row: Relative experimental variability of sensors evaluated from the results of field comparisons.


In general the uncertainty is calculated from the contributions originating from: (1) The spectral responsivity of the radiometer, including data from the calibration certificate; (2) interpolation of the spectral responsivity values to the designated wavelengths and/or spectral bands; (3) temporal instability of the radiometer; (4) contribution caused by polarization sensitivity; (5) non-linearity effects; (6) effect of spectral stray light; (7) temperature effects; (8) error of cosine collector; (9) type A component of recorded signal; (10) alignment and FOV effects.
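How such contributions are usually combined can be sketched as follows: assuming the components are uncorrelated, the combined standard uncertainty is their root sum of squares, and the expanded uncertainty is obtained with a coverage factor *k* = 2. The component names follow the list above, but every numerical value below is a placeholder and not an entry from Tables 4–6; the `excluded` argument mirrors the treatment of components that are listed for reference only.

```python
import math

def combined_standard_uncertainty(components_percent, excluded=()):
    """Root sum of squares of uncorrelated relative uncertainty components (percent)."""
    return math.sqrt(sum(value ** 2
                         for name, value in components_percent.items()
                         if name not in excluded))

def expanded_uncertainty(components_percent, k=2.0, excluded=()):
    """Expanded uncertainty with coverage factor k (k = 2 assumed here)."""
    return k * combined_standard_uncertainty(components_percent, excluded)

# Placeholder budget in percent; the values are hypothetical.
budget = {
    "calibration": 1.0,
    "interpolation": 0.3,
    "instability": 0.1,
    "polarization": 0.2,
    "nonlinearity": 0.2,
    "stray_light": 0.5,
    "temperature": 0.4,
    "cosine_error": 2.0,
    "type_a": 0.3,
}
print(combined_standard_uncertainty(budget, excluded=("calibration",)))
print(expanded_uncertainty(budget, excluded=("calibration",)))
```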

The calibration uncertainty is most relevant for traceability to the SI units. The remaining uncertainty sources in Tables 4–6 describe variability between the sensors while overlooking possible systematic effects which can influence all the instruments in a similar way. Moreover, there was no fully characterized reference instrument involved during the LCE-2 outdoor exercise. Thus, the uncertainty analysis presented here is not sufficient to link the measurements to the SI units.

For the RAMSES group, the variability of radiance sensors during the indoor and outdoor exercises (Figure 11, except C8 and C23) was similar. Therefore, the variability due to a significant influence factor, temperature, and the respective estimate used in the uncertainty budget can be considered practically the same, as the rather large systematic change is likely similar for all sensors [7]. For example, during the outdoor measurements the temperature was rather stable, varying from 5 °C to 9 °C, a range fairly comparable with the variation of temperature during the indoor exercise from 21 °C to 24 °C. As the construction of radiance and irradiance sensors (except the input optics) is similar, a similar estimate is likely suitable also for the temperature-caused variability between irradiance sensors.

Some increase in variability may be expected, due to nonlinearity and spectral stray light of outdoor results. Major differences in combined uncertainty estimates for outdoor measurements are likely caused by different FOV of the sensors (including deviation from cosine response for irradiance instruments), and due to temporal variation and nonuniformity of the targets.

#### *4.1. Calibration Certificate*

The calibration certificates of the radiometers provide calibration points following the individual wavelength scale of the radiometer. During the relatively short time needed for LCE-2 measurements, this uncertainty component normally is not contributing to the variability between radiometric sensors freshly calibrated at the same laboratory using the same calibration standards. Therefore, this component is presented only for reference and is not included in the combined and expanded uncertainties. At the same time, for the full uncertainty of SI traceable results, the radiometric calibration uncertainty shall always be accounted for.

#### *4.2. Interpolation*

Interpolation of the radiometer data is needed due to differences between the individual wavelength scales of the radiometers. Therefore, measured values were transferred for comparison to a common scale basis (a selection of Sentinel-3/OLCI bands). The uncertainty contribution associated with the interpolation of spectra is estimated using different interpolation algorithms. The weights used for binning hyperspectral data to OLCI bands depend on the wavelength scale and exact pixel positions of the hyperspectral sensor. The interpolation component includes both the interpolation and the wavelength scale uncertainty contributions. Figure 16 shows the change of the OLCI band values of a measured spectrum as a function of the wavelength scale error of a radiometer, as determined for a single RAMSES radiance sensor for the casts C8, C12, C17, and C23. The precision of the wavelength scale of the RAMSES instrument is stated by the manufacturer as 0.3 nm. For a ±0.3 nm shift of the scale, the changes of the OLCI band values for the different spectra remain less than ±0.5% except for the 400 nm spectral band, where the radiance changes rapidly with wavelength and the effect of shifting the wavelength scale is stronger.
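The sensitivity of a band value to a wavelength scale error can be sketched by binning the same spectrum twice, once with the nominal pixel wavelengths and once with the scale shifted by 0.3 nm. The Gaussian band shape, its 10 nm width, the 3.3 nm pixel spacing, and the power-law test spectrum are all assumptions for illustration; the actual OLCI spectral response functions and the measured spectra are not reproduced here.

```python
import numpy as np

def band_value(wavelength_nm, spectrum, center_nm, fwhm_nm=10.0):
    """Gaussian-weighted mean of a spectrum around a band center (assumed band shape)."""
    sigma = fwhm_nm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    weights = np.exp(-0.5 * ((wavelength_nm - center_nm) / sigma) ** 2)
    return float(np.sum(weights * spectrum) / np.sum(weights))

def relative_change(wavelength_nm, spectrum, center_nm, shift_nm):
    """Relative change of a band value when the wavelength scale is off by shift_nm."""
    nominal = band_value(wavelength_nm, spectrum, center_nm)
    shifted = band_value(wavelength_nm + shift_nm, spectrum, center_nm)
    return shifted / nominal - 1.0

# Hypothetical pixel grid and a steep, sky-like test spectrum.
pixels_nm = np.arange(350.0, 950.0, 3.3)
sky_like = (pixels_nm / 550.0) ** -4.0

for center in (400.0, 490.0, 560.0, 665.0):
    print(center, relative_change(pixels_nm, sky_like, center, shift_nm=0.3))
```

For a spectrally flat target the shift has no effect, which is why the largest changes are expected where the signal varies rapidly with wavelength.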

**Figure 16.** Relative variability due to wavelength error of ±0.3 nm of a radiance sensor.
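The sensitivity analysis behind Figure 16 can be sketched as follows. This is a simplified illustration, not the processing used in the study: it assumes linear interpolation, a boxcar band of 15 nm width centered at 400 nm instead of the real OLCI band weights, a 3.3 nm pixel spacing, and a synthetic spectrum that rises by 1% per nm. All function names are hypothetical.

```python
from bisect import bisect_right

def interp_linear(x, xs, ys):
    """Linear interpolation of the points (xs, ys) at x; xs must be ascending."""
    i = min(max(bisect_right(xs, x), 1), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    return ys[i - 1] + (ys[i] - ys[i - 1]) * (x - x0) / (x1 - x0)

def band_value(wavelengths, values, centre, width, step=0.1):
    """Boxcar band average (an assumption; real band weights differ)."""
    n = int(round(width / step))
    grid = [centre - width / 2 + k * step for k in range(n + 1)]
    return sum(interp_linear(x, wavelengths, values) for x in grid) / len(grid)

def relative_change(pixel_wl, values, shift_nm, centre, width):
    """Relative change of a band value when the wavelength scale is shifted."""
    shifted = [w + shift_nm for w in pixel_wl]
    reference = band_value(pixel_wl, values, centre, width)
    return band_value(shifted, values, centre, width) / reference - 1.0

# Synthetic sensor: 180 pixels from 350 nm in 3.3 nm steps (assumed spacing).
pixel_wl = [350.0 + 3.3 * k for k in range(180)]
# Synthetic spectrum rising by 1 % per nm around 400 nm (illustrative only).
values = [1.0 + 0.01 * (w - 400.0) for w in pixel_wl]

change_plus = relative_change(pixel_wl, values, 0.3, 400.0, 15.0)
change_minus = relative_change(pixel_wl, values, -0.3, 400.0, 15.0)
print(f"+0.3 nm: {100 * change_plus:+.2f} %, -0.3 nm: {100 * change_minus:+.2f} %")
```

For a locally linear spectrum the relative change is simply the spectral slope multiplied by the shift, which is why bands in steep parts of the spectrum, such as 400 nm, are affected most.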

#### *4.3. Temporal Instability of Sensor*

Instability of the radiometric responsivity can be estimated from the data of repeated radiometric calibrations. For LCE-2, the instruments were calibrated just before the comparisons, so only the short-term instability relevant for the duration of the measurements has to be considered. The values are derived from the data collected in the calibration sessions of LCE-2 and of FICE-AAOT a year later, see Section 4.1.1 and Figure 10 in Reference [1]. The instability over two weeks was interpolated from the yearly variability data assuming only smooth drift (no mechanical shocks, no abrupt changes). Besides the instability of the sensors, the data in Figure 10 of [1] include other uncertainty components related to the calibration setup (e.g., alignment and short-term lamp instability).

#### *4.4. Polarization*

For the outdoor radiance measurements, the uncertainty contribution caused by polarization sensitivity is estimated using worst-case data in Reference [17]. Evaluation of the polarization effect for the outdoor irradiance measurement is difficult, as the degree of linear polarization (DoLP) depends on various factors, such as wavelength, solar zenith angle (SZA), aerosol optical depth (AOD), and the amount and location of clouds. In addition, the DoLP can vary strongly over the hemisphere: due to Rayleigh scattering it is largest at 90° from the Sun, and it decreases to zero for the direct solar flux. However, according to Reference [17], the polarization sensitivity of RAMSES irradiance sensors is rather small. Hence, regardless of the DoLP value of the downwelling irradiance, the contribution of the polarization effect to the uncertainty budget is also small. The uncertainty component of solar irradiance associated with polarization is estimated to be less than 0.25%.

#### *4.5. Nonlinearity*

The nonlinearity of the participating radiometers was evaluated by varying the integration time during the calibration. As an automatically adjusted optimal integration time is typically used in field conditions, a class-specific method for the RAMSES instruments was developed and validated using the indoor results, see Equation (6) in Reference [1]. Variability between sensors due to nonlinearity was evaluated by applying this equation to the different casts of the field spectra of five RAMSES sensors.

#### *4.6. Spectral Stray Light*

Figure 17 presents the impact of stray light in the outdoor measurements. The effect is much stronger than in the indoor experiment because the spectral shapes of the target and calibration signals differ significantly. The general impact of the stray light correction is similar for RAMSES and HyperOCR radiometers. Variability between sensors and between different measurement targets for HyperOCR radiometers increases significantly in the NIR spectral region. This is probably related to the uncertainty associated with the stray light correction procedure and is not characteristic of the actual impact of spectral stray light. The spectral stray light matrices of the HyperOCR sensors used in the analysis had a somewhat higher noise level than the matrices of the RAMSES instruments. The data in Tables 5 and 6 are estimated from Figure 17 and from References [21,22].

**Figure 17.** Stray light effects in the outdoor experiment. One RAMSES 8329 irradiance sensor—dashed line; two RAMSES and two HyperOCR radiance sensors: Solid lines with markers—blue sky (C12), and solid lines without markers—sunlit water (C23).

#### *4.7. Temperature*

For array spectroradiometers with silicon detectors, the present estimate of the standard uncertainty due to temperature variability (±1.5 °C) is around 0.1% in the spectral region from 400 nm to 700 nm and increases up to 0.6% at longer wavelengths (950 nm) [7]. In the case of the outdoor measurements, the temperature differences between sensors were quite likely in the range of ±2 °C, so the temperature contribution is slightly larger than for the indoor experiment. However, the outside air temperature, between 5 °C and 9 °C, was significantly lower than the calibration temperature. This contributes to systematic biases that are common to all the instruments and are not accounted for in Tables 4–6.

#### *4.8. Cosine Error*

The irradiance sensors are calibrated using normal illumination, but during outdoor solar irradiance measurements the radiation arriving from the whole hemisphere has to be measured, with the angular dependence of responsivity ideally corresponding to the cosine of the incidence angle. Typical class-specific values of uncertainty related to the deviation from cosine response are derived from Reference [8]. Measurements carried out after LCE-2 at TO (Figures 13 and 14) have shown that RAMSES sensors may have rather large cosine errors of around ±10%. This is a likely reason for the excessive differences, from +7% to −16%, evident for irradiance sensors during the LCE-2 outdoor measurements (Figure 10).
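The link between an angular response and an integral cosine error can be sketched as below. This is one common formulation and not necessarily the one used for Figures 13–15: it assumes an isotropic diffuse sky, for which the contribution of each zenith angle to the irradiance is weighted by sin(2θ), and it keeps the sign of the error, whereas the CIE definition integrates the absolute value. The angle grid and the two responses are synthetic.

```python
import math

def cosine_error(angle_deg, response, response_normal):
    """Relative deviation from the ideal cosine response at one incidence angle."""
    ideal = response_normal * math.cos(math.radians(angle_deg))
    return response / ideal - 1.0

def integral_cosine_error(angles_deg, responses):
    """Trapezoidal integral of f2(theta) * sin(2*theta) over the tabulated angles.

    angles_deg must be ascending, start at 0 and stay below 90 degrees.
    The sin(2*theta) weight assumes isotropic sky radiance.
    """
    r0 = responses[0]
    theta = [math.radians(a) for a in angles_deg]
    g = [cosine_error(a, r, r0) * math.sin(2.0 * t)
         for a, r, t in zip(angles_deg, responses, theta)]
    return sum(0.5 * (g[i] + g[i + 1]) * (theta[i + 1] - theta[i])
               for i in range(len(g) - 1))

# Synthetic responses tabulated every 5 degrees from 0 to 85 degrees.
angles = list(range(0, 90, 5))
ideal = [math.cos(math.radians(a)) for a in angles]
# Hypothetical sensor reading 5 % low at every oblique angle.
low = [1.0] + [0.95 * c for c in ideal[1:]]

err_ideal = integral_cosine_error(angles, ideal)
err_low = integral_cosine_error(angles, low)
print(f"ideal: {100 * err_ideal:.2f} %, hypothetical sensor: {100 * err_low:.2f} %")
```

Under a clear sky the direct solar component dominates, so the error at the solar zenith angle itself has to be considered in addition to this diffuse-sky integral.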

#### *4.9. Type A Uncertainty of Repeated Measurements*

The type A uncertainty was estimated from the ratio of two RAMSES radiometers. While there is strong autocorrelation in individual time series, due to the unstable nature of natural illumination, there was almost no correlation between individual ratios during one cast, and the effective number of measurements was close to the actual number of data points in the time series. The effective number of measurements was calculated by using the lag-1 autocorrelation coefficient, as shown in Reference [23].
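A minimal sketch of the effective sample size calculation is given below. It uses the widely applied first-order approximation n_eff = n(1 − r1)/(1 + r1), where r1 is the lag-1 autocorrelation coefficient. Whether Reference [23] uses exactly this form is an assumption, as is the choice to clip negative r1 to zero so that n_eff never exceeds n. The two test series are synthetic.

```python
import math
from statistics import mean, stdev

def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation coefficient of a time series."""
    m = mean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series[:-1], series[1:]))
    den = sum((x - m) ** 2 for x in series)
    return num / den

def effective_sample_size(series):
    """n_eff = n * (1 - r1) / (1 + r1), with negative r1 clipped to zero."""
    r1 = max(lag1_autocorrelation(series), 0.0)
    return len(series) * (1.0 - r1) / (1.0 + r1)

def type_a_uncertainty(series):
    """Standard uncertainty of the mean using the effective sample size."""
    return stdev(series) / math.sqrt(effective_sample_size(series))

# Synthetic examples: a smooth ramp is strongly autocorrelated, while an
# alternating series has no positive autocorrelation.
ramp = [float(k) for k in range(100)]
alternating = [1.0, -1.0] * 5

n_eff_ramp = effective_sample_size(ramp)
n_eff_alt = effective_sample_size(alternating)
print(f"ramp: n_eff = {n_eff_ramp:.2f} of {len(ramp)}")
print(f"alternating: n_eff = {n_eff_alt:.1f} of {len(alternating)}")
```

Applied to the ratio of two simultaneously measuring sensors, a small r1 gives n_eff close to n, which matches the behavior described above.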

The instability of the target signal during the outdoor measurements was significantly larger than in the indoor experiment. However, all the instruments measured simultaneously, so the temporal variability of the source affected all the radiometers in a similar manner without causing differences between the sensors. This was verified by separately analyzing some shorter and more stable sections of the selected casts: no reduction of the variability between the sensors was observed. Thus, the uncertainty due to the temporal variability of the target is not included in Tables 4–6.

#### *4.10. Alignment and Field-of-View*

During the outdoor radiance measurements, the spatial and temporal non-uniformity of the target can contribute substantially to the uncertainty because of the different FOVs of the radiance sensors and because of misalignment (Figures 7 and 8).

#### **5. Discussion and Conclusions**

For irradiance, the difference in cosine response is the main source of the differences between sensor groups revealed during the field experiment. For radiance, the angular response (different fields of view) and the spatial nonuniformity of the targets are the main sources of difference between sensor groups. In the case of a spatially heterogeneous target (sky with scattered clouds, water at an oblique viewing angle), large differences in the FOV of different sensors will likely cause significant discrepancies between sensors. Without reliable data or individual testing of the input optics properties of all involved sensors, the interpretation of measurement results may be strongly hindered. For the field measurements, the variability between radiance sensors was about two times larger than during the indoor exercise. This can be explained, among other things, by the larger effects of influence factors such as temperature, stray light, and nonlinearity, none of which were corrected for during the field experiment.

The dependence of the calibration coefficients on temperature can cause significant deviation from the SI-traceable result. For a maximum temperature difference of about 20 °C between calibration and later measurements (typically between 0 °C and 40 °C), a responsivity change of more than 10% is possible [3,7]. The calibration procedure may be improved if the specified calibration conditions cover all situations that can occur during the use of the calibrated instrument. For example, if it is known that the radiometer has a linear response with temperature [7], the responsivity of the radiometer can be adequately evaluated when calibration is performed at three different temperatures covering the possible range of temperature variations during later use.
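The three-temperature approach can be illustrated with a short sketch, assuming a linear dependence of responsivity on temperature at a single wavelength. The temperatures, the normalized responsivities, and the function names are hypothetical and serve only to show the principle.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical calibration results at three temperatures (degrees Celsius),
# normalized to the responsivity at 20 degrees.
temperatures = [5.0, 20.0, 35.0]
responsivities = [0.97, 1.00, 1.03]

slope, intercept = fit_line(temperatures, responsivities)

def responsivity_at(temperature):
    """Responsivity predicted by the fitted linear model."""
    return intercept + slope * temperature

# Predicted responsivity at a field temperature of 7 degrees Celsius.
field_responsivity = responsivity_at(7.0)
print(f"slope = {100 * slope:.2f} % per degree, R(7 degC) = {field_responsivity:.3f}")
```

With three points, the residuals of the fit also give a first check of whether the linearity assumption holds for the instrument in question.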

Variability between irradiance sensors was about five times larger than that observed during the indoor exercise. The large variability between sensors during the outdoor exercise cannot be explained by poor stability of the sensors, as the stability check in laboratory conditions a year later showed smaller changes than those seen during the outdoor measurements some days after calibration. Nor can the variability be fully explained by factors such as temperature, nonlinearity, and stray light, as one would then expect a smaller difference between radiance and irradiance sensors. Most likely, the different behavior of RAMSES and HyperOCR sensors is largely due to the different construction of the input optics of these sensors and hence imperfect cosine response. This hypothesis is supported by the angular response characterization of five RAMSES irradiance sensors and by comparing the integral cosine error values in Figures 13–15 with the deviations from the consensus value in the outdoor experiment, shown in Figure 10.

The different behavior of RAMSES and HyperOCR sensor groups was clearly revealed during the LCE-2 exercise. For the RAMSES group, the variability of radiance sensors during indoor and outdoor exercises was very similar, and larger variability for outdoor measurements was mostly caused by HyperOCR and WISP-3 sensors. For irradiance measurements, the deviation of HyperOCR sensors from the consensus value of the group was very small, and an increase in variability was caused mostly by the group of RAMSES sensors.

The indoor experiment has demonstrated that radiometric calibration at the same laboratory just before the intercomparison measurements [1] is highly effective for obtaining consistent results. Nevertheless, sufficient individual characterization of the radiometers, testing them for all significant systematic effects in addition to the regular radiometric calibration, is the shortest way to reduce biases in outdoor intercomparisons. It leads to smaller variability between measurements from different instruments and to a more realistic and complete quantification of measurement uncertainties.

#### *Lessons Learned for the Design of Future Intercomparisons*

To facilitate the interpretation of results, the following suggestions are proposed for future outdoor intercomparison campaigns. The number of involved radiometers should be around ten for each radiometer type to strengthen the statistical representativeness of the analysis. Consistent calibration of the responsivity of all involved radiometers just before the campaign is indispensable. The 20° "offsetting" calibration method suggested in Reference [8] should also be tested by a comparison measurement. The calibration history of each radiometer should be available to detect long-term instabilities. Together with the radiometric calibration, the angular response of all individual radiance and irradiance sensors should be measured if such information is not available from previous characterizations. Before the radiometric calibration, all instruments involved should be tested or characterized for temperature, nonlinearity, spectral stray light, and wavelength scale effects. As these tests may be rather time-consuming, they should be performed well before the radiometric calibration. Spectral responsivity should be calibrated at different ambient temperatures relevant for the campaign. Nonlinearity and wavelength correction coefficients should also be available. The usefulness of individual characterization of the spectral stray light should be further proven by thorough field tests using an independent validation method based on a reference instrument less affected by stray light.

During the outdoor campaign measurements, well-synchronized data acquisition for all instruments is strongly advised. Start times should be aligned to within ±1 s, and exactly the same sampling interval must be set for all sensors. Data processing algorithms should be well defined and agreed between the participants. For that purpose, sufficient calibration and test information should be available for each sensor so that all necessary corrections can be applied in the same way. The temperature of the instruments should be recorded whenever possible. Using a well-characterized additional reference instrument is highly recommended, as is using an aligned photo or video camera to record the measurement scene during outdoor experiments simultaneously with the radiometric sensors.

Metrological specifications of all involved radiometers should be based on suitable international standards whenever possible. Minimum requirements should be agreed between the participants, and the instruments involved should be tested to give evidence that all these minimum requirements are met.

**Author Contributions:** Conceptualization, V.V., J.K., I.A., R.V., K.A., K.R., C.D., T.C.; Data curation, V.V., J.K.; Formal analysis, V.V., I.A.; Investigation, J.K., I.A., R.V., K.A., K.R., A.A., M.B., H.B., M.C., D.D., G.D., B.D., T.D., C.G., K.K., M.L., B.P., G.T., R.V.D., S.W.; Methodology, V.V., J.K., I.A., R.V., K.A., K.R., M.C., K.K.; Project administration, R.V.; Software, I.A.; Validation, V.V., J.K., I.A.; Visualization, V.V., J.K., I.A., K.R., M.L.; Writing—original draft, V.V., J.K., I.A., R.V., K.A.; Writing—review and editing, V.V., J.K., I.A., R.V., K.A., K.R., A.A., M.B., H.B., M.C., D.D., G.D., B.D., T.D., C.G., K.K., M.L., B.P., G.T., R.V.D., S.W.; A.B., C.D., T.C.

**Funding:** This work was funded by European Space Agency project Fiducial Reference Measurements for Satellite Ocean Color (FRM4SOC), contract No. 4000117454/16/I-Sbo.

**Acknowledgments:** Support from numerous scientists, experts and administrative personnel contributing to the project is gratefully acknowledged. The authors express special gratitude to Giuseppe Zibordi (JRC of the EC), Anu Reinart (UT), Tiia Lillemaa (UT), Viljo Allik (UT), and Claire Greenwell (NPL).

**Conflicts of Interest:** The authors declare no conflict of interest. Most authors collecting experimental data are customers of the respective instrument manufacturers. Two authors (R.V.D., B.D.) are employees of radiometer manufacturers. Two authors (J.K., K.R.) are developing a new radiometer for commercialization in 2021. Study data analysis and conclusions focus on aspects which are common to all radiometers and results are anonymized to avoid any risk of bias. This study does not constitute an endorsement of any of the products tested by any of the authors or their organizations.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
