1. Introduction
The National Polar-orbiting Operational Environmental Satellite System (NPOESS) Preparatory Project (NPP) satellite was launched on 28 October 2011 into a sun-synchronous, near-polar orbit with an ascending daytime equator crossing time of 13:30 and a 16-day repeat cycle. Soon after launch the satellite was renamed Suomi-NPP (S-NPP) in honor of Verner Suomi, and NPP now stands for “National Polar-orbiting Partnership.” The equator crossing time of S-NPP is the same as that of the A-Train satellites [1], yet S-NPP’s orbit is higher (824 km) than those of the A-Train satellites (705 km), and thus S-NPP’s orbital period is somewhat longer than that of the A-Train.
S-NPP carries five major instruments, of which the Visible Infrared Imaging Radiometer Suite (VIIRS) [2,3] is the focus here. VIIRS is the successor to MODIS (Moderate-resolution Imaging Spectroradiometer) [4,5] and, like MODIS, it is a multispectral imaging radiometer taking measurements in discrete spectral bands in the visible and infrared parts of the electromagnetic spectrum, for research and applications related to the land, atmosphere, cryosphere, and ocean. VIIRS has nine spectral bands in the visible and near infrared plus a panchromatic day/night band, eight bands in the short- and mid-wave infrared, and four in the thermal infrared. Four of the bands, two each in the mid-wave and thermal infrared, are suitable for the retrieval of Sea-Surface Temperature (SST).
SST is a critically important parameter in the climate system and has been declared an Essential Climate Variable (ECV) [6]. The generation of Climate Data Records (CDRs) of SST from the data of multiple satellite radiometers is tractable because temperature measurements can be made traceable to the kelvin, one of the seven base units of the International System of Units, universally abbreviated as SI (from the French Le Système International d’Unités) [7]. Such traceability to SI temperature standards is achieved through comparison with ship radiometers with SI-traceable calibration [8,9].
CDRs were formally defined in a report of the U.S. National Academy of Sciences [10] as “a data set designed to enable study and assessment of long-term climate change, with ‘long-term’ meaning year-to-year and decade-to-decade change. Climate research often involves the detection of small changes against a background of intense, short-term variations … The production of CDRs requires repeated analysis and refinement of long-term data sets, usually from multiple data sources.” The report emphasized the need for “data stability,” reasoning that “because natural signals are often small, it is difficult to ascribe particular events or processes to climate change … For this reason, long-term, high-quality measurements are needed to discern subtle shifts in Earth’s climate. Such measurements require an observing strategy emphasizing a strong commitment to maintaining data quality and minimizing gaps in coverage.” Because of the global coverage provided by polar-orbiting earth observation satellites, satellite-derived SSTs are perceived as the basis of SST CDRs. As such, the accuracy characteristics of a long time series of satellite-derived SSTs must be convincingly determined [11].
An important aspect of the generation of an SST-CDR is the need to splice together SST retrievals from several consecutive missions, and this inevitably implies different satellite radiometer designs and characteristics, as each generation of instruments benefits from technological improvements [12]. An unbroken chain of calibration to an SI temperature reference for each source of satellite-derived SSTs is an important factor in generating SST CDRs using data from multiple satellite missions [13]. Additionally, overlap of data derived from successive instruments is helpful, if not vital, to the generation of all satellite-based CDRs [14].
Another aspect of generating SST-CDRs from several satellite instruments is the use of comparable algorithms to derive SST from the Top-of-Atmosphere (TOA) Infrared (IR) radiance measurements. Nevertheless, the algorithms for screening cloud-contaminated pixels and for correcting the effects of the clear-sky atmosphere have to be optimized for the characteristics of each radiometer, so identical algorithms cannot justifiably be applied to the measurements of successive instruments. Compatibility with the MODIS SST algorithms [5] has been a guiding principle in our approach to deriving SST from VIIRS data. Of course, this does not mean that the MODIS algorithms have to be “frozen in time.” Instead, an iterative approach has been adopted in which algorithm improvements derived from VIIRS can be applied to MODIS, if appropriate, as was the case in the recent reprocessing of the complete records of both MODIS missions, designated R2019.0 [15], which includes new cloud screening algorithms that were initially developed for S-NPP VIIRS [16].
Based on the expected magnitude of a climate change signal, the requirements of an SST CDR are an accuracy of 0.1 K and a stability of 0.04 K decade⁻¹ [17]. Ohring et al. [17] define “accuracy” as the measured bias or systematic error of the measurements, i.e., the difference between a short-term average measured value and the physical value; they define short-term average as “the average of a sufficient number of successive measurements of the variable under identical conditions, such that the random error is negligible relative to the systematic error.” In turn, “stability may be thought of as the extent to which the accuracy remains constant in time. Stability is measured by the maximum excursion of the short-term-average measured value of a variable under identical conditions over a decade. The smaller the maximum excursion, the greater the stability of the dataset.” Achieving such levels of accuracy and stability in satellite-derived SSTs is very challenging. Indeed, demonstrating the true accuracy of satellite-derived SSTs, which is done by comparison with independent measurements [18,19,20,21,22], requires a concomitant understanding of the accuracies of the validating sensors, of the consequences of the algorithms used to process their data, and of the variability introduced by the method of comparison [23].
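The two definitions of Ohring et al. can be made concrete with a small sketch. It assumes the input is a series of short-term-average biases (for example, annual mean satellite-minus-reference differences, in K) spanning one decade, and it interprets “maximum excursion” as the range (maximum minus minimum) of those averages; both choices are our reading of the quoted definitions, not a prescribed procedure, and the function names are hypothetical.

```python
from typing import Sequence

ACCURACY_REQUIREMENT_K = 0.1                 # from Ohring et al. [17], as quoted above
STABILITY_REQUIREMENT_K_PER_DECADE = 0.04


def accuracy(short_term_mean_bias_k: float) -> float:
    """Accuracy: magnitude of the systematic error of one short-term average (K)."""
    return abs(short_term_mean_bias_k)


def stability(decade_of_short_term_biases_k: Sequence[float]) -> float:
    """Stability: maximum excursion (assumed max - min) of short-term averages over a decade."""
    if len(decade_of_short_term_biases_k) < 2:
        raise ValueError("need at least two short-term averages")
    return max(decade_of_short_term_biases_k) - min(decade_of_short_term_biases_k)


def meets_cdr_requirements(decade_of_short_term_biases_k: Sequence[float]) -> bool:
    """True if every short-term bias and the decadal excursion satisfy both requirements."""
    worst = max(accuracy(b) for b in decade_of_short_term_biases_k)
    return (worst <= ACCURACY_REQUIREMENT_K
            and stability(decade_of_short_term_biases_k) <= STABILITY_REQUIREMENT_K_PER_DECADE)
```

Note that the two requirements are independent: a record with a constant 0.15 K bias is perfectly stable but fails the accuracy requirement, while a record whose biases drift from 0.00 K to 0.06 K is always within 0.1 K yet fails the stability requirement.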
The temperature structure at the sea surface is complex, with temporal and spatial variability on many scales. Satellite measurements on horizontal scales of order 1 km do not resolve smaller-scale variations, such as surface renewal events [24,25,26]. Infrared radiometers on earth observation satellites detect the emission from the sea surface, which originates in the electromagnetic skin layer (see Feynman et al. [27], Volume 2, Chapter 32, Section 7) on the aqueous side of the air-sea interface and which is modified by passage through the atmosphere. Thus, we refer to the SST derived through IR radiometry as SSTskin [28]. The depth of the electromagnetic skin layer in the infrared is very small, 10–100 μm depending on wavelength [29], and this layer is embedded in the mean thermal skin layer and the viscous sublayer [30,31]. The viscous sublayer, and the density difference between seawater and air, dampen turbulence close to the interface. In nearly all situations, the boundary layer of the atmosphere is cooler than the ocean surface, so the heat flow is upward to the interface and, within the thermal skin layer, takes place by molecular conduction. The upward heat flow provides energy for the sensible and latent heat loss at the interface and the net infrared heat loss through the electromagnetic skin layer. Given that conduction requires a temperature gradient, the temperature in the thermal skin layer decreases towards the interface [32,33,34,35,36,37,38]; this is often referred to as the “skin effect.” The emission from the electromagnetic skin layer is thus from a layer colder than the water beneath. Consequently, the temperature derived from an IR radiometer is usually cooler than the temperature below, such as that measured by a thermometer in the water beneath [39,40].
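The conductive argument above can be turned into an order-of-magnitude estimate with Fourier’s law, q = k ΔT/δ, rearranged to ΔT = q δ/k. The sketch below assumes a steady, purely conductive thermal skin layer with a linear temperature profile, a seawater thermal conductivity of about 0.6 W m⁻¹ K⁻¹, and illustrative values of the net surface heat loss and layer thickness; none of these numbers are results from this paper, and the function name is hypothetical.

```python
def skin_temperature_difference(net_heat_loss_w_m2: float,
                                thermal_skin_thickness_m: float,
                                conductivity_w_m_k: float = 0.6) -> float:
    """Temperature drop (K) across a purely conductive thermal skin layer.

    Fourier's law for steady conduction, q = k * dT / dz, rearranged to
    dT = q * dz / k. Assumes a linear temperature profile within the layer.
    """
    if thermal_skin_thickness_m <= 0 or conductivity_w_m_k <= 0:
        raise ValueError("thickness and conductivity must be positive")
    return net_heat_loss_w_m2 * thermal_skin_thickness_m / conductivity_w_m_k


# Illustrative values only: ~100 W m^-2 net heat loss through a ~1 mm layer.
delta_t = skin_temperature_difference(100.0, 1.0e-3)
print(f"SSTskin is cooler than the water beneath by ~{delta_t:.2f} K")
```

The estimate is a few tenths of a kelvin at most, which is why the skin effect matters when comparing SSTskin with subsurface buoy temperatures at the 0.1 K accuracy level discussed above.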
Following a description of the relevant characteristics of VIIRS, this paper continues with a discussion of the independent data used to derive the algorithms and to assess the accuracy of VIIRS SSTskin. The cloud-screening and atmospheric correction algorithms are discussed prior to showing the results of comparisons between VIIRS SSTskin and independent data. A discussion of the results and the conclusions follow.
Appendix A presents a discussion of pixel-level quality flags, and the Acronyms section lists the acronyms used and their meanings.
2. VIIRS Characteristics
The heritage IR radiometers of VIIRS are the Advanced Very-High-Resolution Radiometers (AVHRR) [18,41] on the NOAA and MetOp polar-orbiting satellites, and the MODIS [5,42] on the NASA satellites Terra and Aqua. For the reflected solar radiation bands in the visible part of the spectrum, the heritage instruments are the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) [43,44] and MODIS [4,45]. The VIIRS design includes the best aspects of the heritage instruments, thereby reducing the risk of introducing instrumental artifacts, while simultaneously incorporating recent technological developments.
VIIRS includes the following heritage components:
The SeaWiFS foreoptics, comprising a rotating telescope whereby the angle of incidence on the primary scan mirror is constant across the entire scan. This design avoids the complication of wavelength-dependent reflectivity varying with scan angle, inherent in the design of the paddle-wheel scan mirror of MODIS, which presented issues with quantitative radiometry in the first year of the Terra mission [5,46].
Multiple detectors for each spectral band, from MODIS, which has 10 detectors per infrared band, each having a 1 km² surface field of view at nadir [47]. VIIRS has 16 detectors for each band, with a 0.75 km × 0.75 km resolution at nadir [48]. These moderate-resolution bands are designated as “M” spectral bands.
To retain image integrity across the swath, a plane mirror between the rotating telescope and the aft optics has to rotate at half the rate of the telescope. This double-sided component, called the “Half Angle Mirror,” was taken from the SeaWiFS design.
The radiometric calibration of the VIIRS infrared channels follows the same approach, and uses the same components, as MODIS. The calibration relies on measurements of (a) cold space, in the direction away from the sun, and (b) the emission from a well-characterized internal blackbody target, whose temperature is measured by several embedded thermometers [49,50].
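The principle of the two-point calibration can be illustrated with a simplified sketch: raw counts are mapped to radiance along a line through the cold-space view (taken as zero radiance) and the blackbody view, whose radiance follows from the Planck function at the measured blackbody temperature, and the result is inverted to a brightness temperature. This is not the operational VIIRS calibration: it assumes a linear detector response and a single monochromatic wavelength instead of a band-integrated spectral response, and it ignores the nonlinearity and mirror-reflectance terms an operational scheme includes. The function names, counts, and the 292 K blackbody temperature in the usage lines are hypothetical.

```python
import math

H = 6.62607015e-34   # Planck constant (J s)
C = 2.99792458e8     # speed of light (m s^-1)
K_B = 1.380649e-23   # Boltzmann constant (J K^-1)


def planck_radiance(wavelength_m: float, temperature_k: float) -> float:
    """Blackbody spectral radiance (W m^-2 sr^-1 m^-1) at one wavelength."""
    a = 2.0 * H * C**2 / wavelength_m**5
    b = H * C / (wavelength_m * K_B * temperature_k)
    return a / math.expm1(b)


def brightness_temperature(wavelength_m: float, radiance: float) -> float:
    """Invert the Planck function: T = b / ln(1 + a / L)."""
    a = 2.0 * H * C**2 / wavelength_m**5
    b = H * C / (wavelength_m * K_B)
    return b / math.log1p(a / radiance)


def two_point_calibration(counts: float, counts_space: float, counts_bb: float,
                          wavelength_m: float, t_bb_k: float) -> float:
    """Linear counts-to-radiance conversion anchored on cold space and the blackbody."""
    l_bb = planck_radiance(wavelength_m, t_bb_k)
    gain = l_bb / (counts_bb - counts_space)   # radiance per count
    return gain * (counts - counts_space)      # cold space assumed to have zero radiance


# Hypothetical counts for an 11 um channel; the scene is half-way between space and blackbody.
scene = two_point_calibration(1600, 100, 3100, 11e-6, 292.0)
print(f"scene BT ~{brightness_temperature(11e-6, scene):.1f} K")
```

Because the Planck function is nonlinear, a scene radiance half-way between cold space and the blackbody corresponds to a brightness temperature well below half of 292 K’s radiance-equivalent in kelvin, which is why calibration is done in radiance and only then converted to temperature.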
The spectral characteristics of the IR bands are constrained by atmospheric transmission, as an SSTskin retrieval requires the sensed signal to have a useful component originating at the surface, even though it is modified by the intervening atmosphere. As with the heritage instruments, the VIIRS bands used for SSTskin retrieval are placed in two atmospheric transmission “windows,” in the mid-IR (3.5 μm < λ < 4.1 μm) and in the thermal IR (10 μm < λ < 13 μm). In Figure 1, the atmospheric transmission spectrum was calculated using the Line-By-Line Radiative Transfer Model (LBLRTM) [51] with the atmospheric profiles of temperature and humidity taken from the ECMWF ERA-Interim [52] reanalysis dataset for 1 July 2009 at 00:00 at 0.0° N, 0.0° E. Profiles of the other gases were taken from the Standard Tropical Atmosphere, and the spectroscopic properties of the gases from the HITRAN database [53]. The VIIRS bandwidths, being about 1 μm, are more comparable to those of the AVHRR and are about twice as wide as the corresponding MODIS bands (Figure 1; Table 1). The on-orbit Noise Equivalent Temperature Differences (NEΔT) of both MODIS instruments, on Terra and Aqua [54], have been found to be smaller than those of S-NPP VIIRS [55]. A small NEΔT is desirable, as it means that each brightness temperature derived from the radiance measurement is less noisy, and that the brightness temperature differences needed in the atmospheric correction algorithm (see Section 4.2) are also more accurate and become dominated by noise only at smaller values.
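Why the noise of each channel matters for a brightness temperature difference can be shown with standard error propagation: if the noise in the two channels is independent, their variances add, so the 1σ noise of T11 − T12 is the root-sum-square of the two NEΔT values. The sketch below assumes uncorrelated channel noise; the NEΔT values (0.03 K and 0.04 K) and the function names are hypothetical and are not the measured VIIRS or MODIS figures.

```python
import math


def bt_difference_noise(nedt_a_k: float, nedt_b_k: float) -> float:
    """1-sigma noise (K) of a brightness temperature difference.

    Assumes the noise in the two channels is independent, so variances add.
    """
    return math.hypot(nedt_a_k, nedt_b_k)


def difference_snr(bt_difference_k: float, nedt_a_k: float, nedt_b_k: float) -> float:
    """Ratio of the BT-difference signal to its noise."""
    return abs(bt_difference_k) / bt_difference_noise(nedt_a_k, nedt_b_k)


# Hypothetical NEdT values; small (dry atmosphere) vs. large (moist atmosphere) differences.
for signal_k in (0.5, 3.0):
    print(signal_k, round(difference_snr(signal_k, 0.03, 0.04), 1))
```

The signal-to-noise ratio falls in proportion to the size of the difference, so in dry atmospheres, where T11 − T12 is small, the difference term is the first to become noise-dominated.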
The range of rotating telescope angles for which earth view data are taken is ±56° from nadir, resulting in a swath width of ~3000 km, which is wide enough to ensure overlap of adjacent swaths at all latitudes.
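The ~3000 km figure can be checked with simple spherical-Earth geometry: the law of sines in the triangle formed by the Earth’s centre, the satellite, and the ground point gives sin(ζ) = (R + h)/R · sin(θ_scan), where ζ is the satellite zenith angle at the ground, and the Earth-central angle is ζ − θ_scan. The sketch assumes a spherical Earth of radius 6371 km and the 824 km altitude quoted in the Introduction; it ignores the ellipsoid and terrain, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean spherical Earth radius (assumption)
ALTITUDE_KM = 824.0        # S-NPP altitude quoted in the Introduction


def ground_geometry(scan_angle_deg: float, altitude_km: float = ALTITUDE_KM,
                    radius_km: float = EARTH_RADIUS_KM):
    """Return (satellite zenith angle in deg, Earth-central angle in rad).

    Law of sines in the triangle Earth centre / satellite / ground point:
    sin(zenith) = (R + h) / R * sin(scan).
    """
    scan = math.radians(scan_angle_deg)
    zenith = math.asin((radius_km + altitude_km) / radius_km * math.sin(scan))
    central_angle = zenith - scan
    return math.degrees(zenith), central_angle


def swath_width_km(max_scan_deg: float, altitude_km: float = ALTITUDE_KM,
                   radius_km: float = EARTH_RADIUS_KM) -> float:
    """Great-circle width of a swath symmetric about nadir."""
    _, central_angle = ground_geometry(max_scan_deg, altitude_km, radius_km)
    return 2.0 * radius_km * central_angle


zenith_edge, _ = ground_geometry(56.0)
print(f"swath ~{swath_width_km(56.0):.0f} km, edge zenith angle ~{zenith_edge:.1f} deg")
```

The same geometry shows that the satellite zenith angle at the swath edge is close to 70°, much larger than the 56° scan angle, which is why the atmospheric correction needs an explicit path-length term.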
A new feature of the VIIRS optical system and on-orbit processing is pixel aggregation. This approach was developed to reduce the growth in pixel size that results from the earth’s curvature and beam spreading along longer atmospheric paths away from nadir, often referred to as the “bow-tie” effect. The native nadir pixel size for the moderate-resolution bands, which include those used for SSTskin retrievals, is ~250 m in the scan direction and ~750 m in the along-track direction. Three successive measurements are averaged on board prior to transmission to the ground to yield a pixel size of 750 m × 750 m at nadir. At absolute scan angles >31.72° from nadir, the on-board processor averages two successive measurements, and for absolute scan angles >47.87°, individual measurements are transmitted (Figure 2) [48,57,58]. At each aggregation transition, the horizontal sampling interval along the scan returns to a value close to that at nadir. As a result, the growth in the size of the samples towards the edges of the swath (Figure 3) is much smaller for VIIRS than for previous broad-swath imaging radiometers.
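The aggregation scheme can be sketched by combining the scan-angle thresholds given above with a spherical-Earth estimate of the native along-scan footprint, which grows with the slant range and with the obliquity of the view (footprint ≈ IFOV × slant range / cos ζ). The sketch assumes a spherical Earth (R = 6371 km), the 824 km altitude, and an angular sampling interval chosen to give 250 m at nadir; it ignores the detector response and the along-track dimension, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0       # spherical Earth (assumption)
ALTITUDE_KM = 824.0            # S-NPP altitude
NATIVE_ALONG_SCAN_KM = 0.25    # native nadir sample size in the scan direction


def aggregation_factor(scan_angle_deg: float) -> int:
    """Number of native samples averaged on board (thresholds from the text)."""
    a = abs(scan_angle_deg)
    if a <= 31.72:
        return 3
    if a <= 47.87:
        return 2
    return 1


def native_along_scan_km(scan_angle_deg: float, h: float = ALTITUDE_KM,
                         r: float = EARTH_RADIUS_KM) -> float:
    """Native along-scan footprint: nadir size scaled by slant range and obliquity."""
    scan = math.radians(abs(scan_angle_deg))
    zenith = math.asin((r + h) / r * math.sin(scan))
    central = zenith - scan
    # Law of cosines for the slant range from the satellite to the ground point.
    slant = math.sqrt(r**2 + (r + h)**2 - 2.0 * r * (r + h) * math.cos(central))
    return NATIVE_ALONG_SCAN_KM * (slant / h) / math.cos(zenith)


def aggregated_along_scan_km(scan_angle_deg: float) -> float:
    """Along-scan size of a transmitted (aggregated) sample."""
    return aggregation_factor(scan_angle_deg) * native_along_scan_km(scan_angle_deg)


for angle in (0.0, 31.5, 32.0, 47.5, 48.0, 56.0):
    print(f"{angle:5.1f} deg: {aggregated_along_scan_km(angle):.2f} km")
```

Printing the sizes on either side of 31.72° and 47.87° shows the drop back to roughly the 0.75 km nadir value at each transition, which is the behavior described in the text.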
4. Processing Algorithms
As with any satellite IR imager being used to derive SSTskin from calibrated TOA Brightness Temperature (BT) measurements in appropriate spectral bands, there are two distinct data processing steps that need to be taken: (i) identifying those measurements that are free from radiance from sources other than the surface and the gaseous components of the atmosphere, and then (ii) correcting for the radiative effects of atmospheric gases. The first step is conventionally called “cloud screening,” as the primary objective is to identify pixels that include radiance from clouds, and the second step is referred to as the “atmospheric correction.” The creation of a consistent SSTskin record with compatible error budgets from multiple infrared sensors can be facilitated if the data are processed using similar cloud identification methods and atmospheric correction algorithms, and if the stability and calibration accuracy of the radiances are maintained on orbit.
Since launch, VIIRS ocean products have been generated by the NASA Ocean Biology Processing Group (OBPG), which identifies each reprocessed version of the satellite-derived variables with the letter “R” followed by the year in which an algorithm or significant calibration change was introduced into production, with decimal increments indicating minor algorithm improvements or code changes. The file metadata include the processing code version. The current VIIRS SSTskin products are designated R2016.2.
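Because the version labels encode a year and a minor increment, they can be parsed and ordered programmatically, for example when checking which files in a local collection predate a reprocessing. The sketch below assumes labels of the exact form “R<year>.<minor>” as described above; it does not read any specific metadata attribute, and the names are hypothetical.

```python
import re
from typing import NamedTuple


class ProcessingVersion(NamedTuple):
    year: int   # year an algorithm or significant calibration change entered production
    minor: int  # decimal increment for minor algorithm improvements or code changes


_LABEL = re.compile(r"R(\d{4})\.(\d+)")


def parse_version(label: str) -> ProcessingVersion:
    """Parse a label such as 'R2016.2' into (year, minor)."""
    match = _LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not an R<year>.<minor> label: {label!r}")
    return ProcessingVersion(int(match.group(1)), int(match.group(2)))


# NamedTuples compare field by field, so later versions sort higher.
versions = sorted(["R2019.0", "R2016.2", "R2016.0"], key=parse_version)
print(versions)  # ['R2016.0', 'R2016.2', 'R2019.0']
```

Parsing the minor increment as an integer, rather than comparing the labels as strings or floats, keeps a hypothetical “R2016.10” correctly ordered after “R2016.2”.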
4.1. Cloud Screening
As with MODIS, for VIIRS we initially developed a recursive binary “classification-tree” to identify cloud-contaminated measurements [5], based on the approach originally developed for the AVHRR Pathfinder program [18]. Generally, these Binary Decision Trees (BDTrees) are similar between day and night and are dominated by Long-Wave IR (LWIR) channel differences and spatial uniformity tests. However, in the daytime part of each orbit, reflected sunlight provides additional information that allows us to distinguish the high-reflectance clouds from the low-reflectance sea surface. A factor that complicates the daytime decision tree is the high surface reflectance that occurs in regions of sun glitter; a separate set of tests is needed for these conditions. The differences in performance metrics, such as sensitivity and specificity [82], of the decision trees between day, night, and sun-glitter conditions were expected to lead to differences in the effectiveness of identifying cloud-free conditions. Research on sampling errors in derived SSTskin resulting from the presence of clouds, however, revealed significant differences in cloud persistence between day and night [83,84]. These findings are difficult to explain by physical processes, suggesting a problem with the cloud-screening algorithms in certain conditions, and so prompted a reassessment of the performance of the cloud-screening approach. The sampling errors caused by false cloud detections were most severe at moderate to high latitudes in both hemispheres, particularly in the daytime, and were initially identified in MODIS data but subsequently also found in the early, preliminary VIIRS fields.
The revised cloud-screening algorithm for R2016.0 and subsequent versions of the VIIRS SSTskin retrievals is based on the machine-learning approach of Alternating Decision Trees (ADTrees) [85,86]. The ADTree algorithm involves a collection of binary decision nodes forming a “branch” ending with a prediction node. Nodes contain a “vote” that is scaled to the predictive power of the test. When combined with “boosting” algorithms, in which instances that were misclassified in previous training iterations are given a larger weight, an accurate ensemble classification model can be developed. To predict likely cloud contamination, a pixel transits all decision nodes that are true, and the prediction values from all true nodes are summed to form the final vote. For VIIRS data, a positive sum indicates clear skies and a negative sum indicates cloud. The magnitude of the vote provides an indication of the confidence in the classification for a given pixel. In some instances, the combined vote from a collection of weak prediction nodes voting the same way can modify or override the vote of a single strong prediction node. This new approach was developed for four classes of conditions: (i) nighttime, (ii) daytime with glint coefficient ≤ 0.005, (iii) daytime with moderate glint, i.e., glint coefficient between 0.005 and 0.01, and (iv) daytime with severe glint, when the 678 nm red reflectance is >0.065. The use of the new classification algorithm improves the coverage of the VIIRS data in daily global maps by ~5–10% at night and by up to 60% during the daytime, depending on the location and season, indicating significant false positives (clear pixels flagged as cloudy) in the previous cloud-screening algorithms. The ADTree approach also improves cloud discrimination near ocean thermal fronts, where clear pixels were frequently misclassified as cloudy because of the large horizontal SSTskin gradients. Example VIIRS images in Figure 7 show the improved retention of good-quality pixels by the ADTree classifier compared with the BDTree cloud screening. The ADTree approach is described in greater detail by Kilpatrick et al. [16].
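The voting logic can be illustrated with a toy evaluator. In the common ADTree formulation of Freund and Mason, each test contributes the vote attached to whichever outcome the pixel takes, only the sub-tests beneath that outcome are evaluated next, and all votes reached are summed with a root vote; the sign of the sum gives the class and its magnitude the confidence. The feature names, thresholds, and vote values below are hypothetical and are not taken from the trained R2016 trees; a score of exactly zero is treated as cloudy here, which is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Pixel = Dict[str, float]


@dataclass
class Splitter:
    """Binary test feeding one of two prediction nodes (votes) and their sub-tests."""
    test: Callable[[Pixel], bool]
    vote_true: float
    vote_false: float
    children_true: List["Splitter"] = field(default_factory=list)
    children_false: List["Splitter"] = field(default_factory=list)


def adtree_score(root_vote: float, splitters: List[Splitter], pixel: Pixel) -> float:
    """Sum the votes of every prediction node the pixel reaches."""
    total = root_vote
    pending = list(splitters)
    while pending:
        node = pending.pop()
        if node.test(pixel):
            total += node.vote_true
            pending.extend(node.children_true)
        else:
            total += node.vote_false
            pending.extend(node.children_false)
    return total


def classify(root_vote: float, splitters: List[Splitter], pixel: Pixel) -> Tuple[str, float]:
    """Positive score means clear, otherwise cloudy; |score| is the confidence."""
    score = adtree_score(root_vote, splitters, pixel)
    return ("clear" if score > 0 else "cloudy"), abs(score)


# Hypothetical toy tree: features, thresholds and votes are illustrative only.
ROOT_VOTE = 0.3
TOY_TREE = [
    Splitter(
        test=lambda p: p["bt11_local_std"] < 0.3,        # spatial uniformity test
        vote_true=0.8,
        vote_false=-1.2,
        children_true=[
            Splitter(lambda p: p["bt11_minus_bt12"] < 2.5, 0.2, -0.4),
        ],
    ),
]

print(classify(ROOT_VOTE, TOY_TREE, {"bt11_local_std": 0.1, "bt11_minus_bt12": 1.0}))
```

In a trained model there are many such splitters, so several weak votes pointing the same way can outweigh a single strong one, as described in the text.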
4.2. Atmospheric Correction Algorithms
The heritage sensor for SSTskin from VIIRS is MODIS, with VIIRS having four of the five bands that MODIS uses for retrieving SSTskin (Table 1). Because of the influence of reflected and scattered solar radiation, measurements taken by satellite radiometers in the mid-infrared atmospheric transmission window cannot be used in the daytime portion of the orbit. As a result, NASA produces three distinct SSTskin products for VIIRS, as for MODIS. A pair of standard day and night retrievals, referred to as SSTskin, provides continuity and consistency with the 30+ year record of measurements from AVHRR [18] and MODIS [5]. The pair of standard products also facilitates the study of diurnal heating, as the algorithm and coefficients are applied consistently between day and night. The second nighttime product, SSTtriple, is produced to take advantage of the cleaner atmospheric transmission window in the mid-infrared (MWIR) bands on both VIIRS and MODIS, which is capable of producing retrievals with lower uncertainty.
The atmospheric correction used for the standard SSTskin retrieval is based on measurements in the 10–13 μm atmospheric transmission window, often referred to as a “split-window” algorithm, and is derived from the same algorithm form used for both MODIS and AVHRR Pathfinder, the Non-Linear SST (NLSST) [87]:

$$SST_{skin} = a_0 + a_1 T_{11} + a_2 \left(T_{11} - T_{12}\right) T_{sfc} + a_3 \left(\sec\theta - 1\right)\left(T_{11} - T_{12}\right) \qquad (1)$$

where
SSTskin is the SSTskin derived during the day or night;
a0, …, a3 are coefficients;
T11 is the BT measured in the band centered near λ = 11 µm (VIIRS band M15);
T12 is the BT measured in the band centered near λ = 12 µm (VIIRS band M16);
Tsfc is a first-guess or climatological SST that scales the coefficient multiplying the T11 − T12 BT difference, to account primarily for the effects of atmospheric water vapor (see Figure 5 of Minnett et al. [12]), which is the main cause of the atmospheric effect in these infrared spectral intervals [87] and is correlated with the SST [88]. The unit of Tsfc is °C, with a lower bound of zero to prevent the difference term from becoming negative;
θ is the satellite zenith angle, and this term compensates for the increasing atmospheric path length when the view angle is off-nadir.
The coefficients in this equation and those below have been derived through regression analyses of the TOA BTs in the relevant VIIRS bands, in conditions that have been identified as cloud-free, against coincident and contemporaneous in situ measurements from drifting and moored buoys, with the ancillary variables θ and Tsfc.
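The split-window retrieval can be written as a short function, assuming the standard NLSST form used for MODIS and AVHRR Pathfinder, SSTskin = a0 + a1·T11 + a2·(T11 − T12)·Tsfc + a3·(sec θ − 1)·(T11 − T12), with Tsfc clamped at zero as stated in the definitions. The sketch assumes BTs and Tsfc are in °C, consistent with the coefficients; the coefficient values in the usage line are hypothetical, whereas the operational ones are regressed per month and latitude band as described below.

```python
import math
from typing import Tuple

Coefficients = Tuple[float, float, float, float]


def nlsst(t11: float, t12: float, tsfc_c: float, sat_zenith_deg: float,
          coeffs: Coefficients) -> float:
    """Split-window NLSST retrieval (deg C).

    SSTskin = a0 + a1*T11 + a2*(T11 - T12)*Tsfc + a3*(sec(theta) - 1)*(T11 - T12)
    BTs and Tsfc are assumed to be in deg C, consistent with the coefficients.
    """
    if not 0.0 <= sat_zenith_deg < 90.0:
        raise ValueError("satellite zenith angle must be in [0, 90) degrees")
    a0, a1, a2, a3 = coeffs
    tsfc = max(tsfc_c, 0.0)          # lower bound of zero, as stated for Tsfc
    dt = t11 - t12                   # split-window difference, mainly water vapor
    sec_minus_one = 1.0 / math.cos(math.radians(sat_zenith_deg)) - 1.0
    return a0 + a1 * t11 + a2 * dt * tsfc + a3 * sec_minus_one * dt


# Hypothetical coefficients, for illustration only.
print(round(nlsst(24.1, 22.6, 27.0, 35.0, (0.5, 0.98, 0.08, 0.9)), 2))
```

Note that both correction terms scale with T11 − T12: in a dry atmosphere with a small difference the retrieval reduces to roughly a0 + a1·T11, while in moist, warm conditions the Tsfc-scaled term supplies most of the correction.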
Building on our experience with both MODIS instruments, the VIIRS R2016.0 algorithms use month-of-year coefficients estimated separately for six latitude bands (with boundaries at the Equator, ±20°, and ±40°); thus there are 72 sets of coefficients. Coefficients are estimated from a randomly selected 65% of the matchups identified as suitable for this purpose by the cloud decision trees and other quality tests (see Section 4.4 and Appendix A); the remaining 35% of the matchups are withheld to determine uncertainties. Tapered weights are applied within 5° of latitude of the boundaries of the latitude bands to avoid unphysical abrupt transitions in the atmospheric correction when moving from one latitude band to the next. These geographical coefficients provide a better seasonal atmospheric correction for each hemisphere independently than earlier approaches applied to MODIS measurements, which assumed that the seasonality and geographic variability of the atmosphere could be captured by the 11–12 μm BT difference alone [5].
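The coefficient bookkeeping can be sketched as follows: six latitude bands times twelve months give the 72 coefficient sets, and near each band boundary a pixel receives weights for the two adjacent bands. The text does not specify the shape of the taper, so a linear blend over ±5° around each boundary is assumed here; the 65%/35% training/withheld split is shown with a seeded shuffle. All names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

BAND_EDGES = (-40.0, -20.0, 0.0, 20.0, 40.0)       # six bands, boundaries from the text
TAPER_HALF_WIDTH_DEG = 5.0                         # blend zone on each side (assumed linear)
NUM_COEFFICIENT_SETS = 12 * (len(BAND_EDGES) + 1)  # months x bands = 72


def band_index(lat: float) -> int:
    """Index 0..5 of the latitude band containing lat (south to north)."""
    return sum(lat >= edge for edge in BAND_EDGES)


def band_weights(lat: float) -> List[Tuple[int, float]]:
    """(band index, weight) pairs; weights sum to one."""
    for i, edge in enumerate(BAND_EDGES):
        if abs(lat - edge) < TAPER_HALF_WIDTH_DEG:
            # Fraction of the way across the blend zone [edge - 5, edge + 5].
            w_north = (lat - (edge - TAPER_HALF_WIDTH_DEG)) / (2 * TAPER_HALF_WIDTH_DEG)
            return [(i, 1.0 - w_north), (i + 1, w_north)]
    return [(band_index(lat), 1.0)]


def coefficient_keys(month: int, lat: float) -> List[Tuple[Tuple[int, int], float]]:
    """Weighted (month, band) keys into a table of 72 coefficient sets."""
    if not 1 <= month <= 12:
        raise ValueError("month must be 1..12")
    return [((month, band), w) for band, w in band_weights(lat)]


def split_matchups(matchups: Sequence, train_fraction: float = 0.65, seed: int = 0):
    """Random split into matchups for regression and matchups withheld for uncertainties."""
    shuffled = list(matchups)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


print(coefficient_keys(7, 17.3))
```

Because the NLSST expression is linear in its coefficients, blending two coefficient sets with these weights gives the same result as blending the two retrievals, so either can be implemented.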
The mid-IR MODIS nighttime algorithm is based on measurements in very narrow spectral bands at λ = 3.95 μm and 4.05 μm, but VIIRS lacks this pair of narrow bands. Therefore, an algorithm similar to Equation (1), but including the VIIRS band at λ = 3.7 μm (M12), is used for nighttime measurements. The resulting retrieval for this second nighttime SSTskin is referred to here as SSTtriple and is based on the NLSST triple-window SST algorithm [87]:
At ~3000 km, the swath of VIIRS is wider than the 2330 km swath of MODIS, so that there are no gaps on the ocean surface between adjacent swaths of successive ascending or descending arcs. To facilitate accurate SST retrieval across the entire swath, additional terms have been added to the MODIS and VIIRS atmospheric correction algorithms to account for the effects of the high emission angles and long atmospheric path lengths:
A further term has been added to the MODIS version of Equation (3) to take into account small differences in the reflectivity of the two sides of the mirror, but this term is not required for VIIRS. For the nighttime retrievals:
Equations (3) and (4) together form the atmospheric correction algorithms of VIIRS R2016.0 and subsequent versions. The VIIRS R2016.2 algorithms differ from earlier versions in their use of CMCSST reference fields, instead of DOISST, as Tsfc for the daytime SSTskin and for SSTtriple. At night, Tsfc is SSTtriple if available; otherwise, it is the CMCSST value.
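The choice of Tsfc in R2016.2 amounts to a small decision rule, sketched below as we read the description above: CMCSST is used for the daytime split-window retrieval and for SSTtriple, while the nighttime split-window retrieval prefers a valid SSTtriple and falls back to CMCSST. The product labels “sst_skin” and “sst_triple” are hypothetical, not the names used in the OBPG files, and the zero lower bound comes from the definition of Tsfc.

```python
import math
from typing import Optional


def select_tsfc(product: str, is_day: bool, cmc_sst_c: float,
                sst_triple_c: Optional[float] = None) -> float:
    """Reference temperature (deg C) used as Tsfc, following the R2016.2 description.

    product: hypothetical label, "sst_skin" (split-window) or "sst_triple".
    """
    if product not in ("sst_skin", "sst_triple"):
        raise ValueError(f"unknown product {product!r}")
    if product == "sst_triple" or is_day:
        tsfc = cmc_sst_c                   # CMC analysis for daytime SSTskin and SSTtriple
    elif sst_triple_c is not None and math.isfinite(sst_triple_c):
        tsfc = sst_triple_c                # nighttime split-window: prefer SSTtriple
    else:
        tsfc = cmc_sst_c                   # fall back to CMC when SSTtriple is unavailable
    return max(tsfc, 0.0)                  # lower bound of zero from the Tsfc definition


print(select_tsfc("sst_skin", is_day=False, cmc_sst_c=15.0, sst_triple_c=14.6))
```

Using the pixel’s own SSTtriple at night ties the water-vapor scaling to the contemporaneous retrieval rather than to an analysis field, while the fallback keeps the split-window product available wherever the triple-window retrieval fails.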
4.3. Processing Overview
The S-NPP VIIRS SSTskin files are produced by the OBPG and distributed by the NASA Ocean Biology Distributed Active Archive Center (OB.DAAC), both located at Goddard Space Flight Center, and in GHRSST L2p format (see below) by the Physical Oceanography Distributed Active Archive Center at the Jet Propulsion Laboratory. The data flow for the derivation of VIIRS SSTskin fields is shown in Figure 8, which includes the main steps in generating the matchup databases (MUDBs) that are critical both to deriving the algorithms applied to the VIIRS data to produce SSTskin retrievals in near-real time and to reprocessing the entire mission data set when significant benefit is to be gained.
The NASA processing levels are defined by Parkinson et al. [89] and summarized in Table 2.
Within GHRSST, several subdivisions of the L2 and L3 designations were developed; of most interest here is L2p, meaning “L2 preprocessed” [90]. L2p files contain the same SST values, at the same geographical coordinates, as the parent L2 file, together with pixel-by-pixel estimates of the mean error and of the standard deviation of the error, often derived from the MUDBs of in situ and satellite data, and ancillary data that may include surface wind speed, aerosol optical depth, sea ice concentration, time of measurement, and a set of quality control flags. The additional information is intended to guide users in applying the SST retrievals in a meaningful way.
The MUDBs, generated at the University of Miami’s Rosenstiel School (RSMAS) and at the Cooperative Institute for Climate and Satellites–North Carolina (CICS-NC), where the algorithms were prototyped, contain the Level-1 calibrated BTs along with geolocation data, surface temperature measurements, and VIIRS SSTskin retrievals. The MUDBs were used to develop the VIIRS R2016.0 cloud-screening and atmospheric correction algorithms, which were subsequently delivered to NASA OBPG, tested, and installed in its production environment. Analysis of the MUDBs provides estimates of the accuracies of the SSTskin retrievals. The MUDBs are available from the OB.DAAC through https://seabass.gsfc.nasa.gov/search/sst.
4.4. Quality Flags
A group of 15 quality flags, discussed in Appendix A, is defined for each pixel, and the flags are combined to define the final pixel quality level. Quality levels are also used to control the Level-3 binning process, as only the values with the highest quality level available are summed into each Level-3 geographical bin of the Integerized Sinusoidal Equal Area Grid (ISEAG) used for all NASA ocean and some other products. Users should note that the NASA L2 SST products are distributed via two different data archive systems, the OB.DAAC and the Physical Oceanography Distributed Active Archive Center (PO.DAAC), and that the two centers use opposite conventions to identify the best-quality pixels. The GSFC OBPG is the producer of the L2 SSTskin fields and uses a convention in which the best-quality retrievals are assigned a quality level of 0. Here we use the OBPG quality-level convention. The JPL PO.DAAC primarily serves the physical science and modelling communities and converts the OBPG L2 files into L2p files following the requirements of the GHRSST (Group for High-Resolution SST) [28] Data Specification 2.0 (GDS) [91], in which the best quality is represented by quality level 5. The corresponding meaning of each quality level for data obtained from each center is shown in Table 3.
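The rule that only the best available quality level contributes to a Level-3 bin can be sketched in a single pass over the pixels: when a better (lower, in the OBPG convention) level appears in a bin, whatever was accumulated at worse levels is discarded. The sketch assumes each pixel has already been assigned an integer bin identifier (the ISEAG bin-number computation is omitted), averages with equal weights, and uses hypothetical names; it is an illustration of the selection rule, not the OBPG binning code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Sample = Tuple[int, int, float]  # (bin_id, OBPG quality level, SST in deg C)


def bin_best_quality(samples: Iterable[Sample]) -> Dict[int, Tuple[float, int]]:
    """Average only the best-quality (lowest OBPG level) samples in each bin.

    Returns {bin_id: (mean SST, quality level used)}.
    """
    best: Dict[int, int] = {}
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for bin_id, level, sst in samples:
        if bin_id not in best or level < best[bin_id]:
            # A better level has appeared: discard what was accumulated so far.
            best[bin_id] = level
            sums[bin_id] = 0.0
            counts[bin_id] = 0
        if level == best[bin_id]:
            sums[bin_id] += sst
            counts[bin_id] += 1
    return {b: (sums[b] / counts[b], best[b]) for b in best}


pixels = [(7, 1, 20.0), (7, 0, 18.0), (7, 0, 19.0), (8, 2, 25.0)]
print(bin_best_quality(pixels))
```

For data obtained in GDS L2p form, where 5 denotes the best quality, the comparison would have to be reversed.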
5. Assessment of Accuracy of VIIRS SSTskin
Our approach to assessing the performance of VIIRS in producing accurate SSTskin has relied on a series of analyses:
(a) Assessing the spatial characteristics of the VIIRS SSTskin fields by comparison with global SSTs derived from other satellite sensors or as represented in analysis fields for which we have some expectation of their accuracy.
(b) Assessing the accuracies of the VIIRS SSTskin retrievals in conditions that have passed a series of cloud screening tests and other quality tests by comparing the VIIRS SSTskin at the level of individual pixels, or small arrays of adjacent pixels, with the subsurface measurements from drifting or moored buoys, i.e., using the MUDB.
(c) Assessing the accuracies of the VIIRS SSTskin retrievals as in (b), but using ship-based infrared radiometers (M-AERIs and ISARs), which measure the infrared emission from the ocean and atmosphere, leading to a direct comparison of SSTskin. As discussed above, these instruments have SI-traceable calibration to standards at the National Institute of Standards and Technology (NIST) [8] and the National Physical Laboratory (NPL) [9] and are therefore the basis for the generation of CDRs of SST [13].
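The matchup comparisons in (b) and (c) reduce to statistics of the paired differences. The sketch below computes the satellite-minus-reference differences and summarizes them with the mean and standard deviation, plus the median and a robust standard deviation (1.4826 × the median absolute deviation), a common choice when a few residual cloud-contaminated matchups produce outliers. The choice of statistics and the names are our assumptions, not necessarily those used in this paper. When the reference is a subsurface buoy temperature, part of the mean difference reflects the skin effect discussed in the Introduction rather than a retrieval error.

```python
import statistics
from typing import Dict, Sequence


def matchup_statistics(satellite_sst: Sequence[float],
                       reference_sst: Sequence[float]) -> Dict[str, float]:
    """Summary statistics of (satellite - reference) SST differences (K)."""
    if len(satellite_sst) != len(reference_sst):
        raise ValueError("inputs must be paired")
    if len(satellite_sst) < 2:
        raise ValueError("need at least two matchups")
    diffs = [s - r for s, r in zip(satellite_sst, reference_sst)]
    median = statistics.median(diffs)
    mad = statistics.median([abs(d - median) for d in diffs])
    return {
        "n": float(len(diffs)),
        "mean": statistics.fmean(diffs),
        "std": statistics.stdev(diffs),
        "median": median,
        "robust_std": 1.4826 * mad,   # scales the MAD to a Gaussian standard deviation
    }


print(matchup_statistics([20.1, 19.7, 21.0, 18.4], [20.2, 19.9, 21.1, 18.6]))
```

Comparing the ordinary and robust statistics gives a quick indication of whether a handful of outliers dominates the apparent uncertainty.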
5.1. Instrumental Performance and Artifacts
Because of lessons learned about the design and performance of heritage instruments, VIIRS measurements lack many of the instrumental artifacts present, for instance, in the early data collected by MODIS on Terra, or at least show them much less pronounced. These artifacts include the “response vs. scan angle” (RVS) behavior that results from the wavelength-dependent infrared reflectivity of the MODIS scan mirror varying with the angle of incidence of the radiation at the mirror surface [5,42]. With the rotating telescope of the VIIRS fore-optics, it was expected that any such effects would be caused by the double-sided half-angle mirror; this has been quantified on orbit by Xiong et al. [55], who found it to be very small. As with MODIS, “detector banding,” caused by the multiple detectors in each spectral band in the along-track direction [5,92], was also found for VIIRS, but with a much smaller magnitude [93]. While there is evidence of other instrumental artifacts, they are not a major source of qualitative or quantitative shortcomings of the infrared channels of S-NPP VIIRS.
5.2. Calibration
Our analyses have not revealed any fundamental problems with the on-board calibration of the VIIRS infrared channels, in agreement with an earlier study by Efremova et al. [94]. Nevertheless, the impacts of such problems on derived quantities can be quite subtle and may be revealed only through analyses of longer time series of data.
5.3. Spatial Distribution of Differences with Heritage Data
The integrity of the spatial distribution of the VIIRS SSTs has been assessed by comparisons with independent SST fields. The first reference field used was the daily, global DOISST (Section 3.2 above). An example of the difference field, VIIRS − DOISST, is shown in Figure 9 (top); the VIIRS SSTskin values were derived using the R2016.0 3-band nighttime algorithm (Equation (4)). The data are from 12 August 2012 and are limited to those with the best quality flag. Some blue areas indicate locations where VIIRS SSTskin retrievals are likely influenced by the presence of atmospheric aerosols and are therefore cooler than the correct SSTs [95,96,97]. The DOISST field is tied to in situ measurements from drifting buoys, which are used for the bias correction of the AVHRR SSTs, and is therefore less influenced by atmospheric conditions. The areas where VIIRS appears to be warmer than the DOISST fields are more difficult to understand, but may be caused by the presence of dry layers [98] that can occur in the mid-level to lower troposphere [99]; however, from this comparison alone it is not clear whether the VIIRS SSTskin values have a warm bias or whether the DOISST values have a cold bias.
Figure 9 also shows the comparison of VIIRS SSTskin with SSTs derived from the WindSat microwave radiometer. Because the sources of uncertainty in the microwave SSTs are different from those of infrared SSTskin retrievals, the uncertainties in the two fields used to derive the differences can be assumed to be uncorrelated. A concern about this comparison is that the geometry of the WindSat swaths requires five days of measurements to be compiled to generate complete global fields, so the time separation between the WindSat and VIIRS estimates can exceed two days. Furthermore, the terminator orbit of Coriolis means that its overpass times are not close to those of S-NPP, so diurnal heating and cooling may contribute to the difference; the use of nighttime VIIRS SSTskin should reduce this possible contribution. These concerns aside, the SSTs from WindSat are of good quality [100] and useful for assessing VIIRS retrievals. The differences between VIIRS and WindSat SSTs (Figure 9, bottom) reveal the same cold bias in regions where we expect aerosol contamination of the VIIRS retrievals, off West Africa and in the Arabian Sea. In contrast, the WindSat comparison lacks the areas of warm bias seen in the comparison with the DOISST fields. Although this conclusion is not definitive, this pattern is indicative of likely regional cold biases in the DOISST fields, not of warm biases in the VIIRS SSTskin retrievals.
Global maps reveal the spatial patterns in the characteristics of VIIRS SST retrievals, some of which are reassuring and some of which raise concerns. However, these fields give no indication of the temporal character of retrieval behavior; Hovmöller diagrams can be used for this purpose.
Figure 10 shows the daily zonal averages of the differences between cloud-free, best-quality (QL = 0) VIIRS R2016.0 SSTs and the corresponding reference fields: DOISST, WindSat, and AMSR2 SSTs. The latitudinal envelopes of the Hovmöller diagrams reflect the seasonal migration of the ice edge around Antarctica in the south and the seasonal change in solar illumination in the north.
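The construction of such a Hovmöller diagram (daily zonal means of the differences) can be sketched as below. This is a minimal illustration under stated assumptions: the latitude band edges and the per-retrieval arrays are hypothetical inputs, and the zonal mean is a plain arithmetic mean per (band, day) cell rather than any weighting the authors may have used.

```python
import numpy as np

def hovmoller(day_index, lat, diff, n_days, lat_edges):
    """Hypothetical helper: daily zonal means of SST differences, shape (n_bands, n_days).

    day_index : integer day number (0 .. n_days-1) of each retrieval
    lat       : latitude of each retrieval, degrees
    diff      : satellite-minus-reference SST for each retrieval, K (NaN = missing)
    lat_edges : increasing latitude band edges, degrees
    Empty (band, day) cells are returned as NaN.
    """
    day = np.asarray(day_index, dtype=int)
    lat = np.asarray(lat, dtype=float)
    diff = np.asarray(diff, dtype=float)
    n_bands = len(lat_edges) - 1
    band = np.digitize(lat, lat_edges) - 1     # band k covers [edges[k], edges[k+1])
    ok = (np.isfinite(diff) & (band >= 0) & (band < n_bands)
          & (day >= 0) & (day < n_days))
    sums = np.zeros((n_bands, n_days))
    counts = np.zeros((n_bands, n_days))
    np.add.at(sums, (band[ok], day[ok]), diff[ok])   # unbuffered accumulation
    np.add.at(counts, (band[ok], day[ok]), 1.0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts                         # 0/0 -> NaN for empty cells
```

`np.add.at` is used instead of fancy-index assignment so that several retrievals falling in the same cell are all accumulated.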
The daytime VIIRS SSTskin derived using the two-band atmospheric correction algorithm (Equation (3)) shows a positive bias relative to DOISST of more than 2 K at high latitudes in the summer, especially in the Northern Hemisphere. This pattern of VIIRS SSTskin > DOISST has a component that is a consequence of diurnal heating of the upper ocean, which is present in the daytime VIIRS SSTskin retrievals but absent from the DOISST. Also, there are some regions, especially at high latitudes, where the DOISST is markedly colder than temperatures measured by buoys [101]. The differences are much more uniform for both nighttime VIIRS fields, with VIIRS SSTskin generally < DOISST.
Since WindSat is in a dawn-dusk orbit, diurnal heating is largely absent from the WindSat fields; its signature in the VIIRS retrievals is apparent in the daytime comparison, but to a much smaller degree than in the DOISST comparison. Unlike the DOISST comparison, VIIRS SSTskin < WindSat SSTμw at high northern latitudes during the summer, more markedly at night. Differences between VIIRS and WindSat SSTμw are generally smaller in the winter in each hemisphere, especially during the day; during the night, the differences are smaller in the Southern Hemisphere.
GCOM-W1, carrying AMSR2, is in the A-Train with an Equator-crossing time of 13:30, the same as that of S-NPP. Although the overpass times of the two satellites on a given day can differ by up to half an orbital period (~50 min), the differences between VIIRS SSTskin and AMSR2 SSTμw are expected to be generally small, and this is the case. During the day, the differences are smaller than those in the DOISST and WindSat comparisons, although the AMSR2 differences show some seasonal structure. As with the WindSat comparison, VIIRS SSTskin are cooler than the microwave-derived SSTμw at high northern latitudes, especially during summer at night, but the amplitude is smaller than for WindSat. In the Southern Hemisphere, the differences are generally much smaller. At night, VIIRS SSTskin < AMSR2 SSTμw for most of the globe, with the exception of high southern latitudes during the summers of 2015 and 2016.
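The "~50 min" figure can be checked with a back-of-the-envelope Keplerian estimate. This sketch assumes circular orbits, a spherical Earth with the WGS-84 equatorial radius, and no J2 or drag effects; it uses the 824 km (S-NPP) and 705 km (A-Train) altitudes given in the Introduction.

```python
import math

MU_EARTH = 3.986004418e14   # Earth's gravitational parameter, m^3 s^-2
R_EARTH = 6.378137e6        # WGS-84 equatorial radius, m

def orbital_period_minutes(altitude_m):
    """Keplerian period of a circular orbit at the given altitude (no J2, no drag)."""
    a = R_EARTH + altitude_m                       # semi-major axis of a circular orbit
    return 2.0 * math.pi * math.sqrt(a**3 / MU_EARTH) / 60.0

snpp = orbital_period_minutes(824e3)      # roughly 101 min
a_train = orbital_period_minutes(705e3)   # roughly 99 min
half_snpp = snpp / 2.0                    # roughly 50 min, the maximum overpass offset
```

Half of the ~101 min S-NPP period is consistent with the ~50 min maximum overpass separation quoted above, and the slightly shorter A-Train period follows from its lower altitude.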
A feature that is apparent in many of the Hovmöller diagrams is the presence of zonal discrepancies, some of which are aligned with the boundaries of the latitudinal bands in the VIIRS atmospheric correction algorithm; this suggests an issue with how the VIIRS SSTskin are blended at the band boundaries. However, this may not be the only issue, as the differences between microwave- and IR-derived SSTs are of opposite signs in the WindSat and AMSR2 comparisons, implying that the problem cannot be isolated to the VIIRS algorithm. The nighttime zonal differences are more consistent across all comparisons.
5.4. Comparisons to In Situ Measurements
While the Hovmöller diagrams are useful in assessing the spatial and temporal consistencies between different satellite-derived SSTskin fields, they do not permit the attribution of the discrepancies, or parts of the differences under different circumstances, to the performance of either sensor. For this, we compare the VIIRS SSTskin to independent sea-surface, or near-surface, temperature measurements.
Global statistics of differences between R2016.0 VIIRS SSTskin and buoy temperature measurements are shown in Table 4. The median and robust standard deviation have become more widely accepted measures of the central tendency and dispersion of the differences, as they are less sensitive to outliers in the distribution [5,23,102]. The negative mean and median values at night have the correct sign for the cool skin effect [39,40,103]. Better approaches to identifying situations in which large retrieval errors occur, and to adjusting the algorithms or coefficient sets accordingly, would improve these statistics. The accuracies of the retrievals degrade slightly when the full width of the swath is used, compared to the portion of the swath where the absolute satellite zenith angle is <55°.
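The contrast between conventional and robust statistics, and the restriction to zenith angles below 55°, can be sketched as follows. This is an illustration under an assumption: the robust standard deviation is taken here as 1.4826 times the median absolute deviation (MAD), a common definition that makes it match the standard deviation for Gaussian data; the definition behind Table 4 may differ. The function name `robust_stats` is hypothetical.

```python
import numpy as np

MAD_TO_SIGMA = 1.4826  # assumed scaling: MAD -> Gaussian-equivalent standard deviation

def robust_stats(diff, zenith_deg=None, max_zenith=None):
    """Hypothetical helper: conventional and robust statistics of SST differences (K).

    If max_zenith is given, only matchups with |satellite zenith angle| < max_zenith
    are used, e.g. max_zenith=55.0 for the restricted-swath statistics.
    """
    d = np.asarray(diff, dtype=float)
    keep = np.isfinite(d)
    if max_zenith is not None:
        keep &= np.abs(np.asarray(zenith_deg, dtype=float)) < max_zenith
    d = d[keep]
    median = np.median(d)
    rsd = MAD_TO_SIGMA * np.median(np.abs(d - median))   # robust standard deviation
    return {"n": d.size, "mean": d.mean(), "std": d.std(ddof=1),
            "median": median, "rsd": rsd}
```

With a single large outlier (for example, an undetected cloud), the mean and standard deviation shift markedly while the median and robust standard deviation barely move, which is why the latter are preferred for matchup statistics.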
Figure 11 shows SSTskin time series of the monthly median errors in latitudinal bands, from October 2012 through August 2016. The errors and uncertainties in both the day and night SSTskin retrievals are very stable month-to-month and across latitudinal zones.
The spatial consistency of retrieval errors relative to in situ buoy measurements corrected for the cool skin bias is shown in Figure 12, using a 5° resolution grid and best-quality MUDB records for 2012–2016. Tropical regions with higher water vapor and dust aerosol loadings show an increase in the robust standard deviation. Regions known for high atmospheric dust loading, off West Africa and around the Arabian Peninsula, generally have a cold bias, often >0.5 K, indicating that the VIIRS quality flags still do not sufficiently identify and mask episodic dust events.
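The gridding step can be sketched as below: each matchup difference SSTskin − (buoy − 0.17 K) is assigned to a 5° cell, and a count, median, and robust standard deviation are computed per cell. This is an illustration only: the constant 0.17 K skin offset is the one described in the Discussion, while the function name, the cell-indexing convention, and the 1.4826 × MAD robust standard deviation are assumptions.

```python
import numpy as np
from collections import defaultdict

SKIN_OFFSET_K = 0.17   # mean skin effect subtracted from buoy temperatures
MAD_TO_SIGMA = 1.4826  # assumed robust-SD definition (scaled median absolute deviation)

def grid_stats(lat, lon, sst_skin, buoy_temp, cell_deg=5.0):
    """Hypothetical helper: per-cell (count, median, robust SD) of SSTskin - (buoy - 0.17 K).

    Returns a dict keyed by (lat_index, lon_index); index 0 starts at 90 S / 180 W.
    """
    n_lat = int(round(180.0 / cell_deg))
    cells = defaultdict(list)
    for la, lo, s, b in zip(lat, lon, sst_skin, buoy_temp):
        d = s - (b - SKIN_OFFSET_K)
        if not np.isfinite(d):
            continue
        i = min(int(np.floor((la + 90.0) / cell_deg)), n_lat - 1)  # 90 N joins top row
        j = int(np.floor(((lo + 180.0) % 360.0) / cell_deg))        # wrap longitude
        cells[(i, j)].append(d)
    stats = {}
    for key, vals in cells.items():
        v = np.asarray(vals)
        med = float(np.median(v))
        stats[key] = (v.size, med, MAD_TO_SIGMA * float(np.median(np.abs(v - med))))
    return stats
```

In practice, cells with only a handful of matchups would normally be suppressed before mapping, since their medians are noisy.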
5.5. Wind Speed Dependence
There is no reason why the accuracy of the atmospheric correction algorithm for the retrieval of SSTskin from IR radiometers should be directly influenced by the wind speed in the intervening atmosphere. There is a potential indirect influence through the apparent wind-speed dependence of the sea-surface emissivity [105,106,107], caused by wind tilting of the facets of the sea surface [104], but this effect is very small except at high emission angles and where the atmosphere is very dry [108]. Wind speed does, however, play a role in the comparison between VIIRS SSTskin and subsurface temperatures from drifting buoys, as the temperature difference between the depth of the subsurface measurement and the skin layer is wind-speed dependent. The effect of a larger temperature drop across the thermal skin layer at low winds can be seen in the nighttime distribution of VIIRS SSTskin–buoy temperature differences (Figure 13, left) for wind speeds <2 m s−1, which is in agreement with ship-based measurements [39,40]. The same effect is present in daytime conditions, but is masked by the much larger positive temperature difference that results from diurnal heating (Figure 13, right).
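Stratifying the matchup differences by wind speed, as in the comparison described above, can be sketched as follows. The bin edges and the source of the wind speeds (for example, a reanalysis collocated with each matchup) are assumptions for illustration, and `median_by_wind` is a hypothetical name.

```python
import numpy as np

def median_by_wind(diff, wind, edges=(0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 15.0, 25.0)):
    """Hypothetical helper: median SSTskin - buoy difference (K) per wind-speed bin (m/s).

    Returns a list of (lower_edge, upper_edge, count, median); empty bins give NaN.
    """
    diff = np.asarray(diff, dtype=float)
    wind = np.asarray(wind, dtype=float)
    ok = np.isfinite(diff) & np.isfinite(wind)
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = ok & (wind >= lo) & (wind < hi)       # half-open bin [lo, hi)
        n = int(sel.sum())
        med = float(np.median(diff[sel])) if n else float("nan")
        result.append((lo, hi, n, med))
    return result
```

At night, a more negative median in the lowest bin (<2 m s−1) than in higher bins is the signature of the stronger skin-layer temperature drop at low winds.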
5.6. Effects of Pixel Aggregation on SST Retrievals
The VIIRS on-board pixel aggregation decreases across the scan line from 3 pixels to 2 pixels for scan angles between 31.59° and 44.68°, and to a single pixel for scan angles >44.68° (Figure 2 and Figure 3). To assess the potential impact of the varying pixel aggregation on the error budget, we examined the statistics of VIIRS SSTskin relative to buoy temperatures for MUDB observations within 3 pixels on either side of the two transitions. The results (Figure 14) suggest that the aggregation scheme changes the retrieval accuracy by ~10 mK across the swath.
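Selecting matchups near the aggregation transitions can be sketched as below, using the 31.59° and 44.68° scan-angle thresholds and the 3-pixel half-width from the text. This is a simplified illustration: it works on a single scan line given as an array of scan angles and does not model the actual VIIRS sample counts or bow-tie deletion; both function names are hypothetical.

```python
import numpy as np

AGG_EDGES_DEG = (31.59, 44.68)  # scan angles where aggregation drops 3 -> 2 -> 1

def aggregation_factor(scan_angle_deg):
    """Hypothetical helper: number of detector samples aggregated per pixel (3, 2 or 1)."""
    a = np.abs(np.asarray(scan_angle_deg, dtype=float))
    return np.where(a < AGG_EDGES_DEG[0], 3, np.where(a < AGG_EDGES_DEG[1], 2, 1))

def near_transitions(scan_angle_deg, half_width=3):
    """Hypothetical helper: flag pixels within half_width pixels on either side of an
    aggregation change along one scan line."""
    f = aggregation_factor(scan_angle_deg)
    boundaries = np.flatnonzero(np.diff(f) != 0) + 1   # first pixel after each change
    idx = np.arange(f.size)
    near = np.zeros(f.size, dtype=bool)
    for b in boundaries:
        near |= (idx >= b - half_width) & (idx < b + half_width)
    return near
```

A full scan line crosses each of the two transitions twice (once on each side of nadir), so four windows of six pixels are flagged per line.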
5.7. Continuity with MODIS and AVHRR SSTs
A main motivation of this study is to establish how well VIIRS SSTskin retrievals can be collated with those from other broad-swath imaging infrared satellite radiometers to generate a consistent multi-decadal series that can form the basis of an SST CDR. As discussed above, the VIIRS SSTskin processing algorithms are of the same form as those applied to the measurements of both MODISs, on Terra and Aqua, and to the AVHRR on the NOAA-19 polar-orbiting weather satellite. The time series of the global monthly medians and robust standard deviations of the differences between the satellite SSTskin retrievals and the subsurface temperature measurements show good agreement for the period starting January 2012 (Figure 15). The temperature differences are clustered around −0.17 K, the mean skin effect. The interquartile range of the differences between the satellite SSTskin medians is 0.036 K, and the robust standard deviations within each month for each satellite are in the range 0.2–0.4 K.
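One reading of the inter-sensor spread statistic is sketched below: for each month, the differences between the monthly medians of every pair of sensors are formed, and the interquartile range of all those differences is taken. This interpretation, the function name, and the requirement that all series be aligned month-by-month are assumptions for illustration.

```python
import itertools
import numpy as np

def iqr_of_intersensor_differences(monthly_medians):
    """Hypothetical helper: IQR (K) of pairwise differences between sensors' monthly medians.

    monthly_medians: dict mapping sensor name -> sequence of monthly median
    (satellite SSTskin - buoy) values, all aligned on the same months (NaN = missing).
    """
    sensors = sorted(monthly_medians)
    series = {s: np.asarray(monthly_medians[s], dtype=float) for s in sensors}
    diffs = []
    for a, b in itertools.combinations(sensors, 2):
        d = series[a] - series[b]          # month-by-month difference of medians
        diffs.append(d[np.isfinite(d)])
    diffs = np.concatenate(diffs)
    q25, q75 = np.percentile(diffs, [25, 75])
    return float(q75 - q25)
```

The IQR, like the median, is insensitive to a few anomalous months, so it summarizes the typical inter-sensor consistency rather than the worst case.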
The latitudinal variations in the differences between the satellite SSTskin and the subsurface temperatures also show good agreement in terms of median and robust standard deviation (Figure 15). At high southern latitudes, the AVHRR SSTskin median differences are closer to zero than those of VIIRS and MODIS, and at mid-latitudes the AVHRR differences generally show a higher robust standard deviation.
Hovmöller diagrams of the daily differences in the zonal averages of SSTskin derived from S-NPP VIIRS and MODIS on Terra and Aqua are shown in Figure 16. The largest discrepancies are with the Terra MODIS SSTskin during the day; this is a manifestation of diurnal heating resulting from the differences in the satellite overpass times. As the Equator crossing time of Terra is 10:30, compared to 13:30 for S-NPP, the VIIRS retrievals are much more likely to be influenced by diurnal heating. The diurnal heating effect is more pronounced during Southern Hemisphere summers. At high northern latitudes, the overpass times of the two satellites reverse, with Terra's in the afternoon and S-NPP's in the morning, so the Terra MODIS SSTskin retrievals sample more diurnal heating than VIIRS. This results in the systematic appearance each summer of VIIRS SSTskin < MODIS SSTskin for latitudes >~60° N, with smaller amplitudes at mid-latitudes. The transition from VIIRS retrievals being warmer than those of Terra MODIS to being cooler occurs at a latitude that migrates seasonally. At high northern latitudes (>~60° N) in winter, VIIRS SSTskin > MODIS SSTskin in a systematic pattern, which may be caused by the different bandwidths of the thermal IR bands of VIIRS and MODIS (Figure 1) causing the atmospheric correction algorithms to respond differently to very dry atmospheric conditions [108]. Given that Aqua and S-NPP have the same Equator crossing times, the differences in the zonally averaged SSTskin retrievals are expected to be small, and this is indeed the case, with nearly all differences smaller than 0.2 K in magnitude and, during the day, many latitudes showing differences smaller than 0.1 K in magnitude.
The comparisons of nighttime retrievals reveal very small differences south of ~40° N for MODIS on both Terra and Aqua. North of ~40° N, the pattern is similar for both MODISs, and given the differences in the relative orbit configurations and the fact that these are nighttime conditions, this pattern is unlikely to be caused by changes in the true SSTskin; instead, the differences may plausibly result from a systematic issue with the VIIRS atmospheric correction algorithm at high northern latitudes.
As with the Hovmöller diagrams of comparisons with reference fields (Figure 10), there are linear patterns at constant latitudes in all comparisons, except with Terra MODIS during the day, when the signal from diurnal heating dominates. These discrepancies are at the boundaries of the latitude bands, where there is a smoothed transition from SSTskin derived using one set of coefficients in the atmospheric correction algorithm to another set, and suggest that this transition needs improvement.
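The text does not specify how the smoothed transition between coefficient sets is implemented, so the following is only a hypothetical sketch of one simple option: a linear ramp in latitude across a transition zone centred on a band boundary. The boundary latitude, the half-width of the zone, and both function names are assumptions for illustration.

```python
import numpy as np

def blend_weight(lat, boundary, half_width):
    """Hypothetical linear ramp: 0 south of the transition zone, 1 north of it."""
    lat = np.asarray(lat, dtype=float)
    return np.clip((lat - (boundary - half_width)) / (2.0 * half_width), 0.0, 1.0)

def blended_sst(lat, sst_south_coeffs, sst_north_coeffs, boundary, half_width):
    """Blend SSTskin computed with the southern and northern band coefficient sets."""
    w = blend_weight(lat, boundary, half_width)
    return (1.0 - w) * sst_south_coeffs + w * sst_north_coeffs
```

Even with a smooth ramp, any systematic disagreement between the two coefficient sets is spread over a zone fixed in latitude, which would appear as a constant-latitude feature in a Hovmöller diagram.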
The global statistics for nighttime VIIRS SSTskin retrievals and those of Terra and Aqua MODIS compared to subsurface temperatures from drifting buoys and to M-AERI measurements are shown in Table 5 for QL = 1 and QL = 0. The satellite-derived SSTskin are calculated using measurements in the thermal IR: MODIS bands 31 and 32 and VIIRS bands M15 and M16 (Table 1; Figure 1). The mean and median of the buoy comparisons indicate the effects of the thermal skin layer, and there is a consistent increase in the metrics for QL = 1 retrievals, indicating imperfect corrections for longer atmospheric path lengths. Note that the MODIS data in the figures and in Table 5 are from the NASA MODIS R2014.0.1 processing scheme; the results will differ, and it is to be hoped improve, once the MUDBs for the R2019.0 processing [15] become available.
5.8. Comparisons to Ship Radiometer Measurements
To remove the contributions to the uncertainty estimates that arise in comparisons with buoy data from both near-surface temperature gradients between the depth of the buoy measurements and the skin, and buoy thermometer inaccuracies, the satellite retrievals have been compared with SSTskin derived from well-calibrated ship-borne radiometers (Section 3.4 and Section 3.5).
Table 6 shows the statistics of the comparison between VIIRS and M-AERI SSTskin values for VIIRS day and night retrievals using the dual-band algorithm (Equation (3)), and at night using the triple-band algorithm (Equation (4)), for QL = 0 and QL = 1 matchups. The difference between the numbers of daytime and nighttime matchups arises because the M-AERI measurements come from cruise ships, which spend part of each day in tourist ports and transit at night; M-AERI measurements within 10 km of a port are not included in the MUDBs. The smaller number of SSTtriple comparisons relative to nighttime SSTskin comparisons is presumed to result from differences in the ADTree cloud screening, but this requires further investigation. The mean and median differences are good, being less than 0.1 K for QL = 0, which is reassuring as the M-AERI data have been withheld from the derivation of the atmospheric correction algorithms; the variability, especially the robust standard deviation, is also low for all algorithms.
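The 10 km port exclusion can be sketched with a great-circle (haversine) distance test. This is a minimal illustration, assuming a spherical Earth of mean radius 6371 km and a hypothetical list of port coordinates; the actual MUDB screening code and port list are not described here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))  # clamp rounding

def outside_ports(lat, lon, ports, min_km=10.0):
    """Hypothetical filter: True if a measurement is at least min_km from every port.

    ports is an assumed list of (lat, lon) tuples in degrees.
    """
    return all(haversine_km(lat, lon, plat, plon) >= min_km for plat, plon in ports)
```

At these distances the spherical approximation is far more accurate than needed for a 10 km exclusion radius.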
6. Discussion
The results based on analysis of about six years of on-orbit S-NPP VIIRS data suggest that the infrared bands of VIIRS lack many of the instrumental artifacts present in the early MODIS data, and that these bands are “clean” and stable over time.
SSTskin algorithms require well-calibrated TOA BTs. The on-orbit calibration process of VIIRS, along with information collected during pre-launch calibration and characterization, provides such data. However, a full assessment of the accuracy of the VIIRS TOA BT measurements would require an analysis of simultaneous nadir overpass (SNO) measurements with a well-calibrated spectroradiometer on orbit, such as IASI, the Infrared Atmospheric Sounding Interferometer [109,110] on the European MetOp polar orbiters; such an analysis is beyond the scope of this study. A recent report of SNO measurements between VIIRS and MODIS on Aqua [111] has demonstrated consistency among the BTs of spectrally similar bands at the level of 0.2 K. Li et al. [111] used measurements from IR hyperspectral radiometers on each spacecraft, CrIS on S-NPP and AIRS on Aqua, to account for differences in the VIIRS and MODIS relative spectral response functions (Figure 1).
Comparisons between global SSTskin fields derived from VIIRS and SSTμw (microwave retrievals adjusted by −0.15 K; see Section 3.2) from WindSat show cold biases in the VIIRS infrared SSTskin in regions where heavy aerosol loading is expected [96,112]. Similar cold biases have previously been identified in SSTs derived from other satellite infrared radiometers, AVHRR and MODIS [97,113,114]. Comparisons with DOISST fields showed these cold biases in the VIIRS retrievals as well, but also warm biases, particularly in the central South Atlantic Ocean (Figure 9). The absence of warm biases in the comparison with SSTμw from WindSat is strongly suggestive of cold biases in the DOISST fields.
The generally zonal features of the difference fields revealed in Hovmöller diagrams are likely due to transitions between latitude bands where algorithm coefficients change. The dawn-dusk orbit of WindSat, and the need to composite five days of data to generate near-complete global fields, introduce concerns about temporal changes in the upper ocean when these fields are compared to VIIRS SSTskin retrievals. These temporal effects can be reduced by using AMSR2 as a source of SSTμw.
Although drifting buoys are known to be less accurate than previously assumed, and they measure a subsurface temperature that is decoupled from the skin SST by near-surface temperature gradients [64,103], their large number renders them a valuable tool to assess the accuracies of satellite-derived SSTskin. The differences shown in Table 4 are reasonable for a satellite radiometer of new design and could be reduced through the development of more refined algorithms.
Accounting for the mean skin effect by reducing the buoy measurements by 0.17 K to permit a comparison with satellite-derived SSTskin is recognized as a simple, first-order adjustment, and it contributes to the statistics of the differences presented above, especially in low wind speed conditions. An early attempt to use a wind-speed-dependent skin temperature correction [39,40] did not improve the statistics, but with enhanced reanalysis fields such as ERA5 [115], which has 1 h temporal resolution and ~30 km spatial resolution, having recently become available, it would be timely to reassess the thermal skin correction in such comparisons.
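The difference between the constant adjustment and a wind-speed-dependent one can be sketched as follows. The constant 0.17 K offset is the one used in this study; the exponential form ΔT = a + b·exp(−u/u0), with a = −0.14 K, b = −0.30 K and u0 = 3.7 m s−1, is an assumed nighttime parameterization of the kind reported in ship-radiometer studies and is not necessarily the correction tested in [39,40]. The wind speed would come from a collocated source such as a reanalysis; the function names are hypothetical.

```python
import math

CONSTANT_SKIN_OFFSET_K = 0.17  # mean skin effect used in this study

def buoy_to_skin_constant(buoy_temp_k):
    """Estimate SSTskin from a buoy temperature with the constant offset."""
    return buoy_temp_k - CONSTANT_SKIN_OFFSET_K

def buoy_to_skin_wind(buoy_temp_k, wind_speed_ms, a=-0.14, b=-0.30, u0=3.7):
    """Hypothetical nighttime wind-dependent adjustment: dT = a + b * exp(-u / u0).

    The coefficients are illustrative assumptions; dT is most negative at low
    wind and tends to a (about -0.14 K) at high wind.
    """
    return buoy_temp_k + a + b * math.exp(-wind_speed_ms / u0)
```

Under these assumed coefficients the adjustment is about 0.37 K at 1 m s−1 and about 0.16 K at 10 m s−1, bracketing the constant 0.17 K, which illustrates why a single offset leaves residual structure in low-wind conditions.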
Comparisons with SSTskin measurements taken from ship-board radiometers show a bias close to zero and encouragingly low variability (Table 5). These results indicate that the VIIRS SSTskin are comparable in accuracy with those of the heritage instruments, MODIS [5] and AVHRR [18], and have good potential to extend reliable and accurate SSTs into the future [116].
The series of satellite infrared radiometers discussed here are broad-swath imagers, but another type of satellite infrared radiometer also has the potential to contribute to the SST CDR. These narrow-swath radiometers are designed to provide accurate SSTskin fields for climate research by viewing each surface location through two different atmospheric path lengths to improve the accuracy of the atmospheric correction. These radiometers, the Along-Track Scanning Radiometers (ATSRs), have been flown on a series of European Space Agency satellites, starting in 1991 with ERS-1 and subsequent satellites [117,118,119], including the Advanced ATSR (AATSR) [119,120] on ENVISAT [121,122], and continuing to the present with the Sea and Land Surface Temperature Radiometer [123] on the Sentinel-3 satellites [124]. The SSTskin retrievals from the ATSR series of satellite radiometers [102,125] are the basis of the ESA SST Climate Change Initiative [126].
Until the recent release of a CDR of SSTs by Merchant et al. [127], based on the NOAA and EUMETSAT AVHRRs and the (A)ATSRs, the long time series of Pathfinder AVHRR [18], MODIS [5], and S-NPP VIIRS SSTs, described here, was the only consistent, long-term, satellite-derived global SST record that could be considered a CDR. The approaches of Merchant et al. [127] to cloud screening [128] and atmospheric correction [129] differ from those described here, but their objectives are the same. Future research comparing the representation of multi-decadal SSTskin fields in both CDRs will be very important and enlightening.
It is tempting, and common practice, to interpret the differences between satellite-derived SSTskin and any set of measurements used to validate them as an estimate of the accuracies of the satellite retrievals. However, at the level of the discrepancies presented here (Table 5) and elsewhere, e.g., [125], the errors and uncertainties of the instruments used to provide the independent measurements ought to be considered. In addition, errors and uncertainties introduced by the comparison methods themselves, such as temporal and spatial variability within the permitted separation between the satellite and surface measurements in the MUDBs, should be taken into account. Thus, the statistics given here of comparisons with subsurface temperatures and ship radiometers are not a true estimate of the errors and uncertainties in the satellite retrievals; the true accuracies are likely to be better than these statistics suggest.
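The reasoning above can be made concrete with a simple error budget. Assuming the satellite, reference-instrument, and collocation errors are mutually uncorrelated, their variances add, so σ_diff² = σ_sat² + σ_ref² + σ_colloc², and the satellite term is what remains after the other two are removed. The sketch below is illustrative only; the numbers in any usage are hypothetical, not values from Table 5, and the function name is an assumption.

```python
import math

def implied_satellite_uncertainty(sigma_diff, sigma_ref, sigma_colloc=0.0):
    """Hypothetical helper: satellite-only standard uncertainty (K).

    sigma_sat = sqrt(sigma_diff**2 - sigma_ref**2 - sigma_colloc**2),
    valid only if the three error sources are mutually uncorrelated.
    """
    residual = sigma_diff**2 - sigma_ref**2 - sigma_colloc**2
    if residual < 0:
        raise ValueError("reference and collocation terms exceed the observed spread")
    return math.sqrt(residual)
```

For example, an observed spread of 0.5 K with a 0.3 K reference contribution would imply a satellite-only uncertainty of 0.4 K, smaller than the raw comparison statistic.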
Other than their potential to contribute to the generation of a CDR of SSTskin, the applications of VIIRS SSTskin retrievals have not been addressed here. Although the improved spatial resolution of VIIRS pixels has the potential to benefit operational uses of VIIRS retrievals and many research studies, these applications are shared with retrievals from other satellite IR radiometers. Such applications are described in textbooks, such as [130,131], in a recent review paper [12], and elsewhere.
7. Summary and Conclusions
We report on our analysis of the integrity of the VIIRS measurements, the accuracies of the SSTskin retrievals, and the potential of these retrievals to contribute to the Climate Data Record of SST. Our approach to assessing the performance of VIIRS in producing accurate SSTskin has involved a diverse suite of analyses. Our main findings are summarized as follows:
Infrared bands of VIIRS are very “clean” and lack many of the instrumental artifacts that were present in the initial MODIS measurements.
Spatial and temporal distributions of TOA BTs and uncertainties in derived SSTs agree well with those simulated using atmospheric radiative transfer models.
Validation using other satellite-derived SSTs, analysis fields, ship-board radiometers, and buoys indicates that VIIRS SSTskin are of good accuracy and have the potential to make a significant contribution to SST CDRs.
Performance of the heritage Binary Decision Tree cloud screening scheme has been much improved through the adoption of a new Alternating Decision Tree approach.
11–12 µm (day and night; Equation (3)) SSTskin retrievals show accuracies comparable to those of MODIS; the 3.75–11–12 µm nighttime VIIRS SSTtriple retrievals (Equation (4)) show improved accuracies compared to the MODIS retrievals using the 3.65 and 4.05 μm measurements [5].
The standard deviations of the SSTskin derived using the ADTree cloud screening and Equations (3) and (4) atmospheric correction algorithms indicate that VIIRS is capable of producing climate-quality SSTs.
The results presented here for the NASA VIIRS continuity algorithm are a subset of a much larger body of study on the accuracy of SST records. The aim of the broader effort is to develop and evaluate candidate SSTskin algorithms, improve the assessment of pixel quality, and provide accuracy evaluation methodologies across multiple IR sensors, with the goal of extending and improving the existing four-decade IR-based SST CDR from the AVHRR, MODIS, and now VIIRS. However, it should be noted that NASA support for the improvement of VIIRS SSTskin retrievals ended in mid-2018; as a result, the accuracy of the retrievals is no longer being scrutinized to the same degree as before. Also, there are no current plans to apply to VIIRS measurements algorithm improvements such as those implemented in the recent R2019.0 reprocessing of the MODIS SSTskin retrievals [15], which included corrections for aerosol-burdened atmospheres [97] and better accuracy at high latitudes [108].