*Article* **Evaluation of Satellite-Based Algorithms to Retrieve Chlorophyll-a Concentration in the Canadian Atlantic and Pacific Oceans**

#### **Stephanie Clay 1,†, Angelica Peña 2, Brendan DeTracey <sup>3</sup> and Emmanuel Devred 3,\*,†**


Received: 3 October 2019; Accepted: 1 November 2019; Published: 7 November 2019

**Abstract:** Remote-sensing reflectance data collected by ocean colour satellites are processed using bio-optical algorithms to retrieve biogeochemical properties of the ocean. One such important property is the concentration of chlorophyll-a, an indicator of phytoplankton biomass that serves a multitude of purposes in various ocean science studies. Here, the performance of two generic chlorophyll-a algorithms (i.e., a band ratio one, Ocean Colour X (OCx), and a semi-analytical one, Garver–Siegel Maritorena (GSM)) was assessed against two large *in situ* datasets of chlorophyll-a concentration collected between 1999 and 2016 in the Northeast Pacific (NEP) and Northwest Atlantic (NWA) for three ocean colour sensors: Sea-viewing Wide Field-of-view Sensor (SeaWiFS), Moderate Resolution Imaging Spectroradiometer (MODIS), and Visible Infrared Imaging Radiometer Suite (VIIRS). In addition, new regionally-tuned versions of these two algorithms are presented, which reduced the mean error (mg m<sup>−</sup>3) of chlorophyll-a concentration modelled by OCx in the NWA from −0.40, −0.58 and −0.45 to 0.037, −0.087 and −0.018 for MODIS, SeaWiFS, and VIIRS respectively, and −0.34 and −0.36 to −0.0055 and −0.17 for SeaWiFS and VIIRS in the NEP. An analysis of the uncertainties in chlorophyll-a concentration retrieval showed a strong seasonal pattern in the NWA, which could be attributed to changes in phytoplankton community composition, but no long-term trends were found for all sensors and regions. It was also found that removing the 443 nm waveband for the OCx algorithms significantly improved the results in the NWA. Overall, GSM performed better than the OCx algorithms in both regions for all three sensors but generated fewer chlorophyll-a retrievals than the OCx algorithms.

**Keywords:** ocean colour; satellite-derived chlorophyll-a concentration; algorithm evaluation; Northwest Atlantic; Northeast Pacific

#### **1. Introduction**

The product derived from satellite ocean colour that is the most used is undoubtedly chlorophyll-a (*chla*) concentration, an index of phytoplankton biomass, which has numerous applications in biogeochemical oceanography, such as phytoplankton ecology and phenology [1,2], carbon cycles [3], climate change, transfer of energy to higher trophic levels, and water quality [4]. Satellite-derived *chla* concentration is a mature product of ocean colour that is used not only by experts in the field of bio-optics but also by the entire oceanographic community in various ways, such as modelling and data assimilation [5], fisheries applications [6], and ecosystem management [7]. For instance, European programs such as Copernicus (https://www.copernicus.eu/) or NOAA's COASTWATCH (https://coastwatch.noaa.gov/cw/index.html) provide daily *chla* data that are accessible to the public for fisheries and water quality applications. In Canada, Fisheries and Oceans have been relying on ocean colour for several decades to monitor the state of the marine ecosystem [8,9], and in particular on *chla* concentration, and its derived product primary production, to infer ecosystem indicators that are at the foundation of ocean management.

*Chla* concentration is derived from remote sensing reflectance (*Rrs*), which is the ratio of water-leaving radiance to downwelling irradiance corrected for sun geometry to allow for comparisons independent of locations, times and dates. Several approaches to infer *chla* from *Rrs* have been developed and embedded by space agencies in their data processing software, including band ratio [10] and semi-analytical models [11]. As its name suggests, band ratio algorithms exploit the ratio of wavebands in the blue and green to retrieve *chla*; the Ocean Colour X (OCx) (x stands for 2, 3, or 4 and indicates the number of bands that were used in the algorithm) suite of empirical algorithms have been developed by the National Aeronautics and Space Administration (NASA) using a global dataset of *in situ* measurements of *chla* concentration fitted to remote sensing reflectance (NOMAD, the NASA bio-Optical Marine Algorithm Dataset [12]) and a fourth-degree polynomial expression. On the other hand, semi-analytical algorithms (e.g., Garver–Siegel Maritorena (GSM)) consist of optimizing bio-optical parameters (including *chla* concentration) in an approximate solution of the radiative transfer equation to match modelled reflectance to the reflectance measured by the satellite. This type of algorithm has the advantage of decoupling the contribution of the optically active components (i.e., phytoplankton, non-algal particles and coloured dissolved organic carbon) such that *chla* concentration should, in theory, be retrieved with higher accuracy than the band ratio algorithms. These two types of approaches are well-suited for the so-called case-1 waters (i.e., waters where *chla* concentration drives the optical characteristics of bulk seawater) but do not perform as well in case-2 waters (i.e., where optical signals of marine components are uncorrelated). For case-2 waters, models that use fluorescence [13] or more advanced statistical methods, such as neural networks [14] or principal component analysis [15], have been demonstrated to perform better than band ratio or semi-analytical approaches. Note that the performance of algorithms that use remote sensing reflectance will be inherently dependent on the performance of the atmospheric correction procedure, which will not be addressed here.

It has been shown that OCx [15,16] and GSM [17] algorithms contain regional biases, such that even if its overall performance is satisfying, application to a given region results in systematic bias. Here, we assessed the performance of the OCx and GSM algorithms in Canadian waters (Northwest Atlantic (NWA) and Northeast Pacific (NEP)) using a dataset of *in situ chla* concentration that was collected by Fisheries and Oceans Canada as part of their monitoring of the marine ecosystem. Performance of these algorithms was evaluated for NASA's three sensors, namely the Sea-viewing Wide-field-of-View Sensor (SeaWiFS, 1998–2010), the Moderate Resolution Imaging Spectrometer on the Aqua platform (MODIS-Aqua, 2002–current) and the Visible and Infrared Imaging Spectrometer on the NPP platform (VIIRS, 2012–current). Uncertainties of the algorithms with respect to environmental variables were also discussed.

#### **2. Material and Methods**

#### *2.1. Regions of Interest*

Fisheries and Oceans Canada (DFO) carries out sea-going expeditions as part of its oceanographic monitoring programs to obtain information on the health of the marine ecosystem to support decision-making in the management of the ocean. In the Northwest Atlantic (NWA) Ocean, more than a hundred stations are visited twice a year in spring and fall on the Scotian Shelf and once a year (in spring) in the Labrador Sea as part of the fieldwork of the Atlantic Zone Monitoring Program (AZMP) and Atlantic Zone Offshore Monitoring Program (AZOMP) respectively (Figure 1). During these missions, a large number of physical, chemical and biological oceanographic parameters are collected [8]. As part of its remote sensing operations, DFO archives satellite data of ocean colour and sea-surface temperature in a region bounded by 39◦ to 82◦ north and 95◦ to 42◦ west. These data were used to assess the performance of ocean colour *chla* data products in the NWA. In a similar fashion, DFO runs monitoring programs in the NEP along Line P three times a year (in winter, spring and summer), off the west coast of Vancouver Island twice a year (in spring and summer) and in the Strait of Georgia three times a year (in spring, summer and fall), where physical, chemical and biological data are collected [18]. The in situ data from this region used in this study are bounded by 47◦ to 57◦ north, 148◦ to 123◦ west.

**Figure 1.** Locations of satellite matchups (**a**) in the Northwest Atlantic (NWA) from January to June, (**b**) from July to December, (**c**) in the Northeast Pacific (NEP) from January to June, and (**d**) from July to December.

#### *2.2. In Situ Samples*

DFO collects data on *chla* concentration using two methods, namely, Turner fluorescence (TF) and high-performance liquid chromatography (HPLC). While the most comprehensive dataset is the one obtained using the TF method (tens of thousands of data), here we used the dataset containing HPLC-derived *chla* concentration since it provides the most accurate estimates of *chla* and it is also the method recommended by space agencies for validation activities. A large dataset was compiled for the purpose of this study, consisting of the *in situ* samples gathered in the NWA and NEP which were analyzed in their respective DFO regional laboratories. In brief, water samples were rapidly filtered after collection through 25 mm GF/F glass fiber filters, immediately flash-frozen using liquid nitrogen, and stored in a −80 ◦C freezer until analysis in the laboratory. The NWA samples were analyzed using a Beckman–Coulter Gold HPLC system (1998–2013) and more recently an Agilent 1200 (2013–2014) HPLC instrument following the protocol of Head and Horne [19] as modified by Stuart and Head [20]. Similarly, the NEP samples were analyzed using a Waters Alliance System HPLC following the protocol of Zapata et al. [21]. The analysis was performed for 23 pigments for both

locations, but only *chla* concentrations from both regions and fucoxanthin (*f ucox*) concentrations from the NWA were used in this study. Note that chlorophyll-a concentration includes chlorophyll-a epimer and allomer pigments.

The complete dataset consisted of 1857 samples from the NWA between 1999 and 2014, and 1231 samples from the NEP between 2006 and 2016, all collected within ten metres from the surface. Both datasets showed a skewed normal distribution with a peak around 0.43 (0.42) mg m−<sup>3</sup> in the NWA (NEP) Ocean and a similar range of variation of *chla* from 0.03 to 29.41 mg m<sup>−</sup>3, but samples collected in the NEP had a mean concentration (2.26 mg m−3) greater than in the NWA, where the mean concentration was 1.68 mg m−<sup>3</sup> (Figure 2a). In the NWA, samples were collected mainly in spring (April to June) and fall (September to November) with a few exceptions of data collected in July and August, whereas the NEP dataset consisted of samples collected in winter (February to March), late spring-early summer (June to July) and late summer to early fall (August through October) (Figure 2b). Both datasets captured the seasonal cycle of phytoplankton, which was consistent with the wide range of biomass concentrations. The dataset was reduced to measurements collected within a day of satellite passes and ten kilometres from a valid pixel.

**Figure 2.** (**a**) Density function distribution of *in situ* chlorophyll-a concentration for the Northwest Atlantic (grey) and Northeast Pacific (lavender), and (**b**) total number of *in situ* samples available in the entire dataset for each month.

#### *2.3. Satellite Matchups*

The 2014 reprocessed datasets of SeaWiFS merged local area coverage (MLAC), MODIS-Aqua local area coverage (LAC), and VIIRS-Suomi NPP level-2 satellite scenes were downloaded from NASA's ocean colour L1/L2 browser (https://oceancolor.gsfc.nasa.gov/cgi/browse.pl?sen=am). For each image, a mask was applied to remove pixels with invalid data, defined as any pixel where one or more of the following criteria were met:


A search was performed for potential matches in any satellite image taken within a day of the *in situ* sample, as exact sampling times were not always available in the database. In order to match a satellite value to an *in situ* measurement, the satellite image was projected using a Gnomonic projection centred on the *in situ* point, which stretched great circles to straight lines. In other words, the image is projected onto a plane tangent to the *in situ* point, from which Cartesian coordinates (*x*, *y*) can be extracted for each corresponding (longitude, latitude) pair. For the sake of speed and simplicity, the Pythagorean theorem was then used to compute the distance between the *in situ* point and each satellite pixel. Though this method did not account for Earth's curvature, the computed distances at the

maximum allowed radius around the *in situ* point (i.e., 10 km) were found to be within ±23 m of the more accurate geodetic distance as calculated by the *distGeo*() function using R Statistical Software [22]. A successful match was defined as the closest pixel, within a 10 km radius, that contained at least three valid pixels in a 3 × 3 pixel box extracted around the matching point. The distance between the *in situ* sample and the closest satellite pixel was recorded in the database. The median of the 3 × 3 pixel matrix was computed for each of the extracted remote sensing reflectance (*Rrs*) wavelengths, and the resulting dataset was considered as the total number of matches, mapped in Figure 1.

From the initial dataset of 1857 and 1231 samples for the NWA and NEP respectively, only 416 (SeaWiFS), 530 (MODIS) and 176 (VIIRS) *in situ* samples were matched to valid satellite pixels in the NWA, and 45 (SeaWiFS), 487 (MODIS) and 342 (VIIRS) in the NEP (Table 1). The small number of matchups compared to the initial dataset attests to the difficulty of compiling archives of satellite data that match *in situ* samples for validation purposes and emphasizes the need for recurrent measurements such as monitoring programs and moored automatic sensors.

**Table 1.** Specifications of sensors and matchup datasets. Bold and underlined numbers correspond to blue (*λblue*) and green (*λgreen*) bands used in the band ratio Ocean Colour X (OCx) algorithms respectively (see Section 2.4.1).


Finally, *chla* concentration was derived using the OCx algorithm (Section 2.4.1) and pixels with a coefficient of variation (computed as the ratio of the standard deviation to the mean of all pixels available in the 3 × 3 matrix) exceeding 0.5 were removed from the matchup dataset.

#### *2.4. Chlorophyll-a Algorithms*

Two main algorithms were tested: The Ocean Colour (OCx) [10] and the Garver–Siegel Maritorena (GSM) [23] algorithms, as well as variations of these original versions that were regionally-tuned. The variations consist of four new versions for OCx, following the same polynomial format in degrees 1–4 with regional coefficients, and three optimal versions of GSM, with different combinations of three modifications to the algorithm, described in Section 2.4.4. For the sake of simplicity, results from only six algorithms were analyzed in this study: the two original algorithms (OCx and GSM), two regionally-tuned counterparts for OCx, and two for GSM. The results of the remaining algorithms have been included in Appendix A. The algorithms were implemented using version 3.3.2 of R Statistical Software [24] and several functions and packages referenced in the text.

#### 2.4.1. OCx-Type Algorithms

These empirical algorithms use a polynomial equation to express the log-10 transformed *chla* as a function of the log10-transformed ratio of blue to green remote sensing reflectance (Table 1). The blue band is selected to maximise the blue-to-green ratio, *ROCx*:

$$R\_{\rm OCx} = \log\_{10} \left( \max \left( \frac{R\_{rs}(\lambda\_{\rm blur})}{R\_{rs}(\lambda\_{\rm grav})} \right) \right) . \tag{1}$$

The ratio of blue to green *Rrs* values is used as opposed to a single *Rrs* value in an attempt to normalize the data, as *Rrs* values in the blue part of the spectrum vary greatly with *chla*, whereas the green wavebands have a more limited range of variation. As *chla* increases, *Rrs*(443) will decrease and become close to zero, reaching the noise level of the sensor. To avoid this issue, the algorithm switches to longer wavebands where the signal still correlates with *chla* and remains detectable. Additionally, longer wavelengths (e.g., 490 and 510) are less affected than the 443 nm band by variations in the concentration of coloured dissolved organic matter (i.e., yellow substances), and are therefore selected in the numerator of the ratio. Log10 *chla* is calculated using the following polynomial algorithm:

$$
\log\_{10}(chla) = a + b\,R\_{\text{OCx}} + c\,R\_{\text{OCx}}^2 + d\,R\_{\text{OCx}}^3 + e\,R\_{\text{OCx}}^4.\tag{2}
$$

NASA uses the method prescribed in O'Reilly et al. [10] to retrieve the optimal coefficients *a*, *b*, *c*, *d*, and *e*. This is an iterative method that constrains the slope and intercept of the linear regression between model and *in situ chla* to 1 and 0 respectively, minimizing the *RMSLE* (root mean squared logarithmic error, see Equation (12) in Section 2.5.1) and maximizing the coefficient of determination, *r*2. The comparison between band ratio models is then simplified since the slope, intercept, and consequently, the bias are all equal, and only the *RMSLE* and *r*<sup>2</sup> need to be evaluated. The *in situ* data used for the optimization of the original OCx algorithm is the global NASA Bio-Optical Marine Algorithm Dataset (NOMAD, version 2). Each sensor has a different set of wavelengths in the blue and green part of the spectrum (Table 1), such that the band ratio algorithm parameters are optimized for each sensor and are referred to as OC3M for MODIS, OC4 for SeaWiFS, and OC3V for VIIRS.

#### 2.4.2. Regional Tuning of the OCx Algorithm

To remain consistent with the matchup exercise, the median value of the 3 × 3 pixel matrix of each waveband was computed to represent the *Rrs*(*λ*) of the matchup. Note that the 443 nm waveband was removed from the potential blue wavebands in the band ratio as it was found to negatively affect the retrieval of *chla* (see discussion in Section 3.1). Several polynomial formulations from degrees one to four were tested and are hereafter referred to as "POLY1", "POLY2", "POLY3", and "POLY4" respectively. The optimal coefficients were retrieved using an iterative procedure to force the slope and intercept of the linear regression between satellite-derived and *in situ chla* to one and zero respectively as in O'Reilly et al. [10]. A 95% confidence interval for each optimal coefficient was computed using the *boot*() function in R with 2000 iterations [25,26]. For the sake of clarity and to investigate any potential benefits of adding higher-degree terms to the polynomial, the scores were computed for every polynomial degree, but only the results of the first and fourth-degree polynomials are discussed in the study. The parametrization and full results for the second and third polynomial are reported in Tables A1, A2 and A7 of Appendix A.

#### 2.4.3. Original GSM

The GSM semi-analytical algorithm was developed by Garver and Siegel [27], and modified by Maritorena et al. [23]. The model uses a quadratic formulation (Equation (3)) to relate the underwater remote-sensing reflectance values to the ratio of total absorption, *a* (m−1), and backscattering, *bb* (m<sup>−</sup>1) [28]. Note that the wavelength dependence in the following equations has been removed for the sake of legibility:

$$R\_{rs,model}(0^-) = g\_1 \left(\frac{b\_b}{a+b\_b}\right) + g\_2 \left(\frac{b\_b}{a+b\_b}\right)^2,\tag{3}$$

where *g*<sup>1</sup> = 0.0949 and *g*<sup>2</sup> = 0.0794. Total absorption and backscattering are defined as follows:

$$a(\lambda) \quad = \quad a\_{\text{\textquotedblleft}}(\lambda) + a\_{\text{\textquotedblleft}}(\lambda) + a\_{\text{dg}}(\lambda), \tag{4}$$

$$=\left.a\_w(\lambda) + c\hbar la \times a\_{ph}^\*(\lambda) + a\_{dg}(443)\exp\left(-S(\lambda - 443)\right)\right| \tag{5}$$

and

$$b\_b(\lambda) \quad = \quad b\_{bw}(\lambda) + b\_{bp}(\lambda) \tag{6}$$

$$=\quad b\_{b\text{pv}}(\lambda) + b\_{bp}(\text{443}) \left(\frac{\text{443}}{\lambda}\right)^{Y},\tag{7}$$

where *λ* is the wavelength (nm). The term *a* corresponds to the total absorption term and is the sum of the absorption terms of pure seawater (*aw*), phytoplankton (*aph*), and coloured detrital and dissolved organic materials (*adg*), which are combined in a single term given their similar spectral shapes; *S* expresses the exponential decrease of *adg* with increasing wavelength using the reference wavelength at 443 nm (i.e., *adg*(443)). The phytoplankton absorption term is defined as the concentration of *chla* multiplied by its specific absorption coefficient, *a*- *ph* (i.e., absorption per unit *chla*), to account for its spectral variation. The total backscattering term is the sum of the backscattering terms of pure seawater (*bbw*) and particulate matter (*bbp*), the latter expressed as the product between a reference value (i.e., *bbp*(443)) and the power-law-like decrease with increasing wavelength by the exponent *Y*. The coefficient *a*- *ph*, given in Table A10 of Appendix A, was determined using *in situ* measurements of *chla* and phytoplankton absorption at the wavelengths of interest, and computing the mean value of absorption divided by *chla* across all available records within the AZMP database that spans from 1998 to 2014. The *aw* coefficients were derived from tabulated values with a 2.5 nm resolution from 400 to 700 nm [29], and when necessary, interpolated to the corresponding wavelengths of each sensor. Similarly, the *bbw* coefficients were derived from a 10 nm resolution table of values generated by Smith and Baker [30]. The spectral slope of yellow substances absorption *S* and the power-law decrease of particulate backscattering *Y* were optimized for use in the original algorithm to 0.02061 and 1.03373 respectively using a large simulated dataset [31]. The updated GSM models in this study tuned *S* and *Y* using the dataset collected from the NWA and NEP (see Section 2.4.4 for details).

*Rrs*,*satellite*(0−) was retrieved by taking the median *Rrs*,*satellite*(0+) of the 3 <sup>×</sup> 3 matrix at each wavelength for a given match, then converting it from surface to underwater reflectance using Lee et al. [32]:

$$R\_{rs, satellite}(0^-) = \frac{R\_{rs}(0^+)}{0.52 + 1.7R\_{rs}(0^+)}.\tag{8}$$

Non-linear least squares regression, using the Gauss–Newton algorithm of the *nls*() function [24] to minimize the difference between modelled and satellite-derived *Rrs*, was implemented with a maximum of 30 iterations to derive the remaining three unknowns in Equation (3): *chla*, the absorption coefficient of coloured detrital and dissolved material at the reference wavelength (*adg*(443)), and the backscattering coefficient of particulate matter at the reference wavelength (*bbp*(443)). After retrieving these unknowns, a correction factor of 0.754188 was applied to the *adg*(443) value according to Maritorena et al. [23]. The following constraints were then placed on each to ensure realistic values:

$$\begin{cases} 0.01 \le c h l a \le 64 \,\text{mg } \text{m}^{-3} \\ 0.0001 \le a\_{d\zeta}(443) \le 2 \,\text{m}^{-1} \\ 0.0001 \le b\_{bp}(443) \le 0.1 \,\text{m}^{-1} \end{cases} \tag{9}$$

The original version of the model is abbreviated here as GSM\_ORIG.

#### 2.4.4. Regional Tuning of GSM

Similarly to the optimization of the OCx algorithm, *Rrs*(*λ*) for a given satellite matchup was computed as the median of the corresponding 3 × 3 pixel matrix centered on the matchup pixel. Three modifications to the original GSM algorithm were tested. First, a new optimized exponent, *P*, was added to the *chla* term to correct for systematic underestimation at high *chla* and overestimation

at low *chla* (see Sections 3.2 and 3.3) that were potentially caused by the change of *a*- *ph* with increasing *chla* (i.e., packaging effect [33]). The expression of the total absorption became:

$$a(\lambda) = a\_w(\lambda) + chl a^P \times a\_{ph}^\*(\lambda) + a\_{dg}(443) \times \exp(-S(\lambda - 443)). \tag{10}$$

Second, the *g* parameters (see Equation (3) and Table A8) were optimized and spectral dependency was included as well as a new term *g*<sup>3</sup> (Equation (11)) using the synthetic dataset from the IOCCG working group on remote sensing of inherent optical properties [31]. Coefficients were retrieved at 10 nm intervals from 400 nm to 700 nm and interpolated to sensor-specific wavelengths when necessary.

$$R\_{rs,model}(0^-) = \lg\_1\left(\frac{b\_b}{a+b\_b}\right) + \lg\_2\left(\frac{b\_b}{a+b\_b}\right)^{\xi\_3}.\tag{11}$$

Third, a sensitivity analysis was performed to compute the optimal exponents *S* (in the *adg* term), *Y* (in the *bbp* term), and the new exponent *P* (in the *aph* term), which provided the best agreement between *in situ* and satellite-derived *chla* for the NWA and NEP regions. A total of 5355 (17 × 35 × 9) possible combinations of the three exponents were tested, ranging from 0.008 to 0.04 with an increasing step of 0.002 for *S*, 0.5 to 2.2 with an increasing step of 0.05 for *Y*, and 0.4 to 0.8 with an increasing step of 0.05 for *P*. These combinations of exponents were evaluated for all sensors and regions, as well as for both constant *g* values (see Equation (3)) and the new spectral *g* values (see Equation (11)). The best set of exponents was determined using a similar scoring method as that used for the algorithm performance evaluation (see Section 2.5.2), where each set of exponents was ranked according to a selected set of metrics. To remain consistent with the methods of optimizing band ratio coefficients, the slope and intercept, as well as *RMSLE* and *r*2, were selected for this test, and a score was assigned to each statistic based on its mean across all possible combinations and its value relative to other combinations. The optimal set of exponents was defined as the median of all sets with the highest sum of scores. At least 50% valid retrievals were required for each potential combination in the test, in order to avoid sets with low error values but with a poor predictive capacity. The optimal exponents are given in Table A9.

The modified versions of GSM were divided into three separate algorithms according to the changes that were made. All new versions included the exponent *P* on the *chla* term: 1) GSM\_GC used the original *g* coefficients and regionally-tuned *P*, *S*, and *Y*, 2) GSM\_GCGS used the spectral *g* coefficients in combination with the optimized *P*, *S*, and *Y* parameters from GSM\_GC; and 3) GSM\_GS used the spectral *g* coefficients optimized simultaneously with *P*, *S*, and *Y*. The results of GSM\_GC and GSM\_GS are discussed in this study to demonstrate improvement in *chla* retrieval when using the new spectrally-dependent *g* coefficients and the *P* exponent in the phytoplankton-dependent term, however only the scores of GSM\_GCGS are included in the main text for simplicity's sake, and the full results in Appendix A (see Tables A1 and A2).

#### *2.5. Performance Metrics*

#### 2.5.1. Statistical Models

Several sets of statistics were used for the evaluation of the algorithm. First, the total number of matchups (*N*) and the number of valid retrievals (*n*), were used to calculate the percentage of retrievals. Second, several formulations were used to express the error between *in situ* and satellite-derived *chla*, including the mean error (*μerror*, units of mg m<sup>−</sup>3), as well as the root mean squared logarithmic error (*RMSLE*):

$$RMSELE = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} \left( \log\_{10}(\mathbb{C}\_i^\*) - \log\_{10}(\mathbb{C}\_i) \right)^2} \tag{12}$$

the mean log-transformed error (*MLE*):

$$MLE = 10^{\left(\frac{1}{n}\sum\_{i=1}^{n}\left(\log\_{10}(C\_i^\*) - \log\_{10}(C\_i)\right)\right)}\tag{13}$$

and the mean magnitude of log-transformed error (*MMLE*),

$$MMLE = 10^{\left(\frac{1}{n}\sum\_{i=1}^{n} \left| \log\_{10}(C\_i^\*) - \log\_{10}(C\_i) \right| \right)\_{\prime}} \tag{14}$$

where *C*∗ *<sup>i</sup>* corresponds to satellite-derived *chla* of matchup *i* and *Ci* corresponds to *in situ chla* of matchup *i*. These three metrics are unitless as a result of the log transformation, and the latter two metrics have an ideal value of one, stemming from the reverse log transformation that converts them from linear to multiplicative space. Third, the intercept, slope, and coefficient of determination (*r*2) were computed for the linear regression of log-transformed satellite-derived *chla* on log-transformed *in situ chla* using the Standard Major Axis method of Type 2 regression in the *lmodel*2() function [34]. Finally, a metric referred to as the "win ratio" based on Seegers et. al. [35], was also computed. The full dataset of *in situ*/satellite matchups for a region/sensor was pared down to a new total value, *Tvalid*, which only included matchups with valid satellite-derived *chla* for all algorithms. This facilitated comparisons between the errors of individual matchups for each algorithm. For a single *in situ*/satellite pair, the algorithm that generated the lowest magnitude of error was the "winner". Each model's total number of "wins" in the set, *winsmodel*, was then divided by *Tvalid*:

$$
\hat{u}v\hat{u}\hat{u}\hat{u}\hat{u}\_{model} = \frac{w\text{ins}\_{model}}{T\_{valid}}.\tag{15}$$

For example, if a dataset contains 500 matchups for which every algorithm generates a valid satellite *chla* concentration, and one of the algorithms (e.g., POLY1) produces the lowest error out of all algorithms for 70 of the 500 matchups, then the win ratio of POLY1 would be 70/500 = 0.14.

#### 2.5.2. Scoring Method

The performance of all algorithms was evaluated using a modified version of the ranking approach developed by Brewin et al. [36], which relies on a slightly different set of metrics. Five statistics were selected as suggested in Seegers et al. [35] in an attempt to summarize the bias, accuracy, and precision of the model, as described below:


The *RMSLE* was excluded from the scoring method in order to avoid redundant metrics that could skew the results by adding more emphasis to one element of the assessment (for example, bias) instead of giving all elements equal weight, however, its result is presented here for comparison with the literature. In our evaluation approach, all the metrics used were considered of equal weight. Each statistic was transformed to its magnitude of difference from the ideal value (for example, *r*<sup>2</sup> = 0.6 would be transformed to 0.4, as the ideal *r*<sup>2</sup> value is one) to give each the same reference point of zero so that lower statistics get higher scores. The mean of these transformed statistics was calculated across the results for all six algorithms. For a given algorithm, each of the five statistics was then scored as follows, taking into account their value relative to the mean and to the individual values of other algorithms:

#### *Remote Sens.* **2019**, *11*, 2609


Statistical scores for each algorithm were then summed to give the algorithm's overall performance in comparison to others (Tables 3 and 4).

#### *2.6. Temporal, Geographic and Phytoplankton Composition Influences on Algorithm Performance*

The performance of the algorithms was also evaluated against environmental properties to detect any systematic bias or increase in uncertainties due to external factors. We studied the variability in the error (*Echla*) as a function of time (defined as year + day of year/365) using boxplots generated by *ggplot*2() and *geom*\_*boxplot*() in R [37], given that the validation exercises span over several years for all three sensors and both regions. To examine the relationships between the uncertainties and other environment variables, a correlation analysis was performed on the magnitude of error (|*Echla*|) between *in situ* and satellite-derived *chla* against the variables listed below, using the *shapiro*.*test*() function to check that each subset followed a normal distribution, and the *cor*.*test*() function to retrieve the correlation coefficient and *p*-value [24]:


The ratio of *f ucox* to *chla* was selected to provide the ability to detect any bias due to change in phytoplankton composition, as the *f ucox* pigment is an indicator of the presence of diatoms [38,39].

To test if the seasonal cycle induced different uncertainties we performed a Student's *t*-test on *μerror* for each sensor and algorithm between spring and fall in the NWA, and similarly a One-Way ANOVA test between spring, summer and fall in the NEP. A separate *t*-test was also performed on *μerror* between coastal and open ocean bathymetry subsets (as defined above). Again, the assumption of normality of each subset was checked using the *shapiro*.*test*() function and assumptions of equal variance between subsets were checked using the *var*.*test*() and *f ligner*.*test*() functions for comparisons of two groups and three groups respectively. The tests for differences in means between subsets were conducted using the functions *t*.*test*() (for two groups) and *aov*() (three groups). Tukey's honest significant difference was performed on the ANOVA results from the NEP to determine which seasons generated mean errors that were statistically different from each other, using the *TukeyHSD*() function [24].

#### **3. Results**

#### *3.1. Parameters of the Optimized Band Ratio Algorithms*

Regionalization of the OCx and GSM algorithms was carried out using satellite-derived and *in situ chla*, which differs from the original algorithms where the coefficients of the models were derived using remote sensing reflectances and *chla* measured *in situ*. The regionally-tuned coefficients of the band ratio models remained consistent with the original parameters and notably, the first two coefficients (i.e., *a* and *b*, Table 2) were within the same order of magnitude and of the same sign as the original coefficients. In general, the first term *a* was higher in the regional versions of the algorithms. This can be explained by the overall underestimation of *chla* by the original OCx algorithms. The *a* and

*b* coefficients for the NEP for all sensors were closer to the original ones than for the NWA, which was consistent with the better performance of the original algorithms in the NEP than in the NWA (see Section 3.2 and 3.3). Regarding the coefficients for high degree polynomials (i.e., >= 2), there were no distinct patterns and the coefficients of the regional algorithms tend to be different and often opposite in sign from the original coefficients. The removal of the 443 nm band in the new band ratio algorithms may explain the differences between the original and regional coefficients of the algorithms. The numerator of the band ratio in the OCx algorithm varies between two or three possible wavebands depending on the sensor, where one of the options is the 443 nm band. Our analysis revealed that this band produced more outliers in the polynomial relationship between log-transformed *in situ chla* and the log-transformed band ratio than other numerator wavebands, which is particularly obvious in the NWA MODIS dataset (Figure 3). The blue boxes in Figure 3a,b show how the polynomial fit translates to a small range of modelled *chla* corresponding to a wide range of *in situ chla*. Removing the 443 nm waveband from the potential blue wavebands produced a tighter fit with the polynomial (Figure 3c,d). SeaWiFS and VIIRS NWA matchups datasets showed similar improvements after removing the 443 nm band, while the NEP datasets did not show significant changes after removing this band.

**Figure 3.** (**a**) *In situ chla* as a function of *Rrs* ratio for POLY4 including the 443 nm band, where the blue dots represent the band ratios that used the 488 nm band in the numerator and the pink dots used the 443 nm band in the numerator, (**b**) Satellite-derived versus *in situ chla* when including the 443 nm band in the ratio, where green and red circles represent spring and fall matchups respectively. (**c**) same as (**a**) but without including *Rrs*(443) in the POLY4 algorithm, (**d**) same as (**b**) but without including *Rrs*(443) in the POLY4 algorithm.


**Table 2.** Coefficients of the OCx algorithms for each sensor and region.

#### *3.2. Algorithms performance in the NWA*

The statistic metrics remain consistent across the three sensors, with the regionally-tuned algorithms providing the best performance, as expected. In general, the polynomial-based approach performed better than the semi-analytical algorithms (Figure 4a and Table 3, see scores). It is noteworthy that the polynomial approaches yielded more matchups (96, 81 and 98% of possible matchups for MODIS, SeaWiFS and VIIRS respectively) than the semi-analytical methods (between 57 and 87% of possible matchups depending on sensors, global and regional methods). For the semi-analytical approach, the regional versions systematically provided more valid pixels that the original one.

**Table 3.** Statistics computed for OCx, POLY1, POLY4, Garver–Siegel Maritorena (GSM)\_ORIG, GSM\_GC, and GSM\_GS algorithms for the Northwest Atlantic. Note that other versions of the algorithms OCx and GSM were tested but only included in Appendix A, so the "win ratio" column does not add up to one for each sensor as it does not include all values. Also recall that as a result of the log and reverse log transformations, the ideal value of *MLE* and *MMLE* is one. *N* is the total number of matchups as defined in Section 2.3, and *n* is the number of matchups that returned valid *chla* for the particular algorithm.


**Figure 4.** (**a**) Algorithms' scores for Moderate Resolution Imaging Spectrometer (MODIS), Sea-viewing Wide Field-of-view Sensor (SeaWiFS) and Visible Infrared Imaging Radiometer Suite (VIIRS) in the NWA based on the mean log-transformed error (*MLE*), mean magnitude of log-transformed error (*MMLE*), and *r*<sup>2</sup> from the linear regression of satellite-derived on *in situ chla*, the percentage of retrievals, and the win ratio metric. Red bars correspond to the original algorithms, green bars correspond to regionally-tuned algorithms discussed in the text, grey bars corresponds to regionally-tuned algorithms presented only in Tables A1 and A2 of Appendix A. (**b**) Satellite-derived versus *in situ chla*, green solid circles correspond to the spring season and red solid circles correspond to the fall.

As shown in Table 3, all the generic algorithms (i.e., OCx and GSM\_ORIG) had a slope of the linear regression of satellite-derived *chla* on *in situ chla* lower than 0.84 (0.839 for OC3M) and as low as 0.600 for VIIRS GSM\_ORIG. Note that the slope of the linear regression of the tuned algorithms was forced to one. The regional algorithms also had a higher correlation coefficient than the original methods for all sensors, all algorithms. The *RMSLE* exhibited values (from 0.29 to 0.47) that were in agreement with values reported in the literature for regional algorithms for the retrieval of *chla* [40–42]. Tuned algorithms showed smaller *RMSLE* than original algorithms for both semi-analytical and band ratio approaches except for the GSM approaches for VIIRS. The *MLE* (Equation (13)) was lower than one for the original algorithms for all three sensors, as they have an overall negative bias, as seen in Figure 4b. After correcting the bias in the regional algorithms, the *MLE* was closer to one (particularly in the band ratio models) and the *MMLE* was reduced. In general, the improvement in the *MMLE* was greater for the GSM-type approaches than for the OCx-type approaches. The better performance of the regional algorithms is most noticeable in the mean error, which generally improved by a full order of magnitude. When the algorithms were applied to the identical subset of retrievals, GSM\_ORIG showed the highest win ratio for MODIS and VIIRS, while narrowly losing to OC4 in the SeaWiFS

dataset. The regional linear and fourth-degree polynomial fits (POLY1 and POLY4) provided very similar results for all sensors, which is consistent with the study of Laliberté et al. [15] in the Gulf of Saint-Lawrence (Canada).

#### *3.3. Algorithm Performance in the NEP*

For the NEP, the original OCx algorithms exhibited a better linear fit than for the NWA (Table 4 and Figure 5b). For instance, OC3M showed a slope and intercept of the regression of satellite versus *in situ* data of 0.966 and 0.021 respectively against 0.839 and -0.067 for the NWA. The good performance of the original algorithms is also reflected in the correlation coefficients (*r*<sup>2</sup> ≥ 0.55) for both original algorithms (OCx and GSM\_ORIG) and all sensors, as they have higher values than for the NWA, with the exception of GSM\_ORIG applied to VIIRS data where *r*<sup>2</sup> is 0.48 (Table 4). The regional algorithms, which aimed at reducing the *RMSLE* and forcing the slope and intercept to one and zero respectively, also improved the *MLE*. The *RMSLE* for all algorithms and sensors in the NEP are within the same order of magnitude as in the NWA except for the regional GSM models for SeaWiFS (i.e., GSM\_GC and GSM\_GS), which exhibited low *RMSLE* of 0.16 and 0.17 for GSM\_GC and GSM\_GS, respectively. As in the NWA, regional tuning generally improved the mean error, with the exception of the OCx and band ratio models in the MODIS dataset, where the magnitude of mean error increased, attesting to the good performance of the ocean colour model currently in use for this region and sensor. The semi-analytical algorithms (GSM-type) exhibited higher win ratios in general across all sensors, with GSM\_ORIG showing the highest win ratio for MODIS and VIIRS (0.28 and 0.25 respectively), while GSM\_GS showed the highest win ratio for SeaWiFS. Ultimately, the simple linear band ratio model (POLY1) obtained the highest score across the three sensors but tied with POLY4 and the regional GSM models in the SeaWiFS dataset. It is worth noting that other variations of the band ratio models (POLY2 and POLY3) also performed well, with the third-degree polynomial (POLY3) outperforming POLY1 in the SeaWiFS set, and POLY2 obtaining a higher score than other polynomials in the NWA dataset (Table A1). Similarly, the MODIS NEP dataset contained an exception to the pattern of high-scoring GSM\_GC and GSM\_GS, where GSM\_GCGS outperformed the two.


**Table 4.** Same as Table 3, for the NEP.

**Figure 5.** Same as Figure 4, for the NEP. Note the colour-coding for the solid circles in (**b**): blue corresponds to winter, green to spring, purple to summer, and red to fall.

#### *3.4. Variation of Satellite-Derived Chla with Environmental Factors*

3.4.1. Patterns in *Chla* Uncertainties with Phytoplankton Composition, Time, and Number of Retrievals

The correlation analysis described in Section 2.6 revealed that the magnitude of uncertainty in the retrieved *chla* (|*Echla*|) was strongly positively correlated to *chla* concentration for all sensors, regions, and algorithms, with *r*<sup>2</sup> varying from 0.40 (VIIRS, GSM\_GC, NWA) to 0.97 (GSM\_ORIG, NWA, all sensors). GSM\_GC presented an exception in the NEP SeaWiFS dataset with a weaker correlation between |*Echla*| and *chla* that can be explained by the very low number of matchups (Tables 5 and A2). Similarly, we found a relationship between |*Echla*| and the ratio of *f ucox* to *chla* concentration for all algorithms, which was weaker in the regionally-tuned algorithms (Table 5) indicating that phytoplankton taxonomic composition has an impact on these types of algorithms, which can be reduced by the use of the regional models.


**Table 5.** Pearson's correlation coefficients (*r*) from the test for correlation between |*Echla*| and phytoplankton composition and time (see Section 2.6). \* = statistically significant (*p*-value < 0.05).

There were no significant long-term trends of error magnitude in the time series for all sensors and both regions (Table 5). However, there was a negative correlation between the magnitude of error and the day of year across all sensors in the NWA, which is also apparent in the larger spread of error (the interquartile range) occurring during the first half of each year (1.5 mg m−<sup>3</sup> on average across sensors for both POLY4 and GSM\_GS) than during the last half (0.59 and 0.63 for POLY4 and GSM\_GS respectively), with a general negative bias in the spring across the whole time series (Figure 6 and Table A6). The differences in the spread of error between the first and last halves of the year is generally less noticeable in the NEP (Figure 7), where there is a smaller positive correlation between error magnitude and day of year, corresponding to a larger average spread of error in the second half of each year (1.0 and 0.57 mg m−<sup>3</sup> for POLY4 and GSM\_GS respectively) than in the first half (0.51 and 0.34). This can possibly be attributed in part to the timing of the samples, which were spread out over the year instead of concentrated in the spring and fall as in the NWA.

**Figure 6.** Model residuals across the full-time series for the POLY4 and GSM\_GS algorithms in the NWA for MODIS, SeaWiFS and VIIRS. Blue lines represent *μerror*, and the grey lines indicate the number of valid retrievals. Green boxes represent the first half of the year (January to June) and red boxes represent the second half (July to December). Note the axis is asinh-transformed.

**Figure 7.** Same as Figure 6, for the NEP.

A *t*-test and ANOVA also revealed that in the NWA, the average *μerror* across algorithms in the spring (−0.51, −0.60, and −0.60 mg m−<sup>3</sup> for MODIS, SeaWiFS, and VIIRS respectively) were generally significantly different (opposite in sign and higher in magnitude) than in the fall (0.20, 0.19, and 0.032) (see Table A3), but there were no significant differences in the mean error between seasons in the NEP (Table A4). We did not find any correlation between the number of matchups and the magnitude of the uncertainties for a given season/sensor/region except when only a very low number of matchups was available (e.g., MODIS in spring 2005 in the NWA, 2 matchups, Figure 6) and one bad matchup could negatively weight the entire matchup dataset. In the NEP (Figure 7), there was a large number of outliers spread evenly across spring and fall seasons, unlike the NWA, where spring blooms generated the highest spread of error. Only a few data were available for SeaWiFS in the NEP, which were concentrated mainly in 2009 and 2010, and therefore the correlation and ANOVA results for this region/sensor dataset were less reliable. All variations of GSM also returned fewer valid retrievals

than the band ratio algorithms, mainly due to negative remote sensing reflectances at either end of the spectrum, which prevented the model from generating a valid fit (Table 6).

**Table 6.** Missing retrievals by GSM\_GS algorithm for each region/sensor combination, subdivided by the cause of lost data, with the percentage of lost retrievals given for each cause. *Rrs*(41*X*) is the remote sensing reflectance value at 412 nm for MODIS and SeaWiFS, and 410 nm for VIIRS. *Rrs*(6*XX*) varies from 645 to 671 nm depending on sensor. Note that there were no instances of negative model *chla*, unlike *adg*(443) and *bbp*(443). "Unrealistic unknowns" occur when *chla*, *adg*(443), or *bbp*(443) are outside their acceptable ranges, given in Equation (9). Multiple issues were a combination of *adg*(443) < 0 and *chla* above the acceptable range, with two instances where *bbp*(443) was also too high (>0.18).


#### 3.4.2. Uncertainties Related to Geographic Considerations

The correlation analysis of |*Echla*| on latitude, bathymetry and distance from *in situ* to satellite measurements was carried out to investigate if these variables could explain some of the biases and discrepancies observed between the *in situ* and satellite-derived *chla* (Table 7). There were no correlations observed between |*Echla*| and latitude, which is an interesting result given the range of latitudes in the NWA dataset from approximately 38◦ to 61◦N. Possible exceptions exist in the NWA VIIRS and NEP SeaWiFS datasets, but as mentioned in the previous section, the SeaWiFS dataset had very few data, and the high latitude data in the VIIRS dataset were all retrieved during the spring bloom (see Figure 1a,b), when *chla* concentration would typically be higher and thus |*Echla*| would likely also be higher as the two were collinear (see Section 3.4.1).

There were low negative correlations between |*Echla*| and open ocean bathymetry in the NWA MODIS dataset, and larger negative correlations in the NEP, which again can be explained by the collinearity of *chla* concentration and |*Echla*| and the fact that in the open ocean subset, bathymetry was also negatively correlated with *chla* concentration (*r* = −0.47). As for seasonal bias, the *t*-test on the coastal and offshore datasets in the NWA revealed that the mean uncertainties for both datasets were statistically different in the regional band ratio models for MODIS and SeaWiFS, with *μerror* ranging from 0.14 to 0.42 mg m−<sup>3</sup> for the coastal regions and -0.31 to -0.20 in open ocean, indicating that this family of algorithms systematically underestimated *chla* offshore and overestimates in shallower water, while the difference was less significant with the GSM-type algorithms (see Table A5). The pattern reversed in the NEP, where the regional band ratio underestimated coastal *chla* (−0.90 to −0.77 mg m<sup>−</sup>3) and overestimated *chla* in the open ocean (0.076 to 0.27), again ignoring the SeaWiFS dataset due to insufficient data.

Distance between a satellite matchup and the *in situ* measurement was up to 10 km to maximize the number of matchups, though over half the matchups in each region/sensor subset were within 2 km of the measurement. Regression of |*Echla*| on the distance did not reveal a relationship (|*r*| < 0.1) for any datasets except NWA VIIRS GSM\_ORIG, and the small NEP SeaWiFS dataset, suggesting that in open waters, where spatial variations in *chla* are small (mesoscale) compared to the coastal

environment, the stringent criteria of pixel to *in situ* collocation can be relaxed to increase the number of matchups.


**Table 7.** Pearson's correlation coefficients (*r*) for the test for correlation between |*Echla*| and geographic environmental factors (see Section 2.6). \* = statistically significant (*p*-value < 0.05).

#### **4. Discussion**

Satellite-derived *chla* concentration remains the most used product to infer global scale information on the status of the marine ecosystem, and non-specialists represent an important fraction of ocean colour data users. Here we have assessed the performance of two algorithms currently implemented in NASA's SeaDAS software (https://seadas.gsfc.nasa.gov) in Canadian waters (Atlantic and Pacific Oceans) to inform on biases associated with these algorithms in those regions. In addition, we carried out modifications of these original models by optimizing their parameterization and formulation to correct for regional bias in the NWA and NEP. Both OCx and GSM exhibited a negative overall bias particularly in the NWA, where low *chla* were overestimated and high concentrations were underestimated, which resulted in a linear fit of satellite-derived on *in situ chla* with a slope lower than one. This bias was partially corrected in the modified versions of the models, which also improved the tightness of the relationship as indicated by higher *r*<sup>2</sup> than for the original algorithms, but at the expense of the number of individual retrievals with the lowest magnitude of error between different algorithms ("win ratio", Tables 3 and 4). Significantly different means in *chla* concentration between spring and fall in the NWA (see Section 3.4.1) suggest that the dataset could be further subdivided by season, but at the expense of a synoptical approach. It would also require a larger matchup dataset to achieve statistical significance.

Our approach differed from NASA's in that the algorithms were fitted directly between satellite-derived and *in situ chla*, under the assumption that the satellite *Rrs* were reliable, while the original algorithms were developed using both *in situ* radiometric and *chla* measurements. Furthermore, the regional parameters were optimized, tested, and compared to the original algorithms using the entire dataset for a given region and sensor, rather than extracting a subset for training and using the remainder for testing. Confidence intervals for the polynomial algorithm coefficients were retrieved using a bootstrap method that subsamples the original dataset to compute 2000 different sets of coefficients (see Section 2.4.2), revealing interval widths ranging from 0.08 to 1.26 for the first three coefficients and 0.38 to 2.39 for the last two, which had the smallest impact on the shape of the polynomial in the region defined by the band ratio values. This suggests that the coefficients derived from the full dataset are representative of any subsets. For the GSM algorithms, the sensitivity study on combinations of spectral variations in *P*, *S*, and *Y* (see Section 2.4.4) was performed for each region, sensor, and set of *g* coefficients (spectral or constant), and the median of the subset of highest-scoring combinations was selected. For this reason, we can assume that using the entire dataset or a subset for each combination would have led to similar results after retrieving the median of the optimal combinations. Finally, the small size of some of the datasets, particularly the NEP SeaWiFS set, presented a challenge in developing a reliable set of parameters spanning a wide range of values for *in situ chla* and the environmental variables that were tested for correlations with algorithm error. For these reasons, the regional tuning of the algorithms was carried out on the full datasets rather than subsets.

One of the main results of our studies was the identification of the 443 nm waveband as a strong contributor to the overall uncertainty in the OCx algorithms. Removing this waveband from the OCx algorithms and iteratively forcing the slope and intercept of the linear model between satellite-derived and *in situ chla* to one and zero respectively provided better results than the original algorithms (Table 3). Increasing the degree of the polynomial improved the results for some combinations of region and sensor (see scores in Tables 3 and 4), but overall offered minimal improvements. The negative bias of the original GSM model was corrected by introducing a new exponent, *P*, on the chlorophyll term, which lowered satellite-derived *chla* concentrations <1 mg m−<sup>3</sup> and increased concentrations >1 mg m−3. This exponent *P* was derived in combination with the exponents on the *adg* and *bbp* terms, giving the best set of spectral slopes for each region and sensor. Inclusion of spectral dependence in the *g* coefficients in Equation (11) slightly improved the results, however, optimization of the spectral slopes of the phytoplankton and yellow substances absorption terms, as well as the particulate backscattering term, had a more positive impact. This finding highlights the importance of accounting for regional properties of absorption and backscattering terms, and particularly their spectral dependence. Overall, the GSM-type algorithms provided the highest win ratio (i.e., the highest number of matchups with the smallest *chla* uncertainties), however, this good performance was hampered by the lower number of retrievals compared to the polynomial formulations (Tables 3 and 4). This limitation has to be considered according to the end-user's applications of the *chla* product.

Uncertainties in satellite-derived products are currently an active field of research (IOCCG report on Uncertainties in Ocean Colour Remote Sensing, in preparation) that remains highly complex given the possible sources of uncertainties associated with satellite observations that include, among others, uncertainties in radiometry (and calibration), atmospheric corrections, data binning, and bio-optical algorithms. For instance, it has been shown that at the global scale, the OCx algorithm contains an inherent uncertainty of about 35% [43]. The difference in scales between satellite and *in situ* measurements also creates another source of uncertainty, as *in situ* data refer to about two litres or less of seawater at a discrete depth while satellites integrate a volume of tens of millions of litres of seawater. Note that uncertainties in HPLC-derived *chla* were partially accounted for in the linear model comparing the results by using Type 2 linear regression as described in Section 2.5.1, which minimizes the sum of areas of triangles between each point and the regression line, assuming that both variables contain errors. Here, we have addressed discrepancies between satellite-derived *chla* and *in situ chla* and included temporal and spatial effects. For the two areas of interest, namely the Northwest Atlantic and the Northeast Pacific, we did not find significant patterns of temporal drift, thanks to regular reprocessing of ocean colour products by NASA following a vicarious calibration exercise [44] that ensures stable measurements of the radiative field with time. We did not find spatial patterns, with the exception of minor positive correlations between latitude and magnitude of error in the NWA VIIRS dataset, and negative correlations between open ocean bathymetry and error magnitude in the NEP datasets, which are likely due to changes in *chla* concentration that affect the degree of error. Seasonal biases were observed with an overall underestimation of *chla* in the spring and a small overestimation in the fall. This could be explained by the change in phytoplankton community composition (Table 5), such that when the ratio of fucoxanthin, a marker pigment for diatoms, to *chla* concentration increased, the magnitude of error in *chla* retrieval also increased. This pattern was observed across all algorithms, but weaker in the regionally-tuned algorithms. GSM-type algorithms offered the greatest improvements, in particular, GSM\_GC in the VIIRS dataset where the correlation coefficient of |*Echla*| and the ratio of fucox to *chla* was reduced from 0.54 to 0.27. This demonstrates the effectiveness of regional algorithms at addressing changes in phytoplankton absorption properties, and notably the modified GSM algorithms, which include an exponent on the phytoplankton absorption term to account for changes in absorption properties with changes in phytoplankton community composition in the Northwest Atlantic [45–47]. However, the fact that the *f ucox* to *chla* ratio impacted the regional algorithms for SeaWiFS (i.e., both OCx-like and GSM-like) to a similar degree as the original models suggests that the 510 nm band, which is used across all models for the SeaWiFS sensor, can contain information on the presence of diatoms, as highlighted in Sathyendranath et al. [48] who developed an algorithm to discriminate diatoms from other phytoplankton types for SeaWiFS.

Finally, another notable source of uncertainty is the atmospheric correction method applied to NASA's level 2 images, based on the "black pixel assumption", which assumes that scattering in the near-infrared bands from approximately 670 nm to 865 nm [49] is negligible, thus the ocean appears black. This was a reasonable assumption for case 1 waters [50], but optically-complex case 2 waters can contain elements that contribute to the scattering in those bands, giving an inaccurate correction of remote sensing reflectance values used in *chla* algorithms [51]. Given that many matchups in this report were near coastal or shelf areas, there could have been a degree of error associated with poor atmospheric correction. As discussed in Section 3.4.1, many satellite matchups did contain negative *Rrs* in a waveband near one end of the visible spectrum. These matchups were incapable of computing valid *chla* concentrations for the GSM algorithms but were included in the band ratio algorithms.

#### **5. Conclusions**

This comparison exercise between satellite-derived and *in situ chla* concentration measurements for three ocean colour sensors (i.e., SeaWiFS, MODIS and VIIRS), two widely-used algorithms based on different approaches (i.e., band ratio and semi-analytical) and for two regions (i.e., Northwest Atlantic and Northeast Pacific) revealed systematic bias (slopes varying from 0.60 to 0.84 in the NWA and from 0.63 to 0.97 in the NEP) and relatively large uncertainties. While regional tuning permitted correction for the biases, the uncertainties remained substantial, attesting to the fact that natural variations of phytoplankton optical properties, the presence of other optically active components, and accuracy of the atmospheric correction will challenge accurate retrieval of *chla* concentration. In the Northwest Atlantic, the removal of the 443 nm band in the regional band ratio algorithm improved the performance of the OCx-type algorithms. It is noteworthy that while band ratio algorithms provided the most matchups between satellite and *in situ* data, the GSM-type approach appeared to be most accurate thanks to its ability to decipher the phytoplankton signal from yellow substances, but its performance was hindered by the significantly lower number of retrievals, mainly due to negative radiances present in the blue or red part of the visible spectrum, perhaps due to poor atmospheric corrections.

**Author Contributions:** E.D. conceptualized and led the project, E.D. and S.C. developed the methodology. S.C. wrote most of the code and carried out all the statistical analyses. B.D. contributed to the coding and provided technical support. A.P. provided the dataset for the Northeast Pacific. S.C. and E.D. drafted the manuscript, which was edited by B.D. and A.P.

**Funding:** This research was funded by MEOPAR under the Observation Core Project and the Strategic Program for Ecosystem-Based Research and Advice (SPERA) from Fisheries and Oceans Canada.

**Acknowledgments:** The authors would like to thank NASA's Ocean Biology Processing Group (OBPG) for supplying a large up-to-date satellite ocean colour dataset available for public use (https://oceancolor.gsfc.nasa. gov/), as well as detailed information on the current ocean colour algorithms.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **Appendix A. Results Omitted from the Main Text**


**Table A1.** Statistics of extra algorithms not presented in the text, as in Tables 3 and 4.



**Table A1.** *Cont.*

**Table A2.** Pearson's correlation coefficients (*r*) from the test for correlation between |*Echla*| and environmental factors (see Section 2.6) for model versions not presented in the text. \* = statistically significant (*p*-value < 0.05).



**Table A3.** Results of the *t*-test performed on *μerror* of seasonal subsets in the NWA to test for statistically significant differences in mean error between spring and fall. *N* = size of subset. \* = statistically significant (*p*-value < 0.05).

**Table A4.** Results of the ANOVA on *μerror* of seasonal subsets in the NEP, to check for statistically significant differences in means between seasons. The SeaWiFS subset has been excluded from this test, as all SeaWiFS matchups were collected during the summer, with the exception of a single matchup from fall and one from winter. Tukey Honest Significant Difference *p*-values between each pair of seasons are included to determine which seasons, if any, are statistically different from each other.



**Table A5.** Results of the *t*-test to check for differences in *μerror* between coastal (bathymetry <= 200 m) and open ocean (>200 m) subsets. \* = statistically significant (*p*-value < 0.05).

**Table A6.** Average difference between first and third quartiles presented in the boxplots of Figures 6 and 7 across boxes representing the first half of the year, and across the second half of the year.





**Table A8.** *g* coefficients used in Equation (11) after performing linear interpolation to the necessary wavelengths.

**Table A9.** Exponents used in Equations (6) and (10), where the *gc* columns give exponents optimized to the original constant *g*<sup>1</sup> and *g*<sup>2</sup> coefficients, and *gs* columns give the exponents optimized for the use of the spectral *g* coefficients *g*1, *g*2, and *g*3.



**Table A10.** Spectrally-dependent *a*- *ph* coefficients used in Equation (10) for the wavebands used in MODIS, SeaWiFS, and VIIRS.

#### **References**


c 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Letter*
