*3.1. Data Description*

In the current study, a downscaling approach was proposed based on GP and MPR techniques to produce high-resolution MODIS Chl-a data by relating coarse-resolution MODIS Chl-a data to high-resolution Sentinel-2A MSI (S-2A) data. There are some challenges associated with downscaling Chl-a: 1) MODIS Chl-a and S-2A measurements have different revisit times of 8 and 10 days, respectively; 2) in situ data used for validation of the downscaling model are irregularly distributed in space and

rarely accessible; 3) to obtain reasonable results for downscaling Chl-a, a remote sensing image should have cloud coverage less than 10%. It has been reported that sea fog is frequently observed around the Korean peninsula with a maximum occurrence in the West Sea in summer with a mean frequency of 5.3 days in July [43]. The frequency is less than 1 day during fall and winter. Therefore, the above-mentioned challenges imposed some limitations for selection of images on every season. Hence, two MODIS and S-2A Level-1C (hereafter S-2A) datasets were acquired on February 2 (winter) and August 4 (summer) in 2016. All images had less than 10% cloud coverage and were almost concurrent with in situ measurements (±5 days).

The Chl-a concentration maps used in the current study were used from MODIS Chl-a concentration standard mapped images (SMIs) with a 4 km resolution. The SMIs were created from bands 13 (0.653 µm),14 (0.681 µm), and 15 (0.750 µm) of the MODIS sensor (hereafter referred to as MODIS Chl-a). Due to strong fluorescence signals from Chl-a at these spectral bands, these bands were suitable for the detection of the Chl-a concentration [44,45]. For retrieval of Chl-a concentration, the standard OC3/OC4 (OCx) band ratio algorithm combined with color index (CI) introduced by Reference [46] was implemented. The MODIS Chl-a data were obtained from the Giovanni website (https://giovanni.gsfc.nasa.gov/giovanni/).

S-2A band images (Table 1) for high-resolution data were acquired from the Copernicus Open Access Hub (https://scihub.copernicus.eu/). S-2A products include 13 bands with the top-of-atmosphere (TOA) Reflectance. All bands were processed by radiometric and geometric correction using a UTM/WGS84 projection. Recent studies have reported that S-2A is more powerful than Landsat 8 OLI in distinguishing areas affected by algal blooms because S-2A has three near-infrared (NIR) bands and Landsat 8 OLI has only one NIR band [47–51]. The three additional NIR bands of S-2A make it feasible to develop more suitable algorithms for the retrieval of the Chl-a concentration in intense bloom conditions [52].


**Table 1.** Sentinel-2A MSI spectral band characteristics.

The in situ Chl-a measurements used in the analysis were obtained for seven stations from the Korea Institute of Ocean Science and Technology (KIOST) website (http://joiss.kr). The locations of the stations are shown in Figure 1. Multidepth sampling was used to collect water samples from the surface layer (0–15 cm depth). A fairly large water sample was collected and the sample was filtered to concentrate the chlorophyll-containing organisms followed by mechanical rupturing of the collected cells, and extraction of the chlorophyll from the disrupted cells into the organic solvent acetone. The extract was then analyzed by a spectrophotometric method using fluorescence. Therefore, to validate the downscaling models, a total of 14 samples at two depths (Table 2) were employed. The remote sensing and in situ data acquisition dates are shown in Table 2, with in situ data from 1 February 2016, corresponding to remote sensing data from 2 February 2016, and in situ data from 1 August 2016, corresponding to remote sensing data from 4 August 2016.


**Table 2.** Paired Dates of the Acquired Satellite Imagery and Chl-a Measurements of the Study Area.

#### *3.2. Preprocessing*

The image preprocessing methods the same as (1) MODIS gap filling, (2) resampling, (3) land-water separation, and (4) subsetting were used to prepare all satellite images for further processes. Then, feature selection was employed to determine the most important band combinations for Chl-a prediction. First, the grid-fill method [53] was used to fill the missing values in the MODIS Chl-a images. This software has high computational efficiency and fills missing values in an iterative relaxation scheme using Poisson's equation. For validation, random removal of the original Chl-a pixel values was conducted to provide data to validate the grid-fill method. For this purpose, 5% of the original data were deleted from the outside boundary of the study region and preserved for cross-validation. Then, the corresponding values of the deleted data were estimated by utilizing the grid-fill method, and the difference between the original pixel values and the reconstructed values was evaluated. Consequently, the spatial distribution of Chl-a concentrations was produced.

At the second step, all S-2A images were mosaicked and then resampled to a resolution of 10 m (X10). Since the study region had a remarkable number of islands (Figure 1), the reflectance caused by islands could decrease the accuracy of the applied models. Therefore, the third step of preprocessing aimed to separate water from land pixels to improve the downscaling results. The most commonly used indices, namely, the normalized difference vegetation index (NDVI) and normalized difference water index (NDWI), have already been employed for land-water separation purposes [54–58]. In the current study, land-water separation was performed by utilizing the NDWI index. The formula of this index is as follows:

$$\text{NDWI} = \frac{\rho\_{\text{Green}} - \rho\_{\text{NIR}}}{\rho\_{\text{Green}} + \rho\_{\text{NIR}}} \tag{1}$$

where ρGreen and ρNIR are the green and near-infrared (NIR) reflectances of S-2A with 10 m resolution. Based on the NDWI computed images, water pixels have positive values, while negative pixels are usually classified as vegetation or soil features.

At the fourth step, the study area boundaries were used to produce a spatial subset of both MODIS Chl-a and S-2A images, and the S-2A dataset was resampled from 10 m (X10) to 4 km resolution of MODIS Chl-a (X4k) using the nearest neighbor method [59]. Then, MODIS Chl-a and S-2A images at 4 km (Y4k and X4k) were used as dependent and independent variables, respectively, to develop GP and MPR models. Additionally, the developed models were applied to S-2A images at 10 m (X10) to estimate downscaled Chl-a at 10 m spatial resolution.

At the fifth step, the SVM-RFE feature selection method was employed to specify the most important bands of the S-2A dataset (X4k). This method is a widely used technique for feature selection that has been utilized in a wide variety of remote sensing research studies [60–62]. To perform feature selection, multiple mathematical operations, including multiplication, addition, subtraction, rationing, averaging, and square transformation, were used to calculate 597 various combinations of the resampled S-2A bands at 4 km, and the computed combinations served as the input for the feature selection method. Then, SVM-RFE was trained on the calculated dataset using the MODIS Chl-a concentration (Y4k) and S-2A band combinations (X4k) as the predictand and predictor variables, respectively. As a result, relevant combinations for the downscaling procedure were chosen.

#### *3.3. Downscaling*

The downscaling model used in the current study was a regression-based approach considering the statistical relationship between MODIS Chl-a and S-2A bands at 4 km and 10 m spatial resolution. The straightforward formulation of this relationship is expressed as in Equation (2).

$$
\hat{\mathbf{Y}} = \mathbf{f}(\boldsymbol{\lambda})\tag{2}
$$

where *X* is the selected S-2A bands (predictor variables) at 4 km (X4k) and 10 m (X10), Yˆ is the estimated MODIS Chl-a at 4 km (Yˆ 4k) and 10 m (Yˆ <sup>10</sup>), and *f* represents a nonlinear regression function developed by GP and MPR techniques.

First, a relationship between Y4k and X4k was established using all the applied methods as 2 nd-degree MPR, 3rd-degree MPR, 4th-degree MPR, and GP. From the 636 pixels of the independent (X4k) and dependent (Y4k) variables, 445 pixels were used for training, and the remaining 191 pixels were reserved for the validation of the models. Standardization was a prerequisite step before training all the applied methods to ensure that all variables remained on the same scale. Therefore, the standardization of all independent variables (X4k) in the training and validation set was performed by subtracting the mean and dividing by the standard deviation of the training set. This preprocessing sped up the convergence and allowed efficient training of the network. Then, the estimated Chl-a at 4 km (Yˆ 4k) was subtracted from the original MODIS Chl-a at 4 km (Y4k) as follows:

$$
\varepsilon\_{4\mathbf{k}} = \mathbf{Y}\_{4\mathbf{k}} - \mathbf{\hat{Y}}\_{4\mathbf{k}} \tag{3}
$$

where ε4k is the low-resolution residual at 4 km.

A simple kriging interpolation technique [63,64] was utilized to interpolate the ε4k to 10 m resolution (ε10) using the center points of the MODIS Chl-a pixels. Finally, to produce the downscaled map of Chl-a at 10 m resolution (Y10), the developed model as Yˆ <sup>10</sup> = f(X10) in Equation (2) was added to ε10, which is expressed in the function below (Equation (4)):

$$Y\_{10} = \hat{Y}\_{10} + \varepsilon\_{10.} \tag{4}$$

#### *3.4. Sensitivity Analysis*

For a given mathematical model, a sensitivity analysis (SA) measures how much the uncertainty and fluctuations of the input variable contribute to the outputs or performance of the system. In general, SA may be performed by two different techniques, local and global SA, with the former exploring the important model factors for a given set of factor values, and the latter apportioning the uncertainty in outputs to the uncertainty in each input factor to identify the important factors [65]. In the current study, the Morris method [66], as the most widely used global SA method, was employed to quantify the sensitivity of the GP and MPR models. This method is also called the once-at-a-time (OAT) method because, in each run, a new value is assigned to only one input variable. To carry out SA, the Morris sensitivity measure index (µ ∗ ) was used as in Equation (5):

$$
\mu\_{\rm i}^{\*} = \frac{\sum\_{\rm i=1}^{r} |\rm EE\_{i,n}|}{\rm r} \tag{5}
$$

where i is the number of input variables, r is the number of sample points in the parameter space (indexed n) and EEi,n is the elementary effects (EEs) assessed for the i-th input variable using the n-th sample point. EEs are employed to specify noninfluential inputs for a computationally costly mathematical model or for a model with a great number of input parameters, where the costs of evaluating other SA methods such as variance-based methods are not reasonably priced.

#### *3.5. Mathematical Background*

#### 3.5.1. Multiple Polynomial Regression

The polynomial model is a form of a regression method that can be used when the relationship between an independent and dependent variable is curvilinear [67]. The nth order polynomial model with one variable Equation (6) is the general form of the polynomial model that indicates the nonlinear relationship between one predictand and one predictor variable.

$$\mathbf{Y} = \boldsymbol{\beta}\_0 + \boldsymbol{\beta}\_1 \mathbf{X} + \boldsymbol{\beta}\_2 \mathbf{X}^2 + \dots + \boldsymbol{\beta}\_m \mathbf{X}^n + \varepsilon \tag{6}$$

where β1, β2, . . . , β<sup>m</sup> are the unknown regression coefficients and ε is random error. Furthermore, MPR can also be defined with different degrees. For instance, a quadratic (second-order, n = 2) polynomial model can be given as in Equation (7).

$$\mathbf{Y} = \beta\_0 + \beta\_1 \mathbf{X}\_1 + \beta\_2 \mathbf{X}\_2 + \beta\_{11} \mathbf{X}\_1^2 + \beta\_{22} \mathbf{X}\_2^2 + \beta\_{12} \mathbf{X}\_1 \mathbf{X}\_2 + \varepsilon \tag{7}$$

To select the best model between different MPR degrees for MODIS Chl-a downscaling, three degrees of models, including second-order (2nd), third-order (3rd), and fourth-order (4th), were trained and compared.

#### 3.5.2. Genetic Programming

**4. Results** 

*4.1. Preprocessing* 

day that occurs due to high cloudiness on the summer day.

GP is one of the most recent data-driven techniques developed by Koza [68] and is a collection of techniques for finding a highly fit individual in the space of possible solutions. In GP, individuals are mathematical formulas created by combinations of functions (e.g., *sin*, *cos*, ÷, ×, +) and variables (e.g., *x*, *y*, 6). Each individual takes the role of a possible solution for a given problem. Figure 3 shows four simple formulas (individuals) created by functions and variables as an example. Each individual has its fitness value to optimize. GP applies evolutionary computation to find the best individual for optimized fitness values. *Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 12 of 28 In the current study, the gplearn package [69] in Python software was used for GP implementation. In general, GP has two major parameters (population size and generation size) that should be optimized to generate high performance. The optimum values of the mentioned parameters were computed by utilizing the harmony search (HS) algorithm introduced by Geem*,* et al. [70] using 10-fold cross-validation. For more information about the HS method, readers are referred to Geem, Kim and Loganathan [70] and Lee and Singh [59].

**Figure 3.** Tree structures illustrating crossover and mutation in GP. **Figure 3.** Tree structures illustrating crossover and mutation in GP.

Generally, the GP technique follows four steps to find the fittest individual: (1) an initial random population of individuals composed of functions and variables is created; (2) the fitness of each individual in the population is validated with a problem-specific fitness function, and the most

Therefore, it can be concluded that the grid-fill method exhibits good performance for the reconstruction of MODIS Chl-a values. Note that the reason for the better performance of the method on the winter day than on the summer day might be related to the presence of more missing values in the selected scene of the summer appropriate individuals are selected to survive in the new population as parents; (3) once parents are selected, they create better types known as offspring or new generations by producing algorithms known as genetic operators (i.e., crossover, mutation, and duplication); (4) then, the individuals are assessed for fitness; (5) the process from (2) to (4) is repeated over many generations until an individual satisfies a given success criterion (e.g., the number of generations exceeds the specified number of iterations).

Figure 3 illustrates the crossover and mutation operations in GP. Individuals in GP are shown in tree form with different characteristics, such as size, shape, and content. The crossover and mutation operations are performed to produce new offspring (the panels in the lower row) from the parent (the panels in the upper row). Additionally, Figure 3 displays how crossover and mutation change the final functions and eliminate an input variable *y* at the second offspring (the lower right panel).

In the current study, the gplearn package [69] in Python software was used for GP implementation. In general, GP has two major parameters (population size and generation size) that should be optimized to generate high performance. The optimum values of the mentioned parameters were computed by utilizing the harmony search (HS) algorithm introduced by Geem, et al. [70] using 10-fold cross-validation. For more information about the HS method, readers are referred to Geem, Kim and Loganathan [70] and Lee and Singh [59].

#### **4. Results**

*Remote Sens.* **2020**

#### *4.1. Preprocessing*

The scatter plots between the original MODIS Chl-a and their reconstructed values with the grid-fill method are shown in Figure 4. According to the results, a good match between the original MODIS Chl-a and their filled values can be seen with R<sup>2</sup> = 0.996 for winter (panel (a)) and R<sup>2</sup> = 0.939 for summer (panel (b)). Therefore, it can be concluded that the grid-fill method exhibits good performance for the reconstruction of MODIS Chl-a values. Note that the reason for the better performance of the method on the winter day than on the summer day might be related to the presence of more missing values in the selected scene of the summer day that occurs due to high cloudiness on the summer day.

The most important combinations of S-2A images, among the 597 cases, were chosen by utilizing the SVR-RFE method. Four high-ranked combinations were selected as the final variables, namely, B1/B3, B2/B3, B1/(B3+B4), and B2/(B3+B4). These selected predictors are a combination of bands 1 (Coastal aerosol), 2 (Blue), 3 (Green), and 4 (Red). , *12*, x FOR PEER REVIEW 13 of 28

**Figure 4.** *Cont*.

*Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 13 of 28

**Figure 4.** 1:1 scatter plots of the original MODIS Chl-a vs. reconstructed Chl-a by the grid-fill method on (**a**) the winter day (2016.2.2) and (**b**) the summer day (2016.8.4). **Figure 4.** 1:1 scatter plots of the original MODIS Chl-a vs. reconstructed Chl-a by the grid-fill method on the winter day (2016.2.2) and the summer day (2016.8.4).

#### *4.2. Downscaling Results* The most important combinations of S-2A images, among the 597 cases, were chosen by utilizing the SVR-RFE method. Four high-ranked combinations were

3 (Green), and 4 (Red).

*4.2. Downscaling Results* 

Table 3).

For downscaling, MPR with three degrees (2nd, 3rd, and 4th) and GP were trained with the four determined band combinations as predictors and MODIS Chl-a as a predictand at low-resolution (4 km). Then, a residual correction was performed utilizing the described methodology in the downscaling section (Section 3.3), and Equations (3)–(4) to produce high-resolution Chl-a maps (Y10). To assess the accuracy of the downscaling technique, the final downscaled maps (i.e., Y10) were validated with the original MODIS Chl-a maps at a pixel size of 4 km. For this purpose, sample values were extracted from the downscaled maps (Y10) within a 3 × 3 window around the center of each MODIS Chl-a pixel, and the mean of each sample was calculated and compared with the original MODIS Chl-a (Figure 5 and Table 3). selected as the final variables, namely, B1/B3, B2/B3, B1/(B3+B4), and B2/(B3+B4). These selected predictors are a combination of bands 1 (Coastal aerosol), 2 (Blue), For downscaling, MPR with three degrees (2nd, 3rd, and 4th) and GP were trained with the four determined band combinations as predictors and MODIS Chl-a as a predictand at low-resolution (4 km). Then, a residual correction was performed utilizing the described methodology in the downscaling section (Section 3.3), and Equations (3)–(4) to produce high-resolution Chl-a maps (Y10). To assess the accuracy of the downscaling technique, the final downscaled maps (i.e., Y10) were validated with the original MODIS Chl-a maps at a pixel size of 4 km. For this purpose, sample values were extracted from the downscaled maps (Y10) within a 3×3 window around the center of each MODIS Chl-a pixel, and the mean of each sample was calculated and compared with the original MODIS Chl-a (Figure 5 and

**Figure 5.** 1:1 scatter plots of the original MODIS Chl-a at 4 km pixel size vs. downscaled MODIS Chl-a at 10 m pixel size; (**a**) the winter day (2016.2.2): (**a-1**) 2nd-degree MPR, (**a-2**) 3rd-degree MPR, (**a-3**) 4 th-degree MPR, (**a-4**) GP, and (**b**) the summer day (2016.8.4): (**b-1**) 2nd-degree MPR, (**b-2**) 3rd-degree MPR, (**b-3**) 4th-degree MPR, and (**b-4**) GP.


Notes: MAE is mean absolute error, MBE is mean bias error, RMSE is the root mean square error, and R<sup>2</sup> is the coefficient of determination.

On the winter day, as shown in Table 3, the performance measurements of the GP model were slightly better than those of the 2nd-degree MPR. According to the performance indices, the accuracy of the models was ranked as GP > 2 nd-degree MPR > 3 rd-degree MPR > 4 th-degree MPR for the winter day. For the summer day, the rank was GP > 3 rd-degree MPR > 2 nd-degree MPR > 4 th-degree MPR. Overall, the GP exhibited the best performance for the winter and summer days.

This finding shows the superiority of GP over the MPR method with different degrees of Chl-a downscaling. The possible reason for this result might be related to the flexible structure of the GP model compared with the fixed formulation of MPR models. While MPR models use a fixed form to define the relationship between the predictor and predictand variables, the evolution process in GP allows its function to take any feasible formulation. This flexibility gives GP the ability to adopt any form of functions to capture various relationships between the predictor and predictand variables, even highly nonlinear relations. Therefore, this unique feature of GP increases the probability of finding the best relationship in the downscaling procedure, resulting in a better prediction than the MPR models.

The accuracy of the models is presented in Figure 5. The best match between the MODIS Chl-a and simulated values can be seen in the GP model (see the panels in the fourth column in Figure 5), with R<sup>2</sup> = 0.927 and RMSE = 0.1642 on the winter day, compared to the MPR models (2nd-, 3rd-, and 4 th-degree). The same result can be seen on the summer day (the best performance in the GP model), as shown in the bottom panels of Figure 5. Furthermore, it can be seen that all the applied models estimate Chl-a better at lower concentrations than at higher concentrations, particularly in the range of 1.5–3.5 mg m−<sup>3</sup> , as presented in Figure 5.

#### *4.3. Model Evaluation*

#### 4.3.1. Visual Comparison

The detailed maps of the downscaling approach for the winter and summer days are shown in Figures 6 and 7, respectively. The good agreement between the original MODIS Chl-a (the panels in the first column in Figure 6) and the estimated Chl-a maps (the panels in the last column in Figure 6) can be seen for the GP and 2nd-degree MPR. From Figure 6, the major trend of Chl-a is fairly modeled with GP and 2nd-degree MPR but specific and substantially high values are not captured in both models such as the values in the near coastal area. However, the residual model can additionally capture this high variability and produce a reliable estimate in the final stage, as seen in the last column of Figure 6. According to the results illustrated in Table 3, and the panels in the last column of Figure 6, 4th-degree MPR cannot estimate the Chl-a values at 10 m resolution as accurately as the other models, especially in coastal areas. This phenomenon indicates that the GP and 2nd-degree MPR can capture the most variation in high Chl-a concentrations at 4 km resolution.

17 of 28

**Figure 6.** Detailed maps showing the downscaling steps for all models on the winter day (2016.2.2); (**a**) 2nd-degree MPR, (**b**) 3rd-degree MPR, (**c**) 4th-degree MPR, and (**d**) GP.

By estimating the Chl-a concentrations on the summer day, similar model behaviors became obvious on the summer day compared to the winter day (Figure 7). Among all the applied models, GP estimations in coastal areas showed better agreement with the original MODIS Chl-a on both the winter and summer days than all the applied MPR models. Since coastal areas play a vital role in marine ecosystems and human health, GP estimations are considerable for water quality monitoring in coastal areas.

From Figure 7, the simulation results became worse in the sea (the second and last column panels in Figure 7), and all models tended to underestimate some high Chl-a values. The possible reason for this result might be the higher spatial variability of the MODIS Chl-a concentration on the summer day than on the winter day (Figure 8), which is different from the normal distribution (δ = 0.522, θ = 0.733). Therefore, all the applied models did not fairly estimate the Chl-a values in the sea, and their accuracy decreased on the summer day compared to that on the winter day (Figure 5 and Table 3). From Figures 6 and 7, the Chl-a concentration gradient follows a similar pattern with water depth in all maps, and its concentration decreases as water depth increases. Therefore, the region close to the seashore shows a high Chl-a concentration, while the region in the sea presents a low concentration of Chl-a.

GP.

4.3.2. In Situ Validation

19 of 28

**Figure 7.** Detailed maps showing the downscaling steps for all models on the summer day (2016.8.4); (**a**) 2nd-degree MPR, (**b**) 3rd-degree MPR, (**c**) 4th-degree MPR, and (**d**) GP. gradient follows a similar pattern with water depth in all maps, and its concentration decreases as water depth increases. Therefore, the region close to the seashore shows a high Chl-a concentration, while the region in the sea presents a low concentration of Chl-a.

**Figure 8.** Histogram and fitted normal distribution of MODIS Chl-a for (a) the winter day (2016.2.2) and (**b**) the summer day (2016.8.4). δ and θ are the coefficient of variation (COV) and skewness coefficient, respectively. Note that the COV and skewness coefficient values for a normal distribution are zero. The skewness coefficient indicates that the distribution of the summer day data presents more non-normality than that of the winter day. **Figure 8.** Histogram and fitted normal distribution of MODIS Chl-a for (**a**) the winter day (2016.2.2) and (**b**) the summer day (2016.8.4). δ and θ are the coefficient of variation (COV) and skewness coefficient, respectively. Note that the COV and skewness coefficient values for a normal distribution are zero. The skewness coefficient indicates that the distribution of the summer day data presents more non-normality than that of the winter day.

#### 4.3.2. In Situ Validation

The results of the downscaling models were assessed with the in situ Chl-a data of seven stations along the study area (Figure 1) to validate model performances. For this purpose, sample values were extracted within a 3 × 3 window around the center of each station point, and the mean of each sample was calculated and compared with the in situ data. The results in terms of the R<sup>2</sup> and RMSE are presented in Figure 9 for the winter day (left panel) and the summer day (right panel). The computed *p*-values (Figure 9) shows that R<sup>2</sup> of all models is statistically significant (*p*-values lower than 0.05). In general, the R<sup>2</sup> of the GP model was higher than that of the other models for the winter and summer days, which were 0.59 and 0.47, respectively, as shown in Figure 9. Additionally, the RMSE of the GP model was smaller than that of the other models for the winter and summer days, at 0.766 and 0.483, respectively. *Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 21 of 28 The results of the downscaling models were assessed with the in situ Chl-a data of seven stations along the study area (Figure 1) to validate model performances. For this purpose, sample values were extracted within a 3×3 window around the center of each station point, and the mean of each sample was calculated and compared with the in situ data. The results in terms of the R2 and RMSE are presented in Figure 9 for the winter day (left panel) and the summer day (right panel). The computed *p*-values (Figure 9) shows that R2 of all models is statistically significant (*p*-values lower than 0.05). In general, the R2 of the GP model was higher than that of the other models for the winter and summer days, which were 0.59 and 0.47, respectively, as shown in Figure 9. Additionally, the RMSE of the GP model was smaller than that of the other models for the winter and summer days, at 0.766 and 0.483, respectively.

**Figure 9.** 1:1 scatter plots of downscaled MODIS Chl-a and in situ data for all methods using two measured data at 0–15 cm depth; (**a**) the winter day (2016.2.2) and (**b**) the **Figure 9.** 1:1 scatter plots of downscaled MODIS Chl-a and in situ data for all methods using two measured data at 0–15 cm depth; (**a**) the winter day (2016.2.2) and (**b**) the summer day (2016.8.4).

These results indicate that the GP model can estimate Chl-a concentrations more accurately than the other models on both winter and summer days. From Figures 5 and 9, one can see that the validation of the downscaling models with the original MODIS Chl-a and the in situ data provides different levels of accuracy. This discrepancy shows the uncertainties in remote sensing data, indicating that the performance of the downscaling model is greatly affected by the accuracy of These results indicate that the GP model can estimate Chl-a concentrations more accurately than the other models on both winter and summer days. From Figures 5 and 9, one can see that the validation of the downscaling models with the original MODIS Chl-a and the in situ data provides different levels of accuracy. This discrepancy shows the uncertainties in remote sensing data, indicating that the performance of the downscaling model is greatly affected by the accuracy of the remote sensing reflectance.

#### 4.3.3. Sensitivity Analysis

summer day (2016.8.4).

the remote sensing reflectance.

4.3.3. Sensitivity Analysis

To explore the sensitivity of the applied models to the changes in input parameters, the sensitivity measure index (µ ∗ ) was calculated, as shown in Equation (5), for all predictor variables on both winter and summer days (Figures 10 and 11). The results of SA indicate that all the applied models are sensitive to the surface reflectance changes (µ ∗ values range from 0 to 1.45 for the winter and 0 to 2.42 for the summer day), and this sensitivity varies between the applied models. Figures 10 and 11 show that the 2nd-degree MPR model is the most sensitive model (µ ∗ values ranged from 0.09 to 1.24 for the winter and 0.94 to 2.42 for the summer day), while the GP model is the least sensitive model (µ ∗ = 1.45 and 1.4 for B2/B3 band combination in winter and summer days, respectively) to the changes in the predictor variables. In addition, the sensitivity of the MPR models increases on the summer day compared to that of the winter day, while the GP model shows a slight decrease to the changes of the B2/B3 band combination on the summer day compared to that of the winter day.

10 and 11 show that the 2nd-degree MPR model is the most sensitive model (

reflectance changes (

of μ

values of

model.

μ

μ

task in the downscaling procedure [71].

while the GP model is the least sensitive model (

*Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 22 of 28

μ

To explore the sensitivity of the applied models to the changes in input parameters, the sensitivity measure index (

slight decrease to the changes of the B2/B3 band combination on the summer day compared to that of the winter day.

(5), for all predictor variables on both winter and summer days (Figures 10 and 11). The results of SA indicate that all the applied models are sensitive to the surface

predictor variables. In addition, the sensitivity of the MPR models increases on the summer day compared to that of the winter day, while the GP model shows a

It can be concluded that although GP and 2nd-degree MPR show comparable accuracy (Figure 5 and Table 3) on both winter and summer days, the GP model

μ

\* values range from 0 to 1.45 for the winter and 0 to 2.42 for the summer day), and this sensitivity varies between the applied models. Figures

μ

\* values ranged from 0.09 to 1.24 for the winter and 0.94 to 2.42 for the summer day),

\* = 1.45 and 1.4 for B2/B3 band combination in winter and summer days, respectively) to the changes in the

\* ) was calculated, as shown in Equation

**Figure 10.** Sensitivity measure index ( μ\* ) for the winter day (2016.2.2); (**a**) 2nd-degree MPR, (**b**) 3rd-degree MPR, (**c**) 4th-degree MPR, and (**d**) GP. Note that the higher values \* for a given parameter indicate higher sensitivity of the model to the changes of the parameter. **Figure 10.** Sensitivity measure index (µ ∗ ) for the winter day (2016.2.2); (**a**) 2nd-degree MPR, (**b**) 3rd-degree MPR, (**c**) 4th-degree MPR, and (**d**) GP. Note that the higher values of µ ∗ for a given parameter indicate higher sensitivity of the model to the changes of the parameter. *Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 23 of 28

**Figure 11.** Sensitivity measure index ( μ\* ) for the summer day (2016.8.4); (**a**) 2nd-degree MPR, (**b**) 3rd-degree MPR, (**c**) 4th-degree MPR, and (**d**) GP. Note that the higher \* for a given parameter indicate higher sensitivity of the model to the changes of the parameter. **Figure 11.** Sensitivity measure index (µ ∗ ) for the summer day (2016.8.4); (**a**) 2nd-degree MPR, (**b**) 3rd-degree MPR, (**c**) 4th-degree MPR, and (**d**) GP. Note that the higher values of µ ∗ for a given parameter indicate higher sensitivity of the model to the changes of the parameter.

**5. Discussion and Conclusions**  In the current study, a downscaling framework composed of GP and MPR models was developed to downscale MODIS Chl-a from 4 km to 10 m pixel size using S-2A band combinations at 10 m as predictor variables. The MODIS Chl-a at 4 km was downscaled in four main steps: (i) acquiring MODIS Chl-a and S-2A images at 4 km and 10 m, respectively; (ii) applying feature selection to select the most important S-2A band combinations as predictor variables; (iii) employing the trained MPR with three degrees (2nd, 3rd, and 4th) and GP model to downscale MODIS Chl-a; (iv) assessment of the results with original MODIS Chl-a maps at a It can be concluded that although GP and 2nd-degree MPR show comparable accuracy (Figure 5 and Table 3) on both winter and summer days, the GP model with the least sensitivity to the changes in the input parameters is more effective than the 2nd-degree MPR for downscaling Chl-a. Additionally, SA results show that the accuracy of all models is strongly related to the accuracy of the remote sensing data; this further confirms the need for atmospheric correction as an essential task in the downscaling procedure [71].

#### pixel size of 4 km and in situ measurements. **5. Discussion and Conclusions**

By comparing the performance of all the models applied for downscaling, the GP model presents more accurate results and less sensitivity to the changes in all variables for downscaling Chl-a. The superiority of the GP model over MPR models is related to the flexible formulation structure of the GP compared to the fixed formulation of the MPR models. The flexible structure of the GP allows its function to take any feasible formulation so that it increases the probability of finding the best input-output relationship between thousands of formula. Visual comparison of the applied models showed that although the major trend of Chl-a is fairly modeled with GP and 2nd-degree MPR, but specific and substantially high values are not captured with both models such as the values in near coastal areas. This drawback of the models can be solved with the residual correction that makes it an essential procedure to improve the accuracy of the spatial downscaling Among all the applied models, the GP model provided better estimations in coastal areas than all the MPR models. Therefore, GP can serve as a feasible In the current study, a downscaling framework composed of GP and MPR models was developed to downscale MODIS Chl-a from 4 km to 10 m pixel size using S-2A band combinations at 10 m as predictor variables. The MODIS Chl-a at 4 km was downscaled in four main steps: (i) acquiring MODIS Chl-a and S-2A images at 4 km and 10 m, respectively; (ii) applying feature selection to select the most important S-2A band combinations as predictor variables; (iii) employing the trained MPR with three degrees (2nd, 3rd, and 4th) and GP model to downscale MODIS Chl-a; (iv) assessment of the results with original MODIS Chl-a maps at a pixel size of 4 km and in situ measurements.

alternative model to estimate Chl-a concentrations in coastal areas with complex characteristics, where water quality monitoring plays a vital role in the protection of marine ecosystems and human health. Moreover, the results indicate that the performance of the models greatly depends on the spatial variability of the MODIS By comparing the performance of all the models applied for downscaling, the GP model presents more accurate results and less sensitivity to the changes in all variables for downscaling Chl-a. The superiority of the GP model over MPR models is related to the flexible formulation structure of the GP compared to the fixed formulation of the MPR models. The flexible structure of the GP allows its function to take any feasible formulation so that it increases the probability of finding the best input-output relationship between thousands of formula. Visual comparison of the applied models

showed that although the major trend of Chl-a is fairly modeled with GP and 2nd-degree MPR, but specific and substantially high values are not captured with both models such as the values in near coastal areas. This drawback of the models can be solved with the residual correction that makes it an essential procedure to improve the accuracy of the spatial downscaling model.

Among all the applied models, the GP model provided better estimations in coastal areas than all the MPR models. Therefore, GP can serve as a feasible alternative model to estimate Chl-a concentrations in coastal areas with complex characteristics, where water quality monitoring plays a vital role in the protection of marine ecosystems and human health. Moreover, the results indicate that the performance of the models greatly depends on the spatial variability of the MODIS Chl-a concentration. Its distributional characteristics (e.g., normal or skewed) can be a good option for model selection criteria. Therefore, to ensure the validity of the GP model in other coastal areas, it is recommended to assess the normality to select the GP model for spatial downscaling of the MODIS Chl-a.

Although the downscaling procedure used in the current study was applied on two selected days, one in winter and one in summer based on data availability, it produced reasonable results due to using the S-2A dataset with 10 m spatial resolution. Future studies can be focused on extending this procedure at a higher temporal resolution. In addition, more research can be conducted with deep learning techniques for downscaling Chl-a concertation. In addition to the surface reflectance, more predictors such as NDVI and NDWI, due to the strong Chl-a absorption at red-NIR spectral regions, may be used in determining Chl-a concentration.

**Author Contributions:** H.M. initiated the idea and contributed to the analysis of the data and writing of the manuscript. T.L. secured funding for the project and contributed to the writing of the manuscript. J.Y. made revisions to the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by National Research Foundation of Korea (NRF), grant number 2018R1A2B6001799.

**Acknowledgments:** The authors would like to thank NASA and the ESA (European Space Agency) for providing the MODIS and Sentinel-2A MSI data, respectively. Additionally, we appreciate the Korea Institute of Ocean Science & Technology (KIOST) for providing Chl-a measurements of the western coast of South Korea. We would like to thank the three anonymous reviewers, whose comments significantly improved the paper.

**Conflicts of Interest:** The authors declare no conflict of interest.
