4.1. Performance of SST Reconstruction
Figure 4 shows the validation results of the MODIS and AMSR2 SSTs reconstructed through DINCAE. Although both yielded similar R² (0.98 for MODIS and 0.99 for AMSR2) and bias (0.03 °C for MODIS and –0.09 °C for AMSR2), the reconstructed AMSR2 SST performed slightly better than the MODIS one in terms of RMSE, rRMSE, and MAE by 0.25 °C, 0.86%, and 0.14 °C, respectively (Figure 4a,b). The reconstructed MODIS SST was relatively underestimated and had a slightly higher variance than the reconstructed AMSR2 SST. The performance difference between the two reconstructed SSTs was possibly due to differences in missing data rates, spatial resolution, and the number of SST training samples [7,70].
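The validation metrics quoted above can be illustrated with a minimal sketch. The definitions are assumptions, since this section does not state them: R² is taken as the squared Pearson correlation, rRMSE as RMSE divided by the mean reference SST (in percent), and ARMAE as MAE divided by the mean absolute reference SST without any observational-error adjustment. The function name and the sample values are hypothetical.

```python
import numpy as np

def sst_metrics(estimated, reference):
    """Return common validation metrics for two equally sized SST samples (°C)."""
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(reference, dtype=float)
    err = est - ref
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    # Assumed definitions (not taken from the paper):
    #   rRMSE = RMSE / mean(reference) * 100
    #   ARMAE = MAE / mean(|reference|), no observational-error adjustment
    #   R2    = squared Pearson correlation
    return {
        "bias": np.mean(err),
        "rmse": rmse,
        "rrmse_percent": 100.0 * rmse / np.mean(ref),
        "mae": mae,
        "armae": mae / np.mean(np.abs(ref)),
        "r2": np.corrcoef(est, ref)[0, 1] ** 2,
    }

# Hypothetical withheld SSTs and their reconstructed counterparts (°C)
reference = np.array([10.0, 15.0, 20.0, 25.0])
estimated = np.array([10.5, 14.5, 20.5, 24.5])
metrics = sst_metrics(estimated, reference)
```

Because rRMSE and ARMAE are normalized by the reference mean, the same absolute error yields larger relative values over colder water, which is worth keeping in mind when comparing tiles.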
The RMSE of the reconstructed MODIS and AMSR2 SSTs ranged between 0.1 and 0.7 °C across the study area. Both results had higher RMSE (~0.7 °C) in tiles 1–3 (i.e., high latitudes; refer to Figure 2b) than in tiles 4–6 (Figure 4c,d). This corresponds to the results by tile using the optimized parameters (Table S2). In addition, high RMSE (>1.0 °C) was found near the land (e.g., tile 1; Figure 4c,d). This high error near the shelf region corresponds to the results of Han et al. (2020), where DINCAE was applied to reconstruct chlorophyll concentrations in coastal areas. That study pointed out that river discharge and external factors such as marine activities affected the distribution of chlorophyll-a and SST, increasing the reconstruction error [35]. Therefore, one possible reason for the high error near the land in the present study is the high spatiotemporal variability of SST due to the western boundary current (i.e., Kuroshio Current), which limited the anomaly estimation of DINCAE in the coastal areas [35]. In addition, the swath characteristics of AMSR2 affected the spatial distribution of RMSE along with the missing data regions (145–150° E in Figure 4d).
Table 4 presents the accuracy assessment results of the reconstruction models by tile. Interestingly, tiles 1–3 at high latitudes performed worse than tiles 4–6 at low latitudes (Δ0.3 °C in RMSE, Δ1% in rRMSE, and Δ0.15 °C in MAE for MODIS; and Δ0.3 °C in RMSE, Δ1% in rRMSE, and Δ0.15 °C in MAE for AMSR2). The accuracy difference by latitude possibly resulted from the difference in the range of SST during the study period. The SST range that DINCAE estimated was 35 °C (from –4 to 31 °C) for tiles 1–3 but only 21 °C (from 11 to 32 °C) for tiles 4–6, which affected the model performance (Figure S1).
Table 5 summarizes the accuracy of the original SST and reconstructed SST pixels (i.e., estimated SST from DINCAE) when compared to the in situ measurements. Both MODIS and AMSR2 SSTs showed high correlation (R² = 0.98–0.99) with in situ measurements. The original MODIS SST pixels showed higher accuracy than the original AMSR2 SST pixels when compared to in situ measurements, by 0.07 °C in RMSE, 0.94% in rRMSE, 0.07 °C in MAE, and 0.008 in ARMAE. The reconstructed MODIS SST pixels were less accurate (RMSE = 1.1 °C, rRMSE = 5.96%, MAE = 0.78 °C, and ARMAE = 0.041) against in situ data than the original MODIS ones (RMSE = 0.76 °C, rRMSE = 3.42%, MAE = 0.53 °C, and ARMAE = 0.023). On the other hand, the reconstructed AMSR2 SST pixels yielded accuracy similar to that of the original AMSR2 ones.
Figure 5 and Figure 6 show examples of the original SST, the original SST with the occlusion mask for validation, the reconstructed SST, and its error standard deviation for both MODIS and AMSR2 on 25 May 2015 and 6 November 2018, respectively. Overall, DINCAE simulated the spatial distribution of SST well for both MODIS and AMSR2. Due to the coarse resolution of AMSR2, the AMSR2 result did not show detailed SST patterns but produced a relatively good reconstruction in the occluded areas (Figure 5a–c; Figure 6a–c; Figure S2). In particular, the MODIS SST after reconstruction clearly showed ocean features such as eddies (i.e., mesoscale features < 100 km) and a polar front (blue boxes in Figure 5a,c,g). However, smoothing occurred for fine-scale features (i.e., sub-mesoscale features < 50 km) such as the warm core ring edge of an eddy (red box in Figure 5e,g). DINCAE reduces the dimension of the SST data while extracting features through the pooling and convolutional layers. When DINCAE restores the reduced data to the original dimension using nearest-neighbor resampling, smoothing often occurs, and some fine feature information disappears [25]. Although skip connections, which keep the characteristics of the original data, were applied to mitigate the smoothing problem, smoothing still remained, affecting the performance of reconstruction at the fine feature scale (i.e., submesoscale).
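The loss of fine-scale information described above can be illustrated with a toy example. This is only a sketch under stated assumptions, not the DINCAE architecture: it uses a single 2 × 2 average-pooling step followed by nearest-neighbor upsampling, with no convolutions or skip connections, and the SST patch is synthetic.

```python
import numpy as np

def pool_then_upsample(field, factor=2):
    """Average-pool a 2-D field by `factor`, then restore its shape
    by nearest-neighbor repetition of each pooled value."""
    rows, cols = field.shape
    if rows % factor or cols % factor:
        raise ValueError("field dimensions must be divisible by factor")
    pooled = field.reshape(rows // factor, factor,
                           cols // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(pooled, factor, axis=0), factor, axis=1)

# Synthetic SST patch (°C): a broad zonal gradient plus a pixel-scale checkerboard
y, x = np.mgrid[0:8, 0:8]
large_scale = 15.0 + 0.5 * x
fine_scale = 0.3 * (-1.0) ** (x + y)
sst = large_scale + fine_scale

restored = pool_then_upsample(sst)
```

In this toy case the pixel-scale checkerboard averages to zero inside every 2 × 2 block, so `restored` contains only a blocky version of the broad gradient; the patch mean is preserved while the fine-scale structure is lost.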
As shown in Figure 5 and Figure 6, the error standard deviation of AMSR2 SST was low (<1 °C) over the study area, including the areas with missing data. Similar to other accuracy metrics, the difference in the error standard deviation by latitude was also pronounced in the reconstructed AMSR2 SST (Figure 5f and Figure 6f). OI-based DINCAE generally yielded relatively low errors for areas where data exist, similar to the results of AMSR2 (Figure 5f and Figure 6f). On the other hand, the error standard deviation of the reconstructed MODIS SST varied by tile, and the difference by latitude was not great. Some areas at high latitudes (tiles 1–3) where original SST data exist yielded a high error standard deviation for the reconstructed MODIS SST (green boxes in Figure 5f,h; Figure 6f,h). The scaled error probability density function of MODIS was positively skewed, while that of AMSR2 followed a normal distribution (Figure S3), which contributed to the slightly lower performance of the MODIS reconstruction compared with the AMSR2 one. In addition, both AMSR2 and MODIS results showed a high error standard deviation in the coastal region of Japan regardless of the presence of data, which was similar to the spatial distribution of RMSE (Figure 4c,d). These results imply that areas with low reconstruction accuracy had large uncertainty in the estimation of a reliable error standard deviation. Such uncertainty has been reported in previous studies that applied OI for EOF [13].
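The scaled-error diagnostic mentioned above can be sketched as follows. It assumes that the scaled error is the reconstruction error divided by the error standard deviation predicted for the same pixel; the section itself does not give the definition, and the function name and sample values are hypothetical.

```python
import numpy as np

def scaled_error_stats(reconstructed, withheld, error_std):
    """Return mean, standard deviation, and skewness of the scaled error.

    Assumed definition: scaled error = (reconstructed - withheld) / error_std.
    """
    scaled = (np.asarray(reconstructed, dtype=float)
              - np.asarray(withheld, dtype=float)) / np.asarray(error_std, dtype=float)
    mean = scaled.mean()
    std = scaled.std()
    # Population skewness: third central moment over std cubed
    skewness = np.mean((scaled - mean) ** 3) / std ** 3
    return mean, std, skewness

# Hypothetical withheld pixels with a few large positive errors
withheld = np.array([20.0, 20.0, 20.0, 20.0])
reconstructed = np.array([19.5, 19.5, 19.5, 21.5])
error_std = np.full(4, 0.5)
mean, std, skewness = scaled_error_stats(reconstructed, withheld, error_std)
```

If the predicted error standard deviation were well calibrated, the scaled error would have a mean near 0, a standard deviation near 1, and little skew; a clearly positive skewness, as in this toy sample, indicates a tail of large positive errors.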
4.2. Improvement of the Reconstructed SST
Figure 7 presents the accuracy assessment results of the data fusion model to improve SST using RF for schemes 1 and 2. It should be noted that both scheme 1 and scheme 2 showed better performance than the original and reconstructed MODIS SSTs (Figure 7; Table 5). Scheme 2, which used multi-satellite data, outperformed scheme 1, which was based on a single satellite data source, yielding better accuracy metrics (scheme 2: R² = 0.99, Bias = 0.01 °C, RMSE = 0.75 °C, rRMSE = 3.83%, MAE = 0.53 °C, and ARMAE = 0.026; scheme 1: R² = 0.98, Bias = 0.02 °C, RMSE = 0.98 °C, rRMSE = 4.99%, MAE = 0.68 °C, and ARMAE = 0.034). This implies that the synergistic use of both the reconstructed MODIS and AMSR2 SSTs improved the accuracy, generating an SST product more similar to in situ data.
Table 6 compares the scheme 1 LOYOCV results, scheme 2 LOYOCV results, and scheme 2 calibration results with the operational gap-free OSTIA SST product using in situ measurements. Scheme 1 SST showed only a minor accuracy gain over the MODIS SST for both the original and reconstructed SST pixels (Table 6). Similarly, the proposed scheme 2 approach yielded a similar performance regardless of the presence of the original AMSR2 SST pixels (Figure S4). Interestingly, the accuracy of the scheme 2 approach over the reconstructed pixels of MODIS SST was similar to that of the original MODIS SST (Table 5 and Table 6), which implies that the data fusion approach successfully increased the accuracy of SST for areas with missing data. According to ARMAE [63] (Table S1), scheme 2 showed excellent performance for both the original and reconstructed SST when compared to the MODIS SST and scheme 1 (Table 5 and Table 6).
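For readers unfamiliar with the validation design, the following sketch shows how leave-one-year-out folds can be built, assuming that LOYOCV denotes leave-one-year-out cross-validation. The function name and the sample years are hypothetical, and model fitting (e.g., an RF regressor trained on `train_idx` and evaluated on `test_idx`) is deliberately left out.

```python
import numpy as np

def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices) for each distinct year."""
    years = np.asarray(years)
    for held_out in np.unique(years):
        test_idx = np.flatnonzero(years == held_out)
        train_idx = np.flatnonzero(years != held_out)
        yield int(held_out), train_idx, test_idx

# Hypothetical acquisition year of each in situ match-up sample
sample_years = np.array([2015, 2015, 2016, 2017, 2017, 2018])
folds = list(leave_one_year_out(sample_years))
```

Holding out whole years keeps samples from the same season and year out of both the training and test sets at once, which gives a stricter estimate of accuracy than a random split; the calibration results, in contrast, are evaluated on data the model has seen.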
Notably, the LOYOCV accuracy of scheme 2 over the original MODIS SST pixels is comparable to the accuracy of OSTIA SST (Table 6), which is notable because the OSTIA model corrected the error of SST using GFS drifting buoy data, which are the same in situ measurements used in this study. While the accuracy of OSTIA over the study area (RMSE of 0.59 °C) was very similar to that reported in the literature [16], scheme 2 calibration outperformed the OSTIA SST (Δ0.43 °C in RMSE, Δ2.19% in rRMSE, Δ0.28 °C in MAE, and Δ0.014 in ARMAE). Consequently, the proposed scheme 2 approach integrating two satellite data sources can produce a high-quality SST product comparable to (or even better than) the operational SST product that incorporates multiple satellite and in situ data.
4.3. Feature Resolution Analysis
Figure 8 depicts the spatial distribution of the reconstructed SSTs, the SSTs improved through data fusion by scheme using bias-corrected satellite SSTs (i.e., MODIS and AMSR2 SSTs; see Section 2.2), and high-resolution operational SST products. As shown in Figure 4a and Figure S3a, the reconstructed MODIS SST (Figure 8b) tends to be underestimated at high latitudes (i.e., blue boxes in Figure 8b,c,e,f) when compared to other products. One possible reason is that the image resizing method (i.e., nearest neighbor) in the decoder layer for dimension restoration induces smoothness during the reconstruction [35]. Surprisingly, scheme 1 (Figure 8d) and scheme 2 (Figure 8e) effectively mitigated such underestimation in the reconstructed MODIS SST (Figure 8b). In particular, scheme 1 (Figure 8d) seemed to focus on a specific range of temperature when compared to the reconstructed MODIS SST (Figure 8b). For example, the scheme 1 result (Figure 8d) clearly showed the core ring of a warm eddy (i.e., black boxes in Figure 8), similar to the operational products (OSTIA in Figure 8c and MUR SST in Figure 8f). However, based on visual interpretation, the overall spatial distribution of scheme 2 SST was in better agreement with the operational products than that of scheme 1 SST, which corresponded to the accuracy assessment results in Table S3.
However, the comma-shaped rotational features at the end of warm cores (i.e., green boxes in Figure 8) in the operational products were not clearly shown in the reconstructed and scheme 1-/scheme 2-improved SST results. As noted for Figure 5 and Figure 6, it is challenging to effectively reconstruct SST through DINCAE at the submesoscale [35]. The scheme 1 and 2 models were also affected by the reconstructed SSTs, which were used as input variables in the models.
Figure 9 depicts the gradient fields of the reconstructed SSTs, the scheme-improved SSTs, and operational SST products. It is not possible to quantitatively compare the gradient fields as they differ by interpolation approach. Thus, a comparison was conducted using the super-resolution (i.e., 1 km) MUR SST product as reference data. The reconstructed MODIS SST (Figure 9b) showed overall similar gradients to MUR SST (Figure 9f). In particular, scheme 2-improved SST (Figure 9e) showed much more similar gradients to the MUR SST in the open sea (i.e., yellow box in Figure 9) than the reconstructed MODIS SST (Figure 9b). Scheme 2-improved SST resulted in more detailed gradients than OSTIA (Figure 9c), comparable to MUR SST.
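A gradient field of the kind compared above can be approximated with finite differences. The sketch below is an assumption about the general approach rather than the exact procedure used for Figure 9: it computes the gradient magnitude on a regular grid with a uniform spacing in kilometers and ignores map-projection effects. The function name and the synthetic field are hypothetical.

```python
import numpy as np

def sst_gradient_magnitude(sst, spacing_km):
    """Return the SST gradient magnitude (°C per km) on a regular grid.

    Assumes the same grid spacing along both axes.
    """
    d_dy, d_dx = np.gradient(np.asarray(sst, dtype=float), spacing_km)
    return np.hypot(d_dx, d_dy)

# Synthetic field on a 4 km grid warming eastward by 0.01 °C per km
spacing_km = 4.0
cols = np.arange(10)
sst = np.tile(20.0 + 0.01 * spacing_km * cols, (6, 1))
grad = sst_gradient_magnitude(sst, spacing_km)
```

Since finite differences amplify pixel-to-pixel noise, gradient maps are a sensitive way to reveal both smoothing and spurious small-scale variability in a product.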
However, the gradient fields of both scheme-improved SSTs (Figure 9d,e) had more diverged patterns than that of the reconstructed MODIS SST (Figure 9b). In particular, the scheme 1 result (Figure 9d) had more excessively diverged gradients (i.e., noise in the white box in Figure 9d) than the scheme 2 result (Figure 9e). One possible reason is that, since RF works pixel by pixel, the relationships among neighboring SST pixels might not be well learned. The incorporation of textual information might be able to mitigate the problem [71].
Interestingly, the high error standard deviation in the reconstructed MODIS SST along the east coast of Japan (i.e., tile 1 in Figure 5h and Figure 6h) had a distribution similar to the gradient fields of the reconstructed MODIS SST (i.e., red boxes in Figure 9b). This implies that DINCAE, the reconstruction model adopted in this study, might have high uncertainty in areas where there is high spatiotemporal variability of SST (i.e., rapid change due to the western boundary current).
Figure 10 compares the PSDs of the reconstructed MODIS SST, the scheme-improved SSTs, and two operational SST products. All SSTs, except for OSTIA, showed similarly significant turbulence energy, especially for fine-scale (<50 km) ocean surface phenomena [15]. OI-based OSTIA was able to simulate SST at the scale of 100 km [15,17,18]. The density of scheme 1-improved SST was higher than those of the other high-resolution SSTs, including the reference MUR SST. This is possibly due to the excessively diverged gradients generated in the scheme 1 model, which increased false turbulence and degraded feature resolution (Figure 9d). The scheme 2-improved SST generated a density very similar to that of the MUR SST, especially at scales between 20 and 100 km, which implies that the proposed approach can successfully simulate the seamless SST at high resolution (~4 km).
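The idea behind the PSD comparison can be sketched with a simple one-dimensional periodogram along a transect. This is an illustrative assumption, not the spectral method used for Figure 10: no windowing or detrending beyond mean removal is applied, the normalization is one of several common conventions, and the transect is synthetic.

```python
import numpy as np

def transect_psd(sst, spacing_km):
    """Return (wavenumber in cycles per km, periodogram) for a 1-D SST transect."""
    sst = np.asarray(sst, dtype=float)
    anomalies = sst - sst.mean()          # remove the mean before the transform
    n = anomalies.size
    spectrum = np.fft.rfft(anomalies)
    psd = np.abs(spectrum) ** 2 * spacing_km / n   # assumed normalization
    wavenumber = np.fft.rfftfreq(n, d=spacing_km)
    return wavenumber, psd

# Synthetic 400 km transect on a 4 km grid with a 50 km wavelength signal
spacing_km = 4.0
distance = np.arange(100) * spacing_km
sst = 20.0 + 0.5 * np.sin(2 * np.pi * distance / 50.0)
wavenumber, psd = transect_psd(sst, spacing_km)
peak_wavelength_km = 1.0 / wavenumber[np.argmax(psd)]
```

On a 4 km grid the shortest resolvable wavelength is 8 km, so energy at scales of 20 to 100 km is well within what such a product can represent; excess energy at short wavelengths relative to a reference product points to noise rather than resolved features.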
4.4. Novelty and Limitations
Many previous SST reconstruction studies have focused on the restoration of missing data and did not fully consider the consistency with in situ measurements [17,18,25,34,35]. To our knowledge, only Sunder et al. (2020) have used in situ data as a target to generate high-resolution cloud-free daily SST, but their study lacks discussion of restoring ocean phenomena such as fronts and eddies [32]. This study proposed a novel method that improves consistency with in situ SST measurements and generates a fine-resolution seamless daily SST field through the synergistic use of data from two satellite sensors based on machine learning. In particular, the proposed method shows very promising results when compared to the high-resolution operational SST products using various assessment methods from both quantitative and qualitative aspects.
There are, however, several limitations of the proposed method. First, the proposed approach tends to rely on the performance of DINCAE. Although DINCAE applied skip connections to keep the fine-scale features of ocean phenomena in the original data, it was not able to fully mitigate the smoothing problem. DINCAE also has uncertainty in estimating a reliable error standard deviation over areas with low reconstruction accuracy. Another limitation is that the pixel-wise learning adopted in the second part of the proposed approach may cause diverged gradients without consideration of textual information.