A General Convolutional Neural Network to Reconstruct Remotely Sensed Chlorophyll-a Concentration

Zhang, Xinhao; Zhou, Meng

doi:10.3390/jmse11040810


by Xinhao Zhang and Meng Zhou *

School of Oceanography, Shanghai Jiao Tong University, Shanghai 200240, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2023, 11(4), 810; https://doi.org/10.3390/jmse11040810

Submission received: 13 March 2023 / Revised: 7 April 2023 / Accepted: 10 April 2023 / Published: 11 April 2023

(This article belongs to the Special Issue Advanced Remote Sensing Engineering Applied in the Environmental Monitoring of the Coasts)


Abstract:

Satellite-observed chlorophyll-a (Chl-a) concentrations are key to studies of phytoplankton dynamics. However, remotely sensed images contain gaps, mainly due to cloud coverage, and these gaps must be filled by reconstruction. This study proposes a method to build a general convolutional neural network (CNN) model that can reconstruct images in unfamiliar areas. Although several CNN models that reconstruct Chl-a in a specific area have already been proposed, the model in this research has the advantage of generality. The model uses a more flexible U-net architecture so that it can accept inputs of different shapes. Images from three areas of different shapes were used in model training to improve the generality of the model. Six models, with different auxiliary input schemes and architectures, were trained and evaluated. The results show that the model with bathymetry input and coarse-to-fine architecture performs best and gives a reasonable reconstruction for the unfamiliar area. The best model gives better results than traditional interpolation methods when reconstructing an unfamiliar area, especially in regions outside the data coverage.

Keywords:

reconstruction; neural network; generality; chlorophyll-a

1. Introduction

Phytoplankton serves as an indicator of marine ecosystem health. Algal blooms can have harmful impacts on human activities [1]. Therefore, monitoring and prediction of phytoplankton are crucial for the effective management of coastal and oceanic environmental health and resources [2]. Satellite observations of Chl-a concentrations, which are a direct proxy for phytoplankton biomass [3], are critical for studying phytoplankton dynamics [4]. However, remotely sensed Chl-a images contain substantial amounts of missing data, to varying degrees, due to poor atmospheric conditions, missing orbits, and sensor malfunction [5,6].

Numerous methods have been developed to reconstruct missing values in satellite data. Traditional interpolation methods fall into two main groups: deterministic and geostatistical interpolation. Deterministic methods predict directly from the surrounding data points using mathematical functions [7]. The most commonly used deterministic methods include linear interpolation, nearest neighbor interpolation, and spline interpolation. Geostatistical methods use the statistical properties of the measured points to reconstruct missing values and to estimate the uncertainties of the reconstruction. Geostatistical methods include different schemes of Kriging interpolation algorithms [8]. These traditional methods are easy to operate and hence have been used and evaluated in much research [9,10,11]. However, these methods do not work well when the data have limited spatial and temporal scopes, because they estimate the missing values using only surrounding measured values [12] and use no domain knowledge about the variable.

Optimal interpolation (OI) has been found to have better performance with limited measured data than traditional methods by combining background fields and observations. This method has been widely used to interpolate remotely sensed oceanic data in the past [13,14,15]. However, OI requires background fields and the error covariance matrix. This a priori information is not always available, limiting the applicability of OI [16].

The DINEOF (Data Interpolating Empirical Orthogonal Functions) [17] method has been developed and commonly used in recent years. Compared with OI, this method does not need a priori information, achieves similar accuracy, and is 30 times faster [18]. DINEOF was initially used for sea surface temperature (SST) reconstruction [19] and has also been widely used later for Chl-a reconstruction [20,21,22]. The central principle of DINEOF is to represent the data as a combination of a reduced number of Empirical Orthogonal Functions (EOFs) that capture most of the spatial and temporal variability. However, DINEOF is based on recognizing regular spatial-temporal patterns and, hence, can only reconstruct for long data series [23]. The applicability of DINEOF is also limited, similar to other geostatistical methods.

Machine learning techniques are currently widely used in many geoscientific fields. Neural networks, as a type of machine learning technique, have been introduced into ocean data reconstruction because of their ability to handle non-linear relationships among variables. Krasnopolsky et al. [24] successfully applied an Artificial Neural Network (ANN) to predict Chl-a concentration on a global scale, using sea surface height (SSH), sea surface salinity (SSS), SST, and Argo in-situ data. However, this research reconstructed Chl-a at a low spatial resolution and performed well only for low concentrations. Jo et al. [25] predicted Chl-a using five inputs of microwave measurements and geolocation data. However, the model in that research could not predict Chl-a in coastal areas, because the input data do not cover coastal areas. Besides neural networks, other traditional machine learning techniques have also been used in ocean data reconstruction. Park et al. [26] reconstructed Chl-a in the Ross Sea using an ensemble-based machine learning method called random forest. These traditional machine learning methods predict the Chl-a concentration at a data pixel based merely on other variables at the same pixel. In reality, however, a biophysical variable at a pixel is always influenced by the state of the entire system. The information on the state of the system hidden in adjacent time steps and neighboring grid cells was rarely exploited by the traditional machine learning methods [27].

Convolutional neural networks (CNNs) have shown better performance than ANNs in the field of image inpainting. Their convolutional kernels enable the network to predict a pixel by considering inputs at surrounding pixels. Therefore, they can capture not only relationships among variables but also complex nonlinear relationships among pixels in the system. Barth et al. [28] presented a CNN called DINCAE to reconstruct missing values in remotely sensed SST. The model uses incomplete SST images of 3 consecutive days, the date, and geolocation data to generate a complete reconstruction for the middle day. This model utilized the relationship between valid pixels and pixels with missing values. The research shows that DINCAE outperformed the reconstruction of DINEOF, especially for small-scale features. Han et al. [29] successfully applied DINCAE to reconstruct remotely sensed Chl-a concentration in the South China Sea and the West Philippine Sea. Luo et al. [30] also applied the DINCAE method to reconstruct Chl-a data in the Bohai Sea and the Yellow Sea. Jin et al. [31] used satellite ocean color data, including suspended sediment and visibility, together with physical parameters generated by a hydrodynamic model, such as currents, temperature, and salinity, as input to estimate Chl-a concentration with a CNN. This model, unlike DINCAE, utilized the relationship between Chl-a and other variables (suspended sediment, physical parameters, etc.). The CNN models used in these studies, however, contain fully connected layers that fix the shape of the input matrix. These models can only be trained in a single area and predict for the same area, and therefore have poor generality. Researchers need to train a new model for each region of interest. The training process requires long data series and days of iterations. The data requirements and complex operations restrict their applicability.

All the methods discussed above for reconstructing gaps in remotely sensed Chl-a have their advantages in applicability or accuracy. The objective of this manuscript is to present a more general CNN model with U-net architecture that can be trained with data from multiple regions in order to reconstruct gappy satellite Chl-a images of different regions. This general model can extract general and common statistical relationships from multiple training areas and reconstruct images of new areas, with no need for long data series (DINEOF, DINCAE) or a priori information (OI) in the new areas. Therefore, this model has higher applicability and wider application scenarios than OI, DINEOF, and the CNN-based models discussed before [28,29,30,31]. Compared with traditional interpolation methods, the method has the strength of higher accuracy, thanks to its ability to extract complex nonlinear relationships from vast amounts of data for reconstruction. The source code of this method is included in the Data Availability Statement.

2. Materials

Four offshore areas along the east coasts of North America and Asia were selected for this study. They are as follows: area1, the East China Sea (ECS, 26–34° N, 120.5–127.5° E); area2, the South China Sea (SCS, 21–26° N, 113–123° E); area3, the East Coast of the United States (EC, 33–40° N, 78–70° W); and area4, the Florida peninsula (Florida, 24–32° N, 84–76° W). Only data from area1, area3, and area4 are used in training. Area2 is used solely as a test set to evaluate the models’ performance in unfamiliar areas. All four areas are on the east side of continents (Eurasia and North America) in the northern hemisphere at middle latitudes. They have similarities as well as differences, which helps to find the limits of the model’s generalization ability.

2.1. Chl-a

The satellite-observed Chl-a concentration data used in this research are from the GlobColour project. Daily Chl-a data were merged from multiple sensors (MODIS, VIIRSN, MERIS) through weighted averages. Ten years of data, from 2011 to 2020, were selected for the four regions. The resolution of the grids is around 4 km. Grid shapes are as follows: area1 (193 × 169), area2 (121 × 241), area3 (193 × 193), and area4 (169 × 193). Figure 1 shows the mean Chl-a values of the four regions from 2011 to 2020. The four images all cover offshore regions where Chl-a concentration variations are dominated by terrigenous nutrient supply [2]. High Chl-a values occur over a large area of the Yangtze River estuary in area1, along the mainland coast of the South China Sea and on the western side of Taiwan Island in area2, along the East Coast in area3, and along the Florida Peninsula in area4. All these high Chl-a values are in shallow-water regions.

The missing rates of all these areas are calculated at each pixel and shown in Figure 2. The majority of the pixels have missing rates higher than 50%. The missing rate in the East China Sea and the South China Sea is higher than that in the East Coast and Florida, especially in the coastal region, where the data loss rate is close to 100%. This difference may be due to differences in satellite coverage and cloud density between the two continents. To make the results more reliable, images in which fewer than 2% of the pixels are valid are not used, following the previous study [28]. Finally, 3277 images in area1, 3639 in area3, and 3565 in area4 were selected as a bulk dataset. The bulk dataset was then divided into a training set, a validation set, and a test set in a ratio of approximately 8:1:1. Images in area2 are used only for testing, and 10% (321) of the valid images were selected randomly as a separate test set.
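The filtering and splitting steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function names, the random shuffling, the fixed seed, and the way the 8:1:1 ratio is rounded are all hypothetical, since the text only gives the 2% threshold and the approximate ratio.

```python
import random


def has_enough_valid_pixels(valid_flags, min_fraction=0.02):
    """Keep an image only if at least `min_fraction` of its pixels are valid.

    `valid_flags` is a 2-D list of 0/1 flags (1 = valid observation).
    """
    n_pixels = sum(len(row) for row in valid_flags)
    n_valid = sum(sum(row) for row in valid_flags)
    return n_valid >= min_fraction * n_pixels


def split_8_1_1(n_images, seed=0):
    """Shuffle image indices and split them roughly 8:1:1 (assumed scheme)."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    n_val = n_images // 10
    n_test = n_images // 10
    n_train = n_images - n_val - n_test
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test
```

Whether the split was drawn per area or over the pooled dataset is not stated in the text, so the sketch leaves that choice to the caller.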

2.2. Bathymetry

The General Bathymetric Chart of the Oceans (GEBCO) project provides gridded bathymetric datasets, in meters, on a 15-arc-second interval grid. GEBCO releases a new global grid every year. The GEBCO_2022 Grid, shown in Figure 3, was used in this research. Bathymetry data are interpolated to match the grid points of the Chl-a data. Figure 1 and Figure 3 show a clear correlation between Chl-a concentration and seafloor depth. High Chl-a values are mainly distributed near continental shelves and islands where the depth is less than 100 m. The topography adjacent to these shallow waters is mainly gentle plains surrounded by continental shelves, except for the eastern side of Taiwan Island, where the elevation drops sharply from mountains to an ocean basin. The cliff topography, unlike the other plains with river estuaries, results in the low Chl-a on the eastern coast of Taiwan Island shown in Figure 1 [32].

3. Methods

3.1. Workflow

Figure 4 illustrates the workflow used to train each model. First, Chl-a images from the training set are masked artificially so that additional data are hidden from the model. This research creates an artificial mask dataset containing 100 masks for each region. Each mask $M_{ij}$ is extracted from the missing positions of a real daily Chl-a image, where 1 denotes that the pixel is missing and 0 denotes that the pixel is valid. The masking process is performed by multiplying the original Chl-a matrix by $(1 - M_{ij})$. The masked 2D Chl-a matrix is combined with other auxiliary variables to create a 3D input matrix. The variables used in the input matrix are shown in Table 1. The model uses the 3D input matrix to generate a 2D Chl-a reconstruction. Initially, all parameters in the model are assigned random values and the reconstruction is poor. The reconstruction ($\text{Chl-a}_{rec}$) and the true Chl-a observation ($\text{Chl-a}_{obs}$) are taken as the inputs of the cost function given in the equation below:

$$J(\hat{y}_{ij}) = \frac{\sum_{ij} \left( (\hat{y}_{ij} - y_{ij}) \times V_{ij} \right)^{2}}{N_{valid}}, \tag{1}$$

where $J(\hat{y}_{ij})$ is the cost, $\hat{y}_{ij}$ is the reconstructed value, $y_{ij}$ is the satellite-observed value, $V_{ij}$ is the flag that denotes whether each pixel has a valid observed value, $i$ and $j$ are the row number and the column number in the matrix, and $N_{valid}$ is the number of valid pixels. Once the cost function $J(\hat{y}_{ij})$ is defined, the next step is to calculate its derivatives with respect to the model parameters. This involves applying the chain rule to compute the gradient vector, which contains the partial derivatives of $J(\hat{y}_{ij})$ with respect to each parameter. The optimizer then uses this gradient to update the parameters iteratively, following the gradient descent algorithm, which aims to minimize $J(\hat{y}_{ij})$ by adjusting the model’s weights and biases. The Adam optimizer [33] was used in this research because it has an adaptive learning rate and hence converges fast. Standard parameters were used for Adam, with the learning rate α = 0.001, the exponential decay rate for the first-moment estimates β1 = 0.9, the exponential decay rate for the second-moment estimates β2 = 0.999, and the small constant used for numerical stability ε = 10⁻⁸.
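As a minimal sketch of Equation (1), the cost can be computed as below. Plain nested lists are used only to keep the example self-contained; the function name is hypothetical, and a real training loop would use a tensor library so that gradients can be obtained automatically.

```python
def masked_mse(rec, obs, valid):
    """Equation (1): mean squared error over pixels with a valid observation.

    rec, obs, valid are 2-D lists of equal shape; valid holds 0/1 flags (V_ij).
    Invalid observations must be stored as finite placeholders (e.g. 0.0).
    """
    n_valid = sum(v for row in valid for v in row)
    if n_valid == 0:
        raise ValueError("image has no valid pixels")
    total = 0.0
    for rec_row, obs_row, valid_row in zip(rec, obs, valid):
        for y_hat, y, v in zip(rec_row, obs_row, valid_row):
            total += ((y_hat - y) * v) ** 2
    return total / n_valid


rec = [[1.0, 2.0], [3.0, 4.0]]
obs = [[0.0, 2.0], [5.0, 0.0]]
valid = [[1, 1], [1, 0]]          # bottom-right pixel has no observation
cost = masked_mse(rec, obs, valid)  # (1 + 0 + 4) / 3
```

The bottom-right pixel contributes nothing to the cost because its flag is 0, which is the behaviour the valid flag is meant to provide.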

Throughout the training process, the models are evaluated with the validation set, and current parameters are saved as files. The parameters with the lowest root mean squared error (RMSE) in validation through the training phase are regarded as the best parameters for the model and are evaluated with the test sets later.

Table 1 lists all the input variables, each of which is represented by a 2-D matrix corresponding to a single channel in the overall input matrix. To facilitate easier extraction of data variations, all variables are normalized to the same scale and a similar distribution [34]. Variable 1 is the Chl-a satellite observation, which is masked by a mask chosen randomly from the mask dataset. Since Chl-a values in nature approximately follow a log-normal distribution, this variable is normalized logarithmically. In the input matrix, all the invalid pixels, including the missing pixels in the original Chl-a matrix and the pixels masked out, are set to 0. To let the model know that these pixels are invalid, rather than pixels where the observed $\ln(\text{Chl-a})$ is 0, variable 2 is introduced. Variable 2 represents the valid flag of each pixel, in which pixels with valid Chl-a data are set to 1 and invalid pixels are set to 0. Similarly, variables 3 and 5 correspond to the images of the previous and next days, respectively, after random masks have been applied. The flags for these two days are represented by variables 4 and 6. Variables 7 and 8 are the cosine and sine of the day of the year multiplied by $\frac{2\pi}{366}$. Cosine and sine functions are used to provide seasonal information. With these two functions, the outputs for day 0 and day 365 are close, which reflects the continuity of the seasonal cycle across the turn of the year.
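The construction of the Chl-a, flag, and seasonal channels described above can be sketched as follows. The function names are hypothetical and the code is an illustration rather than the authors' implementation; it assumes the mask convention given earlier (1 = masked out) and that missing observations may be stored as `None`.

```python
import math


def build_chl_channels(chl, valid, mask):
    """Return (ln Chl-a with invalid pixels set to 0, valid flag) for one day.

    chl   : 2-D list of Chl-a values (entries at invalid pixels are ignored)
    valid : 2-D list, 1 where the satellite observation exists
    mask  : 2-D list, 1 where the artificial mask hides the pixel
    """
    values, flags = [], []
    for chl_row, valid_row, mask_row in zip(chl, valid, mask):
        flag_row = [v * (1 - m) for v, m in zip(valid_row, mask_row)]
        values.append([math.log(c) if f else 0.0
                       for c, f in zip(chl_row, flag_row)])
        flags.append(flag_row)
    return values, flags


def seasonal_channels(day_of_year):
    """Cosine and sine of the day of the year multiplied by 2*pi/366."""
    angle = 2.0 * math.pi * day_of_year / 366.0
    return math.cos(angle), math.sin(angle)


chl = [[1.0, math.e], [None, 2.0]]
valid = [[1, 1], [0, 1]]
mask = [[0, 0], [0, 1]]            # bottom-right pixel is hidden artificially
values, flags = build_chl_channels(chl, valid, mask)
cos0, sin0 = seasonal_channels(0)
cos365, sin365 = seasonal_channels(365)
```

The top-left pixel has a Chl-a value of 1, so its logarithm is 0, the same number used for invalid pixels; only the flag channel distinguishes the two cases.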

In previous studies, longitude and latitude were used to provide spatial context, which was effective for reconstructing data in the same area as the training area [28,29]. This approach helped the model fit the distribution of each coordinate. However, using coordinates for spatial context may lead to overfitting, which could harm the model’s ability to reconstruct data in areas outside the coordinate range of the training areas. As opposed to coordinates, bathymetry has a broadly valid impact on Chl-a concentration. Adding bathymetry data as input could increase the performance of the model in all areas.

To evaluate the effect of the two spatial context inputs, models with three input schemes are trained and tested. The input schemes are as follows: no auxiliary spatial information, coordinates input, and bathymetry input. Coordinate inputs are scaled linearly to the range of −1 to 1 (90° S: −1, 90° N: 1, 180° W: −1, 180° E: 1). The elevation of the bathymetry input is raised to the power of 1/5 and then scaled linearly, to amplify the variance in elevations of small absolute value (coastal areas). This is because a 10 m difference in an ocean basin or on a mountain may have no impact on Chl-a in adjacent water bodies, while the same difference at low elevation could outline an estuary and have a significant impact on nearby Chl-a concentration.
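A minimal sketch of the two spatial normalizations is given below. The coordinate scaling follows the ranges stated above. For the bathymetry, the text does not specify how negative elevations are handled or which constant is used for the linear scaling, so the sign-preserving root and the `max_abs_m` divisor are assumptions made only for illustration.

```python
import math


def scale_coordinates(lat_deg, lon_deg):
    """Scale latitude and longitude linearly to [-1, 1]."""
    return lat_deg / 90.0, lon_deg / 180.0


def normalize_elevation(elev_m, max_abs_m=11000.0):
    """Sign-preserving fifth root of the elevation, scaled to about [-1, 1].

    max_abs_m is a hypothetical scaling constant, not taken from the paper.
    """
    root = math.copysign(abs(elev_m) ** 0.2, elev_m)
    return root / max_abs_m ** 0.2


# A 10 m step near the coast changes the input far more than in the deep ocean.
shallow_step = abs(normalize_elevation(-20.0) - normalize_elevation(-10.0))
deep_step = abs(normalize_elevation(-5010.0) - normalize_elevation(-5000.0))
```

With this transform, `shallow_step` is roughly two orders of magnitude larger than `deep_step`, which is the amplification of coastal variance that the text describes.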

3.2. Model Architecture

The architecture of the U-net used in this research is shown in Figure 5. It is a “U” shaped network consisting of an encoder on the left side and a decoder on the right side. The input matrix is shaped as (C, M, N). C is the channel number, representing the number of variables. M and N represent the grid number in latitude and longitude directions respectively. The arrow in the figure denotes the flow of data through the network.

To begin, the input is fed into a convolutional layer and passed through an activation function. The convolutional layer utilized for feature extraction is composed of 16 filters with a receptive field of 3 × 3 grids and a stride of 1. In this study, the ReLU function is employed as the activation function to provide nonlinearity to the model. After going through the convolutional layer and activation function layer, the input matrix is converted into a feature map with dimensions of (16, M, N). This feature map then undergoes a max pooling operation, where the filter size is 2 × 2, reducing the dimensions of the feature map to half of its original size (16, M/2, N/2) for dimensionality reduction.

Similar operations are applied in subsequent layers, where the grid length of the feature map is halved for each max pooling layer, while the channel number is doubled for each convolutional layer. Given the original image size, the encoder comprises 4 pooling layers and 5 convolutional layers, resulting in a feature map with a shape of (256, M/16, N/16).
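The progression of feature-map shapes through the encoder can be traced with a short helper. The function is hypothetical and assumes that each pooling layer halves the grid with floor division. The text does not state how grid sizes that are not multiples of 16 (such as 193 × 169) are handled, so the example uses a grid that divides evenly.

```python
def encoder_shapes(m, n, base_channels=16, n_pool=4):
    """Shape (channels, rows, cols) after each of the encoder's conv layers."""
    shapes = [(base_channels, m, n)]      # first convolution: 16 filters
    channels = base_channels
    for _ in range(n_pool):
        m, n = m // 2, n // 2             # 2 x 2 max pooling halves the grid
        channels *= 2                     # next convolution doubles the channels
        shapes.append((channels, m, n))
    return shapes


shapes = encoder_shapes(192, 176)
# [(16, 192, 176), (32, 96, 88), (64, 48, 44), (128, 24, 22), (256, 12, 11)]
```

Four pooling steps and five convolutions give the (256, M/16, N/16) bottleneck described above.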

The decoder module is comprised of 4 upsampling layers for feature dimension recovery and 5 convolutional layers for feature extraction. Nearest interpolation is employed in the upsampling layer to double the grid length of the feature map. The number of filters in the convolutional layers in the decoder reduces by half each time, except for the last layer which has only one filter to produce a one-channel output.
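The two resampling operations used by the encoder and decoder, 2 × 2 max pooling and nearest-neighbor upsampling, can be illustrated on a single channel. This is a plain-Python sketch with hypothetical function names; deep learning frameworks provide equivalent layers.

```python
def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2 on a 2-D list (odd edges are dropped)."""
    rows, cols = len(x) // 2, len(x[0]) // 2
    return [[max(x[2 * i][2 * j], x[2 * i][2 * j + 1],
                 x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1])
             for j in range(cols)]
            for i in range(rows)]


def upsample_nearest_2x(x):
    """Double the grid length by repeating every pixel along rows and columns."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
pooled = max_pool_2x2(x)                 # [[6, 8], [14, 16]]
restored = upsample_nearest_2x(pooled)   # 4 x 4 again, but blocky
```

The restored grid has the original size but has lost the detail inside each 2 × 2 block, which is the information that skip connections are meant to bring back.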

To better recover lost information in downsampling, skip connections are established between the encoder and decoder. This technique fuses the feature maps at corresponding positions in the two processes, allowing the decoder to better retain high-resolution detail information contained in the higher-level feature maps during upsampling. As a result, small-scale details are recovered more accurately. Ultimately, the feature map with dimensions of (256, M/16, N/16) is transformed into an output matrix of (1, M, N) through the decoder.

This research uses coarse-to-fine architecture with two connected U-nets to stabilize training and improve model performance [35]. A coarse-to-fine model consists of a coarse-net and a refine-net. The coarse-net produces a coarse reconstruction, which is concatenated with the original input to serve as the input for the refine-net. With an input matrix containing the original input and the coarse Chl-a reconstruction, the refine-net can focus on refining the coarse Chl-a reconstruction. Using two separate networks with different tasks has been found to be more effective than relying on a single network to perform a complex task of reconstruction on all scales. The coarse-to-fine architecture has been extensively tested in the computer vision domain [36] and has also been applied in remotely sensed data reconstruction in previous studies [37].
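The data flow of the coarse-to-fine architecture can be sketched independently of the networks themselves. In the example below, the two "nets" are toy placeholders (hypothetical, not U-nets) that only exercise the wiring: the coarse output is appended to the original channels before being passed to the refine-net.

```python
def coarse_to_fine_forward(channels, coarse_net, refine_net):
    """Run a coarse net, then a refine net on the input plus the coarse output.

    channels is a list of 2-D channels; each net maps such a list to one
    2-D reconstruction. Both nets are supplied by the caller.
    """
    coarse = coarse_net(channels)
    refined = refine_net(channels + [coarse])   # concatenate along channel axis
    return coarse, refined


# Toy stand-ins used only to demonstrate the wiring.
def toy_coarse_net(channels):
    return [row[:] for row in channels[0]]      # copy the first channel


def toy_refine_net(channels):
    coarse = channels[-1]                       # last channel is the coarse output
    return [[v + 1.0 for v in row] for row in coarse]


x = [[[0.0, 1.0], [2.0, 3.0]] for _ in range(8)]   # 8 channels on a 2 x 2 grid
coarse, refined = coarse_to_fine_forward(x, toy_coarse_net, toy_refine_net)
```

Returning both outputs matters for training, because the intermediate coarse reconstruction is also needed when the cost is computed.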

However, adding a refine-net roughly doubles the number of parameters and the depth of the neural network, making it more susceptible to the vanishing gradient issue. As the depth increases, the derivatives of the cost with respect to parameters in the first few layers, far from the output layer, tend to be very small. To mitigate this issue, the intermediate results are included in the cost function. Thus, the cost function $J$ in Figure 6 is calculated as the weighted sum of the costs of the two networks, as shown below:

$$J(\hat{y}_{ij}) = \alpha \times J_{coarse} + \beta \times J_{refine}, \tag{2}$$

where the weight parameters in this research are set the same as in the previous study [37], with $\alpha$ = 0.3 and $\beta$ = 0.7. $J_{coarse}$ is the cost function of the coarse-net and $J_{refine}$ is the cost function of the refine-net. Both of them are calculated as in Equation (1).
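A small worked example of Equation (2) is given below. For brevity the images are flattened to 1-D lists and the helper names are hypothetical; the per-network cost follows Equation (1).

```python
def masked_mse(rec, obs, valid):
    """Equation (1) on flattened images: squared error over valid pixels."""
    n_valid = sum(valid)
    return sum(((r - o) * v) ** 2 for r, o, v in zip(rec, obs, valid)) / n_valid


def coarse_to_fine_cost(coarse, refined, obs, valid, alpha=0.3, beta=0.7):
    """Equation (2): weighted sum of the coarse-net and refine-net costs."""
    return (alpha * masked_mse(coarse, obs, valid)
            + beta * masked_mse(refined, obs, valid))


obs = [1.0, 2.0, 3.0]
valid = [1, 1, 0]                  # the last pixel has no observation
coarse = [3.0, 2.0, 9.0]           # J_coarse = (4 + 0) / 2 = 2.0
refined = [1.0, 3.0, 9.0]          # J_refine = (0 + 1) / 2 = 0.5
total = coarse_to_fine_cost(coarse, refined, obs, valid)   # 0.3*2.0 + 0.7*0.5
```

Because the coarse term enters the total directly, the coarse-net receives a gradient signal that does not have to pass through the whole refine-net.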

Although this architecture supports multiple refine-nets after the coarse-net, only one refine-net was used in this research. Each extra refine-net increases the number of parameters of the whole network, and hence the training time and the memory requirement. A second refine-net would receive a finer intermediate reconstruction as input and might improve the performance slightly, but it would increase the cost in computing resources and time dramatically. A previous study also showed that models with only one similar auxiliary network can achieve an effect close to that of models with many [38].

4. Results and Discussion

4.1. Training Process

The RMSEs of the six models during the training process, for both the training and the validation sets, are shown in Figure 7. The RMSEs in this phase are calculated as follows:

$$\mathrm{RMSE}(\hat{y}_{ij}) = \sqrt{\frac{\sum_{ij} \left( (\hat{y}_{ij} - y_{ij}) \times V_{ij} \right)^{2}}{N_{valid}}} = \sqrt{J(\hat{y}_{ij})}. \tag{3}$$

RMSEs are calculated using logarithmically normalized Chl-a values. Because of the property of logarithms ($\ln a - \ln b = \ln \frac{a}{b}$), the RMSEs calculated in this research can be broadly seen as the relative errors between the predicted Chl-a values and the actual Chl-a values, regardless of the data scale.

Models 1–3 are the single-net model with no auxiliary spatial variable, the single-net model with coordinates input, and the single-net model with bathymetry input, respectively. Models 4–6 are the three models with coarse-to-fine architecture and with the same three input schemes as models 1–3. The RMSEs decrease rapidly in the early stage, and all of the curves have converged by the end of training at 500 epochs. The model parameters with the lowest validation RMSE are selected. Parameters at epoch 370, epoch 440, epoch 335, epoch 445, epoch 440, and epoch 370 are selected for models 1–6, respectively. These models are tested later in the research to evaluate the performance of the model and the effects of different architectures and inputs.

4.2. Reconstruction of the Training Areas

The RMSEs of the reconstructions produced by the six models on the test set are shown in Table 2. RMSEs for the test sets are calculated as follows:

$$\mathrm{RMSE}(\hat{y}_{ij}) = \sqrt{\frac{\sum_{ij} \left( (\hat{y}_{ij} - y_{ij}) \times V_{ij} \times M_{ij} \right)^{2}}{N_{masked}}}, \tag{4}$$

where $M_{ij}$ is the mask applied to the input and $N_{masked}$ is the number of valid pixels that are masked. Notably, the RMSE computed during the test phase differs from that calculated during the training and validation phases. Specifically, in the test phase, only pixels that have valid observations and are masked out by a random mask are considered in the RMSE calculation. Conversely, in the previous phases, all pixels with valid values in the observations are included in the RMSE computation. The valid pixels that are not masked out in the input should be taken into account during training, as the model will learn to use them directly as outputs. However, it is inappropriate to evaluate the model’s performance based on the reconstruction quality at these pixels, as their values are already available in the input. Thus, the evaluation of the model’s performance should only consider the RMSE of the reconstructed pixels that were masked out during the test phase.

As Table 2 shows, the refine-net architecture has a visibly positive effect on all the models, irrespective of the input scheme, across all areas. Hence, only the coarse-to-fine networks are tested further in this paper. Moreover, Table 2 shows that both bathymetry and coordinate inputs enhance the model performance, albeit to a lesser extent than refinement. Additionally, bathymetry outperforms coordinates slightly in three areas. Across areas, the RMSEs in area1 and area3 are substantially higher than those in area4. Furthermore, the RMSE in area1 is higher than that in area3, although the difference is less pronounced. The variability among areas could be attributed to differences in the data distribution or missing rates.
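The test-phase metric of Equation (4) differs from the training cost only in which pixels are counted. A minimal sketch, with a hypothetical function name and plain nested lists, is:

```python
import math


def masked_pixel_rmse(rec, obs, valid, mask):
    """Equation (4): RMSE over pixels that are valid AND artificially masked.

    valid[i][j] = 1 where an observation exists (V_ij);
    mask[i][j]  = 1 where the artificial mask hid the pixel (M_ij).
    """
    sq_sum, n_masked = 0.0, 0
    for rec_row, obs_row, valid_row, mask_row in zip(rec, obs, valid, mask):
        for y_hat, y, v, m in zip(rec_row, obs_row, valid_row, mask_row):
            if v and m:                    # observed, but hidden from the model
                sq_sum += (y_hat - y) ** 2
                n_masked += 1
    if n_masked == 0:
        raise ValueError("no masked valid pixels to evaluate")
    return math.sqrt(sq_sum / n_masked)


rec = [[1.0, 2.0], [3.0, 4.0]]
obs = [[0.0, 2.0], [5.0, 4.0]]
valid = [[1, 1], [1, 0]]
mask = [[1, 0], [1, 1]]
rmse = masked_pixel_rmse(rec, obs, valid, mask)   # sqrt((1 + 4) / 2)
```

The top-right pixel is valid but was visible to the model, and the bottom-right pixel was masked but has no observation, so neither is counted.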

In the previous DINCAE application to Chl-a reconstruction [29], the RMSE of the reconstruction is 0.27. Leaving aside the differences in data characteristics between areas, the RMSEs of the general model in this research are at a level close to that of DINCAE specialized for one specific area in the South China Sea.

The RMSE between $\ln(\text{Chl-a}_{rec})$ and $\ln(\text{Chl-a}_{obs})$ at each pixel is calculated and shown in Figure 8. As shown, the three models have a similar RMSE spatial distribution. Overall, RMSE values in shallow water regions are higher than those in open ocean regions. Two notable high-RMSE areas are the Yangtze River estuary in area1 and the region that the Gulf Stream travels through in area3. These two areas are influenced by two strong streams with high variance and are harder to infer accurately.

Figure 9 displays the reconstruction results of each model for specific days. Data from the test set in area1 on 11 June 2013, in area3 on 7 October 2020, and in area4 on 20 February 2011 were used for reconstruction. The white areas in the last column (Satellite) are missing pixels, where $V_{ij} = 0$. The second column (Cur) is generated by multiplying the original satellite-observed Chl-a matrix by $(1 - M_{ij})$; hence, the white areas in the second column are invalid pixels, where $V_{ij} \times (1 - M_{ij}) = 0$. The first (Prev) and the third (Next) columns are the Chl-a of the previous day and the next day, after artificial masking. The figure shows that there is no significant difference between the three models’ reconstruction results and that all three models can reconstruct the Chl-a patterns reliably in the training areas, although they tend to blur fine structures generated by short-term ocean dynamics.

Errors at each pixel are calculated as $\ln(\text{Chl-a}_{rec})$ minus $\ln(\text{Chl-a}_{obs})$ and are shown in Figure 10. High error values occur in similar areas among the three models. In area1, high negative and positive errors occur south of the Yangtze River estuary, forming a vortex-like pattern. In area3, a high positive error occurs in the north. For the front in the middle of area3, reconstructions have high negative errors along the east side of the front. In area4, a high positive error with a vortex shape occurs along the curve of the east side of the Florida peninsula, and a patch of water with a high Chl-a concentration in the northeast is missing in the reconstruction. All the high errors are due to missing short-term features or blurred feature borders. Among these three models, the one with bathymetry input shows a lower error than the other two models, although the improvement is not significant.

4.3. Reconstruction for the Area Not Used in Training

To further evaluate the ability of the three models to reconstruct areas outside the training areas, area2 was used to test the three models. The model without auxiliary parameters, the model with coordinate input, and the model with bathymetry input have RMSEs of 0.6190, 0.6883, and 0.5761, respectively. These RMSEs are higher than the RMSEs for the training areas in Table 2. As mentioned before, the RMSE of logarithmic values measures the relative error. A logarithmic error of 0.5761 (the RMSE of the model with bathymetry input) corresponds to underestimating Chl-a by 43.8% or overestimating it by 77.9%, while a logarithmic error of 0.3538 (the RMSE of the same model for area4) corresponds to underestimating Chl-a by 29.8% or overestimating it by 42.4%. The results indicate that bathymetry input significantly improves model performance, while coordinate input harms model performance when the model is applied to unfamiliar areas.
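The percentages quoted above follow from exponentiating the logarithmic error: an error of δ in ln(Chl-a) corresponds to a ratio of e^δ (overestimate) or e^(−δ) (underestimate) between the reconstructed and observed values. The helper name below is hypothetical; the arithmetic reproduces the figures in the text.

```python
import math


def log_error_to_percent(log_error):
    """Convert an error in ln(Chl-a) to under/over-estimation percentages."""
    under = (1.0 - math.exp(-log_error)) * 100.0
    over = (math.exp(log_error) - 1.0) * 100.0
    return under, over


for rmse in (0.5761, 0.3538):
    under, over = log_error_to_percent(rmse)
    print(f"{rmse}: -{under:.1f}% / +{over:.1f}%")
# 0.5761: -43.8% / +77.9%
# 0.3538: -29.8% / +42.4%
```

The asymmetry between the two percentages is a property of the logarithmic scale, not of the model.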

Figure 11 presents the RMSE of the three models for each pixel in the test set. The model without auxiliary parameters and the model with coordinate input have high RMSEs in the southwest, close to the Pearl River Estuary, on the western coast of Taiwan Island, and near another estuary in the north. The model with bathymetry input has significantly lower RMSEs near estuaries and on the western coast of Taiwan Island than the other two models. However, it has a high RMSE on the eastern side of Taiwan Island, possibly because the landscape of this region differs from that of the areas used for training. While the coasts in the training areas are all gentle plains with high Chl-a concentrations, the east coast of Taiwan Island is a steep mountainous area close to the continental shelf with a low Chl-a concentration. Incorrectly inferring the low Chl-a area as a high Chl-a area may lead to a 1000% overestimation. Since the error of logarithmic values measures the relative error and RMSE amplifies the larger errors, this false inference for the east coast of Taiwan Island contributes greatly to the overall RMSE. This could be addressed by incorporating such bathymetry in the training data.

Figure 12 shows the reconstructions of the three models for an image on 27 August 2014. The input image of the next day covers the high Chl-a values near the southern coast, while none of the three days has valid input in the northern coastal area. All models reconstruct the high Chl-a values in the southern coastal area well, but only the model with bathymetry input reconstructs the high Chl-a pattern in the northern coastal area successfully. The result suggests that all models can use valid pixels from the three days for reconstruction, but only the model with bathymetry input can achieve a reasonable reconstruction for areas where no spatially or temporally nearby pixels are available.

Errors at each pixel are shown in Figure 13. The model without auxiliary input and the one with coordinate input both have a large negative error in the northern coastal area because they fail to reconstruct the high Chl-a values near the coast. In contrast, the model with bathymetry input has a much lower error in the same area, although it has a large positive error farther from the coast because it overextends the region of high Chl-a. The same advantage of bathymetry input appears on the west side of Taiwan island. Furthermore, the model with bathymetry input has a lower error in the open ocean to the east. In summary, the reconstruction error of the model with bathymetry input is slightly lower over the whole area and significantly lower in coastal areas than that of the other two models.

4.4. Comparison between the Best Model and Other Methods

The main strength of the proposed model compared with the CNN-based methods discussed before is its ability to extract the statistical relationships common to multiple areas and to work in areas with no long data series or a priori information. OI, DINEOF, and the CNN-based models discussed before are not suitable in the absence of long data series and a priori information. To evaluate the effectiveness of the proposed model, it was therefore compared with traditional interpolation methods (linear interpolation and nearest-neighbor interpolation, NN). All the methods used the incomplete Chl-a images of the previous day, the current day, and the next day for reconstruction.

The results of the three methods are shown in Figure 14. Linear interpolation leaves large gaps in the reconstruction because of the limited range of the input data. NN fills all the gaps but fails to reproduce the high Chl-a feature outside the data spread. The best-performing model gives a better result in regions outside the data spread. A possible reason is that the CNN model can make inferences from bathymetry and the surrounding valid values, relying on patterns it has seen in training. In contrast, the other two methods estimate merely by linear combinations of surrounding values.
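The limitation of nearest-neighbor interpolation can be seen in a toy example. The sketch below is a brute-force stand-in for 3D nearest-neighbor filling over a (day, row, column) stack, not the routine used in the study; the function name and the Chl-a values are hypothetical. Every filled cell is a copy of some observed value, so a coastal maximum that was never observed cannot be recovered.

```python
def nearest_neighbor_fill(stack):
    """Fill None cells of a [day][row][col] stack with the nearest valid value.

    Distance is Euclidean in (day, row, col) index space. Ties go to the
    first valid cell in scan order. Only practical for small grids.
    """
    valid = [(t, i, j, v)
             for t, day in enumerate(stack)
             for i, row in enumerate(day)
             for j, v in enumerate(row)
             if v is not None]
    if not valid:
        raise ValueError("stack contains no valid pixels")

    filled = []
    for t, day in enumerate(stack):
        new_day = []
        for i, row in enumerate(day):
            new_row = []
            for j, v in enumerate(row):
                if v is None:
                    nearest = min(
                        valid,
                        key=lambda p: (p[0] - t) ** 2 + (p[1] - i) ** 2 + (p[2] - j) ** 2,
                    )
                    v = nearest[3]
                new_row.append(v)
            new_day.append(new_row)
        filled.append(new_day)
    return filled

# Toy transect: column 0 is the coast and is never observed on any day.
stack = [
    [[None, None, 0.2, 0.2, 0.1]],   # previous day
    [[None, None, None, 0.2, 0.1]],  # current day
    [[None, 0.5, 0.3, None, 0.1]],   # next day
]
filled = nearest_neighbor_fill(stack)
print(filled[1][0])  # reconstructed transect for the current day
```

In this toy case the coastal cell of the current day inherits the value observed one column offshore on the next day, so the reconstruction is capped at the largest observed value.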

5. Conclusions

This study proposes a novel approach based on CNN for the reconstruction of gaps in remotely sensed Chl-a images. Data reconstruction techniques rely on the correlation between variables. Previously employed methods either assume this correlation through predetermined mathematical formulas or derive it from statistics. Both have their limitations: mathematical formulas oversimplify the intricate non-linear relationships between geobiological variables, while the statistics extracted by previous methods are restricted to the local area, leading to low generality and limited applicability. The principal strength of the method developed in this study lies in its ability to extract general patterns from vast amounts of data in multiple areas and to generate predictions for regions with limited prior information and no long data series.

Different architectures and auxiliary input schemes were used to improve the model performance. The results demonstrated that the coarse-to-fine architecture enhanced the model performance. Among the input schemes, bathymetry input visibly improved the model’s generalization ability, while the coordinates input commonly used in previous machine-learning techniques harmed the model’s generalization ability.

The best-performing model is the one with bathymetry input and the coarse-to-fine architecture. It was compared with two deterministic interpolation methods in reconstructing area2 from images of three days. The result demonstrated that the model outperformed the traditional interpolation methods, especially in areas outside the data coverage.

Despite its success, the current model has certain limitations. For instance, it fails to predict the low Chl-a concentration on the eastern coast of Taiwan island in area2, which has a different bathymetry (a steep slope with no river estuary) from the coasts in the training areas. This limitation suggests that the generalization ability of the model is restricted by the range of the data used in training. Furthermore, the model was only tested in four areas on the east side of continents in the northern hemisphere at middle latitudes, where the climate and ocean dynamics are similar. Hence, its performance may be poorer in areas with distinct characteristics.

Future research may benefit from incorporating data from wider-spread areas and adding other physical variables such as wind speed and SST to enhance the model’s performance and generalization ability. The model will be further tested in areas with distinct characteristics. Additionally, a more general model may leverage the vast data available globally and provide acceptable reconstruction accuracy for polar areas, where data is too sparse to train a model.

Author Contributions

M.Z. contributed to the idea of this study, and reviewed and edited the paper. X.Z. wrote the code, performed the experiments, and wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China under contracts Nos. 41861134040, 41530960, 41941008, Shanghai Key Laboratory of Polar Life and Environment Sciences under contract No. 21DZ2260100, and the Shanghai Frontiers Science Center of Polar Science.

Data Availability Statement

Satellite Chl-a data is available on www.globcolour.info (accessed on 10 January 2022). Bathymetry data is available on www.gebco.net (accessed on 10 January 2022). The source code is released as open source under the terms of the GNU General Public License v3 and is available at the address https://github.com/PeanutZom/ChloRec (accessed on 10 January 2022).

Acknowledgments

We thank GlobColour for providing the satellite Chl-a data and GEBCO for providing the bathymetry data.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Averaged Chl-a concentration from 2011 to 2020 using valid pixels.

Figure 2. Missing rate of each pixel for the complete dataset.

Figure 3. Seafloor or land height (above mean sea level) in meters of the four areas.

Figure 4. Schematic flowchart for the workflow of model training.

Figure 5. U-net model architecture. Boxes of different colors represent layers of different types. Circles with 2 lines represent the concatenation operations of two matrices. The numbers below each box denote the number of layer channels. Bold black letters below the numbers denote the edge lengths of the matrix, where I is the input edge length (M, N).

Figure 6. The architecture of the coarse-to-fine network.

Figure 7. RMSEs are calculated throughout the training phase for the training set and the validation set. The curves are smoothed to show the trend. (a) RMSEs for 20 random batches in the training set are calculated with a step of 1 epoch with a batch number of 32. (b) RMSEs for the entire validation set are calculated at a step of 5 epochs.

Figure 8. Spatial distribution of RMSE between $\ln(\text{Chl-a}_{\text{rec}})$ and $\ln(\text{Chl-a}_{\text{obs}})$ of models with 3 input schemes.

Figure 9. The reconstruction result of each model on specific days for training areas. The first three columns labeled “Prev”, “Cur” and “Next” denote masked Chl-a input of the previous day, the current day, and the next day. The three columns after Chl-a inputs are the reconstruction results of three models with different input schemes. The last column is the full satellite observation of the current day.

Figure 10. Error between $\ln(\text{Chl-a}_{\text{rec}})$ and $\ln(\text{Chl-a}_{\text{obs}})$ of the models in the 3 areas.

Figure 11. RMSE spatial distribution of the three models for the test set.

Figure 12. The reconstruction result of each model on a specific day for area2 which is not used in training.

Figure 13. Error between $\ln(\text{Chl-a}_{\text{rec}})$ and $\ln(\text{Chl-a}_{\text{obs}})$ of the models in area2.

Figure 14. The reconstruction result of 3D linear interpolation, 3D nearest neighbor interpolation, and the best model.

Table 1. The total list of input variables. Variables of index 9–11 are optional in different models.

Index	Variable Name
1	$\ln(\text{Chl-a})$ of the current day masked by a random mask
2	Flag denotes whether each grid point is valid (0 or 1)
3–4	Masked $\ln(\text{Chl-a})$ and flag of the previous day
5–6	Masked $\ln(\text{Chl-a})$ and flag of the next day
7	$\cos\left(\frac{\text{day of the year}}{366} \times 2\pi\right)$
8	$\sin\left(\frac{\text{day of the year}}{366} \times 2\pi\right)$
9 (optional)	Longitude (scaled linearly between −1 and 1)
10 (optional)	Latitude (scaled linearly between −1 and 1)
11 (optional)	Elevation (scaled to the power of 1/5 and linearly between −1 and 1)
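A minimal sketch of how the time channels (indices 7 and 8) and the elevation channel (index 11) of Table 1 might be computed is given below. Two points are assumptions rather than details confirmed by the table: the power of 1/5 is applied as a signed fifth root so that negative seafloor heights stay negative, and the linear scaling uses the minimum and maximum of the supplied values. All function names are hypothetical.

```python
import math

def day_of_year_channels(day_of_year):
    """Cyclic encoding of the day of the year (Table 1, indices 7 and 8)."""
    angle = day_of_year / 366.0 * 2.0 * math.pi
    return math.cos(angle), math.sin(angle)

def scale_linear(values):
    """Map values linearly onto [-1, 1] using their min and max (assumed convention)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def scale_elevation(elevations_m):
    """Signed fifth root followed by linear scaling to [-1, 1].

    The signed root is an assumed reading of "scaled to the power of 1/5";
    it compresses deep-ocean depths while keeping detail near the coast.
    """
    roots = [math.copysign(abs(h) ** 0.2, h) for h in elevations_m]
    return scale_linear(roots)

cos_doy, sin_doy = day_of_year_channels(239)  # 27 August in a non-leap year
scaled = scale_elevation([-5000.0, -200.0, -10.0, 0.0, 50.0, 1000.0])
print(cos_doy, sin_doy)
print(scaled)
```

The cosine and sine pair places 31 December next to 1 January, which a single day-of-year number would not do.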

Table 2. RMSE between $\ln(\text{Chl-a}_{\text{rec}})$ and $\ln(\text{Chl-a}_{\text{obs}})$ of the models in the test set.

Auxiliary Parameters	Area	Single Network	Coarse-to-Fine Network
None	1	0.3705	0.3554
None	3	0.3744	0.3482
None	4	0.3704	0.3112
Coordinates	1	0.3699	0.3543
Coordinates	3	0.3627	0.3425
Coordinates	4	0.3354	0.3060
Bathymetry	1	0.3583	0.3538
Bathymetry	3	0.3433	0.3354
Bathymetry	4	0.3054	0.2916
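The comparison in Table 2 can be summarized with a few lines of arithmetic. The sketch below copies the RMSE values from the table and averages them over the three training areas for each input scheme. The unweighted mean across areas is a simplification made here for illustration, since the areas may contain different numbers of pixels.

```python
# RMSE values copied from Table 2 as (single network, coarse-to-fine network)
# for areas 1, 3, and 4 under each auxiliary input scheme.
TABLE2 = {
    "None":        [(0.3705, 0.3554), (0.3744, 0.3482), (0.3704, 0.3112)],
    "Coordinates": [(0.3699, 0.3543), (0.3627, 0.3425), (0.3354, 0.3060)],
    "Bathymetry":  [(0.3583, 0.3538), (0.3433, 0.3354), (0.3054, 0.2916)],
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def summarize(table):
    """Return {scheme: (mean single-network RMSE, mean coarse-to-fine RMSE)}."""
    return {
        scheme: (mean(s for s, _ in rows), mean(c for _, c in rows))
        for scheme, rows in table.items()
    }

summary = summarize(TABLE2)
for scheme, (single, c2f) in summary.items():
    reduction = (single - c2f) / single
    print(f"{scheme:<12} single={single:.4f} coarse-to-fine={c2f:.4f} ({reduction:.1%} lower)")
```

In every row of the table the coarse-to-fine network has the lower RMSE, and the bathymetry scheme has the lowest averages for both architectures.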

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
