Reconstruction of Daily MODIS/Aqua Chlorophyll-a Concentration in Turbid Estuarine Waters Based on Attention U-NET

Ye, Haibin; Tang, Shilin; Yang, Chaoyu; Chen, Chuqun

doi:10.3390/rs15030546

by Haibin Ye ¹,², Shilin Tang ¹,²,*, Chaoyu Yang ³,⁴,⁵ and Chuqun Chen

¹ State Key Laboratory of Tropical Oceanography, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou 510301, China

² Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 510301, China

³ South China Sea Marine Prediction Center, State Oceanic Administration, Guangzhou 510300, China

⁴ Key Laboratory of Marine Environmental Survey Technology and Application, Ministry of Natural Resources (MNR), Guangzhou 510300, China

⁵ Guangdong Provincial Key Laboratory of Marine Resources and Coastal Engineering, Guangzhou 510300, China

* Author to whom correspondence should be addressed.

Remote Sens. 2023, 15(3), 546; https://doi.org/10.3390/rs15030546

Submission received: 19 November 2022 / Revised: 9 January 2023 / Accepted: 10 January 2023 / Published: 17 January 2023

(This article belongs to the Section Ocean Remote Sensing)

Abstract:

An attention U-Net was proposed to reconstruct missing chlorophyll-a concentration (C_chla) data. The U-Net is a lightweight fully convolutional neural network architecture consisting of an encoder-decoder (i.e., down-sampling and up-sampling). Attention gates (AGs) were integrated into the U-Net. Training the U-Net with AGs implicitly teaches it to suppress irrelevant areas and highlight the salient features in the missing data areas, which increases the network sensitivity and reconstruction accuracy. The neural network takes the satellite-derived C_chla anomalies and their variance as inputs, and outputs the reconstructed fields along with their variances. The trained network was applied to long-term daily MODIS/Aqua C_chla products in the Pearl River estuary (PRE) and adjacent continental shelf area. The model performance was evaluated using an independent test dataset comprising both satellite-derived and in-situ measurements. The results showed that the proposed neural network not only performed well in the reconstruction of valid pixels, but also provided a more reasonable reconstruction than the standard U-Net without AGs. This study provides a feasible method for the reconstruction task in the field of ocean color, which should be helpful in producing a credible dataset to study the ecological effects of extreme weather conditions such as typhoons on the upper ocean in the PRE waters. Based on the reconstructed C_chla products, the footprints of the typhoons were studied. An increase in surface C_chla near the typhoons’ tracks and a decrease in the estuary were found. The composite results illustrated that the C_chla increases occurred over almost the entire area within a radius of 100 km. The time series analysis showed that the C_chla peak appeared on the fifth day after the typhoon’s passage.

Keywords:

chlorophyll-a; reconstruction; U-Net; attention mechanism; ocean color

1. Introduction

Satellite-based ocean color images are widely used to map the distribution of the surface chlorophyll-a concentration (C_chla) due to their extensive coverage in time and space, enabling the study of the oceanographic dynamics of marine ecosystems [1,2]. However, the C_chla products are often affected by clouds, sun glint, and other potential contamination because they are retrieved from the water-leaving radiance in the visible and infrared bands [3,4]. These products commonly contain large areas of missing data, and in some cases the data are missing over the entire study area. The incompleteness of the products severely limits their use in understanding biological ocean processes, especially those with small spatial scales and high temporal variability.

The Pearl River estuary (PRE) and northern South China Sea are frequently influenced by typhoons [5]. The great resultant losses induced by typhoons suggest that improving the understanding of typhoon impacts has become an urgent task. Satellite observation is a useful way to capture typhoons’ impact on the ocean surface, i.e., the increase in C_chla [6,7]. However, researchers are faced with the lack of remote sensing data during the passage of typhoons, which makes it difficult to conduct a comprehensive quantitative analysis of the impact of typhoons on the upper ocean’s phytoplankton.

Several schemes have been proposed to reconstruct missing data in remote sensing images, ranging from simple linear and spline interpolation to optimal interpolation (OI) [8] and Kriging [9]. At present, a spatiotemporal method, Data Interpolating Empirical Orthogonal Functions (DINEOF) [10], is one of the most widely used methods in C_chla reconstruction. The DINEOF method uses dominant spatial and temporal patterns to reconstruct missing data [11]. The growth of phytoplankton is mainly controlled by upper ocean physical processes, such as horizontal and vertical advection, or turbulent diffusion. Although the above schemes have demonstrated impressive performance within their respective study areas, they still have limitations when handling non-linear relationships in the temporal and spatial domains.

Deep learning (DL) methods can learn non-linear relationships and complex interactions, which gives them potential for data reconstruction. Recently, many machine learning and deep learning methods have been proposed in the reconstruction field, such as the Data-Interpolating Convolutional Auto-Encoder (DINCAE) [12], Self-Organizing Maps [13], and the wavelet neural network [14]. The DINCAE is a U-Net-like convolutional structure [15], and can learn the non-linear relationships between the missing and non-missing data areas from a large amount of input data. However, abrupt features between the satellite-derived and reconstructed pixels remain significant, making the reconstruction of C_chla a difficult task. The attention gate (AG) model was initially proposed to address the excessive and redundant use of computational resources [16]. It can automatically learn to focus on target structures without additional supervision. Adding AGs to a standard U-Net could therefore improve the model’s sensitivity, yield more accurate results for the missing data, and ultimately benefit the reconstruction task.

This study aimed to explore the potential of DL methods for this task. To achieve this goal, an attention convolutional U-Net was trained with a gappy ocean color dataset and then used to reconstruct the missing observations. This neural network is a U-Net structure including an encoder and a decoder. The incorporation of AGs helps to remove the abrupt, high-frequency discontinuities at the edges between the satellite-derived and reconstructed pixels.

The proposed neural network was applied to the reconstruction of C_chla in the highly turbid and optically complex PRE waters for the first time. This work is organized as follows: the study area and the attention U-Net method are introduced in Section 2. The dataset used in this study, the model performance, and the reconstructed results are described in Section 3. The necessity of the attention mechanism and its application in typhoon assessment are discussed in Section 4. Finally, the conclusions are presented in Section 5.

2. Materials and Methods

2.1. Study Area

The PRE (21–23.5°N, 111–117°E) is located on the shallow continental shelf of the northern South China Sea (NSCS), with typical case II waters in the estuarine area (‘Box 1’ in Figure 1) and clear open waters in the continental shelf area (‘Box 2’ in Figure 1). It is a bell-shaped area of about 49 km from north to south, and its width varies from 4 to 48 km [17]. With rapid economic development and intensifying human activities, the PRE is facing enormous ecological and environmental pressures, which also lead to significant inaccuracies in interpreting C_chla from ocean color remote sensing data [18,19]. These large estimation errors make the study area a demanding test site for evaluating the stability of the proposed model.

2.2. Data Source and Processing

Level-1A MODIS/Aqua data were obtained from the National Aeronautics and Space Administration (NASA) ocean color data archive. Atmospheric correction used the MUMM algorithm, which was considered the better solution for the highly turbid PRE waters [17]. Standard flags were used to mask contamination from land, clouds, sun glint, and other potential disturbances to the radiance signal. The daily level-3 standard mapped C_chla product covering the PRE and adjacent continental shelf area from 2003 to 2020 was retrieved by a two-stage CNN model (C_chla-Net), which takes the remote sensing reflectance at visible bands as its inputs. The C_chla-Net was pretrained with a global C_chla retrieval model (OC3M) in the first stage, and the network’s parameters were then refined with in-situ measurements in the second stage [20].

The in-situ C_chla data were collected in 14 surveys between 2003 and 2016. Figure 1 shows the geographical locations of the sampling stations. A total of 18 fixed stations were pre-set along the central y-axis of the PRE. However, due to weather conditions, only the first 16 or 17 stations were covered in several surveys. The distance between neighbouring stations was about 4.5 km, and the stations covered a total distance of about 80 km from the sea upstream. Water samples were measured fluorometrically using a pre-calibrated Turner Design 10 fluorometer [21].

The typhoon data were obtained from the Tropical Cyclone Data Center managed by the China Meteorological Administration [22]. The typhoon records comprised information with a time resolution of 6 h, including the location (latitude and longitude of the typhoon center), maximum wind speed (m/s), central pressure level (hPa), and the average translation speed.

For match-up analysis, the procedure developed by [23] was adopted. The in-situ C_chla data collected within 2 h of the satellite observations were selected. The mean value in a 3 × 3 pixel box centered on the sampling location was used for match-up analysis.
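The match-up selection described above can be sketched in a few lines of Python. This is a minimal illustration and not the procedure of [23] itself: the function names `box_mean` and `within_window` are hypothetical, missing pixels are assumed to be marked with `None`, and skipping missing pixels inside the box is an assumption, since the text does not state how they are handled.

```python
def box_mean(field, row, col, half=1):
    """Mean of the valid pixels in a (2*half+1) x (2*half+1) box centred on (row, col).

    `field` is a 2-D list; None marks a missing pixel (an assumption of this
    sketch). Returns None when the box holds no valid pixel.
    """
    n_rows, n_cols = len(field), len(field[0])
    values = []
    for r in range(max(0, row - half), min(n_rows, row + half + 1)):
        for c in range(max(0, col - half), min(n_cols, col + half + 1)):
            if field[r][c] is not None:
                values.append(field[r][c])
    return sum(values) / len(values) if values else None


def within_window(t_insitu_h, t_sat_h, window_h=2.0):
    """True when in-situ and satellite times (in hours) differ by at most window_h."""
    return abs(t_insitu_h - t_sat_h) <= window_h
```

With `half=1` the box is the 3 × 3 window used in the text; a station is kept only when `within_window` is true and `box_mean` returns a value.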

To evaluate the performance of the network, the root mean squared difference (RMSD) and the Bias between the reconstructed (y’_i) and the satellite-derived (y_i) values were used.

\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {({y^{'}}_{i} - y_{i})}^{2}}

(1)

\mathrm{Bias} = \frac{1}{N} \sum_{i = 1}^{N} ({y^{'}}_{i} - y_{i})

(2)

where N is the number of measurements (excluding land points).
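As an illustration of Equations (1) and (2), the two metrics can be computed as follows. This is a minimal sketch that assumes the inputs are equal-length lists of paired values with land and missing pixels already removed; the function names are hypothetical.

```python
import math


def rmsd(reconstructed, reference):
    """Root mean squared difference between paired values, Equation (1)."""
    n = len(reference)
    return math.sqrt(sum((yp - y) ** 2 for yp, y in zip(reconstructed, reference)) / n)


def bias(reconstructed, reference):
    """Mean signed difference (reconstructed minus reference), Equation (2)."""
    n = len(reference)
    return sum(yp - y for yp, y in zip(reconstructed, reference)) / n
```

A positive Bias indicates that the reconstruction overestimates the reference on average, while RMSD is insensitive to the sign of the differences.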

2.3. Neural Network Architecture

2.3.1. U-Net

It has been shown that the U-Net architecture performs better for image segmentation than the standard fully convolutional network [15]. U-Net uses a symmetric and complete encoder-decoder structure. The encoder extracts image features through a set of convolution and max-pooling layers. Through this sequential process, the model is able to capture sufficient information from a large receptive field. The decoder up-samples the feature images through a set of transposed convolution layers. There are cross-layer connections between the corresponding encoder and decoder layers. On this basis, the high-level features in the encoder are fused with the up-sampling outputs of the same scale, which helps the up-sampling layer to recover the details of the image and perform the pixel-wise reconstruction from multiple features in the decoder. The encoder-decoder architecture of U-Net is illustrated in Figure 2.

To clarify the use of U-Net architecture, the general mathematical expression is described below:

y^{'} = f^{\mathrm{UNet}} (x)

(3)

where the input x denotes a multi-channel feature map and the U-Net f^UNet outputs the prediction y’ for x, in which y’ denotes the corresponding pixel-wise likelihood. The convolution layers filter the multi-channel feature maps to generate multi-level features within f^UNet; the general mathematical expression of the convolution is given in Equation (4).

x^{l} = ω * x^{l - 1}

(4)

where x^{l−1} is the input image of layer l−1, ω is the kernel, and x^{l} is the output image of layer l after performing the convolution computation.

Unlike the DINCAE network, two convolution layers were used to connect the encoder and decoder in place of two fully connected layers. Fully connected layers have many redundant connections, which makes it difficult for the optimization to capture local consistencies [24]. The large number of parameters in fully connected layers would also consume considerable computing resources, while a deeper convolution network can produce better results with fewer parameters. In the encoder, each pair of convolution layers (kernel size = 3 × 3) is followed by a max-pooling layer (2 × 2, stride = 2), which together can be regarded as a down-sampling module. The number of feature channels starts at 16 and doubles at each down-sampling step (with the input convolution layer fixed to 10 features). At each down-sampling step, the spatial dimensions of the input are halved in order to capture the high-level features comprehensively, as well as to reduce variance and computational complexity. In the decoder, each up-sampling module is composed of a transposed convolution layer (kernel size = 3 × 3) and two convolution layers. Leaky-ReLU activation functions [25] are used after each convolution layer. After each up-sampling step in the decoder, the intermediate feature map is concatenated with the corresponding layer from the encoder, which is first passed through a spatial attention block (Equation (5)).

x^{l} = [x_{att}^{N - l + 1}, x^{l - 1}]

(5)

where x^{l−1} is the intermediate feature map, x_att^{N−l+1} is the corresponding layer from the encoder after the spatial attention block computation, and x^{l} is the concatenation layer.

To avoid over-fitting, a drop-out layer (drop_rate = 0.2) is introduced following each convolution layer. In addition, Gaussian-distributed noise with a mean of zero and a given standard deviation is added to the input data. Note that the drop-out layers and the Gaussian-distributed noise are only used during training.
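To make the encoder geometry concrete, the following sketch tracks the channel count and spatial size through successive down-sampling modules (channels doubling from 16, height and width halved by each 2 × 2 pooling). The depth of four down-sampling steps is an assumption of this sketch, not a value stated in this section, and `encoder_shapes` is a hypothetical helper name.

```python
def encoder_shapes(height, width, base_channels=16, depth=4):
    """List (channels, height, width) at each encoder level.

    Channels double and the spatial size halves at every down-sampling step.
    `depth` (number of down-sampling steps) is assumed, not taken from the text.
    """
    shapes = []
    channels = base_channels
    for level in range(depth + 1):
        shapes.append((channels, height, width))
        if level < depth:
            if height % 2 or width % 2:
                raise ValueError("spatial size must be even before each 2 x 2 pooling")
            height, width = height // 2, width // 2
            channels *= 2
    return shapes


# Example with the 240 x 672 image size used in this study.
for shape in encoder_shapes(240, 672):
    print(shape)
```

Under these assumptions the 240 × 672 input remains evenly divisible at every level, so no padding or cropping is needed before the skip connections are concatenated.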

2.3.2. Attention Mechanism

Attention mechanisms, which aim to focus on the most critical information for the current task, have been widely used recently [26,27]. A novel attention gate (AG) module was proposed for automatically learning to focus on target structures; it can be easily integrated into U-Net to highlight significant features that are passed through the skip connections [16]. The architecture of AGs is shown in Figure 3.

The x^l is the low-level feature. The g_i is the gating vector which is collected from a coarser scale and used for each pixel i to determine focus regions. The output of AGs (x_att^l) is the pixel-wise multiplication of the feature map of layer l and attention coefficients (α_i^l).

x_{att}^{l} = x^{l} \cdot α_{i}^{l}

(6)

Additive attention (in place of multiplicative attention) was used to obtain the AGs, due to its higher accuracy [28,29]. The formulation for the additive attention coefficient is as follows:

α_{i}^{l} = σ_{2} (ψ^{T} (σ_{1} (W_{x}^{T} x^{l} + W_{g}^{T} g_{i} + b_{g})) + b_{ψ})

(7)

where σ₁ and σ₂ denote the Rectified Linear Unit (ReLU) and Sigmoid activation functions, respectively. W_x, W_g and ψ denote linear transformations, and b_ψ and b_g are bias terms. These linear transformations are computed by 1 × 1 convolutions.
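Because the linear transformations are 1 × 1 convolutions, Equations (6) and (7) act independently on each pixel, so the gate can be illustrated for a single pixel with plain lists. This is a simplified sketch: the helper names and the weight layout (rows indexed by input channel) are assumptions, and the resampling needed to bring the coarser gating signal onto the grid of x^l is ignored.

```python
import math


def matvec_t(weights, vec):
    """Compute W^T v, where `weights` has len(vec) rows and n_out columns."""
    n_out = len(weights[0])
    return [sum(weights[k][j] * vec[k] for k in range(len(vec))) for j in range(n_out)]


def attention_coefficient(x, g, w_x, w_g, b_g, psi, b_psi):
    """Additive attention coefficient for one pixel, following Equation (7)."""
    qx = matvec_t(w_x, x)
    qg = matvec_t(w_g, g)
    hidden = [max(0.0, a + b + c) for a, b, c in zip(qx, qg, b_g)]  # sigma_1: ReLU
    score = sum(p * h for p, h in zip(psi, hidden)) + b_psi
    return 1.0 / (1.0 + math.exp(-score))  # sigma_2: sigmoid, so 0 < alpha < 1


def gate(x, alpha):
    """Equation (6): scale every channel of the skip feature by alpha."""
    return [alpha * v for v in x]


# Toy example: 2 feature channels, 1 gating channel, 2 intermediate channels.
x = [1.0, 2.0]
g = [0.5]
alpha = attention_coefficient(
    x, g,
    w_x=[[1.0, 0.0], [0.0, 1.0]],
    w_g=[[1.0, -3.0]],
    b_g=[0.0, 0.0],
    psi=[1.0, 1.0],
    b_psi=-1.5,
)
x_att = gate(x, alpha)
```

Since the sigmoid keeps the coefficient strictly between 0 and 1, the gate can only attenuate the skip-connection features, never amplify them.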

To obtain a better fit and to prevent the training from diverging, the time average of each grid cell was subtracted. The proposed model thus works with C_chla anomalies scaled by the inverse of the error variance, which is either constant (for non-missing data) or zero (for missing data). The precise value of this constant is not important, as it is multiplied by a weight matrix that is optimized by training the network [12]. To obtain reasonable results, the proposed model uses more inputs than merely the C_chla anomalies scaled by the inverse of the error variance. The final layer of the model produces two output parameters. The full list of inputs and outputs for the proposed model is given in Table 1.
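The scaling of the anomalies can be sketched as follows. This is a minimal illustration assuming that missing pixels are marked with `None`, that the per-pixel time mean is already available, and that the constant error variance is set to 1.0 by default; `scale_inputs` is a hypothetical name, and the remaining inputs listed in Table 1 are not covered here.

```python
def scale_inputs(values, time_mean, error_variance=1.0):
    """Return (anomaly / variance, 1 / variance) for each pixel.

    Missing pixels (None) get an inverse variance of zero, so both of their
    input channels are zero.
    """
    scaled_anomaly, inverse_variance = [], []
    for v, m in zip(values, time_mean):
        if v is None:
            scaled_anomaly.append(0.0)
            inverse_variance.append(0.0)
        else:
            scaled_anomaly.append((v - m) / error_variance)
            inverse_variance.append(1.0 / error_variance)
    return scaled_anomaly, inverse_variance
```

Encoding a gap as zero inverse variance lets the network distinguish "no information" from "anomaly equal to zero", because the second channel differs between the two cases.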

The output of the proposed network is assumed to follow a Gaussian distribution. Based on this assumption, the loss function is derived from the Kullback-Leibler (KL) divergence between two univariate Gaussians (the satellite-derived and the reconstructed values). The KL-based loss function (Loss) is defined as follows:

\mathrm{Loss} = \frac{1}{N} \sum_{i = 1}^{N} [\log (\frac{{σ^{'}}_{i}}{σ_{i}}) + \frac{σ_{i}^{2} + {(y_{i} - {y^{'}}_{i})}^{2}}{{σ^{'}}_{i}^{2}}]

(8)

where σ_i and σ’_i are the standard deviations of the true (satellite-derived) and predicted (reconstructed) values of pixel i, respectively, and N is the number of valid pixels.
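A direct transcription of Equation (8) is given below as a sketch. It assumes equal-length lists restricted to valid pixels and strictly positive standard deviations; `kl_loss` is a hypothetical name, and the code follows the equation literally as written here rather than any particular training framework.

```python
import math


def kl_loss(y, sigma, y_pred, sigma_pred):
    """Mean over valid pixels of log(s'/s) + (s^2 + (y - y')^2) / s'^2, Equation (8)."""
    terms = [
        math.log(sp / s) + (s ** 2 + (yt - yp) ** 2) / sp ** 2
        for yt, s, yp, sp in zip(y, sigma, y_pred, sigma_pred)
    ]
    return sum(terms) / len(terms)
```

The loss penalizes both a large squared error and a mismatched predicted variance: increasing σ’ shrinks the second term but is paid for through the logarithm, which discourages the network from simply inflating its uncertainty.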

3. Results

3.1. C_chla Dataset Description

For the domain of study, the complete dataset includes 6575 daily images from 1 January 2003 to 31 December 2020. The size of each image is 240 × 672 pixels, with a resolution of 1 km. To ensure the accuracy of the reconstruction, images with valid data coverage of less than 2% were excluded, following the threshold used by Alvera-Azcárate [30]. Thereafter, the total number of images in the initial time series was 2657. The first 90% of the images (2391) were set apart as the training dataset, and the rest (266) were used as the test dataset, which was not involved in the model training process. The input for model training is a 4-D tensor whose dimensions represent the mini-batch size, number of input feature maps, image height, and image width. The data were randomly shuffled over the time dimension before being fed into the model for training. The mini-batch size was set to a relatively small value of 8, due to the limits of GPU (graphics processing unit) memory. However, the training speed was effectively accelerated by applying GPU parallel computing. The complete time series was split into 332 mini-batches and the last mini-batch had only 265 images. During every training epoch, an additional cloud mask was added to a randomly chosen image to mitigate overfitting and aid generalization.
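The coverage filter and the chronological 90/10 split described above can be illustrated with a short sketch. The helper names are hypothetical, missing pixels are assumed to be marked with `None`, and land pixels are not treated separately here, which is a simplification.

```python
def valid_coverage(image):
    """Fraction of pixels that hold data; None marks a missing pixel (assumed)."""
    pixels = [v for row in image for v in row]
    return sum(v is not None for v in pixels) / len(pixels)


def keep_image(coverage, min_coverage=0.02):
    """Images with valid coverage below the 2% threshold are excluded."""
    return coverage >= min_coverage


def split_by_time(n_images, train_fraction=0.9):
    """Chronological split: the first part for training, the remainder for testing."""
    n_train = int(n_images * train_fraction)
    return n_train, n_images - n_train


n_train, n_test = split_by_time(2657)
```

Applied to the 2657 retained images, this split gives the 2391 training and 266 test images reported in the text.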

The complexity of biophysical processes has led to the higher mean C_chla in the coastal area (about 3.0 mg·m⁻³) than the adjacent oligotrophic continental shelf area (lower than 1.0 mg·m⁻³), as shown in Figure 4a. These findings are consistent with previous studies [2]. In the coastal area, the distribution has distinct seasonality caused by both the river runoff and monsoon winds, whereas, in the continental shelf area, the variability is relatively small due to the constant nutrient supply from the deep ocean. A histogram of the C_chla anomalies is shown in Figure 4b. The anomalies follow the theoretical distribution well when a Gaussian probability density function is fitted, with a mean value of 0.0 and a standard deviation of 0.23 (log₁₀ mg·m⁻³). It should be noted that, if the Gaussian assumption is violated, the obtained results can be misleading.

Due to the cloud cover, the average daily missing rate is as high as about 84.9%. The temporal pattern shows an irregular distribution, with some of the images presenting extreme instances of missing data (more than 98%, Figure 5a). Spatially, as shown in Figure 5b, there is a clear divergence in the distribution. The coverage is about 8% over the majority of the domain, with the highest values in the eastern PRE. The Lingdingyang Bay (LBPRE) and part of the western PRE areas display extremely high missing data coverage, probably due to the failure of the atmospheric correction related to the high backscattering of the turbid waters.

3.2. Performance of Reconstructed C_chla

To verify the spatial reconstruction capability of the proposed network, three examples were selected with different initial valid pixel coverages (20.9%, 56.1%, and 85.4%), representing images “severely,” “moderately,” and “slightly” contaminated by clouds or other factors, which were obtained on 6 July 2011, 30 September 2019, and 13 August 2020, respectively. The reconstructed images are shown in Figure 6a,d,g, the satellite-derived images with blanks where there were no data are shown in Figure 6b,e,h, and the scatter plots of the C_chla (satellite-derived versus reconstructed) are shown in Figure 6c,f,i. Overall, the reconstructed fields have a similar spatial distribution to the satellite-derived fields. The network can reconstruct the high C_chla exhibited in the coastal and estuarine areas, where the eutrophic and highly turbid waters were brought by the river runoff. Mesoscale features such as the plume fronts associated with the river runoff near the LBPRE are also revealed in the reconstruction.

The 81 samples taken during the field surveys were used to further quantify the capacity of the network to recover missing data. Only nine samples met the match-up criteria between the satellite-derived and in-situ measurements, due to clouds or a failure of atmospheric correction.

As shown in Figure 7, the satellite-derived C_chla generally agrees well with the in-situ measurements, and the fitted line was close to the ideal line, with RMSD and Bias values of 0.06 and −0.02 log₁₀ mg·m⁻³ (Table 2). The accuracy was slightly lower when the reconstructed values were compared with the in-situ measurements, with R², RMSD, and Bias values of 0.78, 0.12, and −0.08, respectively. However, the fitted line was closer to the ideal line, with a slope of 0.99 and an intercept of 0.08. The results show that the proposed network had a strong generalization ability for waters where the C_chla was lower than 10 mg·m⁻³. The performance in highly productive waters still needs to be assessed, owing to the limited range of the in-situ measurements.

The standard deviation relative to the whole dataset was computed for the satellite-derived C_chla and the reconstructions from the proposed network (Figure 8). The standard deviation derived from the network matched well with the standard deviation from the satellite-derived data, in particular in the continental shelf where the values were relatively lower than those in the coastal and estuarine area. However, in parts of the coastal and estuarine area where the C_chla showed a wide range of temporal variations, the standard deviation derived from the reconstruction was significantly smaller than that of the satellite-derived values. Given the fact that these areas were characterized by the lowest coverage of the domain, it is thus remarkable that the neural network retains more variability and features lower RMSE-based measures with lower variability [31].

The daily-averaged C_chla values in the coastal and estuarine areas are shown in Figure 9. Higher C_chla was observed during summer in the estuarine area and during winter in the coastal area. The temporal patterns of the satellite-derived and reconstructed fields were synchronous, though the values displayed some divergence. In the continental shelf, the mean satellite-derived values ranged from 0.17 to 0.53 mg·m⁻³, while the mean reconstructed values ranged from 0.15 to 0.51 mg·m⁻³. In the estuarine waters, the satellite-derived and reconstructed mean values ranged from 1.26 to 6.96 mg·m⁻³ and from 1.26 to 4.14 mg·m⁻³, respectively. After the reconstruction, the seasonal variability was reduced, with the minimum values remaining the same but the maximum values decreasing. This phenomenon was especially pronounced in the coastal and estuarine areas.

4. Discussion

4.1. Attention Gates Evaluation

To demonstrate the applicability of the attention mechanism to the reconstruction procedure, the C_chla image obtained on 9 July 2003 was selected for evaluation. Reconstructing this image involved several challenges. Firstly, a distinct plume belt with relatively high C_chla formed in the southeast of the LBPRE, but the data for the middle part of the belt are missing. Secondly, a large amount of data is missing in the high C_chla area of the LBPRE. Thirdly, there is also a serious lack of data in the frontal area between the plume waters and continental shelf waters. This requires the model not only to reconstruct a wide range of C_chla values, but also to maintain spatial continuity on a global scale.

The attention U-Net is compared with the standard U-Net. Reconstructions of both models are shown in Figure 10. The reconstruction results for the valid pixels of both models are reliable, which means the values are close to the satellite-derived ones. A discrepancy appears in the reconstruction of the missing data. Intuitively, the attention U-Net outperforms the standard U-Net. The three challenges mentioned above are well-addressed by the attention U-Net. The plume belt was recovered, which can be observed extending eastward from the southeast of the LBPRE to the northeast of the study domain. In contrast, the standard U-Net failed to accomplish this work. The missing data in the frontal area were well-reconstructed, and the abrupt features in the spatial pattern almost disappeared. In addition, the attention U-Net did a fine job in the high C_chla area of the LBPRE, where practically no noise pixels appeared in the standard U-Net image.

The receptive field grows as the layers deepen. In addition, due to the supervised learning mode, the deep layers learn the abstract (high-level) features, whereas the shallow layers learn the detailed (low-level) features. A complete reconstruction task consists of classification and regression. Classification requires features containing more advanced information, and regression (positioning) requires features containing more detailed information. It is difficult to meet both requirements in a single feature map. The skip connection is a good solution to this problem. In the standard U-Net architecture, the skip connection allows the decoder to directly use features extracted by the encoder. Thus, the network can learn to use these features when “rebuilding” the image in the decoder. Adding an attention module to such a connection allows the network to focus on specific parts of the input image. Figure 11 depicts the attention coefficients obtained from the above example. The rows depict the different layers of the network with decreasing resolution. The columns depict the different training epochs. This figure illustrates how the AGs learn to focus on particular parts of the input image. The AGs in the shallow layers (first and second layers) did not change much during the training process, which shows that the network still has room for improvement in learning detailed information. Some small-scale ocean phenomena might be lost after reconstruction. In the deep layers, the AGs initially had a uniform distribution and gradually localized towards different characteristics of the input. For example, in the third layer, the feature map finally focused on the frontal area where the C_chla changes rapidly in space. Additionally, at a coarser scale (the fourth layer), the AGs provided a rough outline of the missing pixel areas.

4.2. Application in Typhoon Footprints

Three typhoon transit events were selected to demonstrate the positive role of the reconstructed data. The details of the three typhoons are shown in Table 3. During the passage of a typhoon, data going missing is inevitable due to the thick clouds and intensive storms. The reconstruction method can overcome this shortcoming by providing relatively reliable and complete data over the entire typhoon period.

The mean C_chla was calculated over the 15 days before and the 15 days after each typhoon. The difference between the two means was used as the anomaly for further analysis of the changes in C_chla under the influence of a specific typhoon event. The satellite-derived C_chla, the reconstructed C_chla, and the anomaly for each typhoon are shown in Figure 12. Among the three selected typhoons, the percentage of valid data acquired by the satellite in the study domain differs. For instance, the satellite was able to obtain relatively complete data before and after typhoon Imbudo’s transit (with non-missing data coverage of 97.6% and 97.3%), which means the reconstruction procedure only needed to fill a few areas. Non-missing data were scarce before typhoon Haima’s transit (valid data coverage of 49.0%), whereas the satellite acquired almost complete data after the typhoon’s passage (valid data coverage of 98.4%). The situation was reversed for typhoon Higos, with valid data coverages of 96.1% before and 0% after the typhoon.
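The before/after comparison for a single pixel can be sketched as follows. The sign convention (mean after minus mean before, so that an increase is positive) and the exclusion of the passage day itself are assumptions of this sketch, and `typhoon_anomaly` is a hypothetical name; with the reconstructed product no `None` gaps would remain, but they are handled so that the same function also works on the gappy satellite series.

```python
def mean_ignoring_missing(series):
    """Mean of the non-missing entries; None when every entry is missing."""
    valid = [v for v in series if v is not None]
    return sum(valid) / len(valid) if valid else None


def typhoon_anomaly(daily_values, passage_index, window=15):
    """Mean over `window` days after the passage minus mean over `window` days before."""
    before = daily_values[max(0, passage_index - window):passage_index]
    after = daily_values[passage_index + 1:passage_index + 1 + window]
    mean_before = mean_ignoring_missing(before)
    mean_after = mean_ignoring_missing(after)
    if mean_before is None or mean_after is None:
        return None
    return mean_after - mean_before
```

When one of the two windows contains no valid data at all, as for the satellite-derived fields after typhoon Higos, the function returns `None`, which is precisely the case the reconstruction is meant to resolve.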

The surface C_chla response varies greatly both in spatial distribution and among individual events. However, two general and obvious phenomena can still be noticed. One is the increase in the C_chla near the typhoon’s track, and the other is the decrease in the C_chla in the estuary (Table 4). The surface C_chla were generally low before the typhoons along the track (except the estuary area), with mean C_chla ranging from 0.40 to 1.37 mg·m⁻³. The surface C_chla increased significantly near the transit, with mean values ranging from 0.13 to 0.51 mg·m⁻³ after the typhoons, an increase of 1.09 to 1.94 times. C_chla formed a negative growth area in the LBPRE. During typhoon Imbudo in particular, the average decrease reached about 1.23 mg·m⁻³. The decrease was mainly due to increased river runoff carrying a large amount of sediment into the estuary. The shallow euphotic layer caused by the highly turbid waters could inhibit the growth of phytoplankton.

To summarize the C_chla response to typhoons in the study domain, a composite method was applied to the reconstructed dataset before and after typhoons [32,33]. Tropical cyclones are commonly categorized on a scale of 1–5 for wind speeds exceeding 33 m s⁻¹ following the Saffir-Simpson Hurricane Wind Scale (SSHWS) [34]. A typhoon was counted if its track occurred at least once within the spatial range of the study domain with an instantaneous category ranging from 2 to 5 (excluding tropical depressions with weak intensity). During the 18 years (from 2003 to 2020), a total of 14 typhoons and 28 typhoon locations were identified (Figure 13a). A common radius of 100 km from all typhoon locations was applied to capture the majority of the C_chla response, which is defined as the C_chla anomalies (δChla(p,t), mg·m⁻³) obtained by removing the climatological average for the corresponding pixel p and time t. The overall responses (ΔChla, mg·m⁻³) can be further evaluated as the difference between the averaged anomalies of C_chla for typhoon location n at time t₁, which ranges from 1 to 15 days before the typhoon (δChla(n, p,t₁), mg·m⁻³), and the averaged anomalies of C_chla for typhoon location n at time t₂, which ranges from 1 to 15 days after the typhoon (δChla(n, p,t₂), mg·m⁻³).
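Selecting the pixels that fall within the 100 km radius of a typhoon location can be sketched with a great-circle distance. The use of the haversine formula and a mean Earth radius of 6371 km are assumptions of this sketch, since the distance computation is not specified in the text, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pixels_within_radius(pixels, centre, radius_km=100.0):
    """Keep the (lat, lon) pixels lying within radius_km of the typhoon centre."""
    lat0, lon0 = centre
    return [p for p in pixels if haversine_km(p[0], p[1], lat0, lon0) <= radius_km]
```

For each of the typhoon locations, the before/after difference of the anomalies would then be averaged over the pixels returned by `pixels_within_radius` to build the composite.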

The spatial pattern of typhoon-induced C_chla changes is clearly shown in Figure 13b. The C_chla increases occurred over almost the entire area within the 100 km radius, with an average of 0.06 ± 0.06 mg·m⁻³. The response is stronger to the right of the track than to the left, mainly due to the more intense wind-current resonance on the right side of the typhoon [35]. The time series of daily averaged C_chla anomalies showed that the averaged C_chla anomalies were −0.05 mg·m⁻³ and 0.12 mg·m⁻³ before and after the typhoon, respectively. C_chla started to increase two days after the arrival of the typhoon, and reached its peak six days later, when the maximum was approximately 0.23 ± 0.06 mg·m⁻³. The C_chla decreased afterward, but barely recovered to its initial value within 15 days.

5. Conclusions

In this study, a deep learning network was employed to reconstruct the missing C_chla data in satellite images. The reconstruction results are close to the concurrent in-situ measured values. Compared with the standard U-Net, the proposed attention U-Net eliminates the abrupt patches at the boundary between missing-value and valid pixels, and is more resilient to mesoscale and submesoscale ocean phenomena. The gain in reconstruction accuracy is attributed to the attention mechanism. Through the training process, the attention U-Net improves its ability to learn about the missing-data areas and the frontal areas where changes occur rapidly.

The proposed network was successfully applied to reconstruct the daily C_chla products covering the PRE and the adjacent continental shelf area. Differences were found in the time series of mean C_chla between the reconstructed and the initial data, particularly in the estuary, where the missing-data rate was high. The reconstructed data may therefore provide a new data source for estuarine ecological research.

The lack of remote sensing data has long hampered the study of the ecological response of the upper ocean to typhoons. By using the reconstructed C_chla products, we analyzed the influence of individual typhoons on phytoplankton as they passed through the study domain. It was found that the surface C_chla increases near the typhoon track and decreases in the estuary. The composite analysis illustrates that the influence of a typhoon is greater on the right side of its track than on the left side. The increase in surface C_chla reaches its peak on the fifth day after the passage of the typhoon, and then the influence of the typhoon gradually decreases.

To capture more information from a large receptive field, a set of convolution and max-pooling layers is necessary. However, convolution and pooling can also cause the image to lose some of its details, even when skip connections are used in the U-Net structure. When applied to ocean color remote sensing, the reconstruction scheme still faces challenges in extracting small-scale ocean phenomena. The Swin Transformer, a recent deep learning architecture that replaces the traditional convolution operation with self-attention, may become a feasible solution for small-scale reconstruction.

Author Contributions

Writing—original draft preparation, H.Y.; Supervision and Conceptualization, S.T.; Writing—review & Editing, C.Y. and C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA19060502), the special project for high-resolution earth observation (No. 30-H30C01-9004-19/21), the Science and Technology Program of Guangzhou, China (No. 202201010101), the Chinese Academy of Sciences (No. 133244KYSB20180029), the Key R&D Project in Hainan Province (No. ZDYF2020174), and the State Key Laboratory of Tropical Oceanography Independent Research Fund (No. LTOZZ2103).

Data Availability Statement

No new data were created.

Acknowledgments

The authors would like to thank the NASA Goddard Space Flight Center for providing MODIS data, the NASA OBPG for providing the SeaDAS software package, and the GHER group at the University of Liège for providing DINCAE. The colleagues in the Ocean Color group of the South China Sea Institute of Oceanology, Chinese Academy of Sciences, are greatly appreciated for their efforts in collecting and processing the samples. The numerical analysis was supported by the High Performance Computing Division in the South China Sea Institute of Oceanology.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Vantrepotte, V.; Mélin, F. Inter-Annual Variations in the SeaWiFS Global Chlorophyll a Concentration (1997–2007). Deep Sea Res. Part I Oceanogr. Res. Pap. 2011, 58, 429–441.
2. Ye, H.; Yang, C.; Tang, S.; Chen, C. The phytoplankton variability in the Pearl River estuary based on VIIRS imagery. Cont. Shelf Res. 2020, 207, 104228.
3. Wang, M.; Shi, W. Cloud Masking for Ocean Color Data Processing in the Coastal Regions. IEEE Trans. Geosci. Remote Sens. 2006, 44, 3196.
4. Liu, X.; Wang, M. Gap Filling of Missing Data for VIIRS Global Ocean Color Products Using the DINEOF Method. IEEE Trans. Geosci. Remote Sens. 2018, 56, 4464–4476.
5. Pan, G.; Chai, F.; Tang, D.L.; Wang, D.X. Marine phytoplankton biomass responses to typhoon events in the South China Sea based on physical-biogeochemical model. Ecol. Model. 2017, 356, 38–47.
6. Lin, S.; Zhang, W.-Z.; Shang, S.-P.; Hong, H.-S. Ocean response to typhoons in the western North Pacific: Composite results from Argo data. Deep Sea Res. Part I 2017, 123, 62–74.
7. Liu, F.; Tang, S. Influence of the interaction between typhoons and oceanic mesoscale eddies on phytoplankton blooms. J. Geophys. Res. Oceans 2018, 123, 2785–2794.
8. Bennett, A. Inverse Modeling of the Ocean and the Atmosphere; Cambridge University Press: New York, NY, USA, 2005.
9. Meng, Q.; Borders, B.; Madden, M. High-resolution Satellite Image Fusion Using Regression Kriging. Int. J. Remote Sens. 2010, 31, 1857–1876.
10. Beckers, J.M.; Rixen, M. EOF Calculations and Data Filling from Incomplete Oceanographic Datasets. J. Atmos. Ocean. Technol. 2003, 20, 1839–1856.
11. Lin, T.; He, Q.; Zhan, W.; Zhan, H. Persistent data gap in ocean color observations over the East China Sea in winter: Causes and reconstructions. Remote Sens. Lett. 2020, 11, 667–676.
12. Barth, A.; Alvera-Azcárate, A.; Licer, M.; Beckers, J.-M. DINCAE 1.0: A convolutional neural network with error estimates to reconstruct sea surface temperature satellite observations. Geosci. Model Dev. Discuss. 2019, 13, 1609–1622.
13. Jouini, M.; Lévy, M.; Crépon, M.; Thiria, S. Reconstruction of satellite chlorophyll images under heavy cloud coverage using a neural classification method. Remote Sens. Environ. 2013, 131, 232–246.
14. Patil, K.; Deo, M.C. Prediction of daily sea surface temperature using efficient neural networks. Ocean Dyn. 2017, 67, 357–368.
15. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention 2015; IEEE MICCAI: Munich, Germany, 2015; pp. 234–241.
16. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999.
17. Ye, H.B.; Chen, C.Q.; Yang, C.Y. Atmospheric correction of Landsat-8/OLI imagery in turbid estuarine waters: A case study for the Pearl River estuary. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 252–261.
18. Son, Y.; Kim, H. Empirical ocean color algorithms and bio-optical properties of the western coastal waters of Svalbard Arctic. ISPRS J. Photogramm. Remote Sens. 2018, 139, 272–283.
19. Zheng, G.; DiGiacomo, P.M. Uncertainties and applications of satellite-derived coastal water quality products. Prog. Oceanogr. 2017, 159, 45–72.
20. Ye, H.; Tang, S.; Yang, C. Deep learning for Chlorophyll-a concentration retrieval: A case study for the Pearl River estuary. Remote Sens. 2021, 13, 3717.
21. Welschmeyer, N. Fluorometric analysis of chlorophyll a in the presence of chlorophyll b and pheopigments. Limnol. Oceanogr. 1994, 38, 1985–1992.
22. Ying, M.; Zhang, W.; Yu, H.; Lu, X.; Feng, J.; Fan, Y.; Zhu, Y.; Chen, D. An overview of the China Meteorological Administration tropical cyclone database. J. Atmos. Ocean. Technol. 2014, 31, 287–301.
23. Evers-King, H.; Martinez-Vicente, V.; Brewin, R.J.W.; Dall’Olmo, G.; Hickman, A.E.; Jackson, T.; Kostadinov, T.S.; Krasemann, H.; Loisel, H.; Röttgers, R.; et al. Validation and Intercomparison of ocean color algorithms for estimating particulate organic carbon in the oceans. Front. Mar. Sci. 2017, 4, 251.
24. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.-Y.; Wong, W.; Woo, W. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28, 802–810.
25. Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 16–21. Available online: http://robotics.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf (accessed on 18 November 2022).
26. Li, S.Y.; Dong, M.; Du, G.M.; Mu, X.M. Attention Dense-U-Net for Automatic Breast Mass Segmentation in Digital Mammogram. IEEE Access 2019, 7, 59037–59046.
27. Zhang, W.W.; Li, J.; Hua, Z. Attention-Based Tri-UNet for Remote Sensing Image Pan-Sharpening. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3719–3732.
28. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473, preprint.
29. Luong, M.T.; Pham, H.; Manning, C.D. Effective approaches to attention-based neural machine translation. arXiv 2015, arXiv:1508.04025, preprint.
30. Alvera-Azcárate, A.; Barth, A.; Sirjacobs, D.; Beckers, J.-M. Enhancing temporal correlations in EOF expansions for the reconstruction of missing data using DINEOF. Ocean Sci. 2009, 5, 475–485.
31. Ebert, E.; Wilson, L.; Weigel, A.; Mittermaier, M.; Nurmi, P.; Gill, P.; Göber, M.; Joslyn, S.; Brown, B.; Fowler, T.; et al. Progress and challenges in forecast verification. Meteorol. Appl. 2013, 20, 130–139.
32. Wang, Y.T. Composite of Typhoon-Induced Sea Surface Temperature and Chlorophyll-a Responses in the South China Sea. J. Geophys. Res. Oceans 2020, 125, e2020JC016243.
33. Wang, Y.; Xiu, P. Typhoon footprints on ocean surface temperature and chlorophyll-a in the South China Sea. Sci. Total Environ. 2022, 840, 156686.
34. Simpson, R.H.; Saffir, H. The hurricane disaster-potential scale. Weatherwise 1974, 27, 169–186.
35. Babin, S.M.; Carton, J.; Dickey, T.D.; Wiggert, J.D. Satellite evidence of hurricane induced phytoplankton blooms in an oceanic desert. J. Geophys. Res. 2004, 109, C03043.

Figure 1. Study area and location of sampling stations.

Figure 2. A block diagram of the proposed Attention U-Net model.

Figure 3. Schematic of additive attention gate (AG).

Figure 4. (a) Averaged C_chla over 2003 to 2020 using valid pixels; (b) Estimated probability distributions of the C_chla anomalies of the whole dataset (satellite-derived, log₁₀-transformed).

Figure 5. (a) Temporal variability of the percentage of non-missing data coverage, with a one-month low-pass filtered series (red line); (b) mean percentage of non-missing data coverage.

Figure 6. Daily reconstruction of the images acquired on 6 July 2011, 30 September 2019, and 13 August 2020. (a,d,g) Satellite-derived C_chla; (b,e,h) network-reconstructed C_chla; (c,f,i) scatter plots between C_chla-sat and C_chla-rec.

Figure 7. Comparison of the in-situ C_chla and the reconstructed C_chla. The dashed lines represent the fitted lines for the corresponding scatter plots.

Figure 8. Standard deviation of chlorophyll-a concentration computed from the complete dataset: (a) satellite-derived; (b) network-reconstructed.

Figure 9. Time series of the daily mean of satellite-derived and reconstructed C_chla in the continental shelf (a) and estuary (b).

Figure 10. Comparison of Attention U-Net and U-Net reconstruction in the study domain; the C_chla image was obtained on 9 July 2003. (a) Satellite-derived C_chla; (b) Attention U-Net-reconstructed C_chla; (c) U-Net-reconstructed C_chla.

Figure 11. Example of feature maps given an input sample. The rows (top to bottom) show the AG maps of the four attention layers; the columns (left to right) show the AG maps at different training epochs (6, 8, 10, 100).

Figure 12. Change of the C_chla before and after the three typhoons: (a) Imbudo; (b) Higos; (c) Haima.

Figure 13. (a) Maps of the typhoon transits in the PRE; (b) composite of the typhoon-induced changes in the surface C_chla (ΔChla, mg·m⁻³); (c) daily spatial averages of C_chla anomalies (δChla, mg·m⁻³) before and after the arrival of the typhoon. The red shaded patch shows the standard errors within each day, and the black dotted lines are the averages of the anomalies before and after typhoons.

Table 1. Catalog of neural network inputs and outputs.

	Index	Parameter Name
Input	1	C_chla anomalies scaled by the inverse of the error variance (zero if the data is missing)
	2	Inverse of the error variance (zero if the data is missing)
	3–4	Scaled C_chla anomalies and inverse of the error variance of the previous day
	5–6	Scaled C_chla anomalies and inverse of the error variance of the next day
	7	Longitude (scaled linearly between −1 and 1)
	8	Latitude (scaled linearly between −1 and 1)
	9	Cosine of the day of the year divided by 365.25
	10	Sine of the day of the year divided by 365.25
Output	1	C_chla scaled by the inverse of the expected error variance
	2	Logarithm of the inverse of the expected error variance
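The ten input channels of Table 1 could be assembled roughly as follows. This is a sketch under assumptions rather than the authors' preprocessing: the function names, the `(3, ny, nx)` layout for the previous, current, and next day, and the use of NaN to flag missing pixels are hypothetical, and the seasonal phase is assumed to be 2π times the day of the year divided by 365.25.

```python
import numpy as np


def scale_pm1(x):
    """Scale a 1-D coordinate linearly to the interval [-1, 1]."""
    x = np.asarray(x, dtype=float)
    return 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0


def build_inputs(anom, err_var, lon, lat, day_of_year):
    """Stack the 10 input channels listed in Table 1.

    anom, err_var: arrays of shape (3, ny, nx) holding the previous, current
    and next day; NaN in `anom` marks a missing pixel (assumed layout).
    lon, lat: 1-D coordinate vectors of length nx and ny.
    """
    missing = np.isnan(anom)
    safe_var = np.where(missing, 1.0, err_var)         # avoid dividing by NaN
    inv_var = np.where(missing, 0.0, 1.0 / safe_var)   # channels 2, 4, 6
    scaled = np.where(missing, 0.0, anom) * inv_var    # channels 1, 3, 5
    lon2d, lat2d = np.meshgrid(scale_pm1(lon), scale_pm1(lat))
    # Assumption: the seasonal phase is 2*pi*day/365.25.
    phase = 2.0 * np.pi * day_of_year / 365.25
    shape = lon2d.shape
    return np.stack([
        scaled[1], inv_var[1],   # current day
        scaled[0], inv_var[0],   # previous day
        scaled[2], inv_var[2],   # next day
        lon2d, lat2d,            # scaled longitude and latitude
        np.full(shape, np.cos(phase)),
        np.full(shape, np.sin(phase)),
    ])
```

Setting both the scaled anomaly and the inverse error variance to zero at missing pixels lets the network distinguish "no observation" from "observation equal to the climatology".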

Table 2. Relationship between the in-situ C_chla and the reconstructed or satellite-derived C_chla. The RMSD and Bias are in log₁₀ mg·m⁻³.

	N	RMSD	Bias	R²	Slope	Intercept
Reconstructed	81	0.12	−0.08	0.78	0.99	0.08
Satellite-derived	9	0.06	−0.02	0.83	0.79	0.13
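Statistics of the kind reported in Table 2 can be computed from matched pairs as sketched below. The exact definitions used by the authors are not given in this section, so the choices here are assumptions: all statistics are computed on log₁₀-transformed values, the bias is taken as estimate minus in situ, and the slope and intercept come from an ordinary least-squares fit. The function name `log_metrics` is illustrative.

```python
import numpy as np


def log_metrics(in_situ, estimate):
    """Matchup statistics in log10 space (assumed definitions)."""
    x = np.log10(np.asarray(in_situ, dtype=float))
    y = np.log10(np.asarray(estimate, dtype=float))
    diff = y - x                                 # estimate minus in situ
    slope, intercept = np.polyfit(x, y, 1)       # ordinary least squares
    return {
        "n": x.size,
        "rmsd": float(np.sqrt(np.mean(diff ** 2))),
        "bias": float(np.mean(diff)),
        "r2": float(np.corrcoef(x, y)[0, 1] ** 2),
        "slope": float(slope),
        "intercept": float(intercept),
    }
```

Working in log space is a common choice for chlorophyll-a, whose values span several orders of magnitude between shelf and estuarine waters.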

Table 3. The basic information of three selected typhoons.

	Passing Time	Maximum Wind Speed (m/s)	Central Pressure Level (hPa)	Radius of 30 kn Wind Speed (km)	Translation Distance (km)	Average Translation Speed (km/h)
Imbudo	24–25 July 2003	50	935	280	4232	22.8
Higos	18–19 August 2020	28	992	40	889	21.2
Haima	21 October 2016	59	900	190	3846	23.7

Table 4. The area-average of C_chla and C_chla anomalies before and after the selected typhoons.

	Along-Track			Estuary
	Pre-Typhoon	Post-Typhoon	Anomaly	Pre-Typhoon	Post-Typhoon	Anomaly
Imbudo	0.54	1.24	0.51	5.45	4.35	−1.23
Higos	0.40	0.63	0.15	5.28	4.97	−0.01
Haima	1.37	1.59	0.13	3.56	3.63	0.00

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ye, H.; Tang, S.; Yang, C.; Chen, C. Reconstruction of Daily MODIS/Aqua Chlorophyll-a Concentration in Turbid Estuarine Waters Based on Attention U-NET. Remote Sens. 2023, 15, 546. https://doi.org/10.3390/rs15030546
