Improvement of the Semi-Analytical Algorithm Integrating Ultraviolet Band and Deep Learning for Inverting the Absorption Coefficient of Chromophoric Dissolved Organic Matter in the Ocean

Wang, Yongchao; Xin, Quanbo; Wei, Xiaodao; Xu, Luoning; Bi, Jinqiang; Bao, Kexin; Song, Qingjun

doi:10.3390/rs18020207


by Yongchao Wang ^1,2, Quanbo Xin ^1,3, Xiaodao Wei ^4,5, Luoning Xu ⁶, Jinqiang Bi ⁷, Kexin Bao ¹ and Qingjun Song ^2,8,*

¹ National Engineering Research Center of Port Hydraulic Construction Technology, Tianjin Research Institute for Water Transport Engineering, Tianjin 300456, China

² Key Laboratory of Space Ocean Remote Sensing and Application, Ministry of Natural Resources of the People’s Republic of China, Beijing 100081, China

³ Nanjing Hydraulic Research Institute, Nanjing 210029, China

⁴ China Three Gorges Corporation, Wuhan 430010, China

⁵ Shanghai Investigation, Design & Research Institute Co., Ltd., Shanghai 200335, China

⁶ School of Marine Science and Technology, Tianjin University, Tianjin 300072, China

⁷ School of Environmental Science and Safety Engineering, Tianjin University of Technology, Tianjin 300384, China

⁸ National Satellite Ocean Application Service, Ministry of Natural Resources of the People’s Republic of China, Beijing 100081, China

^* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(2), 207; https://doi.org/10.3390/rs18020207

Submission received: 27 November 2025 / Revised: 5 January 2026 / Accepted: 6 January 2026 / Published: 8 January 2026

(This article belongs to the Special Issue Artificial Intelligence in Hyperspectral Remote Sensing Data Analysis)


Highlights

What are the main findings?

The DQAAG algorithm significantly improves the retrieval accuracy of a_g(443) in both coastal and oceanic waters by integrating UV bands with a deep learning model.
Compared with established models (S2011, A2018, and QAA-CDOM), DQAAG achieves superior performance, demonstrating high accuracy across both simulated (IOCCG) and in situ (NOMAD) datasets.

What are the implications of the main findings?

The integration of UV bands provides a more effective approach for future ocean color satellite missions to retrieve CDOM accurately.
Combining deep learning with semi-analytical algorithms offers a robust and adaptable method for processing hyperspectral ocean color data.

Abstract

Chromophoric Dissolved Organic Matter (CDOM) is an important water constituent that affects ocean color and the underwater ecological environment, and its accurate assessment is crucial for observing continuous changes in the marine ecosystem. However, remote sensing estimation of CDOM remains challenging in both coastal and oceanic waters because of its weak optical signal and the complex optical conditions. Therefore, the development of efficient, practical, and robust models for estimating the CDOM absorption coefficient in both coastal and oceanic waters remains an active research focus. This study presents a novel algorithm (denoted as DQAAG) that incorporates ultraviolet bands into the inversion model. The design leverages the distinct spectral absorption characteristics of phytoplankton versus detrital particles in the ultraviolet (UV) region, enabling improved discrimination of water color parameters. Furthermore, the algorithm replaces empirical formulas commonly used in semi-analytical approaches with an artificial intelligence model (deep learning) to achieve enhanced inversion accuracy. Four algorithms, Shanmugam (2011) (S2011), Aurin et al. (2018) (A2018), Zhu et al. (2011) (QAA-CDOM), and DQAAG, were evaluated using IOCCG hyperspectral simulated data and the NOMAD dataset. The results indicate that the a_g(443) derived from DQAAG exhibits good agreement with the validation data, with root mean square deviation (RMSD) < 0.3 m⁻¹, mean absolute relative difference (MARD) < 0.30, mean bias (bias) < 0.028 m⁻¹, and coefficient of determination (R²) > 0.78. The DQAAG algorithm was also applied to SeaWiFS remote sensing data and validated through match-up analysis with the NOMAD dataset, giving RMSD = 0.14 m⁻¹, MARD = 0.39, and R² = 0.62. A sensitivity analysis of the algorithm reveals that R_rs(670) and R_rs(380) are the most influential input bands. These results demonstrate that UV bands play a crucial role in enhancing the retrieval accuracy of ocean color parameters. In addition, DQAAG, which integrates semi-analytical algorithms with artificial intelligence, presents an encouraging approach for processing ocean color imagery to retrieve a_g(443).

Keywords:

CDOM; deep learning; ultraviolet; QAA; ocean color remote sensing

1. Introduction

Dissolved organic matter (DOM) represents a major reservoir of organic carbon in aquatic systems and serves as the primary carrier of organic carbon in the Earth’s hydrosphere, playing a significant role in the global carbon cycle [1,2,3,4,5]. Chromophoric Dissolved Organic Matter (CDOM), the photosensitive component of DOM, is not only conservative in nature but also capable of absorbing ultraviolet and visible light [6]. It is thus employed as an effective tracer for evaluating both the concentration and spatial distribution of DOM in aquatic environments [7,8,9]. The absorption of ultraviolet and visible radiation by CDOM drives marine photochemistry and modulates the penetration of UV radiation into the water column [10,11]. Therefore, knowledge of the global distribution and dynamics of CDOM improves our understanding of biogeochemical processes and thereby enhances the accuracy of global marine ecosystem and climate modeling.

Quantifying CDOM through remote sensing serves as a valuable tool for investigating changes in marine ecosystems and studying the global-scale carbon cycle [2,12,13,14]. The absorption coefficient of CDOM, denoted as a_g(λ), is used to represent CDOM concentration in ocean color remote sensing. Numerous algorithms have been developed to estimate a_g(λ), primarily including empirical algorithms [15,16,17,18] and semi-analytical algorithms [19,20,21,22,23,24,25]. Empirical algorithms are established based on statistical relationships between water constituents and remote sensing reflectance (R_rs), exhibiting regional applicability [18,19].

The semi-analytical algorithms have unique theoretical foundations and mathematical solutions. Taking the widely used QAA as an example, this method employs the absorption ratio of two blue bands (e.g., 410 and 440 nm) to partition the total absorption (a) spectrum into contributions by phytoplankton (a_ph) and combined detrital and dissolved organic matter (a_dg) [25]. However, Wei and Lee (2015) [26] pointed out that in the UV region, a_dg is significantly higher than a_ph. By introducing UV wavelengths into the QAA (resulting in QAA-UV), they enhanced the retrieval accuracy of both a_dg and a_ph [27]. Recent and planned ocean color hyperspectral satellite missions increasingly include spectral bands within the UV range (for instance, OLCI, SGLI, HY1C, and PACE) [28,29,30]. These advancements establish a critical data foundation for improving the accurate retrieval of a_g [11,31].

It is noteworthy that neither QAA nor QAA-UV decomposes a_dg into a_d and a_g [25,26], which limits their ability to characterize the spatial and temporal variability of CDOM. In addition, Wang et al. (2021) [11] demonstrated that complex variations in water constituents often lead to nonlinear relationships among optical parameters, making it difficult for conventional methods to establish reliable functional approximations. In recent years, the rise of artificial intelligence models (deep learning or neural networks) has significantly advanced the retrieval of optical parameters [32,33,34,35,36,37,38]. Many scholars construct deep learning models that invert optical parameters directly. However, deep learning is inherently an empirical model, and relying on it alone limits physical interpretability. For example, Chen et al. (2014) [34] developed a neural network model for retrieving the absorption coefficient, but statistical formulas were still employed in the inversion of CDOM [39]. Therefore, integrating deep learning with semi-analytical algorithms to effectively separate a_g and enhance inversion accuracy represents an important and meaningful research direction.

This study aims to develop a new algorithm, deep learning-enhanced QAA with UV bands for CDOM retrieval (named DQAAG), that combines semi-analytical algorithms and deep learning models to improve the retrieval of a_g. The performance of the DQAAG algorithm was evaluated using both simulated data and the NOMAD in situ dataset. Using Sea-viewing Wide Field-of-view Sensor (SeaWiFS) data as an example, we demonstrate the performance of the DQAAG algorithm in estimating a_g(443) on a global scale and illustrate its impact on ocean color retrieval accuracy. The paper is organized as follows: Section 2 describes the data used to establish the a_g(λ) inversion algorithm; Section 3 introduces the DQAAG algorithm; Section 4 presents the results; Section 5 provides a comprehensive evaluation of the algorithm performance and demonstrates its application to global ocean data; and Section 6 summarizes the key findings and proposes future prospects.

2. Data and Materials

2.1. Training Data

The availability of a large and diverse dataset is critical for the development of any deep learning-based algorithm. We constructed an inclusive hyperspectral synthetic dataset containing IOP and R_rs to train DQAAG. To accommodate diverse water types, the generation of a synthetic dataset encompassing a wide range of R_rs(λ) necessitates that both the a(λ) and the backscattering coefficient b_b(λ) cover broad yet plausible ranges. Both a(λ) and b_b(λ) consist of contributions from pure water and water constituents, including phytoplankton pigments, CDOM, and detrital minerals. The generation of this dataset generally follows the methodology described in IOCCG Report 5 [40,41]. For detailed formulations, please refer to https://ioccg.org/wp-content/uploads/2016/03/lee-data.pdf, accessed on 25 July 2025.

The hyperspectral synthetic dataset generation system is driven by a_ph(440), with randomized parameters designed to ensure broad coverage of diverse water types. The generation of these constrained random values follows the methodology established in IOCCG-OCAG [41]. We set the range of a_ph(440) to 0.001–1 m⁻¹ and generated 200,000 synthetic IOP–R_rs(λ) pairs, of which 80% were used for training the model and 20% for validating it. The wavelength ranges from 350 to 800 nm at 1 nm intervals. R_rs(555) ranges over 6.0 × 10⁻⁴–0.059 sr⁻¹, b_bp(443) over 4.6 × 10⁻⁵–0.71 m⁻¹, a(443) over 6.8 × 10⁻³–10.08 m⁻¹, and a_g(443) over 3.9 × 10⁻⁴–8.05 m⁻¹.

Figure 1a shows examples of simulated R_rs(λ) spectra, demonstrating that the synthetic dataset covers both clear open ocean waters and optically complex coastal waters. Figure 1b illustrates the statistical distribution of the synthesized a_g(443), and the statistical indicators are shown in Table 1. Figure 1e,f illustrate the relationships between R_rs(443) and R_rs(410), as well as between a_g(443) and a_g(410), in the simulated data versus natural observational data. The range covered by the simulated dataset far exceeds that of the IOCCG and NOMAD datasets. This indicates that the IOCCG and NOMAD data are well encompassed within the synthetic dataset range, though some combinations of inherent optical properties (IOPs) in the simulated set may not exist or may be extremely rare in natural aquatic environments. In the simulated dataset, the coefficient of variation (CV) is 1.18 for R_rs(555) and 2.19 for a_g(443). These relatively high values indicate substantial dispersion among the constructed data points, which therefore cover a wide range of water types. As a result, the deep learning model developed on this dataset is expected to possess stronger generalization capability.
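The coefficient of variation quoted above is the ratio of the standard deviation to the mean. A minimal sketch of that calculation follows; the function name is hypothetical, and the use of the sample standard deviation (N − 1 in the denominator) is an assumption, since the text does not say which form was used.

```python
import math


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean.

    Assumption: the sample (N - 1) form of the standard deviation;
    the article does not state which form it uses.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance) / mean
```

A CV above 1, as reported for both R_rs(555) and a_g(443), means the spread of the values exceeds their mean, which is typical of quantities that span several orders of magnitude.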

2.2. Validation Data

2.2.1. Simulated Data

The International Ocean Color Coordinating Group (IOCCG) established a hyperspectral dataset containing 500 inherent optical property (IOP) spectra. The spectral range of the original IOCCG dataset is 400–800 nm (with a wavelength increment of 10 nm), and the solar zenith angle is 30° [40]. We extended the IOCCG simulated data into the ultraviolet (to 360 nm) and interpolated it to 1 nm intervals, following the data extension methodology of Wang et al. (2021) [11]. The data range of the IOCCG simulated data is shown in Table 1: R_rs(555) is 1.0 × 10⁻³–0.029 sr⁻¹, b_bp(443) is 6.4 × 10⁻⁴–0.13 m⁻¹, a(443) is 1.6 × 10⁻²–3.17 m⁻¹, and a_g(443) is 2.5 × 10⁻³–2.37 m⁻¹. The R_rs(λ) spectra of the IOCCG simulated dataset are shown in Figure 1c. The CV for a_g(443) in the IOCCG simulated dataset is 1.45 (Table 1), indicating that the validation data cover a wide range of waters and thus enable a better assessment of the model generalization capability.

2.2.2. NOMAD Dataset

The NOMAD dataset is a publicly available, globally distributed, high-quality in situ bio-optical dataset widely used for ocean color algorithm development and validation of satellite data products [42]. The NOMAD dataset (Version 2.a) was downloaded from the SeaBASS website: http://seabass.gsfc.nasa.gov/, accessed on 25 July 2025. In addition to in situ measured R_rs(λ), the dataset also includes matched IOPs, such as particle absorption (a_p), a_ph, a_g, a_d, and b_b. A total of 4559 spectra were obtained. After excluding incomplete and low-quality records, namely those with missing or invalid values (−999) in R_rs(410, 443, 490, 555, 670) or a_g(443) and those with a quality assurance score lower than 0.8, and matching R_rs(λ) with a_g, 287 spectra remained for further analysis. The distribution of the matched data is shown in Figure 2 (indicated by red circles). The NOMAD data range is shown in Table 1: R_rs(555) is 6.4 × 10⁻⁴–0.040 sr⁻¹ and a_g(443) is 5.4 × 10⁻⁴–1.12 m⁻¹. The NOMAD R_rs(λ) spectra are shown in Figure 1d. Although the CV for a_g(443) in the NOMAD dataset is lower than that of the simulated data, it remains effective for evaluating the model.

2.2.3. Remote Sensing Image Data

To assess the algorithm in a satellite ocean color application, we took SeaWiFS as an example and used match-ups between satellite and in situ data to evaluate its performance in retrieving a_g(443). The SeaWiFS Level-2 R_rs(λ) data were obtained from the NASA website (https://oceancolor.gsfc.nasa.gov, accessed on 25 July 2025).

For each field station, the median value of the 3 × 3 pixel window centered on the station is used to represent the satellite measurement [43]. The time window between the in situ and satellite data is set to ±5 h [42]. In addition, the quality of the spectral data is judged based on Quality Assurance (QA) scores [44] and Level-2 processing flags (l2_flags): only data with QA scores > 0.8 are retained, and SeaWiFS data flagged for atmospheric correction failure, land pixels, possible cloud or ice contamination, strong sun glint, or cloud clutter or shadow contamination are excluded [45]. In the end, we obtained 81 match-up points between SeaWiFS and in situ measurements (Figure 2, yellow squares).
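The two match-up criteria described above (a 3 × 3 pixel median and a ±5 h time window) can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the grid is assumed to be a plain list of rows, and the fill value of −999 for invalid pixels is an assumption borrowed from the NOMAD convention rather than a documented property of the satellite files. QA-score and l2_flags screening are not shown.

```python
from datetime import datetime, timedelta
from statistics import median


def window_median(grid, row, col, fill_value=-999.0):
    """Median of the valid pixels in the 3 x 3 box centred on (row, col).

    Pixels equal to fill_value (assumed marker for invalid data) and
    positions outside the grid are skipped. Returns None if no valid
    pixel remains.
    """
    valid = []
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                value = grid[r][c]
                if value != fill_value:
                    valid.append(value)
    return median(valid) if valid else None


def within_time_window(t_insitu, t_satellite, hours=5):
    """True if the overpass is within +/- `hours` of the in situ sampling time."""
    return abs(t_satellite - t_insitu) <= timedelta(hours=hours)
```

Using the median rather than the mean makes the satellite value less sensitive to a single anomalous pixel inside the window.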

2.3. Accuracy Assessment

The performance of the retrieval algorithm was evaluated using four statistical metrics: the Root Mean Square Difference (RMSD), the Mean Absolute Relative Difference (MARD), the mean bias (bias), and the coefficient of determination (R²). The formulas for the first three metrics are provided below:

RMSD = \sqrt{\frac{\sum_{i = 1}^{N} {(X_{est, i} - X_{mea, i})}^{2}}{N}},

(1)

MARD = \frac{1}{N} \sum_{i = 1}^{N} \frac{|X_{est, i} - X_{mea, i}|}{X_{mea, i}},

(2)

bias = \frac{1}{N} \sum_{i = 1}^{N} (X_{est, i} - X_{mea, i}),

(3)

where N represents the number of samples, and X_est,i and X_mea,i denote the estimated value from the inversion and the measured value from the reference data for the i-th sample, respectively. This study also computed the coefficient of determination (R²) between X_est,i and X_mea,i.
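Equations (1)–(3) translate directly into code. The sketch below uses only the standard library; the function names are hypothetical. The text does not give a formula for R², so the squared Pearson correlation used here is an assumption.

```python
import math


def rmsd(est, mea):
    """Root mean square difference, Eq. (1)."""
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(est, mea)) / len(mea))


def mard(est, mea):
    """Mean absolute relative difference, Eq. (2)."""
    return sum(abs(e - m) / m for e, m in zip(est, mea)) / len(mea)


def bias(est, mea):
    """Mean bias, Eq. (3)."""
    return sum(e - m for e, m in zip(est, mea)) / len(mea)


def r_squared(est, mea):
    """Squared Pearson correlation (assumed definition of R^2)."""
    n = len(mea)
    mean_e = sum(est) / n
    mean_m = sum(mea) / n
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(est, mea))
    var_e = sum((e - mean_e) ** 2 for e in est)
    var_m = sum((m - mean_m) ** 2 for m in mea)
    return cov ** 2 / (var_e * var_m)
```

Note that MARD divides by the measured value, so samples with very small measured a_g(443) can dominate it even when their absolute errors are small.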

3. Methods

3.1. S2011

S2011 introduced a novel approach for coastal and oceanic waters designed to accurately model the absorption spectrum of a_g(λ) [46]. This modeling method utilizes two spectral slopes, an exponential curve fit (S) and a hyperbolic curve fit (γ), for the inversion of a_g(λ). The specific formulations are as follows:

a_{g}(λ) = a_{g}(350) \exp(-S(λ - 350) - γ^{0})

(4)

where a_g(350) can be calculated using the following formula:

a_{g} (350) = 0.5567 {(\frac{R_{r s} (443)}{R_{r s} (555)})}^{- 2.0421}

(5)

The spectral slope S is given by:

S = 0.0058 {(\frac{R_{r s} (412)}{R_{r s} (350)})}^{- 0.9677}

(6)

Furthermore, R_rs(443) and R_rs(555) are utilized to estimate a_g(412).

a_{g} (412) = 0.1866 {(\frac{R_{r s} (443)}{R_{r s} (555)})}^{- 1.9668}

(7)

The parameter γ⁰ serves as a crucial link accounting for the substantial variability of CDOM across transitional coastal and oceanic waters, and is calculated as follows:

γ^{0} = \frac{a_{g} (350) - \frac{1}{γ}}{a_{g} (350) + \frac{1}{γ}}

(8)

The spectral slope γ of the hyperbolic model is given by:

γ = 2.9332 {(\frac{R_{r s} (412)}{R_{r s} (350)})}^{- 0.7506}

(9)
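As a reading aid, the sketch below transcribes Eqs. (4)–(9) literally as they are printed in this section; it is not taken from the original S2011 publication, and the function name is hypothetical. As printed, the two slope formulas depend on R_rs(412)/R_rs(350), so a_g(412) from Eq. (7) is returned only for reference.

```python
import math


def s2011_ag(wavelength, rrs350, rrs412, rrs443, rrs555):
    """Evaluate Eqs. (4)-(9) as printed above.

    Returns (a_g(wavelength), a_g(412)), both in m^-1.
    """
    band_ratio = rrs443 / rrs555
    uv_ratio = rrs412 / rrs350

    ag350 = 0.5567 * band_ratio ** -2.0421   # Eq. (5)
    slope = 0.0058 * uv_ratio ** -0.9677     # Eq. (6)
    ag412 = 0.1866 * band_ratio ** -1.9668   # Eq. (7), not used below
    gamma = 2.9332 * uv_ratio ** -0.7506     # Eq. (9)
    gamma0 = (ag350 - 1.0 / gamma) / (ag350 + 1.0 / gamma)  # Eq. (8)

    ag = ag350 * math.exp(-slope * (wavelength - 350.0) - gamma0)  # Eq. (4)
    return ag, ag412
```

Because γ⁰ does not depend on wavelength, the ratio of the retrieved a_g at two wavelengths is controlled by the exponential slope S alone.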

3.2. A2018

Aurin et al. (2018) [2] employed the Global Ocean Carbon Algorithm Database (GOCAD) to derive an empirical model for a_g(λ) through multiple linear regression (named A2018). This model establishes a functional relationship between the natural logarithm of R_rs(λ) at four distinct visible wavelengths and the natural logarithm of a_g(λ), as represented by the following equation:

\ln(a_{g}(443)) = β_{0} + β_{1} \ln(R_{rs}(λ_{1})) + β_{2} \ln(R_{rs}(λ_{2})) + β_{3} \ln(R_{rs}(λ_{3})) + β_{4} \ln(R_{rs}(λ_{4}))

(10)

where λ₁ = 443 nm, λ₂ = 490 nm, λ₃ = 510 nm, and λ₄ = 555 nm for SeaWiFS, and β₀ = −6.410, β₁ = −0.743, β₂ = −0.145, β₃ = −0.367, and β₄ = 0.547.
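Equation (10) with the SeaWiFS coefficients listed above can be evaluated as in the following sketch. The function and constant names are hypothetical, and the input is assumed to be a mapping from band centre (nm) to R_rs in sr⁻¹.

```python
import math

# SeaWiFS bands (nm) and regression coefficients quoted in the text.
A2018_SEAWIFS_BANDS = (443, 490, 510, 555)
A2018_SEAWIFS_BETAS = (-6.410, -0.743, -0.145, -0.367, 0.547)


def a2018_ag443(rrs):
    """Estimate a_g(443) in m^-1 from a dict {band_nm: Rrs} via Eq. (10)."""
    log_ag = A2018_SEAWIFS_BETAS[0]
    for beta, band in zip(A2018_SEAWIFS_BETAS[1:], A2018_SEAWIFS_BANDS):
        log_ag += beta * math.log(rrs[band])
    return math.exp(log_ag)
```

Since the model is linear in log space, it is equivalent to a product of power laws in R_rs, so a change in any single band rescales a_g(443) by a fixed factor.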

3.3. QAA-CDOM

QAA-CDOM is an enhancement of the QAA developed by Zhu et al. (2011) [23], specifically designed for the separation of a_g. The QAA, developed by Lee et al. (2002) [25], is designed to derive IOPs in optically deep waters. The inversion process is divided into two consecutive parts. In the first part, a reference wavelength λ₀ is selected, and semi-analytical models are applied to estimate b_b and a at various wavelengths. In the second part, a_ph and a_dg are calculated from the total absorption coefficient obtained in the first part. The QAA is currently at version 6 (https://ioccg.org/wp-content/uploads/2020/11/qaa_v6_202011.pdf, accessed on 25 July 2025). For detailed formulations, see Table 2.

Due to the similar spectral shapes of a_d and a_g, there is a significant challenge in distinguishing them [36]. Zhu et al. (2011) [23] estimated a_d(443) based on the b_bp(555) derived from QAA_v6, which can further separate a_g from a_dg. The specific formula is as follows:

a_{d} (443) = j_{1} {b_{b p} (555)}^{j_{2}}

(11)

a_{g}(443) = a_{dg}(443) - a_{d}(443)

(12)

where j₁ = 0.966 and j₂ = 1.038 (the parameters used in this step are derived from empirical fits to in situ data, as reported by Zhu et al. (2011) [23]).
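The separation step in Eqs. (11) and (12) can be sketched as below, taking a_dg(443) and b_bp(555) as already retrieved by the QAA. The function name is hypothetical; the default j₁ and j₂ are the values quoted above.

```python
def qaa_cdom_ag443(adg443, bbp555, j1=0.966, j2=1.038):
    """Split a_dg(443) into a_d(443) and a_g(443).

    a_d(443) follows the power law of Eq. (11); a_g(443) is the
    remainder, Eq. (12). Returns (a_g(443), a_d(443)) in m^-1.
    """
    ad443 = j1 * bbp555 ** j2
    ag443 = adg443 - ad443
    return ag443, ad443
```

Nothing in these two equations prevents a negative a_g(443) when the estimated a_d(443) exceeds a_dg(443), so a caller may want to check the sign of the result.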

3.4. DQAAG

DQAAG is an algorithm developed for retrieving a_g(443) by combining the QAA semi-analytical algorithm with three sets of deep learning models.

In this study, we adopt the QAA-UV strategy by incorporating a UV band into the inversion process to improve the separation of a_ph and a_dg. The 380 nm UV band was selected for two primary reasons: (1) R_rs(380) has been widely adopted in existing research [26], and (2) robust models are available for its retrieval [11]. For in situ measurements or satellite data lacking R_rs(380), the UVISR_dl model developed by Wang et al. (2021) [11] was employed for estimation. The method demonstrates strong reliability, with a MARD < 5% for R_rs(380) estimates.

In the first part of the QAA, the parameter a(λ₀) in Step 3 and the spectral exponent Y in Step 5 are derived through empirical formulations. Since these empirically retrieved parameters serve primarily to obtain b_bp(λ) (the pure-water term b_bw being a known wavelength-dependent constant), we replace this segment with a deep learning model that directly establishes an inversion relationship from R_rs(λ) to b_bp(λ). The model therefore takes R_rs(380), R_rs(410), R_rs(443), R_rs(490), R_rs(555), and R_rs(670) from the hyperspectral remote sensing data as input parameters, and outputs b_bp at the corresponding wavelengths.

In the second part of the QAA, two empirical formulas, ζ in Step 7 and ξ in Step 8, were originally introduced to further separate the water components. In our model, R_rs(380) is added to the inputs of a deep learning model with ζ and ξ as the output targets, thereby establishing a deep learning module that maps R_rs(380), R_rs(410), R_rs(443), R_rs(490), R_rs(555), and R_rs(670) to ζ and ξ.

Moreover, since the standard QAA framework does not directly retrieve a_g, we introduced an additional deep learning model that takes b_bp(380), b_bp(410), b_bp(443), b_bp(490), b_bp(555), and b_bp(670) as input and outputs a_d(443), thereby establishing a dedicated model for a_d inversion. A detailed flowchart of the entire process is presented in Figure 3. Figure 3 shows the algorithm flowchart on the left (Step 0–Step 9), and the deep learning model framework diagrams for Step 2, Step 4, and Step 6 on the right.
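Once ζ, ξ, and a_d(443) are available, the final algebra can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it assumes the standard QAA definitions ζ = a_ph(410)/a_ph(443) and ξ = a_dg(410)/a_dg(443), takes the pure-water absorption values as arguments instead of hard-coding them, and uses hypothetical function names.

```python
def adg443_from_ratios(a410, a443, aw410, aw443, zeta, xi):
    """Solve the two-band system for a_dg(443).

    With a(410) - a_w(410) = zeta * a_ph(443) + xi * a_dg(443) and
    a(443) - a_w(443) = a_ph(443) + a_dg(443), eliminating a_ph(443)
    gives the expression below (assumed standard QAA form).
    """
    return ((a410 - aw410) - zeta * (a443 - aw443)) / (xi - zeta)


def dqaag_ag443(a410, a443, aw410, aw443, zeta, xi, ad443):
    """a_g(443) as a_dg(443) minus the separately estimated a_d(443)."""
    return adg443_from_ratios(a410, a443, aw410, aw443, zeta, xi) - ad443
```

The denominator ξ − ζ shows why the two ratios must differ clearly for the separation to be stable, which is the motivation for bringing in UV information where a_dg and a_ph diverge more strongly.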

Like most neural networks, the deep learning models in DQAAG consist of an input layer, multiple hidden layers, and an output layer. In this study, the Keras framework [47], which is deeply integrated with TensorFlow, was selected for implementing DQAAG. Keras offers concise, readable code together with robust backend support and a rich ecosystem, which enables rapid model development and experimentation. The number of hidden layers and the number of neurons in each layer were determined by minimizing the loss function [48]. After extensive experimentation, three hidden layers were found to yield the best performance for DQAAG, with 256 neurons in the first layer, 128 in the second, and 16 in the third. The Rectified Linear Unit (ReLU) was employed as the activation function; it is favored for its computational efficiency and its effectiveness in mitigating the vanishing gradient problem [49]. The Adaptive Moment Estimation (Adam) algorithm was used as the optimizer, with a learning rate of 2 × 10⁻⁵ and a batch size of 64. Adam combines the advantages of the momentum method and adaptive learning rates, enabling it to handle sparse gradients efficiently and stably [50]. To prevent overfitting, a dropout rate of 0.1 was set. The loss function is the mean absolute error. Training of DQAAG is complete when the loss function converges and iteration stops.
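A network with the stated hyperparameters could be assembled in Keras roughly as follows. This is a sketch, not the authors' implementation: the function name is hypothetical, and the placement of the dropout layers (after the first two hidden layers) and the linear output layer are assumptions, since the text gives only the dropout rate.

```python
from tensorflow import keras


def build_dqaag_mlp(n_inputs=6, n_outputs=6, dropout_rate=0.1):
    """Three hidden layers (256, 128, 16) with ReLU, Adam, and MAE loss."""
    model = keras.Sequential([
        keras.Input(shape=(n_inputs,)),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dropout(dropout_rate),  # assumed placement
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dropout(dropout_rate),  # assumed placement
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(n_outputs),       # assumed linear output
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=2e-5),
        loss="mean_absolute_error",
    )
    return model
```

Training would then call `model.fit` with `batch_size=64`; the number of epochs and the stopping rule are not stated in the text and would need to be chosen by the user. The same builder could serve all three modules by changing `n_outputs`, for example 2 for the ζ and ξ model and 1 for the a_d(443) model.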

4. Results

4.1. Evaluation of b_bp(λ) and a(λ)

QAA_v6 recommends selecting a reference wavelength of 55x nm or 670 nm depending on the water type and using a power-law model to calculate b_bp(λ). This approach relies on two empirical formulas, one for a(λ₀) and one for the spectral exponent Y, which can lead to error propagation. In contrast, DQAAG employs a deep learning model to derive b_bp(λ) directly from R_rs(λ), eliminating the need for water classification and avoiding the cumulative errors associated with the two-step empirical approximations. The b_bp(λ) retrieval results of the DQAAG model are illustrated in Figure 4a–e, and the accuracy metrics are provided in Table 3. The results indicate that the b_bp retrieved by DQAAG is consistent with the simulated data, with RMSD < 0.0074 m⁻¹, MARD < 0.1, bias < 0.0012 m⁻¹, and R² > 0.96. It should be noted that no comparison with NOMAD data was performed because that dataset lacks in situ b_bp(λ) measurements. In addition, Aurin and Dierssen (2012) [51] pointed out that the specific values of g₁ and g₂ may vary with the water type, so using constants for different waters may not be appropriate [19,34]. The use of DQAAG for b_bp(λ) retrieval effectively circumvents this issue.

Figure 5a–e shows the comparison between a(λ) derived from DQAAG and the IOCCG simulated data at 410, 443, 490, 555, and 670 nm. The results demonstrate that the a(λ) values retrieved by DQAAG show good agreement with the simulated data. For waters with a(555) ranging from 0.06 to 0.99 m⁻¹, the data are evenly distributed on both sides of the 1:1 line. The performance metrics are RMSD < 0.23 m⁻¹, MARD < 0.083, and R² > 0.95. The a(670) retrievals are slightly less consistent (RMSD = 0.075 m⁻¹, MARD = 0.10, R² = 0.73). The accuracy metrics are listed in Table 4.

Against the NOMAD measured data, the accuracy of the model retrievals decreases, but the overall consistency remains good, as shown in Figure 6a–e. The retrieval results for a(λ) show RMSD < 0.31 m⁻¹, MARD < 0.23, and R² > 0.72. When a(410) < 0.04 m⁻¹, the inversion slightly underestimates the measured values (Figure 6a,b). DQAAG performs accurately in moderately to highly turbid waters, further demonstrating its capability for retrieving a(λ) and b_bp(λ) in global ocean applications.

4.2. Evaluation of a_g(443)

Four estimation algorithms for a_g(443) described in Section 3, including the empirical models S2011 and A2018, the semi-analytical model QAA-CDOM, and the DQAAG model combining deep learning with a semi-analytical algorithm, were evaluated using both simulated data from IOCCG and the publicly available NOMAD dataset. Figure 7, Figure 8, and Table 5 present the inversion results and performance metrics of these algorithms. In the data comparison, water types were categorized based on the R_rs(490)/R_rs(555) ratio. A ratio greater than 0.85 was defined as Case 1 non-turbid water, while a ratio less than or equal to 0.85 was classified as Case 2 turbid water [6]. This classification was used to investigate the applicability of the algorithms across different water types. The specific outcomes of each algorithm are detailed below.
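The water-type rule above amounts to a single threshold test on the band ratio, as in this short sketch (the function name and the returned labels are hypothetical):

```python
def classify_water(rrs490, rrs555, threshold=0.85):
    """Return "Case 1" if Rrs(490)/Rrs(555) exceeds the threshold, else "Case 2"."""
    return "Case 1" if rrs490 / rrs555 > threshold else "Case 2"
```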

4.2.1. S2011

The S2011 algorithm exhibited certain deviations on the IOCCG simulated dataset (Figure 7a,c), with RMSD = 0.29 m⁻¹ and MARD = 0.53 (Table 5). Clear underestimation is observed in Case 1 waters, whereas the consistency is relatively better in Case 2 waters. However, its performance on the NOMAD dataset proved satisfactory, with RMSD = 0.15 m⁻¹ and MARD = 0.44.

The accurate inversion results of S2011 in NOMAD come from two characteristics. First, the algorithm leverages the high responsiveness and variability of CDOM around 350 nm to effectively discriminate CDOM signatures [46]. Second, its exponential model incorporates two spectral slope parameters that effectively characterize a_g(λ) across both UV and visible light [6,15,46]. The observed discrepancies in the IOCCG dataset may be attributed to its construction methodology. The IOCCG data were generated using Hydrolight simulations with randomized parameters to ensure broad coverage, which potentially includes optical scenarios rarely encountered in natural environments.

4.2.2. A2018

The A2018 model demonstrated significant deviations in both the IOCCG and NOMAD datasets, irrespective of water type (Case 1 or Case 2 waters), with RMSD > 0.17 m⁻¹, MARD > 0.8, and R² < 0.45 (Figure 7b,d and Table 5). The inversion results exhibited systematic biases, with overestimation at low a_g(443) values and underestimation at high values, causing the retrieved a_g(443) to cluster within the narrow range of 0.01–0.5 m⁻¹. Aurin et al. (2018) [2] also acknowledged the model’s limited accuracy in retrieving a_g(443) when using SeaWiFS bands as input [6].

4.2.3. QAA-CDOM

The inversion accuracy of QAA-CDOM in the simulated dataset is RMSD = 0.17 m⁻¹, MARD = 0.24, R² = 0.89, as shown in Figure 8a, while for the NOMAD dataset, it yielded an RMSD = 0.17 m⁻¹, MARD = 0.38, and R² = 0.59 (Figure 8c).

The performance of QAA-CDOM is influenced by its calibration using multiple datasets, including IOCCG, NOMAD, Hudson, Mississippi, and Neponset [52], which contributes to its relatively good accuracy with the IOCCG simulated and NOMAD data. In addition, QAA-CDOM fitted the a_d(443) relationship separately to the IOCCG simulated data and to the NOMAD data, which yields different j₁ and j₂ values. It should be noted that the choice of these parameters significantly impacts the final a_g(443) retrieval across different water types. For instance, using the parameters suggested by Zhu et al. [23] (j₁ = 10.51, j₂ = 1.56) leads to a MARD > 0.9 when compared to NOMAD data.

4.2.4. DQAAG

DQAAG achieves the best inversion performance of the four algorithms on both the IOCCG simulated dataset and the NOMAD in situ dataset, particularly in waters with low to moderate turbidity (R_rs(555) < 0.06 sr⁻¹), with RMSD < 0.13 m⁻¹, MARD < 0.30, and R² > 0.89, as shown in Figure 8b,d.

The robust performance of DQAAG on the IOCCG simulated dataset (covering both Case 1 and Case 2 waters) stems directly from its training methodology. The model was developed using a synthetically generated dataset created with IOCCG-recommended algorithms, which maintains strong consistency with Hydrolight simulations (deviation < 1%). This comprehensive training dataset spans an extended dynamic range of optical conditions, significantly enhancing the model’s robustness and applicability across diverse aquatic environments. The effectiveness of DQAAG on the IOCCG simulated data is therefore expected. It should be noted that deep learning models depend heavily on the training dataset, so the applicability of DQAAG to highly turbid waters (R_rs(555) > 0.06 sr⁻¹) still requires further validation.

4.3. Comparison of SeaWiFS Remote Sensing a_g(443) Data

Since the ultimate objective of the algorithm is to apply it to ocean color satellites for obtaining global distributions of a_g, we further evaluated the a_g(443) values derived from DQAAG using SeaWiFS data. The locations of the satellite and in situ match-up stations are shown in Figure 2 (yellow squares). Because the in situ data cover 1999–2006, SeaWiFS data were used for matching with the NOMAD data. R_rs(380) was first estimated from the SeaWiFS data using the UVISR_dl model established by Wang et al. (2021) [11], and DQAAG was then applied to retrieve a_g(443). Figure 9 shows the scatter plot between the a_g(443) retrieved from SeaWiFS and the a_g(443) measured in NOMAD, where RMSD = 0.14 m⁻¹, MARD = 0.36, and R² = 0.51. These metrics are slightly worse than those obtained when a_g(443) is retrieved from in situ R_rs(λ). The reasons for this performance degradation include: (1) The lack of perfect match-ups between satellite and field measurements due to temporal and spatial mismatches [53,54], which is a primary source of data bias [42]. (2) Despite efforts to minimize error propagation, residual uncertainties in R_rs(λ) products caused by sensor noise and incomplete atmospheric correction in marine and coastal areas can propagate to the estimation of a_g(443) [55,56,57].

5. Discussion

5.1. Model Performance

The DQAAG algorithm includes three sets of deep learning models: (1) a model that takes R_rs(λ) as input to obtain b_bp(λ), (2) a model that takes R_rs(λ) as input and produces ζ and ξ as outputs, and (3) a model that takes b_bp(λ) as input to estimate a_d(443). To further investigate the importance of input features, we computed SHapley Additive exPlanations (SHAP) values between the input and output parameters of the different models. A SHAP value quantifies the contribution of each input feature to the variability of the corresponding output parameter in a deep learning model [58]. In brief, SHAP values are generated for each input variable to estimate its marginal effect on the output. The SHAP summary plot combines feature importance with the direction of feature effects, where a wider distribution of SHAP values indicates a stronger influence of the variable on the predicted parameter. Given the characteristics of the model, the SHAP interpreter was configured as the DeepExplainer, which estimates contribution values based on a layer-wise backpropagation mechanism, effectively reducing computation time. We employed global bar plots, global beeswarm plots, and SHAP value scatter plots for the analysis.
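To make the notion of a marginal effect concrete, the sketch below computes exact Shapley values for a small toy function by enumerating all feature coalitions, with absent features replaced by a baseline value. This is not the DeepExplainer approximation used in this study, which avoids the exponential cost of enumeration. The names `exact_shapley` and `toy_model` are hypothetical and introduced only for illustration.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = x.size
    phi = np.zeros(n)

    def value(coalition):
        z = baseline.copy()
        idx = np.array(coalition, dtype=int)
        z[idx] = x[idx]
        return float(predict(z))

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


def toy_model(z):
    # Hypothetical stand-in for a trained network with three inputs.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = exact_shapley(toy_model, x, baseline)
```

In this toy case the interaction term is shared equally between the first and third inputs, and the values sum to the difference between the prediction at `x` and at the baseline. That additivity is the property that allows per-feature SHAP values to be compared in bar and beeswarm plots.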

Figure 10a–c show the effect of R_rs(λ) on b_bp(λ), taking b_bp(555) as an example. It can be observed that R_rs(670) and R_rs(380) exert the most significant influence on b_bp(555) (Figure 10a). Variations in R_rs(670) can lead to changes in b_bp(555) ranging from −0.2 to 1 m⁻¹ (Figure 10b). The feature dependency analysis (Figure 10c) indicates that higher values of R_rs(670) correspond to a stronger positive effect on b_bp(555) (R² = 0.99, p value < 0.01). R_rs(670) serves as a criterion for water classification and provides a rough estimate of water constituents. When R_rs(670) > 0.0015 sr⁻¹, the water is classified as Case 2 [59]. As the concentration of suspended particles increases, b_bp(555) also increases. Therefore, it is reasonable that R_rs(670) has a greater impact on b_bp(λ) than other bands.

Figure 10d–f present the impact of R_rs(λ) on ζ. Similarly, R_rs(670) shows the strongest effect on ζ, with variations potentially causing changes in ζ between −0.04 and 0.06 (Figure 10d,e). Furthermore, R_rs(670) exhibits a positive correlation with ζ (R² = 0.98, p value < 0.01), although this positive influence diminishes when R_rs(670) exceeds 0.01 sr⁻¹ (Figure 10f). Figure 10g–i demonstrate the effect of R_rs(λ) on ξ. Notably, R_rs(380) has the greatest impact on ξ (Figure 10g), and changes in R_rs(380) can shift ξ by −0.2 to 0.2 (Figure 10h). Specifically, R_rs(380) has a significant negative impact on ξ (Figure 10i).

Figure 10j–l show the influence of b_bp(λ) on a_d(443). Consistent with the results above, b_bp(670) demonstrates the strongest positive effect on the estimation of a_d(443) (R² = 0.98, p value < 0.01). Despite the differences in spectral bands, it is well understood that a clear positive correlation exists between b_bp and a_d.

5.2. Sensitivity Analysis

The DQAAG algorithm uses b_bp(λ) as the input of a deep learning model when inverting a_d(443), and b_bp(λ) is itself calculated from R_rs(λ), so errors in R_rs(λ) accumulate in a_d(443). To quantify this effect, we introduced random noise within ranges of ±5%, ±10%, ±20%, and ±50% to R_rs(λ) using the 500-sample simulated dataset provided by IOCCG, and evaluated the resulting impact on a_d(443) retrieval. Figure 11 illustrates the variation in RMSD, MARD, and R² for different wavelengths under the influence of random noise, taking R_rs(380), R_rs(443), R_rs(555), and R_rs(670) as examples. Overall, R_rs(443) exerts the strongest influence on the retrieval of a_d(443) (Figure 11d–f). When ±20% random noise was added to R_rs(380), the RMSD and MARD of a_d(443) increased by less than 5% (Figure 11a,b), and R² decreased by less than 3% (Figure 11c). In contrast, introducing ±20% noise to R_rs(443) led to an approximately 200% increase in MARD and a reduction in R² of nearly 30% (Figure 11e,f). The influence of noise on R_rs(555) (Figure 11g–i) and R_rs(670) (Figure 11j–l) follows the same trend as that at 380 nm, but its impact is less pronounced. These results indicate that error accumulation in the deep learning model is most significant at 443 nm, while other wavelengths have minimal impact on the retrieval accuracy.
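The perturbation experiment can be outlined in code. The sketch below is a hedged illustration rather than the procedure actually used: it assumes the noise is multiplicative and drawn uniformly from [−p, +p] for one band at a time, and it uses MARD as the only metric. The callable `retrieve` is a hypothetical stand-in for the trained retrieval chain, and the function names are introduced here for illustration.

```python
import numpy as np


def perturb_band(rrs, band_index, level, rng):
    """Scale one R_rs band by (1 + e), with e ~ Uniform(-level, +level) per sample."""
    noisy = np.array(rrs, dtype=float, copy=True)
    eps = rng.uniform(-level, level, size=noisy.shape[0])
    noisy[:, band_index] *= 1.0 + eps
    return noisy


def mard(retrieved, reference):
    """Mean absolute relative difference (assumed definition)."""
    return float(np.mean(np.abs(retrieved - reference) / reference))


def noise_sensitivity(retrieve, rrs, reference, band_index,
                      levels=(0.05, 0.10, 0.20, 0.50), seed=0):
    """MARD of the retrieval after perturbing a single band at each noise level.

    retrieve: callable mapping an (n_samples, n_bands) array to n_samples values
              (hypothetical stand-in for the trained model).
    """
    rng = np.random.default_rng(seed)
    return {
        level: mard(retrieve(perturb_band(rrs, band_index, level, rng)), reference)
        for level in levels
    }
```

Repeating the call for each band index and comparing the resulting values against the noise-free baseline gives a per-band picture of the kind summarized in Figure 11.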

5.3. Global CDOM Distribution Patterns

Based on the results presented above, DQAAG achieves excellent retrieval performance with SeaWiFS data. Figure 12 shows the climatological distribution of a_g(443) derived from SeaWiFS using DQAAG. Extending the estimation of a_g(443) to a global scale helps provide key information on the global distribution of CDOM, thereby facilitating the assessment of potential photochemical and photobiological processes in the ocean.

The global spatial distribution of a_g(443) aligns with reports from previous studies. Notably, this parameter is highly variable in space, with measured values spanning more than three orders of magnitude (a_g(443) ≈ 0.001–2 m⁻¹) [6]. As shown in Figure 12a, a_g(443) in the equatorial Pacific is slightly higher in spring, mainly because equatorial upwelling enhances biological activity [60,61]. In the South Pacific Gyre, a_g(443) was 0.01 m⁻¹ (Figure 12b), because photodegradation limits the CDOM content [45]. In the Yangtze River estuary, a_g(443) can reach 1 m⁻¹, mainly owing to terrestrial inputs [62]. In the southwestern Atlantic, a_g(443) increases during autumn and winter (Figure 12c,d). In addition, across all seasons, a_g(443) values in regions north of 30°N consistently exceed those in the Southern Hemisphere, a pattern that aligns with the findings of Bricaud et al. (2012) [63].

Unlike previous algorithms [16,64,65], which face significant challenges in accurately retrieving a_g(443) near the sea surface due to complex marine environments, our study demonstrates that such conditions do not substantially affect the performance of DQAAG, as confirmed by statistical analyses. Within the range of R_rs(555) from 6.0 × 10⁻⁴ to 0.06 sr⁻¹, the DQAAG algorithm consistently yields effective results. Therefore, DQAAG is an effective algorithm for accurately obtaining a_g(443), which is helpful for modeling marine ecosystems and estimating the heat of upper marine organisms.

6. Conclusions

In this study, a deep learning model for retrieving a_g(443) was developed using extensive simulated data. We compared the inversion results of inherent optical properties using globally available datasets, namely the IOCCG simulated data and NOMAD field measurements, which cover both Case 1 and Case 2 waters. For a(λ), the comparison against the IOCCG simulated dataset and NOMAD in situ data showed RMSD ≤ 0.31 m⁻¹, MARD ≤ 0.23, and R² ≥ 0.72. Compared with QAA-CDOM, S2011, and A2018, DQAAG yields better a_g(443) inversion results, with RMSD < 0.3 m⁻¹, MARD ≤ 0.30, bias ≤ 0.028 m⁻¹, and R² ≥ 0.78. This study also demonstrates the universality of algorithms based on radiative transfer and further underscores the powerful capability of combining deep learning with semi-analytical methods to address ocean color retrieval challenges. In addition, when matching satellite-derived a_g(443) with NOMAD data, DQAAG exhibited higher retrieval accuracy, with RMSD = 0.14 m⁻¹ and MARD ≈ 0.39. Accurate retrieval of CDOM makes it possible to monitor the spatiotemporal distribution of river plume inputs into the ocean, thereby advancing our understanding of land-ocean interaction processes. Because freshwater plumes often carry significant nutrient loads that can stimulate phytoplankton growth, such monitoring also provides valuable information for environmental monitoring agencies and fisheries management.

In summary, DQAAG relies on two important characteristics: the use of an ultraviolet band for a_g separation, and the use of deep learning models instead of simple empirical formulas. The adoption of ultraviolet spectral bands enables full utilization of existing satellite data with UV sensing capabilities, while simultaneously opening new methodological avenues for subsequent research on water color parameters. As some current satellite sensors are equipped with bands below 380 nm (such as those on HY-1C or OCI), further exploration of the application scenarios of wavelengths below 380 nm in ocean color remote sensing is a worthwhile endeavor. During this process, attention must be paid to the influence of mycosporine-like amino acids (MAAs). Furthermore, the integration of deep learning models with semi-analytical approaches represents a practical and effective way of processing hyperspectral ocean color remote sensing data. Together, these two characteristics not only enhance the capability to retrieve a_g(443) in both global oceanic and coastal waters but also improve the accuracy of marine primary productivity estimates, making an important contribution to the field of ocean optics.

Author Contributions

Y.W. was responsible for data analysis, model training, and manuscript writing; X.W. and Q.S. contributed to the design, organization, and revision of the manuscript; Q.X. and L.X. contributed to the collection and analysis of remote sensing images; J.B. and K.B. contributed to the collection of remote sensing image data. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 42406180), the Key Laboratory of Space Ocean Remote Sensing and Application Open Fund (No. 202301002), the Guangxi Disclosure System Technology Project (No. 2025JBGS008), the Central Basic Research Business Fund Projects (No. TKS20250205), and the Research Project of China Three Gorges Corporation (No. 202103552).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We thank NASA OBPG for providing satellite ocean color products (http://oceancolor.gsfc.nasa.gov/, accessed on 25 July 2025) and NASA for providing the NOMAD dataset. We thank the reviewers for their suggestions, which significantly improved the presentation of the paper.

Conflicts of Interest

Author Xiaodao Wei was employed by the company Shanghai Investigation, Design & Research Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Cao, F.; Tzortziou, M.; Hu, C.; Mannino, A.; Fichot, C.G.; Del Vecchio, R.; Najjar, R.G.; Novak, M. Remote sensing retrievals of colored dissolved organic matter and dissolved organic carbon dynamics in North American estuaries and their margins. Remote Sens. Environ. 2018, 205, 151–165.
Aurin, D.; Mannino, A.; Lary, D.J. Remote sensing of CDOM, CDOM spectral slope, and dissolved organic carbon in the global ocean. Appl. Sci. 2018, 8, 2687.
Stedmon, C.A.; Nelson, N.B. The optical properties of DOM in the ocean. In Biogeochemistry of Marine Dissolved Organic Matter; Elsevier: Amsterdam, The Netherlands, 2015; pp. 481–508.
Nebbioso, A.; Piccolo, A. Molecular characterization of dissolved organic matter (DOM): A critical review. Anal. Bioanal. Chem. 2013, 405, 109–124.
Huang, J.; Chen, J.; Mu, Y.; Cao, C.; Shen, H. Remote-sensing monitoring of colored dissolved organic matter in the Arctic Ocean. Mar. Pollut. Bull. 2024, 204, 116529.
Bonelli, A.G.; Vantrepotte, V.; Jorge, D.S.F.; Demaria, J.; Jamet, C.; Dessailly, D.; Mangin, A.; D’Andon, O.F.; Kwiatkowska, E.; Loisel, H. Colored dissolved organic matter absorption at global scale from ocean color radiometry observation: Spatio-temporal variability and contribution to the absorption budget. Remote Sens. Environ. 2021, 265, 112637.
Jiao, N.; Luo, T.; Chen, Q.; Zhao, Z.; Xiao, X.; Liu, J.; Jian, Z.; Xie, S.; Thomas, H.; Herndl, G.J.; et al. The microbial carbon pump and climate change. Nat. Rev. Microbiol. 2024, 22, 408–419.
Ducklow, H.W.; Steinberg, D.K.; Buesseler, K.O. Upper ocean carbon export and the biological pump. Oceanography 2001, 14, 50–58.
Norman, L.; Thomas, D.N.; Stedmon, C.A.; Granskog, M.A.; Papadimitriou, S.; Krapp, R.H.; Meiners, K.M.; Lannuzel, D.; van der Merwe, P.; Dieckmann, G.S. The characteristics of dissolved organic matter (DOM) and chromophoric dissolved organic matter (CDOM) in Antarctic sea ice. Deep Sea Res. Part II Top. Stud. Oceanogr. 2011, 58, 1075–1091.
De Mora, S.; Demers, S.; Vernet, M. The Effects of UV Radiation in the Marine Environment; Cambridge University Press: Cambridge, UK, 2000.
Wang, Y.; Lee, Z.; Wei, J.; Shang, S.; Wang, M.; Lai, W. Extending satellite ocean color remote sensing to the near-blue ultraviolet bands. Remote Sens. Environ. 2021, 253, 112228.
Mahrad, B.E.; Newton, A.; Icely, J.D.; Kacimi, I.; Abalansa, S.; Snoussi, M. Contribution of remote sensing technologies to a holistic coastal and marine environmental management framework: A review. Remote Sens. 2020, 12, 2313.
Kutser, T.; Pierson, D.C.; Kallio, K.Y.; Reinart, A.; Sobek, S. Mapping lake CDOM by satellite remote sensing. Remote Sens. Environ. 2005, 94, 535–540.
Isada, T.; Hooker, S.B.; Taniuchi, Y.; Suzuki, K. Evaluation of retrieving chlorophyll a concentration and colored dissolved organic matter absorption from satellite ocean color remote sensing in the coastal waters of Hokkaido, Japan. J. Oceanogr. 2022, 78, 263–276.
Nguyen, V.S.; Loisel, H.; Vantrepotte, V.; Mériaux, X.; Tran, D.L. An Empirical Algorithm for Estimating the Absorption of Colored Dissolved Organic Matter from Sentinel-2 (MSI) and Landsat-8 (OLI) Observations of Coastal Waters. Remote Sens. 2024, 16, 4061.
Mannino, A.; Russ, M.E.; Hooker, S.B. Algorithm development and validation for satellite-derived distributions of DOC and CDOM in the US Middle Atlantic Bight. J. Geophys. Res. Ocean. 2008, 113, C07051.
Sathyendranath, S.; Cota, G.; Stuart, V.; Maass, M.; Platt, T. Remote sensing of phytoplankton pigments: A comparison of empirical and theoretical approaches. Int. J. Remote Sens. 2001, 22, 249–273.
Lee, Z.P.; Carder, K.L.; Steward, R.G.; Peacock, T.G.; Davis, C.O.; Patch, J.S. An empirical algorithm for light absorption by ocean water based on color. J. Geophys. Res. 1998, 103, 27967–27978.
Wang, Y.; Shen, F.; Sokoletsky, L.; Sun, X. Validation and Calibration of QAA Algorithm for CDOM Absorption Retrieval in the Changjiang (Yangtze) Estuarine and Coastal Waters. Remote Sens. 2017, 9, 1192.
Hoge, F.E.; Lyon, P.E. Satellite retrieval of inherent optical properties by linear matrix inversion of oceanic radiance models: An analysis of model and radiance measurement errors. J. Geophys. Res. Ocean. 1996, 101, 16631–16648.
D’Sa, E.J.; Miller, R.L.; Del Castillo, C. Bio-optical properties and ocean color algorithms for coastal waters influenced by the Mississippi River during a cold front. Appl. Opt. 2006, 45, 7410–7428.
Barnard, A.H.; Zaneveld, J.R.V.; Pegau, W.S. In situ determination of the remotely sensed reflectance and the absorption coefficient: Closure and inversion. Appl. Opt. 1999, 38, 5108–5117.
Zhu, W.; Yu, Q.; Tian, Y.Q.; Chen, R.F.; Gardner, G.B. Estimation of chromophoric dissolved organic matter in the Mississippi and Atchafalaya river plume regions using above-surface hyperspectral remote sensing. J. Geophys. Res. Ocean. 2011, 116, C02011.
Maritorena, S.; Siegel, D.A.; Peterson, A.R. Optimization of a semianalytical ocean color model for global-scale applications. Appl. Opt. 2002, 41, 2705–2714.
Lee, Z.; Carder, K.L.; Arnone, R.A. Deriving inherent optical properties from water color: A multiband quasi-analytical algorithm for optically deep waters. Appl. Opt. 2002, 41, 5755–5772.
Wei, J.; Lee, Z. Retrieval of phytoplankton and colored detrital matter absorption coefficients with remote sensing reflectance in an ultraviolet band. Appl. Opt. 2015, 54, 636–649.
Liu, H.; He, X.; Li, Q.; Kratzer, S.; Wang, J.; Shi, T.; Hu, Z.; Yang, C.; Hu, S.; Zhou, Q. Estimating ultraviolet reflectance from visible bands in ocean colour remote sensing. Remote Sens. Environ. 2021, 258, 112404.
Zheng, L.; Lee, Z.; Wang, Y.; Yu, X.; Lai, W.; Shang, S. Evaluation of near-blue UV remote sensing reflectance over the global ocean from SNPP VIIRS, PACE OCI, and GCOM-C SGLI. Opt. Express 2025, 33, 40465–40488.
Siswanto, E. Assessing optical water types in Asian coastal ocean waters from space using GCOM-C/SGLI observations. Int. J. Remote Sens. 2025, 46, 2337–2357.
Li, S.; Chen, S.; Ma, C.; Peng, H.; Wang, J.; Hu, L.; Song, Q. Construction of a radiometric degradation model for ocean color sensors of HY1C/D. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–13.
Wang, J.; Wang, Y.; Lee, Z.; Wang, D.; Chen, S.; Lai, W. A revision of NASA SeaDAS atmospheric correction algorithm over turbid waters with artificial Neural Networks estimated remote-sensing reflectance in the near-infrared. ISPRS J. Photogramm. Remote Sens. 2022, 194, 235–249.
Zhao, D.; Feng, L.; Yang, Z.; Yu, X.; Wang, M. A deep-learning assisted algorithm to improve inherent optical properties estimations over inland and nearshore coastal waters. IEEE Trans. Geosci. Remote Sens. 2025.
Zhang, Z.; Chen, P.; Zhang, S.; Huang, H.; Pan, Y.; Pan, D. A Review of Machine Learning Applications in Ocean Color Remote Sensing. Remote Sens. 2025, 17, 1776.
Chen, J.; Quan, W.; Cui, T.; Song, Q.; Lin, C. Remote sensing of absorption and scattering coefficient using neural network model: Development, validation, and application. Remote Sens. Environ. 2014, 149, 213–226.
Sauzède, R.; Claustre, H.; Uitz, J.; Jamet, C.; Dall’Olmo, G.; d’Ortenzio, F.; Gentili, B.; Poteau, A.; Schmechtig, C. A neural network-based method for merging ocean color and Argo data to extend surface bio-optical properties to depth: Retrieval of the particulate backscattering coefficient. J. Geophys. Res. Ocean. 2016, 121, 2552–2571.
Ioannou, I.; Gilerson, A.; Gross, B.; Moshary, F.; Ahmed, S. Deriving ocean color products using neural networks. Remote Sens. Environ. 2013, 134, 78–91.
Yuan, Q.; Shen, H.; Li, T.; Li, Z.; Li, S.; Jiang, Y.; Xu, H.; Tan, W.; Yang, Q.; Wang, J. Deep learning in environmental remote sensing: Achievements and challenges. Remote Sens. Environ. 2020, 241, 111716.
Sun, X.; Zhang, Y.; Zhang, Y.; Shi, K.; Zhou, Y.; Li, N. Machine learning algorithms for chromophoric dissolved organic matter (CDOM) estimation based on Landsat 8 images. Remote Sens. 2021, 13, 3560.
Chen, J.; He, X.; Zhou, B.; Pan, D. Deriving colored dissolved organic matter absorption coefficient from ocean color with a neural quasi-analytical algorithm. J. Geophys. Res. Ocean. 2017, 122, 8543–8556.
IOCCG. Remote Sensing of Inherent Optical Properties: Fundamentals, Tests of Algorithms, and Applications. In Reports of the International Ocean-Colour Coordinating Group, No. 5; Lee, Z.-P., Stuart, V., Eds.; IOCCG: Dartmouth, NS, Canada, 2006; Volume 5, p. 126.
IOCCG-OCAG (International Ocean Colour Coordinating Group). Model, Parameters, and Approaches That Used to Generate Wide Range of Absorption and Backscattering Spectra. 2003. Available online: http://www.ioccg.org/groups/OCAG_data.html (accessed on 25 July 2025).
Werdell, P.J.; Bailey, S.W. An improved in-situ bio-optical data set for ocean color algorithm development and satellite data product validation. Remote Sens. Environ. 2005, 98, 122–140.
Bailey, S.W.; Werdell, P.J. A multi-sensor approach for the on-orbit validation of ocean color satellite data products. Remote Sens. Environ. 2006, 102, 12–23.
Wei, J.; Lee, Z.; Shang, S. A system to measure the data quality of spectral remote sensing reflectance of aquatic environments. J. Geophys. Res. Ocean. 2016, 121, 8189–8207.
Wang, Y.; Lee, Z.; Ondrusek, M.; Li, X.; Zhang, S.; Wu, J. An evaluation of remote sensing algorithms for the estimation of diffuse attenuation coefficients in the ultraviolet bands. Opt. Express 2022, 30, 6640–6655.
Shanmugam, P. New models for retrieving and partitioning the colored dissolved organic matter in the global ocean: Implications for remote sensing. Remote Sens. Environ. 2011, 115, 1501–1521.
Chollet, F. Keras. 2015. Available online: https://github.com/fchollet/keras (accessed on 25 July 2025).
Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems; O’Reilly Media: Santa Rosa, CA, USA, 2019.
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–8 December 2012; pp. 1097–1105.
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
Aurin, D.A.; Dierssen, H.M. Advantages and limitations of ocean color remote sensing in CDOM-dominated, mineral-rich coastal and estuarine waters. Remote Sens. Environ. 2012, 125, 181–197.
Zhu, W.; Yu, Q.; Tian, Y.Q.; Becker, B.L.; Zheng, T.; Carrick, H.J. An assessment of remote sensing algorithms for colored dissolved organic matter in complex freshwater environments. Remote Sens. Environ. 2014, 140, 766–778.
Antoine, D.; d’Ortenzio, F.; Hooker, S.B.; Bécu, G.; Gentili, B.; Tailliez, D.; Scott, A.J. Assessment of uncertainty in the ocean reflectance determined by three satellite ocean color sensors (MERIS, SeaWiFS and MODIS-A) at an offshore site in the Mediterranean Sea (BOUSSOLE project). J. Geophys. Res. Ocean. 2008, 113, C07013.
Zibordi, G.; Berthon, J.-F.; Mélin, F.; D’Alimonte, D.; Kaitala, S. Validation of satellite ocean color primary products at optically complex coastal sites: Northern Adriatic Sea, Northern Baltic Proper and Gulf of Finland. Remote Sens. Environ. 2009, 113, 2574–2591.
Wei, J.; Lee, Z.; Garcia, R.; Zoffoli, L.; Armstrong, R.A.; Shang, Z.; Sheldon, P.; Chen, R.F. An assessment of Landsat-8 atmospheric correction schemes and remote sensing reflectance products in coral reefs and coastal turbid waters. Remote Sens. Environ. 2018, 215, 18–32.
Wang, M. Remote sensing of the ocean contributions from ultraviolet to near-infrared using the shortwave infrared bands: Simulations. Appl. Opt. 2007, 46, 1535–1547.
Wang, J.; Lee, Z.; Wei, J.; Du, K. Atmospheric correction in coastal region using same-day observations of different sun-sensor geometries with a revised POLYMER model. Opt. Express 2020, 28, 26953–26976.
Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 4–9 December 2017.
Najah, A.; Al-Shehhi, M.R. Performance of the ocean color algorithms: QAA, GSM, and GIOP in inland and coastal waters. Remote Sens. Earth Syst. Sci. 2021, 4, 235–248.
Nelson, N.B.; Siegel, D.A. The global distribution and dynamics of chromophoric dissolved organic matter. Annu. Rev. Mar. Sci. 2013, 5, 447–476.
Nolan, C.; Overpeck, J.T.; Allen, J.R.; Anderson, P.M.; Betancourt, J.L.; Binney, H.A.; Brewer, S.; Bush, M.B.; Chase, B.M.; Cheddadi, R. Past and future global transformation of terrestrial ecosystems under climate change. Science 2018, 361, 920–923.
Zhang, Y.; Zhou, L.; Zhou, Y.; Zhang, L.; Yao, X.; Shi, K.; Jeppesen, E.; Yu, Q.; Zhu, W. Chromophoric dissolved organic matter in inland waters: Present knowledge and future challenges. Sci. Total Environ. 2021, 759, 143550.
Bricaud, A.; Ciotti, A.M.; Gentili, B. Spatial-temporal variations in phytoplankton size and colored detrital matter absorption at global and regional scales, as derived from twelve years of SeaWiFS data (1998–2009). Glob. Biogeochem. Cycles 2012, 26, GB1010.
Mannino, A.; Novak, M.G.; Hooker, S.B.; Hyde, K.; Aurin, D. Algorithm development and validation of CDOM properties for estuarine and continental shelf waters along the northeastern U.S. coast. Remote Sens. Environ. 2014, 152, 576–602.
Bai, Y.; Pan, D.; Cai, W.J.; He, X.; Wang, D.; Tao, B.; Zhu, Q. Remote sensing of salinity from satellite-derived CDOM in the Changjiang River dominated East China Sea. J. Geophys. Res. Ocean. 2013, 118, 227–243.

Figure 1. R_rs(λ) spectra and a_g(443) data statistics used in this study: (a) hyperspectral R_rs(λ) of the synthesized data, (b) the statistical distribution of a_g(443) in the synthesized data, (c) hyperspectral R_rs(λ) of the simulated data provided by IOCCG, and (d) the measured R_rs(λ) spectra provided by the NOMAD dataset. Relationships within R_rs(λ) and a_g(λ) for the synthetic, IOCCG, and NOMAD datasets: (e) R_rs(443) vs. R_rs(410), (f) a_g(443) vs. a_g(410).

Figure 2. The NOMAD in situ data used for evaluating the a_g(443) retrieval algorithm. The red circle represents the measurement positions of NOMAD data, and the yellow square represents the NOMAD data stations that match the SeaWiFS satellite.

Figure 3. Schematic diagram of the system for estimating a_g(443) using deep learning: DQAAG.

Figure 4. Comparison between b_bp(λ) derived from DQAAG and the b_bp(λ) from the IOCCG simulated dataset: (a) b_bp(410), (b) b_bp(443), (c) b_bp(490), (d) b_bp(555), (e) b_bp(670).

Figure 5. Comparison between a(λ) derived from DQAAG and a(λ) from the IOCCG simulated dataset: (a) a(410), (b) a(443), (c) a(490), (d) a(555), and (e) a(670).

Figure 6. Comparison between a(λ) derived from DQAAG and a(λ) from the NOMAD dataset: (a) a(410), (b) a(443), (c) a(490), (d) a(555), and (e) a(670).

Figure 7. Comparison between a_g(443) derived from S2011 and A2018 and the reference a_g(443): IOCCG dataset, (a) S2011 and (b) A2018; NOMAD dataset, (c) S2011 and (d) A2018.

Figure 8. Comparison between a_g(443) derived from QAA-CDOM and DQAAG and the reference a_g(443): IOCCG dataset, (a) QAA-CDOM and (b) DQAAG; NOMAD dataset, (c) QAA-CDOM and (d) DQAAG.

Figure 9. Comparison between a_g(443) derived by DQAAG from SeaWiFS data and the NOMAD-measured a_g(443).

Figure 10. SHAP summary plots for the DQAAG model applied to the synthetic dataset. (a) mean |SHAP| values of R_rs(λ) for predicting b_bp(555), (b) SHAP values of R_rs(λ) for b_bp(555), (c) the feature dependence trend of R_rs(670) on b_bp(555), (d) mean |SHAP| values of R_rs(λ) for predicting ζ, (e) SHAP values of R_rs(λ) for ζ, (f) the feature dependence trend of R_rs(670) on ζ, (g) mean |SHAP| values of R_rs(λ) for predicting ξ, (h) SHAP values of R_rs(λ) for ξ, (i) the feature dependence trend of R_rs(380) on ξ, (j) mean |SHAP| values of b_bp(λ) for predicting a_d(443), (k) SHAP values of b_bp(λ) for a_d(443), and (l) the feature dependence trend of b_bp(670) on a_d(443).

Figure 11. Impact of adding random noise (±5%, ±10%, ±20%, and ±50%) to R_rs(λ) on the retrieval accuracy of a_d(443): (a–c) represent the RMSD, MARD, and R² for R_rs(380), (d–f) correspond to the RMSD, MARD, and R² for R_rs(443), (g–i) represent the RMSD, MARD, and R² for R_rs(555), and (j–l) represent the RMSD, MARD, and R² for R_rs(670).

Figure 12. Global distribution of seasonal climatology of SeaWiFS derived a_g(443): (a) Spring, (b) Summer, (c) Autumn, and (d) Winter.

Table 1. Statistical description of R_rs(λ) (taking R_rs(555) as an example) and a_g(443) in the datasets used for model training, validation, and testing (the coefficient of variation (CV) is the ratio of the standard deviation to the mean).

Data	Data Sources (Data Number)	Parameter	Min	Max	Mean	CV
Training data	Simulated data (N = 200,000)	R_rs(555) [sr⁻¹]	6.0 × 10⁻⁴	0.059	0.0062	1.18
Training data	Simulated data (N = 200,000)	a_g(443) [m⁻¹]	3.9 × 10⁻⁴	8.05	0.54	2.19
Validation data	IOCCG data (N = 500)	R_rs(555) [sr⁻¹]	1.0 × 10⁻³	0.029	0.0061	0.77
Validation data	IOCCG data (N = 500)	a_g(443) [m⁻¹]	2.5 × 10⁻³	2.37	0.33	1.45
Validation data	NOMAD data (N = 287)	R_rs(555) [sr⁻¹]	6.4 × 10⁻⁴	0.040	0.0061	1.08
Validation data	NOMAD data (N = 287)	a_g(443) [m⁻¹]	5.4 × 10⁻⁴	1.12	0.17	1.18

Table 2. QAA_v6.

Steps	Property	Derivation	Approach
Step 0	$r_{rs}(λ)$	$= \frac{R_{rs}(λ)}{0.52 + 1.7 R_{rs}(λ)}$	Semi-analytical
Step 1	$u(λ)$	$= \frac{-g_{0} + \sqrt{g_{0}^{2} + 4 g_{1} r_{rs}(λ)}}{2 g_{1}}$	Semi-analytical
Step 2	$a(λ_{0})$	if $R_{rs}(670) < 0.0015\ \mathrm{sr}^{-1}$: $= a_{w}(λ_{0}) + 10^{-1.146 - 1.366 x - 0.469 x^{2}}$, with $x = \log_{10}\left(\frac{r_{rs}(443) + r_{rs}(490)}{r_{rs}(λ_{0}) + 5 \frac{r_{rs}(667)}{r_{rs}(490)} r_{rs}(667)}\right)$; else: $= a_{w}(670) + 0.39 \left(\frac{R_{rs}(670)}{R_{rs}(443) + R_{rs}(490)}\right)^{1.14}$	Empirical
Step 3	$b_{bp}(λ_{0})$	$= \frac{u(λ_{0})\, a(λ_{0})}{1 - u(λ_{0})} - b_{bw}(λ_{0})$	Analytical
Step 4	$Y$	$= 2.0 \left(1 - 1.2\, e^{-0.9 \frac{r_{rs}(443)}{r_{rs}(555)}}\right)$	Empirical
Step 5	$b_{bp}(λ)$	$= b_{bp}(λ_{0}) \left(\frac{λ_{0}}{λ}\right)^{Y}$	Semi-analytical
Step 6	$a(λ)$	$= \frac{[1 - u(λ)]\, b_{b}(λ)}{u(λ)}$	Analytical
Step 7	$ζ$	$= p_{1} + \frac{p_{2}}{p_{3} + \frac{r_{rs}(443)}{r_{rs}(555)}}$, with $p_{1} = 0.74$, $p_{2} = 0.2$, $p_{3} = 0.8$	Empirical
Step 8	$ξ$	$= e^{S (443 - 412)}$, with $S = p_{1} + \frac{p_{2}}{p_{3} + \frac{r_{rs}(443)}{r_{rs}(555)}}$ and $p_{1} = 0.015$, $p_{2} = 0.002$, $p_{3} = 0.6$	Empirical
Step 9	$a_{dg}(443)$	$= \frac{a(412) - ζ a(443)}{ξ - ζ} - \frac{a_{w}(412) - ζ a_{w}(443)}{ξ - ζ}$	Analytical
Step 10	$a_{ph}(λ)$	$= a(λ) - a_{dg}(λ) - a_{w}(λ)$, with $a_{dg}(λ) = a_{dg}(443)\, e^{-S (λ - 443)}$	Analytical
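
The steps in Table 2 can be chained into a short routine. The following is a minimal sketch of Steps 0 to 9 under stated assumptions, not the implementation used in this study: the constants g₀ = 0.089 and g₁ = 0.1245 are the values commonly quoted for QAA and are not listed in the table, the reference wavelength λ₀ is taken as 555 nm for clearer waters and 670 nm otherwise, the 670 nm band stands in for the 667 nm term, and the pure-water values in the usage lines are illustrative placeholders only. The function names are hypothetical.

```python
import math

G0, G1 = 0.089, 0.1245  # assumed QAA model constants; Table 2 does not list them


def u_from_rrs(rrs):
    """Step 1: positive root of G1*u**2 + G0*u - rrs = 0, with u = b_b / (a + b_b)."""
    return (-G0 + math.sqrt(G0 ** 2 + 4.0 * G1 * rrs)) / (2.0 * G1)


def qaa_adg443(Rrs, aw, bbw):
    """Steps 0-9 of Table 2. Rrs, aw, bbw are dicts keyed by wavelength in nm
    (412, 443, 490, 555, 670); the 670 nm band stands in for the 667 nm term."""
    rrs = {wl: R / (0.52 + 1.7 * R) for wl, R in Rrs.items()}      # Step 0
    u = {wl: u_from_rrs(r) for wl, r in rrs.items()}               # Step 1
    if Rrs[670] < 0.0015:                                          # Step 2
        wl0 = 555  # assumed reference band for clearer waters
        x = math.log10((rrs[443] + rrs[490])
                       / (rrs[wl0] + 5.0 * rrs[670] * rrs[670] / rrs[490]))
        a0 = aw[wl0] + 10.0 ** (-1.146 - 1.366 * x - 0.469 * x ** 2)
    else:
        wl0 = 670
        a0 = aw[670] + 0.39 * (Rrs[670] / (Rrs[443] + Rrs[490])) ** 1.14
    bbp0 = u[wl0] * a0 / (1.0 - u[wl0]) - bbw[wl0]                 # Step 3
    ratio = rrs[443] / rrs[555]
    Y = 2.0 * (1.0 - 1.2 * math.exp(-0.9 * ratio))                 # Step 4
    a = {}
    for wl in (412, 443):
        bbp = bbp0 * (wl0 / wl) ** Y                               # Step 5
        a[wl] = (1.0 - u[wl]) * (bbw[wl] + bbp) / u[wl]            # Step 6
    zeta = 0.74 + 0.2 / (0.8 + ratio)                              # Step 7
    S = 0.015 + 0.002 / (0.6 + ratio)                              # Step 8
    xi = math.exp(S * (443 - 412))
    adg443 = ((a[412] - zeta * a[443])
              - (aw[412] - zeta * aw[443])) / (xi - zeta)          # Step 9
    return {"wl0": wl0, "a": a, "zeta": zeta, "xi": xi, "adg443": adg443}


# Illustrative placeholder inputs (not values from this study).
aw = {412: 0.00455, 443: 0.00707, 490: 0.015, 555: 0.0596, 670: 0.439}
bbw = {412: 0.0033, 443: 0.0024, 490: 0.0016, 555: 0.0009, 670: 0.0004}
clear = {412: 0.005, 443: 0.0045, 490: 0.004, 555: 0.002, 670: 0.0002}
result = qaa_adg443(clear, aw, bbw)
```

In DQAAG, the empirical steps of this chain (b_bp(λ), ζ, and ξ) are the ones replaced by deep learning models, so a routine of this shape mainly serves as the semi-analytical baseline against which those replacements can be compared.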

Table 3. Statistics of DQAAG applied to the IOCCG simulated dataset for b_bp(λ) inversion.

Data	N	RMSD (m⁻¹)	MARD	Bias (m⁻¹)	R²
b_bp(410)	500	0.0052	0.087	−0.00039	~0.97
b_bp(443)	500	0.0053	0.078	−0.00085	~0.97
b_bp(490)	500	0.0060	0.084	0.00032	~0.96
b_bp(555)	500	0.0068	0.081	0.00075	~0.96
b_bp(670)	500	0.0074	0.073	0.0012	~0.96

Table 4. Statistics of DQAAG applied to the IOCCG simulated dataset and NOMAD dataset for a(λ) inversion.

Parameter	Data	N	RMSD (m⁻¹)	MARD	Bias (m⁻¹)	R²
a(410)	IOCCG	500	0.23	0.069	0.031	0.96
a(410)	NOMAD	287	0.31	0.23	−0.064	0.75
a(443)	IOCCG	500	0.14	0.12	−0.042	0.97
a(443)	NOMAD	287	0.23	0.21	−0.045	0.76
a(490)	IOCCG	500	0.082	0.063	0.0094	0.96
a(490)	NOMAD	287	0.12	0.20	−0.016	0.82
a(555)	IOCCG	500	0.033	0.083	−0.0066	0.95
a(555)	NOMAD	287 (286) *	0.059	0.17	0.011	0.72
a(670)	IOCCG	500	0.075	0.10	0.039	0.73
a(670)	NOMAD	287 (283) *	0.15	0.22	0.083	0.79

* The number in parentheses indicates the count of valid observations.
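
For readers who wish to reproduce statistics of this kind, the following is a minimal sketch of how the tabulated metrics and the count of valid observations might be computed from paired values. The definitions are assumptions, not taken from this section: MARD is read as the mean absolute relative difference, R² as the squared Pearson correlation in linear space, and a pair is treated as valid when both values are finite and positive. The accuracy-assessment section of the paper gives the definitions actually used; the function name and sample numbers are hypothetical.

```python
import math


def matchup_stats(estimated, measured):
    """Return N_valid, RMSD, MARD, bias and R^2 for paired values.

    Assumed definitions (not taken from the paper):
      RMSD = sqrt(mean((est - meas)^2))
      MARD = mean(|est - meas| / meas)
      bias = mean(est - meas)
      R^2  = squared Pearson correlation
    A pair counts as valid when both values are finite and positive.
    """
    pairs = [(e, m) for e, m in zip(estimated, measured)
             if math.isfinite(e) and math.isfinite(m) and e > 0 and m > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid pairs")
    diffs = [e - m for e, m in pairs]
    rmsd = math.sqrt(sum(d * d for d in diffs) / n)
    mard = sum(abs(e - m) / m for e, m in pairs) / n
    bias = sum(diffs) / n
    mean_e = sum(e for e, _ in pairs) / n
    mean_m = sum(m for _, m in pairs) / n
    cov = sum((e - mean_e) * (m - mean_m) for e, m in pairs)
    var_e = sum((e - mean_e) ** 2 for e, _ in pairs)
    var_m = sum((m - mean_m) ** 2 for _, m in pairs)
    r2 = cov ** 2 / (var_e * var_m)
    return {"N_valid": n, "RMSD": rmsd, "MARD": mard, "bias": bias, "R2": r2}


# Made-up numbers for illustration only; the NaN stands for a failed retrieval.
stats = matchup_stats([0.10, 0.22, float("nan"), 0.35],
                      [0.10, 0.20, 0.30, 0.40])
```

In this toy case one of the four pairs is discarded, which mirrors how a nominal N of 287 can correspond to a smaller number of valid observations.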

Table 5. Statistical metrics of a_g(443) retrieved by S2011, A2018, QAA-CDOM, and DQAAG for the IOCCG and NOMAD datasets.

Parameter	Algorithm	Data	N	RMSD (m⁻¹)	MARD	Bias (m⁻¹)	R²
$a_{g} (443)$	QAA-CDOM	IOCCG	500	0.20	0.32	−0.078	0.89
$a_{g} (443)$	QAA-CDOM	NOMAD	287	0.15	0.42	−0.047	0.50
$a_{g} (443)$	S2011	IOCCG	500	0.29	0.53	0.023	0.64
$a_{g} (443)$	S2011	NOMAD	287	0.15	0.44	−0.0046	0.54
$a_{g} (443)$	A2018	IOCCG	500	0.45	1.01	−0.18	0.38
$a_{g} (443)$	A2018	NOMAD	287	0.17	0.82	−0.064	0.45
$a_{g} (443)$	DQAAG	IOCCG	500	0.11	0.19	0.0076	0.96
$a_{g} (443)$	DQAAG	NOMAD	287	0.13	0.30	0.028	0.78
