Study on the Impact of Input Parameters on Seawater Dissolved Oxygen Prediction Models

Li, Wenqing; Lv, Jing; Wang, Yuhang; Kong, Xiangfeng

doi:10.3390/jmse13030536


¹ College of Ocean Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China

² Institute of Oceanographic Instrumentation, Qilu University of Technology (Shandong Academy of Sciences), Qingdao 266061, China

³ Laoshan Laboratory, Qingdao 266237, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2025, 13(3), 536; https://doi.org/10.3390/jmse13030536

Submission received: 9 February 2025 / Revised: 2 March 2025 / Accepted: 5 March 2025 / Published: 11 March 2025

(This article belongs to the Special Issue Ocean Climate: Deep Learning, Statistical Methods and Dynamical Modeling)


Abstract


The concentration of dissolved oxygen (DO) in seawater is a core ecological indicator in aquaculture, and its accurate prediction is of great value for the management of marine ranching. Because existing DO prediction studies have rarely explored how input parameters should be optimized, this study used observational data from the Goji Island marine ranching to construct a technical framework of “parameter screening-model optimization-ecological analysis”. By integrating correlation analysis, principal component analysis (PCA), and multi-model comparison (support vector machine (SVM), multilayer perceptron (MLP), and random forest (RF)), this study systematically examined input parameter optimization strategies and their ecological correlation mechanisms. The research findings are as follows: (1) Parameter optimization can significantly improve model accuracy, and the model performance is optimal after eliminating the low-correlation parameter (Tur) (RMSE = 0.039, MAE = 0.030, R² = 0.884). (2) The absence of key parameters (such as Sal) leads to a significant decrease in prediction accuracy (the R² reduction rate reaches 71.6%). (3) The parameter importance ranking is Tem > pH > Sal > Chl-a > Tur, among which Tem explains 42.3% of the variation in DO. The intelligent parameter optimization framework proposed in this study provides theoretical support for the development of a marine ranching DO monitoring system, and its technical path can be extended to the prediction of other water environment indicators. Future research will develop a parameter adaptive selection algorithm, conduct dynamic monitoring of multi-scale environmental factors, and achieve the intelligent optimization and verification of model parameters.

Keywords:

sea water DO prediction; parameter selection; input parameter optimization; machine learning model

1. Introduction

Dissolved oxygen (DO) serves as a vital biogeochemical indicator for assessing nearshore ecosystem health, exerting profound impacts on fisheries productivity and aquaculture sustainability [1]. Decades of empirical studies have revealed that the dynamic coupling mechanisms between DO concentrations and key environmental parameters, including water temperature (Tem), salinity (Sal), pH, and chlorophyll-a (Chl-a), manifest pronounced spatiotemporal heterogeneity across coastal systems [2,3]. The expanding implementation of marine ranching operations has introduced additional complexity to these environmental interactions through intensified anthropogenic-natural forcing [4,5]. As a new aquaculture model based on ecological engineering principles, marine ranching, through the intelligent perception of environmental parameters and ecosystem regulation, not only revolutionizes the traditional aquaculture paradigm [6] but also builds a synergy mechanism of ecological, economic, and social benefits [7]. However, in situ monitoring of its ecological elements still faces three technical bottlenecks: biofouling and equipment corrosion caused by prolonged deployment lead to an average annual sensor failure rate of 22.5% (95% confidence interval: 15–30%) [8,9]; dynamic disturbances caused by ocean turbulence and extreme events increase the rate of anomalous monitoring data by a factor of three to five [10]; and multi-source heterogeneous data fusion still lacks a standardized processing framework. Therefore, it is necessary to predict the trend of DO changes accurately from early observation data, to understand how marine ecological parameters change, and to establish a dynamic early warning system for these parameters that provides decision support for the management of marine ranching.

Nevertheless, contemporary dissolved oxygen (DO) prediction research confronts two critical methodological constraints: first, the lack of a standardized framework for parameter selection leads to significant differences in variable combinations used in different studies, limiting the model’s generalization ability; second, the disconnect between parameter optimization mechanisms and ecological driving logic, as most models rely on fixed parameter combinations and ignore the dynamic interactions and spatiotemporal heterogeneity between parameters, constraining further improvements in prediction accuracy.

Although machine learning methods (such as random forest (RF) and support vector machines (SVMs)) have demonstrated high accuracy in DO prediction [11,12], the selection of input parameters (e.g., Tem, Chl-a, turbidity (Tur)) lacks a unified framework, limiting the model’s generalization capability [13]. For example, Li et al. (2023) [14] compared the DO prediction performance of seven parameter combinations in the Yangtze River Estuary and found that redundant variables (e.g., weakly correlated Tur) introduce noise, while omitting key parameters (e.g., photosynthetically active radiation) leads to model bias; in a study conducted by Yang et al. (2024) [15], a univariate recursive forecasting method was used to establish a model, and the historical time series of DO was used to predict future values. This model does not consider the impact of environmental factors and other water quality indicators on DO, which limits the universality of the prediction model to a certain extent.

Additionally, most models rely on fixed parameter weights [16], failing to capture the dynamic interactions between parameters (e.g., nonlinear coupling between Tem and Sal) [17]. To address model interpretability issues, explainable techniques such as SHAP (Shapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations) have been gradually introduced into ecological prediction [18,19]. For instance, Cui et al. (2024) [20] used extreme gradient boosting (XGBoost) to predict the amplitude of tropical cyclone (TC)-induced sea surface temperature cooling (SSTC) from 12 predictors, and then employed SHAP to identify the contribution of each predictor to that amplitude, which made the method explainable at the attribute level and helped interpret the ocean–atmosphere interaction. Shadkani et al. (2024) [21] utilized SHAP analysis to investigate the contribution of each predictor to DO prediction, and the analysis revealed that temperature had the greatest contribution to DO prediction in the data from the Illinois River (ILL) and Des Plaines River (DP).

In addition, although complex models such as deep learning can improve prediction accuracy through automatic feature extraction, their black-box nature obscures the contribution of key ecological parameters, posing a dual challenge of interpretability (Rudin, 2019) and adaptability when applied across regions [22].

The relative contributions of key environmental factors, such as Tem, Sal, pH and Chl-a, to DO dynamics remain underexplored. DO dynamics are governed by the synergistic effects of Tem, Sal, pH, and Chl-a [23]. These interactions involve complex physical–chemical–biological coupling processes: Tem directly modulates oxygen solubility while simultaneously enhancing photosynthetic activity. As the Sal of water bodies increases, the saturation of DO will decrease accordingly. Chl-a concentration, representing phytoplankton biomass, drives diurnal oxygen fluctuations through daytime photosynthetic production and nighttime respiratory consumption (concurrently altering pH via CO₂ release). Seasonal typhoons and rainfall events significantly influence Sal profiles and vertical mixing, thereby affecting DO distribution.

Future research needs to integrate multi-source data (e.g., remote sensing, in situ sensors) with adaptive parameter optimization algorithms (e.g., reinforcement learning-based dynamic feature selection) to enhance the spatiotemporal generalization capability of models [24]. Meanwhile, developing “gray-box” models (e.g., physics-informed neural networks) that balance accuracy and interpretability will be a breakthrough direction [25].

To address the above issues, this study proposes a closed-loop technical framework of “parameter screening-model optimization-ecological interpretation” and uses the multi-source observational data of the Goji Island marine ranching to systematically analyze how input parameters regulate the DO prediction model. By integrating correlation analysis, principal component analysis (PCA), and Shapley Additive exPlanations (SHAP) value interpretation, a data-driven variable screening standard is established. This study reveals the path by which input parameter optimization affects model performance, providing theoretical support for shifting DO prediction models from “accuracy-oriented” to “accuracy-interpretability-adaptability synergistic optimization”. This methodology not only enhances the reliability of the intelligent monitoring of DO in marine ranching but also provides a general paradigm for ecological model optimization under the fusion of multi-source heterogeneous data.

2. Materials and Methods

2.1. Data Sources

The data were obtained from the monitoring platform deployed at the Goji Island Marine Ranching (30°42′ N 122°46′ E), Zhejiang Province. This marine ranching is located on the continental shelf of the East China Sea, characterized by a subtropical monsoon climate and abundant fishery resources. It primarily cultivates shellfish, algae, and fish species such as large yellow croaker and sea bass, serving as an important aquaculture base in the East China Sea. The monitoring platform carried sensors deployed at the sea surface (dissolved oxygen, conductivity-temperature-depth (CTD), pH, and chlorophyll-turbidity sensors integrated on the data collector; Institute of Oceanographic Instrumentation, Shandong, China), capable of continuously measuring six key ecological factors, including Tem, DO, and Sal (Figure 1). Parameters were sampled hourly, and, during operation, field personnel regularly collected water samples for laboratory analysis to verify the accuracy of the sensor data.

As a typical aquaculture area in the East China Sea, the multi-source observational data from the marine ranching on Goji Island can reflect the complex dynamics of the nearshore ecosystem, providing a robust data foundation for this study.

The data utilized in this article cover the period from 22 September 2022, at 12:00 to 24 October 2022, at 11:00, recorded at hourly intervals, totaling 767 ecological data sets. Each data set encompasses six principal ecological parameters: Tem, Sal, pH, DO, Chl-a, and Tur.

The raw data contained some missing and erroneous values attributable to environmental and communication factors, with a missing rate below 3%. Missing entries were imputed by cubic spline interpolation, chosen because it preserves the continuity of the parameters, and the completed series were then normalized [26], yielding a continuous observation record.
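The preprocessing order described above (gap filling, then normalization) can be sketched as follows. This is only an illustration under stated assumptions: linear interpolation is used here as a simpler stand-in for the cubic spline interpolation applied in the study, and min-max scaling is assumed because the text does not state which normalization was used. The function names and the sample values are hypothetical.

```python
def fill_gaps_linear(series):
    """Fill None entries by linear interpolation between the nearest observed
    neighbours (a stand-in for the cubic spline used in the study)."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    for i, v in enumerate(filled):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:            # gap at the start: copy first observation
            filled[i] = filled[right]
        elif right is None:         # gap at the end: copy last observation
            filled[i] = filled[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


def min_max_normalize(series):
    """Scale values to [0, 1]; min-max scaling is an assumption, not the paper's stated method."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


# Illustrative hourly DO readings (mg/L) with one missing value.
do_raw = [6.0, None, 7.0]
do_filled = fill_gaps_linear(do_raw)
do_scaled = min_max_normalize(do_filled)
```

Filling gaps before scaling keeps the minimum and maximum used for normalization tied to the completed series rather than to a partially observed one.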

2.2. Methodologies for Research

The methodological framework of this study is depicted in Figure 2. First, the Pearson and Spearman correlation coefficients were computed to analyze the relationships among the ecological variables (MATLAB R2024a). Pearson assesses linear dependencies, which suits variable pairs such as Tem and DO, while Spearman captures monotonic, possibly nonlinear relationships and is less sensitive to outliers. Using both coefficients accounts for linear and nonlinear patterns and gives a more robust picture of variable interactions. The correlation results were used to identify the parameters most strongly associated with DO and to define the reference input groups: one group excludes the weakly correlated parameter, and a control group excludes a moderately correlated parameter.
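To make the difference between the two coefficients concrete, the following minimal sketch computes both from scratch. It is not the MATLAB routine used in the study; the function names and the five Tem/DO pairs are illustrative assumptions. Spearman is obtained as the Pearson coefficient of the ranks, with ties given average ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative values only: DO rises monotonically as Tem falls.
tem = [25.0, 24.5, 24.0, 22.0, 21.0]
do = [6.1, 6.2, 6.4, 6.9, 7.1]
r_pearson = pearson(tem, do)
r_spearman = spearman(tem, do)
```

In this toy case the relationship is perfectly monotonic but not perfectly linear, so the Spearman coefficient reaches −1 while the Pearson coefficient stays slightly above it.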

Subsequently, PCA (MATLAB R2024a) was applied to distill the most salient features from the high-dimensional data. PCA enhances the model’s generalization capability by extracting principal components from the data, thereby reducing redundant variables and noise.

Next, we split the data set into training and testing sets in an 8:2 ratio and fed the training set into three machine learning models: SVM, Multilayer Perceptron (MLP), and RF (MATLAB R2024a). The predicted DO values were then compared with the measured values.
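A minimal sketch of the 8:2 partition is given below. The text does not say whether the split was random or chronological; a chronological split (earliest 80% for training) is assumed here because the records form an hourly time series, and the function name is hypothetical.

```python
def split_train_test(records, train_fraction=0.8):
    """Chronological split: the first `train_fraction` of records train the
    model and the remainder is held out for testing (assumed, not confirmed)."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


# 767 hourly records, represented here only by their indices.
records = list(range(767))
train, test = split_train_test(records)
```

With 767 records this yields 613 training and 154 testing records.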

The three models each possess distinct advantages and disadvantages. The RF model excels in handling multi-dimensional features, albeit with relatively high computational complexity. The MLP is well suited for large-scale datasets but is prone to overfitting. The SVM performs effectively with small sample sizes, yet its capability to process nonlinear data is limited.

This process significantly reduces the risk of overfitting by addressing two key aspects. First, the correlation analysis eliminated redundant or low-relevance parameters, ensuring that only the most meaningful features were retained. This reduced the model’s complexity and prevented it from capturing noise or irrelevant patterns in the data. Second, the PCA step further compressed the feature space by transforming the input variables into a set of uncorrelated principal components, which retain the most significant variance in the data. This dimensionality reduction not only enhances computational efficiency but also mitigates the risk of overfitting by focusing on the most informative components.

In the final phase of the experiments, the SVM, MLP, and RF algorithms were used for prediction. Table 1 presents a systematic comparison of the three machine learning methods.

After the predictions were obtained, the performance of each model was analyzed using the evaluation metrics described below.

This article employs three metrics to assess the model’s performance: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Determination (R²) (MATLAB R2024a) [27].

The RMSE quantifies the deviation between predicted and observed values and is highly sensitive to outliers: when a predicted value deviates substantially from the actual value, the RMSE increases markedly.

In contrast, the MAE circumvents the issue of error cancelation, thereby providing a more accurate representation of the actual prediction error. This characteristic makes MAE particularly valuable in scenarios where error compensation might obscure the true magnitude of prediction inaccuracies.

The R² offers a standardized measure of predictive accuracy. As this metric approaches unity, it indicates an increasingly close correspondence between predicted and actual values, with a value of 1 representing perfect predictive accuracy. This statistical measure is particularly useful for assessing the proportion of variance in the dependent variable that is predictable from the independent variable(s).
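The three metrics can be written out explicitly. The sketch below uses their standard definitions rather than the MATLAB code of the study; the function names and the three sample values are illustrative assumptions.

```python
import math


def rmse(observed, predicted):
    """Root mean square error: squaring makes large errors dominate."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error: absolute values prevent positive and negative errors from cancelling."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r2(observed, predicted):
    """Coefficient of determination: 1 minus residual variation over total variation."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative DO values (mg/L), not data from the study.
observed = [6.0, 6.5, 7.0]
predicted = [6.1, 6.4, 7.0]
scores = (rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

For these values the errors are −0.1, 0.1, and 0, so the MAE is about 0.067, the RMSE about 0.082, and R² is 0.96.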

2.3. Input Parameter Selection Mechanism

The input parameter selection was based on comprehensive correlation analysis using both Pearson and Spearman correlation coefficients to evaluate the relationships between potential predictors and DO concentrations. Parameters with the lowest absolute correlation values (|r| ≤ 0.5) were identified as having relatively weaker predictive power [28].

To systematically assess the impact of parameter selection, we established three distinct experimental groups:

Group a (Baseline): Full input parameter set (Tem, Sal, Chl-a, pH, Tur), with DO as the prediction target.

Group b (Low-correlation Excluded): Removed the parameter with the lowest correlation.

Group c (Control): Intentionally excluded a parameter with moderate correlation.

This tri-group experimental design enables rigorous validation of parameter selection effects through comparative analysis of model performance metrics (RMSE, MAE, R²). The control group serves as a critical benchmark to verify that the exclusion of low-correlation parameters is not coincidental but statistically justified.
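The three groups can be derived mechanically from the correlations with DO, as in the sketch below. The Pearson values for Tem, Sal, Chl-a, and pH are those reported in Section 3.2; the value for Tur is a placeholder (the text states only that it is the weakest), and the choice of Sal as the control parameter follows the study's design. The function name is hypothetical.

```python
def build_input_groups(correlations, control_parameter):
    """Return groups a (all inputs), b (weakest |r| removed) and
    c (the chosen moderate-correlation control parameter removed)."""
    parameters = list(correlations)
    weakest = min(parameters, key=lambda p: abs(correlations[p]))
    if control_parameter == weakest:
        raise ValueError("control parameter must differ from the weakest one")
    return {
        "a": parameters,
        "b": [p for p in parameters if p != weakest],
        "c": [p for p in parameters if p != control_parameter],
    }


pearson_with_do = {
    "Tem": -0.876,
    "Sal": -0.386,
    "Chl-a": 0.281,
    "pH": 0.491,
    "Tur": 0.05,  # placeholder: the text gives no value, only that Tur is weakest
}
groups = build_input_groups(pearson_with_do, control_parameter="Sal")
```

Group b then contains Tem, Sal, Chl-a, and pH, and group c contains Tem, Chl-a, pH, and Tur.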

3. Results and Discussion

3.1. Data Analysis

Figure 3 illustrates that, during the monitoring period, sea surface temperature decreased gradually, from an average of 24.99 °C on 22 September to 20.93 °C on 24 October. The DO concentration increased from an average of 6.14 mg/L on 22 September to 7.10 mg/L on 24 October, showing an inverse relationship with Tem. In the autumn of 2022, Zhoushan encountered a significant cold air intrusion from 4 to 7 October, resulting in a rapid decrease in air temperature of 7–8 °C, accompanied by coastal winds reaching force 8–9. This corresponds with the observed decline in sea surface temperature, which generally lags air temperature by 1–3 days.

The cooling, vigorous winds, and precipitation triggered extensive mixing and convection within the surface water layer. On 3 October, the minimum sea surface salinity was recorded, presumably due to the low-Sal coastal currents induced by vigorous winds. On 4 October, DO levels dropped to a minimum of 5.35 mg/L. Sal exhibited notable fluctuations over the subsequent days. Although Chl-a, pH, and Tur levels decreased, an inverse relationship of Sal and pH with Tem was observed, accompanied by increased DO levels, potentially due to enhanced nutrient mixing from coastal currents increasing Chl-a and Tur levels. Analysis of the environmental trends in the marine ranching shows that, prior to the cooling period (late summer, before 4 October), which was characterized by elevated Tem, abundant sunlight, and a relatively stable aquatic environment, the correlation between DO and Tem is difficult to ascertain. During this period, specific correlations appear among monitoring parameters such as Sal, Chl-a, Tur, and pH, reflecting the combined influence of environmental Sal and marine organisms. Subsequently, with the onset of cold air during autumn, when Tem dropped below 24.27 °C, a more pronounced inverse correlation between Tem and DO emerged. This pattern supports the rationality and reliability of the data. At the same time, variations in Sal and Chl-a continue to influence DO.

The overarching pattern of these data aligns with the findings on the spatiotemporal distribution characteristics of DO in the region’s nearshore waters [29]. During summer, increased Tem due to seasonal warming and the influence of the Kuroshio Current create optimal conditions for phytoplankton photosynthesis. In contrast, during autumn, biological activity diminishes, and the Kuroshio Current’s influence progressively weakens. Thus, the DO levels are predominantly responsive to physical factors like Tem and Sal, while the northeastward-moving coastal current, carrying low-Sal water, predominantly influences the nearshore regions. The factors influencing DO are investigated further below using correlation analysis, PCA, and partial dependence analysis.

3.2. Relevance Analysis

This research utilized both Pearson and Spearman correlation coefficients to refine the primary feature datasets and conduct correlation analysis on the critical ecological variables. Figure 4 demonstrates a substantial alignment between the Pearson and Spearman coefficients, with the correlation between Tem and DO as the most pronounced; the Pearson coefficient is −0.876, and the Spearman coefficient is −0.810, signifying a pronounced negative correlation. The pH value exhibits a secondary, yet significant, positive correlation with DO, evidenced by a Pearson coefficient of 0.491 and a Spearman coefficient of 0.662. Sal demonstrates a negative correlation with DO levels, with Pearson and Spearman coefficients of −0.386 and −0.599, respectively. The Pearson and Spearman coefficients for Chl-a and DO are 0.281 and 0.223, respectively, indicating a relatively modest association, while Tur exhibits the weakest relationship. These findings concur with Han et al.’s (2024) [30] research on Goji Island, indicating that Tur exhibits a weaker association with DO relative to other ecological parameters.

Furthermore, Sal and pH demonstrate an inverse relationship, aligning with observations in Figure 3 where an increase in Chl-a correlates with decreased Sal. Low-Sal coastal currents carry significant land nutrients, promoting phytoplankton growth. Chl-a, indicative of phytoplankton biomass, absorbs CO₂ and generates O₂ through photosynthesis. While there exists a negative link between Chl-a and pH, both parameters show positive correlations with DO.

The comparative analysis of the Pearson and Spearman coefficients for environmental features displays a notable consistency with studies conducted in areas such as the Yangtze River Estuary and Shenzhen Bay [31,32,33,34,35].

3.3. Model Prediction Result and Analysis

3.3.1. PCA and Input Parameter Optimization

According to the outcomes of the relevance analysis, the input parameters for model development are divided into three groups, a, b, and c, as shown in Table 2. Group a encompasses all input parameters: Tem, Sal, Chl-a, pH, and Tur. Group b includes Tem, Sal, Chl-a, and pH, excluding Tur. Group c is the comparison set that omits Sal, a parameter with an intermediate loading, leaving Tem, Chl-a, pH, and Tur as inputs.

Each of these groups was then processed with PCA. The results for group a-PCA are presented in Table A1 and Table A2; findings for group b-PCA are elaborated in Table A3 and Table A4; and the outcomes for group c-PCA are depicted in Table A5 and Table A6.

x_{i,\mathrm{new1}} = 0.279 \times x_{i,\mathrm{pH}} - 0.215 \times x_{i,\mathrm{Sal}} - 0.423 \times x_{i,\mathrm{Tem}} + 0.320 \times x_{i,\text{Chl-a}} + 0.396 \times x_{i,\mathrm{Tur}}

(1)

x_{i,\mathrm{new2}} = -0.587 \times x_{i,\mathrm{pH}} - 0.245 \times x_{i,\mathrm{Sal}} + 0.362 \times x_{i,\mathrm{Tem}} + 0.502 \times x_{i,\text{Chl-a}} + 0.263 \times x_{i,\mathrm{Tur}}

(2)

Here, $x_{i,\mathrm{new1}}$ represented the first new feature extracted by PCA for group i, $x_{i,\mathrm{new2}}$ represented the second new feature extracted by PCA for group i, and $x_{i,\mathrm{pH}}$ represented the pH data for group i. $x_{i,\mathrm{new1}}$ and $x_{i,\mathrm{new2}}$ were the PCA result of a.

x_{i,\mathrm{new3}} = -0.475 \times x_{i,\mathrm{pH}} + 0.518 \times x_{i,\mathrm{Tem}} - 0.270 \times x_{i,\text{Chl-a}} + 0.306 \times x_{i,\mathrm{Sal}}

(3)

x_{i,\mathrm{new4}} = 0.405 \times x_{i,\mathrm{pH}} - 0.266 \times x_{i,\mathrm{Tem}} - 0.589 \times x_{i,\text{Chl-a}} + 0.559 \times x_{i,\mathrm{Sal}}

(4)

Here, $x_{i,\mathrm{new3}}$ represented the first new feature extracted by PCA for group i, $x_{i,\mathrm{new4}}$ represented the second new feature extracted by PCA for group i, and $x_{i,\mathrm{pH}}$ represented the pH data for group i. $x_{i,\mathrm{new3}}$ and $x_{i,\mathrm{new4}}$ were the PCA result of b.

x_{i,\mathrm{new5}} = 0.312 \times x_{i,\mathrm{pH}} - 0.454 \times x_{i,\mathrm{Tem}} + 0.311 \times x_{i,\text{Chl-a}} + 0.430 \times x_{i,\mathrm{Tur}}

(5)

x_{i,\mathrm{new6}} = -0.582 \times x_{i,\mathrm{pH}} + 0.324 \times x_{i,\mathrm{Tem}} + 0.532 \times x_{i,\text{Chl-a}} + 0.379 \times x_{i,\mathrm{Tur}}

(6)

Here, $x_{i,\mathrm{new5}}$ represented the first new feature extracted by PCA for group i, $x_{i,\mathrm{new6}}$ represented the second new feature extracted by PCA for group i, and $x_{i,\mathrm{pH}}$ represented the pH data for group i. $x_{i,\mathrm{new5}}$ and $x_{i,\mathrm{new6}}$ were the PCA result of c.
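Each PCA feature is a weighted sum of the (normalized) input parameters. The sketch below applies the coefficients of Equations (1) and (2) for group a exactly as printed; it assumes that the index i denotes a single hourly record and that the inputs have already been normalized. The function name and the test records are hypothetical.

```python
# Coefficients of Equations (1) and (2) for group a, copied from the text.
LOADINGS_A = {
    "new1": {"pH": 0.279, "Sal": -0.215, "Tem": -0.423, "Chl-a": 0.320, "Tur": 0.396},
    "new2": {"pH": -0.587, "Sal": -0.245, "Tem": 0.362, "Chl-a": 0.502, "Tur": 0.263},
}


def project(record, loadings):
    """Compute each new feature as the weighted sum of the record's parameters."""
    return {
        name: sum(weight * record[param] for param, weight in weights.items())
        for name, weights in loadings.items()
    }


# A record whose only non-zero input is Tem isolates the Tem coefficients.
tem_only = {"pH": 0.0, "Sal": 0.0, "Tem": 1.0, "Chl-a": 0.0, "Tur": 0.0}
features_tem_only = project(tem_only, LOADINGS_A)

# A record with every input equal to 1 returns the sum of the coefficients.
all_ones = {param: 1.0 for param in tem_only}
features_all_ones = project(all_ones, LOADINGS_A)
```

The same function applies to groups b and c once the coefficients of Equations (3)–(6) are substituted for `LOADINGS_A`.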

3.3.2. Machine Learning Models

Three distinct machine learning models use the input groups a, b, c, a-PCA, b-PCA, and c-PCA. Figure 5 depicts the resultant outputs, while Table 3 enumerates the evaluation metrics.

Figure 5 and Table 3 demonstrate significant variations in RMSE, MAE, and R² values across the experimental groups a, b, and c. Removing Tur, which has a low loading, from the inputs (group b) significantly enhances the accuracy of the prediction model, whereas excluding Sal, which has an intermediate loading (group c), results in a substantial decline in prediction accuracy. This comparison underscores the importance of optimizing and selecting input groups to enhance the accuracy of the prediction model.

For instance, the exclusion of Tur from input group b in certain experimental setups underscores the RF model’s robustness in managing multi-dimensional properties, as it maintains high predictive performance despite the absence of a potentially influential feature. Consistent with this robustness, the RF model has been used extensively to predict DO levels in seawater, with favorable results [31,36,37].

Moreover, a comparison of a-PCA, b-PCA, and c-PCA groups, through feature extraction to refine input parameters, indicates that the b-PCA-RF group, which excludes Tur, yielded the most favorable outcomes. The RMSE, MAE, and R² values for this group were 0.039, 0.030, and 0.884, respectively. This represents improvements of 45.5%, 28.6%, and 3.3% over the a-PCA-RF group and 59.4%, 58.9%, and 252.2% over the c-PCA-RF group. This highlights that optimizing and selecting input parameters according to loadings can substantially enhance the accuracy of prediction models.
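The percentages above are relative changes with respect to the comparison group. As a consistency check, the sketch below back-calculates the c-PCA-RF R² implied by the stated 252.2% gain and converts it into a relative reduction. The implied value is derived from the percentages in the text, not read from Table 3, and the function names are hypothetical.

```python
def relative_gain_pct(baseline, improved):
    """Percentage increase of `improved` over `baseline` (used for R²)."""
    return (improved - baseline) / baseline * 100.0


def relative_reduction_pct(baseline, reduced):
    """Percentage decrease of `reduced` relative to `baseline`."""
    return (baseline - reduced) / baseline * 100.0


r2_b_pca_rf = 0.884          # reported for b-PCA-RF
gain_over_c_pct = 252.2      # reported R² gain of b-PCA-RF over c-PCA-RF

# Invert the gain formula to obtain the c-PCA-RF R² implied by the text.
r2_c_implied = r2_b_pca_rf / (1.0 + gain_over_c_pct / 100.0)

# Express the same difference as the loss incurred when Sal is removed.
r2_drop_pct = relative_reduction_pct(r2_b_pca_rf, r2_c_implied)
```

The implied c-PCA-RF R² is about 0.251, and the corresponding drop from 0.884 is about 71.6%, which matches the R² reduction rate quoted in the Abstract.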

When comparing group a with a-PCA and group b with b-PCA, models that use PCA for feature extraction generally outperform the models trained on the original inputs. This holds across the three machine learning models, suggesting that PCA may efficiently eliminate noise and redundant information by retaining the most significant features, hence enhancing data quality. PCA diminishes the interdependence among features by transforming the data into a new feature space, reducing the risk of overfitting and boosting the efficacy of the algorithms, especially the RF model. Nonetheless, PCA did not improve outcomes for group c and indeed worsened accuracy: removing Sal, a crucial parameter, had already reduced the available information, and PCA compression of the remaining features introduced additional noise. Overall, RF combined with PCA typically outperforms the other models across the input groups and achieves the best results when Tur is omitted, whereas removing Sal significantly compromises the accuracy of the predictive model.

3.3.3. Taylor Diagram and Violin Plot Analysis

To assess the reliability and efficacy of the proposed model, the Taylor diagram, which is widely used for model evaluation, was selected for result analysis [38,39]. Figure 6 shows the R², RMSE, and standard deviation between actual values and predictions for the 18 models (six input groups combined with three algorithms). The b-PCA-RF group demonstrates superior performance compared to all other models, with the standard deviation of its predicted data closely matching that of the actual data, indicating the model’s consistency in capturing observed data patterns.
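In its conventional form, a Taylor diagram places each model according to three statistics that are linked by a single identity: the standard deviations of the observed and predicted series, their correlation coefficient R, and the centered root-mean-square difference E′, with E′² = σ_obs² + σ_pred² − 2σ_obs σ_pred R. The sketch below computes these quantities; it follows the standard definition rather than the plotting code of the study, and the function name and sample values are illustrative assumptions.

```python
import math


def taylor_statistics(observed, predicted):
    """Return the statistics a conventional Taylor diagram displays."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    dev_obs = [o - mean_obs for o in observed]
    dev_pred = [p - mean_pred for p in predicted]
    std_obs = math.sqrt(sum(d * d for d in dev_obs) / n)
    std_pred = math.sqrt(sum(d * d for d in dev_pred) / n)
    corr = sum(a * b for a, b in zip(dev_obs, dev_pred)) / (n * std_obs * std_pred)
    # Centered RMS difference: RMSE after removing each series' mean.
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_obs, dev_pred)) / n)
    return {"std_obs": std_obs, "std_pred": std_pred, "corr": corr, "crmsd": crmsd}


# Illustrative DO values (mg/L), not data from the study.
observed = [6.0, 6.5, 7.0, 7.2]
predicted = [6.1, 6.4, 6.9, 7.3]
stats = taylor_statistics(observed, predicted)
```

Because of the identity, a model whose predicted standard deviation is close to the observed one and whose correlation is high necessarily has a small centered RMS difference, which is why such models sit close to the reference point on the diagram.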

The violin plot shows the disparity between actual and predicted data for each model and makes the probability distribution of the data visible. Figure 7 illustrates that the model whose predictions most closely approximate the actual values is the b-PCA-RF group, as its predicted data exhibit a distribution similar to that of the actual data.

Specifically, the median of Group 15 (6.96 mg/L) is close to the true median (6.99 mg/L), with the lower half of its distribution nearly overlapping the true values. This indicates improved accuracy for lower DO concentrations. However, the upper half shows slight deviation, suggesting room for refinement in predicting higher DO levels. Graphical analysis highlights Group 15 as a strong candidate, particularly for lower concentration ranges.

In conclusion, the b-PCA-RF group, utilizing Tem, Chl-a, pH, and Sal as the input group, effectively addresses challenges such as inadequate generalization, local optima, underfitting, and overfitting compared to conventional machine learning techniques, thereby improving the precision of DO predictions. The optimized model is more adept at accommodating extended time frames and limited sample sizes, as evidenced by the water quality data discussed in this paper.

3.3.4. Predictor Importance Analysis

This research applies a feature importance analysis to the a-RF model, which uses the full-parameter input group, to evaluate the predictive significance of each input parameter. The predictor importance function assesses the significance of each feature (predictor variable) within the model, which helps identify the features that most influence predictive accuracy and thereby supports feature selection and optimization. Its output is a vector of the relative contributions of each feature to the model’s predictive performance; a higher score indicates a more important feature. In Figure 8, the x-axis shows the feature names, each bar corresponds to one feature, and the y-axis gives the importance scores. Tem has the highest score, signifying its paramount contribution to the model’s predictions, followed by pH and Sal. Chl-a exerts little influence, and Tur contributes the least, consistent with the correlation analysis in Section 3.2.
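The idea of scoring each feature by its contribution to predictive performance can be illustrated with permutation importance, sketched below. This is a different, model-agnostic technique and not the predictor importance function applied to the RF model in the study: a feature's column is permuted and the resulting increase in mean squared error is taken as its score. The toy model, the data, and the use of a deterministic cyclic shift in place of random shuffling are all assumptions made for illustration.

```python
def mean_squared_error(model, rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, feature):
    """Increase in MSE after permuting one feature's column.
    A cyclic shift by one position stands in for random shuffling."""
    column = [r[feature] for r in rows]
    shifted = column[1:] + column[:1]
    permuted_rows = [{**r, feature: v} for r, v in zip(rows, shifted)]
    return (mean_squared_error(model, permuted_rows, targets)
            - mean_squared_error(model, rows, targets))


def toy_model(row):
    """Hypothetical model: DO depends on Tem only and ignores Tur."""
    return 9.0 - 0.1 * row["Tem"]


rows = [
    {"Tem": 25.0, "Tur": 1.2},
    {"Tem": 24.0, "Tur": 0.8},
    {"Tem": 22.0, "Tur": 1.5},
    {"Tem": 21.0, "Tur": 0.9},
]
targets = [toy_model(r) for r in rows]
importance = {f: permutation_importance(toy_model, rows, targets, f) for f in ("Tem", "Tur")}
```

A feature the model does not use, such as Tur in this toy case, receives a score of zero, while permuting Tem degrades the predictions and yields a positive score.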

Previous research has demonstrated that oxygen solubility in seawater decreases as Tem increases. Research indicates that global warming is leading to a decline in DO levels in the world’s oceans. DO in seawater is affected by both Tem and Sal; an increase in Sal results in a decrease in oxygen solubility in water. Furthermore, phytoplankton photosynthesis, which consumes CO₂ and generates O₂, explains the positive correlation of both Chl-a and pH with DO levels. This is consistent with the analytical results from the employed models. More experimental evidence is needed to clarify the impact of Tur on DO levels. Consequently, omitting Tur as an input variable in predictive models may enhance the accuracy of DO level predictions.

3.3.5. Partial Dependency Analysis

The partial dependence plot (PDP) visualizes the marginal effect of a specific feature on model predictions by averaging over the values of the other variables. In this study, the PDP was used to clarify how key ecological parameters, such as Tem and pH, influence DO levels. This tool enhances model interpretability, providing insights into critical factors driving DO dynamics and supporting ecological research. Figure 9 displays the partial dependence plots for five parameters of the a-RF model, with the x-axis indicating feature values and the y-axis representing predicted DO levels.
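The computation behind a partial dependence curve can be sketched as follows: for each grid value of the feature of interest, that value is substituted into every record, the model is evaluated, and the predictions are averaged. The toy model and data are hypothetical and serve only to show the mechanics; the vertical extent of the resulting curve corresponds to the y-axis range discussed for each panel of Figure 9.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction when `feature` is fixed at each grid value
    while all other features keep their observed values."""
    curve = []
    for value in grid:
        predictions = [model({**row, feature: value}) for row in rows]
        curve.append(sum(predictions) / len(predictions))
    return curve


def toy_model(row):
    """Hypothetical model: DO falls with Tem and rises with pH."""
    return 8.0 - 0.1 * row["Tem"] + 0.1 * row["pH"]


rows = [{"Tem": 24.5, "pH": 8.0}, {"Tem": 21.0, "pH": 8.2}]
tem_grid = [20.0, 22.5, 25.0]
curve = partial_dependence(toy_model, rows, "Tem", tem_grid)
y_range = max(curve) - min(curve)
```

A larger y-axis range means that varying the feature alone moves the averaged prediction more, which is how the panels are compared in the paragraphs that follow.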

Figure 9a presents the partial dependence plot of Tem, showing a range of approximately 0.75 on the y-axis, indicating that Tem can significantly influence DO levels. The findings reveal that, below 24.27 °C, DO levels decrease as Tem rises, whereas, above this threshold, DO levels stabilize or may even increase with rising Tem; this corresponds to the trend depicted in Figure 3, illustrating the correlation between Tem and DO.

Figure 9b illustrates the pH dependence, revealing a y-axis range of around 0.35, which signifies a substantial influence of pH on DO levels. When the pH is below 8.17, DO levels positively correlate with pH. Above 8.17, DO exhibits a stable or declining trend as pH increases. Figure 3 illustrates that, at a pH of 9.27, beyond 9.17, there is a marked increase followed by a significant reduction, but DO initially decreases and subsequently rises, indicating a negative association with pH.

Figure 9c illustrates the partial dependence diagram of Sal, indicating that Sal has a measurable influence on DO, with a vertical axis range of around 0.3. DO typically decreases as Sal increases, aligning with the results shown in Figure 4. Figure 4 illustrates that both the Pearson and Spearman correlation coefficients between DO and Sal are negative, underscoring an inverse relationship between these variables. This elucidates why the DO prediction model’s precision diminished after Sal was removed.

The partial dependence plot of Chl-a displayed in Figure 9d exhibits significant fluctuations; however, as Chl-a levels increase, DO generally exhibits an upward trend. The vertical axis range is approximately 0.11, suggesting that Chl-a is not the primary driver of DO dynamics.

The partial dependence plot of Tur in Figure 9e shows considerable fluctuations without a discernible pattern, with the vertical axis range remaining below approximately 0.05. This implies that Tur has a negligible influence on DO and contributes minimally to its prediction. These observations are consistent with the findings presented in Section 3.2, indicating that Tur plays a minor role in predicting DO and may even exert a counteractive influence.

3.3.6. SHAP Analysis

SHAP is a technique for interpreting predictions from machine learning models, based on Shapley values from cooperative game theory. These values assign an importance score to each feature, elucidating its contribution to the model’s output [40]. Figure 10 summarizes the SHAP analysis of the five input parameters of the a-RF model. In this figure, the y-axis lists the ocean parameters in descending order of influence, while the x-axis quantifies each feature’s contribution to the model prediction: positive values raise the prediction, and negative values lower it. Each plotted point is the SHAP value of one sample, with the point’s color reflecting the magnitude of the feature value; high values are marked in yellow, and low values in blue.
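To make the Shapley idea concrete, the sketch below computes exact Shapley values by brute force over all feature subsets, replacing "absent" features with values from a background sample. This is an illustrative assumption-laden example, not the algorithm of any SHAP library and not the authors’ implementation; `toy_model`, the background rows, and the sample `x` are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, background):
    """Exact Shapley values of each feature of x, relative to a background set."""
    n = len(x)

    def value(subset):
        # Expected prediction when features in `subset` are fixed to x
        # and the remaining features are drawn from the background rows.
        total = 0.0
        for b in background:
            mixed = [x[i] if i in subset else b[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {i}) - value(s))
        phi.append(contrib)
    return phi

def toy_model(row):
    """Hypothetical linear model of DO (coefficients are made up)."""
    tem, ph, sal = row
    return 8.0 - 0.5 * tem + 0.3 * ph - 0.2 * sal

background = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]  # hypothetical reference samples
x = [3.0, 2.0, 1.0]                               # hypothetical sample to explain

phi = shapley_values(toy_model, x, background)
base = sum(toy_model(b) for b in background) / len(background)
print(phi, base + sum(phi), toy_model(x))
```

The contributions sum to the difference between the prediction for `x` and the average background prediction, which is the property that lets each point in a summary plot be read as a signed push on the predicted DO.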

Figure 10 illustrates that Tem is the most influential factor in predicting DO, exhibiting a significantly negative correlation. This supports the discussion in Section 3.2, which details the inverse relationship between Tem and DO. Furthermore, the SHAP analysis also highlights several other critical features affecting DO predictions. Sal demonstrates a negative effect, underscoring the necessity to monitor Sal levels in mariculture. Chl-a and pH levels positively influence DO, reinforcing findings in Section 3.2. Tur exerts the least influence on the DO prediction, with its positive contribution negligible, thereby supporting the prior conclusion that Tur plays a negligible role in this context.

4. Conclusions

4.1. Ecological Mechanism Analysis

Based on in situ monitoring data from the marine ranching construction project in Goji Island, East China Sea, six key parameters—Tem, Sal, DO, Chl-a, pH, and Tur—were selected for high-frequency monitoring campaigns. This selection aligns with ecological protection requirements, technical specifications, and cost-control considerations to assess ecosystem health. The ecological functions of these parameters are defined as follows: Tem regulates marine organisms’ metabolic rates and biogeographic distribution; Sal influences species composition and cellular activity via osmotic pressure; DO serves as the core indicator for aerobic respiration and energy metabolism; pH governs enzymatic activity and carbonate system equilibrium; Chl-a quantifies phytoplankton biomass and primary productivity; Tur reflects suspended particulate concentrations.

This study utilizes observational data obtained from marine aquaculture near Goji Island during the transitional period between late summer and early autumn (September to October). This timeframe is characterized by significant fluctuations in marine environmental factors, during which DO is synergistically regulated by Tem, Sal, pH, and Chl-a. On this basis, the ecological interpretation of the input parameters in the DO prediction model is as follows:

It is widely recognized that Tem exhibits a negative correlation with DO. Consistent with this observation, the PDP analysis of the predictive model suggests a potential strengthening of the correlation between Tem and DO as Tem decreases. This suggests that declining autumn Tem could become a predominant factor influencing DO.

An increase in Sal typically leads to a reduction in the saturation of DO in water. Additionally, Sal can indirectly reflect phenomena such as water mixing or stratification (e.g., freshwater from terrestrial sources or bottom water), which also affect the concentration of DO in the water body. In this study, we experimented with removing Sal as an input parameter, and the results indicated that the absence of Sal significantly impacts the accuracy of the outcomes. Therefore, Sal is a parameter that requires close attention.

The pH of seawater represents its acidity or alkalinity, which influences biological activity in aquatic systems. For instance, CO₂ uptake during phytoplankton photosynthesis, or direct effects of pH on microbial activity, may establish correlations between pH and DO, consistent with existing research findings. Therefore, monitoring seawater pH and DO is critical for assessing aquatic health and providing early warnings for hypoxia or acidification events, particularly in aquaculture and coral reef conservation.

As the season progresses, the decline in Chl-a concentration leads to a corresponding reduction in its contribution to the predicted DO. Chl-a serves as an indicator of phytoplankton abundance in seawater. During the summer months, when sunlight is abundant and Tem is elevated, phytoplankton engage in photosynthesis, absorbing CO₂ and generating O₂, while simultaneously consuming oxygen. The analytical results of this study corroborate that the weight of Chl-a in the prediction of DO diminishes as its concentration decreases with the changing seasons.

Taking ecological factors into account, Tur in seawater primarily reflects the concentration of suspended particulate matter. The composition of suspended matter in seawater is complex, potentially encompassing both inorganic and organic substances. If the suspended matter is organic and biologically active, it may participate in photosynthesis or consume oxygen during decomposition processes, whereas inorganic matter may consist of nutrients or pollutants. Given that current monitoring methods are unable to precisely identify the predominant components of suspended matter, the mechanisms by which Tur influences DO concentrations remain unclear. Based on preliminary data analysis from the selected region and monitoring period of this study, it can only be inferred that the impact of Tur on DO is relatively minor. Nonetheless, in the practical computational process, the exclusion of Tur can reduce noise within the model and enhance the accuracy of predictions. Further analysis of Tur’s composition could potentially provide a more robust ecological explanation.

Sensor limitations precluded the integration of external drivers (light intensity, air pressure, wind speed) regulating photosynthesis, air–sea exchange, and surface mixing. While our model captures physics–chemistry–biology couplings, predictive stability under extreme meteorological conditions requires enhanced multi-source data fusion.

4.2. Discussion

This study establishes an integrated framework encompassing parameter selection, model optimization, and ecological analysis to elucidate key regulatory mechanisms in seawater DO prediction models and their ecological applications.

1. Input Parameter Optimization

Comparative analysis of monitoring data from the Goji Island marine ranching demonstrates that optimized parameter selection significantly enhances prediction accuracy. Correlation analysis (Pearson/Spearman coefficients) and PCA revealed minimal Tur-DO correlation (r < 0.15). Subsequent model comparisons (SVM, MLP, RF) showed that the PCA-RF model excluding Tur (b-PCA) outperformed the others, with RMSE = 0.039, MAE = 0.030, and R² = 0.884, corresponding to improvements of 29.1%, 28.6%, and 3.3%, respectively, over the full-parameter PCA-RF model (a-PCA: RMSE = 0.055, MAE = 0.042, R² = 0.856; Table 3). Notably, despite the only moderate Sal-DO correlation (Pearson r = 0.42), omitting Sal severely degraded performance (RMSE = 0.096, MAE = 0.073, R² = 0.251), confirming its critical physicochemical regulatory role.
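The relative changes between input groups can be recomputed directly from the RF rows of Table 3. The short sketch below does this arithmetic; the dictionary layout and the helper name `pct_change` are illustrative choices, while the metric values are copied from Table 3.

```python
# RF results for the PCA-transformed input groups, copied from Table 3.
table3_rf = {
    "a-PCA": {"RMSE": 0.055, "MAE": 0.042, "R2": 0.856},  # all five parameters
    "b-PCA": {"RMSE": 0.039, "MAE": 0.030, "R2": 0.884},  # Tur excluded
    "c-PCA": {"RMSE": 0.096, "MAE": 0.073, "R2": 0.251},  # Sal excluded
}

def pct_change(new, ref):
    """Signed percentage change of `new` relative to `ref`."""
    return 100.0 * (new - ref) / ref

full, no_tur, no_sal = table3_rf["a-PCA"], table3_rf["b-PCA"], table3_rf["c-PCA"]

# Error metrics improve when they fall; R² improves when it rises.
rmse_gain = -pct_change(no_tur["RMSE"], full["RMSE"])
mae_gain = -pct_change(no_tur["MAE"], full["MAE"])
r2_gain = pct_change(no_tur["R2"], full["R2"])

# Drop in R² when Sal is removed, relative to the best model (b-PCA RF).
r2_loss_no_sal = -pct_change(no_sal["R2"], no_tur["R2"])

print(round(rmse_gain, 1), round(mae_gain, 1), round(r2_gain, 1), round(r2_loss_no_sal, 1))
# → 29.1 28.6 3.3 71.6
```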

2. Parameter Importance Hierarchy

A multi-method assessment (Taylor diagrams, SHAP values, partial dependence analysis) established the parameter hierarchy:

Tem > pH > Sal > Chl-a > Tur

This hierarchy aligns with correlation and PCA results. Tem dominated DO variability, while Tur introduced model noise. The findings provide mechanistic insights into seasonal DO dynamics during summer–autumn transitions and guide the monitoring parameter selection for marine ranching.

3. Ecological Implications

This study highlights two operational principles for regional DO prediction: prioritize the real-time monitoring of Tem, Sal, pH, and Chl-a, and exclude Tur to enhance model robustness. Spatiotemporal heterogeneity in parameter weights suggests that seasonal adjustment mechanisms may further optimize predictive models. This parameter optimization strategy improves both the accuracy of the model and the ecological interpretability of DO dynamics.

The current research scope is constrained by the observational data derived from marine aquaculture monitoring systems, which may limit model generalizability and predictive accuracy. Specifically, the data set emphasizes locally measurable parameters (Tem, Sal, pH, Chl-a, Tur) while overlooking the external drivers of DO dynamics, such as meteorological and large-scale hydrological factors. This omission may reduce model applicability to marine ecosystems with distinct environmental regimes or intricate biogeochemical interactions.

To mitigate these limitations and strengthen the robustness of the framework, future studies will extend monitoring protocols to encompass broader environmental variables, incorporating remote sensing-derived parameters: wind speed (modulating air–sea gas exchange); atmospheric pressure (controlling interfacial oxygen flux); precipitation (modifying Sal gradients via freshwater influx); and photosynthetically active radiation (regulating phytoplankton-mediated primary production). Integrating these variables will enhance the model’s capacity to resolve the multifaceted drivers of DO variability in spatially heterogeneous and temporally dynamic marine environments.

Author Contributions

Writing—original draft, W.L. and J.L.; methodology, J.L.; software, W.L.; validation, X.K. and Y.W.; formal analysis, J.L. and X.K.; investigation, J.L.; resources, J.L.; data curation, Y.W.; writing—original draft preparation, W.L.; writing—review and editing, X.K.; visualization, W.L.; supervision, X.K.; project administration, J.L.; funding acquisition, X.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Shandong Provincial Natural Science Foundation (ZR2020MD085) and the Taishan Scholars Program (tsqn202408288).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We thank the Special Project on “Fishery Water Environment Monitoring Equipment and Early Warning Technology” within China’s key R&D initiative “Blue Granary Technology Innovation” for the data provided.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

DO	Dissolved oxygen
RF	Random forest
Tem	Water temperature
PCA	Principal component analysis
Sal	Salinity
Chl-a	Chlorophyll-a
Tur	Turbidity
SVM	Support vector machine
MLP	Multilayer perceptron
R²	Coefficient of determination

Appendix A

Table A1. Eigenvalue and principal component contribution rate of the input group a.

Component	Initial Eigenvalues			Extraction Sums of Squared Loadings
	Eigenvalue	Variance %	Cumulative %	Eigenvalue	Variance %	Cumulative %
1	1.776	35.527	35.527	1.776	35.527	35.527
2	1.167	23.344	58.871	1.167	23.344	58.871
3	0.958	19.166	78.037
4	0.725	14.498	92.536
5	0.373	7.464	100.000
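The variance shares in Table A1 follow from the eigenvalues. The sketch below reproduces them under two assumptions that are not stated in this appendix: that PCA was run on standardized inputs (so the total variance equals the number of variables, consistent with the eigenvalues summing to about 5), and that components were retained when their eigenvalue exceeded 1 (the Kaiser criterion), which matches the two extracted components. Small differences from the tabulated percentages come from the rounding of the eigenvalues.

```python
# Eigenvalues of input group a, copied from Table A1.
eigenvalues = [1.776, 1.167, 0.958, 0.725, 0.373]

# Assumption: standardized inputs, so total variance = number of variables.
n_vars = len(eigenvalues)
variance_pct = [100.0 * ev / n_vars for ev in eigenvalues]

# Running total of explained variance.
cumulative_pct = []
running = 0.0
for share in variance_pct:
    running += share
    cumulative_pct.append(running)

# Assumption: Kaiser criterion, keep components with eigenvalue > 1.
retained = [k + 1 for k, ev in enumerate(eigenvalues) if ev > 1.0]

for k, (v, c) in enumerate(zip(variance_pct, cumulative_pct), start=1):
    print(f"PC{k}: variance {v:.2f}%  cumulative {c:.2f}%")
print("retained components:", retained)
```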

Table A2. Principal component load matrix of the input group a.

Parameters	Principal Component 1	Principal Component 2
pH	0.279	−0.587
Sal	−0.215	−0.245
Tem	−0.423	0.362
Chl-a	0.320	0.502
Tur	0.396	0.263

Table A3. Eigenvalue and principal component contribution rate of the input group b.

Component	Initial Eigenvalues			Extraction Sums of Squared Loadings
	Eigenvalue	Variance %	Cumulative %	Eigenvalue	Variance %	Cumulative %
1	1.512	37.808	37.808	1.512	37.808	37.808
2	1.120	27.997	65.805	1.120	27.997	65.805
3	0.798	19.939	85.744
4	0.570	14.256	100.000

Table A4. Principal component load matrix of the input group b.

Parameters	Principal Component 1	Principal Component 2
pH	−0.475	0.405
Tem	0.518	−0.266
Chl-a	−0.270	−0.589
Sal	0.306	0.559

Table A5. Eigenvalue and principal component contribution rate of the input group c.

Component	Initial Eigenvalues			Extraction Sums of Squared Loadings
	Eigenvalue	Variance %	Cumulative %	Eigenvalue	Variance %	Cumulative %
1	1.709	42.714	42.714	1.709	42.714	42.714
2	1.149	28.735	71.450	1.149	28.735	71.450
3	0.746	18.650	90.099
4	0.396	9.901	100.000

Table A6. Principal component load matrix of the input group c.

Parameters	Principal Component 1	Principal Component 2
pH	0.312	−0.582
Tem	−0.454	0.324
Chl-a	0.311	0.532
Tur	0.430	0.379

References

Ay, M.; Kisi, O. Estimation of Dissolved Oxygen by Using Neural Networks and Neuro Fuzzy Computing Techniques. KSCE J. Civ. Eng. 2017, 21, 1631–1639.
Eerkes-Medrano, D.; Menge, B.; Sislak, C.; Langdon, C. Contrasting Effects of Hypoxic Conditions on Survivorship of Planktonic Larvae of Rocky Intertidal Invertebrates. Mar. Ecol. Prog. Ser. 2013, 478, 139–151.
Lee, Y.-W.; Park, M.-O.; Kim, S.-G.; Kim, S.-S.; Khang, B.; Choi, J.; Lee, D.; Lee, S.H. Major Controlling Factors Affecting Spatiotemporal Variation in the Dissolved Oxygen Concentration in the Eutrophic Masan Bay of Korea. Reg. Stud. Mar. Sci. 2021, 46, 101908.
Wang, Y.; Yu, J.; Chen, P. Remote Sensing Assessment of Ecological Effects of Marine Ranching in the Eastern Guangdong Waters, China. J. Geosci. Environ. Prot. 2018, 6, 101–113.
Wang, M.; He, G.; Ishwaran, N.; Hong, T.; Bell, A.; Zhang, Z.; Wang, G.; Wang, M. Monitoring Vegetation Dynamics in East Rennell Island World Heritage Site Using Multi-Sensor and Multi-Temporal Remote Sensing Data. Int. J. Digit. Earth 2018, 13, 393–409.
Didar-Ul Islam, S.M.; Bhuiyan, M.A.H. Impact Scenarios of Shrimp Farming in Coastal Region of Bangladesh: An Approach of an Ecological Model for Sustainable Management. Aquac. Int. 2016, 24, 1163–1190.
Qin, M.; Yue, C.; Du, Y. Evolution of China’s Marine Ranching Policy Based on the Perspective of Policy Tools. Mar. Policy 2020, 117, 103941.
Johnson, K.S.; Coletti, L.J.; Jannasch, H.W.; Sakamoto, C.M.; Swift, D.D.; Riser, S.C. Long-Term Nitrate Measurements in the Ocean Using the in Situ Ultraviolet Spectrophotometer: Sensor Integration into the APEX Profiling Float. J. Atmos. Ocean. Technol. 2013, 30, 1854–1866.
Watt, A.; Phillips, M.; Campbell, C.; Wills, I. Wireless Sensor Networks for Monitoring Underwater Sediment Transport. Sci. Total Environ. 2019, 667, 160–165.
Delgado, A.; Briciu-Burghina, C.; Regan, F. Antifouling Strategies for Sensors Used in Water Monitoring: Review and Future Perspectives. Sensors 2021, 21, 389.
Nadiri, A.A.; Fijani, E.; Tsai, F.T.-C.; Moghaddam, A.A. Supervised Committee Machine with Artificial Intelligence for Prediction of Fluoride Concentration. J. Hydroinform. 2013, 15, 1474–1490.
Fijani, E.; Barzegar, R.; Liu, B.; Tziritis, E.; Skordas, K. Design and Implementation of a Hybrid Model Based on Two-Layer Decomposition Method Coupled with Extreme Learning Machines to Support Real-Time Environmental Monitoring of Water Quality Parameters. Sci. Total Environ. 2019, 648, 839–853.
Sit, M.; Demiray, B.Z.; Xiang, Z.; Ewing, G.J.; Sermet, Y.; Demir, I. A Comprehensive Review of Deep Learning Applications in Hydrology and Water Resources. Water Sci. Technol. 2020, 82, 2635–2670.
Liang, X.; Jian, Z.; Tan, Z.; Dai, R.; Wang, H.; Wang, J.; Qiu, G.; Chang, M.; Li, T. Dissolved Oxygen Concentration Prediction in the Pearl River Estuary with Deep Learning for Driving Factors Identification: Temperature, pH, Conductivity, and Ammonia Nitrogen. Water 2024, 16, 3090.
Yang, H.; Sun, M.; Liu, S. A Hybrid Intelligence Model for Predicting Dissolved Oxygen in Aquaculture Water. Front. Mar. Sci. 2023, 10, 1126556.
Jiang, Z.; Cai, W.; Chen, B.; Wang, K.; Han, C.; Roberts, B.J.; Hussain, N.; Li, Q. Physical and Biogeochemical Controls on pH Dynamics in the Northern Gulf of Mexico during Summer Hypoxia. J. Geophys. Res. Ocean. 2019, 124, 5979–5998.
Liu, J.; Li, S.; Ji, X.; Liu, G.; Pan, Q.; Li, Y. Implementing a Finite-Volume Coupled Physical-Biogeochemical Model to the Coastal East China Sea. Ocean Sci. Discuss. 2020.
Lundberg, S.; Lee, S. A Unified Approach to Interpreting Model Predictions. Proc. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?” Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 12–17 June 2016; pp. 1135–1144.
Cui, H.; Tang, D.; Liu, H.; Liu, H.; Sui, Y.; Lai, Y.; Gu, X. Modeling Ocean Cooling Induced by Tropical Cyclone Wind Pump Using Explainable Machine Learning Framework. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–17.
Shadkani, S.; Hemmatzadeh, Y.; Saber, A.; Sergini, M.M. Enhanced Predictive Modeling of Dissolved Oxygen Concentrations in Riverine Systems Using Novel Hybrid Temporal Pattern Attention Deep Neural Networks. Environ. Res. 2024, 263, 120015.
Rudin, C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nat. Mach. Intell. 2019, 1, 206–215.
Liu, W.; Liu, S.; Hassan, S.G.; Cao, Y.; Xu, L.; Feng, D.; Cao, L.; Chen, W.; Chen, Y.; Guo, J.; et al. A Novel Hybrid Model to Predict Dissolved Oxygen for Efficient Water Quality in Intensive Aquaculture. IEEE Access 2023, 11, 29162–29174.
Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N.; Prabhat, F. Deep Learning and Process Understanding for Data-Driven Earth System Science. Nature 2019, 566, 195–204.
Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations. J. Comput. Phys. 2019, 378, 686–707.
Chen, G.; Yang, J.; Wu, L. Artificial Intelligence-Aided Remote Sensing of the Intermediate Ocean. Sci. Technol. Foresight 2022, 1, 103–120.
Graf, R.; Zhu, S.; Sivakumar, B. Forecasting River Water Temperature Time Series Using a Wavelet–Neural Network Hybrid Modelling Approach. J. Hydrol. 2019, 578, 124115.
Pang, H.; Yongo, E.; Lu, Z.; Li, Q.; Liu, X.; Li, L.; Guo, Z. Spatio-Temporal Dynamics of Phytoplankton Community Structure in the Coastal Waters of the Southern Beibu Gulf. Environ. Monit. Assess. 2024, 196, 721.
Liu, R.; Chen, S.; Yu, J.; Liu, X.; Zhao, C.; Zhang, X. Spatial and Temporal Distribution of Dissolved Oxygen in Zhejiang Coastal Area. Ocean Dev. Manag. 2023, 40, 13–20.
Han, L.; Zhang, J.; Lang, C.; Li, W.; Wu, Z.; He, X.; Wang, X.; Yu, J.; Li, Q.; Li, Y.; et al. Mussel Culture Activities Facilitate the Export and Burial of Particulate Organic Carbon. J. Mar. Sci. Eng. 2024, 12, 910.
Li, X.; Wang, H.; Wang, Y.; Zhang, L.; Wu, Y. Machine Learning-Based Dissolved Oxygen Prediction Modeling and Evaluation in the Yangtze River Estuary. Environ. Sci. 2023, 45, 7123–7133.
Xiong, J.; Xiong, R.; Lu, H.; Zheng, Y. Machine Learning-Based Water Quality Forecasting for Shenzhen Bay. Pearl River 2024, 45, 10–18.
Sun, Y.; Lv, F.; Chen, Z.; Diao, X.; Jiang, J.; Wei, C.; Pan, J. Spatial–Temporal Distribution and Dynamics of Dissolved Oxygen in an Adjacent Area of the Changjiang Estuary. Mar. Sci. 2020, 45, 86–96.
Xu, C.; Liu, G.; Chen, X. Spatiotemporal Variations and Influencing Factors of River Dissolved Oxygen in Dongguan Section of Dongjiang River, Pearl River Basin. J. Lake Sci. 2021, 34, 1540–1549.
Liu, H.; Wang, Y.; An, B.; Qian, J.; Qiu, C. Study on the Variation Trend and Influencing Factors of Summer Hypoxia off the Yangtze River Estuary. Mar. Environ. Sci. 2021, 40, 341–351.
Valera, M.; Walter, R.K.; Bailey, B.A.; Castillo, J.E. Machine Learning Based Predictions of Dissolved Oxygen in a Small Coastal Embayment. J. Mar. Sci. Eng. 2020, 8, 1007.
Garabaghi, F.H.; Benzer, S.; Benzer, R. Modeling Dissolved Oxygen Concentration Using Machine Learning Techniques with Dimensionality Reduction Approach. Environ. Monit. Assess. 2023, 195, 879.
Sapitang, M.; Ridwan, W.M.; Faizal Kushiar, K.; Najah Ahmed, A.; El-Shafie, A. Machine Learning Application in Reservoir Water Level Forecasting for Sustainable Hydropower Generation Strategy. Sustainability 2020, 12, 6121.
Jumin, E.; Zaini, N.; Ahmed, A.N.; Abdullah, S.; Ismail, M.; Sherif, M.; Sefelnasr, A.; El-Shafie, A. Machine Learning versus Linear Regression Modelling Approach for Accurate Ozone Concentrations Prediction. Eng. Appl. Comput. Fluid Mech. 2020, 14, 713–725.
Liu, M.; He, J.; Huang, Y.; Tang, T.; Hu, J.; Xiao, X. Algal Bloom Forecasting with Time-Frequency Analysis: A Hybrid Deep Learning Approach. Water Res. 2022, 219, 118591.

Figure 1. Monitoring platform and system deployed at the marine ranching on Goji Island. The left displays the geographic coordinates of Goji Island, while the right illustrates the buoy configuration within the marine ranching area.

Figure 2. Framework of the proposed approach.

Figure 3. Trends in seawater environmental parameters monitored over the period (DO and Tem are shown in the first graph, followed by pH, Sal, Chl-a, and Tur).

Figure 4. The Pearson coefficient and the Spearman coefficient of input parameters: (a) the Pearson coefficient; (b) the Spearman coefficient. * indicates significance at p < 0.05 (two-tailed), and ** indicates significance at p < 0.01 (two-tailed).

Figure 5. Comparative analysis of predicted results and actual values of DO across six different sets of input parameters: (a) the result of a; (b) the result of a-PCA; (c) the result of b; (d) the result of b-PCA; (e) the result of c; and (f) the result of c-PCA.

Figure 6. Taylor diagram of predicted results and true values, showing that the result of Group 15 is closest to the true values.

Figure 7. Violin plot comparing the predictions of the 18 groups with actual values, showing that Group 15’s results align most closely with the ground-truth data. The labels in the plot represent median values.

Figure 8. Bar chart of predictor importance, ranking Tem first, followed by pH, Sal, Chl-a, and Tur.

Figure 9. Graph of partial dependency: (a) Temperature. (b) pH. (c) Salinity. (d) Chlorophyll-a. (e) Turbidity.

Figure 10. SHAP summary plot, ranking Tem first, followed by pH, Sal, Chl-a, and Tur.

Table 1. Comparative analysis of three machine learning methods.

Characteristic	MLP	RF	SVM
Optimal Data Type	Buoy-based continuous time-series data (e.g., hourly DO series)	Spatially heterogeneous station data (e.g., discrete CTD profiles)	Small-sample ship-measured data (e.g., regional cruise data)
Missing Value Handling	Sensitive (requires interpolation preprocessing)	Robust (via out-of-bag estimation)	Sensitive (requires complete data matrix)
Interpretability	Low (black-box model)	High (quantifiable feature importance)	Moderate (support vector visualization)
Computational Efficiency	High training cost (requires GPU acceleration)	Fast training (parallelized CPU)	Moderate (affected by kernel function complexity)
Advantages in Marine DO Applications	Captures cross-layer nonlinear coupling of T-S-DO	Identifies dominant factors in geographic hotspots	Delineates abrupt DO boundaries in upwelling zones
Typical Limitations	Requires >10⁴ samples to avoid overfitting	Fails to capture tidal cycle temporal features	Not suitable for multi-parameter interactions

Table 2. Input parameter settings.

Serial Number	Input Parameter
a	Tem, Sal, Chl-a, pH, Tur
b	Tem, Sal, Chl-a, pH
c	Tem, Chl-a, pH, Tur

Table 3. Experimental results for different prediction models.

Input Parameter	Model	RMSE	MAE	R²
a	MLP	0.061	0.043	0.575
	SVM	0.068	0.054	0.527
	RF	0.060	0.044	0.638
b	MLP	0.060	0.043	0.648
	SVM	0.062	0.048	0.608
	RF	0.059	0.047	0.738
c	MLP	0.069	0.051	0.413
	SVM	0.124	0.108	0.349
	RF	0.079	0.064	0.432
a-PCA ($x_{i,\mathrm{new}1}$, $x_{i,\mathrm{new}2}$)	MLP	0.055	0.040	0.704
	SVM	0.069	0.052	0.647
	RF	0.055	0.042	0.856
b-PCA ($x_{i,\mathrm{new}3}$, $x_{i,\mathrm{new}4}$)	MLP	0.053	0.037	0.725
	SVM	0.055	0.041	0.727
	RF	0.039	0.030	0.884
c-PCA ($x_{i,\mathrm{new}5}$, $x_{i,\mathrm{new}6}$)	MLP	0.079	0.062	0.138
	SVM	0.109	0.087	0.278
	RF	0.096	0.073	0.251
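For reference, the three metrics reported in Table 3 can be computed as in the following minimal sketch. It uses the standard textbook definitions, which are assumed here to match those used in this study; the observed and predicted values are made-up numbers for illustration only.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted DO values (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

Lower RMSE and MAE and higher R² indicate a closer fit, which is how the rows of Table 3 are compared.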


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
