High-Precision Prediction of Total Nitrogen Based on Distance Correlation and Machine Learning Models—A Case Study of Dongjiang River, China

Chen, Yuanpei; Yao, Weike; Chen, Yiling

doi:10.3390/w17081131


by Yuanpei Chen, Weike Yao and Yiling Chen *

School of Ecology, Environment and Resources, Guangdong University of Technology, Guangzhou 511400, China

* Author to whom correspondence should be addressed.

Water 2025, 17(8), 1131; https://doi.org/10.3390/w17081131

Submission received: 26 February 2025 / Revised: 28 March 2025 / Accepted: 3 April 2025 / Published: 10 April 2025

(This article belongs to the Special Issue Monitoring and Modelling of Contaminants in Water Environment)


Abstract:

Excessive total nitrogen (TN) in water bodies leads to eutrophication, algal blooms, and hypoxia, which pose significant risks to aquatic ecosystems and human health. Accurate real-time TN prediction is crucial for effective water quality management. This study presents an innovative approach that combines the distance correlation coefficient (DCC) for feature selection with a coupled Attention-Convolutional Neural Network-Bidirectional Long Short-Term Memory (At-CBiLSTM) model to predict TN concentrations in the Dongjiang River in China. A dataset of 28,922 time-series data points was collected from seven sampling sites along the Dongjiang River, spanning from November 2020 to February 2023. The DCC method identified conductivity, Permanganate Index (COD_Mn), and total phosphorus as the most significant predictors for TN levels. The At-CBiLSTM model, optimized with a time step of three, outperformed other models, including standalone Long Short-Term Memory (LSTM), Bi-directional LSTM (Bi-LSTM), Convolutional Neural Network LSTM (CNN-LSTM), and Attention-LSTM variants, achieving excellent performance with the following metrics: mean absolute error (MAE) = 0.032, mean squared error (MSE) = 0.005, mean absolute percentage error (MAPE) = 0.218, and root mean squared error (RMSE) = 0.045. Importantly, increasing the number of input features beyond three variables led to a decline in model accuracy, underscoring the importance of DCC-driven feature selection. The results highlight that combining DCC with deep learning models, particularly At-CBiLSTM, effectively captures nonlinear temporal dependencies and improves prediction accuracy. This approach provides a solid foundation for real-time water quality monitoring and can inform targeted pollution control strategies in river ecosystems.

Keywords:

total nitrogen prediction; At-CBiLSTM model; distance correlation coefficient; water quality management

1. Introduction

The degradation of freshwater resources due to excessive total nitrogen (TN) has become a global crisis, resulting in catastrophic eutrophication events from the Mississippi River Basin in the United States to the Ganges River in India. TN-driven algal blooms and hypoxia have severely impacted aquatic biodiversity and compromised drinking water supplies. In China, rapid urbanization and agricultural intensification have exacerbated TN pollution, especially in the Dongjiang River Basin, a critical water source for over 40 million residents in the Greater Bay Area. TN concentrations in this region consistently exceed Class V water quality standards (>2.0 mg/L), rendering the water unsuitable for direct human contact and threatening both ecological and economic sustainability [1]. This dual challenge of global freshwater degradation and localized pollution hotspots highlights the urgent need for innovative monitoring tools capable of providing real-time high-precision TN predictions to guide mitigation efforts. As a result, monitoring TN concentrations in rivers has become crucial. Current analytical methods for TN detection, including spectroscopic analysis, chromatographic separation, electrochemical sensing, and automated flow injection systems, are widely used but come with limitations [2]. These methods require on-site sampling, are labor-intensive, and can be time-consuming. Additionally, they are not capable of real-time monitoring of nitrogen fluctuations, and the prolonged storage of samples may undermine the accuracy of results [3].

Compared to traditional water quality testing methods, machine learning models for water quality prediction offer significant reductions in sampling time and labor costs [4]. In recent years, machine learning has advanced rapidly and has been widely applied in various fields, including environmental science [5,6,7]. These models can uncover hidden patterns in data, learn the underlying connections between datasets, and address problems related to clustering, regression, and more [8].

Machine learning models do not need to account for the complexities of hydrological systems; they focus solely on the information contained within the data. They also do not require large amounts of hydrological or water quality data for modeling, further simplifying the process. Common machine learning models include Support Vector Machine (SVM) [9], Random Forest (RF) [10], Extreme Gradient Boosting (XGBoost) [11], Extreme Learning Machine (ELM) [12], and Multilayer Perceptron (MLP) [13]. Recent studies have increasingly utilized various machine learning frameworks for water quality forecasting [14,15,16]. For instance, Wang et al. [14] proposed the MACLALSTM architecture for urban water supply monitoring and demonstrated the superiority of hybrid computational frameworks over standalone models in predictive accuracy. However, critical methodological limitations persist in these studies, particularly the heavy reliance on isolated feature selection techniques or subjective manual variable curation during model development. Current practices often involve labor-intensive empirical variable selection without standardized protocols [17], as seen with the widespread but limiting use of Pearson correlation analysis. While commonly used in biological and statistical fields, this metric primarily detects only monotonic linear relationships [18], which is insufficient for complex nonlinear hydrological datasets.

Emerging solutions are addressing these limitations through advanced analytical tools, such as the distance correlation coefficient (DCC), which enables a comprehensive characterization of multivariate dependencies [19]. Miao applied DCC-based dimensional clustering and demonstrated accelerated variable categorization across heterogeneous datasets [20]. Li et al. [21] developed the ECDX framework, which combines DCC with XGBoost algorithms to achieve dynamic workload adaptation and improve forecasting reliability in server energy monitoring. Huang et al. [22] validated, through comparative analyses using Backpropagation Neural Network (BPNN) and Random Forest (RF) models, that DCC offers superior feature selection compared to conventional Pearson correlation methods. By incorporating DCC into the feature selection process, these studies highlight its potential to enhance the accuracy and effectiveness of machine learning-based water quality prediction models [23].

Artificial Neural Networks (ANNs) are computational models inspired by biological neural networks, simulating the connections and information transmission processes between neurons in the human brain to enable learning and intelligent inference. ANN prediction technology is becoming increasingly prevalent in water quality prediction [24]. For instance, Noori et al. [25] developed a hybrid model that combines the Soil and Water Assessment Tool (SWAT) model with ANN to optimize water quality predictions by accounting for complex variations. Wang et al. [26] introduced an ARIMA-ANN-based model, which integrates artificial neural networks with linear methods to achieve higher prediction accuracy. However, these hybrid models often fail to fully account for the temporal correlations in water quality data, which can impact prediction accuracy.

With advancements in deep learning, Recurrent Neural Networks (RNNs) have proven more suitable for handling nonlinear time series data compared to traditional ANNs. However, RNNs face challenges such as vanishing and exploding gradients. To address these issues, Hochreiter et al. [27] proposed the LSTM network, a variant of RNN that mitigates time delays and gradient vanishing through gated mechanisms [28]. The interaction between these gates enables LSTM to effectively manage long-term dependencies that conventional RNNs struggle with, making it capable of balancing temporal and nonlinear relationships in data [27]. Liu et al. [29] investigated and predicted water quality using LSTM, finding that it outperforms other models in terms of accuracy and predictive performance. Li et al. [30] further enhanced LSTM by using the sparrow search algorithm to optimize its parameters for wastewater quality prediction, demonstrating that parameter optimization can significantly improve model performance. Additionally, hybrid deep-learning models have been developed [31]. For example, Khullar et al. [32] demonstrated that Bi-LSTM-based models effectively capture bidirectional temporal patterns in river water quality, outperforming unidirectional LSTM and CNN-LSTM models in COD/BOD prediction accuracy. Barzegar et al. [33] showed that CNN-LSTM outperforms standalone models in lake DO and Chl-a prediction.

Recent advancements in deep learning, particularly LSTM networks, offer promising solutions for temporal water quality forecasting [34,35,36]. However, several critical gaps remain. First, hybrid architectures like CNN-LSTM focus on spatial feature extraction but fail to account for bidirectional hydrological processes. Integrating Bi-LSTM can optimize the model by capturing temporal dependencies in both forward and backward directions [37]. Second, time-step intervals are often determined empirically, which can lead to the overfitting or underrepresentation of short-term pollutant fluctuations. Third, manual feature selection introduces subjectivity, as evidenced by Noori’s SWAT-ANN model, which requires expert intervention to prioritize variables [25]. These challenges highlight the need for a paradigm shift toward automated adaptive frameworks that reconcile spatial heterogeneity, bidirectional temporal dynamics, and nonlinear feature interactions.

To address these gaps, we propose a novel methodological approach tailored to the Dongjiang River Basin. Firstly, nonlinear feature engineering using the Distance Correlation Coefficient (DCC) replaces Pearson’s linear assumptions, quantifying both linear and nonlinear dependencies and automating the identification of key TN drivers, such as conductivity and COD_Mn. Secondly, the hybrid spatial–temporal architecture, At-CBiLSTM, integrates convolutional layers for multi-scale spatial pattern extraction, Bi-directional LSTM for modeling upstream–downstream interactions, and attention mechanisms to dynamically weigh critical features. Thirdly, precision parameterization involves a systematic protocol that optimizes time steps (1–6 days), balancing computational efficiency with predictive accuracy—an essential advancement for real-time monitoring.

The Dongjiang River, located in southern China, is a major tributary of the Pearl River, with an average annual runoff of 25.7 billion cubic meters. It serves as a vital water source for several cities, including Heyuan, Huizhou, Dongguan, Shenzhen, Guangzhou, and Hong Kong, supporting the needs of over 40 million people. The river also plays a crucial role in the surrounding urban ecosystem, providing habitats for various species and enhancing the ecological environment by regulating water volume. Understanding the status and trends of pollutants in the Dongjiang aquatic ecosystem is vital for strengthening environmental protection and development in the Dongjiang River Basin. This knowledge is also crucial for ensuring water security in the Greater Bay Area, supporting sustainable social and economic growth, and promoting regional prosperity.

In this study, we propose a novel methodology that departs from conventional manual input selection practices. Instead of subjectively choosing input variables, we employed the Distance Correlation Coefficient (DCC) as a feature selection tool to systematically identify the most relevant variables for model training. Seven distinct input schemes were designed by combining selected variables and subsequently used to train LSTM models. Through a comparative analysis of the training outcomes, we identified the optimal input scheme for predicting total nitrogen (TN) levels in the Dongjiang River. This scheme was then integrated into both standalone LSTM models and four hybrid LSTM-based models (e.g., CNN-LSTM, Bi-LSTM). Critical parameters, especially time-step length, were fine-tuned to develop an enhanced water quality prediction framework. Our goal was to combine DCC-driven feature selection with deep learning architectures, incorporating time-step optimization and controlled LSTM variants, to establish a real-time TN prediction model. The results showed that electrical conductivity was the most effective single-variable input, while Program 3 (electrical conductivity, COD_Mn, and total phosphorus) achieved the highest predictive accuracy. Notably, the At-CBiLSTM model achieved peak performance (MAE = 0.032) at a time step of 3, demonstrating a significant advancement in accurately forecasting TN dynamics in the Dongjiang River.

2. Materials and Methods

2.1. Research Areas and Data Collection

Water quality monitoring data for the Dongjiang River Basin were obtained from the National Surface Water Monitoring Network (http://www.mee.gov.cn/hjzl/shj/dbszdjcssfb/, accessed on 28 February 2023). The dataset covers the period from November 2020 to February 2023, consisting of 28,922 time-series data points collected at 4-h intervals. The data include various parameters such as pH, water temperature, dissolved oxygen, ammonia nitrogen, total phosphorus, total nitrogen, conductivity, and turbidity, as shown in Table 1. The sampling site map, shown in Figure 1, includes seven sampling points: Longchuan Xunwu, Xinfengjiang Reservoir, Boluo Chengxia (Xinjiao), Huizhou Ruhu, Zixi, Qiling, and Zhangcun (Carrefour).

2.2. Data Preprocessing

In this study, missing or outlier data were encountered due to occasional maintenance and abnormal conditions at the water quality monitoring sites. Outliers were identified and removed using the boxplot method with a threshold of 1.5 × IQR (interquartile range). Specifically, data points below Q₁ − 1.5 × IQR or above Q₃ + 1.5 × IQR were classified as outliers. The IQR is the difference between the upper quartile (Q₃, the 75th percentile) and the lower quartile (Q₁, the 25th percentile); the second quartile (the median) divides the data into two equal parts. For missing data, mean imputation was applied after outlier removal. Mean imputation was chosen for computational efficiency in large-scale datasets (n = 28,922), and removing outliers first ensured that extreme observations had minimal influence on the imputed values.

Between November 2020 and February 2023, the average water temperature in the Dongjiang River was 24.14 °C. The total nitrogen (TN) content ranged from a minimum of 0.133 mg/L to a maximum of 18.230 mg/L, with an average of 5.226 mg/L. According to the Surface Water Environmental Quality Standards and the Basic Engineering Standard Limitation Table, the TN concentration in the Dongjiang River falls under Class V water standards (>1.5 mg/L; see Table 2). This classification indicates that the water is primarily suitable for agricultural irrigation and general landscaping purposes but is not suitable for direct human contact. To meet standards for centralized drinking water, the water quality in the Dongjiang River requires improvement, emphasizing the importance of studying TN concentration trends in the river.
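
The outlier screening and mean imputation described at the start of this subsection can be illustrated with a minimal sketch. It is not the authors' code: the function names, the sample readings, and the choice of the "inclusive" quartile rule are assumptions made for illustration, since the source does not state how quartiles were interpolated.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the fences Q1 - k*IQR and Q3 + k*IQR for a list of numbers."""
    # Quartile interpolation rule is an assumption; the source does not specify one.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def clean_series(series, k=1.5):
    """Treat outliers as missing, then fill every gap with the mean of the rest."""
    observed = [x for x in series if x is not None]
    low, high = iqr_bounds(observed, k)
    kept = [x if x is not None and low <= x <= high else None for x in series]
    mean = statistics.fmean([x for x in kept if x is not None])
    return [mean if x is None else x for x in kept]

# Hypothetical TN readings (mg/L) with one gap (None) and one implausible spike.
tn = [1.0, 1.2, None, 1.1, 0.9, 25.0, 1.3, 1.0]
print(clean_series(tn))
```

Because outliers are discarded before the mean is taken, the spike does not leak into the value used to fill the gap, which is the ordering the text describes.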

Due to significant differences in magnitude across various data dimensions—for instance, the mean concentration of ammonia nitrogen is 0.397 mg/L, while the mean conductivity is 341 μS/cm—it becomes challenging to compare their changes directly. Ammonia nitrogen values typically range between 0 and 5 mg/L, whereas conductivity values vary between 0 and 1000 μS/cm. To address this issue, the study applies linear normalization (Min–Max scaling), which maps the different data types to a uniform range of [0, 1]. The calculation formula for this normalization is as follows:

X' = \frac{X_{i} - X_{\min}}{X_{\max} - X_{\min}}

(1)

where $X'$ represents the normalized data, $X_{\min}$ is the minimum value of the sample data, $X_{\max}$ is the maximum value of the sample data, and $X_{i}$ is the original data value.
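
As a small worked illustration of Equation (1), the sketch below scales two series of very different magnitude onto [0, 1]. The function name and the sample readings are hypothetical and are not taken from the study's dataset.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1] following Equation (1)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("a constant series cannot be min-max scaled")
    return [(x - x_min) / (x_max - x_min) for x in values]

conductivity = [120.0, 341.0, 560.0, 1000.0]  # hypothetical readings, uS/cm
ammonia = [0.1, 0.397, 2.5, 5.0]              # hypothetical readings, mg/L

print(min_max_scale(conductivity))
print(min_max_scale(ammonia))
```

After scaling, both variables share the same range, so neither dominates training purely because of its units.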

2.3. Feature Selection

The distance correlation coefficient (DCC) is a crucial tool for analyzing the relationship between two variables. It effectively captures both linear and nonlinear associations, making it particularly useful for feature selection in model training [38,39]. Unlike the Pearson correlation coefficient, which is limited to detecting only linear relationships, the DCC addresses this limitation and provides a more comprehensive assessment of dependence [19]. The DCC ranges from 0 to 1: a value of 0 indicates that the two variables are independent, and values closer to 1 indicate stronger dependence. Unlike the Pearson coefficient, it carries no sign and therefore does not distinguish positive from negative association. The criteria used to grade correlation strength are given in Table 3.

To assess the relationship between two water quality parameters, u and v, we calculate their distance correlation coefficient, denoted as $\widehat{\mathrm{dcorr}}(u, v)$. This coefficient is derived from the estimated distance covariance, $\widehat{\mathrm{dcov}}(u, v)$, which quantifies the relationship between the two variables. Similarly, $\widehat{\mathrm{dcov}}(v, v)$ is used to compute the distance standard deviation of variable v, while $\widehat{\mathrm{dcov}}(u, u)$ calculates the distance standard deviation of variable u. Additionally, the quantities $\hat{S}_{1}$, $\hat{S}_{2}$, and $\hat{S}_{3}$ are used to compute the distance variances for the samples, providing further insight into the variability of the water quality parameters.

\widehat{\mathrm{dcorr}}(u, v) = \frac{\widehat{\mathrm{dcov}}(u, v)}{\sqrt{\widehat{\mathrm{dcov}}(u, u) \, \widehat{\mathrm{dcov}}(v, v)}}

(2)

Assuming that $\{(u_{i}, v_{i}),\ i = 1, 2, \ldots, n\}$ constitutes a random sample from the population (u, v), the sample estimate of the squared distance covariance between the two random variables u and v, denoted as $\widehat{\mathrm{dcov}}^{2}(u, v)$, can be calculated as follows:

\widehat{\mathrm{dcov}}^{2}(u, v) = \hat{S}_{1} + \hat{S}_{2} - 2 \hat{S}_{3}

(3)

The sample estimates $\hat{S}_{1}$, $\hat{S}_{2}$, and $\hat{S}_{3}$ are, respectively, as follows:

\hat{S}_{1} = \frac{1}{n^{2}} \sum_{i = 1}^{n} \sum_{j = 1}^{n} \lVert u_{i} - u_{j} \rVert_{d_{u}} \lVert v_{i} - v_{j} \rVert_{d_{v}}

(4)

\hat{S}_{2} = \frac{1}{n^{2}} \sum_{i = 1}^{n} \sum_{j = 1}^{n} \lVert u_{i} - u_{j} \rVert_{d_{u}} \cdot \frac{1}{n^{2}} \sum_{i = 1}^{n} \sum_{j = 1}^{n} \lVert v_{i} - v_{j} \rVert_{d_{v}}

(5)

\hat{S}_{3} = \frac{1}{n^{3}} \sum_{i = 1}^{n} \sum_{j = 1}^{n} \sum_{l = 1}^{n} \lVert u_{i} - u_{l} \rVert_{d_{u}} \lVert v_{j} - v_{l} \rVert_{d_{v}}

(6)

When $\widehat{\mathrm{dcov}}(u, v) = 0$, and hence $\widehat{\mathrm{dcorr}}(u, v) = 0$, it indicates that u and v are independent of each other; as $\widehat{\mathrm{dcorr}}(u, v)$ approaches 1, it suggests a stronger correlation between u and v. Conversely, a smaller $\widehat{\mathrm{dcorr}}(u, v)$ signifies a weaker correlation. The sample estimates of $\widehat{\mathrm{dcov}}(u, u)$ and $\widehat{\mathrm{dcov}}(v, v)$ are computed in the same way, by substituting the same variable for both arguments.
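
To make Equations (2) to (6) concrete, the following sketch computes the sample distance correlation for scalar series and contrasts it with the Pearson coefficient on a purely quadratic relationship. It is a plain illustration under stated assumptions: the function names and the toy data are hypothetical, distances are absolute differences because the inputs are one-dimensional, and the triple sum in Equation (6) is evaluated through row sums of the distance matrices, which is algebraically equivalent.

```python
import math

def dcov_sq(u, v):
    """Squared sample distance covariance S1 + S2 - 2*S3 for scalar series."""
    n = len(u)
    a = [[abs(u[i] - u[j]) for j in range(n)] for i in range(n)]
    b = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    s1 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    s2 = (sum(map(sum, a)) / n**2) * (sum(map(sum, b)) / n**2)
    # The triple sum in S3 factorizes into a sum over l of two row sums.
    s3 = sum(sum(a[l]) * sum(b[l]) for l in range(n)) / n**3
    return s1 + s2 - 2 * s3

def dcov(u, v):
    # Clamp at zero to absorb tiny negative rounding errors before the root.
    return math.sqrt(max(dcov_sq(u, v), 0.0))

def dcorr(u, v):
    """Distance correlation as in Equation (2); returns 0.0 for constant series."""
    denom = math.sqrt(dcov(u, u) * dcov(v, v))
    return dcov(u, v) / denom if denom > 0 else 0.0

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    den = math.sqrt(sum((x - mu) ** 2 for x in u) * sum((y - mv) ** 2 for y in v))
    return num / den

u = [-2.0, -1.0, 0.0, 1.0, 2.0]
v = [x * x for x in u]            # purely nonlinear (quadratic) dependence
print(round(pearson(u, v), 3))    # linear correlation
print(round(dcorr(u, v), 3))      # distance correlation
```

On this toy data the Pearson coefficient vanishes while the distance correlation is clearly positive, which is the behavior that motivates using the DCC for feature screening. The direct computation is quadratic in memory, so a dataset of the size used here would call for a more efficient implementation.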

The strength of the correlation between variables is evaluated by DCC, as shown in Figure 2.

Based on the analysis of the distance correlation coefficients and scatter plots in Figure 3, the input variables are ranked according to their correlation levels as follows: electrical conductivity, COD_Mn, total phosphorus, ammonia nitrogen, pH, dissolved oxygen, and water temperature. Electrical conductivity shows a very strong correlation, while COD_Mn and total phosphorus exhibit strong correlations. Total phosphorus (TP), which is commonly linked to agricultural runoff and wastewater, also demonstrates a strong correlation with total nitrogen (TN), making it a critical variable for prediction. Ammonia nitrogen, pH, and dissolved oxygen show moderate correlations, while water temperature has a very weak correlation. To assess the predictive performance of the LSTM model with various input variable combinations, training was conducted using different input schemes that reflect the varying correlation levels. The criteria for categorizing the correlation levels are provided in Table 3, while the specific seven input schemes are detailed in Table 4. Program 1 uses a single variable as the input, while the remaining schemes incorporate multiple variables.

After multiple rounds of model tuning, the following parameter settings for LSTM training have been established:

Input Data Configuration: The time step, input layer, and output layer dimensions are configured based on the input data. In this study, data points are recorded at 4-h intervals, and the goal is to predict total nitrogen concentration for the next 4 h using the past 24 h of water quality data. As a result, the time step is set to 6, the output feature dimension is 1, and the input feature dimension corresponds to the number of variables;
LSTM Neuron Configuration: The number of neurons in each LSTM layer has been optimized through extensive experimentation. The first LSTM hidden layer contains 64 neurons, while the second layer has 16 neurons;
Dropout Layer Addition: To mitigate overfitting, a Dropout layer is added after the LSTM layer, with a dropout rate of 0.1;
Dense Layer Implementation: The output Dense layer is configured with 1 unit, and the activation function is set to ReLU;
Optimizer Selection: The Adam optimizer is used with a learning rate of 0.001;
Training Configuration: The number of epochs is set to 256, and the batch size is also 256. A sufficiently large batch size gives stable gradient estimates, while an adequate number of epochs allows the model to fit the data effectively.
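
The input data configuration above (six past readings at 4-h spacing used to predict the next reading) corresponds to a sliding-window arrangement of the series. The sketch below shows one way to build such windows; the function name and the toy records are hypothetical, and it assumes the predictor columns and the TN target are held in separate lists.

```python
def make_windows(features, target, time_step=6):
    """Pair each run of `time_step` consecutive feature rows with the next target value."""
    X, y = [], []
    for start in range(len(features) - time_step):
        X.append(features[start:start + time_step])
        y.append(target[start + time_step])
    return X, y

# Hypothetical normalized records at 4-h spacing:
# each feature row is [conductivity, COD_Mn, total phosphorus].
features = [[0.1 * t, 0.2, 0.3] for t in range(10)]
target = [0.05 * t for t in range(10)]  # hypothetical normalized TN

X, y = make_windows(features, target, time_step=6)
print(len(X), len(X[0]), len(X[0][0]))  # samples, time steps, features
```

Each sample then has the shape (time steps, features) that a recurrent layer expects, and changing `time_step` is all that is needed to compare different look-back lengths.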

2.4. Water Quality Prediction Model

2.4.1. LSTM Basic Structure and Principles

LSTM networks use a combination of forget gates, input gates, output gates, and memory cells to retain and update important long-term dependency information in time series data. Each gate has a specific function that allows the network to learn crucial information in the sequence, forget irrelevant parts, and retain essential contextual details. This mechanism enables LSTM to effectively address the problem of long-term dependencies, greatly enhancing its performance in various applications [40]. The unfolded structure of LSTM along the time axis is illustrated in Figure 4. The models in this study were trained using TensorFlow (version 2.7.0) and Keras (version 2.3.0). The LSTM unit consists of the following components:

Forget Gate:

f_{t} = σ (W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f})

(7)

where $f_{t}$ is the forget gate’s output at time step t, which determines how much of the previous memory should be discarded; $W_{f}$ is the weight matrix for the forget gate; $h_{t - 1}$ is the previous hidden state from time step t − 1; $x_{t}$ is the input vector at time step t; $b_{f}$ is the bias term for the forget gate; and $σ$ is the sigmoid activation function.

Input Gate:

i_{t} = σ (W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i})

(8)

where $i_{t}$ is the input gate’s output at time step t, which determines how much new information will be added to the memory; $W_{i}$ is the weight matrix for the input gate; $h_{t - 1}$ is the previous hidden state from time step t − 1; $x_{t}$ is the input vector at time step t; $b_{i}$ is the bias term for the input gate; and $σ$ is the sigmoid activation function.

Candidate Cell State:

\tilde{C}_{t} = \tanh (W_{c} \cdot [h_{t - 1}, x_{t}] + b_{c})

(9)

where $\tilde{C}_{t}$ is the candidate cell state at time step t, which represents the new information to be added to the memory cell; $W_{c}$ is the weight matrix for the candidate cell state; $h_{t - 1}$ is the previous hidden state from time step t − 1; $x_{t}$ is the input vector at time step t; $b_{c}$ is the bias term for the candidate cell state; and $\tanh$ is the hyperbolic tangent activation function.

Output Gate:

o_{t} = σ (W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o})

(10)

where $o_{t}$ is the output gate’s output at time step t, which determines how much of the memory cell will be output as the hidden state; $W_{o}$ is the weight matrix for the output gate; $h_{t - 1}$ is the previous hidden state from time step t − 1; $x_{t}$ is the input vector at time step t; $b_{o}$ is the bias term for the output gate; and $σ$ is the sigmoid activation function.

Long-term Memory (Cell State):

C_{t} = f_{t} \cdot C_{t - 1} + i_{t} \cdot {\tilde{C}}_{t}

(11)

where $C_{t}$ is the memory cell state at time step t; $f_{t}$ is the forget gate’s output at time step t; $C_{t - 1}$ is the previous memory cell state from time step t − 1; $i_{t}$ is the input gate’s output at time step t; and $\tilde{C}_{t}$ is the candidate cell state at time step t.

Short-term Memory (Hidden State):

h_{t} = o_{t} \cdot \tanh (C_{t})

(12)

where $h_{t}$ is the hidden state (output) at time step t, which is the output of the LSTM cell; $o_{t}$ is the output gate’s output at time step t, as defined in Equation (10); $C_{t}$ is the memory cell state at time step t; and $\tanh$ is the hyperbolic tangent activation function.

These components work together to process time series data efficiently. First, the forget gate determines which information from the memory cell should be discarded based on the current input and the previous hidden state. Then, the input gate identifies which new information should be written to the memory cell, again considering the current input and the previous state. The memory cell then updates its stored content, influenced by the forget and input gates. Finally, the output gate decides the final hidden state output, which is based on the updated memory cell and will be used for the next time step.
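
The sequence of operations in Equations (7) to (12) can be traced with a deliberately small sketch in which the input, hidden state, and cell state are all scalars. This is an illustration only: the parameter layout and the weight values are arbitrary assumptions and have no connection to the trained models in this study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step for scalar input and state, following Equations (7)-(12).

    p maps each gate name to (w_h, w_x, b): weights on h_{t-1} and x_t, plus a bias.
    """
    def pre(gate):
        w_h, w_x, b = p[gate]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre("f"))            # forget gate, Eq. (7)
    i_t = sigmoid(pre("i"))            # input gate, Eq. (8)
    c_hat = math.tanh(pre("c"))        # candidate cell state, Eq. (9)
    o_t = sigmoid(pre("o"))            # output gate, Eq. (10)
    c_t = f_t * c_prev + i_t * c_hat   # cell state, Eq. (11)
    h_t = o_t * math.tanh(c_t)         # hidden state, Eq. (12)
    return h_t, c_t

# Arbitrary illustrative weights shared by all four gates ("f", "i", "c", "o").
params = {g: (0.5, 0.5, 0.0) for g in "fico"}

h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.6]:  # hypothetical normalized inputs
    h, c = lstm_step(x, h, c, params)
    print(round(h, 4), round(c, 4))
```

Carrying `h` and `c` from one call to the next is what lets information from earlier time steps influence later outputs.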

2.4.2. The Basic Structure and Principles of CNN

In deep learning, Convolutional Neural Networks (CNNs) are widely used for image recognition and the processing of sequential data. Their strong feature-learning capabilities have led to impressive results in various applications [41]. The basic structure of a CNN consists of convolutional layers, activation functions, pooling layers, and fully connected layers [42]. In this study, CNNs are primarily employed for processing sequential data, and their working principle is as follows:

Convolutional Layer: Initially, the well-prepared time series data are fed into the convolutional layer. This layer is the core component of the CNN, where sliding window operations are performed using convolutional kernels (filters). Within each window, the data are element-wise multiplied by the convolutional kernel, and the results are summed to produce the output;
Pooling Layer: The pooling layer reduces the output dimensions of the convolutional layer and extracts significant features. Common pooling operations include max pooling and average pooling. Max pooling selects the maximum value within a time window as the output, while average pooling computes the average value within that window. Pooling operations help reduce the data volume, enhance computational efficiency, and make the model less sensitive to small shifts of a pattern along the time axis;
Fully Connected Layer: After passing through several convolutional and pooling layers, the extracted features are flattened into a one-dimensional vector, transforming the time series data into a vector format. This flattened feature vector is then connected to the fully connected layer. By multiplying the flattened feature vector by the weight matrix and adding a bias term, the final output is generated.

In summary, the convolutional layer extracts relevant features from input samples using convolutional operations, the pooling layer reduces the dimensionality of the features to enhance computational speed, and the fully connected layer produces the final prediction results.
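
The three stages summarized above can be sketched for a single one-dimensional series as follows. The kernel, the pooling size, and the dense-layer weights are arbitrary illustrative values, and the helper names are hypothetical; a real CNN would learn many kernels rather than use one fixed by hand.

```python
def conv1d(series, kernel):
    """Valid 1-D convolution as used in CNNs: a sliding dot product, no kernel flip."""
    k = len(kernel)
    return [sum(series[i + j] * kernel[j] for j in range(k))
            for i in range(len(series) - k + 1)]

def max_pool1d(series, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(series[i:i + size])
            for i in range(0, len(series) - size + 1, size)]

def dense(features, weights, bias):
    """Fully connected layer with a single output unit."""
    return sum(f * w for f, w in zip(features, weights)) + bias

signal = [1.0, 2.0, 4.0, 3.0, 1.0, 0.0, 2.0]  # hypothetical normalized series
feature_map = conv1d(signal, kernel=[1.0, 0.0, -1.0])
pooled = max_pool1d(feature_map, size=2)
prediction = dense(pooled, weights=[0.5, 0.5], bias=0.1)

print(feature_map)
print(pooled)
print(prediction)
```

The feature map is shorter than the input by the kernel length minus one, and pooling shortens it again, which is the dimensionality reduction the text refers to.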

2.4.3. Bidirectional Long Short-Term Memory (Bi-LSTM)

The Bi-LSTM architecture is commonly used in tasks such as natural language processing (NLP) [43,44], where it proves particularly effective for context-dependent applications, including language modeling, named entity recognition, and sentiment analysis. The Bi-LSTM extends the standard LSTM by incorporating bidirectional temporal modeling. Its primary concept is to learn sequence features from both forward hidden states (processing the sequence in chronological order, t = 1 → T) and backward hidden states (processing the sequence in reverse chronological order, t = T → 1). This bidirectional approach allows the model to capture contextual information from both the past and the future, improving its performance in tasks where understanding the entire sequence is crucial.

Forward Hidden State: Computed sequentially over time as

{\vec{H}}_{t} = σ (X_{t} W_{x \vec{H}} + {\vec{H}}_{t - 1} W_{\vec{H} \vec{H}} + b_{\vec{H}})

(13)

where ${\vec{H}}_{t}$ is the forward hidden state at time step t; $X_{t}$ is the input vector at time step t; $W_{x \vec{H}}$ is the weight matrix connecting the input $X_{t}$ to the forward hidden state; ${\vec{H}}_{t - 1}$ is the forward hidden state from the previous time step t − 1; $W_{\vec{H} \vec{H}}$ is the recurrent weight matrix for the forward hidden state; $b_{\vec{H}}$ is the bias term for the forward hidden state; and σ is the activation function.

Backward Hidden State: Computed in reverse temporal order as

{\overset{\leftarrow}{H}}_{t} = σ (X_{t} W_{x \overset{\leftarrow}{H}} + {\overset{\leftarrow}{H}}_{t + 1} W_{\overset{\leftarrow}{H} \overset{\leftarrow}{H}} + b_{\overset{\leftarrow}{H}})

(14)

where ${\overset{\leftarrow}{H}}_{t}$ is the backward hidden state at time step t; $X_{t}$ is the input vector at time step t; $W_{x \overset{\leftarrow}{H}}$ is the weight matrix connecting the input $X_{t}$ to the backward hidden state; ${\overset{\leftarrow}{H}}_{t + 1}$ is the backward hidden state from the next time step t + 1; $W_{\overset{\leftarrow}{H} \overset{\leftarrow}{H}}$ is the recurrent weight matrix for the backward hidden state; $b_{\overset{\leftarrow}{H}}$ is the bias term for the backward hidden state; and σ is the activation function.

Final Output: Generated by concatenating forward and backward hidden states through a dense layer, as follows:

O_{t} = {\vec{H}}_{t} W_{\vec{H} O} + {\overset{\leftarrow}{H}}_{t} W_{\overset{\leftarrow}{H} O} + b_{O}

(15)

where $O_{t}$ is the final output at time step t; ${\vec{H}}_{t}$ is the forward hidden state at time t; ${\overset{\leftarrow}{H}}_{t}$ is the backward hidden state at time t; $W_{\vec{H} O}$ is the weight matrix from the forward hidden state to the output; $W_{\overset{\leftarrow}{H} O}$ is the weight matrix from the backward hidden state to the output; and $b_{O}$ is the bias term for the output layer.
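
Equations (13) to (15) can be illustrated with scalar states: one pass runs through the sequence in chronological order, a second pass runs in reverse, and the two states at each time step are combined linearly. The sketch assumes a sigmoid for σ and uses arbitrary weights; in an actual Bi-LSTM each direction would be a full LSTM cell rather than this simple recurrence, and the function names are hypothetical.

```python
import math

def recurrent_pass(xs, w_x, w_h, b, reverse=False):
    """H_t = sigma(x_t*w_x + H_prev*w_h + b); reverse=True runs t = T down to 1."""
    order = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    states, h = [0.0] * len(xs), 0.0
    for t in order:
        h = 1.0 / (1.0 + math.exp(-(xs[t] * w_x + h * w_h + b)))
        states[t] = h  # stored at its own time index in both directions
    return states

def bidirectional_output(xs, fwd, bwd, w_fo, w_bo, b_o):
    """Combine forward and backward states at each time step, as in Equation (15)."""
    h_f = recurrent_pass(xs, *fwd)
    h_b = recurrent_pass(xs, *bwd, reverse=True)
    return [hf * w_fo + hb * w_bo + b_o for hf, hb in zip(h_f, h_b)]

xs = [0.2, 0.4, 0.6, 0.5]   # hypothetical normalized inputs
fwd = (0.8, 0.3, 0.0)       # (w_x, w_h, b), arbitrary illustrative values
bwd = (0.5, -0.2, 0.1)
outputs = bidirectional_output(xs, fwd, bwd, w_fo=1.0, w_bo=1.0, b_o=0.0)
print([round(o, 4) for o in outputs])
```

At every time step the output therefore depends on readings both before and after that step within the input window.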

2.4.4. Attention Mechanism

After extracting long-term dependencies from features using stacked LSTM layers, the outputs are passed as inputs to the attention layer. The attention mechanism automatically learns the importance of each hidden state, allowing the model to focus on the most relevant parts of the input sequence.

The attention mechanism can be thought of as a weighted sum. It first calculates the importance of each input feature and then applies the softmax function to normalize the weights, ensuring that the sum of all weights equals 1. These weights are multiplied by their corresponding input features, and the resulting weighted features are summed to produce the final output.

The relevant equations for the attention mechanism are as follows:

a_{t}^{k} = \frac{\exp (e_{t}^{k})}{\sum_{i = 1}^{n} \exp (e_{t}^{i})}

(16)

e_{t}^{k} = υ_{e}^{T} σ (W_{e} [h_{t - 1}, c_{t - 1}] + U_{e} h_{t} + b_{e})

(17)

{\bar{z}}_{t} = \sum_{t}^{T} a_{t}^{T} h_{t}

(18)

where $υ_{e}$, $b_{e} \in R^{T}$, $W_{e} \in R^{T \times m}$, and $U_{e} \in R^{m \times m}$ are the parameters that need to be learned; m represents the number of neurons. $a_{t}^{k}$ represents the attention weight for the k-th input at time step t; $e_{t}^{k}$ represents the importance of $h_{t}$; and ${\bar{z}}_{t}$ is the output of the attention layer obtained by summing the weighted hidden states.
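The weighted-sum view of attention can be illustrated with a short NumPy sketch. It is a simplified stand-in rather than the exact scoring function of Equation (17): each hidden state is scored with a single tanh layer, the scores are normalized with softmax as in Equation (16), and the hidden states are summed with those weights. The function names, the tanh activation, and the array sizes are assumptions made for the example.

```python
import numpy as np

def softmax(scores):
    """Normalize scores into positive weights that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    w = np.exp(shifted)
    return w / w.sum()

def attention_pool(H, v, W, b):
    """H: (T, m) hidden states. Returns the weights (T,) and the context vector (m,)."""
    scores = np.tanh(H @ W + b) @ v   # one scalar importance score per hidden state
    weights = softmax(scores)          # softmax normalization, as in Equation (16)
    context = weights @ H              # weighted sum of the hidden states
    return weights, context

# Hypothetical sizes: T = 3 hidden states with m = 4 units, scoring layer of width 8.
rng = np.random.default_rng(1)
H = rng.normal(size=(3, 4))
W, b, v = rng.normal(size=(4, 8)), np.zeros(8), rng.normal(size=8)
weights, context = attention_pool(H, v, W, b)
```

When all scores are equal, the weights become uniform and the context vector reduces to the plain average of the hidden states, so attention can be seen as a learned generalization of average pooling over time.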

2.4.5. At-CBiLSTM Model

Due to the complexity of water pollutant formation and the nonlinear characteristics of concentration changes, achieving high accuracy in water quality prediction is challenging. Deep learning technologies offer a solution by automatically training deep neural networks to better capture the features of water quality sequences [45,46,47].

When Recurrent Neural Networks (RNNs) process time series data, the outputs of certain neurons can be fed back as inputs to other neurons, allowing the network to effectively utilize past information. However, RNNs have limited memory and storage capacity, which makes them susceptible to issues like gradient explosion and gradient vanishing. LSTM networks, as a specialized type of RNN for time series, can effectively capture dependencies between input data and mitigate gradient vanishing and gradient explosion problems. Despite this, LSTM models still struggle with long-term dependencies and cannot always effectively capture the time-relatedness between data points or identify the most important features at each time step [48].

Standard LSTM models process a time series in the forward direction only and therefore ignore information from later time steps. To address this limitation, researchers have proposed Bi-LSTM models, which make full use of both forward and backward neighborhood information and thereby improve the predictive accuracy for water quality sequences [49].

Convolutional Neural Networks (CNNs), as lower-level feature extractors, are highly effective at modeling local features. They extract local features from input data using convolutional operations and integrate them into global features through pooling operations, helping the model better capture key aspects of the data.
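The two operations named above, convolution for local features and pooling for aggregation, can be shown on a toy series. The sketch below is a plain NumPy illustration, not the layer configuration used in this study: the kernel size, the ReLU activation, and the non-overlapping max pooling are assumptions, and the function names are hypothetical.

```python
import numpy as np

def conv1d_valid(X, kernels, bias):
    """1-D convolution along time without padding, followed by ReLU.

    X: (T, d) series; kernels: (k, d, f); bias: (f,). Returns (T - k + 1, f).
    """
    T, d = X.shape
    k, _, f = kernels.shape
    out = np.empty((T - k + 1, f))
    for t in range(T - k + 1):
        window = X[t:t + k]  # local patch of k consecutive time steps
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)

def max_pool1d(F, size):
    """Non-overlapping max pooling along time; a trailing remainder is dropped."""
    n = F.shape[0] // size
    return F[:n * size].reshape(n, size, F.shape[1]).max(axis=1)

# Toy input: one feature observed at five time steps, one kernel that sums two neighbors.
X = np.arange(1, 6, dtype=float).reshape(5, 1)
kernels = np.ones((2, 1, 1))
bias = np.zeros(1)
features = conv1d_valid(X, kernels, bias)  # sums of adjacent values: 3, 5, 7, 9
pooled = max_pool1d(features, 2)           # maxima of consecutive pairs: 5, 9
```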

The attention mechanism enables the model to focus on the most important parts of the data. It dynamically computes the significance of each position based on the input information and adjusts the focus on key sections according to their importance. This improves the model’s efficiency in processing the data and enhances its ability to extract crucial information.

In this study, we leverage the unique advantages of Bi-LSTM and CNN, incorporating the attention mechanism to create a hybrid model that more efficiently captures the complex relationships between adjacent water quality data. This hybrid approach enhances prediction accuracy [50], as illustrated in Figure 5.

2.5. Evaluation Indicators for Prediction Results

We selected five evaluation metrics to assess the deviation between predicted and true values: Mean Absolute Error (MAE), Mean Square Error (MSE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and Coefficient of Determination (R²). In the formulas below, n is the number of samples, $y_{i}$ is the true value, and ${\hat{y}}_{i}$ is the predicted value. The specific calculation formulas are as follows:

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |{\hat{y}}_{i} - y_{i}|

(19)

MAE reflects the deviation between predicted and true values, with a range of [0, +∞). A smaller value indicates a smaller error and greater proximity to the true values. An MAE of 0 signifies perfect prediction performance.

\mathrm{MSE} = \frac{1}{n} \sum_{i = 1}^{n} {({\hat{y}}_{i} - y_{i})}^{2}

(20)

MSE measures the average squared prediction error, with a range of [0, +∞). A smaller MSE indicates higher prediction accuracy.

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {({\hat{y}}_{i} - y_{i})}^{2}}

(21)

The RMSE is the square root of the MSE. A smaller RMSE indicates a better fit, whereas a larger RMSE suggests a poorer fit.

\mathrm{MAPE} = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{{\hat{y}}_{i} - y_{i}}{y_{i}} \right|

(22)

MAPE is a statistical measure of prediction accuracy, with a range of [0, +∞). Equation (22) yields a fraction, which is multiplied by 100 to express MAPE as a percentage. An MAPE of 0% signifies optimal model performance, while values exceeding 100% indicate poor performance.

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(23)

where $\bar{y} = \frac{1}{n} \sum_{i = 1}^{n} y_{i}$ is the mean of the true values. A higher R² value indicates a greater degree to which the independent variables explain the dependent variable, with data points clustering more closely around the regression line. For a well-fitted model, R² lies between 0 and 1; values closer to 0 indicate a poorer fit, while those closer to 1 signify a better fit. R² becomes negative when the model predicts worse than the mean of the true values, which indicates very poor model performance.
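Equations (19) to (23) translate directly into code. The following NumPy sketch is one possible implementation, with a hypothetical function name and made-up sample values; MAPE is returned as a fraction, exactly as Equation (22) is written, and is undefined when any true value is zero.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, MAPE (as a fraction), and R² for paired arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))                       # Equation (19)
    mse = np.mean(err ** 2)                          # Equation (20)
    rmse = np.sqrt(mse)                              # Equation (21)
    mape = np.mean(np.abs(err / y_true))             # Equation (22); needs y_true != 0
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                       # Equation (23)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}

# Made-up values for illustration only (not data from this study).
y_true = [2.0, 4.0, 6.0]
good = regression_metrics(y_true, [2.5, 3.5, 7.0])
bad = regression_metrics(y_true, [6.0, 4.0, 2.0])   # reversed trend
```

In this toy case the second set of predictions reverses the trend of the true values, so its residual sum of squares exceeds the total sum of squares and R² falls below zero.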

3. Results

3.1. DCC Feature Selection Results

The distance correlation coefficient (DCC) analysis was used to assess the correlation between total nitrogen (TN) and other water quality parameters. The results ranked the importance of these parameters in predicting TN concentrations. As shown in Figure 3, the correlation strengths were as follows: conductivity (0.9), COD_Mn (0.79), total phosphorus (0.71), ammonia nitrogen (0.59), pH (0.47), dissolved oxygen (0.4), and water temperature (0.13). These findings highlight that conductivity is the most strongly correlated parameter with TN, followed by COD_Mn and total phosphorus. Based on these results, seven different input schemes were designed to evaluate their impact on model performance, as outlined in Table 4.
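The construction of the seven nested input schemes from the DCC ranking can be sketched as follows. The coefficients are the values reported above; the dictionary and variable names are hypothetical, and the rule "scheme k uses the k highest-ranked variables" is inferred from the scheme definitions in Table 4.

```python
# DCC values between TN and each parameter, as reported in the text.
dcc_with_tn = {
    "conductivity": 0.90,
    "COD_Mn": 0.79,
    "total phosphorus": 0.71,
    "ammonia nitrogen": 0.59,
    "pH": 0.47,
    "dissolved oxygen": 0.40,
    "water temperature": 0.13,
}

# Rank parameters from the strongest to the weakest dependence on TN.
ranked = sorted(dcc_with_tn, key=dcc_with_tn.get, reverse=True)

# Scheme k contains the k highest-ranked parameters (k = 1, ..., 7).
schemes = {k: ranked[:k] for k in range(1, len(ranked) + 1)}

for k, variables in schemes.items():
    print(k, ", ".join(variables))
```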

3.2. The Training Results of the LSTM Model with Different Input Schemes

To determine the optimal configuration of input variables for predicting TN levels in the Dongjiang River, the seven input schemes were evaluated using the LSTM model. The results, presented in Figure 6, indicate that Scheme 3, which includes conductivity, COD_Mn, and total phosphorus, achieved the highest prediction accuracy, with an MSE of 0.017, MAE of 0.079, MAPE of 0.51, and R² of 0.861. By comparison, Scheme 1, which included only conductivity, achieved an MSE of 0.020, MAE of 0.086, MAPE of 0.616, and R² of 0.831, and Scheme 2, which included conductivity and COD_Mn, achieved an MSE of 0.018, MAE of 0.084, MAPE of 0.528, and R² of 0.853. These results suggest that combining the most strongly correlated variables improves the model’s robustness and accuracy. The remaining schemes (4, 5, 6, and 7) showed progressively lower accuracy as additional variables were included. This suggests that while more variables can provide additional information, they may also introduce noise or complexity, which can reduce the model’s performance.

3.3. Performance Prediction of Models

In addition to the LSTM model, four coupled LSTM models were developed and evaluated using a validation set. The results, summarized in Table 5 and illustrated in Figure 7, show that the At-CBiLSTM model consistently exhibited the smallest prediction error across all time step lengths. For example, with a time step length of 1, the At-CBiLSTM model achieved an MAE of 0.062, MSE of 0.012, MAPE of 0.415, and RMSE of 0.089. Relative to the standard LSTM model at the same time step length, At-CBiLSTM reduced the MAPE by 0.095 (from 0.510 to 0.415), a substantial gain in prediction precision. The Bi-LSTM, CNN-LSTM, and Attention-LSTM models were assessed in the same way. At a time step length of 1, CNN-LSTM (MAE = 0.064) and Attention-LSTM (MAE = 0.068) both outperformed LSTM (MAE = 0.079) and Bi-LSTM (MAE = 0.076), which indicates that convolutional layers and attention mechanisms each contribute to prediction accuracy.

3.4. Impact of Time Step Length on Model Performance

The time step length was identified as a critical parameter influencing model performance. As shown in Figure 7, the prediction performance (e.g., MAE) of all models initially improved as the time step length increased from 1 to 3, reaching the highest accuracy at a time step length of 3. For instance, the At-CBiLSTM model achieved the lowest MAE (0.032), MSE (0.005), MAPE (0.218), and RMSE (0.045) at this time step length. However, further increases in the time step length beyond 3 led to a gradual decline in model performance, likely due to increased model complexity and overfitting. This finding highlights the importance of optimizing the time step length to strike a balance between model accuracy and computational efficiency.
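The role of the time step length is easiest to see in how the training samples are built. The sketch below shows one common windowing scheme, assumed here for illustration: the previous `time_steps` observations of the input features are used to predict the next TN value. The function name and the toy arrays are hypothetical, and the study's actual preprocessing may differ.

```python
import numpy as np

def make_windows(features, target, time_steps):
    """Turn a series into supervised samples.

    features: (N, d) inputs; target: (N,) TN values.
    Returns X with shape (N - time_steps, time_steps, d) and y with shape (N - time_steps,).
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(target) - time_steps
    if n <= 0:
        raise ValueError("series is too short for the requested time step length")
    # Sample i holds observations i, ..., i + time_steps - 1 and predicts observation i + time_steps.
    X = np.stack([features[i:i + time_steps] for i in range(n)])
    y = target[time_steps:]
    return X, y

# Toy series: 5 observations of 2 features (not data from this study).
features = np.arange(10).reshape(5, 2)
target = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
X, y = make_windows(features, target, time_steps=3)
```

A longer window gives each sample more history but also more inputs to fit and fewer samples overall, which is consistent with the accuracy decline observed beyond a time step length of 3.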

3.5. Comparative Analysis of Model Performance

A comparative analysis of the different models revealed that the At-CBiLSTM model consistently outperformed all other models across all evaluation metrics. For example, at a time step length of 3, the At-CBiLSTM model achieved an MAE of 0.032, indicating close agreement between predicted and actual TN concentrations. In comparison, the standard LSTM model achieved an MAE of 0.074, the Bi-LSTM model reached 0.064, the CNN-LSTM model recorded 0.054, and the Attention-LSTM model had an MAE of 0.060. This comparison highlights the effectiveness of integrating attention mechanisms, convolutional layers, and bidirectional LSTM layers in deep learning models for water quality prediction. The combination of these components enables the At-CBiLSTM model to better capture complex relationships within the data, leading to improved accuracy and more reliable predictions.
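The size of these gaps can be expressed as relative error reductions. The short script below recomputes them from the time-step-3 rows of Table 5; the variable and function names are hypothetical, and the numbers are copied from the table.

```python
# Error metrics at a time step length of 3, copied from Table 5.
step3 = {
    "MAE":  {"LSTM": 0.074, "Bi-LSTM": 0.064, "CNN-LSTM": 0.054, "Attention-LSTM": 0.060, "At-CBiLSTM": 0.032},
    "MSE":  {"LSTM": 0.012, "Bi-LSTM": 0.011, "CNN-LSTM": 0.007, "Attention-LSTM": 0.010, "At-CBiLSTM": 0.005},
    "MAPE": {"LSTM": 0.393, "Bi-LSTM": 0.375, "CNN-LSTM": 0.355, "Attention-LSTM": 0.365, "At-CBiLSTM": 0.218},
    "RMSE": {"LSTM": 0.062, "Bi-LSTM": 0.058, "CNN-LSTM": 0.055, "Attention-LSTM": 0.057, "At-CBiLSTM": 0.045},
}

def relative_reduction(baseline, proposed):
    """Fraction by which the proposed model lowers the baseline error."""
    return (baseline - proposed) / baseline

reductions = {
    metric: {
        model: relative_reduction(value, values["At-CBiLSTM"])
        for model, value in values.items()
        if model != "At-CBiLSTM"
    }
    for metric, values in step3.items()
}

for metric, by_model in reductions.items():
    print(metric, {model: f"{100 * r:.1f}%" for model, r in by_model.items()})
```

For RMSE against Bi-LSTM this gives (0.058 - 0.045) / 0.058, or about 22.4%, and the smallest reduction in the table is about 18% (RMSE against CNN-LSTM).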

4. Discussion

4.1. Key Findings on Model Performance

This study introduces the At-CBiLSTM model, which significantly enhances water quality prediction accuracy compared to baseline deep learning models such as LSTM, Bi-LSTM, and CNN-LSTM. The model achieved an MAE of 0.032 and RMSE of 0.045 in predicting total nitrogen (TN) concentration, demonstrating its strong ability to capture complex temporal and spatial dependencies in water quality data.

The improvement in performance is primarily attributed to the integration of CNN, Bi-LSTM, and attention mechanisms, which enhance feature extraction, long-term dependency learning, and model interpretability. Compared to baseline models, At-CBiLSTM achieves at least a 15% reduction in prediction error at the optimal time step length of 3 (Table 5), particularly in datasets with high variability. Similar findings have been reported in recent studies on deep-learning applications in hydrology and environmental monitoring [51,52].

Moreover, the hybrid model demonstrates superior robustness across different datasets. When trained on highly dynamic water quality datasets, At-CBiLSTM maintains stable accuracy, while LSTM and CNN-LSTM exhibit larger fluctuations due to their limited capacity to capture long-term dependencies and multiscale interactions. This confirms the effectiveness of multi-component deep learning architectures in hydrological forecasting [53].

4.2. Feature Selection and Model Optimization

Feature selection plays a crucial role in enhancing model accuracy. Using the Distance Correlation Coefficient (DCC), we identified key factors influencing total nitrogen (TN) concentration, such as electrical conductivity, COD_Mn, and total phosphorus. These variables are widely recognized as significant indicators of water quality [54]. The DCC method provides a robust framework for capturing both linear and nonlinear dependencies, outperforming traditional correlation methods like Pearson’s correlation and mutual information analysis.
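The difference between distance correlation and Pearson's correlation can be seen on a purely nonlinear relationship. The sketch below assumes the standard sample definition of distance correlation (double-centered pairwise distance matrices, following Székely and co-authors); it is an O(n²) illustration with a hypothetical function name and synthetic data, not the implementation used in this study.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two samples of equal length."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    # Pairwise Euclidean distance matrices.
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    # Double centering: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()      # squared distance covariance
    dvar_x = (A * A).mean()     # squared distance variance of x
    dvar_y = (B * B).mean()     # squared distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))

# Synthetic example: y depends on x only through a symmetric, nonlinear function.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
pearson = np.corrcoef(x, y)[0, 1]     # close to 0: no linear trend
dcor = distance_correlation(x, y)     # clearly above 0: dependence is detected
```

For an exactly linear relationship the same function returns 1, so the measure agrees with Pearson's correlation in the linear case while still responding to nonlinear dependence.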

Moreover, model optimization experiments revealed that a three-day time step achieves the best balance between prediction accuracy and computational efficiency. While longer time steps (e.g., seven days) provide more historical context, they also increase the risk of overfitting. Conversely, shorter time steps (e.g., one day) fail to capture sufficient temporal dependencies.

To further improve feature selection, future research should explore adaptive feature weighting techniques, allowing the model to dynamically adjust the importance of variables in response to environmental changes [55]. Additionally, integrating causal inference frameworks, such as Granger causality tests or graph neural networks, could enhance the model’s interpretability in complex hydrological systems [56].

4.3. Model Applicability and Future Directions

Despite its high accuracy, the computational cost of the At-CBiLSTM model presents a challenge for real-time applications. Future research should focus on reducing computational complexity through techniques such as model pruning, quantization, and knowledge distillation, which have demonstrated the ability to improve efficiency while maintaining predictive power [57]. Additionally, implementing distributed computing frameworks, such as cloud-based deep learning inference, could facilitate real-time deployment in large-scale monitoring networks [58].

Another significant challenge is the black-box nature of deep learning models, which may limit their acceptance in decision-making processes [59]. Future work should explore Explainable AI (XAI) techniques, such as SHAP and LIME, to improve interpretability and build trust in AI-driven water quality management systems [60]. Furthermore, integrating multi-source data could enhance the model’s adaptability to varying environmental conditions. For example, Wang et al. [61] integrated GLDAS hydrological, hydro-meteorological, and streamflow data to develop a spatio-temporal deep learning model, achieving superior streamflow prediction and early flood warning.

Finally, the model’s adaptability to extreme weather conditions and pollution events should be further investigated. The increasing frequency of climate-induced water quality fluctuations, such as flooding and algal blooms, underscores the need for models that can dynamically adjust to evolving environmental conditions. Future enhancements may involve reinforcement learning-based self-adaptive architectures, enabling the model to continuously learn from new water quality data.

4.4. Innovations and Implications for Water Quality Management

The At-CBiLSTM model represents a significant advancement in water quality prediction by integrating the distance correlation coefficient (DCC) with hybrid spatiotemporal modeling. By combining CNN, Bi-LSTM, and attention mechanisms alongside DCC-based nonlinear feature selection, the model enhances both predictive accuracy and interpretability in the monitoring of large-scale and complex water systems. The use of DCC enables the identification of key dependencies among water quality variables, effectively filtering out irrelevant features and capturing intricate relationships that traditional methods may overlook.

This integration makes the At-CBiLSTM model highly adaptable to various ecological environments, including coastal regions, estuaries, and groundwater systems, where precise water quality assessment is crucial. Additionally, its ability to dynamically adjust to changing conditions strengthens early warning systems for pollution events, enabling timely responses and mitigation strategies.

As global concerns over water pollution and resource scarcity intensify, the fusion of DCC with deep learning models like At-CBiLSTM will play a vital role in supporting proactive environmental policies and regulatory measures. Future advancements will require interdisciplinary collaboration between AI researchers, hydrologists, and policymakers to refine and deploy these innovative solutions to tackle real-world environmental challenges.

5. Conclusions

This study introduces a novel framework for high-precision total nitrogen (TN) prediction in river systems, addressing critical gaps in feature selection and temporal modeling. Three key innovations define the work, as follows:

Automated nonlinear feature engineering: The Distance Correlation Coefficient (DCC) method quantified nonlinear interactions between TN and water quality parameters, identifying conductivity, COD_Mn, and total phosphorus as dominant drivers;
Hybrid spatiotemporal architecture: The Attention-Convolutional-Bidirectional LSTM (At-CBiLSTM) model integrated CNN-based spatial pattern extraction, bidirectional temporal modeling, and adaptive attention mechanisms. This approach achieved a 22.4% RMSE reduction (0.045 vs. 0.058) compared to conventional Bi-LSTM models;
Real-time optimization: Systematic time-step analysis identified 3-day intervals as optimal (MAE = 0.032, MSE = 0.005, MAPE = 0.218, RMSE = 0.045), balancing prediction accuracy with computational efficiency for deployable monitoring systems.

By overcoming limitations in manual feature bias and unidirectional temporal modeling, this approach provides a generalizable solution for nutrient prediction in tidal rivers and ecologically sensitive basins. Future research will prioritize lightweight deployment for watershed-scale pollution early-warning systems.

Author Contributions

Conceptualization, Y.C. (Yuanpei Chen); investigation, Y.C. (Yuanpei Chen); methodology, Y.C. (Yuanpei Chen); data acquisition, Y.C. (Yuanpei Chen); data analysis, Y.C. (Yuanpei Chen); visualization, Y.C. (Yuanpei Chen); writing—original draft preparation, Y.C. (Yuanpei Chen); formal analysis, W.Y.; writing—review and editing, W.Y. and Y.C. (Yiling Chen); supervision, Y.C. (Yiling Chen). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 72293602 and 72293600.

Data Availability Statement

The data generated and analyzed during the current study are available from the corresponding author.

Acknowledgments

The authors would like to thank Yongkai Huang for his support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Rong, Q.Q.; Su, M.R.; Yang, Z.F.; Cai, Y.P.; Yue, W.C.; Dang, Z. Spatial distribution and output characteristics of nonpoint source pollution in the Dongjiang River basin in south China. In Proceedings of the 3rd International Conference on Advances in Energy Resources and Environment Engineering (ICAESEE), Harbin, China, 8–10 December 2017; IOP Publishing Ltd.: Bristol, UK, 2017.
2. Xu, P.L. Research and application of near-infrared spectroscopy in rapid detection of water pollution. Desalination Water Treat. 2018, 122, 1–4.
3. He, Z.; Yao, J.; Lu, Y.; Guo, D. Detecting and explaining long-term changes in river water quality in south-eastern Australia. Hydrol. Process. 2022, 36, 15.
4. Yan, X.; Zhang, T.; Du, W.; Meng, Q.; Xu, X.; Zhao, X. A Comprehensive Review of Machine Learning for Water Quality Prediction over the Past Five Years. J. Mar. Sci. Eng. 2024, 12, 18.
5. Guo, H.-N.; Wu, S.-B.; Tian, Y.-J.; Zhang, J.; Liu, H.-T. Application of machine learning methods for the prediction of organic solid waste treatment and recycling processes: A review. Bioresour. Technol. 2021, 319, 13.
6. Zhao, S.; Zhang, S.; Liu, J.; Wang, H.; Zhu, J.; Li, D.; Zhao, R. Application of machine learning in intelligent fish aquaculture: A review. Aquaculture 2021, 540, 19.
7. Zhong, S.; Zhang, K.; Bagheri, M.; Burken, J.G.; Gu, A.; Li, B.; Ma, X.; Marrone, B.L.; Ren, Z.J.; Schrier, J.; et al. Machine Learning: New Ideas and Tools in Environmental Science and Engineering. Environ. Sci. Technol. 2021, 55, 12741–12754.
8. Cai, W.; Ye, C.; Ao, F.; Xu, Z.; Chu, W. Emerging applications of fluorescence excitation-emission matrix with machine learning for water quality monitoring: A systematic review. Water Res. 2025, 277, 16.
9. Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2011, 66, 247–259.
10. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
11. Khoi, D.N.; Quan, N.T.; Linh, D.Q.; Nhi, P.T.T.; Thuy, N.T.D. Using Machine Learning Models for Predicting the Water Quality Index in the La Buong River, Vietnam. Water 2022, 14, 12.
12. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
13. Juna, A.; Umer, M.; Sadiq, S.; Karamti, H.; Eshmawi, A.A.; Mohamed, A.; Ashraf, I. Water Quality Prediction Using KNN Imputer and Multilayer Perceptron. Water 2022, 14, 19.
14. Wang, K.; Ye, Z.; Wang, Z.; Liu, B.; Feng, T. MACLA-LSTM: A Novel Approach for Forecasting Water Demand. Sustainability 2023, 15, 19.
15. Guo, S.; Sun, S.; Zhang, X.; Chen, H.; Li, H. Monthly precipitation prediction based on the EMD-VMD-LSTM coupled model. Water Supply 2023, 23, 4742–4758.
16. Xu, H.; Lv, B.; Chen, J.; Kou, L.; Liu, H.; Liu, M. Research on a Prediction Model of Water Quality Parameters in a Marine Ranch Based on LSTM-BP. Water 2023, 15, 15.
17. Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature selection in machine learning: A new perspective. Neurocomputing 2018, 300, 70–79.
18. Schober, P.; Boer, C.; Schwarte, L.A. Correlation Coefficients: Appropriate Use and Interpretation. Anesth. Analg. 2018, 126, 1763–1768.
19. Edelmann, D.; Móri, T.F.; Székely, G.J. On relationships between the Pearson and the distance correlation coefficients. Stat. Probab. Lett. 2021, 169, 6.
20. Miao, C.S. Clustering of different dimensional variables based on distance correlation coefficient. J. Ambient. Intell. Humaniz. Comput. 2021, 12.
21. Li, C.; Zhu, D.; Hu, C.; Li, X.; Nan, S.; Huang, H. ECDX: Energy consumption prediction model based on distance correlation and XGBoost for edge data center. Inf. Sci. 2023, 643, 13.
22. Huang, Y.K.; Chen, Y.L. Prediction of Total Phosphorus Based on Distance Correlation and Machine Learning Methods-a Case Study of Dongjiang River, China. Water Air Soil Pollut. 2024, 235, 14.
23. Ruan, S.; Chen, B.; Song, K.; Li, H. Weighted naive Bayes text classification algorithm based on improved distance correlation coefficient. Neural Comput. Appl. 2022, 34, 2729–2738.
24. Chen, Y.; Song, L.; Liu, Y.; Yang, L.; Li, D. A Review of the Artificial Neural Network Models for Water Quality Prediction. Appl. Sci. 2020, 10, 49.
25. Noori, N.; Kalin, L.; Isik, S. Water quality prediction using SWAT-ANN coupled approach. J. Hydrol. 2020, 590, 10.
26. Wang, L.; Zou, H.; Su, J.; Li, L.; Chaudhry, S. An ARIMA-ANN Hybrid Model for Time Series Forecasting. Syst. Res. Behav. Sci. 2013, 30, 244–259.
27. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
28. Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to forget: Continual prediction with LSTM. Neural Comput. 2000, 12, 2451–2471.
29. Liu, P.; Wang, J.; Sangaiah, A.K.; Xie, Y.; Yin, X. Analysis and Prediction of Water Quality Using LSTM Deep Neural Networks in IoT Environment. Sustainability 2019, 11, 14.
30. Li, G.; Cui, Q.; Wei, S.; Wang, X.; Xu, L.; He, L.; Kwong, T.C.H.; Tang, Y. Long short-term memory network-based wastewater quality prediction model with sparrow search algorithm. Int. J. Wavelets Multiresolut. Inf. Process. 2023, 21, 20.
31. Kratzert, F.; Klotz, D.; Brenner, C.; Schulz, K.; Herrnegger, M. Rainfall-runoff modelling using Long Short-Term Memory (LSTM) networks. Hydrol. Earth Syst. Sci. 2018, 22, 6005–6022.
32. Khullar, S.; Singh, N. Water quality assessment of a river using deep learning Bi-LSTM methodology: Forecasting and validation. Environ. Sci. Pollut. Res. 2022, 29, 12875–12889.
33. Barzegar, R.; Aalami, M.T.; Adamowski, J. Short-term water quality variable prediction using a hybrid CNN-LSTM deep learning model. Stoch. Environ. Res. Risk Assess. 2020, 34, 415–433.
34. Wu, J.H.; Wang, Z.C. A Hybrid Model for Water Quality Prediction Based on an Artificial Neural Network, Wavelet Transform, and Long Short-Term Memory. Water 2022, 14, 26.
35. Wang, Z.C.; Wang, Q.Y.; Wu, T.H. A novel hybrid model for water quality prediction based on VMD and IGOA optimized for LSTM. Front. Environ. Sci. Eng. 2023, 17, 17.
36. Zhang, Y.; Li, C.; Jiang, Y.; Sun, L.; Zhao, R.; Yan, K.; Wang, W. Accurate prediction of water quality in urban drainage network with integrated EMD-LSTM model. J. Clean. Prod. 2022, 354, 12.
37. Madhukumar, N.; Wang, E.; Fookes, C.; Xiang, W. 3-D Bi-directional LSTM for Satellite Soil Moisture Downscaling. IEEE Trans. Geosci. Remote Sens. 2022, 60, 18.
38. Chaudhuri, A.; Hu, W.H. A fast algorithm for computing distance correlation. Comput. Stat. Data Anal. 2019, 135, 15–24.
39. Huo, X.M.; Székely, G.J. Fast Computing for Distance Covariance. Technometrics 2016, 58, 435–447.
40. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Comput. 2019, 31, 1235–1270.
41. Salehi, A.W.; Khan, S.; Gupta, G.; Alabduallah, B.I.; Almjally, A.; Alsolai, H.; Siddiqui, T.; Mellit, A. A Study of CNN and Transfer Learning in Medical Imaging: Advantages, Challenges, Future Scope. Sustainability 2023, 15, 28.
42. Zhao, J.F.; Mao, X.; Chen, L.J. Learning deep features to recognise speech emotion using merged deep CNN. IET Signal Process. 2018, 12, 713–721.
43. Shahid, F.; Zameer, A.; Muneeb, M. Predictions for COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM. Chaos Solitons Fractals 2020, 140, 9.
44. Zhang, Q.; Wang, R.; Qi, Y.; Wen, F. A watershed water quality prediction model based on attention mechanism and Bi-LSTM. Environ. Sci. Pollut. Res. 2022, 29, 75664–75680.
45. Yao, S.; Zhang, Y.; Wang, P.; Xu, Z.; Wang, Y.; Zhang, Y. Long-Term Water Quality Prediction Using Integrated Water Quality Indices and Advanced Deep Learning Models: A Case Study of Chaohu Lake, China, 2019–2022. Appl. Sci. 2022, 12, 16.
46. Im, Y.; Song, G.; Lee, J.; Cho, M. Deep Learning Methods for Predicting Tap-Water Quality Time Series in South Korea. Water 2022, 14, 24.
47. Wu, H.; Cheng, S.; Xin, K.; Ma, N.; Chen, J.; Tao, L.; Gao, M. Water Quality Prediction Based on Multi-Task Learning. Int. J. Environ. Res. Public Health 2022, 19, 19.
48. Sherstinsky, A. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. Phys. D-Nonlinear Phenom. 2020, 404, 28.
49. Kang, H.; Yang, S.; Huang, J.; Oh, J. Time Series Prediction of Wastewater Flow Rate by Bidirectional LSTM Deep Learning. Int. J. Control Autom. Syst. 2020, 18, 3023–3030.
50. Kota, V.R.; Munisamy, S.D. High accuracy offering attention mechanisms based deep learning approach using CNN/bi-LSTM for sentiment analysis. Int. J. Intell. Comput. Cybern. 2022, 15, 61–74.
51. Bi, J.; Zhang, L.; Yuan, H.; Zhang, J. Multi-indicator water quality prediction with attention-assisted bidirectional LSTM and encoder-decoder. Inf. Sci. 2023, 625, 65–80.
52. Zhang, M.; Zhang, Z.; Wang, X.; Liao, Z.; Wang, L. The Use of Attention-Enhanced CNN-LSTM Models for Multi-Indicator and Time-Series Predictions of Surface Water Quality. Water Resour. Manag. 2024, 38, 6103–6119.
53. Rajaee, T.; Khani, S.; Ravansalar, M. Artificial intelligence-based single and hybrid models for prediction of water quality in rivers: A review. Chemom. Intell. Lab. Syst. 2020, 200, 25.
54. Wu, W.; Xu, Z.; Zhan, C.; Yin, X.; Yu, S. A new framework to evaluate ecosystem health: A case study in the Wei River basin, China. Environ. Monit. Assess. 2015, 187, 15.
55. An, T.; Feng, K.; Cheng, P.; Li, R.; Zhao, Z.; Xu, X.; Zhu, L. Adaptive prediction for effluent quality of wastewater treatment plant: Improvement with a dual-stage attention-based LSTM network. J. Environ. Manag. 2024, 359, 11.
56. Romić, D.; Reljić, M.; Romić, M.; Babac, M.B.; Brkić, Ž.; Ondrašek, G.; Kovačić, M.B.; Zovko, M. Temporal Variations in Chemical Proprieties of Waterbodies within Coastal Polders: Forecast Modeling for Optimizing Water Management Decisions. Agriculture 2023, 13, 27.
57. Tao, Z.; Xia, Q.; Cheng, S.; Li, Q. An Efficient and Robust Cloud-Based Deep Learning With Knowledge Distillation. IEEE Trans. Cloud Comput. 2023, 11, 1733–1745.
58. Kim, J.; Kim, H.; Kim, D.-J.; Song, J.; Li, C. Deep Learning-Based Flood Area Extraction for Fully Automated and Persistent Flood Monitoring Using Cloud Computing. Remote Sens. 2022, 14, 19.
59. Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence. Cogn. Comput. 2024, 16, 45–74.
60. Infant, S.S.; Vickram, S.; Saravanan, A.; Muthu, C.M.M.; Yuarajan, D. Explainable artificial intelligence for sustainable urban water systems engineering. Results Eng. 2025, 25, 14.
61. Wang, Z.; Xu, N.; Bao, X.; Wu, J.; Cui, X. Spatio-temporal deep learning model for accurate streamflow prediction with multi-source data fusion. Environ. Model. Softw. 2024, 178, 21.

Figure 1. Coordinate map of sampling points in Dongjiang.

Figure 2. Calculation results of distance correlation coefficients between water quality factors.

Figure 3. Importance ranking of total nitrogen and other water quality factor variables.

Figure 4. The unrolled diagram of an LSTM unit on a Timeline.

Figure 5. At-CBiLSTM model structure.

Figure 6. Evaluation results of different input schemes.

Figure 7. Performance evaluation of LSTM-coupled models using different error metrics. (a) MAE analysis of LSTM-coupled models. (b) MSE analysis of LSTM-coupled models. (c) MAPE analysis of LSTM-coupled models. (d) RMSE analysis of LSTM-coupled models.

Table 1. Statistics of water quality parameter characteristics.

Index	Unit	Mean	Standard Deviation	Minimum	Median	Maximum
Temperature	°C	24.143	5.438	0.330	24.144	35.537
pH	-	7.159	0.482	5.370	7.110	9.840
Dissolved Oxygen	mg L$^{-1}$	6.908	2.221	0.533	7.036	22.588
COD_Mn	mg L$^{-1}$	2.737	1.432	0.250	2.850	24.292
Ammonia Nitrogen	mg L$^{-1}$	0.387	0.540	0.025	0.151	4.505
Total Phosphorus (TP)	mg L$^{-1}$	0.087	0.086	0.005	0.057	1.418
TN	mg L$^{-1}$	5.226	3.612	0.133	5.193	18.230
Conductivity	μS cm$^{-1}$	341.310	246.410	0.002	213.100	1133.100
Turbidity	NTU	27.064	42.838	0.030	15.686	775.600

Table 2. Environmental quality standards for total nitrogen (TN) in surface water.

Water Quality Class	TN Concentration (mg/L)	Designated Use
Class I	≤0.2	Protected areas for source water and national reserves
Class II	≤0.5	Centralized drinking water sources, fishery waters
Class III	≤1.0	General drinking water and recreational uses
Class IV	≤1.5	Industrial and agricultural water
Class V	>1.5	Poor-quality water for limited uses

Table 3. Determination of the DCC correlation degree.

Range (Absolute Value)	Degree
0.8–1.0	Extremely strong correlation
0.6–0.79	Good correlation
0.4–0.59	Moderate correlation
0.2–0.39	Weak correlation
0.0–0.19	Extremely weak or no correlation
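
To illustrate how a coefficient could be computed and then mapped to the degrees in Table 3, the sketch below implements the standard sample distance correlation for two one-dimensional series using double-centered pairwise distance matrices. It assumes that DCC refers to this standard definition; the function names are hypothetical, and the handling of values that fall between the printed bands (for example 0.795, which is assigned to the lower band) is an assumption.

```python
import math


def _centered_distances(values):
    """Pairwise absolute distances, double-centered by row, column, and grand mean."""
    n = len(values)
    dist = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denominator = math.sqrt(dvar_x * dvar_y)
    if denominator == 0:
        return 0.0  # at least one series is constant
    return math.sqrt(max(dcov2, 0.0) / denominator)


def correlation_degree(dcc):
    """Map a coefficient to the Table 3 labels (lower band edges are inclusive)."""
    value = abs(dcc)
    if value >= 0.8:
        return "Extremely strong correlation"
    if value >= 0.6:
        return "Good correlation"
    if value >= 0.4:
        return "Moderate correlation"
    if value >= 0.2:
        return "Weak correlation"
    return "Extremely weak or no correlation"


if __name__ == "__main__":
    x = [-2, -1, 0, 1, 2]
    y = [v * v for v in x]  # nonlinear dependence with zero Pearson correlation
    r = distance_correlation(x, y)
    print(round(r, 3), correlation_degree(r))
```

The quadratic example is included because the Pearson coefficient of these two series is zero, whereas the distance correlation is clearly positive, which is the property that motivates its use for nonlinear relationships. This pure-Python version builds full n × n matrices, so it is only practical for small samples.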

Table 4. Seven schemes for different combinations of input variables.

Scheme	Input Variables	Correlation Degree
1	Conductivity	Extremely strong correlation
2	Conductivity, COD_Mn	Extremely strong and good correlation
3	Conductivity, COD_Mn, total phosphorus	Extremely strong and good correlation
4	Conductivity, COD_Mn, total phosphorus, ammonia nitrogen	Extremely strong, good, moderate correlation
5	Conductivity, COD_Mn, total phosphorus, ammonia nitrogen, pH	Extremely strong, good, moderate, and weak correlation
6	Conductivity, COD_Mn, total phosphorus, ammonia nitrogen, pH, dissolved oxygen	Extremely strong, good, moderate, and weak correlation
7	Conductivity, COD_Mn, total phosphorus, ammonia nitrogen, pH, dissolved oxygen, water temperature	Extremely strong, good, moderate, weak, extremely weak correlation
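
Each row of Table 4 adds the next variable from a ranked list to the inputs of the previous row. A minimal sketch of how such cumulative input sets could be generated is shown below; `RANKED_FEATURES` and `build_schemes` are hypothetical names, and the ordering is simply read off the rows of Table 4.

```python
# Feature order as it appears across the rows of Table 4 (most to least correlated).
# The names are illustrative; they are not taken from the study's code.
RANKED_FEATURES = [
    "Conductivity",
    "COD_Mn",
    "total phosphorus",
    "ammonia nitrogen",
    "pH",
    "dissolved oxygen",
    "water temperature",
]


def build_schemes(ranked_features):
    """Return {scheme number: first k ranked features} for k = 1..len(ranked_features)."""
    return {k: list(ranked_features[:k]) for k in range(1, len(ranked_features) + 1)}


if __name__ == "__main__":
    for number, inputs in build_schemes(RANKED_FEATURES).items():
        print(number, ", ".join(inputs))
```

Training one model per entry of the returned dictionary and comparing the errors would reproduce the kind of comparison that the seven schemes are designed for.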

Table 5. Evaluation results of models with different time steps.

Time Step (Days)	Error Indicator	LSTM	Bi-LSTM	CNN-LSTM	Attention-LSTM	At-CBiLSTM
1	MAE	0.079	0.076	0.064	0.068	0.062
1	MSE	0.017	0.016	0.014	0.015	0.012
1	MAPE	0.510	0.501	0.441	0.453	0.415
1	RMSE	0.129	0.117	0.098	0.110	0.089
2	MAE	0.078	0.065	0.061	0.063	0.051
2	MSE	0.015	0.015	0.013	0.014	0.011
2	MAPE	0.472	0.452	0.422	0.441	0.405
2	RMSE	0.087	0.082	0.072	0.077	0.067
3	MAE	0.074	0.064	0.054	0.060	0.032
3	MSE	0.012	0.011	0.007	0.010	0.005
3	MAPE	0.393	0.375	0.355	0.365	0.218
3	RMSE	0.062	0.058	0.055	0.057	0.045
4	MAE	0.078	0.069	0.060	0.065	0.053
4	MSE	0.033	0.026	0.016	0.024	0.015
4	MAPE	0.480	0.385	0.345	0.351	0.305
4	RMSE	0.076	0.072	0.060	0.064	0.054
5	MAE	0.093	0.085	0.072	0.083	0.063
5	MSE	0.036	0.030	0.026	0.028	0.016
5	MAPE	0.585	0.505	0.459	0.475	0.335
5	RMSE	0.081	0.076	0.068	0.072	0.061
6	MAE	0.130	0.115	0.082	0.098	0.080
6	MSE	0.044	0.038	0.034	0.036	0.025
6	MAPE	0.637	0.575	0.545	0.560	0.395
6	RMSE	0.099	0.094	0.074	0.088	0.069
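
For readers who want to reproduce this kind of comparison, the sketch below shows one way a time step could be turned into supervised samples and how the four error indicators are conventionally defined. It is an illustration under stated assumptions, not the study's implementation: the windowing is univariate for brevity, MAPE is returned as a fraction rather than a percentage, and `make_windows` and `regression_errors` are hypothetical names.

```python
import math


def make_windows(series, time_step):
    """Pair each run of `time_step` consecutive values with the value that follows it."""
    if time_step < 1 or time_step >= len(series):
        raise ValueError("time_step must be between 1 and len(series) - 1")
    count = len(series) - time_step
    inputs = [list(series[i:i + time_step]) for i in range(count)]
    targets = [series[i + time_step] for i in range(count)]
    return inputs, targets


def regression_errors(y_true, y_pred):
    """Return MAE, MSE, MAPE (as a fraction), and RMSE for paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an observed value is zero")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "MAPE": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "RMSE": math.sqrt(mse),  # by definition, the square root of MSE
    }


if __name__ == "__main__":
    inputs, targets = make_windows([1.0, 2.0, 3.0, 4.0, 5.0], time_step=3)
    print(inputs, targets)
    print(regression_errors([1.0, 2.0, 4.0], [1.5, 2.0, 3.0]))
```

In a multivariate setting, each window would hold `time_step` rows of the selected input variables instead of a single series, but the pairing of past values with the next target follows the same pattern.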

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
