Compressible Diagnosis of Membrane Fouling Based on Transfer Entropy

Wu, Xiaolong; Hou, Dongyang; Yang, Hongyan; Han, Honggui

doi:10.3390/app14188176

by Xiaolong Wu ^1,2,*, Dongyang Hou ^1,2, Hongyan Yang ^1,2 and Honggui Han ^1,2

¹ Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

² Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(18), 8176; https://doi.org/10.3390/app14188176

Submission received: 8 August 2024 / Revised: 5 September 2024 / Accepted: 10 September 2024 / Published: 11 September 2024

(This article belongs to the Special Issue Application of Neural Computation in Artificial Intelligence)

Abstract

Membrane fouling caused by many direct and indirect triggering factors has become an obstacle to the application of membrane bioreactors (MBRs). The nonlinear relationship between those factors is subject to complex causality or affiliation, which is difficult to clarify for the diagnosis of membrane fouling. To solve this problem, this paper proposes a compressible diagnosis model (CDM) based on transfer entropy to facilitate the fault diagnosis of the root cause for membrane fouling. The novelty of this model includes the following points: Firstly, a framework of a CDM between membrane fouling and causal variables is built based on a feature extraction algorithm and mechanism analysis. The framework can identify fault transfer scenarios following the changes in operating conditions. Secondly, the fault transfer topology of a CDM based on transfer entropy is constructed to describe the causal relationship between variables dynamically. Thirdly, an information compressible strategy is designed to simplify the fault transfer topology. This strategy can eliminate the repetitious affiliation relationship, which contributes to diagnosing the root causal variables speedily and accurately. Finally, the effectiveness of the proposed CDM is verified by the measured data from an actual MBR. The results of experiments demonstrate that the proposed CDM fulfills the diagnosis of membrane fouling.

Keywords: membrane fouling; diagnosis; causal relationship; root causal variables; transfer entropy

1. Introduction

Membrane fouling is an important factor affecting wastewater treatment processes with membrane bioreactors (MBRs), which can lead to the loss of membrane flux, deterioration of effluent quality, etc. [1,2,3]. To prevent this phenomenon, it is necessary to accurately diagnose future cases of membrane fouling before implementing efficient strategies [4,5]. However, in MBRs, the biochemical process that triggers membrane fouling is complicated and involves many causal variables, such as aeration, reflux, and dosing. The relationship between these variables is time-varying and nonlinear [6,7]. Therefore, it is challenging to establish an accurate diagnosis model for membrane fouling.

To establish the causal relationship between the variables and membrane fouling, some scholars have studied the mechanism of membrane fouling to diagnose its occurrence directly [8,9,10,11]. Lewis et al. analyzed the growth of filter cake in the process of low-pressure cross-flow microfiltration in MBR with fluid dynamics gauging [12]. Then, the diagnosis of membrane fouling was realized by quantifying the significance of membrane pore-level fouling phenomena at the early stage of filtration. In [13], a mechanism model was built for diagnosing membrane fouling by combining adenosine triphosphate and total cell count. The results displayed that this model was suitable for biological fouling diagnosis. In addition, MBR is affected by the flow distribution and hydraulic conditions in the reactor. A residence time distribution technique was developed to determine the impact of membrane geometry, orientation, and mixing efficiency on MBR performance [14]. Azizighannad et al. employed Raman chemical images to identify the morphology of membrane fouling [15]. This strategy could diagnose different types of membrane fouling with its observed appearance under specific static conditions. However, the above mechanisms are difficult to adapt to different working conditions. Since the correction of a large number of parameters is complicated, they are time-consuming for maintaining model accuracy. To resolve this problem, some data-driven diagnosis models, based on support vector regression (SVR), kernel function (KF), and artificial neural network (ANN) with strong adaptability, were established for membrane fouling [16,17,18,19]. For example, Liu et al. designed an SVR model with a LibSVM package to diagnose cases of membrane fouling in MBRs by mapping the relationship between extracellular polymer substances, organic loading rate, transmembrane pressure difference, and total membrane resistance [17]. 
The results showed that the relevant influencing factors of membrane fouling could be uncovered effectively. Han et al. proposed a multi-category diagnosis method based on KF for the detection and early warning of membrane fouling [18]. This method combined multiple binary classifiers to identify the causal variables of membrane fouling. Mittal et al. employed an ANN to identify membrane fouling and minimize the risk of its occurrence [19]. The parameters of this model were updated with a genetic algorithm, which allowed it to adapt to different operating conditions. Data-driven diagnosis models can capture the nonlinear relationship between input variables and output variables so that the causal variables of membrane fouling can be distinguished. However, the existing models lack interpretability and struggle to determine interactions among different variables. Moreover, the many variables with overlap and collinearity make the diagnosis more complex and ambiguous.

To achieve a diagnosis process with interpretability, Chen et al. simplified the causality diagram through Granger causality and maximum spanning trees to diagnose the root causal variables of process abnormalities [20]. However, Granger causality analysis is only applicable to linear processes and therefore cannot explain the nonlinearity in membrane fouling. To address this challenge, Waghen et al. proposed a multi-level interpretable logic tree to clarify the nonlinear relationship between root causes, intermediate causes, and faults [21]. In addition, several intelligent tools have also been introduced to explain nonlinear processes. For example, Duan et al. developed an accident-relevance tree based on the analysis of the formation mechanism of quality accidents [22]. The method located the root causes of quality accidents utilizing the fuzzy mechanism and the vague nature of datasets. Velásquez et al. combined the decision tree learner and ANN to diagnose power transformer faults, which reduced the calculation cost and improved the accuracy of fault classification simultaneously [23]. Other similar nonlinear methods can be observed in [24,25]. However, the methods mentioned in [21,22,23,24,25] only focus on the causality between local variables by constructing a tree causality diagram, rather than the interaction between all relevant variables. To provide the causal variables of faults comprehensively, Amin et al. synthesized principal component analysis (PCA) and Bayesian network to capture the nonlinear dependence of high-dimensional process data [26]. Then, the root causal variables of faults were diagnosed with the discretization of continuous data. In [27], a Bayesian network was developed to describe the relationship between alarm variables and root causal variables in thermal power plants. 
The parameters of the network were updated in a recursive way, which promoted the accurate detection of the root causal variables. Furthermore, Han et al. proposed a recursive kernel PCA and Bayesian network to diagnose sludge bulking in the wastewater treatment process [28]. This method effectively captured the nonlinear and time-varying characteristics of sludge bulking to diagnose the root causal variables with high accuracy. However, once the diagnosis models in [26,27,28] were constructed with given datasets, they always held the invariant information transfer path due to their complexity. When the operating conditions of MBR are changed frequently or drastically, it may be difficult to maintain acceptable accuracy for those models. Additionally, the relationship between those variables of membrane fouling exhibits both time-varying and nonlinear characteristics primarily because membrane fouling is a dynamic and complex process that is influenced by multiple, interacting factors. These characteristics can make it challenging to diagnose membrane fouling effectively.

Based on the above analysis, this paper proposes a compressible diagnosis model (CDM) based on transfer entropy. This model is used to depict the fault transfer topology (FTT) of membrane fouling and further explore the root cause following the operating conditions. The novelties of this work are as follows.

(1): Based on the mechanism analysis associated with membrane fouling, the relationship between causal variables and membrane fouling is clarified with the feature extraction algorithm. Then, the related variables of membrane fouling are obtained under different operating conditions. Different from the data-driven diagnosis models in [15,16,17,18,19] with given causal variables, the feature extraction algorithm will enable the proposed CDM to transform raw data into informative representations that can be utilized for diagnosis.
(2): Instead of using a mapping relationship with simple input–output representation such as decision trees [21,22] and the Bayesian network [24,25], a topology based on transfer entropy is constructed. This approach not only provides a qualitative evaluation of the causal relationships between variables by observing the dynamic transfer path, but it also offers a quantitative description of those variables. It helps uncover the path of fault occurrence and further obtain the fault cause priority that may change over time as the operating conditions change.
(3): The information compressible strategy (ICS) is designed to delete redundant or repetitious affiliation relationships between the causal variables. With this strategy, the simplified FTT is obtained with low complexity during the update of fault transfer topology, which can maintain the diagnosis of membrane fouling speedily and accurately.

The rest of this paper is organized as follows. Section 2 introduces the background of membrane fouling. Section 3 introduces a methodology for the diagnosis of membrane fouling in detail. Then, the results and discussion of diagnosing membrane fouling are introduced in Section 4. Finally, Section 5 concludes this paper.

2. Background of Membrane Fouling

2.1. Membrane Fouling

Membrane fouling refers to the increase in water resistance and the decrease in permeation flux caused by the deposition of pollutants on the membrane surface or in its pores. The mechanisms of membrane fouling mainly include (1) plugging of membrane pores by colloids and soluble microbial products (SMP), fouling adhesion, and gel layer formation; (2) formation and consolidation of a cake layer; (3) variation in pollutants due to long-term functioning of the reactor; and (4) the osmotic pressure effect. Membrane fouling can be divided into three types according to its characteristics: (1) removable fouling, which usually forms in the filter cake layer and can be removed by physical means; (2) irremovable fouling, which usually requires chemical cleaning to remove the pore blockage; and (3) irreversible fouling, which cannot be removed by any cleaning operation. In addition, there are various methods for studying membrane fouling, with the Hermia model being the most widely used. This semiempirical parametric model assigns physical significance to its parameters, so the fouling mechanism can be described effectively. The generalized Hermia model takes the form of a nonlinear differential equation. It describes the dynamics of membrane fouling and the complex relationships between its influencing factors. Because of these characteristics, it is difficult to design a method that diagnoses the root cause of membrane fouling accurately.
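
For orientation, the article does not write the Hermia model out. The form commonly cited in the filtration literature for constant-pressure dead-end filtration is reproduced here as a reference sketch; the symbols t, V, k, and n follow that literature and are not defined elsewhere in this article:

$\frac{d^{2} t}{d V^{2}} = k {\left( \frac{d t}{d V} \right)}^{n},$

where t is the filtration time, V is the cumulative permeate volume, k is a fouling constant, and the exponent n selects the blocking mechanism: n = 2 for complete pore blocking, n = 1.5 for standard (internal) blocking, n = 1 for intermediate blocking, and n = 0 for cake filtration. Which generalization the authors have in mind is not stated, so this form should be read as background only.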

2.2. Membrane Fouling Diagnosis System

The membrane fouling diagnosis system used for locating root causal variables online in the actual MBR wastewater treatment process is shown in Figure 1. The system consists of four modules: a data acquisition module, a feature extraction module, an online prediction module, and an online diagnosis module. The data acquisition module retrieves the values of process variables from sensors and transfers the acquired variables to the database via a programmable logic controller. The feature extraction module is designed to filter the collected process data. In this module, multiple related variables are selected as the preselected variables. Then, the partial least squares (PLS) method is used to obtain the feature variables. The online prediction module predicts the indicators to identify membrane states. Finally, the online diagnosis module constructs an FTT to locate root causal variables, which is crucial for taking corresponding measures to control membrane fouling.

3. Methodology

In this section, the CDM based on transfer entropy is proposed to diagnose membrane fouling. First, this method extracts relevant features, which reduces the complexity of the system and focuses on the key factors influencing membrane fouling. Second, the transfer entropy is calculated to quantify the causal transfer between different variables, and the transfer path is analyzed dynamically to form the FTT. Finally, the root causal variable is found through the topology. With the information compressible strategy, the causal relationship between variables is further simplified by deleting redundant or repetitive dependencies between causal variables.

3.1. Feature Variable Selection

In this part, the advantages of the linear regression algorithm and the typical PLS are integrated. Hence, the characteristic variables that have a great impact on membrane fouling detection can be selected. To be specific, the PLS algorithm is adopted in this study and the steps are as follows:

➀: The data of the independent variables are given as P = [p₁, p₂, …, p_n], p_j = (p_1j, p_2j, …, p_mj)^T, j = 1, 2, …, n, with samples indexed by i = 1, 2, …, m, and the dependent variable is Q = [q₁, q₂, …, q_m]^T. The data are standardized as follows:

$\begin{cases} p_{i j}^{'} = \frac{p_{i j} - {\bar{p}}_{j}}{s_{j}} \\ q_{i}^{'} = \frac{q_{i} - \bar{q}}{s} \end{cases},$

(1)

where the standardized P and Q are recorded as E₀ and F₀. $p_{i j}^{'}$ and $q_{i}^{'}$ represent the elements in E₀ and F₀, respectively. ${\bar{p}}_{j}$ and s_j represent the average value and standard deviation of the elements in column j of P, respectively. $\bar{q}$ and s represent the average value and standard deviation of all elements in Q, respectively. ${\bar{p}}_{j}$ and $\bar{q}$ can be expressed by

$\begin{cases} {\bar{p}}_{j} = \frac{1}{m} \sum_{i = 1}^{m} p_{i j}, \quad s_{j} = \sqrt{\frac{1}{m - 1} \sum_{i = 1}^{m} {(p_{i j} - {\bar{p}}_{j})}^{2}} \\ \bar{q} = \frac{1}{m} \sum_{i = 1}^{m} q_{i}, \quad s = \sqrt{\frac{1}{m - 1} \sum_{i = 1}^{m} {(q_{i} - \bar{q})}^{2}} \end{cases} .$

(2)
➁: The principal component of the variable is found via the following formula:

$\begin{cases} a_{h} = E_{h - 1}^{T} F_{0} / {\| E_{h - 1}^{T} F_{0} \|}^{2} \\ v_{h} = E_{h - 1} a_{h} \\ b_{h} = E_{h - 1}^{T} v_{h} / {\| v_{h} \|}^{2} \\ r_{h} = F_{h - 1}^{T} v_{h} / {\| v_{h} \|}^{2} \\ E_{h} = E_{h - 1} - v_{h} b_{h}^{T} \end{cases},$

(3)

where h is the number of extracted principal components. E_h is the standardized independent variable matrix when h components are extracted. F_h is the standardized dependent variable matrix when h components are extracted. v_h is the component extracted from E_{h−1}. a_h, b_h, and r_h represent the intermediate vectors.
➂: The cross-validity $Q_{h}^{2}$ is used to determine the number of final extracted components with

$Q_{h}^{2} = 1 - L (h) / L L (h - 1),$

(4)

where L(h) and LL(h) can be expressed by

$L (h) = \sum_{d = 1}^{m} {(q_{d}^{'} - {\hat{q}}_{h (- d)}^{'})}^{2}, L L (h) = \sum_{d = 1}^{m} {(q_{d}^{'} - {\hat{q}}_{h (d)}^{'})}^{2},$

(5)

where $q_{d}^{'}$ is the dth (0 < d ≤ h + 1) element in F₀, and ${\hat{q}}^{'}$ represents the fitting quantity. Sample point d is deleted when modeling with the linear regression model, and h components are taken for regression modeling to obtain the coefficients α_i. Then, the fitting value of the dth sample point is calculated and recorded as ${\hat{q}}_{h (- d)}^{'}$ . Additionally, all sample points are used, and h components are taken for regression modeling to obtain coefficients ς_i. Finally, the fitting value of the dth sample point is calculated and recorded as ${\hat{q}}_{h (d)}^{'}$ .

In the process of extracting components, when $Q_{h + 1}^{2}$ < 0.0975 and the model accuracy reaches the expected requirements, the process of extracting components stops. The number of extracted principal components is m, and the feature variables are represented as X = [x₁, x₂, …, x_m].
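
The standardization and component extraction steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' MATLAB implementation: the function names `standardize` and `pls1_coefficients` are hypothetical, the weight vector is normalized to unit length (the usual PLS convention, assumed here), the response is deflated after each component, and the number of components is passed in directly instead of being chosen by the cross-validity test.

```python
import numpy as np

def standardize(P, Q):
    """Column-wise z-score using the sample standard deviation (Eqs. 1-2)."""
    E0 = (P - P.mean(axis=0)) / P.std(axis=0, ddof=1)
    F0 = (Q - Q.mean()) / Q.std(ddof=1)
    return E0, F0

def pls1_coefficients(P, Q, n_components):
    """Single-response PLS; returns regression coefficients on standardized data."""
    E, F = standardize(P, Q)
    n = P.shape[1]
    W = np.zeros((n, n_components))   # weight vectors a_h
    B = np.zeros((n, n_components))   # loading vectors b_h
    r = np.zeros(n_components)        # response loadings r_h
    for h in range(n_components):
        a = E.T @ F
        a /= np.linalg.norm(a)        # assumption: unit-norm weights
        v = E @ a                     # component v_h
        vv = v @ v
        b = E.T @ v / vv
        r[h] = F @ v / vv
        E = E - np.outer(v, b)        # deflate the predictors
        F = F - r[h] * v              # deflate the response
        W[:, h], B[:, h] = a, b
    # Map component-space coefficients back to the standardized variables.
    return W @ np.linalg.solve(B.T @ W, r)

def rank_features(coef):
    """Indices of the variables sorted by decreasing |coefficient|."""
    return [int(i) for i in np.argsort(-np.abs(coef))]
```

Ranking the variables by the absolute value of the returned coefficients mirrors the selection of variables with large regression coefficients; the cut-off itself is a modeling decision and is not fixed by this sketch.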

3.2. Membrane Fouling Detection Model

An autoencoder (AE) is a multi-layer neural network trained by unsupervised learning, which includes an encoder and a decoder (as shown in Figure 2). The encoder compresses the input data X = [x₁, x₂, …, x_m] to obtain the outputs of the hidden layer H = [h₁, h₂, …, h_s]. The decoder takes the outputs of the hidden layer as its input and obtains the output $\hat{X}$ = [$\hat{x}_{1}$, $\hat{x}_{2}$, …, $\hat{x}_{m}$]. H can be expressed as

$H = f (W_{x} X),$

(6)

where f(x) = 1/(1 + e^−x) is the activation function. W_x represents the weight between the input layer and the hidden layer, and s < m. The output value $\hat{X}$ can be expressed as

$\hat{X} = f (W_{h} H),$

(7)

where W_h is the weight between the hidden layer and the output layer.

The back-propagation algorithm is used to adjust the parameters of AE in the training process. For the input variable X = [x₁, x₂,…, x_m], the objective function is defined as

$J = \frac{1}{2 m} \sum_{i = 1}^{m} {\| x_{i} - \hat{x}_{i} \|}^{2} .$

(8)

To judge the existence of membrane fouling, the collected data samples are fed to the autoencoder after training. Assuming that the AE accurately captures the input–output pattern of normal operation, the maximum root mean square error (RMSE) on the training samples is taken as the threshold for judging incidents of membrane fouling. When the reconstruction error of a sample is greater than this threshold, membrane fouling is considered to exist in the wastewater treatment process.
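
The detection rule can be illustrated with a small NumPy autoencoder. The sketch below is an assumption-laden illustration, not the authors' model: the class `TinyAutoencoder` and the helpers `fouling_threshold` and `detect_fouling` are hypothetical names, bias terms are omitted to match Eqs. (6) and (7), training uses plain batch gradient descent, and the inputs are assumed to be scaled to the interval (0, 1) because the sigmoid output cannot leave that range.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyAutoencoder:
    """One-hidden-layer autoencoder without bias terms (Eqs. 6-7)."""

    def __init__(self, m, s, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = rng.normal(0.0, 0.5, size=(s, m))  # input -> hidden
        self.Wh = rng.normal(0.0, 0.5, size=(m, s))  # hidden -> output

    def forward(self, X):
        H = sigmoid(X @ self.Wx.T)           # rows of X are samples
        return H, sigmoid(H @ self.Wh.T)

    def fit(self, X, epochs=1000, lr=0.2):
        n = X.shape[0]
        for _ in range(epochs):
            H, X_hat = self.forward(X)
            # Back-propagate the squared reconstruction error.
            d_out = (X_hat - X) * X_hat * (1.0 - X_hat)
            d_hid = (d_out @ self.Wh) * H * (1.0 - H)
            self.Wh -= lr * d_out.T @ H / n
            self.Wx -= lr * d_hid.T @ X / n
        return self

    def rmse(self, X):
        """Per-sample root mean square reconstruction error."""
        _, X_hat = self.forward(X)
        return np.sqrt(np.mean((X - X_hat) ** 2, axis=1))

def fouling_threshold(ae, X_normal):
    """Maximum RMSE over normal training samples."""
    return ae.rmse(X_normal).max()

def detect_fouling(ae, threshold, X):
    """True where the reconstruction error exceeds the threshold."""
    return ae.rmse(X) > threshold
```

With the dimensions reported later in the article, the model would be created as `TinyAutoencoder(m=12, s=5)`. Because the threshold is the maximum training error, no training sample is flagged by construction; how well unseen fouling samples are separated depends on the data.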

3.3. Construction of Fault Transfer Topology

Since transfer entropy can represent the direction of information transfer between variables, it can represent the relationship between fault variables. The transfer entropy is given as follows:

$T (X_{i + 1} | X, Y) = \sum_{x_{i + 1}, x_{i}, y_{j}} p (x_{i + 1}, x_{i}^{(k)}, y_{j}^{(l)}) \log_{2} \frac{p (x_{i + 1} | x_{i}^{(k)}, y_{j}^{(l)})}{p (x_{i + 1} | x_{i}^{(k)})},$

(9)

where p(·|·) is the conditional probability. x_i and y_j represent the measured values of X and Y at time i and time j, respectively. x_{i+1} represents the measured value of X at the next time step. k and l are the embedding dimensions of X and Y, respectively. The transfer entropy represents the influence of the existence of y_j on the state of x_{i+1}.

By calculating the transfer entropy between preselected variables, the information transfer relationship between all variables can be obtained. The transfer entropy of Y to X is different from X to Y, which shows that there are differences in the amount of information transferred in two directions. It reflects the difference in the degree of interaction between variables. The direction of causality between variables can be determined by the difference between two transfer entropies:

$T_{Y \to X} = T (X_{i + 1} | X, Y) - T (Y_{i + 1} | Y, X) .$

(10)

If T_{Y→X} is positive, it means that Y has a greater impact on the information entropy of X than the impact X has on the information entropy of Y. At this time, Y is the causal variable of X. On the contrary, if T_{Y→X} is negative, it means that X is the causal variable of Y.
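
A plug-in estimate of Eqs. (9) and (10) can be sketched with the Python standard library. The sketch makes several simplifying assumptions that the article does not state: the embedding dimensions are fixed at k = l = 1, the source is read at the same time index as the target (j = i), continuous measurements are discretized into equal-width bins, and probabilities are estimated by simple counting. The function names are hypothetical.

```python
from collections import Counter
from math import log2

def discretize(series, bins=4):
    """Map continuous values to equal-width bin indices 0..bins-1."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0   # avoid zero width for constant series
    return [min(int((v - lo) / width), bins - 1) for v in series]

def transfer_entropy(source, target):
    """Transfer entropy (bits) from source to target, with k = l = 1."""
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    pairs_xy = Counter(zip(target[:-1], source[:-1]))   # (x_i, y_i)
    pairs_xx = Counter(zip(target[1:], target[:-1]))    # (x_{i+1}, x_i)
    singles = Counter(target[:-1])                      # x_i
    n = len(target) - 1
    te = 0.0
    for (x_next, x, y), c in triples.items():
        p_joint = c / n
        p_given_xy = c / pairs_xy[(x, y)]
        p_given_x = pairs_xx[(x_next, x)] / singles[x]
        te += p_joint * log2(p_given_xy / p_given_x)
    return te

def net_transfer_entropy(x, y):
    """Eq. (10): a positive value suggests that Y is the causal variable of X."""
    return transfer_entropy(y, x) - transfer_entropy(x, y)
```

For a pair in which X simply repeats the previous value of a random binary Y, the estimate from Y to X approaches one bit while the reverse direction stays near zero, so the net value is positive. Counting-based estimates are biased upward for short series and many bins, which is one reason a significance threshold is needed.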

The transfer entropy between all selected variables is calculated to determine the causality, from which an adjacency matrix A ∈ R^{k×k} is built. Then, the position in the adjacency matrix A is determined according to the direction of causality between variables. If T_{Y→X} is positive, it means Y is the causal variable of X. The value of T_{Y→X} is placed in row Y and column X of A:

$A = \begin{pmatrix} 0 & \dots & 0 & \dots & T_{1 \to a} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ T_{a \to 1} & \dots & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ 0 & \dots & T_{a \to b} & \dots & 0 \end{pmatrix},$

(11)

where the diagonal values of A are 0. T_{a→1} means that the ath variable is the causal variable of the first variable. In that case, the value of T_{1→a} is 0.

Thus, the adjacency matrix between variables can be obtained, and the related variables can be connected by lines according to the values in the matrix to obtain the fault transfer topology. After determining the causal relationship between all variables, FTT can be obtained as shown in Figure 3.
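
The placement rule for the adjacency matrix can be written compactly. In this sketch, `te[i][j]` is assumed to hold an already computed transfer entropy from variable i to variable j; the function names and the nested-list representation are illustrative choices, not part of the article.

```python
def build_adjacency(te):
    """Build A from pairwise transfer entropies te[i][j] (from i to j)."""
    k = len(te)
    A = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            net = te[i][j] - te[j][i]   # net information flow from i to j
            if net > 0:
                A[i][j] = net           # i is the causal variable of j
            elif net < 0:
                A[j][i] = -net          # j is the causal variable of i
    return A

def edges(A):
    """List the directed edges (cause, effect, weight) of the topology."""
    return [(i, j, w) for i, row in enumerate(A)
            for j, w in enumerate(row) if w > 0]
```

Each pair of variables contributes at most one directed edge, so the diagonal and one of every two mirrored entries stay at zero, which matches the description of A.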

3.4. Simplification of Fault Propagation Topology

FTT expresses the complex relationship between the variables, which will affect the search for the root causal variables of membrane fouling. Therefore, it is necessary to simplify this topology to identify the main impact relationships.

To test whether a transfer entropy value is significant, the temporal alignment between X and Y is disrupted by constructing new sequences as follows:

$\begin{cases} X^{s} = [X_{i}, X_{i + 1}, ..., X_{i + M - 1}] \\ Y^{s} = [Y_{j}, Y_{j + 1}, ..., Y_{j + M - 1}] \end{cases},$

(12)

where M is the number of samples of the new sequence, and N is the total number of samples of the original sequence. The value range of i and j is [1, N−M+1].

The new sequences are subsets of the original sequences. For a stationary sequence, the statistical characteristics of the new sequence are the same as those of the original sequence. To ensure that the two sequences are uncorrelated, i and j need to satisfy |i−j| ≥ e, where e is a sufficiently large integer. The two new sequences are then separated by a large time interval, which can be considered to eliminate the correlation between the two variables. Thus, two variable sequences without causality are obtained. The transfer entropy te_s of multiple sets of such new sequences is calculated and stored in NET, with NET = [te₁, te₂, …, te_s]. Then, the significance threshold is calculated by

$S_{Y \to X} = \mu_{NET} + 3 \sigma_{NET},$

(13)

where µ_NET is the mean value of NET and σ_NET is the standard deviation of NET. A transfer entropy difference greater than this threshold indicates that there is a causal relationship between the variables.
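
The surrogate procedure behind Eqs. (12) and (13) can be sketched as follows. The sketch assumes discrete input series, a counting-based transfer entropy with embedding dimensions k = l = 1, randomly drawn window offsets, and that NET stores the transfer entropy from the shifted Y window to the shifted X window; the names `significance_threshold`, `n_surrogates`, and `seed` are hypothetical.

```python
import random
from collections import Counter
from math import log2
from statistics import mean, stdev

def transfer_entropy(source, target):
    """Counting-based transfer entropy (bits) from source to target."""
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    pairs_xy = Counter(zip(target[:-1], source[:-1]))
    pairs_xx = Counter(zip(target[1:], target[:-1]))
    singles = Counter(target[:-1])
    n = len(target) - 1
    return sum((c / n) * log2((c / pairs_xy[(x, y)]) /
                              (pairs_xx[(xn, x)] / singles[x]))
               for (xn, x, y), c in triples.items())

def significance_threshold(x, y, M, e, n_surrogates=50, seed=0):
    """Mean plus three standard deviations of surrogate transfer entropies."""
    N = len(x)
    if e > N - M:
        raise ValueError("e is too large for windows of length M")
    rng = random.Random(seed)
    net = []
    while len(net) < n_surrogates:
        i = rng.randint(0, N - M)       # start of the X window (0-based)
        j = rng.randint(0, N - M)       # start of the Y window (0-based)
        if abs(i - j) < e:
            continue                    # windows too close in time
        net.append(transfer_entropy(y[j:j + M], x[i:i + M]))
    return mean(net) + 3 * stdev(net)
```

An observed value would then be kept as an edge only if it exceeds the returned threshold. The window length M, the separation e, and the number of surrogates are tuning choices that the article does not report.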

To further simplify the fault transfer topology, an ICS based on a BIC score function is proposed. The designed ICS mainly includes two parts. First, the fitting degree is considered. Then, the complexity of the structure is reduced to avoid the decline in diagnosis accuracy caused by complex models and many other parameters.

For the dataset D = {D₁, D₂, D₃,…, D_m}, m represents the size of the sample dataset. The logarithmic likelihood function of the parameter θ can be expressed as follows:

$I (\theta | D) = \sum_{l = 1}^{m} \log P (D_{l} | \theta) = \sum_{i = 1}^{n} \sum_{j = 1}^{q_{i}} \sum_{k = 1}^{r_{i}} m_{i j k} \log \theta_{i j k},$

(14)

where θ ={θ_ijk|i = 1, 2, …, n, j = 1, 2, …, q_i, k = 1, 2, …, r_i}. θ_ijk represents the probability that the value of node X_i is k when the parent node value of node X_i is j. q_i represents the total number of parent node set values of node X_i, and q_i = 1 when node X_i has no parent node. r_i represents the number of possible value types of node X_i, and m_ijk represents the number of samples that meet X_i = k and the parent node is j in D.

The ICS based on the BIC score function can be expressed as

$BIC (G | D) = I (\theta | D) - \sum_{i = 1}^{n} \frac{q_{i} (r_{i} - 1)}{2} \log m,$

(15)

where BIC(G|D) represents the score for structure G. Through ICS, the indirect connection of variables is scored to determine whether to delete. The presentation of the indirect connection is shown in Figure 4. Then, the simplified FTT is obtained after using ICS.
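
The score in Eqs. (14) and (15) can be computed directly from counts. The sketch below assumes discrete data, maximum-likelihood estimates θ_ijk = m_ijk / m_ij, and q_i taken as the product of the parents' observed cardinalities; the data layout (a list of dictionaries) and the function name `bic_score` are illustrative assumptions.

```python
from collections import Counter
from math import log, prod

def bic_score(data, parents):
    """BIC of a structure given as {node: [parent nodes]} on discrete data."""
    m = len(data)
    cardinality = {node: len({row[node] for row in data}) for node in parents}
    score = 0.0
    for node, pa in parents.items():
        r = cardinality[node]                          # r_i
        q = prod(cardinality[p] for p in pa)           # q_i, equals 1 without parents
        m_ijk = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        m_ij = Counter(tuple(row[p] for p in pa) for row in data)
        loglik = sum(c * log(c / m_ij[j]) for (j, _), c in m_ijk.items())
        score += loglik - q * (r - 1) / 2 * log(m)     # fit minus complexity penalty
    return score
```

Comparing the score of the topology with and without a candidate connection gives a simple deletion rule: a connection that adds parameters without improving the likelihood lowers the score and can be removed. How the authors enumerate candidate indirect connections is not specified here.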

The information relevance strategy is proposed to find the root causal variables. The main idea of information relevance is to select a node as the starting node arbitrarily and calculate the sum of the transfer entropy of the remaining paths. When the sum of transfer entropy has the maximum value, the variable represented by this node is the root causal variable. The sum of transfer entropy can be expressed as

$K_{i} = \sum_{v = 1}^{N} T_{v} (X_{i + 1} | X, Y),$

(16)

where K_i represents the sum of the transfer entropy of all paths when the ith node is the starting node. N is the number of paths of the current structure, and T_v represents the transfer entropy corresponding to the vth path. Therefore, the corresponding node variable is selected as the root causal variable when K_i takes the maximum value. The specific steps of membrane fouling diagnosis are shown in Table 1. Additionally, a change in operating conditions will lead to changes in the input distribution of the CDM. This means that the potential relationship between process variables will change, and variables that were previously insignificant for membrane fouling may become the root causal variables of the fault. The proposed CDM has two mechanisms to perceive this kind of scenario: (1) The partial least squares method in the process of feature extraction will produce new principal components, which may lead to changes in the composition and number of filtered variables. (2) After recalculating the transfer entropy of the fault variables, the FTT will reflect a completely different causal relationship. When these scenarios happen, the steps of membrane fouling diagnosis will be refreshed.
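
One possible reading of Eq. (16) is sketched below. The article does not define precisely what counts as a path, so this sketch assumes that K_i is the sum of the transfer entropy on every edge reachable from node i, with each edge counted once; the function names are hypothetical and other readings (for example, summing over enumerated paths) would give different scores.

```python
def reachable_entropy(A, start):
    """Sum of edge weights on all edges reachable from `start`."""
    seen, stack, total = {start}, [start], 0.0
    while stack:
        u = stack.pop()                 # each node is expanded exactly once
        for v, w in enumerate(A[u]):
            if w > 0:
                total += w              # edge (u, v) is counted once
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return total

def root_cause(A):
    """Return the node with the largest score and the list of all scores."""
    scores = [reachable_entropy(A, i) for i in range(len(A))]
    best = max(range(len(A)), key=scores.__getitem__)
    return best, scores
```

Applied to a simplified adjacency matrix, the node returned by `root_cause` would be reported as the root causal variable under this interpretation.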

4. Results and Discussion

The effectiveness of the CDM is verified in an actual wastewater treatment plant (WWTP). The performance of this method is evaluated by diagnosis accuracy (DA). All the simulation experiments were programmed with MATLAB version 2018 and run on a PC with a clock speed of 3.0 GHz and 8 GB of RAM in a Microsoft Windows 10 environment. All data were acquired in real WWTPs from 1 January 2016 to 27 February 2016. In total, 2000 groups of data were selected as samples.

4.1. The Feature Variables

In this experiment, 5000 normal data samples, collected in the actual WWTP, are used to select the feature variables. By combining the insights gained from the existing data with expert knowledge, chemical oxygen demand (COD), influent NH₃-N, influent flow volume, NO₃-N in the anoxic zone, influent total phosphorus (TP), oxidation–reduction potential (ORP) in the anaerobic zone, dissolved oxygen (DO) in the aerobic zone, sludge concentration in the aerobic zone, effluent flow volume, mixed liquor suspended solids in the aerobic zone, aeration, effluent turbidity, water temperature, pH, F/M of the aerobic zone, and transmembrane pressure (TMP) are selected as preselected fault variables in this experiment and denoted with numbers 1 to 16. These variables capture the essential aspects of the fouling process, so that the diagnosis of membrane fouling rests on both empirical evidence and theoretical understanding. The regression coefficient of each independent variable in the obtained regression equation is taken as a measure of its correlation with the dependent variable. The coefficients corresponding to all independent variables are shown in Figure 5.

By sorting the coefficients, 12 variables with large regression coefficients are selected as the inputs of the membrane fouling detection model and membrane fouling diagnosis model, which are COD, influent NH₃-N, influent TP, ORP in the anaerobic zone, DO in the aerobic zone, sludge concentration in the aerobic zone, effluent flow, aeration, effluent turbidity, water temperature, pH, and TMP.

4.2. Results of Membrane Fouling Detection Model

In this part, 2000 samples of selected variables are used in the training dataset, and 800 samples of selected variables are used in the test dataset. The number of nodes in the input layer and output layer of the autoencoder is 12, and the number of nodes in the hidden layer is 5. Assuming that the CDM has accurately captured the input–output pattern of the given samples, the maximum root mean square error (RMSE) can be regarded as the threshold for judging incidents of membrane fouling. The experimental results are shown in Figure 6 and Figure 7. In Figure 6, the training RMSE lies within the interval [0, 0.35], which indicates that the training results of the CDM are promising. The maximum RMSE of the normal training samples in Figure 6 is used as the threshold, which is shown as the red line in Figure 7. Figure 7 shows that the data samples from 0 to 650 are under normal conditions, while membrane fouling happens in the data samples from 650 to 800. These results demonstrate the effectiveness of this method.

A_{1} = (\begin{matrix} \begin{array}{r} 0 & 0 & 0 & 0.810403 & 0.464642 & 0.057451 & 0.942594 & 1.247714 & 0.192291 & 0.11872 & 0.094991 & 1.058295 \\ 0.010245 & 0 & 0 & 0.730733 & 0.43716 & 0.05863 & 1.000398 & 1.228595 & 0.20512 & 0.134615 & 0.122252 & 1.100775 \\ 0.016909 & 0.001415 & 0 & 0.403387 & 0.28396 & 0.031732 & 0.611236 & 0.579717 & 0.133133 & 0.047092 & 0.073589 & 0.639023 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0.255535 & 0.470007 & 0 & 0 & 0 & 0.394362 \\ 0 & 0 & 0 & 0.476931 & 0 & 0 & 0.822288 & 1.030058 & 0.010018 & 0 & 0 & 0.958113 \\ 0 & 0 & 0 & 0.756785 & 0.276084 & 0 & 0.90815 & 1.065932 & 0.121703 & 0.003572 & 0 & 1.030733 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.244137 & 0 & 0 & 0 & 0.155584 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.22354 & 0 & 0 & 0.306308 & 0.342152 & 0 & 0 & 0 & 0.311331 \\ 0 & 0 & 0 & 0.852613 & 0.355963 & 0 & 1.032178 & 1.267596 & 0.1403 & 0 & 0 & 1.146267 \\ 0 & 0 & 0 & 0.846681 & 0.325523 & 0.009613 & 1.018529 & 1.201856 & 0.212144 & 0.01461 & 0 & 1.119884 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.140916 & 0 & 0 & 0 & 0 \end{array} \end{matrix})

(17)

A_{2} = (\begin{matrix} \begin{array}{r} 0 & 0 & 0 & 0.810403 & 0 & 0 & 0.942594 & 1.247714 & 0 & 0 & 0 & 1.058295 \\ 0 & 0 & 0 & 0.730733 & 0 & 0 & 1.000398 & 1.228595 & 0 & 0 & 0 & 1.100775 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0.822288 & 1.030058 & 0 & 0 & 0 & 0.958113 \\ 0 & 0 & 0 & 0.756785 & 0 & 0 & 0.90815 & 1.065932 & 0 & 0 & 0 & 1.030733 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.852613 & 0 & 0 & 1.032178 & 1.267596 & 0 & 0 & 0 & 1.146267 \\ 0 & 0 & 0 & 0.846681 & 0 & 0 & 1.018529 & 1.201856 & 0 & 0 & 0 & 1.119884 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array} \end{matrix})

(18)

4.3. Results of Membrane Fouling Diagnosis Model

According to the results of feature variable selection, COD, influent NH₃-N, influent TP, ORP in the anaerobic zone, DO in the aerobic zone, sludge concentration in the aerobic zone, effluent flow, aeration, effluent turbidity, water temperature, pH, and TMP are determined as the related variables for the diagnosis of membrane fouling and replaced by numbers 1 to 12, respectively. Once a membrane fouling incident occurs, we can trace the causal variables from them. They are used as the inputs of the CDM. In total, 2000 groups of data are selected as samples to construct the initial FTT as shown in Figure 8. The adjacency matrix between variables is shown in A₁. It can be found from Figure 8 that the FTT has high complexity, so it is necessary to extract stronger causality. The FTT can be simplified by setting a threshold value. The adjacency matrix between variables is shown in A₂. As shown in Figure 9, the simplified FTT is obtained, which can represent the relationship between fault variables.

The information compressible strategy is then used to further simplify the fault transfer topology, as shown in Figure 10; the corresponding adjacency matrix is A₃ in Equation (19). In this experiment, the root causal variables of membrane fouling are diagnosed according to the simplified fault transfer topology, and the effectiveness of the CDM is compared with that of other diagnosis methods.

A_{3} = \left( \begin{array}{rrrrrrrrrrrr}
0 & 0 & 0 & 0.810403 & 0 & 0 & 0.942594 & 1.247714 & 0 & 0 & 0 & 1.058295 \\
0 & 0 & 0 & 0.730733 & 0 & 0 & 1.000398 & 1.228595 & 0 & 0 & 0 & 1.100775 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.822288 & 0 & 0 & 0 & 0 & 0.958113 \\
0 & 0 & 0 & 0.756785 & 0 & 0 & 0 & 1.065932 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1.267596 & 0 & 0 & 0 & 1.146267 \\
0 & 0 & 0 & 0.846681 & 0 & 0 & 1.018529 & 1.201856 & 0 & 0 & 0 & 1.119884 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array} \right)

(19)
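As a rough illustration of how A₃ can be read, the Python sketch below lists its non-zero entries as directed links and finds the variables that send links but receive none. Treating such "source" nodes as candidate root causal variables is an assumption made for this example, not a rule stated in the paper; the names `EDGES`, `NAMES`, and `source_nodes` are hypothetical.

```python
# Variable numbering from Section 4.3.
NAMES = {
    1: "COD", 2: "influent NH3-N", 3: "influent TP",
    4: "ORP (anaerobic zone)", 5: "DO (aerobic zone)",
    6: "sludge concentration (aerobic zone)", 7: "effluent flow",
    8: "aeration", 9: "effluent turbidity", 10: "water temperature",
    11: "pH", 12: "TMP",
}

# Non-zero entries of A3 in Equation (19) as (source j, target i, strength).
EDGES = [
    (1, 4, 0.810403), (1, 7, 0.942594), (1, 8, 1.247714), (1, 12, 1.058295),
    (2, 4, 0.730733), (2, 7, 1.000398), (2, 8, 1.228595), (2, 12, 1.100775),
    (5, 7, 0.822288), (5, 12, 0.958113),
    (6, 4, 0.756785), (6, 8, 1.065932),
    (10, 8, 1.267596), (10, 12, 1.146267),
    (11, 4, 0.846681), (11, 7, 1.018529), (11, 8, 1.201856), (11, 12, 1.119884),
]


def source_nodes(edges):
    """Variables with at least one outgoing link and no incoming link."""
    senders = {j for j, _, _ in edges}
    receivers = {i for _, i, _ in edges}
    return sorted(senders - receivers)


print(len(EDGES))  # 18 links, matching Table 2
for node in source_nodes(EDGES):
    print(node, NAMES[node])
```

Under this reading, every link in A₃ starts at one of six variables (1, 2, 5, 6, 10, 11) and ends at one of four (4, 7, 8, 12).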

To evaluate the diagnostic efficiency of the fault transfer topology (FTT) with the information compressible strategy (ICS) and a threshold, it was compared with five other methods: FTT with a threshold only, the initial FTT, a Bayesian network (BN), an artificial neural network (ANN), and fuzzy logic (FL). The comparison covers diagnosis time, number of connections, and accuracy. Table 2 shows that the proposed FTT with ICS and a threshold has the fewest connections (18, versus 23 for FTT with a threshold only and 66 for the initial FTT) and the shortest diagnosis time (8.3 s). This means that the designed ICS and threshold simplify the fault transfer topology, which in turn speeds up the diagnosis of membrane fouling. In addition, the accuracy of FTT with ICS and a threshold (93.4%) is the highest among the compared methods, which indicates that the proposed FTT in the CDM helps to identify the root causal variables of membrane fouling.

4.4. Discussion of Results

Based on the above experimental results and analysis, the CDM outperforms the compared methods in both diagnosis time and accuracy (Table 2). The main merits of the CDM are summarized as follows.

(1): Good detection. It is essential for the CDM to identify incidents of membrane fouling with specific causal variables. The proposed autoencoder derives RMSE-based thresholds for detecting membrane fouling, provided that its training data cover the full range of normal operating conditions of the MBR. These thresholds can serve as a reference for operators to monitor the occurrence of membrane fouling without the need for a physical or mathematical model. The results in Figure 6 and Figure 7 also illustrate the efficacy of this method.
(2): Intuitive diagnosis. The fault transfer topology constructed by the CDM (Figure 8) allows the causal relationships between variables to be observed dynamically, which facilitates the determination of the causal factors leading to membrane fouling. Additionally, to eliminate repetitive affiliation relationships, the fault transfer topology is simplified using a threshold and the information compressible strategy, as shown in Figure 9 and Figure 10. Table 2 also shows that the proposed CDM improves both the speed and the accuracy of diagnosis.
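The detection rule behind merit (1), flagging fouling when the reconstruction error J exceeds the threshold J₀ learned from normal data, can be sketched in Python as follows. This is only a sketch under stated assumptions: `toy_reconstruct` is a hypothetical stand-in for the trained autoencoder, the sample values are made up, and taking J₀ as the largest RMSE on normal samples is an assumed rule, since the paper does not specify here how J₀ is chosen.

```python
import math


def rmse(sample, reconstruction):
    """Root-mean-square error between a sample and its reconstruction."""
    squared = sum((a - b) ** 2 for a, b in zip(sample, reconstruction))
    return math.sqrt(squared / len(sample))


def fit_threshold(normal_samples, reconstruct):
    """J0: largest reconstruction error seen on normal data (assumed rule)."""
    return max(rmse(s, reconstruct(s)) for s in normal_samples)


def detect(sample, reconstruct, j0):
    """Return (fouling_flag, J) for one real-time sample."""
    j = rmse(sample, reconstruct(sample))
    return j > j0, j


def toy_reconstruct(sample):
    """Hypothetical stand-in for a trained autoencoder: values inside the
    normal range [0, 1] are reproduced closely, values outside are clipped."""
    return [min(max(v, 0.0), 1.0) * 0.98 for v in sample]


# Made-up normalized samples standing in for normal operating data.
normal = [[0.2, 0.5, 0.9], [0.1, 0.4, 0.8], [0.3, 0.6, 0.7]]
j0 = fit_threshold(normal, toy_reconstruct)

print(detect([0.2, 0.5, 0.8], toy_reconstruct, j0))  # flag is False
print(detect([0.2, 1.6, 0.9], toy_reconstruct, j0))  # flag is True
```

In practice `toy_reconstruct` would be replaced by the forward pass of the autoencoder trained on normal MBR data.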

5. Conclusions

Membrane fouling is a bottleneck for the wide application of MBRs. A CDM is proposed, which can diagnose the root causal variables of membrane fouling and improve the diagnosis accuracy. First, the causal relationship between variables is judged based on the transfer entropy to obtain the initial fault transfer topology. Then, the typical causal relationships between variables are extracted based on the significance threshold to obtain the simplified fault transfer topology. Finally, for each feasible structure, the degree of fit between the structure and the data and the complexity of the structure are jointly considered through the information compressible strategy, so that the fault transfer topology is simplified further and the diagnosis accuracy is improved. The experimental results show that the proposed model not only simplifies the diagnosis process but also improves the robustness and interpretability of the results. This work is a step forward in the field of membrane fouling diagnosis and has the potential to support decision-making in wastewater treatment and filtration applications. However, the proposed method may not generalize well to different types of membrane fouling or to systems with significantly different characteristics. The transfer entropy and the information compressible strategy may need to be adapted or re-tuned for specific applications. Future research will focus on investigating the generalizability of the method to different MBR systems and other membrane-based processes.

Author Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by X.W., D.H., H.Y. and H.H. The first draft of the manuscript was written by X.W. and all authors commented on previous versions of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grants 61890930-5 and 61622301, the National Key Research and Development Project under Grant 2018YFC1900800-5, and the Beijing Natural Science Foundation under Grant 4172005.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated during and/or analyzed during the current study are not publicly available due to corporate privacy but are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Membrane fouling diagnosis system.

Figure 2. Autoencoder structure.

Figure 3. Fault transfer topology.

Figure 4. Indirect connection.

Figure 5. Characteristic variable selection.

Figure 6. The RMSE during the training process.

Figure 7. The RMSE during the detection process.

Figure 8. Initial fault transfer topology.

Figure 9. Fault transfer topology after setting the threshold.

Figure 10. Fault transfer topology based on information compressible strategy.

Table 1. Specific steps of membrane fouling diagnosis.

% Characteristic variable selection
1 Standardize the data to obtain E₀ and F₀	% Equations (1) and (2)
2 Obtain the principal components of the variables	% Equation (3)
3 Determine the number of final extracted components	% Equations (4) and (5)
Obtain the K variables that have a great influence on membrane fouling
% Membrane fouling detection model
1 Acquire normal data and train an autoencoder
2 Obtain the threshold J₀ of the reconstruction error of normal samples
3 Use the autoencoder on the data collected in real time to obtain the reconstruction error J
4 If J > J₀, membrane fouling exists
% Calculate the transfer entropy between variables
Obtain the influence relationship between variables T_{Y→X}	% Equations (9) and (10)
% Generate adjacency matrix A_{kk}
for j = 1: k do
  for i = j + 1: k do
    if T_{j→i} > 0
      A_{ji} = T_{j→i}
    else
      A_{ij} = T_{j→i}
    end if
  end for
end for
The FTT is obtained by connecting the variables with lines according to the adjacency matrix A_{kk}.
% Simplify fault transfer topology
Set threshold
1 Select two data segments with a long time distance from historical data of the two variables
2 Calculate the transfer entropy te_i between the above two data segments	% Equation (10)
3 Repeat steps 1 and 2 to calculate multiple sets of such transfer entropy, NET = [te₁, te₂, …, te_s]
4 Calculate the average value and standard deviation of NET to obtain the threshold	% Equation (13)
Information compressible strategy
1 Filter all direct and indirect transfer relationships between variables
2 Calculate the score of the structure for each transfer relationship	% Equations (14) and (15)
3 Choose the transfer relationship corresponding to the highest score
The root causal variables are determined according to the simplified fault transfer topology
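The transfer-entropy and adjacency-matrix steps of Table 1 can be sketched in Python as below. This is a minimal illustration, not the paper's implementation: it assumes a standard histogram (plug-in) estimate of transfer entropy with lag 1 and equal-width bins, assumes that T_{j→i} in Table 1 is a net (signed) quantity whose magnitude is stored in the direction of the stronger influence, and assumes a threshold of the form mean + c·std with c = 3. Equations (9), (10), and (13) of the paper remain authoritative, and all function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def discretize(series, bins=4):
    """Map a numeric series to integer labels of equal-width bins."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0  # constant series: fall back to width 1
    return [min(int((v - lo) / width), bins - 1) for v in series]


def transfer_entropy(source, target, bins=4):
    """Histogram estimate of T_{source -> target} in nats, with lag 1."""
    s, x = discretize(source, bins), discretize(target, bins)
    n = len(x) - 1
    triples = Counter(zip(x[1:], x[:-1], s[:-1]))  # (x_{t+1}, x_t, s_t)
    past_xs = Counter(zip(x[:-1], s[:-1]))         # (x_t, s_t)
    step_xx = Counter(zip(x[1:], x[:-1]))          # (x_{t+1}, x_t)
    past_x = Counter(x[:-1])                       # x_t
    te = 0.0
    for (x1, x0, s0), count in triples.items():
        with_source = count / past_xs[(x0, s0)]          # p(x_{t+1} | x_t, s_t)
        without_source = step_xx[(x1, x0)] / past_x[x0]  # p(x_{t+1} | x_t)
        te += (count / n) * math.log(with_source / without_source)
    return te


def build_adjacency(columns, bins=4):
    """adjacency[j][i] > 0 means variable j drives variable i (assumed convention)."""
    k = len(columns)
    adjacency = [[0.0] * k for _ in range(k)]
    for j in range(k):
        for i in range(j + 1, k):
            net = (transfer_entropy(columns[j], columns[i], bins)
                   - transfer_entropy(columns[i], columns[j], bins))
            if net > 0:
                adjacency[j][i] = net
            elif net < 0:
                adjacency[i][j] = -net
    return adjacency


def significance_threshold(null_values, c=3.0):
    """Assumed form of the threshold: mean + c * standard deviation of NET."""
    return statistics.mean(null_values) + c * statistics.pstdev(null_values)


# Synthetic check: `follower` copies `driver` with a one-step delay.
random.seed(0)
driver = [random.randint(0, 1) for _ in range(2000)]
follower = [0] + driver[:-1]
A = build_adjacency([driver, follower], bins=2)
print(A)  # only the driver -> follower entry A[0][1] is non-zero
```

On the synthetic pair, the estimate for the driver-to-follower direction is close to ln 2, while the reverse direction is close to zero, so the single link points from the driver to the follower.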

Table 2. Performance of different methods.

Method	Time (s)	Number of Connections	Accuracy (%)
FTT with ICS and threshold	8.3	18	93.4
FTT with threshold	9.5	23	91.0
Initial FTT	14.9	66	86.7
BN	12.1	-	85.1
ANN	13.3	-	82.3
FL	19.8	-	82.1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
