*Editorial* **Unlocking the Full Potential: New Frontiers in Anaerobic Digestion (AD) Processes**

**Sigrid Kusch-Brandt 1,2,\* , Sonia Heaven 1,\* and Charles J. Banks <sup>1</sup>**


Anaerobic digestion (AD) is a bio-based solution designed to convert organic materials into renewable energy and other products, such as soil improver and organic fertiliser. AD is widely used in practice, with facilities at many thousands of sites worldwide: in Europe alone, more than 20,000 full-scale plants were in operation in 2022 [1]. The underlying biological processes are complex, and multiple options exist to steer the AD process towards optimised performance and a desired set of outputs in terms of energy and material flows. This puts AD in a prominent position with research agendas aiming for more sustainable resource management. Its ability to generate high-value products from organic wastes and residues is a key strength.

The Special Issue on "New Frontiers in Anaerobic Digestion (AD) Processes" was initiated to explore recent developments and advanced concepts related to the valorisation of biomass via the application of AD. Fourteen submissions are included in the Special Issue, and each of these publications contributes towards unlocking the full potential of AD. Five thematic clusters to advance AD can be identified based on the included publications:


All included manuscripts contribute to more than one of these five clusters (Table 1). In this paper, some selected findings reported in the publications are highlighted. These are not intended to be exhaustive, but rather to provide some first insights into the rich body of new knowledge created by the authors of the Special Issue.

For the purpose of maintaining the stability and performance of the AD process, in practice, only certain parameters can be monitored in real-time, while adequate methods are still lacking for many others. Yan et al. [2] reviewed the recent progress in applying soft sensor solutions. Some systems are available that use software-supported methods to determine the unmeasurable parameters based on measuring auxiliary variables online; but the need for more research remains high. Integration of deep learning elements into these software solutions is particularly promising.

Liu et al. [3] focused on the residual biogas potential of digestate leaving the digester and the current time-consuming standard procedures to determine this indicator through experimental laboratory testing. Residual biogas potential is a key indicator of digestate stability, which in turn is an essential requirement for spreading digestate onto agricultural land. The authors showed that kinetic modelling, in particular when supported by machine learning, could be successfully applied to reduce the testing time for residual biogas potential.

**Citation:** Kusch-Brandt, S.; Heaven, S.; Banks, C.J. Unlocking the Full Potential: New Frontiers in Anaerobic Digestion (AD) Processes. *Processes* **2023**, *11*, 1669. https://doi.org/10.3390/pr11061669

Received: 21 May 2023 Accepted: 29 May 2023 Published: 31 May 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Table 1.** Publications included in this Special Issue and their relevance for the five clusters identified to unlock the full potential of anaerobic digestion.


Full-scale AD plant operators require reliable information on the methane yields of potential substrates. In agriculture, it is common to use crop-based substrates (energy crops, crop residues), either as the main AD input or to supplement biogas production from manure-fed digesters. Decisions on which crop material to use in AD can significantly impact the entirety of a farm's management. However, relying on data from the literature to estimate the methane yields of crops is not an advisable strategy. The results of Zhang et al. [4] revealed that many publications displayed deficiencies in data reporting. The transferability of the reported methane potentials was limited, because the variability in values for the same crop when tested under different experimental conditions or grown under different cultivation conditions often exceeded the variation of values between different crop species.

With a further increasing world population, food production is a major challenge to be addressed as part of the water/energy/food/climate change nexus. Valorisation of residues from the food production chain makes a valuable contribution to more sustainable food systems, and this applies both to established food production methods and to novel solutions. Among the novel solutions is the cultivation of insect biomass as an alternative feedstuff, which generates insect frass as a residue. After exploring the suitability of insect frass as an AD substrate, Wedwitschka et al. [5] reported promising results, but also highlighted some risk of instability of the process. Similarly, citrus waste is a challenging AD substrate due to the presence of toxic compounds. The work of Rosas-Mendoza et al. [6] suggests that, at an industrial scale, it might not be necessary to remove toxic D-limonene from orange peel waste when using cattle manure as inoculum; but the authors also conclude that further research is required to better understand the implications of different D-limonene concentrations under different reactor configurations. Other food production residues clearly require pre-treatment to make the material suitable for AD. One such biomass is wheat straw, a high-volume material stream. Zerback et al. [7] applied hydrothermal pre-treatment with good success, but they also observed that overly severe pre-treatment conditions had a negative impact on the degradation kinetics. For solutions to be applied at full scale, there is also a need to balance technical and economic feasibility.

Avoiding inhibition of the AD process and achieving a high gas yield are both key goals of commercial plant operators. Berzal de Frutos et al. [8] researched the co-digestion of sewage sludge and 160 different trade wastes with the aim of understanding how wastewater treatment plants can improve their AD performance by accepting trade wastes. The authors concluded that the addition of 10 percent (by volume) of trade waste can usually be recommended, but this may need to be confirmed by further experiments, for example, where inhibitory components are present or microbial acclimatisation is required.

Another approach to improve the performance of AD in the wastewater sector is the implementation of highly efficient bioreactors. Pacheco-Ruiz et al. [9] reported findings from long-term experiments conducted over 242 days with submerged anaerobic membrane bioreactors. A key result indicated that operation without chemical or external cleaning was feasible if the process conditions were adequately set by controlling solids retention and, thus, mean cell residence time. Clearly, such long-term experiments are required in order to reliably inform full-scale operators about the performance of specific operating regimes.

In practice, one of the most important factors influencing whether AD will be implemented or not is if there is sufficient confidence in its economic viability. Financial incentives directly influence the AD landscape and its development. As an example, in the United Kingdom (UK) there is currently a policy vacuum for residues-based small-scale farm AD (<150 kWe), and Bywater and Kusch-Brandt [10] showed that it is very difficult for such installations to achieve profitability, despite the currently high energy prices. An innovative policy mechanism would be to introduce financial support based on the "public goods" benefits offered by on-farm AD (e.g., greenhouse gas reduction, positive soil organic carbon impact, support of rural development).

Several of the published papers address AD as an element in decarbonising the energy system through the adoption of hydrogen and biomethane solutions. Here, AD can be applied in different ways. AD can be implemented to produce biohydrogen through dark fermentation in the digester. The concentration of hydrogen, however, is relatively low, and further processes are required to separate the hydrogen from the resulting gas mixture. As reported by Soto et al. [11], novel types of materials have become available in recent years to make gas membrane separation more effective, thus improving the competitiveness of bio-based hydrogen production. Improved membrane separation performance is of great importance to other processes besides hydrogen production. A key application in the AD area is biogas upgrading to biomethane through the membrane-assisted removal of CO<sub>2</sub>. One major advantage of biomethane is that it can directly substitute fossil natural gas in existing infrastructures.

Another approach to enhancing biomethane supply is production at biogas facilities through the biomethanation of CO<sub>2</sub> with hydrogen. While its commercial robustness remains to be confirmed, significant potential clearly exists. Bywater et al. [12] estimated that CO<sub>2</sub> biomethanation could raise AD's contribution to bioenergy in the United Kingdom from 15 percent to 22 percent. There is, however, a relative shortage of reliable data on current UK AD feedstocks, which makes it difficult to quantify potential biomethane production from different substrates. There are also challenges related to the biomethanation process, especially when conducted in-situ, i.e., within the digester itself rather than in an external reactor. The work of Poggio et al. [13] contributed to a better understanding of the hydrogen gas–liquid mass transfer phenomena and to improving biomethanation in continuous AD. Another challenge of the process is the increase in pH because of CO<sub>2</sub> conversion into biomethane, potentially causing inhibition of the digestion process. Zhang et al. [14] presented a fundamentally-derived, experimentally validated approach to minimise such risks during in-situ biomethanation. These insights increase the feasibility of implementing the CO<sub>2</sub> biomethanation process in existing AD facilities, and, thus, of maximising the value of existing infrastructure while contributing to the decarbonisation goals [14].

A common theme across all publications of this Special Issue is making better use of infrastructure and biomass resources. This can be by increasing the efficiency of processes and performance of equipment, by reducing the risk of inhibition, by making substrates more available, or by integrating hydrogen and biomethane. As an approach with the explicit goal of making the best possible use of one unit of biomass, the concept of the biorefinery has evolved in the last decades; its main feature is to process biomass through different schemes operated widely in parallel, thus supplying a multitude of valuable outputs. Bolzonella et al. [15] present a biorefinery pilot plant based on AD that was designed to supply a set of products, namely, energy products (hydrogen, methane), chemicals (short chain volatile fatty acids, polyhydroxyalkanoates), and other materials (nutrients for agriculture, microbial proteins for food or animal feed applications). In such a biorefinery, AD becomes one integrated element of a larger system, as it is complemented by other processes (mechanical, chemical, or biological).

Clearly, applications of AD will continue to change in future, and further progress in making high-value use of this versatile bio-based technology can be expected. There are many aspects to be addressed by further research, and some current research questions are pointed out by the authors of this Special Issue. At the same time, the results of the Special Issue suggest that it is of particular relevance to inspire trust in the economic viability of AD facilities and the technical reliability of novel AD solutions.

**Author Contributions:** Conceptualization, S.K.-B., S.H. and C.J.B.; writing—original draft preparation, S.K.-B.; writing—review and editing, S.H. and C.J.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** As guest editors, we thank all the authors who have submitted their work to the Special Issue "New Frontiers in Anaerobic Digestion (AD) Processes". We are grateful to the reviewers who have invested their time to provide valuable feedback to the manuscripts, thus ensuring the high quality of the publications included. We thank the editorial team of *Processes* for the excellent support in compiling this Special Issue.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Review* **Review of Soft Sensors in Anaerobic Digestion Process**

**Pengfei Yan <sup>1</sup> , Minghui Gai <sup>1</sup> , Yuhong Wang 1,\* and Xiaoyong Gao <sup>2</sup>**


**Abstract:** Anaerobic digestion is associated with various crucial variables, such as biogas yield, chemical oxygen demand, and volatile fatty acid concentration. Real-time monitoring of these variables not only reflects the state of the anaerobic digestion process directly but also helps to increase the efficiency of resource conversion and improve the stability of the reaction process. However, the real-time monitoring equipment currently on the market cannot be widely used in industrial production due to drawbacks such as high cost, low accuracy, and lagging analysis. Therefore, it is essential to conduct soft sensor modeling for unmeasurable variables and use auxiliary variables to realize real-time monitoring, optimization, and control of the anaerobic digestion process. In this paper, the basic principle and process flow of anaerobic digestion are first briefly introduced. Subsequently, the development history of the traditional soft sensor is systematically reviewed, the latest development of soft sensors is detailed, and the obstacles facing soft sensors in industrial production are discussed. Finally, the future development trend of deep learning in soft sensors is discussed in depth, and future research directions are provided.

**Keywords:** anaerobic digestion; soft sensor; deep learning

#### **1. Introduction**

Anaerobic digestion is a highly complex biochemical reaction process, characterized by multi-factor influence, dynamic change, and complex nonlinearity [1]. Anaerobic digestion can not only treat organic pollutants but also produce clean energy [2]. Therefore, anaerobic digestion technology has broad development space in the treatment of wastewater and organic solid waste [3] and is one of the practical ways to solve energy and environmental problems. However, the anaerobic microorganisms involved are highly sensitive to changes in the digestion environment, and methanogens have extremely strict requirements on the external environment [3]. Unexpected changes in the external environment affect the hydrolysis, acidification, and methanation processes of anaerobic digestion [4,5]. This can cause volatile fatty acids (VFA) to accumulate in the reactor, inhibit the progress of methanation, and even result in the failure of the anaerobic reactor operation [6–8]. Therefore, a more advanced online measurement system must be used to fully monitor the anaerobic digestion process in real-time to ensure that the process is stable and efficient while obtaining a higher biogas yield [9].

For monitoring anaerobic digestion process variables, mature and reliable online equipment exists for temperature, pressure, flow rate, gas composition, and other variables [10,11]. However, there are still many key variables that cannot be directly measured, or for which the measurement equipment is expensive [12], such as biogas yield, chemical oxygen demand (COD), and VFA concentration. Online monitoring equipment for these variables cannot be widely used in industrial production due to factors such as high cost, low accuracy, and lagging analysis [13–16]. Consequently, the soft sensor, which uses online measurable auxiliary variables to estimate the unmeasurable variables in real-time, has been broadly used in the anaerobic digestion process [17,18]. The soft sensor is developed based on the inference control theory proposed by *Brosilow* [19], in which the mathematical relationship between auxiliary variables and target variables is established under certain optimal criteria, and the auxiliary variables selected should be measurable and easy to obtain [20]. Real-time monitoring of target variables is achieved through software [21]. Since the soft sensor has the advantages of fast response, low cost, easy implementation, and simple maintenance [22], it has been widely used in monitoring, optimization, and control of engineering [23]. Soft-sensor technology is broadly based on two modelling approaches: those derived mechanistically and those that are data-driven [24]. Specifically, mechanism models can be classified into common mechanism models and state estimation and system identification based on mechanism models [25]. Data-driven models can be divided into statistical machine learning models and deep learning models.

**Citation:** Yan, P.; Gai, M.; Wang, Y.; Gao, X. Review of Soft Sensors in Anaerobic Digestion Process. *Processes* **2021**, *9*, 1434. https://doi.org/10.3390/pr9081434

Academic Editors: Sonia Heaven, Sigrid Kusch-Brandt and Charles Banks

Received: 1 July 2021 Accepted: 16 August 2021 Published: 19 August 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In this paper, the basic principle and process flow of anaerobic digestion are first briefly introduced. Subsequently, the soft sensors in the anaerobic digestion process are compared and analyzed, the development process of traditional soft sensors is systematically reviewed, and the defects of traditional soft sensors are presented. Next, the latest development of soft sensors is detailed, including the application of deep learning in the anaerobic digestion process. Moreover, the obstacles encountered by soft sensors in industrial production are further discussed. Finally, the future development trend of deep learning in soft sensors is deeply analyzed, and a summary and outlook are drawn.

#### **2. Anaerobic Digestion Process**

#### *2.1. Basic Principles of Anaerobic Digestion*

According to the four-stage theory of anaerobic digestion proposed by *Zeikus*, the anaerobic digestion process can be divided into four stages: hydrolysis, acidification, acetic acidification, and methanation [26]. In the hydrolysis stage, hydrolases break down macromolecular organics (such as protein, fat, and cellulose) into small molecular organics (such as glucose, amino acids, and long-chain fatty acids) for subsequent reactions [26]. After the initial hydrolysis, small-molecule organic substances (such as glucose and amino acids) are further decomposed by acid-producing bacteria into acidification products, mainly short-chain fatty acids, and secondary metabolites (such as hydrogen and carbon dioxide) [27]. In the acetic acidification stage, acetogens convert the organic acids and alcohols produced in the hydrolysis and acidification stages into acetic acid, generating carbon dioxide and hydrogen [28]. In the methanation stage, acetic acid, hydrogen, and carbon dioxide are converted into methane under the action of obligate anaerobic methanogens [29].

#### *2.2. Process Parameters of Anaerobic Digestion*

There are some essential process variables in the anaerobic digestion process, such as pH, alkalinity, temperature, VFA concentration, COD, and biogas yield. Real-time monitoring of the above variables can ensure the efficient and stable operation of the anaerobic digestion process. However, there is little widely used real-time monitoring equipment for VFA concentration, COD, and biogas yield.


#### *2.3. Anaerobic Digestion Process*

In industrial production, anaerobic digestion processes are usually classified according to factors such as operating temperature, feeding method, and the number of reactors [37]. Based on the number of reactors, they can be divided into single-phase digestion and two-phase digestion [38]. The single-phase digestion process was widely used in the early, immature stage of anaerobic digestion theory due to its low cost and simple operation. In single-phase digestion, the hydrolysis, acidification, acetic acidification, and methanation processes that degrade macromolecular organics are all conducted in the same digestion tank, and the inhibition of any one step will affect the overall digestion efficiency [39]. With the development of anaerobic digestion theory, researchers and technologists have developed a two-phase digestion process to avoid acid inhibition. In two-phase anaerobic digestion, the hydrolysis, acidification, and acetic acidification stages are conducted in the acid production tank, while the methane production stage is performed in the methane production tank [40]. This method can effectively avoid mutual inhibition between the steps, improve the efficiency of anaerobic digestion, shorten the reaction time, and increase methane production [41].

Different two-phase anaerobic digestion devices are generally selected according to the biodegradability of the input materials [42]. When industrial wastewater with low solid content is treated, the acid production tank and the methane production tank usually adopt a continuous stirred tank reactor and an up-flow anaerobic sludge blanket, respectively [43]. When organic wastewater with high solid content is treated, both the acid production tank and the methane production tank use the up-flow solid reactor [44]. When organic sludge with higher solid content is processed, both the acid production tank and the methane production tank employ the continuous stirred tank reactor [45]. The specific process flow is described as follows [28]. First, the pretreated organic materials are fed into the hydrolysis acidification tank, where the hydrolysis of macromolecular organics and the acidification of small molecular organics take place. Then, the acidified product is fed into the methane production tank for the methane production reaction. Since the stages of acid production and methane production are performed separately, acid-producing bacteria and methanogens can each be kept in optimal environmental conditions and exert maximum activity. Moreover, the acid production process improves the biochemical properties of the material, and the acidified product provides a suitable substrate for methanogens. The two-phase anaerobic digestion process is illustrated in Figure 1.

**Figure 1.** Two-phase anaerobic digestion process flow chart.

#### **3. Development History of Anaerobic Digestion Soft Sensor**

#### *3.1. Soft Sensor Based on Process Mechanism*

Mechanism modeling determines the mathematical relationship between the target variables and the auxiliary variables by establishing balance equations based on a deep understanding of the process mechanism [23]. It has the advantages of high accuracy, strong interpretability, and clear industrial background. However, the biochemical reaction process of anaerobic digestion is extremely complicated, with strong nonlinearity and uncertainty, making it difficult to establish an accurate mechanism model [46,47]. Moreover, the biochemical reaction process is described by a large number of algebraic and differential equations. The resulting heavy computational load and slow convergence prevent such models from meeting the requirements of real-time monitoring of target variables [48–50]. In addition, the mechanism model parameters of anaerobic digestion, such as the Monod maximum specific uptake rate and the first-order decay rate among the kinetic parameters of the Anaerobic Digestion Model No. 1 (ADM1), are mostly empirical values [51]. Determining these parameters requires considerable experimental verification, and many of the relevant indicators are not measured in industrial production. Therefore, it has been proposed to combine the mechanism model and the data-driven model to establish a hybrid model of anaerobic digestion [52,53]. The hybrid model takes advantage of the fact that the data-driven model only considers inputs and outputs and does not require a clear internal mechanism, which reduces the difficulty of mechanism modeling. Moreover, the mechanism model enhances the interpretability of the data-driven model. However, the prediction accuracy and generalization ability of the hybrid model need to be further improved.
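For orientation, the Monod-type rate parameter and the first-order decay rate mentioned above usually enter ADM1-style rate expressions in the general form sketched below. The symbols ($k_m$ for the maximum specific rate, $K_S$ for the half-saturation constant, $S$ for substrate concentration, $X$ for biomass concentration, $k_{dec}$ for the decay rate) follow common ADM1 usage and are not defined in this review; inhibition factors are omitted for brevity.

$$\rho_{\mathrm{uptake}} = k_m \, \frac{S}{K_S + S} \, X, \qquad \rho_{\mathrm{decay}} = k_{dec} \, X$$

Each biomass group carries its own set of such parameters, which illustrates why calibrating a full mechanism model requires extensive experimental data.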


#### *3.2. Soft Sensor Based on State Estimation*

In the soft sensor based on state estimation, the method of state observation and state estimation is adopted to obtain the predicted value of the state variable through auxiliary variables and then acquire the predicted value of the target variable [54,55]. With the development of anaerobic digestion soft sensors, various soft sensors based on state estimation have been proposed [56–61]. Among them, the nonlinear observer presented by *Dochain* under the improved anaerobic digestion model can estimate the VFA concentration online under different working conditions [62]. The improved anaerobic digestion model can be expressed as:

$$\begin{cases} \dot{x} = f(x, u) + \Delta f \\ y = C x \end{cases} \tag{1}$$

where ∆*f* denotes the uncertainty item related to unmodeled dynamics and load disturbance; *x* is the vector of dynamic states; *f* denotes the vector field; *C* = [0, 0, 1]; *u* and *y* denote the input and output of the model, respectively. For the improved anaerobic digestion model, the nonlinear observer can be expressed as:

$$\dot{\hat{x}} = f(\hat{x}, u) + k_l (y - \hat{y}) + k_d \tanh[\gamma (y - \hat{y})] \tag{2}$$

where $\hat{x} \in \mathbb{R}^3$ represents the state estimation vector, $\hat{y}$ indicates the predicted value of the output signal, and $k_l$, $k_d$ and $\gamma$ denote the observer gains. The estimated error of the model is presented in Formula (3).

$$\dot{e} = f(x, u) - f(\hat{x}, u) + \Delta f - k_l C e - k_d \tanh(\gamma C e) \tag{3}$$

where $e = x - \hat{x}$ is the estimation error and $Ce = y - \hat{y}$. This nonlinear observer overcomes the poor performance of the local observer under non-set conditions and solves the problem that the progressive observer is very sensitive to unknown load disturbances [63]. Additionally, the author verified the convergence of the observer using Lyapunov stability analysis. Soft sensors based on state estimation can handle situations such as differences in dynamic characteristics between variables and system lag. However, state estimation mainly applies to mature models and to models that can reflect the characteristics of the measured object after approximation. Moreover, simplifying the system to reduce the difficulty of modeling increases the online estimation error, so the use of this method is restricted by the modeling accuracy it requires of the anaerobic digestion model [64–67].
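To make the structure of Equations (1) to (3) concrete, the following minimal Python sketch integrates an observer of the form in Equation (2) next to a toy plant. It is an illustration under stated assumptions only: the linear vector field `f`, the matrices `A` and `B`, the gain vectors `k_l` and `k_d`, the value of `gamma`, and the zero disturbance term are all hypothetical choices, not the anaerobic digestion model or the tuning used in the cited work.

```python
import math

# Hypothetical stable toy model standing in for f(x, u); the actual
# anaerobic digestion model from the cited work is not reproduced here.
A = [[-1.0, 0.0, 0.0],
     [1.0, -2.0, 0.0],
     [0.0, 1.0, -3.0]]
B = [1.0, 0.0, 0.0]
C = [0.0, 0.0, 1.0]  # output matrix C = [0, 0, 1] from Equation (1)


def f(x, u):
    """Toy vector field f(x, u) = A x + B u (an illustrative assumption)."""
    return [sum(A[i][j] * x[j] for j in range(3)) + B[i] * u for i in range(3)]


def output(x):
    """Measured output y = C x."""
    return sum(c * xi for c, xi in zip(C, x))


def observer_rhs(x_hat, u, y, k_l, k_d, gamma):
    """Right-hand side of Equation (2) for the state estimate x_hat."""
    innovation = y - output(x_hat)
    fx = f(x_hat, u)
    return [fx[i] + k_l[i] * innovation
            + k_d[i] * math.tanh(gamma * innovation) for i in range(3)]


def simulate(steps=2000, dt=0.005):
    """Integrate plant and observer with forward Euler; return error norms."""
    x = [1.0, 0.5, 0.2]          # true state, unknown to the observer
    x_hat = [0.0, 0.0, 0.0]      # observer starts from a wrong guess
    k_l = [0.0, 0.0, 2.0]        # assumed gain vectors, not tuned values
    k_d = [0.0, 0.0, 0.5]
    gamma = 5.0
    u = 1.0                      # constant input; delta_f is taken as zero
    errors = []
    for _ in range(steps):
        y = output(x)
        dx = f(x, u)
        dx_hat = observer_rhs(x_hat, u, y, k_l, k_d, gamma)
        x = [xi + dt * di for xi, di in zip(x, dx)]
        x_hat = [xi + dt * di for xi, di in zip(x_hat, dx_hat)]
        errors.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat))))
    return errors


if __name__ == "__main__":
    errs = simulate()
    print("initial error norm:", errs[0])
    print("final error norm:  ", errs[-1])
```

In this toy setting the estimation error shrinks because the underlying model is itself stable and the output injection only adds damping; for a real digestion model the gains would have to be chosen so that the error dynamics in Equation (3) converge.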

#### *3.3. Soft Sensor Based on Regression Analysis*

The soft sensor of anaerobic digestion based on regression analysis mainly includes soft sensors based on multiple linear regression (MLR) and soft sensors based on partial least squares regression (PLSR).

MLR establishes a linear mapping between auxiliary variables and target variables through the least squares method [68]. The soft sensor of anaerobic digestion based on MLR proposed by Hu assumes the following linear relationship between auxiliary variables and biogas yield [69]:

$$\hat{y} = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \dots + \theta_n X_n \tag{4}$$

where *X* is the auxiliary variable, *θ* is the parameter to be calculated, and $\hat{y}$ is the predicted value of biogas yield. The target parameter *θ* is solved by minimizing the error between the real biogas yield and the predicted biogas yield with the least squares method. However, the biochemical reaction process of anaerobic digestion is significantly nonlinear, and MLR cannot accurately describe the nonlinear process. Therefore, the anaerobic digestion soft sensor based on MLR has disadvantages such as low accuracy and susceptibility to external interference [70].
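As an illustration of how the parameters in Equation (4) can be obtained, the sketch below solves the least squares problem through the normal equations in plain Python. The function names (`fit_mlr`, `predict_mlr`, `solve_linear_system`) and the small synthetic data set are hypothetical and chosen only for demonstration; they are not taken from the cited soft sensor.

```python
def solve_linear_system(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def fit_mlr(X, y):
    """Return [theta_0, ..., theta_n] minimising the sum of squared errors."""
    rows = [[1.0] + list(x) for x in X]  # leading 1 models the intercept theta_0
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_mlr(theta, x):
    """Evaluate Equation (4) for one vector of auxiliary variables."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


if __name__ == "__main__":
    # Synthetic, noise-free data generated from y = 1 + 2*X1 + 3*X2.
    X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    y = [1.0 + 2.0 * a + 3.0 * b for a, b in X]
    theta = fit_mlr(X, y)
    print("theta:", theta)
    print("prediction at (3, 2):", predict_mlr(theta, (3.0, 2.0)))
```

With strongly correlated auxiliary variables the normal equations become ill-conditioned, which is one motivation for the PLSR approach discussed next.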

The anaerobic digestion soft sensor based on PLSR, which was proposed by Yang [1], can extract the principal components of auxiliary variables and target variables while maximizing the correlation between them [71]. The objective function of the soft sensor is expressed as

$$
\max \text{Cov}(t, y) = \sqrt{var(t)var(y)} \text{corr}(t, y) \tag{5}
$$

where *t* represents the principal component of the auxiliary variables, and *y* denotes the COD. The Lagrange multiplier *λ* is introduced to solve the objective function:

$$d = p^T x^T y - \frac{\lambda}{2} \left( p^T p - 1 \right) \tag{6}$$

where *x* and *p* indicate the auxiliary variable and the weight coefficient, respectively. Subsequently, the linear fitting between the principal component and the COD is realized by the MLR algorithm. This model solves the problem of the collinearity of auxiliary variables in the anaerobic digestion process. Unfortunately, the process of dimensionality reduction may eliminate the secondary principal components that are beneficial to regression and retain irrelevant noise, affecting the accuracy of the model. Meanwhile, PLSR is a linear algorithm and is only suitable for linear and weakly nonlinear models. However, there is severe nonlinearity in the anaerobic digestion process, limiting the prediction accuracy and generalization ability of the model.
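Setting the derivative of Equation (6) with respect to *p* to zero gives *x*<sup>T</sup>*y* = *λp*, so the first weight vector is proportional to *x*<sup>T</sup>*y*. The sketch below computes this first component and the subsequent univariate fit; it assumes centred data and a single component, and the function name and synthetic signals are hypothetical.

```python
import numpy as np

def pls_first_component(X, y):
    """Weight vector p, score t and regression slope for one PLS component."""
    Xc = X - X.mean(axis=0)        # centre the auxiliary variables
    yc = y - y.mean()              # centre the target (COD)
    s = Xc.T @ yc                  # stationarity of Eq. (6): X^T y = lambda * p
    p = s / np.linalg.norm(s)      # unit-norm weight vector (p^T p = 1)
    t = Xc @ p                     # score, i.e. the extracted component
    slope = (t @ yc) / (t @ t)     # least-squares fit of the target on t
    return p, t, slope

rng = np.random.default_rng(1)
base = rng.normal(size=(60, 1))
# Three strongly collinear auxiliary variables (hypothetical process signals).
X_demo = np.hstack([base, 0.9 * base, -0.8 * base]) + 0.05 * rng.normal(size=(60, 3))
y_demo = 3.0 * base[:, 0] + 0.1 * rng.normal(size=60)
p_demo, t_demo, slope_demo = pls_first_component(X_demo, y_demo)
```

Because the three columns are nearly collinear, an ordinary MLR fit on them would be ill-conditioned, whereas the single score `t_demo` summarises them in one well-defined regressor.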

#### *3.4. Soft Sensor Based on Artificial Neural Network*

Artificial neural networks can establish a non-linear mapping relationship between auxiliary variables and target variables through network learning, including back propagation (BP) neural networks and radial basis function (RBF) neural networks.

The soft sensor based on the BP neural network for the anaerobic digestion process was proposed by researchers [72–78]. In this soft sensor, the gradient descent algorithm is used to update the network weight. Therefore, the soft sensor can approximate the continuous nonlinear function with arbitrary precision and solve the highly nonlinear and uncertain problems in the anaerobic digestion process [79,80]. However, it is prone to fall into a local optimal or over-fitting state, affecting the prediction accuracy and generalization ability of the soft sensor [81].

To handle the complication that the anaerobic digestion soft sensor based on the BP neural network is prone to fall into the local minimum, Yilmaz proposed a soft sensor based on the RBF neural network to predict COD [82]. The soft sensor based on the RBF neural network has the characteristics of global best approximation and strong nonlinear mapping ability. The loss function of the soft sensor is expressed as

$$Loss = \frac{1}{2} \|\mathbf{Y} - \hat{\mathbf{Y}}\|^2 + \frac{1}{2}\lambda \|D\hat{\mathbf{Y}}\|^2\tag{7}$$

where *Y* and *Y*ˆ denote the test and predicted values of COD, respectively; *λ* represents the weighting factor of the regular term; *D* indicates the linear differential operator. With the regularization term, the curvature of the approximation function can be controlled, and the problem that the model is prone to overfitting is addressed.
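The effect of the regularization weight *λ* can be illustrated with a small Gaussian RBF model. This is only a sketch: a ridge penalty on the output weights is swapped in for the differential-operator term of Equation (7), the centres and width are fixed by hand, and all names and data are hypothetical rather than taken from [82].

```python
import numpy as np

def rbf_design(x, centres, width):
    """Gaussian basis functions evaluated at 1-D inputs x."""
    return np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2.0 * width ** 2))

def fit_rbf(x, y, centres, width, lam):
    """Regularised least-squares weights: (Phi^T Phi + lam I)^-1 Phi^T y."""
    Phi = rbf_design(x, centres, width)
    A = Phi.T @ Phi + lam * np.eye(len(centres))
    return np.linalg.solve(A, Phi.T @ y)

# Synthetic target standing in for a COD signal.
x_demo = np.linspace(0.0, 1.0, 40)
y_demo = np.sin(2.0 * np.pi * x_demo)
centres_demo = np.linspace(0.0, 1.0, 10)
w_small = fit_rbf(x_demo, y_demo, centres_demo, width=0.1, lam=1e-6)
w_large = fit_rbf(x_demo, y_demo, centres_demo, width=0.1, lam=10.0)
```

A larger `lam` shrinks the weights and yields a smoother but less closely fitting approximation, which is the trade-off the regularization term is meant to control.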

The soft sensor based on the neural network can better handle the problem of nonlinearity in the anaerobic digestion process. However, the performance of soft sensors is dramatically affected by the network topology and hyperparameters in practical applications. Therefore, proper hyperparameters and network topology are selected through optimization algorithms such as genetic algorithm and particle swarm optimization algorithm to improve model prediction accuracy and generalization ability [83–87].

#### *3.5. Soft Sensor Based on Statistical Machine Learning*

The soft sensor based on support-vector regression (SVR) uses the kernel function to map auxiliary variables to the high-dimensional feature space and adopts linear algorithms to analyze the nonlinear characteristics of the samples in the high-dimensional feature space. The convex quadratic programming is solved by the structural risk minimization criterion, which also addresses the high-dimensional and small-sample problems that cannot be solved by artificial neural networks [88]. Given the small-sample problem caused by the difficulty of obtaining target variables in the anaerobic digestion process, Kazemi proposed the soft sensor based on SVR to predict the VFA concentration [89]. The loss function of the soft sensor is expressed as

$$\text{Loss} = \frac{1}{2} \sum\_{i,j=1}^{m} \left(a\_i - a\_i^\*\right) \left(a\_j - a\_j^\*\right) k\left(\mathbf{x}\_i, \mathbf{x}\_j\right) + \sum\_{i=1}^{m} \left[ a\_i (\varepsilon - y\_i) + a\_i^\* (\varepsilon + y\_i) \right] \tag{8}$$

The constraints are

$$\text{s.t.}\left\{\begin{array}{l}\sum\_{i=1}^{m}\left(a\_{i}-a\_{i}^{\*}\right)=0\\a\_{i}, a\_{i}^{\*}\in[0, C]\end{array}\right.\tag{9}$$

where *x* is the auxiliary variable; *y* indicates the VFA concentration; *a<sub>i</sub>* and *a<sub>i</sub>*<sup>∗</sup> are Lagrangian multipliers; *k*(·) represents the kernel function; *ε* is the insensitivity coefficient; *C* is the penalty coefficient that bounds the multipliers.

Given the problem of high complexity in solving SVR models, a soft sensor based on least-squares support-vector regression (LS-SVR) was proposed by Liu to monitor the VFA concentration in the anaerobic digestion process in real-time [90]. In the soft sensor based on LS-SVR, the slack variable in the optimization objective is replaced with the squared training error:

$$Loss = \frac{1}{2}\|w\|^2 + \frac{1}{2}\gamma \sum\_{i=1}^{m} \xi\_i^2 \tag{10}$$

Then, the inequality constraints are replaced with the following equality constraints.

$$y\_i(w\mathbf{x}\_i + b) = 1 - \xi\_i \tag{11}$$

where *w* and *b* indicate the learnable parameters of the model, *γ* denotes the regularization coefficient, and *ξ<sub>i</sub>* refers to the training error. Solving the problem of convex quadratic programming is transformed into solving a set of linear equations, reducing the complexity of the model. However, the simplified soft sensor is more sensitive to abnormal values in the anaerobic digestion process, weakening the robustness of the soft sensor. Therefore, optimization algorithms are used to select the appropriate kernel function and hyperparameters to improve the prediction accuracy and generalization ability of the model [91–93].
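The reduction to a set of linear equations can be sketched as follows. The example assumes the commonly used regression form of the equality constraint, *y<sub>i</sub>* = *w*·*φ*(*x<sub>i</sub>*) + *b* + *ξ<sub>i</sub>*, and a Gaussian kernel; the function names, kernel width, and synthetic data are hypothetical and do not reproduce the model in [90].

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_lssvr(X, y, gamma, sigma):
    """Solve the bordered linear system for the bias b and multipliers alpha."""
    m = len(y)
    A = np.zeros((m + 1, m + 1))
    A[0, 1:] = 1.0                       # first row enforces sum(alpha) = 0
    A[1:, 0] = 1.0                       # bias column
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(m) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]

def predict_lssvr(X_train, b, alpha, X_new, sigma):
    """Kernel expansion f(x) = sum_i alpha_i k(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Synthetic 1-D data standing in for (auxiliary variable, VFA) pairs.
X_demo = np.linspace(0.0, 1.0, 30)[:, None]
y_demo = np.sin(2.0 * np.pi * X_demo[:, 0])
b_demo, alpha_demo = fit_lssvr(X_demo, y_demo, gamma=100.0, sigma=0.2)
```

In this formulation every training error equals `alpha / gamma`, so every sample contributes to the solution; this is one way to see why the method is more sensitive to abnormal values than standard SVR.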

#### *3.6. Practical Application of Soft Sensors for Anaerobic Digestion*

The soft sensor of anaerobic digestion is widely used in various industries owing to its low cost and ease of development and maintenance. The soft sensor based on the process mechanism proposed by Fan [53] is employed to predict the bacterial concentration of high-temperature anaerobic digestion of cow manure. The kinetic model of anaerobic digestion of cow manure is expressed as:

$$\frac{dX}{dt} = \mu\_{\text{max}} X \left(1 - \frac{X}{X\_{\text{max}}}\right) \tag{12}$$

$$\frac{dP}{dt} = k\_3 X - k\_4 \frac{dX}{dt} \tag{13}$$

$$-\frac{dS}{dt} = k\_1 \frac{dX}{dt} + k\_2 \frac{dP}{dt} \tag{14}$$

where *X*, *P*, and *S* denote cell concentration, product concentration, and substrate concentration, respectively; *µ<sub>max</sub>* and *X<sub>max</sub>* indicate the maximum growth rate and concentration of the bacteria, respectively; *k*<sub>1</sub>, *k*<sub>2</sub>, *k*<sub>3</sub>, and *k*<sub>4</sub> represent the cell growth rate, acid production rate coefficient, total enzyme activity, and cell activity coefficient, respectively, and the latter two factors can directly affect the cell growth rate and fermentation cycle. It can be observed that the cell concentration and substrate concentration are the direct factors affecting anaerobic digestion. Therefore, the cell growth rate, acid production rate, total enzyme activity, and cell activity are selected as auxiliary variables. However, the versatility of the soft sensor is poor. The prediction accuracy of the model will significantly decrease when fermentation conditions and fermentation batches change.

The robust nonlinear observer proposed by *Dochain* [62] is adopted to predict the VFA concentration during the anaerobic digestion process of industrial wastewater. The mass balance equation of anaerobic digestion is expressed as:

$$\dot{S} = u \left( S\_f - S \right) - k\_t \mu(\cdot) X \tag{15}$$

$$
\dot{X} = \mu(\cdot)X - auX \tag{16}
$$

$$Q\_M = k\_m \mu(\cdot) X \tag{17}$$

where *X*, *S*, and *Q<sub>M</sub>* indicate the methanogenic biomass, the soluble organic substrate, and the methane outflow rate, respectively; *k<sub>t</sub>* and *k<sub>m</sub>* represent the yield coefficient related to substrate degradation and the yield coefficient of methane production, respectively; *u*, *a*, and *µ*(·) denote the dilution rate, the proportion of bacteria that are not attached to the support, and the growth rate of methane bacteria, respectively. Considering the limited online monitoring equipment available in the actual factory, the soft sensor only uses the methane outflow rate as an auxiliary variable to predict the VFA concentration under different working conditions and has high engineering practicability. However, the prediction accuracy of the soft sensor is generally not high when an observation model is established by simplifying the biochemical reaction and mass balance equations.

*Strik* [75] employed a soft sensor based on the BP neural network to predict the content of ammonia in biogas. According to the kinetic model of anaerobic digestion, the calculation formula of related variables in biogas can be expressed as:

$$C\_{N} = \frac{C\_{TAN} \times 10^{\text{pH}}}{e^{\frac{6334}{273 + T}} + 10^{\text{pH}}} \times \left( 1 + \frac{10^{-\text{pH}}}{10^{-(0.09 + \frac{273}{T})}} \right)^{-1} \tag{18}$$

$$M\_t = M\_0 \cdot \left(1 - e^{-Kt}\right) \tag{19}$$

where *C<sub>N</sub>*, *C<sub>TAN</sub>*, *T*, pH, and *K* denote the ammonia content, the total inorganic nitrogen concentration, the reaction temperature of anaerobic digestion, the pH value of the collected sample, and the rate constant of methane production, respectively; *M*<sub>0</sub> and *M<sub>t</sub>* represent the methane production potential and the cumulative methane production at time *t*, respectively. As revealed from the model, pH, total inorganic nitrogen concentration, ammonium ion concentration, and temperature are the direct factors influencing ammonia content, and methane production is its indirect influence factor. Therefore, the ammonia content, ammonium ion concentration, total inorganic nitrogen concentration, nitrogen loading rate, pH, biogas production, and organic loading rate in the reactor are selected as the auxiliary variables of the model. However, the soft sensor lacks a real-time correction function. With the changes in actual working conditions and external interference factors, the prediction accuracy of the model will continue to decrease.
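As a small numerical illustration of the first-order methane production model in Equation (19), the sketch below evaluates the cumulative yield and inverts the expression for the time needed to reach a given fraction of *M*<sub>0</sub>. The function names and the parameter values used in the comments are hypothetical.

```python
import math

def cumulative_methane(m0, k, t):
    """Equation (19): M_t = M_0 * (1 - exp(-K t))."""
    return m0 * (1.0 - math.exp(-k * t))

def time_to_fraction(k, fraction):
    """Time at which M_t reaches the given fraction of M_0 (inverse of Eq. 19)."""
    return -math.log(1.0 - fraction) / k

# Hypothetical values: methane potential of 300 (arbitrary units) and
# a rate constant of 0.15 per day.
t90_demo = time_to_fraction(0.15, 0.9)
m_at_t90 = cumulative_methane(300.0, 0.15, t90_demo)
```

Since the time to reach a fixed fraction of *M*<sub>0</sub> depends only on *K*, a drift in the fitted rate constant is one simple indicator that operating conditions have changed.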

#### **4. The Latest Development of Anaerobic Digestion Soft Sensor**

The previous chapter introduced traditional anaerobic digestion soft sensors, reflecting the mapping relationship between auxiliary variables and target parameters to a certain extent. The characteristics of traditional soft sensors are summarized in Table 1. However, soft sensors still face many challenges in practical applications. For example:



**Table 1.** Advantages and disadvantages of traditional soft sensors.

In this chapter, the latest developments in anaerobic digestion soft sensors are introduced in detail. Furthermore, suitable solutions have been proposed regarding the obstacles encountered by traditional soft sensors in the industrial production process.

#### *4.1. Soft Sensors for Extracting Deep Features*

The deep belief network (DBN) achieves the approximation of complex functions through unsupervised layer-by-layer pre-training and supervised backpropagation fine-tuning [98,99]. In the process of unsupervised pre-training, the auxiliary variables are subjected to nonlinear mapping through the stacked restricted Boltzmann machine to extract the abstract features of the training samples. In the process of supervised backpropagation fine-tuning, the weights are fine-tuned through the backpropagation of the supervised signal to further adjust and optimize the weights of the network.

To overcome the dependence of the traditional anaerobic digestion soft sensor on feature selection, Li proposed a soft sensor based on a deep belief network to predict the concentration of VFA for the anaerobic digestion process [100]. The structure diagram is illustrated in Figure 2.

**Figure 2.** Deep belief network structure [100].

The gradient descent algorithm cannot effectively train the deep network. Therefore, the contrastive divergence (CD) algorithm is adopted to update the weights of the restricted Boltzmann machine, layer by layer:

$$\begin{cases} w\_{ij}^R = w\_{ij}^R + \eta \left( v\_i^{(t-1)} h\_j^{(t-1)} - v\_i^{(t)} h\_j^{(t)} \right) \\\ b\_j = b\_j + \eta \left( h\_j^{(t-1)} - h\_j^{(t)} \right) \\\ a\_i = a\_i + \eta \left( v\_i^{(t-1)} - v\_i^{(t)} \right) \end{cases} \tag{20}$$

where *v* denotes the state vector of the visible layer, *h* refers to the state vector of the hidden layer, *η* represents the learning rate, and *w* and *b* denote the weights and biases of the network, respectively. The soft sensor, with excellent feature learning capabilities, can effectively learn the essential features from the training samples and address the defects of excessive dependence on prior knowledge in feature selection.

However, the random setting of the weights of DBN's output layer increases the randomness of the model's prediction performance. To further improve the stability of prediction performance and generalization performance, Li proposed to adopt the extreme learning machine (ELM) algorithm after the weights of the first *n*-1 layers were obtained using the CD algorithm to determine the weights of the output layer, and establish a soft sensor based on an improved deep belief network (IDBN) to predict the VFA concentration. The IDBN structure diagram is presented in Figure 3.

$$\beta = h\_{n-1} \left( w\_i, \hat{b}\_i \right)^+ y \tag{21}$$

where *h*<sub>*n*−1</sub>(*w<sub>i</sub>*, *b̂<sub>i</sub>*)<sup>+</sup> indicates the generalized inverse of the output of the hidden layer of layer *n*−1, *β* represents the weights of the output layer, and *y* denotes the VFA concentration. Compared with the soft sensor based on DBN, the improved soft sensor has better prediction accuracy and generalization performance in experiments. However, the unsupervised layer-by-layer training process based on the CD algorithm requires a lot of iterative calculations, and the training process does not consider the mapping relationship between auxiliary variables and target variables. Therefore, Wang proposed a soft sensor based on the stacked supervised autoencoder combined with the kernel extreme learning machine (SSAE-KELM) algorithm to predict the VFA concentration [101]. The structure of SSAE-KELM is shown in Figure 4.

**Figure 3.** Improved deep belief network structure [100].

**Figure 4.** The stacked supervised autoencoder combined with the kernel extreme learning machine structure [101].

For the soft sensor, the ELM algorithm is employed to train supervised autoencoders (SAE), and the deep features of auxiliary variables are extracted through stacked SAE. The loss function of the training process is expressed as:

$$Loss = \frac{C\_1}{2} \|X - Hr\_1\|\_2^2 + \frac{C\_2}{2} \|Y - Hr\_2\|\_2^2 + \frac{1}{2} \|r\|\_2^2 \tag{22}$$

By minimizing the loss function, the output weight is obtained:

$$r = \left[\left(C H^T H - I\_{m+1}\right)^{-1} H^T Y\right]^T \tag{23}$$

where *X* refers to the auxiliary variables; *Y* represents the VFA concentration; *H* denotes the hidden layer output; *r*<sub>1</sub> and *r*<sub>2</sub> indicate the hidden layer weights and the supervised item weights, respectively; *C*<sub>1</sub> and *C*<sub>2</sub> are the weight coefficients. Finally, the kernel extreme learning machine is adopted to establish a regression model to predict the VFA concentration on the extracted deep abstract features. Compared with soft sensors based on IDBN, the soft sensor introduces supervised items by improving the loss function. As a result, the soft sensor can extract the deep features of the auxiliary variable while considering the mapping relationship between the auxiliary variable and the VFA concentration. Then, it can extract the essential features that have a greater impact on the VFA concentration. Moreover, the ELM algorithm is used to compensate for the shortcomings of the slow training speed of the traditional CD algorithm and improve the training efficiency of the model.
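The ELM step used for the output layer in Equation (21) amounts to a single least-squares solve, which is why it avoids the iterative cost of CD training. The sketch below shows this step in isolation; random sigmoid features stand in for the pre-trained layers, and all names and data are hypothetical rather than taken from [100,101].

```python
import numpy as np

def hidden_output(features, w, b):
    """Sigmoid output of the last hidden layer."""
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))

def elm_output_weights(h, y):
    """Equation (21): beta = h^+ y, with h^+ the Moore-Penrose pseudo-inverse."""
    return np.linalg.pinv(h) @ y

rng = np.random.default_rng(2)
features_demo = rng.normal(size=(50, 4))   # stand-in for learned deep features
w_demo = rng.normal(size=(4, 10))          # stand-in for pre-trained weights
b_demo = rng.normal(size=10)
y_demo = np.sin(features_demo[:, 0]) + 0.5 * features_demo[:, 1]

h_demo = hidden_output(features_demo, w_demo, b_demo)
beta_demo = elm_output_weights(h_demo, y_demo)
residual_demo = y_demo - h_demo @ beta_demo
```

The residual is orthogonal to the columns of the hidden-layer output, which is the defining property of the least-squares solution and requires no gradient iterations.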

#### *4.2. Soft Sensors for Extracting Information from Unlabeled Samples*

In the anaerobic digestion process, the long period and high cost of target variable collection make it difficult for soft sensors to obtain sufficient labeled samples [102]. However, there are many unlabeled samples composed of process variables in the industrial process. With the semi-supervised learning mechanism, the information of unlabeled samples can be fully mined, and the prediction accuracy and generalization ability of soft sensors are improved. In recent years, semi-supervised learning mechanisms have been widely used in deep neural networks. Therefore, Yan proposed a soft sensor based on the semi-supervised hierarchical extreme learning machine to predict VFA concentration in the anaerobic digestion process [103]. The model structure of the semi-supervised hierarchical extreme learning machine is illustrated in Figure 5.

**Figure 5.** Semi-supervised hierarchical extreme learning machine structure [103].

Hierarchical extreme learning machine (HELM) is a multi-layer feedforward neural network composed of a multi-layer extreme learning machine-autoencoder (ELM-AE). During the training process, ELM-AE can achieve the lossless reconstruction of auxiliary variables. Therefore, the combined feature information of auxiliary variables can be extracted to a certain extent when the number of neurons in the hidden layer of ELM-AE is less than the number of neurons in the input layer [104]. The reconstruction loss function of ELM-AE is expressed as:

$$Loss = \min \frac{1}{2} \left\| \gamma \right\|^2 + \frac{C}{2} \left\| X - J\gamma \right\|^2 \tag{24}$$


The reconstruction loss function is minimized to obtain the output weight.

$$\gamma = \left(J^T J + \frac{1}{C} I\right)^{-1} J^T X \tag{25}$$

where *γ* indicates the weight of the output layer of ELM-AE; *C* is the weight factor; *J* denotes the output of the hidden layer; *X* and *Y* represent auxiliary variables and VFA concentration, respectively. Manifold regularization is used as a semi-supervised learning mechanism to learn the distribution of unlabeled samples. It can preserve the manifold domain relationship between the data vectors in the original space. The essential idea of manifold regularization is to keep the local geometric structure of the original feature space in the new projection space. The loss function of HELM that introduces the manifold regularization term is:

$$\text{Loss} = \min \frac{1}{2} \left\| \gamma \right\|^2 + \frac{C}{2} \left\| Y - J\gamma \right\|^2 + \frac{\lambda}{2} \text{Tr} \left( \hat{Y}^T L \hat{Y} \right) \tag{26}$$

The loss function is minimized to acquire the output weight.

$$\gamma = \left(I + C J^T J + \lambda H^T L H\right)^{-1} C J^T Y \tag{27}$$

where *γ* indicates the output layer weight of HELM; *λ* is the weight factor; *Tr*(·) represents the trace of the matrix; *L* refers to the graph Laplacian matrix; *H* and *Y*ˆ denote the hidden layer output and prediction output of all samples, respectively. Compared with traditional soft sensors, soft sensors based on a semi-supervised learning mechanism can learn from both unlabeled and labeled samples. The semi-supervised learning mechanism can make full use of the many unlabeled samples in the industrial process, contributing to the improvement of the prediction accuracy and generalization ability of soft sensors.
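Setting the gradient of the loss in Equation (26) to zero, with *Y*ˆ = *Hγ*, gives the closed form *γ* = (*I* + *C J*<sup>T</sup>*J* + *λ H*<sup>T</sup>*LH*)<sup>−1</sup> *C J*<sup>T</sup>*Y*. The sketch below evaluates it for a single output, assuming that *J* consists of the labeled rows of *H* and that *L* is an unnormalised graph Laplacian built from Gaussian affinities; these choices, the function names, and the synthetic data are assumptions and are not taken from [103].

```python
import numpy as np

def graph_laplacian(Z, sigma):
    """Unnormalised graph Laplacian L = D - W from Gaussian affinities."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def ss_loss(gamma, H, y_lab, n_lab, C, lam, L):
    """Loss of Equation (26) for one output; J is the labelled part of H."""
    J = H[:n_lab]
    y_hat = H @ gamma
    return (0.5 * gamma @ gamma
            + 0.5 * C * np.sum((y_lab - J @ gamma) ** 2)
            + 0.5 * lam * y_hat @ L @ y_hat)

def ss_output_weights(H, y_lab, n_lab, C, lam, L):
    """Closed-form minimiser of ss_loss."""
    J = H[:n_lab]
    A = np.eye(H.shape[1]) + C * J.T @ J + lam * H.T @ L @ H
    return np.linalg.solve(A, C * J.T @ y_lab)

rng = np.random.default_rng(3)
H_demo = np.tanh(rng.normal(size=(30, 5)))   # hidden outputs, 8 labelled + 22 unlabelled
y_lab_demo = rng.normal(size=8)              # synthetic VFA labels
L_demo = graph_laplacian(H_demo, sigma=1.0)
gamma_demo = ss_output_weights(H_demo, y_lab_demo, 8, C=10.0, lam=0.1, L=L_demo)
```

Only the first 8 rows carry labels, yet all 30 rows shape the solution through the Laplacian term, which is how the unlabeled samples contribute.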

#### *4.3. Soft Sensors for Extracting Dynamic Information*

In the industrial production process of anaerobic digestion, changes in operating tasks, production materials, and production environment would cause changes in system operating conditions, making the prediction accuracy of soft sensors gradually decrease over time. Moreover, the different start-up times of the methane tank could lead to large differences in the digestion degree, substrate concentration, and biological activity, leading to inconsistent data distribution in the original data set. To handle this complication, Wang proposed to use the domain space transfer extreme learning machine (DSTELM) algorithm to adjust the data distribution [103]. The reconstruction loss function of DSTELM is:

$$loss = \frac{1}{2} \|r\|^2 + \frac{c}{2} \|\mathbf{X}\_T - H\_T r\|^2 + \frac{\lambda}{2} Tr\left[r^T H^T M H r\right] \tag{28}$$

where *c* and *λ* are weighting factors; *r* denotes the output weight; *X<sub>T</sub>* represents the auxiliary variables of the test set; *H* = [*H<sub>S</sub>*; *H<sub>T</sub>*] indicates the output of the hidden layer; *Tr*(·) refers to the trace of the matrix. The matrix *M* is defined as:

$$M\_{ij} = \begin{cases} \frac{1}{n\_S^2}, & i, j \le n\_S \\ \frac{1}{n\_T^2}, & i, j > n\_S \\ \frac{1}{n\_S n\_T}, & \text{otherwise} \end{cases} \tag{29}$$

The loss function is minimized to obtain the output weight.

$$r = \left(I\_{L} + H^{T}(C + \lambda M)H\right)^{-1} H^{T} C X\_T \tag{30}$$

where *C* = diag(0<sub>*n<sub>S</sub>*×*n<sub>S</sub>*</sub>, *c*, *c*, …, *c*). The algorithm can minimize the distribution distance between the training set and the test set while retaining the essential characteristics of the test set. Moreover, it can address the problem of low model prediction accuracy caused by the inconsistent data distribution of the training set and the test set. Furthermore, a soft sensor based on the domain space transfer hierarchical extreme learning machine (DSTHELM) is established by stacking DSTELM to extract the deep features of auxiliary variables. Compared with traditional soft sensors, soft sensors based on DSTHELM can better adapt to modal changes and data drift and thus present higher prediction accuracy and generalization ability.

Additionally, the hydrolysis reaction process is slow in the anaerobic digestion process, resulting in a certain time lag between the real-time monitoring variables of the acid-generating tank and the real-time monitoring variables of the methane-generating tank. This suggests that the target variable is affected by the auxiliary variable in the current state, the changes in the operating conditions, and production conditions at the last moment, as well as the target variable in the current state. Therefore, *Mccormick* proposed a dynamic soft sensor based on long short-term memory (LSTM) network to predict biogas yield [105]. The LSTM structure is exhibited in Figure 6.

**Figure 6.** Long short-term memory network structure [105].

In the training process, the soft sensor realizes the retention or deletion of current information and historical information through the gate control unit. The input gate determines the extent to which the current input is retained to the current state. The forget gate determines the extent to which the state at the previous moment is retained to the current state. The output gate determines the extent to which the current state is retained to the output. The specific formulas are:

$$i\_{t} = \sigma(w\_{i} \cdot [h\_{t-1}, x\_{t}] + b\_{i}) \tag{31}$$

$$f\_t = \sigma\left(w\_f \cdot [h\_{t-1}, \mathbf{x}\_t] + b\_f\right) \tag{32}$$

$$h\_t = \sigma(w\_0 \cdot [h\_{t-1}, \mathbf{x}\_t] + b\_0) \tag{33}$$

where *i<sub>t</sub>*, *f<sub>t</sub>*, *h<sub>t</sub>*, and *σ* represent the input gate, the forget gate, the output gate, and the sigmoid activation function, respectively. The soft sensor can extract the different characteristics of the auxiliary variable at different times. Meanwhile, the soft sensor can retain historical biogas yield and its main influencing factors as auxiliary variables for current biogas yield forecasting, realizing the persistence of historical information.

The dynamic soft sensor considers the influence of historical data on the current state and overcomes the defect that the traditional soft sensor neglects the time scale information. Therefore, the dynamic soft sensor, to a certain extent, addresses the time lag caused by the slow reaction of the anaerobic digestion process. Furthermore, a dynamic soft sensor based on a combined convolutional neural network and long short-term memory network is established using the deep feature extraction ability of the convolutional neural network and the dynamic information extraction ability of LSTM to predict biogas yield. It can effectively extract the deep features of the data while using LSTM for timing error compensation. Thus, dynamic correction of the model is realized, and the prediction accuracy and generalization ability of the model are further improved.

#### *4.4. Soft Sensors for Extracting Spatiotemporal Information*

In recent years, the graph convolutional network (GCN) has been widely used, owing to its powerful feature representation ability [106]. GCN can reduce the complexity of the soft sensor through the parameter sharing of the convolution kernel in the local area. Moreover, the adjacency matrix of the GCN enables the soft sensor to quantify the mutual influence between auxiliary variables, that is, to consider the degree of influence of surrounding nodes on the target node and to extract the spatial information of the sample data. In industrial practice, combined auxiliary variables are generally highly correlated with the target variable, while a single auxiliary variable often has only a weak correlation with it. Therefore, researchers proposed a soft sensor based on GCN to predict VFA concentration [107]. The GCN structure is shown in Figure 7.

**Figure 7.** Graph convolutional network structure [107].

The output of the soft sensor can be expressed as:

$$Y = f\left(\hat{A}XW\right) \tag{34}$$

where *X* indicates the auxiliary variables; *Y* refers to the output of the soft sensor; *f* represents the nonlinear activation function; *Â* is the normalized adjacency matrix; and *W* denotes the learnable convolution kernel parameter. A proper adjacency matrix can be adopted to effectively extract the spatial information between auxiliary variables and improve the prediction accuracy and generalization ability of the soft sensor. Since the maximal information coefficient (MIC) can quantify the correlation between auxiliary variables, the normalized MIC is used to construct the adjacency matrix:

$$m\_{ij} = \mathrm{MIC}\left(x\_i, x\_j\right) \tag{35}$$

$$\alpha\_{ij} = \mathrm{softmax}\left(m\_{ij}\right) = \frac{\exp(m\_{ij})}{\sum\_{k \in N\_i} \exp(m\_{ik})} \tag{36}$$

where *m<sub>ij</sub>* represents the MIC between auxiliary variables *i* and *j*; *α<sub>ij</sub>* denotes the normalized MIC between auxiliary variables *i* and *j*; and softmax indicates the normalization function. Compared with the traditional soft sensor, this soft sensor can learn the spatial information of the auxiliary variables by fully considering the influence of the combined feature information on the VFA concentration.
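The construction in Equations (34)–(36) can be illustrated with a short, dependency-free sketch. It assumes that the MIC matrix has already been computed by some external routine (the values below are made up), that every auxiliary variable is a neighbour of every other one so the softmax runs over a full row, and that ReLU is used as the activation *f*. None of these choices is prescribed by the reviewed work, and the function names are hypothetical.

```python
import math


def softmax_rows(m):
    """Row-wise softmax: turns MIC scores m_ij into weights alpha_ij (Eq. 36)."""
    out = []
    for row in m:
        exps = [math.exp(v) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(mic, x, w):
    """Y = f(A_hat X W) with f = ReLU (assumed) and A_hat = softmax_rows(mic)."""
    a_hat = softmax_rows(mic)
    z = matmul(matmul(a_hat, x), w)
    return [[max(0.0, v) for v in row] for row in z]


# Made-up MIC matrix for 3 auxiliary variables (symmetric, 1.0 on the diagonal).
mic = [[1.0, 0.8, 0.2],
       [0.8, 1.0, 0.5],
       [0.2, 0.5, 1.0]]
# One row of features per auxiliary variable (node); values are illustrative.
x = [[0.4, 1.2],
     [0.9, 0.3],
     [0.1, 0.7]]
# Untrained, illustrative kernel mapping 2 input features to 1 output feature.
w = [[0.5],
     [-0.2]]

y = gcn_layer(mic, x, w)
print(y)
```

Because each row of the normalized adjacency matrix sums to one, every node's output is a weighted average of all nodes' features, with strongly correlated variables contributing more.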

Given the dynamic characteristics and time lag characteristics of the anaerobic digestion process, a dynamic soft sensor based on the spatiotemporal graph convolutional network (STGCN) is established by introducing a gated recurrent unit (GRU). GRU can learn the dynamic changes of sample data to capture time information and consider the impact of historical sample information on current sample information. Therefore, this soft sensor can simultaneously consider the time information and spatial information of the anaerobic digestion process data. The structure of STGCN is presented in Figure 8.

**Figure 8.** Spatiotemporal graph convolutional network structure [107].

During the training process, the STGCN can better handle the spatial and temporal characteristics of samples. The combined feature information of the sample is extracted using GCN to obtain its spatial dependence. Moreover, GRU is used to capture the dynamic change information of historical information and obtain temporal dependence. The specific calculation formulas are:

$$r\_t = \sigma\left(W\_r[h\_{t-1}, f(X\_t, A)]\right) \tag{37}$$

$$z\_t = \sigma\left(W\_z[h\_{t-1}, f(X\_t, A)]\right) \tag{38}$$

$$\tilde{h}\_t = \tanh\left(W\_{\tilde{h}}[r\_t \odot h\_{t-1}, f(X\_t, A)]\right) \tag{39}$$

$$h\_t = (1 - z\_t) \odot h\_{t-1} + z\_t \odot \tilde{h}\_t \tag{40}$$

where *A* is the adjacency matrix; *f*(*X<sub>t</sub>*, *A*) represents the graph convolution process; *r<sub>t</sub>* denotes the reset gate; *z<sub>t</sub>* represents the update gate; *h* refers to the state of the hidden layer; *σ* is the activation function; and ⊙ represents the Hadamard product. Compared with the traditional soft sensor, the dynamic soft sensor based on STGCN can effectively extract the time information and spatial information from the anaerobic digestion process data, contributing to the accurate prediction of the current VFA concentration.
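One recurrent step of Equations (37)–(40) might be sketched as follows. The graph convolution *f*(*X<sub>t</sub>*, *A*) is assumed here to be a single ReLU layer applied to a row-normalized adjacency matrix, the weights multiply from the right (the transpose of the notation in the equations), and all sizes, parameter values, and function names are hypothetical rather than taken from [107].

```python
import math


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def hconcat(a, b):
    """Row-wise concatenation, the [ . , . ] of Equations (37)-(39)."""
    return [ra + rb for ra, rb in zip(a, b)]


def elementwise(fn, *mats):
    """Apply fn entry by entry across one or more equally shaped matrices."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*mats)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def graph_conv(x_t, a_hat, w_g):
    """Assumed form of f(X_t, A): ReLU(A_hat X_t W_g)."""
    return elementwise(lambda v: max(0.0, v), matmul(matmul(a_hat, x_t), w_g))


def stgcn_step(h_prev, x_t, a_hat, p):
    g = graph_conv(x_t, a_hat, p["w_g"])
    hg = hconcat(h_prev, g)
    r_t = elementwise(sigmoid, matmul(hg, p["w_r"]))                     # Eq. (37)
    z_t = elementwise(sigmoid, matmul(hg, p["w_z"]))                     # Eq. (38)
    rh = elementwise(lambda r, h: r * h, r_t, h_prev)                    # r_t (.) h_{t-1}
    h_tilde = elementwise(math.tanh, matmul(hconcat(rh, g), p["w_h"]))   # Eq. (39)
    return elementwise(lambda z, h, ht: (1.0 - z) * h + z * ht,
                       z_t, h_prev, h_tilde)                             # Eq. (40)


# Hypothetical setup: 3 auxiliary variables (nodes), 2 features each, hidden size 2.
a_hat = [[0.5, 0.3, 0.2], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5]]
params = {
    "w_g": [[0.3, -0.1], [0.2, 0.4]],
    "w_r": [[0.1, 0.2], [-0.2, 0.1], [0.3, 0.0], [0.0, -0.3]],
    "w_z": [[0.2, -0.1], [0.1, 0.1], [-0.1, 0.2], [0.3, 0.1]],
    "w_h": [[0.4, 0.0], [0.0, 0.4], [0.2, -0.2], [-0.2, 0.2]],
}
x_seq = [
    [[0.4, 1.2], [0.9, 0.3], [0.1, 0.7]],  # X_1
    [[0.5, 1.0], [0.8, 0.4], [0.2, 0.6]],  # X_2
]

h = [[0.0, 0.0] for _ in range(3)]  # initial hidden state
for x_t in x_seq:
    h = stgcn_step(h, x_t, a_hat, params)
print(h)
```

The update gate interpolates between the previous hidden state and the candidate state, so information from earlier samples is carried forward while the graph convolution injects the spatial relations at each time step.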

#### **5. Conclusions**

The anaerobic digestion process is a time-varying, non-linear, and highly complex system with constraints, and it is difficult to establish an accurate mechanism model to describe it. The soft sensor based on regression analysis is more suitable for handling linear problems, whereas the anaerobic digestion process has strong nonlinear characteristics. Soft sensors based on artificial neural networks are significantly affected by the network topology and the quality of training samples; they are prone to local optima or over-fitting, and their generalization ability is weak. The soft sensor based on statistical learning is not suitable for processing large-scale data and is unable to monitor the anaerobic digestion process in real time with high precision. However, soft sensors based on deep learning can learn essential features from training samples, introduce a semi-supervised learning mechanism to fully use unlabeled sample information, consider the dynamic characteristics in actual working conditions and the mutual mapping relationship between auxiliary variables, and extract the time information and space information of the sample data. Therefore, the soft sensor based on deep learning has higher prediction accuracy and generalization ability. The general idea of this paper is illustrated in Figure 9.

**Figure 9.** The general idea of this paper.

At present, soft sensors for anaerobic digestion based on deep learning can be developed further. In the industrial production process, the mechanism model can be combined with deep learning to enhance the interpretability of the soft sensor and realize closed-loop guidance of the industrial process. Furthermore, the difficulty of sample collection during anaerobic digestion prevents researchers from obtaining enough samples to train soft sensors. Therefore, generating synthetic samples with a generative adversarial network is an effective solution to the shortage of soft sensor training samples.

**Author Contributions:** Conceptualization, P.Y. and Y.W.; methodology, P.Y. and Y.W.; software, P.Y.; validation, P.Y., Y.W., M.G. and X.G.; formal analysis, M.G.; investigation, Y.W.; resources, Y.W.; data curation, P.Y.; writing – original draft preparation, P.Y.; writing – review and editing, P.Y.; visualization, M.G. and X.G.; supervision, Y.W.; project administration, Y.W.; funding acquisition, X.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (No. 21706282), National Key R&D Program of China (No. 2016YFC0303703), and the Research Foundation of China University of Petroleum (Beijing) (No. 2462020BJRC004, No. 2462020YXZZ023).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


3. Liu, Z.-j.; Wan, J.-q.; Ma, Y.-w.; Wang, Y. Online prediction of effluent COD in the anaerobic wastewater treatment system based on PCA-LSSVM algorithm. *Env. Sci. Pollut. Res. Int.* **2019**, *26*, 12828–12841, doi:10.1007/s11356-019-04671-8.

6. Franke-Whittle, I.H.; Walter, A.; Ebner, C.; Insam, H. Investigation into the effect of high concentrations of volatile fatty acids in anaerobic digestion on methanogenic communities. *Waste Manag.* **2014**, *34*, 2080–2089, doi:10.1016/j.wasman.2014.07.020.

