A Review Unveiling Various Machine Learning Algorithms Adopted for Biohydrogen Productions from Microalgae

Ahmad Sobri, Mohamad Zulfadhli; Redhwan, Alya; Ameen, Fuad; Lim, Jun Wei; Liew, Chin Seng; Mong, Guo Ren; Daud, Hanita; Sokkalingam, Rajalingam; Ho, Chii-Dong; Usman, Anwar; Nagaraju, D. H.; Rao, Pasupuleti Visweswara

doi:10.3390/fermentation9030243

by Mohamad Zulfadhli Ahmad Sobri ¹, Alya Redhwan ², Fuad Ameen ³, Jun Wei Lim ¹,⁴,*, Chin Seng Liew ¹,*, Guo Ren Mong ⁵, Hanita Daud ⁶, Rajalingam Sokkalingam ⁶, Chii-Dong Ho ⁷, Anwar Usman ⁸, D. H. Nagaraju ⁹ and Pasupuleti Visweswara Rao ¹⁰,¹¹

¹ HICoE—Centre for Biofuel and Biochemical Research, Department of Fundamental and Applied Sciences, Institute of Self-Sustainable Building, Universiti Teknologi Petronas, Seri Iskandar 32610, Perak Darul Ridzuan, Malaysia
² Department of Health, College of Health and Rehabilitation Sciences, Princess Nourah bint Abdulrahman University, Riyadh 1167, Saudi Arabia
³ Department of Botany and Microbiology, College of Science, King Saud University, Riyadh 11451, Saudi Arabia
⁴ Department of Biotechnology, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai 602105, India
⁵ School of Energy and Chemical Engineering, Xiamen University Malaysia, Sepang 43900, Selangor, Malaysia
⁶ Mathematical and Statistical Science, Department of Fundamental and Applied Sciences, Institute of Autonomous System, Universiti Teknologi Petronas, Seri Iskandar 32610, Perak Darul Ridzuan, Malaysia
⁷ Department of Chemical and Materials Engineering, Tamkang University, New Taipei 251, Taiwan
⁸ Department of Chemistry, Faculty of Science, Universiti Brunei Darussalam, Gadong BE1410, Brunei
⁹ Department of Chemistry, School of Applied Sciences, REVA University, Bangalore 560064, India
¹⁰ Centre for International Relations and Research Collaborations, REVA University, Bangalore 560064, India
¹¹ Department of Biomedical Sciences, Faculty of Medicine & Health Sciences, Universiti Malaysia Sabah, Kota Kinabalu 88400, Sabah, Malaysia

* Authors to whom correspondence should be addressed.

Fermentation 2023, 9(3), 243; https://doi.org/10.3390/fermentation9030243

Submission received: 30 January 2023 / Revised: 26 February 2023 / Accepted: 28 February 2023 / Published: 2 March 2023

(This article belongs to the Section Industrial Fermentation)

Abstract

Biohydrogen production from microalgae is a potential alternative energy source that is now being intensively researched. The complex nature of the biological processes involved has impaired the accuracy of traditional modelling and optimization, which are also costly. Accordingly, machine learning algorithms have been employed to overcome these setbacks, as these approaches can predict nonlinear interactions and handle multivariate data from microalgal biohydrogen studies. This review therefore focuses on the recent applications of machine learning techniques in microalgal biohydrogen production. The working principles of random forests, artificial neural networks, support vector machines, and regression algorithms are covered. The applications of these techniques are analyzed and compared for their effectiveness, advantages, and disadvantages in relationship studies, classification of results, and prediction of microalgal hydrogen production. These techniques have shown great performance despite limited data sets that are complex and nonlinear. However, the current techniques are still susceptible to overfitting, which could reduce prediction performance. This could be resolved or mitigated by comparing the methods, should the input data be limited.

Keywords: machine learning; biohydrogen; microalgae; nonlinear interaction; prediction; overfitting

1. Introduction

Hydrogen that is produced from microalgae, either through photo-fermentation or dark fermentation, is known as microalgal hydrogen. It is a subset of biohydrogen, defined as hydrogen that is produced biologically by microorganisms using renewable biomass materials [1,2]. Microalgal hydrogen production has garnered considerable interest from academia as well as industry due to its potential as an alternative energy source. However, the complex nature of the biological processes and factors involved has made studies and process modelling very arduous. Researchers have recently employed machine learning (ML) to overcome this concern. ML is defined as building algorithms that can predict an outcome based on a statistical analysis of input data. The application of ML can generate regression models that describe the relationship between independent variables and dependent variables [3,4]. These algorithms come in various forms, depending on their purposes and effectiveness.

Numerous studies have deployed ML to predict biohydrogen production. One example is a study that developed artificial neural networks (ANN) to predict biohydrogen production based on dark fermentation time and volatile fatty acid production, which yielded highly accurate results (R² > 0.987) [5]. The ANN approach was also used to model biohydrogen generation from biomass gasification based on biomass characteristics and operating conditions, where the results were in accordance with the input data (R² > 0.999, RMSE < 0.25) [6].

In addition, the use of ML in the field of microalgae has been widely reported. For instance, ML was used to explore critical factors that affect algal biomass productivity and to generate scenarios involving multiple combinations of these factors that yield very high biomass production [7]. Bhola et al. (2017) generated a fuzzy inference model that closely described the correlation between input parameters (peak biomass concentration, CO₂ uptake rate, and maximum relative electron transport rate) and the metabolic yields of Chlorella sp., achieving an R² > 0.985 [8].

The use of ML in microalgal hydrogen production has also sparked interest recently. For instance, Sharma et al. (2022) predicted microalgal hydrogen yield from microalgal biomass based on duration, sulfuric content, and biomass concentration [9]. They further stated that ML techniques are significantly advantageous over conventional methods such as response surface methodology (RSM) and one-variable-at-a-time (OVAT) analysis [9]. Meanwhile, Salameh et al. (2022) used an ML model to optimize their microalgal biohydrogen production, obtaining 7% more biohydrogen than with the input parameters derived from RSM [10]. These findings support the practicality of incorporating ML into microalgal biohydrogen production studies. While conventional methods such as OVAT cannot take into account the interactive effects among variables, which are non-linear and complicated, ML algorithms are not constrained by this limitation, allowing for a better understanding of potential correlations among the variables. Considering the practicality, potential, and recent interest in using ML to predict microalgal biohydrogen production, this review discusses the types of ML techniques available, along with comparative analyses of their effectiveness in fulfilling specific research goals.

2. Types of Machine Learning

2.1. Artificial Neural Networks

ANNs are information processing paradigms inspired by how the human brain processes information. The networks are built from multiple simple processing units that function similarly to neurons and are organized into three distinct layer categories, namely, the input, hidden, and output layers. The input neurons receive data either from manually entered data files or in real time from measuring instruments. The output layer sends out the information after the data has passed through one or multiple hidden layers, each composed of various interconnected neurons. The arrangement of layers and the connections between processing units are what distinguish one ANN from another. There are two types of connections between neurons, in which the strength of the input is either increased or reduced before being passed to the next neuron [11,12]. Figure 1 illustrates the structure of a neural network [13,14]. After the ANN model is established for a particular application, it requires training that involves determining the adjustable weights, akin to determining the coefficients of a polynomial equation via regression. A common supervised training algorithm is backpropagation, where the calculated error between the outputs generated by the model and the actual results is reduced by adjusting the weights after the error is propagated backward through the network. This process is repeated until the error falls below a pre-established criterion [15,16].
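The training loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the setup of any reviewed study: the single hidden layer, the sigmoid activation, the learning rate `lr`, the stopping tolerance `tol`, and the function names `forward` and `train` are all hypothetical choices made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w_hidden, w_out, x):
    """Forward pass: inputs -> one hidden layer -> a single output neuron."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(wj[:-1], x)) + wj[-1])
              for wj in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1])
    return hidden, output


def train(samples, n_hidden=3, lr=0.5, tol=1e-4, max_epochs=5000, seed=0):
    """Adjust the weights by backpropagation until the MSE drops below tol.

    samples: list of (inputs, target) pairs with targets inside (0, 1).
    Returns (w_hidden, w_out, mse_of_last_epoch, epochs_run).
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Each neuron stores its input weights followed by a bias term.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    mse = float("inf")
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        squared_error = 0.0
        for x, target in samples:
            hidden, y = forward(w_hidden, w_out, x)
            error = y - target
            squared_error += error * error
            # Propagate the error backward: output delta first, then hidden deltas.
            delta_out = error * y * (1.0 - y)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            # Gradient-descent weight updates.
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
                w_hidden[j][-1] -= lr * delta_hidden[j]
            w_out[-1] -= lr * delta_out
        mse = squared_error / len(samples)
        if mse < tol:  # the pre-established stopping criterion
            break
    return w_hidden, w_out, mse, epoch
```

Calling `train` on a handful of `(inputs, target)` pairs returns the fitted weights, which `forward` can then use to produce predictions for new inputs.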

2.2. Random Forest

The random forest (RF) algorithm combines individual decision trees and aggregates their results, typically by averaging them. This is achieved by generating bootstrapped copies of the original data and growing a single tree, with some form of randomization, on each copy [17,18]. Bootstrapping refers to resampling with replacement: each bootstrapped copy has the same number of data points as the original, but individual points may be drawn more than once or not at all. A decision tree is a hierarchically organized series of conditions in which an instance of data is classified by following the path of satisfied conditions from the root of the tree, passing through chains of nodes (branches), until it reaches an endpoint (leaf) that corresponds to a class label. Each node represents an attribute that may describe the particular data input [19,20]. Hence, a class label is only applied to the input after it has been shown to fulfil all of the respective attributes. Combining multiple decision trees makes the RF algorithm an ensemble learning method, which can be useful for large data sets. For any RF algorithm, the parameters that need to be established are the number of variables in the random subset at each node of the tree and the number of trees in the RF [14,21]. A sufficient number of trees within the RF allows for stable estimates of variable importance, which indicate the extent to which each predictor increases or decreases model accuracy relative to the actual results obtained. Figure 2 illustrates an overview of the RF algorithm [22,23].
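The bootstrap-and-aggregate idea can be illustrated with a deliberately small sketch. As a simplifying assumption, each "tree" here is a one-split regression stump grown on a single randomly chosen feature, which is far simpler than the trees used in practice; the function names and defaults are hypothetical.

```python
import random


def bootstrap_sample(rows, rng):
    """Resample with replacement; the copy has the same size as the original."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows, feature):
    """Fit a one-split regression tree on one feature.

    rows: list of (features, target) pairs.
    Returns (feature, threshold, left_mean, right_mean).
    """
    values = sorted({x[feature] for x, _ in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for x, y in rows if x[feature] <= threshold]
        right = [y for x, y in rows if x[feature] > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # the feature is constant in this sample: predict the mean
        mean = sum(y for _, y in rows) / len(rows)
        return (feature, float("inf"), mean, mean)
    return (feature, best[1], best[2], best[3])


def fit_forest(rows, n_trees=50, seed=0):
    """Grow one stump per bootstrapped copy, each on a random feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)
        feature = rng.randrange(n_features)  # random feature subset of size 1
        forest.append(fit_stump(sample, feature))
    return forest


def predict(forest, x):
    """Aggregate the individual tree outputs by averaging them."""
    outputs = [left if x[f] <= thr else right for f, thr, left, right in forest]
    return sum(outputs) / len(outputs)
```

The two tuning parameters named in the text map onto `n_trees` and the size of the random feature subset, which is fixed at one feature in this sketch.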

2.3. Support Vector Machines

Support vector machines (SVM) are designed for binary classification in a multidimensional space. The working principle of SVM involves identifying a hyperplane, a boundary that separates the outcome categories as widely as possible [18]. SVM applies a transformation to the sample data and projects it into a higher-dimensional space via a kernel function. A kernel function is defined as a function that returns the inner product (dot product) between the images of two data points (x, x') in the higher-dimensional space. ML then takes place in this space [24]. An example of a dot product between x_ij and x'_ij is shown below:

$$x_{ij} \cdot x'_{ij} = x_{i} x'_{i} + x_{j} x'_{j} \tag{1}$$

There are multiple kernel functions available, and the choice depends on the data set, as its dimensionality needs to be increased to obtain the hyperplane (Table 1) [25]. These kernel functions of two data points all aim to reach the target space T. Among them, Karatzoglou and Meyer (2006) stated that the Gaussian radial basis function (RBF) is the most suitable when no prior knowledge of a data set is available [24]. They also stated that the linear kernel function is beneficial for large and sparse data sets. The performance of SVM depends on the regularization parameter C (box constraint) and the kernel parameter (scaling factor), which together make up the hyperplane parameters. A high value of C causes the SVM to create a complex prediction function that greatly reduces the misclassification of data points. In contrast, a low value of C leads to a simpler prediction function [24]. Training an SVM algorithm involves mapping the decision boundary for each outcome category and specifying the hyperplane that separates the categories. The algorithm then attempts to find the optimal hyperplane that has the largest margin between classes, which is associated with better classification accuracy [14]. Figure 3 shows a simple 2-D illustration of the SVM algorithm. Any SVM algorithm aims to find the maximum margin hyperplane, situated at the maximum margin between the positive and negative hyperplanes, which separates the data points into two distinct categories. Misclassifications occur when a data point is mapped onto the wrong side of the hyperplane, which is affected by the box constraint.
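To make the role of the kernel function concrete, the sketch below implements three commonly used kernels in their textbook forms. Table 1 is not reproduced here, so the parameter names `gamma`, `degree`, and `coef0` are assumptions. The helper `phi_degree2` is included only to show that a degree-2 polynomial kernel returns the same value as an ordinary dot product taken in an explicitly constructed higher-dimensional space.

```python
import math


def dot(x, z):
    """Plain dot product, as in Equation (1)."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, degree=2, coef0=0.0):
    return (dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian radial basis function kernel."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def phi_degree2(x):
    """Explicit feature map of the homogeneous degree-2 polynomial kernel (2-D input)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)
```

For two-dimensional points, `polynomial_kernel(x, z)` with the defaults equals `dot(phi_degree2(x), phi_degree2(z))`, so the kernel works in the three-dimensional space without ever constructing it.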

2.4. Regression

Another ML technique is regression analysis, a conventional method used to determine the correlation between a dependent variable and one (univariate) or multiple (multivariate) independent variables [26,27]. Since the nature of the correlation between variables varies, multiple types of regression have been designed to cater to these relationships. These regression techniques all pursue the same objective, which is to express the variable of interest as a mathematical function of the independent variables that affect its value. The most straightforward type is simple linear regression (SLR), which aims to fit the data to a straight line that can be expressed as follows:

$$y = mx + c \tag{2}$$

where y is the dependent variable, x is the independent variable, m is the slope of the fitted straight line, and c is the intercept (constant term). Data that can be fitted with this type of regression indicate that as the independent variable increases, the dependent variable changes in a linear fashion. Multiple linear regression (MLR) is similar to SLR in establishing a linear fit, with the caveat that there are multiple independent variables involved, each having a linear relationship with the dependent variable [27,28]. The MLR model with k independent variables can be written as:

$$y = m_{1} x_{1} + m_{2} x_{2} + \dots + m_{k} x_{k} + c \tag{3}$$
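For the single-variable case y = mx + c, the slope and intercept can be estimated in closed form. The sketch below assumes ordinary least squares as the fitting criterion, which the text does not state explicitly, and the function names are hypothetical. The multi-variable model would need a matrix solver and is left out.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of slope m and intercept c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx          # slope: covariance divided by variance of x
    c = mean_y - m * mean_x  # intercept: the line passes through the means
    return m, c


def predict_line(m, c, x):
    return m * x + c
```

For points lying exactly on a straight line, `fit_line` recovers that line's slope and intercept; for noisy data it returns the line that minimizes the sum of squared residuals.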

Furthermore, polynomial regression describes y as a function of x that is represented as a polynomial equation where x is raised to the power of n. It is considered a special case of MLR where the model can be expressed as:

$$y = m_{1} x + m_{2} x^{2} + \dots + m_{n} x^{n} + c \tag{4}$$

where n is the polynomial degree [29,30]. A relationship between the outcome and its factors that can be fitted via polynomial regression is described as curvilinear. Finally, non-linear regression describes independent variables that affect the dependent variable in a manner that is not linear or straightforward. The application and study of non-linear regression have been gaining traction because the population growth models of living organisms are often expressed as non-linear equations [31]. This is due to the complex biological processes involving dynamic factors that drive growth. These can be observed in Table 2, which illustrates the equations, parameters, and definitions of growth models for a particular organism [32]. These equations can also be used with other biological growth models to perform non-linear regression, with the caveat that they need to be adjusted to account for the independent variables and their relationships. For instance, Wang et al. [33] used the Gompertz equation modified for microalgal hydrogen production, as shown below:

$$P(t) = P_{\max} \times \exp\left\{ -\exp\left[ R_{\max} \times e \times \frac{L - t}{P_{\max}} + 1 \right] \right\} \tag{5}$$

where P(t) (L H₂/kg microalgae) is the cumulative microalgal hydrogen production at time t, P_max (L H₂/kg microalgae) is the microalgal hydrogen production potential, R_max (L H₂/kg microalgae/d) is the maximum microalgal hydrogen production rate, e is the base of natural logarithms (approximately 2.718), L (d) is the lag time during which the microalgae have not yet begun producing hydrogen under anaerobic conditions, and t (d) is the elapsed dark fermentation time of the experiment [33]. A comparison can be drawn with the Gompertz equation in Table 2, where fundamental components such as the double exponent, the asymptotic growth limit (P_max), and the growth rate (R_max) are retained. The modification lies in the removal of the natural logarithm (ln) and the initial value, in this case the microalgal hydrogen production at t = 0, as it is nonexistent, followed by the addition of new variables and constants such as L and e, respectively. Despite the changes, the curves produced by the individual models look similar, with different variables being described in the process.
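The modified Gompertz model can be evaluated directly to see how its parameters shape the curve. The sketch below is a plain transcription of the equation as defined above; the function name, the argument names, and the parameter values in the usage note are hypothetical and not taken from Wang et al. [33].

```python
import math


def modified_gompertz(t, p_max, r_max, lag):
    """Cumulative hydrogen production P(t) from the modified Gompertz model.

    t     : elapsed fermentation time (d)
    p_max : hydrogen production potential (L H2/kg microalgae)
    r_max : maximum hydrogen production rate (L H2/kg microalgae/d)
    lag   : lag time L (d)
    """
    inner = r_max * math.e * (lag - t) / p_max + 1.0
    return p_max * math.exp(-math.exp(inner))
```

With illustrative values such as `p_max=100.0`, `r_max=20.0`, and `lag=1.0`, the function stays close to zero during the lag phase, rises most steeply shortly afterwards, and approaches `p_max` for large `t`, which is the sigmoidal shape the double exponent produces.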

3. Importance of Machine Learning in Biohydrogen Production

The usage of machine learning (ML) techniques has become more widespread in microalgal hydrogen production and related studies in recent years. Sharma et al. (2022) reported that ML had recently demonstrated great potential as a data-driven method. This is because ML algorithms can handle complex multivariate data, predict non-linear connections, and manage missing data [9]. ML has now become a significant tool in microalgal hydrogen production studies since it is capable of being adopted in various applications, which include studying the relationship between operation parameters and production outputs, classifying factors as being significantly impactful to the overall process, and predicting the produced microalgal hydrogen based on the set initial conditions.

3.1. Relationship Study

An essential step in optimizing microalgal hydrogen production is the modelling of its production system to study how certain parameters influence the overall process. Wang et al. [34] reported that various ANNs were utilized in correlating microalgal hydrogen production with critical operating parameters. In the same article, Multilayer Perceptron ANN (MLPANN) was proposed as a modelling framework to illustrate the kinetics of microalgal hydrogen production from a dark fermentation process. The MLPANN is a type of ANN that contains more than one hidden layer to accommodate the complexity of the system. It was reported that the MLPANN was able to reliably model the metabolites, including microalgal hydrogen, with limited experimental kinetic data [34]. Hosseinzadeh et al. [35] developed multiple ML algorithms, including RF and SVM, to model microalgal hydrogen production from wastewater via a dark fermentation process. The relative importance of the input factors in each algorithm was studied via the permutation variable importance (PVI) procedure, which considers the error of a developed model in predicting the results after a particular input has been randomly permuted [35]. This procedure highlighted the degree of importance of each factor fed into a particular ML algorithm, leading to better clarity on its relationship with the overall production process. The PVI procedure indicated that ethanol was of significant importance as a factor in all of the proposed ML models for microalgal hydrogen production [35,36]. This was justified as ethanol, as a solvent, has bactericidal effects, which may negatively impact microalgal hydrogen production [37]. In anaerobic fermentation, hydrogen is formed by accepting electrons from the process. Ethanol is also capable of being an electron acceptor, implying that hydrogen production is reduced as there are fewer electrons available [38].
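The permutation idea behind PVI can be sketched generically. The version below is an assumption-laden illustration rather than the exact procedure of Hosseinzadeh et al. [35]: it measures importance as the average increase in mean squared error after one input column is shuffled, and the names `permutation_importance` and `n_repeats` are hypothetical.

```python
import random


def mean_squared_error(model, X, y):
    """MSE of a callable model over rows X (lists of inputs) and targets y."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average increase in MSE when each input column is randomly permuted."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between input j and the target
            X_permuted = [row[:j] + [value] + row[j + 1:]
                          for row, value in zip(X, column)]
            increases.append(mean_squared_error(model, X_permuted, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

An input that the model does not rely on yields an importance of zero, because shuffling it leaves the predictions unchanged, while an input that drives the predictions yields a large positive value.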

3.2. Classification of Results

The literature on microalgal hydrogen production has grown extensively over the years as its potential has been realized by researchers. ML algorithms can utilize data from this literature and analyze quantitative correlations between input data and obtained outputs. This is superior to a traditional comparative analysis as it reduces the time required to analyze the data set from each study. An example of this was the development of ANNs integrated with statistical analysis using response surface methodology (RSM) to study the enhancement of microalgal hydrogen production via chemical addition [39]. It was concluded that, in the case of the addition of Fe-based nanoparticles, the nanoparticle size together with the concentration of nanoparticles added was classified as statistically significant to the microalgal hydrogen yield, with the optimal value being approached when the nanoparticle size ranged between 81 and 100 nm. However, the same parameter was classified as statistically insignificant for the hydrogen evolution rate. The explanation given by the authors for this finding was that nanoparticle sizes ranging between 81 and 100 nm were more thermodynamically stable during fermentation than smaller sizes. Monroy and Buitron [40] used the SVM method to diagnose undesired scenarios in microalgal hydrogen production by photo-fermentation. Five classes were set up, each with a different set of optimum values for light intensity and pH, and 250 scenarios with varying operating conditions were classified by the SVM. Diagnosis performances of 100% and 55% were attained for batches where the light intensity and pH values, respectively, deviated from an optimum operating range. The poor pH diagnosis was reportedly due to the photo-fermentation process being highly sensitive to pH changes [40].

3.3. Prediction of Microalgal Hydrogen Production

The most prominent use of ML across the literature on microalgal hydrogen production is to predict the outcome of a particular production system. Outputs such as hydrogen yield and hydrogen evolution rate have been extensively studied to determine their optimal values and the variables required to achieve them. Alalayah et al. [41] developed an ANN model that was able to predict microalgal hydrogen production through a dark fermentation process based on three inputs, namely, initial substrate concentration, initial medium pH, and temperature. The ANN model performed better than a traditional Box-Wilson Design (BWD) statistical model as it provided a higher level of accuracy with fewer errors [41]. Another ANN model was constructed using feed backward propagation in conjunction with a cross-out validation approach, which was able to predict the optimal hydrogen yield (3 mol H₂/mol substrate) based on the optimal composition of glucose (14 g/L) and acetate (1.3 g/L). The MSE value of merely 1.193 suggests that the training outcome of the ANN was good [42]. Sharma et al. [43] developed a novel ML-based optimization approach to predict microalgal hydrogen yield from microalgal biomass based on duration, sulfuric content, and biomass concentration. The validation test for this prediction model indicated an acceptable error of merely 4.52%. This approach was capable of studying multiple factors simultaneously and specifying the point at which the best output of microalgal hydrogen was achieved [9,43]. Finally, Salameh et al. [10] designed a variation of an ANN known as the Adaptive Network Fuzzy Inference System (ANFIS), capable of predicting the optimal microalgal biohydrogen production based on operating parameters, namely, initial pH (9.0), N/C ratio (0.1862), xylose concentration (25 g/L), and operating temperature (36.12 °C). This study highlighted that the optimum value generated for microalgal hydrogen production was 200 mL/L higher than the value attained from ANOVA, demonstrating that ML can perform better in prediction studies [10].
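The studies above are compared mainly through goodness-of-fit figures such as MSE, RMSE, and R². As a reminder of how these are computed, the sketch below uses their standard definitions; the function names are hypothetical, and individual studies may normalize or report the metrics differently.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between observations and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the units of the output."""
    return math.sqrt(mse(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_total = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_residual / ss_total
```

Because MSE and RMSE depend on the units and scale of the output, a value such as 1.193 can only be judged against the range of the measured hydrogen yields, whereas R² is dimensionless.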

4. Comparative Analyses among ML Techniques

For each of the ML techniques mentioned earlier, there are benefits and drawbacks to employing them, depending on the nature of the study being conducted in the field of biohydrogen production. A comparison can be made among the ML techniques in terms of their advantages and disadvantages in determining which scenarios are most suitable for each technique.

The ANN is capable of managing and modelling complex interactions among components of a system. Flexibility is also an additional benefit, which allows for adaptation to new information that may change over time [12]. This makes ANN a strong technique in studies involving microalgal hydrogen, which involve complex interactive processes. Despite these benefits, there are also a few disadvantages. Hossain et al. [44] stated that ANN required a large amount of training data in order to operate, which was time-consuming and costly to provide. Another significant setback is that they are unable to predict outputs based on inputs that are beyond the training data space [44]. This implies that the performance of the ANN is mostly based on the training data provided to the network. Based on these attributes, it can be inferred that ANN is most suitable for studies that already have a lot of training data available, whether from literature or manually attained from research works. ANN has been used in a variety of applications, indicating that it is a versatile technique in microalgal hydrogen research.

The RF algorithms share a similar strength with ANN in that they are able to estimate results that are derived from complex functions of predictors with many interactions. In addition, RF has the distinctive strength of being suitable for multivariate data sets that have a large number of predictors and a small number of observations [45]. This implies that in scenarios where training data is limited, the RF will outperform the ANN in predicting the outputs of a particular system. A common issue suffered by ML techniques is overfitting, where a model fits its training data too closely, making it unable to predict future observations reliably. RF has a built-in safeguard against this phenomenon: the goodness-of-fit of each decision tree in the forest is calculated using the part of the data that the tree has not observed [46,47]. This attribute was highlighted in the work of Hosseinzadeh et al. (2022), where the mean squared error (MSE) attained in the training and validation phases followed a broadly decreasing trend, indicating that there was no overfitting in the constructed RF model [37]. On the other hand, the development of an RF model can be computationally intensive. Furthermore, if the predictors within the data set are correlated, the PVI procedure may be biased [45]. In conclusion, RF algorithms are most suitable in microalgal hydrogen studies that have limited observations, provided that the computational resources are available to execute the ML technique.
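The data that a tree "has not observed" are usually called its out-of-bag points. The short sketch below, with hypothetical function names, shows how they arise from resampling with replacement and estimates what share of the data is left out of a typical bootstrap copy.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n indices with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n, in_bag):
    """Indices that never appear in the bootstrap copy."""
    seen = set(in_bag)
    return [i for i in range(n) if i not in seen]


def mean_oob_fraction(n, n_trees=200, seed=0):
    """Average share of the data left out of each bootstrap copy."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        total += len(out_of_bag(n, bootstrap_indices(n, rng)))
    return total / (n_trees * n)
```

The expected left-out share is (1 - 1/n)^n, which tends to about 37% as n grows, so every tree has a sizeable set of unseen points on which its error can be checked without a separate validation set.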

The SVM is often implemented in microalgal hydrogen studies as a regression model, known as support vector regression (SVR). The working principle of SVR is similar to that of SVM classification, in that it relies on hyperplanes. A major advantage of SVR is that it allows for the setting of tolerable errors in the model [48]. This is achieved using the box constraint variable outlined earlier. This gives users more control over the complexity of the function, which is important as the desired result from SVR may vary depending on the study being conducted. Another strong advantage of SVR is that, by using the appropriate kernel function, it can manage highly complex and unstructured data, even in instances where the number of predictors is greater than the number of observations. The disadvantages of SVR include being more prone to overfitting than other ML techniques [18]. This can be observed in the work of Hossain et al. (2022), where the R² value for models based on SVM developed to model the microalgal hydrogen production from palm oil mill effluents and activated sludge waste ranged between 0.01 and 0.34 [48]. Another setback of SVR is that choosing the wrong kernel function to construct the model can give an inaccurate depiction of the results. Furthermore, training can be time-intensive when using large data sets [49,50]. To summarize, SVMs are more suitable when users need more control over the results in terms of error tolerance and the kernel function used.
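One common way in which SVR expresses a tolerable error is the ε-insensitive loss, where deviations smaller than ε cost nothing and the box constraint C weights the deviations beyond it. The sketch below assumes this standard formulation and a linear model with weight vector `weights`; neither is confirmed as the setup of the studies cited above, and the function names are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-point loss: zero inside the tolerance band, linear outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


def svr_objective(weights, y_true, y_pred, c=1.0, epsilon=0.1):
    """Regularization term plus C times the summed epsilon-insensitive loss."""
    regularization = 0.5 * sum(w * w for w in weights)
    penalty = sum(epsilon_insensitive_loss(y_true, y_pred, epsilon))
    return regularization + c * penalty
```

A larger `c` makes errors outside the band more expensive and pushes the fit toward a more complex function, while a larger `epsilon` widens the band of errors that are ignored altogether.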

The attributes of regression as an ML technique in microalgal hydrogen studies vary depending on the type of regressions being deployed. Regression models are very easy to interpret, allowing for better visualization of the relationships between the variables within the system. Similar to the other ML techniques, most regression models are also prone to overfitting, especially if the independent variables are collinear [18]. An example of a regression model that overcomes this common weakness is the Gaussian Process Regression (GPR), which can accurately evaluate its level of uncertainty. Hossain et al. (2022) reported that models based on GPR being developed to model the microalgal hydrogen production from palm oil mill effluents and activated sludge waste had R² values above 0.9, indicating good modelling performance [48]. Regression models are most suitable in relationship studies between predictors and predictions of microalgal hydrogen production. The advantages and disadvantages of the machine learning techniques discussed are summarized in Table 3.

5. Conclusions

In conclusion, ML presents itself as an essential element in microalgal biohydrogen production. Multiple methods that have been used in the literature were evaluated in terms of their efficacy in fulfilling various applications such as relationship studies, classification of results, and prediction of microalgal hydrogen production. The ML techniques applied to microalgal biohydrogen production have shown potential in illustrating the nonlinear and complex interactions among the variables involved. RFs are very useful when data is limited, while SVMs offer more control over error tolerance for classification scenarios. Regression is effective in relationship studies and prediction, and ANNs offer the most versatility. Indeed, different studies must adopt different approaches to address specific problems. Future studies could look into developing ML techniques that can overcome issues that arise when current methods are employed, such as overfitting and high computational time.

Author Contributions

Conceptualization, M.Z.A.S., A.R. and F.A.; resources, A.R., F.A., J.W.L. and C.S.L.; writing—original draft preparation, M.Z.A.S.; writing—review and editing, A.R., F.A., J.W.L., G.R.M., H.D., R.S., C.-D.H., A.U., D.H.N. and P.V.R.; supervision, J.W.L. and C.-D.H.; funding acquisition, J.W.L. All authors have read and agreed to the published version of the manuscript.

Funding

The financial supports received from the Ministry of Higher Education Malaysia via the Fundamental Research Grant Scheme (FRGS) with the cost center of 015MA0-110 (FRGS/1/2020/TK0/UTP/02/20) and the Murata Science Foundation with the cost center of 015ME0-299 are gratefully acknowledged.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Ameen, F.; Altuner, E.E.; Tiri, R.N.E.; Gulbagca, F.; Aygun, A.; Sen, F.; Majrashi, N.; Orfali, R.; Dragoi, E.N. Highly active iron (II) oxide-zinc oxide nanocomposite synthesized Thymus vulgaris plant as bioreduction catalyst: Characterization, hydrogen evolution and photocatalytic degradation. Int. J. Hydrogen Energy 2022, in press.
2. Limongi, A.R.; Viviano, E.; De Luca, M.; Radice, R.P.; Bianco, G.; Martelli, G. Biohydrogen from microalgae: Production and applications. Appl. Sci. 2021, 11, 1616.
3. Al Husnain, L.; Alajlan, L.; AlKahtani, M.D.; Ameen, F. Avicennia marina endophytic fungi shows antagonism against tomato pathogenic fungi. J. Saudi Soc. Agric. Sci. 2022, in press.
4. Ozbas, E.E.; Aksu, D.; Ongen, A.; Aydin, M.A.; Ozcan, H.K. Hydrogen production via biomass gasification, and modeling by supervised machine learning algorithms. Int. J. Hydrogen Energy 2019, 44, 17260–17268.
5. Sydney, E.B.; Duarte, E.R.; Burgos, W.J.M.; de Carvalho, J.C.; Larroche, C.; Soccol, C.R. Development of short chain fatty acid-based artificial neuron network tools applied to biohydrogen production. Int. J. Hydrogen Energy 2020, 45, 5175–5181.
6. Safarian, S.; Ebrahimi Saryazdi, S.M.; Unnthorsson, R.; Richter, C. Modeling of hydrogen production by applying biomass gasification: Artificial neural network modeling approach. Fermentation 2021, 7, 71.
7. Coşgun, A.; Günay, M.E.; Yıldırım, R. Exploring the critical factors of algal biomass and lipid production for renewable fuel production by machine learning. Renew. Energy 2021, 163, 1299–1317.
8. Bhola, V.; Swalaha, F.M.; Nasr, M.; Bux, F. Fuzzy intelligence for investigating the correlation between growth performance and metabolic yields of a Chlorella sp. exposed to various flue gas schemes. Bioresour. Technol. 2017, 243, 1078–1086.
9. Sharma, P.; Sivaramakrishnaiah, M.; Deepanraj, B.; Saravanan, R.; Reddy, M.V. A novel optimization approach for biohydrogen production using algal biomass. Int. J. Hydrogen Energy 2022, in press.
10. Salameh, T.; Sayed, E.T.; Olabi, A.G.; Hdaib, I.I.; Allan, Y.; Alkasrawi, M.; Abdelkareem, M.A. Adaptive Network Fuzzy Inference System and Particle Swarm Optimization of Biohydrogen Production Process. Fermentation 2022, 8, 483.
11. Subramaniyan, S.B.; Ameen, F.; Zakham, F.A.; Anbazhagan, V. Activity of Lipid Loaded Lectin against co-infection of Candida albicans and Staphylococcus aureus using the Zebrafish model. J. Appl. Microbiol. 2022, 134, lxac050.
12. Maind, S.B.; Wankar, P. Research paper on basic of artificial neural network. Int. J. Recent Innov. Trends Comput. Commun. 2014, 2, 96–100.
13. Singaravelu, D.K.; Binjawhar, D.N.; Ameen, F.; Veerappan, A. Lectin-Fortified Cationic Copper Sulfide Nanoparticles Gain Dual Targeting Capabilities to Treat Carbapenem-Resistant Acinetobacter baumannii Infection. ACS Omega 2022, 7, 43934–43944.
14. Boateng, E.Y.; Otoo, J.; Abaye, D.A. Basic tenets of classification algorithms K-nearest-neighbor, support vector machine, random forest and neural network: A review. J. Data Anal. Inf. Process. 2020, 8, 341–357.
15. Ameen, F.; Al-Homaidan, A.A. Treatment of heavy metal–polluted sewage sludge using biochar amendments and vermistabilization. Environ. Monit. Assess. 2022, 194, 861.
16. Nasr, N.; Hafez, H.; El Naggar, M.H.; Nakhla, G. Application of artificial neural networks for modeling of biohydrogen production. Int. J. Hydrogen Energy 2013, 38, 3189–3195.
17. Alaguprathana, M.; Poonkothai, M.; Ameen, F.; Bhat, S.A.; Mythili, R.; Sudhakar, C. Sodium hydroxide pre-treated Aspergillus flavus biomass for the removal of reactive black 5 and its toxicity evaluation. Environ. Res. 2022, 214, 113859.
18. Jiang, T.; Gradus, J.L.; Rosellini, A.J. Supervised machine learning: A brief primer. Behav. Ther. 2020, 51, 675–687.
19. Almansob, A.; Bahkali, A.H.; Ameen, F. Efficacy of gold nanoparticles against drug-resistant nosocomial fungal pathogens and their extracellular enzymes: Resistance profiling towards established antifungal agents. Nanomaterials 2022, 12, 814.
20. Fang, Y.; Ma, L.; Yao, Z.; Li, W.; You, S. Process optimization of biomass gasification with a Monte Carlo approach and random forest algorithm. Energy Convers. Manag. 2022, 264, 115734.
21. Soundararajan, D.; Natarajan, L.; Trilokesh, C.; Harish, B.S.; Ameen, F.; Islam, M.A.; Uppuluri, K.B.; Anbazhagan, V. Isolation of exopolysaccharide, galactan from marine Vibrio sp. BPM 19 to template the synthesis of antimicrobial platinum nanocomposite. Process Biochem. 2022, 122, 267–274.
22. Hassan, S.; Khurshid, Z.; Bhat, S.A.; Kumar, V.; Ameen, F.; Ganai, B.A. Marine bacteria and omic approaches: A novel and potential repository for bioremediation assessment. J. Appl. Microbiol. 2022, 133, 2299–2313.
23. Xu, Q.; Yin, J. Application of random forest algorithm in physical education. Sci. Program. 2021, 2021, 1996904.
24. Karatzoglou, A.; Meyer, D.; Hornik, K. Support vector machines in R. J. Stat. Softw. 2006, 15, 1–28.
25. Kecman, V. Support vector machines—An introduction. In Support Vector Machines: Theory and Applications; Springer: Berlin/Heidelberg, Germany, 2005; pp. 1–47.
26. Afridi, M.S.; Ali, S.; Salam, A.; César Terra, W.; Hafeez, A.; Ali, B.; AlTami, M.S.; Ameen, F.; Ercisli, S.; Marc, R.A.; et al. Plant Microbiome Engineering: Hopes or Hypes. Biology 2022, 11, 1782.
27. Kadam, V.S.; Kanhere, S.; Mahindrakar, S. Regression techniques in machine learning & applications: A review. Int. J. Res. Appl. Sci. Eng. Technol. 2020, 8, 826–830.
28. Gudeta, K.; Bhagat, A.; Julka, J.M.; Sinha, R.; Verma, R.; Kumar, A.; Kumari, S.; Ameen, F.; Bhat, S.A.; Amarowicz, R.; et al. Vermicompost and Its Derivatives against Phytopathogenic Fungi in the Soil: A Review. Horticulturae 2022, 8, 311.
29. Sharma, S.; Rana, V.S.; Rana, N.; Sharma, U.; Gudeta, K.; Alharbi, K.; Ameen, F.; Bhat, S.A. Effect of Organic Manures on Growth, Yield, Leaf Nutrient Uptake and Soil Properties of Kiwifruit (Actinidia deliciosa Chev.) cv. Allison. Plants 2022, 11, 3354.
30. Maulud, D.; Abdulazeez, A.M. A review on linear regression comprehensive in machine learning. J. Appl. Sci. Technol. Trends 2020, 1, 140–147.
31. Paine, C.T.; Marthews, T.R.; Vogt, D.R.; Purves, D.; Rees, M.; Hector, A.; Turnbull, L.A. How to fit nonlinear plant growth models and calculate growth rates: An update for ecologists. Methods Ecol. Evol. 2012, 3, 245–256.
32. Sripontan, Y.; Chiu, C.; Tanansathaporn, S.; Leasen, K.; Manlong, K. Modeling the Growth of Black Soldier Fly Hermetia illucens (Diptera: Stratiomyidae): An Approach to Evaluate Diet Quality. J. Econ. Entomol. 2020, 113, 742–751.
33. Wang, Q.; Gong, Y.; Liu, S.; Wang, D.; Liu, R.; Zhou, X.; Nghiem, L.D.; Zhao, Y. Free ammonia pretreatment to improve bio-hydrogen production from anaerobic dark fermentation of microalgae. ACS Sustain. Chem. Eng. 2018, 7, 1642–1647.
34. Wang, Y.; Tang, M.; Ling, J.; Wang, Y.; Liu, Y.; Jin, H.; He, J.; Sun, Y. Modelling biohydrogen production using different data driven approaches. Int. J. Hydrogen Energy 2021, 46, 29822–29833.
35. Hosseinzadeh, A.; Zhou, J.L.; Altaee, A.; Li, D. Machine learning modeling and analysis of biohydrogen production from wastewater by dark fermentation process. Bioresour. Technol. 2022, 343, 126111.
36. Ameen, F.; Dawoud, T.M.; Alshehrei, F.; Alsamhary, K.; Almansob, A. Decolorization of acid blue 29, disperse red 1 and congo red by different indigenous fungal strains. Chemosphere 2021, 271, 129532.
37. Wong, Y.M.; Wu, T.Y.; Juan, J.C. A review of sustainable hydrogen production using seed sludge via dark fermentation. Renew. Sustain. Energy Rev. 2014, 34, 471–482.
38. Gibson, G.R.; Macfarlane, G.T.; Cummings, J.H. Sulphate reducing bacteria and hydrogen metabolism in the human large intestine. Gut 1993, 34, 437.
39. Liu, Y.; Liu, J.; He, H.; Yang, S.; Wang, Y.; Hu, J.; Jin, H.; Cui, T.; Yang, G.; Sun, Y. A Review of Enhancement of Biohydrogen Productions by Chemical Addition Using a Supervised Machine Learning Method. Energies 2021, 14, 5916.
40. Monroy, I.; Buitron, G. Diagnosis of undesired scenarios in hydrogen production by photo-fermentation. Water Sci. Technol. 2018, 78, 1652–1657.
41. Alalayah, W.M.; Alhamed, Y.; Al-Zahrani, A.A.; Edris, G.; Al-Turaif, H.A. Merits of utilizing an artificial neural network as a prediction model for bio-hydrogen production. Rev. Chim. 2014, 65, 458–465.
42. Liu, Y.; Min, J.; Feng, X.; He, Y.; Liu, J.; Wang, Y.; He, J.; Do, H.; Sage, V.; Yang, G.; et al. A Review of Biohydrogen Productions from Lignocellulosic Precursor via Dark Fermentation: Perspective on Hydrolysate Composition and Electron-Equivalent Balance. Energies 2020, 13, 2451.
43. Sharma, A.K.; Ghodke, P.K.; Goyal, N.; Nethaji, S.; Chen, W.H. Machine learning technology in biohydrogen production from agriculture waste: Recent advances and future perspectives. Bioresour. Technol. 2022, 364, 128076.
44. Hossain, M.S.; Ong, Z.C.; Ismail, Z.; Noroozi, S.; Khoo, S.Y. Artificial neural networks for vibration based inverse parametric identifications: A review. Appl. Soft Comput. 2017, 52, 203–219.
45. Buskirk, T.D. Surveying the Forests and Sampling the Trees: An overview of Classification and Regression Trees and Random Forests with applications in Survey Research. Surv. Pract. 2018, 11, 1–13.
46. Ameen, F.; Al-Homaidan, A.A.; Al-Sabri, A.; Almansob, A.; AlNAdhari, S. Antioxidant, anti-fungal and cytotoxic effects of silver nanoparticles synthesized using marine fungus Cladosporium halotolerans. Appl. Nanosci. 2023, 13, 623–631.
47. Matsuki, K.; Kuperman, V.; Van Dyke, J.A. The Random Forests statistical technique: An examination of its value for the study of reading. Sci. Stud. Read. 2016, 20, 20–33.
48. Hossain, S.K.S.; Ayodele, B.V.; Ali, S.S.; Cheng, C.K.; Mustapa, S.I. Comparative Analysis of Support Vector Machine Regression and Gaussian Process Regression in Modeling Hydrogen Production from Waste Effluent. Sustainability 2022, 14, 7245.
49. Alshehrei, F.; Ameen, F. Vermicomposting: A management tool to mitigate solid waste. Saudi J. Biol. Sci. 2021, 28, 3284–3293.
50. Çolakoglu, N.; Akkaya, B. Comparison of multi-class classification algorithms on early diagnosis of heart diseases. In y-BIS 2019 Conference Book: Recent Advances in Data Science and Business Analytics, Istanbul, Turkey, 25–28 September 2019; Mimar Sinan Fine Arts University Publications: Istanbul, Turkey, 2019; p. 162.

Figure 1. The neural network structure [14].

Figure 2. The random forest algorithm [23].

Figure 3. The 2-D illustration of the support vector machine algorithm [14].

Table 1. The popular admissible Kernel equations [25].

Kernel Function	Type of Classifier
$K(x, x_{i}) = x^{T} x_{i}$	Linear, dot product
$K(x, x_{i}) = [(x^{T} x_{i}) + 1]^{d}$	Complete polynomial of degree $d$
$K(x, x_{i}) = e^{-\frac{1}{2} [(x - x_{i})^{T} \Sigma^{-1} (x - x_{i})]}$	Gaussian RBF
$K(x, x_{i}) = \tanh [(x^{T} x_{i}) + b]$	Multilayer perceptron
$K(x, x_{i}) = \frac{1}{\sqrt{\| x - x_{i} \|^{2} + \beta}}$	Inverse multiquadric function
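
The kernels in Table 1 can be sketched in a few lines of Python. This is only an illustration and is not taken from [25]: the function names and default parameter values are hypothetical, vectors are plain lists of floats, and the Gaussian RBF assumes an isotropic covariance $\Sigma = \sigma^{2} I$ so that the quadratic form reduces to a squared Euclidean distance.

```python
import math


def dot(x, xi):
    """Inner product x^T x_i of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, xi))


def sq_dist(x, xi):
    """Squared Euclidean distance ||x - x_i||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, xi))


def linear_kernel(x, xi):
    return dot(x, xi)


def polynomial_kernel(x, xi, d=2):
    # Complete polynomial of degree d (d=2 is an arbitrary default).
    return (dot(x, xi) + 1.0) ** d


def gaussian_rbf_kernel(x, xi, sigma=1.0):
    # Assumption: Sigma = sigma^2 * I, so the general quadratic form
    # (x - x_i)^T Sigma^{-1} (x - x_i) becomes ||x - x_i||^2 / sigma^2.
    return math.exp(-0.5 * sq_dist(x, xi) / sigma ** 2)


def mlp_kernel(x, xi, b=0.0):
    # Multilayer perceptron (sigmoid-type) kernel with offset b.
    return math.tanh(dot(x, xi) + b)


def inverse_multiquadric_kernel(x, xi, beta=1.0):
    return 1.0 / math.sqrt(sq_dist(x, xi) + beta)


if __name__ == "__main__":
    x, xi = [1.0, 2.0], [3.0, 4.0]
    for kernel in (linear_kernel, polynomial_kernel, gaussian_rbf_kernel,
                   mlp_kernel, inverse_multiquadric_kernel):
        print(kernel.__name__, kernel(x, xi))
```

The Gaussian RBF and inverse multiquadric kernels depend only on the distance between the two points, so they peak when $x = x_{i}$, whereas the linear, polynomial, and multilayer perceptron kernels depend on the inner product.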

Table 2. The equations, parameters, and definitions of growth models [32].

Model	Equation
Exponential	$W_{t} = W_{0} \exp (rt)$
Power-law	$W_{t} = [W_{0}^{1 - \beta} + rt (1 - \beta)]^{\frac{1}{1 - \beta}}$
Asymptotic non-linear	Logistic: $W_{t} = \frac{W_{\infty}}{1 + [(\frac{W_{\infty}}{W_{0}}) - 1] \exp (- kt)}$
	Gompertz: $W_{t} = W_{\infty} \exp [\ln (\frac{W_{0}}{W_{\infty}}) \exp (- kt)]$
	Brody: $W_{t} = W_{\infty} \{1 + [(\frac{W_{0}}{W_{\infty}}) - 1] \exp (- kt)\}$
	Richards: $W_{t} = \frac{W_{\infty} W_{0}}{[W_{0}^{\delta} + (W_{\infty}^{\delta} - W_{0}^{\delta}) \exp (- kt)]^{\frac{1}{\delta}}}$
Parameter	Definition
$W_{t}$	Body width at time = t
$W_{0}$	Body width at time = 0
$W_{\infty}$	Asymptotic growth limit of body width
$r$	Growth rate
$\beta$	Shape and scaling parameter of power-law model
$k$	Growth constant in asymptotic non-linear models
$\delta$	Shape and scaling parameter of Richards model
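
As a quick check on the growth models in Table 2, the equations can be written directly as Python functions. This is a minimal sketch, not code from [32]: the function and argument names are hypothetical, and the parameter values in the usage lines are arbitrary illustrations rather than fitted values.

```python
import math


def exponential(t, w0, r):
    return w0 * math.exp(r * t)


def power_law(t, w0, r, beta):
    # Requires beta != 1, since the exponent 1 / (1 - beta) is undefined there.
    return (w0 ** (1 - beta) + r * t * (1 - beta)) ** (1 / (1 - beta))


def logistic(t, w0, w_inf, k):
    return w_inf / (1 + (w_inf / w0 - 1) * math.exp(-k * t))


def gompertz(t, w0, w_inf, k):
    return w_inf * math.exp(math.log(w0 / w_inf) * math.exp(-k * t))


def brody(t, w0, w_inf, k):
    return w_inf * (1 + (w0 / w_inf - 1) * math.exp(-k * t))


def richards(t, w0, w_inf, k, delta):
    denom = (w0 ** delta
             + (w_inf ** delta - w0 ** delta) * math.exp(-k * t)) ** (1 / delta)
    return w_inf * w0 / denom


if __name__ == "__main__":
    # Arbitrary illustrative parameters (not taken from any study).
    w0, w_inf, k = 2.0, 20.0, 0.4
    for t in (0.0, 5.0, 50.0):
        print(t, logistic(t, w0, w_inf, k), gompertz(t, w0, w_inf, k),
              brody(t, w0, w_inf, k), richards(t, w0, w_inf, k, delta=0.5))
```

Each asymptotic model returns $W_{0}$ at $t = 0$ and approaches $W_{\infty}$ as $t$ grows, and the Richards model reduces to the logistic model when $\delta = 1$, which gives a simple sanity check on any implementation.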

Table 3. Advantages and disadvantages of the different machine learning techniques.

Machine Learning Technique	Advantages	Disadvantages
Artificial neural network	Flexible; capable of modelling complex interactions	Requires large training data; unable to predict output beyond training data space
Random forest	Suitable for limited data sets; safeguards against overfitting	Computationally intensive; biased PVI for correlated predictors
Support vector machine	Customizable error tolerance; can manage highly unstructured data	Prone to overfitting; time intensive for large data sets
Regression	Easy to interpret and visualize	Prone to overfitting

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ahmad Sobri, M.Z.; Redhwan, A.; Ameen, F.; Lim, J.W.; Liew, C.S.; Mong, G.R.; Daud, H.; Sokkalingam, R.; Ho, C.-D.; Usman, A.; et al. A Review Unveiling Various Machine Learning Algorithms Adopted for Biohydrogen Productions from Microalgae. Fermentation 2023, 9, 243. https://doi.org/10.3390/fermentation9030243

