1. Introduction
The global energy shortage and environmental pollution are becoming increasingly serious issues. Achieving China’s “carbon peak” and “carbon neutrality” objectives requires an effective reduction in CO2 emissions, so exploring renewable and clean energy sources as alternatives to fossil fuels is critical. Biomass is regarded as the sole viable carbon-based renewable alternative to fossil fuels [1]. Studies indicate that, in certain regions, initial trials exploring its integration into the energy mix are underway. In 2021, electricity generated from biomass combustion reached 750 TWh, and the supply of biomass is projected to increase to 100 EJ by 2050 [2,3].
Hydrothermal carbonization (HTC) is an emerging thermochemical conversion technology through which biomass is effectively transformed into hydrochar with an energy density comparable to that of peat and lignite. This process efficiently recycles agricultural waste and is both economical and environmentally friendly [4]. The HTC process is influenced by the complex composition of the biomass material as well as by the hydrothermal reaction conditions, and it involves nonlinear and strongly coupled multivariate relationships. These factors hinder a deep understanding of the hydrothermal carbonization mechanism. Current studies primarily rely on experimental and simulation methods [5,6] for quantitative analysis of the factors affecting the hydrothermal carbonization process. However, such research mainly focuses on one or a few types of biomass materials, so the experimental results lack universality and the findings are challenging to generalize. When modeling with traditional methods, such as computational fluid dynamics, kinetics, and thermodynamics, various simplifying assumptions must inevitably be made about the complex HTC process, and these simplifications may affect the accuracy of the simulation results [7].
Machine learning techniques demonstrate greater potential in predicting unknown relationships than traditional methods and are widely applied to industrial prediction problems. They can make compelling predictions without a precise mathematical relationship between the input and output features, and they have proven to be alternatives to traditional modeling techniques for studying and understanding complex processes [8], demonstrating significant potential in predicting the physicochemical properties of hydrochar.
Rasam et al. [9] pioneered a machine learning model to predict hydrothermal carbonization properties, with the Support Vector Machine (SVM) method showing superior performance over other methods such as Decision Tree (DT), Random Forest (RF), and Multi-Layer Perceptron (MLP). Kardani et al. [10] found XGBoost to be the most accurate method for predicting hydrochar yield from biomass. Li et al. [11,12] employed optimized RF and SVM models to assess biomass hydrochar and pyrochar properties, achieving significant accuracy, and further demonstrated that DNN models excelled in predicting the physicochemical properties of hydrochar from urban waste, with high accuracy for parameters such as HHV and carbon content. Nguyen et al. [13] have conducted similar work, achieving analogous results.
Machine learning methods are key to predicting hydrochar properties, but model optimization remains challenging. In 2022, Mu et al. [14] successfully used Particle Swarm Optimization with an ANN model to study hydrochar, achieving high accuracy and thereby demonstrating the potential of metaheuristic optimization in such research. Li et al. [15] employed Genetic Algorithms (GA) to optimize Artificial Neural Networks (ANN), achieving superior outcomes compared to those obtained through the response surface methodology (RSM).
Metaheuristic optimization techniques, known for their minimal derivation requirements, flexibility, and ability to escape local optima [16], are effective for nonlinear problems and find applications in energy, mechanical, and chemical engineering [17,18,19]. Because the optimization process of metaheuristic algorithms does not depend on gradient information [20], they are widely applied to optimization problems in search of the best parameters. For example, the Dujiangyan irrigation system optimization (DISO) [21] is used to construct a DISO-SVM model [21] to detect the impact of dam displacement on dam operation; Particle Swarm Optimization (PSO) [22] is used to construct PSO-NN [14] and PSO-RF [23] models to predict hydrochar properties; the Grey Wolf Optimizer (GWO) [24] is used to construct a GWO-ELM model [25] for monitoring power quality; the Dandelion Optimizer (DO) [26] is used to improve the efficiency of multilevel inverters [27]; the Jellyfish Search Algorithm (JS) [28] is used to discover unknown parameters in fuel cells [29]; the Young’s Double-Slit Experiment (YDSE) optimizer [30] is used to construct a YDSE-PWM model for predicting dissolved oxygen levels [31]; and the Starling Murmuration Optimizer (SMO) [32] is used to construct an ADA-SMO model [33] for predicting the mechanical strength of concrete. The “No Free Lunch” theorem states that a given optimization algorithm is suited only to certain optimization problems [34]. This principle motivates the creation and improvement of existing optimization algorithms so that they perform better in specific scenarios with specialized difficulties. Researchers have proposed various improved algorithms, such as the Hybrid Whale–Particle Swarm Optimization Algorithm (HWPSO) [35], the adaptive hybrid dandelion optimizer (DETDO) [36], the ameliorated Young’s double-slit experiment optimizer (IYDSE) [37], the enhanced jellyfish search algorithm (EJS) [38], and the efficient hybrid starling murmuration optimizer (DTCSMO) [39].
In this study, the Dujiangyan irrigation system optimization (DISO) [21] is enhanced with nonlinear shrinking factors and a Cauchy mutation mechanism to overcome its tendency to converge to local optima and its poor performance in high-dimensional spaces. The resulting IDISO is used to optimize a predictive model for the physicochemical properties of hydrochar, with the aim of guiding experimental design, reducing time and economic costs, revealing the relationships between the parameters of the hydrochar reaction process, and providing theoretical guidance for the application of hydrochar in fuel chemical engineering.
The main contributions of this paper are as follows:
We improved the Dujiangyan irrigation system optimization (DISO) by introducing nonlinear shrinking factors and the Cauchy mutation mechanism, addressing its tendency to become trapped in local optima and its poor performance in high-dimensional spaces.
We compared the IDISO algorithm with seventeen state-of-the-art optimization algorithms using twenty-nine CEC2017 benchmark functions across three dimensions (30, 50, and 100) and nine engineering problems. Non-parametric tests indicated that the IDISO algorithm showed significant improvements in terms of convergence speed and accuracy.
We developed an IDISO-XGBoost model to predict the physicochemical properties of hydrochar, resulting in a prediction model with high robustness and generalization ability.
Section 2 elaborates on the original Dujiangyan method and its improvements. Section 3 and Section 4 test the algorithm’s performance. Section 5 analyzes the algorithm’s performance. Section 6 discusses the application of the refined algorithm to optimizing the predictive model for hydrochar’s physicochemical properties. Section 7 concludes the manuscript and offers insights for future research directions.
2. Materials and Methods
2.1. Data Source
The dataset used in this study was compiled through the careful collection and organization of previously published data. The input features comprised the operational conditions during hydrothermal carbonization and the elemental and proximate analysis of the biomass itself; the output features were the basic analysis of the resulting hydrochar. A total of 420 data records were used, originating from hydrothermal carbonization research articles on woody biomass, herbaceous biomass, food waste, sewage sludge, and other raw materials, covering the common types of biomass materials.
2.2. DISO
The DISO method is inspired by the Dujiangyan irrigation project. In the DISO algorithm [21], potential solutions of the unknown function are treated as massless, volumeless water droplets in the search space. The algorithm consists of four main steps.
The first step is the initialization phase. The global search phase then simulates the Fish Mouth Dividing Project, and the formula for updating the velocity of the water droplets is as follows:
where the two random coefficients are numbers between 0 and 1; the hydraulic radii of the inner and outer rivers are set at fixed values of 1.2 and 7.2; the hydraulic gradients of the inner and outer rivers are set at fixed values of 1.3 and 0.82; and the comprehensive riverbed roughness is given by the following formula:
where the riverbed roughness is set at a fixed value of 9.435, and the comprehensive roughness depends on the current iteration number, the maximum number of iterations, and the gamma probability density function, whose two parameters are set at 0.85 and 2.5, respectively, in the DISO algorithm. The formula for updating the position of the water droplet is as follows:
where the first term is the position of the best-performing candidate solution in the current iteration and the second is the self-improvement term of the water body, computed as follows:
In this formula, one coefficient is a random integer taking the value 0 or 1, and the mean term represents the average position of the population. After the water body enters the inner river, the water droplet exhibits a spiral motion. At this point, the position update is influenced by centrifugal and lateral pressures, with the corresponding update formula being as follows:
In this formula, the two force terms represent the centrifugal force and lateral pressure experienced by the corresponding solution, computed as follows:
In this formula, the density of the water body in the inner river is set at a fixed value of 1.35; the fluid distribution coefficient is set at 0.46; and the longitudinal average velocity is given by the following formula:
This formula involves the average fitness of the population.
The local development phase simulates the Baopingkou Project, and the formula for updating the velocity of the water droplet is as follows:
In this formula, the two coefficients are random numbers between 0 and 1, and the water-dividing ratio at the bottleneck is set at a fixed value of 0.68. The formula for updating the position of the water droplet is as follows:
In this formula, the first term represents the position of the best-performing candidate solution in the current iteration, and the self-improvement term of the water body is computed as follows:
This formula involves the fitness value of an individual and the position of a randomly selected individual.
The final phase simulates the individual elimination stage of the Feishayan (sand-discharging) spillway. Fitness values are ranked, and the worst-performing individuals are re-initialized, simulating sand discharge. The elimination ratio per iteration is set at 0.23.
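A minimal sketch of this elimination step follows, assuming the worst-ranked droplets are re-initialized uniformly within the search bounds; the array shapes and the uniform re-initialization scheme are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def feishayan_eliminate(pop, fitness, lb, ub, ratio=0.23, rng=None):
    """Re-initialize the worst `ratio` fraction of the population
    (simulating sand discharge at the Feishayan spillway)."""
    rng = np.random.default_rng() if rng is None else rng
    n, dim = pop.shape
    n_out = int(n * ratio)                    # number of droplets eliminated
    worst = np.argsort(fitness)[-n_out:]      # worst fitness values (minimization)
    pop = pop.copy()
    pop[worst] = rng.uniform(lb, ub, size=(n_out, dim))  # fresh random droplets
    return pop

# Toy usage: 10 droplets in a 3-D search space bounded by [-5, 5]
rng = np.random.default_rng(0)
pop = rng.uniform(-5, 5, size=(10, 3))
fit = (pop ** 2).sum(axis=1)                  # sphere fitness, lower is better
new_pop = feishayan_eliminate(pop, fit, -5, 5, ratio=0.23, rng=rng)
```

With 10 droplets and a ratio of 0.23, two droplets (the two with the worst fitness) are replaced each iteration.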
2.3. IDISO
In this section, we introduce the improvements to the original algorithm using the nonlinear shrinking factor and the Cauchy mutation mechanism.
2.3.1. Nonlinear Shrinking Factor
In the DISO algorithm, the position update Equations (3) and (10) depend primarily on the best candidate solution and the positions of other solutions in the population. However, this transfer of information is not well suited to coordinating the relationship between global search and local development. Consequently, a nonlinear shrinking factor is introduced to balance the dynamic equilibrium between the global tracking and local development of individuals, ensuring the convergence of DISO while enhancing its precision. The fractal dimension [40] is introduced as a nonlinear factor in the third term of the position update Formula (5), with the procedure being as follows:
In this formula, the two constants are set at fixed values.
A nonlinear factor in the opposite direction is introduced in the second term of the position update Formula (12), with the formula being as follows:
The updated position update formula is as follows:
In this formula, C(t) nonlinearly decreases as the iteration number t increases, while Q(t) nonlinearly increases correspondingly. During the initial iteration phase, C(t) decays slowly while Q(t) increases gradually, allowing a more extensive movement range of the water body in the first position update and thereby enabling broad exploration of the search space; during the third position update, the water body can carry out detailed development within a localized area. As the number of iterations increases, C(t) decreases rapidly while Q(t) increases quickly, so that the first position update in the later stages of iteration focuses mainly on local, detailed development, while in the third update the water body can move more quickly towards the global optimum through a wider range of motion, allowing a more effective search for the optimal solution.
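Since the exact C(t) and Q(t) expressions are not reproduced above, the following sketch uses a generic nonlinear pair with the qualitative behaviour described (slow early decay of C(t), rapid late decay, with Q(t) mirroring it); the power-law form and exponent are purely illustrative assumptions.

```python
import numpy as np

def shrinking_factors(t, t_max, p=2.0):
    """Illustrative nonlinear pair: C(t) decays from 1 to 0, slowly at
    first and rapidly later (for p > 1); Q(t) = 1 - C(t) mirrors it."""
    c = 1.0 - (t / t_max) ** p
    return c, 1.0 - c

# Evaluate over 100 iterations to inspect the decay profile
C = np.array([shrinking_factors(t, 100)[0] for t in range(101)])
Q = 1.0 - C
```

With p = 2, the drop over the first ten iterations (0.01) is much smaller than over the last ten (0.19), matching the slow-then-fast decay described in the text.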
2.3.2. Cauchy Mutation Mechanism
The movement mechanism of DISO depends mainly on information exchange within the population, so it tends to fall into local optima and cannot escape them. Therefore, a Cauchy mutation mechanism is introduced before the fourth step (individual elimination) to enhance the particles’ ability to escape from local optima. The formula is as follows:
In this formula, the mutated term represents the position of the water body after the Cauchy mutation, the other term is the current position of the water body, and the location and scale parameters of the Cauchy distribution are fixed at 0 and 1. Before individual elimination (in the Feishayan stage), the final position of the water body is determined by comparing the fitness before and after mutation. The flowchart of IDISO is shown in Figure 1.
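A minimal sketch of the Cauchy mutation with greedy selection follows; the step form x' = x + x · Cauchy(0, 1) is a common convention in the metaheuristics literature and is an assumption here, as the paper's exact formula is not reproduced above.

```python
import numpy as np

def cauchy_mutation(x, fitness_fn, rng=None):
    """Perturb position x with a standard Cauchy step and keep the better
    of the two positions (greedy selection before elimination). The
    heavy-tailed Cauchy distribution allows occasional large jumps that
    help escape local optima."""
    rng = np.random.default_rng() if rng is None else rng
    x_new = x + x * rng.standard_cauchy(size=x.shape)  # Cauchy(0, 1) step
    return x_new if fitness_fn(x_new) < fitness_fn(x) else x

# Toy usage on the sphere function (minimization)
rng = np.random.default_rng(1)
x = np.array([2.0, -1.5, 0.5])
sphere = lambda v: float(np.sum(v ** 2))
x_after = cauchy_mutation(x, sphere, rng=rng)
```

The greedy comparison guarantees the mutated population is never worse than before mutation.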
2.4. IDISO-XGBoost
Predicting the elemental characteristics of hydrochar is crucial for scholars to understand its reaction process. It is equally essential for guiding experimental and industrial applications.
XGBoost [41] is a machine learning algorithm based on gradient-boosted trees. It introduces regularization terms to control model complexity and employs gradient boosting for training. Relevant studies indicate that XGBoost exhibits the best performance in predicting structured (tabular) data problems [42].
Unlike manual tuning, which is prone to local optima; grid search, which is time-consuming and labor-intensive; and Bayesian optimization, which has limitations in hyperparameter optimization, metaheuristic methods require no gradient information, have strong global search capabilities, and are fast. Therefore, this study proposes the IDISO-XGBoost model, in which IDISO optimizes the hyperparameters of XGBoost.
The dataset was divided into training, validation, and test sets in an 8:1:1 ratio. The training set was used to train the predictive model, the validation set was used for hyperparameter tuning, and the test set was used to evaluate the final model. The coefficient of determination (R2) served as the evaluation metric for XGBoost and as the fitness value for IDISO; when optimizing the hyperparameters of XGBoost, the IDISO algorithm aimed to maximize this fitness function.
The model optimizes the three hyperparameters that most significantly affect XGBoost’s performance: the learning rate, the number of trees, and the maximum tree depth. IDISO’s search space additionally includes the sample subsampling ratio, for model generalization, and the regularization coefficients, to prevent overfitting, for a total of five hyperparameters. Owing to the low number of search dimensions, the population size of the IDISO algorithm is set to 10, with 20 iterations for optimization.
In order to implement the above algorithms, Python version 3.9.13, NumPy version 1.24.2, SciPy version 1.9.1, XGBoost version 1.7.5, scikit-learn (sklearn) version 1.3.0, and pandas version 1.4.4 were employed.
2.5. Performance Evaluation
In this section, we introduce the evaluation of the algorithm using statistical analysis and non-parametric statistics (sign test) methods.
2.5.1. Statistical Analysis
The average ranking method is used for a preliminary evaluation of IDISO’s performance on the CEC2017 benchmark functions and on well-known engineering problems.
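The average ranking computation can be sketched as follows; the scores below are made-up illustrative numbers (lower fitness is better), not results from the study.

```python
import numpy as np
from scipy.stats import rankdata

# Mean fitness of 3 algorithms (columns) on 4 test functions (rows);
# illustrative values only, lower is better
scores = np.array([
    [1.2,  3.4, 2.0],
    [0.5,  0.9, 0.7],
    [10.0, 8.0, 9.0],
    [2.2,  2.5, 2.4],
])

ranks = rankdata(scores, axis=1)   # rank the algorithms within each function
avg_rank = ranks.mean(axis=0)      # average rank of each algorithm
```

Here the first algorithm attains the best (lowest) average rank, 1.5, despite ranking last on the third function.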
2.5.2. Non-parametric Statistics (Sign Test)
In average ranking tests, an algorithm’s overall average ranking may be skewed by extreme rankings on specific problems to which it is unsuited. Therefore, non-parametric testing methods are used to further verify IDISO’s performance.
The sign test is a popular and straightforward method for evaluating algorithm performance. It compares two algorithms on each problem, counting the number of times one outperforms the other; the algorithm with the greater number of overall victories is considered superior.
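As an illustration, the sign-test decision reduces to a binomial test on the win counts (ties dropped); the counts below are made-up numbers, not the study's results.

```python
from scipy.stats import binomtest

# Wins of algorithm A vs. algorithm B over 29 benchmark problems
# (illustrative counts only, ties excluded)
wins_a, wins_b = 22, 7

# Under the null hypothesis the win probability is 0.5; a one-sided
# binomial test checks whether A wins significantly more often
result = binomtest(wins_a, n=wins_a + wins_b, p=0.5, alternative="greater")
p_value = result.pvalue
```

With 22 wins out of 29, the one-sided p-value is about 0.004, so A would be judged significantly better at the 0.05 level.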
3. CEC2017 Results and Discussion
CEC2017 serves as a benchmark test suite [43] comprising four types of test functions, each focusing on evaluating a different capability of the algorithm; see Table 1 for details. Unimodal functions assess the algorithm’s in-depth exploitation capability in the search space; multimodal functions measure its efficiency in exploring complex search spaces; hybrid functions focus on its ability to balance exploration and exploitation; and composition functions evaluate its overall performance in highly complex search environments.
This section compares the IDISO algorithm with 17 powerful, state-of-the-art optimization algorithms, including DISO; see Table 2 for details.
The comparison is based on performance across the 29 CEC2017 test functions in 30, 50, and 100 dimensions. To ensure the fairness of the experimental results, all algorithms use the same population size (100) and the same number of iterations. The compared algorithms are taken from their original authors’ implementations with their recommended optimal parameters. To minimize the randomness of the experimental outcomes, each algorithm is executed 30 times, and the average fitness value is taken as the evaluation metric.
The comparison results of IDISO with other metaheuristic algorithms are shown in
Table 3.
The Supplementary Data provide a detailed account of the compared algorithms’ performance on the twenty-nine test functions in all three dimensions. The results show that, among the 17 compared optimization algorithms, DISO’s ranking declines from third to fourth place as the dimensionality increases. In the 50- and 100-dimensional cases, DISO’s average performance on the CEC2017 test functions is surpassed by the HHO algorithm, indicating its underperformance in higher dimensions. The improved IDISO algorithm exhibits superior overall average performance across all three test dimensions compared to the other 17 benchmark algorithms. However, the sign test results indicate that, although IDISO’s overall performance is the strongest, it is not superior to the other algorithms on every problem; different optimization algorithms may outperform IDISO on some specific problems, which aligns with the “No Free Lunch” theorem.
As shown in Figure 2, the accuracy on the F1 function is significantly improved in all three dimensions, reflecting that the improvement strategy has enhanced the algorithm’s global search ability in unimodal search spaces, making it more effective at discovering global optima. The accuracy on the F4 function is significantly improved at 100 dimensions: the improvement strategy enhances the algorithm’s exploitation ability in high-dimensional multimodal search spaces and its ability to escape from local optima. The accuracy on the F12 function is also significantly improved, demonstrating the enhanced robustness of the improved algorithm in balancing global exploration and local exploitation; it can effectively find the global optimum within the complex search space of composite functions.
The radar charts of the top four ranked algorithms (IDISO, DISO, GWO, and HHO) across the three dimensions of the CEC2017 test functions are shown in Figure 3. The radar charts intuitively demonstrate that the IDISO algorithm has the smallest shaded area in all three dimensions, indicating the best performance in the test-function rankings. In particular, on high-dimensional multimodal problems the IDISO algorithm exhibits better characteristics than in low-dimensional space, suggesting that the improvement methods effectively enhance DISO’s performance in high-dimensional spaces. Additionally, the radar charts show that the IDISO algorithm has the best stability among all the compared algorithms, indicating that IDISO can be widely applied to various types of problems, especially high-dimensional ones.