1. Introduction
According to the United Nations, the global population reached 8 billion in November 2022. As the world population grows, food security faces serious challenges, and it is closely tied to the continued development and future of humankind [1]. Freshwater accounts for only 2.5% of the total water volume on Earth [2], so ensuring food production under water scarcity is crucial [3]. Traditional agricultural irrigation methods waste a great deal of water because the irrigation volume is not precisely controlled [4]. Water-saving irrigation technology must therefore be vigorously promoted to overcome these shortcomings, so that the water needs of agricultural production are met while water resources are optimally allocated and used.
Drip irrigation, one of these water-saving technologies, is an effective way to address water shortages in arid and water-scarce areas [5]. The emitter is one of the most critical components of a drip irrigation system. Pressurized water passes through a narrow internal flow path whose boundaries make the flow turbulent, so that the pressure energy is fully dissipated by turbulent vortices and the water finally drips into the soil at a small, uniform flow rate [6]. The internal flow path of an emitter is tiny, generally around 0.5–1.2 mm, and structurally complex. Its geometric parameters significantly affect the hydraulic performance of the emitter, which directly determines irrigation uniformity and emitter service life [7]; variations in hydraulic performance are therefore driven largely by the internal geometry of the flow path [8].
The emitter flow index and flow coefficient are important parameters for evaluating emitter performance. The flow index reflects the flow regime inside the emitter and the sensitivity of emitter discharge to pressure changes [9,10]. In [11], orthogonal tests combined with CFD numerical simulation were used to investigate the effects of tooth width, tooth base spacing, and tooth height on the flow index. For a given flow path length and depth, tooth base spacing, tooth height, and tooth width all had significant effects on the flow index, which was positively correlated with tooth base spacing and tooth width and negatively correlated with tooth height. The study in [12] found that the flow path depth was positively correlated with both the flow index and the flow coefficient; the flow index decreased as tooth height increased, and the flow coefficient first increased and then decreased as the tooth angle increased.
Research on the structural parameters of the internal emitter flow path remains relatively limited. Existing studies mainly fabricate emitters with different geometric parameters using molds or rapid prototyping, which makes the experiments complex, lengthy, and expensive [13]. In recent years, rapid advances in computing have made the simulation of complex flows increasingly practical, and computational fluid dynamics (CFD) has received growing attention, with large commercial packages such as FLUENT and CFX (both now part of the ANSYS suite) becoming widely available [14]. The “numerical test” approach lets designers understand the flow behavior and evaluate several design options quickly and economically. It substantially reduces the workload of physical laboratory testing and yields the best design subject to multiple constraints [15]. In [16], CFD was used to study the hydraulic performance of a rectangular labyrinth emitter after teeth were added to its rectangular flow path, together with the resulting changes in the velocity field. In [17], numerical simulation was used to compare the internal flow fields of three emitters at different pressures, and the probability of emitter clogging was found to increase with operating pressure. However, CFD modeling is complex and demands considerable technical expertise. A limited set of CFD results can therefore be used to train a machine learning model that predicts hydraulic characteristic parameters “end to end”, which greatly improves the usability of parameter prediction in practice.
The rapid development of artificial intelligence (AI) algorithms has greatly improved the ability to fit complex nonlinear relationships, and models derived from neural networks and from decision tree theory have become the two mainstream categories of AI algorithms [18]. Deep neural network (DNN) models, represented by deep learning, have become a popular research area because of their high fitting accuracy. However, DNNs usually require long training times and large training datasets, so they are not a suitable choice here [19]. In contrast, decision tree-based methods fit small datasets efficiently and accurately and directly provide feature importance rankings [20]. Building on this, classical decision tree algorithms such as Random Forest, GBDT, and XGBoost have been developed from ensemble learning (EL) theory [21]. CatBoost uses symmetric decision trees as base learners, handles categorical features efficiently, and offers excellent fitting performance and fast training [22]. CatBoost has also been used as the core model for other problems in agricultural water resources [23], demonstrating its strong fitting ability.
Based on practical needs, this study uses a small amount of data simulated with FLUENT as its basis. CatBoost serves as the core fitting algorithm: it is first used to build an extended data set through stochastic simulation and parameter fitting, and then used again to build separate prediction models for the flow index and the flow coefficient. The objectives of this study are: (1) to analyze the effects of different morphological parameters on the flow characteristics of the flow field in the flow path based on CFD simulation results; (2) to analyze the relationship between morphological and hydraulic parameters based on the extended data set; and (3) to evaluate the accuracy of the proposed CatBoost model for flow rate, flow index, and flow coefficient and compare it with that of commonly used models.
2. Materials and Methods
The framework of this study is shown in Figure 1. FLUENT was used to simulate a specific emitter type as a benchmark and obtain its flow rate at different pressures. The resulting small simulation data set was used to fit a relationship between four inputs (flow path depth, width, length, and pressure) and flow rate, giving a predictive model of emitter flow rate. Using this model, depth, width, and length values were selected at random, the flow rate was predicted at each pressure from 1 to 15 m, and regression on these 15 predictions gave the flow index and flow coefficient. This process was repeated 10,000 times to form a data set of critical emitter parameters. From this data set, CatBoost was then used to model the flow index and flow coefficient with depth, width, and length as inputs. Because only a small amount of simulated data was available, fitting accuracy was evaluated with leave-one-out cross-validation (LOOCV), which makes maximal use of the data and reflects the actual accuracy.
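The data-expansion loop described above can be sketched as follows. This is a minimal sketch, not the authors' code: `surrogate_flow` is a hypothetical stand-in for the trained CatBoost flow-rate model (its coefficients are invented), and uniform sampling within the parameter ranges of Section 2.1.4 is an assumption. For each random geometry, the flow is predicted at the 15 pressure levels and the power law $q = k h^{x}$ is fitted by least squares in log–log space.

```python
import math
import random


def surrogate_flow(b, D, L, h):
    """HYPOTHETICAL stand-in for the trained flow-rate model (invented coefficients)."""
    k = 0.5 * b * D * (192.0 / L) ** 0.3
    x = 0.45 + 0.1 * (b - 1.1)
    return k * h ** x


def fit_power_law(heads, flows):
    """Least-squares fit of ln q = ln k + x ln h; returns (k, x)."""
    lx = [math.log(h) for h in heads]
    ly = [math.log(q) for q in flows]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    x = sxy / sxx
    return math.exp(my - x * mx), x


def build_dataset(n_samples=10_000, seed=0):
    """Repeat: sample a geometry, predict q at 15 heads, fit (k, x)."""
    rng = random.Random(seed)
    heads = list(range(1, 16))               # 1-15 m at 1 m intervals
    rows = []
    for _ in range(n_samples):
        b = rng.uniform(0.9, 1.3)            # path width, mm (uniform sampling assumed)
        D = rng.uniform(0.9, 1.3)            # path depth, mm
        L = rng.uniform(128.0, 256.0)        # path length, mm
        flows = [surrogate_flow(b, D, L, h) for h in heads]
        k, x = fit_power_law(heads, flows)
        rows.append((b, D, L, k, x))
    return rows
```

Calling `build_dataset()` returns 10,000 tuples `(b, D, L, k, x)`; in this sketch they would be the training table for the second-stage flow index and flow coefficient models.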
2.1. Method of Physical Simulation and Analysis of the Emitter
2.1.1. Numerical Methods
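The standard k-ε turbulence model was used in this paper. Water flow in the emitter was treated as incompressible with negligible heat exchange, so only the continuity equation, the Navier–Stokes equations, and the turbulence equations were used as the governing equations. For steady flow they read:
Continuity equation:
$$ \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z} = 0 $$
Navier–Stokes equations:
$$ \nabla\cdot(\rho u\,\mathbf{u}) = \nabla\cdot(\mu\nabla u) - \frac{\partial p}{\partial x} + S_u $$
$$ \nabla\cdot(\rho v\,\mathbf{u}) = \nabla\cdot(\mu\nabla v) - \frac{\partial p}{\partial y} + S_v $$
$$ \nabla\cdot(\rho w\,\mathbf{u}) = \nabla\cdot(\mu\nabla w) - \frac{\partial p}{\partial z} + S_w $$
k-ε model:
$$ \frac{\partial(\rho k u_i)}{\partial x_i} = \frac{\partial}{\partial x_j}\left[\left(\mu + \frac{\mu_t}{\sigma_k}\right)\frac{\partial k}{\partial x_j}\right] + G_k - \rho\varepsilon $$
$$ \frac{\partial(\rho \varepsilon u_i)}{\partial x_i} = \frac{\partial}{\partial x_j}\left[\left(\mu + \frac{\mu_t}{\sigma_\varepsilon}\right)\frac{\partial \varepsilon}{\partial x_j}\right] + C_{1\varepsilon}\frac{\varepsilon}{k}G_k - C_{2\varepsilon}\rho\frac{\varepsilon^2}{k} $$
where $\mathbf{u}$ is the velocity vector; $u$, $v$, and $w$ are the velocity components in the $x$, $y$, and $z$ directions; $S_u$, $S_v$, and $S_w$ are generalized source terms; $\rho$ is the water density; $\mu$ is the dynamic viscosity; $p$ is the pressure; $u_i$ is the time-averaged velocity; $\mu_t$ is the turbulent viscosity; $k$ is the turbulent kinetic energy; $\varepsilon$ is the dissipation rate; $G_k$ is the turbulent kinetic energy generation term due to the mean velocity gradient; and $C_{1\varepsilon} = 1.44$, $C_{2\varepsilon} = 1.92$, $\sigma_k = 1.0$, $\sigma_\varepsilon = 1.3$.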
The inlet of the flow channel was set as a pressure inlet over the range 0.01–0.15 MPa, with working pressures taken at 0.01 MPa intervals, giving 15 pressure levels. The outlet was set to atmospheric pressure. The walls were no-slip boundaries, and standard wall functions were used to account for the viscous sublayer.
The finite volume method was used to discretize the governing equations: convection terms were discretized with the first-order upwind scheme and diffusion terms with central differencing. The steady-state solution was obtained with the segregated solver and the SIMPLE algorithm, using a convergence criterion of 0.0001.
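The discretization choices above can be illustrated on a one-dimensional steady convection–diffusion problem. This is a textbook sketch, not the FLUENT setup: the grid, constant velocity, and Dirichlet end values are assumptions, and the tridiagonal system is solved directly with the Thomas algorithm instead of SIMPLE, which is only needed when pressure and velocity are coupled. Convection uses first-order upwind values and diffusion uses central differences; upwinding keeps the solution bounded even when the cell Péclet number exceeds 2, where central differencing of convection would oscillate.

```python
def solve_convection_diffusion(n_cells, length, rho, u, gamma, phi_a, phi_b):
    """1D steady FVM: upwind convection, central diffusion, fixed phi at both ends.

    Assumes u >= 0 (flow in +x). Cell-centred grid; boundary faces sit dx/2 from
    the first/last cell centre, hence the doubled diffusive conductance there.
    """
    if u < 0:
        raise ValueError("this sketch assumes flow in the +x direction")
    dx = length / n_cells
    F = rho * u                 # convective flux per unit area
    Dd = gamma / dx             # diffusive conductance between cell centres
    a_w, a_e, a_p, rhs = [], [], [], []
    for i in range(n_cells):
        if i == 0:                          # west boundary face: phi = phi_a
            aw, ae, ap, b = 0.0, Dd, 3 * Dd + F, (F + 2 * Dd) * phi_a
        elif i == n_cells - 1:              # east boundary face: phi = phi_b
            aw, ae, ap, b = Dd + F, 0.0, 3 * Dd + F, 2 * Dd * phi_b
        else:                               # interior: upwind takes phi from the west
            aw, ae, ap, b = Dd + F, Dd, 2 * Dd + F, 0.0
        a_w.append(aw); a_e.append(ae); a_p.append(ap); rhs.append(b)

    # Thomas algorithm for a_p*phi_i = a_w*phi_{i-1} + a_e*phi_{i+1} + b_i
    c = [0.0] * n_cells
    d = [0.0] * n_cells
    c[0], d[0] = a_e[0] / a_p[0], rhs[0] / a_p[0]
    for i in range(1, n_cells):
        denom = a_p[i] - a_w[i] * c[i - 1]
        c[i] = a_e[i] / denom
        d[i] = (rhs[i] + a_w[i] * d[i - 1]) / denom
    phi = [0.0] * n_cells
    phi[-1] = d[-1]
    for i in range(n_cells - 2, -1, -1):    # back substitution
        phi[i] = d[i] + c[i] * phi[i + 1]
    return phi
```

With `u = 0` the scheme reproduces the exact linear profile; with a strong flow (for example `u = 2.5`, `gamma = 0.1`, five cells, so a cell Péclet number of 5) the upwind solution stays monotone between the boundary values.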
2.1.2. Physical Model Design of the Emitter
The emitter flow path is based on the Minkowski fractal curve, which is generated as follows: (1) as in Figure 2 (n = 0), a straight line of length Lm is selected; (2) as in Figure 2 (n = 1), the line is divided into five equal parts, the first, third, and fifth parts are retained, and the second and fourth parts are each replaced by three straight segments of length Lm/5 joined at 90°; (3) as in Figure 2 (n = 2), each straight segment of length Lm/5 is again divided into five equal parts, the first, third, and fifth parts are retained, and the second and fourth parts are replaced by three straight segments of length Lm/25 joined at 90°. The Minkowski fractal curve obtained after n = 2 iterations is used as one boundary of the flow path. This boundary is offset by the path width b to form the opposite wall, which is then trimmed so that the geometric parameters of the energy dissipation units remain unchanged. The resulting Minkowski fractal flow path is shown in Figure 2b.
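The generation rule above can be sketched as a short recursive refinement of a polyline. This is an illustrative sketch only: the side to which each 90° bump turns (second part to one side, fourth part to the other) is an assumption, since the text only states that the middle parts turn at 90°, and the offset by b and trimming are not modeled. Points are complex numbers, so multiplying by `1j` rotates a segment by 90°. Under the construction as described, every segment becomes nine segments one fifth as long, so the polyline length grows by a factor of 9/5 per iteration.

```python
def minkowski_path(start, end, order):
    """Polyline vertices (complex numbers) of the fractal boundary after `order` iterations."""
    points = [complex(start), complex(end)]
    for _ in range(order):
        refined = [points[0]]
        for a, b in zip(points, points[1:]):
            d = (b - a) / 5          # one fifth of the current segment
            p = d * 1j               # same length, rotated 90 degrees (bump side assumed)
            refined += [
                a + d,                          # part 1 kept
                a + d + p, a + 2 * d + p,       # part 2 -> three segments, bump to one side
                a + 2 * d,
                a + 3 * d,                      # part 3 kept
                a + 3 * d - p, a + 4 * d - p,   # part 4 -> three segments, opposite side
                a + 4 * d,
                b,                              # part 5 kept
            ]
        points = refined
    return points


def path_length(points):
    """Total length of the polyline."""
    return sum(abs(b - a) for a, b in zip(points, points[1:]))
```

For example, `minkowski_path(0, 125, 2)` yields 82 vertices (81 segments of length 5), and its length is 125 × (9/5)² = 405.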
Because of the generation process and fractal characteristics of the curve (Figure 3), once the Euclidean length Lm is fixed, all energy dissipation units of the path share the same structural parameters, such as tooth height and turning angle. Therefore, only three independent factors, path width (b), path depth (D), and path length (L), need to be selected as the structural geometry parameters that control the path dimensions. Thirteen Minkowski fractal flow paths with different structural parameters were designed, and their geometric models were built on the Pro/E platform.
2.1.3. Model Validation
Digital particle image velocimetry (DPIV) is the digital implementation of conventional particle image velocimetry (PIV). In the DPIV system used here, images were acquired with a Kodak MEGA PLUS II camera (1600 × 1200 pixels, 2 megapixels) fitted with a 4× microscope objective (model G10-211, Beijing Daheng Camera Factory), and the light source was a Q-switched Nd:YAG double-pulse laser manufactured by LABEST. Fluorescent polystyrene tracer particles with a density of about 1020 kg/m³ were used at a concentration of 1–2%. Image acquisition, display, and analysis were performed with TSI Insight 3G, with the time interval set to 0.01 s. Taking flow path B3 as an example, Figure 4 shows particle distribution images in a flow path structural unit at two adjacent moments. Image post-processing was performed with the Tecplot software included with Insight 3G.
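The principle behind turning two particle images taken at adjacent moments into velocities can be sketched with a direct cross-correlation of one interrogation window. This is a minimal illustration, not the Insight 3G algorithm (which is not described in the text): it searches integer-pixel shifts only, without FFT acceleration or sub-pixel peak fitting, and the pixel size in the usage example is hypothetical.

```python
def cross_correlation_shift(win_a, win_b, max_shift):
    """Integer (dy, dx) displacement that maximizes the cross-correlation of two windows."""
    rows, cols = len(win_a), len(win_a[0])
    best, best_shift = float("-inf"), (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for y in range(rows):
                yy = y + dy
                if not 0 <= yy < rows:
                    continue
                for x in range(cols):
                    xx = x + dx
                    if 0 <= xx < cols:
                        score += win_a[y][x] * win_b[yy][xx]
            if score > best:                 # keep the correlation peak
                best, best_shift = score, (dy, dx)
    return best_shift


def displacement_to_velocity(shift, pixel_size_m, dt_s):
    """Convert a pixel displacement to (vx, vy) in m/s."""
    dy, dx = shift
    return dx * pixel_size_m / dt_s, dy * pixel_size_m / dt_s
```

With the 0.01 s interval used here and an assumed pixel size of 10 µm, a peak at a shift of 3 pixels in x corresponds to 3 × 10⁻⁵ m / 0.01 s = 3 mm/s.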
2.1.4. Numerical Design of Experiments
Given the characteristics of the fractal flow path, its width, depth, and length are the key parameters. These three parameters were therefore studied in the following ranges: path width within [0.9 mm, 1.3 mm], path depth within [0.9 mm, 1.3 mm], and path length within [128 mm, 256 mm]. Five levels were selected for each parameter; with the duplicated combination counted only once, this resulted in 13 different flow path geometries (Table 1).
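One reading of the 13-run design, consistent with the path labels B1, B2, D1, D2, L1, L2 and the shared reference path E used in Section 2.1.5, is a one-factor-at-a-time layout: each parameter takes its five levels while the other two stay at the reference level, and the reference combination is counted once, giving 3 × 4 + 1 = 13 runs. The sketch below only checks that count; the equally spaced level values and the choice of the middle level as reference are hypothetical placeholders for Table 1.

```python
# HYPOTHETICAL equally spaced levels; the actual values are listed in Table 1.
LEVELS = {
    "b": [0.9, 1.0, 1.1, 1.2, 1.3],     # path width, mm
    "D": [0.9, 1.0, 1.1, 1.2, 1.3],     # path depth, mm
    "L": [128, 160, 192, 224, 256],     # path length, mm
}


def one_factor_design(levels):
    """Vary one factor at a time around the middle level; the shared reference run appears once."""
    centre = {name: vals[len(vals) // 2] for name, vals in levels.items()}
    runs = [dict(centre)]                # reference geometry (assumed to be path E)
    for name, vals in levels.items():
        for v in vals:
            if v != centre[name]:
                run = dict(centre)
                run[name] = v
                runs.append(run)
    return runs
```

`one_factor_design(LEVELS)` returns 13 distinct parameter combinations.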
2.1.5. Analysis Method
Numerical simulations were performed in FLUENT 6.3 for the constructed fractal flow paths with different structural parameters. Visualizing the internal flow field at the micro scale shows how the flow field and hydraulic properties vary and helps explain the flow characteristics and mechanisms of the internal flow. Three groups of representative flow paths, (B1, B2, E), (D1, D2, E), and (L1, L2, E), were selected to compare different path widths, depths, and lengths, and the distribution of internal flow characteristic parameters was visualized at an operating pressure of 0.1 MPa.
As shown in Figure 5, the mid-depth section of the fractal flow path (Z = 0.5D) is taken, and a near-wall line 0.1 mm from the top wall is used to follow the structural changes along the boundary. The centerline of the flow path starts from the midpoint of the inlet width of the flow path unit, as shown in the schematic diagram. Because the Minkowski fractal flow path consists of many energy dissipation units with identical structure, the analysis focuses on a single structural unit. The velocity vector distribution of this unit on the Z = 0.5D section is visualized, and the characteristics of the velocity field are analyzed.
Seven longitudinal sections in the Y-Z plane were taken at equal spacing along the flow direction in the flow path unit, as shown in Figure 6. The velocity contours on these seven sections are used to analyze the velocity distribution in the longitudinal direction.
Turbulence intensity reflects the strength of velocity fluctuations and is an important index of the turbulent motion of water flow, measuring how strongly turbulent the flow field is. It is therefore also used as an analysis index in this paper, and its contour distribution within the flow path structural unit is analyzed.
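In a k-ε solution, turbulence intensity is usually computed from the turbulent kinetic energy. FLUENT defines it as the ratio of the root-mean-square velocity fluctuation to a mean velocity, with the fluctuation estimated as $u' = \sqrt{2k/3}$ under the isotropy assumption of the k-ε model. The sketch below follows that definition; which reference velocity was used for the contours (local mean speed or inlet mean velocity) is not stated in the text and is left as a parameter.

```python
import math


def turbulence_intensity(k, u_ref):
    """Turbulence intensity I = u'/U, with u' = sqrt(2k/3) (isotropic fluctuations).

    k     : turbulent kinetic energy, m^2/s^2
    u_ref : reference mean velocity, m/s (choice of reference is an assumption)
    """
    if u_ref <= 0:
        raise ValueError("reference velocity must be positive")
    return math.sqrt(2.0 * k / 3.0) / u_ref
```

For instance, k = 0.015 m²/s² with a 1 m/s reference velocity gives u' = 0.1 m/s, that is, an intensity of 10%.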
According to the design, the 13 fractal flow paths were simulated to calculate their discharge at 0.01 MPa intervals of operating pressure over the 0.01–0.15 MPa range. The hydraulic performance of each flow path is described by the following equation, and regression of the calculated flow against the operating pressure gives the flow index $x$ and flow coefficient $k$ of each fractal flow path:
$$ q = k h^{x} $$
where $q$ is the flow rate, $h$ is the working pressure, $k$ is the flow coefficient, and $x$ is the flow index.
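Taking logarithms turns the power law into a straight line, so $k$ and $x$ follow from ordinary least squares over the $N$ simulated pressure levels ($N = 15$ here). The derivation below is a standard sketch; the text does not state which regression routine was actually used.
$$ \ln q_j = \ln k + x \ln h_j, \qquad j = 1, \dots, N $$
$$ x = \frac{\sum_{j=1}^{N}\left(\ln h_j - \overline{\ln h}\right)\left(\ln q_j - \overline{\ln q}\right)}{\sum_{j=1}^{N}\left(\ln h_j - \overline{\ln h}\right)^2}, \qquad k = \exp\left(\overline{\ln q} - x\,\overline{\ln h}\right) $$
Differentiating the power law also shows why the flow index measures pressure sensitivity:
$$ \frac{\mathrm{d}q}{q} = x\,\frac{\mathrm{d}h}{h} $$
For example, with $x = 0.5$, the value for fully turbulent orifice-type flow, a 10% rise in pressure increases the flow by about 5%, whereas a laminar flow path ($x \approx 1$) would respond almost proportionally.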
2.2. CatBoost Model
CatBoost is a machine learning framework based on gradient-boosted decision trees that extends and improves the Gradient Boosting Decision Tree (GBDT) algorithm [24]. Unlike traditional GBDT, CatBoost processes categorical features as follows: all samples are first randomly permuted; then, for each sample, the value of a categorical feature is converted to a number by averaging the labels of the samples that precede it in the permutation and share the same category value, with a weighted prior added. This reduces the noise caused by low-frequency categories. In regression problems, the prior is generally computed as the average of the label values. Let $\sigma = (\sigma_1, \sigma_2, \dots, \sigma_n)$ be a random permutation of the samples; then the $k$-th categorical feature value $x_{\sigma_p,k}$ of the $p$-th sample in the permutation is replaced by:
$$ \hat{x}_{\sigma_p,k} = \frac{\sum_{j=1}^{p-1}\left[x_{\sigma_j,k} = x_{\sigma_p,k}\right] Y_{\sigma_j} + a\,P}{\sum_{j=1}^{p-1}\left[x_{\sigma_j,k} = x_{\sigma_p,k}\right] + a} $$
where $[\cdot]$ is the indicator function, $Y_{\sigma_j}$ is the label of sample $\sigma_j$, $a > 0$ is the weight coefficient, and $P$ is the added prior term.
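The ordered target statistic in the equation above can be sketched for a single categorical column. This is an illustration of the formula, not CatBoost's internal code: CatBoost uses several permutations and its own prior settings, whereas here one permutation, a weight `a`, and a prior `P` (by default the mean label, as stated for regression) are assumed.

```python
import random


def ordered_target_statistic(categories, targets, a=1.0, prior=None,
                             permutation=None, seed=0):
    """Encode one categorical column with ordered target statistics.

    Each sample's category becomes (sum of labels of EARLIER samples in the
    permutation with the same category + a * prior) / (their count + a).
    """
    n = len(categories)
    if prior is None:
        prior = sum(targets) / n                 # regression prior: mean label
    if permutation is None:
        permutation = list(range(n))
        random.Random(seed).shuffle(permutation) # random permutation sigma
    sums, counts = {}, {}
    encoded = [0.0] * n
    for idx in permutation:
        c = categories[idx]
        s, m = sums.get(c, 0.0), counts.get(c, 0)
        encoded[idx] = (s + a * prior) / (m + a)   # only preceding samples count
        sums[c] = s + targets[idx]                 # now sample idx becomes "history"
        counts[c] = m + 1
    return encoded
```

For categories `["A", "B", "A", "A"]`, labels `[1, 2, 3, 5]`, `a = 1`, `P = 2`, and the identity permutation, the encoding is `[2.0, 2.0, 1.5, 2.0]`: the first occurrence of each category falls back to the prior, and no sample ever sees its own label, which is what prevents target leakage.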
CatBoost creates new features by combining categorical features to improve prediction performance. When constructing a new tree, CatBoost uses a greedy approach to decide which combinations to consider when choosing split points. No combination is considered at the first split of the tree. At each subsequent split, all combinations already used in the current tree are combined with every categorical feature in the dataset, and the resulting new categorical combinations are dynamically converted into numerical features. The pseudocode for tree construction in CatBoost is shown in Algorithm 1.
Algorithm 1: Building a tree in CatBoost
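The symmetric (oblivious) tree structure that CatBoost builds can be illustrated with a simplified greedy builder: at every depth, one (feature, threshold) split is chosen and shared by all nodes of that level, so a leaf index is just the binary code of the split outcomes. This sketch scores splits by squared error on the residuals being fitted and omits CatBoost's categorical combinations, ordered boosting, and its own split-scoring function; the candidate thresholds are supplied by the caller.

```python
def _sse(leaf_of, targets):
    """Sum of squared deviations from each leaf's mean."""
    sums, sq, cnt = {}, {}, {}
    for leaf, t in zip(leaf_of, targets):
        sums[leaf] = sums.get(leaf, 0.0) + t
        sq[leaf] = sq.get(leaf, 0.0) + t * t
        cnt[leaf] = cnt.get(leaf, 0) + 1
    return sum(sq[l] - sums[l] ** 2 / cnt[l] for l in cnt)


def _leaf_means(leaf_of, targets, n_leaves):
    sums, cnt = [0.0] * n_leaves, [0] * n_leaves
    for leaf, t in zip(leaf_of, targets):
        sums[leaf] += t
        cnt[leaf] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, cnt)]


def build_oblivious_tree(X, residuals, depth, candidate_splits):
    """Greedy symmetric tree: the same (feature, threshold) is used across a whole level."""
    n = len(X)
    leaf_of = [0] * n                    # leaf index of each sample, built bit by bit
    splits = []
    for _ in range(depth):
        best = None
        for f, thr in candidate_splits:
            new_leaf = [2 * leaf_of[i] + (X[i][f] > thr) for i in range(n)]
            score = _sse(new_leaf, residuals)
            if best is None or score < best[0]:
                best = (score, f, thr, new_leaf)
        _, f, thr, leaf_of = best
        splits.append((f, thr))
    return splits, _leaf_means(leaf_of, residuals, 2 ** depth)


def predict_oblivious(splits, leaf_values, x):
    leaf = 0
    for f, thr in splits:                # one comparison per level, same for every path
        leaf = 2 * leaf + (x[f] > thr)
    return leaf_values[leaf]
```

Because every level applies the same test, prediction needs only `depth` comparisons and no branching on node identity, which is one reason symmetric trees evaluate quickly.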
In addition, CatBoost replaces the gradient estimation of traditional GBDT with ordered boosting, which reduces the bias of the gradient estimates and improves the generalization ability of the model. To obtain unbiased gradient estimates, CatBoost conceptually trains a separate model $M_i$ for each sample $x_i$, where $M_i$ is obtained from a training set that does not contain $x_i$ (the samples preceding $x_i$ in a random permutation). The gradient estimate for $x_i$ is then obtained from $M_i$, and these gradients are used to train the base learner and obtain the final model. The pseudocode of the ordered boosting algorithm is shown in Algorithm 2.
Algorithm 2: Ordered boosting
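The idea of ordered boosting can be sketched in its conceptual form, which keeps one model per prefix of a random permutation and therefore costs quadratic time. This is a teaching sketch under stated assumptions, not CatBoost's production algorithm: the base learner is a hypothetical one-feature regression stump, and `n_iter` and `lr` are illustrative values. The key property is that the residual of the sample at position p is always computed by `models[p]`, which has never seen that sample.

```python
import random


def fit_stump(xs, rs):
    """Regression stump on a 1-D feature: best threshold by squared error."""
    mean = sum(rs) / len(rs)
    best = (sum((r - mean) ** 2 for r in rs), None, mean, mean)   # no-split fallback
    for thr in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, rs) if x <= thr]
        right = [r for x, r in zip(xs, rs) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if thr is None or x <= thr else rm


def ordered_boosting(xs, ys, n_iter=10, lr=0.5, seed=0):
    """Conceptual ordered boosting with stumps; returns the model trained on all samples."""
    n = len(xs)
    sigma = list(range(n))
    random.Random(seed).shuffle(sigma)
    xs = [xs[i] for i in sigma]           # position p now equals the permutation rank
    ys = [ys[i] for i in sigma]
    models = [[] for _ in range(n + 1)]   # models[i] only ever sees samples 0..i-1

    def predict(model, x):
        return sum(lr * h(x) for h in model)

    for _ in range(n_iter):
        # unbiased residuals: sample p is scored by a model that never saw it
        residuals = [ys[p] - predict(models[p], xs[p]) for p in range(n)]
        for i in range(1, n + 1):
            models[i].append(fit_stump(xs[:i], residuals[:i]))
    final = models[n]
    return lambda x: predict(final, x)
```

In practice CatBoost avoids the n separate models by maintaining a small number of supporting models at logarithmically spaced prefix lengths; the quadratic form above only makes the leakage-avoidance logic explicit.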
In this study, CatBoost was configured with its default hyperparameter values (Table 2) to facilitate model evaluation and subsequent application.
2.3. Cross-Validation and Evaluation Criteria
K-fold cross-validation is an essential method in machine learning. The data set is randomly divided into k subsets; each subset is used once as the test set while the remaining k − 1 subsets form the training set, so k iterations are required for each round of training. Because the CFD simulations provide only a small amount of data, this study uses leave-one-out cross-validation (LOOCV) to evaluate the generalization accuracy of the model [25]. LOOCV is a special case of k-fold cross-validation in which k equals the sample size n: each datum in turn forms the only element of the test set, the remaining n − 1 data points form the training set, and n iterations are required for each round of training. The model hyperparameters and generalization ability are evaluated from the average error over all iterations (Figure 7). Because every datum serves individually as the test set, LOOCV does not depend on how the data are partitioned into training and test sets, makes full use of the data, helps detect overfitting, and gives a realistic estimate of the model's generalization ability. The combined error $E$ can be expressed as:
$$ E = \frac{1}{n}\sum_{i=1}^{n} E_i $$
where $E_i$ is the error of the model evaluated on the $i$-th held-out datum.
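The LOOCV procedure and the averaged error $E$ can be sketched with any learner that exposes a fit step and a prediction. In this sketch the learner is passed in as a `fit` callable; `fit_mean` is a hypothetical baseline used only to make the example concrete, and in the study a CatBoost model would take its place.

```python
def loocv_error(X, y, fit, error):
    """Leave-one-out CV: E = (1/n) * sum of the held-out errors E_i."""
    n = len(X)
    total = 0.0
    for i in range(n):
        X_train = X[:i] + X[i + 1:]          # all data except sample i
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)        # retrain from scratch on n - 1 samples
        total += error(y[i], model(X[i]))    # E_i on the single held-out sample
    return total / n


def fit_mean(X_train, y_train):
    """HYPOTHETICAL baseline learner: always predicts the training mean."""
    m = sum(y_train) / len(y_train)
    return lambda x: m


def squared_error(t, p):
    return (t - p) ** 2
```

For labels `[1, 2, 3]`, the held-out squared errors of the mean predictor are 2.25, 0, and 2.25, so `loocv_error` returns 1.5. Note that the model is refitted n times, which is affordable here only because the CFD data set is small.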
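Five evaluation measures were selected to assess the performance of the CatBoost model.
The mean absolute error (MAE):
$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| $$
The mean squared error (MSE):
$$ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2 $$
The root mean squared error (RMSE):
$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} $$
The mean absolute percentage error (MAPE):
$$ \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right| $$
The coefficient of determination (R²):
$$ R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} $$
In the above formulas, $\hat{y}_i$ is the predicted value, $y_i$ is the true value, $\bar{y}$ is the mean of the true values, and $n$ is the number of samples. MAE reflects the actual magnitude of the prediction errors. MSE is the mean of the squared differences between predicted and observed values and reflects the spread of the errors; the smaller the MSE, the more accurate the prediction model. RMSE is the square root of MSE and has the same units as the predicted quantity. MAPE normalizes the error at each point by the true value, so errors are expressed relative to the size of the target. R² is dimensionless, so it is unaffected by the units of the predicted quantity.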
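The five measures can be computed together as in the following sketch, which implements the formulas above directly (R² in its 1 − SS_res/SS_tot form, MAPE in percent). It assumes no true value is zero, since MAPE divides by $y_i$.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, MAPE (%) and R^2 for paired true/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n  # needs t != 0
    mean = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}
```

For true values `[1, 2, 3, 4]` and predictions `[1, 2, 3, 5]`, this gives MAE = 0.25, MSE = 0.25, RMSE = 0.5, MAPE = 6.25%, and R² = 0.8.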
2.4. Model Training
The experiments were run on a graphics workstation with an Intel(R) Core(TM) i9-10980XE CPU @ 3.00 GHz, an NVIDIA GeForce RTX 3090 GPU, and 64 GB of RAM. Models were trained on the Anaconda platform with Spyder as the integrated development environment, using CatBoost 0.25.1 on Python 3.7. Because CatBoost can readily use the GPU for computation, this hardware configuration further improves training efficiency.
2.5. CFD Simulation Results
FLUENT was used for the numerical simulation, with the k-ε model and the SIMPLE algorithm used to solve the flow field. The flow path inlet was set as a pressure inlet, with inlet pressures of 1–15 m at 1 m intervals (15 levels in total). The outlet was set to atmospheric pressure, and the walls were treated with the wall function. The simulated flow values are shown in Table 3.
4. Discussion
As one of the most efficient irrigation technologies, drip irrigation can achieve a water-saving efficiency of more than 90% and has been widely used worldwide [17]. The emitter flow path is small, generally around 0.5–1.2 mm, and variations in its geometry directly affect the hydraulic performance of the emitter, with significant consequences for its anti-clogging performance, irrigation uniformity, and service life [6]. The relationship between flow path structural parameters and hydraulic properties has received some attention [15,16]. However, the underlying cause of changes in hydraulic properties is that the flow path structural parameters alter the characteristic parameters of the internal flow field, and studies linking the flow path structural parameters, internal flow characteristics, and hydraulic characteristics of emitters are rare.
Previous studies of hydraulic performance have mainly used the flow index and flow coefficient as evaluation indices. Regarding the effect of structural parameters on the flow index, many studies have shown that flow path geometry has a pronounced effect. Ref. [32] studied the relationship between hydraulic performance and flow path length for inlaid-sheet emitters and found that the flow index was correlated with flow path length. Ref. [33] performed an ANOVA on triangular loop flow paths and found that the flow path width had a significant effect on the flow index. In [34], flow path width, turning angle, height, upper-base width, and offset were selected as key parameters for ANOVA and range analysis of the hydraulic performance of a toothed labyrinth flow path, and the upper-base width and turning angle were found to have significant effects on the flow index. This is consistent with the results shown in Figure 11 of this study: the correlation coefficient between the flow index and b reaches 0.6114, a significant positive correlation, while that with L is −0.2514, a weaker negative correlation. In addition, the average importance score of b in the flow index model is 55.97, showing that b strongly affects flow index prediction (Figure 15).
As a scale factor of the emitter, the flow coefficient dominates the flow rate: at a given pressure and flow index, the flow rate is proportional to it. Studies on flow path geometry have consistently concluded that structural parameters significantly affect the flow coefficient, which agrees with the results of this study (Figure 11). For the inlaid-sheet flow path, [32] concluded that the flow coefficient decreases as the flow path length increases. Ref. [33] concluded that the unit height, inlet size, flow path height, and depth all significantly affect the flow coefficient of the triangular loop flow path. Ref. [34] concluded that the depth, width, inlet size, and unit height of the labyrinth flow path all significantly affect the flow coefficient. Ref. [13] concluded that the flow path geometry significantly affects the flow coefficient of the patch emitter: the flow coefficient changes markedly with the cross-sectional area and decreases as the number of units increases. This study likewise shows that the flow coefficient is strongly influenced by the flow path geometry and that the width, depth, and length of the flow path are all positively correlated with the flow coefficient (Figure 17). The model predicting the flow coefficient from these geometric parameters achieves an MAE of only 0.0261 (Figure 16) with R² = 0.9946. In addition, b had the highest importance score (45.2) in the flow coefficient prediction model, showing a strong influence on the flow coefficient (Figure 17).
The CatBoost model fitted the data very well, with R² greater than 0.99 for flow rate, flow index, and flow coefficient. Regarding the differences in accuracy among algorithms, ensemble learning algorithms extract features and fit trends better by integrating the results of many weak learners, a result similar to other studies using CatBoost [28]. In addition, the XGBoost model achieved lower prediction accuracy than CatBoost with default parameters, and the studies of [22,23,34] also confirm the advantages of the CatBoost model.
5. Conclusions
In this study, a small number of flow path geometries were simulated in FLUENT to analyze their influence on flow rate, and a CatBoost flow rate prediction model was built from the simulation results. The results show that the geometric structure is significantly correlated with the flow index and flow coefficient, which provides the basis for the prediction models. CatBoost fitted the complex nonlinear relationships between the parameters well, achieving high accuracy for flow rate (R² = 0.9987), flow index (R² = 0.9961), and flow coefficient (R² = 0.9946); b had the highest importance score in both the flow index model (score = 55.97) and the flow coefficient model (score = 45.2). Furthermore, the CatBoost models achieved the best prediction results among the compared models (XGBoost, Bagging, Random Forest, Tree, AdaBoost, and KNN). This study can provide reliable and efficient technical support for agricultural production, helping to improve production efficiency and reduce water waste.