A Hybrid Prediction Model for CatBoost Tomato Transpiration Rate Based on Feature Extraction

Tong, Zhaoyang; Zhang, Shirui; Yu, Jingxin; Zhang, Xiaolong; Wang, Baijuan; Zheng, Wengang

doi:10.3390/agronomy13092371


by Zhaoyang Tong ¹,², Shirui Zhang ², Jingxin Yu ³, Xiaolong Zhang ⁴, Baijuan Wang ⁵,* and Wengang Zheng ³,*

¹ School of Agricultural Engineering, Jiangsu University, Zhenjiang 212013, China

² National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China

³ National Engineering Research Center for Intelligent Equipment in Agriculture, Beijing 100097, China

⁴ Beijing Academy of Artificial Intelligence, Beijing 100084, China

⁵ College of Tea Science, Yunnan Agricultural University, Kunming 650201, China

* Authors to whom correspondence should be addressed.

Agronomy 2023, 13(9), 2371; https://doi.org/10.3390/agronomy13092371

Submission received: 25 July 2023 / Revised: 8 September 2023 / Accepted: 11 September 2023 / Published: 12 September 2023

(This article belongs to the Special Issue Improving Irrigation Management Practices for Agricultural Production)


Abstract:

The growth and yield of crops are highly dependent on irrigation. Implementing irrigation plans that are tailored to the specific water requirements of crops can enhance crop yield and improve the quality of tomatoes. The mastery and prediction of transpiration rate (T_r) is of great significance for greenhouse crop water management. However, due to the influence of multiple environmental factors and the mutual coupling between environmental factors, it is challenging to construct accurate prediction models. This study focuses on greenhouse tomatoes and proposes a data-driven model configuration based on the Competitive adaptive reweighted sampling (CARS) algorithm, using greenhouse environmental sensors that collect six parameters, such as air temperature, relative humidity, solar radiation, substrate temperature, light intensity, and CO₂ concentration. In response to the differences in crop transpiration changes at different growth stages and time stages, the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm was used to identify three characteristic intervals: florescence stage, fruiting stage daytime, and fruiting stage night-time. Based on this, a greenhouse tomato T_r prediction model (CARS-CatBoost model) based on the CatBoost machine learning algorithm was constructed. The experimental verification shows that the coefficient of determination (R²) of the constructed CARS-CatBoost single model for the whole growth stage is 0.92, which is higher than the prediction accuracy of the traditional single crop coefficient model (R² = 0.54). Among them, the prediction accuracy at night during the fruiting stage is the highest, and the Root Mean Square Error (RMSE) drops to 0.427 g·m⁻²·h⁻¹. This study provides an intelligent prediction method based on the zonal modeling of crop growth characteristics, which can be used to support precise irrigation regulation of greenhouse tomatoes.

Keywords: tomato; transpiration rate; CatBoost; CARS

1. Introduction

As the global population continues to increase and climate change intensifies, sufficient and stable food supplies are facing enormous challenges [1]. Greenhouse planting is a new type of agricultural production method that can isolate adverse external environmental conditions and achieve year-round crop production. However, it also cuts off rainfall, making artificial irrigation the only source of water supplement for greenhouse crops. The transpiration intensity of tomato is closely related to the amount of irrigation, and many studies have found that ETc decreases with decreasing irrigation [2]. Therefore, reasonable irrigation regulation according to changes in the greenhouse environment and crop growth stage is an important means to ensure the growth of greenhouse crops [3]. Among them, the accurate and quick prediction of the transpiration rate (T_r) of greenhouse crops is the key to making irrigation regulation possible. By establishing a T_r prediction model, we can better understand the water use and growth laws of greenhouse crops, provide a decision-making basis and technical support for scientific irrigation [4], and then improve the water-use efficiency of greenhouse crops.

At present, the traditional crop transpiration calculation method uses the crop coefficient model proposed by FAO-56, which has been widely used in greenhouse crops such as tomato, eggplant, and lettuce [5,6,7,8]. However, in addition to meteorological factors such as air temperature, relative humidity, and solar radiation, the crop coefficient model requires estimates of parameters that are difficult to obtain, such as canopy resistance and aerodynamic resistance, which limits its wide application. In addition, the crop coefficient (Kc), an important parameter in the calculation formula, is usually set from empirical values and is affected by the climatic environment and soil characteristics, so it introduces significant error in practical applications. Some studies have shown that the mean squared error (MSE) of Kc over the entire growth stage of tomatoes can reach a maximum of 11.9–71.4% [9]. With the increasing scarcity of water resources and the development of precision irrigation technology, higher requirements have been put forward for real-time irrigation regulation. Analyzing and predicting changes in transpiration water consumption closer to real time, that is, estimating T_r at a higher frequency, is therefore more meaningful for on-demand irrigation regulation. However, traditional calculation methods lose accuracy and goodness of fit when applied to the nonlinear, rapidly changing behavior of T_r.

Machine learning (ML) is a class of methods that uses data and algorithms to achieve automated learning and reasoning. Its rise provides new possibilities for predicting crop T_r [10,11]. These algorithms can better capture hidden patterns and laws in data, adapt to different environments and crop conditions, and improve the accuracy and efficiency of predictions [12]. Tunalı et al. [13] used artificial neural networks (ANNs) to estimate the actual crop evapotranspiration (ETc) of tomatoes in soilless cultivation systems, compared them with the traditional “two-step” method based on reference evapotranspiration (ET_o) and Kc, and found that the accuracy of the ANN model for site-specific ETc prediction in soilless cultivation was 30% higher than that of traditional methods. Nam et al. [14] used ANNs to estimate T_r and found that the annual estimated RMSE of T_r was 0.08–0.10 g·m⁻²·min⁻¹, which is clearly better than the accuracy of traditional estimation methods. However, there are still some challenges in the development of crop transpiration models based on machine learning algorithms. Most of the ANNs used in current research require massive amounts of data to be trained accurately and to avoid over-fitting and under-fitting problems [15]. In practical agricultural production, however, the available data sets are often limited, and it is necessary to explore algorithms and methods that are more suitable for small-sample data sets.

In addition, tomatoes’ T_r is affected by the growth stage and environmental factors, showing nonlinear and complex change characteristics, and the transpiration data changes at different growth stages are quite different [16]. In order to improve the prediction accuracy, this study introduced a clustering algorithm and a feature extraction algorithm to extract the data characteristics during the crop growth stage, divide different feature intervals, and construct corresponding prediction models for each interval [17], and explored a predictive modeling method that considers the crop growth process.

In order to achieve the goal of this study, we took the following steps: (1) Use CARS technology to extract the characteristic variables of environmental variables and determine the best combination of input variables; (2) use the t-SNE algorithm to cluster the data and divide the data intervals with T_r characteristics; and (3) establish a tomato T_r prediction model based on the CatBoost algorithm and verify the feasibility of the model through tomato planting test data.

2. Materials and Methods

The experiment was carried out in a solar-powered greenhouse at the National Precision Agriculture Demonstration Base in Changping District, Beijing (116°27′26.557″ east longitude, 40°11′10.779″ north latitude, 50 m above sea level) from April to July 2022 (Figure 1A). The length of the greenhouse is 60 m, the span is 8 m, and the total area is 480 m², of which the test area measured 22 m × 7 m. The tomato plants were planted in two rows, with 36 plants in each row; the spacing between the plants was 30 cm, and the planting density was 4.6 plants/m². On 28 April, the tomatoes were planted at the “six leaves and one heart” stage. The experiment began when the plants entered the flowering stage on 20 May; the plants entered the fruiting stage on 17 June and were harvested and pulled on 28 July. The plants were grown in coconut bran substrate and irrigated by drip irrigation under film. Irrigation was controlled by setting a radiation accumulation threshold. The radiation accumulation threshold set for each irrigation was 120 kJ·m⁻²·h⁻¹, and when the threshold was reached, the irrigation controller started the water pump and solenoid valve to begin irrigation.
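The radiation-triggered irrigation rule described above can be illustrated with a minimal sketch. The function name `irrigation_events`, the sample readings, and the assumption that the accumulator resets to zero after each irrigation are all hypothetical; the controller logic actually used in the experiment is not given in the text.

```python
def irrigation_events(radiation_readings, threshold=120.0):
    """Return the indices of readings at which irrigation would be triggered.

    radiation_readings: successive radiation increments, in the same unit
    as `threshold` (the text reports a threshold of 120).
    Assumption: the accumulator is reset to zero after each irrigation.
    """
    events = []
    accumulated = 0.0
    for index, value in enumerate(radiation_readings):
        accumulated += value
        if accumulated >= threshold:
            events.append(index)   # pump and solenoid valve would start here
            accumulated = 0.0      # hypothetical reset rule
    return events


# Hypothetical readings: the threshold is reached at positions 2 and 5.
print(irrigation_events([50, 50, 30, 100, 10, 120]))
```

A real controller might carry the excess above the threshold into the next cycle instead of resetting to zero; the sketch uses the simpler rule.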

2.1. Data Collection and Processing

2.1.1. Data Collection

The actual value of tomato T_r was collected by the on-line weighing system of substrate developed by the National Agricultural Intelligent Equipment Engineering and Technology Research Centre. Six tomato plant samples with consistent growth status were selected, and the numerical changes in the weight of the substrate where the tomato plant samples were located and the flow rate of the return liquid were monitored by the substrate on-line weighing system, and the tomato T_r was derived after calculation using Equation (1) [18]:

T_{r} = \frac{I_{REF} + BW_{T2} - BW_{T1} - D_{Liquid}}{(T_{2} - T_{1}) A} / 1000

(1)

In the formula, T_r represents the transpiration rate of the plant (mm·h⁻¹); I_REF represents the amount of irrigation water supplied from T₁ to T₂ as measured by the electronic water meter (g); BW_T1 and BW_T2 represent the weight of the substrate at times T₁ and T₂, respectively (g); D_Liquid represents the accumulated return liquid collected by the return flow meter from T₁ to T₂ (g); and A is the crop leaf coverage area (m²).

The greenhouse sensing detection system includes: a greenhouse environment sensor (Figure 1B) (National Agricultural Intelligent Equipment Engineering Technology Research Center, Beijing, China), which was used to collect parameters such as air temperature (°C), air relative humidity (%), light intensity (µmol·m⁻²·s⁻¹), and CO₂ concentration (ppm) in real time and was set about 20 cm above the crop growth point; a TEROS12 sensor (METER Group, Pullman, WA, USA), which was used to measure substrate temperature (°C), with the probe buried about 5 cm horizontally and 10 cm vertically from the arrow dripper; and a total radiation sensor (Wuhan Hanqin in System Science & Technology Co., Ltd., Wuhan, China), which was used to collect the accumulated light radiation data in the greenhouse and was set 2 m above the ground.

The weight and environmental sensing data were collected every 10 min and uploaded to the agricultural data platform (http://envsys.nxagricloud.com/ (accessed on 15 January 2023)) through the 4G module. We used the last collected data per hour as the parameter value for this hour.
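The hourly downsampling rule (keep the last 10 min reading of each hour) can be sketched as follows. The function name `last_reading_per_hour` and the sample timestamps are hypothetical; the data platform's own export format is not described in the text.

```python
from datetime import datetime


def last_reading_per_hour(records):
    """Map each hour start to the value of the last reading taken in that hour.

    records: iterable of (timestamp, value) pairs, in any order.
    """
    latest = {}
    for timestamp, value in records:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        # Keep the reading with the latest timestamp inside each hour.
        if hour not in latest or timestamp >= latest[hour][0]:
            latest[hour] = (timestamp, value)
    return {hour: value for hour, (_, value) in sorted(latest.items())}


# Hypothetical 10 min readings: six in the 10:00 hour, one in the 11:00 hour.
readings = [(datetime(2022, 6, 1, 10, m), 20.0 + m / 10) for m in range(0, 60, 10)]
readings.append((datetime(2022, 6, 1, 11, 0), 30.0))
print(last_reading_per_hour(readings))
```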

In addition, vapor pressure deficit (VPD, kPa) is also one of the main variables in the construction of the transpiration model [19]. The VPD value is low in the dry summer conditions of greenhouse cultivation, and the difference between day and night is large, which can be used as an environmental parameter in this study. VPD was obtained by the following calculation formula:

VPD = \left(1 - \frac{RH}{100}\right) \times 0.61078 \times e^{\frac{17.27 \times T_{m}}{237.3 + T_{m}}}

(2)

Here, T_m is the average air temperature (°C), RH is the average relative humidity (%), and e is the base of the natural logarithm.
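As a quick check of Equation (2), the following sketch evaluates VPD for a given mean temperature and relative humidity. The function name is an illustrative choice; the example input uses the mean greenhouse conditions reported later in the paper (27.8 °C and 61.8%).

```python
import math


def vapor_pressure_deficit(t_mean_c, rh_mean_pct):
    """VPD in kPa from mean air temperature (°C) and relative humidity (%), Equation (2)."""
    saturation_vp = 0.61078 * math.exp(17.27 * t_mean_c / (237.3 + t_mean_c))
    return (1.0 - rh_mean_pct / 100.0) * saturation_vp


# Mean greenhouse conditions reported in the paper.
print(round(vapor_pressure_deficit(27.8, 61.8), 2))
```

At 100% relative humidity the function returns 0, as expected from the (1 − RH/100) factor.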

2.1.2. Data Processing

Table 1 lists the parameter information required for the experiment. While the IoT sensors were monitoring and transmitting data, issues such as device stability and signal quality resulted in partial data loss, duplication, data imbalance, and inconsistent data types. Abnormal data account for 1.7% of the total dataset. In order to improve the data quality and ensure the training speed and prediction accuracy of the model, we processed the data as follows: (1) partially missing data were filled in by linear interpolation; (2) where a whole block of data was missing, the data for that time period were deleted; and (3) the data were normalized using the following formula so that all variables were on the same scale:

X_{nom} = \frac{X - X_{min}}{X_{max} - X_{min}}

(3)

Among them, X_nom is the normalized value, X is the original data, and X_min and X_max are the minimum and maximum values of the original data, respectively.
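The gap filling in step (1) and the normalization of Equation (3) can be sketched in a few lines. The helper names `fill_linear` and `min_max_normalize` are illustrative, and leaving gaps at the start or end of a series unfilled is an assumption of this sketch.

```python
def fill_linear(values):
    """Fill interior gaps (None) by linear interpolation between known neighbours.

    Gaps at the very start or end of the series are left as None (assumption).
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def min_max_normalize(values):
    """Scale values to [0, 1] following Equation (3)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("cannot normalize a constant series")
    return [(v - lowest) / (highest - lowest) for v in values]


print(fill_linear([1.0, None, None, 4.0]))
print(min_max_normalize([2.0, 4.0, 6.0]))
```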

The descriptive statistical analysis of environmental variables was carried out using box plots, from which the thresholds and abnormal values of each environmental variable can be observed intuitively. It can be seen from Figure 2 that the air temperature and air relative humidity do not show obvious abnormal values; the maximum thresholds of R_n and VPD are 209.47 kJ·m⁻²·h⁻¹ and 5.04 kPa, respectively, and their abnormal values are few and concentrated above the maximum value, at 209.14–297.27 kJ·m⁻²·h⁻¹ and 5.10–6.14 kPa, respectively; the abnormal values of light intensity are 769.60–1464 µmol·m⁻²·s⁻¹, and the data are generally concentrated between the upper quartile and the 90th percentile; and the abnormal values of CO₂ concentration are distributed in the ranges 558.50–774.55 ppm and 223.30–311.65 ppm.

2.2. T_r Prediction Model Construction

This study adopts the CatBoost model as the basic algorithm for the tomato T_r prediction model, and the optimization of input data is the key to model construction. The construction process of the CARS-CatBoost model is shown in Figure 3. This model retains the strong ability of CARS to extract feature variables and the CatBoost model’s ability to produce good classification results without extensive data training, and utilizes the clustering advantage of the t-SNE algorithm for nonlinear variables to improve model accuracy. The structure of the CARS-CatBoost prediction model is shown in Figure 4. The specific steps of building the CARS-CatBoost model can be summarized as follows:

1. By combining continuous tomato T_r and meteorological data, several continuous time series can be converted into a two-dimensional matrix. The gridded data matrix contains tomato T_r and the meteorological variables from left to right, and the time steps from the most distant to the most recent from top to bottom. The gridded time-series data can be represented as:

X_{s, t} = \begin{bmatrix} X_{1, t - n} & X_{2, t - n} & X_{3, t - n} & \dots & X_{s, t - n} \\ X_{1, t - n + 1} & X_{2, t - n + 1} & X_{3, t - n + 1} & \dots & X_{s, t - n + 1} \\ X_{1, t - n + 2} & X_{2, t - n + 2} & X_{3, t - n + 2} & \dots & X_{s, t - n + 2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ X_{1, t} & X_{2, t} & X_{3, t} & \dots & X_{s, t} \end{bmatrix}

(4)

2. Pre-process the data collected and calculated from the various sensors, use the CARS algorithm to gradually retain and eliminate variables, and finally take the data subset with the smallest Root Mean Square Error of Cross Validation (RMSECV) as the optimal combination of variables. In this study, the CARS algorithm was used to filter the environmental data in the training set, with the number of Monte Carlo sampling runs set to 100.
3. Use t-SNE to map the high-dimensional features into two-dimensional space to form clusters, and build a model for each of the clusters formed.
4. The parameters of CatBoost need to be adjusted during model building, so the processed data set was randomly divided into a training set and a test set. The training set is used for parameter adjustment, and the best model parameters are confirmed according to the model evaluation indices. The test set is used to assess the generalization performance of the model; it is kept independent of the parameters obtained from the training set so that the evaluation reflects the robustness of the model.
5. The CARS-CatBoost T_r prediction model was constructed and compared with the single crop coefficient model in predicting tomato T_r over the whole growth stage and within the characteristic intervals identified by t-SNE, and the prediction accuracy of the models was evaluated and discussed.

2.2.1. CARS Variable Selection

The input characteristic variables of the model directly determine the accuracy and computational efficiency of the prediction. In this study, variables with large absolute values of regression coefficients in the PLS model were retained through adaptive reweighted sampling (ARS), and multiple subsets of variables were obtained. Finally, the optimal combination of variables related to T_r was screened out from these subsets by cross-validation. MATLAB 2019b was used as the operating platform for CARS variable selection, the optimal number of latent variables was selected through the Monte Carlo cross-validation method, and the optimal variable subset was selected according to the RMSECV value obtained by PLS cross-validation modeling (Figure 4A). In this way, the variables were compressed, the model structure was simplified, and the model performance was improved [20].

Assume that Y is an m × 1 sample target attribute matrix and X is an m × n sample spectral matrix, where m is the number of samples, n is the number of variables, and α is the combination coefficient; T is the linear combination of X and α, which is the sub-matrix of X; θ is the regression coefficient vector of the PLS model built by Y and T, where β and ε represent the n-dimensional regression coefficient vector and the sample prediction residual, respectively. Assume that Formulas (5) and (6) hold:

T = α X

(5)

Y = θ T + ε = θ α X + ε = β X + ε

(6)

In Formula (6), the regression coefficient vector is β = αθ = [β_1, β_2, …, β_n], and the contribution of the ith variable to Y is represented by the absolute value |β_i| (1 ≤ i ≤ n) of its ith element. The weight w_i is used as the variable preference index to evaluate the importance of each variable, where w_i is the proportion of |β_i| in the total contribution of all variables. The larger the value of w_i, the more important the variable, as shown in Formula (7):

w_{i} = \frac{|β_{i}|}{\sum_{j = 1}^{n} |β_{j}|}

(7)

Each calculation of w_i is in effect an evaluation of the importance of the variables. The variables with larger |β_i| values are retained in each run, and ARS is then used to draw a new variable subset from them. A PLS model is built on this subset and its RMSECV value is calculated. The number of sampling runs is set to N; after N runs, the variable subset with the smallest RMSECV value is taken as the optimal variable subset.
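A single CARS selection step can be sketched as below: compute the weights of Formula (7), keep the variables with the largest weights, and then redraw a subset by weighted sampling with replacement. This is only an illustration under stated assumptions: the coefficient vector `beta` and the retained count `n_keep` are hypothetical inputs, the PLS fit and RMSECV calculation of the full algorithm are omitted, and the study itself ran CARS in MATLAB.

```python
import random


def variable_weights(beta):
    """Weights of Formula (7): each |beta_i| divided by the sum of all |beta_j|."""
    total = sum(abs(b) for b in beta)
    if total == 0:
        raise ValueError("all regression coefficients are zero")
    return [abs(b) / total for b in beta]


def cars_selection_step(beta, n_keep, rng):
    """One selection step: forced reduction to n_keep variables, then ARS.

    Returns the sorted indices of the variables that survive the step.
    """
    weights = variable_weights(beta)
    ranked = sorted(range(len(beta)), key=lambda i: weights[i], reverse=True)
    retained = ranked[:n_keep]
    # Adaptive reweighted sampling: draw with replacement, probability ~ weight.
    drawn = rng.choices(retained, weights=[weights[i] for i in retained], k=n_keep)
    return sorted(set(drawn))


# Hypothetical PLS coefficients for four variables.
rng = random.Random(0)
print(variable_weights([2.0, -1.0, 1.0]))
print(cars_selection_step([0.8, -0.1, 0.0, 0.5], n_keep=2, rng=rng))
```

In the full procedure this step would be repeated N times, with a PLS model refitted on each surviving subset and the subset with the lowest RMSECV kept.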

2.2.2. t-SNE Visual Analysis

Since the transpiration of tomato varied greatly between growth stages during the experiment, regional modeling was considered to improve the prediction accuracy of the model. t-SNE was used to reduce the test data set to 2D or 3D space; the data points in the t-SNE graph were colored by cluster, and the distance between points was used to visually display the similarity and difference between samples and to distinguish the differences in T_r between tomato growth stages (Figure 4A).

The specific steps of the t-SNE algorithm are as follows: Given a set X = {x_1, x_2, …, x_N} containing N sample points, for any two samples i and j, the algorithm expresses the similarity between the samples as the probability p_ij, given by Formula (8):

p_{i j} = \frac{p_{i | j} + p_{j | i}}{2 N}

(8)

The conditional probability p_j|i between sample points is defined as Formula (9); p_i|j is obtained by exchanging i and j:

p_{j | i} = \frac{e^{- ‖x_{i} - x_{j}‖^{2} / 2 σ_{i}^{2}}}{\sum_{k \neq i} e^{- ‖x_{i} - x_{k}‖^{2} / 2 σ_{i}^{2}}}

(9)

σ_i is the standard deviation of the Gaussian distribution centered on sample x_i. The sample set Y = {y_1, y_2, …, y_N} obtained after t-SNE dimension reduction is the mapping from the high-dimensional space X to the low-dimensional space Y, and the similarity q_ij between sample points in Y can be expressed as Formula (10):

q_{i j} = \frac{(1 + ‖y_{i} - y_{j}‖^{2})^{- 1}}{\sum_{k \neq l} (1 + ‖y_{k} - y_{l}‖^{2})^{- 1}}

(10)

The final optimization goal of the t-SNE algorithm is the KL divergence, expressed as Formula (11):

C = KL(P \| Q) = \sum_{i} \sum_{j} p_{i j} \log \frac{p_{i j}}{q_{i j}}

(11)

Generally, p_ii and q_ii are set to 0. Since minimizing the KL divergence is a non-convex optimization problem, we can use stochastic gradient descent to solve it. The gradient of the KL divergence is given by Formula (12):

\frac{\partial C}{\partial y_{i}} = 4 \sum_{j \neq i} (p_{i j} - q_{i j}) (y_{i} - y_{j}) (1 + ‖y_{i} - y_{j}‖^{2})^{- 1}

(12)
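To make Formulas (10) and (11) concrete, the following sketch computes the low-dimensional similarities q_ij for a handful of embedded points and the KL divergence between two similarity matrices. The function names and the three sample points are illustrative; a practical analysis would rely on an existing t-SNE implementation, not this hand-written version.

```python
import math


def low_dim_affinities(points):
    """q_ij of Formula (10): Student-t kernel normalised over all pairs k != l."""
    n = len(points)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal stays 0, matching q_ii = 0
                dist_sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                kernel[i][j] = 1.0 / (1.0 + dist_sq)
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


def kl_divergence(p, q):
    """Cost C of Formula (11); terms with p_ij = 0 contribute nothing."""
    cost = 0.0
    for p_row, q_row in zip(p, q):
        for p_ij, q_ij in zip(p_row, q_row):
            if p_ij > 0:
                cost += p_ij * math.log(p_ij / q_ij)
    return cost


# Three hypothetical points in a 2D embedding.
q = low_dim_affinities([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
print(q[0][1])              # similarity between the first two points
print(kl_divergence(q, q))  # identical distributions give a cost of 0
```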

2.2.3. Categorical Gradient Boosting Model (CatBoost)

Considering the complexity of T_r changes and the small size of the driving data, a decision tree-based machine learning model, CatBoost, was established. Thanks to its gradient boosting technique, CatBoost computes quickly and overfits less than other algorithms, and it can use less historical data to learn the relationship between crop T_r and other variables. By using the same split criterion on each node, the created tree is symmetrical and balanced. CatBoost also introduces an algorithm called ordered boosting [21] (shown in Algorithm 1). For the input dataset D = {(x_k, y_k)}_{k = 1, …, n}, random permutations are performed, and the average label value of the samples that share the same category value is calculated (Figure 4B). Finally, all categorical features are replaced using the following formula:

\hat{x}_{k}^{i} = \frac{\sum_{j = 1}^{n} [x_{j}^{i} = x_{k}^{i}] \cdot y_{j} + β \cdot P}{\sum_{j = 1}^{n} [x_{j}^{i} = x_{k}^{i}] + β}

(13)

Among them, the parameter β > 0 is the prior weight, which can suppress low-frequency category noise, and P is the prior value; y_k ∈ R is the target, and x_k = (x_k^1, …, x_k^m) is the feature vector. The main parameters of the CatBoost model used in this paper are shown in Table 2.
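The target statistic of Equation (13) can be illustrated with a short sketch. It follows the ordered variant, in which each sample is encoded from the samples that precede it in the given order; this restriction to preceding samples is an assumption added here, since the formula as printed sums over all n samples. The function name, the category labels, and the prior values are hypothetical.

```python
def ordered_target_statistic(categories, targets, prior, prior_weight=1.0):
    """Encode a categorical feature with a smoothed mean of earlier targets.

    categories, targets: samples in permutation order.
    prior: the prior value P; prior_weight: the prior weight (beta > 0).
    """
    target_sums = {}
    counts = {}
    encoded = []
    for category, target in zip(categories, targets):
        seen_sum = target_sums.get(category, 0.0)
        seen_count = counts.get(category, 0)
        # Smoothed mean over earlier samples that share this category.
        encoded.append((seen_sum + prior_weight * prior) / (seen_count + prior_weight))
        target_sums[category] = seen_sum + target
        counts[category] = seen_count + 1
    return encoded


# Hypothetical day/night labels with T_r-like targets; P = 1.5, beta = 1.
print(ordered_target_statistic(["day", "night", "day", "day"],
                               [1.0, 0.0, 3.0, 2.0], prior=1.5))
```

The first occurrence of each category falls back to the prior value, which is how the prior weight damps noise from rare categories.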

In order to overcome the conditional shift that may occur when the data structure and distribution of the training and test data sets differ, CatBoost uses an algorithm called ordered boosting [22]. For a sample x_i, if a model trained without that sample is used to estimate its gradient, the estimate can be regarded as unbiased:

Algorithm 1: Ordered boosting

Input: {(x_k, y_k)}_{k = 1}^{n} ordered according to σ, the number of trees I

σ ← random permutation of [1, n]

M_i ← 0 for i = 1, …, n

for t ← 1 to I do

    for i ← 1 to n do

        r_i ← y_i − M_{σ(i) − 1}(x_i)

    for i ← 1 to n do

        ΔM ← LearnModel((x_j, r_j) : σ(j) ≤ i)

        M_i ← M_i + ΔM

return M_n

2.3. Model Training Environment and Evaluation Metrics

The training environment for this research experiment was CPU: AMD Ryzen 5 3600 @ 3.60 GHz, GPU: NVIDIA GeForce GTX 1660 SUPER and RAM: 16 GB. Model training uses the Anaconda platform as the basic platform for machine learning training.

The performance of the model during training and testing was evaluated by three statistical indicators: root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²). The specific calculation formulas are shown in Formulas (14)–(16). In addition, the single crop coefficient model [6,23] was used to calculate crop T_r and compared with the prediction results of the mixed prediction model proposed in this study.

RMSE = \sqrt{\frac{1}{m} \sum_{i = 1}^{m} (y_{i} - \hat{y}_{i})^{2}}

(14)

MAE = \frac{1}{m} \sum_{i = 1}^{m} |y_{i} - \hat{y}_{i}|

(15)

R^{2} = 1 - \frac{\sum_{i = 1}^{m} (\hat{y}_{i} - y_{i})^{2}}{\sum_{i = 1}^{m} (\bar{y} - y_{i})^{2}}

(16)

In the above formulas, ŷ_i represents the predicted value, y_i represents the actual value, ȳ represents the average of the actual values, and m is the number of samples.
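The three indicators in Formulas (14)–(16) translate directly into code. The sketch below uses plain Python lists and made-up values; it is meant only to show how the formulas are evaluated, not to reproduce the study's evaluation scripts.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, Formula (14)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error, Formula (15)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, Formula (16)."""
    mean_true = sum(y_true) / len(y_true)
    residual = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    total = sum((mean_true - t) ** 2 for t in y_true)
    return 1.0 - residual / total


# Made-up measured and predicted values.
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(rmse(measured, predicted), mae(measured, predicted), r_squared(measured, predicted))
```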

3. Results

3.1. Model Input Variable Feature Extraction Analysis

3.1.1. Correlation Analysis

The correlation (r) between different environmental variables and T_r was analyzed, as shown in Figure 5. Air temperature variables (T_max; T_m; T_min), VPD, and R_n were positively correlated with T_r during florescence and fruiting stages, whereas relative humidity (RH_max; RH_m; RH_min) was negatively correlated with T_r. Among them, the correlation between R_n and T_r was the highest (r = 0.83 in florescence stage and r = 0.82 in fruiting stage), and the correlation between T_s and T_r was the lowest (r = 0.56 in florescence stage and r = 0.59 in fruiting stage). The absolute correlation between each environmental variable and T_r exceeded 0.5, indicating that environmental variables have a strong impact on T_r. These environmental variables can be measured with mainstream environmental sensors on the market, which makes the prediction model convenient to apply.

3.1.2. CARS Extracts Feature Variables

CARS was used to screen the environmental variables for feature variables. It can be seen from Figure 6a that the number of retained environmental variables gradually decreases and then stabilizes as the number of sampling runs increases; Figure 6b shows the 10-fold RMSECV value as a function of the number of sampling runs, and the optimal subset is determined by the lowest RMSECV value obtained over the sampling runs. As the number of sampling runs increases, the RMSECV value first decreases slowly, reaches its minimum at the 18th sampling run, and then rises rapidly. Figure 6c shows the regression coefficient paths of the 11 variables over the sampling runs. At the L₁ line, the coefficient values of some variables decreased to 0, whereas the coefficient values of variable E (Figure 6c blue line) and variable CO₂ (Figure 6c red line) increased without convergence, which indicated that variable E and variable CO₂ were not key variables. The minimum RMSECV value at the 18th sampling run is marked with a blue asterisk line. Therefore, this study considered the combination of variables selected at the 18th sampling run (T_max; T_m; T_min, RH_max; RH_m; RH_min, VPD, R_n, and T_s) to be optimal.

Based on the above algorithm, we analyzed and processed the hourly dataset and selected nine input variables for the prediction model (Table 3); the data volume was 2137. The T_m in the greenhouse was 27.8 °C, and the RH_m was 61.8%, which is within the range of air temperature (18.3–32.2 °C) and relative humidity (60–90%) suitable for greenhouse tomato growth [24]. The florescence and fruiting data volumes were 607 and 1530, respectively, 80% of which were used as training sets and 20% as test sets.
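The 80/20 division of each stage's samples can be sketched as a random index split. The function name, the fixed seed, and the rounding rule (training count rounded down) are assumptions of this sketch; the text does not state how the split was drawn or rounded.

```python
import random


def split_indices(n_samples, seed=0):
    """Randomly split sample indices into 80% training and 20% test indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = n_samples * 4 // 5  # assumption: training count is rounded down
    return indices[:n_train], indices[n_train:]


# Sample counts reported for the florescence and fruiting stages.
for stage, count in (("florescence", 607), ("fruiting", 1530)):
    train, test = split_indices(count)
    print(stage, len(train), len(test))
```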

3.2. Visual Analysis of T_r Variation Law Based on t-SNE

The hourly and daily T_r changes of greenhouse tomatoes are summarized in Table 4. The total transpiration of tomato in the florescence and fruiting stages was 73.08 L, of which the florescence and fruiting stages accounted for 37.8% and 62.2%, respectively. The average daily transpiration and average T_r in the florescence stage were 0.883 mm·day⁻¹ and 0.041 mm·h⁻¹, respectively, which were 22.1% and 13.7% lower than those in the fruiting stage. The maximum T_r values in the florescence and fruiting stages were 0.256 mm·h⁻¹ and 0.296 mm·h⁻¹, respectively, and occurred between 12:00 and 14:00 on 28 May and 24 June; the minimum T_r values were 0.004 mm·h⁻¹ and 0.005 mm·h⁻¹, respectively, and occurred between 1:00 and 3:00 in the morning on 10 June and 4 July. Tomato T_r therefore differed greatly between growth stages and between times of day, which makes model prediction more difficult and less accurate.

The t-SNE algorithm was used to cluster and analyze the transpiration characteristics of tomatoes at different growth stages, and the results are shown in Figure 7. Figure 7A shows the two-dimensional map of florescence and fruiting stages; high T_r is a curve with an upward opening, and low T_r is a fan-shaped one. Combined with the three-dimensional graphics (Figure 7C), it was found that in the low T_r interval, the clusters of the florescence and fruiting stages were relatively separated, indicating that there were large differences in the growth states of the florescence and fruiting stages. In the interval of high T_r, the clusters of florescence and fruiting stages overlap more, but the proportion of fruiting stage is higher, and the separation state is better. In addition, daytime and night-time labels are added based on solar radiation data for dimensionality reduction analysis. As shown in Figure 7B,D, high transpiration is characteristic during the daytime, and low transpiration characteristics and concentrated distribution occur at night-time. However, the separation between day and night during florescence was not ideal (Figure 7B). Therefore, it is necessary to classify and model tomato T_r to improve the prediction accuracy of short-term T_r.

3.3. Comparative Analysis of CARS-CatBoost Model Prediction Results

The test set data was used to evaluate the predictive performance of the CARS-CatBoost model in four different feature intervals, and the recommended Kc value (florescence Kc = 1.03 and fruiting Kc = 1.48) proposed by Salghi et al. [2] was used as the Kc parameter of this experiment. The results showed that the CARS-CatBoost model had a high linear correlation between the predicted value of T_r and the measured value, and the R² was 0.917, which was higher than the prediction accuracy of the single crop coefficient model (R² = 0.540). Figure 8B–D shows the comparisons between the predicted T_r results and the true values at the florescence stage, fruiting stage daytime, and night-time of tomato, respectively. It was found that the predictive effect of T_r in the three separate intervals is better than that of the whole growth stage (R² = 0.917).

For the whole growth stage of greenhouse tomato, the evaluation indicators of CARS-CatBoost and single crop coefficient model are shown in Figure 9. The RMSE, MAE, Maximum, and Standard deviations of the CARS-CatBoost model are 0.014 mm·h⁻¹, 0.010 mm·h⁻¹, 0.056 mm·h⁻¹, and 0.011 mm·h⁻¹, respectively, which are 72.1%, 72%, 74%, and 72% lower than the error values of the single crop coefficient model, indicating the excellent predictive accuracy of the CARS-CatBoost model in tomato T_r prediction.

Table 5 shows the prediction accuracy of the CARS-CatBoost and single crop coefficient models in the three intervals: florescence stage, fruiting stage daytime, and fruiting stage night-time. The RMSE values of the single crop coefficient model in the three partitions are 0.028 mm·h⁻¹, 0.055 mm·h⁻¹, and 0.020 mm·h⁻¹, which are much larger than those of the CARS-CatBoost model. Under the CARS-CatBoost model, the prediction accuracy in each of the three intervals was higher than that of the whole growth stage (MAE = 0.010 mm·h⁻¹), with the MAE decreasing by 32.6%, 11.3%, and 93.0%, respectively. Among them, the CARS-CatBoost model had the smallest RMSE (0.000 mm·h⁻¹ at the precision of Table 5; 0.427 g·m⁻²·h⁻¹ as reported in the abstract) and the smallest maximum error (Maximum) of 0.005 mm·h⁻¹ at night during the fruiting stage, providing the best prediction accuracy among the three characteristic intervals.
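The night-time RMSE appears as 0.000 mm·h⁻¹ in Table 5 but as 0.427 g·m⁻²·h⁻¹ in the abstract. The two are consistent once units are converted: 1 mm of water over 1 m² is 1 L, or about 1000 g. The sketch below assumes a water density of 1 g·cm⁻³; the constant and function names are illustrative.

```python
# 1 mm of water depth over 1 m² is 1 L, i.e. about 1000 g (assumed density 1 g·cm⁻³).
GRAMS_PER_MM_PER_M2 = 1000.0


def g_m2_h_to_mm_h(rate_g_m2_h: float) -> float:
    """Convert a water flux from g·m⁻²·h⁻¹ to mm·h⁻¹."""
    return rate_g_m2_h / GRAMS_PER_MM_PER_M2


night_rmse_mm_h = g_m2_h_to_mm_h(0.427)   # fruiting stage night-time RMSE
table_value = round(night_rmse_mm_h, 3)   # three decimals, as in Table 5

# Relative reduction against the whole-stage RMSE of 0.014 mm·h⁻¹.
whole_stage_rmse = 0.014
reduction_pct = 100.0 * (whole_stage_rmse - night_rmse_mm_h) / whole_stage_rmse
```

The resulting reduction of about 97% agrees with the night-time RMSE decrease stated in the Conclusions.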

4. Discussion

In this study, nine environmental variables (T_max, T_m, T_min, RH_max, RH_m, RH_min, VPD, R_n, and T_s) were used as input variables. The correlation analysis between the environmental variables and T_r showed that during the tomato florescence and fruiting stages, the main environmental variables affecting T_r were R_n, T, RH, and VPD, which is consistent with the conclusions of previous studies [25,26]; the correlation with R_n was the most significant. This is because, as solar radiation increased, the temperature in the greenhouse gradually increased and the relative humidity gradually decreased, so the water vapor pressure difference between the leaf surface and the air increased, thereby accelerating the transpiration of plants in the greenhouse. At night, solar radiation is absent, and over time the air temperature falls and the relative humidity rises. The water vapor pressure difference between the leaves and the air then decreases, which inhibits the transpiration of tomatoes [27,28,29].
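The temperature and humidity mechanism described here can be made concrete with the air vapor pressure deficit, used as a proxy for the leaf-to-air vapor pressure difference. The sketch below assumes the Tetens-type saturation vapor pressure formula given in FAO-56, which is not necessarily the formula used to compute VPD in this study; the example temperatures and humidities are illustrative values, not measurements.

```python
import math


def saturation_vapor_pressure(temp_c: float) -> float:
    """Saturation vapor pressure in kPa (Tetens-type formula, as in FAO-56)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vapor_pressure_deficit(temp_c: float, rh_percent: float) -> float:
    """Air VPD in kPa: saturation pressure minus actual vapor pressure."""
    return saturation_vapor_pressure(temp_c) * (1.0 - rh_percent / 100.0)


# Illustrative conditions: warm, dry midday versus cool, humid night.
midday_vpd = vapor_pressure_deficit(35.0, 40.0)
night_vpd = vapor_pressure_deficit(18.0, 90.0)
```

Under these assumed conditions the midday VPD is more than ten times the night-time value, which matches the direction of the effect described above.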

According to the analysis results, the CARS-CatBoost model is more accurate than the single crop coefficient model, which is mainly explained by two factors. First, Kc in single crop coefficient models is usually based on empirical values, which may deviate from actual values due to factors such as climate and environment. Ghumman et al. [9] found that the MSE of Kc can reach 11.9–71.4%. Reis et al. [30] found that the estimation error in the early tomato fruiting stage can reach 38%, resulting in up to 20% water waste [31]. Second, compared with the single crop coefficient model, the machine learning model can use the entire data set for training, minimize information loss, and still provide high prediction accuracy when variables are missing. Kim et al. [32] proposed a solar radiation prediction method based on a hybrid CNN-CatBoost model and concluded that the prediction accuracy and stability of the hybrid model are better than those of the single CNN and CatBoost models. Niu et al. [33] introduced a machine learning method for weather forecasting based on wavelet packet denoising and CatBoost, using feature selection and added spatio-temporal features to improve forecasting performance; their results show that the CatBoost model combined with wavelet packet denoising can achieve shorter convergence time and higher forecasting accuracy than forecasting models using deep learning or machine learning algorithms alone. These studies all considered nonlinear and complex environmental changes, which is similar to the research object of this study. Therefore, we adopted the CatBoost model to predict the T_r of tomato and achieved satisfactory prediction results.

In addition, the comparison of measured and predicted tomato T_r over time in Figure 10 shows that the maximum prediction errors of both the CARS-CatBoost and single crop coefficient models occurred at noon over the whole growth stage, at 0.056 mm·h⁻¹ and 0.212 mm·h⁻¹, respectively. The predicted values of the CARS-CatBoost model track the measured values significantly better than those of the single crop coefficient model. In particular, with partition modeling, the predicted curves for the florescence stage, fruiting stage daytime, and fruiting stage night-time are more consistent with the measured curves, and the differences are smaller. This shows that dividing the data into different intervals can improve estimation accuracy and highlights the advantages of transpiration prediction based on T_r characteristic intervals.

This study used multiple advanced sensors developed by the National Agricultural Intelligent Equipment Engineering Technology Research Center to obtain a large amount of high-precision data, with the aim of constructing a prediction model whose output closely approaches the actual transpiration rate of tomato. Because the greenhouse environment and crop species strongly influence the transpiration rate, optimizing traditional crop models has shown limited effectiveness in improving accuracy. In contrast, data-driven machine learning methods can achieve high-precision modeling by continuously collecting greenhouse data and retraining on them, ultimately meeting the demand for precision irrigation in facility agriculture [34].

5. Conclusions

In this study, we analyzed and extracted the main environmental variables affecting tomato transpiration and established a hybrid prediction model for tomato T_r based on the CatBoost algorithm (CARS-CatBoost model). By analyzing the results, we draw the following conclusions:

Through the correlation analysis of tomato T_r and environmental variables, it was found that temperature, VPD, and R_n were positively correlated with T_r, and relative humidity was negatively correlated with T_r; among these, R_n had the highest correlation with T_r.

For the whole growth stage, compared with the traditional single crop coefficient model, the RMSE and MAE of the CARS-CatBoost prediction model were lower by 72.1% and 72.0%, respectively, indicating that the prediction performance of the CARS-CatBoost model was better than that of the single crop coefficient model.

Under the framework of the CARS-CatBoost model, the RMSE of the partition models established for the three characteristic intervals (florescence stage, fruiting stage daytime, and fruiting stage night-time) decreased by 13.1%, 18.5%, and 97.0%, respectively, compared with the whole growth stage, indicating that partition modeling can further improve the prediction of tomato T_r.

This study provides useful guidance for exploring precise irrigation systems for the different stages of the greenhouse tomato growth cycle.

Author Contributions

Conceptualization, J.Y. and S.Z.; methodology, Z.T.; software, S.Z.; validation, Z.T., S.Z. and J.Y.; formal analysis, Z.T.; investigation, Z.T.; resources, X.Z.; data curation, Z.T.; writing—original draft preparation, Z.T.; writing—review and editing, J.Y.; visualization, Z.T.; supervision, Z.T.; project administration, B.W.; funding acquisition, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Yunnan Provincial Major Science and Technology Special Project (202202AE090066); Hebei Provincial Key Research and Development Program Project (22327401D); and Beijing Academy of Agriculture and Forestry Sciences Major Scientific and Technological Achievement Cultivation Project: Key Technology Equipment and Industrialization of Smart Irrigation.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Fuglie, K. Climate change upsets agriculture. Nat. Clim. Change 2021, 11, 294–295.
2. Gong, X.; Wang, S.; Xu, C.; Zhang, H.; Ge, J. Evaluation of Several Reference Evapotranspiration Models and Determination of Crop Water Requirement for Tomato in a Solar Greenhouse. HortScience 2020, 55, 244–250.
3. Sapounas, A.; Katsoulas, N.; Slager, B.; Bezemer, R.; Lelieveld, C. Design, Control, and Performance Aspects of Semi-Closed Greenhouses. Agronomy 2020, 10, 1739.
4. Babakos, K.; Papamichail, D.; Tziachris, P.; Pisinaras, V.; Demertzi, K.; Aschonitis, V. Assessing the Robustness of Pan Evaporation Models for Estimating Reference Crop Evapotranspiration during Recalibration at Local Conditions. Hydrology 2020, 7, 62.
5. Yan, H.; Zhang, C.; Coenders Gerrits, M.; Acquah, S.J.; Zhang, H.; Wu, H.; Zhao, B.; Huang, S.; Fu, H. Parametrization of aerodynamic and canopy resistances for modeling evapotranspiration of greenhouse cucumber. Agric. For. Meteorol. 2018, 262, 370–378.
6. Ge, J.; Zhao, L.; Yu, Z.; Liu, H.; Zhang, L.; Gong, X.; Sun, H. Prediction of Greenhouse Tomato Crop Evapotranspiration Using XGBoost Machine Learning Model. Plants 2022, 11, 1923.
7. Jiankun, G.; Yanfei, L.; Zengjin, L.; Xuewen, G.; Cundong, X. Comparing the performance of greenhouse crop transpiration prediction models based on ANNs. J. Environ. Biol. 2019, 40, 418–426.
8. Cahn, M.D.; Johnson, L.F.; Benzen, S.D. Evapotranspiration Based Irrigation Trials Examine Water Requirement, Nitrogen Use, and Yield of Romaine Lettuce in the Salinas Valley. Horticulturae 2022, 8, 857.
9. Ghumman, A.R.; Jamaan, M.; Ahmad, A.; Shafiquzzaman, M.; Haider, H.; Al Salamah, I.S.; Ghazaw, Y.M. Simulation of Pan-Evaporation Using Penman and Hamon Equations and Artificial Intelligence Techniques. Water 2021, 13, 793.
10. Feng, K.; Tian, J. Forecasting reference evapotranspiration using data mining and limited climatic data. Eur. J. Remote Sens. 2021, 54, 363–371.
11. Greener, J.G.; Kandathil, S.M.; Moffat, L.; Jones, D.T. A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 2022, 23, 40–55.
12. Cao, J.; Wang, H.; Li, J.; Tian, Q.; Niyogi, D. Improving the Forecasting of Winter Wheat Yields in Northern China with Machine Learning—Dynamical Hybrid Subseasonal-to-Seasonal Ensemble Prediction. Remote Sens. 2022, 14, 1707.
13. Tunalı, U.; Tüzel, I.H.; Tüzel, Y.; Şenol, Y. Estimation of actual crop evapotranspiration using artificial neural networks in tomato grown in closed soilless culture system. Agric. Water Manag. 2023, 284, 108331.
14. Nam, D.S.; Moon, T.; Lee, J.W.; Son, J.E. Estimating transpiration rates of hydroponically-grown paprika via an artificial neural network using aerial and root-zone environments and growth factors in greenhouses. Hortic. Environ. Biotechnol. 2019, 60, 913–923.
15. Guggilam, S.; Chandola, V.; Patra, A. Large Deviations for Accelerating Neural Networks Training. arXiv 2023, arXiv:2303.00954.
16. Qiu, R.; Kang, S.; Du, T.; Tong, L.; Hao, X.; Chen, R.; Chen, J.; Li, F. Effect of convection on the Penman–Monteith model estimates of transpiration of hot pepper grown in solar greenhouse. Sci. Hortic. 2013, 160, 163–171.
17. Dong, L.; Zeng, W.; Wu, L.; Lei, G.; Chen, H.; Srivastava, A.K.; Gaiser, T. Estimating the Pan Evaporation in Northwest China by Coupling CatBoost with Bat Algorithm. Water 2021, 13, 256.
18. Yong-dong, Z.; Quan-ming, Z.; Xin, Z.; Xu-zhang, X.; Li-li, Z.; Shu-juan, W. Development of Multi-Channel Potted-Plant Evapotranspiration Measurement System Based on Lora Wireless Technology. Water Sav. Irrig. 2020, 77–84.
19. Lu, N.; Nukaya, T.; Kamimura, T.; Zhang, D.L.; Kurimoto, I.; Takagaki, M.; Maruo, T.; Kozai, T.; Yamori, W. Control of vapor pressure deficit (VPD) in greenhouse enhanced tomato growth and productivity during the winter season. Sci. Hortic. 2015, 197, 17–23.
20. Long, Z.-Z.; Xu, G.; Du, J.; Zhu, H.; Yan, T.; Yu, Y.-F. Flexible Subspace Clustering: A Joint Feature Selection and K-Means Clustering Framework. Big Data Res. 2021, 23, 100170.
21. Solano, E.S.; Dehghanian, P.; Affonso, C.M. Solar Radiation Forecasting Using Machine Learning and Ensemble Feature Selection. Energies 2022, 15, 7049.
22. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; pp. 6639–6649.
23. Morille, B.; Migeon, C.; Bournet, P.E. Is the Penman–Monteith model adapted to predict crop transpiration under greenhouse conditions? Application to a New Guinea Impatiens crop. Sci. Hortic. 2013, 152, 80–91.
24. Wu, L.; Zhang, X.; Xiao, G. Effects of Environmental Factors on Tomato Growth. Agric. Sci. Technol. 2015, 16, 272–277.
25. Salghi, R.; Abouatallah, A.; Jaouhari, N.E.; Hammouti, B. Impact of Drip Irrigation Scheduling on Vegetative Parameters in Tomato (Lycopersicon esculentum Mill.) Under Unheated Greenhouse. Int. J. Eng. Res. Appl. 2014, 4, 71–76.
26. Antonopoulos, V.Z.; Antonopoulos, A.V. Daily reference evapotranspiration estimates by artificial neural networks technique and empirical equations using limited input climate variables. Comput. Electron. Agric. 2017, 132, 86–96.
27. Xu, K.; Guo, X.; He, J.; Yu, B.; Tan, J.; Guo, Y. A study on temperature spatial distribution of a greenhouse under solar load with considering crop transpiration and optical effects. Energy Convers. Manag. 2022, 254, 115277.
28. Yang, L.; Liu, H.; Tang, X.; Li, L. Tomato Evapotranspiration, Crop Coefficient and Irrigation Water Use Efficiency in the Winter Period in a Sunken Chinese Solar Greenhouse. Water 2022, 14, 2410.
29. Liu, H.; Shao, M.; Yang, L. Photosynthesis Characteristics of Tomato Plants and Its’ Responses to Microclimate in New Solar Greenhouse in North China. Horticulturae 2023, 9, 197.
30. Reis, L.S.; de Souza, J.L.; de Azevedo, C.A.V. Evapotranspiração e coeficiente de cultivo do tomate caqui cultivado em ambiente protegido. Rev. Bras. Eng. Agríc. Ambient. 2009, 13, 289–296.
31. Libardi, L.G.P.; de Faria, R.T.; Dalri, A.B.; de Souza Rolim, G.; Palaretti, L.F.; Coelho, A.P.; Martins, I.P. Evapotranspiration and crop coefficient (Kc) of pre-sprouted sugarcane plantlets for greenhouse irrigation management. Agric. Water Manag. 2019, 212, 306–316.
32. Kim, H.; Park, S.; Park, H.J.; Son, H.G.; Kim, S. Solar Radiation Forecasting Based on the Hybrid CNN-CatBoost Model. IEEE Access 2023, 11, 13492–13500.
33. Niu, D.; Diao, L.; Zang, Z.; Che, H.; Zhang, T.; Chen, X. A Machine-Learning Approach Combining Wavelet Packet Denoising with Catboost for Weather Forecasting. Atmosphere 2021, 12, 1618.
34. Jung, D.-H.; Lee, T.S.; Kim, K.; Park, S.H. A Deep Learning Model to Predict Evapotranspiration and Relative Humidity for Moisture Control in Tomato Greenhouses. Agronomy 2022, 12, 2169.

Figure 1. The layout of the greenhouse tomato planting experiment. (A) is the photo of the greenhouse experiment, and (B) is the schematic diagram of the weighing lysimeter.

Figure 2. Box plots of the studied environmental variables.

Figure 3. The training process of the CARS-CatBoost T_r prediction model.

Figure 4. CARS-CatBoost prediction model structure.

Figure 5. Correlation between T_r and environmental variables in tomato florescence and fruiting stages. (A) Florescence stage; (B) Fruiting stage. Note: * indicates significant at the p < 0.05 level, and ** indicates significant at the p < 0.01 level.

Figure 6. Variable screening results of competitive adaptive reweighted sampling (CARS). (a) The number of sampled variables; (b) 10-fold RMSECV value; and (c) the variation trend of the regression coefficient path of each variable with the increase of sampling in CARS operation.

Figure 7. Visualization results after dimensionality reduction based on the t-SNE algorithm. (A,C) are the t-SNE results of florescence and fruiting stages; and (B,D) are the optimized t-SNE results for day and night.

Figure 8. Comparative analysis of predicted T_r and actual T_r by CARS-CatBoost (A–D) and single crop coefficient (E–H) models in different characteristic intervals. (A,E) are the whole growth stage; (B,F) are the florescence stage; (C,G) are the fruiting stage daytime; and (D,H) are the fruiting stage night-time.

Figure 9. The MAE values of the CARS-CatBoost model and the single crop coefficient model for the entire growth period of tomatoes.

Figure 10. The trend graphs of calculated results with the CARS-CatBoost and single crop coefficient model compared to the measured T_r. (A) Full growth stage; (B) florescence stage; (C) daytime fruiting stage; and (D) night-time fruiting stage.

Table 1. Greenhouse parametric statistics.

Symbols	Description	Units
T_r	Transpiration rate	mm·h⁻¹
T_max	Max. air temperature	°C
T_min	Min. air temperature	°C
T_m	Average air temperature	°C
RH_max	Max. relative humidity	%
RH_min	Min. relative humidity	%
RH_m	Average relative humidity	%
R_n	Solar radiation	kJ·m⁻²·h⁻¹
VPD	Vapor Pressure Deficit	kPa
T_s	Coconut coir temperature	°C
E	Light intensity	µmol·m⁻²·s⁻¹
CO₂	Carbon dioxide concentration	ppm

Table 2. CatBoost parameter settings.

Parameter Name	Value
n_estimators	10,000
early_stopping_rounds	5
depth	6
learning_rate	0.1
loss_function	RMSE
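The settings in Table 2 map directly onto constructor arguments of `CatBoostRegressor` in the third-party `catboost` package. The sketch below shows one way they might be passed; the names `CATBOOST_PARAMS` and `build_model` are hypothetical, and the import is deferred so the settings can be inspected without the package installed.

```python
# Settings from Table 2, keyed by CatBoost parameter name.
CATBOOST_PARAMS = {
    "n_estimators": 10_000,        # upper bound on the number of trees
    "early_stopping_rounds": 5,    # stop after 5 rounds without improvement
    "depth": 6,
    "learning_rate": 0.1,
    "loss_function": "RMSE",
}


def build_model():
    """Create an untrained regressor (requires the catboost package)."""
    from catboost import CatBoostRegressor  # imported lazily on purpose
    return CatBoostRegressor(**CATBOOST_PARAMS)
```

Early stopping only takes effect when a validation set is supplied at fit time, for example `build_model().fit(X_train, y_train, eval_set=(X_val, y_val))`, where the data names are placeholders.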

Table 3. Statistical results of the hourly tomato experimental data used for training and testing.

Symbols	Units	Maximum	Minimum	Average	Standard Deviation
T_max	°C	43.5	12.4	28.5	6.0
T_min	°C	41.8	11.2	27.2	5.6
T_m	°C	42.9	11.8	27.8	5.8
RH_max	%	97.1	17.9	64.2	20.1
RH_min	%	96.5	12.0	59.3	20.8
RH_m	%	97.0	15.0	61.8	21.4
R_n	kJ·m⁻²·h⁻¹	355.7	0.0	47.8	65.5
VPD	kPa	6.1	0.1	1.7	1.3
T_s	°C	33.3	18.2	26.4	2.7
Amount	pcs	2137
Training Set	pcs	1710
Testing set	pcs	427
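The sample counts in Table 3 (2137 in total, 1710 for training, 427 for testing) correspond to roughly an 80/20 split. The sketch below reproduces those sizes; the 0.8 fraction is inferred from the counts, and the random shuffle, the seed, and the function name are assumptions, since this section does not say how samples were assigned to each set.

```python
import random


def split_indices(n_samples: int, train_fraction: float = 0.8, seed: int = 0):
    """Shuffle sample indices and split them into training and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # reproducible shuffle (assumed)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(2137)
```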

Table 4. Statistical results of hourly and daily transpiration data of tomatoes.

Stage	Numerical Value	Hourly T_r (mm·h⁻¹)	Daily T_r (mm·day⁻¹)	Water Consumption Modulus (%)
Florescence stage	Maximum	0.256	1.875	37.8
	Minimum	0.004	0.157
	Average	0.041	0.883
Fruiting stage	Maximum	0.296	2.104	62.2
	Minimum	0.005	0.296
	Average	0.048	1.133
Whole growth stage	Maximum	0.296	2.104	100
	Minimum	0.004	0.157
	Average	0.047	1.022

Table 5. Evaluation indicators of CARS-CatBoost and single crop coefficient model in different partitions (unit: mm·h⁻¹).

Model	Evaluation Metrics	Florescence Stage	Fruiting Stage Daytime	Fruiting Stage Night-Time
CARS-CatBoost	RMSE	0.012	0.012	0.000
	MAE	0.006	0.009	0.001
	Maximum	0.029	0.028	0.005
	Standard deviation	0.006	0.008	0.001
Single Crop Coefficient	RMSE	0.028	0.055	0.020
	MAE	0.018	0.041	0.015
	Maximum	0.114	0.194	0.057
	Standard deviation	0.022	0.037	0.013

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
