A Stacking Ensemble Learning Model Combining a Crop Simulation Model with Machine Learning to Improve the Dry Matter Yield Estimation of Greenhouse Pakchoi

Wang, Chao; Xu, Xiangying; Zhang, Yonglong; Cao, Zhuangzhuang; Ullah, Ikram; Zhang, Zhiping; Miao, Minmin

doi:10.3390/agronomy14081789


by Chao Wang ¹, Xiangying Xu ¹,²,*, Yonglong Zhang ¹, Zhuangzhuang Cao ³, Ikram Ullah ³, Zhiping Zhang ³ and Minmin Miao ²,³,⁴,*

¹ College of Information Engineering, Yangzhou University, Yangzhou 225009, China

² Joint International Research Laboratory of Agriculture and Agri-Product Safety of Ministry of Education of China, Yangzhou University, Yangzhou 225009, China

³ College of Horticulture and Landscape Architecture, Yangzhou University, Yangzhou 225009, China

⁴ Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Yangzhou University, Yangzhou 225009, China

* Authors to whom correspondence should be addressed.

Agronomy 2024, 14(8), 1789; https://doi.org/10.3390/agronomy14081789

Submission received: 15 July 2024 / Revised: 12 August 2024 / Accepted: 13 August 2024 / Published: 14 August 2024

(This article belongs to the Section Precision and Digital Agriculture)


Abstract:

Crop models are instrumental in simulating resource utilization in agriculture, yet their complexity necessitates extensive calibration, which can impact the accuracy of yield predictions. Machine learning shows promise for enhancing yield estimations but relies on vast amounts of training data. This study aims to improve the pakchoi yield prediction accuracy of simulation models. We developed a stacking ensemble learning model that integrates three base models—EU-Rotate_N, Random Forest Regression and Support Vector Regression—with a Multi-layer Perceptron as the meta-model for the pakchoi dry matter yield prediction. To enhance the training dataset and bolster machine learning performance, we employed the EU-Rotate_N model to simulate daily dry matter yields for unsampled data. The test results revealed that the stacking model outperformed each base model. The stacking model achieved an R² value of 0.834, which was approximately 0.1 higher than that of the EU-Rotate_N model. The RMSE and MAE were 0.283 t/ha and 0.196 t/ha, respectively, both approximately 0.6 t/ha lower than those of the EU-Rotate_N model. The performance of the stacking model, developed with the expanded dataset, showed a significant improvement over the model based on the original dataset.

Keywords:

EU-Rotate_N; machine learning; stacking model; dry matter yield; pakchoi

1. Introduction

China leads the world in both the cultivation area and the production of pakchoi, and thus contributes significantly to the stability of the international vegetable supply. As reported by the Ministry of Agriculture in 2023, the country’s annual cultivation area for pakchoi is estimated at 2.7 million hectares, which constitutes around 15% of China’s total vegetable cultivation area [1]. Pakchoi (Brassica campestris L. ssp.), a highly regarded leafy vegetable, is extensively cultivated in China and across several East Asian nations for its abundant nutritional content, including vitamin C, crude fiber, and anthocyanins [2]. Its cultivation is also gaining momentum in Europe, where it is valued for its health-enhancing attributes, which underscores its significance in human nutrition [3].

Because pakchoi cannot be stored or transported for long periods, the supply for large cities such as Shanghai comes mainly from production bases in the suburbs. However, the suburban area available for vegetable cultivation keeps shrinking as these cities industrialize. Therefore, raising the yield per unit area is crucial for pakchoi production today.

For leafy vegetables, nitrogen fertilizer is the key factor in growth and plays an important role in nutritional value and yield. In production, farmers commonly apply excessive chemical fertilizer in pursuit of high yields, which not only wastes fertilizer but also incurs large environmental costs [4]. Nitrogen is a critical element for plant growth, with both beneficial and detrimental effects that warrant careful consideration. On the positive side, the judicious application of nitrogen fertilizer can markedly enhance crop yield and quality, thereby making a substantial contribution to global food security. Nonetheless, the adverse impacts of nitrogen should not be overlooked: overuse of nitrogen fertilizer can result in nitrogen loss, cause water eutrophication, and disrupt aquatic ecosystems. The nitrogen supply to a crop is affected not only directly by fertilization but also by the type of previous crop. For instance, stubble from leguminous crops has a higher nitrogen content and serves as an excellent source of mineral nitrogen for subsequent crops [5], which effectively reduces the amount of fertilizer required. Therefore, determining a reasonable nitrogen application rate has become an important topic. Today, with advances in precision agriculture technology, a growing number of studies focus on water and fertilizer transport and precise application in vegetable cultivation in order to increase yield and income [6,7,8,9].

Process-based crop simulation models are widely utilized to mimic the growth and yield of vegetables in response to diverse management strategies and environmental factors. These models require the input of numerous parameters, including crop cultivars, soil characteristics, meteorological conditions, and cultivation techniques. To date, established crop models such as WOFOST, DSSAT, APSIM, and AquaCrop have been adapted to support the simulation of growth of dozens of vegetable varieties [10,11,12,13,14]. Among the array of crop simulation models, EU-Rotate_N stands out as a specialized tool designed for vegetables and applied in both traditional and organic crop rotation systems to refine nitrogen management across European regions [15,16]. Its enhanced versions have been employed to simulate the growth of various vegetables in China, demonstrating promising application outcomes [17,18,19,20]. However, the simulation model currently faces challenges in accurately predicting vegetable yields due to the complexities and uncertainties associated with a multitude of input parameters. Furthermore, certain region-specific vegetable varieties, like pakchoi, have yet to be fully integrated into the mainstream models, which are predominantly developed by countries in the developed world.

Numerous studies have highlighted the efficacy of using a multi-model ensemble (MME) approach in crop production. MME, which integrates the outputs of multiple models to mitigate uncertainty and enhance predictive accuracy in complex systems, often yields more robust simulation outcomes than any single model alone [21,22,23,24]. Indeed, the concept of MME has a long-standing track record of success in the field of climate forecasting, having been employed for several decades [25]. In recent times, MME has gained widespread recognition across agriculture and other fields for its outstanding performance [26]. It has been demonstrated that MME can effectively constrain simulation uncertainties and mitigate the biases inherent in individual models [27]. For instance, a weighted average ensemble model, such as a performance-based weighting mode, typically outperforms the best individual model and even surpasses a simple average ensemble model [28,29,30]. Nonetheless, current MME methodologies applied to crop models predominantly concentrate on major food crops like wheat and rice, with the number of MME models dedicated to vegetables remaining relatively scarce [31,32].

However, it is important to note that different crop models possess unique parameter structures, necessitating a substantial number of input variables and regional calibrations prior to their application. This process is both data-intensive and time-consuming [33]. Consequently, several studies have advocated for the judicious application of MME, emphasizing that their use should be predicated on the availability of site-specific observations of plant and soil conditions for model calibration [34,35]. In contrast, machine learning (ML)—a data-driven approach capable of analyzing vast datasets, discerning underlying patterns and relationships, and subsequently utilizing these insights for outcome prediction or decision-making—has emerged as a formidable tool in the realm of crop yield forecasting. ML models have demonstrated performance comparable to or even surpassing that of traditional process-based models, primarily due to their user-friendly nature [36].

Commonly employed ML algorithms, such as Random Forest (RF), Support Vector Machine (SVM), and artificial neural networks (ANNs), have garnered success in predicting crop yields, as evidenced by a plethora of previous studies [37,38,39,40,41]. These studies have predominantly concentrated on staple crops, including wheat, groundnut, and millet. However, the application of ML models to forecast the yields of diverse vegetable crops has not been as extensively investigated. Nevertheless, recent advancements have begun to address this research gap, with emerging studies highlighting the efficacy of ML in yield prediction for specific vegetables, such as potato [42,43,44], carrot [45], and eggplant [46]. These studies underscore the potential of ML models to be adapted and applied to a broader spectrum of vegetable crops. The present study, focusing on pakchoi, aims to contribute to this growing body of research by exploring the applicability of ML models to this particular crop.

A significant constraint of machine learning (ML) models is their limited interpretability of outcomes, a characteristic attributed to their “black box” nature. Consequently, ML models and crop simulation models can be synergistically integrated to develop a hybrid model that not only enhances performance but also improves interpretability [47]. Studies have demonstrated that the fusion of crop simulation and ML approaches enhances the crop yield prediction accuracy across various agricultural regions, including the U.S. Corn Belt, the wheat-growing regions of southeastern Australia, and the North China Plain [48,49,50,51]. For instance, the hybrid model that combines the APSIM and Random Forest (RF) models has significantly enhanced the accuracy of wheat and corn yield predictions compared to the use of individual models [48,49].

In contrast to field grain crops, such as wheat and rice, vegetables like pakchoi have a short growth cycle, resulting in a limited amount of sampling data throughout the growth period. This paucity of data can compromise the generalizability of ML models. To overcome this challenge, this study introduces a prediction approach for pakchoi dry matter (DM) yield using a stacking ensemble model that integrates multiple ML models with an enhanced EU-Rotate_N model. Given that the majority of the aboveground biomass of pakchoi contributes to its yield and there exists a strong correlation between DM and the yield of pakchoi, we estimate the pakchoi DM as a representation for pakchoi yield in this study. The purpose of this study is to improve the accuracy of in-season yield predictions for pakchoi by integrating the process-based model EU-Rotate_N with machine learning algorithms.

The main contributions are as follows: (1) to explore a data augmentation method for ML models using the process-based model EU-Rotate_N; (2) to investigate a stacking model that combines EU-Rotate_N with ML for pakchoi in-season yield prediction; and (3) to evaluate multiple ensemble models with different stacking methods.

2. Materials and Methods

2.1. Data Collection

In this research, the data are divided into sampling data and simulation data. The sampling data include weather data, soil sampling data, crop DM data, fertilization data, and irrigation data. The simulation data primarily consist of the pakchoi DM data predicted by the EU-Rotate_N model.

2.1.1. Sampling Data

The sampling data were collected from field-scale experiments conducted in 2018 at the Yangzhou Lehuo Vegetable Base (119°48′ E, 32°29′ N) in Jiangsu Province, China. This region is characterized by a subtropical humid climate with distinct seasonal variations, typical of vegetable production areas. The average air temperature within the greenhouse during 2018 was 20.2 °C, and the average humidity across the three planting batches inside the greenhouse was 71.17%. The greenhouse, oriented north–south with dimensions of 50 m × 6 m, was equipped with a shading net during the summer months to mitigate the intense sunlight and high temperatures. The pakchoi variety used in the experimental trials was Shanghai green.

A meteorological station (NHQXZ60l) was installed in the greenhouse to collect meteorological data at ten-minute intervals. The station comprised a psychrometer for measuring air temperature and relative humidity (NH121WS-R), a sunlight intensity meter for measuring solar radiation (NHFS15B), and a cup-type anemometer for measuring wind speed (NHFS45BP, 0.5 m). Since the experiment was conducted in a greenhouse, the rainfall was 0. Weather data exported from the station on a daily basis were used in this study, including maximum and minimum temperatures, maximum and minimum humidity, sunshine hours, and wind speed.

Given that pakchoi is a vegetable with a shallow root system, we sampled soil from the 0–30 cm depth layer within greenhouses for subsequent analysis. This approach enabled us to determine the physical characteristics and nutrient composition of the soil. The primary soil parameters assessed included the bulk density, sand content, clay content, field capacity, and pH value, among others, with the collection methodologies detailed in our earlier studies [19,20]. At the same time, the field management data, such as the times and amounts of irrigation and fertilization, were collected.

The soil bulk density and field capacity were determined using the ring knife method [52,53]. The sand and clay contents were analyzed using a soil particle size analysis [52,53]. The saturated water content was measured using the oven drying method [52,53]. Soil pH was assessed with a pH meter.

The nutrient content and fundamental physical properties of the topsoil (0–30 cm depth) within the experimental greenhouse are presented in Table 1.

Crop data were sampled during the growth period of pakchoi. Before harvesting, destructive samples were taken to measure the fresh weight. Pakchoi samples were dried at 72 °C until a constant weight was achieved to calculate the DM of each plot. At harvest, the pakchoi yield of each plot was weighed, and the DM of the plot after drying was obtained.

2.1.2. Simulation Data

ML models are prone to overfitting when trained on limited datasets, which diminishes their generalization capabilities. Consequently, in addition to the actual data collected and utilized for the crop simulation model, it is imperative to integrate supplementary simulated data into the dataset to build the ML model. To mitigate the challenges associated with small datasets, previous researchers have enhanced their datasets through sampling techniques and other sophisticated augmentation strategies such as RandAugment and back-translation [54,55,56]. In this study, the EU-Rotate_N model was employed to generate DM data throughout the growth phase of pakchoi. These simulated data were concatenated with meteorological, soil, and crop cultivation data to serve as the augmented input for the ML models. The efficacy of this proposed data augmentation strategy was explored within the context of this research.

2.1.3. Dataset

Due to the scarcity of experimental data acquired through manual measurements, this study employed the EU-Rotate_N simulation model to augment the original dataset. DM data for non-sampled days were generated by inputting daily weather data, along with soil and crop parameters, into the EU-Rotate_N model, thereby increasing the data volume available for training ML models.

To assess the performance of the ML models and the stacking model, the dataset was partitioned into training and testing sets. Given our focus on pakchoi yield during the harvest period, the sampling data corresponding to the harvest period were designated as the test set, while the sampling data from earlier growth stages were allocated to the training set. The training and testing data were divided at an approximate ratio of 8:3, with 96 records allocated for training and 36 records for testing. To further augment the training data, the EU-Rotate_N simulation model was utilized to simulate DM for non-sampling dates prior to the harvest, resulting in an expanded final training dataset comprising 480 records.

2.2. Methods

2.2.1. Field Experiment Design

The experiment was carried out in a greenhouse measuring 6 m × 50 m. To find the optimal nitrogen amount, six nitrogen treatments (N0, N1, N2, N3, N4, and N5) were set up, each replicated three times, giving 18 plots in total. Plastic films or partitions were buried between plots to prevent lateral nutrient movement. The planting and harvest dates for the three batches of pakchoi, as well as the fertilizer application rates for the six nitrogen treatments, are detailed in Table 2. The plot area and arrangement of the six treatments were consistent with our previous research [20], where the area of each plot was 12.5 m² (2.5 m × 5 m).

2.2.2. EU-Rotate_N Model Description

In this study, the EU-Rotate_N model was employed to simulate the daily DM of pakchoi. Originally derived from the N_ABLE model [57], EU-Rotate_N was developed as an instrument for optimizing nitrogen fertilizer application in field crop rotations and has been adapted for a range of vegetable crops in Europe [15]. The model encompasses a suite of sub-modules capable of simulating various processes, including above- and below-ground plant growth, soil nitrogen mineralization, and plant nitrogen uptake, all of which are influenced by meteorological variables such as temperature and radiation. Because the model primarily supports vegetable varieties prevalent in Europe, certain crops such as pakchoi, which is commonly grown in East Asia, are not included in the model’s crop list. We therefore calibrated the crop parameters and integrated pakchoi as a new variety into the EU-Rotate_N model [20].

To optimize the water and nitrogen management under the vegetable rotation mode in the experimental site, the EU-Rotate_N model was improved by altering the soil water module due to the high groundwater level in the region. The groundwater level algorithms added to the model are referenced in our previous study [19]. The parameters input into the model are listed in Table 3. The ‘trial and error’ approach was employed for the calibration of pakchoi variety parameters. The daily soil water content and daily soil mineral N content served as validation metrics for the model, as described in our previous research [20].

In this research, the soil water content and soil mineral N content data from the first batch of pakchoi were utilized to calibrate the EU-Rotate_N model. As pakchoi is not included in the crop list of EU-Rotate_N model, the crop parameters for pakchoi were optimized based on a similar crop, spinach, which is included in the original crop list. After adjusting and calibrating parameter values in multiple treatments of the first batch, the optimized crop parameters were determined [20].

2.2.3. Stacking

Stacking is an ensemble learning method in ML that is widely used to improve overall predictive performance by combining the predictions of multiple base models [58,59,60,61]. A stacking model consists of two layers: the base models (first layer) and the meta-model (second layer). In this study, we used EU-Rotate_N, Random Forest Regression (RFR) and Support Vector Regression (SVR) as base models in the first layer and a Multi-layer Perceptron (MLP) as the meta-model in the second layer (Figure 1).

Effective accumulated temperature, expressed as growing degree days (GDD), quantifies the thermal requirement of crops throughout their growth and development cycles. It is computed as the cumulative sum of the daily amounts by which the mean temperature exceeds the crop’s biological minimum temperature:

GDD = \sum (T_{mean} - T_{b})

(1)

where T_b is the biological minimum temperature and T_mean is the daily mean temperature. For pakchoi, T_b is 7 °C.
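As a minimal sketch of Equation (1), the running GDD total can be accumulated day by day. The function name `cumulative_gdd`, the sample temperatures, and the choice to count days at or below T_b as zero are illustrative assumptions, not details given in the text.

```python
from typing import Iterable, List

# Biological minimum temperature for pakchoi, as stated in the text.
T_BASE_PAKCHOI = 7.0


def cumulative_gdd(daily_mean_temps: Iterable[float],
                   t_base: float = T_BASE_PAKCHOI) -> List[float]:
    """Return the running GDD total after each day."""
    total = 0.0
    running = []
    for t_mean in daily_mean_temps:
        # Assumption: a day at or below the base temperature adds nothing.
        total += max(t_mean - t_base, 0.0)
        running.append(total)
    return running


# Hypothetical daily mean temperatures, for illustration only.
temps = [12.0, 6.0, 20.5, 7.0]
gdd = cumulative_gdd(temps)
print(gdd)
```

The last element of the returned list is the GDD value for the whole period, and intermediate elements can serve as a daily time feature.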

For ML models, feature engineering was applied to select the input features relevant to the DM yields of pakchoi. Initially, outlier detection was performed on the dataset by examining the mean, range, and standard deviation of each parameter (Table 4); no anomalies were identified. Subsequently, all data were normalized and the correlations between input parameters and DM yields were calculated. Min-max normalization, a common feature scaling technique, was employed to mitigate the influence of the parameter scale by mapping values linearly to the range [0, 1]. For feature selection, the Pearson correlation coefficient was used to identify input parameters that exhibited significant correlations with DM yields, using a significance threshold of p < 0.1. Finally, the total dataset, comprising features and DM yields, was partitioned into training and testing sets. To augment the training set, the EU-Rotate_N model was employed to predict the daily DM for non-sampled dates. Consequently, an augmented training set, integrating both sampled and simulated data, was constructed for the ML models.
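The scaling and selection steps described above could be sketched as follows, assuming NumPy and SciPy are available. The feature names, the synthetic data, and the helper names `min_max_scale` and `select_features` are hypothetical; only the min-max mapping to [0, 1] and the p < 0.1 Pearson threshold come from the text.

```python
import numpy as np
from scipy.stats import pearsonr


def min_max_scale(x: np.ndarray) -> np.ndarray:
    """Linearly map each column of x to [0, 1]; constant columns map to 0."""
    x = np.asarray(x, dtype=float)
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (x - lo) / span


def select_features(x: np.ndarray, y: np.ndarray, names, alpha: float = 0.1):
    """Keep features whose Pearson correlation with y has p < alpha."""
    kept = []
    for j, name in enumerate(names):
        r, p = pearsonr(x[:, j], y)
        if p < alpha:
            kept.append((name, float(r), float(p)))
    return kept


# Synthetic stand-in data (assumption): one informative and one unrelated feature.
day = np.arange(60)
gdd_feature = 10.0 * day
alternating_flag = np.where(day % 2 == 0, 1.0, -1.0)
dm = 0.002 * gdd_feature + 0.05 * np.sin(day)

names = ["gdd", "alternating_flag"]
scaled = min_max_scale(np.column_stack([gdd_feature, alternating_flag]))
kept = select_features(scaled, dm, names, alpha=0.1)
print([name for name, _, _ in kept])
```

Because min-max scaling is a linear map, it leaves each Pearson coefficient unchanged, so the order of the scaling and correlation steps does not affect which features are selected.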

In the training phase, EU-Rotate_N, RFR and SVR were used as the three base models. The bias in the training accuracy of the stacking model was not significant because DM data were not used directly in the calibration of EU-Rotate_N and there was no prior correlation between the predicted values and the observed values. Consequently, the augmented training set was applied to all three base models, and the DM of the three crop batches was predicted based on this training set. Subsequently, the predictions generated a dataset, which was then used by the meta-model to learn the characteristics of this newly created dataset for DM prediction.

Following the optimization of the stacking model through parameter adjustments of both the base models and the meta model during the training phase, the DM yield for the harvest period was predicted in the testing phase. This was achieved by inputting meteorological data, soil data, fertilization data, and irrigation data into the stacking model to assess its predictive accuracy.
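The two-layer workflow can be illustrated with scikit-learn under several loudly stated assumptions. The data are synthetic, and the `eu_rotate_n_pred` column is a hypothetical stand-in for simulated DM, since the process model itself is run outside Python. Building the meta-features from out-of-fold predictions via `cross_val_predict` is a common safeguard chosen for this sketch, not something the text specifies. The RFR and SVR settings reuse the tuned values reported in the Results section.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): three scaled features and a DM-like target.
n_train, n_test = 120, 30
X = rng.random((n_train + n_test, 3))
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.05, n_train + n_test)

# Hypothetical placeholder for the process-model output; in the study this
# column would come from running EU-Rotate_N, not from a fitted estimator.
eu_rotate_n_pred = y + rng.normal(0, 0.2, y.shape)

X_tr, X_te = X[:n_train], X[n_train:]
y_tr, y_te = y[:n_train], y[n_train:]
eu_tr, eu_te = eu_rotate_n_pred[:n_train], eu_rotate_n_pred[n_train:]

# First layer: the two ML base models.
rfr = RandomForestRegressor(n_estimators=18, random_state=0)
svr = SVR(kernel="rbf", C=100, epsilon=0.01, gamma=0.01)

# Meta-features for training: one column per base model.
meta_train = np.column_stack([
    eu_tr,
    cross_val_predict(rfr, X_tr, y_tr, cv=3),
    cross_val_predict(svr, X_tr, y_tr, cv=3),
])

# Refit the base models on the full training set before predicting the test set.
rfr.fit(X_tr, y_tr)
svr.fit(X_tr, y_tr)
meta_test = np.column_stack([eu_te, rfr.predict(X_te), svr.predict(X_te)])

# Second layer: the MLP meta-model learns how to combine the base predictions.
meta = MLPRegressor(hidden_layer_sizes=(32,), solver="lbfgs",
                    max_iter=10000, random_state=0)
meta.fit(meta_train, y_tr)
stacked_pred = meta.predict(meta_test)

rmse = float(np.sqrt(np.mean((stacked_pred - y_te) ** 2)))
print(f"stacked RMSE on synthetic test data: {rmse:.3f}")
```

The key design point is that the meta-model never sees the raw features: its only inputs are the three base-model predictions.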

2.2.4. Random Forest Regression and Support Vector Regression

RFR and SVR are classic ML algorithms that have performed well in many fields of application [62,63,64]. RF is an effective ensemble method that relies on decision trees as the base learner [65]. It is an extended variant of the bagging algorithm: during training, it draws bootstrap samples from the original dataset and, at each node, randomly selects a small subset of explanatory variables as candidate features, thereby constructing multiple decision trees. RFR then averages the regression values of the individual trees to obtain the final result. The main hyper-parameters of RFR models are n_estimators and max_features, which represent the number of base decision trees and the number of features selected randomly at each node during tree induction, respectively [66]. In our case, a grid search was used to determine the two hyper-parameters. The n_estimators value was tuned between 1 and 400, and max_features was tuned between \sqrt{p} and p, where p is the total number of input parameters [62,67,68]. The RFR implementation used in this study is the Random Forest regressor from the sklearn library in Python, which provides functions for model training, prediction, variable importance ranking, and plotting.
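A grid search over the two hyper-parameters could look roughly like the sketch below, which uses scikit-learn's `GridSearchCV` with three-fold cross-validation and MSE scoring. The synthetic data, the coarse subset of `n_estimators` values, and rounding \sqrt{p} down to an integer are assumptions made for illustration.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 90 samples with p = 9 input features.
X = rng.random((90, 9))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(0, 0.05, 90)

p = X.shape[1]
param_grid = {
    # Coarse subset of the 1 to 400 range, kept small so the sketch runs fast.
    "n_estimators": [1, 10, 50, 100],
    # Integer candidates from floor(sqrt(p)) up to p.
    "max_features": list(range(max(1, int(math.sqrt(p))), p + 1)),
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,                              # three-fold cross-validation
    scoring="neg_mean_squared_error",  # MSE as the tuning metric
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```

scikit-learn maximizes scores, so MSE is exposed as its negative; negating `best_score_` recovers the cross-validated MSE.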

SVR is a supervised learning model derived from the SVM, an ML algorithm that is based on statistical learning theory and is suitable for small-sample classification [69]. The task of SVR is to bring all the sample points as close as possible to the regression hyperplane and to minimize the total deviation between the sample points and the hyperplane [70]. A kernel function is adopted to map the input sample space into a high-dimensional feature space. The kernel function can be either linear or non-linear, depending on the relationship between the independent and dependent variables. A linear model f(x) is then constructed in the derived feature space to minimize the errors [71]. The aim of SVR is therefore to find an optimal hyperplane, which is obtained by solving the following optimization problem:

\min \frac{1}{2} ║w║^{2} + C \sum_{i=1}^{n} (ξ_{i} + ξ_{i}^{*})

s.t. f(x_{i}) - y_{i} \leq ε + ξ_{i},

y_{i} - f(x_{i}) \leq ε + ξ_{i}^{*},

ξ_{i}, ξ_{i}^{*} \geq 0, i = 1, 2, \dots, n

(2)

where the hyper-parameter C is the penalty factor, which balances the empirical risk against the structural risk and thereby affects the overfitting and underfitting of the model [72]; ε is a constant called the tube size; and ξ_{i} and ξ_{i}^{*} are slack variables representing the deviation of training samples outside the ε-insensitive zone [70]. In this research, the radial basis function (rbf) kernel is used, defined as follows:

K (x, y) = \exp (- γ ║ x - y ║^{2})

(3)
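To make Equations (2) and (3) concrete, the sketch below evaluates the rbf kernel for two vectors and the ε-insensitive loss for a single sample. At the optimum of Equation (2), the slack ξ_{i} + ξ_{i}^{*} of a sample equals this loss. The function names and the sample inputs are illustrative assumptions.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """K(x, y) = exp(-gamma * ||x - y||^2), as in Equation (3)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float) -> float:
    """Deviation beyond the epsilon tube; zero for points inside the tube."""
    return max(abs(y_true - y_pred) - epsilon, 0.0)


# Identical points give the kernel's maximum value.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.01))

# Similarity decays as the squared distance grows (here ||x - y||^2 = 25).
print(rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.01))

# A small residual inside the tube costs nothing; a large one is penalized.
print(epsilon_insensitive_loss(1.0, 1.005, epsilon=0.01))
print(epsilon_insensitive_loss(1.0, 1.5, epsilon=0.01))
```

A small γ makes the kernel decay slowly with distance, giving a smoother regression surface, while a larger ε widens the tube within which errors are ignored.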

2.2.5. Multi-Layer Perceptron

MLP is a supervised machine learning method for predicting a continuous range of values and is often used in regression tasks. As one of the most widely used neural network models, the MLP constitutes a multi-layer feedforward network that is trained by the error backpropagation algorithm. The distinguishing characteristic of the MLP is the forward propagation of signals and the backward propagation of errors. In the forward propagation process, the input signal is processed layer by layer through the input layer and the hidden layer. Upon reaching the output layer, the error between the predicted and actual values is computed. This error is then propagated backward to the input layer by adjusting the weights of each neuron in the direction of the gradient. Subsequent to calculating the error for each neuron, the weights of each layer are updated accordingly.

MLP contains an input layer, several hidden layers and an output layer. Its main hyper-parameters are the number of hidden layers, the number of neurons in each layer, the activation function, the optimizer, and the maximum number of iterations. Using a grid search with three-fold cross-validation, three activation functions (relu, tanh, and logistic) and three optimizers (adam, sgd, and lbfgs) were compared, with the maximum number of iterations set to 10,000, and the best combination of parameters was selected. Based on experience, the number of neurons is 2^{n}, where n ranges from 5 to 8 [73,74,75]. In this study, a neural network was initially constructed with a single hidden layer of 32 neurons. The architecture was then optimized by incrementally increasing the number of neurons and the number of layers. During the training stage, the number of neurons in the hidden layer was adjusted according to the evaluation metrics. If the metrics indicated that performance had plateaued despite increases in the number of neurons, an additional hidden layer was considered. To reduce computational complexity, the number of neurons in each layer was restricted to the range from 32 to 256.
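The search space described above can be enumerated in a few lines. The dictionary keys follow scikit-learn's `MLPRegressor` parameter names, where the optimizer is called `solver`. Using that library for the MLP and capping the search at two hidden layers are assumptions of this sketch, since the text does not state either.

```python
from itertools import product

activations = ["relu", "tanh", "logistic"]
solvers = ["adam", "sgd", "lbfgs"]

# Layer widths of the form 2^n for n = 5..8, i.e. 32 to 256 neurons.
widths = [2 ** n for n in range(5, 9)]

# Assumption: at most two hidden layers are explored.
architectures = [(w,) for w in widths] + list(product(widths, repeat=2))

param_grid = {
    "hidden_layer_sizes": architectures,
    "activation": activations,
    "solver": solvers,
    "max_iter": [10000],
}

n_candidates = len(architectures) * len(activations) * len(solvers)
print(widths)
print(n_candidates)
```

A dictionary of this shape can be passed as the `param_grid` argument of a grid search. Starting from the single-layer candidates and widening before deepening mirrors the incremental strategy described in the paragraph above.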

2.2.6. Evaluation Metrics

The performance of the prediction models is evaluated with four metrics: root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R²) and mean square error (MSE). The first three metrics are used for performance evaluation, and the MSE is used for the hyper-parameter tuning of RFR and SVR. The formulas for the metrics are as follows:

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}}

(4)

MAE = \frac{1}{N} \sum_{i=1}^{N} |y_{i} - \hat{y}_{i}|

(5)

R^{2} = 1 - \frac{\sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{N} (y_{i} - \bar{y})^{2}}

(6)

MSE = \frac{1}{N} \sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}

(7)

where y_{i} represents the observed value, \hat{y}_{i} represents the predicted value, \bar{y} represents the mean value of the observed reference samples, and N is the total number of samples.
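The four metrics can be computed directly from paired observed and predicted values, as in the dependency-free sketch below. R² is implemented in its standard form, one minus the ratio of the residual sum of squares to the total sum of squares about the observed mean. The sample values are made up for illustration and are not results from the study.

```python
import math
from typing import Sequence


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean square error."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(mse(observed, predicted))


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical DM values in t/ha, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]

print(rmse(observed, predicted), mae(observed, predicted),
      r2(observed, predicted), mse(observed, predicted))
```

RMSE and MAE carry the units of the target (t/ha here), whereas R² is dimensionless and can become negative when a model predicts worse than the observed mean.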

3. Results

3.1. Prediction Performances of the EU-Rotate_N Model

Since DM values were not used during the calibration phase of the EU-Rotate_N model, its DM prediction performance was investigated on both the training and test sets. For the training set, R² reaches 0.779, RMSE 0.257 t/ha, and MAE 0.159 t/ha (Figure 2a). The corresponding performance for the test set is R² = 0.732, RMSE = 0.925 t/ha, and MAE = 0.836 t/ha (Figure 2b).

3.2. Prediction Performance of the ML Model

Conventionally, ML models can be employed independently to estimate the DM yield. This study investigates the performance of RFR and SVR models in predicting DM yield to assess the viability of using these models individually. For both ML models, raw parameters need to be preprocessed to extract features pertinent to pakchoi DM. Subsequently, ML models were constructed using these features in conjunction with DM values. Due to the limited amount of measured data, especially the DM data, the dataset was augmented with simulation data generated by the EU-Rotate_N model, and model training was conducted using this enhanced dataset.

3.2.1. Feature Selection

Representational features are particularly essential for model performance and generalization [76]. To minimize the interference of irrelevant input parameters on the relationship between independent and dependent variables and to develop a model with robust generalization capabilities, a feature selection process was conducted. The input parameters for the ML model can be divided into four categories: time factors, meteorological factors, initial soil factors, and cultivation factors (Table 5). The Pearson correlation coefficients between pakchoi DM and input parameters indicated that, for the expanded training set, most meteorological factors exhibited a significant positive correlation with DM yield, with the exception of RHmin (Figure 3). The correlation between pakchoi DM and initial soil factors also indicated significant positive associations. Both cultivation factors showed a significant positive correlation with DM, with correlation coefficients of 0.15 and 0.52 for inorganic and irrigation, respectively. Consequently, the parameters demonstrating significant correlations were selected as input features for the ML models.

3.2.2. Performances of the RFR Models

Following normalization, the features are introduced into the RFR models for training. The original training set and the expanded training set are used separately with their respective features. Thereafter, we compare the performance of the models on an independent test set.

The optimal hyper-parameters of the RFR model were selected by a grid search with three-fold cross-validation, using MSE as the evaluation metric. During tuning, the R² on the validation set reached up to 0.97. For the expanded training set, the grid search selected about 18 trees and three features.
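The tuning procedure, a grid search scored by three-fold cross-validated MSE, can be sketched in plain Python. To keep the example self-contained, a small k-nearest-neighbour regressor stands in for the RFR model; the stand-in model, the interleaved fold assignment, the grid values, and the data are all assumptions.

```python
def knn_predict(train_x, train_y, x, n_neighbors):
    """Stand-in model: mean target of the n_neighbors closest training points."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / n_neighbors

def cross_val_mse(xs, ys, n_neighbors, n_folds=3):
    """MSE averaged over n_folds interleaved validation folds."""
    fold_mse = []
    for fold in range(n_folds):
        val_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        trn_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        trn_x = [xs[i] for i in trn_idx]
        trn_y = [ys[i] for i in trn_idx]
        errors = [(knn_predict(trn_x, trn_y, xs[i], n_neighbors) - ys[i]) ** 2
                  for i in val_idx]
        fold_mse.append(sum(errors) / len(errors))
    return sum(fold_mse) / n_folds

def grid_search(xs, ys, grid):
    """Evaluate every candidate and return the one with the lowest CV MSE."""
    scores = {value: cross_val_mse(xs, ys, value) for value in grid}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical smooth growth curve: day of growth -> DM (t/ha).
xs = [float(d) for d in range(1, 31)]
ys = [0.002 * d * d for d in xs]

best_n, scores = grid_search(xs, ys, grid=[1, 2, 15])
print(best_n, scores)
```

The same loop applies to any model and any grid: only the candidate values and the fit and predict step change.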

The predictive performance of the RFR model on the expanded training set and the independent test set is shown in Figure 4. For the expanded training set, the R² is 0.998, and the RMSE and MAE are 0.017 t/ha and 0.010 t/ha, respectively. The RFR model trained with the expanded training set also predicts the DM of the test set well, with an R² of 0.813 and RMSE and MAE values of 0.326 t/ha and 0.221 t/ha, respectively.

3.2.3. Performance of the SVR Model

Hyper-parameters of the SVR model were determined by a grid search with three-fold cross-validation, using MSE as the evaluation metric. During the tuning phase, three kernel functions (linear, polynomial, and rbf) were compared, and the rbf kernel was chosen as the optimal kernel for the SVR model. The optimal values of the penalty parameter C, the boundary region ε, and the kernel coefficient γ are 100, 0.01, and 0.01, respectively.

For the SVR model built on the expanded training set, the performance on the training set is as good as that of the RFR model, but the performance on the test set is poorer, with R², RMSE, and MAE of 0.778, 0.347 t/ha, and 0.326 t/ha, respectively (Figure 5).

3.3. Performance of the Stacking Model

3.3.1. Structure of the MLP Model

The optimal structure of the MLP model was obtained by a grid search (Figure 6). The input layer contains three neurons, corresponding to the outputs of the three base models in the first layer of the stacking model. The four hidden layers contain 32, 32, 64 and 256 neurons, respectively. ReLU is used as the activation function and lbfgs as the optimizer.
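To give a sense of the size of this meta-model, the number of trainable parameters of a fully connected network with these layer widths can be counted directly. The sketch assumes a single output neuron (the predicted DM yield) and one bias per neuron; neither detail is stated in the text.

```python
def count_mlp_parameters(layer_sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 3 inputs (base-model outputs), hidden layers 32-32-64-256, 1 output (assumed).
layer_sizes = [3, 32, 32, 64, 256, 1]
total_parameters = count_mlp_parameters(layer_sizes)
print(total_parameters)  # 20193
```

Most of these parameters sit between the last two hidden layers (64 × 256 + 256 = 16,640).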

3.3.2. The Prediction Performance of the Stacking Model

The prediction results for the test set indicate that the Stacking-MLP model predicts the pakchoi DM yield more accurately than any single base model and than the meta-models based on averaging (AVG), linear regression (LR), or a Back Propagation Neural Network (BP) (Table 6). The R² of the Stacking-MLP model reaches 0.999 on the training set and 0.834 on the test set, which indicates that the Stacking-MLP model trained with the simulated data can predict the DM yield of pakchoi accurately. The RMSE and MAE of the Stacking-MLP model are 0.283 t/ha and 0.196 t/ha for the test set, the smallest values among all models. A comparison of the four meta-models shows that the MLP method is comparable to, though slightly better than, the BP method in fusion ability, and clearly superior to the AVG and LR methods. The R² of the Stacking-MLP model is 0.013 higher than that of the Stacking-BP model, and its RMSE and MAE are 0.013 t/ha and 0.06 t/ha lower. Compared with the Stacking-AVG model, the R² of the Stacking-MLP model is 0.059 higher, and its RMSE and MAE are 0.25 t/ha and 0.265 t/ha lower.

The prediction performances of the Stacking-MLP models trained with the original training set and with the expanded training set were compared. The test-set results show that the prediction ability of the Stacking-MLP model improved substantially with the addition of simulated training data (Table 7): R² increased by 0.352, RMSE decreased by 0.361 t/ha, and MAE decreased by 0.285 t/ha. For the RFR model, the training-set R² with the expanded training set is the same as that with the original training set, while the RMSE and MAE are 0.006 t/ha and 0.003 t/ha smaller, respectively. On the test set, the R² of the RFR model trained with the expanded training set is about 0.331 higher than that of the model trained with the original training set, and the RMSE and MAE are 0.312 t/ha and 0.242 t/ha smaller. The SVR model built on the original training set performs excellently in training, with R² of 0.994, RMSE of 0.025 t/ha, and MAE of 0.015 t/ha, but poorly on the test set, with R² of 0.079, RMSE of 0.719 t/ha, and MAE of 0.44 t/ha. For all models established with the expanded training set, R², RMSE, and MAE are equal or better in both the training and testing stages.

3.3.3. The Model Performance for Three Batches

Because of differences in weather conditions, the DM yields vary among the three batches, with average DM yields of 0.400 t/ha, 2.053 t/ha, and 1.174 t/ha, respectively (Figure 7). The highest DM appears in the second batch, which grew during the high temperatures of June and July, when the average temperature reached 27.6 °C and pakchoi grew vigorously. The growing period of each of the three batches is about one month.

The prediction results for the test set of the three batches (Table 8) show that the stacking model performs best on the second batch, where its R² reaches 0.983 and the R² of all base models is above 0.925. For this batch, the RMSE is 0.388 t/ha, equivalent to 18.9% of the batch's average DM yield, and the MAE is 0.295 t/ha, equivalent to 14.4%. In comparison, the prediction errors for the first and third batches reach 24% and 24.2% of the DM yields in RMSE and 19.5% and 18.3% in MAE. Similarly, the R² values of the EU-Rotate_N model for the first and third batches are lower than that for the second batch, with the lowest value of 0.237 appearing in the first batch.
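The relative errors quoted for the second batch follow from dividing each error by that batch's average DM yield (2.053 t/ha, from the preceding paragraph). A short check, using only values given in the text:

```python
def relative_error_percent(error_t_ha, mean_yield_t_ha):
    """Express an absolute error as a percentage of the mean DM yield."""
    return 100.0 * error_t_ha / mean_yield_t_ha

mean_dm_batch2 = 2.053  # t/ha, average DM yield of the second batch
rmse_batch2 = 0.388     # t/ha, stacking model, test set
mae_batch2 = 0.295      # t/ha, stacking model, test set

rmse_pct = round(relative_error_percent(rmse_batch2, mean_dm_batch2), 1)
mae_pct = round(relative_error_percent(mae_batch2, mean_dm_batch2), 1)
print(rmse_pct, mae_pct)  # 18.9 14.4
```

Normalizing by the batch mean matters here because the absolute errors of the first batch are measured against a much smaller average yield (0.400 t/ha).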

3.3.4. The Model Performance for Six Nitrogen Treatments

For the six nitrogen treatments, the test-set results show that the Stacking-MLP model performs well for the N4 treatment, with a high R² of 0.890 and a low RMSE of 0.234 t/ha (Figure 8). The RFR model performs similarly well to the Stacking-MLP model for N4, whereas the SVR model attains its highest R² (0.846) for N2 and its lowest RMSE and MAE for N1 (Table 9). The EU-Rotate_N model performs best on the N5 treatment. Across the six nitrogen treatments, the R² of the Stacking-MLP model ranges from 0.792 to 0.890, the RMSE from 0.220 t/ha to 0.432 t/ha, and the MAE from 0.154 t/ha to 0.307 t/ha, which suggests that performance differs only modestly among the six treatments.

The performance of the various ML models was evaluated and compared across metrics (R², RMSE, and MAE) to elucidate their strengths and weaknesses in predicting the dry matter yield of pakchoi. The models under consideration included the RFR, SVR and stacking models. As can be seen in Table 7, the models trained on the expanded training set performed better on the test set than those trained on the original training set. With a larger dataset, the models may have been able to learn more intricate feature relationships and interactions that were not as apparent in the original training set, thereby improving the predictive accuracy. The RFR model trained on the expanded training set outperformed the SVR model, with R², RMSE, and MAE of 0.813, 0.326 t/ha, and 0.221 t/ha, respectively, in the testing stage, compared with 0.778, 0.347 t/ha, and 0.326 t/ha for the SVR model. This discrepancy in performance may be attributed to the non-linear and complex relationships in the data, which the RFR model is well suited to handle. RFR's strength lies in its ability to capture intricate patterns and non-linear relationships, making it particularly effective in scenarios where data relationships are not easily defined by a clear margin. In this study, the complexity of the data, which encompass a multitude of environmental and soil factors, posed a significant challenge to the SVR model, even with the rbf kernel selected in Section 3.2.3. Consequently, the RFR model's ensemble of many decision trees proved more adept at modeling the complex interactions within our dataset. Furthermore, the stacking ensemble approach outperformed both the RFR and SVR models individually, achieving R², RMSE, and MAE of 0.834, 0.283 t/ha, and 0.196 t/ha, respectively, in the testing stage.

The stacking model performs better because it leverages the strengths of each individual model and corrects for their weaknesses. The meta-regressor within the stacking model can learn which base model to trust more on different subsets of the data, allowing it to adapt and provide more accurate predictions.

4. Discussion

The purpose of this study is to predict the in-season DM of pakchoi using a stacking model that integrates machine learning (ML) models with the process-based EU-Rotate_N model. ML methods such as RF, SVM, and artificial neural networks (ANNs) are widely used in yield forecasting for crops and vegetables [77,78,79]. However, yield prediction for pakchoi, a popular regional vegetable, has received little attention in ML studies. In terms of prediction accuracy, our method performs well, with R² = 0.834 and RMSE = 0.283 t/ha on the test set. Among the individual base models, the RFR model outperforms the SVR and EU-Rotate_N models and comes closest to the stacking model, with R² = 0.81 and RMSE = 0.33 t/ha. The performance of RFR in pakchoi DM yield prediction indicates that it is a powerful ML model, which is consistent with the results reported in previous studies [80,81,82]. The stacking model is also superior to the RFR and SVR models in prediction accuracy for the six N treatments, except that its R² for the N0 treatment is 0.004 lower than that of the SVR model and its R² for the N5 treatment is 0.007 lower than that of the RFR model. The results suggest that, although RFR also combines multiple learners, the stacking model, with its heterogeneous multi-layer mechanism, is better equipped to handle more intricate nonlinear problems and therefore achieves better outcomes than the RFR model.

The short growth period of pakchoi makes it difficult to collect enough data for ML models. To improve the generalizability of the ML models, a data augmentation method was explored in which the EU-Rotate_N model simulates unobserved DM yields. The model performance on real test data shows that the augmentation method substantially increases the prediction accuracy of the stacking model: R² rises by 0.352 and RMSE falls by 0.361 t/ha, which is 29.5% of the average DM yield of the three batches. The improvement is similar to that reported in other studies using different augmentation methods [83,84]. The process model-based augmentation method can be used to increase the sample size when estimating the yields of crops or vegetables with ML models, reducing the time required to accumulate data [85]. The proposed stacking model also needs future weather data to predict DM yield, which can be obtained from short-term weather forecasts. Meteorological factors that are not directly available from weather forecasts can be calculated from known factors; for example, relative humidity can be calculated from the dew point temperature and the current temperature.
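As an illustration of deriving relative humidity from dew point and air temperature, the sketch below uses the Magnus approximation for saturation vapour pressure. The choice of this formula and its coefficients (17.625 and 243.04 °C) is an assumption; the text does not state which relationship was used.

```python
import math

MAGNUS_A = 17.625  # dimensionless Magnus coefficient (assumed parameter set)
MAGNUS_B = 243.04  # degrees Celsius (assumed parameter set)

def saturation_vapour_pressure(temp_c):
    """Saturation vapour pressure in hPa from the Magnus approximation."""
    return 6.1094 * math.exp(MAGNUS_A * temp_c / (MAGNUS_B + temp_c))

def relative_humidity(temp_c, dew_point_c):
    """Relative humidity (%) = actual / saturation vapour pressure * 100."""
    actual = saturation_vapour_pressure(dew_point_c)
    saturation = saturation_vapour_pressure(temp_c)
    return 100.0 * actual / saturation

# Hypothetical forecast values: air temperature 25 °C, dew point 15 °C.
rh = relative_humidity(25.0, 15.0)
print(round(rh, 1))
```

When the dew point equals the air temperature the function returns 100%, and the value falls as the gap between the two widens.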

Upon analyzing the performance of the models across various batches and nitrogen treatments, it became evident that there were significant variations in predictive accuracy. These disparities prompt an exploration of the underlying causes, which are multifaceted and interrelated. The variations in model performance across different batches and nitrogen treatments can be attributed, in part, to the differing environmental factors. These factors include temperature, humidity, and soil conditions, which can all influence plant growth and the effectiveness of nitrogen treatments.

The DM predictions for the three batches were investigated in this research, and substantial variations were observed among them. The second batch had the highest DM, followed by the third batch and then the first. The second batch, which experienced the highest GDD, correspondingly had the highest DM prediction, in line with the observations. The third batch, which had a slightly higher GDD than the first and a higher nitrate nitrogen content (NO₃ content), also showed a greater DM prediction (Figure 9).

A comparison of the NO₃ contents across the three batches shows that the second batch had a lower NO₃ content than the other batches under the first four nitrogen treatments (N0-N3), yet it still produced the highest DM. This result indicates that GDD is a primary factor influencing pakchoi growth. Nevertheless, the precise impact of the NO₃ content on DM requires further analysis and confirmation.

The stacking model exhibits performance differences among the six nitrogen treatments. The base models show similar prediction differences among treatments, indicating that the stacking model is influenced to a large extent by the base models. However, the overall differences among the six treatments are not significant for either the base models or the stacking model. This indicates that the stacking model can be applied to various fertility management schemes.

Besides the base models, the performance of the stacking method is also influenced by the meta-model. Among the four ensemble methods evaluated, the MLP method outperformed the others, showing the best results for both the training and testing sets. The AVG method performed worst, possibly because it simply averages the results of the base models and ignores the fact that their prediction performances are uneven [28]. In this research, the RFR model performs better than the other two base models, and the AVG method dilutes this advantage. In contrast, the LR, BP, and MLP methods assign different weights to the base-model outputs during the ensemble process, which makes them more effective.
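The contrast between simple averaging and a weighted combination can be sketched with a small least-squares meta-model. This is only an illustration of the idea: the base-model predictions and targets are hypothetical, the linear meta-model has no intercept, and the comparison is made on the fitting data, so it says nothing about held-out performance.

```python
def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_linear_meta(base_preds, target):
    """Least-squares weights (no intercept) via the normal equations."""
    k = len(base_preds[0])
    xtx = [[sum(row[i] * row[j] for row in base_preds) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(base_preds, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def mse(weights, base_preds, target):
    """Mean squared error of the weighted combination of base predictions."""
    errors = [(sum(w * p for w, p in zip(weights, row)) - t) ** 2
              for row, t in zip(base_preds, target)]
    return sum(errors) / len(errors)

# Hypothetical DM targets (t/ha) and predictions of three base models per sample.
target = [0.2, 0.6, 1.0, 1.5, 2.1, 2.6]
base_preds = [
    [0.25, 0.5, 0.1], [0.55, 0.7, 0.9], [1.05, 1.4, 0.7],
    [1.45, 1.3, 1.9], [2.15, 2.6, 1.7], [2.55, 2.2, 3.0],
]

avg_weights = [1 / 3, 1 / 3, 1 / 3]
lr_weights = fit_linear_meta(base_preds, target)
mse_avg = mse(avg_weights, base_preds, target)
mse_lr = mse(lr_weights, base_preds, target)
print(lr_weights, mse_avg, mse_lr)
```

Because equal weights are one particular choice within the linear family, the fitted weights can never do worse than averaging on the data used for fitting; whether the advantage carries over to new data depends on how well the weights generalize.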

This study introduces a novel approach by integrating a crop mechanism model with machine learning techniques to predict the pakchoi DM yield. This hybrid model leverages the strengths of both methodologies, providing a robust tool for agricultural optimization. The predictive capabilities of our model offer valuable insights into the potential yield of pakchoi under various conditions. By accurately forecasting yields, farmers and agricultural managers can make informed decisions about planting times, crop densities, and resource allocation.

The findings suggest that the model can be used to optimize nitrogen application. By predicting the yield response to different nitrogen treatments, we can identify the optimal nitrogen dosage, reducing waste and environmental impacts while maximizing crop yield. Moreover, the model’s capability extends beyond nitrogen management to inform comprehensive water–fertilizer strategies. The interdependence of water and fertilizer in crop growth means that optimizing one without considering the other could lead to suboptimal outcomes. The model can provide a reference for reducing the inputs of water and fertilizer while ensuring the output in actual production.

5. Conclusions

In this study, the EU-Rotate_N model and two ML models (RFR and SVR) were integrated to build a stacking model for predicting pakchoi DM. The test results indicate that the stacking model is capable of predicting the DM yield of pakchoi under multiple N treatments with a high accuracy. The stacking model proposed in this research can be applied to predict the biomass and yield of pakchoi.

This research demonstrates the effectiveness of the stacking model in predicting pakchoi yield with high accuracy. To extend the application of this model to other vegetable varieties, several steps would be involved. First, the stacking model would need to be adapted to account for the unique growth characteristics and environmental responses of the target vegetable crop. This could involve recalibrating the process-based model component (EU-Rotate_N) to reflect the specific physiological and agronomic parameters of the new crop. Additionally, the machine learning algorithms within the ensemble would require retraining using datasets that are representative of the new crop’s growth conditions and yield patterns. This retraining process would involve selecting appropriate features, optimizing model parameters, and validating the model’s performance against independent datasets. By following these steps, we can ensure that the stacking model maintains its predictive power across a diverse range of vegetable crops.

In future work, the model can be applied to other vegetable varieties by recalibrating the EU-Rotate_N model and retraining the ML models. We also aim to further improve the predictive accuracy of the model by integrating visual information.

Author Contributions

Conceptualization, C.W., X.X. and Y.Z.; methodology, C.W.; software, C.W.; validation, C.W. and X.X.; formal analysis, C.W. and X.X.; resources, Z.C., I.U., Z.Z. and M.M.; data curation, C.W. and X.X.; writing—original draft preparation, C.W.; writing—review and editing, X.X.; supervision, M.M.; project administration, M.M.; funding acquisition, M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by “The R&D Foundation of Jiangsu Province, China, grant number BE2022425” and “The Municipal Science and Technology Plan Project of Yangzhou, grant numbers YZ2021150 and YZ2022179”.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Zhang, D.; Wang, X.; Si, Z.; Zhao, X.; Yan, H.; Xu, B.; Chen, Y.; Cui, L. Deposition, dissipation, metabolism, and dietary risk assessment of chlorothalonil on pakchoi. J. Food Compos. Anal. 2024, 134, 106521.
Ren, Y.; Wang, W.; He, J.; Zhang, L.; Wei, Y.; Yang, M. Nitric oxide alleviates salt stress in seed germination and early seedling growth of pakchoi (Brassica chinensis L.) by enhancing physiological and bio-chemical parameters. Ecotoxicol. Environ. Saf. 2020, 187, 109785.
Kapusta-Duch, J.; Kopeć, A.; Piatkowska, E.; Borczak, B.; Leszczyńska, T. The beneficial effects of Brassica vegetables on human health. Rocz. Państwowego Zakładu Hig. 2012, 63, 389–395.
Duan, P.; Fan, C.; Zhang, Q.; Xiong, Z. Overdose fertilization induced ammonia-oxidizing archaea producing nitrous oxide in intensive vegetable fields. Sci. Total Environ. 2019, 650, 1787–1794.
Shahrajabian, M.H.; Sun, W.; Cheng, Q. A short review of health benefits and nutritional values of mung bean in sustainable agriculture. Pol. J. Agron. 2019, 30, 31–36.
Lee, H.; Wang, J.; Leblon, B. Using linear regression, random forests, and support vector machine with unmanned aerial vehicle multispectral images to predict canopy nitrogen weight in corn. Remote Sens. 2020, 12, 2071.
Fang, F.; Li, Y.; Yuan, D.; Zheng, Q.; Ding, J.; Xu, C.; Lin, W.; Li, Y. Distinguishing N₂O and N₂ ratio and their microbial source in soil fertilized for vegetable production using a stable isotope method. Sci. Total Environ. 2021, 801, 149694.
Liu, X.C.; Chen, L.; Li, S.Q.; Shi, Q.H.; Wang, X.Y. Effects of vermicompost fertilization on soil, tomato yield and quality in greenhouse. J. Appl. Ecol. 2021, 32, 549–556.
Wu, H.; Yue, Q.; Guo, P.; Xu, X.; Huang, X. Improving the AquaCrop model to achieve direct simulation of evapotranspiration under nitrogen stress and joint simulation-optimization of irrigation and fertilizer schedules. Agric. Water Manag. 2022, 266, 107599.
Jones, J.W.; Hoogenboom, G.; Porter, C.H.; Boote, K.J.; Batchelor, W.D.; Hunt, L.A.; Wilkens, P.W.; Singh, U.; Gijsman, A.J.; Ritchie, J.T. The DSSAT cropping system model. Eur. J. Agron. 2003, 18, 235–265.
Keating, B.A.; Carberry, P.S.; Hammer, G.L.; Probert, M.E.; Robertson, M.J.; Holzworth, D. An overview of APSIM, a model designed for farming systems simulation. Eur. J. Agron. 2003, 18, 267–288.
Vanuytrecht, E.; Raes, D.; Steduto, P.; Hsiao, T.C.; Fereres, E.; Heng, L.K.; Vila, M.G.; Moreno, P.M. AquaCrop: FAO’s crop water productivity and yield response model. Environ. Model. Softw. 2014, 62, 351–360.
Gaydon, D.; Singh, B.; Wang, E.; Poulton, P.; Ahmad, B.; Ahmed, F.; Akhter, S.; Ali, I.; Amarasingha, R.; Chaki, A.; et al. Evaluation of the APSIM model in cropping systems of Asia. Field Crops Res. 2017, 204, 52–75.
de Wit, A.; Boogaard, H.; Fumagalli, D.; Janssen, S.; Knapen, R.; van Kraalingen, D.; Supit, I.; van der Wijngaart, R.; van Diepen, K. 25 years of the WOFOST cropping systems model. Agric. Syst. 2019, 168, 154–167.
Rahn, C.; Zhang, K.; Lillywhite, R.; Ramos, C.; Doltra, J.; De Paz, J.M.; Riley, H.; Fink, M.; Nendel, C.; Thorup Kristensen, K.; et al. EU-Rotate_N–a decision support system–to predict environmental and economic consequences of the management of nitrogen fertiliser in crop rotations. Eur. J. Hortic. Sci. 2010, 75, 20–32.
Øvsthus, I.; Thorup-Kristensen, K.; Seljåsen, R.; Riley, H.; Dörsch, P.; Breland, T.A. Calibration of the EU-Rotate_N model with measured C and N mineralization from potential fertilizers and evaluation of its prediction of crop and soil data from a vegetable field trial. Eur. J. Agron. 2021, 129, 126336.
Sun, Y.; Zhang, J.; Wang, H.; Wang, L.; Li, H. Identifying optimal water and nitrogen inputs for high efficiency and low environment impacts of a greenhouse summer cucumber with a model method. Agric. Water Manag. 2019, 212, 23–34.
Zhang, K.F.; Li, C.; Hu, Z.F.; Huang, S.Q.; Chen, J.S.; Ma, X.F. Simulations of water cycle in the soil-crop system: Model improvement and validation. Appl. Ecol. Environ. Res. 2020, 18, 2163–2177.
Hua, B.; Cao, Z.; Zhang, K.; Xu, X.; Zhang, Y.; Dai, H.; Zhang, Z.; Jiang, J.; Miao, M. Simulation of greenhouse cucumber growth, water and nitrogen dynamics in areas with high groundwater (HG) levels using the HG EU-Rotate_N model. Veg. Res. 2022, 2, 16.
Xu, X.; Wang, C.; Wang, H.; Zhang, Y.; Cao, Z.; Zhang, Z.; Dai, H.; Miao, M. Development and performance evaluation of an APP for vegetable fertilization and irrigation management originated from EU-Rotate_N. Agric. Water Manag. 2023, 289, 108520.
Maiorano, A.; Martre, P.; Asseng, S.; Ewert, F.; Müller, C.; Rötter, R.P.; Ruane, A.C.; Semenov, M.A.; Wallach, D.; Wang, E.; et al. Crop model improvement reduces the uncertainty of the response to temperature of multi-model ensembles. Field Crops Res. 2017, 202, 5–20.
Wallach, D.; Martre, P.; Liu, B.; Asseng, S.; Ewert, F.; Thorburn, P.J.; van Ittersum, M.; Aggarwal, P.K.; Ahmed, M.; Basso, B.; et al. Multimodel ensembles improve predictions of crop–environment–management interactions. Glob. Chang. Biol. 2018, 24, 5072–5083.
Mayer, D.G.; Chandra, K.A.; Burnett, J.R. Improved crop forecasts for the Australian macadamia industry from ensemble models. Agric. Syst. 2019, 173, 519–523.
Pohanková, E.; Hlavinka, P.; Kersebaum, K.-C.; Rodríguez, A.; Balek, J.; Bednařík, M.; Dubrovský, M.; Gobin, A.; Hoogenboom, G.; Moriondo, M.; et al. Expected effects of climate change on the production and water use of crop rotation management reproduced by crop model ensemble for Czech Republic sites. Eur. J. Agron. 2022, 134, 126446.
Rodríguez, A.; Ruiz-Ramos, M.; Palosuo, T.; Carter, T.; Fronzek, S.; Lorite, I.; Ferrise, R.; Pirttioja, N.; Bindi, M.; Baranowski, P.; et al. Implications of crop model ensemble size and composition for estimates of adaptation effects and agreement of recommendations. Agric. For. Meteorol. 2019, 264, 351–362.
Hassall, K.L.; Coleman, K.; Dixit, P.N.; Granger, S.J.; Zhang, Y.; Sharp, R.T.; Wu, L.; Whitmore, A.P.; Richter, G.M.; Collins, A.L.; et al. Exploring the effects of land management change on productivity, carbon and nutrient balance: Application of an Ensemble Modelling Approach to the upper River Taw observatory, UK. Sci. Total Environ. 2022, 824, 153824.
Hossard, L.; Bregaglio, S.; Philibert, A.; Ruget, F.; Resmond, R.; Cappelli, G.; Delmotte, S. A web application to facilitate crop model comparison in ensemble studies. Environ. Model. Softw. 2017, 97, 259–270.
Gao, Y.; Wallach, D.; Hasegawa, T.; Tang, L.; Zhang, R.; Asseng, S.; Kahveci, T.; Liu, L.; He, J.; Hoogenboom, G. Evaluation of crop model prediction and uncertainty using Bayesian parameter estimation and Bayesian model averaging. Agric. For. Meteorol. 2021, 311, 108686.
Yu, J.; Tan, S.; Zhan, J. Multiple model averaging methods for predicting regional rice yield. Agron. J. 2023, 115, 635–646.
Zheng, J.; Zhang, S. Improving rice phenology simulations based on the Bayesian model averaging method. Eur. J. Agron. 2023, 142, 126646.
Jha, P.K.; Ines, A.V.; Han, E.; Cruz, R.; Prasad, P.V. A comparison of multiple calibration and ensembling methods for estimating genetic coefficients of CERES-Rice to simulate phenology and yields. Field Crops Res. 2022, 284, 108560.
Luo, Q.; Hoogenboom, G.; Yang, H. Uncertainties in assessing climate change impacts and adaptation options with wheat crop models. Theor. Appl. Climatol. 2022, 149, 805–816.
Lu, Y.; Chibarabada, T.P.; Ziliani, M.G.; Onema, J.M.K.; McCabe, M.F.; Sheffield, J. Assimilation of soil moisture and canopy cover data improves maize simulation using an under-calibrated crop model. Agric. Water Manag. 2021, 252, 106884.
Ehrhardt, F.; Soussana, J.-F.; Bellocchi, G.; Grace, P.; McAuliffe, R.; Recous, S.; Sándor, R.; Smith, P.; Snow, V.; de Antoni Migliorati, M.; et al. Assessing uncertainties in crop and pasture ensemble model simulations of productivity and N₂O emissions. Glob. Chang. Biol. 2018, 24, e603–e616.
Sándor, R.; Ehrhardt, F.; Grace, P.; Recous, S.; Smith, P.; Snow, V.; Soussana, J.-F.; Basso, B.; Bhatia, A.; Brilli, L.; et al. Ensemble modelling of carbon fluxes in grasslands and croplands. Field Crops Res. 2020, 252, 107791.
Leng, G.; Hall, J.W. Predicting spatial and temporal variability in crop yields: An inter-comparison of machine learning, regression and process-based models. Environ. Res. Lett. 2020, 15, 044027.
Abrougui, K.; Gabsi, K.; Mercatoris, B.; Khemis, C.; Amami, R.; Chehaibi, S. Prediction of organic potato yield using tillage systems and soil properties by artificial neural network (ANN) and multiple linear regressions (MLR). Soil Tillage Res. 2019, 190, 202–208.
Xu, X.; Gao, P.; Zhu, X.; Guo, W.; Ding, J.; Li, C.; Zhu, M.; Wu, X. Design of an integrated climatic assessment indicator (ICAI) for wheat production: A case study in Jiangsu Province, China. Ecol. Indic. 2019, 101, 943–953.
Gyamerah, S.A.; Ngare, P.; Ikpe, D. Probabilistic forecasting of crop yields via quantile random forest and Epanechnikov Kernel function. Agric. For. Meteorol. 2020, 280, 107808.
Roell, Y.E.; Beucher, A.; Møller, P.G.; Greve, M.B.; Greve, M.H. Comparing a random forest based prediction of winter wheat yield to historical yield potential. Agronomy 2020, 10, 395.
Fei, S.; Chen, Z.; Li, L.; Ma, Y.; Xiao, Y. Bayesian model averaging to improve the yield prediction in wheat breeding trials. Agric. For. Meteorol. 2023, 328, 109237.
Paudel, D.; Boogaard, H.; de Wit, A.; Janssen, S.; Osinga, S.; Pylianidis, C.; Athanasiadis, I.N. Machine learning for large-scale crop yield forecasting. Agric. Syst. 2021, 187, 103016.
Tedesco, D.; de Almeida Moreira, B.R.; Júnior, M.R.B.; Papa, J.P.; da Silva, R.P. Predicting on multi-target regression for the yield of sweet potato by the market class of its roots upon vegetation indices. Comput. Electron. Agric. 2021, 191, 106544.
Gómez, D.; Salvador, P.; Sanz, J.; Casanova, J.L. Potato yield prediction using machine learning techniques and sentinel 2 data. Remote Sens. 2019, 11, 1745.
Wei, M.C.F.; Maldaner, L.F.; Ottoni, P.M.N.; Molin, J.P. Carrot yield mapping: A precision agriculture approach based on machine learning. AI 2020, 1, 229–241.
Taşan, S.; Cemek, B.; Taşan, M.; Cantürk, A. Estimation of eggplant yield with machine learning methods using spectral vegetation indices. Comput. Electron. Agric. 2022, 202, 107367.
Zhang, N.; Zhou, X.; Kang, M.; Hu, B.G.; Heuvelink, E.; Marcelis, L.F. Machine learning versus crop growth models: An ally, not a rival. AoB Plants 2023, 15, plac061.
Feng, P.; Wang, B.; Li Liu, D.; Waters, C.; Yu, Q. Incorporating machine learning with biophysical model can improve the evaluation of climate extremes impacts on wheat yield in south-eastern Australia. Agric. For. Meteorol. 2019, 275, 100–113.
Shahhosseini, M.; Hu, G.; Huber, I.; Archontoulis, S.V. Coupling machine learning and crop modeling improves crop yield prediction in the US Corn Belt. Sci. Rep. 2021, 11, 1606.
Xiao, L.; Wang, G.; Zhou, H.; Jin, X.; Luo, Z. Coupling agricultural system models with machine learning to facilitate regional predictions of management practices and crop production. Environ. Res. Lett. 2022, 17, 114027.
Zhao, Y.; Xiao, D.; Bai, H.; Tang, J.; Liu, D.L.; Qi, Y.; Shen, Y. The prediction of wheat yield in the North China plain by coupling crop model with machine learning algorithms. Agriculture 2022, 13, 99.
Bao, S.D. Soil Agricultural Chemical Analysis, 3rd ed.; China Agricultural Press: Beijing, China, 2000. (In Chinese)
Lu, R.K. Analysis Method of Soil Agricultural Chemistry; China Agricultural Science and Technology Press: Beijing, China, 2002. (In Chinese)
Chipanshi, A.; Zhang, Y.; Kouadio, L.; Newlands, N.; Davidson, A.; Hill, H.; Warren, R.; Qian, B.; Daneshfar, B.; Bedard, F.; et al. Evaluation of the Integrated Canadian Crop Yield Forecaster (ICCYF) model for in-season prediction of crop yield across the Canadian agricultural landscape. Agric. For. Meteorol. 2015, 206, 137–150.
Xie, Q.; Dai, Z.; Hovy, E.; Luong, T.; Le, Q. Unsupervised data augmentation for consistency training. Adv. Neural Inf. Process. Syst. 2020, 33, 6256–6268.
Xiao, J.; Wang, Y.; Chen, J.; Xie, L.; Huang, J. Impact of resampling methods and classification models on the imbalanced credit scoring problems. Inf. Sci. 2021, 569, 508–526.
Greenwood, D.J.; Rahn, C.; Draycott, A.; Vaidyanathan, L.V.; Paterson, C. Modelling and measurement of the effects of fertilizer-N and crop residue incorporation on N-dynamics in vegetable cropping. Soil Use Manag. 1996, 12, 13–24.
Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259.
Breiman, L. Stacked regressions. Mach. Learn. 1996, 24, 49–64.
Haghighi, F.; Omranpour, H. Stacking Ensemble Model of Deep Learning and Its Application to Persian/Arabic Handwritten Digits Recognition. Knowl.-Based Syst. 2021, 220, 106940.
Li, Z.; Tian, L.; Jiang, Q.; Yan, X. Distributed-ensemble stacked autoencoder model for non-linear process monitoring. Inf. Sci. 2021, 542, 302–316.
Bui, D.T.; Tsangaratos, P.; Nguyen, V.T.; Van Liem, N.; Trinh, P.T. Comparing the prediction performance of a Deep Learning Neural Network model with conventional machine learning models in landslide susceptibility assessment. Catena 2020, 188, 104426.
Halim, Z.; Rehan, M. On identification of driving-induced stress using electroencephalogram signals: A framework based on wearable safety-critical scheme and machine learning. Inf. Fusion 2020, 53, 66–79.
Islam, A.R.M.T.; Talukdar, S.; Mahato, S.; Kundu, S.; Eibek, K.U.; Pham, Q.B.; Kuriqi, A.; Thi Thuy Linh, N. Flood susceptibility modelling using advanced ensemble machine learning models. Geosci. Front. 2021, 12, 101075.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Lagomarsino, D.; Tofani, V.; Segoni, S.; Catani, F.; Casagli, N. A tool for classification and regression using random forest methodology: Applications to landslide susceptibility mapping and soil thickness modeling. Environ. Model. Assess. 2017, 22, 201–214. [Google Scholar] [CrossRef]
Soares, A.P.D.M.R.; de Oliveira Carvalho, F.; de Farias Silva, C.E.; da Silva Gonçalves, A.H.; de Souza Abud, A.K. Random Forest as a promising application to predict basic-dye biosorption process using orange waste. J. Environ. Chem. Eng. 2020, 8, 103952. [Google Scholar] [CrossRef]
Xu, X.; Gao, P.; Zhu, X.; Guo, W.; Ding, J.; Li, C.; Zhu, M.; Wu, X. Response to “letter to the editor: ‘Design of an integrated climatic assessment indicator (ICAI) for wheat production: A case study in Jiangsu Province, China’ by Xiangying Xu, ping Gao, Xinkai Zhu, Wenshan Guo, Jinfeng Ding, Chunyn Li, Min Zhu, Xuanwei Wu”. Ecol. Indic. 2020, 113, 106195. [Google Scholar] [CrossRef]
Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
Sun, Y.; Ding, S.; Zhang, Z.; Jia, W. An improved grid search algorithm to optimize SVR for prediction. Soft Comput. 2021, 25, 5633–5644. [Google Scholar] [CrossRef]
Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222. [Google Scholar] [CrossRef]
Panahi, M.; Sadhasivam, N.; Pourghasemi, H.R.; Rezaie, F.; Lee, S. Spatial prediction of groundwater potential mapping based on convolutional neural network (CNN) and support vector regression (SVR). J. Hydrol. 2020, 588, 125033. [Google Scholar] [CrossRef]
Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
Kruschke, J.K.; Movellan, J.R. Benefits of gain: Speeded learning and minimal hidden layers in back-propagation networks. IEEE Trans. Syst. Man Cybern. 1991, 21, 273–280. [Google Scholar] [CrossRef]
Van Nguyen, N.; Van Le, L.; Nguyen, T.N.; Park, S.S.; Tran, T.D. Prediction of Liquefied Soil Settlement Using Multilayer Perceptron with Bayesian Optimization. Indian Geotech. J. 2024, 1–11. [Google Scholar] [CrossRef]
Zhai, B.; Chen, J. Development of a stacked ensemble model for forecasting and analyzing daily average PM_2.5 concentrations in Beijing, China. Sci. Total Environ. 2018, 635, 644–658. [Google Scholar] [CrossRef]
Abbas, F.; Afzaal, H.; Farooque, A.A.; Tang, S. Crop yield prediction through proximal sensing and machine learning algorithms. Agronomy 2020, 10, 1046. [Google Scholar] [CrossRef]
Anbananthen, K.S.M.; Subbiah, S.; Chelliah, D.; Sivakumar, P.; Somasundaram, V.; Velshankar, K.H.; Khan, M.A. An intelligent decision support system for crop yield prediction using hybrid machine learning algorithms. F1000Research 2021, 10, 1143. [Google Scholar] [CrossRef] [PubMed]
Li, L.; Wang, B.; Feng, P.; Liu, D.L.; He, Q.; Zhang, Y.; Wang, Y.; Li, S.; Lu, X.; Yue, C.; et al. Developing machine learning models with multi-source environmental data to predict wheat yield in China. Comput. Electron. Agric. 2022, 194, 106790. [Google Scholar] [CrossRef]
Jeong, J.H.; Resop, J.P.; Mueller, N.D.; Fleisher, D.H.; Yun, K.; Butler, E.E.; Timlin, D.J.; Shim, K.-M.; Gerber, J.S.; Reddy, V.R.; et al. Random forests for global and regional crop yield predictions. PLoS ONE 2016, 11, e0156571. [Google Scholar] [CrossRef] [PubMed]
Burdett, H.; Wellen, C. Statistical and machine learning methods for crop yield prediction in the context of precision agriculture. Precis. Agric. 2022, 23, 1553–1574. [Google Scholar] [CrossRef]
Mokhtar, A.; El-Ssawy, W.; He, H.; Al-Anasari, N.; Sammen, S.S.; Gyasi-Agyei, Y.; Abua-rab, M. Using machine learning models to predict hydroponically grown lettuce yield. Front. Plant Sci. 2022, 13, 706042. [Google Scholar] [CrossRef]
Chergui, N. Durum wheat yield forecasting using machine learning. Artif. Intell. Agric. 2022, 6, 156–166. [Google Scholar] [CrossRef]
Nowatzke, M.; Damiano, L.; Miguez, F.E.; McNunn, G.S.; Niemi, J.; Schulte, L.A.; Heaton, E.A.; VanLoocke, A. Augmenting agroecosystem models with remote sensing data and machine learning increases overall estimates of nitrate-nitrogen leaching. Environ. Res. Lett. 2022, 17, 114010. [Google Scholar] [CrossRef]
Zhang, J.; Tian, H.; Wang, P.; Tansey, K.; Zhang, S.; Li, H. Improving wheat yield estimates using data augmentation models and remotely sensed biophysical indices within deep neural networks in the Guanzhong Plain, PR China. Comput. Electron. Agric. 2022, 192, 106616. [Google Scholar] [CrossRef]

Figure 1. The framework of the stacking model.

Figure 2. The prediction performance of EU-Rotate_N on the training set and the test set. (a) The prediction performance on the training set. (b) The prediction performance on the test set. Each triangle represents one pair of actual and predicted values.

Figure 3. Correlation coefficients between pakchoi DM yield and input parameters with significant correlations (p < 0.1).

Figure 4. Prediction performance of the RFR model built on the expanded training set. (a) The prediction performance on the training set. (b) The prediction performance on the test set. Each triangle represents one pair of actual and predicted values.

Figure 5. Prediction performance of the SVR model built on the expanded training set. (a) The prediction performance on the training set. (b) The prediction performance on the test set. Each triangle represents one pair of actual and predicted values.

Figure 6. The structure of the MLP model.

Figure 7. The measured DM of pakchoi batches.

Figure 8. Prediction performances of the stacking model for six nitrogen treatments.

Figure 9. NO₃ contents of the first and third batches under six nitrogen treatments.

Table 1. Physical and chemical properties for the soil profile of the experimental site.

Soil Layer (cm)	Clay (%)	Sand (%)	Bulk Density (g·cm⁻³)	FC (cm³·cm⁻³)	θ_s (cm³·cm⁻³)	pH
0–10	35.9	1.30	1.30	0.29	0.44	6.57
10–20	42.6	9.2	1.43	0.26	0.38	7.02
20–30	31.7	10.07	1.41	0.27	0.37	7.11

FC: field capacity; θ_s: saturated water content

Table 2. The amount of fertilizer applied in the six nitrogen treatments.

Batch	Planting Time	Harvest Time	N0 (kg/hm²)	N1 (kg/hm²)	N2 (kg/hm²)	N3 (kg/hm²)	N4 (kg/hm²)	N5 (kg/hm²)
First	22 April 2018	20 May 2018	0	80	160	240	320	400
Second	6 June 2018	4 July 2018	0	80	160	240	320	400
Third	13 September 2018	14 October 2018	0	80	160	240	320	400

Table 3. Input parameters of the EU-Rotate_N model.

Category	Content
Site properties	Latitude; altitude; N deposition
Simulation period	Simulation start date; simulation end date
Weather files	File name of weather data
Soil properties	Volumetric soil water content at field capacity; permanent wilting point and saturation; clay/sand contents; bulk density; pH; organic matter content
Initial conditions	Soil volumetric water content; layer soil mineral N content
Fertilizer application	Organic and inorganic fertilizer types; application method; application date and application amount
Irrigation management	Irrigation method; irrigation date; irrigation amount; nitrogen concentration in irrigation water
Crop data	Crop type; row width; plant spacing; planting date; harvest date; number of harvests; N in transplant; dry weight at planting

Table 4. Summary of the parameters of the dataset.

Parameter	Range	Mean	Standard Deviation
Tmin	7.5–28.8	18.46	4.88
Tmean	16.4–32.7	24.48	4.02
Tmax	20.6–43.0	32.93	5.01
GDD	17.55–572.5	278.59	159.41
RHmean	24.2–98.8	77.13	10.96
RHmax	29.6–100.0	98.61	7.66
RHmin	22.1–94.2	46.96	15.90
Vwind	0.0–0.4	0.17	0.08
Tsun	0.0–12.5	9.29	2.43
Rs	9.9–28.5	21.97	4.14
Eo	2.0–6.4	4.25	1.12
SWC1	0.18–0.429	0.27	0.09
SMN1	147.99–286.08	204.04	28.00
SWC2	0.37–0.486	0.41	0.04
SMN2	141.6–236.08	172.38	24.48
SWC3	0.338–0.465	0.38	0.04
SMN3	93.3–176.62	172.4	22.78
irrigation	0.0–62.42	40.54	16.05
inorganic	0.0–186.8	55.27	62.62

Table 5. Input parameters used for ML models.

Category	Parameter	Description	Unit
Time factors	Day	Ordinal number of the sampling date	/
	Plant_day	Planting date of the batch	/
Daily meteorological factors	Tmax	Maximum temperature	°C
	Tmean	Mean temperature	°C
	Tmin	Minimum temperature	°C
	GDD	Effective accumulated temperature	°C
	RHmean	Mean relative humidity	%
	RHmax	Maximum relative humidity	%
	RHmin	Minimum relative humidity	%
	Vwind	Vertical wind speed	m/s
	Tsun	Sunlight hours	h
	Rs	Solar radiation	J/(m²·d)
	Eo	Evaporation capacity	mm/d
Initial soil factors of each pakchoi treatment	SWC1	Soil water content of the 1st soil layer (0–10 cm)	cm³/cm³
	SMN1	Soil mineral N content of the 1st soil layer (0–10 cm)	kg N/ha
	SWC2	Soil water content of the 2nd soil layer (10–20 cm)	cm³/cm³
	SMN2	Soil mineral N content of the 2nd soil layer (10–20 cm)	kg N/ha
	SWC3	Soil water content of the 3rd soil layer (20–30 cm)	cm³/cm³
	SMN3	Soil mineral N content of the 3rd soil layer (20–30 cm)	kg N/ha
Cultivation factors	irrigation	Accumulated irrigation amount from the planting date to the sampling date	mm
	inorganic	Accumulated fertilization amount from the planting date to the sampling date	kg/ha

Table 6. Prediction performance of ML models and stacking models built on the expanded training set.

Method	Training R²	Training RMSE (t/ha)	Training MAE (t/ha)	Test R²	Test RMSE (t/ha)	Test MAE (t/ha)
RFR	0.998	0.017	0.010	0.813	0.326	0.221
SVR	0.993	0.032	0.012	0.778	0.347	0.326
EU-Rotate_N	0.942	0.323	0.209	0.732	0.925	0.836
Stacking-LR	0.998	0.015	0.010	0.824	0.304	0.206
Stacking-AVG	0.978	0.124	0.077	0.775	0.533	0.461
Stacking-BP	0.999	0.007	0.005	0.821	0.296	0.256
Stacking-MLP	0.999	0.012	0.008	0.834	0.283	0.196
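
The three metrics reported in Table 6 can be illustrated with a minimal sketch. It assumes the standard definitions of RMSE and MAE and takes R² to be the coefficient of determination (1 − SS_res/SS_tot); the exact form used in the study is the one given in its Section 2.2.6. The function names and the toy values below are hypothetical and are not taken from the experiment.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the units of the inputs (t/ha in Table 6)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error, in the units of the inputs."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy dry matter yields (t/ha); illustrative values only, not study data.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(f"R2={r2(actual, predicted):.3f}  "
      f"RMSE={rmse(actual, predicted):.3f}  "
      f"MAE={mae(actual, predicted):.3f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions, which matches every row of Table 6.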

Table 7. Performance comparison of models built on the original and expanded training sets.

Training Dataset	Method	Training R²	Training RMSE (t/ha)	Training MAE (t/ha)	Test R²	Test RMSE (t/ha)	Test MAE (t/ha)
original	RFR	0.998	0.011	0.007	0.482	0.638	0.463
	SVR	0.994	0.025	0.015	0.079	0.719	0.440
	Stacking-MLP	0.999	0.008	0.005	0.482	0.644	0.481
expanded	RFR	0.998	0.017	0.010	0.813	0.326	0.221
	SVR	0.993	0.032	0.012	0.778	0.347	0.326
	Stacking-MLP	0.999	0.012	0.008	0.834	0.283	0.196
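
The effect of the expanded training set can be quantified directly from the test columns of Table 7. The following sketch is only an arithmetic aid: the numbers are copied from the table, while the dictionary layout and the helper name `change` are hypothetical choices made for illustration.

```python
# Test-set metrics copied from Table 7 as (R2, RMSE, MAE); RMSE and MAE in t/ha.
test_metrics = {
    "original": {
        "RFR": (0.482, 0.638, 0.463),
        "SVR": (0.079, 0.719, 0.440),
        "Stacking-MLP": (0.482, 0.644, 0.481),
    },
    "expanded": {
        "RFR": (0.813, 0.326, 0.221),
        "SVR": (0.778, 0.347, 0.326),
        "Stacking-MLP": (0.834, 0.283, 0.196),
    },
}


def change(method):
    """Compare the expanded-set model with the original-set model for one method."""
    r2_o, rmse_o, mae_o = test_metrics["original"][method]
    r2_e, rmse_e, mae_e = test_metrics["expanded"][method]
    return {
        "r2_gain": round(r2_e - r2_o, 3),
        "rmse_reduction_pct": round(100 * (rmse_o - rmse_e) / rmse_o, 1),
        "mae_reduction_pct": round(100 * (mae_o - mae_e) / mae_o, 1),
    }


for method in test_metrics["original"]:
    print(method, change(method))
```

For Stacking-MLP this gives an R² gain of about 0.35 and a test RMSE reduction of roughly 56%, consistent with the improvement the table reports for the expanded dataset.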

Table 8. Prediction performances of ML models for three batches.

Batch	Method	Test R²	Test RMSE (t/ha)	Test MAE (t/ha)
First	RFR	0.488	0.079	0.077
	SVR	0.570	0.243	0.227
	EU-Rotate_N	0.237	0.429	0.409
	stacking	0.536	0.096	0.078
Second	RFR	0.984	0.486	0.426
	SVR	0.925	0.415	0.408
	EU-Rotate_N	0.947	0.914	0.878
	stacking	0.983	0.388	0.295
Third	RFR	0.398	0.278	0.161
	SVR	0.293	0.359	0.343
	EU-Rotate_N	0.399	1.244	1.222
	stacking	0.321	0.284	0.215

Table 9. Prediction performances of the RFR, SVR and EU-Rotate_N models for six nitrogen treatments.

Nitrogen Treatment	Method	Test R²	Test RMSE (t/ha)	Test MAE (t/ha)
N0	RFR	0.731	0.327	0.216
	SVR	0.796	0.334	0.315
	EU-Rotate_N	0.699	0.828	0.737
N1	RFR	0.821	0.254	0.193
	SVR	0.816	0.286	0.271
	EU-Rotate_N	0.748	0.883	0.803
N2	RFR	0.845	0.248	0.180
	SVR	0.846	0.297	0.289
	EU-Rotate_N	0.778	0.951	0.876
N3	RFR	0.802	0.282	0.196
	SVR	0.751	0.349	0.335
	EU-Rotate_N	0.713	1.043	0.955
N4	RFR	0.887	0.276	0.182
	SVR	0.795	0.343	0.323
	EU-Rotate_N	0.741	1.019	0.925
N5	RFR	0.842	0.500	0.359
	SVR	0.731	0.447	0.421
	EU-Rotate_N	0.816	0.800	0.722

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, C.; Xu, X.; Zhang, Y.; Cao, Z.; Ullah, I.; Zhang, Z.; Miao, M. A Stacking Ensemble Learning Model Combining a Crop Simulation Model with Machine Learning to Improve the Dry Matter Yield Estimation of Greenhouse Pakchoi. Agronomy 2024, 14, 1789. https://doi.org/10.3390/agronomy14081789

