3.1. Construction of Model Library Based on Machine Learning
Constructing an environmental prediction model for detecting crop growth stages can help optimize agricultural production. However, quantifying crop growth intuitively is challenging; therefore, this study uses a dataset on plant growth stages for modeling. The dataset is collected in the field and comprises plant growth stages annotated by experts. Although the proposed model is trained on that dataset, it can adapt to different crops and environmental conditions by configuring the parameters of the different crop growth stages, improving the model’s versatility.
Figure 1 illustrates the model construction process.
This study employs several machine learning algorithms to construct a model library, which is then used to establish the prediction model. Different models are trained for the same task to obtain better prediction results, and the one performing best on the test set is selected as the prediction model. Constructing a model library involves presetting several machine learning algorithms as models trained on the actual greenhouse environmental data, setting several evaluation indicators for each model based on the accumulated collected data, and adaptively selecting the most suitable model as the learning model.
To satisfy the computing resource requirements of the model library, this study increases the update interval of the calculations when updating the model. In actual greenhouse environment production, computing resources are idle in most cases. Thus, several models are calculated and evaluated during the idle periods to exploit these computing resources, thereby improving their utilization. The model is constructed using Pandas, a popular data processing library in the Python language, and Scikit-learn, a machine learning library.
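The idle-period update strategy described above can be sketched with Python's standard thread pool. This is a minimal illustration, not the paper's implementation: `train_model` is a hypothetical stand-in for fitting one of the library's estimators on the accumulated greenhouse data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the library's per-model training routine;
# in practice each call would fit one scikit-learn estimator on the
# accumulated greenhouse data and return its validation error.
def train_model(name):
    return name, 0.0  # (model name, placeholder validation error)

MODEL_NAMES = ["svm", "linear-svm", "nu-svm", "gbdt",
               "decision-tree", "bayes-ridge", "logistic", "sgd"]

def update_library():
    # Train all candidate models in parallel during idle periods,
    # so spare computing capacity is spent on model updates.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(train_model, MODEL_NAMES))
    return dict(results)
```

Because each model is trained independently, the update interval can be lengthened without blocking the main control loop.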
Several machine learning models are evaluated as model library candidates through simulation and comparison on the dataset, including three support vector machine models (SVM [9], linear-SVM [10], and nu-SVM [11]), two tree models (the gradient boosting tree model [12] and the decision tree model [13]), and three linear models (the Bayesian Ridge regression model [14], the logistic regression model [15], and the gradient descent model [16]).
These models were chosen because they represent classic approaches in current machine learning research. First, the SVM models distinguish different types of data by finding a separating hyperplane between them, a method commonly used in agricultural machine learning modeling. Second, two tree models are included because they perform well on problems with complex features; their core principle is to use a tree structure to establish interval parameters for the different features. We chose the decision tree as a representative of this idea and GBDT as its boosted, optimized variant. To further enrich the library, several classic regression models are also included: the Bayesian Ridge regression model, based on Bayesian ideas, which handles unbalanced data well; the logistic regression model, a basic statistical model; and a linear regression model based on gradient descent, which uses gradient descent to optimize the model parameters. These models are all widely used in agricultural applications, but algorithms that optimize and combine them in an effective way are still lacking. Therefore, this article integrates these different models through a model library to establish a greenhouse agricultural parameter prediction model.
In terms of model parameter settings, the SVM models use a penalty parameter of 1.0 and differ in their formulations: the linear-SVM uses a linear kernel, while the nu-SVM adopts the nu formulation, so the library covers the learning behavior of two different SVM variants. For the two tree models, the minimum number of samples required to split a node is set to 2, and the minimum number of samples per leaf is set to 1, ensuring that each feature can be learned well. The learning rate of GBDT is set to 0.1. Other settings, such as the linear model parameters and any parameters not mentioned, are kept consistent with the classical default settings of these models.
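The model library with the parameter settings described above could be assembled in scikit-learn roughly as follows. This is a sketch under the assumption that the prediction task is regression on continuous greenhouse parameters; the dictionary keys are illustrative names, and unlisted parameters keep scikit-learn's defaults.

```python
from sklearn.svm import SVR, LinearSVR, NuSVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import (BayesianRidge, LogisticRegression,
                                  SGDRegressor)

# Eight candidate models with the stated parameter settings;
# all other hyperparameters follow scikit-learn's classical defaults.
model_library = {
    "svm": SVR(C=1.0),               # penalty term parameter C = 1.0
    "linear-svm": LinearSVR(C=1.0),  # linear kernel variant
    "nu-svm": NuSVR(C=1.0),          # nu-SVM formulation
    "gbdt": GradientBoostingRegressor(
        learning_rate=0.1,           # GBDT learning rate
        min_samples_split=2,
        min_samples_leaf=1),
    "decision-tree": DecisionTreeRegressor(
        min_samples_split=2,         # minimum samples to split a node
        min_samples_leaf=1),         # minimum samples per leaf
    "bayes-ridge": BayesianRidge(),
    "logistic": LogisticRegression(),  # note: a classifier in scikit-learn
    "sgd": SGDRegressor(),           # linear model fit by gradient descent
}
```

Keeping the estimators behind a uniform dictionary makes it simple for the library to train and evaluate them interchangeably.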
The parallel processing mechanism of Python is exploited for multi-threaded calculations to fully utilize computing resources and manage the different models. Meanwhile, the model library is abstracted into a Python class whose functions provide the corresponding responses, such as prediction, evaluation, and data processing (Figure 2). Specifically, the collected feature data are first processed using Pandas and sent to the model library when the subsequent training process starts. The model library divides the feature data into a test set and a training set at a ratio of 2:8. Once training is completed, the trained models are evaluated using seven evaluation indicators implemented in Python, i.e., maximum error, mean absolute error, mean squared error, root mean squared error, mean squared logarithmic error, median absolute error, and coefficient of determination. These indicators measure the model's prediction quality: for the six error indicators, smaller values mean better predictions, while a coefficient of determination closer to 1 indicates a better fit. Taking these indicators as evaluation criteria, the different models are voted on, and the model with the most votes is selected as the crop growth prediction model; among these, the model with the lowest training error is regarded as optimal. When the performance of several models is relatively balanced, the coefficient of determination is used as the deciding index.
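The 2:8 split, the seven indicators, and the voting step can be sketched with scikit-learn's metrics module. This is an illustrative reading of the procedure, not the paper's code; the `evaluate` and `vote` helpers and metric names are assumptions.

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import (max_error, mean_absolute_error,
                             mean_squared_error, mean_squared_log_error,
                             median_absolute_error, r2_score)

def evaluate(model, X, y):
    # 2:8 test/training split as described in the text.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0)
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    mse = mean_squared_error(y_te, pred)
    return {
        "max_error": max_error(y_te, pred),
        "mae": mean_absolute_error(y_te, pred),
        "mse": mse,
        "rmse": mse ** 0.5,
        "msle": mean_squared_log_error(y_te, pred),  # needs non-negative y
        "medae": median_absolute_error(y_te, pred),
        "r2": r2_score(y_te, pred),
    }

def vote(scores):
    # Each error indicator gives one vote to the model with the smallest
    # value; the coefficient of determination votes for the largest.
    # Ties are broken by the coefficient of determination.
    votes = {name: 0 for name in scores}
    for metric in next(iter(scores.values())):
        pick = max if metric == "r2" else min
        votes[pick(scores, key=lambda n: scores[n][metric])] += 1
    return max(votes, key=lambda n: (votes[n], scores[n]["r2"]))
```

A usage sketch: run `evaluate` once per library model, collect the results in a `{name: indicators}` dictionary, and let `vote` pick the prediction model.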
3.2. Construction of Greenhouse Decision-Making Optimization Model
The intelligent decision-making process employs the crop growth prediction model and an intelligent optimization algorithm as its basic components, and uses past expert experience and knowledge as the inspiration seeds of the optimization algorithm. On this basis, a precise decision-making optimization model for the greenhouse environment is constructed.
By continuously accumulating agricultural production knowledge, a suitable range of environmental parameters for different crop growth stages is obtained based on the growth and planting of crops. These parameters are summarized in the library presented in Table 3.
Six parameters, i.e., temperature, soil moisture, air humidity, nitrogen, phosphorus, and potassium, are each split into an upper and a lower limit, yielding 12 parameters in total. A day-and-night mechanism is also added. Then, the model library performs prediction learning on the compiled parameters and the collected greenhouse environment information (Table 4).
Based on the same evaluation method, the different machine learning algorithms are evaluated and voted on using the seven evaluation indicators to determine the optimal model, which can effectively learn the appropriate values corresponding to environmental information, such as temperature, and realize accurate prediction and adjustment of the growth environment.
For the optimization model, the decision function is based on the prediction models of the six indicators obtained after training with the model library. The upper and lower limits of the various indicators in the knowledge base are used as the upper and lower limits of the decision variables. The main goal of optimization is to control the greenhouse environment through reasonable decision-making solutions. Using nitrogen, phosphorus, and potassium as examples, the optimization goal is to minimize the soil's nitrogen, phosphorus, and potassium contents. Accordingly, a multi-objective optimization problem with minimization objectives over constrained continuous decision variables is employed, formulated as follows:

  min F(x) = ( f_1(x), f_2(x), …, f_k(x) ),
  s.t. l_j ≤ x_j ≤ u_j,  j = 1, …, m,

where f_i(x) represents the prediction function of the i-th indicator learned by the machine learning model and serves as an objective function of the optimization problem; k, the number of predicted indicators, is determined by the number of knowledge-base indicators used; l_j and u_j denote the lower and upper limits transformed from the indicators in the knowledge base; and m represents the number of corresponding constraints. The optimization model is then used to solve this problem.
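The constrained multi-objective formulation can be expressed directly in code. In this sketch, the prediction functions f_i are hypothetical stand-ins for the trained single-indicator models, and the bounds come from the knowledge base:

```python
def make_problem(predictors, lower, upper):
    """Build the multi-objective problem.

    predictors: list of functions f_i(x), one per indicator, standing
        in for the trained prediction models (all to be minimized).
    lower, upper: per-variable bounds l_j, u_j taken from the
        knowledge base.
    """
    def objectives(x):
        # Evaluate every objective f_i at decision vector x,
        # e.g. predicted soil nitrogen, phosphorus, and potassium.
        return [f(x) for f in predictors]

    def feasible(x):
        # Box constraints l_j <= x_j <= u_j, j = 1..m.
        return all(l <= v <= u for v, l, u in zip(x, lower, upper))

    return objectives, feasible
```

Any multi-objective solver can then work against the `objectives` callable while rejecting candidates for which `feasible` is false.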
This article establishes a multi-objective optimization problem to minimize greenhouse input, covering the indicators shown in Table 3, except for the growth stage. Changes in these indicators determine the main cost of plant growth: adjusting them requires different cost inputs, such as adding a heat source to the greenhouse or increasing fertilizer application, both of which raise costs. At the same time, greenhouse factors influence one another, for example, humidity affects fertility, so the different factors must be considered jointly. Based on learning and modeling over the knowledge base, the prediction model first provides the optimal value of a single indicator in the current greenhouse environment, without considering the correlations between indicators. The optimization model built on these single-indicator prediction models can then obtain a global optimum, that is, an optimal solution that considers the different greenhouse indicators together. In this way, the optimization model supports decision-making over the whole greenhouse environment rather than local adjustment of a single indicator. Therefore, the target variable of the optimization model is the optimal indicator summarized from prior experience in the knowledge base, meaning that the optimal value of each indicator under the current parameters can be obtained.
This study compares four optimization models from the intelligent optimization field, namely the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) [17], NSGA-III [18], push-and-pull-search MOEA/D-DE (PPS-MOEA/D-DE) [19], and the Reference Vector-Guided Evolutionary Algorithm (RVEA) [20], and the most appropriate one is selected for solving the above problem. Before deploying the intelligent decision-making model, the corresponding optimization model parameters must be tuned on the collected data. Considering the equipment's performance limits, the population size and the maximum number of optimization iterations are set to 50 and 2000, respectively.
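To make the population-based search concrete, the following is a deliberately simplified evolutionary loop built on Pareto dominance, the core mechanism shared by NSGA-II and the other compared algorithms. It is a toy sketch, not any of the cited implementations; real experiments would use a full library implementation with the stated population size of 50 and 2000 iterations.

```python
import random

def dominates(a, b):
    # a dominates b if it is no worse in every objective and strictly
    # better in at least one (all objectives are minimized).
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))

def evolve(objectives, lower, upper, pop_size=50, generations=2000):
    # Toy dominance-based loop: random initialization within the
    # knowledge-base bounds, Gaussian mutation, and survival of
    # non-dominated individuals first.
    clip = lambda v, l, u: min(max(v, l), u)
    pop = [[random.uniform(l, u) for l, u in zip(lower, upper)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Mutate a random parent and keep the child inside the bounds.
        child = [clip(v + random.gauss(0, 0.1), l, u)
                 for v, l, u in zip(random.choice(pop), lower, upper)]
        pop.append(child)
        scored = [(objectives(x), x) for x in pop]
        # Non-dominated individuals survive first; the rest fill up.
        front = [x for f, x in scored
                 if not any(dominates(g, f) for g, _ in scored)]
        rest = [x for _, x in scored if x not in front]
        pop = (front + rest)[:pop_size]
    return pop
```

The returned population approximates the Pareto set of the greenhouse decision problem; a final decision rule (for example, the knowledge-base preferences) then picks one solution from it.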