When choosing a model to predict the target variables such as changes in temperature, humidity, and other factors, it is essential to consider the prediction accuracy and the model performance. The experiment tested three methods: RandomForest, GradientBoosting, and XGBoost. Each method was evaluated based on the mean squared error (MSE) and training time. The main goal was to identify a model that balances accuracy, speed, and computational efficiency.
3.1. Comparison Results of the Main Classification Methods
RandomForest demonstrated the lowest MSE values, for example, for Delta_temp (1.57 × 10⁻⁸) and Delta_ph (1.78 × 10⁻¹⁰), making it the most accurate of the tested methods. However, its training time was 3 min, which makes it less suitable for processing large amounts of data or for time-sensitive tasks. Long training times can be a critical limitation in real-world settings, especially when the model must be retrained on new data.
GradientBoosting showed acceptable accuracy, with MSE values of, for example, 0.0044 for Delta_temp and 1.77 × 10⁻⁵ for Delta_ec. Its training time was 1 min 16 s, which makes it faster than RandomForest. However, it still requires significant processing time and shows no clear performance advantage on large datasets, which limits its application in scalable, high-speed systems.
XGBoost showed competitive MSE values within the acceptable accuracy range, for example, for Delta_temp (0.0107) and Delta_ph (0.0001288). Its main advantage is a training time of only 2.3 s. Such high computational efficiency makes the method the most suitable for problems that require fast data processing, scalability, and a low computational load. The speed of training and prediction allows XGBoost to be used for real-time problems where responsiveness is essential.
XGBoost was chosen for its training time of 2.3 s, which is significantly shorter than that of GradientBoosting and RandomForest. The difference in accuracy, measured via MSE, between XGBoost and the other methods is small in absolute terms and does not significantly affect the quality of predictions. Its high computational efficiency therefore makes XGBoost an optimal choice for scalable problems and real-world applications where processing speed, flexibility, and minimal computational costs are key.
The feature importances extracted with XGBoost showed that the most significant feature is humidity_air (0.125534), indicating its key role. Features such as humidity_solution (0.122299) and temp (0.110875) also demonstrated high importance, confirming their influence on the classification process. According to the model training results, XGBoost achieved a high accuracy (Accuracy: 0.9788, F1-score: 0.9788) and a near-perfect ROC-AUC of 0.9999, indicating its ability to distinguish between classes accurately. Moreover, the code execution time on the final dataset was only 1 min 22 s, demonstrating its high computational efficiency compared to the other methods. This makes XGBoost an optimal choice for tasks involving large amounts of data, providing a balance between processing speed and accuracy.
The feature importances determined by GradientBoosting suggest that the most significant feature was temp (0.180000), which emphasizes the critical role of temperature in the analyzed problem. This is followed by humidity_solution (0.150000) and humidity_air (0.140000), which significantly impact the model predictions. Parameters such as ph_solution (0.130000) and ec_solution (0.100000) also play an important role, showing the contribution of solution characteristics to the model's overall performance.
Figure 3 illustrates the important features identified by the three models.
The GradientBoosting model showed promising results in terms of accuracy (Accuracy: 0.9343, F1-score: 0.9343, ROC-AUC: 0.989) and an acceptable Log Loss of 0.1006, indicating its ability to distinguish between classes. However, the execution time was 20 min 45 s, making it the least efficient among the tested methods regarding the computational cost. This makes GradientBoosting less suitable for tasks that require high-speed processing of big data.
Table 2 compares the three classification algorithms, RandomForest, GradientBoosting, and XGBoost, using the key metrics—Accuracy, F1-score, ROC-AUC score, Log Loss, and execution time. The RandomForestClassifier showed the best accuracy (Accuracy: 0.9993) and minimal Log Loss (0.0906), but it was slower (1 min 35 s). GradientBoosting showed acceptable accuracy (Accuracy: 0.9343) but was the least time-efficient (20 min 45 s). XGBoost’s balanced accuracy (Accuracy: 0.9788, ROC-AUC: 0.9999) and execution speed (1 min 22 s) make it the most efficient choice.
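As a reminder of what the metrics in Table 2 measure, the following minimal sketch computes accuracy, an F1-score, and Log Loss from scratch. The function names and the toy labels are illustrative assumptions and not the study's data; in particular, the text does not state which F1 averaging was used, so macro averaging is assumed here.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    proba[i] is a dict mapping each class label to its predicted probability.
    Probabilities are clipped to [eps, 1 - eps] to avoid log(0).
    """
    total = 0.0
    for t, p in zip(y_true, proba):
        total -= math.log(min(max(p.get(t, 0.0), eps), 1 - eps))
    return total / len(y_true)

# Toy example (hypothetical labels, not taken from the experiments).
y_true = [0, 1, 2, 1, 0]
y_pred = [0, 1, 1, 1, 0]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

A lower Log Loss means the model assigns higher probability to the correct class, which is why it can rank models differently from accuracy alone.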
The model's adaptability was tested starting from an initial dataset that included only two microgreen species, beetroot and tarragon. Data for other species (radish, basil, mustard, watercress, spinach, parsley, and cilantro) were then added to increase versatility and validate the model. Using this approach, the tested models, RandomForest, GradientBoosting, and XGBoost, demonstrated how well they cope with changing environmental conditions, including new species and variations in temperature, humidity, light, pH, and conductivity. If necessary, the model easily adapts to the removal or addition of data for specific microgreen species, confirming its flexibility and versatility.
The performance metrics show that each model demonstrated its adaptability. XGBoost achieved an accuracy of 0.9752, an F1-score of 0.9746, a ROC-AUC score of 0.9994, and a Log Loss of 0.1863 with a training time of 30 s, making it the fastest model while remaining highly accurate. RandomForest achieved the best accuracy (Accuracy: 0.9984, F1-score: 0.9984, ROC-AUC score: 0.9996) with the minimum Log Loss (0.0912), but it took 1 min 25 s to train, which slightly reduces its applicability to large datasets. GradientBoosting achieved decent results, with an accuracy of 0.9412, an F1-score of 0.9407, and a ROC-AUC score of 0.9901, but its training time was 4 min 33 s, making it less suitable for scalable problems. These results confirm that all the models adapt effectively to changing data and different environments, maintaining a high accuracy across the diversity of the input features. However, XGBoost stands out due to its speed and ability to work with large amounts of data, which makes it the most suitable choice for problems with dynamic changes and increasing data complexity.
3.3. Impact and Relationships of the Environmental Parameters
During the research on the cultivation of microgreens, the key factors and optimal intervals that ensure their health and development were identified. Specific standard parameters were established for two types of microgreens, beetroot and tarragon, which include temperature, air and solution humidity, illumination, pH level, and solution electrical conductivity (EC).
The following parameters are considered optimal for beetroot. The temperature should be 18–24 °C, supporting active photosynthesis and cell division processes. An air humidity of 50–60% helps to avoid excessive transpiration and maintain a balanced water regime. Solution humidity in the range of 0.8–1.2 maintains the necessary ratio of water and nutrients for the root system. Illumination should be 12–16 h a day to ensure effective photosynthesis and the accumulation of biomass. A pH level of 6.0–6.5 ensures the availability of nutrients, and an electrical conductivity of 1.0–1.4 EC helps plants receive enough nutrition without the risk of salt stress.
For tarragon, the optimal conditions are slightly different. The temperature should be 20–25 °C, which promotes the accumulation of biologically active substances, such as essential oils. An air humidity of 60–70% maintains healthy leaves, reducing the risk of dehydration. Solution humidity, as for beetroot, is 0.8–1.2, providing optimal conditions for the root system. Tarragon requires 12–16 h of light for active growth and development. The pH level is preferably within 5.5–6.0, which creates favorable acidity for the absorption of nutrients. The electrical conductivity of the solution for tarragon is slightly higher, at 1.2–1.6 EC, which is due to its increased need for nutrients.
Each of these factors has a significant impact on plant growth and health. The temperature determines the rate of photosynthesis and cell growth, and its deviations can lead to heat stress. The air humidity affects plants’ transpiration and water balance: too low humidity causes dehydration, and too high humidity promotes the development of fungal infections. Solution humidity regulates the water supply to the root system, and its balance is necessary to prevent root hypoxia. Light provides plants with the energy for photosynthesis, and its deficiency or excess can lead to slow growth or photostress. The pH controls the availability of nutrients, and electrical conductivity (EC) reflects the concentration of these substances in the solution—its imbalance can lead to nutrient deficiencies or salt stress. Thus, compliance with these parameters provides optimal conditions for growing microgreens and promotes healthy development.
Table 4 demonstrates the optimal growing conditions of two types of microgreens: beetroot and tarragon. The data were obtained via sensors measuring the key parameters such as temperature, air and solution humidity, light, pH level, and solution conductivity. For each type of microgreen, the table presents the corresponding values for these parameters. For example, for beets, the temperature is 20.6 °C, and the air humidity is 54.5%, and for tarragon, it is 23.5 °C and 64.9%, respectively. These values help to assess the conditions maintained in the growing system (grow box) for each type of plant.
The key element of the dataset is the negative_class column, which reflects the number of parameter deviations from the norm. This column indicates how many factors are outside the optimal values for each plant type. If all the parameters are within the norm, the deviation class will be 0. For example, if only temperature deviation is recorded for beetroot, the deviation class will be (1). If deviations are recorded for two parameters, such as temperature and air humidity, the class will be (2). For tarragon, given six parameters, the deviation class can take values of up to 65 if all the parameters are simultaneously outside the norm. This approach to classifying deviations allows for the creation of different combinations of factors that can be used for automated monitoring and forecasting in microgreen management systems. For example, class 1 for beetroot indicates a deviation of one factor, such as temperature, while class 65 for tarragon indicates multiple deviations in several parameters simultaneously. This classification mechanism enables the more accurate and flexible management of the growing conditions, which is especially important for automated monitoring systems.
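The counting rule described above can be sketched as follows. This is a minimal illustration that assumes negative_class is simply the number of parameters outside their optimal range; it uses the beetroot ranges given earlier in this section and does not attempt to reproduce the larger class codes mentioned for tarragon. The dictionary layout, the function name, and the sample reading are hypothetical.

```python
# Optimal ranges for beetroot as listed in the text; the dictionary layout
# and key names (borrowed from the dataset's column names) are assumptions.
BEETROOT_RANGES = {
    "temp": (18.0, 24.0),
    "humidity_air": (50.0, 60.0),
    "humidity_solution": (0.8, 1.2),
    "light": (12.0, 16.0),
    "ph_solution": (6.0, 6.5),
    "ec_solution": (1.0, 1.4),
}

def count_deviations(reading, ranges):
    """Return how many parameters fall outside their optimal [low, high] range."""
    return sum(
        not (low <= reading[name] <= high)
        for name, (low, high) in ranges.items()
    )

# Hypothetical sensor reading with every parameter inside the norm.
reading = {
    "temp": 20.6, "humidity_air": 54.5, "humidity_solution": 1.0,
    "light": 14.0, "ph_solution": 6.2, "ec_solution": 1.2,
}
print(count_deviations(reading, BEETROOT_RANGES))                      # all in range
print(count_deviations({**reading, "temp": 26.0}, BEETROOT_RANGES))    # temperature out of range
```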
Figure 4 shows a correlation matrix of the critical parameters affecting microgreen growth: temperature (temp), air humidity (humidity_air), solution humidity (humidity_solution), light (light), solution pH (ph_solution), and solution electrical conductivity (ec_solution). The color scale on the right shows the degree of correlation: warm shades of red indicate a positive correlation, and blue shades indicate a negative correlation. The correlation values range from −1 (strong negative correlation) to 1 (strong positive correlation), with 0 indicating no linear relationship between the variables.
Temperature (temp) has a weak positive correlation with air humidity (0.11) and solution electrical conductivity (0.056), and its relationship with the other parameters, such as solution humidity, light, and pH, is minimal. Air humidity (humidity_air) shows a negative correlation with solution pH (−0.35), indicating that higher humidity tends to coincide with a lower pH, and a moderate positive correlation with electrical conductivity (0.2). Solution humidity (humidity_solution) has virtually no significant correlation with the other parameters, except for a weak relationship with light and electrical conductivity, indicating a low interdependence. Light has a weak positive correlation with solution humidity, possibly due to chance. Solution pH (ph_solution) shows a noticeable negative correlation with air humidity and a weak negative correlation with temperature and electrical conductivity. The electrical conductivity of the solution (ec_solution) is positively correlated with air humidity, which may indicate the need to adjust the composition of the solution at high humidity to maintain the optimal conditions.
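For readers who want to see how such a matrix is obtained, the sketch below computes Pearson correlation coefficients between named columns. It assumes the matrix in Figure 4 uses the Pearson coefficient, which the text does not state explicitly; the function names and the sample values are illustrative only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Return a nested dict with the pairwise Pearson correlation of each column."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical readings, not data from the study.
columns = {
    "temp": [20.1, 21.4, 22.0, 23.5, 24.2],
    "humidity_air": [52.0, 55.5, 54.0, 58.0, 61.0],
    "ph_solution": [6.4, 6.3, 6.3, 6.1, 6.0],
}
matrix = correlation_matrix(columns)
print(round(matrix["humidity_air"]["ph_solution"], 3))
```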
Figure 5 shows a mutual information analysis diagram that reflects the relationship between factors such as air humidity, solution pH, temperature, and the target attribute, negative_class, representing the likely negative impact of parameter deviations on the health of microgreens. Mutual information determines how much information one feature contains about another, in this case, the target variable. This indicator helps to understand how strongly each factor, such as air humidity or pH, relates to the target class and its significance for predicting the health of microgreens.
Based on Figure 5, it can be concluded that air humidity (humidity_air) and the pH of the solution (ph_solution) have the most significant impact on the target class, indicating that these parameters are strongly associated with negative deviations in the condition of microgreens, affecting their health. The electrical conductivity of the solution (ec_solution) and temperature (temp) also play an essential role, but their influence is slightly smaller compared to the first two parameters. Plant type encoding (type_encoded) also shows a significant value, which may indicate differences in the sensitivity of different plant types, such as beetroot and tarragon, to parameter deviations. Light and humidity_solution show the lowest mutual information, indicating their relatively low contribution to predicting negative deviations. This analysis helps to highlight the key factors to consider when building predictive models for microgreen health analysis, which can further improve the accuracy of machine learning models.
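To make the notion of mutual information concrete, here is a minimal sketch that estimates it for a binned feature and a discrete target. The text does not say which estimator produced Figure 5, so the equal-width binning and the plug-in estimate below are assumptions, as are the function names and the toy data.

```python
import math
from collections import Counter

def discretize(values, bins):
    """Map continuous values to equal-width bin indices in 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi

# Toy example: the feature fully determines the target, so MI equals ln 2.
feature = discretize([50.0, 52.0, 68.0, 70.0], bins=2)
target = [0, 0, 1, 1]
print(mutual_information(feature, target))
```

A value of zero means the binned feature carries no information about the target, while larger values indicate a stronger (not necessarily linear) association.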
Figure 6 shows the feature importance diagram, indicating the factors significantly impacting the prediction of the target class negative_class. As a result of the analysis, each feature was rated according to its importance for accurately predicting the negative impact on microgreens.
After training, the model rated the importance of each feature based on how often and how strongly it influences the decisions in each tree. The more a feature helps reduce the prediction error, the higher its importance. The diagram shows that light and temperature were the most important for predicting the negative class, as they significantly influenced the model’s decision on the deviations that affect microgreen growth. Next were humidity_solution and ec_solution, which had a significant but slightly smaller influence. The least important feature was the encoded plant type feature, which may indicate a minor impact on the overall parameter deviations compared to other factors. This feature importance analysis provides valuable information about which parameters are critical to control to maintain the optimal microgreen growth conditions and prevent negative deviations.
Table 5 shows the initial data for beets, where the key parameters are listed: temperature, air humidity, solution moisture, light, solution pH, and solution electrical conductivity (EC). These data are compared with the optimal ranges for each parameter. For example, the temperature for beets is 26 °C, which is 2 °C above the norm (18–24 °C). This deviation is then evaluated in the second table, which shows the predicted deltas (deviations from the norm) and their negative impact on the plant. In this case, temperature has a 100% negative impact on beets, while the other parameters, such as air humidity and solution pH, are within the norm and do not have a negative impact.
Figure 7 includes boxplots showing the distribution of values for parameters such as temperature, air humidity, solution moisture, illumination, solution pH, and solution electrical conductivity (EC). Red dots indicate abnormal values; for example, 15 °C and an air humidity of 76% are outside the permissible values. Solution moisture, illumination, pH, and electrical conductivity are within the normal range, as evidenced by green dots.
Table 6 shows a similar analysis process but with a changed value of air humidity (65%), which is 5 percentage points above the upper limit of the norm. This change resulted in a 100% negative impact on the plant. However, even though the temperature remained above the norm, its negative impact decreased to 23.04%. This demonstrates the interaction between the parameters: an increase in air humidity mitigated the adverse effects of temperature, emphasizing the importance of analyzing the relationships between factors.
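The deltas discussed for Tables 5 and 6 are distances from the nearest bound of the optimal range. A minimal sketch of that arithmetic is given below; the function name and the sign convention (negative below the range, positive above it) are assumptions for illustration, and the sketch does not model the predicted percentage impact.

```python
def delta_from_norm(value, low, high):
    """Signed distance from `value` to the optimal range [low, high].

    Returns 0.0 inside the range, a negative number below it,
    and a positive number above it.
    """
    if value < low:
        return value - low
    if value > high:
        return value - high
    return 0.0

# Values quoted in the text for beetroot.
print(delta_from_norm(26, 18, 24))   # temperature: 2 above the norm
print(delta_from_norm(65, 50, 60))   # air humidity: 5 above the norm
```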
Figure 8 presents boxplots of the distribution of values for the six key parameters. Red dots indicate abnormal values that are outside the normal ranges. For example, the temperature of 26 °C and the humidity of 65% are both above the optimal range and are shown as anomalies. Solution moisture, illumination, pH, and solution conductivity are within the normal limits, as evidenced by the green dots. These results indicate that temperature and humidity must be adjusted to maintain the optimal plant growth conditions, while the other parameters do not require changes.
It is recommended to reduce the temperature to 18–24 °C and the air humidity to 50–60% to avoid heat stress and the development of fungal infections, which can negatively affect plant growth. The other parameters should continue to be monitored to maintain stable conditions. These measures will help stabilize the growing conditions, improving the quality and yield of microgreens.
Let us consider, for example, a temperature reading of 20.96 for beets, where the norm is set with a lower bound of 18 and an upper bound of 24. In this case, the reading is within the norm, so the membership degree is equal to 0 (Equation (9)).
Now let us consider an illumination reading of 17.24, where the normal range is [12, 16]. The degree of deviation calculated with Equation (10) is 0.774; this value indicates that the illumination exceeds the norm by 0.774 on a scale from 0 to 1. Thus, hybrid analysis uses a combination of methods to evaluate the factors that affect microgreens.
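The idea of mapping a reading to a degree of deviation between 0 and 1 can be sketched as below. This is not the formula of Equations (9) and (10), which is not reproduced here; it is a generic linear ramp that assumes a hypothetical `tolerance` parameter (the distance beyond the bound at which the degree saturates at 1), so its outputs are not expected to match the reported 0.774.

```python
def deviation_degree(value, low, high, tolerance):
    """Degree of deviation in [0, 1] for a reading against the range [low, high].

    Returns 0.0 inside the range and rises linearly to 1.0 once the reading is
    `tolerance` units beyond the nearest bound. Both the linear shape and the
    `tolerance` parameter are illustrative assumptions.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    if low <= value <= high:
        return 0.0
    excess = low - value if value < low else value - high
    return min(excess / tolerance, 1.0)

# Readings from the text; the tolerance of 2.0 is a made-up value.
print(deviation_degree(20.96, 18, 24, tolerance=2.0))  # inside the norm
print(deviation_degree(17.24, 12, 16, tolerance=2.0))  # above the norm
```

Choosing the tolerance per parameter controls how quickly a deviation is treated as severe, which is the main design decision in this kind of membership function.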
3.4. Evaluation and Accuracy Metrics
Each stage plays its role: XGBoost selects significant features, PCA transforms the features into a new space for better data representation, and fuzzy logic allows the evaluation of each factor’s deviation from the norm. Accurate data are collected in a controlled hydroponic grow box to test the hybrid model. The collected data are used to validate the model, refine its algorithms, and test its ability to suggest corrective measures in the face of dynamic and changing parameters.
Figure 9 shows a bar chart of the model quality metrics, including MAE, MSE, RMSE, R², AIC, and BIC. The MSE and MAE metrics have minimal values (almost zero), and the coefficient of determination (R²) shows the high accuracy of the model with a result of 0.99. The AIC and BIC metrics have negative values (−25.76 and −27.01), indicating a good model adaptation to the data. This plot illustrates how effectively the model copes with the predictions and interpretation of data.
A bar chart in Figure 10 illustrates the model quality metrics, including MAE, MSE, RMSE, R², AIC, and BIC. The MAE and MSE metrics have low values (0.76 and 1.19, respectively), indicating minor average errors in the model. The coefficient of determination (R²) is 1.00, indicating the high accuracy of the model. The AIC and BIC metrics showed values of 13.03 and 11.78, reflecting the level of adaptation of the model to the data. The graph highlights the overall quality of the model and its ability to make accurate predictions for the given parameters.
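The metrics plotted in Figures 9 and 10 can be computed as in the sketch below. The text does not state which AIC and BIC formulas were used; the Gaussian least-squares forms AIC = n ln(MSE) + 2k and BIC = n ln(MSE) + k ln(n) are assumed here, and the function names and toy data are illustrative.

```python
import math

def aic_bic(mse, n, k):
    """AIC and BIC under a Gaussian least-squares assumption.

    n is the number of observations and k the number of fitted parameters.
    Requires mse > 0.
    """
    fit_term = n * math.log(mse)
    return fit_term + 2 * k, fit_term + k * math.log(n)

def regression_metrics(y_true, y_pred, n_params):
    """Return MAE, MSE, RMSE, R2, AIC, and BIC for a set of predictions.

    Assumes a non-constant target and at least one non-zero residual.
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    rss = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    tss = sum((t - mean_true) ** 2 for t in y_true)
    mse = rss / n
    aic, bic = aic_bic(mse, n, n_params)
    return {
        "MAE": sum(abs(r) for r in residuals) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - rss / tss,
        "AIC": aic,
        "BIC": bic,
    }

# Toy example (hypothetical values, not the study's data).
print(regression_metrics([1, 2, 3, 4], [1.5, 2.5, 2.5, 3.5], n_params=1))
```

Under this form, AIC and BIC become negative whenever the MSE is well below 1, which matches the sign pattern described for the two figures. As a rough plausibility check only, plugging in MSE = 1.19 with n = 6 and k = 6 (both guesses, not stated in the text) gives values close to 13.04 and 11.79.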
The hybrid method of analysis using weights and fuzzy logic demonstrated a high prediction accuracy (MAE: 0.03, MSE: 0.00, R²: 0.99) and a more detailed interpretation of the data. It highlights the critical parameters (air temperature and humidity), making it suitable for complex analysis systems. The AIC (−25.76) and BIC (−27.01) metrics confirm the effectiveness and adaptation of the model. The traditional method based on regression models showed higher errors (MAE: 0.76, MSE: 1.19) and limited interpretability. Although its R² is 1.00, it does not account for parameter significance, which makes it less effective. This method is suitable for simple analysis but is inferior to the hybrid approach in problems that require in-depth analysis and consideration of the influence of individual factors.