Prediction and Classification of Phenol Contents in Cnidium officinale Makino Using a Stacking Ensemble Model in Climate Change Scenarios

Lee, Hyunjo; Koo, Hyun Jung; Lee, Kyeong Cheol; Song, Yoojin; Joo, Won-Kyun; Chae, Cheol-Joo

doi:10.3390/agronomy14081766


by Hyunjo Lee ¹, Hyun Jung Koo ², Kyeong Cheol Lee ², Yoojin Song ³, Won-Kyun Joo ³,* and Cheol-Joo Chae ¹,*

¹ Department of General Education, Korea National University of Agriculture and Fisheries, Jeonju 54874, Republic of Korea

² Department of Crops and Forestry, Korea National University of Agriculture and Fisheries, Jeonju 54874, Republic of Korea

³ Department of Datacentric Problem Solving Research, Korea Institute of Science and Technology Information, Daejeon 34141, Republic of Korea

* Authors to whom correspondence should be addressed.

Agronomy 2024, 14(8), 1766; https://doi.org/10.3390/agronomy14081766

Submission received: 3 July 2024 / Revised: 6 August 2024 / Accepted: 8 August 2024 / Published: 12 August 2024

(This article belongs to the Special Issue Application of Deep and Machine Learning in Crop Monitoring and Management)


Abstract

Recent studies have focused on using big-data-based machine learning to address the effects of climate change scenarios on the production and quality of medicinal plants. Challenges relating to data collection can hinder the analysis of key feature variables that affect the quality of medicinal plants. In the study presented herein, we analyzed feature variables that affect the phenolic content of Korean Cnidium officinale Makino (C. officinale Makino) under different climate change scenarios. We applied different climate change scenarios based on environmental information obtained from Yeongju city, Gyeongsangbuk-do, Republic of Korea, and cultivated C. officinale Makino to collect data. The collected data included 3237, 75, and 45 records for environmental information, physiological response indicators, and physiological activity indicators, respectively, and data augmentation was performed to address this data imbalance. We designed a function based on the DPPH value to set the phenolic content grade in C. officinale Makino and proposed a stacking ensemble model for predicting the total phenol contents and classifying the phenolic content grades. The regression model in the performance evaluation presented an improvement of 6.23–7.72% in terms of the MAPE; in comparison, the classification model demonstrated a 2.48–3.34% better performance in terms of accuracy. The classification accuracy was >0.825 when classifying phenol content grades using the predicted total phenol content values from the regression model, and the area under the curve values of the model indicated high model fitness (0.978–0.981). We plan to identify the key feature variables for the optimal cultivation of C. officinale Makino and explore the relationships among these feature variables.

Keywords:

climate change; Cnidium officinale Makino; machine learning; phenols; stacking ensemble model

1. Introduction

Medicinal plants have long been utilized for the prevention and treatment of diseases. Of late, natural materials and new material extraction methods for medicinal purposes have gained recognition, with research in related fields progressing actively [1,2,3]. Medicinal plants contain various active compounds with distinct pharmacological properties, including antioxidant components such as vitamin C, vitamin E, and phenols, which remove reactive oxygen species (ROS) from the body, prevent damage to various tissues and different diseases, and exert anti-inflammatory and anticancer effects [4]. Challenges emerge in the cultivation of medicinal plants when the concentration of active compounds fluctuates even among plants of the same species, and this fluctuation is influenced by environmental variables, such as soil composition, climate conditions, and topography. Variations in the contents of active ingredients in medicinal plants pose a challenge that can lead to economic losses for users seeking to utilize them, because they may not provide sufficient ingredients, thereby leading to reduced efficacy or potential side effects [5]. Research has focused on the relationship between geographical conditions and active ingredient contents in medicinal plants [6,7,8,9]. Such studies highlight the importance of regional climate as a key factor in the successful growth of medicinal plants. Various climate change scenarios, such as the Special Report on Emission Scenarios (SRES), representative concentration pathways (RCPs), and shared socioeconomic pathways (SSPs), have been proposed to simulate the effect of climate change [10]. Ongoing research focuses on machine learning techniques using big data for climate change scenarios.

Applications of machine learning in agriculture focus on crop yield prediction, crop identification, disease detection, crop quality grading, and forecasting active ingredient concentrations. In crop yield prediction, researchers utilize historical data and environmental factors such as soil and climate conditions to enhance accuracy, often employing single predictive models with feature selection techniques [11,12,13,14,15,16,17,18]. Crop identification and disease detection leverage image analysis from satellites and monitoring cameras to classify crops and identify diseases based on visual characteristics. Ongoing efforts aim to improve accuracy through ensemble models combining multiple convolutional neural networks (CNNs) [19,20,21,22]. Crop grading studies utilize image analysis to classify fruits and vegetables based on features such as color and size [23,24,25]; in comparison, research on predicting active ingredient contents primarily targets medicinal plants, facing challenges in data collection due to lower yields and higher costs compared to staple food crops [26,27,28,29]. Overall, while significant advancements have been made in machine learning applications for food crops, progress in medicinal plants remains limited due to data collection difficulties.

The medicinal plant Cnidium officinale Makino (C. officinale Makino) is sensitive to climate change because of its delicate leaves. This plant thrives in habitats with temperatures below 30 °C, moderate sunlight, and significant temperature variations [30]. As of 2010, only 3% of the total land area in South Korea, including a portion of Gyeongsangbuk-do, was suitable for cultivating C. officinale Makino. The quality of C. officinale Makino is determined by the presence of phenol, an antioxidant. In this study, we propose a stacking ensemble model that can predict the total phenol contents and classify the phenol content grades based on different climate change scenarios (SSP1-2.6, SSP3-7.0, and SSP5-8.5) for C. officinale Makino. The proposed model enables the analysis of key variables influencing the phenol content in C. officinale Makino.

The remaining sections of the manuscript are organized as follows: Section 2 describes the data collection and augmentation, the assignment of phenol content grades, and the proposed stacking ensemble model for predicting phenol contents and classifying phenol content grades under different climate change scenarios. Section 3 assesses the performance of the proposed method in predicting phenol content and classifying phenol content grades in C. officinale Makino. Section 4 analyzes the prediction and classification results, in addition to the importance of the feature variables. Lastly, Section 5 concludes the study.

2. Materials and Methods

2.1. Data Architecture

We established different environments based on varying climate change scenarios using a day-lit soil plant system chamber (SPDS chambers, PTW Freiburg, Freiburg, Germany) and collected C. officinale Makino data, including environmental information, physiological response indicators, and physiological activity indicators. In such conditions, issues such as data imbalance or scarcity may arise given the limitations of the cultivation environment. Therefore, we applied data augmentation techniques to the collected data. We produced an equation to assess the phenol content grades based on the measured values obtained from the 2,2-diphenyl-1-picrylhydrazyl (DPPH) radical scavenging activity. The proposed stacking ensemble model includes base models and a metamodel. The final prediction results depend heavily on the base models, and therefore, selecting the base models is crucial. We selected the base models using Spearman’s correlation coefficients. Figure 1 illustrates the proposed model flow for predicting the total phenol contents and classifying the phenol content grade of C. officinale Makino.

2.2. Data Collection

The C. officinale Makino data were collected by cultivating the plant in SPDS chambers at the Climate Change Education Center of Korea National University of Agriculture and Fisheries from May to September 2023. The environmental conditions of the chambers were set based on the average monthly temperatures of the three most recent years (2020–2022) in Yeongju city, Gyeongsangbuk-do, as follows: SSP1-2.6 (CO₂ 445 ppm, +1.8 °C, 60%), SSP3-7.0 (CO₂ 872 ppm, +3.6 °C, 60%), and SSP5-8.5 (CO₂ 1142 ppm, +4.4 °C, 60%) [31]. Environmental information such as CO₂, temperature, and humidity conditions was collected using sensors in the chamber. In addition, the vapor pressure deficit (VPD) was calculated using the temperature and humidity. The physiological response indicators were assessed monthly in five repeated experiments. The physiological response indicators consist of the following three types of parameters: chlorophyll contents (i.e., chlorophyll a (Chl a) and chlorophyll b (Chl b), total chlorophyll (TChl), carotenoids (Car), the ratio of chlorophyll a/b (Chl a/b), and the ratio of total chlorophyll/carotenoids (TChl/Car)), energy flux per reaction center (RC) parameters (i.e., absorption flux per RC (ABS/RC), energy dissipation flux per RC (DIo/RC), trapping of electrons per RC (TRo/RC), and electron flux per RC (ETo/RC)), and photosynthesis activity parameters (i.e., performance index on absorption basis (PI abs), driving force on absorption basis (DF abs), structure–function index on absorption basis (SFI abs), and maximum quantum yield of PSII photochemistry (Fv/Fm)). Physiological activity indicators such as the total phenol contents and DPPH radical scavenging activity were assessed in selected samples harvested in September in three repeated experiments. The data for each climate change scenario consist of 3237, 75, and 45 records for environmental information, physiological response indicators, and physiological activity indicators, respectively.
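The text states that the VPD was calculated from temperature and humidity but does not give the formula. The following is a minimal sketch, assuming the widely used Tetens approximation for saturation vapor pressure; the function names and the choice of formula are assumptions for illustration, not the authors' documented method.

```python
import math


def saturation_vapor_pressure_kpa(temp_c):
    """Saturation vapor pressure in kPa (Tetens approximation, an assumed choice)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vpd_kpa(temp_c, rel_humidity_pct):
    """Vapor pressure deficit: saturation pressure minus actual vapor pressure."""
    es = saturation_vapor_pressure_kpa(temp_c)
    return es * (1.0 - rel_humidity_pct / 100.0)


# Hypothetical reading: 25 °C air temperature at the 60% humidity set point.
example_vpd = vpd_kpa(25.0, 60.0)
```

Because humidity is held at 60% in all three scenarios, under this formula the VPD varies only with temperature.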

2.3. Data Augmentation

The collected C. officinale Makino data comprise a small number of physiological response indicators and physiological activity indicators compared to environmental information, which causes data imbalance, thereby leading to decreased prediction performance in machine learning. In the present study, we addressed the issues of data size and imbalance by utilizing TVAE (Tabular Variational Autoencoder) [32,33]. The TVAE performs mode-specific normalization and conditional generation with training-by-sampling to reflect the distribution of the original data by column. In mode-specific normalization, the distribution of categorical and continuous variables is normalized through one-hot encoding. The number of bits for categorical variables is the same as the number of categories in that column; in comparison, for continuous variables, it is the same as the number of sub-Gaussian distributions generated by the Gaussian mixture model. In the collected C. officinale Makino data, the column data for chlorophyll a (Chl a) collected in May were divided into three sub-Gaussian distributions, resulting in three bits. If the i-th entry of the column represents a probability distribution of (0.03, 0.8, 0.2), the largest value, 0.8, is set to 1 and the others are set to 0, resulting in one-hot encoding as 010. In conditional generation with training-by-sampling, new data are generated based on the normalized variables to address the issue of data imbalance. For categorical variables, a log transformation is applied to increase the selection probability of categories with low frequency. Moreover, during the training process, the same ELBO loss function as in the VAE is used to minimize the generation of augmented data that are dissimilar to the original data [34]. The ELBO loss function is as follows.

E_{q_{\phi}(z|x)} \left[ \log \frac{p_{\theta}(x, z)}{q_{\phi}(z|x)} \right] = E_{q_{\phi}(z|x)} \left[ \log p_{\theta}(x|z) \right] - D_{KL} \left( q_{\phi}(z|x) \,\|\, p(z) \right)

(1)
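The one-hot step of mode-specific normalization described before Equation (1) can be sketched as follows. This mirrors the description in the text (the largest mode probability is set to 1, the rest to 0); the function name is hypothetical and the snippet does not reproduce the TVAE implementation itself.

```python
def one_hot_mode(probabilities):
    """Return a one-hot list marking the most probable sub-Gaussian mode."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return [1 if i == best else 0 for i in range(len(probabilities))]


# The Chl a example from the text: three modes with probabilities (0.03, 0.8, 0.2).
bits = one_hot_mode((0.03, 0.8, 0.2))
```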

To verify the efficiency of augmented data, performance evaluations were conducted using original data and augmented data based on methods used in existing studies [35,36,37]. As a result, it was confirmed that the performance is similar or improved when using augmented data.

A total of 27,000 augmented data records were generated using the TVAE technique. The cosine similarity, correlation similarity, and Jensen–Shannon similarity were measured to assess the similarity between the original and augmented data. The similarity measurement results showed that the augmented data aligned well with the data characteristics, with an average similarity rate of over 97%. Table 1 summarizes the similarity measurement results for the augmented data.
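The paper names the similarity measures but does not define them. As a hedged sketch, two of them could be computed as below, assuming that the Jensen–Shannon similarity is one minus the base-2 Jensen–Shannon divergence of normalized histograms; that definition and the function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def js_similarity(p, q):
    """1 - Jensen-Shannon divergence (log base 2), so the result lies in [0, 1]."""
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]  # normalize the histograms to probabilities
    q = [y / total_q for y in q]
    m = [(x + y) / 2.0 for x, y in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the divergence.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 1.0 - 0.5 * kl(p, m) - 0.5 * kl(q, m)
```

In practice these would be applied per column, for example to histograms of an original column and its augmented counterpart, and then averaged.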

Figure 2 shows the comparison between the quartile distributions of the original and augmented data. The quartile and mean values of all columns (features) were calculated to compare the quartile distributions. The values were normalized to the range of 0 to 100 using the min–max scaling method. The difference in the quartile distributions between the original and augmented data was small, and the average similarity of the quartiles in each column was over 90%. The effectiveness of the augmented data was confirmed by their characteristics, which closely resembled those of the original data.

2.4. Assigning Phenol Content Grades Using DPPH Radical Scavenging Activity

DPPH radical scavenging activity is used to measure the antioxidant activity of plants [38]. We established an equation to classify the phenol content grade of C. officinale Makino based on the DPPH radical scavenging activity results. We collected samples before and after conducting cultivation experiments on C. officinale Makino and extracted phenols using 70% ethanol to assess the DPPH results. We prepared C. officinale Makino extracts in different concentrations (0.0625, 0.125, 0.25, 0.5, and 1%) and added 100 µL of the extracts, 100 µL of 99% ethyl alcohol, and 100 µL of 0.4 mM DPPH to a 96-well plate. The plate was incubated at room temperature in the dark for 30 min, and the absorbance was measured at 517 nm using a microplate reader. Ascorbic acid was used as a control.

DPPH (%) = \frac{Absorbance of control - Absorbance of test}{Absorbance of control} \times 100

(2)
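Equation (2) translates directly into a small helper. The absorbance readings in the example are hypothetical values chosen only to illustrate the arithmetic.

```python
def dpph_scavenging_pct(abs_control, abs_test):
    """DPPH radical scavenging activity (%) from absorbances at 517 nm, per Equation (2)."""
    if abs_control <= 0:
        raise ValueError("control absorbance must be positive")
    return (abs_control - abs_test) / abs_control * 100.0


# Hypothetical readings: control 0.80, sample 0.20 -> 75% scavenging.
activity = dpph_scavenging_pct(0.80, 0.20)
```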

The phenol contents extracted from an arbitrary sample A of C. officinale Makino are proportional to the total phenol contents of A multiplied by its DPPH value.

{ExtractedPhenolContents}_{A} \propto {TotalPhenolContents}_{A} \times DPPH (%)

(3)

Based on Equations (2) and (3), if the DPPH value of sample A was the same as that of the control group, ascorbic acid (e.g., IC₅₀), the phenol content in the extracts of A would be the same as that in the extracts of ascorbic acid.

\text{TotalPhenolContents}_{A} \times \text{DPPH}_{A} = \text{TotalPhenolContents}_{\text{Ascorbic acid}} \times \text{DPPH}_{\text{Ascorbic acid}} \Leftrightarrow \text{TotalPhenolContents}_{A} = \frac{\text{TotalPhenolContents}_{\text{Ascorbic acid}} \times \text{IC}_{50}(\text{Ascorbic acid})}{\text{IC}_{50}(A)}

(4)

In Equation (4), TotalPhenolContents_{Ascorbic acid} is represented by a constant C, and the IC₅₀ value of ascorbic acid is 0.04. The total phenolic content of sample A was quantified using Equation (5), as follows:

\text{TotalPhenolContents}_{A} = \frac{C \times 0.04}{\text{IC}_{50}(A)}

(5)

Min–max normalization was applied to the IC₅₀ values to determine the phenol contents. Consequently, the original values in the range of 0.04 to 1.00 were transformed into values within the range of 0 to 1. Subsequently, a linear transformation mapped the scaled IC₅₀ values onto the grade scale bounded by the highest content grade (C_high = 1) and the lowest content grade (C_low = 10), as described in Equation (6):

\text{IC}_{50}(A)' = |C_{high} - C_{low}| \times \frac{\text{IC}_{50}(A) - \min(\text{DPPH})}{\max(\text{DPPH}) - \min(\text{DPPH})} + C_{high}

(6)
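The rescaling in Equation (6) can be checked numerically with a short sketch. The constants are the values stated in this section (IC₅₀ bounds of 0.04 and 1.00, C_high = 1, C_low = 10); the function and constant names are hypothetical.

```python
IC50_MIN = 0.04  # IC50 of ascorbic acid (lower bound)
IC50_MAX = 1.00  # maximum DPPH value (upper bound)
C_HIGH = 1       # highest content grade
C_LOW = 10       # lowest content grade


def rescale_ic50(ic50):
    """Map an IC50 value from [IC50_MIN, IC50_MAX] onto the grade scale [C_HIGH, C_LOW]."""
    scaled = (ic50 - IC50_MIN) / (IC50_MAX - IC50_MIN)  # min-max step, result in [0, 1]
    return abs(C_HIGH - C_LOW) * scaled + C_HIGH        # linear map of Equation (6)
```

A sample whose IC₅₀ equals that of ascorbic acid lands on the best end of the scale (1), and a sample at the upper bound lands on the worst end (10).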

Lastly, for computational convenience, we set C = 1 and used the ceiling function to convert grades into integers. Equation (7) presents a function for quantifying the phenol content grade.

\text{PhenolContentGrade} = \left\lceil \frac{0.04}{\text{IC}_{50}(A)'} \right\rceil

(7)

In the present study, we set IC₅₀min = 0.04, representing the IC₅₀ value of ascorbic acid; IC₅₀max = 1.00, the maximum DPPH value; the highest concentration grade (C_high) as 1; and the lowest concentration grade (C_low) as 10. The calculated IC₅₀ values are listed in Table 2.

The regression and classification models were evaluated using both the original and augmented data to assess the effectiveness of the augmented data. The following five models were used to evaluate the performance of the regression model: linear regression (LinearR), k-nearest neighbors (kNN), support vector machine (SVM), decision tree (DT), and random forest (RF). We calculated the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) for each model and generated performance vectors for each climate change scenario using Equation (8), as follows:

\vec{V}_{R} = [\text{MAE}_{SSP}, \text{RMSE}_{SSP}, \text{MAPE}_{SSP}]

(8)
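As an illustration of Equation (8), the three error metrics and the resulting performance vector for one scenario can be computed as below. The metric definitions are the standard ones; the function names and the toy values in the test are assumptions.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no true value is zero."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true) * 100.0


def regression_performance_vector(y_true, y_pred):
    """Performance vector [MAE, RMSE, MAPE] for a single climate change scenario."""
    return [mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred)]
```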

Figure 3a shows the performance vectors under the different climate change scenarios. Except for linear regression and DT, the original and augmented data yielded similar prediction errors, indicating that the quality of the augmented data was excellent.

Logistic regression (LogisticR), kNN, SVM, DT, and RF were used to evaluate the performance of the classification model. We measured the accuracy, precision, recall, F1 score, and area under the curve (AUC) to generate performance vectors for each climate change scenario using Equation (9), as follows:

\vec{V}_{C} = [\text{Accuracy}_{SSP}, \text{Precision}_{SSP}, \text{Recall}_{SSP}, \text{F1 score}_{SSP}, \text{AUC}_{SSP}]

(9)

To enhance visibility while comparing results, the ln transformation was performed using Equation (10) after measuring the vector length.

\vec{V}_{transform} = \ln \vec{V} + 2.0

(10)
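One reading of Equation (10), given the remark that the transformation follows the measurement of the vector length, is that the natural logarithm is applied to the Euclidean length of the performance vector. The sketch below follows that interpretation, which is an assumption; the function name is hypothetical.

```python
import math


def transformed_length(vector):
    """ln of the Euclidean length of a performance vector, shifted by 2.0."""
    length = math.sqrt(sum(v * v for v in vector))
    return math.log(length) + 2.0
```

The constant offset only shifts the plotted values and does not change the ordering of the models being compared.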

Figure 3b shows the performance vectors of accuracy, precision, recall, F1 score, and AUC based on the climate change scenarios. The original data yielded a slightly better performance than the augmented data because overfitting of the model occurred when using the original data, given its limited quantity.

2.5. Stacking Ensemble Model for the Prediction and Classification of Phenol Contents in C. officinale Makino

The proposed structure of the stacking ensemble model is illustrated in Figure 4.

Data on C. officinale Makino were collected by cultivating the plant in SPDS chambers under different climate change scenarios. Once the data were collected, the phenol content grades were measured and augmented data were generated. The augmented data include environmental information, physiological activity indicators, and physiological response indicators. In the prediction model, the base models predicted the total phenol contents and generated a transposed matrix of their results. The metamodel used the base model results for the final prediction. In the classification model, the base models classified phenol content grades using both augmented data and predicted total phenol contents from the prediction model. The metamodel used the base model results for the final classification.
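The step in which base-model outputs become the metamodel's inputs can be sketched with out-of-fold predictions. This is a minimal pure-Python illustration, assuming k-fold splitting without shuffling and two toy base models (a mean predictor and a one-dimensional nearest-neighbor predictor); none of these names or models come from the paper, and the metamodel itself is not shown.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, for reproducibility)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean(X, y):
    """Toy base model: always predicts the training mean."""
    mean = sum(y) / len(y)
    return lambda x: mean


def fit_1nn(X, y):
    """Toy base model: predicts the target of the nearest training point."""
    def predict(x):
        nearest = min(range(len(X)), key=lambda i: abs(X[i] - x))
        return y[nearest]
    return predict


def out_of_fold_features(X, y, fitters, k=5):
    """Return one row per sample holding each base model's held-out prediction."""
    n = len(X)
    meta = [[None] * len(fitters) for _ in range(n)]
    for test_idx in kfold_indices(n, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        for m, fit in enumerate(fitters):
            predict = fit(X_train, y_train)  # trained without the held-out fold
            for i in test_idx:
                meta[i][m] = predict(X[i])
    return meta
```

Each row of the returned matrix is one training example for the metamodel, so the metamodel never sees a base-model prediction made on data that base model was trained on.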

The results of the base models serve as input variables for the metamodel in the proposed stacking ensemble model. Therefore, the diversity and performance of the base models are crucial for improving the performance of the ensemble [39]. In the present study, we analyzed the performance and correlations of various base models to construct a stacking ensemble model. Selecting base models whose prediction and classification results are weakly correlated increases the diversity of the base models in the ensemble, leading to improved performance [40]. We conducted performance evaluations on nine base models (LinearR/LogisticR, kNN, SVM, DT, RF, XGBoost (XGB), AdaBoost (ADA), LightGBM (LGBM), and CatBoost (CAT)) and measured the correlations among the models using Spearman’s correlation coefficient [41,42]. Three to five base models were selected, as recommended for optimal results. In the stacking ensemble regression model, we first selected XGB, which shows the lowest prediction error (Figure 5a), and then chose kNN, LinearR, LGBM, and ADA in order of the lowest sum of correlations with the previously selected models (Table 3). Similarly, in the classification model, we first selected XGB, which shows the best accuracy (Figure 5b), and then chose DT, ADA, kNN, and CAT. To address the overfitting issue in the ensemble model, we applied cross-validation-based model training. We used five-fold cross-validation, which creates training and testing sets for each fold; these sets are used to train the metamodel. Our results confirmed that this approach enhanced the model’s completeness and prevented overfitting.

\text{Vector}_{Total} = [\text{Vector}_{SSP1}, \text{Vector}_{SSP3}, \text{Vector}_{SSP5}]

(11)
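The base-model selection procedure described in this section (start from the best single model, then repeatedly add the candidate with the lowest sum of Spearman correlations to the models already chosen) can be sketched as a greedy loop. The data structures and function names are hypothetical, and the sum of signed correlations is used as stated in the text.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with ties assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


def select_base_models(predictions, errors, n_models):
    """Greedy selection: best model first, then lowest summed correlation."""
    selected = [min(errors, key=errors.get)]
    while len(selected) < n_models:
        candidates = [m for m in predictions if m not in selected]
        best = min(
            candidates,
            key=lambda m: sum(spearman(predictions[m], predictions[s]) for s in selected),
        )
        selected.append(best)
    return selected
```

Here `predictions` maps each candidate model name to its predictions on a common validation set, and `errors` maps each name to its prediction error; for the classification variant, the first pick would instead be the model with the highest accuracy.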

A performance evaluation was conducted to determine the best combination of base models and metamodel. The best combination for regression was {(XGB, kNN, LinearR, LGBM, and ADA), RF}; in comparison, for classification, the best combination was {(XGB, DT, ADA, and kNN), RF} (Figure 6a,b, respectively).

3. Results

We grouped the feature variables based on environmental information and physiological response indicators to identify the causal variables with the best prediction and classification performance. We included all environmental variables for each group of feature variables because they are the only variables that can be adjusted when cultivating C. officinale Makino. In the classification models, environmental information and total phenol contents predicted from each regression model were included for all groups. Therefore, as shown in Table 4, we created eight variable groups using the physiological response indicators chlorophyll content (Chl), energy flux per reaction center (EF), and photosynthesis activity (PA). For the experiment, 80% of the data were used for training, 20% were used for testing, and k-fold cross-validation (k = 5) was performed.

We measured the MAE, RMSE, MAPE, and R² scores to assess the performance of the prediction model. We compared the performance of the proposed model with that of XGB, which had the best prediction and classification performances among the single models. We also set the hyperparameters for each model using Optuna, the hyperparameter optimization framework. Optuna is a framework that automatically adjusts and optimizes the hyperparameters of machine learning models, performing optimization through a tree-structured Parzen estimator (TPE) and a pruning mechanism. TPE is a type of Bayesian optimization that suggests hyperparameter combinations expected to improve model performance based on previous evaluation results. At this point, additional resources are allocated to the nodes performing training and evaluation with the proposed combinations to reduce the execution time. Furthermore, through a pruning mechanism, if the intermediate evaluation value of a node being trained with a specific hyperparameter combination does not meet expectations, the node is stopped without further training, thereby increasing the speed of the optimization process and reducing unnecessary resource consumption. To optimize hyperparameters using Optuna, it is necessary to define the objective function for each model and establish the ranges for hyperparameter search. In the present study, we defined the objective function (minimize, RMSE) to minimize the RMSE value for the regression models. For the classification models, we defined the objective function (maximize, accuracy) to maximize the accuracy value. The range of the hyperparameter search and the selected hyperparameter values for the regression and classification models are shown in Table 5.

3.1. Prediction of Total Phenol Contents in XGB and the Proposed Stacking Ensemble Regression Model

We performed total phenol content prediction using XGB and the proposed stacking ensemble regression model in the climate change scenarios SSP1-2.6, SSP3-7.0, and SSP5-8.5. The prediction error vectors of the two models were generated using Equations (8), (10), and (11), and the comparison results are shown in Figure 7. The proposed ensemble model demonstrated 6.23–7.73% less error and showed a 1.89–3.50% better R² score compared to those of the XGB. The proposed stacking ensemble regression model exhibited a lower prediction error than that of the XGB because the proposed model calculates the results using the predicted values obtained from the base models. The performance of the feature variable groups that included photosynthetic activity parameters, such as PA, Chl-PA, EF-PA, and Chl-EF-PA, was ~23% better than that of the other variable groups, because these groups enhance the model’s fitness for actual data by complementing parts that are difficult to explain using only environmental information.

3.2. Classification of Phenol Content Grade in XGB and the Proposed Stacking Ensemble Classification Model

We evaluated the performance of the proposed classification ensemble model compared with the XGB using the climate change scenarios SSP1-2.6, SSP3-7.0, and SSP5-8.5. Using Equations (9)–(11), we generated the model-specific accuracy vectors and compared them (Figure 8). As shown in Figure 8a,b, the Chl-EF-PA group demonstrated the highest performance for XGB and the proposed model, respectively. The proposed ensemble model demonstrated a 2.28–6.92% better performance compared with the XGB in terms of accuracy and F1 score. Unlike regression models, the parameters of the physiological response indicators have less impact on the classification accuracy because quantization occurs when classifying the phenol content grades, which decreases the importance of the differences among the variables. The ROC curves are shown in Figure 8c. The AUC values of the ROC curve are in the range of 0.978 to 0.981, indicating high model fitness.

4. Discussion

Analyzing the feature variables that influence the phenol content of C. officinale Makino yields information that can be utilized for the cultivation of high-quality C. officinale Makino in controlled environments such as smart farms. To achieve this, the feature importance (FI) in regression and classification models was analyzed. In the present study, the permutation importance (PI) technique was utilized to measure the importance of features. PI is a method that assesses the importance of a specific feature by measuring the reduction in a model’s performance when the data order of that feature is randomly shuffled, using performance metrics such as MAE, MSE, R², and accuracy. The process of PI consists of the following steps: feature selection, data shuffling, performance evaluation, and importance calculation. It is efficient in terms of time and resources because it does not require model retraining. To validate the importance of features, the average value of PI was measured through k-fold (k = 5) cross-validation during the execution of PI. The data used in the experiments include causes and intermediate processes of photosynthesis, in addition to final production results, which can lead to causal relationships and measurement errors due to multicollinearity. However, the results of experiments conducted by varying the variables within the feature variable group showed that the best performance was achieved when all variables were used, indicating that removing specific features could negatively impact model performance. Considering this domain information, we could not find appropriate methods to mitigate multicollinearity. In order to enhance robustness during the measurement of feature importance, outlier and noise removal were performed on the collected raw data, and normalization was conducted before training.
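The permutation importance procedure described above can be sketched in a few lines. This is a minimal illustration, assuming MAE as the performance metric and averaging over repeated shuffles rather than over cross-validation folds; the function names, the `n_repeats` and `seed` parameters, and the toy model in the usage note are hypothetical.

```python
import random


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MAE when one feature column is shuffled (no retraining)."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)  # break the link between this feature and the target
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            permuted = mean_absolute_error(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

For a toy model that depends only on the first feature, such as `predict = lambda row: 2.0 * row[0]`, shuffling the second column leaves the error unchanged, so its importance is zero, while shuffling the first column increases the error.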

In the proposed stacking ensemble regression model, we measured the FI, as shown in Figure 9a. In SSP1-2.6, the VPD and temperature showed an FI of 25.59 and 14.90%, respectively; in comparison, in SSP5-8.5, these values stood at 23.02 and 42.99%, respectively. The CO₂ and humidity conditions were consistently controlled; in comparison, the temperature and VPD changed continuously. Therefore, temperature and VPD have higher feature importance than CO₂ and humidity in predicting the total phenol contents. In SSP5-8.5, the photosynthetic activity of C. officinale Makino is disturbed because of the significantly higher temperature (+4.4 °C) compared to the plant’s optimal cultivation environment, which decreases the importance of variables in the physiological response indicators. In contrast, in SSP3-7.0, C. officinale Makino overgrew owing to conditions of high temperature, humidity, and CO₂, thereby highlighting the increased importance of physiological response indicators. The excessive production of ROS due to overgrowth consumes large amounts of phenols for antioxidant activity, leading to a decrease in the total phenol contents.

In the classification ensemble model, we measured the feature importance (Figure 9b). While measuring FI, we excluded the predicted total phenolic content, which was directly used to assign the grade of phenolic content. In SSP1-2.6, temperature and VPD showed an FI of 20.78 and 10.34%, respectively; in SSP3-7.0, temperature and VPD showed an FI of 5.33 and 5.44%, respectively; and in SSP5-8.5, they showed an FI of 27.73 and 3.41%, respectively. Temperature and VPD were highly significant in the classification model.

To analyze the impact of feature variables on model performance in the base models and the stacking ensemble model, the performance index (PI) of the base models was measured. In the stacking ensemble regression model, the five features with FI values ranking in the top 25% were selected for comparison with the FI values of the base models. The top five features identified for the regression model under each climate change scenario were as follows: SSP1-2.6 = (VPD, Temp, TChl, Fv/Fm, and Car), SSP3-7.0 = (Car, Chl a/b, Fv/Fm, ABS/RC, and Chl b), and SSP5-8.5 = (Temp, VPD, ABS/RC, Car, and Fv/Fm). The average rank values of these features in the base models were as follows: SSP1-2.6 = (1.4, 1.6, 7.6, 5.8, and 7.4), SSP3-7.0 = (1.4, 4.0, 4.8, 3.6, and 6.2), and SSP5-8.5 = (1.2, 2.4, 7, 3.4, and 6.0). Thus, for SSP1-2.6 and SSP5-8.5, both Temp and VPD show high importance not only in the base models but also in the proposed stacking ensemble model. For SSP3-7.0, although the rankings of the five features differ, they all belong in the top five, indicating similarity between the base models and the proposed stacking ensemble model.

Likewise, a comparison was conducted for the top five features in the stacking ensemble classification model. The top five features identified for the classification model under each climate change scenario were as follows: SSP1-2.6 = (Temp, Dio/RC, VPD, TChl, and Car), SSP3-7.0 = (Tro/RC, Eto/RC, TChl, and Chl a/b), and SSP5-8.5 = (VPD, PI abs, Chl a/b, Fv/Fm, and ABS/RC). The average rank values of these features in the base models were as follows: SSP1-2.6 = (7.25, 4.25, 5, 4.25, and 8.25), SSP3-7.0 = (4.25, 2.5, 7.75, 4, 5, and 7), and SSP5-8.5 = (3.25, 6.75, 7.5, 4.5, and 9.25). The XGB model demonstrated a high similarity in rank among the base models, whereas the ranks of ADA and kNN exhibited a notably low similarity, which led to a decrease in the average rank value for the top five features. For tree-based models, the importance of features can be quantified through entropy measurement during node splitting; accordingly, the performances varied depending on the entropy measurement method used by each model.

As CO₂ and temperature increase, the photosynthetic activity of plants increases; however, the production of reactive oxygen species also rises, leading to increased stress on the plants. An increase in reactive oxygen species content within the plants results in higher phenol production; however, if this situation persists for a long period, it negatively affects chlorophyll content, energy transfer efficiency, and the vitality of chemical energy production. Based on the PI analysis results, within the top 25% ranked features of SSP3-7.0 and SSP5-8.5, the selection ratio of environmental information variables decreased, while the selection ratio of features for plant health status, such as Fv/Fm, and of environmental stress sensitivity features (PI abs, DF abs, and SFI abs) increased. Considering these findings, analyzing the interrelationships among features suggests that it is possible to maintain stress levels and enhance phenol production based on optimal environmental conditions.
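
The average rank comparison described above can be illustrated with a minimal sketch. It assumes that each base model yields one importance score per feature and that rank 1 denotes the most important feature; the model names and importance values below are hypothetical and are not the measurements from this study.

```python
def rank_features(importances):
    """Map each feature to its rank (1 = most important) within one model."""
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def average_ranks(per_model_importances, features):
    """Average the per-model ranks of the selected features."""
    ranks = [rank_features(imp) for imp in per_model_importances.values()]
    return {f: sum(r[f] for r in ranks) / len(ranks) for f in features}


# Hypothetical importance scores for three base models (illustrative only).
base_importances = {
    "model_a": {"Temp": 0.40, "VPD": 0.30, "Car": 0.20, "TChl": 0.10},
    "model_b": {"Temp": 0.35, "VPD": 0.15, "Car": 0.30, "TChl": 0.20},
    "model_c": {"Temp": 0.25, "VPD": 0.45, "Car": 0.10, "TChl": 0.20},
}

# Features selected from the ensemble model's top-ranked set (assumed here).
avg = average_ranks(base_importances, ["Temp", "VPD"])
print(avg)
```

A low average rank indicates that the base models agree with the ensemble model on the importance of that feature, whereas a high value signals disagreement among the base models.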

5. Conclusions

In the study presented herein, we analyzed the important feature variables affecting phenolic content, which is an essential active ingredient of the medicinal plant C. officinale Makino, under different climate change scenarios. We proposed a stacking ensemble model that can predict total phenolic content and classify phenolic content grades based on the SSP climate change scenario. For data collection for C. officinale Makino, the environmental conditions in Yeongju city, Gyeongsangbuk-do, Republic of Korea, which is a representative cultivation area for this plant, were applied to different climate change scenarios, and C. officinale Makino was cultivated from May to September 2023. For each SSP scenario, 3237 environmental information items, 75 physiological response indicators, and 45 physiological activity indicators were collected, resulting in data imbalance. To address this issue, data augmentation was performed using the TVAE algorithm. The augmented data exhibited high similarity of over 97.3% compared with the original data. In addition, through a quartile distribution comparison, the data similarity by column was found to be over 90% on average. An equation was designed to set the phenolic content grades within C. officinale Makino based on the IC₅₀ value of DPPH radical scavenging activity. When the lowest grade of phenolic content was set at 10, the phenolic content grades of the collected C. officinale Makino data were classified into grades three to five. Base models were selected based on Spearman’s correlation coefficient to construct the proposed stacking-ensemble-based prediction and classification model. When evaluating the performance of the candidate pairs of ensemble models, XGB, kNN, LinearR, LGBM, ADA, and RF were selected for the regression models, and XGB, DT, ADA, kNN, and RF were selected for the classification models.
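
The stacking principle summarized above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the model used in this study: the two base learners (a least-squares line and a nearest-neighbor average), the single blending weight used as the metamodel, and the synthetic data are all hypothetical stand-ins for the selected base and metamodels.

```python
from statistics import mean


def fit_line(xs, ys):
    """Base learner 1: least-squares line through one feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=3):
    """Base learner 2: mean target of the k nearest training points."""
    def predict(x):
        nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def out_of_fold(fit, xs, ys, n_folds=4):
    """Predict each sample with a model that never saw it during fitting."""
    preds = [0.0] * len(xs)
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), n_folds):
            preds[i] = model(xs[i])
    return preds


def fit_blend_weight(p1, p2, ys):
    """Metamodel: least-squares weight w in the blend w*p1 + (1 - w)*p2."""
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if den == 0:
        return 0.5  # identical base predictions: any weight gives the same blend
    return sum((a - b) * (y - b) for a, b, y in zip(p1, p2, ys)) / den


def mse(preds, ys):
    """Mean squared error."""
    return mean((p - y) ** 2 for p, y in zip(preds, ys))


# Synthetic one-feature data (illustrative only, not the study's measurements).
xs = [float(i) for i in range(24)]
ys = [0.5 * x + (3.0 if x > 11 else 0.0) for x in xs]

oof_line = out_of_fold(fit_line, xs, ys)
oof_knn = out_of_fold(fit_knn, xs, ys)
w = fit_blend_weight(oof_line, oof_knn, ys)
oof_stack = [w * a + (1 - w) * b for a, b in zip(oof_line, oof_knn)]
print(mse(oof_line, ys), mse(oof_knn, ys), mse(oof_stack, ys))
```

The metamodel is fitted on out-of-fold predictions so that it learns how far to trust each base learner on unseen samples. For deployment, the base learners would typically be refitted on all training data before the learned weight is applied.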

The total phenol content prediction and phenol content classification in the context of different climate change scenarios were performed using the XGB model, which showed the best performance among the single models, and the proposed stacking ensemble prediction and classification model. The performance of the regression model improved by 6.23–7.73% in terms of MAPE; in comparison, the classification model demonstrated 2.48–3.34% better performance in terms of accuracy. An accuracy rate of 0.825–0.895 was achieved when classifying phenol content grades using the predicted total phenol content values from the regression model. The AUC values indicated high model fitness, ranging from 0.978 to 0.981. Therefore, the proposed stacking ensemble prediction and classification model demonstrates excellent performance in the context of different climate change scenarios. We analyzed the main feature variables that influence the phenol content and grade of phenol content within C. officinale Makino by measuring the feature importance of the ensemble model.
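
For readers who wish to reproduce this kind of comparison, the following minimal sketch shows how MAPE and the gain of one model over another could be computed. The values are hypothetical, and the sketch reports both the absolute difference (in percentage points) and the relative difference, since either convention may be meant by an "improvement" in MAPE.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical total phenol values and predictions (illustrative only).
actual = [10.0, 20.0, 40.0]
single = [12.0, 18.0, 44.0]   # e.g., predictions of a single model
stacked = [11.0, 19.0, 42.0]  # e.g., predictions of a stacking ensemble

mape_single = mape(actual, single)
mape_stacked = mape(actual, stacked)

absolute_gain = mape_single - mape_stacked           # percentage points
relative_gain = 100.0 * absolute_gain / mape_single  # percent of the baseline MAPE
print(mape_single, mape_stacked, absolute_gain, relative_gain)
```

Note that MAPE is undefined when an actual value is zero, so such records would need separate handling.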

Medicinal plants contain active ingredients that exert various effects, which have garnered attention in diverse fields such as health, healthcare, and beauty. However, research focused on predicting and classifying the content of active ingredients in medicinal plants has not been performed because of difficulties pertaining to data collection. In the present study, we proposed a stacking ensemble model to predict and classify the content of phenols in the medicinal plant C. officinale Makino. The performance evaluation results showed that the proposed model exhibited excellent performance in terms of prediction error, classification accuracy, and model explanatory power. The results reported herein can contribute to research on improving phenol production through environmental optimization during the cultivation of C. officinale Makino. Furthermore, a generalized version of the model could be utilized for the smart cultivation of various medicinal plants. In the proposed stacking ensemble regression and classification model, input variables such as environmental information, physiological response indicators, and physiological activity indicators were used to predict and classify the total phenol content and phenol content grade within C. officinale Makino. The use of this method resulted in low prediction error and high classification accuracy. However, in general cases, such as crop yield and price prediction, the input information primarily consists of nondestructive factors related to crops, such as environmental information, soil information, and plant growth images. Therefore, to diversify the application areas of the model and enhance its usability, methods are needed to infer the features obtained from the decomposition analysis of crops, such as physiological response indicators and physiological activity indicators, from nondestructive inputs such as environmental information, soil information, and growth images.

In future work, we aim to design and implement a method to predict the contents of active ingredients and classify their grades by predicting photosynthetic activity status and plant health status using only environmental data. Additionally, we intend to study prediction and classification models targeting specific active ingredients, such as flavonoids in phenolic mixtures, using deep learning techniques.

Author Contributions

W.-K.J., Y.S. and C.-J.C. designed the study; H.J.K. and K.C.L. cultivated the Cnidium officinale Makino and gathered data on C. officinale Makino, including physiological activity indicators and physiological response indicators, via extraction experiments; H.L. and C.-J.C. developed the stacking ensemble model for regression and classification; H.L. performed the bursting experiments; Y.S., H.L. and C.-J.C. analyzed the experimental results; W.-K.J. and C.-J.C. managed the project; H.L. and C.-J.C. wrote the initial draft of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Korea Institute of Science and Technology Information (KISTI) (No. (KISTI)J24JR008-24).

Data Availability Statement

All data are available via an email request submitted to the authors.

Conflicts of Interest

All authors declare that they have no conflicts of interest.

Figure 1. Flowchart of the proposed stacking ensemble model for the prediction and classification of phenol contents in Cnidium officinale Makino.

Figure 2. Comparison of the quartiles between the original and augmented data: (a) SSP1-2.6; (b) SSP3-7.0; (c) SSP5-8.5. Note that in each column the quartile distribution and mean of the original data appear to be similar to those of the augmented data (with an average similarity of 90%). As the data distribution by column in the augmented data is similar to that of the original data, it can be concluded that using augmented data is useful.

Figure 3. Comparison of vector lengths based on the different SSP scenarios: (a) regression models and (b) classification models.

Figure 4. Proposed stacking ensemble prediction and classification model. The process of the proposed model consists of the following four stages: data collection, data preprocessing, model training, and model performance evaluation. (1) In the data collection stage, data are collected by cultivating C. officinale Makino based on different climate change scenarios. The types of data collected are shown in the upper right corner of the figure. (2) Data augmentation is performed on the collected data, and the phenol content grade is measured. To evaluate performance based on feature variables, combinations of feature variable groups are generated. (3) From nine candidate models, the base models and metamodels for prediction and classification are selected to form an ensemble model, and training is conducted. The nine candidate models are shown in the middle right of the figure. (4) Lastly, the performance of the model is evaluated.

Figure 5. Comparison of the vector lengths of the base models: (a) regression models and (b) classification models.

Figure 6. Comparison of total vector lengths for selecting the best pair of base and metamodels: (a) regression models and (b) classification models.

Figure 7. Comparison of vector lengths for the regression models based on feature variable groups: (a) comparison of SSP vector lengths between the two regression models and (b) comparison of R² values between the two regression models.

Figure 8. Comparison of accuracies and F1 scores for the classification models based on feature variable groups for the classification of phenol content grades using the predicted total phenol contents: (a) accuracy; (b) F1 score; (c) ROC curve.

Figure 9. Feature importance for the ensemble models: (a) regression and (b) classification.

Table 1. Similarity between original and augmented data.

Similarity Measurement	SSP1-2.6	SSP3-7.0	SSP5-8.5
Cosine similarity	0.990	0.990	0.990
Correlation similarity	0.996	0.996	0.998
Jensen–Shannon similarity	0.973	0.975	0.979
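
As an illustration of how similarity measures of this kind can be computed, the sketch below implements cosine similarity, Pearson correlation, and a Jensen–Shannon similarity. The exact definitions used for Table 1 are not restated here, so the choice of 1 minus the base-2 Jensen–Shannon divergence between normalized histograms is an assumption, and the input values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_correlation(u, v):
    """Pearson correlation, i.e., cosine similarity of the mean-centered vectors."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])


def js_similarity(p, q):
    """Assumed definition: 1 - Jensen-Shannon divergence (log base 2)."""
    total_p, total_q = sum(p), sum(q)
    p = [a / total_p for a in p]  # normalize histograms to probabilities
    q = [b / total_q for b in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 1.0 - 0.5 * (kl(p, m) + kl(q, m))


# Hypothetical column summaries of original and augmented data (illustrative only).
original = [1.0, 2.0, 3.0, 4.0]
augmented = [1.1, 1.9, 3.2, 3.8]

print(cosine_similarity(original, augmented))
print(pearson_correlation(original, augmented))
print(js_similarity(original, augmented))
```

Under these definitions, each of the three measures equals 1 when the augmented data exactly reproduce the original data.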

Table 2. IC₅₀ values and phenol content grades of the collected data.

IC₅₀	SSP1-2.6	SSP3-7.0	SSP5-8.5
IC₅₀min (C_high)	0.04 (1)	0.04 (1)	0.04 (1)
IC₅₀ of maximum phenol content in collected data (phenol content grade)	0.32 (3)	0.32 (3)	0.20 (3)
IC₅₀ of minimum phenol content in collected data (phenol content grade)	0.47 (5)	0.57 (5)	0.32 (3)
IC₅₀max (C_low)	1.00 (10)	1.00 (10)	1.00 (10)

Table 3. Result of base model selection based on Spearman’s correlation coefficient.

Order of Model Selection	1st	2nd	3rd	4th	5th
Regression base models	XGB	kNN	LinearR	LGBM	ADA
Classification base models	XGB	DT	ADA	kNN	CAT

Table 4. Feature variable groups used in the performance evaluation.

Feature Variable Group	Physiological Response Indicators (O/-)
	Chlorophyll Contents (Chl)	Energy Flux per Reaction Center (EF)	Photosynthesis Activity (PA)
None	-	-	-
Chl	O	-	-
EF	-	O	-
PA	-	-	O
Chl-EF	O	O	-
Chl-PA	O	-	O
EF-PA	-	O	O
Chl-EF-PA	O	O	O

Table 5. Hyperparameter search range and optimized values for each model.

Model	Hyperparameter	Search Range	Optimized Value
kNN	n_neighbors	3 to 10, step = 1	3
kNN	p	1.0 to 2.0, step = 1.0	2.0
kNN	metric	{‘manhattan’, ‘minkowski’}	‘minkowski’
DT	criterion	{‘squared_error’, ‘poisson’, ‘friedman_mse’, ‘absolute_error’}	‘squared_error’
DT	max_depth	1 to 10, step = 1	7
DT	max_leaf_nodes	2 to 10, step = 1	10
DT	min_samples_split	2 to 10, step = 2	2
DT	min_samples_leaf	1 to 4, step = 1	2
RF	n_estimators	10 to 2000, step = 1	953
RF	max_features	{‘auto’, ‘sqrt’}	‘sqrt’
RF	max_depth	1 to 16, step = 1	15
RF	min_samples_split	2 to 10, step = 2	2
RF	min_samples_leaf	1 to 4, step = 1	3
XGB	n_estimators	10 to 2000, step = 1	1760
XGB	max_depth	1 to 20, step = 1	17
XGB	learning_rate	0.01 to 1.00, step = 0.01	0.22
XGB	alpha	0.01 to 1.00, step = 0.01	0.01
LGBM	n_estimators	10 to 2000, step = 1	1515
LGBM	max_depth	1 to 20, step = 1	20
LGBM	min_child_weight	1 to 300, step = 1	276
LGBM	min_child_samples	10 to 50, step = 1	11
LGBM	learning_rate	0.01 to 1.00, step = 0.01	0.68
ADA	n_estimators	10 to 2000, step = 1	50
ADA	learning_rate	0.01 to 1.00, step = 0.01	0.1

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
