Article

Predicting Dry Pea Maturity Using Machine Learning and Advanced Sensor Fusion with Unmanned Aerial Systems (UASs)

1 Department of Agricultural and Biosystems Engineering, North Dakota State University, Fargo, ND 58102, USA
2 Department of Plant Science, North Dakota State University, Fargo, ND 58102, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(11), 2758; https://doi.org/10.3390/rs15112758
Submission received: 19 April 2023 / Revised: 19 May 2023 / Accepted: 23 May 2023 / Published: 25 May 2023
(This article belongs to the Special Issue High-Throughput Phenotyping in Plants Using Remote Sensing)

Abstract

Maturity is an important trait in dry pea breeding programs, but the conventional process predominantly used to measure this trait can be time-consuming, labor-intensive, and prone to errors. Therefore, a more efficient and accurate approach would be desirable to support dry pea breeding programs. This study presents a novel approach for measuring dry pea maturity using machine learning algorithms and data collected by unmanned aerial systems (UASs). We evaluated the abilities of five machine learning algorithms (random forest, artificial neural network, support vector machine, K-nearest neighbor, and naïve Bayes) to accurately predict dry pea maturity on field plots. The machine learning algorithms considered a range of variables, including crop height metrics, narrow spectral bands, and 18 distinct color and spectral vegetation indices. Backward feature elimination was used to select the most important features by iteratively removing insignificant ones until the model’s predictive performance was optimized. The study’s findings reveal that the most effective approach for assessing dry pea maturity involved a combination of narrow spectral bands, red-edge, near-infrared (NIR), and RGB-based vegetation indices, along with image textural metrics and crop height metrics. The implementation of a random forest model further enhanced the accuracy of the results, exhibiting the highest level of accuracy with a value of 0.99 for all three metrics: precision, recall, and f1 score. The sensitivity analysis revealed that spectral features outperformed structural features when predicting pea maturity. While multispectral cameras achieved the highest accuracy, the use of RGB cameras may still result in relatively high accuracy, making them a practical option in scenarios where cost is a limiting factor. In summary, this study demonstrated the effectiveness of coupling machine learning algorithms, UAS-borne LIDAR, and multispectral data to accurately assess maturity in peas.

1. Introduction

Dry pea (Pisum sativum L.) is a major field crop that is widely grown around the world for its high protein content and numerous health benefits [1,2]. In addition to its nutritional value, dry pea is also important for farmers due to its ability to fix atmospheric nitrogen and its tolerance to environmental stressors, including drought, heat, salinity, and low-nitrogen soils [3]. As with any field crop, an accurate assessment of maturity is crucial for optimal harvest timing and yield prediction [4]. In the past, crop maturity has typically been assessed visually by trained personnel, or by using destructive sampling techniques that require the physical harvesting of a small portion of the field [5]. However, these methods are time-consuming, labor-intensive, and may not provide a representative sample of the entire field.
In recent years, there has been a growing interest in using machine learning models and unmanned aerial systems (UASs) equipped with multispectral and LIDAR sensors for assessing crop maturity [6,7,8]. These techniques offer several advantages over traditional methods. First, they allow for the rapid and cost-effective assessment of large areas, as UASs can cover a field in a matter of minutes or hours, while traditional methods may take days or weeks [9]. Second, they have the capability to collect high-resolution, multi-dimensional data, allowing for the collection of more precise and detailed information about the crop [9]. The integration of specialized sensors, such as multispectral and LIDAR, enhances the quality of the data collected. Multispectral sensors, for example, can analyze the reflectance of the crop in different wavelengths of the electromagnetic spectrum, providing crucial information on factors such as chlorophyll content and canopy structure [10]. Conversely, LIDAR sensors can measure the distance from the ground to the crop surface by emitting laser pulses and measuring the return time, facilitating accurate measurements of crop height and structural characteristics [10]. LIDAR is capable of measuring not just the crop surface, but also other objects in its field of view [11].
Sensor fusion refers to the integration of data from multiple sensors to provide a more comprehensive and accurate understanding of the environment or object being observed [12]. By integrating data from multiple sensors, such as RGB cameras, multispectral sensors, and LIDAR, sensor fusion can significantly improve the accuracy of maturity prediction classifiers [13,14]. The fusion of these diverse sensor modalities allows researchers to capture various aspects of plant physiology, morphology, and biochemistry simultaneously, providing a more holistic view of plant phenotypic traits. For example, by combining RGB imagery for plant canopy structure, multispectral data for assessing chlorophyll content, and LIDAR imaging for monitoring plant height, sensor fusion enables the simultaneous evaluation of multiple plant traits related to growth and health response. The integration of sensor data through fusion techniques, such as data fusion algorithms, machine learning, and statistical modeling, enhances the accuracy and reliability of plant phenotyping, enabling researchers to unravel complex relationships between genotype, environment, and phenotype [15].
Sensor fusion is typically used in conjunction with machine learning algorithms to analyze the data collected [15,16]. Support vector machine, random forests, and neural networks are just a few examples of the algorithms that have been used for plant maturity assessment using UASs [17]. By training these algorithms on data, they can learn the patterns and characteristics that are indicative of mature plants. Once trained, the algorithms can be used to predict the maturity of new, unlabeled data by using the patterns and characteristics learned during training [18]. In this way, machine learning algorithms enable the efficient processing of the large amounts of data collected by UASs equipped with multispectral and LIDAR sensors, providing valuable insights into crop maturity that can inform decision-making processes for farmers and breeders.
Previous research has shown that machine learning models and UASs can be used to predict important agricultural variables such as plant height [19], biomass [20], and crop yield [21]. While there has been some recent research on predicting crop maturity, particularly in wheat [22] and soybean [23,24], the application of these methods is not yet widely explored, and there is still much potential for further research and development. To the best of the authors’ knowledge, this study is one of the first to use machine learning methods to predict dry pea maturity using LIDAR and multispectral data.
In this study, we aim to address a gap in the field of dry pea breeding by comparing the performances of five machine learning models, namely, support vector machine, random forest, artificial neural network, K-nearest neighbor, and naive Bayes, for predicting dry pea maturity using data obtained from UAS-based sensors such as RGB, multispectral, and LIDAR. Additionally, we evaluated the impacts of six different input configurations, including narrow spectral bands, crop height metrics, image textural metrics, and RGB-based, red-edge, and NIR vegetation indices, on the accuracy of the predictions. Furthermore, we examined the effects of feature data selection and growth stage on the overall precision of the dry pea maturity predictions.

2. Materials and Methods

In this study, we applied a novel approach for estimating dry pea maturity that combines UAS-collected data with advanced machine learning (ML) methods (Figure 1).

2.1. Study Area and Data Acquisition

The experimental area was located at Prosper, ND (97.1157636°W, 46.9937094°N) (Figure 2). The trial consisted of 714 small plots, each measuring one meter wide by two meters long. The pulse crop breeding program at North Dakota State University established and maintained the field experiment. The plots were planted on 3 May 2022, and there were 304 unique genotypes in total, including 300 pea accessions and 4 check varieties. As the dates of visual assessment and the maturity date did not always match, the visual assessment was conducted multiple times throughout the maturity season, with 2–3 visits to the field each week. During each visit, trained technicians examined each of the 714 plots individually and assessed their maturity based on the degree of brownness of plants in the plot, assigning a putative date of maturity. The following day, the technicians double-checked whether the plots assigned putative dates had indeed matured, and if so, confirmed the final maturity date.
UAS image collection campaigns were carried out at 67 (8 July 2022), 84 (25 July 2022), and 100 (11 August 2022) days after planting (DAP). Sensors were mounted on a DJI Matrice 300 and a DJI Matrice 200 (DJI Inc., Shenzhen, China), which have autonomous flight capabilities for take-off, flight planning, and landing. DJI Pilot version 2.5.1.15 (DJI Inc., Shenzhen, China) was used to carry out the flights. The parameters for the aerial data acquisition process are provided in Table 1. To reduce the impact of shadows from the plants, the flights were conducted around noon for each data collection.

2.2. Data Pre-Processing and Image Processing

2.2.1. Radiometric Calibration

Radiometric calibration was performed on the entire scene using a calibrated reflectance panel (model RP06-2051004-OB) provided by the manufacturer; the calibration process was carried out in Pix4D version 4.8.1.
Reflectance bands in the blue (444 nm and 475 nm), green (531 nm and 560 nm), red (650 nm and 668 nm), red-edge (705 nm, 740 nm, and 717 nm), and NIR (840 nm) regions were used to calculate several vegetation indices from the experimental area for each flight date (Table 2), which were then used in further analysis. The selection of these indices was based on their sensitivity to specific physiological and biochemical changes that occur during the maturation process in dry pea. For example, some indices, including visible atmospherically resistant index (VARI) and normalized difference red-edge (NDRE), are sensitive to changes in chlorophyll content [25], which decreases as plants mature, while structural indices, including normalized difference vegetation index (NDVI) and soil-adjusted vegetation index (SAVI), are sensitive to changes in canopy structure, which becomes more compact and less variable as plants mature [26]. By selecting a range of vegetation indices that are sensitive to different aspects of the maturation process, we aimed to develop a comprehensive model for predicting maturity in dry pea that considers multiple physiological and biochemical changes that occur during the process.
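As an illustration of how such indices are derived from the narrow spectral bands, the sketch below computes NDVI and NDRE from co-registered reflectance arrays with NumPy. The band pairings (840 nm NIR with 668 nm red, and with 717 nm red-edge) follow the standard definitions; the function names and synthetic arrays are illustrative, not the actual Pix4D processing chain used in this study.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index from NIR (840 nm) and red (668 nm) reflectance."""
    return (nir - red) / (nir + red + 1e-10)  # small epsilon avoids division by zero

def ndre(nir, red_edge):
    """Normalized difference red-edge index from NIR (840 nm) and red-edge (717 nm) reflectance."""
    return (nir - red_edge) / (nir + red_edge + 1e-10)

# Example with synthetic reflectance rasters (values in [0, 1])
rng = np.random.default_rng(0)
nir_band = rng.uniform(0.3, 0.6, (100, 100))
red_band = rng.uniform(0.05, 0.15, (100, 100))
print(f"mean NDVI: {ndvi(nir_band, red_band).mean():.3f}")
```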

2.2.2. Image Stitching and Plot Data Extraction

Datasets collected with the P1 and MicaSense cameras were processed into orthomosaics using Pix4Dmapper (version 4.8.1) by Pix4D (Pix4D, Lausanne, Switzerland). The higher resolution of RGB images captured by the P1 sensor facilitated the accurate delineation of plot boundaries. Concurrently, the LIDAR data were processed using DJI Terra version 3.5.5 (DJI Inc., Shenzhen, China) to generate a point cloud layer, from which the digital surface model (DSM) and digital elevation model (DEM) layers were derived to obtain the plant height layer. To enable plot-level analysis, a semi-automated procedure was applied once in QGIS version 3.26.0 (QGIS Development Team, open-source software) to delineate the boundary of each plot. This was done based on the RGB mosaic image derived from the P1 sensor, and each plot was assigned an ID code to enable further analysis across all images and dates.
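The plant height layer described above is a simple raster difference. A minimal sketch follows, assuming the DSM and DEM have already been exported as co-registered NumPy arrays (the actual derivation was performed in DJI Terra and ENVI):

```python
import numpy as np

# Assume DSM and DEM grids are co-registered arrays of the same shape,
# exported from the LIDAR point cloud processing step.
dsm = np.array([[102.4, 102.9], [103.1, 102.7]])  # surface elevation (m)
dem = np.array([[101.8, 101.9], [101.9, 101.8]])  # bare-earth elevation (m)

crop_height = dsm - dem           # canopy height model (m)
crop_height[crop_height < 0] = 0  # clamp small negative residuals from sensor noise
print(crop_height)
```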

2.2.3. Masking out Soil and Shaded Pixels

Shaded areas can affect the camera’s ability to capture accurate reflectance data, especially as plants grow [43]. To remove the effects of soil and shadow backscatter, we extracted the Excess Green (ExG) Index (Figure 3b) from our RGB image (Figure 3a), since this index showed a good ability to remove the effect of soil and shadow backscatter in another study [44]. This helped us to identify and keep only the pixels of sunlit vegetation for our vegetation indices calculation (Figure 3c). We determined the threshold for segmenting the vegetation and soil pixels in the ExG map through trial-and-error, and found the value of 0.1 to be the most effective in accurately differentiating between the two. This allowed us to extract data only from our target of interest (plants).
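A minimal sketch of this masking step is shown below, assuming an RGB array scaled to [0, 1] and ExG computed on normalized chromatic coordinates (the exact ExG formulation used in the study is not restated here; the 0.1 threshold matches the trial-and-error value reported above):

```python
import numpy as np

def exg_mask(rgb, threshold=0.1):
    """Mask sunlit vegetation using the Excess Green index (ExG = 2g - r - b).
    rgb: float array (H, W, 3) with values in [0, 1]."""
    total = rgb.sum(axis=2) + 1e-10
    r, g, b = (rgb[..., i] / total for i in range(3))  # chromatic coordinates
    exg = 2 * g - r - b
    return exg > threshold  # True where pixels are kept as vegetation

rng = np.random.default_rng(1)
image = rng.uniform(0, 1, (64, 64, 3))  # synthetic stand-in for a plot image
mask = exg_mask(image)
print(f"vegetation fraction: {mask.mean():.2f}")
```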

2.2.4. Outlier Detection and Removal

The outliers in the dataset may be caused by various factors, such as measurement errors, instrument malfunctions, or extreme environmental conditions [45]. The z-score method, z = (x − x̄)/σ, is a statistical technique that identifies outliers by measuring how many standard deviations away from the mean a data point is. A high z-score indicates that the data point is significantly different from the rest of the data and may be considered an outlier [46,47]. The dataset was cleaned by identifying and removing outliers for each predictor variable at each phenological stage, where observations with a z-score greater than 2.5 were considered outliers [48]. Overall, around 4–7% of the data for each growth stage were eliminated as outliers.
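A compact illustration of this cleaning step with pandas, assuming one column per predictor variable (the column names here are hypothetical):

```python
import numpy as np
import pandas as pd

def remove_outliers(df, z_threshold=2.5):
    """Drop rows where any predictor's |z-score| exceeds the threshold."""
    z = (df - df.mean()) / df.std(ddof=0)
    return df[(z.abs() <= z_threshold).all(axis=1)]

rng = np.random.default_rng(2)
data = pd.DataFrame({"ndvi_mean": rng.normal(0.6, 0.05, 500),
                     "height_max": rng.normal(0.5, 0.08, 500)})
cleaned = remove_outliers(data)
print(f"removed {len(data) - len(cleaned)} outliers")
```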

2.2.5. Machine Learning Dataset Creation and Feature Selection

ENVI version 5.6.2 (Harris Geospatial, Broomfield, CO, USA) was utilized for generating image textural metrics and stacking raster layers, which included narrow spectral bands and vegetation indices. Image textural metrics were computed with a window size of 3 × 3 using ENVI 5.6.2 software. Crop height was obtained by subtracting the DEM from the DSM extracted from the LIDAR point cloud within ENVI 5.6.2 software. A model was created using the model-builder package of ArcGIS Pro version 3.0.0 (Esri, Redlands, CA, USA) to extract all crop height metrics across the studied times. All spectral, image textural, and crop height metrics were then exported as a CSV file for further data analysis in Python version 3.9.13 (Python Software Foundation, open-source software). To partition the original dataset into training and testing subsets, a stratified random sampling approach was employed with a split ratio of 70:30. This approach ensured that the resulting subsets contained representative proportions of different classes [49]. Due to the imbalanced nature of the original dataset (522 plots for late-maturing plants and 192 plots for early-maturing plants), a resampling method, specifically the Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors (SMOTE-ENN), was used to balance the numbers of early- and late-maturing plots in the training set. This approach increases the minority class (early-maturing plots) by generating synthetic examples that are similar to the existing ones [50]. The resulting training set contained a balanced representation of both early- and late-maturing plots (Table 3).
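The split and resampling steps can be reproduced with scikit-learn and the imbalanced-learn package. The sketch below uses synthetic feature values, but the class counts (522 late- vs. 192 early-maturing plots) and the 70:30 stratified split match those reported above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from imblearn.combine import SMOTEENN  # pip install imbalanced-learn

rng = np.random.default_rng(3)
X = rng.normal(size=(714, 10))       # placeholder plot-level features
y = np.array([1] * 522 + [0] * 192)  # 1 = late-maturing, 0 = early-maturing

# Stratified 70:30 split preserves the class proportions in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# SMOTE-ENN oversamples the minority class, then cleans noisy samples with ENN
X_res, y_res = SMOTEENN(random_state=42).fit_resample(X_train, y_train)
print(np.bincount(y_train), "->", np.bincount(y_res))
```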
The task of determining the most relevant features is known as “feature selection”. Its advantages include removing irrelevant, redundant, and noisy features while avoiding significant loss of information, reducing computational requirements, and ultimately improving the performance of machine learning (ML) models [51]. The backward feature elimination method was used to select a subset of the available features for building a model. It starts with all the features and then iteratively removes the least important features one by one until a desired number of features is reached. The undesired features in this method are those that have little or no impact on the model’s performance, or those that introduce noise or complexity to the model without contributing significantly to its accuracy [52,53]. To evaluate the impact of feature selection on the performance of the models, the precision, recall, and f1 score values were compared before and after feature selection. A Monte Carlo simulation was conducted to obtain a more accurate estimate of the true performance of all machine learning models before and after feature selection. This involved repeatedly training and testing the models with varying data samples and computing their precision scores 100 times to improve the reliability of the estimates. A paired t-test was conducted to determine the statistical significance of the difference between each model’s results before and after feature selection.
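A hedged sketch of this pipeline follows, using scikit-learn’s recursive feature elimination (one common implementation of backward elimination), a 100-run Monte Carlo loop, and SciPy’s paired t-test. The dataset, the number of retained features, and the random forest settings here are illustrative, not the values used in the study:

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=700, n_features=40, n_informative=8, random_state=0)

# Backward elimination: iteratively drop the least important feature
selector = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
               n_features_to_select=12, step=1).fit(X, y)
X_selected = X[:, selector.support_]

def monte_carlo_precision(features, labels, n_runs=100):
    """Repeated random splits to estimate the distribution of precision scores."""
    scores = []
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            features, labels, test_size=0.3, stratify=labels, random_state=seed)
        model = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_tr, y_tr)
        scores.append(precision_score(y_te, model.predict(X_te)))
    return np.array(scores)

before = monte_carlo_precision(X, y)
after = monte_carlo_precision(X_selected, y)
t_stat, p_value = ttest_rel(after, before)  # paired t-test across matched runs
print(f"mean precision {before.mean():.3f} -> {after.mean():.3f}, p = {p_value:.4f}")
```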

2.3. Machine Learning Models

Random forests (RF), K-nearest neighbors (KNN), artificial neural networks (ANN), naive Bayes (NB), and support vector machines (SVM) are commonly used machine learning algorithms that can be applied to many tasks in plant breeding programs [9,54,55,56,57]. These methods have been used to estimate the maturity of various crops such as soybean [23], durian fruits [58], and rapeseed [59].

2.3.1. RF

Random forests (RF) are a type of ensemble learning method that combines the predictions of multiple decision trees, which are trained on different subsets of the data. The main idea behind RF is to train many decision trees on various parts of the training data and then average their predictions to obtain a more robust and accurate model. The number of trees and the number of features used for the split are two important parameters that need to be determined when using an RF model. A comprehensive study of RF is found in [60].

2.3.2. KNN

K-nearest neighbors (KNN) is a simple, non-parametric machine learning algorithm that can be used for classification and regression tasks. It works by finding the K-nearest training examples in feature space and using them to make predictions about the target variable. The number of nearest neighbors and the distance metric are the most important parameters of KNN. Commonly used distance metrics include Euclidean distance, Manhattan distance, and cosine similarity. An in-depth study of KNN can be found in [61].

2.3.3. ANN

An artificial neural network (ANN) is a machine learning model inspired by the human brain’s structure and function. It consists of many interconnected “neurons” that can process and transmit information, allowing it to learn and make intelligent decisions. The architecture of the network, the activation function, the learning rate, and the optimization algorithm are important specifications and parameters of ANNs. A comprehensive study of ANN was carried out by [62].

2.3.4. NB

Naive Bayes (NB) is a probabilistic machine learning algorithm commonly used for classification tasks. It makes predictions based on the probability of certain events occurring, given the class of the instance. NB algorithms assume that the features are independent of each other, which is why they are referred to as “naive”. The distribution of the features and the smoothing parameter are important parameters of NB algorithms. For details regarding NB, the reader is encouraged to read [63].

2.3.5. SVM

Support vector machine (SVM) uses a concept called the kernel trick to transform the input data into a higher dimensional space, where it can be separated by a hyperplane. The main advantage of SVM is its ability to handle non-linear decision boundaries. The C parameter in SVM represents the regularization parameter, which helps to avoid overfitting. The SVM algorithm is sensitive to the choice of kernel function and the value of the regularization parameter. A complete study of SVM can be found in [64].

2.3.6. Late Fusion

Late fusion is a technique used to combine the outputs of multiple classifiers or models to improve the overall performance and reliability of a classification system [65]. It involves aggregating the predictions or scores obtained from individual models to make a final decision. In this study, soft-level late fusion (the averaging method) was applied to integrate the results from five machine learning models (ANN, RF, KNN, SVM, NB) for the prediction of maturity in dry pea. Soft-level fusion averages the posterior probability estimates of each model [66]. By leveraging the complementary strengths of each model, late fusion aims to enhance the accuracy and robustness of the classification process [65].
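A minimal sketch of soft-level late fusion with scikit-learn, averaging the posterior probabilities of the five model types on synthetic data (the hyperparameters here are library defaults, not the tuned values in Table 4):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=700, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

models = [RandomForestClassifier(random_state=0),
          MLPClassifier(max_iter=1000, random_state=0),
          KNeighborsClassifier(),
          SVC(probability=True, random_state=0),  # probability=True enables posterior estimates
          GaussianNB()]

# Soft fusion: average the posterior probabilities across all fitted models
probas = np.mean([m.fit(X_tr, y_tr).predict_proba(X_te) for m in models], axis=0)
y_fused = probas.argmax(axis=1)
print(f"fused accuracy: {(y_fused == y_te).mean():.3f}")
```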

2.4. Model Architecture and Training Process

2.4.1. Model Configuration

To configure and train the machine learning (ML) models, we utilized the training dataset and Python 3.9.13, along with the Scikit-learn package for deploying the ML algorithms. Two distinct pea categories were established based on the maturity of the crop: early-maturing dry peas and late-maturing dry peas. Table 4 shows the optimal tuning hyperparameter values of each ML model (RF, SVM, ANN, KNN, and NB); these were chosen based on values used in previous studies with similar datasets and problem types.

2.4.2. Model Inputs

Using the statistical metrics of each predictor, including the minimum, mean, median, maximum, and percentiles, as input features for the ML models allowed for a more comprehensive representation of the input variables and can potentially lead to better model performance [82]. Table 5 lists each input parameter as a row and the statistical metrics used for it as columns.
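For example, the per-plot statistics can be assembled as follows, assuming the masked pixel values of each predictor are available as arrays (the predictor name and percentile choices here are illustrative):

```python
import numpy as np
import pandas as pd

def plot_statistics(values, name):
    """Summarize the masked pixel values of one predictor within a plot."""
    return {f"{name}_min": np.min(values),
            f"{name}_mean": np.mean(values),
            f"{name}_median": np.median(values),
            f"{name}_max": np.max(values),
            f"{name}_p25": np.percentile(values, 25),
            f"{name}_p75": np.percentile(values, 75)}

rng = np.random.default_rng(4)
# One row of ML input per plot, e.g., NDVI pixels inside the plot boundary
features = pd.DataFrame([plot_statistics(rng.uniform(0.2, 0.9, 400), "ndvi")
                         for _ in range(714)])
print(features.head())
```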

2.4.3. Model Implementation

A total of 714 sampling plots collected from a flight at 100 DAP were separated into two sets: a training set containing 70% of the plots and a testing set containing 30% of the plots. The training set was used to build the machine learning models, while the testing set was used to evaluate their performance. To determine the optimal time for assessing the maturity of dry pea, we investigated the spectral profiles of early-maturing and late-maturing plots at three different growth stages, specifically 67 DAP, 84 DAP, and 100 DAP. The spectral profile refers to the reflectance value of each band, with reflectance values ranging from 0 to 1 (higher values indicating higher reflectance). The optimal assessment time was selected based on the highest spectral reflectance differentiation between the early- and late-maturing plots, as this indicates the greatest potential for the accurate classification of maturity status. The five machine learning models were then run on the dataset from the optimal time.

2.4.4. Model Assessment

To evaluate the performances of the machine learning models, three evaluation metrics were used, namely, precision (Pr), recall (Rc), and f1 score. These metrics gauge the accuracy of the model in identifying maturity areas. Precision and recall offer insight into the accuracy of the model, while the f1 score combines the results of both to provide a comprehensive evaluation of the model [83]. The metrics were calculated as shown below.
Pr = TP / (TP + FP)
Rc = TP / (TP + FN)
f1 score = 2 × Pr × Rc / (Pr + Rc)
where Pr represents precision, Rc represents recall, TP represents the number of true positives, FP represents the number of false positives, and FN represents the number of false negatives. These correspond to plots correctly identified as early-maturing peas, plots wrongly identified as early-maturing peas, and early-maturing plots wrongly identified as late-maturing peas, respectively [83]. To obtain these values, we labeled the plots based on the early-maturing and late-maturing classes determined through field measurements, and then split the dataset into a training set (70% of the data) and a test set (30% of the data).
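These metrics can be computed directly with scikit-learn, as in the toy example below (the labels are illustrative, with 1 denoting early-maturing plots):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# y_true / y_pred: 1 = early-maturing, 0 = late-maturing (illustrative labels)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(f"Pr = {precision_score(y_true, y_pred):.2f}")  # TP / (TP + FP)
print(f"Rc = {recall_score(y_true, y_pred):.2f}")     # TP / (TP + FN)
print(f"f1 = {f1_score(y_true, y_pred):.2f}")         # harmonic mean of Pr and Rc
```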
To cross-check the outputs of the algorithms against a valid reference dataset, we employed four numeric metrics (precision, recall, f1 score, and the area under the curve (AUC) values) and a graphical metric (Receiver Operating Characteristic (ROC) curve). We evaluated the performance of the algorithms based on their ability to correctly classify the early-maturing and late-maturing classes in the dataset. Specifically, we used the ROC curve to assess the trade-off between sensitivity and specificity in our models. The ROC curve is a graph that shows how well a binary classifier system performs, by plotting the true positive rate (TPR) against the false positive rate (FPR), at different classification thresholds. TPR represents sensitivity or recall, while FPR is the fraction of negative instances that are incorrectly classified as positive [84]. The AUC is a metric that measures the performance of a binary classifier system by quantifying the probability that a randomly selected positive example is ranked higher than a randomly selected negative example by the classifier. AUC values range from 0 to 1, with 0.5 indicating random performance and 1 indicating perfect classification performance. Therefore, a higher AUC value indicates better classification performance for the binary classifier system [85]. To obtain the TP, FP, and FN values, we compared the algorithm output to the labeled data. This comparison was performed manually by visually inspecting the output and comparing it to the labeled data. The model with the highest values of precision, recall, f1 score, and AUC, and the best distribution in the ROC curve (i.e., closest to the top-left corner), was selected as the best model for predicting the maturity of dry peas.
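A sketch of the ROC/AUC evaluation with scikit-learn, using a synthetic dataset in place of the UAS-derived features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=700, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]           # posterior for the positive class

fpr, tpr, thresholds = roc_curve(y_te, scores)     # points of the ROC curve (FPR vs. TPR)
print(f"AUC = {roc_auc_score(y_te, scores):.3f}")  # 0.5 = random, 1.0 = perfect
```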

3. Results

3.1. Optimal Time for Dry Pea Maturity Assessment

Upon analyzing the spectral variations between early-maturing and late-maturing dry peas at three dates after planting, we found that the differences were not as pronounced at 67 DAP (Figure 4a) and 84 DAP (Figure 4b) as they were at 100 DAP (Figure 4c). Additionally, using the dataset from 100 DAP allowed us to have a more balanced dataset for our machine learning algorithms. Therefore, we used the 100 DAP dataset to feed our machine learning models for predicting the maturity of dry pea.

3.2. Feature Selection

The results shown in Figure 5 indicate that certain variables were more strongly correlated with maturity as a response variable. These variables include the average reflectance values in the blue, green, red, and NIR narrow spectral bands, as well as the maximum, mean, and minimum plant height among the crop height metrics. Additionally, among the image textural metrics, the contrast of the green channel, the homogeneity of the blue channel, and the mean of the red channel exhibited notable correlations with maturity. Among the RGB-based vegetation indices, the average values of ExG, VARI, and GLI showed stronger associations with maturity. Similarly, the maximum value of SR, the minimum value of CIgreen, and the average value of NDVI among the NIR vegetation indices demonstrated significant correlations. In addition, the average values of LIC2 and NDRE, along with the maximum value of LIC2, among the red-edge vegetation indices were found to be highly correlated with maturity.
The average reflectance values of spectral bands are indicative of overall plant health and related to chlorophyll quantity and quality, which is vital for photosynthesis and growth [86]. Minimum and maximum reflectance values are more influenced by external factors such as lighting conditions and shading, reducing their consistency and reliability as maturity predictors [87]. Previous studies show that average reflectance values are effective in predicting plant maturity in various crops, including soybean [87], rice [88], and wheat [89], which is thus consistent with using them as predictors for dry pea maturity.
The maximum plant height is crucial in determining the maturity of the plant, since it indicates the plant’s overall growth rate and ability to allocate resources effectively [90]. The mean height may be influenced by outliers, which can reduce its effectiveness as a predictor [91]. The minimum height may not be a good indicator of plant maturity since it only measures the smallest plant height and may not represent the overall growth of the plant.
The contrast of the green channel, the homogeneity of the blue channel, and the mean of the red channel at image textural metrics are important for predicting maturity in dry pea due to their direct relationship to plant color and texture [92]. Despite the potential saturation of the red channel [93], it can still provide valuable information about image texture that is relevant for predicting maturity in dry pea. Therefore, all three color channels—green, red, and blue—may offer unique and meaningful insights into the prediction of maturity in dry pea based on their respective contributions to image texture analysis [93].
The average values of ExG, VARI, and GLI at RGB-based vegetation indices are more important to predicting maturity in dry peas because they are more sensitive to changes in chlorophyll content and plant health [94]. Previous studies have also shown that these indices are effective in predicting plant maturity in various crops, including wheat [95] and maize [19].
The SR, CIgreen, and NDVI indices are important to predicting the maturity in dry pea as they measure chlorophyll content and photosynthetic activity [96]. Previous studies also support the effectiveness of these indices in predicting maturity in various crops [97,98,99].
One reason for the better performance of LIC2 in predicting the dry pea maturity is that it is sensitive to the canopy’s physiological, biochemical and morphological parameters [100], which may be different in mature and immature plants. The NDRE index measures the difference between the reflectance values of the red-edge and NIR bands, which is directly related to the amount of chlorophyll in the plant and the density of the plant canopy [101]. Previous studies have also shown the effectiveness of the NDRE index in predicting plant maturity in various crops, including wheat [102], soybean [103], and cotton [104].
To build an accurate machine learning model for predicting the maturity of dry pea, it is crucial to choose the right set of features. We used the backward feature elimination (BFE) method to rank all the features based on their importance in predicting maturity. Specifically, we started with all features and calculated each feature’s importance based on its contribution to reducing the overall prediction error. We then eliminated the feature with the lowest importance score and retrained the model, repeating this process until only the top-performing features remained. Finally, we selected the top features whose cumulative sum of importance exceeded 90% (denoted by the blue box in Figure 5), while the remaining features were not incorporated into the models. This process allowed us to create a simpler and more effective model that uses the most relevant features to predict the maturity of dry pea.
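A sketch of the cumulative-importance cutoff on the ranked features, assuming random forest feature importances as the ranking criterion (the dataset here is synthetic):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=700, n_features=30, n_informative=6, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
order = np.argsort(model.feature_importances_)[::-1]  # rank features by importance
cumulative = np.cumsum(model.feature_importances_[order])

keep = order[:np.searchsorted(cumulative, 0.90) + 1]  # smallest set covering 90%
print(f"kept {len(keep)} of {X.shape[1]} features:", sorted(keep))
```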

3.3. Performance of UAS-Derived Predictors for Dry Pea Maturity

The evaluation was conducted both before and after feature selection (Table 6). Feature selection enhanced the performance of all machine learning models in forecasting the maturity of dry peas.
Narrow spectral bands (NSP), red-edge vegetation indices (ReVIs), and RGB-based vegetation indices (RGBVIs) performed better than other predictors in estimating the maturity of dry peas when using the RF classifier due to their sensitivity to changes in plant health and chlorophyll content [105], which are critical indicators of maturity. In addition, previous research has demonstrated the effectiveness of these predictors in estimating plant maturity in various crops [106,107,108], which further supports their use. The NSP predictor achieved the highest precision, recall, and f1 scores of 0.98, 0.96, and 0.97, respectively. The ReVIs predictor achieved the maximum precision, recall, and f1 scores of 0.96, 0.94, and 0.95, respectively. The RGBVIs predictor also demonstrated good performance, with the maximum precision, recall, and f1 scores of 0.95, 0.93, and 0.94, respectively. The use of an RGB camera may not necessarily result in a significant reduction in accuracy. In fact, while the best precision achieved with multispectral sensors was 0.96, the best precision achieved with RGB sensors was 0.95, indicating that it is possible to achieve relatively high accuracy even with RGB cameras. While the use of multispectral cameras may be ideal for achieving the highest levels of accuracy in dry pea maturity prediction, the affordability and accessibility of RGB cameras may make them a more practical option for maturity prediction applications, particularly in scenarios where cost is a limiting factor.
Although the random forest (RF) algorithm had the highest accuracy among the machine learning models for predicting dry pea maturity using crop height metrics (CHM), with precision, recall, and f1 scores of 0.87, 0.85, and 0.86, respectively, CHM had the least predictive ability compared to other predictors. In addition, ReVIs exhibited superior predictive capabilities for dry pea maturity compared to NIR vegetation indices (NIRVIs): while NIRVIs attained precision, recall, and f1 score values of 0.94, 0.92, and 0.93, respectively, ReVIs outperformed them. The naive Bayes (NB) algorithm exhibited the lowest level of accuracy among the various machine learning models for predicting the maturity of dry pea, with the highest precision, recall, and f1 scores of 0.79, 0.70, and 0.75, respectively, achieved when using image textural metrics (ITM). Finally, the performance of ITM in predicting the maturity of dry peas was found to be better than that of CHM.
We used the ROC curve (Figure 6) to evaluate the balance between sensitivity and specificity within ML models. The findings indicate that the RF algorithm exhibited the best performance among the models tested, as evidenced by an AUC value of above 0.96. Moreover, the NSP, ReVIs, and RGBVIs predictors were considered more suitable for estimating the maturity of dry peas, and the RF algorithm was more accurate in distinguishing between early- and late-maturing dry pea plants using these predictors.

3.4. Machine Learning Model Performance for Estimating Dry Pea Maturity with Combined Datasets

We focused on investigating the impacts of different sensor types on the accuracy of dry pea maturity prediction. To accomplish this, we started with LIDAR and RGB sensors. Our approach provides insights into how the choice of sensors can affect the accuracy of dry pea maturity prediction and could be useful for farmers and researchers seeking cost-effective and accurate methods for crop monitoring. To determine the effectiveness of combining different types of predictors to improve the accuracy of dry pea maturity prediction using machine learning models, various combinations of selected datasets were tested (Table 7).
Figure 7 shows the overall accuracy of different ML models in estimating the maturity of dry peas using different datasets. Using Dataset 1 as input for five ML models improved the accuracy of all models compared to using each metric separately. Among the tested models, RF showed the highest performance with precision, recall, and f1 scores of 0.97, 0.95, and 0.96, respectively, when using CHM and RGBVIs as input. RF was the best model in accurately estimating dry pea maturity with only RGB and LIDAR sensors. The use of Dataset 2 as input resulted in a noticeable improvement in the accuracy of all ML models. Although the NB model showed a significantly higher improvement in all evaluation metrics than other models, the RF model still had the highest accuracy values. The integration of Dataset 3 led to a notable increase in the accuracy of all ML models, with RF showing the highest level of accuracy with scores of 0.99, 0.97, and 0.98 for precision, recall, and f1 score, respectively. Dataset 4 led to an increase in accuracy for the RF and SVM models, but reduced the accuracy of the NB model, without any significant effect on the ANN and KNN models. However, RF demonstrated the highest level of accuracy in all accuracy metrics. Using Dataset 5 as input resulted in a notable improvement in the accuracy of the NB, KNN, and RF models, while the accuracy of the remaining models remained unchanged.
Overall, the RF and ANN models demonstrated superior abilities to estimate maturity levels using RGB, LIDAR, and multispectral products. These models exhibited higher performance levels compared to other machine learning models. Among these, the RF model demonstrated the highest level of accuracy, with a score of 0.99 for precision, recall, and f1 score.
The late fusion of precision, recall, and f1 scores from multiple machine learning models (ANN, RF, KNN, SVM, NB) was undertaken to obtain an aggregated assessment of classification performance. The fusion results indicate that the combined predictions achieved a precision of 0.98, a recall of 0.96, and an f1 score of 0.97. These metrics demonstrate a high level of accuracy and consistency in the classification results obtained through late fusion.

3.5. Performances of Machine Learning Models in Estimating the Maturity of Dry Peas by Considering Plant Growth Stage

Based on the findings of Section 3.4, the random forest machine learning model was chosen to accurately predict dry pea maturity during three different growth periods. The results of this model are displayed in Figure 8. During the early- to mid-maturity period (67 to 84 DAP), all evaluation metrics (precision, recall, and f1 score) showed a decline, with precision decreasing from 0.98 to 0.96, recall from 0.96 to 0.94, and f1 score from 0.97 to 0.95. However, during the mid- to late-maturity period (84 DAP to 100 DAP), these trends were reversed, with precision increasing from 0.96 to 0.99, recall increasing from 0.94 to 0.99, and f1 score increasing from 0.95 to 0.99. Overall, the RF model showed superior performance in predicting dry pea maturity during both early and late reproductive stages.
To evaluate the performances of various machine learning models in predicting the maturity of dry pea, the models were applied to a subset of the dataset consisting of 36 plots. Out of these plots, 15 were for early-maturing peas and 21 were for late-maturing peas. The 100 DAP dataset was utilized, and a comparison was made between the predicted and actual maturity of the plots. Figure 9 provides a visual representation of this comparison, depicting the predicted maturity plots and the corresponding actual maturity plots for both early- and late-maturity genotypes. The presented map serves to reinforce the quantitative findings, confirming the strong performance of the random forest model in accurately predicting the maturity of dry pea. The visual agreement between the predicted and actual maturity plots further validates the reliability and effectiveness of the random forest model for maturity estimation in this specific crop.

4. Discussion

The results (Table 6) demonstrate that feature selection can enhance the accuracy of machine learning models used in predicting dry pea maturity across all spectral and structural predictors. Specifically, for the prediction of maturity using narrow spectral bands, the precision values obtained prior to feature selection were 0.82, 0.95, 0.88, 0.80, and 0.73 for the SVM, RF, KNN, ANN, and NB models, respectively. However, following feature selection, these values increased to 0.87, 0.98, 0.91, 0.83, and 0.75, respectively. To assess the stability of the models, the performance of each model was evaluated using Monte Carlo simulation, which was repeated 100 times to obtain precision scores (Figure 10). In all iterations and datasets, the random forest model exhibited less variation in precision values compared to other machine learning models, both before and after feature selection. This stability can be attributed to the ensemble nature of the random forest algorithm and its ability to handle feature selection effectively. This result aligns with previous studies conducted by [109,110], which demonstrated the consistent performance and stability of the random forest model in achieving higher accuracy for high-throughput plant phenotyping tasks using UAS-derived products.
A paired t-test was conducted to assess the statistical significance of the differences between the performances of the models before and after feature selection, and the p-value was reported to determine the significance of these differences (Table 8). A p-value below the typical threshold of 0.05 means that the difference in precision values before and after feature selection is statistically significant. In the random forest model, the difference between the precision values before and after feature selection in all datasets was found to be statistically significant. This means that the increase in precision values in the random forest model is not likely to be due to chance, but rather due to the effect of feature selection. This improvement may be attributed to the fact that feature selection helps to decrease the dataset’s dimensionality [111] and enhance the signal-to-noise ratio, particularly in the case of spectral data [112]. By reducing noise and redundancy, the random forest’s performance can be enhanced [113]. Another reason for the improvement may be that feature selection can enhance the random forest’s generalization ability [114,115]. In other words, it can enhance the model’s performance when dealing with new data, which is crucial in predictive contexts. In contrast to the random forest model, for the other machine learning models the statistical significance of the difference between the precision values before and after feature selection varied depending on the type of dataset used. Specifically, in the ANN model, the observed difference in precision values before and after feature selection was not found to be statistically significant in the majority of the datasets (five out of six). This result may be attributed to the presence of feature interactions or dependencies. Models such as ANN and SVM have the ability to capture complex feature interactions [116] through hidden layers or kernel functions, respectively. This implicit incorporation of feature interactions in these models may reduce the impact of explicit feature selection techniques, resulting in a less significant difference in precision values.
Among the machine learning models and features tested, the RF and KNN models consistently demonstrated the highest accuracy in estimating dry pea maturity using RGB, LIDAR, and multispectral products. While crop height and image textural metrics did not significantly improve accuracy in most models, the RF and KNN models showed a stronger sensitivity to these features compared to other models (Table 6 and Figure 6). This may be due to the limited change in crop height observed in mature plants, which can be better captured by these models (transect i and ii in Figure 11a,b). However, it should be noted that pea plants have a relatively constant growth pattern, resulting in consistent crop height throughout the growing season [117] (Figure 11c), which may explain why crop height did not significantly improve accuracy in most models.
The red-edge vegetation indices have been shown to outperform NIR vegetation indices in predicting dry pea maturity. This is likely due to the fact that red-edge vegetation indices are more sensitive to changes in chlorophyll content and plant health [118], which are important indicators of maturity. Additionally, the red-edge region is where a steep increase in vegetation reflectance occurs [119], making it a key area for vegetation analysis. These findings are consistent with previous research [104,120] that has shown the efficacy of using red-edge vegetation indices in estimating plant growth and maturity in various crops.
While image textural metrics are less sensitive to environmental conditions such as lighting, shadows and atmospheric conditions that can affect the spectral values [121], the effectiveness of image textural metrics for predicting dry pea maturity in machine learning models may be limited because the vertical distribution of leaves on the pea plant does not undergo significant changes during the maturation process. As a result, the values of textural metrics may be similar at different growth stages for both early- and late-maturing dry peas, making it difficult for the algorithms to differentiate between the two based solely on texture (Figure 12).
Table 6 and Figure 7 show that naive Bayes (NB) had the lowest accuracy in estimating maturity using crop height and spectral values. After feature selection, the precision, recall, and f1 scores obtained for this model were 0.97, 0.87, and 0.90, respectively. One possible reason for the lower performance of the NB model in this study could be that the data used in the study were correlated, while the NB model assumes that all features are independent from each other [122]. This lack of independence among the predictor variables and the presence of correlated variables, especially in narrow spectral bands and vegetation indices, may have led to the model making less accurate predictions. Another possible reason could be that the NB model may not be well-suited to handling high-dimensional datasets [123], as is the case with the datasets used in this study. Moreover, we acknowledge the work of Calders and Verwer [124] in demonstrating the sensitivity of the naive Bayes algorithm to data distribution. Consistent with their findings, we demonstrate that incorporating additional data through the combination of predictor variables and growth stage considerations can improve the accuracy of the NB model.
The late fusion of precision, recall, and f1 scores from multiple machine learning models (ANN, RF, KNN, SVM, NB) resulted in highly accurate and consistent classification outcomes. This can be attributed to two factors. The complementary information captured by each model contributes to a more comprehensive understanding of the data, leading to improved classification performance [66]. In addition, the consensus decision-making process in late fusion fosters a collective agreement among the models, resulting in reliable and confident classification outcomes [65].
The RF algorithm showed superior accuracy in predicting dry pea maturity utilizing all structural and spectral predictors in comparison to other machine learning models, with a precision, recall, and f1 score of 0.99. One of the main advantages of RF is its ability to handle high-dimensional datasets [125] and to make accurate predictions even when the data are highly correlated [126]. RF could also handle missing values and outliers [127], which may be important when working with NIR or red-edge vegetation indices, as the measurements can be affected by various factors such as weather conditions and soil quality [128]. RF also selects the most important features and decides the best split points by random subsets [129]; this feature could have played a role in its better performance.
The successive decrease in evaluation metrics during the early- to mid-maturity period (67 DAP to 84 DAP) (Figure 8) could be attributed to the rapid growth and development of the crop during this stage, leading to higher variability in the data and making it more challenging to accurately predict maturity. However, during the mid- to late-maturity period (84 DAP to 100 DAP), the decrease in variability as the crop reaches maturity could have resulted in a successive increase in the evaluation indices.

5. Conclusions

This study aimed to evaluate the effectiveness of different UAS-derived indices in predicting the maturity of dry peas using machine learning algorithms. The results showed that feature selection improved the performance of all machine learning models in forecasting the maturity of dry peas. Narrow spectral bands, red-edge vegetation indices, and RGB-based vegetation indices were found to be the most effective parameters for predicting the maturity of dry peas when using the random forest (RF) classifier. Although the RF algorithm had the highest accuracy among the machine learning models for predicting dry pea maturity using crop height metrics (CHM), CHM had the least predictive ability compared to other predictors. The red-edge vegetation indices demonstrated better performance than NIR vegetation indices in predicting dry pea maturity. The RF and artificial neural network (ANN) models demonstrated a superior ability to estimate maturity levels utilizing RGB, LIDAR, and multispectral products. Among these, the RF model exhibited the highest level of accuracy, with a score of 0.99 for precision, recall, and f1 score. Finally, the RF model was chosen to predict dry pea maturity during three different growth periods, and the results show that it performed well during both early and late reproductive stages.

Author Contributions

Conceptualization, A.B., N.B. and P.F.; methodology, A.B.; software, A.B. and P.F.; validation, A.B., H.N., J.-H.K., M.M. and J.P.J.; formal analysis, A.B.; investigation, A.B., H.N., J.-H.K., M.M. and J.P.J.; resources, P.F.; data curation, A.B.; writing—original draft, A.B.; writing—review and editing, A.B., N.B., N.D., N.F. and P.F.; visualization, A.B.; supervision, N.B. and P.F.; project administration, N.B. and P.F.; funding acquisition, N.B. and P.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the North Dakota Department of Agriculture through the Specialty Crop Block Grant Program (21-316 and 20-489), USDA-NIFA (Bandillo Hatch Project ND01513; Flores Hatch Project ND01488), and the U.S. Department of Agriculture, Agricultural Research Service, under agreement No.58-6064-8-023.

Data Availability Statement

The data used in this research are available upon request from the corresponding author.

Acknowledgments

The authors express their gratitude to the Northern Pulse Growers Association for continued funding support in the development of the NDSU advanced pea breeding lines and the U.S. Department of Agriculture, Agricultural Research Service, for providing drones and sensors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tulbek, M.; Lam, R.; Asavajaru, P.; Wang, C. Pea: A Sustainable Vegetable Protein Crop. In Sustainable Protein Sources; Elsevier: Amsterdam, The Netherlands, 2017; pp. 145–164.
  2. Quirós Vargas, J.J.; Zhang, C.; Smitchger, J.A.; McGee, R.J.; Sankaran, S. Phenotyping of Plant Biomass and Performance Traits Using Remote Sensing Techniques in Pea (Pisum sativum L.). Sensors 2019, 19, 2031.
  3. Lupwayi, N.Z.; Clayton, G.W.; Rice, W.A. Rhizobial Inoculants for Legume Crops. J. Crop Improv. 2006, 15, 289–321.
  4. Singh, K.D.; Duddu, H.S.N.; Vail, S.; Parkin, I.; Shirtliffe, S.J. UAV-Based Hyperspectral Imaging Technique to Estimate Canola (Brassica napus L.) Seedpods Maturity. Can. J. Remote Sens. 2021, 47, 33–47.
  5. Williams, E.J.; Drexler, J.S. A Non-Destructive Method for Determining Peanut Pod Maturity. Peanut Sci. 1981, 8, 134–141.
  6. Hassanzadeh, A.; Zhang, F.; Murphy, S.P.; Pethybridge, S.J.; van Aardt, J. Toward Crop Maturity Assessment via UAS-Based Imaging Spectroscopy—A Snap Bean Pod Size Classification Field Study. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5519717.
  7. Sharma, B.; Yadav, J.K.P.S.; Yadav, S. Predict Crop Production in India Using Machine Learning Technique: A Survey. In Proceedings of the 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 4–5 June 2020; pp. 993–997.
  8. Galli, G.; Horne, D.W.; Collins, S.D.; Jung, J.; Chang, A.; Fritsche-Neto, R.; Rooney, W.L. Optimization of UAS-Based High-Throughput Phenotyping to Estimate Plant Health and Grain Yield in Sorghum. Plant Phenome J. 2020, 3, e20010.
  9. Guo, W.; Carroll, M.E.; Singh, A.; Swetnam, T.L.; Merchant, N.; Sarkar, S.; Singh, A.K.; Ganapathysubramanian, B. UAS-Based Plant Phenotyping for Research and Breeding Applications. Plant Phenomics 2021, 2021, 9840192.
  10. Houldcroft, C.J.; Campbell, C.L.; Davenport, I.J.; Gurney, R.J.; Holden, N. Measurement of Canopy Geometry Characteristics Using LiDAR Laser Altimetry: A Feasibility Study. IEEE Trans. Geosci. Remote Sens. 2005, 43, 2270–2282.
  11. Walter, J.D.; Edwards, J.; McDonald, G.; Kuchel, H. Estimating Biomass and Canopy Height with LiDAR for Field Crop Breeding. Front. Plant Sci. 2019, 10, 1145.
  12. Elmenreich, W. An Introduction to Sensor Fusion. Vienna Univ. Technol. Austria 2002, 502, 1–28.
  13. Zakaria, A.; Shakaff, A.Y.M.; Masnan, M.J.; Saad, F.S.A.; Adom, A.H.; Ahmad, M.N.; Jaafar, M.N.; Abdullah, A.H.; Kamarudin, L.M. Improved Maturity and Ripeness Classifications of Magnifera Indica Cv. Harumanis Mangoes through Sensor Fusion of an Electronic Nose and Acoustic Sensor. Sensors 2012, 12, 6023–6048.
  14. Ignat, T.; Alchanatis, V.; Schmilovitch, Z. Maturity Prediction of Intact Bell Peppers by Sensor Fusion. Comput. Electron. Agric. 2014, 104, 9–17.
  15. Fei, S.; Hassan, M.A.; Xiao, Y.; Su, X.; Chen, Z.; Cheng, Q.; Duan, F.; Chen, R.; Ma, Y. UAV-Based Multi-Sensor Data Fusion and Machine Learning Algorithm for Yield Prediction in Wheat. Precis. Agric. 2023, 24, 187–212.
  16. Maimaitijiang, M.; Sagan, V.; Sidike, P.; Daloye, A.M.; Erkbol, H.; Fritschi, F.B. Crop Monitoring Using Satellite/UAV Data Fusion and Machine Learning. Remote Sens. 2020, 12, 1357.
  17. Liakos, K.G.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D. Machine Learning in Agriculture: A Review. Sensors 2018, 18, 2674.
  18. Divyanth, L.; Marzougui, A.; González-Bernal, M.J.; McGee, R.J.; Rubiales, D.; Sankaran, S. Evaluation of Effective Class-Balancing Techniques for CNN-Based Assessment of Aphanomyces Root Rot Resistance in Pea (Pisum sativum L.). Sensors 2022, 22, 7237.
  19. Adak, A.; Murray, S.C.; Božinović, S.; Lindsey, R.; Nakasagga, S.; Chatterjee, S.; Anderson, S.L.; Wilde, S. Temporal Vegetation Indices and Plant Height from Remotely Sensed Imagery Can Predict Grain Yield and Flowering Time Breeding Value in Maize via Machine Learning Regression. Remote Sens. 2021, 13, 2141.
  20. Zheng, C.; Abd-Elrahman, A.; Whitaker, V.; Dalid, C. Prediction of Strawberry Dry Biomass from UAV Multispectral Imagery Using Multiple Machine Learning Methods. Remote Sens. 2022, 14, 4511.
  21. Barzin, R.; Lotfi, H.; Varco, J.J.; Bora, G.C. Machine Learning in Evaluating Multispectral Active Canopy Sensor for Prediction of Corn Leaf Nitrogen Concentration and Yield. Remote Sens. 2022, 14, 120.
  22. Zhuo, W.; Huang, J.; Gao, X.; Ma, H.; Huang, H.; Su, W.; Meng, J.; Li, Y.; Chen, H.; Yin, D. Prediction of Winter Wheat Maturity Dates through Assimilating Remotely Sensed Leaf Area Index into Crop Growth Model. Remote Sens. 2020, 12, 2896.
  23. Yu, N.; Li, L.; Schmitz, N.; Tian, L.F.; Greenberg, J.A.; Diers, B.W. Development of Methods to Improve Soybean Yield Estimation and Predict Plant Maturity with an Unmanned Aerial Vehicle Based Platform. Remote Sens. Environ. 2016, 187, 91–101.
  24. Teodoro, P.E.; Teodoro, L.P.R.; Baio, F.H.R.; da Silva Junior, C.A.; dos Santos, R.G.; Ramos, A.P.M.; Pinheiro, M.M.F.; Osco, L.P.; Gonçalves, W.N.; Carneiro, A.M.; et al. Predicting Days to Maturity, Plant Height, and Grain Yield in Soybean: A Machine and Deep Learning Approach Using Multispectral Data. Remote Sens. 2021, 13, 4632.
  25. Hunt, E.R., Jr.; Daughtry, C.; Eitel, J.U.; Long, D.S. Remote Sensing Leaf Chlorophyll Content Using a Visible Band Index. Agron. J. 2011, 103, 1090–1099.
  26. Ihuoma, S.O.; Madramootoo, C.A. Sensitivity of Spectral Vegetation Indices for Monitoring Water Stress in Tomato Plants. Comput. Electron. Agric. 2019, 163, 104860.
  27. Eng, L.S.; Ismail, R.; Hashim, W.; Baharum, A. The Use of VARI, GLI, and VIgreen Formulas in Detecting Vegetation in Aerial Images. Int. J. Technol. 2019, 10, 1385–1394.
  28. Jiang, J.; Cai, W.; Zheng, H.; Cheng, T.; Tian, Y.; Zhu, Y.; Ehsani, R.; Hu, Y.; Niu, Q.; Gui, L.; et al. Using Digital Cameras on an Unmanned Aerial Vehicle to Derive Optimum Color Vegetation Indices for Leaf Nitrogen Concentration Monitoring in Winter Wheat. Remote Sens. 2019, 11, 2667.
  29. Yeom, J.; Jung, J.; Chang, A.; Ashapure, A.; Maeda, M.; Maeda, A.; Landivar, J. Comparison of Vegetation Indices Derived from UAV Data for Differentiation of Tillage Effects in Agriculture. Remote Sens. 2019, 11, 1548.
  30. Lu, J.; Cheng, D.; Geng, C.; Zhang, Z.; Xiang, Y.; Hu, T. Combining Plant Height, Canopy Coverage and Vegetation Index from UAV-Based RGB Images to Estimate Leaf Nitrogen Concentration of Summer Maize. Biosyst. Eng. 2021, 202, 42–54.
  31. Stanton, C.; Starek, M.J.; Elliott, N.; Brewer, M.; Maeda, M.M.; Chu, T. Unmanned Aircraft System-Derived Crop Height and Normalized Difference Vegetation Index Metrics for Sorghum Yield and Aphid Stress Assessment. J. Appl. Remote Sens. 2017, 11, 026035.
  32. Shafian, S.; Rajan, N.; Schnell, R.; Bagavathiannan, M.; Valasek, J.; Shi, Y.; Olsenholler, J. Unmanned Aerial Systems-Based Remote Sensing for Monitoring Sorghum Growth and Development. PLoS ONE 2018, 13, e0196605.
  33. Zhang, J.; Qiu, X.; Wu, Y.; Zhu, Y.; Cao, Q.; Liu, X.; Cao, W. Combining Texture, Color, and Vegetation Indices from Fixed-Wing UAS Imagery to Estimate Wheat Growth Parameters Using Multivariate Regression Methods. Comput. Electron. Agric. 2021, 185, 106138.
  34. Candiago, S.; Remondino, F.; De Giglio, M.; Dubbini, M.; Gattelli, M. Evaluating Multispectral Images and Vegetation Indices for Precision Farming Applications from UAV Images. Remote Sens. 2015, 7, 4026–4047.
  32. Shafian, S.; Rajan, N.; Schnell, R.; Bagavathiannan, M.; Valasek, J.; Shi, Y.; Olsenholler, J. Unmanned Aerial Systems-Based Remote Sensing for Monitoring Sorghum Growth and Development. PLoS ONE 2018, 13, e0196605. [Google Scholar] [CrossRef]
  33. Zhang, J.; Qiu, X.; Wu, Y.; Zhu, Y.; Cao, Q.; Liu, X.; Cao, W. Combining Texture, Color, and Vegetation Indices from Fixed-Wing UAS Imagery to Estimate Wheat Growth Parameters Using Multivariate Regression Methods. Comput. Electron. Agric. 2021, 185, 106138. [Google Scholar] [CrossRef]
  34. Candiago, S.; Remondino, F.; De Giglio, M.; Dubbini, M.; Gattelli, M. Evaluating Multispectral Images and Vegetation Indices for Precision Farming Applications from UAV Images. Remote Sens. 2015, 7, 4026–4047. [Google Scholar] [CrossRef]
  35. Barzin, R.; Pathak, R.; Lotfi, H.; Varco, J.; Bora, G.C. Use of UAS Multispectral Imagery at Different Physiological Stages for Yield Prediction and Input Resource Optimization in Corn. Remote Sens. 2020, 12, 2392. [Google Scholar] [CrossRef]
  36. Blanco, V.; Blaya-Ros, P.J.; Castillo, C.; Soto-Vallés, F.; Torres-Sánchez, R.; Domingo, R. Potential of UAS-Based Remote Sensing for Estimating Tree Water Status and Yield in Sweet Cherry Trees. Remote Sens. 2020, 12, 2359. [Google Scholar] [CrossRef]
  37. Burns, B.W.; Green, V.S.; Hashem, A.A.; Massey, J.H.; Shew, A.M.; Adviento-Borbe, M.A.A.; Milad, M. Determining Nitrogen Deficiencies for Maize Using Various Remote Sensing Indices. Precis. Agric. 2022, 23, 791–811. [Google Scholar] [CrossRef]
  38. Gano, B.; Dembele, J.S.B.; Ndour, A.; Luquet, D.; Beurier, G.; Diouf, D.; Audebert, A. Using Uav Borne, Multi-Spectral Imaging for the Field Phenotyping of Shoot Biomass, Leaf Area Index and Height of West African Sorghum Varieties under Two Contrasted Water Conditions. Agronomy 2021, 11, 850. [Google Scholar] [CrossRef]
  39. Dong, T.; Liu, J.; Shang, J.; Qian, B.; Ma, B.; Kovacs, J.M.; Walters, D.; Jiao, X.; Geng, X.; Shi, Y. Assessment of Red-Edge Vegetation Indices for Crop Leaf Area Index Estimation. Remote Sens. Environ. 2019, 222, 133–143. [Google Scholar] [CrossRef]
  40. Xie, Q.; Dash, J.; Huang, W.; Peng, D.; Qin, Q.; Mortimer, H.; Casa, R.; Pignatti, S.; Laneve, G.; Pascucci, S.; et al. Vegetation Indices Combining the Red and Red-Edge Spectral Information for Leaf Area Index Retrieval. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1482–1493. [Google Scholar] [CrossRef]
  41. Adamczyk, J.; Osberger, A. Red-Edge Vegetation Indices for Detecting and Assessing Disturbances in Norway Spruce Dominated Mountain Forests. Int. J. Appl. Earth Obs. Geoinf. 2015, 37, 90–99. [Google Scholar] [CrossRef]
  42. Datt, B. A New Reflectance Index for Remote Sensing of Chlorophyll Content in Higher Plants: Tests Using Eucalyptus Leaves. J. Plant Physiol. 1999, 154, 30–36. [Google Scholar] [CrossRef]
  43. Stow, D.; Nichol, C.J.; Wade, T.; Assmann, J.J.; Simpson, G.; Helfter, C. Illumination Geometry and Flying Height Influence Surface Reflectance and NDVI Derived from Multispectral UAS Imagery. Drones 2019, 3, 55. [Google Scholar] [CrossRef]
  44. Li, J.; Shi, Y.; Veeranampalayam-Sivakumar, A.N.; Schachtman, D.P. Elucidating Sorghum Biomass, Nitrogen and Chlorophyll Contents with Spectral and Morphological Traits Derived from Unmanned Aircraft System. Front. Plant Sci. 2018, 9, 1406. [Google Scholar] [CrossRef] [PubMed]
  45. Ostroumov, I.; Kuzmenko, N. Outliers Detection in Unmanned Aerial System Data. In Proceedings of the 2021 11th International Conference on Advanced Computer Information Technologies (ACIT), Deggendorf, Germany, 15–17 September 2021; pp. 591–594. [Google Scholar]
  46. Torres, J.M.; Nieto, P.G.; Alejano, L.; Reyes, A. Detection of Outliers in Gas Emissions from Urban Areas Using Functional Data Analysis. J. Hazard. Mater. 2011, 186, 144–149. [Google Scholar] [CrossRef] [PubMed]
  47. Schubert, E.; Zimek, A.; Kriegel, H.-P. Generalized Outlier Detection with Flexible Kernel Density Estimates. In Proceedings of the 2014 SIAM International Conference on Data Mining, Philadelphia, PA, USA, 24–26 April 2014; SIAM: Philadelphia, PA, USA, 2014; pp. 542–550. [Google Scholar]
  48. Nurunnabi, A.; West, G.; Belton, D. Outlier Detection and Robust Normal-Curvature Estimation in Mobile Laser Scanning 3D Point Cloud Data. Pattern Recognit. 2015, 48, 1404–1419. [Google Scholar] [CrossRef]
  49. Brede, B.; Terryn, L.; Barbier, N.; Bartholomeus, H.M.; Bartolo, R.; Calders, K.; Derroire, G.; Krishna Moorthy, S.M.; Lau, A.; Levick, S.R.; et al. Non-Destructive Estimation of Individual Tree Biomass: Allometric Models, Terrestrial and UAV Laser Scanning. Remote Sens. Environ. 2022, 280, 113180. [Google Scholar] [CrossRef]
  50. Zhang, A.; Yu, H.; Huan, Z.; Yang, X.; Zheng, S.; Gao, S. SMOTE-RkNN: A Hybrid Re-Sampling Method Based on SMOTE and Reverse k-Nearest Neighbors. Inf. Sci. 2022, 595, 70–88. [Google Scholar] [CrossRef]
  51. Li, J.; Cheng, K.; Wang, S.; Morstatter, F.; Trevino, R.P.; Tang, J.; Liu, H. Feature Selection: A Data Perspective. ACM Comput. Surv. 2017, 50, 1–45. [Google Scholar] [CrossRef]
  52. Luo, H.; Li, M.; Dai, S.; Li, H.; Li, Y.; Hu, Y.; Zheng, Q.; Yu, X.; Fang, J. Combinations of Feature Selection and Machine Learning Algorithms for Object-Oriented Betel Palms and Mango Plantations Classification Based on Gaofen-2 Imagery. Remote Sens. 2022, 14, 1757. [Google Scholar] [CrossRef]
  53. You, W.; Yang, Z.; Ji, G. Feature Selection for High-Dimensional Multi-Category Data Using PLS-Based Local Recursive Feature Elimination. Expert Syst. Appl. 2014, 41, 1463–1475. [Google Scholar] [CrossRef]
  54. Khuimphukhieo, I.; Marconi, T.; Enciso, J.; da Silva, J.A. The Use of UAS-Based High Throughput Phenotyping (HTP) to Assess Sugarcane Yield. J. Agric. Food Res. 2023, 11, 100501. [Google Scholar] [CrossRef]
  55. Bhandari, M. High-Throughput Field Phenotyping in Wheat Using Unmanned Aerial Systems (UAS). Ph.D. Thesis, Texas A&M University, College Station, TX, USA, 2020. [Google Scholar]
  56. Shu, M.; Fei, S.; Zhang, B.; Yang, X.; Guo, Y.; Li, B.; Ma, Y. Application of UAV Multisensor Data and Ensemble Approach for High-Throughput Estimation of Maize Phenotyping Traits. Plant Phenomics 2022, 2022, 9802585. [Google Scholar] [CrossRef] [PubMed]
  57. Niazian, M.; Niedbała, G. Machine Learning for Plant Breeding and Biotechnology. Agriculture 2020, 10, 436. [Google Scholar] [CrossRef]
  58. Muthulakshmi, A.; Renjith, P.N. Classification of Durian Fruits Based on Ripening with Machine Learning Techniques. In Proceedings of the 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS), Thoothukudi, India, 3–5 December 2020; pp. 542–547. [Google Scholar]
  59. Xie, Z.; Chen, S.; Gao, G.; Li, H.; Wu, X.; Meng, L.; Ma, Y. Evaluation of Rapeseed Flowering Dynamics for Different Genotypes with UAV Platform and Machine Learning Algorithm. Precis. Agric. 2022, 23, 1688–1706. [Google Scholar] [CrossRef]
  60. Schonlau, M.; Zou, R.Y. The Random Forest Algorithm for Statistical Learning. Stata J. 2020, 20, 3–29. [Google Scholar] [CrossRef]
  61. Guo, G.; Wang, H.; Bell, D.; Bi, Y.; Greer, K. KNN Model-Based Approach in Classification. In Proceedings of the OTM Confederated International Conferences on the Move to Meaningful Internet Systems; Springer: Berlin/Heidelberg, Germany, 2003; pp. 986–996. [Google Scholar]
  62. Kriegeskorte, N.; Golan, T. Neural Network Models and Deep Learning. Curr. Biol. 2019, 29, R231–R236. [Google Scholar] [CrossRef]
  63. Slavova, V.; Ropelewska, E.; Sabanci, K.; Aslan, M.F.; Nacheva, E. A Comparative Evaluation of Bayes, Functions, Trees, Meta, Rules and Lazy Machine Learning Algorithms for the Discrimination of Different Breeding Lines and Varieties of Potato Based on Spectroscopic Data. Eur. Food Res. Technol. 2022, 248, 1765–1775. [Google Scholar] [CrossRef]
  64. Ray, S. A Quick Review of Machine Learning Algorithms. In Proceedings of the 2019 International Conference on Machine Learning, Big Data, Cloud And Parallel Computing (COMITCon), Faridabad, India, 14–16 February 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 35–39. [Google Scholar]
  65. Safont, G.; Salazar, A.; Vergara, L. Vector Score Alpha Integration for Classifier Late Fusion. Pattern Recognit. Lett. 2020, 136, 48–55. [Google Scholar] [CrossRef]
  66. Mohandes, M.; Deriche, M.; Aliyu, S.O. Classifiers Combination Techniques: A Comprehensive Review. IEEE Access 2018, 6, 19626–19639. [Google Scholar] [CrossRef]
  67. Zhang, Y.; Yang, Y.; Zhang, Q.; Duan, R.; Liu, J.; Qin, Y.; Wang, X. Toward Multi-Stage Phenotyping of Soybean with Multimodal UAV Sensor Data: A Comparison of Machine Learning Approaches for Leaf Area Index Estimation. Remote Sens. 2022, 15, 7. [Google Scholar] [CrossRef]
  68. Yang, H.; Li, F.; Wang, W.; Yu, K. Estimating Above-Ground Biomass of Potato Using Random Forest and Optimized Hyperspectral Indices. Remote Sens. 2021, 13, 2339. [Google Scholar] [CrossRef]
  69. Barradas, A.; Correia, P.M.P.; Silva, S.; Mariano, P.; Pires, M.C.; Matos, A.R.; da Silva, A.B.; Marques da Silva, J. Comparing Machine Learning Methods for Classifying Plant Drought Stress from Leaf Reflectance Spectra in Arabidopsis Thaliana. Appl. Sci. 2021, 11, 6392. [Google Scholar] [CrossRef]
  70. Muharam, F.M.; Nurulhuda, K.; Zulkafli, Z.; Tarmizi, M.A.; Abdullah, A.N.H.; Che Hashim, M.F.; Mohd Zad, S.N.; Radhwane, D.; Ismail, M.R. UAV- and Random-Forest-AdaBoost (RFA)-Based Estimation of Rice Plant Traits. Agronomy 2021, 11, 915. [Google Scholar] [CrossRef]
  71. Pranga, J.; Borra-Serrano, I.; Aper, J.; De Swaef, T.; Ghesquiere, A.; Quataert, P.; Roldán-Ruiz, I.; Janssens, I.A.; Ruysschaert, G.; Lootens, P. Improving Accuracy of Herbage Yield Predictions in Perennial Ryegrass with Uav-Based Structural and Spectral Data Fusion and Machine Learning. Remote Sens. 2021, 13, 3459. [Google Scholar] [CrossRef]
  72. Virnodkar, S.S.; Pachghare, V.K.; Patil, V.; Jha, S.K. Remote Sensing and Machine Learning for Crop Water Stress Determination in Various Crops: A Critical Review. Precis. Agric. 2020, 21, 1121–1155. [Google Scholar] [CrossRef]
  73. Lee, U.; Chang, S.; Putra, G.A.; Kim, H.; Kim, D.H. An Automated, High-Throughput Plant Phenotyping System Using Machine Learning-Based Plant Segmentation and Image Analysis. PLoS ONE 2018, 13, e0196615. [Google Scholar] [CrossRef]
  74. Paulus, S.; Dupuis, J.; Riedel, S.; Kuhlmann, H. Automated Analysis of Barley Organs Using 3D Laser Scanning: An Approach for High Throughput Phenotyping. Sensors 2014, 14, 12670–12686. [Google Scholar] [CrossRef]
  75. Rehman, T.U.; Ma, D.; Wang, L.; Zhang, L.; Jin, J. Predictive Spectral Analysis Using an End-to-End Deep Model from Hyperspectral Images for High-Throughput Plant Phenotyping. Comput. Electron. Agric. 2020, 177, 105713. [Google Scholar] [CrossRef]
  76. Zhao, B.; Li, J.; Baenziger, P.S.; Belamkar, V.; Ge, Y.; Zhang, J.; Shi, Y. Automatic Wheat Lodging Detection and Mapping in Aerial Imagery to Support High-Throughput Phenotyping and In-Season Crop Management. Agronomy 2020, 10, 1762. [Google Scholar] [CrossRef]
  77. Zhou, J.; Zhou, J.; Ye, H.; Ali, M.L.; Chen, P.; Nguyen, H.T. Yield Estimation of Soybean Breeding Lines under Drought Stress Using Unmanned Aerial Vehicle-Based Imagery and Convolutional Neural Network. Biosyst. Eng. 2021, 204, 90–103. [Google Scholar] [CrossRef]
  78. Ballesta, P.; Maldonado, C.; Mora-Poblete, F.; Mieres-Castro, D.; del Pozo, A.; Lobos, G.A. Spectral-Based Classification of Genetically Differentiated Groups in Spring Wheat Grown under Contrasting Environments. Plants 2023, 12, 440. [Google Scholar] [CrossRef]
  79. Shi, G.; Du, X.; Du, M.; Li, Q.; Tian, X.; Ren, Y.; Zhang, Y.; Wang, H. Cotton Yield Estimation Using the Remotely Sensed Cotton Boll Index from UAV Images. Drones 2022, 6, 254. [Google Scholar] [CrossRef]
  80. Du, J.; Lu, X.; Fan, J.; Qin, Y.; Yang, X.; Guo, X. Image-Based High-Throughput Detection and Phenotype Evaluation Method for Multiple Lettuce Varieties. Front. Plant Sci. 2020, 11, 563386. [Google Scholar] [CrossRef]
  81. Samac, A. Objective Phenotyping of Root System Architecture Using Image Augmentation and Machine Learning in Alfalfa (Medicago sativa L.). Plant Phenomics 2022, 2022, 9879610. [Google Scholar]
  82. Shirzadifar, A.; Bajwa, S.; Nowatzki, J.; Bazrafkan, A. Field Identification of Weed Species and Glyphosate-Resistant Weeds Using High Resolution Imagery in Early Growing Season. Biosyst. Eng. 2020, 200, 200–214. [Google Scholar] [CrossRef]
  83. Yu, J.; Cheng, T.; Cai, N.; Zhou, X.-G.; Diao, Z.; Wang, T.; Du, S.; Liang, D.; Zhang, D. Wheat Lodging Segmentation Based on Lstm_PSPNet Deep Learning Network. Drones 2023, 7, 143. [Google Scholar] [CrossRef]
  84. Saito, T.; Rehmsmeier, M. The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLoS ONE 2015, 10, e0118432. [Google Scholar] [CrossRef]
  85. Gong, M. A Novel Performance Measure for Machine Learning Classification. Int. J. Manag. Inf. Technol. IJMIT 2021, 13, 14. [Google Scholar] [CrossRef]
  86. Wu, C.; Niu, Z.; Tang, Q.; Huang, W. Estimating Chlorophyll Content from Hyperspectral Vegetation Indices: Modeling and Validation. Agric. For. Meteorol. 2008, 148, 1230–1241. [Google Scholar] [CrossRef]
  87. Trevisan, R.; Pérez, O.; Schmitz, N.; Diers, B.; Martin, N. High-Throughput Phenotyping of Soybean Maturity Using Time Series UAV Imagery and Convolutional Neural Networks. Remote Sens. 2020, 12, 3617. [Google Scholar] [CrossRef]
  88. Zhang, J.; Pan, Y.; Tao, X.; Wang, B.; Cao, Q.; Tian, Y.; Zhu, Y.; Cao, W.; Liu, X. In-Season Mapping of Rice Yield Potential at Jointing Stage Using Sentinel-2 Images Integrated with High-Precision UAS Data. Eur. J. Agron. 2023, 146, 126808. [Google Scholar] [CrossRef]
  89. Bhandari, M.; Baker, S.; Rudd, J.C.; Ibrahim, A.M.H.; Chang, A.; Xue, Q.; Jung, J.; Landivar, J.; Auvermann, B. Assessing the Effect of Drought on Winter Wheat Growth Using Unmanned Aerial System (UAS)-Based Phenotyping. Remote Sens. 2021, 13, 1144. [Google Scholar] [CrossRef]
  90. Zhang, J.; Song, Q.; Cregan, P.B.; Nelson, R.L.; Wang, X.; Wu, J.; Jiang, G.-L. Genome-Wide Association Study for Flowering Time, Maturity Dates and Plant Height in Early Maturing Soybean (Glycine max) Germplasm. BMC Genom. 2015, 16, 217. [Google Scholar] [CrossRef] [PubMed]
  91. Duncanson, L.I.; Niemann, K.O.; Wulder, M.A. Estimating Forest Canopy Height and Terrain Relief from GLAS Waveform Metrics. Remote Sens. Environ. 2010, 114, 138–154. [Google Scholar] [CrossRef]
  92. Sweet, D.D.; Tirado, S.B.; Springer, N.M.; Hirsch, C.N.; Hirsch, C.D. Opportunities and Challenges in Phenotyping Row Crops Using Drone-Based RGB Imaging. Plant Phenome J. 2022, 5, e20044. [Google Scholar] [CrossRef]
  93. Walter, T.; Massin, P.; Erginay, A.; Ordonez, R.; Jeulin, C.; Klein, J.-C. Automatic Detection of Microaneurysms in Color Fundus Images. Med. Image Anal. 2007, 11, 555–566. [Google Scholar] [CrossRef]
  94. Meena, S.V.; Dhaka, V.S.; Sinwar, D. Exploring the Role of Vegetation Indices in Plant Diseases Identification. In Proceedings of the 2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC), Waknaghat, India, 6–8 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 372–377. [Google Scholar]
  95. Cao, X.; Liu, Y.; Yu, R.; Han, D.; Su, B. A Comparison of UAV RGB and Multispectral Imaging in Phenotyping for Stay Green of Wheat Population. Remote Sens. 2021, 13, 5173. [Google Scholar] [CrossRef]
  96. Sun, H. Crop Vegetation Indices. In Encyclopedia of Smart Agriculture Technologies; Springer: Berlin/Heidelberg, Germany, 2023; pp. 1–7. [Google Scholar]
  97. Hatfield, J.L.; Prueger, J.H. Value of Using Different Vegetative Indices to Quantify Agricultural Crop Characteristics at Different Growth Stages under Varying Management Practices. Remote Sens. 2010, 2, 562–578. [Google Scholar] [CrossRef]
  98. Zhou, J.; Yungbluth, D.; Vong, C.N.; Scaboo, A.; Zhou, J. Estimation of the Maturity Date of Soybean Breeding Lines Using UAV-Based Multispectral Imagery. Remote Sens. 2019, 11, 2075. [Google Scholar] [CrossRef]
  99. Khot, L.; Sankaran, S.; Cummings, T.; Johnson, D.; Carter, A.; Serra, S.; Musacchi, S. Applications of Unmanned Aerial System in Washington State Agriculture, Paper No. 1637. In Proceedings of the 12th International Conference on Precision Agriculture, Sacramento, CA, USA, 20–23 July 2014; pp. 20–23. [Google Scholar]
  100. Mustafa, G.; Zheng, H.; Khan, I.H.; Tian, L.; Jia, H.; Li, G.; Cheng, T.; Tian, Y.; Cao, W.; Zhu, Y.; et al. Hyperspectral Reflectance Proxies to Diagnose In-Field Fusarium Head Blight in Wheat with Machine Learning. Remote Sens. 2022, 14, 2784. [Google Scholar] [CrossRef]
  101. Choudhury, M.R.; Christopher, J.; Das, S.; Apan, A.; Menzies, N.W.; Chapman, S.; Mellor, V.; Dang, Y.P. Detection of Calcium, Magnesium, and Chlorophyll Variations of Wheat Genotypes on Sodic Soils Using Hyperspectral Red Edge Parameters. Environ. Technol. Innov. 2022, 27, 102469. [Google Scholar] [CrossRef]
  102. Hassani, K.; Gholizadeh, H.; Jacob, J.; Natalie, V.A.; Taghvaeian, S.; Raun, W.; Carpenter, J. Application of Unmanned Aircraft System (UAS)-Based RGB and Multispectral Data to Monitor Winter Wheat During the Growing Season. In Proceedings of the AGU Fall Meeting Abstracts, Virtual, 1–17 December 2020; Volume 2020, p. B013-01. [Google Scholar]
  103. Santana, D.C.; de Oliveira Cunha, M.P.; Dos Santos, R.G.; Cotrim, M.F.; Teodoro, L.P.R.; da Silva Junior, C.A.; Baio, F.H.R.; Teodoro, P.E. High-Throughput Phenotyping Allows the Selection of Soybean Genotypes for Earliness and High Grain Yield. Plant Methods 2022, 18, 13. [Google Scholar] [CrossRef] [PubMed]
  104. Thompson, C.N.; Guo, W.; Sharma, B.; Ritchie, G.L. Using Normalized Difference Red Edge Index to Assess Maturity in Cotton. Crop Sci. 2019, 59, 2167–2177. [Google Scholar] [CrossRef]
  105. Stamford, J.D.; Vialet-Chabrand, S.; Cameron, I.; Lawson, T. Development of an Accurate Low Cost NDVI Imaging System for Assessing Plant Health. Plant Methods 2023, 19, 9. [Google Scholar] [CrossRef] [PubMed]
106. Martin, K.L.; Girma, K.; Freeman, K.; Teal, R.; Tubaña, B.; Arnall, D.; Chung, B.; Walsh, O.; Solie, J.; Stone, M.; et al. Expression of Variability in Corn as Influenced by Growth Stage Using Optical Sensor Measurements. Agron. J. 2007, 99, 384–389. [Google Scholar] [CrossRef]
  107. Gwathmey, C.O.; Tyler, D.D.; Yin, X. Prospects for Monitoring Cotton Crop Maturity with Normalized Difference Vegetation Index. Agron. J. 2010, 102, 1352–1360. [Google Scholar] [CrossRef]
108. Liu, K.; Li, Y.; Hu, H. Predicting Ratoon Rice Growth Rhythm Based on NDVI at Key Growth Stages of Main Rice. Chil. J. Agric. Res. 2015, 75, 410–417. [Google Scholar] [CrossRef]
  109. Peng, J.; Manevski, K.; Kørup, K.; Larsen, R.; Andersen, M.N. Random Forest Regression Results in Accurate Assessment of Potato Nitrogen Status Based on Multispectral Data from Different Platforms and the Critical Concentration Approach. Field Crops Res. 2021, 268, 108158. [Google Scholar] [CrossRef]
  110. Johansen, K.; Morton, M.J.L.; Malbeteau, Y.; Aragon, B.; Al-Mashharawi, S.; Ziliani, M.G.; Angel, Y.; Fiene, G.; Negrão, S.; Mousa, M.A.A.; et al. Predicting Biomass and Yield in a Tomato Phenotyping Experiment Using UAV Imagery and Random Forest. Front. Artif. Intell. 2020, 3, 28. [Google Scholar] [CrossRef]
  111. Li, M.; Wang, H.; Yang, L.; Liang, Y.; Shang, Z.; Wan, H. Fast Hybrid Dimensionality Reduction Method for Classification Based on Feature Selection and Grouped Feature Extraction. Expert Syst. Appl. 2020, 150, 113277. [Google Scholar] [CrossRef]
  112. Galvão, L.S.; Epiphanio, J.C.N.; Breunig, F.M.; Formaggio, A.R. Crop Type Discrimination Using Hyperspectral Data: Advances and Perspectives. Biophys. Biochem. Charact. Plant Species Stud. 2018, 2018, 183–210. [Google Scholar]
  113. Fezai, R.; Dhibi, K.; Mansouri, M.; Trabelsi, M.; Hajji, M.; Bouzrara, K.; Nounou, H.; Nounou, M. Effective Random Forest-Based Fault Detection and Diagnosis for Wind Energy Conversion Systems. IEEE Sens. J. 2020, 21, 6914–6921. [Google Scholar] [CrossRef]
  114. Ibba, P.; Tronstad, C.; Moscetti, R.; Mimmo, T.; Cantarella, G.; Petti, L.; Martinsen, Ø.G.; Cesco, S.; Lugli, P. Supervised Binary Classification Methods for Strawberry Ripeness Discrimination from Bioimpedance Data. Sci. Rep. 2021, 11, 11202. [Google Scholar] [CrossRef] [PubMed]
  115. Chen, Q.; Zhang, M.; Xue, B. Feature Selection to Improve Generalization of Genetic Programming for High-Dimensional Symbolic Regression. IEEE Trans. Evol. Comput. 2017, 21, 792–806. [Google Scholar] [CrossRef]
  116. Koo, C.L.; Liew, M.J.; Mohamad, M.S.; Mohamed Salleh, A.H. A Review for Detecting Gene-Gene Interactions Using Machine Learning Methods in Genetic Epidemiology. BioMed Res. Int. 2013, 2013, 432375. [Google Scholar] [CrossRef]
  117. Zhang, C.; McGee, R.J.; Vandemark, G.J.; Sankaran, S. Crop Performance Evaluation of Chickpea and Dry Pea Breeding Lines across Seasons and Locations Using Phenomics Data. Front. Plant Sci. 2021, 12, 640259. [Google Scholar] [CrossRef]
  118. Kanke, Y.; Tubaña, B.; Dalen, M.; Harrell, D. Evaluation of Red and Red-Edge Reflectance-Based Vegetation Indices for Rice Biomass and Grain Yield Prediction Models in Paddy Fields. Precis. Agric. 2016, 17, 507–530. [Google Scholar] [CrossRef]
  119. Evangelides, C.; Nobajas, A. Red-Edge Normalised Difference Vegetation Index (NDVI705) from Sentinel-2 Imagery to Assess Post-Fire Regeneration. Remote Sens. Appl. Soc. Environ. 2020, 17, 100283. [Google Scholar] [CrossRef]
  120. Li, F.; Miao, Y.; Feng, G.; Yuan, F.; Yue, S.; Gao, X.; Liu, Y.; Liu, B.; Ustin, S.L.; Chen, X. Improving Estimation of Summer Maize Nitrogen Status with Red Edge-Based Spectral Vegetation Indices. Field Crops Res. 2014, 157, 111–123. [Google Scholar] [CrossRef]
  121. Mao, P.; Qin, L.; Hao, M.; Zhao, W.; Luo, J.; Qiu, X.; Xu, L.; Xiong, Y.; Ran, Y.; Yan, C.; et al. An Improved Approach to Estimate Above-Ground Volume and Biomass of Desert Shrub Communities Based on UAV RGB Images. Ecol. Indic. 2021, 125, 107494. [Google Scholar] [CrossRef]
  122. Taheri, S.; Mammadov, M. Learning the Naive Bayes Classifier with Optimization Models. Int. J. Appl. Math. Comput. Sci. 2013, 23, 787–795. [Google Scholar] [CrossRef]
  123. Singh, A.; Thakur, N.; Sharma, A. A Review of Supervised Machine Learning Algorithms. In Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 16–18 March 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1310–1315. [Google Scholar]
  124. Calders, T.; Verwer, S. Three Naive Bayes Approaches for Discrimination-Free Classification. Data Min. Knowl. Discov. 2010, 21, 277–292. [Google Scholar] [CrossRef]
  125. Boulesteix, A.-L.; Janitza, S.; Kruppa, J.; König, I.R. Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2012, 2, 493–507. [Google Scholar] [CrossRef]
  126. Li, J.; Tran, M.; Siwabessy, J. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness. PLoS ONE 2016, 11, e0149089. [Google Scholar] [CrossRef] [PubMed]
  127. Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J.P. An Assessment of the Effectiveness of a Random Forest Classifier for Land-Cover Classification. ISPRS J. Photogramm. Remote Sens. 2012, 67, 93–104. [Google Scholar] [CrossRef]
  128. Basso, B.; Cammarano, D.; De Vita, P. Remotely Sensed Vegetation Indices: Theory and Applications for Crop Management. Riv. Ital. Di Agrometeorol. 2004, 1, 36–53. [Google Scholar]
  129. Mutanga, O.; Adam, E.; Cho, M.A. High Density Biomass Estimation for Wetland Vegetation Using Worldview-2 Imagery and Random Forest Regression Algorithm. Int. J. Appl. Earth Obs. Geoinf. 2012, 18, 399–406. [Google Scholar] [CrossRef]
Figure 1. Workflow of distinguishing early- and late-maturing dry pea plots.
Figure 2. (a) Location of North Dakota on the US map; (b) location of Prosper in North Dakota; (c) overview of the 714 experimental plots in the dry pea field; (d) location of early-maturing plots (yellow color) and late-maturing plots (blue color).
Figure 3. Shadow removal and vegetation segmentation from the soil background for an example plot. (a) The original RGB image; (b) excess green (ExG) image showing both the soil and shaded pixels; (c) vegetation pixels shown in red.
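For orientation, a minimal Python sketch of this segmentation step is shown below; the normalized RGB input, the Otsu threshold, and all variable names are illustrative assumptions rather than the authors' exact pipeline.

import numpy as np
from skimage.filters import threshold_otsu

def segment_vegetation(rgb):
    # rgb: float array (H, W, 3) scaled to [0, 1]
    total = rgb.sum(axis=2) + 1e-9          # guard against division by zero
    g = rgb[..., 1] / total                 # chromatic coordinates (see Table 2 footnote)
    r = rgb[..., 0] / total
    b = rgb[..., 2] / total
    exg = 2 * g - r - b                     # excess green index
    mask = exg > threshold_otsu(exg)        # vegetation = high-ExG pixels (assumed cutoff)
    return exg, mask

rgb = np.random.rand(64, 64, 3)             # stand-in for a plot image
exg, mask = segment_vegetation(rgb)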
Figure 4. Spectral behavior of early- and late-maturing dry pea at 67 (a), 84 (b), and 100 (c) days after planting.
Figure 5. Feature importance determined by the backward feature elimination method: (a) NSP—narrow spectral bands; (b) CHM—crop height metrics; (c) ITM—image textural metrics; (d) RGBVIs—RGB-based vegetation indices; (e) NIRVIs—NIR vegetation indices; (f) ReVIs—red-edge vegetation indices. Blue boxes indicate the features whose importance scores summed to more than 90%.
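A hedged sketch of backward feature elimination is given below; scikit-learn's RFECV (recursive elimination with cross-validation) stands in for the paper's procedure, which retained features whose summed importance exceeded 90%, and the synthetic data are placeholders.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

X, y = make_classification(n_samples=500, n_features=30, random_state=0)  # stand-in data
rf = RandomForestClassifier(n_estimators=500, max_depth=5, random_state=0)
selector = RFECV(rf, step=1, cv=5, scoring="f1")   # drops the weakest feature each pass
selector.fit(X, y)
print("kept", selector.support_.sum(), "of", X.shape[1], "features")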
Figure 6. ROC curve analysis and AUC values for different predictors used in distinguishing early- and late-maturing dry pea plants. The random line represents the performance of a random classifier, meaning a classifier that assigns class labels randomly without any discrimination.
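The ROC/AUC computation underlying a figure of this kind can be sketched as follows; the classifier, split, and data below are stand-ins, not the study's models.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]     # score for the positive class
fpr, tpr, _ = roc_curve(y_te, proba)
print("AUC =", auc(fpr, tpr))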
Figure 7. Machine learning algorithms (ANN—artificial neural network, KNN—K-nearest neighbor, SVM—support vector machine, NB—naïve Bayes, RF—random forest) for estimating dry pea maturity with different datasets (DS) containing predictor variables.
Figure 8. Performance of random forest model in estimating dry pea maturity with different combinations of predictor variables at three dates, measured by days after planting (DAP).
Figure 9. Comparison of predicted and actual maturity plots based on five different machine learning models for early-maturing (green) and late-maturing (pink) genotypes at 100 days after planting.
Figure 10. Stability analysis of precision values for different machine learning models (SVM—support vector machine, RF—random forest, KNN—K-nearest neighbor, ANN—artificial neural network, and NB—naïve Bayes) before (gray color) and after (dark blue color) feature selection.
Figure 11. Visual evaluation of the separability of early- and late-maturing dry peas with respect to crop height. (a) RGB image showing a late-maturing dry pea plant (A), ground (B), an early-maturing dry pea plant (C), and the black transect line i–ii. (b) Three-dimensional profile along transect i–ii for the late-maturing plant (A), ground (B), and early-maturing plant (C). (c) Variability of average plant height at different growth stages for early-maturing (1) and late-maturing (0) dry pea plants.
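A minimal sketch of deriving plot-level crop height metrics from a height raster follows; the random raster and the chosen percentiles merely illustrate the statistics listed in Table 5.

import numpy as np

chm_plot = np.random.gamma(4.0, 0.1, size=(40, 40))   # stand-in plot height raster (m)
heights = chm_plot[np.isfinite(chm_plot)]             # keep valid height pixels
metrics = {"mean": heights.mean(), "median": np.median(heights),
           "max": heights.max(), "std": heights.std()}
metrics.update({f"p{q}": np.percentile(heights, q) for q in (10, 50, 90, 99)})
print(metrics)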
Figure 12. Separability between early-maturing (1) and late-maturing (0) peas based on two key image textural metrics: contrast (left) and homogeneity for the blue band (right). The backward feature elimination method identified these as the most important image textural metrics.
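These gray-level co-occurrence matrix (GLCM) metrics can be computed with scikit-image as sketched below; the offset, angle, and 8-bit quantization are assumptions, since the paper's exact GLCM settings are not restated here.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

blue = (np.random.rand(64, 64) * 255).astype(np.uint8)   # stand-in 8-bit blue band
glcm = graycomatrix(blue, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)          # one-pixel horizontal offset
contrast = graycoprops(glcm, "contrast")[0, 0]
homogeneity = graycoprops(glcm, "homogeneity")[0, 0]
print(contrast, homogeneity)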
Table 1. Overview of flight parameters.
Aircraft | Sensor Type | Flight Altitude (m) | Flight Speed (m·s−1) | Side Overlap (%) | Forward Overlap (%) | Spatial Resolution (cm)
Matrice 300 | Zenmuse P1 (RGB) | 50 | 5 | 80 | 80 | 1
Matrice 300 | Zenmuse L1 (LIDAR) | 50 | 5 | 70 | 70 | 3
Matrice 200 | MicaSense Dual System | 50 | 5 | 80 | 80 | 3
Table 2. Vegetation index (VI) abbreviation, name, formula, and reference.
VI | Name | Equation | Reference
VARI | Visible atmospherically resistant index | (green − red)/(green + red − blue) | [27,28]
ExG | Excess green index | 2g − r − b * | [28]
ExR | Excess red index | 1.4r − b * | [29]
GLA | Green leaf algorithm | (2 × green − red − blue)/(2 × green + red + blue) | [27]
IKAW | Kawashima index | (red − blue)/(red + blue) | [28]
GRRI | Green–red ratio index | green/red | [30]
NDVI | Normalized difference vegetation index | (NIR − red)/(NIR + red) | [31,32,33]
SAVI | Soil-adjusted vegetation index | 1.5 × (NIR − red)/(NIR + red + 0.5) | [31,34]
NDRE | Normalized difference red-edge index | (NIR − RE)/(NIR + RE) | [35]
TRRVI | Transformed red range vegetation index | [(RE − red)/(RE + red)]/(NDVI + 1) | [36]
GSAVI | Generalized soil-adjusted vegetation index | (NIR − red)/(NIR + red − green − blue) | [33]
CIgreen | Green chlorophyll index | NIR/green − 1 | [37]
GNDVI | Green normalized difference vegetation index | (NIR − green)/(NIR + green) | [28]
SR | Simple ratio | NIR/red | [38]
CIRE | Chlorophyll index red-edge | NIR/RE − 1 | [39,40]
NDVIRE | Red-edge normalized difference vegetation index | (NIR − RE)/(NIR + RE) | [39,40]
Datt2 | Simple red-edge ratio | NIR/RE | [41,42]
LIC2 | Simple ratio Lichtenthaler index 2 | blue/RE | [41]
* g = green/(red + green + blue), r = red/(red + green + blue), b = blue/(red + green + blue); RE = red-edge.
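A few of these indices, written as plain Python functions over per-plot band reflectances, are sketched below; the small epsilon guard and the example values are illustrative additions.

import numpy as np

EPS = 1e-9  # guards against zero denominators

def ndvi(nir, red):
    return (nir - red) / (nir + red + EPS)

def ndre(nir, re):
    return (nir - re) / (nir + re + EPS)

def vari(green, red, blue):
    return (green - red) / (green + red - blue + EPS)

nir = np.array([0.45, 0.40])   # made-up per-plot mean reflectances
red = np.array([0.08, 0.12])
print(ndvi(nir, red))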
Table 3. Comparison of datasets before and after sampling applying the Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors (SMOTE-ENN) to improve the balance between the numbers of early- and late-maturing plots.
SMOTE-ENN | Dataset | Early-Maturing Plots | Late-Maturing Plots
Before | Train | 135 | 365
Before | Test | 57 | 157
After | Train | 361 | 365
After | Test | 154 | 157
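A minimal sketch of this balancing step with the imbalanced-learn library follows; the library choice and its default settings are assumptions, and Table 3 reports resampled counts for both splits.

from collections import Counter
from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification

# Stand-in imbalanced data (class 0 dominates, roughly as in Table 3)
X, y = make_classification(n_samples=500, weights=[0.73], random_state=0)
X_res, y_res = SMOTEENN(random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))   # class counts before and after resampling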
Table 4. Hyperparameter tuning for the different ML models reported in the literature.
Model | Hyperparameter | Optimal Value | Used by
RF | n_estimators | 500 | [67,68]
RF | max_depth | 5 | [67,69,70]
RF | input variables per node (mtry) | square root of the total number of features | [71,72]
SVM | kernel | 'rbf' | [67,73,74]
SVM | regularization parameter (C) | 1 | [75]
ANN | hidden_layer_sizes | (16, 16) | [76]
ANN | epochs | 100 | [77]
ANN | activation function | relu | -
ANN | optimizer | adam | [78]
ANN | learning rate | 0.001 | -
KNN | n_neighbors | 14 | [79]
KNN | distance metric | 'euclidean' | [80]
NB | Laplace smoothing function | 1 | [81]
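Under the assumption that these models were built with scikit-learn (the hyperparameter names in Table 4 match its API), a sketch of instantiating them follows; unlisted arguments keep library defaults, epochs is mapped to max_iter, and GaussianNB is a stand-in since Laplace smoothing strictly applies to the categorical/multinomial naïve Bayes variants.

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

models = {
    "RF": RandomForestClassifier(n_estimators=500, max_depth=5,
                                 max_features="sqrt"),      # mtry = sqrt(n_features)
    "SVM": SVC(kernel="rbf", C=1),
    "ANN": MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu",
                         solver="adam", learning_rate_init=0.001, max_iter=100),
    "KNN": KNeighborsClassifier(n_neighbors=14, metric="euclidean"),
    "NB": GaussianNB(),   # Laplace smoothing (alpha=1) belongs to CategoricalNB/MultinomialNB
}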
Table 5. Different statistical metrics used for each input parameter to train ML models.
Input Parameter | Abbreviation | Indices/Parameters | Statistical Metrics
Crop Height Metrics | CHM | - | Min, Mean, Median, Max, Variance, Stdev, percentiles (10th to 99th)
Narrow Spectral Bands | NSP | blue, green, red, red-edge, NIR | Min, Mean, Max
RGB-based Vegetation Indices | RGBVIs | ExG, ExR, VARI, GLI, IKAW, GRRI | Min, Mean, Max
NIR Vegetation Indices | NIRVIs | NDVI, SAVI, GSAVI, CIgreen, SR, GNDVI | Min, Mean, Max
Red-edge Vegetation Indices | ReVIs | NDRE, LIC2, Datt2, CIRE, TRRVI, NDVIRE | Min, Mean, Max
Image Textural Metrics | ITM | Homogeneity, Mean, Contrast, Second moment, Entropy, Dissimilarity, Variance | Min, Mean, Max
Note: blue at 444 nm and 475 nm; green at 531 nm and 560 nm; red at 650 nm and 668 nm; red-edge at 705 nm, 717 nm, and 740 nm; NIR at 840 nm.
Table 6. Performance of each UAS-derived metric in estimating dry pea maturity based on five machine learning algorithms before and after feature selection.
Dataset | Model | f1 Score (Before / After) | Recall (Before / After) | Precision (Before / After)
NIRVIs | SVM | 0.85 / 0.87 | 0.81 / 0.85 | 0.86 / 0.88
NIRVIs | RF | 0.91 / 0.93 | 0.90 / 0.92 | 0.92 / 0.94
NIRVIs | KNN | 0.83 / 0.87 | 0.81 / 0.85 | 0.88 / 0.89
NIRVIs | ANN | 0.76 / 0.86 | 0.75 / 0.85 | 0.81 / 0.88
NIRVIs | NB | 0.73 / 0.75 | 0.71 / 0.72 | 0.75 / 0.77
CHM | SVM | 0.74 / 0.75 | 0.73 / 0.74 | 0.75 / 0.75
CHM | RF | 0.85 / 0.86 | 0.84 / 0.85 | 0.86 / 0.87
CHM | KNN | 0.79 / 0.81 | 0.78 / 0.80 | 0.80 / 0.82
CHM | ANN | 0.66 / 0.67 | 0.65 / 0.66 | 0.68 / 0.69
CHM | NB | 0.74 / 0.75 | 0.73 / 0.74 | 0.75 / 0.76
NSP | SVM | 0.74 / 0.75 | 0.69 / 0.70 | 0.82 / 0.87
NSP | RF | 0.94 / 0.97 | 0.93 / 0.96 | 0.95 / 0.98
NSP | KNN | 0.83 / 0.85 | 0.81 / 0.83 | 0.88 / 0.91
NSP | ANN | 0.72 / 0.73 | 0.67 / 0.69 | 0.80 / 0.83
NSP | NB | 0.71 / 0.72 | 0.68 / 0.70 | 0.73 / 0.75
ReVIs | SVM | 0.65 / 0.68 | 0.60 / 0.62 | 0.71 / 0.74
ReVIs | RF | 0.92 / 0.95 | 0.91 / 0.94 | 0.94 / 0.96
ReVIs | KNN | 0.87 / 0.87 | 0.84 / 0.84 | 0.92 / 0.93
ReVIs | ANN | 0.71 / 0.72 | 0.66 / 0.68 | 0.79 / 0.85
ReVIs | NB | 0.65 / 0.67 | 0.59 / 0.61 | 0.67 / 0.69
RGBVIs | SVM | 0.82 / 0.85 | 0.81 / 0.83 | 0.83 / 0.86
RGBVIs | RF | 0.93 / 0.94 | 0.92 / 0.93 | 0.94 / 0.95
RGBVIs | KNN | 0.84 / 0.86 | 0.83 / 0.85 | 0.88 / 0.91
RGBVIs | ANN | 0.76 / 0.78 | 0.74 / 0.76 | 0.77 / 0.79
RGBVIs | NB | 0.73 / 0.74 | 0.71 / 0.73 | 0.74 / 0.78
ITM | SVM | 0.66 / 0.69 | 0.65 / 0.69 | 0.67 / 0.69
ITM | RF | 0.84 / 0.87 | 0.83 / 0.85 | 0.86 / 0.88
ITM | KNN | 0.82 / 0.87 | 0.83 / 0.87 | 0.85 / 0.87
ITM | ANN | 0.66 / 0.67 | 0.65 / 0.65 | 0.67 / 0.67
ITM | NB | 0.74 / 0.75 | 0.69 / 0.70 | 0.77 / 0.79
NIRVIs: NIR vegetation indices; CHM: crop height metrics; NSP: narrow spectral bands; ReVIs: red-edge vegetation indices; RGBVIs: RGB-based vegetation indices; ITM: image textural metrics.
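The precision, recall, and f1 scores in Table 6 follow the standard definitions and can be computed as sketched below with made-up labels (1 = early-maturing, 0 = late-maturing, following Figures 11 and 12; the positive-class choice here is illustrative).

from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical test labels
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]   # hypothetical model predictions
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")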
Table 7. Combinations of different variables used to estimate the maturity of dry pea.
Dataset | Variables
1 | CHM and RGBVIs
2 | CHM, RGBVIs, and NSP
3 | CHM, RGBVIs, NSP, and ReVIs
4 | CHM, RGBVIs, NSP, ReVIs, and NIRVIs
5 | CHM, RGBVIs, NSP, ReVIs, NIRVIs, and ITM
CHM: crop height metrics; RGBVIs: RGB-based vegetation indices; NSP: narrow spectral bands; ReVIs: red-edge vegetation indices; NIRVIs: NIR vegetation indices; ITM: image textural metrics.
Table 8. Statistical analysis of precision before and after feature selection for different machine learning models and datasets.
Dataset | Model | p Value
NIRVIs | SVM | 0.000
NIRVIs | RF | 0.000
NIRVIs | KNN | 0.064
NIRVIs | ANN | 0.910
NIRVIs | NB | 0.001
CHM | SVM | 0.031
CHM | RF | 0.000
CHM | KNN | 0.029
CHM | ANN | 0.335
CHM | NB | 0.000
NSP | SVM | 0.033
NSP | RF | 0.000
NSP | KNN | 0.060
NSP | ANN | 0.000
NSP | NB | 0.538
ReVIs | SVM | 0.071
ReVIs | RF | 0.000
ReVIs | KNN | 0.440
ReVIs | ANN | 4.623
ReVIs | NB | 0.088
RGBVIs | SVM | 1.19
RGBVIs | RF | 0.000
RGBVIs | KNN | 1.103
RGBVIs | ANN | 0.165
RGBVIs | NB | 0.029
ITM | SVM | 0.000
ITM | RF | 0.000
ITM | KNN | 0.064
ITM | ANN | 0.910
ITM | NB | 0.001
Note: NIRVIs—NIR vegetation indices, CHM—crop height metrics, NSP—narrow spectral bands, ReVIs—red-edge vegetation indices, RGBVIs—RGB-based vegetation indices, ITM—image textural metrics, SVM—support vector machine, RF—random forest, KNN—K-nearest neighbor, ANN—artificial neural network, NB—naïve Bayes. Values are reproduced as published.
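Table 8 does not restate which significance test produced these p values; purely as an illustration, a paired t-test on precision values from repeated runs before and after feature selection could be computed as follows, with made-up numbers.

from scipy.stats import ttest_rel

# Hypothetical precision values from five repeated runs of one model
before = [0.85, 0.84, 0.86, 0.85, 0.83]   # before feature selection
after = [0.87, 0.88, 0.87, 0.86, 0.88]    # after feature selection
t_stat, p_value = ttest_rel(before, after)
print(f"p = {p_value:.3f}")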
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
