Article

Weed Detection and Classification with Computer Vision Using a Limited Image Dataset

by László Moldvai 1, Péter Ákos Mesterházi 2, Gergely Teschner 1 and Anikó Nyéki 1,*
1 Department of Biosystems Engineering and Precision Technology, Albert Kázmér Mosonmagyaróvár Faculty of Agricultural and Food Sciences, Széchenyi István University, 9200 Mosonmagyaróvár, Hungary
2 AXIÁL Ltd., 6500 Baja, Hungary
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(11), 4839; https://doi.org/10.3390/app14114839
Submission received: 26 April 2024 / Revised: 23 May 2024 / Accepted: 31 May 2024 / Published: 3 June 2024
(This article belongs to the Section Agricultural Science and Technology)

Abstract:
In agriculture, as precision farming increasingly employs robots to monitor crops, the use of weeding and harvesting robots is expanding the need for computer vision. Currently, most researchers and companies address these computer vision tasks with CNN-based deep learning. This technology requires large datasets of plant and weed images labeled by experts, as well as substantial computational resources. However, traditional feature-based approaches to computer vision can extract meaningful parameters and achieve comparably good classification results with only a tenth of the dataset size. This study presents these methods and seeks to determine the minimum number of training images required to achieve reliable classification. We tested the classification results with 5, 10, 20, 40, 80, and 160 images per weed type in a four-class classification system. We extracted shape features, distance transformation features, color histograms, and texture features. Each type of feature was tested individually and in various combinations to determine the best results. Using six types of classifiers, we achieved a 94.56% recall rate with 160 images per weed. Better results were obtained with more training images and a greater variety of features.

1. Introduction

In contemporary agriculture, a significant trend is emerging: a diminishing interest in working within the agricultural sector. Despite this trend, the global population continues to grow, now surpassing 8.1 billion people, all of whom require food to survive. The challenge is clear: How can we produce sufficient food with fewer people willing to work in agriculture? This situation underscores the urgent need for innovative methods and technologies to enhance agricultural productivity and ensure adequate food supply for everyone.
In the field of computer vision, two prominent methodologies are particularly noteworthy: traditional feature-based approaches and deep learning methods. The former involves extracting meaningful features from images to facilitate classification tasks. This method relies on identifying distinctive characteristics within the image, such as shapes, textures, or color patterns, which are then used to classify objects or scenes.
Conversely, deep learning methods, especially those involving neural networks, have gained significant traction in recent years. These approaches train complex neural network models using large datasets of labeled images. Through iterative learning processes, the neural network automatically extracts relevant features and makes classifications based on the input images. The cornerstone of computer vision (CV) deep learning methods is the convolutional neural network (CNN), with one of its prominent implementations being the YOLO algorithm.
A previous study [1] included both artificial and real plants. Portulaca weeds were targeted, while square weeds and pepper plants were not. For artificial plants, an accuracy and recall of 91% were achieved, while for real plants, the accuracy and recall were 71% and 78%, respectively. The YOLO deep learning solution was chosen for classification.
In the field of agriculture, accessible datasets play a crucial role. While it is possible to generate images, progress can be accelerated by utilizing other datasets, particularly those with labeled data, to conduct tests and enable the training of artificial networks. While there are numerous databases available on the internet that contain images of dogs, cats, and human faces, finding usable datasets using images of crops, plants, leaves, fruits, vegetables, or weeds can be challenging. One such dataset, referenced in [2], focuses on five types of weeds and comprises 1200 images for each category.
Similarly, in [3], the author discussed an on-site weed management computer vision system that utilizes a public weed database sourced from Kaggle. However, this dataset contains only 1300 images in total, further compounded by the fact that it is an augmented dataset. Originally, there were only 546 images available.
Ref. [4] provides an up-to-date overview of publicly available image databases for weed detection. Among the 36 identified datasets, there were limitations regarding data variations and distributional shifts. This study concluded that few datasets are suitable for examining the robustness of weed detection models. In this study, a new, two-season, eight-class weed dataset was constructed, consisting of images collected in the 2021 and 2022 seasons. Tests were conducted on the dataset using the YOLO algorithm.
The dataset discussed in [5] contains manually captured data from both handheld cameras and unmanned aerial systems, making it suitable for both ground-based and aerial weed control robots. This dataset comprises 3975 images featuring five different weed species commonly found in North Dakota. The images were captured under various lighting conditions, including sunny, cloudy, shadowy, and low-light conditions.
Despite the impressive capabilities of deep learning algorithms in image classification tasks, there remains a significant level of opacity in understanding how these networks arrive at their decisions. Unlike traditional feature-based approaches, where the rationale behind classifications can often be traced back to specific features or rules, deep learning models operate as complex “black boxes”.
To address the challenges associated with deep learning, it is beneficial to revert to the first method, which focuses on comprehending the content of images and extracting meaningful features from them. This approach involves employing techniques such as image processing and feature extraction to identify relevant characteristics within the images. By discerning key features like shapes, textures, and color patterns, the aim is to capture the essence of the objects or scenes depicted in the images.
Once these meaningful features are extracted, some vector classifiers are trained and tested. For instance, a support vector machine (SVM) is utilized for classification purposes. SVM is a powerful supervised learning algorithm that excels at separating data points into distinct classes based on the features extracted from the input data. By leveraging the discriminative capabilities of SVM, images can be effectively classified into different categories to identify, for instance, specific weed species within agricultural landscapes. However, SVM was not the winner in our case. The random forest algorithm outperformed SVM.
In [6], the authors presented a real-time, computer vision-based plant and weed recognition system for variable-rate agricultural spraying. The recognition and classification of plants and weeds were performed using the RF classifier. The results obtained from multiple field tests demonstrated the effectiveness of the proposed vision-based variable-rate agricultural spraying framework in real time.
Ensuring that the training dataset includes a wide range of variations, such as different growth stages, lighting conditions, and perspectives, can significantly enhance the robustness and generalization ability of the classification model. The other aspect is the algorithms. For instance, Giselsson [7] conducted research on recognizing plant silhouettes based on shape characteristics. Two methods were employed: one method approximated the distribution of distance-transformed image values using Legendre polynomials and classified these data, while the other method extracted 21 shape features and classified them using four different vector classifiers. The distance transformation approach achieved better recognition than the method using 21 features in classifying two types of plants.
Another study [8] focused on recognizing three plants (canola, maize, and radish) using texture analysis. For this, local binary pattern (LBP) parameters were utilized, and classification was performed using an SVM classifier. The images were captured in a weed-free, well-lit environment on a table. This research achieved a recognition rate of 91.85%.
Ramirez-Paredes [9] investigated malting barley using image processing techniques that involved shape recognition. Initially, they considered the application of Hu moments but ultimately utilized other central moments. Color features and local binary patterns (LBPs) were also employed. A linear SVM was used for classification. The images were captured against a clean background; however, a recurring issue was that the barley grains sometimes touched each other. Achieving good results required as many as 25,000 training images, which is feasible for seeds but challenging for weeds.
In [10], weed recognition was performed using local binary patterns (LBPs) to extract local texture features. The extracted features were classified using template matching and SVM. For LBPs, template matching achieved a classification rate of 83.3%, while SVM achieved a rate of 93.8%. The study included two types of weeds, thus conducting binary classification and using a limited set of 200 images for each class.
Bhunia [11] conducted a study focusing on image classification, employing color and texture-based feature extraction. For the color space, Bhunia opted for HSV (hue, saturation, and value). Texture parameters were derived from the gray-level co-occurrence matrix (GLCM) in addition to utilizing local binary patterns (LBPs). Bhunia classified the vectors using various distance calculation methods. The system was tested using five publicly available image databases.
Hamuda [12] utilized HSV color space transformations to detect cauliflower plants. He performed erosion and dilation procedures on the images and then identified the contour, center, and bounding rectangle of the cauliflower. This method achieved a 99% accuracy rate in detecting cauliflower, even in images with minimal weed interference. The experiment was tested under various weather conditions, including sunny, partially cloudy, and completely cloudy scenarios.
The subsequent study by Zhu [13] implemented the YOLO deep learning system to distinguish between two classes: wheat and weed. This research used a relatively large dataset of 5400 images, with 80% allocated for training and 20% for testing. Both the traditional YOLO and an optimized version were employed, with the traditional YOLO achieving a recognition rate of 93.86% and the optimized YOLO reaching 94.86%.
Bakhshipour’s research [14] focused on weed detection, aiming to recognize five different plant species. The study employed contour Fourier descriptors and extracted Hu moments and other parameters from the images. The obtained vectors were classified using artificial neural networks (ANNs) and support vector machines (SVMs), with the ANN classifier achieving a recognition rate of 92.67% and the SVM classifier achieving 93.33%.
During weed detection, the histogram of oriented gradients (HOG) is also useful, although achieving over 90% accuracy on its own is typically not attainable. One study [15] created a single-layer image using the excess green (ExG) index procedure. HOG values were computed on these images, which were then transformed using a visual word technique and classified. This method succeeded in achieving around 90% accuracy, and in some cases, even approached 95%.
In a recent study [16], researchers investigated the problem of detecting partially overlapping weeds and crops. They focused on identifying the boundary between a weed and a crop within the green vegetative area. The images were divided into 64 × 64, 32 × 32, and 16 × 16-pixel cells, and three texture parameters (HOG, LBP, and GLCM) were computed for each cell. These parameters, as well as their combinations, were classified using four different methods. The classified values were then assigned to the cells and propagated between layers, with adjustments made to the classified values as needed. This approach successfully separated weeds from crops.

2. Materials and Methods

2.1. Dataset

A publicly available weed dataset was utilized for this study, accessible for non-commercial applications at https://github.com/zhangchuanyin/weed-datasets (accessed on 29 March 2024). The images were captured from a height of approximately 30 cm above the ground. Only a subset of the images from this dataset was employed in this research: to demonstrate the process, six plant types were selected, and each class comprised the first 500 images from the original dataset, giving 3000 images altogether. The CV processes were developed in the Python programming language, using the cv2 and skimage modules to handle the images.
The examples provided in Figure 1 demonstrate that the images were captured in real-world settings, as opposed to a controlled laboratory environment. Many deep learning solutions achieve impressive results with images obtained in a lab, which are typically created with a uniform white background and minimal environmental disturbances. The images used here, however, exhibit varying soil colors, are marred by cracks and straw, and often feature smaller weeds alongside the main crop. Furthermore, many images contain not just one, but two or three plants. By using these real-world images to develop classification models, we aimed to enhance their robustness.

2.2. Filtering the Images

Due to the presence of numerous distractors in the images, the initial step involved cleaning and preprocessing the images. It was essential to identify the main crop and eliminate any background clutter, as well as smaller or overlapping weeds. The pixel-level noise had to be removed first.
In [17], classification experiments were conducted with two types of weeds, measuring noise variance from no noise to 0.05 sigma noise across six steps. Their results showed that without noise, the classification accuracy was 98.4% without noise filtering and 98.29% with mean noise filtering, indicating nearly identical performance. However, even a small amount of noise, such as 0.01 sigma, reduced classification accuracy to 91.32% without noise filtering, compared with 96.23% with noise reduction, highlighting the significant impact of noise on performance.
Furthermore, with larger noise levels, such as 0.05 sigma, classification accuracy fell to 87.61% without filtering but only to 92.85% with filtering, a consistent difference of about 5% attributable to noise filtering. Various methods, including blur, median, and Gaussian filters, are commonly employed to address noise in images, albeit at the cost of slight image blurring.
To remove pixel-level noise from the image, a 5 × 5 Gaussian filter was first employed. Given that the RGB color space may not be optimal for emphasizing green areas, the images were converted to the HSV color space. In the HSV image, the green areas were isolated, and all other pixels were set to black. The green color was defined by a range in HSV color space: lower_green = (30, 40, 40) and upper_green = (80, 255, 255). This process created a black-and-white mask highlighting the green regions. Figure 2b shows the mask, which also contains many other regions belonging to different plants. Subsequently, clusters of connected pixels within this mask were identified, and each cluster was assigned a different color, as shown in Figure 2c.
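For illustration, a minimal sketch of this masking step is given below, using the cv2 module mentioned earlier. The kernel size and HSV thresholds are those stated above; the function and variable names are only an assumed arrangement, not the authors' exact code.

```python
# Sketch of the noise filtering and green-mask extraction described above.
import cv2
import numpy as np

def green_mask(bgr_image: np.ndarray) -> np.ndarray:
    """Return a binary mask of the green regions of a BGR image."""
    blurred = cv2.GaussianBlur(bgr_image, (5, 5), 0)        # remove pixel-level noise
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)          # HSV isolates "greenness" better than RGB
    lower_green = np.array([30, 40, 40])
    upper_green = np.array([80, 255, 255])
    return cv2.inRange(hsv, lower_green, upper_green)       # 255 = green pixel, 0 = background

# Clusters of connected green pixels (Figure 2c) can then be labeled with
# cv2.connectedComponentsWithStats(mask, connectivity=8).
```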
From the identified clusters, the largest one was selected as the object of interest for further analysis. In this workflow, the focus was exclusively on this object. However, it was also plausible to consider the second largest object, followed by subsequent ones, for classification purposes. This approach allowed for a more comprehensive analysis, leveraging multiple objects within the image for classification tasks.
Based on the largest cluster, the region of interest was extracted from the image and stored for later use (Figure 2d). Additionally, this region was resized to a 256 × 256-pixel image. The shape parameters of the plant were analyzed on this resized image (Figure 2e,f).
After creating the mask from the image, small black spots may appear on the plant, potentially due to diseases, soil residue, or dried leaf fragments. It is possible to close these spots using erosion and dilation techniques, an important step for contour analysis.
In Figure 3, the main steps of image processing are depicted. The process began with an RGB image, which is shown in the top left corner. The Gaussian filter was applied to eliminate pixel-based noise. Subsequently, the image was transformed into the HSV color space to facilitate the extraction of the green mask, using the range previously specified. These image masks encompassed all green components, not just the primary object. Connected pixel regions were then labeled, and an artificially colored label image was generated. The largest connected area was selected, and its color was used as a mask to isolate a single plant. In our dataset, these steps were enough to locate and extract a single plant. Occasionally, other small plants might adhere to the mask, but their proportion relative to the larger plant was minimal, causing no significant issues. With other datasets, where multiple plants may be present together, a maximum plant size parameter can be employed to determine whether the largest green mask in the image corresponds to a single plant or a group of plants. If it represents a group, the image can be discarded, or it can be subdivided into smaller parts, such as tiles, for further analysis using texture and color features. However, this constitutes a distinct task.
Following the extraction of the “one plant image mask”, the image was cropped, typically with an additional 5 pixels in every direction. This mask retained the same pixel/mm size as the original image and was utilized for distance transformation, which will be elaborated upon in subsequent sections.
Concurrently, the mask was resized to 256 × 256 pixels, and contours were computed, which is beneficial for area and Hu features.
Next, the mask was utilized to obtain the original RGB plant image, where the plant retained its original color, while the background was blackened. This image was suitable for generating RGB histogram color features. Additionally, an RGB-to-HSV color space transformation was performed, yielding HSV histogram color features.
Finally, a gray image comprising only one channel was generated, which is ideal for extracting texture features. Further details will be provided in subsequent sections.
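The following sketch summarizes how the images used by the different feature groups could be derived from the green mask, following the steps of Figure 3. The 5-pixel margin and the 256 × 256 size come from the text; the helper name plant_views and the remaining details are assumptions.

```python
# Sketch of deriving the crop mask, shape mask, masked color image, and gray image (Figure 3).
import cv2
import numpy as np

def plant_views(bgr_image, mask, margin=5):
    """Isolate the largest green object and build the inputs for the feature extractors."""
    _, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))       # label 0 is the background
    plant_mask = np.uint8(labels == largest) * 255
    x, y, w, h = cv2.boundingRect(plant_mask)
    x0, y0 = max(x - margin, 0), max(y - margin, 0)                  # crop with ~5 px margin
    crop_mask = plant_mask[y0:y + h + margin, x0:x + w + margin]     # original scale -> distance transform
    shape_mask = cv2.resize(crop_mask, (256, 256))                   # 256 x 256 mask -> shape features
    plant_bgr = cv2.bitwise_and(bgr_image, bgr_image, mask=plant_mask)
    plant_bgr = plant_bgr[y0:y + h + margin, x0:x + w + margin]      # background blackened -> color features
    plant_gray = cv2.cvtColor(plant_bgr, cv2.COLOR_BGR2GRAY)         # single channel -> texture features
    return crop_mask, shape_mask, plant_bgr, plant_gray
```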

2.3. Shape Features

The contour of the plant was determined based on the mask using the Canny edge detection algorithm, and the convex hull lines were also drawn, as can be seen in Figure 4. From the contour, parameters such as area, hull area, and solidity were derived.
To calculate the hull and hull area, the convex hull of the plant contour was measured. The convex hull is the smallest convex polygon that can enclose all points of the contour. The hull area represents the total area enclosed by this convex polygon.
Solidity is a metric that quantifies the compactness of an object’s shape relative to its convex hull. It is calculated as the ratio of the object’s area to the area of its convex hull. Solidity values range from 0 to 1, where a solidity of 1 indicates a completely solid shape without concavities, while a lower solidity value suggests the presence of concavities or indentations in the shape.
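A possible implementation of these three shape features is sketched below. For brevity, it takes the contour from cv2.findContours on the 256 × 256 mask rather than from the Canny edge image mentioned above; this substitution is our assumption.

```python
# Sketch of the area, hull area, and solidity features.
import cv2

def area_features(shape_mask):
    contours, _ = cv2.findContours(shape_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)      # outline of the plant mask
    area = cv2.contourArea(contour)
    hull = cv2.convexHull(contour)                    # smallest convex polygon enclosing the contour
    hull_area = cv2.contourArea(hull)
    solidity = area / hull_area if hull_area > 0 else 0.0
    return [area, hull_area, solidity]
```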

2.4. Hu Moments

Named after Ming-Kuei Hu, Hu moments consist of seven mathematical moments used to characterize the shape of an object in an image. These moments are derived from the normalized central moments of the object’s binary image. Hu moments are invariant to translation, scale, and rotation, making them invaluable for shape analysis and pattern recognition tasks in computer vision. They provide a compact representation of an object’s shape features, facilitating applications such as object classification, recognition, and shape matching.
Hu moments are particularly well suited for analyzing simple shapes due to their robustness to translation, scale, and rotation. However, they may not perform as effectively when applied to complex shapes, such as plants, which often exhibit intricate and irregular contours. In these cases, other shape descriptors or feature extraction methods that capture the detailed structure and texture of the object may be more suitable for accurate analysis and classification.
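A minimal sketch of the Hu-moment feature vector is given below. The log scaling at the end is a common convention for handling the very different magnitudes of the seven values; it is our assumption, not a step stated in the text.

```python
# Sketch of the seven Hu moments computed from the binary plant mask.
import cv2
import numpy as np

def hu_features(shape_mask):
    m = cv2.moments(shape_mask, binaryImage=True)        # raw and central moments of the binary image
    hu = cv2.HuMoments(m).flatten()                      # seven translation/scale/rotation invariants
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)   # assumed log scaling for comparable magnitudes
```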

2.5. Distance Transformation

Image distance transformation, also known as distance mapping or distance transform, is a technique used in image processing to compute the distance of each pixel from the nearest contour pixel within an image. This distance is typically calculated using a defined metric, such as Euclidean or Manhattan distance.
The result of distance transformation is a new image in which each pixel value represents the distance from that pixel to the nearest contour pixel. This transformation is useful for various tasks, including object segmentation, shape analysis, and feature extraction.
One common application of image distance transformation is in morphological operations, such as skeletonization or thinning, where it is used to identify the medial axis or centerline of objects within an image.
After obtaining the distance image, local maxima within the image were identified. Each local maximum corresponded to a point where the distance to the nearest border was maximized. These local maxima served as key reference points within the image. To visualize these points, circles centered at each local maximum were drawn, with a radius equal to the value of the local maximum (as shown in Figure 5). This process effectively outlined the boundaries of the objects in the image, with each circle’s radius extending to the nearest border. By visually representing these contours, we gained valuable insights into the spatial distribution and shape characteristics of the objects present in the image.
The local maximum values were then organized in descending order, with the largest values first, consistent with Section 3.3. This ordering allowed us to rank the maxima according to their size, with larger local maximum values typically corresponding to larger structures within the plant. By systematically analyzing these values, we gained insight into the spatial attributes of the image, enabling us to differentiate between objects of varying sizes and providing useful information for further analysis and interpretation of the image data.
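The sketch below illustrates this feature, returning the largest local maxima of the distance map (the top-three combination is the one adopted in Section 3.3). The minimum peak spacing and the zero padding for images with fewer maxima are assumptions.

```python
# Sketch of the distance-transformation features: largest local maxima of the distance map.
import cv2
import numpy as np
from skimage.feature import peak_local_max

def distance_features(crop_mask, n_values=3):
    dist = cv2.distanceTransform(crop_mask, cv2.DIST_L2, 5)    # distance to the nearest contour pixel
    peaks = peak_local_max(dist, min_distance=5)               # coordinates of the local maxima
    values = np.sort(dist[peaks[:, 0], peaks[:, 1]])[::-1]     # radii, largest first
    values = np.pad(values, (0, n_values), constant_values=0)[:n_values]
    return values.tolist()
```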

2.6. Color Features

In the next phase, color feature vectors were extracted from the dataset. This approach allowed us to capture a comprehensive range of information from the image colors, enhancing the efficacy of weed detection and classification algorithms.
For the color-based features, color histograms were utilized. Initially, we analyzed the original RGB images, which were composed of three bands: red, green, and blue. Furthermore, we transformed the images into the HSV color space, where the three bands represented hue, saturation, and value. From these six channels, histograms comprising 32 bins each for the R, G, B, H, S, and V bands were constructed, yielding a total of 192 parameters. These histograms provided valuable insights into the distribution of colors within the images, thereby facilitating weed detection and classification efforts. The HSV channel histograms are displayed in Figure 6.
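A sketch of the 192-element color feature vector follows. Note that OpenCV stores hue in the 0–179 range, so the upper hue bins remain empty, and the per-channel normalization is an assumption rather than a detail taken from the text.

```python
# Sketch of the color histogram features: 32 bins x (B, G, R, H, S, V) = 192 values.
import cv2

def color_features(plant_bgr, bins=32):
    hsv = cv2.cvtColor(plant_bgr, cv2.COLOR_BGR2HSV)
    feats = []
    for image in (plant_bgr, hsv):
        for ch in range(3):                                        # three channels per color space
            hist = cv2.calcHist([image], [ch], None, [bins], [0, 256])
            feats.extend(cv2.normalize(hist, hist).flatten())      # assumed normalization
    return feats                                                   # 2 x 3 x 32 = 192 values
```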

2.7. Texture Features

In addition to color-based features, histogram of oriented gradients (HOG) features were integrated into the analysis. These features provided a detailed depiction of the gradient information inherent in the images, capturing essential texture and shape characteristics vital for weed detection. With 3780 parameters, the HOG features offered a comprehensive descriptor of the image content, thereby augmenting the accuracy and robustness of our weed classification algorithms.
Furthermore, the gray-level co-occurrence matrix (GLCM) features, which included 72 parameters, were used. GLCM features utilized in [18] were employed for classifying both plants and weeds. These features provided insights into the spatial relationships of pixel intensities within the images and valuable information about texture patterns, enhancing our understanding of the images’ structural properties.
To expand the analysis, we integrated local binary pattern (LBP) features, which are renowned for capturing micropatterns and texture variations within images. LBP features are particularly valuable for distinguishing weeds. By incorporating LBPs alongside existing features like HOG and GLCM, we aimed to improve the discriminative capability of the classification models, ultimately enhancing weed detection accuracy.
The sizes of the three types of texture feature vectors are detailed in Table 1.
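For illustration, the following sketch computes the three texture descriptors with skimage (version 0.19+ naming). The parameter choices are assumptions selected only so that the vector sizes match Table 1 (3780 HOG, 72 GLCM, and 26 LBP values) and may differ from the settings used in the study.

```python
# Sketch of the HOG, GLCM, and LBP texture features on the single-channel plant image.
import numpy as np
from skimage.feature import hog, graycomatrix, graycoprops, local_binary_pattern
from skimage.transform import resize

def texture_features(plant_gray):
    # HOG on a 128 x 64 window, 9 orientations, 8x8 cells, 2x2 blocks -> 3780 values
    hog_vec = hog(resize(plant_gray, (128, 64)), orientations=9,
                  pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    # GLCM: 6 properties x 3 distances x 4 angles -> 72 values
    glcm = graycomatrix(plant_gray, distances=[1, 2, 3],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "dissimilarity", "homogeneity", "energy", "correlation", "ASM"]
    glcm_vec = np.concatenate([graycoprops(glcm, p).ravel() for p in props])
    # Uniform LBP with P=24, R=3 -> histogram of P + 2 = 26 bins
    lbp = local_binary_pattern(plant_gray, P=24, R=3, method="uniform")
    lbp_vec, _ = np.histogram(lbp, bins=26, range=(0, 26), density=True)
    return hog_vec, glcm_vec, lbp_vec
```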

2.8. Features and Feature Combinations

In the preceding sections, the foundational features extractable from an image were introduced. These features serve as the basis for classification tasks or deepening the understanding of their behavioral characteristics. With the three fundamental shape-based features at our disposal, feature combinations were constructed from these attributes. Similarly, three distinct types of texture combinations were explored, and texture-based feature combinations were assembled for testing purposes. The integration of all features was a pivotal step in the analysis. Subsequently, the least impactful texture and shape features were selectively removed to assess their individual contributions. The foundational features and their combinations are depicted in Figure 7.
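In practice, a feature combination is simply the concatenation of the selected base vectors; the sketch below shows one possible arrangement, where the group names are illustrative only.

```python
# Sketch of building a combined feature vector from the individual feature groups.
import numpy as np

def build_vector(groups, selected):
    """groups: dict of name -> 1-D feature array; selected: names of the groups to combine."""
    return np.concatenate([np.asarray(groups[name], dtype=float) for name in selected])

# Example combinations following Figure 7 (illustrative names):
#   shape_features = build_vector(feats, ["area", "hu", "distance"])
#   all_but_hog    = build_vector(feats, ["area", "hu", "distance", "color", "glcm", "lbp"])
```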

2.9. Classification Methods

To determine the optimal prediction outcome, we evaluated six classifier algorithms, as outlined in Table 2. Subsequently, all the fundamental features and feature combinations mentioned earlier were trained and tested with each of the six classifiers.
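The six classifiers of Table 2 are all available in scikit-learn; a minimal setup is sketched below, with hyperparameters that are assumptions rather than the values used in the study.

```python
# Sketch of the six classifier algorithms listed in Table 2.
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB

classifiers = {
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "ANN": MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=0),
    "NB": GaussianNB(),
    "GBM": GradientBoostingClassifier(random_state=0),
}
```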

3. Results

Our objective was to determine the recognition rate of the system, even when working with a limited number of sample images. For this purpose, we utilized 5, 10, 20, 40, 80, and 160 samples per plant for training, while the remaining images were used for testing, leaving 495, 490, 480, 460, 420, and 340 images, respectively, for this phase.
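A minimal sketch of this per-class split is shown below, assuming the feature matrix X and label vector y have already been built; keeping the first n images of every class for training mirrors the counts above.

```python
# Sketch of the train/test split: the first n_train samples of each class train, the rest test.
import numpy as np

def split_per_class(X, y, n_train):
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)        # indices of this class, in dataset order
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# for n_train in (5, 10, 20, 40, 80, 160): ...
```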

3.1. Area-Based Features

For the classification task, we selected only three parameters: weed area, hull area, and solidity. The six classifiers were employed to analyze the results. The outcomes of this approach are presented in Table 3. Initially, when only five training images were used, the recognition rate was 40.99%. However, even when the number of training images was increased to 160, the result improved only modestly, to 53.85%. This indicates challenges in achieving high accuracy with a limited dataset. In some test cases, there were classes with zero matches; in these cases, the average precision or average recall is presented as "nan" in the table.

3.2. Hu Features

Hu moments, which consist of seven features, were initially considered for our analysis. However, due to the complex shapes of weeds, Hu moments proved less effective, as they are better suited for simpler shapes. Consequently, the recognition rate with Hu moments remained slightly above 50%, even when the number of training images was increased to 160, as shown in Table 4.

3.3. Distance Transformation Features

Following distance transformation, we arranged the maximal distance transform values in descending order. Given that each distance-transformed image contained a variable number of maxima, it was essential to determine the appropriate number of maxima for comparison. Depending on the scenario, the analysis may require only one data point or multiple data points. In our approach, we assumed that the highest values held greater significance than the lower ones.
To assess the impact of the number of training images on classification performance, we conducted classification tests using 20 and 60 training images (as detailed in Table 5). Various combinations were explored to evaluate their effectiveness. Below are the results of the experiments.
As illustrated in Figure 8, the classification accuracy of the second column surpasses that of the first. However, the combination of the first three columns demonstrates promising results. In contrast, as shown in the fourth and fifth columns, with the addition of further combinations, the performance deteriorates. Therefore, it may be prudent to either utilize the second column alone or integrate the first three columns. It appears that the first three columns collectively offer slightly more information than the second column alone. Consequently, we proceeded with the first three largest distance maximum values for further analysis and classification tasks.
Table 6 presents the classifiers’ results using the three largest distance transformation values. Remarkably, even with only 10 images, the classification accuracy reached 61%. On the other hand, with 160 training images, the best precision result was still around 66%.

3.4. Shape Features: Area, Hu, and Distance Transformation Features Together

The results are presented in Table 7, which displays the combinations of the three shape features with the three distance transformation features. Even with only 10 training images, it was possible to achieve a 71% recognition rate, while using 160 training images yielded a favorable 79.48% precision result.

3.5. Color Histogram

The color histograms were constructed from the three RGB and three HSV channels, with each channel represented by a 32-bin histogram without background suppression. Consequently, the color histogram vector comprised 192 elements. Preliminary results indicate that a training set of merely five images achieved a classification accuracy of 76%, while increasing the training set to 160 images enhanced the accuracy to 92%, as detailed in Table 8. These findings are encouraging; however, it is critical to validate these results with a more extensive dataset and images captured under various weather conditions to ensure robust color classification.

3.6. HOG Features

The HOG features exhibited poor performance with only 10 training images. However, even with 160 training images, the results remained relatively weak, reaching only 64%, as depicted in Table 9.

3.7. GLCM Features

The analysis of GLCM features demonstrated that with 10 training images, a 65% accuracy was achieved, while with 160 training images, it reached 82%, as illustrated in Table 10. This outcome surpasses the results obtained with HOG features.

3.8. LBP Features

The third texture feature set comprised the LBP features, which yielded results similar to those obtained with GLCM. The maximum precision achieved with 160 training images was 78.2%, as depicted in Table 11.

3.9. Texture Features

The texture features, which comprised three different types of features (HOG, GLCM, and LBP combination), achieved a recognition rate of 80.5% with 160 training images. The results are listed in Table 12.

3.10. All Features Together

When all extracted features were combined, including color histograms, shape features, distance transformation features, and texture features, near-optimal recognition results were achieved. As demonstrated in Table 13, a training set of only five images could yield a 60% recognition rate. This rate increased to 82% with 20 training images and reached an impressive 93.49% with 160 training images.

3.11. All Features Together except Hu Moments

When combining all extracted features, except for Hu moments, we observed nearly identical results. Although the RF method showed slightly weaker performance, the GBM exhibited a slight improvement, as shown in Table 14.

3.12. All Features except HOG Features

When all extracted features, excluding HOG features, were combined, the results improved significantly. Even in RF, we observed a precision result of 93.71% with 160 training images, while the GBM reached the maximum value of 94.56%. The results are presented in Table 15.

3.13. All Features except Hu Moments and HOG Features Together

When all extracted features, excluding Hu moments and HOG features, were combined, the results were very similar to the previous one. As shown in Table 16, there was only a slight decrease. Here, with the GBM classifier, a maximum precision result of 94.35% was achieved.

3.14. Confusion Matrix

A confusion matrix was constructed for the best-case scenario, in which all available features except for the HOG features were utilized, and the GBM was trained with 160 images per weed or plant species. For this scenario, 340 images per species were retained for testing. The results are detailed in Figure 9.
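The evaluation of this best configuration can be reproduced along the lines of the sketch below; the macro averaging of precision and recall is an assumption about how the per-class values were aggregated.

```python
# Sketch of the confusion-matrix evaluation for a fitted classifier.
from sklearn.metrics import confusion_matrix, precision_score, recall_score

def evaluate(model, X_train, y_train, X_test, y_test):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)   # rows: true class, columns: predicted class
    prec = precision_score(y_test, y_pred, average="macro", zero_division=0)
    rec = recall_score(y_test, y_pred, average="macro", zero_division=0)
    return cm, prec, rec
```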

3.15. Comparison of the Methods

To visualize and compare the results, we created a graph (Figure 10). The x-axis represents the classifier methods, while the y-axis shows the recognition rate. The color shows the number of training images used.
It is apparent that individual features alone were not enough to achieve satisfactory results; rather, a combination of multiple features was necessary. None of the base features, when used independently, met the desired performance criteria. Moreover, the quantity of training images played a critical role in enhancing classification accuracy. In this study, each weed species was represented by 500 images, of which 5–160 were allocated for training purposes. To rigorously assess the effectiveness of our method, future experiments should explore larger training sets, such as 300 or 500 images per weed; if the 80–160-image sets do not already realize the full potential of the method, this would indicate that more images are needed for optimal training. However, it is important to note that even the use of 160 training images per category is minimal compared to the extensive data requirements typical of neural network systems.

3.16. The Maximum Precisions by Training Image Number

To find the best classification method, we selected the three best precision results by training image number. Table 17, Table 18, Table 19, Table 20, Table 21 and Table 22 show the results. The bold highlighted columns show the maximum achieved precision results with training images per plant.
When training with only a few images, the RF classifier emerged as the optimal choice. However, as the number of images increased, the GBM classifier tended to yield superior results.
It appears that feature combinations offer the most promising outcomes, as opposed to using all features together. Notably, the inclusion of HOG features and Hu moments tended to degrade classification performance.
For datasets comprising fewer than 20 training images, it is advisable to utilize all features except Hu moments and HOG features with the RF classifier. Conversely, datasets with more than 40 training images may benefit from using all features except HOG features with the RF classifier.
In cases where a large number of images are available, such as 160, and optimal weed classification is desired, employing the GBM classifier is recommended. However, it is essential to note that GBM was slower than RF and only marginally improved the results.
It is encouraging to observe that even with as few as 10 training images, an 84% precision classifier can be developed. With approximately 40 training images, a precision of 91% was achieved. For datasets containing around 160 training images per plant, precision results approaching 95% were attained. In our study, the maximum precision achieved was 94.56%.

3.17. Comparison of the Six Classifiers

A comparison was performed using all features except HOG because this feature set yielded the best results. Table 23 illustrates that both KNN and ANN yielded the poorest results, failing to achieve even a 60% classification accuracy despite the availability of 160 images per weed. SVM and NB performed slightly better, with SVM outperforming NB, yet still falling short of practical usability, with SVM achieving only 78.75% accuracy. In contrast, RF and GBM demonstrated promising results in our study. RF yielded 75% accuracy even with just 5 images, while achieving a remarkable 94% accuracy with 160 images per weed. These two algorithms (RF and GBM) proved to be effective for weed detection and classification purposes.
Figure 11 illustrates that the RF classifier performed best with a small number of images, while the GBM slightly outperformed it with a larger dataset. Conversely, the ANN model struggled to learn the data structure even with 160 images, and with fewer than 40 images, it appeared to make random guesses.

4. Discussion

In this study, we extracted several distinct types of features from plant images to comprehensively characterize their visual properties. These included shape descriptors, distance transform-based measures, color histograms, and three types of texture features. The integration of these diverse feature sets aimed to provide a holistic representation of the plants' morphological structures, spatial distributions, color compositions, and textural patterns. This multifaceted approach facilitated a robust analysis and interpretation of the visual information encoded within the images.
We developed a dataset comprising images of six distinct types of weeds or plants, with each category represented by 500 images. To train six types of classifiers, we employed varying subsets of this dataset, consisting of 5, 10, 20, 40, 80, and 160 images from each category, with the remaining images reserved for testing purposes. By systematically adjusting the size of the training dataset, we aimed to assess the impact of varying levels of training data on the classifier’s performance. This methodology allowed us to evaluate the classifier’s robustness and generalization ability across different training scenarios, thereby providing valuable insights into the effectiveness of our classification approach.
With only 10 training images, the detection rate hovered around 83%. When we expanded our training dataset to include 20 to 40 images per weed category, we observed a substantial improvement in detection rates, reaching approximately 86–91%. This increase in training data facilitated a more comprehensive learning process for the classifier, enabling it to better discern subtle variations and nuances among different weed categories. Consequently, the classifier demonstrated enhanced performance and accuracy in distinguishing between weed species, underscoring that a sufficient, though not necessarily large, amount of training data is needed to achieve robust classification outcomes.
In the final experiment, where we employed 160 images per weed category and utilized all available features, we achieved a notable detection result of 94.56%. Therefore, although the results are encouraging, further validation with larger and more diverse test datasets is essential to comprehensively assess the classifier’s robustness and reliability in real-world scenarios.
Our computer vision-based approach is comparable to those of CNN-based neural network approaches, with the added advantage of requiring only approximately 50–200 images per weed category for training. This requirement stands in stark contrast to the 1000–2000 images typically needed to train neural networks effectively. With our method, the proportion of training images relative to the total dataset was reduced to around 5% or a maximum of 10%. This significant reduction in training data requirements underscores the efficiency and practicality of our approach, offering a compelling alternative for weed detection and classification tasks, particularly in resource-constrained environments.
For example, Wei [19] conducted a thorough analysis of various datasets, but none were specifically tailored for weed detection. The closest dataset they used was Veg200, which focuses on vegetables but lacks specific plant and weed datasets. The Veg200 dataset comprises 200 vegetable classes and contains 91,117 images, making it suitable for training a CNN. However, in agricultural settings where robots need to operate efficiently, there are typically only 5–10 different types of plants requiring attention. In such scenarios, providing a robot with 10–20 samples of each weed type should suffice. Therefore, our research aimed to develop a solution for weed detection that could learn plant features from a limited number of sample images. Wei demonstrated the significance of fine-grained image analysis using deep learning. This approach emphasizes that not only the overall image but also the details can be crucial for classification. This methodology is important in the context of weed detection, as often, the differences lie in subtle details. Our approach inherently separates color, texture, and shape factors. In the future, we aim to further expand the features by observing leaf orientation, direction, vein texture, and edge patterns, thus enhancing the feature set.
In other work, Zhang [20] utilized drone or remote sensing data, which is similar to our dataset in the sense that the images were captured from above. They employed a combination of deep learning techniques for localization and classification, utilizing 70 categories with a vast range of 700–13,405 images per category. The training process took approximately 2 days.
Speed is crucial in weed control, especially after rainfall when weeds can proliferate rapidly. While there are numerous weed types, only a few species typically appear at any given time. This means that when new weeds emerge, we can capture and label some images. However, it is impractical to generate thousands of images per weed and label them manually. Additionally, the agricultural field cannot afford to wait for days to label images and train a network.
In future research endeavors, we aim to enhance detection results by incorporating additional shape-based features into the analysis. Potential methods could include Fourier descriptors, Zernike moments, or deep learning-based feature extraction approaches. These techniques offer greater flexibility and adaptability to accommodate the complex shape variations inherent in plant imagery. Additionally, we intend to explore the integration of more texture-based features to further enrich the feature representation and improve the classifier’s ability to discriminate between different weed species. By leveraging a broader range of feature descriptors, we anticipate advancing the accuracy and robustness of our weed detection and classification algorithms, ultimately facilitating more effective weed management strategies in agricultural settings.

5. Conclusions

Selecting an appropriate CV method for classification tasks depends significantly on the size and availability of the labeled image dataset. If labeled images are abundant, approximately 2–3 thousand or more per weed category, CNN-based neural network models are recommended. However, if the available dataset contains fewer than a hundred images per category, it is pragmatic to initially employ traditional feature-based computer vision methods.
When constructing the dataset, particularly if images are collected via automated methods such as robots or drones, one can amass thousands of images. These images, however, require extensive labeling and segmentation from the background, demanding significant plant expert hours. In such scenarios, we propose the use of traditional computer vision techniques to initially extract individual plants and weeds. This process should be followed by involving a plant expert to select approximately 40 representative samples for each class.
At this juncture, the methodologies described in this study can be applied to classify an additional 100 images per weed category. The experts only need to verify and correct the misclassified images, typically around 15% of the dataset. Following this correction process, the classification scripts can be re-run, this time utilizing several thousands of images, achieving classification accuracies of approximately 95–96%. Consequently, each class directory will predominantly contain high-quality images, with about 4–5% that are inaccurately classified. These erroneous images can be swiftly identified and corrected by a plant expert, thereby ensuring a high-quality and sizable dataset suitable for training both SVM- and CNN-based classification networks.

Author Contributions

L.M.: writing—original draft preparation; P.Á.M.: writing—review and editing; G.T.: writing—review and editing; A.N.: writing—supervision. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by Széchenyi István University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Acknowledgments

This research was carried out by Precision Bioengineering Research Group, supported by Széchenyi István University Foundation.

Conflicts of Interest

Author Péter Ákos Mesterházi was employed by AXIÁL Ltd. as leader of its precision agriculture team. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Partel, V.; Charan Kakarla, S.; Ampatzidis, Y. Development and Evaluation of a Low-Cost and Smart Technology for Precision Weed Management Utilizing Artificial Intelligence. Comput. Electron. Agric. 2019, 157, 339–350. [Google Scholar] [CrossRef]
  2. Hasan, A.S.M.M.; Diepeveen, D.; Laga, H.; Jones, M.G.K.; Sohel, F. Object-Level Benchmark for Deep Learning-Based Detection and Classification of Weed Species. Crop Prot. 2024, 177, 106561. [Google Scholar] [CrossRef]
  3. Dandekar, Y.; Shinde, K.; Gangan, J.; Firdausi, S.; Bharne, S. Weed Plant Detection from Agricultural Field Images Using YOLOv3 Algorithm. In Proceedings of the 2022 6th International Conference On Computing, Communication, Control and Automation ICCUBEA, Pune, India, 26–27 August 2022; pp. 1–4. [Google Scholar] [CrossRef]
  4. Deng, B.; Lu, Y.; Xu, J. Weed Database Development: An Updated Survey of Public Weed Datasets and Cross-Season Weed Detection Adaptation. Ecol. Inform. 2024, 81, 102546. [Google Scholar] [CrossRef]
  5. Rai, N.; Mahecha, M.V.; Christensen, A.; Quanbeck, J.; Zhang, Y.; Howatt, K.; Ostlie, M.; Sun, X. Multi-Format Open-Source Weed Image Dataset for Real-Time Weed Identification in Precision Agriculture. Data Brief 2023, 51, 109691. [Google Scholar] [CrossRef] [PubMed]
  6. Alam, M.; Alam, M.S.; Roman, M.; Tufail, M.; Khan, M.U.; Khan, M.T. Real-Time Machine-Learning Based Crop/Weed Detection and Classification for Variable-Rate Spraying in Precision Agriculture. In Proceedings of the 2020 7th International Conference on Electrical and Electronics Engineering (ICEEE), Antalya, Turkey, 14–16 April 2020; pp. 273–280. [Google Scholar] [CrossRef]
  7. Giselsson, T.; Midtiby, H.; Jørgensen, R. Seedling Discrimination with Shape Features Derived from a Distance Transform. Sensors 2013, 13, 5585–5602. [Google Scholar] [CrossRef] [PubMed]
  8. Nguyen Thanh Le, V.; Apopei, B.; Alameh, K. Effective Plant Discrimination Based on the Combination of Local Binary Pattern Operators and Multiclass Support Vector Machine Methods. Inf. Process. Agric. 2019, 6, 116–131. [Google Scholar] [CrossRef]
  9. Ramirez-Paredes, J.-P.; Hernandez-Belmonte, U.-H. Visual Quality Assessment of Malting Barley Using Color, Shape and Texture Descriptors. Comput. Electron. Agric. 2020, 168, 105110. [Google Scholar] [CrossRef]
  10. Ahmed, F.; Kabir, H.; Bhuyan, S.; Bari, H.; Hossain, E. Automated Weed Classification with Local Pattern-Based Texture Descriptors. Int. Arab J. Inf. Technol. 2014, 11, 87–94. [Google Scholar]
  11. Bhunia, A.K.; Bhattacharyya, A.; Banerjee, P.; Roy, P.P.; Murala, S. A Novel Feature Descriptor for Image Retrieval by Combining Modified Color Histogram and Diagonally Symmetric Co-Occurrence Texture Pattern. Pattern Anal. Appl. 2020, 23, 703–723. [Google Scholar] [CrossRef]
  12. Hamuda, E.; Mc Ginley, B.; Glavin, M.; Jones, E. Automatic Crop Detection under Field Conditions Using the HSV Colour Space and Morphological Operations. Comput. Electron. Agric. 2017, 133, 97–107. [Google Scholar] [CrossRef]
  13. Zhu, H.; Zhang, Y.; Mu, D.; Bai, L.; Wu, X.; Zhuang, H.; Li, H. Research on Improved YOLOx Weed Detection Based on Lightweight Attention Module. Crop Prot. 2024, 177, 106563. [Google Scholar] [CrossRef]
  14. Bakhshipour, A.; Jafari, A. Evaluation of Support Vector Machine and Artificial Neural Networks in Weed Detection Using Shape Features. Comput. Electron. Agric. 2018, 145, 153–160. [Google Scholar] [CrossRef]
  15. Abouzahir, S.; Sadik, M.; Sabir, E. Bag-of-Visual-Words-Augmented Histogram of Oriented Gradients for Efficient Weed Detection. Biosyst. Eng. 2021, 202, 179–194. [Google Scholar] [CrossRef]
  16. Zhang, L.; Zhang, Z.; Wu, C.; Sun, L. Segmentation Algorithm for Overlap Recognition of Seedling Lettuce and Weeds Based on SVM and Image Blocking. Comput. Electron. Agric. 2022, 201, 107284. [Google Scholar] [CrossRef]
  17. Ahmad, J.; Muhammad, K.; Ahmad, I.; Ahmad, W.; Smith, M.L.; Smith, L.N.; Jain, D.K.; Wang, H.; Mehmood, I. Visual Features Based Boosted Classification of Weeds for Real-Time Selective Herbicide Sprayer Systems. Comput. Ind. 2018, 98, 23–33. [Google Scholar] [CrossRef]
  18. Sunil, G.C.; Zhang, Y.; Koparan, C.; Ahmed, M.R.; Howatt, K.; Sun, X. Weed and Crop Species Classification Using Computer Vision and Deep Learning Technologies in Greenhouse Conditions. J. Agric. Food Res. 2022, 9, 100325. [Google Scholar] [CrossRef]
  19. Wei, X.-S.; Song, Y.-Z.; Mac Aodha, O.; Wu, J.; Peng, Y.; Tang, J.; Yang, J.; Belongie, S. Fine-Grained Image Analysis with Deep Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 8927–8948. [Google Scholar] [CrossRef]
  20. Zhang, C.; Liu, T.; Xiao, J.; Lam, K.-M.; Wang, Q. Boosting Object Detectors via Strong-Classification Weak-Localization Pretraining in Remote Sensing Imagery. IEEE Trans. Instrum. Meas. 2023, 72, 1–20. [Google Scholar] [CrossRef]
Figure 1. The 6 classes: (a) bluegrass sample; (b) white goosefoot sample; (c) lettuce sample; (d) sedge sample; (e) corn sample; (f) Cirsium setosum sample.
Figure 2. The image processing steps: (a) original image; (b) green mask; (c) artificial coloring of the regions; (d) the largest green object mask; (e) the mask was cut and resized to 256 × 256 pixels; (f) the 256 × 256-pixel mask was used on the image to have the largest plant image.
Figure 3. The main steps of image processing: from RGB images to features.
Figure 4. The weed contour: (a) mask contour; (b) convex hull area.
Figure 5. The distance transformation method: (a) the weed distance transformation map; (b) the local maximums and the values as radius.
Figure 6. The HSV color space histograms.
Figure 7. Image features and feature combinations.
Figure 8. Visualization of the optimum distance transformation features. The first, second, and third features together show the optimum.
Figure 9. Confusion matrix (green: correct recognition; red: incorrect recognition).
Figure 10. Comparison of the classification methods.
Figure 11. The precision results using all features except HOG features by training image number and classifier method.
Table 1. Texture feature types and the features’ vector size.
Texture Feature Types | Vector Size
GLCM features | 72
HOG features | 3780
LBP features | 26
Table 2. The 6 classifier abbreviations and names.
Abbreviation | Method Name
SVM | Support Vector Machine
RF | Random Forest
KNN | k-Nearest Neighbors
ANN | Artificial Neural Network
NB | Naïve Bayes
GBM | Gradient Boosting Machine
Table 3. Area-based features’ results.
Training Image Number | 5 | 10 | 20 | 40 | 80 | 160
Test Image Number | 495 | 490 | 480 | 460 | 420 | 340
SVM precision | 32.80 | 37.08 | 34.57 | 41.82 | 41.87 | 43.06
SVM recall | 28.25 | 34.39 | 35.24 | 40.87 | 40.63 | 42.50
RF precision | 40.99 | 43.33 | 46.72 | 49.50 | 51.63 | 52.50
RF recall | 38.92 | 42.18 | 46.11 | 48.37 | 50.28 | 51.13
KNN precision | 33.66 | 39.06 | 41.47 | 48.22 | 47.22 | 46.19
KNN recall | 30.67 | 38.40 | 40.90 | 45.22 | 45.44 | 45.25
ANN precision | nan | 28.04 | nan | 40.00 | 39.74 | 39.88
ANN recall | 16.84 | 21.84 | 21.46 | 34.06 | 37.50 | 36.37
NB precision | 37.00 | nan | nan | nan | nan | nan
NB recall | 36.84 | 38.16 | 42.43 | 42.21 | 43.57 | 44.41
GBM precision | 38.23 | 45.30 | 43.23 | 48.35 | 50.54 | 53.85
GBM recall | 37.10 | 43.44 | 42.40 | 47.07 | 48.97 | 51.91
Table 4. Hu moment results.
Training Image Number | 5 | 10 | 20 | 40 | 80 | 160
Test Image Number | 495 | 490 | 480 | 460 | 420 | 340
SVM precision | 40.68 | 41.03 | 38.76 | nan | nan | 48.02
SVM recall | 25.42 | 24.46 | 33.54 | 34.09 | 37.50 | 41.47
RF precision | 31.76 | 41.38 | 40.29 | 44.97 | 48.49 | 50.92
RF recall | 29.23 | 38.47 | 39.27 | 44.06 | 48.10 | 49.75
KNN precision | nan | 35.93 | 40.58 | 42.00 | 43.12 | 47.68
KNN recall | 26.03 | 32.31 | 36.42 | 40.91 | 42.54 | 46.47
ANN precision | 40.14 | 46.86 | 42.13 | nan | 51.17 | 52.94
ANN recall | 34.55 | 40.61 | 39.31 | 41.81 | 47.22 | 48.68
NB precision | 39.18 | 43.88 | 42.45 | 44.20 | 39.46 | 42.46
NB recall | 35.56 | 37.55 | 35.69 | 37.86 | 36.67 | 33.14
GBM precision | 29.67 | 36.57 | 38.33 | 43.02 | 46.61 | 49.99
GBM recall | 27.41 | 34.49 | 36.98 | 41.81 | 45.87 | 48.63
Table 5. Some distance transformation parameter combinations to find the optimum features with 20 and 60 training images (1 is the largest value, 2 is the second largest, and so on).
Columns | 1 | 1, 2 | 1, 2, 3 | 1, 2, 3, 4 | 1, 2, 3, 4, 5 | 2 | 2, 3 | 1
20 training images | 71.25% | 77.19% | 77.5% | 73.44% | 72.81% | 76.88% | 77.19% | 71.25%
60 training images | 70% | 75.63% | 75.63% | 75.63% | 73.75% | 79.38% | 74.38% | 70%
Table 6. The distance transformation features’ results.
Training Image Number | 5 | 10 | 20 | 40 | 80 | 160
Test Image Number | 495 | 490 | 480 | 460 | 420 | 340
SVM precision | 53.69 | 55.89 | 55.60 | 57.27 | 59.18 | 59.51
SVM recall | 52.29 | 55.34 | 55.83 | 58.15 | 60.20 | 60.69
RF precision | 60.65 | 61.08 | 57.69 | 60.55 | 62.45 | 63.57
RF recall | 60.00 | 59.49 | 57.29 | 60.80 | 62.58 | 63.97
KNN precision | 54.89 | 56.85 | 58.45 | 60.70 | 64.86 | 65.69
KNN recall | 57.10 | 56.77 | 58.68 | 61.20 | 64.25 | 65.44
ANN precision | 50.35 | 58.43 | nan | 58.88 | 63.79 | 66.08
ANN recall | 48.92 | 55.82 | 22.53 | 57.50 | 63.81 | 65.88
NB precision | 57.95 | 61.65 | 61.54 | 62.61 | 63.70 | 63.06
NB recall | 56.77 | 60.34 | 61.25 | 63.26 | 64.21 | 63.58
GBM precision | 55.67 | 60.28 | 57.98 | 60.56 | 61.10 | 63.38
GBM recall | 53.06 | 56.84 | 56.63 | 60.14 | 60.95 | 63.38
Table 7. The shape features: the results of area, Hu, and distance transformation features’ combination.

Training Image Number     5       10      20      40      80      160
Test Image Number         495     490     480     460     420     340
SVM   precision           48.00   59.97   66.27   69.05   71.55   70.07
SVM   recall              43.16   54.59   63.40   67.72   70.95   69.17
RF    precision           64.42   71.41   72.69   74.79   76.52   78.68
RF    recall              63.74   70.82   72.64   74.78   76.59   78.53
KNN   precision           33.56   48.78   51.91   56.04   58.29   60.41
KNN   recall              31.55   47.93   52.40   55.58   58.29   59.71
ANN   precision           nan     nan     nan     nan     47.27   48.63
ANN   recall              19.49   30.34   19.13   30.62   39.05   44.75
NB    precision           65.78   70.22   70.41   72.61   70.09   71.48
NB    recall              62.53   67.82   68.06   71.16   62.62   64.41
GBM   precision           66.64   67.56   70.92   72.32   75.74   79.44
GBM   recall              65.08   63.84   69.38   71.78   75.71   79.31
Table 8. The color histogram features’ results.

Training Image Number     5       10      20      40      80      160
Test Image Number         495     490     480     460     420     340
SVM   precision           52.78   69.97   73.44   73.25   77.68   81.24
SVM   recall              47.37   60.95   65.14   72.39   77.34   80.93
RF    precision           76.19   82.16   84.24   87.52   89.76   91.18
RF    recall              76.53   80.88   83.92   87.43   89.60   91.03
KNN   precision           50.58   61.19   70.91   77.79   81.52   84.73
KNN   recall              47.21   58.10   68.37   76.56   81.03   83.92
ANN   precision           65.47   72.10   79.33   85.12   87.82   91.06
ANN   recall              65.42   71.22   79.13   85.04   87.74   90.98
NB    precision           58.06   72.57   69.42   75.87   77.68   75.97
NB    recall              52.36   71.77   69.97   75.36   76.11   72.55
GBM   precision           57.15   72.04   78.11   86.32   89.68   92.13
GBM   recall              57.14   70.51   76.88   85.94   89.33   91.96
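Colour histograms are straightforward to compute per channel. The sketch below is illustrative only: the number of bins, the colour space (BGR as loaded by OpenCV), and the use of the plant mask are assumptions rather than the paper's documented settings.

```python
# A minimal sketch of per-channel colour-histogram features with OpenCV.
# Bin count, colour space, and masking are assumptions, not the paper's settings.
import cv2
import numpy as np

def color_histogram_features(bgr, mask=None, bins=32):
    feats = []
    for ch in range(3):                                      # B, G, R channels
        h = cv2.calcHist([bgr], [ch], mask, [bins], [0, 256]).flatten()
        feats.append(h / (h.sum() + 1e-9))                   # probability histogram
    return np.concatenate(feats)                             # 3 * bins values
```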
Table 9. HOG features’ results.

Training Image Number     5       10      20      40      80      160
Test Image Number         495     490     480     460     420     340
SVM   precision           42.05   45.46   48.73   52.60   55.42   60.42
SVM   recall              40.27   44.69   47.36   51.99   54.68   60.15
RF    precision           41.25   46.23   50.78   56.12   59.24   64.13
RF    recall              40.10   45.51   50.45   56.20   58.93   64.17
KNN   precision           38.01   41.29   44.64   49.06   51.93   56.14
KNN   recall              35.82   38.20   40.83   45.11   48.29   51.47
ANN   precision           33.28   36.76   40.35   43.85   47.25   57.98
ANN   recall              33.91   36.16   40.07   44.38   47.46   58.09
NB    precision           nan     32.33   34.47   37.22   43.05   49.71
NB    recall              17.34   19.05   27.36   35.69   44.56   49.85
GBM   precision           28.47   31.65   41.41   50.80   53.66   62.54
GBM   recall              28.69   31.46   41.18   48.80   52.14   61.76
Table 10. GLCM features’ results.

Training Image Number     5       10      20      40      80      160
Test Image Number         495     490     480     460     420     340
SVM   precision           36.48   40.31   58.27   69.62   77.37   82.06
SVM   recall              33.60   40.07   57.71   69.28   76.55   81.62
RF    precision           54.72   65.41   66.53   72.71   75.95   77.56
RF    recall              55.22   61.60   63.82   72.21   75.28   77.35
KNN   precision           24.62   34.60   37.43   38.37   40.59   44.83
KNN   recall              25.99   32.41   34.55   37.36   40.24   44.17
ANN   precision           17.15   42.36   44.65   35.79   49.34   54.05
ANN   recall              15.45   42.35   29.90   37.17   47.38   48.33
NB    precision           54.77   61.67   62.09   58.64   59.69   59.21
NB    recall              54.75   58.64   58.96   58.73   59.29   58.87
GBM   precision           50.28   60.28   67.56   73.34   76.17   78.62
GBM   recall              50.91   58.67   65.76   72.79   75.56   78.43
Table 11. LBP features’ results.

Training Image Number     5       10      20      40      80      160
Test Image Number         495     490     480     460     420     340
SVM   precision           51.11   nan     44.81   45.61   43.13   46.92
SVM   recall              51.11   46.43   47.01   44.24   44.56   47.21
RF    precision           60.12   63.80   68.85   71.04   76.32   78.20
RF    recall              56.50   61.36   68.51   70.54   75.99   77.94
KNN   precision           47.17   50.11   51.05   56.90   59.93   64.53
KNN   recall              47.85   50.10   51.01   57.36   60.12   64.51
ANN   precision           52.65   58.04   56.01   57.74   61.72   70.20
ANN   recall              51.55   57.65   56.46   56.34   61.47   70.15
NB    precision           53.71   58.41   55.06   55.48   55.51   56.43
NB    recall              50.24   56.90   55.14   54.64   54.17   54.80
GBM   precision           56.68   61.64   64.24   69.12   74.47   77.74
GBM   recall              54.92   59.83   62.81   68.48   74.13   77.45
Table 12. The 3 texture features: HOG, GLCM, and LBP combination results.

Training Image Number     5       10      20      40      80      160
Test Image Number         495     490     480     460     420     340
SVM   precision           36.84   42.88   61.70   69.30   74.32   79.43
SVM   recall              34.34   42.31   60.66   68.70   73.77   79.17
RF    precision           52.32   60.60   64.28   68.74   71.65   74.32
RF    recall              51.38   61.05   63.58   68.19   70.87   73.97
KNN   precision           24.62   34.63   37.43   38.37   40.55   44.94
KNN   recall              25.99   32.45   34.55   37.36   40.20   44.31
ANN   precision           41.14   45.21   50.71   59.92   66.59   74.79
ANN   recall              41.58   44.90   50.63   60.00   66.90   74.61
NB    precision           33.02   34.73   34.93   42.67   47.94   56.01
NB    recall              18.22   23.71   34.20   42.61   48.69   54.90
GBM   precision           38.74   43.22   60.31   73.35   76.29   80.50
GBM   recall              39.12   40.03   59.48   72.28   75.83   80.29
Table 13. The results of color histogram, shape, and texture features’ combination.

Training Image Number     5       10      20      40      80      160
Test Image Number         495     490     480     460     420     340
SVM   precision           41.83   57.78   67.86   73.21   74.33   79.89
SVM   recall              41.28   54.76   66.22   71.81   73.02   79.26
RF    precision           60.02   73.70   82.76   86.53   89.93   92.08
RF    recall              59.02   73.23   82.64   86.30   89.80   92.01
KNN   precision           33.63   44.38   46.40   52.02   54.37   58.60
KNN   recall              36.26   43.06   43.26   50.43   53.53   57.79
ANN   precision           51.44   nan     27.43   46.57   62.69   68.59
ANN   recall              46.90   15.85   25.17   39.09   57.90   65.78
NB    precision           60.20   63.73   66.90   67.12   67.75   69.68
NB    recall              58.82   62.04   65.42   65.58   64.05   65.10
GBM   precision           39.95   64.63   73.54   89.40   91.66   93.49
GBM   recall              39.70   64.42   71.77   89.28   91.51   93.38
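The combined experiments in Tables 13–16 rest on joining the individual descriptor groups into a single vector per image before training. The sketch below shows the concatenation step only; the helper names are illustrative, and which groups are included (e.g., dropping Hu or HOG) is left to the caller.

```python
# A minimal sketch of combining feature groups for the experiments in
# Tables 13-16: the individual descriptors are concatenated into one vector
# per image before training. Helper/variable names below are illustrative.
import numpy as np

def combine_features(feature_groups):
    """feature_groups: list of 1-D NumPy arrays computed for one image."""
    return np.concatenate(feature_groups)

# Example (names hypothetical): everything except HOG, as in Table 15.
# vec = combine_features([area_vec, hu_vec, dist_vec, color_vec, glcm_vec, lbp_vec])
```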
Table 14. The results of combining all features except Hu moments.

Training Image Number     5       10      20      40      80      160
Test Image Number         495     490     480     460     420     340
SVM   precision           41.83   57.78   67.80   73.04   74.00   79.48
SVM   recall              41.28   54.76   66.08   71.56   72.82   78.82
RF    precision           66.27   75.92   82.57   87.26   89.72   91.71
RF    recall              64.58   75.78   82.15   87.07   89.60   91.67
KNN   precision           33.63   44.38   46.40   52.02   54.37   58.60
KNN   recall              36.26   43.06   43.26   50.43   53.53   57.79
ANN   precision           45.15   48.20   35.84   61.64   57.47   57.29
ANN   recall              39.87   45.99   24.41   58.44   51.63   52.50
NB    precision           59.24   62.91   65.40   65.89   67.05   69.22
NB    recall              58.32   61.56   64.72   64.93   65.04   65.59
GBM   precision           40.47   62.27   73.73   89.08   91.88   93.51
GBM   recall              40.17   62.04   71.91   88.91   91.71   93.38
Table 15. The results of combining all features except HOG features.

Training Image Number     5       10      20      40      80      160
Test Image Number         495     490     480     460     420     340
SVM   precision           41.86   59.21   66.54   70.91   74.26   78.75
SVM   recall              41.31   56.09   64.51   69.89   72.98   77.84
RF    precision           75.43   83.04   86.16   91.20   92.85   93.71
RF    recall              75.29   82.18   86.18   91.16   92.82   93.68
KNN   precision           33.63   44.38   46.40   52.02   54.37   58.60
KNN   recall              36.26   43.06   43.26   50.43   53.53   57.79
ANN   precision           39.09   nan     20.08   44.36   43.93   50.08
ANN   recall              25.52   23.91   13.72   43.91   41.98   53.73
NB    precision           59.88   61.04   62.65   63.34   62.82   63.26
NB    recall              57.61   57.96   59.58   62.57   60.71   61.37
GBM   precision           66.22   74.03   83.00   90.46   92.81   94.56
GBM   recall              39.70   64.42   71.77   89.28   91.51   93.38
Table 16. The results of combining all features except Hu moments and HOG features.

Training Image Number     5       10      20      40      80      160
Test Image Number         495     490     480     460     420     340
SVM   precision           41.86   59.21   66.56   71.66   73.96   78.99
SVM   recall              41.31   56.09   64.62   70.62   72.90   77.94
RF    precision           76.86   84.33   86.53   90.45   92.72   93.77
RF    recall              76.46   83.98   86.53   90.43   92.70   93.73
KNN   precision           33.63   44.38   46.40   52.02   54.37   58.60
KNN   recall              36.26   43.06   43.26   50.43   53.53   57.79
ANN   precision           35.08   47.79   22.86   56.79   48.53   65.19
ANN   recall              28.86   44.52   23.16   49.67   44.76   59.41
NB    precision           58.32   59.60   61.14   61.73   63.24   63.22
NB    recall              56.57   57.35   59.79   62.25   63.69   63.68
GBM   precision           66.57   74.58   81.99   90.30   92.75   94.35
GBM   recall              67.00   72.99   81.28   90.04   92.62   94.26
Table 17. The three best results with 5 training images.

Features            Classifier   5       10      20      40      80      160
Color               RF           76.19   82.16   84.24   87.52   89.76   91.18
All exp. Hu, HOG    RF           76.86   84.33   86.53   90.45   92.72   93.77
All exp. HOG        RF           75.43   83.04   86.16   91.20   92.85   93.71
The bold column contains the best precision values.
Table 18. The three best results with 10 training images.

Features            Classifier   5       10      20      40      80      160
Color               RF           76.19   82.16   84.24   87.52   89.76   91.18
All exp. Hu, HOG    RF           76.86   84.33   86.53   90.45   92.72   93.77
All exp. HOG        RF           75.43   83.04   86.16   91.20   92.85   93.71
The bold column contains the best precision values.
Table 19. The three best results with 20 training images.

Features            Classifier   5       10      20      40      80      160
Color               RF           76.19   82.16   84.24   87.52   89.76   91.18
All exp. Hu, HOG    RF           76.86   84.33   86.53   90.45   92.72   93.77
All exp. HOG        RF           75.43   83.04   86.16   91.20   92.85   93.71
The bold column contains the best precision values.
Table 20. The three best results with 40 training images.

Features            Classifier   5       10      20      40      80      160
All exp. Hu, HOG    RF           76.86   84.33   86.53   90.45   92.72   93.77
All exp. HOG        RF           75.43   83.04   86.16   91.20   92.85   93.71
All exp. HOG        GBM          66.22   74.03   83.00   90.46   92.81   94.56
The bold column contains the best precision values.
Table 21. The three best results with 80 training images.

Features            Classifier   5       10      20      40      80      160
All exp. Hu, HOG    GBM          66.57   74.58   81.99   90.30   92.75   94.35
All exp. HOG        RF           75.43   83.04   86.16   91.20   92.85   93.71
All exp. HOG        RF           66.22   74.03   83.00   90.46   92.81   94.56
The bold column contains the best precision values.
Table 22. The three best results with 160 training images.

Features            Classifier   5       10      20      40      80      160
All exp. Hu, HOG    RF           76.86   84.33   86.53   90.45   92.72   93.77
All exp. Hu, HOG    GBM          66.57   74.58   81.99   90.30   92.75   94.35
All exp. HOG        GBM          66.22   74.03   83.00   90.46   92.81   94.56
The bold column contains the best precision values.
Table 23. The comparison of the 6 classifiers’ results.

Features          Classifier   5       10      20      40      80      160
All exp. HOG      SVM          41.86   59.21   66.54   70.91   74.26   78.75
All exp. HOG      RF           75.43   83.04   86.16   91.20   92.85   93.71
All exp. HOG      KNN          33.63   44.38   46.40   52.02   54.37   58.60
All exp. HOG      ANN          39.09   nan     20.08   44.36   43.93   50.08
All exp. HOG      NB           59.88   61.04   62.65   63.34   62.82   63.26
All exp. HOG      GBM          66.22   74.03   83.00   90.46   92.81   94.56
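The precision and recall percentages reported throughout Tables 3–23 can be reproduced in form with scikit-learn's multi-class metrics. The sketch below assumes macro averaging over the four weed classes, which is an interpretation rather than a documented choice; note also that precision becomes undefined (shown here as nan-like behaviour) when a classifier never predicts a given class, which is one possible explanation for the nan entries.

```python
# A minimal sketch of computing the precision/recall percentages reported in
# the result tables. Macro averaging over the four weed classes is an
# assumption; the paper may aggregate the per-class scores differently.
from sklearn.metrics import precision_score, recall_score

def evaluate(clf, X_test, y_test):
    y_pred = clf.predict(X_test)
    precision = 100 * precision_score(y_test, y_pred, average="macro", zero_division=0)
    recall = 100 * recall_score(y_test, y_pred, average="macro", zero_division=0)
    return precision, recall
```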