This section begins by outlining the general workflow of ML- and DL-based approaches, followed by an analysis of commonly used datasets. Finally, the performance of various ML and DL techniques is analyzed.
4.1. General Flow Process of Machine Learning and Deep Learning Systems
ML and DL projects follow structured workflows, outlined in Figure 2 and Figure 3 [21]. These workflows comprise distinct phases, including data acquisition, preprocessing, feature/instance selection, parameter optimization, and model building/testing. Specifically, ML workflows involve data acquisition, preprocessing, feature selection, instance selection, parameter optimization, and model building (encompassing training, splitting, testing, and validation). In contrast, DL workflows consist of data acquisition, preprocessing, splitting, hyperparameter tuning, transfer learning, model training/testing, and deployment. The subsequent section provides a detailed explanation of each phase in the ML and DL workflows.
4.1.1. Data Acquisition and Preprocessing
Data acquisition in both ML and DL involves gathering data from diverse sources such as private or public datasets. Whether for ML or DL, these datasets can include text, images, videos, and sensor data, tailored to the problem being addressed. Common data sources like remote sensing technologies—using satellite and drone imagery—provide vast amounts of data, especially for agriculture or environmental monitoring. For ML, obtaining private datasets manually often results in smaller sets, which can hinder model accuracy. DL, on the other hand, generally requires significantly larger datasets to train its models effectively due to the complexity of the neural networks involved. Both methods follow similar steps for data collection, but DL requires far more data to achieve robust performance, particularly in tasks such as image or speech recognition.
Despite the shared approach, key differences exist between ML and DL in the treatment and handling of data. While both involve processes such as data extraction, validation, and quality assurance, DL places a stronger emphasis on large-scale data integration and often involves more complex preprocessing steps due to the intricate structure of deep networks. DL models can automatically learn features from raw data, reducing the need for manual feature extraction, which is a critical step in ML. Furthermore, storage and management of data are also more intensive in DL, given the sheer volume of data required to train deep networks, highlighting the need for more scalable and efficient data handling mechanisms compared to traditional ML methods.
Data preprocessing is essential for both ML and DL algorithms, as it enhances the quality and structure of the data before feeding them into models. Common preprocessing steps in both ML and DL include noise removal, feature selection, feature extraction, and segmentation, with the primary goal of improving model performance. In ML, preprocessing focuses on manual feature extraction and selection to highlight key patterns and relationships, which models rely on for accurate predictions. This step is crucial since ML models cannot autonomously detect features from raw data. Additionally, data are typically split into training, validation, and testing sets, ensuring that models are evaluated effectively on unseen data while avoiding bias from ordered data or improper sampling.
In DL, data preprocessing shares similarities with ML but differs in some significant aspects. While both approaches involve dataset partitioning, DL focuses more on data augmentation, where techniques like rotating, flipping, or scaling images increase dataset size and variability, thus improving generalization. Unlike ML, DL models have the advantage of automatically extracting features during training, reducing the need for manual intervention in feature selection. Additionally, DL models often require adjustments to input data size to align with model requirements, such as resizing images to match the input dimensions of CNNs. Data preprocessing in DL, therefore, aims to prepare large, varied datasets that enhance the model’s ability to learn complex patterns autonomously, with less emphasis on manual feature engineering.
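To make these steps concrete, the following is a minimal sketch of a DL preprocessing pipeline (resizing, normalization, and augmentation) using TensorFlow/Keras. The directory layout, image size, and augmentation choices are illustrative assumptions rather than the setup of any reviewed study.

```python
# Minimal sketch of the DL preprocessing steps described above: resizing,
# normalization, augmentation, and splitting. The directory layout
# ("data/train", "data/val") and image size are illustrative assumptions.
import tensorflow as tf

IMG_SIZE = (224, 224)          # resize images to match the CNN input dimensions
BATCH = 32

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=IMG_SIZE, batch_size=BATCH)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/val", image_size=IMG_SIZE, batch_size=BATCH)

# Augmentation (flipping, rotation, zoom) increases dataset size and variability.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(0.2),
    tf.keras.layers.RandomZoom(0.1),
])

# Rescale pixel values to [0, 1]; augmentation is applied to the training set only.
normalize = tf.keras.layers.Rescaling(1.0 / 255)
train_ds = train_ds.map(lambda x, y: (augment(normalize(x), training=True), y))
val_ds = val_ds.map(lambda x, y: (normalize(x), y))
```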
4.1.2. Feature Selection
The objectives of feature selection, as highlighted by [16], are to identify and retain a subset of relevant features from a larger pool of available features while discarding irrelevant or noisy data. This process aims to streamline future analysis tasks by focusing only on the most informative and discriminative features. The authors of [22] emphasize that feature selection enhances classification outcomes by identifying and prioritizing the most essential characteristics, which in turn reduces computational complexity. Models trained on a reduced feature set tend to exhibit greater robustness and reproducibility compared to those trained on a larger feature set. This underscores the importance of feature selection in optimizing the performance and efficiency of ML algorithms.
Feature selection in crop disease detection and classification is crucial for building accurate and efficient models. Different features can be used to train ML and DL algorithms, including image-based features, texture features, shape descriptors, local binary patterns (LBPs), and segmentation features, to mention a few. Image-based features include color histograms, which analyze the distribution of colors in images to capture color-based information about the diseased regions. Texture-based feature selection techniques are used to extract features related to texture patterns, which can be indicative of specific diseases. Shape descriptors are used to extract features related to the shape of lesions or affected areas. LBPs are effective for texture analysis and can be used to differentiate between healthy and diseased regions. Segmentation techniques are used for identifying and isolating regions of interest within the images, and superpixel-based segmentation involves grouping of pixels into perceptually meaningful clusters, which can serve as features.
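As a sketch of the handcrafted features mentioned above (color histograms, GLCM texture, and LBP), the snippet below uses scikit-image; the synthetic image stands in for a real leaf photograph, which would normally be loaded from disk.

```python
# Handcrafted feature extraction sketch: color histogram, GLCM texture, and LBP.
# The random image is a stand-in for a real leaf photo (e.g., skimage.io.imread).
import numpy as np
from skimage import color
from skimage.feature import local_binary_pattern, graycomatrix, graycoprops

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)   # stand-in RGB leaf image
gray = (color.rgb2gray(image) * 255).astype(np.uint8)

# Color histogram: distribution of intensities per RGB channel.
color_hist = np.concatenate(
    [np.histogram(image[..., c], bins=32, range=(0, 255))[0] for c in range(3)])

# GLCM texture descriptors (contrast, homogeneity) over the gray image.
glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256, normed=True)
texture = [graycoprops(glcm, "contrast")[0, 0], graycoprops(glcm, "homogeneity")[0, 0]]

# Local binary pattern histogram for fine-grained texture.
lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
lbp_hist = np.histogram(lbp, bins=10, range=(0, 10))[0]

feature_vector = np.concatenate([color_hist, texture, lbp_hist])
```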
4.1.3. Hyperparameter Tuning
Hyperparameter tuning/optimization is an iterative process of finding the best set of hyperparameters for an ML model to achieve optimal performance. Hyperparameters are configurations that are not learned from the data but must be set prior to training. The process involves defining and selecting a hyperparameter search space, choosing a search strategy (Bayesian optimization, genetic algorithms, random search, or grid search), defining an evaluation metric, splitting the data for validation, performing the hyperparameter search and selecting the best configuration, evaluating on the test set, final modeling, deployment, and monitoring.
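The following is a minimal hyperparameter-search sketch that follows these steps with a grid-search strategy in scikit-learn; the synthetic data and the chosen search space are illustrative assumptions.

```python
# Hyperparameter tuning sketch: define a search space, pick a strategy (grid
# search here), validate with k-fold CV, then evaluate the best model on a
# held-out test set. The data are synthetic stand-ins for extracted features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=800, n_features=25, n_classes=3,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}   # search space
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="f1_macro")                          # metric + validation split
search.fit(X_train, y_train)                                             # hyperparameter search

best_model = search.best_estimator_                                      # best configuration
print(search.best_params_, best_model.score(X_test, y_test))             # test-set evaluation
```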
4.1.4. Transfer Learning
Transfer learning utilizes pre-existing knowledge and applies it to the specific problem at hand. This entails utilizing pre-trained models for image classification/detection on a large dataset, which are subsequently adjusted to fit the specific dataset being examined. The frequently preferred approach is replacing the last layers of the pre-trained network to customize it for a distinct classification objective. During the training process, only the recently added layers are set to be trainable, while the remaining layers are frozen. This training procedure is often accompanied by fine-tuning, particularly when dealing with a small training dataset that lacks sufficient samples for training a DL model from the ground up.
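A minimal transfer-learning sketch matching this description is shown below: a pre-trained backbone is frozen, the final layers are replaced, only the new layers are trained, and fine-tuning optionally unfreezes part of the backbone. The backbone choice (MobileNetV2) and the class count (38, as in PlantVillage) are illustrative assumptions.

```python
# Transfer-learning sketch: freeze a pre-trained backbone, replace the head,
# train only the new layers, then optionally fine-tune the top of the backbone.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False                      # freeze the pre-trained layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(38, activation="softmax"),   # new trainable classification head
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)

# Optional fine-tuning: unfreeze only the top layers and use a lower learning rate.
base.trainable = True
for layer in base.layers[:-20]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```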
4.1.5. Instance Selection and Parameter Optimization
Instance selection, which is also referred to as instance-based learning, is an ML technique used to reduce the size of a dataset while preserving its important characteristics. This is achieved by selecting a subset of instances from the original dataset to be used for model training, with the aim of improving the model's performance or reducing computational cost. This methodology is employed to reduce computational complexity, improve model generalization, reduce noise, and handle imbalanced datasets. The authors of [23] employed instance segmentation for their strawberry images in designing a higher-capacity model that was more robust and generalizable. Random sampling, stratified sampling, distance-based selection, cluster-based selection, greedy algorithms, model-based selection, and active learning are techniques involved in instance selection.
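As a sketch of two of the simple instance-selection strategies named above, the snippet below applies stratified random subsampling and cluster-based selection to a synthetic feature matrix; the subsample ratio and cluster count are illustrative assumptions.

```python
# Instance-selection sketch: stratified subsampling and cluster-based selection.
# X and y are synthetic stand-ins for a real feature matrix and labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30, n_classes=4,
                           n_informative=10, random_state=0)

# Stratified sampling: keep 25% of the instances with class proportions preserved.
X_sub, _, y_sub, _ = train_test_split(X, y, train_size=0.25, stratify=y, random_state=0)

# Cluster-based selection: keep the instance closest to each K-means centroid.
kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(X)
closest = [np.argmin(np.linalg.norm(X - c, axis=1)) for c in kmeans.cluster_centers_]
X_representatives, y_representatives = X[closest], y[closest]
```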
Parameter optimization, also known as hyperparameter tuning, is among the important steps in the ML pipeline and involves finding the best set of hyperparameters for a given model and dataset. A hyperparameter is a setting or configuration that is not learned from the data but set prior to training; hyperparameters therefore govern the behavior of the learning algorithm. The steps involved in parameter optimization are hyperparameter definition, selection of a hyperparameter search space, choice of a search strategy (grid search, random search, Bayesian optimization, or genetic algorithms), definition of an evaluation metric, data splitting for validation, execution of the hyperparameter search, selection of the best hyperparameters, test dataset evaluation, final model training, and deployment and monitoring. Parameter optimization is also an iterative process.
4.1.6. Model Building, Training, and Validation
Model building in both ML and DL involves selecting the appropriate algorithm, preparing data, and iterating through different configurations to create an optimal model. For ML, this includes data preparation, feature engineering, model selection, hyperparameter tuning, and iterative refinement. After selecting the best model, retraining using both training and validation data ensures better generalization. In DL, the process is similar but places a stronger emphasis on adjusting model architectures, like neural networks, to optimize the learning process. For both ML and DL, hyperparameter tuning, regularization, and model evaluation help in refining the model to achieve better performance. However, DL typically involves deeper layers and requires more computational resources during model building compared to ML.
Model training, validation, and testing serve critical roles in evaluating the generalization ability of models in both ML and DL. For ML, data are split into training and validation sets to monitor performance and compute evaluation metrics such as accuracy and F1 score. Hyperparameter tuning plays a key role in both ML and DL, allowing adjustments for improving performance. DL training follows a similar iterative process but typically involves more complex architectures and optimization techniques. Model validation, whether through train–test splits or cross-validation, is crucial for both ML and DL, ensuring the models generalize well on unseen data. Finally, model testing provides the final assessment of new data, highlighting how well both ML and DL models perform in real-world scenarios.
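A compact sketch of this train/validation/test protocol is shown below, using a KNN classifier as a stand-in; the data are synthetic and the split proportions are illustrative assumptions.

```python
# Training, validation, and testing sketch: cross-validation on the training
# portion, final fit, and a last assessment on a held-out test set.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1500, n_features=40, n_classes=5,
                           n_informative=15, random_state=0)

# Hold out a test set; the remainder is used for training and cross-validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = KNeighborsClassifier(n_neighbors=7)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)   # validation estimate

model.fit(X_train, y_train)                                  # final training
y_pred = model.predict(X_test)                               # final assessment on unseen data
print(cv_scores.mean(), accuracy_score(y_test, y_pred),
      f1_score(y_test, y_pred, average="macro"))
```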
4.1.7. Model Deployment
Deploying an ML model involves integrating it into real-world applications, such as web apps, APIs, mobile apps, or existing software, to make it accessible and usable. This process encompasses several key steps, including model serialization, API creation, endpoint setup, data processing, error handling, security measures, authentication, model initialization, testing, monitoring, and ongoing maintenance, ultimately enabling the model to be efficiently loaded and utilized in various environments.
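The snippet below is a minimal deployment sketch covering two of these steps, model serialization and API creation, using joblib and FastAPI; the model, file name, endpoint, and feature layout are illustrative assumptions rather than those of any reviewed system.

```python
# Deployment sketch: serialize a trained model and expose it through a small API.
import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train and serialize a stand-in model (in practice this would be the tuned model).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
joblib.dump(RandomForestClassifier(random_state=0).fit(X, y), "crop_disease_model.joblib")

model = joblib.load("crop_disease_model.joblib")      # model initialization at startup
app = FastAPI()

class Features(BaseModel):
    values: list[float]                               # flattened feature vector

@app.post("/predict")
def predict(features: Features):
    x = np.asarray(features.values).reshape(1, -1)
    return {"predicted_class": int(model.predict(x)[0])}

# Serve with: uvicorn app:app --reload   (assuming this file is saved as app.py)
```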
4.2. Datasets Analysis
This section includes a detailed analysis of the PlantVillage dataset, which was chosen for this study due to its reliability and suitability for general comparisons. Most of the models examined in this research were trained on the PlantVillage dataset to ensure consistency and fairness in comparison. This choice allows for a standardized benchmark across the different studies evaluated.
Plant datasets are often made up of images or other data relating to diverse plant species. These datasets are developed to aid in the advancement of research and development in sectors such as agriculture, botany, plant pathology, and computer vision. Plant datasets can be used for tasks such as plant recognition, disease detection, growth monitoring, and classification of plant species.
The applications of plant datasets are diverse because researchers and data scientists can use these datasets to train ML and DL models for plant disease identification, plant species classification, or anomaly detection. By analyzing the images and associated data, valuable insights can be gained to improve crop management, develop targeted pest control strategies, and enhance overall agricultural practices.
4.2.1. PlantVillage Dataset
The PlantVillage dataset is a widely recognized public resource utilized in the fields of agriculture and crop disease identification. It comprises images depicting a range of crops afflicted by different diseases. Over the years, multiple iterations of the PlantVillage dataset have been developed, each featuring distinct attributes and improvements. The initial version of the dataset was established by the PlantVillage initiative, spearheaded by Dr. David Hughes at Penn State University. It features images of both diseased and healthy plants from a variety of crops, documented under diverse conditions and environments. It encompasses a range of crops, such as tomatoes, potatoes, and grapes, each exhibiting specific diseases.
An initial iteration of PlantVillage, released in 2016, included numerous images depicting both damaged and healthy plants, thereby offering a broad spectrum of samples. This dataset underwent enhancements in 2017, resulting in an updated version characterized by superior data quality, including higher-resolution images and enhanced labeling accuracy. The year 2018 saw the introduction of another version of the PlantVillage dataset, which incorporated additional classes and advancements in data curation practices. In 2019, a further refined iteration of the dataset was developed, featuring even more diverse samples and enhanced data preprocessing techniques. In 2020, an updated iteration of the PlantVillage dataset was released, reflecting some of the most recent enhancements in data collection and curation methodologies.
The PlantVillage dataset, which is publicly accessible, contains over 54,303 annotated images depicting both healthy and diseased leaves of various crops. These images were gathered under controlled environmental conditions. This extensive collection was assembled by researchers at Penn State University and features photographs of 14 distinct crop species: apple, blueberry, cherry, corn, grape, orange, peach, pepper, potato, raspberry, soybean, squash, strawberry, and tomato. The dataset encompasses examples of 26 diseases, including 17 fungal diseases, four bacterial infections, two mold (oomycete) diseases, two viral infections, and one condition caused by a mite. Furthermore, images of healthy leaves showing no visible signs of disease are available for 12 of the crop species.
The PlantVillage dataset, as noted by the authors in [24], encompasses more than 38 distinct diseases and conditions. The images within this dataset were sourced from various origins and meticulously annotated by specialists in plant pathology. Each image is categorized by the specific plant species, the nature of the disease or condition affecting the leaf, and a severity score that reflects the extent of the damage. The PlantVillage dataset can be downloaded from its official website at [24].
The authors of [24] evaluated the quality of the PlantVillage dataset by training an AlexNet model on it, achieving a classification accuracy of 99.35%. Furthermore, they emphasized the importance of expanding the training data to improve accuracy, noting that obtaining new images from diverse perspectives is essential. A summary of the PlantVillage dataset is provided in Table 2.
The PlantVillage dataset, as illustrated in Table 1, encompasses a variety of crops, such as apple, blueberry, corn, cherry, grape, orange, peach, pepper (bell), potato, raspberry, soybean, strawberry, squash, and tomato. The distribution of infected and uninfected images within the dataset is depicted in Figure 4 and Figure 5. The data indicate that tomato has the largest quantity of images, with soybean and orange ranking just after it. Furthermore, the dataset exhibits a significantly higher quantity of healthy images in comparison to unhealthy ones, an imbalance that undermines the potential for developing a classifier with robust generalization capabilities. Notably, raspberry has the fewest diseased images, while cherry is characterized by the lowest count of healthy images. For such under-represented crops, the dataset alone is inadequate for constructing a DL model. Nevertheless, crops that are adequately represented within the dataset can be utilized to create a classifier; tomato, for instance, presents a viable option for developing an effective classifier. The ratio of healthy to diseased images for other crops can be improved through the application of data augmentation techniques. Furthermore, efforts should be made to increase the number of images for crops with limited representation, such as cherry and raspberry. Additionally, incorporating widely cultivated crops like rice and wheat into the dataset could further enrich its diversity.
Figure 6 depicts an examination of crop diseases using data derived from the PlantVillage dataset. The figure indicates that most images within the dataset are categorized as fungi and bacteria. Notably, the quantity of fungi images far exceeds that of bacteria images, and it is also considerably greater than the counts of virus, mite, and mold images. This observation highlights the uneven distribution of the dataset. Additionally, it is worth noting that the images representing mites and molds are the least prevalent. It is imperative for researchers to thoroughly evaluate this information before utilizing the dataset. The images of mold and mites may be excluded from the dataset before they are employed to train a DL model. Additionally, data augmentation techniques can be applied to increase the number of images representing mold and mites. Furthermore, reducing the number of images of fungi and bacteria can facilitate a more balanced distribution of images across all categories. Implementing these strategies may enhance the efficacy of crop detection methodologies.
4.2.2. Other Datasets
Aside from PlantVillage, the AI Challenger dataset [25] provides a diverse range of images for plant disease detection, annotated with multiple disease types. This dataset covers different crops and environmental conditions, offering a valuable resource for training models that require high variability in input data. It is especially useful for multi-label classification tasks, enabling researchers to fine-tune their models for different plant species and disease types. The IP102 dataset [26] focuses on insect pest detection, containing over 75,000 images across 102 species of insects; it is highly valuable for developing models that address pest-related crop diseases. Cassava Leaf Disease [26] is another dataset, available on Kaggle, aimed at identifying diseases in cassava plants. It consists of around 21,000 labeled images and provides an opportunity to work with real-world agricultural problems, especially in regions where cassava is a staple crop. Moreover, the authors of [27] collected a dataset from Bangladesh consisting of nine classes of rice diseases, with each class containing 100 images. These datasets contribute to advancing AI-driven agricultural solutions.
4.3. Performance Analysis of ML-Based Crop Disease Techniques
Figure 7 shows a summary of studies that developed ML-based techniques for crop disease diagnosis. Most of the studies featured in this analysis used the PlantVillage dataset. As shown in the figure, many ML algorithms have been used to develop improved techniques for crop disease diagnosis, including SVM, KNN, RF, ELM, Artificial Neural Network (ANN), CNN, Decision Tree (DT), and RNN, among others. Most of the research used SVM, KNN, CNN, and RF for disease identification and classification, since these algorithms can be used in numerous combinations and tailored to individual crops and diseases. Furthermore, the choice of method is influenced by factors such as the type and quality of the available data, computational resources, and the unique requirements of the situation at hand. This section examines the performance of studies that employed SVM, KNN, RF, and other ML methods. These three algorithms were chosen because they are the most often utilized ML algorithms in the literature. In addition, to ensure fairness, we present the performance analysis of research that used the same dataset.
4.3.1. Performance Analysis of SVM-Based Crop Detection Techniques
The SVM classifier is the most popular model used in the literature. SVM can be utilized for binary and multi-class classification, particularly when there are several possible predictions. This is reinforced by its well-known capability for handling both categorical and continuous data. SVM can be used to classify entities into specific groups, and it can also generalize to instances that are not represented in the training data.
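The snippet below is a minimal sketch of the kind of SVM pipeline reviewed in this section: scaled handcrafted features fed to an RBF-kernel SVM. The synthetic data stand in for color, texture, and shape features, and the kernel settings are illustrative assumptions.

```python
# SVM classification sketch: feature scaling followed by an RBF-kernel SVM
# (multi-class handling is built into scikit-learn's SVC).
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=50, n_classes=4,
                           n_informative=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
svm.fit(X_train, y_train)
print(classification_report(y_test, svm.predict(X_test)))
```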
Figure 8 depicts SVM performance from a few studies that employed it and were able to achieve outstanding results.
As shown in Figure 8 and Table 3, all the compared techniques produced satisfactory results, with classification accuracies ranging from 67% to 98% and F1 scores ranging from 66% to 97%. Some techniques achieved a classification accuracy above 97% and an F1 score above 76% [28,29], respectively, meaning that these SVM models correctly classified the large majority of the samples in the datasets on which they were evaluated. This is quite remarkable, as it shows that SVM can be used to develop an effective crop disease diagnosis system.
The comparative performance analysis of the SVM-based crop detection algorithms reveals significant variations in accuracy, F1 score, recall, and precision. Notably, the algorithm in [28] achieves exceptional performance, with 98.9% accuracy and a 97.8% F1 score, utilizing 14 crops and 39 diseases. This suggests that larger, diverse datasets can substantially enhance algorithm performance. In contrast, the algorithms in [30,31] demonstrate lower accuracy and F1 scores, potentially due to fewer crops and diseases (16 and 38, respectively). An experiment carried out by [32] achieved an accuracy of 93.4% on 15 crops and 38 diseases by introducing two plant disease detection models, early fusion and lead voting ensemble integration, with nine pretrained CNNs. Interestingly, the single-crop algorithm in [22] shows competitive performance, indicating effective feature extraction and classification. Additionally, the authors of [33] achieved a low accuracy of 76.16% by implementing an RF classifier.
The analysis highlights several key insights. Firstly, dataset size and diversity play a crucial role in algorithm performance: algorithms utilizing more crops and diseases tend to perform better. Secondly, optimization techniques and feature extraction methods significantly impact performance. The superior performance of the algorithm proposed in [28] may be attributed to its optimized feature selection and classification strategy; the authors developed a precision agriculture solution that uses SVM, RF, and Fuzzy C-Means, with Fuzzy C-Means applied for image segmentation. The results suggest that SVM-based algorithms can achieve high accuracy in crop disease detection, especially when combined with robust feature extraction, image segmentation, and classification techniques. Hybrid SVM models can produce better performance when trained on a large-scale dataset. Therefore, SVM models should be trained on a dataset with a sizable number of images and features to attain good results, and a nature-inspired optimization technique can be used for feature selection. The SVM model developed by [30] produced low classification accuracy, possibly because the model was trained on a dataset with numerous images and classes; this indicates that plain SVM models may scale poorly to very large datasets, since the training complexity of SVM depends strongly on the size of the training set.
Future research should focus on exploring ensemble methods, transfer learning, and DL architectures to further improve performance. Additionally, investigating the impact of data augmentation, class imbalance, and hyperparameter tuning on SVM-based crop disease detection algorithms would provide valuable insights.
4.3.2. Performance Analysis of KNN-Based Crop Detection Techniques
The KNN classifier is now being utilized for crop disease diagnosis, with its predictions relying only on the quality and amount of data used. As illustrated in Figure 9, all the compared techniques generated good results, as evidenced by classification accuracies ranging from 64% to 100%. This implies that the proposed KNN models successfully identified between 64% and 100% of the samples across the respective datasets. This is notable, as it demonstrates that KNN can be utilized to create an effective crop disease diagnosis system even when the models are trained on small, medium-sized, or imbalanced datasets.
The comparative performance analysis of the KNN-based crop detection algorithms reveals varying degrees of success in accuracy, F1 score, recall, and precision. As shown in Figure 9 and Table 4, the algorithm developed by the authors of [34] achieves perfect accuracy (100%) with a single crop and six diseases, while the algorithm in [10] demonstrates similarly impressive results (97.3% accuracy, 95.2% F1 score) under the same conditions. The authors of [34] employed K-means clustering as an image processing technique for segmentation, since it groups pixels based on intensity similarity, a characteristic well suited to segmentation. Additionally, K-means clustering can uncover hidden patterns, create meaningful clusters, and provide valuable insights from data without the need for labeled information; its simplicity, scalability, and interpretability make it a valuable tool in various domains. Furthermore, expert knowledge was incorporated into the technique developed by [34], which can assist farmers in identifying crop ailments, making the appropriate decision, and selecting the right treatment on time, resulting in better crop yield. The authors of [10] used K-means clustering to partition the data space into Voronoi cells. The ability to identify image boundaries by contour tracing was particularly beneficial, as was the extraction of informative features via GLCM.
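The snippet below sketches a K-means segmentation plus GLCM feature-extraction pipeline of the kind described for these KNN-based studies. The synthetic image, the three-cluster setting, and the "darkest cluster is the lesion" rule are simplifying assumptions for illustration only.

```python
# K-means color segmentation followed by GLCM texture features on the segmented region.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)   # stand-in RGB leaf image
pixels = image.reshape(-1, 3).astype(float)

# K-means groups pixels by color similarity; here the darkest cluster is
# treated as the lesion region (a simplifying assumption).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pixels)
labels = kmeans.labels_.reshape(image.shape[:2])
lesion_cluster = np.argmin(kmeans.cluster_centers_.sum(axis=1))
mask = labels == lesion_cluster

# GLCM texture features are computed on the gray-level segmented region.
gray = image.mean(axis=2).astype(np.uint8)
segmented = np.where(mask, gray, 0)
glcm = graycomatrix(segmented, distances=[1], angles=[0], levels=256, normed=True)
features = [graycoprops(glcm, p)[0, 0]
            for p in ("contrast", "correlation", "energy", "homogeneity")]
```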
Notably, the algorithms developed in [32,35] achieve exceptional performance with a single crop, suggesting effective feature extraction and classification strategies. The KNN algorithm in [36] achieved a classification accuracy of 91%. The algorithm in [37], utilizing 14 crops and 39 diseases, demonstrates a respectable accuracy (94.1%) but a lower F1 score (77.22%), potentially indicating class imbalance issues; this modest F1 score highlights the challenges of scaling KNN-based approaches. The authors in [38], who used ML for classification, achieved a classification accuracy of 84.94%. The KNN algorithms may have performed poorly due to several factors: the training datasets had limited images, which restricted model learning, and many images contained unnecessary noise and backgrounds, complicating accurate segmentation. Additionally, variations in environmental conditions such as lighting, moisture, and the capturing device introduced inconsistencies that made it hard to find similar images. The datasets also suffered from imbalanced and non-uniform samples, while differences in symptom characteristics across geographic locations added further complexity. These combined challenges likely impeded effective model training and performance.
Key observations include the significant impact of dataset size and diversity on performance, the potential for single-crop algorithms to achieve high accuracy with optimized feature extraction, and the challenges posed by class imbalance and noise in larger datasets. Future research should explore ensemble methods, distance metric optimization, and handling high-dimensional data to enhance KNN-based crop disease detection. Additionally, investigating the effects of data preprocessing, feature selection, and hyperparameter tuning on KNN performance would provide valuable insights.
Figure 9. Classification accuracies of KNN-based crop detection techniques. Data sourced from [32,34,36,37].
Table 4. Performance of KNN-based crop detection techniques.
| Citation | Algorithm Used | Accuracy | F1 Score | Recall | Precision | Number of Crops Used | Number of Diseases |
|---|---|---|---|---|---|---|---|
| [32] | KNN | 100% | - | - | - | 1 | 6 |
| [34] | KNN | 97.3% | 95.2% | 97.5% | 97.8% | 1 | 6 |
| [36] | KNN | 91% | 77.2% | - | - | 1 | 10 |
| [37] | KNN | 99.92% | 77.22% | - | - | 14 | 39 |
4.3.3. Performance Analysis of RF-Based Crop Detection Techniques
The RF algorithm is a popular supervised ML algorithm that employs a combination of multiple decision trees, with each tree trained on a different subset of the entire dataset, reducing overfitting and improving classifier accuracy. It is an ensemble learning method used for both classification and regression tasks. As shown in Figure 9 and Table 5, the RF-based technique developed by the authors in [39] produced a modest precision and F1 score of 92.85% and 92.84%, respectively. They used an RF classifier for automatic crop leaf disease detection. The RF-based technique generates numerous decision trees and combines them to provide more precise predictions. Furthermore, RF effectively manages missing values in data, thereby enhancing its ability to maintain a high level of accuracy. This can be supported by using Global Feature Descriptors (GFDs), mathematical representations that capture the overall characteristics of an entire image or a specific region and summarize its content. One of the most widely used GFD approaches is based on the HSV color space. The HSV approach relies on masks developed from color information, utilizing the color intensity and brightness of the HSV color system. HSV is considered the most straightforward technique for image segmentation, as it can effectively isolate the region of interest in the images: applying a threshold to the HSV image produces masks for healthy and diseased potato leaf images in the RGB color space. Some studies also used GLCM for texture feature extraction. The authors in [39] extracted deep features from a CNN and then fed these deep features into RF for classification.
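The snippet below sketches the HSV-mask segmentation plus RF classification idea described above. The hue thresholds, the simple color statistics used as features, and the synthetic images are illustrative assumptions; a real pipeline would load labeled leaf photographs.

```python
# HSV-mask segmentation followed by random-forest classification (sketch).
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def hsv_features(bgr):
    """Segment leaf tissue with an HSV mask and return simple color statistics."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # A green-ish hue range is kept as healthy tissue; the rest is treated as lesion.
    mask = cv2.inRange(hsv, (25, 40, 40), (95, 255, 255))
    lesion_ratio = 1.0 - mask.mean() / 255.0
    h_mean, s_mean, v_mean = hsv.reshape(-1, 3).mean(axis=0)
    return [lesion_ratio, h_mean, s_mean, v_mean]

# Synthetic stand-ins for a labeled image set (real images would come from cv2.imread).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(60, 64, 64, 3), dtype=np.uint8)
labels = rng.integers(0, 2, size=60)

X = np.array([hsv_features(img) for img in images])
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, labels)
print(rf.score(X, labels))
```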
The comparative performance analysis of the RF-based crop detection algorithms reveals varying degrees of success in accuracy, F1 score, recall, and precision. The algorithm in [39] demonstrates a precision of 92.85% with a single crop and eight diseases. In contrast, the algorithm proposed by the authors of [40] shows relatively lower accuracy (84.94%) with a single crop and four diseases.
The algorithm in [41] stands out, achieving 96.1% accuracy, a 92.1% F1 score, 88.6% recall, and 95.9% precision using 14 crops and 38 diseases. This suggests that the RF implementation of the authors in [41] effectively handles larger, more diverse datasets. The algorithm developed by the authors of [32] demonstrates modest accuracy (80.68%) and balanced recall and precision (85.7% and 85%, respectively) with a single crop and three diseases. The RF model developed by [32] was trained on gray images rather than RGB images, which may also have affected model performance; this shows that the quality of the images used to train ML models is very important. Moreover, the algorithm developed by the authors in [42] produced a relatively low accuracy of 77%. They developed a hybrid model for potato leaf disease classification using RF and CNN.
Key observations include the significant impact of dataset size and diversity on performance, the potential for RF-based algorithms to achieve high accuracy with optimized feature extraction, and the challenges posed by class imbalance and noise in larger datasets. Future research should explore ensemble methods, hyperparameter tuning, and handling high-dimensional data to further enhance RF-based crop disease detection.
4.3.4. Performance Analysis of Other ML Algorithms on PlantVillage Dataset
Apart from SVM, KNN, and RF, previous studies have employed various other ML algorithms, including ELM, ANN, the Naïve Bayes (NB) algorithm, and DT. ANN, resembling a biological system, models the processing of information through interconnected processing elements forming a network structure; this characteristic enables ANN to extract patterns from complex data. DT is recognized for its robustness and success in classification and detection, making it widely applied in areas such as medical diagnosis, speech recognition, and character recognition. The Extreme Learning Machine (ELM) model, introduced by [41], represents a straightforward form of feed-forward neural network. Furthermore, the authors specifically devised a Single-Hidden Layer Feed-Forward Neural Network (SLFN) for diagnosing crop diseases. The technique randomly chooses the hidden nodes and analytically determines the output weights of the SLFN. SLFNs tend to provide good generalization performance at commendable learning speed because the input weights and hidden-layer biases are randomly assigned, and only the output weights need to be determined analytically.
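As a compact illustration of the SLFN/ELM idea just described, the sketch below assigns random input weights and biases and computes the output weights analytically with a pseudo-inverse; the hidden-layer size and synthetic data are illustrative assumptions.

```python
# Extreme Learning Machine sketch: random hidden layer, analytic output weights.
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=20, n_classes=3,
                           n_informative=8, random_state=0)
Y = np.eye(3)[y]                                  # one-hot targets

n_hidden = 200
rng = np.random.default_rng(0)
W = rng.normal(size=(X.shape[1], n_hidden))       # random input weights (not trained)
b = rng.normal(size=n_hidden)                     # random hidden biases (not trained)

H = np.tanh(X @ W + b)                            # hidden-layer activations
beta = np.linalg.pinv(H) @ Y                      # output weights via pseudo-inverse

pred = np.argmax(H @ beta, axis=1)
print("training accuracy:", (pred == y).mean())
```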
Figure 10 shows the comparison between different ML techniques that were trained on the PlantVillage dataset. As shown in Figure 11 and Table 6, the compared ML algorithms achieved classification accuracies ranging from 80.42% to 99.7%. The ANN-based model developed by [43] produced a classification accuracy and F1 score of 98.22% and 93.55%, respectively. The authors of [31] developed a model for plant disease diagnosis using a combination of SVM, GLCM, CNN, and K-means algorithms. The images were first converted into a new color space, K-means clustering was used to segment the infected region in each image, and GLCM was used to extract texture features from the segmented region. The extracted features were then used to train the CNN algorithm, which achieved a commendable classification accuracy of 99.6%. The results show that combining image processing techniques with CNN and ML algorithms can improve their performance. The results also show that using image segmentation techniques (such as the K-means algorithm) and feature selection techniques (such as PCA) can improve the performance of ML- and DL-based crop detection models.
The authors in [44] proposed a classification technique for plant diseases using the NB algorithm, achieving a classification accuracy of 91%. Naïve Bayes is considered among the simplest and most efficient classification algorithms for building fast ML models with quick prediction and high accuracy. It is not only a supervised learning method that can be trained efficiently but also a probabilistic classifier. Moreover, the NB classifier can produce the correct grouping solution when informed by expert knowledge. It is beneficial to use Naïve Bayes, since it requires only a small amount of training data to estimate the parameters needed for classification. The authors of [45] designed a plant disease detection technique using the DT algorithm. The algorithm was trained on over 21,000 images of eight crops and 23 diseases (including healthy plants), and it produced a modest classification accuracy of 80.42%. The authors of [46] designed an improved classification technique for plant leaf disease detection using a wavelet kernel ELM and MobileNetV3. MobileNetV3 was used for feature extraction, while the wavelet kernel ELM model was used for classification. The hybrid model was trained on the PlantVillage dataset, and it achieved a remarkable classification accuracy of 99.7%.
4.4. Performance Analysis of DL-Based Crop Detection Techniques
Numerous DL-based methods have emerged in the research utilizing various architectures such as CNN, VGG16, ResNet50, DenseNet121, and numerous others. This section provides an evaluation of the performance of these DL-based techniques.
4.4.1. Performance Analysis of CNN-Based Crop Detection Techniques
A CNN is a well-known DL technique that has been extensively utilized in computer vision, including crop disease identification. CNNs are capable of automatically and adaptively learning spatial hierarchies of features from input data, making them ideal for image recognition applications. In agriculture, a CNN is typically trained on an image dataset, learning to recognize the patterns and features that distinguish healthy from unhealthy plants. The trained CNN is then used to classify new, unseen images by predicting the condition affecting the plant based on the learned patterns and features. Furthermore, CNNs have demonstrated high accuracy in crop disease diagnosis, often surpassing human experts.
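The sketch below shows a small CNN of the kind discussed in this subsection, built with Keras; the input size, layer sizes, and class count (38, as in PlantVillage) are illustrative assumptions and not the architecture of any specific reviewed study.

```python
# Minimal CNN sketch for leaf-image classification.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(128, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(38, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=20)
```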
Many CNN-based crop disease detection techniques have been developed in the literature. As shown in Figure 12, most of the reviewed CNN-based techniques diagnosed crop diseases with a high classification accuracy of above 92%. The CNN-based method introduced by the authors of [47] achieved a classification accuracy of 98.2%. This approach comprised five layers and was trained using a medium-scale dataset containing 2002 black gram images. Additionally, the authors of [47] highlighted that their proposed method is more time-efficient and has the capacity to promptly detect and identify agricultural diseases in their early stages. This capability is essential for establishing an efficient disease management plan, rendering it a strongly endorsed solution for farmers.
The authors of [48] introduced a DL-centric strategy for the automatic classification of plant diseases, employing CNN models. The devised system aims to categorize three distinct types of potato leaf diseases by analyzing their visual symptoms. The model comprises numerous convolutional layers, pooling layers, and fully connected layers. Moreover, transfer learning was employed to enhance a pre-trained CNN model, adapting it particularly for the classification problem at hand. The suggested system exhibited remarkable effectiveness by achieving an overall accuracy of 98.07% on the test dataset, confirming its ability to accurately predict potato leaf diseases.
The authors of [49] developed intelligent image recognition systems based on SVM, transfer learning (MobileNet), and CNN approaches, all of which were deployed on mobile systems and trained utilizing data from paddy fields and the internet. The MobileNet algorithm contributed to the performance accuracy through its lightweight design, which allows it to be integrated into frameworks such as TensorFlow and PyTorch, making it accessible to developers working on large projects.
In a separate study, the authors of [50] developed a CNN model designed to identify the concurrent presence of strawberry leaf spot and leaf blight. Leveraging the feature extraction capabilities of a normalized CNN, the model showcased an innovative approach to detecting multiple classes of fungal infections on a single plant. This achievement was made possible by the model accurately learning the shared characteristics during the training process. For all classes analyzed, the model had an overall accuracy of over 98%.
The authors of [51] developed a customized DL model using a CNN architecture to automatically identify plant diseases. Specifically, the model was designed to categorize potato leaves into three different disorders. The authors used data augmentation to generate new images from existing datasets gathered from many sources, employing a range of data transformation techniques, and the model learned features through its convolutional, pooling, and fully connected layers. The model achieved a classification accuracy of 99.39% and was able to automatically learn features from raw images, demonstrating effective detection of three different types of potato leaf disease and making it suitable for practical application. Additionally, the models in [50,51] attained remarkable accuracies of 98.97% and 99.39%, respectively, each achieved with a different CNN architecture. The summarized performance of the reviewed CNN-based techniques is shown in Table 7 and Figure 12.
The results of the compared CNN-based techniques are shown in Table 8 and Figure 13. The comparative performance analysis of the CNN-based crop detection algorithms reveals exceptional accuracy across various datasets. The algorithm in [49] achieves the highest accuracy (99.53%) with a single crop and five diseases. The algorithm in [46] also demonstrates impressive accuracy (97.66%) with a single-crop dataset. Notably, the algorithm in [47] reports a comprehensive evaluation (92.50% accuracy, 91.89% F1 score, 87.17% recall, 97.14% precision) with a single crop and one disease. In contrast, the algorithm developed by the authors of [52] demonstrates balanced performance (95.17% accuracy, 95.11% F1 score, 95.17% recall, 95.11% precision) using three crops and twelve diseases. The algorithm developed by the authors of [53] achieves high accuracy (98.36%) with a single crop and ten diseases, highlighting the effectiveness of CNN-based approaches in crop disease detection. The authors in [52] likewise affirmed the commendable accuracy of CNNs, while [54] showcased a DenseNet121 accuracy of 98.97%.
Key observations include the exceptional performance of CNN-based algorithms, the benefits of optimized architectures for single-crop datasets, and the challenges posed by larger, more diverse datasets. Future research should explore transfer learning, data augmentation, and ensemble methods to further enhance CNN-based crop disease detection, ideally within automated and unbiased evaluation frameworks. The performance of other DL-based techniques is summarized in the following subsections.
4.4.2. Performance Analysis of VGG16 Crop Detection Techniques
The Visual Geometry Group (VGG) is an Oxford University research group known for its computer vision research, particularly the development of the VGGNet architecture, a deep CNN that has been widely used in image classification and object recognition tasks. VGG is usually followed by a number indicating the specific variant of the architecture, such as VGG16, a variant with 16 weight layers. Notably, this architecture has been influential in the field of DL and is frequently used for benchmarking and evaluating new models and techniques, owing to its simplicity, its efficacy in producing state-of-the-art results, and its use of small filters. The VGG design comprises many convolutional layers stacked on top of one another, interleaved with max-pooling layers and followed by fully connected layers, allowing the network to learn complicated features at different levels of abstraction by repeatedly using small filter sizes (3 × 3).
Different crop disease detection techniques have been developed in the literature using VGG16. The authors of [14] developed a web-based application for rice disease classification using VGG16. The application achieved a performance accuracy of 90%. In another study, the authors in [55] proposed a DL-based technique for the detection and classification of four common palm tree diseases using VGG16. The technique achieved a classification accuracy of 99.56%.
The depth of the VGG16 network can be extended by incorporating additional convolutional layers, as it employs small convolution filters across all layers [55]. Conversely, the authors in [56] introduced a DL model designed to automatically diagnose and assess the severity of plant diseases using image data. This initiative drew inspiration from the breakthroughs in DL for image-based recognition of plant diseases. To do this, the researchers trained CNNs with different depths from scratch and adjusted four advanced deep models: VGG16, VGG19, Inception V3, and ResNet50. The VGG models (16 and 19) were particularly remarkable, as they demonstrated substantial enhancements compared to previous architectures. Out of all the models, VGG16 exhibited the most impressive performance, attaining a test dataset accuracy of 90.4%. During training, the deep networks attained accuracies approaching 100%, leading to the early termination of the training process.
The authors of [57] demonstrated the effectiveness of VGG16 in assisting farmers in identifying pear leaf symptoms and offering targeted guidance for pesticide use. Their approach involved designing a multioutput system based on CNNs. They evaluated six pre-trained CNN architectures—VGG16, VGG19, ResNet50, InceptionV3, MobileNetV2, and EfficientNetB0—as feature extractors for classifying three diseases and six severity levels. VGG16 gave an accuracy of 64.23% in this particular study due to the small size of the dataset, which made it difficult to capture relevant features.
Furthermore, the authors of [58] developed a disease classification technique for eggplant using a pre-trained VGG16, evaluating images converted to other color spaces such as hue-saturation-value (HSV), YCbCr, and grayscale. VGG16 was also used as a feature extractor at its eighth convolutional layer, and these features were used to classify diseases with a multi-class SVM (MSVM). The proposed VGG16-based model achieved a performance accuracy of 99.4% [59], obtained by feeding features from several of its layers to the MSVM to assess classification effectiveness.
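The snippet below sketches this VGG16-as-feature-extractor strategy: activations from an intermediate convolutional layer are pooled and fed to a multi-class SVM. The choice of layer (block4_conv1, roughly the eighth convolutional layer in VGG16) and the synthetic images are illustrative assumptions, not the exact setup of the cited work.

```python
# Deep features from a pre-trained VGG16 layer, classified with an SVM.
import numpy as np
import tensorflow as tf
from sklearn.svm import SVC

vgg = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                  input_shape=(224, 224, 3))
extractor = tf.keras.Model(vgg.input, vgg.get_layer("block4_conv1").output)

def deep_features(images):
    x = tf.keras.applications.vgg16.preprocess_input(images.astype("float32"))
    maps = extractor.predict(x, verbose=0)
    return maps.mean(axis=(1, 2))                 # global average pooling of feature maps

# Synthetic stand-ins for a small labeled image set.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(12, 224, 224, 3))
labels = rng.integers(0, 3, size=12)

svm = SVC(kernel="rbf", C=10).fit(deep_features(images), labels)
```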
The authors of [60] developed a merged neural network that integrated the features retrieved from both the VGG16 and AlexNet networks to construct a disease classification model with fully connected layers. The efficacy of the concatenated model was evaluated, yielding a training classification accuracy of 100%, a validation accuracy of 97.29%, and a testing accuracy of 95.82%. In another study, the authors of [61] presented a VGG16 model to detect plant diseases so that farmers can take timely treatment actions without further delay. This was achieved by using 19 different classes of crops from the PlantVillage dataset for training and testing. The proposed model comprises thirteen convolutional layers, two batch normalization layers, five max-pooling layers, and three fully connected layers. The proposed model achieved a performance accuracy of 95.2% despite varying illumination conditions and complex backgrounds, since the images were collected from actual leaves of planted crops.
Figure 14 and Table 9 show the classification accuracies of different VGG16-based crop detection techniques. The comparative performance analysis of the VGG16-based crop detection algorithms reveals impressive accuracy across various datasets. The best-performing algorithm achieves the highest accuracy (99.4%) with a single crop and five diseases. The algorithm in [58] demonstrates exceptionally balanced performance (98.19% accuracy, 98.12% F1 score, 98.24% recall, 98.33% precision) using two crops and five diseases. Other algorithms [14,62] show similar accuracies (90% and 90.4%, respectively) with a single crop and four diseases. The algorithm in [61] achieves 95.2% accuracy with five crops and nineteen diseases, while the VGG16 algorithm in [63] demonstrates 92.5% accuracy using fourteen crops and thirty-eight diseases. These results highlight VGG16's effectiveness in crop disease detection, even with varying dataset sizes and complexities.
Key observations include VGG16’s robust performance across diverse datasets, the benefits of transfer learning, and the importance of dataset quality. Future research should explore fine-tuning VGG16, data augmentation, and ensemble methods to further enhance crop disease detection accuracy.
4.4.3. Performance Analysis of ResNet50 Crop Detection Techniques
The Residual Neural Network (ResNet) is a popular DL model architecture that addresses the challenge of vanishing gradients in very deep neural networks, which may hinder their ability to learn effectively. ResNet's key innovation is the use of residual learning, which involves introducing shortcut connections (skip connections) into the network architecture. These connections allow information to bypass one or more layers, enabling the input to flow directly toward the output. Hence, ResNet has been widely adopted and has achieved state-of-the-art performance on various computer vision tasks, including image classification, object detection, and image segmentation.
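The minimal sketch below illustrates the skip connection just described using the Keras functional API; it is a conceptual example of a residual block, not the full ResNet50 architecture, and the filter count and input shape are illustrative assumptions.

```python
# Residual block sketch: the input bypasses two convolutions via a skip connection.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                   # identity (skip) connection
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([shortcut, y])                # the input flows directly to the output
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(224, 224, 64))
outputs = residual_block(inputs, 64)
model = tf.keras.Model(inputs, outputs)
```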
The authors of [64] designed a smart web application for predicting crop diseases using ResNet50, achieving an impressive classification accuracy of 98.98%. This innovative tool offers farmers the capability to identify plant diseases by analyzing images of crop leaves. In another study, the authors of [59] developed a technique using ResNet9. The model considered the image shape, the diseased area present, and the general green area of the leaf for its prediction. The authors also used data augmentation and hyperparameter optimization to improve the performance of the model. The accuracy of 99.25% achieved by the model shows that ResNet is highly recommended for crop disease diagnosis.
The authors of [16] developed a tailored model for identifying and categorizing leaf images. They used ResNet50 to extract various features from plant leaf images, including color and texture features. To acquire optimized and salient features of reduced dimensionality, the modified red deer optimization algorithm (MRDOA) was implemented as the feature selection technique. The PDICNet model achieved an accuracy of 99.73%.
The authors of [65] focused on developing an automated, unbiased, and effective diagnostic approach for identifying wheat illnesses. They employed a model in a greenhouse scenario that incorporated deep image features and parallel feature fusion. This was achieved by utilizing an automated methodology that relies on image processing techniques to match crop images so that identical locations are targeted. To speed up the dataset development process, an automated web tool was utilized to build the corresponding dataset from either image type. Furthermore, handcrafted features were extracted from each image format, and the disease detection results revealed that parallel feature fusion was more accurate than features from either type of image alone. ResNet101, implemented with deep features and parallel feature fusion, outperformed the other DL models, producing a classification accuracy of 74% on leaf rust and 84% on tan spot.
The authors of [27] developed a DL-based technique to classify diseases in cauliflower plants at an early stage. Their major objective was to enhance disease diagnosis and detection by employing deep transfer learning techniques. This study trained and analyzed 10 DL transfer learning models, namely EfficientNetB0, Xception, EfficientNetB1, MobileNetV2, EfficientNetB2, DenseNet201, EfficientNetB3, InceptionResNetV2, EfficientNetB4, and ResNet152V2. ResNet152V2 attained an accuracy of 66.50%.
The authors of [62] proposed a DL model capable of crop disease classification. This was achieved by considering 10 pre-trained DL models, each fine-tuned using the PlantVillage dataset. ResNet50 achieved an accuracy of 93.26% among the pre-trained DL models. The authors of [64] sought to determine the most appropriate ML/DL models for the PlantVillage tomato dataset and the classification of tomato diseases. ResNet was among the tested DL models, with features extracted using color and GLCM descriptors, and it produced the best accuracy of 98.97%.
Figure 15 and Table 10 show the classification accuracy of ResNet50-based crop detection techniques. The comparative performance analysis of the algorithms reveals exceptional accuracy across various datasets. The algorithm in [59] achieves outstanding performance (99.25% accuracy, 99.33% F1 score, 99.33% recall, 99.67% precision) using two crops and two diseases. Similarly, the ResNet50 algorithm in [64] demonstrates impressive accuracy (98.98% accuracy, 98.98% F1 score, 99.05% recall, 98.99% precision) with three crops. The ResNet50 models developed in [27,64] achieve high accuracy (98.2% and 98.36%, respectively) with diverse dataset sizes (fourteen crops/thirty-eight diseases and one crop/six diseases). The algorithm developed by the authors of [65] shows modest accuracy (93.5%) with a single crop and two diseases. The algorithms in [61,64] demonstrate competitive performance (96.49% and 98.2% accuracy) using 14 crops and 38 diseases. These results highlight ResNet50's robustness across diverse dataset sizes and complexities.
Key observations include ResNet50’s exceptional performance in crop disease detection and the importance of optimized architecture and training parameters. Future research should explore fine-tuning ResNet50 for specific crop disease detection tasks, data augmentation and preprocessing techniques, and ensemble methods. Comprehensive evaluations provide valuable insights into algorithm performance and inform optimization strategies.
4.4.4. Performance Analysis of DenseNet121 Crop Detection Techniques
The Densely Connected Convolutional Network (DenseNet) is another deep neural network architecture that addresses several challenges faced by traditional deep neural networks, such as vanishing gradients and the increased risk of overfitting as networks become deeper. This is made possible by its feed-forward connections across layers, whereby each layer receives direct input from all preceding layers and passes its own feature maps to all subsequent layers. A DenseNet is organized into dense blocks and transition blocks, with several of these blocks stacked on top of one another.
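The minimal sketch below illustrates this dense connectivity: each layer receives the concatenation of all preceding feature maps. It is a conceptual example of a single dense block; the layer count, growth rate, and input shape are illustrative assumptions, not the full DenseNet121.

```python
# Dense block sketch: every layer's output is concatenated with all earlier outputs.
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=32):
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation("relu")(y)
        y = layers.Conv2D(growth_rate, 3, padding="same")(y)
        x = layers.Concatenate()([x, y])           # feed-forward connection to all later layers
    return x

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = dense_block(inputs)
model = tf.keras.Model(inputs, outputs)
```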
The authors of [52] suggested a classification method for crop diseases using ten pre-trained CNN models, specifically AlexNet, GoogleNet, VGG16, VGG19, ResNet50V2, Inception V3, Xception, MobileNetV2, InceptionResNetV2, and DenseNet121. The models were fine-tuned on the PlantVillage dataset. The fine-tuning was accomplished by unfreezing the last block of convolutional layers and focused on improving the extracted high-dimensional features, aiming to capture additional details within these features by updating the weight parameters. In the study, DenseNet121 produced the best result, achieving a classification accuracy of 94.4% when all layers were frozen and 98.97% when the last block's convolutional layers were unfrozen. In another study, the authors of [65] employed DenseNet121 to identify apple leaf diseases. They used DenseNet121 with three regression-based multilevel classification algorithms and a focal loss function, obtaining a classification accuracy of approximately 93%.
The authors of [27] conducted a comparison of rice disease classification using six CNN-based DL architectures: DenseNet121, InceptionV3, MobileNetV2, ResNext101, ResNet152V, and Seresnext101. This study utilized a database comprising nine of the most prevalent rice diseases. Additionally, a transfer learning approach was applied to DenseNet121, MobileNetV2, ResNet152V, and Seresnext101 to investigate whether transfer learning could potentially enhance accuracy.
To compare the original transfer learning and ensemble approaches, a DEX ensemble model built on the DenseNet121, EfficientNetB7, and Xception networks was used, which gave an accuracy of 98%; DenseNet121 alone achieved an accuracy of 97% in the detection of rice diseases. The authors of [16] used a pre-trained neural network model, DenseNet121, imported from the Keras library, to train a DenseCNN used for classification. In this study, DenseNet121 was used to classify twenty-nine different diseases across seven crops and to extract features. This DL model achieved 98.23% theoretically and 94.96% in practice, outperforming the previously proposed state-of-the-art techniques.
The authors of [63] developed advanced DL techniques for predicting early diseases in cauliflower plants, with the primary objective of enriching comprehension of the importance of identifying and detecting crop diseases in rural agriculture, employing sophisticated deep transfer learning methods. This was achieved by examining the performance of 10 deep transfer learning models with regard to precision, F1 score, and accuracy, among other metrics. Remarkably, among the ten models, EfficientNetB1 achieved the highest validation accuracy of 99.90%, with DenseNet121 giving 98.48%.
The authors of [20] implemented a DL-based approach for detecting leaf diseases in crops using images tailored for agricultural applications. The authors accomplished this objective by employing pre-trained CNN models and focusing on fine-tuning the hyperparameters of well-known models, including DenseNet121, ResNet50, VGG16, and InceptionV4, for efficient crop disease identification. The models' efficacy was evaluated through metrics such as classification accuracy, sensitivity, specificity, and F1 score. It was found that DenseNet121 exhibited the highest classification accuracy, reaching an impressive 99.81%. This superior performance is attributed to DenseNet121's thin and compact design, where the feature maps from preceding layers are directly linked to subsequent layers, resulting in a reduction in the number of channels in a dense block.
The authors of [18] introduced a methodology aimed at facilitating early diagnosis and mitigating potential crop losses through precise and resilient disease detection in tomato crops. This was accomplished by leveraging DenseNet121, chosen for its recognized performance resulting from its innovative utilization of dense skip connections across layers. These connections help alleviate the gradient vanishing problem by enabling a direct flow of gradients. The proposed approach incorporates DenseNet121 to enhance the performance and convergence of disease detection models. The resulting classification accuracy achieved by DenseNet121 was 98.30%.
Figure 16 and Table 11 show the classification accuracies of the reviewed DenseNet121 models. The comparative performance analysis of the models reveals exceptional accuracy across various datasets. The DenseNet model in [20] achieves outstanding performance (99.81% accuracy, 99.8% F1 score) using 14 crops and 38 diseases. Similarly, the DenseNet model in [62] demonstrates impressive accuracy (98.97% accuracy, 98.97% F1 score, 98.97% recall, 99.00% precision) with the same dataset configuration. Furthermore, the models in [10,18] achieve high accuracy (98.30% and 97%, respectively) with a single crop and nine diseases. The DenseNet model in [16] shows moderate accuracy (94.96%) using seven crops and twenty-nine diseases. The model developed in [46] demonstrates relatively lower accuracy (93.71%) with a single crop and six diseases.
Key observations include DenseNet121's robust performance in crop disease detection, the benefits of using larger datasets, and the importance of optimized architecture and training parameters. Future research should explore fine-tuning DenseNet121, data augmentation, and ensemble methods to further enhance crop disease detection accuracy.