Advanced Image Preprocessing and Integrated Modeling for UAV Plant Image Classification
Abstract
1. Introduction
- Comprehensive Evaluation of Preprocessing Techniques and Pipelines: We thoroughly investigate the impact of various image preprocessing techniques on plant species classification accuracy from RGB UAV images, identifying the most effective combinations of techniques within different pipelines to optimize classification performance for seven target species.
- Hybrid Deep Learning and Machine Learning Approaches: We explore the performance of hybrid models that combine a pre-trained VGG-16 for feature extraction with different classifiers (SVM, RF, XGBoost, and the VGG-16 network's own classification layers), evaluating their effectiveness in plant species classification.
2. Related Work
- Conducting a comprehensive evaluation of various image preprocessing techniques and their combinations. This goes beyond previous studies that often focus on individual techniques or lack rigorous comparisons.
- Investigating the impact of preprocessing on the performance of hybrid deep learning and machine learning models. This explores the potential for synergistic performance gains by combining the strengths of both deep learning and traditional methods.
3. Materials and Methods
3.1. Objective and Overall Approach
1. Image Preprocessing: We enhance image quality using a combination of techniques:
- Enhanced Super-Resolution Generative Adversarial Network (ESRGAN): Improves image resolution.
- Contrast-Limited Adaptive Histogram Equalization (CLAHE): Enhances contrast.
- White Balancing: Corrects color imbalances.
2. Feature Extraction: We use a pre-trained VGG-16 model to extract relevant features from the preprocessed images. This deep convolutional neural network is known for its robustness and accuracy in image feature extraction; using a well-established CNN such as VGG-16 ensures that the critical spatial features of plant species captured by UAVs are preserved and accurately represented. Before feeding the images into the VGG-16 model, we resize and normalize them to ensure consistent input dimensions and data distribution. The VGG-16 model then processes the images through stacked convolutional layers, ReLU activations, and max-pooling layers to identify patterns and progressively reduce spatial dimensions (a code sketch of this step follows the list below).
3. Classification: We evaluate the performance of four different classifiers:
- Support Vector Machine (SVM)
- Random Forest (RF)
- Extreme Gradient Boosting (XGBoost)
- VGG-16 Neural Network’s Classification Layer
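The following is a minimal sketch of the VGG-16 feature-extraction step, assuming a Keras/TensorFlow environment; the 224 × 224 input size, the global-average-pooling output, and the `extract_features` helper are illustrative assumptions rather than the exact implementation.

```python
# Sketch of VGG-16 feature extraction (assumptions noted in comments).
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# Load VGG-16 pre-trained on ImageNet, dropping the classification head so
# the network outputs feature vectors instead of class scores.
base_model = VGG16(weights="imagenet", include_top=False, pooling="avg",
                   input_shape=(224, 224, 3))

def extract_features(img_path: str) -> np.ndarray:
    """Resize, normalize, and pass one image through VGG-16."""
    img = image.load_img(img_path, target_size=(224, 224))  # resize
    x = image.img_to_array(img)
    x = preprocess_input(x[np.newaxis, ...])                # normalize
    return base_model.predict(x, verbose=0).ravel()         # 512-dim vector

# Usage (hypothetical paths):
# features = np.stack([extract_features(p) for p in image_paths])
```

These fixed-length vectors are then fed to the downstream classifiers described in Section 3.5.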
3.2. Image Preprocessing Techniques
3.2.1. Enhanced Super-Resolution Generative Adversarial Network (ESRGAN)
3.2.2. Contrast-Limited Adaptive Histogram Equalization (CLAHE)
3.2.3. White Balancing (WB)
3.3. Dataset Creation
- Base Dataset: This dataset comprises the raw, unprocessed images directly captured by a drone, representing the initial state of the data.
- ESRGAN-Refined Dataset: This dataset incorporates the images enhanced using the ESRGAN algorithm, which improves image resolution. We aimed to evaluate whether the increased resolution provided by ESRGAN would benefit classification performance.
- Contrast-Enhanced Dataset: Building upon the ESRGAN-refined images, this dataset applies additional contrast enhancement techniques, potentially aiding in the classification process by improving the visibility of subtle details. We investigated whether contrast enhancement, following ESRGAN refinement, could further improve classification accuracy.
- White-Balanced Dataset: This dataset includes the ESRGAN-refined images that have undergone white balancing, correcting for color casts caused by varying lighting conditions and ensuring a more consistent, natural color representation. We explored whether white balancing, alongside ESRGAN refinement, could enhance the model’s ability to accurately classify image features. (A preprocessing sketch for the contrast-enhancement and white-balancing steps follows this list.)
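Below is a minimal sketch of the contrast-enhancement and white-balancing stages, assuming OpenCV. The ESRGAN stage is omitted here (it requires a separate pre-trained network), and the CLAHE clip limit, tile size, and gray-world white-balance method are illustrative assumptions, not necessarily the paper's exact settings.

```python
# Sketch of the CLAHE and white-balancing steps (parameter values assumed).
import cv2
import numpy as np

def apply_clahe(bgr: np.ndarray) -> np.ndarray:
    """Contrast-limited adaptive histogram equalization on the L channel."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # assumed values
    return cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

def gray_world_wb(bgr: np.ndarray) -> np.ndarray:
    """Gray-world white balance: scale each channel toward the global mean."""
    img = bgr.astype(np.float32)
    mean_per_channel = img.reshape(-1, 3).mean(axis=0)
    img *= mean_per_channel.mean() / mean_per_channel  # per-channel gain
    return np.clip(img, 0, 255).astype(np.uint8)

# Usage (hypothetical input path, applied after ESRGAN refinement):
img = cv2.imread("esrgan_refined_tile.jpg")
contrast_enhanced = apply_clahe(img)   # ESRGAN + CLAHE pipeline
white_balanced = gray_world_wb(img)    # ESRGAN + WB pipeline
```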
3.4. Feature Extraction
3.4.1. VGG-16 Architecture
3.4.2. Feature Extraction Process
3.5. Classification Models
3.5.1. Random Forest
- Unpruned Trees: Trees were allowed to grow without constraints on the maximum number of levels, promoting flexibility in capturing complex decision boundaries.
- Minimum Split and Leaf Nodes: The minimum number of data points required in a node before it can be split was set to 2, and the minimum allowed data points in a terminal leaf node was set to 1. These parameters prevent overfitting by ensuring a minimal level of information in each split and leaf.
- Gini Impurity: The Gini index was employed as the criterion for selecting the best splitting feature at each node. The Gini index measures the level of impurity within a node: a perfectly homogeneous node (all data points belong to one class) has a Gini index of 0. The model greedily selects the feature that best separates the data into distinct classes, minimizing the overall Gini impurity.
- Number of Trees: We opted for 50 trees in the random forest ensemble. While increasing the number of trees can further enhance performance, we found that 50 trees provided a good balance between accuracy and computational efficiency for our dataset size. (A configuration sketch follows this list.)
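The settings above map directly onto scikit-learn's random forest. Here is a minimal sketch, assuming scikit-learn, where the Gini impurity of a node t is Gini(t) = 1 − Σₖ pₖ², with pₖ the fraction of samples of class k in the node; the fixed random seed is an added assumption for reproducibility.

```python
# Sketch of the random forest configuration described above (scikit-learn).
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=50,      # 50 trees, as chosen above
    max_depth=None,       # unpruned trees: no limit on tree depth
    min_samples_split=2,  # minimum data points required to split a node
    min_samples_leaf=1,   # minimum data points allowed in a terminal leaf
    criterion="gini",     # Gini impurity: Gini(t) = 1 - sum_k p_k^2
    random_state=42,      # assumption: fixed seed for reproducibility
)
# rf.fit(train_features, train_labels)       # VGG-16 feature vectors
# predictions = rf.predict(test_features)
```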
3.5.2. Support Vector Machine (SVM)
- Radial Basis Function (RBF) Kernel: We employed the RBF kernel, which is a popular choice for non-linear SVM applications. The RBF kernel allows the SVM to capture complex, non-linear relationships between the features in the data, making it suitable for effectively separating the image classes.
- Regularization Parameter (C): This parameter controls the trade-off between achieving a smooth decision boundary and accurately classifying all training data points. A higher C value prioritizes the strict classification of training points, which can lead to a more complex decision boundary and an increased risk of overfitting. In our case, we carefully tuned the C parameter to a value of 14.5, striking a balance between these competing factors (see the sketch below).
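A minimal sketch of this SVM configuration, assuming scikit-learn; the kernel coefficient gamma is left at its default ("scale"), which is an assumption, since the text does not report it.

```python
# Sketch of the SVM configuration described above (scikit-learn).
from sklearn.svm import SVC

svm = SVC(
    kernel="rbf",  # radial basis function kernel for non-linear boundaries
    C=14.5,        # regularization strength tuned as described above
)
# svm.fit(train_features, train_labels)
# predictions = svm.predict(test_features)
```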
3.5.3. Extreme Gradient Boosting (XGBoost)
- Learning Rate (0.3): This parameter controls the step size taken in each boosting iteration. Lower learning rates make smaller, more conservative adjustments to the model at each step, which helps prevent overfitting; we retained the commonly used default value of 0.3.
- Number of Booster Rounds (100): This parameter determines the number of trees included in the final ensemble. We opted for 100 trees, striking a balance between achieving high accuracy and maintaining computational efficiency.
- Max Tree Depth (6): This parameter limits the maximum depth of each individual tree within the ensemble. Limiting the depth helps control model complexity and reduces the risk of overfitting.
- Subsample (1): This parameter specifies the proportion of training data used to fit each tree. In our case, we used the entire dataset (subsample = 1) to maximize the information available to each tree during training. (A configuration sketch follows this list.)
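A minimal sketch of this configuration, assuming the XGBoost scikit-learn wrapper; the multi-class softmax objective is an assumption inferred from the seven-species task rather than a setting reported in the text.

```python
# Sketch of the XGBoost configuration described above (xgboost sklearn API).
from xgboost import XGBClassifier

xgb = XGBClassifier(
    learning_rate=0.3,           # step size taken in each boosting iteration
    n_estimators=100,            # number of booster rounds / trees
    max_depth=6,                 # maximum depth of each individual tree
    subsample=1.0,               # fraction of training data per tree (full set)
    objective="multi:softprob",  # assumption: softmax objective for 7 classes
)
# xgb.fit(train_features, train_labels)
# predictions = xgb.predict(test_features)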
3.5.4. VGG-16 Neural Network Classifier
3.6. Performance Evaluation
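As a reference for how the metrics reported below (accuracy, precision, recall, F1 score) can be computed, here is a minimal sketch assuming scikit-learn; macro averaging over the seven species is an assumption, since the text does not state the averaging mode.

```python
# Sketch of the performance evaluation (scikit-learn; macro average assumed).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred) -> dict:
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro")  # averaged over the seven species
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}
```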
4. Results
4.1. Performance on Base Image Dataset
4.2. Performance on Preprocessed Datasets
4.2.1. Base Preprocessed ESRGAN Dataset
4.2.2. Base Preprocessed ESRGAN and CLAHE Dataset
4.2.3. Base Preprocessed ESRGAN and White Balancing (WB) Dataset
4.3. Comparative Analysis
4.4. Confusion Matrices
- Base + ESRGAN + WB: The SVM classifier exhibits the lowest misclassification rate with only 6 incorrect classifications, followed by random forest (11) and XGBoost (15). The full VGG-16 model misclassifies 16 images. This dataset enhancement, combining ESRGAN and white balancing, consistently demonstrates the lowest misclassification rates across all models.
- Base + ESRGAN + CLAHE: The SVM classifier shows 7 misclassifications, while random forest and XGBoost misclassify 35 and 22 images, respectively. The VGG-16 classifier misclassifies 15 images. This scenario highlights the potential impact of CLAHE on classifier decision boundaries, leading to a higher number of misclassifications compared to the ESRGAN and white balancing combination.
- Base + ESRGAN: The SVM classifier misclassifies 9 images, while random forest and XGBoost misclassify 28 and 23 images, respectively. The VGG-16 classifier misclassifies 28 images. This suggests that while ESRGAN improves feature extraction, additional preprocessing steps like white balancing contribute to more consistent classification accuracy.
- Base Dataset: The SVM classifier has 17 misclassifications, while random forest and XGBoost misclassify 27 and 22 images, respectively. This highlights the significant reduction in misclassifications achieved through dataset enhancement techniques such as ESRGAN, CLAHE, and white balancing, and the corresponding improvement in overall model performance. (See the sketch below for how these counts follow from the confusion matrices.)
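The misclassification counts discussed above are simply the off-diagonal mass of each model's confusion matrix. A minimal sketch, assuming scikit-learn:

```python
# Sketch: misclassification count as the off-diagonal sum of a confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

def misclassification_count(y_true, y_pred) -> int:
    cm = confusion_matrix(y_true, y_pred)
    return int(cm.sum() - np.trace(cm))  # total predictions minus correct ones
```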
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Qin, T.; Wang, L.; Zhou, Y.; Guo, L.; Jiang, G.; Zhang, L. Digital technology-and-services-driven sustainable transformation of agriculture: Cases of China and the EU. Agriculture 2022, 12, 297. [Google Scholar] [CrossRef]
- Shahi, T.B.; Dahal, S.; Sitaula, C.; Neupane, A.; Guo, W. Deep Learning-Based Weed Detection Using UAV Images: A Comparative Study. Drones 2023, 7, 624. [Google Scholar] [CrossRef]
- Lee, S.H.; Chan, C.S.; Mayo, S.J.; Remagnino, P. How deep learning extracts and learns leaf features for plant classification. Pattern Recognit. 2017, 71, 1–13. Available online: https://www.sciencedirect.com/science/article/pii/S003132031730198X (accessed on 7 October 2024). [CrossRef]
- Geetharamani, G.; Pandian, A. Identification of plant leaf diseases using a nine-layer deep convolutional neural network. Comput. Electr. Eng. 2019, 76, 323–338. [Google Scholar] [CrossRef]
- Saleem, M.H.; Potgieter, J.; Arif, K.M. Plant Disease Detection and Classification by Deep Learning. Plants 2019, 8, 468. [Google Scholar] [CrossRef]
- Zhang, Z.; Zhu, L. A Review on Unmanned Aerial Vehicle Remote Sensing: Platforms, Sensors, Data Processing Methods, and Applications. Drones 2023, 7, 398. [Google Scholar] [CrossRef]
- Nguyen, V.S.; Jung, J.; Jung, S.; Joe, S.; Kim, B. Deployable hook retrieval system for UAV rescue and delivery. IEEE Access 2021, 9, 74632–74645. [Google Scholar] [CrossRef]
- Li, X.; Tupayachi, J.; Sharmin, A.; Martinez Ferguson, M. Drone-Aided Delivery Methods, Challenge, and the Future: A Methodological Review. Drones 2023, 7, 191. [Google Scholar] [CrossRef]
- Loianno, G.; Kumar, V. Cooperative Transportation Using Small Quadrotors Using Monocular Vision and Inertial Sensing. IEEE Robot. Autom. Lett. 2017, 3, 680–687. [Google Scholar] [CrossRef]
- Mohsan, S.A.H.; Othman, N.Q.H.; Li, Y.; Alsharif, M.H.; Khan, M.A. Unmanned Aerial Vehicles (UAVs): Practical Aspects, Applications, Open Challenges, Security Issues, and Future Trends. Intell. Serv. Robot. 2023, 16, 109–137. [Google Scholar] [CrossRef]
- Jiang, Y.; Wei, Z.; Hu, G. Detection of Tea Leaf Blight in UAV Remote Sensing Images by Integrating Super-Resolution and Detection Networks. Environ. Monit. Assess. 2024, 196, 1–27. [Google Scholar] [CrossRef] [PubMed]
- Seifert, E.; Seifert, S.; Vogt, H.; Drew, D.; van Aardt, J.; Kunneke, A.; Seifert, T. Influence of Drone Altitude, Image Overlap, and Optical Sensor Resolution on Multi-View Reconstruction of Forest Images. Remote Sens. 2019, 11, 1252. [Google Scholar] [CrossRef]
- Bongomin, O.; Lamo, J.; Guina, J.M.; Okello, C.; Ocen, G.G.; Obura, M.; Alibu, S.; Owino, C.A.; Akwero, A.; Ojok, S. UAV Image Acquisition and Processing for High-Throughput Phenotyping in Agricultural Research and Breeding Programs. Plant Phenome J. 2024, 7, e20096. [Google Scholar] [CrossRef]
- Chen, J.; Chen, Z.; Huang, R.; You, H.; Han, X.; Yue, T.; Zhou, G. The Effects of Spatial Resolution and Resampling on the Classification Accuracy of Wetland Vegetation Species and Ground Objects: A Study Based on High Spatial Resolution UAV Images. Drones 2023, 7, 61. [Google Scholar] [CrossRef]
- Šulc, M.; Matas, J. Fine-Grained Recognition of Plants from Images. Plant Methods 2017, 13, 1–14. [Google Scholar] [CrossRef] [PubMed]
- Zali, S.-A.; Mat-Desa, S.; Che-Embi, Z.; Mohd-Isa, W.-N. Post-Processing for Shadow Detection in Drone-Acquired Images Using U-Net. Future Internet 2022, 14, 231. [Google Scholar] [CrossRef]
- Jonak, M.; Mucha, J.; Jezek, S.; Kovac, D.; Cziria, K. SPAGRI-AI: Smart Precision Agriculture Dataset of Aerial Images at Different Heights for Crop and Weed Detection Using Super-Resolution. Agric. Syst. 2024, 216, 103876. [Google Scholar] [CrossRef]
- Ye, Z.; Wei, J.; Lin, Y.; Guo, Q.; Zhang, J.; Zhang, H.; Deng, H.; Yang, K. Extraction of olive crown based on UAV visible images and the U2-Net deep learning model. Remote Sens. 2022, 14, 1523. [Google Scholar] [CrossRef]
- Modak, S.; Heil, J.; Stein, A. Pan sharpening low-altitude multispectral images of potato plants using a generative adversarial network. Remote Sens. 2024, 16, 874. [Google Scholar] [CrossRef]
- Kusnandar, T.; Surendra, K. Camera-Based Vegetation Index from Unmanned Aerial Vehicles. In Proceedings of the 6th International Conference on Sustainable Information Engineering and Technology, Malang, Indonesia, 13–14 September 2021; pp. 173–178. [Google Scholar]
- Pandey, A.; Jain, K. An intelligent system for crop identification and classification from UAV images using conjugated dense convolutional neural network. Comput. Electron. Agric. 2022, 192, 106543. [Google Scholar] [CrossRef]
- Reedha, R.; Dericquebourg, E.; Canals, R.; Hafiane, A. Transformer Neural Network for Weed and Crop Classification of High-Resolution UAV Images. Remote Sens. 2022, 14, 592. [Google Scholar] [CrossRef]
- Bouguettaya, A.; Zarzour, H.; Kechida, A.; Taberkit, A.M. Deep learning techniques to classify agricultural crops through UAV imagery: A review. Neural Comput. Appl. 2022, 34, 9511–9536. [Google Scholar] [CrossRef]
- Weiss, K.; Khoshgoftaar, T.M.; Wang, D.D. A survey of transfer learning. J. Big Data 2016, 3, 1345–1459. [Google Scholar] [CrossRef]
- Al Sahili, Z.; Awad, M. The power of transfer learning in agricultural applications: Agrinet. Front. Plant Sci. 2022, 13, 992700. [Google Scholar] [CrossRef] [PubMed]
- Siddharth, T.; Kirar, B.S.; Agrawal, D.K. Plant species classification using transfer learning by pre-trained classifier VGG-19. arXiv 2022, arXiv:2209.03076. [Google Scholar]
- Sagi, O.; Rokach, L. Ensemble learning: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1249. [Google Scholar] [CrossRef]
- Tariku, G.; Ghiglieno, I.; Gilioli, G.; Gentilin, F.; Armiraglio, S.; Serina, I. Automated identification and classification of plant species in heterogeneous plant areas using unmanned aerial vehicle-collected RGB images and transfer learning. Drones 2023, 7, 599. [Google Scholar] [CrossRef]
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Change Loy, C. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Mishra, A. Contrast limited adaptive histogram equalization (CLAHE) approach for enhancement of the microstructures of friction stir welded joints. arXiv 2021, arXiv:2109.00886. [Google Scholar]
- Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Bekaert, P. Color balance and fusion for underwater image enhancement. IEEE Trans. Image Process. 2017, 27, 379–393. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Suthaharan, S. Support Vector Machine. In Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning; Suthaharan, S., Ed.; Springer: Boston, MA, USA, 2016; pp. 207–235. [Google Scholar] [CrossRef]
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
- Yang, Y.; Newsam, S. Bag-Of-Visual-Words and Spatial Extensions for Land-Use Classification. In Proceedings of the ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27 June–28 July 2016; pp. 770–778. Available online: https://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html (accessed on 22 April 2023).
- Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. arXiv 2018, arXiv:1608.06993. [Google Scholar] [CrossRef]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27 June–28 July 2016; pp. 2818–2826. Available online: https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Szegedy_Rethinking_the_Inception_CVPR_2016_paper.html (accessed on 22 April 2023).
- Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv 2019, arXiv:1801.04381. [Google Scholar] [CrossRef]
| Input Image Dataset Type | Hybrid Classification Model | Accuracy (%) | F1 Score | Recall | Precision |
|---|---|---|---|---|---|
| Image dataset (base) | VGG-16 + RF | 87.59 | 0.874 | 0.875 | 0.877 |
| | VGG-16 + SVM | 95.02 | 0.961 | 0.960 | 0.962 |
| | VGG-16 + XGBoost | 86.20 | 0.862 | 0.860 | 0.870 |
| | Full VGG-16 model | 85.03 | 0.853 | 0.851 | 0.845 |
| Base preprocessed + ESRGAN | VGG-16 + RF | 89.72 | 0.895 | 0.896 | 0.891 |
| | VGG-16 + SVM | 96.71 | 0.960 | 0.965 | 0.961 |
| | VGG-16 + XGBoost | 91.60 | 0.912 | 0.911 | 0.922 |
| | Full VGG-16 model | 89.78 | 0.897 | 0.896 | 0.899 |
| Base preprocessed + ESRGAN and CLAHE | VGG-16 + RF | 93.79 | 0.939 | 0.938 | 0.941 |
| | VGG-16 + SVM | 97.44 | 0.974 | 0.975 | 0.975 |
| | VGG-16 + XGBoost | 92.91 | 0.921 | 0.924 | 0.920 |
| | Full VGG-16 model | 93.79 | 0.939 | 0.938 | 0.939 |
| Base preprocessed + ESRGAN and WB | VGG-16 + RF | 95.25 | 0.953 | 0.953 | 0.954 |
| | VGG-16 + SVM | 97.88 | 0.978 | 0.978 | 0.979 |
| | VGG-16 + XGBoost | 94.52 | 0.946 | 0.946 | 0.948 |
| | Full VGG-16 model | 94.16 | 0.943 | 0.947 | 0.943 |
| Input Image Dataset Type | Hybrid Classification Model | Accuracy (%) | F1 Score | Recall | Precision |
|---|---|---|---|---|---|
| Image dataset (base) | VGG-16 + SVM | 95.02 | 0.947 | 0.956 | 0.950 |
| Base preprocessed + ESRGAN | VGG-16 + SVM | 96.66 | 0.966 | 0.968 | 0.966 |
| Base preprocessed + ESRGAN and CLAHE | VGG-16 + SVM | 98.33 | 0.983 | 0.984 | 0.983 |
| Base preprocessed + ESRGAN and WB | VGG-16 + SVM | 97.67 | 0.975 | 0.979 | 0.976 |
| Model | Training Time (min) | Inference Time (s) | Testing Time (s) | Memory Usage (GB) | Accuracy (%) | F1 Score |
|---|---|---|---|---|---|---|
| VGG-16 (Full) | 15 | 27.01 | 28.61 | 4 | 89.71 | 0.85 |
| ResNet50 | 25 | 16.34 | 18.04 | 6 | 60.2 | 0.59 |
| InceptionV3 | 18 | 11.97 | 14.58 | 5 | 91.5 | 0.91 |
| DenseNet121 | 10 | 10.20 | 11.83 | 3 | 95.3 | 0.95 |
| EfficientNet-B0 | 12 | 4.40 | 5.98 | 2.5 | 94.1 | 0.94 |
| MobileNetV2 | 8 | 6.42 | 7.99 | 2 | 92.8 | 0.93 |
| VGG-16 Feature Extraction + SVM | <2 | 1.74 | 2.88 | 1 | 97.4 | 0.97 |
| VGG-16 Feature Extraction + RF | <2 | 0.60 | 1.74 | 1 | 93.7 | 0.93 |
| VGG-16 Feature Extraction + XGBoost | <2 | 0.30 | 1.47 | 1 | 92.7 | 0.92 |