Veg-DenseCap: Dense Captioning Model for Vegetable Leaf Disease Images
Abstract
1. Introduction
2. Related Work
2.1. Image Captioning
2.2. Target Detection Model
2.3. Vegetable Disease Recognition
3. Materials and Methods
3.1. Datasets
3.1.1. Data Acquisition and Labeling
3.1.2. Data Preprocessing
3.2. Construction of Veg-DenseCap
- (1) First, the disease detector extracts lesion features from the image using a feature extraction network and a region proposal network.
- (2) Second, the extracted lesion features are fed to two branches: bounding box regression and text generation.
- (3) Finally, each bounding box is matched with its generated description sentences and both are displayed on the image, so that the user can judge from the generated descriptions whether the detected lesion features are consistent with human semantic understanding.
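The three steps above can be sketched as a small Python pipeline. The region extractor, box refiner and caption generator are passed in as functions; `extract_regions`, `refine_box`, `describe`, the `min_score` cut-off and the toy stand-ins are hypothetical names chosen for illustration, not the authors' code. The point of the sketch is only to show how one region feature feeds both branches so that each box stays paired with its own sentence.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class LesionCaption:
    box: Box
    score: float
    caption: str


def dense_caption(
    image: Any,
    extract_regions: Callable[[Any], Iterable[Tuple[Box, Any, float]]],  # step (1), hypothetical
    refine_box: Callable[[Box, Any], Box],                               # step (2), box branch
    describe: Callable[[Any], str],                                      # step (2), text branch
    min_score: float = 0.5,
) -> List[LesionCaption]:
    results = []
    for proposal, feature, score in extract_regions(image):
        if score < min_score:
            continue  # drop low-confidence proposals
        # Both branches read the same region feature, so each box keeps its own caption
        results.append(LesionCaption(refine_box(proposal, feature), score, describe(feature)))
    return sorted(results, key=lambda r: r.score, reverse=True)


def render(results: List[LesionCaption]) -> List[str]:
    """Step (3): pair every box with its sentence for display."""
    return [
        f"[{i}] box={tuple(round(v) for v in r.box)} score={r.score:.2f}: {r.caption}"
        for i, r in enumerate(results, start=1)
    ]


# Toy stand-ins for the trained networks (hypothetical, for illustration only)
def toy_regions(image):
    return [((10, 10, 50, 40), "feat-a", 0.92), ((60, 5, 90, 30), "feat-b", 0.31)]


def toy_refine(box, feature):
    return box


def toy_describe(feature):
    return {"feat-a": "White powdery disease spots.", "feat-b": "Small round-like yellow spots."}[feature]


lines = render(dense_caption(None, toy_regions, toy_refine, toy_describe))
print("\n".join(lines))  # [1] box=(10, 10, 50, 40) score=0.92: White powdery disease spots.
```

Keeping the captioner a function of the region feature, rather than of the whole image, is what makes the box/sentence matching in step (3) trivial.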
3.2.1. Disease Detector
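Step (1) mentions a region proposal network and step (2) a bounding box regression branch. Assuming the detector follows the usual Faster R-CNN [27] conventions (not confirmed in this section), box regression predicts center and size offsets (dx, dy, dw, dh) relative to an anchor or proposal, and overlapping proposals are pruned with non-maximum suppression [25]. The sketch below shows both operations in plain Python; the clipping constant and the IoU threshold of 0.7 are common defaults in Faster R-CNN implementations, not values taken from this paper.

```python
import math

# Common cap on dw/dh used by Faster R-CNN implementations to avoid exp() overflow
BBOX_CLIP = math.log(1000.0 / 16)


def decode_box(anchor, deltas):
    """Apply (dx, dy, dw, dh) offsets to an (x1, y1, x2, y2) anchor or proposal."""
    x1, y1, x2, y2 = anchor
    wa, ha = x2 - x1, y2 - y1
    xa, ya = x1 + 0.5 * wa, y1 + 0.5 * ha          # anchor center
    dx, dy, dw, dh = deltas
    dw, dh = min(dw, BBOX_CLIP), min(dh, BBOX_CLIP)
    xc, yc = xa + dx * wa, ya + dy * ha             # shifted center
    w, h = wa * math.exp(dw), ha * math.exp(dh)     # log-space size change
    return (xc - 0.5 * w, yc - 0.5 * h, xc + 0.5 * w, yc + 0.5 * h)


def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.7):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


proposals = [
    decode_box((0, 0, 10, 10), (0.0, 0.0, 0.0, 0.0)),
    decode_box((0, 0, 10, 10), (0.05, 0.05, 0.0, 0.0)),  # slightly shifted duplicate
    decode_box((20, 20, 30, 30), (0.0, 0.0, 0.0, 0.0)),
]
keep = nms(proposals, [0.9, 0.8, 0.7], iou_thr=0.5)
print(keep)  # [0, 2]
```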
3.2.2. Language Generator
3.2.3. Loss Function
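The ablation study lists Focal-Loss as the component added in Scheme-4, but the loss formulation itself is not reproduced in this section. As a point of reference, the sketch below implements the standard binary focal loss of Lin et al. (RetinaNet), FL(p_t) = -α_t (1 - p_t)^γ log(p_t), with the commonly used defaults α = 0.25 and γ = 2. Which head of Veg-DenseCap applies it, and with which α and γ, are assumptions not confirmed here.

```python
import math


def focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss.

    probs: predicted foreground probabilities p in (0, 1)
    labels: 1 for foreground (lesion), 0 for background
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
        p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def cross_entropy(probs, labels, eps=1e-7):
    """Plain binary cross-entropy: focal loss with gamma = 0 and no alpha weighting."""
    return 2.0 * focal_loss(probs, labels, alpha=0.5, gamma=0.0, eps=eps)


# An easy (well-classified) and a hard lesion example, both with label 1
ratio_easy = focal_loss([0.95], [1]) / cross_entropy([0.95], [1])
ratio_hard = focal_loss([0.30], [1]) / cross_entropy([0.30], [1])
print(f"easy: {ratio_easy:.6f}  hard: {ratio_hard:.4f}")
```

The modulating factor (1 - p_t)^γ shrinks the loss of the easy example far more than that of the hard one, which is the usual motivation for focal loss when many background proposals are easy to classify.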
4. Experiment and Analysis
4.1. Evaluation Indicators
4.2. Quantitative Analysis
4.2.1. Comparison between Different Models
4.2.2. Ablation Test
4.2.3. Comparison between Different Attention Mechanisms
4.3. Qualitative Analysis
5. Discussions
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Dong, J.; Gruda, N.; Li, X.; Cai, Z.; Zhang, L.; Duan, Z. Global vegetable supply towards sustainable food production and a healthy diet. J. Clean. Prod. 2022, 369, 133212.
2. Głąbska, D.; Guzek, D.; Groele, B.; Gutkowska, K. Fruit and vegetable intake and mental health in adults: A systematic review. Nutrients 2020, 12, 115.
3. National Bureau of Statistics of China. 2021 China Statistical Yearbook; China Statistics Publishing Society: Beijing, China, 2021.
4. Ma, J.; Pang, S.; Yang, B.; Zhu, J.; Li, Y. Spatial-content image search in complex scenes. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 2503–2511.
5. Radenović, F.; Tolias, G.; Chum, O. Fine-tuning CNN image retrieval with no human annotation. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 1655–1668.
6. Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Summers, R.M. TieNet: Text-image embedding network for common thorax disease classification and reporting in chest X-rays. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 9049–9058.
7. Liu, G.; Hsu, T.M.H.; McDermott, M.; Boag, W.; Weng, W.H.; Szolovits, P.; Ghassemi, M. Clinically accurate chest X-ray report generation. In Proceedings of the Machine Learning for Healthcare Conference, PMLR, Ann Arbor, MI, USA, 9–10 August 2019; pp. 249–269.
8. Wang, Y.; Mei, T.; Gong, S.; Hua, X.S. Combining global, regional and contextual features for automatic image annotation. Pattern Recognit. 2009, 42, 259–266.
9. Burdescu, D.D.; Mihai, C.G.; Stanescu, L.; Brezovan, M. Automatic image annotation and semantic based image retrieval for medical domain. Neurocomputing 2013, 109, 33–48.
10. Gu, C.; Bu, J.; Zhou, X.; Yao, C.; Ma, D.; Yu, Z.; Yan, X. Cross-modal image retrieval with deep mutual information maximization. Neurocomputing 2022, 496, 166–177.
11. Zeng, X.; Wen, L.; Liu, B.; Qi, X. Deep learning for ultrasound image caption generation based on object detection. Neurocomputing 2020, 392, 132–141.
12. Liu, H.; Peng, L.; Xie, Y.; Li, X.; Bi, D.; Zou, Y.; Lin, Y.; Zhang, P.; Li, G. Describe like a pathologist: Glomerular immunofluorescence image caption based on hierarchical feature fusion attention network. Expert Syst. Appl. 2023, 213, 119168.
13. Yang, X.; Chen, R.; Zhang, F.; Zhang, L.; Fan, X.; Ye, Q.; Fu, L. Pixel-level automatic annotation for forest fire image. Eng. Appl. Artif. Intell. 2021, 104, 104353.
14. Mamat, N.; Othman, M.F.; Abdulghafor, R.; Alwan, A.A.; Gulzar, Y. Enhancing image annotation technique of fruit classification using a deep learning approach. Sustainability 2023, 15, 901.
15. Fuentes, A.; Yoon, S.; Park, D.S. Deep learning-based phenotyping system with glocal description of plant anomalies and symptoms. Front. Plant Sci. 2019, 10, 1321.
16. Vinyals, O.; Toshev, A.; Bengio, S.; Erhan, D. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3156–3164.
17. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
18. Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhudinov, R.; Zemel, R.; Bengio, Y. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 2048–2057.
19. Huang, L.; Wang, W.; Chen, J.; Wei, X.Y. Attention on attention for image captioning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4634–4643.
20. Cornia, M.; Stefanini, M.; Baraldi, L.; Cucchiara, R. Meshed-memory transformer for image captioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10578–10587.
21. Johnson, J.; Karpathy, A.; Fei-Fei, L. DenseCap: Fully convolutional localization networks for dense captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4565–4574.
22. Yang, L.; Tang, K.; Yang, J.; Li, L.J. Dense captioning with joint inference and visual context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2193–2202.
23. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
24. Uijlings, J.R.; Van De Sande, K.E.; Gevers, T.; Smeulders, A.W. Selective search for object recognition. Int. J. Comput. Vis. 2013, 104, 154–171.
25. Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; IEEE: Piscataway, NJ, USA, 2006; Volume 3, pp. 850–855.
26. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
27. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28.
28. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
29. Picon, A.; Seitz, M.; Alvarez-Gila, A.; Mohnke, P.; Ortiz-Barredo, A.; Echazarra, J. Crop conditional convolutional neural networks for massive multi-crop plant disease classification over cell phone acquired images taken on real field conditions. Comput. Electron. Agric. 2019, 167, 105093.
30. Zhao, Y.; Sun, C.; Xu, X.; Chen, J. RIC-Net: A plant disease classification model based on the fusion of Inception and residual structure and embedded attention mechanism. Comput. Electron. Agric. 2022, 193, 106644.
31. Gulzar, Y. Fruit image classification model based on MobileNetV2 with deep transfer learning technique. Sustainability 2023, 15, 1906.
32. Zhang, K.; Wu, Q.; Chen, Y. Detecting soybean leaf disease from synthetic image using multi-feature fusion faster R-CNN. Comput. Electron. Agric. 2021, 183, 106064.
33. Li, J.; Qiao, Y.; Liu, S.; Zhang, J.; Yang, Z.; Wang, M. An improved YOLOv5-based vegetable disease detection method. Comput. Electron. Agric. 2022, 202, 107345.
34. Li, S.; Li, K.; Qiao, Y.; Zhang, L. A multi-scale cucumber disease detection method in natural scenes based on YOLOv5. Comput. Electron. Agric. 2022, 202, 107363.
35. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
Disease Category | Quantity | Disease Category | Quantity |
---|---|---|---|
Tomato powdery mildew | 800 | Cucumber powdery mildew | 772 |
Tomato early blight | 180 | Cucumber target spot | 87 |
Tomato late blight | 2321 | Cucumber downy mildew | 542 |
Tomato leaf mold | 63 | Cucumber grey mold | 26 |
Tomato gray mold | 157 | Cucumber anthracnose | 121 |
Mixed diseases | 21 | | |
Text Description | Text Description |
---|---|
1. There are white round powdery spots on the edge of leaf vein. 2. White powdery disease spots. 3. There are white disease spots near the leaf vein. 4. Tomato leaf with tomato powdery mildew. | 1. Brown spots appear on the leaf, with a light green outer ring. 2. Brown disease spots with a light green outer ring appear on the leaf edge. 3. Tomato leaf with late blight. |
1. There are grayish brown thread-like disease spots on the leaf edge. 2. There are grayish white irregular disease spots on the leaf edge. 3. Tomato leaf, suspected to have tomato early blight. | 1. Strip-shape yellow bright spots. 2. Small round-like yellow spots. 3. Small quasi-circular yellow bright spots. 4. Spots with a yellow color and a strip shape. 5. Cucumber leaf that is suspected to have cucumber target spot disease. |
1. Irregularly shaped disease spots with a faded green color. 2. Yellowish green disease spots with an irregular shape. 3. Round-like faded green disease spots with obvious edges. 4. Tomato leaf that is suspected to have tomato leaf mildew. | 1. Irregular yellowish brown disease spots, with a yellow ring around the spot. 2. Quasi-circular earthy yellow disease spots, with a yellow ring on the edge of the spot. 3. Cucumber leaf that is suspected to have cucumber anthracnose. |
1. There are gray hairy disease spots at the leaf tip. 2. Cucumber leaf that is suspected to have cucumber gray mold. | 1. Grayish brown disease spots with obvious gray hairs on the surface. 2. Tomato gray mold grows on the tomato leaf. |
1. The leaf develops white irregular powdery disease spots. 2. There are white powdery disease spots on the leaf. 3. There are irregular white powdery disease spots on the leaf. 4. White irregular powdery disease spots. 5. Cucumber leaf with multiple disease spots of cucumber powdery mildew. | 1. The leaf edge is distributed with irregularly shaped disease spots, which are brown at the center area and faded green near the edge. 2. Round white powdery disease spots. 3. White powdery disease spots. 4. Tomato leaf that is suspected to have tomato powdery mildew and tomato late blight. |
1. There are polygonal disease spots near the leaf edge, which are of a yellowish color. 2. Yellowish irregular disease spots. 3. Irregular disease spots, with a brown color at the center and a yellowish color at the upper and lower edges. 4. There are brown and irregular disease spots near the leaf vein. 5. Irregularly shaped downy mildew disease spots in yellow and brown colors. 6. There are many disease spots of cucumber downy mildew on the front side of cucumber leaf. | |
Disease Category | Quantity | Disease Category | Quantity |
---|---|---|---|
Tomato powdery mildew | 1600 | Cucumber powdery mildew | 1544 |
Tomato early blight | 1260 | Cucumber target spot | 609 |
Tomato late blight | 2321 | Cucumber downy mildew | 2168 |
Tomato leaf mold | 440 | Cucumber grey mold | 182 |
Tomato gray mold | 1098 | Cucumber anthracnose | 847 |
Mixed diseases | 146 | | |
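The two quantity tables list the same 11 categories. Assuming the second reports counts after the preprocessing of Section 3.1.2 (the tables carry no captions here, so this is an inference), the short script below copies both sets of counts and computes per-category expansion factors, totals and the ratio between the largest and smallest category. From these numbers, tomato late blight (the largest category) is unchanged, powdery mildew doubles, cucumber downy mildew grows fourfold, and the smaller categories grow roughly sevenfold.

```python
# Per-category image counts copied from the two quantity tables
original = {
    "Tomato powdery mildew": 800, "Tomato early blight": 180, "Tomato late blight": 2321,
    "Tomato leaf mold": 63, "Tomato gray mold": 157,
    "Cucumber powdery mildew": 772, "Cucumber target spot": 87, "Cucumber downy mildew": 542,
    "Cucumber grey mold": 26, "Cucumber anthracnose": 121, "Mixed diseases": 21,
}
expanded = {
    "Tomato powdery mildew": 1600, "Tomato early blight": 1260, "Tomato late blight": 2321,
    "Tomato leaf mold": 440, "Tomato gray mold": 1098,
    "Cucumber powdery mildew": 1544, "Cucumber target spot": 609, "Cucumber downy mildew": 2168,
    "Cucumber grey mold": 182, "Cucumber anthracnose": 847, "Mixed diseases": 146,
}


def imbalance_ratio(counts):
    """Largest category size divided by smallest category size."""
    return max(counts.values()) / min(counts.values())


for name, before in original.items():
    after = expanded[name]
    print(f"{name:<25}{before:>6}{after:>6}   x{after / before:.2f}")
print("Total:", sum(original.values()), "->", sum(expanded.values()))  # Total: 5090 -> 12215
print(f"Imbalance ratio: {imbalance_ratio(original):.1f} -> {imbalance_ratio(expanded):.1f}")  # 110.5 -> 15.9
```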
Model | mAP (%) | Det_mAP (%) |
---|---|---|
FCLN | 78.9 | 82.7 |
JIVC | 52.9 | 53.1 |
Fuentes et al. [15] | 56.6 | 56.1 |
Veg-DenseCap | 88.0 | 91.7 |
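The metrics mAP and Det_mAP are not defined in this section. Assuming they follow the DenseCap protocol used for FCLN [21], mAP averages average precision over IoU thresholds {0.3, 0.4, 0.5, 0.6, 0.7} and caption-similarity (METEOR) thresholds {0, 0.05, 0.1, 0.15, 0.2, 0.25}, while Det_mAP is assumed here to ignore the captions and score localization only. The following simplified single-image sketch takes precomputed IoU and caption-similarity matrices as input (computing METEOR itself is out of scope); a full evaluation would pool ranked predictions over the whole test set, and the exact matching rules of the paper may differ.

```python
# Threshold grids from DenseCap (Johnson et al.); assumed, not confirmed, for this paper
IOU_THRESHOLDS = (0.3, 0.4, 0.5, 0.6, 0.7)
LANG_THRESHOLDS = (0.0, 0.05, 0.1, 0.15, 0.2, 0.25)


def greedy_match(scores, iou, lang_sim, iou_thr, lang_thr):
    """Return 1/0 (TP/FP) per prediction for one (IoU, language) threshold pair.

    iou[i][j] and lang_sim[i][j] compare prediction i with ground-truth region j;
    lang_sim is a precomputed caption similarity such as METEOR.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    taken = set()
    tp = [0] * len(scores)
    for i in order:
        if not iou[i]:
            continue  # no ground truth in this image
        j = max(range(len(iou[i])), key=lambda k: iou[i][k])  # best-overlapping region
        if iou[i][j] >= iou_thr and j not in taken:
            taken.add(j)  # each ground-truth region can be matched once
            tp[i] = int(lang_sim[i][j] >= lang_thr)
    return tp


def average_precision(scores, tp, num_gt):
    """All-point interpolated AP over predictions ranked by confidence."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions, recalls = 0, [], []
    for rank, i in enumerate(order, start=1):
        hits += tp[i]
        precisions.append(hits / rank)
        recalls.append(hits / num_gt)
    ap, prev_recall = 0.0, 0.0
    for k in range(len(order)):
        ap += (recalls[k] - prev_recall) * max(precisions[k:])
        prev_recall = recalls[k]
    return ap


def dense_map(scores, iou, lang_sim, num_gt):
    """Mean AP over every (IoU, language) threshold pair."""
    aps = [average_precision(scores, greedy_match(scores, iou, lang_sim, t, s), num_gt)
           for t in IOU_THRESHOLDS for s in LANG_THRESHOLDS]
    return sum(aps) / len(aps)


def detection_map(scores, iou, num_gt):
    """Captions ignored: only localization decides TP/FP."""
    always = [[1.0] * len(row) for row in iou]
    aps = [average_precision(scores, greedy_match(scores, iou, always, t, 0.0), num_gt)
           for t in IOU_THRESHOLDS]
    return sum(aps) / len(aps)


scores = [0.9, 0.8]
iou = [[0.9, 0.1], [0.1, 0.9]]
lang_sim = [[0.5, 0.0], [0.0, 0.1]]  # second caption matches its region only weakly
print(dense_map(scores, iou, lang_sim, num_gt=2))  # 0.75
print(detection_map(scores, iou, num_gt=2))        # 1.0
```

In the toy case both boxes are well localized, so the detection score is perfect, while the weakly matching second caption fails the three strictest language thresholds and pulls the dense-captioning score down; this is consistent with Det_mAP exceeding mAP for every model in the table.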
Scheme | ResNet101 | FPN | CBAM | Focal-Loss |
---|---|---|---|---|
0 | - | - | - | - |
1 | √ | - | - | - |
2 | √ | √ | - | - |
3 | √ | √ | √ | - |
4 | √ | √ | √ | √ |
Model | mAP (%) | Det_mAP (%) |
---|---|---|
Scheme-0 | 81.3 | 85.2 |
Scheme-1 | 82.3 | 85.2 |
Scheme-2 | 85.2 | 89.4 |
Scheme-3 | 87.5 | 91.5 |
Scheme-4 | 88.0 | 91.7 |
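Reading the two ablation tables together, each scheme enables one more component than the previous one. The short script below copies the reported scores and computes the gain attributable to each added component. Scheme-0's backbone is not specified here, so the ResNet101 step is a change of backbone rather than a pure addition.

```python
# (scheme, components enabled, mAP %, Det_mAP %), copied from the ablation tables
SCHEMES = [
    ("Scheme-0", (), 81.3, 85.2),
    ("Scheme-1", ("ResNet101",), 82.3, 85.2),
    ("Scheme-2", ("ResNet101", "FPN"), 85.2, 89.4),
    ("Scheme-3", ("ResNet101", "FPN", "CBAM"), 87.5, 91.5),
    ("Scheme-4", ("ResNet101", "FPN", "CBAM", "Focal-Loss"), 88.0, 91.7),
]


def incremental_gains(rows):
    """Map each newly enabled component to its (delta mAP, delta Det_mAP) in points."""
    gains = {}
    for prev, cur in zip(rows, rows[1:]):
        added = [c for c in cur[1] if c not in prev[1]]
        assert len(added) == 1, "each scheme should add exactly one component"
        gains[added[0]] = (round(cur[2] - prev[2], 1), round(cur[3] - prev[3], 1))
    return gains


gains = incremental_gains(SCHEMES)
for component, (d_map, d_det) in gains.items():
    print(f"{component:<11} mAP {d_map:+.1f}  Det_mAP {d_det:+.1f}")
```

From the reported numbers, FPN contributes the largest single gain (+2.9 mAP, +4.2 Det_mAP), followed by CBAM (+2.3, +2.1), while Focal-Loss adds +0.5 and +0.2; the full model improves on Scheme-0 by 6.7 mAP points.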
Attention Mechanism | mAP (%) | Det_mAP (%) |
---|---|---|
SE | 86.5 | 90.9 |
SimAM | 86.9 | 90.9 |
CA | 85.0 | 89.2 |
ACmix | 82.3 | 87.3 |
CBAM | 88.0 | 91.7 |
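CBAM [35] gives the best mAP and Det_mAP among the attention mechanisms compared above. To show what the module computes, here is a minimal NumPy sketch of its forward pass as described by Woo et al.: a channel attention map from average- and max-pooled descriptors passed through a shared two-layer MLP, followed by a spatial attention map from a 7 × 7 convolution over channel-wise average and max maps. The reduction ratio of 16 and kernel size of 7 are the CBAM paper's defaults; the random weights are placeholders, batch normalization is omitted, and where Veg-DenseCap inserts CBAM in its ResNet101/FPN backbone is not specified here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w0, b0, w1, b1):
    """Channel weights Mc of shape (C,) for a feature map of shape (C, H, W)."""
    def shared_mlp(v):
        hidden = np.maximum(w0 @ v + b0, 0.0)  # (C/r,), ReLU after the first layer
        return w1 @ hidden + b1                # back to (C,)
    avg_desc = feat.mean(axis=(1, 2))  # global average pooling over H, W
    max_desc = feat.max(axis=(1, 2))   # global max pooling over H, W
    return sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))


def spatial_attention(feat, kernel):
    """Spatial weights Ms of shape (H, W); kernel has shape (2, k, k)."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H, W
    h, w = feat.shape[1:]
    logits = np.zeros((h, w))
    # 'same' cross-correlation, as a convolution layer computes it
    for dy in range(k):
        for dx in range(k):
            window = padded[:, dy:dy + h, dx:dx + w]             # (2, H, W)
            logits += np.tensordot(kernel[:, dy, dx], window, axes=1)
    return sigmoid(logits)


def cbam(feat, params):
    """Channel attention first, then spatial attention on the refined map."""
    mc = channel_attention(feat, *params["channel"])
    refined = feat * mc[:, None, None]
    ms = spatial_attention(refined, params["spatial"])
    return refined * ms[None, :, :], mc, ms


def init_params(channels, reduction=16, kernel_size=7, seed=0):
    """Random placeholder weights; a trained model would learn these."""
    rng = np.random.default_rng(seed)
    hidden = max(channels // reduction, 1)
    return {
        "channel": (rng.normal(0.0, 0.1, (hidden, channels)), np.zeros(hidden),
                    rng.normal(0.0, 0.1, (channels, hidden)), np.zeros(channels)),
        "spatial": rng.normal(0.0, 0.1, (2, kernel_size, kernel_size)),
    }


feat = np.random.default_rng(1).normal(size=(64, 32, 32))
out, mc, ms = cbam(feat, init_params(64))
print(out.shape, mc.shape, ms.shape)  # (64, 32, 32) (64,) (32, 32)
```

Because both attention maps lie in (0, 1), CBAM only rescales features; it never changes the shape of the map it refines, which is why it can be dropped into an existing backbone or FPN level.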
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sun, W.; Wang, C.; Gu, J.; Sun, X.; Li, J.; Liang, F. Veg-DenseCap: Dense Captioning Model for Vegetable Leaf Disease Images. Agronomy 2023, 13, 1700. https://doi.org/10.3390/agronomy13071700
Chicago/Turabian Style: Sun, Wei, Chunshan Wang, Jingqiu Gu, Xiang Sun, Jiuxi Li, and Fangfang Liang. 2023. "Veg-DenseCap: Dense Captioning Model for Vegetable Leaf Disease Images." Agronomy 13, no. 7: 1700. https://doi.org/10.3390/agronomy13071700