CTDUNet: A Multimodal CNN–Transformer Dual U-Shaped Network with Coordinate Space Attention for Camellia oleifera Pests and Diseases Segmentation in Complex Environments
Abstract
1. Introduction
2. Results
2.1. Dataset
2.2. Model Evaluation Indicators
2.3. Experiment Setup
2.4. Analysis and Comparison of the Model Results
2.5. Comparison of Coordinate Space Attention
2.6. Ablation Experiment
2.7. Label Balance
3. Discussion
4. Materials and Methods
4.1. CNN Branch
4.2. ViT Branch
4.3. Coordinate Space Attention
4.4. Loss Function
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Liu, X.; Wu, Y.; Gao, Y.; Jiang, Z.; Zhao, Z.; Zeng, W.; Xie, M.; Liu, S.; Liu, R.; Chao, Y.; et al. Valorization of Camellia oleifera Oil Processing Byproducts to Value-Added Chemicals and Biobased Materials: A Critical Review. Green Energy Environ. 2024, 9, 28–53. [Google Scholar] [CrossRef]
- Yang, Z.; Wang, Y.; Wu, X.; Quan, W.; Chen, Q.; Wang, A. Efficient Preparation of Biodiesel Using Sulfonated Camellia Oleifera Shell Biochar as a Catalyst. Molecules 2024, 29, 2752. [Google Scholar] [CrossRef] [PubMed]
- Wu, W.-J.; Zou, Y.-N.; Xiao, Z.-Y.; Wang, F.-L.; Hashem, A.; Abd_Allah, E.F.; Wu, Q.-S. Changes in Fatty Acid Profiles in Seeds of Camellia oleifera Treated by Mycorrhizal Fungi and Glomalin. Horticulturae 2024, 10, 580. [Google Scholar] [CrossRef]
- Dong, Z.; Yang, F.; Du, J.; Wang, K.; Lv, L.; Long, W. Identification of Varieties in Camellia oleifera Leaf Based on Deep Learning Technology. Ind. Crops Prod. 2024, 216, 118635. [Google Scholar] [CrossRef]
- Khan, H.; Haq, I.U.; Munsif, M.; Mustaqeem; Khan, S.U.; Lee, M.Y. Automated Wheat Diseases Classification Framework Using Advanced Machine Learning Technique. Agriculture 2022, 12, 1226. [Google Scholar] [CrossRef]
- Chen, Y.; Wang, X.; Chen, Z.; Wang, K.; Sun, Y.; Jiang, J.; Liu, X. Classification of Camellia oleifera Diseases in Complex Environments by Attention and Multi-Dimensional Feature Fusion Neural Network. Plants 2023, 12, 2701. [Google Scholar] [CrossRef] [PubMed]
- Lei, X.; Wu, M.; Li, Y.; Liu, A.; Tang, Z.; Chen, S.; Xiang, Y. Detection and Positioning of Camellia oleifera Fruit Based on LBP Image Texture Matching and Binocular Stereo Vision. Agronomy 2023, 13, 2153. [Google Scholar] [CrossRef]
- Liu, Y.; Wang, H.; Liu, Y.; Luo, Y.; Li, H.; Chen, H.; Liao, K.; Li, L. A Trunk Detection Method for Camellia oleifera Fruit Harvesting Robot Based on Improved YOLOv7. Forests 2023, 14, 1453. [Google Scholar] [CrossRef]
- Zhang, Y.-P.; Zhang, X.-Y.; Cheng, Y.-T.; Li, B.; Teng, X.-Z.; Zhang, J.; Lam, S.; Zhou, T.; Ma, Z.-R.; Sheng, J.-B.; et al. Artificial Intelligence-Driven Radiomics Study in Cancer: The Role of Feature Engineering and Modeling. Mil. Med. Res. 2023, 10, 22. [Google Scholar] [CrossRef]
- Chen, Q.; Li, M.; Chen, C.; Zhou, P.; Lv, X.; Chen, C. MDFNet: Application of Multimodal Fusion Method Based on Skin Image and Clinical Data to Skin Cancer Classification. J. Cancer Res. Clin. Oncol. 2023, 149, 3287–3299. [Google Scholar] [CrossRef]
- Xu, P.; Zhu, X.; Clifton, D.A. Multimodal Learning With Transformers: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 12113–12132. [Google Scholar] [CrossRef] [PubMed]
- Xu, J.; Zhou, H.; Hu, Y.; Xue, Y.; Zhou, G.; Li, L.; Dai, W.; Li, J. High-Accuracy Tomato Leaf Disease Image-Text Retrieval Method Utilizing LAFANet. Plants 2024, 13, 1176. [Google Scholar] [CrossRef] [PubMed]
- Mia, M.S.; Tanabe, R.; Habibi, L.N.; Hashimoto, N.; Homma, K.; Maki, M.; Matsui, T.; Tanaka, T.S.T. Multimodal Deep Learning for Rice Yield Prediction Using UAV-Based Multispectral Imagery and Weather Data. Remote Sens. 2023, 15, 2511. [Google Scholar] [CrossRef]
- Oluwasammi, A.; Aftab, M.U.; Qin, Z.; Ngo, S.T.; Doan, T.V.; Nguyen, S.B.; Nguyen, S.H.; Nguyen, G.H. Features to Text: A Comprehensive Survey of Deep Learning on Semantic Segmentation and Image Captioning. Complexity 2021, 2021, 5538927. [Google Scholar] [CrossRef]
- Zhao, G.; Zhang, C.; Shang, H.; Wang, Y.; Zhu, L.; Qian, X. Generative Label Fused Network for Image–Text Matching. Knowl.-Based Syst. 2023, 263, 110280. [Google Scholar] [CrossRef]
- Lu, S.; Ding, Y.; Liu, M.; Yin, Z.; Yin, L.; Zheng, W. Multiscale Feature Extraction and Fusion of Image and Text in VQA. Int. J. Comput. Intell. Syst. 2023, 16, 54. [Google Scholar] [CrossRef]
- Li, T.; Bai, J.; Wang, Q. Enhancing Medical Text Detection with Vision-Language Pre-Training and Efficient Segmentation. Complex Intell. Syst. 2024, 10, 3995–4007. [Google Scholar] [CrossRef]
- Li, Z.; Li, Y.; Li, Q.; Wang, P.; Guo, D.; Lu, L.; Jin, D.; Zhang, Y.; Hong, Q. LViT: Language Meets Vision Transformer in Medical Image Segmentation. IEEE Trans. Med. Imaging 2024, 43, 96–107. [Google Scholar] [CrossRef] [PubMed]
- Munsif, M.; Ullah, M.; Ahmad, B.; Sajjad, M.; Cheikh, F.A. Monitoring Neurological Disorder Patients via Deep Learning Based Facial Expressions Analysis. In IFIP International Conference on Artificial Intelligence Applications and Innovations; Springer: Cham, Switzerland, 2022; pp. 412–423. [Google Scholar]
- Zhao, X.; Wang, L.; Zhang, Y.; Han, X.; Deveci, M.; Parmar, M. A Review of Convolutional Neural Networks in Computer Vision. Artif. Intell. Rev. 2024, 57, 99. [Google Scholar] [CrossRef]
- Amer, A.; Lambrou, T.; Ye, X. MDA-Unet: A Multi-Scale Dilated Attention U-Net for Medical Image Segmentation. Appl. Sci. 2022, 12, 3676. [Google Scholar] [CrossRef]
- Zhao, X.; Xu, W. NFMPAtt-Unet: Neighborhood Fuzzy C-Means Multi-Scale Pyramid Hybrid Attention Unet for Medical Image Segmentation. Neural Netw. 2024, 178, 106489. [Google Scholar] [CrossRef] [PubMed]
- Liu, Z.; Zhou, G.; Zhu, W.; Chai, Y.; Li, L.; Wang, Y.; Hu, Y.; Dai, W.; Liu, R.; Sun, L. Identification of Rice Disease under Complex Background Based on PSOC-DRCNet. Expert. Syst. Appl. 2024, 249, 123643. [Google Scholar] [CrossRef]
- GS-DeepLabV3+: A Mountain Tea Disease Segmentation Network Based on Improved Shuffle Attention and Gated Multidimensional Feature Extraction. ScienceDirect. Available online: https://www.sciencedirect.com/science/article/abs/pii/S026121942400190X (accessed on 3 July 2024).
- On-Plant Size and Weight Estimation of Tomato Fruits Using Deep Neural Networks and RGB-D Imaging. Available online: https://elibrary.asabe.org/abstract.asp?AID=54666&t=3&dabs=Y&redir=&redirType= (accessed on 3 July 2024).
- Liu, C.; Feng, Q.; Sun, Y.; Li, Y.; Ru, M.; Xu, L. YOLACTFusion: An Instance Segmentation Method for RGB-NIR Multimodal Image Fusion Based on an Attention Mechanism. Comput. Electron. Agric. 2023, 213, 108186. [Google Scholar] [CrossRef]
- Transparent Medical Image AI via an Image–Text Foundation Model Grounded in Medical Literature. Nature Medicine. Available online: https://www.nature.com/articles/s41591-024-02887-x (accessed on 3 July 2024).
- Ishmam, M.F.; Shovon, M.S.H.; Mridha, M.F.; Dey, N. From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities. Inf. Fusion 2024, 106, 102270. [Google Scholar] [CrossRef]
- Zhou, H.; Hu, Y.; Liu, S.; Zhou, G.; Xu, J.; Chen, A.; Wang, Y.; Li, L.; Hu, Y. A Precise Framework for Rice Leaf Disease Image–Text Retrieval Using FHTW-Net. Plant Phenomics 2024, 6, 0168. [Google Scholar] [CrossRef] [PubMed]
- Large Sequence Models for Sequential Decision-Making: A Survey. Frontiers of Computer Science. Available online: https://link.springer.com/article/10.1007/s11704-023-2689-5 (accessed on 3 July 2024).
- Turchin, A.; Masharsky, S.; Zitnik, M. Comparison of BERT Implementations for Natural Language Processing of Narrative Medical Documents. Inform. Med. Unlocked 2023, 36, 101139. [Google Scholar] [CrossRef]
- Zhang, X.; Li, W.; Wang, X.; Wang, L.; Zheng, F.; Wang, L.; Zhang, H. A Fusion Encoder with Multi-Task Guidance for Cross-Modal Text–Image Retrieval in Remote Sensing. Remote Sens. 2023, 15, 4637. [Google Scholar] [CrossRef]
- Vision Transformer With Quadrangle Attention. IEEE Xplore. Available online: https://ieeexplore.ieee.org/abstract/document/10384565 (accessed on 3 July 2024).
- Chen, Y.-C.; Li, L.; Yu, L.; El Kholy, A.; Ahmed, F.; Gan, Z.; Cheng, Y.; Liu, J. Uniter: Learning Universal Image-Text Representations. 2019. Available online: https://openreview.net/forum?id=S1eL4kBYwr (accessed on 23 June 2024).
- Gan, C.; Fu, X.; Feng, Q.; Zhu, Q.; Cao, Y.; Zhu, Y. A Multimodal Fusion Network with Attention Mechanisms for Visual–Textual Sentiment Analysis. Expert Syst. Appl. 2024, 242, 122731. [Google Scholar] [CrossRef]
- Zhang, K.; Mao, Z.; Liu, A.-A.; Zhang, Y. Unified Adaptive Relevance Distinguishable Attention Network for Image-Text Matching. IEEE Trans. Multimed. 2023, 25, 1320–1332. [Google Scholar] [CrossRef]
- Gao, Y.; Cao, H.; Cai, W.; Zhou, G. Pixel-Level Road Crack Detection in UAV Remote Sensing Images Based on ARD-Unet. Measurement 2023, 219, 113252. [Google Scholar] [CrossRef]
- Wang, S.; Li, Z.; Liao, L.; Zhang, C.; Zhao, J.; Sang, L.; Qian, W.; Pan, G.; Huang, L.; Ma, H. DPAM-PSPNet: Ultrasonic Image Segmentation of Thyroid Nodule Based on Dual-Path Attention Mechanism. Phys. Med. Biol. 2023, 68, 165002. [Google Scholar] [CrossRef]
- Zheng, Z.; Hu, Y.; Guo, T.; Qiao, Y.; He, Y.; Zhang, Y.; Huang, Y. AGHRNet: An Attention Ghost-HRNet for Confirmation of Catch-and-shake Locations in Jujube Fruits Vibration Harvesting. Comput. Electron. Agric. 2023, 210, 107921. [Google Scholar] [CrossRef]
- Wang, Z.; Wang, Q.; Yang, Y.; Liu, N.; Chen, Y.; Gao, J. Seismic Facies Segmentation via a Segformer-Based Specific Encoder–Decoder–Hypercolumns Scheme. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5903411. [Google Scholar] [CrossRef]
- Yang, J.; Ke, A.; Yu, Y.; Cai, B. Scene Sketch Semantic Segmentation with Hierarchical Transformer. Knowl.-Based Syst. 2023, 280, 110962. [Google Scholar] [CrossRef]
- Yang, Y.; Li, D.; Zhao, S. A Novel Approach for Underwater Fish Segmentation in Complex Scenes Based on Multi-Levels Triangular Atrous Convolution. Aquacult. Int. 2024, 32, 5215–5240. [Google Scholar] [CrossRef]
- Akhyar, A.; Asyraf Zulkifley, M.; Lee, J.; Song, T.; Han, J.; Cho, C.; Hyun, S.; Son, Y.; Hong, B.-W. Deep Artificial Intelligence Applications for Natural Disaster Management Systems: A Methodological Review. Ecol. Indic. 2024, 163, 112067. [Google Scholar] [CrossRef]
- Zhang, M.; Gao, H.; Liao, X.; Ning, B.; Gu, H.; Yu, B. DBGRU-SE: Predicting Drug–Drug Interactions Based on Double BiGRU and Squeeze-and-Excitation Attention Mechanism. Brief. Bioinform. 2023, 24, bbad184. [Google Scholar] [CrossRef] [PubMed]
- Wu, L.; Liu, Y.; Zhang, J.; Zhang, B.; Wang, Z.; Tong, J.; Li, M.; Zhang, A. Identification of Flood Depth Levels in Urban Waterlogging Disaster Caused by Rainstorm Using a CBAM-Improved ResNet50. Expert Syst. Appl. 2024, 255, 124382. [Google Scholar] [CrossRef]
- Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
- Yang, W.; Wu, J.; Zhang, J.; Gao, K.; Du, R.; Wu, Z.; Firkat, E.; Li, D. Deformable Convolution and Coordinate Attention for Fast Cattle Detection. Comput. Electron. Agric. 2023, 211, 108006. [Google Scholar] [CrossRef]
- Bakasa, W.; Viriri, S. VGG16 Feature Extractor with Extreme Gradient Boost Classifier for Pancreas Cancer Prediction. J. Imaging 2023, 9, 138. [Google Scholar] [CrossRef]
- ValizadehAslani, T.; Shi, Y.; Ren, P.; Wang, J.; Zhang, Y.; Hu, M.; Zhao, L.; Liang, H. PharmBERT: A Domain-Specific BERT Model for Drug Labels. Brief. Bioinform. 2023, 24, bbad226. [Google Scholar] [CrossRef] [PubMed]
- Wen, G.; Li, S.; Liu, F.; Luo, X.; Er, M.-J.; Mahmud, M.; Wu, T. YOLOv5s-CA: A Modified YOLOv5s Network with Coordinate Attention for Underwater Target Detection. Sensors 2023, 23, 3367. [Google Scholar] [CrossRef] [PubMed]
- Zhao, Y.; Chen, Y.; Rong, Y.; Xiong, S.; Lu, X. Global-Group Attention Network With Focal Attention Loss for Aerial Scene Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4700514. [Google Scholar] [CrossRef]
- Chen, Y.; Shi, B. Enhanced Heterogeneous Graph Attention Network with a Novel Multilabel Focal Loss for Document-Level Relation Extraction. Entropy 2024, 26, 210. [Google Scholar] [CrossRef]
- Adaptive T-vMF Dice Loss: An Effective Expansion of Dice Loss for Medical Image Segmentation. ScienceDirect. Available online: https://www.sciencedirect.com/science/article/pii/S0010482523011605 (accessed on 4 July 2024).
Dataset | Category | Amount | Complex Background | Text | Mask |
---|---|---|---|---|---|
Image recognition of Camellia oleifera diseases based on convolutional neural network & transfer learning [Long Mansheng] | 4 | 3750 | × | × | × |
Classification of Camellia oleifera Diseases in Complex Environments by Attention and Multi-Dimensional Feature Fusion Neural Network [Yixin Chen] | 7 | 12,170 | √ | × | × |
Ours | 7 | 1400 | √ | √ | √ |
Environment | Device | Parameter |
---|---|---|
Hardware environment | CPU | AMD EPYC 7763
| GPU | NVIDIA RTX 4090
| RAM | 64 GB
| Video memory | 24 GB
Software environment | OS | Ubuntu 22.04
| CUDA Toolkit | 11.5
| cuDNN | 8.9.7
| Python | 3.7.12
| PyTorch (GPU) | 1.8.0
| torchvision | 0.9.0
Hyperparameters | Parameters |
---|---|
Size of input images | 224 × 224 |
Batch size | 16
Initial learning rate | 0.0001
Optimizer | Adam |
Momentum | 0.9 |
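For reference, the hyperparameters above can be expressed as a PyTorch training configuration. This is an illustrative sketch, not the authors' released code: the model and dataset are dummy placeholders, and the "Momentum 0.9" row is read here as Adam's β₁ coefficient (an assumption, since Adam takes betas rather than a separate momentum parameter).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins (assumed): the real model is CTDUNet and the real data are
# 224 x 224 Camellia oleifera images paired with segmentation masks.
model = torch.nn.Conv2d(3, 9, kernel_size=3, padding=1)
images = torch.randn(32, 3, 224, 224)         # input size from the table
masks = torch.randint(0, 9, (32, 224, 224))   # 9 classes incl. background
train_loader = DataLoader(TensorDataset(images, masks),
                          batch_size=16, shuffle=True)

# Initial learning rate 1e-4; "Momentum 0.9" interpreted as beta1 (assumption).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
```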
Model | Text | Precision (%) | Recall (%) | Dice (%) | mIoU (%) |
---|---|---|---|---|---|
DeepLabV3+-MobileNet | × | 85.91 | 87.36 | 86.63 | 77.12
DeepLabV3+-Xception | × | 83.63 | 87.25 | 85.40 | 75.33
SegFormer-B2 | × | 90.18 | 89.13 | 89.65 | 81.76
SegFormer-B5 | × | 92.45 | 87.73 | 90.03 | 82.21
PSPNet-MobileNetV2 | × | 79.56 | 70.61 | 74.82 | 61.73
PSPNet-ResNet50 | × | 83.96 | 74.82 | 79.13 | 67.22
HRNetV2-W18 | × | 86.59 | 83.87 | 85.20 | 74.96
HRNetV2-W32 | × | 89.51 | 85.93 | 87.68 | 78.66
HRNetV2-W48 | × | 89.97 | 87.03 | 88.48 | 80.01
UNet-VGG | × | 86.68 | 89.15 | 87.90 | 78.64
UNet-ResNet | × | 83.42 | 86.76 | 85.01 | 74.66
LViT | √ | 95.28 | 85.50 | 89.26 | 80.73 |
CTDUNet (Ours) | √ | 93.69 | 91.24 | 92.45 | 86.14 |
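The Precision, Recall, and Dice columns above are standard pixel-wise measures. A minimal sketch of how they are computed for a single binary mask pair, using the usual definitions (not taken from the paper's code):

```python
import numpy as np

def seg_metrics(pred, gt):
    """Pixel-wise precision, recall, and Dice between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()     # true-positive pixels
    fp = np.logical_and(pred, ~gt).sum()    # false positives
    fn = np.logical_and(~pred, gt).sum()    # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)      # F1 of precision and recall
    return precision, recall, dice

# Tiny hypothetical example: tp = 2, fp = 1, fn = 1.
pred = np.array([[1, 1, 0], [0, 1, 0]])
gt = np.array([[1, 0, 0], [0, 1, 1]])
precision, recall, dice = seg_metrics(pred, gt)   # each equals 2/3 here
```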
Attention (CTDUNet) | Precision (%) | Recall (%) | Dice (%) | mIoU (%) |
---|---|---|---|---|
SE [44] | 92.63 | 90.49 | 91.55 | 84.60 |
CBAM [45] | 93.10 | 90.44 | 91.75 | 85.00 |
EMA [46] | 93.16 | 90.73 | 92.18 | 85.30 |
Coordinate Attention [47] | 93.67 | 90.73 | 92.18 | 85.67 |
Coordinate Space Attention (Ours) | 93.69 | 91.24 | 92.45 | 86.14 |
Method | Text | Precision (%) | Recall (%) | Dice (%) | mIoU (%) |
---|---|---|---|---|---|
LViT (base) | √ | 91.91 | 86.38 | 89.06 | 80.27 |
CTDUNet (base) | √ | 92.85 | 89.30 | 91.04 | 83.63 |
CTDUNet (base) | × | 90.51 | 86.97 | 88.70 | 79.83 |
CTDUNet + (SPPF + CA-DownViT) | √ | 93.08 | 90.42 | 91.73 | 84.93 |
CTDUNet + CSA | √ | 93.00 | 90.38 | 91.67 | 84.81 |
CTDUNet + (SPPF + CA-DownViT) + CSA | × | 91.11 | 87.49 | 89.26 | 80.73 |
CTDUNet + (SPPF + CA-DownViT) + CSA | √ | 93.69 | 91.24 | 92.45 | 86.14 |
Model (IoU%) | Background | Leaf | Tea White Scab | Worm Holes | Red Leaf Spot | Algae Leaf Spot | Tea Sooty Mold | Soft Rot | Anthracnose |
---|---|---|---|---|---|---|---|---|---|
DeepLabV3+ | 95.88 | 92.39 | 62.38 | 64.54 | 86.27 | 73.42 | 79.37 | 84.95 | 67.91
SegFormer | 96.85 | 94.05 | 64.92 | 69.27 | 87.79 | 83.22 | 84.04 | 86.33 | 73.41
PSPNet | 94.95 | 89.66 | 16.99 | 49.63 | 79.54 | 65.72 | 69.84 | 80.03 | 58.67
HRNetV2 | 95.76 | 92.30 | 59.47 | 66.64 | 86.21 | 77.32 | 79.85 | 88.76 | 73.78
UNet | 92.80 | 89.37 | 68.09 | 67.80 | 82.68 | 75.66 | 75.66 | 85.68 | 69.75
LViT | 93.25 | 89.03 | 69.58 | 70.63 | 82.39 | 78.79 | 85.60 | 85.46 | 67.78
CTDUNet | 95.23 | 91.98 | 75.29 | 78.96 | 86.15 | 85.49 | 87.91 | 94.67 | 79.57
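The per-class IoU values in this table (and the mIoU column in the earlier tables) follow the usual intersection-over-union definition, with mIoU the mean over classes. A small sketch with hypothetical label maps, assuming this standard formulation:

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """IoU for each class label in two integer label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union else float("nan"))
    return ious

# Tiny hypothetical example with 3 classes (0 = background).
pred = np.array([[0, 0, 1], [2, 1, 1]])
gt = np.array([[0, 1, 1], [2, 2, 1]])
ious = per_class_iou(pred, gt, 3)       # IoU = 0.5 for every class here
miou = float(np.nanmean(ious))          # mean IoU over classes
```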
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Guo, R.; Zhang, R.; Zhou, H.; Xie, T.; Peng, Y.; Chen, X.; Yu, G.; Wan, F.; Li, L.; Zhang, Y.; et al. CTDUNet: A Multimodal CNN–Transformer Dual U-Shaped Network with Coordinate Space Attention for Camellia oleifera Pests and Diseases Segmentation in Complex Environments. Plants 2024, 13, 2274. https://doi.org/10.3390/plants13162274