DoubleNet: A Method for Generating Navigation Lines of Unstructured Soil Roads in a Vineyard Based on CNN and Transformer
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Collection and Dataset Establishment
2.1.1. Data Collection Platform
2.1.2. The Navigation Point Dataset
2.1.3. Test Videos
2.2. DoubleNet
2.2.1. Self-Adaption GELU (SA-GELU)
2.2.2. Fused MHSA (F-MHSA)
2.2.3. Flattening Module
2.2.4. DNBLK
2.2.5. DoubleNet Structure
2.3. Computing Hardware and Software Environment
2.4. Experiment Introduction and Evaluation Metrics
2.4.1. Experiment Introduction
2.4.2. Evaluation Metrics
3. Results
3.1. Training Results
3.1.1. Training Performances of Different Models
3.1.2. DoubleNet’s Training Performances
3.2. Accuracy Results
3.2.1. Accuracy Results of Different Models
3.2.2. Ablation Experiment Results
3.3. Inference Results
3.3.1. Image Inference Results
3.3.2. Video Inference Results
4. Discussion
4.1. Discussion of DoubleNet Components
4.1.1. Discussion of SA-GELU
4.1.2. Discussion of F-MHSA
4.1.3. Discussion of the F-Dataset
4.1.4. Discussion of the Model Structure
4.2. Discussion of the Number of Navigation Points
4.3. Discussion of DoubleNet’s Robustness
4.4. Limitations of DoubleNet
4.5. Practical Applications and Perspectives
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
| Block | Operation | Input Size | Output Size |
|---|---|---|---|
| (1) | cv 2D (3, 16, 3, 2, 1) + batch normalization + SA-GELU | 3 × 224 × 224 | 16 × 112 × 112 |
| | cv 2D (16, 16, 3, 2, 1) + batch normalization + SA-GELU | 16 × 112 × 112 | 16 × 56 × 56 |
| | reshape 2 | 16 × 56 × 56 | 16 × 3136 × 1 |
| | reshape 1 | 3 × 224 × 224 | 16 × 9408 × 1 |
| | maxpooling 1D (3, 3) | 16 × 9408 × 1 | 16 × 3136 × 1 |
| | adding | 16 × 3136 × 1; 16 × 3136 × 1 | 16 × 3136 × 1 |
| (2) | cv 2D (16, 16, 1, 1, 0) + batch normalization + SA-GELU | 16 × 56 × 56 | 16 × 56 × 56 |
| | cv 2D (16, 16, 3, 1, 1) + batch normalization + SA-GELU | 16 × 56 × 56 | 16 × 56 × 56 |
| | adding | 16 × 56 × 56; 16 × 56 × 56; 16 × 56 × 56 | 16 × 56 × 56 |
| | cv 2D (16, 16, 1, 1, 0) + batch normalization + SA-GELU | 16 × 56 × 56 | 16 × 56 × 56 |
| | cv 2D (16, 16, 3, 1, 1) + batch normalization + SA-GELU | 16 × 56 × 56 | 16 × 56 × 56 |
| | concatenation | 16 × 56 × 56; 16 × 56 × 56 | 32 × 56 × 56 |
| | cv 2D (16, 32, 3, 1, 1) + batch normalization + SA-GELU | 16 × 56 × 56 | 32 × 56 × 56 |
| | adding | 32 × 56 × 56; 32 × 56 × 56 | 32 × 56 × 56 |
| | cv 2D (32, 32, 3, 2, 1) + batch normalization + SA-GELU | 32 × 56 × 56 | 32 × 28 × 28 |
| | cv 2D (16, 16, 7, 1, 3) + batch normalization + SA-GELU | 16 × 56 × 56 | 16 × 56 × 56 |
| | adding | 16 × 56 × 56; 16 × 56 × 56 | 16 × 56 × 56 |
| | cv 2D (16, 32, 3, 2, 1) + batch normalization + SA-GELU | 16 × 56 × 56 | 32 × 28 × 28 |
| | concatenation | 32 × 28 × 28; 32 × 28 × 28 | 64 × 28 × 28 |
| | MLP | 16 × 3136 × 1 | 16 × 784 × 1 |
| | F-MHSA_2 + layer normalization | 16 × 784 × 1 | 16 × 784 × 1 |
| | F-MHSA_1 + layer normalization | 16 × 3136 × 1 | 16 × 3136 × 1 |
| | MLP | 16 × 3136 × 1 | 16 × 784 × 1 |
| | adding | 16 × 784 × 1; 16 × 784 × 1 | 16 × 784 × 1 |
| | cv 1D (16, 64, 3, 1, 1) + batch normalization + SA-GELU | 16 × 784 × 1 | 64 × 784 × 1 |
| | MLP | 64 × 784 × 1 | 64 × 784 × 1 |
| | reshape | 64 × 28 × 28 | 64 × 784 × 1 |
| | adding | 64 × 784 × 1; 64 × 784 × 1 | 64 × 784 × 1 |
| (3) | cv 2D (64, 64, 1, 1, 0) + batch normalization + SA-GELU | 64 × 28 × 28 | 64 × 28 × 28 |
| | cv 2D (64, 64, 3, 1, 1) + batch normalization + SA-GELU | 64 × 28 × 28 | 64 × 28 × 28 |
| | adding | 64 × 28 × 28; 64 × 28 × 28; 64 × 28 × 28 | 64 × 28 × 28 |
| | cv 2D (64, 64, 1, 1, 0) + batch normalization + SA-GELU | 64 × 28 × 28 | 64 × 28 × 28 |
| | cv 2D (64, 64, 3, 1, 1) + batch normalization + SA-GELU | 64 × 28 × 28 | 64 × 28 × 28 |
| | concatenation | 64 × 28 × 28; 64 × 28 × 28 | 128 × 28 × 28 |
| | cv 2D (64, 128, 3, 1, 1) + batch normalization + SA-GELU | 64 × 28 × 28 | 128 × 28 × 28 |
| | adding | 128 × 28 × 28; 128 × 28 × 28 | 128 × 28 × 28 |
| | cv 2D (128, 128, 3, 2, 1) + batch normalization + SA-GELU | 128 × 28 × 28 | 128 × 14 × 14 |
| | cv 2D (64, 64, 7, 1, 3) + batch normalization + SA-GELU | 64 × 28 × 28 | 64 × 28 × 28 |
| | adding | 64 × 28 × 28; 64 × 28 × 28 | 64 × 28 × 28 |
| | cv 2D (64, 128, 3, 2, 1) + batch normalization + SA-GELU | 64 × 28 × 28 | 128 × 14 × 14 |
| | concatenation | 128 × 14 × 14; 128 × 14 × 14 | 256 × 14 × 14 |
| | MLP | 64 × 784 × 1 | 64 × 196 × 1 |
| | F-MHSA_2 + layer normalization | 64 × 196 × 1 | 64 × 196 × 1 |
| | F-MHSA_1 + layer normalization | 64 × 784 × 1 | 64 × 784 × 1 |
| | MLP | 64 × 784 × 1 | 64 × 196 × 1 |
| | adding | 64 × 196 × 1; 64 × 196 × 1 | 64 × 196 × 1 |
| | cv 1D (64, 256, 3, 1, 1) + batch normalization + SA-GELU | 64 × 196 × 1 | 256 × 196 × 1 |
| | MLP | 256 × 196 × 1 | 256 × 196 × 1 |
| | reshape | 256 × 14 × 14 | 256 × 196 × 1 |
| | adding | 256 × 196 × 1; 256 × 196 × 1 | 256 × 196 × 1 |
| (4) | cv 2D (256, 256, 1, 1, 0) + batch normalization + SA-GELU | 256 × 14 × 14 | 256 × 14 × 14 |
| | cv 2D (256, 256, 3, 1, 1) + batch normalization + SA-GELU | 256 × 14 × 14 | 256 × 14 × 14 |
| | adding | 256 × 14 × 14; 256 × 14 × 14; 256 × 14 × 14 | 256 × 14 × 14 |
| | cv 2D (256, 256, 1, 1, 0) + batch normalization + SA-GELU | 256 × 14 × 14 | 256 × 14 × 14 |
| | cv 2D (256, 256, 3, 1, 1) + batch normalization + SA-GELU | 256 × 14 × 14 | 256 × 14 × 14 |
| | concatenation | 256 × 14 × 14; 256 × 14 × 14 | 512 × 14 × 14 |
| | cv 2D (512, 512, 3, 2, 1) + batch normalization + SA-GELU | 512 × 14 × 14 | 512 × 7 × 7 |
| | cv 2D (256, 256, 7, 1, 3) + batch normalization + SA-GELU | 256 × 14 × 14 | 256 × 14 × 14 |
| | cv 2D (256, 256, 5, 1, 2) + batch normalization + SA-GELU | 256 × 14 × 14 | 256 × 14 × 14 |
| | adding | 256 × 14 × 14; 256 × 14 × 14; 256 × 14 × 14 | 256 × 14 × 14 |
| | cv 2D (256, 512, 3, 2, 1) + batch normalization + SA-GELU | 256 × 14 × 14 | 512 × 7 × 7 |
| | concatenation | 512 × 7 × 7; 512 × 7 × 7 | 1024 × 7 × 7 |
| | F-MHSA_1 + layer normalization | 256 × 196 × 1 | 256 × 196 × 1 |
| | F-MHSA_1 + layer normalization | 256 × 196 × 1 | 256 × 196 × 1 |
| | adding | 256 × 196 × 1; 256 × 196 × 1 | 256 × 196 × 1 |
| | MLP | 256 × 196 × 1 | 256 × 49 × 1 |
| | cv 1D (256, 1024, 3, 1, 1) + batch normalization + SA-GELU | 256 × 49 × 1 | 1024 × 49 × 1 |
| | MLP | 1024 × 49 × 1 | 1024 × 49 × 1 |
| | reshape | 1024 × 49 × 1 | 1024 × 7 × 7 |
| | adding | 1024 × 7 × 7; 1024 × 7 × 7 | 1024 × 7 × 7 |
| (5) | cv 2D (64, 256, 3, 2, 1) + batch normalization + SA-GELU | 64 × 28 × 28 | 256 × 14 × 14 |
| (6) | cv 2D (256, 1024, 3, 2, 1) + batch normalization + SA-GELU | 256 × 14 × 14 | 1024 × 7 × 7 |
| (7) | T-cv 2D (1024, 256, 4, 2, 1) + batch normalization + SA-GELU | 1024 × 7 × 7 | 256 × 14 × 14 |
| (8) | T-cv 2D (256, 64, 4, 2, 1) + batch normalization + SA-GELU | 256 × 14 × 14 | 64 × 28 × 28 |
| (9) | T-cv 2D (64, 16, 4, 2, 1) + batch normalization + SA-GELU | 64 × 28 × 28 | 16 × 56 × 56 |
| (10) | cv 2D (16, 5, 1, 1, 0) | 16 × 56 × 56 | 5 × 56 × 56 |
| (11) | cv 1D (1024, 256, 1, 1, 0) + linear (49, 196) + layer normalization + SA-GELU | 1024 × 49 × 1 | 256 × 196 × 1 |
| (12) | cv 1D (256, 64, 1, 1, 0) + linear (196, 784) + layer normalization + SA-GELU | 256 × 196 × 1 | 64 × 784 × 1 |
| (13) | cv 1D (64, 16, 1, 1, 0) + linear (784, 3136) + layer normalization + SA-GELU | 64 × 784 × 1 | 16 × 3136 × 1 |
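To make the tensor bookkeeping concrete, below is a minimal PyTorch sketch of Block (1), which fuses a conv-downsampled feature map with max-pooled raw-pixel tokens. The `SAGELU` class here is a stand-in assumption (GELU with a learnable per-channel scale); the actual SA-GELU is defined in Section 2.2.1, and the routing of the two outputs follows our reading of the table (the 16 × 56 × 56 map feeds Block (2)'s convolutional path, the 16 × 3136 tokens feed its F-MHSA path).

```python
import torch
import torch.nn as nn

class SAGELU(nn.Module):
    """Assumed stand-in for the paper's SA-GELU, not its published form."""
    def __init__(self, channels: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(channels, 1, 1))  # learnable per-channel scale
        self.gelu = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.alpha * self.gelu(x)

class Block1(nn.Module):
    """Block (1): conv stem plus a raw-pixel token path, fused by addition."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),   # 3 x 224 x 224 -> 16 x 112 x 112
            nn.BatchNorm2d(16), SAGELU(16),
            nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),  # -> 16 x 56 x 56
            nn.BatchNorm2d(16), SAGELU(16),
        )
        self.pool = nn.MaxPool1d(kernel_size=3, stride=3)            # 16 x 9408 -> 16 x 3136

    def forward(self, x: torch.Tensor):
        fmap = self.stem(x)                                    # (N, 16, 56, 56)
        conv_tokens = fmap.flatten(2)                          # "reshape 2": (N, 16, 3136)
        raw_tokens = self.pool(x.reshape(x.size(0), 16, -1))   # "reshape 1" + "maxpooling 1D": (N, 16, 3136)
        return fmap, conv_tokens + raw_tokens                  # "adding": fused token stream

fmap, tokens = Block1()(torch.randn(1, 3, 224, 224))
print(fmap.shape, tokens.shape)  # torch.Size([1, 16, 56, 56]) torch.Size([1, 16, 3136])
```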
References
- State of the World Vine and Wine Sector. Available online: https://www.oiv.int/what-we-do/data-discovery-report?oiv (accessed on 20 October 2024).
- Guevara, J.; Cheein, F.; Gené-Mola, J.; Rosell-Polo, J.; Gregorio, E. Analyzing and overcoming the effects of GNSS error on LiDAR based orchard parameters estimation. Comput. Electron. Agric. 2020, 170, 105255.
- Shamshiri, R.; Navas, E.; Dworak, V.; Cheein, F.; Weltzien, C. A modular sensing system with CANBUS communication for assisted navigation of an agricultural mobile robot. Comput. Electron. Agric. 2024, 223, 109112.
- Eceoğlu, O.; Ünal, İ. Optimizing Orchard Planting Efficiency with a GIS-Integrated Autonomous Soil-Drilling Robot. AgriEngineering 2024, 6, 2870–2890.
- Malavazi, B.P.F.; Guyonneau, R.; Fasquel, J.; Lagrange, S.; Mercier, F. LiDAR-only based navigation algorithm for an autonomous agricultural robot. Comput. Electron. Agric. 2018, 154, 71–79.
- Liu, W.; Li, W.; Feng, H.; Xu, J.; Yang, S.; Zheng, Y.; Liu, X.; Wang, Z.; Yi, X.; He, Y.; et al. Overall integrated navigation based on satellite and lidar in the standardized tall spindle apple orchards. Comput. Electron. Agric. 2024, 216, 108489.
- Guhur, P.L.; Tapaswi, M.; Chen, S.; Laptev, I.; Schmid, C. Airbert: In-domain pretraining for vision-and-language navigation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 1634–1643.
- Zhang, B.; Zhao, D.; Chen, C.; Li, J.; Zhang, W.; Qi, L.; Wang, S. Extraction of Crop Row Navigation Lines for Soybean Seedlings Based on Calculation of Average Pixel Point Coordinates. Agronomy 2024, 14, 1749.
- Ban, C.; Wang, L.; Chi, R.; Su, T.; Ma, Y. A Camera-LiDAR-IMU fusion method for real-time extraction of navigation line between maize field rows. Comput. Electron. Agric. 2024, 223, 109114.
- Gong, J.; Wang, X.; Zhang, Y.; Lan, Y.; Mostafa, K. Navigation line extraction based on root and stalk composite locating points. Comput. Electr. Eng. 2021, 92, 107115.
- Stefas, N.; Bayram, H.; Isler, V. Vision-Based UAV Navigation in Orchards. IFAC-PapersOnLine 2016, 49, 10–15.
- Fu, D.; Chen, Z.; Yao, Z.; Liang, Z.; Cai, Y.; Liu, C.; Tang, Z.; Lin, C.; Feng, X.; Qi, L. Vision-based trajectory generation and tracking algorithm for maneuvering of a paddy field robot. Comput. Electron. Agric. 2024, 226, 109368.
- Opiyo, S.; Okinda, C.; Zhou, J.; Mwangi, E.; Makange, N. Medial axis-based machine-vision system for orchard robot navigation. Comput. Electron. Agric. 2021, 185, 106153.
- Navone, A.; Martini, M.; Ambrosio, M.; Ostuni, A.; Angarano, S.; Chiaberge, M. GPS-free autonomous navigation in cluttered tree rows with deep semantic segmentation. Robot. Auton. Syst. 2025, 183, 104854.
- Liu, Y.; Guo, Y.; Wang, X.; Yang, Y.; Zhang, J.; An, D.; Han, H.; Zhang, S.; Bai, T. Crop Root Rows Detection Based on Crop Canopy Image. Agriculture 2024, 14, 969.
- Li, G.; Le, F.; Si, S.; Cui, L.; Xue, X. Image Segmentation-Based Oilseed Rape Row Detection for Infield Navigation of Agri-Robot. Agronomy 2024, 14, 1886.
- Simons, C.; Liu, Z.; Marcus, B.; Roy-Chowdhury, A.K.; Karydis, K. Language-guided Robust Navigation for Mobile Robots in Dynamically-changing Environments. arXiv 2024, arXiv:2409.19459.
- Choudhary, A.; Kobayashi, Y.; Arjonilla, F.J.; Nagasaka, S.; Koike, M. Evaluation of mapping and path planning for non-holonomic mobile robot navigation in narrow pathway for agricultural application. In Proceedings of the 2021 IEEE/SICE International Symposium on System Integration (SII), Iwaki, Fukushima, Japan, 11–14 January 2021; pp. 17–22.
- Xiao, K.; Xia, W.; Liang, C. Visual Navigation Path Extraction Algorithm in Orchard under Complex Background. Trans. Chin. Soc. Agric. Mach. 2023, 54, 197–204+252.
- Yang, Z.; Ouyang, L.; Zhang, Z.; Duan, J.; Yu, J.; Wang, H. Visual navigation path extraction of orchard hard pavement based on scanning method and neural network. Comput. Electron. Agric. 2022, 197, 106964.
- Yu, J.; Zhang, J.; Shu, A.; Chen, Y.; Chen, J.; Yang, Y.; Tang, W.; Zhang, Y. Study of convolutional neural network-based semantic segmentation methods on edge intelligence devices for field agricultural robot navigation line extraction. Comput. Electron. Agric. 2023, 209, 107811.
- Silva, R.; Cielniak, G.; Gao, J. Vision based crop row navigation under varying field conditions in arable fields. Comput. Electron. Agric. 2024, 217, 108581.
- Li, C.; Pan, Y.; Li, D.; Fan, J.; Li, B.; Zhao, B.; Zhao, Y.; Wang, J. A curved path extraction method using RGB-D multimodal data for single-edge guided navigation in irregularly shaped fields. Expert Syst. Appl. 2024, 255, 124586.
- Saha, S.; Noguchi, N. Smart vineyard row navigation: A machine vision approach leveraging YOLOv8. Comput. Electron. Agric. 2025, 229, 109839.
- Ball, D.; Upcroft, B.; Wyeth, G.; Corke, P.; English, A.; Ross, P.; Patten, T.; Fitch, R.; Sukkarieh, S.; Bate, A. Vision-based Obstacle Detection and Navigation for an Agricultural Robot. J. Field Rob. 2016, 33, 1107–1130.
- Zheng, Z.; Hu, Y.; Li, X.; Huang, Y. Autonomous navigation method of jujube catch-and-shake harvesting robot based on convolutional neural networks. Comput. Electron. Agric. 2023, 215, 108469.
- Liu, T.; Zheng, Y.; Lai, J.; Cheng, Y.; Chen, S.; Mai, B.; Liu, Y.; Li, J.; Xue, Z. Extracting visual navigation line between pineapple field rows based on an enhanced YOLOv5. Comput. Electron. Agric. 2024, 217, 108574.
- Zhang, T.; Zhou, J.; Liu, W.; Yue, R.; Shi, J.; Zhou, C.; Hu, J. SN-CNN: A Lightweight and Accurate Line Extraction Algorithm for Seedling Navigation in Ridge-Planted Vegetables. Agriculture 2024, 14, 1446.
- Hendrycks, D.; Gimpel, K. Gaussian Error Linear Units (GELUs). arXiv 2019, arXiv:1606.08415.
- Xu, Y.; Zhang, J.; Zhang, Q.; Tao, D. ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022.
- Wang, C.; Bochkovskiy, A.; Liao, H. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
- Geng, Z.; Sun, K.; Xiao, B.; Zhang, Z.; Wang, J. Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 14676–14686.
- Braun, M.; Rao, Q.; Wang, Y.; Flohr, F. Pose-RCNN: Joint object detection and pose estimation using 3D object proposals. In Proceedings of the International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016; pp. 1546–1551.
- Clevert, D.; Unterthiner, T.; Hochreiter, S. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). In Proceedings of the 4th International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016.
- Maas, A.; Hannun, A.; Ng, A. Rectifier Nonlinearities Improve Neural Network Acoustic Models. In Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, GA, USA, 16–21 June 2013.
- Shazeer, N. GLU Variants Improve Transformer. arXiv 2020, arXiv:2002.05202.
- Misra, D. Mish: A Self Regularized Non-Monotonic Neural Activation Function. arXiv 2019, arXiv:1908.08681.
Average hardware occupancy (CPU/GPU/RAM) and training behavior of each model:

| Models | CPU | GPU | RAM | Epoch Time (s) | Model Size (MB) | Parameter Amount |
|---|---|---|---|---|---|---|
| DEKR | 14.4% | 55.0% | 7.5% | 29.4 | 341.7 | 29,404,738 |
| VITPose | 6.4% | 83.8% | 6.1% | 38.1 | 343.2 | 89,991,429 |
| RCNN-pose | 9.5% | 82.5% | 7.2% | 351.6 | 225.7 | 59,038,942 |
| YOLOv7-Tiny-pose | 19.7% | 36.3% | 18.8% | 20.3 | 18.6 | 9,599,635 |
| YOLOv8s-pose | 10.2% | 25.0% | 11.0% | 26.3 | 22.4 | 11,423,552 |
| DoubleNet | 22.4% | 41.3% | 10.5% | 19.4 | 202.0 | 53,167,038 |
Average hardware occupancy (CPU/GPU/RAM) and training behavior of DoubleNet with different activation functions (all variants share one model size and parameter count):

| Models | CPU | GPU | RAM | Epoch Time (s) | Model Size (MB) | Parameter Amount |
|---|---|---|---|---|---|---|
| DoubleNet (ELU) | 21.5% | 32.5% | 10.0% | 18.4 | 202.0 | 53,167,038 |
| DoubleNet (LeakyReLU) | 21.6% | 32.5% | 9.9% | 18.2 | 202.0 | 53,167,038 |
| DoubleNet (SiLU) | 22.7% | 32.5% | 10.0% | 17.9 | 202.0 | 53,167,038 |
| DoubleNet (Mish) | 21.3% | 32.5% | 9.9% | 17.5 | 202.0 | 53,167,038 |
| DoubleNet (GELU) | 22.4% | 32.5% | 9.9% | 18.8 | 202.0 | 53,167,038 |
| DoubleNet (SA-GELU) | 22.4% | 41.3% | 10.5% | 19.3 | 202.0 | 53,167,038 |
Percentage of correct keypoints (PCK) of each model, per navigation point:

| Model | Top Point | Middle Point 1 | Middle Point 2 | Middle Point 3 | Bottom Point | Average |
|---|---|---|---|---|---|---|
| DEKR | 60.25% | 59.10% | 57.10% | 55.10% | 56.78% | 57.66% |
| VITPose | 92.53% | 92.42% | 85.36% | 95.14% | 88.53% | 90.80% |
| RCNN-pose | 78.15% | 76.80% | 75.35% | 75.60% | 76.10% | 76.40% |
| YOLOv7-Tiny-pose | 75.10% | 73.92% | 74.89% | 73.45% | 74.25% | 74.32% |
| YOLOv8s-pose | 91.22% | 95.13% | 96.41% | 95.00% | 86.17% | 92.79% |
| DoubleNet | 96.26% | 97.48% | 97.53% | 97.46% | 90.04% | 95.75% |
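PCK here follows the usual keypoint-estimation definition: a predicted navigation point counts as correct when its distance to the labeled point is within a threshold (the paper's exact threshold is specified in Section 2.4.2). A minimal NumPy sketch, with the threshold value chosen purely for illustration:

```python
import numpy as np

def pck(pred: np.ndarray, gt: np.ndarray, tau: float) -> float:
    """pred, gt: (N, 2) arrays of predicted/labeled pixel coordinates.
    Returns the fraction of predictions within `tau` pixels of the label."""
    dists = np.linalg.norm(pred - gt, axis=1)  # per-point Euclidean error
    return float((dists <= tau).mean())

# Illustrative use: top-point predictions on two frames, 5%-of-224-px threshold.
pred = np.array([[112.0, 40.0], [110.0, 90.0]])
gt = np.array([[110.0, 42.0], [130.0, 88.0]])
print(pck(pred, gt, tau=0.05 * 224))  # 0.5: first point is ~2.8 px off, second ~20 px off
```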
Ablation results: percentage of correct keypoints (PCK) per navigation point and model properties for each component combination (combinations 2 through 8 share one file size and parameter count):

| # | Combination | Top Point | Middle Point 1 | Middle Point 2 | Middle Point 3 | Bottom Point | Average | Path File Size | Parameter Amount |
|---|---|---|---|---|---|---|---|---|---|
| 1 | MHSA + GELU + RGB-dataset | 94.89% | 93.21% | 93.06% | 92.66% | 74.88% | 89.74% | 201.0 MB | 52,745,694 |
| 2 | F-MHSA + GELU + RGB-dataset | 92.34% | 92.40% | 96.11% | 96.62% | 87.23% | 92.94% | 202.0 MB | 53,167,038 |
| 3 | F-MHSA + GELU + F-dataset | 96.12% | 96.56% | 97.47% | 95.92% | 88.18% | 94.85% | 202.0 MB | 53,167,038 |
| 4 | F-MHSA + SiLU + F-dataset | 95.42% | 93.83% | 94.58% | 97.21% | 90.56% | 94.32% | 202.0 MB | 53,167,038 |
| 5 | F-MHSA + ELU + F-dataset | 94.91% | 93.32% | 98.67% | 97.88% | 82.78% | 93.51% | 202.0 MB | 53,167,038 |
| 6 | F-MHSA + Mish + F-dataset | 91.59% | 94.87% | 96.39% | 93.83% | 86.53% | 92.64% | 202.0 MB | 53,167,038 |
| 7 | F-MHSA + LeakyReLU + F-dataset | 92.68% | 90.29% | 93.65% | 91.14% | 82.10% | 89.97% | 202.0 MB | 53,167,038 |
| 8 | F-MHSA + SA-GELU + F-dataset | 96.26% | 97.48% | 97.53% | 97.46% | 90.04% | 95.75% | 202.0 MB | 53,167,038 |
Average hardware occupancy (CPU/GPU/RAM) and inference performance of each model:

| Models | CPU | GPU | RAM | Speed (FPS) | Cost (GFLOPS) | Power (W) |
|---|---|---|---|---|---|---|
| DEKR | 6.9% | 64.2% | 3.7% | 2.23 | 21.82 | 15.8 |
| VITPose | 4.9% | 83.1% | 5.6% | 74.63 | 22.08 | 12.6 |
| RCNN-pose | 6.4% | 78.9% | 6.4% | 0.91 | 483.95 | 25.3 |
| YOLOv7-Tiny-pose | 7.6% | 31.6% | 6.8% | 156.25 | 1.47 | 9.4 |
| YOLOv8s-pose | 10.1% | 24.4% | 5.6% | 192.32 | 1.85 | 7.9 |
| DoubleNet | 7.4% | 40.2% | 5.4% | 71.16 | 2.07 | 11.4 |
| Activation Function | SA-GELU | GELU | Mish | SiLU | LeakyReLU | ELU |
|---|---|---|---|---|---|---|
| Average Time (ms) | 24.91 | 1.99 | 1.37 | 1.98 | 1.13 | 1.98 |
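For reproducibility, below is one hedged way such per-activation timings could be collected. The tensor shape, warm-up scheme, and run count are assumptions rather than the paper's protocol, and SA-GELU is omitted because its definition is the paper's own (Section 2.2.1):

```python
import time
import torch
import torch.nn as nn

def average_forward_ms(act: nn.Module, runs: int = 100) -> float:
    """Mean forward time of one activation over `runs` passes, in milliseconds."""
    x = torch.randn(16, 256, 14, 14)  # assumed feature-map size
    if torch.cuda.is_available():
        act, x = act.cuda(), x.cuda()
    with torch.no_grad():
        for _ in range(10):  # warm-up passes excluded from timing
            act(x)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            act(x)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs * 1000.0

for name, act in [("GELU", nn.GELU()), ("Mish", nn.Mish()), ("SiLU", nn.SiLU()),
                  ("LeakyReLU", nn.LeakyReLU()), ("ELU", nn.ELU())]:
    print(f"{name}: {average_forward_ms(act):.2f} ms")
```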
| Operation | Parameter Amount | Average Time (ms) | Size (MB) | Final Loss (DoubleNet) | PCK (DoubleNet) |
|---|---|---|---|---|---|
| MHSA | 31,427,424 | 39.14 | 127.59 | 0.000189 (training) / 0.000377 (validation) | 89.74% |
| F-MHSA | 31,428,240 | 58.79 | 128.20 | 0.000185 (training) / 0.000370 (validation) | 92.94% |
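For context on the MHSA row above: it corresponds to standard multi-head self-attention, sketched below with PyTorch's built-in module at an assumed token layout of 196 tokens by 256 channels (the Block (3)/(4) scale). F-MHSA, the paper's fused variant from Section 2.2.2, is intentionally not reproduced here:

```python
import torch
import torch.nn as nn

tokens = torch.randn(1, 196, 256)  # (batch, sequence length, embedding dim) -- assumed layout
mhsa = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
out, attn = mhsa(tokens, tokens, tokens)  # self-attention: query = key = value
print(out.shape)  # torch.Size([1, 196, 256])
```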
| Scheme | Accuracy (Average PCK) | Total Size (MB) | Final Training Loss | Final Validation Loss | Conclusion | Operation |
|---|---|---|---|---|---|---|
| 1 + 1 + 1 | 80.16% | 999.08 | 0.0001314 | 0.0014247 | overfitting | reject |
| 1 + 1 + 2 | 82.51% | 977.42 | 0.0001242 | 0.0013319 | | |
| 1 + 2 + 1 | 79.77% | 977.04 | 0.0001559 | 0.0014659 | | |
| 1 + 2 + 2 | 93.66% | 955.38 | 0.0001480 | 0.0008751 | 2nd high | |
| 2 + 1 + 1 | 67.81% | 984.37 | 0.0001679 | 0.0018778 | overfitting | |
| 2 + 1 + 2 | 82.92% | 962.71 | 0.0001173 | 0.0015441 | | |
| 2 + 2 + 1 | 95.75% | 962.32 | 0.0001156 | 0.0009351 | 1st high | accept |
| 2 + 2 + 2 | 52.68% | 940.67 | 0.0001426 | 0.0024134 | overfitting | reject |