Autonomous Agricultural Robot Using YOLOv8 and ByteTrack for Weed Detection and Destruction
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper proposes a real-time weed detection and destruction system combining YOLOv8 and ByteTrack, deployed on a ROS-integrated agricultural robot platform. This demonstrates technical integration innovation in merging vision algorithms with robotic applications. I suggest minor revision before full acceptance.
(1) It is recommended to introduce some of the latest YOLO versions or other advanced models to further expand the comparative experiments.
(2) It is suggested to provide a detailed explanation of the performance differences of tracking algorithms other than ByteTrack under various conditions. By comparing these, the advantages of the ByteTrack algorithm can be discussed.
(3) For CV applications in different fields, more recent articles may be added: Dual-Frequency Lidar for Compressed Sensing 3D Imaging Based on All-Phase Fast Fourier Transform, Journal of Optics and Photonics Research; 3D vision technologies for a self-developed structural external crack damage recognition robot, Automation in Construction.
(4) The YOLO algorithm is mentioned multiple times across different sections, making the structure slightly redundant. It can be further revised to improve the flow.
(5) It is recommended to provide a detailed explanation of the impact of pruning and quantization on the model's robustness.
(6) Figures 5 and 6 are somewhat blurry and need to be made clearer. The reference formatting should be consistent, as some references are missing DOI numbers.
(7) Consider discussing potential applications or future research directions for the proposed methodology. This can further highlight the importance of the research and the potential impact on this tracking area.
(8) It is recommended to emphasize the limitations of the research methodology and provide a more specific work plan so that the reader can better understand the full picture of the research.
Author Response
Comments 1: It is recommended to introduce some of the latest YOLO versions or other advanced models to further expand the comparative experiments.
Response 1: To strengthen the study and expand the comparative experiments, we have trained and evaluated additional YOLO versions, YOLOv9 and YOLOv11, on the same dataset used for YOLOv8. Table 1 has been revised with updated data. The following explanations have been added to the results and discussion section of the manuscript.
“YOLOv9 introduces modifications to the network architecture, particularly in feature extraction and optimization strategies, aiming to improve detection speed while maintaining high accuracy. The mAP_0.5 value of this model is 0.893, the recall rate is 0.803, and the precision value is 0.905. The model performed well in detecting weeds but exhibited a slightly lower recall, indicating a higher rate of false negatives compared to other models. Despite this, it maintained a strong balance between precision and detection speed.
YOLOv11 builds on advancements from previous YOLO versions, incorporating improved object localization techniques and an enhanced detection backbone. The mAP_0.5 value of this model is 0.926, the recall rate is 0.870, and the precision value is 0.915. The model demonstrated strong performance in weed detection in terms of both precision and detection speed. However, compared to other models, except for YOLOv9, it exhibited slightly lower precision due to a higher false positive rate.”
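For readers who wish to reproduce this kind of comparison, a minimal sketch using the Ultralytics Python API is given below. The checkpoint names, the dataset file ("weeds.yaml"), and the training settings are illustrative assumptions, not the exact configuration used in the manuscript.

```python
# Hedged sketch: training and evaluating several YOLO versions on one dataset.
# Checkpoint names, dataset file, and hyperparameters are illustrative assumptions.
from ultralytics import YOLO

for weights in ["yolov8n.pt", "yolov9c.pt", "yolo11n.pt"]:
    model = YOLO(weights)                                    # load a pretrained checkpoint
    model.train(data="weeds.yaml", epochs=100, imgsz=640)    # fine-tune on the weed dataset
    metrics = model.val()                                    # evaluate on the validation split
    print(weights, metrics.box.map50, metrics.box.mp, metrics.box.mr)  # mAP@0.5, precision, recall
```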
Comments 2: It is suggested to provide a detailed explanation of the performance differences of tracking algorithms other than ByteTrack under various conditions. By comparing these, the advantages of the ByteTrack algorithm can be discussed.
Response 2: The performance differences of various tracking algorithms under different conditions have been analyzed, and the advantages of the ByteTrack algorithm over these methods, along with the reasons for its selection, have been explained based on the literature. In accordance with the reviewer’s suggestions, the following section has been added to the manuscript.
…“Commonly used methods in object tracking algorithms include ByteTrack, SORT, DeepSORT, and StrongSORT [2]. The ByteTrack object tracking algorithm has low hardware requirements due to its simple structure, which enables fast processing. By incorporating objects with low confidence scores into its evaluation, it enhances overall tracking accuracy. This capability allows ByteTrack to achieve an effective balance between tracking speed and accuracy [1]. For example, ByteTrack consistently outperformed SORT and DeepSORT in detecting vehicles and people in highway timelapse videos [3]. Moreover, object tracking algorithms such as DeepSORT, StrongSORT, DeepOCSORT, and BOT-SORT employ the Re-Identification (ReID) method, which significantly reduces the network's processing speed [1].
As a result of the literature review, the ByteTrack algorithm, proposed by Zhang et al. [4] in 2022, was selected as the object-tracking algorithm for this study.” …
[1] You, L., Chen, Y., Xiao, C., Sun, C., & Li, R. (2024). Multi-Object Vehicle Detection and Tracking Algorithm Based on Improved YOLOv8 and ByteTrack. Electronics, 13(15), 3033.
[2] Wang, Y., & Mariano, V. Y. (2024). A Multi Object Tracking Framework Based on YOLOv8s and Bytetrack Algorithm. IEEE Access.
[3] Abouelyazid, M. (2023). Comparative Evaluation of SORT, DeepSORT, and ByteTrack for Multiple Object Tracking in Highway Videos. International Journal of Sustainable Infrastructure for Cities and Societies, 8(11), 42-52.
[4] Zhang, Y., Sun, P., Jiang, Y., Yu, D., Weng, F., Yuan, Z., ... & Wang, X. (2022, October). Bytetrack: Multi-object tracking by associating every detection box. In European conference on computer vision (pp. 1-21). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-20047-2_1
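As an illustration of the detection-plus-tracking pipeline discussed above, the sketch below runs YOLOv8 with the ByteTrack tracker configuration shipped with the Ultralytics library. The checkpoint and video source names are assumptions, and this sketch does not necessarily reproduce the authors' exact integration.

```python
# Hedged sketch: YOLOv8 detection with ByteTrack association via the Ultralytics
# tracking interface. Checkpoint and video source are illustrative assumptions.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                    # detector checkpoint (illustrative)
results = model.track(
    source="field_video.mp4",                 # camera index or video path (assumed)
    tracker="bytetrack.yaml",                 # built-in ByteTrack tracker configuration
    conf=0.25,                                # low detector threshold so low-score boxes still reach ByteTrack's second association step
    stream=True,                              # yield results frame by frame
    persist=True,                             # keep track IDs across frames
)
for r in results:
    if r.boxes.id is not None:                # some frames may have no confirmed tracks
        print(r.boxes.id.int().tolist(), r.boxes.cls.int().tolist())  # track IDs and class indices
```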
Comments 3: For CV applications in different fields, more recent articles may be added: Dual-Frequency Lidar for Compressed Sensing 3D Imaging Based on All-Phase Fast Fourier Transform, Journal of Optics and Photonics Research; 3D vision technologies for a self-developed structural external crack damage recognition robot, Automation in Construction.
Response 3: In accordance with the reviewer’s suggestions, the following section has been added to the manuscript.
“Lidar plays a critical role in the process of acquiring depth information by providing high measurement accuracy and fine angular resolution. Various studies have been conducted to enhance the 3D imaging performance of Lidar technology. For example, since obtaining a depth map requires multiple reconstruction calculations, CS-based dual-frequency laser 3D imaging methods, which work with only two calculations, have been preferred by researchers [1]. Additionally, Lidar and camera fusion-based systems have demonstrated effective performance in extracting 3D contours [2]. In this study, depth information between the agricultural robot and obstacles was obtained using Lidar, and this information proved sufficient for the robot to avoid obstacles and create a map.”
[1] Li, X., Hu, Y., Jie, Y., Zhao, C., & Zhang, Z. (2024). Dual-Frequency Lidar for Compressed Sensing 3D Imaging Based on All-Phase Fast Fourier Transform. Journal of Optics and Photonics Research, 1(2), 74-81.
[2] Hu, K., Chen, Z., Kang, H., & Tang, Y. (2024). 3D vision technologies for a self-developed structural external crack damage recognition robot. Automation in Construction, 159, 105262.
Comments 4: The YOLO algorithm is mentioned multiple times across different sections, making the structure slightly redundant. It can be further revised to improve the flow.
Response 4: With the structural revision made in the manuscript, the YOLOv5 and YOLOv8 sections have been merged under a single heading.
Comments 5: It is recommended to provide a detailed explanation of the impact of pruning and quantization on the model's robustness.
Response 5: The effects of pruning and quantization on the model are discussed in greater detail in the "Pruning and Quantization" section. Additionally, a paragraph on this topic has been added to the conclusion section. The contribution of these techniques to the model's speed is presented below.
Effect on Inference Speed: One of the major benefits of pruning and quantization was the substantial increase in inference speed:
- YOLOv5 without pruning and quantization: 15 FPS
- YOLOv5 with pruning and quantization using transfer learning: 47 FPS
- YOLOv5 with pruning and quantization: 116 FPS
This improvement in processing speed enhances the real-time capabilities of the system, allowing the robot to process more frames per second and make decisions faster, which is crucial for dynamic agricultural environments.
“Pruning reduces the number of parameters in the model by removing less significant weights, while quantization compresses the model by reducing the bit precision of weights and activations (e.g., from 32-bit floating-point to 8-bit integer). These modifications balance model size, processing speed, and detection accuracy. Although a slight reduction in accuracy is observed, the model continues to maintain a high detection rate, indicating that pruning and quantization do not significantly compromise its effectiveness. One of the greatest advantages of pruning and quantization is the notable increase in inference speed. This improvement in processing speed enhances the system’s real-time capabilities, allowing the robot to process more frames per second and make faster decisions, an essential factor for dynamic agricultural environments.”
Results and Discussion:
“Despite parameter reduction, it was observed that pruned and quantized models did not have a significant negative impact on robustness. Overall, the models maintained high detection accuracy despite variations in field conditions, lighting, and weed species.”
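The two techniques described above can be illustrated with a minimal PyTorch sketch: magnitude-based pruning of convolutional weights followed by post-training dynamic quantization. The checkpoint, pruning ratio, and quantization scope are assumptions for illustration and do not reproduce the authors' exact compression pipeline.

```python
# Hedged sketch: L1-magnitude pruning and post-training dynamic quantization in PyTorch.
# Checkpoint, pruning ratio, and quantized layer types are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Load a detector to compress (illustrative; not the authors' exact checkpoint).
model = torch.hub.load("ultralytics/yolov5", "yolov5s")

# Magnitude pruning: zero out 30% of the smallest-magnitude weights in every conv layer.
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")     # make the pruning permanent

# Post-training dynamic quantization of linear layers to 8-bit integers.
# (Convolutional layers typically require static or quantization-aware quantization instead.)
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```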
Comments 6: Figures 5 and 6 are somewhat blurry and need to be made clearer. The reference formatting should be consistent, as some references are missing DOI numbers.
Response 6: Figures 5 and 6 have been clarified and added to the manuscript. Missing DOI numbers have been included, and reference formatting has been completed.
Comments 7: Consider discussing potential applications or future research directions for the proposed methodology. This can further highlight the importance of the research and the potential impact on this tracking area.
Response 7: In accordance with the reviewer's suggestions, the potential applications of the proposed methodology and future research directions have been discussed. This discussion has been incorporated into the conclusion section with the following paragraph.
“Agricultural robots can detect not only weeds but also harmful insects, diseases, and the ripeness level of crops. This enables the development of a more efficient and sustainable agricultural system. Furthermore, the integration of artificial intelligence and agricultural robot technologies, as demonstrated in this study, facilitates the development of environmentally friendly applications.
In future studies, it is recommended to use BLDC motors instead of DC motors to enhance the autonomous driving performance of agricultural robots. Additionally, integrating the IMU with wheel encoders and the Extended Kalman Filter will allow for more accurate odometry readings. Along with these hardware upgrades, several approaches can be applied to improve the performance of YOLO, such as enhancing the model architecture and optimizing the training processes. These improvements could make YOLO more effective, especially in real-time and large-scale applications.”
Comments 8: It is recommended to emphasize the limitations of the research methodology and provide a more specific work plan so that the reader can better understand the full picture of the research.
Response 8: The limitations of this study's methodology have been emphasized by adding the following paragraph. Efforts have been made to present a more specific work plan through all the revisions.
“One of the primary limitations of this study is the high computational power required by the YOLO model due to its operation with large-scale datasets. This challenge has made it difficult to run the model on the embedded system (Jetson Nano). Additionally, the model implemented in the agricultural robot is designed to recognize seven common weed species. However, the predominant weed species can vary across different geographical regions. Consequently, the model will need to be retrained to accommodate the local weed species.”
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This work focuses on developing an “intelligent” machine for detecting and destroying weeds in agricultural areas. The authors based the imaging analysis on deep learning algorithms that allegedly can accurately detect weeds in agricultural fields. The high computational demands of deep learning-based weed detection algorithms pose challenges for their use in real-time applications. This study proposes an alternative: a vision-based autonomous agricultural robot that leverages the YOLOv8 model combined with Byte-Track to achieve effective real-time weed detection. Results show a high detection accuracy, tested under different variables, making the device a promising way to enhance the efficiency of weed management in agricultural practices.
The work follows a well-described method based on an imaging analysis algorithm developed over recent years. The application could be of interest to a broad community and could imply a significant technological development.
Please address the following concerns before the paper is published:
LINES 103-125. Please better elaborate on your work's scientific (or technological) contribution.
MATERIALS AND METHODS. Why did you choose to base the system on Arduino instead of more robust hardware? Contrast the different benefits or limitations of such selection.
FIGURE 2. Improve the insets of the figure; some of the text can hardly be read. Provide a block diagram of the machine.
LINES 201-221. What is your contribution to developing the different YOLO software stages?
Figures 5 & 6. Improve the size or resolution of this figure; some text is unreadable.
LINES 410-428. Is it possible to construct a variable (or reduced variable) where precision (or performance) and fps could be integrated? According to your discussion, both variables are required to characterize an autonomous system better.
Author Response
Comments 1: LINES 103-125. Please better elaborate on your work's scientific (or technological) contribution.
Response 1: The introduction section has been revised in accordance with the reviewers' suggestions.
“Agricultural research has shown that the use of autonomous and semi-autonomous robots can enhance productivity while also promoting healthy production and environmental sustainability. In this context, the integration of image processing-based deep learning methods into agricultural robots is expected to provide significant contributions.
State-of-the-art weed detection and removal robots are optimized for industrial use with advanced multispectral and hyperspectral cameras, LiDAR systems, high-performance processors, and durable mechanical components. Due to these features, they tend to be highly expensive. Academic research plays a crucial role in the development of commercial robots and paves the way for more efficient, cost-effective, and environmentally friendly solutions in the future.
Despite numerous academic studies on weed detection and destruction robots, ensuring that these systems operate with high accuracy and speed remains a challenging research area that requires further development. This study aims to contribute to this gap by developing a prototype of a low-cost weed detection and removal robot with an accuracy/speed balance by integrating an object-tracking algorithm into a new generation deep learning method.
The main objective of this study is to develop an autonomous agricultural robot capable of real-time differentiation between crops and weeds using the YOLOv8-ByteTrack deep learning (DL) algorithm without feature extraction. The low-cost prototype system successfully distinguished crops from weeds in real-time. Additionally, various YOLO deep learning models were compared, and the results were presented. One of the objectives of this study is to minimize the negative impact of chemical herbicides on soil and water resources by replacing chemical spraying with laser-based weed control. This approach contributes significantly to sustainable agricultural practices by targeting only the necessary areas for intervention.”
Comments 2: MATERIALS AND METHODS. Why did you choose to base the system on Arduino instead of more robust hardware? Contrast the different benefits or limitations of such selection.
Response 2: The system is definitely not based on Arduino. As is well known, it is not possible to run ROS and deep learning algorithms on an Arduino. Arduino was used solely as an intermediate electronic board for positioning the stepper motors that control the laser, primarily due to its low cost.
In this study, the ROS system was implemented on the Jetson Nano, while image processing and deep learning algorithms were executed on a computer. The embedded system limitations of the study are also discussed in the conclusion section.
Comments 3: FIGURE 2. Improve the insets of the figure; some of the text can hardly be read. Provide a block diagram of the machine.
Response 3: The resolution of Figure 2 has been improved. Additionally, the block diagram of the machine has been added to the paper.
Comments 4: LINES 201-221. What is your contribution to developing the different YOLO software stages?
Response 4: In this study, an object tracking algorithm was integrated into the existing YOLOv8 model and tested on a real-time system. Additionally, a performance comparison of different YOLO models was conducted to select the most suitable model under the development conditions.
Comments 5: Figures 5 & 6. Improve the size or resolution of this figure; some text is unreadable.
Response 5: The resolution of Figures 5 and 6 has been improved.
Comments 6: LINES 410-428. Is it possible to construct a variable (or reduced variable) where precision (or performance) and fps could be integrated? According to your discussion, both variables are required to characterize an autonomous system better.
Response 6: In computer vision, accuracy (mAP) and speed (FPS) are two fundamental but distinct characteristics that influence each other. Although measuring both aspects with a single metric is challenging, a derived feature can enable a balanced evaluation, for instance by normalizing accuracy and speed and combining them as a ratio or weighted score. However, the interpretation of such a feature depends on the user, since the optimal balance between the two should align with the specific objectives of the project.
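One possible form of such a derived feature is sketched below: a weighted harmonic mean of normalized mAP and normalized FPS, where the reference FPS and the weight alpha are user-chosen assumptions rather than values taken from the manuscript.

```python
# Hedged sketch: one way to combine accuracy (mAP) and speed (FPS) into a single score.
# The normalization reference and the weight alpha are assumptions chosen by the user.
def combined_score(map50: float, fps: float, fps_ref: float = 100.0, alpha: float = 0.5) -> float:
    """Weighted harmonic mean of normalized accuracy and normalized speed."""
    acc = map50                          # mAP@0.5 already lies in [0, 1]
    spd = min(fps / fps_ref, 1.0)        # normalize FPS against a user-chosen reference
    if acc == 0.0 or spd == 0.0:
        return 0.0
    return 1.0 / (alpha / acc + (1.0 - alpha) / spd)

# Example with illustrative values; alpha > 0.5 weights accuracy more heavily than speed.
print(combined_score(0.90, 18.0, alpha=0.7))
```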
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The paper presents an application of deep learning in agricultural robotics. The study leverages YOLOv8 for object detection and ByteTrack for object tracking. While the content is relevant to the journal, the authors need to address the following comments to improve clarity in research contributions, methodology, and discussion of limitations.
1. The paper does not explicitly define the research problem and the gap it addresses. Authors need to clearly articulate how the proposed system improves upon existing agricultural weed detection robots/techniques. How does your approach compare to other state-of-the-art weed detection and removal robots beyond YOLO-based methods?
2. Authors mention use of a public dataset. Were images collected under different lighting conditions? Authors need to provide more information about augmentation techniques and dataset variability. How will these factors affect performance of deep learning algorithms?
3. The references seem to be dated and authors should do a literature review with recent papers in the field. A quick search of key words in scientific databases provides more recent and relevant papers which may be of interest to the authors:
https://doi.org/10.1016/j.compag.2021.106067
https://doi.org/10.1016/j.compag.2023.107698
https://doi.org/10.1016/j.eja.2019.01.004
4. The limitations of the proposed approach and implementation are not adequately discussed. For example: dependence on specific hardware, weather and lighting conditions affecting detection accuracy, challenges in detecting dense crop or occluded scenarios.
5. It is unclear how the system adapts to different crop types or new weed species. Can transfer learning be used for different agricultural settings?
6. Can the system generalize to different types of crops and weed species without retraining?
7. What are the latency and computational costs of running YOLOv8-ByteTrack in real time on the agricultural robot?
Author Response
Comments 1: The paper does not explicitly define the research problem and the gap it addresses. Authors need to clearly articulate how the proposed system improves upon existing agricultural weed detection robots/techniques. How does your approach compare to other state-of-the-art weed detection and removal robots beyond YOLO-based methods?
Response 1: In line with the reviewer's suggestions, the research problem and the gap addressed have been clearly defined. Additionally, following the reviewer's other recommendations, the section below has been added to the introduction.
“State-of-the-art weed detection and removal robots are optimized for industrial use with advanced multispectral and hyperspectral cameras, LiDAR systems, high-performance processors, and durable mechanical components. Due to these features, they tend to be highly expensive. Academic research plays a crucial role in the development of commercial robots and paves the way for more efficient, cost-effective, and environmentally friendly solutions in the future.
Despite numerous academic studies on weed detection and destruction robots, ensuring that these systems operate with high accuracy and speed remains a challenging research area that requires further development. This study aims to contribute to this gap by developing a prototype of a low-cost weed detection and removal robot with an accuracy/speed balance by integrating an object-tracking algorithm into a new generation deep learning method.
The main objective of this study is to develop an autonomous agricultural robot capable of real-time differentiation between crops and weeds using the YOLOv8-ByteTrack deep learning (DL) algorithm without feature extraction. The low-cost prototype system successfully distinguished crops from weeds in real-time. Additionally, various YOLO deep learning models were compared, and the results were presented. One of the objectives of this study is to minimize the negative impact of chemical herbicides on soil and water resources by replacing chemical spraying with laser-based weed control. This approach contributes significantly to sustainable agricultural practices by targeting only the necessary areas for intervention.”
Comments 2: Authors mention use of a public dataset. Were images collected under different lighting conditions? Authors need to provide more information about augmentation techniques and dataset variability. How will these factors affect performance of deep learning algorithms?
Response 2: In line with the reviewer's suggestions, the section below has been added to the results and discussion.
“The dataset used in this study was obtained from the Roboflow Weeds Dataset, a publicly available dataset that includes images of different weed species captured in diverse field conditions. The dataset contains 3,926 images of weeds commonly found in agricultural settings []. An additional 200 images of weeds were incorporated into the dataset to enhance the model's performance in its intended environment. In all datasets, Dandelion, Heliotropium indicum, Young Field Thistle (Cirsium arvense), Plantago lanceolata, Eclipta, and Urtica dioica were identified as significant weed species, and examples of images are depicted in Figure 8.
To ensure the robustness of the deep learning model, images were collected under various conditions. The dataset includes images taken in bright sunlight, partial shade, and low-light conditions, simulating real-world agricultural environments. Some images were captured under dry, wet, and cloudy conditions, which affect the contrast and appearance of weeds. Images include different angles, soil textures, and crop backgrounds to improve model generalization.
Augmentation techniques used to further enhance the dataset’s variability and improve the model’s ability to generalize include brightness and contrast adjustment, rotation and flipping, Gaussian noise injection, scaling and cropping, and color jittering. These augmentations were performed using Roboflow's built-in augmentation pipeline and additional processing in Python using the albumentations library.
The implementation of these techniques will positively impact the three key performance aspects of deep learning algorithms. The use of diverse lighting conditions and augmentations helps the model recognize weeds in different scenarios, reducing overfitting. By simulating real-world field conditions, the model can effectively differentiate weeds from crops, even under challenging circumstances. Data augmentation improves mAP scores by exposing the model to multiple variations, making it more resilient to unseen images.”
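The augmentations listed above can be expressed as an Albumentations pipeline, as in the minimal sketch below. The specific parameter values and probabilities are illustrative assumptions, not the authors' exact settings.

```python
# Hedged sketch: an Albumentations pipeline covering the augmentations described above.
# Parameter values and probabilities are illustrative assumptions.
import albumentations as A

transform = A.Compose(
    [
        A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),  # brightness/contrast
        A.HorizontalFlip(p=0.5),                                                      # flipping
        A.Rotate(limit=20, p=0.5),                                                    # rotation
        A.GaussNoise(p=0.3),                                                          # Gaussian noise injection
        A.RandomSizedBBoxSafeCrop(height=640, width=640, p=0.3),                      # scaling and cropping
        A.ColorJitter(p=0.3),                                                         # color jittering
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),           # keep YOLO labels consistent
)
# augmented = transform(image=image, bboxes=bboxes, class_labels=labels)
```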
Comments 3: The references seem to be dated and authors should do a literature review with recent papers in the field. A quick search of key words in scientific databases provides more recent and relevant papers which may be of interest to the authors:
Response 3: In accordance with the reviewer's suggestion, the following relevant studies have been added to the paper after conducting a literature review.
“Yu et al. compared the performance of deep convolutional neural network (DCNN) models—VGGNet, GoogLeNet, and DetectNet—in detecting emerging weeds in Bermuda grass. The study found that while the VGGNet model achieved impressive results across different mowing heights and surface conditions, DetectNet emerged as the most effective DCNN architecture for detecting a variety of broadleaf weeds alongside dormant Bermuda grass [16].
Hussain et al. collected approximately 24,000 images from potato fields under varying weather conditions, including sunny, cloudy, and partly cloudy environments. These images were tested using YOLOv3 and Tiny-YOLOv3 models for the detection of lamb's quarters weed and potato plants infected with early blight, as well as healthy potato plants. For the weed dataset, the mAP values of the Tiny-YOLOv3 and YOLOv3 models were determined to be 78.2% and 93.2%, respectively [17].
Ruigrok et al. reported that they effectively controlled 96% of weeds and identified potato plants with an accuracy of 84% using the YOLOv3 algorithm. Their study focused on recognizing crops rather than identifying weeds [18].
Junior and Ulson proposed a real-time weed detection system based on the YOLOv5 architecture. Their model was evaluated on a custom dataset consisting of five weed species, both with and without transfer learning. The results demonstrated that the system is functional, achieving a 77% accuracy rate while detecting weeds at 62 FPS [19].
Rehman et al. developed a novel detection model based on the YOLOv5 architecture, providing a method for distinguishing between soybean plants and weeds. They compared the model's performance against different YOLO versions and the transformer-based RT-DETR, reporting superior results with a mAP of 73.9 [20].
Li et al. developed a weed detection algorithm called YOLOv10n-FCDS, which identified Sagittaria trifolia, a common weed in rice fields, with an accuracy of 87.4%. Based on the obtained results, the rice fields were divided into sections, and customized spray prescriptions were formulated for each section at varying application rates [21].”
[8] https://doi.org/10.1016/j.compag.2021.106067
[9] https://doi.org/10.1016/j.compag.2023.107698
[16] Yu, J., Sharpe, S. M., Schumann, A. W., & Boyd, N. S. (2019). Deep learning for image-based weed detection in turfgrass. European journal of agronomy, 104, 78-84. https://doi.org/10.1016/j.eja.2019.01.004
[17] Hussain, N., Farooque, A. A., Schumann, A. W., McKenzie-Gopsill, A., Esau, T., Abbas, F., ... & Zaman, Q. (2020). Design and development of a smart variable rate sprayer using deep learning. Remote Sensing, 12(24), 4091.
[18] Ruigrok, T., van Henten, E., Booij, J., van Boheemen, K., & Kootstra, G. (2020). Application-specific evaluation of a weed-detection algorithm for plant-specific spraying. Sensors, 20(24), 7262. https://doi.org/10.3390/s20247262
[19] Junior, L.C.M., and Ulson, C.A.J., 2021. Real time weed detection using computer vision and deep learning. In: 14th IEEE International Conference on Industry Applications (INDUSCON), pp. 1131-1137, doi: 10.1109/INDUSCON51756.2021.9529761.
[20] Rehman, M. U., Eesaar, H., Abbas, Z., Seneviratne, L., Hussain, I., & Chong, K. T. (2024). Advanced drone-based weed detection using feature-enriched deep learning approach. Knowledge-Based Systems, 305, 112655.
[21] Li, Y., Guo, Z., Sun, Y., Chen, X., & Cao, Y. (2024). Weed Detection Algorithms in Rice Fields Based on Improved YOLOv10n. Agriculture, 14(11), 2066.
Comments 4: The limitations of the proposed approach and implementation are not adequately discussed. For example: dependence on specific hardware, weather and lighting conditions affecting detection accuracy, challenges in detecting dense crop or occluded scenarios.
Response 4: The limitations of the proposed approach and implementation have been added to the conclusion section.
“One of the primary limitations of this study is the high computational power required by the YOLO model due to its operation with large-scale datasets. This challenge has made it difficult to run the model on the embedded system (Jetson Nano). Additionally, the model used in the agricultural robot was developed to recognize seven common weed species. However, the dominant weed species may vary across different geographical regions. Therefore, to ensure the model adapts to local weed species, it must be retrained using an appropriate dataset that accounts for variations in weather and lighting conditions.”
Comments 5: It is unclear how the system adapts to different crop types or new weed species. Can transfer learning be used for different agricultural settings?
Response 5: Detailed information on how the model adapts to different crop types or new weed species has been included in the results and discussion section of the study. Additionally, the same section explains how the YOLOv8 and ByteTrack models can be utilized in various agricultural environments through transfer learning. In response to the reviewer’s suggestions, the following paragraph has been added to the manuscript.
“The deep learning model based on ByteTrack and YOLOv8 is designed for generalization across multiple environments. However, its adaptation to new agricultural settings can be further improved through several approaches. First, the model can be retrained using novel datasets of weeds and crops specific to a particular region or agricultural application. Alternatively, transfer learning can be utilized to leverage the pre-trained model without the need for full retraining, enabling the identification of new weed species. Transfer learning allows the model to efficiently adapt to new weed types or different crop species without requiring training from scratch. For instance, if the model was initially trained to detect seven different weed species, it can be fine-tuned using new images to accommodate fields with different weed compositions while preserving learned representations of plant structures. If a new weed species emerges in the field, a small labeled dataset containing images of this weed can be used for incremental learning to enhance recognition. Instead of retraining the entire network, transfer learning enables fine-tuning only the last few layers, making adaptation more efficient.”
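The fine-tuning strategy described above can be illustrated with a short sketch using the Ultralytics training API. The checkpoint name, dataset file, number of frozen layers, and learning rate are hypothetical values chosen for illustration only.

```python
# Hedged sketch: transfer learning to adapt a trained weed detector to a new region.
# Checkpoint name, dataset file, frozen-layer count, and learning rate are hypothetical.
from ultralytics import YOLO

model = YOLO("weed_detector_7species.pt")   # previously trained checkpoint (hypothetical name)
model.train(
    data="new_region_weeds.yaml",           # small labeled dataset for the new weed species (hypothetical)
    epochs=30,
    imgsz=640,
    freeze=10,                              # freeze the first 10 layers; fine-tune only the later layers
    lr0=0.001,                              # lower initial learning rate, typical for fine-tuning
)
```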
Comments 6: Can the system generalize to different types of crops and weed species without retraining?
Response 6: The YOLOv8-ByteTrack model was trained on a diverse dataset containing multiple weed species under various lighting conditions and backgrounds. This enables the model to learn weed characteristics rather than memorizing specific crop-weed combinations. The model focuses on key distinguishing features (leaf shape, texture, color) rather than absolute species labels, so YOLOv8-ByteTrack can detect new objects provided they share characteristics with the training data. However, full generalization without retraining is limited: if a previously unseen weed has unique characteristics, detection accuracy may decline; if a crop species closely resembles a detected weed (e.g., broadleaf crops vs. broadleaf weeds), the model may misclassify it; and significant changes in soil color, lighting, or camera angles can affect detection performance.
Comments 7: What are the latency and computational costs of running YOLOv8-ByteTrack in real time on the agricultural robot?
Response 7: Below are the real-time execution time and computational costs of YOLOv8-ByteTrack on the agricultural robot. To address this gap, the following text has been added to the manuscript.
“The models presented in Table 1 were tested on a laptop equipped with an NVIDIA GTX 1650 (4GB VRAM) and an Intel i5 processor with 16GB RAM. YOLOv8-ByteTrack achieved a frame rate of 18 FPS, meaning that processing a new frame took an average of 55 ms. When object tracking was performed with ByteTrack, an additional latency of approximately 5 ms per frame was introduced, resulting in a total delay of around 60 ms. In tests conducted solely on the CPU (Intel i5 without GPU acceleration), the FPS dropped to a range of 4-6, significantly limiting the practicality of real-time deployment on a CPU. In terms of computational cost, running YOLOv8-ByteTrack on the GTX 1650 utilized approximately 75-85% of the GPU resources, with memory consumption of roughly 3GB of VRAM and 5-7GB of system RAM. Under load, the system’s power consumption was estimated to be around 40-50W.”
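Per-frame latency and FPS of this kind can be measured with a short benchmarking loop, as in the sketch below. The checkpoint and video source are illustrative assumptions; the figures quoted in the response were obtained on the authors' own hardware.

```python
# Hedged sketch: measuring average per-frame latency and FPS of the detection+tracking loop.
# Checkpoint and video source are illustrative assumptions.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
per_frame_ms = []
for r in model.track(source="field_video.mp4", tracker="bytetrack.yaml", stream=True):
    # Each Results object reports preprocess/inference/postprocess times (ms) in r.speed.
    per_frame_ms.append(sum(r.speed.values()))

avg_ms = sum(per_frame_ms) / len(per_frame_ms)
print(f"average latency: {avg_ms:.1f} ms per frame (~{1000.0 / avg_ms:.1f} FPS)")
```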
Author Response File: Author Response.pdf
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have addressed all of my comments, and the paper has been improved.