Article
Peer-Review Record

Instance Segmentation of Lentinus edodes Images Based on YOLOv5seg-BotNet

Agronomy 2024, 14(8), 1808; https://doi.org/10.3390/agronomy14081808
by Xingmei Xu 1, Xiangyu Su 1, Lei Zhou 1, Helong Yu 1,* and Jian Zhang 2,3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 22 July 2024 / Revised: 9 August 2024 / Accepted: 15 August 2024 / Published: 16 August 2024

Round 1

Reviewer 1 Report (New Reviewer)

Comments and Suggestions for Authors

In this paper, the authors propose YOLOv5seg-BotNet, an instance segmentation model for Lentinus edodes, aiming to explore its application in the mushroom industry. First, the primary neural network was replaced with BoTNet, and the spatial convolutions in the local backbone network were replaced with global self-attention modules to enhance feature extraction capability.

However, the authors do not sufficiently detail their own contribution, which is briefly described in section 2.2.2. The structures are quickly presented in two figures and two equations. For a scientific paper, it is crucial that the authors explain their choices in detail and justify the motivation behind these choices. A detailed explanation of why BoTNet was chosen to replace the primary neural network and why the global self-attention modules were introduced would be beneficial. Additionally, the specific advantages of these modifications compared to existing techniques should be clearly articulated.

The authors claim that the improved model showed enhancements in precision, recall, Mask_AP, F1-Score, and FPS, demonstrating improvements in both segmentation accuracy and speed, with excellent detection and segmentation performance on Lentinus edodes fruiting bodies. However, these improvements are minimal compared to the original model. For example, increases of only 2.37% in precision and 2.61% in FPS do not necessarily justify the adoption of the new model without a more in-depth discussion on the impact of these gains in practical scenarios.

Furthermore, the descriptive part of existing methods takes up a disproportionate amount of space compared to the presentation of the authors' original contributions. A more concise description of existing methods would better highlight the innovation brought by this research. It would also be pertinent to include a more thorough and critical comparison of the performance of YOLOv5seg-BotNet with other state-of-the-art models under similar conditions to contextualize the obtained results.

Finally, the overall organization of the paper could be improved. A clearer structure, with distinct sections for methodology, experimental results, and discussion of the implications of the results, would allow for a smoother reading experience and a better understanding of the scientific contributions.

For all these reasons, I propose to reject this paper in its current form. The authors should consider rewriting the article to emphasize their original contributions, detail their methodological choices, and provide a more in-depth analysis of the obtained results. This includes a more balanced discussion between existing methods and the new proposals, as well as a solid justification for the observed improvements.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report (New Reviewer)

Comments and Suggestions for Authors

The article presents an update to YOLOv5seg that is used to better segment a specific type of mushroom. I have the following comments:

1. The authors need to provide further details as to what the contributions achieved by this article are in the ending of Section 1... All that is mentioned is "the present study has taken advantage of the YOLOv5seg-BotNet procedure to segment images at instance for detection and decision-making processes for the evaluation of Lentinus edodes, providing an improved way to report its quality and yield prediction."

2. Also, you need to have a short introduction to what comes next in the article right after the contributions.

3. Title 2.2 is too long

4. The authors did not elaborate on the reason behind the choice of YOLOv5s as their base model rather than a more recent version. They only said, 'Owing to its strong performance in real-time detection and instance segmentation applications,' and this is the case for more recent versions of YOLO as well.

5. Continuing on point 4, why not a lighter version, such as YOLOv5n, for example? It is even better for real-time performance.

6. The captions of Figure 2 and Figure 5 are more explanations of parts of the figures than captions.

7. I don't understand the ablation study: the first row represents YOLOv5seg, the second is when employing MHSA, and the third is when employing MHSA along with PANet, or just PANet, etc. An ablation should test the absence and presence of each component within your proposed model, so this table is very confusing.

8. Where are the references for the models employed in Table 3?

9. What is the difference between subsections 3.2 and 3.2? They seem to both be comparisons with other segmentation models.

10. What about applying your approach to other datasets?

11. Section 2.1 said that 1000 images were collected, yet the conclusion said 7,000!!!!!

Author Response

Dear Reviewers and Editors:

Thank you for editing and reviewing my manuscript. In this response, we will use red text to indicate the question and blue text to indicate our answer. The specific changes are as follows:

Reviewer 2:

  1. The authors need to provide further details as to what the contributions achieved by this article are in the ending of Section 1... All that is mentioned is "the present study has taken advantage of the YOLOv5seg-BotNet procedure to segment images at instance for detection and decision-making processes for the evaluation of Lentinus edodes, providing an improved way to report its quality and yield prediction."

Response: Thanks for the comments. At the end of the introduction, we have added a detailed explanation of the contributions of this paper to ensure that readers can fully understand the innovation and practical value of this research. Revisions have been made in the manuscript, in line 87.

  2. Also, you need to have a short introduction to what comes next in the article right after the contributions.

Response: Thanks for the comments. After explaining the contributions of this paper, we have added a short introduction outlining the structure of the paper and the content of each section, to help readers better grasp the context of the full paper. Revisions have been made in the manuscript, in line 107.

  3. Title 2.2 is too long.

Response: Thanks for the comments. We have noted the issue of heading 2.2 being too long and have simplified it to improve the brevity and readability of section headings. Revisions have been made in the manuscript, in line 137.

  4. The authors did not elaborate on the reason behind the choice of YOLOv5s as their base model rather than a more recent version. They only said, 'Owing to its strong performance in real-time detection and instance segmentation applications,' and this is the case for more recent versions of YOLO as well.

Response: Thanks for the comments, and for your attention to our choice of YOLOv5s as the base model. We acknowledge that newer YOLO versions (such as YOLOv8, v9, and v10) may offer superior performance in some respects, including accuracy and speed. Our reasoning was as follows:

  • Accuracy vs. speed: YOLOv5 achieves a good balance between accuracy and speed, which makes it especially suitable for real-time applications. YOLOv10 may surpass YOLOv5 on some metrics, but its more complex model structure can require greater computational resources and longer training times.
  • Training data requirements: YOLOv5 has modest dataset requirements and can achieve good performance on small-scale datasets, which suits resource-limited scenarios. YOLOv10 may require more data and longer training to perform at its best, which may not be ideal when data resources are limited.
  • Real-time requirements: The YOLOv5 model is small and its inference speed is fast, making it well suited to embedded devices and real-time processing. YOLOv10's inference can be slower due to its more complex model, which is less suitable for scenarios with strict real-time requirements.

The Lentinus edodes instance segmentation task requires efficient real-time processing. YOLOv5 provides a good balance of performance for this task, and its efficiency, speed, and adaptability make it the best choice for our current application.

YOLOv5s performed well in our initial tests, with high detection accuracy and speed, which is critical for real-time applications. While newer YOLO versions may perform better in some respects, the stability and compatibility of YOLOv5s on our dataset have been fully validated.

  5. Continuing on point 4, why not a lighter version, such as YOLOv5n, for example? It is even better for real-time performance.

Response: Thanks for the comments, and for your question about choosing a lighter version of the model. Here is why we chose YOLOv5s over YOLOv5n:

  • Precision vs. performance trade-off: While YOLOv5n is lighter than YOLOv5s, it generally suffers a drop in accuracy. Our research emphasizes not only real-time performance but also high-precision segmentation and detection results; YOLOv5s offers a better balance between the two.
  • Application requirements: Our research targets the instance segmentation of Lentinus edodes, which involves complex backgrounds and morphologically similar objects. The stronger feature extraction capability of YOLOv5s makes it more robust in these complex scenes.
  • Preliminary experimental results: In our preliminary experiments, YOLOv5s outperformed YOLOv5n in both accuracy and speed, especially in our specific application scenario. Therefore, for better overall performance, we chose YOLOv5s.
  6. The captions of Figure 2 and Figure 5 are more explanations of parts of the figures than captions.

Response: Thank you for your comments on the captions of Figures 2 and 5. We understand that a caption should convey the core content of the figure concisely rather than explain its parts in detail. The overall explanation of each figure is given in the body text; the additional detail after the caption supplements content not mentioned in the body and helps readers understand the figure.

  7. I don't understand the ablation study: the first row represents YOLOv5seg, the second is when employing MHSA, and the third is when employing MHSA along with PANet, or just PANet, etc. An ablation should test the absence and presence of each component within your proposed model, so this table is very confusing.

Response: Thanks for the comments. For the ablation study, rows 1 to 5 in Table 2 represent the original model YOLOv5seg, followed by the incorporation of MHSA into the original model, the addition of PANet to the original model, replacing the loss function with VFL in the original model, and finally, YOLOv5seg-BotNet, which integrates the three aforementioned components into the original model. We can see that each individual module improves the original model, demonstrating the effectiveness of each module.

Overall, YOLOv5seg-BotNet achieved the Precision, Recall, Mask_AP, F1-Score, and FPS of 97.58%, 95.74%, 95.90%, 96.65%, and 32.86 frames/s, respectively. Compared to the original model, it improved the Precision, Recall, Mask_AP, F1-Score, and FPS by 2.37%, 4.55%, 4.56%, 3.50%, and 2.61%, respectively.
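As a quick sanity check, the reported F1-Score is consistent with the reported Precision and Recall under the standard harmonic-mean definition of F1 (a generic sketch; only the metric values are taken from this response):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)

# Precision and Recall reported above for YOLOv5seg-BotNet.
f1 = f1_score(97.58, 95.74)
print(round(f1, 2))  # 96.65, matching the reported F1-Score
```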

  8. Where are the references for the models employed in Table 3?

Response: Thanks for the comments. The lack of references for the models cited in Table 3 was indeed an oversight. We have provided detailed references in the revision to support the models we compare against. Revisions have been made in the manuscript, in line 292.

  9. What is the difference between subsections 3.2 and 3.2? They seem to both be comparisons with other segmentation models.

Response: Thanks for the comments. Are you asking about sections 3.2 and 3.3? If so, we explain the difference below.

Section 3.2 primarily elaborates on the comparison between the proposed model and other segmentation models such as Mask RCNN, YOLACT, and YOLOv8 in terms of the evaluation metrics P, R, Mask_AP, F1-Score, and FPS, focusing mainly on numerical comparisons. YOLOv5seg-BotNet achieved Precision, Recall, Mask_AP, and F1-Score values of 97.58%, 95.74%, 95.90%, and 96.65%, respectively, and an FPS of 32.86 frames/s, all higher than those of the other models.

Section 3.3 primarily describes the actual application performance of various models on the test set after training. Figures 8 and 9 show that the Mask RCNN model has problems with missed segmentation and mis-segmentation, while YOLACT and YOLOv8 experience mis-segmentation issues. The model proposed in this paper does not have these problems, proving its superiority over other models in both evaluation metrics and practical application.

  10. What about applying your approach to other datasets?

Response: Thanks for the comments, and for highlighting the potential benefit of applying our approach to other datasets. We understand the importance of demonstrating the versatility and applicability of our model beyond the specific dataset used in the current study. While this study primarily focuses on Lentinus edodes, we intend to extend the method to other edible mushroom species in future research to validate its wider applicability.

  11. Section 2.1 said that 1000 images were collected, yet the conclusion said 7,000!!!!!

Response: Thanks for the comments, and for pointing out this discrepancy between the number of images mentioned in Section 2.1 and the conclusion. Revisions have been made in the manuscript, in line 382.

 

Reviewer 3 Report (New Reviewer)

Comments and Suggestions for Authors

In this study, the authors proposed various hybrid models called YOLOv5seg-MHSA, YOLOv5seg-PANet, YOLOv5seg-VFL, and YOLOv5seg-BotNet for the segmentation of Lentinus edodes (shiitake mushroom) fruiting bodies. They compared these models with each other and with the original YOLOv5seg, Mask RCNN, YOLACT, and YOLOv8 models from the literature. The study provides detailed information on how the newly developed models were created and which layers they contain. I think the revisions below will make the study more interesting:

 

1- Under the Data Collection and Preprocessing heading, it is mentioned how the data was obtained, but no information was given about the lighting and background during data collection. Information should be provided on these issues.

2- In line 103, it is stated that "The original images were annotated using Labelme.". Was a classification made in this study or was a feature (such as the diameter of the mushroom head) estimated? It should be stated more clearly. If classification is done, the labels used to label this data and the number of samples each label carries should be given as a table.

3- Line 104 states that data augmentation is done. This process should be presented more clearly. The number of original data and the number of data augmentations should be stated more clearly.

4-Data augmentation is expected to be performed on the training data after the data set is separated into training and testing. This should be stated in the study.

5- The stopping criterion of training is not specified in the parameters given in Table-1. It would be appropriate to add it.

6- In the results given in Table 4, it is seen that the Precision of the YOLACT method is better than the proposed method. At this point, it should be explained why the proposed method is better.

Author Response

Dear Reviewers and Editors:

Thank you for editing and reviewing my manuscript. In this response, we will use red text to indicate the question and blue text to indicate our answer. The specific changes are as follows:

Reviewer 3:

  1. Under the Data Collection and Preprocessing heading, it is mentioned how the data was obtained, but no information was given about the lighting and background during data collection. Information should be provided on these issues.

Response: Thanks for the comments. We have supplemented the data collection and preprocessing section with detailed information about the lighting and background to improve data transparency, in line 116. During data collection, we used standardized light-source and background settings to ensure the quality and consistency of the images. Specifically:

  • Lighting conditions: We used a uniform artificial light source during data acquisition, with constant intensity, to reduce the impact of shadows and reflections on image quality. This setup helped us obtain more consistent and reliable data.
  • Background setting: Images were captured in a controlled environment against a single-color background to minimize the interference of background clutter with model training. This ensures that the characteristics of the shiitake mushrooms are more prominent, which facilitates subsequent segmentation and identification. Thank you for pointing out this direction for improvement.
  2. In line 103, it is stated that "The original images were annotated using Labelme." Was a classification made in this study, or was a feature (such as the diameter of the mushroom head) estimated? It should be stated more clearly. If classification was done, the labels used to label the data and the number of samples each label carries should be given as a table.

Response: Thanks for the comments. In this study, we performed instance segmentation annotation rather than classification or feature estimation. Since no traditional classification task was performed, category labels were not used. We used the Labelme tool to precisely outline each fruiting body of shiitake mushroom, so that each instance could be identified and segmented during subsequent model training. Your comment also reminds us that information related to the morphology of shiitake mushrooms could be collected in future work to assist model training and validation.

  3. Line 104 states that data augmentation is done. This process should be presented more clearly. The number of original data and the number of data augmentations should be stated more clearly.

Response: Thanks for the comments. Revisions have been made in the manuscript, in line 130.

  4. Data augmentation is expected to be performed on the training data after the data set is separated into training and testing. This should be stated in the study.

Response: Thanks for the comments. We have detailed the specific steps of data augmentation in the manuscript, in particular the partitioning of the training and test datasets. Revisions have been made in the manuscript, in line 134.

  • Dataset partitioning: Before data augmentation, we first divided the dataset into training, validation, and test sets. We use 70% of the data for training, 20% for validation, and 10% for testing, which helps to evaluate the generalization ability of the model during training.
  • Implementation of data augmentation: Augmentation is applied only to the training set. In this way, the augmented data can effectively improve the generalization performance of the model without compromising the independence of the test set. Specific augmentation techniques include rotation, translation, etc.; these operations help the model learn more diverse features during training.
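The split-then-augment workflow described above can be sketched as follows. This is a minimal illustration rather than the authors' actual pipeline: the 70/20/10 ratios and the rotation/translation augmentations come from this response, while the file names and the way augmented copies are represented are placeholders:

```python
import random

def split_dataset(paths, seed=42):
    """Shuffle once, then split into 70% train / 20% validation / 10% test."""
    rng = random.Random(seed)
    paths = list(paths)
    rng.shuffle(paths)
    n = len(paths)
    n_train, n_val = int(0.7 * n), int(0.2 * n)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

def augment(train_paths):
    """Apply augmentation to the training split only (rotation, translation);
    the '#rot90' / '#shift' suffixes are placeholders for augmented copies."""
    augmented = []
    for p in train_paths:
        augmented.append(p)             # original image
        augmented.append(p + "#rot90")  # rotated copy (placeholder)
        augmented.append(p + "#shift")  # translated copy (placeholder)
    return augmented

train, val, test_set = split_dataset(f"img_{i}.jpg" for i in range(1000))
train = augment(train)  # val and test_set stay untouched
```

Because the split happens before augmentation, the validation and test sets contain only original images, preserving their independence.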
  5. The stopping criterion of training is not specified in the parameters given in Table-1. It would be appropriate to add it.

Response: Thanks for the comments. We used some common stopping criteria during training to ensure the best performance of the model. Revisions have been made in the manuscript, in line 248 and Table 1.

We use the following stop criteria to determine when the training process ends:

  • Early stopping: If the validation loss does not improve for 10 consecutive epochs, we stop training. This prevents the model from overfitting.
  • Maximum epochs: We set the maximum number of training epochs to 100 to ensure that training does not continue indefinitely even if the early-stopping criterion is not triggered.
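The two criteria above can be sketched as a small training-loop guard. This is an illustrative sketch, not the authors' training code; the `EarlyStopper` class and the placeholder validation loss are assumptions:

```python
class EarlyStopper:
    """Stop training once validation loss has not improved for `patience` epochs."""
    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True means: stop now

stopper = EarlyStopper(patience=10)
for epoch in range(100):                 # maximum of 100 epochs, as stated above
    val_loss = 1.0 / (epoch + 1)         # placeholder for the real validation loss
    if stopper.step(val_loss):
        break
```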
  6. In the results given in Table 4, it is seen that the Precision of the YOLACT method is better than the proposed method. At this point, it should be explained why the proposed method is better.

Response: Thanks for the comments. In Table 4, we note that the Precision of the YOLACT method is better than that of the proposed method. However, our approach still has important advantages in mask AP and overall performance, as detailed below:

  • In instance segmentation, mask quality is very important because it directly affects the accuracy and effectiveness of the model. The mask is used to precisely identify the area of each instance in the image, helping the model learn and recognize the shape and position of the object. Although the Precision of the YOLACT model is better than that of our proposed model, we believe the mask metric is more critical in the instance segmentation task, and the Mask_AP of our proposed model is higher than that of the YOLACT model.
  • Overall performance: While YOLACT performs well in precision, our approach is more balanced across other metrics such as recall, F1-Score, and speed. This means that our model correctly identifies targets while also maintaining high detection speed and recall, making it suitable for real-time applications.
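To make the role of the mask concrete, the quantity underlying Mask_AP is the per-instance mask IoU, which might be computed as follows (a generic sketch, not code from the paper; the example masks are arbitrary):

```python
import numpy as np

def mask_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks, the building block of Mask_AP."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union) if union else 0.0

# Two overlapping 2x2 squares inside a 4x4 image: 2 shared pixels, 6 in the union.
a = np.zeros((4, 4), dtype=bool); a[:2, :2] = True
b = np.zeros((4, 4), dtype=bool); b[:2, 1:3] = True
print(mask_iou(a, b))  # 2/6 ≈ 0.333
```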

Round 2

Reviewer 1 Report (New Reviewer)

Comments and Suggestions for Authors

 

BoTNet, PANet, and VFL are existing structures.

If the optimization technique is the original part of the paper, this part should be correctly presented by the authors, which is not the case.

The paper is a confirmative strategy for existing structures.

I propose to reject the paper.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report (New Reviewer)

Comments and Suggestions for Authors

The paper is improved but I suggest adding the reason behind the choice of YOLOv5s versus newer models and versus YOLOv5n to the text. Also the references to the others should be added within the table itself.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 3

Reviewer 1 Report (New Reviewer)

Comments and Suggestions for Authors

All the previous issues are fixed by the authors.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The study modified YOLOv5seg by incorporating BoTNet, PANet and VFL, and applied the network for shiitake instance segmentation. The study has a low novelty, as it is about integrating existing network components instead of proposing new network components. Literature review is weak, appears to be rather random, and lacks focus and comprehensiveness. No knowledge gaps in current literature are identified. No study justification is provided. No research objectives are specified. The instance segmentation task has a simple nature and a low difficulty. YOLOv5seg is an outdated network. I would be willing to potentially consider acceptance if the research work was done on state-of-the-art networks such as YOLOv9.

Author Response

Dear Reviewers and Editors:

Thank you for editing and reviewing my manuscript. In this response, we will use red text to indicate the question and blue text to indicate our answer. The specific changes are as follows:

Reviewer 1:


1. The study has a low novelty, as it is about integrating existing network components instead of proposing new network components.

Response: Thanks for the comments. By integrating BoTNet, PANet, and VFL into YOLOv5seg, we proposed a new technical solution aimed at improving segmentation accuracy and efficiency.

2. Literature review is weak, appears to be rather random, and lacks focus and comprehensiveness.

Response: Thanks for the comments. The introduction of this paper first presents the importance of Lentinus edodes, followed by the advantages and disadvantages of traditional segmentation algorithms. With the rapid development of deep learning, instance segmentation now includes both two-stage and one-stage models: two-stage segmentation models achieve high accuracy but lack real-time performance, which is why one-stage instance segmentation models are then introduced. The YOLO series of one-stage segmentation models exhibits good performance in both accuracy and speed, and finally the YOLOv5seg model is introduced as performing well in segmentation. In response to the situations encountered during the growth of Lentinus edodes, this paper makes improvements to solve the arising issues.

And also we have added more relevant literature and conducted a more detailed analysis of existing literature to ensure that the literature review is more comprehensive and profound, in line 60.

3. No knowledge gaps in current literature are identified.

Response: Thanks for the comments. There is limited research on Lentinus edodes instance segmentation in the existing literature on agriculture and the food industry. Existing studies mostly focus on general object detection and segmentation tasks, while instance segmentation in specific domains such as Lentinus edodes still faces challenges due to its unique morphology and complex background. Additionally, while some studies have utilized deep learning techniques for agricultural product segmentation, few integrate BoTNet, PANet, and VFL for Lentinus edodes instance segmentation tasks. Therefore, our study addresses this knowledge gap, proposes a novel integration approach, and validates its effectiveness in practical applications.

4. No study justification is provided.

Response: Thanks for the comments. The justification of this study is reflected in several aspects:

  • Practical application demands: Lentinus edodes, as an important agricultural product, requires quality inspection and classification during production and sales. Efficient and accurate instance segmentation technology can significantly improve the efficiency of automated detection and classification, reducing labor costs.
  • Performance improvement: Our experimental results demonstrate that the improved network performs exceptionally well in Lentinus edodes instance segmentation tasks, significantly outperforming traditional methods. This validates the effectiveness and practicality of our approach.

5. No research objectives are specified.

Response: Thanks for the comments. Revisions have been made in the manuscript, in line 80.

6. YOLOv5seg is an outdated network. I would be willing to potentially consider acceptance if the research work was done on state-of-the-art networks such as YOLOv9.

Response: Thanks for the comments. At the beginning of our experiments, YOLOv9 had not yet gained popularity. We tried various models, such as Mask RCNN, YOLACT, and YOLOv8, for instance segmentation; the segmentation results are shown in Table 3. The table indicates that YOLOv5seg-BotNet exhibits the best overall performance, achieving superior segmentation of Lentinus edodes fruiting bodies in terms of both accuracy and speed. Therefore, we ultimately chose the YOLOv5seg-BotNet network.

 

Table 3. The results of different segmentation models are compared.

Models                P(%)     R(%)     Mask_AP(%)   F1-Score(%)   FPS
Mask RCNN             91.37    89.72    91.59        90.53          7.20
YOLACT                97.86    92.58    95.34        95.14         27.79
YOLOv8                95.94    90.24    92.28        93.00         30.62
YOLOv5seg-BotNet      97.58    95.74    95.90        96.65         32.86

 

 

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript (agronomy-3051670) introduces a new model, YOLOv5seg-BotNet, for segmenting shiitake mushrooms, improving both accuracy and speed. This provides a foundation for future applications in quality grading and intelligent harvesting in mushroom production.

The topic is up-to-date and suitable for the journal; one of the key points is the use of YOLO models for real-time identification. The manuscript is written in an appropriate manner.

The sections on introduction, materials and methods, results, and discussion are relevant. The figures present good quality and readability. However, the number of samples, the analyses performed, and how many measurements were made were not described.

The authors should have made other samplings, of other species, and reported them. If they did so, include this in the supplementary material and describe it in greater detail in the materials and methods section.

Keywords in alphabetical order; please check all scientific names and put them in italics. In the materials and methods section, I suggest that the authors provide the Python script used as a supplementary file, so that readers can replicate the YOLOv5 model used.

Comments on the Quality of English Language

English needs corrections in grammar and spelling.

Author Response

Reviewer 2:


(1) However, the sample number and the analyses performed were not described; it is unclear how many measurements were made.

Response: Thanks for the comments. Revisions have been made in the manuscript, in line 96.

(2) The authors should have sampled other species and reported the results. If they did so, please include these data in the supplementary material and describe them in greater detail in the Materials and Methods section.

Response: Thanks for the comments. We recognize the importance of testing this method on different species. While this study primarily focuses on Lentinus edodes, we intend to extend the method to other edible mushroom species in future research to validate its widespread applicability.

(3) Keywords in alphabetical order.

Response: Thanks for the comments. Revisions have been made in the manuscript, in line 27.

(4) Check all scientific names and put them in italics.

Response: Thanks for the comments. Revisions have been made in the manuscript.

(5) In the Materials and Methods section, I suggest that the authors provide the Python script used as a supplementary file, so that readers can replicate the YOLOv5 model.

Response: Thanks for the comments. The Python script has been written, please refer to the attachment for details.

 

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I appreciate the authors’ response. However, my recommendation for the manuscript remains the same, and I do not think the manuscript can be revised into a publishable form up to my standard. YOLOv10 has already been released, and the study is still a basic application of YOLOv5. My research area overlaps considerably with the study’s research topic. I do not think the strategy of integrating various modules into an object detector, and applying the object detector to a seemingly novel application, without involving true state-of-the-art models, can create publishable results in this day and age.

Author Response

Dear Reviewers and Editors:

Thank you for editing and reviewing my manuscript. In this response, we will use red text to indicate the question and blue text to indicate our answer. The specific changes are as follows:

Reviewer1:

I appreciate the authors’ response. However, my recommendation for the manuscript remains the same, and I do not think the manuscript can be revised into a publishable form up to my standard. YOLOv10 has already been released, and the study is still a basic application of YOLOv5. My research area overlaps considerably with the study’s research topic. I do not think the strategy of integrating various modules into an object detector, and applying the object detector to a seemingly novel application, without involving true state-of-the-art models, can create publishable results in this day and age.

1. YOLOv10 has already been released, and the study is still a basic application of YOLOv5.

Response: Thanks for the comments. We understand that YOLOv10 has been released and may offer more advanced features. However, when we initiated this research, YOLOv10 had not yet been released.

In terms of accuracy and speed, YOLOv5 achieves a good balance between the two, making it especially suitable for real-time applications. YOLOv10 may surpass YOLOv5 on some metrics, but its more complex model structure may require greater computational resources and longer training times.

In terms of training data requirements, YOLOv5 performs well even on small-scale datasets, making it suitable for resource-limited scenarios. YOLOv10 may require more data and longer training to reach its full potential, which may not be the best choice when data resources are limited.

In terms of real-time requirements, the YOLOv5 model is small and its inference speed is fast, making it well suited to embedded devices and real-time processing applications. Because of its more complex model, YOLOv10's inference speed may be slower, which is unsuitable for scenarios with strict real-time requirements. The Lentinus edodes instance segmentation task demands efficient real-time processing. YOLOv5 provides a good balance of performance in this task, and its efficiency, speed, and adaptability make it the best choice for the current application.

Consequently, we opted for the well-established and extensively used YOLOv5, enhancing it by integrating the BoTNet, PANet, and VFL modules to address the particular requirements of Lentinus edodes instance segmentation. We also experimented with other models, such as YOLOv8, for instance segmentation, but YOLOv5seg-BotNet exhibited the best overall performance, achieving superior Lentinus edodes instance segmentation in terms of both accuracy and speed.
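As background for the VFL module named in the response above, here is an illustrative, standalone sketch of the per-sample Varifocal Loss as defined in the VarifocalNet work. This is not the authors' implementation (which is in their supplementary script, not this record), and the alpha and gamma defaults are the commonly used values from that work, assumed here for illustration:

```python
import math

def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """Per-sample Varifocal Loss (illustrative sketch, not the authors' code).

    p: predicted IoU-aware classification score, in (0, 1)
    q: target score (IoU with the ground truth for positives, 0 for negatives)
    """
    if q > 0:
        # Positive sample: an asymmetric BCE term weighted by the target score q,
        # so high-quality positives contribute more to training.
        return -q * (q * math.log(p) + (1 - q) * math.log(1 - p))
    # Negative sample: down-weighted focal-style by alpha * p**gamma,
    # suppressing the flood of easy negatives.
    return -alpha * (p ** gamma) * math.log(1 - p)

# A well-calibrated positive prediction incurs a small loss...
print(round(varifocal_loss(0.9, 0.9), 4))  # ~0.2926
# ...while a confident false positive is penalised more heavily.
print(round(varifocal_loss(0.9, 0.0), 4))  # ~1.3988
```

The asymmetry (positives weighted up by q, negatives weighted down by p**gamma) is what distinguishes VFL from the plain focal loss and motivates its use for dense, cluttered targets such as mushroom fruiting bodies.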

In future research, we will consider using the latest YOLOv10 to keep our work aligned with cutting-edge technology. We plan to employ more advanced models (such as YOLOv10) and investigate additional innovations, such as new module integration methods, more complex datasets, and broader application domains. This will further enhance the cutting-edge nature and impact of our research outcomes.

2. My research area overlaps considerably with the study’s research topic. I do not think the strategy of integrating various modules into an object detector, and applying the object detector to a seemingly novel application, without involving true state-of-the-art models, can create publishable results in this day and age.

Response: Thanks for the comments. We acknowledge the reviewer's proficiency and stringent standards in this domain. We recognize that simply integrating different modules may be seen as a technically straightforward approach. However, for practical problems in specific domains (such as Lentinus edodes instance segmentation), we believe this approach holds significant practical importance. Our research not only demonstrates how to solve a specific problem by improving existing models but also provides experimental evidence of the effectiveness of these improvements in the target task.

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Author, I appreciate the comments. The manuscript is suitable for publication.

Comments on the Quality of English Language

Minor editing of English language required

Author Response

Minor editing of English language required.

Response: Thank you for your comment. We have carefully reviewed the manuscript and made revisions. The main changes are as follows:
1. Grammar has been checked and revised throughout, with changes marked in green in the manuscript.
2. In the tables, the % signs after the numbers in Tables 2 and 3 have been removed.
3. In the references, the years were highlighted and the formatting was checked against the writing standards; in particular, the 17th reference was revised.
