Article
Peer-Review Record

The Design and Evaluation of an Orange-Fruit Detection Model in a Dynamic Environment Using a Convolutional Neural Network

Sustainability 2023, 15(5), 4329; https://doi.org/10.3390/su15054329
by Sadaf Zeeshan 1,*, Tauseef Aized 2 and Fahid Riaz 3,*
Submission received: 2 January 2023 / Revised: 1 February 2023 / Accepted: 20 February 2023 / Published: 28 February 2023
(This article belongs to the Section Sustainable Agriculture)

Round 1

Reviewer 1 Report

1. The parameters of the proposed model (Number of Filters & Size & Stride value & output size) and the training hyper-parameters should be given so that the study can be replicated.

2. The collection and parsing of training, validation and test data should be made clearer. It is not clear from the text flow where the 2000 images were obtained.

3. In the graph given in Figure 1, the criterion used to terminate training should be indicated; if the model is shown in this graph, the layer parameters and the number of layers used should also be given.

4. Why should the image acquisition connection in Figure 1 be connected to the start section?

5. The model build section is very complicated. Please provide:

1. a table of parameters such as (number of filters, size, stride value, output size) for the layers you propose in your model,

2. Model hyperparameters

3. Computer resources used

Statements written in this section ("data is saved in the google drive", "The important libraries are imported using pip import command. Path is defined and directed towards the directory") are not stages of your work that should be presented.

Is Google Colab used as an IDE, or Jupyter Notebooks for Python?

6. The parameters given in Figure 8 are independent data, so no line should be drawn between them.

The conditions under which the data set will be shared with the researchers should be specified. 

Author Response

Dear Editors and Reviewers,

Thank you for the valuable feedback on the submitted paper. The authors have incorporated all the suggestions and recommendations of the reviewers. The additional text has been highlighted in blue. Further, the paper has been revised for spelling and grammar. Details of the corrections and amendments as suggested by the reviewers are as follows:

Reviewer 1:

  1. The parameters of the proposed model (Number of Filters & Size & Stride value & output size) and the training hyper-parameters should be given so that the study can be replicated.

Done. The parameters are now given in the “Model Building” section, lines 191-214. All parameters, including the number of filters, filter size, stride length, and output size, are included along with other parameters.
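As a quick cross-check of how filter size and stride determine the output sizes reported in such a table, the standard convolution output-size arithmetic can be sketched; the input size, kernel, and stride values below are illustrative, not the paper's actual configuration:

```python
def conv_output_size(in_size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling layer:
    floor((in_size - kernel + 2*padding) / stride) + 1."""
    return (in_size - kernel + 2 * padding) // stride + 1

# Illustrative values only (not the paper's actual configuration):
after_conv = conv_output_size(224, kernel=3, stride=1)         # 224 -> 222
after_pool = conv_output_size(after_conv, kernel=2, stride=2)  # 222 -> 111
```

Tabulating this value layer by layer, together with the filter counts, is exactly what makes a CNN replicable from a parameter table.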

  2. The collection and parsing of training, validation and test data should be made clearer. It is not clear from the text flow where the 2000 images were obtained.

Done. The dataset collection and parsing of data are explained in the “Model Building” section, lines 177-190.

  3. In the graph given in Figure 1, the criterion used to terminate training should be indicated; if the model is shown in this graph, the layer parameters and the number of layers used should also be given.

Done. The number of layers and their parameters are already explained in the Model Building section. The flow chart has been improved to give a better understanding of the methodology. The epoch number is used as the stopping criterion and is now mentioned in the flowchart (line 161).

  4. Why should the image acquisition connection in Figure 1 be connected to the start section?

Done. The data acquisition connection from the start depicts the collection of real-time images for testing of the model after offline training has been done. This model is an offline-training, online-testing model. For online testing, real-time images are obtained using an RGB-D camera and tested on the model. We have modified the flowchart for better understanding (line 161).

  5. The model build section is very complicated,
  1. a table of parameters such as (number of filters, size, stride value, output size) for the layers you propose in your model,

Done. A table has been added to show all hyperparameters used for training the model in the study (line 202).

  2. Model hyperparameters

Done. They are now mentioned in the Model Building section (lines 190-203).

  3. Computer resources used

Done. They are now mentioned in the Model Building section (lines 199-200).

Statements written in this section ("data is saved in the google drive", "The important libraries are imported using pip import command. Path is defined and directed towards the directory") are not stages of your work that should be presented.

Is Google Colab used as an IDE, or Jupyter Notebooks for Python?

Google Colab is a cloud-based programming environment that provides a Jupyter notebook interface. As these are not stages of the work, the lines have been removed. Done.

  6. The parameters given in Figure 8 are independent data, so no line should be drawn between them.

Done. Figure 8 changed to a bar chart. (line 307)

The conditions under which the data set will be shared with the researchers should be specified. 

Done. The conditions are mentioned in lines 333-334.

Author Response File: Author Response.docx

Reviewer 2 Report

From the point of view of the innovation and framework of the paper, this paper should be revised for the following reasons:

 

First, the advantages and disadvantages of the previous work are not clearly expounded; in other words, the motivation for writing the paper is not explained.

 

Second, the paper lacks innovation: it is only a simple detection of oranges using a convolutional neural network, without explaining the adequacy and advantages of using a CNN for detection of the fruit.

 

Some of the revisions may help to improve the readability of the paper.

- Introduction: What are your contributions? Clearly define contributions.

- Better highlight the novelty in the study.

- Better explain the motivations for the research.

- The data and analyses should be better presented. Add more discussion on the results. Add more comparisons with existing approaches.

- The conclusion section seems to rush to the end. The authors will have to demonstrate the impact and insights of the research. The authors need to provide several solid future research directions clearly. Clearly state your unique research contributions in the conclusion section. Add limitations of the model.

 

Author Response

Cover letter:

Dear Editors and Reviewers,

Thank you for the valuable feedback on the submitted paper. The authors have incorporated all the suggestions and recommendations of the reviewers. The additional text has been highlighted in blue. Further, the paper has been revised for spelling and grammar. Details of the corrections and amendments as suggested by the reviewers are as follows:

Reviewer 2:

From the point of view of the innovation and framework of the paper, this paper should be revised for the following reasons:

First, the advantages and disadvantages of the previous work are not clearly expounded; in other words, the motivation for writing the paper is not explained.

Done. The drawbacks of previous works and the motivation of the paper are explained in the Literature Review section and in the Discussion section. (lines 92-130) (lines 311-329)

Second, the paper lacks innovation: it is only a simple detection of oranges using a convolutional neural network, without explaining the adequacy and advantages of using a CNN for detection of the fruit.

Done. The gap and the significance of the study's contribution are mentioned in the Literature Review section (lines 98-130).

 

Some of the revisions may help to improve the readability of the paper.

- Introduction: What are your contributions? Clearly define contributions.

Done. The contribution of the study is clearly mentioned in the Literature Review section and in the conclusion as well. (lines 127-130) (lines 345-350)

- Better highlight the novelty in the study.

Done. The novelty lies in improving the F1 score by using a diversified dataset based on real-time conditions, as well as evaluating the model not just with offline training and testing but also with real-time online testing. This makes the model more reliable and more suitable for real-time agricultural applications. (lines 318-329) (lines 358-363) (lines 126-130)

- Better explain the motivations for the research.

Done. The study helps to improve the performance and reliability of the model for use in real-time agricultural systems. (lines 344-349) (lines 358-363) (lines 98-113)

- The data and analyses should be better presented. Add more discussion on the results. Add more comparisons with existing approaches.

Done. The analysis is improved by comparison with previous studies. The effect of the diversified real-time dataset is analyzed and the evaluation is compared with previous studies. Further, limitations of the study are also included. (lines 309-334)

- The conclusion section seems to rush to the end. The authors will have to demonstrate the impact and insights of the research. The authors need to provide several solid future research directions clearly. Clearly state your unique research contributions in the conclusion section. Add limitations of the model.

Done. The significance and impact are highlighted in lines 345-350. Future directions are provided in lines 363-365. The unique contribution is presented in lines 358-363. Limitations are already given in the Discussion section. (lines 330-334)

 

 

 

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors have finalised the requested changes.  The study can be published in this form.

Author Response

Thank you very much.

Reviewer 2 Report

 

The Round 1 answers are not satisfactory.

1. The authors have not done the literature review properly and failed to notice that some of the papers listed below, published from 2016 onwards, use real-time datasets.

 

Reference #1: DeepFruits: A Fruit Detection System Using Deep Neural Networks, by Inkyu Sa, Zongyuan Ge, Feras Dayoub, Ben Upcroft, Tristan Perez and Chris McCool

 

Reference #2: Vibration Monitoring of the Mechanical Harvesting of Citrus to Improve Fruit Detachment Efficiency, by Sergio Castro-Garcia, Fernando Aragon-Rodriguez, Rafael R. Sola-Guirado, Antonio J. Serrano, Emilio Soria-Olivas and Jesús A. Gil-Ribes

 

Reference #3: Ganesh, P., Volle, K., Burks, T. F., & Mehta, S. S. (2019). Deep orange: Mask R-CNN based orange detection and segmentation. IFAC-PapersOnLine, 52(30), 70-75.

 

2. Novelty cannot include an F1-score increase.

3. The contribution is still not discussed much in relation to the objective of the paper.

4. The overall architecture of the proposed model could be better represented.

5. As the authors mention a batch size of 64, what type of variations happen for other batch sizes?

6. Which features help in the detection process?

7. What is the stopping criterion considered for the proposed model?

Overall, the manuscript cannot be considered for publication.

Author Response

Dear Editor and Reviewers,

The authors would like to thank the reviewers for their feedback. We have tried to address all the concerns of reviewer 2 in detail.

We assure the reviewers that the paper has been thoroughly revised for grammatical errors and that all the points mentioned by reviewer 2 have been addressed and incorporated in the paper.

REVIEWER 2:

  1. The authors have not done the literature review properly and failed to notice that some of the papers listed below, published from 2016 onwards, use real-time datasets.

Reference #1: DeepFruits: A Fruit Detection System Using Deep Neural Networks, by Inkyu Sa, Zongyuan Ge, Feras Dayoub, Ben Upcroft, Tristan Perez and Chris McCool

Reference #2: Vibration Monitoring of the Mechanical Harvesting of Citrus to Improve Fruit Detachment Efficiency, by Sergio Castro-Garcia, Fernando Aragon-Rodriguez, Rafael R. Sola-Guirado, Antonio J. Serrano, Emilio Soria-Olivas and Jesús A. Gil-Ribes

Reference #3: Ganesh, P., Volle, K., Burks, T. F., & Mehta, S. S. (2019). Deep orange: Mask R-CNN based orange detection and segmentation. IFAC-PapersOnLine, 52(30), 70-75.

The literature review for this paper includes the most relevant and recent articles on fruit detection using CNN models. The authors have included previous papers that use pre-prepared datasets and have also acknowledged several studies that incorporate real-time datasets.

The authors claim that the use of real-time datasets is limited compared to single-image datasets in repositories, but at the same time have also mentioned and acknowledged some important papers that use real-time datasets. In fact, of the three references mentioned by the reviewer, references 1 and 3 were already cited in the Literature Review of the earlier manuscript, as references [10] and [12].

As for the second reference mentioned by the reviewer, the authors have gone over the paper: it concerns citrus harvesting by canopy shaking. It does involve real-time testing in the field but does not use a CNN model for fruit detection. On the reviewer's recommendation, we have added it to the discussion of real-time fruit testing in the literature review section, as reference number 28 in our paper.

The authors would like to assure the reviewer that all the papers mentioned in the literature review are relevant and have been added after careful consideration. Three types of papers are included. First, papers on fruit detection using various CNN models; an overview is given along with other deep learning models used for the same purpose. Second, papers that use datasets from image repositories such as Fruits-360, COCO, and ImageNet; their results are later compared with the real-time dataset in the discussion section. Lastly, papers using customized datasets are reviewed, and the importance of actual field testing is recognized.

We do realize that elaborating more on some of these papers and their contributions adds value to the paper, so, keeping the reviewer's point of view in mind, we have further improved the literature review section.

Modifications in literature review can be seen in yellow and blue highlights from lines 86-166.

 

  2. Novelty cannot include an F1-score increase.

The novelty of the paper is not the improvement in F1 score itself; rather, the improvement in F1 score results from novelty in the following aspects of the study:

  1. Use of a universal and diversified real-time dataset of oranges. This dataset is not limited to fruits from one orchard in a single region, such as Florida only or Spain only as in previous papers. The diversified dataset has most of its images from an orchard in Sargodha, Punjab, in Asia, as well as other orchards from Australia, Spain, Florida, and other regions around the world. Some images were taken from the internet based on real-time orchard scenarios, as already mentioned in the study. Orchards in different parts of the world vary in topography, environment, and weather. This study contributes a global and diversified real-time dataset that can be used in any part of the world.
  2. Previous studies that use a customized dataset, such as DeepFruits by Sa et al., use a pre-trained CNN model that had already been trained on the ImageNet dataset. This study involves no pre-trained model but a fresh model with complete training, validation, and testing on the given dataset, fine-tuning the hyperparameters until the best performance is reached. A pre-trained model may have learned unwanted patterns, resulting in overfitting, and pre-trained data may be biased. Not using a pre-trained approach for a customized dataset provides better and more authentic performance, hence preventing overfitting.
  3. Lastly, where many fruit detection models are only tested offline, this study also performs online testing in actual orchards of Sargodha in Punjab. Real-time testing contributes actual and authentic results for the model. The results can be a significant contribution for future studies in actual orchards in Asia when used with agricultural machines, such as a fruit-harvesting robot, for detection of fruits before picking in that area.

These novel contributions have now also been added to the paper in the “Literature Review” section, lines 145-166.

 

  3. The contribution is still not discussed much in relation to the objective of the paper.

As mentioned above, the following contributions have now been clearly indicated in the “Literature Review” section, lines 145-166.

As stated above, the contributions of the paper are as follows:

  1. Use of a diversified real-time dataset of oranges. This dataset is not limited to fruits from one orchard, such as Florida only or Spain only as in previous papers. The diversified dataset has most of its images from an orchard in Sargodha, Punjab, in Asia, as well as other orchards from Australia, Spain, Florida, and elsewhere. Some images were taken from the internet based on real-time orchard scenarios, as already mentioned in the study. Orchards in different parts of the world vary in topography, environment, and weather. This study contributes a universal and diversified real-time dataset that can be used in any part of the world.
  2. Previous studies that use a customized dataset, such as DeepFruits [10], use a pre-trained CNN model that had already been trained on the ImageNet dataset. This study involves no pre-trained model but a fresh model with complete training, validation, and testing on the given dataset, fine-tuning the hyperparameters until the best performance is reached. (This is explained in detail in the Model Building section.)
  3. Lastly, where many fruit detection models are only tested offline, this study also performs online testing in actual orchards of Sargodha in Punjab. Real-time testing contributes actual performance results for the model. The results can be a significant contribution for future studies in actual orchards in Asia when used with an agricultural machine such as a fruit-harvesting robot for detection of fruits before picking in that area.

 

  4. The overall architecture of the proposed model could be better represented.

The CNN architecture has been explained in more detail in the “Model Building” section. All the important features of the model are described, including the input layer, convolutional layers, max-pooling layers, fully connected layers, flattening layer, dropout layer, and output layer, along with the activation function, loss function, optimizers, anchor boxes, and other features.

The hyperparameters are also given in the revised version: the number of filters, filter size, stride length, number of hidden layers, learning rate, optimizer used, batch size, kernel size, number of neurons per layer, and stopping criterion. The computer resources used are also specified.

All the relevant details and information required to replicate the study are given. However, if any particular information about the model is required by the reviewer, we can provide that as well.

Lines 215-272 explain the overall architecture of the proposed model with all important parameters and features.

 

  5. As the authors mention a batch size of 64, what type of variations happen for other batch sizes?

To process more data per iteration, a batch size of 64 was used, as it resulted in faster training than a batch size of 32. Further, a batch size of 64 resulted in faster convergence than a batch size of 32: more data is processed per iteration, allowing the model to make more accurate gradient updates. A smaller batch size converged more slowly, and the training process became unstable due to the increased variance in the gradient updates. A batch size of 64 was the better choice, as it offered more stability in training.

This information is now added to the paper in the “Model Building” section, lines 263-272.
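The trade-off described above can be illustrated with simple iteration arithmetic; the 2000-image dataset size comes from the paper, while everything else here is illustrative:

```python
import math

def updates_per_epoch(num_samples, batch_size):
    """Number of weight-update steps per epoch for a given batch size."""
    return math.ceil(num_samples / batch_size)

# With the 2000-image dataset mentioned in the paper:
steps_b64 = updates_per_epoch(2000, 64)  # 32 updates per epoch
steps_b32 = updates_per_epoch(2000, 32)  # 63 updates per epoch
# A batch size of 64 averages each gradient over more samples, so its fewer
# updates per epoch have lower variance -- matching the stability the
# authors report relative to a batch size of 32.
```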

 

  6. Which features help in the detection process?

The features that help in the detection process include the convolutional layers, max-pooling layers, activation function, fully connected layers, and the formation of bounding boxes. All these features are explained in the Model Building section.

Convolutional layers extract features from the input image through convolution operations. Max-pooling layers down-sample the feature maps generated by the convolutional layers, reducing the spatial size of the data while increasing the robustness of the features. The activation function introduces non-linearity into the model, allowing it to learn complex representations of the data. A custom layer combines the feature maps into a single feature map using an adaptive weighted-average approach. Batch normalization normalizes the activations of the layers, which speeds up training and reduces overfitting. The activations from the previous layer are flattened into a vector and then passed through the fully connected layers. The fully connected layers predict the image class of the fruit from the features extracted by the convolutional and pooling layers.

The model features that help in fruit detection are now added in the “Model Building” section, lines 253-262.
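As a small illustration of the down-sampling role of max pooling described above, here is a minimal pure-Python sketch of a 2x2, stride-2 max-pooling pass over a tiny feature map; this is a generic textbook operation, not the paper's actual implementation:

```python
def max_pool_2x2(feature_map):
    """Down-sample a 2D feature map with a 2x2 window and stride 2,
    keeping the strongest activation in each window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        row = []
        for c in range(0, cols - 1, 2):
            window = (feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 3, 2],
        [2, 6, 0, 1]]
pooled = max_pool_2x2(fmap)  # [[4, 5], [6, 3]]
```

Halving each spatial dimension this way is what makes the later fully connected layers tractable while preserving the strongest local responses.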

 

  7. What is the stopping criterion considered for the proposed model?

The epoch number is used as the stopping criterion. The flowchart in Figure 1 already depicts this criterion. Further, the stopping criterion is now also explained in the paper in the “Model Building” section, lines 268-271.

An epoch number of 100 was chosen because the target model accuracy was achieved by that point, the validation loss did not improve beyond that epoch number, and the value was set high enough for convergence without overfitting. This is also now mentioned in the “Model Building” section, lines 269-272.
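The stopping logic described above, a fixed epoch cap combined with the observation that validation loss stops improving, can be sketched as a training-loop skeleton; the loss values and patience threshold below are illustrative assumptions, not the paper's numbers:

```python
def train_until_stop(val_losses, max_epochs=100, patience=10):
    """Return the epoch at which training stops: either the fixed epoch cap
    is reached, or validation loss has not improved for `patience` epochs."""
    best = float("inf")
    since_improve = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            since_improve = 0
        else:
            since_improve += 1
        if since_improve >= patience:
            return epoch  # early stop: validation loss plateaued
    return min(len(val_losses), max_epochs)  # fixed epoch cap reached

# Illustrative run: loss plateaus after epoch 5, so patience 3 stops at epoch 8.
losses = [1.0, 0.8, 0.6, 0.5, 0.45, 0.46, 0.47, 0.48, 0.49]
stop = train_until_stop(losses, max_epochs=100, patience=3)  # 8
```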

 

Author Response File: Author Response.docx

Round 3

Reviewer 2 Report

The authors have provided the answer to the comments raised by the reviewer. The manuscript can be published in the present form. 
