Article
Peer-Review Record

Comparative Evaluation of Remote Sensing Platforms for Almond Yield Prediction

AgriEngineering 2024, 6(1), 240-258; https://doi.org/10.3390/agriengineering6010015
by Nathalie Guimarães 1,2,3, Helder Fraga 1,2,*, Joaquim J. Sousa 3,4, Luís Pádua 1,2,3, Albino Bento 5 and Pedro Couto 1,2,3
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 18 December 2023 / Revised: 13 January 2024 / Accepted: 18 January 2024 / Published: 22 January 2024
(This article belongs to the Section Remote Sensing in Agriculture)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Comments

 

This study compared different machine learning algorithms and RS data with different spatial resolutions to predict almond yield. The topic is valuable and fits the scope of the journal, and the manuscript is suitable for publication in this journal with regard to scientific quality. I have the following comments on the manuscript.

 1. The introduction is verbose and it is not necessary to tell readers that almond production is very important in Portugal.

 

2. In Table 1, more space is required between “fared Radiometers” and “Moderate Resolu-” to improve readability.

 3. “2.2.3. Vegetation indices computation”: I suppose the VIs were derived from reflectance. Was the reflectance calibrated for weather conditions before the VIs were calculated? If so, how?

Author Response

This study compared different machine learning algorithms and RS data with different spatial resolutions to predict almond yield. The topic is valuable and fits the scope of the journal, and the manuscript is suitable for publication in this journal with regard to scientific quality. I have the following comments on the manuscript.

The authors express their gratitude to the reviewer for providing valuable comments and suggestions that helped enhance the quality of the manuscript.

 

  1. The introduction is verbose and it is not necessary to tell readers that almond production is very important in Portugal.

(1) We recognize the significance of maintaining a concise introduction. Therefore, we have decided to delete the sentences on this topic.

 

  2. In Table 1, more space is required between “fared Radiometers” and “Moderate Resolu-” to improve readability.

(2) The authors fully agree with the proposed edit to Table 1, and the modification has been made.

 

  3. “2.2.3. Vegetation indices computation”: I suppose the VIs were derived from reflectance. Was the reflectance calibrated for weather conditions before the VIs were calculated? If so, how?

(3) Users have access to pre-calibrated data from PlanetScope, Sentinel-2 and Landsat 8, as these datasets are atmospherically corrected before being made available. This pre-processing eliminates the need for end users to perform additional atmospheric calibration.

Regarding PlanetScope, information on surface reflectance can be found at this link (https://developers.planet.com/docs/data/planetscope/): “Surface Reflectance (analytic_sr) assets are orthorectified and radiometrically corrected to ensure consistency across localized atmospheric conditions, and to minimize uncertainty in spectral response across time and location. These multispectral imagery products are designed for temporal analysis and monitoring applications, especially in agriculture and forestry sectors.”

For Sentinel-2, the document accessed through the Google Earth Engine Catalog (Sentinel-2 User Handbook, https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR#description) states on page 48 of 64 that “Level-2A processing includes a Scene Classification and an Atmospheric Correction applied to Top-Of-Atmosphere (TOA) Level-1C orthoimage products. Level-2A main output is an orthoimage Bottom-Of-Atmosphere (BOA) corrected reflectance product.” In the present work, we used the Level-2A product.

For Landsat 8, we used the Surface Reflectance data (https://developers.google.com/earth-engine/datasets/catalog/landsat-8), which, according to that link, corresponds to “Landsat 8 OLI/TIRS Collection 2 atmospherically corrected surface reflectance”.

 

To complete the section on the identification of the vegetation indices used in this study, we have included the following sentence between lines 217 and 218:

LINES: “The data from each platform was processed, and atmospheric corrections were implemented before it became available.”
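
To illustrate how vegetation indices follow from atmospherically corrected surface reflectance, the minimal sketch below computes NDVI and SAVI from red and near-infrared reflectance using their standard definitions. The reflectance values and the soil adjustment factor L = 0.5 (a commonly used default) are assumptions chosen for illustration; they are not taken from the manuscript, and this is not the authors' processing code.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Soil-Adjusted Vegetation Index with soil adjustment factor L."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


# Hypothetical surface reflectance values (unitless, 0-1) for one pixel.
nir_reflectance = 0.45
red_reflectance = 0.05

print(f"NDVI = {ndvi(nir_reflectance, red_reflectance):.2f}")
print(f"SAVI = {savi(nir_reflectance, red_reflectance):.2f}")
```

Because both indices are ratios of reflectances, an uncorrected atmospheric offset in either band would shift the result, which is why Level-2 (surface reflectance) products are the usual input.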

 

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

1) International units are missing in some figures, e.g. temperature, NDVI and SAVI in Figure 5. In addition, irrigation (yes or no) is a discrete value, so why does the heat map show a continuous trend? Furthermore, what do the digits in Figure 3 represent?

2) RFR outperforms XGBR when considering irrigation, climate data and VIs separately. However, combining the three features generates the reverse result. Noise sensitivity may be part of the reason; what would happen if a noise-removal procedure were applied? More experiments must be conducted before drawing a reliable conclusion.

3) For most agricultural crops, appropriate watering (including precipitation) and temperature (such as sunshine) are key elements for high production. Using sophisticated devices and comparing different algorithms has not provided sufficient evidence of an advantage over traditional methods, so what is the contribution of the study?

4) That data from the high-resolution PlanetScope platform generate more accurate predictions than free platforms further emphasises the importance of the original data set in almond production estimation. It is worth investigating the possibility of information fusion, as PlanetScope is not accessible to all users.

5) In terms of regression prediction, only ensembles of decision trees and boosting algorithms were compared while Elastic Net Regression, Support Vector Regression and Neural Network Regression were not considered. Why?

Comments on the Quality of English Language

Language quality must be improved, as the readability of some sentences is poor (e.g. Lines 459-462, Lines 480-484). Additionally, it should be R-squared instead of R2.

 

Author Response

1) International units are missing in some figures, e.g. temperature, NDVI and SAVI in Figure 5. In addition, irrigation (yes or no) is a discrete value, so why does the heat map show a continuous trend? Furthermore, what do the digits in Figure 3 represent?

(1) The authors express their gratitude to the reviewer for providing valuable comments and suggestions that helped enhance the quality of the manuscript.

In Figure 5, units for temperature have been included; the values are displayed in degrees Celsius (°C). NDVI and SAVI, on the other hand, are dimensionless indices that do not require units.

As for irrigation, the seemingly continuous trend in the heat map is due to the use of kernel density estimation of the y-axis variable (yield). Although the irrigation variable is discrete, the KDE plots a smooth, continuous estimate of the underlying distribution based on the available data points. This smoothing effect can give the appearance of a continuous trend. The choice of KDE for visualization is intended to provide a more interpretable and visually appealing representation of the data distribution.
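
The smoothing effect described above can be reproduced with a small one-dimensional sketch: a Gaussian kernel density estimate over a variable that only takes the values 0 and 1 still yields non-zero density between them. The sample values and the bandwidth are assumptions for illustration, and the plotting library and settings used for the manuscript figure are not stated in this record.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating density as an average of Gaussian kernels."""
    n = len(samples)

    def density(x):
        total = 0.0
        for s in samples:
            z = (x - s) / bandwidth
            total += math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        return total / (n * bandwidth)

    return density


# Hypothetical discrete variable: 0 = not irrigated, 1 = irrigated.
irrigation = [0, 0, 0, 1, 1]
density = gaussian_kde(irrigation, bandwidth=0.25)  # bandwidth is an assumption

# No sample lies strictly between 0 and 1, yet the estimate is non-zero there.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:.2f}  estimated density = {density(x):.3f}")
```

A smaller bandwidth would concentrate the density around 0 and 1 and make the discrete nature of the variable more visible.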

In Figure 3, the digits are the NDVI mean values. The information on these values has been added to the caption.

 

2) RFR outperforms XGBR when considering irrigation, climate data and VIs separately. However, combining the three features generates the reverse result. Noise sensitivity may be part of the reason; what would happen if a noise-removal procedure were applied? More experiments must be conducted before drawing a reliable conclusion.

(2) Thank you for your insightful feedback. In the manuscript, we stated “This situation might be due to XGBR being more sensitive to noise and outliers compared to RFR.”, which may indeed not be the case here, as the data were pre-processed: “Data was pre-processed for each orchard separately to filter outliers, based on distribution analysis.”. Instead, we believe this could be due to the fact that XGBR can better handle complex relationships between features. As such, we have changed the sentence where we present the explanation for that issue: “This situation might be due to XGBR being more capable to handle complex relationships between features [45], while RFR is known to perform better with simpler features.”

 

3) For most agricultural crops, appropriate watering (including precipitation) and temperature (such as sunshine) are key elements for high production. Using sophisticated devices and comparing different algorithms has not provided sufficient evidence of an advantage over traditional methods, so what is the contribution of the study?

(3) While we understand the reviewer's concerns about the benefits of the current study over traditional methods, there are some key aspects that should be mentioned. One of the advantages of the current study is the use of advanced machine learning algorithms, which, although complex, can easily be implemented and deployed to provide decision support for growers. Particularly for almonds, this kind of system is still incipient. The current study also analyses the use of proprietary and open/free satellite data to calculate vegetation indices that have an impact on almond trees. Our study highlights the benefits of using higher resolution data but also the benefits of using free data. In addition, we identified the most important variables (e.g. climatic factors, management practices) that allow the early prediction of yield in a given year, which had never been done for the TM region. These are the innovative aspects of the current study, which are now highlighted in the discussion:

LINES 481 - 497: “Optimal results were achieved through the utilization of a proprietary/paid platform, potentially limiting accessibility for certain users. Furthermore, the lower resolution of open platforms may impede the identification of smaller orchard areas. Nevertheless, our study underscores the viability of utilizing freely available remote sensing data. While data was sourced from multiple farmers, expanding the dataset could enhance the robustness of our findings. Despite the above-mentioned limitations, the current study methodology, holds promise for adaptation and implementation in various agricultural settings worldwide. Another important point is the careful analysis of features to increase the overall performance of ML regression models. Through this way, this study not only allows to predict almond yields but also to identify the key factors that significantly influence these predictions. Furthermore, the models developed allow the implementation of early-prediction of seasonal almond yields, with the potential integration of climate data and extreme weather events. The comparison between open and proprietary RS data, shows that these models can be implemented using these two types of datasets. As such, these results provide valuable insights for farmers and other sector stakeholders, in their decision-making processes, which can enhance the sustainability of the almond sector in Portugal.”

 

4) That data from the high-resolution PlanetScope platform generate more accurate predictions than free platforms further emphasises the importance of the original data set in almond production estimation. It is worth investigating the possibility of information fusion, as PlanetScope is not accessible to all users.

(4) We agree that the original dataset is of great importance for the estimation of almond production. In fact, PlanetScope (proprietary data) with higher resolution provided the best results in predicting almond yield (R2 = 0.80). However, the other platforms also obtained promising results, with Sentinel-2 using the Random Forest Regressor model achieving an R2 of 0.67 and Landsat-8 using the AdaBoost Regressor model achieving an R2 of 0.72. Moreover, we aimed to emphasize the results of unpaid remote sensing platforms such as MODIS Terra LST and GSMaP (R2 = 0.68 with the Random Forest Regressor model), which are also viable alternatives. This is discussed in lines 428 to 436:

LINES: “The combination of irrigation data, day temperature in March, NDVI in January and SAVI in May (from the PlanetScope platform) showed the best performance (R2 = 0.80), using the XGBR model. Indeed, the use of VIs with higher resolution (3 m) from PlanetScope had a positive influence on almond yield prediction, as the results obtained with Sentinel-2 (R2 = 0.67 - RFR) and Landsat-8 (R2 = 0.72 - ABR) were lower. However, it is worth noting that free data with lower resolution could also be a viable alternative to PlanetScope, particularly RS platforms providing climatic data, such as MODIS Terra LST and GSMaP, achieving an R2 of 0.68 when using the RFR model with irrigation and climate data.”
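
For readers comparing the values above, R2 denotes the coefficient of determination (R-squared), defined as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it from scratch; the observed and predicted yields are hypothetical numbers, not data from the study, and the evaluation protocol used by the authors (for example, any train/test split) is not described in this record.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical yields (kg/ha) for five orchards; not data from the study.
observed_yield = [500.0, 800.0, 650.0, 900.0, 1100.0]
predicted_yield = [550.0, 760.0, 700.0, 880.0, 1000.0]

print(f"R-squared = {r_squared(observed_yield, predicted_yield):.2f}")
```

A value of 1 means the predictions match the observations exactly, while a value of 0 means the model does no better than always predicting the mean observed yield.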

 

5) In terms of regression prediction, only ensembles of decision trees and boosting algorithms were compared while Elastic Net Regression, Support Vector Regression and Neural Network Regression were not considered. Why?

(5) In our study, we focused on ensembles of decision trees and boosting algorithms, through the Random Forest Regressor, XGBRegressor, Gradient Boosting Regressor, Bagging Regressor and AdaBoostRegressor models, since they have proven effective in handling complex relationships and capturing nonlinear patterns in agricultural data, which is consistent with the objectives of our research. Although Elastic Net Regression, Support Vector Regression and Neural Network Regression are valuable regression techniques, we deliberately excluded them from our study for two reasons. First, our focus was to investigate the performance of tree-based ensemble methods in the context of almond yield prediction, as they have been shown to be successful in similar agricultural applications. Second, the complexity and interpretability of the models were considered, since tree-based ensemble models offer a higher level of interpretability.

However, we appreciate your suggestion and recognize the importance of considering a higher number of regression models. In future work, we will explore the inclusion of other models to provide a more complete comparison of regression techniques.

In the manuscript, the strengths of implemented regression models were described in:

LINES 280 - 306: “Several ML regression models were applied to predict almond yield. Among these models, the Random Forest Regressor (RFR) stands out as a prominent option due to its effectiveness in supervised learning [44], and is used in many fields of study. The RFR algorithm generates an ensemble of decision trees, collectively called a RF. Each decision tree, independently learns patterns and relationships within the data, equally contributing to the final prediction, improving the performance and efficacy of the model and dealing with potential overfitting problems [44]. The XGBRegressor (XGBR) was also implemented in this study and is a supervised learning algorithm that belongs to the gradient boosting family. It employs a boosting technique that sequentially improves decision trees, to create a powerful ensemble model [45]. XGBR optimizes the training objective through gradient descent allowing it can effectively identify complex patterns and dependencies in the data. The model has shown remarkable performance in several areas, making it a valuable tool for almond yield prediction [45]. Regarding the Gradient Boosting Regressor (GBR) algorithm, it is also a gradient-boosting-based regression model. It iteratively builds an ensemble of decision trees, with each successive tree aiming to correct the errors of the previous ones, creating a strong predictive model. The GBR model is widely used in predictive analytics and has demonstrated its efficiency and potential for predicting almond yield [46]. The Bagging Regressor (BR) algorithm, on the other hand, uses a bagging technique similar to the RFR model. It generates an ensemble of decision trees by resampling the training data and fitting each sample to a separate tree, combined to form the final output. Its ability to handle high-dimensional data and complex relationships makes it a suitable candidate for almond yield prediction [47]. 
Lastly, the AdaBoostRegressor (ABR) algorithm is a boosting-based regression model that iteratively adjusts the weights assigned to the training instances, placing more emphasis on the samples that are difficult to predict accurately. ABR is known for its adaptability to different data types and its ability to handle noisy or incomplete datasets [48]. The ML regression models were implemented using the Python library Scikit-learn [49].”
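
The bagging idea described in the quoted passage (resample the training data, fit one tree per sample, combine the outputs) can be sketched in a few lines of plain Python. This is a simplified illustration, not the authors' Scikit-learn implementation: each "tree" is reduced to a single-split stump, the ensemble size and seed are arbitrary, and the irrigation/yield data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimizing the squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # no valid split in this sample: always predict the mean
        m = mean(ys)
        return (float("inf"), m, m)
    _, t, lm, rm = best
    return (t, lm, rm)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Fit one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagging(stumps, x):
    """Average the predictions of all stumps in the ensemble."""
    return mean([predict_stump(s, x) for s in stumps])


# Hypothetical data: irrigation flag (0/1) and yield (kg/ha); not from the study.
irrigated = [0, 0, 0, 0, 1, 1, 1, 1]
yield_kg_ha = [400, 450, 500, 420, 900, 950, 1000, 980]

ensemble = fit_bagging(irrigated, yield_kg_ha)
print(f"non-irrigated: {predict_bagging(ensemble, 0):.0f} kg/ha")
print(f"irrigated:     {predict_bagging(ensemble, 1):.0f} kg/ha")
```

In Scikit-learn, the corresponding estimator is `sklearn.ensemble.BaggingRegressor`, which uses full decision trees by default rather than the stumps assumed here.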

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Dear authors,

thanks so much for this nice piece of work.

I can't contribute too much - it really looks fine to me. But I have to admit, I am in cultivation of crop plants and not that much in modelling.

However - insert a blank (line 120 ..above mentioned..)

Check scientific name of almond: it should be Prunus dulcis var. dulcis - and of course you can give the name of the actual author: (Mill) D.A.Webb - but that is not necessary. But you definitely should add the botanical family (Rosaceae): Prunus dulcis var. dulcis (Rosaceae).

Maybe it is helpful to check the cultivars - in general there is always an effect of cultivar on yields and quality. Maybe you can access this information. And it would be interesting whether the trees are grafted or on own root.

Then I suggest you check your keywords: almond appears in the title and in the keywords. I suggest you replace it by the scientific name in the keywords.

Have good and happy New Year!

 

Author Response

Dear authors,

thanks so much for this nice piece of work.

I can't contribute too much - it really looks fine to me. But I have to admit, I am in cultivation of crop plants and not that much in modelling.

The authors would like to thank the reviewer for their valuable comments and suggestions, which helped to improve the quality of the manuscript.

 

However - insert a blank (line 120 ..above mentioned..)

Thank you for your correction. The blank line was included.

 

Check scientific name of almond: it should be Prunus dulcis var. dulcis - and of course you can give the name of the actual author: (Mill) D.A.Webb - but that is not necessary. But you definitely should add the botanical family (Rosaceae): Prunus dulcis var. dulcis (Rosaceae).

Prunus dulcis var. dulcis (Rosaceae) was included in the manuscript.

 

Maybe it is helpful to check the cultivars - in general there is always an effect of cultivar on yields and quality. Maybe you can access this information. And it would be interesting whether the trees are grafted or on own root.

Unfortunately, this type of information was not provided. However, it is a valuable suggestion to include this information in future works.

 

Then I suggest you check your keywords: almond appears in the title and in the keywords. I suggest you replace it by the scientific name in the keywords.

Thank you for your suggestion. Almond was replaced by Prunus dulcis in the keywords.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

It is an interesting study, and the revised version is suitable for publication with regard to academic merit.

Comments on the Quality of English Language

Minor language improvement is needed before being published.
