Article
Peer-Review Record

Bayesian Modeling of Travel Times on the Example of Food Delivery: Part 1—Spatial Data Analysis and Processing

Electronics 2024, 13(17), 3387; https://doi.org/10.3390/electronics13173387
by Justyna Gibas, Jan Pomykacz and Jerzy Baranowski *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 20 June 2024 / Revised: 16 August 2024 / Accepted: 17 August 2024 / Published: 26 August 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

 

This manuscript addresses the importance of spatial data analysis and preprocessing in improving the accuracy of Bayesian models for predicting food delivery times. The study utilizes the OSRM API to generate routes that reflect real-world conditions and visualizes these routes to identify and examine outliers. The authors establish a boundary for maximum route distance based on their analysis and aim to enhance data quality, ultimately improving delivery time predictions and customer satisfaction. The topic is highly relevant given the rapid growth of online food delivery services and the need for accurate delivery time predictions to ensure customer satisfaction. The use of OSRM API and various data preprocessing techniques is well-detailed and methodologically sound. The manuscript emphasizes the importance of data visualization in identifying outliers and improving data quality.

However, I have the following comments: 
While the manuscript discusses the methods and results in detail, it would benefit from more explicit examples or case studies to illustrate the practical implications of the findings. No formulas are given. I am aware that there is a second paper, but a Bayesian paper cannot be prepared without technical background.

  • Literature Integration: The manuscript should integrate more recent studies related to spatial data analysis and Bayesian modelling to provide a broader context for the research.

Furthermore, the authors should discuss the limitations of their approach and the potential impact of these limitations on the findings.

The distribution of their data is skewed, and thus they should explain and mention how this affects their analysis.

Outlier: I would like to see more analysis on the outlier detection, approach and technical background. 

 


    • The methods section is comprehensive, but it would be helpful to include more details on the specific parameters used for the OSRM API and the criteria for identifying outliers.
    • The preprocessing steps are well-described, but the rationale for certain decisions, such as the choice of a 30 km upper limit for deliveries, should be elaborated.
    • The results section provides a clear presentation of the findings, but it could benefit from more detailed explanations of the figures and tables. For example, the significance of the two peaks in the distribution of delivery distances should be discussed in more detail.
    • The discussion of preprocessing results could be expanded to include a more thorough analysis of the impact of data cleaning on the overall model performance.
  • Spatial Analysis:

    • The spatial analysis is a strong aspect of the manuscript, but it would be useful to provide more examples of how the findings can be applied in real-world scenarios.
    • The manuscript should discuss the potential challenges and limitations of using the OSRM API and other routing engines, particularly in the context of varying geographic regions.

 

I would like to see further analysis in order to better evaluate the quality of the paper. 

Author Response

Comments 1: This manuscript addresses the importance of spatial data analysis and preprocessing in improving the accuracy of Bayesian models for predicting food delivery times. The study utilizes the OSRM API to generate routes that reflect real-world conditions and visualizes these routes to identify and examine outliers. The authors establish a boundary for maximum route distance based on their analysis and aim to enhance data quality, ultimately improving delivery time predictions and customer satisfaction. The topic is highly relevant given the rapid growth of online food delivery services and the need for accurate delivery time predictions to ensure customer satisfaction. The use of OSRM API and various data preprocessing techniques is well-detailed and methodologically sound. The manuscript emphasizes the importance of data visualization in identifying outliers and improving data quality. 

Response 1: We are very grateful to the reviewer for their high praise and remarks. Regarding particular questions 

 

Comments 2: However, I have the following comments:  
While the manuscript discusses the methods and results in detail, it would benefit from more explicit examples or case studies to illustrate the practical implications of the findings. No formulas are given. I am aware that there is a second paper, but a Bayesian paper cannot be prepared without technical background.

Response 2: We have supplemented section 3 with the following paragraph explaining the application. 

For better understanding of the importance of preprocessing steps and conclusions a short overview of the proposed Bayesian models is in order. Bayesian inference is a method of statistical inference, in which we fit a predefined probability model to a set of data and evaluate outcomes with regards to observed parameters of the model and unobserved quantities, like predictions for new data points [38].  

 
Both models are generalized linear models. We defined the linear predictor as η = Xβ, where X denotes the vector of features and β is the vector of coefficients. Both vectors are of size N×1 [39]. We then used a logarithmic link function to transform the linear predictor's domain to positive real numbers. This is a necessary step, as both models are defined by the inverse gamma distribution. This particular distribution effectively models the skewness of the data and provides strictly positive continuous outputs.
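The model structure described in this paragraph can be illustrated with a minimal sketch. It is not the authors' implementation: the coefficient values, the feature matrix, and in particular the way the mean is mapped to the inverse gamma parameters (scale = mean × (shape − 1)) are assumptions made only for illustration. Here X is written as a matrix with one row per delivery, so each row plays the role of the feature vector in the text.

```python
import numpy as np

def expected_travel_time(X, beta):
    """Apply the log link: eta = X @ beta, so mu = exp(eta) is strictly positive."""
    eta = X @ beta
    return np.exp(eta)

def sample_inverse_gamma(mu, shape, rng):
    """Draw from an inverse gamma distribution with mean mu (requires shape > 1).

    Assumed parameterization (not taken from the paper):
    scale = mu * (shape - 1), so that mean = scale / (shape - 1) = mu.
    """
    scale = mu * (shape - 1.0)
    # If G ~ Gamma(shape, rate=scale), then 1 / G ~ InverseGamma(shape, scale).
    # NumPy's gamma takes a scale argument, which is the reciprocal of the rate.
    return 1.0 / rng.gamma(shape, 1.0 / scale)

rng = np.random.default_rng(0)
# Hypothetical design matrix: an intercept column plus one standardized feature.
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]])
beta = np.array([3.0, 0.4])  # hypothetical coefficients
mu = expected_travel_time(X, beta)
draws = sample_inverse_gamma(mu, shape=5.0, rng=rng)
```

Whatever sign the linear predictor takes, the exponential keeps the expected value positive, which is why the log link pairs naturally with a strictly positive distribution.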

 

Comments 3: Literature Integration: The manuscript should integrate more recent studies related to spatial data analysis and Bayesian modelling to provide a broader context for the research. 

 

Response 3: We have added more recent studies from 2020-2023 related to spatial analysis and travel time estimation.  

 

Comments 4: Furthermore, the authors should discuss the limitations of their approach and the potential impact of these limitations on the findings. 

Response 4: We added the following paragraph about the limitations of our approach to the Discussion section.

Our approach has some limitations. First, the study area is restricted to India. Consequently, the results may not be generalizable on an international scale. Second, we mainly used one map provider, OSM, and one routing engine, OSRM; therefore, the results may differ from those obtained using other tools. Third, the data used cover a period of three months; however, the number of deliveries in cities is relatively low. In this dataset, the average number of deliveries in Jaipur is under 40 per day. This does not accurately reflect the actual workload of OFD companies, and some methods may not perform as expected in the case of large amounts of data.

Comments 5: The distribution of their data is skewed, and thus they should explain and mention how this affects their analysis.

Response 5: We mentioned that an advantage of Bayesian models is their ability to analyse skewed distributions.

The distribution of route distances is skewed; however, in Bayesian modelling such distributions are not problematic. Models can handle various data distributions, including joint distributions.

Comments 6: Outlier: I would like to see more analysis on the outlier detection, approach and technical background.  

Response 6: Outlier detection is not intended to be the main purpose of this study. Bayesian models are not as susceptible to outliers, as long as those outliers are within the logical assumptions of the model. We highlight them because they can be used to test model robustness.

 

Comments 7: The methods section is comprehensive, but it would be helpful to include more details on the specific parameters used for the OSRM API and the criteria for identifying outliers. 

Response 7: 

We added the parameters, along with their values, that were used in the HTTP query.

We used the OSRM API, based on the demo server, to generate approximate delivery routes. The parameters and their values in our requests are as follows: service - route, version - v1, profile - driving, overview - false, geometries - geojson and steps - true.
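As an illustration of how the parameters listed above fit into a request, the sketch below assembles a route URL for the public OSRM demo server and reads the distance from a parsed response. The function names and the example coordinates are hypothetical, and no request is actually sent; the code only shows the URL layout (service, version, profile, then coordinates in longitude,latitude order) and the `distance` field, which OSRM reports in meters.

```python
from urllib.parse import urlencode

OSRM_DEMO_SERVER = "https://router.project-osrm.org"

def build_route_url(origin, destination, base_url=OSRM_DEMO_SERVER):
    """Build a route request URL; points are (longitude, latitude) pairs."""
    coordinates = ";".join(f"{lon},{lat}" for lon, lat in (origin, destination))
    # Option values as listed in the response above.
    options = urlencode({"overview": "false", "geometries": "geojson", "steps": "true"})
    # Path layout: /{service}/{version}/{profile}/{coordinates}
    return f"{base_url}/route/v1/driving/{coordinates}?{options}"

def route_distance_km(response):
    """Return the first route's distance in km, or None if no route was found."""
    if response.get("code") != "Ok" or not response.get("routes"):
        return None
    return response["routes"][0]["distance"] / 1000.0  # OSRM reports meters

# Hypothetical restaurant and customer coordinates.
url = build_route_url((75.7873, 26.9124), (75.8189, 26.926))
```

Returning `None` for any response whose `code` is not `Ok` is one simple way to flag failed routes for later inspection rather than silently dropping them.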

Comments 8: The preprocessing steps are well-described, but the rationale for certain decisions, such as the choice of a 30 km upper limit for deliveries, should be elaborated. 

Response 8: 

 
We have added an explanation of our decision on a 30 km upper limit supported by the literature. 
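A distance limit of this kind can be applied as a simple filter. The sketch below is only an assumption about how such a rule might look: the function name is hypothetical, and whether the authors dropped, capped, or merely flagged routes above 30 km, and whether the limit is inclusive, is not stated here.

```python
MAX_ROUTE_KM = 30.0  # upper limit mentioned in the response

def split_by_distance(routes_km, limit_km=MAX_ROUTE_KM):
    """Separate route distances into those within the limit and those above it."""
    kept = [d for d in routes_km if d <= limit_km]    # inclusive limit is an assumption
    flagged = [d for d in routes_km if d > limit_km]  # candidates for outlier review
    return kept, flagged

# Hypothetical route distances in kilometers.
kept, flagged = split_by_distance([2.4, 11.8, 29.9, 30.0, 47.3])
```

Keeping the flagged routes in a separate list, instead of discarding them, leaves them available for the kind of visual inspection the manuscript describes.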

 

Comments 9: The results section provides a clear presentation of the findings, but it could benefit from more detailed explanations of the figures and tables. For example, the significance of the two peaks in the distribution of delivery distances should be discussed in more detail. 

Response 9:  

We have corrected the captions under the figures/tables and extended them with additional conclusions. 

Comments 10: The discussion of preprocessing results could be expanded to include a more thorough analysis of the impact of data cleaning on the overall model performance. 

Response 10: 

We added additional information about the impact of data cleaning and other preprocessing techniques on our models. 

Consequently, additional data cleaning was necessary as data used in modeling need to be obtainable from the distributions used. Negative travel times can destabilize models and make it impossible to obtain reasonable results. 

One of the most critical steps in our preprocessing flow was obtaining real-world distances. This allowed us to construct an easily interpretable model. Standardization ensured the numerical stability of our models, which would otherwise be difficult to achieve. Last but not least, careful consideration of the features used in modeling was equally important. Our second model, which took into account two additional variables, required over two hours more to perform sampling.
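Two of the steps mentioned above, removing non-positive travel times and standardizing features, can be sketched as follows. The record layout and field names (`distance_km`, `travel_time_min`) are hypothetical, and the z-score form of standardization is an assumption; the response does not specify which variant was used.

```python
from statistics import mean, pstdev

def drop_non_positive(records, key="travel_time_min"):
    """Keep only records whose travel time is strictly positive.

    A strictly positive distribution cannot generate zero or negative
    travel times, so such rows are removed before modeling.
    """
    return [r for r in records if r[key] > 0]

def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - m) / s for v in values]

# Hypothetical delivery records.
records = [
    {"distance_km": 2.0, "travel_time_min": 15.0},
    {"distance_km": 4.0, "travel_time_min": -3.0},  # invalid: negative time
    {"distance_km": 6.0, "travel_time_min": 25.0},
    {"distance_km": 10.0, "travel_time_min": 40.0},
]
clean = drop_non_positive(records)
z = standardize([r["distance_km"] for r in clean])
```

Cleaning comes first on purpose: the mean and standard deviation used for standardization should be computed only from rows that will actually enter the model.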

Comments 11: Spatial Analysis: 

The spatial analysis is a strong aspect of the manuscript, but it would be useful to provide more examples of how the findings can be applied in real-world scenarios. 

The manuscript should discuss the potential challenges and limitations of using the OSRM API and other routing engines, particularly in the context of varying geographic regions. 

Response 11: 

We added the following paragraph about the challenges and limitations of routing engines. 

Additionally, the use of the OSRM API involves several limitations, especially when using a shared server. The number of requests per minute is limited and shared among all users. Being an open-source project, OSRM does not offer any quality guarantees, and in some regions the data may be sparse or outdated. These issues can affect all open-source routing engines. Moreover, there are certain geographical areas where access to external maps or GPS services is restricted.

Reviewer 2 Report

Comments and Suggestions for Authors

- The paper is good in terms of topic, presentation and arrangement, but there are some minor notes.

- Table No. 2 needs to be revised in the order of the table content.

- All graphic shapes need titles for each shape that indicate the content of the shape and its implications.

- Tables and figures are sequentially presented. It appears to be quite peculiar in this configuration. It is recommended that the manuscript be organized in a more professional manner by positioning the text behind the figures and the writing behind the tables, while adding a simple explanation of the nature of the data and its characteristics.

- The conclusion is exceedingly inadequate. This section should be enhanced by incorporating a variety of references and drawing comparisons with previous research.

- It is noted that the paper did not come up with a clear idea to develop the food delivery process in a practical way that saves time and effort.

Comments on the Quality of English Language

Moderate editing of English language required

Author Response

Comments 1: The paper is good in terms of topic, presentation and arrangement, but there are some minor notes. 

Response 1: We are very grateful to the reviewer for their high praise and remarks. Regarding particular questions 

 

Comments 2: Table No. 2 needs to be revised in the order of the table content. 

Response 2: 

We updated Table No. 2 so that the presented statistics match their description.

 

Comments 3: All graphic shapes need titles for each shape that indicate the content of the shape and its implications. 

Response 3:  

We have corrected the captions under the figures and extended them with additional implications. However, we would prefer not to add a title to each shape, so as not to reduce the readability of the results presentation.

 

Comments 4: Tables and figures are sequentially presented. It appears to be quite peculiar in this configuration. It is recommended that the manuscript be organized in a more professional manner by positioning the text behind the figures and the writing behind the tables, while adding a simple explanation of the nature of the data and its characteristics.

Response 4: We have tried to streamline the paper organization. 

Comments 5: The conclusion is exceedingly inadequate. This section should be enhanced by incorporating a variety of references and drawing comparisons with previous research. 

Response 5: 

We expanded Section 5 with additional discussion, including a clear presentation of results supported by references and a comparison with other studies, the limitations of our approach, the limitations of routing engines, and more detailed areas for future work.

 

Based on the route distribution shown in Figure 3 and direct analysis of the two groups of outliers represented in Figure 4 and Figure 5, we classify 3% of the data as outliers. This also indicates that outliers are predominantly deliveries to more distant locations, whereas deliveries within city limits exhibit fewer outliers. Outliers located in the city boundaries most often refer to unreal destination points. One of them is located on the map of Mumbai (bridge) shown in Figure 8. Additionally, our spatial analysis shows that the Indian OFD market has similar trends to the Chinese and English markets. The distribution of orders among cities, presented in Figure 7, confirms that the use of this type of platform is much more popular in densely populated areas.

Our approach has some limitations. First, the study area is restricted to India. Consequently, the results may not be generalizable on an international scale. Second, we mainly used one map provider, OSM, and one routing engine, OSRM; therefore, the results may differ from those obtained using other tools. Third, the data used cover a period of three months; however, the number of deliveries in cities is relatively low. In this dataset, the average number of deliveries in Jaipur is under 40 per day. This does not accurately reflect the actual workload of OFD companies, and some methods may not perform as expected in the case of large amounts of data.

Additionally, the use of the OSRM API involves several limitations, especially when using a shared server. The number of requests per minute is limited and shared among all users. Being an open-source project, OSRM does not offer any quality guarantees, and in some regions the data may be sparse or outdated. These issues can affect all open-source routing engines. Moreover, there are certain geographical areas where access to external maps or GPS services is restricted.

Our dataset does not accurately reflect the actual workload of OFD companies; therefore, it is highly recommended that future research evaluate the methods used on a larger dataset. The analysis of the Indian OFD market is based solely on deliveries; further research may take into account other elements, such as social and cultural factors.

 

Comments 6: It is noted that the paper did not come up with a clear idea to develop the food delivery process in a practical way that saves time and effort.

Response 6: 

We agree that we did not present a practical way of developing the food delivery process; however, this is not the main purpose of this study. We focused more on preparing a Bayesian modelling approach, as we believe that it has a lot of potential in travel/delivery time estimation but is not as commonly used.

Reviewer 3 Report

Comments and Suggestions for Authors

Please refer to the attached file.

Comments for author File: Comments.docx

Comments on the Quality of English Language


Author Response

We are grateful to the reviewer for their very detailed review. We have attached our response in a separate file.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I have no further comments. 

Author Response

Thank you very much for the time spent evaluating this paper; it helped the paper become much better than it was initially.

Reviewer 3 Report

Comments and Suggestions for Authors

Title of the Article: Bayesian Modelling of Travel Times on the Example of Food Delivery: Part 1 - Spatial Data Analysis and Processing

Second Review Report:

In this second review of the article, I note that the authors have made a commendable effort to incorporate the suggestions given in the first review. However, some points still need to be improved:

  1. Lines 64-65: There is an insufficient explanation of how customers' GPS coordinates are used to predict travel time. A more detailed explanation of the process would help clarify.
  2. Lines 74-77: The complexity of creating a custom time distribution is not fully explained. Examples or a deeper explanation of the specific challenges would be helpful.
  3. Lines 78-79: "Bayesian models cannot accommodate geographic coordinates as input data." - It would be useful to briefly explain why Bayesian models cannot directly use geographic coordinates.
  4. Lines 104-105: The mention of the costs of commercial APIs is relevant but could be more specific about why the costs are so high and how this impacts OFD companies.
  5. Lines 154-158: The explanation of the OSRM API error codes could be clearer about how these errors are handled in practice.
  6. Lines 174-179: The description of data transformation techniques and the calculation of meal preparation time needs more details on why these steps are necessary and how they impact the model.
  7. Lines 219-230: Linear interpolation could be explained with more examples or justifications on why more advanced methods are not necessary in this case.
  8. Lines 247-251: "Bayesian inference is a method of statistical inference, in which we fit a predefined probability model to a set of data and evaluate outcomes with regards to observed parameters of the model and unobserved quantities, like predictions for new data points." - It would be useful to expand the explanation of Bayesian inference for readers who are not familiar with the concept.
  9. Lines 252-253: Briefly explaining what "generalized linear models" are and why they are used in this context would help in understanding.
  10. Lines 261-266: Explain how zero or negative values were detected and corrected during the data cleaning process.
  11. Line 273: Define what is considered "traffic density" and how it was measured.
  12. Line 284: Briefly explain what "geojson" and "steps" are in the OSRM API requests.
  13. Lines 285-287: The explanation of how incorrect distances or outliers were handled could be more detailed.
  14. Line 288: Explain how the "road distance projections" were calculated.
  15. Lines 299-307: More details on the methodology used to investigate and compare routes between different routing engines would be useful.
  16. Lines 381-387: Explain how discrepancies between different routing engines impact the analysis.
  17. Lines 405-410: More details on why the heatmap visualization was not effective, including specific examples or technical issues, would be helpful.

Clarity and Coherence:

  • The Discussion section has a good structure, but it lacks clarity in some crucial points, such as the classification of outliers and the identification of unreal destinations. These issues should be explained in more detail to avoid ambiguities and to help the reader better understand the criteria used.
  • The phrase "Outliers located in the city boundaries most often refer to unreal destination points" (Lines 420-421) lacks a clear definition of what "unreal destination points" are. Explaining the concept and providing examples would help increase clarity.
  • Regarding the phrase "our spatial analysis shows that the Indian OFD market has similar trends to the Chinese and English markets" (Lines 422-423), this statement requires more robust empirical support or references to ensure that this comparison is valid and evidence-based.

Excessive Detail and Redundancy:

  • In several previous figures, it was noted that the captions were excessively detailed. This can be tiresome for the reader and, in some cases, unnecessary. In the Discussion section text, references to the figures and analysis of outliers could be more concise, focusing only on the most relevant points.
  • The phrase "Based on the route distribution shown in Figure 3 and direct analysis of the two groups of outliers represented in Figure 4 and Figure 5" (Lines 417-418) could be simplified to avoid redundancy, as the reference to the figures can be mentioned more directly.

Study Limitations:

  • The section adequately discusses some limitations, such as the geographic restriction to India and the reliance on a single map provider (OSM) and routing engine (OSRM). However, it could delve deeper into the implications of these limitations for generalizing the results, especially in international contexts.
  • The mention of the "relatively low number of deliveries per day in Jaipur" (Lines 430-431) is relevant, but it would be more useful if the text detailed how this impacts the validity of the models and the possible consequences for applicability in scenarios with a higher volume of data.

Definitions and Concepts:

  • The Discussion section uses terms like "unreal destination points" and "actual workload of OFD companies" without adequately defining them, which can cause confusion. Explaining these terms with clear examples and concise definitions would help avoid misinterpretations.
  • The discussion on the use of the OSRM API and its limitations is valid, but it could include a brief explanation of what OSM and OSRM are for readers who may not be familiar with these terms.

In conclusion, while the authors have made significant improvements based on the first review, there are still areas that require further clarification and enhancement. Providing more detailed explanations, simplifying figure captions, refining definitions and concepts, and addressing the remaining grammatical issues will greatly improve the overall clarity and coherence of the paper. Addressing these points will help ensure that the study's findings are robust and comprehensible to a broader audience.

Therefore, my opinion is to request a minor revision. Thank you for your efforts so far, and I look forward to seeing the revised manuscript.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

.

Author Response

We are grateful to the reviewer for their very detailed review. We have attached our response in a separate file.

Author Response File: Author Response.pdf
