Open Access Article

Peer-Review Record

Short-Term Demand Prediction of Shared Bikes Based on LSTM Network

Electronics 2023, 12(6), 1381; https://doi.org/10.3390/electronics12061381

by Yi Shi¹,*, Liumei Zhang¹, Shengnan Lu¹ and Qiao Liu²

Submission received: 11 January 2023 / Revised: 26 February 2023 / Accepted: 6 March 2023 / Published: 14 March 2023

(This article belongs to the Special Issue Human Factors in the Age of Artificial Intelligence (AI))

Round 1

Reviewer 1 Report

General Comments:

This paper attempts to demonstrate the use of an LSTM model to forecast short-term bike usage. Whilst this topic is interesting and deserves further research, the manuscript has to clarify and justify a number of important issues before it can be considered for publication in the journal “Electronics”.

1. Introduction (line 25) – It is specified that “….. research on demand forecasting of shard bikes mainly includes the following methods”. However, the descriptions that follow review different bike sharing studies that do not necessarily focus on forecasting. This should either be rephrased or really focus on “forecasting methods” only.

2. Introduction (line 47) – It said “there are only a few studies on the demand forecast of shared bikes in a short time.” References and some elaborations are needed here. Also, what does “short-time” refer to here? What is the definition of “short” here?

3. Data Source – It is interesting to note that the study is being carried out in Mainland China whilst the dataset is from the UK and the results are mainly applicable to the UK context. The authors need to properly justify the use of this dataset and the value of the results of this study to the local context of the authors’ region. This leads to the question of the foundation supporting the objectives of this study.

4. Introduction – An outline of the paper is missing at the end of the introduction.

5. Theory and Methods (Section 2.1) – This sub-section elaborates the general background of different machine learning models with no specific application cases or references. It is believed that these general descriptions do not add much value to the manuscript. It is suggested that they be kept simple. Perhaps the authors may consider comparing and summarising the key features of these methods in a table with references containing real applications of these methods in forecasting bike usage (or other related transport applications). Section 2.2 could then highlight the major differences between the LSTM model and these models.

6. Data Processing (Section 2.3.3) – This section is the most problematic. The manuscript said that “the dataset obeys the normal distribution and based on that to identify outliers using the 3-sigma principle” (lines 165 to 174). First, what is meant by “the dataset”? Does it mean the bike usage variable? Secondly, from the box-plot, it is quite clear that the distribution is skewed to the right (i.e. having a longer tail towards higher bike usage) instead of a symmetric normal distribution, and thus the use of the 3-sigma method is not justifiable. The authors need to (1) further investigate whether the high bike usage values are really outliers, and probably also account for the possible reasons for them; (2) if these values are really justified as outliers (which the reviewer tends to believe they are not), consider using other methods of identifying outliers.

7. Splitting of Dataset (lines 171 to 174) – Normally, a dataset should be divided into training, validation and test sets. The training set is used to train the network whilst the validation set is used to check whether the network is over-fitted or not. Should the network be over-fitted, the validation set will lead to large prediction errors. This method of assessing the neural network is called the cross validation method. The test set is used to measure the model performance expected from the trained network when it is put into service. The authors may wish to consider splitting the dataset in this way to further enhance the model development and validation. Also, simple descriptive statistics should be presented for the original and split datasets so as to provide a basic understanding of the datasets.

8. Data Standardisation (lines 175 to 180) – The manuscript said that the data are standardised to a standard normal distribution to make network training easier. First, the authors need to justify the use of the normal (and thus standard normal) distribution. Moreover, the equation of standardisation appears to be different from the well-known formula of standardisation (i.e. (xi – u) / sigma). The authors need to (i) elaborate the rationale for this data “conversion”; and (ii) interpret the meaning of the variables after “conversion”.

9. Influence Factors – Figures 4 to 6 are the basis of factor identification. However, Figure 4 is very difficult to read (e.g. axis titles and the data points). It appears that trends are hardly observable. The authors may wish to highlight on the graphs the trends that they observed. For Figure 5, what does the colour scheme imply? A legend is needed here for the adopted colour scheme. Figure 6 in fact further substantiates that the bike usage distribution is skewed instead of normal. The figure title also appears to be confusing. What do “average” and “per month” refer to on the graph? Do they mean “daily bike usage distribution by month”? Do they represent both weekend and weekday data?

10. Results Comparison (Table 5) – Table 5 compares the accuracy of different models. It is suggested that the OLS model (which is the most straightforward and frequently used baseline model) should also be compared. The manuscript also criticised that models like the linear OLS model need a large amount of data and have obvious locality. The authors may need to further justify this statement and explain why this method is being excluded, as this study does have a large amount of data and the proposed method also uses a large amount of data.

11. Discussions and Conclusions – It appears that this section is just a conclusion instead of discussion.

12. Language – Whilst the current form of the manuscript is generally well-written and understandable, quite a number of simple and obvious mistakes could still be observed. The authors should better proofread the manuscript before resubmission.

Comments for author File: Comments.pdf

Author Response

Response to Reviewer 1 Comments

Dear Reviewers,

Thank you very much for the time you spent reviewing the manuscript and for your very encouraging comments on its merits.

We also appreciate your clear and detailed feedback and hope that the explanation has fully addressed all of your concerns. In the remainder of this letter, we discuss each of your comments individually along with our corresponding responses.

The details of the response are attached.

****************************************************************************************

Point 1: Introduction (line 25) – It is specified that “….. research on demand forecasting of shard bikes mainly includes the following methods”. However, the descriptions that follow review different bike sharing studies that do not necessarily focus on forecasting. This should either be rephrased or really focus on “forecasting methods” only.

Response 1: Thank you for your great suggestions on improving the accessibility of our manuscripts.

The introduction around line 25 has been rewritten.

Point 2: Introduction (line 47) – It said “there are only a few studies on the demand forecast of shared bikes in a short time.” References and some elaborations are needed here. Also, what does “short-time” refer to here? What is the definition of “short” here?

Response 2: Thank you for your great suggestions on improving the accessibility of our manuscripts. The introduction around line 47 was partly re-analyzed and revised, and the relevant changes can be found below. Here, “short-time” refers to one hour. This paper focuses on bike-sharing demand forecasting at the hourly level.

Point 3: Data Source – It is interesting to note that the study is being carried out in Mainland China whilst the dataset is from the UK and the results are mainly applicable to the UK context. The authors need to properly justify the use of this dataset and the value of the results of this study to the local context of the authors’ region. This leads to the question of the foundation supporting the objectives of this study.

Response 3: Thank you for your great suggestions on improving the accessibility of our manuscripts. Regarding the data source, the dataset selected for this article is a public dataset from the Kaggle website. It makes sense to use public datasets so that results can be compared with those of others. London also has a thriving bike sharing business. If we collected the data ourselves, we might bias the data in our favor, which would be unfair and not conducive to research and comparison.

Point 4: Introduction – An outline of the paper is missing at the end of the introduction.

Response 4: Thank you for the detailed review. We have added an outline of the paper at the end of the introduction. The future research direction and the impact of COVID-19 on the use of shared bicycles are mainly discussed.

Point 5: Theory and Methods (Section 2.1) – This sub-section elaborates the general background of different machine learning models with no specific application cases or references. It is believed that these general descriptions do not add much value to the manuscript. It is suggested that they be kept simple. Perhaps the authors may consider comparing and summarising the key features of these methods in a table with references containing real applications of these methods in forecasting bike usage (or other related transport applications). Section 2.2 could then highlight the major differences between the LSTM model and these models.

Response 5: Thank you for your great suggestions on improving the accessibility of our manuscripts. We have re-written this part according to the reviewer’s suggestion.

Point 6: Data Processing (Section 2.3.3) – This section is the most problematic. The manuscript said that “the dataset obeys the normal distribution and based on that to identify outliers using the 3-sigma principle” (lines 165 to 174). First, what is meant by “the dataset”? Does it mean the bike usage variable? Secondly, from the box-plot, it is quite clear that the distribution is skewed to the right (i.e. having a longer tail towards higher bike usage) instead of a symmetric normal distribution, and thus the use of the 3-sigma method is not justifiable. The authors need to (1) further investigate whether the high bike usage values are really outliers, and probably also account for the possible reasons for them; (2) if these values are really justified as outliers (which the reviewer tends to believe they are not), consider using other methods of identifying outliers.

Response 6: We are very sorry for our careless mistakes. Thank you for your reminder.

“The dataset” refers to the bike usage variable. On re-analyzing the data set, we found that the bicycle usage data do not conform to a normal distribution, so the 3-sigma method cannot be used for data processing. After consulting further materials, we concluded that there is no need to discuss outliers when using machine learning and deep learning algorithms for predictive analysis.
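The disagreement in Point 6 can be illustrated with a minimal sketch that applies the 3-sigma rule and Tukey's interquartile-range (IQR) fences to the same right-skewed sample. The `hourly_counts` values and all function names are hypothetical and are not taken from the London dataset or from the manuscript. The sketch only shows that the two rules can flag different points on skewed data; neither rule decides whether a high count is a genuine outlier or a legitimate demand peak.

```python
import statistics


def three_sigma_outliers(values):
    """Flag values more than 3 population standard deviations from the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [v for v in values if abs(v - mu) > 3 * sigma]


def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def skewness(values):
    """Population skewness; a value above 0 indicates a longer right tail."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


# Hypothetical hourly rental counts with a long right tail (illustrative only).
hourly_counts = [120, 150, 180, 200, 220, 260, 300, 350,
                 420, 500, 650, 900, 1400, 2600, 4200]

print("skewness:", round(skewness(hourly_counts), 2))
print("3-sigma flags:", three_sigma_outliers(hourly_counts))
print("IQR flags:", iqr_outliers(hourly_counts))
```

Because the mean and standard deviation are themselves inflated by the long tail, the 3-sigma rule tends to flag fewer points than the IQR fences on this kind of sample.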

Point 7: Splitting of Dataset (lines 171 to 174) – Normally, a dataset should be divided into training, validation and test sets. The training set is used to train the network whilst the validation set is used to check whether the network is over-fitted or not. Should the network be over-fitted, the validation set will lead to large prediction errors. This method of assessing the neural network is called the cross-validation method. The test set is used to measure the model performance expected from the trained network when it is put into service. The authors may wish to consider splitting the dataset in this way to further enhance the model development and validation. Also, simple descriptive statistics should be presented for the original and split datasets so as to provide a basic understanding of the datasets.

Response 7: Thank you for your great suggestions on improving the accessibility of our manuscripts.

Regarding the splitting of the data set, this paper divides it into a training set (70%), a validation set (10%) and a test set (20%). The previous version was not clear on this point: the data set was divided into a training set (80%) and a test set (20%), but before the training process, 10% of the data in the training set was extracted as a validation set.
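The two descriptions of the split in this response can be compared with a short sketch. It assumes a chronological (unshuffled) split, which is a common choice for time-ordered data but is not confirmed by the response; the function names and the 1000-record example are hypothetical. Note that holding out 10% of an 80% training portion gives 8% of all records, so the second description yields 72/8/20 rather than 70/10/20.

```python
def chronological_split(records, train_frac=0.7, val_frac=0.1):
    """Split time-ordered records into train/validation/test without shuffling."""
    n = len(records)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return records[:train_end], records[train_end:val_end], records[val_end:]


def holdout_then_validation(records, test_frac=0.2, val_frac_of_train=0.1):
    """Reserve the last test_frac as test data, then take the tail of the
    remaining training portion as a validation set."""
    n = len(records)
    test_start = n - round(n * test_frac)
    train_full = records[:test_start]
    val_start = len(train_full) - round(len(train_full) * val_frac_of_train)
    return train_full[:val_start], train_full[val_start:], records[test_start:]


# Hypothetical stand-in for 1000 hourly observations in time order.
hours = list(range(1000))

train, val, test = chronological_split(hours)
print(len(train), len(val), len(test))

train2, val2, test2 = holdout_then_validation(hours)
print(len(train2), len(val2), len(test2))
```

Keeping the validation and test windows strictly later than the training window avoids evaluating the model on hours that precede its training data.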

Point 8: Data Standardisation (lines 175 to 180) – The manuscript said that the data are standardised to a standard normal distribution to make network training easier. First, the authors need to justify the use of the normal (and thus standard normal) distribution. Moreover, the equation of standardisation appears to be different from the well-known formula of standardisation (i.e. (xi – u) / sigma). The authors need to (i) elaborate the rationale for this data “conversion”; and (ii) interpret the meaning of the variables after “conversion”.

Response 8: Thank you for your great suggestions on improving the accessibility of our manuscripts.

The data need to be preprocessed. In this part, we do not standardize the data; they are only normalized to [0, 1] using min-max normalization.
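The distinction drawn here between the two transformations can be made concrete with a small sketch. Min-max normalization maps each value to (x - min) / (max - min), which lies in [0, 1], while z-score standardization maps it to (x - mean) / sigma, which has mean 0 and standard deviation 1. The function names and the `counts` sample are hypothetical and only illustrate the formulas; they are not the manuscript's code.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined for constant data")
    return [(v - lo) / (hi - lo) for v in values]


def z_score_standardize(values):
    """Shift and scale values to zero mean and unit population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("standardization is undefined for constant data")
    return [(v - mu) / sigma for v in values]


# Hypothetical rental counts (illustrative only).
counts = [100, 200, 300, 400, 500]

scaled = min_max_normalize(counts)
standardized = z_score_standardize(counts)
print(scaled)
print([round(z, 3) for z in standardized])
```

Neither transformation assumes or produces a normal distribution; both are linear rescalings that preserve the shape of the data. In a forecasting setting the minimum and maximum would normally be computed on the training set only and then reused for the validation and test sets.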

Point 9: Influence Factors – Figures 4 to 6 are the basis of factor identification. However, Figure 4 is very difficult to read (e.g. axis titles and the data points). It appears that trends are hardly observable. The authors may wish to highlight on the graphs the trends that they observed. For Figure 5, what does the color scheme imply? A legend is needed here for the adopted color scheme. Figure 6 in fact further substantiates that the bike usage distribution is skewed instead of normal. The figure title also appears to be confusing. What do “average” and “per month” refer to on the graph? Do they mean “daily bike usage distribution by month”? Do they represent both weekend and weekday data?

Response 9: We think this is an excellent suggestion. We are sorry for our carelessness. Figure 4 has been modified. The color scheme of Figure 5 has been changed. Figure 6 has also been modified. “Average” means the average number of bikes used in the corresponding months of the two years included in the dataset. “Per month” simply refers to the 12 months. The data include both weekdays and weekends.

Point 10: Results Comparison (Table 5) – Table 5 compares the accuracy of different models. It is suggested that the OLS model (which is the most straightforward and frequently used baseline model) should also be compared. The manuscript also criticised that models like the linear OLS model need a large amount of data and have obvious locality. The authors may need to further justify this statement and explain why this method is being excluded, as this study does have a large amount of data and the proposed method also uses a large amount of data.

Response 10: Thank you for your great suggestions on improving the accessibility of our manuscripts.

The OLS model was tried in this paper, but its prediction results were not ideal and fell well short of those of the prediction models shown in Table 5, so it was not listed before.
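For readers unfamiliar with the baseline the reviewer requests, the following is a minimal sketch of a single-feature OLS fit with an RMSE score. The choice of temperature as the predictor, the data values, and the function names are all hypothetical; the response does not state which features or error metric the authors used for their OLS trial.

```python
import math


def fit_ols(x, y):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical temperature (deg C) and hourly rental counts (illustrative only).
temperature = [5, 10, 15, 20, 25]
rentals = [300, 520, 690, 910, 1080]

slope, intercept = fit_ols(temperature, rentals)
fitted = [slope * t + intercept for t in temperature]
error = rmse(rentals, fitted)
print(slope, intercept, round(error, 2))
```

Reporting such a baseline alongside the other models lets readers judge how much accuracy the more complex models add.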

Point 11: Discussions and Conclusions – It appears that this section is just a conclusion instead of discussion.

Response 11: We thank the reviewer for the constructive comments. We have re-written this part according to the reviewer’s suggestion. The “Discussion and Conclusions” section has been divided into two sections.

Point 12: Language – Whilst the current form of the manuscript is generally well-written and understandable, quite a number of simple and obvious mistakes could still be observed. The authors should better proofread the manuscript before resubmission.

Response 12: We tried our best to improve the manuscript and made some changes to the manuscript.

These changes do not affect the content or framework of the paper. We have not listed the changes here but have marked them in red in the revised paper. We sincerely appreciate the reviewers’ work and hope that the corrections will meet with approval.

*****************************************************************************************************************

We would like to take this opportunity to thank you for all your time involved and this great opportunity for us to improve the manuscript. We hope you will find this revised version satisfactory.

Sincerely,

Yi Shi

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper is very interesting and has the potential to contribute. The paper also starts well; however, it needs significant improvement before it becomes publishable. The authors should consider the following to improve the paper.

The title says Short-term Demand Prediction of Shared Transportation Based on LSTM Network. Two important aspects need to be considered here. First, the authors have used bicycles as the mode, so this should be stated in the title. Further, the authors use the words bike and bicycle interchangeably, so one nomenclature needs to be used consistently. Second, instead of focusing on the prediction of the short-term demand for shared transportation, the focus is on establishing the advantages or robustness of the LSTM model. This needs to be clarified and the focus of the study should be re-articulated accordingly. Also, the short-term time dimension should be defined. In one place one hour is mentioned; the authors should also explain how it is linked to real-time information.

The objective of the study is therefore sketchy, as is the motivation behind it.

Section 2 discusses the theories and methods for such predictions. However, the authors have articulated the section without much evidence and without citing references. Moreover, the authors have compared the results of the LSTM model with other models; however, details of the other methods used are not given.

The LSTM model presented is not adequately discussed. The authors have given some equations and a general network structure. However, the network structure (Figure 1) is sketchy. It does not have many details and how it works is not clear. How the neurons interact with each other, and what the inputs and outputs are, is also not clear. Similarly, it should be contextualised to the study. Currently, it is totally opaque.

Why do the authors use the nomenclature “experimental”? Is it a model estimation or the use of the model to predict an outcome? Basically, it is a theoretical simulation study based on some field data. I am not sure about it!

Methodologically, the paper lacks details. How the data were used, which algorithm was used, how validation was done, etc., are missing.

Equations 8-12 are superfluous as the method and equations are well established. There is no need to provide them, rather a reference should have been sufficient.

It is not clear how Figures 6 and 7 were obtained. If they are part of the data, then they should be either in the introduction section while formulating the research problem, or in the results section, and should be interpreted. I am not sure how they form a part of the methodology. Moreover, the findings from these figures and their implications are not clear.

The authors have analysed the influential factors, but it is not clear how they have used these factors in the analyses, modelling and predictions. Furthermore, the authors claim to have done optimisation; however, no details on the optimisation are found. For example, what was the objective function, what are the conditions of optimisation, what constraints were used, and so on. Moreover, it is not clear what the optimised predicted results are. As I have said previously, the authors have not predicted any valid results under a certain optimised scenario, but rather checked how the model results compare to other methods.

Since Python was used, it would have been better if pseudocode were provided, perhaps in the appendix.

The results of the correlation analysis suggest that the meteorological factors do not show any strong correlation. Although they may be considered reasonable, the authors need to be very careful when using the insignificant ones. However, the major problem is that how these factors were used in the model and what their conditions for optimisation are is not known. Moreover, it is not clear what the optimal scenario is.

Figures 8 and 9 need more explanation and discussion. As mentioned previously, the authors need to first formulate what the objective of the study is: prediction under an optimal scenario or comparative analysis of various model results.

No proper discussion is made. The conclusion does not include any significant discussion of the results, engagement with the literature, or implications.

Most importantly, the authors need to clearly articulate the critical findings in alignment with the objective(s) of the study.

I am not sure if Figures 2 and 3 are very relevant.

Figures 4 and 5 need to be improved.

Author Response

Dear Reviewers,

Thank you very much for the time you spent reviewing the manuscript and for your very encouraging comments on its merits.

Details of manuscript revisions are in the annex.

Author Response File: Author Response.pdf

Reviewer 3 Report

I enjoyed reading the content of the article. The topic is interesting and current. Here are my comments and suggestions:

Title

I think it should be made clear that this is about bike sharing. The reference to shared transport in general terms is misleading.

Keywords

Keywords need to be completed, e.g. sharing economy, transport, city bike.

Introduction

a/ The introduction lacks a statement of the purpose of the research. The authors only mention what they intend to do, but that is not the same as the goal.

b/ The authors should define the research gap more clearly - in this form it is ambiguous, poorly expressed.

c/ The literature review should be in-depth. Basing it on 7 sources is definitely not enough. I suggest doing a broader review of the literature, and including the summary/conclusions in a table.

Methodology

a/ First, the algorithm of the methodology should be presented - step by step. Then, research questions should be presented that facilitate understanding of the authors' research intentions. The choice of research methods needs to be explained in more detail.

b/ An explanation of why London was chosen for the study should also be provided.

Discussion

a/ Discussion as a section must be separated from conclusions.

b/ There is practically no discussion - it is necessary to add it, taking into account references to other studies. It should also be noted that the research covers the period 2015-2017, which was 5 years ago - has anything changed since then that affects public transport, e.g. COVID-19?

Conclusions

a/ There is no reference to research questions.

b/ There are no specific conclusions - there are repetitions of research results.

c/ What did this research bring to the theory? What are the empirical implications of your research? What recommendations for the city authorities can be formulated based on the research results?

In conclusion, the article still needs to be rethought and corrected. In this form, I do not recommend it for printing yet.

Author Response

Dear Reviewers,

Thank you very much for the time you spent reviewing the manuscript and for your very encouraging comments on its merits.

The details of the response are attached.

Author Response File: Author Response.pdf

Reviewer 4 Report

The paper entitled “Short-term Demand Prediction of Shared Transportation Based on LSTM Network” is submitted for possible publication in the MDPI Electronics journal. The paper sets out to predict the short-term demand for shared bikes using the LSTM neural network model. Overall, the paper contains some useful information. However, major revisions are needed.

Comments

1. Lines 16-23, the author(s) should provide some worldwide statistics concerning shared bikes. Is this mode widespread?

2. The whole Introduction section needs to be rewritten. As is, it is poorly structured. The author(s) need to show the gap(s) in the literature, state the rationale for addressing the problem of shared bikes, and show the novel part(s) of the paper. I have major concerns here.

3. The objective(s) of the paper are not clear. What would be the impact of the outcome(s) of this paper?

4. Lines 65-69, I do not see any references here. The same comment goes for all the stated models. Again, I have major concerns here.

5. Lines 114-121, this paragraph needs to be re-written in a tightly structured manner.

6. Section 2.2, it is not clear why the LSTM is more appropriate for the prediction of shared bike demand. More discussion is needed here. Why not use time-series analysis to get the short-term predictions of the shared bikes? Time-series analysis is capable of addressing the seasonality issue. The author(s) are encouraged to see the work in https://doi.org/10.1061/(ASCE)0733-947X(1995)121:3(249)

7. Line 163, something is missing.

8. The discussion contains a lot of speculation. I did not see solid and defensible statements. The discussion part should be rewritten.

9. The issue of unobserved heterogeneity is not addressed in this paper. The author(s) should address this issue as one of the study's limitations. Key threats to the validity of the results should be stated.

Author Response

Dear Reviewers,

Thank you very much for the time you spent reviewing the manuscript and for your very encouraging comments on its merits.

The details of the response are attached.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

This revised manuscript has significantly improved over the original version. Most of the comments have been addressed. It can be considered for possible publication after clarifying the following issues.

1. Whilst the data set is from the UK, the analysis was done in mainland China. The reviewer previously commented on the same issue. However, the authors still have not clarified it. The authors need to clearly identify the context for which the results from this study are applicable.

2. Theory and Methods (Section 2.1) – The authors should consider comparing the key features of these methods (e.g. in the form of a table with references containing real applications of these methods in forecasting bike usage or other related transport applications). This could help justify why the LSTM model was chosen.

3. Influence Factors – Figure 4 is not convincing in identifying correlation. It does not show any significant correlation.

Author Response

Dear Reviewers,

Thank you very much for the time you spent reviewing the manuscript and for your very encouraging comments on its merits.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper is improved significantly. The authors have addressed the majority of the concerns. However, I am still concerned about some important aspects.

(1) The use of the term experimental, while this is basically a simulation study.

(2) The authors claim to have made optimisation, which is not really evident from the results. I don't see any optimised results.

(3) I am not sure about the statements made in lines 337-339. How do they fit in, and what are the authors indicating?

(4) The discussion is very sketchy and irrelevant. It either needs to be improved by contextualising it to the focus of the study, or it may be removed.

Author Response

Dear Reviewers,

Thank you very much for the time you spent reviewing the manuscript and for your very encouraging comments on its merits.

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors carefully analyzed my comments and suggestions. They agreed with me, resulting in changes to the article. The changes are well implemented. Now, the article is better. Now, I recommend it for publication.

Author Response

Thank you very much for the time you spent reviewing the manuscript and for recommending it for publication.

Reviewer 4 Report

The reviewer has reviewed the revised manuscript. Although the author(s) addressed most of the comments and concerns, their responses to points 6 and 9 are not convincing. Time series analysis is a powerful tool for forecasting future demand without the need for explanatory variables. The authors should address this issue in a structured manner. The unobserved heterogeneity issue, if unaddressed, is likely to produce erroneous outcomes. The authors should address this issue, which is one key limitation of the paper.
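As a minimal illustration of forecasting demand from its own history without explanatory variables, the sketch below implements a seasonal naive baseline, which repeats the values observed one season earlier. This is one of the simplest time-series methods and is not claimed to be the approach in the work the reviewer cited; the function names and the toy series (with a season length of 4 for brevity, where hourly data with a daily cycle would use 24) are hypothetical.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast each future step with the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical demand series with two full seasons of length 4 (illustrative only).
history = [10, 50, 30, 5, 12, 54, 33, 6]
next_season_actual = [11, 52, 35, 7]

forecast = seasonal_naive_forecast(history, season_length=4, horizon=6)
mae = mean_absolute_error(next_season_actual, forecast[:4])
print(forecast, mae)
```

A baseline of this kind gives a reference error level against which both classical time-series models and the LSTM can be compared.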

Author Response

Dear Reviewers,

Thank you very much for the time you spent reviewing the manuscript and for your very encouraging comments on its merits.

Author Response File: Author Response.pdf

Round 3

Reviewer 4 Report

The reviewer has no further comments.
