Peer-Review Record

Large-Scale Traffic Congestion Prediction Based on the Symmetric Extreme Learning Machine Cluster Fast Learning Method

Symmetry 2019, 11(6), 730; https://doi.org/10.3390/sym11060730

by Yiming Xing^1,2, Xiaojuan Ban^1,*, Xu Liu^1 and Qing Shen^1

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 27 March 2019 / Revised: 13 May 2019 / Accepted: 24 May 2019 / Published: 28 May 2019

(This article belongs to the Special Issue Symmetry in Cooperative Applications III)

Round 1

Reviewer 1 Report

+This paper proposes a very interesting framework (S-ELM-Cluster) to effectively solve the multi-model training problem. Specifically, large-scale traffic data are divided into small-scale data sets to account for road-segment-specific characteristics in order to better predict traffic congestion.

+This is a very well written paper that deals with a very interesting approach to traffic congestion prediction. However, before it is accepted for publication, some concerns regarding the experiments should be addressed.

- It is not clear how the predicted results are compared with the real traffic conditions. More details about the video surveillance data, e.g., the coverage of the data should be discussed.

- The main metric for evaluation is the accuracy. However, it is not clear what is exactly the accuracy. The authors should provide the definition of the accuracy.

- It is not clear whether the size of the training data (i.e., 5 days data) is sufficient. The reviewer is not convinced if the five-day data are enough to represent the traffic conditions of a region.

- The experiments performed to determine the length of the time series seem quite ad hoc. The authors should mention a generic technique for determining the optimal length of the time series.

Author Response

Response to Reviewer 1 Comments

Dear editor and reviewers,

Our deepest gratitude goes to you for your careful work and thoughtful suggestions, which have helped improve this paper substantially. Following your advice, we have made the changes below, marked in red in the paper, and hope they meet with your approval.

Point 1: It is not clear how the predicted results are compared with the real traffic conditions. More details about the video surveillance data, e.g., the coverage of the data should be discussed. 

Response 1: The applied research in this paper is based on traffic in the main urban area of Nanning City. The floating vehicle data used are real-time data collected by the Traffic Management Department of Nanning City. In addition, the electronic map data of Nanning City are used as the base data for matching the trajectories in the floating vehicle data. The goal of trajectory matching is to locate each trajectory on the actual road in the electronic map and associate it with that road. All analysis and prediction in the experiment are based on the floating vehicle data. At present, Nanning City has built a complete floating vehicle detection system. The system consists of more than 8,000 floating vehicle monitoring points, made up of buses and taxis, which can monitor traffic conditions in the city at all times. The system obtains real-time floating vehicle data through the floating vehicle interface provided by the Transportation and Management Department of Nanning City. The data returned by each floating vehicle contain location information (latitude and longitude) as well as speed, angle, and GPS accuracy. The data cover almost all main roads of Nanning.

Point 2: The main metric for evaluation is the accuracy. However, it is not clear what is exactly the accuracy. The authors should provide the definition of the accuracy.

Response 2: The regression results are integers between 0 and 100. In this paper, a prediction is counted as correct when […], where the first symbol is the actual congestion index in the testing data and the second is the estimate predicted by the models.

Point 3: It is not clear whether the size of the training data (i.e., 5 days data) is sufficient. The reviewer is not convinced if the five-day data are enough to represent the traffic conditions of a region.

Response 3: The authors actually experimented with traffic data for the whole year and found that daily traffic conditions are highly similar. Accordingly, the dates of the year are divided into three categories: working days, weekends, and major festivals. For presentation, the analysis in the article selects only five working days, four weekends, and two major festivals (Ching Ming Festival and May Day).

Point 4: The experiments performed to determine the length of the time series seem quite ad hoc. The authors should mention a generic technique for determining the optimal length of the time series.

Response 4: In time series modeling, the length of the time series is usually determined using the sample autocorrelation function (ACF) [24, 25] and the sample partial autocorrelation function (PACF) [24, 25]. ACF and PACF are commonly used model identification tools and usually provide an initial guess of the sequence length. To further determine the optimal model order, other tools are needed; the most commonly used criteria are the Akaike Information Criterion (AIC) [26] and the Bayesian Information Criterion (BIC) [27], both based on information theory. From the relevant literature, we found that many scholars select a length of 8 or 9 for historical traffic data series. In this experiment, the length of the time series is increased starting from four, and the prediction accuracy on the test set is observed.
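As an illustration of the ACF-based first guess described in Response 4, the following minimal sketch computes the sample autocorrelation of a series and lists the lags whose magnitude exceeds the usual approximate 95% white-noise bound of 1.96/sqrt(n). It is not the authors' code: the function names `sample_acf` and `significant_lags` and the toy series are assumptions made for this example, and PACF, AIC and BIC are not covered.

```python
import math
from typing import List, Sequence


def sample_acf(series: Sequence[float], max_lag: int) -> List[float]:
    """Return the sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    if n < 2 or not 0 <= max_lag < n:
        raise ValueError("need at least 2 points and 0 <= max_lag < len(series)")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # n times the lag-0 autocovariance
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def significant_lags(series: Sequence[float], max_lag: int) -> List[int]:
    """Lags k >= 1 whose |r_k| exceeds the approximate bound 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(series))
    acf = sample_acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > bound]


if __name__ == "__main__":
    # Hypothetical congestion values (0-100), one per sampling period.
    toy = [30, 32, 35, 40, 48, 55, 60, 58, 50, 42, 36, 33, 31, 30, 32, 36]
    print(significant_lags(toy, max_lag=8))
```

The largest significant lag gives only a starting point for the series length; as the response notes, a further criterion or a test-set sweep is still needed to settle the final value.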

Author Response File: Author Response.docx

Reviewer 2 Report

The paper is devoted to the application of multi-model extreme learning machines (ELM) with symmetric activation functions to the specific task of road congestion prediction.

Although the methodology used is quite modern and the proposed multi-model approach via clustering of data is reasonable, the paper is not well written and can’t be accepted in its present form.

My basic concerns and recommendations are:

1. English language is of bad quality so serious revision is a must.

2. There are many incorrect or misleading notations, such as:

- At line 132: capital letter G instead of small letter g

- Road congestion was denoted by three different letters: C, f(x) and x

3. There are several incorrect claims and terms as follows:

- line 55: the authors must be aware that ELMs are not the only black-box models; all NN models are black-box models as well;

- In section 3.1 the term “time series” is explained in too naïve a manner;

- “clustering clusters” does not sound right at all;

- Authors should provide a clear explanation of what they mean by conversion of “enumerated values” into “two-valued feature”;

- Association of the length of the time series with the number of features is also misleading since several features are defined; authors must clearly define the input features of their model;

4. Numerous sentences from the journal template also remain at the beginning of sections 2.1, 3.2, 3.3.1, 4, 4.4

5. Legend of Figure 9 is not clear enough.

6. Authors must refer to the original work proposing S-ELM since this is not their own contribution!

Author Response

Response to Reviewer 2 Comments

Dear editor and reviewers,

Point 1: English language is of bad quality so serious revision is a must.

Response 1: The authors have carefully revised the English expression of the full text and all revisions are marked red in the submission.

Point 2: There are many incorrect or misleading notations, such as:

Point 2-1: At line 132: capital letter G instead of small letter g

Response 2-1: We now use the small letter g instead of the capital letter G.

Point 2-2: Road congestion was denoted by three different letters: C, f(x) and x

Response 2-2: The road congestion value is unified as C and all changes are marked red in the submission.

Point 3: There are several incorrect claims and terms as follows:

Point 3-1: line 55: the authors must be aware that ELMs are not the only black-box models; all NN models are black-box models as well;

Response 3-1: The sentence has been modified to “Extreme Learning Machine (ELM) is one of the black box modeling methods.”

Point 3-2: In section 3.1 the term “time series” is explained in too naïve a manner;

Response 3-2: The definition of time series has been revised in the paper and marked in red in the submission. “Time series is a series of numerical values of certain statistical indicators, which are sorted in chronological order.”

Point 3-3: “clustering clusters” does not sound right at all;

Response 3-3: There are two “clustering clusters” in the paper. The first one is modified to “clusters”, the second one is replaced by “logical region”.

Point 3-4: Authors should provide a clear explanation of what they mean by conversion of “enumerated values” into “two-valued feature”;

Response 3-4: In general, variables either indicate measurements on some continuous scale, such as the traffic congestion index of the previous period in this case, or represent categorical or discrete characteristics, such as the current time in the features above. The features used in this paper mix categorical features with real-valued features and should be transformed so that all are of one kind. In our case, each categorical feature is split into several real-valued features whose values are either 1 or 0. For instance, the feature “current time” is transformed into 192 new features, each of which is 0 or 1 to indicate whether that value occurs. Thus all features become real-valued and can be evaluated equally. In our experiments, this transformation increased the accuracy rate by 2.2% compared with using the original features directly.
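The transformation described in Response 3-4 is commonly called one-hot encoding. A minimal sketch follows, using the five road levels listed in Response 3-5 as the categorical feature. The function names `one_hot` and `build_features`, the category spellings, and the sample values are assumptions for illustration, not the authors' implementation.

```python
from typing import List, Sequence

# Category labels are hypothetical spellings of the five road levels.
ROAD_LEVELS = ["highway", "expressway", "main road", "secondary road", "branch"]


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """Map one categorical value to a 0/1 vector with a single 1."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if c == value else 0.0 for c in categories]


def build_features(road_level: str, last_congestion: float) -> List[float]:
    """Concatenate the encoded categorical feature with a real-valued one."""
    return one_hot(road_level, ROAD_LEVELS) + [last_congestion]


if __name__ == "__main__":
    # One sample: a main road whose previous congestion index was 42.
    print(build_features("main road", 42.0))
```

Each categorical feature contributes as many columns as it has distinct values, which is why a time-of-day feature with many slots expands into a long block of 0/1 inputs.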

Point 3-5: Association of the length of the time series with the number of features is also misleading since several features are defined; authors must clearly define the input features of their model;

Response 3-5: The misleading sentence was deleted from the original paper: “Unless specified, the length of the time series is 8, that is, the feature number of the input sample is 8.”

Through a series of experiments and feature selection work, the following characteristics are selected as input features in this paper:

Road logical region, discrete features, 1, 2, 3, 4,..., 50, a total of 50 kinds of values.

The current time, discrete characteristics, 06:05, 06:15, 21:55, a total of 191 values.

The congestion value in the past eight historical periods: the continuous feature with a range of 0-100.

Road level: Highways, expressways, main roads, secondary roads and branches.

The number of adjacent roads at the road entrance: non-negative integer, with a value of 0, 1, 2, 3, ...

The number of adjacent road connections at the road section: non-negative integer, with a value of 0, 1, 2, 3, ...

Point 4: Numerous sentences from the journal template also remain at the beginning of sections 2.1, 3.2, 3.3.1, 4, 4.4

Response 4: We apologize for making such a serious mistake. The following template sentences have been deleted in the new submission. Section 2.1: “The text continues here.” Section 3.2: “All figures and tables should be cited in the main text as Figure 1, Table 1, etc.” Section 3.3.1: “The text continues here.” Section 4: “Authors should discuss the results and how they can be interpreted based on previous studies and the working hypotheses. The findings and their implications should be discussed in the broadest context possible. In addition, future research directions may be highlighted.” Section 4.4: “This is an example of an equation:”

Point 5: Legend of Figure 9 is not clear enough.

Response 5: We replaced the old picture with a new clear picture.

Point 6: Authors must refer to the original work proposing S-ELM since this is not their own contribution

Response 6: We quoted the original work in two places:

Line 69: In addition, the symmetrical extreme learning machine [23] has the ability to estimate any finite sample with arbitrary precision.

Line 119: On the basis of the characteristics of the structure of ELM model, the symmetry of ELM algorithm[23] is strictly satisfied by structural improvement.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

Although the authors claim that they made serious corrections, in fact the corrections are too formal and insufficient.

First of all, I don’t see significant corrections of English language.

Second, response to some of my comments is unsatisfactory as follows:

Point 3-2: In section 3.1 the term “time series” is explained in too naïve a manner;

Again the claim is not correct enough! The authors must know that a time series is a sequence of data points taken at successive equally spaced points in time.

Response 3-3: There are two “clustering clusters” in the paper. The first one is modified to “clusters”, the second one is replaced by “logical region”.

Not everything is corrected! “logical region” is also incorrect terminology! In all cases the correct term is cluster or group.

In features explanation:

Old text: The congestion value in the past 5 minutes: a continuous feature with a range of 0-100

New text: The congestion value in the past eight historical periods: the continuous feature with a range of 0-100.

So you should clearly define the time step! According to the previous sentence (“The current time, discrete characteristics, 06:05, 06:15, 21:55, a total of 191 values”) it seems that it is 5 seconds, but “past eight historical periods” is not equal to 5 minutes then???

Point 5: Legend of Figure 9 is not clear enough.

Response 5: We replaced the old picture with a new clear picture.

The problem is the missing details in the legend! Authors should clearly define the meaning of the different colors in the picture, e.g., does yellow denote roads with moderate congestion?

Author Response

Dear editor and reviewers,

Point 1: Although the authors claim that they made serious corrections, in fact the corrections are too formal and insufficient. First of all, I don’t see significant corrections of English language.

Response 1: The authors have carefully revised the English expression of the full text again and all revisions are marked in the submission.

Point 2: Second, response to some of my comments is unsatisfactory as follows:

Point 3-2: In section 3.1 the term “time series” is explained in too naïve a manner;

Response 3-2: The definition of time series has been revised in the paper and marked in red in the submission. “Time series is a series of numerical values of certain statistical indicators, which are sorted in chronological order.”

Again the claim is not correct enough! The authors must know that a time series is a sequence of data points taken at successive equally spaced points in time.

New Response 3-2: The definition of time series has been revised again. “A time series is an array of data points indexed (or listed or graphed) in time order. Usually, a time series is termed as a sequence taken at the successive equally spaced points in time.”

Response 3-3: There are two “clustering clusters” in the paper. The first one is modified to “clusters”, the second one is replaced by “logical region”.

Not everything is corrected! “logical region” is also incorrect terminology! In all cases the correct term is cluster or group.

New Response 3-3: According to the reviewer's suggestion, we changed the second “clustering clusters” to “clusters” as well.

In features explanation:

Old text: The congestion value in the past 5 minutes: a continuous feature with a range of 0-100

New text: The congestion value in the past eight historical periods: the continuous feature with a range of 0-100.

New Response: The system produces an assessment output every 10 minutes to portray the current urban traffic situation for every road segment. These data gradually become historical. Based on the follow-up experiments, we set the length of the time series to 8, so we select eight historical congestion values as part of the input feature vector.

New text: The congestion values in the past eight historical periods: the continuous feature with a range of 0-100, wherein each historical period is ten minutes.
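To make the revised feature definition concrete, the sketch below builds input/target pairs from a series of congestion values sampled every ten minutes, using the eight most recent values as inputs. The function name `make_lagged_samples`, the constant names, and the sample series are hypothetical; only the window length of 8 and the ten-minute period come from the response.

```python
from typing import List, Sequence, Tuple

N_LAGS = 8            # number of historical periods used as input
PERIOD_MINUTES = 10   # length of one historical period


def make_lagged_samples(
    values: Sequence[float], n_lags: int = N_LAGS
) -> List[Tuple[List[float], float]]:
    """Pair each value with the n_lags values that immediately precede it."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    return [
        (list(values[t - n_lags:t]), values[t])
        for t in range(n_lags, len(values))
    ]


if __name__ == "__main__":
    # Hypothetical congestion indices (0-100), one per ten-minute period.
    series = [20, 22, 25, 31, 40, 52, 61, 66, 63, 55]
    for inputs, target in make_lagged_samples(series):
        print(inputs, "->", target)
    print("history covered per sample:", N_LAGS * PERIOD_MINUTES, "minutes")
```

With these settings each input window spans 80 minutes of history, which answers the reviewer's question about how "eight historical periods" relates to clock time.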

Point 5: Legend of Figure 9 is not clear enough.

Response 5: We replaced the old picture with a new clear picture.

The problem is the missing details in the legend! Authors should clearly define the meaning of the different colors in the picture, e.g., does yellow denote roads with moderate congestion?

New Response 5: First of all, we are very sorry that we did not understand the reviewer's opinion correctly. Secondly, we have added the missing legend information to the picture, along with an explanatory statement in the text: “The assessment outcomes satisfy the true condition as illustrated in Figure 9. In the map, green represents basic unblocked traffic, orange shows moderate congestion, and red means serious congestion. Seen from the image taken by surveillance cameras, the traffic evaluation accurately reflects the road traffic congestion at that time.”

Author Response File: Author Response.docx

Round 3

Reviewer 2 Report

I repeat that the paper needs serious revision before acceptance. Although there is some progress with regard to the legend of Fig. 9, the phrase “clustering clusters” still remains, which is nonsense… Besides, some of the responses in the cover letter are not included in the paper text.

I repeat that English language needs serious revision by a native speaker! From the responses it became clear that the authors did not understand my recommendations because of poor knowledge of English.

Author Response

Dear editor and reviewers:

Point 1: I repeat that the paper needs serious revision before acceptance. Although there is some progress with regard to the legend of Fig. 9, the phrase “clustering clusters” still remains, which is nonsense… Besides, some of the responses in the cover letter are not included in the paper text.

Response 1: Following the reviewer's suggestion, we changed the remaining “clustering clusters” to “clusters”, as stated in the cover letter. We have also made sure that all responses in the cover letter are included in the paper.

Point 2: I repeat that English language needs serious revision by a native speaker! From the responses it became clear that the authors did not understand my recommendations because of poor knowledge of English.

Response 2: We asked a native speaker to help carefully revise the English of the full text in the last two rounds, but we are sorry that our changes did not meet your requirements. In this round we paid for the English editing services provided by MDPI. After our manuscript was edited, we received a certificate which says: “The text has been checked for correct use of grammar and common technical terms, and edited to a level suitable for reporting research in a scholarly journal. MDPI uses experienced, native English speaking editors.” The certificate can be found in the supplementary information.
