Article
Peer-Review Record

Forecasting Population Migration in Small Settlements Using Generative Models under Conditions of Data Scarcity

Smart Cities 2024, 7(5), 2495-2513; https://doi.org/10.3390/smartcities7050097
by Kirill Zakharov *, Albert Aghajanyan, Anton Kovantsev and Alexander Boukhanovsky
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 9 July 2024 / Revised: 22 August 2024 / Accepted: 28 August 2024 / Published: 3 September 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

I recommend this manuscript to be published provided that the following concerns are properly addressed:

1. The contributions of this paper should be emphasized in the introduction.

2. The literature review is quite confusing; it is suggested to add subheadings.

3. There are many types of existing prediction methods, such as autoregressive moving average model, BP, grey model, etc., which should be summarized in the literature review.

4. In Section 5, the predicted results should be compared with other prediction models to demonstrate the superiority of this model.

Author Response

[Comment 1] The contributions of this paper should be emphasized in the introduction.

[Response 1] Thank you for your suggestion. We have added the following contribution: we found that an ensemble of two models trained on the two synthetic datasets predicts the migration balance on real-world data better than a single regressor. The regression model in the ensemble forecasts the migration flow, and the binary classifier specifies its direction.
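
The division of labour described in this response can be illustrated with a minimal sketch. It is not taken from the manuscript: the function name `combine_forecasts`, the 0.5 threshold, and the convention that the regressor returns a non-negative flow size while the classifier returns a probability of net inflow are all assumptions made for illustration.

```python
from typing import List


def combine_forecasts(
    magnitudes: List[float],
    inflow_probs: List[float],
    threshold: float = 0.5,
) -> List[float]:
    """Merge a regressor's flow sizes with a classifier's direction.

    Assumption (hypothetical, not from the paper): `magnitudes` holds the
    predicted size of the migration flow, and `inflow_probs` holds the
    predicted probability that the balance is positive (net inflow).
    """
    if len(magnitudes) != len(inflow_probs):
        raise ValueError("both models must score the same settlements")

    signed_balance = []
    for size, prob in zip(magnitudes, inflow_probs):
        # The classifier decides the sign; the regressor decides the size.
        sign = 1.0 if prob >= threshold else -1.0
        signed_balance.append(sign * abs(size))
    return signed_balance


# Three hypothetical settlements: net inflow, net outflow, borderline case.
print(combine_forecasts([120.0, 35.0, 8.0], [0.9, 0.2, 0.5]))
```

Under these assumptions the second settlement receives a negative balance because its inflow probability falls below the threshold. The actual combination rule used by the authors may differ.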

[Comment 2] The literature review is quite confusing; it is suggested to add subheadings.

[Response 2] Thank you for your suggestion. We have added the subheadings to the literature review.

[Comment 3] There are many types of existing prediction methods, such as autoregressive moving average model, BP, grey model, etc., which should be summarized in the literature review.

[Response 3] Thank you for your suggestion. We have added the description of autoregressive approaches to Related Works.

[Comment 4] In Section 5, the predicted results should be compared with other prediction models to demonstrate the superiority of this model.

[Response 4] Thank you for your suggestion. We have added additional information to the Discussion section. There are indeed many approaches to forecasting. We fully agree that studying different forecasting approaches is of great interest for finding an optimal model for forecasting migration flows. However, our research focused on other aspects of the problem that are also relevant. We investigated whether a generative approach makes it possible to forecast migration flows in small towns with a model based on data from large cities.

We applied an adaptive prediction model (based on gradient boosting) to determine how good a forecast can be obtained from our data. Another important part of the research is the comparison of the hybrid approach to migration forecasting with a single regression model. The forecast error is already comparable to the accuracy of the source data, which is sufficient for the given task. Therefore, improved models have the potential to outperform the baseline model, but at this point they cannot be tested on our data set. Further, our estimate can serve as a basis for all other models.

 

 

Reviewer 2 Report

Comments and Suggestions for Authors

See attachment.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

Based on the provided text, the overall quality of English in the paper appears to be adequate for academic publication. Here are some specific observations:

1. The paper effectively uses technical terminology related to population migration, smart cities, and machine learning, demonstrating a good understanding of the field.

2. The paper presents its ideas and methods clearly, making it accessible to readers with a background in the relevant fields.

3. The paper is well-organized, with a clear structure and logical flow from introduction to conclusion.

There are also some areas for improvement:

1. Some sentences could be simplified to improve readability. For example, long sentences with multiple clauses can be broken down into shorter sentences.

2. While the paper uses technical terms appropriately, there are a few instances where less common words could be replaced with simpler alternatives to enhance clarity.

3. There are some minor grammatical errors and punctuation inconsistencies that could be corrected.

Overall, the English language quality of the paper is sufficient for academic publication. However, addressing the suggested areas for improvement would further enhance its readability and clarity.

Author Response

[Comment 1] While the paper proposes using generative models for synthetic data generation, the models employed (CTGAN, TVAE) are not the most recent advancements. I recommend exploring more advanced generative models such as Diffusion Models or Normalizing Flows to generate higher-quality synthetic data.

[Response 1] Yes, the applied models are not the most advanced, but they are popular, widely used, and more lightweight than diffusion models. Since this study examines the basic applicability of data generation to this problem, we consider the use of these approaches justified. Future research should certainly evaluate the most advanced generative approaches, which will most likely only increase the overall effectiveness of migration forecasting. We have added this comment to the Discussion section.

[Comment 2] The paper primarily utilizes MAE and MSE as evaluation metrics. Consider incorporating additional metrics like R^2 or Mean Absolute Percentage Error (MAPE) to provide a more comprehensive assessment of model performance.

[Response 2] Thank you for your suggestion. We have added the following comment. Relative metrics such as Mean Absolute Percentage Error (MAPE) are not suitable for our task because the migration balance, measured as a number of migrants, may be close to zero, which usually distorts the evaluation. The symmetric MAPE (sMAPE) solves this problem only partially. We do not use the coefficient of determination because the available real data set is quite small, and questions may arise about the statistical significance of the obtained results.
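
The point about near-zero balances can be seen on a small made-up example. The sketch below uses the standard textbook definitions of MAE, MAPE, and sMAPE; the three balance values and forecasts are invented for illustration and do not come from the study's data.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the same units as the target (migrants)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction.

    Raises ZeroDivisionError when a true value is exactly zero, which is
    the extreme form of the distortion discussed above.
    """
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric MAPE as a fraction, bounded above by 2."""
    return sum(
        2 * abs(t - p) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


# Invented migration balances: two sizeable flows and one close to zero.
balances = [100.0, -50.0, 1.0]
forecasts = [90.0, -45.0, 3.0]

print("MAE  :", mae(balances, forecasts))
print("MAPE :", mape(balances, forecasts))
print("sMAPE:", smape(balances, forecasts))
```

Here the forecast for the third settlement misses by only two migrants, yet that single term contributes a 200% relative error and pushes MAPE to roughly 73%, while MAE stays below six migrants. sMAPE caps the same term at 100%, which reduces the distortion without removing it.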

[Comment 3] The paper primarily uses small town data for testing. I suggest testing the model with data from other regions to evaluate its generalization capability.

[Response 3] Thank you for your suggestion. We have added a comment to the Discussion section. We used data from towns in completely different regions of Russia. These regions differ greatly in terms of socio-economic development. This gives us reason to believe that the main ideas of the research could be successfully applied to forecasting migration flows in other places as well. The motivation for our study was the lack of data on migration balance in small settlements and the difficulties associated with obtaining them. Currently, the data collected are insufficient to correctly assess the generalization ability of the developed models. As new data are collected, we plan to do this in future studies.

[Comment 4] Certain sentences in the paper are structurally complex, making them difficult to understand. I recommend revising these sentences to make them more concise and clear.

[Response 4] Thank you for your suggestion. We have made some changes to the text in order to simplify it.

[Comment 5] It is recommended to check the reference formatting to ensure compliance with academic standards.

[Response 5] Thank you for your suggestion. We use the journal's reference format, which is applied automatically by LaTeX.

[Comment 6] Please provide an explanation of how missing values and outliers were handled during the data preprocessing stage.

[Response 6] We have added information about outliers to the data description. Missing values were handled during data curation.

[Comment 7] Some sentences could be simplified to improve readability. For example, long sentences with multiple clauses can be broken down into shorter sentences.

[Response 7] Thank you for your suggestion. We have made some changes to the text in order to simplify it.

[Comment 8] While the paper uses technical terms appropriately, there are a few instances where less common words could be replaced with simpler alternatives to enhance clarity.

[Response 8] Thank you for your suggestion. We have made some changes to the text in order to simplify it.

[Comment 9] There are some minor grammatical errors and punctuation inconsistencies that could be corrected.

[Response 9] Thank you for your suggestion. We have made some changes to the text in order to correct grammatical errors and punctuation.

Reviewer 3 Report

Comments and Suggestions for Authors

This paper presents a generative model to forecast population migration in small settlements. The organization of the work is adequate, even though I think there are some aspects to improve:

 

- English should be improved; there are times when reading becomes difficult.

Introduction

- “Smart City” concept is not a “vision”.

- Why did you consider the case for the population size below 100,000 people? It should be explained in the text.

- In the text you say that you increase the data for classification and regression tasks; however, you then only improve these data for regression tasks. The question is: why do you generate the data also with classification in mind?

Related Work

- A reference where synthetic data is used to filter instances should be added.

- You should look for related bibliography where generative techniques have been used for the problem of migratory flows; in the Related Works section there is nothing about it.

Initial dataset

- Perhaps you have set very strict conditions for collecting the data. Perhaps by changing the filters you could have collected information from more populations.

- The text states that 21 characteristics have been collected; however, Figure 1 shows far fewer. This should be better explained to avoid confusion.

- When you use SHAP to see which are the most important characteristics, what do you mean? The pull or push scenario?

- What is the limit for considering a city as large or small? Figure 2 shows these terms but does not explain them.

Problem formulation

- It is said that the regression results were not good and that is why it was decided to divide the methodology into two: classification and regression. It would be interesting to see what the metrics of those experiments were, including discussing why it is believed that they were not good. One of the main reasons may be that you only tested one regression model; if more had been tested, perhaps the results would have been better and the splitting of the problem would not have been necessary.

 

Comments on the Quality of English Language

- English should be improved; there are times when reading becomes difficult.

 

Author Response

[Comment 1] English should be improved; there are times when reading becomes difficult.

[Response 1] Thank you for your suggestion. We have made some changes to the text in order to simplify it and correct grammatical errors and punctuation.

[Comment 2] “Smart City” concept is not a “vision”.

[Response 2] Thank you for your suggestion. We have changed the description.

[Comment 3] Why did you consider the case for the population size below 100,000 people? It should be explained in the text.

[Response 3] Our research was motivated by an urban planning platform. For such a platform, small towns (under 100,000 people) are of great interest, because the goal is to see how certain policy developments might affect migration flows. We have added this information to the paper.

[Comment 4] In the text you say that you increase the data for classification and regression tasks, however, you then only improve these data for regression tasks. The question is: why do you generate the data also with classification in mind?

[Response 4] Thank you for your suggestion. We have revised the text to ensure clarity and avoid misunderstandings.

[Comment 5] A reference where synthetic data is used to filter instances should be added.

[Response 5] Thank you for your suggestion. We have added some bibliography.

[Comment 6] You should look for related bibliography where generative techniques have been used for the problem of migratory flows, in the Related Works section there is nothing about it.

[Response 6] Thank you for your suggestion. We have added the description and some bibliography.

[Comment 7] Perhaps you have set very strict conditions for collecting the data. Perhaps by changing the filters you could have collected information from more populations.

[Response 7] The source of the data is described at the beginning of Section 5.1. Unfortunately, the collection contains only large cities (over 100,000 people). So, the main problem is not our filters but the number of large cities (~180). Therefore, we only have ~180 (number of cities) * 12 (years) ≈ 2160 examples. We have added a description of the filters to the Introduction section.

[Comment 8] The text states that 21 characteristics have been collected, however, Figure 1 shows far fewer. This should be better explained to avoid confusion.

[Response 8] Yes, you are right. In the main source of data, we collected 21 characteristics of large cities. When it came to applying this model to small towns, we could not find all these characteristics for them. So, we were forced to reduce the number of features in order to use this model to predict migration flows in small towns. We have added an explanation to the "Initial dataset and feature selection" section.

[Comment 9] When you use SHAP to see which are the most important characteristics, what do you mean? the pull or push scenario?

[Response 9] The SHAP figures cover both scenarios. In our current research we did not separate inflow and outflow; we only separated the classification and regression tasks. In further research we plan to consider the scenarios separately. We have added this description to the figures.

[Comment 10] What is the limit for considering a city as large or small? Figure 2 shows these terms but does not explain them.

[Response 10] We have changed the caption of Figure 2: (a) population over 100 thousand people; (b) population below 100 thousand people.

[Comment 11] It is said that the regression results were not good and that is why it was decided to divide the methodology into two: classification and regression. It would be interesting to see what the metrics of those experiments were, including discussing why it is believed that they were not good. One of the main reasons may be because you only tested one regression model, if more had been tested perhaps the results would have been better and the splitting of the problem would not have been necessary.

[Response 11] Your concerns are valid and understandable. From our point of view, the advantage of the hybrid model over the single-model approach is due to the fact that we are dealing with processes of a slightly different nature. This is also justified by the push and pull theory. Our recent experiments show that the same features have completely different levels of importance when it comes to predicting inflow, outflow, or even direction (classifier). We believe this is the main reason why a hybrid model could be a better solution. In fact, we are now moving towards a hybrid approach that combines three models, and the results are quite promising.
We have added our response and a detailed description to the Discussion section.

Reviewer 4 Report

Comments and Suggestions for Authors

The work proposes and compares methodologies for data balancing procedures in migration analysis and prediction problems, comparing several methods based on machine learning for creating synthetic samples to perform data balancing. In general, I see a relevant contribution in the work, the methods are appropriate and are at the frontier of knowledge in machine learning and the applications of the method are useful in the context of urban and demographic planning and in the analysis of smart cities, and so within the context of the article.

But I believe that the work could benefit from a minor revision before possible publication. In particular, I found the description of the data used in the study to be deficient, and a table with all the variables used and descriptive statistics would have been useful. As the main objective is to analyze data balancing methodologies using synthetic samples, it is essential to describe the data used.

The abstract of the article also needs to be improved, summarizing the original method contributions and the main results.

A review of the text with the help of a native English reader would also be useful for the text.

In terms of structure, I would recommend combining some sections by creating a methodology section, and placing this section before the data section.

Comments on the Quality of English Language

A review of the text with the help of a native English reader would also be useful for the text.

Author Response

[Comment 1] I found the description of the data used in the study to be deficient, and a table with all the variables used and descriptive statistics would have been useful. As the main objective is to analyze data balancing methodologies using synthetic samples, it is essential to describe the data used.

[Response 1] Thank you for your suggestion. We have added Table 1, which includes the key statistics of the small dataset attributes.

[Comment 2] The abstract of the article also needs to be improved, summarizing the original method contributions and the main results.

[Response 2] Thank you for your suggestion. We have improved the abstract by clarifying our contribution.

[Comment 3] A review of the text with the help of a native English reader would also be useful for the text.

[Response 3] Thank you for your suggestion. We have made some changes to the text in order to simplify it and correct grammatical errors and punctuation.

[Comment 4] In terms of structure, I would recommend combining some sections by creating a methodology section, and placing this section before the data section.

[Response 4] Thank you for your suggestion. We have separated out the problem formulation section and added a methodology section.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

This paper forecasts the population migration in small settlements using generative models in conditions of data scarcity. The topic of this article is attractive and interesting. However, there are still the following shortcomings:

1. The introduction should emphasize the contributions of this article.

2. Currently, there are many prediction methods used for population forecasting, but a comprehensive summary was not provided in Section 2.

3. The prediction results of the method proposed in this article should be compared with other prediction models (such as grey model, BP, ARIMA) to highlight its superiority.

Author Response

Dear Reviewer, Thank you for taking the time to review our submission. We appreciate your feedback and have taken your comments into consideration. We have addressed each of your points below. 

Comment 1: The introduction should emphasize the contributions of this article

Response 1: Thank you for your suggestion. We have added the contribution in the Introduction section (page 2, lines 65-81).


Comment 2: Currently, there are many prediction methods used for population forecasting, but a comprehensive summary was not provided in Section 2

Response 2: Thank you for your suggestion. We have added a description of forecasting methods. It is also worth noting that our study focuses on forecasting the migration balance rather than on population forecasting (page 3).


Comment 3: The prediction results of the method proposed in this article should be compared with other prediction models (such as grey model, BP, ARIMA) to highlight its superiority.

Response 3: Thank you for your suggestion. We have added a comparison with several models on different kinds of datasets, providing statistically significant results (Table 4).

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have revised the paper following my previous concerns, and it now satisfies the journal's requirements. Therefore, I suggest accepting the paper for publication in this version.

Comments on the Quality of English Language

The quality of the English is fine in this version.

Author Response

Dear Reviewer, Thank you for taking the time to review our submission. We appreciate your feedback. 

Round 3

Reviewer 1 Report

Comments and Suggestions for Authors

I think this version can be published.
