Article
Peer-Review Record

Study on the Risk of Urban Population Exposure to Waterlogging in Huang-Huai Area Based on Machine Learning Simulation Analysis—A Case Study of Xuzhou Urban Area

by Shuai Tong 1, Jiuxin Wang 2, Jiahui Qin 3, Xiang Ji 1,4,* and Zihan Wu 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 15 March 2025 / Revised: 20 April 2025 / Accepted: 24 April 2025 / Published: 25 April 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Thank you for submitting your manuscript to Land.

 

Introduction

1)The article introduces a comparison between machine learning and traditional methods, as well as specific research applications. However, there is a lack of smooth transition from theoretical discussion to practical case studies. The paragraph structure needs to be optimized to make the flow of ideas more coherent.

 

2)The article frequently mentions "machine learning methods" and "traditional hydrological simulation methods." These discussions could be presented more concisely. When discussing the application of machine learning, the article introduces several models and their outcomes but does not sufficiently elaborate on the advantages, disadvantages, and applicable scenarios of each method. Additionally, more background information is needed to support the research questions and conclusions.

 

3)The introduction involves complex technical background; please ensure that non-expert readers can also understand the working principles and applications of machine learning models.

 

4)The article mentions the advantages of machine learning in handling large-scale data. Please add specific experimental data or case studies to demonstrate the limitations of traditional methods and the superiority of machine learning, making the content more convincing.

 

5)The article does not provide a clear enough explanation for why population, land, and buildings are considered indirect rather than direct factors in causing waterlogging. This explanation needs further elaboration to deepen the argument and clarify the roles these factors play in the occurrence of waterlogging.

 

6)When mentioning the concept of a "sponge city," it would be beneficial to further discuss how machine learning can be integrated with actual urban construction to enhance disaster prevention capabilities. This would help readers better understand the practical application and effectiveness of this concept. By explaining how machine learning can optimize urban planning, water management systems, and flood risk prediction in real-world scenarios, the concept of a sponge city can be made more tangible and its impact more evident.

 

Materials and data

1)When discussing the selection of disaster factors, it is important to clearly define the specific content of categories such as "meteorological factors" and "topographic factors" to ensure that readers can easily understand how these factors influence disasters. For example, meteorological factors could include rainfall intensity, wind speed, and temperature, while topographic factors might involve elevation, slope, and drainage patterns. Providing clear explanations and examples of how each factor contributes to disaster risk will help readers grasp the complex relationships between these variables and their impact on disaster occurrence.

2)The "artificial surface environment index" mentioned in the article should include more specific impacts of human activities, such as the effects of urban construction and land development.

3)A more detailed explanation of the interactions between different disaster factors, such as how meteorological factors and topographical factors work together to produce disasters, could be provided.

 

Methods

1)Collinearity analysis of disaster-inducing attribute data should be conducted to ensure the relative independence of each indicator.

2)In the comparison of machine learning models, the appropriate algorithm should be selected and evaluated based on metrics such as accuracy, F1 score, and recall.

3)It is necessary to use ArcGIS software to calculate kernel density and categorize flood risk levels based on the waterlogging results derived from the optimal algorithm in order to accurately assess and visualize areas of high risk, enabling more effective disaster management and decision-making.

4)Using nighttime light data and population data for regression analysis can help reflect population distribution density.

5)When calculating population distribution, it is necessary to choose the most suitable model based on different regression functions (such as linear, exponential, quadratic polynomial, and power functions) to ensure that the model accurately reflects the actual population distribution.

 

Discussion

1)Please further explain the significant advantages of the CatBoost model in capturing nonlinear relationships and feature processing, and how its accuracy of 81.67% validates its applicability in complex geographic environments.

2)The distribution of waterlogging points identified in the study, including urban built-up areas, suburban, and rural areas, reveals the spatial characteristics of waterlogging risk.

3)How does the distribution of waterlogging points identified in the study (including urban built-up areas, suburban, and rural areas) reveal the spatial characteristics of waterlogging risk?

4)How can the introduction of real-time meteorological monitoring data enhance the dynamic prediction ability of the model to improve the foresight of the early warning system?

5)Why should the "point-line-plane" integrated management strategy be prioritized for high-risk areas (such as M5N5 and M5N4 regions), and what specific measures should be implemented? How can community-level emergency response capacity be strengthened in medium-risk areas (such as M3N4 and M4N3), and what specific measures can be taken to improve response capabilities?

6)Please elaborate further on why it is necessary to incorporate rainfall and waterlogging risk assessments into the rigid constraints of land spatial planning and combine them with urban regeneration actions to gradually restore the natural hydrological cycle.

Author Response

Introduction

1)The article introduces a comparison between machine learning and traditional methods, as well as specific research applications. However, there is a lack of smooth transition from theoretical discussion to practical case studies. The paragraph structure needs to be optimized to make the flow of ideas more coherent.

Response: Following your suggestions, I have restructured the paragraphs of the introduction. However, because this is a research article, extensive case study content would not be appropriate, so only a brief discussion of case studies has been added.

2)The article frequently mentions "machine learning methods" and "traditional hydrological simulation methods." These discussions could be presented more concisely. When discussing the application of machine learning, the article introduces several models and their outcomes but does not sufficiently elaborate on the advantages, disadvantages, and applicable scenarios of each method. Additionally, more background information is needed to support the research questions and conclusions.

Response: Following the suggestions, I have shortened the comparison between traditional hydrological methods and machine learning methods, and added a discussion of the advantages, disadvantages, and applicable scenarios of the methods.

3)The introduction involves complex technical background; please ensure that non-expert readers can also understand the working principles and applications of machine learning models.

Response: The working principles are indeed not easy for non-expert readers to follow, but the limited space of this article makes a detailed explanation difficult. Non-expert readers should nevertheless be able to understand the analysis results and the comparisons between them.

4)The article mentions the advantages of machine learning in handling large-scale data. Please add specific experimental data or case studies to demonstrate the limitations of traditional methods and the superiority of machine learning, making the content more convincing.

Response: The advantages of machine learning are obvious, and I do not think more experimental data or case illustrations are needed. For instance, a simulation analysis of waterlogging at the city scale requires the area to be divided into dozens or even hundreds of catchment areas, each simulated and analyzed separately. Each catchment area must be modeled in detail in professional software (surface system, river channel system, pipe network system, rainfall data input) and then run to obtain results, so the process is repeated dozens or even hundreds of times. With machine learning, only the result data and indicator data of several catchment areas need to be fed to the model, which learns the features and outputs results for the other areas, greatly reducing the workload and time of analysis and simulation.

5)The article does not provide a clear enough explanation for why population, land, and buildings are considered indirect rather than direct factors in causing waterlogging. This explanation needs further elaboration to deepen the argument and clarify the roles these factors play in the occurrence of waterlogging.

Response: When urban flooding occurs, population, land, and buildings act mainly as disaster-bearing bodies. This article emphasizes the research paradigm of "disaster-causing factors - disaster-bearing bodies", in which population, land, and buildings are generally classified as disaster-bearing bodies.

6)When mentioning the concept of a "sponge city," it would be beneficial to further discuss how machine learning can be integrated with actual urban construction to enhance disaster prevention capabilities. This would help readers better understand the practical application and effectiveness of this concept. By explaining how machine learning can optimize urban planning, water management systems, and flood risk prediction in real-world scenarios, the concept of a sponge city can be made more tangible and its impact more evident.

Response: Your suggestions are well targeted, and this discussion is valuable. In the discussion section, I have explored how the construction of sponge cities can be combined with machine learning and Internet of Things technology.

Materials and data

1)When discussing the selection of disaster factors, it is important to clearly define the specific content of categories such as "meteorological factors" and "topographic factors" to ensure that readers can easily understand how these factors influence disasters. For example, meteorological factors could include rainfall intensity, wind speed, and temperature, while topographic factors might involve elevation, slope, and drainage patterns. Providing clear explanations and examples of how each factor contributes to disaster risk will help readers grasp the complex relationships between these variables and their impact on disaster occurrence.

Response: Section 2.2 (indicator selection) already briefly explains how each factor influences waterlogging, and it also explains why each factor was selected.

2)The "artificial surface environment index" mentioned in the article should include more specific impacts of human activities, such as the effects of urban construction and land development.

Response: Rainfall-induced flooding forms through rainfall, infiltration, runoff, and convergence. Urban construction and land development affect three of these stages: infiltration, runoff, and convergence. The specific effects appear in subsurface infiltration and soil moisture. Urban construction and land development are too strongly collinear with land use, so these indicators should not enter the machine learning analysis together. Therefore, I chose land use, road network density, and railway network density as the indicators of the artificial surface environment index.

3)A more detailed explanation of the interactions between different disaster factors, such as how meteorological factors and topographical factors work together to produce disasters, could be provided.

Response: When selecting these indicators, I discussed the mechanism by which each factor individually contributes to urban flooding. The mechanism of their interaction has now been rewritten in the discussion section.

Methods

  • Collinearity analysis of disaster-inducing attribute data should be conducted to ensure the relative independence of each indicator.

Response: I agree with you. The article performs this collinearity analysis.
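As an illustration of the kind of check discussed in this comment, the sketch below screens indicator pairs by their Pearson correlation. It is only a minimal example under stated assumptions: the record does not say which collinearity test the authors used (a variance inflation factor is another common choice), and the indicator names, the sample values, and the 0.8 threshold are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def collinear_pairs(indicators, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(sorted(indicators), 2):
        r = pearson(indicators[a], indicators[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical indicator values at five sample points (not data from the study).
indicators = {
    "elevation": [30.0, 32.0, 35.0, 41.0, 45.0],
    "slope": [3.0, 1.0, 4.0, 2.0, 5.0],
    "road_density": [6.1, 6.5, 7.0, 8.3, 9.0],
}

# Only the strongly correlated pair is reported.
print(collinear_pairs(indicators))
```

A pair flagged this way would be a candidate for dropping or merging one of the two indicators before model training.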

  • In the comparison of machine learning models, the appropriate algorithm should be selected and evaluated based on metrics such as accuracy, F1 score, and recall.

Response: I agree with you. This article does indeed conduct a comparative discussion of the model based on these indicators.
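For readers less familiar with the metrics named in this comment, the following sketch computes accuracy, precision, recall, and F1 score from true and predicted labels for a binary waterlogged / not waterlogged task. The labels are invented for illustration and are not results from the manuscript.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = waterlogging point, 0 = no waterlogging.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Recall matters most when missing a real waterlogging point is costlier than a false alarm, which is why it is usually reported alongside accuracy rather than replaced by it.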

  • It is necessary to use ArcGIS software to calculate kernel density and categorize flood risk levels based on the waterlogging results derived from the optimal algorithm in order to accurately assess and visualize areas of high risk, enabling more effective disaster management and decision-making.

Response: Thank you for your recognition.
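The kernel density step in this comment can be sketched outside ArcGIS as follows. This is an illustrative approximation only: it assumes a quartic (biweight) kernel, and the search radius, the point coordinates, and the class break values are hypothetical. The kernel, bandwidth, and classification scheme the authors actually used are not stated in this record.

```python
import math


def quartic_kernel_density(query, points, radius):
    """Density at `query` from `points` using a quartic (biweight) kernel.

    Points at or beyond `radius` from the query location contribute nothing.
    """
    qx, qy = query
    total = 0.0
    for px, py in points:
        d2 = (px - qx) ** 2 + (py - qy) ** 2
        if d2 < radius ** 2:
            total += (3.0 / math.pi) * (1.0 - d2 / radius ** 2) ** 2
    return total / radius ** 2


def classify(value, breaks):
    """Return a 1-based level: one plus the number of breaks `value` exceeds."""
    return 1 + sum(1 for b in breaks if value > b)


# Hypothetical predicted waterlogging points (arbitrary map units).
waterlogging_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
# Hypothetical break values separating five density levels.
breaks = [0.1, 0.2, 0.3, 0.4]

dense = quartic_kernel_density((0.0, 0.0), waterlogging_points, radius=2.0)
sparse = quartic_kernel_density((5.0, 5.0), waterlogging_points, radius=2.0)
print(classify(dense, breaks), classify(sparse, breaks))
```

Evaluating the density on a regular grid and classifying every cell would give a raster of density levels comparable in spirit to the ArcGIS output.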

  • Using nighttime light data and population data for regression analysis can help reflect population distribution density.

Response: Thank you for your recognition.

  • When calculating population distribution, it is necessary to choose the most suitable model based on different regression functions (such as linear, exponential, quadratic polynomial, and power functions) to ensure that the model accurately reflects the actual population distribution.

Response: Thank you for your recognition.
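The model-selection step in this comment (choosing among regression functions for nighttime light versus population) can be sketched as below. This is a simplified illustration: it compares linear, exponential, and power forms by their coefficient of determination on the original scale and leaves out the quadratic polynomial for brevity. The variable names and the synthetic data are assumptions, not the authors' data or procedure.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat against y."""
    my = sum(y) / len(y)
    ss_res = sum((yi - hi) ** 2 for yi, hi in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def compare_forms(x, y):
    """Fit three functional forms (x and y must be positive); return R^2 each."""
    scores = {}
    a, b = fit_line(x, y)
    scores["linear"] = r_squared(y, [a + b * xi for xi in x])
    # Exponential y = A * exp(B * x), fitted as ln y = ln A + B * x.
    a, b = fit_line(x, [math.log(yi) for yi in y])
    scores["exponential"] = r_squared(y, [math.exp(a + b * xi) for xi in x])
    # Power y = A * x ** B, fitted as ln y = ln A + B * ln x.
    a, b = fit_line([math.log(xi) for xi in x], [math.log(yi) for yi in y])
    scores["power"] = r_squared(y, [math.exp(a) * xi ** b for xi in x])
    return scores


# Synthetic example generated from a power law, so "power" should fit best.
light = [1.0, 2.0, 4.0, 8.0, 16.0]
population = [2.0 * v ** 1.5 for v in light]

scores = compare_forms(light, population)
best = max(scores, key=scores.get)
print(best, scores)
```

Scoring every form on the original scale keeps the comparison fair, since an R² computed in log space is not directly comparable with one computed on raw values.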

 

Discussion

  • Please further explain the significant advantages of the CatBoost model in capturing nonlinear relationships and feature processing, and how its accuracy of 81.67% validates its applicability in complex geographic environments.

Response: In fact, the indicators processed by the machine learning model in this paper are mainly linear features, but the analysis results are nonlinear. The accuracy was evaluated on the indicator system of this paper and is applicable to cities in the Huanghuai Plain, as demonstrated by the study of the main urban area of Xuzhou.

  • The distribution of waterlogging points identified in the study, including urban built-up areas, suburban, and rural areas, reveals the spatial characteristics of waterlogging risk.

Response: The elaboration of this part has been strengthened.

  • How does the distribution of waterlogging points identified in the study (including urban built-up areas, suburban, and rural areas) reveal the spatial characteristics of waterlogging risk?

Response: As you mentioned, this paper judges the severity of waterlogging in each area from the spatial distribution and density of waterlogging points, then couples it with population density to obtain the spatial distribution of risk.

  • How can the introduction of real-time meteorological monitoring data enhance the dynamic prediction ability of the model to improve the foresight of the early warning system?

Response: This is indeed worth discussing, because real-time data is an inevitable trend of future development. By connecting the analytical techniques of this study to Internet of Things (IoT) devices such as meteorological sensors, temperature and humidity sensors, remote sensing devices, and communication devices, the input data can be updated in real time so that the predictions change dynamically. This is not technically difficult, but it requires considerable investment in urban infrastructure.

  • Why should the "point-line-plane" integrated management strategy be prioritized for high-risk areas (such as M5N5 and M5N4 regions), and what specific measures should be implemented? How can community-level emergency response capacity be strengthened in medium-risk areas (such as M3N4 and M4N3), and what specific measures can be taken to improve response capabilities?

Response: High-risk areas are more vulnerable to rainfall and flooding and pose a greater threat to life, so systematic "point-line-plane" management should be prioritized there. Point-level measures focus on the fine-grained renovation of small and micro spaces, such as low-lying nodes, building ancillary facilities, and storage and regulation facilities in old residential areas; rain gardens, permeable pavements, green roofs, and sunken tree pits can be installed. Line-level measures focus on improving the drainage system, including rivers, waterways, and drainage corridors, for example by repairing and widening rivers, upgrading rainwater pipe networks, and laying drainage ditches along roadsides. Plane-level measures focus on the ecological restoration of territorial space and the zonal control of sponge cities. For instance, rivers and lakes should be connected through river networks and canals to enhance regulation capacity, and sponge city zoning should strengthen requirements for infiltration and water storage indicators as well as development restrictions.

 

  • Please elaborate further on why it is necessary to incorporate rainfall and waterlogging risk assessments into the rigid constraints of land spatial planning and combine them with urban regeneration actions to gradually restore the natural hydrological cycle.

Response: Countries such as Japan and the Netherlands have incorporated flood risk into their territorial spatial planning, and their disaster prevention planning holds the same high status as territorial spatial planning. Chinese documents such as the "National Spatial Planning Law" and the "Guidelines for Sponge City Construction" clearly state that flood risk assessment must be strengthened. Rigid constraints are a concrete way of implementing the national disaster prevention and mitigation strategy. Although China currently evaluates resource and environmental carrying capacity and the suitability of territorial space development, and these evaluations include flood disaster assessment, the rigid constraints on urban waterlogging remain too weak. Old urban areas, which were developed earlier, are low-lying and prone to urban flooding, so addressing flood prevention during urban renewal can achieve multiple benefits at once.

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript explores the exposure of urban populations to urban flooding risk in the Huanghuai region of China, using the main urban area of Xuzhou City as a case study, using machine learning simulation and analysis methods to predict flooding points, and assessing the relationship between flooding risk and population exposure. I think there are some innovations in this article, but it needs to be revised before publication. My specific comments are as follows:
1. The introduction of the article describes the impact of climate change on flooding risk, and some additional data on projections of future climate change scenarios could be considered to enhance the foresight and urgency of the study.
2. The introduction presents the background of the increase in extreme rainfall events due to climate change and the research progress on the use of machine learning, but it is not comprehensive. It should include urbanization, climate change, land use, and ecology. It is suggested to refer to the following references to enhance the comprehensiveness:
- Machine learning in modelling the urban thermal field variance index and assessing the impacts of urban land expansion on seasonal thermal environment
- Forecast Urban Ecosystem Services to Track Climate Change: Combining Machine Learning and Emergy Spatial Analysis
3. When discussing the model performance, the article should explain some details about the model training and validation process, such as the division ratio between the training and validation sets and the hyper-parameter tuning of the model. This will help the reader to better understand the performance and reliability of the model.
4. The article mentions that “high-risk areas have higher population densities”, but it lacks elaboration on the specific characteristics and causes of these areas. For example, the topography, land use type, etc. of these areas to enhance the interpretability of the results.
5. When discussing model limitations there is a need to explain how these limitations can be overcome. For example, can the accuracy of the predictions be improved by introducing more real-time data or improving the model structure?
6. The article suggests “incorporating rainfall and flooding risk assessment into the rigid constraints of territorial spatial planning”, but does not provide details on how to achieve this goal.

Author Response

The manuscript explores the exposure of urban populations to urban flooding risk in the Huanghuai region of China, using the main urban area of Xuzhou City as a case study, using machine learning simulation and analysis methods to predict flooding points, and assessing the relationship between flooding risk and population exposure. I think there are some innovations in this article, but it needs to be revised before publication. My specific comments are as follows:
1. The introduction of the article describes the impact of climate change on flooding risk, and some additional data on projections of future climate change scenarios could be considered to enhance the foresight and urgency of the study.

 

Response: I am extremely grateful for your professional suggestions. I have strengthened the discussion in this part in the introduction section.


2. The introduction presents the background of the increase in extreme rainfall events due to climate change and the research progress on the use of machine learning, but it is not comprehensive. It should include urbanization, climate change, land use, and ecology. It is suggested to refer to the following references to enhance the comprehensiveness:
- Machine learning in modelling the urban thermal field variance index and assessing the impacts of urban land expansion on seasonal thermal environment
- Forecast Urban Ecosystem Services to Track Climate Change: Combining Machine Learning and Emergy Spatial Analysis

Response: The formation of urban flooding is indeed the result of multiple influences, including urbanization, climate change, land use and ecology. The local microclimate formed by urban heat islands can also affect rainfall and urban flooding. Your suggestions for article revision have been accepted.


3. When discussing the model performance, the article should explain some details about the model training and validation process, such as the division ratio between the training and validation sets and the hyper-parameter tuning of the model. This will help the reader to better understand the performance and reliability of the model.

 

Response: The division into training and validation sets is introduced in the data processing section. I will strengthen its description in the technical route section and in the confusion matrix discussion.
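To make the training and validation split in this exchange concrete, the sketch below shows one reproducible way to divide labeled samples while keeping class proportions similar in both sets. The 80/20 ratio, the random seed, and the sample labels are hypothetical, since the ratio actually used in the manuscript is not given in this record.

```python
import random


def stratified_split(labels, validation_fraction=0.2, seed=42):
    """Return (train_idx, val_idx), sampling each class separately."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        n_val = round(len(idx) * validation_fraction)
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Hypothetical sample: 30 waterlogging points (1) and 70 non-waterlogging (0).
labels = [1] * 30 + [0] * 70
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

Repeating such a split with different seeds, or rotating the validation part across k folds, is a common way to check that a reported accuracy does not depend on one particular partition.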


4. The article mentions that “high-risk areas have higher population densities”, but it lacks elaboration on the specific characteristics and causes of these areas. For example, the topography, land use type, etc. of these areas to enhance the interpretability of the results.

 

Response: Your suggestion is well targeted. I have explained why "the population density is higher in high-risk areas" from the data characteristics of this region. First, population density in the old urban area is higher than in the new urban area and the rural outskirts. Second, because the old urban area was built earlier, its coverage of drainage pipe facilities is low. Moreover, its building density is high and its green space ratio is low, so surface runoff and catchment volume during rainfall are relatively large, which makes urban flooding likely. Together these produce the high exposure observed in high-risk areas.


5. When discussing model limitations there is a need to explain how these limitations can be overcome. For example, can the accuracy of the predictions be improved by introducing more real-time data or improving the model structure?

 

Response: This is indeed worth discussing, because real-time data is an inevitable trend of future development. By connecting the analytical techniques of this study to Internet of Things (IoT) devices such as meteorological sensors, temperature and humidity sensors, remote sensing devices, and communication devices, the input data can be updated in real time so that the predictions change dynamically. This is not technically difficult, but it requires considerable investment in urban infrastructure, and it would greatly help to improve prediction accuracy. Because of limited funding and equipment, the present study stops at this step. Further articles on real-time data and models will follow.


6. The article suggests “incorporating rainfall and flooding risk assessment into the rigid constraints of territorial spatial planning”, but does not provide details on how to achieve this goal.

 

Response: I have added an overview of this part. Specific measures for rigid constraints include giving the risk map legal status or integrating the risk analysis process with the demarcation of the "three zones and three lines". For instance, the area around Jiuli Lake Wetland can be designated as an ecological red line zone in which land reclamation from the lake is prohibited. At the level of detailed regulatory planning, indicators such as the total runoff control rate and the proportion of permeable area should be set.

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript "Study on the risk of urban population exposure to waterlogging in Huang-Huai area based on machine learning simulation analysis--A case study of Xuzhou urban area" (land-3559100) presents a solid and innovative approach to assessing population risk in the face of urban flooding, with strong technical foundations and great potential for application. With the proposed improvements, it could become an important reference at the intersection of urban planning, hydrology, and data science. However, before recommending the manuscript for publication, the authors must improve several aspects of the present study. Therefore, I am recommending this work for major revisions.

 

As major observations, which must be attended to, I highlight:

1 – I noticed some grammatical errors in writing, therefore, I suggest the revision of English by a native speaker.

2 – Authors must reformulate the abstract. Note that you are presenting an abstract with 249 words, and Land limits an abstract to 200 words. I also highlight that authors must follow the premise of presenting the highlights of the results in the abstract, something that I did not observe in this summary.

3 – The introduction is well elaborated in terms of contextualizing the problem of urban flooding and the growing relevance of climate change. However, the following suggestions may further strengthen the text:

  • The literature review is extensive, but it lacks clarity regarding which aspects have not yet been sufficiently explored and how this study advances the field.
  • Some paragraphs repeat similar ideas about the potential of AI and remote sensing. The introduction could be more concise.

4 – The use of five machine learning models is interesting, but it lacks justification for why these specific algorithms were chosen.

5 – The transformation of categorical data (such as land use) into dummy variables is mentioned, but it would be useful to detail which categories were used and how they were numerically represented.

6 – The division into training and validation sets is described, but there is no mention of the use of cross-validation or techniques to prevent overfitting.

7 – The use of nighttime light data is innovative. The regression for population estimation is interesting, but a scatter plot could be included to better visualize the fit.

8 – The analysis of variable weights across different models is insightful, but a more critical discussion of the implications of these results would be beneficial.

9 – The superiority of CatBoost is well demonstrated, but statistical tests (e.g., t-test or ANOVA) are missing to evaluate whether the difference in performance compared to the other models is statistically significant.

10 – The discussion on applications in urban planning is relevant, but it could be more specific regarding how the data could be used by decision-makers.

11 – The limitation regarding reliance on historical data is mentioned, but other methodological limitations (e.g., data resolution, lack of real-time data) could be more explicitly stated.

12 – The conclusion is comprehensive and clearly highlights the study's contributions. Some recommendations:

  • Emphasize the innovation in using CatBoost with spatial and nighttime illumination data.
  • Suggest incorporating real-time climate data, deep neural networks, or integration with IoT for real-time monitoring.
  • Indicate ways to translate the model into accessible tools for decision-makers.

 

As minor observations, I highlight:

1 – Use the Mendeley Reference Manager for both references and citations, as neither is formatted consistently to Land standards throughout the manuscript.

2 – Figure 1 must have larger font sizes for all information presented.

3 – Line 126: “Figure 1.”

4 – Figure 2 must also present larger font sizes for all information displayed.

5 – The same observation applies to Figures 3, 4, 5, 10, and 11.

6 – Sections 2 and 3 should be merged into a single section titled “Materials and Methods.”

7 – Line 305: “Figure 6. Methodological flowchart of data processing steps.”

8 – The discussion section should not be included in the conclusion. Please create a separate discussion section based on the results presented in the previous section.

Author Response

The manuscript "Study on the risk of urban population exposure to waterlogging in Huang-Huai area based on machine learning simulation analysis--A case study of Xuzhou urban area" (land-3559100) presents a solid and innovative approach to assessing population risk in the face of urban flooding, with strong technical foundations and great potential for application. With the proposed improvements, it could become an important reference at the intersection of urban planning, hydrology, and data science. However, before recommending the manuscript for publication, the authors must improve several aspects of the present study. Therefore, I am recommending this work for major revisions.

 

As major observations, which must be attended to, I highlight:

1 – I noticed some grammatical errors in writing, therefore, I suggest the revision of English by a native speaker.

Response: Thank you for your suggestion. Our team has polished and revised the language.

2 – Authors must reformulate the abstract. Note that you are presenting an abstract with 249 words, and Land limits an abstract to 200 words. I also highlight that authors must follow the premise of presenting the highlights of the results in the abstract, something that I did not observe in this summary.

Response: I have reduced the abstract to 200 words and strengthened its presentation of the key results.

3 – The introduction is well elaborated in terms of contextualizing the problem of urban flooding and the growing relevance of climate change. However, the following suggestions may further strengthen the text:

  • The literature review is extensive, but it lacks clarity regarding which aspects have not yet been sufficiently explored and how this study advances the field. Some paragraphs repeat similar ideas about the potential of AI and remote sensing. The introduction could be more concise.

Response: I have re-summarized the gaps in current research and stated how this study addresses them. I have also condensed or deleted the repetitive passages.

4 – The use of five machine learning models is interesting, but it lacks justification for why these specific algorithms were chosen.

Response: The collinearity test shows that, although collinearity among the indicators is not high, low to moderate collinearity may still affect machine learning results. We therefore restricted the selection to decision-tree-based models, which do not impose strict requirements on collinearity. The five models used in this article are the current mainstream decision-tree-based models. I have rewritten the rationale for this choice in the manuscript.

5 – The transformation of categorical data (such as land use) into dummy variables is mentioned, but it would be useful to detail which categories were used and how they were numerically represented.

Response: The codes for land use data are 1 (bare land), 2 (forest land), 3 (shrub), 4 (grassland), 5 (cultivated land), 6 (water area), and 7 (construction land). These codes are nominal: the values 1 to 7 carry no quantitative meaning. The variable therefore needs to be transformed into dummy variables.

Suppose 7 (construction land) is selected as the reference category, then 6 dummy variables corresponding to categories 1 to 6 will be generated. The following table shows the transformation process of dummy variables. This process is relatively simple, so it is only briefly described in the main text.

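Dummy variables by land-use code (the six columns are the dummy variables; code 7 is the reference category):

| Code | 1: Bare ground | 2: Woodland | 3: Undergrowth | 4: Grassland | 5: Cultivated lands | 6: Rivers and lakes |
|------|----------------|-------------|----------------|--------------|---------------------|---------------------|
| 1    | 1              | 0           | 0              | 0            | 0                   | 0                   |
| 2    | 0              | 1           | 0              | 0            | 0                   | 0                   |
| 3    | 0              | 0           | 1              | 0            | 0                   | 0                   |
| 4    | 0              | 0           | 0              | 1            | 0                   | 0                   |
| 5    | 0              | 0           | 0              | 0            | 1                   | 0                   |
| 6    | 0              | 0           | 0              | 0            | 0                   | 1                   |
| 7    | 0              | 0           | 0              | 0            | 0                   | 0                   |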
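The encoding described in this response can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `to_dummies` and the sample values are hypothetical, and only the seven land-use codes and the choice of code 7 as the reference category come from the response.

```python
# Land-use codes as listed in the response; 7 (construction land) is the reference.
LAND_USE_CODES = [1, 2, 3, 4, 5, 6, 7]
REFERENCE_CODE = 7


def to_dummies(code, codes=LAND_USE_CODES, reference=REFERENCE_CODE):
    """Return one 0/1 indicator per non-reference category (hypothetical helper)."""
    if code not in codes:
        raise ValueError(f"unknown land-use code: {code}")
    return [1 if code == c else 0 for c in codes if c != reference]


# Hypothetical sample of grid-cell land-use codes, for illustration only.
sample = [5, 7, 2]
encoded = [to_dummies(c) for c in sample]
for code, row in zip(sample, encoded):
    print(code, row)
```

The reference category maps to a row of zeros, so six columns are enough to distinguish seven categories without implying any ordering among them.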

 

6 – The division into training and validation sets is described, but there is no mention of the use of cross-validation or techniques to prevent overfitting.

Response: Overfitting manifests as a model that performs well on the training set but whose performance drops sharply on the test set. In this paper, the accuracy rates on the test set and the validation set are close. The accuracy rates of the five machine learning models all fall between 76% and 81.67%, so overfitting is unlikely. The confusion matrix results also show no sign of overfitting.
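For readers unfamiliar with the check the reviewer asks about, the following is a minimal sketch of k-fold cross-validation that reports the gap between training and validation accuracy per fold. It is not the procedure used in the manuscript: the data are synthetic, the threshold classifier is a hypothetical stand-in for the tree-based models, and all function names are illustrative.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Return one (train_accuracy, validation_accuracy) pair per fold."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        train_acc = accuracy([ys[j] for j in train_idx],
                             [predict(model, xs[j]) for j in train_idx])
        val_acc = accuracy([ys[j] for j in val_idx],
                           [predict(model, xs[j]) for j in val_idx])
        scores.append((train_acc, val_acc))
    return scores


# Hypothetical stand-in model: a one-feature threshold classifier (not CatBoost).
def fit_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Synthetic data for illustration only: label 1 when the feature exceeds 0.5.
rng = random.Random(42)
xs = [rng.random() for _ in range(200)]
ys = [1 if x > 0.5 else 0 for x in xs]

scores = cross_validate(xs, ys, fit_threshold, predict_threshold, k=5)
for train_acc, val_acc in scores:
    print(f"train={train_acc:.3f}  validation={val_acc:.3f}  gap={train_acc - val_acc:.3f}")
```

A consistently large positive gap across folds would suggest overfitting; similar training and validation accuracy, as the response reports, would not.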

7 – The use of nighttime light data is innovative. The regression for population estimation is interesting, but a scatter plot could be included to better visualize the fit.

Response: The scatter plot has been modified.

8 – The analysis of variable weights across different models is insightful, but a more critical discussion of the implications of these results would be beneficial.

Response: I agree with you. A critical discussion of the result has already been added.

 

9 – The superiority of CatBoost is well demonstrated, but statistical tests (e.g., t-test or ANOVA) are missing to evaluate whether the difference in performance compared to the other models is statistically significant.

Response: The assessment of rainwater accumulation points is a binary classification task, for which t-tests or analysis of variance are difficult to apply. Binary classification models are mainly compared on their accuracy rates and confusion matrix results, which may be more meaningful.
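The comparison the response describes, based on accuracy and the confusion matrix, can be sketched as follows. The labels and the two sets of predictions are hypothetical and are not results from the manuscript; the convention that 1 marks a rainwater accumulation point is an assumption made for the example.

```python
def confusion_matrix(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels; 1 = accumulation point (assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Derive accuracy, precision, and recall from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical test labels and predictions from two models, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
model_a = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
model_b = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    print(name, confusion_matrix(y_true, preds), summarize(y_true, preds))
```

Reporting precision and recall alongside accuracy shows whether a model's errors are mostly missed accumulation points or false alarms, which accuracy alone does not reveal.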

10 – The discussion on applications in urban planning is relevant, but it could be more specific regarding how the data could be used by decision-makers.

Response: Based on the distribution of water accumulation characteristics and the map of population risk exposure, urban planning managers can carry out targeted governance. Populations with high risk exposure can also be relocated or evacuated between extreme rainfall events.

11 – The limitation regarding reliance on historical data is mentioned, but other methodological limitations (e.g., data resolution, lack of real-time data) could be more explicitly stated.

Response: I have added the other limitations you mentioned to the conclusion of the main text.

12 – The conclusion is comprehensive and clearly highlights the study's contributions. Some recommendations:

  • Emphasize the innovation in using CatBoost with spatial and nighttime illumination data.
  • Suggest incorporating real-time climate data, deep neural networks, or integration with IoT for real-time monitoring.
  • Indicate ways to translate the model into accessible tools for decision-makers.

As minor observations, I highlight:

1 – Use the Mendeley Reference Manager for both references and citations, as neither follows the Land standard consistently throughout the manuscript.

Response: The format of the references has been modified.

2 – Figure 1 must have larger font sizes for all information presented.

Response: The font size of the figure has been modified.

3 – Line 126: “Figure 1.”

Response: The font size of the figure has been modified.

4 – Figure 2 must also present larger font sizes for all information displayed.

Response: The font size of the figure has been modified.

5 – The same observation applies to Figures 3, 4, 5, 10, and 11.

Response: The font size of the picture has been modified. At the same time, the picture has been enlarged and placed at the top.

6 – Sections 2 and 3 should be merged into a single section titled “Materials and Methods.”

Response: Sections 2 and 3 have been merged.

7 – Line 305: “Figure 6. Methodological flowchart of data processing steps.”

Response: The size of the figure has been modified.

8 – The discussion section should not be included in the conclusion. Please create a separate discussion section based on the results presented in the previous section.

Response: The discussion has been separated from the conclusion and expanded.

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

Based on the corrections provided by the authors, I am considering the present study for publication.
