Article
Peer-Review Record

Efficient Classification of Imbalanced Natural Disasters Data Using Generative Adversarial Networks for Data Augmentation

ISPRS Int. J. Geo-Inf. 2023, 12(6), 245; https://doi.org/10.3390/ijgi12060245
by Rokaya Eltehewy 1,*, Ahmed Abouelfarag 2 and Sherine Nagy Saleh 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 20 April 2023 / Revised: 9 June 2023 / Accepted: 15 June 2023 / Published: 17 June 2023

Round 1

Reviewer 1 Report

Dear Authors,

After assessing the suitability for publication of Manuscript ID: ijgi-2382971, entitled "Efficient Classification of Imbalanced Natural Disasters Data Using Generative Adversarial Networks for Data Augmentation", I have identified several elements that, in my view, the authors should make clearer and more comprehensible in order to improve the quality of the manuscript. I have therefore written a series of comments for the authors of the manuscript under review.

In this paper, the authors propose a framework for classification of natural disasters by utilizing social media collected insights. The framework combines a set of synthesized diverse disaster images generated using generative adversarial networks (GANs) and the domain-specific fine-tuning of a deep Convolutional Neural Network (CNN) based model. The Manuscript ID: ijgi-2382971 is interesting. However, the article under review can be improved if the authors address the following aspects in the text of the manuscript and reflect them clearly point-by-point within the cover letter:

1. The paper has been submitted to the Special Issue "Unlocking the Power of Geospatial Data: Semantic Information Extraction, Ontology Engineering, and Deep Learning for Knowledge Discovery" of the MDPI "ISPRS International Journal of Geo-Information". I therefore consider that the authors should strengthen the impact of their study on, and its relationship with, this domain. In its current form, the paper does not sufficiently highlight this connection, and it would benefit from more details on this issue.

2. The "Abstract" of the paper. The manuscript will benefit if in its "Abstract", along with the elements already presented, the authors also declare and briefly justify the novelty of their work.

3. Lines 31-179, the "Introduction" and "Related Works" sections: the purposes overlap. In its current form, the manuscript follows the "1. Introduction" section with a section entitled "2. Related Works". The purposes of these two sections overlap, so I consider that they should be merged and reorganized into a single "Introduction" section (if they consider it necessary, the authors can use a "Related Works" subsection within the "Introduction" section).

4. The gap in the current state of knowledge. I consider that in Manuscript ID: ijgi-2382971 the gap in the current state of knowledge is not sufficiently highlighted. After performing an appropriate literature survey, the authors will be able to pinpoint more clearly an exact deficiency, an unsolved problem that still exists in the current body of knowledge and needs to be filled, thereby justifying the need for and the novelty of their study. Without clearly identifying and stating this gap, the study in the manuscript under review does not justify its need, importance and novelty. Emphasizing the gap will improve the manuscript on several levels, because the identified unsolved problem gives the authors the opportunity to demonstrate, when discussing their results, the contribution that the conducted research has made to the existing state of knowledge. The authors should then state the novel aspects of the conducted study.

5. The dataset availability. Regarding the datasets, the authors state in the “Data Availability” section: "The initial dataset analyzed during this study is available in the CrisisNLP repository (https://crisisnlp.qcri.org/). In addition, the extended version of the dataset collected for this research is available from the corresponding author on reasonable request." I would like the authors to provide more details regarding the dataset. To make it possible to verify the relevance and correctness of the developed case study, the authors must provide, in the supplementary materials, all the details needed for other researchers to verify, reproduce, discuss and extend the published results. Other researchers should not have to obtain and concatenate datasets from various sources, at the risk of acquiring different datasets, or datasets normalized differently, from those on which the authors performed their experimental tests and validations. The exact datasets that the authors used would therefore be a valuable addition to the manuscript if they can be provided as supplementary materials.

6. The dataset actuality. As mentioned by the authors, at Lines 292-296, “A subset of CrisisMMD [13], a real-world disaster-related dataset collected from Twitter during different natural disasters that took place in 2017 is used as the backbone of our dataset. Additional samples were selected from other recent and publicly available datasets including the damage identification multimodal dataset [35] and the damage severity assessment dataset [36].” In this context, the authors should comment in the paper on whether the data collected in 2017 are still relevant today, in 2022, with respect to the same analyzed parameters. The authors should explain whether their study remains consistent, that is, whether changes occurring between the years of the older dataset and the current year risk altering the final result. The authors should take into account that part of the information might change over time or become outdated, so that the whole study risks becoming inconsistent and irrelevant.

7. The dataset’s correctness, relevance and trustworthiness. According to the above-mentioned paragraph from Lines 292-296, the dataset used is based on “a real-world disaster-related dataset collected from Twitter”. The authors should explain in the manuscript whether and why this online social media and social networking service represents a source of scientific, correct, relevant and reliable information.

8. The Convolutional Neural Networks’ parameters. The performance of Convolutional Neural Networks can be greatly affected by parameters such as filter sizes, number of filters, step size, dilation factor, padding size, learning rate, and the functions used to initialize the weights (he, glorot, narrow-normal, ones, zeros or other functions) and biases. Providing clear details regarding the values that were tested for each of these parameters during the experimental tests, along with how the final values were chosen, is essential to ensure the reproducibility of the study. Furthermore, showing the impact that each of these parameters has on the performance level is extremely important in order to demonstrate the effectiveness of the research methodology and ultimately of the proposed approach.

9. The data division ratio. At Lines 377-380, the authors state: "Geometric Augmentation is applied involving image flipping, a 30% zooming rate and 30% horizontal and vertical shifting. Additionally, the dataset is divided into 70% training, 20% validation and 10% testing samples with batch size set to 8 samples.". The authors should explain in the paper the reasons for choosing these data division ratios. The paper will benefit if the authors present more details regarding the results obtained during various tests, for all the different tested ratio values, up to the moment when the chosen ratio has proven to be the best (or suitable) approach and what was the criterion/performance metric used in choosing this ratio.

10. The equations within the manuscript. The equations within the manuscript should be explained, demonstrated or cited, as there are some equations that have not been introduced in the literature for the first time by the authors and that are not cited.

11. Issues regarding the neural network approach - retraining process. As the authors have used an Artificial Neural Network based approach, I consider that they must specify in the paper how often each network needs to be retrained/updated and how they addressed the need to retrain/update the network.

12. Issues regarding the neural network approach - new datasets and more results. How is newly encountered data stored for subsequent updates of the networks? The paper will benefit if the authors present more details regarding the results obtained during the various tests, for all the different numbers of hidden layers, neurons and epochs tested, and especially the training time for each test, until they obtained the configuration that provided the best results. The information can be summarized in a table; if the table becomes too long, the authors can restrict it in the paper to ten main experimental runs and insert a complete table with all the experimental runs in the "Supplementary Materials" file of the article.

13. Issues regarding the proposed architectures. In Section 3.2, the authors describe the Inception-V4 and VGG16 Convolutional Neural Network architectures implemented within their study. Afterwards, at Lines 329-335, the authors state: “In terms of data generation, the cGAN model was trained with 3000 epochs to generate 5000 new instances for the under-represented classes. Regarding hyper parameters, Adam optimizer along with binary cross entropy as a loss function are used. Leaky Relu was used for activations except for the final layer which used a Tanh activation, generator dropout with a probability of 0.2, a learning rate of 0.0002 and batch size of 4 were applied. After adding the additional samples to the dataset, the class imbalance ratio is drastically reduced to a ratio of 2.5.” It will benefit the paper if the authors provide more details regarding the CNN and cGAN architectures and the reasons behind the decision to choose these architectures, with the above-mentioned parameters and settings.

14. The effectiveness of the proposed approach with regard to the computing time. Did the authors take into account making use of parallel processing architectures, for example Compute Unified Device Architecture (CUDA) enabled GPUs when computing the functions, in order to achieve a speed-up of the computations? It will benefit the paper if the authors have the possibility to analyze how the computing time varies when using a Compute Unified Device Architecture (CUDA) GPU in order to speed-up considerably the computations. If the authors lack the necessary equipment, they should at least provide an insight when discussing their results regarding the potential to achieve significant speed-ups by means of a CUDA approach.

15. The generalization capability of the developed approach. Can the authors mention how much their model is influenced by the data used, or to what extent the model can be easily applied to other situations, when the datasets are different? In this way, the authors could better highlight the generalization capability of their approach and justify a wider contribution to the current state of the art.

16. The "Discussion" section. In the "Discussion" section the authors should highlight the current limitations of their study. The authors should underline both the advantages and the disadvantages of their proposed approach when compared with other valuable studies from the current state of the art. When discussing their results, the authors should emphasize not only the novel aspects and strong points of their method, but should also point out objectively its existing limitations and the circumstances that could hinder its effectiveness.

17. Potential abnormal situations. I would like the authors to provide more details in the "Discussion" section of the paper regarding how their approach can be adjusted to take into consideration potential abnormal situations, such as financial crises, economic collapse, energy crises, the Coronavirus recession and the economic impact of the COVID-19 pandemic.

18. Insights. The paper will benefit if the authors make a step further, beyond their approach and provide an insight when discussing their obtained results regarding what they consider to be, based on the obtained results, the most important benefits of the research conducted within the manuscript, taking also into account its practical applicability.

19. The cost-benefit analysis. It will benefit the paper if the authors elaborate a cost-benefit analysis regarding the implementation of their proposed solution in a real working environment, taking also into account the licensing cost of the software.
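
To make comments 9 and 13 concrete, the following is a minimal Python sketch, not the authors' code, of the two quantities that the manuscript reports without detail: the 70%/20%/10% division into training, validation and testing samples, and the class imbalance ratio. The function names `split_indices` and `imbalance_ratio`, the fixed seed and the example labels are hypothetical. The imbalance ratio is assumed here to be the size of the largest class divided by the size of the smallest class, which is a common definition but is not spelled out in the quoted text.

```python
import random
from collections import Counter


def split_indices(n_samples, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts.

    `fractions` defaults to the 70/20/10 division quoted from the manuscript;
    the seed is an arbitrary assumption, fixed only for reproducibility.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * fractions[0])
    n_val = round(n_samples * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test part
    return train, val, test


def imbalance_ratio(labels):
    """Assumed definition: majority class count divided by minority class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    train, val, test = split_indices(1000)
    print(len(train), len(val), len(test))
    # Illustrative labels only, not taken from the authors' dataset.
    print(imbalance_ratio(["flood"] * 5 + ["wildfire"] * 2))
```

Reporting the seed and the per-class counts before and after augmentation, in this or any equivalent form, would let other researchers reproduce the same division and check the stated ratio.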

Other remarks.

· Error messages. At Lines 265, 300, 305 and on Pages 11, 12 and 13 (where the line numbers are missing), instead of the Figures' references, an error message is being displayed "Error! Reference source not found."

· Run-on expressions. At Lines 297-299, the authors state: “The stated datasets consist of a combination of manually annotated tweets and images collected during seven major natural disasters including earthquakes, hurricanes, wildfires, floods …etc.,” In a scientific paper one should avoid using run-on expressions, such as "and so forth", "and so on", "and some other" or "etc.". Therefore, instead of "etc", the sentence should mention all the elements that are relevant to the manuscript.

Author Response

Dear Reviewer

Please see the attachment.

Thank you

Author Response File: Author Response.pdf

Reviewer 2 Report

Please check again the reference errors for the Figures (Lines 264-265, 305-306, ...) and Table 2.

 

 

Author Response

Dear Reviewer

Please see the attachment.

Thank you

Author Response File: Author Response.pdf

Reviewer 3 Report

There are a few minor things to fix in the paper:

- please check sentence in line 181

- In Figure 1, there are 2 phases II; one should be phase III. It is not clear in the figure that phase III is based only on real data

- line 252, -4 should be superscript

- there are several "Error! Reference source not found" in the text (line 265, 305, section 5.2, section 6, line 406)

- line 407: comma instead of period

Author Response

Dear Reviewer

Please see the attachment.

Thank you

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Dear Authors,

After having assessed the suitability for publication of the revised version of Manuscript ID: ijgi-2382971, having the title "Efficient Classification of Imbalanced Natural Disasters Data Using Generative Adversarial Networks for Data Augmentation", I can conclude that the authors have addressed the most important signaled issues, therefore improving the manuscript.
