Peer-Review Record

Adaptive Geoparsing Method for Toponym Recognition and Resolution in Unstructured Text

Remote Sens. 2020, 12(18), 3041; https://doi.org/10.3390/rs12183041
by Edwin Aldana-Bobadilla 1,*, Alejandro Molina-Villegas 2, Ivan Lopez-Arevalo 3, Shanel Reyes-Palacios 3, Victor Muñiz-Sanchez 4 and Jean Arreola-Trapala 4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5: Anonymous
Submission received: 10 July 2020 / Revised: 28 August 2020 / Accepted: 13 September 2020 / Published: 17 September 2020

Round 1

Reviewer 1 Report

Your work is full of heuristics. After going through the method you proposed one is left with the impression that there are lots of decisions which cannot be extended to other data. In other words, a lot of fine-tuning is required. How can we (the readers) be reasonably certain that we will be able to replicate the purported results in general?

As examples I have taken pieces of your text with some relevant issues and comments interspersed:

A) The geographic-named entity recognition module is based on a trained model whose inputs are vector representations of words, also referred to as embeddings in a semantic space. Basically, the input is transformed into dense vectors (DEFINE A DENSE VECTOR), and then a GER model (WHICH MODEL?) determines when a specific word or n-gram is a geographic entity. It is worth mentioning that, given the lack of NLP resources for Mexican-Spanish, we deployed our own geographic-named entity recognition (GNER) module based on a fusion model of lexical and semantic features alongside a neural network classifier.

B) 3.1.1. Preprocessing and Vectorization. First, texts are preprocessed with standard tokenization (WHAT IS CONSIDERED “STANDARD” TOKENIZATION?), where we preserve capital letters. This is because capitalized words are actually part of the standard lexical features for GNER; most of the time, location entities appear capitalized (WHAT HAPPENS IF NOT?).

C) Once we obtain lexical and semantic characteristics of words, we fuse them by concatenating all features in what we call a bag of features, and then we use a binary neural network (DEFINE binary neural network. REFERENCES. WHY THIS NETWORK ABOVE OTHERS? WHAT ARE THE ADVANTAGES/DISADVANTAGES?) in order to assess whether a token is a geographic entity or not. The neural network GNER classifier contains 1 hidden layer with 3 hidden units and a sigmoid activation function with weight decay. (WHERE DOES THIS ARCHITECTURE COME FROM? WHY? IS THIS THE BEST CHOICE? HOW CAN YOU BE SURE?) At the end of the neural network GNER classifier, the last layer determines, for each token, one of the two possible classes {<location>, <word>}. Finally, for entities composed of two or more tokens, we use a heuristic (WHICH HEURISTIC?) to reconstruct the whole entity by using the class of the tokens in the original text. That is, two or more consecutive <location> labels will be considered as one single location entity. The labeled version of the original text is sent to the dynamic context disambiguation module. (ANOTHER HEURISTIC!!!)

D) Our disambiguation approach derives decisions based on rules and facts. The rules specify how ambiguities could be solved by considering the context in a similar manner as that of a person. Given that the rules are activated as needed in execution time, we call it dynamic context disambiguation. (WHICH RULES? HOW ARE THESE RULES ARRIVED AT?)
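To make the tokenization and entity-merging steps quoted in B) and C) concrete, here is a minimal Python sketch. It assumes a simple regular-expression tokenizer that does not lowercase (the manuscript's "standard tokenization" is not specified in this record) and replaces the neural GNER classifier with hand-written labels. The names `tokenize`, `is_capitalized`, and `merge_locations` are hypothetical; only the merge rule, that consecutive <location> labels form one entity, comes from the quoted text.

```python
import re
from itertools import groupby

# Assumed tokenizer (not the authors'): words, including accented letters,
# and punctuation become separate tokens. Nothing is lowercased, so
# capitalization survives as a lexical feature.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text)


def is_capitalized(token: str) -> int:
    """Hypothetical lexical feature: 1 if the token starts with an uppercase letter."""
    return int(token[:1].isupper())


def merge_locations(tokens: list[str], labels: list[str]) -> list[str]:
    """Join each run of consecutive <location> tokens into a single entity."""
    entities = []
    for label, run in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        if label == "<location>":
            entities.append(" ".join(token for token, _ in run))
    return entities


tokens = tokenize("Viajamos de San Luis Potosí a Monterrey.")
caps = [is_capitalized(t) for t in tokens]
# Hand-written labels standing in for the classifier's per-token output.
labels = ["<word>", "<word>", "<location>", "<location>", "<location>",
          "<word>", "<location>", "<word>"]
print(merge_locations(tokens, labels))  # ['San Luis Potosí', 'Monterrey']
```

The sentence-initial "Viajamos" gets a capitalization feature of 1 without being a location, and a lowercase toponym would get 0, so capitalization alone cannot decide the class. The run-merging rule also fuses adjacent but distinct toponyms: `merge_locations(["Monterrey", "Nuevo", "León"], ["<location>"] * 3)` returns `["Monterrey Nuevo León"]`, which is the kind of edge case the reviewer's questions point at.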

We found no conclusions that guarantee your method can be adopted in general.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The manuscript deals with the geoparsing problem from an unstructured text. The geographic entities are recognized from the text using a method published in the Master's thesis of Trapala (possibly a colleague or student of the authors), based on lexical and semantic features and a neural network classifier. A novel method is proposed to find the correct match in a gazetteer for each recognized entity, in order to disambiguate places with a similar name. The method takes into account the entity at the top of a stack of previously processed entities and the hierarchy of geographic levels present in the gazetteer. Test results are shown for Mexican-Spanish text. The manuscript is clearly written. Some minor comments are as follows.

Please consider referencing the following papers:
Stuart E. Middleton, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Yiannis Kompatsiaris, Location Extraction from Social Media: Geoparsing, Location Disambiguation and Geotagging, ACM Trans. Information Systems 36(4), 2018.
Morteza Karimzadeh, Scott Pezanowski, Alan M. MacEachren, Jan O. Wallgrün, GeoTxt: A scalable geoparsing system for unstructured text geolocation, Transactions in GIS 23(1), 2019, 118-136.

Line 194: How is the match determined? All the characters match exactly?
Table 2: Why are all the entities in C assigned the same geographic properties in Q3 of Rule R7? If these entities are located at different places in reality, it does not make sense to assign the same properties to all of them. Are the stacks S and C also emptied at some stage other than rows 4 and 6 of Algorithm 1, so that the entities in C are somehow related to those in S? Q1 in Rule R4: Is T the entity at the top of S?
Lines 235-237: What does "precede" actually mean in Rule R2, when it is claimed that Ixcatán precedes Azcapotzalco? Ixcatán appears after Azcapotzalco in the text and Ixcatán is lower than Azcapotzalco in the hierarchy in Fig. 2, so how can it precede?
Lines 283-285: San Francisco does not appear in the sentence. Why is it labeled?
Table 6: What are Micro and Macro?
Lines 354-365: It would also be good to report the proportion of correctly located places, since each place is either correctly located or not.
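Reviewer 2's question about line 194 (exact versus approximate matching against the gazetteer) can be illustrated with a hypothetical sketch that contrasts exact string lookup with a case- and accent-insensitive lookup. The toy gazetteer uses two names mentioned in this review; the functions `normalize`, `exact_match`, and `normalized_match` are illustrative and are not the authors' matching procedure.

```python
import unicodedata
from typing import Optional

# Toy gazetteer built from place names mentioned in the review.
GAZETTEER = ["Azcapotzalco", "Ixcatán"]


def normalize(name: str) -> str:
    """Decompose accented characters, drop combining marks, and casefold."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold().strip()


# Map each normalized key back to the canonical gazetteer entry.
INDEX = {normalize(n): n for n in GAZETTEER}


def exact_match(query: str) -> Optional[str]:
    """Every character, including case and accents, must match."""
    return query if query in GAZETTEER else None


def normalized_match(query: str) -> Optional[str]:
    """Ignore case and diacritics when looking up the gazetteer."""
    return INDEX.get(normalize(query))


print(exact_match("Ixcatan"))       # None
print(normalized_match("ixcatan"))  # Ixcatán
```

Exact matching misses common spelling variants such as a dropped accent, while normalized matching recovers them at the cost of possibly conflating names that differ only in diacritics; the reviewer's question is essentially which of these trade-offs the method adopts.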

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

The article received for review constitutes a significant scientific contribution in the presented scientific field. The article is properly structured, and the figures and tables are legible and referenced in the text. Better-quality geolocation of objects could be achieved in the future, as there are many free tools available. Another remark concerns the number of literature items. In my opinion, a more extensive literature review could have been conducted, but this is a remark to be addressed in the future.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 4 Report

This is a well-written paper. I have no specific comments or suggestions.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 5 Report

Dear authors,
The structure of the paper is very clear and the organization of the various sections is also good. The main focus of the paper and the objectives are well explained.
Comments:
1. How did the authors deal with local site names (historical or folk)? Have the authors considered this possibility as well? This study could also be beneficial for the study of historical texts.
2. Does the system also allow you to locate scanned maps by geographic names that do not have the location system specified? If the system made it possible, it would provide great help to researchers in the future, especially in the active research of historical documents and maps.
3. I would recommend that the authors describe practical applications of their system in the final section.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

