Article
Peer-Review Record

Using Flickr Data to Understand Image of Urban Public Spaces with a Deep Learning Model: A Case Study of the Haihe River in Tianjin

ISPRS Int. J. Geo-Inf. 2022, 11(10), 497; https://doi.org/10.3390/ijgi11100497
by Chenghao Yang 1,2, Tongtong Liu 1,* and Shengtian Zhang 3
Submission received: 11 July 2022 / Revised: 11 September 2022 / Accepted: 15 September 2022 / Published: 21 September 2022

Round 1

Reviewer 1 Report

This paper tries to understand the image of urban public space from Flickr data by developing a VGG-16 image classification method. The topic is interesting. Here are some of my comments:

1. The introduction part can be further improved by introducing the research motivation and research gaps so that readers can understand the novelty and contribution of this study very well.

2. Figure 3 can be further improved to make it more understandable. On page 6, you mentioned that this part includes six main steps. Specifying which step corresponds to which part in the Figure would be good.

3. In section 4.1, it is mentioned that 1940 images are obtained after cleaning. On line 289, ‘metadata also includes the user's information, including 1811 data with the user's location, and another 577 unknown user’s location data.’ I am a bit confused about the two numbers. If I understand correctly, the total number of data with the user’s location and unknown user’s location should be 1940, right?

4. Given the many neural network architectures available, please explain why VGG-16 was selected in this study.

5. In section 4.2, most of the contents are related to the theories and methods of VGG-16. I think it’s more suitable to introduce them in Section 3.

6. The full name of Grad-CAM [53] (Gradient-weighted Class Activation Mapping) should be given when it appears for the first time rather than in the middle of the paper.

7. What is the link between the image classification results and the kernel density results? From the kernel density analysis, we can find the three areas with high density. By overlaying the kernel density result with Google Maps, we can also understand what the three areas are. Why is image classification required? Please justify it.

8. In the discussion part (section 5.1), the authors mainly discuss the placemaking suggestions based on the kernel density results. It seems the image classification results have a tiny contribution to urban design and placemaking, as I mentioned in the seventh comment.

9. In a nutshell, this study applies VGG-16 to Flickr data to understand the image of urban public space. I didn’t see the innovation of the methods. Please justify the contributions of this study.

Author Response

Thank you very much for your detailed suggestions on the manuscript, and I have made the following changes in response to your suggestions.

1. We have added a description of research motivation and research gaps in the Introduction section on page 2.

2. We marked phases 1 to 6 in Figure 3.

3. We reinterpreted the data for the 1940 images with and without geo-information: 1500 images without geo-information and 440 images with geo-information. The previous count was wrong because we had not deduplicated the metadata. We have corrected that; thank you very much for pointing it out.

4. We added the reason why we use VGG-16 in Line 252. The core reason is that the TOP-1 accuracy of VGG-16 is the best in Places365.

5. We adjusted the position of section 4.2 into the result section.

6. We now give the full name of Grad-CAM where it appears for the first time.

7. The results of image classification are abstract. In previous studies, image classification results usually show only the scene categories and scene distributions derived from geo-tagged photos, as perceived by the public, across an entire city or region.

We have improved the research method design framework on page 6 to include qualitative analysis. This includes the discovery of Haihe River hotspots, the comparison of the image classification results for geographic data with current land development, and the comparison of image classification results between geographic and non-geographic data.

The additions are concentrated in Section 4.3, where combining the analysis with the current land use near the Haihe River is more effective.

Author Response File: Author Response.docx

Reviewer 2 Report

This paper presents a deep learning analysis of Flickr images to understand urban public space. It explores a variety of data attributes, such as user demographic metadata, image, and geographic elements of the social media data source. The method has a good potential to contribute to the wider application of urban renovation and renewal. The main issue is the limit of the data. Despite being mentioned in the paper, these limitations have affected the credibility of the conclusions of this research at least to a certain extent.

Here are some of the issues that are probably worth better elaboration:

1. Line 288, Figure 4: are there any reasons for the high/low points of geo-tagged images over the years? It is quite interesting to see images shared on the platform clustered in certain years for the city. Does this also happen in other cities? Or is this affected by the tourist influx due to special events?

2. Line 303, Figure 5: This user demographic confirms that images about the city are probably biased toward tourists, especially the geo-tagged part. It is therefore rather unsafe to claim that the final pattern reflects public perception.

3. Line 428: Please elaborate more on whether an image is classified under one or more than one category. How does the higher accuracy of TOP-5 bring benefits to the analysis?

4. Line 449-450: Given the small sample sizes, is it possible to have higher accuracy/inaccuracy in a particular category? Would it be possible to illustrate the matrix using the labeled set?

5. Line 469: This is further evidence that user demographic bias can stem from the outliers. Would it be possible to elaborate more on the distribution of the data at the individual level?

6. Line 471, Figure 10: It is rather evident again that the data reflect tourist interests. The identified public areas are likely to be around attractions. Would it be possible to compare with land use data to confirm to what extent the pattern helps to unfold the real cases?

7. Line 494-495: Unfortunately, the current data is limited in revealing residents’ opinions too.

8. Line 509 and further: The suggestions are probably worth reconsidering given the bias of the data. It may be worth comparing with the current development plan or land uses.

9. Line 528-529: Would it be possible to verify the accuracy of lat/long from the image with the labeled set?

To summarize, this paper is well-written and easy to follow. I think the paper needs to shift slightly toward being more methodology-driven because of the data bias and limitations. As stated, social media data is more valuable to complement the existing data sources. It would be nice to see some considerations on how to incorporate this method with planning data.

Author Response

1. We explain the possible reasons for the peak values of the user statistics section in 2009, 2013, and 2017, which are related to the major international events that took place in those years.

2. We have recounted tourists and locals, and locals are still represented. As we explain several times in the study, although the data tend to be tourist-oriented, they do not reflect tourist perceptions alone.

3. On pages 14 and 15 we have added TOP-1 to TOP-5 statistics and discussed them with the TOP-1 results. In the Method section, more explanation of the TOP-5 accuracy has been added.

4. We tried to make matrices, but the results were not satisfactory because there are so many categories, so we thought it would be better to judge the accuracy by interpreting it from the deep learning domain.

5. We explained the photo-taking behavior of the outlier user, which is interesting: this user took 52 images on the same day. However, we could not give more detailed personal information for privacy reasons.

6. I have to say that this is a very good proposal and we have added this section on page 16, which is an overlay of the land use and mapping results.

7. We have re-emphasized the limitations of this section in the limitation section.

8. As stated in point 6, we have added a large section on the analysis of current land use.

9. We have removed the very unusual lat/long, such as those deviating from the perimeter of the Hai River, and in general, the accuracy of this dataset is trustworthy.

Author Response File: Author Response.docx

Reviewer 3 Report

Dear authors, thank you for the article. I am interested in issues of urban image and perception and yours is an interesting case study based on big data and the commonly used Flickr source.

However, for its publication in the ISPRS International Journal of Geo-Information, I consider that the following revisions should be made:

1. There is a misuse (and abuse) of acronyms:

- The acronym UPS for “urban public space” is very misleading. It is not commonly used in urban studies, a field very interested in the issue of public spaces, and it coincides with the acronym of a well-known American multinational shipping and receiving company. It must be eliminated, and instead of acronyms, I recommend using synonyms, pronouns and other ways of writing if you want to avoid the repetition of “urban public space”.

- The acronym CNN should be explained (not very accurate either due to its coincidence with the media) the first time it is used, on page 2, line 74 and not as it is done on page 3.

- The NLP acronym for “natural language processing” is not necessary. It will never be used in the article again.

- The acronym SVM appears on page 9, line 312. It is no longer used in the article. And its meaning is not explained. It must be corrected.

- The meaning of the acronym VGG (Visual Geometry Group), which appears from the abstract and page 1 onward, is not explained until page 9. The meaning of the acronym must be indicated the first time it is cited.

- The acronym MC for “manual classification” is superfluous. It should not be used. In fact, it only appears once in the text. And, in addition, the words “manual classification” are used again.

2. The High Line in New York cannot be chosen as a global example of public space because it is a controversial neoliberal capitalist action that has given rise to gentrification. Since the authors want to use New York as an example, Central Park would be a much better choice.

3. The first paragraph of the introduction should indicate, with bibliographic references, the current process of crisis in urban public space due to its growing privatization and its replacement by private areas such as malls.

4. Every time it is indicated that the source is Flickr in the text of the article, the important limits of this source must be pointed out: it has a maximum number of photographs to upload for free, after which you must pay to share more photos; there may be a negative influence from professional paid accounts that have more images; and, above all, it is an image-sharing source used by far fewer people (mainly only by those interested in photography) than Instagram (a source that would give more conclusive results). These limits must be indicated from line 65 on page 2, from line 102 on page 3 and from line 159 on page 4.

When it is pointed out that “Flickr is also a dominant data source” (line 162), it should be noted that it is due to the ease of its API to obtain the metadata of the figures, not because it is the most appropriate source, but rather because it is the most available.

It then states “Compared with balanced text comments and sharing images on social media platforms such as Twitter, Weibo and Facebook, Flickr is a social media platform mainly based on image sharing.” That is false, because that description fits Instagram. And it is with Instagram, which is mainly a means of sharing photos, that Flickr must be compared, and not with other social networks. It would even be debatable that Flickr is more used than Reddit or Pinterest.

Another shortcoming is that the main users are North Americans (Flickr is a platform that belongs to the Canadian company Ludicorp) while in other parts of the world it is used much less. This is especially so in the case of China, where, as the authors discover, few people use Flickr. This distorts the results of the article and should be pointed out.

These clarifications should be noted. Afterwards, the authors make a correct use, with a good methodology, of the source.

However, the source has important limitations that must also be pointed out in the conclusion as the main limitation of the investigation (number 1), in addition to the other limitations that are indicated and that are correct.

5. In some parts of the text, “The Flickr” is used instead of “Flickr”, which is correct.

6. The source of all the figures must be added, indicating if it is the authors' own elaboration when so.

7. In figure 1 a scale and a north arrow must be added.

8. In the results, on page 8, it is normal that there are fewer photos on Flickr in 2020 and 2021 as a result of the mobility limits due to the COVID-19 pandemic. This circumstance must be indicated.

9. The text that goes from line 316 and begins with “In our research (…)” and continues until line 369 ending with “(…) for 2 Dense layers.” should be transferred to the methodology section. The method used is explained. This cannot be in the results section, but must be in the methodology section.

Also the text on page 12 “Top N accuracy is a measure of how often the predicted class falls within the top N values of the SOFTMAX distribution. In ImageNet, the error rate is often used to explain the probability of image recognition [52]. TOP-1 accuracy is the conventional accuracy: the model response must be exactly the expected answer. TOP-5 accuracy means that any of the five highest probability results must match the expected response” should be transferred to the methodology section.

The text “the Grad-CAM [53] (Gradient-weighted Class Activation Mapping) visualization method was used in this study to obtain heatmaps that characterize image classification. Grad-CAM Module works by finding the convolutional layer in the VGG-16 network and examining the gradient information flowing into that layer after output results. GradCAM is an upgraded version of CAM [54] (Class Activation Mapping), which does not need to modify the neural network and retrain it.” from page 13 should likewise be transferred to the methodology section.
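The Top-N accuracy definition quoted in comment 9 can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `top_n_accuracy` and the sample probabilities are hypothetical, chosen only to show how a Top-1 miss can still count as a Top-N hit.

```python
from typing import Sequence


def top_n_accuracy(
    probs: Sequence[Sequence[float]], labels: Sequence[int], n: int
) -> float:
    """Fraction of samples whose true label is among the n most probable classes."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(probs, labels):
        # Class indices sorted from most to least probable.
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        if label in ranked[:n]:
            hits += 1
    return hits / len(labels)


# Hypothetical softmax outputs for three images over four scene classes
# (illustrative values only, not taken from the manuscript).
probs = [
    [0.70, 0.10, 0.15, 0.05],  # true class 0: ranked first
    [0.20, 0.50, 0.25, 0.05],  # true class 2: ranked second
    [0.40, 0.30, 0.20, 0.10],  # true class 3: ranked last
]
labels = [0, 2, 3]

top1 = top_n_accuracy(probs, labels, 1)
top2 = top_n_accuracy(probs, labels, 2)
print(f"Top-1: {top1:.3f}, Top-2: {top2:.3f}")
```

In this toy case only the first image is a Top-1 hit, while the second image also counts once the two most probable classes are accepted, which is the sense in which Top-5 accuracy is always at least as high as Top-1.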

10. Modify figure 5. States should be grouped by regional groups of the world (Europe, North America, etc.), it would be more graphic. In addition, now it is wrong because it mixes States with territories that are not (Scotland belongs to the United Kingdom of Great Britain, which is also shown in the graph, and is not an independent country) and even with a category of "other city, CN[?]”.

11. The text of the results section of lines 371 to 378, “The Places365-Standard database developed by MIT was used in this study, which was used for scene recognition, including indoor and outdoor scenes. Places365 is the latest subset of the Places2 database. There are two versions of Places365: Places365-Standard and Places365-Challenge [35]. The Places365-Standard train set has about 1.8 million images from 365 scene categories, with up to 5000 Train images per category. The Place365Challenge-2016 dataset is an extension of the Place365-Standard dataset, including 6.2 million additional image photos for a total of 8 million photos in the training set. To improve training speed, we used the places365 database to train our architecture.”, should be removed. It is a reiteration. The authors have already explained it.

12. Delete the period in this text on page 439: “the facades of Tianjin Station. and the tower”.

13. Add the scale in figure 10.

14. In the Discussion section, when “previous studies” and “fewers researchs” are mentioned, the authors should indicate which ones, or give at least one example.

15. The limitations of the research are usually in the conclusions section (inserted, not as a subsection) and not in the discussion section. Move them. On the other hand, the conclusions section is very good.

In this way, the discussion section would no longer need subsections, as there would no longer be a 5.2.

16. A Data Availability Statement must be added at the end of the article, where the reader can consult the data, or at least the availability of the authors to share the data of the article with any interested party.

Author Response

Thank you very much for your detailed suggestions on the manuscript, and I have made the following changes in response to your suggestions.

1. The misuse of acronyms has been corrected.

2. We switched to another New York public space, Bryant Park.

3. We have added this section but not expanded on it because privatisation is less of a concern in Chinese cities and because most public space is state-owned in China.

4. We explain why we use Flickr data and the disadvantages of Flickr data for Asian cities in the Data collection section, and we specify the shortcomings of the data volume of Flickr data in the Limitation section of the conclusion.

5. We have corrected this issue about “The Flickr”.

6. All the Figures in this study are drawn by the author, and the journal will require the source of the cited Figures to be marked, but we have not found it necessary to declare the source of all Figures. We also try our best to ensure the privacy of users when citing Flickr images.

7. A north arrow and scale have been added to Figure 1.

8. We explain the data reduction due to COVID-19 in section 4.1.

9. We adjusted the order of the Methods section.

10. This is a very good proposal, and we were missing the deduplication step before; the new user location statistics are in section 4.1.

11. We have removed duplicate explanations.

12. We have deleted the period in this text.

13. We have added the scale on Figure 10 (now Figure 11).

14. We improved the discussion section.

15. We moved Limitation to the conclusion section.

16. It should be noted that most of the research data based on social media data are not public, which is related to the protection of user privacy. Based on the discussion of user ID, user location and images in this study, we chose not to disclose the data.

Author Response File: Author Response.docx

Reviewer 4 Report

The manuscript is well written and organized. The argument is original and aligned with the scope of the journal. In my opinion it should be accepted for publication after minor improvements.

Keywords

The keyword “Haihe River” is too generic and it is not useful for indexing this article.

Introduction

The novelty of this study should be better explained.

Section 4

Why is Section 4.2.2, which presents a methodology to alleviate the overfitting problem, placed among the results and not earlier (e.g. in Section 3)? The authors should either better present this subsection as an obtained result or move it earlier.

Section 5 – Discussion

The proposed approach has great validity. Could it also be applied in other areas, for example to geolocate suppliers of certain materials in the field of logistics, provided by various methods?

Author Response

Thank you very much for your suggestions on the manuscript, and I have made the following changes in response to your suggestions.

1. We deleted the keyword “Haihe River” and added “social media data”.

2. We have improved the Introduction section with additions on the novelty of the manuscript and previous research gaps.

3. We restructured Section 4 and moved the methodology material into Section 3, where it belongs.

4. In Section 5, we added the potential of this method in the fields of physical geography, humanities and social sciences, and landscape research. Since our research background is limited to the field of urban studies, we only propose suggestions for fields related to ours.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I have no other comments.

Author Response

Thank you very much.

Reviewer 2 Report

This is an interesting paper employing novel user-generated image data to analyze users' preferences in the urban public space in Haihe river, Tianjin, China.

The paper has been revised substantially based on the last round of reviews. The majority of the issues mentioned in the previous comments have been successfully addressed.

I suggest some minor issues that can be improved before accepting for publication.

1. An improved description or title is needed for Figure 13. A word such as ‘share’ is probably better than ‘proportion’.

2. It may be worth reshuffling the final discussion and conclusion parts.

a. Consider improving the summary of your work, especially your contributions. The policy recommendations seem to appear in the wrong place and are less convincing without a firm summary of your contributions.

b. You may include your opinion on how to better utilize big image data in the policy-making process.

c. You may reduce the detailed planning suggestions to accommodate (a) and (b). The arguments should be discussed within a greater context about the city, which is probably out of scope for this method-driven paper.

Author Response

Thank you very much for your suggestions for this study.

1. We changed the description and title of Figure 13.

2. We have added a summary of the contribution of this research at the end of the discussion section. We believe that the logic of the discussion section flows well as it stands and would lose its coherence if the order were changed.

We have reduced some of the placemaking suggestions, but they are still very important for our study. Our proposal for placemaking in Haihe public space has always been one of our research goals. Based on the results of our research, we propose several pieces of urban design advice.

Regarding opinions on how to make better use of big data in the decision-making process, we believe that the research is applicable to the evaluation of built urban public spaces, but not to decisions made before a design project begins.

In general, thank you very much for your suggestions. After adding a summary of the contribution of this research in the discussion section, the conclusion of this research is more fluent, and also responds to the content of the Introduction section.

Author Response File: Author Response.docx

Reviewer 3 Report

I think the manuscript has been improved enough to warrant publication in IJGI. The authors have made an important effort and the article is much improved.

Author Response

Thank you very much.
