Article
Peer-Review Record

Deep Learning in Historical Architecture Remote Sensing: Automated Historical Courtyard House Recognition in Yazd, Iran

Heritage 2022, 5(4), 3066-3080; https://doi.org/10.3390/heritage5040159
by Hadi Yazdi 1,*, Shina Sad Berenji 2, Ferdinand Ludwig 1 and Sajad Moazen 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 4: Anonymous
Submission received: 8 August 2022 / Revised: 28 September 2022 / Accepted: 6 October 2022 / Published: 12 October 2022
(This article belongs to the Section Architectural Heritage)

Round 1

Reviewer 1 Report

Your paper is very interesting and clear. There are some suggestions I would like to highlight:

· All figures: please provide the source(s) of the spatial data, maps, and sketches.

· Subsection 2.1 (Airborne and Satellite Data): Describe precisely what data was used, its format, resolution, etc. A table would improve the readability of the data used. How was the data obtained? In what coordinate system?

· Line 177: "The authors gathered 1280 photos …" What photos? In what format? Where historical aerial photographs are used, the exact dates should be provided.

· Line 185: "… two different labels (historic and non-historic) are distinguished for the training process." Please provide more information about this process.

· Lines 193, 197: How were the templates/samples cut out of the full photos? Was it a manual process? Could errors have occurred during this process, e.g., a building not being cut out in its entirety?

· Line 200: "In this article, a binary classification model based on CNNs is proposed to recognize features of historic buildings in Google Earth images." Is it relevant in this part of the article? Are there no other similar articles?

· Line 246 – (Ref.) ??

· Line 246 – "airborne"? What does it mean here?

After this minor revision, I would suggest the publication in Heritage.

 

Author Response

We thank the reviewer for his/her extensive feedback. We have taken up many of the points.

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The manuscript includes a case study of the use of deep learning (convolutional neural networks) and satellite imagery in facilitating recognition of architectural heritage. The authors argue that the study describes new and unusual applications of these technologies and, apart from some exceptions listed below, I agree with that. In most cases the detection and management of the architectural heritage employs photographs (e.g. facade photos, for example in Demir et al., 2021) or LiDAR point clouds (Özeren and Korumaz, 2021), with aerial and satellite imagery being less important. This is an obvious consequence of the lack of resolution of the airborne images and the scarcity of 3D data that can be obtained from such photos, insufficient for implementing true BIM (Building Information Modeling). The manuscript shows that such data sources can be used in the study of architectural heritage. Thus, I agree that the manuscript is an important contribution and should be published. In the comments listed below I suggest minor improvements that can improve the scientific soundness of the paper.

The merits

lines 24-25 - In the abstract, the authors state that this is "one of the first efforts to use automated remote sensing techniques for recognizing historical house features". In the introduction, lines 39-44, they provide a list of similar studies that use machine learning for architectural heritage detection and classification. However, some earlier contributions that featured both satellite imagery and deep learning (the techniques employed by the authors and offered as proof of the novelty of the current study) appear to be omitted from the literature review. For example, see Abed et al. (2020) and Maltezos et al. (2018). In my opinion, these two papers cover similar themes, and the authors should decide whether they need to be referenced here. It would also be interesting to see arguments explaining what is new in the present study compared with the papers I mention here.

Although the authors use deep learning and aerial photographs to detect houses with a long history, there are also examples of similar techniques being used in studies of more recent historical changes. For example, see earlier contributions on the use of satellite imagery and deep learning for the detection of solar panels (definitely more recent than the historic houses in Yazd), such as Zech and Ranalli (2020); for a similar study that uses historical satellite images, see Wang et al. (2019). If these papers include case studies that employ the same techniques, they should definitely be cited in the manuscript.

Structure

The study is well designed and documented, the text flows well. However, in the discussion section and in conclusions there are certain things that can be improved, see comments below.

lines 343-347 - Please note that there are significant repetitions here. The content is nearly the same as in lines 327-330. If the numbers that document the performance of the model are significant and should be repeated, then I would suggest moving the lines 343-347 to the conclusions section.

lines 358-360 and lines 366-367 - These two sentences contain obvious comments rather than outcomes of the study. I therefore recommend deleting them or moving them to the discussion. The authors could also replace them with lines 343-347, where important results of the case study are documented (see my earlier comment).

Editing issues

I cannot comment on the quality of the language, but there are some, fortunately very few, editing issues that should be fixed:

line 179 - "(blue area). And, The dataset was" - remove "And,".

line 246 - "accessible as an open-source product (Ref.)." - I do not understand the "Ref." here. I think that the authors planned to add some references here.

line 257 - "CNNs in python and running the program on the Google Colab" - please use the capital letter in "Python".

lines 428-429 - "In Proceedings of the Proceedings of the Eighth Indian Conference" - I think that "In Proceedings of the Eighth Indian Conference" is more appropriate here.

line 434 - "In Proceedings of the Proceedings of the 2nd ACM International Conference" - The same problem here, I suppose that the authors mean "Proceedings of the 2nd ACM International".

References

Abed, M. H., Al-Asfoor, M., & Hussain, Z. M. (2020). Architectural heritage images classification using deep learning with CNN. Proceedings of the 2nd International Workshop on Visual Pattern Extraction and Recognition for Cultural Heritage Understanding, Bari, Italy.

Demir, G., Çekmiş, A., Yeşilkaynak, V. B., & Unal, G. (2021). Detecting visual design principles in art and architecture through deep convolutional neural networks. Automation in Construction, 130, 103826.

Maltezos, E., Protopapadakis, E., Doulamis, N., Doulamis, A., & Ioannidis, C. (2018, October). Understanding historical cityscapes from aerial imagery through machine learning. In Euro-Mediterranean Conference (pp. 200-211). Springer, Cham.

Özeren, Ö., & Korumaz, M. (2021). Lidar to HBIM for Analysis of Historical Buildings. Advanced LiDAR, 1(1), 27-31.

Wang, Z., Wang, Z., Majumdar, A., & Rajagopal, R. (2019). Identify Solar Panels in Low Resolution Satellite Imagery with Siamese Architecture and Cross-Correlation. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 7 pp.

Zech, M., & Ranalli, J. (2020, June). Predicting PV Areas in Aerial Images with Deep Learning. In 2020 47th IEEE Photovoltaic Specialists Conference (PVSC) (pp. 0767-0774). IEEE.

Author Response

We thank the reviewer for his/her extensive feedback. We have taken up many of the points.

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

The work is interesting to the scientific community. In this study, the applicability of CNNs for historical architectural surveying is demonstrated by using them to recognize historical buildings in aerial and satellite images of historic cities. 

First of all, it must be said that initially it is an interesting investigation from the point of view of its usefulness for detecting historical buildings in a certain area of Iran. However, this utility is very limited since it is applicable to a specific region that has historic buildings with a central courtyard, which is not globally extensible to other cities. It is of interest with respect to the patio detection technique and its positioning in the building. 

The first point the authors have to reinforce is why historic houses should be identified in a context of architectural heritage, that is, for what purpose (e.g., cataloging, registration, etc.), and how it is possible to distinguish historic buildings from buildings that are not historical. Is there a segmentation methodology to detect morphological patterns? Under which objectives of international organizations such as the International Council on Monuments and Sites (ICOMOS) would this work fall?

In addition, Section 1.2 should be simplified; instead, the authors should develop an independent section between the introduction and the state of the art, with a deeper analysis of the work that has already been published and of what the authors contribute as a novelty.

In the methodology, some of the image data used appear deficient, which can distort the learning process. Explain how these drawbacks have been handled and how they have been resolved.

When equations that are not original are used, they must be cited and their origin specified; if they are original, that is, created by the authors, this must also be stated.

The authors could also reinforce the paper with a comparison between QGIS and TensorFlow for modifying the CNNs in Python and running the program on the Google online platform.

Moving on to specifics: 

Lines 2-3: the title could include the mention of the patios since it is the key to the detection 

Lines 14-24: the “Abstract” is correct, although the traditional structure of “objective”, “methodology”, “results”, “conclusions” could have been better indicated. 

Lines 29-68: this part of the introduction summarizes the state of the art regarding deep learning, remote sensing, and their application to the recognition of historic architecture. Although the works cited are of interest, the authors should review this bibliography as a whole: almost all of it predates 2018, and there are more recent works on these topics, some from 2022, that could be worth including.

Lines 69-114: Regarding the historical question, it seems correct to me. Take care to reference the sources of the figures.

Lines 115-127: The description can be considered correct, although perhaps at this point or later it could be indicated the limitations of the case for the purposes of application to other regions. 

Lines 128-148: Although the objectives of the research have been reflected and although its use in other regions is pointed out, as has already been repeatedly commented, the limitations in this sense are not mentioned. 

Lines 151-164: the limitations of Google Earth in terms of copyright, use licenses, and issues related to accuracy, updating, etc. should be mentioned. Properly cite the source of Figure 5.

Lines 165-175: Here, the standard process to follow in image classification is succinctly described. It is understood that an explanatory graphic diagram could be provided. 

Lines 176-303: All these lines provide information on how the data has been collected, processed, applied to algorithms for training and validation, etc. 

Regarding the collection and annotation of data, in principle it can be understood as correct, although some details regarding the 1280 photos, their characteristics, and their prior selection should have been indicated. The same applies to the qualifications of the experts and the criteria followed to ensure their reliability. Regarding data augmentation, rotation and flipping are used but not displacement, and these are not the only techniques available.
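To make the augmentation point concrete, the following is a minimal sketch of rotation, flipping, and displacement applied to a small image tile. It is an illustration under stated assumptions, not the authors' pipeline: the tile is represented as a plain nested list, and the helper names (rotate90, flip_horizontal, shift_right, augment) are hypothetical.

```python
# Illustrative sketch only; helper names are hypothetical and the
# image tile is assumed to be a rectangular nested list of pixel values.

def rotate90(img):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in img]


def shift_right(img, pixels, fill=0):
    """Displace the grid to the right, padding vacated columns with `fill`."""
    out = []
    for row in img:
        k = max(0, min(pixels, len(row)))  # clamp the shift to the row width
        out.append([fill] * k + row[:len(row) - k])
    return out


def augment(img):
    """Return the four rotations of img and of its mirror image (8 tiles)."""
    variants = []
    for base in (img, flip_horizontal(img)):
        current = [list(row) for row in base]
        for _ in range(4):
            variants.append(current)
            current = rotate90(current)
    return variants


tile = [[1, 2], [3, 4]]
augmented = augment(tile)
shifted = shift_right(tile, 1)
```

Rotation and flipping preserve the courtyard layout exactly, whereas displacement discards pixels at one edge, which may be one reason to treat it separately when the building fills the tile.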

 

Regarding the convolutional neural network, the article proposes a CNN-based binary classification model. It also gives an account of the origin, structure, and operation of convolutional neural networks. All of this could be summarized through specific references, especially in lines 200-224 and, in the case of the reference to TensorFlow, up to line 256.

Regarding the neural network used, although it is described and even referenced on GitHub, the justification for choosing this option is not clear. It is true that Figure 9 provides a graphic scheme of the binary classification for historic and nonhistoric buildings and that the layers are indicated, along with their characteristics, activation function, last layer with sigmoid function, relevant parameters, etc. However, the reason for that design is missing: why that design was reached, and what tests or analyses were carried out to arrive at it rather than another that could match or improve the results, not only in terms of precision but also, for example, in terms of training time, real-time use, etc.

Regarding training, although the Binary_Crossentropy method is a solution for binary classification tasks, there are other methods, and a broader justification could be added as to why this one has been used. The same applies to the gradient descent algorithm, the learning rate, and the other settings selected by the authors.

In general, all these lines describe the network and the training for image classification by already known methods. The authors are advised to justify adequately the choice of this network and all its components; this is more important than expounding already known concepts (how a network of this type works, or what TensorFlow is), for which briefer explanations with proper references could be enough.

Finally, there are currently techniques that, unlike the proposed classification network, perform detection by passing the image once through an FCNN and producing the output. This architecture would be of great interest for the case and should be addressed in this regard, since it can offer not only greater speed but also additional data, providing information not only on the type of building but also on the location and type of courtyard.
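For reference, the loss in question can be written in a few lines. This is a minimal sketch of the standard binary cross-entropy averaged over a batch, assuming labels of 0 (nonhistoric) or 1 (historic) and sigmoid outputs in the open interval (0, 1); the function name and the clipping constant eps are assumptions for illustration, not taken from the manuscript.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over a batch.

    y_true: iterable of 0/1 labels.
    y_prob: iterable of predicted probabilities for class 1.
    eps:    clipping constant (an assumption) that keeps log() finite.
    """
    total = 0.0
    count = 0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count


# An uninformative prediction of 0.5 costs ln(2) per sample.
loss_uninformative = binary_cross_entropy([1, 0], [0.5, 0.5])
loss_confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
loss_confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
```

The loss grows sharply for confident wrong predictions, which is the property usually cited when it is preferred over, say, squared error for a sigmoid output.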

Lines 304-352: Regarding the results, first of all, real images of how the algorithm performs the classification in various cases are missing. This is understood to be very important. 

The results presented (the increasing accuracy and the loss function) are adequate, but a figure with the confusion matrix and its analysis is missing.
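As an indication of what such an analysis involves, the sketch below tallies a binary confusion matrix from labels and predicted probabilities and derives accuracy, precision, and recall. The 0.5 threshold, the function names, and the sample values are assumptions chosen for illustration; they are not results from the manuscript.

```python
def confusion_matrix(y_true, y_prob, threshold=0.5):
    """Count TP, FP, TN, FN, treating label 1 as 'historic'."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            counts["tp"] += 1
        elif pred == 1 and y == 0:
            counts["fp"] += 1
        elif pred == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def summarize(cm):
    """Derive accuracy, precision, and recall from the four counts."""
    tp, fp, tn, fn = cm["tp"], cm["fp"], cm["tn"], cm["fn"]
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up example values, for illustration only.
labels = [1, 1, 0, 0, 1]
probs = [0.9, 0.4, 0.2, 0.7, 0.8]
cm = confusion_matrix(labels, probs)
metrics = summarize(cm)
```

Reporting the four counts alongside accuracy shows whether errors are mostly missed historic houses or false alarms, which a single accuracy figure cannot reveal.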

 

Lines 353-374: Regarding the conclusions, the authors should explore in greater depth the limitations of the approach, the possibility of other, faster options that go beyond the distinction between historic and nonhistoric buildings, and the generalization of the method to other places in the world.

Author Response

We thank the reviewer for his/her extensive feedback. We have taken up many of the points.

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 4 Report

The manuscript is focused on deep learning, historical architecture and remote sensing. 

Authors should consider addressing the following remarks:

1. Lines 46-48: Are these the authors' conclusions, or are these findings based on the literature? If the findings are based on the literature, the authors should add appropriate citations.

2. Same remark as 1 for lines 51-56 and lines 57-60. Especially in the case of central courtyards, appropriate literature should be added.

3. Sections 1.2 and 1.3 should be reorganized. Extensive literature on central courtyards should be added (in terms of historical importance as well as construction methods for coping with weather conditions). The city of Yazd should be better presented with respect to its spatial location within Iran; a map is needed. Figures 1, 2 and 3 should be part of the presentation of the city of Yazd.

4. Section 1.4 is lacking in literature and references. The authors should consider reorganizing that section, perhaps moving it into Section 2.

5. In Figure 5, houses and neighborhoods are depicted outside the blue area? The authors should better present the city and its zones (see also comment 3).

6. Section 2.2: literature should be added.

7. Section 2.3 must be reorganized. The authors must describe what photos they used, from which source, at which scale, from which year, what the criteria for their selection were, etc.

8. Authors must provide adequate description and analysis on the method they used (especially the one described in lines 188-192).

9. Figure 6 must include type of photo, scale, date of photo etc. 

10. Add literature for lines 208-215 (especially for the statement "neural networks have proven to be the most effective of all forms of neural networks in computer vision").

11. Lines 218-229: better and further analysis is needed, with a justification for the selected CNN layers (Conv2d and Max-pooling2d), along with a better presentation and explanation of Figure 8.

12. Lines 244-247: a reference is missing.

13. Why is this network architecture selected for the specific case study? Based on what features of the historic buildings, etc.? How were the characteristics of courtyards embedded in the model?

14. Better analysis of the results is needed. It is not clear how the model worked with the training set.

15. Conclusions must be enriched. 

Author Response

We thank the reviewer for his/her extensive feedback. We have taken up many of the points.

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 3 Report

The authors have modified the text with substantial improvements.

Reviewer 4 Report

The authors addressed the Round 1 comments.

Minor language editing is needed.
