Article
Peer-Review Record

Artificial Intelligence for Routine Heritage Monitoring and Sustainable Planning of the Conservation of Historic Districts: A Case Study on Fujian Earthen Houses (Tulou)

Buildings 2024, 14(7), 1915; https://doi.org/10.3390/buildings14071915
by Jiayue Fan 1, Yile Chen 2,* and Liang Zheng 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 30 May 2024 / Revised: 19 June 2024 / Accepted: 20 June 2024 / Published: 22 June 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper addresses an extremely important topic, i.e. the expert survey of the wooden structures of earthen houses, which are typical of the Fujian province (China) with the use of the YOLOv8 model. This model, through multiple experiments, has been gradually improved in order to verify its effectiveness and final reliability in the conducted practical applications.

From the paper's review, the following revisions need to be made:

- The difference between machine learning and artificial intelligence needs to be clearly stated. The title simply refers to machine learning, while the abstract refers to artificial intelligence. Already in the title, it must be made clear that artificial intelligence systems are to be used.

- It should be specified whether the approach to the analysis of wooden structures has taken into account the possible conservation work that could be carried out. It is evident that any type of analysis must be based on goals and, if so, the ultimate goal is the conservation of such buildings. A specific conservation intervention may require a particular analysis and therefore a causal link should already be established in the methodology.

Overall, the paper is interesting both methodologically and in terms of scientific rigour and it is well written with an appropriate language. Some sections could be summarised.

Author Response

Reviewer 1

Comments and Suggestions for Authors

The paper addresses an extremely important topic, i.e. the expert survey of the wooden structures of earthen houses, which are typical of the Fujian province (China) with the use of the YOLOv8 model. This model, through multiple experiments, has been gradually improved in order to verify its effectiveness and final reliability in the conducted practical applications.

From the paper's review, the following revisions need to be made:

- The difference between machine learning and artificial intelligence needs to be clearly stated. The title simply refers to machine learning, while the abstract refers to artificial intelligence. Already in the title, it must be made clear that artificial intelligence systems are to be used.

Response: Thank you for your suggestion. Machine learning is an important branch and core technology of artificial intelligence. Artificial intelligence is a broad concept that aims to enable computers to simulate human intelligence, including learning, reasoning, decision-making, perception, and other capabilities. Machine learning focuses on enabling computers to learn and improve through data and algorithms, thereby gaining the ability to understand data, recognize patterns, and make predictions. It provides an important means for artificial intelligence to achieve intelligent behavior: through machine learning, computers can automatically extract knowledge and rules from large amounts of data and then analyze and make predictions on new data, helping artificial intelligence systems to better complete tasks such as image recognition, speech recognition, natural language processing, and intelligent recommendation. We originally emphasized machine learning in the title because it was our main research method, and the research objectives were achieved through experiments and model training.

Currently, we have changed the title to “Artificial Intelligence for Routine Heritage Monitoring and Sustainable Planning of the Conservation of Historic Districts: A Case Study on Fujian Earthen Houses (Tulou)”.

 

- It should be specified whether the approach to the analysis of wooden structures has taken into account the possible conservation work that could be carried out. It is evident that any type of analysis must be based on goals and, if so, the ultimate goal is the conservation of such buildings. A specific conservation intervention may require a particular analysis and therefore a causal link should already be established in the methodology.

Overall, the paper is interesting both methodologically and in terms of scientific rigour and it is well written with an appropriate language. Some sections could be summarised.

Response: Thank you very much for your recognition of our research. To strengthen these causal relationships, we revised the manuscript taking into account the comments of the other two reviewers as well. We also replaced some charts and indicators to make them more scientific and reasonable.

Reviewer 2 Report

Comments and Suggestions for Authors

This paper proposes an improved detection method based on the YOLOv8 model to detect damage to the wooden structures of Fujian Earthen Houses. The paper has a certain significance and is within the scope of Buildings. However, the scientific rigor of the paper needs to be improved, and some technical details need to be supplemented and explained. The main issues are listed below:

1. The detection of holes and cracking based on images can be well understood. How is it possible to detect strains based on images? This is the paper's biggest weakness in terms of scientific rigor, and it should not be published without reasonable revisions or explanations.

2. Leaving aside the mention of strains, even the detection of holes and cracking requires professional knowledge to quantify and calibrate their severity. This aspect seems to be largely lacking in this version of the paper; please add the corresponding content.

3. From Figure 16, it can be seen that the network model had basically reached a stable state around the 30th epoch. Is it necessary to continue training after the 30th epoch? In fact, after the model reaches the expected accuracy, continuing to train may not only fail to improve accuracy but also lead to overfitting or local convergence. This point needs to be discussed and explained for Figure 16. Relevant literature can be cited to support this discussion, such as: https://doi.org/10.1007/s13349-022-00635-8

4. The quality of Figures 12 to 17 is too poor and needs to be improved. The figures in this paper seem intended more for display, whereas scientific papers should pay more attention to scientific rigor and quantifiability.

5. Confusion matrices for the object detection results should be added to show the performance of the proposed method clearly. More quantitative metrics, such as ‘Recall’, should be introduced to strengthen the scientific rigor of this paper.

6. The abstract and conclusion need to be refined and further condensed to highlight innovative points and contributions. Moreover, the length of ‘Limitations and Future Work’ needs to be reduced.

7. Section 7 seems unnecessary, and deleting it is recommended.

Comments on the Quality of English Language

Moderate editing of English language is required.

Author Response

Reviewer 2

Comments and Suggestions for Authors

This paper proposes an improved detection method based on the YOLOv8 model to detect damage to the wooden structures of Fujian Earthen Houses. The paper has a certain significance and is within the scope of Buildings. However, the scientific rigor of the paper needs to be improved, and some technical details need to be supplemented and explained. The main issues are listed below:

Response: Thank you for your suggestion. We have revised the manuscript in line with your valuable suggestions; please see the parts in red font in the revised version for details.

 

1. The detection of holes and cracking based on images can be well understood. How is it possible to detect strains based on images? This is the paper's biggest weakness in terms of scientific rigor, and it should not be published without reasonable revisions or explanations.

Response: Thank you for your suggestion. In damage detection for wooden structures, deformation also deserves attention, in addition to common damage such as holes, cracks, and stains. However, because the wooden structures examined in this study rarely show deformation, it was not included among the main damage types. The specific explanation is as follows:

(1) The image recognition-based model of this study can be applied to deformation inspection of wooden structures. One method is to annotate the textures and edges of deformed wooden surfaces as a new damage class during model training. Another is to align images of the wooden structure taken at different times using image registration. The aligned images can then be compared by differencing, and the regions that differ can be used as labels for the model's detection task. Image recognition is therefore technically feasible for detecting deformation in wooden structures.
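The registration-and-differencing approach in (1) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the two photographs are already equal-sized grayscale images (here plain nested lists), the misalignment is a pure translation of a few pixels, and the function names (`register_translation`, `change_box`) are hypothetical. Real on-site photographs would need a proper registration method (for example feature matching or phase correlation) to cope with rotation, scale, and lighting changes.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]  # grayscale image: rows of pixel intensities


def _mean_abs_diff(ref: Image, mov: Image, dy: int, dx: int) -> float:
    """Mean absolute difference between ref[y][x] and mov[y + dy][x + dx] over the overlap."""
    h, w = len(ref), len(ref[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            my, mx = y + dy, x + dx
            if 0 <= my < h and 0 <= mx < w:
                total += abs(ref[y][x] - mov[my][mx])
                count += 1
    return total / count if count else float("inf")


def register_translation(ref: Image, mov: Image, max_shift: int = 3) -> Tuple[int, int]:
    """Exhaustively search integer shifts and return the (dy, dx) that best aligns mov to ref."""
    best_shift, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cost = _mean_abs_diff(ref, mov, dy, dx)
            if cost < best_cost:
                best_shift, best_cost = (dy, dx), cost
    return best_shift


def change_box(ref: Image, mov: Image, shift: Tuple[int, int],
               threshold: float) -> Optional[Tuple[int, int, int, int]]:
    """Difference the aligned images and return a box (x1, y1, x2, y2), in reference
    coordinates, around all pixels whose change exceeds `threshold`."""
    dy, dx = shift
    h, w = len(ref), len(ref[0])
    xs, ys = [], []
    for y in range(h):
        for x in range(w):
            my, mx = y + dy, x + dx
            if 0 <= my < h and 0 <= mx < w and abs(ref[y][x] - mov[my][mx]) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no change above the threshold
    return (min(xs), min(ys), max(xs), max(ys))
```

In this sketch the box returned by `change_box` plays the role of the differing region that could be exported as a training label; the threshold and any minimum region size would have to be tuned on real image pairs.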

(2) The wooden structures of Fujian Tulou rarely deform, mainly because of their ingenious structural design. They use mortise-and-tenon joints within a timber frame; these traditional techniques effectively distribute and carry building loads, reduce stress concentrations in the timber, and help prevent deformation. Mortise-and-tenon joints bind the timber members tightly, giving the structure strong integrity and making it less prone to deformation.

(3) The risk of deformation in the timber structures of Fujian Tulou is relatively low, mainly because the mountainous areas of Fujian Province where the Tulou are located have a mild, humid climate in which relative humidity is high but varies little. This stable environment helps keep the moisture content of the wood in balance, reducing the swelling and shrinkage caused by alternating wet and dry conditions and thereby the risk of deformation.

 

2. Leaving aside the mention of strains, even the detection of holes and cracking requires professional knowledge to quantify and calibrate their severity. This aspect seems to be largely lacking in this version of the paper; please add the corresponding content.

Response: Thank you for your suggestion. We added the following content to the article:

Early wood defect detection relied mainly on stress wave, X-ray, and ultrasonic technologies. Stress wave methods demand strict environmental control, and the equipment must be fixed to the wood. In X-ray imaging, poor image quality and low visual contrast limit the sensitivity of defect identification. Ultrasonic waves can reveal both internal and external defects, but they are sensitive to external conditions and easily disturbed. Each of these methods has its own advantages and disadvantages, which affect the accuracy and efficiency of detection [56]. Visual inspection is therefore widely used as a standard for early detection and damage assessment of wood, and these preliminary assessments give professionals an effective basis for later targeted inspections with minimal impact on the buildings. YOLO-based detection can be deployed quickly and provides preliminary visual assessments for early wood defect detection [57,58].

 

3. From Figure 16, it can be seen that the network model had basically reached a stable state around the 30th epoch. Is it necessary to continue training after the 30th epoch? In fact, after the model reaches the expected accuracy, continuing to train may not only fail to improve accuracy but also lead to overfitting or local convergence. This point needs to be discussed and explained for Figure 16. Relevant literature can be cited to support this discussion, such as: https://doi.org/10.1007/s13349-022-00635-8

Response: Thank you for your suggestion. Although Figure 16 shows that the model had largely stabilized by around the 30th epoch, we believe that it was still necessary to continue training to 200 epochs, for the following reasons:

Stability verification: Although the model stabilized around the 30th epoch, this does not mean that it will not fluctuate during subsequent training. Continued training helps verify the model's stability and ensures that it maintains good performance as training proceeds.

Measures to prevent overfitting: To avoid overfitting, we used several regularization methods during training, such as dropout and L2 regularization, as well as early stopping. With these measures in place, continued training can further optimize the model parameters without increasing the validation loss.

Overcoming local convergence: Training for more epochs may help the model escape local optima and reach a better solution. Although the model already showed good results at the 30th epoch, we found through multiple experiments that continuing training to 200 epochs can increase the likelihood of finding a better parameter combination, thereby improving the model's generalization ability.

Learning rate scheduling strategy: We adopted a learning rate decay strategy during training. As training progresses, the learning rate gradually decreases, which allows finer parameter adjustments in the later stages of training and thereby improves final performance. This strategy works better with longer training.
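As an illustration of the learning rate decay described above, the sketch below implements a linear decay from an initial rate `lr0` to a final rate `lr0 * lrf` over the whole run, which resembles the default linear schedule used in Ultralytics YOLOv8 training. The values `lr0=0.01` and `lrf=0.01` are assumed placeholders; the authors' actual schedule and settings are not stated in this response.

```python
def linear_lr(epoch: int, epochs: int, lr0: float = 0.01, lrf: float = 0.01) -> float:
    """Learning rate at `epoch` under linear decay from lr0 to lr0 * lrf.

    epoch:  current epoch index (0-based)
    epochs: total number of training epochs
    lr0:    initial learning rate (assumed value)
    lrf:    final learning rate as a fraction of lr0 (assumed value)
    """
    remaining = max(1.0 - epoch / epochs, 0.0)  # 1.0 at the start, 0.0 at the end
    return lr0 * (remaining * (1.0 - lrf) + lrf)


if __name__ == "__main__":
    # Compare the rate near the observed plateau (epoch 30) with later epochs of a 200-epoch run.
    for e in (0, 30, 100, 200):
        print(e, round(linear_lr(e, 200), 6))
```

Under these assumed values, the rate at epoch 30 is still about 85% of the initial rate (0.008515 versus 0.01), so the small, fine-grained updates that the decay strategy relies on only occur in the later epochs.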

In summary, training for 200 epochs in this study is not redundant; it reflects a comprehensive consideration of model stability, generalization ability, and the search for a better optimum. The experimental design and training strategies of this study are intended to ensure that the proposed model achieves high reliability and accuracy in practical applications.
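The reviewer's concern about the plateau around the 30th epoch can be made concrete with a patience-based early stopping rule of the kind mentioned in the response. The sketch below is a hypothetical helper, not the authors' code: it scans a per-epoch validation score (for example validation mAP) and reports the epoch at which training would stop if the score has not improved by more than `min_delta` for `patience` consecutive epochs.

```python
from typing import Optional, Sequence


def early_stop_epoch(val_scores: Sequence[float], patience: int,
                     min_delta: float = 0.0) -> Optional[int]:
    """Return the 0-based epoch at which patience-based early stopping triggers,
    or None if the score keeps improving often enough to train to the end.

    val_scores: validation metric per epoch, higher is better (e.g. mAP)
    patience:   epochs allowed without sufficient improvement
    min_delta:  minimum increase that counts as an improvement
    """
    best = float("-inf")
    best_epoch = -1
    for epoch, score in enumerate(val_scores):
        if score > best + min_delta:
            best, best_epoch = score, epoch  # new best: reset the patience counter
        elif epoch - best_epoch >= patience:
            return epoch  # no sufficient improvement for `patience` epochs
    return None
```

Choosing `patience` and `min_delta` is exactly the trade-off at issue: a short patience stops soon after a plateau, whereas a long patience leaves room for the late improvements the authors report observing.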

 

Regarding the literature, we added the following:

In addition to YOLO-based visual recognition, deep learning has been extensively applied in the field of building health monitoring. For instance, in civil engineering, researchers have used Convolutional Neural Networks (CNNs) to automatically identify and classify cracks on concrete structural surfaces, significantly enhancing the efficiency and accuracy of crack detection. For steel bridges, scholars have proposed a method based on Bidirectional Long Short-Term Memory (BiLSTM) networks that learns the temperature-strain distribution characteristics of steel box girders under non-uniform temperature fields, achieving intelligent perception of the redistribution of the structural strain field [65]. These results demonstrate that deep learning techniques can effectively process massive monitoring data and uncover the intrinsic patterns of structural responses, serving as a crucial supplement to traditional manual inspection methods.

Given that wooden structures also face issues such as aging degradation and biological decay, deep learning holds promising prospects for the health assessment of wooden constructions. By deploying sensors to measure temperature, humidity, and strain at critical points of wooden structures and using deep learning algorithms to analyze long-term monitoring data, it is feasible to detect internal anomalies in the wood in a timely manner, thereby providing early warnings of potential structural safety hazards. This approach can overcome the limitations of visual inspection and achieve a macroscopic evaluation of the state of wooden constructions. Integrated with localized detection technologies such as YOLO, it could form a comprehensive, multi-scale health monitoring system for wooden structures, offering significant technical support for their sustainable protection. However, because the material properties of wood and steel differ significantly, applying deep learning to wooden structures requires extensive exploration and optimization, which leaves broad scope for innovation by researchers in related fields.
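To make the sensor-based monitoring idea more concrete, the sketch below flags anomalous readings in a single monitoring series (for example moisture or strain at one measurement point) using a rolling z-score. This is deliberately a simple statistical baseline swapped in for illustration, not the deep learning approach the paragraph envisages; the window length, threshold, and function name are assumptions.

```python
import statistics
from typing import List, Sequence


def rolling_zscore_anomalies(series: Sequence[float], window: int,
                             z_thresh: float = 3.0) -> List[int]:
    """Return indices whose value deviates from the preceding `window` readings
    by more than `z_thresh` population standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = statistics.fmean(history)
        sd = statistics.pstdev(history)
        # Skip perfectly constant windows, where the z-score is undefined.
        if sd > 0 and abs(series[i] - mean) / sd > z_thresh:
            flagged.append(i)
    return flagged
```

A learned model could replace the fixed rolling statistics with patterns learned from long-term data, such as seasonal humidity cycles, which a short rolling window cannot capture.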

 

4. The quality of Figures 12 to 17 is too poor and needs to be improved. The figures in this paper seem intended more for display, whereas scientific papers should pay more attention to scientific rigor and quantifiability.

Response: Thank you for your suggestion. To improve figure quality, we will redraw all of the figures mentioned. In addition, in line with your next comment, we will add confusion matrices and more quantitative metrics to strengthen the scientific rigor of the article.

 

5. Confusion matrices for the object detection results should be added to show the performance of the proposed method clearly. More quantitative metrics, such as ‘Recall’, should be introduced to strengthen the scientific rigor of this paper.

Response: Thank you for your suggestion. We will replace the relevant figures and add more quantitative metrics as well as confusion matrix plots.
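Recall, and the counts behind a detection confusion matrix, depend on how predicted boxes are matched to ground-truth boxes. The sketch below shows one common convention, assumed here rather than taken from the paper: predictions are processed in descending confidence, each is matched to an unmatched ground-truth box of the same class with an IoU of at least 0.5, and unmatched predictions and ground truths become false positives and false negatives. Because it only matches within a class, it yields per-class true positive, false positive, and false negative counts but not the cross-class confusions a full confusion matrix would show. Class names and the threshold are illustrative.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Tuple[str, Box, float]],
                     gts: List[Tuple[str, Box]],
                     iou_thr: float = 0.5) -> Dict[str, Tuple[float, float]]:
    """Per-class (precision, recall) from greedy, confidence-ordered matching."""
    classes = sorted({c for c, _ in gts} | {c for c, _, _ in preds})
    result: Dict[str, Tuple[float, float]] = {}
    for cls in classes:
        cls_gts = [box for c, box in gts if c == cls]
        cls_preds = sorted((p for p in preds if p[0] == cls),
                           key=lambda p: p[2], reverse=True)
        matched = set()  # indices of ground-truth boxes already claimed
        tp = 0
        for _, box, _ in cls_preds:
            best_j, best_iou = -1, iou_thr
            for j, gt_box in enumerate(cls_gts):
                overlap = iou(box, gt_box)
                if j not in matched and overlap >= best_iou:
                    best_j, best_iou = j, overlap
            if best_j >= 0:
                matched.add(best_j)
                tp += 1
        fp = len(cls_preds) - tp  # predictions with no matching ground truth
        fn = len(cls_gts) - tp    # ground-truth damage the model missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        result[cls] = (precision, recall)
    return result
```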

 

6. The abstract and conclusion need to be refined and further condensed to highlight innovative points and contributions. Moreover, the length of ‘Limitations and Future Work’ needs to be reduced.

Response: Thank you for your suggestion. We have condensed this passage.

The main conclusions of this study are as follows. (1) Through multiple experiments, the YOLOv8 model was optimized to perform well in detecting damage to wooden structures. Removing samples with complex backgrounds, improving label quality, and adjusting hyperparameters significantly improved the model's detection accuracy and stability. In the final experiment, the model achieved an overall mAP of 57.48% and detected almost all damage points in the field test, meeting the needs of inspection work. (2) Feature layer analysis demonstrated the model's strong ability to process and identify damage to wooden structures at different scales. On uncropped high-resolution on-site photos, the model accurately identified and annotated various types of damage on wooden surfaces, verifying its effectiveness in practical applications. (3) In the field test at KuiJu Lou in Fujian, the model performed well in complex environments and reliably detected damage types such as holes, stains, and cracks in wooden structures, confirming its efficiency and stability in practical applications and providing technical support for the protection and restoration of Fujian Tulou.
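For readers interpreting the reported overall mAP of 57.48%, the sketch below shows how average precision (AP) is commonly computed for one damage class from ranked detections, using all-point interpolation of the precision-recall curve, and how mAP averages AP over classes. It is an illustration under assumptions: evaluation toolkits differ in interpolation (COCO-style evaluation, for instance, samples 101 recall points) and in IoU thresholds (mAP@0.5 versus mAP@0.5:0.95), and the summary above does not specify which variant produced 57.48%.

```python
from typing import Dict, List, Tuple


def average_precision(scored: List[Tuple[float, bool]], num_gt: int) -> float:
    """AP for one class with all-point interpolation.

    scored: (confidence, is_true_positive) for every prediction of this class,
            where is_true_positive comes from IoU matching against ground truth
    num_gt: number of ground-truth objects of this class
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: at each rank, use the best precision reached at equal or higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated curve: precision times each recall increment.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class: Dict[str, float]) -> float:
    """mAP: unweighted mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

In this formulation, AP falls below 1 both when damage is missed (recall never reaches 1) and when false alarms are ranked above true detections; because mAP weights the hole, stain, and crack classes equally, a weaker class such as cracks pulls the overall figure down.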

This research has three main advantages: (1) Enhancing the efficiency and accuracy of cultural heritage protection by implementing automated and intelligent detection of wooden structure damage, reducing errors, omissions, and workload associated with manual inspection. (2) Expanding the application of computer vision in cultural heritage protection by verifying its feasibility and practicality through detailed experiments and analysis, providing valuable reference and inspiration for future research. (3) Providing valuable insight and data support for further model optimization and adjustment through in-depth experimental analysis of the model's working principles and performance in detecting and identifying wooden structure damage.

 

6.2 Limitations and Future Work

Despite the remarkable results, this study has some limitations:

(1) The model's accuracy in detecting cracks requires improvement, particularly when faced with complex backgrounds or light damage, leading to missed and false detections.

(2) The model's detection performance varies depending on the type of damage, with better results for some types (e.g., holes) and relatively poor results for others (e.g., stains and cracks).

(3) The current experiments focus mainly on the specific scenario of Fujian Tulou, and the model's adaptability and generalization ability for other types of wooden structures have not been fully verified.

Future research can be developed in the following directions:

(1) Dataset expansion: Collect more diverse wood structure damage data, especially from different environments and backgrounds, to improve the model's generalization ability.

(2) Model optimization: Explore advanced model architectures and training methods, such as multi-task learning and transfer learning, to improve detection performance on different damage types.

(3) Algorithm innovation: Investigate model structures and algorithms better suited to the damage characteristics of wooden materials to enhance accuracy and robustness.

(4) Multimodal learning: Combine data from various sources, such as drone imagery, laser scanning, hyperspectral imaging, or X-ray imaging, to achieve more comprehensive and accurate detection.

(5) Practical application and system deployment: Focus on the model's deployment and optimization in real-world scenarios, integrating it into mobile terminals, drones, or portable devices for on-site technical support and decision-making assistance.

(6) Cross-scenario application: Apply the model to other types of wooden structures to verify its adaptability and stability in different application scenarios and expand its scope of application.

 

7. Section 7 seems unnecessary, and deleting it is recommended.

Response: Thank you for your suggestion. We have removed this section.

 

Comments on the Quality of English Language

Moderate editing of English language is required.

Response: Thank you for your suggestion. To ensure rigor and accuracy, we used MDPI's English editing service to improve the language of the article.

Reviewer 3 Report

Comments and Suggestions for Authors

 

1. What are the main objectives of the study on the Fujian Earthen Houses (Tulou) conducted by Jiayue Fan, Yile Chen, and Liang Zheng?

2. How does the YOLOv8 model function in detecting damage to the wooden structures of Fujian Earthen Houses?

3. What specific types of damage were successfully identified by the YOLOv8 model during the field tests in KuiJu Lou?

4. What historical and architectural significance do Fujian Tulous hold, and how has this influenced their recognition by UNESCO?

5. Discuss the evolution of Tulou architecture from the Yuan Dynasty to the late 20th century, highlighting the key changes in construction materials and techniques.

6. What are the main challenges faced by professionals in evaluating and repairing the wooden structures of Fujian Tulous?

7. How do the results of this study demonstrate the effectiveness and reliability of the YOLOv8 model in practical applications?

8. What improvements were made to the YOLOv8 model to enhance its detection accuracy and stability?

9. In what ways does the application of machine learning-based object detection methods overcome the limitations of traditional evaluation methods for Tulou maintenance?

10. Explain the significance of achieving an mAP of 57.48% in the final experiment with the YOLOv8 model.

11. How does the integration of AI technology in heritage monitoring contribute to the sustainable conservation of historic districts?

12. What role does wood play in the construction and maintenance of Fujian Tulous, and why is it particularly susceptible to damage?

13. How has the influx of tourists impacted the structural integrity of Fujian Tulous, and what measures can be taken to mitigate this impact?

14. Compare and contrast the use of the YOLO algorithm in different fields such as security monitoring, autonomous driving, and architectural heritage preservation.

15. What are the implications of using computer vision technology for the preservation and monitoring of cultural heritage sites?

16. Discuss the transition in preservation philosophy from focusing on individual buildings to safeguarding entire historic districts.

17. How does the study address the balance between preserving original wooden components and replacing damaged ones in Tulou restoration?

18. What are the potential benefits of implementing a real-time monitoring system for wooden structures using YOLO technology?

19. How can the methodology developed in this study be applied to other types of historical buildings beyond Fujian Tulous?

20. What future developments in AI and computer vision technology could further enhance the preservation of architectural heritage?


Author Response

Reviewer 3

Comments and Suggestions for Authors

1. What are the main objectives of the study on the Fujian Earthen Houses (Tulou) conducted by Jiayue Fan, Yile Chen, and Liang Zheng?

Response: Thank you for your suggestion. We have added the aims of our study to the Abstract and to Section 1.3, Problem Statement and Objectives.

 

2. How does the YOLOv8 model function in detecting damage to the wooden structures of Fujian Earthen Houses?

Response: Thank you for your suggestion. We added the following content to the article:

The test results for column A show that the model successfully identified multiple holes (marked in blue) and cracks (marked in red). Comparing the original image with the result image shows that the model accurately locates these types of damage; even against a complex texture background, it can still effectively detect small holes and cracks.

The detection results for column B also demonstrate the model's strong performance. In the original image, the surface of the wooden column shows many types of damage and is partly covered by shadows. The model accurately identifies this damage in the result image; in particular, its detection of stains and cracks is unaffected by the shadows, demonstrating its reliability in handling intricate damage features.

The inspection results for column C further confirm the model's stability. The result image clearly marks the large stains and elongated cracks visible in the original image, demonstrating the model's high accuracy in identifying these types of damage. The model performs especially well on cracks and can still identify partially hidden cracks even when they are covered by large stains.

In the inspection results for column D, the model demonstrated comprehensive detection of multiple types of damage and resistance to interference from irrelevant elements. The original image clearly shows holes and cracks on the wooden column's surface, as well as a distinct area in the background. The model accurately marked these damaged locations in the result image. In hole detection in particular, the model accurately located multiple densely distributed holes, showing its strength in fine detail.

The test results for column E show that the model also detects cracks and stains on wooden columns well. The result image clearly marks all major damaged areas, especially long cracks. These efficient detection capabilities demonstrate the model's wide applicability across different types of damage.

The detection results for column F demonstrate the model's ability to detect a variety of small stains. The original image contains complex textures and multiple types of damage, and the model accurately marks all major types of damage in the result image, demonstrating its efficiency and stability in complex scenes.

The on-site measurements of the wooden columns of the KuiJu Lou building validated the model's damage detection capability in a complex real-world environment. The model accurately identified and annotated various types of damage in the wooden structure, such as holes, stains, and cracks, showing its reliability and efficiency in practical applications. These test results further demonstrate the model's strong performance in detecting damage to wooden structures and provide technical support for the protection and restoration of Fujian Tulou.
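The observation about densely distributed holes relates to a post-processing step common to YOLO-family detectors: low-confidence predictions are discarded, and overlapping boxes of the same class are pruned by non-maximum suppression (NMS). The sketch below is a simplified, hypothetical version of class-wise greedy NMS; the thresholds (`conf_thr=0.25`, `iou_thr=0.7`) are assumed values, not settings reported by the authors. A lower `iou_thr` suppresses more aggressively, which can remove correct boxes around adjacent holes, so this threshold matters for dense damage.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[str, Box, float]        # (class name, box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def classwise_nms(dets: List[Detection], conf_thr: float = 0.25,
                  iou_thr: float = 0.7) -> List[Detection]:
    """Drop detections below conf_thr, then greedily keep the highest-confidence
    box and suppress same-class boxes that overlap a kept box by IoU >= iou_thr."""
    candidates = sorted((d for d in dets if d[2] >= conf_thr),
                        key=lambda d: d[2], reverse=True)
    kept: List[Detection] = []
    for cls, box, score in candidates:
        # Boxes of different classes never suppress each other.
        if all(k_cls != cls or iou(box, k_box) < iou_thr for k_cls, k_box, _ in kept):
            kept.append((cls, box, score))
    return kept
```

Counting the kept detections per class would then give the kind of per-column damage summary described for columns A to F.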

 

3. What specific types of damage were successfully identified by the YOLOv8 model during the field tests in KuiJu Lou?

Response: Thank you for your suggestion. We added the following content to the article:

The test results of column A show that the model successfully identified multiple holes (marked in blue) and cracks (marked in red). From the original image to the re-sult image, it can be seen that the model accurately locates these types of damage, es-pecially in the complex texture background, and it can still effectively detect small holes and cracks.

The detection results for column B also demonstrate the model's strong performance. In the original image, the surface of the wooden column shows many types of damage and is partly covered by shadows. The model accurately identifies this damage in the result image; in particular, its detection of stains and cracks is unaffected by the shadows, demonstrating the model's reliability in handling intricate damage features.

The inspection results for column C further confirm the model's stability. The result image clearly marks the large stains and elongated cracks visible in the original image, demonstrating the model's high accuracy in identifying these types of damage. The model performs particularly well on cracks and can still identify partially hidden cracks even where they are covered by large stains.

In the inspection results for column D, the model demonstrated comprehensive detection of multiple damage types and resistance to interference from irrelevant elements. The original image clearly shows holes and cracks on the wooden column's surface, as well as a specific area in the background. The model accurately marked these damaged locations in the result image. In hole detection in particular, the model accurately located multiple densely distributed holes, showing its strength in handling fine detail.

The test results for column E show that the model also detects cracks and stains on wooden columns well. The result image clearly marks all major damaged areas, and long cracks are identified particularly well. This demonstrates the model's efficient detection and its wide applicability to different types of damage.

The detection results for column F demonstrate the model's ability to detect a variety of small stains. The original image contains complex textures and multiple types of damage, and the model accurately marks all major types of damage in the result image, demonstrating its efficiency and stability in complex scenes.

The on-site measurement of the wooden columns of the KuiJu Lou building validated the model's damage detection capability in a complex real-world environment. The model accurately identified and annotated various types of damage, such as holes, stains, and cracks in the wooden structure, showing its reliability and efficiency in practical applications. This result further demonstrates the model's performance in detecting damage to wooden structures and provides technical support for the protection and restoration of Fujian Tulou.

 

  1. What historical and architectural significance do Fujian Tulous hold, and how has this influenced their recognition by UNESCO?

Response: Thank you for your suggestion. Fujian Tulou has important historical and architectural significance. This unique architectural form reflects not only the essence of Chinese traditional culture and building skills but also the lifestyle and social organization of local residents. Fujian Tulou were mainly built from the 12th century to the early 20th century by the Hakka and Minnan people of Fujian to cope with mountainous terrain and social unrest. As a unique form of dwelling, a Tulou is not only a residence but also a defensive fortification that can resist external attacks. The historical significance of this architectural form lies in its defensive function, social organization, and cultural heritage. The heavy walls and enclosed design of the Tulou effectively defended against bandits and foreign enemies and protected the residents inside. A Tulou is usually inhabited by a large clan, in which each household has its own rooms but shares the public space; this way of life promotes close contact and cooperation among family members. In addition, the Tulou preserve a rich Hakka culture and folk customs, passing on traditional culture through activities such as sacrificial rites and festivals.

Fujian Tulou has a unique architectural style that combines traditional architectural elements of the Central Plains Han people with characteristics of the local natural environment, and it has high architectural and artistic value. These buildings are usually round or square and built with rammed-earth techniques, with walls 1.5 to 2 meters thick that provide good thermal insulation and earthquake resistance. The interior is arranged in multiple layers around a central public space, with corridors connecting the rooms on each floor, which both ensures privacy and encourages interaction among family members. This design reflects the practicality of the Tulou as well as its architectural aesthetics and clever use of space. In addition, the Tulou are built mainly from locally sourced loess, wood, and bamboo, reflecting the concept of ecological protection and adaptation to local conditions. This construction method not only suits the local natural environment but also reflects the wisdom and creativity of the Hakka people.

Fujian Tulou was inscribed on the World Heritage List in 2008 for its outstanding universal value. Fujian Tulou is a unique form of residential architecture in the mountainous areas of southern China, representing an outstanding example of traditional Chinese defensive residential architecture. As a unique cultural phenomenon, Tulou reflects the uniqueness and diversity of Hakka culture and is an important part of the culture of ethnic minorities in southern China. Many Tulou are well preserved and still serve as places for residence and community activities, combining history with modern life. Because they are well preserved, the Tulou provide rich historical information of great significance for the study of social history and cultural change in the mountainous areas of southern China.

Fujian Tulou has been highly recognized by UNESCO and has become part of the World Heritage for its unique historical and architectural value, as well as its important role in cultural inheritance and social organization. Tulou is not only a symbol of the wisdom and creativity of the Hakka people, but also an important part of Chinese traditional culture, which deserves to be cherished and protected.

 

We therefore added the following content to the article:

The architectural structures known as Fujian Tulou (earthen houses) are primarily situated within Fujian Province, China. In a wider sense, Tulou architecture is dispersed across the southeast and south of China, with a smaller number in Southeast Asia. UNESCO has recognized 46 Tulous in Fujian Province as “outstanding examples of traditional architecture and function” and listed them as World Heritage Sites [1]. The Tulou, whose history spans several dynasties and nearly 800 years, from the Yuan Dynasty (A.D. 1271–A.D. 1368) to the latter half of the 20th century [2], owe their main characteristics to local Hakka and Minnan residents, who adapted their construction to local conditions [3]. The emergence of the Tulou is associated with the turbulent political climates in the late periods of various dynasties. The origin of Tulou architecture in Fujian dates back to the late Song Dynasty (A.D. 960–A.D. 1279) and the early Yuan Dynasty (A.D. 1271–A.D. 1368), with a significant increase in Tulou construction towards the end of the Ming Dynasty (A.D. 1488–A.D. 1605) [4]. These Tulous represent a unique form of communal living and defensive organization in the mountainous regions of southern China. Moreover, their harmonious interaction with the environment positions them as exemplary models of human habitation [5].

 

  1. Discuss the evolution of Tulou architecture from the Yuan Dynasty to the late 20th century, highlighting the key changes in construction materials and techniques.

Response: Thank you for your suggestion. We added the following content to the article:

During the Ming and Qing Dynasties (A.D. 1368–A.D. 1912), Tulou architecture evolved further, introducing the round form and increasing significantly in scale. The transition from simple rammed earth to combinations of earth and wood and of earth and stone marked the beginning of a more complex construction process. The extensive use of wood shifted the function of these structures from military to residential, integrating clan and community living with cultural significance [11].

 

  1. What are the main challenges faced by professionals in evaluating and repairing the wooden structures of Fujian Tulous?

Response: Thank you for your suggestion. We added the following content to the article:

Additionally, the daily accommodation of large populations and the influx of tourists have led to an increase in damage to the wooden structures.

 

  1. How do the results of this study demonstrate the effectiveness and reliability of the YOLOv8 model in practical applications?

Response: Thank you for your suggestion. We added the following content to the article:

In the final experiment, the model's overall mAP reached only 57.48% at its highest. However, during the field test, the model successfully identified nearly all damage points, including holes, stains, and cracks in the wooden structure of the earthen building, effectively fulfilling the requirements of the detection task.
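A modest mAP and near-complete field detection are not necessarily contradictory. mAP counts a detection as correct only when its box overlaps the annotated box closely enough (an IoU threshold), whereas an inspector on site counts a damage point as found whenever it is flagged at all. The plain-Python sketch below is illustrative only: it assumes pixel boxes in (x1, y1, x2, y2) form and greedy matching, and the thresholds and boxes are hypothetical, not the paper's evaluation settings or data. It shows how the same predictions can score 50% precision and recall at IoU 0.5 but 100% at a looser IoU of 0.3, because one loosely drawn box around an elongated defect falls just below the stricter threshold.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Tuple[Box, float]], gts: List[Box],
                     iou_thr: float = 0.5) -> Tuple[float, float]:
    """Greedy matching: higher-confidence predictions claim ground truths first."""
    matched = set()
    tp = 0
    for box, _conf in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:  # each ground truth can be matched only once
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall


# Hypothetical example: two annotated defects, the second one boxed loosely.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((0, 0, 10, 10), 0.9), ((20, 20, 28, 40), 0.8)]
print(precision_recall(preds, gts, iou_thr=0.5))  # (0.5, 0.5)
print(precision_recall(preds, gts, iou_thr=0.3))  # (1.0, 1.0)
```

The second prediction overlaps its target with an IoU of about 0.44, so whether it "counts" depends entirely on the chosen threshold.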

 

  1. What improvements were made to the YOLOv8 model to enhance its detection accuracy and stability?

Response: Thank you for your suggestion. Specifically, we significantly improved the model's detection accuracy and stability by removing samples with complex backgrounds, improving label quality, and adjusting hyperparameters.
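The response does not detail how label quality was improved. As one hypothetical way to do this step, the sketch below checks YOLO-format label files, the plain-text format used for YOLOv8 training, in which each line reads `class_id x_center y_center width height` with coordinates normalized to [0, 1]. The class order, the minimum box size, and the tolerance are assumptions for illustration, not the authors' settings.

```python
from typing import List, Tuple

# Hypothetical class list for illustration; the damage types named in the
# study are holes, stains and cracks, but the ids here are assumed.
CLASS_NAMES = ["hole", "stain", "crack"]
EPS = 1e-6  # tolerance for floating-point rounding in the label files


def check_label_lines(lines: List[str], num_classes: int = len(CLASS_NAMES),
                      min_size: float = 0.002) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for each suspicious YOLO-format label line.

    Each line is expected to read: class_id x_center y_center width height,
    with the four coordinates normalized to [0, 1] by image width/height.
    """
    problems: List[Tuple[int, str]] = []
    for i, raw in enumerate(lines, start=1):
        parts = raw.split()
        if not parts:
            continue  # blank lines carry no box
        if len(parts) != 5:
            problems.append((i, "expected 5 fields"))
            continue
        try:
            cls = int(parts[0])
            xc, yc, w, h = (float(p) for p in parts[1:])
        except ValueError:
            problems.append((i, "non-numeric field"))
            continue
        if not 0 <= cls < num_classes:
            problems.append((i, "unknown class id"))
        elif not all(-EPS <= v <= 1 + EPS for v in (xc, yc, w, h)):
            problems.append((i, "coordinates not normalized"))
        elif w < min_size or h < min_size:
            problems.append((i, "box too small, likely a stray click"))
        elif (xc - w / 2 < -EPS or xc + w / 2 > 1 + EPS
              or yc - h / 2 < -EPS or yc + h / 2 > 1 + EPS):
            problems.append((i, "box extends outside the image"))
    return problems
```

In practice one might run this over every label file in the dataset and send flagged files back for re-annotation before training.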

 

  1. In what ways does the application of machine learning-based object detection methods overcome the limitations of traditional evaluation methods for Tulou maintenance?

Response: Thank you for your suggestion. We added the following content to the article:

The machine learning-based object detection method can identify damage efficiently and accurately, overcoming the limitations of traditional evaluation methods in terms of manpower and time costs.

 

  1. Explain the significance of achieving an mAP of 57.48% in the final experiment with the YOLOv8 model.

Response: Thank you for your suggestion. In the final experiment, the model's overall mAP reached only 57.48% at its highest. Nevertheless, in the field test on holes, stains, and cracks in the wooden structure of the earthen building, the model captured almost all damage points, basically meeting the needs of the inspection work.
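For readers less familiar with the metric, mAP is the mean, over damage classes, of the area under each class's precision-recall curve (the average precision, AP). A class with many missed instances pulls the average down even when the model's highest-confidence detections are correct. The sketch below computes all-point interpolated AP in plain Python. Interpolation details and IoU thresholds differ between toolkits, and the text does not state whether 57.48% is mAP@0.5 or mAP@0.5:0.95, so this illustrates the metric rather than reproducing the reported figure. The per-class numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def average_precision(hits: List[Tuple[float, bool]], num_gt: int) -> float:
    """All-point interpolated AP for one class.

    hits: (confidence, is_true_positive) for every prediction of this class,
    where true/false positives were already decided by IoU matching.
    num_gt: number of annotated instances of this class.
    """
    if num_gt == 0:
        return 0.0
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for _conf, is_tp in sorted(hits, key=lambda h: h[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the stepwise precision-recall curve.
    return sum((recalls[i] - recalls[i - 1]) * precisions[i]
               for i in range(1, len(recalls)))


def mean_average_precision(
        per_class: Dict[str, Tuple[List[Tuple[float, bool]], int]]) -> float:
    """mAP: unweighted mean of the per-class APs."""
    aps = [average_precision(hits, n) for hits, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical numbers, not the study's data.
example = {
    "hole": ([(0.9, True), (0.8, False), (0.7, True)], 2),  # AP = 5/6
    "crack": ([(0.6, True)], 4),  # precise, but misses 3 of 4 cracks: AP = 0.25
}
print(round(mean_average_precision(example), 4))  # 0.5417
```

Here every "crack" prediction is correct, yet that class's AP is only 0.25 because three annotated cracks were never detected, which is why a single mAP figure should be read alongside per-class recall.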

 

  1. How does the integration of AI technology in heritage monitoring contribute to the sustainable conservation of historic districts?

Response: Thank you for your suggestion. We added the following content to the article:

With the development of big data and artificial intelligence technology, it has become possible to conduct non-destructive testing and regular heritage monitoring in historical districts such as Tulou.

 

  1. What role does wood play in the construction and maintenance of Fujian Tulous, and why is it particularly susceptible to damage?

Response: Thank you for your suggestion. We added the following content to the article:

Tulous are a significant architectural cultural heritage, and their heavy reliance on wooden materials renders them particularly susceptible to damage from natural causes in southern China [12].

 

  1. How has the influx of tourists impacted the structural integrity of Fujian Tulous, and what measures can be taken to mitigate this impact?

Response: Thank you for your suggestion. We added the following content to the article:

Therefore, applying YOLO-based visual inspection locally can help prioritize the maintenance and care of severely damaged and heavily visited Tulous, allocating limited maintenance funds more effectively and enhancing the sustainability of these buildings. The dense clustering of local wooden buildings also allows YOLO technology to be deployed locally at low cost to help prevent accidents caused by deteriorating wooden materials.
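The paragraph describes prioritizing buildings by damage severity and visitor load but does not specify a scoring rule. The sketch below is a purely hypothetical illustration of how per-building detection counts from a YOLO model might be combined with visitor numbers into a maintenance ranking. The severity weights, the visitor scaling, and the building names are all assumptions, not part of the study.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical severity weights per damage class; a real scheme would be set
# by conservation specialists, not by this sketch.
SEVERITY: Dict[str, float] = {"hole": 2.0, "crack": 3.0, "stain": 1.0}


@dataclass
class TulouSurvey:
    name: str
    damage_counts: Dict[str, int]  # detections per class from one inspection round
    annual_visitors: int


def priority_score(s: TulouSurvey, max_visitors: int) -> float:
    """Weighted damage, scaled up by at most 2x for the busiest building."""
    damage = sum(SEVERITY.get(cls, 1.0) * n for cls, n in s.damage_counts.items())
    visitor_share = s.annual_visitors / max_visitors if max_visitors else 0.0
    return damage * (1.0 + visitor_share)


def rank_for_maintenance(surveys: List[TulouSurvey]) -> List[str]:
    """Building names ordered from highest to lowest maintenance priority."""
    max_visitors = max((s.annual_visitors for s in surveys), default=0)
    ranked = sorted(surveys, key=lambda s: priority_score(s, max_visitors),
                    reverse=True)
    return [s.name for s in ranked]


surveys = [
    TulouSurvey("Building A", {"crack": 2}, 1_000),
    TulouSurvey("Building B", {"stain": 4, "hole": 1}, 10_000),
    TulouSurvey("Building C", {"hole": 5}, 0),
]
print(rank_for_maintenance(surveys))  # ['Building B', 'Building C', 'Building A']
```

Making the visitor factor multiplicative means an undamaged building is never prioritized on visitor numbers alone, while heavy use amplifies the urgency of damage that is already present.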

 

  1. Compare and contrast the use of the YOLO algorithm in different fields such as security monitoring, autonomous driving, and architectural heritage preservation.

Response: Thank you for your suggestion. We added the following content to the article:

The YOLO algorithm transforms the object detection problem into a regression problem, achieving end-to-end object detection through a single convolutional neural network [36]. Subsequently, researchers have introduced versions such as YOLOv2, YOLOv3, YOLOv4, and YOLOv5 [37], which further enhance the performance and applicability of the algorithm. Areas such as security monitoring [38], autonomous driving [39], medical imaging [40], and industrial quality inspection [41] have widely applied the YOLO series. Looking forward, the development of the YOLO algorithm may include integration with other technologies and expansion into more application fields [42].
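Because YOLO regresses many candidate boxes in a single forward pass, several predictions often cover the same defect. YOLO detectors up to and including YOLOv8 prune these duplicates with non-maximum suppression (NMS) before output. The plain-Python sketch below shows greedy, class-agnostic NMS. The 0.45 overlap threshold and the boxes are assumptions for illustration, and production implementations typically run per class on vectorized tensors rather than Python lists.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], scores: List[float],
                        iou_thr: float = 0.45) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < iou_thr for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate predictions of one crack plus a separate hole (hypothetical).
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of 0.81, so it is discarded as a duplicate; the third box overlaps nothing and survives.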

 

The current applications of the YOLO algorithm in fields such as security monitoring, autonomous driving, and architectural heritage preservation share common features. Notably, in each field it replaces traditional manual recognition and operation with a more advanced and economical approach, addressing problems inherent in those traditional methods [49,50]. Furthermore, its basis in deep learning and related techniques gives the method long-term scalability and room for future upgrades.

 

  1. What are the implications of using computer vision technology for the preservation and monitoring of cultural heritage sites?

Response: Thank you for your suggestion. We added the following content to the article:

By summarizing the types of damage to Tulous' wooden components and establishing a corresponding image database of the damage, training intelligent recognition models can assist managers in promptly discovering and locating problems with wooden components, thereby formulating targeted maintenance and repair plans.

 

  1. Discuss the transition in preservation philosophy from focusing on individual buildings to safeguarding entire historic districts.

Response: Thank you for your suggestion. We added the following content to the article:

Since the concept of protecting buildings and areas of historical value was first proposed in the 1933 Athens Charter, the philosophy surrounding the preservation of historical districts has progressively evolved and deepened. By the time of the 1976 Nairobi Recommendation, historical districts were depicted as vivid representations of diverse cultural, religious, and social characteristics, emphasizing the importance of preserving these areas for future generations [22]. The recommendation also highlighted that integrating the preservation of historical districts with modern social life should become a core component of urban planning and land development.

Further developments were made in 2005 with the Vienna Memorandum and the Declaration on the Conservation of Historic Urban Landscapes, which expanded the concept of historic landscape conservation. These documents moved beyond the traditional confines of historic centers and building complexes to include a broader range of land and landscape environments, marking a significant expansion in the preservation philosophy for historical areas. Collectively, these documents underscore the pivotal role of historical districts in maintaining cultural heritage and fostering social identity, while also reflecting the evolution of preservation strategies to meet the demands of modern urban development.

 

  1. How does the study address the balance between preserving original wooden components and replacing damaged ones in Tulou restoration?

Response: Thank you for your suggestion. We added the following content to the article:

According to field survey data, wooden components in Fujian Tulous generally suffer from varying degrees of damage. During repairs, maintenance personnel frequently need to strike a balance between preserving original components and replacing damaged ones in order to preserve the Tulous' historical authenticity to the greatest extent possible. Because maintenance funds for the Tulou are generally limited, and the upkeep and operation of Tulous within tourist areas are currently managed by local cultural and tourism companies, financial constraints force preservation efforts to weigh multiple considerations. As the Tulous have been developed in recent years, there is a compelling need to extend the lifespan of wooden components that are not severely damaged for as long as possible. At the same time, whether newly installed replacement components meet standard specifications under a low-budget maintenance regime is open to question. This situation underscores the complex decisions involved in preserving these historical structures while navigating limited funding and regulatory compliance.

 

  1. What are the potential benefits of implementing a real-time monitoring system for wooden structures using YOLO technology?

Response: Thank you for your suggestion. We added the following content to the article:

The machine learning-based object detection method can identify damage efficiently and accurately, overcoming the limitations of traditional evaluation methods in terms of manpower and time costs.

 

  1. How can the methodology developed in this study be applied to other types of historical buildings beyond Fujian Tulous?

Response: Thank you for your suggestion. We added the following content to the article:

In the Asian region, where predominantly wooden buildings are widely distributed, there is a pressing need for more economical and rapid long-term monitoring solutions. This context provides significant opportunities for broader adoption of this model. Additionally, the characteristics of YOLO technology give it the potential to support monitoring solutions for other architectural materials, such as stone, meeting the diverse conservation needs of various architectural heritages. This adaptability allows the technology to be tailored to the specific requirements of different types of built heritage, enhancing its applicability across a broad spectrum of conservation scenarios.

 

  1. What future developments in AI and computer vision technology could further enhance the preservation of architectural heritage?

Response: Thank you for your suggestion. We added the following content to the article:

In the Asian region, where predominantly wooden buildings are widely distributed, there is a pressing need for more economical and rapid long-term monitoring solutions. This context provides significant opportunities for broader adoption of this model. Additionally, the characteristics of YOLO technology give it the potential to support monitoring solutions for other architectural materials, such as stone, meeting the diverse conservation needs of various architectural heritages. This adaptability allows the technology to be tailored to the specific requirements of different types of built heritage, enhancing its applicability across a broad spectrum of conservation scenarios.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The reviewer finds that the authors have made extensive revisions according to the comments. Minor revision needs to be done before recommending publication:

1. From the response to the previous round of comments, the reviewer finds that the authors seem to have confused the definitions of strain and deformation. Strain is a local behavior caused by structural degradation, which is related to stress. Deformation is a global behavior caused by structural degradation, which is related to displacement. Although deformation can cause strain, it is not equal to strain. The authors must provide a reasonable and appropriate discussion and explanation in the text.

a. Regarding the definition of strain, the author can refer to relevant literature, such as: https://doi.org/10.1109/TIM.2023.3343742

b. Regarding the definition of deformation, the author can refer to relevant literature, such as: https://doi.org/10.1109/JSEN.2023.3294912

2. In the monitoring and damage detection of architectural heritage, there will be considerable interference from abnormal signals such as noise or other uncertain factors that are unrelated to structural degradation. Relevant intelligent algorithms should be made more robust to such uncertain interference. It is suggested that this point be added to ‘Limitations and Future Work’. The two references above can also support the discussion of this point.

Comments on the Quality of English Language

Minor editing of English language is required.

Author Response

Dear Reviewer 2,
Thank you very much for your prompt response and your opinions. In this round, we have added the following changes: (Please refer to the red font in the article)

Strain in engineering structures describes the change in length of an object per unit of its original length after a force is applied. It is a dimensionless quantity that measures a material's degree of deformation relative to its original dimensions. Strain usually manifests as local stretching or compression of an object, reflecting the stress state within the material. Deformation refers to the displacement and change of shape of an object as a whole in space; its macroscopic manifestation can be observed and measured through the change in relative position of the object's parts. Deformation usually corresponds to structural changes such as bending, twisting, or compression.
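For a one-dimensional member under small displacements, the distinction can be written compactly using standard textbook definitions, added here for illustration: strain is the normalized change in length, equivalently the spatial gradient of the displacement field u(x), whereas the displacement field itself describes how the member moves and changes shape as a whole. A displacement field made of a rigid offset c plus a uniform stretch ε₀ has the same strain for every c, so a large global displacement need not imply a large local strain.

```latex
\varepsilon = \frac{\Delta L}{L_0} = \frac{L - L_0}{L_0},
\qquad
\varepsilon_{xx}(x) = \frac{\partial u(x)}{\partial x}

u(x) = c + \varepsilon_0 x
\;\;\Longrightarrow\;\;
\varepsilon_{xx}(x) = \varepsilon_0 \quad \text{for any rigid offset } c
```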

While YOLOv8 itself focuses on identifying and localizing objects in images, it can be trained to identify cracks or other signs of damage in the wooden structure of the earthen building, thereby indirectly helping to monitor strain and deformation. However, directly detecting strain and deformation usually requires combining sensor data or specialized image processing techniques rather than visual object detection models alone [66, 67].

Furthermore, in the process of monitoring and detecting damage to architectural heritage, one frequently encounters interference from noise or other uncertain factors, typically unrelated to the structural degradation itself. This situation may lead to inaccurate or misunderstood monitoring results, which in turn affects the reliability of damage assessment. Therefore, improving the robustness of intelligent algorithms to these uncertain interferences becomes a key challenge. Future work will focus on developing more efficient algorithms to distinguish between structural degradation signals and unrelated noise or interference to improve the accuracy and reliability of detection. This will not only improve current monitoring technology, but also provide more solid scientific and technological support for heritage protection. In addition, future research can also consider exploring the further application of machine learning and artificial intelligence technologies in this field, especially the potential advantages in data processing and real-time monitoring. 
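As one concrete, well-established example of the robustness described above (not a method from this study), a moving-median filter suppresses isolated spikes in a sensor time series while preserving a genuine step change; a moving average would instead smear the spike across neighbouring samples. The window length of 3 and the strain readings below are hypothetical.

```python
from statistics import median
from typing import List


def rolling_median(signal: List[float], window: int = 5) -> List[float]:
    """Centered moving median; the window is truncated at the series ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out: List[float] = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(float(median(signal[lo:hi])))
    return out


# Hypothetical strain readings (microstrain): one noise spike at index 3,
# then a persistent step at index 7 that could indicate a real change.
readings = [10, 10, 10, 500, 10, 10, 10, 20, 20, 20, 20, 20]
print(rolling_median(readings, window=3))
# [10.0] * 7 + [20.0] * 5: the spike is removed, the step is preserved
```

A single outlier cannot shift the median of a three-sample window, whereas a level change that persists for two or more samples passes through, which is the basic property that makes median-type filters a common first defence against impulsive noise.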

We sincerely hope that you will consider our manuscript.

Reviewer 3 Report

Comments and Suggestions for Authors

Accept

Author Response

Thank you very much for your recognition of our research.
