1. Introduction
The historical architecture of China is considered one of the oldest and most enduring architectural systems in the world, with a long record of innovation and extremely high cultural heritage value. However, as time erodes them, historic buildings face the threat of damage and disappearance. In recent years, cultural heritage protection has been steadily strengthened at the national level, and a series of policy documents has been issued. This makes the protection and repair of historical buildings in China increasingly important and urgent.
As 3D digital technology becomes increasingly central to large-scale cultural heritage protection research [1], point cloud semantic segmentation has become an important direction in remote sensing applications. Unlike pixels in two-dimensional images, point clouds carry explicit depth information and provide a large amount of valuable information for describing the real world [2]. However, point cloud data also have shortcomings. Although they contain high-precision, high-resolution, high-dimensional three-dimensional coordinates and various additional attributes, they do not directly provide information at the semantic level. This makes it difficult for cultural heritage experts to use point cloud data directly. Point cloud segmentation of historical buildings is therefore a fundamental step with important research significance.
Point cloud semantic segmentation has now become a basic technology for three-dimensional scene understanding, and researchers have explored it in depth. However, the disordered and unstructured nature of point clouds makes precise and effective semantic segmentation extremely challenging [3,4,5]. Early work relied on traditional point cloud segmentation methods, which divide the point cloud into different surface regions according to the feature attributes of the points. These algorithms fall into four categories: edge-based, region-growing-based [6], model-fitting-based [7], and clustering-based. Each has its own advantages and disadvantages and suits different scenarios [8]. Although traditional methods perform well and run fast on man-made structures with regular geometric shapes, they have limitations when applied to large-scale historical buildings. For example, most historical buildings consist of a large number of irregularly shaped components, so it is difficult to select suitable geometric models to fit them, and only relatively coarse segmentation results can be obtained.
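To make the region-growing category [6] concrete, the following is a minimal sketch in Python with NumPy, assuming that per-point unit normals have already been estimated. The function `region_growing` and its thresholds `radius` and `angle_deg` are hypothetical choices for illustration, not the algorithm of any specific cited work. A region grows from a seed point to neighbors within `radius` whose normals deviate by less than `angle_deg`, which separates regular planar surfaces cleanly.

```python
import numpy as np
from collections import deque


def region_growing(points, normals, radius=0.15, angle_deg=10.0):
    """Label points by growing regions of nearby points with similar normals.

    Hypothetical illustrative sketch: brute-force neighbour search, fixed thresholds.
    """
    n = len(points)
    labels = np.full(n, -1, dtype=int)          # -1 marks "not yet assigned"
    cos_thr = np.cos(np.deg2rad(angle_deg))     # normal-similarity threshold
    # Pairwise distances (O(N^2) memory, acceptable for a toy cloud).
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    region = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = region
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in np.flatnonzero(dists[i] < radius):
                # abs() ignores the sign ambiguity of estimated normals.
                if labels[j] == -1 and abs(normals[i] @ normals[j]) >= cos_thr:
                    labels[j] = region
                    queue.append(j)
        region += 1
    return labels


# Toy scene: a floor (normal +z) meeting a wall (normal +x) along the y-axis.
xs, ys = np.meshgrid(np.linspace(0.1, 1.0, 10), np.linspace(0.0, 1.0, 11))
floor = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)])
ys2, zs = np.meshgrid(np.linspace(0.0, 1.0, 11), np.linspace(0.1, 1.0, 10))
wall = np.column_stack([np.zeros(ys2.size), ys2.ravel(), zs.ravel()])
points = np.vstack([floor, wall])
normals = np.vstack([
    np.tile([0.0, 0.0, 1.0], (len(floor), 1)),
    np.tile([1.0, 0.0, 0.0], (len(wall), 1)),
])
labels = region_growing(points, normals)
print(np.unique(labels))  # [0 1]: floor and wall become separate regions
```

On irregularly shaped components, normals change continuously, so a fixed angle threshold tends either to fragment a component or to merge it with its neighbors, which is in line with the limitation noted above.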
As research has deepened, deep learning has driven remarkable advances in point cloud semantic segmentation, and numerous deep learning-based methods have emerged in recent years [9]. Compared with traditional segmentation techniques, deep learning-based models not only capture multiscale three-dimensional spatial information more effectively but also provide semantic information at different levels of granularity, including part segmentation, semantic segmentation, and instance segmentation. According to how they extract features from the point cloud, these methods can be grouped into three types: voxel-based methods [10,11], projection-based methods [12,13], and point-based methods [14]. Projection-based and voxel-based methods have high computational costs and tend to lose semantic features or spatial positions during projection or voxelization. To address this issue, researchers constructed networks that extract features directly from point clouds without any data transformation [15,16,17]. These networks consume irregular three-dimensional point clouds directly, reduce the constraints imposed by point cloud characteristics, and make full use of point cloud geometry to improve the interpretation of three-dimensional point cloud scenes. PointNet [18], a pioneering deep learning network for point clouds, was the first architecture to process raw point clouds directly. Many scholars have since proposed improved networks based on it, but most of these methods can only take small-scale point clouds as input and cannot be directly extended to larger scenes. Subsequently, Hu et al. [19] proposed RandLA-Net, which performs better in large-scale scenes. It uses random sampling instead of the widely used farthest point sampling and extracts geometric features through a local feature aggregation module, which reduces network complexity while effectively preserving geometric details. Building on this, many local feature aggregation methods have recently been proposed. For example, SCF [20] introduces spatial representations that are invariant to rotation about the z-axis; LACV-Net [21] uses neighborhood features as offsets that converge toward the centroid feature and reduces local perceptual ambiguity through their similarity; and DGFA-Net [22] adopts a dilated graph feature aggregation structure.
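The trade-off behind RandLA-Net's choice of random sampling [19] can be illustrated with a small NumPy sketch. The functions `farthest_point_sampling`, `random_sampling`, and `covering_radius` are hypothetical helpers written for this illustration, not RandLA-Net's released code, and the covering radius is used here only as an assumed, simple measure of how evenly a subset covers the cloud.

```python
import numpy as np


def farthest_point_sampling(points, k, seed=0):
    """Greedy FPS: each pick is the point farthest from all previous picks (O(N*k))."""
    rng = np.random.default_rng(seed)
    chosen = np.empty(k, dtype=int)
    chosen[0] = rng.integers(len(points))
    # Distance from every point to its nearest already-chosen sample.
    min_dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for i in range(1, k):
        chosen[i] = int(np.argmax(min_dist))
        new_dist = np.linalg.norm(points - points[chosen[i]], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return chosen


def random_sampling(points, k, seed=0):
    """Uniform random sampling without replacement; needs no distance computations."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(points), size=k, replace=False)


def covering_radius(points, idx):
    """Largest distance from any point to its nearest sample (smaller = more even coverage)."""
    d = np.linalg.norm(points[:, None, :] - points[idx][None, :, :], axis=-1)
    return d.min(axis=1).max()


cloud = np.random.default_rng(1).random((4000, 3))
fps_idx = farthest_point_sampling(cloud, 128)
rnd_idx = random_sampling(cloud, 128)
print("FPS covering radius:   ", covering_radius(cloud, fps_idx))
print("random covering radius:", covering_radius(cloud, rnd_idx))
```

FPS makes k full passes over all N points, which becomes expensive for scenes with millions of points, whereas the cost of random sampling does not depend on the geometry at all. The price is that random sampling can leave some regions sparsely covered and drop small details, which is what RandLA-Net's local feature aggregation module is intended to compensate for.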
As the above analysis shows, although point-based methods achieve good semantic segmentation accuracy, they have rarely been applied to historical architecture in China. When dealing with large-scale ancient architectural scenes, these methods cannot sufficiently capture both local and global information, especially for historical architectural components with unique structures. Moreover, overemphasizing local features may cause the network to neglect the overall spatial geometric structure of the point cloud. To address these issues, the main contributions of this study are as follows:
- (1) This paper proposes a novel semantic segmentation network named MSFA-Net. It includes a double attention aggregation (DAA) module, which consists of a bidirectional adaptive pooling (BAP) block and a multiscale attention aggregation (MSAA) block. By combining these two attention mechanisms, the module captures multiscale features of the target during sampling and reduces redundant information.
- (2) This paper proposes a contextual feature enhancement (CFE) module, which strengthens the contextual connections in the model by fusing local and global features across the encoding and decoding layers, while fully accounting for the semantic gap between neighboring features.
- (3) This paper proposes an edge interactive classifier (EIC), which takes the features of each point as input to obtain that point's edge features. Through information transfer between nodes, it improves label prediction and segments object edges more smoothly.
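This section does not describe the internals of the DAA module. For readers less familiar with attention-based aggregation in general, the sketch below shows a generic attention-weighted pooling of k-nearest-neighbor features, similar in spirit to the attentive pooling used in RandLA-Net [19]; it is not the BAP or MSAA block of MSFA-Net. The helpers `knn_indices` and `attentive_pooling` and the single linear scoring matrix `w_score` are assumptions made for illustration; in a trained network the scoring function would be a learned layer.

```python
import numpy as np


def knn_indices(points, k):
    """Indices of the k nearest neighbours of every point (brute force, includes the point itself)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, :k]


def attentive_pooling(features, neighbors, w_score):
    """Aggregate neighbour features with per-channel attention scores.

    features:  (N, C) per-point features
    neighbors: (N, K) neighbour indices
    w_score:   (C, C) hypothetical weight of a linear scoring layer
    """
    f = features[neighbors]                        # (N, K, C) gathered neighbour features
    logits = f @ w_score                           # (N, K, C) attention logits
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability for the softmax
    scores = np.exp(logits)
    scores /= scores.sum(axis=1, keepdims=True)    # softmax over the K neighbours
    return (scores * f).sum(axis=1)                # (N, C) attention-weighted sum


rng = np.random.default_rng(0)
pts = rng.random((500, 3))
feats = rng.standard_normal((500, 8))
nbrs = knn_indices(pts, 16)
pooled = attentive_pooling(feats, nbrs, 0.1 * rng.standard_normal((8, 8)))
print(pooled.shape)  # (500, 8)
```

With an all-zero `w_score`, every neighbor receives the same weight and the result reduces to average pooling; learned scores instead let a network emphasize informative neighbors and suppress redundant ones.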
5. Conclusions
This paper proposes an efficient MSFA-Net model for the semantic segmentation of historical architectural scene components. The model consists of three modules. The first module, composed of a bidirectional adaptive pooling block and a multiscale attention aggregation block, uses feature information at multiple levels and scales to enhance the network's ability to understand the topological relationships among neighboring points and to minimize redundant data. The second, the contextual feature enhancement module, combines local and global features from the encoder and the decoder to strengthen the relationships between model contexts. The third, the edge interactive classifier, further strengthens edge feature extraction on top of the original features, so that object edges are segmented more smoothly.
Although this article has validated the superiority of the proposed model on both the public S3DIS dataset and the self-curated historical building dataset, some issues remain to be overcome. First, the diversity of the dataset needs to be further enriched. China has many types of historical buildings, and in the future, representative historical buildings from different periods should be collected to enrich the component types in the dataset and improve the generality of the model. Second, because point cloud density varies, the segmentation of building components with small data volumes and incomplete geometric information is poor. Future research will therefore set constraint functions for different wooden components to refine the segmentation further. Finally, as acquisition instruments continue to be updated and developed, the density and quality of point cloud data will keep improving, and large-scale point cloud data will become more common, bringing higher annotation costs. Under these conditions, reducing model complexity and annotation costs will become a focus of future research.