Extraction and Analysis of the Spatial Morphology of a Heritage Village Based on Digital Technology and Weakly Supervised Point Cloud Segmentation Methods: An Innovative Application in the Case of Xisongbi Village in Jiexiu City, Shanxi Province

Chang, Ruixin; Wang, Jinping; Li, Lei; Chen, Dengxing

doi:10.3390/su17083349

College of Architecture and Art, Taiyuan University of Technology, Taiyuan 030024, China

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(8), 3349; https://doi.org/10.3390/su17083349

Submission received: 21 February 2025 / Revised: 7 April 2025 / Accepted: 7 April 2025 / Published: 9 April 2025

Abstract

Due to imbalanced urban and rural development and improper management, the spatial forms of many heritage villages have suffered severe damage, and their landscape character is gradually being blurred, posing serious challenges to the protection of traditional villages. Taking the traditional village of Xisongbi in Jiexiu City, Shanxi Province, as a case study, this paper employs UAV low-altitude multi-view measurement technology to obtain high-resolution image data from different angles. Three-dimensional modeling technology is then used to construct a 3D real-world model, orthophotos, and point cloud data of the settlement. Based on these data, the weakly supervised point cloud segmentation method DDLA is applied to finely segment and classify the acquired point cloud data, accurately extracting key spatial elements such as buildings, roads, and vegetation and thereby enabling a comprehensive, quantitative analysis of the spatial morphology of traditional villages. The results of the study show the following: (1) The use of UAVs for low-altitude multi-view measurement not only greatly improves the efficiency of data acquisition but also provides millimeter-level precision spatial data in a short time through the constructed 3D models and orthophotos. (2) The acquired point cloud data can be processed with DDLA, which effectively differentiates building contours from other environmental elements. (3) The calculation and analysis of the segmented point cloud data can accurately quantify key spatial morphology elements, such as the dimensions and spacing of traditional village buildings and road widths, ensuring the scientific rigor and reliability of the data. (4) The comprehensive application of digital technology and point cloud segmentation methods provides clear prospects and solid technical support for the quantitative study of the spatial morphology of traditional villages, laying a scientific foundation for the protection and sustainable development of cultural heritage.

Keywords:

digital technology; weakly supervised point cloud partitioning; low-altitude multi-view measurement techniques; quantitative analytics; traditional village of West Songbi

1. Introduction

1.1. Background

The report of the Twentieth National Congress of the Party clearly states that “the protection of cultural relics and cultural heritage should be strengthened, and the protection and inheritance of history and culture in urban and rural construction should be enhanced”. As an important carrier of outstanding traditional Chinese culture, cultural heritage has profound historical and cultural, artistic, social functional, and economic value [1]. The protection and inheritance of cultural heritage are closely linked to the country’s cultural security and long-term sustainable development strategy. Traditional settlements, with their rich cultural connotations and unique spatial characteristics, form a significant part of China’s cultural heritage. Their spatial components consist primarily of the geographical environment, building layout, and road network. While accelerating urbanization has promoted economic growth, it has also brought several challenges [2]. Population loss, collapsing buildings, damaged spatial patterns, and the hollowing out of villages are becoming more pronounced [3,4]. These changes have significantly altered the original spatial morphological characteristics of ancient villages. If this trend is not effectively addressed, the revival of national culture will be seriously threatened [5,6]. In this context, how to apply scientific and practical methods to the sustainable protection and development of traditional villages has become a core research topic [7]. Effectively gathering and interpreting spatial data therefore plays a vital role in both conserving cultural heritage resources and sustaining the unique spatial patterns of traditional villages.

Current research shows a notable scarcity of methods for geospatial data acquisition in traditional village studies, and systematic exploration of spatial configuration dynamics remains underdeveloped [8]. The spatial pattern of a traditional village is a concentrated manifestation of its history, culture, ecological environment, and spatial development [9,10]. Although scholars have studied traditional villages from various perspectives, such as landscape characteristics [11], environment [12], historical and cultural factors, and tourism potential [13], a systematic analysis of traditional village spatial patterns from the macro to the micro level is still lacking. Moreover, the data collection methods used in these studies are time-consuming, insufficiently accurate, and highly subjective, and manual measurement may damage cultural relics. Saadatseresht et al. [14] stated that UAV photogrammetry offers higher spatial data reliability than traditional measurement techniques and is an effective alternative because it is non-contact and faster. Recording the spatial features and landscape data of traditional villages comprehensively and accurately nevertheless remains an urgent challenge. Digital preservation technology is therefore crucial to preserving traditional village heritage.

In the face of these challenges, researchers have strongly called for the adoption of automated digital technologies [15] to improve conservation efficiency and reduce costs. Continuing technological advances, particularly the progressive integration of unmanned aerial vehicle (UAV) systems, have enabled oblique photogrammetry to emerge as a novel method for acquiring geospatial information on traditional villages at local and regional levels [16]. UAVs are now widely used to capture architectural mapping imagery [17] and offer notable advantages: they can quickly plan routes, capture images of an area, and generate high-density 3D point cloud data, and they are inexpensive, making them an ideal choice for solving the problems of traditional manual surveying [18,19]. As an automated, high-precision method, UAV measurement technology provides accurate surface coordinates and 3D environmental data [20] and is efficient and easy to operate. UAV sensors and platforms have been used in various applications, including rural perimeter hazard monitoring, emergency management, and traffic monitoring [21,22]. Point cloud data acquired by UAVs can also be used to monitor and map landslides, helping to identify areas at high risk of natural disasters [23]. Digital surveying technology has not only penetrated agriculture and forestry but has also demonstrated significant value in heritage conservation, ecological research, and landscape analysis [24]. Topographic surveying and 3D modeling are among its most common applications [25]; for example, Lizuka et al. generated 3D orthophotos and digital surface models (DSMs) using UAV surveying technology [26]. The accuracy of the spatial data is particularly critical in this process, especially during the initial data collection, site survey, and 3D modeling stages [27]. These techniques can generate highly accurate visualization models, providing reliable data support for collecting and analyzing the spatial characteristics of traditional villages. With the continuous advancement of digital technology, spatial pattern analysis of traditional settlements has gradually become essential in the study of urban and rural development and cultural heritage protection.

It is worth noting that the extraction of spatial elements is the key to analyzing the spatial morphology of traditional villages. Point cloud segmentation can reveal the structural features, road system, and landscape features of the village space, laying the foundation for spatial morphology analysis. Point cloud data generated from UAV images provide more detailed information about the study area. In recent years, deep learning algorithms have played an important role in rural heritage conservation [28], and many scholars have proposed deep learning-based point cloud segmentation methods. For example, Marco et al. investigated the application of machine learning in important heritage contexts and developed a predictive deep learning model (DLM) that can be adapted to large landscape heritage environments and heterogeneous data scales [29]. Aissou et al. proposed a connected domain analysis method and combined it with a support vector machine (SVM) for 3D point cloud segmentation [30]. Liao et al. achieved accurate point cloud segmentation and improved the time efficiency of the segmentation process by combining Conditional Random Fields (CRF) with Random Forests (RF) [31].

On the other hand, Haznedar et al. proposed a PointNet-based point cloud segmentation method for heritage buildings, using five feature tokens for training [32]. After the pioneering studies of PointNet [33] and SparseConv [34], many complex neural network architectures have been proposed one after another [35,36,37,38,39,40,41]. In addition, recent models such as PointCNN have emerged, demonstrating remarkable capability in capturing local geometric features from unordered point sets, thereby further enhancing segmentation performance. These developments significantly improved the accuracy and efficiency of the semantic estimation of point clouds [36]. However, these methods also have certain limitations and usually require manual design and extraction of classification features, affecting the model’s accuracy. To overcome these limitations, Hu et al. proposed a fully supervised point cloud segmentation method, RandLA-Net, which achieves large-scale and efficient semantic segmentation through random point sampling and can directly infer the semantics of each point in a large-scale point cloud [38]. Based on RandLA-Net, Yang et al. further proposed the SQN weakly supervised scheme, which implicitly increases the total number of available supervised signals by exploiting the semantic similarity between neighboring points and thus performs weakly supervised semantic segmentation of a large 3D point cloud using only 1000 labels [42]. Despite their encouraging results on various datasets, several issues remain to be addressed.

In summary, while existing digital heritage preservation methods have made progress in data acquisition and documentation, they exhibit three critical limitations: (1) Traditional photogrammetric approaches often oversimplify the spatial hierarchy of settlements through planar projections, failing to capture the vertical dimension of architectural ensembles. (2) Supervised learning-based segmentation methods rely heavily on pre-labeled datasets that are particularly scarce for vernacular architectures, resulting in poor generalization across diverse settlement typologies. (3) Current spatial pattern analyses predominantly employ static 2D indicators (e.g., space syntax) that inadequately characterize the dynamic interactions between three-dimensional built forms and cultural landscapes.

This study aims to address the aforementioned shortcomings through methodological innovation. By integrating low-altitude, multi-view UAV photogrammetry, we have established a multi-scale 3D recording framework that balances both the macro settlement patterns and the micro architectural details. Utilizing digital technology, we acquire precise spatial data, and by incorporating the innovative weakly supervised point cloud segmentation algorithm, DDLA, we achieve accurate segmentation of building categories, public spaces, and transportation networks within large traditional villages, thereby laying a solid foundation for in-depth spatial morphology analysis.

1.2. Research Aim

The conservation of traditional heritage villages faces multifaceted challenges that demand innovative and scalable solutions. First, the spatial forms of heritage villages are increasingly threatened by imbalanced urban–rural development and inadequate management practices, resulting in fragmented landscapes and blurred cultural identities. Conventional data collection methods, such as manual surveying, are inefficient, imprecise, and subjective, and they risk physically damaging fragile heritage structures. Second, existing analytical frameworks for spatial morphology often lack systematic and granular insights, relying predominantly on macro-level assessments that fail to capture nuanced spatial elements such as building dimensions, road networks, and vegetation distribution. Third, although digital technologies such as UAV photogrammetry have advanced data collection and deep learning methods have enhanced data segmentation, their integration into robust analytical workflows, particularly for large-scale, complex village environments, remains underdeveloped. Finally, traditional point cloud segmentation techniques, which predominantly depend on fully supervised algorithms, require extensive manual labeling. This process is not only time-consuming and labor-intensive but also impractical for heritage sites with limited prior documentation.

This study directly addresses these challenges through a synergistic application of UAV-based digital measurement and an innovative weakly supervised point cloud segmentation method (DDLA). UAV low-altitude multi-view measurement technology enables the rapid, non-contact acquisition of high-resolution spatial data, overcoming the inefficiencies and inaccuracies of manual surveys. By generating millimeter-precision 3D models and orthophotos, this approach ensures comprehensive spatial documentation without compromising heritage integrity. Crucially, the proposed DDLA, building upon the SQN [42] framework, tackles the limitations of conventional segmentation techniques by significantly reducing dependency on labeled training data. Through weakly supervised learning, DDLA leverages semantic similarities between neighboring points to achieve precise classification of large-scale point clouds—extracting critical spatial elements (e.g., buildings, roads, vegetation) with minimal human intervention. This innovation not only enhances computational efficiency but also improves scalability, enabling detailed spatial morphology analysis even in data-scarce contexts.

By integrating these technologies, this research establishes a systematic workflow for quantifying spatial morphology at both macro and micro levels. The method directly mitigates subjectivity in spatial analysis by providing objective, data-driven metrics for architectural dimensions, spacing, and road widths. Furthermore, the scalability of DDLA ensures adaptability to diverse village layouts, supporting comparative studies and heritage conservation strategies across regions. Ultimately, this approach bridges the gap between advanced digital documentation and actionable heritage preservation insights, offering a replicable framework for safeguarding traditional villages’ cultural and spatial authenticity while fostering sustainable development.

The study’s outcomes aim to redefine spatial analysis paradigms in heritage conservation, providing a foundation for evidence-based policymaking and technical interventions. By addressing critical gaps in data acquisition, processing, and interpretation, this work advances the scientific rigor of heritage studies and strengthens the global discourse on balancing cultural preservation with contemporary development needs.

2. Study Area

The West Songbi traditional settlement is situated in the southern part of Jiexiu City, Shanxi Province, China; its precise geographical location is shown in Figure 1. The village is bounded by Jiexiu City to the north, Mianshan Mountain to the south, the Jingkun Expressway and Daxi High-Speed Railway to the west, and the Zhangbi Ancient Fortress to the east, and it lies on the southwest side of Longfeng Town. Xisongbi has a long history, with origins dating back to the Neolithic period. At the end of the Sui Dynasty and the beginning of the Tang Dynasty, the Song people of Qufu, Shandong Province, moved here to escape war and famine. As the population gradually increased and the carrying capacity of the land proved limited, the villagers relocated to higher and flatter terrain to the north. After the middle of the Tang Dynasty, people with the surnames Gao, Rong, and Hou settled here, and the village gradually expanded, forming the prototype of a village with “a Song family street, a small piece of Gao family, and a Hou family alley”. Later, for defensive purposes, the village was enclosed by a wall, forming a square village pattern. In the 1990s, the village expanded to the north and the “Back Street” road was built, forming the present overall layout of the village.

Located on the Loess Plateau, Xisongbi Village sits on high ground surrounded by gullies and ravines, and its settlements are arranged in a network. In addition to residential buildings, the village preserves important historical buildings such as the Xingguo Temple, Sanguan Temple, Ming and Qing Schools, Tiandi Temple, and Longwang Temple. These buildings are scattered throughout the village, carry rich historical and cultural connotations, and bear witness to its historical and cultural heritage. They are regarded as some of the most representative residential buildings in Shanxi. In 2023, the village was officially included in the sixth batch of traditional Chinese villages. However, West Songbi Village has also faced challenges as urbanization has progressed. Population decline and aging have led to a series of problems, such as the collapse of traditional buildings, overgrown courtyards, and haphazard demolition (Figure 2), which have damaged the traditional spatial forms. Unplanned planting areas have appeared, and natural landscapes and old buildings have been damaged or have collapsed, so the village’s buildings and street spaces have lost their original character. Because of the complex structure of villages, collecting data on their spatial development over time is challenging [43]. Accurate and timely methods for collecting and extracting spatial data are therefore urgently needed, and an in-depth analysis of the spatial pattern is crucial for the conservation and development of West Songbi Village. Taking digital technology and point cloud segmentation as its starting point, this study analyzes the spatial morphology of Xisongbi Village and extends the research methodology for the spatial morphology of traditional villages.

3. Methods

3.1. Research Methodology

Based on the data sources (Table 1), the research workflow of this paper is shown in Figure 3 and consists of three main phases. First, route planning and ground control point (GCP) determination are performed, and low-altitude UAV multi-view photography is used for image acquisition and correction, creation of a real-world 3D model, and generation of orthophotos and point cloud data. Second, the DDLA weakly supervised point cloud segmentation algorithm proposed in this paper, which builds on the SQN [42] weakly supervised algorithm, is used to segment and classify the point cloud data and form an accurate point cloud classification model; the resulting data are used to extract the spatial morphology elements of the traditional village precisely. Finally, based on the extracted spatial elements, the spatial morphology of the settlement is analyzed systematically in terms of architecture, streets and lanes, space, and topography at the macro, meso, and micro levels, and corresponding protection countermeasures are proposed, providing a scientific basis and research method for protecting the spatial morphological features of traditional villages.

3.2. Acquisition and Processing of Spatial Information

Data Acquisition and Processing

(1) Mission and data acquisition

The spatial morphology analysis of the Xisongbi settlement presents significant challenges due to the inherent limitations of traditional two-dimensional cartographic representations, which fail to visually and three-dimensionally articulate critical spatial features such as traditional building distributions, street network characteristics, and landscape configurations. To address this, this study employs an integrated digital workflow encompassing UAV-based photogrammetry, high-precision geospatial data processing, and computational optimization to systematically capture and interpret multidimensional spatial information.

In this study, the DJI Mavic 3E UAV was selected as the primary device for image acquisition (Figure 4); its detailed parameters are provided in Table 2. The UAV is easy to operate and is equipped with a fisheye lens that provides full-view perception. The system incorporates configurable safety thresholds and deceleration parameters and continuously tracks the displayed coordinates, stabilization platform orientation, elevation, and velocity during operations [44]. To meet the demand for high-precision geospatial data acquisition and analysis in West Songbi’s heritage preservation, a ground sampling distance (GSD) of 30 mm was specified, with a flight elevation of 60 m in the mission configuration. Preventing motion-induced artifacts and frame duplication in photogrammetric workflows requires precise synchronization of the aerial platform’s velocity with the sensor configuration [17]. Operational parameters included 85% lateral and longitudinal overlap ratios and a cruise velocity of 2 m/s; the cruise velocity $V$ was derived from the imaging sensor characteristics and the overlap ratio as follows:

$$V = \frac{H S_{h} (1 - O_{h})}{f T_{s}}$$

where $S_{h}$ is the sensor height, $O_{h}$ is the heading overlap rate, and $T_{s}$ is the exposure time interval.
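The speed relation above can be checked numerically with a short sketch. The text does not define $H$ and $f$; the code assumes they are the flight height and the lens focal length, which is the usual reading of this formula. The function name and the sensor values in the example call are illustrative placeholders, not the parameters of the actual mission.

```python
def cruise_speed(flight_height_m, sensor_height_m, heading_overlap,
                 focal_length_m, exposure_interval_s):
    """Evaluate V = H * S_h * (1 - O_h) / (f * T_s).

    Assumption: H is the flight height and f the focal length
    (neither symbol is defined in the text).
    """
    if not 0.0 <= heading_overlap < 1.0:
        raise ValueError("heading_overlap must lie in [0, 1)")
    if focal_length_m <= 0 or exposure_interval_s <= 0:
        raise ValueError("focal length and exposure interval must be positive")
    # Ground footprint of one frame along the flight direction (similar triangles).
    footprint_m = flight_height_m * sensor_height_m / focal_length_m
    # Distance the UAV may travel between exposures while keeping the overlap.
    baseline_m = footprint_m * (1.0 - heading_overlap)
    return baseline_m / exposure_interval_s


# Hypothetical values: 60 m height, 13 mm sensor side, 85% overlap,
# 12 mm focal length, one exposure every 5 s.
speed = cruise_speed(60.0, 0.013, 0.85, 0.012, 5.0)
print(f"cruise speed: {speed:.2f} m/s")
```

Splitting the computation into footprint and baseline makes the role of the overlap rate explicit: a higher overlap shortens the allowed baseline and therefore lowers the admissible speed for a fixed exposure interval.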

The DJI Mavic 3E was selected because, as a single-lens multi-rotor UAV, it can capture oblique (tilt) images with a single lens. The flight plan included two oblique flight paths and a series of close-range stereo flight paths. The low-altitude oblique photography path followed an “S” shape with the camera angle set at 45 degrees and covered the whole study area (Figure 5). The close-range flight paths targeted other historic buildings (Figure 6) to ensure that all the image data required for low-altitude photography were acquired. During the mission, 21,686 oblique images and 15,230 close-up images were taken.

(2) Error estimation

To obtain raster imagery with precise geographic coordinates, ground control points (GCPs) must be set. GCP markers must be laid out on the ground before the mission and be clear and easily recognizable. They should be evenly distributed throughout the study area, and the selected locations should be free of obstructions and easily accessible in the field so that they can be identified accurately in the aerial photographs. This helps ensure the accuracy and reliability of the measurement data, especially for high-precision measurement and geographic information processing. Twenty-one GCPs were distributed across the study area. A Trimble R10 GNSS receiver using real-time kinematic (RTK) positioning was used to measure the horizontal and vertical coordinates of these markers [45,46]. The planimetric positioning accuracy at the control point centers ranged from 0.012 m to 0.043 m, and the altimetric accuracy ranged from 0.020 m to 0.045 m. Finally, the acquired coordinate data and images were imported into the processing software for further analysis.

In the georeferencing phase of this research, the 21 evenly distributed and accurately surveyed GCPs were used within the study region (Figure 7). To evaluate the accuracy and reliability of the results, Pix4D software (version 4.5.6) was used to analyze the GCPs in the 3D model. All 21 GCPs were employed as reference points, their exact coordinates are provided in Table 3, and the deviations were computed from these coordinates. The analysis gave a root mean square error (RMSE) of 0.029 m horizontally and 0.036 m vertically. These error metrics meet the precision requirements for the spatial feature extraction analysis of traditional villages (Figure 8).
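For readers who wish to reproduce this kind of check, the following sketch shows one common way to compute horizontal and vertical RMSE from per-GCP residuals (model coordinate minus surveyed coordinate). It assumes that the horizontal RMSE combines the easting and northing errors; the text does not state which definition the software applied, and the residuals in the example are invented for illustration, not the values of Table 3.

```python
import math


def gcp_rmse(residuals):
    """Return (horizontal_rmse, vertical_rmse) in metres.

    residuals: iterable of (dx, dy, dz) tuples, one per ground control point.
    Assumption: the horizontal error of a point is sqrt(dx^2 + dy^2).
    """
    residuals = list(residuals)
    if not residuals:
        raise ValueError("at least one residual is required")
    n = len(residuals)
    horizontal = math.sqrt(sum(dx * dx + dy * dy for dx, dy, _ in residuals) / n)
    vertical = math.sqrt(sum(dz * dz for _, _, dz in residuals) / n)
    return horizontal, vertical


# Hypothetical residuals for three control points (metres).
example = [(0.012, -0.020, 0.031), (-0.018, 0.015, -0.040), (0.022, 0.010, 0.027)]
h_rmse, v_rmse = gcp_rmse(example)
print(f"horizontal RMSE = {h_rmse:.3f} m, vertical RMSE = {v_rmse:.3f} m")
```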

3.3. Point Cloud Segmentation Processing

3.3.1. DDLA Concepts

This study proposes a new weakly supervised deep learning method for the semantic segmentation of large-scale 3D point cloud data by combining the self-attention module of DLA-Net [38] with the SQN algorithm. The point cloud data are first pre-processed, including downsampling and normalization, to reduce computational complexity. The self-attention module of DLA-Net, which comprises a self-attention block and an attention pooling block, is then used together with an augmented position encoding block to capture local neighborhood information in the point cloud. The self-attention block learns the feature representation of the local neighborhood around each point, while the attention pooling block aggregates these local features and automatically focuses on the important ones. The SQN algorithm [42] encodes the entire raw point cloud into a set of hierarchical latent representations, queries these representations in a local neighborhood, summarizes the queried representations into a compact vector, and finally predicts the semantic labels using multilayer perceptrons (MLPs). This combination enables SQN to capture local features and structural information more effectively when processing point clouds, improving segmentation accuracy. In the training phase, a small amount of labeled point cloud data is used to train the model; in the inference phase, the model performs semantic segmentation on new point cloud data and maintains high performance even with very limited labeled data. Through experimentation and parameter tuning, the model structure and training strategy are optimized to further enhance the generalization ability and robustness of the model. As a result, efficient and accurate semantic segmentation of large-scale 3D point cloud data is achieved under weakly supervised conditions, and the workload of manual annotation is significantly reduced while the performance of the segmentation task is maintained or improved. The specific structure is shown in Figure 9.
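The pre-processing step can be illustrated with a minimal sketch. The text names downsampling and normalization but does not specify the scheme, so the voxel-grid averaging and unit-sphere scaling used here are assumptions chosen because they are common for large outdoor point clouds; the function names and the voxel size are hypothetical.

```python
import numpy as np


def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Replace all points that fall in the same voxel by their centroid."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    # Integer voxel index of every point.
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse, counts = np.unique(
        keys, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)  # accumulate coordinates per voxel
    return sums / counts[:, None]


def normalize(points: np.ndarray) -> np.ndarray:
    """Centre the cloud at the origin and scale it into the unit sphere."""
    centered = points - points.mean(axis=0)
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / scale if scale > 0 else centered


cloud = np.array([[0.0, 0.0, 0.0], [0.1, 0.1, 0.1],
                  [1.0, 1.0, 1.0], [1.1, 1.1, 1.1]])
reduced = normalize(voxel_downsample(cloud, voxel_size=0.5))
print(reduced.shape)
```

Averaging within voxels keeps the point density roughly uniform, which matters when dense facades and sparse ground are processed by the same network.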

The methodology of this study is characterized by four main contributions, which are outlined as follows:

(1) The self-attention mechanism and the attention pooling block are introduced to improve feature expression by combining local aggregation with spatial position encoding.
(2) The handling of weak supervision labels (unlabeled points, ignored labels, etc.) is improved to reduce unnecessary interference.
(3) The weakly supervised point cloud segmentation algorithm yields finer and more accurate segmentation results, so the spatial elements of the studied traditional village can be extracted and analyzed more efficiently and accurately.
(4) Together, these innovations improve the model’s learning ability under weakly supervised conditions, especially for complex data such as large point clouds, increasing the model’s robustness and accuracy.
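As an illustration of contribution (2), the sketch below shows one common way to keep unlabeled points from influencing training: a cross-entropy loss averaged over labeled points only. This is a generic technique, not necessarily the exact mechanism used in DDLA; the `IGNORE_LABEL` marker and the function name are hypothetical.

```python
import numpy as np

IGNORE_LABEL = -1  # hypothetical marker for points without an annotation


def masked_cross_entropy(logits: np.ndarray, labels: np.ndarray,
                         ignore_label: int = IGNORE_LABEL) -> float:
    """Mean cross-entropy over labeled points; unlabeled points contribute nothing.

    logits: (N, C) raw class scores, labels: (N,) integer class ids.
    """
    mask = labels != ignore_label
    if not mask.any():
        return 0.0  # no supervision available in this batch
    z = logits[mask]
    z = z - z.max(axis=1, keepdims=True)  # stabilise the softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(z.shape[0]), labels[mask]]
    return float(-picked.mean())


# Three points, four classes, only the first point is annotated.
scores = np.zeros((3, 4))
sparse_labels = np.array([2, IGNORE_LABEL, IGNORE_LABEL])
print(masked_cross_entropy(scores, sparse_labels))
```

Because the mean is taken over labeled points only, the loss magnitude does not shrink as the share of annotated points decreases.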

3.3.2. Self-Attention Block

The self-attention mechanism is inherently well-suited for processing point cloud data, as point clouds fundamentally consist of sets situated within an irregular metric space. The self-attention block operates through pairwise self-attention calculations, employing subtraction as the relational function and incorporating positional encoding into both the mapping function η and the transformed features γ. The structure of our self-attention blocks can be expressed as follows:

F_{i} = \sum_{k = 1}^{K} \mathrm{ReLU}(η(α(f_{i}) - β(f_{i}^{k}) + c_{i}^{k})) ⊙ (γ(f_{i}^{k}) + c_{i}^{k})    (1)

where F_{i} is the output feature, obtained by the Hadamard product. The functions α, β, and γ are multilayer perceptrons (MLPs), each realized through a single linear layer. The mapping function η is an MLP consisting of two linear transformations with ReLU activation, and c_{i}^{k} is the position encoding. The output feature F_{i} of the self-attention block represents the new set of neighboring features, which explicitly encodes the geometric structure around the centroid P_{i} of the local feature set. Finally, the output feature F_{i} is further processed by a batch normalization layer with ReLU activation.
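Equation (1) can be illustrated with a small NumPy sketch for a single centre point. This is only an assumed reading of the formula, not the authors’ implementation: the helper `linear`, the random weights, and the choice of equal input and output widths are hypothetical, and batch normalization is omitted.

```python
import numpy as np

def linear(rng, d_in, d_out):
    """Return (W, b) for a randomly initialised linear layer (illustrative only)."""
    return rng.standard_normal((d_in, d_out)) * 0.1, np.zeros(d_out)

def relu(x):
    return np.maximum(x, 0.0)

def self_attention_block(f_center, f_neigh, c, params):
    """Evaluate Eq. (1) for one centre point.

    f_center: (D,)   feature f_i of the centre point
    f_neigh:  (K, D) features f_i^k of its K neighbours
    c:        (K, D) position encodings c_i^k
    params:   five (W, b) pairs for alpha, beta, gamma and the two layers of eta
    """
    (Wa, ba), (Wb, bb), (Wg, bg), (W1, b1), (W2, b2) = params
    alpha = f_center @ Wa + ba             # alpha(f_i), shape (D,)
    beta = f_neigh @ Wb + bb               # beta(f_i^k), shape (K, D)
    gamma = f_neigh @ Wg + bg              # gamma(f_i^k), shape (K, D)
    rel = alpha - beta + c                 # subtraction relation plus position encoding
    eta = relu(rel @ W1 + b1) @ W2 + b2    # eta: two linear layers with a ReLU in between
    weights = relu(eta)                    # ReLU(eta(...)) as written in Eq. (1)
    return np.sum(weights * (gamma + c), axis=0)  # Hadamard product, summed over k

rng = np.random.default_rng(0)
D, K = 8, 16
params = [linear(rng, D, D) for _ in range(5)]
F_i = self_attention_block(rng.standard_normal(D),
                           rng.standard_normal((K, D)),
                           rng.standard_normal((K, D)),
                           params)
```

Because the neighbours enter only through a sum over k, the result does not depend on the order in which the K neighbours are listed, which suits unordered point sets.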

3.3.3. Attention Pooling Blocks

The local features are automatically learned, and their understanding is enhanced through continuous optimization by the attention mechanism. To further mine local features, we employ an attention pooling block structure. Given the output of the self-attention block F_{i} and the position encoding c_{i}^{k}, we utilize a relational function for aggregation. The process is formally defined as follows:

\hat{F}_{i} = η(F_{i}, c_{i}^{k})    (2)

In the attention pooling block, we use the concatenation operator ⊕ as the relation function η. Given the aggregated feature set \hat{F}_{i} = \{\hat{f}_{i}^{1}, \dots, \hat{f}_{i}^{k}, \dots, \hat{f}_{i}^{K}\}, a shared function φ(\cdot) is designed, which consists of a shared multilayer perceptron (MLP) followed by a softmax operation. The learnable weights W_{i} in the shared MLP are used to assign a unique attention score to each feature. The learned attention score can be considered an adaptive mask that automatically selects key features. Then, the features are weighted and summed according to the attention scores to generate a new informative feature vector \overline{F}_{i}. The process is represented as follows:

\overline{F}_{i} = \sum_{k = 1}^{K} φ(\hat{f}_{i}^{k}, W_{i}) ⊙ \hat{f}_{i}^{k}    (3)

where ⊙ is the Hadamard product.
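A minimal NumPy sketch of Equations (2) and (3) follows. It rests on assumptions that the text does not state: the self-attention output is taken per neighbour with shape (K, D), the shared MLP is reduced to a single weight matrix `W`, and the softmax is taken over the K neighbours.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pooling(F, c, W):
    """F: (K, D) neighbour features, c: (K, D) position encodings, W: (2D, 2D)."""
    F_hat = np.concatenate([F, c], axis=1)   # Eq. (2): concatenation, shape (K, 2D)
    scores = softmax(F_hat @ W, axis=0)      # phi: shared linear map, softmax over neighbours
    return np.sum(scores * F_hat, axis=0)    # Eq. (3): score-weighted sum, shape (2D,)

rng = np.random.default_rng(0)
K, D = 16, 8
pooled = attention_pooling(rng.standard_normal((K, D)),
                           rng.standard_normal((K, D)),
                           rng.standard_normal((2 * D, 2 * D)) * 0.1)
```

With all-zero weights every neighbour receives the same score 1/K and the block reduces to average pooling, so the learned scores can be read as a soft, data-dependent replacement for mean or max pooling.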

4. Experiment

4.1. Comparative Experiment

The output feature is obtained through learned Hadamard-based interaction operations within the self-attention mechanism. The model parameters and multilayer perceptrons are implemented through a series of linear transformation layers followed by nonlinear activations. The mapping function is an MLP consisting of two linear transformations with ReLU activation. Position encoding is explicitly incorporated through learned coordinate embeddings (4D vectors per point) that concatenate with the point features before attention computation. This enables the self-attentive block to explicitly encode geometric structural relationships while preserving local neighborhood characteristics.

The proposed DDLA model combines DAL-Net’s global-local feature interaction mechanism with SQN’s weakly supervised propagation paradigm, establishing a multimodal collaborative training framework. During training, a two-stage optimization strategy is employed. First, centimeter-level point cloud data acquired by the DJI Mavic 3E undergoes data augmentation through random rotation, density perturbation, and color encoding. The DAL-Net branch captures component-level local geometric features via self-attention modules incorporating learnable relative positional encodings, while hierarchical attention pooling generates global semantic representations. Concurrently, the SQN branch extracts multi-scale features using a pre-trained RandLA-Net encoder, constructs cross-scale query networks through K-nearest neighbor graphs, and drives weakly supervised feature propagation by regressing semantic labels from 10% sparse annotated points via differentiable nearest neighbor matching. The dual-stream features are concatenated at cross-layer fusion points, and the final outputs are optimized through a dynamically weighted combination of a fully supervised cross-entropy (CE) loss and a weakly supervised contrastive loss. The CE loss incorporates class frequency weighting to address building class imbalance, whereas the contrastive loss employs InfoNCE for enhanced feature discrimination. The training process employs the Adam optimizer with an initial learning rate of 10^{-2} and a custom decay schedule in which the learning rate decreases by 5% after each epoch, for a total of 100 epochs. Batch normalization layers and gradient clipping (grad_norm = 5.0) are incorporated to stabilize training.
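The schedule and clipping described above can be sketched as follows. The sketch assumes an initial learning rate of 10^{-2}, a 5% multiplicative decay per epoch, and that grad_norm = 5.0 refers to clipping by the joint L2 norm of all gradients; the function names are hypothetical and the actual training code may differ.

```python
import numpy as np

def learning_rate(epoch, initial_lr=1e-2, decay=0.95):
    """Learning rate used during `epoch` (0-based): 5% smaller after every epoch."""
    return initial_lr * decay ** epoch

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their joint L2 norm is at most max_norm."""
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / total) if total > 0 else 1.0
    return [g * scale for g in grads]

# Learning rate for each of the 100 training epochs.
schedule = [learning_rate(e) for e in range(100)]
```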

The proposed DDLA framework improves on existing methods such as RandLA-Net and SQN in both architectural heritage segmentation accuracy and nuanced feature representation. RandLA-Net excels in processing large-scale point clouds through random sampling and localized feature aggregation, whereas DDLA introduces a hierarchical attention mechanism that captures multi-scale contextual information, from room-level spatial arrangements to object-level decorative details (e.g., roof ridges, window carvings), a capability absent in RandLA-Net’s architecture. Table 3 compares the three segmentation approaches, listing the scores of all models on the six categories; bold entries mark the categories in which DDLA has the advantage. With only 0.01% weak labeling, DDLA achieves an mIoU of 63.55%, surpassing the 53.95% of the SQN method [36]. This is a gain of 9.60 percentage points, or a 17.8% relative increase, and the improvement holds across all semantic categories. Traditional buildings are difficult to separate from new buildings, yet our method attains an IoU of 62.3% for this class (vs. 49.5% for SQN), demonstrating a superior capability in preserving heritage architectural elements through enhanced global-feature alignment via the proposed DAL-Net integration. The vegetation class improves by 11.14 percentage points (57.58% vs. 46.44%), likely owing to the attention pooling mechanism’s sensitivity to fine-grained texture variations.
Despite challenges in alleys and streets (a gain of only 1.34%), our framework maintains robustness through multi-scale feature fusion, validating its applicability to irregular urban layouts. The proposed method also reduces the inter-class performance gap, alleviating the class imbalance issue prevalent in heritage datasets, where dominant classes (e.g., traditional buildings) often overshadow minor ones (e.g., courtyards).
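Because absolute gains (percentage points) and relative gains (percent of the baseline) are easy to confuse, the short sketch below computes both from the SQN and DDLA scores quoted in this section. The helper name `gains` is introduced here purely for illustration.

```python
def gains(baseline, ours):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = ours - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative

# (SQN score, DDLA score) in percent, as quoted in the text.
reported = {
    "mIoU": (53.95, 63.55),
    "traditional buildings": (49.5, 62.3),
    "vegetation": (46.44, 57.58),
}

for name, (sqn, ddla) in reported.items():
    absolute, relative = gains(sqn, ddla)
    print(f"{name}: +{absolute:.2f} points, +{relative:.1f}% relative")
```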

For transparent comparison, all experiments used identical protocols: (1) shared datasets (S3DIS and custom-collected Xisongbi Village) preserving spatial resolution; (2) strict weak supervision with 0.01% labeled data through stratified sampling; (3) consistent preprocessing; and (4) fixed hyperparameters. The evaluation adopted mIoU with per-class IoU decomposition, ensuring metric uniformity across methods.
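The evaluation metric can be made concrete with a short sketch of per-class IoU and mIoU computed from predicted and reference labels. It illustrates the standard definition, not the authors’ evaluation script; classes absent from both label sets are skipped when averaging, which is one common convention.

```python
import numpy as np

def per_class_iou(y_true, y_pred, num_classes):
    """IoU per class: |intersection| / |union| of the points assigned to that class."""
    ious = []
    for cls in range(num_classes):
        inter = np.sum((y_true == cls) & (y_pred == cls))
        union = np.sum((y_true == cls) | (y_pred == cls))
        ious.append(inter / union if union > 0 else np.nan)
    return np.array(ious)

def mean_iou(y_true, y_pred, num_classes):
    """Mean of the per-class IoU values, ignoring classes that never occur."""
    return float(np.nanmean(per_class_iou(y_true, y_pred, num_classes)))

y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
print(per_class_iou(y_true, y_pred, 2), mean_iou(y_true, y_pred, 2))
```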

This study uses the S3DIS dataset (Stanford Large-Scale 3D Indoor Spaces Dataset), a widely adopted public resource comprising 272 rooms in six large indoor areas with over 695 million points, each annotated with one of 13 semantic categories (e.g., ceiling, wall, furniture). Its centimeter-level precision and comprehensive annotations make it a standard benchmark for validating 3D semantic segmentation, and its open-source toolchain and standardized evaluation protocols ensure reproducibility and enable direct comparison with baseline methods. Although S3DIS consists of indoor scenes, its macro-scale architectural structures (such as walls, doors, and windows) and micro-scale objects (such as furniture and lamps) form a natural multi-scale feature distribution. Its class imbalance and sparse annotation requirements also closely match the practical challenges of cultural heritage contexts, where point cloud annotation costs remain prohibitively high. The applicability of S3DIS therefore stems not from its geographic coverage but from its multi-scale feature structure, sparse annotation challenges, and standardized validation framework, which collectively provide methodological support for heritage preservation scenarios. By combining S3DIS with self-collected outdoor data and the proposed model architecture, this study demonstrates the feasibility of transferring knowledge from indoor to outdoor environments. It also shows the value of weakly supervised learning in cultural heritage digitization, where such approaches can significantly reduce reliance on costly manual annotation while maintaining robust segmentation performance across spatial scales and environmental conditions.
S3DIS nevertheless has limitations for this purpose: it is designed for indoor scenes, lacks outdoor urban or territorial data, and was not tailored for cultural heritage preservation, so its annotation categories and data distribution do not match those required for heritage scenarios. Given the scarcity and incompleteness of high-quality annotated datasets in cultural heritage preservation, we supplemented S3DIS with a self-collected dataset from Xisongbi Village. With its higher spatial resolution and richer cultural heritage features, this dataset better validates the model’s performance in real-world heritage applications. The method is implemented in TensorFlow for weakly supervised semantic segmentation of large 3D point clouds, including large datasets such as S3DIS, and it maintains high segmentation accuracy even with a very low percentage of labels (Table 4).

Future improvement directions include targeted data augmentation to develop geometry-preserving synthetic point cloud generation for underrepresented classes, such as courtyards, in order to mitigate dataset bias. Multi-modal feature fusion will be explored by integrating LiDAR intensity channels and temporal consistency for dynamic scenes to enhance the robustness of segmentation.

4.2. Generalization Experiment

To validate the generality and robustness of the proposed weakly supervised DDLA point cloud segmentation method in real-world heritage conservation scenarios, we conducted additional experiments in Hongshan Traditional Village, Shanxi Province. This village is a complex rural settlement characterized by dense historical buildings, irregular street networks, and heterogeneous vegetation distributions.

Using a DJI Mavic 3E UAV, we acquired high-density 3D point cloud data comprising 12.8 million points over an area of 0.6 square kilometers. Only 0.01% of these points were manually labeled into six semantic categories: traditional buildings, streets, new buildings, courtyards, vegetation, and others.
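At a 0.01% labeling rate, a cloud of 12.8 million points retains roughly 1280 labeled points. The sketch below shows one way such sparse labels could be drawn per class; the function name, the ignore value of -1, and the rule of keeping at least one point per class are assumptions made for illustration rather than the procedure used in this study.

```python
import numpy as np

def sample_weak_labels(labels, ratio=1e-4, ignore_label=-1, seed=0):
    """Keep about `ratio` of the labels in each class and mark the rest as ignored."""
    rng = np.random.default_rng(seed)
    weak = np.full_like(labels, ignore_label)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        n_keep = max(1, int(round(ratio * idx.size)))  # at least one point per class
        keep = rng.choice(idx, size=n_keep, replace=False)
        weak[keep] = cls
    return weak

# Synthetic example: six classes with 20,000 points each.
labels = np.repeat(np.arange(6), 20_000)
weak = sample_weak_labels(labels)
print(int(np.sum(weak != -1)), "labeled points out of", labels.size)
```

Sampling within each class rather than over the whole cloud keeps rare categories such as courtyards represented even at very low labeling rates.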

We directly applied the pre-trained DDLA model (trained on the S3DIS and Xisongbi datasets) without fine-tuning to assess its zero-shot adaptation capability. The model’s hierarchical attention mechanism captures multi-scale geometric patterns, while the weakly supervised contrastive loss ensures robust feature discrimination even under sparse annotation constraints. As shown in Figure 10, DDLA achieved precise segmentation of the primary heritage components.

5. Results

5.1. Orthophoto and 3D Model

In this study, uncrewed aerial vehicle (UAV) technology was used to acquire a centimeter-resolution digital elevation model (DEM) and to analyze spatially the surroundings of the heritage buildings. Before the spatial morphology analysis, a digital surface model (DSM) was constructed, which provides more reliable data for the analysis and helps improve the interpretation accuracy of the spatial element extraction results [47]. To obtain spatial data, including orthophotos, 3D models, and point cloud data of Xisongbi Village, we imported the image data with coordinate positioning information into ContextCapture mapping software (v.4.4.10) for 3D model reconstruction. The workflow is as follows. First, uniform color correction is applied to all images. The adjusted image data are then imported into ContextCapture, where the CMOS dimensions, focal length, and pixel details of the UAV camera are entered to perform aerial triangulation. This process generates sparse and dense point clouds. The dense point cloud is then used to construct a triangular irregular network (TIN), onto which the image texture is mapped to generate the 3D model (Figure 11).

In addition, this study used Pix4D software to generate orthophotos. After importing the data, the coordinate positions of the control points in the images were calibrated and optimized against the ground control points (GCPs) measured in the field [48]. The resulting orthophotos of the traditional village reach a ground resolution of 2 cm (Figure 12).

5.2. Point Cloud Segmentation Results

This paper introduces the DDLA self-attention and attention pooling blocks into the SQN [36] framework and enhances the position encoding module so that local neighborhood information in the point cloud is captured more efficiently. Local features and structures are thus captured more precisely during point cloud processing, which improves segmentation accuracy. In addition, our model uses as few as 1‰ of the training labels on several large open datasets and still achieves accuracy close to that of full supervision. The qualitative results of our method are illustrated in Figure 13.

Compared to the SQN approach [42], our model achieves more advanced segmentation performance, validating the effectiveness of the enhanced location coding, self-attention, and attention pooling modules. It is worth noting that the model still performs well under very low supervision, reflecting its strong robustness in weakly supervised learning scenarios.

Based on the segmented point cloud data, point cloud processing technology was employed to analyze and quantify the point cloud distance, slope, morphology, and other features within each layer. The point cloud data were processed in layers and then analyzed further from macro to micro level for the traditional village of Xisongbi.

5.3. Spatial Form Features

In this study, using a point cloud segmentation method, we classified the spatial elements of the village into three categories: buildings, roads, and others. We then extracted the spatial data for each category separately. Buildings include traditional, historical, and new structures; roads are divided into main streets and alleys; and the “other” category encompasses vegetation, cultivated land, and terrain. For practical application, we used UAV-acquired image data of Xisongbi Village to reconstruct a 3D model in ContextCapture. This process applied photogrammetry and multi-view stereo (MVS) techniques: feature points were extracted from overlapping images, matched to estimate camera positions, and used to generate a dense point cloud and textured mesh. The spatial data were then extracted directly from the 3D model using the measurement tools in Acute3D Viewer to visualize different spatial elements. By selecting a specific area, we could accurately calculate an element’s area, perimeter, and distance. In preliminary analyses, we efficiently extracted key spatial attributes such as the structural characteristics of buildings, the extent and morphology of water bodies, the distribution of vegetation, the area of cultivated land, and the dimensions of roads and alleys. To ensure the accuracy of spatial data extraction, we further combined the segmented point cloud data with point cloud processing techniques to refine measurements of distance, slope, and morphological features for each layer of the point cloud.

6. Discussion

In the discussion section, we applied the research strategy proposed in Section 3, utilizing low-altitude multi-view digital measurement techniques to collect data from Xisongbi Village. This approach enabled the generation of a 3D model, orthophoto maps, and point cloud data. Subsequently, we employed our proposed DDLA weakly supervised point cloud segmentation method to analyze the spatial morphology. Specifically, the 3D model facilitated the analysis of architectural features, orthophotos were used for coordinate correction and data refinement, and point cloud data allowed for the quantification of the village’s topography, architectural texture, street distribution, and public spaces at macro, meso, and micro levels using CloudCompare point cloud analysis software. Based on these analyses, we propose corresponding conservation strategies, aiming to provide a scientific foundation and practical guidance for the preservation and sustainable development of Xisongbi Village’s cultural heritage.

6.1. Macro-Analysis

The spatial morphological characteristics of the study area are obtained from the low-altitude multi-view photogrammetric model in three steps. First, the real-scene 3D model data are converted into a village orthophoto and a digital elevation model. Second, the point cloud is segmented with the DDLA weakly supervised method to produce an accurate classification of the village morphology elements. Third, the ground and above-ground elements are separated and extracted, yielding 3D models of multiple subsystems, such as traditional buildings, vegetation, roads and lanes, newly built buildings, yards, and cultivated land (Figure 14). These models highlight subtle changes in the village’s spatial morphology and clearly and intuitively reflect its natural geomorphological characteristics.

Xisongbi Village is an agglomerated settlement with a net-shaped layout and a roughly trapezoidal plan. Within the study area of 3.7852 km², buildings occupy 24%, roads 4%, cultivated land 35%, and vegetation approximately 31% (Figure 15). The traditional settlement is clustered on a loess terrace with a distinctive ecological landscape system. The village covers a large area and lies in a semi-hilly region on high ground surrounded by ditches on all sides; the terrain is high in the east and low in the west, the soil is loess, and the village is adjacent to the Longfeng River. Vegetation and forest account for a relatively high proportion of the village and are an important landscape component, so their ecological landscape function should be protected, while advantageous resources should be developed appropriately to increase the economic benefits of the village. The village mainly faces disasters such as drought, landslides, and mudslides caused by bad weather. Water is relatively scarce, but it is one of the important elements of the village layout and should be protected and utilized. In general, the arable land, vegetation, and overall building layout are well preserved. However, the water system is not protected and the original ponds are seriously polluted, which makes them the focus of subsequent landscape and environmental improvement.

6.1.1. Landscape Zoning Study

Many traditional villages with a long history face the destruction of their spatial forms, largely because control measures for new buildings are not implemented. Without effective planning and management, new buildings often do not harmonize with traditional architectural styles, resulting in an unplanned mix of old and new buildings. This mix not only affects the overall aesthetics of the village but, more importantly, makes it considerably harder to identify and preserve the village’s historic character area. Defining historic landscapes requires clear visual and spatial boundaries, a task complicated by the disorganized mix of old and new buildings, which in turn affects the preservation and transmission of this precious cultural heritage. This study therefore exploits the predominant architectural traits of Xisongbi Village, where historic buildings are distinguished by their sloping roofs and considerable height. After the building layers are separated with the DDLA weakly supervised point cloud segmentation method introduced earlier, the slope and the elevation relative to the ground are computed from the point cloud data. The buildings of Xisongbi fall into two categories: historic structures and contemporary constructions. Historic buildings encompass culturally significant heritage sites and traditional dwellings (sloping roofs, greater heights, primarily erected during the Ming and Qing dynasties), whereas contemporary constructions have flat roofs and lower heights and were predominantly built during the 1980s and 1990s. Subsequently, the 3D reality model was used to refine and correct the building data, resulting in a distribution map of the two building categories (Figure 16).
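The slope-and-height criterion described above can be sketched as follows. This is a minimal illustration, not the authors’ implementation: it assumes that normals are already available for the roof points of each building, and the thresholds `slope_thresh` and `height_thresh` are hypothetical placeholders that would need calibration against field data.

```python
import numpy as np

def slope_degrees(normals):
    """Angle between each surface normal and the vertical axis, in degrees."""
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    return np.degrees(np.arccos(np.clip(np.abs(n[:, 2]), 0.0, 1.0)))

def classify_building(roof_z, roof_normals, ground_z,
                      slope_thresh=15.0, height_thresh=4.0):
    """Label a building 'historic' if its roof is both sloped and high.

    roof_z:       (N,) elevations of the roof points
    roof_normals: (N, 3) normals of the roof points
    ground_z:     ground elevation at the building footprint
    """
    height = np.percentile(roof_z, 95) - ground_z   # robust ridge height above ground
    slope = np.median(slope_degrees(roof_normals))  # typical roof pitch
    is_historic = slope >= slope_thresh and height >= height_thresh
    return ("historic" if is_historic else "new"), float(height), float(slope)
```

Using the median slope and a high percentile of the elevation, instead of the maximum, makes the rule less sensitive to stray points from chimneys or overhanging trees.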
Building on this foundation, point cloud analysis software was used to perform point-to-point distance measurements between the point cloud layers of the two building categories. This yielded a distance heat map that illustrates the spatial relationship between historic and contemporary buildings (Figure 17). Finally, by adjusting the color threshold of the heat map, the influence area and boundary line of the historic buildings are determined, which provides a reference for delineating the core area of the historic features [49].
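The distance computation behind such a heat map can be sketched as a nearest-neighbor search between the two building layers. The brute-force NumPy version below is for illustration only, and the function names and the threshold are assumptions; for clouds with millions of points, a spatial index such as a KD-tree or dedicated point cloud software would be used instead.

```python
import numpy as np

def nearest_distances(source, target, chunk=2048):
    """Distance from every source point to its nearest target point.

    source: (N, 3) array, e.g. points of new buildings
    target: (M, 3) array, e.g. points of historic buildings
    """
    out = np.empty(len(source))
    for start in range(0, len(source), chunk):     # process in chunks to bound memory
        block = source[start:start + chunk]
        d2 = ((block[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        out[start:start + chunk] = np.sqrt(d2.min(axis=1))
    return out

def influence_mask(distances, threshold):
    """Points lying within `threshold` metres of the historic buildings."""
    return distances <= threshold

new_pts = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [30.0, 40.0, 0.0]])
historic_pts = np.array([[0.0, 0.0, 0.0]])
d = nearest_distances(new_pts, historic_pts)
print(d, influence_mask(d, threshold=10.0))
```

Changing `threshold` plays the same role as adjusting the color threshold of the heat map: it moves the boundary of the area considered to be influenced by the historic buildings.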

Based on the above analysis, and considering the protection difficulty and unique geographical environment of Xisongbi Village together with the distribution of historic buildings and historic environmental elements, the protection zones are delineated as follows: the red area in Figure 16 is designated as the core landscape area; the yellow and green areas, which extend outward from the traditional buildings at their center, are set as the landscape coordination area; and the blue area of the village is set as the construction control area. Together these form the protection zone map of the village (Figure 18).

6.1.2. Characterization of Settlement Topography

The village is located on a loess terrace. The terrain is high and surrounded by ditches on all four sides, with plains and hills to the east, Mianshan Mountain to the south, the Beijing–Kunming Expressway and the Danyun high-speed rail line to the west, and a plain to the north. The terrain within the village itself is not hilly, so the village form is little affected by topography: the road network and layout are square and regular, and the building layout is concentrated. The village’s traditional residential buildings are located in the center and south.

We analyzed the surface elevation data represented by the digital surface model (DSM), which shows the extent of ground, buildings, and vegetation distribution in the village space and surface elevations (Figure 19). The terrain elevation predominantly ranges from 855 m to 994 m, exhibiting a gradual decline from the eastern to the western regions. This suggests that the village’s construction adhered to the natural topography without substantial artificial alterations. In terms of spatial distribution, the western area lies at a lower elevation compared to the eastern side, which mitigates the obstruction of summer southwestern winds, thereby enhancing ventilation and cooling. Furthermore, a gentle slope of 1–2 m exists between the building clusters and the farmland, facilitating efficient rainwater drainage and effectively preventing waterlogging issues.

The topography of Xisongbi has a slope between 0 and 15 degrees. The gentle, fertile terrain ensures sufficient arable land for cultivation, production, and daily life. It is also conducive to constructing plazas and larger buildings that give residents places to socialize.
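A slope map of this kind can be derived from a gridded DSM with finite differences. The sketch below assumes a regular grid whose cell size is given in meters; the function name and the example grid are illustrative, and GIS packages offer equivalent tools.

```python
import numpy as np

def slope_from_dsm(dsm, cell_size):
    """Slope in degrees for every cell of a regular elevation grid.

    dsm:       2D array of elevations in metres
    cell_size: grid spacing in metres (same in both directions)
    """
    dz_drow, dz_dcol = np.gradient(dsm, cell_size)   # elevation change per metre
    return np.degrees(np.arctan(np.hypot(dz_drow, dz_dcol)))

# Synthetic plane rising 1 m per metre along the columns: slope is 45 degrees everywhere.
dsm = np.tile(np.arange(5.0), (5, 1))
print(slope_from_dsm(dsm, cell_size=1.0))
```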

Analysis of the overall settlement terrain shows that the original village followed a terraced terrain pattern. However, once the village land was saturated, population growth forced new houses to be built below the terrace, where they occupied part of the arable land. This fragmented the arable land and destroyed the original terrace pattern, resulting in high building density, little open space, and constrained agricultural and economic development. In subsequent spatial landscape planning, new buildings will be removed where appropriate to restore the landscape form of the settlement, increase open space, and eliminate potential safety hazards. Horizontal terrain analysis of the oblique photographs will also be used to sort out the layout pattern within the cluster and to provide protection suggestions for building zoning, functional nodes, and road design on this distinctive terrain, promoting effective use of the village terrain and maintaining the original spatial morphology of the traditional village.

6.2. Meso-Analysis

6.2.1. Built Fabric Analysis

The structural arrangement of traditional villages can be examined through the analysis of building intervals and architectural shapes, which allows the density of the village layout to be assessed precisely. The neighborhood distance values computed from the point cloud data of the building layers (Figure 20) show that building intervals in Xisongbi Village are primarily within the range of 1 to 6 m. In the core landscape area, the intervals between buildings are typically under 0.6 m, while some newer constructions have intervals ranging from 1 to 2 m. The tightly packed, dense arrangement of buildings is a significant aspect of the architectural texture of Xisongbi Village. This analytical approach gives an accurate and efficient understanding of the overall architectural characteristics of the traditional village and supports the subsequent preservation and restoration of building volumes, intervals, and related data.

6.2.2. Building Height Analysis

To maintain the traditional historical characteristics of the Xisongbi settlement, we processed the point cloud data for historic and new buildings separately, estimating building heights from the distribution of point elevations above the ground. As illustrated in Figure 21, the point heights for historic buildings are primarily concentrated between 3.5 and 6.5 m, peaking around 5 m. This indicates that traditional buildings are generally about 5 m high and seldom exceed 8 m. The point cloud data for new buildings (Figure 22) show heights distributed between 3 and 6 m, with a peak between 4 and 5 m. Point density decreases with increasing height, suggesting that most new buildings are single-story, with a few having more than one floor. This height analysis allows the current landscape of the village to be assessed comprehensively and shows that the heights of most new buildings in Xisongbi Village meet the landscape requirements. Combined with sight lines and the topography of the various protection zones, the 3D reality model is used to screen buildings that affect the landscape and to propose remediation requirements. The 5 m peak height of historic buildings should serve as the reference for setting height control requirements in the core landscape area. Statistical analysis of the point cloud data derived from the low-altitude photogrammetric model is thus an efficient way to establish parameter thresholds and quantitatively assess landscape control requirements.
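The height statistics described above amount to a histogram of point elevations above the ground. A minimal sketch follows; the bin width, the height cap, and the function name are assumptions chosen for illustration, and the sample values are synthetic.

```python
import numpy as np

def height_profile(z, ground_z, bin_width=0.5, max_height=12.0):
    """Histogram of heights above ground and the most populated height bin.

    z:        (N,) elevations of the building points
    ground_z: scalar or (N,) ground elevation below each point
    """
    heights = np.asarray(z) - ground_z
    heights = heights[(heights > 0) & (heights <= max_height)]
    edges = np.arange(0.0, max_height + bin_width, bin_width)
    counts, edges = np.histogram(heights, bins=edges)
    peak = int(np.argmax(counts))
    return counts, edges, (float(edges[peak]), float(edges[peak + 1]))

# Synthetic elevations (metres) for a handful of roof points on flat ground.
z = np.array([4.9, 5.1, 5.2, 5.3, 3.6, 6.4, 5.4])
counts, edges, peak_bin = height_profile(z, ground_z=0.0)
print("peak height bin (m):", peak_bin)
```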

6.2.3. Spatial Network Analysis of Streets and Alleys

Street space plays a crucial role in the traditional village landscape, and the ratio of building height to street width, known as the height-to-width ratio, serves as a key parameter for assessing spatial morphology [50]. Figure 23 maps the variation and location of road widths within Xisongbi Village. Main roads are generally 4 to 8 m wide, traditional roads 2.5 to 4.2 m, and lanes 1.6 to 2.7 m. The height-to-width ratio of streets and alleys does not differ markedly between the historic core area and the newly built area: it lies between 0.3 and 1 for traditional streets and alleys in the core area and between 0.5 and 1.2 in the newly built area. This shows that the newly built buildings are similar in height to the traditional ones, so the street scale is comparable across the two areas. Measuring street and lane dimensions through oblique photography provides quantitative and systematic indexes of the scale of traditional village streets and lanes, forming a basis for the subsequent upgrading of the townscape.
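The height-to-width ratio itself is a simple quotient, sketched below together with a check against a target range. The sample height and width are hypothetical values chosen for illustration, not measurements from the village; only the 0.3 to 1 range comes from the text.

```python
def height_to_width_ratio(building_height_m, street_width_m):
    """Ratio of the flanking building height to the street width."""
    if street_width_m <= 0:
        raise ValueError("street width must be positive")
    return building_height_m / street_width_m

def within_range(ratio, low, high):
    """True if the ratio lies inside the closed interval [low, high]."""
    return low <= ratio <= high

# Hypothetical street section: 3.0 m eave height along a 4.0 m wide road.
ratio = height_to_width_ratio(3.0, 4.0)
print(ratio, within_range(ratio, 0.3, 1.0))
```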

6.3. Micro-Analysis

6.3.1. Analysis of Public Space

The analysis of architectural texture in the previous section suggests that public spaces are scarce in the traditional village of Xisongbi. Using the 3D model and the landscape zoning, we selected the more typical public spaces in the village: Sanguan Temple, the Ming and Qing Academy, Xingguo Temple, Longwang Temple, Tiandi Hall, the villagers’ activity site, the villagers’ activity center, and the garden recreation space. Combined with field research, this shows that the public spaces are scattered across the village. Based on spatial topology theory [51], the public space nodes are extracted and superimposed on the village road connectivity analysis, which identifies the public space nodes with high connectivity (Figure 24). The results show that, except for the Ming and Qing Academy, Longwang Temple, the villagers’ activity center, the garden recreation space, Sanguan Temple, and the villagers’ activity site, the public spaces are distributed around the traditional architectural clusters, forming a relatively independent and weakly connected public space system.

These public spaces are spread across the southwest, south, and southeast of the village, so villagers in the northern area must travel farther to reach them, and landscape accessibility there is poorer. At the same time, the distribution of public spaces reflects a strong sense of clan identity in Xisongbi. It is therefore essential to prioritize optimizing the layout of the public spaces around the Ming and Qing Academy and Longwang Temple, the villagers' activity center and the garden recreation space, and Sanguan Temple and the villagers' activity site. Without destroying the clustered character of the settlement, some buildings in the landscape coordination area should be removed where appropriate to create additional public spaces and improve the connectivity of the village's public space system.

6.3.2. Analysis of Architectural Features

In micro-level architectural landscape analysis, the color, scale, height, and spacing of buildings can be statistically analyzed by isolating the point cloud of a specific building and applying the methods described above to study its type, exterior form, and dimensional characteristics. Most of the historic buildings in Xisongbi Village are typical Shanxi courtyard residences, dominated by one- and two-courtyard layouts (Figure 25), and commonly have double-pitched roofs. Influenced by local climatic conditions, the roofs are steeply pitched, typically at 25 to 30 degrees, which facilitates efficient water runoff. Most building elevations have a 2:1 ratio.

Using the low-altitude photogrammetric technique described above, the dimensions of some historic buildings in Xisongbi Village were measured and tabulated (Table 5). This not only quantifies the scale indexes of the historic buildings systematically but also supports a clear analysis of architectural style and provides a scientific basis for future restoration and conservation. By applying point cloud segmentation to facade data, a repository of architectural stylistic components, including roofing systems, wall structures, entrance configurations, and fenestration elements, can be compiled systematically (Table 6). Extracting the point cloud data of the historic buildings allows an initial assessment of their landscape characteristics and scale relationships and lays the foundation for subsequent in-depth analysis.
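As an illustration of how building dimensions might be read from an isolated building point cloud, the following sketch estimates length, width, height, and footprint area from an (N, 3) coordinate array. It is an assumption for illustration, not the measurement procedure used in this study: the function name is hypothetical, the plan orientation is taken from a principal component analysis of the XY coordinates, and the test cloud is synthetic.

```python
import numpy as np


def building_dimensions(points):
    """Estimate length, width, height (m) and footprint area (m^2) of one building.

    The plan-view axes come from a PCA of the XY coordinates, so the
    footprint box follows the building orientation rather than the map grid.
    """
    pts = np.asarray(points, dtype=float)
    xy = pts[:, :2] - pts[:, :2].mean(axis=0)
    # Eigenvectors of the 2x2 covariance give the principal plan directions.
    _, vecs = np.linalg.eigh(np.cov(xy.T))
    rotated = xy @ vecs
    extents = rotated.max(axis=0) - rotated.min(axis=0)
    length, width = float(extents.max()), float(extents.min())
    height = float(pts[:, 2].max() - pts[:, 2].min())
    return length, width, height, length * width


# Synthetic check: a 9.8 m x 7.0 m x 9.5 m box rotated 30 degrees in plan.
gx, gy = np.meshgrid(np.linspace(0.0, 9.8, 50), np.linspace(0.0, 7.0, 36))
x, y = gx.ravel(), gy.ravel()
z = np.linspace(0.0, 9.5, x.size)
t = np.radians(30.0)
cloud = np.column_stack([
    x * np.cos(t) - y * np.sin(t) + 500.0,  # arbitrary plan offset
    x * np.sin(t) + y * np.cos(t) + 200.0,
    z + 945.0,                              # arbitrary base elevation
])
length, width, height, area = building_dimensions(cloud)
print(f"{length:.2f} x {width:.2f} x {height:.2f} m, footprint {area:.1f} m^2")
```

A PCA-aligned box is only an approximation for L-shaped or irregular footprints, and the height here is the full vertical extent of the cloud, so ground points should be removed first.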

7. Conclusions

Taking the traditional village of Xisongbi as the research object, this study explores how digital technology can be combined with a weakly supervised point cloud segmentation method to analyze the spatial form of the village in depth, and it conducts a comprehensive analysis from the three perspectives of architecture, streets, and public space. First, the point cloud data obtained by low-altitude photogrammetry were classified with the point cloud segmentation method. Then, based on quantitative analysis of the segmented point cloud data, the architectural texture, landscape zoning, road and lane widths, and topographic features were each examined. The main conclusions of the study are as follows:

(1) Compared with traditional two-dimensional mapping methods, digital technology not only completes data acquisition quickly but also provides a more intuitive and comprehensive three-dimensional model. The digital surface model is obtained by low-altitude multi-view photogrammetry, and image processing technology is used to rapidly generate vectorized spatial morphology feature information such as orthophotos, three-dimensional real-view models, and local field-of-view models of the settlement. Ground control points (GCPs) measured in the field ensure the required accuracy of the photogrammetric products. The data collection method in this paper is well suited to acquiring spatial data from traditional villages; it is efficient and accurate, which helps improve the efficiency of traditional village landscape research.

(2) The DDLA point cloud segmentation method proposed in this study is a weakly supervised 3D point cloud segmentation model that uses enhanced position encoding, self-attention mechanisms, and attention pooling modules to improve segmentation accuracy when only a small amount of labeled data is available. Combining these techniques yields an efficient model that can achieve performance comparable to fully supervised methods under limited supervision. This approach opens new avenues for applying deep learning to large-scale 3D point cloud segmentation, especially when labeled data are scarce, and it significantly improves the accuracy of spatial element segmentation. Ultimately, multiple layers of 3D vector information, including terrain, buildings, roads, vegetation, and others, are generated and can be overlaid and analyzed. Separating the layers removes interference from other elements and allows each system to be analyzed on its own. It also makes correlations among elements easier to identify, which aids a deeper understanding of how traditional village spatial forms developed and of the design wisdom behind them. This is especially important when studying villages with complex terrain.

(3) The methodology of this study overcomes the limitations of traditional two-dimensional data analysis. Spatial and environmental characteristics are analyzed quantitatively by extracting different types of spatial units (e.g., buildings, roads, and open spaces). Calculations based on point cloud data provide precise statistics on the elements of the traditional village landscape. These statistics can distill the overall characteristics of the village and help infer the spatial and temporal evolution of spatial patterns, which is difficult to achieve with traditional methods. This technical innovation is of great significance for analyzing traditional villages with dense buildings, complex topography, and large areas: it improves the accuracy of spatial pattern analysis and provides important support for the regional protection of village spatial patterns.
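The attention pooling named in conclusion (2) can be illustrated with a generic NumPy sketch. This is not the DDLA implementation: it shows only the common pattern in which each point's neighbour features are scored, the scores are normalized with a softmax over the neighbours, and the features are summed with those weights. The function names, array shapes, and the score matrix `w_score` are assumptions for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(neighbor_feats, w_score):
    """Aggregate neighbour features with learned attention weights.

    neighbor_feats: (N, K, C) features of the K neighbours of each of N points.
    w_score:        (C, C) scoring matrix (learned in a real network).
    Returns an (N, C) array with one pooled feature vector per point.
    """
    # Per-channel attention weights that sum to 1 over the K neighbours.
    scores = softmax(neighbor_feats @ w_score, axis=1)
    return (scores * neighbor_feats).sum(axis=1)


rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 16, 8))        # 4 points, 16 neighbours, 8 channels
w = rng.normal(size=(8, 8)) * 0.1          # hypothetical, untrained weights
pooled = attention_pool(feats, w)
print(pooled.shape)
```

With an all-zero score matrix the weights become uniform and the operation reduces to mean pooling, which is a convenient sanity check.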

This paper provides a systematic research method that runs from data acquisition to spatial feature extraction. A spatial morphology analysis method that combines point cloud segmentation with digital technology accurately identifies key spatial elements of the village, such as building groups, public facilities, and transportation networks. Built on a weakly supervised learning framework, the study significantly improves the accuracy and efficiency of point cloud segmentation by using a small amount of labeled data together with a large amount of unlabeled data, which reduces the dependence on manual labeling. By further analyzing the segmentation results, the spatial structure model of Xisongbi Village is constructed, and the influence of its spatial morphology on village function, social structure, and historical and cultural heritage is explored. The spatial data acquisition and feature extraction methods proposed in this paper are widely applicable: they can be applied to the study of other traditional villages and provide technical support for subsequent research and conservation work. The approach offers innovative methods and technical support for protecting the spatial patterns of traditional villages, aiding rural revitalization, and assisting in the repair of architectural sites, and it has important academic value and application prospects.

Based on our findings, future research should explore the integration of additional technologies and methodologies to further advance digital heritage conservation. For instance, combining LiDAR with UAV photogrammetry may yield even higher-resolution 3D reconstructions, particularly by capturing vertical architectural details that traditional approaches often overlook. Moreover, employing advanced self-supervised or semi-supervised learning frameworks could further reduce the reliance on large, annotated datasets, thereby enhancing segmentation performance in diverse and complex village environments. Research could also focus on developing dynamic monitoring systems that utilize time-series data to track temporal changes in village morphology, offering real-time insights into the impacts of urbanization and environmental factors. These directions not only build on our current methodology but also hold significant promise for the sustainable protection and development of traditional settlements.

Author Contributions

R.C.: conceived and designed the study, conducted the experiments, analyzed and interpreted the data, and wrote the paper; J.W.: reviewed and edited; L.L. and D.C.: investigated, contributed to the analytical tools, and prepared the visualizations. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Social Science Foundation of China Cold and Absolute Science Research Special Project (23VJXG030) and by the Centralized Guidance of Local Science and Technology Development Funds Project (No. YDZJSX2021A017).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Acknowledgments

We thank the village committees and villagers of the villages in the study sample for their assistance. In addition, we would like to thank Jiayu Zhao and Yaqian Bi from the College of Architecture and Art, Taiyuan University of Technology, for their support during the study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

UAV	Unmanned Aerial Vehicle
GCP	Ground Control Point
DSM	Digital Surface Model

References

1. Zheng, X.; Wu, J.; Deng, H. Spatial distribution and land use of traditional villages in southwest China. Sustainability 2021, 13, 6326.
2. Xia, J.; Gu, X.; Fu, T.; Ren, Y.; Sun, Y. Trends and Future Directions in Research on the Protection of Traditional Village Cultural Heritage in Urban Renewal. Buildings 2024, 14, 1362.
3. Wang, M.; Webber, M.; Finlayson, B.; Barnett, J. Rural industries and water pollution in China. J. Environ. Manag. 2008, 86, 648–659.
4. Lin, L.; Du, C.; Yao, Y.; Gui, Y. Dynamic influencing mechanism of traditional settlements experiencing urbanization: A case study of Chengzi Village. J. Clean. Prod. 2021, 320, 128462.
5. Lu, S.; Li, G.; Xu, M. The linguistic landscape in rural destinations: A case study of Hongcun Village in China. Tour. Manag. 2020, 77, 104005.
6. Hu, X.; Li, H.; Zhang, X.; Chen, X.; Yuan, Y. Multi-dimensionality and the totality of rural spatial restructuring from the perspective of the rural space system: A case study of traditional villages in the ancient Huizhou region, China. Habitat Int. 2019, 94, 102062.
7. Liu, Y.; Li, Y. Revitalize the world’s countryside. Nature 2017, 548, 275–277.
8. Qin, R.J.; Leung, H.H. Becoming a traditional village: Heritage protection and livelihood transformation of a Chinese village. Sustainability 2021, 13, 2331.
9. Turner, M.G. Spatial simulation of landscape changes in Georgia: A comparison of 3 transition models. Landsc. Ecol. 1987, 1, 29–36.
10. Hulshoff, R.M. Landscape indices describing a Dutch landscape. Landsc. Ecol. 1995, 10, 101–111.
11. Ma, H.D.; Tong, Y. Spatial differentiation of traditional villages using ArcGIS and GeoDa: A case study of Southwest China. Ecol. Inform. 2022, 68, 101416.
12. Liu, S.; Ge, J.; Li, W.; Bai, M. Historic environmental vulnerability evaluation of traditional villages under geological hazards and influencing factors of adaptive capacity: A district-level analysis of Lishui, China. Sustainability 2020, 12, 2223.
13. Zhang, Q.; Liu, Y.; Liu, L.; Lu, S.; Zhang, J. Strategy analysis for the interaction between tourism development and local eco-environment in traditional villages. J. Environ. Prot. Ecol. 2020, 21, 2279–2289.
14. Saadatseresht, M.; Hashempour, A.H.; Hasanlou, M. UAV photogrammetry: A practical solution for challenging mapping projects. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2015, 40, 619–623.
15. Tao, C.; Watts, B.; Ferraro, C.C.; Masters, F.J. A multivariate computational framework to characterize and rate virtual Portland cements. Comput.-Aided Civ. Infrastruct. Eng. 2019, 34, 266–278.
16. Guan, S.; Zhu, Z.; Wang, G. A review on UAV-based remote sensing technologies for construction and civil applications. Drones 2022, 6, 117.
17. Taddia, Y.; González-García, L.; Zambello, E.; Pellegrinelli, A. Quality assessment of photogrammetric models for façade building reconstruction using DJI Phantom 4 RTK. Remote Sens. 2020, 12, 3144.
18. Westoby, M.J.; Brasington, J.; Glasser, N.F.; Hambrey, M.J.; Reynolds, J.M. ‘Structure-from-Motion’ photogrammetry: A low-cost, effective tool for geoscience applications. Geomorphology 2012, 179, 300–314.
19. Zheng, X.; Wang, F.; Li, Z. A multi-UAV cooperative route planning methodology for 3D fine-resolution building model reconstruction. ISPRS J. Photogramm. Remote Sens. 2018, 146, 483–494.
20. Reshetyuk, Y. Terrestrial Laser Scanning: Error Sources, Self-Calibration and Direct Georeferencing; VDM Verlag Dr. Müller: Riga, Latvia, 2009.
21. Nex, F.; Remondino, F. UAV for 3D mapping applications: A review. Appl. Geomat. 2014, 6, 1–15.
22. Yao, H.; Qin, R.; Chen, X. Unmanned aerial vehicle for remote sensing applications—A review. Remote Sens. 2019, 11, 1443.
23. Liu, C.; Sui, H.; Huang, L. Identification of building damage from UAV-based photogrammetric point clouds using supervoxel segmentation and latent dirichlet allocation model. Sensors 2020, 20, 6499.
24. Calders, K.; Adams, J.; Armston, J.; Bartholomeus, H.; Bauwens, S.; Bentley, P.L.; Chave, J.; Danson, M.; Disney, M.; Gaulton, R.; et al. Terrestrial laser scanning in forest ecology: Expanding the horizon. Remote Sens. Environ. 2020, 251, 112102.
25. Pytharouli, S.; Souter, J.; Tziavou, O. Unmanned Aerial Vehicle (UAV) based mapping in engineering surveys: Technical considerations for optimum results. In Proceedings of the 4th Joint International Symposium on Deformation Monitoring, Athens, Greece, 15–17 May 2019.
26. Iizuka, K.; Ogura, T.; Akiyama, Y.; Yamauchi, H.; Hashimoto, T.; Yamada, Y. Improving the 3D model accuracy with a post-processing kinematic (PPK) method for UAS surveys. Geocarto Int. 2022, 37, 4234–4254.
27. Guarnieri, A.; Remondino, F.; Vettore, A. Digital photogrammetry and TLS data fusion applied to Cultural Heritage 3D modeling. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2006, 36, 1–6.
28. Tan, G.; Zhu, J.; Chen, Z. Deep learning based identification and interpretability research of traditional village heritage value elements: A case study in Hubei Province. Herit. Sci. 2024, 12, 200.
29. Cappellazzo, M.; Patrucco, G.; Spanò, A. ML Approaches for the Study of Significant Heritage Contexts: An Application on Coastal Landscapes in Sardinia. Heritage 2024, 7, 5521–5546.
30. Aissou, B.E.; Aissa, A.B.; Dairi, A.; Harrou, F.; Wichmann, A.; Kada, M. Building roof superstructures classification from imbalanced and low density airborne LiDAR point cloud. IEEE Sens. J. 2021, 21, 14960–14976.
31. Liao, L.; Tang, S.; Liao, J.; Li, X.; Wang, W.; Li, Y.; Guo, R. A supervoxel-based random forest method for robust and effective airborne LiDAR point cloud classification. Remote Sens. 2022, 14, 1516.
32. Haznedar, B.; Bayraktar, R.; Ozturk, A.E.; Arayici, Y. Implementing PointNet for point cloud segmentation in the heritage context. Herit. Sci. 2023, 11, 2.
33. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
34. Graham, B.; Engelcke, M.; Van Der Maaten, L. 3d semantic segmentation with submanifold sparse convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 9224–9232.
35. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30.
36. Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. Pointcnn: Convolution on x-transformed points. Adv. Neural Inf. Process. Syst. 2018, 31.
37. Choy, C.; Gwak, J.Y.; Savarese, S. 4d spatio-temporal convnets: Minkowski convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–19 June 2019; pp. 3075–3084.
38. Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, L.; Wang, Z.; Trigoni, N.; Markham, A. Randla-net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11108–11117.
39. Liu, Z.; Tang, H.; Lin, Y.; Han, S. Point-voxel cnn for efficient 3d deep learning. Adv. Neural Inf. Process. Syst. 2019, 32.
40. Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6411–6420.
41. Zhu, X.; Zhou, H.; Wang, T.; Hong, Z.; Ma, Y.; Li, W.; Li, H.; Lin, D. Cylindrical and asymmetrical 3d convolution networks for lidar segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9939–9948.
42. Hu, Q.; Yang, B.; Fang, G.; Guo, Y.; Leonardis, A.; Trigoni, N.; Markham, A. Sqn: Weakly-supervised semantic segmentation of large-scale 3d point clouds. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2022; pp. 600–619.
43. Li, J.; Chu, J.; Wang, Y.; Ma, M.; Yang, X. Reconstruction of traditional village spatial texture based on parametric analysis. Wirel. Commun. Mob. Comput. 2022, 2022, 5151421.
44. Elkhrachy, I. Accuracy assessment of low-cost unmanned aerial vehicle (UAV) photogrammetry. Alex. Eng. J. 2021, 60, 5579–5590.
45. Xi, R.; Jiang, W.; Meng, X.; Chen, H.; Chen, Q. Bridge monitoring using BDS-RTK and GPS-RTK techniques. Measurement 2018, 120, 128–139.
46. Valente, D.S.M.; Momin, A.; Grift, T.; Hansen, A. Accuracy and precision evaluation of two low-cost RTK global navigation satellite systems. Comput. Electron. Agric. 2020, 168, 105142.
47. Aguilar, M.A.; Jiménez-Lao, R.; Nemmaoui, A.; Aguilar, F.J. Geometric accuracy assessment of deimos-2 panchromatic stereo pairs: Sensor orientation and digital surface model production. Sensors 2020, 20, 7234.
48. Zeybek, M.; Şanlıoğlu, İ. Point cloud filtering on UAV based point cloud. Measurement 2019, 133, 99–111.
49. Gao, C.; Wang, Z.; Zhou, J. Exploration of the application strategy of three-dimensional digital technology mapping in traditional village architectural heritage conservation research. J. Inn. Mong. Univ. Technol. (Nat. Sci. Ed.) 2020, 39, 379–385. (In Chinese)
50. Zhao, Y.; Zhu, Z.; Huang, B.; Fei, J.; Jiang, J. Residential Identification in Jiaxing Old Town Based on UAV Tilt Photography. J. Zhejiang Univ. Eng. Ed. 2021, 55, 1083–1089. (In Chinese)
51. Peng, P.; Zhou, X.; Wu, S.; Zhang, Y.; Zhao, J.; Zhao, L.; Wu, J.; Rong, Y. An exploration of the self-similarity of traditional settlements: The case of Xiaoliangjiang Village in Jingxing, Hebei, China. Herit. Sci. 2024, 12, 196.

Figure 1. Location map (location of Xisongbi Village in China).

Figure 2. Extent of damage to traditional buildings: (a) the collapse of traditional compounds; (b) weed regrowth in traditional compounds.

Figure 3. The methodological flowchart (source: drawn by the author).

Figure 4. DJI Mavic 3E UAV.

Figure 5. Tilt-photography flight path.

Figure 6. Close-range photogrammetric route planning: (a) route plan for building elevations; (b) route plan for façade interface.

Figure 7. GCPs distribution map.

Figure 8. GCPs deviation figure.

Figure 9. Our model consists of three modules: Point Feature Extract, Neighboring Attention Query, and MLP prediction. The Point Feature Extract module utilizes the point cloud feature extraction module from the SQN algorithm. The Neighboring Attention Query module employs self-attention blocks and attention pooling blocks from DLA-Net to enhance the encoding of positional features and fuse point cloud features. Finally, the results are output through the MLP.

Figure 10. Segmentation results of DDLA for Redhill Village.

Figure 11. Three-dimensional realistic model.

Figure 12. Orthographic image.

Figure 13. Qualitative results of SQN and our proposed DDLA on the S3DIS dataset. Trained with only 1‰ annotations, DDLA achieves better results than the weakly supervised SQN method. The red bounding box highlights the superior segmentation accuracy of our DDLA.

Figure 14. Layered treatment of spatial elements in Xisongbi Village.

Figure 15. Spatial element occupancy diagram.

Figure 16. Distribution of traditional buildings.

Figure 17. Range of traditional building images.

Figure 18. Xisongbi Village conservation district zoning map.

Figure 19. Digital surface area.

Figure 20. Building spacing thresholds in Xisongbi Village. Cloud-to-Cloud (C2C) distance measures the point-wise Euclidean distance between two 3D point clouds: for each point in the source point cloud, it is the absolute distance to its nearest neighbor in the target point cloud.

Figure 21. Height of the historic building.

Figure 22. Height of new buildings.

Figure 23. Range of street widths calculated for Xisongbi Village. The horizontal axis is the street width, and the vertical axis is the total number of points in the point cloud at that width.

Figure 24. Public space analysis process diagram: (a) current distribution of public space in villages; (b) village public space analysis plan.

Figure 25. Structure of the compound: (a) courtyard with two borders; (b) courtyard with one border.

Table 1. The data acquisition process.

Stage	Data Acquisition Method	Main Outcome
Field Research	UAV low-altitude multi-view photogrammetry (integrating tilt photogrammetry and close-range photogrammetry) was employed to capture image data of traditional settlements and important historical buildings.	High-resolution images of the traditional village of Xisongbi and its historical buildings were obtained.
3D Modeling and Processing	The acquired image data were imported into 3D modeling software for processing.	High-precision 3D real-view models, orthophotos, and point cloud data were generated, providing foundational data for subsequent analysis.
Spatial Morphology Extraction	The generated point cloud data were processed, segmented, and classified.	Spatial morphological elements of the traditional village were extracted at macro, meso, and micro levels.

Table 2. Key parameters of the DJI Mavic 3E (Da-Jiang Innovations Science and Technology Co., Ltd., Shenzhen, China).

UAV Specifications	UAV Parameters	Camera Specifications	Camera Parameters
Diagonal wheelbase	302 mm	Camera model	Hasselblad L2D-20c
Maximum take-off weight	960 g	Image sensor	1/2 inch CMOS, 20 million px
Maximum altitude	6000 m	Equivalent focal length	24 mm
GNSS	GPS + GLONASS + Galileo + BeiDou	Field of view	84°
Hovering accuracy	Vertical: ±0.1 m, Horizontal: ±0.1 m	Aperture	f/2.8
Battery capacity	5000 mAh	Image size	5280 × 3956
Hovering time	45 min	Color mode	Dlog-M (10 bit), HDR video (HLG 10 bit)

Table 3. Accuracy assessment of the UAV model at the ground control points (GCPs).

GCP	Field Survey X (m)	Field Survey Y (m)	Field Survey Z (m)	Deviation dX (m)	Deviation dY (m)	Deviation dZ (m)
1	4,093,608.612	584,181.028	947.877	0.0287	0.3131	−0.0374
2	4,093,617.074	584,113.894	946.142	−0.0124	0.0112	0.0271
3	4,093,577.473	584,038.893	945.131	0.0111	0.0142	−0.2113
4	4,093,558.911	584,034.97	941.459	−0.0252	0.1637	−0.0438
5	4,093,610.642	583,952.754	945.17	0.0172	0.0286	−0.0254
6	4,093,547.986	583,914.41	943.528	0.0423	0.0367	0.0226
7	4,093,602.344	583,867.815	944.546	−0.0368	−0.0251	0.0541
8	4,093,715.501	583,841.577	945.149	−0.0123	0.0113	0.0114
9	4,093,723.687	583,960.747	946.056	0.0259	−0.0425	−0.0249
10	4,093,809.616	584,115.351	947.128	0.0468	0.0334	−0.0156
11	4,093,646.733	584,267.79	949.369	−0.0178	0.0122	0.0329
12	4,093,546.563	584,263.675	950.061	0.0498	0.0458	0.0216
13	4,093,532.849	584,312.357	951.133	0.0237	−0.0125	−0.0274
14	4,093,590.436	584,393.441	953.896	0.0429	−0.0349	0.0564
15	4,093,676.804	584,363.442	951.716	−0.0111	0.0274	0.0254
16	4,093,797.444	584,315.559	950.572	−0.0294	0.0369	−0.0235
17	4,093,661.076	584,285.025	950.361	0.0246	−0.0186	0.0358
18	4,093,647.181	584,176.039	947.807	0.0125	0.0425	0.0142
19	4,093,795.373	584,099.403	946.988	−0.0223	0.0291	−0.0329
20	4,093,754.054	583,950.697	945.816	0.0224	0.0524	0.0222
21	4,093,729.174	584,048.387	947.149	0.0423	0.0127	0.0142
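The deviations in Table 3 can be summarized as a root-mean-square error (RMSE) per axis. The sketch below is an illustrative calculation rather than part of the original workflow; the deviation values are copied from Table 3, while the function and variable names are hypothetical.

```python
import math

# (dX, dY, dZ) deviations in metres for GCPs 1 to 21, copied from Table 3.
DEVIATIONS = [
    (0.0287, 0.3131, -0.0374), (-0.0124, 0.0112, 0.0271), (0.0111, 0.0142, -0.2113),
    (-0.0252, 0.1637, -0.0438), (0.0172, 0.0286, -0.0254), (0.0423, 0.0367, 0.0226),
    (-0.0368, -0.0251, 0.0541), (-0.0123, 0.0113, 0.0114), (0.0259, -0.0425, -0.0249),
    (0.0468, 0.0334, -0.0156), (-0.0178, 0.0122, 0.0329), (0.0498, 0.0458, 0.0216),
    (0.0237, -0.0125, -0.0274), (0.0429, -0.0349, 0.0564), (-0.0111, 0.0274, 0.0254),
    (-0.0294, 0.0369, -0.0235), (0.0246, -0.0186, 0.0358), (0.0125, 0.0425, 0.0142),
    (-0.0223, 0.0291, -0.0329), (0.0224, 0.0524, 0.0222), (0.0423, 0.0127, 0.0142),
]


def rmse(values):
    """Root-mean-square error of a sequence of deviations."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Transpose rows into per-axis columns and compute one RMSE per axis.
rmse_x, rmse_y, rmse_z = (rmse(column) for column in zip(*DEVIATIONS))
rmse_xy = math.hypot(rmse_x, rmse_y)  # combined planimetric RMSE

# 1-based index of the GCP with the largest absolute deviation on any axis.
worst_gcp = 1 + max(
    range(len(DEVIATIONS)),
    key=lambda i: max(abs(v) for v in DEVIATIONS[i]),
)

print(f"RMSE X={rmse_x:.4f} m, Y={rmse_y:.4f} m, Z={rmse_z:.4f} m, XY={rmse_xy:.4f} m")
print(f"Largest single deviation at GCP {worst_gcp}")
```

The largest individual deviations in the table are dY at GCP 1 (0.3131 m) and dZ at GCP 3 (−0.2113 m), so these two points have a strong influence on the per-axis RMSE.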

Table 4. Quantitative results of different methods on the Area-5 of the S3DIS dataset. Mean IoU (mIoU, %) and per-class IoU (%) scores are reported.

Methods	mIoU (%)	Traditional Buildings	Alleys and Streets	New Construction	Courtyard	Vegetation	Clutter
SQN [42]	53.95	49.5	48.42	66.63	54.42	46.44	58.29
Ours	63.55	62.3	49.76	68.52	65.71	57.58	77.40
RandLA-Net [38]	30.03	16.21	28.29	48.40	15.44	35.42	36.44
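The relationship between the per-class IoU and the mean IoU in Table 4 can be checked with a short sketch. The per-class values are copied from Table 4; the function names and the 0.01 tolerance are assumptions for illustration, and the `iou` helper uses the standard definition TP / (TP + FP + FN).

```python
def iou(tp, fp, fn):
    """Intersection over union for one class from point counts."""
    return tp / (tp + fp + fn)


def mean_iou(per_class):
    """Unweighted mean of per-class IoU scores."""
    return sum(per_class) / len(per_class)


# Per-class IoU (%) in column order: traditional buildings, alleys and streets,
# new construction, courtyard, vegetation, clutter (values from Table 4).
PER_CLASS_IOU = {
    "SQN": [49.5, 48.42, 66.63, 54.42, 46.44, 58.29],
    "Ours (DDLA)": [62.3, 49.76, 68.52, 65.71, 57.58, 77.40],
    "RandLA-Net": [16.21, 28.29, 48.40, 15.44, 35.42, 36.44],
}
REPORTED_MIOU = {"SQN": 53.95, "Ours (DDLA)": 63.55, "RandLA-Net": 30.03}

for method, scores in PER_CLASS_IOU.items():
    computed = mean_iou(scores)
    consistent = abs(computed - REPORTED_MIOU[method]) < 0.01
    print(f"{method}: computed {computed:.2f}, reported {REPORTED_MIOU[method]:.2f}, "
          f"consistent={consistent}")
```

For all three methods, the reported mIoU agrees with the unweighted mean of the six per-class scores to within rounding.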

Table 5. The extracted building sizes and areas from the 3D model.

Name of Building	Length (m)	Width (m)	Height (m)	Area (m²)
Sanguan Temple	9.8	7	9.5	68.6
Xingguo Temple	29.8	36.2	6.6	600
Tiandi Hall	4	2.1	3.6	8.4

Table 6. Traditional courtyard door and window style model.

Name of Room Door and Window Style	Room Door and Window Style Types	Door and Window Sample 1	Door and Window Sample 2
Traditional Courtyard Main Room Door and Window Style
Traditional Courtyard Wing Room Door and Window Style

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

