Article

A Multiscale Multi-Feature Deep Learning Model for Airborne Point-Cloud Semantic Segmentation

1 College of Surveying and Geo-Informatics, North China University of Water Resources and Electric Power, Zhengzhou 450045, China
2 School of Computer Science, Hubei University of Technology, Wuhan 430068, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(22), 11801; https://doi.org/10.3390/app122211801
Submission received: 14 October 2022 / Revised: 17 November 2022 / Accepted: 18 November 2022 / Published: 20 November 2022

Abstract

In point-cloud scenes, semantic segmentation is the basis for achieving an understanding of a 3D scene. The disorderly and irregular nature of 3D point clouds makes it impossible for traditional convolutional neural networks to be applied directly, and most deep learning point-cloud models often suffer from an inadequate utilization of spatial information and of other related point-cloud features. Therefore, to facilitate the capture of spatial point neighborhood information and obtain better performance in point-cloud analysis for point-cloud semantic segmentation, a multiscale, multi-feature PointNet (MSMF-PointNet) deep learning point-cloud model is proposed in this paper. MSMF-PointNet is based on the classical point-cloud model PointNet, and two small feature-extraction networks called Mini-PointNets are added to operate in parallel with the modified PointNet; these additional networks extract multiscale, multi-neighborhood features for classification. In this paper, we use the spherical neighborhood method to obtain the local neighborhood features of the point cloud, and then we adjust the radius of the spherical neighborhood to obtain the multiscale point-cloud features. The obtained multiscale neighborhood feature point set is used as the input of the network. In this paper, a cross-sectional comparison analysis is conducted on the Vaihingen urban test dataset from the single-scale and single-feature perspectives.

1. Introduction

Urban surface coverage is important basic data for studying and managing urban environments [1]. Since the ground surface in urban areas is often covered by different features, it is difficult to accurately obtain the spatial and attribute information of each type of feature [2,3]. Therefore, the use of 3D point-cloud data for quickly and accurately interpreting complex urban scenes has become a research hotspot.
In a point-cloud scene, semantic segmentation is a crucial visual task enabling scene understanding. Chen et al. [4] proposed the semantic segmentation network RandLA-Net++, which achieves accurate and fine semantic segmentation. Yu et al. [5] proposed the bidirectional segmentation network (BiSeNetV2), which demonstrated high accuracy and high efficiency in real-time semantic segmentation. Meanwhile, there are also new research results in the processing of 3D point-cloud data, which can be broadly classified into three categories: (1) multi-view-based methods, (2) voxel-based methods, and (3) methods based on the original point cloud.
(1)
Multi-view-based methods. The main idea of these methods is to project the 3D point cloud into 2D images from multiple views, perform semantic segmentation using 2D convolutional neural networks, and finally map the results back onto the original 3D point cloud. In 3D radar point-cloud processing, Xu et al. [6] designed the spatial adaptive convolution (SAC) method and accordingly constructed SqueezeSegV3 for LiDAR point-cloud segmentation, which addresses the common problem whereby different image locations affect the feature distribution of the image and, in turn, the network’s performance [7]. Moreover, 3D-MiniNet was designed on the basis of MiniNet to achieve higher efficiency [8,9]. Milioto et al. designed RangeNet++, with modifications and optimizations of DarkNet, to enable semantic segmentation; however, these methods have limited applicability and suffer inevitable information loss during projection [10].
(2)
Voxel-based methods. This method converts point clouds into voxels and then convolves them in 3D. This method can efficiently handle large-scale 3D data [11,12]. However, the memory requirements of this method are too high, and there is too much data redundancy in the voxel representation.
(3)
Methods based on the original point cloud. In the study of irregular point clouds, Qi et al. [13] designed a new deep neural network, PointNet, which learns point-cloud features point by point. PointNet does not convert the point cloud into any other data representation but uses it directly as the network input. It retains the spatial features of the point cloud to the maximum extent and has strong testing performance. Subsequent deep learning networks based on original point clouds can be broadly classified into several categories, such as multilayer perceptrons, graph convolutions, and RNNs. Xiang et al. [14] proposed a neighborhood search method that selects the appropriate search method according to the characteristics of each point, thus avoiding the shortcomings of manually selecting a search method. Thomas et al. and Boulch et al. [15,16] used different convolution weights to process point clouds, an approach which also solved the alignment invariance problem. Hou et al. and Zeng et al. [17,18] used graph convolution to process point clouds. Graph convolution has a powerful local feature extraction ability and can solve structural problems in point clouds, but the network structure is relatively complex and computationally inefficient. Huang et al. [19] combined PointNet with an RNN and introduced the contextual information of the point cloud, which enabled the network to achieve more accurate classification. However, the PointNet network simply connects all the points and considers only global features and individual point characteristics without local information, and its results on multi-instance, multi-class problems are not good, which limits its ability to capture local structure and recognize fine-grained patterns. Charles et al. [20] proposed an improved PointNet++ architecture to address PointNet’s limitations. While PointNet++ can extract point-cloud features at multiple scales, it is still computationally intensive in multiscale neighborhoods due to its complex structure. In addition, like the original PointNet, PointNet++’s performance in the semantic segmentation of complex scenes is unknown.
In the current research on the semantic segmentation of point clouds, it is difficult to achieve fine classification relying only on the sparse three-dimensional coordinates of airborne point clouds. Li et al. [21] designed a refined feature extractor using a self-attention mechanism to improve the accuracy of point-cloud classification. Yang et al. [22] proposed a graph attention feature fusion network (GAFFNet) that achieves a satisfactory classification performance by capturing a wider range of contextual information from the ALS point cloud. Luo et al. [23] confirmed the potential of multispectral LiDAR in the classification of complex urban land cover through three comparison methods; multispectral LiDAR has the advantage of detecting spectral information. Li et al. [24] constructed feature pyramids to integrate features at different scales and were able to classify point clouds with good results. However, these methods suffer from excessive consumption of memory or computational resources. In addition, building a deeper network structure is difficult; thus, the extraction of semantic information remains insufficient. If the point cloud is used directly as the input of the PointNet network, the local feature extraction ability is somewhat weak. Therefore, this paper improves the network based on PointNet and proposes MSMF-PointNet (semantic segmentation of airborne LiDAR point clouds based on PointNet fused with multiple scales and features). The proposed network uses PointNet as a classifier for urban terrain. However, the PointNet algorithm ignores local characteristics; thus, the proposed method computes several neighborhood characteristics of the airborne point cloud, such as roughness and covariance-based features (omnivariance, linearity, etc.), in multiscale spherical neighborhoods, and fuses the point cloud with remote sensing imagery to combine XYZ coordinates and RGB colors, producing a 16-dimensional feature vector as the input of the neural network. After multiple parameter combination experiments, the structure of PointNet is optimized to make up for the defects of the original network in generating classification results.
This article is divided into five sections: introduction, materials and methods, experimental data, discussion, and conclusions. Section 1 introduces the research of some scholars and our research; Section 2 describes the structure and principle of the MSMF-PointNet model we use; Section 3 describes our experimental data and point-cloud classification results; Section 4 describes our experimental results and compares them to results obtained through other methods; Section 5 summarizes the experimental results and experimental analysis. From the final results, we can see that the method proposed in this paper can make full use of point-cloud features and improve the classification accuracy.

2. Materials and Methods

Feature learning in PointNet is mostly carried out with only single-point XYZ features, without considering other local features extracted from the point cloud. This limits PointNet’s ability to identify and summarize detailed information. To solve these problems, we propose the MSMF-PointNet model to improve the performance of PointNet. MSMF-PointNet consists of three main parts.
(1)
Selection of multi-neighborhood features
Firstly, the point cloud and the remote sensing image are registered. After registration, the spectral information of the corresponding image pixel is fused to each point in the point cloud to obtain the basic features, namely, the coordinates (XYZ) and color (RGB) of the point cloud.
Multiple features of the point-cloud data are extracted using the spherical neighborhood method. Some common features, such as roughness, are selected. Covariance-based features are calculated from the covariance eigenvalues λ1, λ2, and λ3, where λ1 ≥ λ2 ≥ λ3 > 0. A total of four point-cloud features with strong descriptive ability are selected: omnivariance (Oλ), planarity (Pλ), linearity (Lλ), and verticality (Vλ).
(2)
Expression of multiscale features
The neighborhood scope of the target point is determined. In the process of feature calculation, the K value needs to be set in the K-neighbor method, while the radius R needs to be set in the spherical neighborhood method. The characteristics obtained are different depending on the size of the selected neighborhood. The minimum scale is generally larger than the minimum density of the point cloud, the maximum scale can be defined according to the average width of the analyzed building, and the interval size of the scale can be increased in turn according to the minimum scale. Two scales are comprehensively selected, with R values of 0.8 m and 1.2 m, respectively.
(3)
The construction of the MSMF-PointNet network
The LiDAR point cloud assigned a priori to produce the 3D urban land cover is classified using a 3D DNN. The LiDAR point cloud is sparse and irregular, which renders traditional convolution methods unusable. PointNet [13] represented pioneering work on point clouds to overcome this problem, and its hyper-parameters and structure were redesigned to classify urban LiDAR point clouds in this study.
In the next section, the steps for point-cloud classification and segmentation are described in detail by combining multiple features of neighborhood sampling with multiple scales to construct a deep neural network.

2.1. Selection of Multi-Neighborhood Features

At present, the features applied to airborne LiDAR point-cloud classification can be divided into three categories: features based on echo signals, features based on descriptors, and geometric features. Relatively speaking, in current point-cloud classification research, geometric features have been proven to be the most effective and have seen the most use [25]. Unlike eigenvectors, eigenvalues have good rotationally invariant properties [26]. In order to allow the network to learn more features and improve its classification accuracy, four geometric features based on point clouds were selected in addition to the basic point-cloud features (i.e., XYZ and RGB). At the same time, this limited selection prevents the network from learning ineffective features due to feature redundancy, which would lead to non-convergence and hinder the establishment of an effective classification model. The specific meanings and formulas used in the calculations are shown in Table 1.
The roughness is the ratio between the surface area of a given region and its projected area. The roughness of trees, for example, is significantly higher than that of manmade structures.
λ1, λ2, and λ3 are the eigenvalues of the point cloud, where λ1 ≥ λ2 ≥ λ3. An analysis of the eigenvalues and eigenvectors can often provide important information for extraction decisions. According to the points in the neighborhood, the covariance matrix of the center point was calculated, and then the eigenvalues were obtained [27]. On the basis of these eigenvalues, four features can be calculated, namely, omnivariance (Oλ), planarity (Pλ), linearity (Lλ), and verticality (Vλ).
Oλ has a strong ability to describe the degree of fluctuation in a point-cloud surface. Generally speaking, complex surfaces have higher Oλ values. The omnivariances of trees and grass are higher than those of manmade surfaces and buildings.
Pλ is a measurement of the planar characteristics of the point cloud. It can effectively represent the level of the fitted surface in the neighborhood at a given point. A flatter surface is characterized by a higher Pλ. For example, the flatness of an artificial road surface is significantly higher than that of a tree-lined surface.
Lλ denotes the degree of linearity of the point cloud. Power lines and edges of buildings have obvious linear structures, and the linearities of these points are characterized by high values.
As for Vλ, within 90°, a larger angle between the surface and the ground is characterized by a higher Vλ of the points on the surface. The Vλ of tree trunks, walls, street lamps, and fences is higher, while that of the road surface and grass is obviously lower.
Compared with the use of a single feature, the combination of all the above features can provide more effective information for subsequent classification, achieving better results.
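As a concrete illustration of the Table 1 definitions, the following sketch derives the four eigenvalue-based features from the covariance matrix of a single point's spherical neighborhood. It is a minimal NumPy implementation written for this description, not the code used in this study; the function name and the small eigenvalue floor are our own assumptions.

```python
import numpy as np

def eigenvalue_features(neighbors):
    """Covariance-based features of one point, computed from the XYZ coordinates
    of its spherical neighborhood (an (N, 3) array), following Table 1."""
    cov = np.cov(neighbors.T)                       # 3 x 3 covariance of the neighborhood
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    l3, l2, l1 = np.maximum(eigvals, 1e-12)         # enforce lambda1 >= lambda2 >= lambda3 > 0
    omnivariance = (l1 * l2 * l3) ** (1.0 / 3.0)
    planarity = (l2 - l3) / l1
    linearity = (l1 - l2) / l1
    normal = eigvecs[:, 0]                          # eigenvector of the smallest eigenvalue ~ surface normal
    verticality = 1.0 - abs(normal[2])              # 1 - |<Z, N>|, with Z the vertical unit vector
    return omnivariance, planarity, linearity, verticality
```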

2.2. Expression of Multiscale Features

Point-cloud data comprise a mass point set representing the surface of the target and do not contain the geometric topology information of traditional entity grid data. The most important problem in point-cloud processing is to establish the topological structure between discrete points and realize fast searches based on neighborhood relationships, which becomes very important in real-world applications. In this paper, kD-Tree is used to efficiently compress the storage and management of massive point clouds and perform quick searches based on neighborhood relationships. There are several common methods used to perform searches in a kD-Tree: neighbors with voxels search, K nearest neighbors search, and neighbors within radius search. The radius search method is used in this paper. It gives the threshold value of the target point and the search distance, takes the target point as the center of the circle and the search distance as the radius, and finds all the data within the dataset whose distance from the target point is less than the threshold value. The search radius of the selected region is taken as the scale parameter, and the multiscale is obtained by changing its value to form the multiscale and multi-neighborhood features. Moreover, the point-cloud features calculated under different spherical radii of the neighborhood are not consistent, and different scales of the point cloud can extract the features of different dimensions [28]. According to the average point spacing of the dataset, we set the radius of two scales, R1 and R2, to be 0.8 m and 1.2 m, respectively.
The multiscale spherical neighborhood was used for feature calculation. When the radius value r of the neighborhood is small, there will be some discrete points, and the effective eigenvalues cannot be calculated because there are not enough points in the neighborhood to generate the fitting surface. In this study, these outliers were replaced by −1.
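A minimal sketch of this multiscale neighborhood extraction is given below, assuming SciPy's cKDTree for the radius search and reusing the eigenvalue_features helper sketched in Section 2.1; the minimum-point threshold is an illustrative assumption, and neighborhoods that are too sparse keep the −1 placeholder described above.

```python
import numpy as np
from scipy.spatial import cKDTree

def multiscale_features(points, radii=(0.8, 1.2), min_points=4):
    """For every point, gather neighbors within each radius and compute the
    eigenvalue-based features; neighborhoods with too few points to fit a
    surface keep the -1 placeholder described above. Illustrative sketch."""
    tree = cKDTree(points[:, :3])
    per_scale = []
    for r in radii:
        feats = np.full((len(points), 4), -1.0)                # default -1 for degenerate neighborhoods
        neighbor_lists = tree.query_ball_point(points[:, :3], r)
        for i, idx in enumerate(neighbor_lists):
            if len(idx) >= min_points:                         # enough points to estimate a covariance
                feats[i] = eigenvalue_features(points[idx, :3])
        per_scale.append(feats)
    return np.hstack(per_scale)                                # (N, 4 * number of scales)
```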
When R = 0.8, the estimated values of roughness, omnivariance, planarity, linearity, and verticality are as shown in Figure 1.

2.3. Construction of the MSMF-PointNet Model

2.3.1. MSMF-PointNet Deep Neural Network

Unlike optical imagery, whereby a regular grid renders it convenient for convolution and automatic feature extraction in an end-to-end framework, a LiDAR point cloud is disordered and irregular, which must be overcome in the design of the DNN.
PointNet defines the MLP-Max operation in a spherical neighborhood to extract point features. A spherical neighborhood, also known as the r-neighborhood, contains all points in a sphere with a given point as the center and r as the radius. In the MLP-Max operation, a multilayer perceptron (MLP) is applied to extract a feature for every point, and max pooling is then used to summarize the extracted features.
As mentioned in Section 2.1 and Section 2.2, we took multidimensional features for each point, used these to replace single features as the input of the deep neural network for feature learning, and took radius R as the scale parameter to construct the multiscale structure.
In this section, a multiscale and multi-neighborhood MSMF-PointNet classification model is constructed on the basis of the PointNet structure to improve the input feature dimension of the model and extract multiscale neighborhood point-cloud features for ground object classification.
In order to improve the training efficiency of the network, the 16-dimensional features are divided into three parts. In the first part, the XYZ and RGB information of the point cloud is input into the improved PointNet network. The second and third parts each consist of the five features obtained with spherical neighborhood radii of 0.8 m and 1.2 m, respectively, combined with the XYZ information of the point cloud; these eight-dimensional features are input into the two additional small networks.
There are some specific ideas for improvement. Aiming at the problem of increasing the dimensions of the fusion point-cloud feature space, the input transformation matrix dimension is adjusted to increase the number of channels so that the matrix is changed from originally processing three-dimensional feature vectors to processing the fusion of six-dimensional and eight-dimensional feature vectors. In view of the increase in data volume brought by the expansion of the feature space of fusion point-cloud data, the point cloud’s depth features are fully extracted by deepening the number of network layers of the MLP layer. In order to solve the problem of lack of local neighborhood features, two small PointNet feature extraction networks were built using as inputs the point-cloud features calculated in the spherical neighborhood, and multi-neighborhood point-cloud features of different scales were extracted. A block diagram of the MSMF-PointNet network structure is shown in Figure 2.
The concrete measures to construct the MSMF-PointNet network were as follows: first, the original input transformation matrix T-Net(3) was replaced with T-Net(6) and T-Net(8). The new networks can carry out more spatial transformations of the fusion point-cloud data. Three convolution layers were added to the MLP layer of the original global feature extraction part to increase the depth of the feature extraction. Two multi-feature extraction networks of different scales, each being a small PointNet network, were added to the original PointNet network. The network structure is shown in the dotted box in Figure 2.
At each scale, a five-layer small MLP network and maxpooling structure were set, and a 256-dimensional feature vector was obtained as the local point-cloud feature through pooling. The two-scale neighborhood features were connected to the original single-point and global features extracted by PointNet through the full connection layer to obtain the multiscale neighborhood feature vectors. The improved multiscale PointNet neural network used the neighborhood information obtained from the spherical neighborhood to extract the local features of the point cloud; this approach overcomes the lack of neighborhood features in the original PointNet neural network. Neighborhood features are important because they are better for describing the details of the point cloud in the local region. The specific network structure is shown in Figure 2.
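The sketch below illustrates the shared-MLP/max-pooling idea and the fusion of the two Mini-PointNet branches with the main branch using the tf.keras API (consistent with the TensorFlow environment listed in Table 2). The layer widths, the omission of the T-Net input transforms and dropout, and the function names are simplifying assumptions; this is not the exact architecture of Figure 2.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def shared_mlp(x, channels):
    """Per-point shared MLP: 1x1 Conv1D layers applied to every point independently."""
    for c in channels:
        x = layers.Conv1D(c, kernel_size=1, activation='relu')(x)
    return x

def build_msmf_sketch(num_points, num_classes):
    """Illustrative multi-branch sketch (layer widths are assumptions, T-Net and
    dropout omitted): a main branch on the 6-D XYZ+RGB input and two Mini-PointNet
    branches on the 8-D per-scale inputs, fused before per-point classification."""
    main_in = layers.Input((num_points, 6))   # XYZ + RGB
    s1_in = layers.Input((num_points, 8))     # XYZ + 5 features at R = 0.8 m
    s2_in = layers.Input((num_points, 8))     # XYZ + 5 features at R = 1.2 m

    point_feat = shared_mlp(main_in, [64, 64, 128])                              # per-point features
    global_feat = layers.GlobalMaxPooling1D()(shared_mlp(point_feat, [256, 1024]))

    # Mini-PointNet branches: small shared MLP + max pooling -> one local descriptor per scale
    local1 = layers.GlobalMaxPooling1D()(shared_mlp(s1_in, [64, 64, 128, 256, 256]))
    local2 = layers.GlobalMaxPooling1D()(shared_mlp(s2_in, [64, 64, 128, 256, 256]))

    # Concatenate global and multiscale descriptors, tile them back to every point,
    # and join them with the per-point features for segmentation
    pooled = layers.Concatenate()([global_feat, local1, local2])
    tiled = layers.RepeatVector(num_points)(pooled)
    fused = layers.Concatenate()([point_feat, tiled])

    logits = shared_mlp(fused, [512, 256])
    out = layers.Conv1D(num_classes, kernel_size=1, activation='softmax')(logits)
    return Model([main_in, s1_in, s2_in], out)
```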

2.3.2. Parameter Setting

Table 2 (a,b) presents the hardware and environmental configurations of the experiment in this paper.
Additionally, in the training process of the deep neural network, the settings of the network parameters have a great impact on the training results. Using appropriate optimization algorithms can also improve the performance of the model to some extent. Therefore, this paper drew on the network parameter settings of PointNet and PointNet++ to provide relevant references for training the MSMF-PointNet model, as shown in Table 3.
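Assuming the sketch above, a minimal training configuration following Table 3 (Adam optimizer, learning rate 0.001, batch size 32) could look as follows; the block size, loss function, epoch count, and the random stand-in data are illustrative placeholders, not settings reported in the paper.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in data: 8 blocks of 4096 points each with the 6-D XYZ+RGB input and
# the two 8-D per-scale inputs; the real inputs come from the pipeline in Section 3.
num_blocks, num_points, num_classes = 8, 4096, 9
x_main = np.random.rand(num_blocks, num_points, 6).astype('float32')
x_s1 = np.random.rand(num_blocks, num_points, 8).astype('float32')
x_s2 = np.random.rand(num_blocks, num_points, 8).astype('float32')
y = np.random.randint(0, num_classes, (num_blocks, num_points))

model = build_msmf_sketch(num_points, num_classes)       # sketch from Section 2.3.1
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),            # Adam, learning rate 0.001 (Table 3)
    loss='sparse_categorical_crossentropy',               # assumed loss; not stated in the paper
    metrics=['accuracy'],
)
model.fit([x_main, x_s1, x_s2], y, batch_size=32, epochs=2)  # batch size 32 (Table 3); epochs illustrative
```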

3. Experimental Data

3.1. Experimental Data Information

In this paper, the same dataset from the ISPRS 2D and 3D Semantic Labeling Contest was used, including airborne LiDAR data and IR-R-G images from the German Vaihingen dataset (Figure 3). The dataset was divided into two areas, namely, the training set and the test set. In addition, the dataset was divided into nine categories, including power line, low vegetation, impervious surfaces, cars, fence/hedge, roof, facade, shrub, and tree. Each point in the dataset contained the 3D coordinates, return_number, number_of_returns, and intensity. According to statistics, the training set contained 753,859 points, and the test set contained 411,721 points [31].
In order to check the robustness and generalization of the network on another dataset, we took test data from Shashi City in Hubei Province, which contained 1,954,659 laser points in total. The X-axis span was 904.55 m, and the Y-axis span was 789.47 m. The relative altitude of the aerial images, which were obtained with an RCD-105 35 mm focal-length camera, was 1100 m, and the point-cloud density was 1.34 pts/m2, as shown in Figure 4. The raw data include the XYZ information of the point cloud. The original point-cloud data and remote sensing images were fused to obtain point-cloud data with RGB information. We used CloudCompare (http://www.cloudcompare.org/, accessed on 12 February 2020) to carefully label the datasets to facilitate precision in subsequent comparisons.
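A minimal sketch of this point/image fusion step is shown below; it assumes the orthophoto and the point cloud share the same coordinate system and that the image's upper-left origin and pixel size are known. The function and parameter names are illustrative, and this is not the registration workflow actually used for the datasets above.

```python
import numpy as np

def attach_rgb(points_xyz, ortho_rgb, x_origin, y_origin, pixel_size):
    """Sample an RGB value for each point from a georeferenced orthophoto.

    points_xyz: (N, 3) coordinates in the same CRS as the image; ortho_rgb is an
    (H, W, 3) array; x_origin/y_origin give the upper-left corner and pixel_size
    the ground sampling distance. Returns an (N, 6) XYZRGB array."""
    cols = ((points_xyz[:, 0] - x_origin) / pixel_size).astype(int)
    rows = ((y_origin - points_xyz[:, 1]) / pixel_size).astype(int)
    cols = np.clip(cols, 0, ortho_rgb.shape[1] - 1)   # clamp points falling outside the image
    rows = np.clip(rows, 0, ortho_rgb.shape[0] - 1)
    rgb = ortho_rgb[rows, cols]
    return np.hstack([points_xyz, rgb])
```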

3.2. Classification Results for Point Clouds

The intersection over union (IoU) [32] is obtained by dividing the overlap between two regions by their union. The result is compared against a set threshold and is expressed as a percentage. It is simply the ratio of the intersection to the union of the detection result and the ground truth (see Equation (1)).
IoU = (Detection Result ∩ Ground Truth) / (Detection Result ∪ Ground Truth). (1)
Overall accuracy (OA) was used to evaluate the overall classification accuracy of all categories, which is defined as the percentage of the total number of correctly classified points over the total number of points.
We used the F1 score as a comprehensive evaluation index. The formula for calculating the F1 score is as follows:
F1 = 2 × P_precision × P_recall / (P_precision + P_recall). (2)
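For reference, the sketch below computes OA, per-class IoU, and per-class F1 from predicted and true labels according to the definitions above; it is an illustrative implementation written for this description, not the evaluation script used to produce the tables that follow.

```python
import numpy as np

def evaluation_metrics(y_true, y_pred, num_classes):
    """Overall accuracy plus per-class IoU and F1, following Equations (1) and (2)."""
    oa = np.mean(y_true == y_pred)                         # overall accuracy
    iou, f1 = [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        iou.append(tp / max(tp + fp + fn, 1))              # intersection over union
        precision = tp / max(tp + fp, 1)
        recall = tp / max(tp + fn, 1)
        f1.append(2 * precision * recall / max(precision + recall, 1e-12))
    return oa, np.array(iou), np.array(f1)
```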
In order to verify the multiscale and multi-feature algorithm proposed in this paper, the same data and the same network were first used for single-scale and single-feature (SS) point-cloud data (i.e., data containing only XYZ information); then, verification using single-scale, multi-feature (SM) data (i.e., containing XYZRGB and R = 1 m features, a total of 11 features) and multiscale, multi-feature (MM) data (all scales and all features) was carried out. A comparison of the OA results is shown in Table 4.
As can be seen from Figure 5, the proposed multiscale, multi-neighborhood algorithm (MM) had the highest overall classification accuracy and the best classification effect, with an overall accuracy of 88.1%. The quantitative evaluation results show that fusing the spectral information and covariance-based characteristics with the basic point-cloud features (SM) improved the classification accuracy: the overall accuracy of SM was 19.2 percentage points higher than that of SS. After fusing the spectral information and the covariance-based characteristics at two scales (MM), the accuracy was further improved by 8.4 percentage points compared with the single-scale SM. Among the classification categories, the accuracies of impervious surfaces, trees, and roofs were improved by 19.7, 30, and 31.3 percentage points, respectively, relative to SS. It can be seen that multiscale, multi-neighborhood features effectively enhance the attribute information of the point cloud, thus achieving a more accurate classification of various ground objects.
Figure 6 shows the comparison of specific classification results. The actual value, SS, SM, and MM classification results are shown in sequence.
According to the above scale comparison, the classification accuracy of the proposed method was the highest, and the classification effect was the best. When this method was applied to the data in Shashi district, the results were as shown in Figure 7. Here, the classification accuracy reached 92.4%.
TerraSolid software cannot classify road point clouds. In order to make the results comparable, in this paper, the ground points classified by TerraSolid were assigned to bare land when evaluating the classification results. As can be seen from Figure 7, TerraSolid produced good classification results for tall trees and artificial buildings; however, its classification of low-growing vegetation and bare land was not ideal. The algorithm in this paper had a good classification effect on roofs, trees, and bare land. The IoU values of the specific categories, together with the average IoU, are shown in Table 5.

4. Discussion

4.1. Classification Accuracy

As can be seen from Figure 6, the classification results were consistent with expectations. The multiscale, multi-feature MSMF-PointNet structure proposed in this paper had the best classification effect, as is also clear from the quantitative evaluation results in Figure 5. On the whole, the final classification result of MM was almost identical to the ground truth, and the classification effect was especially prominent for the roof, tree, and impervious surface categories. Compared with SS, roofs and trees were classified much more accurately once the local characteristics of the spherical neighborhood were added in SM. Figure 6 also shows two areas where SS, relying only on topography and elevation, simply classified the ground features as tree, roof, and shrub, whereas SM refined the classification and was able to distinguish roof, tree, shrub, impervious surface, fence, and car.
Although MM had the highest classification accuracy and the best segmentation effect of various ground objects, some details were still incorrectly classified for various reasons. Therefore, we selected several typical regions to further investigate these reasons, as shown in Figure 8a–c.
When the height of a shrub was too low, i.e., too close in height to the low vegetation, it was misclassified as low vegetation, as shown in Figure 8a. When tall and short trees were mixed, the MSMF-PointNet misclassified some smaller trees as shrubs, as shown in Figure 8b. Similarly, in Figure 8c, some shrubs were also misclassified as trees.
Strip overlap can make the point cloud’s density uneven: in overlapping strips, the local density is high, whereas in a single strip it is low. As shown in Figure 9, a significant fault appeared in the single-strip area, where a bump can clearly be seen. Moreover, in this area there was a large height difference between the left and right sides of a narrow groove. At R = 0.8, the higher ground on the right side of the groove was inappropriately classified as roof; when the local features at R = 1.2 were added, the impervious ground was accurately classified.
As can be seen from the results in Figure 7 and Table 5, the classification accuracy of the Shashi City dataset was higher than that of the Vaihingen dataset because the point distribution of the former was denser and more uniform than that of the latter; this also demonstrates the generalizability of the proposed network.

4.2. Comparison with Other Methods

To further assess the performance of the proposed MSMF-PointNet, comparison experiments with other state-of-the-art methods and a generalization capability analysis were conducted.
Table 6 shows the IoU values of the proposed algorithm and PointNet for the nine categories of the Vaihingen dataset (power line, Low_veg, Imp_surfaces, car, fence/hedge, roof, facade, shrub, and tree). The proposed algorithm performed significantly better than PointNet for most categories of this large-scale airborne LiDAR point cloud. Fusing linearity and verticality led to a better classification of building facades and fences; combining roughness and omnivariance improved the classification of trees and shrubs; and adding planarity made roofs and impervious surfaces easier to distinguish from other features. However, due to the sparsity of airborne point clouds, there may not be enough points covering a surface when searching for neighborhood points with spherical neighborhoods. In addition, since the number of power line points was already small, the multiscale, multi-neighborhood algorithm was not as accurate in the classification of power lines as the PointNet method using only XYZ information.
The ISPRS website provides the experimental results of different methods. In this study, the accuracy results of the proposed method were compared with some previously submitted results (http://www2.isprs.org/commissions/comm2/wg4/vaihingen-3d-semantic-labeling.html; accessed on 24 September 2019). The comparison results of the F1 and OA values are shown in Table 7.
WhuY4 [33] used a multiscale CNN to process feature images obtained from LiDAR point clouds; the features used included normalized elevation, intensity, normal vector, and local plane features. UM is a feature-based supervised machine learning classification method, which obtains the texture features of the point cloud through local surface fitting and the k-nearest-neighbor method, obtains the geometric properties and attribute information of the point cloud through morphological methods, and uses a multilevel machine learning method to classify the point cloud. NANJ2 [34] uses three normalized point-cloud features, namely, altitude, intensity, and roughness, to generate multiscale point-cloud feature maps. It then designs a multiscale CNN to extract multiscale deep-level features and classify them, so as to obtain the classification probability of each point. Then, a decision tree is constructed, and the obtained probabilities are used for training again to optimize the initial classification results.

5. Conclusions

On the basis of a comprehensive analysis of the experimental results in this paper, it can be seen that, on the whole, the proposed method had a relatively good classification effect on three types of ground objects, namely, trees, impervious surfaces, and roofs, but a relatively poor classification effect on fences, shrubs, and cars.
The experimental comparison with the data from Shashi City showed that the classification accuracy of different ground objects is related to the point-cloud density and the chosen sphere radius. By comparing the proposed method with existing point-cloud classification methods, it can be seen that the proposed method has a clear advantage in classification accuracy and can provide reliable information for point-cloud applications such as three-dimensional urban modeling. This study performed point-cloud semantic segmentation using low-level characteristics and selected point-cloud scales to maintain a satisfactory classification accuracy while limiting the time complexity of the algorithm. The PointNet neural network structure and the feature selection were optimized, improving the accuracy and efficiency of the algorithm.

Author Contributions

Conceptualization, P.H.; methodology, P.H. and M.F.; software, W.L.; validation, Z.M. and M.W.; resources, G.G.; data curation, Z.M.; writing—original draft, M.F.; writing—review and editing, M.W.; supervision, G.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by National Natural Science Foundation of China [grant No. 41901285 and No. 41901296] and the Funds for Henan Province young talent support project [grant No. 2021HYTP009].

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author (Meiqi Fei), upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Zhu, Z.; Zhou, Y.; Seto, K.C.; Stokes, E.C.; Deng, C.; Pickett, S.T.A.; Taubenböck, H. Understanding an Urbanizing Planet: Strategic Directions for Remote Sensing. Remote Sens. Environ. 2019, 228, 164–182.
2. Wang, R.; Peethambaran, J.; Chen, D. LiDAR Point Clouds to 3-D Urban Models: A Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 606–627.
3. Liu, X. Airborne LiDAR for DEM generation: Some critical issues. Prog. Phys. Geogr. 2008, 32, 31–49.
4. Chen, J.; Zhao, Y.; Meng, C.; Liu, Y. Multi-Feature Aggregation for Semantic Segmentation of an Urban Scene Point Cloud. Remote Sens. 2022, 14, 5134.
5. Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation. Int. J. Comput. Vis. 2021, 129, 3051–3068.
6. Xu, C.; Wu, B.; Wang, Z.; Zhan, W.; Vajda, P.; Keutzer, K.; Tomizuka, M. SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation. arXiv 2020, arXiv:2004.01803.
7. Wu, B.; Zhou, X.; Zhao, S.; Yue, X.; Keutzer, K. SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 4376–4382.
8. Alonso, I.; Riazuelo, L.; Montesano, L.; Murillo, A.C. 3D-MiniNet: Learning a 2D Representation from Point Clouds for Fast and Efficient 3D LIDAR Semantic Segmentation. IEEE Robot. Autom. Lett. 2020, 5, 5432–5439.
9. Alonso, I.; Riazuelo, L.; Murillo, A.C. MiniNet: An Efficient Semantic Segmentation ConvNet for Real-Time Robotic Applications. IEEE Trans. Robot. 2020, 36, 1340–1347.
10. Lei, J.; Song, J.; Peng, B.; Li, W.; Pan, Z.; Huang, Q. C2FNet: A Coarse-to-Fine Network for Multi-View 3D Point Cloud Generation. IEEE Trans. Image Process. 2022, 31, 6707–6718.
11. Alkadri, M.F.; Luca, F.D.; Turrin, M.; Sariyildiz, S. A Computational Workflow for Generating A Voxel-Based Design Approach Based on Subtractive Shading Envelopes and Attribute Information of Point Cloud Data. Remote Sens. 2020, 12, 2561.
12. Zhao, L.; Xu, S.; Liu, L.; Ming, D.; Tao, W. SVASeg: Sparse Voxel-Based Attention for 3D LiDAR Point Cloud Semantic Segmentation. Remote Sens. 2022, 14, 4471.
13. Charles, R.Q.; Su, H.; Kaichun, M.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 77–85.
14. Xiang, Q.; He, Y.; Wen, D. Adaptive deep learning-based neighborhood search method for point cloud. Sci. Rep. 2022, 12, 2098.
15. Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L. KPConv: Flexible and Deformable Convolution for Point Clouds. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6410–6419.
16. Boulch, A. ConvPoint: Continuous convolutions for point cloud processing. Comput. Graph. 2020, 88, 24–34.
17. Hou, X.; Yu, X.; Liu, H. 3D Point Cloud Classification and Segmentation Model Based on Graph Convolutional Network. Laser Optoelectron. Prog. 2020, 57, 181019.
18. Zeng, Z.; Xu, Y.; Xie, Z.; Wan, J.; Wu, W.; Dai, W. RG-GCN: A Random Graph Based on Graph Convolution Network for Point Cloud Semantic Segmentation. Remote Sens. 2022, 14, 4055.
19. Huang, Q.; Wang, W.; Neumann, U. Recurrent Slice Networks for 3D Segmentation of Point Clouds. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2626–2635.
20. Qi, C.R.; Li, Y.; Hao, S.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5105–5114.
21. Li, Y.; Cai, J. Point cloud classification network based on self-attention mechanism. Comput. Electr. Eng. 2022, 104, 108451.
22. Yang, J.; Zhang, X.; Huang, Y. Graph Attention Feature Fusion Network for ALS Point Cloud Classification. Sensors 2021, 21, 6193.
23. Luo, B.; Yang, J.; Song, S.; Shi, S.; Gong, W.; Wang, A.; Du, L. Target Classification of Similar Spatial Characteristics in Complex Urban Areas by Using Multispectral LiDAR. Remote Sens. 2022, 14, 238.
24. Li, D.; Shen, X.; Guan, H.; Yu, Y.; Wang, H.; Zhang, G.; Li, J.; Li, D. AGF-Net: Attentive geometric feature pyramid network for land cover classification using airborne multispectral LiDAR data. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102723.
25. Yao, M.M.; Li, X.M.; Wang, W.X.; Xie, L.F.; Tang, S.J. Semantic Segmentation of Indoor 3D Point Clouds by Joint Optimization of Geometric Features and Neural Networks. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, X-4/W2-2022, 305–310.
26. Meng, F.; Wang, X.; Shao, F.; Wang, D.; Hua, X. Energy-Efficient Gabor Kernels in Neural Networks with Genetic Algorithm Training Method. Electronics 2019, 8, 105.
27. Lai, X.; Yang, J.; Li, Y.; Wang, M. A Building Extraction Approach Based on the Fusion of LiDAR Point Cloud and Elevation Map Texture Features. Remote Sens. 2019, 11, 1636.
28. Yue, C.; Liu, C.; Wang, X. Classification Algorithm for Laser Point Clouds of High-steep Slopes Based on Multi-scale Dimensionality Features and SVM. Geomat. Inf. Sci. Wuhan Univ. 2016, 41, 882–888.
29. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2015, arXiv:1412.6980.
30. Shang, W.; Sohn, K.; Almeida, D.; Lee, H. Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 2217–2225.
31. Yousefhussien, M.; Kelbe, D.J.; Ientilucci, E.J.; Salvaggio, C. A Fully Convolutional Network for Semantic Labeling of 3D Point Clouds. ISPRS J. Photogramm. Remote Sens. 2017, 143, 191–204.
32. Garcia-Garcia, A.; Orts-Escolano, S.; Oprea, S.; Villena-Martinez, V.; Martinez-Gonzalez, P.; Garcia-Rodriguez, J. A survey on deep learning techniques for image and video semantic segmentation. Appl. Soft Comput. 2018, 70, 41–65.
33. Yang, Z.; Tan, B.; Pei, H.; Jiang, W. Segmentation and Multi-Scale Convolutional Neural Network-Based Classification of Airborne Laser Scanner Data. Sensors 2018, 18, 3347.
34. Zhao, R.; Pang, M.; Wang, J. Classifying airborne LiDAR point clouds via deep features learned by a multi-scale convolutional neural network. Int. J. Geogr. Inf. Sci. 2018, 32, 960–979.
Figure 1. Estimated values of each feature.
Figure 2. Architecture of MSMF-PointNet. (R = 0.8 m and R = 1.2 m: the radius of the spherical neighborhood is 0.8 m and 1.2 m, respectively; R, O, P, L, and V: roughness, omnivariance, planarity, linearity, and verticality).
Figure 3. Point cloud color-coded (a) by categories for the training data and test data, and (b) by spectral information (IR, R, G) for the training area and the test area.
Figure 4. Point-cloud data displayed by elevation in the Shashi area and orthophoto image.
Figure 5. All kinds of ground features and overall classification accuracy.
Figure 6. Segmentation results of the Vaihingen dataset at various scales.
Figure 7. Image (a), TerraSolid classification results (b), and MSMF-PointNet classification results (c) in the Shashi area.
Figure 8. Details which were misclassified. (a) Zoomed-in image of the misclassification details of the proposed model on shrub; (b,c) zoomed-in details of the misclassification of the proposed model on tree.
Figure 9. Schematic diagram of point-cloud classification error in a specific region. ((a) point cloud displayed with elevation; (b) there is a ditch where the box is drawn, and the height difference between the two sides of the ditch is large, so the classification effect is poor.)
Table 1. Features for classification.

Feature Type | Name | Formula | Explanation
Basic features | XYZ, RGB | / | The basic features of the point cloud are obtained through fusion of the point cloud and image.
Common features | Roughness | / | The roughness is the ratio between the surface area of a given region and its projected area.
Eigenvalue-based features | Omnivariance (Oλ) | (λ1·λ2·λ3)^(1/3) | Omnivariance describes the surface undulation of the point cloud.
Eigenvalue-based features | Planarity (Pλ) | (λ2 − λ3)/λ1 | Planarity denotes the evenness of the fitted surface in the neighborhood of the point.
Eigenvalue-based features | Linearity (Lλ) | (λ1 − λ2)/λ1 | The linearity of the point cloud.
Eigenvalue-based features | Verticality (Vλ) | 1 − |Z·N| | Verticality describes the orientation of the fitted plane at a point relative to the horizontal plane, where Z is the unit vector in the vertical direction and N is the normal vector of the point.
Table 2. (a) The configuration of hardware. (b) The configuration of software.

(a)
Hardware | Configuration Status
CPU | Intel® Core(TM) i7-8700K
CPU frequency | 2.20 GHz
RAM | 64 GB
Hard disk | 128 GB SSD + 1 TB HDD
GPU | NVIDIA GeForce RTX 2080
Video memory | 12 GB
Computing platform | CUDA 10.0, cuDNN 7.5

(b)
Software | Configuration Status
Operating system | Windows 10
Deep learning framework | TensorFlow-GPU 1.13
Development language | Python 3.7.2
Manager | Anaconda
IDE | PyCharm
Table 3. Settings of network parameters.

Parameter | PointNet | PointNet++ | MSMF-PointNet
Optimizer | Adam | Adam | Adam [29]
Activation function | ReLU | ReLU | ReLU [30]
Learning rate | 0.001 | 0.001 | 0.001
Dropout | 0.7 | 0.5 | 0.5
Batch size | 32 | 32 | 32
Table 4. Classification results and overall accuracy.

Method | Low_veg (%) | Car (%) | Imp_surface (%) | Shrub (%) | Tree (%) | Roof (%) | OA (%)
SS | 27.4 | 15.5 | 70.4 | 30.8 | 55.4 | 60.1 | 60.5
SM | 39.5 | 37.7 | 87.7 | 49.4 | 66.7 | 89.3 | 79.7
MM | 45.5 | 32.4 | 90.1 | 54.7 | 85.4 | 91.4 | 88.1
Table 5. IoU values of each classification.

Category | Roof (%) | Tree (%) | Imp_surface (%) | Low_veg (%) | Shrub (%) | Average IoU (%)
IoU | 90.7 | 78.5 | 88.4 | 35.7 | 48.9 | 68.4
Table 6. Vaihingen dataset IoU (%) for each class.

Algorithm | Power line | Car | Low_veg | Imp_surfaces | Roof | Hedge | Facade | Shrub | Tree
PointNet [13] | 0.8 | 23.2 | 32.1 | 47.6 | 84.7 | 2.3 | 5.7 | 15.4 | 76.2
Ours | 0.5 | 58.7 | 65.4 | 78.4 | 95.8 | 15.8 | 16.9 | 27.5 | 92.4
Table 7. Accuracy comparison of different classification methods (%).

Method | F1: Low_veg | F1: Imp_surface | F1: Car | F1: Roof | F1: Shrub | F1: Tree | OA
Ours | 88.1 | 94.8 | 55.8 | 97.5 | 51.1 | 93.6 | 89.1
WhuY4 | 82.7 | 91.4 | 74.7 | 94.3 | 47.9 | 82.8 | 84.9
UM | 79.0 | 89.1 | 47.7 | 92.0 | 40.9 | 77.9 | 80.8
NANJ2 | 88.8 | 91.2 | 66.7 | 93.6 | 55.9 | 82.6 | 85.2
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
