Multispectral LiDAR Point Cloud Segmentation for Land Cover Leveraging Semantic Fusion in Deep Learning Network
Abstract
1. Introduction
- (1) In view of the large number of points and the redundant point cloud feature distribution observed in our experiments, we designed a data preprocessing step that applies principal component extraction to enhance the performance of the proposed network model on the multispectral LiDAR data.
- (2) For the task of segmenting large-scale multispectral LiDAR point clouds of land cover, we drew on RandLA-Net, which scales to large point clouds, and designed and embedded a module that leverages contextual semantic fusion to improve fine-grained point cloud semantic segmentation.
- (3) We conducted a series of experiments on a real-world large-scale multispectral LiDAR land cover point cloud. Through quantitative analysis and evaluation, we confirmed that the proposed deep learning network achieves satisfactory performance on a real land cover semantic segmentation task, with evaluation metrics at the state-of-the-art level.
2. Materials and Methods
2.1. Multispectral LiDAR Data
2.2. Preprocessing by Singular Value Decomposition
2.3. Framework of Deep Learning Network
2.3.1. Backbone of the Network
2.3.2. Contextual Semantic Fusion Block
3. Experiments and Results
3.1. Experimental Configuration
3.2. Overall Performance
3.3. Comparative Experimental Performance
4. Discussion with Ablation Experiment
4.1. Contribution of the CSF Block to the Backbone Network
4.2. Enhancement from SVD Preprocessing
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
KNN | K-nearest neighbor
MLP | Multilayer perceptron
SVD | Singular value decomposition
CSF | Contextual semantic fusion
References
- Shi, S.; Bi, S.; Gong, W.; Chen, B.; Chen, B.; Tang, X.; Qu, F.; Song, S. Land Cover Classification with Multispectral LiDAR Based on Multi-Scale Spatial and Spectral Feature Selection. Remote Sens. 2021, 13, 4118.
- Ekhtari, N.; Glennie, C.; Fernandez-Diaz, J.C. Classification of airborne multispectral lidar point clouds for land cover mapping. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 2068–2078.
- Teo, T.-A.; Wu, H.-M. Analysis of land cover classification using multi-wavelength LiDAR system. Appl. Sci. 2017, 7, 663.
- Matikainen, L.; Karila, K.; Hyyppä, J.; Litkey, P.; Puttonen, E.; Ahokas, E. Object-based analysis of multispectral airborne laser scanner data for land cover classification and map updating. ISPRS J. Photogramm. Remote Sens. 2017, 128, 298–313.
- Wei, G.; Shalei, S.; Bo, Z.; Shuo, S.; Faquan, L.; Xuewu, C. Multi-wavelength canopy LiDAR for remote sensing of vegetation: Design and system performance. ISPRS J. Photogramm. Remote Sens. 2012, 69, 1–9.
- Ibrahim, M.; Akhtar, N.; Ullah, K.; Mian, A. Exploiting Structured CNNs for Semantic Segmentation of Unstructured Point Clouds from LiDAR Sensor. Remote Sens. 2021, 13, 3621.
- Zhang, Z.; Li, T.; Tang, X.; Lei, X.; Peng, Y. Introducing Improved Transformer to Land Cover Classification Using Multispectral LiDAR Point Clouds. Remote Sens. 2022, 14, 3808.
- Handayani, H.H.; Bawasir, A.; Cahyono, A.B.; Hariyanto, T.; Hidayat, H. Surface drainage features identification using LiDAR DEM smoothing in agriculture area: A study case of Kebumen Regency, Indonesia. Int. J. Image Data Fusion 2022, 6, 240.
- Lin, X.; Xie, W. A segment-based filtering method for mobile laser scanning point cloud. Int. J. Image Data Fusion 2022, 13, 136–154.
- Zhao, J.; Zhao, X.; Liang, S.; Zhou, T.; Du, X.; Xu, P.; Wu, D. Assessing the thermal contributions of urban land cover types. Landsc. Urban Plan. 2020, 204, 103927.
- Morsy, S.; Shaker, A.; El-Rabbany, A. Multispectral LiDAR data for land cover classification of urban areas. Sensors 2017, 17, 958.
- Fernandez-Diaz, J.C.; Carter, W.E.; Glennie, C.; Shrestha, R.L.; Pan, Z.; Ekhtari, N.; Singhania, A.; Hauser, D.; Sartori, M. Capability assessment and performance metrics for the Titan multispectral mapping lidar. Remote Sens. 2016, 8, 936.
- Wichmann, V.; Bremer, M.; Lindenberger, J.; Rutzinger, M.; Georges, C.; Petrini-Monteferri, F. Evaluating the potential of multispectral airborne LIDAR for topographic mapping and land cover classification. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2015, 2, 113–119.
- Bakuła, K.; Kupidura, P.; Jełowicki, Ł. Testing of land cover classification from multispectral airborne laser scanning data. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 41, 161–169.
- Li, W.; Wang, F.D.; Xia, G.S. A geometry-attentional network for ALS point cloud classification. ISPRS J. Photogramm. Remote Sens. 2020, 164, 26–40.
- Scaioni, M.; Höfle, B.; Kersting, A.B.; Barazzetti, L.; Previtali, M.; Wujanz, D. Methods for information extraction from lidar intensity data and multispectral lidar technology. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, 42, 1503–1510.
- Hong, D.; Gao, L.; Yokoya, N.; Yao, J.; Chanussot, J.; Du, Q.; Zhang, B. More diverse means better: Multimodal deep learning meets remote-sensing imagery classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4340–4354.
- Lawin, F.J.; Danelljan, M.; Tosteberg, P.; Bhat, G.; Khan, F.S.; Felsberg, M. Deep Projective 3D Semantic Segmentation. In Proceedings of the International Conference on Computer Analysis of Images and Patterns, Ystad, Sweden, 22–24 August 2017; pp. 476–483.
- Boulch, A.; Le Saux, B.; Audebert, N. Unstructured point cloud semantic labeling using deep segmentation networks. 3DOR 2017, 3, 17–24.
- Wu, B.; Wan, A.; Yue, X.; Keutzer, K. SqueezeSeg: Convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D lidar point cloud. ICRA 2018, 25, 1887–1893.
- Wu, B.; Zhou, X.; Zhao, S.; Yue, X.; Keutzer, K. SqueezeSegV2: Improved model structure and unsupervised domain adaptation for road-object segmentation from a lidar point cloud. ICRA 2019, 39, 4376–4382.
- Milioto, A.; Vizzo, I.; Behley, J.; Stachniss, C. RangeNet++: Fast and Accurate Lidar Semantic Segmentation. In Proceedings of the IROS, Macau, China, 4–8 November 2019; pp. 4213–4220.
- Meng, H.Y.; Gao, L.; Lai, Y.K.; Manocha, D. VV-Net: Voxel VAE Net with Group Convolutions for Point Cloud Segmentation. In Proceedings of the ICCV, Seoul, Republic of Korea, 29 October 2019; pp. 8499–8507.
- Rethage, D.; Wald, J.; Sturm, J.; Navab, N.; Tombari, F. Fully-Convolutional Point Networks for Large-Scale Point Clouds. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 235–242.
- Dai, A.; Nießner, M. 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 458–474.
- Jaritz, M.; Gu, J.; Su, H. Multi-View PointNet for 3D Scene Understanding. In Proceedings of the ICCVW, Seoul, Republic of Korea, 29 October 2019; pp. 3995–4003.
- Guo, Y.; Wang, H.; Hu, Q.; Liu, H.; Liu, L.; Bennamoun, M. Deep Learning for 3D Point Clouds: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 4338–4364.
- Xie, Y.; Tian, J.; Zhu, X.X. Linking Points with Labels in 3D: A Review of Point Cloud Semantic Segmentation. IEEE Geosci. Remote Sens. Mag. 2020, 8, 38–59.
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30, 4–7.
- Chen, Y.; Liu, G.; Xu, Y.; Pan, P.; Xing, Y. PointNet++ Network Architecture with Individual Point Level and Global Features on Centroid for ALS Point Cloud Classification. Remote Sens. 2021, 13, 472.
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. 2019, 38, 1–12.
- Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. KPConv: Flexible and Deformable Convolution for Point Clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 29 October 2019; pp. 6410–6419.
- Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Markham, A. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11105–11114.
- Lu, H.; Chen, X.; Zhang, G.; Zhou, Q.; Ma, Y.; Zhao, Y. SCANet: Spatial-Channel Attention Network for 3D Object Detection. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 1992–1996.
- Jing, Z.; Guan, H.; Zhao, P.; Li, D.; Yu, Y.; Zang, Y.; Wang, H.; Li, J. Multispectral LiDAR Point Cloud Classification Using SE-PointNet++. Remote Sens. 2021, 13, 2516.
- Lin, L.; Huang, P.; Fu, C.-W.; Xu, K.; Zhang, H.; Huang, H. On Learning the Right Attention Point for Feature Enhancement. Sci. China Inf. Sci. 2022, 7, 1674–1686.
- Liao, R.; Yang, L.; Ma, L.; Zhu, J. In-motion continuous point cloud measurement based on bundle adjustment fused with motion information of triple line-scan images. Opt. Express 2022, 30, 21544–21567.
- Chen, B.; Shi, S.; Sun, J.; Gong, W.; Yang, J.; Du, L.; Guo, K.; Wang, B.; Chen, B. Hyperspectral lidar point cloud segmentation based on geometric and spectral information. Opt. Express 2019, 27, 24043–24059.
- Himmelsbach, M.; Hundelshausen, F.V.; Wuensche, H. Fast segmentation of 3D point clouds for ground vehicles. IEEE Intell. Veh. Symp. 2010, 11, 560–565.
- Yang, J.; Zhang, D.; Frangi, A.F.; Yang, J.Y. Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 131–137.
- Nagabhushan, P.; Guru, D.S.; Shekar, B.H. Visual learning and recognition of 3D objects using two-dimensional principal component analysis: A robust and an efficient approach. Pattern Recognit. 2006, 39, 721–725.
- Zhang, D.; Zhou, Z.H. (2D)2PCA: Two-directional two-dimensional PCA for efficient face representation and recognition. Neurocomputing 2005, 69, 224–231.
- Zhang, Y.Y.; Liu, X.Y.; Wang, H.J. Saliency detection via two-directional 2DPCA analysis of image patches. Optik Int. J. Light Electron Opt. 2014, 1, 125–138.
- Zhao, L.; Yang, Y. Theoretical Analysis of Illumination in PCA-Based Vision Systems. Pattern Recognit. 1999, 32, 547–564.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
- Turpin, A.; Scholer, F. User Performance Versus Precision Measures for Simple Search Tasks. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA, 6–11 August 2006; pp. 11–18.
- Grouven, U.; Bender, R.; Ziegler, A.; Lange, S. The kappa coefficient. Dtsch. Med. Wochenschr. 2007, 132, 65–68.
- Guo, Z.; Du, S.; Li, M.; Zhao, W. Exploring GIS knowledge to improve building extraction and change detection from VHR imagery in urban areas. Int. J. Image Data Fusion 2015, 7, 42–62.
Test area | OA (%) | mIOU (%) | F1-Score (%) | Kappa |
---|---|---|---|---|
Area11 | 95.25 | 80.56 | 85.80 | 0.93 |
Area12 | 96.07 | 81.12 | 85.19 | 0.94 |
Area13 | 94.45 | 82.97 | 89.08 | 0.90 |
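All four metrics reported in the tables (OA, mIOU, F1-score, and kappa) can be derived from a single class confusion matrix. A minimal sketch, assuming rows index ground truth and columns index predictions, with macro-averaged mIOU and F1 (the helper name is ours, not from the paper):

```python
import numpy as np

def segmentation_metrics(conf):
    """Return (OA, mIoU, macro F1, Cohen's kappa) from a confusion matrix.

    conf[i, j] counts points of ground-truth class i predicted as class j.
    """
    conf = conf.astype(float)
    total = conf.sum()
    tp = np.diag(conf)                      # per-class true positives
    fp = conf.sum(axis=0) - tp              # predicted as class but wrong
    fn = conf.sum(axis=1) - tp              # missed points of the class
    oa = tp.sum() / total                   # overall accuracy
    iou = tp / (tp + fp + fn)               # per-class intersection over union
    f1 = 2 * tp / (2 * tp + fp + fn)        # per-class F1-score
    # Cohen's kappa: agreement beyond what class frequencies predict by chance
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total**2
    kappa = (oa - pe) / (1 - pe)
    return oa, iou.mean(), f1.mean(), kappa

# Toy two-class example
conf = np.array([[50, 2], [3, 45]])
oa, miou, f1, kappa = segmentation_metrics(conf)
print(f"OA={oa:.4f} mIoU={miou:.4f} F1={f1:.4f} kappa={kappa:.4f}")
# → OA=0.9500 mIoU=0.9045 F1=0.9499 kappa=0.8998
```

Macro averaging weights every class equally, which is why mIOU can sit well below OA when small classes are segmented poorly, as in the RandLA and PointNet rows below.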
Network Model | Test Area | OA (%) | mIOU (%) | F1-Score (%) | Kappa
---|---|---|---|---|---
Proposed Method | Area11 | 95.25 | 80.56 | 85.80 | 0.93
Proposed Method | Area12 | 96.07 | 81.12 | 85.19 | 0.94
Proposed Method | Area13 | 94.45 | 82.97 | 89.08 | 0.90
RandLA | Area11 | 94.24 | 69.14 | 76.64 | 0.90
RandLA | Area12 | 93.07 | 72.31 | 78.93 | 0.89
RandLA | Area13 | 93.64 | 81.77 | 87.91 | 0.89
PointNet | Area11 | 85.79 | 58.76 | 69.54 | 0.81
PointNet | Area12 | 86.38 | 60.12 | 68.79 | 0.79
PointNet | Area13 | 82.56 | 61.03 | 70.16 | 0.76
PointNet++ | Area11 | 95.04 | 69.60 | 76.24 | 0.91
PointNet++ | Area12 | 95.97 | 78.90 | 83.35 | 0.93
PointNet++ | Area13 | 92.13 | 75.58 | 81.90 | 0.87
KPConv | Area11 | 96.05 | 81.06 | 86.02 | 0.93
KPConv | Area12 | 95.74 | 78.88 | 82.81 | 0.93
KPConv | Area13 | 94.37 | 82.95 | 88.74 | 0.91
DGCNN | Area11 | 95.98 | 73.52 | 80.17 | 0.92
DGCNN | Area12 | 96.03 | 79.25 | 85.71 | 0.92
DGCNN | Area13 | 94.12 | 78.42 | 88.42 | 0.90
Network Model | Test Area | OA (%) | mIOU (%) | F1-Score (%) | Kappa
---|---|---|---|---|---
Proposed Method | Area11 | 95.25 | 80.56 | 85.80 | 0.93
Proposed Method | Area12 | 96.07 | 81.12 | 85.19 | 0.94
Proposed Method | Area13 | 94.45 | 82.97 | 89.08 | 0.90
RandLA | Area11 | 94.24 | 69.14 | 76.64 | 0.90
RandLA | Area12 | 93.07 | 72.31 | 78.93 | 0.89
RandLA | Area13 | 93.64 | 81.77 | 87.91 | 0.89
Configuration | Test Area | OA (%) | mIOU (%) | F1-Score (%) | Kappa
---|---|---|---|---|---
Added SVD | Area11 | 95.25 | 80.56 | 85.80 | 0.93
Added SVD | Area12 | 96.07 | 81.12 | 85.19 | 0.94
Added SVD | Area13 | 94.45 | 82.97 | 89.08 | 0.90
Without SVD | Area11 | 94.36 | 78.32 | 82.61 | 0.90
Without SVD | Area12 | 94.70 | 79.34 | 81.53 | 0.92
Without SVD | Area13 | 93.07 | 80.06 | 85.70 | 0.89
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xiao, K.; Qian, J.; Li, T.; Peng, Y. Multispectral LiDAR Point Cloud Segmentation for Land Cover Leveraging Semantic Fusion in Deep Learning Network. Remote Sens. 2023, 15, 243. https://doi.org/10.3390/rs15010243