2. Methodology
The goal of this paper is to present an efficient CRF-based framework for the semantic classification of ALS point cloud data without relying on image data that provide spectral information. First, multiple features of the ALS point cloud are computed, mainly from the point locations, which improves the results of the point-based classification. Second, a Random Forests (RF) classifier is employed to produce soft labeling results. Because the initial semantic result contains outliers, a CRF framework is then used to smooth it with context information between neighboring points. However, we find that the accuracy remains low for small objects, especially cars. LCS-CRF is proposed to solve this problem and achieves a higher overall accuracy by means of a higher-order potential. Cluster-based features are extracted from the clusters obtained by a constrained mean-shift clustering method, and semantic rules are defined. Based on the common knowledge expressed by these semantic rules, we then define the higher-order potentials. Finally, the location, context, and semantics cues are encoded by the unary, pairwise, and higher-order potentials, respectively. Once fused, they provide complementary information from different perspectives and improve the ALS point cloud semantic classification performance. A mean-field approximate inference method is employed to obtain the semantic classification results.
Figure 1 shows the flowchart of the proposed method.
2.1. Feature Extraction
2.1.1. Point-Based Feature Extraction
Three types of features are employed in this section: geometric features from the ALS point cloud properties, local shape features from the structure tensor, and primitive features from the data source. Since the distinctiveness of point-based features strongly depends on the neighborhood encapsulating the respective 3D points, a data-driven approach is used to determine the neighborhood size: the number of nearest neighbors in the local 3D neighborhood of each individual point is selected with eigenentropy-based scale selection [7]. The neighborhood size is the value of the scale parameter that minimizes the eigenentropy:
$$k_{i,\mathrm{opt}} = \underset{k \in [k_{\min},\, k_{\max}]}{\arg\min}\; E_{\lambda,i}(k)$$
where $E_{\lambda,i}(k)$ is the eigenentropy of the $i$th point based on the scale parameter $k$, and $k_{i,\mathrm{opt}}$ represents the optimal value for the $i$th point. Three eigenvalues ($\lambda_j$, $j$ = 1, 2, 3) can be derived from the symmetric positive semi-definite 3D structure tensor $S$, which is obtained from the $k$ nearest neighbors of each point. In the scope of our work, scale parameters within an interval $[k_{\min}, k_{\max}]$ are considered, with a lower boundary of $k_{\min}$ = 10 neighbors to remain statistically robust [25, 26, 27] and an upper boundary of $k_{\max}$ = 100 to limit the computational effort.
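The selection rule above can be illustrated with a minimal sketch. It assumes that the eigenvalues of the structure tensor have already been computed for each candidate neighborhood size; the function names and the example eigenvalues are hypothetical and only serve the illustration.

```python
import math

def eigenentropy(eigenvalues):
    """Shannon entropy of the eigenvalues after normalizing them to sum to 1."""
    total = sum(eigenvalues)
    normalized = [ev / total for ev in eigenvalues]
    # Zero eigenvalues contribute nothing (e * ln(e) tends to 0 as e -> 0).
    return -sum(e * math.log(e) for e in normalized if e > 0.0)

def select_scale(eigenvalues_by_k):
    """Return the neighborhood size k whose eigenvalues give the minimum eigenentropy."""
    return min(eigenvalues_by_k, key=lambda k: eigenentropy(eigenvalues_by_k[k]))

# Hypothetical eigenvalues of the structure tensor for three candidate scales.
candidates = {10: (1.0, 1.0, 1.0), 20: (4.0, 1.0, 0.5), 50: (9.0, 0.5, 0.1)}
best_k = select_scale(candidates)
```

In this toy input the neighborhood with the most dominant first eigenvalue has the lowest entropy, so it is selected; an isotropic neighborhood (three equal eigenvalues) has the maximum entropy of ln 3.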
After the local neighborhoods have been recovered, we assemble a set of features that are well suited to the semantic classification of ALS point clouds. The features used in our work are shown in Table 1. The point-based feature vector comprises 34 elements.
Height above the Digital Terrain Model (DTM) is a discriminating feature for distinguishing different classes. The DTM can be generated based on the local topography of the scene [26]. General geometric properties are represented by the radius of the sphere encompassing the nearest neighbors and by the maximum difference within the neighborhood. Density, the two principal curvatures, Gaussian curvature, mean curvature, and verticality [28] are used to describe the basic properties of ALS data; their effectiveness has been demonstrated by feature importance analysis. Normal vector relationships and curvature (i.e., normal change rate) are also derived in this work. The variance of the above geometric features within a sphere of a given radius is used as well. From the nearest neighbors of each point, the 3D structure tensor can be derived to obtain 8 local shape features: linearity, planarity, scattering, omnivariance, anisotropy, eigenentropy, sum of eigenvalues, and change of curvature. The intensity obtained directly by the ALS laser and its variance within a sphere of a given radius comprise the primitive feature set. In analogy to the 3D case, the 2D projection of the 3D points onto the XY-plane can reveal complementary information, especially for perfectly vertical structures. Accordingly, the radius of the circle encompassing the nearest neighbors, the features of the 2D structure tensor (sum of eigenvalues and ratio of eigenvalues), the 2D density [29], and its variance are also derived as elements of the point-based feature vector.
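As an illustration of the eight local shape features, the sketch below uses the eigenvalue-based definitions that are common in the literature. It is an assumption that these match the definitions in Table 1; the function name and dictionary keys are hypothetical.

```python
import math

def shape_features(l1, l2, l3):
    """Local shape features from eigenvalues sorted as l1 >= l2 >= l3 > 0."""
    total = l1 + l2 + l3
    e1, e2, e3 = l1 / total, l2 / total, l3 / total  # normalized eigenvalues
    return {
        "linearity": (e1 - e2) / e1,
        "planarity": (e2 - e3) / e1,
        "scattering": e3 / e1,
        "omnivariance": (e1 * e2 * e3) ** (1.0 / 3.0),
        "anisotropy": (e1 - e3) / e1,
        "eigenentropy": -sum(e * math.log(e) for e in (e1, e2, e3)),
        "sum_of_eigenvalues": total,
        "change_of_curvature": e3,  # smallest eigenvalue over the eigenvalue sum
    }

features = shape_features(2.0, 1.0, 1.0)
```

With these definitions, linearity, planarity, and scattering sum to one, so they can be read as the degree to which a neighborhood looks like a line, a plane, or a volume.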
2.1.2. CSF with RANSAC
To increase the efficiency of LCS-CRF, off-ground points are employed to extract the cluster-based features for the higher-order potentials. The cloth simulation filtering (CSF) algorithm [30], which has shown superior performance compared with other ground filtering methods, can be used to extract off-ground points from LiDAR data.
Two difficulties must be overcome in ground filtering for ALS point clouds: (1) insufficient information on small objects for clustering, which has an obvious effect on both the per-class accuracy and the overall accuracy of the classification result [17]; and (2) confusion between ground and classes of low height (e.g., low vegetation). RANSAC [31] is therefore integrated with CSF to address these problems, so that ground and off-ground points can be segmented simultaneously. The pseudocode of Algorithm 1, the RANSAC-based CSF algorithm, is given in Appendix A.
The set of off-ground points generated by Algorithm 1 is shown in Figure 2. More information on small objects (e.g., cars) is retained, and fewer samples are confused between ground and the other classes. Clustering is then performed on the off-ground points.
2.1.3. Off-Ground Points Clustering
In this section, we first derive an over-segmentation of the ALS point cloud by applying the mean-shift algorithm [32, 33], a hill-climbing algorithm based on kernel density estimation that does not require the number of clusters to be specified in advance. An adaptive gradient ascent is applied in the iterations of this algorithm: the shift vector $\mathbf{m}(p_i)$ is larger in areas of low point density and smaller in areas of high point density [4]. An isotropic Gaussian kernel $G$ is adopted, and the shift vector $\mathbf{m}(p_i)$ of point $p_i$ can be defined as:
$$\mathbf{m}(p_i) = \frac{\sum_{p_j \in N(p_i)} G\!\left(\frac{p_j - p_i}{h}\right) p_j}{\sum_{p_j \in N(p_i)} G\!\left(\frac{p_j - p_i}{h}\right)} - p_i$$
where $N(p_i)$ represents the set of neighbors of the current point within the radius $r$, and $h$ denotes the kernel width selected based on the point distribution of the considered scene.
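The mean-shift iteration can be sketched as follows. This is a plain, unconstrained mean shift with a Gaussian kernel restricted to a search radius; the function name, the parameter names `radius` and `width`, and the toy points are assumptions made for the illustration and are not the paper's implementation.

```python
import math

def mean_shift_mode(start, points, radius, width, tol=1e-6, max_iter=200):
    """Move `start` uphill along the mean-shift vector until it stops moving."""
    current = tuple(start)
    for _ in range(max_iter):
        weights, neighbors = [], []
        for p in points:
            d2 = sum((a - b) ** 2 for a, b in zip(p, current))
            if d2 <= radius ** 2:  # only neighbors inside the search radius
                weights.append(math.exp(-d2 / (2.0 * width ** 2)))  # Gaussian kernel
                neighbors.append(p)
        if not neighbors:
            break
        total = sum(weights)
        # Kernel-weighted mean of the neighbors; mean - current is the shift vector.
        mean = tuple(sum(w * p[d] for w, p in zip(weights, neighbors)) / total
                     for d in range(len(current)))
        shift = math.sqrt(sum((m - c) ** 2 for m, c in zip(mean, current)))
        current = mean
        if shift < tol:
            break
    return current

# Two well-separated toy groups of points on a line in the XY-plane.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (10.0, 0.0), (10.5, 0.0), (11.0, 0.0)]
mode_a = mean_shift_mode(points[0], points, radius=3.0, width=1.0)
mode_b = mean_shift_mode(points[5], points, radius=3.0, width=1.0)
```

Points whose iterations converge to the same mode are assigned to the same cluster; here the two groups converge to two different modes without the number of clusters being given.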
In this work, the off-ground ALS data are heterogeneous, and it is hard to distinguish different classes that are close to each other in distance space (e.g., car and building, or building and vegetation). A constrained mean-shift algorithm is therefore proposed: a post-processing step is applied to the initial over-segmentation produced by the mean-shift algorithm, in which two initial clusters with a low dissimilarity are merged into one cluster. Two constraints are used to assess the dissimilarity between initial clusters:
The pseudocode of Algorithm 2, which gives the details of the constrained mean-shift algorithm, is presented in Appendix B.
Clusters of different classes exhibit different characteristics, which can be used to extract more discriminative cluster-based features. The clusters derived from the mean-shift algorithm, shown in Figure 3a, are scattered and cluttered and do not reveal class-specific information. In contrast, the clusters shown in Figure 3b provide more accurate and discriminative information for cluster-based feature extraction.
2.1.4. Cluster-Based Feature Extraction
In contrast to point-based feature extraction, this section extracts features of clusters. Point-based features describe the details of a single point, whereas clusters provide object-level information for the different classes, which is used to derive the higher-order potentials. Five features are extracted from each cluster:
Height
The height above ground, measured at the barycenter of the cluster, is used to distinguish roofs from other classes (e.g., cars, low vegetation), as even the lowest roofs are generally higher than cars or low vegetation.
Distribution of ground points
A circular region centered on the cluster center is divided into angular bins. The distribution of ground points is described by the proportion of bins containing ground points [35]. This feature can be used to classify objects that are adjacent to the ground.
Roughness
Roughness is determined by the variance of the distances between the points and the plane fitted at the kernel size, namely the scale of a sphere containing the nearest points. Smooth surfaces, such as roofs and facades, can be distinguished from other classes (e.g., cars, vegetation) by this feature.
Compactness
Compactness is measured by the volume of the convex hull divided by the area of each cluster, where the area is defined here as the number of points in the cluster. A small compactness is obtained for erect or small classes.
Normal correlation
This feature is measured by the correlation between the normal vectors of the cluster and the vertical direction (i.e., the normal of the horizontal plane); it performs better for regular classes than for other classes.
All of the above cluster-based features have proven effective in distinguishing one or more classes from the others. As shown in Figure 4, the discriminative capacity of each feature is visualized with color-coded values.
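To make one of the five features concrete, the sketch below computes the distribution of ground points as the proportion of occupied angular bins. The number of bins, the radius of the circular region, and the function name are hypothetical choices; the source does not state the values it uses.

```python
import math

def ground_bin_proportion(center, ground_points, radius, n_bins=8):
    """Fraction of angular bins around `center` that contain a ground point."""
    occupied = set()
    for x, y in ground_points:
        dx, dy = x - center[0], y - center[1]
        if dx == 0.0 and dy == 0.0:
            continue  # direction is undefined at the center itself
        if math.hypot(dx, dy) > radius:
            continue  # outside the circular region
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        # Clamp guards against angle rounding up to exactly 2*pi.
        occupied.add(min(int(angle / (2.0 * math.pi) * n_bins), n_bins - 1))
    return len(occupied) / n_bins

# Four ground points around the cluster center and one far outside the region.
ground_xy = [(1.0, 0.1), (-0.1, 1.0), (-1.0, -0.1), (0.1, -1.0), (5.0, 5.0)]
proportion = ground_bin_proportion((0.0, 0.0), ground_xy, radius=2.0)
```

A cluster standing freely on the ground, such as a car, would tend to have ground points in most bins, whereas a roof cluster would have few or none within a small radius.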
2.2. The LCS-CRF Model
To describe the semantic classification problem conveniently, we first establish the notation and definitions used throughout the paper. Consider the input ALS point cloud $P = \{p_1, \ldots, p_N\}$, where $p_i$ ($i = 1, \ldots, N$) represents a 3D point corresponding to a vertex in a graphical model, and $N$ is the total number of points. A labeled point cloud can be represented by the vector $\mathbf{y} = (y_1, \ldots, y_N)$, containing the labels for all points. Each $y_i$ takes its value from the label set $L = \{l_1, \ldots, l_K\}$, where $K$ denotes the number of classes. Edges $e_{ij}$ are used to model the relations between pairs of adjacent points $p_i$ and $p_j$. Then, an undirected graphical model with graph $G = (\mathcal{V}, \mathcal{E})$ consisting of nodes $\mathcal{V}$ and edges $\mathcal{E}$ can be constructed.
2.2.1. Pairwise CRF Model
The pairwise CRF model is widely used in semantic classification [13, 36, 37] to model the spatial interaction in both the labels and the observed values, which is important in semantic classification. It is a discriminative classification approach that directly models the posterior probability of the labels $\mathbf{y}$ conditioned on the observed data $\mathbf{x}$ [38, 39]. No more than two kinds of cliques are defined in a pairwise CRF. By the Hammersley–Clifford theorem, the CRF model can be written as a Gibbs distribution:
$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\left(-\sum_{c \in C} \psi_c(\mathbf{y}_c, \mathbf{x})\right) \tag{6}$$
where $Z(\mathbf{x})$ is the partition function, $C$ the set of all the cliques, and $\psi_c$ the potential function defined over the clique $c$ to model the relationship of the random variables. An assignment of all the random variables (i.e., a labeling) takes values from $L^N$. Based on the Bayesian maximum a posteriori rule, the most likely labeling $\mathbf{y}^*$ is inferred from the given observation:
$$\mathbf{y}^* = \underset{\mathbf{y} \in L^N}{\arg\max}\; P(\mathbf{y} \mid \mathbf{x}) \tag{7}$$
The semantic classification problem with the pairwise CRF model is therefore equivalent to minimizing the Gibbs energy function $E(\mathbf{y} \mid \mathbf{x})$, which is the sum of the unary and pairwise potentials. As a special case of Equation (6), $E(\mathbf{y} \mid \mathbf{x})$ is formulated as:
$$E(\mathbf{y} \mid \mathbf{x}) = \sum_{i \in \mathcal{V}} \psi_i(y_i, \mathbf{x}) + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(y_i, y_j, \mathbf{x}) \tag{8}$$
where $\psi_i$ is the unary potential term, a proxy for the initial probability distribution across semantic classes, and $\psi_{ij}$ is the pairwise potential term, which keeps the predictions smooth and consistent.
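The relation between the energy and the most likely labeling can be checked on a toy graph. The sketch below is an illustration under simplifying assumptions: unary potentials are given as per-node cost lists, the pairwise term is a plain Potts penalty with a single hypothetical weight, and the MAP labeling is found by exhaustive search, which is feasible only for very small graphs.

```python
from itertools import product

def gibbs_energy(labels, unary, edges, potts_weight):
    """Sum of unary costs plus a Potts penalty for each edge with differing labels."""
    energy = sum(unary[i][label] for i, label in enumerate(labels))
    energy += sum(potts_weight for i, j in edges if labels[i] != labels[j])
    return energy

def map_labeling(unary, edges, potts_weight):
    """Exhaustive search for the minimum-energy labeling of a toy graph."""
    n_labels = len(unary[0])
    return min(product(range(n_labels), repeat=len(unary)),
               key=lambda labels: gibbs_energy(labels, unary, edges, potts_weight))

# Three points in a chain; the middle point weakly prefers label 1.
unary = [[0.1, 2.0], [0.8, 0.6], [0.1, 2.0]]
edges = [(0, 1), (1, 2)]
smoothed = map_labeling(unary, edges, potts_weight=1.0)
unsmoothed = map_labeling(unary, edges, potts_weight=0.0)
```

With the pairwise term switched off, the middle point keeps its weakly preferred label; with the Potts penalty, the minimum-energy labeling assigns all three points the same label.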
2.2.2. LCS-CRF Model
Compared with the pairwise CRF, LCS-CRF can capture richer statistics of the point cloud. Misclassification among different classes can be addressed efficiently by encoding higher-order semantic information in the CRF model, which improves the semantic classification performance. In our work, the potential functions are divided into three parts (i.e., unary, pairwise, and higher-order potentials) based on the various cliques:
$$E(\mathbf{y} \mid \mathbf{x}) = \sum_{i \in \mathcal{V}} \psi_i(y_i, \mathbf{x}) + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(y_i, y_j, \mathbf{x}) + \sum_{c \in C_h} \psi_c(\mathbf{y}_c, \mathbf{x}) \tag{9}$$
where the first two sums are the unary and pairwise terms of the pairwise CRF, $C_h$ represents the set of higher-order cliques, and $\psi_c$ are the higher-order potentials defined over these cliques.
Then, the mean-field approximate inference algorithm is employed to minimize the energy function and obtain the final labels. Specifically, the location, context, and semantics cues are combined in a higher-order CRF model; the flowchart of the LCS-CRF-based semantic classification implemented in our study is shown in Figure 5.
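Mean-field inference can be sketched on a small graph as follows. This is a simplified illustration, not the implementation used in the paper: it assumes explicit edges, a plain Potts penalty with one hypothetical weight, and synchronous updates, and it omits the higher-order term.

```python
import math

def mean_field(unary, edges, potts_weight, n_iter=10):
    """Mean-field updates for a Potts-model CRF on a small explicit graph."""
    n, n_labels = len(unary), len(unary[0])
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def normalize(costs):
        scores = [math.exp(-c) for c in costs]
        total = sum(scores)
        return [s / total for s in scores]

    q = [normalize(costs) for costs in unary]  # initialize from the unary term
    for _ in range(n_iter):
        new_q = []
        for i in range(n):
            costs = []
            for label in range(n_labels):
                # Expected Potts penalty: neighbor probability mass on other labels.
                penalty = sum(1.0 - q[j][label] for j in neighbors[i])
                costs.append(unary[i][label] + potts_weight * penalty)
            new_q.append(normalize(costs))
        q = new_q
    return [max(range(n_labels), key=lambda l: q_i[l]) for q_i in q]

# Three points in a chain; the middle point weakly prefers label 1.
toy_unary = [[0.1, 2.0], [0.8, 0.6], [0.1, 2.0]]
toy_edges = [(0, 1), (1, 2)]
labels_smooth = mean_field(toy_unary, toy_edges, potts_weight=1.0)
labels_unary_only = mean_field(toy_unary, toy_edges, potts_weight=0.0)
```

Each update replaces the approximate marginal of a point by a normalized exponential of its unary cost plus the expected pairwise cost under the current marginals of its neighbors, so an isolated, weakly supported label is pulled toward the labels around it.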
2.3. LCS-CRF Energies
2.3.1. Point-based Features for Unary Potentials
The location information of a point and its optimal neighbors is used to determine the point-based feature vector, from which the unary potential linking the point to the class labels determines the most probable label for a single point. The unary potentials can be defined by a discriminative classifier with a probabilistic output [40].
An ensemble learning method, the RF classifier, is employed to produce the soft labeling results for the unary potentials. The RF classifier, which constructs a multitude of decision trees during training and integrates the class probabilities of the individual trees at the testing stage, has shown superior performance owing to its robustness, high accuracy, and feasibility for ALS data [9]. In the implementation, each decision tree casts a vote for the most likely class. The unary potential of a class is then defined from the number of votes cast for that class relative to the total number of decision trees. Based on the point-based features, the location cues are thus used directly to discriminate the ALS points through the class membership probabilities.
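The mapping from tree votes to a unary term can be sketched as below. It is an assumption of this sketch that the unary cost is the negative logarithm of the vote fraction, which is a common convention in energy-based CRFs; the function name and the clipping constant `eps` are hypothetical.

```python
import math

def unary_from_votes(votes, n_trees, eps=1e-6):
    """Per-class unary costs from the number of trees voting for each class."""
    probabilities = [v / n_trees for v in votes]
    # Clip at eps so that a class with no votes gets a large but finite cost.
    return [-math.log(max(p, eps)) for p in probabilities]

# 400 trees (the forest size reported in Section 3.2) voting over three classes.
costs = unary_from_votes([300, 100, 0], n_trees=400)
most_probable = costs.index(min(costs))
```

The class with the most votes receives the lowest cost, and the costs can be passed directly to an energy minimization such as mean-field inference.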
2.3.2. Weighted Potts Model
The pairwise potential $\psi_{ij}$ incorporates the contextual cues based on the principle of spatial smoothness: given prior spatial knowledge, neighboring points are expected to take the same label. The weighted Potts model has been shown to work well for semantic classification in many previous studies [41, 42]. Herein, the pairwise potential takes the form:
$$\psi_{ij}(y_i, y_j, \mathbf{x}) = \mu(y_i, y_j)\left[ w_1 \exp\left(-\frac{\lVert p_i - p_j \rVert^2}{2\theta_\alpha^2}\right) + w_2 \exp\left(-\frac{\lVert p_i - p_j \rVert^2}{2\theta_\beta^2} - \frac{\lVert x_i - x_j \rVert^2}{2\theta_\gamma^2}\right)\right]$$
where $x_i$ and $p_i$ represent the observed values and 3D coordinates. The label compatibility function $\mu$, the weights of the spatial kernel and bilateral kernel $w_1$ and $w_2$, and the parameters of the Gaussian kernels $\theta_\alpha$, $\theta_\beta$, and $\theta_\gamma$ are learned on the training set with the implementation provided in Reference [43].
Based on the spatial relationship, contextual relations between classes can be modeled, and weighting factors are defined depending on how likely two classes are to occur near each other.
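A pairwise term built from a spatial and a bilateral Gaussian kernel can be sketched as follows. The sketch assumes the simplest Potts compatibility (zero cost for equal labels, full cost otherwise) instead of a learned compatibility function, and all parameter names and values are hypothetical.

```python
import math

def pairwise_potential(label_i, label_j, p_i, p_j, x_i, x_j,
                       w_spatial, w_bilateral, theta_s, theta_p, theta_x):
    """Potts-weighted sum of a spatial kernel and a bilateral kernel."""
    if label_i == label_j:
        return 0.0  # Potts compatibility: no penalty for equal labels
    d_pos = sum((a - b) ** 2 for a, b in zip(p_i, p_j))  # squared 3D distance
    d_obs = sum((a - b) ** 2 for a, b in zip(x_i, x_j))  # squared feature distance
    spatial = math.exp(-d_pos / (2.0 * theta_s ** 2))
    bilateral = math.exp(-d_pos / (2.0 * theta_p ** 2) - d_obs / (2.0 * theta_x ** 2))
    return w_spatial * spatial + w_bilateral * bilateral

params = dict(w_spatial=1.0, w_bilateral=2.0, theta_s=1.0, theta_p=1.0, theta_x=0.5)
near = pairwise_potential(0, 1, (0, 0, 0), (0.1, 0, 0), (0.3,), (0.3,), **params)
far = pairwise_potential(0, 1, (0, 0, 0), (3.0, 0, 0), (0.3,), (0.9,), **params)
```

The penalty for assigning different labels is largest for points that are close in space and similar in their observed values, and it decays as either distance grows.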
2.3.3. Higher-Order Potentials
Higher-order potentials are incorporated in the CRF model to capture a richer perception between features and classes through semantics cues. In our work, the higher-order potentials are modeled directly from the cluster-based features with a sigmoid function, which is commonly used as the activation function in many classification methods [44, 45, 46] and is shown in Figure 6.
Before computing the higher-order energy of the CRF defined in (9), the cluster-based features are normalized to [0, 1] to balance the perception between features and classes. Furthermore, because some features are discriminative and beneficial only for specific classes, the perception of all of the cluster-based features with regard to the labels of the two test datasets, described in Section 3.1, is summarized in Table 2. For simplicity, the perception between a normalized feature and each label is modeled by a sigmoid function with a scale parameter and a translation parameter.
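The effect of the two sigmoid parameters can be illustrated with the standard logistic form. It is an assumption that the paper uses exactly this form; the function and argument names are hypothetical, the translation value of 0.5 is the one stated in Section 3.4.3, and the scale of 6 is one value from the range analyzed there.

```python
import math

def sigmoid_perception(feature, scale, shift=0.5):
    """Logistic mapping of a feature normalized to [0, 1]."""
    return 1.0 / (1.0 + math.exp(-scale * (feature - shift)))

low = sigmoid_perception(0.1, scale=6.0)
mid = sigmoid_perception(0.5, scale=6.0)
high = sigmoid_perception(0.9, scale=6.0)
```

With a sufficiently large scale, values above the translation point are pushed toward 1 and values below it toward 0, which stretches the contrast between clusters compared with using the normalized feature directly.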
Specifically, some semantic rules are defined to adjust the higher-order potentials. Discriminative thresholds on two of the cluster-based features can be used to classify buildings and vehicles. Buildings and facades have a low value of one feature, which must be smaller than a threshold. The values of these three thresholds are semantically defined based on common knowledge and are generally suitable in all scenes. The higher-order potentials are then defined over the normalized set of cluster-based features. We consider that the off-ground points in a cluster share the same higher-order potential. To reduce the complexity of inference, the higher-order potentials can be rewritten as class membership probabilities and turned into unary potentials [43]. The integrated unary potentials combine the two terms through a free parameter between 0 and 1, which balances the location cues against the semantics cues.
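The role of the free parameter can be sketched with a linear blend of class membership probabilities. Both the linear form and the direction of the weight are assumptions of this sketch; the direction is chosen so that a weight of 1 leaves only the point-based term, in line with the statement in Section 3.4.3 that this setting reproduces the CRF result. All names are hypothetical.

```python
def integrate_unary(point_probs, cluster_probs, weight):
    """Blend point-based and cluster-based class probabilities for one point."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * p + (1.0 - weight) * c
            for p, c in zip(point_probs, cluster_probs)]

point_probs = [0.4, 0.6]    # from the point-based classifier (location cues)
cluster_probs = [0.9, 0.1]  # from the cluster the point belongs to (semantics cues)
blended = integrate_unary(point_probs, cluster_probs, weight=0.6)
point_only = integrate_unary(point_probs, cluster_probs, weight=1.0)
```

In this toy case the cluster evidence is strong enough to flip the most probable class of the point, which is the intended effect of the semantics cues on points with ambiguous point-based features.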
2.4. Evaluation Metrics
For evaluation, we compare the derived semantic labeling to the ground truth on a per-point basis. The confusion matrix and five commonly used measures are employed: overall accuracy (OA), Kappa coefficient (KA), recall (R), precision (P), and F1-score. Because the number of examples per class is generally unbalanced in the test data, OA and KA are used to reflect the overall performance and the degree of consistency. Meanwhile, R is a measure of completeness or quantity, and P is a measure of exactness or quality. The F1-score is a compound metric that combines P and R with equal weights. Appendix C describes the formulas in detail.
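The five measures can be computed from a confusion matrix as sketched below, using their standard definitions. The function name and the convention that rows hold the reference labels and columns the predictions are assumptions of this sketch and may differ from Appendix C.

```python
def evaluate(confusion):
    """OA, Kappa, and per-class recall, precision, and F1 from a confusion matrix."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    diag = [confusion[k][k] for k in range(n_classes)]
    row_sums = [sum(row) for row in confusion]  # reference counts per class
    col_sums = [sum(confusion[r][k] for r in range(n_classes)) for k in range(n_classes)]
    oa = sum(diag) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - expected) / (1.0 - expected)
    per_class = []
    for k in range(n_classes):
        recall = diag[k] / row_sums[k] if row_sums[k] else 0.0
        precision = diag[k] / col_sums[k] if col_sums[k] else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        per_class.append({"recall": recall, "precision": precision, "f1": f1})
    return {"OA": oa, "KA": kappa, "per_class": per_class}

metrics = evaluate([[40, 10], [5, 45]])
```

For this two-class example, 85 of 100 points are correct, and the chance agreement of 0.5 is removed when computing Kappa.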
3. Experimental Analysis
To evaluate the performance of the proposed LCS-CRF algorithm, experiments with two ALS datasets were performed in Python on a 64-bit Windows 10 machine with an Intel Core i7-4790K 4.00 GHz processor and 32 GB of RAM.
3.1. Study Areas
Two labeled benchmark datasets, the Vaihingen Dataset (Figure 7) and GML Dataset A (Figure 8), are employed to evaluate our methodology on ALS data of different characteristics.
The Vaihingen Dataset is provided by the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF) and was acquired with a Leica ALS50 system over Vaihingen, Germany, with an average point density of 4 points/m². In the scope of the ISPRS Benchmark on 3D Semantic Labeling, a reference labeling was performed with respect to nine semantic classes (namely, power line, low vegetation, impervious surfaces, car, fence/hedge, roof, facade, shrub, and tree), so that each point in the dataset is labeled accordingly [9]. For this dataset, which contains about 1.166 M points in total, a split into a training scene (about 754 k points) and a test scene (about 412 k points) is provided. For each point, the XYZ coordinates and the intensity value are provided.
The GML Dataset A is provided by the Graphics & Media Lab, Moscow State University, and is publicly available. This dataset was acquired with an ALTM 2050 system (Optech Inc.) and contains about 2.077 M labeled 3D points; the reference labeling was performed with respect to five semantic classes (namely, ground, building, car, tree, and low vegetation). For this dataset, a split into a training scene and a test scene is provided. For each point, the XYZ coordinates are provided without an intensity value.
3.2. Qualitative Comparison
In this section, we mainly focus on the analysis of three stages, i.e., ground points filtering, off-ground points clustering, and LCS-CRF performing.
To visually compare our proposed Algorithm 1 with the CSF method, some small parts with meaningful information are selected from the Vaihingen Dataset and GML Dataset A, as shown in Figure 9. Each group in Figure 9 (Figure 9a–h) compares the off-ground filtering results of the CSF method with those of our proposed Algorithm 1. We can observe that some ambiguous object information, especially for small objects, can be recovered from the ground point set produced by the CSF method. Our method not only extracts off-ground points from the ground point set but also enhances the reliability of the higher-order potentials by eliminating misclassification between off-ground and ground points. However, it has two shortcomings: (1) a fraction of the ground points are filtered as off-ground points, which causes a coarse cluster-based classification result; and (2) different parameters must be explored for different ALS data. To overcome these shortcomings, we further treat ground as one of the classes in the calculation of the higher-order potentials. In addition, a sensitivity analysis of the parameters is given in Section 3.4.1.
Compared with point-based features, the cluster-based features provide new attributes on which semantics cues can be employed effectively. We define five cluster-based features for the derivation of the higher-order potentials, and these features depend closely on the clustering results of the off-ground points. Figure 10 presents the clustering results for the test data of the Vaihingen Dataset and GML Dataset A, based on the off-ground points extracted with Algorithm 1. As shown in Figure 10, three groups of classes tend to be aggregated into single clusters: class roof (green in Figure 10a)/building (blue in Figure 10c), which is far from the ground and has a smooth surface; class car (cyan in Figure 10a and reseda in Figure 10c), which has a high correlation with the ground; and class tree (yellow in Figure 10a)/high vegetation (orange in Figure 10c), which has a rough surface. We can make the most of the semantics cues on these clusters. Because some different classes have similar attributes, mis-clusters, i.e., clusters containing multiple classes, also exist in the clustering results. We therefore use the clustering result to define the higher-order potential in the LCS-CRF model rather than as the final semantic classification result. In the LCS-CRF model, we integrate the point-based and cluster-based features, which describe different attributes of each point and complement each other.
To better evaluate the effectiveness of the LCS-CRF model, the qualitative results of three classification algorithms (i.e., RF, CRF, and LCS-CRF) on the two test datasets are shown in Figure 11 and Figure 12, respectively. To learn the RF models, 400 trees are sufficient in our work. One thousand training samples for each class are randomly chosen from the reference ground-truth data of the Vaihingen Dataset and GML Dataset A. The performance of RF with limited training samples is shown in Figure 11a,b and Figure 12a,b. The soft labeling results for each class, produced by RF, are used as the unary term of CRF and LCS-CRF.
As can be seen in Figure 11a,b, RF produces discontinuous shapes with many isolated points because it does not consider spatial contextual information. By using contextual information to alleviate the effect of noise, CRF delivers a smoother classification map. Although the CRF model improves the classification performance dramatically over the RF method by incorporating contextual information, the two differ in how well they keep useful details. Due to the similarity between the point-based features of different classes (e.g., ground and low vegetation, or tree and roof), misclassified points are aggregated together, as shown in Figure 11d, which directly affects the accurate interpretation of the various classes. Accurately discriminating similar classes is a challenging task. On the whole, however, our proposed LCS-CRF model achieves a semantic classification result with fewer misclassified regions and less salt-and-pepper noise by employing location, context, and semantics cues. As shown in Figure 11e–f, the proposed model shows a competitive visual performance and preserves useful details.
To verify the robustness of our method, another high-resolution ALS dataset acquired with a different sensor is used to assess the performance of the proposed method. The semantic classification results of GML Dataset A obtained by the three methods, i.e., RF, CRF, and LCS-CRF, are shown in Figure 12. As in the test above, CRF delivers smoother results than RF and improves the classification accuracy. Compared with the RF model, CRF greatly reduces the classification noise based on context cues, but some potentially useful details may also be eliminated. In this experiment, the point-based features of the classes car and low vegetation differ only slightly, so the two classes are easily confused. As shown in Figure 12c,d, an obvious misclassification occurs: most low vegetation points are classified as car, which limits the accuracies of both classes. The proposed LCS-CRF model considers not only the location and context information but also fuses the semantics cues, which alleviates this misclassification effectively. The visual results in Figure 12e,f show an improvement in the classification of car and low vegetation.
Visually, our proposed method outperforms RF and CRF. The corresponding improvement in the quantitative metrics for the Vaihingen Dataset and GML Dataset A is analyzed in the next section.
3.3. Quantitative Comparison
In this section, the corresponding quantitative performances on the Vaihingen Dataset and GML Dataset A are reported and analyzed. In accordance with Figure 11e,f and Figure 12e,f, our method correctly labels most of the test data. It achieves an OA of 83.1% and a KA of 78.5% on the Vaihingen Dataset with eight categories of objects, and an OA of 94.3% and a KA of 89.3% on GML Dataset A with five categories of objects.
We divide the semantic classification methods for the Vaihingen Dataset into two categories: traditional machine learning-based and deep learning-based. We compare our method with the result provided in Reference [26] and with the results submitted to the ISPRS Semantic Labeling Benchmark that have accompanying published papers. References [5, 47, 48] adopted traditional machine learning classifiers to classify ALS point clouds, while References [49, 50, 51, 52] leveraged deep learning for semantic classification. For clarity and readability, the results achieved by each research group and by our model (LCS-CRF) are listed for comparison in Table 3.
We perform experiments on another ALS dataset, i.e., GML Dataset A, to verify the effectiveness of our method. The LCS-CRF model ranks first in terms of the OA and
compared with the other methods listed in Table 4.
3.4. Sensitivity Analysis for Parameters
In our experiments, the LCS-CRF model obtained a good classification performance. However, the model has many parameters to be determined, which play an important role in the classification. These parameters fall into three groups: those of Algorithm 1, those of Algorithm 2, and those of the higher-order potentials.
3.4.1. Parameters for Algorithm 1
The implementation of CSF requires three essential parameters: the grid resolution, which determines the number of particles; the classification threshold, which is applied to the distances between points and the simulated terrain; and the number of iterations, which ends the simulation process. To study the sensitivity of the CSF algorithm to the first two parameters, the number of iterations is set to 200, which is enough for our scene. The grid resolution varies from 0.2 to 1.2 and from 0.2 to 1.0 for the test data of the Vaihingen Dataset and GML Dataset A, respectively, with a step of 0.2. The classification threshold is selected from 0.3 to 1.3 and from 0.4 to 2.4 for the test data of the Vaihingen Dataset and GML Dataset A, respectively, with a step of 0.4. The sensitivity analysis is presented in Figure 13. As can be observed, better results, which are taken as the initial input of Algorithm 1, are obtained with a grid resolution of 0.6 and a classification threshold of 0.5 for the Vaihingen Dataset, and with a grid resolution of 0.4 and a classification threshold of 1.2 for GML Dataset A.
Although the ground point filtering accuracy reaches 95.1% and 96.0% for the Vaihingen Dataset and GML Dataset A, more detailed off-ground object information, especially on small and low objects, is essential in our setting to improve the semantic classification results. We therefore apply RANSAC to the ground points obtained by CSF to enrich the off-ground information while maintaining sufficient filtering accuracy. The behavior of RANSAC for each point is mostly determined by two thresholds: the maximum distance, which identifies the initial inliers among the neighbors of the current point, and the minimum inlier ratio, which determines whether the current point belongs to the ground point set, given that it is one of the initial inliers.
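The role of the two thresholds can be illustrated with a generic RANSAC plane fit. This is not Algorithm 1, which is given in Appendix A; it is a sketch that fits one plane to a small point set, and the function names, iteration count, seed, and toy data are hypothetical. The threshold values are picked from the ranges tested in this section.

```python
import math
import random

def fit_plane(p, q, r):
    """Plane through three points as (unit normal, offset), or None if collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    return n, -sum(n[i] * p[i] for i in range(3))

def ransac_ground(points, max_distance, min_inlier_ratio, n_iter=100, seed=0):
    """Largest plane consensus set and whether it meets the inlier ratio."""
    rng = random.Random(seed)
    best = []
    for _ in range(n_iter):
        plane = fit_plane(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate sample
        normal, offset = plane
        inliers = [pt for pt in points
                   if abs(sum(normal[i] * pt[i] for i in range(3)) + offset) <= max_distance]
        if len(inliers) > len(best):
            best = inliers
    accepted = len(best) / len(points) >= min_inlier_ratio
    return best, accepted

# Flat 5 x 5 ground patch plus three points on a small raised object.
ground = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
objects = [(1.0, 1.0, 2.0), (2.0, 3.0, 2.0), (3.0, 1.0, 2.0)]
inliers, accepted = ransac_ground(ground + objects, max_distance=0.2, min_inlier_ratio=0.7)
```

Points farther than the maximum distance from the dominant plane are left out of the consensus set, which is how points of small raised objects can be separated from a set initially labeled as ground.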
To find appropriate values of the maximum distance and the minimum inlier ratio, we test the procedure with the maximum distance varying from 0.1 to 0.4 with a step of 0.05 and from 0.1 to 0.8 with a step of 0.1 for the test data of the Vaihingen Dataset and GML Dataset A, respectively. It is worth noting that 0.4 and 0.8 are not cut-off values of the maximum distance; they only delimit the range over which the trend of the OA for ground/off-ground points is shown. The minimum inlier ratio varies from 0.5 to 0.8 with a step of 0.05 for the two test datasets. To evaluate the filtering results, we use the OA for ground/off-ground points. The analysis of these two parameters is shown in Figure 14. We can observe that the OA for ground/off-ground points converges to a certain value, because a higher maximum distance and a lower minimum inlier ratio have only a slight influence on the OA. To make the choice more reliable, visual results, which show the details directly and some of which are given in Figure 9, are also considered when determining the values of these two parameters.
To obtain more details of the off-ground objects while maintaining the OA, Algorithm 1 is run with the parameters listed in Table 5. These parameters are determined based on the experimental results (Figure 9 and Figure 14) and the properties of the input ALS point cloud.
3.4.2. Parameters for Algorithm 2
Algorithm 2 is proposed to produce clusters of off-ground points, which can be used to extract discriminative cluster-based features. In the first step, two parameters are selected for the mean-shift algorithm based on prior knowledge about the expected point distribution of the considered scene. Then, the three parameters of the post-processing step, which were described in Section 2.1.3, are determined.
Herein, the performance of Algorithm 2 is mainly evaluated visually; an example is shown in Figure 3. We therefore only report the configuration of these parameters for the Vaihingen Dataset and GML Dataset A, which is given in Table 6.
3.4.3. Parameters for Higher-Order Potentials
In the LCS-CRF model, the higher-order potentials are derived from semantics cues based on a sigmoid function. Two parameters determine the form of the sigmoid function: the scale parameter, which mainly controls its scaling, and the translation parameter, which controls its translation. In this section, the cluster-based features are also normalized to [0, 1], and the translation parameter is therefore set to 0.5 to be consistent with the distribution of the cluster-based features. The sigmoid function for different values of the scale parameter is shown in Figure 15. The datum line, drawn as a red straight line, serves as a reference for the sigmoid function; it corresponds to using the values of the cluster-based features directly in the calculation of the higher-order potentials. The other curves in the figure represent the values of the cluster-based features mapped through the sigmoid function with different scale parameters. We employ the sigmoid function to enhance the discrimination of the cluster-based features and obtain a better classification result. However, a few misjudgments remain in the cluster-based features, which are used to obtain the higher-order potential according to the rules described in Section 2.3.3. The effect of the scale parameter on the LCS-CRF algorithm is therefore analyzed below.
In order to study the sensitivity of the parameter
for our method, other parameters are set to be constants. Experiments are conducted to analyze the effect of the parameter
, which is varied from 2 to 12 with a step of 2 for Vaihingen Dataset and GML Dataset A. The sensitivity analysis for the parameter
is presented in
Figure 16. To make them more concise, we also compute the variation tendency of the OA under different settings of parameter
, as shown in
Figure 16a,b. The parameter
has a clear impact on the OA compared with the datum function, and the relative importance of the higher-order potential increases as parameter
increases.
We can observe that the OA first increases as parameter
increases, since the semantic rules are properly exploited through the Sigmoid function to enhance the discriminative power of the cluster-based features. The OA then stops increasing at a certain value of parameter
(i.e., around 6 for the Vaihingen Dataset and around 8 for GML Dataset A) and even shows a slight decreasing trend, since the large variation of the cluster-based features can lead to the accumulation of noise from these features and cause misjudgments of clusters. The red dotted lines in
Figure 16 serve as a reference and represent the classification results based on the higher-order potentials derived with the datum function.
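The one-at-a-time sensitivity analysis described above can be sketched as a simple sweep. This is only an illustration: `evaluate` is a hypothetical callable that would run the classification for one parameter value with all other parameters fixed and return the resulting OA; it is not part of the published method.

```python
def sweep_parameter(evaluate, values):
    """Evaluate one parameter over `values` while all other settings stay fixed.

    `evaluate` is a hypothetical callable: parameter value -> overall accuracy.
    Returns the full (value, OA) curve plus the best value and its OA.
    """
    curve = [(value, evaluate(value)) for value in values]
    best_value, best_oa = max(curve, key=lambda pair: pair[1])
    return curve, best_value, best_oa


# Grid used in the text for the scaling parameter: 2 to 12 in steps of 2.
scale_grid = list(range(2, 13, 2))
```

The returned curve corresponds to one line in a plot of OA against the parameter, and the best value marks the point beyond which the OA stops improving.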
Another parameter,
, which mainly controls the effect of the higher-order potentials in the classification, is also analyzed on the Vaihingen Dataset and GML Dataset A. As shown in
Figure 17, parameter
is varied from 0 to 1 in steps of 0.1, while the other parameters are held constant. The OA initially increases gradually with parameter
, a range in which the semantic rules dominate over the location information in the unary potential. After parameter
reaches a certain value (i.e., around 0.6 for the Vaihingen Dataset and around 0.7 for GML Dataset A), the OA again shows a slight decreasing trend, since the unary potential becomes dominant with the increase in parameter
. When
equals 1, the overall accuracies for the Vaihingen Dataset and GML Dataset A are 0.783 and 0.924, respectively; in this case, the classification result is that of the CRF model. Integrating higher-order potentials thus clearly improves the classification results on both test datasets compared with the results derived directly from the CRF model.
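The trade-off controlled by this weight can be illustrated with a toy sketch. It assumes, for illustration only, a convex combination in which the weight multiplies the unary score and its complement multiplies the higher-order score, so that a weight of 1 switches the higher-order term off; the pairwise term is omitted, and the names `blend_scores` and `best_label` are hypothetical. The actual formulation is the one given in the methodology section.

```python
def blend_scores(unary, higher_order, weight):
    """Blend per-class unary and higher-order scores (illustrative assumption).

    weight = 1 keeps only the unary scores (no higher-order contribution);
    weight = 0 keeps only the higher-order scores.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * u + (1.0 - weight) * h for u, h in zip(unary, higher_order)]


def best_label(scores):
    """Return the index of the class with the highest blended score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Grid used in the text for the weight: 0 to 1 in steps of 0.1.
weight_grid = [round(0.1 * i, 1) for i in range(11)]
```

In this toy setting, a point whose unary scores favor one class and whose cluster-level scores favor another changes label as the weight moves across the grid, which mirrors the shift in dominance between semantic rules and location information described above.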
4. Discussion
From
Table 3, we can observe that the OA of the LCS-CRF model is the best among all of the traditional machine learning-based methods. As far as the eight specific classes are concerned, our method ranks first in the
imp_sur,
car, and
shrub classes within the traditional machine learning-based methods, and its
P surpasses the previous highest results by absolute margins of +1.1%, +2.6%, and +6.1%, respectively. The RF model performs semantic classification of ALS data mainly on the basis of point-based features, which are derived from the location cues of points. The CRF model integrates the location and contextual cues and yields a smoother result than the RF model (as shown in
Figure 11). The LCS-CRF model shows a superior result by incorporating location, context, and semantics cues into a higher-order CRF model. Especially for the
car class, a great improvement of
P is obtained by adding semantics cues. The class
low-veg, with a higher
P, mainly benefits from Algorithm 1. The OA of the LCS-CRF model ranks first among the traditional machine learning-based methods and third when compared with the deep learning-based methods, with minor gaps (1.8% and 2.1% lower than the second and the first OA, respectively). Though some deep learning-based methods perform better than our method, the LCS-CRF model can still satisfy general demands at a lower training cost.
In
Table 4, the
P of
car class with the LCS-CRF model surpasses the results of the RF and CRF models by 26% and 22.5%, respectively, which indicates that semantics cues play an important role in the semantic classification. We implement the methods
RF+LBP and
RF+α-exp by adding a regularization framework to smooth the semantic results derived by the RF model. Though significant improvements are shown in
building,
car, and
low vegetation classes compared with the RF model, the OAs of the methods
RF+LBP and
RF+α-exp are still less than 90%. The
P of
car class for our method is superior to others, and plausible results are shown in
ground,
building,
tree, and
low vegetation classes, which validates our proposed method.
In comparison to other approaches, our method shows several strengths. We compare the results achieved with our methodology to those obtained by recent approaches. Similar to our work, Reference [
5] proposed a hierarchical higher-order CRF framework, in which spatial and contextual information were integrated via a two-layer CRF. The Robust
Potts model was used to build the higher-order potential in their first-layer CRF. Their framework iterated and mutually propagated context to improve the classification results. The results of their framework on the Vaihingen Dataset are listed in
Table 3 (LUH), which showed outstanding performance in
and revealed a rather high quality of the results in several classes. In contrast, our methodology additionally integrates semantic cues in a higher-order CRF, which is a one-layer CRF with neither iteration nor propagation of context, and shows clear increases in class
car and OA by 5.8% and 1.5%, respectively. Currently, the only approach delivering semantic classification results of higher quality (with OA = 85.2% and
= 69.3%) for the Vaihingen Dataset is the one presented by Reference [
52] that leverages deep learning for the semantic labeling of ALS point clouds. However, a multi-convolutional neural network (MCNN) was trained to automatically learn deep features of each point from the generated contextual images across multiple scales, which was time-consuming to train and placed relatively high demands on hardware, while the proposed LCS-CRF framework only employs explicit point-based and cluster-based features. Comparable results can be observed in
Table 3 with
P in classes
imp_sur (+0.1%),
car (+8.8%),
façade (−1.9%), and
shrub (−0.5%), and with the OA (−2.1%). Compared with [
49] and [
50], which also adopted deep learning for semantic classification, our framework raises the OA by 1.6% and 1.5%, respectively, and
P in several classes shows better performance, especially in class
car. Due to the consideration of multi-scale neighborhoods, Reference [
26] obtained an improved performance on the GML Dataset A by exploring contextual information across different scales in the respective extracted features, while we obtain the optimal neighbors with the algorithm proposed in Reference [
7] and integrate meaningful semantic cues. As shown in
Table 4, our method increases the OA by 3.8% and the
by 11.7%, and three of the five classes’
P are improved. The methods
RF+LBP and
RF+α-exp, which were implemented based on the methodology proposed in Reference [
25], constructed graph models and employed structured regularization to spatially smooth the semantic labeling of point clouds. Our method not only uses spatial information but also integrates context and semantic cues in a posterior probability model. In contrast with these two methods, our method better addresses some hard-to-retrieve classes, such as classes
car and low
vegetation, and increases the OA by 8.3% and 6.5%, respectively, as observed in
Table 4.
Experimental results suggest that the LCS-CRF model shows superior performance on the semantic classification of ALS data. However, there are still some misclassifications in the results. For the Vaihingen Dataset, classes fence and façade are at a disadvantage due to their attributes, including the small number of points, sparsity, and similarity to some other classes. A close-up visual inspection shows that the class fence is often classified as class low_veg or shrub, which adversely affects the OA and . For the GML Dataset A, classes building and car produce lower precisions compared with classes ground and tree. Based on the visual inspection of the test data, class building with small height shows similar attributes to classes ground and car, due to its planarity and clustering. Class low vegetation with smaller clusters is easily classified as car, and the P of class car is very sensitive to such errors because class car is extremely small compared with the whole test dataset.
As shown in
Section 3.3, parameters in three parts, i.e., Algorithm 1, Algorithm 2, and higher-order potentials, are analyzed. Most parameter values are tested within a general interval chosen from the attributes of the point clouds and common experience. On the hardware described in
Section 3, it takes about 1.5 h to calibrate the parameters in the first and second parts on both the Vaihingen Dataset and GML Dataset A. Determining the parameters in the third part takes longer because of the large-scale ALS point clouds: each inference with the LCS-CRF model takes about 1.2 h. Parallel computing is therefore used to speed up the process considerably. Once the parameters are determined, automatic interpretation can be performed on large-scale ALS point clouds. In addition, it takes only about 0.5 h to train a CRF model on the Vaihingen Dataset in our work, whereas training in a deep learning framework takes about three to six days [
54].
5. Conclusions
In this paper, we presented an LCS-CRF model for ALS data semantic classification. The main novelty of this framework lies in the integration of location, context, and semantics cues in a higher-order CRF framework that takes irregularly distributed ALS points to semantically labeled point clouds. The method proceeds in three main stages, i.e., (i) feature extraction; (ii) off-ground points extraction and clustering; and (iii) classification. A total of 34 point-based features from the point locations and 5 cluster-based features from off-ground points’ clusters are extracted to form the feature space. To effectively employ the semantics cues, off-ground points extraction and clustering are performed for the cluster-based feature extraction. Based on the location and semantics cues, the unary potentials and higher-order potentials are derived by the RF classifier and the sigmoid function, respectively. Then, the context information between neighboring points is integrated in a higher-order CRF as a pairwise potential to smooth the classification results. Therefore, the location, context, and semantics cues are, respectively, formulated in unary, pairwise, and higher-order potentials within the probabilistic LCS-CRF model to alleviate misclassification. The experiments with two ALS point cloud data sets confirm the competitive semantic classification performance of the proposed method in both the qualitative and quantitative evaluations.
However, the classification results are sensitive to the parameter values. In future work, we aim to preserve more potentially useful details and to improve the results with fewer parameters. We also intend to investigate the potential of deep learning adapted to ALS point cloud data.