Article

Comprehensive Utilization of Temporal and Spatial Domain Outlier Detection Methods for Mobile Terrestrial LiDAR Data

1
Department of Earth and Space Science, York University, 4700 Keele Street, Toronto, ON M3J 1P3, Canada
2
Industrial and 3D Imaging Department, Optech Incorporated, 300 Interchange Way, Vaughan, ON L4K 5Z8, Canada
*
Author to whom correspondence should be addressed.
Remote Sens. 2011, 3(8), 1724-1742; https://doi.org/10.3390/rs3081724
Submission received: 25 June 2011 / Revised: 29 July 2011 / Accepted: 8 August 2011 / Published: 16 August 2011
(This article belongs to the Special Issue Terrestrial Laser Scanning)

Abstract

Terrestrial LiDAR provides many disciplines with an effective and efficient means of producing realistic three-dimensional models of real world objects. With the advent of mobile terrestrial LiDAR, this ability has been expanded to include the rapid collection of three-dimensional models of large urban scenes. For all its usefulness, it does have drawbacks. One of the major problems faced by the LiDAR industry today is the automatic removal of outlying data points from LiDAR point clouds. This paper discusses the development and combined implementation of two methods of performing outlier detection in georeferenced point clouds. These methods made use of the raw data available from most time-of-flight mobile terrestrial LiDAR scanners in both the temporal and spatial domains. The first method involved a moving fixed interval smoother derived from the well-known position velocity acceleration Kalman Filter. The second method fitted a quadratic curved surface to sections of LiDAR data. The combined use of these routines is discussed through examples with real LiDAR data.

1. Introduction

LiDAR (Light Detection and Ranging) is a tool that allows for the fast and efficient capture of three-dimensional spatial information from real world targets. This ability has allowed both terrestrial and airborne LiDAR to be used in a variety of applications [1,2]. Until recently, terrestrial time-of-flight LiDAR was relegated to stationary tripod mounts with low scanning speeds (10,000 points per second) compared to airborne LiDAR systems (300,000 points per second). With the advent of mobile terrestrial LiDAR, this is no longer the case. Terrestrial scenes can now be collected faster than ever, firstly because they are collected from a moving platform and secondly because collection speed has greatly increased (500,000 points per second). This increase in the number of points collected during a survey means that ever larger amounts of data are being produced ever faster. To complicate matters, because the scanners are now immersed in the scene being scanned instead of flying high above it, the geometry contained in these massive data files is more complex than that encountered previously. This makes filtering the data harder, but also more necessary. Specifically, detecting and eliminating erroneously collected points, or outliers, becomes critical.
Simply stated, an outlier is a point which differs significantly from its neighbors or neighborhood [3]. What "significantly" means is, of course, up to the individual user of the data. Outliers in LiDAR data occur for a variety of reasons. Some of these reasons, such as boundaries of occlusion, surface reflectance and multi-path reflection, are described in [4]. To this list can be added moving objects which pass through the scan area faster than they can be captured, and particulate matter in the air, such as snow, rain and dust, which reflects the laser energy.
Several strategies exist for dealing with these outliers [4,5,6,7,8,9]. They can be classified as univariate (single variable), multivariate (multiple variable), parametric (statistic based) and non-parametric (non-statistic based) methods [5]. The non-parametric methods can be further broken down into distance-based, depth-based, density-based and clustering techniques [5,7,8]. Despite this large amount of research into outlier detection, correctly finding outliers in spatial data remains a vexing problem. Most of the strategies listed above fail when confronted with obstacles such as surface discontinuities, poor statistical distributions and varying local densities within the LiDAR point cloud [4]. Improving outlier detection for LiDAR requires algorithms which make use of as much of the data inherently output in LiDAR point clouds as is practical. This includes the precise timings available from time-of-flight LiDAR systems, the raw polar coordinate observations, the calculated Cartesian coordinates and, if possible, the intensity data. Once developed, algorithms employing different detection strategies can be combined to improve overall detection results.
In this study, two different algorithms, one in the temporal domain (Moving Fixed Interval Smoother) and the other in the spatial domain (Curved Surface Fitting), were developed and combined for comprehensive outlier detection in real mobile terrestrial LiDAR data. The temporal domain algorithm detects outliers by testing the difference between the computed coordinates of a point and a prediction of these values based on the surrounding data. The predicted values were obtained through a modified version of the moving fixed interval smoother derived from the position, velocity and acceleration (PVA) version of the Kalman Filter [10]. Using the precise timings available from time-of-flight LiDAR, whether mobile or static, in a PVA Kalman Filter for outlier detection is a new concept in the literature. The second algorithm uses a best-fit quadratic curved surface to spatially measure each point in the cloud and compare it to the points in its neighborhood. The performance of the two algorithms was studied numerically, both individually and in tandem. While quadratic curved surface fitting is not new in the literature [11,12,13,14,15], applying the spatial domain algorithm sequentially after the temporal domain algorithm is a new idea and is intended to provide a better overall result.

2. Moving Fixed Interval Smoother (MFIS)

2.1. The MFIS Algorithm

Given a discrete time series $z(1), z(2), \ldots, z(n)$ of a continuous time signal with their standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_n$ and the associated precise timings $t_1, t_2, \ldots, t_n$, a second order polynomial of $z$ can be used to model the time series. By using a random variable $x(k)$ to denote the state of $z(k)$ at time instant $t_k$, the second order polynomial was given as follows:
$$x(k) = x(k-1) + \dot{x}(k-1)\,(t_k - t_{k-1}) + \tfrac{1}{2}\,\ddot{x}(k)\,(t_k - t_{k-1})^2 \tag{1}$$
$$\dot{x}(k) = \dot{x}(k-1) + \ddot{x}(k-1)\,(t_k - t_{k-1}) \tag{2}$$
$$\ddot{x}(k) = \ddot{x}(k-1) \tag{3}$$
wherein the process noise is not included, and the measurement equation was:
$$z(k) = x(k) + \Delta(k) \tag{4}$$
where $\Delta(k)$ was the white noise with zero expectation and a variance of $\sigma_k^2$. The filter algorithm based on Equation (1) is called the PVA filter in position tracking applications, and also called the α-β-γ filter if it is time invariant [16].
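As a minimal illustration of the constant-acceleration model in Equations (1) to (3), the following Python sketch advances a position, velocity and acceleration state across one time step. The function name `propagate_pva` and the sample values are illustrative assumptions rather than part of the authors' implementation, and process noise is ignored as in the text.

```python
def propagate_pva(x, v, a, dt):
    """Advance a (position, velocity, acceleration) state by dt.

    Assumes constant acceleration over the step and no process noise.
    """
    x_next = x + v * dt + 0.5 * a * dt * dt  # position update, Equation (1)
    v_next = v + a * dt                      # velocity update, Equation (2)
    a_next = a                               # acceleration held, Equation (3)
    return x_next, v_next, a_next


# Hypothetical state: 1.0 m, 2.0 m/s, 4.0 m/s^2, advanced by 0.5 s.
state = propagate_pva(1.0, 2.0, 4.0, 0.5)
print(state)
```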
The proposed moving fixed-interval smoother was developed on the basis of Equations (1–4) that estimate the states for a time instant $k$ using the measurements over a specified window $(t_{k-n_1}, t_{k+n_2})$ (Figure 1).
Figure 1. The Fixed Interval Smoother.
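With the measurements $z(k-n_1), \ldots, z(k-1), z(k), \ldots, z(k+n_2)$, the unbiased linear smoother was derived based on the principle of minimal variance. The smoothed solution for the states at $k$ was given by [10]:
$$\hat{x}(k) = \sum_{i=-n_1}^{n_2} a_i(k)\, z(k+i) \tag{5}$$
$$\hat{\dot{x}}(k) = \sum_{i=-n_1}^{n_2} b_i(k)\, z(k+i) \tag{6}$$
$$\hat{\ddot{x}}(k) = \sum_{i=-n_1}^{n_2} c_i(k)\, z(k+i) \tag{7}$$
with their variances
$$\sigma_{x}^2(k) = \sum_{i=-n_1}^{n_2} a_i^2(k)\, \sigma_{k+i}^2 \tag{8}$$
$$\sigma_{\dot{x}}^2(k) = \sum_{i=-n_1}^{n_2} b_i^2(k)\, \sigma_{k+i}^2 \tag{9}$$
$$\sigma_{\ddot{x}}^2(k) = \sum_{i=-n_1}^{n_2} c_i^2(k)\, \sigma_{k+i}^2 \tag{10}$$
where
$$a_i(k) = \frac{1}{\sigma_{k+i}^2} \left\{ \lambda_1(k) + \lambda_2(k)\, \delta t_{k,k+i} + \lambda_3(k)\, \delta t_{k,k+i}^2 \right\} \tag{11}$$
$$b_i(k) = \frac{1}{\sigma_{k+i}^2} \left\{ \mu_1(k) + \mu_2(k)\, \delta t_{k,k+i} + \mu_3(k)\, \delta t_{k,k+i}^2 \right\} \tag{12}$$
$$c_i(k) = \frac{1}{\sigma_{k+i}^2} \left\{ \eta_1(k) + \eta_2(k)\, \delta t_{k,k+i} + \eta_3(k)\, \delta t_{k,k+i}^2 \right\} \tag{13}$$
$$\begin{pmatrix} \lambda_1(k) & \mu_1(k) & \eta_1(k) \\ \lambda_2(k) & \mu_2(k) & \eta_2(k) \\ \lambda_3(k) & \mu_3(k) & \eta_3(k) \end{pmatrix} = \begin{pmatrix} \sum\limits_{i=-n_1}^{n_2} \dfrac{1}{\sigma_{k+i}^2} & \sum\limits_{i=-n_1}^{n_2} \dfrac{\delta t_{k,k+i}}{\sigma_{k+i}^2} & \sum\limits_{i=-n_1}^{n_2} \dfrac{\delta t_{k,k+i}^2}{\sigma_{k+i}^2} \\ \sum\limits_{i=-n_1}^{n_2} \dfrac{\delta t_{k,k+i}}{\sigma_{k+i}^2} & \sum\limits_{i=-n_1}^{n_2} \dfrac{\delta t_{k,k+i}^2}{\sigma_{k+i}^2} & \sum\limits_{i=-n_1}^{n_2} \dfrac{\delta t_{k,k+i}^3}{\sigma_{k+i}^2} \\ \sum\limits_{i=-n_1}^{n_2} \dfrac{\delta t_{k,k+i}^2}{\sigma_{k+i}^2} & \sum\limits_{i=-n_1}^{n_2} \dfrac{\delta t_{k,k+i}^3}{\sigma_{k+i}^2} & \sum\limits_{i=-n_1}^{n_2} \dfrac{\delta t_{k,k+i}^4}{\sigma_{k+i}^2} \end{pmatrix}^{-1} \tag{14}$$
with $\delta t_{k,k+i} = t_{k+i} - t_k$.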
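The position weights $a_i(k)$ can be computed numerically by forming the 3×3 matrix of Equation (14) and solving for the first column of its inverse. The sketch below does this in plain Python. The helper names `solve3` and `position_weights`, together with the sample timings and standard deviations, are assumptions made for illustration. A useful sanity check is that the weights sum to one and reproduce a noise-free quadratic signal exactly at the centre of the window.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [p - factor * q for p, q in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def position_weights(times, sigmas, centre):
    """Return the weights a_i(k) for every sample of one window.

    `times` and `sigmas` hold the window samples; `centre` is the index of t_k.
    """
    dts = [t - times[centre] for t in times]
    w = [1.0 / (s * s) for s in sigmas]
    # s[m] is the sum of dt^m / sigma^2, the entries of the matrix in Equation (14).
    s = [sum(wi * dt ** m for wi, dt in zip(w, dts)) for m in range(5)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    # The first column of the inverse holds lambda_1, lambda_2, lambda_3.
    lam = solve3(normal, [1.0, 0.0, 0.0])
    return [wi * (lam[0] + lam[1] * dt + lam[2] * dt * dt) for wi, dt in zip(w, dts)]


# Hypothetical window: five equally spaced samples of z(t) = 1 + 2t + 3t^2.
times = [0.0, 0.1, 0.2, 0.3, 0.4]
sigmas = [0.02] * 5
z = [1.0 + 2.0 * t + 3.0 * t * t for t in times]
a = position_weights(times, sigmas, 2)
x_hat = sum(ai * zi for ai, zi in zip(a, z))  # smoothed position, Equation (5)
print(a, x_hat)
```

For equal spacing and equal standard deviations, the weights reduce to the classical five-point quadratic smoothing coefficients (−3, 12, 17, 12, −3)/35.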
By rearranging Equation (5), one can predict the point $z(k)$:
$$z_p(k) = \frac{1}{1 - a_0(k)} \left\{ \sum_{i=-n_1}^{-1} a_i(k)\, z(k+i) + \sum_{i=1}^{n_2} a_i(k)\, z(k+i) \right\} \tag{15}$$
Using the measurement $z(k)$ and predicted value $z_p(k)$, the difference $\delta z(k) = z(k) - z_p(k)$ was computed for the purpose of outlier detection. The variance of this difference is as follows:
$$\sigma_{\delta z}^2(k) = \sigma_k^2 + \frac{1}{(1 - a_0(k))^2} \left\{ \sum_{i=-n_1}^{-1} a_i^2(k)\, \sigma_{k+i}^2 + \sum_{i=1}^{n_2} a_i^2(k)\, \sigma_{k+i}^2 \right\} \tag{16}$$
Accordingly, the standardized difference was assumed to be normally distributed as:
$$\frac{\delta z(k)}{\sigma_{\delta z}(k)} \sim N(0, 1) \tag{17}$$
under the null hypothesis $H_0: \delta z(k) = 0$ against the alternative hypothesis $H_1: \delta z(k) \neq 0$.
A time series was investigated where the position of each point in an example point cloud was predicted through the use of an appropriate interval $(n_1, n_2)$ and the statistical test was performed. Outliers were identified through the use of Equation (17) as the predictor.
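The leave-one-out test of Equations (15) to (17) can be sketched as follows. The example assumes five equally spaced samples with equal standard deviations, for which the position weights are the classical five-point quadratic smoothing coefficients (−3, 12, 17, 12, −3)/35. The function name, the sample values and the 1.96 critical value (a two-sided 5% test) are assumptions chosen for illustration.

```python
def standardized_difference(z, sigmas, a, centre):
    """Return delta_z / sigma_delta_z for the sample at index `centre`."""
    a0 = a[centre]
    others = [i for i in range(len(z)) if i != centre]
    # Prediction from the neighbours only, Equation (15).
    z_pred = sum(a[i] * z[i] for i in others) / (1.0 - a0)
    # Variance of the difference between measurement and prediction, Equation (16).
    variance = sigmas[centre] ** 2 + sum(
        (a[i] * sigmas[i]) ** 2 for i in others
    ) / (1.0 - a0) ** 2
    return (z[centre] - z_pred) / variance ** 0.5


CRITICAL = 1.96  # assumed two-sided 5% threshold for N(0, 1)
weights = [-3.0 / 35.0, 12.0 / 35.0, 17.0 / 35.0, 12.0 / 35.0, -3.0 / 35.0]
sigmas = [0.05] * 5

clean = [1.0, 2.0, 3.0, 4.0, 5.0]   # samples lying on a straight line
spiked = [1.0, 2.0, 3.5, 4.0, 5.0]  # centre sample displaced by 0.5

clean_stat = standardized_difference(clean, sigmas, weights, 2)
spike_stat = standardized_difference(spiked, sigmas, weights, 2)
print(abs(clean_stat) > CRITICAL, abs(spike_stat) > CRITICAL)
```

Because the tested sample is excluded from its own prediction, a large error at the centre of the window does not pull the predicted value toward itself.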

2.2. Outlier Detection in Time Domain

In theory, data from a rotating prism mobile terrestrial LiDAR can be regarded as a set of discrete observations from a single continuous line of data. In practice, the discrete observations provided by all mobile terrestrial systems include raw angle-range measurements, which are adjusted by the calibration model and used to compute the local east, north, up (or x, y, z) coordinates. This provides a choice of raw angle-range, adjusted angle-range or coordinates when extracting discrete observations from the output data. Most importantly, because of the nature of mobile terrestrial LiDAR data, each discrete observation (whichever is chosen) is paired with an accurate timestamp indicating when the observation was made.
Utilizing the available information in MFIS requires that a moving window be created to extract small samples of the point cloud for analysis. This window would centre around each discrete observation in turn and use the data immediately preceding the observation as well as the data immediately succeeding the observation to calculate a predicted value. Comparing this predicted value with the observed value allows outlying data to be identified and removed from the LiDAR data. Figure 2 shows a sketch of a typical window of data that might be extracted from a point cloud and how the distance between the observed measurement (M) and the predicted measurement (P) was used to identify points which lie outside their neighborhood.
Figure 2. Time series of points used to generate predictions (P) for measured points (M).
The moving fixed interval prediction recognizes the fact that the point cloud can be treated as a series of lines of point data. Forming windows out of these lines of data requires care, since any significant gap in the data has the potential to produce erroneous predictions (Figure 3). As the gap shown in Figure 3 increases, the likelihood that the predicted point (P) falls close to the true position decreases. Once the predicted measurement has strayed from the true value, any comparison between the measured value and the predicted value is meaningless.
Figure 3. Time series of points with an appreciable gap between two neighboring points. As the gap increases, the likelihood that the predicted measurement (P) represents the true value, decreases.
In practice, gaps caused by occlusions, reflections and/or drop-out readings segment the continuous line of data followed by the scanner's optics into multiple smaller line segments. In addition, since a significant portion of any terrestrial LiDAR scan is likely to include portions of the sky, numerous LiDAR points were expected to be missing from the point cloud, which segments the line further. Treating these smaller line segments as independent entities allowed us to apply the PVA filter to each of these subset lines from the point cloud. Allowances had to be made for lines shorter than the window size $(n_1, n_2)$, and the window size had to be adjusted to accommodate points at the start and end of each line.
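The segmentation step can be sketched as below: a scan line is split wherever the spacing between consecutive timestamps exceeds a threshold, and the window is clipped at the ends of each segment. The function names, the threshold and the integer timestamps are hypothetical and chosen only to illustrate the idea.

```python
def split_on_gaps(times, max_gap):
    """Split sorted timestamps into (start, stop) index ranges with no large gaps.

    `stop` is exclusive, so a segment covers times[start:stop].
    """
    segments = []
    start = 0
    for i in range(1, len(times)):
        if times[i] - times[i - 1] > max_gap:
            segments.append((start, i))
            start = i
    if times:
        segments.append((start, len(times)))
    return segments


def clip_window(k, n1, n2, start, stop):
    """Shrink the window (k - n1, k + n2) so that it stays inside one segment."""
    return max(start, k - n1), min(stop - 1, k + n2)


# Hypothetical timestamps in microseconds with two dropped stretches.
times = [0, 1, 2, 10, 11, 30]
segments = split_on_gaps(times, max_gap=3)
print(segments)
print(clip_window(0, 15, 15, *segments[0]))
```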
Great care was also taken when interpreting the test results for a given point. If an outlier is included in a window of data, then the likelihood that the predicted point (P) falls close to the true position is again decreased. One way to counter this scenario was to form the same data section into three windows $(n_1, 0)$, $(0, n_2)$ and $(n_1, n_2)$. Computing the predicted points (P1, P2, P3) and testing each as shown in Equation (17), we concluded that if any one of the three predicted points passes, then the observed measurement (M) passes and is not treated as an outlier. This strategy deals with the situation where more than one outlier exists in a given window $(n_1, n_2)$. Outliers which occur either immediately before or immediately after the observed measurement (M) in the discrete time series do not cause a false detection. When outliers exist both immediately before and immediately after the observed measurement (M), a false detection is still likely to occur.
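The decision rule for the three windows can be written compactly. In this sketch the three standardized differences are assumed to have been computed already for the backward, forward and two-sided windows; the function name, the sample values and the 1.96 critical value are illustrative assumptions.

```python
def is_outlier(standardized_differences, critical=1.96):
    """Flag the point only if the test fails in every window.

    A single passing window is enough to accept the measurement.
    """
    return all(abs(t) > critical for t in standardized_differences)


# Hypothetical case 1: an outlier just before M contaminates the backward and
# two-sided windows, but the forward window still passes.
contaminated_before = is_outlier([5.2, 0.4, 3.1])

# Hypothetical case 2: all three windows reject the measurement.
rejected_everywhere = is_outlier([6.0, 4.8, 7.3])
print(contaminated_before, rejected_everywhere)
```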

3. Curved Surface Fitting (CSF)

3.1. The CSF Algorithm

The generic model of a quadric curved-surface was given by
$$f(a_1, \ldots, a_{10}, x, y, z) = a_1 x^2 + a_2 y^2 + a_3 z^2 + a_4 xy + a_5 xz + a_6 yz + a_7 x + a_8 y + a_9 z + a_{10} = 0 \tag{18}$$
where $(x, y, z)$ is the coordinate of a point on the surface and $a_j$ $(j = 1, \ldots, 10)$ are the parameters. Due to the ambiguity in the surface determination introduced by the parameter $a_{10}$, it was necessary to constrain the ten parameters by
$$C = a_1^2 + a_2^2 + a_3^2 + a_4^2 + a_5^2 + a_6^2 + a_7^2 + a_8^2 + a_9^2 + a_{10}^2 = 1 \tag{19}$$
Given the measurements $(x_i, y_i, z_i)$ of point $i$ with its 3×3 variance matrix $D_{ii}$ and the approximate values $(a_1^{(0)}, \ldots, a_{10}^{(0)})$ of the ten parameters, one obtains the linearized form of Equation (18) as
$$A_i v + B_i\, \delta a + w_i = 0 \tag{20}$$
where
$$v = \begin{pmatrix} v_{x_i} & v_{y_i} & v_{z_i} \end{pmatrix}^T \tag{21}$$
$$A_i = \begin{pmatrix} a_{x_i} & a_{y_i} & a_{z_i} \end{pmatrix} \tag{22}$$
$$B_i = \begin{pmatrix} x_i^2 & y_i^2 & z_i^2 & x_i y_i & x_i z_i & y_i z_i & x_i & y_i & z_i & 1 \end{pmatrix} \tag{23}$$
$$w_i = f(a_1^{(0)}, \ldots, a_{10}^{(0)}, x_i, y_i, z_i) \tag{24}$$
The values $(a_{x_i}, a_{y_i}, a_{z_i})$ in Equation (22) represent the partial derivatives of Equation (18) with respect to the given measurements of point $i$. Under the assumption that the measurement points are not correlated with each other, one defines
$$v_i = -A_i v = -\left\{ a_{x_i} v_{x_i} + a_{y_i} v_{y_i} + a_{z_i} v_{z_i} \right\} \tag{25}$$
to create a single measurement equivalent to the three measured coordinate components of a point, so that Equation (20) was simplified to
$$v_i = B_i\, \delta a + w_i \tag{26}$$
for $i = 1, 2, \ldots, n$.
As can be seen, the combination of the linearized form of Equations (18), (19) and (26) is a standard parametric adjustment model with a constraint. Its solution is not detailed further here; for more details, refer to [17].
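To make the terms of the linearized model concrete, the sketch below evaluates the design row $B_i$, the misclosure $w_i$ and the partial derivatives $(a_{x_i}, a_{y_i}, a_{z_i})$ for a given parameter vector, after scaling the parameters to satisfy the unit-norm constraint of Equation (19). The function names and the unit-sphere example are assumptions for illustration; the constrained adjustment itself is not implemented here.

```python
import math


def normalize(params):
    """Scale the ten parameters so that their squares sum to one, Equation (19)."""
    norm = math.sqrt(sum(p * p for p in params))
    return [p / norm for p in params]


def design_row(x, y, z):
    """Row B_i of Equation (23) for one point."""
    return [x * x, y * y, z * z, x * y, x * z, y * z, x, y, z, 1.0]


def misclosure(params, point):
    """w_i: Equation (18) evaluated at a point for the given parameters."""
    return sum(a * b for a, b in zip(params, design_row(*point)))


def partials(params, point):
    """Partial derivatives of Equation (18) with respect to x, y and z."""
    a1, a2, a3, a4, a5, a6, a7, a8, a9, _ = params
    x, y, z = point
    return (
        2.0 * a1 * x + a4 * y + a5 * z + a7,
        2.0 * a2 * y + a4 * x + a6 * z + a8,
        2.0 * a3 * z + a5 * x + a6 * y + a9,
    )


# Hypothetical surface: the unit sphere x^2 + y^2 + z^2 - 1 = 0.
sphere = normalize([1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0])
w_on = misclosure(sphere, (1.0, 0.0, 0.0))   # point on the surface
w_off = misclosure(sphere, (1.1, 0.0, 0.0))  # point 0.1 outside the surface
grad = partials(sphere, (1.0, 0.0, 0.0))
print(w_on, w_off, grad)
```

Note that $w_i$ is an algebraic misclosure rather than a geometric distance, so its scale depends on the normalization of the parameters.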
Due to the likelihood that one or more outliers may creep into the point cloud sample being used to form the polynomial surface, it was a good idea to provide a statistical check on the goodness-of-fit for each calculated surface. By comparing the a posteriori variance with the a priori variance, we produced such a statistic, which follows the Chi-Squared distribution, as shown in Equation (27):
$$\frac{V^T P_{LL} V}{\sigma_0^2} \sim \chi^2(n - 10 + 1) \tag{27}$$
In order to test whether a point $i$ is a potential outlier, two different test statistics can be constructed. When point $i$ was included in the polynomial surface fitting, the Tau distribution [18] was used to test the residual of point $i$ as shown in Equation (28) (test statistic 1):
$$T_i^{(1)} = \frac{|v_i|}{\hat{\sigma}_0 \sqrt{q_{V_i V_i}}} \sim \tau(n - 10 + 1) \tag{28}$$
where $v_i$ was the residual of point $i$, $\hat{\sigma}_0^2$ was the posterior variance of unit weight and $q_{V_i V_i}$ was the cofactor of $v_i$. The Tau test was used in test statistic 1 because $v_i$ and $\hat{\sigma}_0$ are dependent variables.
Alternatively, when point $i$ was not included in the polynomial surface fitting, the Student t distribution was used to test the discrepancy between point $i$ and the surface as shown in Equation (29) (test statistic 2):
$$T_i^{(2)} = \frac{w_i}{\sigma_{w_i}} \sim t(n - 1 - 10 + 1) \tag{29}$$
where $w_i$ was computed by substituting point $i$ into Equation (18) after the surface parameters had been determined and $\sigma_{w_i}$ was the estimated standard deviation of $w_i$.
Using the specified patch size, data surrounding each individual point in an example point cloud was used to create a curved surface. The surfaces were validated using Equation (27) and outliers were spatially detected using Equation (29) as the predictor.

3.2. Outlier Detection in the Spatial Domain

Viewing the LiDAR data as a strictly spatial entity, the relative position of a point with respect to its neighbors was used to identify outliers. A search around each point in the point cloud yields a representative sample of neighboring points, which is used to form a surface. Outliers are then identified by their spatial separation from this surface. This concept is illustrated in Figure 4.
Figure 4. Polynomial surface patch in the immediate neighborhood of the point being tested.
The quadratic curved-surface fitting algorithm generates small surface patches in the neighborhood of each point (Figure 4). This is an outlier detector in the spatial domain, and it relies on the assumption that the points immediately adjacent to an outlier lie on the surface and are not outliers themselves. The number of points to use in the polynomial patch fitting is a variable that needs to be determined. On one hand, at least 10 points are required to derive the best fit surface. On the other hand, the larger the number of points used, the greater the probability that other outliers will be incorporated into the calculation of the surface (Figure 5). In fact, with LiDAR, the conditions which cause an outlier also greatly increase the likelihood that other outliers lie close by. Therefore, care had to be taken when setting a patch size. The test statistic given in Equation (27) measures the fit of the surface to the data patch and was used as an aid in determining whether outliers are included within the selected patch data.
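Patch selection can be sketched as a nearest-neighbour search around the tested point. The brute-force search below is an assumption made for brevity and is not necessarily how the authors gathered neighbours; a spatial index would normally be needed for clouds of the size discussed here. The `include_self` flag distinguishes the case where the tested point takes part in the fit (test statistic 1) from the case where it is excluded (test statistic 2).

```python
def nearest_patch(points, index, patch_size, include_self=False):
    """Return the indices of the `patch_size` points closest to points[index]."""
    if patch_size < 10:
        raise ValueError("at least 10 points are required to fit the surface")
    px, py, pz = points[index]

    def squared_distance(j):
        x, y, z = points[j]
        return (x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2

    candidates = [j for j in range(len(points)) if include_self or j != index]
    return sorted(candidates, key=squared_distance)[:patch_size]


# Hypothetical 5 x 5 grid of points on a flat surface; index 12 is the centre.
grid = [(float(i), float(j), 0.0) for i in range(5) for j in range(5)]
patch = nearest_patch(grid, 12, 10)
patch_with_self = nearest_patch(grid, 12, 10, include_self=True)
print(patch, patch_with_self)
```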
Figure 5. Polynomial surface patch generated using a data section containing outlying points. Blue lines indicate outliers used to compute the surface. The Red line indicates the point being tested as an outlier.

4. Tests and Results

4.1. The Lynx Mobile Mapper: Hardware and Software

MFIS and CSF were tested using parts of two data sets collected with the Lynx Mobile Mapper. During data collection, the Lynx Mobile Mapper consisted of two LiDAR sensors, two calibrated passive imaging cameras, and the Applanix POS LV 420. This system is designed to collect rich survey-grade LiDAR and image data from a vehicle moving at traffic speeds.
For each test site, calibration of the Applanix POS system was carried out immediately before the data collection. GAMS (GPS azimuth measurement subsystem) parameters for the Applanix POS system were determined on the day of the data collection to ensure accuracy. For each test area, the data sections used in testing were selected from the same data used to determine the LiDAR system boresight values. The system lever arms and boresight values were obtained using the LiDAR manufacturer's recommended procedure.
The raw POS data was processed using the software package POSPAC. The result of this processing was an SBET (Smoothed Best Estimated Trajectory) file. This SBET file was then used in conjunction with the boresight values previously obtained to process the raw LiDAR data using the software package Dashmap. The processed LiDAR data was output in ASCII format.

4.2. Description of Data

Four sections of Lynx point clouds (A, B, C and D) shown in Figure 6 and Figure 7 were selected for testing. Table 1 gives specifics about the contents of these point clouds.
Table 1. Specifications for point clouds used in algorithm testing.
| Point Cloud | A | B | C | D |
|---|---|---|---|---|
| Total No. of Points | 295,147 | 495,345 | 120,092 | 216,228 |
| Total No. of Outliers | 872 | 280 | 1,308 | 226 |
| Total % of Points Which are Outliers | 0.30 | 0.06 | 1.09 | 0.10 |
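The percentages in Table 1 follow directly from the point and outlier counts. The short check below recomputes them; the dictionary layout and variable names are illustrative only.

```python
# (total points, total outliers) for each point cloud, copied from Table 1.
counts = {
    "A": (295147, 872),
    "B": (495345, 280),
    "C": (120092, 1308),
    "D": (216228, 226),
}

# Outliers as a percentage of all points, rounded to two decimals as in the table.
percent_outliers = {
    name: round(100.0 * outliers / total, 2)
    for name, (total, outliers) in counts.items()
}
print(percent_outliers)
```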

4.2.1. Data with Simple Geometry

Point cloud A was obtained on a section of asphalt from a generic parking lot over which multiple drive passes were performed. Point cloud B was collected in a dirt lot where multiple passes were also performed.
Point cloud A contained numerous outliers in two large groups with other outliers spread throughout the data. As shown in Table 1, the outliers made up 0.30% of the total point cloud. This data was collected on a day where the asphalt was wet but the temperature was just above 0° Celsius. The prevailing cold wet conditions caused condensation from the vehicle’s exhaust pipe to combine with varying high and low intensity returns from the standing pools of water. This caused multiple laser reflections to be recorded above the asphalt surface.
In contrast, point cloud B was collected in a lumber yard with an unpaved, rough finished, mostly native clay driving area that had been pitted and grooved by the passage of heavy vehicles. The ambient temperature during the collect was about 15° Celsius and the ground surface was dry. These conditions produced a point cloud with comparatively few outliers (0.06% from Table 1). Many of the outliers which did exist in this data set were within centimeters of the ground surface. It has also been observed that several of the outliers in this point cloud were collected from a different position than the ground surface and therefore lie out of temporal series with much of the data.
Figure 6. Point clouds of simple geometry taken from two separate parking lots used during testing of the two outlier algorithms previously described. (a) Point cloud A contains numerous outliers evenly distributed above an asphalt surface; (b) Point cloud B contains outliers distributed evenly across a bare soil surface.

4.2.2. Data with Complex Geometry

Point cloud C was taken from the same data set as point cloud A. A two second section of the vehicle trajectory was isolated and all the LiDAR data collected during that time was extracted to form point cloud C. Similarly, point cloud D was taken from the same data set as point cloud B. Again a two second section of the vehicle trajectory was isolated and all LiDAR data collected during that time was extracted to form point cloud D.
Point cloud C contains a complex scene including part of a building with windows, cars, asphalt road, sidewalk, curb, two small trees, bushes and part of a natural field. As Table 1 indicates, outliers make up about 1.09% of the data in point cloud C. Point cloud D contains another complex scene including part of a building with windows, overhead wires, a bare earth driving surface and a bare earth mound. Unlike point cloud C, point cloud D contains no vegetation, and as such, the number of points that can be classified as outliers is much lower (0.10% of the data, Table 1).
Figure 7. Point clouds of complex geometry taken from two separate parking lots used during testing of the two outlier algorithms previously described. (a) Point cloud C contains numerous outliers along with a building wall, sidewalk, curb, road, field and vegetation; (b) Point cloud D contains outliers along with a building wall, overhead wires, bare soil surface and soil mound.

4.3. Analysis of Outlier Detection Utility in Real Data

Both of the algorithms described in Section 2 and Section 3 were implemented under Microsoft Visual C++ 6.0.
To ensure that the MFIS routine was working as expected and to assess its effectiveness in realistic data, a line of data was extracted from point cloud A (Figure 8). Figure 8 shows a line of data with one of the points lying far out of spatial position from the rest. The labels 'Start of Interval' and 'End of Interval' indicate the order in which the points were collected. The points colored green were collected before the outlier point and the points colored red were collected after it.
Figure 8. The moving fixed interval smoothing method applied to a section of a mobile terrestrial LiDAR point cloud collected with the Lynx Mobile Mapper.
As with the moving fixed interval smoother, quadratic polynomial surface fitting was used on a section of point cloud B (Figure 9). As before, this was done to ensure that the routine was working as expected and to assess its effectiveness in realistic data. The results are illustrated in Figure 8. Selecting a patch size of 100 points around the point in Figure 8 identified as 'Outlier Point', a polynomial surface was calculated. Computing the shortest distance between 'Outlier Point' and the surface, we find that the point lies 0.195 m from the surface. Generating the statistics described in Section 3.1, we find that the surface fits the data well but the point is far outside the allowable deviation from the surface. Looking at a cross section of the surface and data (Figure 9), the deviations of the points around the calculated surface become clear. The deviation of the point labeled 'Outlier Point' is far greater than the deviation of any other point.
Figure 9. Section of a mobile terrestrial LiDAR point cloud collected with the Lynx Mobile Mapper. Object is being viewed from the South.

4.4. Results

As previously mentioned in Section 2.2, the output from the mobile terrestrial LiDAR allowed raw angle-range measurements, angle-range measurements adjusted by the system calibration information and/or coordinate values to be input into the MFIS algorithm as the discrete observations. During testing, all three options were tried with data of both simple and complex geometry. However, the results obtained from the raw angle-range data were unimpressive and are excluded from this section.
Several trials were conducted to find the optimum window size for MFIS in each point cloud. Table 2 gives the results of the best trial running the MFIS algorithm using the LiDAR range values after they had been adjusted for the scanner’s constant range offset value and the individual range corrections for varying intensity returns. Table 3 gives the results of the best trial, running the MFIS algorithm using the output easting, northing and up coordinates.
Because two test statistics were derived for the CSF routine, one requiring the tested point to be included in the data used to create the surface patch (Equation (28)) and one requiring it to be excluded (Equation (29)), both scenarios were tested. Several trials were conducted to find the optimum patch size for CSF in each point cloud for both test statistic 1 (Equation (28)) and test statistic 2 (Equation (29)). The results of the best trial from tests conducted using point clouds A, B, C and D are given in Table 4 for test statistic 1 and in Table 5 for test statistic 2.
Table 2. Results from trials conducted using adjusted LiDAR ranges in the Moving Fixed Interval Smoother (MFIS) on point clouds A, B, C and D.
| Point Cloud | A | B | C | D |
|---|---|---|---|---|
| Window Size (points) | 20 | 60 | 20 | 15 |
| No. of Outliers Identified | 519 | 54 | 1,200 | 202 |
| No. of Non-Outliers Identified (False Identification) | 5,709 | 17,112 | 24,502 | 35,195 |
| No. of Outliers Missed | 353 | 226 | 108 | 24 |
| % of Outliers Identified | 59.52 | 19.29 | 91.74 | 89.38 |
| % of Point Cloud Identified | 2.11 | 3.47 | 21.40 | 16.37 |
| % of Point Cloud Identified Incorrectly (False Identification Rate) | 1.93 | 3.45 | 20.40 | 16.28 |
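The three percentage rows of the results tables can be derived from the count rows together with the point totals in Table 1. The sketch below shows the arithmetic for point cloud A in Table 2; the function name and return layout are assumptions for illustration.

```python
def detection_metrics(identified, false_identified, missed, total_points):
    """Return (% of outliers identified, % of cloud identified, false identification rate)."""
    true_outliers = identified + missed
    pct_outliers_identified = 100.0 * identified / true_outliers
    pct_cloud_identified = 100.0 * (identified + false_identified) / total_points
    false_identification_rate = 100.0 * false_identified / total_points
    return (
        round(pct_outliers_identified, 2),
        round(pct_cloud_identified, 2),
        round(false_identification_rate, 2),
    )


# Point cloud A, adjusted ranges (Table 2), with the point total from Table 1.
metrics_a = detection_metrics(519, 5709, 353, 295147)
print(metrics_a)
```

The false identification rate is expressed relative to the whole point cloud, which is why a method can identify most outliers and still flag many times more valid points than outliers.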
Table 3. Results from trials conducted using XYZ coordinates in the Moving Fixed Interval Smoother (MFIS) on point clouds A, B, C and D.
| Point Cloud | A | B | C | D |
|---|---|---|---|---|
| Window Size (points) | 15 | 35 | 15 | 15 |
| No. of Outliers Identified | 742 | 244 | 1,183 | 206 |
| No. of Non-Outliers Identified (False Identification) | 4,130 | 7,847 | 22,555 | 37,255 |
| No. of Outliers Missed | 130 | 36 | 125 | 20 |
| % of Outliers Identified | 85.09 | 87.14 | 90.44 | 91.15 |
| % of Point Cloud Identified | 1.65 | 1.63 | 19.77 | 17.32 |
| % of Point Cloud Identified Incorrectly (False Identification Rate) | 1.40 | 1.58 | 18.78 | 17.22 |
Table 4. Results from trials conducted using Quadratic Polynomial Surface Fitting (CSF) on point clouds A, B, C and D using test statistic 1 in Equation (28).

| Point Cloud | A | B | C | D |
|---|---|---|---|---|
| Patch Size (points) | 500 | 100 | 500 | 400 |
| No. of Outliers Identified | 38 | 15 | 118 | 28 |
| No. of Non-Outliers Identified (False Identification) | 616 | 96 | 986 | 784 |
| No. of Outliers Missed | 834 | 265 | 1,190 | 198 |
| % of Outliers Identified | 4.36 | 5.36 | 9.02 | 12.39 |
| % of Point Cloud Identified | 0.54 | 0.02 | 0.92 | 0.38 |
| % of Point Cloud Identified Incorrectly (False Identification Rate) | 0.51 | 0.02 | 0.82 | 0.36 |
Table 5. Results from trials conducted using Quadratic Polynomial Surface Fitting (CSF) on point clouds A, B, C and D using test statistic 2 in Equation (29).

| Point Cloud | A | B | C | D |
|---|---|---|---|---|
| Patch Size (points) | 500 | 100 | 500 | 400 |
| No. of Outliers Identified | 253 | 199 | 335 | 188 |
| No. of Non-Outliers Identified (False Identification) | 0 | 1 | 7,029 | 5,193 |
| No. of Outliers Missed | 619 | 81 | 973 | 38 |
| % of Outliers Identified | 29.01 | 71.07 | 25.61 | 83.19 |
| % of Point Cloud Identified | 0.09 | 0.04 | 6.13 | 2.49 |
| % of Point Cloud Identified Incorrectly (False Identification Rate) | <0.01 | <0.01 | 5.85 | 2.40 |
The MFIS and CSF methods were then combined: the reduced point cloud produced by the MFIS method was analyzed with the CSF method. This combined MFIS/CSF routine used the easting, northing, up coordinate version of the MFIS method and the test statistic 2 version of the CSF method. The results for this test on point clouds A, B, C and D are given in Table 6.
Table 6. Results from trials conducted using the Moving Fixed Interval Smoother (MFIS) preceding Quadratic Polynomial Surface Fitting (CSF) on point clouds A, B, C and D.

| Point Cloud | A | B | C | D |
|---|---|---|---|---|
| MFIS Window Size (points) | 15 | 35 | 15 | 15 |
| CSF Patch Size (points) | 100 | 100 | 500 | 400 |
| No. of Outliers Identified | 777 | 245 | 1,197 | 220 |
| No. of Non-Outliers Identified (False Identification) | 4,130 | 7,851 | 24,760 | 38,874 |
| No. of Outliers Missed | 95 | 35 | 111 | 6 |
| % of Outliers Identified | 89.11 | 87.50 | 91.51 | 97.35 |
| % of Point Cloud Identified | 1.66 | 1.63 | 21.61 | 18.08 |
| % of Point Cloud Identified Incorrectly (False Identification Rate) | 1.40 | 1.58 | 20.62 | 17.98 |
As a further control on the results, point clouds A, B, C and D were imported into the Polyworks (www.innovmetric.com) IMSurvey module and the built-in outlier detection routine was used. This outlier routine is a precursor to the wrap mesh function, used in the creation of triangular irregular network (TIN) models. The outlier routine in Polyworks required us to set values for the maximum point to point distance and the maximum cluster size. Several attempts were made to optimize these inputs to the commercial routine. The results of the best trial using the Polyworks software are summarized in Table 7.
Table 7. Results from trials conducted using Polyworks IMSurvey’s (Version 11.0.30) Reject Outliers routine on point clouds A, B, C and D.

| Point Cloud | A | B | C | D |
|---|---|---|---|---|
| Max Spot Space Measured (m) | 0.053 | 0.040 | 0.100 | 0.536 |
| Min Spot Space Measured (m) | 0.023 | 0.012 | 0.024 | 0.012 |
| Max Point-to-Point Distance Used (m) | 0.080 | 0.050 | 0.080 | 0.080 |
| Maximum Cluster Size Used (m) | 5.000 | 1.000 | 5.000 | 1.000 |
| No. of Outliers Identified | 717 | 221 | 1,281 | 170 |
| No. of Non-Outliers Identified (False Identification) | 15,846 | 30,645 | 28,252 | 42,774 |
| No. of Outliers Missed | 155 | 59 | 27 | 56 |
| % of Outliers Identified | 82.22 | 78.93 | 97.94 | 75.22 |
| % of Point Cloud Identified | 5.61 | 6.23 | 24.59 | 19.86 |
| % of Point Cloud Identified Incorrectly (False Identification Rate) | 5.37 | 6.19 | 23.53 | 19.78 |

4.5 Analysis and Discussions

From the trials of the MFIS routine, it was clear that for simple geometry, using the coordinate values consistently produced better results than using the adjusted ranges. This was most clearly shown by the trial with point cloud B, where 19.29% of the outliers were found using ranges as opposed to 87.14% using coordinates. The trials also showed that the coordinate version of the MFIS routine had a much smaller false identification rate, incorrectly identifying 1.58% of point cloud B as outliers. The range version incorrectly identified 3.45% of point cloud B, more than double the rate of the coordinate version. The results using coordinates from point cloud B were interesting because many of the outliers in point cloud B occur temporally after the initial scan, and therefore do not follow the time series pattern. Looking more closely at the results, we found that many of these outliers in point cloud B fail in the easting or northing components and not, as is the case in point cloud A, in the up component. It is significant that the easting and northing components allowed the MFIS routine to succeed in point cloud B, even though many of the outlying data points lie out of temporal sequence with other spatially close data.
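The percentages quoted for point cloud B can be reproduced from the counts in Tables 2 and 3. The following is a minimal sketch; the function names are illustrative and not taken from the authors' software, and the false identification rate helper needs the total point cloud size, which is not restated in this section.

```python
def detection_rate(identified: int, missed: int) -> float:
    """Percentage of the known outliers that a routine flagged."""
    return 100.0 * identified / (identified + missed)


def false_identification_rate(false_ids: int, total_points: int) -> float:
    """Percentage of the whole point cloud flagged incorrectly.

    total_points is the point cloud size, which must be supplied by the caller.
    """
    return 100.0 * false_ids / total_points


# Point cloud B: counts of outliers identified and missed (Tables 2 and 3).
range_rate = detection_rate(identified=54, missed=226)
coord_rate = detection_rate(identified=244, missed=36)

# Ratio of false identifications, range version over coordinate version.
false_ratio = 17112 / 7847

print(f"ranges: {range_rate:.2f}%  coordinates: {coord_rate:.2f}%")
print(f"false identification ratio: {false_ratio:.2f}")
```

The ratio of false identification counts (17,112 against 7,847) is independent of the point cloud size, so it can be checked without knowing the total number of points.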
Upon introducing complex geometry to the MFIS routine (point clouds C and D), we find that the false identification rate increases dramatically, to between 16% and 21% of the total point cloud. Interestingly, the use of adjusted ranges in the MFIS routine was able to match the performance obtained with coordinate values in both point clouds C and D. The coordinate method still produced fewer false identifications in point cloud C (18.78% versus 20.40%) but slightly more in point cloud D (17.22% versus 16.28%); in either case, the difference in the number of correct identifications is not as stark as it was with simple geometry.
The non-outliers that were wrongly identified by the MFIS routine occur in areas of the point cloud where the regularity of the time series was disrupted by rough objects such as vegetation or manholes. Where a manhole was encountered, the data density was insufficient to model the raised surfaces on the manhole’s lid. False detections also seemed to occur where the system was collecting data while stationary, at occlusions, and at changes in surface direction.
With test statistic 2, CSF performed best on point clouds B and D, identifying 71% and 83% of the outliers, respectively, compared with 29% and 25% in point clouds A and C. With test statistic 1, detection was low throughout: 4.36%, 5.36%, 9.02% and 12.39% of the outliers were identified in point clouds A, B, C and D, respectively. Test statistic 2 found more outliers than test statistic 1 in every point cloud. It is interesting to note that the number of false identifications from test statistic 2 increased significantly with complex geometry, which was not the case for test statistic 1.
The CSF routine performed exceptionally poorly on data where the outliers were clumped closely together, as was the case in point clouds A and C. Examining the chi-square test statistic, we see that in areas where the data are clumped together, the fit of the surrounding polynomial surfaces is quite poor. Additionally, when complex geometry was introduced by point clouds C and D, the chi-square statistic shows that the polynomial surfaces fit very poorly in areas where the point cloud transitioned from one object to another. This routine had much greater success on point clouds B and D, where the outliers are more spread out on the road surface and building. In fact, the spatial nature of the method proved well suited to point clouds where outlying data points may not lie in a sequential temporal order.
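The difference between testing a point that is included in the surface patch and one that is excluded from it can be illustrated with a small sketch. The code below is not the authors' implementation and does not reproduce Equations (28) and (29): it assumes an explicit quadratic surface z = f(x, y), an unweighted least-squares fit, and a simple k-sigma threshold on the residual in place of the paper's test statistics. All function names, the synthetic patch and the value of sigma are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def design_row(x: float, y: float) -> List[float]:
    """Terms of the assumed surface z = a0 + a1*x + a2*y + a3*x^2 + a4*x*y + a5*y^2."""
    return [1.0, x, y, x * x, x * y, y * y]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_quadratic(points: List[Point]) -> List[float]:
    """Least-squares coefficients via the normal equations (A^T A) c = A^T z."""
    rows = [design_row(x, y) for x, y, _ in points]
    z = [p[2] for p in points]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    atz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(6)]
    return solve(ata, atz)


def residual(coeffs: List[float], p: Point) -> float:
    """Vertical offset of point p from the fitted surface."""
    x, y, z = p
    return z - sum(c * t for c, t in zip(coeffs, design_row(x, y)))


def residual_included(patch: List[Point], index: int) -> float:
    """Residual when the tested point helps define the surface."""
    return residual(fit_quadratic(patch), patch[index])


def residual_excluded(patch: List[Point], index: int) -> float:
    """Residual when the tested point is left out of the fit."""
    others = patch[:index] + patch[index + 1:]
    return residual(fit_quadratic(others), patch[index])


def is_outlier(res: float, sigma: float, k: float = 3.0) -> bool:
    """Stand-in threshold test: flag residuals larger than k * sigma."""
    return abs(res) > k * sigma


def make_patch(spike: float = 0.0) -> List[Point]:
    """Synthetic 5 x 5 patch on a smooth surface, with an optional spike at its centre."""
    pts = [(float(x), float(y), 0.01 * x * x + 0.02 * y)
           for x in range(5) for y in range(5)]
    x, y, z = pts[12]
    pts[12] = (x, y, z + spike)
    return pts


spiked = make_patch(spike=0.5)
print("included:", residual_included(spiked, 12))
print("excluded:", residual_excluded(spiked, 12))
```

In this toy patch the included residual is smaller than the excluded one, because the fitted surface is pulled toward the spike. That behaviour is consistent with test statistic 2 finding more outliers than test statistic 1, although the actual statistics in the paper are more involved than a fixed threshold.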
Combining MFIS and CSF on simple geometry (point clouds A and B) identified almost 90% of the outliers with relatively few false identifications (about 1.5% of the total point cloud size). The results were slightly better in point cloud A when the routines were combined, since a small number of outliers not identified by MFIS were identified by CSF. Point cloud B, on the other hand, saw much less benefit from the combination of the methods. The vast majority of the outliers found in point cloud B were found by the MFIS routine; only one new outlier was identified by the CSF routine when the combined MFIS/CSF routine was applied to point cloud B. With the introduction of complex geometry in point clouds C and D, we find that we could again identify more than 90% of the outliers; however, the cost in false identifications increased to around 20%.
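The way the combined routine feeds the reduced point cloud from the first detector into the second can be sketched as a simple cascade. The two detectors below are toy stand-ins (a fixed magnitude threshold and a deviation-from-median test), not the MFIS or CSF algorithms, and the names `cascade`, `toy_first` and `toy_second` are hypothetical.

```python
from statistics import median
from typing import Callable, List, Sequence, Set

# A detector takes a sequence of values and returns the indices it flags.
Detector = Callable[[Sequence[float]], Set[int]]


def cascade(points: Sequence[float], first: Detector, second: Detector) -> Set[int]:
    """Run `first`, remove what it flags, run `second` on the remainder.

    Indices returned by `second` refer to the reduced data, so they are
    mapped back to positions in the original sequence before merging.
    """
    flagged_first = first(points)
    kept = [i for i in range(len(points)) if i not in flagged_first]
    reduced = [points[i] for i in kept]
    flagged_second = {kept[j] for j in second(reduced)}
    return flagged_first | flagged_second


def toy_first(values: Sequence[float]) -> Set[int]:
    """Stand-in for the first stage: flag gross errors by magnitude."""
    return {i for i, v in enumerate(values) if abs(v) > 10.0}


def toy_second(values: Sequence[float]) -> Set[int]:
    """Stand-in for the second stage: flag values far from the median."""
    centre = median(values)
    return {i for i, v in enumerate(values) if abs(v - centre) > 1.0}


points: List[float] = [0.0, 0.1, 50.0, 0.2, 3.0, 0.1]
print(sorted(cascade(points, toy_first, toy_second)))
```

Because the second stage only sees points that survived the first, it can add detections but cannot undo false identifications made earlier. That is consistent with the false identification counts in Table 6 being no lower than those in Table 3.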
The results from the commercial software package Polyworks show that it identified fewer outliers in point clouds A and B than the MFIS routine. The cost to identify these outliers was much greater than that of either the MFIS algorithm or the CSF algorithm. Where MFIS identified around 1.5% and CSF identified less than 0.01% of the point cloud incorrectly as outliers, Polyworks identified as much as 6.19% of the point cloud incorrectly as outliers. However, when complex geometry was introduced in point cloud C, we find that the commercial software achieved results comparable with the MFIS routine. The commercial software identified about 98% of the known outliers while wrongly eliminating another 24% of the total point cloud. The MFIS routine found over 90% of known outliers, while eliminating about 20% of the point cloud incorrectly. The complex geometry of point cloud D again separates the commercial software from the combined MFIS/CSF routine. With point cloud D, the commercial software identified around 75% of the outliers in the point cloud while again identifying about 20% of the point cloud incorrectly. The combined MFIS/CSF routine was able to find 97% of the outliers in point cloud D, with a similar share of approximately 20% of the point cloud being identified incorrectly. Like the results obtained from point clouds A and B, the results from point cloud D demonstrate a significant improvement over the commercial software.

5. Conclusion

This paper presented the results of combining two different types of algorithms for the detection and filtering of outliers in point clouds collected using mobile terrestrial LiDAR. These routines have taken advantage of the extra information that is generally available from this type of equipment. The two mathematical models presented here allowed for the creation of two computer routines which perform outlier detection in mobile terrestrial point cloud data.
It has been shown that, individually, each method of outlier detection has difficulty detecting outliers under certain conditions. Under real world conditions, variations in surface quality from objects such as vegetation or manholes can cause problems. The possibility also exists that not all points which lie spatially close to each other will have been collected at similar times. Due to the mobile nature of this LiDAR variant, it is possible that the vehicle carrying the LiDAR has looped back on its previous position or that the scanners have seen the same point from different parts of its trajectory. Equally, the conditions which cause outliers tend to create multiple outliers within a particular region. Creating a surface patch from data that contains outliers results in a poorly fitting surface and a problematic outlier detection outcome.
While each method has its own strengths and weaknesses, each has proven capable of detecting and removing outliers from actual LiDAR data. Whether the point cloud contains relatively simple geometry or a more complex scene, these methods have successfully identified outliers in real LiDAR data. When compared to commercial software, the combined routines outperformed the commercial routine. More work is needed to optimize the inputs to the routines, specifically, to determine accurate error estimates for the point cloud coordinates.

Acknowledgements

The authors would like to thank the Natural Sciences and Engineering Research Council (NSERC) of Canada for the financial support.

References

  1. Antova, G. Precise Mapping with 3D Laser Scanning. In Proceedings of the International Conference on Cartography and GIS, Borovets, Bulgaria, 25–28 January 2006.
  2. Miller, M.M.; Meertens, C.; Phillips, D.; Rubin, C.; Ely, L.; Pratt-Sitaul, B. Collaborative Research MRI: Acquisition of Terrestrial Laser Scanning Systems for Earth Science Research; Proposal Submitted to EAR Major Research Instrumentation; UNAVCO: Boulder, CO, USA, 2009; Available online: http://www.unavco.org/pubs_reports/proposals/2009/MRI_TLS_EAR2009_UNAVCO-CWU.pdf (accessed on 7 March 2010).
  3. Lu, C.T.; Chen, D.; Kou, Y. Algorithms for Spatial Outlier Detection. In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM’03), Melbourne, FL, USA, 19–22 November 2003.
  4. Sotoodeh, S. Outlier Detection in Laser Scanner Point Clouds. In Proceedings of the ISPRS Commission V Symposium, Image Engineering and Vision Metrology, Dresden, Germany, 25–27 September 2006; Volume XXXVI, Part 5; pp. 297–302.
  5. Ben-Gal, I. Outlier detection. In Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers; Maimon, O., Rokach, L., Eds.; Springer: New York, NY, USA, 2005; Volume 1, Chapter 1; pp. 1–13.
  6. Breunig, M.; Kriegel, H.; Ng, R.; Sander, J. LOF: Identifying Density-Based Local Outliers. In Proceedings of the International Conference on Management of Data (ACM SIGMOD 2000), Dallas, TX, USA, 16–18 May 2000.
  7. Last, M.; Kandel, A. Automated Detection of Outliers in Real-World Data. In Proceedings of the Second International Conference on Intelligent Technologies, Bangkok, Thailand, 27–29 November 2001.
  8. Papadimitriou, S.; Kitagawa, H.; Gibbons, P.B.; Faloutsos, C. LOCI: Fast Outlier Detection Using the Local Correlation Integral. In Proceedings of the 19th International Conference on Data Engineering, Bangalore, India, 5–8 March 2003.
  9. Zheng, M.-Q.; Chen, C.-C.; Lin, J.-X.; Fan, M.-H.; Janscó, T. An Algorithm for Spatial Outlier Detection Based on Delaunay Triangulation. In Proceedings of the International Workshop on Computational Intelligence in Security for Information Systems (CISIS’08), Genova, Italy, 23–24 October 2008.
  10. Wang, J.G. Pre-Processing of INS-Data with the Help of the α-β-γ-Filter; Internal Report; Institute of Geodesy, UniBw Munich: Neubiberg, Germany, July 1997. (in German)
  11. Ahn, S.J.; Rauh, W.; Cho, H.S.; Wanecke, H.J. Orthogonal distance fitting of implicit curves and surfaces. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 620–638.
  12. Gasca, M.; Sauer, T. Polynomial interpolation in several variables. Adv. Comput. Math. 2000, 12, 377–410.
  13. Xu, Y. Polynomial interpolation in several variables, cubature formulae, and ideals. Adv. Comput. Math. 2000, 12, 363–376.
  14. Gruen, A.; Akca, D. Least squares 3D surface and curve matching. ISPRS J. Photogramm. Remote Sens. 2005, 59, 151–174.
  15. Akca, D. Matching of 3D surfaces and their intensities. ISPRS J. Photogramm. Remote Sens. 2007, 62, 112–121.
  16. Chui, C.K.; Chen, G. Kalman Filtering with Real-Time Applications, 4th ed.; Springer-Verlag: Berlin/Heidelberg, Germany, 2009.
  17. Wang, J.G. Least Squares Quadric Surface Fitting with the help of Statistical Tests: A Case Study in Industrial Surveying. In Proceedings of International Geomatics Forum, Qingdao, China, 29–30 May 2009.
  18. Caspary, W.F. Concepts of Network and Deformation Analysis; Monograph 11, School of Surveying, UNSW: Sydney, NSW, Australia, August 2000.

