1. Introduction
For the advancement of autonomous driving Levels 3 and 4 (LV3 and LV4) as defined by [1], real-time, high-performance recognition technology is required even at high speeds. Many companies have installed LV3 self-driving technologies in their mass-produced cars in the form of highway self-driving. At high speed, if the logic execution delay is long or the recognition distance is short, obstacles can approach quickly and become dangerous. Therefore, efforts are needed to increase the recognition distance and speed up logic execution.
The deep learning method using the point cloud of LiDAR has steadily developed, and its performance has advanced [2]. It recognizes objects as bounding boxes by training neural networks on datasets divided into three classes, mainly cars, pedestrians, and cyclists. Its performance depends on the training datasets and the network structures. To improve performance, studies that directly construct large object recognition datasets in various real environments have been conducted [3]. Recently, many researchers have also built datasets in virtual environments to expand the diversity of situations [4]. In addition, researchers have actively worked to improve the structure of neural network models [5,6]. However, performance-focused network architectures cannot be used in actual driving unless they run in real time. Also, regardless of real-time capability and recognition performance, learning relies on datasets that mainly cover three classes (vehicle, pedestrian, cyclist), so other objects that vehicles need to recognize cannot be correctly recognized.
To overcome the limitation of recognizing only fixed obstacle types, there are also studies that instead learn the drivable area of the vehicle using semantic segmentation [7]. However, semantic segmentation on 3D point clouds is computationally heavy, so a model that satisfies both performance and real-time requirements has yet to be developed. According to the state of the art, in terms of performance alone, the best model [8] achieves an mIoU of 72.9. Among real-time models, however, the model [9] that satisfies the minimum real-time reference level of 98 fps reaches an mIoU of only 46.9. So, a model that satisfies both real-time and performance requirements is still missing.
Most semantic segmentation models using a camera satisfy both real-time and performance requirements [10]. Accordingly, a sensor fusion method was also studied that determines the driving area with a camera and obtains the exact distance and actual size with a LiDAR [11]. In that study, the authors identified where a LiDAR ring was cut off by a small obstacle in front of a VLP-16 LiDAR and matched it with the semantic segmentation results of the front camera image. However, as the calibration of the front camera and LiDAR is itself an important research topic, matching is difficult, and there is the limitation that only objects within the front camera's field of view can be recognized.
How, then, can objects be recognized regardless of type while satisfying both real-time and performance requirements and avoiding complex calibration? The answer is a rule-based recognition technique using LiDAR's point cloud. This approach predates deep learning and was mainly studied by identifying the ground, which is a plane, and recognizing the parts that deviate from it as objects.
Recently, many researchers have conducted research to logically extract objects' characteristics from the point cloud [12]. One method segments the area according to a polar coordinate representation of the point cloud and extracts object parts using the slope differences within and between regions. With this method, it is cumbersome to tune the slope parameters that determine how each area is divided or extracted as an object. In general, rule-based algorithms have the disadvantage of many user-defined parameters, on which performance largely depends.
However, among rule-based ground removal logics, there is an algorithm that reduces the inconvenience of parameter tuning and operates on raw LiDAR data without separate RoI filtering [13]. In that paper, parameters were calculated using adaptive ground likelihood estimation and previous results, and a powerful ground segmentation method was proposed that utilizes temporal ground characteristics, regional vertical plane fitting, and noise removal techniques. However, ground removal alone cannot accurately determine the obstacles autonomous driving must avoid. The point cloud remaining after the ground is removed contains various classes that still need to be filtered, such as building walls, curbs, and obstacles. Additionally, lighter and faster algorithms are needed to reduce resource usage when running in high-speed situations or in conjunction with other decision and control algorithms. The algorithm proposed in this study is very simple and fast because it utilizes voxels.
Therefore, in this work, we propose a simple small-object detection logic that is rule-based but minimizes setup elements and satisfies both performance and real-time requirements. After removing noise from the raw point cloud data, the algorithm extracts edges by analyzing the height values of the point cloud using ring data. It aligns points with the same ring ID by azimuth and masks object parts according to rising and falling edges. In the last step, we cover edge cases and supplement the algorithm using height and distance indicators. This algorithm was verified by comparing recognition performance and real-time capability against RANSAC [14] in virtual scenarios and an actual autonomous driving competition environment.
3. Ring Edge-Triggered Detection Method
This section describes the ring edge-triggered LiDAR detection method and preprocessing for applying the logic.
3.1. Noise Filtering
The Ring Edge algorithm proposed in this paper separates the ground and objects in one pass after sorting points according to ring ID and azimuth. It is very vulnerable to noise because objects are recognized where the value difference between aligned neighboring points is large. Accordingly, a noise removal step was added before the full-scale application of the algorithm, as follows.
First, candidate edges are found using a threshold slightly smaller than the height-difference parameter used to recognize an edge. Since the found edges are differences between neighboring points in the previously sorted sequence, edge indices can be extracted consecutively. Accordingly, the index difference between extracted edges is calculated, and if it is less than 3, the points between those edges are judged to be noise. If the index-difference threshold for judging noise is too small, it is difficult to filter out all the noise; if it is too large, objects with few points, such as poles, may also be discarded, so some adjustment is necessary.
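To make the rule concrete, the following is a minimal NumPy sketch of this noise filter under stated assumptions: the function name, the weaker threshold eps_noise, and the min_gap parameter are illustrative, not the authors' implementation.

```python
import numpy as np

def filter_edge_noise(z_sorted: np.ndarray, eps_noise: float, min_gap: int = 3) -> np.ndarray:
    """Return a keep-mask that drops isolated spikes from one ring's z values.

    z_sorted  : z values of one ring, already sorted by azimuth.
    eps_noise : height-difference threshold, assumed slightly smaller than
                the edge threshold used later for object extraction.
    min_gap   : edges closer than this index distance enclose noise.
    """
    diff = np.diff(z_sorted)                        # height step between neighbors
    edge_idx = np.flatnonzero(np.abs(diff) > eps_noise)

    keep = np.ones(z_sorted.shape[0], dtype=bool)
    # Two edges very close in index enclose a narrow spike -> mark as noise.
    for a, b in zip(edge_idx[:-1], edge_idx[1:]):
        if b - a < min_gap:
            keep[a + 1 : b + 1] = False             # points between the two edges
    return keep
```

The keep-mask is then applied to the ring's points before the main edge extraction described in the next subsection.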
3.2. Object Points Extraction with Ring Edge
The most basic edge-trigger-based object point extraction logic is as follows. Edges between neighboring points are extracted from the filtered points based on a height-difference threshold parameter.
Edges are extracted from the z values: the part following a rising edge is viewed as an object and masked as 1. For a falling edge, only the part up to the edge is viewed as an object; everything after it is viewed as ground and masked as 0. When the first detected edge is a falling edge, the part before it is an object, so the part before the first edge must also be masked.
Figure 5a shows a point cloud scene with a small obstacle placed in front. If masking is applied to the set of points bounded by the same ring ID, the object part is converted to 1, as shown in Figure 5b. Algorithm 1 summarizes the basic object extraction logic using Ring Edge on the point cloud from which noise has been removed.
Algorithm 1. Algorithm to detect objects with edges
Input: point cloud P in multiple areas divided by azimuths
Output: point cloud O recognized as obstacle
for each area p in P do
    p := p (sorted by ring ID)
    initialize O
    for each ring r in R do
        p_r := p[ring == r]
        th := arctan(p_r.y / p_r.x)
        p_r := p_r (sorted by th)
        mask := list of zeros (size: number of points in p_r)
        diff := difference of z values between neighboring points of p_r
        edges := indices where |diff| > threshold
        if diff[first edge] < 0 then        // the first edge is a falling edge
            mask[before first edge] := 1
        end if
        for each edge e in edges do
            if diff[e] > 0 then             // e is a rising edge
                mask[after e] := 1
            else                            // e is a falling edge
                mask[after e] := 0
            end if
        end for
        O := O + p_r[mask]
    end for
end for
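The following is a minimal NumPy sketch of the per-ring loop of Algorithm 1. The function name, array shapes, and the threshold eps_edge are assumptions made for illustration; the edge-case handling of Section 3.3 is omitted here.

```python
import numpy as np

def ring_edge_mask(points: np.ndarray, rings: np.ndarray, eps_edge: float) -> np.ndarray:
    """Mask object points (1) vs. ground (0) per ring, following Algorithm 1.

    points : (N, 3) array of x, y, z for one azimuth area.
    rings  : (N,) ring ID of each point.
    """
    mask = np.zeros(points.shape[0], dtype=np.uint8)
    for ring in np.unique(rings):
        idx = np.flatnonzero(rings == ring)
        # Sort the ring's points by azimuth th = arctan(y / x).
        th = np.arctan2(points[idx, 1], points[idx, 0])
        idx = idx[np.argsort(th)]
        z = points[idx, 2]
        diff = np.diff(z)
        edges = np.flatnonzero(np.abs(diff) > eps_edge)
        if edges.size == 0:
            continue
        m = np.zeros(z.shape[0], dtype=np.uint8)
        if diff[edges[0]] < 0:          # first edge is a falling edge:
            m[: edges[0] + 1] = 1       # the part before it is an object
        for e in edges:
            if diff[e] > 0:             # rising edge: object after it
                m[e + 1 :] = 1
            else:                       # falling edge: ground after it
                m[e + 1 :] = 0
        mask[idx] = m
    return mask
```

Because later edges overwrite the mask set by earlier ones, the loop reproduces the pulse-like behavior of the pseudocode: each rising edge switches the mask on and each falling edge switches it off.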
3.3. Covering Edge Cases
However, when using the basic Ring Edge algorithm, two edge cases arose, which were resolved as follows.
The first case occurs when consecutive rising edges appear: the entire area between the two rising edges is masked as an object, resulting in misrecognition. Under the basic logic, all parts behind a rising edge are considered objects, so everything behind the first rising edge is masked as 1. To handle this, the algorithm checks whether a falling edge exists between the two rising edges using a threshold slightly lower than the edge threshold. When such extra falling edges exist, the misrecognition is corrected by masking as 1 only the span from the first rising edge to the falling edge, as shown in the example in Figure 6a.
The second case occurs when consecutive falling edges appear: because everything after a falling edge is masked as ground, the object is not recognized. In this case, if the minimum height between the two edges is higher than the minimum height of the part previously masked as an object, the part between the two edges is also considered an object and masked as 1, as shown in the example in Figure 6b.
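The sketch below illustrates one possible way to encode these two rules as a post-processing pass over the per-ring arrays from the basic algorithm. It is an interpretation of the rules stated above, not the authors' code; the function name and the weaker threshold eps_weak are assumptions.

```python
import numpy as np

def resolve_edge_cases(z, diff, edges, mask, eps_weak):
    """Apply the two edge-case rules of Section 3.3 (illustrative sketch).

    z, diff, edges, mask : per-ring arrays produced by the basic algorithm.
    eps_weak : threshold slightly lower than the main edge threshold,
               used to find "hidden" falling edges (assumed parameter).
    """
    for a, b in zip(edges[:-1], edges[1:]):
        if diff[a] > 0 and diff[b] > 0:
            # Case 1: consecutive rising edges. Look for a weaker falling
            # edge between them and stop the object mask there.
            weak = np.flatnonzero(diff[a + 1 : b] < -eps_weak)
            if weak.size > 0:
                fall = a + 1 + weak[0]
                mask[fall + 1 : b + 1] = 0
        elif diff[a] < 0 and diff[b] < 0:
            # Case 2: consecutive falling edges. If the span between them
            # sits higher than the lowest point already masked as object,
            # treat the span as object too.
            obj_min = z[mask == 1].min() if np.any(mask == 1) else np.inf
            if z[a + 1 : b + 1].min() > obj_min:
                mask[a + 1 : b + 1] = 1
    return mask
```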
3.4. Results of Processes of Ring Edge
Finally, by applying the Ring Edge point extraction logic with the edge cases considered, the results shown in Figure 7 can be obtained from the input point cloud frame in Figure 8. Figure 8 is another frame from the same sequence in the Waymo Open Dataset. The Ring Edge result plots are expressed in BEV form according to azimuth, and raw data from a similar time point were imported for brief comparison.
3.5. Estimation of Bounding Boxes
To estimate bounding boxes based on rules from the points remaining after buildings and walls have been removed, a clustering process is first necessary. Clustering divides points into small sets according to their distribution. Each point is given an ID indicating the cluster to which it belongs.
In this paper, we used DBSCAN [17] as the clustering logic. DBSCAN takes as parameters a search radius and the minimum number of points required within that radius. These parameters determine the boundaries that separate each cluster. In this paper, we set it to require at least four points within a 3 m radius.
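With scikit-learn's DBSCAN, the setting described above corresponds to eps = 3.0 and min_samples = 4; the array name below is an assumption for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# points: (N, 3) object points remaining after Ring Edge filtering (assumed).
points = np.random.rand(100, 3) * 20.0   # placeholder data for illustration

# eps = 3 m radius, min_samples = 4 points, as set in this paper.
labels = DBSCAN(eps=3.0, min_samples=4).fit(points).labels_
# labels[i] is the cluster ID of point i; -1 marks noise points.
```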
Figure 9 shows the result of clustering the filtered points and estimating an unrotated bounding box from the size and location of each cluster.
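A minimal sketch of how such an unrotated (axis-aligned) box can be derived from each cluster's extent follows; the function and variable names are illustrative assumptions.

```python
import numpy as np

def axis_aligned_boxes(points: np.ndarray, labels: np.ndarray):
    """Return (center, size) per cluster from its min/max extent (no rotation)."""
    boxes = {}
    for cid in np.unique(labels):
        if cid == -1:                    # skip DBSCAN noise points
            continue
        p = points[labels == cid]
        lo, hi = p.min(axis=0), p.max(axis=0)
        boxes[cid] = ((lo + hi) / 2.0, hi - lo)   # center, (dx, dy, dz)
    return boxes
```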
Principal component analysis (PCA) [18] is a dimensionality reduction technique in machine learning and statistics. It transforms high-dimensional data into a lower-dimensional representation by identifying and retaining the most significant features, known as principal components. These components capture the maximum variance in the data, helping to simplify its complexity. PCA is valuable for visualization, noise reduction, and speeding up machine learning algorithms. It works by computing eigenvectors and eigenvalues of the data's covariance matrix, allowing the most informative dimensions to be selected while minimizing information loss.
This paper uses the PCA algorithm to estimate an object's heading angle from a clustered point set [19]. The point set is used to compute the covariance matrix, whose eigenvectors give the directions of largest variance. Because the data are three-dimensional points, three mutually perpendicular principal component vectors are returned. Among the returned vectors, two principal component vectors are projected onto the xy-plane. Afterward, the angle of the projected vector, obtained using the arctangent, becomes the object's heading angle (yaw).
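The following is a short NumPy sketch of this heading estimation: covariance matrix, eigendecomposition, projection onto the xy-plane, and arctangent. The function name is an assumption; the steps follow the description above.

```python
import numpy as np

def estimate_yaw(cluster: np.ndarray) -> float:
    """Estimate the heading angle (yaw) of a clustered point set via PCA.

    cluster : (N, 3) points belonging to one object.
    """
    centered = cluster - cluster.mean(axis=0)
    cov = np.cov(centered.T)                   # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    principal = eigvecs[:, -1]                 # direction of largest variance
    # Project the principal vector onto the xy-plane and take its angle.
    return float(np.arctan2(principal[1], principal[0]))
```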
6. Conclusions
This paper proposed a ground removal logic that interprets groups of points aligned by LiDAR ring ID and azimuth as digital pulses, finds edges, and masks object parts. In addition, beyond ground removal, the object's bounding box was estimated with a rule-based algorithm using DBSCAN and principal component analysis.
Ring Edge's ground removal takes 20 ms, slightly slower than RANSAC (15 ms) and faster than Patchwork++ (28 ms). In addition, compared to RANSAC on SemanticKITTI and the Waymo Open Dataset, its F1 score was higher, demonstrating its superiority. The rule-based bounding box estimates from the extracted object voxels were confirmed to be accurate, with a PDR index of over 90% when verified in the Waymo Open Dataset, a virtual driving environment, and an actual driving environment.
The proposed algorithm guarantees performance and real-time operation, so it can be applied immediately when an autonomous vehicle drives. This was demonstrated by participating in an actual autonomous driving competition and completing a mission to recognize small obstacles using this algorithm. In future work, we will study learning bounding boxes from the features of object points extracted with the Ring Edge algorithm. By comparing speed and performance against existing end-to-end deep learning methods, we will seek the most efficient network and develop logic that can quickly recognize objects regardless of class.