1. Introduction
Real-time and accurate object detection is essential for the safe driving of autonomous vehicles. If detection lags or is inaccurate, an autonomous vehicle may issue wrong control instructions and cause serious accidents. At present, image-based object detection algorithms [1,2] are becoming more mature, and many improved algorithms build on them [3,4]. According to the KITTI benchmark report [5], the latest image-based object detection algorithms achieve an average accuracy of about 80%. However, because images lack depth information, image-based detection alone cannot provide sufficient information for autonomous vehicles to perform planning and control. Although the spatial structure of an object can be recovered through matching algorithms such as Mono3D [6], Deep3DBox [7], and 3DOP [8], the computational cost is high, and the recovered depth information contains errors. Therefore, direct 3D data input is necessary. As the cost of lidar decreases, object detection based on point cloud data will be more widely used, and it will be able to provide high-precision, highly robust obstacle information for the safe driving of autonomous vehicles.
Unlike an image, a point cloud is an unordered set of points with a very large data volume: a single frame can contain hundreds of thousands of points, a large fraction of which are ground points. This poses great challenges for object detection.
In the early days, most point cloud object detection methods were based on traditional point cloud processing, and they can be divided into three categories: methods based on mathematical morphology, methods based on image processing, and feature-based methods. Methods based on mathematical morphology perform morphological object detection on the point cloud data; Lindenberger [9] first applied this approach to lidar object detection by using the opening operation, a filtering method based on mathematical morphology, to process point cloud data, and then refining the results with auto-regression. However, the method has limitations, since point cloud data are irregular, discrete spatial points. Methods based on image processing convert the lidar point cloud into range images and then apply image-processing algorithms for object detection. For example, Stiene et al. [10] proposed a CSS (Eigen-Curvature Scale Space) feature extraction method, which extracts the contours and silhouettes of objects in range images and performs detection via supervised learning. The basic idea of feature-based algorithms is to first extract features such as object height, curvature, edges, and shadows, then apply conditional screening, and finally use clustering or recombination to obtain candidate targets. Yao et al. [11] used adaptive algorithms to obtain local object features and built vehicle classifiers for object detection. Ioanou et al. [12] segmented the scene point cloud based on the vector difference into multiple connected spatial point cloud clusters and then extracted objects through clustering. In addition, there are object extraction methods based on unsupervised machine learning, such as k-means and DBSCAN [13]. By selecting appropriate clustering parameters, targets can be obtained in an unsupervised manner, but parameter selection is difficult, under-segmentation and over-segmentation occur easily, and clustering performs poorly on sparse point clouds at long distances.
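The parameter sensitivity described above can be illustrated with a minimal fixed-threshold Euclidean clustering sketch (a simplified illustration, not the cited algorithms; the point coordinates and tolerance values are invented for the example). The same sparse distant object either splits or merges depending on one tolerance value:

```python
import numpy as np

def euclidean_cluster(points, tol):
    """Naive Euclidean clustering: flood-fill points whose mutual
    distance is below a fixed tolerance (illustrative only)."""
    n = len(points)
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        stack = [i]
        labels[i] = cluster
        while stack:
            p = stack.pop()
            d = np.linalg.norm(points - points[p], axis=1)
            for q in np.where((d < tol) & (labels == -1))[0]:
                labels[q] = cluster
                stack.append(q)
        cluster += 1
    return labels

# One dense nearby object and one sparser distant object:
pts = np.array([[0.0, 0.0], [0.2, 0.0], [0.4, 0.1],   # object A, dense
                [5.0, 0.0], [5.9, 0.0], [6.8, 0.1]])  # object B, sparse
print(euclidean_cluster(pts, tol=0.5))   # B breaks into 3 pieces: over-segmentation
print(euclidean_cluster(pts, tol=1.0))   # larger tol keeps B whole
```

With `tol=0.5` the sparse object yields three single-point clusters; raising the tolerance fixes that object but risks merging genuinely distinct nearby objects, which is exactly the trade-off that makes a single global parameter hard to choose.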
Several recent methods have adopted deep learning for point cloud object detection. According to the data-processing form, they can be divided into three categories: object detection based on voxel grids, object detection based on point cloud projection, and object detection based on raw points. Voxel-grid-based methods uniformly divide the point cloud space into small cubes, called voxels, which serve as the index structure units in 3D space. For example, Zhou et al. [14] proposed VoxelNet, an end-to-end point cloud detection network. It uses stacked voxel feature encoding (VFE) layers for feature learning and extracts 3D features with a region proposal network for three-dimensional (3D) detection. However, the computational efficiency of this method is low, and it cannot meet the real-time requirements of autonomous vehicles. Projection-based methods project the point cloud in a certain direction, apply a deep neural network to the projected image, and then inverse-transform the results to obtain the 3D bounding boxes. For example, BirdNet [15] and RT3D [16] generate detection proposals in 3D space from a bird's-eye view, but the results are not good. LMNet [17] takes the front view as input, but due to the loss of detail, it cannot obtain satisfactory results even for simple tasks such as car detection. Although VeloFCN [18] can accurately obtain object detection boxes, the algorithm runs very slowly, and it is difficult for it to meet the real-time requirements of autonomous driving. Methods based on raw points operate directly on the original point cloud data without converting them into other formats. The earliest realization of this idea was PointNet [19], which designed a lightweight network, T-Net, to handle the rotation problem of point clouds and used max pooling to handle their disorder. Building on this, PointNet++ [20] performs local partitioning and local feature extraction on the point cloud, enhancing the generalization ability of the algorithm. Although semantic segmentation information can be obtained, the task these networks complete is point classification, and object detection boxes are not produced.
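To make the voxel-grid idea concrete, the following sketch groups raw points into a sparse grid keyed by integer indices (a common indexing scheme, not VoxelNet's exact pipeline; the function name, voxel size, and coordinates are illustrative assumptions):

```python
import numpy as np
from collections import defaultdict

def voxelize(points, voxel_size):
    """Bucket (x, y, z) points into a sparse voxel grid keyed by
    integer (i, j, k) indices; each bucket holds its member points."""
    grid = defaultdict(list)
    idx = np.floor(points / voxel_size).astype(int)
    for key, p in zip(map(tuple, idx), points):
        grid[key].append(p)
    return grid

pts = np.array([[0.10, 0.20, 0.00],
                [0.15, 0.25, 0.05],   # falls in the same voxel as the first point
                [1.30, 0.20, 0.00]])  # falls in a different voxel
grid = voxelize(pts, voxel_size=0.5)
print(len(grid))                      # 2 occupied voxels
```

Only occupied voxels are stored, which is why such grids can serve as a compact index structure: per-voxel features (e.g. VFE encodings) are then computed from each bucket's points.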
Usually, point cloud object detection methods first preprocess the raw data, for example through ground segmentation and downsampling, and then cluster the remaining non-ground points to detect objects. However, existing algorithms basically operate directly on the raw data, so the sheer volume of a raw point cloud makes it difficult to meet real-time requirements. If the point cloud is downsampled to reduce the number of scan points processed, part of the data is discarded, which harms the integrity of the target point cloud. Moreover, when the object to be detected is far from the lidar, the point cloud on the target surface becomes sparse, and the traditional Euclidean clustering method [13] easily over-segments objects at long distances.
To address these problems, this paper proposes a fast object detection algorithm based on vehicle-mounted lidar. The algorithm includes three modules. Module 1 uses the ground segmentation by discriminant image (GSDI) method to perform ground segmentation: it first converts the original point cloud into a discriminant image and then traverses the image with breadth-first search (BFS) to judge whether each point is a ground point, avoiding direct computation on the point cloud data. Module 2 first uses a mature image detector to obtain the objects' 2D detection boxes and then projects these boxes into the 3D point cloud. The regions of interest in 3D space are thus obtained, which improves search efficiency and preserves the integrity of the objects. Since differences in point distance cause differences in point cloud density, Module 3 adopts a DDTC method to detect objects: it uses an adjustable parameter to determine the distance threshold when clustering points at different ranges. Compared with traditional Euclidean clustering, it effectively alleviates the over-segmentation of long-distance objects.
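The idea behind Module 3 can be sketched as Euclidean clustering with a range-dependent tolerance. The linear rule `tol = base + alpha * range` below is an illustrative assumption (the paper's exact DDTC formula is not given in this section), with `alpha` playing the role of the adjustable parameter:

```python
import numpy as np

def cluster_dynamic(points, alpha, base=0.3):
    """Euclidean clustering whose tolerance grows with the seed
    point's distance from the sensor, so sparse far-away returns
    still merge into one cluster (illustrative sketch)."""
    n = len(points)
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        stack = [i]
        labels[i] = cluster
        while stack:
            p = stack.pop()
            # Range-dependent tolerance: assumed linear growth with range.
            tol = base + alpha * np.linalg.norm(points[p])
            d = np.linalg.norm(points - points[p], axis=1)
            for q in np.where((d < tol) & (labels == -1))[0]:
                labels[q] = cluster
                stack.append(q)
        cluster += 1
    return labels

pts = np.array([[0.0, 0.5], [0.2, 0.5],        # near object, dense returns
                [20.0, 0.0], [21.0, 0.0]])     # far object, 1 m point gap
print(cluster_dynamic(pts, alpha=0.05))        # far pair stays one cluster
```

Near the sensor the effective tolerance stays tight (about 0.33 m here), while at 20 m it relaxes to about 1.3 m, so the 1 m gap on the distant object no longer splits it; a fixed 0.33 m threshold would produce two clusters there.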
Figure 1 shows the framework of the entire algorithm. Experiments verify that the proposed algorithm achieves good detection performance in different scenarios and meets the real-time requirements of autonomous driving while maintaining high accuracy.
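Module 2's projection of 2D detection boxes into the point cloud can also be sketched briefly. The snippet keeps only the lidar points whose image projection falls inside a 2D box, assuming the points are already in the camera frame and using a toy pinhole intrinsic matrix `K` (all numbers are invented for the example):

```python
import numpy as np

def points_in_box(points, K, box):
    """Keep points whose pinhole projection lands inside a 2D
    detection box (u0, v0, u1, v1); illustrative sketch only."""
    u0, v0, u1, v1 = box
    z = points[:, 2]
    front = z > 0                      # only points in front of the camera
    uv = (K @ points.T).T              # homogeneous projection: [z*u, z*v, z]
    u = uv[:, 0] / z
    v = uv[:, 1] / z
    inside = (u >= u0) & (u <= u1) & (v >= v0) & (v <= v1) & front
    return points[inside]

# Toy intrinsics: focal length 100 px, principal point (64, 64).
K = np.array([[100.0, 0.0, 64.0],
              [0.0, 100.0, 64.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 10.0],     # projects to (64, 64): inside the box
                [5.0, 0.0, 10.0]])    # projects to (114, 64): outside
print(points_in_box(pts, K, box=(0, 0, 100, 100)))
```

The surviving points form a frustum-shaped region of interest in 3D, which is what lets the later clustering stage search a much smaller set than the full cloud.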
5. Conclusions
Autonomous driving technology can fundamentally solve the traffic problems caused by human factors, and the lidar-based environment perception algorithm is a key component of the autonomous driving algorithm system, which is of great significance to the development of the field. In this paper, a fast object detection algorithm based on vehicle-mounted lidar was proposed. The algorithm introduces the GSDI method, which converts point cloud data into discriminant images for threshold judgment, so that ground points are filtered out efficiently. Then, an image detector is used to generate the region of interest of the 3D object, effectively narrowing the search range of the target. Finally, in view of the fact that differences in point distance cause differences in point cloud density on the target surface, a DDTC method, which applies a dynamic distance threshold to the Euclidean clustering algorithm, is designed, effectively improving the detection accuracy of long-distance objects. Comparison with mainstream 3D object detection algorithms also showed that the algorithm maintains high accuracy while meeting the real-time requirements of unmanned driving. Although the algorithm performs well in most scenes, its detection of obstacles under strong occlusion is poor. This can be improved by combining bird's-eye-view detection results or by inferring the position of the occluded object in the current frame from its past states, which will be addressed in future research. It is worth mentioning that, although only the front-view camera was used in this study, if the vehicle is equipped with a panoramic camera or multiple cameras covering a 360-degree view, the full 360-degree lidar point cloud can also be used.
In practical applications, if only one camera with a limited FOV is used, suppressing the point cloud outside that view at the beginning of the pipeline will be very effective and will further reduce time consumption.