
An Efficient LiDAR Point Cloud Map Coding Scheme Based on Segmentation and Frame-Inserting Network

1 College of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China
2 State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China
3 School of Automation Science and Electrical Engineering, Beihang University, Beijing 100083, China
4 School of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
* Author to whom correspondence should be addressed.
Sensors 2022, 22(14), 5108; https://doi.org/10.3390/s22145108
Submission received: 6 June 2022 / Revised: 29 June 2022 / Accepted: 6 July 2022 / Published: 7 July 2022

Abstract: In this article, we present an efficient coding scheme for LiDAR point cloud maps. As a point cloud map consists of numerous single scans spliced together, by recording the time stamp and quaternion matrix of each scan during map building, we cast the point cloud map compression into the point cloud sequence compression problem. The coding architecture includes two techniques: intra-coding and inter-coding. For intra-frames, a segmentation-based intra-prediction technique is developed. For inter-frames, an interpolation-based inter-frame coding network is explored to remove temporal redundancy by generating virtual point clouds based on the decoded frames. We only need to code the difference between the original LiDAR data and the intra/inter-predicted point cloud data. The point cloud map can be reconstructed according to the decoded point cloud sequence and quaternion matrices. Experiments on the KITTI dataset show that the proposed coding scheme can largely eliminate the temporal and spatial redundancies. The point cloud map can be encoded to 1/24 of its original size with 2 mm-level precision. Our algorithm also obtains better coding performance compared with the octree and Google Draco algorithms.

1. Introduction

LiDAR point clouds have been widely used in many emerging applications [1], such as the preservation of historical relics, mobile robotics, and remote sensing [2,3,4,5,6,7,8,9]. LiDAR sensors are essential for autonomous vehicles, and dense LiDAR point cloud maps play an indispensable role in unmanned driving tasks such as obstacle detection [10], localization [11], and navigation [12]. Over wide geographic areas, a LiDAR point cloud map consists of a vast number of points and requires large bandwidth and storage space to transmit and store [4,13,14,15]. Therefore, developing compression algorithms for dense LiDAR point cloud maps is an urgent task.
Because a LiDAR point cloud map covers a large area, is unstructured, and has a huge volume [16], it is difficult to remove redundancy without distortion when encoding it. The octree, as a data structure, has been widely used to encode point clouds: each internal node has exactly eight children, and octree-based compression recursively subdivides the 3D space occupied by the point cloud into eight octants. As octree-based compression methods are lossy, they are suboptimal for autonomous vehicles, which have strict accuracy requirements for tasks such as path planning and obstacle detection.
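For readers unfamiliar with the data structure, the following minimal Python sketch illustrates the recursive eight-way subdivision that octree-based coders rely on; the function name, the dictionary representation, and the occupancy-based stopping rule are illustrative and are not taken from any particular codec.

```python
import numpy as np

def build_octree(points, center, half_size, depth, max_depth):
    """Recursively subdivide the bounding cube into eight octants.

    points:    (N, 3) array of points falling inside the current cube
    center:    centre of the current cube
    half_size: half of the current cube's edge length
    Empty octants are never stored, which is where the compression gain of
    octree coding comes from; the recursion depth fixes the precision.
    """
    if depth == max_depth or len(points) <= 1:
        return {"leaf": True, "points": points}
    # 3-bit octant index of every point (one bit per axis)
    octant = ((points > center) * np.array([1, 2, 4])).sum(axis=1)
    children = {}
    for idx in range(8):
        mask = octant == idx
        if mask.any():                                   # keep occupied octants only
            offset = np.array([idx & 1, (idx >> 1) & 1, (idx >> 2) & 1]) - 0.5
            children[idx] = build_octree(points[mask],
                                         center + offset * half_size,
                                         half_size / 2, depth + 1, max_depth)
    return {"leaf": False, "children": children}

# Toy usage: a 100 m cube subdivided down to ~1.6 m leaves (depth 6).
pts = np.random.rand(1000, 3) * 100.0
tree = build_octree(pts, center=np.array([50.0, 50.0, 50.0]),
                    half_size=50.0, depth=0, max_depth=6)
```

The leaf size at the maximum depth determines the reconstruction accuracy, which is why a fixed-depth octree representation is inherently lossy.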
In this research, we focus on compressing large-scale dense LiDAR point cloud maps. The well-known low-drift and real-time LiDAR odometry and mapping (LOAM) algorithm is used to build the 3D map [17]. The proposed LiDAR point cloud map coding algorithm can be used in mobile robotics or in surveying and mapping. The major contributions are as follows.
  • By recording the time stamp and quaternion matrix of each scan during mapping, the large-scale point cloud map compression can be formulated as a point cloud sequence compression problem;
  • For intra-coding, we develop an intra-prediction method based on segmentation and plane fitting, which removes the spatial redundancy by exploiting the spatial structure of the point cloud;
  • For inter-coding, we develop an interpolation-based inter-prediction network, in which the encoded point clouds at the previous and next time instants are utilized to synthesize the point cloud at the intermediate time, removing the temporal redundancy;
  • Experimental results on the KITTI dataset demonstrate that the proposed method achieves competitive compression performance for dense LiDAR point cloud maps compared with other state-of-the-art methods.

2. Point Cloud Coding: A Brief Review

The compression of 3D point cloud data is an active research topic, and preliminary investigations have recently been made.

2.1. Volumetric/Tree-Based Point Clouds Coding

For unstructured point clouds, the octree representation is commonly utilized for point cloud geometry compression. To improve the immersive visual experience, De Oliveira Rente et al. [18] proposed an efficient geometric coding scheme for point clouds, in which an octree-based compression algorithm was utilized as a basic layer and a graph transform technique was used as an enhancement layer to encode the residual data. Their reported evaluation results show that the method produced significant improvement, especially at low and medium bit rates. Traditional point cloud compression algorithms are limited to encoding the position and attributes of discrete point clouds. In [19], Krivokuća et al. introduced an alternative technique based on volumetric functions. Like regression analysis, the volumetric function is a continuous function that interpolates the values on a finite set of points as a linear combination of continuous basis functions. A B-spline wavelet basis is utilized to encode the volumetric function, which represents the geometry and attributes of the point cloud. Compared to the latest MPEG point cloud coding standard [20], their algorithm achieves better coding performance in geometry and attributes. To alleviate the computational complexity of 3D point cloud models in registration, data abstraction, and visualization, Elseberg et al. [21] proposed an effective point cloud storage and compression scheme based on the octree. The coding scheme can be used in file format conversion and 3D model registration.

2.2. Image/Video-Based Point Clouds Coding

For structured point clouds, some studies have focused on employing image/video codecs to compress point cloud data by mapping it into 2D data. Following this approach, Tu et al. [22] transformed LiDAR point clouds into a range image sequence and used a simultaneous localization and mapping (SLAM) algorithm to perform inter-prediction. The intra-frame and inter-prediction residual data are encoded by an MPEG-like compression method. Unlike the aforementioned methods, Wang et al. [23] proposed a method to compress RGB-D point clouds. The properties of RGB-D point clouds and LiDAR point clouds are similar, except for the measurement range (i.e., LiDAR is around 100 m, while an RGB-D camera covers only a few meters). They developed a warping-based depth data coding method, in which a point cloud registration algorithm was utilized to remove redundancy. Experimental results showed that their algorithm achieved a higher compression ratio with less distortion compared to recent methods. Tu et al. [24] used conventional image and video coding schemes to compress the 2D arrays obtained by converting the LiDAR data to a range image. Feng et al. [25] proposed a real-time spatio-temporal LiDAR point cloud compression scheme, in which key frames are identified and encoded by iterative plane fitting, and the temporal streams are then encoded by referencing the spatially encoded data. In [26], Tu et al. first chose certain frames as keyframes (I-frames) and obtained the optical flow between the two nearest keyframes. Then, according to the two keyframes and the optical flow, a U-net network was utilized to generate the remaining LiDAR frames (P-frames) between the two keyframes. They removed the temporal redundancy by interpolating point cloud data between two non-adjacent frames. In [27], Tu et al. proposed an RNN-based network to encode LiDAR point clouds. They used a recurrent neural network in which only the input of the first layer was the point cloud data, while the inputs of the other layers were the residual data. Their method removes the spatial redundancy within a frame of point clouds, not the temporal redundancy: each frame is coded independently, without reference to other frames.

2.3. Summary

Currently, there is still no efficient coding solution for the LiDAR point cloud map. Though volumetric/tree-based schemes, such as Google Draco [28] and MPEG TMC 13 [20], can be used to encode the LiDAR point cloud map, their coding performance is far from satisfactory as these methods fail to utilize the spatial structure of point clouds. Converting LiDAR data into a 2D matrix in itself is an efficient method for reducing spatial redundancy. However, most of the existing methods [22,24,29] directly use the image/video-based method to encode LiDAR range image without further exploiting the temporal and spatial redundancies.

3. Overall Codec Architecture

In this paper, we address the problem of efficiently coding a dense LiDAR point cloud map, which can be used in mobile robotics or in surveying and mapping. The low-drift and real-time LOAM algorithm is utilized to construct the LiDAR point cloud map [17]. During the mapping process, the time stamp and the quaternion matrix of each LiDAR scan, relative to the global coordinate origin, are recorded. As the LiDAR point cloud map is constructed from these frames, the dense point cloud map compression can be cast as a point cloud sequence coding task.
Generally, point cloud sequence compression needs to exploit both temporal and spatial redundancies. The system architecture of our LiDAR point cloud map coding algorithm is illustrated in Figure 1. We divide the frames in the point cloud sequence into intra-frames (I-frames) and inter-frames (B-frames, bi-prediction). An I-frame is compressed independently by removing the spatial redundancies, while a B-frame is compressed by referring to encoded I-frames or B-frames to remove the temporal redundancies. The two I-frames are encoded first, followed by the B-frames between them. B-frames cannot be encoded independently; they rely on the two encoded I-frames [30].
The intra-frames are coded by a segmentation-based prediction technique, while for inter-frames, we develop an interpolation-based coding network to remove the temporal redundancy.
Decoding is the inverse process of encoding. The decoded residual data is added to the prediction data to recover the point clouds. According to the decoded point clouds, quaternion matrices, and translation matrices, the point cloud map can be reconstructed:
$$\begin{bmatrix} x_B \\ y_B \\ z_B \end{bmatrix} = R_{yaw} \times R_{pitch} \times R_{roll} \times \begin{bmatrix} x_I \\ y_I \\ z_I \end{bmatrix} + \begin{bmatrix} C_x \\ C_y \\ C_z \end{bmatrix},$$

where $(x_I, y_I, z_I)$ are the coordinates of a point in the coordinate system of an intra-frame, and $(x_B, y_B, z_B)$ are the corresponding coordinates when the predicted B-frame is used as the coordinate origin. $(C_x, C_y, C_z)$ denotes the translation matrix, and $R_{yaw}$, $R_{pitch}$, and $R_{roll}$ represent the rotation matrices of the yaw, pitch, and roll angles.
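As an illustration of this reconstruction step, the following Python sketch splices decoded scans back into the global frame using the recorded per-scan pose. The quaternion convention (x, y, z, w, as used by SciPy) and the variable names are assumptions; the paper expresses the pose through yaw/pitch/roll rotation matrices and a translation, which is mathematically equivalent.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def rebuild_map(decoded_frames, quaternions, translations):
    """Splice the decoded scans back into the global map frame.

    decoded_frames: list of (N_i, 3) arrays, one per decoded scan (local frame)
    quaternions:    list of (x, y, z, w) quaternions recorded during mapping
    translations:   list of (3,) translation vectors recorded during mapping
    """
    map_points = []
    for pts, q, t in zip(decoded_frames, quaternions, translations):
        R = Rotation.from_quat(q).as_matrix()   # quaternion -> 3x3 rotation
        map_points.append(pts @ R.T + t)        # rotate each point, then translate
    return np.vstack(map_points)
```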

4. Intra-Frame Point Cloud Coding Based on Semantic Segmentation

4.1. Overview of Intra-Coding Network

The pipeline of the proposed segmentation-based intra-frame point cloud coding method is illustrated in Figure 2. Firstly, the point cloud is converted from 3D to 2D to obtain a 2D matrix of point clouds. Then, the RangeNet++ [31] network is utilized to perform point cloud segmentation. The residual data, contour map, and quadric surface parameters are encoded with lossless coding schemes and packaged as the intra-bitstream.

4.2. LiDAR Point Cloud Segmentation

The LiDAR data from the KITTI dataset, captured with a 64-channel sensor covering a 26.9° vertical field of view and a 360° horizontal field of view [32], is utilized to verify our method. Considering that LiDAR sensors acquire data in an orderly way, the point clouds can be converted from $\mathbb{R}^3$ to $\mathbb{R}^2$. Let $\{I_i = (x_i, y_i, z_i)\}_{i=1}^{N}$ be the coordinates of a frame of point cloud captured by the Velodyne sensor. Since the point cloud is ordered, it can be transformed into a 2D matrix $X(u, v)$, $u = 1, \dots, M$, $v = 1, \dots, N$:
$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\left(1 - \arctan(y, x)\,\pi^{-1}\right) \cdot w \\ \left(1 - \left(\arcsin(z\,r^{-1}) + f_{up}\right) f^{-1}\right) \cdot h \end{pmatrix},$$

where $(x, y, z)$ represent the coordinates of point $P$, $(u, v)$ are the 2D matrix coordinates, $(h, w)$ are the height and width of the desired 2D matrix representation, $r = \sqrt{x^2 + y^2 + z^2}$ represents the distance of point $P = (x, y, z)$ from the origin, and $f = f_{up} + f_{down}$ denotes the vertical field of view of the LiDAR sensor [33].
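A minimal NumPy sketch of this 3D-to-2D projection is given below. The default image size and field-of-view limits are assumptions that only roughly match a 64-beam sensor, and the row index is computed by normalizing the elevation angle over the vertical field of view, as is common in range-image implementations.

```python
import numpy as np

def to_range_image(points, h=64, w=2048,
                   fov_up=np.deg2rad(2.0), fov_down=np.deg2rad(24.9)):
    """Spherically project an (N, 3) LiDAR scan onto an h x w range image.

    fov_up / fov_down are the magnitudes of the upward / downward vertical
    field-of-view limits; the defaults are illustrative, not the paper's values.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)
    pitch = np.arcsin(z / r)
    fov = fov_up + fov_down

    u = (0.5 * (1.0 - yaw / np.pi) * w).astype(int)          # azimuth   -> column
    v = ((1.0 - (pitch + fov_down) / fov) * h).astype(int)   # elevation -> row
    u = np.clip(u, 0, w - 1)
    v = np.clip(v, 0, h - 1)

    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = r            # store the range; later points overwrite collisions
    return image
```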
The RangeNet++ takes the fused 2D information as input and outputs the segmentation results. Three 2D convolutional blocks are adopted as a 2D feature extractor. The output of the final layer is the point cloud segmentation results. The segmentation results will pave the way for the subsequent surface fitting-based intra-prediction technique.

4.3. Segmentation-Based Intra-Prediction Technique

Nearly one-third of the point cloud data are ground points. After segmentation, the ground can be separated from the other objects. A quadric surface fitting technique is then utilized to fit the point clouds. Considering the trade-off between complexity and compression efficiency, we fit the segmented regions with a plane for the ground points and a sphere for the other objects.
A plane is defined by $J = (n, d)$, where $n$ is the normal vector, with $\|n\| = 1$, and $d$ is the vertical distance from the origin to the plane. The distance from a point $p_i$ to the plane is defined as:

$$D_{plane}(J, p_i) = \| n^{T} p_i - d \|.$$

The fitting error $\varepsilon_{plane}$ is then defined as the sum of these distances over the segmented region $S$ (with $S_{i,:}$ denoting its $i$-th point) and is minimized with respect to $J$:

$$\varepsilon_{plane}(J, S) = \sum_{i=1}^{N} \| n^{T} S_{i,:} - d \|.$$
A sphere is represented by $J = (c, r)$, where $r \in \mathbb{R}$ denotes the radius, and $c \in \mathbb{R}^3$ represents the center. The distance from a point $p_i$ to the sphere is defined as:

$$D_{sphere}(J, p_i) = \big\| \|c - p_i\| - r \big\|.$$

The fitting error $\varepsilon_{sphere}$ is likewise defined as the sum of these distances and is minimized with respect to $J$:

$$\varepsilon_{sphere}(J, S) = \sum_{i=1}^{N} \big\| \|c - p_i\| - r \big\|.$$
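To make the ground-plane fitting concrete, the sketch below fits a plane to the segmented ground points and evaluates the point-to-plane residuals. It uses the closed-form least-squares (SVD) solution, i.e. it minimizes the sum of squared distances rather than the absolute-distance sum above, so it should be read as an approximation of the paper's fitting step.

```python
import numpy as np

def fit_plane(ground_points):
    """Fit a plane J = (n, d) to the segmented ground points (least squares)."""
    centroid = ground_points.mean(axis=0)
    # The normal is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(ground_points - centroid)
    n = vt[-1]
    d = float(n @ centroid)      # plane offset so that n.p = d for points on the plane
    return n, d

def plane_residuals(points, n, d):
    """Signed point-to-plane distances used as the intra-prediction residual."""
    return points @ n - d
```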

4.4. Residual Data Coding

According to the parameters of the LiDAR sensor, we use the fitted plane to calculate the coordinates of the virtual points. The residual data $R_{intra}(x, y)$ is the difference between the original range image $X_{intra}(x, y)$ and the predicted range image $P_{intra}(x, y)$. As the pixel values of the residual data are nearly zero, the entropy of $R_{intra}(x, y)$ is smaller than that of $X_{intra}(x, y)$. Thus, the residual data $R_{intra}(x, y)$ can be encoded with fewer bits:
$$R_{intra}(x, y) = X_{intra}(x, y) - P_{intra}(x, y).$$
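A possible realization of this residual coding step is sketched below, with zlib's Deflate (one of the lossless codecs compared in Section 6) standing in for PPMd. The 2 mm quantization step mirrors the finest quantization accuracy reported in the experiments; the function names are illustrative.

```python
import zlib
import numpy as np

QUANT_STEP = 0.002   # 2 mm, matching the finest quantization accuracy in Section 6

def encode_intra(original_range, predicted_range, quant_step=QUANT_STEP):
    """Quantize the intra residual R_intra and pack it with Deflate (zlib)."""
    residual = original_range - predicted_range
    quantized = np.round(residual / quant_step).astype(np.int16)
    return zlib.compress(quantized.tobytes(), 9)

def decode_intra(bitstream, predicted_range, quant_step=QUANT_STEP):
    """Inverse step: decompress the residual and add the prediction back."""
    quantized = np.frombuffer(zlib.decompress(bitstream), dtype=np.int16)
    quantized = quantized.reshape(predicted_range.shape)
    return predicted_range + quantized.astype(np.float32) * quant_step
```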

5. Inter-Frame Point Cloud Coding Based on Inserting Network

5.1. Overall Inter-Prediction Network

To remove temporal redundancy in the LiDAR point clouds [34], an inter-frame point cloud inserting network is designed, as illustrated in Figure 3. The interpolation module takes the encoded point clouds $X = \{X_{t_0}, X_{t_0+2k}\}$ and generates the internal point cloud frame $P_{t_0+k}$. We calculate the difference between the predicted result $P_{t_0+k}$ and the real point cloud $X_{t_0+k}$ as the residual data $R_{inter}(x, y)$, which is encoded as the inter-bitstream.

5.2. Point Cloud Interpolation Module

Figure 3 illustrates the diagram of the LiDAR point cloud interpolation module. The encoder and decoder parts predict the 3D voxel flow, which is used to generate the required intermediate frame. The network generates the predicted frame $P_{t_0+k}$ from the input point clouds $\check{X}_{t_0}$ and $\hat{X}_{t_0+2k}$, which are already encoded frames. The predicted frame can be the middle frame, obtained by interpolating, or the next frame, obtained by extrapolating the input point clouds; we focus on interpolating the intermediate frame from the two decoded frames. The network is represented by $H(X_{rec}, \Theta)$, where the output $F$ is the 3D voxel flow of the input $X_{rec}$, and $\Theta$ denotes the network parameters:
$$F = (\Delta x, \Delta y, \omega) = H(X_{rec}; \Theta),$$

where $F$ is the optical flow between the two adjacent frames. The opposite direction of the optical flow is used to identify the corresponding position in the previous frame. The coordinates of the corresponding positions in the preceding and following frames are defined as $L_{former} = (x - \Delta x, y - \Delta y)$ and $L_{later} = (x + \Delta x, y + \Delta y)$. We use tri-linear interpolation over the eight corner points of the voxel to calculate the output value $P(x, y)$: four points, $\check{X}_{00}(\check{x}_{ceil}, \check{y}_{ceil})$, $\check{X}_{01}(\check{x}_{ceil}, \check{y}_{floor})$, $\check{X}_{10}(\check{x}_{floor}, \check{y}_{ceil})$, and $\check{X}_{11}(\check{x}_{floor}, \check{y}_{floor})$, from the former frame, and four points, $\hat{X}_{00}(\hat{x}_{ceil}, \hat{y}_{ceil})$, $\hat{X}_{01}(\hat{x}_{ceil}, \hat{y}_{floor})$, $\hat{X}_{10}(\hat{x}_{floor}, \hat{y}_{ceil})$, and $\hat{X}_{11}(\hat{x}_{floor}, \hat{y}_{floor})$, from the later frame.
The time component $\omega$ of the voxel flow $F$ can be considered as a linear blending weight between the two adjacent frames. We employ the voxel flow to sample the two input frames and use the volume sampling function $T_{x, y, \omega}$ to generate the final predicted frame $P$:

$$P_{inter}(x, y) = T_{x, y, \omega}\big(X_{rec}, H(X_{rec}; \Theta)\big) = \omega \cdot P_{former}(x, y) + (1 - \omega) \cdot P_{later}(x, y),$$

where $P_{former}(x, y)$ and $P_{later}(x, y)$ are computed by

$$P_{former}(x, y) = \begin{bmatrix} 1-x & x \end{bmatrix} \begin{bmatrix} \check{X}_{00} & \check{X}_{01} \\ \check{X}_{10} & \check{X}_{11} \end{bmatrix} \begin{bmatrix} 1-y \\ y \end{bmatrix}, \qquad P_{later}(x, y) = \begin{bmatrix} 1-x & x \end{bmatrix} \begin{bmatrix} \hat{X}_{00} & \hat{X}_{01} \\ \hat{X}_{10} & \hat{X}_{11} \end{bmatrix} \begin{bmatrix} 1-y \\ y \end{bmatrix}.$$
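The sketch below implements this warping-and-blending step in NumPy for two decoded range images, assuming the voxel flow (Δx, Δy, ω) has already been predicted by the network; the bilinear corner weighting follows the blending equations above.

```python
import numpy as np

def bilinear_sample(img, xs, ys):
    """Sample img at float coordinates (xs = rows, ys = cols) by blending the
    four surrounding pixels, i.e. the 2x2 corner weighting used above."""
    h, w = img.shape
    xs = np.clip(xs, 0, h - 1.0)
    ys = np.clip(ys, 0, w - 1.0)
    x0 = np.minimum(np.floor(xs).astype(int), h - 2)
    y0 = np.minimum(np.floor(ys).astype(int), w - 2)
    dx, dy = xs - x0, ys - y0
    top = (1 - dy) * img[x0, y0] + dy * img[x0, y0 + 1]
    bottom = (1 - dy) * img[x0 + 1, y0] + dy * img[x0 + 1, y0 + 1]
    return (1 - dx) * top + dx * bottom

def interpolate_frame(former, later, flow_x, flow_y, omega):
    """Synthesize the intermediate range image P_inter from two decoded range
    images and the predicted voxel flow (flow_x, flow_y, omega)."""
    h, w = former.shape
    xs, ys = np.meshgrid(np.arange(h, dtype=float),
                         np.arange(w, dtype=float), indexing="ij")
    p_former = bilinear_sample(former, xs - flow_x, ys - flow_y)
    p_later = bilinear_sample(later, xs + flow_x, ys + flow_y)
    return omega * p_former + (1 - omega) * p_later
```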
The interpolation network adopts a fully convolutional structure with four convolutional layers and four deconvolutional layers. To better preserve the spatial features of the low-dimensional layers, skip connections are added between the corresponding convolutional and deconvolutional layers.
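A minimal PyTorch sketch of such an encoder-decoder is given below. The channel widths, kernel sizes, and the sigmoid on the blending weight are assumptions, since these hyper-parameters are not reported here; only the four-conv/four-deconv layout with skip connections follows the description above.

```python
import torch
import torch.nn as nn

class InterpolationNet(nn.Module):
    """Fully convolutional sketch: four strided conv layers, four deconv layers,
    skip connections between corresponding layers. Input: the two decoded range
    images stacked on the channel axis. Output: voxel flow (dx, dy, omega)."""

    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(2, 32, 3, 2, 1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())
        self.enc3 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())
        self.enc4 = nn.Sequential(nn.Conv2d(128, 256, 3, 2, 1), nn.ReLU())
        self.dec4 = nn.Sequential(nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU())
        self.dec3 = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU())
        self.dec1 = nn.ConvTranspose2d(32, 3, 4, 2, 1)   # -> (dx, dy, omega)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        e4 = self.enc4(e3)
        d4 = self.dec4(e4) + e3          # skip connections preserve fine detail
        d3 = self.dec3(d4) + e2
        d2 = self.dec2(d3) + e1
        out = self.dec1(d2)
        flow = out[:, :2]                         # per-pixel (dx, dy)
        omega = torch.sigmoid(out[:, 2:3])        # blending weight in [0, 1]
        return flow, omega
```

With a pair of 64 x 2048 range images as input, x has shape (B, 2, 64, 2048) and the predicted flow and blending weight share that spatial resolution.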

5.3. Inter Loss Function Design

The prediction module is represented by $H(X_{(t_0, t_0+2k)}, \Theta)$, where the output $P_{t_0+k}$ is the predicted point cloud at time $t_0 + k$ and $\Theta$ denotes the network parameters:

$$P_{t_0+k} = H(X_{(t_0, t_0+2k)}, \Theta).$$
The predicted point cloud data are converted to a range image, and the difference between the original range image $X_{t_0+k}(x, y)$ and the predicted range image $P_{t_0+k}(x, y)$ is calculated as the residual data $R_{inter}(x, y)$, which is encoded losslessly. Explicitly, the training loss is defined as follows:

$$R_{inter}(x, y) = X_{t_0+k}(x, y) - P_{t_0+k}(x, y) = X_{t_0+k}(x, y) - H(X_{(t_0, t_0+2k)}, \Theta),$$

$$L_{loss} = \| R_{inter} \|_2.$$

5.4. Visualization Results

For the inter-coding network, if the point cloud interpolation module synthesizes more accurate point clouds, smaller residual data and higher compression performance can be achieved. For a deeper evaluation, point cloud interpolation results for four scenes are presented, as illustrated in Figure 4. The predicted point cloud and the original point cloud are almost identical, which confirms the effectiveness of the inter-frame insertion module.

6. Experimental Results

The proposed point cloud compression scheme is implemented in Python using the Point Cloud Library (PCL) [35] on a PC with a TITAN RTX GPU. The KITTI dataset [36], including city, residential, campus, and road scenes, is used to evaluate our algorithm.

6.1. Evaluation Metric

To evaluate the overall performance, the compression ratio (CR) and the relative distance ($D_d$) are considered. The CR is obtained by calculating the ratio between the point cloud size after compression and the original size:

$$CR = \frac{Compressed\ size}{Original\ size} \times 100\%.$$

$D_d$ represents the distance between the ground truth LiDAR data $P_{input}$ and the reconstructed data $P_{decode}$. $D_d$ is defined as follows:

$$D_d(P_{input}, P_{decode}) = \frac{\bar{D}_d(P_{input}, P_{decode})}{2} + \frac{\bar{D}_d(P_{decode}, P_{input})}{2}, \qquad \bar{D}_d(P_i, P_j) = \frac{1}{|P_i|} \sum_{x_i \in P_i} \min_{x_j \in P_j} d(x_i, x_j).$$
The measure is sensitive to false positives (rebuilding points in unoccupied areas) and false negatives (excluding occupied areas).
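For reference, a straightforward NumPy/SciPy sketch of this metric is given below; the k-d tree is only an implementation choice for the nearest-neighbour search, not part of the definition.

```python
import numpy as np
from scipy.spatial import cKDTree

def relative_distance(p_input, p_decode):
    """Symmetric distortion D_d: the mean nearest-neighbour distance computed
    in both directions and averaged, as in the definition above."""
    d_fwd = cKDTree(p_decode).query(p_input)[0].mean()    # input   -> decoded
    d_bwd = cKDTree(p_input).query(p_decode)[0].mean()    # decoded -> input
    return 0.5 * (d_fwd + d_bwd)
```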

6.2. Coding Performance for a Single Frame

To find the most efficient coding method for the residual data, several lossless coding schemes are used to encode the residual data, including Zstandard, LZ5, Deflate, Lizard, LZ4, and PPMd. To verify the compression efficiency in removing the time and space redundancy, the 2D matrices are also directly encoded with the lossless coding algorithms without any preprocessing. Figure 5 shows the experimental results.
For intra-coding, the smallest CR value is 4.64%, achieved by using the PPMd scheme for the city scene due to its simple structure. The point cloud of the residential scene, however, is complex and the CR is much higher. The smaller the CR, the better the coding performance.
For inter-coding, it can be observed that the PPMd scheme achieves the best compression performance for the campus scene, with a CR of 3.09%, while the worst CR, 6.87%, is obtained for the residential scene with the LZ4 scheme. Even in this case, the CR of the proposed inter-coding network is still smaller than that obtained by directly coding the point cloud data with lossless coding schemes.

6.3. Comparison with Octree and Draco

According to the time stamp and quaternion matrix of each scan, the scans are merged into a LiDAR panoramic map. Taking this point cloud map as a whole, Table 1 reports the CR results of the proposed coding method compared to Octree [35] and Google Draco [28]. The quantization accuracy (QA) of our method is set to 2 mm, 5 mm, and 1 cm, while the distance resolution (DR) of Octree is set to 1 mm³, 5 mm³, and 1 cm³. The quantization bits (QB) of Draco are set to 17, 15, and 14, which correspond to 1 mm, 5 mm, and 1 cm accuracy, respectively. In addition, the compression level (CL) of Draco is set to 10 to achieve the highest compression rate. In our experiments, PPMd is selected to encode the residual data. From Table 1, it can be observed that the proposed algorithm achieves smaller CR values than Octree.

6.4. Rate-Distortion Curves

Several state-of-the-art baselines are chosen for comparison, including Google Draco [28], MPEG TMC13 [20], and Tu's method [26]. The results are evaluated in terms of the relationship between the distance $D_d$ and the bits per point (bpp) for the four scenes, as illustrated in Figure 6. The proposed LiDAR point cloud coding method shows clear advantages in terms of bpp and $D_d$ among these methods [37,38,39]. Google Draco and MPEG TMC13 are not designed specifically for multi-line LiDAR data [40,41]. Compared to these methods, our method can generate the point cloud more accurately, which contributes to largely removing the redundancy and obtaining a better $D_d$-bpp result.

6.5. Computational Complexity

The proposed intra-coding consists of three steps, namely segmentation, intra-prediction, and residual data coding, while the inter-coding method consists of frame inserting and residual data coding. One hundred frames are selected to calculate the average coding time. Experimental results show that the total intra-coding time with the lossless method (PPMd) is 0.21 s and the total inter-coding time is 0.15 s.

7. Discussion

With a user-selectable frame rate of 5–15 Hz, the HDL-64E S2 LiDAR sensor captures over 1.3 million points per second. The drawback of the proposed LiDAR point cloud map coding scheme is that the intra-coding and inter-coding cannot meet this real-time requirement [42,43,44]. However, it can be used for off-line LiDAR point cloud map coding to reduce storage space and transmission bandwidth, for example in mobile robotics or in surveying and mapping. Follow-up research will focus on implementing the coding scheme on an FPGA platform to accelerate the algorithm and achieve real-time performance.

8. Conclusions

Ranging sensors such as LiDAR are considered to be very robust under all light conditions and in foggy weather, and have been widely used in autonomous driving tasks such as navigation, obstacle avoidance, target tracking, and recognition. However, the enormous volume of LiDAR point clouds brings great challenges to data storage and transmission. To address this issue, this paper focuses on LiDAR point cloud map coding. Firstly, an intra-coding technique is designed based on point cloud segmentation and geometric reconstruction, which can effectively remove the spatial redundancy of the LiDAR point cloud. Secondly, we design a point cloud insertion network that removes the temporal redundancy of point clouds by inserting frames at intermediate moments according to the encoded point clouds. Experiments demonstrate that the proposed method obtains higher compression performance compared with several representative point cloud coding methods.

Author Contributions

Conceptualization, Q.W. and L.J.; methodology, X.S.; software, Z.D. and S.Y.; validation, Q.W., L.J. and J.Z.; formal analysis, Q.W., J.Z. and S.Y.; investigation, X.S. and Q.W.; resources, Q.W. and X.S.; data curation, X.S.; writing—original draft preparation, Q.W. and X.S.; writing—review and editing, Q.W. and S.Y.; visualization, Z.D. and L.J.; supervision, S.Y.; project administration, X.S.; Funding acquisition, S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This study is supported by the National Natural Science Foundation of China (Grant No. 62001262) and the National Natural Science Foundation of China (Grant No. 62001263) and the Nature Science Foundation of Shandong Province (Grant No. ZR2020QF008).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, C.; Ji, M.; Wang, J.; Wen, W.; Li, T.; Sun, Y. An improved DBSCAN method for LiDAR data segmentation with automatic Eps estimation. Sensors 2019, 19, 172.
  2. McGlade, J.; Wallace, L.; Reinke, K.; Jones, S. The potential of low-cost 3D imaging technologies for forestry applications: Setting a research agenda for low-cost remote sensing inventory tasks. Forests 2022, 13, 204.
  3. Guimarães, N.; Pádua, L.; Marques, P.; Silva, N.; Peres, E.; Sousa, J.J. Forestry remote sensing from unmanned aerial vehicles: A review focusing on the data, processing and potentialities. Remote Sens. 2020, 12, 1046.
  4. Sadeghifar, T.; Lama, G.; Sihag, P.; Bayram, A.; Kisi, O. Wave height predictions in complex sea flows through soft-computing models: Case study of Persian Gulf. Ocean Eng. 2022, 245, 110467.
  5. Gale, M.G.; Cary, G.J.; Van Dijk, A.I.; Yebra, M. Forest fire fuel through the lens of remote sensing: Review of approaches, challenges and future directions in the remote sensing of biotic determinants of fire behaviour. Remote Sens. Environ. 2021, 255, 112282.
  6. Jalonen, J.; Järvelä, J.; Virtanen, J.P.; Vaaja, M.; Kurkela, M.; Hyyppä, H. Determining characteristic vegetation areas by terrestrial laser scanning for floodplain flow modeling. Water 2015, 7, 420–437.
  7. Lama, G.F.C.; Sadeghifar, T.; Azad, M.T.; Sihag, P.; Kisi, O. On the indirect estimation of wind wave heights over the southern coasts of Caspian Sea: A comparative analysis. Water 2022, 14, 843.
  8. Godone, D.; Allasia, P.; Borrelli, L.; Gullà, G. UAV and structure from motion approach to monitor the maierato landslide evolution. Remote Sens. 2020, 12, 1039.
  9. He, J.; Barton, I. Hyperspectral remote sensing for detecting geotechnical problems at Ray mine. Eng. Geol. 2021, 292, 106261.
  10. Rao, Y.; Zhang, M.; Cheng, Z.; Xue, J.; Pu, J.; Wang, Z. Semantic point cloud segmentation using fast deep neural network and DCRF. Sensors 2021, 21, 2731.
  11. Xue, G.; Wei, J.; Li, R.; Cheng, J. LeGO-LOAM-SC: An improved simultaneous localization and mapping method fusing LeGO-LOAM and Scan Context for underground coalmine. Sensors 2022, 22, 520.
  12. Xiong, L.; Fu, Z.; Zeng, D.; Leng, B. An optimized trajectory planner and motion controller framework for autonomous driving in unstructured environments. Sensors 2021, 21, 4409.
  13. Vanhellemont, Q.; Brewin, R.J.; Bresnahan, P.J.; Cyronak, T. Validation of Landsat 8 high resolution Sea Surface Temperature using surfers. Estuar. Coast. Shelf Sci. 2021, 265, 107650.
  14. Walton, C.C. A review of differential absorption algorithms utilized at NOAA for measuring sea surface temperature with satellite radiometers. Remote Sens. Environ. 2016, 187, 434–446.
  15. Lama, G.F.C.; Rillo Migliorini Giovannini, M.; Errico, A.; Mirzaei, S.; Padulano, R.; Chirico, G.B.; Preti, F. Hydraulic efficiency of green-blue flood control scenarios for vegetated rivers: 1D and 2D unsteady simulations. Water 2021, 13, 2620.
  16. Pomerleau, F.; Liu, M.; Colas, F.; Siegwart, R. Challenging data sets for point cloud registration algorithms. Int. J. Robot. Res. 2012, 31, 1705–1711.
  17. Zhang, J.; Singh, S. Low-drift and real-time lidar odometry and mapping. Auton. Robot. 2017, 41, 401–416.
  18. De Oliveira Rente, P.; Brites, C.; Ascenso, J.; Pereira, F. Graph-based static 3D point clouds geometry coding. IEEE Trans. Multimed. 2018, 21, 284–299.
  19. Krivokuća, M.; Chou, P.A.; Koroteev, M. A volumetric approach to point cloud compression—Part II: Geometry compression. IEEE Trans. Image Process. 2019, 29, 2217–2229.
  20. Guede, C.; Andrivon, P.; Marvie, J.E.; Ricard, J.; Redmann, B.; Chevet, J.C. V-PCC: Performance evaluation of the first MPEG Point Cloud Codec. In Proceedings of the SMPTE 2020 Annual Technical Conference and Exhibition, Virtual, 10–12 November 2020; pp. 1–27.
  21. Elseberg, J.; Borrmann, D.; Nüchter, A. One billion points in the cloud–an octree for efficient processing of 3D laser scans. ISPRS J. Photogramm. Remote Sens. 2013, 76, 76–88.
  22. Tu, C.; Takeuchi, E.; Carballo, A.; Miyajima, C.; Takeda, K. Motion analysis and performance improved method for 3D LiDAR sensor data compression. IEEE Trans. Intell. Transp. Syst. 2019, 22, 243–256.
  23. Wang, X.; Şekercioğlu, Y.A.; Drummond, T.; Natalizio, E.; Fantoni, I.; Frémont, V. Fast depth video compression for mobile RGB-D sensors. IEEE Trans. Circuits Syst. Video Technol. 2015, 26, 673–686.
  24. Tu, C.; Takeuchi, E.; Miyajima, C.; Takeda, K. Compressing continuous point cloud data using image compression methods. In Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016; pp. 1712–1719.
  25. Feng, Y.; Liu, S.; Zhu, Y. Real-time spatio-temporal lidar point cloud compression. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 10766–10773.
  26. Tu, C.; Takeuchi, E.; Carballo, A.; Takeda, K. Real-time streaming point cloud compression for 3D LiDAR sensor using U-net. IEEE Access 2019, 7, 113616–113625.
  27. Tu, C.; Takeuchi, E.; Carballo, A.; Takeda, K. Point cloud compression for 3D LiDAR sensor using recurrent neural network with residual blocks. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 3274–3280.
  28. Google. Draco: 3D Data Compression. 2018. Available online: https://github.com/google/draco (accessed on 6 May 2022).
  29. Houshiar, H.; Nüchter, A. 3D point cloud compression using conventional image compression for efficient data transmission. In Proceedings of the 2015 XXV International Conference on Information, Communication and Automation Technologies (ICAT), Washington, DC, USA, 29–31 October 2015; pp. 1–8.
  30. Liu, Z.; Yeh, R.A.; Tang, X.; Liu, Y.; Agarwala, A. Video frame synthesis using deep voxel flow. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4463–4471.
  31. Milioto, A.; Vizzo, I.; Behley, J.; Stachniss, C. RangeNet++: Fast and accurate LiDAR semantic segmentation. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 4213–4220.
  32. Sun, X.; Wang, S.; Liu, M. A novel coding architecture for multi-line LiDAR point clouds based on clustering and convolutional LSTM network. IEEE Trans. Intell. Transp. Syst. 2020, 23, 2190–2201.
  33. Langer, F.; Milioto, A.; Haag, A.; Behley, J.; Stachniss, C. Domain transfer for semantic segmentation of LiDAR data using deep neural networks. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020.
  34. Sun, X.; Wang, S.; Wang, M.; Cheng, S.S.; Liu, M. An advanced LiDAR point cloud sequence coding scheme for autonomous driving. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 2793–2801.
  35. Rusu, R.B.; Cousins, S. 3D is here: Point Cloud Library (PCL). In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 1–4.
  36. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
  37. Yang, B.; Li, J. A hierarchical approach for refining point cloud quality of a low cost UAV LiDAR system in the urban environment. ISPRS J. Photogramm. Remote Sens. 2022, 183, 403–421.
  38. He, P.; Emami, P.; Ranka, S.; Rangarajan, A. Learning scene dynamics from point cloud sequences. Int. J. Comput. Vis. 2022, 130, 669–695.
  39. Zhou, B.; He, Y.; Huang, W.; Yu, X.; Fang, F.; Li, X. Place recognition and navigation of outdoor mobile robots based on random forest learning with a 3D LiDAR. J. Intell. Robot. Syst. 2022, 104, 267–279.
  40. Shi, Y.; Yang, C. Point cloud inpainting with normal-based feature matching. Multimed. Syst. 2022, 28, 521–527.
  41. Tian, J.; Zhang, T. Secure and effective assured deletion scheme with orderly overwriting for cloud data. J. Supercomput. 2022, 78, 9326–9354.
  42. Luan, S.; Chen, C.; Zhang, B.; Han, J.; Liu, J. Gabor convolutional networks. IEEE Trans. Image Process. 2018, 27, 4357–4366.
  43. Zhang, B.; Perina, A.; Li, Z.; Murino, V.; Liu, J.; Ji, R. Bounding multiple gaussians uncertainty with application to object tracking. Int. J. Comput. Vis. 2016, 118, 364–379.
  44. Zhang, B.; Yang, Y.; Chen, C.; Yang, L.; Han, J.; Shao, L. Action recognition using 3D histograms of texture and a multi-class boosting classifier. IEEE Trans. Image Process. 2017, 26, 4648–4660.
Figure 1. System architecture of our LiDAR point cloud map encoder.
Figure 2. Intra-prediction technique based on segmentation.
Figure 3. Inter-frame point cloud inserting network.
Figure 4. Qualitative results: (a) campus; (b) city; (c) road; (d) residential (best viewed by zooming in).
Figure 5. CR of different lossless coding schemes: (a) campus; (b) city; (c) road; (d) residential.
Figure 6. $D_d$-bpp curves of our method compared to Google Draco [28], MPEG TMC13 [20] and Tu's method [26]: (a) campus, (b) city, (c) road, and (d) residential.
Table 1. Compression ratio (CR, %) results of the proposed inter-inserting method compared with the Octree and Draco methods.

Scene        | Inter-Inserting Method (QA)   | Octree [35] (DR)              | Draco [28] (QB)
             | 2 mm    5 mm    1 cm          | 1 mm³   5 mm³   1 cm³         | 17 bits   15 bits   14 bits
Campus       | 3.09    2.49    2.01          | 21.27   8.05    5.75          | 11.87     7.75      5.47
City         | 3.90    3.38    2.83          | 23.98   10.76   8.40          | 12.52     8.38      6.49
Road         | 3.16    0.26    2.12          | 23.56   10.35   7.99          | 12.31     8.35      6.59
Residential  | 4.29    3.68    3.16          | 22.94   9.72    7.37          | 12.66     8.59      6.33
Average      | 3.61    3.03    2.53          | 20.23   9.72    7.29          | 12.34     8.27      6.22
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Wang, Q.; Jiang, L.; Sun, X.; Zhao, J.; Deng, Z.; Yang, S. An Efficient LiDAR Point Cloud Map Coding Scheme Based on Segmentation and Frame-Inserting Network. Sensors 2022, 22, 5108. https://doi.org/10.3390/s22145108