Texture-Guided Graph Transform Optimization for Point Cloud Attribute Compression

Shao, Yiting; Song, Fei; Gao, Wei; Liu, Shan; Li, Ge

doi:10.3390/app14104094

by Yiting Shao ¹,², Fei Song ¹,², Wei Gao ¹, Shan Liu ³ and Ge Li ¹,*

¹ School of Electronic and Computer Engineering, Peking University, Shenzhen 518055, China
² Peng Cheng Laboratory, Shenzhen 518066, China
³ Media Laboratory, Tencent, Palo Alto, CA 94306-2028, USA
* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(10), 4094; https://doi.org/10.3390/app14104094

Submission received: 25 February 2024 / Revised: 24 April 2024 / Accepted: 7 May 2024 / Published: 11 May 2024

Abstract

There is a pressing need across various applications for efficiently compressing point clouds. While the Moving Picture Experts Group introduced the geometry-based point cloud compression (G-PCC) standard, its attribute compression scheme falls short of eliminating signal frequency-domain redundancy. This paper proposes a texture-guided graph transform optimization scheme for point cloud attribute compression. We formulate the attribute transform coding task as a graph optimization problem, considering both the decorrelation capability of the graph transform and the sparsity of the optimized graph within a tailored joint optimization framework. First, the point cloud is reorganized and segmented into local clusters using a Hilbert-based scheme, enhancing spatial correlation preservation. Second, the inter-cluster attribute prediction and intra-cluster prediction are conducted on local clusters to remove spatial redundancy and extract texture priors. Third, the underlying graph structure in each cluster is constructed in a joint rate–distortion–sparsity optimization process, guided by geometry structure and texture priors to achieve optimal coding performance. Finally, point cloud attributes are efficiently compressed with the optimized graph transform. Experimental results show the proposed scheme outperforms the state of the art with significant BD-BR gains, surpassing G-PCC by 31.02%, 30.71%, and 32.14% in BD-BR gains for Y, U, and V components, respectively. Subjective evaluation of the attribute reconstruction quality further validates the superiority of our scheme.

Keywords: point cloud attribute compression; graph transform; texture-guided graph optimization

1. Introduction

The advancement of three-dimensional (3D) sensing and acquisition technologies has elevated point clouds as a prevalent media format for representing 3D objects and scenes across various multimedia applications [1,2,3,4]. A point cloud consists of a set of points distributed in 3D space, with each point possessing geometry positions and attribute information, such as color, intensity, reflectance, etc. Concurrently, the escalating market demand for high-resolution 3D representation has led to a significant increase in the data volume of point clouds, characterized by abundant geometry information and rich attributes [5,6,7]. Consequently, the imperative to compress point clouds while preserving high fidelity has garnered substantial attention from both industry and academia. The field of point cloud compression has seen fruitful advancements, particularly in geometry compression [8,9,10] and attribute compression [11,12,13]. In this paper, we focus on point cloud attribute compression. Here, the attributes primarily pertain to point cloud color information, chosen as a representative attribute for evaluating compression performance due to its visually informative nature. However, it is important to note that our approach extends beyond color alone. The essence of our method lies in leveraging the spatial correlation of attributes within point clouds. Therefore, our approach is applicable to various attributes present in point clouds.

The objective of point cloud attribute compression is to minimize the required number of bits for representation by eliminating redundancy while ensuring the quality of the decoded point cloud through a rate-distortion (R-D) optimization process. Numerous prediction-based attribute compression methods have been devised for removing attribute redundancy, demonstrating notable performance [14,15,16], particularly the attribute predictive coding scheme within the standard G-PCC platform [17]. However, the sparse and uneven geometry distribution inherent in point clouds presents challenges in reducing attribute correlation effectively in 3D space [18]. In G-PCC, a point cloud is represented through levels of detail (LoD), and attribute intra-coding is executed within this structure using a k-nearest neighbor (k-NN)-based attribute linear interpolation scheme. Nevertheless, the LoD structure in G-PCC fails to fully consider the 3D topological structure of point clouds, resulting in an incomplete capture of the underlying attribute correlation within the geometry [15]. Consequently, there is a demand for improved coding tools to effectively address frequency-domain signal redundancy in point cloud attribute compression.

Graphs have emerged as popular representations of point clouds, providing a flexible framework for capturing the intricate spatial relationships inherent in such data [19,20,21]. Given that point clouds comprise irregularly distributed points in 3D space, graph structures adeptly encapsulate these complex geometries and topologies. Concurrently, graph transform tools have become effective components in point cloud attribute compression [16,18,22]. The work in [22] is a pioneer that leverages graph transform tools to address the challenges of attribute redundancy within point clouds. Based on this, Shao et al. [16] propose a sparsity-optimized graph transform to improve the coding performance. Recently, Song et al. [18] proposed an optimized graph transform scheme based on region-aware geometry analysis, which is the state-of-the-art graph transform scheme for point cloud attribute compression.

Current graph transform-based attribute coding schemes encounter several limitations. Firstly, many of these schemes rely solely on k-NN methods for graph construction, which restricts their ability to capture inherent attribute correlations, consequently limiting transform performance. To address this issue, our approach incorporates graph optimization techniques, improving graph construction through tailored energy functions. Secondly, existing graph-based coding schemes predominantly utilize geometry correlations for graph node identification and edge weight determination, thereby overlooking the potential benefits of texture-guided analysis in enhancing graph construction accuracy. We aim to enhance graph representations by integrating texture-guided analysis into our approach. Lastly, the oversight of sparsity optimization in existing schemes results in high computational complexity. Our solution addresses this challenge by optimizing sparsity within the graph, effectively reducing computational overheads in point cloud attribute compression. Unlike previous works solely relying on geometry information for graph sparsity optimization [16,18], we introduce a novel texture-guided graph transform optimization scheme to overcome the limitations of current graph-based attribute codecs and achieve more efficient point cloud attribute compression.

To address the challenges above, we present a texture-guided graph transform optimization scheme tailored for point cloud attribute compression. Our contributions can be delineated as follows:

We design a novel point cloud graph transform coding framework, which incorporates region-adaptive graph construction and R-D optimized transform to improve coding performance. By formulating attribute transform as a graph optimization problem, our scheme achieves optimal transform coding performance by effectively leveraging attribute correlation, thus outperforming state-of-the-art methods.
We propose a texture-guided graph optimization scheme to fully capture underlying attribute correlation in point clouds. Texture analysis is performed in point cloud local regions to guide graph optimization. By optimizing graph construction with a tailored optimization function, the scheme improves the fidelity of graph representations and consequently enhances transform performance.
We introduce an R-D optimized graph transform scheme with graph sparsity constraints. Our optimization framework considers both the decorrelation capability of the transform and the sparsity of the constructed graph. This not only improves coding performance but also increases the efficiency of graph-based point cloud transform processing.

The subsequent sections of this paper unfold as follows: Section 2 provides a comprehensive literature review on point cloud attribute compression. Section 3 presents the details of the proposed framework. Section 4 offers a detailed exposition of the experimental results and analysis. Finally, concluding remarks are presented in Section 5.

2. Related Work on Point Cloud Attribute Compression

The efficient compression of point cloud attributes necessitates the removal of two main types of signal redundancy: spatial-domain redundancy and frequency-domain redundancy. Current approaches to point cloud attribute compression can be categorized into three groups based on the methods employed for redundancy removal: prediction based, transform based, and hybrid schemes. Prediction-based schemes aim to remove redundancy by predicting attribute values from neighboring points, while transform-based schemes focus on representing attributes in transformed domains to exploit signal decorrelation. Hybrid schemes integrate a combination of attribute prediction and transform coding tools within a comprehensive framework.

In the realm of prediction-based attribute codecs, Schnabel et al. in [23] first propose a prediction-based attribute coding scheme by creating the color octree to enable neighboring-based prediction. Based on this, Huang et al. in [24] propose an octree-based progressive attribute coding scheme to remove attribute redundancy in octree nodes. Instead of using octree for point cloud representation, Chen et al. [25] adopt the Hilbert space-filling curve to reorganize the point cloud and design a prediction scheme based on the reorganized scan order for point cloud attribute compression. Moreover, Yang et al. [15] introduce the partial differential equation to tackle the attribute prediction task and devise a PDE-based attribute coding scheme to perform attribute prediction by optimizing attribute gradients of adjacent point cloud areas. Those prediction-based attribute codecs achieve notable progress in attribute spatial redundancy removal, while attribute frequency-domain redundancy needs to be further reduced.

Regarding transform-based attribute codecs, Zhang et al. in [22] pioneer a transform-based approach by decomposing point clouds into graphs and employing graph Fourier transform (GFT) for attribute compression. Building upon this, Cohen et al. in [26] extend the work with an enhanced graph construction scheme for improved point cloud attribute compression. Then, Shao et al. in [27] propose to integrate block partitioning with k-dimensional trees for graph construction. Further, Shao et al. in [16] provide a Lagrangian-optimized GFT in a block-based attribute coding scheme. Xu et al. in [28] introduce a spatio-temporal GFT for dynamic point cloud attribute compression. In response to the time-consuming nature of GFT-based codecs, Queiroz et al. in [29] devise a more efficient transform-based attribute coding scheme named RAHT, and this work is adopted as a part of the solution in MPEG G-PCC. Moreover, Song et al. in [18] design a block-adaptive graph transform method to improve attribute coding performance. However, the approach solely relies on geometry information for constructing the graph transform basis, which could potentially limit its coding performance. We propose to enhance it through a texture-guided graph optimization scheme, leveraging texture information alongside geometry to improve graph construction and coding performance.

Graph optimization is crucial in the design of graph transform coding tools for point cloud attribute compression. The challenge lies in deriving an appropriate graph kernel that accurately reflects attribute similarities within local regions of the point cloud. Many existing graph optimization approaches utilize the graph Laplacian operator to capture the irregular structure of the point cloud data by representing it as a graph. This operator is essential for graph spectral decomposition to derive the associated GFT basis. Zhang et al. in [30] formulate the attributes in a point cloud as signals following a Gaussian Markov Random Field (GMRF) model and construct graphs in the point cloud. Then, the redundancy among point cloud attribute signals can be optimally decorrelated by the eigenvector-based transform matrix derived from the graph Laplacian. Dong et al. [31] propose an optimization-based graph Laplacian matrix learning scheme that promotes the smoothness of the graph signals on the learned graph. Kalofolias in [32] proposes a new sparsity-based scheme to learn the graph structure underlying a set of smooth signals. Moreover, Hu et al. [33] propose a feature-based graph learning scheme for 3D point clouds by optimizing a feature metric. Inspired by these works, we aim to enhance current graph transform coding schemes by introducing a more effective graph optimization scheme.

A recent representative hybrid point cloud codec is Geometry-based Point Cloud Compression (G-PCC) [17], a standard platform released by the Moving Picture Experts Group (MPEG). G-PCC is an integrated point cloud coding solution including a set of geometry codecs (octree-based, predictive tree-based, and trisoup-based compression schemes) and several attribute coding tools (region-adaptive hierarchical transform, attribute prediction, and lifting transform). The attribute prediction and transform coding tools within G-PCC have demonstrated notable performance in point cloud attribute coding, contributing to its recognition as a state-of-the-art solution [34]. For further technical insights into G-PCC, comprehensive details can be found in [5,6,7]. Given its established performance and comprehensive features, G-PCC serves as the baseline for evaluating and benchmarking advancements in point cloud attribute compression techniques.

3. The Proposed Texture-Guided Graph Transform Optimization Scheme for Point Cloud Attribute Compression

3.1. Problem Formulation

As pointed out in [30], when a signal follows a GMRF model, the eigenvectors of its precision matrix optimally decorrelate the signal within the underlying graph structure. Once spatial prediction on such GMRF signals is complete and the graph structure has been constructed, the eigen-decomposition of the graph Laplacian matrix of the GMRF model serves as the optimal transform for the prediction residual, compacting its energy for quantization and entropy coding. Therefore, to obtain the best prediction and transform coding performance when compressing point cloud attributes, we model the point cloud attributes as a GMRF and seek the optimal graph within that model, so as to achieve the most efficient graph transform coding.

Let $x = (x_{1}, \dots, x_{m})^{T}$ represent the point cloud attribute signals within a GMRF model. The density function of $x$ with mean $\mu$ and a precision matrix $Q$ can be defined as [30]:

$$p(x) = (2\pi)^{-\frac{m}{2}} \, |Q|^{\frac{1}{2}} \exp\left(-\frac{1}{2} (x - \mu)^{T} Q (x - \mu)\right), \tag{1}$$

where the matrix $Q$ is symmetric and positive definite. $Q$ is the inverse of the covariance matrix in a typical multivariate Gaussian distribution. This property ensures that $Q$ effectively captures the interdependence among the elements of the attribute signals $x$.

The precision matrix $Q$ in Equation (1) can be further partitioned according to the current attribute signals $x_{c}$ to be coded and the reference attribute signals $x_{r}$ that have already been coded. This partitioning facilitates a more targeted analysis of the interdependencies within the data. Let $x_{c} = (x_{c_{1}}, \dots, x_{c_{n}})^{T}$ represent the current attribute signals to be coded, and $x_{r} = (x_{r_{1}}, \dots, x_{r_{n}})^{T}$ denote the reference attribute signals that have already been coded. Then, the precision matrix $Q$ can be partitioned as [28]:

$$Q = \begin{pmatrix} Q_{c,c} & Q_{c,r} \\ Q_{c,r}^{T} & Q_{r,r} \end{pmatrix}, \tag{2}$$

where $Q_{c,c} \in \mathbb{R}^{n \times n}$, $Q_{r,r} \in \mathbb{R}^{n \times n}$, and $Q_{c,r} \in \mathbb{R}^{n \times n}$ are the sub-matrices. $Q_{c,c}$ is the desired precision matrix for the optimal transform of the current attributes $x_{c}$. $Q_{r,r}$ can be obtained from the decoded reference attributes $x_{r}$. $Q_{c,r}$ represents the spatial attribute correlation between $x_{c}$ and $x_{r}$. Based on the derivation in [28], the conditional distribution $x_{c} \mid x_{r}$ also adheres to a GMRF with mean $\mu_{x_{c} \mid x_{r}}$ and precision matrix $Q_{x_{c} \mid x_{r}}$, which are defined as [28]:

$$\begin{aligned} \mu_{x_{c} \mid x_{r}} &= \mu_{x_{c}} - Q_{c,c}^{-1} Q_{c,r} (x_{r} - \mu_{x_{r}}), \\ Q_{x_{c} \mid x_{r}} &= Q_{c,c}, \end{aligned} \tag{3}$$

where $\mu_{x_{c}}$ and $\mu_{x_{r}}$ are the means of the current attributes $x_{c}$ and the reference attributes $x_{r}$, respectively.

Combining Equations (2) and (3), it becomes evident that achieving optimal decorrelation between the current attributes $x_{c}$ and the reference attributes $x_{r}$ involves the following steps: First, subtract the conditional mean $\mu_{x_{c} \mid x_{r}}$ as the intra prediction from $x_{c}$. Next, construct a graph to model the spatial relationship $x_{c} \mid x_{r}$ among the attributes and derive the precision matrix $Q_{c,c}$. Subsequently, utilizing the eigenvector matrix of $Q_{c,c}$, the residuals of $x_{c}$ can be decorrelated into the frequency domain, resulting in a more compact representation.
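The three steps above can be illustrated with a small numerical sketch. It assumes a synthetic, randomly generated precision matrix in place of one derived from a real point cloud graph, zero means, and NumPy; the function name `conditional_residual_transform` is hypothetical and not part of the paper.

```python
import numpy as np

def conditional_residual_transform(Q, x_c, x_r, mu_c, mu_r):
    """Predict x_c from x_r via Eq. (3), then decorrelate the residual."""
    n = len(x_c)
    Q_cc, Q_cr = Q[:n, :n], Q[:n, n:]  # blocks of the partition in Eq. (2)
    # Conditional mean of Eq. (3); solve() avoids forming Q_cc^{-1} explicitly.
    mu_cond = mu_c - np.linalg.solve(Q_cc, Q_cr @ (x_r - mu_r))
    residual = x_c - mu_cond
    # Q_cc is symmetric, so eigh returns an orthonormal eigenvector matrix.
    _, Phi = np.linalg.eigh(Q_cc)
    return mu_cond, residual, Phi.T @ residual, Phi

# Synthetic example (assumption): a random symmetric positive definite Q.
rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((2 * n, 2 * n))
Q = B @ B.T + 2 * n * np.eye(2 * n)
mu_c, mu_r = np.zeros(n), np.zeros(n)
x_c, x_r = rng.standard_normal(n), rng.standard_normal(n)

mu_cond, residual, coeffs, Phi = conditional_residual_transform(
    Q, x_c, x_r, mu_c, mu_r
)
```

Since the transform is orthonormal, `Phi @ coeffs` recovers the residual, and adding `mu_cond` back recovers `x_c` when no quantization is applied.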

Therefore, the fundamental task for achieving optimal transform in point cloud attribute compression lies in constructing an optimal graph to obtain the optimal transform matrix. To construct a graph that models the spatial relationship among point cloud attributes, we consider a graph $G = (V, E, W)$ comprising the point set $V$ and the edge set $E$ connecting adjacent points, along with a weighted adjacency matrix $W$. The adjacency matrix $W$ is a symmetric non-negative weight matrix describing the weights assigned to the edges connecting two points. Then, the graph Laplacian matrix $L$ can be obtained as $L = D - W$, where $D$ is a diagonal matrix formulated as $D_{i,i} = \sum_{j=1}^{n} W_{i,j}$. The matrix $D$ represents the degree of connection between each point and other points on the graph.

With the constructed graph $G$ and the associated graph Laplacian matrix $L$, the graph transform matrix $\Phi$ can be obtained via the eigenvalue decomposition of $L$, expressed as $L = \Phi \Lambda \Phi^{-1}$, where $\Lambda$ is a diagonal matrix containing the eigenvalues of $L$. Because $L$ is symmetric, $\Phi$ can be chosen orthonormal, so that $\Phi^{-1} = \Phi^{T}$. Finally, we utilize the obtained optimal graph transform matrix $\Phi$ in point cloud attribute compression. The transformed point cloud attributes $\hat{x}$ are obtained through the following process:

$$\hat{x} = \Phi^{T} x. \tag{4}$$
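As a minimal sketch of this construction, the following NumPy snippet builds $L = D - W$ for a toy graph, takes its eigenvectors as the transform basis, and applies Equation (4). The 4-node path graph with unit weights, the sample signal, and the function name `graph_fourier_basis` are illustrative assumptions, not the optimized graph of the proposed scheme.

```python
import numpy as np

def graph_fourier_basis(W):
    """Return the Laplacian L = D - W with its eigenvalues and eigenvectors."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))          # degree matrix, D_ii = sum_j W_ij
    L = D - W
    eigvals, Phi = np.linalg.eigh(L)    # L is symmetric, so Phi is orthonormal
    return L, eigvals, Phi

# Toy example (assumption): a path graph 0 - 1 - 2 - 3 with unit edge weights.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
L, eigvals, Phi = graph_fourier_basis(W)

x = np.array([10.0, 11.0, 12.0, 13.0])  # a smooth signal on the graph
x_hat = Phi.T @ x                       # forward transform, Eq. (4)
x_rec = Phi @ x_hat                     # inverse transform
```

The smallest eigenvalue of a graph Laplacian is zero and its eigenvector is constant, so the first coefficient of `x_hat` carries the mean of the signal up to sign and scale, while a smooth signal leaves little energy in the high-frequency coefficients.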

3.2. Overview of Our Proposed Framework

The pipeline of the proposed texture-guided graph transform optimization framework for point cloud attribute compression is depicted in Figure 1. Before the attribute compression, the geometry information is already coded and available as side information for point cloud attribute compression. Then, at the beginning of the attribute compression framework, the geometry information is organized based on the Hilbert curve in 3D space to preserve attribute spatial correlation. Subsequently, the reordered point cloud is divided into local clusters with uniform point counts, ensuring balanced graph sizes for subsequent processing. Intra-cluster attribute analysis is then conducted to extract texture complexity priors from local regions, which guide the optimization of the graph. Inter-cluster attribute prediction further reduces spatial redundancy by leveraging attributes from previously decoded clusters. During the graph optimization process, point attribute residuals are treated as graph signals, and the graph topology and adjacency matrix are determined by our optimization-based graph construction scheme. This scheme considers transform decorrelation, graph sparsity, distortion, and bitrate terms to optimize the graph transform for each cluster, integrating both geometry structure and texture priors. Compressed attribute residuals are obtained using the optimized graph transform, resulting in transformed point cloud attribute coefficients. Finally, these coefficients are quantized, entropy coded, and assembled into the total bitstream, thereby completing the compression process.

3.3. Point Cloud Reorganization and Clustering

The requirement for point cloud reorganization stems from the arbitrary arrangement of points in the point cloud, which poses challenges for efficient attribute redundancy removal during compression. Thus, before compressing point cloud attributes, meaningful reorganization is essential. Proper reordering is crucial for preserving attribute spatial correlation, which significantly influences the efficiency and quality of the compression codec. To meet this requirement, we select the Hilbert curve-based clustering method for its advantages in computational efficiency and clustering quality.

In terms of computational complexity, constructing a Hilbert curve involves recursively subdividing the 3D space into smaller cells and ordering them along the curve. Once constructed, assigning points to clusters based on their proximity along the curve can be relatively efficient. In contrast, the computational complexity of the k-means clustering method depends on the number of clusters (k) and the size of the dataset, often requiring multiple iterations until convergence. Regarding the clustering quality, the Hilbert curve’s ability to preserve spatial locality is crucial for maintaining attribute correlation within the point cloud. Studies such as [25] have shown that the Hilbert curve consistently outperforms other space-filling curves, like the Morton code, in preserving attribute correlation across point clouds with varying sparsity levels. While k-means clustering aims to minimize within-cluster variance, its performance can be sensitive to initialization and the choice of k, leading to inconsistent clustering quality. The Hilbert curve-based clustering method offers advantages in preserving spatial locality and potentially reducing computational complexity for point cloud reorganization. These advantages align well with the goals of our proposed point cloud attribute codec, hence our decision to employ the Hilbert order to reorganize the points in the point cloud.

Leveraging the geometry spatial correlation, the Hilbert-based point cloud reorganization aims to preserve attribute spatial correlation within a one-dimensional (1D) sequence of points. The Hilbert order of a point is derived from its 3D position within the point cloud and is represented as an ordering number on the Hilbert space-filling curve. Table 1 illustrates the mapping rules from 3D locations in a space cube to the Hilbert code. Utilizing these mapping rules, all 3D points in the point cloud can be organized into a 1D sequence of points that follows a fitted Hilbert space-filling curve. The Hilbert-based point reorganization process is demonstrated in Figure 2. Figure 2a showcases an example of the fitted Hilbert curve within a $2 \times 2 \times 2$ 3D cube, with the maximum geometry displacement along the curve limited to 1 unit. In Figure 2b, it is evident that, compared to the popular Morton code-based order, the Hilbert curve yields a more compact reordered point sequence with smaller geometry displacements. This compactness is advantageous for preserving geometry-associated attribute spatial correlation: keeping points that are nearby in 3D space close together in the reordered sequence enhances the contribution of attribute prediction and transform coding. By arranging points into a 1D sequence using the Hilbert space-filling curve, we facilitate the creation of more efficient and compact blocks over the sorted points in the point cloud.
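The reordering step can be sketched in Python as follows. The paper derives the Hilbert code from the table-based mapping rules of Table 1, which are not reproduced here; this sketch instead swaps in Skilling's transpose algorithm, a well-known way to compute Hilbert indices, so the orientation of the resulting curve may differ from the one in Figure 2. The names `hilbert_index` and `hilbert_sort` and the `bits` parameter (bits per coordinate of the voxelized geometry) are illustrative assumptions.

```python
def hilbert_index(point, bits):
    """Hilbert index of non-negative integer coordinates, each below 2**bits."""
    x = list(point)
    n = len(x)
    top = 1 << (bits - 1)

    # Undo the per-level rotations and reflections, from the top bit down.
    q = top
    while q > 1:
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p                 # invert the low bits of x[0]
            else:
                t = (x[0] ^ x[i]) & p     # exchange low bits of x[0] and x[i]
                x[0] ^= t
                x[i] ^= t
        q >>= 1

    # Gray encode across the axes.
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = top
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    for i in range(n):
        x[i] ^= t

    # Interleave the bits of the transposed coordinates, x[0] most significant.
    h = 0
    for b in range(bits - 1, -1, -1):
        for i in range(n):
            h = (h << 1) | ((x[i] >> b) & 1)
    return h


def hilbert_sort(points, bits):
    """Return the points reordered along the Hilbert curve."""
    return sorted(points, key=lambda pt: hilbert_index(pt, bits))
```

For example, `hilbert_sort(points, bits=10)` would reorder a point cloud voxelized to 10 bits per axis. On a fully occupied grid, consecutive points in the returned order are exactly one unit apart; on a sparse cloud the displacements are larger but remain local.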

Following Hilbert-based point cloud reorganization, we apply uniform clustering to the reordered 1D point sequence to generate point cloud clusters, which serve as attribute prediction units in subsequent processing. Specifically, suppose there are $m$ re-sorted points in the current point cloud to be assigned to $k$ clusters. The desired number of points $n$ for each cluster is then computed as $n = \mathrm{floor}\left(\frac{m}{k}\right)$, where the floor function $\mathrm{floor}(\cdot)$ returns the largest integer not exceeding $\frac{m}{k}$. Let $p = (p_{1}, \dots, p_{m})$ denote the points re-sorted along the Hilbert space-filling curve of the point cloud. With the uniform clustering method, the points $(p_{1}, \dots, p_{n})$ are assigned to the first cluster, and the points $(p_{(i-1) n + 1}, \dots, p_{i \cdot n})$ are assigned to the $i$th cluster, where $i = 1, \dots, (k - 1)$. The last cluster contains the remaining points $(p_{n (k - 1) + 1}, \dots, p_{m})$.
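The uniform clustering rule can be sketched as a few lines of Python. The sketch uses 0-based indices into the Hilbert-ordered sequence, so the first cluster holds indices `0 .. n-1` and the last cluster absorbs the remainder when `m` is not divisible by `k`; the function name `uniform_clusters` is an illustrative assumption.

```python
def uniform_clusters(num_points, k):
    """Split indices 0..num_points-1 of the reordered sequence into k clusters."""
    if not 1 <= k <= num_points:
        raise ValueError("k must satisfy 1 <= k <= num_points")
    n = num_points // k                       # n = floor(m / k)
    # Clusters 1..k-1 hold n points each; the last one runs to the end.
    bounds = [i * n for i in range(k)] + [num_points]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]


clusters = uniform_clusters(10, 3)
# clusters == [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

With `m = 10` and `k = 3`, each of the first two clusters has `n = 3` points and the last cluster has four.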

An important property of the proposed Hilbert-based point cloud reorganization is that two consecutive resorted points on the curve differ in their coordinate value in only one dimension, while the coordinate values in other dimensions remain unchanged. This ensures spatial coherence among all resorted points. Leveraging this property, the proposed Hilbert-based uniform clustering method not only generates a series of point cloud subsets with a uniform number of points but also provides a compact representation and avoids density jumps observed in other point cloud organizations like the Morton-based methods.

3.4. Attribute Inter-Cluster Prediction and Intra-Cluster Analysis

With the generated point cloud clusters from the previous stage, we compress the attributes in those clusters in sequential order. The optimal predictive transform for the current attributes essentially considers the correlation between points given a reference set, which is a conditional probability distribution problem. As formulated in Equations (2) and (3) of Section 3.1, achieving the optimal decorrelation of the current attributes $x_{c}$ with the reference attributes $x_{r}$ involves a multi-step process: Firstly, the intra prediction involves subtracting the conditional mean $\mu_{x_{c} \mid x_{r}}$ from $x_{c}$. Subsequently, a graph is constructed to model the spatial relationship $x_{c} \mid x_{r}$ among the attributes, facilitating the derivation of the precision matrix $Q_{c,c}$. Leveraging the eigenvector matrix of $Q_{c,c}$, the residuals of $x_{c}$ are decorrelated into the frequency domain, achieving a more compact representation.

Building upon this foundation, we perform inter-cluster attribute prediction to reduce spatial redundancy within the current cluster, using previously decoded clusters as references. In our inter-cluster attribute prediction scheme, the mean of the decoded attributes in the reference point cloud cluster, denoted as $\mu_{x_{r}}$, serves as a reference for the mean of the current attributes $x_{c}$ (as per Equation (3)). The mean of the reference attributes is then subtracted from the attributes in the current cluster, yielding attribute residuals. With inter-cluster spatial redundancy thus reduced, these residuals are ready for further compression via graph transform coding tools.
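A minimal sketch of this prediction step, assuming that attributes are stored as one row per point with three color channels and that the reference is a single previously decoded cluster. The function names and the sample values are hypothetical.

```python
import numpy as np

def inter_cluster_residual(current, reference_decoded):
    """Subtract the per-channel mean of the decoded reference cluster."""
    prediction = np.asarray(reference_decoded, dtype=float).mean(axis=0)
    return np.asarray(current, dtype=float) - prediction

def inter_cluster_reconstruct(residual, reference_decoded):
    """Decoder side: add the same reference mean back to the residual."""
    prediction = np.asarray(reference_decoded, dtype=float).mean(axis=0)
    return np.asarray(residual, dtype=float) + prediction

# Hypothetical color attributes (one row per point) for two small clusters.
reference = np.array([[100.0, 120.0, 130.0], [104.0, 118.0, 128.0]])
current = np.array([[101.0, 119.0, 131.0], [105.0, 121.0, 127.0]])

residual = inter_cluster_residual(current, reference)
restored = inter_cluster_reconstruct(residual, reference)
```

Because the prediction depends only on decoded reference attributes, the decoder can form the same prediction without any additional side information.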

To better guide the design of graph transform coding tools, we propose an intra-cluster attribute analysis scheme aimed at extracting texture priors before the graph optimization process. Graph construction operates under the assumption that signals on the graph exhibit smoothness concerning an underlying sparse representation in the graph spectral domain. Traditional graph optimization algorithms [31,33] often consider all possible edges between points in a point cloud cluster, resulting in high computational costs per iteration. However, fully connected graphs do not always yield optimal compression performance [20]. Oftentimes, appropriate sparsity parameters are necessary to achieve optimal compression performance and processing efficiency. To address this, we propose an intra-cluster attribute analysis to estimate the desired graph sparsity, i.e., the desired average number of connecting neighbors per point before the graph optimization process. The specifics of our intra-cluster attribute analysis are as follows:

Let $k_{d}$ represent the desired graph sparsity to be estimated within the point cloud cluster. To mitigate the complexity associated with fully considering all possible edges for deriving $k_{d}$, we constrain $k_{d}$ within a limited range $(1, \dots, k_{0})$. Subsequently, we employ the k-NN method to construct a graph with edges connecting points to their k nearest neighbors, wherein various k values are tested within the range $(1, \dots, k_{0})$. We evaluate all $k_{0}$ constructed graphs using our texture complexity evaluation. For each graph, we compute the texture complexity variation $\Delta E$ between the attributes of all graph nodes and the average attributes within the graph, employing the classic CIEDE2000 Color-Difference Formula [35]. The total texture complexity variation $\Delta E$ for all $m$ points in the constructed graph is defined as [35]:

$$\Delta E = \sum_{i=1}^{m} \sqrt{\left(\frac{\Delta L_{i}'}{k_{L} S_{L}}\right)^{2} + \left(\frac{\Delta C_{i}'}{k_{C} S_{C}}\right)^{2} + \left(\frac{\Delta H_{i}'}{k_{H} S_{H}}\right)^{2} + R_{T} \frac{\Delta C_{i}'}{k_{C} S_{C}} \frac{\Delta H_{i}'}{k_{H} S_{H}}}, \tag{5}$$

where $\Delta L_{i}'$, $\Delta C_{i}'$, and $\Delta H_{i}'$ are attribute differences between the graph node $i$ and the average attribute of all graph nodes in lightness, chroma, and hue, respectively. $k_{L}$, $k_{C}$, and $k_{H}$ are adjustment factors. $S_{L}$, $S_{C}$, and $S_{H}$ are weighting functions. $R_{T}$ is a rotation function to address hue rotation issues.

Among all constructed graphs with varying sparsity levels, the graph exhibiting the lowest variation in texture complexity is chosen as the initial graph for the subsequent graph optimization process. The corresponding sparsity value of the selected graph is considered the desired graph sparsity $k_{d}$. We define the selection process of the initial desired graph $G_{d} = (V_{d}, E_{d}, W_{d})$ as:

$$G_{d} (V_{d}, E_{d}, W_{d}) = \underset{i}{\arg\min} \, (\Delta E_{1}, \dots, \Delta E_{i}, \dots, \Delta E_{k_{0}}), \tag{6}$$

where $\Delta E_{i}$ represents the texture complexity variation for the $i$th constructed graph. $G_{d}$ denotes the selected initial desired graph. $V_{d}$, $E_{d}$, and $W_{d}$ represent the vertex set, edge set, and weighted adjacency matrix of the selected graph, respectively.

The connectivity between n graph nodes in the selected graph

G_{d}

is encapsulated by the adjacency matrix

W_{d}

, which is transformed into a graph connectivity mask matrix

A \in R^{n \times n}

. We define the graph connectivity mask matrix

A

as:

A_{i j} = \begin{cases} 1, & \text{if } (i, j) \in E_{d}, \\ 0, & \text{otherwise}. \end{cases}

(7)

It is notable that the derived graph sparsity parameter, denoted as

k_{d}

, for each cluster of the point cloud must be encoded into the bitstream. This necessity arises from the utilization of texture information during intra-cluster analysis to determine

k_{d}

. Consequently, during decoding, the process begins with the reorganization of the point cloud and uniform clustering based on the reconstructed geometry. Subsequently, the decoder retrieves and employs the decoded value of

k_{d}

to construct k-NN graphs within each cluster, ensuring consistency between the encoder and decoder regarding the desired graph sparsity

k_{d}

and the resulting graph connectivity mask matrix

A

for every cluster of the point cloud.
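The selection in Equation (6) and the mask in Equation (7) can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function names `knn_mask` and `select_sparsity`, the toy positions, and the texture complexity values are hypothetical, edges are assumed to be undirected, and the texture complexity values are taken as given rather than computed with CIEDE2000.

```python
import math

def knn_mask(points, k):
    """Return the n x n 0/1 connectivity mask A of a k-NN graph (Eq. 7)."""
    n = len(points)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        # Rank the other points by Euclidean distance to point i (ties broken by index).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (math.dist(points[i], points[j]), j))
        for j in others[:k]:
            mask[i][j] = 1
            mask[j][i] = 1  # assumption: the graph is undirected, so A is symmetric
    return mask

def select_sparsity(delta_e):
    """Eq. (6): 1-based index k_d of the smallest texture complexity variation."""
    return min(range(len(delta_e)), key=lambda i: delta_e[i]) + 1

# Toy cluster of four decoded positions (hypothetical data).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
delta_e = [4.2, 3.1, 5.0]        # hypothetical Delta E values for k = 1, 2, 3
k_d = select_sparsity(delta_e)   # encoder side: k_d is written to the bitstream
A = knn_mask(points, k_d)        # both sides: rebuilt from decoded geometry and k_d
```

Because `knn_mask` only needs the decoded positions and the signaled value of k_d, the decoder can rebuild the same mask without access to the attributes.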

3.5. Point Cloud Graph Transform Optimization

Graph optimization learns the latent graph structure underlying the point attributes directly, under the assumption that the attributes vary smoothly on the graph. With the learned graph transform, the residual attributes of the point cloud can be decorrelated into a more compact representation in the frequency domain, which benefits the subsequent quantization and entropy coding. Obtaining the most effective graph transform amounts to optimizing the lossy coding performance, that is, minimizing the rate and distortion introduced by transform coding. This can be posed as the minimization of a cost function J with the Lagrange multiplier

λ

. We formulate the minimization process of the cost function J as follows:

min_{W} J = D (x, W, q) + λ \cdot R (x, W, q),

(8)

where x is the attribute residual to be coded,

W

is the graph adjacency matrix, and q is the uniform quantization step.

D (x, W, q)

denotes the attribute distortion introduced by the uniform quantization on transformed attribute residuals.

R (x, W, q)

is the total bitrate to encode the transformed attribute residuals after quantization.

In our graph optimization approach, we incorporate a graph sparsity constraint to enhance coding performance and reduce computational complexity. Therefore, we propose an optimization-based graph construction scheme employing a joint energy function that integrates three key terms: the distortion term

D (W)

, which quantifies the attribute distortion between the original attribute residuals and the quantized transformed attribute residuals; the bitrate term

R_{b} (W)

, representing the bitrate expenditure for coding the quantized transformed attribute residuals; and the graph sparsity constraint term

R_{s} (W)

, which acts as a regularizer to enforce graph sparsity constraints during the optimization process. We formulate the joint rate–distortion–sparsity graph transform optimization as:

min_{W} J = D (W) + λ_{b} R_{b} (W) + λ_{s} R_{s} (W),

(9)

where

λ_{b}

and

λ_{s}

serve as weights ranging from 0 to 1, determining the tradeoff among the three terms. We provide detailed explanations for each term—the attribute distortion term

D (W)

, the attribute bitrate term

R_{b} (W)

, and the graph sparsity term

R_{s} (W)

—each pertaining to the optimization objective of the graph adjacency matrix

W

.

Attribute distortion term. Since the graph transform is a lossless coding tool, it does not introduce distortion on the attribute residuals. The attribute distortion is instead introduced by the uniform quantization of the transformed attribute coefficients. Therefore, we define the attribute distortion term using the classic equation in [18] as:

D (W) = \frac{q^{2} n}{12},

(10)

where q is the uniform quantization step, and n is the number of attribute coefficients. From the definition, we can see that the distortion depends only on the quantization step and the number of coefficients, and not on the graph adjacency matrix.
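For reference, Equation (10) has the form of the standard high-resolution approximation for uniform quantization. A brief sketch of where the factor 1/12 comes from, assuming that the quantization error of each coefficient is uniformly distributed over one quantization interval and that the graph transform is orthonormal, so that the error energy is the same in the coefficient and attribute domains (the symbol e for the quantization error is introduced here for illustration):

```latex
\mathbb{E}[e^{2}] = \frac{1}{q} \int_{-q/2}^{q/2} e^{2} \, \mathrm{d}e = \frac{q^{2}}{12},
\qquad
D (W) = \sum_{i = 1}^{n} \mathbb{E}[e_{i}^{2}] = \frac{q^{2} n}{12}.
```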

Attribute bitrate term. Inspired by the work in [36], the bitrate expense for transformed attribute coefficients after the quantization can be estimated from the smoothness degree of attribute signals in the graph. Therefore, we define the bitrate term

R_{b} (W)

as:

R_{b} (W) = \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} W_{i, j} {(x_{i} - x_{j})}^{2} / q^{2},

(11)

where

x_{i}

and

x_{j}

are the attribute residuals to be coded. Notably, when the attribute residuals are utilized in the graph optimization process, the optimized graph transform matrix must be transmitted to the decoder. To mitigate the overhead associated with signaling the graph transform matrix, we exploit the underlying correlation between the attribute and the geometry in the local cluster of a point cloud. As proposed in Section 3.3, our Hilbert-based uniform clustering method can generate a set of point cloud local clusters with preserved spatial correlation, which enhances the geometry–attribute correlation assumption. Therefore, the modified bitrate term

R_{b} (W)

in Equation (11) is defined as:

R_{b} (W) = \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} a W_{i, j} {(g_{i} - g_{j})}^{2} / q^{2},

(12)

where

g_{i}

and

g_{j}

denote the decoded geometry positions for the attribute residuals

x_{i}

and

x_{j}

, respectively. The parameter a quantifies the correlation between the point cloud geometry and attribute, determined through statistical analysis of the relationship between the decoded geometry and reconstructed attributes from previously decoded point cloud clusters.
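As an illustration of Equations (11) and (12), the sketch below evaluates the smoothness-based rate estimate on a toy three-node graph. It assumes a symmetric adjacency matrix and reads the squared geometry difference as the squared Euclidean distance between 3D positions; the function names and all toy values are hypothetical. For a scalar signal with q = 1, the estimate coincides with the Laplacian quadratic form of the signal, which the second function checks.

```python
def smoothness_rate(weights, samples, q, a=1.0):
    """(a / (2 q^2)) * sum_ij W_ij * ||s_i - s_j||^2, with samples given as tuples."""
    n = len(weights)
    total = 0.0
    for i in range(n):
        for j in range(n):
            diff = [u - v for u, v in zip(samples[i], samples[j])]
            total += weights[i][j] * sum(d * d for d in diff)
    return a * total / (2.0 * q * q)

def laplacian_quadratic(weights, x):
    """x^T L x with the combinatorial Laplacian L = diag(W 1) - W (scalar signal)."""
    n = len(weights)
    degree = [sum(row) for row in weights]
    return sum(
        x[i] * ((degree[i] if i == j else 0.0) - weights[i][j]) * x[j]
        for i in range(n) for j in range(n)
    )

# Path graph 0 - 1 - 2 with unit weights (hypothetical example).
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

# Eq. (11): attribute residuals as the signal.
x = [1.0, 2.0, 4.0]
rate_x = smoothness_rate(W, [(v,) for v in x], q=1.0)

# Eq. (12): decoded positions replace the residuals, scaled by the correlation factor a.
g = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
rate_g = smoothness_rate(W, g, q=2.0, a=0.5)
```

Since the second call uses only decoded positions, the decoder could evaluate the same quantity, which is the motivation given above for replacing the residuals with geometry.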

Graph sparsity term. Recognizing that graph sparsity impacts both coding performance and computational complexity in attribute compression, we introduce the graph sparsity term

R_{s} (W)

to enforce graph sparsity constraints during optimization. Following [37], the graph sparsity term

R_{s} (W)

is defined as:

R_{s} (W) = - α 1^{⊤} log (W 1) + \frac{β}{2} {∥ W ∥}_{F}^{2},

(13)

where

1 = {[1, \dots, 1]}^{⊤}

.

α

and

β

are two tradeoff parameters.

W 1

describes the connectivity degrees of graph nodes. The first term

- 1^{⊤} log (W 1)

guarantees a meaningful graph in which each node is connected to at least one other node. The second term

\frac{1}{2} {∥ W ∥}_{F}^{2}

is half the squared Frobenius norm of the graph adjacency matrix, which controls the sparsity by penalizing large degrees of node connectivity.
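The behaviour of the two parts of Equation (13) can be seen in a short sketch. The function name `sparsity_term` and the toy graphs are hypothetical; the only assumption beyond the equation is that a node with zero degree is treated as an infinite cost, which is the limit of the logarithmic term.

```python
import math

def sparsity_term(weights, alpha, beta):
    """R_s(W) = -alpha * 1^T log(W 1) + (beta / 2) * ||W||_F^2 (Eq. 13)."""
    degrees = [sum(row) for row in weights]
    if any(d <= 0.0 for d in degrees):
        return math.inf  # log barrier: an isolated node makes the cost unbounded
    log_barrier = -alpha * sum(math.log(d) for d in degrees)
    frobenius = 0.5 * beta * sum(w * w for row in weights for w in row)
    return log_barrier + frobenius

# Path graph 0 - 1 - 2: every node has at least one edge.
connected = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
# Same nodes, but node 2 has lost its only edge.
isolated = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

cost_connected = sparsity_term(connected, alpha=1.0, beta=1.0)
cost_isolated = sparsity_term(isolated, alpha=1.0, beta=1.0)
```

In this toy case the connected graph has a finite cost, while removing the last edge of a node makes the cost infinite, so a minimizer cannot leave any node disconnected.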

Texture-guided Constrained Graph Transform Optimization. To expedite the optimization process while bypassing exhaustive exploration of all conceivable edge connections, we incorporate the graph connectivity mask

A

obtained through intra-cluster attribute analysis as delineated in Section 3.4. This mask serves to impart a texture-guided constraint onto the graph adjacency matrix

W

under optimization. Subsequently, the resultant constrained graph adjacency matrix

\tilde{W}

is formulated as follows:

\tilde{W} = A \circ W,

(14)

where ∘ denotes the Hadamard product. Utilizing the constrained adjacency matrix

\tilde{W}

, the updated attribute bitrate term

R_{b} (\tilde{W})

in Equation (12) and the graph sparsity term

R_{s} (\tilde{W})

in Equation (13) undergo modification as follows:

R_{b} (\tilde{W}) = \frac{a}{2 q^{2}} \sum_{i = 1}^{n} \sum_{j = 1}^{n} {\tilde{W}}_{i, j} {(g_{i} - g_{j})}^{2},

(15)

R_{s} (\tilde{W}) = - α 1^{⊤} log (\tilde{W} 1) + \frac{β}{2} {∥ \tilde{W} ∥}_{F}^{2},

(16)

where the parameters

α

and

β

are consolidated into a singular optimization parameter

θ = \frac{1}{\sqrt{α β}}

as advocated in [37]. The parameter

θ

is related to the desired graph sparsity

k_{d}

. We will provide a detailed ablation analysis about the impact of the texture-guided graph sparsity

k_{d}

and the sparsity-related parameter

θ

on the point cloud attribute coding performance.

Building upon the insights from [37], we introduce a numerical optimization algorithm tailored to tackle the graph transform optimization problem presented in Equation (9). Commencing with the initial graph

G_{d} = (V_{d}, E_{d}, W_{d})

and its corresponding desired graph sparsity

k_{d}

as determined from the preceding texture-guided intra-cluster analysis outlined in Section 3.4, we employ the primal–dual algorithm outlined in [38] to address our proposed graph optimization problem using a splitting optimization strategy. Within this framework, the weight optimization step automatically discerns the edges to be nullified and sets their optimal values. Subsequently, with the optimized graph structure in hand, the graph Laplacian is computed to facilitate the compression of attribute residuals via transform. Consequently, point cloud attribute residuals are efficiently compressed by leveraging the optimized graph transform. Finally, the transformed point cloud attribute coefficients undergo quantization and entropy coding to produce the total bitstream.
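The transform and quantization steps described above can be summarized with the usual graph Fourier transform definitions. This is a sketch under the assumption that the combinatorial graph Laplacian is used; the symbols U, Λ, the transformed coefficients, their quantized indices, and the reconstruction are notation introduced here for illustration.

```latex
\begin{aligned}
L &= \operatorname{diag}(\tilde{W} \mathbf{1}) - \tilde{W} = U \Lambda U^{\top}, \\
\hat{x} &= U^{\top} x, \\
\hat{x}_{q} &= \operatorname{round}(\hat{x} / q), \\
x_{\mathrm{rec}} &= U \, (q \, \hat{x}_{q}).
\end{aligned}
```

Here the columns of U are the eigenvectors of L, and only the quantized indices are passed to the entropy coder, since the decoder can rebuild U from the reconstructed graph.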

In our scheme, we do not compress the entire graph structure but rather encode a graph sparsity parameter into the bitstream. The construction of the graph in the point cloud local clusters utilizes both the decoded geometry information and the graph sparsity parameter extracted from the intra-cluster attribute analysis. Specifically, the decoded geometry information is available in both the attribute encoder and decoder. The extracted graph sparsity parameter is then encoded into the bitstream to facilitate consistent construction of the graph in the decoder. In the decoder, the geometry information of the point cloud is first decoded and then used as side information to guide the partitioning of the point cloud attributes into local clusters. Later, graph sparsity parameters of point cloud clusters are decoded. By leveraging these two types of decoded information, the graph structure in point cloud local clusters can be consistently reconstructed in the decoder, ensuring accurate representation and decoding of the attribute information.

4. Experimental Results

We perform a series of experiments to ascertain the effectiveness and efficiency of the proposed framework for point cloud attribute compression. Details of our experimental setup are provided in Section 4.1. The compression performance comparison between the proposed scheme and competitive platforms is presented in Section 4.2. Reconstruction quality evaluation results are presented in Section 4.3. Ablation studies are conducted in Section 4.4 to validate the performance of our key processing modules.

4.1. Simulation Setup

Our experiments are executed on a computer outfitted with an Intel i7 8700K CPU (3.7 GHz) and 64 GB RAM. We utilize standard point clouds exhibiting diverse geometry and attribute characteristics sourced from the MPEG solid and dense point cloud categories [39] as our test datasets. The visualization of these datasets is depicted in Figure 3, and their characteristics are summarized in Table 2. Our comparative analysis involves benchmarking the proposed framework against several competitive platforms from both industry and academia: (i) G-PCC PLT(v23) [17] (Prediction and Lifting Transform Attribute Codec), (ii) G-PCC RAHT(v23) [17] (Region-Adaptive Hierarchical Transform Attribute Codec), and (iii) BAAC (Block-Adaptive Attribute Codec [18]).

Regarding evaluation metrics, we adhere to the MPEG common test conditions [39] and utilize bits per point (bpp) to represent the total attribute bitrate. The Bjontegaard Delta Bitrate (BD-BR) value [40] is employed to quantify the average bitrate differences between rate-distortion (R-D) curves for equivalent reconstruction quality. Furthermore, we calculate the Peak Signal-to-Noise Ratio (PSNR) to gauge the reconstructed attribute distortion. Specifically, once the PSNR_Y, PSNR_U, and PSNR_V of attribute components Y, U, and V are obtained, the combined PSNR_YUV is computed as

{PSNR}_{YUV} = (6 {PSNR}_{Y} + {PSNR}_{U} + {PSNR}_{V}) / 8

.
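The combined metric is a 6:1:1 weighted average of the per-component values. A minimal sketch of the computation follows; the function names are hypothetical, and the default peak value of 255 assumes 8-bit attributes, which the text above does not state.

```python
import math

def psnr(mse, peak=255.0):
    """PSNR in dB for a mean squared error; peak=255 assumes 8-bit attributes."""
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)

def psnr_yuv(psnr_y, psnr_u, psnr_v):
    """Combined PSNR with the 6:1:1 weighting of the Y, U, and V components."""
    return (6.0 * psnr_y + psnr_u + psnr_v) / 8.0

# Hypothetical per-component values in dB.
combined = psnr_yuv(40.0, 44.0, 46.0)
```

The weighting makes the luma component dominate, so a gain on Y moves the combined value six times as much as the same gain on U or V.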

4.2. Compression Performance Evaluation

Table 3 presents an overview of the compression performance evaluation, comparing the proposed scheme with competitive platforms G-PCC PLT, G-PCC RAHT, and BAAC in point cloud attribute lossy compression. In Table 3, we utilize the BD-BR value to quantify the average bitrate reduction achieved by our proposed scheme compared to comparative methods at four testing bitrate points. A negative BD-BR value indicates a reduction in bitrate, implying improved compression efficiency. Compared to G-PCC PLT, our scheme achieves average BD-BR gains of 15.81%, 30.28%, and 29.57% on coding Y, U, and V components, respectively. Similarly, compared to G-PCC RAHT, we observe average BD-BR gains of 31.02%, 30.71%, and 32.14% on Y, U, and V components, respectively. Notably, while G-PCC PLT and G-PCC RAHT represent attribute codecs in the latest standard point cloud compression platform MPEG G-PCC(v23), our scheme significantly improves attribute coding performance over G-PCC by integrating a well-designed graph transform coding tool.

When compared with the graph transform anchor BAAC, our scheme achieves average BD-BR gains of 16.24%, 19.59%, and 20.44% on coding Y, U, and V components, respectively. In other words, at equivalent reconstruction quality, our scheme saves 16.24%, 19.59%, and 20.44% of the bitrate on the Y, U, and V components in comparison to BAAC. These bitrate savings translate directly into reduced point cloud storage requirements and transmission bandwidth. This performance enhancement over BAAC is achieved by leveraging texture-guided priors in the joint rate–distortion–sparsity optimization process for graph construction. Unlike BAAC, which relies solely on geometry information for graph construction, our scheme also exploits texture information, resulting in superior coding performance. The consistent and substantial attribute BD-BR gains obtained by our scheme across all tests underscore its effectiveness and robustness in compressing point clouds with diverse attribute features. These findings consistently validate the superior coding efficiency of our approach for point cloud attribute compression.

Since the experimental tests are conducted at different bitrate points for point cloud attribute lossy compression, presenting compression ratios alone in Table 3 may not adequately reflect the degree of attribute coding distortion. As shown in Figure 4, we provide the results of the total attribute bitrates and combined PSNR values at each testing bitrate point of the proposed scheme and comparative platforms. In the R-D curves of Figure 4, the x-axis represents the total attribute bitrates, while the y-axis represents the combined PSNR values at each testing bitrate point. In Figure 4, consistent R-D performance gains are evident with the proposed scheme across the majority of datasets at various bitrate points. While BAAC achieves comparable coding performance with a slight BD-BR loss compared to our scheme on MPEG solid point cloud datasets, the performance gap widens significantly when compressing MPEG dense point clouds. This discrepancy stems from the increased geometry precision of dense point clouds, leading to a more intricate distribution of textures in geometry space. BAAC, relying solely on geometry information for graph construction, fails to adequately capture attribute correlations. Consequently, the decorrelation capability of the generated graph transform for point cloud attribute compression is diminished. These results highlight our scheme’s versatility in handling different types of point clouds and its ability to achieve substantial coding performance improvements over state-of-the-art methods at various bitrate points.

Together, Table 3 and Figure 4 evaluate the coding performance from complementary aspects, showing the compression efficiency and the improvements achieved by our proposed scheme. However, it is important to note that the proposed texture-guided graph transform optimization scheme increases computational complexity compared to existing platforms. Unlike previous approaches [16,18], which mainly rely on geometry information for graph sparsity optimization, our scheme incorporates texture analysis of point cloud local regions to extract graph sparsity parameters for guiding graph optimization. This process may increase computational demands, especially when dealing with complex texture distributions in point cloud local regions that contain numerous details. To address this issue, we plan to design a more efficient texture analysis scheme in the future, aiming to reduce the computational demands and improve the overall efficiency of the proposed texture-guided graph transform optimization.

4.3. Reconstruction Quality Evaluation

To further assess the efficacy of the proposed scheme in point cloud attribute reconstruction quality following lossy compression, we conduct subjective quality evaluations on decoded point clouds with distorted attributes generated by our scheme and comparative platforms. Specifically, we select datasets

Longdress_vox10_1300

and

Thaidancer_viewdep_vox12

, characterized by different attribute characteristics and rich textures, for this quality assessment.

In Figure 5, we present a subjective quality comparison of decoded point clouds between our scheme and state-of-the-art methods at comparable attribute bitrates. To facilitate a detailed examination, we zoom in on specific regions of interest marked by red boxes. Each sub-figure in Figure 5 showcases the reconstructed attribute details within the red box, alongside the associated attribute bitrate (bpp) and combined YUV-PSNR (dB). Notably, local details within white circles allow for a clear depiction of quality differences among decoded point clouds. Examining the contents within the yellow circles, we observe that the proposed method better preserves attribute details, particularly in areas characterized by complex textures and rich edges. Moreover, our scheme achieves the highest PSNR values at the lowest bitrate expenses compared to comparative platforms. Consequently, the results demonstrate that the proposed scheme outperforms other platforms in point cloud attribute lossy compression, yielding superior subjective quality.

4.4. Ablation Studies

The primary contribution of this paper lies in the introduction of a novel texture-guided graph transform optimization scheme. In our approach, texture priors derived from the proposed intra-cluster attribute analysis serve as the desired graph sparsity during the graph construction process for point cloud clusters. Subsequently, this desired graph sparsity is utilized in both the graph connectivity mask

A

(as per Equation (14)) and the setting of the optimization parameter

θ = \frac{1}{\sqrt{α β}}

(as per Equation (16)). To objectively assess the effectiveness of our texture-guided graph connectivity mask

A

and the sparsity-related parameter

θ

on point cloud attribute coding performance, we conduct ablation studies to independently analyze each module as follows.

Validation of the Texture-guided Graph Connectivity Mask: As detailed in Section 3.4, we propose an intra-cluster attribute analysis scheme to extract texture priors. These texture priors are then leveraged to construct a texture-guided graph connectivity mask for graph optimization. To validate the impact of the proposed mask, we compare our scheme, which uses the texture-guided graph connectivity mask, against variants that employ a fixed graph sparsity value k. Specifically, we conduct three tests with k set to 5, 10, and 15. The results in Table 4 show significant coding performance losses for every fixed graph sparsity value tested. For example, the experiment with k set to 5 yields average BD-BR losses of 6.50%, 8.25%, and 8.20% on the Y, U, and V components, respectively. This validation underscores the performance improvement of the proposed texture-guided scheme for point cloud attribute compression. By leveraging the texture-guided graph connectivity mask, we achieve a more precise representation of attributes within the underlying graph, surpassing conventional graph construction methods reliant solely on fixed graph sparsity values.

Validation of the Sparsity-related Optimization Parameter θ: As illustrated in Section 3.5, we adjust the optimization parameter

θ

in Equation (16) relative to the desired sparsity obtained from our attribute analysis process. The optimization parameter

θ

is then used to adjust the weight of the graph sparsity term

R_{s} (W)

within the rate–distortion–sparsity energy function in Equation (9). The graph sparsity term plays a pivotal role in the graph optimization process, influencing the decorrelation capability of the constructed graph transform. To validate the impact of our sparsity-related optimization parameter, we conduct comparison experiments between our scheme with the sparsity-related optimization parameter and the scheme employing fixed optimization parameter values

θ

. Specifically, three tests with

θ

set to 0.01, 0.1, and 1 are conducted for this validation. The results presented in Table 5 demonstrate the effectiveness of our sparsity-related optimization parameter compared to all tests with fixed optimization parameter values. For example, when

θ

is set to 1, significant coding performance reduction is introduced, with an average BD-BR loss of 42.42%, 65.47%, and 64.53% on the Y, U, and V components, particularly evident in MPEG dense point clouds. These findings affirm the advantages of our proposed sparsity-related optimization parameter in the graph construction process and point cloud attribute compression.

5. Conclusions

In this paper, we propose a texture-guided graph transform optimization scheme for point cloud attribute compression. We formulate the attribute transform coding task as a joint graph optimization problem, considering both the decorrelation capability of the graph transform and the sparsity of the constructed graph. Additionally, we integrate a Hilbert-based point cloud reorganization and uniform clustering approach, facilitating our inter-cluster attribute prediction and intra-cluster prediction on segmented local clusters to eliminate spatial redundancy and extract texture priors. Moreover, we devise a joint rate–distortion–sparsity optimization scheme for constructing the underlying graph structure within each cluster. This optimization process is guided by both geometry structure and texture priors, aiming to achieve optimal coding performance. These methodological advancements present promising solutions for efficient point cloud attribute compression. Experimental results show the proposed scheme outperforms the state of the art with significant BD-BR gains, surpassing G-PCC by 31.02%, 30.71%, and 32.14% in BD-BR gains for Y, U, and V components, respectively. Furthermore, ablation studies conducted on key modules within our scheme validate their effectiveness, further reinforcing the significance of our proposed approach.

Author Contributions

Conceptualization, Y.S., F.S. and W.G.; methodology, Y.S., F.S. and S.L.; software, Y.S., F.S. and G.L.; validation, W.G., S.L. and G.L.; formal analysis, W.G., S.L. and G.L.; investigation, W.G.; resources, S.L.; data curation, G.L.; writing—original draft preparation, Y.S.; writing—review and editing, W.G., S.L. and G.L.; visualization, Y.S. and F.S.; supervision, W.G.; project administration, S.L.; funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 62172021).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from MPEG and are available from the authors with the permission of MPEG.

Conflicts of Interest

Author Shan Liu was employed by the company Tencent. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

He, Y.; Liang, B.; Yang, J.; Li, S.; He, J. An iterative closest points algorithm for registration of 3D laser scanner point clouds with geometric features. Sensors 2017, 17, 1862.
Yue, Y.; Li, X.; Peng, Y. A 3D Point Cloud Classification Method Based on Adaptive Graph Convolution and Global Attention. Sensors 2024, 24, 617.
Feng, Y.; Zeng, S.; Liang, T. Part2Point: A Part-Oriented Point Cloud Reconstruction Framework. Sensors 2024, 24, 34.
Zhuang, L.; Tian, J.; Zhang, Y.; Fang, Z. Variable Rate Point Cloud Geometry Compression Method. Sensors 2023, 23, 5474.
Graziosi, D.; Nakagami, O.; Kuma, S.; Zaghetto, A.; Suzuki, T.; Tabatabai, A. An overview of ongoing point cloud compression standardization activities: Video-based (V-PCC) and geometry-based (G-PCC). APSIPA Trans. Signal Inf. Process. 2020, 9, e13.
Schwarz, S.; Preda, M.; Baroncini, V.; Budagavi, M.; Cesar, P.; Chou, P.A.; Cohen, R.A.; Krivokuća, M.; Lasserre, S.; Li, Z.; et al. Emerging MPEG standards for point cloud compression. IEEE J. Emerg. Sel. Top. Circuits Syst. 2018, 9, 133–148.
Cao, C.; Preda, M.; Zaharia, T. 3D point cloud compression: A survey. In Proceedings of the 24th International Conference on 3D Web Technology, Los Angeles, CA, USA, 26–28 July 2019; pp. 1–9.
Wang, J.; Ding, D.; Li, Z.; Feng, X.; Cao, C.; Ma, Z. Sparse Tensor-Based Multiscale Representation for Point Cloud Geometry Compression. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 9055–9071.
Guo, T.; Yuan, H.; Wang, L.; Wang, T. Rate-distortion optimized quantization for geometry-based point cloud compression. J. Electron. Imaging 2023, 32, 013047.
Zhang, J.; Chen, T.; Ding, D.; Ma, Z. YOGA: Yet Another Geometry-based Point Cloud Compressor. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; ACM: New York, NY, USA, 2023; pp. 9070–9081.
Do, T.T.; Chou, P.A.; Cheung, G. Volumetric Attribute Compression for 3D Point Clouds Using Feedforward Network with Geometric Attention. In Proceedings of the ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–5.
Liu, H.; Yuan, H.; Liu, Q.; Hou, J.; Zeng, H.; Kwong, S. A hybrid compression framework for color attributes of static 3D point clouds. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1564–1577.
Wang, J.; Ding, D.; Ma, Z. Lossless Point Cloud Attribute Compression Using Cross-scale, Cross-group, and Cross-color Prediction. In Proceedings of the 2023 Data Compression Conference (DCC), Snowbird, UT, USA, 21–24 March 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 228–237.
Zhao, B.; Lin, W.; Lv, C. Fine-grained patch segmentation and rasterization for 3-d point cloud attribute compression. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 4590–4602.
Yang, X.; Shao, Y.; Liu, S.; Li, T.H.; Li, G. PDE-based Progressive Prediction Framework for Attribute Compression of 3D Point Clouds. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; ACM: New York, NY, USA, 2023; pp. 9271–9281.
Shao, Y.; Zhang, Q.; Li, G.; Li, Z.; Li, L. Hybrid point cloud attribute compression using slice-based layered structure and intra prediction. In Proceedings of the 26th ACM international conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; pp. 1199–1207.
ISO/IEC JTC 1/SC 29/WG 7; G-PCC Test Model v16. MPEG: Geneva, Switzerland, 2021.
Song, F.; Li, G.; Yang, X.; Gao, W.; Liu, S. Block-Adaptive Point Cloud Attribute Coding with Region-Aware Optimized Transform. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 4294–4308.
Ortega, A.; Frossard, P.; Kovačević, J.; Moura, J.M.; Vandergheynst, P. Graph signal processing: Overview, challenges, and applications. Proc. IEEE 2018, 106, 808–828.
Dong, X.; Thanou, D.; Toni, L.; Bronstein, M.; Frossard, P. Graph signal processing for machine learning: A review and new perspectives. IEEE Signal Process. Mag. 2020, 37, 117–127.
Hu, W.; Pang, J.; Liu, X.; Tian, D.; Lin, C.W.; Vetro, A. Graph signal processing for geometric data and beyond: Theory and applications. IEEE Trans. Multimed. 2021, 24, 3961–3977.
Zhang, C.; Florencio, D.; Loop, C. Point cloud attribute compression with graph transform. In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 2066–2070.
Schnabel, R.; Klein, R. Octree-based Point-Cloud Compression. PBG@ SIGGRAPH 2006, 3, 111–121. [Google Scholar]
Huang, Y.; Peng, J.; Kuo, C.C.J.; Gopi, M. A generic scheme for progressive point cloud coding. IEEE Trans. Vis. Comput. Graph. 2008, 14, 440–453. [Google Scholar] [CrossRef] [PubMed]
Chen, J.; Yu, L.; Wang, W. Hilbert space filling curve based scan-order for point cloud attribute compression. IEEE Trans. Image Process. 2022, 31, 4609–4621. [Google Scholar] [CrossRef] [PubMed]
Cohen, R.A.; Tian, D.; Vetro, A. Attribute compression for sparse point clouds using graph transforms. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1374–1378. [Google Scholar]
Shao, Y.; Zhang, Z.; Li, Z.; Fan, K.; Li, G. Attribute compression of 3D point clouds using Laplacian sparsity optimized graph transform. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA, 10–13 December 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–4. [Google Scholar]
Xu, Y.; Hu, W.; Wang, S.; Zhang, X.; Wang, S.; Ma, S.; Guo, Z.; Gao, W. Predictive generalized graph Fourier transform for attribute compression of dynamic point clouds. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 1968–1982. [Google Scholar] [CrossRef]
De Queiroz, R.L.; Chou, P.A. Compression of 3D point clouds using a region-adaptive hierarchical transform. IEEE Trans. Image Process. 2016, 25, 3947–3956. [Google Scholar] [CrossRef]
Zhang, C.; Florêncio, D. Analyzing the optimality of predictive transform coding using graph-based models. IEEE Signal Process. Lett. 2012, 20, 106–109. [Google Scholar] [CrossRef]
Dong, X.; Thanou, D.; Frossard, P.; Vandergheynst, P. Learning Laplacian matrix in smooth graph signal representations. IEEE Trans. Signal Process. 2016, 64, 6160–6173. [Google Scholar] [CrossRef]
Kalofolias, V. How to learn a graph from smooth signals. In Proceedings of the Artificial intelligence and statistics, PMLR, Cadiz, Spain, 9–11 May 2016; pp. 920–929. [Google Scholar]
Hu, W.; Gao, X.; Cheung, G.; Guo, Z. Feature graph learning for 3D point cloud denoising. IEEE Trans. Signal Process. 2020, 68, 2841–2856. [Google Scholar] [CrossRef]
ISO/IEC JTC 1/SC 29/WG 7; G-PCC Performance Evaluation and Anchor Results. MPEG: Geneva, Switzerland, 2023.
Sharma, G.; Wu, W.; Dalal, E.N. The CIEDE2000 color-difference formula: Implementation notes, supplementary test data, and mathematical observations. Color Res. Appl. 2005, 30, 21–30.
Hu, W.; Cheung, G.; Ortega, A.; Au, O.C. Multiresolution graph Fourier transform for compression of piecewise smooth images. IEEE Trans. Image Process. 2014, 24, 419–433.
Kalofolias, V.; Perraudin, N. Large Scale Graph Learning From Smooth Signals. In Proceedings of the 7th International Conference on Learning Representations, ICLR, New Orleans, LA, USA, 6–9 May 2019.
Komodakis, N.; Pesquet, J.C. Playing with duality: An overview of recent primal–dual approaches for solving large-scale optimization problems. IEEE Signal Process. Mag. 2015, 32, 31–54.
ISO/IEC JTC2/SC29/WG7 MPEG Output Document N00650; Common Test Conditions for G-PCC. MPEG: Geneva, Switzerland, 2023.
Bjontegaard, G. Calculation of Average PSNR Differences between RD-Curves. ITU SG16 Doc. VCEG-M33. 2001. Available online: https://www.itu.int/wf-tp3/av-arch/video-site/0104_Aus/VCEG-M33.doc (accessed on 5 May 2024).

Figure 1. The pipeline of the proposed texture-guided graph transform optimization scheme for point cloud attribute compression.

Figure 2. The proposed Hilbert-based point reorder operation. (a) An example of the fitted Hilbert filling curve in a $2 \times 2 \times 2$ 3D cube. The numbers from 0 to 7 represent the Hilbert-based order of points in the 3D cube. (b) An example of original points reordered by the traditional Morton code and our Hilbert code.

Figure 3. Point cloud datasets. (a) MPEG solid point clouds, shown from left to right and top to bottom: Basketball_player_vox11_00000200, Dancer_vox11_00000001, Longdress_vox10_1300, Loot_vox10_1200, Redandblack_vox10_1550, Thaidancer_viewdep_vox12, Soldier_vox10_0690, Queen_0200, and Facade_00064_vox11. (b) MPEG dense point clouds, shown from left to right: Longdress_viewdep_vox12, Loot_viewdep_vox12, Redandblack_viewdep_vox12, Soldier_viewdep_vox12, and Boxer_viewdep_vox12.

Figure 4. R-D performance comparison of the proposed method, G-PCC PLT, G-PCC RAHT, and BAAC on point cloud attribute lossy compression.

Figure 5. Subjective quality comparison of decoded point clouds between our scheme and the state of the art on Longdress_vox10_1300 and Thaidancer_viewdep_vox12. (a) Subjective quality evaluation on Longdress_vox10_1300. (b) Subjective quality evaluation on Thaidancer_viewdep_vox12.

Table 1. The mapping rules from the locations in the 3D cube to the Hilbert code.

X	Y	Z	Hilbert Code
0	0	0	( 0 0 0 )
0	0	1	( 0 0 1 )
0	1	0	( 0 1 1 )
0	1	1	( 0 1 0 )
1	0	0	( 1 1 1 )
1	0	1	( 1 1 0 )
1	1	0	( 1 0 0 )
1	1	1	( 1 0 1 )
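
As an illustration of Table 1, the following minimal Python sketch stores the mapping as a lookup table and checks that visiting the eight cube positions in Hilbert-code order always moves to a face-adjacent position, whereas a Morton order does not. The names `HILBERT_CODE`, `morton_code`, and `max_step` are hypothetical, and the Morton bit order (x as the most significant bit) is an assumption. The sketch covers only a single $2 \times 2 \times 2$ cube, not the recursive curve needed for a full point cloud.

```python
# Table 1 as a lookup: (x, y, z) position in the 2x2x2 cube -> 3-bit Hilbert code.
HILBERT_CODE = {
    (0, 0, 0): 0b000,
    (0, 0, 1): 0b001,
    (0, 1, 0): 0b011,
    (0, 1, 1): 0b010,
    (1, 0, 0): 0b111,
    (1, 0, 1): 0b110,
    (1, 1, 0): 0b100,
    (1, 1, 1): 0b101,
}


def morton_code(p):
    """Morton code of a cube position; the bit order (x, y, z) is an assumption."""
    x, y, z = p
    return (x << 2) | (y << 1) | z


def max_step(order):
    """Largest Manhattan distance between consecutive positions in a traversal."""
    return max(
        sum(abs(a - b) for a, b in zip(p, q))
        for p, q in zip(order, order[1:])
    )


corners = list(HILBERT_CODE)
hilbert_order = sorted(corners, key=HILBERT_CODE.get)
morton_order = sorted(corners, key=morton_code)

print("Hilbert order:", hilbert_order, "max step:", max_step(hilbert_order))
print("Morton order: ", morton_order, "max step:", max_step(morton_order))
```

Under these assumptions the Hilbert traversal never jumps by more than one unit, while the Morton traversal contains a jump across the cube diagonal, which is consistent with the spatial-correlation motivation given for the Hilbert-based reordering.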

Table 2. Characteristics of point cloud datasets.

Category	Sequence	Abbreviation	Total Point Number	Geometry Precision	Attribute Type
Solid	Basketball_player_vox11_00000200	Basketball	2,925,514	11	R, G, B
	Dancer_vox11_00000001	Dancer	2,592,758	11	R, G, B
	Longdress_vox10_1300	Longdress	857,966	10	R, G, B
	Loot_vox10_1200	Loot	805,285	10	R, G, B
	Redandblack_vox10_1550	Redandblack	757,691	10	R, G, B
	Soldier_vox10_0690	Soldier	1,089,091	10	R, G, B
	Thaidancer_viewdep_vox12	Thaidancer_v	3,130,215	12	R, G, B
	Queen_0200	Queen	1,000,993	10	R, G, B
	Facade_00064_vox11	Facade	4,061,755	11	R, G, B
Dense	Longdress_viewdep_vox12	Longdress_v	3,096,122	12	R, G, B
	Loot_viewdep_vox12	Loot_v	3,017,285	12	R, G, B
	Redandblack_viewdep_vox12	Redandblack_v	2,770,567	12	R, G, B
	Soldier_viewdep_vox12	Soldier_v	4,001,754	12	R, G, B
	Boxer_viewdep_vox12	Boxer_v	3,493,085	12	R, G, B

Table 3. Performance comparisons of the proposed framework with G-PCC PLT [17], G-PCC RAHT [17], and BAAC [18] on point cloud attribute lossy compression. The coding performance improvement is measured by the BD-BR gains; negative values indicate bitrate savings of the proposed framework relative to the respective anchor.

Category	Sequence	G-PCC PLT [17]			G-PCC RAHT [17]			BAAC [18]
Category	Sequence	Y	U	V	Y	U	V	Y	U	V
Solid	Basketball	−11.95%	−13.73%	−32.23%	−25.25%	−10.36%	−24.22%	−3.08%	−3.11%	−3.91%
	Dancer	−14.68%	−12.37%	−29.03%	−22.80%	−15.10%	−26.73%	−2.70%	−2.74%	−3.24%
	Longdress	−12.37%	−15.28%	−13.87%	−17.59%	−12.86%	−12.62%	−6.69%	−7.97%	−7.80%
	Loot	−24.58%	−68.73%	−53.06%	−36.50%	−58.13%	−52.70%	−8.98%	−12.79%	−11.95%
	Redandblack	−16.40%	−21.87%	−19.32%	−24.89%	−19.02%	−18.30%	−6.53%	−7.25%	−6.95%
	Soldier	−17.45%	−45.58%	−38.75%	−36.86%	−55.31%	−47.56%	−8.43%	−11.97%	−11.98%
	Thaidancer_v	−18.13%	−31.55%	−31.42%	−34.21%	−33.40%	−34.17%	−24.23%	−27.91%	−28.47%
	Queen	−14.24%	−24.56%	−28.79%	−28.95%	−19.91%	−20.80%	−3.36%	−4.01%	−4.21%
	Facade	−13.10%	−17.09%	−18.85%	−23.31%	−14.72%	−16.73%	−2.52%	−2.98%	−3.16%
Dense	Longdress_v	−11.64%	−15.73%	−13.60%	−16.62%	−14.26%	−13.83%	−28.97%	−28.68%	−28.16%
	Loot_v	−23.58%	−42.83%	−38.52%	−55.02%	−49.73%	−52.81%	−25.76%	−46.07%	−54.77%
	Redandblack_v	−14.44%	−24.04%	−19.40%	−31.30%	−25.57%	−21.46%	−39.41%	−41.84%	−43.12%
	Soldier_v	−15.28%	−39.10%	−35.22%	−39.88%	−44.06%	−51.38%	−32.42%	−34.62%	−41.93%
	Boxer_v	−13.52%	−51.47%	−41.99%	−41.12%	−57.49%	−56.59%	−34.31%	−42.37%	−36.54%
Average Results		−15.81%	−30.28%	−29.57%	−31.02%	−30.71%	−32.14%	−16.24%	−19.59%	−20.44%
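
For readers unfamiliar with the metric in Table 3, the sketch below shows one common way to compute a BD-BR value from two rate-distortion curves, following the classic Bjontegaard approach: fit a cubic to log-bitrate as a function of PSNR for each curve, average both fits over the shared PSNR range, and convert the mean log-rate difference to a percentage. It assumes exactly four rate points per curve, in which case the cubic fit is an interpolation and Simpson's rule integrates it exactly. The function names and input format are hypothetical, and the tooling actually used to produce Table 3 is not specified here.

```python
import math


def _cubic_through(xs, ys, x):
    """Evaluate at x the unique cubic through four (xs, ys) points (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_log_rate(psnr, log_rate, lo, hi):
    """Mean of the cubic log-rate curve over [lo, hi] (Simpson's rule is exact for cubics)."""
    mid = 0.5 * (lo + hi)
    f = lambda q: _cubic_through(psnr, log_rate, q)
    return (f(lo) + 4.0 * f(mid) + f(hi)) / 6.0


def bd_rate(anchor, test):
    """BD-BR in percent; negative means `test` needs less bitrate than `anchor`.

    anchor, test: four (bitrate, psnr_db) pairs each (assumed input format).
    """
    if len(anchor) != 4 or len(test) != 4:
        raise ValueError("this sketch assumes exactly four rate points per curve")
    a_psnr = [q for _, q in anchor]
    t_psnr = [q for _, q in test]
    a_log = [math.log(r) for r, _ in anchor]
    t_log = [math.log(r) for r, _ in test]
    # Compare the curves only where their PSNR ranges overlap.
    lo = max(min(a_psnr), min(t_psnr))
    hi = min(max(a_psnr), max(t_psnr))
    if hi <= lo:
        raise ValueError("the two curves have no overlapping PSNR range")
    diff = _mean_log_rate(t_psnr, t_log, lo, hi) - _mean_log_rate(a_psnr, a_log, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0


# Synthetic numbers for illustration only (not taken from the paper):
# the test curve reaches each PSNR at half the anchor bitrate.
anchor_curve = [(100.0, 30.0), (200.0, 33.0), (400.0, 36.0), (800.0, 39.0)]
test_curve = [(rate / 2.0, psnr) for rate, psnr in anchor_curve]
print(f"BD-BR: {bd_rate(anchor_curve, test_curve):.2f}%")
```

With the synthetic curves above, the result is a saving of about 50%, which matches the sign convention of Table 3, where negative percentages denote bitrate reductions.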

Table 4. Performance comparisons between the texture-guided graph connectivity mask and the fixed parameter k = 5, 10, and 15 on point cloud attribute lossy compression.

Category	Sequence	k = 5			k = 10			k = 15
Category	Sequence	Y	U	V	Y	U	V	Y	U	V
Solid	Basketball	1.95%	1.96%	2.60%	−0.43%	−0.49%	−0.54%	−0.53%	−0.54%	−0.69%
	Dancer	1.98%	1.97%	2.72%	−0.42%	−0.54%	−0.58%	−0.53%	−0.67%	−0.71%
	Longdress	3.05%	4.18%	4.02%	−0.93%	−1.13%	−1.09%	−1.10%	−1.31%	−1.29%
	Loot	4.77%	6.22%	6.19%	−1.24%	−0.92%	−0.99%	−1.51%	−1.18%	−1.29%
	Redandblack	3.44%	3.64%	4.21%	−1.18%	−1.38%	−1.37%	−1.27%	−1.42%	−1.46%
	Soldier	1.61%	2.31%	2.26%	−1.30%	−0.92%	−0.88%	−1.63%	−1.16%	−1.18%
	Thaidancer_v	11.17%	16.41%	16.82%	−3.12%	−4.39%	−4.55%	−3.11%	−4.37%	−4.54%
	Queen	4.02%	4.58%	5.08%	−0.36%	−0.51%	−0.53%	−0.73%	−0.84%	−0.80%
	Facade	1.34%	1.93%	2.18%	−0.44%	−0.60%	−0.58%	−0.35%	−0.57%	−0.52%
Dense	Longdress_v	10.61%	14.00%	13.10%	8.30%	10.87%	10.16%	8.30%	10.87%	10.16%
	Loot_v	17.12%	20.64%	19.98%	12.63%	13.77%	13.48%	12.63%	13.77%	13.48%
	Redandblack_v	9.47%	12.05%	10.73%	7.01%	9.05%	7.93%	7.02%	9.05%	7.94%
	Soldier_v	12.59%	16.56%	16.06%	9.74%	11.92%	11.49%	9.74%	11.92%	11.49%
	Boxer_v	7.95%	8.99%	8.91%	5.34%	5.34%	5.16%	5.35%	5.35%	5.18%
Average Results		6.50%	8.25%	8.20%	2.40%	2.86%	2.65%	2.31%	2.78%	2.56%

Table 5. Performance comparisons between the proposed sparsity-related optimization parameter and the fixed parameter $θ$ = 0.01, 0.1, and 1 on point cloud attribute lossy compression.

Category	Sequence	$θ$ = 0.01			$θ$ = 0.1			$θ$ = 1
Category	Sequence	Y	U	V	Y	U	V	Y	U	V
Solid	Basketball	2.68%	2.47%	3.08%	0.59%	0.51%	1.17%	1.27%	1.29%	1.69%
	Dancer	2.92%	2.70%	3.73%	2.71%	2.52%	3.40%	1.31%	1.30%	1.81%
	Longdress	8.25%	11.55%	11.15%	7.28%	10.20%	9.85%	2.95%	3.93%	3.80%
	Loot	11.94%	11.96%	11.92%	10.58%	10.74%	10.81%	4.54%	6.15%	5.88%
	Redandblack	8.80%	9.54%	10.60%	7.87%	8.47%	9.59%	3.07%	3.25%	3.87%
	Soldier	12.75%	12.82%	12.47%	11.39%	11.53%	11.27%	4.53%	5.53%	5.53%
	Thaidancer_v	12.33%	18.32%	18.25%	10.99%	16.33%	16.27%	11.61%	16.91%	17.46%
	Queen	3.32%	4.48%	4.67%	2.39%	3.32%	3.56%	0.76%	0.46%	0.80%
	Facade	1.16%	2.03%	2.02%	0.96%	1.68%	1.69%	0.35%	0.50%	0.64%
Dense	Longdress_v	0.19%	0.21%	0.25%	−1.61%	−1.99%	−1.89%	60.96%	71.45%	69.42%
	Loot_v	−2.78%	−5.77%	−5.25%	−5.04%	−7.16%	−6.64%	189.75%	357.43%	337.41%
	Redandblack_v	1.09%	0.49%	1.66%	−0.91%	−1.57%	−0.72%	82.60%	97.15%	90.60%
	Soldier_v	1.28%	−1.71%	−1.93%	−1.35%	−3.63%	−3.77%	93.64%	154.82%	165.47%
	Boxer_v	−5.23%	−9.09%	−9.06%	−7.34%	−10.42%	−10.45%	136.58%	196.47%	199.10%
Average Results		4.19%	4.29%	4.54%	2.75%	2.89%	3.15%	42.42%	65.47%	64.53%
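
The "Average Results" row of Table 5 appears to be an unweighted mean over all 14 sequences. The short Python check below reproduces it for the Y column at $θ$ = 1 and also splits the mean by category. The per-category means are not reported in the table; they are computed here from the listed rows purely as a worked example, and the variable names are hypothetical.

```python
from statistics import mean

# Y-component values for theta = 1, copied from Table 5 (in percent).
solid_y = [1.27, 1.31, 2.95, 4.54, 3.07, 4.53, 11.61, 0.76, 0.35]
dense_y = [60.96, 189.75, 82.60, 93.64, 136.58]

overall_mean = mean(solid_y + dense_y)  # unweighted over all 14 sequences
solid_mean = mean(solid_y)              # 9 solid sequences
dense_mean = mean(dense_y)              # 5 dense sequences

print(f"overall: {overall_mean:.2f}%")
print(f"solid:   {solid_mean:.2f}%")
print(f"dense:   {dense_mean:.2f}%")
```

The overall value rounds to the 42.42% listed in the table, and the split shows that this average is dominated by the five dense sequences.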

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
