3.2. Feature Extraction Module
The feature extraction module covers dynamic graph structure modeling, adaptive graph convolution, feature constraints, and bi-branch attention. Specifically, to focus on global and local contexts, the module dynamically captures the complex short- and long-range correlations between a current point and its neighboring and distant points by means of the dynamic graph structure. Subsequently, adaptive graph convolution is introduced to focus on the points with a higher degree of artifacts, to extract and enhance their attribute features. Moreover, a feature constraint mechanism and a bi-branch attention mechanism are designed. The former is to maintain the spatial consistency of neighboring points at the attribute feature level during the PC recovery process, while the latter is to merge the high-frequency information and contextual information to effectively restore the fine features and texture details.
Figure 3 intuitively showcases the architecture of the entire feature extraction module.
Dynamic graph construction. For a given compressed PC P = {G, A}, G = {Gi ∈ ℝ³ | i = 1, …, N} represents its geometric coordinate information and A = {Ai ∈ ℝ³ | i = 1, …, N} represents its color attributes. Here, i denotes the index of the points in a PC and N is the number of points. First, a directed self-loop graph 𝒢 = (𝒱, E) is constructed based on the coordinates of P to capture its intrinsic spatial structure, where 𝒱 = {vi | i = 1, …, N} represents the node set, which accurately maps each point in P, and E ⊆ 𝒱 × 𝒱 defines the edge set, which obtains the connectivity relationships among the nodes based on a specific proximity criterion (e.g., the K-nearest neighbor algorithm), so as to ensure that each node vi ∈ 𝒱 is connected to its K nearest neighbors, including itself. Subsequently, to enhance the information representation of the graph, the edge feature vectors are defined as

eij = [Gi, Ai, Gj − Gi, Aj − Ai, dis(Gi, Gj)],

where j denotes the index of neighboring points in the neighborhood of the i-th center point, j = 1, 2, …, K, and, in particular, when j = 1, it denotes the connection of a node to itself. Specifically, any eij integrates the geometric coordinates Gi and the color attributes Ai of node vi, as well as the relative geometric features Gj − Gi and color features Aj − Ai between that node and its j-th neighboring point. In the above formula, dis(·) denotes the Euclidean distance metric function, which reflects the spatial proximity between the nodes and in turn enhances the differentiation of the features. It is worth noting that when j = 1, Gj and Aj are set to the information of the node itself, ensuring that the self-connecting edges carry the complete information of the node, thus avoiding information loss while maintaining the integrity of the graph structure. The concatenation operation [·, ·] effectively merges spatial location and color attribute features into a unified feature vector, capturing comprehensive node information.
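The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name, the brute-force distance computation, and the exact ordering of the concatenated edge-feature blocks are assumptions for demonstration.

```python
import numpy as np

def build_edge_features(G, A, K):
    """Illustrative sketch of the dynamic graph construction step: each point
    is connected to its K nearest neighbors (itself first, realizing the
    directed self-loop), and an edge feature
    e_ij = [G_i, A_i, G_j - G_i, A_j - A_i, dis(G_i, G_j)] is assembled.
    G: (N, 3) coordinates, A: (N, 3) color attributes."""
    N = G.shape[0]
    # Brute-force pairwise Euclidean distances; each point's nearest
    # neighbor is itself (distance 0), giving the self-loop edge at j = 1.
    d = np.linalg.norm(G[:, None, :] - G[None, :, :], axis=-1)   # (N, N)
    knn = np.argsort(d, axis=1)[:, :K]                           # (N, K)
    Gj, Aj = G[knn], A[knn]                                      # (N, K, 3)
    Gi, Ai = G[:, None, :], A[:, None, :]                        # broadcast to (N, 1, 3)
    dist = np.linalg.norm(Gj - Gi, axis=-1, keepdims=True)       # (N, K, 1)
    # The concatenation [.,.] merges absolute and relative geometry/color.
    e = np.concatenate([np.broadcast_to(Gi, Gj.shape),
                        np.broadcast_to(Ai, Aj.shape),
                        Gj - Gi, Aj - Ai, dist], axis=-1)        # (N, K, 13)
    return knn, e
```

For the self-loop edge (j = 1), the relative geometry, relative color, and distance entries are all zero, so the edge reduces to the node's own information, as described above.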
Subsequently, a dynamic graph structure updating strategy based on feature similarity is used in the graph construction process, discarding the traditional method that relies on fixed spatial locations. Specifically, node similarity is evaluated in the feature dimension, allowing nodes with similar features to be grouped together, which dynamically reshapes the graph topology. This strategy enables the graph structure to capture intrinsic connections between a current point in the PC and distant points that share closely related features.
Overall, the dynamic graph construction strategy significantly broadens the receptive field of individual nodes within the network, while also enriching the feature input space for subsequent dynamic graph convolution operations. This mechanism enhances the network’s ability to capture and utilize both local and global dependencies, resulting in more accurate and insightful feature representations.
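The feature-similarity-based update can be sketched as follows. This is a hedged illustration (function name and brute-force search are assumptions): the only point is that neighbors are re-selected in feature space, so topologically distant points with similar features become connected.

```python
import numpy as np

def update_graph_by_features(F, K):
    """Sketch of the dynamic graph update: neighbors are chosen by feature
    similarity rather than fixed spatial location, letting a point connect
    to distant points with closely related features.
    F: (N, C) per-point feature matrix."""
    # Euclidean distance in the feature dimension, not in coordinate space.
    d = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, :K]   # (N, K) dynamically reshaped edges
```

If two spatially distant points happen to carry identical features, this update links them directly, which is exactly how the receptive field of a node is broadened beyond its spatial neighborhood.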
Adaptive graph convolution. To focus further on the points with a high amount of artifacts, adaptive graph convolution is introduced for feature extraction, with its structure shown in Figure 4. Specifically, an adaptive kernel is designed to capture the complex and unique geometric and contextual relationships in PCs. For each channel m of the M-dimensional output features, adaptive graph convolution dynamically generates the kernel using the point features:

êij^m = g_m(fij), m = 1, 2, …, M,

where m denotes one of the output dimensions corresponding to a single filter defined in adaptive graph convolution and N(i) is the set of neighborhood points associated with node vi. To combine the global and local features, we define the edge feature eij as the input feature fij for the first adaptive kernel layer. Moreover, g(·) is a feature mapping function, consisting of a 1 × 1 convolutional layer, a BatchNorm layer, and a LeakyReLU activation layer.

Similar to 2D convolution, the m-th output dimension is obtained by computing the convolution of the input features with the corresponding filter weights, i.e., the convolution of the adaptive kernel with the corresponding edge:

hij^m = σ(⟨êij^m, eij⟩) ∈ ℝ,

where ⟨·, ·⟩ denotes the inner product of two vectors, outputting hij^m ∈ ℝ, and σ(·) denotes the nonlinear activation function. Stacking the M channels generates the edge feature Sij = [hij^1, hij^2, …, hij^M] ∈ ℝ^M between the connected points. Finally, the output feature f′i of the central point vi is defined by aggregating all the edge features in the neighborhood:

f′i = max_{j∈N(i)}(Sij) ⊕ fi,

where max(·) denotes the maximum pooling function and ⊕ represents an element-wise addition. Additionally, introducing a residual connection mechanism delivers fast and robust aggregation of the output features.
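The per-channel kernel generation, inner product, neighborhood max-pooling, and residual addition can be condensed into a short NumPy sketch. Everything here is illustrative: the kernel generator g_m(·) is stood in for by one linear map per output channel, and tanh replaces the unspecified nonlinearity σ(·).

```python
import numpy as np

def adaptive_graph_conv(e, f_center, weights, sigma=np.tanh):
    """Minimal sketch of the adaptive graph convolution described above
    (shapes, the linear stand-in for g_m, and tanh are assumptions).
    e:        (N, K, C)  edge features, used as the input f_ij
    f_center: (N, M)     central-point features for the residual term
    weights:  (M, C, C)  one linear map per output channel, standing in
                         for the adaptive kernel generator g_m(.)"""
    # Adaptive kernel per channel: e_hat^m_ij = g_m(f_ij).
    kernels = np.einsum('mdc,nkc->mnkd', weights, e)     # (M, N, K, C)
    # h^m_ij = sigma(<e_hat^m_ij, e_ij>): one scalar per edge and channel;
    # stacking channels yields S_ij in R^M.
    S = sigma(np.einsum('mnkc,nkc->nkm', kernels, e))    # (N, K, M)
    # Max-pool over the neighborhood, then the residual addition (⊕ term).
    return S.max(axis=1) + f_center                      # (N, M)
```

Because the kernel is generated from the edge features themselves, edges in heavily distorted regions receive kernels tuned to their own statistics, which is the mechanism that lets the convolution emphasize artifact-rich points.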
In general, adaptive graph convolution achieves fine-grained extraction and characterization of PC attribute data. Its core advantage lies in its ability to attend to and enhance the processing of distortion-significant regions in PC data, enabling the network to more acutely identify and focus on these potentially challenging regions. In addition, it facilitates the design and implementation of a feature constraint mechanism, making the feature regulation process more flexible and efficient.
Feature constraint mechanism. After compression, the attribute information of neighboring points in a PC should theoretically remain spatially consistent, so as to ensure the visual quality of content rendering. Global features capture the overall statistical information and structure of a PC, allowing the neural network to deeply understand its attribute distribution and geometry. Global features can therefore play a key role in artifact removal, guiding the network to adjust and optimize the extracted graph features at the macro level and thus improving the overall recovery effect. Accordingly, a global feature-based constraint mechanism is designed and implemented to ensure that the recovery process strictly follows the principle of spatial consistency.
The core of the feature constraint mechanism designed above consists in the scaling adjustment of the graph features of PCs. Specifically, its expression is:

F̃ = GFC(F, α),

where α denotes the global feature vector of P and GFC(·) is the feature constraint function, which adjusts the graph feature F according to α.
This process not only strengthens the correlation between the graph features and the global context, but also effectively promotes the maintenance of spatial consistency in the restoration process, resulting in a more visually coherent and natural PC.
Implementing the feature constraint mechanism effectively mitigates abrupt changes during the recovery process, ensuring the continuity and smoothness of PC attribute information.
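A minimal sketch of such a global-feature-based scaling is given below. The exact form of the adjustment is an assumption; the text specifies only that the graph features are scaled according to a global descriptor of the PC.

```python
import numpy as np

def gfc(F, alpha):
    """Illustrative feature constraint GFC(.): channel-wise scaling of the
    per-point graph features by a global feature vector alpha, so every
    point is modulated by the same global context.
    F: (N, M) graph features; alpha: (M,) global descriptor of the PC."""
    return F * alpha[None, :]

# One simple choice of global descriptor would be the mean feature,
# e.g. alpha = F.mean(axis=0); the paper does not specify this exact form.
```

Because the same α multiplies every point's features, no single point can drift far from the global attribute statistics, which is how the mechanism enforces spatial consistency at the feature level.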
Bi-branch attention module. To address the problem that high-frequency information may get lost during the compression process, a bi-branch attention module is proposed, as shown in Figure 5. It is designed to deeply integrate the high-frequency features of PCs to accurately recover detailed information and improve the overall quality. In the bi-branch attention module, the input features F are processed in parallel. One branch uses CNNs to directly represent feature mappings and transformations, in order to capture the basic structural information. The other branch performs high-frequency feature extraction, dedicated to mining and enhancing the high-frequency information in PCs. Specifically, the high-frequency branch first selects the significant maxima from the input features through a max-pooling operation, thus initially retaining the key texture information. Subsequently, this branch deeply processes the selected features through four consecutive unit layers (the first three unit layers consist of a 1 × 1 convolutional layer, a BatchNorm layer, and a ReLU layer, while the last omits the ReLU layer, to maintain feature flexibility) and then extracts the feature FH, which is rich in high-frequency details. Meanwhile, the other branch applies the same sequence of four unit layers directly to the input feature F to acquire a more general and optimized feature representation. Then, the output features from the two branches are preliminarily fused by element-wise summation and further normalized in a Softmax layer to adjust the weights of each feature element, in order to ensure the rationality and effectiveness of the fusion process. Finally, the normalized result is multiplied with the input feature F in an element-wise manner, in order to enhance the contribution of the high-frequency features within the overall features and deliver the seamless fusion of the high-frequency features with the base features.
The expression for this process can be written as:

Fout = S(FH ⊕ Unit(F)) ⊙ F,

where Unit(·) denotes the processing function for the unit-layer sequence, ⊙ denotes the element-wise multiplication, and S(·) denotes the Softmax normalization function.
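The two-branch flow can be sketched as below. This is an illustrative reduction, not the paper's network: the four-unit-layer sequence Unit(·) is stood in for by a single shared linear map, and max-pooling is taken over each point's neighbor rows.

```python
import numpy as np

def unit_layers(F, W):
    """Stand-in for the four-unit-layer sequence Unit(.); a single shared
    linear map here, purely illustrative."""
    return F @ W

def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

def bi_branch_attention(F, W, knn):
    """Sketch of the bi-branch attention flow: max-pool + unit layers on
    the high-frequency branch, unit layers on the base branch, element-wise
    sum, Softmax weighting, then element-wise multiplication with F.
    F: (N, C) input features; knn: (N, K) neighbor indices (an assumed
    realization of the module's max-pooling step)."""
    F_pool = F[knn].max(axis=1)          # keep salient maxima per point
    F_H = unit_layers(F_pool, W)         # high-frequency branch
    F_base = unit_layers(F, W)           # base branch (same unit sequence)
    # F_out = S(F_H ⊕ Unit(F)) ⊙ F
    return softmax(F_H + F_base, axis=-1) * F
```

Note that the Softmax output acts as a per-channel weight map over F, so channels where the high-frequency branch responds strongly contribute more to the fused output.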
The bi-branch attention module enables the deep extraction and efficient utilization of high-frequency features in PCs. It not only compensates for the loss of high-frequency information during compression, but also enhances the complementarity and fusion between low- and high-frequency features. As a result, the recovered PC can accurately reproduce the detailed features of the original data, while preserving global structural integrity.
Layered feature concatenation. To address the problem of gradient vanishing or gradient explosion that may be encountered during neural network backpropagation, we adopt a layered feature concatenation strategy. This strategy directs the output features of each bi-branch attention block to the terminal of the feature extraction architecture and integrates these features through the concatenation operation to form the final output feature Ffinal, which is expressed as:

Ffinal = concat(F1, F2, F3, F4),

where concat(·) denotes the feature concatenation operation and F1, F2, F3, F4 denote the output features from the first to the fourth bi-branch attention blocks, respectively.
Such a layered feature concatenation strategy not only effectively maintains the fine-grained information independently extracted from each branch and avoids the loss of key information during the transmission process, but also captures the features more comprehensively by integrating multi-level feature representations.
3.3. Point Cloud Attribute Offset Estimation Module
This module takes the final output feature of the feature extraction module as its input to estimate the attribute offsets; its structure is shown in Figure 6. Specifically, it takes an entire PC chunk as the input. Then, for any given 3D coordinate Gi, its K nearest neighbors are extracted as a sample set. Further, the relative attribute features within this neighborhood are utilized to learn the PC attribute's offset, denoted as Foffset. Formally, the module can be represented as:

Ôi = Est(Fj), j ∈ N(i),

where Ôi denotes the attribute offset of point Pi, Est(·) is a composite function consisting of a Multi-Layer Perceptron (MLP), and j ∈ N(i) indexes the neighboring points distributed in the attribute space of point Pi.
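An MLP-based Est(·) over gathered neighborhood features might look as follows. Layer sizes, the ReLU hidden activation, and the flatten-then-project structure are assumptions for illustration; the paper specifies only that Est(·) is an MLP over neighborhood features.

```python
import numpy as np

def estimate_offsets(F, knn, W1, b1, W2, b2):
    """Illustrative sketch of Est(.): a small MLP applied to features
    gathered from each point's neighborhood in attribute space.
    F:   (N, C) per-point features
    knn: (N, K) neighbor indices j in N(i)
    Returns (N, 3) predicted attribute offsets, one per point P_i."""
    # Gather and flatten the K neighborhood feature rows of every point.
    x = F[knn].reshape(F.shape[0], -1)      # (N, K*C)
    h = np.maximum(x @ W1 + b1, 0.0)        # hidden layer with ReLU
    return h @ W2 + b2                      # (N, 3) predicted offsets
```

Feeding the whole neighborhood (rather than a single point) into the MLP is what lets the module exploit the attribute similarity of nearby points described in the following paragraphs.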
Training objective. A training objective function is constructed, which aims to minimize the squared L2-norm distance between the predicted attribute offsets and the true offsets. The original PC is defined as P̄ = {G, Ā}, which is used to compute the attribute offset of point Pi:

Oi = Āi − Ai,

where Oi is the difference vector in the attribute space between the original and compressed PCs; this difference vector can also be expressed as the displacement vector from the processed attribute state Ai to the original attribute state Āi.
The training objective is to align the predictions of the module with the predefined offsets of the real attributes, so that the module can learn and predict the attribute offsets accurately:

L = E(‖Ôi − Oi‖₂²),

where ‖·‖₂² denotes the square of the L2 norm and E(·) denotes the expectation function; the predicted offsets of the point attributes are weight-averaged within the local neighborhood to reduce the overall error.
For a PC, the attribute information exhibits spatial consistency in space, i.e., neighboring points tend to have similar attribute characteristics, a phenomenon that does not exist in geometric coordinates. For this reason, we not only focus on the attribute offset prediction of point Pi at its own location, but also explore in-depth the attribute offset prediction in terms of its spatial neighborhood. This consideration aims to fully explore and utilize the rich attribute information within the local neighborhood of point Pi, so as to enhance the accuracy and robustness of attribute prediction.
The final training objective aggregates the predicted local attribute offsets of each point Pi and computes their mean value using the following expression:

L = mean({E(‖Ôi − Oi‖₂²) | i = 1, …, N}),

where mean(·) denotes averaging the local attribute offsets over all the points in a PC.
This training objective can effectively balance the local prediction bias and promote equal consideration of the prediction contributions among different points. Furthermore, it includes a strong emphasis on the spatial consistency of attributes between the recovered PCs and the original PCs. By minimizing the mean value of the predicted offsets, the network can learn and simulate the spatial variation patterns of the original attributes of PCs, thereby reducing the impact of texture artifacts during the recovery process.
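The loss computation itself reduces to a few lines. This sketch assumes the per-point form of the objective (mean over points of the squared L2 distance between predicted and true offsets); the neighborhood-weighted averaging discussed above is omitted for brevity.

```python
import numpy as np

def offset_loss(pred, A_compressed, A_original):
    """Mean over all points of the squared L2 distance between the
    predicted offset Ô_i and the true offset O_i = Ā_i - A_i.
    pred, A_compressed, A_original: (N, 3) arrays."""
    O = A_original - A_compressed            # true attribute offsets
    return np.mean(np.sum((pred - O) ** 2, axis=-1))
```

At inference time the predicted offsets are simply added back to the compressed attributes, so driving this loss to zero means the recovered attributes match the original PC.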