In this section, we first redefine the granulation and lower-approximation operators so that they can be applied in deep learning networks. Building on this, we develop a rough set-based attention mechanism, and on that foundation we construct a transformer network for point cloud learning.
3.1. The Rough Set-Based Attention Mechanism
For the neighborhood rough set that we employ, the mutually exclusive information granules generated by neighborhood relations serve as the fundamental units for approximation. Below, we first introduce the fundamental properties of the neighborhood relations adopted.
Definition 1 ([24]). Let U be a non-empty set. If for any two elements x, y ∈ U there exists a uniquely determined real function \Delta(x, y) that satisfies:
(1) \Delta(x, y) \geq 0, and \Delta(x, y) = 0 if and only if x = y;
(2) \Delta(x, y) = \Delta(y, x);
(3) \Delta(x, z) \leq \Delta(x, y) + \Delta(y, z) for any z \in U,
then \Delta is a distance function on U, and (U, \Delta) is a distance space, also called a metric space.
In N-dimensional Euclidean space, given any two points x = (x_1, x_2, \ldots, x_N) and y = (y_1, y_2, \ldots, y_N), the distance is \Delta(x, y) = \left( \sum_{i=1}^{N} |x_i - y_i|^p \right)^{1/p}. The specific definition of the distance function can take various forms depending on the choice of variables [25], but they all serve to measure the distance relationship between any two points in the Euclidean space.
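As a minimal sketch of the distance family above (assuming the Minkowski form, with p = 2 recovering the Euclidean distance), consider:

```python
import numpy as np

def minkowski_distance(x: np.ndarray, y: np.ndarray, p: float = 2.0) -> float:
    """Minkowski distance between two points in N-dimensional space.

    p = 1 gives the Manhattan distance, p = 2 the Euclidean distance;
    as p grows, it approaches the Chebyshev distance.
    """
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

x = np.array([0.0, 0.0, 0.0])
y = np.array([3.0, 4.0, 0.0])

# The metric axioms of Definition 1 hold for any p >= 1:
assert minkowski_distance(x, x) == 0.0                       # identity
assert minkowski_distance(x, y) == minkowski_distance(y, x)  # symmetry
print(minkowski_distance(x, y))  # Euclidean distance: 5.0
```

The function name and parameterization are illustrative; any p ≥ 1 yields a valid metric on the Euclidean space.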
Information granules generated through neighborhood relations can be further characterized through upper and lower approximations. Below, we provide the definitions of the upper and lower approximations employed in our context.
Definition 2 ([24]). Given a non-empty finite set U = \{x_1, x_2, \ldots, x_n\} on the real space and a neighborhood relation N on U, we call NAS = \langle U, N \rangle a neighborhood approximation space.
Definition 3 ([24]). Given NAS = \langle U, N \rangle and X \subseteq U, the lower and upper approximations of X in the neighborhood approximation space are defined as:

\underline{N}X = \{ x_i \mid \delta(x_i) \subseteq X, x_i \in U \}, \qquad \overline{N}X = \{ x_i \mid \delta(x_i) \cap X \neq \varnothing, x_i \in U \},

where \delta(x_i) denotes the neighborhood of x_i. They satisfy \underline{N}X \subseteq X \subseteq \overline{N}X.

In the neighborhood rough set, the mutually exclusive information granules generated by neighborhood relations are the basic units used for approximation. It can be said that the two modules of granulation and approximation form the cornerstone of the rough set methodology. Therefore, we design the self-attention mechanism based on these two modules.
The following describes the granulation module. The fuzzy equivalence relation functions R used to measure the similarity between samples are diverse. To better extract the commonality among the high-dimensional features of the point cloud data, we generate the relational function for the granulation operation with a Gaussian function:

R(x_i, x_j) = \exp\left( -\frac{\| x_i - x_j \|^2}{2\sigma^2} \right).

The Gaussian function satisfies all the properties required of a rough set relation function for the n feature vectors X of length m.
The granulation matrix has the following properties:
(1) \forall x \in U, R(x, x) = 1 (reflexivity);
(2) R(x, y) = R(y, x) (symmetry);
(3) R(x, y) \in [0, 1].
The granulation matrix serves as an information bottleneck in this process. It reflects the relationships between objects, expresses the granular structure of the universe of discourse, and carries all the sample information available to the approximation operation. Essentially, once the relationships between samples are extracted, subsequent rough computations are performed on the fuzzy information granules constituted between samples rather than on individual samples.
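The granulation step can be sketched as follows — a minimal implementation assuming the Gaussian relation function above, with the bandwidth `sigma` as an illustrative hyperparameter; the property checks mirror properties (1)–(3) of the granulation matrix:

```python
import numpy as np

def granulation_matrix(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Pairwise Gaussian relation R(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))
    over n feature vectors, yielding an n x n granulation matrix."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

X = np.random.default_rng(0).normal(size=(5, 3))  # 5 samples, 3-dim features
G = granulation_matrix(X)

# Properties of the granulation matrix:
assert np.allclose(np.diag(G), 1.0)   # (1) reflexivity: R(x, x) = 1
assert np.allclose(G, G.T)            # (2) symmetry: R(x, y) = R(y, x)
assert np.all((G >= 0) & (G <= 1))    # (3) values in [0, 1]
```

Each row of G can be read as one fuzzy information granule: the membership of every sample in the granule centered at that row's sample.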
The following describes the approximation module. In the fuzzy calculation of rough sets, the classification of samples is no longer a deterministic 0 or 1, but is evaluated as a fuzzy membership degree between 0 and 1. For all information granules generated by the relational function, the mutual approximation between different fuzzy information granules can guide the importance of each other; this property makes the approximation matrix a natural global feature guidance matrix, while the membership degree of the approximation is reflected by the upper and lower approximations of the rough set.
For the relational function R, the lower and upper approximation memberships of a sample x to the approximated information granule d are:

\underline{R}d(x) = \inf_{y \in U} S(N(R(x, y)), d(y)), \qquad \overline{R}d(x) = \sup_{y \in U} T(R(x, y), d(y)),

where N is the complement (negation) operator, T is a triangular norm (t-norm), and S is a triangular conorm (t-conorm).
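A minimal sketch of these two membership degrees, assuming one common operator choice — the standard negator N(a) = 1 − a, T = min, and S = max (the source does not fix a particular choice):

```python
import numpy as np

def lower_membership(R_x: np.ndarray, d: np.ndarray) -> float:
    """Fuzzy lower-approximation membership of x to granule d:
    inf_y S(N(R(x, y)), d(y)), with N(a) = 1 - a and S = max."""
    return float(np.min(np.maximum(1.0 - R_x, d)))

def upper_membership(R_x: np.ndarray, d: np.ndarray) -> float:
    """Fuzzy upper-approximation membership of x to granule d:
    sup_y T(R(x, y), d(y)), with T = min."""
    return float(np.max(np.minimum(R_x, d)))

R_x = np.array([1.0, 0.8, 0.3])   # relation of x to every sample y in U
d   = np.array([0.9, 0.7, 0.2])   # membership of each y in the granule d

lo = lower_membership(R_x, d)
hi = upper_membership(R_x, d)
assert lo <= hi  # on this data, lower membership does not exceed the upper
```

The lower membership captures the "necessary" degree to which x belongs to d, the upper membership the "possible" degree.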
By approximating the different concepts (information granules g) formed through granulation to each other, the fuzzy lower-approximation membership of any x \in U to d through the relational function R that we construct can be expressed as:

\underline{R}g_j(x_i) = \inf_{y \in U} S(N(R(x_i, y)), g_j(y)).
The value of the lower approximation characterizes the necessary correlation between the information granules composed of two features. The granulation and approximation operations are calculated as shown in Figure 2. The approximation operation measures the degree of correlation between two information granules more objectively and accurately, which leads to the correlation of global features.
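Pairwise lower approximation of all granules against each other can be sketched as below — again assuming the standard negator N(a) = 1 − a and S = max as an illustrative operator choice:

```python
import numpy as np

def approximation_matrix(G: np.ndarray) -> np.ndarray:
    """Lower-approximate every information granule (row of G) by every other:
    A[i, j] = inf_y S(N(G[i, y]), G[j, y]),
    here with N(a) = 1 - a and S = max."""
    n = G.shape[0]
    A = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            A[i, j] = np.min(np.maximum(1.0 - G[i], G[j]))
    return A

# A toy 3 x 3 granulation matrix (reflexive, symmetric, values in [0, 1]).
G = np.array([[1.0, 0.6, 0.1],
              [0.6, 1.0, 0.4],
              [0.1, 0.4, 1.0]])
A = approximation_matrix(G)
assert np.all((A >= 0.0) & (A <= 1.0))  # valid fuzzy membership degrees
```

Each entry A[i, j] measures how necessarily granule i is contained in granule j, which is what makes A usable as a global guidance matrix.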
The rough set-based attention mechanism is constructed from the granulation and approximation operations, where the query (Q), key (K), and value (V) matrices are generated by shared linear transformations of the input features F. The specific calculation of the rough set-based attention mechanism is as follows. First, we use the query and key matrices to calculate the granulation matrix G by R:

G = R(Q, K), \qquad G_{ij} = \exp\left( -\frac{\| q_i - k_j \|^2}{2\sigma^2} \right).
The granulation matrix is naturally normalized to [0, 1] due to the design of the relational function.
Then, G regenerates a weight matrix A of the same size as the granulation matrix via the approximation operation; its values characterize the degree of necessary correlation between information granules and serve as the attention weights. The output features of the rough set-based attention mechanism are the weighted sums of the value vectors with the corresponding attention weights:

F_{out} = AV.
The whole rough set self-attention computation is carried out in the form of information granules, and both its granulation and weighted-sum operators are permutation-invariant. Therefore, the rough set-based attention mechanism is better adapted to the irregularity and disorder of point clouds, while better measuring the correlation between features. The overall approximation-guided representation of rough set attention is shown in Algorithm 1.
Algorithm 1 Approximation-guided representation method based on rough sets
1: Input: a feature matrix (d tokens of length n)
2: Granulation operation:
3: for each token X do
4:     Calculate the relation between the token and the others by Formula (6);
5: end for
6: Obtain the granulation matrix G (n information granules g of length n)
7: Approximation operation:
8: for each information granule g do
9:     Calculate the lower approximation by Formula (9);
10: end for
11: Obtain the weight matrix A
12: return A
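The whole pipeline — granulation of queries against keys, lower-approximation weighting, and the weighted sum of values — can be sketched end to end. This is a non-authoritative sketch: the projection names `Wq`, `Wk`, `Wv`, the bandwidth `sigma`, and the operator choice (N(a) = 1 − a, S = max) are illustrative assumptions, not fixed by the source:

```python
import numpy as np

def rough_set_attention(F, Wq, Wk, Wv, sigma=1.0):
    """Sketch of rough set-based attention: granulate queries against keys
    with a Gaussian relation, derive lower-approximation weights between
    the resulting granules, and take a weighted sum of the value vectors."""
    Q, K, V = F @ Wq, F @ Wk, F @ Wv
    # Granulation: G[i, j] = exp(-||q_i - k_j||^2 / (2 sigma^2)).
    sq = np.sum((Q[:, None, :] - K[None, :, :]) ** 2, axis=-1)
    G = np.exp(-sq / (2.0 * sigma ** 2))
    # Approximation: A[i, j] = min_y max(1 - G[i, y], G[j, y]),
    # using the standard negator and S = max as one concrete choice.
    A = np.min(np.maximum(1.0 - G[:, None, :], G[None, :, :]), axis=-1)
    # Output features: weighted sum of values with the attention weights.
    return A @ V

rng = np.random.default_rng(0)
F = rng.normal(size=(6, 4))                       # 6 points, 4-dim features
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out = rough_set_attention(F, Wq, Wk, Wv)
assert out.shape == (6, 4)
```

Because both the pairwise Gaussian relation and the weighted sum treat the point set symmetrically, reordering the input points only reorders the output rows accordingly, consistent with the permutation-invariance argument above.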