1. Introduction
Topological signal processing (TSP) [1] is gaining momentum in applications ranging from the characterization of brain networks [2] to the monitoring of physical systems [3,4]. Compared to graph signal processing (GSP), TSP extends the representation by associating signals not only with nodes and edges, but also with higher-order connectivity structures such as faces, thereby enabling enriched connectivity modeling. Representing signals on topological domains remains an open issue [5], particularly in the context of dictionary learning.
In this work, we propose a novel framework, termed TSP-SLAM, which establishes a direct link between the texture information provided by Visual Simultaneous Localization and Mapping (V-SLAM) and the topological signal processing of point clouds acquired through multi-camera systems. While V-SLAM has achieved significant advances for 3D reconstruction in static [6] and dynamic [7,8,9] scenarios, graph-based methods relying solely on node-to-node connectivity fail to capture higher-order structures such as planar surfaces and occlusions [10]. Higher-order topological representations, including faces and simplicial complexes, improve robustness and semantic consistency, and recent studies demonstrate that both clustering and geometric learning benefit from such enriched connectivity [10,11,12].
State-of-the-art extensions, such as the integration of deep learning [13] and hardware optimizations [14], further enhance robustness by adaptively handling visual features under varying conditions. SLAM outputs are increasingly enriched with value-aware geometric reconstructions, as in [15], where a dense neural point cloud model encodes attributes such as normals and intensity. Implicit neural surfaces [16] enable signal extraction on dense geometries, while Red-Green-Blue-Depth (RGB-D) and Light Detection and Ranging (LiDAR) information in topology-based models [17] supports loop closure and relocalization, enhancing robustness and scalability. Graph-theoretic signatures (e.g., von Neumann entropy, spanning trees) [18] reduce uncertainty while lowering computational cost. With the growing adoption of real-time navigation and mapping for autonomous systems [19,20], V-SLAM enables on-the-fly mapping and safe adaptation to environmental changes [21], simultaneously performing localization and mapping from camera input [22,23] in dynamic autonomous system applications.
Within this context, the main contribution of this study is the introduction of TSP-SLAM, a texture-aware topological framework that extends the 2D-to-3D geometric mappings of V-SLAM to associate signals not only with individual points in the cloud, but also with higher-order topological structures. In addition, recent work such as [
24] has demonstrated the use of topological descriptors for segmentation and recognition of objects from point clouds. While effective, these methods primarily rely on geometric information and do not exploit luminance or photometric texture cues. Our framework suggests that integrating luminance-based signals alongside geometric descriptors could further benefit tasks such as segmentation and recognition in point cloud processing. Specifically, TSP-SLAM supports the construction of signals over nodes, edges, and faces, thereby enabling advanced topological signal processing strategies. As a case study, we present Topological Multiscale Anisotropic Harmonic Filtering (T-MAHF), which extends to the topological domain a filtering technique (MAHF) originally introduced on graphs and applied successfully to denoising problems, thereby enhancing localization and mapping accuracy [
25]. Numerical results confirm the effectiveness of the enriched representation achieved by TSP-SLAM for robust and expressive point cloud processing.
The remainder of this paper is organized as follows. Section 2 introduces the TSP-SLAM framework, describing the construction of topological structures from V-SLAM point clouds and the assignment of signals to nodes, edges, and faces. Section 3 presents the mathematical formulation and implementation aspects of the T-MAHF algorithm. Section 4 reports numerical experiments and comparative analyses, demonstrating the advantages of TSP-SLAM in terms of the expressiveness of the resulting signal representations. Finally, Section 5 concludes the paper by summarizing the main contributions and outlining directions for future research.
3. Topological Harmonic Filtering in TSP-SLAM
Here, we show an application of TSP-SLAM to extend anisotropic graph filtering to topological spaces. Specifically, we extend a filtering technique previously introduced for images and signals on graphs to the topological space, and we show how it can leverage the side information provided by TSP-SLAM. Thanks to the information acquired by TSP-SLAM, each topological level—vertex, edge, face—carries its own type of signal, allowing for a rich and flexible representation of the geometry and semantics of the point cloud.
Multiscale harmonic filters were originally introduced and widely applied in image processing. Among these, Complex Harmonic Filters [26] operate over a 2D spatial domain as a function of radial distance $r$ and angular orientation $\theta$. The filtering is written as follows:
$$ y^{(m)}(\mathbf{p}) = \big(x * h^{(m)}\big)(\mathbf{p}), $$
where $h^{(m)}$ is defined as a separable function:
$$ h^{(m)}(r,\theta) = g(r)\, e^{j m \theta}. $$
Here, $g(r)$ is a radial envelope, typically an isotropic Gaussian kernel, ensuring spatial localization, and the exponential term encodes angular selectivity. For $m = 0$, the filter is purely radial and yields a real-valued low-pass response, effectively acting as a smoothing operator; for $m \neq 0$, the angular term introduces directional sensitivity, allowing the filter to highlight edge-like structures oriented along specific directions.
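The two regimes described above (smoothing at order zero, directional sensitivity at higher orders) can be illustrated with a minimal sketch. It assumes a Gaussian envelope with a hypothetical width `sigma`, a truncated square support, and a kernel center set to zero for nonzero orders so that flat regions give no response; none of these choices are taken from [26].

```python
import numpy as np

def harmonic_kernel(m, sigma=2.0, radius=6):
    """Complex harmonic kernel g(r) * exp(1j * m * theta) on a square grid.

    sigma and radius are illustrative values, not taken from the paper.
    """
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    r = np.hypot(xx, yy)
    theta = np.arctan2(yy, xx)
    g = np.exp(-r**2 / (2.0 * sigma**2))  # isotropic Gaussian envelope
    g /= g.sum()
    kernel = g * np.exp(1j * m * theta)
    if m != 0:
        kernel[r == 0] = 0.0  # the angle is undefined at the center
    return kernel

def filter_at(image, kernel, row, col):
    """Filter response at one pixel (sum of patch times kernel)."""
    radius = kernel.shape[0] // 2
    patch = image[row - radius: row + radius + 1,
                  col - radius: col + radius + 1]
    return np.sum(patch * kernel)

# Vertical step edge: dark on the left, bright on the right.
image = np.zeros((32, 32))
image[:, 16:] = 1.0

k0 = harmonic_kernel(0)   # order 0: real-valued low-pass kernel
k1 = harmonic_kernel(1)   # order 1: direction-sensitive kernel

smooth = filter_at(image, k0, 16, 16)   # local average across the edge
edge = filter_at(image, k1, 16, 16)     # response on the edge
flat = filter_at(image, k1, 16, 24)     # response inside the bright region

print(abs(edge), np.angle(edge), abs(flat))
```

In this toy case the order-1 response is nonzero only on the step, and its phase points along the horizontal direction of the luminance change, while the order-0 response is a plain local average.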
Recently, the harmonic filtering approach has been extended to non-Euclidean domains in [27], where multiscale anisotropic filters were introduced. The Multiscale Anisotropic Harmonic Filters (MAHF) of
m-th order and centered at
act on a point cloud signal
as follows:
and it is defined as a function of the geodesic distance metric between
and
, and of the angular coordinate of the
j-th vertex on the graph
tangent plane centered at
in the following formulas:
where
is the angular coordinate of
on the tangent plane in
. The function
is a real weighting function depending on the connectivity of the point cloud graph
G. Specifically,
is the so-called heat diffusion kernel, formulated based on the theory of heat diffusion over smooth surfaces, and it is defined as follows. Let
denote the eigenvectors of the graph Laplacian
. The kernel is computed as
Hence,
is related to the differences between the
i-th and
j-th coefficients across the eigenvectors of the Laplacian
, and it is larger when the points
belong to strongly connected areas [
25]. MAHF has been applied to signals on point clouds for different tasks, such as point cloud denoising [
25] or visual quality evaluation [
28]. Still, MAHF filters rely on accurate estimation of the Laplacian associated with the point cloud, in turn depending on the quality of the triangulation algorithm. Point clouds from V-SLAM techniques often capture different objects at a wide set of different distances. This hinders the development of a regular mesh model by conventional triangulation techniques. Hence, useful feature extractors such as MAHF, which proved useful in application like denoising or point cloud quality evaluation, can fail to extract features due to sensitivity to Laplacian/mesh errors.
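As a point of reference for the spectral step, the sketch below builds a heat diffusion kernel from the eigendecomposition of a graph Laplacian. It assumes the standard form exp(-tL) with a combinatorial Laplacian and a hypothetical diffusion time `t`; the exact kernel expression and Laplacian used in [25] may differ.

```python
import numpy as np

def heat_kernel(adjacency, t=1.0):
    """Heat kernel exp(-t L) computed from the Laplacian eigenpairs.

    Assumption: combinatorial Laplacian L = D - A; t is an illustrative
    diffusion time, not a value from the paper.
    """
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # Scale each eigenvector column by exp(-t * eigenvalue), then recombine.
    return (eigvecs * np.exp(-t * eigvals)) @ eigvecs.T

# Toy path graph with 5 nodes: 0 - 1 - 2 - 3 - 4
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0

K = heat_kernel(A, t=1.0)
print(np.round(K[0], 4))  # kernel values between node 0 and every node
```

Every entry depends on the full eigendecomposition, which is why errors in the mesh, and hence in the Laplacian, propagate to all filter weights.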
Making use of the side information provided by signals on edges and faces can lead to a richer estimate of the point cloud features. Specifically, we leverage the side information provided in TSP-SLAM to build an enhanced version of MAHF, suited to topological signal processing, which we refer to as T-MAHF.
In T-MAHF, we leverage the TSP-SLAM framework to generalize MAHF by exploiting side information available at the graph
associated with the point cloud, as detailed below. We define the output of the T-MAHF topological filter at the
i-th vertex as follows:
where, for a generic topological element
representing a node, an edge, or a face, the function
is written as follows:
where
a real weighting function depending on a non-Euclidean distance metric between
and a neighboring element
, i.e., on the connectivity of the topological space
associated with the point cloud, and
a measure of angular distance in the tangent plane at the
i-th vertex.
The TSP-SLAM framework allows us to compute the functions
and
appearing in
with approximate values computed in the TSP-SLAM framework. To this end, let us develop the neighborhood system of the
i-th vertex in the topological space, as established by the incidence matrices
and
. Let us denote the neighborhood of the
i-th vertex by introducing the set
of the neighboring faces, i.e., faces that are incident on the vertex:
Then, we introduce the set
of neighboring edges as those belonging to the incident faces in
Finally we denote the set
of neighboring nodes:
In the following experiments, we set the weighting function as follows:
i.e.,
is an indicator function that specifies whether
—which can be a node, face or edge—belongs to a neighborhood system of the
i-th vertex. Note that, with this definition, the weighting function in Equation (17) equals one for one-hop connected topological elements and zero otherwise. This choice provides a hard (binary) approximation of the interactions between topological elements, in place of the soft definition in (11). More generally, one could define the weighting function as a decreasing function of the topological distance
, namely
where the indicator function adopted in this work corresponds to the simplest binary instance of this general formulation.
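The neighborhood system and the binary weighting can be sketched from the incidence matrices as follows. The names `B1` (nodes by edges) and `B2` (edges by faces) are assumed here, as is the definition of neighboring nodes as the vertices of the incident faces; the paper's own set definitions should be taken as authoritative.

```python
import numpy as np

def neighborhoods(B1, B2, i):
    """Neighboring nodes, edges and faces of vertex i.

    B1: node-to-edge incidence, B2: edge-to-face incidence (entries -1, 0, 1).
    """
    A1, A2 = np.abs(B1), np.abs(B2)
    node_face = (A1 @ A2) > 0                      # vertex belongs to face
    faces = np.flatnonzero(node_face[i])
    edges = np.flatnonzero(A2[:, faces].sum(axis=1) > 0)
    nodes = np.flatnonzero(A1[:, edges].sum(axis=1) > 0)
    return nodes[nodes != i], edges, faces

def indicator_weight(element, neighborhood):
    """Binary weight: 1 inside the neighborhood, 0 outside."""
    return 1.0 if element in neighborhood else 0.0

# Two triangles sharing an edge: faces (0,1,2) and (1,2,3).
# Edges: e0=(0,1), e1=(0,2), e2=(1,2), e3=(1,3), e4=(2,3).
B1 = np.array([[-1, -1,  0,  0,  0],
               [ 1,  0, -1, -1,  0],
               [ 0,  1,  1,  0, -1],
               [ 0,  0,  0,  1,  1]])
B2 = np.array([[ 1,  0],
               [-1,  0],
               [ 1,  1],
               [ 0, -1],
               [ 0,  1]])

nodes0, edges0, faces0 = neighborhoods(B1, B2, 0)
print(nodes0, edges0, faces0)
print(indicator_weight(3, nodes0))  # node 3 is not adjacent to vertex 0
```

Only sparse incidence lookups are needed here, in contrast with a kernel that requires a spectral decomposition.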
The T-MAHF expression is then rewritten as follows:
Let us observe that
is a complex number, whose magnitude describes the intensity of the local variation and whose phase is related to the direction of the variation. In addition, we approximate the angular distance metrics
,
, and
with their 2D counterparts. Specifically, we introduce the following approximations:
and
where
and
are the barycenters of the edges and faces in the neighborhood system of vertex i. The above quantities are directly available in the 2D domain; in the following, we show the limits within which the approximation holds.
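A minimal sketch of the resulting 2D computation is given below. It assumes that the filter output at a vertex is the sum, over neighboring nodes and over edge and face barycenters, of each element's signal multiplied by a complex exponential of its 2D angle seen from the keypoint, with binary weights. This functional form and the function name `tmahf_output` are illustrative assumptions, not the paper's exact expression.

```python
import numpy as np

def tmahf_output(keypoint_xy, neighbor_xy, neighbor_signal, m=1):
    """Assumed T-MAHF-like response at one vertex, computed in the image plane.

    keypoint_xy     : (2,) image coordinates of the vertex
    neighbor_xy     : (K, 2) image coordinates of neighboring keypoints and
                      of edge/face barycenters
    neighbor_signal : (K,) signal value attached to each neighboring element
    """
    d = np.asarray(neighbor_xy, dtype=float) - np.asarray(keypoint_xy, dtype=float)
    theta = np.arctan2(d[:, 1], d[:, 0])          # 2D angular coordinate
    x = np.asarray(neighbor_signal, dtype=float)
    return np.sum(x * np.exp(1j * m * theta))     # binary weights: plain sum

# Four neighbors around the origin: right, left, up, down.
xy = [(1, 0), (-1, 0), (0, 1), (0, -1)]

y_step = tmahf_output((0, 0), xy, [1.0, 0.0, 0.5, 0.5])  # brighter to the right
y_flat = tmahf_output((0, 0), xy, [0.5, 0.5, 0.5, 0.5])  # uniform signal

print(abs(y_step), np.angle(y_step), abs(y_flat))
```

With these inputs the magnitude is nonzero only when the signal varies around the vertex, and the phase indicates the direction toward the larger values.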
As for the computational architecture, MAHF and the proposed T-MAHF share an initial V-SLAM stage of feature detection and sparse reconstruction, which produces the 3D points of the scene. MAHF operates on the reconstructed mesh: it computes the Laplacian and its eigenvectors to build the heat kernel, which is combined with 3D angular information from the mesh normals to enable 3D angle computation and produce a complex-valued filtered signal. This fully exploits the 3D structure but has a higher computational cost. T-MAHF, although it also relies on the mesh topology, works on the 2D image plane and performs 2D angular filtering without spectral decomposition. While computationally lighter, it captures less geometric information, as 3D surface variations are not explicitly modeled.
To sum up, TSP-SLAM allows building a texture dictionary suited to topological point cloud signal processing. The topology-related dictionary is built by extending techniques used in simultaneous localization and mapping algorithms and adapting them to the topological description of the point cloud. This is useful for several developments in which non-Euclidean operators are applied for processing purposes, including topological neural network architectures as discussed in [29].
4. Numerical Results
Herein, we provide a set of results describing how the TSP-SLAM framework is built on real data. We then show how it can be used for topological processing by presenting the application of T-MAHF, with reference to the dataset in [
30].
Figure 3 illustrates the topology acquisition from measurements. Starting from 2D observations of projected point triplets in a stereo pair of images (bottom row), one can recover information about the underlying 3D triangle structure (top center). The middle row shows the schematic projection planes, where the corresponding left and right image coordinates, respectively
and
, are detected.
The two images in [
30] used for topological processing were acquired with a stereo camera configuration, featuring a
m baseline and a resolution of
pixels. For keypoint detection and correspondence estimation, we adopted the method in [7], with the following settings: (i) the focal length of the cameras was 387.77 pixels along both the x- and y-axes; (ii) the principal point of the cameras was located at (257.446, 197.718) pixels; (iii) the maximum horizontal displacement between matched keypoints was limited to 48 pixels, in order to filter out matches that would lead to excessively shallow depth after triangulation. By leveraging geometric constraints and correspondences between the two views, we inferred the spatial position of the points on the original 3D surface as in [7].
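The role of these settings can be illustrated with a standard rectified-stereo triangulation sketch. It assumes a rectified pair, so that the displacement between matched keypoints is purely horizontal. The baseline is passed as a hypothetical parameter `baseline_m`, and the value 0.1 m in the example is a placeholder, not the dataset value. The pipeline in [7] may differ in its details.

```python
import numpy as np

FX = FY = 387.77                 # focal length in pixels (from the text)
CX, CY = 257.446, 197.718        # principal point in pixels (from the text)
MAX_DISPARITY = 48.0             # maximum horizontal displacement in pixels

def triangulate(u_left, v_left, u_right, baseline_m):
    """Back-project one matched keypoint pair to a 3D point (X, Y, Z).

    Returns None when the disparity is non-positive or above the limit.
    """
    disparity = u_left - u_right
    if disparity <= 0 or disparity > MAX_DISPARITY:
        return None
    z = FX * baseline_m / disparity          # depth from disparity
    x = (u_left - CX) * z / FX
    y = (v_left - CY) * z / FY
    return np.array([x, y, z])

# Placeholder baseline of 0.1 m, chosen only for illustration.
point = triangulate(300.0, 200.0, 280.0, baseline_m=0.1)
rejected = triangulate(300.0, 200.0, 240.0, baseline_m=0.1)  # disparity 60

print(point, rejected)
```

Since depth is inversely proportional to disparity, the 48-pixel cap removes matches that would triangulate very close to the cameras.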
Figure 4 represents the detected keypoints from the stereo image pair, with a pseudo-color indicating the distance of the corresponding 3D points with respect to the camera.
The cloud of estimated 3D points is shown in
Figure 5 (left), where the point color corresponds to its 3D depth
. The point cloud was then equipped with a mesh using the Crust algorithm in [
31], leading to a graph
associated to the 3D point cloud, as shown in
Figure 5 (right). The graph was defined with binary edge weights. The graph incidence matrices
and
, with values in
, represent the graph connectivity in terms of edges and faces, thereby enabling the definition of a structure suitable for topological signal processing.
In the TSP-SLAM framework, the topological elements (points, edges and faces) are naturally associated with the signals available in the 2D domain, as illustrated in
Figure 6. Specifically,
Figure 6A shows the projection of the 3D edges
in the 2D domain. At the barycenter
of each edge, we plot a square indicating the average value of the luminance over the edge pixels. In the TSP-SLAM framework, this value will be used as the signal
associated with the edge. Similarly,
Figure 6B shows the projection of the 3D faces
in the 2D domain. At the barycenter
a circle indicates the average value of the luminance over the face pixels, i.e., the signal
that will be associated with the face.
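The edge and face signals described above can be sketched as luminance averages over the projected segment and the projected triangle. The sampling density along the edge and the barycentric inside test are implementation assumptions; points are given as (x, y) pixel coordinates, with x indexing columns and y indexing rows.

```python
import numpy as np

def edge_signal(image, p, q, samples=50):
    """Mean luminance over pixels sampled along the segment p-q."""
    t = np.linspace(0.0, 1.0, samples)
    cols = np.rint(p[0] + t * (q[0] - p[0])).astype(int)
    rows = np.rint(p[1] + t * (q[1] - p[1])).astype(int)
    return image[rows, cols].mean()

def face_signal(image, a, b, c):
    """Mean luminance over the pixels inside the triangle a-b-c."""
    tri = np.array([a, b, c], dtype=float)
    c0, r0 = np.floor(tri.min(axis=0)).astype(int)
    c1, r1 = np.ceil(tri.max(axis=0)).astype(int)
    cols, rows = np.meshgrid(np.arange(c0, c1 + 1), np.arange(r0, r1 + 1))
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    # Barycentric coordinates of every pixel in the bounding box.
    l1 = ((y2 - y3) * (cols - x3) + (x3 - x2) * (rows - y3)) / det
    l2 = ((y3 - y1) * (cols - x3) + (x1 - x3) * (rows - y3)) / det
    l3 = 1.0 - l1 - l2
    inside = (l1 >= 0) & (l2 >= 0) & (l3 >= 0)
    return image[rows[inside], cols[inside]].mean()

# Toy luminance image: a horizontal ramp (value equals the column index).
ramp = np.tile(np.arange(20, dtype=float), (20, 1))
flat = np.full((20, 20), 7.0)

s_edge = edge_signal(ramp, (2, 5), (10, 5))
s_face = face_signal(ramp, (2, 2), (10, 2), (2, 10))
print(s_edge, s_face)
```

On the ramp image both averages land close to the luminance at the barycenter of the element, which is where the signal is displayed in the figures.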
With these positions, we present some results exemplifying possible applications of TSP-SLAM in T-MAHF filtering.
Firstly, it is worth observing that the heat diffusion function, which depends on a non-Euclidean distance metric between two 3D points, is correlated with the distance between the two points in terms of graph hops.
This is illustrated in
Figure 7 (blue circles), showing the scatter plot of the values of the heat diffusion kernel
—computed using a Chebyshev polynomial approximation of heat diffusion over the graph—versus the number of hops
between the same nodes.
Figure 7 also shows a first-order exponential approximation of the underlying relationship. We observe a trend of exponential decay of the heat diffusion kernel
versus the hop distance on the graph. This suggests that a reasonable weighting function can be obtained by retaining only the closest topological elements. This is the rationale behind approximating the T-MAHF weighting function with a neighborhood indicator function, as in (17).
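The relationship between kernel values and hop distance can be reproduced on a toy graph with the sketch below. It computes hop distances by breadth-first search and fits a first-order exponential to the kernel values. The kernel is obtained here by exact eigendecomposition of a combinatorial Laplacian rather than by the Chebyshev approximation used in the experiments, and the diffusion time is an assumed value.

```python
import numpy as np
from collections import deque

def hop_distances(adjacency, source):
    """Number of hops from source to every node (breadth-first search)."""
    hops = np.full(adjacency.shape[0], -1)
    hops[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(adjacency[u]):
            if hops[v] < 0:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops

def heat_kernel(adjacency, t=1.0):
    """Heat kernel exp(-t L), with L = D - A (assumed Laplacian)."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs * np.exp(-t * eigvals)) @ eigvecs.T

# Toy path graph with 8 nodes.
n = 8
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

hops = hop_distances(A, source=0)
K = heat_kernel(A, t=1.0)

# First-order exponential fit: log K(0, j) ~ intercept - decay_rate * hops.
slope, intercept = np.polyfit(hops, np.log(K[0]), 1)
decay_rate = -slope
print(hops, decay_rate)
```

A positive decay rate means that the kernel is dominated by the nearest elements, which is the behavior the binary neighborhood indicator approximates.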
Secondly,
Figure 8 shows the scatter plot of the angles
measured in the 3D domain versus the corresponding angles
computed in the 2D image plane. Due to the ray tracing projection geometry underlying TSP-SLAM, the approximation is expected to hold when the angles belong to faces parallel to the image plane. As the faces are increasingly tilted in the 3D domain, the angles and their projections increasingly differ. This is evident in
Figure 8, where the color of each point encodes the standard deviation of the depth (
) coordinates of the triangle’s vertices in 3D space.
As expected, a low standard deviation indicates that the vertices lie approximately at the same depth, a condition in which the 3D triangle is approximately similar to its 2D projection via ray tracing. For triangles with vertices at largely different depths, i.e., tilted with respect to the image plane, the 2D and 3D angles differ, but in most cases the scattered points lie along the bisector of the first quadrant, indicating that the 2D angles serve as an approximation of the corresponding 3D angles.
As illustrated in
Figure 8, the discrepancy between 3D and 2D angular measurements increases with the depth variance of the face vertices, a phenomenon consistent with foreshortening effects reported in the structure-from-motion literature [
32]. Reprojection errors and keypoint noise, as discussed in [
33,
34], are non-negligible in practical V-SLAM pipelines, but can be explicitly modeled to improve reconstruction reliability. Incorporating higher-order geometric primitives, such as faces and simplicial complexes, further helps to stabilize and refine point cloud representations by enforcing local planarity and structural consistency [
35].
While exact 3D angle computation is computationally expensive, empirical evidence shows that 2D angular estimates align closely with their 3D counterparts when faces are approximately parallel to the image plane. Even for tilted faces, the scatter distribution remains concentrated around the bisector, indicating that 2D estimates provide a statistically reliable approximation. The fast approximation provided by the 2D angles can therefore be adopted, allowing TSP-SLAM to significantly reduce computational cost while maintaining geometric consistency at the topological level.
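The dependence of the angular discrepancy on depth variation can be checked with a small pinhole-projection sketch. It uses the intrinsics quoted earlier and compares one interior angle of a triangle in 3D with the same angle after projection. The two triangles are synthetic examples, and the interior angle is used here only as a simple stand-in for the angular coordinates compared in the experiments.

```python
import numpy as np

F, CX, CY = 387.77, 257.446, 197.718   # intrinsics quoted in the text

def project(points_3d):
    """Pinhole projection of (N, 3) camera-frame points to pixel coordinates."""
    p = np.asarray(points_3d, dtype=float)
    return np.column_stack((F * p[:, 0] / p[:, 2] + CX,
                            F * p[:, 1] / p[:, 2] + CY))

def angle_at_first_vertex(tri):
    """Interior angle (radians) at the first vertex of a triangle."""
    u, v = tri[1] - tri[0], tri[2] - tri[0]
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cosine, -1.0, 1.0))

# Synthetic triangles: one parallel to the image plane, one tilted in depth.
frontal = np.array([[0.0, 0.0, 2.0], [0.5, 0.0, 2.0], [0.3, 0.4, 2.0]])
tilted = np.array([[0.0, 0.0, 2.0], [0.5, 0.0, 2.0], [0.3, 0.4, 3.0]])

frontal_gap = abs(angle_at_first_vertex(frontal)
                  - angle_at_first_vertex(project(frontal)))
tilted_gap = abs(angle_at_first_vertex(tilted)
                 - angle_at_first_vertex(project(tilted)))

print(frontal[:, 2].std(), frontal_gap)   # zero depth spread, matching angles
print(tilted[:, 2].std(), tilted_gap)     # larger depth spread, larger gap
```

For the fronto-parallel triangle the projection is a uniform scaling, so the angle is preserved; the tilted triangle shows the foreshortening that makes the 2D and 3D angles diverge.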
Finally,
Figure 9 and
Figure 10 present two examples of the result of topological harmonic filtering within the TSP-SLAM framework. For better visualization, the filter output at the 3D point
is displayed at the corresponding keypoint
, overlaid on the image. The color of
in the top panel represents the absolute value of the T-MAHF filter output on a logarithmic scale, i.e.,
, while the bottom panel represents, in the same manner, the phase of the T-MAHF filter output on a logarithmic scale, i.e.,
. We observe that T-MAHF allows the ranking of points according to larger magnitudes
; for these points, the phase
is an estimate of the direction of signal variation. In the bottom panel, detail in the right box, a few points characterized by larger magnitudes
are highlighted to illustrate the meaning of the phase component
. Although visually distinct, the selected points reveal a direction of the signal variation along angles of 0 (light green),
(red) and
(fuchsia). Therefore, the variation corresponds to a horizontal discontinuity. The same applies to the detail in the left box, where horizontal or slightly tilted edges are recognized.
It is worth noting that, although visualized in the 2D domain for clarity, this information refers to the signal defined on the point cloud graph and its topology. Such information can be employed for various applications, including point cloud classification or loop closure recognition. At the same time, thanks to the TSP-SLAM framework, this information can be computed directly on 2D data, increasing robustness and reducing computational complexity compared to computations in the full 3D domain.
A few remarks are in order regarding the applicability of the method to point clouds with larger numbers of points. In principle, the approach scales naturally to different resolutions, as illustrated in
Figure 11 and
Figure 12. In these examples, we considered the stereo projections—obtained with the same camera parameters as in [
30]—of a synthetic toroidal grayscale point cloud with
and
N = 20,000, respectively. For both cases, in addition to the 2D projections, we show the point cloud signal
and the output of the T-MAHF filtering
in terms of its magnitude
and phase
, computed from the 2D signal information. In a practical V-SLAM framework, when points are acquired from only two cameras, the 3D point set is restricted to the salient points for which reliable correspondences can be established. Conversely, if the cameras are complemented with LiDAR data, the generated point cloud becomes significantly richer.
Finally, we compared the computational complexity of T-MAHF with that of the existing MAHF [25]. The two algorithms share several tasks, such as salient point extraction and matching and 3D point cloud and mesh computation, but differ in the filtering stage. In T-MAHF, a binary adjacency matrix is computed, and filtering is carried out through signal extraction and distance/relative angle computation. In contrast, MAHF requires computing the real-valued conformal Laplacian and performing its eigenanalysis to calculate the filtering weights as functions of
and
.
Table 2 reports the computation time of the two methods in milliseconds per point, measured on four frames (1, 137, 981, and 1041 in [30]) and broken down by task. On average, T-MAHF reduces the total computation time by about 40% compared to MAHF. Note that the cost of the face and edge signal extraction task grows with the number of pixels included in the averages in (5) and (6). Still, T-MAHF allows faster topological signal processing of point clouds by exploiting the potential of the TSP-SLAM framework.