Topological Signal Processing from Stereo Visual SLAM

Salvo, Eleonora Di; Latino, Tommaso; Sanzone, Maria; Trozzo, Alessia; Colonnese, Stefania

doi:10.3390/s25196103


by Eleonora Di Salvo, Tommaso Latino, Maria Sanzone, Alessia Trozzo and Stefania Colonnese *

Department of Information Engineering, Electronics and Telecommunications, Sapienza University of Rome, 00184 Rome, Italy

* Author to whom correspondence should be addressed.

Sensors 2025, 25(19), 6103; https://doi.org/10.3390/s25196103

Submission received: 31 July 2025 / Revised: 22 September 2025 / Accepted: 1 October 2025 / Published: 3 October 2025

(This article belongs to the Special Issue Stereo Vision Sensing and Image Processing)


Abstract

Topological signal processing is emerging alongside Graph Signal Processing (GSP) in various applications, incorporating higher-order connectivity structures—such as faces—in addition to nodes and edges, for enriched connectivity modeling. Rich point clouds acquired by multi-camera systems in Visual Simultaneous Localization and Mapping (V-SLAM) are typically processed using graph-based methods. In this work, we introduce a topological signal processing (TSP) framework that integrates texture information extracted from V-SLAM; we refer to this framework as TSP-SLAM. We show how TSP-SLAM enables the extension of graph-based point cloud processing to more advanced topological signal processing techniques. We demonstrate, on real stereo data, that TSP-SLAM enables a richer point cloud representation by associating signals not only with vertices but also with edges and faces of the mesh computed from the point cloud. Numerical results show that TSP-SLAM supports the design of topological filtering algorithms by exploiting the mapping between the 3D mesh faces, edges and vertices and their 2D image projections. These findings confirm the potential of TSP-SLAM for topological signal processing of point cloud data acquired in challenging V-SLAM environments.

Keywords:

Graph Signal Processing (GSP); Harmonic functions; stereo camera; Topological Signal Processing (TSP); Visual Simultaneous Localization and Mapping (V-SLAM)

1. Introduction

Topological signal processing [1] is gaining momentum in different applications ranging from the characterization of brain networks [2] to the monitoring of physical systems [3,4]. Compared to GSP, TSP extends the representation by associating signals not only with nodes and edges, but also with higher-order connectivity structures such as faces, thereby enabling enriched connectivity modeling. Representing signals on topological domains remains an ongoing issue [5], particularly in the context of dictionary learning.

In this work, we propose a novel framework, termed TSP-SLAM, which establishes a direct link between the texture information provided by Visual Simultaneous Localization and Mapping (V-SLAM) and the topological signal processing of point clouds acquired through multi-camera systems. While V-SLAM has achieved significant advances for 3D reconstruction in static [6] and dynamic [7,8,9] scenarios, graph-based methods relying solely on node-to-node connectivity fail to capture higher-order structures such as planar surfaces and occlusions [10]. Higher-order topological representations, including faces and simplicial complexes, improve robustness and semantic consistency, and recent studies demonstrate that both clustering and geometric learning benefit from such enriched connectivity [10,11,12].

State-of-the-art extensions, such as the integration of deep learning [13] and hardware optimizations [14], further enhance robustness by adaptively handling visual features under varying conditions. SLAM outputs are increasingly enriched with value-aware geometric reconstructions, as in [15], where a dense neural point cloud model encodes attributes such as normals and intensity. Implicit neural surfaces [16] enable signal extraction on dense geometries, while Red, Green, and Blue-Depth (RGB-D) and Light Detection and Ranging (LiDAR) information in topology-based models [17] supports loop closure and relocalization and enhances robustness and scalability. Graph-theoretic signatures (e.g., von Neumann entropy, spanning trees) [18] reduce uncertainty while lowering computational cost. With the growing adoption of real-time navigation and mapping for autonomous systems [19,20], V-SLAM enables on-the-fly mapping and safe adaptation to environmental changes [21], simultaneously performing localization and mapping using camera input [22,23] in dynamic autonomous system applications.

Within this context, the main contribution of this study is the introduction of TSP-SLAM, a texture-aware topological framework that extends the 2D-to-3D geometric mappings of V-SLAM to associate signals not only with individual points in the cloud, but also with higher-order topological structures. In addition, recent work such as [24] has demonstrated the use of topological descriptors for segmentation and recognition of objects from point clouds. While effective, these methods primarily rely on geometric information and do not exploit luminance or photometric texture cues. Our framework suggests that integrating luminance-based signals alongside geometric descriptors could further benefit tasks such as segmentation and recognition in point cloud processing. Specifically, TSP-SLAM supports the construction of signals over nodes, edges, and faces, thereby enabling advanced topological signal processing strategies. As a case study, we present Topological Multiscale Anisotropic Harmonic Filtering (T-MAHF), which extends to the topological domain a filtering technique (MAHF) originally introduced on graphs and applied successfully to denoising problems, thereby enhancing localization and mapping accuracy [25]. Numerical results confirm the effectiveness of the enriched representation achieved by TSP-SLAM for robust and expressive point cloud processing.

The remainder of this paper is organized as follows. Section 2 introduces the TSP-SLAM framework, providing a detailed description of the construction of topological structures from V-SLAM point clouds and the assignment of signals to nodes, edges, and faces. In Section 3, we present the mathematical formulation and implementation aspects of the T-MAHF algorithm. Section 4 reports the results of numerical experiments and comparative analyses, demonstrating the advantages of TSP-SLAM in terms of expressiveness of the resulting signal representations. Finally, Section 5 concludes the paper by summarizing the main contributions and outlining potential directions for future research.

2. Topological Signal Processing Framework from Visual SLAM

In this section, we introduce TSP-SLAM, a topological signal processing framework [1] enriched with texture information extracted from visual SLAM.

2.1. Signal Model in Visual-SLAM

Herein, we review the system outlined in Figure 1, reflecting a commonly adopted acquisition and tracking architecture as in [7]. The acquisition setup is stereo-based and relies on two sensors: a left camera, taken as reference, and a right camera, employed for 3D reconstruction. The left and right camera images are obtained by sampling the ideal bidimensional signals

I (\tilde{p}), I^{'} ({\tilde{p}}^{'}), \tilde{p}, {\tilde{p}}^{'} \in R^{2}

(1)

on a discrete

M \times N

grid. In V-SLAM, localization and mapping techniques are used to reconstruct a 3D point cloud from the stereo image disparity map, and the point features are used for object tracking and classification purposes. After stereo acquisition using the two calibrated cameras (left and right), which capture synchronized images, a set of distinctive keypoints is extracted from both images using feature extraction techniques such as ORB, SIFT, or BRISK. Let us denote these 2D points as

{\tilde{p}}_{i}

for the left image and

{\tilde{p}}_{i}^{'}

for the right image, with

i = 0, \dots, N - 1

, where i indexes the detected feature points in the left and right images. A feature matching algorithm based on the Hamming distance is used to associate points between the two images, allowing the computation of the disparity for each match as

d_{i} = \| {\tilde{p}}_{i} - {\tilde{p}}_{i}^{'} \|_{1}

. In homogeneous coordinates, where affine transformations can be represented as matrix multiplications, each point cloud vertex is represented by augmenting its 3D coordinates with a unit component. The pin-hole model describes the relationship between each three-dimensional point

p_{i} \in R^{3}, i = 0, \dots N - 1

and its projection onto the images, depending on the cameras’ intrinsic parameters such as optical center, focal length, and optical axes. For a point cloud vertex at depth z from the camera axis origin, taking into account the camera focal length l, the relation between the 3D coordinates

p_{i} = {[p_{i x}, p_{i y}, p_{i z}]}^{T}

and their 2D counterparts

{\tilde{p}}_{i} = {[u_{i}, v_{i}]}^{T}

in the image domain is written as follows:

p_{i z} \begin{bmatrix} {\tilde{p}}_{i} \\ 1 \end{bmatrix} = \underbrace{\begin{bmatrix} \frac{l}{s_{x}} & \frac{l}{s_{x}} \cot (θ) & u_{0} & 0 \\ 0 & \frac{l}{s_{y}} & v_{0} & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}}_{P} \begin{bmatrix} p_{i} \\ 1 \end{bmatrix}

(2)

where the perspective projection matrix

P

accounts for the origin of the image coordinate system at the principal point

{[u_{0}, v_{0}]}^{T}

, the pixel dimensions

(s_{x}, s_{y})

, and the angle between the axes

θ \approx \frac{π}{2}

, which may deviate slightly from a right angle because of manufacturing imperfections. The outcome of this process is a 3D point cloud of vertices

p_{i} \in R^{3}, i = 0, \dots, N - 1

. A real-valued attribute signal is associated with each vertex, representing the color components of the points or other attributes inherited from the original imaging system (e.g., temperature or reflectance).

In the following, we assume that the signal associated with the i-th point cloud vertex is the luminance of the corresponding 2D point as captured by the reference (left) camera. The point cloud can be further structured into a surface mesh using Delaunay triangulation, which promotes geometric consistency and helps avoid poorly shaped (e.g., degenerate or sliver) triangles. With these definitions in place, we introduce the topological signal processing framework as follows. The notation is summarized in Table 1.
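As an illustration of the pin-hole relation in Equation (2), the following minimal sketch projects a 3D point onto the reference image. It assumes zero skew (θ = π/2) and intrinsics already expressed in pixels, using the focal length (387.77) and principal point (257.446, 197.718) reported for the dataset in Section 4. The function names `projection_matrix` and `project`, and the sample 3D point, are hypothetical and chosen only for illustration.

```python
import numpy as np

def projection_matrix(fx, fy, u0, v0, skew=0.0):
    """3x4 perspective projection matrix P.

    fx, fy play the role of l/s_x and l/s_y; `skew` is the off-diagonal
    term, which vanishes when the pixel axes are orthogonal (assumed here).
    """
    return np.array([[fx, skew, u0, 0.0],
                     [0.0, fy, v0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])

def project(P, p):
    """Project a 3D point p = [x, y, z] (camera frame, z > 0) to pixels [u, v]."""
    hom = P @ np.append(p, 1.0)   # equals p_z * [u, v, 1]
    return hom[:2] / hom[2]       # divide by the depth p_z

# Intrinsics taken from the experimental settings; the 3D point is made up.
P = projection_matrix(387.77, 387.77, 257.446, 197.718)
p = np.array([0.5, -0.2, 4.0])
uv = project(P, p)
```

A point on the optical axis projects onto the principal point, which gives a quick sanity check of the matrix layout.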

2.2. Topological Signal Processing from V-SLAM Data

Let us define the graph

G

associated with the point cloud as

G = (V, E)

, where

V

is the set of N point cloud vertices

p_{i}

, with

i = 0, \dots, N - 1

, and

E

is the set of edges. A topologically enriched graph structure is described by the vertex-to-edge incidence matrix

B_{1} \in R^{| V | \times | E |}

and the edge-to-face incidence matrix

B_{2} \in R^{| E | \times | F |}

. Furthermore, the vertex Laplacian

L_{0} = B_{1} B_{1}^{⊤}

, and the edge Laplacian

L_{1} = B_{1}^{⊤} B_{1} + B_{2} B_{2}^{⊤}

act as discrete differential operators on signals defined on the topological space.

As a toy example, we illustrate these concepts in Figure 2, which represents a graph

G

with

N = 5

nodes,

N_{e} = 7

edges, and

N_{f} = 3

faces. The

N \times N_{e}

vertex-to-edge incidence matrix

B_{1}

and the

N_{e} \times N_{f}

edge-to-face incidence matrix

B_{2}

are defined as follows:

B_{1} = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 1 \end{bmatrix}, \quad B_{2} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} .

(3)

In topological signal processing, signals are assigned to different topological elements of the graph, i.e., nodes, edges, and faces, enabling topological signal processing tasks like smoothing or denoising in the edge and face domains. For instance, with reference to the toy example in Figure 2, overall,

N + N_{e} + N_{f} = 15

signal values can be defined on the node, edge, and face elements.
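The toy example can be checked numerically. The sketch below encodes the two incidence matrices of Equation (3) and verifies the element counts and basic incidence properties. It is only an illustration: the entries in Equation (3) are unsigned, so the product of B_1 with its transpose computed here equals the degree matrix plus the adjacency matrix, whereas an oriented (signed) incidence matrix would yield the usual degree-minus-adjacency Laplacian.

```python
import numpy as np

# Incidence matrices of the toy example (unsigned, as printed in Equation (3)).
B1 = np.array([[1, 1, 1, 0, 0, 0, 0],
               [1, 0, 0, 1, 1, 0, 0],
               [0, 1, 0, 1, 0, 1, 1],
               [0, 0, 1, 0, 0, 1, 0],
               [0, 0, 0, 0, 1, 0, 1]])
B2 = np.array([[0, 1, 0],
               [1, 1, 0],
               [1, 0, 0],
               [0, 1, 1],
               [0, 0, 1],
               [1, 0, 0],
               [0, 0, 1]])

N, Ne = B1.shape          # vertices x edges
Nf = B2.shape[1]          # edges x faces

edges_ok = (B1.sum(axis=0) == 2).all()   # every edge joins two vertices
faces_ok = (B2.sum(axis=0) == 3).all()   # every face is bounded by three edges

# Vertex-to-face incidence: entry (i, k) is 2 when vertex i is a corner of
# face k (two of the face's edges meet there), and 0 otherwise.
VF = B1 @ B2

# With unsigned entries this is D + A; its diagonal holds the vertex degrees.
L0 = B1 @ B1.T
```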

Let us denote by

s (p_{i})

the signal at the i-th node, by

s (e_{j})

the signal at the j-th graph edge, and by

s (f_{k})

the signal at the k-th graph face.

In general, at the node level, signals

s (p_{i})

may represent sensor data directly acquired at each point vertex, e.g., RGB color values. At the edge level, signals

s (e_{j})

may encode pairwise relationships between adjacent vertices, e.g., geometric descriptors such as their Euclidean distance or the differences in vertex attributes. At the face level, signals

s (f_{k})

can capture higher-order geometric features, such as surface normals or texture patterns extracted from the original image.

In TSP-SLAM, we propose that the signals at different levels are obtained from the texture side information available through the V-SLAM framework. In detail, the signal at the nodes

p_{i}

, edges

e_{j}

and faces

f_{k}

are derived from the photometric values pertaining to their projections, i.e., the points

{\tilde{p}}_{i}

, segments

{\tilde{e}}_{j}

or triangles

{\tilde{f}}_{k}

lying in the original reference image used by the V-SLAM technique.

The way values on edges

e_{j}

and faces

f_{k}

are determined from photometric information associated with their projected segments,

{\tilde{e}}_{j}

and

{\tilde{f}}_{k}

, is not unique. Each topological element corresponds to a set of pixels in the 2D domain: the j-th edge

e_{j}

corresponds to the set

E_{j}

of pixels along the segment

{\tilde{e}}_{j}

, while the k-th face

f_{k}

corresponds to the set

F_{k}

of pixels within the triangle

{\tilde{f}}_{k}

. A reasonable approach is to associate each edge

e_{j}

and face

f_{k}

with a signal obtained by averaging the photometric values of the corresponding set of pixels

E_{j}

,

F_{k}

in the image plane. Without loss of generality, we adopt this method in the following, where we show how TSP-SLAM enables topological signal processing on V-SLAM-acquired point clouds, assigning distinct signal information to vertices, edges, and faces. Nevertheless, alternative strategies for defining vector-valued signals [1] on topological elements can be designed, and their exploration is left for future work.

Specifically, we exploit the photometric information available in the TSP-SLAM framework and define the signal

s (p_{j})

at the j-th vertex as the signal of its 2D projection

s (p_{j}) = s ({\tilde{p}}_{j}),

(4)

the signal

s (e_{j})

at the j-th edge as the average of the signal over the set

E_{j}

of pixels belonging to the segment

{\tilde{e}}_{j}

,

s (e_{j}) = \frac{1}{| E_{j} |} \sum_{\tilde{p} \in E_{j}} s (\tilde{p}),

(5)

and the signal

s (f_{k})

at the k-th face

f_{k}

as the average over the set

F_{k}

of pixels within the triangle

{\tilde{f}}_{k}

,

s (f_{k}) = \frac{1}{| F_{k} |} \sum_{\tilde{p} \in F_{k}} s (\tilde{p}) .

(6)
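The averaging in Equations (5) and (6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pixel set of an edge is approximated by sampling the projected segment roughly once per pixel, the pixel set of a face is taken as the pixels whose centers fall inside the projected triangle (barycentric test), triangles are assumed non-degenerate, and all points are assumed to lie within the image. The function names `edge_signal` and `face_signal` are hypothetical.

```python
import numpy as np

def edge_signal(img, a, b):
    """Mean luminance over pixels along the 2D segment a-b (points as [u, v])."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n = int(np.ceil(np.abs(b - a).max())) + 1        # about one sample per pixel
    pts = np.rint(np.linspace(a, b, n)).astype(int)  # nearest pixel to each sample
    pts = np.unique(pts, axis=0)                     # count each pixel once
    return img[pts[:, 1], pts[:, 0]].mean()          # row index = v, column = u

def face_signal(img, a, b, c, tol=1e-9):
    """Mean luminance over pixels whose centers lie inside triangle a-b-c."""
    tri = np.array([a, b, c], float)
    u0, v0 = np.floor(tri.min(axis=0)).astype(int)
    u1, v1 = np.ceil(tri.max(axis=0)).astype(int)
    uu, vv = np.meshgrid(np.arange(u0, u1 + 1), np.arange(v0, v1 + 1))
    uu, vv = uu.ravel(), vv.ravel()
    # Barycentric coordinates: p - c = l1 (a - c) + l2 (b - c), l3 = 1 - l1 - l2.
    T = np.column_stack([tri[0] - tri[2], tri[1] - tri[2]])
    l12 = np.linalg.solve(T, np.stack([uu - tri[2, 0], vv - tri[2, 1]]))
    l3 = 1.0 - l12.sum(axis=0)
    inside = (l12 >= -tol).all(axis=0) & (l3 >= -tol)
    return img[vv[inside], uu[inside]].mean()

# Synthetic luminance ramp: img[v, u] = u.
ramp = np.tile(np.arange(16.0), (16, 1))
s_edge = edge_signal(ramp, (2, 5), (10, 5))
s_face = face_signal(ramp, (0, 0), (4, 0), (0, 4))
```

The vertex signal of Equation (4) is simply the image value at the keypoint, e.g. `img[v_i, u_i]` after rounding the keypoint coordinates.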

3. Topological Harmonic Filtering in TSP-SLAM

Here, we show an application of TSP-SLAM to extend anisotropic graph filtering to topological spaces. Specifically, we extend a filtering technique previously introduced for images and signals on graphs to the topological space, and we show how it can leverage the side information provided by TSP-SLAM. Thanks to the information acquired by TSP-SLAM, each topological level—vertex, edge, face—carries its own type of signal, allowing for a rich and flexible representation of the geometry and semantics of the point cloud.

Multiscale harmonic filters were formerly introduced and widely applied in image processing. Among these, Complex Harmonic Filters [26] operate over a 2D spatial domain as a function of radial distance and angular orientation. The filtering is written as follows:

I_{C H F} (\tilde{p}) = \sum_{n = 0}^{N - 1} {\tilde{h}}^{(m)} (\tilde{p}, {\tilde{p}}_{n}) I ({\tilde{p}}_{n})

(7)

where

{\tilde{h}}^{(m)} (\tilde{p}, {\tilde{p}}_{n})

is defined as a separable function:

{\tilde{h}}^{(m)} (\tilde{p}, {\tilde{p}}_{n}) = g_{m} (\| \tilde{p} - {\tilde{p}}_{n} \|) e^{j m \arg (\tilde{p} - {\tilde{p}}_{n})}

(8)

g_{m} (\cdot)

is a radial envelope—typically an isotropic Gaussian kernel—ensuring spatial localization, and the exponential term encodes angular selectivity. For

m = 0

, the filter is purely radial and yields a real-valued low-pass response, effectively acting as a smoothing operator, and for

m = 1

, the angular term introduces directional sensitivity, allowing the filter to highlight edge-like structures oriented along specific directions.
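The behavior of the two orders can be illustrated with a sampled version of the kernel in Equation (8). The sketch below assumes a Gaussian radial envelope with a hypothetical width `sigma`, a square support of hypothetical half-size `radius`, and sets the kernel to zero at the center for nonzero orders, where the angle is undefined. These choices, and the function names, are illustrative assumptions rather than the filters of [26].

```python
import numpy as np

def chf_kernel(m, sigma=2.0, radius=6):
    """Sampled kernel g(r) * exp(j * m * arg(p - p_n)) on a square support."""
    ax = np.arange(-radius, radius + 1)
    dx, dy = np.meshgrid(ax, ax)          # offsets of the neighbor p_n from p
    r = np.hypot(dx, dy)
    theta = np.arctan2(-dy, -dx)          # angle of p - p_n
    h = np.exp(-r**2 / (2 * sigma**2)) * np.exp(1j * m * theta)
    if m != 0:
        h[radius, radius] = 0.0           # angle undefined at the center
    return h

def chf_response(img, row, col, m, **kw):
    """Filter output at pixel (row, col); the support must fit in the image."""
    h = chf_kernel(m, **kw)
    R = h.shape[0] // 2
    patch = img[row - R:row + R + 1, col - R:col + R + 1]
    return np.sum(h * patch)

# Vertical luminance step: zero on the left, one from column 10 onward.
step = np.zeros((21, 21))
step[:, 10:] = 1.0
smooth = chf_response(step, 10, 10, m=0)   # real-valued local average
edge = chf_response(step, 10, 10, m=1)     # complex edge response
```

On a constant image the first-order response cancels by symmetry, while on the step its magnitude is large and its phase reflects the orientation of the discontinuity.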

Recently, the harmonic filtering approach has been extended to non-Euclidean domains in [27], where multiscale anisotropic filters have been introduced. The Multiscale Anisotropic Harmonic Filter (MAHF) of order m centered at

p_{i}

acts on a point cloud signal

s (p_{i})

as follows:

r (p_{i}) = \sum_{j = 0}^{N - 1} h^{(m)} (p_{i}, p_{j}) s (p_{j})

(9)

where the kernel is defined as a function of the geodesic distance between

p_{i}

and

p_{j}

, and of the angular coordinate of the j-th vertex on the plane tangent to the graph

G

at

p_{i}

, as follows:

h^{(m)} (p_{i}, p_{j}) = K_{t}^{(G)} (p_{i}, p_{j}) (cos (m φ^{(G)} (p_{i}, p_{j})) + j sin (m φ^{(G)} (p_{i}, p_{j})))

(10)

where

φ^{(G)} (p_{i}, p_{j})

is the angular coordinate of

p_{j}

on the tangent plane in

p_{i}

. The function

K_{t}^{(G)} (p_{i}, p_{j})

is a real weighting function depending on the connectivity of the point cloud graph G. Specifically,

K_{t}^{(G)} (p_{i}, p_{j})

is the so-called heat diffusion kernel, formulated based on the theory of heat diffusion over smooth surfaces, and it is defined as follows. Let

U = [u_{0}, \dots u_{N - 1}]

denote the eigenvectors of the graph Laplacian

L_{0}

, with associated eigenvalues

λ_{0}, \dots, λ_{N - 1}

. The kernel is computed as

K_{t}^{(G)} (p_{i}, p_{j}) = \sum_{n = 0}^{N - 1} e^{- λ_{n} t} u_{n} [i] u_{n} [j] .

(11)

Hence,

K_{t}^{(G)} (p_{i}, p_{j})

is related to the differences between the i-th and j-th coefficients across the eigenvectors of the Laplacian

L_{0}

, and it is larger when the points

p_{i}, p_{j}

belong to strongly connected areas [25]. MAHF has been applied to signals on point clouds for different tasks, such as point cloud denoising [25] and visual quality evaluation [28]. Still, MAHF relies on an accurate estimate of the Laplacian associated with the point cloud, which in turn depends on the quality of the triangulation algorithm. Point clouds from V-SLAM techniques often capture different objects over a wide range of distances, which hinders the construction of a regular mesh model by conventional triangulation techniques. Hence, feature extractors such as MAHF, which proved useful in applications such as denoising and point cloud quality evaluation, can fail to extract features because of their sensitivity to Laplacian and mesh errors.
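The heat kernel of Equation (11) and the MAHF output of Equations (9) and (10) can be sketched on a small graph. The example below is an illustration under assumptions: it uses a four-node path graph with the combinatorial Laplacian (degree minus adjacency), a hypothetical diffusion time `t`, and takes the tangent-plane angles as a precomputed input matrix `phi`. The function names are hypothetical.

```python
import numpy as np

def heat_kernel(L, t):
    """K_t = sum_n exp(-lambda_n t) u_n u_n^T for a symmetric Laplacian L."""
    lam, U = np.linalg.eigh(L)            # eigenvalues and orthonormal eigenvectors
    return (U * np.exp(-lam * t)) @ U.T   # scale column n of U by exp(-lambda_n t)

def mahf(K, phi, s, m):
    """r_i = sum_j K[i, j] * exp(j * m * phi[i, j]) * s[j]."""
    return (K * np.exp(1j * m * phi)) @ s

# Path graph 0 - 1 - 2 - 3 with binary edge weights.
A = np.zeros((4, 4))
for i in range(3):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

K = heat_kernel(L, t=0.5)
```

In this example the kernel values seen from node 0 decrease as the hop distance grows, which is the qualitative behavior the indicator-based weighting used later in this section relies on.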

Making use of the side information provided by signals on edges and faces can lead to a richer estimate of the point cloud features. Specifically, we leverage the side information provided in TSP-SLAM to build an enhanced version of MAHF, suited to topological signal processing, which we refer to as T-MAHF.

In T-MAHF, we leverage the TSP-SLAM framework to generalize MAHF by exploiting side information available at the graph

G

associated with the point cloud, as detailed below. We define the output of the T-MAHF topological filter at the i-th vertex as follows:

r (p_{i}) = \underbrace{\sum_{j = 0}^{N - 1} h^{(m)} (p_{i}, p_{j}) s (p_{j})}_{\text{signal on vertices}} + \underbrace{\sum_{j = 0}^{N_{e} - 1} h^{(m)} (p_{i}, e_{j}) s (e_{j})}_{\text{signal on edges}} + \underbrace{\sum_{j = 0}^{N_{f} - 1} h^{(m)} (p_{i}, f_{j}) s (f_{j})}_{\text{signal on faces}}

(12)

where, for a generic topological element

x_{j}

representing a node, an edge, or a face, the function

h^{(m)} (p_{i}, x_{j})

is written as follows:

h^{(m)} (p_{i}, x_{j}) = K_{α}^{(T)} (p_{i}, x_{j}) (cos (m φ^{(T)} (p_{i}, x_{j})) + j sin (m φ^{(T)} (p_{i}, x_{j})))

(13)

where

K_{α}^{(T)} (p_{i}, x_{j})

is a real weighting function depending on a non-Euclidean distance metric between

p_{i}

and a neighboring element

x_{j}

, i.e., on the connectivity of the topological space

T

associated with the point cloud, and

φ^{(T)} (p_{i}, x_{j})

is a measure of angular distance in the tangent plane at the i-th vertex.

The TSP-SLAM framework allows us to approximate the functions

K_{α}^{(T)} (p_{i}, x_{j})

and

φ^{(T)} (p_{i}, x_{j})

appearing in

h^{(m)} (p_{i}, x_{j})

. To this end, let us define the neighborhood system of the i-th vertex in the topological space, as established by the incidence matrices

B_{1}

and

B_{2}

. Let us denote the neighborhood of the i-th vertex by introducing the set

η_{i}^{(f)}

of the neighboring faces, i.e., faces that are incident on the vertex:

η_{i}^{(f)} = \{f_{k} | {[B_{1} B_{2}]}_{i, k} \neq 0\} .

(14)

Then, we introduce the set

η_{i}^{(e)}

of neighboring edges as those belonging to the incident faces in

η_{i}^{(f)}

η_{i}^{(e)} = \{e_{j} | B_{2} (j, k) \neq 0 for at least one f_{k} \in η_{i}^{(f)}\} .

(15)

Finally, we introduce the set

η_{i}^{(p)}

of neighboring nodes:

η_{i}^{(p)} = \{p_{j} | B_{1} (i, k) \cdot B_{1} (j, k) \neq 0 for at least one k\} .

(16)
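The three neighborhood sets can be extracted directly from the incidence matrices. The sketch below applies the definitions to the toy example of Equation (3), with 0-based indices. It takes absolute values of the incidence entries as a precaution, since for oriented (signed) incidence matrices the product of the vertex-to-edge and edge-to-face matrices vanishes identically. Following Equation (16) literally, the node set also contains the vertex itself. The function name `neighborhoods` is hypothetical.

```python
import numpy as np

B1 = np.array([[1, 1, 1, 0, 0, 0, 0],
               [1, 0, 0, 1, 1, 0, 0],
               [0, 1, 0, 1, 0, 1, 1],
               [0, 0, 1, 0, 0, 1, 0],
               [0, 0, 0, 0, 1, 0, 1]])
B2 = np.array([[0, 1, 0], [1, 1, 0], [1, 0, 0], [0, 1, 1],
               [0, 0, 1], [1, 0, 0], [0, 0, 1]])

def neighborhoods(B1, B2, i):
    """Indices of neighboring nodes, edges and faces of vertex i."""
    B1a, B2a = np.abs(B1), np.abs(B2)                   # drop orientation signs
    faces = np.flatnonzero((B1a @ B2a)[i])              # faces incident on vertex i
    edges = np.flatnonzero(B2a[:, faces].sum(axis=1))   # edges of those faces
    nodes = np.flatnonzero(B1a @ B1a[i])                # vertices sharing an edge with i
    return nodes, edges, faces

nodes, edges, faces = neighborhoods(B1, B2, 0)
```

For vertex 0 of the toy graph this yields two incident faces, the five edges bounding them, and four nodes (the vertex itself plus its three adjacent vertices).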

In the following experiments, we set the weighting function as follows:

K_{α}^{(T)} (p_{i}, x_{j}) = \begin{cases} 1 & x_{j} \in η_{i}^{(\cdot)} \\ 0 & \text{otherwise} \end{cases}

(17)

i.e.,

K_{α}^{(T)} (p_{i}, x_{j})

is an indicator function that specifies whether

x_{j}

—which can be a node, face or edge—belongs to a neighborhood system of the i-th vertex or not. It should be noted that with this definition, the values of the weighting function in Equation (17) are larger (equal to one) for one-hop connected topological elements, and zero otherwise. This choice provides a hard approximation of topology element interactions extending the soft definition in (11). More generally, one could define the weighting function as a decreasing function of the topological distance

d_{T} (p_{i}, x_{j})

, namely

K_{α}^{(T)} (p_{i}, x_{j}) = g (d_{T} (p_{i}, x_{j})), \quad g : R^{+} \to [0, 1], \quad g (0) = 1, \quad g (d) \downarrow 0 \text{ as } d \to \infty,

(18)

where the indicator function adopted in this work corresponds to the simplest binary instance of this general formulation.

The T-MAHF expression is then rewritten as follows:

r (p_{i}) = \underbrace{\sum_{j \in η_{i}^{(p)}} h^{(m)} (p_{i}, p_{j}) s (p_{j})}_{N \text{ vertices}} + \underbrace{\sum_{j \in η_{i}^{(e)}} h^{(m)} (p_{i}, e_{j}) s (e_{j})}_{N_{e} \text{ edges}} + \underbrace{\sum_{j \in η_{i}^{(f)}} h^{(m)} (p_{i}, f_{j}) s (f_{j})}_{N_{f} \text{ faces}}

(19)

Let us observe that

r (p_{i}) = M_{i} e^{j ϑ_{i}}

is a complex number, whose magnitude describes the intensity of the local variation and whose phase is related to the direction of the variation. In addition, we approximate the angular distance metrics

φ^{(T)} (p_{i}, p_{j})

,

φ^{(T)} (p_{i}, e_{j})

, and

φ^{(T)} (p_{i}, f_{j})

with their 2D counterparts. Specifically, we introduce the following approximations:

φ^{(T)} (p_{i}, p_{j}) \approx ψ ({\tilde{p}}_{i}, {\tilde{p}}_{j}),

(20)

φ^{(T)} (p_{i}, e_{j}) \approx ψ ({\tilde{p}}_{i}, β_{j}^{(e)}), \quad β_{j}^{(e)} = \frac{1}{| E_{j} |} \sum_{\tilde{p} \in E_{j}} \tilde{p}

(21)

and

φ^{(T)} (p_{i}, f_{j}) \approx ψ ({\tilde{p}}_{i}, β_{j}^{(f)}), \quad β_{j}^{(f)} = \frac{1}{| F_{j} |} \sum_{\tilde{p} \in F_{j}} \tilde{p}

(22)

where

β_{j}^{(e)}

and

β_{j}^{(f)}

are the barycenters of the edges and faces in the neighborhood system of the i-th vertex. The above quantities are directly available in the 2D domain, and in the following we show the limits within which the above approximations hold.
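Putting Equations (17) and (19) to (22) together, the filter output at a vertex can be sketched as below. This is a minimal illustration under several assumptions that are not stated in the text: the 2D angle ψ is taken as the orientation of the vector from the keypoint of vertex i to the target point, pixel barycenters are approximated by the mean of the projected endpoints (edges) or corners (faces), and the vertex itself is excluded from its node neighborhood because its angle is undefined. The function name `t_mahf` and its argument layout are hypothetical.

```python
import numpy as np

def t_mahf(i, kp, edges, faces, s_p, s_e, s_f, nb_p, nb_e, nb_f, m=1):
    """Complex T-MAHF output at vertex i with indicator weights.

    kp    : (N, 2) keypoints [u, v] in the reference image
    edges : (Ne, 2) vertex indices per edge; faces : (Nf, 3) per face
    s_*   : signals on nodes, edges, faces; nb_* : neighbor index arrays
    """
    def term(centers, signal, idx):
        d = centers[idx] - kp[i]                 # 2D offsets from keypoint i
        psi = np.arctan2(d[:, 1], d[:, 0])       # assumed definition of psi
        return np.sum(signal[idx] * np.exp(1j * m * psi))

    bary_e = kp[edges].mean(axis=1)              # approximate edge barycenters
    bary_f = kp[faces].mean(axis=1)              # approximate face barycenters
    nb_p = nb_p[nb_p != i]                       # angle undefined for i itself
    return (term(kp, s_p, nb_p)
            + term(bary_e, s_e, nb_e)
            + term(bary_f, s_f, nb_f))

# Tiny configuration: vertex 0 with one neighbor on each side, no faces.
kp = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0]])
edges = np.array([[0, 1], [0, 2]])
faces = np.empty((0, 3), dtype=int)
nb_p, nb_e, nb_f = np.array([0, 1, 2]), np.array([0, 1]), np.array([], dtype=int)
s_e, s_f = np.zeros(2), np.zeros(0)

r_edge = t_mahf(0, kp, edges, faces, np.array([5.0, 2.0, 0.0]), s_e, s_f, nb_p, nb_e, nb_f)
r_flat = t_mahf(0, kp, edges, faces, np.array([5.0, 1.0, 1.0]), s_e, s_f, nb_p, nb_e, nb_f)
```

With equal signal values on the two opposite neighbors the first-order output cancels, while an imbalance produces a nonzero magnitude whose phase points toward the larger value.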

As for the computational architecture, the proposed MAHF and T-MAHF methods share an initial V-SLAM feature detection and sparse reconstruction, producing 3D points of the scene. MAHF operates on the reconstructed mesh, computing the Laplacian and its eigenvectors to build the heat kernel, which is combined with 3D angular information from mesh normals to produce a complex-valued filtered signal and enable 3D angle computation. This fully exploits the 3D structure but has higher computational cost. T-MAHF, instead, although relying on the mesh topology, works on the 2D image plane, performing 2D angular filtering without spectral decomposition. While computationally lighter, it captures less geometric information, as 3D surface variations are not explicitly modeled.

To sum up, TSP-SLAM allows building a texture dictionary suited to topological point cloud signal processing. The topology-related dictionary is built by extending techniques used in simultaneous localization and mapping algorithms and adapting them to the point cloud topology description. This is useful for several developments in which non-Euclidean operators are applied for processing purposes, including topological neural network architectures as discussed in [29].

4. Numerical Results

Herein, we provide a set of results describing how the TSP-SLAM framework is built on real data. We then show how it can be used for topological processing by presenting the application of T-MAHF, with reference to the dataset in [30]. Figure 3 illustrates the topology acquisition from measurements. Starting from 2D observations of projected point triplets in a stereo pair of images (bottom row), one can recover information about the underlying 3D triangle structure (top center). The middle row shows the schematic projection planes, where the corresponding left and right image coordinates, respectively

[u_{i}, v_{i}]

and

[u_{i}^{'}, v_{i}^{'}]

, are detected.

The two images in [30] used for topological processing were acquired with a stereo camera configuration, featuring a

0.24

m baseline and a resolution of

512 \times 384

pixels. For keypoint detection and correspondence estimation, we adopted the method in [7], with the following settings: (i) the focal length of the cameras was 387.77 pixels for both x- and y- axes; (ii) the principal point of the cameras was located at coordinates (257.446, 197.718), specified in pixels; (iii) the maximum horizontal displacement between keypoints was limited to 48 pixels in order to filter matches that could lead to excessively shallow depth once triangulation is performed. By leveraging geometric constraints and correspondences between the two views, we inferred the spatial position of the points on the original 3D surface as in [7].

Figure 4 represents the detected keypoints from the stereo image pair, with a pseudo-color indicating the distance of the corresponding 3D points with respect to the camera.

The cloud of estimated 3D points is shown in Figure 5 (left), where the point color corresponds to its 3D depth

p_{i z}

. The point cloud was then equipped with a mesh using the Crust algorithm in [31], leading to a graph

G

associated with the 3D point cloud, as shown in Figure 5 (right). The graph was defined with binary edge weights. The graph incidence matrices

B_{1}

and

B_{2}

, with values in

\{- 1, 0, 1\}

, represent the graph connectivity in terms of edges and faces, thereby enabling the definition of a structure suitable for topological signal processing.

In the TSP-SLAM framework, the topological elements (points, edges and faces) are naturally associated with the signals available in the 2D domain, as illustrated in Figure 6. Specifically, Figure 6A shows the projection of the 3D edges

e_{j}, j = 0, \dots N_{e} - 1

in the 2D domain. At the barycenter

β_{j}

of each edge, we plot a square indicating the average value of the luminance over the edge pixels. In the TSP-SLAM framework, this value will be used as the signal

s (e_{j})

associated with the edge. Similarly, Figure 6B shows the projection of the 3D faces

f_{j}, j = 0, \dots N_{f} - 1

in the 2D domain. At the barycenter

β_{j}

a circle indicates the average value of the luminance over the face pixels, i.e., the signal

s (f_{j})

that will be associated with the face.

With these positions, we present some results exemplifying possible applications of TSP-SLAM in T-MAHF filtering.

Firstly, it is worth observing that the heat diffusion function

K_{t}^{(G)} (p_{i}, p_{j})

, depending on a non-Euclidean distance metric between two 3D points

p_{i}

and

p_{j}

, is correlated to the distance

d_{h o p} (p_{i}, p_{j})

between the two points in terms of graph hops.

This is illustrated in Figure 7 (blue circles), showing the scatter plot of the values of the heat diffusion kernel

K_{t}^{(G)} (p_{i}, p_{j})

—computed using a Chebyshev polynomial approximation of heat diffusion over the graph—versus the number of hops

d_{h o p} (p_{i}, p_{j})

between the same nodes. Figure 7 also shows a first-order exponential approximation of the underlying relationship. We observe a trend of exponential decay of the heat diffusion kernel

K_{t}^{(G)} (p_{i}, p_{j})

versus the hop distances on the graph. This suggests that a reasonable weighting function can be obtained by retaining the closest topological elements.

This is the rationale behind the choice of approximating the T-MAHF weighting function with a neighborhood indicator function, as in (17).

Secondly, Figure 8 shows the scatter plot of the angles

φ^{(T)} (p_{i}, p_{j})

measured in the 3D domain versus the corresponding angles

ψ ({\tilde{p}}_{i}, {\tilde{p}}_{j})

computed in the 2D image plane. Due to the ray tracing projection geometry underlying TSP-SLAM, the approximation is expected to hold when the angles belong to faces parallel to the image plane. As the faces are increasingly tilted in the 3D domain, the angles and their projections increasingly differ. This is evident in Figure 8, where the color of each point encodes the standard deviation of the depth (

p_{i z}

) coordinates of the triangle’s vertices in 3D space.

As expected, a low standard deviation indicates that the vertices lie (approximately) at the same depth, a condition in which the 3D triangle is approximately similar to its 2D projection. For triangles with vertices at largely different depths, i.e., tilted with respect to the image plane, the 2D and 3D angles differ, but in most cases the scattered points lie close to the bisector of the first quadrant, indicating that the 2D angles serve as an approximation of the corresponding 3D angles.

As illustrated in Figure 8, the discrepancy between 3D and 2D angular measurements increases with the depth variance of the face vertices, a phenomenon consistent with foreshortening effects reported in the structure-from-motion literature [32]. Reprojection errors and keypoint noise, as discussed in [33,34], are non-negligible in practical V-SLAM pipelines, but can be explicitly modeled to improve reconstruction reliability. Incorporating higher-order geometric primitives, such as faces and simplicial complexes, further helps to stabilize and refine point cloud representations by enforcing local planarity and structural consistency [35].

While exact 3D angle computation is computationally expensive, empirical evidence shows that 2D angular estimates align closely with their 3D counterparts when faces are approximately parallel to the image plane. Even for tilted faces, the scatter distribution remains concentrated around the bisector, indicating that 2D estimates provide a statistically reliable approximation. This trade-off allows TSP-SLAM to significantly reduce computational cost while maintaining geometric consistency at the topological level: in place of the 3D angles

φ^{(T)} (p_{i}, p_{j})

, the fast approximation provided by the 2D estimated angles

ψ ({\tilde{p}}_{i}, {\tilde{p}}_{j})

can be adopted.
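
The effect of face tilt on the 2D approximation can be illustrated with a minimal sketch. It assumes an ideal pinhole projection with focal length `focal` and uses the interior angle at a triangle vertex as a stand-in for the angles φ and ψ; the function names, the toy triangles and the focal length are hypothetical and are not taken from the paper's implementation.

```python
import numpy as np

def interior_angle(a, b, c):
    """Angle at vertex a of triangle (a, b, c), in radians; works in 2D or 3D."""
    u, v = b - a, c - a
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos_t, -1.0, 1.0))

def project(p, focal=1.0):
    """Pinhole projection of a 3D point [x, y, z] onto the image plane (assumed model)."""
    return focal * p[:2] / p[2]

def angle_gap(tri3d, focal=1.0):
    """Absolute difference between the 3D angle at the first vertex and its 2D counterpart."""
    tri2d = [project(p, focal) for p in tri3d]
    return abs(interior_angle(*tri3d) - interior_angle(*tri2d))

# Fronto-parallel triangle: all vertices at depth z = 10.
flat = [np.array([0.0, 0.0, 10.0]), np.array([1.0, 0.0, 10.0]), np.array([0.0, 1.0, 10.0])]
# Tilted triangle: same x, y coordinates, depths spread over 10, 10.5 and 11.
tilted = [np.array([0.0, 0.0, 10.0]), np.array([1.0, 0.0, 10.5]), np.array([0.0, 1.0, 11.0])]

gap_flat = angle_gap(flat)
gap_tilted = angle_gap(tilted)
depth_std_tilted = np.std([p[2] for p in tilted])
print(gap_flat, gap_tilted, depth_std_tilted)
```

For the fronto-parallel triangle the projection is a uniform scaling, so the two angles coincide; for the tilted one the gap grows with the spread of the vertex depths, which is the quantity color-coded in Figure 8.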

Finally, Figure 9 and Figure 10 present two examples of the result of topological harmonic filtering within the TSP-SLAM framework. For better visualization, the filter output at the 3D point

p_{i}

is displayed at the corresponding keypoint

{\tilde{p}}_{i}

, overlaid on the image. The color of

{\tilde{p}}_{i}

in the top panel represents the absolute value of the T-MAHF filter output on a logarithmic scale, i.e.,

log (M_{i})

, while the bottom panel represents, in the same manner, the phase of the T-MAHF filter output, i.e.,

ϑ_{i}

. We observe that T-MAHF allows the points to be ranked by magnitude, singling out those with larger

M_{i}

; for these points, the phase

ϑ_{i}

is an estimate of the direction of signal variation. In the detail shown in the right box of the bottom panel, a few points characterized by larger magnitudes

M_{i}

are highlighted to illustrate the meaning of the phase component

ϑ_{i}

. Although visually distinct, the selected points reveal a direction of the signal variation along angles of 0 (light green),

π

(red) and

- π

(fuchsia). Therefore, the variation corresponds to a horizontal discontinuity. The same applies to the detail in the left box, where horizontal or slightly tilted edges are recognized.
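
The reading of the filter output described above can be sketched as follows, assuming the T-MAHF output is available as one complex value per point. The function names, the sample values and the angular tolerance are hypothetical; only the split into magnitude and phase and the interpretation of phases near 0 or ±π follow the text.

```python
import numpy as np

def magnitude_phase(r):
    """Split complex filter outputs into magnitude M_i and phase theta_i in (-pi, pi]."""
    r = np.asarray(r, dtype=complex)
    return np.abs(r), np.angle(r)

def top_k(magnitude, k):
    """Indices of the k largest magnitudes, strongest first."""
    return np.argsort(magnitude)[::-1][:k]

def is_horizontal_variation(theta, tol=np.pi / 8):
    """True where the phase lies within tol of 0 or +/-pi (hypothetical tolerance)."""
    return np.abs(np.sin(theta)) < np.sin(tol)

# Toy filter outputs for four points (illustrative values only).
r = [3 + 0j, -2 + 0j, 0.1 + 0.1j, 0 + 4j]
M, theta = magnitude_phase(r)
log_M = np.log(M)                      # quantity shown in the top panels
strongest = top_k(M, 2)                # points with the largest response
horizontal = is_horizontal_variation(theta)
```

Testing `|sin(theta)|` rather than the phase itself treats 0, π and -π alike, so the wrap-around at ±π needs no special handling.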

It is worth noting that, although visualized in the 2D domain for clarity, this information refers to the signal defined on the point cloud graph and its topology. Such information can be employed for various applications, including point cloud classification or loop closure recognition. At the same time, thanks to the TSP-SLAM framework, this information can be computed directly on 2D data, increasing robustness and reducing computational complexity compared to computations in the full 3D domain.

A few remarks are in order regarding the applicability of the method to point clouds with larger numbers of points. In principle, the approach scales naturally to different resolutions, as illustrated in Figure 11 and Figure 12. In these examples, we considered the stereo projections—obtained with the same camera parameters as in [30]—of a synthetic toroidal grayscale point cloud with

N = 1000

and N = 20,000, respectively. For both cases, in addition to the 2D projections, we show the point cloud signal

s (p_{i})

and the output of the T-MAHF filtering

r (p_{i})

in terms of its magnitude

M_{i}

and phase

ϑ_{i}

, computed from the 2D signal information. In a practical V-SLAM framework, when points are acquired from only two cameras, the 3D point set is restricted to the salient points for which reliable correspondences can be established. Conversely, if the cameras are complemented with Lidar data, the generated point cloud becomes significantly richer.
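
A synthetic experiment of this kind can be set up along the following lines. This is a sketch under stated assumptions: the torus radii, its distance from the cameras, the focal length and the baseline are placeholder values (not the camera parameters of [30]), and a rectified pinhole stereo model is assumed.

```python
import numpy as np

def sample_torus(n, major=2.0, minor=0.5, depth=10.0, seed=0):
    """Sample n points on a torus centred at (0, 0, depth); radii and depth are hypothetical."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.0, 2.0 * np.pi, n)  # angle around the main ring
    b = rng.uniform(0.0, 2.0 * np.pi, n)  # angle around the tube
    x = (major + minor * np.cos(b)) * np.cos(a)
    y = (major + minor * np.cos(b)) * np.sin(a)
    z = depth + minor * np.sin(b)
    return np.stack([x, y, z], axis=1)

def stereo_project(points, focal, baseline):
    """Rectified pinhole stereo: left camera at the origin, right camera shifted along x."""
    x, y, z = points.T
    left = np.stack([focal * x / z, focal * y / z], axis=1)
    right = np.stack([focal * (x - baseline) / z, focal * y / z], axis=1)
    return left, right

focal, baseline = 500.0, 0.25  # placeholder values, not the parameters of [30]
cloud = sample_torus(1000)
left, right = stereo_project(cloud, focal, baseline)
disparity = left[:, 0] - right[:, 0]
depth_from_disparity = focal * baseline / disparity
```

Changing the argument of `sample_torus` from 1000 to 20000 reproduces the second resolution; the depth recovered from the disparity matches the sampled depth in this noise-free setting.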

Finally, we carried out a comparison of the computational complexity of the T-MAHF method with the existing MAHF [25]. The two algorithms share several tasks—such as salient points extraction and matching, and 3D point cloud and mesh computation—but differ in the filtering stage. In T-MAHF a binary adjacency matrix is computed, and filtering is carried out through signal extraction and distance/relative angle computation. In contrast, the MAHF algorithm requires computing the real-valued conformal Laplacian and performing its eigen analysis to calculate the filtering weights as functions of

K_{t}^{(G)}

and

φ^{(G)}

. Table 2 reports the computation time of the two methods in milliseconds per point, analyzed on four frames (1, 137, 981 and 1041 in [30]) and broken down by task. On average, T-MAHF achieves a total computation time reduction of about 40% compared to MAHF. Note that the complexity of the face and edge signal extraction task grows as more pixels are included in the averages in (5) and (6). Still, T-MAHF allows faster topological signal processing of point clouds by exploiting the potential of the TSP-SLAM framework.
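
To make the difference in the filtering stage concrete, the binary adjacency matrix used by T-MAHF can be built directly from the mesh faces, with no eigen analysis involved. The sketch below is illustrative only: the function name and the toy mesh are hypothetical, and the faces are assumed to be given as triples of vertex indices.

```python
import numpy as np

def adjacency_from_faces(faces, n_vertices):
    """Binary, symmetric adjacency matrix of the mesh graph defined by triangular faces."""
    A = np.zeros((n_vertices, n_vertices), dtype=np.uint8)
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (i, k)):
            A[a, b] = 1
            A[b, a] = 1
    return A

faces = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]  # toy mesh with 5 vertices (hypothetical)
A = adjacency_from_faces(faces, n_vertices=5)
n_edges = int(A.sum()) // 2  # each undirected edge is stored twice
```

Building `A` only requires one pass over the faces, whereas the MAHF weights depend on the eigen analysis of a real-valued Laplacian.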

5. Conclusions

Topological point cloud processing is gaining momentum, particularly for objects acquired through dedicated multi-camera systems. This paper presents a methodology to build a texture dictionary suited for topological point cloud signal processing. The topology-related dictionary is constructed by extending techniques used in simultaneous localization and mapping (SLAM) algorithms and adapting them to the point cloud topology description. This approach is useful for various developments where non-Euclidean operators can be applied for processing purposes. The proposed framework, validated through numerical experiments, demonstrated stable heat diffusion consistent with topological distances, reliable approximations of 3D geometric relations from 2D projections, and improved robustness to mesh irregularities. Quantitative analyses further confirmed the effectiveness of TSP-SLAM in enhancing denoising, segmentation, and perceptual quality evaluation tasks, showing clear advantages over graph-based baselines. Future work will focus on extending the framework to more general topological structures, including cell complexes.

Author Contributions

Conceptualization, S.C. and E.D.S.; methodology, software, validation, E.D.S., T.L., M.S. and A.T.; writing—original draft preparation, E.D.S.; writing—review and editing, E.D.S., T.L., M.S. and A.T.; formal analysis, supervision, S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the European Union—Next Generation EU under the Italian National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.3, CUP B53C22004050001, partnership on “Telecommunications of the Future” (PE00000001—program “RESTART”) in the NetWin project.

Data Availability Statement

The data used in this study are available in the public domain UTIAS Long-Term Localization and Mapping Dataset at http://asrl.utias.utoronto.ca/datasets/2020-vtr-dataset/, accessed on 22 September 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Barbarossa, S.; Sardellitti, S. Topological signal processing: Making sense of data building on multiway relations. IEEE Signal Process. Mag. 2020, 37, 174–183.
2. Lecha, M.; Cavallo, A.; Dominici, F.; Levi, R.; Del Bue, A.; Isufi, E.; Morerio, P.; Battiloro, C. Directed Semi-Simplicial Learning with Applications to Brain Activity Decoding. arXiv 2025, arXiv:2505.17939.
3. Cattai, T.; Sardellitti, S.; Colonnese, S.; Cuomo, F.; Barbarossa, S. Leak Detection in Water Distribution Networks Using Topological Signal Processing. In Proceedings of the 33rd European Signal Processing Conference (EUSIPCO) 2025, Palermo, Italy, 8–12 September 2025.
4. Cattai, T.; Sardellitti, S.; Colonnese, S.; Cuomo, F.; Barbarossa, S. Physics-Informed Topological Signal Processing for Water Distribution Network Monitoring. arXiv 2025, arXiv:2505.07560.
5. Grimaldi, E.; Battiloro, C.; Lorenzo, P.D. Topological Dictionary Learning. arXiv 2025, arXiv:2503.11470.
6. Cai, D.; Li, R.; Hu, Z.; Lu, J.; Li, S.; Zhao, Y. A comprehensive overview of core modules in visual SLAM framework. Neurocomputing 2024, 590, 127760.
7. Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
8. Wang, Y.; Tian, Y.; Chen, J.; Xu, K.; Ding, X. A survey of visual SLAM in dynamic environment: The evolution from geometric to semantic approaches. IEEE Trans. Instrum. Meas. 2024, 73, 1–21.
9. Di Salvo, E.; Bellucci, S.; Celidonio, V.; Rossini, I.; Colonnese, S.; Cattai, T. Visual Localization Domain for Accurate V-SLAM from Stereo Cameras. Sensors 2025, 25, 739.
10. Guinard, S.; Vallet, B. Weighted simplicial complex reconstruction from mobile laser scanning using sensor topology. arXiv 2018, arXiv:1804.04001.
11. Grande, V.P.; Schaub, M.T. Topological point cloud clustering. arXiv 2023, arXiv:2303.16716.
12. Saranti, A.; Pfeifer, B.; Gollob, C.; Stampfer, K.; Holzinger, A. From 3D point-cloud data to explainable geometric deep learning: State-of-the-art and future challenges. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2024, 14, e1554.
13. Fontan, A.; Civera, J.; Milford, M. AnyFeature-VSLAM: Automating the Usage of Any Chosen Feature into Visual SLAM. In Proceedings of the Robotics: Science and Systems, Delft, The Netherlands, 15–19 July 2024; Volume 2.
14. Tenzin, S.; Rassau, A.; Chai, D. Application of event cameras and neuromorphic computing to VSLAM: A survey. Biomimetics 2024, 9, 444.
15. Sandström, E.; Li, Y.; Van Gool, L.; Oswald, M.R. Point-SLAM: Dense Neural Point Cloud-based SLAM. arXiv 2023, arXiv:2304.04278.
16. Pan, Y.; Zhong, X.; Wiesmann, L.; Posewsky, T.; Behley, J.; Stachniss, C. PIN-SLAM: LiDAR SLAM using a point-based implicit neural representation for achieving global map consistency. IEEE Trans. Robot. 2024, 30, 4045–4064.
17. Muravyev, K.; Melekhin, A.; Yudin, D.; Yakovlev, K. PRISM-TopoMap: Online Topological Mapping with Place Recognition and Scan Matching. arXiv 2024, arXiv:2404.01674.
18. Indelman, V.; Kitanov, A. Topological Belief Space Planning for Active SLAM with Pairwise Gaussian Potentials and Performance Guarantees. IEEE Trans. Robot. 2024, 43, 69–97.
19. He, N.; Yang, Z.; Bu, C.; Fan, X.; Wu, J.; Sui, Y.; Que, W. Learning Autonomous Navigation in Unmapped and Unknown Environments. Sensors 2024, 24, 5925.
20. Salvo, E.D.; Beghdadi, A.; Cattai, T.; Cuomo, F.; Colonnese, S. Boosting UAVs Live Uplink Streaming by Video Stabilization. IEEE Access 2024, 12, 121291–121304.
21. Zheng, S.; Wang, J.; Rizos, C.; Ding, W.; El-Mowafy, A. Simultaneous localization and mapping (SLAM) for autonomous driving: Concept and analysis. Remote Sens. 2023, 15, 1156.
22. Chen, W.; Shang, G.; Ji, A.; Zhou, C.; Wang, X.; Xu, C.; Hu, K. An overview on visual SLAM: From tradition to semantic. Remote Sens. 2022, 14, 3010.
23. Taketomi, T.; Uchiyama, H.; Ikeda, S. Visual SLAM algorithms: A survey from 2010 to 2016. IPSJ Trans. Comput. Vis. Appl. 2017, 9, 1–11.
24. Ghosh, A.; Kulbaka, I.; Dahlin, I.; Dutta, A. TopoRec: Point Cloud Recognition Using Topological Data Analysis. arXiv 2025, arXiv:2506.18725.
25. Cattai, T.; Delfino, A.; Scarano, G.; Colonnese, S. VIPDA: A Visually Driven Point Cloud Denoising Algorithm Based on Anisotropic Point Cloud Filtering. Front. Signal Process. 2022, 2, 842570.
26. Neri, A.; Jacovitti, G. Maximum likelihood localization of 2-d patterns in the Gauss-Laguerre transform domain: Theoretic framework and preliminary results. IEEE Trans. Image Process. 2004, 13, 72–86.
27. Conti, F.; Scarano, G.; Colonnese, S. Multiscale anisotropic harmonic filters on non euclidean domains. In Proceedings of the 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 23–27 August 2021; pp. 701–705.
28. Di Salvo, E.; Beghdadi, A.; Cattai, T.; Lumare, C.; Scarano, G.; Colonnese, S. NEUF: Learning Point Cloud Quality by Non-Euclidean Fast filtering. IEEE Access 2025, 13, 81677–81689.
29. Pham, P.; Bui, Q.T.; Nguyen, N.T.; Kozma, R.; Yu, P.S.; Vo, B. Topological data analysis in graph neural networks: Surveys and perspectives. IEEE Trans. Neural Networks Learn. Syst. 2025, 36, 9758–9776.
30. UTIAS In the Dark and Multiseason datasets. University of Toronto Institute for Aerospace Studies. Available online: http://asrl.utias.utoronto.ca/datasets/2020-vtr-dataset/ (accessed on 30 September 2025).
31. Amenta, N. The crust algorithm for 3 D surface reconstruction. In Annual Symposium on Computational Geometry: Proceedings of the Fifteenth Annual Symposium on Computational Geometry, Miami Beach, FL, USA, 13–16 June 1999; Association for Computing Machinery: New York, NY, USA, 1999; Volume 13, pp. 423–424.
32. Lebeda, K.; Hadfield, S.; Bowden, R. 2D or not 2D: Bridging the gap between tracking and structure from motion. In Proceedings of the Asian Conference on Computer Vision, Singapore, 1–5 November 2014; pp. 642–658.
33. Im, G. Notes on Various Errors and Jacobian Derivations for SLAM. arXiv 2024, arXiv:2406.06422.
34. Theodorou, C.; Velisavljevic, V.; Dyo, V.; Nonyelu, F. Visual SLAM algorithms and their application for AR, mapping, localization and wayfinding. Array 2022, 15, 100222.
35. Roberto, R.A.; Uchiyama, H.; Lima, J.P.S.; Nagahara, H.; Taniguchi, R.i.; Teichrieb, V. Incremental structural modeling on sparse visual SLAM. IPSJ Trans. Comput. Vis. Appl. 2017, 9, 5.

Figure 1. Illustration of the stereo acquisition architecture employed for 3D scene reconstruction from synchronized video streams: the figure depicts the left and right image planes capturing a 3D object, represented as a mesh, and the projections of a 3D point

p

onto both views, i.e.,

\tilde{p}

,

{\tilde{p}}^{'}

.

Figure 2. Graph

G

, with

N = 5

nodes,

N_{e} = 7

edges, and

N_{f} = 3

faces:

N + N_{e} + N_{f} = 15

signal values are defined on the node, edge, and face elements. Topological signal processing operates on these signals by exploiting neighborhood relationships established in the graph.

Figure 3. Schematic illustration of the recovery of 3D topological structure from 2D measurements: Starting from 2D observations of projected point triplets in a stereo image pair (bottom row), one can recover information about the underlying 3D triangular structure (top center). The middle row shows the schematic projection planes, where the corresponding left and right image coordinates, respectively

{\tilde{p}}_{i} = {[u_{i}, v_{i}]}^{T}

and

{\tilde{p}}_{i}^{'} = {[u_{i}^{'}, v_{i}^{'}]}^{T}

, are detected. By leveraging geometric constraints and correspondences between the two views, it becomes possible to infer the spatial configuration of the original 3D surface, enabling the reconstruction of its topological structure.

Figure 4. Detected keypoints from the stereo image pair: The features are extracted using ORB and are uniformly distributed across the image to ensure consistent spatial coverage. Each point is color-coded according to its depth value

p_{i z}

, namely, blue for

p_{i z} < 8

, green for

8 \leq p_{i z} < 30

, yellow for

30 \leq p_{i z} < 60

, and red for

p_{i z} \geq 60

. This visualization highlights the spatial structure of the scene and reflects the depth-aware distribution of features, providing a meaningful basis for subsequent geometric and topological analyses.

Figure 5. Three-dimensional point cloud and its corresponding topological structure: (left) The 3D point cloud

p_{i}

,

i = 0, \dots N - 1

, built on keypoints detected in the stereo image pair and triangulated into 3D coordinates; (right) associated topological structure. On the left, the point cloud is color-coded based on depth (

p_{i z}

,

i = 0, \dots N - 1

) values: blue for closer points and red for farther ones. On the right, a topological graph is constructed by connecting the 3D keypoints according to the Crust algorithm, forming edges that reflect the underlying geometric structure. This representation enables the analysis of both geometric and topological features from image-derived 3D data.

Figure 6. Illustration of the signals associated with edges and faces at their barycenters. In Panel (A), the signal

s (e_{j})

associated with the edge

e_{j}

is illustrated by a square located at the edge barycenter

β_{j}^{(e)}

, colored according to the average luminance over the edge pixels. In Panel (B), the signal

s (f_{j})

associated with the face

f_{j}

is shown at the barycenter

β_{j}^{(f)}

of the triangle as a circle, colored according to the average luminance over the triangle pixels. The luminance levels of the squares and circles clearly reflect the brightness of the regions on which they fall.

Figure 7. Heat kernel decay with respect to the distance between nodes: scatter plot of the values of the heat kernel

K_{t}^{(G)} (p_{i}, p_{j})

, versus the number of hops

δ_{i j}

between nodes

p_{i}

,

p_{j}

(blue points), along with a first-order exponential approximation of the underlying relationship (red line).

Figure 8. Scatter plot of the angles

φ (p_{i}, p_{j})

of the 3D faces versus the corresponding angles

ψ ({\tilde{p}}_{i}, {\tilde{p}}_{j})

computed in the 2D image plane: the point color represents the standard deviation of the depth (

σ_{z}

) measured over the coordinates of the 3D triangles. A low standard deviation indicates that the vertices approximately lie at the same depth, i.e., the 3D triangle is affine to the 2D triangle obtained via ray tracing.

Figure 9. T-MAHF filter response: Absolute value (top)

M_{i}

,

i = 0, \dots N - 1

, and phase (bottom)

ϑ_{i}

,

i = 0, \dots N - 1

, of the T-MAHF filter output, represented at the corresponding 2D keypoints

{\tilde{p}}_{i}

,

i = 0, \dots N - 1

. The values highlight local signal variations and their orientation, providing features for further processing (e.g., classification, recognition) in the 3D domain.

Figure 10. T-MAHF filter response on a different frame: The representation is the same as in Figure 9. Here, the analysis is performed on a different stereo frame with a different number of keypoints, demonstrating the algorithm’s performance under varying point cloud densities.

Figure 11. T-MAHF filter response on a toroidal point cloud: (left view, right view) stereo projections of a synthetic toroidal grayscale point cloud with

N = 1000

points; (a–c) point cloud signal

s (p_{i})

, T-MAHF filter output

r (p_{i})

magnitude

M_{i}

and phase

ϑ_{i}

.

Figure 12. T-MAHF filter response on a toroidal point cloud: (left view, right view) stereo projections of a synthetic toroidal grayscale point cloud with N = 20,000 points; (a–c) point cloud signal

s (p_{i})

, T-MAHF filter output

r (p_{i})

magnitude

M_{i}

and phase

ϑ_{i}

.

Table 1. Notation table for variables and parameters.

Notation	Description
$I (\tilde{p}), I^{'} ({\tilde{p}}^{'})$	Stereo images (left and right)
l, b	Focal length, distance between the left and right cameras
${\tilde{p}}_{i} = {[u_{i}, v_{i}]}^{T}, {\tilde{p}}_{i}^{'} = {[u_{i}^{'}, v_{i}^{'}]}^{T}$	2D projections of the i-th vertex on the left and right images
$p_{i} = {[p_{i x} p_{i y} p_{i z}]}^{T}$	3D coordinates of the i-th point cloud vertex
$G$	Graph associated to the point cloud
$B_{1}$ , $B_{2}$	Graph incidence matrices of first and second order
$L_{0} = B_{1} B_{1}^{T}$	Graph Laplacian matrix of zero order
$L_{1} = B_{1}^{T} B_{1} + B_{2} B_{2}^{T}$	Graph Laplacian matrix of first order
$e_{i j} = \{p_{i}, p_{j}\}$	Edge incident at the i-th and j-th vertices
$f_{i j k} = \{p_{i}, p_{j}, p_{k}\}$	Face incident at the i-th, j-th and k-th vertices
${\tilde{e}}_{i j} = \{{\tilde{p}}_{i}, {\tilde{p}}_{j}\}$	2D projection (segment) of the i-j edge
${\tilde{f}}_{i j k} = \{{\tilde{p}}_{i}, {\tilde{p}}_{j}, {\tilde{p}}_{k}\}$	2D projection (triangle) of the i-j-k face
$s (p_{i})$	Signal at i-th point cloud vertex
$s (e_{j})$	Signal at j-th edge
$s (f_{k})$	Signal at k-th face
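
The Laplacian definitions in Table 1 can be checked on a small example. The sketch below is a minimal illustration, assuming the usual orientation convention (edges oriented from the lower to the higher vertex index, faces listed with increasing indices); the toy complex with 5 nodes, 7 edges and 3 faces and the function name are hypothetical.

```python
import numpy as np

def incidence_matrices(n_vertices, edges, faces):
    """Oriented incidence matrices B1 (nodes x edges) and B2 (edges x faces).

    Edges (i, j) with i < j are oriented from i to j; faces (i, j, k) have i < j < k.
    """
    edge_index = {e: n for n, e in enumerate(edges)}
    B1 = np.zeros((n_vertices, len(edges)), dtype=int)
    for n, (i, j) in enumerate(edges):
        B1[i, n], B1[j, n] = -1, 1
    B2 = np.zeros((len(edges), len(faces)), dtype=int)
    for m, (i, j, k) in enumerate(faces):
        B2[edge_index[(i, j)], m] = 1
        B2[edge_index[(j, k)], m] = 1
        B2[edge_index[(i, k)], m] = -1
    return B1, B2

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
faces = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
B1, B2 = incidence_matrices(5, edges, faces)

L0 = B1 @ B1.T                 # zero-order Laplacian, acts on node signals
L1 = B1.T @ B1 + B2 @ B2.T     # first-order Laplacian, acts on edge signals
```

With this convention the product `B1 @ B2` vanishes (the boundary of a boundary is empty), and the diagonal of `L0` holds the node degrees.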

Table 2. Computational complexity comparison: computation time (ms per point) for frames 1, 137, 981, and 1041 in [30], measured on an Intel(R) Core(TM) i7-1065G7 CPU @ 1.30 GHz, 1498 MHz, 4 cores, 8 logical processors. On average, the total computation time is reduced by about 40%.

T-MAHF (1.4173 ms per point on average)
$A \in \{0, 1\}^{N \times N}$	[0.0494 0.0687 0.0929 0.0494]
$s (f_{i})$	[0.5064 0.1605 0.2718 0.2469]
$s (e_{i})$	[0.3064 0.2694 0.4288 0.3350]
Filter weights and computation	[1.4186 0.4354 0.5376 0.4919]
MAHF (2.3894 ms per point on average)
$L \in \mathbb{R}^{N \times N}$	[0.4923 0.5932 1.2465 0.5350]
Filter weights ( $K_{t}^{(G)}$ )	[2.7122 0.8136 1.1894 0.8219]
Filter weights ( $φ^{(G)}$ )	[0.5410 0.1007 0.3300 0.1000]
Filter computation	[0.0269 0.0156 0.0247 0.0144]
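
The averages quoted in Table 2 can be recomputed from the per-task entries. The short check below copies the values from the table and assumes that the four entries of each row follow the frame order given in the caption; the variable names are illustrative.

```python
import numpy as np

# Per-task times in ms per point for frames 1, 137, 981, 1041 (values copied from Table 2).
t_mahf = np.array([
    [0.0494, 0.0687, 0.0929, 0.0494],  # binary adjacency matrix A
    [0.5064, 0.1605, 0.2718, 0.2469],  # face signals s(f_i)
    [0.3064, 0.2694, 0.4288, 0.3350],  # edge signals s(e_i)
    [1.4186, 0.4354, 0.5376, 0.4919],  # filter weights and computation
])
mahf = np.array([
    [0.4923, 0.5932, 1.2465, 0.5350],  # Laplacian L
    [2.7122, 0.8136, 1.1894, 0.8219],  # filter weights (heat kernel)
    [0.5410, 0.1007, 0.3300, 0.1000],  # filter weights (angles)
    [0.0269, 0.0156, 0.0247, 0.0144],  # filter computation
])

t_mahf_avg = t_mahf.sum(axis=0).mean()  # total per frame, averaged over the four frames
mahf_avg = mahf.sum(axis=0).mean()
reduction = 1.0 - t_mahf_avg / mahf_avg
print(f"T-MAHF {t_mahf_avg:.4f}, MAHF {mahf_avg:.4f}, reduction {reduction:.1%}")
```

Summing the tasks per frame and averaging over the frames gives values consistent with the two averages in the table and with the reported reduction of about 40%.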

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
