Persistence Symmetric Kernels for Classification: A Comparative Study

by Cinzia Bandiziol * and Stefano De Marchi *
Dipartimento di Matematica “Tullio Levi-Civita”, University of Padova, 35121 Padova, Italy
* Authors to whom correspondence should be addressed.
Symmetry 2024, 16(9), 1236; https://doi.org/10.3390/sym16091236
Submission received: 1 August 2024 / Revised: 8 September 2024 / Accepted: 10 September 2024 / Published: 20 September 2024
(This article belongs to the Special Issue Algebraic Systems, Models and Applications)

Abstract

The aim of the present work is a comparative study of different persistence kernels applied to various classification problems. After some necessary preliminaries on homology and persistence diagrams, we introduce five different kernels, whose classification performance we then compare on various datasets. We also provide the Python codes for the reproducibility of the results and, thanks to the symmetry of the kernels, we can reduce the computational cost of the Gram matrices.

1. Introduction

In the last two decades, with the increasing need to analyze large amounts of data, which are usually complex and high-dimensional, it has proved meaningful and helpful to develop further methodologies that provide new information from data. This led to the birth of Topological Data Analysis (TDA), whose aim is to extract intrinsic, topological features related to the so-called “shape of data”. Thanks to its main tool, Persistent Homology (PH), it can provide new qualitative information that would be impossible to extract in any other way. These features, which one can collect in the so-called Persistence Diagram (PD), have proved valuable in many different applications, mainly related to applied science, improving the performance of models or classifiers, as in our context. Thanks to the strong basis of algebraic topology beneath it, TDA is very versatile and can be applied to data with a priori any kind of structure, as we will explain in the following. This is the reason why there is a wide range of fields of application, like chemistry [1], medicine [2], neuroscience [3,4], finance [5] and computer graphics [6], to name only a few.
An interesting and relevant property of this tool is its stability with respect to noise [7], which is a meaningful aspect for applications to real-world data. On the other hand, since the space of PDs is only a metric space, in order to use methods that require data to live in a Hilbert space, such as the SVM and PCA, it is necessary to introduce the notion of a kernel or, better still, a Persistence Kernel (PK), which maps PDs to a space with more structure, where it is possible to apply techniques that need a proper definition of inner product. A relevant aspect to highlight is that PKs are, as usual, symmetric and, taking advantage of this symmetry, we may reduce the computational cost of the corresponding Gram matrices in our codes, since we only need to compute the values on and below the diagonal.
In the literature, researchers have tested the PKs in the context of classification on some datasets but, to our knowledge, there is a lack of transversal analysis. For instance, the Persistence Scale-Space Kernel (PSSK) was introduced and tested in [8] on shape classification (SHREC14) and texture classification (OUTEX TC 00000), while in [9] the authors considered image classification (MNIST, HAM10000), classification of sets of points (PROTEIN) and shape classification (SHREC14, MPEG7). Ref. [10] presented the Persistence Weighted Gaussian Kernel (PWGK) and reported the performance of kernels in classifying protein and synthetic data. In [11], the authors introduced their own kernel, the Sliced Wasserstein Kernel (SWK), and compared it with other kernels for the classification of 3D shapes, orbit recognition (linked twisted map) and texture classification (OUTEX00000). In [12], the Persistence Fisher Kernel (PFK) was tested on orbit recognition (linked twisted map) and shape classification (MPEG7). The Persistence Image (PI) in [13] was used in orbit recognition (linked twisted map) and breast tumor classification [2]. Finally, in [14] the PWGK, SWK and PI were compared in the context of graph classification (PROTEIN, PTC, MUTAG, etc.), and in [15] the PSSK, PWGK, SWK and PFK were tested on classification related to Alzheimer’s disease, orbit recognition (linked twisted map) and classification of 3D shapes.
The goals of the present paper are as follows: first, we investigate how to choose the values of the parameters of the different kernels; then, we collect tools for computing PDs starting from different kinds of data; finally, we compare the performances of the main kernels in the classification context. As far as we know, such a study is not yet present in the literature.
The paper is organized as follows: in Section 2, we recall the basic notions related to persistent homology and the classification problem, we describe how to solve it using the Support Vector Machine (SVM), and we list the main PKs available in the literature. Section 3 collects all the numerical tests that we have run, and in Section 4 we outline our conclusions.

2. Materials and Methods

2.1. Persistent Homology

This brief introduction does not claim to be exhaustive; therefore, we invite interested readers to refer, for instance, to the works [16,17,18,19,20,21]. The first ingredient needed is the concept of filtration. The most common choice in applications is to consider a function $f: X \to \mathbb{R}$, where $X$ is a topological space that varies based on the context, and then to consider the filtration given by the sub-level sets $f^{-1}((-\infty, a])$, $a \in \mathbb{R}$. For example, such an $f$ can be chosen as the distance function in the case of point cloud data, the gray-scale value at each pixel for images, the heat kernel signature for datasets such as SHREC14 [22], the weight function of edges for graphs, and so on. We now recall the main theoretical results for point cloud data, but all of them can be easily applied in other contexts.
We assume a set of points $X = \{x_k\}_{k=1,\dots,m}$ that we suppose to live in an open set of a manifold $\mathcal{M}$. The aim is to capture the relevant intrinsic properties of the manifold itself, and this is achieved by applying Persistent Homology (PH) to such discrete information. To understand how PH has been introduced, we first have to mention simplicial homology, which represents the extension of homology theory to structures called simplicial complexes that, roughly speaking, are collections of simplices glued together in a valid manner, as shown in Figure 1.
Definition 1. 
A simplicial complex $K$ consists of a set of simplices of different dimensions and has to meet the following conditions:
  • every face of a simplex $\sigma$ in $K$ must belong to $K$;
  • the non-empty intersection of any two simplices $\sigma_1, \sigma_2 \in K$ is a face of both $\sigma_1$ and $\sigma_2$.
The dimension of K is the maximum dimension of simplices that belong to K.
In applications, data analysts usually compute the Vietoris–Rips complex.
Definition 2. 
Let $(X, d)$ denote a metric space from which the samples are taken. The Vietoris–Rips complex related to $X$, associated with the value of the parameter $\epsilon$ and denoted by $VR(X, \epsilon)$, is the simplicial complex whose vertex set is $X$, and $\{x_0, \dots, x_k\}$ spans a k-simplex if and only if $d(x_i, x_j) \le 2\epsilon$ for all $0 \le i, j \le k$.
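As an illustration, the following minimal Python sketch builds the Vietoris–Rips complex of a small point cloud with the gudhi library cited in Section 3.2 (gudhi connects points at distance at most max_edge_length, so the threshold $2\epsilon$ of Definition 2 is passed explicitly; the point cloud and parameter values are only illustrative).

```python
import numpy as np
import gudhi

# Four points at the corners of a unit square; with eps = 0.6 the four edges
# (length 1 <= 2 * eps) enter the complex, while the diagonals (length ~1.41) do not.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
eps = 0.6

rips = gudhi.RipsComplex(points=pts, max_edge_length=2 * eps)
simplex_tree = rips.create_simplex_tree(max_dimension=2)

# Each simplex is listed with the value of the parameter at which it appears.
for simplex, value in simplex_tree.get_filtration():
    print(simplex, value)
```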
If $K := VR(X, \bar{\epsilon})$, then we can divide all simplices of this set $K$ into groups based on their dimension $k$, and we can enumerate them as $\Delta_i^k$. If $G = (\mathbb{Z}, +)$ is the well-known Abelian group, we may build linear combinations of simplices with coefficients in $G$, and so we introduce the following:
Definition 3. 
An object of the form $c = \sum_i a_i \Delta_i^k$ with $a_i \in \mathbb{Z}$ is an integer-valued k-dimensional chain.
Linearity allows us to extend the previous definition to any subset of simplices of $K$ of dimension $k$.
Definition 4. 
The group $C_k^{\bar{\epsilon}}(X)$ is called the group of k-dimensional simplicial integer-valued chains of the simplicial complex $K$.
It is then possible to associate with each simplicial complex the corresponding family of Abelian groups $C_0^{\bar{\epsilon}}(X), \dots, C_n^{\bar{\epsilon}}(X)$.
Definition 5. 
The boundary $\partial \Delta^k$ of an oriented simplex $\Delta^k$ is the sum of all its $(k-1)$-dimensional faces taken with a chosen orientation. More precisely,
$$\partial \Delta^k = \sum_{i=0}^{k} (-1)^i \Delta_i^{k-1}.$$
In a general setting, we can extend the boundary operator by linearity to a general element of $C_k^{\bar{\epsilon}}(X)$, obtaining a map $\partial_k : C_k^{\bar{\epsilon}}(X) \to C_{k-1}^{\bar{\epsilon}}(X)$.
For any value of $k$, $\partial_k$ is a linear map. Therefore, we can consider its kernel, the group of k-cycles $Z_k^{\bar{\epsilon}}(X) := \ker(\partial_k)$, and the image of $\partial_{k+1}$, the group of k-boundaries $B_k^{\bar{\epsilon}}(X) := \mathrm{im}(\partial_{k+1})$. Then, $H_k^{\bar{\epsilon}}(X) = Z_k^{\bar{\epsilon}}(X) / B_k^{\bar{\epsilon}}(X)$ is the k-homology group and represents the k-dimensional holes that can be recovered from the simplicial structure. We briefly recall here that, for instance, zero-dimensional holes correspond to connected components, one-dimensional holes are cycles, and two-dimensional holes are cavities/voids. Since they are algebraic invariants, they collect qualitative information regarding the topology of the data. The crucial question is how to choose the value of $\epsilon$ so that the simplicial complex $K$ faithfully reproduces the topological structure of the original manifold. The answer is not straightforward and the process is unstable; therefore, PH analyzes not a single simplicial complex but a nested sequence of them and, following the evolution of this structure, records the features that gradually emerge and disappear. From a theoretical point of view, letting $0 < \epsilon_1 < \dots < \epsilon_l$ be an increasing sequence of real numbers, we obtain the filtration
$$K_1 \subseteq K_2 \subseteq \dots \subseteq K_l$$
with $K_i = VR(X, \epsilon_i)$, and then
Definition 6. 
The p-persistent homology group of $K_i$ is the group defined as
$$H_k^{i,p} = Z_k^i / (B_k^{i+p} \cap Z_k^i).$$
This group contains all stable homology classes in the interval from $i$ to $i+p$: they are born before the time/index $i$ and are still alive after $p$ steps. The persistent homology classes that are alive for large $p$ correspond to stable topological features of the underlying space (see [23]). Along the filtration, topological features appear and disappear, which means that they may be represented by a pair of indexes: if a feature is born in some $K_i$ and dies in $K_j$, it can be described by the pair $(i, j)$, $i < j$. We underline here that $j$ can be equal to $+\infty$, since some features can stay alive up to the end of the filtration. Hence, all such topological invariants live in the extended positive plane, here denoted by $\mathbb{R}^2_+ = \mathbb{R}_{\ge 0} \times (\mathbb{R}_{\ge 0} \cup \{+\infty\})$. Another interesting aspect to highlight is that some features can appear more than once and, accordingly, such collections of points are called multisets. All of these observations are grouped into the following:
Definition 7. 
A Persistence Diagram (PD) $D_r(X, \varepsilon)$ related to the filtration $K_1 \subseteq K_2 \subseteq \dots \subseteq K_l$ with $\varepsilon := (\epsilon_1, \dots, \epsilon_l)$ is a multiset of points defined as
$$D_r(X, \varepsilon) := \{(b, d) \,|\, (b, d) \in P_r(X, \varepsilon)\} \cup \Delta$$
where $P_r(X, \varepsilon)$ denotes the set of r-dimensional birth–death pairs that appear along the filtration, each $(b, d)$ is considered with its multiplicity, while the points of $\Delta = \{(x, x) \,|\, x \ge 0\}$ are taken with infinite multiplicity. One may consider all $P_r(X, \varepsilon)$ for every $r$ together, obtaining the total PD, denoted here by $D(X, \varepsilon)$, which we will usually consider in the following sections.
Each point $(b, d) \in D_r(X, \varepsilon)$ is known as a generator of the persistent homology, and it corresponds to a topological feature that is born at $K_b$ and dies at $K_d$. The difference $d - b$ is called the persistence of the generator; it represents its lifespan and shows the robustness of the topological property.
Figure 2 is an example of a total PD collecting features of dimension zero (in blue), one (in orange), and two (in green). Points close to the diagonal represent features with a short lifetime and so are usually associated with noise, while points far from the diagonal are relevant and meaningful; based on the application, one can decide to consider both or only the most interesting ones. At the top of the Figure, a dashed line indicates infinity and allows us to also plot pairs of the form $(i, +\infty)$.
In the previous definition, the set $\Delta$ is added so that proper bijections can be found between diagrams that, without $\Delta$, would not have the same number of points. This makes it possible to compute proper distances between PDs.
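A minimal sketch of how a total PD is obtained in practice from a point cloud, using the ripser library cited in Section 3.2 (the sampled circle and the noise level are just an illustrative choice):

```python
import numpy as np
from ripser import ripser

# Sample 200 noisy points on the unit circle: one prominent one-dimensional hole.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
pts = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.05 * rng.normal(size=(200, 2))

dgms = ripser(pts, maxdim=1)['dgms']   # dgms[0]: H0 pairs, dgms[1]: H1 pairs
# Each row is a (birth, death) pair; the H1 diagram should contain one point far
# from the diagonal (the circle) and a few short-lived points due to noise.
print(dgms[1])
```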

Stability

A key property of PDs is stability under perturbation of the data. First, we recall two famous distances for sets:
Definition 8. 
Given two non-empty sets $X, Y \subset \mathbb{R}^d$ with equal cardinality, the Hausdorff distance is
$$d_H(X, Y) := \max\Big\{ \sup_{x \in X} \inf_{y \in Y} \|x - y\|_\infty,\ \sup_{y \in Y} \inf_{x \in X} \|y - x\|_\infty \Big\}$$
and the bottleneck distance is defined as
$$d_B(X, Y) := \inf_{\gamma} \sup_{x \in X} \|x - \gamma(x)\|_\infty$$
where the infimum is taken over all bijections of multisets $\gamma : X \to Y$. Here, we use
$$\|v - w\|_\infty = \max\{|v_1 - w_1|, |v_2 - w_2|\}, \quad \text{for } v = (v_1, v_2),\ w = (w_1, w_2) \in \mathbb{R}^2.$$
We now explain in more detail how to compute the bottleneck distance. We have to consider all possible ways to move the points of $X$ onto the points of $Y$ in a bijective manner; then, we can compute the distance properly. Figure 3 shows two overlapped PDs consisting of $\Delta$ joined with 2 points in red and 11 points in blue, respectively. In order to apply the definition of $d_B$, we first need two sets with the same cardinality. To this aim, it is necessary to add to the red diagram some points of $\Delta$, more precisely the points of the diagonal obtained by orthogonally projecting the 9 blue points closest to it, so as to reach 11 points. The lines between the points and $\Delta$ represent the bijection that realizes the best matching in the definition of $d_B$.
Proposition 1. 
Let $X$ and $Y$ be finite subsets of a metric space $(M, d_M)$. Then, the two Persistence Diagrams $D(X, \varepsilon)$ and $D(Y, \varepsilon)$ satisfy
$$d_B(D(X, \varepsilon), D(Y, \varepsilon)) \le d_H(X, Y).$$
For further details, see, for example, [17].
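The stability property can also be checked numerically; the following sketch compares the one-dimensional PDs of two noisy samples of the same circle and computes their bottleneck distance with the persim library cited in Section 3.2 (assuming its bottleneck function; the sampling parameters are illustrative):

```python
import numpy as np
from ripser import ripser
import persim

def noisy_circle(n, noise, seed):
    # n points on the unit circle perturbed by Gaussian noise
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    pts = np.column_stack([np.cos(theta), np.sin(theta)])
    return pts + noise * rng.normal(size=(n, 2))

dgm_X = ripser(noisy_circle(150, 0.05, seed=1), maxdim=1)['dgms'][1]
dgm_Y = ripser(noisy_circle(150, 0.05, seed=2), maxdim=1)['dgms'][1]

# A small value, consistent with the stability of PDs under perturbations of the data.
print(persim.bottleneck(dgm_X, dgm_Y))
```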

2.2. Classification with SVM

Let $\Omega \subseteq \mathbb{R}^d$ and let $\{x_1, \dots, x_m\} = X \subseteq \Omega$ be the set of input data, with $d, m \in \mathbb{N}$. We have a training set composed of the pairs $(x_i, y_i)$ with $i = 1, \dots, m$ and $y_i \in Y = \{-1, 1\}$. The binary supervised learning task consists of finding a function $f : \Omega \to Y$, the model, such that it can satisfactorily predict the label of an unseen $\tilde{x} \in \Omega \setminus X$.
The goal is to define a hyperplane that separates, in the best possible way, points belonging to different classes; hence the name separating hyperplane. The best possible way means that it separates the two classes with the largest margin, that is, the largest distance between the hyperplane and the points of both classes.
More formally, assume that we are in a space $F$ with a dot product (for instance, $F$ can be a subset of $\mathbb{R}^d$ with $\langle \cdot, \cdot \rangle$). Since a generic hyperplane can be written as
$$\{x \in F \,|\, \langle w, x \rangle + b = 0\}, \quad w \in F,\ b \in \mathbb{R},$$
we can introduce the following definition.
Definition 9. 
We call
$$\rho_{w,b}(x, y) := \frac{y(\langle w, x \rangle + b)}{\|w\|}$$
the geometrical margin of the point $(x, y) \in F \times \{-1, 1\}$. The minimum value,
$$\rho_{w,b} := \min_{i=1,\dots,m} \rho_{w,b}(x_i, y_i),$$
is called the geometrical margin of $(x_1, y_1), \dots, (x_m, y_m)$.
From a geometrical perspective, this margin effectively measures the distance between the samples and the hyperplane itself. The SVM then looks for the hyperplane that realizes the maximum of such a margin. For further details, see, for example, [24]. The precise formalization leads to an optimization problem that, thanks to the Lagrange multipliers and the Karush–Kuhn–Tucker conditions, turns out to have the following formulation, the SVM optimization problem:
$$\max_{\alpha \in \mathbb{R}^m} \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i,j=1}^m \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \quad \text{s.t.} \quad \sum_{i=1}^m \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, \dots, m,$$
where $[0, C]^m$, with $C \in [0, +\infty)$, is the bounding box, the samples $x_i$ with $\alpha_i > 0$ are called the support vectors (hence the name Support Vector Machine, shortened to SVM), and $\langle \cdot, \cdot \rangle$ denotes the inner product in $\mathbb{R}^d$. This formulation handles the classification task satisfactorily if the data are linearly separable. In applications, this does not happen frequently, and so it is necessary to introduce some nonlinearity and to move to a higher-dimensional space where, hopefully, the data become separable. This can be achieved with the use of kernels. Starting from the original dataset $X$, the theory tells us to introduce a feature map $\Phi : X \to \mathcal{H}$ that moves data from $X$ to a Hilbert space of functions $\mathcal{H}$: the so-called feature space. The kernel is then defined as $\kappa(x, \bar{x}) := \langle \Phi(x), \Phi(\bar{x}) \rangle_{\mathcal{H}}$ (kernel trick). Thus, the optimization problem becomes
$$\max_{\alpha \in \mathbb{R}^m} \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i,j=1}^m \alpha_i \alpha_j y_i y_j \kappa(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^m \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, \dots, m,$$
where the kernel represents a generalization of the inner product in $\mathbb{R}^d$. We are interested in classifying PDs and, obviously, we need suitable kernels defined on PDs: the so-called Persistence Kernels (PKs).
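In practice, once a PK is available, its Gram matrices can be passed to an SVM implementation that accepts precomputed kernels. The toy sketch below uses scikit-learn's SVC with kernel='precomputed' and, only for illustration, the Gaussian kernel on synthetic points; in the rest of the paper, the same mechanism is used with the Gram matrices of the Persistence Kernels of Section 2.3.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)      # a toy problem that is not linearly separable
X_train, X_test, y_train = X[:30], X[30:], y[:30]

K_train = rbf_kernel(X_train, X_train, gamma=1.0)   # (m_train, m_train) Gram matrix
K_test = rbf_kernel(X_test, X_train, gamma=1.0)     # (m_test, m_train) cross-Gram matrix

clf = SVC(kernel='precomputed', C=1.0).fit(K_train, y_train)
print(clf.predict(K_test))
```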

2.3. Persistence Kernels

In what follows, we denote by $\mathcal{D}$ the set of total PDs.

2.3.1. Persistence Scale-Space Kernel (PSSK)

The first kernel was described in [8]. The main idea is to compute the feature map as the solution of the heat equation. We consider $\Omega_{ad} = \{x = (x_1, x_2) \in \mathbb{R}^2 : x_2 \ge x_1\}$ and we denote by $\delta_x$ the Dirac delta centered at $x$. If $D \in \mathcal{D}$, we consider the solution $u : \Omega_{ad} \times \mathbb{R}_{\ge 0} \to \mathbb{R}$, $(x, t) \mapsto u(x, t)$, of the following PDE:
$$\Delta_x u = \partial_t u \ \text{ in } \Omega_{ad} \times \mathbb{R}_{>0}, \qquad u = 0 \ \text{ on } \partial\Omega_{ad} \times \mathbb{R}_{\ge 0}, \qquad u = \sum_{y \in D} \delta_y \ \text{ on } \Omega_{ad} \times \{0\}.$$
The feature map $\Phi_\sigma : \mathcal{D} \to L^2(\Omega_{ad})$ at scale $\sigma > 0$ is defined as $\Phi_\sigma(D) = u|_{t=\sigma}$. This map yields the Persistence Scale-Space Kernel (PSSK) $K_{PSS}$ on $\mathcal{D}$ as
$$K_{PSS}(D, E) = \langle \Phi_\sigma(D), \Phi_\sigma(E) \rangle_{L^2(\Omega_{ad})}.$$
However, since an explicit formula for the solution $u$ is known, the kernel takes the closed form
$$K_{PSS}(D, E) = \frac{1}{8\pi\sigma} \sum_{x \in D,\, y \in E} \exp\Big(-\frac{\|x - y\|^2}{8\sigma}\Big) - \exp\Big(-\frac{\|x - \bar{y}\|^2}{8\sigma}\Big)$$
where, for $y = (a, b)$, $\bar{y} = (b, a)$, for any $D, E \in \mathcal{D}$.
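A minimal numpy sketch of this closed form, together with a helper that exploits the symmetry of the kernel by computing only the entries on and below the diagonal of the Gram matrix (illustrative function names, not the authors' reference implementation):

```python
import numpy as np

def pssk(D, E, sigma):
    """Closed-form PSSK; D and E are (n, 2) arrays of (birth, death) pairs."""
    D, E = np.asarray(D, float), np.asarray(E, float)
    E_bar = E[:, ::-1]                                   # mirrored points (d, b)
    d1 = np.sum((D[:, None, :] - E[None, :, :]) ** 2, axis=2)
    d2 = np.sum((D[:, None, :] - E_bar[None, :, :]) ** 2, axis=2)
    return (np.exp(-d1 / (8 * sigma)) - np.exp(-d2 / (8 * sigma))).sum() / (8 * np.pi * sigma)

def gram_matrix(diagrams, kernel, **params):
    """Symmetric Gram matrix: only the lower triangle is computed, then mirrored."""
    n = len(diagrams)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            K[i, j] = K[j, i] = kernel(diagrams[i], diagrams[j], **params)
    return K
```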

2.3.2. Persistence Weighted Gaussian Kernel (PWGK)

In [10], the authors introduced a new kernel, whose idea is to replace each PD with a discrete measure. Starting with a strictly positive definite kernel, for example the Gaussian one $\kappa_G(x, y) = e^{-\frac{\|x - y\|^2}{2\rho^2}}$, $\rho > 0$, we denote by $\mathcal{H}_{\kappa_G}$ the corresponding Reproducing Kernel Hilbert Space.
If $\Omega \subseteq \mathbb{R}^d$, we denote by $M_b(\Omega)$ the space of finite signed Radon measures and define
$$E_{\kappa_G} : M_b(\Omega) \to \mathcal{H}_{\kappa_G}, \qquad \mu \mapsto \int_\Omega \kappa_G(\cdot, x)\, d\mu(x).$$
For any $D \in \mathcal{D}$, if $\mu_D^w = \sum_{x \in D} w(x) \delta_x$, where the weight function satisfies $w(x) > 0$ for all $x \in D$, then
$$E_{\kappa_G}(\mu_D^w) = \sum_{x \in D} w(x)\, \kappa_G(\cdot, x)$$
where
$$w(x) = \arctan(C_w\, \mathrm{pers}(x)^p)$$
and $\mathrm{pers}(x) = x_2 - x_1$.
The Persistence Weighted Gaussian Kernel (PWGK) is defined as
$$K_{PWG}(D, E) = \exp\Big(-\frac{1}{2\tau^2} \big\|E_{\kappa_G}(\mu_D^w) - E_{\kappa_G}(\mu_E^w)\big\|^2_{\mathcal{H}_{\kappa_G}}\Big), \quad \tau > 0,$$
for any $D, E \in \mathcal{D}$.
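By the reproducing property, the squared RKHS norm above expands into double sums of the form $\sum_{x, x' \in D} w(x)w(x')\kappa_G(x, x') + \sum_{y, y' \in E} w(y)w(y')\kappa_G(y, y') - 2\sum_{x \in D, y \in E} w(x)w(y)\kappa_G(x, y)$, which can be evaluated directly. A minimal sketch under these assumptions (illustrative names, not the authors' code):

```python
import numpy as np

def _gauss(A, B, rho):
    # kappa_G(x, y) = exp(-||x - y||^2 / (2 rho^2)), evaluated pairwise
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2 * rho ** 2))

def _weights(D, C_w, p):
    # w(x) = arctan(C_w * pers(x)^p) with pers(x) = death - birth
    return np.arctan(C_w * (D[:, 1] - D[:, 0]) ** p)

def pwgk(D, E, rho, tau, C_w, p):
    D, E = np.asarray(D, float), np.asarray(E, float)
    wD, wE = _weights(D, C_w, p), _weights(E, C_w, p)
    sq_norm = (wD @ _gauss(D, D, rho) @ wD
               + wE @ _gauss(E, E, rho) @ wE
               - 2.0 * wD @ _gauss(D, E, rho) @ wE)
    return np.exp(-sq_norm / (2 * tau ** 2))
```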

2.3.3. Sliced Wasserstein Kernel (SWK)

Another possible choice for κ was introduced in [11].
If $\mu$ and $\nu$ are two non-negative measures on $\mathbb{R}$ such that $\mu(\mathbb{R}) = r = |\mu|$ and $\nu(\mathbb{R}) = r = |\nu|$, then we recall that the 1-Wasserstein distance for non-negative measures is defined as
$$W(\mu, \nu) = \inf_{P \in \Pi(\mu, \nu)} \int_{\mathbb{R} \times \mathbb{R}} |x - y|\, dP(x, y)$$
where $\Pi(\mu, \nu)$ is the set of measures on $\mathbb{R}^2$ with marginals $\mu$ and $\nu$.
Definition 10. 
If $\theta \in \mathbb{R}^2$ with $\|\theta\|_2 = 1$, let $L(\theta)$ denote the line $\{\lambda \theta \,|\, \lambda \in \mathbb{R}\}$ and let $\pi_\theta : \mathbb{R}^2 \to L(\theta)$ be the orthogonal projection onto $L(\theta)$. Let $D, E \in \mathcal{D}$, let $\mu_D^\theta := \sum_{x \in D} \delta_{\pi_\theta(x)}$ and $\mu_{D\Delta}^\theta := \sum_{x \in D} \delta_{\pi_\theta \circ \pi_\Delta(x)}$, and similarly for $\mu_E^\theta$ and $\mu_{E\Delta}^\theta$, where $\pi_\Delta$ denotes the orthogonal projection onto the diagonal. Then, the Sliced Wasserstein distance is
$$SW(D, E) = \frac{1}{2\pi} \int_{S^1} W\big(\mu_D^\theta + \mu_{E\Delta}^\theta,\ \mu_E^\theta + \mu_{D\Delta}^\theta\big)\, d\theta.$$
Thus, the Sliced Wasserstein Kernel (SWK) is defined as
$$K_{SW}(D, E) := \exp\Big(-\frac{SW(D, E)}{2\eta^2}\Big), \quad \eta > 0,$$
for any $D, E \in \mathcal{D}$.
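The integral over $S^1$ is usually approximated by sampling a finite number of directions; for each direction, the 1-Wasserstein distance between the two projected (augmented) multisets reduces to sorting the projections and summing the absolute differences, since both multisets have the same cardinality. A minimal sketch under these assumptions (the number of directions is illustrative):

```python
import numpy as np

def sliced_wasserstein(D, E, n_dirs=50):
    D, E = np.asarray(D, float), np.asarray(E, float)
    # Orthogonal projections onto the diagonal: (b, d) -> ((b + d) / 2, (b + d) / 2)
    D_diag = np.repeat(D.mean(axis=1, keepdims=True), 2, axis=1)
    E_diag = np.repeat(E.mean(axis=1, keepdims=True), 2, axis=1)
    U = np.vstack([D, E_diag])          # points of mu_D + mu_{E Delta}
    V = np.vstack([E, D_diag])          # points of mu_E + mu_{D Delta}
    angles = np.linspace(-np.pi / 2, np.pi / 2, n_dirs, endpoint=False)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    proj_U, proj_V = U @ dirs.T, V @ dirs.T               # one column per direction
    w1 = np.abs(np.sort(proj_U, axis=0) - np.sort(proj_V, axis=0)).sum(axis=0)
    return w1.mean()    # Monte Carlo approximation of (1 / (2 pi)) * integral over S^1

def swk(D, E, eta, n_dirs=50):
    return np.exp(-sliced_wasserstein(D, E, n_dirs) / (2 * eta ** 2))
```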

2.3.4. Persistence Fisher Kernel (PFK)

In [12], the authors described a kernel based on Fisher Information geometry.
Given a persistence diagram $D \in \mathcal{D}$, it is possible to build a discrete measure $\mu_D = \sum_{u \in D} \delta_u$, where $\delta_u$ is the Dirac delta centered at $u$. Given a bandwidth $\sigma > 0$ and a set $\Theta$, one can smooth and normalize $\mu_D$ as follows:
$$\rho_D := \frac{1}{Z} \sum_{u \in D} N(x; u, \sigma I)$$
where $N$ is a Gaussian function, $Z = \int_\Theta \sum_{u \in D} N(x; u, \sigma I)\, dx$ and $I$ is the identity matrix. Thus, using this measure, any PD can be regarded as a point in $\mathcal{P} = \{\rho \,|\, \int \rho(x)\, dx = 1,\ \rho(x) \ge 0\}$.
Given two elements $\rho_i, \rho_j \in \mathcal{P}$, the Fisher Information Metric is
$$d_{\mathcal{P}}(\rho_i, \rho_j) = \arccos\Big(\int \sqrt{\rho_i(x)\, \rho_j(x)}\, dx\Big).$$
Inspired by the Sliced Wasserstein Kernel construction, we have the following definition.
Definition 11. 
Given two finite and bounded persistence diagrams $D, E$, the Fisher Information Metric between $D$ and $E$ is defined as
$$d_{FIM}(D, E) := d_{\mathcal{P}}(\rho_{D \cup E_\Delta}, \rho_{E \cup D_\Delta})$$
where $D_\Delta := \{\Pi_\Delta(u) \,|\, u \in D\}$, $E_\Delta := \{\Pi_\Delta(u) \,|\, u \in E\}$ and $\Pi_\Delta$ is the orthogonal projection onto the diagonal $\Delta = \{(a, a) \,|\, a \ge 0\}$.
The Persistence Fisher Kernel (PFK) is then defined as
$$K_{PF}(D, E) := \exp(-t\, d_{FIM}(D, E)), \quad t > 0,$$
for any $D, E \in \mathcal{D}$.
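In practice the smoothed densities cannot be compared analytically; a common approximation restricts them to a finite set $\Theta$ (for example, the points of the two augmented diagrams) and replaces the integral with a sum. A rough sketch under this assumption (illustrative, not the authors' implementation):

```python
import numpy as np

def _smoothed_density(points, grid, sigma):
    # Normalized Gaussian mixture centered at `points`, evaluated on the finite set `grid`
    d2 = np.sum((grid[:, None, :] - points[None, :, :]) ** 2, axis=2)
    dens = np.exp(-d2 / (2 * sigma ** 2)).sum(axis=1)
    return dens / dens.sum()

def pfk(D, E, sigma, t):
    D, E = np.asarray(D, float), np.asarray(E, float)
    D_diag = np.repeat(D.mean(axis=1, keepdims=True), 2, axis=1)   # projections onto Delta
    E_diag = np.repeat(E.mean(axis=1, keepdims=True), 2, axis=1)
    U, V = np.vstack([D, E_diag]), np.vstack([E, D_diag])
    grid = np.vstack([U, V])                                       # finite evaluation set Theta
    rho_u = _smoothed_density(U, grid, sigma)
    rho_v = _smoothed_density(V, grid, sigma)
    d_fim = np.arccos(np.clip(np.sqrt(rho_u * rho_v).sum(), 0.0, 1.0))
    return np.exp(-t * d_fim)
```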

2.3.5. Persistence Image (PI)

The main reference is [13]. If $D \in \mathcal{D}$, we introduce a change of coordinates $T : \mathbb{R}^2 \to \mathbb{R}^2$ given by $T(x, y) = (x, y - x)$, and we let $T(D)$ be the multiset of the points of $D$ in birth–persistence coordinates. Let $\phi_u : \mathbb{R}^2 \to \mathbb{R}$ be a differentiable probability distribution with mean $u = (u_x, u_y) \in \mathbb{R}^2$; usually $\phi_u = g_u$, where $g_u$ is the two-dimensional Gaussian with mean $u$ and variance $\sigma^2$, defined as
$$g_u(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-[(x - u_x)^2 + (y - u_y)^2]/(2\sigma^2)}.$$
Fix a weight function $f : \mathbb{R}^2 \to \mathbb{R}$, with $f \ge 0$, which is equal to zero on the horizontal axis, continuous and piecewise differentiable. A possible choice is a function that depends only on the persistence coordinate $y$, namely $f(x, y) = w_b(y)$, where
$$w_b(t) = \begin{cases} 0 & \text{if } t \le 0,\\ t/b & \text{if } 0 < t < b,\\ 1 & \text{if } t \ge b. \end{cases}$$
Definition 12. 
Given $D \in \mathcal{D}$, the corresponding persistence surface $\rho_D : \mathbb{R}^2 \to \mathbb{R}$ is the function
$$\rho_D(x, y) = \sum_{u \in T(D)} f(u)\, \phi_u(x, y).$$
If we divide the plane into a grid with $n^2$ pixels $(P_{i,j})_{i,j=1,\dots,n}$, then we have the following definition.
Definition 13. 
Given $D \in \mathcal{D}$, its persistence image is the collection of pixels
$$PI(\rho_D)_{i,j} = \iint_{P_{i,j}} \rho_D(x, y)\, dx\, dy.$$
Thus, through the persistence image, each persistence diagram is turned into a vector $PIV(D) \in \mathbb{R}^{n^2}$ with $PIV(D)_{i + n(j-1)} = PI(\rho_D)_{i,j}$; it is then possible to introduce the following kernel:
$$K_{PI}(D, E) = \langle PIV(D), PIV(E) \rangle_{\mathbb{R}^{n^2}}.$$
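A rough numpy sketch of Definitions 12 and 13, in which the pixel integrals are approximated by evaluating the persistence surface at pixel centres and multiplying by the pixel area; the grid ranges and parameter values are illustrative assumptions (libraries such as persim also provide ready-made persistence images):

```python
import numpy as np

def persistence_image_vector(D, sigma=0.1, resolution=20,
                             birth_range=(0.0, 1.0), pers_range=(0.0, 1.0)):
    D = np.asarray(D, float)
    D = D[np.isfinite(D[:, 1])]                          # drop features with infinite death
    T = np.column_stack([D[:, 0], D[:, 1] - D[:, 0]])    # birth-persistence coordinates
    b = pers_range[1]
    weights = np.clip(T[:, 1] / b, 0.0, 1.0)             # piecewise-linear weight w_b
    xs = np.linspace(*birth_range, resolution)           # pixel centres, birth axis
    ys = np.linspace(*pers_range, resolution)            # pixel centres, persistence axis
    X, Y = np.meshgrid(xs, ys)
    surface = np.zeros_like(X)
    for (ux, uy), w in zip(T, weights):
        surface += w * np.exp(-((X - ux) ** 2 + (Y - uy) ** 2) / (2 * sigma ** 2)) \
                   / (2 * np.pi * sigma ** 2)
    pixel_area = (xs[1] - xs[0]) * (ys[1] - ys[0])
    return (surface * pixel_area).ravel()                # PIV(D) as a vector in R^(n^2)

def pi_kernel(D, E, **params):
    return float(np.dot(persistence_image_vector(D, **params),
                        persistence_image_vector(E, **params)))
```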

3. Results

3.1. Shape Parameters Analysis

From the definitions of the aforementioned kernels, it is evident that each of them depends on some parameters, and it is not clear a priori which values should be assigned to them. The cross-validation (CV) phase tries to answer this question: the user defines a set of values for the parameters and, for every admissible choice, solves the classification task and measures the goodness of the obtained results. It is well known that in the RBF interpolation literature [25] the shape parameters have to be chosen through a similar process. At the beginning, the user chooses some values for each parameter and then, varying them, checks the condition number and the interpolation error, noting how they vary according to the values of the parameters. The trade-off principle suggests considering values for which the condition number is not too large (conditioning) and the interpolation error is small (accuracy). In the context of classification, we replace the condition number of the interpolation matrix and the interpolation error with the condition number of the Gram matrix and the accuracy of the classifier: to obtain a good classifier, it is desirable to have a small condition number of the Gram matrix and an accuracy as close to 1 as possible. The aim here was to run such an analysis for the kernels presented in this paper.
The PSSK has only one parameter to tune: $\sigma$. Typically, users consider $\sigma \in \{0.001, 0.01, 0.1, 1, 10, 100, 1000\}$. We ran the CV phase for different shuffles of a dataset and plotted the results in terms of the condition number of the Gram matrix of the training samples and the accuracy. For our analysis, we considered $\sigma \in \{0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 500, 800, 1000\}$ and we ran tests on some of the datasets cited in the following. The results were similar in each case, so we decided to report those for the SHREC14 dataset.
From Figure 4, it is evident that large values of $\sigma$ result in an unstable matrix and lower accuracy. Therefore, in what follows, we only consider $\sigma \in \{0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10\}$.
The PWGK is the kernel with the largest number of parameters to tune; therefore, it was not obvious which sets of values to consider. We chose reasonable starting sets as follows: $\tau \in \{0.001, 0.01, 0.1, 1, 10, 100, 1000\}$, $\rho \in \{0.001, 0.01, 0.1, 1, 10, 100, 1000\}$, $p \in \{1, 5, 10, 50, 100\}$, $C_w \in \{0.001, 0.01, 0.1, 1\}$. Due to the large number of parameters, we first ran some experiments varying $(\rho, \tau)$ with fixed $(p, C_w)$, and then we reversed the roles.
In Figure 5 we report only a plot for fixed $C_w$ and $p$, because it highlights how high values of $\tau$ (for example, $\tau = 1000$) can be excluded. We found this behavior for different values of $C_w$, $p$ and various datasets; here, we show the case $C_w = 1$, $p = 10$ and the MUTAG dataset. Therefore, we decided to vary the parameters as follows: $\tau \in \{0.001, 0.01, 0.1, 1, 10, 100\}$, $\rho \in \{0.001, 0.01, 0.1, 1, 10, 100, 1000\}$, $p \in \{1, 5, 10, 50, 100\}$, $C_w \in \{0.001, 0.01, 0.1, 1\}$. Unfortunately, there was no other evidence that could guide the choices, except for $\tau$, where the value $\tau = 1000$ always gave poor accuracy, as one can see below in the case of MUTAG with the shortest path distance.
In the case of the SWK, there is only one parameter, $\eta$. In [11], the authors proposed to consider the first decile, the last decile and the median value of the Gram matrix of the training samples, flattened into a vector, and then to multiply these three values by 0.01, 0.1, 1, 10 and 100.
For our analysis, we decided to study the behavior of this kernel by considering the same set of values independently of the specific dataset. We considered $\eta \in \{0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 500, 800, 1000\}$. We ran tests on some datasets, and the plot related to the DHFR dataset clearly revealed that large values of $\eta$ should be excluded, as suggested by Figure 6. So, we decided to take $\eta$ only in $\{0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10\}$.
The PFK has two parameters: the variance $\sigma$ and $t$. In [12], the authors described the procedure to follow in order to obtain the corresponding set of values, and it shows that the choice of $t$ depends on $\sigma$. Our aim in this paper was to carry out an analysis that is dataset-independent and thus connected only to the definition of the kernel itself. First, we took different values for $(\sigma, t)$ and plotted the corresponding accuracies; here, we report the case of MUTAG with the shortest path distance, but the same behavior holds true for other datasets as well.
The condition numbers were high for every choice of parameters and, therefore, we do not report them here, because it would be meaningless. From Figure 7, it is evident that it is convenient to set $\sigma$ lower than or equal to 10, while $t$ should be set greater than or equal to 0.1. Thus, in what follows, we consider $\sigma \in \{0.001, 0.01, 0.1, 1, 10\}$ and $t \in \{0.1, 1, 10, 100, 1000\}$.
In the case of the PI, we considered a reasonable set of values for the parameter, $\sigma \in \{0.001, 0.01, 0.1, 1, 10, 100, 1000\}$. The results relate to BZR with the shortest path distance and are shown in Figure 8.
As for the previous kernels, the accuracy was better for small values of $\sigma$. For this reason, we set $\sigma \in \{0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10\}$.

3.2. Numerical Tests

For the computation of simplicial complexes and persistence diagrams, we used some Python libraries available online: gudhi [26], ripser [27], giotto-tda [28] and persim [29]. On all the datasets, we performed a random splitting (70%/30%) for training and testing, and we applied tenfold cross-validation on the training set in order to tune the parameters. We then averaged the results over 10 runs. For balanced datasets, we measured the performance of the classifier through the accuracy, for both binary and multiclass problems:
$$\text{accuracy} = \frac{\text{number of test samples correctly classified}}{\text{number of test samples}}.$$
In the case of imbalanced datasets, we adopted the balanced accuracy, as explained in [30]: if for every class $i$ we define the related recall as
$$\mathrm{recall}_i = \frac{\text{test samples of class } i \text{ correctly classified}}{\text{all test samples of class } i},$$
then the balanced accuracy in the case of $n$ different classes is
$$\text{balanced accuracy} = \frac{1}{n} \sum_{i=1}^n \mathrm{recall}_i.$$
This definition effectively quantifies how accurate the classifier is, even on the smallest classes. For the tests, we used the implementation of the SVM provided by the Scikit-learn [31] Python library. For the PFK, we precomputed the Gram matrices using a Matlab (Matlab R2023b) routine because it is faster than the Python one. The values for $C$ belonged to $\{0.001, 0.01, 0.1, 1, 10, 100\}$. For each kernel, we considered the following values for the parameters:
  • PSSK: $\sigma \in \{0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10\}$.
  • PWGK: $\tau \in \{0.001, 0.01, 0.1, 1, 10, 100\}$, $\rho \in \{0.001, 0.01, 0.1, 1, 10, 100, 1000\}$, $p \in \{1, 5, 10, 50, 100\}$, $C_w \in \{0.001, 0.01, 0.1, 1\}$; as inner kernel we chose the Gaussian one.
  • SWK: $\eta \in \{0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10\}$.
  • PFK: $\sigma \in \{0.001, 0.01, 0.1, 1, 10\}$ and $t \in \{0.1, 1, 10, 100, 1000\}$.
  • PI: $\sigma \in \{0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10\}$ and pixel size 0.1.
All the codes were run using Python 3.11 on a 2.5 GHz Dual-Core Intel Core i5 with 32 GB of RAM. They can be found and downloaded from the GitHub page https://github.com/cinziabandiziol/persistence_kernels (accessed on 1 August 2024).
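The protocol above can be summarized by the following sketch, which assumes that the full symmetric Gram matrix K of a given PK has already been computed on the whole dataset (hypothetical helper, not the scripts of the repository; kernel parameters other than C would require one Gram matrix per candidate value, and it is assumed that recent scikit-learn versions slice precomputed kernels row- and column-wise during cross-validation):

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import balanced_accuracy_score

def run_experiment(K, y, seed=0):
    """K: (n, n) precomputed Gram matrix of a Persistence Kernel; y: label array."""
    idx = np.arange(len(y))
    train, test = train_test_split(idx, test_size=0.3, stratify=y, random_state=seed)
    # Tenfold cross-validation on the training block to tune the SVM parameter C
    search = GridSearchCV(SVC(kernel='precomputed'),
                          {'C': [0.001, 0.01, 0.1, 1, 10, 100]}, cv=10)
    search.fit(K[np.ix_(train, train)], y[train])
    y_pred = search.predict(K[np.ix_(test, train)])
    return balanced_accuracy_score(y[test], y_pred)

# scores = [run_experiment(K, y, seed=s) for s in range(10)]   # averaged over 10 runs
```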

3.3. Point Cloud Data and Shapes

3.3.1. Protein

This is the Protein Classification Benchmark dataset PCB00019 [32]. It sums up information on 1357 proteins related to 55 classification problems. The data were highly imbalanced, and we therefore applied the classifier to one of the problems in which the imbalance was less pronounced. Persistence diagrams were computed for each protein by considering its 3D structure or, more precisely, the $(x, y, z)$ positions of the atoms of each of the 1357 molecules, regarded as a point cloud in $\mathbb{R}^3$. Finally, using ripser, we computed only the one-dimensional persistence diagrams.

3.3.2. SHREC14—Synthetic Data

This dataset is related to the problem of non-rigid 3D shape retrieval. It collects exclusively human models in different body shapes and 20 poses; some examples are reported in Figure 9. It consists of 15 different human models, including man, woman, and child, each with its own body shape. Each of these models appears in 20 different poses, making up a dataset composed of 300 models.
For each shape, the meshes are given with about 60,000 vertices and, using the Heat Kernel Signature (HKS) introduced in [33] at different values of $t_i$, as in [8], we computed the persistence diagrams of the induced filtrations in dimension 1.

3.3.3. Orbit Recognition

We considered the dataset proposed in [13]. We took into account the linked twisted map, which models fluid flows. The orbits can be computed through the following discrete dynamical system:
$$x_{n+1} = x_n + r\, y_n (1 - y_n) \mod 1, \qquad y_{n+1} = y_n + r\, x_{n+1} (1 - x_{n+1}) \mod 1,$$
with starting point $(x_0, y_0) \in [0, 1] \times [0, 1]$ and $r > 0$ a real parameter that influences the behavior of the orbits, as shown in Figure 10.
As in [13], $r \in \{2.5, 3.5, 4, 4.1, 4.3\}$, and its value determines the label of the corresponding orbit. For each value of $r$, we generated the first 1000 points of 50 orbits, with starting points chosen randomly. The final dataset was composed of 250 elements. We computed the PDs considering only the one-dimensional features. Since each PD had a huge number of topological features, we decided to keep only the 10 most persistent ones, as in [15].
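The orbits can be generated directly from the recurrence above; a minimal sketch (the random seed handling is illustrative):

```python
import numpy as np

def linked_twisted_map_orbit(r, n_points=1000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x, y = rng.random(), rng.random()          # random starting point in [0, 1] x [0, 1]
    orbit = np.empty((n_points, 2))
    for i in range(n_points):
        x = (x + r * y * (1.0 - y)) % 1.0
        y = (y + r * x * (1.0 - x)) % 1.0      # uses the updated x, as in the recurrence
        orbit[i] = (x, y)
    return orbit

rng = np.random.default_rng(0)
orbits, labels = [], []
for label, r in enumerate([2.5, 3.5, 4.0, 4.1, 4.3]):
    for _ in range(50):                        # 50 orbits of 1000 points per value of r
        orbits.append(linked_twisted_map_orbit(r, rng=rng))
        labels.append(label)
```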
Firstly, the great difference in performance among the different datasets is probably due to the high imbalance of the PROTEIN dataset, compared with the perfect balance of the other ones. It is well known that, if the classifier does not have enough samples for each class, as in the case of an imbalanced dataset, it faces significant issues in correctly classifying the elements of the minority classes. From Table 1 it is evident that, except for PROTEIN, where the PSSK showed slightly better performance, the best accuracy for SHREC14 and DYN SYS was achieved by the SWK.

3.4. Images

All the definitions introduced in Section 2 can be extended to another kind of complex, the cubical complex. This is useful when one deals with images or objects based on meshes, for example. More details can be found in [34].
Definition 14. 
An elementary cube $Q \subset \mathbb{R}^d$ is defined as a product $Q = I_1 \times \dots \times I_d$, where each $I_j$ can be either a single-point set $\{m\}$ or a unit-length interval $[m, m+1]$ for some $m \in \mathbb{Z}$. A k-cube $Q$ is a cube such that the number of unit-length intervals in the product defining $Q$ is equal to $k$, which is then called the dimension of the cube $Q$. If $Q_1$ and $Q_2$ are cubes and $Q_1 \subset Q_2$, then $Q_1$ is called a face of $Q_2$. A cubical complex $X$ in $\mathbb{R}^d$ is a collection of k-cubes ($0 \le k \le d$) such that:
  • every face of a cube in $X$ has to belong to $X$;
  • given two cubes of $X$, their intersection must be either empty or a face of each of them.
The lower-dimensional elementary cubes are reported in Figure 11.

MNIST and FMNIST

MNIST [35] is very common in the classification framework. It consists of 70,000 grayscale handwritten digits, which one can try to classify into 10 different classes. Each image can be viewed as a set of pixels with values between 0 and 255 (black to white), as in Figure 12.
Starting from this kind of dataset, we have to compute the corresponding persistent features. Following the approach proposed in [36], which comes from [9], we first binarize each image, that is, we replace each grayscale image with a black/white one; then we use as filtration function the so-called height filtration $\mathcal{H}(p)$ of [36]. For a cubical complex and a chosen vector $v \in \mathbb{R}^d$ of unit norm, it is defined as
$$\mathcal{H}(p) = \begin{cases} \langle p, v \rangle & \text{if } p \text{ is black},\\ H & \text{otherwise}, \end{cases}$$
where $H$ is a large default value chosen by the user. As in [9], we choose four different directions $v$, namely $(1, 0)$, $(-1, 0)$, $(0, 1)$, $(0, -1)$, and we compute zero- and one-dimensional persistent features using both the giotto-tda and the gudhi libraries. Finally, we concatenate them. For the current experiment, we decided to focus the test on a subset of the original MNIST composed of only 10,000 samples; this was a balanced dataset. Due to some memory issues, we had to consider for this dataset a pixel size of 0.5 and, for the PWGK, only $\tau \in \{0.001, 0.01, 0.1, 1, 10, 100\}$, $\rho \in \{0.001, 0.1, 10, 1000\}$, $p = 10$, $C_w \in \{0.001, 0.01, 0.1, 1\}$.
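A rough sketch of this construction with gudhi's cubical complexes (the binarization threshold, the constant H and the use of pixel indices as coordinates are illustrative assumptions; in the experiments the giotto-tda implementation was also used):

```python
import numpy as np
import gudhi

def height_filtration_diagrams(img, v, threshold=0.4, H=1000.0):
    """img: 2D grayscale array with values in [0, 1]; v: unit direction, e.g. (1, 0)."""
    black = img >= threshold                       # binarization of the grayscale image
    rows, cols = np.indices(img.shape)
    heights = rows * v[0] + cols * v[1]            # <p, v> for every pixel p
    filt = np.where(black, heights, H)             # background pixels enter the filtration late
    cc = gudhi.CubicalComplex(top_dimensional_cells=filt)
    cc.persistence()
    return (cc.persistence_intervals_in_dimension(0),
            cc.persistence_intervals_in_dimension(1))

# Example: diagrams of one image `img` for the four directions used above
# dgms = [height_filtration_diagrams(img, v) for v in [(1, 0), (-1, 0), (0, 1), (0, -1)]]
```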
Another example of a grayscale image dataset is the FMNIST [37], which contains 28 × 28 grayscale images related to the fashion world. Figure 13 shows an example.
To deal with this dataset, we followed another approach, proposed in [23], where the authors applied padding, a median filter, shallow thresholding and Canny edge detection, and then computed the usual filtration on the resulting image. Due to some memory issues, we had to consider for this dataset a pixel size of 1 and, for the PWGK, only $\tau \in \{0.001, 0.01, 0.1, 1, 10, 100\}$, $\rho \in \{0.001, 0.1, 10, 1000\}$, $p = 10$, $C_w \in \{0.001, 0.01, 0.1, 1\}$.
Both datasets were balanced, and the results were probably better in the case of MNIST because it is easier to classify handwritten digits than images of clothes. The SWK showed a slightly better performance, as reported in Table 2.

3.5. Graphs

In many different contexts, from medicine to chemistry, data can have the structure of graphs. A graph is a pair $(V, E)$, where $V$ is the set of vertices and $E$ is the set of edges. Graph classification is the task of attaching a label/class to each graph as a whole. In order to compute the persistent features, we need to build a filtration. In the context of graphs, as in other cases, there are different possible definitions; see, for example, [38].
We considered the Vietoris–Rips filtration, where, starting from the set of vertices, at each step we add the edges whose weights are less than or equal to the current value $\epsilon$. This turned out to be the most common choice, and the software available online allows one to build it after providing the corresponding adjacency matrix. In our experiments, we considered only undirected graphs but, as in [38], building a filtration is also possible for directed graphs. Once the kind of filtration is defined, one still needs to choose the corresponding edge weights. We decided to consider first the shortest path distance and then the Jaccard index, as, for example, in [14].
Given two vertices $u, v \in V$, the shortest path distance is defined as the minimum number of edges that one has to traverse going from $u$ to $v$ or vice versa (the graphs here are considered undirected). In graph theory, this is a widely used metric.
The Jaccard index is a good measure of edge similarity. Given an edge $e = (u, v) \in E$, the corresponding Jaccard index is computed as
$$\rho(u, v) = \frac{|NB(u) \cap NB(v)|}{|NB(u) \cup NB(v)|}$$
where $NB(u)$ is the set of neighbors of $u$ in the graph. This metric recovers the local information of the nodes, in the sense that two nodes are considered similar if their neighbor sets are similar.
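A short sketch of how such edge weights can be computed with networkx (illustrative; in the experiments the adjacency matrices and PDs were computed with giotto-tda, as noted below):

```python
import networkx as nx

def jaccard_edge_weights(G):
    """Assign to every edge (u, v) of an undirected graph its Jaccard index."""
    weights = {}
    for u, v in G.edges():
        nb_u, nb_v = set(G.neighbors(u)), set(G.neighbors(v))
        union = nb_u | nb_v
        weights[(u, v)] = len(nb_u & nb_v) / len(union) if union else 0.0
    return weights

# Example on a small graph: in the complete graph K4 every edge gets weight 0.5
G = nx.complete_graph(4)
print(jaccard_edge_weights(G))
```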
In both cases, we considered the sub-level set filtration and we collected both zero- and one-dimensional persistent features.
We took six such sets from the graph benchmark datasets, all undirected, as follows:
  • MUTAG: a collection of nitroaromatic compounds, the goal being to predict their mutagenicity on Salmonella typhimurium;
  • PTC: a collection of chemical compounds represented as graphs that report their carcinogenicity in rats;
  • BZR: a collection of chemical compounds that one has to classify as active or inactive;
  • ENZYMES: a dataset of protein tertiary structures obtained from the BRENDA enzyme database; the aim is to classify each graph into six enzymes;
  • DHFR: a collection of chemical compounds that one has to classify as active or inactive;
  • PROTEINS: in each graph, nodes represent the secondary structure elements; the task is to predict whether or not a protein is an enzyme.
The properties of the above datasets are summarized in Table 3, where the IR index is the so-called Imbalanced Ratio (IR), which denotes the imbalance of the dataset and is defined as the sample size of the majority class over the sample size of the minority class.
The computation of the adjacency matrices and the PDs was carried out using the functions implemented in giotto-tda.
The performances achieved with the two edge weights are reported in Table 4 and Table 5.
From these results, two conclusions can be drawn. The first is that, as expected, the goodness of the classifier is strictly related to the particular filtration used for the computation of the persistent features. The second is that the SWK and the PFK seem to work slightly better than the other kernels: in the case of the shortest path distance (Table 4), the SWK is to be preferred, while the PFK seems to work better in the case of the Jaccard index (Table 5). In the case of PROTEINS, the PWGK provides the best Balanced Accuracy with both weightings.

3.6. One-Dimensional Time Series

In many different applications, one deals with one-dimensional time series. A one-dimensional time series is a set $\{x_t \in \mathbb{R} \,|\, t = 1, \dots, T\}$. In [39], the authors provided different approaches to building a filtration upon this kind of data; we decided to adopt the most common one. Thanks to Takens' embedding, these data can be translated into point clouds: with suitable choices of two parameters, the delay $\tau > 0$ and the dimension $d > 0$, it is possible to compute a set of points in $\mathbb{R}^d$ composed of the vectors $v_i = (x_i, x_{i+\tau}, \dots, x_{i+(d-1)\tau})$ for $i = 1, \dots, T - (d-1)\tau$. The theory mentioned above for point clouds can then be applied to signals, seen as points in $\mathbb{R}^d$. For how to choose the values of the parameters, see [39]. The datasets for the tests, listed in Table 6, were taken from the UCR Time Series Classification Archive (2018) [40], which consists of 128 datasets of time series from different application domains. The archive provides a split into train and test sets but, for the aim of our analysis, we disregarded it: we considered the train and test data as a whole dataset, and our codes then performed the subdivision.
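A minimal sketch of the delay embedding (the choice of d and τ is illustrative; giotto-tda also provides an implementation and heuristics for selecting the parameters):

```python
import numpy as np

def takens_embedding(x, dim=3, delay=1):
    """Embed the 1D series x into points v_i = (x_i, x_{i+delay}, ..., x_{i+(dim-1)*delay})."""
    x = np.asarray(x, float)
    n = len(x) - (dim - 1) * delay
    return np.column_stack([x[i:i + n] for i in range(0, dim * delay, delay)])

t = np.linspace(0.0, 10.0 * np.pi, 500)
signal = np.sin(t)                                  # a periodic toy signal
cloud = takens_embedding(signal, dim=2, delay=25)   # traces a closed loop in R^2
print(cloud.shape)
```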
Using giotto-tda, we computed the persistent features of zero, one and two dimensions and joined them together. The final results of the datasets are reported in Table 7.
As in the previous examples, the SWK achieved the best results, providing slightly better performance in terms of accuracy.

4. Discussion

In this paper, we compared the performances of five Persistence Kernels applied to data of different natures. The results show that the different PKs are indeed comparable in terms of accuracy and that no single PK clearly emerges above the others. However, in many cases, the SWK and PFK performed slightly better. In addition, from a purely computational point of view, the SWK is to be preferred, since by construction its pre-Gram matrix (the matrix of Sliced Wasserstein distances) is parameter-independent. Therefore, in practice, the user has to compute such a matrix on the whole dataset only once at the beginning and then choose a suitable subset of rows and columns to perform the training, cross-validation and test phases. This aspect is relevant and reduces the computational cost and time compared with the other kernels. Another aspect to be considered, as in the case of graphs, is how to choose the function f that provides the filtration. The choice of such a function is still an open problem and an interesting field of research. The right choice, in fact, would guarantee a better extraction of the intrinsic information from the data, thus improving the classifier's performance. For the sake of completeness, we recall that in the literature there is also an interesting direction of research whose aim is to build new PKs starting from the five kernels that we introduced in Section 2.3. Starting from the PKs mentioned in the previous sections, the authors in [15,41] studied how to modify them, obtaining the so-called Variably Scaled Persistence Kernels, which are Variably Scaled Kernels applied to the classification context. The results reported by the authors are indeed promising. This could, therefore, be another interesting direction for further analysis.

Author Contributions

Conceptualization, C.B. and S.D.M.; methodology, C.B. and S.D.M.; software, C.B.; writing—original draft preparation, C.B. and S.D.M.; supervision, S.D.M.; funding acquisition, S.D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was achieved as part of RITA “Research ITalian network on Approximation” and as part of the UMI topic group “Teoria dell’Approssimazione e Applicazioni”. The authors are members of the INdAM-GNCS Research group. The project was also funded by the European Union-Next Generation EU under the National Recovery and Resilience Plan (NRRP), Mission 4 Component 2 Investment 1.1—Call PRIN 2022 No. 104 of 2 February 2022 of the Italian Ministry of University and Research; Project 2022FHCNY3 (subject area: PE—Physical Sciences and Engineering) “Computational mEthods for Medical Imaging (CEMI)”.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
TDA    Topological Data Analysis
PD     Persistence Diagram
PK     Persistence Kernel
PH     Persistent Homology
SVM    Support Vector Machine
PSSK   Persistence Scale-Space Kernel
PWGK   Persistence Weighted Gaussian Kernel
SWK    Sliced Wasserstein Kernel
PFK    Persistence Fisher Kernel
PI     Persistence Image

References

  1. Townsend, J.; Micucci, C.P.; Hymel, J.H.; Maroulas, V.; Vogiatzis, K.D. Representation of molecular structures with persistent homology for machine learning applications in chemistry. Nat. Commun. 2020, 11, 3230. [Google Scholar] [CrossRef] [PubMed]
  2. Asaad, A.; Ali, D.; Majeed, T.; Rashid, R. Persistent Homology for Breast Tumor Classification Using Mammogram Scans. Mathematics 2022, 10, 21. [Google Scholar] [CrossRef]
  3. Pachauri, D.; Hinrichs, C.; Chung, M.K.; Johnson, S.C.; Singh, V. Topology based Kernels with Application to Inference Problems in Alzheimer’s disease. IEEE Trans. Med. Imaging 2011, 30, 1760–1770. [Google Scholar] [CrossRef]
  4. Flammer, M. Persistent Homology-Based Classification of Chaotic Multi-variate Time Series: Application to Electroencephalograms. SN Comput. Sci. 2024, 5, 107. [Google Scholar] [CrossRef]
  5. Majumdar, S.; Laha, A.K. Clustering and classification of time series using topological data analysis with applications to finance. Expert Syst. Appl. 2020, 162, 113868. [Google Scholar] [CrossRef]
  6. Brüel-Gabrielsson, R.; Ganapathi-Subramanian, V.; Skraba, P.; Guibas, L.J. Topology-Aware Surface Reconstruction for Point Clouds. Comput. Graph. Forum 2020, 39, 197–207. [Google Scholar] [CrossRef]
  7. Cohen-Steiner, D.; Edelsbrunner, H.; Harer, J. Stability of persistence diagrams. Discret. Comput. Geom. 2007, 37, 103–120. [Google Scholar] [CrossRef]
  8. Reininghaus, J.; Huber, S.; Bauer, U.; Kwitt, R. A Stable Multi-Scale Kernel for Topological Machine Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4741–4748. [Google Scholar]
  9. Barnes, D.; Polanco, L.; Peres, J.A. A Comparative Study of Machine Learning Methods for Persistence Diagrams. Front. Artif. Intell. 2021, 4, 681174. [Google Scholar] [CrossRef]
  10. Kusano, G.; Fukumizu, K.; Hiraoka, Y. Kernel method for persistence diagrams via kernel embedding and weight factor. J. Mach. Learn. Res. 2017, 18, 6947–6987. [Google Scholar]
  11. Carriere, M.; Cuturi, M.; Oudot, S. Sliced Wasserstein kernel for persistent diagrams. Int. Conf. Mach. Learn. 2017, 70, 664–673. [Google Scholar]
  12. Le, T.; Yamada, M. Persistence fisher kernel: A riemannian manifold kernel for persistence diagrams. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018. [Google Scholar]
  13. Adams, H.; Emerson, T.; Kirby, M.; Neville, R.; Peterson, C.; Shipman, P.; Chepushtanova, S.; Hanson, E.; Motta, F.; Ziegelmeier, L. Persistence images: A stable vector representation of persistent homology. J. Mach. Learn. Res. 2017, 18, 1–35. [Google Scholar]
  14. Zhao, Q.; Wang, Y. Learning metrics for persistence-based summaries and applications for graph classification. arXiv 2019, arXiv:1904.12189. [Google Scholar]
  15. De Marchi, S.; Lot, F.; Marchetti, F.; Poggiali, D. Variably Scaled Persistence Kernels (VSPKs) for persistent homology applications. J. Comput. Math. Data Sci. 2022, 4, 100050. [Google Scholar] [CrossRef]
  16. Fomenko, A.T. Visual Geometry and Topology; Springer Science and Business Media: New York, NY, USA, 2012. [Google Scholar]
  17. Rotman, J.J. An Introduction to Algebraic Topology; Springer: New York, NY, USA, 1988. [Google Scholar]
  18. Edelsbrunner, H.; Harer, J. Persistent homology—A survey. Contemp. Math. 2008, 453, 257–282. [Google Scholar]
  19. Edelsbrunner, H.; Harer, J. Computational Topology: An Introduction; American Mathematical Society: Providence, RI, USA, 2010. [Google Scholar]
  20. Guillemard, M.; Iske, A. Interactions between kernels, frames and persistent homology. In Recent Applications of Harmonic Analysis to Function Spaces, Differential Equations, and Data Science; Springer: Cham, Switzerland, 2017; pp. 861–888. [Google Scholar]
  21. Carlsson, G. Topology and data. Bull. Am. Math. Soc. 2009, 46, 255–308. [Google Scholar] [CrossRef]
  22. Pickup, D.; Sun, X.; Rosin, P.L.; Martin, R.R.; Cheng, Z.; Lian, Z.; Aono, M.; Ben Hamza, A.; Bronstein, A.; Bronstein, M.; et al. SHREC’ 14 Track: Shape Retrieval of Non-Rigid 3D Human Models. In Proceedings of the 7th Eurographics workshop on 3D Object Retrieval, EG 3DOR’14, Strasbourg, France, 6 April 2014. [Google Scholar]
  23. Ali, D.; Asaad, A.; Jimenez, M.; Nanda, V.; Paluzo-Hidalgo, E.; Soriano-Trigueros, M. A Survey of Vectorization Methods in Topological Data Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 14069–14080. [Google Scholar] [CrossRef]
  24. Scholkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond; The MIT Press: Cambridge, MA, USA, 2002; ISBN 978-026-225-693-3. [Google Scholar]
  25. Fasshauer, G.E. Meshfree Approximation with MATLAB; World Scientific: Singapore, 2007; ISBN 978-981-270-634-8. [Google Scholar]
  26. The GUDHI Project, GUDHI User and Reference Manual, 3.5.0 Edition, GUDHI Editorial Board. 2022. Available online: https://gudhi.inria.fr/doc/3.5.0/ (accessed on 13 January 2022).
  27. Tralie, C.; Saul, N.; Bar-On, R. Ripser.py: A lean persistent homology library for python. J. Open Source Softw. 2018, 3, 925. [Google Scholar] [CrossRef]
  28. Giotto-tda 0.5.1 Documentation. 2021. Available online: https://giotto-ai.github.io/gtda-docs/0.5.1/library.html (accessed on 25 January 2019).
  29. Saul, N.; Tralie, C. Scikit-tda: Topological Data Analysis for Python. 2019. Available online: https://docs.scikit-tda.org/en/latest/ (accessed on 25 January 2019).
  30. Grandini, M.; Bagli, E.; Visani, G. Metrics for multi-class classification: An overview. arXiv 2020, arXiv:2008.05756. [Google Scholar]
  31. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  32. Sonego, P.; Pacurar, M.; Dhir, S.; Kertész-Farkas, A.; Kocsor, A.; Gáspári, Z.; Leunissen, J.A.M.; Pongor, S. A Protein Classification Benchmark collection for machine learning. Nucleic Acids Res. 2006, 35, D232–D236. [Google Scholar] [CrossRef]
  33. Sun, J.; Ovsjanikov, M.; Guibas, L. A Coincise and Provably Informative Multi-Scale Signature Based on Heat Diffusion. Comput. Graph. Forum 2009, 28, 1383–1392. [Google Scholar] [CrossRef]
  34. Lee, D.; Lee, S.H.; Jung, J.H. The effects of topological features on convolutional neural networks—An explanatory analysis via Grad-CAM. Mach. Learn. Sci. Technol. 2023, 4, 035019. [Google Scholar] [CrossRef]
  35. LeCun, Y.; Cortes, C. MNIST Handwritten Digit Database. 2010. Available online: https://yann.lecun.com/exdb/mnist/ (accessed on 10 November 1998).
  36. Garin, A.; Tauzin, G. A Topological “Reading” Lesson: Classification of MNIST using TDA. In Proceedings of the 18th IEEE International Conference On Machine Learning And Applications, Boca Raton, FL, USA, 16–19 December 2019; pp. 1551–1556. [Google Scholar]
  37. Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-mnist: A novel image dataset for benchmarking machine learning algorithms. arXiv 2017, arXiv:1708.07747. [Google Scholar]
  38. Aktas, M.E.; Akbas, E.; El Fatmaoui, A. Persistent Homology of Networks: Methods and Applications. Appl. Netw. Sci. 2019, 4, 61. [Google Scholar] [CrossRef]
  39. Ravinshanker, N.; Chen, R. An introduction to persistent homology for time series. WIREs Comput. Stat. 2021, 13, e1548. [Google Scholar] [CrossRef]
  40. Dau, H.A.; Keogh, E.; Kamgar, K.; Yeh, C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Chen, Y.; Hu, B.; Begum, N.; et al. University of California Riverside. Available online: https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ (accessed on 1 October 2018).
  41. De Marchi, S.; Lot, F.; Marchetti, F. Kernel-Based Methods for Persistent Homology and Their Applications to Alzheimer’s Disease. Master’s Thesis, University of Padova, Padova, Italy, 25 June 2021. [Google Scholar]
Figure 1. An example of a valid simplicial complex (left) and an invalid one (right).
Figure 2. Example of PD with features of zero, one and two dimensions.
Figure 3. Example of bottleneck distance between two PDs in red and blue.
Figure 4. Comparison results about PSSK for SHREC14, in terms of condition number (left) and accuracy (right) with different σ.
Figure 5. Comparison results about the PWGK for MUTAG, in terms of accuracy with different τ and ρ.
Figure 6. Comparison results about the SWK for DHFR, in terms of condition number (left) and accuracy (right) with different η.
Figure 7. Comparison results about the PFK for MUTAG, in terms of accuracy with different t and σ.
Figure 8. Comparison results about the PI for BZR, in terms of condition number (left) and accuracy (right) with different σ.
Figure 9. Some elements of the SHREC14 dataset.
Figure 10. Orbits composed by the first 1000 iterations of the linked twisted map with r = 3.5, 4.1, 4.3 from left to right, starting from the fixed random $(x_0, y_0) \in [0, 1]^2$.
Figure 11. Cubical simplices.
Figure 12. Example of an element in the MNIST dataset.
Figure 13. Example of an element in the FMNIST dataset.
Table 1. Accuracy related to point cloud and shape datasets (Balanced Accuracy only for the PROTEIN dataset). The best results are underlined in bold.
Kernel   PROTEIN   SHREC14   DYN SYS
PSSK     0.561     0.933     0.829
PWGK     0.538     0.923     0.819
SWK      0.531     0.935     0.841
PFK      0.556     0.935     0.784
PI       0.560     0.934     0.777
Table 2. Accuracy related to MNIST and FMNIST. The best results are underlined in bold.
Kernel   MNIST   FMNIST
PSSK     0.729   0.664
PWGK     0.754   0.684
SWK      0.802   0.709
PFK      0.734   0.671
PI       0.760   0.651
Table 3. Graph datasets.
Dataset    N° Graphs   N° Classes   IR
MUTAG      188         2            125:63
PTC        344         2            192:152
BZR        405         2            319:86
ENZYMES    600         6            100:100
DHFR       756         2            461:295
PROTEINS   1113        2            663:450
Table 4. Balanced Accuracy related to graph datasets using the shortest path distance (for the ENZYMES dataset, Accuracy only). The best results are underlined in bold.
Kernel   MUTAG   PTC     BZR     DHFR    PROTEINS   ENZYMES
PSSK     0.868   0.545   0.606   0.557   0.668      0.281
PWGK     0.858   0.510   0.644   0.655   0.694      0.329
SWK      0.872   0.511   0.712   0.656   0.686      0.370
PFK      0.842   0.534   0.682   0.656   0.694      0.341
PI       0.863   0.542   0.585   0.519   0.691      0.285
Table 5. Balanced Accuracy related to graph datasets using the Jaccard Index (for the ENZYMES dataset, Accuracy only). The best results are underlined in bold.
Kernel   MUTAG   PTC     BZR     DHFR    PROTEINS   ENZYMES
PSSK     0.865   0.490   0.704   0.717   0.675      0.298
PWGK     0.859   0.516   0.720   0.727   0.699      0.355
SWK      0.858   0.523   0.703   0.726   0.689      0.406
PFK      0.874   0.554   0.704   0.743   0.678      0.400
PI       0.846   0.478   0.670   0.712   0.690      0.280
Table 6. Time series datasets.
Dataset      N° Time Series   N° Classes   IR
ECG200       200              2            133:67
SONY         621              2            349:272
DISTAL       876              2            539:337
STRAWBERRY   983              2            632:351
POWER        1096             2            549:547
MOTE         1272             2            685:587
Table 7. Balanced Accuracy related to time series datasets. The best results are underlined in bold.
Kernel   ECG200   SONY    DISTAL   STRAWBERRY   POWER   MOTE
PSSK     0.642    0.874   0.658    0.814        0.720   0.618
PWGK     0.726    0.888   0.696    0.892        0.769   0.633
SWK      0.731    0.892   0.723    0.898        0.784   0.671
PFK      0.707    0.895   0.676    0.892        0.750   0.652
PI       0.717    0.841   0.662    0.793        0.712   0.606